Table of Contents
This chapter describes the tbLoader utility and its usage.
tbLoader is a utility designed to load a large amount of data into a Tibero database at once. With the tbLoader utility, a user does not have to write and input SQL statements individually into the database. Instead, the user can convert the column data into a general text file and stack it into the database. The tbLoader utility is useful when large amounts of data need to be saved into a Tibero database.
The tbLoader utility enables users to easily write data files and reduce data stacking time.
ThetbLoader utility is installed automatically when installing Tibero. If Tibero is uninstalled, the tbLoader utility will also be removed.
The following shows how the tbLoader utility can be executed through a command prompt.
$ tbLoader [options]
For the detailed information about the options that can be specified in the command prompt, refer to “5.7. Specifying Parameters in the Command Prompt”.
The following shows an example of how to execute the tbLoader utility.
[Example 5.1] Executing the tbLoader Utility
$ tbloader userid=db_user/db_password@default
control=sample.ctl data=sample.data direct=Y
The tbLoader utility receives control and data files and outputs log and error files. Control and data files are written by a user, while log and error files are automatically created by tbLoader. tbLoader utility I/O files are all text files, which makes input file creation simple for, the user.
The following section discusses input files (control and data files) and output files (log and error files). These files are necessary when using the tbLoader utility.
The control file defines the parameters that are used to execute the tbLoader utility. In the control file the user specifies the data to be read, how the data is read, and the location in which to save the data. For more information on setting control file parameters refer to “5.9. Specifying Control File Options”, “Specifying Control File Options”.
The data file is a text file that contains the data that will be stored in the database table. The data file uses the SPOOL commands of tbSQL , to save the results of SQL queries. As it is a text file, the data file can also be manually written in a general text editor.
The data file specified by a user is saved in two formats: fixed record format and separated record format.
Fixed record format is applied when the user specifies the POSITION information for all columns in the control file. This format does not use a delimiter; instead, it reads data values from the positions specified by the user. The column position is fixed by byte length. This format is less flexible than the separated record format but has higher performance.
In order to separate records, the user can specify the record size or use the end of line (EOL) character. To enhance speed, the system reads data from the column position specified in the control file, ignoring the ESCAPED BY and LINE STARTER parameter values.
The fixed record format uses the POSITION statement, not the FIELDS or the LINE STARTED BY statement. (Explained in “5.9. Specifying Control File Options”.)
“5.9. Specifying Control File Options”
When to use the LINES FIX statement
The LINES FIX statement is used when the record length included in the data file has a fixed length (number of bytes). The user has to specify the record size in the control file e.g. LINES FIX 12.
If the LINES TERMINATED BY statement is used, it will be ignored.
The following is an example of fixed length record.
example.ctl: LOAD DATA INFILE 'example.dat' LOGFILE 'example.log' BADFILE 'example.bad' APPEND INTO TABLE EMP LINES FIX 12 ( empno position (01:04), ename position (06:12) ) example.dat: 7922 MILLER 7777 KKS 7839 KING 7934 MILLER 7566 JONES
When not to use LINES FIX statement
When a user does not specify the record size in the control file, the EOL character ('\n') is used as a delimiter.
The following is an example of the EOL character as a delimiter.
example.ctl: LOAD DATA INFILE 'example.dat' LOGFILE 'example.log' BADFILE 'example.bad' APPEND INTO TABLE EMP TRAILING NULLCOLS ( empno position (01:04), ename position (06:15), job position (17:25), mgr position (27:30), sal position (32:39) )
example.dat: 7922 MILLER CLERK 7782 920.00 7777 KKS CHAIRMAN 7839 KING PRESIDENT 5500.0 7934 MILLER CLERK 7782 920.00 7566 JONES MANAGER 7839 3123.75 7658 CHAN ANALYST 7566 3450.00 7654 MARTIN SALESMAN 7698 1312.50
In the example.dat file, the second line does not have values in the last two columns. If “TRAILING NULLCOLS” is not specified, an error will occur. For more information about TRAILING NULLCOLS refer to “5.9.15. TRAILING NULLCOLS ”.
The separated record format is applied when the user does not specify POSITION information for all columns in the control file. Each field is separated by the FIELD TERMINATOR (field delimiter) and each record is separated by the LINE TERMINATOR (record delimiter). The separated record format has lower performance than the fixed record format but has higher flexibility because the format can be modified by the user.
The following is an example of the separated record format.
example.ctl: LOAD DATA INFILE 'example.dat' LOGFILE 'example.log' BADFILE 'example.bad' APPEND INTO TABLE emp FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\n' ( empno, ename, job, mgr, hiredate, sal, comm, deptno )
example.dat: 7654, "Martin", "Sales",7698,1981/10/28,1312.50,3,10 7782, "\,Clark","Manager" ,7839, 1981/01/11 ,2572.50,10,20 7839, "King",President,,1981/11/17,5500.00,,10 7934,"Miller","Clerk",7782 ,1977/10/12,920.00,,10 7566, "Jones",Manager" ,7839, 1981/04/02,3123.75,,20 7658, "Chan", Ana lyst, 7566,1982/05/03,3450,,20
The tbLoader utility execution process is recorded in the log file. The log file provides users with basic metadata of the input columns and statistics about successful and failed input records. For failed records, the cause of failure, as judged by the tbLoader utility, is provided.
tbLoader utility can load data through two methods:
The conventional path load method is the basic data loading method provided by the tbLoader utility. This method reads column data and the user specified data file. Its places the column data in a column array and loads the data to the database server by the batch update method. This method has lower performance than the direct path load but does not have any constraints.
The direct path load reads a data file specified by the user. The data is then loaded into a column array to match to a specific column data type. The column array type data is changed to match the Tibero database block type through a block formatter, and written to the Tibero database.
The direct path load is significantly faster than the conventional path load, but has the following constraints:
Does not check the CHECK constraint and the referential constraint.
Checks the Primary Key constraint, the Unique Key constraint, and the NOT NULL constraint.
The insert trigger does not work during loading.
The following section describes the constraints of the tbLoader utility.
If the FIELD TERMINATOR, ENCLOSED BY, ESCAPED BY, or LINE TERMINATOR BY parameter values are the same, the data file cannot be read correctly. For example, if FIELD TERMINATED BY and ENCLOSED BY parameters have the same value, an error occurs.
FIELDS TERMINATED BY '"' OPTIONALLY ENCLOSED BY '"'
If an input field value of a data file is the same as the character specified for ENCLOSED BY, FIELD TERMINATED BY, or LINE TERMINATED BY, the input field is not interpreted correctly. For more information refer to “5.9.11. FIELDS ENCLOSED BY ”.
This following section describes how the tbLoader utility handles whitespace.
A field value of the input file corresponds to the column value of a single record. If the field value consists of only whitespace(s), thetbLoader utility loads ‘0’ or NULL according to the data type of input table column.
The tbLoader utility regards blank space (' '), tab ('\t'), and the EOL character ('\n') as whitespace. If the character is defined as FIELD TERMINATED BY or LINE TERMINATED BY, however, it does not regard the character as whitespace. Whitespace located at the beginning and end of a field is ignored but whitespace in the middle of field is regarded as a part of the data.
The tbLoader utility regards whitespace differently according to the data file type.
Whitespace preceding a field value is regarded as actual data while whitespace at the end is regarded as unnecessary and is truncated. Because users generally delete whitespace preceding a field but in many cases the whitespace inserted at the beginning is used for formatting, whitespace at the end is deleted.
The following code illustrates the how whitespace is handled in case of fixed record.
" aaa \t" -> " aaa"
All whitespace preceding and following the field value is regarded as unnecessary and is truncated. If the user wants to insert whitespace preceding or following the field value, wrap the whitespace with the ENCLOSED BY string and the whitespace will be regarded as data.
When the user puts whitespace both at the beginning and at the end as shown below, the whitespace is truncated.
" aaa \t" -> "aaa"
As previously mentioned the tbLoader utility truncates data but whitespace at the beginning and at the end of a field can be forcibly considered as data by using the PRESERVE BLANKS parameter.
The following section describes the tbLoader utility parameters that can be specified in the command prompt.
If a parameter is not specified when executing the tbLoader utility, parameters can be specified from the command prompt. The following is the format that the command prompt parameters must be entered in.
$ tbloader tbLoader 5 Copyright (c) 2008, 2009, 2011, 2012 TmaxData Corporation. All rights reserved. Usage: tbloader [options] [controls] Options: -h|--help Displays the more detailed information. -v|--version Displays the version information. Controls: userid userid/passwd@dbname control Control file name log Log file name bad Bad file name data Data file name skip Skip lines in data file [default:0] direct Direct Path Load [default:N] dpl_log Enable Direct Path Load logging [default:N] message Loading progress message to stdout readsize Read buffer size bindsize Bind buffer size errors Errors to allow [default:50] rows Rows per commit multithread Use multi-thread for Direct Path Loading [default:Y] dpl_parallel Use Parallel Direct Path Loading [default:n] Example: tbloader userid=userid/passwd@dbname control=sample.ctl bindsize=1000000
From the command prompt, a user can input the metadata required for parameters or the control file. The SERID, CONTROL, DIRECT, MESSAGE, READSIZE, BINDSIZE, ERRORS, and ROWS parameters can be specified only from the command prompt.
The following shows an example of the tbLoader utility parameters, that can be specified in the command prompt.
[Example 5.2] Parameters of the tbLoader Utility that can be Specified in the Command Prompt
userid=userid/passwd@dbname control=/home/test/control.ctl control=../control.ctl log=/home/test/control.log log=../control.log data=/home/test/data.dat data=../data.dat bad=/home/test/data.bad bad=../data.bad skip=1 direct=Y dpl_log=Y message=10000 readsize=65536 bindsize=65536 errors=100 rows=100 multithread=N dpl_parallel=Y
Item | Description |
---|---|
userid | Specifies the Tibero database user name and password. E.g. userid=userid/passwd@databasename. |
control | Specifies the path and the name of the control file that contains parameter information. Both the absolute path and the relative path to the specified directory can be used. |
log | Specifies the path and the name of the file that records the tbLoader data loading execution process. (The default value: <Control file name>.log) Both the absolute path and the relative path to the specified directory can be used. If the user specifies paths to both the command prompt and the control file, the value specified in the command prompt has a higher priority. |
data | Specifies the path and the name of the text file which contains actual data. Both the absolute path and the relative to the specified directory can be used. If the user specifies paths to both the command prompt and the control file, the value specified in the command prompt has a higher priority. |
bad | Specifies the path and the name of the text file that records data load failures. (The default value: <Data file name>.bad> Both the absolute path and the relative path to the specified directory can be used. If the user specifies paths to both the command prompt and the control file, the value specified in the command prompt has a higher priority. |
skip | Skips (ignores) a specified number of lines from a data file, starting from the first line. (The default value: 0) Skip is the same as “5.9.16. IGNORE LINES ” in control file option. |
direct | Specifies either the conventional path load or the direct path load when the user loads data.
|
dpl_log | Parameter that specifies whether to leave logs in the log file of the server, when loading data using the direct path load method.
|
message | Parameter that displays the number of logical records the tbLoader utility is currently processing, on the screen. If not specified separately, the process state will not be displayed on the screen. If the specified number is too small, performance can be affected. |
readsize | The Loader reads the content of the data file through buffering. This parameter specifies the size of the read-only buffer. The units are in bytes. The maximum value is 2,097,152 (2MB). The default value is 65536 (64KB). |
bindsize | When the direct path load method is used for loading data, this parameter specifies the size of the direct path stream used by the client. TheTibero client is not uploaded to the server until the data is bound to a specified size. This parameter is effective when uploading large amounts of data. The units are in bytes. The maximum value is 15,728,640 (15MB). The default value is 65536 (64KB). |
errors | Specifies the maximum number of errors allowed when uploading data. The default value is 50. The ttbLoader utility uploads data until the number of errors reaches the user specified maximum number.
|
rows | This parameter specifies the number of records to commit, when the user uploads a large amount of data. Due to the tbLoader performance, the exact number of records specified will not be sent to the server. |
multithread | When direct path load is used for data loading, this parameter specifies whether to use multi-threads. ThetbLoader utility uses two threads. One thread reads data from the data file and loads the data to the stream buffer for direct path loading. The other thread loads the loaded stream buffer to the server. Setting this parameter can enhance performance if the machine has multiple CPUs or the client and the server are located on different machines.
|
dpl_parallel | Specifies whether to transfer data in Parallel Loading when loading data using the Direct Path Load method.
|
This section describes the advanced functions added by the tbLoader utility.
If the Direct Path Load method is used to transfer data, the target table is locked, which prevents loading to the same table in DPL at the same time. Parallel DPL is a method that addresses this problem.
If the Parallel DPL method is used, the Parallel DPL flag is set in the data to be loaded, the server receives data in parallel, and the data is saved after merging. To use parallel DPL, dpl and dpl_parallel must be set to 'Y' in tbloader.
The following example shows how to use Parallel DPL.
[Example 5.3] tbLoader Execution Example using Parallel DPL
$ tbloader userid=db_user/db_password@default
control=sample.ctl data=sample.data direct=Y dpl_parallel=Y
The tbLoader utility provides a function that encrypts database access information (connect_string) using an encryption file (wallet).
To use the access information encryption function, the encryption file (wallet) that stores the access information must be created. For more information about the encryption file, refer to “1.6.30. SAVE CREDENTIAL ”.
The following example shows how to access information using the encryption file.
[Example 5.4] tbLoader Execution Example using the Encryption File (wallet)
$ export LR_WALLET_PATH=./wallet_file.dat $ tbloader control=sample.ctl data=sample.data
Currently, the access information encryption function is supported only in UNIX with Open SSL installed.
The following section describes how to specify control file options. A user can specify the following options in the control file:
Character sets included in the data file
The data file that contains the data to be loaded
Log file that records the events during data loading
Error file that records data that failed to load
How to process existing data from a table (APPEND|REPLACE|TRUNCATE|MERGE)
Field delimiters and other options (TERMINATOR, ENCLOSED BY STRING, ESCAPED BY STRING)
Start and end string of a line
The number of lines to be ignored from the beginning of the data file
Options for a specific column of a table
The following describes the format of the control file. Ensure the order of the options match the sequence shown in the following example code. The contents inside square brackets ([ ]) can be omitted.
LOAD DATA [CHARACTERSET characterset_name] [INFILE data_file_name] [LOGFILE log_file_name] [BADFILE bad_file_name] [APPEND|REPLACE|TRUNCATE|MERGE(column_name, .....)] [PRESERVE BLANKS] INTO TABLE table_name [MULTI INSERT INDEXES|FAST BUILD INDEXES] [FIELDS [TERMINATED BY field_terminator] [OPTIONALLY ENCLOSED BY enclosed_by_start_string [AND enclosed_by_end_string]] [ESCAPED BY escaped_by_string]] [LINES [FIX number] [STARTED BY line_start_string] [TERMINATED BY line_terminator_string]] [TRAILING NULLCOLS] [IGNORE number LINES] (column_name [POSITION(from:to)] [INTEGER EXTERNAL(size)|FLOAT EXTERNAL(size)|DOUBLE EXTERNAL(size)| CHAR(SIZE)|RAW(SIZE)|DATE(size) date_fmt_string| TIMESTAMP(size) timestamp_fmt_string|TIME(size) time_fmt_string] [OUTFILE] [CONSTANT constant_value] [NULL TERMINATED] [PRESERVE BLANKS] [sql_expression], ..........) -- This line is comment.
The tbLoader utility can load data of various character sets into the Tibero server. The specified values affect the character sets in the control file and data file containing actual data. If the parameter value is not specified, TB_NLS_LANGUP_NLS_LANG, from the dsn file, is specified as the default character set of the data file.
The following describes how character sets can be set:
Syntax
CHARACTERSET characterset_name
Parameter Value | Description |
---|---|
characterset_name | Set the TB_NLS_LANGUP_NLS_LANG variable to a valid character set value. TB_NLS_LANGUP_NLS_LANG is the default character set value set in the environment configuration file or the dsn file. |
Example
If the character set of current client is set to KSC5601 (TB_NLS_LANGUP_NLS_LANG =EUCKR), to load the MSWIN949 data file to Tibero database, specify in the following way.
CHARACTERSET MSWIN949
The data_file_name parameter specifies the path and the name of the data file (text file) containing actual data. A user can use both the absolute path and the relative path to the specified directory.
If a user specifies paths to both the command prompt and the control file, the value specified in the command prompt has a higher priority.
The following describes how the data file name and path can be set:
Syntax
INFILE data_file_name
Parameter Value | Description |
---|---|
data_file_name | Specify the path and the name of the data file. |
Example
INFILE '/home/test/data.dat' INFILE '../data.dat'
The log_file_name parameter specifies the path and the name of the log file that records the tbLoader data loading execution process. A user can use both the absolute path and the relative path to the specified directory.
If a user specifies paths to both the command prompt and the control file, the value specified in the command prompt has a higher priority.
The following describes how the log file can be set:
Syntax
LOGFILE log_file_name
Parameter Value | Description |
---|---|
log_file_name | Specify the path and the name of log file. The default name of the log file is <control file name>.log. |
Example
LOGFILE '/home/test/control.log' LOGFILE '../control.log'
The bad_file_name parameter specifies the path and the name of file that records data loading failures. A user can use both the absolute path and the relative path to the specified directory.
If a user specifies paths to both the command prompt and the control file, the value specified in the command prompt has a higher priority.
If an error occurs when using DPL to load, a failed record is not written. A record is not written because identifying the row that failed is difficult in DPL, which focuses on performance.
The following describes how the bad (error) file can be set:
Syntax
BADFILE bad_file_name
Parameter Value | Description |
---|---|
bad_file_name | Specify the path and the name of bad file. The default value is <data file name>.bad. |
Example
BADFILE '/home/test/data.bad' BADFILE '../data.bad'
If existing data remains in the table specified by a user, the existing-data handling method can be specified.
In the case of the REPLACE and TRUNCATE options, after deleting a record from the table, the results are automatically committed. Therefore after thetbLoader utility has executed, the existing data cannot be recovered.
The following describes how existing data can be handled:
Syntax
APPEND|REPLACE|TRUNCATE|MERGE(column_name, .....)
Parameter Value | Description |
---|---|
APPEND | Adds new data while preserving existing data. APPEND is the default parameter value. |
REPLACE | Deletes the existing data with the DELETE command and adds new data. The user requires DELETE permission. If a trigger is associated to the DELETE command, the trigger will be executed. |
TRUNCATE | Deletes the existing data with the TRUNCATE statement and adds new data. Note: If the table has the referential integrity constraint, deactivate the condition and then execute the TRUNCATE statement. |
MERGE(column_name, .....) | Specify the list of columns to be merged. ThetbLoader utility uses the list of columns specified by the user as a key value. If the new data has the same key value as the existing data, the existing data is updated with the new data values. If the key values do not match, new data is entered. |
Example
control.ctl: LOAD DATA ... REPLACE ... (...)
control.ctl: LOAD DATA ... MERGE(EMPNO, EMPNM) ... (...)
PRESERVE BLANKS is used to enter whitespace into to a database, without truncating blanks in a data field.
Example
control.ctl: LOAD DATA ... PRESERVE BLANKS ... (...)
table_name is the target table name parameter.
The following describes how tables can be specified:
Syntax
INTO TABLE table_name
Parameter Value | Description |
---|---|
table_name | Specifies the name of the target table. A PUBLIC synonym can be specified by a table name. |
Example
control.ctl: LOAD DATA ... INTO TABLE EMP ... (...)
The following describes how to create indexes in a table when loading data to the target table, using the direct path load method. Either the multi insert method or the fast build method can be selected.
MULTI INSERT INDEXES optimizes multiple records and creates indexes simultaneously, while FAST BUILD INDEXES overwrites existing indexes and loads the data from the data file.
When using thetbLoader utility, if the target table has a large amount of existing data, there are advantages to using the MULTI INSERT method. Otherwise, the FAST BUILD method is recommended.
When using DPL through the tbLoader utility, the index state may change to unusable due to a data redundancy problem.
The following describes how indexes can be created:
Syntax
[MULTI INSERT INDEXES|FAST BUILD INDEXES]
Parameter Value | Description |
---|---|
MULTI INSERT INDEXES | Create indexes with the MULTI INSERT method. |
FAST BUILD INDEXES | Create indexes with the FAST BUILD method. |
Example
control.ctl: LOAD DATA ... MULTI INSERT INDEXES ... (...)
When reading records from a data file the data read before the field delimiter, as defined by the FIELD TERMINATED BY parameter, is considered as a single field.
The following describes how the field_terminator parameter can be defined:
Syntax
FIELDS TERMINATED BY field_terminator
Parameter Value | Description |
---|---|
field_terminator | Specify an ASCII character string. |
Example
control.ctl: LOAD DATA ... FIELDS TERMINATED BY ',' ... (...)
The FIELDS OPTIONALLY ENCLOSED BY parameter is a character string that wraps the start and the end of a field value, when reading records from a data file.
If the field value contains meta-character strings: blank space (' ', '\t', '\r', '\n'), field terminator, and line terminator. ThetbLoader utility recognizes the string as data. In the case of the ESCAPED BY string, even if the string is wrapped by the ENCLOSED BY string, the string is recognized as a meta-character string.
If the field value contains the same string as the ENCLOSED BY value, the ESCAPED BY value should be used as a prefix. If the ESCAPED BY value is not used as a prefix, the field is interpreted as a ENCLOSED BY value and the field value is not recognized.
The following describes how FIELDS OPTIONALLY ENCLOSED BY can be specified:
Syntax
FIELDS OPTIONALLY ENCLOSED BY enclosed_by_start_string
Parameter Value | Description |
---|---|
enclosed_by_start_string | Specify as ASCII character string. |
Example
If the start string and the end string to wrap the field are the same, specify as follows.
control.ctl: LOAD DATA ... FIELDS OPTIONALLY ENCLOSED BY '"' ... (...)
If the start string and the end string to wrap the field are different, specify as follows.
control.ctl: LOAD DATA ... FIELDS OPTIONALLY ENCLOSED BY '{$' AND '$}' ... (...)
If the parameter value is defined as follows, specify as shown below.
control.ctl: LOAD DATA ... FIELDS OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\' ... (...)
For the field value to be recognized correctly, when the field value contains a string that is the same as the by ENCLOSED BY value, the string specified by ESCAPED BY should be used as a prefix.
”quotation mark[\”]”,0001→(quotation mark[" ],0001) ”quotation mark[”]”,0001→Error occurs
Special characters following the ESCAPE BY string can still be read. Two '\\' must be used to specify a '\'.
The following describes how to specify the ESCAPED BY parameter.
Syntax
ESCAPED BY escaped_by_string
Parameter Value | Description |
---|---|
escaped_by_string | Specify as ASCII character string. |
Example
ESCAPED BY '\\' ESCAPED BY '$$!'
The LINES FIX parameter specifies the length of a line in a data file. The FIELDS, LINE TERMINATED BY and LINE STARTED BY parameters cannot be used with this syntax.
The following describes how the LINES FIX string can be specified:
Syntax
LINES FIX number
Parameter Value | Description |
---|---|
number | Specify an integer value. The units are in bytes. |
Example
control.ctl: LOAD DATA ... LINES FIX 5 ... (...)
Specify the LINES FIX parameter as shown in the previous example. Assume that the data file that contains the following content is read.
data.dat: abcdefghijklmnopqrst
The tbLoader utility reads the data file and interprets it as follows.
LINE #1: abcde LINE #2: fghij LINE #3: klmno LINE #4: pqrst
The LINES STARTED BY parameter is a string that prefixes data in a data file. The text (in the data file) following the LINES STARTED BY defined string is recognized as a unit of data. If a line in the data file does not contain the defined prefix, the whole line is exempt from being loaded. A set of the same consecutive characters is used as a prefix to enhance performance.
The line starting string can be specified as follows.
Syntax
LINES STARTED BY line_start_string
Parameter Value | Description |
---|---|
line_start_string | Specify an ASCII character string (maximum 30 bytes). |
Example
control.ctl: LOAD DATA ... LINES STARTED BY '$$$' ... (...)
Specify the parameter following the previous example, and assume that the data file which contains the following content is read.
data.dat: $$$0001,”SMITH” … something … $$$0002,”DAVID”
ThetbLoader utility reads the data file above and interprets it as follows.
(0001, “SMITH”), (0002, “DAVID”)
When reading data from a data file, the LINE TERMINATOR signals that the end of the record has been reached.
The following describes how the line ending string can be specified:
Syntax
LINES TERMINATED BY line_terminator_string
Parameter Value | Description |
---|---|
line_terminator_string | Specify an ASCII character string. The default value is '\n'. |
Example
The following is an example of specifying the LINE TERMINATOR string '|\n'.
control.ctl: LOAD DATA ... LINES TERMINATED BY '|\n' ... (...)
TRAILING NULLCOLS handles absent last column data problems. If data is missing from the last column of a record, the column value is set to NULL instead of being regarded as an error.
The following describes how to specify the TRAILING NULLCOLS clause:
Syntax
TRAILING NULLCOLS
Example
The following is an example of the ‘job’ column value of the record being set to NULL.
control.ctl: LOAD DATA ... TRAILING NULLCOLS ... ( id, name, job )
data.dat: 10 SMITH
IGNORE LINES excludes a specified number of lines from the beginning of the data file in the loading target.
The following describes how to specify the Loading target:
Syntax
IGNORE number LINES
Parameter Value | Description |
---|---|
number | Specify an integer value. The default value is 0. |
Example
If the first line of a data file expresses the column name, specify as follows to exclude the first line.
control.ctl: LOAD DATA ... IGNORE 1 LINES ... (...)
column_name specifies the column list of a table the user will enter data into. The order should be the same as the column order of the data file. The following section explains the column properties; “5.9.17.1. POSITION of a Column ”, “5.9.17.2. Data Type ”, “5.9.17.3. Specifying Data Buffer Size ”, “5.9.17.4. OUTFILE of a Column ”, “5.9.17.5. CONSTANT of a Column ”, “5.9.17.6. Preserving Blanks for a Specific Column ”, and “5.9.17.7. SQL Expressions ”.
The following describes how the Target column and properties can be specified:
Syntax
(column_name [POSITION(from:to)] [INTEGER EXTERNAL(size)|FLOAT EXTERNAL(size)| DOUBLE EXTERNAL(size)| CHAR(SIZE)|RAW(SIZE)|DATE(size) date_fmt_string| TIMESTAMP(size) timestamp_fmt_string| TIME(size) time_fmt_string] [OUTFILE] [CONSTANT constant_value] [NULL TERMINATED] [PRESERVE BLANKS] [sql_expression], ..........)
Parameter Value | Description |
---|---|
column_name | Specify the column name of target table. |
Specifies the start and end position of the column within a line of the data file.
The following describes how POSITION can be specified:
Syntax
POSITION(from:to)
Parameter Value | Description |
---|---|
from | Specifies the start position of the column within a line of the data file. The line starts at 1. |
to | Specifies the last position of the column within a line of the data file. |
Example
In case of fixed record format, specify the exact position in the column list, as follows.
control.ctl: LOAD DATA ... ( empno position(01:04), ename position(06:15), job position(17:25), mgr position(27:30), sal position(32:39), comm position(41:48), deptno position(50:51) )
In case of separated records format, the POSITION syntax is not required.
control.ctl: LOAD DATA ... ( empno, ename, job, mgr, sal, comm, deptno )
ThetbLoader utility provides specific data types. Ensure that the control file formats and default values have different data types and that the column data is bound by different methods.
The following lists the data types provided by the tbLoader utility:
Numeric external type
Character string data type
Binary data type
DATETIME data type
Numeric external data type is used to load character string data into numeric columns in the Tibero server. If NULL is specified to a column, the tbLoader utility binds ‘0’ as the default value and loads it to the server. In order to load NULL to the server, declare the column as a character string data type.
The following describes how the column format of the numeric external type can be specified:
Syntax
INTEGER EXTERNAL(size)|FLOAT EXTERNAL(size)|DOUBLE EXTERNAL(size)
Parameter Value | Description |
---|---|
INTEGER EXTERNAL(size) | Used when the character string data is an integer type. |
FLOAT EXTERNAL(size) | Used when the character string data is a float type. |
DOUBLE EXTERNAL(size) | Used when the character string data is a double precision float type. |
The size is specified in “5.9.17.3. Specifying Data Buffer Size ”.
Character string data type is used to load character string data to character type (CHAR, VARCHAR, CLOB, and NCLOB) columns in the Tibero server. If NULL is specified, the tbLoader utility binds NULL as the default value and loads it to the server. Declare numeric external type to load ‘0’ to the server.
The following describes how the column format of character string data type can be specified:
Syntax
CHAR(size)
Parameter Value | Description |
---|---|
CHAR(size) | Used for character string data. Size: Specify data buffer size. |
Binary data type is used to load binary data to mass data and object-type columns (RAW, BLOB, and LONGRAW) to theTibero server. If NULL is specified to the column, the tbLoader utility binds NULL as the default value and loads it to the server.
The following describes how the column format of the binary data type can be specified:
Syntax
RAW(size)
Parameter Value | Description |
---|---|
RAW(size) | Used for binary data. Size: Specify “5.9.17.3. Specifying Data Buffer Size ”. |
The DATETIME data type is used to load character string data into date type (DATE, TIME, and TIMESTAMP) columns in the Tibero server. If NULL is specified in a column, the tbLoader utility binds NULL as the default value and loads it to the server.
Tibero client uses the TB_NLS_DATE_FORMATUP_NLS_DATE_FORMAT and TB_NLS_TIMESTAMP_FORMATUP_NLS_TIMESTAMP_FORMAT parameters to specify the DATE and TIMESTAMP type column format. A user can directly specify column formats such as DATA, TIMESTAMP, and TIME type in the control file of the tbLoader utility. The specified column format types should be wrapped by double quotes (" ") in the control file.
The following describes how the DATETIME column format can be specified:
Syntax
DATE date_fmt_string|TIMESTAMP timestamp_fmt_string|TIME time_fmt_string
Parameter Value | Description |
---|---|
date_fmt_string | Specifies the column format to be DATE type. |
timestamp_fmt_string | Specifies the column format to be TIMESTAMP type. |
time_fmt_string | Specifies the column format to be TIME type. |
Example
The following example shows the control file of the tbLoader utility. This example specifies the date to the 'YYYYMMDD' format in the column 'hiredate' within the tbloader.dat file.
control.ctl: LOAD DATA ..... INFILE 'data.dat' ..... ( empno, ename, hiredate DATE "YYYYMMDD" )
data.dat: 1111, JOHN, 19981112 2222, TOM, 20070802
The tbLoader utility reads data from the data file and saves the data to the data buffer. The tbLoader utility then converts the data to the Tibero server data type and loads it to the server.
The tbLoader utility uses the schema information from within the database to read separated record type data and allocates an appropriately sized buffer. In order to read fixed record type data, the tbLoader utility uses the start and end information of POSITION and allocates the buffer.
If a user knows the length of the data, the user can specify the size of the data buffer. The user can also specify the data buffer size (units are in bytes) to read mass data such as BLOB, CLOB, LONG, and LONG RAW.
Example
If the control file type is as follows, 5 bytes and 10 bytes of data buffer are allocated to empno and hiredate columns respectively.
control.ctl: LOAD DATA .... ( empno INTEGER EXTERNAL(5), hiredate DATE(10) "YYYY/MM/DD" )
A user can specify the OUTFILE property in order to read mass data (BLOB, CLOB, LONG, and LONG RAW type) or to read binary data such as RAW from another file, rather than the data file. However, the user must specify the path of the file, in the data file.
If a user specifies the OUTFILE property, the tbLoader utility internally buffers the data in the file and loads it to the server in several sessions. In contrast, If a user does not specify the OUTFILE property, the tbLoader utility reads mass data from the data file directly and allocates a 32KB data buffer, by default. If a user wants to read larger data from the data file, the user can specify the data buffer size directly.
Example
If control file type is as follows,
control.ctl: LOAD DATA ..... INFILE 'data.dat' ..... ( empno, ename, resume OUTFILE )
The tbLoader utility reads data from the file if the file path is specified in the resume column in data.dat file.
data.dat 1111, JOHN, ./clobdata.txt 2222, TOM, /home/test/clobdata.txt
The value of a specific column can be specified as CONSTANT regardless of the data file value. The value must be wrapped in double quotes (" ") because the constant value may include a blank character.
Example
If the control file format is as follows, the column (empno) value is specified as 1234 regardless of the data value in data.dat.
control.ctl: LOAD DATA ..... INFILE 'data.dat' ..... ( empno constant "1234", ename )
The PRESERVE BLANKS clause is applied to all fields being loaded. To preserve whitespace in a specific column, users can use this column property.
Example
If control file format is as follows, it preserves the whitespace in the first field of the column 'empno'.
control.ctl: LOAD DATA ... ( empno PRESERVE BLANKS, ename )
SQL expressions can be used to represent the value of a specific column. The SQL value must be wrapped in double quotes (" ") and may include a semicolon and the column name.
Example
The following is a control file. The TO_CHAR function saves the SYSDATE value in the column 'empno' as a character string in the format 'YYYYMMDD'. The field value of the column 'empno' is bound in the column 'ename'.
control.ctl: LOAD DATA ... ( empno "TO_CHAR(SYSDATE, 'YYYYMMDD')", ename ":empno" )
This section shows examples of three scenarios: separated record format, fixed record format, and mass data format (e.g. BLOB and CLOB).
Create a table. (Common for all examples.)
Write a control file.
Write a data file.
Execute the tbLoader utility.
Check the log file and the error file.
The following is an example of creating a common table. In this example, the database name is the default value and the table owner is loader/loader_pw .
CREATE TABLE MEMBER ( ID NUMBER(4) NOT NULL, NAME VARCHAR2(10), JOB VARCHAR2(9), BIRTHDATE DATE, CITY VARCHAR2(10), PICTURE BLOB, AGE NUMBER(3), RESUME CLOB ); CREATE TABLE CLUB ( ID NUMBER(6) NOT NULL, NAME VARCHAR2(10), MASTERID NUMBER(4) );
Use the Convention Path Load method to load separated record format data to the Tibero server.
The following is the content of the control file control.ctl.
LOAD DATA INFILE './data.dat' APPEND INTO TABLE club FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '|\n' IGNORE 1 LINES (id, name, masterid)
The control file stores the following information:
Data file
The data.dat file located in the current directory.
Log file
If the user does not specify the log file name in the command prompt, a file named after the control file is created by default. In this example, a file named control.log is created.
Error file
If the user does not specify the error log file name in the command prompt, a file with the same file name as the data file is created by default. In this example, a file named data.bad is created.
Note: If an error record does not exist, an error file is not created.
Target table
A user wants to load id, name, and masterid column data, into a club table.
FIELDS TERMINATED BY character string, FIELDS OPTIONALLY ENCLOSED BY character string, FIELDS ESCAPED BY character string, LINES TERMINATED BY character string
Use comma (,) for FIELD TERMINATOR; quotation mark (") for FIELDS OPTIONALLY ENCLOSED BY; and character (|\n) for LINES TERMINATED BY. The FIELDS ESCAPED BY character string is not specified, therefore the default value, (\\) is used.
Loading target
IGNORE LINES ignores the record of the first line of a data file.
The content of the data file data.dat is as follows.
id name masterid 111111,FC-SNIFER,2345| dkkkkkkkkkk| 111112,"DOCTOR CLUBE ZZANG",2222| 111113,"ARTLOVE",3333| 111114,FINANCE,1235| 111115,"DANCE MANIA",2456| 111116,"MUHANZILZU",2378| 111117,"INT'L",5555
Execute the tbLoader utility as follows.
tbloader userid=loader/loader_pw@default control=./control.ctl
After executing the tbLoader utility, check the log and error files.
The content of the log file which records the execution process of the tbLoader utility, is as follows.
tbLoader 5 Copyright (c) 2008, 2009, 2011, 2012 TmaxData Corporation. All rights reserved. Data File : ./data.dat Bad File : ./data.bad Table 'CLUB' was loaded from the data file. Column Name Position DataType ------------------------------ ---------- ------------ ID 1 NUMERIC EXTERNAL NAME 2 CHARACTER MASTERID 3 NUMERIC EXTERNAL Record 2 was rejected. TBR-80053 : Exist any relatively positioned columns that are not present in the record Record 3 was rejected. TBR-80025 : Exceed data buffer size Record 6 was rejected. TBR-80025 : Exceed data buffer size Table 'CLUB' ---------------- 5 Rows were loaded successfully. 3 Rows were failed to load because of some errors Elapsed time was: 00:00:00.407089
The following shows the error file which records tbLoader utility failures while loading data. A user can modify the data that failed to load and can upload it to the Tibero again.
dkkkkkkkkkk| 111112,"DOCTOR CLUBE ZZANG",2222| 111115,"DANCE MANIA",2456|
Use Direct Path Load to load fixed record formatted data to the Tibero server.
The content of the control file control.ctl is as follows.
LOAD DATA INFILE '/home/test/data.dat' APPEND INTO TABLE MEMBER ( id position (01:04), name position (06:15), job position(17:25), birthdate position(27:36), city position(38:47), age position(49:51) )
The following lists the information stored in the control file:
Data file
The data.dat file located in the directory /home/test.
Log file
If the user does not specify the log file name in the command prompt, a file with the same name as the control file is created by default. In this example, a file named control.log is created.
Error file
If the user does not specify the error log file name in the command prompt, a file named after the data file is created by default. In this example, a file named data.bad is created.
Note: If an error record does not exist, an error file is not created.
Target table
A user wants to load; name, job, birth date, city, and age, column data into a member table.
FIELDS TERMINATED BY a character string, FIELDS OPTIONALLY ENCLOSED BY a character string, FIELDS ESCAPED BY a character string, LINES TERMINATED BY a character string
The previously mentioned delimiter parameters are not used because the data is in fixed record format. LINES FIX is also not used so the EOL character is used to terminate a line.
Loading target
Loads data from the first line of a data file.
The following is the content of the data file data.dat. Because the tbLoader utility reads in fixed record format, the user must enter column data in the correct positions.
7777 KKS CHAIRMAN 1975-11-18 SEOUL 33 7839 KING PRESIDEN DAEGU 45 7934 MILLER CLERK 1967-01-24 BUSAN 37 7566 JONES MANAGER 7499 ALLEN SALESMAN ddddddddd KYUNG-JU aaaa7654 MARTIN SALESMAN 7648 CHAN ANALYST 1979-10-11 INCHON 28
Execute the tbLoader utility as follows.
tbloader userid=loader/loader_pw@default control=./control.ctl direct=Y
After executing the tbLoader utility, check the log and error files.
The following is the content of the log file that recorded the execution process of the tbLoader utility.
tbLoader 5 Copyright (c) 2008, 2009, 2011, 2012 TmaxData Corporation. All rights reserved. Data File : ./data.dat Bad File : ./data.bad Table 'MEMBER' was loaded from the data file. Column Name Position DataType ------------------------------ ---------- ------------ ID 1 NUMERIC EXTERNAL NAME 2 CHARACTER JOB 3 CHARACTER BIRTHDATE 4 DATE CITY 5 CHARACTER AGE 6 NUMERIC EXTERNAL Record 4 was rejected. TBR-80053 : Exist any relatively positioned columns that are not present in the record Record 5 was rejected. TBR-80053 : Exist any relatively positioned columns that are not present in the record Record 6 was rejected. TBR-80053 : Exist any relatively positioned columns that are not present in the record Table 'MEMBER' ---------------- 4 Rows were loaded successfully. 3 Rows were failed to load because of some errors Elapsed time was: 00:00:00.338089
In the log file contents (previous figure) the log states that 3 rows failed to load due to an empty last column in each row. If using the “5.9.15. TRAILING NULLCOLS ” clause, the values of the last columns of the records can be bound to NULL.
The error file which records tbLoader utility failures while loading data is as follows. The data that failed to load must be modified and uploaded to the Tibero.
7566 JONES MANAGER 7499 ALLEN SALESMAN ddddddddd KYUNG-JU aaaa7654 MARTIN SALESMAN
Use the Conventional Path Load to load fixed length record formatted data to the Tibero server.
The content of the control file control.ctl is as follows.
LOAD DATA INFILE './data.dat' APPEND INTO TABLE MEMBER LINES FIX 51 TRAILING NULLCOLS ( id position (01:04), name position (06:15), job position(17:25), birthdate position(27:36), city position(38:47), age position(49:50) )
The following lists the information stored in the control file:
Data file
The data.dat file located in the current directory.
Log file
If the user does not specify the log file name in the command prompt, a file with the same name as the control file is created by default. In this example, a file named control.log is created.
Error file
If the user does not specify the error log file name in the command prompt, a file with the same name as the data file is created by default. In this example, a file named data.bad is created.
Note: If an error record does not exist, an error file is not created.
Target table
A user wants to load; name, job, birth date, city, and age, column data into a member table.
FIELDS TERMINATED BY character string, FIELDS OPTIONALLY ENCLOSED BY character string, FIELDS ESCAPED BY character string, LINES TERMINATED BY character string
The previously mentioned delimiter parameters are not used because the data is in fixed record format. LINES FIX is not used so the EOL character is used to terminate a line.
Loading target
Loads data from the first line of the data file.
When the record value in the data file does not exist, it is bound to NULL.
The following is the content of the data file data.dat. Because the tbLoader utility reads data in fixed record format, the user must enter column data in the correct positions.
7777 KKS CHAIRMAN 1975-11-18 SEOUL 33 7839 KING PRESIDEN DAEGU 45 7934 MILLER CLERK 1967-01-24 BUSAN
Execute the tbLoader utility as follows.
tbloader userid=loader/loader_pw@default control=./control.ctl
After executing the tbLoader utility, check the log and error files.
The following is the content of the log file that recorded the execution process of thetbLoader utility.
tbLoader 5 Copyright (c) 2008, 2009, 2011, 2012 TmaxData Corporation. All rights reserved. Data File : ./data.dat Bad File : ./data.bad Table 'MEMBER' was loaded from the data file. Column Name Position DataType ------------------------------ ---------- ------------ ID 1 NUMERIC EXTERNAL NAME 2 CHARACTER JOB 3 CHARACTER BIRTHDATE 4 DATE CITY 5 CHARACTER AGE 6 NUMERIC EXTERNAL Record 4 was rejected. TBR-80053 : Exist any relatively positioned columns that are not present in the record Record 5 was rejected. TBR-80053 : Exist any relatively positioned columns that are not present in the record Record 6 was rejected. TBR-80053 : Exist any relatively positioned columns that are not present in the record Table 'MEMBER' ---------------- 3 Rows were loaded successfully. 0 Rows were failed to load because of some errors Elapsed time was: 00:00:00.022404
The tbLoader utility does not create an error file because error data does not exist.
Use the Conventional Path Load to load separated record formatted data to the Tibero server.
The following is an example of mass binary data being loaded into BLOB column types. The CLOB data type, which can store mass binary data, is similar to this scenario. This example shows how to write an input file.
The content of the control file control.ctl is as follows.
LOAD DATA INFILE './data.dat' LOGFILE './logfile.log' BADFILE './badfile.bad' APPEND INTO TABLE member FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\n' TRAILING NULLCOLS IGNORE 2 LINES ( id, name, job, birthdate, city, age, picture outfile )
The following lists the information stored in the control file:
Data file
The data.dat file located in the current directory.
Log file
If the user does not specify the log file name in the command prompt, the log file logfile.log is created in the current directory.
Error file
If the user does not specify the error file name in the command prompt, the error file badfile.bad is created in the current directory.
Note: If an error record does not exist, an error file is not created.
Target table
A user wants to load; name, job, birth date, city, and age, column data into a member table.
FIELDS TERMINATED BY a character string, FIELDS OPTIONALLY ENCLOSED BY a character string, FIELDS ESCAPED BY a character string, LINES TERMINATED BY a character string
Use comma (,) for FIELD TERMINATOR; quotation mark (") for FIELDS OPTIONALLY ENCLOSED BY; (\\') for FIELDS ESCAPED BY; and character (|\n) for LINES TERMINATED BY.
Loading target
The IGNORE LINES command starts data loading from the third line of the data file.
When the record value of the data file does not exist, it is bound to NULL.
The content of the data file data.dat is as follows. Enter the path of the binary file in order to upload the binary data to BLOB column.
Ignores the first line 7782,"Clark","Manager",1981-01-11 , DAEGU,26,./blob.jpg 7839,"King",President,1960/11/17,SEOUL,47 7934,"Miller","Clerk",1977/10/12,BUSAN,30,./blob2.jpg 7566,"Jones",Manager\, ,1981/04/02,31 7499, "Allen", "Salesman" a,1981:02/20,26 7654, "Martin", "Sale smn", 1981/10/28,26 7658, "Chan",Ana lyst, 1982/05/03,25,25
Execute the tbLoader utility as follows.
tbloader userid=loader/loader_pw@default control=./control.ctl
After executing the tbLoader utility, check the log and error files.
The following is the content of the log file that recorded the execution process of the tbLoader utility.
tbLoader 5 Copyright (c) 2008, 2009, 2011, 2012 TmaxData Corporation. All rights reserved. Data File : ./data.dat Bad File : ./badfile.bad Table 'MEMBER' was loaded from the data file. Column Name Position DataType ------------------------------ ---------- ------------ ID 1 NUMERIC EXTERNAL NAME 2 CHARACTER JOB 3 CHARACTER BIRTHDATE 4 DATE CITY 5 CHARACTER AGE 6 NUMERIC EXTERNAL PICTURE 7 RAW Record 4 was rejected. TBR-80025 : Exceed data buffer size Table 'MEMBER' ---------------- 5 Rows were loaded successfully. 1 Rows were failed to load because of some errors Elapsed time was: 00:00:00.055475
The error file, which records tbLoader utility failures while loading data, is shown in the following way. The data that failed to load must be modified and uploaded again to the Tibero.
7499, "Allen", "Salesman" a,1981:02/20,26