Tacnode supports efficient data import and export for a table. To import large datasets, use either INSERT or COPY FROM. INSERT suits smaller datasets or incremental additions to an existing table. COPY FROM, in contrast, is ideal for large initial imports: it reads data directly from a file and writes it into the storage engine's underlying file format, avoiding the SQL transaction-processing overhead of per-row INSERT and significantly speeding up bulk loads. Conversely, the COPY TO command batch-exports a table or query result to a file for subsequent import or processing.
For example, consider the following table for students:
CREATE TABLE students(
    id int PRIMARY KEY,
    name varchar(255) NOT NULL,
    phone varchar(16)
);
We have a prepared file (stu.txt) containing data conforming to the above table schema. Its content is:
123	Jacob	\N
456	"Alex Xia"	13888811888
789	Henry	\N
In this file, fields in each row are separated by a single tab character \t, and an empty field is indicated by \N.
Once connected to Tacnode's default PostgreSQL database using psql, run these commands to import data and view the results:
postgres=> \COPY students FROM stu.txt;
COPY 3
postgres=> SELECT * FROM students;
 id  |    name    |    phone
-----+------------+-------------
 123 | Jacob      |
 456 | "Alex Xia" | 13888811888
 789 | Henry      |
(3 rows)
The batch import command COPY FROM supports the following options:
DELIMITER 'delimiter_character': Specifies the column separator for each data row in the file, defaulting to the tab character \t. Because Tacnode currently supports only text-format file input, the delimiter must be escaped wherever it appears in normal field content. Future versions will support CSV and binary format file input.
NULL 'null_string': Indicates a string representing a NULL value, defaulting to the two-character text \N.
ON CONFLICT DO NOTHING: If a primary key from the copied file already exists in the existing data, that row will be ignored.
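Putting these options together, the command below sketches how a pipe-delimited variant of the earlier file might be imported; the file name stu_pipe.txt, the '|' delimiter, and the 'NULL' null string are assumptions for illustration, not part of the original example.

```sql
-- Hypothetical import: '|' as the column separator, the literal string
-- 'NULL' marking missing values, and rows with an existing primary key skipped.
\COPY students FROM stu_pipe.txt DELIMITER '|' NULL 'NULL' ON CONFLICT DO NOTHING;
```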
The INSERT statement adds new records to a table. You can insert individual rows or multiple rows simultaneously. The syntax is as follows:
[ WITH [ RECURSIVE ] with_query [, ...] ]
INSERT INTO table_name [ AS alias ] [ ( column_name [, ...] ) ]
    [ OVERRIDING { SYSTEM | USER } VALUE ]
    { DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
    [ ON CONFLICT [ conflict_target ] conflict_action ]
    [ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ]

where conflict_target can be one of:

    ( { index_column_name | ( index_expression ) } [ COLLATE collation ] [ opclass ] [, ...] ) [ WHERE index_predicate ]
    ON CONSTRAINT constraint_name

and conflict_action is one of:

    DO NOTHING
    DO UPDATE SET { column_name = { expression | DEFAULT } |
                    ( column_name [, ...] ) = [ ROW ] ( { expression | DEFAULT } [, ...] ) |
                    ( column_name [, ...] ) = ( sub-SELECT )
                  } [, ...]
        [ WHERE condition ]
For additional syntax for INSERT, refer to INSERT.
The test table's definition is shown below. Unless otherwise stated, this table is used in the examples that follow.
CREATE TABLE t1(
    id int primary key,
    name text,
    age smallint,
    gender char(1)
);
List the column names explicitly so that they correspond one-to-one with the supplied values. You may name only a subset of the columns and provide values for just those; the remaining columns are filled with their default values.
Example:
tacnode=> INSERT INTO t1 (id, name, age, gender) VALUES (1,'testname','22','f');
INSERT 0 1
tacnode=> INSERT INTO t1 (id, name) VALUES (2,'testname');
INSERT 0 1
If you don't specify column names, you must provide values for all fields in VALUES.
Example:
tacnode=> INSERT INTO t1 VALUES (3,'testname3','33','f');
INSERT 0 1
Batch insertion allows you to add several records simultaneously.
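For example, a single INSERT statement can list several value tuples, one per row (continuing with the t1 table above; the row values are illustrative):

```sql
tacnode=> INSERT INTO t1 (id, name, age, gender) VALUES
            (4, 'testname4', 24, 'm'),
            (5, 'testname5', 25, 'f');
INSERT 0 2
```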
Utilize query results as input for the INSERT INTO statement.
Example:
tacnode=> INSERT INTO t1 (id, name, age, gender) SELECT id, name, age, gender FROM t2 WHERE id>10;
INSERT 0 1
When a primary key conflict occurs during data insertion, you can use the INSERT INTO ... ON CONFLICT syntax to determine the desired action. There are two main conflict_action options:
DO NOTHING: When a primary key conflict occurs, no action is taken and the conflicting row to be inserted is discarded.
DO UPDATE: The specified columns are updated after a primary key conflict (note: while technically permissible, updating primary key columns is not recommended).
It is advisable to base ON CONFLICT on the primary key or a unique index; otherwise, a full table scan may be triggered, which degrades performance.
Example:
tacnode=> \d t1
                                 Table "public.t1"
 Column |     Type     | Collation | Nullable |               Default
--------+--------------+-----------+----------+--------------------------------------
 id     | integer      |           | not null | nextval('t1_id_seq'::text::regclass)
 name   | text         |           |          |
 age    | smallint     |           |          |
 gender | character(1) |           |          |
Indexes:
    "t1_pkey" PRIMARY KEY, btree (id ASC)

tacnode=> SELECT * FROM t1 WHERE id=1;
 id |   name   | age | gender
----+----------+-----+--------
  1 | testname |  22 | f
(1 row)

-- Use INSERT INTO ON CONFLICT DO NOTHING to insert data with a primary key conflict
tacnode=> INSERT INTO t1 (id, name, age, gender) VALUES (1,'testname1', 11, 'm') ON CONFLICT(id) DO NOTHING;
INSERT 0 0

-- Data not updated
tacnode=> SELECT * FROM t1 WHERE id=1;
 id |   name   | age | gender
----+----------+-----+--------
  1 | testname |  22 | f
(1 row)

-- Use INSERT INTO ON CONFLICT DO UPDATE to insert data with a primary key conflict
tacnode=> INSERT INTO t1 (id, name, age, gender) VALUES (1,'testname1', 11, 'm') ON CONFLICT(id) DO UPDATE SET name=EXCLUDED.name,age=EXCLUDED.age,gender=EXCLUDED.gender;
INSERT 0 1

-- Data is updated in the specified columns
tacnode=> SELECT * FROM t1 WHERE id=1;
 id |   name    | age | gender
----+-----------+-----+--------
  1 | testname1 |  11 | m
(1 row)
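ON CONFLICT can also target a unique index rather than the primary key. A minimal sketch, assuming a unique index on name (the index and the inserted values are illustrative, not part of the original example):

```sql
-- Create a unique index on name, then resolve conflicts against it
tacnode=> CREATE UNIQUE INDEX t1_name_key ON t1 (name);
tacnode=> INSERT INTO t1 (id, name, age, gender) VALUES (9, 'testname1', 30, 'f')
          ON CONFLICT (name) DO UPDATE SET age = EXCLUDED.age;
```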
For large volumes of data, batch submission statements allow for the simultaneous submission of multiple SQL statements to Tacnode, enhancing import performance through batching. Here’s an example:
INSERT INTO test (col1, col2) VALUES ('A', 10), ('B', 20), ('C', 30)
Connection conn = dataSource.getConnection();
try {
    conn.setAutoCommit(false); // turn off autocommit and open a transaction
    PreparedStatement pstat = conn.prepareStatement("INSERT INTO test (col1, col2) VALUES (?, ?)");
    // add records in batch mode
    pstat.setString(1, "A");
    pstat.setInt(2, 10);
    pstat.addBatch();
    pstat.setString(1, "B");
    pstat.setInt(2, 20);
    pstat.addBatch();
    pstat.setString(1, "C");
    pstat.setInt(2, 30);
    pstat.addBatch();
    // execute the batch
    pstat.executeBatch();
    conn.commit(); // commit the transaction
} catch (Exception e) {
    conn.rollback(); // roll back the transaction
    e.printStackTrace();
} finally {
    conn.setAutoCommit(true); // restore autocommit
    conn.close(); // close the connection
}
For incremental updates to a large table, such as upserting 3 million rows into a base table of 3 billion rows, you cannot use COPY with ON CONFLICT DO UPDATE. Instead, COPY FROM into a temporary table, then run INSERT INTO ... SELECT FROM the temporary table with ON CONFLICT, which gives the best import performance. The steps are:
-- Verify the table format
test=> \d test
              Table "public.test"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 col1   | integer |           | not null |
 col2   | text    |           |          |
 col3   | text    |           |          |
Indexes:
    "test_pkey" PRIMARY KEY, btree (col1 ASC)

-- Create the temporary table
test=> CREATE TABLE test_tmp (like test including all);

-- Execute the COPY import into the temporary table
test=> \copy "test_tmp" FROM csv_file;

-- Batch import via INSERT INTO ... SELECT FROM
test=> INSERT INTO test(col1, col2, col3)
       SELECT col1, col2, col3 FROM test_tmp
       ON CONFLICT (col1) DO UPDATE SET col1=excluded.col1, col2=excluded.col2, col3=excluded.col3;