Get started with Secoda
See why hundreds of industry leaders trust Secoda to unlock their data's full potential.
See why hundreds of industry leaders trust Secoda to unlock their data's full potential.
The COPY INTO
command in Snowflake is a versatile tool used for loading data from various stages (internal or external) into Snowflake tables, or for unloading data from Snowflake tables into stages. This tutorial will guide you through the key functionalities, best practices, and examples of using the COPY INTO
command.
The COPY INTO
command in Snowflake is designed to facilitate the bulk loading and unloading of data. It supports various file formats, offers options for data transformation, and includes mechanisms for error handling and performance optimization.
COPY INTO <table_name>
FROM { internalStage | externalStage | externalLocation }
[ FILES = ( '<file_name>' [ , '<file_name>' ... ] ) ]
[ FILE_FORMAT = ( { FORMAT_NAME = '<file_format_name>' | TYPE = { CSV | JSON | PARQUET } [ formatTypeOptions ] } ) ]
[ copyOptions ]
[ VALIDATION_MODE = RETURN_ROWS ]
[ HEADER ]
This basic syntax highlights how to load data into a Snowflake table from various stages. The command supports different file formats and includes options for validation and error handling.
Loading data using the COPY INTO
command involves specifying the source stage, file format, and any necessary options for data transformation and error handling. Here are some key aspects:
PATTERN
parameter to match file names and paths using regular expressions.ON_ERROR
option specifies actions like CONTINUE
, SKIP_FILE
, or ABORT_STATEMENT
if errors are encountered.Unloading data involves specifying the target location and file format. The command supports partitioning, file size control, and performance optimization options.
MAX_FILE_SIZE
option helps manage file sizes for optimal performance.Upload data files to a Snowflake stage using the PUT
command. You can use an internal stage, or a table/user stage. If you're loading data from an S3 bucket, you can use the AWS upload interfaces and utilities to stage the files. Snowflake also recommends creating an external stage that references the bucket.
PUT file:///path/to/data/file.csv @my_stage;
This command uploads a local file to a specified internal stage in Snowflake.
Use the COPY INTO <table>
command to load the staged files into a Snowflake database table.
COPY INTO mytable
FROM @my_stage
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '|' SKIP_HEADER = 1);
This command loads data from the specified stage into the target table, using the defined file format options.
Specify a pattern to match file names and paths using regular expressions.
COPY INTO mytable
FROM @my_ext_stage
PATTERN='.*sales.*.csv';
This command loads only the files matching the specified pattern from the external stage.
Apply transformations such as column reordering and data type casting during the data load process.
COPY INTO home_sales(city, zip, sale_date, price)
FROM (SELECT SUBSTR(t.$2,4), t.$1, t.$5, t.$4 FROM @mystage t)
FILE_FORMAT = (FORMAT_NAME = mycsvformat);
This command loads data into the target table with specified transformations applied to the columns.
To ensure efficient data loading and unloading, follow these best practices:
MAX_FILE_SIZE
option to manage file sizes for better performance.Error handling is crucial when loading and unloading data. The COPY INTO
command provides several options:
Choose the error handling option that best fits your data integrity requirements and operational needs.
Performance optimization is key to efficient data loading and unloading. Consider the following tips:
The COPY INTO
command supports various file formats. Here are examples for different formats:
COPY INTO mytable FROM @my_stage FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '|' SKIP_HEADER = 1);
COPY INTO mytable FROM @my_stage FILE_FORMAT = (TYPE = JSON);
COPY INTO mytable FROM @my_stage FILE_FORMAT = (TYPE = PARQUET);
Specify the appropriate file format options to ensure correct data loading.
While using the COPY INTO
command, you might encounter some common challenges. Here are solutions to address them:
ON_ERROR
option to specify actions like CONTINUE
, SKIP_FILE
, or ABORT_STATEMENT
when errors are encountered.MAX_FILE_SIZE
option to control the size of the files being unloaded, optimizing performance and manageability.In this tutorial, we covered the key functionalities and best practices for using the COPY INTO
command in Snowflake. Here are the key takeaways:
COPY INTO
command supports various file formats and offers options for data transformation and error handling.