Snowflake external tables offer a powerful way to query data stored in external stages without needing to load it into Snowflake. This feature is particularly beneficial for organizations dealing with large volumes of data stored in cloud storage services such as AWS S3, Azure Blob Storage, or Google Cloud Storage. In this guide, we will explore the foundational concepts, schema design, performance considerations, and practical use cases of Snowflake external tables.
Snowflake external tables allow users to query data stored in an external stage as if it were inside a Snowflake table. This capability eliminates the need to load data into Snowflake, making it an efficient solution for handling large datasets stored in cloud storage. External tables are inherently read-only, meaning they cannot perform Data Manipulation Language (DML) operations like insert, update, or delete. However, they can be queried and joined with other tables.
-- Create an external table over files in an external stage
CREATE OR REPLACE EXTERNAL TABLE my_external_table
WITH LOCATION = @my_stage/path/to/data/
FILE_FORMAT = (TYPE = PARQUET);
This code snippet demonstrates how to create an external table in Snowflake using data stored in a specified location and file format. The external table can then be queried as if it were a native Snowflake table.
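Once created, an external table can be queried and joined with native tables. A minimal sketch, assuming a hypothetical native customers table and a customer_id field inside each Parquet row:

```sql
-- Read a field out of the VARIANT VALUE column and join against a native table
-- (customers, id, name, and customer_id are hypothetical names)
SELECT c.name,
       e.value:customer_id::NUMBER AS customer_id
FROM my_external_table e
JOIN customers c
  ON c.id = e.value:customer_id::NUMBER;
```

Fields inside VALUE are accessed with the colon operator and cast to a concrete type before use.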
Snowflake external tables work by referencing data stored in external stages, such as cloud storage services. When a query is executed on an external table, Snowflake accesses the external storage to retrieve the data. While querying external tables might be slower compared to native Snowflake tables due to the overhead of accessing external storage, using materialized views can significantly enhance performance.
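One way to offset that access overhead is to materialize frequently queried projections of the external data. A sketch, assuming hypothetical field names inside the external files:

```sql
-- Materialized view over an external table; results are stored inside Snowflake,
-- so repeated queries avoid re-reading files from external storage
CREATE MATERIALIZED VIEW my_external_mv AS
SELECT value:order_id::NUMBER AS order_id,
       value:amount::FLOAT   AS amount
FROM my_external_table;
```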
External tables in Snowflake have a specific schema design to handle data efficiently. The VALUE column represents a single row in the external file and is of type VARIANT, which is essential for handling semi-structured data formats such as JSON, Avro, ORC, and Parquet. Additionally, the pseudocolumns METADATA$FILENAME and METADATA$FILE_ROW_NUMBER provide context about the data source and help in tracking and debugging.
-- Example of defining columns for an external table; VALUE is created
-- automatically, and the METADATA$ pseudocolumns are referenced in expressions
CREATE OR REPLACE EXTERNAL TABLE my_external_table (
file_name STRING AS (METADATA$FILENAME),
file_row_number NUMBER AS (METADATA$FILE_ROW_NUMBER)
)
WITH LOCATION = @my_stage/path/to/data/
FILE_FORMAT = (TYPE = PARQUET);
This code snippet demonstrates how to define columns for an external table in Snowflake. Columns on external tables are virtual: each is declared as an expression, typically over the automatically created VALUE column or the metadata pseudocolumns.
Proper file sizing is crucial for optimizing the performance of external tables. The following recommendations are based on the file formats supported by Snowflake:
| File Format | Recommended File Size | Recommended Row Group Size |
| --- | --- | --- |
| Parquet | 256 - 512 MB | 16 - 256 MB |
| Other formats | 16 - 256 MB | N/A |
Partitioning external tables can significantly improve query performance by dividing data into smaller, more manageable parts. There are two primary methods for partitioning:
- Automatic partitioning, where Snowflake computes partition values from expressions declared in the PARTITION BY clause when the table is created.
- User-defined partitioning, where partitions are registered manually with the ALTER EXTERNAL TABLE … ADD PARTITION command.
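Automatic partitioning, for example, can derive a partition column from the file path. A sketch, assuming files are laid out under date-named directories (the column name and path layout are hypothetical):

```sql
-- Partition column parsed from METADATA$FILENAME,
-- e.g. 'path/to/data/2024-01-15/part-0.parquet'
CREATE OR REPLACE EXTERNAL TABLE my_partitioned_table (
  date_part DATE AS TO_DATE(SPLIT_PART(METADATA$FILENAME, '/', 4), 'YYYY-MM-DD')
)
PARTITION BY (date_part)
WITH LOCATION = @my_stage/path/to/data/
FILE_FORMAT = (TYPE = PARQUET);
```

Queries that filter on date_part can then skip files whose paths fall outside the requested range.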
Snowflake external tables support Delta Lake, a storage layer that brings ACID (Atomicity, Consistency, Isolation, Durability) transactions to data lakes using the Apache Parquet format. Key aspects of Delta Lake support include:
- A Delta Lake table is registered by specifying the TABLE_FORMAT = DELTA option in the CREATE EXTERNAL TABLE command.

Users can add or remove columns using the ALTER TABLE … ADD COLUMN or ALTER TABLE … DROP COLUMN commands. However, the default columns VALUE, METADATA$FILENAME, and METADATA$FILE_ROW_NUMBER cannot be dropped.
-- Example of adding a virtual column to an external table
-- (new_column is a hypothetical field cast from the VALUE variant)
ALTER TABLE my_external_table ADD COLUMN new_column STRING AS (value:new_column::STRING);
This code snippet demonstrates how to add a new column to an existing external table in Snowflake. Because external table columns are virtual, the new column must be defined as an expression, typically over VALUE.
To secure external tables, users can apply masking and row access policies. These policies help ensure that sensitive data is protected and that only authorized users can access specific rows or columns.
-- Example of applying a masking policy to an external table
CREATE MASKING POLICY mask_sensitive_data AS (val STRING) RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() IN ('AUTHORIZED_ROLE') THEN val
ELSE '****'
END;
ALTER TABLE my_external_table MODIFY COLUMN sensitive_column SET MASKING POLICY mask_sensitive_data;
This code snippet demonstrates how to create and apply a masking policy to an external table in Snowflake. Note that CURRENT_ROLE() returns the role name in upper case unless the role was created with a quoted identifier.
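Row access policies follow a similar pattern, filtering which rows a query returns rather than masking column values. A minimal sketch, assuming hypothetical role names and a region_column on the table:

```sql
-- Return only rows whose region the current role is allowed to see
CREATE ROW ACCESS POLICY region_policy AS (region STRING) RETURNS BOOLEAN ->
  CURRENT_ROLE() = 'ADMIN_ROLE' OR region = 'US';

ALTER TABLE my_external_table ADD ROW ACCESS POLICY region_policy ON (region_column);
```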
Snowflake external tables are particularly useful in scenarios where data resides in external storage systems and users need to perform ad-hoc queries or combine external data with internal Snowflake data. Some practical use cases include querying a data lake in place without ingestion, joining external files with native Snowflake tables, and staging raw data for ETL processes.
However, there are performance considerations to keep in mind: every query reaches out to external storage, so it runs slower than against native tables, which makes file sizing, partitioning, and materialized views the main levers for acceptable performance.
| Feature | External Tables | Native Snowflake Tables |
| --- | --- | --- |
| Data Storage | External storage (e.g., AWS S3, Azure Blob) | Internal Snowflake storage |
| Read/Write Capability | Read-only | Read and write |
| Performance | Potentially slower due to external data access | Optimized for high performance |
| Metadata Storage | File-level metadata stored in Snowflake | Complete metadata stored internally |
| Use Cases | Querying external data, ETL processes | High-performance querying, comprehensive analytics |
| Cost | Storage costs depend on external storage provider | Included in Snowflake storage pricing |
While working with Snowflake external tables, users might encounter a few common challenges: file-level metadata can become stale when files are added to or removed from the stage, and queries can slow down as the number and size of files grow. Refreshing the table metadata, right-sizing files, and partitioning the data resolve most of these issues.
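When the files in the stage change, the external table's metadata can be synchronized manually; this is a common first step when troubleshooting missing or stale rows:

```sql
-- Re-scan the stage and synchronize the external table's file-level metadata
ALTER EXTERNAL TABLE my_external_table REFRESH;
```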
In this guide, we explored the various aspects of Snowflake external tables, including their creation, schema design, partitioning, security, and practical use cases. By understanding and implementing these concepts, users can effectively leverage Snowflake external tables to enhance their data querying and integration capabilities.