Get started with Secoda
See why hundreds of industry leaders trust Secoda to unlock their data's full potential.
See why hundreds of industry leaders trust Secoda to unlock their data's full potential.
The Snowflake ARRAY_AGG function is an aggregation function that consolidates input values into an array. It is a versatile function used primarily for transforming rows of data into a single array format, which can be extremely useful for various data analysis and manipulation tasks. The function can be employed as both an aggregate function and a window function, offering a flexible approach to data aggregation.
ARRAY_AGG([DISTINCT] <expr1>) [WITHIN GROUP (<orderby_clause>)]
ARRAY_AGG([DISTINCT] <expr1>) [WITHIN GROUP (<orderby_clause>)] OVER ([PARTITION BY <expr2>])
The primary syntax for ARRAY_AGG includes options for using it as an aggregate function and as a window function. The expr1 is a required expression, typically a column name, whose values will be aggregated into an array. The DISTINCT keyword is optional and removes duplicate values from the array.
The WITHIN GROUP orderby_clause is also optional and orders the values within the array based on specified expressions. The OVER() clause specifies the function as a window function.
The ARRAY_AGG function collects values from a specified column and groups them into an array. This array can then be used for further analytical purposes or data transformation. The function can handle both distinct and non-distinct values, and it allows for ordering within the array using the WITHIN GROUP clause.
The function returns a value of type ARRAY. Note that the maximum data size for a single call is 16MB. Here are some usage notes:
The ARRAY_AGG function is particularly useful for transforming rows of data into a single array, making it easier to handle and analyze grouped data. Below are a few practical applications:
SELECT O_ORDERKEY AS order_keys
FROM orders
WHERE O_TOTALPRICE > 450000
ORDER BY O_ORDERKEY;
This query selects order keys from the orders table where the total price is greater than 450,000 and orders them by order key.
SELECT ARRAY_AGG(O_ORDERKEY) WITHIN GROUP (ORDER BY O_ORDERKEY ASC)
FROM orders
WHERE O_TOTALPRICE > 450000;
This query aggregates the order keys into an array, ordering them by order key in ascending order.
SELECT ARRAY_AGG(DISTINCT O_ORDERSTATUS) WITHIN GROUP (ORDER BY O_ORDERSTATUS ASC)
FROM orders
WHERE O_TOTALPRICE > 450000;
This query aggregates distinct order statuses into an array, ordering them by order status in ascending order.
SELECT O_ORDERSTATUS, ARRAY_AGG(O_CLERK) WITHIN GROUP (ORDER BY O_TOTALPRICE DESC)
FROM orders
WHERE O_TOTALPRICE > 450000
GROUP BY O_ORDERSTATUS
ORDER BY O_ORDERSTATUS DESC;
This query groups the data by order status and aggregates the clerks into an array, ordering them by total price in descending order.
While ARRAY_AGG serves a unique purpose of transforming data into arrays, it is helpful to compare it with other common aggregation functions to understand its distinct advantages and limitations.
Function Purpose Key Features ARRAY_AGG Aggregates data into an array Supports DISTINCT, WITHIN GROUP, and windowing COUNT Counts the number of rows Can count all rows or distinct values SUM Sums values in a column Useful for numeric data AVG Calculates the average of values Suitable for numerical data MIN Finds the minimum value in a column Useful for both numerical and non-numerical data MAX Finds the maximum value in a column Suitable for both numerical and non-numerical data
When using the ARRAY_AGG function, consider the following best practices to maximize its effectiveness and efficiency:
To start using the ARRAY_AGG function, you need to understand its basic syntax and how to apply it to your data.
SELECT ARRAY_AGG(column_name)
FROM table_name;
This query aggregates the values in the specified column into an array.
To remove duplicate values from the array, use the DISTINCT keyword.
SELECT ARRAY_AGG(DISTINCT column_name)
FROM table_name;
This query aggregates distinct values in the specified column into an array.
To maintain a specific order of elements within the array, use the WITHIN GROUP (ORDER BY ) clause.
SELECT ARRAY_AGG(column_name) WITHIN GROUP (ORDER BY column_name ASC)
FROM table_name;
This query aggregates the values in the specified column into an array, ordering them in ascending order.
To use ARRAY_AGG as a window function, include the OVER() clause.
SELECT ARRAY_AGG(column_name) OVER (PARTITION BY another_column)
FROM table_name;
This query aggregates the values in the specified column into an array, partitioning the data by another column.
While using the ARRAY_AGG function, you might encounter some common challenges. Here are solutions to address them:
The ARRAY_AGG function in Snowflake is a powerful tool for aggregating data into arrays, offering a versatile approach for data transformation and analysis. Here are the key takeaways:
By understanding its syntax, practical applications, and best practices, you can leverage the ARRAY_AGG function to efficiently handle and analyze grouped data in Snowflake.