In this tutorial, we will explore how to calculate percentiles using the functions Snowflake provides. A percentile is a statistical measure that indicates the value below which a given percentage of the observations in a group falls. Snowflake can calculate percentiles exactly, based on either a continuous or a discrete distribution of the input column, or estimate them using approximation algorithms.
Snowflake provides several functions for working with percentiles: `PERCENTILE_CONT`, `PERCENTILE_DISC`, `APPROX_PERCENTILE`, `APPROX_PERCENTILE_ACCUMULATE`, and `PERCENT_RANK`. Each computes its result differently. `PERCENTILE_CONT`, `PERCENTILE_DISC`, and `APPROX_PERCENTILE` can be used as aggregate functions, which return one value for the whole dataset (or for each group in a `GROUP BY`), or as window functions, which return a value for every row computed over that row's partition. `PERCENT_RANK` is a window function only, and `APPROX_PERCENTILE_ACCUMULATE` returns an intermediate state rather than a final percentile.
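The difference between the aggregate and window forms can be modeled in a few lines of Python. This is a minimal sketch, not Snowflake code: it uses `statistics.median`, which interpolates between the two middle values in the same way as `PERCENTILE_CONT(0.5)`, and the rows, the `region`/`amount` columns, and the helper functions are all hypothetical.

```python
from statistics import median

# Hypothetical rows of (region, amount); not taken from any real table.
rows = [
    ("east", 10.0),
    ("east", 20.0),
    ("east", 40.0),
    ("west", 5.0),
    ("west", 15.0),
]


def aggregate_median(rows):
    """Aggregate form: one result for the whole input, like
    SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) FROM t;"""
    return median(amount for _, amount in rows)


def window_median(rows):
    """Window form: every row is kept and annotated with the median of its
    own partition, like
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amount) OVER (PARTITION BY region)"""
    partitions = {}
    for region, amount in rows:
        partitions.setdefault(region, []).append(amount)
    medians = {region: median(vals) for region, vals in partitions.items()}
    return [(region, amount, medians[region]) for region, amount in rows]


print(aggregate_median(rows))  # 15.0
for row in window_median(rows):
    print(row)
```

The aggregate form collapses five rows into a single value, while the window form returns all five rows, each carrying the median of its region (20.0 for east, 10.0 for west).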
The `PERCENTILE_CONT` function returns a percentile value based on a continuous distribution of the input column. If no input row lies exactly at the desired percentile, the result is calculated using linear interpolation of the two nearest input values. NULL values are ignored in the calculation. The percentile must be a constant between 0.0 and 1.0.
```sql
SELECT PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY column_name) FROM table_name;
```
This code calculates the 25th percentile of the specified column in the table, ignoring NULL values and using linear interpolation if necessary.
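To make the interpolation concrete, here is a minimal Python sketch of the calculation. It assumes the standard SQL definition of a continuous percentile (a fractional position of p * (N - 1) in the sorted non-NULL values), with `None` standing in for NULL; `percentile_cont` is a hypothetical helper for intuition, not Snowflake's implementation.

```python
import math


def percentile_cont(values, p):
    """Continuous percentile with linear interpolation (standard SQL rule)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("percentile must be between 0.0 and 1.0")
    # NULLs (None here) are ignored.
    data = sorted(v for v in values if v is not None)
    if not data:
        return None
    # Fractional, 0-based position of the requested percentile.
    pos = p * (len(data) - 1)
    lower, upper = math.floor(pos), math.ceil(pos)
    if lower == upper:
        # A row lies exactly at the percentile: return it unchanged.
        return data[lower]
    # Otherwise interpolate linearly between the two nearest input values.
    return data[lower] + (pos - lower) * (data[upper] - data[lower])


print(percentile_cont([10, None, 20, 30, 40], 0.25))  # 17.5
```

With four non-NULL values, the 25th percentile sits at position 0.75, three quarters of the way from 10 to 20, which gives 17.5 even though no input row holds that value.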
The `PERCENTILE_DISC` function returns a percentile value based on a discrete distribution of the input column. It returns the value from the row with the smallest cumulative distribution value that is greater than or equal to the given percentile. As with `PERCENTILE_CONT`, NULL values are ignored, and the percentile must be a constant between 0.0 and 1.0. Unlike `PERCENTILE_CONT`, `PERCENTILE_DISC` never interpolates: it always returns a value that actually occurs in the input column.
```sql
SELECT PERCENTILE_DISC(0.25) WITHIN GROUP (ORDER BY column_name) FROM table_name;
```
This code calculates the 25th percentile of the specified column in the table, ignoring NULL values and returning a value that exists in the column rather than an interpolated one.
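The cumulative-distribution rule can be sketched the same way. In this hypothetical Python helper, the cumulative distribution of the i-th sorted row is taken as i / N over the non-NULL rows, and the first row reaching the requested percentile wins; it illustrates the rule described above rather than reproducing Snowflake's code.

```python
def percentile_disc(values, p):
    """Discrete percentile: the first value (ascending) whose cumulative
    distribution is greater than or equal to p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("percentile must be between 0.0 and 1.0")
    # NULLs (None here) are ignored.
    data = sorted(v for v in values if v is not None)
    if not data:
        return None
    n = len(data)
    for i, value in enumerate(data, start=1):
        # i / n is the fraction of rows at or before this position. For tied
        # values, the first qualifying position yields the same value that the
        # per-value cumulative distribution would, since ties share one value.
        if i / n >= p:
            return value
    return data[-1]  # not reached: i / n == 1.0 on the last row


print(percentile_disc([10, None, 20, 30, 40], 0.25))  # 10
```

On the input where the continuous sketch would produce 17.5, the discrete version returns 10, because the first row's cumulative distribution (1 / 4 = 0.25) already reaches the requested percentile.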
The `APPROX_PERCENTILE` function returns an approximated value for the desired percentile using an improved version of the t-Digest algorithm. This function is useful for large datasets where an exact calculation may be too resource-intensive. The result is an approximation, and the accuracy depends on the size and skew of the dataset.
```sql
SELECT APPROX_PERCENTILE(column_name, 0.25) FROM table_name;
```
This code calculates an approximate 25th percentile of the specified column in the table, using the t-Digest algorithm.
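The `APPROX_PERCENTILE_ACCUMULATE` function returns the internal representation of the t-Digest state at the end of aggregation instead of a percentile. This intermediate state can be merged with other states using `APPROX_PERCENTILE_COMBINE`, and a percentile can then be estimated from it using `APPROX_PERCENTILE_ESTIMATE`. This lets you summarize data once (for example, per day) and estimate percentiles over different combinations of those summaries later without rescanning the raw rows.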
```sql
SELECT APPROX_PERCENTILE_ACCUMULATE(column_name) FROM table_name;
```
This code returns the internal representation of the t-Digest state for the specified column in the table.
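The value of an intermediate state comes from the accumulate, combine, estimate workflow that Snowflake exposes through `APPROX_PERCENTILE_ACCUMULATE`, `APPROX_PERCENTILE_COMBINE`, and `APPROX_PERCENTILE_ESTIMATE`. The Python sketch below mimics only that workflow. Its state is a plain `Counter` of exact values, a deliberately simple stand-in that is NOT t-Digest, and the helper names and daily batches are hypothetical.

```python
from collections import Counter


def accumulate(values):
    """Summarize one batch into a mergeable state, ignoring NULLs (None).

    Toy stand-in for a t-Digest state: exact counts per value, which (unlike
    t-Digest's bounded set of centroids) grows with the number of distinct values."""
    return Counter(v for v in values if v is not None)


def combine(*states):
    """Merge any number of states; the result does not depend on merge order."""
    merged = Counter()
    for state in states:
        merged.update(state)  # adds the counts value by value
    return merged


def estimate(state, p):
    """Compute the p-th percentile from a state (exact here, approximate with t-Digest)."""
    data = sorted(state.elements())
    if not data:
        return None
    pos = p * (len(data) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (pos - lower) * (data[upper] - data[lower])


# Hypothetical daily batches: summarize each once, then combine later.
monday = accumulate([10, 20, None])
tuesday = accumulate([30, 40])
print(estimate(combine(monday, tuesday), 0.25))  # 17.5
```

Estimating from the combined state gives the same 17.5 as interpolating over all rows at once; with a real t-Digest state the answer would be an approximation, but the state would stay small regardless of how many rows were summarized.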
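The `PERCENT_RANK` function returns the relative rank of a value within a group of values as a number from 0.0 to 1.0, computed as (rank - 1) / (number of rows in the partition - 1). It is a window function, so it must be used with an `OVER` clause.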
```sql
SELECT PERCENT_RANK() OVER (ORDER BY column_name) FROM table_name;
```
This code calculates the relative rank of each value in the specified column, returning one value between 0.0 and 1.0 per row.
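The rank formula can be checked with a short Python sketch. It assumes the column has no NULL values and that ties share the lowest rank, as with `RANK()`; `percent_rank` is a hypothetical helper that models the calculation over a single partition.

```python
from bisect import bisect_left


def percent_rank(values):
    """Return (value, percent_rank) pairs in ascending order of value.

    percent_rank = (rank - 1) / (row_count - 1), where tied values share the
    lowest rank, as with RANK(). A single-row input gets 0.0."""
    data = sorted(values)
    n = len(data)
    result = []
    for value in data:
        rank = bisect_left(data, value) + 1  # 1-based position of the first tie
        result.append((value, 0.0 if n == 1 else (rank - 1) / (n - 1)))
    return result


for row in percent_rank([40, 10, 20, 20]):
    print(row)
```

The smallest value always receives 0.0 and the largest (when it is not tied) receives 1.0; the two 20s share rank 2, so both get (2 - 1) / (4 - 1), one third.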
While using percentile functions in Snowflake, you might encounter some common challenges:
When calculating percentiles in Snowflake, keep the following best practices in mind:
To deepen your understanding of percentile calculations in Snowflake, consider exploring the following topics:
In this tutorial, we've covered how to calculate percentiles in Snowflake using various functions. We've discussed the `PERCENTILE_CONT`, `PERCENTILE_DISC`, `APPROX_PERCENTILE`, `APPROX_PERCENTILE_ACCUMULATE`, and `PERCENT_RANK` functions, and provided examples of how to use each one. We've also discussed common challenges and best practices when calculating percentiles in Snowflake.