Get started with Secoda
See why hundreds of industry leaders trust Secoda to unlock their data's full potential.
The dbt source freshness feature is an integral part of the data build tool (dbt) that ensures data timeliness and validity. It monitors source tables to verify they have been updated within a predetermined timeframe, which is crucial for maintaining the accuracy of data analytics and meeting Service Level Agreements (SLAs). By setting thresholds for data staleness, users can receive warnings or errors if data is not refreshed as expected, thus preventing outdated data from affecting decision-making processes. Implementing dbt continuous integration can be a valuable step for those looking to enhance their data quality and efficiency.
When the dbt source freshness command is executed, it compares the most recent value of the timestamp column in each source table against the thresholds you have specified in minutes, hours, or days. If the newest data is older than your defined freshness criteria, the command raises a warning or an error. This proactive check helps identify stalled or late data loads early, before stale data reaches downstream models and reports, and it simplifies debugging.
dbt source freshness works through freshness blocks that users add to the source definitions in their dbt projects. These blocks define warn_after and error_after thresholds, which determine when data is reported as stale. The command evaluates the timestamp column named in loaded_at_field against these thresholds.
When you execute the dbt source freshness command, it checks each configured source and exits with a nonzero code if any source is found stale, so schedulers and CI jobs can react to the failure. The results of these checks are also written to a sources.json file in the project's target directory, detailing the freshness state of each source, which helps maintain an overview of data health.
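As a rough illustration of how the sources.json artifact might be consumed, the Python sketch below groups sources by their reported status. It assumes the artifact has a top-level "results" list whose entries carry "unique_id" and "status" keys, and that the file is written to target/sources.json; the project and table names in the sample are hypothetical, and the exact schema can vary between dbt versions.

```python
import json
from pathlib import Path


def summarize_freshness(artifact):
    """Group source unique_ids by the freshness status reported for them."""
    summary = {}
    for result in artifact.get("results", []):
        status = result.get("status", "unknown")
        summary.setdefault(status, []).append(result.get("unique_id", "<unknown>"))
    return summary


def load_artifact(path="target/sources.json"):
    """Read the artifact from disk (path is an assumption about the project layout)."""
    return json.loads(Path(path).read_text())


# Hand-written sample standing in for a real artifact; names are hypothetical.
sample = {
    "results": [
        {"unique_id": "source.my_project.my_source.my_table", "status": "warn"},
        {"unique_id": "source.my_project.my_source.other_table", "status": "pass"},
    ]
}
print(summarize_freshness(sample))
```

In a real project, summarize_freshness(load_artifact()) could feed an alerting or reporting step after each freshness run.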
To effectively utilize the dbt source freshness command, follow these steps:
Begin by defining your sources in a YAML file. This file should include the source tables you want to monitor for freshness. For instance:
sources:
  - name: my_source
    tables:
      - name: my_table
        loaded_at_field: _etl_loaded_at
In this example, a source named "my_source" is defined with a table "my_table" and a specified loaded_at_field as "_etl_loaded_at".
Next, establish the freshness criteria by defining a "freshness" block within your sources YAML file. For example:
sources:
  - name: my_source
    tables:
      - name: my_table
        loaded_at_field: _etl_loaded_at
        freshness:
          warn_after: {count: 3, period: day}
          error_after: {count: 5, period: day}
In this configuration, a warning is issued if the data is older than 3 days, and an error is triggered if it's older than 5 days. This helps to maintain data quality by ensuring timely updates.
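To make the threshold logic concrete, here is a minimal Python sketch of how a timestamp could be classified against warn_after and error_after. It is an illustrative re-implementation, not dbt's internal code: the function names are hypothetical, and treating a source as stale only when its age is strictly greater than the threshold is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Seconds per period name; the names mirror the period values used in the YAML.
PERIOD_SECONDS = {"minute": 60, "hour": 3600, "day": 86400}


def threshold(rule):
    """Convert a {count, period} rule into a timedelta."""
    return timedelta(seconds=rule["count"] * PERIOD_SECONDS[rule["period"]])


def classify_freshness(max_loaded_at, now, warn_after=None, error_after=None):
    """Return 'pass', 'warn', or 'error' for the newest loaded_at timestamp."""
    age = now - max_loaded_at
    # Check the stricter outcome first so an error is not reported as a warning.
    if error_after is not None and age > threshold(error_after):
        return "error"
    if warn_after is not None and age > threshold(warn_after):
        return "warn"
    return "pass"


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
warn = {"count": 3, "period": "day"}
error = {"count": 5, "period": "day"}
for days_old in (1, 4, 6):
    status = classify_freshness(now - timedelta(days=days_old), now, warn, error)
    print(days_old, status)
```

With the thresholds from the example configuration, data that is 1 day old passes, 4 days old warns, and 6 days old errors.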
After defining your sources and setting the freshness criteria, execute the dbt source freshness command:
dbt source freshness
This command compares the latest timestamp in each source table against the defined thresholds and will pass, warn, or fail based on the freshness criteria. This ensures that data remains up-to-date for analysis and decision-making.
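When the check runs from a scheduler or CI job, the exit code is usually what matters. The sketch below wraps the command in a small Python helper. The helper name is hypothetical, and it assumes the dbt executable is on the PATH and that the process is started from the dbt project directory.

```python
import subprocess
import sys


def run_freshness_check(command=("dbt", "source", "freshness")):
    """Run the command and return True when it exits with code 0."""
    completed = subprocess.run(list(command), capture_output=True, text=True)
    if completed.returncode != 0:
        # Surface the command's output so the stale sources are visible in job logs.
        print(completed.stdout, file=sys.stderr)
    return completed.returncode == 0


# Example use inside a pipeline step (not executed here):
#     if not run_freshness_check():
#         sys.exit(1)
```

Accepting the command as a parameter keeps the helper easy to exercise in tests with a stand-in command.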
While using dbt source freshness, users may encounter several common challenges:
To maximize the effectiveness of dbt source freshness, adhere to the following best practices:
dbt Core v1.7 introduced significant enhancements to the dbt source freshness feature, making it more versatile and user-friendly. One notable enhancement is the ability to configure freshness for more sources, even those lacking a loaded_at_field, by leveraging warehouse metadata tables. This lets users monitor a broader range of data sources. For more on optimizing continuous integration jobs in dbt, you can explore this comprehensive guide.
These enhancements improve the flexibility and applicability of dbt source freshness, enabling users to maintain data integrity across diverse data environments and configurations.
dbt source freshness offers several additional features that enhance its functionality and performance:
Secoda enhances data management by centralizing and streamlining data discovery, lineage tracking, governance, and monitoring across an organization's data stack. This platform allows users to easily find, understand, and trust their data by providing a single source of truth. Features like search, data dictionaries, and lineage visualization improve data collaboration and efficiency within teams, essentially acting as a "second brain" for data teams.
By utilizing AI-powered insights, Secoda extracts metadata, identifies patterns, and provides contextual information about data, enhancing understanding and accessibility for both technical and non-technical users. This leads to faster data analysis and improved data quality.
Secoda offers several key features that enhance data management and collaboration.
Try Secoda today and experience a significant boost in data collaboration and efficiency. With our cutting-edge tools, you can streamline your data governance processes and improve data quality.
To explore how Secoda can transform your data management, get started today.