January 8, 2025

Guide to Using dbt Source Freshness for Data Updates

Ensure data accuracy with dbt source freshness by monitoring updates, setting staleness thresholds, and enhancing data pipeline reliability.
Dexter Chu
Head of Marketing

What is dbt source freshness, and how does it impact data updates?

dbt source freshness is a feature of dbt (data build tool) that checks whether source data is arriving on time. It monitors source tables to verify they have been updated within a predetermined timeframe, which is crucial for keeping analytics accurate and meeting Service Level Agreements (SLAs). By setting thresholds for data staleness, users receive warnings or errors when data is not refreshed as expected, which keeps outdated data from affecting decisions. Implementing dbt continuous integration can be a valuable next step for those looking to improve data quality and efficiency.

When the dbt source freshness command runs, it compares the most recent value in each source table's timestamp column with thresholds you define in minutes, hours, or days. If the newest data is older than a threshold, the command raises a warning or an error. Catching stale sources early simplifies debugging and keeps outdated data out of downstream models.

How does dbt source freshness work?

dbt source freshness works through freshness blocks that you add to the source definitions in your dbt project. Each block defines warn_after and error_after thresholds, which determine when data counts as stale. The command compares the latest value of the timestamp column named in loaded_at_field against these thresholds.

When you execute the dbt source freshness command, it checks each configured source and exits with a nonzero code if any source has passed its error_after threshold, so a scheduler or CI job can fail on stale data. The results are also written to a sources.json file in the project's target directory, recording the freshness state of each source and giving you an overview of data health.

What are the steps to use dbt source freshness effectively?

To effectively utilize the dbt source freshness command, follow these steps:

1. Define your sources

Begin by defining your sources in a YAML file. This file should include the source tables you want to monitor for freshness. For instance:

sources:
  - name: my_source
    tables:
      - name: my_table
        loaded_at_field: _etl_loaded_at

In this example, a source named "my_source" is defined with a table "my_table" and a specified loaded_at_field as "_etl_loaded_at".

2. Set freshness criteria

Next, establish the freshness criteria by defining a "freshness" block within your sources YAML file. For example:

sources:
  - name: my_source
    tables:
      - name: my_table
        loaded_at_field: _etl_loaded_at
        freshness:
          warn_after: {count: 3, period: day}
          error_after: {count: 5, period: day}

In this configuration, a warning is issued if the data is older than 3 days, and an error is triggered if it's older than 5 days. This helps to maintain data quality by ensuring timely updates.

3. Run dbt source freshness

After defining your sources and setting the freshness criteria, execute the dbt source freshness command:

dbt source freshness

This command compares the latest timestamp in each source table against the warn_after and error_after thresholds and reports pass, warn, or error for each source, so stale data is caught before it reaches analysis and decision-making.

What are common challenges and solutions when using dbt source freshness?

While using dbt source freshness, users may encounter several common challenges:

  • Incorrectly defined sources or freshness criteria: Ensure that your sources and freshness criteria are accurately defined in the YAML file. Double-check the syntax and parameters to prevent errors.
  • Timestamp issues: Ensure the loaded_at_field column is a timestamp type and uses a consistent timezone, preferably UTC. If it is stored as text or in local time, cast or convert it in the loaded_at_field expression to avoid discrepancies.
  • Connectivity issues: Verify that your dbt project can connect to the source data system. Network configurations and permissions should be checked to ensure seamless connectivity.
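
For the timestamp issue above, loaded_at_field accepts a SQL expression as well as a plain column name, so the conversion can live in the source definition. The following is a minimal sketch, not a definitive configuration: it assumes the column holds Australia/Sydney local time and that the warehouse provides a convert_timezone function (as Snowflake and Redshift do). Other warehouses need their own equivalent.

```yaml
sources:
  - name: my_source
    tables:
      - name: my_table
        # Assumption: _etl_loaded_at is stored in Australia/Sydney local time.
        # convert_timezone is warehouse-specific; replace it with your
        # warehouse's equivalent function if it is not available.
        loaded_at_field: "convert_timezone('Australia/Sydney', 'UTC', _etl_loaded_at)"
        freshness:
          warn_after: {count: 3, period: day}
          error_after: {count: 5, period: day}
```

If the column is stored as text rather than in another timezone, a cast in the same position (for example "_etl_loaded_at::timestamp" on warehouses that support that syntax) serves the same purpose.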

What are the best practices for using dbt source freshness?

To maximize the effectiveness of dbt source freshness, adhere to the following best practices:

  • Run freshness tests frequently: Check freshness at least twice as often as your tightest SLA requires. For example, if a source must never be more than one hour stale, run the check at least every 30 minutes.
  • Utilize the dbt_utils.recency test: This test helps verify that data in downstream tables is being updated, providing an additional layer of data validation.
  • Monitor and address warnings and errors promptly: Swiftly resolving any alerts helps maintain data quality and prevents potential issues from escalating.
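
As an illustration of the dbt_utils.recency practice above, the sketch below attaches the test to a downstream model. It assumes the dbt_utils package is installed, and the model name my_model and column created_at are hypothetical placeholders. Recent dbt versions also accept data_tests in place of the tests key.

```yaml
models:
  - name: my_model            # hypothetical downstream model
    tests:
      # Fails if the newest created_at value is more than 1 day old.
      - dbt_utils.recency:
          datepart: day
          field: created_at   # hypothetical timestamp column
          interval: 1
```

Unlike source freshness, this runs as part of dbt test, so it covers tables that dbt builds rather than the raw sources feeding them.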

What enhancements were introduced in dbt Core v1.7?

dbt Core v1.7 introduced significant enhancements to the dbt source freshness feature, making it more versatile and user-friendly. One notable enhancement is the ability to configure freshness for sources that lack a loaded_at_field: on supported adapters, dbt determines freshness from warehouse metadata tables instead. This lets users monitor a broader range of data sources. For more on optimizing continuous integration jobs in dbt, see the related guide.

These enhancements improve the flexibility and applicability of dbt source freshness, enabling users to maintain data integrity across diverse data environments and configurations.
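
A source definition that relies on this metadata-based behavior might look like the following sketch. It assumes dbt Core v1.7 or later and an adapter that supports metadata-based freshness; check your adapter's documentation, since support varies by warehouse.

```yaml
sources:
  - name: my_source
    # No loaded_at_field is set. On adapters that support it, dbt v1.7+
    # reads the table's last-modified time from warehouse metadata.
    freshness:
      warn_after: {count: 3, period: day}
      error_after: {count: 5, period: day}
    tables:
      - name: my_table
```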

What additional features does dbt source freshness offer?

dbt source freshness offers several additional features that enhance its functionality and performance:

  • Configuration of specific sources for freshness snapshots: Users can limit a run to chosen sources with the --select flag, for example dbt source freshness --select "source:my_source.my_table", allowing for targeted monitoring and validation.
  • Utilization of advanced filters: The filter option in a freshness block adds a WHERE clause to the freshness query, which improves performance and reduces compute costs by scanning only recent data.
  • Exclusion of certain sources from freshness calculations: Setting freshness: null on a table excludes it from freshness checks, which is useful for static or rarely updated tables.
  • Automation of regular checks within dbt Cloud: Scheduled jobs can run freshness checks automatically, helping maintain data pipeline integrity and ongoing data quality.
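
The filter and exclusion options can be combined in one source definition, as in the sketch below. The table my_static_table is a hypothetical example, and the filter expression uses Snowflake-style datediff syntax as an assumption; adapt it to your warehouse's SQL dialect.

```yaml
sources:
  - name: my_source
    loaded_at_field: _etl_loaded_at
    # Source-level defaults apply to every table unless overridden.
    freshness:
      warn_after: {count: 3, period: day}
      error_after: {count: 5, period: day}
    tables:
      - name: my_table
        freshness:
          warn_after: {count: 3, period: day}
          error_after: {count: 5, period: day}
          # Applied as a WHERE clause so the freshness query scans less data.
          # Assumption: Snowflake-style datediff; adjust for your warehouse.
          filter: datediff('day', _etl_loaded_at, current_timestamp) < 7
      - name: my_static_table   # hypothetical table that never changes
        freshness: null         # excluded from freshness checks
```

Defining thresholds once at the source level and overriding only where needed keeps the file short as the number of tables grows.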

How does Secoda improve data management?

Secoda enhances data management by centralizing and streamlining data discovery, lineage tracking, governance, and monitoring across an organization's data stack. This platform allows users to easily find, understand, and trust their data by providing a single source of truth. Features like search, data dictionaries, and lineage visualization improve data collaboration and efficiency within teams, essentially acting as a "second brain" for data teams.

By utilizing AI-powered insights, Secoda extracts metadata, identifies patterns, and provides contextual information about data, enhancing understanding and accessibility for both technical and non-technical users. This leads to faster data analysis and improved data quality.

What are the key features of Secoda?

Secoda offers several key features that enhance data management and collaboration.

  • Data discovery: Users can search for specific data assets using natural language queries, making it easy to find relevant information.
  • Data lineage tracking: Automatically maps the flow of data from its source to its final destination, providing complete visibility.
  • AI-powered insights: Leverages machine learning to extract metadata and identify patterns, enhancing data understanding.
  • Data governance: Enables granular access control and data quality checks to ensure data security and compliance.
  • Collaboration features: Allows teams to share data information and collaborate on data governance practices.

Ready to take your data management to the next level?

Try Secoda today and experience a significant boost in data collaboration and efficiency. With our cutting-edge tools, you can streamline your data governance processes and improve data quality.

To explore how Secoda can transform your data management, get started today.
