Updated December 16, 2024

How Kaufland E-Commerce automates data governance across more than 15K tables

Customer name
Kaufland
Industry
Retail
Company size
Enterprise
About the company
Kaufland is a German hypermarket chain and part of the Schwarz Gruppe, which also owns Lidl. It operates more than 1,500 stores across Europe.
https://www.kaufland.com/

The situation

Kaufland.de is one of the fastest-growing online marketplaces in Germany, with more than 500 employees in Cologne, Darmstadt, and Düsseldorf. The e-commerce marketplace offers more than 37 million products across more than 5,000 categories and serves more than 8,000 sellers each month.

Richard Hondrich, Head of Data and Analytics at Kaufland E-Commerce, identified the need for a data enablement platform to streamline how the whole organization accesses, discovers, and uses data. As the data landscape grew to more than 15,000 tables, with triple-digit growth in active data users, Kaufland E-Commerce needed a system that made data discoverable so it could be used efficiently.

The solution

To manage the scale of their data ecosystem, Richard used Secoda to create and maintain a consolidated view of all data assets that power the organization’s critical business processes.

Richard recognized that sustaining a high level of data governance required automation. Using Secoda, the Data & Analytics team built documentation into its table creation process. This ensures that all data is verified and up to date, reduces time to insight, and increases transparency.

“The issue with most data catalogs is their limited functionality and their inability to fit nicely within workflows. It’s a chicken-and-egg problem: if the data is not kept up to date, the ecosystem will not be used. Secoda allows us to incorporate data governance into our existing processes without getting in the way.”

Integrated data governance

Secoda lets you create Collections, which act as top-level folders for systematically organizing documents in a logical, nested format. Kaufland E-Commerce’s Secoda workspace is organized so that each functional area and team is represented by a Collection. As a result, each team has a single repository for its documents, questions, and knowledge. This structure gives newcomers a prescriptive path and helps onboard new data team members and data consumers efficiently.

For Kaufland E-Commerce, every table across the data stack maps to a specific Collection and has a dedicated owner. With more than 15,000 tables in BigQuery, automation is the only way to sustain this level of data management, in which each table carries its own semantic associations and context. Kaufland E-Commerce’s datasets had grown to the point where BigQuery’s lack of hierarchical structure made it inefficient for the data team to access and use their data, and there was no way to enforce data governance to ensure data integrity. Because BigQuery does not separate a table’s name from its ID, Kaufland E-Commerce used Secoda as an abstraction layer to restructure the content, and used the Secoda API to build a process that enforces data governance at scale.
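To make the rule that every table maps to a Collection and has an owner more concrete, here is a minimal Python sketch of an audit that flags tables breaking it. It assumes table metadata has already been exported into simple records; the `TableRecord` fields and the `governance_gaps` function are hypothetical and do not reflect Secoda’s API or Kaufland E-Commerce’s actual tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableRecord:
    # Hypothetical export row; field names are illustrative, not a Secoda schema.
    table_id: str                  # e.g. "project.dataset.table"
    collection_id: Optional[str]   # Collection the table belongs to, if any
    owner: Optional[str]           # person or team accountable for the table


def governance_gaps(tables: list[TableRecord]) -> dict[str, list[str]]:
    """Return the IDs of tables that lack a Collection or an owner."""
    gaps: dict[str, list[str]] = {"no_collection": [], "no_owner": []}
    for table in tables:
        if not table.collection_id:
            gaps["no_collection"].append(table.table_id)
        if not table.owner:
            gaps["no_owner"].append(table.table_id)
    return gaps


tables = [
    TableRecord("shop.orders.daily", "col-123", "seller-analytics"),
    TableRecord("shop.orders.tmp", None, "seller-analytics"),
    TableRecord("shop.returns.raw", "col-456", None),
]
print(governance_gaps(tables))
# {'no_collection': ['shop.orders.tmp'], 'no_owner': ['shop.returns.raw']}
```

Run on a schedule, a check like this could turn any non-empty result into a ticket or a failed job, so the one-Collection, one-owner rule is enforced automatically rather than by manual review.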

Organize tables and documents in a logical, nested format

When a new table is created, a YAML file must be populated with required fields such as the functional area, team, Secoda Collection ID, and the Slack channel that corresponds to the new table. The table is then created in BigQuery and simultaneously appears in Secoda within the specified Collection. If any required value is missing, the pipeline that creates the table fails and the merge request cannot be merged.
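The merge-request gate described above can be sketched as a small validation step in CI. This Python sketch assumes the YAML file has already been parsed into a dictionary (for example with PyYAML’s `yaml.safe_load`); the key names `functional_area`, `team`, `secoda_collection_id`, and `slack_channel`, as well as both function names, are hypothetical stand-ins for whatever schema and tooling Kaufland E-Commerce actually uses.

```python
import sys
from typing import Any

# Hypothetical key names; the real YAML schema is not published.
REQUIRED_FIELDS = ("functional_area", "team", "secoda_collection_id", "slack_channel")


def missing_fields(meta: dict[str, Any]) -> list[str]:
    """Return required fields that are absent or blank in a table's metadata."""
    return [
        field
        for field in REQUIRED_FIELDS
        if meta.get(field) is None or str(meta.get(field)).strip() == ""
    ]


def check_or_fail(table_name: str, meta: dict[str, Any]) -> None:
    """Exit with status 1 so the CI job, and with it the merge request, fails."""
    missing = missing_fields(meta)
    if missing:
        print(f"{table_name}: missing required metadata: {', '.join(missing)}", file=sys.stderr)
        sys.exit(1)
```

Calling `check_or_fail` before the table-creation step makes the job exit with a non-zero status when a field is missing, which in a typical CI setup marks the pipeline as failed and blocks the merge.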

Impact analysis

As tables are created in BigQuery, they are simultaneously reflected in Secoda. The Secoda Collection name and ID are then pulled back into BigQuery, where the Data & Analytics team uses them to build Table Relevance Reports in Tableau. These reports rank the tables that most need documentation and maintenance by assigning each a score based on factors such as the number of columns, table fill rate, number of maintained columns, presence of a table description, presence of a schema description, count of downstream assets, whether the table is clustered, whether it is partitioned, query count, and distinct user count. Each table has a corresponding Slack channel monitored by the table owner. This process lets Kaufland E-Commerce systematically prioritize maintenance of its most important tables and keep its most relevant data validated and trustworthy.

In the Table Relevance Report, clicking a bubble opens the corresponding table in Secoda for maintenance.
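The Table Relevance score combines the factors listed above, but how they are weighted is not stated. The following Python sketch shows one plausible scheme, purely as an assumption: usage signals (query count, distinct users, downstream assets) are log-scaled and multiplied by a maintenance gap built from the documentation and storage factors, with column count serving as the denominator for maintained columns. The `TableStats` fields, the equal weights, and the formula itself are hypothetical, not Kaufland E-Commerce’s actual report logic.

```python
import math
from dataclasses import dataclass


@dataclass
class TableStats:
    # Hypothetical per-table inputs mirroring the factors listed above.
    name: str
    column_count: int
    fill_rate: float               # share of non-null cells, 0.0 to 1.0
    maintained_columns: int        # columns that have documentation
    has_table_description: bool
    has_schema_description: bool
    downstream_assets: int
    is_clustered: bool
    is_partitioned: bool
    query_count: int
    distinct_users: int


def usage(t: TableStats) -> float:
    """How much the table matters; log-scaled so a few huge tables do not dominate."""
    return math.log1p(t.query_count) + math.log1p(t.distinct_users) + math.log1p(t.downstream_assets)


def completeness(t: TableStats) -> float:
    """Share of maintenance criteria already met, 0.0 to 1.0 (equal weights are an assumption)."""
    column_ratio = min(1.0, t.maintained_columns / t.column_count) if t.column_count else 0.0
    checks = [
        float(t.has_table_description),
        float(t.has_schema_description),
        column_ratio,
        t.fill_rate,
        float(t.is_clustered),
        float(t.is_partitioned),
    ]
    return sum(checks) / len(checks)


def priority(t: TableStats) -> float:
    """Heavily used tables with the largest maintenance gap score highest."""
    return usage(t) * (1.0 - completeness(t))


def rank(tables: list[TableStats]) -> list[str]:
    """Table names ordered from most to least in need of maintenance."""
    return [t.name for t in sorted(tables, key=priority, reverse=True)]
```

Multiplying rather than adding means an unused table or a fully maintained one scores zero, which keeps the ranking focused on tables where maintenance effort is most likely to pay off.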

Automated stakeholder communication

Kaufland E-Commerce uses Secoda’s announcements feature to notify relevant stakeholders of changes to key assets. For example, when a schema changes, Secoda automatically retrieves the lineage relationships between the affected data sources, and Kaufland E-Commerce then notifies the downstream owners via Slack. Secoda’s Slack integration keeps team members up to date with the latest changes and makes it easy for them to collaborate.
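To illustrate the downstream-notification step, this Python sketch walks a lineage graph breadth-first from a changed asset, collects the Slack channels of every transitively dependent asset, and can post a message through a Slack incoming webhook. The lineage dictionary, the asset-to-channel mapping, and all function names are hypothetical; in Kaufland E-Commerce’s setup, Secoda supplies the lineage and its announcements feature handles the notification, so this is not Secoda’s API.

```python
import json
import urllib.request
from collections import deque


def downstream_assets(lineage: dict[str, list[str]], changed: str) -> list[str]:
    """Breadth-first walk of the lineage graph; returns every asset that depends on `changed`."""
    seen = {changed}
    order: list[str] = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # guards against cycles and diamonds
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def channels_to_notify(lineage: dict[str, list[str]], channels: dict[str, str], changed: str) -> list[str]:
    """Map downstream assets to their owners' Slack channels, deduplicated, in discovery order."""
    result: list[str] = []
    for asset in downstream_assets(lineage, changed):
        channel = channels.get(asset)
        if channel and channel not in result:
            result.append(channel)
    return result


def post_to_slack(webhook_url: str, text: str) -> None:
    """Send a plain-text message through a Slack incoming webhook."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        response.read()


lineage = {
    "raw.orders": ["stg.orders"],
    "stg.orders": ["mart.sales", "mart.returns"],
    "mart.sales": ["dash.revenue"],
}
channels = {
    "stg.orders": "#core-data",
    "mart.sales": "#sales",
    "mart.returns": "#sales",
    "dash.revenue": "#finance",
}
print(channels_to_notify(lineage, channels, "raw.orders"))  # ['#core-data', '#sales', '#finance']
```

An incoming webhook posts to the channel it was created for, so a real setup along these lines would keep one webhook URL per owner channel and call `post_to_slack` once for each channel returned.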

“The automated lineage that Secoda provides is the most important piece because we now have one consolidated place where we can see dependencies and communicate them to the potentially impacted parties.”
Richard Hondrich, Head of Data and Analytics

This feature has made communication among team members more efficient, reduced downtime, and increased data accuracy. With Secoda, Kaufland E-Commerce can make sure all stakeholders are aware of changes and can act to prevent downstream issues. The centralized platform also supports better collaboration and faster decision-making.

Conclusion

With one place to store documents, questions, and knowledge, team members can more easily find and access the information they need. Secoda has allowed Kaufland E-Commerce to make its data more discoverable so it can be used efficiently.
