Today, data teams are working in a constant state of chaos. The amount of data that companies generate is exploding, and data teams serve as stewards of this growing resource. There's no denying that data is traditionally siloed and in need of cleaning, documentation, and smooth delivery to stakeholders.
Most data teams are working with poor tools to facilitate workflows, efficiency, and enablement. This is because they're using tooling that isn't specifically designed to make the data team more productive: Confluence for data documentation, Slack for data requests, Jira for project management. By relying on these general-purpose tools for their workflows, data teams miss out on the efficiency they could gain by centralizing their operations in a single place.
Similar to customer support teams, data teams are usually reactive by nature. But customer support teams have started using tools like Intercom to avoid repetitive work and enable self-service. Data teams need similar tools to improve their efficiency, help them avoid repetitive work and enable self-service across the company. This is what we’re working towards at Secoda.
The answer doesn't lie in standalone data catalogues, data discovery, data lineage or data governance tools.
We believe the solution requires something new. The right tool is a bundle (here we go again, data Twitter) of these different tools into a new category called Data Enablement.
The perfect Data Enablement tool makes it easier to:
- Understand how often data assets are being used, and by whom.
- Search through all data knowledge in one place, rather than across four or five different tools.
- Find past questions and answers related to company data, similar to Stack Overflow.
- Have an automatically generated diagram of the data model.
- Share data knowledge with external stakeholders.
- Easily identify and hide PII, and build a request process for anyone who may need to access it.
The list goes on...
This tool needs to be simple to use for both technical and non-technical stakeholders and should help data teams work smarter as they service the never-ending list of data requests.
There is an urgent need for better tools that assist data teams in offloading the low-value, high-effort work to focus on higher-value tasks. Otherwise, we'll see the same costly churn and burn-out that data teams are no stranger to.
This is why it’s time for a Data Enablement revolution.
What is Data Enablement?
Data Enablement comprises everything that contributes to improving data teams' workflows. With good Data Enablement practices, even teams with the most archaic data stack can operate efficiently. Without good Data Enablement tools, even the most modern data stack won’t help.
Data Enablement comprises three main components: people, process and technology. People describes the practices needed around the team, such as onboarding and ownership, and the separation of roles between its members. Process describes the way that the data team works on their ticketing, tech debt and version control. Technology covers the tools that help data teams discover, document, and share their data knowledge with the rest of the team.
Bringing people onto the data team is not an easy task. Onboarding someone onto “data” can take months if not done properly. Improving Data Enablement means improving the onboarding practices for analysts, data scientists, engineers and people outside of the data team. Additionally, people on the team should understand who owns each deliverable, outcome, and analysis, whether that ownership is shared or individual. This can be done with a tool like Secoda, but it requires people to buy into the reason for the process. Lastly, people on the data team should have clear roles and scope.
What is a Data Enablement Tool?
By building processes around Data Enablement, data teams should create systems that allow them to automate repetitive work and focus on making compounding improvements to the company's data knowledge over time. The data request queue is usually a barrage of Slack messages, emails, Jira tickets and dashboards that all live in isolated systems. If data teams want to reduce the number of repetitive data questions, they need to consider a Data Enablement tool that brings these workflows into a single system built for data requests. The tool should allow a consumer to search for previous questions or ask a new one, and allow the data team to answer it by linking a dashboard or even running a live query. Think of it as Stack Overflow, but for your data team.
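To make the "search for previous questions" idea concrete, here is a minimal sketch in Python. It is not how Secoda or any particular tool works: the `AnsweredQuestion` structure, the `search_past_questions` function, the word-overlap scoring and the example URLs are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class AnsweredQuestion:
    """A previously answered data question (hypothetical structure)."""
    question: str
    answer_link: str  # e.g. a dashboard URL or a saved query


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def search_past_questions(
    query: str,
    history: list[AnsweredQuestion],
    min_overlap: int = 2,
) -> list[AnsweredQuestion]:
    """Return past questions sharing at least `min_overlap` words with the query.

    Results are ordered by the number of shared words, best match first.
    """
    query_words = tokenize(query)
    scored = []
    for item in history:
        overlap = len(query_words & tokenize(item.question))
        if overlap >= min_overlap:
            scored.append((overlap, item))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored]


# Made-up history of answered questions.
history = [
    AnsweredQuestion(
        "How do we define monthly active users?",
        "https://example.com/dashboards/mau",
    ),
    AnsweredQuestion(
        "Where is the revenue by region table?",
        "https://example.com/dashboards/revenue",
    ),
]

matches = search_past_questions("What counts as monthly active users", history)
for match in matches:
    print(match.question, "->", match.answer_link)
```

Word overlap is the simplest possible matching rule. A real tool would likely use something richer, but even this shows how a repeated question can be answered by pointing to an existing dashboard instead of opening a new ticket.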
Secondly, data teams should examine their internal data operations to find out which parts of their data are the least efficient. A good Data Enablement tool should also help teams identify and reduce the amount of data debt that exists across their system. The best way to stay ahead of data debt is to adopt a proactive Data Enablement tool, which lets you know which important data is stale, muddled or undocumented, and when it became so. When there is a large amount of data debt, the right resources are difficult to find, manage and understand. This costs teams time and effort, but too often data debt is difficult to measure. While this problem may seem minor in the short term, its direct and indirect costs add up.
In conversations with data engineering managers and analysts, most voice concern about dark, muddled, decayed and undocumented data, but the biggest part of the problem is that most don’t know where to start or how to measure the cost of data debt. Data teams should create metrics for these areas of data debt and work over time to reduce them. Similar to running an audit of your website to improve the SEO, data teams should be able to diagnose the pieces of data debt that can be improved across their data stack.
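As one possible starting point for such metrics, the sketch below computes the share of tables that are undocumented, stale or unused. Everything in it is an assumption made for illustration: the `TableInfo` fields, the 30-day staleness threshold, the 90-day usage window and the sample tables are not taken from any specific tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class TableInfo:
    """Metadata about one table (hypothetical fields)."""
    name: str
    description: str
    last_updated: date
    queries_last_90_days: int


def data_debt_metrics(
    tables: list[TableInfo],
    today: date,
    stale_after_days: int = 30,
) -> dict[str, float]:
    """Return the fraction of tables that are undocumented, stale or unused."""
    total = len(tables)
    if total == 0:
        return {"undocumented": 0.0, "stale": 0.0, "unused": 0.0}

    cutoff = today - timedelta(days=stale_after_days)
    undocumented = sum(1 for t in tables if not t.description.strip())
    stale = sum(1 for t in tables if t.last_updated < cutoff)
    unused = sum(1 for t in tables if t.queries_last_90_days == 0)

    return {
        "undocumented": undocumented / total,
        "stale": stale / total,
        "unused": unused / total,
    }


# Made-up inventory of four tables.
tables = [
    TableInfo("orders", "One row per order", date(2022, 5, 31), 120),
    TableInfo("users_tmp", "", date(2021, 11, 2), 0),
    TableInfo("events", "", date(2022, 5, 30), 45),
    TableInfo("legacy_revenue", "Old revenue model", date(2022, 1, 15), 0),
]

metrics = data_debt_metrics(tables, today=date(2022, 6, 1))
for name, share in metrics.items():
    print(f"{name}: {share:.0%}")
```

Tracking these three ratios over time gives a team a simple baseline to work down, much like a score from a website audit.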
Why aren't Data Discovery and Data Catalogues Enough?
Organizations are reaching a point in their data journeys where they have accumulated enough data to start thinking about it more strategically. In other words, they want to get more out of their data, and many are realizing that the data catalogue is not enough.
A data catalogue tool is a repository of information about all the organization’s data assets. This can include metadata (data about the data) like who owns the dataset, what kind of data it contains, how often it’s updated, etc. It can also include tags (terms that describe the content), descriptions and comments from users, and links to where the actual datasets live.
A key reason data catalogues aren't enough is that a typical data catalogue isn’t actively maintained. Someone has to take ownership of keeping it up-to-date, which often doesn’t happen because people are busy doing their jobs and don’t see this as a core part of their role.
Data catalogues are not the end game. It's not that they're bad products; it's just that they don't go far enough. Data catalogues are heavily focused on metadata — specifically, understanding what information is in an asset (e.g., a table), who owns it, how to get it and how to use it. They also help with field-level descriptions that are useful for analysts and developers who need to understand the content of each field. This is a great start, but it's not enough.
Data catalogues and metadata discovery tools are primarily built for and by technical users. If the goal of the data team is to allow business users to do more on their own, without having to ask the data team for help, the team should consider tools that cater to all levels of data knowledge, not just the most technical.
As we move toward Data Enablement platforms, the goal is to allow business users to do more on their own, without having to ask the data team for help. Data Enablement tools should include not only metadata but also data requests, queries, data docs, dictionaries and so on. The purpose of Data Enablement is to democratize access to data knowledge and make it as easy as possible for business users to find, understand and leverage information assets in their day-to-day work. A data catalogue is a vital component of this effort because it gives business users insight into what exists, but it’s not the be-all and end-all.
Our Need for Automation
The key to getting the most value from your data is to build a culture of data literacy across your whole organization. But that’s difficult when most of the time spent working with data is tied up in mundane tasks like cleaning, joining, and filtering. This means that documentation becomes an additional burden for the data teams. Automating the generation of data documentation with a stand-alone Data Enablement tool is a best practice for the long-term success of your data analysis and reporting.
A well-documented system provides an organization with a single source of truth for all information related to that system. While many organizations utilize documentation, some challenges make it hard to maintain accurate and up-to-date information.
In the context of data management, a data dictionary (also called a metadata repository) is an electronic library or database where you can store information about the meaning and structure of your business data. This information is about the meaning and usage (how the value is used), not about the actual values.
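A data dictionary entry can be pictured as a small structured record. The sketch below is only an illustration of that idea: the `FieldDefinition` and `DictionaryEntry` classes, their attributes and the sample `orders` table are hypothetical and do not follow any specific product's schema.

```python
from dataclasses import dataclass, field


@dataclass
class FieldDefinition:
    """Describes what a column means and how it is used, not its values."""
    name: str
    data_type: str
    meaning: str
    usage: str


@dataclass
class DictionaryEntry:
    """One data dictionary entry for a table (hypothetical structure)."""
    table: str
    owner: str
    fields: list[FieldDefinition] = field(default_factory=list)

    def undocumented_fields(self) -> list[str]:
        """Return the names of fields that have no recorded meaning."""
        return [f.name for f in self.fields if not f.meaning.strip()]


# Made-up entry for an "orders" table.
orders = DictionaryEntry(
    table="orders",
    owner="analytics",
    fields=[
        FieldDefinition(
            name="order_total",
            data_type="decimal",
            meaning="Order value including tax, in USD",
            usage="Revenue reporting",
        ),
        FieldDefinition(
            name="discount_code",
            data_type="string",
            meaning="",
            usage="",
        ),
    ],
)

missing = orders.undocumented_fields()
print(f"{orders.table}: fields missing a definition: {missing}")
```

Keeping the dictionary in a structured form like this is what makes it possible to check automatically for gaps, rather than relying on someone to reread a wiki page.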
Although a tool won’t automate all of the documentation, a good Data Enablement tool should automatically give the data team a head start when they first connect their data. A great Data Enablement tool should focus on automating data lineage, the data dictionary, data governance, data documentation and even answers to repetitive data questions, so the data team can focus on high-value tasks.
What’s next?
Navy SEALs train with the philosophy “slow is smooth and smooth is fast”. This motto enables teams to move as a unit and to overcome hurdles that would seem impossible at the outset. Data teams today, on the other hand, are usually (and unintentionally) working with the philosophy “fast is chaotic and chaotic is okay for now”. We believe that it doesn’t have to be this way. Implementing processes and improving them over time can help make any group more cohesive, efficient and accurate.
While it’s true that data teams have become better at extracting, transforming and loading data, many data teams are in the dark about their operations and what they can do to enable better data usage across the company and better collaboration in their team. There is an urgent need for better tools that assist data teams in offloading the low-value, high-effort work to focus on higher-value tasks.
This is why it’s time for a Data Enablement revolution.