Data quality is one of those things that seems obvious until you dig into it. Everyone agrees it’s important, but few people realize just how much it affects everything else. Bad data can quietly sabotage your decisions, slow down your operations, and erode trust within your organization. So, getting your data right isn’t just a nice-to-have—it’s essential.
At Secoda, we’ve built a framework that helps organizations improve their data quality. It breaks down into four key areas: Stewardship, Usability, Reliability, and Accuracy. Each one focuses on a different part of what makes data useful and trustworthy.
The anatomy of Data Quality Scores
The Data Quality Score (DQS) at Secoda is a comprehensive metric designed to guide data producers on improving the quality of their tables within the Secoda workspace. The DQS is divided into four main sections: Stewardship, Usability, Reliability, and Accuracy, each with specific indicators contributing to the overall score.
- Stewardship (25 Points): Focuses on the management and governance of data, ensuring proper ownership, tagging, and question responses.
- Usability (25 Points): Measures the ease with which data can be understood and utilized, emphasizing comprehensive documentation.
- Reliability (20 Points): Ensures data is consistently up-to-date and available when needed.
- Accuracy (30 Points): Ensures data correctness, focusing on key aspects like nullness, uniqueness, and test pass rates.
These categories and indicators aggregate into a composite score on a scale of 0 to 100, allowing for a customized approach to data quality assessment. This comprehensive and actionable framework is essential for improving data quality across the organization.
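To make the weighting concrete, here is a minimal sketch of how a composite score like this might be aggregated. The category weights mirror the breakdown above; the per-category inputs and helper code are illustrative assumptions, not Secoda’s actual implementation.

```python
# Hypothetical sketch of a weighted Data Quality Score aggregation.
# Category weights mirror the breakdown above (25/25/20/30 = 100 points);
# the per-category scores are illustrative inputs, not Secoda's internals.

CATEGORY_WEIGHTS = {
    "stewardship": 25,
    "usability": 25,
    "reliability": 20,
    "accuracy": 30,
}

def composite_dqs(category_scores: dict[str, float]) -> float:
    """Combine per-category scores (each 0.0-1.0) into a 0-100 composite."""
    total = 0.0
    for category, max_points in CATEGORY_WEIGHTS.items():
        fraction = min(max(category_scores.get(category, 0.0), 0.0), 1.0)
        total += fraction * max_points
    return round(total, 1)

# Example: a table with solid ownership but weak tests lands in the mid-70s.
print(composite_dqs({
    "stewardship": 0.9,   # owner assigned, tagged, questions answered
    "usability": 0.8,     # mostly documented
    "reliability": 0.95,  # fresh and on schedule
    "accuracy": 0.5,      # several failing tests
}))  # -> 76.5
```

Keeping each category normalized to a 0-1 fraction makes it easy to re-weight categories without touching the underlying indicator logic.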
What Actually Makes Data Good?
Let’s break down each category and how it’s measured:
- Stewardship: This is about making sure data has an owner. Someone needs to be responsible for the data, and it needs to be properly tagged and documented. A well-stewarded dataset is one that people know and trust because someone’s in charge of maintaining it. To measure this, you’d track how many datasets have designated owners, how well-tagged they are, and whether questions about the data are being answered. In companies with good stewardship, you’ll see a clear chain of responsibility, reducing the chances of data falling through the cracks.
- Usability: This one’s simple: How easy is it for someone to actually use the data? Can they find what they need? Do they understand it? Usability comes down to things like clear documentation and ease of access. In a typical organization, you might find only 40-50% of key datasets fully documented. But getting that number up to even 70% can make a huge difference in how people interact with your data. Documentation is like good code—it needs to be clear, straightforward, and updated regularly.
- Reliability: This is about whether your data is consistent and up-to-date. Are the pipelines delivering data on time? Is the data still accurate when people need it? You’d measure this through things like data freshness and uptime. For most organizations, aiming for 95-99% data freshness is a good benchmark. Anything less, and you’re risking operational issues—people will stop trusting the data if it’s consistently outdated.
- Accuracy: This is the big one, especially for industries like finance and healthcare. Accuracy is all about correctness: Are you missing values? Is the data duplicated? Are validation tests passing? For fintech companies, accuracy is critical: account balances need to be correct, transaction data needs to be reliable, and you’re aiming for over 99% accuracy. Retail companies, by contrast, can tolerate a bit more noise in their data as long as the overall trends are accurate enough to drive decisions. A sketch of how a few of these checks might be computed follows this list.
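To ground a few of these indicators, here is a minimal sketch of how freshness, nullness, and uniqueness might be computed for a single table with pandas. The thresholds, column names, and toy data are assumptions; in practice these values would come from your warehouse and test suite.

```python
# Hypothetical per-table indicator checks for reliability (freshness)
# and accuracy (nullness, uniqueness). Thresholds are illustrative.
from datetime import datetime, timedelta, timezone
import pandas as pd

def freshness_ok(last_loaded_at: datetime, max_age_hours: int = 24) -> bool:
    """Reliability: was the table refreshed within the expected window?"""
    return datetime.now(timezone.utc) - last_loaded_at <= timedelta(hours=max_age_hours)

def null_rate(df: pd.DataFrame, column: str) -> float:
    """Accuracy: fraction of missing values in a key column."""
    return float(df[column].isna().mean())

def duplicate_rate(df: pd.DataFrame, key_columns: list[str]) -> float:
    """Accuracy: fraction of rows that duplicate another row on the key."""
    return float(df.duplicated(subset=key_columns).mean())

# Example usage with a toy orders table.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 4],
    "amount": [10.0, None, 25.0, 40.0],
})
print(freshness_ok(datetime.now(timezone.utc) - timedelta(hours=3)))  # True
print(null_rate(orders, "amount"))            # 0.25
print(duplicate_rate(orders, ["order_id"]))   # 0.25
```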
Why Some Industries Care More About Accuracy than Others
Different industries prioritize different aspects of data quality based on their needs. In healthcare, for instance, accuracy is paramount. If patient data is wrong, it’s not just inconvenient—it can be dangerous. Healthcare organizations tend to invest heavily in ensuring their data is accurate and properly governed.
In retail, the focus often shifts more towards usability and reliability. They need to track things like inventory in real time, and the data has to be accessible to a wide range of people. The sheer volume of data in retail makes 100% accuracy harder to achieve, but they can usually work with slightly imperfect data as long as the overall picture is correct.
How Tech Can Help
The tools you use have a big impact on your ability to manage data quality. Companies that implement good data governance platforms—like Master Data Management (MDM) systems—tend to do much better. These tools help maintain a single source of truth and ensure that data stays consistent across the board.
That said, even the best tools can’t fix bad data on their own. Large companies often struggle with this because they have so much data coming from so many sources. More data means more complexity, and that’s where things can go wrong. On the flip side, smaller companies typically find it easier to manage data quality because they’re dealing with fewer silos and simpler systems.
The Data Quality Trends We’re Seeing
Across industries, we see some clear patterns emerge:
- Financial services and fintech companies tend to score highest on accuracy and reliability. It’s not surprising—they’re dealing with strict regulations and real-time data needs, so mistakes are expensive. These companies often score 85% or higher in these areas after implementing proper data governance tools.
- Technology companies, especially those that build software products, often score high on usability because they’re naturally inclined toward documentation and processes. We often see tech companies score around 80-90% on usability once they get their documentation in order.
- Retail and healthcare tend to struggle more with reliability. In retail, it’s because of the complexity of supply chain data, inventory management, and legacy systems that need integration. Retailers typically start around 70% on reliability but can improve to the 80-85% range with the right tools. Healthcare has similar challenges, especially with fragmented data sources like different EMR systems. Accuracy is critical here, but usability often lags.
- Larger organizations (1,000+ employees) generally score lower in the early stages, especially on stewardship and reliability. There’s just more data and more complexity to manage. We’ve seen large companies start at 60-65%, while smaller organizations (<200 employees) often hit 75-80% right away because they have fewer data silos and simpler systems.
The tangible impact of improved Data Quality Scores
The benefits of improving Data Quality Scores (DQS) extend far beyond the abstract goal of “better data.” Customer case studies reveal concrete, measurable impacts on organizational performance.
Case study: Homebot
Homebot, an innovative real estate tech company, adopted Secoda’s DQS feature to enhance their data governance and documentation practices. Their experience with DQS provided several key benefits:
- Clear, Quantifiable Metrics: For Homebot’s team, having a specific DQS threshold has made it easier to set clear expectations and ensure that their data models are properly documented. As their BI team manager noted, "For me as a manager, if I tell someone to go document something, they could take that in a million different ways. But to say we need to reach a certain threshold on the DQS before we want to say that this is properly ready to go and publish makes things a lot easier for me."
- Streamlined Documentation Process: The DQS has provided Homebot with a clear checklist, reducing ambiguity and making the documentation process more efficient. "The data quality score comes in really handy because it provides a very quantifiable, discrete metric of how well these things are documented."
- Reduction in Workload: By providing a straightforward, intuitive tool, the DQS feature has reduced the workload for Homebot’s BI team, allowing them to focus on more strategic initiatives. "The biggest ROI for us is preserving our resources, particularly time. DQS helps us streamline work, reducing the amount of ticket volume via self-service tools, and letting us focus on more proactive work that makes a bigger impact."
- Increased Confidence in Data Management: Homebot's use of DQS has given them greater confidence in identifying and managing proper data artifacts, helping to maintain a clean and efficient data environment. "Secoda's Data Quality Score gives us more confidence when going through and identifying proper artifacts to get rid of, which is crucial for maintaining a clean and efficient data environment."
Case study: Fintech payments company
A fast-growing fintech company adopted Secoda’s Data Quality Score (DQS) feature to enhance their data management processes and ensure operational reliability. Key results from their implementation include:
- Commitment to Data Quality: For this organization, maintaining high data quality is crucial to their success. As their team shared, "Data quality is critical for ensuring smooth operations and achieving our business objectives."
- dbt Integration: The company appreciated how Secoda's DQS integrated seamlessly with their existing dbt tests, which made data quality management more streamlined. "The integration of dbt tests into the overall quality score has been especially valuable. Secoda’s thoughtful design makes it easy to manage and update descriptions and tests across tools." A rough sketch of how dbt test results can feed a quality indicator follows this list.
- Measurable Data Quality: Secoda's DQS provided the company with a clear and quantifiable measure of data quality, enabling them to better monitor and enhance their data management practices. "Secoda offers us a tangible metric that captures key aspects of data quality—something we previously found challenging to quantify."
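To make the dbt integration concrete: dbt writes test outcomes to a run_results.json artifact after a `dbt test` run, and a pass rate derived from it is one way a test-based indicator could be computed. The sketch below is an illustrative approximation, not Secoda’s actual connector; the file path and the way the rate feeds a score are assumptions.

```python
# Hypothetical sketch: derive a test pass rate from dbt's run_results.json
# (written to the target/ directory after `dbt test`). The path and the way
# the rate maps into a quality score are assumptions for illustration.
import json
from pathlib import Path

def dbt_test_pass_rate(run_results_path: str = "target/run_results.json") -> float:
    """Return the fraction of dbt test results whose status is 'pass'."""
    payload = json.loads(Path(run_results_path).read_text())
    test_results = [
        r for r in payload.get("results", [])
        if r.get("unique_id", "").startswith("test.")
    ]
    if not test_results:
        return 0.0
    passed = sum(1 for r in test_results if r.get("status") == "pass")
    return passed / len(test_results)

if __name__ == "__main__":
    print(f"dbt test pass rate: {dbt_test_pass_rate():.0%}")
```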
The real-world experiences of these users illustrate how Secoda’s DQS feature can significantly improve data quality management practices. These improvements lead to better data visibility, more efficient processes, and enhanced decision-making capabilities, ultimately driving business success.
If you want to get serious about improving data quality, it’s important to start small. The trick isn’t to try and fix everything at once, but to focus on the areas that matter most. Start by identifying the most critical data—especially sensitive data—and make sure it’s well-documented and owned by someone responsible. With Secoda’s tools, you can easily track things like data lineage (where the data comes from and how it’s used) and popularity (which data is used the most).
Then, take a Crawl-Walk-Run approach: start with a pilot project on a subset of critical data, and once you’ve figured out what works, expand gradually across the organization. Building momentum this way is far easier than attempting an organization-wide overhaul from day one. Set realistic goals, use the DQS tools to track where your data stands, and develop a roadmap for improvement.
Secoda’s tools can help automate a lot of the heavy lifting, too. With automated suggestions at the table or column level, you don’t have to guess where the problems are. The system can point them out for you, giving you a clear list of next steps. Plus, features like AI-generated descriptions and automated tagging can take some of the grunt work off your team’s plate. Over time, this doesn’t just improve your data—it builds a culture of accountability around data quality.
Data Quality is a Competitive Advantage
What we’re seeing is that companies that invest in data quality aren’t just tidying up their data for its own sake. They’re setting themselves up to make faster, better decisions, and they’re building a system that can scale as they grow. And as more companies wake up to the importance of data quality, the ones who get it right will have a clear competitive edge.
The future of data quality is about making these systems more integrated and real-time. With AI and machine learning, we’re already seeing tools that can predict where data quality issues are likely to happen and catch them before they spiral out of control. It’s not far off to imagine a world where data quality is checked and corrected in real time, right at the point of ingestion.
And as data ecosystems grow more complex, getting data quality right won’t just be a nice bonus—it’ll be a baseline requirement for any company that wants to stay competitive.
The evolution of data quality assessment
As data ecosystems continue to grow in complexity and scale, the future of Data Quality Scoring is likely to be shaped by several key trends:
Artificial intelligence and machine learning
AI and ML are set to play an increasingly important role in data quality assessment:
- Anomaly Detection: Advanced algorithms can identify data quality issues in real time, flagging inconsistencies and outliers as they occur (a toy sketch follows this list).
- Predictive Data Quality: Machine learning models can forecast potential quality issues before they occur, allowing preemptive measures.
- Automated Data Cleansing: ML-powered tools can correct data quality issues autonomously, reducing the need for manual intervention.
- Context-Aware Quality Assessment: AI systems can understand the semantic context of data, providing more nuanced and accurate quality scoring.
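As a toy illustration of anomaly detection on a quality metric, here is a sketch that flags a daily row count when it drifts several standard deviations from recent history. Production tools use far richer models; the metric, history window, and threshold here are assumptions.

```python
# Toy anomaly detection on a data quality metric (daily row counts).
# A z-score check against recent history; real systems use richer models.
from statistics import mean, stdev

def is_anomalous(history: list[int], today: int, threshold: float = 3.0) -> bool:
    """Flag today's value if it sits more than `threshold` std devs from the mean."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold

# Example: a sudden drop in loaded rows gets flagged.
recent_counts = [10_200, 9_950, 10_100, 10_050, 9_980]
print(is_anomalous(recent_counts, 10_020))  # False, within the normal range
print(is_anomalous(recent_counts, 2_300))   # True, likely a broken pipeline
```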
Real-time data quality scoring
The move towards real-time data processing is driving demand for instantaneous quality scoring:
- Stream Processing: Integration of data quality checks into streaming data pipelines allows for real-time validation and correction (see the sketch after this list).
- Edge Computing: Performing data quality assessments at the point of data collection ensures immediate quality control, even in distributed environments.
- Continuous Monitoring: Real-time dashboards and alerts for data quality metrics provide ongoing visibility into data health.
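As a simple illustration of in-pipeline validation, here is a sketch that checks each record as it streams through and routes failures to a dead-letter list instead of letting them reach downstream tables. The record shape, required fields, and rules are hypothetical.

```python
# Hypothetical inline validation for a streaming pipeline: each record is
# checked as it arrives, and failures are diverted rather than loaded.
from typing import Iterable, Iterator

REQUIRED_FIELDS = {"event_id", "user_id", "amount"}
dead_letter: list[dict] = []

def validate(record: dict) -> bool:
    """Minimal per-record quality rules: required fields present, amount sane."""
    if not REQUIRED_FIELDS.issubset(record):
        return False
    return isinstance(record["amount"], (int, float)) and record["amount"] >= 0

def clean_stream(records: Iterable[dict]) -> Iterator[dict]:
    """Yield valid records; push invalid ones to a dead-letter queue for review."""
    for record in records:
        if validate(record):
            yield record
        else:
            dead_letter.append(record)

events = [
    {"event_id": 1, "user_id": "a", "amount": 12.5},
    {"event_id": 2, "user_id": "b", "amount": -3.0},  # fails: negative amount
    {"event_id": 3, "user_id": "c"},                  # fails: missing field
]
print(list(clean_stream(events)))  # only the first event passes
print(len(dead_letter))            # 2
```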
Integration with data observability
We're seeing a trend towards the integration of DQS with broader data observability practices:
- End-to-End Lineage: Tracking data quality throughout the entire data lifecycle provides comprehensive insights into data flow and transformation.
- Impact Analysis: Tools for assessing the downstream impact of data quality issues help identify and mitigate potential risks (a small sketch of this idea follows this list).
- Root Cause Analysis: Advanced diagnostics for identifying the source of data quality problems enable more effective resolution and prevention.
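To make the impact-analysis idea concrete, here is a minimal sketch that walks a lineage graph to find every downstream asset affected by a quality issue in one table. The adjacency map is a hypothetical stand-in, not Secoda’s lineage model.

```python
# Hypothetical downstream impact analysis over a simple lineage graph.
# The adjacency map (table -> direct downstream tables) is illustrative.
from collections import deque

LINEAGE = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.customer_ltv"],
    "marts.revenue": ["dashboards.exec_summary"],
}

def downstream_impact(source: str, lineage: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk: every asset reachable from the affected table."""
    impacted, queue = set(), deque([source])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in impacted:
                impacted.add(child)
                queue.append(child)
    return impacted

# A quality failure in raw.orders ripples through to four downstream assets.
print(downstream_impact("raw.orders", LINEAGE))
```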
Industry-specific standards
As the field matures, we may see the emergence of industry-specific DQS standards:
- Regulatory Alignment: DQS frameworks tailored to specific regulatory requirements ensure compliance and reduce risk.
- Sector Benchmarks: Industry associations can establish data quality benchmarks and best practices, promoting standardization and excellence.
- Certification Programs: Third-party certifications for organizational data quality practices provide validation and assurance of data quality efforts.
The Data Quality Score (DQS) will become an essential metric for organizations, offering them a clear measure of their data health and a pathway for improvement.
Looking ahead, advanced data management platforms are set to be instrumental in navigating the complexities of data quality. By integrating DQS with AI-driven data catalogs, automated workflows, and comprehensive data lineage, these platforms provide a holistic approach to improving data quality.
The journey towards optimal data quality is continuous and evolving. As new technologies emerge in the modern data space, so too will new methods for assessing and enhancing data quality. Organizations that remain committed to refining their data quality management will gain a strong competitive edge.