Traditional retailers like Nordstrom and Old Navy may not seem like highly regimented organizations, but they have streamlined operations that could be beneficial to data analytics and engineering teams.
Every season, roughly every three to four months, they scrap all of their inventory, all of their assets, and restart. The SKUs that sold well turn into evergreen pieces that are stocked all year long, but for the most part, each season’s collection is new. Fast fashion retailers like Zara and H&M turn over inventory at an even faster pace, roughly every three to four weeks. For the purpose of this article, we won’t be getting into the environmental repercussions and footprint left by fast fashion, but it’s worth acknowledging.
When you think about that methodology in the context of software and data teams, it quickly becomes clear that the benefits are transferable. Keeping things clean, so that we do not accumulate unnecessary resources or contribute to dysfunctional pipelines, is crucial to making data usable across the organization.
Just like inventory that has not been refreshed in a while, data debt can accumulate and become overwhelming, making it difficult to make meaningful progress on projects.
Why data teams should operate like retail stores
So, how can the practices of retail stores help analytics and engineering teams manage data debt?
Just as retailers regularly refresh their inventory, data teams can regularly address debt. Instead of letting technical debt pile up over time, they can make a conscious effort to pay it down incrementally, much like refreshing inventory every few weeks or every season.
Adopt a "clean slate" mentality when it comes to tech debt. Just as retailers start anew every season, data teams can periodically reassess their documentation, dashboards, and current sources of truth, and decide what to keep and what to discard. This approach helps teams ensure that they keep only what is necessary and what is actually being consumed or relied on.
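One way to make the keep-or-discard decision concrete is to sort assets by how recently anyone used them. The sketch below is only an illustration: the `Asset` class, the `triage` function, the 90-day threshold, and the asset names are all hypothetical, and it assumes a last-accessed date is available for each table or dashboard.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Asset:
    name: str            # hypothetical table or dashboard name
    last_accessed: date  # assumed: last time anyone queried or viewed it


def triage(assets, today, max_idle_days=90):
    """Split assets into names to keep and names to review for removal."""
    cutoff = today - timedelta(days=max_idle_days)
    keep = [a.name for a in assets if a.last_accessed >= cutoff]
    review = [a.name for a in assets if a.last_accessed < cutoff]
    return keep, review


# Hypothetical inventory, for illustration only.
inventory = [
    Asset("orders_daily", date(2023, 3, 1)),
    Asset("legacy_campaign_dashboard", date(2022, 6, 15)),
    Asset("customer_dim", date(2023, 2, 20)),
]

keep, review = triage(inventory, today=date(2023, 3, 15))
print("keep:", keep)
print("review:", review)
```

Anything that lands in the review list is a candidate for discussion, not automatic deletion; the threshold is a judgment call each team would set for itself.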
Untracked Inventory = Dark Data
For retail businesses, inventory management is critical to ensuring that products are available to customers when they want them. Untracked inventory can cause significant problems for retailers, leading to lost sales opportunities, increased operating and storage costs, and inefficiencies in the supply chain.
When retailers have untracked inventory, they may not know what products they have in stock. This can result in missed sales opportunities, as customers may be unable to find the products they want in the store or online. This can lead to lost revenue for the retailer, as well as frustration and disappointment for customers.
Dark data affects businesses in the same way.
Organizations are collecting data faster than they can effectively use it. With the growing need for data to power business decisions, data teams are central to business success.
But data teams are burdened by the increasing volume of data requests needed to feed the business. On top of this, poor data discovery is costing companies time and money. According to Forrester, up to 73% of company data goes unused for analytics and decision-making.
Dark data is the data that is collected by organizations but is not analyzed or used in any way. It includes data that is collected through various channels, such as customer interactions, website traffic, social media, and other event streams. This data is often unstructured and resides in various databases, spreadsheets, and other data storage systems.
One of the biggest costs of dark data is the cost of storage. When organizations collect data but do not use it, they still have to store it. This can be costly because storing dark data consumes storage space that could be used for active data. As the amount of data stored in the data warehouse increases, so does the time required to back up the data. This can lead to increased costs for backup storage media and infrastructure. As more data is stored in the data warehouse, queries may take longer to execute, as the database needs to scan a larger amount of data. This can lead to slower response times and decreased productivity for users.
Organizations have a responsibility to protect the data they collect, but when they collect data that is not being used, it can be more challenging to ensure its security. Unaccounted-for and uncatalogued dark data can lead to data breaches, resulting in legal and financial consequences. The work of cataloguing and securing this data often gets deprioritized because it isn’t part of a larger business mandate and requires a significant amount of manual labor.
In addition, dark data can cost organizations in terms of missed opportunities. This data can contain valuable insights that organizations can use to make informed decisions, improve customer experiences, and gain a competitive advantage. However, when dark data is stored alongside active data, it can make it harder for users to find the data they need.
So, how can organizations address the issue of dark data? The first step is to identify what dark data they have and where it is stored. There are tools that automate this process, like Secoda. Rather than manually identifying PII, companies in highly regulated industries, such as Paystack and Cardinal Health, are able to automatically identify sensitive data and appropriately restrict access.
Uncatalogued, unusable, and unknown data is the unsung villain of the modern data stack.
Until it’s revealed, organizations won’t be able to realize the benefits of their data or distil noise into form, meaning, and value.
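To give a rough sense of what automated identification can look like, here is a minimal sketch that flags columns whose names suggest PII. It is not how Secoda or any other specific tool works; the pattern list, the `flag_sensitive_columns` function, and the example schema are all hypothetical, and a real scan would likely also inspect the values themselves.

```python
import re

# Hypothetical name patterns; matching on names alone will miss some PII
# and will occasionally flag harmless columns.
PII_PATTERNS = re.compile(
    r"(email|phone|ssn|birth|address|passport)", re.IGNORECASE
)


def flag_sensitive_columns(schema):
    """Return {table: [columns]} for columns whose names suggest PII."""
    flagged = {}
    for table, columns in schema.items():
        hits = [c for c in columns if PII_PATTERNS.search(c)]
        if hits:
            flagged[table] = hits
    return flagged


# Hypothetical schema, for illustration only.
schema = {
    "customers": ["id", "email_address", "signup_ts", "phone_number"],
    "page_views": ["session_id", "url", "viewed_at"],
}

print(flag_sensitive_columns(schema))
```

The flagged columns are a starting list for review and access restriction, not a guarantee that everything sensitive has been found.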
Spring cleaning
As we approach the spring season, it's not just our closets that need cleaning. It's time for data teams to take a page out of the retail playbook and refresh their data inventory. The accumulation of data debt can be overwhelming, making it difficult to make meaningful progress on projects.
This idea reminds me of the TV show Monk, where Tony Shalhoub's character had a ritual of deep cleaning his apartment every few weeks, almost like hitting a reset button. The same habit transfers to software and data teams: staying clean, and not accumulating unnecessary resources or feeding dysfunctional pipelines, is crucial to making data usable.
Just as retailers refresh their inventory every season, data teams can make a conscious effort to periodically address their technical debt. By adopting a "clean slate" mentality and reassessing documentation, dashboards, and current sources of truth, teams can ensure they keep only what is necessary and what is being consumed or relied on. Failure to address dark data can be costly, both in terms of storage and missed opportunities. Organizations have a responsibility to protect the data they collect, and by identifying and addressing dark data, they can distil noise into form, meaning, and value, and fully realize the benefits of their data.