Join the MDS Fest Slack community to access recordings, engage with speakers, and get updates on MDS Fest ’24
Managing a Modern Data Stack in a “Postmodern Data Stack” World
“The modern data stack has leveled-up the capability of all companies. Everybody has superpowers, and we're learning how to use them, but we're also learning the limits.”
I hosted a panel featuring industry experts Joe Reis, Chad Sanderson, Mark Freeman, and Scott Breitenother. Together they discussed the challenges of managing a modern data stack in a “postmodern” data stack world, exploring how data teams have made significant progress but are now facing a new set of challenges.
Key themes included:
- Balancing speed and governance in the cloud
- The importance of scaling data operations thoughtfully
- Approaching data initiatives with a mindset of flexibility and scalability
Some important recommendations from the session included:
1. Data Modeling: Revisit and adapt data modeling practices as use cases and data needs change; data modeling itself continues to evolve alongside modern data requirements.
2. Data Governance: Pay attention to data governance, especially in light of regulations like GDPR, and ensure that you have control over your data, including knowing where it resides and who has access to it.
3. DevOps for Data: Embrace DevOps practices for managing data pipelines, quality assurance, and automation. Create efficient and reliable data pipelines to support your business processes.
4. Usefulness and Value of Data: Continually assess the usefulness and value of your data. Prioritize data initiatives based on the evolving needs of your business and its stakeholders.
One of the most important takeaways from this session was to remember that data is not a static resource; it evolves with your business, and your data strategies should evolve with it.
Data contracts: vitamin or painkiller? ft. Sarah Yazouri
"Data contracts are not a silver bullet, but they are a step in the right direction."
In this session, Sarah Yazouri, CEO and co-founder of StratusHawk, painted a familiar picture for most data teams: the challenges of missing data ownership, visibility, and consistency, specifically in the context of a scaling company. Sarah discussed how data contracts can help solve these challenges and gave an overview of a tool she and her team at StratusHawk are building called Dabler, an all-in-one data transformation platform with features like data modeling, validation, and governance, all integrated with data contracts. Dabler can help data teams create validated models that are enforceable with data contracts. Sarah’s session demonstrated an innovative approach to streamlining data management with data contracts and enhancing collaboration within organizations. It’s definitely one to check out for scaling data teams interested in employing data contracts.
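Dabler’s contract format isn’t shown in the session itself, but to make the idea concrete, here is a minimal Python sketch of what a data contract boils down to: an agreed, machine-checkable schema with an owner, which producers and consumers can both validate data against. The `orders_contract` fields and owner below are hypothetical.

```python
# Minimal sketch of a data contract as a machine-checkable schema.
# The contract below is hypothetical; real tools (including Dabler)
# have richer formats covering semantics, SLAs, and ownership.

orders_contract = {
    "owner": "checkout-team",
    "fields": {
        "order_id": str,
        "customer_id": str,
        "amount_usd": float,
    },
}

def validate(record: dict, contract: dict) -> list[str]:
    """Return a list of contract violations for one record."""
    errors = []
    for field, expected_type in contract["fields"].items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
    return errors

print(validate({"order_id": "o1", "customer_id": "c9", "amount_usd": "12"},
               orders_contract))
# ['amount_usd: expected float']
```

The point of the pattern is that the check is owned and versioned like code, so a producer changing `amount_usd` to a string breaks a contract test rather than a downstream dashboard.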
Using your usage: tips for optimizing your cloud spend ft. Tim Castillo
“We have to be more conscious of our budgets…especially with the way that everything has been going in 2023”
In a very relevant talk titled “Using your usage: Tips for optimizing your cloud spend”, Tim Castillo breaks down how data teams can manage their infrastructure costs. As we know, many data teams are feeling the pressure to contain costs. Most are still getting the hang of building out pipelines and learning how best to leverage the many tooling options available, and many are ill-equipped to manage the costs that come with those tools. Tim walks through the key things to watch when managing your data team’s spend, making this a critically useful session for any data team.
CI for dbt: beyond the basics! ft. Pádraic Slattery
“Everyone’s dbt setup is slightly different…You need several approaches depending on what the issue at hand is”
In this session, Pádraic Slattery takes a deep dive into CI for dbt and how to set up the right custom solutions for your CI needs. He gives a great overview of continuous integration (CI) and the problems it helps data teams solve. He also covers the importance of automation in the CI process, how CI can begin even before code review (with pre-commit hooks), how to leverage Slim CI to run CI efficiently and cost-effectively, and various other techniques (such as running pytest on dbt artifacts) for enforcing conventions and quality in your code base in a scalable way. A critical session for analytics engineers and team leads looking to drive an efficient CI process!
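The exact checks from the talk aren’t reproduced here, but as a rough sketch of the “pytest on dbt artifacts” pattern: after a `dbt compile` (or `dbt build`), dbt writes a `target/manifest.json` artifact describing every node in the project, which a plain pytest suite can interrogate to enforce conventions. The specific conventions below (descriptions required, `stg_` prefixes for staging models) are illustrative.

```python
# Sketch: enforce project conventions by testing dbt's manifest artifact.
# Run `dbt compile` first so target/manifest.json exists, then run `pytest`.
import json
from pathlib import Path

import pytest


@pytest.fixture(scope="session")
def manifest():
    return json.loads(Path("target/manifest.json").read_text())


def test_every_model_has_a_description(manifest):
    missing = [
        node["name"]
        for node in manifest["nodes"].values()
        if node["resource_type"] == "model" and not node.get("description")
    ]
    assert not missing, f"models missing descriptions: {missing}"


def test_staging_models_use_stg_prefix(manifest):
    offenders = [
        node["name"]
        for node in manifest["nodes"].values()
        if node["resource_type"] == "model"
        and "staging" in node["path"]  # hypothetical project layout convention
        and not node["name"].startswith("stg_")
    ]
    assert not offenders, f"badly named staging models: {offenders}"
```

Because this runs as an ordinary pytest suite, it slots straight into the same CI pipeline as the rest of your checks.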
Enterprise data core: a scalable data platform ft. Karthik Venkatesh
“It was a 100% modern data stack design”
In this session, titled “Enterprise Data Core - Scalable Data Platform”, Karthik Venkatesh of Wisdom Schema shared how his team migrated a large telecom company from legacy Excel reporting, with little to no security or access controls, to a fully modern data stack that enabled advanced analytics, role-based access control (RBAC), and delivery of high-quality data to external partners. The session details the approach and design behind the implementation, covering data ingestion, data processing, orchestration, and data delivery, with a focus on achieving continuous delivery through a well-structured, data-entity-based approach. Karthik also shared some of the challenges faced during the project and how the team overcame them.
Gee, stop building into production! ft. Sonny Nguyen
“You never actually build in production–you build in pre-prod, test, and then swap with production” (on blue-green deployments)
Sonny’s session, titled “Gee, stop building into Production”, provided guidelines and best practices for deploying dbt projects to production environments. Sonny highlighted the challenges of dealing with unexpected events and changes in data sources, schemas, and objects, and emphasized the importance of implementing production gatekeepers to ensure data integrity. He offered examples and solutions, including blue-green deployments and production rollbacks. If you’re facing challenges maintaining data quality and preventing data issues from reaching production, be sure to check this talk out!
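Sonny’s exact tooling isn’t detailed here, but the blue-green idea in the quote above can be sketched in a few lines: build and test everything in a pre-production schema, and only swap it with production once the build succeeds. The sketch assumes Snowflake (whose `ALTER SCHEMA ... SWAP WITH` makes the cutover atomic) and uses placeholder connection details; other warehouses need a different swap mechanism.

```python
# Sketch of a blue-green dbt deployment (assumes Snowflake's SWAP WITH).
import subprocess

import snowflake.connector  # hypothetical choice of warehouse client


def deploy():
    # 1. Build and test everything in a pre-production schema, never in prod.
    #    check=True raises on any failure, so we never reach the swap step.
    subprocess.run(["dbt", "build", "--target", "preprod"], check=True)

    # 2. Only if the build and tests pass, atomically swap preprod into prod.
    conn = snowflake.connector.connect(
        account="my_account", user="deployer", password="...",  # placeholders
        database="ANALYTICS",
    )
    try:
        conn.cursor().execute("ALTER SCHEMA preprod SWAP WITH prod")
    finally:
        conn.close()


if __name__ == "__main__":
    deploy()
```

The key property is exactly what the quote describes: consumers only ever see the old production schema or the fully built, fully tested new one, never a half-built state.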
Semantic superiority ft. David Jayatillake
“Semantic layers provide a framework for variant queries to be run with consistency”
In this talk, David Jayatillake gives a thorough overview of semantic layers: mappings of real-world entities, with their associated metrics and attributes, to logical data structures. David demystifies the difference between semantic layers and metric layers, explaining that semantic layers define entities while metric layers focus on metrics and dimensions. He also covers the benefits of a semantic layer (a common understanding of data within an organization, less confusion, better decision-making) and the costs of not having one (repeatedly redefining data structures and re-writing joins in queries). The session also explores the pros and cons of a semantic layer embedded within a visualization tool versus a standalone semantic layer.
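To make the consistency argument concrete, here is a toy sketch (not any particular product’s API) of a semantic layer: the `revenue` metric is defined exactly once, and every variant query, however it is grouped, is compiled from that single definition.

```python
# Toy semantic layer: define the "revenue" metric once, then compile
# consistent SQL for any combination of dimensions. Illustrative only.

SEMANTIC_LAYER = {
    "orders": {  # entity
        "table": "analytics.orders",
        "metrics": {"revenue": "SUM(amount_usd)"},
        "dimensions": {
            "region": "region",
            "month": "DATE_TRUNC('month', ordered_at)",
        },
    }
}

def compile_query(entity: str, metric: str, dimensions: list[str]) -> str:
    spec = SEMANTIC_LAYER[entity]
    dims = [spec["dimensions"][d] for d in dimensions]
    select = ", ".join(dims + [f'{spec["metrics"][metric]} AS {metric}'])
    group_by = f" GROUP BY {', '.join(dims)}" if dims else ""
    return f'SELECT {select} FROM {spec["table"]}{group_by}'

# Two variant queries, one definition of revenue:
print(compile_query("orders", "revenue", ["region"]))
print(compile_query("orders", "revenue", ["region", "month"]))
```

Without something like this, each analyst re-types `SUM(amount_usd)` (or their own variant of it) in every query, which is exactly the drawback David describes.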
Semantic Layer: the backbone of AI-powered data experiences ft. Artyom Keydunov
"Admitting that it cannot answer the question is much better than trying to give you the wrong question."
In this session, Artyom, the co-founder and CEO of Cube, a semantic layer company, discusses the role of the semantic layer in building AI-powered data experiences and applications. He shares his experience starting a chatbot company and emphasizes the need for a semantic layer to bridge the gap between natural language queries and database queries.
If you’re considering how your team can best leverage LLMs in your data tooling, be sure to watch this session to learn more about the importance of first building a semantic layer.
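As a hypothetical illustration of both points, the bridging and the refusal from the quote above: rather than letting an LLM free-generate SQL, the application resolves the user’s question against the semantic layer’s metric catalog and declines to answer when nothing matches.

```python
# Toy sketch: resolve a natural-language question against a semantic-layer
# catalog, and refuse rather than guess when no defined metric matches.

CATALOG = {  # hypothetical metric catalog exposed by the semantic layer
    "revenue": "SELECT SUM(amount_usd) FROM analytics.orders",
    "active_users": "SELECT COUNT(DISTINCT user_id) FROM analytics.events",
}

def answer(question: str) -> str:
    matches = [m for m in CATALOG if m.replace("_", " ") in question.lower()]
    if not matches:
        # Admitting we can't answer beats hallucinating a plausible query.
        return "I can't answer that: no matching metric in the semantic layer."
    return CATALOG[matches[0]]

print(answer("What was our revenue last quarter?"))
print(answer("What's our profit margin?"))
```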
Small & Slow: When its right to do all the wrong things ft. Jerrie Kumalah and Taylor Brownlow
“There's an opportunity to step back and think about a bigger paradigm shift” (towards small and slow)
In this session, Jerrie Kumalah and Taylor Brownlow share their viewpoints on the concepts of "small and slow" analytics, in a world that is typically pushing “big and fast”. They highlight the challenges of information overload, burnout, and the fear of missing out in the fast-paced data industry. As a solution, they propose the idea of leveraging smaller communities as spaces for meaningful collaboration, and slow analytics as a paradigm shift that encourages experimentation, creativity, and a focus on outcomes rather than speed. They emphasize the importance of breaking away from rigid processes, fostering communication, and learning from mistakes.
Communication as a means of better data governance ft. João Vitor de Camargo
"Bridging the gap between data teams and external stakeholders to better understand how data is created, and processed–it's a communication problem."
In this session, titled “Communication as a means of better data governance”, João Vitor (JV) of Dialogue discusses the importance of communication in data governance, particularly in a rapidly growing organization like Dialogue. He emphasizes the need for a clear framework and clear ownership of key metrics to bridge the gap between data teams and external stakeholders. JV offers suggestions for how documentation can make metrics accessible to everyone and democratize data usage within the company. Involving metric owners lets them provide context, improve data quality, and share insights with the rest of the organization. JV's approach enhances data governance and communication, ultimately improving data reliability and quality.
Full Conference Videos