By: John Loury & Recia Roopnarine
Author’s note… Being mostly Italian and a lover of good food (so, yeah, Italian), I like to think of building a data model as making a really great pasta sauce: you need to gather quality ingredients and take the time and care to combine them in the right way to get the best results.
Whatever your company does, you’re probably sitting on a lot of data — with more coming in every day. You know you need to understand it and organize it into a flexible, functional data model to answer next-level questions that will allow your organization to excel in your industry. In this article, we’ll run through the vital elements you need to keep in mind to build a data model that will work for you, grow as your needs do, and help you uncover game-changing intelligence.
Where is your data and what are you doing with it?
Since you’re reading this article, it’s safe to say that you appreciate the power of data. So before we go any further, it’s worth taking a moment to think about your data landscape: what you already know, and which questions you need answered before you can build your data model. (This is like deciding what ingredients are going into your sauce.)
So, where’s all your data located? Do you have a data warehouse, a data lake, spreadsheets? Is it all in the cloud? Are some datasets in the cloud and others stored on local machines? Who has access to these different datasets? Where does the data in them come from, and who maintains those systems?
As mentioned above, your organization is probably generating more data than ever before, from a growing number of sources. If you know for a fact that all the data you need for your model is in one place (say, a data warehouse or data lake), that can be very convenient: the hard part then becomes sorting through it, and there are tools for that. Otherwise, before you can begin building your model, you need to understand where all your data lives. To find out, you may need to talk to your team’s data person (they can go by a lot of different titles) or, if you’re an executive or department head reading this, your chief data officer (CDO).
Once you get your datasets and sources in line, you can start cooking that delicious pasta sauce — that is, your data model!
A note about questions
This whole article is based on the supposition that you already know at least some of the questions you want to answer with your analytics tools and your amazing soon-to-be-built data model. However, if that’s not the case, you definitely want to start building your list of questions as you gather all your datasets in the previous step. It’s a simple process (sit down with your stakeholders, talk about their business questions today and possibilities for tomorrow), but it can be a time-consuming one, depending on how many people you need to talk to.
Simple is beautiful — and useful
Data can only bring value to your company when it’s organized and in a usable state. Ironically, having too much data can be just as bad as not having enough data, or not having the right data. You can have a robust cloud data lake brimming with every piece of information that your company produces, touches, or otherwise needs to know, but until you clean and organize it, it’s not going to be fully useful to you and the other stakeholders.
It’s the same when you’re making sauce: you don’t just buy a bunch of ingredients, throw them in a pot, and hope for the best. You’ve got to rinse the basil and toss out the janky-looking leaves, peel and mince the garlic (really thin), open your cans of San Marzano or Roma tomatoes (accept no substitutes) … then you can start cooking!
Cleaning and organizing your data (also called “data preparation”) is a similar, vital process: you can’t just start querying your massive data warehouse and trust that you’re going to get meaningful answers. Datasets can hold conflicting information in incompatible formats, unnecessary duplicate entries, or other discrepancies that can compromise the quality and reliability of the insights you eventually hope to pull from your data model. The right system (which can include automated tools or programming languages like SQL, Python, and R) will deduplicate records, put all dates and times into the same format, and perform other tasks that help ensure that, when you finally start querying your data model, the answers you get accurately represent reality!
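As an illustration of the kind of preparation described above, here is a minimal Python sketch (standard library only) that converts dates from mixed formats into ISO 8601 and then drops duplicate rows. The field names (`customer`, `order_date`, `amount`), the list of accepted date formats, and the rule that two rows are duplicates when customer, date, and amount all match are assumptions made for this example; in practice you might do the same work in SQL, R, or a dedicated data preparation tool.

```python
from datetime import datetime

# Hypothetical set of date formats seen across sources (an assumption for this sketch).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y")


def normalize_date(raw: str) -> str:
    """Parse a date string in any of DATE_FORMATS and return it as ISO 8601 (YYYY-MM-DD)."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"Unrecognized date format: {raw!r}")


def clean_records(records):
    """Normalize names, dates, and amounts, then drop duplicates, keeping the first occurrence."""
    seen = set()
    cleaned = []
    for rec in records:
        row = {
            "customer": rec["customer"].strip().title(),  # " acme corp" -> "Acme Corp"
            "order_date": normalize_date(rec["order_date"]),
            "amount": round(float(rec["amount"]), 2),  # "120.50" and "120.5" become equal
        }
        # Hypothetical duplicate rule: same customer, date, and amount.
        key = (row["customer"], row["order_date"], row["amount"])
        if key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned


raw = [
    {"customer": "acme corp", "order_date": "2023-03-05", "amount": "120.50"},
    {"customer": "Acme Corp ", "order_date": "03/05/2023", "amount": "120.5"},
    {"customer": "Bolt Ltd", "order_date": "7 Mar 2023", "amount": "80"},
]

for row in clean_records(raw):
    print(row)
# {'customer': 'Acme Corp', 'order_date': '2023-03-05', 'amount': 120.5}
# {'customer': 'Bolt Ltd', 'order_date': '2023-03-07', 'amount': 80.0}
```

Note that `03/05/2023` is read as month/day/year here. An ambiguous format like that is exactly the kind of thing to settle with the owners of each source before you automate the cleanup.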
Our director of BI & Analytics, Eloy Meira, likes to remind us, “Anyone can make something difficult. The real challenge is making something simple.”
Properly unifying, cleaning, and validating your data will help ensure that your analyses and conclusions will be trustworthy. So once you’ve woven all your datasets and sources together, you and your data team need to clean and prepare your data as you build your model.
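To make “unifying and validating” a little more concrete, the sketch below joins orders to customers on a shared key and records every row that breaks a rule instead of silently dropping it. The two sources, their fields, and the two validation rules (every order must reference a known customer, and amounts must not be negative) are hypothetical; your own rules will come from the business questions you gathered earlier.

```python
# Hypothetical sources: a CRM export and an order-system export.
customers = [
    {"customer_id": 1, "name": "Acme Corp", "region": "EMEA"},
    {"customer_id": 2, "name": "Bolt Ltd", "region": "APAC"},
]
orders = [
    {"order_id": 101, "customer_id": 1, "amount": 120.5},
    {"order_id": 102, "customer_id": 2, "amount": -15.0},
    {"order_id": 103, "customer_id": 3, "amount": 42.0},
]


def unify_and_validate(customers, orders):
    """Join each order to its customer; collect rule violations rather than discarding them quietly."""
    by_id = {c["customer_id"]: c for c in customers}  # lookup table for the join
    unified, issues = [], []
    for order in orders:
        customer = by_id.get(order["customer_id"])
        if customer is None:
            # Rule 1 (hypothetical): every order must reference a known customer.
            issues.append((order["order_id"], "unknown customer_id"))
            continue
        if order["amount"] < 0:
            # Rule 2 (hypothetical): order amounts must not be negative.
            issues.append((order["order_id"], "negative amount"))
            continue
        unified.append({**order, "customer_name": customer["name"], "region": customer["region"]})
    return unified, issues


unified, issues = unify_and_validate(customers, orders)
print(unified)
# [{'order_id': 101, 'customer_id': 1, 'amount': 120.5, 'customer_name': 'Acme Corp', 'region': 'EMEA'}]
print(issues)
# [(102, 'negative amount'), (103, 'unknown customer_id')]
```

Keeping a list of issues, rather than discarding bad rows, gives your data team something concrete to take back to the people who maintain the source systems.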
Better data, better model, better results
This is why a data model is like pasta sauce: to get quality results, you can’t just mix a bunch of random stuff in a pot and hope it works out. You have to choose the best ingredients and bring them together in an artful way to create a balanced, superior flavor. What you put into a dish (or a data model) shapes the results. When you care about the ingredients and truly understand how the dish is made, you’ll create the tastiest sauce.
It’s the same with data. When you understand the subject area that you want to analyze and then combine, clean, and organize your data, you’re on your way to building a better data model. These very important steps will ensure you can come up with meaningful insights and power-packed visualizations, which help business leaders and other stakeholders do their jobs better and stay enthusiastic about the power of data.
A great sauce can elevate a simple plate of pasta; similarly, a strong data model is the foundation of great BI. Building a great model and delivering game-changing insights is how you encourage adoption across your organization, leaving your users hungry for more.