If you've been following along in this introductory series on Numetric Warehouse, you'll have already learned how to bring data into Warehouse, as well as how to set up powerful, automatic cleaning and transformations to prepare your data for use throughout your organization.
The final step in the data integration process is to supply Numetric with the information it needs to connect all of your data together. Don't worry! Unlike in other data tools, you won't be building a database schema. We've created a clean, minimalistic experience that is so quick, you'll wonder when the real work is going to start.
Numetric Dictionary has two key purposes. First, it allows you to establish relationships among your Warehouse Tables. Second, and equally important, Dictionary allows you to establish (and publish) a unified vocabulary surrounding your organization's data. Let's look at each of these in turn.
Defining Relationships Among Warehouse Tables
At Numetric, we refuse to treat Warehouse like a database just because that's where most of the data originated. Instead, we follow a simple "dictionary" paradigm to create understanding around your data. The process looks a little something like this:
- Define a set of terms that describe key aspects of your data. Examples include "Customer," "Product," "Order," or "Visitor."
- For each term, identify all the places in your data where that term is referenced. For example, the "Product" term is obviously referenced in the product catalog, but it will also show up in inventory data, in sales data, in web traffic data, and potentially several other places. Dictionary makes it very easy to identify all of those alternate locations and link them to the "Product" term.
- Once you have established your terms and identified where they exist in your Warehouse Tables, add key metadata about each term, including a description, its origin, its data type, and even clean examples of what it should look like. This results in a standardized set of "data definitions" that can be shared among the data curation team, as well as with the broader business user community throughout the organization.
All of this happens in a pleasing, simple interface, and you won't see a single ER diagram or "spiderweb" database schema with relationship connection lines everywhere.
Ready to see it in action? Let's do it.
To create a term, simply navigate to the Dictionary tab within Warehouse and click on "New Term" in the upper right.
First, provide a name for the term you're creating. You'll then be presented with the list of primary keys that are not yet associated with a Dictionary term. For example, if I have a table containing all of the information about Manufacturers of the products in inventory, but I have not yet created a "Manufacturer" term in Dictionary, then that table's primary key will show up as a candidate for definition. This also means that the list of available primary keys will become shorter over time as you add definitions to your Dictionary.
(Pro tip: if your list of primary keys is long, you can use the filter bar to quickly search for the key you're seeking.)
Once you've found the key, you can select it from the list. Select "Create Term" when you're finished.
Congratulations! You're now looking at your newly created term. If this term doesn't show up elsewhere in your Warehouse, then you can simply add a description (if desired), perhaps add some clean examples (if desired), and you're done.
It's more likely that you'll be adding terms that exist in multiple places in your Warehouse. To identify these other "appearances," you can click "Add Columns" in the upper right.
You are now presented with a list of all other columns in your data warehouse that haven't yet been identified as a part of another term. This is usually a pretty long list, so you'll want to use the filter search bar along the top to quickly find the column(s) in other tables that identify the term.
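To make the idea behind that candidate list concrete, here is a minimal Python sketch. It is not Numetric's implementation or API; the table names, the column names, and the `candidate_columns` helper are all hypothetical, invented for illustration. It simply takes every column, drops the ones already claimed by a term, and narrows the rest with a case-insensitive search.

```python
# Hypothetical illustration only; this is not Numetric's API.
# Columns are written as (table, column) pairs.
all_columns = [
    ("Manufacturers", "ManufacturerID"),
    ("Products_US", "ManufacturerID"),
    ("Products_EU", "ManufacturerID"),
    ("Products_US", "ProductID"),
    ("Orders", "CustomerID"),
]

# Columns that have already been linked to some Dictionary term.
assigned = {("Manufacturers", "ManufacturerID")}


def candidate_columns(columns, assigned, search=""):
    """Return unassigned columns whose name contains the search text."""
    needle = search.lower()
    return [
        (table, column)
        for table, column in columns
        if (table, column) not in assigned and needle in column.lower()
    ]


matches = candidate_columns(all_columns, assigned, search="manufacturer")
print(matches)
```

With an empty search string the helper returns every unassigned column, which is the "pretty long list" you start from before filtering.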
Using the manufacturer example, I have searched for "Manufacturer" to find other references to the Manufacturer term, and I have selected two additional columns (from a couple of different products tables elsewhere in Warehouse).
Note: Though the example I'm using happens to have product tables that use the same column name (ManufacturerID), this is not required (nor very common in data from separate systems). Those other columns could be named "manufacturer" or "mfctr_name" or something else, and I could still associate them with the Manufacturer term.
And that's it! We've now created a standard representation of Manufacturer and identified all the places in Warehouse where it is referenced.
If you're a database person, you'll recognize that what we've done here is identify the main, defining table for a key piece of data, and then identify the foreign keys that point to that defining table. But you'll also (hopefully) recognize that we aren't forcing you to think about this in database terms. There's no reason to complicate this any further.
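For readers who do like to think in database terms, the structure we just built can be pictured with a small Python sketch. This is only a mental model under my own assumptions, not Numetric's actual schema or API: the `Term` class, its field names, and the table names are hypothetical. It shows one defining column (the primary key) plus any number of linked columns (the foreign keys), which don't need to share a name.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """Hypothetical model of a Dictionary term; not Numetric's actual schema."""
    name: str
    defining_column: tuple  # (table, column) acting as the primary key
    appearances: list = field(default_factory=list)  # (table, column) foreign keys
    description: str = ""

    def add_column(self, table, column):
        """Link another column to this term, whatever it happens to be named."""
        self.appearances.append((table, column))


manufacturer = Term(
    name="Manufacturer",
    defining_column=("Manufacturers", "ManufacturerID"),
    description="A company that makes the products in inventory.",
)
manufacturer.add_column("Products_US", "ManufacturerID")
manufacturer.add_column("Products_EU", "manufacturer")  # different name, same term

# Each linked column points back at the defining table.
for table, column in manufacturer.appearances:
    print(f"{table}.{column} -> {manufacturer.defining_column[0]}")
```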
Did I Just Join My Warehouse Tables?
I want to point out an important distinction about what we've just done in Dictionary. While we have identified relationships among our Warehouse tables, we have not actually joined anything together to create a merged set of data.
This is important. On the Numetric platform, we have intentionally separated the cleaning, transformation, and organization activities (all of which happen in Warehouse) from the actual joining of data into a usable, analyzable form. The reason? By keeping data in its "pure," separated form in Warehouse, we allow the data curation team to do all of the heavy lifting in organizing and managing their organization's data, and we believe they should only have to do that once.
When it comes time to answer business questions, a purpose-built Dataset can be assembled that provides an answer to that question. With relationships pre-defined in Dictionary, anyone (yes, even business users) can easily join data together and then build a Workbook to dig into the data and get answers quickly. These joining activities will likely be repeated in many ways and with many unique combinations over time, but none of those future Dataset assembly activities will require the data curators to drop everything and put together the right view of the data.
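To illustrate why a pre-defined relationship makes later joins easy, here is a minimal Python sketch. It is an assumption-laden illustration, not how Numetric assembles Datasets: the sample rows, the `relationship` dictionary, and the `join_on_term` helper are all hypothetical. The point is that once the link between two columns has been recorded, the join itself needs no further decisions from the person doing it.

```python
# Hypothetical illustration only; this is not Numetric's API or data.
manufacturers = [
    {"ManufacturerID": 1, "Name": "Acme"},
    {"ManufacturerID": 2, "Name": "Globex"},
]
products = [
    {"ProductID": 10, "manufacturer": 1, "Title": "Anvil"},
    {"ProductID": 11, "manufacturer": 2, "Title": "Widget"},
    {"ProductID": 12, "manufacturer": 1, "Title": "Rocket"},
]

# The relationship recorded once, ahead of time: which column in each table
# refers to the "Manufacturer" term.
relationship = {"defining_key": "ManufacturerID", "foreign_key": "manufacturer"}


def join_on_term(defining_rows, referencing_rows, relationship):
    """Attach the matching defining row to each referencing row (inner join)."""
    lookup = {row[relationship["defining_key"]]: row for row in defining_rows}
    joined = []
    for row in referencing_rows:
        match = lookup.get(row[relationship["foreign_key"]])
        if match is not None:
            joined.append({**match, **row})
    return joined


dataset = join_on_term(manufacturers, products, relationship)
for row in dataset:
    print(row["Title"], "is made by", row["Name"])
```

The same recorded relationship could be reused for any other table that references the term, which is the "do the heavy lifting once" idea in miniature.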
We think it's the right way to do this. And we hope you agree.