Numetric Warehouse represents a fairly large departure from the typical "data warehouse" model found on other business intelligence platforms. You may therefore be interested in a quick overview of how Warehouse works and why we designed it this way.
In case you'd rather not read the full description below, here's the tl;dr version: Numetric Warehouse separates data import and cleaning from the process of analysis. This significantly reduces the burden on data managers in the organization and allows average (non-technical) business users to create their own ad-hoc Datasets and accompanying Workbooks. Pretty cool, right?
And, if you're feeling ambitious (or just curious), here's the full story:
The Old Way
The ideas surrounding data integration, ETL operations, data cleaning, and data warehousing are not new, and we're not trying to reinvent the wheel. But the traditional way that business intelligence tools implement a "data warehouse" is very inefficient, and still leaves data curators with a lot of manual work.
Here's what the process looks like on other BI platforms:
- A business user has a question. ("What did my customers buy in regions 2 and 3 last quarter?")
- This question is handed off to a data analyst, who decides what data sources will be required to answer that question (say, the customers, sales, and products data sources).
- The analyst then pulls the appropriate data from those systems, either directly or, if her tool supplies data connectors of some sort, through those connectors.
- The analyst then merges these datasets together and does a bit of cleanup to make the merged dataset more useful or understandable.
- This merged, cleaned dataset supports some sort of report or visualization that summarizes the data in a way that helps the business user understand the answer to the original question.
From a data management perspective, the approach outlined above presents a few problems:
- The data analyst has to repeat the data fetching, merging, and cleaning procedures for every report (or set of related reports) that she puts together. Some tools streamline portions of this process, but the overall theme tends to stay consistent: lots of repetition and wasted time and effort.
- This process makes data analysis heavily dependent on the data analyst or data curation team, leading to delayed data deliverables and overworked data people.
- Use of data from various data sources tends to be haphazard and inconsistent across different analysts and over time. This makes it hard to remember what data is where, and what should be considered "authoritative truth."
For many analysts and their employers, this has become the norm, and businesses have come to accept the inherent inefficiency in this process.
The Numetric Way
As you might guess from all of that setup, we at Numetric think about this process differently, and we've built our tools around that alternative perspective. In short, our approach aims at one important goal: separating the data import and cleaning process from the process of assembling business-relevant data assets that answer business questions. We call this "two-stage data integration," and it looks a little something like this:
Stage One: Data Import, Cleanup, and Organization
- Data curators connect to and import data from every source likely to hold data relevant to anyone in the business. (These sources can be modified later if something is missed or if new systems or data sources need to be added.)
- Data curators establish a set of cleaning and transformation rules that are applied to the data at the point of entry into the Numetric platform. Note that this happens before anyone has decided whether or how the data will be used in a report or visualization. Note also that, with this rule-based approach, the cleaning and transformations need to be set up only once; they are then applied every time new data comes in from that source. Again, changes can be made later as needed, but these rules are intended to be "set it and forget it." After data passes through the transformation and cleaning process on its way into Warehouse, each set of data lives in its own clean, clarified Warehouse table.
- Using Numetric Dictionary, data curators then establish relationships among these cleaned Warehouse tables. Note that this process does not join any data into combined datasets. (Joining happens during Stage Two, when a join becomes necessary to answer a specific business question.) Establishing relationships in Dictionary merely records the relationships inherent in the data, marking the tables as "joinable."
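To make the two Stage One mechanisms more concrete, here is a minimal Python sketch: cleaning rules that are defined once and applied to every incoming batch, and a dictionary that records joinable relationships without joining anything. This is a hypothetical illustration, not how Numetric implements Warehouse or Dictionary; every name in it (`WarehouseTable`, `Relationship`, `DataDictionary`, and the sample rules `trim_strings` and `region_as_int`) is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types: a row is a plain dict, and a rule transforms one row.
Row = Dict[str, object]
Rule = Callable[[Row], Row]


def trim_strings(row: Row) -> Row:
    """Example rule: strip surrounding whitespace from text fields."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}


def region_as_int(row: Row) -> Row:
    """Example rule: store region codes as integers."""
    out = dict(row)
    if out.get("region") is not None:
        out["region"] = int(out["region"])
    return out


@dataclass
class WarehouseTable:
    name: str
    rules: List[Rule]  # defined once by a data curator
    rows: List[Row] = field(default_factory=list)

    def ingest(self, batch: List[Row]) -> None:
        # Every incoming batch passes through the same rules, in order,
        # so nobody has to re-clean the data by hand for each report.
        for raw in batch:
            row = raw
            for rule in self.rules:
                row = rule(row)
            self.rows.append(row)


@dataclass(frozen=True)
class Relationship:
    # Hypothetical: "left_table.left_key matches right_table.right_key".
    left_table: str
    left_key: str
    right_table: str
    right_key: str


class DataDictionary:
    """Records which tables are joinable; it never joins any data itself."""

    def __init__(self) -> None:
        self._relationships: List[Relationship] = []

    def relate(self, rel: Relationship) -> None:
        self._relationships.append(rel)

    def joinable_from(self, table: str) -> List[Relationship]:
        return [r for r in self._relationships if r.left_table == table]


# Usage: two batches arrive at different times; the same rules clean both.
customers = WarehouseTable("customers", [trim_strings, region_as_int])
customers.ingest([{"customer_id": 1, "name": " Acme ", "region": "2"}])
customers.ingest([{"customer_id": 2, "name": "Globex  ", "region": " 3"}])

dictionary = DataDictionary()
dictionary.relate(Relationship("orders", "customer_id", "customers", "customer_id"))
```

Keeping the rules as an ordered list of plain functions means a curator can change one rule later and every future batch picks up the change, while the relationship records stay pure metadata until someone actually needs a join.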
At the "end" of Stage One, the data curation team has turned messy data from multiple disparate sources into clean, understandable, reusable data sources that can be made available to the broader organization.
Stage Two: Ad-Hoc Dataset Assembly
Now that your data is cleaned and its relationships clarified, it can be used (and reused) many times over to answer business-relevant questions. Importantly, because the data curation team has already done its job in Warehouse, this ad-hoc assembly does not require much data expertise. The average business user or line manager can assemble the Dataset they need to answer their questions.
- The user first selects the primary table of information for the Dataset. This will typically be something like "Customers" or "Orders."
- To this primary table, the user can add (join) additional tables from the Warehouse. Because Dictionary designates the "joinable" relationships, assembling a multi-table Dataset is very simple; only available relationships are presented to the user. Because of Numetric's proprietary storage technology, Datasets assembled here can be incredibly large (even hundreds of millions of rows), with no noticeable compromises in performance.
- After the desired tables have been added to the Dataset, the user can, if desired, perform a set of transformations or other adjustments to ensure that the final Dataset contains just the right data in just the right format.
- Finally, the user can, in just minutes, create a Workbook (similar to a visualization or report) that summarizes the information in the assembled Dataset.
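The Stage Two steps can be sketched in the same hypothetical style. In the Python below, `assemble_dataset` and `Relationship` are invented names, not Numetric's API, and the join is a simple in-memory left join that assumes each added table has at most one row per key (a many-to-one lookup such as Orders to Customers). It illustrates the core idea: the user picks a primary table, can add only tables that a declared relationship reaches, and can then apply an adjustment, here a filter echoing the earlier question about regions 2 and 3.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Row = Dict[str, object]


@dataclass(frozen=True)
class Relationship:
    # Hypothetical: "left_table.left_key matches right_table.right_key".
    left_table: str
    left_key: str
    right_table: str
    right_key: str


def assemble_dataset(
    primary: str,
    tables: Dict[str, List[Row]],
    relationships: List[Relationship],
    add: List[str],
) -> List[Row]:
    """Left-join `primary` to each table in `add`, using only declared relationships.

    Assumes each added table has at most one row per key (many-to-one).
    Output columns are prefixed with their table name.
    """
    available = {r.right_table: r for r in relationships if r.left_table == primary}
    for name in add:
        if name not in available:
            raise ValueError(f"{name!r} is not joinable from {primary!r}")

    # Index each related table by its join key once, instead of scanning per row.
    indexes = {
        name: {row[available[name].right_key]: row for row in tables[name]}
        for name in add
    }
    columns = {name: list(tables[name][0]) if tables[name] else [] for name in add}

    dataset: List[Row] = []
    for row in tables[primary]:
        merged: Row = {f"{primary}.{k}": v for k, v in row.items()}
        for name in add:
            match: Optional[Row] = indexes[name].get(row[available[name].left_key])
            for col in columns[name]:
                # Left join: keep the primary row even when no match exists.
                merged[f"{name}.{col}"] = match.get(col) if match else None
        dataset.append(merged)
    return dataset


# Usage with made-up sample data.
tables = {
    "orders": [
        {"order_id": 10, "customer_id": 1, "amount": 50},
        {"order_id": 11, "customer_id": 2, "amount": 75},
        {"order_id": 12, "customer_id": 9, "amount": 20},
    ],
    "customers": [
        {"customer_id": 1, "region": 2},
        {"customer_id": 2, "region": 4},
    ],
}
rels = [Relationship("orders", "customer_id", "customers", "customer_id")]

orders = assemble_dataset("orders", tables, rels, add=["customers"])
# Optional post-join adjustment: keep only regions 2 and 3.
in_scope = [r for r in orders if r["customers.region"] in (2, 3)]
```

A real warehouse would likely push joins down into its storage engine rather than loop over Python dicts; the point of the sketch is only that the user chooses which tables to combine, while the curator-declared relationships determine how they are combined.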
Benefits of Two-Stage Data Integration
There are numerous benefits to doing data integration in this way, but I'll highlight just a few:
- Data import, transformation, and cleaning happen just once. You shouldn't have to re-clean your Salesforce data every time it comes in. Because cleaning and transformation are rule-based, once you set up those rules, you should never have to touch them again. And we think that's pretty cool.
- Dictionary provides a centralized, comprehensive, and standardized "data vocabulary" that can be shared across the whole organization. Data managers and business users alike are provided with a go-to resource where they can see how data is supposed to look, where it's coming from, and what data is available to answer critical business questions.
- Because all of the "data-savvy" operations are handled upfront in Warehouse (probably by the data team), the process of assembling a purpose-built Dataset is very straightforward. This means that the average business user or line manager can build their own Datasets and Workbooks to answer their own questions, without having to rely heavily on the data team to help.