With your data imported, cleaned, and organized in Numetric, you're now ready to build customized Datasets that will quickly help you find the answers you and your coworkers are looking for.
About Numetric Datasets
We've already discussed the import, transformation, and organization of your data in Tables. When your data is carefully organized in Dictionary, you can easily pull data from anywhere in Numetric, merging data into a larger set that contains detailed information. These larger, merged sets of data are intended for exploration and analysis, and we call them Datasets (it is important to note however, that it is possible for a Dataset to contain only one Table).
Numetric stores and indexes your Datasets as separate, flat, tabular files (with basic rows and columns). Tabular files look like this:
Numetric Datasets are unique in that they can be very large (hundreds of millions of rows), while remaining unbelievably fast and flexible, thanks to our proprietary backend technology, Lightning Storage. The specific details of how this works are something of a family secret here at Numetric, but you should know that Numetric Datasets are easy to understand and can be used by everyone in your organization, data background or not. You might think of them as a really cool version of an Excel spreadsheet.
Datasets are the key to the fast, flexible visualizations and analysis found in our Workbooks. Each metric in a Workbook draws its data from a single Dataset. So if you'd like to know how to build powerful Workbooks, you'll have to first understand how to assemble your own Datasets.
Tables vs. Datasets
Datasets and Tables have some similarities, but their purposes are quite different. Both are stored in tabular format (rows and columns), can be modified with the same transformations, and can contain the same or similar information.
One of the key differences between Tables and Datasets is how they are stored. Tables are stored in a traditional SQL database, which are cheap to operate, and slow to search. Datasets are stored on our premium Lightning Storage, which allows for fast, searchable results. Because Tables are cheap and easy for us to store for you, your Numetric subscription includes unlimited storage for Tables. In contrast, your contract will limit the amount of Lightning Storage available to you for Datasets.
Additionally, Tables contain only information from a single data source (usually a corresponding external database table), and you cannot compile information from more than one data source into a single Table. As a result, your transformations are limited to operating exclusively within that Table, and cannot reference or modify any other Tables within Numetric. Datasets can be assembled from data across several Tables and Sources. Because of this, the transformations that take place within Datasets can reference and modify information across various Tables.
The final difference between Tables and Datasets is what they are used for. Tables are intended as a staging area, from which data is pulled to feed into a Dataset. Datasets, the real workhorses of the Numetric platform, compile and feed information into Workbooks. Additionally, because Tables serve as a sort of staging area for data, they are usable across the entire organization. A single Table can be used in multiple Datasets for multiple different user groups. Datasets, on the other hand, are more targeted at a specific domain or group in the business.
Assembling a Dataset
If your Dictionary Terms are set up correctly, your Tables are already connected with Dictionary Terms. This makes the process of Dataset assembly quite straightforward. If your Dictionary Terms are not properly established between Tables, you will not be able to build an effective Dataset.
Determining the Primary Table
To begin building a Dataset, you will need to identify the Primary Table of your Dataset. This is the main Table that defines the structure of your Dataset. The granularity your Dataset (i.e., what each row in the Dataset represents) is determined by the Primary Table you select. It is generally a good practice to select a Primary Table that has the most detailed, granular level related to your Dataset. If you have elements of three Tables you would like to include in your Dataset, your Dataset would be dramatically different depending on which Table is selected as the Primary Table.
Let’s take a look at an example using the following three Tables:
If you were to set the Customer Table as your Primary Table, your Dataset would contain one row for each customer (the Primary Key of that Table being Customer ID). This may seem like a logical Primary Table, however, because each customer can place multiple orders there are multiple rows of order data from the Orders Table that will need to be displayed for each customer. Because this Dataset will only have one row for each Customer, this data will need to be combined or reduced in some way. This means we would lose a lot of order-specific information in our Dataset. For example, Numetric couldn’t display all of the order numbers, because there is only one row per customer, and one column in that row for order numbers.
If we were to set Service as the Primary Table, we would have one row for each service ticket number. This means that orders without a service ticket number wouldn’t have a place to appear in the Dataset. So, while it may have a little more detail and granularity than the Customer Table, there are still many limitations.
If we were to set Orders as the Primary Table, the Dataset would contain one row for each order placed. This means that the customer ID will show up in the Dataset each time they placed an order. Additionally, every service ticket will be displayed, as they are all associated with orders. Using this structure, we can view all of our customer information from the Customer Table, all of our order information from the Order Table, as well as nearly all of our service ticket information from the Service Table.
Adding Joined Tables
After identifying and selecting your Primary Table, you can add joined Tables to pull related data from other Tables. If your Dictionary Terms are setup correctly, Numetric should recommend Tables to join based on the relationships your Primary Table has established through your Dictionary Terms. Once you have determined which Table(s) to join, you can select which columns you would like to bring into the Dataset. You have to include the Primary Key of the joined Table, but you can choose whether or not to include any other column from that Table in the Dataset.
There are two types of joins available: a one to one join, and a one to many join. Note that Numetric automatically identifies the type of join required without any input required from the user.
One to one join
A one to one join occurs when there is a single Foreign Key key (the Primary Key from a Joined Table) associated with each primary key (the Primary Key from the Primary Table) in the Dataset. In our example below, because each order can only be placed by one customer there is only one Customer ID (Foreign Key) for each Order ID (Primary Key).
This is the simplest, and cleanest type of join in Numetric. All of the selected data from the joined Table will appear in the corresponding row of the Primary Table, nothing is lost, and everything lines up.
In our example above, because there is only one customer ID (Foreign Key) associated with each Order Number (Primary) it’s a one to one join.
One to many join
A one to many join occurs when there are multiple Foreign Keys (the Primary Key of the joined Table) related to each Primary Key (the Primary Key of the Primary Table) in the Dataset. In our example below, there is more than one service ticket (Foreign Key) associated with a single order (Primary Key).
This means that two rows of data (R-57122 and R-57235) need to be consolidated into one row of data. When this occurs, you need to tell Numetric what to do with the multiple rows of data associated with the joined Table. This is called an Aggregation.
While one to many joins are not as simple and clean as a one to one join, they can still work perfectly fine depending on your application.
After you have created your new Dataset, you can make any needed transformations, including creating new custom calculated fields. The process for transforming your Datasets is identical to transforming Tables. You can use these transformations to make your Dataset look and behave exactly as you would like.
Publish and Share
Once you have your Dataset all set up and good to go, you're ready to start exploring and analyzing. Before you or your coworkers can build a Workbook from the Dataset, it must be published. The process of publishing a Dataset can actually take quite a while (potentially hours, or even days) depending on the size (number of rows/columns) of the Dataset. During the publish process, the entire Dataset is indexed, and optimized so that the Workbooks referencing the Dataset can filter, and display data without any delay.
When building out any Dataset, it is essential to construct it with your end workbook in mind. Having a clear vision of what you want your Workbook to display will help shape the design of your Dataset. In fact, you may even find yourself coming back, adding data to or modifying your Dataset while you are building out a Workbook.