With your data imported, cleaned, and organized in Numetric Warehouse, you're ready to build customized Datasets that help you and your coworkers quickly find the answers you're looking for. If you're part of the data management team and haven't read the other articles in this series, you'll probably want to do that first. If you're just looking for guidance on how to quickly build Datasets from your organization's data, you've come to the right place.

About Numetric Datasets

First, a quick overview of Numetric's storage architecture. We've already discussed the import, transformation, and organization of your data in Warehouse tables. When your data is carefully organized in Dictionary, you can easily pull data from anywhere in your Warehouse, merging data into a larger set that contains really valuable information. These larger, merged sets of data are intended for exploration and analysis, and we call them Datasets.

In case you wanted to know, Numetric stores and indexes your Datasets as separate, "tabular" files (with basic rows and columns). Tabular files look like this:

Numetric Datasets can be very large (hundreds of millions of rows), and they are unbelievably fast and flexible, thanks to our proprietary backend technology, Lightning Storage. The specific details of how this works are something of a family secret here at Numetric, but you should know that Numetric Datasets are very easily understood and used by everyone in the organization, data background or not. You might think of them as a really cool version of a simple Excel spreadsheet.

Datasets are the technology underlying the fast, flexible visualization and analysis experience found in our Workbooks. Each metric on a Workbook draws its data from a single Dataset. So if you'd like to know how to build powerful Workbooks, you'll first have to understand how to assemble your own Datasets.

Assembling a Dataset

To build a Dataset, navigate to the "Datasets" screen in Numetric. (Note that if you're currently in Warehouse, you'll have to hover over the Numetric logo in the top left of the screen and switch to Numetric.)

On the Datasets screen, you'll see a list of the Datasets that you have created or been given access to. You can create a new Dataset by clicking on the "New Dataset" button in the top right of the screen.

Step 1: Select a Primary Table

The first step in Dataset assembly is to select the data you'd like to analyze. This data can come from several different tables, but you'll have to start with one "primary" table. This is usually the table that is most central to the question that you'd like to answer. 

For example, if you'd like to understand sales activities, your primary table will probably be something like "Purchases" or "Orders." In the example below, I've chosen to add the "Sales" table as the primary table for my Dataset.

Each time you add a table to a Dataset, you can choose which columns you'd like to include from that table, simply by checking or unchecking the boxes next to the column names. By default, all columns will be selected, but you can easily deselect columns that are irrelevant to your particular question. In the example below, I've chosen to include 4 columns, but I'm not interested in the tax column, so I've unchecked it.

Note that you can also search for a particular column (handy for tables with many columns). Note also that you'll see a preview of the data from that table so you can make sure that you're pulling in the desired data.
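If it helps to think about column selection in code terms, here is a minimal Python sketch of the same idea. It is only an analogy, not Numetric's implementation: the column names, the sample rows, and the select_columns helper are all hypothetical.

```python
# Hypothetical rows from a "Sales" primary table; the column names are assumed.
sales = [
    {"sale_id": 1, "customer_id": 10, "amount": 120.0, "date": "2017-03-01", "tax": 9.6},
    {"sale_id": 2, "customer_id": 11, "amount": 75.5, "date": "2017-03-02", "tax": 6.04},
]


def select_columns(rows, keep):
    """Return new rows containing only the columns named in `keep`."""
    return [{col: row[col] for col in keep} for row in rows]


# Equivalent to leaving four boxes checked and unchecking "tax".
selected = select_columns(sales, ["sale_id", "customer_id", "amount", "date"])

for row in selected:
    print(row)
```

The original rows are left untouched, much as unchecking a column in a Dataset does not change the underlying Warehouse table.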

Step 2: Add (Join) Additional Tables

Answering critical questions often requires data from many different sources, and bringing that data together is typically a primary source of difficulty on business intelligence platforms.

Joining data from another table is as simple as clicking the "+" sign to Add a Joined Table. A list of potential, joinable tables is presented, and you can make a selection.

The example above shows the simplest of joins, a "one-to-one" join, in which a single customer is associated with an individual sale. 
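Conceptually, a one-to-one join looks up a single matching row for each row in the primary table. The sketch below illustrates that in plain Python under stated assumptions: the tables, the customer_id key, and the join_one_to_one helper are hypothetical, and keeping unmatched primary rows is a choice made for this example, not a description of how Numetric behaves.

```python
# Hypothetical primary table ("Sales") and joined table ("Customers").
sales = [
    {"sale_id": 1, "customer_id": 10, "amount": 120.0},
    {"sale_id": 2, "customer_id": 11, "amount": 75.5},
]
customers = [
    {"customer_id": 10, "name": "Ada"},
    {"customer_id": 11, "name": "Grace"},
]


def join_one_to_one(primary, joined, key):
    """Attach the single matching `joined` row to each `primary` row."""
    lookup = {row[key]: row for row in joined}
    result = []
    for row in primary:
        match = lookup.get(row[key], {})  # no match: keep the primary row as-is
        result.append({**match, **row})   # primary values win on name clashes
    return result


sales_with_customers = join_one_to_one(sales, customers, "customer_id")

for row in sales_with_customers:
    print(row)
```

Each sale ends up with exactly one customer's columns attached, so the row count of the primary table does not change.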

Often tables are joined using what is called a "one-to-many" join. This type of join happens when there are many rows in one table associated with each individual row in another table. In the example below, there are many soccer matches in each FIFA league, so when I join the match and league tables, Numetric automatically uses a one-to-many join operation. 

Notice above that when a one-to-many join occurs, Numetric allows you to specify aggregations. Aggregations allow you to summarize values from the "many" side of the join as they are merged with the "one" side of the join. For example, I have chosen to summarize the goals scored in matches (the "many" side) for each league (the "one" side) by totaling them (using the Count aggregation method) and saving the total as a new column called "Total Goals."

Note that you can add multiple aggregations if there are several items you'd like to aggregate from the joined table. Additional examples from the image above might include average number of injuries per match, average match attendance, and so on.
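To make the idea of aggregating the "many" side concrete, here is a small Python sketch. Everything in it is an assumption for illustration: the league and match rows, the column names, and the aggregate_join helper. The total uses Python's built-in sum to add up goals, a count (len) is included alongside it to show the difference, and the average is a simple mean.

```python
from collections import defaultdict

# Hypothetical "one" side (leagues) and "many" side (matches).
leagues = [
    {"league_id": 1, "league": "League A"},
    {"league_id": 2, "league": "League B"},
]
matches = [
    {"match_id": 100, "league_id": 1, "goals": 3, "attendance": 40000},
    {"match_id": 101, "league_id": 1, "goals": 1, "attendance": 35000},
    {"match_id": 102, "league_id": 2, "goals": 4, "attendance": 50000},
]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def aggregate_join(one_rows, many_rows, key, aggregations):
    """Summarize `many_rows` per key and attach the results to `one_rows`.

    `aggregations` maps a new column name to (source column, function).
    """
    groups = defaultdict(list)
    for row in many_rows:
        groups[row[key]].append(row)

    result = []
    for row in one_rows:
        group = groups.get(row[key], [])
        out = dict(row)
        for new_col, (source_col, func) in aggregations.items():
            values = [m[source_col] for m in group]
            out[new_col] = func(values) if values else None
        result.append(out)
    return result


league_stats = aggregate_join(leagues, matches, "league_id", {
    "Total Goals": ("goals", sum),        # adds up goals across matches
    "Match Count": ("goals", len),        # counts matches, not goals
    "Average Attendance": ("attendance", mean),
})

for row in league_stats:
    print(row)
```

The result has one row per league, with each aggregation saved as its own new column, which mirrors adding several aggregations to a single joined table.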

Step 3: Apply Transformations

Once you have added all of the desired data to your Dataset, the next step is to apply transformations to the merged data. These transformations range from very simple (e.g., filtering out rows where a certain column is empty) to very complex (e.g., building formulas to derive calculated fields). For this tutorial series, however, I'm going to simply point you to the description of the transformations that we discussed in the Cleaning Warehouse Tables article, including Filters, If-Then transformations, and Formulas. The same set of transformations can be applied here during Dataset assembly, and they can be used to make your Dataset look and behave just as you'd like it to.
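For readers who think in code, the three kinds of transformation mentioned above can be sketched roughly as follows. This is an analogy in plain Python, not Numetric's transformation syntax: the rows, the column names, the 500 threshold, and the derived "size" and "total" columns are all made up for the example.

```python
# Hypothetical merged rows; the column names are assumed.
rows = [
    {"sale_id": 1, "amount": 120.0, "tax": 9.6, "region": "West"},
    {"sale_id": 2, "amount": 75.5, "tax": 6.04, "region": None},
    {"sale_id": 3, "amount": 600.0, "tax": 48.0, "region": "East"},
]

# Filter: drop rows where the "region" column is empty.
filtered = [r for r in rows if r["region"] not in (None, "")]

transformed = []
for r in filtered:
    new = dict(r)
    # If-Then: label each sale based on a condition (threshold is arbitrary).
    new["size"] = "Large" if r["amount"] >= 500 else "Standard"
    # Formula: derive a calculated field from existing columns.
    new["total"] = round(r["amount"] + r["tax"], 2)
    transformed.append(new)

for row in transformed:
    print(row)
```

The order matters here in the same way it would in any pipeline: the filter runs first, so the If-Then and Formula steps only see the rows that survive it.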

Step 4: Publish and Share

When you've assembled and shaped your Dataset to your heart's content, you're ready to start exploring and analyzing. Before you or your coworkers can build a Workbook from the Dataset data, it must be published. This is accomplished by clicking the "Publish" button in the upper right of the Dataset builder window. As with Warehouse tables, any changes that you make can either be Reverted or Published, and a small "Draft" notation will remain by the Dataset title until you do so.

Lastly, the Datasets you work so hard to assemble will be valuable to many of your coworkers. Numetric makes it easy to share Datasets with just a few clicks. Dataset sharing options are found under the "Dataset Properties" tab.

To make any asset (Warehouse Table, Dataset, or Workbook) available to another user or group, simply type in the name of the user or group, and choose whether they should have "edit" access (with which they will be able to modify the asset) or just "view" access (with which they will be able to use but not change the asset). You can also choose to share assets with all users of a certain role. For example, a customer data asset will probably be useful to pretty much everyone in the organization, so I have chosen to give everyone at Numetric "view" access to the table.
