We're excited to show you how easy it is to bring your data into Numetric and prepare it for exploration and analysis. We'll follow a two-step process: first we'll make sure the data you want to import is in the correct format, and then we'll walk you through creating your first dataset from it.
Formatting Your Files for Import
The file structure required for import is simple. There are just three requirements:
- The file is in tabular format. This means the data is arranged in columns and rows. The most recognizable example of this type of file is a comma-separated values (CSV) file, which is the file type we'll use for this quick start guide.
- The file's first line contains the column names.
- Each subsequent line contains a separate record or row of data.
With all three requirements met, your data should look something like this:
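As a hypothetical illustration of such a file (the column names and values below are invented, not taken from Numetric), here is a minimal Python sketch that holds a small CSV sample and checks it against the three requirements above. The helper name check_tabular is our own and is not part of any Numetric tool.

```python
import csv
import io

# Hypothetical sample data: column names and values are invented for illustration.
SAMPLE_CSV = """order_id,product_category,quantity,unit_price
1001,Widgets,4,2.50
1002,Gadgets,1,19.99
1003,Widgets,10,2.25
"""


def check_tabular(text):
    """Return (column_names, row_count); raise ValueError if the layout is off."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        raise ValueError("file is empty")
    header, records = rows[0], rows[1:]
    # Requirement: the first line contains the column names.
    if any(not name.strip() for name in header):
        raise ValueError("first line must contain a name for every column")
    # Requirement: each subsequent line is one record with one value per column.
    for line_no, record in enumerate(records, start=2):
        if len(record) != len(header):
            raise ValueError(
                f"line {line_no} has {len(record)} values, expected {len(header)}"
            )
    return header, len(records)


columns, row_count = check_tabular(SAMPLE_CSV)
print(columns, row_count)
```

To check your own file, you could read its contents into a string and pass that to check_tabular before uploading.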
Note: The import process we'll use for this quick start guide is a manual CSV file upload, and this requires the imported file to be smaller than 1 GB. Later, you'll learn how to use the Numetric API to bring in larger datasets with significantly more control, but for now, make sure the file you'll be using is under the 1 GB limit.
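If you'd like to confirm the size limit before uploading, a local check along these lines may help. This is a minimal Python sketch, not part of Numetric. It assumes "1 GB" means 10**9 bytes (the guide does not define the unit, so this is the stricter reading), and fits_manual_upload is a hypothetical helper name.

```python
import os
import tempfile

# Assumption: "1 GB" is read as 10**9 bytes; the guide does not define the unit.
MAX_UPLOAD_BYTES = 10**9


def fits_manual_upload(path, limit=MAX_UPLOAD_BYTES):
    """Return True when the file at `path` is smaller than the upload limit."""
    return os.path.getsize(path) < limit


# Demo with a tiny throwaway file; replace demo_path with the path to your own CSV.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as handle:
    handle.write(b"id,name\n1,example\n")
    demo_path = handle.name

print(fits_manual_upload(demo_path))
os.remove(demo_path)
```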
Importing Your Data
You're now ready to bring some data into Numetric and create your first dataset! Numetric supports several more advanced methods of data import, but this quick start guide will use the simplest approach: manually uploading a CSV file. (This tutorial will probably be most effective if you use your own dataset, but you can also download a small sample file here if you just want to experiment.)
To upload a CSV file, navigate to your "Datasets" screen and click the "New Dataset" button in the upper right of the screen.
In the "Create Dataset" screen, select CSV from the available options.
You can now give your new dataset a name and provide the CSV file for upload. This can be accomplished by dragging your file into the browser window where indicated, or by clicking to browse to the file on your hard drive.
Finally, click "Create Dataset" to complete the import process.
After a successful upload, Numetric will display its progress while it processes your data for the first time. It goes through each row to get an idea of what your dataset looks like. This may take a few minutes, depending on the size of the file you've just imported.
When the processing is complete, Numetric will know what your data looks like and will be able to display some summary information under the "Overview" tab.
Clean Things Up
The next step is to clean up your imported data and prepare it for use in a Workbook. Numetric provides a powerful set of functions that take the pain out of what is often the least enjoyable part of data analytics. An extensive discussion of these tools is a topic for another article, so for now we'll just give you a few highlights.
All of your cleaning activities will take place in the "Fields" tab. Here, along the left of the screen, you can see a list of the fields in your dataset. By clicking on a particular field, you can view and modify many of that field's attributes. For now, we'll focus on just two key attributes: the semantic display name (found at the top of the attribute pane) and the data type (found in the "Meta" section, just above the description).
We highlight these two attributes in this quick start guide because they matter most for building a good first Workbook in the next step of the guide.
First, the semantic display name is editable and lets you specify a more descriptive name than the one found in the imported data. For example, the data coming from your inventory system might have a field called "PROD_CAT_4," but that's not a very informative name, especially for others in your organization who won't recognize it. Instead, you could give the field a semantic name like "Product Category" so that you and others can recognize it more easily. If this would help clarify some of the fields in the dataset you just imported, go ahead and give those fields friendlier semantic names. This will make things easier later when you build your first Workbook.
Second, the field's data type attribute lets you specify the type of data stored in that field. Upon import, Numetric does its best to assign each field in your dataset its most likely data type, but the result is not always perfect. Sometimes it's hard to tell whether a field should be treated as a number (like quantity or weight) or as text (like CustomerID, for which arithmetic doesn't really make sense). It's usually a good idea to go through each field in a dataset you just imported, confirm that the data types are correct, and make any needed adjustments.
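To see why automatic type detection can go wrong, consider this minimal Python sketch. It is not Numetric's detection logic, which the guide does not describe. It is a deliberately naive guesser of our own (guess_type and the sample values are hypothetical) that labels a column as a number whenever every value parses as one.

```python
def guess_type(values):
    """Naively label a column 'number' if every non-empty value parses as a float."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "text"
    try:
        for v in non_empty:
            float(v)
    except ValueError:
        return "text"
    return "number"


# Hypothetical sample columns; the values are invented for illustration.
columns = {
    "quantity": ["4", "1", "10"],
    "CustomerID": ["100482", "100517", "100733"],
    "product_category": ["Widgets", "Gadgets", "Widgets"],
}

guesses = {name: guess_type(vals) for name, vals in columns.items()}
print(guesses)
```

The guesser labels CustomerID as a number because its values happen to be all digits, even though it is an identifier. This is the kind of case worth correcting by hand when you review data types.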
As mentioned, Numetric offers more (and more advanced) data cleaning capabilities, but we'll leave those for a separate, more detailed article. For this quick start guide, let's move on to the final step in the process: creating your first Workbook!