**The following formulas can be applied as a part of a transformation in either the warehouse or as a part of dataset assembly.**

## abs(x)

Returns the absolute value of (x). This formula only applies to numeric values and returns the (positive) distance of that number from zero.

*Syntax: abs({Column Name})*

Example: abs({cost})

## addDateTime(d, value1, units1, ..., valueN, unitsN)

Return d with the given value(s) (in specified units) added. Each value argument is an integer (can also be negative), and each corresponding unit argument specifies the units of time that will be added. Each units argument is a string with the following possible values: years, months, weeks, hours, minutes, seconds.

*Syntax: addDateTime({Column Name}, value1, units1, …, valueN, unitsN)*

Examples:

## ampm(time)

Returns “AM” or “PM” based on the time. This formula only works on a column with a time value measured in 24 hours instead of 12 hours. In order to apply this formula, create a new column that, through indexing, would be filled with either “AM” or “PM”, depending on the time stamp from the initial column.

*Syntax: ampm({Column name})*

Example: ampm({time})

## ceil(x)

Returns the smallest integer greater than, or equal to a given number. In other words, Ceil(x) is rounding up to the nearest whole number.

Syntax*: ceil({Column name})*

Example: ceil({avg_wkly_clnts})

Other useful examples:

When non-integer rounding is desired (like, for example, when rounding to two decimal places for currency fields), the rounding functions can be used in conjunction with multiplication and division to achieve the desired level of precision.

Rounding average price to two decimal places:

Rounding up: floor(ave_item_price*100)/100

Rounding down: ceil(ave_item_price*100)/100

## coalesce(arg1, … argN)

Returns the first non-NULL expression among its arguments. The coalesce formula is used when you need just one of multiple possible answers. Using the example coalesce formula below would return the first phone number that it encountered by searching the other columns ** in the specific order** the formula is written. The formula would first look if there was a business number, if not it looks for a cell number, if there is neither a business nor a cell, it would look for a home phone number. One of these results would be entered into a new column that you create.

*Syntax: coalesce({column name}, {column name}, {column name})*

Example: coalesce({bus_phone}, {home_phone}, {cell_phone})

## contains(value, array)

Return true if value is found in array.

contains is used to see whether a value is contained in a set of items. Contains can be used with any datatype, on an array of items of any size. The formula returns a boolean (true or false) value.

*Syntax: contains({value}, {array})*

Example: contains({err_status}, pack({order_result}, {shipment_result}))

## cos(x)

Return the cosine of x in radians.

*Syntax: cos({Column Name})*

Example: cos({trajectory})

## cosd(x)

Return the cosine of x in degrees.

*Syntax: cosd({Column Name})*

Example: cosd({trajectory})

## dateDiffSeconds(date1, date2)

Return the number of seconds between date1 and date2. dateDiffSeconds is used to create a new, calculated field that will tell you the length of a time period in between two dates. A good example would be for a sports venue. By calculating the time between when a ticket was purchased and when it was scanned at the gate, you can easily know how far in advance the average ticket for each event was purchased.

Use dateDiffSeconds by creating a new column and applying the formula using two different date columns. This will show the number of seconds between the two dates. If you want to show the number of hours or days or weeks, the formula can be appended with a little extra math as is shown below:

*Syntax: dateDiffSeconds({Date Column 1},{Date Column 2})*

Example: dateDiffSeconds({start},{end})/60

Other Useful Examples:

- To Get Minutes: dateDiffSeconds({Date Column 1},{Date Column 2})/60
- To Get Hours: (dateDiffSeconds({Date Column 1},{Date Column 2})/60)/60
- To Get Days: ((dateDiffSeconds({Date Column 1},{Date Column 2})/60)/60)/24
- To Get Weeks: (((dateDiffSeconds({Date Column 1},{Date Column 2})/60)/60)/24)/168

## dayOfMonth(date)

Returns the day of the month. This is valuable when looking at monthly cycles. A question like "How does your production team do at the beginning of the month compared to the end of the month when more orders start coming in?" is easily answered by grouping on the new "Day of Month" column that you'd create.

*Syntax: dayOfMonth({Date Column})*

Example: dayOfMonth({prod_date})

## dayOfWeek(date)

Returns the numeric day of the week (Monday=1 – Sunday=7). This formula is particularly valuable when normalizing the data based on day of the week rather than day of the month. A home improvement store, for example, could use this formula to get a better understanding of how sales are increased on Saturdays and Sundays compared to the rest of the week.

*Syntax: dayOfWeek({DateColumn}) *

*Example: dayOfWeek({prod_date})*

Side note: Once the data has been given a numerical value with the formula, you can create a "Value Mapping" rule in the data to have it return the names of the days of the week (Monday, Tuesday, Wednesday, etc.) rather than just the numbers.

## dayOfYear(date)

Returns the numeric day of the year. When comparing month to month, the dates aren’t always the best way to do it. Sometimes, you want to compare the first Monday of one month to the first Monday of the next month. In order to do this, the day of the week

*Syntax: dayOfYear({DateColumn}) *

Example: dayOfYear({prod_date})

## degrees(x)

Convert the angle x to degrees from radians.

*Syntax: degrees({Column Name})*

Example: degrees({angle_rad})

## Divide

Divide one column by another.

*Syntax:* *{Column 1} / {Column 2}*

Example: {completed}/{days}

## floor(x)

Returns the largest integer less than, or equal to a given number. In other words, Floor(x) is rounding down to the nearest number.

*Syntax: floor({Column name})*

Example: floor({avg_clients})

Other useful examples:

When non-integer rounding is desired (like, for example, when rounding to two decimal places for currency fields), the rounding functions can be used in conjunction with multiplication and division to achieve the desired level of precision.

Rounding average price to two decimal places:

Rounding up: floor(ave_item_price*100)/100

Rounding down: ceil(ave_item_price*100)/100

## format(format_string, arg1, …, argN)

Return a string with all arguments formatted according to format_string. format_string follows the Python formatting conventions. More detail on the formatting options can be found here and here.

*Syntax: format(format_string, arg1, …, argN)*

Example: format(‘{}, {}’, {LastName}, {FirstName})

## getAt(array, index)

Returns the value at index with array. (First item has position of 1.)

*Syntax: getAt(array, index)*

Example: getAt(split({sentence}, “ “), 2)

## hourOfDay(Time)

Returns the hour of the day (24 hour clock). When a date or time field uses a 12 hour clock, the hourOfDay formula will return the hour number in a 24 hour clock format.

*Syntax: hourOfDay({DateColumn})*

Example: hourOfDay({prod_date})

## ifelse(Conditional, TrueValue, FalseValue)

Returns true value if conditional is true, otherwise return false value. This is typically used when creating a new column based on other column(s) values. If you want to test multiple conditions, you can do a nested ifelse.*Syntax:* i

*felse({Conditional}, “TrueValue”, “FalseValue”)*

Example: ifelse({Quantity}>100, “In Stock", "Restock")

Nested ifelse Example: ifelse({column}==“value1","newvalue1",ifelse({col2}=="value2","newvalue2","other"))

If the value of column is value1, then it will return newvalue1. If column is not value1, it will check to see if col2 is equal to value2. If neither one of them are true, it will return “other”

Note also that the keyword EMPTY can be used as either the TrueValue of FalseValue. When the result of an expression is EMPTY, the resulting field is simply blank. See the table above for an example.

## join(sep, arg1, ..., argN)

Joins 2 or more columns. This formula allows you to join the value of Column1 through ColumnN, using a separator of your choice (, – * .).

*Syntax: join("choose character", {Column 1}, {Column 2})*

Example: join(“, ”, {City}, {State}) = Chicago, Illinois

## length(s)

Returns length of string/array s.

*Syntax: length({Column Name})*

Example: length({sentence})

## lower(s)

Returns a string with all characters converted to lowercase characters. This formula only works with columns of the Text datatype. This (or the corresponding “upper” formula) is a handy way to ensure that multiple text columns match without case sensitivity.

*Syntax: lower({Column Name})*

Example: lower({email})

## ltrim(s, length, start=0)

Returns string s, trimmed to length, beginning at position start (start defaults to position 0 if not provided). If length is longer than the trimmed string, the result is NOT padded on the end

*Syntax: ltrim({Column Name}, length, start)*

Example: ltrim({padded_text}, 12, 3)

## map(func, arg1)

Returns the array of results from applying function func to each element in arg1. Map requires func to be one of the 5 list aggregation functions (max, mean, median, min, mode), and arg1 *must* be a list of lists. This means that map can only be used in conjunction with the zip function, whose purpose is to aggregate lists of lists. See the example below.

*Syntax: map(function_name, list)*

Example: map(min, zip({list1}, {list2}))

## max(arg1, …, argN)

Returns the maximum value of all columns. The max formula will look for the highest numerical value in a list of columns and return that value.

*Syntax:* *max({Column 1}, {Column 2}, {Column N})*

Example: max({price1}, {price2}, {price3}, {price4})

## mean(arg1, …, argN)

Returns the statistical mean of all columns. The mean is found by adding all of the results together and dividing them by the number of results in the formula. It’s a fancy name for the average.

*Syntax:* *mean({Column 1}, {Column 2}, {Column N})*

Example: mean({price1}, {price2}, {price3}, {price4})

## median(arg1, …, argN) –

Returns the statistical median of all columns. The median is found by picking the middle value of all of your observations. If there are an even number of observations, the average of the middle two values is returned.

*Syntax:* *median({Column 1}, {Column 2}, {Column N})*

Example: median({price1}, {price2}, {price3}, {price4})

## min(arg1, …, argN)

Returns the minimum value of all columns. The min formula will look for the lowest numerical value in a list of columns and return that value.

*Syntax:* *min({Column 1}, {Column 2}, {Column N})*

Example: min({price1}, {price2}, {price3}, {price4})

## minuteOfHour(time)

Returns the minute of the hour of time.

*Syntax:* *minuteOfHour({Time Column})*

Example: minuteOfHour({prod_date})

## mode(arg1, …, argN)

Return the statistical mode of all columns. The mode of a set of numbers is the one that occurs most often.

*Syntax:* *mode({Column 1}, {Column 2}, {Column N})*

Example: mode({price1}, {price2}, {price3}, {price4})

## monthOfYear(date)

Returns the numeric month of the year from a date field.

*Syntax: monthOfYear({Date Column})*

Example: monthOfYear({prod_date})

## multiply

Multiplies two or more columns or any columns by a number.

*Syntax: {Column 1} * {Column 2} (or any number)*

Examples:

## pack(arg1, …, argN)

Returns an array of size N, assembled (in the specified order) from elements arg1 through argN.

*Syntax: pack({Column 1}, …, {Column N})*

Example: pack({item1}, {item2}, {item3}, {item4})

## quarterOfYear(date)

Returns the quarter of the year from a date field.

*Syntax: quarterOfYear({Date Column})*

Example: quarterOfYear({prod_date})

## radians(x)

Convert the angle x to radians from degrees.

*Syntax: radians({Column Name})*

Example: radians({angle_deg})

## regExFindAll(s, regExp)

returns an array of all the strings matching the provided regular expression, in the order they appear in the source s. If no matches are found, the result is blank. (Numetric formulas use the Perl regular expression conventions.)

*Syntax: regExFindAll({Column Name}, regExp)*

Example: regExFindAll({comments}, “/hat/“)

## regExReplace(s, regExp, r, count=0)

Replace all instances in string s with replacement string r, according to matches to regular expression regExp up to a limit of count. (When count=0, all instances are replaced.) (Numetric formulas use the Perl regular expression conventions.)

*Syntax: regExReplace(s, regExp, r, count=0)*

Example: regExReplace({address}, “/ \n”, “ ”)

## regExSearch(s, regExp)

returns the first string from source s matching the provided regular expression. If no matches are found, the result is blank. (Numetric formulas use the Perl regular expression conventions.)

*Syntax: regExSearch(s, regExp)*

Example: regExSearch({comments}, “/hat/“)

Returns string s, with any occurrences of substring old replaced with the provided string new.

Syntax: replace({Column Name}, {Old String}, {New String})

Example: replace({items}, “,”, “ “)

## round(X)

Returns the value of the field rounded to the nearest integer value. An integer is a whole number that can be either positive or negative.

*Syntax: round({Column Name})*

Example: round({ave_qty})

## rtrim(s, length, start=0)

Returns string s, trimmed to length, beginning at position start from the right (when start=0, trim the result starts at the right-most character). If length is longer than the trimmed string, the result is NOT padded on the end.

*Syntax: rtrim({Column Name}, length, start)*

Example: rtrim({padded_text}, 12, 2)

## sgn(x)

Return the sign of column as 1 or -1. This formula could be used to see, based on a profit column, whether the majority of sales deals are in the black or in the red.

Syntax: sgn({Column Name})

Example: sgn({profit})

## sin(x)

Return the sine of x in radians.

*Syntax: sin({Column Name})*

Example: sin({angle})

## sind(x)

Return the sine of X in degrees.

*Syntax: sind({Column Name})*

Example: sind({angle})

## split(s, sep)

Returns an array of strings by splitting s at each occurrence of sep. The resulting array can then be passed to another function.

*Syntax: split({Column Name}, sep)*

Example: split({items}, “,”)

## subtract

Subtract one column from another or any column from a number.

*Syntax:* *{Column 1} – {Column 2} (or any number)*

Example 1: {add_cust} – {cncl_cust}

Example 2: {add_cust} – 12

## sum

Add one column to any number of other columns or numbers.

*Syntax: {Column 1} + {Column 2}*

Example 1: {add_cust} + {cncl_cust}

Example 2: {add_cust} + 12

## tan(x)

Return the tangent of X radians

*Syntax: tan({Column Name})*

Example: tan({angle})

## tand(x)

Return the tangent of X degrees

*Syntax: tand({Column Name})*

Example: tand({angle})

## title(s)

Return string s converted to Title Case, i.e., with the first letter of each word converted to uppercase.

Syntax: title({Column Name})

Example: title({book name})

## truncate(x)

Turns a number to an integer by removing the decimal part of the number. If the data is coming in as a long decimal number, the truncate function will remove all decimals.

Syntax: *truncate({Column name})*

Example: truncate({ave_qty})

## unpack(arg1, ..., argN)

Combines arrays arg1 to argN into a single array, assembled in order by appending each successive argument array, to be passed to another function

*Syntax: unpack(arg1, ..., argN)*

Example: unpack({list1}, {list2})

## upper(s)

Return string s with all characters converted to uppercase.

*Syntax: upper({Column Name})*

Example: upper({book name})

## year(date)

Return the year of the date column.

*Syntax: year({Date Column})*

Example: year({batch_date})

## zip(arg1, ..., argN)

Return an array derived from arrays arg1 through argN matching to pattern [[arg1.1, .., argN.1], ...,[arg1.N, ..., argN.N]]. Append empties to the end of shorter arrays if the sizes differ.

*Syntax: zip(arg1, ..., argN)*

Example: zip({list1}, {list2})