Category
Transformation
Function
Categorizes components of a field
Syntax
output = Categorize(input, name);
Inputs
Name | Type | Default | Description
input | field | none | field to categorize
name | string or string list | "data" | component to categorize
Outputs
Name | Type | Description
output | field | categorized field with additional lookup components
Functional Details
input | is the field containing the components to categorize
name | is the name or names of the components to categorize
The Categorize module converts a component of any type
to an integer array that references
a newly created "lookup" component, which is
a sorted list of the unique values in the original component.
This serves to
- reduce the size of a component that contains
duplicate values,
- allow conversion of string or vector data to
"categorical" data,
- detect repeated values in a component, and
- create a sorted list of the unique values in a component
for inspection.
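The core transformation can be sketched in a few lines of Python. This is a minimal sketch of the idea, not Data Explorer's implementation. It assumes a component can be modeled as a plain Python list of hashable, mutually comparable values, and the function name `categorize_values` is hypothetical.

```python
from typing import List, Sequence, Tuple

def categorize_values(values: Sequence) -> Tuple[List[int], list]:
    """Hypothetical sketch: return (indices, lookup) for one component.

    lookup is the sorted list of unique values; indices[i] is the
    position of values[i] in lookup.
    """
    lookup = sorted(set(values))                     # unique values, sorted
    position = {v: i for i, v in enumerate(lookup)}  # value -> index in lookup
    indices = [position[v] for v in values]          # replace each value by its index
    return indices, lookup

indices, lookup = categorize_values([3.5, 1.0, 3.5, 2.0])
print(indices)  # [2, 0, 2, 1]
print(lookup)   # [1.0, 2.0, 3.5]
```

Building a dictionary from value to position keeps the pass over the original component linear, instead of searching the lookup list once per element.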
Each component that is categorized will yield its own lookup
component named "compname lookup", where compname is the
name of the categorized component.
For example, if the component "state" has the values
{"MO", "CA", "MO", "NH", "AK", "NH"}, then
Categorize(field, "state") converts component "state" to:
{2, 1, 2, 3, 0, 3}
and produces a new component, "state lookup",
containing the values {"AK", "CA", "MO", "NH"}.
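The "state" example can be reproduced with a small Python sketch. In this sketch, a field is modeled as a dictionary that maps component names to lists of values. That model and the function name `categorize` are assumptions for illustration, not the Data Explorer API. The sketch does mirror the documented behavior: it accepts one name or a list of names, defaults to "data", and adds a "compname lookup" component for each categorized component.

```python
from typing import Dict, List, Union

Field = Dict[str, list]  # hypothetical model: component name -> values

def categorize(field: Field, name: Union[str, List[str]] = "data") -> Field:
    """Hypothetical sketch of Categorize(input, name) on a dict-based field."""
    names = [name] if isinstance(name, str) else list(name)
    out = dict(field)  # shallow copy so the input field is left unchanged
    for comp in names:
        lookup = sorted(set(field[comp]))
        position = {v: i for i, v in enumerate(lookup)}
        out[comp] = [position[v] for v in field[comp]]  # indices replace values
        out[comp + " lookup"] = lookup                   # "compname lookup"
    return out

field = {"state": ["MO", "CA", "MO", "NH", "AK", "NH"]}
result = categorize(field, "state")
print(result["state"])         # [2, 1, 2, 3, 0, 3]
print(result["state lookup"])  # ['AK', 'CA', 'MO', 'NH']
```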
Notes:
- Categorize works on scalar, string, or vector components of any type;
for vectors, the lookup component is sorted by x, then y, then z, and so on.
If the lookup component has fewer items than the original
component, then there are duplicate values in the original component.
If the lookup component has 256 or fewer items,
the categorized component will be of type unsigned byte;
otherwise it will be of type int.
- Categorical data can be converted back to its original
values using either the Lookup module or Map.
If the lookup component is of type string, it can be passed
as the labels parameter of Plot, ColorBar, or
AutoAxes to label the values 0, 1, ..., n-1 with the corresponding strings,
which helps automate the labelling of categorical plots. Data imported
by ImportSpreadsheet can be categorized directly on import by
specifying the components to categorize. Statistics on the
categorized component and an associated component
can be computed with CategoryStatistics.
Include can be used to remove data by category.
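The type rule and vector ordering from the first note can be sketched in Python as follows. This sketch assumes, for illustration only, that vectors are represented as Python tuples, whose built-in ordering compares x first, then y, then z. The `array` typecodes `'B'` (unsigned char) and `'i'` (C int) stand in for Data Explorer's unsigned byte and int types, and `categorize_typed` is a hypothetical name.

```python
from array import array
from typing import Sequence, Tuple

def categorize_typed(values: Sequence) -> Tuple[array, list]:
    """Hypothetical sketch: categorize and pick the index type per the notes."""
    lookup = sorted(set(values))  # tuples sort by x, then y, then z, ...
    position = {v: i for i, v in enumerate(lookup)}
    # With 256 or fewer categories every index fits in 0..255,
    # so an unsigned byte array ('B') suffices; otherwise use C int ('i').
    typecode = "B" if len(lookup) <= 256 else "i"
    indices = array(typecode, [position[v] for v in values])
    return indices, lookup

vectors = [(1, 2), (0, 5), (1, 2), (1, 0)]
indices, lookup = categorize_typed(vectors)
print(lookup)                           # [(0, 5), (1, 0), (1, 2)]
print(indices.typecode, list(indices))  # B [2, 0, 2, 1]
print(len(lookup) < len(vectors))       # True, so the input has duplicates
```

The last line applies the duplicate test from the notes: a lookup component shorter than the original component means that some values repeat.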
Components
Modifies the components specified by name, replacing
each with a list of indices. For each such component, adds a new component
named "compname lookup", where compname is the name of the
categorized component, which serves as its lookup table.
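The modified component holds only indices, and the new lookup component holds the distinct values. That pair is enough to undo the transformation or to group an associated component by category. The plain-Python helpers below (`decode`, `mean_by_category`, `select_category`) are hypothetical illustrations of what Lookup or Map, CategoryStatistics, and Include do conceptually. They are not those modules' interfaces, and `sales` is invented data.

```python
from statistics import mean
from typing import Dict, List, Sequence

def decode(indices: Sequence[int], lookup: Sequence) -> list:
    """Map each index back to its original value through the lookup table."""
    return [lookup[i] for i in indices]

def mean_by_category(indices: Sequence[int], other: Sequence[float],
                     n_categories: int) -> List[float]:
    """Mean of an associated component for each category (NaN if empty)."""
    groups: Dict[int, List[float]] = {c: [] for c in range(n_categories)}
    for c, x in zip(indices, other):
        groups[c].append(x)
    return [mean(groups[c]) if groups[c] else float("nan")
            for c in range(n_categories)]

def select_category(indices: Sequence[int], other: Sequence, category: int) -> list:
    """Keep only the entries of `other` whose category index equals `category`."""
    return [x for c, x in zip(indices, other) if c == category]

# Categorized "state" component and its lookup, as in the example above.
indices = [2, 1, 2, 3, 0, 3]
lookup = ["AK", "CA", "MO", "NH"]
sales = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]  # invented associated component

print(decode(indices, lookup))  # ['MO', 'CA', 'MO', 'NH', 'AK', 'NH']
print(mean_by_category(indices, sales, len(lookup)))  # [50.0, 20.0, 20.0, 50.0]
print(select_category(indices, sales, lookup.index("NH")))  # [40.0, 60.0]
```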
Example Visual Programs
Duplicates.net
Categorical.net (Categorize is called on import by ImportSpreadsheet)
See Also
CategoryStatistics,
ImportSpreadsheet