IBM Visualization Data Explorer User's Reference

Categorize

Category

Function

Categorizes components of a field

Syntax

output = Categorize(input, name);

Inputs

Name	Type	Default	Description
`input`	field	none	field to categorize
`name`	string or string list	"data"	component to categorize

Outputs

Name	Type	Description
`output`	field	the input field with additional lookup components

Functional Details

`input`	is the field containing the components to categorize
`name`	is the name or names of the components to categorize

The Categorize module converts a component of any type to an integer array that references a newly created "lookup" component, which is a sorted list of the unique values in the original component. This serves to:

- reduce the size of a component that contains duplicate values,
- allow conversion of string or vector data to "categorical" data,
- detect repeated values in a component, and
- create a sorted list of the unique values in a component for inspection.

Each component that is categorized will yield its own lookup component named "compname lookup", where compname is the name of the categorized component.

For example, if the component name is "state" and its values are {"MO", "CA", "MO", "NH", "AK", "NH"}, then Categorize(field, "state") would convert the "state" component to {2, 1, 2, 3, 0, 3} and produce a new component, "state lookup", containing the values {"AK", "CA", "MO", "NH"}.
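The mapping in this example can be reproduced with a short Python sketch. This is only a model of the behavior described above, not Data Explorer code; the function name `categorize` and the use of plain Python lists are assumptions made for illustration.

```python
def categorize(values):
    """Return (indices, lookup) for a list of hashable, sortable values."""
    lookup = sorted(set(values))                     # sorted unique values
    position = {v: i for i, v in enumerate(lookup)}  # value -> index into lookup
    indices = [position[v] for v in values]          # one index per original item
    return indices, lookup


state = ["MO", "CA", "MO", "NH", "AK", "NH"]
indices, lookup = categorize(state)
print(indices)  # [2, 1, 2, 3, 0, 3]
print(lookup)   # ['AK', 'CA', 'MO', 'NH']
```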

Notes:

Categorize works on scalars, strings, or vectors of any type. For vectors, the lookup component is sorted in order of x, y, z, ... If the lookup component has fewer items than the original component, then the original component contains duplicate values. If the lookup component has 256 or fewer items, the categorized component will be of type unsigned byte; otherwise it will be of type int.
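As a rough illustration of these rules, the Python sketch below categorizes 2-vectors, checks for duplicates, and picks an index type. It assumes that "in order of x, y, z" means x is the most significant sort key (which is how Python compares tuples); the helper names `categorize_vectors` and `index_type` are hypothetical and are not part of Data Explorer.

```python
def categorize_vectors(vectors):
    """Categorize a list of tuples; tuples compare by x, then y, then z, ..."""
    lookup = sorted(set(vectors))
    position = {v: i for i, v in enumerate(lookup)}
    return [position[v] for v in vectors], lookup


def index_type(lookup):
    """Index type per the rule above: 256 or fewer items fit in an unsigned byte."""
    return "unsigned byte" if len(lookup) <= 256 else "int"


points = [(1.0, 2.0), (0.0, 5.0), (1.0, 0.5), (0.0, 5.0)]
idx, table = categorize_vectors(points)

# Fewer lookup items than original items means the original had duplicates.
has_duplicates = len(table) < len(points)

print(table)              # [(0.0, 5.0), (1.0, 0.5), (1.0, 2.0)]
print(idx)                # [2, 0, 1, 0]
print(has_duplicates)     # True
print(index_type(table))  # unsigned byte
```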
Categorical data can be converted back to its original values using either the Lookup module or the Map module. If the lookup component is of type string, it can be passed as the labels parameter of Plot, ColorBar, or AutoAxes to label the values 0, 1, ..., n-1 with the corresponding strings, which helps automate the labelling of categorical plots. Data imported by ImportSpreadsheet can be categorized directly on import by specifying the components to categorize. Statistics on the categorized component, and on another associated component, can be computed with CategoryStatistics. Include can be used to remove data by category.
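The relationship between the indices and the lookup table can be sketched in Python as follows. This only models the idea of looking values back up, labelling the integers 0 to n-1, and dropping one category; it does not reproduce the actual Lookup, Map, Plot, or Include modules, and the names `restore`, `labels`, and `kept` are illustrative assumptions.

```python
def restore(indices, lookup):
    """Recover the original values by indexing into the lookup table."""
    return [lookup[i] for i in indices]


indices = [2, 1, 2, 3, 0, 3]
lookup = ["AK", "CA", "MO", "NH"]

# Round trip back to the original values.
restored = restore(indices, lookup)

# Labels for the integer values 0, 1, ..., n-1, as one might use on an axis.
labels = dict(enumerate(lookup))

# Drop every item in the "MO" category, keeping the remaining indices.
kept = [i for i in indices if lookup[i] != "MO"]

print(restored)  # ['MO', 'CA', 'MO', 'NH', 'AK', 'NH']
print(labels)    # {0: 'AK', 1: 'CA', 2: 'MO', 3: 'NH'}
print(kept)      # [1, 3, 0, 3]
```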

Components

Modifies each component specified by name, replacing its contents with a list of indices. For each such component, adds a new component named "name lookup", which is the lookup table for component name.
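The effect on a field's components can be modelled with a Python dictionary that maps component names to lists. This is a minimal sketch under that assumption; a real Data Explorer field is not a dictionary, and `categorize_field` is a hypothetical helper that only mirrors the documented `name` parameter (a string or a list of strings, defaulting to "data").

```python
def categorize_field(field, name="data"):
    """Return a copy of `field` with the named components categorized."""
    names = [name] if isinstance(name, str) else list(name)
    out = dict(field)  # shallow copy, so the input field is left unchanged
    for comp in names:
        lookup = sorted(set(field[comp]))
        position = {v: i for i, v in enumerate(lookup)}
        out[comp] = [position[v] for v in field[comp]]  # component becomes indices
        out[comp + " lookup"] = lookup                  # new "name lookup" component
    return out


field = {
    "state": ["MO", "CA", "MO", "NH", "AK", "NH"],
    "data": [3.5, 1.0, 3.5, 2.0, 1.0, 2.0],
}
result = categorize_field(field, ["state", "data"])

print(result["state"])         # [2, 1, 2, 3, 0, 3]
print(result["state lookup"])  # ['AK', 'CA', 'MO', 'NH']
print(result["data"])          # [2, 0, 2, 1, 0, 1]
print(result["data lookup"])   # [1.0, 2.0, 3.5]
```

Each categorized component gets its own lookup table, so categorizing two components yields two new components in the output.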

Example Visual Programs

Duplicates.net
Categorical.net          (Categorize is called on import by ImportSpreadsheet)

See Also

CategoryStatistics, ImportSpreadsheet
