Category
Transformation
Function
Calculate statistics on data associated with a categorical component
Syntax
statistics = CategoryStatistics(input, operation, category, data, lookup);
Inputs
Name | Type | Default | Description
input | field | (none) | field for which to compute statistics
operation | string | "count" | operation to perform ("count", "mean", "sd", "var", "min", "max")
category | string | "data" | component with categorical values
data | string | "data" | data component for statistics
lookup | integer, string, value list | "category lookup" | lookup component
Outputs
Name | Type | Description
statistics | field | field with data containing the statistics and positions for the category values
Functional Details
input | field containing the categorical and data components
operation | calculation to perform
category | component with categorical values. This component must be an integer type (int, ubyte, ...)
data | data component for statistics. This component must be scalar.
lookup | lookup component (optional)
CategoryStatistics calculates statistics on a scalar component
associated with a categorical component. If the
operation is "count", the data
component is ignored and the
number of entries in each category is counted, which corresponds
to a histogram of the values in the categorical component.
For example, suppose input is a field with a component
"state" containing the entries {1,0,1,2,3}, a component
"state lookup" containing the entries {"CA", "NY",
"PA", "VA"}, and a component "sales" containing
the entries {1.2,1.0,1.4,1.7,1.8}. Then
CategoryStatistics(input,"mean","state","sales")
produces an output field whose "positions" component
contains the indices {0,1,2,3} and whose "data"
component contains the mean sales value for each state, that is
{1.0,1.3,1.7,1.8}. Category 1 ("NY") occurs twice, with sales of
1.2 and 1.4, so its mean is 1.3.
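The worked example above can be reproduced outside Data Explorer with a short Python sketch. This is only an illustration of the grouping arithmetic, not the module's implementation: the function name `category_statistics` and its arguments are hypothetical, empty categories are reported as `None` (the source does not say what the module returns for them), and "sd" and "var" are left out because the source does not state which normalization they use.

```python
from collections import defaultdict
from statistics import mean


def category_statistics(categories, values, operation="count"):
    """Return one statistic per category index 0..max(categories).

    Hypothetical helper that mimics the example in the text; it is
    not part of Data Explorer.
    """
    # Group the data values by their integer category.
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)

    n_bins = max(categories) + 1  # no lookup: bins run from 0 to the maximum

    if operation == "count":
        # The data values are ignored; only the group sizes matter.
        return [len(groups.get(i, [])) for i in range(n_bins)]

    reducers = {"mean": mean, "min": min, "max": max}
    if operation not in reducers:
        raise ValueError(f"unsupported operation in this sketch: {operation}")
    reduce = reducers[operation]
    # Assumption: a category with no entries yields None.
    return [reduce(groups[i]) if i in groups else None for i in range(n_bins)]


state = [1, 0, 1, 2, 3]
sales = [1.2, 1.0, 1.4, 1.7, 1.8]

counts = category_statistics(state, sales, "count")
means = category_statistics(state, sales, "mean")
print(counts)
print([round(m, 6) for m in means])
```

The positions of the returned list play the role of the "positions" component, and the list entries play the role of the "data" component. The means are rounded for display because the floating-point average of 1.2 and 1.4 is not exactly 1.3.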
The output of CategoryStatistics is a field with a "positions"
component corresponding to the categorical indices, and a "data"
component corresponding to the requested statistics. The
"positions" component will consist of the integers 0 to N-1, where
N can be determined in a number of ways:
- If no lookup component
is specified and no "categoryname lookup" component
is found (where "categoryname" is the string specified by
category), then N is MAX_N + 1, where MAX_N is the maximum integer found in
the category component, so the positions run from 0 to MAX_N.
- If, on the other hand, a "categoryname lookup" component is
found, or lookup is specified, then the number of
category bins N is the number of items in the lookup.
- lookup can also simply be an integer, in which case it
specifies the number of category bins directly.
If a lookup table is provided, then for convenience a
"categoryname lookup" component is placed in the output,
containing the values corresponding to the categorical indices.
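The rules for choosing N can be summarized in a small Python sketch. The helper `number_of_bins` is hypothetical and simplifies the real behavior: a "categoryname lookup" component found in the input field is modeled by passing its contents as the `lookup` argument, and looking up a component by name is not modeled at all.

```python
def number_of_bins(categories, lookup=None):
    """Return the number of category bins N under the rules in the text.

    Hypothetical helper. `lookup` is None (no lookup available),
    an integer (explicit bin count), or a sequence of lookup values.
    """
    if lookup is None:
        # No lookup: positions run from 0 to the largest category value.
        return max(categories) + 1
    if isinstance(lookup, int):
        # An integer lookup gives the number of bins directly.
        return lookup
    # Otherwise the lookup table determines the number of bins.
    return len(lookup)


state = [1, 0, 1, 2, 3]
n_default = number_of_bins(state)
n_explicit = number_of_bins(state, lookup=6)
n_table = number_of_bins(state, lookup=["CA", "NY", "PA", "VA"])
print(n_default, n_explicit, n_table)
```

Passing an integer or a lookup table larger than the observed categories is how the output can include bins for categories that never occur in the input.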
Components
Creates an output field with a "positions" component representing
the categorical indices, and a "data" component containing the
requested statistics. Creates a "categoryname lookup" component if
a lookup table is specified using the lookup
parameter.
Example Visual Programs
Duplicates.net
Zipcodes.net
See Also
Categorize, Statistics, Lookup