proc categories


Version 2.33 Jun'06



Manual page for proc_categories(PL)

proc categories defines a category set to be associated with the X or Y axis, for handling categorical data, typically for bar positioning, etc. This proc also controls attributes related to category processing.

Categories can be defined using this proc before invoking proc areadef. In older Ploticus versions categories were defined within proc areadef using an old syntax described below which will continue to be supported. However, additional capabilities and higher capacities are available via proc categories.

Proc categories also assumes the functionality of the old proc catslide (see the slideamount attribute).


Category sets

The category scaletype allows positioning of data points using categorical bins rather than a continuous scale, often useful in positioning bars, rangebars, etc.

Category names are alphanumeric labels and are generally short (fewer than 40 characters). They may contain embedded whitespace, but this can be problematic if a category name is used as a plotvalue or locvalue.

During plotting, data are categorized by comparing a given data field with each defined category label until a match is found, then the point is plotted at that location. If no match is found nothing is plotted, and an error is issued if the -showbad command line option is in effect.

One category set may be defined for the X axis, and one for the Y axis. Category sets and associated attributes are independent of individual plotting areas (thus categories may be defined one time and then used in several different plotting areas). Category sets are also completely independent from input data sets (thus categories may be defined from one set of data, then still be in effect after different data are read in).
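
As a rough sketch of this independence, the script fragment below defines an X category set once and then sets up two plotting areas that both use it. The category labels, rectangles and yrange values are made up for illustration, and the areadef attributes are assumed to behave as described in the proc areadef manual page.

    #proc categories
     axis: x
     categories:
      Q1
      Q2
      Q3
      Q4

    // first plotting area (upper)
    #proc areadef
     rectangle: 1 4 5 6
     xscaletype: categories
     yrange: 0 100

    // second plotting area (lower); the same X category set is still in effect
    #proc areadef
     rectangle: 1 1 5 3
     xscaletype: categories
     yrange: 0 20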

Category sets may be taken from a data field or specified explicitly. Category labels should always be unique within an axis, and are normally displayed in the same order as specified.

The default maximum number of categories is 250 in X and 250 in Y. These limits can be raised using the listsize attribute.


Example

See the mouse gallery example.
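
Independently of the gallery example, here is a minimal sketch of a complete script. It takes categories from field 1 of some made-up inline data and draws one bar per category. The data values, rectangle and yrange are assumptions chosen for illustration; the other procs used here (getdata, areadef, xaxis, yaxis, bars) are described in their own manual pages.

    #proc getdata
     data:
      apples 12
      pears 7
      plums 9

    #proc categories
     axis: x
     datafield: 1

    #proc areadef
     rectangle: 1 1 4 3
     xscaletype: categories
     yrange: 0 15

    #proc xaxis
     stubs: usecategories

    #proc yaxis
     stubs: inc 5

    #proc bars
     locfield: 1
     lenfield: 2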


Attributes

Some attributes need to be specified in a certain order, unlike most other ploticus procs. The axis attribute must be specified before any other attribute. Also, #clone is not supported.

axis     x | y

Which axis the category set is associated with. This attribute must be the first one specified.
Example: axis: x

datafield     dfield

Specify a data field to get category labels from.
Example: datafield: measnum
Example: datafield: 2

categories     multi-line text

List of category labels, one per line. Terminated with a blank line. Example:

categories:
    red
    blue
    orange


select     select expression

Allows data rows to be selected for inclusion as categories using a selection expression. This only has an effect when used with datafield, and it must be specified before datafield.
Example: select: @4 != null
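
A short sketch of the required ordering, combining the select and datafield examples above (the field numbers are illustrative only):

    #proc categories
     axis: x
     select: @4 != null
     datafield: 2

Here only rows whose fourth field is not null would contribute a category label from field 2.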

extracategory     text

Allows an extra category to be added explicitly. For example, this attribute might be useful when categories are being set by a data field and it is desired to have an additional "Total" category.

The position of this attribute relative to others is important. If specified before the category set is defined, the extra category will be added to the beginning of the category list and it will appear near the axis min. If specified after, the extra category will be appended to the category list and appear near the axis max.

This attribute may be specified as many times as necessary, with each adding an additional category.
Example: extracategory: Total
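
To illustrate the position dependence, the sketch below adds one extra category before and one after the data-derived set. The label "Baseline" and the field number are hypothetical.

    #proc categories
     axis: x
     extracategory: Baseline
     datafield: 1
     extracategory: Total

With this ordering, Baseline would be expected to appear near the axis min, followed by the categories found in field 1, with Total near the axis max.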

checkuniq     yes|no

Default is yes. The only situation where one might set this to no is with data sets in which each category tag is guaranteed to appear exactly once. Doing so gives a tiny gain in efficiency, because incoming category tags are not checked against the list of known tags. Since the maximum number of categories is typically a few hundred, the savings are small.

comparemethod     exact | beginslike | length=n

When data points are plotted using the category scaletype, a given data field is compared against each defined category label until a match is found, and the data point is then plotted at that location. This attribute controls the method used for matching. Default is exact. With beginslike, only as many characters as the data field contains are compared. With length=n, only the first n characters are compared.
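
As an illustrative sketch, the fragment below compares only the first three characters. The month labels are made up, and placing comparemethod before the categories list is an assumption rather than a documented requirement.

    #proc categories
     axis: x
     comparemethod: length=3
     categories:
      January
      February
      March

With this setting, a data value such as Jan-06 would be expected to match the January category, since only the first three characters are compared.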

roundrobin     yes | no

Default is yes. Normally a round-robin style lookup algorithm is used, which is most efficient when the category labels are encountered in the same order as defined. In practice this is most often the case. However, this attribute can be set to no, which causes the lookup to be sequential, starting each time at the beginning of the list. This might perform better in certain situations.
Example: roundrobin: no

slideamount     n

Adjust the plot locations corresponding to categories laterally by n units in scaled space. Typical use is to render pairs or clusters, or to improve alignment of datapoints with axis stubs. For example, if categories are associated with the X axis, slideamount adjusts the location left or right by a small distance.

Note: when areadef sets up the plotting area and scaling it cancels any slideamount currently in effect. So slideamount must be specified in a separate #proc categories block after #proc areadef, as shown below. The following will slide the categorical X axis 0.1 scale units to the left:

    #proc categories
     axis: x
     ...
   
    #proc areadef
     ...
   
    #proc categories
     axis: x
     slideamount: -0.1


listsize     n

Specify the size of the category list. Default capacity is 250 categories per axis. If you need more categories, you can specify the upper limit here. This attribute may be specified only one time per script, and must be given before any categories are defined for the axis. Example:

#proc categories
  axis: x
  listsize: 1000
  datafield: 2



Old syntax for setting up categories

Here is a summary of the old syntax used within proc areadef to specify categories. This syntax will continue to be supported, but new work should use proc categories (above).

xcategories datafield=dfield [selectrows=conditional expression]
..OR..
xcategories multi-line text

Defines a set of categories for use on the X axis. To take categories from a data field, use the construct datafield=dfield where dfield is a data field specification. Alternatively, category names may be listed explicitly one per line, terminated by a blank line. When taking categories from a data field, an optional select expression may be supplied to use selected data rows only (new in 2.03; see example 2 below).
Example 1:   xcategories: datafield=1

Example 2:   xcategories: datafield=1  selectrows=@3 like S*

Example 3:   xcategories: Red
                          Blue
                          Green
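
For comparison, Example 2 might be written with proc categories roughly as follows. This is a sketch that assumes the select attribute accepts the same expression syntax as selectrows; note that select is given before datafield, as proc categories requires.

    #proc categories
     axis: x
     select: @3 like S*
     datafield: 1

    #proc areadef
     xscaletype: categories
     ...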

ycategories datafield=dfield [selectrows=conditional expression]
..OR..
ycategories multi-line text

Specify categories for use in Y, one per line. Same syntax as xcategories above. Default orientation of categories along Y is from top to bottom.

xextracategory text

Allows an extra X axis category to be added explicitly. For example, this attribute might be useful when categories are being set by a data field and an additional "Total" category is desired. Unlike most other ploticus attributes, its behavior is position-dependent. If specified before (above) xcategories in the proc areadef attributes, the extra category is added to the beginning of the category list and appears near the X axis min. If specified after, the extra category is appended to the category list and appears near the X axis max. This attribute may be specified more than once, with each occurrence adding a category.
Example:   xextracategory: Total
           xextracategory: Weekly average


yextracategory text

Same as xextracategory above, but for the Y axis.

catcompmethod beginswith | exact | length=N

Controls the details of how category comparisons are done. The default is beginswith for backward compatibility; exact is highly recommended for new work. In all cases, the comparisons are case-insensitive, and work from the beginning of the categories list to the end, stopping when a match is found.
beginswith = the comparison is successful if the data item matches the category name but only to the length of the data item.
exact = the comparison is successful if the data item exactly matches the category name.
length=N = the comparison is successful if the first N characters of the data item match the first N characters of the category name.


Old syntax for proc catslide

Here's an example of the old syntax for proc catslide, which has been superseded by the slideamount attribute:
#proc catslide
  axis: x
  amount: -0.1





data display engine  
Copyright Steve Grubb


Ploticus is hosted at http://ploticus.sourceforge.net

