47 DBMS_DATA_MINING_TRANSFORM

The DBMS_DATA_MINING_TRANSFORM package contains transformation routines that can be used to prepare data for mining.

See Also:

Chapter 42, "DBMS_DATA_MINING"
Oracle Data Mining Concepts for information about automatic and embedded data transformation in Oracle Data Mining

This chapter contains the following topics:

Using DBMS_DATA_MINING_TRANSFORM
- Data Transformation with Oracle Data Mining
- Types of Transformations
- The DBMS_DATA_MINING_TRANSFORM Framework
- Data Types
Summary of DBMS_DATA_MINING_TRANSFORM Subprograms

Using DBMS_DATA_MINING_TRANSFORM

This section contains topics that relate to using the DBMS_DATA_MINING_TRANSFORM package.

Data Transformation with Oracle Data Mining
Types of Transformations
The DBMS_DATA_MINING_TRANSFORM Framework
Data Types

Data Transformation with Oracle Data Mining

A transformation is a SQL expression that modifies one or more of the columns used in data mining.

Most data mining algorithms require some form of data transformation. Some transformations may be needed for categorical attributes, for numerical attributes, or for both. Data sets used for creating, testing, and scoring the model must all undergo the same transformations. To replicate these changes in each data set can be very time consuming.

Oracle Data Mining embeds transformation expressions in the model. The transformations used to create the model are automatically performed on any data sets to which the model is applied. This ability to deploy a model together with its transformations significantly enhances the model's usability while simplifying the process of model management.

Oracle Data Mining further simplifies the data transformation process by offering an Automatic Data Preparation mode (ADP). When you create a model using ADP, the training data is automatically transformed according to the requirements of the algorithm; the automatic transformations are embedded in the model. To enable ADP, set the PREP_AUTO setting to PREP_AUTO_ON in the settings table for the model.

See Also:

Table 42-6 in Chapter 42, "DBMS_DATA_MINING" for information about the PREP_AUTO setting

You can supplement automatic transformations by passing a list of transformation expressions to DBMS_DATA_MINING.CREATE_MODEL. Your transformations are embedded in the model along with the automatic transformations. If ADP is turned off, only the transformations that you pass to CREATE_MODEL are embedded.

See Also:

"CREATE_MODEL Procedure" in Chapter 42, "DBMS_DATA_MINING" for information about the transformation list parameter

The routines in DBMS_DATA_MINING_TRANSFORM are convenience routines to assist you in creating your own transformations. If these routines do not suit your needs, you can write SQL to modify them, or you can write your own routines.

Note on Notation:

This chapter uses standard interval notation for number sets:

[a,b] is the set of all real numbers greater than or equal to a and less than or equal to b
[a,b) is the set of all real numbers greater than or equal to a and less than b.

(b is in the set [a,b]; b is not in the set [a,b).)

Types of Transformations

The DBMS_DATA_MINING_TRANSFORM package supports transformations that are typically used in data mining: binning, normalization, outlier treatment, missing value treatment, and column removal.

Binning

Binning involves mapping both continuous and discrete values to discrete values of reduced cardinality. For example, the age of persons could be binned into discrete numeric bins: 1-20 to 1, 21-40 to 2, and so on. Popular car manufacturers such as Ford, Chrysler, BMW, Volkswagen could be binned into discrete categorical bins: {Ford, Chrysler} to US_Car_Makers, and {BMW, Volkswagen} to European_Car_Makers.

DBMS_DATA_MINING_TRANSFORM supports binning for both categorical and numerical attributes.

Supervised Binning (Categorical and Numerical)

The bin contents are groupings calculated by a Decision Tree model.

Top-N Frequency Categorical Binning

The bin contents are based on the frequency of each value in the column. The number of bins is user-specified. The first bin contains the most frequent values, the second bin contains the second-most frequent values, and so on. The final bin contains all the remaining values.

Equi-Width Numerical Binning

Each bin contains the same number of values. The number of bins is user-specified or estimated from the data.The first bin contains the highest values, the second bin contains the next highest values, and so on.

Quantile Numerical Binning

Each bin contains the numbers in a quantile that is computed using the SQL NTILE function. Quantiles are partitions of a data distribution.

Normalization

Normalization is the process of scaling continuous values down to a specific range, typically between zero and one. Normalization applies only to numerical attributes.

Normalization transforms each number by subtracting a given number (shift) and dividing the result by the scale. For example:

x_new = (x_old-shift)/scale

Min-Max Normalization

Min-max normalization is computed based on the minimum and maximum values of the data. The values for shift and scale are:

shift = min
scale = max-min

Scale Normalization

Scale normalization is computed based on the minimum and maximum values of the data. The values for shift and scale are:

shift = 0 
scale = max{abs(max), abs(min)}

Z-Score Normalization

Z-score normalization is computed based on the values for mean and standard deviation that are computed from the data. The values for shift and scale are:

shift = mean 
scale = standard_deviation

Outlier Treatment

Some computations can be significantly affected by extreme values. One approach to achieving a more robust computation is to winsorize or trim (clip) the data as a preprocessing step.

Winsorizing involves setting the tail values to some specified value. For example, for a 90% winsorization, the bottom 5% could be set to the minimum value in the 6th percentile, while the upper 5% could be set to the maximum value in the 95th percentile.

Trimming sets the tail values to NULL.

Missing Value Treatment

Oracle Data Mining distinguishes between sparse data and data that contains random missing values. The latter means that some attribute values are unknown. Sparse data, on the other hand, contains values that are assumed to be known, although they are not represented in the data.

NULLs in columns with simple data types (not nested) are assumed to represent random missing values. Values that are not present in nested columns indicate sparsity. Oracle Data Mining automatically replaces numerical data that is missing at random with the mean and categorical data that is missing at random with the mode. Missing numerical values in sparse data are replaced with zeros. Missing categorical values in sparse data are replaced with a zero vector or a special value DM$SPARSE, depending on the algorithm.

DBMS_DATA_MINING_TRANSFORM provides routines for handling missing values. These routines are available for anyone who wishes to manage missing value treatment explicitly, instead of relying on the default treatment.

See Also:

Oracle Data Mining Application Developer's Guide for more information about missing value treatment by Oracle Data Mining

The DBMS_DATA_MINING_TRANSFORM Framework

DBMS_DATA_MINING_TRANSFORM provides a framework for transforming columns of data for mining. The framework is composed of CREATE, INSERT, STACK, and XFORM operations. You can modify the output or implement your own version of any of these operations.

The DBMS_DATA_MINING_TRANSFORM framework consists of the following operations:

CREATE — Create a transformation table.

A transformation table has the columns needed for specifying one of the following transformations:
- Binning
- Column removal
- Normalization
- Outlier treatment
- Missing value treatment
For example, the binning transformation table has columns for listing the original column values with their bin assignments.

The columns of the transformation table are empty until they are populated by an INSERT routine.
INSERT — Populate a transformation table for a given data source.

For example, you could use an INSERT routine to populate a transformation table with bin assignments for the categorical attributes in a specified data source. Bin assignments could be calculated using supervised binning or top-n frequency binning.

You can invoke CREATE and INSERT routines several times until the transformation tables fully represent all the data attributes for a given mining problem.
STACK — Add the transformation instructions to a list (stack) of transformation instructions.

For example, you could use a STACK routine to add the transformation instructions that populated the binning transformation table to a list of transformation instructions. With STACK routines, you can assemble all the transformation instructions for a given data source into a single list.

The list of transformation instructions can be passed to CREATE_MODEL or to other STACK routines. (See "CREATE_MODEL Procedure".) .
XFORM — Create a view of the data table that includes the transformed columns.

For example, you could use an XFORM routine to create a view that includes the binned columns from the binning transformation table.

The view created by an XFORM routine includes the transformed values for the columns in the transformation table and the original values for all other columns. The view can be used for creating, testing, or scoring a model.

Data Types

DBMS_DATA_MINING_TRANSFORM defines the data types described in this section.

COLUMN LIST Collection Type

This type is used to store quoted and non-quoted identifiers for column names.

TYPE    COLUMN_LIST                    IS VARRAY(1000) OF VARCHAR2(32);

DESCRIBE LIST Collection Type

This type holds the results of describe operations.

SUBTYPE
    DESCRIBE_LIST                  IS dbms_sql.DESC_TAB2;

EXPRESSION Record Type

This type is used for storing transformation expressions.

TYPE 
    EXPRESSION_REC                 IS RECORD (
                                       ...........);,

Unlike VARCHAR2, EXPRESSION_REC can be used for expressions that can grow larger than 32K.

Use SET_EXPRESSION to write to an EXPRESSION_REC. Use GET_EXPRESSION to retrieve the contents of an EXPRESSION_REC.

TRANSFORMATION Record Type

This type is used for storing the in-memory transformation stack definition for a single attribute.

TYPE
    TRANSFORM_REC                  IS RECORD (
      attribute_name                 VARCHAR2(30),
      attribute_subname              VARCHAR2(4000),
      expression                     EXPRESSION_REC,
      reverse_expression             EXPRESSION_REC,
      attribute_spec                 VARCHAR2(4000));

The attribute_spec field is used to specify additional information and actions for an attribute. You can specify the value NOPREP for attribute_spec.

NOPREP disables ADP for an individual attribute. NOPREP can be used in the transformation list passed to DBMS_DATA_MINING.CREATE_MODEL. (See "CREATE_MODEL Procedure" .)

TRANSFORMATION LIST Collection Type

This type is used for storing the in-memory stack definition.

TYPE
    TRANSFORM_LIST                    IS TABLE OF TRANSFORM_REC;

NESTED COLUMN Constants

These constants are used for specifying data types of nested columns.

NEST_NUM_COL_TYPE                   CONSTANT NUMBER := 100001;
NEST_CAT_COL_TAYPE                  CONSTANT NUMBER := 100002;

Summary of DBMS_DATA_MINING_TRANSFORM Subprograms

Table 47-1 DBMS_DATA_MINING_TRANSFORM Package Subprograms

Subprogram	Purpose
CREATE_BIN_CAT Procedure	Creates a categorical bin definition table
CREATE_BIN_NUM Procedure	Creates a numerical bin definition table
CREATE_CLIP Procedure	Creates a clipping definition table
CREATE_COL_REM Procedure	Creates a column removal definition table
CREATE_MISS_CAT Procedure	Creates a categorical missing value treatment definition table
CREATE_MISS_NUM Procedure	Creates a numerical missing value treatment definition table
CREATE_NORM_LIN Procedure	Creates a normalization definition table
DESCRIBE_STACK Procedure	Describes the stack definition
GET_EXPRESSION Function	Returns a transformation expression
INSERT_AUTOBIN_NUM_EQWIDTH Procedure	Populates the numerical bin definition table, using the number of bins estimated from the data
INSERT_BIN_CAT_FREQ Procedure	Populates the categorical bin definition table, applying frequency-based binning to the categorical input data
INSERT_BIN_NUM_EQWIDTH Procedure	Populates the numerical bin definition table, applying equi-width binning to the numerical input data
INSERT_BIN_NUM_QTILE Procedure	Populates the numerical bin definition table, applying quantile binning to the numerical input data
INSERT_BIN_SUPER Procedure	Populates a binning definition table, applying supervised binning to both numerical and categorical data
INSERT_CLIP_TRIM_TAIL Procedure	Populates the clipping definition table, applying trimming based on tail fraction to the numerical input data
INSERT_CLIP_WINSOR_TAIL Procedure	Populates the clipping definition table, applying Winsorizing based on tail fraction to the numerical input data
INSERT_MISS_CAT_MODE Procedure	Populates the categorical missing value treatment definition table, applying the mode to each missing value
INSERT_MISS_NUM_MEAN Procedure	Populates the numerical missing value treatment definition table, applying the mean to each missing value
INSERT_NORM_LIN_MINMAX Procedure	Populates the normalization definition table, applying min-max normalization to the numerical input data
INSERT_NORM_LIN_SCALE Procedure	Populates the normalization definition table, applying scale normalization to the numerical input data
INSERT_NORM_LIN_ZSCORE Procedure	Populates the normalization definition table applying z-score normalization to the numerical input data
SET_EXPRESSION Procedure	Appends to a transformation expression
SET_TRANSFORM Procedure	Specifies a transformation expression
STACK_BIN_CAT Procedure	Add categorical binning expressions to the transformation list
STACK_BIN_NUM Procedure	Add numerical binning expressions to the transformation list
STACK_CLIP Procedure	Add clipping expressions to the transformation list
STACK_COL_REM Procedure	Add column removal expressions to the transformation list
STACK_MISS_CAT Procedure	Add categorical missing value treatment expressions to the transformation list
STACK_MISS_NUM Procedure	Add numerical missing value treatment expressions to the transformation list
STACK_NORM_LIN Procedure	Add linear normalization expressions to the transformation list
XFORM_BIN_CAT Procedure	Creates a view that applies a categorical binning transformation to the data
XFORM_BIN_NUM Procedure	Creates a view that applies a numerical binning transformation to the data
XFORM_CLIP Procedure	Creates a view that applies a clipping transformation to the data
XFORM_COL_REM Procedure	Creates a view that removes the specified columns from the data
XFORM_EXPR_NUM Procedure	Creates a view that applies a transformation expression to a numerical column in the data
XFORM_EXPR_STR Procedure	Creates a view that applies a transformation expression to a character column in the data
XFORM_MISS_CAT Procedure	Creates a view that performs missing value treatment on the categorical attributes in the data
XFORM_MISS_NUM Procedure	Creates a view that performs missing value treatment on the numerical attributes in the data
XFORM_NORM_LIN Procedure	Creates a view that performs linear normalization to numerical attributes in the data
XFORM_STACK Procedure	Creates a view that transforms the data as specified by the stack definition (transformation list)

CREATE_BIN_CAT Procedure

This procedure creates a categorical binning transformation table.

Syntax

DBMS_DATA_MINING_TRANSFORM.CREATE_BIN_CAT (
     bin_table_name     IN VARCHAR2,
     bin_schema_name    IN VARCHAR2 DEFAULT NULL);

Parameters

Table 47-2 CREATE_BIN_CAT Procedure Parameters

Parameter	Description
`bin_table_name`	Name of the binning transformation table
`bin_schema_name`	Name of the schema hosting the binning transformation table

Usage Notes

The table created by this procedure can be used to store bin assignments for categorical attributes. The following INSERT routines are available for populating the table:

INSERT_BIN_CAT_FREQ Procedure for frequency-based binning
INSERT_BIN_SUPER Procedure for supervised binning

The columns in the categorical bin definition table are described in Table 47-3.

Table 47-3 Categorical Binning Definition Table

Column	Data Type	Description
`col`	`VARCHAR2(30)`	Column containing the categorical values to be binned
`att`	`VARCHAR2(4000)`	Attribute name if the column is nested
`val`	`VARCHAR2(4000)`	A value in the column to be binned
`bin`	`VARCHAR2(4000)`	The bin that contains the value

Examples

BEGIN
   DBMS_DATA_MINING_TRANSFORM.CREATE_BIN_CAT('build_bin_cat_table');
END;
/
 
DESCRIBE build_bin_cat_table
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 COL                                                VARCHAR2(30)
 ATT                                                VARCHAR2(4000)
 VAL                                                VARCHAR2(4000)
 BIN                                                VARCHAR2(4000)

CREATE_BIN_NUM Procedure

This procedure creates a numerical binning transformation table. This table is used as input to the INSERT_BIN_NUM_EQWIDTH, INSERT_BIN_NUM_QTILE, INSERT_AUTOBIN_NUM_EQWIDTH, and XFORM_BIN_NUM procedures.

Syntax

DBMS_DATA_MINING_TRANSFORM.CREATE_BIN_NUM (
     bin_table_name    IN VARCHAR2,
     bin_schema_name   IN VARCHAR2 DEFAULT NULL);

Parameters

Table 47-4 CREATE_BIN_NUM Procedure Parameters

Parameter	Description
`bin_table_name`	Name of the bin definition table
`bin_schema_name`	Name of the schema hosting the bin definition table

Usage Notes

The generated bin definition table will have the following columns.

Column Name	Data Type
`col`	`VARCHAR2(30)`
`att`	`VARCHAR2(4000)`
`val`	`NUMBER`
`bin`	`VARCHAR2(4000)`

Examples

BEGIN
  DBMS_DATA_MINING_TRANSFORM.CREATE_BIN_NUM('build_bin_num_table');
END;
/
 
DESCRIBE build_bin_num_table
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 COL                                                VARCHAR2(30)
 ATT                                                VARCHAR2(4000)
 VAL                                                NUMBER
 BIN                                                VARCHAR2(4000)

CREATE_CLIP Procedure

This procedure creates a clipping definition table. This table is used as input to the INSERT_CLIP_WINSOR_TAIL, INSERT_CLIP_TRIM_TAIL, and XFORM_CLIP procedures.

Syntax

DBMS_DATA_MINING_TRANSFORM.CREATE_CLIP (
     clip_table_name    IN VARCHAR2,
     clip_schema_name   IN VARCHAR2 DEFAULT NULL);

Parameters

Table 47-5 CREATE_CLIP Procedure Parameters

Parameter	Description
`clip_table_name`	Name of the clipping definition table
`clip_schema_name`	Name of the schema hosting the clipping definition table

Usage Notes

The generated clipping definition table will have the following columns.

Column Name	Data Type
`col`	`VARCHAR2(30)`
`att`	`VARCHAR2(4000)`
`lcut`	`NUMBER`
`lval`	`NUMBER`
`rcut`	`NUMBER`
`rval`	`NUMBER`

Examples

BEGIN
  DBMS_DATA_MINING_TRANSFORM.CREATE_CLIP('build_clip_table');
END;
/
 
DESCRIBE build_clip_table
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 COL                                                VARCHAR2(30)
 ATT                                                VARCHAR2(4000)
 LCUT                                               NUMBER
 LVAL                                               NUMBER
 RCUT                                               NUMBER
 RVAL                                               NUMBER

CREATE_COL_REM Procedure

This procedure creates a column removal definition table. This table is used by the STACK_COL_REM, XFORM_COL_REM, INSERT_BIN_SUPER, and INSERT_AUTOBIN_NUM_EQWIDTH procedures.

Syntax

DBMS_DATA_MINING_TRANSFORM.CREATE_COL_REM (
      rem_table_name           VARCHAR2,
      rem_schema_name          VARCHAR2 DEFAULT NULL);

Parameters

Table 47-6 CREATE_COL_REM Procedure Parameters

Parameter	Description
`rem_table_name`	Name of the column removal definition table.
`rem_schema_name`	Schema of the column removal definition table.

Usage Notes

The generated column removal definition table has the following columns.

Column Name	Data Type
`col`	`VARCHAR2(30)`
`att`	`VARCHAR2(4000)`

The column removal definition table lists the column names and the associated attribute names to remove from the data source. The col column is case sensitive, since it generates quoted identifiers. There can be multiple entries in the column removal definition table for the same col when it is a nested column.

Examples

BEGIN
    DBMS_DATA_MINING_TRANSFORM.CREATE_COL_REM ('col_rem_tbl');
END;
 /
 
DESCRIBE col_rem_tbl
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 COL                                                VARCHAR2(30)
 ATT                                                VARCHAR2(4000)

CREATE_MISS_CAT Procedure

This procedure creates a categorical missing value treatment definition table. This table is used as input to the INSERT_MISS_CAT_MODE procedure.

Syntax

DBMS_DATA_MINING_TRANSFORM.CREATE_MISS_CAT (
     miss_table_name       IN VARCHAR2,
     miss_schema_name      IN VARCHAR2 DEFAULT NULL);

Parameters

Table 47-7 CREATE_MISS_CAT Procedure Parameters

Parameter	Description
`miss_table_name`	Name of the categorical missing value treatment definition table
`miss_schema_name`	Name of the schema hosting the categorical missing value treatment definition table

Usage Notes

The generated categorical missing value treatment definition table will have the following columns.

Column Name	Data Type
`col`	`VARCHAR2(30)`
`att`	`VARCHAR2(4000)`
`val`	`VARCHAR2(4000)`

Examples

BEGIN

DBMS_DATA_MINING_TRANSFORM.CREATE_MISS_CAT('build_miss_cat_table');
END;
/
 
DESCRIBE build_miss_cat_table
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 COL                                                VARCHAR2(30)
 ATT                                                VARCHAR2(4000)
 VAL                                                VARCHAR2(4000)

CREATE_MISS_NUM Procedure

This procedure creates a numerical missing value treatment definition table. This table is used as input to the INSERT_MISS_NUM_MEAN procedure.

Syntax

DBMS_DATA_MINING_TRANSFORM.CREATE_MISS_NUM (
     miss_table_name       IN VARCHAR2,
     miss_schema_name      IN VARCHAR2 DEFAULT NULL);

Parameters

Table 47-8 CREATE_MISS_NUM Procedure Parameters

Parameter	Description
`miss_table_name`	Name of the numeric missing value treatment definition table
`miss_schema_name`	Name of the schema hosting the numeric missing value treatment definition table

Usage Notes

The generated numeric missing value definition table will have the following columns.

Column Name	Data Type
`col`	`VARCHAR2(30)`
`att`	`VARCHAR(4000)`
`val`	`NUMBER`

Example

BEGIN
    DBMS_DATA_MINING_TRANSFORM.CREATE_MISS_NUM('build_miss_num_table');
END;
/
 
DESCRIBE build_miss_num_table
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 COL                                                VARCHAR2(30)
 ATT                                                VARCHAR2(4000)
 VAL                                                NUMBER

CREATE_NORM_LIN Procedure

This procedure creates a linear normalization definition table. This table is used as input to the INSERT_NORM_LIN_MINMAX, INSERT_NORM_LIN_SCALE, INSERT_NORM_LIN_ZSCORE, and XFORM_NORM_LIN procedures.

Syntax

DBMS_DATA_MINING_TRANSFORM.CREATE_NORM_LIN (
     norm_table_name       IN VARCHAR2,
     norm_schema_name      IN VARCHAR2 DEFAULT NULL);

Parameters

Table 47-9 CREATE_NORMALIZE_LIN Procedure Parameters

Parameter	Description
`norm_table_name`	Name of the normalization definition table
`norm_schema_name`	Name of the schema hosting the normalization definition table

Usage Notes

The generated linear normalization definition table will have the following columns.

Column Name	Data Type
`col`	`VARCHAR2(30)`
`att`	`VARCHAR2(4000)`
`shift`	`NUMBER`
`scale`	`NUMBER`

Examples

BEGIN
     DBMS_DATA_MINING_TRANSFORM.CREATE_NORM_LIN('build_norm_table');
END;
/
 
DESCRIBE build_norm_table
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 COL                                                VARCHAR2(30)
 ATT                                                VARCHAR2(4000)
 SHIFT                                              NUMBER
 SCALE                                              NUMBER

DESCRIBE_STACK Procedure

This procedure describes the transformation of the data table specified by the stack definition.

Only the columns that are specified in the stack definition are transformed. The remaining columns do not change.

All columns in the stack definition are applied. Columns with NULL in the expression field are excluded. Columns in the stack definition that do not have a matching column in the data are added to the describe list.

Syntax

DBMS_DATA_MINING_TRANSFORM.DESCRIBE_STACK (
     xform_list           IN TRANSFORM_LIST,
     data_table_name      IN VARCHAR2,
     describe_list        OUT DESCRIBE_LIST
     data_schema_name     IN VARCHAR2 DEFAULT NULL);

Parameters

Table 47-10 DESCRIBE_STACK Procedure Parameters

Parameter	Description
`xform_list`	List of stacked transformations
`data_table_name`	Table that contains the mining data
`describe_list`	Descriptions of stacked transformations
`data_schema_name`	Schema of table that contains the mining data. If you do not specify a schema, your own schema is used.

Usage Notes

For DM_NESTED_NUMERICALS, the value of col_type is nest_num_col_type. For DM_NESTED_CATEGORICALS, the value of col_type is nest_cat_col_type.

GET_EXPRESSION Function

This procedure returns a VARCHAR2 chunk from the expression.

Chunk numbering starts with one. For chunks outside of the range, the return value is NULL. When a chunk number is NULL the whole expression is returned as a string. If the expression is too big, a VALUE_ERROR is raised.

Syntax

DBMS_DATA_MINING_TRANSFORM.GET_EXPRESSION (
     expression           IN EXPRESSION_REC,
     chunk_num            IN PLS_INTEGER DEFAULT NULL);
 RETURN VARCHAR2;

Parameters

Table 47-11 GET_EXPRESSION Function Parameters

Parameter	Description
`expression`	Transformation expression
`chunk_num`	Number of the chunk

INSERT_AUTOBIN_NUM_EQWIDTH Procedure

This procedure finds the numerical binning definition for every numerical column in the data table that is not specified in the exclusion list and inserts the definition into the numerical binning definition table that was created using CREATE_BIN_NUM. Based on the statistical information it collects on the input data, this procedure calculates the number of bins.

Definition for each relevant column is computed based on the minimum and maximum values that are computed from the data table.

N, the number of bins, is computed for each column separately and is based on the number of non-NULL values (cnt), the maximum (max), the minimum (min), the standard deviation (dev) and the constant C=3.49/0.9 as follows:

N=floor(power(cnt,1/3)*(max-min)/(c*dev))

Each of the bin_num (= N) bins bin_1,..., bin_N span ranges of equal width inc = (max – min) / N where bin_I = I when N > 0 or bin_I = N+1–I when N < 0, and bin_0 = bin_(N+1) = NULL. The values of the val column are rounded to round_num significant digits prior to scoring them in the definition table.

The parameter bin_num is used to adjust N to be at least bin_num. No adjustment is done when bin_num is NULL or 0. The parameter max_bin_num is used to adjust N to be at most max_bin_num. No adjustment is done when bin_num is NULL or 0. For columns with all integer values (discrete columns), N is adjusted to be at most the maximum number of distinct values in the observed range max-min+1.

The parameter sample_size is used to adjust cnt to be at most sample_size. No adjustment is done when sample_size is NULL or 0.

Syntax

DBMS_DATA_MINING_TRANSFORM.INSERT_AUTOBIN_NUM_EQWIDTH (
     bin_table_name        IN VARCHAR2,
     data_table_name       IN VARCHAR2,
     bin_num               IN PLS_INTEGER DEFAULT 3,
     max_bin_num           IN PLS_INTEGER DEFAULT 100,
     exclude_list          IN Column_List DEFAULT NULL,
     round_num             IN PLS_INTEGER DEFAULT 6,
     sample_size           IN PLS_INTEGER DEFAULT 50000,
     bin_schema_name       IN VARCHAR2 DEFAULT NULL,
     data_schema_name      IN VARCHAR2 DEFAULT NULL,
     rem_table_name        IN VARCHAR2 DEFAULT NULL,
     rem_schema_name       IN VARCHAR2 DEFAULT NULL));

Parameters

Table 47-12 INSERT_AUTOBIN_EQWIDTH Procedure Parameters

Parameter	Description
`bin_table_name`	Name of the categorical bin table generated using `CREATE_BIN_NUM` procedure
`data_table_name`	Name of the table containing the data
`bin_num`	Minimum number of bins; default number is 3
`max_bin_num`	Maximum number of bins that sets the upper limit for estimates of bin numbers; default is 100
`exclude_list`	List of columns (attributes) to be excluded from this iteration of the binning process; categorical attributes are automatically excluded
`round_num`	Number of significant digits; default is 6
`sample_size`	Size of the data sample; default is 50,000
`bin_schema_name`	Name of the schema hosting the bin definition table; default is user schema
`data_schema_name`	Name of the schema hosting the table with data; default is user schema
`rem_table_name`	Name of the column removal definition table created with `CREATE_COL_REM`.
`rem_schema_name`	Schema of the column removal definition table.

Usage Notes

For a given input table, you can call this routine several times with different specifications for number of bins for a given input table. For each call, you can selectively exclude attributes (that is, column names) using the exclude_list parameter for a particular binning specification.

Columns with all NULL values or only one unique value are ignored. The sign of bin_num, max_bin_num, and sample_size have no effect on the result; absolute values are used. The value adjustment of N is done in the following order: First bin_num, next, max_bin_num, and, finally, discrete column adjustment.

Examples

The simplest invocation of this routine populates bin definitions in the num_bin_table for all the numerical attributes found in build_table.

BEGIN

DBMS_DATA_MINING_TRANSFORM.INSERT_AUTOBIN_NUM_EQUIWIDTH(
      'num_bin_table', 'build_data_table');

END;

INSERT_BIN_CAT_FREQ Procedure

This procedure finds the categorical binning definition for every VARCHAR2 and CHAR column in the data table that is not specified in the exclusion list and inserts the definition into the categorical binning definition table created using CREATE_BIN_CAT.

Definition for each relevant column is computed based on the occurrence frequency of column values that are computed from the data table. Each of the bin_num(N) bins bin_1, ..., bin_N corresponds to the values with top frequencies when N > 0 or bottom frequencies when N < 0, and bin_(N+1) to all remaining values, where bin_I = I. Ordering ties among identical frequencies are broken by ordering on column values (ASC for N > 0 or DESC for N < 0). When the number of distinct values C < N only C+1 bins will be created.

The parameter default_num (D) is used for pruning based on the number of values that fall into the default bin. When D > 0 only columns that have at least D defaults are kept while others are ignored. When D < 0 only columns that have at most D values are kept. No pruning is done when D is NULL or D = 0. Parameter bin_support (SUP) is used for restricting bins to frequent (SUP > 0) values frq >= SUP*tot, or infrequent (SUP < 0) ones frq <= –SUP*tot, where frq is a given value count and tot is a sum of all counts as computed from the data. No support filtering is done when SUP is NULL or when SUP = 0.

Syntax

DBMS_DATA_MINING_TRANSFORM.INSERT_BIN_CAT_FREQ (
     bin_table_name       IN VARCHAR2,
     data_table_name      IN VARCHAR2,
     bin_num              IN PLS_INTEGER DEFAULT 9,
     exclude_list         IN Column_List DEFAULT NULL,
     default_num          IN PLS_INTEGER DEFAULT 2,
     bin_support          NUMBER DEFAULT NULL,
     bin_schema_name      IN VARCHAR2 DEFAULT NULL,
     data_schema_name     IN VARCHAR2 DEFAULT NULL);

Parameters

Table 47-13 INSERT_BIN_CAT_FREQ Procedure Parameters

Parameter	Description
`bin_table_name`	Name of the categorical bin table generated using `CREATE_BIN_CAT` procedure
`data_table_name`	Name of the table containing the data
`bin_num`	Number of bins
`exclude_list`	List of columns (attributes) to be excluded from this iteration of the binning process
`default_num`	Number of default values
`bin_support`	Bin support as a fraction
`bin_schema_name`	Name of the schema hosting the bin definition table
`data_schema_name`	Name of the schema hosting the table with data

Usage Notes

For a given input table, you can iteratively call this routine several times with different specifications for number of bins for a given input table. For each iteration, you can selectively exclude attributes (that is, column names) using the exclude_list parameter for a particular binning specification.

Columns with all NULLs are ignored. No bin definitions are populated when bin_num = 0, or bin_num, is NULL.

Examples

The simplest invocation of this routine populates bin definitions in the cat_bin_table for all the categorical attributes found in build_table.

BEGIN

DBMS_DATA_MINING_TRANSFORM.INSERT_BIN_NUM(
      'cat_bin_table', 'build_table');

END;
/

INSERT_BIN_NUM_EQWIDTH Procedure

This procedure finds the numerical binning definition for every NUMBER column in the data table that is not specified in the exclusion list and inserts the definition into the numerical binning definition table that was created using CREATE_BIN_NUM.

Definition for each relevant column is computed based on the minimum and maximum values that are computed from the data table. Each of the bin_num (= N) bins bin_1,..., bin_N span ranges of equal width inc = (max – min) / N where bin_I = I when N > 0 or bin_I = N+1–I when N < 0, and bin_0 = bin_(N+1) = NULL.

The values of the val column in the bin definition table are rounded to round_num significant digits. For more information, see the Usage Notes.

Syntax

DBMS_DATA_MINING_TRANSFORM.INSERT_BIN_NUM_EQWIDTH (
     bin_table_name        IN VARCHAR2,
     data_table_name       IN VARCHAR2,
     bin_num               IN PLS_INTEGER DEFAULT 10,
     exclude_list          IN Column_List DEFAULT NULL,
     round_num             IN PLS_INTEGER DEFAULT 6,
     bin_schema_name       IN VARCHAR2 DEFAULT NULL,
     data_schema_name      IN VARCHAR2 DEFAULT NULL);

Parameters

Table 47-14 INSERT_BIN_EQWIDTH Procedure Parameters

Parameter	Description
`bin_table_name`	Name of the numerical bin table generated using `CREATE_BIN_NUM` procedure
`data_table_name`	Name of the table containing the data
`bin_num`	Number of bins
`exclude_list`	List of columns (attributes) to be excluded from this iteration of the binning process
`round_num`	Number of significant digits. See Usage Notes.
`bin_schema_name`	Name of the schema hosting the bin definition table
`data_schema_name`	Name of the schema hosting the table with data

Usage Notes

For a given input table, you can iteratively call this routine with different specifications for number of bins for a given input table. For each iteration, you can selectively exclude attributes (that is, column names) using the exclude_list parameter for a particular binning specification.

Columns with all NULL values or only one unique value are ignored. No bin definitions are populated when bin_num = 0, or bin_num is NULL.

For example, when N=2, col='mycol', min=10, and max = 21, the following three rows are inserted into the definition table (inc = 5.5):

COL     VAL  BIN
 ----- -----  -----
 mycol    10  NULL
 mycol  15.5  1
 mycol    21  2

The round_num parameter specifies how to round the number in the VAL column of the definition table. When round_num is positive, it specifies the most significant digits to retain. When round_num is negative, it specifies the least significant digits to remove. In both cases, the result is rounded to the specified number of digits. When round_num is 0, the value is unchanged.

For example, a value of 308.162 would be rounded as follows.

For a value of 308.162:
     when round_num = 1       result is 300
     when round_num = 2       result is 310
     when round_num = 3       result is 308
     when round_num =  0      result is 308.162
     when round_num = -1      result is 308.16
     when round_num = -2      result is 308.2
     when round_num = NULL    result is NULL

Examples

The simplest invocation of this routine populates bin definitions in the num_bin_table for all the numerical attributes found in build_table.

BEGIN

DBMS_DATA_MINING_TRANSFORM.INSERT_BIN_NUM(
      'num_bin_table', 'build_table');

END;

INSERT_BIN_NUM_QTILE Procedure

This procedure finds a numerical binning definition for every NUMBER column in the data table that is not specified in the exclusion list and inserts the definition into the binning definition table that was created using CREATE_BIN_NUM.

The definition for each relevant column is computed based on the minimum values for each quantile, where quantiles are computed from the data using NTILE function. Bins bin_1,..., bin_N span the following ranges: bin_1 spans [min_1,min_2]; bin_2,..., bin_i,..., bin_N-1 span (min_i, min_(i+1)] and bin_N spans (min_N, max_N]. Bins with equal left and right boundaries are collapsed.

Syntax

DBMS_DATA_MINING_TRANSFORM.INSERT_BIN_NUM_QTILE (
    bin_table_name       IN VARCHAR2,
    data_table_name      IN VARCHAR2,
    bin_num              IN PLS_INTEGER DEFAULT 10,
    exclude_list         IN Column_List DEFAULT NULL,
    bin_schema_name      IN VARCHAR2 DEFAULT NULL,
    data_schema_name     IN VARCHAR2 DEFAULT NULL);

Parameters

Table 47-15 INSERT_BIN_NUM_QTILE Procedure Parameters

Parameter	Description
`bin_table_name`	Name of the numerical binning definition table generated using the `CREATE_BIN_NUM` procedure
`data_table_name`	Name of the table containing the data
`bin_num`	Number of bins
`exclude_list`	List of columns (attributes) to be excluded from this iteration of the binning process
`bin_schema_name`	Name of the schema hosting the numerical binning definition table
`data_schema_name`	Name of the schema hosting the table with data

Usage Notes

For a given input table, you can iteratively call this routine several times with different specifications for bin_num for a given input table. For each iteration, you can selectively exclude attributes (that is, column names) using the exclude_list parameter for a particular specification. Columns with all NULL values are ignored.

Example 1. When N = 4, col='mycol', and data is {1,2,2,2,2,3,4}, the following three rows are inserted into the definition table:

COL     VAL   BIN 
----- -----  ----- 
mycol     1   NULL 
mycol     2   1
mycol     4   2

Here quantities are {1,2}, {2,2}, {2,3}, {4} and min(1) = 1, min(2) = 2, min(3) = 2, min(4) = 4, max(4) = 4, and ranges are [1,2], (2,2], (2,4], (4,4]. After collapsing [1,2] and (2,4].

Examples

The simplest invocation of this routine populates numerical binning definitions in the num_bin_table for all the numerical attributes found in build_table.

BEGIN
    DBMS_DATA_MINING_TRANSFORM.INSERT_BIN_NUM_QTILE(
       'num_bin_table', 'build_table');
END;

INSERT_BIN_SUPER Procedure

This procedure finds a numerical and categorical binning definition for every NUMBER, FLOAT, VARCHAR2, or CHAR column in the data table that is not in the exclusion list and inserts the definition into the numerical or categorical binning definition table.

The bin boundaries are computed using a technique called Supervised Binning. The definition for each relevant column is computed based on the splits found by a Decision Tree model built on a single predictor. Columns that have no interesting splits are inserted into the column removal definition table (if one is specified).

Syntax

DBMS_DATA_MINING_TRANSFORM.INSERT_BIN_SUPER (
    num_table_name        IN VARCHAR2,
    cat_table_name        IN VARCHAR2,
    data_table_name       IN VARCHAR2,
    target_column_name    IN VARCHAR2,
    max_bin_num           IN PLS_INTEGER  DEFAULT 1000,
    exclude_list          IN COLUMN_LIST  DEFAULT NULL,
    num_schema_name       IN VARCHAR2     DEFAULT NULL,
    cat_schema_name       IN VARCHAR2     DEFAULT NULL,
    data_schema_name      IN VARCHAR2     DEFAULT NULL,
    rem_table_name        IN VARCHAR2     DEFAULT NULL,
    rem_schema_name       IN VARCHAR2     DEFAULT NULL);

Parameters

Table 47-16 INSERT_BIN_SUPER Procedure Parameters

Parameter	Description
`num_table_name`	Name of the numerical bin definition table generated using the `CREATE_BIN_NUM` procedure.
`cat_table_name`	Name of the categorical bin definition table generated using the `CREATE_BIN_CAT` procedure.
`data_table_name`	Name of the table containing the data.
`target_column_name`	Name of the target column.
`max_bin_num`	The maximum number of bins.
`exclude_list`	List of columns (attributes) to be excluded from this iteration of the binning process.
`num_schema_name`	Schema of the numerical bin definition table.
`cat_schema_name`	Schema of the categorical bin definition table.
`data_schema_name`	Schema of the table containing the data.
`rem_table_name`	Name of the column removal definition table created with `CREATE_COL_REM`.
`rem_schema_name`	Schema of the column removal definition table.

INSERT_CLIP_TRIM_TAIL Procedure

This procedure finds the trimming definition for every NUMBER column in the data table that is not specified in the exclusion list and inserts the definition into the clipping definition table that was created using CREATE_CLIP.

The definition for each relevant column is computed based on the non-NULL values sorted in ascending order such that val(1) < val(2) <... < val(N), where N is a total number of non-NULL values in a column:

lcut = val(1+floor(N*q))
lval = NULL
rcut = val(N–floor(*N*q))
rval = NULL

where q = ABS(NVL(tail_frac,0)). Nothing is done when q >= 0.5.

Syntax

DBMS_DATA_MINING_TRANSFORM.INSERT_CLIP_TRIM_TAIL (
    clip_table_name     IN VARCHAR2,
    data_table_name     IN VARCHAR2,
    tail_frac           IN NUMBER DEFAULT 0.025,
    exclude_list        IN Column_List DEFAULT NULL,
    clip_schema_name    IN VARCHAR2 DEFAULT NULL,
    data_schema_name    IN VARCHAR2 DEFAULT NULL);

Parameters

Table 47-17 INSERT_CLIP_TRIM_TAIL Procedure Parameters

Parameter	Description
`clip_table_name`	Name of the clipping definition table generated using the `CREATE_CLIP` procedure
`data_table_name`	Name of the table containing the data
`tail_frac`	Tail fraction
`exclude_list`	List of columns (attributes) to be excluded from this iteration of the clipping process
`clip_schema_name`	Name of the schema hosting the clipping definition table
`data_schema_name`	Name of the schema hosting the table with data

Usage Notes

For a given input table, you can iteratively call this routine several times with different specifications for tail_frac for a given input table. For each iteration, you can selectively exclude attributes (that is, column names) using the exclude_list parameter for a particular specification.

Example 1. When q = 0.2, col='mycol', and data is {1,2,2,2,3,4,4}, the following row is inserted into the definition table:

COL    LCUT   LVAL   RCUT   RVAL
----- -----  -----  -----  -----
mycol     2    NULL     4   NULL

Here 1 + floor(N*q) = 1 + floor(7*0.2) = 2, lcut = val(2) = 2.

N – floor(N*q) = 7 – floor(7*0.2) = 6, rcut = val(6) = 4.

Examples

The simplest invocation of this routine populates clipping definitions in the clip_table for all the numerical attributes found in build_table.

BEGIN
    DBMS_DATA_MINING_TRANSFORM.INSERT_CLIP_TRIM_TAIL(
       'clip_table', 'build_table');
END;

INSERT_CLIP_WINSOR_TAIL Procedure

This procedure finds the Winsorizing definition for every NUMBER column in the data table that is not specified in the exclusion list and inserts the definition into the clipping definition table that was created using CREATE_CLIP.

Definition for each relevant column is computed based on the non-NULL values sorted in ascending order such that val(1) < val(2) <... < val(N), where N is a total number of non-NULL values in a column:

lcut = val(1+floor(N*q))
lval = lcut
rcut = val(N–floor(N*q))
rval = rcut

where q = ABS(NVL(tail_fraq,0)). Nothing is done when q >= 0.5.

Syntax

DBMS_DATA_MINING_TRANSFORM.INSERT_CLIP_WINSOR_TAIL (
    clip_table_name    IN VARCHAR2,
    data_table_name    IN VARCHAR2,
    tail_frac          IN NUMBER DEFAULT 0.025,
    exclude_list       IN Column_List DEFAULT NULL,
    clip_schema_name   IN VARCHAR2 DEFAULT NULL,
    data_schema_name   IN VARCHAR2 DEFAULT NULL);

Parameters

Table 47-18 INSERT_CLIP_WINSOR_TAIL Procedure Parameters

Parameter	Description
`clip_table_name`	Name of the clipping definition table generated using `CREATE_CLIP` procedure
`data_table_name`	Name of the table containing the data
`tail_frac`	Tail fraction
`exclude_list`	List of columns (attributes) to be excluded from this iteration of the clipping process
`clip_schema_name`	Name of the schema hosting the clipping definition table
`data_schema_name`	Name of the schema hosting the table with data

Usage Notes

For a given input table, you can iteratively call this routine several times with different specifications for tail_frac for a given input table. For each iteration, you can selectively exclude attribute (that is, column names using the exclude_list parameter for a particular specification. Columns with all NULL values are ignored.

Example 1. When q = 0.2, col='mycol', and data is {1,2,2,2,3,4,4}, the following row is inserted into the definition table:

COL    LCUT   LVAL   RCUT  RVAL
----- -----  -----  ----- -----
mycol     2      2      4      4

Here 1 + floor(N*q) = 1 + floor(7*0.2) = 2, lcut = val(2) = 2.

N – floor(N*q) = 7 – floor(7*0.2) = 6, rcut = val(6) = 4.

Examples

The simplest invocation of this routine populates clipping definitions in the clip_table for all the numerical attributes found in build_table.

BEGIN
    DBMS_DATA_MINING_TRANSFORM.INSERT_CLIP_WINSOR_TAIL(
       'clip_table', 'build_table');
END;

INSERT_MISS_CAT_MODE Procedure

This procedure finds the categorical missing value treatment definition for every VARCHAR2 and CHAR column in the data table that is not specified in the exclusion list and inserts the definition into the definition table that was created using CREATE_MISS_CAT.

The definition for each selected column is computed based on the mode value that is computed from the data table.

Syntax

DBMS_DATA_MINING_TRANSFORM.INSERT_MISS_CAT_MODE (
    miss_table_name    IN VARCHAR2,
    data_table_name    IN VARCHAR2,
    exclude_list       IN COLUMN_LIST DEFAULT NULL,
    miss_schema_name   IN VARCHAR2 DEFAULT NULL,
    data_schema_name   IN VARCHAR2 DEFAULT NULL);

Parameters

Table 47-19 INSERT_MISS_CAT_MODE Procedure Parameters

Parameter	Description
`miss_table_name`	Name of the categorical missing value treatment definition table generated using `CREATE_MISS_CAT`
`data_table_name`	Name of the table containing the data
`exclude_list`	List of columns (attributes) to be excluded from this iteration of the missing value treatment. See "Data Types" for the definition of `COLUMN_LIST`.
`miss_schema_name`	Name of the schema hosting the categorical missing value treatment definition table
`data_schema_name`	Name of the schema hosting the table containing the data

Usage Notes

You can choose the categorical attributes that will receive missing value treatment by using the exclude_list parameter. NULL values in all the selected attributes will be replaced with the mode (the most commonly occurring value) for the attribute.

If you wish to replace NULLs with some other value, you can edit the definition table.

Example

The simplest invocation of this routine populates missing value definitions (the mode) in miss_table for all categorical attributes found in build_table.

BEGIN
    DBMS_DATA_MINING_TRANSFORM.INSERT_MISS_CAT_MODE(
       'miss_table', 'build_table');
END;

INSERT_MISS_NUM_MEAN Procedure

This procedure finds the numerical missing value treatment definition for every NUMBER column in the data table that is not specified in the exclusion list and inserts the definition into the definition table that was created using CREATE_MISS_NUM.

The definition for each selected column is computed based on the mean value that is computed from the data table. The value of mean is rounded to round_num significant digits prior to storing it in the definition table.

Syntax

DBMS_DATA_MINING_TRANSFORM.INSERT_MISS_NUM_MEAN (
    miss_table_name    IN VARCHAR2,
    data_table_name    IN VARCHAR2,
    exclude_list       IN COLUMN_LIST DEFAULT NULL,
    round_num          IN PLS_INTEGER DEFAULT 6,
    miss_schema_name   IN VARCHAR2 DEFAULT NULL,
    data_schema_name   IN VARCHAR2 DEFAULT NULL);

Parameters

Table 47-20 INSERT_MISS_NUM_MEAN Procedure Parameters

Parameter	Description
`miss_table_name`	Name of the categorical missing value treatment definition table generated using `CREATE_MISS_CAT`
`data_table_name`	Name of the table containing the data
`exclude_list`	List of columns (attributes) to be excluded from this iteration of the miss value treatment. See "Data Types" for the definition of `COLUMN_LIST`.
`round_num`	The number of significant digits
`miss_schema_name`	Name of the schema hosting the numerical missing value treatment definition table
`data_schema_name`	Name of the schema hosting the table containing the data

Usage Notes

You can choose the numerical attributes that will receive missing value treatment by using the exclude_list parameter. NULL values in all the selected attributes will be replaced with the mean (average) value for the attribute.

If you wish to replace NULLs with some other value, you can edit the definition table.

Example

The simplest invocation of this routine populates missing value definitions (the mode) in miss_table for all numerical attributes found in build_table.

BEGIN
    DBMS_DATA_MINING_TRANSFORM.INSERT_MISS_CAT_MODE(
       'miss_table', 'build_table');
END;

INSERT_NORM_LIN_MINMAX Procedure

This procedure finds the normalization definition for every NUMBER column in the data table that is not specified in the exclusion list and inserts the definition based on min-max normalization into the table that was created using CREATE_NORM_LIN.

Definition for each relevant column is computed based on the mean and standard deviation that are computed from the data table, such that shift = mean and scale = standard deviation. The values of shift and scale are rounded to round_num significant digits prior to storing them in the definition table.

Syntax

DBMS_DATA_MINING_TRANSFORM.INSERT_NORM_LIN_MINMAX (
     norm_table_name     IN VARCHAR2,
     data_table_name     IN VARCHAR2,
     exclude_list        IN Column_List DEFAULT NULL,
     round_num           IN PLS_INTEGER DEFAULT 6,
     norm_schema_name    IN VARCHAR2 DEFAULT NULL,
     data_schema_name    IN VARCHAR2 DEFAULT NULL);

Parameters

Table 47-21 INSERT_NORM_LIN_MINMAX Procedure Parameters

Parameter	Description
`norm_table_name`	Name of the normalization table generated using `CREATE_NORM_LIN` procedure
`data_table_name`	Name of the table containing the data
`exclude_list`	List of columns (attributes) to be excluded from this iteration of the normalization process
`round_num`	Number of significant digits
`norm_schema_name`	Name of the schema hosting the normalization definition table
`data_schema_name`	Name of the schema hosting the table with data

Usage Notes

For a given input table, you can iteratively call this routine several times with selective exclusion of attributes (that is, column names) using the exclude_list parameter for a particular normalization specification.

Columns with all NULL values or only one unique value are ignored.

Examples

The simplest invocation of this routine populates normalization definitions in the norm_minmax_table for all the numerical attributes found in build_table.

BEGIN

DBMS_DATA_MINING_TRANSFORM.INSERT_NORM_LIN_MINMAX(
      'norm_minmax_table', 'build_table');

END;

INSERT_NORM_LIN_SCALE Procedure

The normalization definition for each attribute is computed based on the minimum and maximum values of the data. The values for shift and scale are shift = 0 and scale = max{abs(max), abs(min)}.

Syntax

DBMS_DATA_MINING_TRANSFORM.INSERT_NORM_LIN_SCALE (
     norm_table_name     IN VARCHAR2,
     data_table_name     IN VARCHAR2,
     exclude_list        IN Column_List DEFAULT NULL,
     round_num           IN PLS_INTEGER DEFAULT 6,
     norm_schema_name    IN VARCHAR2 DEFAULT NULL,
     data_schema_name    IN VARCHAR2 DEFAULT NULL);

Parameters

Table 47-22 INSERT_NORM_LIN_SCALE Procedure Parameters

Parameter	Description
`norm_table_name`	Name of the normalization table generated using `CREATE_NORM_LIN` procedure
`data_table_name`	Name of the table containing the data
`exclude_list`	List of columns (attributes) to be excluded from this iteration of the normalization process
`round_num`	Number of significant digits
`norm_schema_name`	Name of the schema hosting the normalization definition table
`data_schema_name`	Name of the schema hosting the table with data

Usage Notes

For a given input table, you can iteratively call this routine several times with selective exclusion of attributes (that is, column names) using the exclude_list parameter for a particular normalization specification.

Columns with all NULL values or only one unique value are ignored.

Examples

The simplest invocation of this routine populates normalization definitions in the norm_minmax_table for all the numerical attributes found in build_table.

BEGIN

DBMS_DATA_MINING_TRANSFORM.INSERT_NORM_LIN_SCALE(
      'norm_scale_table', 'build_table');

END;

INSERT_NORM_LIN_ZSCORE Procedure

This procedure finds the normalization definition for every NUMBER column in the data table that is not specified in the exclusion list and inserts the definition based on z-score normalization into the table that was created using CREATE_NORM_LIN.

Definition for each relevant column is computed based on the minimum and maximum values that are computed from the data table, such that shift = min and scale = max – min. The values of shift and scale are rounded to round_num significant digits prior to storing them in the definition table.

Syntax

DBMS_DATA_MINING_TRANSFORM.INSERT_NORM_LIN_ZSCORE (
     norm_table_name     IN VARCHAR2,
     data_table_name     IN VARCHAR2,
     exclude_list        IN Column_List DEFAULT NULL,
     round_num           IN PLS_INTEGER DEFAULT 6,
     norm_schema_name    IN VARCHAR2 DEFAULT NULL,
     data_schema_name    IN VARCHAR2 DEFAULT NULL);

Parameters

Table 47-23 INSERT_NORM_LIN_ZSCORE Procedure Parameters

Parameter	Description
`norm_table_name`	Name of the normalization table generated using `CREATE_NORM_LIN` procedure
`data_table_name`	Name of the table containing the data
`exclude_list`	List of columns (attributes) to be excluded from this iteration of the normalization process
`round_num`	Number of significant digits
`norm_schema_name`	Name of the schema hosting the normalization definition table
`data_schema_name`	Name of the schema hosting the table with data

Usage Notes

Columns with all NULL values or only one unique value are ignored.

Examples

The simplest invocation of this routine populates normalization definitions in the norm_zscore_table for all the numerical attributes found in build_table.

BEGIN

DBMS_DATA_MINING_TRANSFORM.INSERT_NORM_LIN_ZSCORE(
      'norm_zscore_table', 'build_table');

END;

SET_EXPRESSION Procedure

This procedure appends a VARCHAR2 chunk to the expression. Appending NULL clears the expression.

Syntax

DBMS_DATA_MINING_TRANSFORM.SET_EXPRESSION (
           expression       IN OUT NOCOPY EXPRESSION_REC,
           chunk            VARCHAR2 DEFAULT NULL);

Parameters

Table 47-24 SET_EXPRESSION Procedure Parameters

Parameter	Description
`expression`	Transformation expression
`chunk`	Chunk to add to the expression

SET_TRANSFORM Procedure

This procedure specifies a transformation expression and a reverse transformation expression for an attribute.

The new transformation specification is appended to the transformation list.

Syntax

DBMS_DATA_MINING_TRANSFORM.SET_TRANSFORM (
          xform_list               IN OUT NOCOPY TRANSFORM_LIST,
          attribute_name           VARCHAR2,
          attribute_subname        VARCHAR2,
          expression               VARCHAR2,
          reverse_expression       VARCHAR2,
          attribute_spec           VARCHAR2 DEFAULT NULL);

Parameters

Table 47-25 SET_TRANSFORM Procedure Parameters

Parameter	Description
`xform_list`	The transformation list for the attribute
`attribute_name`	Name of the attribute
`attribute_subname`	Name of the nested attribute if `attribute_name` is a nested column, otherwise `NULL`.
`expression`	A SQL expression.
`reverse_expression`	SQL expression that reverses the transformation for readability in model details and in the target of a supervised model (if the attribute is a target)
`attribute_spec`	You can specify the value `NOPREP` for `attribute_spec` to prevent this attribute from being automatically transformed in `DBMS_DATA_MINING.CREATE_MODEL`. (See "CREATE_MODEL Procedure".)

STACK_BIN_CAT Procedure

For every column in the stack definition that has a matching entry in the transformation definition, this procedure stacks a categorical binning expression on top of the existing expression and updates the stack definition.

Columns in the transformation definition that have no matching entries in the stack definition are added to the stack definition (using "{col}" in place of the original expression or "VALUE" for nested attributes).

Given an entry ({attr}, {expr}, {rexp}) in the stack definition and a matching entry in the transformation definition, {expr} has the following form after stacking:

DECODE({expr}, '{val1}', '{bin1}',
                 ...
               '{valN}', '{binN}',
                NULL,     NULL,
                         '{bin(N+1)}')

and {rexp} has the following form:

DECODE("{attr}", '{bin1}',     '{rev1}',
                  ...
                  '{binN}',     '{revN}',
                  NULL,         'NULL',
                  '{bin(N+1)}', 'DEFAULT')

where {revI} is the result of applying reverse expression to {valI}.

If {binI} and {binJ} are equal, then the corresponding entries of the DECODE function above are merged into:

'{binI}', '{revI}, {revJ}'

If more than one entry maps to the default bin {bin(N+1)}, they are all merged to

'{bin(N+1)}', 'DEFAULT'

Note that the reverse expression implicitly maps invalid bins to NULL.

Syntax

DBMS_DATA_MINING_TRANSFORM.STACK_BIN_CAT (
     bin_table_name     IN       VARCHAR2,
     transform_list     IN/OUT   DM_TRANSFORMS,
     literal_flag       IN       BOOLEAN  DEFAULT FALSE,
     bin_schema_name    IN       VARCHAR2 DEFAULT NULL);

Parameters

Table 47-26 STACK_BIN_CAT Procedure Parameters

Parameter	Description
`bin_table_name`	Name of the bin definition table
`transform_list`	List of stacked transformations
`literal_flag`	Literal flag (See the Usage Notes)
`bin_schema_name`	Schema of the bin definition table

Usage Notes

The Literal flag indicates whether the values in {bin} are valid SQL literals. When the the flag is set to TRUE, the value of {bin} is used "as is" in the expression generation, otherwise it is converted into a valid text literal (surrounded by quotes and double the quotes inside). By default the flag is set to FALSE. One example when it can be set to TRUE is in cases when all {bin} are numbers. In that case the transformed column will remain numeric as opposed to textual (default behavior). For example, for the following transform definition:

COL   VAL            BIN
      ----- -------------- ----
      mycol Waltham        1
      mycol Burlington     1
      mycol Redwood Shores 2

the following {expr} and {rexp} are generated when the flag is FALSE:

expr = DECODE("mycol", 'Waltham',        '1',
                            'Burlington',     '1',
                            'Redwood Shores', '2')
     rexp = DECODE("COL1", '1',  '''Waltham'', ''Burlington''',
                           '2',  '''Redwood Shores''',
                             NULL, 'DEFAULT')

and when the flag is set to TRUE:

expr = DECODE("mycol", 'Waltham',        1,
                            'Burlington',     1,
                            'Redwood Shores', 2)
      rexp = DECODE("COL1", 1,    '''Waltham'', ''Burlington''',
                            2,    '''Redwood Shores''',
                            NULL, 'DEFAULT')

Example

Given transformation definition:

{col = COL1, val = waltham,        bin = MA}
       {col = COL1, val = burlington,     bin = MA}
       {col = COL1, val = redwood shores, bin = CA}
       {col = COL2, val = MA,             bin = East}
       {col = COL2, val = CA,             bin = West}
       {col = COL2, val = NULL,           bin = USA}

and stack definition:

{attr = COL1, expr = lower(COL1), rexp = initcap(COL1)} 
       {attr = COL3, expr = upper(COL3), rexp = initcap(COL3)}

the following updated stack definition is generated:

{attr = COL1,
        expr = DECODE(lower(COL1), 'waltham',        'MA',
                                   'burlington',     'MA',
                                   'redwood shores', 'CA'),
        rexp = DECODE("COL1", 'MA', '''Waltham'', ''Burlington''',
                              'CA', '''Redwood Shores''',
                               NULL, 'DEFAULT')}
       {attr = COL3,
        expr = upper(COL3),
        rexp = initcap(COL3)}
       {attr = COL2,
        expr = DECODE("COL2", 'MA', 'East',
                              'NY', 'East',
                              'CA', 'West',
                               NULL, NULL,
                                    'USA')
        rexp = DECODE("COL2", 'East', '''MA''',
                              'West', '''CA''',
                               NULL,   'NULL',
                              'USA',  'DEFAULT')}

STACK_BIN_NUM Procedure

For every column in the stack definition that has a matching entry in the transformation definition, this procedure stacks numerical a binning expression on top of the existing expression and updates the stack definition.

Columns in the stack definition that have no matching entries in the transformation definition or have NULL expression are not changed. Columns in the transformation definition that have no matching entries in the stack definition are added to the stack definition (using "{col}" in place of the original expression or "VALUE" for nested attributes).

Reverse expressions in the stack definition are updated accordingly, that is if an expression is updated, added, or remains unchanged, then a corresponding reverse expression is also updated, added, or remains unchanged. When reverse expression is NULL it is treated as identity function ("{attr}" or "VALUE" for nested attributes).

Given an entry ({attr}, {expr}, {rexp}) in the stack definition and a matching entry in the transformation definition, {expr} has the following form after stacking:

CASE WHEN ({expr}) <  {val0}   THEN '{bin0}'
     WHEN ({expr}) <= {val1}   THEN '{bin1}'
     ...
     WHEN ({expr}) <= {valN}   THEN '{binN}'
     WHEN ({expr}) IS NOT NULL THEN '{bin(N+1)}'
END

and {rexp} has the following form:

DECODE("{attr}", '{bin0}',     '( ; {rev0})',
                 '{bin1}',     '[{rev0}; {rev1})',
                 ...
                 '{binN}',     '[{rev(N-1)}; {revN}]',
                 '{bin(N+1)}', '({revN}; )',
                   NULL,         'NULL')

where {revI} is the result of applying reverse expression to {valI}.

If {binI} and {binJ} are equal then the corresponding entries of the DECODE function above are merged into:

'{binI}', '[{rev(I-1)}; {revI}), [{rev(J-1)}; {revJ})'

Note that reverse expression implicitly maps invalid bins to NULL.

Syntax

DBMS_DATA_MINING_TRANSFORM.STACK_BIN_NUM (
     bin_table_name     IN       VARCHAR2,
     transform_list     IN/OUT   DM_TRANSFORMS,
     literal_flag       IN       BOOLEAN  DEFAULT FALSE,
     bin_schema_name    IN       VARCHAR2 DEFAULT NULL);

Parameters

Table 47-27 STACK_BIN_NUM Procedure Parameters

Parameter	Description
`bin_table_name`	Name of the bin definition table.
`transform_list`	Stack definition.
`literal_flag`	Literal flag (See Usage Notes)
`bin_schema_name`	Schema of the bin definition table

Usage Notes

Literal flag indicates whether the values in {bin} are valid SQL literals. When the flag is set to TRUE the value of {bin} is used "as is" in the expression generation, otherwise it is converted into a valid text literal (surround by quotes and double the quotes inside. By default the flag is set to FALSE. One example when it can be set to TRUE is in cases when all {bin} are numbers. In that case the transformed column will remain numeric as opposed to textual (default behavior). For example, for the following xform definition:

COL     VAL BIN
  ----- ----- -----
  mycol    10 NULL
  mycol    15 1
  mycol    20 2

the following {expr} and {rexp} are generated when the flag is FALSE:

expr = CASE WHEN "mycol" <  10 THEN NULL
                    WHEN "mycol" <= 15 THEN '1'
                    WHEN "mycol" <= 20 THEN '2' END

        rexp = DECODE("mycol", '1',  '[10; 15)',
                               '2',  '[15; 20]'
                               NULL, '( ; 10) OR (20; ) OR NULL')

and when the flag is set to TRUE:

expr = CASE WHEN "mycol" <  10 THEN NULL
                    WHEN "mycol" <= 15 THEN 1
                    WHEN "mycol" <= 20 THEN 2 END
        rexp = DECODE("mycol", 1,    '[10; 15)',
                               2,    '[15; 20]',
                               NULL, '( ; 10) OR (20; ) OR NULL')

Examples

Given transformation definition:

{col = COL1, val = 0,   bin = NULL}
  {col = COL1, val = 1,   bin = A}
  {col = COL1, val = 2,   bin = B}
  {col = COL1, val = 3,   bin = A}
  {col = COL2, val = 10,  bin = NULL}
  {col = COL2, val = 15,  bin = 1}
  {col = COL2, val = 20,  bin = 2}

and stack definition:

{attr = COL1, expr = log(10, COL1), rexp = power(10, COL1)} 
  {attr = COL3, expr = ln(COL3),      rexp = exp(COL3)}

the following updated stack definition is generated:

{attr = COL1,
   expr = CASE WHEN (log(10, COL1)) <  0 THEN NULL
               WHEN (log(10, COL1)) <= 1 THEN 'A'
               WHEN (log(10, COL1)) <= 2 THEN 'B'
               WHEN (log(10, COL1)) <= 3 THEN 'A' END,
   rexp = DECODE("COL1", 'A',   '[1; 10), [100; 1000]',
                         'B',   '[10; 100)',
                         NULL, '( ; 1), (1000; ), NULL')}
   {attr = COL3,
    expr = ln(COL3),
    rexp = exp(COL3)}
   {attr = COL2,
    expr = CASE WHEN "COL2" <  10 THEN NULL
                WHEN "COL2" <= 15 THEN '1'
                WHEN "COL2" <= 20 THEN '2' END
    rexp = DECODE("COL2", '1',  '[10; 15)',
                          '2',  '[15; 20]',
                           NULL, '( ; 10) OR (20; ) OR NULL')}

STACK_CLIP Procedure

For every column in the stack definition that has a matching entry in the transformation definition. Stacks clipping expression on top of the existing expression and updates the stack definition.

Columns in the stack definition that have no matching entries in the transformation definition or have NULL expression are not changed.

Given an entry ({attr}, {expr}, {rexp}) in the stack definition and a matching entry in the transformation definition, after stacking {expr} has the following form:

CASE WHEN {expr} < {lcut} THEN {lval}
       WHEN {expr} > {rcut} THEN {rval}
                            ELSE {expr}
    END

and {rexp} remains unchanged

Syntax

DBMS_DATA_MINING_TRANSFORM.STACK_CLIP (
          clip_table_name       VARCHAR2,
          xform_list            IN OUT NOCOPY TRANSFORM_LIST,
          clip_schema_name       VARCHAR2 DEFAULT NULL);

Parameters

Table 47-28 STACK_CLIP Procedure Parameters

Parameter	Description
`clip_table_name`	Clipping definition table
`xform_list`	Stack definition
`clip_schema_list`	Schema of the clipping definition table. If you do not specify a schema, you own schema is assumed.

Example

Given transformation definition:

{col = COL1, lcut = -1.5, lval = -1.5, rcut = 4.5, rval = 4.5}
       {col = COL2, lcut = 0,    lval = 0,    rcut = 1,   rval = 1}

and stack definition:

{attr = COL1, expr = log(10, COL1), rexp = power(10, COL1)} 
       {attr = COL3, expr = ln(COL3),      rexp = exp(COL3)}

the following updated stack definition is generated:

{attr = COL1,
        expr = CASE WHEN log(10, COL1) < -1.5 THEN -1.5
                    WHEN log(10, COL1) > 4.5  THEN 4.5
                                              ELSE log(10, COL1)
               END,
        rexp = power(10, COL1)}
       {attr = COL3,
        expr = ln(COL3),
        rexp = exp(COL3)}
       {attr = COL2,
        expr = CASE WHEN COL2 < 0 THEN 0
                    WHEN COL2 > 1 THEN 1
                                  ELSE COL2
               END,
        rexp = NULL}

STACK_COL_REM Procedure

For every column in the stack definition that has a matching entry in the transformation definition, this procedure stacks column removal expression (NULL expression) instead of the existing expression and updates the stack definition.

Given an entry ({attr}, {expr}, {rexp}) in the stack definition and a matching entry in the transformation definition, after stacking {expr} and {rexp} are both changed to NULL.

Syntax

DBMS_DATA_MINING_TRANSFORM.STACK_COL_REM (
     rem_table_name     IN       VARCHAR2,
     transform_list     IN/OUT   DM_TRANSFORMS,
     rem_schema_name    IN       VARCHAR2 DEFAULT NULL);

Parameters

Table 47-29 STACK_COL_REM Procedure Parameters

Parameter	Description
`rem_table_name`	Name of the column removal definition table.
`transform_list`	Stack definition
`rem_schema_name`	Schema of the column removal definition table.

Examples

Given transformation definition:

{col = COL1, att = NULL}
   {col = COL2, att = NULL}

and stack definition:

{attr = COL1, expr = log(10, COL1), rexp = power(10, COL1)} 
   {attr = COL3, expr = ln(COL3),      rexp = exp(COL3)}

the following updated stack definition is generated:

{attr = COL1, expr = NULL,     rexp = NULL}
   {attr = COL3, expr = ln(COL3), rexp = exp(COL3)}
   {attr = COL2, expr = NULL,     rexp = NULL}

STACK_MISS_CAT Procedure

For every column in the stack definition that has a matching entry in the transformation definition, this procedure stacks categorical missing value treatment expression on top of the existing expression and updates the stack definition.

Reverse expressions in the stack definition are updated accordingly, that is if an expression is updated, added or remains unchanged then a corresponding reverse expression is also updated, added or remains unchanged. When reverse expression is NULL it is treated as identity function ("{attr}" or "VALUE" for nested attributes). Given an entry ({attr}, {expr}, {rexp}) in the stack definition and a matching entry in the transformation definition, after stacking {expr} has the following form:

NVL({expr}, {val})

and {rexp} remains unchanged.

Syntax

DBMS_DATA_MINING_TRANSFORM.STACK_MISS_CAT (
     miss_table_name     IN       VARCHAR2,
     transform_list      IN/OUT   DM_TRANSFORMS,
     miss_schema_name    IN       VARCHAR2 DEFAULT NULL);

Parameters

Table 47-30 STACK_MISS_CAT Procedure Parameters

Parameter	Description
`miss_table_name`	Name of the missing value definition table.
`transform_list`	Stack definition.
`miss_schema_name`	Schema of the missing value definition table.

Examples

Given transformation definition:

{col = COL1, val = 'ma'}
   {col = COL2, val = 'CA'}

and stack definition:

{attr = COL1, expr = lower(COL1), rexp = initcap(COL1)} 
  {attr = COL3, expr = upper(COL3), rexp = initcap(COL3)}

the following updated stack definition is generated:

{attr = COL1,
   expr = NVL(lower(COL1), 'ma'),
   rexp = initcap(COL1)}
  {attr = COL3,
  expr = upper(COL3),
  rexp = initcap(COL3)}
 {attr = COL2,
  expr = NVL(COL2, 'CA'),
  rexp = NULL}

STACK_MISS_NUM Procedure

For every column in the stack definition that has a matching entry in the transformation definition, stacks numerical missing value treatment expression on top of the existing expression and updates the stack definition.

Given an entry ({attr}, {expr}, {rexp}) in the stack definition and a matching entry in the transformation definition, after stacking {expr} has the following form:

NVL({expr}, {val})

and {rexp} remains unchanged

Syntax

DBMS_DATA_MINING_TRANSFORM.STACK_MISS_NUM (
     miss_table_name     IN       VARCHAR2,
     transform_list      IN/OUT   DM_TRANSFORMS,
     miss_schema_name    IN       VARCHAR2 DEFAULT NULL);

Parameters

Table 47-31 STACK_MISS_NUM Procedure Parameters

Parameter	Description
`miss_table_name`	Name of the missing value definition table.
`transform_list`	Stack definition.
`miss_schema_name`	Schema of the missing value definition table.

Examples

Given transformation definition:

{col = COL1, val = 4.5}
   {col = COL2, val = 0}

and stack definition:

{attr = COL1, expr = log(10, COL1), rexp = power(10, COL1)} 
  {attr = COL3, expr = ln(COL3),      rexp = exp(COL3)}

the following updated stack definition is generated:

{attr = COL1,
   expr = NVL(log(10, COL1), 4.5),
   rexp = power(10, COL1)}
  {attr = COL3,
   expr = ln(COL3),
   rexp = exp(COL3)}
  {attr = COL2,
   expr = NVL(COL2, 0),
   rexp = NULL}

STACK_NORM_LIN Procedure

For every column in the stack definition that has a matching entry in the transformation definition, stacks linear normalization expression on top of the existing expression and updates the stack definition.

Given an entry ({attr}, {expr}, {rexp}) in the stack definition and a matching entry in the transformation definition, after stacking {expr} has the following form:

({expr} - {shift})/{scale}

and {rexp} maintains the following form with every occurrence of {attr} replaced with:

{attr}*{scale} + {shift}

Syntax

DBMS_DATA_MINING_TRANSFORM.STACK_NORM_LIN (
     norm_table_name     IN       VARCHAR2,
     transform_list      IN/OUT   DM_TRANSFORMS,
     norm_schema_name    IN       VARCHAR2 DEFAULT NULL);

Parameters

Table 47-32 STACK_NORM_LIN Procedure Parameters

Parameter	Description
norm_table_name	Name of the linear normalization definition table.
transform_list	Stack definition.
norm_schema_name	Schema of the linear normalization definition table.

Examples

Given transformation definition:

{col = COL1, shift = -1.5, scale = 20}
  {col = COL2, shift = 0,    scale = 10}

and stack definition:

{attr = COL1, expr = log(10, COL1), rexp = power(10, COL1)} 
  {attr = COL3, expr = ln(COL3),      rexp = exp(COL3)}

the following updated stack definition is generated:

{attr = COL1,
   expr = (log(10, COL1) - (-1.5)) / 20,
   rexp = power(10, COL1*20 + (-1.5))}
  {attr = COL3,
   expr = ln(COL3),
   rexp = exp(COL3)}
  {attr = COL2,
   expr = (COL2 - 0) / 10,
   rexp = COL2*10 + 0}

XFORM_BIN_CAT Procedure

This procedure creates the view that performs categorical binning. Only the columns that are specified in the definition table are transformed; the remaining columns do not change.

Syntax

DBMS_DATA_MINING_TRANSFORM.XFORM_BIN_CAT (
     bin_table_name       IN VARCHAR2,
     data_table_name      IN VARCHAR2,
     xform_view_name      IN VARCHAR2,
     literal_flag         IN BOOLEAN DEFAULT FALSE,
     bin_schema_name      IN VARCHAR2 DEFAULT NULL,
     data_schema_name     IN VARCHAR2 DEFAULT NULL,
     xform_schema_name    IN VARCHAR2 DEFAULT NULL);

Parameters

Table 47-33 XFORM_BIN_CAT Procedure Parameters

Parameter	Description
`bin_table_name`	Name of the categorized binning definition table generated using `CREATE_BIN_CAT` procedure
`data_table_name`	Name of the table containing the data
`xform_view_name`	View representing the transformed output
`literal_flag`	Literal flag
`bin_schema_name`	Name of the schema hosting the bin definition table
`data_schema_name`	Name of the schema hosting the data table
`xform_schema_name`	Name of the schema hosting the view representing the transformed output

Usage Notes

The bin table created by CREATE_BIN_CAT and populated with bin definitions by INSERT_BIN_CAT_FREQ is used to guide the query generation process to construct categorical binning expressions of the following form:

DECODE("col", val_1, bin_1,
              ...
              val_N, bin_N,
              NULL,   NULL,
              bin_(N+1)) "col"

This expression maps values val_1,..., val_N into N bins bin_1,..., bin_N, and other values into bin_(N+1), while NULL values remain unchanged. bin_(N+1) is optional. If not specified, it defaults to NULL. To specify bin_(N+1) provide a row with val set to NULL.

The literal_flag parameter indicates whether the values in bin are valid SQL literals. When the flag is set to TRUE, the value of bin is used as is in query generation; otherwise it is converted into a valid text literal (surrounded by quotes and each single quote is replaced by two single quotes). By default, the flag is set to FALSE. One example of when it can be set to TRUE is in cases when all bin are numbers. In that case the transformed column will be numeric as opposed to textual (default behavior).

Set literal_flag to TRUE when the data is binned for an O-Cluster model build.

The col parameter is case-sensitive since it generates quoted identifiers. In cases when there are multiple entries with the same col,val combination with different bin, the behavior is undefined — any one of the bin values might be used.

Examples

Example 1. bin_cat contains four rows with col = 'mycol':

{col = 'mycol', val = 'Waltham',        bin = 'MA'}
{col = 'mycol', val = 'Burlington',     bin = 'MA'}
{col = 'mycol', val = 'Redwood Shores', bin = 'CA'}
{col = 'mycol', val = NULL,             bin = 'OTHER'}

the following expression is generated:

DECODE("mycol", 'Waltham',        'MA',
                 'Burlington',     'MA',
                 'Redwood Shores', 'CA',
                  NULL,             NULL,
                                   'OTHER') "mycol"

Example 2. bin_cat contains three rows with col = 'mycol':

{col = 'mycol', val = 'Waltham',        bin = 'MA'}
{col = 'mycol', val = 'Burlington',     bin = 'MA'}
{col = 'mycol', val = 'Redwood Shores', bin = 'CA'}

the following expression is generated:

DECODE("mycol", 'Waltham',        'MA',
                'Burlington',     'MA',
                'Redwood Shores', 'CA') "mycol"

Example 3. For the definition:

COL   VAL            BIN
----- ----------     ---
mycol Waltham        1
mycol Burlington     1
mycol Redwood Shores 2

the following expression is generated when the literal flag is set to FALSE:

DECODE ("mycol", 'Waltham',        '1',
                 'Burlington'      '1',
                 'Redwood Shores', '2') "mycol"

and when the flag is set to TRUE:

DECODE("mycol", 'Waltham',        1,
                'Burlington',     1,
                'Redwood Shores', 2) "mycol"

The simplest invocation of this routine generates a view build_view that represents the transformation query on build_table based on bin definitions in the cat_bin_table.

BEGIN

DBMS_DATA_MINING_TRANSFORM.XFORM_BIN_CAT(
'cat_bin_table', 'build_table', 'build_view');

END;
/

XFORM_BIN_NUM Procedure

This procedure creates the view that performs numerical binning. Only the columns that are specified in the definition table are transformed; the remaining columns do not change.

Syntax

DBMS_DATA_MINING_TRANSFORM.XFORM_BIN_NUM (
     bin_table_name     IN VARCHAR2,
     data_table_name    IN VARCHAR2,
     xform_view_name    IN VARCHAR2,
     literal_flag       IN BOOLEAN DEFAULT FALSE,
     bin_schema_name    IN VARCHAR2 DEFAULT NULL,
     data_schema_name   IN VARCHAR2 DEFAULT NULL,
     xform_schema_name  IN VARCHAR2 DEFAULT NULL);

Parameters

Table 47-34 XFORM_BIN_NUM Procedure Parameters

Parameter	Description
`bin_table_name`	Name of the numerical binning definition table generated using `CREATE_BIN_NUM` procedure
`data_table_name`	Name of the table containing the data
`xform_view_name`	View representing the transformed output
`literal_flag`	Literal flag
`bin_schema_name`	Name of the schema hosting the bin definition table
`data_schema_name`	Name of the schema hosting the data table
`xform_schema_name`	Name of the schema hosting the view representing the transformed output

Usage Notes

The bin table created by CREATE_BIN_NUM and populated with bin definitions by INSERT_BIN_NUM_EQWIDTH or INSERT_BIN_NUM_QTILE is used to guide the query generation process to construct numerical binning expressions of the following form:

CASE WHEN "col" <  val_0   THEN 'bin0_0
     WHEN "col" <= val_1   THEN 'bin_1'
     ...
     WHEN "col" <= val_N   THEN 'bin_N'
     WHEN "col" IS NOT NULL THEN 'bin_(N+1)'
END "col"

This expression maps values in the range [val_0;val_N] into N bins bin_1,..., bin_N, values outside of this range into bin_0 or bin_(N+1), such that

(-inf; val_0)  -> bin_0
[val_0; val_1)  -> bin_1
...
(val_(N-1); val_N] -> bin_N
(val_N; +inf)  -> bin_(N+1)

NULL values remain unchanged. bin_(N+1) is optional. If it is not specified, the values ("col" > val_N) are mapped to NULL. To specify bin_(N+1), provide a row with val set to NULL. The order of the WHEN... THEN pairs is based on the ascending order of val for a given col.

The literal_flag parameter indicates whether the values in bin are valid SQL literals. When the flag is set to TRUE, the value of bin is used as is in query generation; otherwise it is converted into a valid text literal (surrounded by quotes and each single quote is replaced by two single quotes). By default, the flag is set to FALSE. One example of when it can be set to TRUE is in cases when all bin are numbers. In that case the transformed column will be numeric as opposed to textual (default behavior).

Note that col is case-sensitive since it generates quoted identifiers. In cases where there are multiple entries with the same col,val combination with different bin, the behavior is undefined — any one of the bin values might be used.

Examples

Example 1. bin_num contains four rows with col = 'mycol':

{col = 'mycol', val = 15.5, bin = 'small'}
{col = 'mycol', val = 10,   bin = 'tiny'}
{col = 'mycol', val = 20,   bin = 'large'}
{col = 'mycol', val = NULL, bin = 'huge'}

the following expression is generated:

CASE WHEN "mycol" <  10       THEN 'tiny'
     WHEN "mycol" <= 15.5     THEN 'small'
     WHEN "mycol" <= 20       THEN 'large'
     WHEN "mycol" IS NOT NULL THEN 'huge'
 END "mycol"

Example 2. bin_num contains three rows with col = 'mycol':

{col = 'mycol', val = 15.5, bin = NULL}
{col = 'mycol', val = 10,   bin = 'tiny'}
{col = 'mycol', val = 20,   bin = 'large'}

the following expression is generated:

CASE WHEN "mycol" <  10   THEN NULL
     WHEN "mycol" <= 15.5 THEN 'small'
     WHEN "mycol" <= 20   THEN 'large'
END "mycol"

Example 3. For the definition:

COL   VAL  BIN
----- ---- ---
mycol   10 NULL
mycol 15.5 1
mycol   21 2

the following expression is generated when the literal flag is set to FALSE:

CASE WHEN "mycol" <  10   THEN NULL
     WHEN "mycol" <= 15.5 THEN '1'
     WHEN "mycol" <= 20   THEN '2' 
 END "mycol"

and when the flag is set to TRUE:

CASE WHEN "mycol" <   10  THEN NULL
     WHEN "mycol" <= 15.5 THEN 1
     WHEN "mycol" <=  20  THEN 2
     END "mycol"

The simplest invocation of this routine generates a view build_view that represents the transformation query on build_table based on transform definitions in bin definitions in the num_bin_table.

BEGIN

DBMS_DATA_MINING_TRANSFORM.XFORM_BIN_NUM(
      'num_bin_table', 'build_table', 'build_view');

END;
/

XFORM_CLIP Procedure

This procedure creates the view that performs clipping. Only the columns that are specified in the transform definition are clipped; the remaining columns do not change.

Syntax

DBMS_DATA_MINING_TRANSFORM.XFORM_CLIP (
    clip_table_name       IN VARCHAR2, 
    data_table_name       IN VARCHAR2,
    xform_view_name       IN VARCHAR2,
    clip_schema_name      IN VARCHAR2 DEFAULT NULL,
    data_schema_name      IN VARCHAR2,DEFAULT NULL,
    xform_schema_name     IN VARCHAR2,DEFAULT NULL;

Parameters

Table 47-35 XFORM_CLIP Procedure Parameters

Parameter	Description
`clip_table_name`	Name of the clipping definition table generated using `CREATE_CLIP`
`data_table_name`	Name of the table containing the data
`xform_view_name`	View representing the transformed output
`clip_schema_name`	Name of the schema hosting the clipping definition table
`data_schema_name`	Name of the schema hosting the data table
`xform_schema_name`	Name of the schema hosting the view representing the transformed output

Usage Notes

The clipping definition table created by CREATE_CLIP and populated with clipping definitions by INSERT_CLIP_WINSOR_TAIL or INSERT_CLIP_TRIM_TAIL is used to guide query generation process to construct clipping expressions of the following form:

CASE WHEN "col" < lcut THEN lval
     WHEN "col" > rcut THEN rval
                       ELSE "col"
END "col"

Note that col is case-sensitive since it generates quoted identifiers. When there are multiple entries in the transform definition table for the same col, the behavior is undefined. Any one of the definitions may be used in query generation. NULL values remain unchanged.

Example 1 (Winsorizing). When col = 'my_col', lcut = –1.5, lval = –1.5, and rcut = 4.5 and rval = 4.5, the following expression is generated:

CASE WHEN "my_col" < –1.5 THEN -1.5
     WHEN "my_col" >  4.5 THEN 4.5
                          ELSE "my_col"
END "my_col"

Examples

The simplest invocation of this routine generates a view object build_view that represents the transformation query on build_table based on transform definitions in clipping definitions in the clip_table.

BEGIN
    DBMS_DATA_MINING_TRANSFORM.XFORM_CLIP(
    'clip_table', 'build_table', 'build_view');
END;

XFORM_COL_REM Procedure

This procedure creates a view that removes columns from the data table. Only the columns that are specified in the transform definition are removed; the remaining columns do not change.

Syntax

DBMS_DATA_MINING_TRANSFORM.XFORM_COL_REM (
     rem_table_name     IN       VARCHAR2,
     data_table_name    IN       VARCHAR2,
     xform_view_name    IN       VARCHAR2,
     rem_schema_name    IN       VARCHAR2 DEFAULT NULL,
     data_schema_name   IN       VARCHAR2 DEFAULT NULL,
     xform_schema_name  IN       VARCHAR2 DEFAULT NULL);

Parameters

Table 47-36 XFORM_COL_REM Procedure Parameters

Parameter	Description
`rem_table_name`	Name of the column removal definition table.
`data_table_name`	Name of the table containing the data.
`xform_view_name`	View representing the transformed output.
`rem_schema_name`	Schema of the column removal definition table.
`data_schema_name`	Schema of the data table.
`xform_schema_name`	Schema of the transforming view.

XFORM_EXPR_NUM Procedure

Creates a view that applies a given expression for every NUMBER column in the data table that is not in the exclusion list and in the inclusion list. The remaining columns do not change.

Expressions are constructed from the expression pattern by replacing every occurrence of the column pattern with an actual column name.

Syntax

DBMS_DATA_MINING_TRANSFORM.XFORM_EXPR_NUM (
     expr_pattern       IN       VARCHAR2,
     data_table_name    IN       VARCHAR2,
     xform_view_name    IN       VARCHAR2,
     exclude_list       IN       COLUMN_LIST DEFAULT NULL,
     include_list       IN       COLUMN_LIST DEFAULT NULL,
     col_pattern        IN       VARCHAR2 DEFAULT ':col',
     data_schema_name   IN       VARCHAR2 DEFAULT NULL,
     xform_schema_name  IN       VARCHAR2 DEFAULT NULL);

Parameters

Table 47-37 XFORM_EXPR_NUM Procedure Parameters

Parameter	Description
`expr_pattern`	Expression pattern
`data_table_name`	Name of the table containing the data.
`xform_view_name`	View representing the transformed output.
`exclude_list`	List of columns to exclude.
`include_list`	List of columns to include.
`col_pattern`	Column pattern.
`data_schema_name`	Schema of the data table.
`xform_schema_name`	Schema of the transformation view.

Examples

For a table TAB with two NUMBER columns CN1, CN3 and one CHAR columns CC2 and expression pattern TO_CHAR(:col), the following query is generated:
```
SELECT TO_CHAR("CN1") "CN1", "CC2", TO_CHAR("CN3") "CN3"
       FROM TAB
```
This procedure can be used for clipping (winsorizing) normalized data to a [0..1] range, that is values x > 1 become 1 and values x < 0 become 0.

For the table in example 1 and pattern

CASE WHEN :col < 0 THEN 0 WHEN :col > 1 THEN 1 ELSE :col END

the following query is generated:
```
SELECT CASE WHEN "CN1" < 0 THEN 0 WHEN "CN1" > 1 THEN 1 
            ELSE "CN1" END "CN1", 
        "CC2",
      CASE WHEN "CN3" < 0 THEN 0 WHEN "CN3" > 1 THEN 1 
           ELSE "CN3" END "CN3"
      FROM TAB
```

XFORM_EXPR_STR Procedure

This procedure creates a view that applies a given expression for every VARCHAR2 or CHAR column in the data table that is not in the exclusion list and in the inclusion list. The remaining columns do not change.

Similar to xform_expr_num, except that it applies to CHAR and VARCHAR2 columns instead of NUMBER. See "XFORM_EXPR_NUM Procedure"

Syntax

DBMS_DATA_MINING_TRANSFORM.XFORM_EXPR_STR (
     expr_pattern       IN       VARCHAR2,
     data_table_name    IN       VARCHAR2,
     xform_view_name    IN       VARCHAR2,
     exclude_list       IN       COLUMN_LIST DEFAULT NULL,
     include_list       IN       COLUMN_LIST DEFAULT NULL,
     col_pattern        IN       VARCHAR2 DEFAULT ':col',
     data_schema_name   IN       VARCHAR2 DEFAULT NULL,
     xform_schema_name  IN       VARCHAR2 DEFAULT NULL);

Parameters

Table 47-38 XFORM_EXPR_STR Procedure Parameters

Parameter	Description
`expr_pattern`	Expression pattern
`data_table_name`	Name of the table containing the data.
`xform_view_name`	View representing the transformed output.
`exclude_list`	List of columns to exclude.
`include_list`	List of columns to include.
`col_pattern`	Column pattern.
`data_schema_name`	Schema of the data table.
`xform_schema_name`	Schema of the transformation view.

XFORM_MISS_CAT Procedure

This procedure creates a view that performs categorical missing value treatment. Only the columns that are specified in the xform definition are treated; the remaining columns do not change.

Syntax

DBMS_DATA_MINING_TRANSFORM.XFORM_MISS_CAT (
    miss_table_name       IN VARCHAR2, 
    data_table_name       IN VARCHAR2,
    xform_view_name       IN VARCHAR2,
    miss_schema_name      IN VARCHAR2 DEFAULT NULL,
    data_schema_name      IN VARCHAR2 DEFAULT NULL,
    xform_schema_name     IN VARCHAR2 DEFAULT NULL;

Parameters

Table 47-39 XFORM_MISS_CAT Procedure Parameters

Parameter	Description
`miss_table_name`	Name of the categorical missing value treatment definition table generated using `CREATE_MISS_CAT`
`data_table_name`	Name of the table containing the data
`xform_view_name`	View representing the transformed output
`miss_schema_name`	Name of the schema hosting the categorical missing value treatment definition table
`data_schema_name`	Name of the schema hosting the data table
`xform_schema_name`	Name of the schema hosting the view representing the transformed output

Usage Notes

The data type of the transformed columns is preserved by putting a CAST expression around the NVL function. For example, when col = 'state', val = 'MA' the data type is CHAR(2) the following expression is generated:

CAST (NVL("state", 'MA') AS CHAR(2)) "state"

Examples

The simplest invocation of this routine generates a view object build_view that represents the transformation query on build_table based on transform definitions in missing value definitions in miss_table.

BEGIN
    DBMS_DATA_MINING_TRANSFORM.XFORM_MISS_CAT(
       'miss_table', 'build_table', 'build_view');
END;

XFORM_MISS_NUM Procedure

This procedure creates a view that performs numerical missing value treatment of the data table. Only the columns that are specified in the xform definition are treated, the remaining columns do not change.

Syntax

DBMS_DATA_MINING_TRANSFORM.XFORM_MISS_NUM (
    miss_table_name       IN VARCHAR2, 
    data_table_name       IN VARCHAR2,
    xform_view_name       IN VARCHAR2,
    miss_schema_name      IN VARCHAR2 DEFAULT NULL,
    data_schema_name      IN VARCHAR2 DEFAULT NULL,
    xform_schema_name     IN VARCHAR2 DEFAULT NULL;

Parameters

Table 47-40 XFORM_MISS_NUM Procedure Parameters

Parameter	Description
`miss_table_name`	Name of the numeric missing value treatment definition table generated using `CREATE_MISS_NUM`
`data_table_name`	Name of the table containing the data
`xform_view_name`	View representing the transformed output
`miss_schema_name`	Name of the schema hosting the numerical missing value treatment definition table
`data_schema_name`	Name of the schema hosting the data table
`xform_schema_name`	Name of the schema hosting the view representing the transformed output

Examples

BEGIN
    DBMS_DATA_MINING_TRANSFORM.XFORM_MISS_NUM(
    'miss_table', 'build_table', 'build_view');
END;

XFORM_NORM_LIN Procedure

This procedure creates the view that performs linear normalization. Only the columns that are specified in the definition table are transformed; the remaining columns do not change.

Syntax

DBMS_DATA_MINING_TRANSFORM.XFORM_NORM_LIN (
     norm_table_name      IN VARCHAR2,
     data_table_name      IN VARCHAR2,
     xform_view_name      IN VARCHAR2,
     norm_schema_name     IN VARCHAR2 DEFAULT NULL,
     data_schema_name     IN VARCHAR2 DEFAULT NULL,
     xform_schema_name    IN VARCHAR2 DEFAULT NULL);

Parameters

Table 47-41 XFORM_NORM_LIN Procedure Parameters

Parameter	Description
`norm_table_name`	Name of the normalization definition table generated using `CREATE_NORM_LIN` procedure
`data_table_name`	Name of the table containing the data
`xform_view_name`	View representing the transformed output
`norm_schema_name`	Name of the schema hosting the normalization definition table
`data_schema_name`	Name of the schema hosting the data table
`xform_schema_name`	Name of the schema hosting the view representing the transformed output

Usage Notes

The normalization table created by CREATE_NORM_LIN is populated with definitions by either INSERT_NORM_LIN_ZSCORE or INSERT_NORM_LIN_MINMAX is used to guide the query generation process to construct normalization expressions of the following form:

("col" - shift)/scale "col"

For example, when col = 'my_col', shift = -1.5, and scale = 20. The following expression is generated:

("my_col" - (-1.5))/20 "my_col"

Examples

The simplest invocation of this routine generates a view build_view that represents the transformation query on build_table based on normalization definitions in the norm_minmax_table.

BEGIN

DBMS_DATA_MINING_TRANSFORM.XFORM_NORM_LIN(
      'norm_minmax_table', 'build_table', 'build_view');

END;

XFORM_STACK Procedure

Creates a view that performs transformation of the data table specified by the stack definition. Only the columns that are specified in the stack definition are transformed, the remaining columns do not change.

All columns in the stack definition are applied. Columns with NULL value in the expression field are excluded. Columns in the stack definition that do not have a matching column in the data are added to the view.

Syntax

DBMS_DATA_MINING_TRANSFORM.XFORM_STACK (
     transform_list     IN     DM_TRANSFORMS,
     data_table_name    IN     VARCHAR2,
     xform_view_name    IN     VARCHAR2,
     data_schema_name   IN     VARCHAR2 DEFAULT NULL,
     xform_schema_name  IN     VARCHAR2 DEFAULT NULL);

Parameters

Table 47-42 XFORM_STACK Procedure Parameters

Parameter	Description
`transform_list`	Stack definition.
`data_table_name`	Name of the table containing the data.
`xform_view_name`	View representing the transformed output.
`data_schema_name`	Schema of the data table.
`xform_schema_name`	Schema of the transformation view.