7 The Data Mining Sample Programs

You can learn a great deal about the Oracle Data Mining APIs from the Data Mining sample programs. The programs illustrate typical approaches to data preparation, algorithm selection, algorithm tuning, testing, and scoring. Each program creates a mining model in the database. All the programs include extensive inline comments to help you understand the code.

See Also:

Oracle Data Mining Application Developer's Guide for information about the Data Mining APIs

This chapter includes the following sections:

Installation and Setup
The PL/SQL Programs
The Java Programs
The Sample Data

Installation and Setup

The Data Mining sample programs are installed with Oracle Database Examples. They are also available for download from the Oracle Technology Network:

The programs require access to a database that includes the sample schemas. Before you can run the programs, you must run two configuration scripts to configure the data and assign the required privileges to your user ID.

Install the Sample Programs

Follow these steps to install the sample programs:

Install Oracle Database with the sample schemas, or obtain access to a database that includes the sample schemas.
- If you followed the instructions in "Install Oracle Database", the sample schemas are installed automatically in the starter database. Be sure to unlock the SH schema, as described in step 11.
- If the database does not include the sample schemas, you can install them manually or by using Oracle Database Configuration Assistant. See Oracle Database Sample Schemas for instructions.
Determine whether or not Database Examples was installed with Oracle Database. Database Examples provides a set of sample programs that illustrate numerous features of Oracle Database, including Oracle Data Mining. The programs are loaded into the RDBMS/demo subdirectory of Oracle home.

If Database Examples was not installed, you can perform the installation by following the instructions in "Install Oracle Database Examples". Alternatively, you can download the Data Mining sample programs from the Oracle Technology Network.
```
http://www.oracle.com/technology/products/bi/odm/index.html
```

Run the Configuration Scripts

Follow these steps to configure the sample data and grant the necessary privileges to your data mining user ID.

Enter user-name: sys / as sysdba
    Enter password: password

If you do not have a user ID for your data mining activities, you can create one by following the instructions in "Example: Create a Database User in SQL*Plus".
Run dmshgrants.sql to grant data mining privileges and SH access to your user ID. Several tables in SH are used by the Data Mining sample programs. Specify the data mining user name as the parameter. Specify the full path of the Oracle home directory for ORACLE_BASE\ORACLE_HOME.
```
@ ORACLE_BASE\ORACLE_HOME\RDBMS\demo\dmshgrants dmuser
```
Now connect to the database as the Data Mining user.
```
CONNECT dmuser
Enter password: password
```
Run dmsh.sql to populate the schema of the Data Mining user with tables, views, and other objects needed by the sample programs. Specify the full path of the Oracle home directory for OACLE_BASE\ORACLE_HOME.
```
@ ORACLE_BASE\ORACLE_HOME\RDBMS\demo\dmsh
COMMIT;
```

Locate the Sample Programs

This section explains how to locate the sample programs if they were installed with Database Examples.

To locate the PL/SQL programs, navigate to the parent directory and search for the files that start with dm and end with .sql.

For example, if Database Examples was installed in Oracle home C:\app\demotest\product\11.2.0\db_1\, then navigate to C:\app\demotest\product\11.2.0\db_1\RDBMS\demo\ and use Windows Search to find the files named dm*.sql. Windows Search returns the list of Data Mining PL/SQL programs, as shown in Figure 7-1.

Figure 7-1 The Data Mining Sample PL/SQL Programs

Description of "Figure 7-1 The Data Mining Sample PL/SQL Programs"

Note:

The files listed in Figure 7-1 include all the Data Mining PL/SQL programs. However, one of the files, dmhpdemo.sql, is not a Data Mining program.

Use Windows Search to find the files named dm*.java in the same directory. Windows Search returns the Data Mining Java programs, as shown in Figure 7-2.

Figure 7-2 The Data Mining Sample Java Programs

Description of "Figure 7-2 The Data Mining Sample Java Programs"

Run the Sample Programs

You can run the sample programs as many times as you wish. The programs clean up the results of the previous run before executing the current run.

While the program is running, it displays the program code and the program output.

Run the PL/SQL Sample Programs

To run the PL/SQL programs:

Start SQL*Plus and log in as the Data Mining user.

Enter user-name: dmuser
    Enter password: password

Run the program by specifying an at sign (@) followed by the fully-qualified path of the program. In the following example, replace ORACLE_BASE\ORACLE_HOME with the path of the Oracle home directory.
```
SQL>@ ORACLE_BASE\ORACLE_HOME\RDBMS\demo\dmnbdemo
```
This example executes the program dmnbdemo.sql, which creates a Naive Bayes model.

Prepare to Run the Java Programs

Before you can run the Java programs, you must set up your Java environment and compile the programs.

Check that the version of Java you are using is 1.5 or higher. You can execute the following in a command window to check the version of Java.
```
>java -version
```
Add ORACLE_BASE\ORACLE_HOME\jdk\bin\ to your PATH variable before the paths of any other Java versions. ORACLE_BASE\ORACLE_HOME is the full path to the Oracle home directory.

Add the following Data Mining JAR files to your Windows CLASSPATH:

ORACLE_BASE\ORACLE_HOME\RDBMS\jlib\jdm.jar
            ORACLE_BASE\ORACLE_HOME\RDBMS\jlib\ojdm_api.jar
            ORACLE_BASE\ORACLE_HOME\RDBMS\jlib\xdb.jar
            ORACLE_BASE\ORACLE_HOME\jdbc\lib\ojdbc5.jar
            ORACLE_BASE\ORACLE_HOME\oc4j\j2ee\home\lib\connector.jar
            ORACLE_BASE\ORACLE_HOME\jlib\orai18n.jar   
            ORACLE_BASE\ORACLE_HOME\jlib\orai18n-mapping.jar
            ORACLE_BASE\ORACLE_HOME\lib\xmlparserv2.jar

Compile the programs listed in Figure 7-2. To use the JAVAC executable, open a command window and go to \RDBMS\demo in Oracle home.
```
>javac program_name.java
```
For example:
```
>javac dmnbdemo.java
```
If JAVAC is not found, then check the value of the PATH variable.

Run the Java Programs

You can run the Java programs from the operating system prompt with a command like this:

>java program_name host_name:port_number:database_identifier user password

List the Sample Models

The mining models created by the sample programs can be viewed with a query like the one shown in Example 7-1.

Example 7-1 Sample Data Mining Models

SQL> SELECT model_name, mining_function, algorithm FROM user_mining_models
            ORDER BY model_name;
 
MODEL_NAME                     MINING_FUNCTION                ALGORITHM
------------------------------ ------------------------------ ------------------------------
ABNMODEL_JDM                   CLASSIFICATION                 ADAPTIVE_BAYES_NETWORK
ABN_SH_CLAS_SAMPLE             CLASSIFICATION                 ADAPTIVE_BAYES_NETWORK
AIMODEL_JDM                    ATTRIBUTE_IMPORTANCE           MINIMUM_DESCRIPTION_LENGTH
AI_SH_SAMPLE                   ATTRIBUTE_IMPORTANCE           MINIMUM_DESCRIPTION_LENGTH
APMODEL_JDM                    CLASSIFICATION                 NAIVE_BAYES
ARMODEL_JDM                    ASSOCIATION_RULES              APRIORI_ASSOCIATION_RULES
AR_SH_SAMPLE                   ASSOCIATION_RULES              APRIORI_ASSOCIATION_RULES
AR_SH_SAMPLE_STR_XNAL          ASSOCIATION_RULES              APRIORI_ASSOCIATION_RULES
AR_SH_SAMPLE_XNAL_SVAL         ASSOCIATION_RULES              APRIORI_ASSOCIATION_RULES
DT_SH_CLAS_SAMPLE              CLASSIFICATION                 DECISION_TREE
GLMCMODEL_JDM                  CLASSIFICATION                 GENERALIZED_LINEAR_MODEL
GLMC_SH_CLAS_SAMPLE            CLASSIFICATION                 GENERALIZED_LINEAR_MODEL
GLMRMODEL_JDM                  REGRESSION                     GENERALIZED_LINEAR_MODEL
GLMR_SH_REGR_SAMPLE            REGRESSION                     GENERALIZED_LINEAR_MODEL
KMMODEL_JDM                    CLUSTERING                     KMEANS
KM_SH_CLUS_SAMPLE              CLUSTERING                     KMEANS
NBEXPIMPMODEL_JDM              CLASSIFICATION                 NAIVE_BAYES
NBMODEL_JDM                    CLASSIFICATION                 NAIVE_BAYES
NB_SH_CLAS_SAMPLE              CLASSIFICATION                 NAIVE_BAYES
NMFMODEL_JDM                   FEATURE_EXTRACTION             NONNEGATIVE_MATRIX_FACTOR
NMF_SH_SAMPLE                  FEATURE_EXTRACTION             NONNEGATIVE_MATRIX_FACTOR
OCMODEL_JDM                    CLUSTERING                     O_CLUSTER
OC_SH_CLUS_SAMPLE              CLUSTERING                     O_CLUSTER
SVMCMODEL_JDM                  CLASSIFICATION                 SUPPORT_VECTOR_MACHINES
SVMC_SH_CLAS_SAMPLE            CLASSIFICATION                 SUPPORT_VECTOR_MACHINES
SVMOMODEL_JDM                  CLASSIFICATION                 SUPPORT_VECTOR_MACHINES
SVMO_SH_CLAS_SAMPLE            CLASSIFICATION                 SUPPORT_VECTOR_MACHINES
SVMRMODEL_JDM                  REGRESSION                     SUPPORT_VECTOR_MACHINES
SVMR_SH_REGR_SAMPLE            REGRESSION                     SUPPORT_VECTOR_MACHINES
TREEMODEL_JDM                  CLASSIFICATION                 DECISION_TREE
TXTNMFMODEL_JDM                FEATURE_EXTRACTION             NONNEGATIVE_MATRIX_FACTOR
TXTSVMMODEL_JDM                CLASSIFICATION                 SUPPORT_VECTOR_MACHINES
T_NMF_SAMPLE                   FEATURE_EXTRACTION             NONNEGATIVE_MATRIX_FACTOR
T_SVM_CLAS_SAMPLE              CLASSIFICATION                 SUPPORT_VECTOR_MACHINES

The model names distinguish the models created by the Java programs from those created by the PL/SQL programs. The models created by the Java programs have "_JDM" appended to the name.

The PL/SQL Programs

The PL/SQL sample programs illustrate the use of the DBMS_DATA_MINING package for creating models and the DBMS_DATA_MINING_TRANSFORM package for performing transformations on the mining data.

See Also:

Oracle Database PL/SQL Packages and Types Reference for syntax of the PL/SQL API
Oracle Data Mining Application Developer's Guide for information on the use of the APIs
Oracle Database SQL Language Reference for syntax of the SQL functions for model scoring

PL/SQL Programs: Algorithms

The PL/SQL programs are presented by algorithm in Table 7-1.

Table 7-1 Algorithms in PL/SQL Sample Programs

Program File	Algorithm	Mining Function or Task
`dmaidemo.sql`	Minimum Descriptor Length	Attribute Importance
`dmardemo.sql`	Apriori	Association
`dmdtdemo.sql`	Decision Tree	Classification
`dmdtxvlddemo.sql`	Decision Tree (cross validation)	Classification
`dmglcdem.sql`	Binary Logistic Regression (GLM)	Classification
`dmglrdem.sql`	Multivariate Linear Regression (GLM)	Regression
`dmkmdemo.sql`	k-Means	Clustering
`dmnbdemo.sql`	Naive Bayes	Classification
`dmnmdemo.sql`	Non-Negative Matrix Factorization	Feature Extraction
`dmocdemo.sql`	O-Cluster	Clustering
`dmsvcdem.sql`	Support Vector Machine	Classification
`dmsvodem.sql`	Support Vector Machine	Anomaly Detection
`dmsvrdem.sql`	Support Vector Machine	Regression
`dmtxtfe.sql`	Term extraction using Oracle Text	Text transformation for mining
`dmtxtnmf.sql`	Non-Negative Matrix Factorization	Text mining using NMF
`dmtxtsvm.sql`	Support Vector Machine	Text mining using SVM

PL/SQL Programs: Mining Functions

The PL/SQL sample programs are presented by mining function in Table 7-2. For detailed descriptions of the sample programs, see the comments in the source code.

Table 7-2 Mining Functions of PL/SQL Sample Programs

Mining Function	Description
Classification	The classification programs demonstrate various preprocessing techniques and perform the following steps: Build a classification model using training data Display model details and settings Test the model by applying the model on the test data Compute test metrics, such as confusion matrix, lift, and ROC Apply the model on the scoring data Present apply results Present ranked apply results, influenced by a cost matrix `dmnbdemo.sql` illustrates Naive Bayes. `dmdtdemo.sql` illustrates Decision Tree. `dmsvcdem.sql` illustrates SVM classification. `dmglcdem.sql` illustrates GLM classification (binary logistic regression) The `dmdtxvlddemo.sql` program demonstrates cross-validation techniques for decision tree based-classification. With minor modifications, this program can be used to perform cross validation using other models/algorithms.
Regression	`dmsvrdem.sql` uses different test metrics, but otherwise performs most of the same steps used in the classification programs. Selected attributes of the input data are preprocessed (normalized). NOTE: `dmsvrdem`.`sq`l illustrates the new Automatic Data Preparation feature. `dmglrdem.sql` illustrates GLM regression (multivariate linear regression)
Anomaly Detection	`dmsvodem.sql` illustrates one-class SVM
Association	`dmardemo.sql` builds an association model and presents frequent itemsets and association rules as output.
Clustering	`dmkmdemo.sql` (k-Means) and `dmocdemo.sql` (0-Cluster) build clustering models and present cluster details, such as rules, centroid, and histogram for each cluster as output. The models are scored, and the probabilities associated with each cluster are returned as output. Selected attributes of the input data are preprocessed. NOTE: `dmkmdemo`.`sq`l illustrates the new Automatic Data Preparation feature.
Feature extraction	`dmnmdemo.sql` builds a feature extraction model and presents model details as the output. The model is scored, and each feature ID is associated with a probability. Selected attributes of the input data are preprocessed (normalized).
Attribute importance	`dmaidemo.sql` builds an attribute importance model and presents a list of important attributes as the output of model details. Selected attributes of the input data are preprocessed (binned).

PL/SQL Text Mining Programs

Three sample programs illustrate the process of text mining using PL/SQL. One program illustrates the preprocessing that is required to transform the text for mining. The other two programs build models that use the transformed text.

See Also:

Oracle Data Mining Concepts for information about text mining with Oracle Data Mining
Oracle Data Mining Application Developer's Guide for details about text transformation in PL/SQL

The PL/SQL sample text mining programs are:

dmtxtfe.sql — Illustrates the process of term extraction that prepares text for mining
dmtxtnmf.sql — Creates a text mining model using the Non-Negative Matrix Factorization algorithm
dmtxtsvm.sql — Creates a text mining model using SVM classification

The Java Programs

The Java demos illustrate the features of the Oracle Data Mining Java API, which implements Oracle-specific extensions to the Java Data Mining (JDM) 1.0.1.1 standard. The sample Java programs demonstrate all the Data Mining algorithms as well as data transformation techniques, predictive analytics, export/import, and text mining.

See Also:

Oracle Data Mining Application Developer's Guide for information on the Oracle Data Mining Java API
Oracle Data Mining Java API Reference (javadoc)

Java Programs: Algorithms

The Java programs are presented by algorithm in Table 7-3.

Table 7-3 Algorithms in Java Sample Programs

Program File	Algorithm	Mining Function or Task
`dmaidemo.java`	Minimum Description Length	Attribute importance
`dmapplydemo.java`	Naive Bayes	Scoring methods
`dmardemo.java`	Apriori	Association
`dmexpimpdemo.java`	export/import	Model Export/Import
`dmglcdemo.java`	Binary Logistic Regression (GLM)	Classification
`dmglrdemo.java`	Multivariate Linear Regression (GLM)	Regression
`dmkmdemo.java`	k-Means	Clustering
`dmnbdemo.java`	Naive Bayes	Classification
`dmnmdemo.java`	Non-Negative Matrix Factorization	Feature extraction
`dmocdemo.java`	O-Cluster	Clustering
`dmpademo.java`	Automated predict and explain	Predictive Analytics
`dmsvcdemo.java`	Support Vector Machine	Classification
`dmsvodemo.java`	Support Vector Machine (one class)	Classification
`dmsvrdemo.java`	Support Vector Machine	Regression
`dmtreedemo.java`	Decision Tree	Classification
`dmtxtnmfdemo.java`	Non-Negative Matrix Factorization	Text mining with NMF
`dmtxtsvmdemo.java`	Support Vector Machine	Text mining with SVM classification
`dmxfdemo.java`	Binning, clipping, and normalization	Data Transformations

Java Programs: Mining Functions

The Java sample programs are presented by mining function in Table 7-4. For detailed descriptions of the sample programs, see the comments in the source code.

Table 7-4 Mining Functions of the Java Sample Programs

Mining Function or Task	Description
Classification	The classification programs demonstrate various preprocessing techniques and perform the following steps: Build a classification model using training data Display model details and settings Test the model by applying the model on the test data Compute test metrics, such as confusion matrix, lift, and ROC Apply the model on the scoring data Present apply results Present ranked apply results, influenced by a cost matrix The `dmapplydemo.java` program demonstrates several ways of applying a Naive Bayes model. `dmglcdemo.java` illustrates GLM classification (binary logistic regression)
Regression	`dmsvrdemo.java` uses different test metrics, but otherwise performs most of the same steps used in the classification programs. Selected attributes of the input data are preprocessed (normalized). `dmglrdemo.java` illustrates GLM regression (multivariate linear regression)
Association	`dmardemo.java` builds an association model and presents frequent itemsets and association rules as output. Selected attributes of the input data are preprocessed (binned).
Clustering	`dmkmdemo.java` (k-Means) and `dmocdemo.java` (0-Cluster) build clustering models and present cluster details, such as rules, centroid, and histogram for each cluster as output. The models are scored, and the probabilities associated with each cluster are returned as output. Selected attributes of the input data are preprocessed (normalized).
Feature extraction	`dmnmdemo.java` builds a feature extraction model and presents model details as the output. The model is scored, and each feature ID is associated with a probability. Selected attributes of the input data are preprocessed (normalized).
Attribute importance	`dmaidemo.java` builds an attribute importance model and presents a list of important attributes as the output of model details. Selected attributes of the input data are preprocessed (binned).
Data transformations	`dmxfdemo.java` demonstrates binning, clipping, and normalization transformations.
Predictive Analytics	`dmpademo.java` demonstrates `PREDICT`, `EXPLAIN`, and `PROFILE` functions.
Model import/export	`dmexpimpdemo.java` builds a Naive Bayes model, exports it to a dump file, then imports it from the dump file.

Java Text Mining Programs

Two Java programs illustrate the process of text mining. One builds a feature extraction model, the other builds a classification model.

See Also:

Oracle Data Mining Concepts for information about text mining with Oracle Data Mining
Oracle Data Mining Application Developer's Guide for details about text transformation in Java

The Java text mining programs both use the dmtxtnmfdemo.java interface to transform the text for mining. The programs are as follows:

dmtxtnmf.sql — Creates a text mining model using the Non-Negative Matrix Factorization algorithm
dmtxtsvmdemo.java — Creates a text mining model using SVM classification

The Sample Data

The dmsh.sql script creates views, tables, and indexes in the user's schema. The views define columns of customer data from tables in the SH schema. This data is used by the Data Mining sample programs. The tables reference the same columns in SH, but they include an extra COMMENTS column for text mining. The indexes are used to extract terms from the text in the COMMENTS column and build a nested table column.

Customer Data for Data Mining

Views in the data mining user's schema define columns of data from the CUSTOMERS, SALES, PRODUCTS, COUNTRIES, and SUPPLEMENTARY_DEMOGRAPHICS tables in the SH schema. You can list these views with the following SQL statements.

SQL>CONNECT dmuser
Enter password: password
SQL>SELECT view_name FROM user_views;

The views are listed in Table 7-5.

Table 7-5 Views Used by the Data Mining Sample Programs

View Name	Description
`MINING_DATA_APPLY_STR_V`	Scoring data for o-cluster
`MINING_DATA_BUILD_STR_V`	Training data for o-cluster
`MINING_DATA_APPLY_V`	Scoring data for data mining (not text mining)
`MINING_DATA_BUILD_V`	Training data for data mining (not text mining)
`MINING_DATA_TEST_V`	Test data for data mining (not text mining)
`MARKET_BASKET_V`	Data for association rules
`MINING_DATA_ONE_CLASS_V`	Data for one-class SVM

You can see the references to tables in SH by listing the view definitions. The definition of the view MINING_DATA_BUILD_V is shown as follows.

SQL> set long 1000000
SQL> set longc 100000
SQL> set pagesize 100
SQL> SELECT text FROM all_views WHERE 
    owner='dmuser3'AND view_name='mining_data_build_v';

      SELECT a.CUST_ID, a.CUST_GENDER, 2003-a.CUST_YEAR_OF_BIRTH AGE, 
             a.CUST_MARITAL_STATUS, c.COUNTRY_NAME, a.CUST_INCOME_LEVEL,
             b.EDUCATION, b.OCCUPATION, b.HOUSEHOLD_SIZE, b.YRS_RESIDENCE,
             b.AFFINITY_CARD, b.BULK_PACK_DISKETTES, b.FLAT_PANEL_MONITOR,
             b.HOME_THEATER_PACKAGE, b.BOOKKEEPING_APPLICATION, 
             b.PRINTER_SUPPLIES, b.Y_BOX_GAMES, b.OS_DOC_SET_KANJI 
       FROM  sh.customers a, 
             sh.supplementary_demographics b, 
             sh.countries c 
       WHERE a.CUST_ID = b.CUST_ID AND a.country_id = c.country_id 
             AND a.cust_id between 101501 and 103000

The views are used to build, test, and score the sample models. Each view has a CUSTOMER_ID column, which is the case ID, and an AFFINITY_CARD column, which is the target used by the predictive models. Most of the views provide data for 1500 customers (1500 rows). The view used by the One-Class SVM model has data for 940 customers.

The columns of training data in the MINING_DATA_BUILD_V view are as follows.

SQL> DESCRIBE mining_data_build_v

CUST_ID                    NOT NULL            NUMBER
CUST_GENDER                NOT NULL            CHAR(1)
AGE                                            NUMBER
CUST_MARITAL_STATUS                            VARCHAR2(20)
COUNTRY_NAME               NOT NULL            VARCHAR2(40)
CUST_INCOME_LEVEL                              VARCHAR2(30)
EDUCATION                                      VARCHAR2(21)
OCCUPATION                                     VARCHAR2(21)
HOUSEHOLD_SIZE                                 VARCHAR2(21)
YRS_RESIDENCE                                  NUMBER
AFFINITY_CARD                                  NUMBER(10)
BULK_PACK_DISKETTES                            NUMBER(10)
FLAT_PANEL_MONITOR                             NUMBER(10)
HOME_THEATER_PACKAGE                           NUMBER(10)
BOOKKEEPING_APPLICATION                        NUMBER(10)
PRINTER_SUPPLIES                               NUMBER(10)
Y_BOX_GAMES                                    NUMBER(10)
OS_DOC_SET_KANJI                               NUMBER(10)

Market Basket Data for Association Rules

The association demos use the MARKET_BASKET_V data set, which includes columns of products from the PRODUCTS table and the CUSTOMER_ID column from the CUSTOMERS table in SH. The columns of the MARKET_BASKET_V view are as follows.

SQL> DESCRIBE market_basket_v

CUST_ID                     NOT NULL         NUMBER
EXTENSION_CABLE                              NUMBER
FLAT_PANEL_MONITOR                           NUMBER
CD_RW_HIGH_SPEED_5_PACK                      NUMBER
ENVOY_256MB_40GB                             NUMBER
ENVOY_AMBASSADOR                             NUMBER
EXTERNAL_8X_CD_ROM                           NUMBER
KEYBOARD_WRIST_REST                          NUMBER
SM26273_BLACK_INK_CARTRIDGE                  NUMBER
MOUSE_PAD                                    NUMBER
MULTIMEDIA_SPEAKERS_3INCH                    NUMBER
OS_DOC_SET_ENGLISH                           NUMBER
SIMM_16MB_PCMCIAII_CARD                      NUMBER
STANDARD_MOUSE                               NUMBER

Customer Data for Text Mining

The text mining demos use the same customer data from tables in SH, but they include either an extra text column or a collection type column. The collection type is a nested table of type DM_NESTED_NUMERICALS.

Tip:

The process of extracting terms from a text column into a nested table column is described in Oracle Data Mining Application Developer's Guide.

You can list these tables with the following SQL statements.

SQL>CONNECT dmuser
Enter password: password
SQL>SELECT table_name FROM user_tables WHERE table_name LIKE '%MINING%';

The text mining tables are listed in Table 7-6.

Table 7-6 Tables Used by the Text Mining Sample Programs

Table Name	Description
`MINING_APPLY_NESTED_TEXT`	Apply table with `COMMENTS` column as `DM_NESTED_NUMERICALS`
`MINING_BUILD_NESTED_TEXT`	Build table with `COMMENTS` column as `DM_NESTED_NUMERICALS`
`MINING_TEST_NESTED_TEXT`	Test table with `COMMENTS` column as `DM_NESTED_NUMERICALS`
`MINING_APPLY_TEXT`	Apply table with `COMMENTS` column as `VARCHAR2(4000)`
`MINING_BUILD_TEXT`	Build table with `COMMENTS` column as `VARCHAR2(4000)`
`MINING_TEST_TEXT`	Test table with `COMMENTS` column as `VARCHAR2(4000)`

In the MINING_BUILD_TEXT, MINING_TEST_TEXT, and MINING_APPLY_TEXT tables, the COMMENTS column is of type VARCHAR2(4000).

SQL> DESCRIBE mining_build_text
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 CUST_ID                                   NOT NULL NUMBER
 CUST_GENDER                               NOT NULL CHAR(1)
 AGE                                                NUMBER
 CUST_MARITAL_STATUS                                VARCHAR2(20)
 COUNTRY_NAME                              NOT NULL VARCHAR2(40)
 CUST_INCOME_LEVEL                                  VARCHAR2(30)
 EDUCATION                                          VARCHAR2(21)
 OCCUPATION                                         VARCHAR2(21)
 HOUSEHOLD_SIZE                                     VARCHAR2(21)
 YRS_RESIDENCE                                      NUMBER
 AFFINITY_CARD                                      NUMBER(10)
 BULK_PACK_DISKETTES                                NUMBER(10)
 FLAT_PANEL_MONITOR                                 NUMBER(10)
 HOME_THEATER_PACKAGE                               NUMBER(10)
 BOOKKEEPING_APPLICATION                            NUMBER(10)
 PRINTER_SUPPLIES                                   NUMBER(10)
 Y_BOX_GAMES                                        NUMBER(10)
 OS_DOC_SET_KANJI                                   NUMBER(10)
 COMMENTS                                           VARCHAR2(4000)

In the MINING_*_NESTED_TEXT tables, the COMMENTS column is of type DM_NESTED_NUMERICALS.

SQL> DESCRIBE mining_build_nested_text
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 CUST_ID                                   NOT NULL NUMBER
 CUST_GENDER                               NOT NULL CHAR(1)
 AGE                                                NUMBER
 CUST_MARITAL_STATUS                                VARCHAR2(20)
 COUNTRY_NAME                              NOT NULL VARCHAR2(40)
 CUST_INCOME_LEVEL                                  VARCHAR2(30)
 EDUCATION                                          VARCHAR2(21)
 OCCUPATION                                         VARCHAR2(21)
 HOUSEHOLD_SIZE                                     VARCHAR2(21)
 YRS_RESIDENCE                                      NUMBER
 AFFINITY_CARD                                      NUMBER(10)
 BULK_PACK_DISKETTES                                NUMBER(10)
 FLAT_PANEL_MONITOR                                 NUMBER(10)
 HOME_THEATER_PACKAGE                               NUMBER(10)
 BOOKKEEPING_APPLICATION                            NUMBER(10)
 PRINTER_SUPPLIES                                   NUMBER(10)
 Y_BOX_GAMES                                        NUMBER(10)
 OS_DOC_SET_KANJI                                   NUMBER(10)
 COMMENTS                                           DM_NESTED_NUMERICALS