Oracle® Data Mining Administrator's Guide 11g Release 2 (11.2) Part Number E12217-02 |
|
|
View PDF |
You can learn a great deal about the Oracle Data Mining APIs from the Data Mining sample programs. The programs illustrate typical approaches to data preparation, algorithm selection, algorithm tuning, testing, and scoring. Each program creates a mining model in the database. All the programs include extensive inline comments to help you understand the code.
See Also:
Oracle Data Mining Application Developer's Guide for information about the Data Mining APIsThis chapter includes the following sections:
The Data Mining sample programs are installed with Oracle Database Examples. They are also available for download from the Oracle Technology Network:
The programs require access to a database that includes the sample schemas. Before you can run the programs, you must run two configuration scripts to configure the data and assign the required privileges to your user ID.
Follow these steps to install the sample programs:
Install Oracle Database with the sample schemas, or obtain access to a database that includes the sample schemas.
If you followed the instructions in "Install Oracle Database", the sample schemas are installed automatically in the starter database. Be sure to unlock the SH
schema, as described in step 11.
If the database does not include the sample schemas, you can install them manually or by using Oracle Database Configuration Assistant. See Oracle Database Sample Schemas for instructions.
Determine whether or not Database Examples was installed with Oracle Database. Database Examples provides a set of sample programs that illustrate numerous features of Oracle Database, including Oracle Data Mining. The programs are loaded into the RDBMS/demo
subdirectory of Oracle home.
If Database Examples was not installed, you can perform the installation by following the instructions in "Install Oracle Database Examples". Alternatively, you can download the Data Mining sample programs from the Oracle Technology Network.
http://www.oracle.com/technology/products/bi/odm/index.html
Follow these steps to configure the sample data and grant the necessary privileges to your data mining user ID.
Log in to SQL*Plus with system privileges.
Enter user-name: sys / as sysdba
Enter password: password
If you do not have a user ID for your data mining activities, you can create one by following the instructions in "Example: Create a Database User in SQL*Plus".
Run dmshgrants.sql
to grant data mining privileges and SH
access to your user ID. Several tables in SH
are used by the Data Mining sample programs. Specify the data mining user name as the parameter. Specify the full path of the Oracle home directory for ORACLE_BASE\ORACLE_HOME
.
@ ORACLE_BASE\ORACLE_HOME\RDBMS\demo\dmshgrants dmuser
Now connect to the database as the Data Mining user.
CONNECT dmuser Enter password: password
Run dmsh.sql
to populate the schema of the Data Mining user with tables, views, and other objects needed by the sample programs. Specify the full path of the Oracle home directory for OACLE_BASE\ORACLE_HOME
.
@ ORACLE_BASE\ORACLE_HOME\RDBMS\demo\dmsh
COMMIT;
This section explains how to locate the sample programs if they were installed with Database Examples.
To locate the PL/SQL programs, navigate to the parent directory and search for the files that start with dm
and end with .sql
.
For example, if Database Examples was installed in Oracle home C:\app\demotest\product\11.2.0\db_1\
, then navigate to C:\app\demotest\product\11.2.0\db_1\RDBMS\demo\
and use Windows Search to find the files named dm*.sql
. Windows Search returns the list of Data Mining PL/SQL programs, as shown in Figure 7-1.
Figure 7-1 The Data Mining Sample PL/SQL Programs
Note:
The files listed in Figure 7-1 include all the Data Mining PL/SQL programs. However, one of the files,dmhpdemo.sql
, is not a Data Mining program.Use Windows Search to find the files named dm*.java
in the same directory. Windows Search returns the Data Mining Java programs, as shown in Figure 7-2.
Figure 7-2 The Data Mining Sample Java Programs
You can run the sample programs as many times as you wish. The programs clean up the results of the previous run before executing the current run.
While the program is running, it displays the program code and the program output.
To run the PL/SQL programs:
Start SQL*Plus and log in as the Data Mining user.
Enter user-name: dmuser Enter password: password
Run the program by specifying an at sign (@) followed by the fully-qualified path of the program. In the following example, replace ORACLE_BASE\ORACLE_HOME
with the path of the Oracle home directory.
SQL>@ ORACLE_BASE\ORACLE_HOME\RDBMS\demo\dmnbdemo
This example executes the program dmnbdemo.sql
, which creates a Naive Bayes model.
Before you can run the Java programs, you must set up your Java environment and compile the programs.
Check that the version of Java you are using is 1.5 or higher. You can execute the following in a command window to check the version of Java.
>java -version
Add ORACLE_BASE\ORACLE_HOME
\jdk\bin\
to your PATH
variable before the paths of any other Java versions. ORACLE_BASE\ORACLE_HOME
is the full path to the Oracle home directory.
Add the following Data Mining JAR files to your Windows CLASSPATH
:
ORACLE_BASE\ORACLE_HOME\RDBMS\jlib\jdm.jar ORACLE_BASE\ORACLE_HOME\RDBMS\jlib\ojdm_api.jar ORACLE_BASE\ORACLE_HOME\RDBMS\jlib\xdb.jar ORACLE_BASE\ORACLE_HOME\jdbc\lib\ojdbc5.jar ORACLE_BASE\ORACLE_HOME\oc4j\j2ee\home\lib\connector.jar ORACLE_BASE\ORACLE_HOME\jlib\orai18n.jar ORACLE_BASE\ORACLE_HOME\jlib\orai18n-mapping.jar ORACLE_BASE\ORACLE_HOME\lib\xmlparserv2.jar
Compile the programs listed in Figure 7-2. To use the JAVAC
executable, open a command window and go to \RDBMS\demo
in Oracle home.
>javac program_name.java
For example:
>javac dmnbdemo.java
If JAVAC
is not found, then check the value of the PATH
variable.
The mining models created by the sample programs can be viewed with a query like the one shown in Example 7-1.
Example 7-1 Sample Data Mining Models
SQL> SELECT model_name, mining_function, algorithm FROM user_mining_models ORDER BY model_name; MODEL_NAME MINING_FUNCTION ALGORITHM ------------------------------ ------------------------------ ------------------------------ ABNMODEL_JDM CLASSIFICATION ADAPTIVE_BAYES_NETWORK ABN_SH_CLAS_SAMPLE CLASSIFICATION ADAPTIVE_BAYES_NETWORK AIMODEL_JDM ATTRIBUTE_IMPORTANCE MINIMUM_DESCRIPTION_LENGTH AI_SH_SAMPLE ATTRIBUTE_IMPORTANCE MINIMUM_DESCRIPTION_LENGTH APMODEL_JDM CLASSIFICATION NAIVE_BAYES ARMODEL_JDM ASSOCIATION_RULES APRIORI_ASSOCIATION_RULES AR_SH_SAMPLE ASSOCIATION_RULES APRIORI_ASSOCIATION_RULES AR_SH_SAMPLE_STR_XNAL ASSOCIATION_RULES APRIORI_ASSOCIATION_RULES AR_SH_SAMPLE_XNAL_SVAL ASSOCIATION_RULES APRIORI_ASSOCIATION_RULES DT_SH_CLAS_SAMPLE CLASSIFICATION DECISION_TREE GLMCMODEL_JDM CLASSIFICATION GENERALIZED_LINEAR_MODEL GLMC_SH_CLAS_SAMPLE CLASSIFICATION GENERALIZED_LINEAR_MODEL GLMRMODEL_JDM REGRESSION GENERALIZED_LINEAR_MODEL GLMR_SH_REGR_SAMPLE REGRESSION GENERALIZED_LINEAR_MODEL KMMODEL_JDM CLUSTERING KMEANS KM_SH_CLUS_SAMPLE CLUSTERING KMEANS NBEXPIMPMODEL_JDM CLASSIFICATION NAIVE_BAYES NBMODEL_JDM CLASSIFICATION NAIVE_BAYES NB_SH_CLAS_SAMPLE CLASSIFICATION NAIVE_BAYES NMFMODEL_JDM FEATURE_EXTRACTION NONNEGATIVE_MATRIX_FACTOR NMF_SH_SAMPLE FEATURE_EXTRACTION NONNEGATIVE_MATRIX_FACTOR OCMODEL_JDM CLUSTERING O_CLUSTER OC_SH_CLUS_SAMPLE CLUSTERING O_CLUSTER SVMCMODEL_JDM CLASSIFICATION SUPPORT_VECTOR_MACHINES SVMC_SH_CLAS_SAMPLE CLASSIFICATION SUPPORT_VECTOR_MACHINES SVMOMODEL_JDM CLASSIFICATION SUPPORT_VECTOR_MACHINES SVMO_SH_CLAS_SAMPLE CLASSIFICATION SUPPORT_VECTOR_MACHINES SVMRMODEL_JDM REGRESSION SUPPORT_VECTOR_MACHINES SVMR_SH_REGR_SAMPLE REGRESSION SUPPORT_VECTOR_MACHINES TREEMODEL_JDM CLASSIFICATION DECISION_TREE TXTNMFMODEL_JDM FEATURE_EXTRACTION NONNEGATIVE_MATRIX_FACTOR TXTSVMMODEL_JDM CLASSIFICATION SUPPORT_VECTOR_MACHINES T_NMF_SAMPLE FEATURE_EXTRACTION NONNEGATIVE_MATRIX_FACTOR T_SVM_CLAS_SAMPLE CLASSIFICATION SUPPORT_VECTOR_MACHINES
The model names distinguish the models created by the Java programs from those created by the PL/SQL programs. The models created by the Java programs have "_JDM" appended to the name.
The PL/SQL sample programs illustrate the use of the DBMS_DATA_MINING
package for creating models and the DBMS_DATA_MINING_TRANSFORM
package for performing transformations on the mining data.
See Also:
Oracle Database PL/SQL Packages and Types Reference for syntax of the PL/SQL API
Oracle Data Mining Application Developer's Guide for information on the use of the APIs
Oracle Database SQL Language Reference for syntax of the SQL functions for model scoring
The PL/SQL programs are presented by algorithm in Table 7-1.
Table 7-1 Algorithms in PL/SQL Sample Programs
Program File | Algorithm | Mining Function or Task |
---|---|---|
|
||
|
Association |
|
|
||
|
Decision Tree (cross validation) |
Classification |
|
Classification |
|
|
||
|
Clustering |
|
|
Classification |
|
|
||
|
||
|
||
|
Support Vector Machine |
|
|
Support Vector Machine |
|
|
||
|
||
|
Support Vector Machine |
Text mining using SVM |
The PL/SQL sample programs are presented by mining function in Table 7-2. For detailed descriptions of the sample programs, see the comments in the source code.
Table 7-2 Mining Functions of PL/SQL Sample Programs
Three sample programs illustrate the process of text mining using PL/SQL. One program illustrates the preprocessing that is required to transform the text for mining. The other two programs build models that use the transformed text.
See Also:
Oracle Data Mining Concepts for information about text mining with Oracle Data Mining
Oracle Data Mining Application Developer's Guide for details about text transformation in PL/SQL
The PL/SQL sample text mining programs are:
The Java demos illustrate the features of the Oracle Data Mining Java API, which implements Oracle-specific extensions to the Java Data Mining (JDM) 1.0.1.1 standard. The sample Java programs demonstrate all the Data Mining algorithms as well as data transformation techniques, predictive analytics, export/import, and text mining.
See Also:
Oracle Data Mining Application Developer's Guide for information on the Oracle Data Mining Java API
Oracle Data Mining Java API Reference (javadoc)
The Java programs are presented by algorithm in Table 7-3.
Table 7-3 Algorithms in Java Sample Programs
Program File | Algorithm | Mining Function or Task |
---|---|---|
|
||
|
||
|
||
|
export/import |
|
|
||
|
||
|
||
|
Classification |
|
|
||
|
||
|
||
|
Classification |
|
|
Support Vector Machine (one class) |
Classification |
|
Support Vector Machine |
Regression |
|
Classification |
|
|
||
|
Support Vector Machine |
Text mining with SVM classification |
|
The Java sample programs are presented by mining function in Table 7-4. For detailed descriptions of the sample programs, see the comments in the source code.
Table 7-4 Mining Functions of the Java Sample Programs
Two Java programs illustrate the process of text mining. One builds a feature extraction model, the other builds a classification model.
See Also:
Oracle Data Mining Concepts for information about text mining with Oracle Data Mining
Oracle Data Mining Application Developer's Guide for details about text transformation in Java
The Java text mining programs both use the dmtxtnmfdemo.java
interface to transform the text for mining. The programs are as follows:
The dmsh.sql
script creates views, tables, and indexes in the user's schema. The views define columns of customer data from tables in the SH
schema. This data is used by the Data Mining sample programs. The tables reference the same columns in SH
, but they include an extra COMMENTS
column for text mining. The indexes are used to extract terms from the text in the COMMENTS
column and build a nested table column.
Views in the data mining user's schema define columns of data from the CUSTOMERS
, SALES
, PRODUCTS
, COUNTRIES
, and SUPPLEMENTARY_DEMOGRAPHICS
tables in the SH
schema. You can list these views with the following SQL statements.
SQL>CONNECT dmuser
Enter password: password
SQL>SELECT view_name FROM user_views;
The views are listed in Table 7-5.
Table 7-5 Views Used by the Data Mining Sample Programs
View Name | Description |
---|---|
|
Scoring data for o-cluster |
|
Training data for o-cluster |
|
Scoring data for data mining (not text mining) |
|
Training data for data mining (not text mining) |
|
Test data for data mining (not text mining) |
|
Data for association rules |
|
Data for one-class SVM |
You can see the references to tables in SH
by listing the view definitions. The definition of the view MINING_DATA_BUILD_V
is shown as follows.
SQL> set long 1000000 SQL> set longc 100000 SQL> set pagesize 100 SQL> SELECT text FROM all_views WHERE owner='dmuser3'AND view_name='mining_data_build_v'; SELECT a.CUST_ID, a.CUST_GENDER, 2003-a.CUST_YEAR_OF_BIRTH AGE, a.CUST_MARITAL_STATUS, c.COUNTRY_NAME, a.CUST_INCOME_LEVEL, b.EDUCATION, b.OCCUPATION, b.HOUSEHOLD_SIZE, b.YRS_RESIDENCE, b.AFFINITY_CARD, b.BULK_PACK_DISKETTES, b.FLAT_PANEL_MONITOR, b.HOME_THEATER_PACKAGE, b.BOOKKEEPING_APPLICATION, b.PRINTER_SUPPLIES, b.Y_BOX_GAMES, b.OS_DOC_SET_KANJI FROM sh.customers a, sh.supplementary_demographics b, sh.countries c WHERE a.CUST_ID = b.CUST_ID AND a.country_id = c.country_id AND a.cust_id between 101501 and 103000
The views are used to build, test, and score the sample models. Each view has a CUSTOMER_ID
column, which is the case ID, and an AFFINITY_CARD
column, which is the target used by the predictive models. Most of the views provide data for 1500 customers (1500 rows). The view used by the One-Class SVM model has data for 940 customers.
The columns of training data in the MINING_DATA_BUILD_V
view are as follows.
SQL> DESCRIBE mining_data_build_v CUST_ID NOT NULL NUMBER CUST_GENDER NOT NULL CHAR(1) AGE NUMBER CUST_MARITAL_STATUS VARCHAR2(20) COUNTRY_NAME NOT NULL VARCHAR2(40) CUST_INCOME_LEVEL VARCHAR2(30) EDUCATION VARCHAR2(21) OCCUPATION VARCHAR2(21) HOUSEHOLD_SIZE VARCHAR2(21) YRS_RESIDENCE NUMBER AFFINITY_CARD NUMBER(10) BULK_PACK_DISKETTES NUMBER(10) FLAT_PANEL_MONITOR NUMBER(10) HOME_THEATER_PACKAGE NUMBER(10) BOOKKEEPING_APPLICATION NUMBER(10) PRINTER_SUPPLIES NUMBER(10) Y_BOX_GAMES NUMBER(10) OS_DOC_SET_KANJI NUMBER(10)
The association demos use the MARKET_BASKET_V
data set, which includes columns of products from the PRODUCTS
table and the CUSTOMER_ID
column from the CUSTOMERS
table in SH
. The columns of the MARKET_BASKET_V
view are as follows.
SQL> DESCRIBE market_basket_v CUST_ID NOT NULL NUMBER EXTENSION_CABLE NUMBER FLAT_PANEL_MONITOR NUMBER CD_RW_HIGH_SPEED_5_PACK NUMBER ENVOY_256MB_40GB NUMBER ENVOY_AMBASSADOR NUMBER EXTERNAL_8X_CD_ROM NUMBER KEYBOARD_WRIST_REST NUMBER SM26273_BLACK_INK_CARTRIDGE NUMBER MOUSE_PAD NUMBER MULTIMEDIA_SPEAKERS_3INCH NUMBER OS_DOC_SET_ENGLISH NUMBER SIMM_16MB_PCMCIAII_CARD NUMBER STANDARD_MOUSE NUMBER
The text mining demos use the same customer data from tables in SH
, but they include either an extra text column or a collection type column. The collection type is a nested table of type DM_NESTED_NUMERICALS
.
Tip:
The process of extracting terms from a text column into a nested table column is described in Oracle Data Mining Application Developer's Guide.You can list these tables with the following SQL statements.
SQL>CONNECT dmuser
Enter password: password
SQL>SELECT table_name FROM user_tables WHERE table_name LIKE '%MINING%';
The text mining tables are listed in Table 7-6.
Table 7-6 Tables Used by the Text Mining Sample Programs
Table Name | Description |
---|---|
|
Apply table with |
|
Build table with |
|
Test table with |
|
Apply table with |
|
Build table with |
|
Test table with |
In the MINING_BUILD_TEXT
, MINING_TEST_TEXT
, and MINING_APPLY_TEXT
tables, the COMMENTS
column is of type VARCHAR2(4000)
.
SQL> DESCRIBE mining_build_text Name Null? Type ----------------------------------------- -------- ---------------------------- CUST_ID NOT NULL NUMBER CUST_GENDER NOT NULL CHAR(1) AGE NUMBER CUST_MARITAL_STATUS VARCHAR2(20) COUNTRY_NAME NOT NULL VARCHAR2(40) CUST_INCOME_LEVEL VARCHAR2(30) EDUCATION VARCHAR2(21) OCCUPATION VARCHAR2(21) HOUSEHOLD_SIZE VARCHAR2(21) YRS_RESIDENCE NUMBER AFFINITY_CARD NUMBER(10) BULK_PACK_DISKETTES NUMBER(10) FLAT_PANEL_MONITOR NUMBER(10) HOME_THEATER_PACKAGE NUMBER(10) BOOKKEEPING_APPLICATION NUMBER(10) PRINTER_SUPPLIES NUMBER(10) Y_BOX_GAMES NUMBER(10) OS_DOC_SET_KANJI NUMBER(10) COMMENTS VARCHAR2(4000)
In the MINING_*_NESTED_TEXT
tables, the COMMENTS
column is of type DM_NESTED_NUMERICALS
.
SQL> DESCRIBE mining_build_nested_text Name Null? Type ----------------------------------------- -------- ---------------------------- CUST_ID NOT NULL NUMBER CUST_GENDER NOT NULL CHAR(1) AGE NUMBER CUST_MARITAL_STATUS VARCHAR2(20) COUNTRY_NAME NOT NULL VARCHAR2(40) CUST_INCOME_LEVEL VARCHAR2(30) EDUCATION VARCHAR2(21) OCCUPATION VARCHAR2(21) HOUSEHOLD_SIZE VARCHAR2(21) YRS_RESIDENCE NUMBER AFFINITY_CARD NUMBER(10) BULK_PACK_DISKETTES NUMBER(10) FLAT_PANEL_MONITOR NUMBER(10) HOME_THEATER_PACKAGE NUMBER(10) BOOKKEEPING_APPLICATION NUMBER(10) PRINTER_SUPPLIES NUMBER(10) Y_BOX_GAMES NUMBER(10) OS_DOC_SET_KANJI NUMBER(10) COMMENTS DM_NESTED_NUMERICALS