7 The Data Mining Java API

This chap ter presents an overview of the Oracle Data Mining Java API. The Java API is based on JDM, the industry-standard Java API for data mining.

Note:

The Java API is layered on the PL/SQL and SQL language interfaces to Oracle Data Mining. All the SQL-based functionality described in this manual is also implemented in the Java API.

However, the Java API supports several features not implemented in SQL, such as asynchronous execution of mining tasks and text transformation.

See Also:

"Data Mining Java API"
Oracle Data Mining Java API Reference (javadoc)
Oracle Data Mining Concepts for a summary of the new features in the Oracle Data Mining Java API.

This chapter contains the following topics:

The Java Environment
Connecting to the Data Mining Engine
API Design Overview

The Java Environment

The Oracle Data Mining Java API requires Oracle Database 11g Release 2 (11.2) and J2SE 1.5. It is backward compatible and can be used with Oracle Database 10.2.

To use the Oracle Data Mining Java API, include the following libraries in your CLASSPATH:

$ORACLE_HOME/rdbms/jlib/jdm.jar
$ORACLE_HOME/rdbms/jlib/ojdm_api.jar
$ORACLE_HOME/rdbms/jlib/xdb.jar
$ORACLE_HOME/jdbc/lib/ojdbc5.jar
$ORACLE_HOME//oc4j/j2ee/home/lib/connector.jar 
$ORACLE_HOME/jlib/orai18n.jar
$ORACLE_HOME/jlib/orai18n-mapping.jar 
$ORACLE_HOME/lib/xmlparserv2.jar

Connecting to the Data Mining Engine

The Data Mining Engine (DME) is the infrastructure that offers a set of data mining services to its JDM clients. The Oracle Database provides the in-database data mining functionality for JDM through the core Oracle Data Mining option. So in the rest of this document the Oracle Database is referred to as the DME.

To access data mining functionality in the database, a DME Connection needs to be created. To connect to the DME, OJDM supports following different options.

The DME Connection object is described in detail in "Features of a DME Connection".

Connection Factory

The Connection factory is used to create a DME connection. The JDM standard defines ConnectionFactory as a Java interface to provide a vendor neutral approach to create a DME Connection. In this approach, the application infrastructure needs to register the instance of ConnectionFactory in a JNDI server. Applications can lookup for ConnectionFactory in the JNDI server to instantiate a Connection using this factory.

OJDM provides oracle.dmt.jdm.resource.OraConnectionFactory class, which can either be instantiated and accessed to create the connection or can be registered in JNDI server. Following code illustrates these two approaches to create a connection factory.

Create ConnectionFactory Using OraConnectionFactory

//Create OraConnectionFactory
javax.datamining.resource.ConnectionFactory connFactory = 
                 oracle.dmt.jdm.resource.OraConnectionFactory();

Lookup ConnectionFactory From the JNDI Server

//Setup the initial context to connect to the JNDI server
Hashtable env = new Hashtable();
env.put( Context.INITIAL_CONTEXT_FACTORY,
"oracle.dmt.jdm.resource.OraConnectionFactory" );
env.put( Context.PROVIDER_URL, "http://myHost:myPort/myService" );
env.put( Context.SECURITY_PRINCIPAL, "user" );
env.put( Context.SECURITY_CREDENTIALS, "password" );
InitialContext jndiContext = new javax.naming.InitialContext( env );
// Perform JNDI lookup to obtain the connection factory
javax.datamining.resource.ConnectionFactory dmeConnFactory =
(ConnectionFactory) jndiContext.lookup("java:comp/env/jdm/MyServer");
//Lookup ConnectionFactory
javax.datamining.resource.ConnectionFactory connFactory = 
  (ConnectionFactory) jndiContext.lookup("java:comp/env/jdm/MyServer");

Connect Using JDBC

This option is useful when the applications want to control the JDBC Connections outside the OJDM and allow the OraConnectionFactory to use the specified OracleDataSource to create the database connection. This approach gives applications the ability to use the implicit connection caching features as required. By default, OJDM doesn't enable the implicit connection caching. Oracle Database JDBC Developer's Guide and Reference for information about connection caching.

//Create an OracleDataSource
OracleDataSource ods = new OracleDataSource();
ods.setURL(URL);
ods.setUser(user);
ods.setPassword(password);
 
//Create a connection factory using the OracleDataSource
javax.datamining.resource.ConnectionFactory connFactory = 
  oracle.dmt.jdm.resource.OraConnectionFactory(ods);
//Create DME Connection
javax.datamining.resource.Connection dmeConn = 
    connFactory.getConnection();

Connect Using ConnectionSpec

This option is useful when the application doesn't want to pre-create the JDBC Connection and allow OJDM to maintain the JDBC Connection. Here the user needs to create an empty ConnectionSpec instance using getConnectionSpec() method in the oracle.dmt.jdm.resource.OraConnectionFactory class and create a DME Connection using the connection spec. The following code illustrates the usage.

//Create ConnectionSpec
ConnectionSpec connSpec = m_dmeConnFactory.getConnectionSpec();
connSpec.setURI("jdbc:oracle:thin:@host:port:sid");
connSpec.setName("user");
connSpec.setPassword("password");          
 
//Create DME Connection
javax.datamining.resource.Connection m_dmeConn = 
m_dmeConnFactory.getConnection(connSpec);

Features of a DME Connection

In the Oracle Data Mining Java API, the DME Connection is the primary factory object. The Connection instantiates the object factories using the getFactory method. The Connection object provides named object lookup, persistence, and task execution features.

Create Object Factories

The Connection.getFactory method creates a factory object. For example, to create a factory for the PhysicalDataSet object, pass the absolute name of the object to this method. The getFactory method creates an instance of PhysicalDataSetFactory.

javax.datamining.data.PhysicalDataSetFactory pdsFactory =
                dmeConn.getFactory("javax.datamining.data.PhysicalDataSet");

Provide Access to Mining Object Metadata

The Connection object provides methods for retrieving metadata about mining objects.

Method	Description
`getCreationDate`	Returns the creation date of the specified named object. getCreationDate(java.lang.String objectName, NamedObject objectType) returns java.util.Date
`getDescription`	Returns the description of the specified mining object. getDescription(java.lang.String objectName, NamedObject objectType) returns java.lang.String
`getObjectNames`	Returns a collection of the names of the objects of the specified type. getObjectNames(NamedObject objectType) returns java.util.Collection
`getObjectNames`	This is an Oracle JDM extension method that is added in 11.1 to provide a listing of mining object names across schemas or within a schema. It provides various optional method arguments that can be used to get a filtered list of arguments. getObjectNames( java.lang.String schemaPattern, NamedObject objectType, java.lang.String objectNamePattern, javax.datamining.Enum minorType_1, javax.datamining.Enum minorType_2 ): returns java.sql.ResultSet See Example 7-1.

Example 7-1 Oracle JDM Extension Method getObjectNames

This example illustrates the getObjectNames method.

To list the names of classification test metrics computed by the user SCOTT, specify:

the schemaPattern as "SCOTT"
objectType as NamedObject.testMetrics
objectPattern as null
minorType_1 as MiningFunction.classification
minorType_2 as null

Irrespective of the type of filters specified, the getObjectNames method returns the java.sql.ResultSet object with the following columns.

Column Name	Data Type	Description
`SCHEMA_NAME`	String	Name of the schema (can be null)
`TYPE`	String	Type of the mining object
`NAME`	String	Name of the mining object
`MINOR_TYPE_1`	String	Mining objects can have minor/sub types. For example, model objects can have function and algorithm as minor types.
`MINOR_TYPE_2`	String	Mining objects can have more than one minor type. If they have a second minor type, then this column is used.
`CREATION_DATE`	Timestamp	Date when this object was created
`DESCRIPTION`	String	Description of the object

Persistence and Retrieval of Mining Objects

The Connection object provides methods for retrieving mining objects and saving them in the DME. Persistent objects are stored as database objects. Transient objects are stored in memory by the Connection object.

Method	Description
`saveObject`	Saves the named object in the metadata repository associated with the connection. saveObject(java.lang.String name, MiningObject object, boolean replace)
`retrieveObject`	Retrieves a copy of the specified named object from the metadata repository associated with the connection. retrieveObject(java.lang.String objectIdentifier) returns MiningObject
`retrieveObject`	Retrieves a copy of the named object from the metadata repository associated with the connection. retrieveObject(java.lang.String name, NamedObject objectType) returns MiningObject
`retrieveObjects`	Returns a collection of mining objects of the given type that were created within the specified time interval (from createAfter to createBefore). (java.util.Date createdAfter, java.util.Date createdBefore, NamedObject objectType): returns java.util.Collection
`retrieveObjects`	Returns a collection of mining objects of the specified type that were created within the specified time interval (from createAfter to createBefore) retrieveObjects(java.util.Date createdAfter, java.util.Date createdBefore, NamedObject objectType, Enum minorType): returns java.util.Collection

See Also:

Execute Mining Tasks

The Connection object provides an execute method, which can execute mining tasks either asynchronously or synchronously. The DME uses the database Scheduler to execute mining tasks, which are stored in the user's schema as Scheduler jobs. The following methods are used to execute the tasks.

Task Execution	execute method syntax
asynchronous	execute(java.lang.String taskName) returns ExecutionHandle
synchronous	execute(Task task,java.lang.Long timeout)) returns ExecutionHandle Typically to be used with single record scoring, but it may be used in other contexts as well.

See Also:

"Task"
"Executing Mining Tasks"
Oracle Database Administrator's Guide for information about the Oracle Database Scheduler.

Retrieve DME Capabilities and Metadata

The Connection interface provides a ConnectionMetaData and supportsCapability retrieval methods. This feature is useful for applications to know more about the DME at runtime. The following methods are used for retrieving this information from the connection.

Method	Description
`getMetaData`	Returns information about the underlying DME instance represented through an active connection. `ConnectionMetaData` provides version information for the JDM implementation and Oracle Database. getMetaData() returns ConnectionMetaData
`getSupportedFunctions`	Returns an array of mining functions that are supported by the implementation. getSupportedFunctions() returns MiningFunction[]
`getSupportedAlgorithms`	Returns an array of mining algorithms that are supported by the specified mining function. getSupportedAlgorithms(MiningFunction function) returns MiningAlgorithm[]
`supportsCapability`	Returns true if the specified combination of mining capabilities is supported. If an algorithm is not specified, returns true if the specified function is supported. supportsCapability(MiningFunction function, MiningAlgorithm algorithm, MiningTask taskType) returns boolean

Retrieve Version Information

The ConnectionMetaData object provides methods for retrieving JDM standard version information and Oracle version information.

Method	Description
`getVersion`	Returns the version of the JDM Standard API. It must be "JDM 1.0" for the first release of JDM. getVersion() returns String
`getMajorVersion`	Returns the major version number. For the first release of JDM, this is "1". getMajorVersion() returns int
`getMinorVersion`	Returns the minor version number. For the first release of JDM, this is "0". getMinorVersion() returns int
`getProviderName`	Returns the provider name as "Oracle Corporation". getProviderName() returns String
`getProviderVersion`	Returns the version of the Oracle Database that shipped the Oracle Data Mining Java API jar file. getProviderVersion() returns String
`getProviderDMEVersion`	Returns the DME version of the provider getProviderDMEVersion() returns String

API Design Overview

This section briefly describes the OJDM design principles to familiarize the developers with the API. The JDM standard uses the factory method pattern as the core design pattern for the API. User can instantiate a JDM object using its factory. This enables JDM vendors like Oracle to implement a vendor neutral API. OJDM follows the same factory method pattern for its extensions. javax.datamining is the base package for the JDM standard defined classes and oracle.dmt.jdm is the base package for the Oracle extensions to the JDM standard

The JDM standard organizes its packages by the mining functions and mining algorithms. For example, javax.datamining.supervised package has all the supervised functions related classes and sub-packages java.datamining.supervised.classification and java.datamining.supervised.regression. Each function sub-package has the classes related to that function. Similarly, javax.datamining.algorithm is the base package for all algorithms and each algorithm has its sub-package under this package, for example, javax.datamining.algorithm.naivebayes is the sub-package for Naïve Bayes algorithm related classes. OJDM follows a similar package structure for the extensions, for example, feature extraction is a non-JDM standard function supported by the OJDM, here oracle.dmt.jdm.featureextraction is the package for this function and oracle.dmt.jdm.algorithm.nmf package for the Non-Negative Matrix Factorization algorithm used for feature extraction.

The JDM standard has some core packages that define common classes and packages for tasks, model details, rules and statistics. For more details refer to the JDM javadoc. The class diagram in Figure 7-1 illustrates the inheritance hierarchy of the named mining objects that are discussed in Chapter 2. In the subsequent sections more class diagrams are used to illustrate other OJDM objects. Note that the classes/interfaces shown in gray color are oracle JDM extension interfaces/classes. In Figure 7-1, oracle.dmt.jdm.transform.OraTransformationSequence is an Oracle extension to the mining objects defined in JDM 1.1 standard.

Figure 7-1 JDM Named Objects Class Diagram

Description of "Figure 7-1 JDM Named Objects Class Diagram"

Describing the Mining Data

The JDM standard defines physical and logical data objects to describe the mining attribute characteristics of the data as well as statistical computations for describing the data.

The javax.datamining.data package contains all the data-related classes. The class diagram in Figure 7-2 illustrates the class relationships of the data objects supported by the Oracle Data Mining Java API.

Figure 7-2 Data Objects in Oracle Data Mining Java API

Description of "Figure 7-2 Data Objects in Oracle Data Mining Java API"

The PhysicalDataSet object is used to specify the name and location of the dataset used for mining operations. For example, to represent a model build input dataset MINING_DATA_BUILD_V in a DMUSER schema account, PhysicalDataSet object is created with the data URI DMUSER.MINING_DATA_BUILD_V. The schema name prefix is optional when accessing the datasets in the same user account.

Note that in the class diagram in Figure 7-2 a PhysicalDataSet can have PhysicalAttribute objects. A PhysicalAttribute represents physical characteristics of the columns of the input dataset; optionally physical attributes can specify the roles of the column. For example, in the MINING_DATA_BUILD_V dataset, CUST_ID uniquely identifies each case used for mining. So the role of the CUST_ID column is specified as case id.

Example 7-2 illustrates the code sample that creates the PhysicalDataSet object. Note that the PhysicalDataSet object is saved with the name JDM_BUILD_PDS that can be specified as input to model build that we discuss later in "Build Settings"

The PhysicalDataRecord object shown in Figure 7-2 is used to specify a single record of a dataset. It is used for single record apply that we will discuss in a later section of this Chapter. The SignatureAttribute is used to specify the model signature of a mining model that will be discussed in later section of this Chapter.In OJDM attribute data types are used to implicitly specify the mining attribute types. For example, all VARCHAR2 columns are treated as categorical and all NUMBER columns are treated as numerical. So there is no need to specify logical data details in OJDM. However, to rename attributes of a column, a user can specify the embedded transformations that are discussed in the next section.

Example 7-2 Creation of a Physical Dataset

//Create PhysicalDataSetFactory
PhysicalDataSetFactory pdsFactory = 
(PhysicalDataSetFactory)m_dmeConn.getFactory("javax.datamining.data.PhysicalDataSet");
//Create a PhysicalDataSet object
PhysicalDataSet buildData = pdsFactory.create("DMUSER.MINING_DATA_BUILD_V", false);
//Create PhysicalAttributeFactory 
PhysicalAttributeFactory paFactory =
(PhysicalAttributeFactory)m_dmeConn.getFactory("javax.datamining.data.PhysicalAttribute");
//Create PhysicalAttribute object
PhysicalAttribute pAttr = paFactory.create(
"cust_id", AttributeDataType.integerType, PhysicalAttributeRole.caseId );
//Add the attribute to the PhysicalDataSet object
buildData.addAtribute(pAttr);
//Save the physical data set object
dmeConn.saveObject("JDM_BUILD_PDS", buildData, true);

Build Settings

In the Oracle Data Mining Java API, the BuildSettings object is saved as a table in the database. The settings table is compatible with the DBMS_DATA_MINING.CREATE_MODEL procedure. The name of the settings table must be unique in the user's schema. Figure 7-3 illustrates the build settings class hierarchy.

Figure 7-3 Build Settings Class Diagram.

Description of "Figure 7-3 Build Settings Class Diagram."

The code in Example 7-3 illustrates the creation and storing of a classification settings object with a tree algorithm.

Example 7-3 Creation of a Classification Settings with Decision Tree Algorithm

//Create a classification settings factory
ClassificationSettingsFactory clasFactory = 
(ClassificationSettingsFactory)dmeConn.getFactory
     ("javax.datamining.supervised.classification.ClassificationSettings");
//Create a ClassificationSettings object
ClassificationSettings clas = clasFactory.create();
//Set target attribute name
clas.setTargetAttributeName("AFFINITY_CARD");
//Create a TreeSettingsFactory
TreeSettingsFactory treeFactory =
(TreeSettingsFactory)dmeConn.getFactory
     ("javax.datamining.algorithm.tree.TreeSettings");
//Create TreeSettings instance
TreeSettings treeAlgo = treeFactory.create();
treeAlgo.setBuildHomogeneityMetric(TreeHomogeneityMetric.entropy);
treeAlgo.setMaxDepth(10);
treeAlgo.setMinNodeSize( 10, SizeUnit.count );
//Set algorithm settings in the classification settings
clas.setAlgorithmSettings(treeAlgo);
//Save the build settings object in the database
dmeConn.saveObject("JDM_TREE_CLAS", clas, true);

Enable Automated Data Preparation

In 11.1, all mining algorithms support automated data preparations (ADP). By default for decision tree and GLM algorithms, ADP is enabled. For other algorithms it is disabled by default for backward compatibility reasons. To enable ADP explicitly for the algorithms that do not enable by default, invoke the following function, by specifying the useAutomatedDataPreparations boolean flag as true.

OraBuildSettings.useAutomatedDataPreparations
          (boolean useAutomatedDataPreparations)

For more information about automatic data preparation, see Oracle Data Mining Concepts.

Executing Mining Tasks

OJDM uses the DBMS_SCHEDULER infrastructure for executing mining tasks either synchronously or asynchronously in the database. A mining task is saved as a DBMS_SCHEDULER job in the user schema and is set to DISABLED state. When user calls the execute method in DME Connection, the job state will be changed to ENABLED and scheduler starts executing the mining task by creating a new database session for asynchronous executions. For synchronous executions scheduler uses the same database session opened by the DME connection.

The class diagram in Figure 7-4 illustrates the different types of tasks that are available in OJDM and its class hierarchy. Subsequent sections will discuss more about the individual tasks shown in this diagram

Figure 7-4 Task Class Diagram

Class diagram of different types of tasks.

Description of "Figure 7-4 Task Class Diagram"

DBMS_SCHEDULER provides additional scheduling and resource management features. You can extend the capabilities of Oracle Data Mining tasks by using the Scheduler infrastructure.

See Also:

Oracle Database Administrator's Guide for information about the database scheduler.

Creating Mining Task Workflows

In Oracle Data Mining 11.1, the task infrastructure supports applications to specify dependent tasks through the API and deploy the execution of the tasks to the database server. The server executes complete workflow of tasks specified through the API and once deployed it does not depend on client. Client can monitor the execution process using OJDM API. For example, typically after the completion of data preparations, model is built and then tested and applied. Both test and apply can be done in parallel after model build is successful.

To build a task flow invoke the method OraTask.addDependency(String parentTaskName). For example, the code in Example 7-4 illustrates how to setup a mining task workflow, where first run the transformations task and then model build task. After successful completion of the build task run apply and test tasks in parallel.

Example 7-4 Building Mining Task Workflows

//Task objects declarations
private TransformationTask xformTask;
private BuildTask buildTask;
private TestTask testTask;
private DataSetApplyTask applyTask;
//Creation of the tasks and task input objects are skipped for this example
…
//Save the first task in the workflow (the transformations task)
dmeConn.saveObject("transformationTask", xformTask, true);
//Specify dependencies before saving of the tasks
buildTask.addDependency("transformationTask");
dmeConn.saveObject("modelBuildTask", buildTask, true);
testTask.addDependency("modelBuildTask");
dmeConn.saveObject("modelTestTask", testTask, true);
applyTask.addDependency("modelBuildTask");
dmeConn.saveObject("modelApplyTask", applyTask, true);
//Execute the first task in the workflow to initiate the execution of the whole workflow
dmeConn.execute("transformationTask");

Building a Mining Model

The javax.datamining.task.BuildTask class is used to build a mining model. Prior to building a model, a PhysicalDataSet object and a BuildSettings object must be saved.

Example 7-5 illustrates the building of a tree model using the PhysicalDataSet described in "Describing the Mining Data" and the BuildSettings described in "Build Settings".

Example 7-5 Building a Model

//Create BuildTaskFactory
BuildTaskFactory buildTaskFactory =
     dmeConn.getFactory("javax.datamining.task.BuildTask");
//Create BuildTask object
BuildTask buildTask = buildTaskFactory.create
     ( "JDM_BUILD_PDS","JDM_TREE_CLAS","JDM_TREE_MODEL"); 
//Save BuildTask object
dmeConn.saveObject("JDM_BUILD_TASK", buildTask, true);
//Execute build task asynchronously in the database
ExecutionHandle execHandle = dmeConn.execute("JDM_BUILD_TASK");
//Wait for completion of the task
ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);

Exploring Model Details

After building a model using the BuildTask, a model object is persisted in the database. It can be retrieved to explore the model details.

The class diagram in Figure 7-5 illustrates the different types of model objects and model details objects supported by the Oracle Data Mining Java API.

Figure 7-5 Model and Model Detail Class Diagram

Class diagram of Model objects and model details

Description of "Figure 7-5 Model and Model Detail Class Diagram"

Example 7-6 illustrates the retrieval of the classification tree model built in "Building a Mining Model" and its TreeModelDetail.

Example 7-6 Retrieve Model Details

//Retrieve classification model from the DME
ClassificationModel treeModel = (ClassificationModel)dmeConn.retrieveObject
     ( "JDM_TREE_MODEL", NamedObject.model);
//Retrieve tree model detail from the model
TreeModelDetail treeDetail = (TreeModelDetail)treeModel.getModelDetail();
//Get the root node
TreeNode rootNode = treeDetail.getRootNode();
//Get child nodes
TreeNode[] childNodes = rootNode.getChildren();
//Get details of the first child node
int nodeId = childNodes[0].getIdentifier();
long caseCount = childNodes[0].getCaseCount();
Object prediction = childNodes[0].getPrediction();

Testing a Model

Once a supervised model has been built, it can be evaluated using a test operation. The JDM standard defines two types of test operations: one that takes the mining model as input, and the other that takes the apply output table with the actual and predicted value columns.

javax.datamining.supervised.TestTask is the base class for the model- based test tasks, and javax.datamining.supervised.TestMetricsTask is the base class for the apply output table-based test tasks.

The test operation creates and persists a test metrics object in the DME. For classification model testing, either of the following can be used.

javax.datamining.supervised.classification.ClassificationTestTask
javax.datamining.supervised.classification.ClassificationTestMetricsTask

Both of these tasks create the named object javax.datamining.supervised.classification.ClassificationTestMetrics and store it as a table in the user's schema.

The classification test metrics components, confusion matrix, lift results, and ROC associated with the ClassificationTestMetrics object are stored in separate tables whose names are the ClassificationTestMetrics object name followed by the suffix _CFM, _LFT, or _ROC. These tables can be used to display test results in dashboards, BI platforms such as Oracle BI, Business Objects, and so on.

Similarly for regression model testing, either of the following can be used:

javax.datamining.supervised.regression.RegressionTestTask
javax.datamining.supervised.regression.RegressionTestMtericsTask

Both these tasks create a named object javax.datamining.supervised.regression.RegressionTestMetrics and store it as a table in the user's schema.

The class diagram in Figure 7-6 illustrates the test metrics class hierarchy. It refers to Figure 7-4, "Task Class Diagram" for the class hierarchy of test tasks.

Figure 7-6 Test Metrics Class Hierarchy

Description of "Figure 7-6 Test Metrics Class Hierarchy"

Example 7-7 illustrates the test of a tree model JDM_TREE_MODEL using the ClassificationTestTask on the dataset MINING_DATA_TEST_V.

Example 7-7 Testing a Model

//Create & save PhysicalDataSpecification      
PhysicalDataSet testData = m_pdsFactory.create(
        "MINING_DATA_TEST_V", false );
PhysicalAttribute pa = m_paFactory.create("cust_id", 
        AttributeDataType.integerType, PhysicalAttributeRole.caseId );
testData.addAttribute( pa );
m_dmeConn.saveObject( "JDM_TEST_PDS", testData, true );
//Create ClassificationTestTaskFactory
ClassificationTestTaskFactory testTaskFactory =  
  (ClassificationTestTaskFactory)dmeConn.getFactory(
     "javax.datamining.supervised.classification.ClassificationTestTask");
//Create, store & execute Test Task
ClassificationTestTask testTask = testTaskFactory.create( 
        "JDM_TEST_PDS", "JDM_TREE_MODEL", "JDM_TREE_TESTMETRICS" );
testTask.setNumberOfLiftQuantiles(10);
testTask.setPositiveTargetValue(new Integer(1));
//Save TestTask object
dmeConn.saveObject("JDM_TEST_TASK", testTask, true);
//Execute test task asynchronously in the database
ExecutionHandle execHandle = dmeConn.execute("JDM_TEST_TASK");
//Wait for completion of the task
ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);
//Explore the test metrics after successful completion of the task
if(ExecutionState.success.equals(execStatus.getState())) {
  //Retrieve the test metrics object
  ClassificationTestMetrics testMetrics =  
          (ClassificationTestMetrics)dmeConn.getObject("JDM_TREE_TESTMETRICS");
  //Retrieve confusion matrix and accuracy
  Double accuracy = testMetrics.getAccuracy();
  ConfusionMatrix cfm = testMetrics.getConfusionMatrix();
  //Retrieve lift 
  Lift lift = testMetrics.getLift();
  //Retrieve ROC
  ReceiverOperatingCharacterics roc = testMetrics.getROC();
}

In Example 7-7, a test metrics object is stored as a table called JDM_TREE_TESTMETRICS. The confusion matrix is stored in the JDM_TREE_TESTMETRICS_CFM table, lift is stored in the JDB_TREE_TESTMETRICS_LFT table, and ROC is stored in the JDM_TREE_TESTMETRICS_ROC table. You can use BI tools like Oracle Discoverer to query these tables and create reports.

Applying a Model for Scoring Data

All supervised models can be applied to data to find the prediction. Some of the unsupervised models, such as clustering and feature extraction, support the apply operation to find the cluster id or feature id for new records.

The JDM standard API provides an ApplySettings object to specify the type of output for the scored results. javax.datamining.task.apply.ApplySettings is the base class for all apply settings. In the Oracle Data Mining Java API, the ApplySettings object is transient; it is stored in the Connection context, not in the database.

The class diagram in Figure 7-7 illustrates the class hierarchy of the apply settings available in the Oracle Data Mining Java API.

Figure 7-7 Apply Settings

Description of "Figure 7-7 Apply Settings"

In the Oracle Data Mining Java API, default apply settings produce the apply output table in fixed format. The list in Table 7-1 illustrates the default output formats for different functions.

Table 7-1 Default Output Formats for Different Functions

Mining Function
Classification without Cost	Case ID	Prediction	Probability
Classification with Cost	Case ID	Prediction	Probability	Cost
Regression	Case ID	Prediction
Feature extraction	Case ID	Feature ID	Value

All types of apply settings support source and destination attribute mappings. For example, if the original apply table has customer name and age columns that need to be carried forward to the apply output table, it can be done by specifying the source destination mappings in the apply settings.

In the Oracle Data Mining Java API, classification apply settings support map by rank, top prediction, map by category, and map all predictions. Regression apply settings support map prediction value. Clustering apply settings support map by rank, map by cluster id, map top cluster, and map all clusters. Feature extraction apply settings support map by rank, map by feature id, map top feature, and map all features.

Example 7-8 illustrates the applying of a tree model JDM_TREE_MODEL using ClassificationApplyTask on the dataset MINING_DATA_APPLY_V.

Example 7-8 Applying a Model

//Create & save PhysicalDataSpecification      
PhysicalDataSet applyData = m_pdsFactory.create( "MINING_DATA_APPLY_V", false );
PhysicalAttribute pa = m_paFactory.create("cust_id", 
        AttributeDataType.integerType, PhysicalAttributeRole.caseId );
applyData.addAttribute( pa );
m_dmeConn.saveObject( "JDM_APPLY_PDS", applyData, true );
//Create ClassificationApplySettingsFactory
ClassificationApplySettingsFactory applySettingsFactory =  
  (ClassificationApplySettingsFactory)dmeConn.getFactory(
     "javax.datamining.supervised.classification. ClassificationApplySettings");
//Create & save ClassificationApplySettings
ClassificationApplySettings clasAS = applySettingsFactory.create();
m_dmeConn.saveObject( "JDM_APPLY_SETTINGS", clasAS, true);
//Create DataSetApplyTaskFactory
DataSetApplyTaskFactory applyTaskFactory =  
  (DataSetApplyTaskFactory)dmeConn.getFactory(
     "javax.datamining.task.apply.DataSetApplyTask");
//Create, store & execute apply Task
DataSetApplyTask applyTask = m_dsApplyFactory.create(
        " JDM_APPLY_PDS ", "JDM_TREE_MODEL", " JDM_APPLY_SETTINGS ", 
        "JDM_APPLY_OUTPUT_TABLE");
//Save ApplyTask object
dmeConn.saveObject("JDM_APPLY_TASK", applyTask, true);
//Execute test task asynchronously in the database
ExecutionHandle execHandle = dmeConn.execute("JDM_APPLY_TASK");
//Wait for completion of the task
ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);

Using a Cost Matrix

The class javax.datamining.supervised.classification.CostMatrix is used to represent the costs of the false positive and false negative predictions. It is used for classification problems to specify the costs associated with the false predictions.

In the Oracle Data Mining Java API, cost matrix is supported in apply and test operations for all classification models. For the decision tree algorithm, a cost matrix can be specified at build time. For more information about cost matrix, see Oracle Data Mining Concepts.

Example 7-9 illustrates how to create a cost matrix object where the target has two classes: YES (1) and NO (0). Suppose a positive (YES) response to the promotion generates $2 and the cost of the promotion is $1. Then the cost of misclassifying a positive responder is $2. The cost of misclassifying a non-responder is $1.

Example 7-9 Creating a Cost Matrix

//Create category set factory & cost matrix factory
CategorySetFactory catSetFactory = (CategorySetFactory)m_dmeConn.getFactory(
      "javax.datamining.data.CategorySet" );
CostMatrixFactory costMatrixFactory = (CostMatrixFactory)m_dmeConn.getFactory(
      "javax.datamining.supervised.classification.CostMatrix");
//Create categorySet
CategorySet catSet = m_catSetFactory.create(AttributeDataType.integerType);
//Add category values
catSet.addCategory(new Integer(0), CategoryProperty.valid);
catSet.addCategory(new Integer(1), CategoryProperty.valid);
//create cost matrix
CostMatrix costMatrix = m_costMatrixFactory.create(catSet);
costMatrix.setCellValue(new Integer(0), new Integer(0), 0);
costMatrix.setCellValue (new Integer(1), new Integer(1), 0);
costMatrix.setCellValue (new Integer(0), new Integer(1), 2);
costMatrix.setCellValue (new Integer(1), new Integer(0), 1);
//Save cost matrix in the DME
dmeConn.saveObject("JDM_COST_MATRIX", costMatrix);

Using Prior Probabilities

Prior probabilities are used for classification problems if the actual data has a different distribution for target values than the data provided for the model build. A user can specify the prior probabilities in the classification function settings, using setPriorProbabilitiesMap. For more information about prior probabilities, see Oracle Data Mining Concepts.

Note:

Priors are not supported with decision trees.

Example 7-10 illustrates how to create a PriorProbabilities object, when the target has two classes: YES (1) and NO (0), and probability of YES is 0.05, probability of NO is 0.95.

Example 7-10 Creating Prior Probabilities

//Set target prior probabilities
Map priorMap = new HashMap();
priorMap.put(new Double(0), new Double(0.7));
priorMap.put(new Double(1), new Double(0.3));
buildSettings.setPriorProbabilitiesMap("affinity_card", priorMap);

Embedded Transformations

In 11.1, OJDM supports embedding transformations with the model metadata. When the transformations are embedded with the model, they are implicitly applied to apply and test datasets. For example, user can embed a transformation that recodes the response attributes value representation from 1/0 to Yes/No; model uses this transformation when applying the model to the new data.

Users can specify these transformations as SQL expressions or can use the OJDM transformations discussed in Section 2.13 and build a transformation sequence.

In this section, the first example discusses the simple expression transformation using the oracle.dmt.jdm.transform.OraExpressionTransform class specified as input for model build.

The second example illustrates how to build a sequence of complex transformations and persist them as a transformation sequence and embed them into the model.

Embed Single-Expression Transformations

Using OraTransformationFactory user can create transformation objects such as OraTransformationSequence, OraExpressionTransform, OraBinningTransform, OraNormalizationTransform and OraClippingTransform.

In Example 7-11, we create an expression transform that defines a simple log transformation for age attribute, recode transformation for affinity_card attribute and explicit exclusion of original age attribute from the model build. The code illustrates using OJDM API how one can embed these simple SQL expression transformations with the model.

Example 7-11 Simple Expression Transformation

//Create OraTransformationFactory
OraTransformationFactory m_xformFactory = (OraTransformationFactory)m_dmeConn.getFactory(
      "oracle.dmt.jdm.transform.OraTransformation" );
//Create OraExpressionTransform from the transformation factory
OraExpressionTransform exprXform = m_xformFactory.createExpressionTransform();
  //1) Specify log transformation of age attribute and create a new attribute call log_age
  //   that can be used for mining
  exprXform.addAttributeExpression("log_age", //Expression output attribute name
                                  "log(10, age) as log_age", //Expression
                                  "power(10, log_age)" //Reverse expression
                                  );
  //2) Recode 1/0 values of the affinity card attribute with the yes/no values and replace 
  //   existing attribute with the new recoded attribute
  exprXform.addAttributeExpression("affinity_card", //Expression output attribute name
                                  "CASE WHEN AFFINITY_CARD = 1 THEN 'YES' ELSE 'NO' END ", 
                                  null //No explicit reverse expression
                                  );
  //3) Exclude age attribute from mining
  exprXform.addAttributeExpression("age", //Expression output attribute name
                                  null, //Specify expression as null 
                                //to exclude attribute from mining 
                                  null 
                                  );
//Create transformation sequence object using expression transformation
OraTransformationSequence xformSeq = m_xformFactory.createTransformationSequence(
        "MINING_DATA_BUILD_V", //Input table
        exprXform, //Expressions to be defined
        null //Output transformed view is specified as null as we are trying to 
             //embed the transformations to the model
      );
//Save transformation sequence object
m_dmeConn.saveObject("simpleExprXForm_jdm", xformSeq, true);
//Create build Task  with transformation sequence
BuildTask buildTask = m_buildFactory.create(
                     "inputPDS", //Build data specification
                     "inputBuildSettings", //Mining function settings name
                     "outputModel" //Mining model name
                     );                          
//Specify transformation sequence as one of the input to enable embedding
//of the transformations defined in the sequence with the model
//In this example only expression transformations are specified
((OraBuildTask)buildTask).setTransformationSequenceName("simpleExprXForm_jdm");
//Save and execute the build task
... 
//After successful model build specified transformations are embedded with the model
//User can retrieve the transformation details that are embedded with the model by calling 
//the following function in OraModel
OraExpressionTransform modelExmbeededTransforms = 
                     ((OraModel)model). GetModelTransformations();

Embed Complex Sequence of Transformations

In the previous example, we explored how to embed simple SQL expression transformations for trivial business transformations. In this section we will detail how a complex transformation sequence can be built using OJDM and embed these with the model.

OJDM 10.2 provides typical mining related individual transformations such as binning, normalization and outlier treatment (clipping). In 10.2 users have to maintain these transformations outside the modeling process and do the consistent transformations for the build, apply and test datasets outside the mining operations. This requires significant additional coding and maintenance of the transformation related objects by the end-user applications.

With the model embedded transformations capability, users can embed even complex transformation sequences such as first add business transformations and new attributes using the expression transforms (as discussed in the previous example), second treat outliers with the user specified clipping definitions and lastly normalize the data with the user specified normalization technique.

In OJDM new OraTransformationSequence object supports ability to specify sequence of transformations and convert these transformations into per attribute SQL expressions to embed them to the model. Example 7-12 illustrates using OJDM API how one can build a transformation sequence which uses the expression transform created in the previous example and extends it with the outlier and normalization data mining transformations and embed these complex transformation sequence with the model.

Example 7-12 Complex Sequence Transformations

//Create a list of transformations to be performed on the input mining data
List xformList = new ArrayList();
xfromList.add( exprXform );     //Expression transformation
xformList.add( clippingXform ); //Clipping transformation to treat outliers
xformList.add( normalizeXform );//Normalization transformation
//Create transformation sequence object using list of transformation
OraTransformationSequence xformSeq = m_xformFactory.createTransformationSequence(
        "MINING_DATA_BUILD_V", //Input table
        xformList, //List of transformations
        null //Output transformed view is specified as null as we are trying to 
             //embed the transformations to the model
      );
//Save transformation sequence object
m_dmeConn.saveObject("complexXFormSeq_jdm", xformSeq, true);
//Create transformation task with the transformation sequence
OraTransformationTaskFactory m_xformTaskFactory = 
       (OraTransformationTaskFactory)m_dmeConn.getFactory(
                   "oracle.dmt.jdm.task.OraTransformationTask");         
OraTransformationTask xformTask =  m_xformTaskFactory .create(
                 "complexXFormSeq_jdm",,   
                 false //boolean flag useTransformDefinitionTables
                 );
//Save and execute transformation task to populate transformation sequence with the
//SQL expressions necessary before embedding them to the build task
. . . .
//Create build Task with transformation sequence
. . . .
((OraBuildTask)buildTask).setTransformationSequenceName("complexXFormSeq_jdm ");
//Save and execute the build task with the embedded transformations
...

Note that in both the examples we specified the output view of the transformation sequence as null to avoid creation of the database view that includes the transformations in the sequence. However, one can specify the view name to create a database view with the transformations and use this view as input to the model build to maintain the transformations outside the modeling process. OJDM API provides flexibility for the applications to choose the approach that best fits the need.

Using Predictive Analytics Tasks: Predict, Explain, and Profile

OJDM has oracle.dmt.jdm.task.OraPredictTask, OraExplainTask and OraProfileTask for data mining novice users to get predictions, to explain attributes importance and to discover profiles from the data.

Using OraPredictTask predictions are computed by just specifying the data location and the target column. This task learns from the known values in the target column and other columns in the input table and fills the unknown values in the target column with the predictions. This task hides all the data mining process done inside the database and produces the predictions and accuracy of the predictions.

Using OraExplainTask attributes ranking/importance with respect to an explain column. By just specifying the data location and explain column this task produces the attribute ranking table.

Using OraProfileTask profiles are discovered from the data for a given target attribute. For example, to find the profiles of the customers who respond to a product promotion, give the customers dataset with the customer attributes and promotion response attribute to the profile task. Profile task outputs a table with the profile definitions that applications can display to the users.

Both the tasks do automated data preparation where needed.

Example 7-13 illustrates how to execute predict, explain, and profile tasks.

Example 7-13 Predictive Analytics

//Get Predictive Analytics Task Factory object
   OraPredictiveAnalyticsTaskFactory  m_paFactory =
     (OraPredictiveAnalyticsTaskFactory)m_dmeConn.getFactory(
          "oracle.dmt.jdm.task.OraPredictiveAnalyticsTask");
//Predict task
   //Create predict task object
   OraPredictTask predictTask = m_paFactory.createPredictTask (
                     "MINING_DATA_BUILD_V", //Input table
                     "cust_id", //Case id column
                     "affinity_card", //target column
                     "JDM_PREDICTION_RESULTS"); //prediction output table
   //Save predict task object
   dmeConn.saveObject("JDM_PREDICT_TASK", predictTask, true);
   //Execute test task asynchronously in the database
   ExecutionHandle execHandle1 = dmeConn.execute("JDM_PREDICT_TASK");
   //Wait for completion of the task
   ExecutionStatus execStatus1 = execHandle1.waitForCompletion(Integer.MAX_VALUE);                         
//Explain task
   //Create explain task object
   OraExplainTask explainTask = m_paFactory.createExplainTask (
                     "MINING_DATA_BUILD_V", //Input table
                     "affinity_card", //explain column
                     "JDM_EXPLAIN_RESULTS"); //explain output table
   //Save predict task object
   dmeConn.saveObject("JDM_EXPLAIN_TASK", explainTask, true);
   //Execute test task asynchronously in the database
   ExecutionHandle execHandle2 = dmeConn.execute("JDM_ EXPLAIN_TASK");
   //Wait for completion of the task
   ExecutionStatus execStatus2 = execHandle2.waitForCompletion(Integer.MAX_VALUE);                         
//Profile task
   //Create profile task 
    OraProfileTask profileTask = m_paFactory.createProfileTask(
                         "MINING_DATA_BUILD_V", //Input table
                         "affinity_card", //Target column
                                 "JDM_PROFILE_RESULTS); //Profile output table
   //Save predict task object
   dmeConn.saveObject("JDM_PROFILE_TASK", profileTask, true);
   //Execute test task asynchronously in the database
   ExecutionHandle execHandle3 = dmeConn.execute("JDM_PROFILE_TASK");
   //Wait for completion of the task
   ExecutionStatus execStatus3 = execHandle3.waitForCompletion(Integer.MAX_VALUE);

Preparing the Data

In the Oracle Data Mining Java API, data must be prepared before building, applying, or testing a model. The oracle.dmt.jdm.task.OraTransformationTask class supports common transformations used in data mining: binning, normalization, clipping, and text transformations. For more information about transformations, see Oracle Data Mining Concepts.

The class diagram in Figure 7-8 illustrates the OraTransformationTask and its relationship with other objects.

Figure 7-8 OraTransformationTask Class Diagram

Description of "Figure 7-8 OraTransformationTask Class Diagram"

Using Binning/Discretization Transformation

Binning is the process of grouping related values together, thus reducing the number of distinct values for an attribute. Having fewer distinct values typically leads to a more compact model and one that builds faster, but it can also lead to some loss in accuracy.

The class diagram in Figure 7-9 illustrates the binning transformation classes.

Figure 7-9 OraBinningTransformation Class Diagram

Description of "Figure 7-9 OraBinningTransformation Class Diagram"

Here, OraBinningTransformation contains all the settings required for binning. The Oracle Data Mining Java API supports top-n, custom binning for categorical attributes, and equi-width, quantile and custom binning for numerical attributes. After running the binning transformations, it creates a transformed table and bin boundary tables in the user's schema. The user can specify the bin boundary table names, or the system will generate the names for the bin boundary tables. This facilitates the reusing of the bin boundary tables that are created for binning build data for apply and test data.

The following code illustrates the binning operation on the view MINING_BUILD_DATA_V

//Create binning transformation instance
OraBinningTransformFactory binXformFactory = 
   (OraBinningTransformFactory)dmeConn.getFactory(
      "oracle.dmt.jdm.transform.binning.OraBinningTransform");
OraBinningTransform binTransform = m_binXformFactory.create(
      "MINING_DATA_BUILD_V", // name of the input data set
      "BINNED_DATA_BUILD_V", // name of the transformation result 
      true); // result of the transformation is a view  
// Specify the number of numeric bins
binTransform.setNumberOfBinsForNumerical(10);
// Specify the number of categoric bins
binTransform.setNumberOfBinsForCategorical(8);
// Specify the list of excluded attributes
String[] excludedList = new String[]{"CUST_ID", "CUST_GENDER"};
binTransform.setExcludeColumnList(excludedList);
// Specify the type of numeric binning: equal-width or quantile
       ( default is quantile )
binTransform.setNumericalBinningType(binningType);
// Specify the type of categorical binning as Top-N: by default it is none   
binTransform.setCategoricalBinningType(OraCategoricalBinningType.top_n);
//Create transformation task
OraTransformationTask xformTask = m_xformTaskFactory.create(binTransform);
//Save transformation task object
dmeConn.saveObject("JDM_BINNING_TASK", xformTask, true);
//Execute transformation task asynchronously in the database
ExecutionHandle execHandle = dmeConn.execute("JDM_ BINNING _TASK");
//Wait for completion of the task
ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);

Using Normalization Transformation

Normalizing converts individual attribute values in such a way that all attribute values lie in the same range. Normally, values are converted to be in the range 0.0 to 1.0 or the range -1 to +1. Normalization ensures that attributes do not receive artificial weighting caused by differences in the ranges that they span.

The class diagram in Figure 7-10 illustrates the normalization transformation classes.

Figure 7-10 OraNormalizeTransformation Class Diagram

Normalization transformation related classes.

Description of "Figure 7-10 OraNormalizeTransformation Class Diagram"

Here, OraNormalizeTransformation contains all the settings required for normalization. The Oracle Data Mining Java API supports z-Score, min-max, and linear scale normalizations. Normalization is required for SVM, NMF, and k-Means algorithms.

The following code illustrates normalization on the view MINING_BUILD_DATA_V.

//Create OraNormalizationFactory
OraNormalizeTransformFactory normalizeXformFactory = 
  (OraNormalizeTransformFactory)m_dmeConn.getFactory(
      "oracle.dmt.jdm.transform.normalize.OraNormalizeTransform");
//Create OraNormalization
OraNormalizeTransform normalizeTransform = m_normalizeXformFactory.create(
      "MINING_DATA_BUILD_V", // name of the input data set
      "NORMALIZED_DATA_BUILD_V", // name of the transformation result 
      true, // result of the transformation is a view
      OraNormalizeType.z_Score, //Normalize type
      new Integer(6) ); //Rounding number    
// Specify the list of excluded attributes
String[] excludedList = new String[]{"CUST_ID", "CUST_GENDER"};
normalizeTransform.setExcludeColumnList(excludedList);
//Create transformation task
OraTransformationTask xformTask = m_xformTaskFactory.create(normalizeTransform);
//Save transformation task object
dmeConn.saveObject("JDM_NORMALIZE_TASK", xformTask, true);
//Execute transformation task asynchronously in the database
ExecutionHandle execHandle = dmeConn.execute("JDM_NORMALIZE_TASK");
//Wait for completion of the task
ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);

Using Clipping Transformation

Some computations on attribute values can be significantly affected by extreme values. One approach to achieving a more robust computation is to either winsorize or trim the data using clipping transformations.

Winsorizing involves setting the tail values of a particular attribute to some specified value. For example, for a 90% winsorization, the bottom 5% are set equal to the minimum value in the 6th percentile, while the upper 5% are set equal to the value corresponding to the maximum value in the 95th percentile.

Trimming "removes" the tails in the sense that trimmed values are ignored in further values. This is achieved by setting the tails to NULL.

The class diagram in Figure 7-11 illustrates the clipping transformation classes.

Figure 7-11 OraClippingTransformation Class Diagram

Description of "Figure 7-11 OraClippingTransformation Class Diagram"

Here, OraClippingTransformation contains all the settings required for clipping. The Oracle Data Mining Java API supports winsorize and trim types of clipping.

The following code illustrates clipping on the view MINING_BUILD_DATA_V.

//Create OraClippingTransformFactory
OraClippingTransformFactory clipXformFactory = 
  (OraClippingTransformFactory)dmeConn.getFactory(
      "oracle.dmt.jdm.transform.clipping.OraClippingTransform");
//Create OraClippingTransform
OraClippingTransform clipTransform = clipXformFactory.create(
      "MINING_DATA_BUILD_V", // name of the input data set
      "WINSORISED_DATA_BUILD_V", // name of the transformation result 
      true );// result of the transformation is a view    
//Specify the list of excluded attributes
String[] excludedList = new String[]{"CUST_ID", "CUST_GENDER"};
clipTransform.setExcludeColumnList(excludedList);
//Specify the type of clipping
clipTransform.setClippingType(OraClippingType.winsorize);
// Specify the tail fraction as 3% of values on both ends
clipTransform.setTailFraction(0.03);
//Create and save transformation task
OraTransformationTask xformTask = xformTaskFactory.create(clipTransform);
//Save transformation task object
dmeConn.saveObject("JDM_CLIPPING_TASK", xformTask, true);
//Execute transformation task asynchronously in the database
ExecutionHandle execHandle = dmeConn.execute("JDM_CLIPPING_TASK");
//Wait for completion of the task
ExecutionStatus execStatus = execHandle.waitForCompletion(Integer.MAX_VALUE);

Using Text Transformation

Text columns need to be transformed to nested table structure to do the mining on text columns. This transformation converts the text columns to nested table columns. A features table is created by text transformation. A model build text data column features table must be used for apply and test tasks to get the correct results.

The class diagram in Figure 7-12 illustrates the text transformation classes.

Figure 7-12 Text Transformation Class Diagram

Description of "Figure 7-12 Text Transformation Class Diagram"

Here, OraTextTransformation is used to specify the text columns and the feature tables associated with the text columns.

The following code illustrates clipping on the table MINING_BUILD_TEXT.

//Create OraTextTransformFactory
OraTextTransformFactory textXformFactory = dmeConn.getFactory(
      "oracle.dmt.jdm.transform.text.OraTextTransform");
//Create OraTextTransform
OraTextTransform txtXform = (OraTextTransformImpl)textXformFactory.create(
      "MINING_BUILD_TEXT", // name of the input data set
      "NESTED_TABLE_BUILD_TEXT ", // name of the transformation result
      "CUST_ID", //Case id column
      new String[] { "COMMENTS" } ); //Text column names 
      );
//Create transformation task
OraTransformationTask xformTask = m_xformTaskFactory.create(txtXform);
//Save transformation task object
dmeConn.saveObject("JDM_TEXTXFORM_TASK", xformTask, true);
//Execute transformation task asynchronously in the database
ExecutionHandle execHandle = dmeConn.execute("JDM_TEXTXFORM_TASK");
//Wait for completion of the task
ExecutionStatus execStatus = execHandle.waitForCompletion
     (Integer.MAX_VALUE);