What's New in the Oracle Data Mining APIs?

This section describes new features in the Oracle Data Mining APIs. It includes the following sections:

Oracle Data Mining 11g Release 2 (11.2) API New Features
Oracle Data Mining 11g Release 1 (11.1) API New Features
Oracle Data Mining 10g Release 2 (10.2) API New Features

Oracle Data Mining 11g Release 2 (11.2) API New Features

This section lists the changes that have been introduced in the Oracle Data Mining 11.2 PL/SQL API:

Note:

The same changes are implemented in the Java API. Refer to Oracle Data Mining Java API Reference.

Support for Native Transactional Data with Association Rules

In Oracle Data Mining 11g Release 2 (11.2), you can build association rules models without first transforming the transactional data.

See:
"Market Basket Data"

SVM Class Weights Specified with CLAS_WEIGHTS_TABLE_NAME

Previously, SVM class weights were specified in the priors table (CLAS_PRIORS_TABLE_NAME setting). Now SVM class weights and GLM class weights are both specified in a class weights table (CLAS_WEIGHTS_TABLE_NAME setting).

See:
DBMS_DATA_MINING setting CLAS_WEIGHTS_TABLE_NAME in Oracle Database PL/SQL Packages and Types Reference

FORCE argument to DROP_MODEL

You can now force a drop model operation even if a serious system error has interrupted the model build process.

See:
DBMS_DATA_MINING.DROP_MODEL procedure in Oracle Database PL/SQL Packages and Types Reference

GET_MODEL_DETAILS_SVM has new REVERSE_COEF parameter

To preserve model transparency, the GET_MODEL_DETAILS functions automatically reverse the transformations generated by ADP during the model build. You can obtain the transformed attribute coefficients used internally by an SVM model by setting the new reverse_coef parameter to 1. This causes the coefficients and bias to be returned with the normalization shifts and scales applied by ADP.
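The relationship between the two sets of coefficients can be illustrated with a little arithmetic. The following Python sketch is not Oracle code. It assumes a linear model and a normalization of the form (x - shift) / scale, and the function names and sample numbers are hypothetical. It shows that coefficients expressed for normalized attributes and coefficients expressed for the original attributes produce the same score.

```python
def to_original_space(coefs, bias, shifts, scales):
    """Re-express linear-model coefficients fitted on normalized data
    in terms of the original, unnormalized attributes.

    Assumes each attribute was normalized as x_norm = (x - shift) / scale.
    """
    orig_coefs = [w / s for w, s in zip(coefs, scales)]
    orig_bias = bias - sum(w * sh / s for w, sh, s in zip(coefs, shifts, scales))
    return orig_coefs, orig_bias


def score(coefs, bias, values):
    """Linear score: bias plus the weighted sum of the attribute values."""
    return bias + sum(w * x for w, x in zip(coefs, values))


# Hypothetical model with two attributes (all numbers are made up).
norm_coefs, norm_bias = [0.5, -2.0], 1.0
shifts, scales = [10.0, 0.0], [4.0, 2.0]

raw = [18.0, 3.0]
normalized = [(x - sh) / s for x, sh, s in zip(raw, shifts, scales)]

orig_coefs, orig_bias = to_original_space(norm_coefs, norm_bias, shifts, scales)

# Scoring normalized data with the internal coefficients and scoring raw
# data with the re-expressed coefficients gives the same result.
internal_score = score(norm_coefs, norm_bias, normalized)
external_score = score(orig_coefs, orig_bias, raw)
```

Both scores come out equal, which is why either representation describes the same model. Only the attribute space in which the coefficients are read differs.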

See:
DBMS_DATA_MINING.GET_MODEL_DETAILS_SVM in Oracle Database PL/SQL Packages and Types Reference

Oracle Data Mining 11g Release 1 (11.1) API New Features

This section describes features introduced in Oracle Data Mining 11g Release 1 (11.1).

Mining Model schema objects

In Oracle 11g, Data Mining models are implemented as data dictionary objects in the SYS schema. A set of new data dictionary views presents mining models and their properties. New system and object privileges control access to mining model objects.

In previous releases, Data Mining models were implemented as a collection of tables and metadata within the DMSYS schema. In Oracle 11g, the DMSYS schema no longer exists.

See Also:

Oracle Data Mining Administrator's Guide for information about privileges for accessing mining models

"Mining Model Schema Objects" for information about Oracle Data Mining data dictionary views

Automatic and Embedded Data Preparation (ADP and EDP)

In most cases, data must be transformed using techniques such as binning, normalization, or missing value treatment before it can be mined. Data for build, test, and apply must undergo the exact same transformations.

In previous releases, data transformation was the responsibility of the user. In Oracle Database 11g, the data preparation process can be automated. Algorithm-appropriate transformation instructions are embedded in the model and automatically applied to the build data and scoring data. The automatic transformations can be complemented by or replaced with user-specified transformations.

Because they contain the instructions for their own data preparation, mining models are known as supermodels.

See Also:

Oracle Data Mining Concepts for information about automatic, embedded, and custom data transformation for Data Mining

Oracle Database PL/SQL Packages and Types Reference for information about DBMS_DATA_MINING_TRANSFORM

Scoping of Nested Data

Oracle Data Mining supports nested data types for both categorical and numerical data. Most algorithms require multi-record case data to be presented as columns of nested rows, each containing an attribute name/value pair. Oracle Data Mining processes each nested row as a separate attribute.

Attributes that are not nested are identified by the column name. Since Oracle Database prevents duplicate column names, the names of non-nested attributes are always unique. However, no such guarantee exists for nested attributes. In Oracle Data Mining 10g, it was up to the user to verify the uniqueness of nested attribute names. This name checking was a required step in the preparation of nested data.

In Oracle Data Mining 11g, attribute name duplication is not possible, because nested attribute names are scoped with the column name. The name checking step is no longer required, thus simplifying the task of data preparation for the user.
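The effect of scoping can be sketched in a few lines of Python. This is an illustration of the idea only, not the internal implementation: the function name, the sample case, and the "." separator between column name and nested attribute name are assumptions made for the example.

```python
def scoped_attributes(case):
    """Flatten one case into a dict of uniquely named attributes.

    Non-nested columns keep their column name. Each row of a nested column
    (modeled here as a dict of attribute name -> value) becomes a separate
    attribute whose name is qualified by the column name.
    """
    flat = {}
    for column, value in case.items():
        if isinstance(value, dict):          # nested column
            for name, nested_value in value.items():
                flat[f"{column}.{name}"] = nested_value
        else:                                # ordinary column
            flat[column] = value
    return flat


# Hypothetical case: both nested columns contain an attribute named "BOOKS".
case = {
    "CUST_ID": 101,
    "PURCHASES": {"BOOKS": 3, "MUSIC": 1},
    "RETURNS": {"BOOKS": 1},
}

attributes = scoped_attributes(case)
```

Because each nested name is qualified by its column, the two "BOOKS" attributes remain distinct (PURCHASES.BOOKS and RETURNS.BOOKS) without any name checking by the user.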

In Oracle Data Mining 11g, Decision Tree and O-Cluster algorithms do not support nested data.

See Also:
"Nested Data"

Standardized Handling of Sparse Data and Missing Values

Handling of sparse data and missing values has been standardized across algorithms in Oracle Data Mining 11g. Data is sparse when a high percentage of the cells are empty but all the values are assumed to be known. This is the case in market basket data. When some cells are empty and their values are not known, they are assumed to be missing at random. Oracle Data Mining interprets missing data in a nested column as a sparse representation and missing data in a non-nested column as missing at random.
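The distinction between the two interpretations can be sketched as follows. The Python below is a conceptual illustration, not Oracle code: the function name, the sample case, and the way nested columns are modeled (a dict of the attribute name/value pairs that are present) are assumptions. It only labels absent values and does not show how any algorithm subsequently treats them.

```python
def interpret_absent_values(case, nested_vocabulary):
    """Classify every absent value in one case.

    case: column name -> value; a nested column holds a dict of the
          attribute name/value pairs that are present in this case.
    nested_vocabulary: nested column name -> set of all attribute names
          that occur in that column anywhere in the data set.
    """
    labels = {}
    for column, value in case.items():
        if column in nested_vocabulary:
            # Nested column: anything not listed is a known "empty" cell.
            absent = nested_vocabulary[column] - set(value)
            for name in sorted(absent):
                labels[f"{column}.{name}"] = "sparse"
        elif value is None:
            # Non-nested column: the value exists but is unknown.
            labels[column] = "missing at random"
    return labels


# Hypothetical market basket case with one unknown demographic value.
case = {"AGE": None, "INCOME": 52000, "BASKET": {"MILK": 1, "BREAD": 2}}
vocabulary = {"BASKET": {"MILK", "BREAD", "EGGS", "TEA"}}

labels = interpret_absent_values(case, vocabulary)
```

In this example the customer's age is unknown, so it is labeled missing at random, whereas eggs and tea are simply items that were not bought, so they are labeled sparse.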

See Also:
"Missing Data"

Generalized Linear Models

A new algorithm, Generalized Linear Models, is introduced in Oracle 11g. It supports two mining functions: classification (logistic regression) and regression (linear regression).

See Also:
Oracle Data Mining Concepts for information about Generalized Linear Models

New SQL Data Mining Function

A new SQL Data Mining function, PREDICTION_BOUNDS, has been introduced for use with Generalized Linear Models. PREDICTION_BOUNDS returns the confidence bounds on predicted values (regression models) or predicted probabilities (classification).

See Also:
Chapter 6, "Scoring and Deployment"

Enhanced Support for Cost-Sensitive Decision Making

Cost matrix support is significantly enhanced in Oracle 11g. A cost matrix can be added to or removed from any classification model using the new procedures, DBMS_DATA_MINING.ADD_COST_MATRIX and DBMS_DATA_MINING.REMOVE_COST_MATRIX.

The SQL Data Mining functions support new syntax for specifying an in-line cost matrix. With this new feature, cost-sensitive model results can be returned within a SQL statement even if the model does not have an associated cost matrix for scoring.

Only Decision Tree models can be built with a cost matrix.
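The general idea behind cost-sensitive scoring can be sketched in Python. This is a generic illustration of choosing the prediction with the lowest expected cost, not a description of Oracle's internal computation. The function name, class labels, probabilities, and costs are all hypothetical.

```python
def least_cost_prediction(probabilities, cost_matrix):
    """Return (predicted_class, expected_cost) with the lowest expected cost.

    probabilities: class -> probability that the case belongs to it.
    cost_matrix:   (actual_class, predicted_class) -> cost of that outcome.
    """
    expected = {
        predicted: sum(p * cost_matrix[(actual, predicted)]
                       for actual, p in probabilities.items())
        for predicted in probabilities
    }
    best = min(expected, key=expected.get)
    return best, expected[best]


# Hypothetical scoring result and costs for a marketing campaign.
probabilities = {"responder": 0.25, "non_responder": 0.75}
cost_matrix = {
    ("responder", "responder"): 0.0,
    ("responder", "non_responder"): 8.0,   # missing a responder is expensive
    ("non_responder", "responder"): 1.0,   # a wasted contact is cheap
    ("non_responder", "non_responder"): 0.0,
}

most_likely = max(probabilities, key=probabilities.get)
prediction, cost = least_cost_prediction(probabilities, cost_matrix)
```

Here the most probable class is "non_responder", but the least costly prediction is "responder", because wrongly predicting "non_responder" carries a much higher cost. This is the kind of difference a cost matrix is meant to capture.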

See Also:

Oracle Data Mining Concepts for information about costs

Oracle Database PL/SQL Packages and Types Reference for information about ADD_COST_MATRIX and REMOVE_COST_MATRIX

"Cost-Sensitive Decision Making"

Desupported Features

- DMSYS schema
- Oracle Data Mining Scoring Engine
- Use of Database Configuration Assistant (DBCA) to configure the Data Mining option. In Oracle 10.2, you could configure the option with DBCA. In Oracle 11g, this step is not needed.
- Basic Local Alignment Search Tool (BLAST)

Deprecated Features

- Adaptive Bayes Network classification algorithm
- DM_USER_MODELS view and functions that provide information about models, model signature, and model settings (for example, GET_MODEL_SETTINGS, GET_DEFAULT_SETTINGS, and GET_MODEL_SIGNATURE). These are replaced by data dictionary views. See Chapter 5 for information about the data dictionary views.

Enhancements to the Oracle Data Mining Java API

The Oracle Data Mining Java API (OJDM) fully supports the new features in Oracle Data Mining 11g Release 2 (11.2). This section provides a summary of the new features in the Java API. For details, see Oracle Data Mining Java API Reference (Javadoc).

- As described in "Mining Model schema objects", mining models in 11g Release 2 (11.2) are data dictionary objects in the SYS schema. System and object privileges control access to mining models.

  In the Oracle Data Mining Java API, a new extension method, OraConnection.getObjectNames, is added to support listing of the mining objects that a user can access. This method provides various object filtering options that applications can use as needed.

- As described in "Automatic and Embedded Data Preparation (ADP and EDP)", Oracle Data Mining 11g Release 2 (11.2) supports automatic and embedded data preparation (supermodels).

  In the Oracle Data Mining Java API, a new build setting extension method, OraBuildSettings.useAutomatedDataPreparations, is added to enable ADP. Using the new OraBuildTask.setTransformationSequenceName method, applications can embed the transformations with the model.

- Two new GLM packages are introduced: oracle.dmt.jdm.algorithm.glm and oracle.dmt.jdm.modeldetail.glm. These packages contain the GLM algorithm settings and model details interfaces, respectively.

- New apply content enumeration values, probabilityLowerBound and probabilityUpperBound, are added to specify probability bounds for classification apply output. The enumeration oracle.dmt.jdm.supervised.classification.OraClassificationApplyContent defines these values. Similarly, the apply content enumeration values predictionLowerBound and predictionUpperBound are added to specify prediction bounds for regression model apply output. In this release, only GLM models support this feature.

- New static methods, addCostMatrix and removeCostMatrix, are added to OraClassificationModel to support associating a cost matrix with the model. This simplifies the deployment of costs along with the model.

- Mining task features are enhanced to support the building of mining process workflows. Applications can specify dependent tasks using the new OraTask.addDependency method. Another notable new task feature is overwriteOutput, which can be enabled by calling the new OraTask.overwriteOutput method.

  With these new features, applications can easily develop mining process workflows and deploy them to the database server. These task workflows can be monitored from the client side. For usage of these methods, refer to the demo programs shipped with the product. (See Oracle Data Mining Administrator's Guide for information about the demo programs.)

- A new mining object, oracle.dmt.jdm.transform.OraTransformationSequence, supports the specification of user-defined transformation sequences. These can either be embedded in the mining model or managed externally. In addition, the new OraExpressionTransform object can be used to specify SQL expressions to be included with the model.

- The new oracle.dmt.jdm.OraProfileTask is added to support the new predictive analytics profile functionality.

- The Oracle Data Mining Java API can be used with Oracle Database 11g Release 2 (11.2) and with Oracle Database 10.2. When used with a 10.2 database, only the 10.2 features are available.
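The notion of dependent tasks in a mining workflow, mentioned in the list above, can be sketched independently of the Java API. The Python below is a conceptual illustration only: it does not use or reproduce OraTask.addDependency, the task names are hypothetical, and how the database server actually schedules dependent tasks is not described in this section. It simply derives an execution order in which every task runs after the tasks it depends on.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_order(dependencies):
    """Return task names in an order that respects every dependency.

    dependencies: task name -> set of task names that must finish first.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical workflow: each task depends on the one before it.
workflow = {
    "build_model": {"prepare_data"},
    "test_model": {"build_model"},
    "apply_model": {"test_model"},
}

order = execution_order(workflow)
```

For this simple chain the only valid order is prepare_data, build_model, test_model, apply_model. If the dependencies formed a cycle, TopologicalSorter would raise graphlib.CycleError instead of returning an order.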

Oracle Data Mining 10g Release 2 (10.2) API New Features

This section describes features introduced in Oracle Data Mining 10.2.

Java Data Mining (JDM) compliant Java API

Oracle 10g Release 2 introduced a completely new Java API for Data Mining. The API implements JSR-000073, developed through the Java Community Process (http://jcp.org).

The new Java API is layered on the PL/SQL API, and the two APIs are fully interoperable. The new Java API is not compatible with the Java API available in the previous release (Oracle 10g Release 1).

SQL built-in functions for Data Mining

New built-in SQL functions support the scoring of classification, regression, clustering, and feature extraction models. Within the context of standard SQL statements, pre-created models can be applied to new data and the results returned for further processing. The Data Mining SQL functions are:
- PREDICTION, PREDICTION_COST, PREDICTION_DETAILS, PREDICTION_PROBABILITY, PREDICTION_SET
- CLUSTER_ID, CLUSTER_PROBABILITY, CLUSTER_SET
- FEATURE_ID, FEATURE_SET, FEATURE_VALUE

Predictive Analytics

Predictive Analytics automates the process of data mining. Without user intervention, Predictive Analytics routines manage data preparation, algorithm selection, model building, and model scoring.

In the DBMS_PREDICTIVE_ANALYTICS PL/SQL package, Oracle Data Mining provides Predictive Analytics routines that calculate predictions and determine the relative influence of attributes on the prediction.

Oracle Spreadsheet Add-In for Predictive Analytics implements DBMS_PREDICTIVE_ANALYTICS within the context of an Excel spreadsheet. The Spreadsheet Add-In is distributed on Oracle Technology Network.

New and enhanced algorithms

- The new Decision Tree algorithm generates human-understandable rules for a prediction.
- The new One-Class Support Vector Machine algorithm supports anomaly detection.
- The Support Vector Machine algorithm is enhanced with active learning for the management of large build data sets.
- The PL/SQL and Java APIs both support the O-Cluster algorithm. In Oracle 10g Release 1, O-Cluster was supported only in the Java API.