Skip Headers
Oracle® Warehouse Builder Data Modeling, ETL, and Data Quality Guide
11g Release 2 (11.2)

Part Number E10935-02
Go to Documentation Home
Home
Go to Book List
Book List
Go to Table of Contents
Contents
Go to Index
Index
Go to Master Index
Master Index
Go to Feedback page
Contact Us

Go to previous page
Previous
Go to next page
Next
View PDF

20 Monitoring Quality with Data Auditors and Data Rules

This chapter describes the use of data auditors to monitor data quality. It contains the following topics:

Overview of Data Auditors

Data auditors are processes that validate data against a set of data rules to determine which records comply and which do not. Data auditors gather statistical metrics on how well the data in a system complies with a rule by auditing and marking how many errors are occurring against the audited data. Data auditors ensure that your data complies with the data rules you defined, and thus let you track compliance with your business rules.

Data auditors are an important tool in ensuring, on an ongoing basis, that data quality levels meet business requirements. Identifying sudden increases in bad data also enables you to associate the increase with specific events and then identify possible root causes, such as addition of new data sources or changes to a data source or application that impacts your system.

To monitor data using Warehouse Builder, you must first discover or design data rules, as described in "Performing Data Profiling", then define data auditors, and schedule them or incorporate them into larger process flows.

Data auditors can be deployed and executed ad hoc if necessary. More often, they are scheduled for regular execution as part of a process flow, to monitor the quality of the data in an operational environment such as a data warehouse or ERP system, either immediately after updates like data loads, or at regular intervals.

Data Auditor Thresholds 

Data auditors have thresholds that allow you to create logic based on the fact that too many noncompliant records can divert the process flow into an error or notification stream. You can specify a threshold value for each data rule that the data auditor audits. This value is used to determine if the data in the data object is within the limits that you defined. Based on this threshold, the process can choose actions. In a process flow, you can test this value and branch based upon the result.

For example, you create a data auditor to audit the data in the Employees table. This table contains two data rules, emp_email_unique_rule and emp_sal_min_rule. You specify that the threshold value for both rules is 80%. If less than 80% of the data in the Employees table does not comply with the data rules, the auditing for the table fails.

See Also:

"Specifying Actions for Data That Violates Defined Data Rules" for information about specifying threshold value

Audit Results for Data Auditors 

In addition to setting thresholds for noncompliant records, you can capture audit results and store them for analysis. When executed, the data auditor sets several output values. One of these values is the audit result. Audit results provide information about the extent of data rule violations that occurred while running the data auditor.

Monitoring Data Quality Using Data Auditors

Data auditors are objects that you can use to continuously monitor your source schema to ensure that the data adheres to the defined data rules. You can monitor an object only if you have defined data rules for the object.

A data auditor can monitor data quality for more than one data object. You can create data auditors for tables, views, materialized views, and external tables.

See Also:

"Overview of Data Auditors" for more information about data auditors

To monitor data quality, perform the following steps:

  1. Create a data auditor containing the data objects that you want to monitor.

    See "Creating Data Auditors".

  2. (Optional) Configure the data auditor and set physical deployment parameters for the data auditor. These parameters are used while running the data auditor.

    See "Configuring Data Auditors".

  3. Deploy the data auditor to create the data auditor in the target schema.

    See "Deploying Objects".

  4. Run the data auditor to identify records that do not comply with the data rules defined on the data objects. You can either run data auditors manually or schedule them to run at specified times.

    See "Auditing Data Objects Using Data Auditors" for information about running data auditors.

  5. View the records that were identified as not complying with the defined data rules for the objects that are part of the data auditor.

    See "Viewing Data Auditor Error Tables".

Note:

You cannot import metadata for data auditors in Merge mode. For more information about import mode options, see Oracle Warehouse Builder Installation and Administration Guide for Windows and UNIX.

Creating Data Auditors

Use the Create Data Auditor Wizard to create data auditors. Data auditors are part of an Oracle module in a project.

To create a data auditor:

  1. Expand the Oracle module in which you want to create the data auditor.

  2. Right-click Data Auditors and select New Data Auditor.

    The Create Data Auditor Wizard is displayed.

  3. On the Name and Description page, provide the following details and click Next.

    • Name: Enter the name of the data auditor.

    • Description: Enter an optional description for the data auditor.

  4. On the Select Objects page, select the data objects that you want to audit and click Next.

    The Available section lists the objects available for auditing. The Selected section contains the objects that are selected for auditing. Use the buttons to move objects to the Selected section. To select multiple objects, hold down the Ctrl key while selecting objects.

  5. On the Choose Actions page, specify the action to be taken for records that do not comply with the data rules bound to the selected objects and click Next.

    See "Specifying Actions for Data That Violates Defined Data Rules".

  6. On the Summary page, review the selections that you made. Click Back to modify any selected values or click Finish to create the data auditor.

The new data auditor is added to the Data Auditors node. At this stage, only the metadata for the data auditor is stored in your workspace. To use this data auditor to monitor the quality of data in your data objects, you must run the data auditor as described in "Auditing Data Objects Using Data Auditors".

Specifying Actions for Data That Violates Defined Data Rules

Use the Choose Actions page of the Create Data Auditor Wizard or the Choose Action tab of the Edit Data Auditor dialog box to specify how records that violate data rules defined on the data objects are handled. You can also specify the level of non-compliance that is permitted for the data.

This page contains two sections: Error threshold mode and Data Rules.

Error threshold mode

Use the Error threshold mode to specify the method used to determine compliance of data to data rules.

Select one of the following methods:

  • Percent: The data auditor will set the audit result based on the percentage of records that do not comply with the data rule. This percentage is specified in the rule's Defect Threshold value.

  • Six Sigma: The data auditor will set the audit result based on the Six Sigma values for the data rules. If the calculated Six Sigma value for any rule is less than the specified Sigma Threshold value, then the data auditor will set the AUDIT RESULT to 2.

Data Rules

The Data Rules section lists the data rules applied to the objects selected on the Select Object page. For each rule, specify the following:

  • Action: The action to be performed if data in the source object does not comply with the data rule. Select Report to ensure that the data rule is audited. Select Ignore if you want the data rule to be ignored.

  • Defect Threshold: The percent of records that should comply with the data rules to ensure successful auditing. Specify a value between 1 and 100. This value is ignored if you select Six Sigma in the Error threshold mode section.

  • Sigma Threshold: The required success rate. Specify a number between 0 and 7. If you set the value to 7, no failures are allowed. This value is ignored if you select Percent in the Error threshold mode section.

Editing Data Auditors

After you create a data auditor, you can edit it and modify any of its properties using the following steps.

  1. In the Projects Navigator, right-click the data auditor and select Open.

    The Edit Data Auditor dialog box is displayed.

  2. On the Name tab, enter a new name or description for the data auditor.

  3. On the Select Objects tab, use the buttons to add or remove objects that will be audited as part of the data auditor.

  4. On the Choose Actions tab, edit the data correction actions that you specified.

    See "Specifying Actions for Data That Violates Defined Data Rules".

  5. On the Reconcile Objects tab, select the check box to the left of an object to reconcile its definition with the latest repository definition. Click Reconcile.

  6. Click OK to close the Edit Data Auditor dialog box.

Configuring Data Auditors

During the configuration phase, you assign physical deployment properties to the data auditor that you created by setting the configuration parameters. The Configuration tab for the data auditor enables you to configure the physical properties of the data auditor.

To configure a data auditor:

  1. From the Projects Navigator, expand the Databases node and then the Oracle node.

  2. Right-click the name of the data auditor that you want to configure and select Configure.

    The Configuration tab for the data auditor is displayed.

  3. Based on your requirement, configure the parameters listed in Run Time Parameters, Data Auditor Parameters, and Code Generation Options.

Run Time Parameters

Table 20-1 lists the Run Time configuration parameters.

Table 20-1 Run Time Configuration Parameters for Data Auditors

Configuration Parameter Name Description

Default Purge Group

Used when executing the package. Each audit record in the runtime schema is assigned to the purge group specified.

Bulk Size

Number of rows to be fetched as a batch while processing cursors

Analyze table sample percentage

Percentage of rows to be sampled when the target tables are analyzed. You analyze target tables to gather statistics that you can use to improve performance while loading data into the target tables.

Commit frequency

Number of rows processed before a commit is issued

Maximum number of errors

Maximum number of errors allowed before the execution of this step is terminated

Default Operating mode

Represents the operating mode used

The options that you can select are Row based, Row based (target only), Set based, Set based fail over to row based, Set based fail over to row based (target only).

Default Audit Level

Indicates the audit level used when executing the package. When the package is run, the amount of audit information captured in the runtime schema depends on the value set for this parameter.

The options that you can select are as follows:

  • ERROR DETAILS: At run time, error information and statistical auditing information is recorded.

  • COMPLETE: All auditing information is recorded at run time. This generates a huge amount of diagnostic data which may quickly fill the allocated tablespace.

  • NONE: No auditing information is recorded at run time.

  • STATISTICS: At run time, statistical auditing information is recorded.


Data Auditor Parameters

This category uses the same name as the data auditor. Table 20-2 describes the generic data auditor configuration parameters.

Table 20-2 Data Auditor Configuration Parameters for Data Auditors

Configuration Parameter Name Description

Generation Comments

Specify additional comments for the generated code.

Threshold Mode

Specify the mode that should be used to measure failure thresholds. The options are PERCENTAGE and SIX SIGMA.

Language

Language used to define the generated code. The options are PL/SQL (default) and UNDEFINED. Ensure that PL/SQL (default) is selected.

Deployable

Select this option to indicate that you want to deploy this data auditor. Warehouse Builder generates code only if the data auditor is marked as deployable.

Referred Calendar

Specify the schedule to associate with the data auditor. The schedule defines when the data auditor will run.


Code Generation Options

Table 20-3 describes the code generation options that you can set for data auditors.

Table 20-3 Code Generation Options for Data Auditors

Configuration Parameter Name Description

ANSI SQL Syntax

Select this option to use ANSI SQL code in the generated code. If this option is not selected, Oracle SQL syntax is generated.

Commit Control

Specifies how commit is performed. The options available for this parameter are: Automatic, Automatic Correlated, and Manual. Ensure that this parameter is set to Automatic.

Enable Parallel DML

Select this option to enable parallel DML at run time.

Analyze table statements

Set this option to True to generate the statement used to collect statistics for the data auditor. If the target table is not in the same schema as the mapping and you want to analyze the table, then you must grant ANALYZE ANY privilege to the schema owning the mapping.

Optimized Code

Select this option to indicate that optimized code should be generated.

Generation Mode

Select the mode in which optimized code should be generated. The options that you can select are: All Operating Modes, Row based, Row based (target only), Set based, Set based fail over to row based, and Set based fail over to row based (target only).

Use Target Load Ordering

Select this option to generate code for target load ordering.

Error Trigger

Specify the name of the error trigger procedure.

Bulk Processing Code

Select this option to generate bulk processing code.


Auditing Data Objects Using Data Auditors

After you create a data auditor, you can use it to monitor the data in your data objects. This ensures that the data rule violations for the objects are detected. When you run a data auditor, any records that violate the data rules defined on the data objects are written to the error tables.

There are two ways of using data auditors:

Manually Running Data Auditors

To check if the data in the data object adheres to the data rules defined for the object, you must run the data auditor. You can run data auditors from the Design Center or the Control Center Manager. To run a data auditor from the Design Center, right-click the data auditor and select Start. In the Control Center Manager, select the data auditor, and from the File menu, select Start. The results are displayed in the Job Details window as described in "Data Auditor Execution Results".

Scheduling a Data Auditor to Run

You can schedule the execution of a data auditor using the following steps:

  1. Create a process flow that contains a Data Auditor Monitor activity that represents the data auditor.

  2. Schedule this process flow to run at a predefined time.

    For more information about scheduling objects, see "Defining Schedules".

Figure 20-1 displays a process flow that contains a Data Auditor Monitor activity. In this process flow, LOAD_EMP_MAP is a mapping that loads data into the EMP table. If the data load is successful, the data auditor EMP_DATA_AUDIT is run. The data auditor monitors the data in the EMP table based on the data rules defined for the table.

Figure 20-1 Data Auditor Monitor Activity in a Process Flow

Description of Figure 20-1 follows
Description of "Figure 20-1 Data Auditor Monitor Activity in a Process Flow"

Data Auditor Execution Results

After you run a data auditor, the Job Details window displays the details of the execution. The Job Details window contains two tabs: Input Parameters and Execution Results. Note that the Job Details window is displayed only when you set the deployment preference Show Monitor to true.

See Also:

Oracle Warehouse Builder Concepts for more information about deployment preferences.

Figure 20-2 displays the Job Details window containing the Input Parameters Tab and the Execution Results Tab.

Figure 20-2 Data Auditor Execution Results

Description of Figure 20-2 follows
Description of "Figure 20-2 Data Auditor Execution Results"

Input Parameters Tab 

The Input Parameters tab contains the values of input parameters used to run the data auditor.

Execution Results Tab 

The Execution Results tab displays the results of running the data auditor. This tab contains two sections: Row Activity and Output Parameters.

The Row Activity section contains details about the inserts into the error table for each step. Note that when more than one data rule is specified, multitable insert may be used in the data auditor. In this case, the count of the number of rows will not be accurate.

For example, in the data auditor execution result in Figure 20-2, the data rule called E_NOT_NULL inserted one record into the error table.

The Output Parameters section contains the following three parameters:

  • AUDIT_RESULT: Indicates the result of running the data auditor. The possible values for this parameter are as follows:

    • 0: No data rule violations occurred.

    • 1: At least one data rule violation occurred, but no data rule failed to meet the minimum quality threshold as defined in the data auditor.

    • 2: At least one data rule failed to meet the minimum quality threshold.

    For more information about setting the threshold, see the step on choosing actions in "Creating Data Auditors".

  • EO_<data_rule_name>: Represents the calculated error quality for the specified data rule. Zero (0) indicates all errors and 100 indicates no errors.

  • SO_<data_rule_name>: Represents the Six Sigma quality calculated for the specified data rule.

Viewing Data Auditor Error Tables

When you run a data auditor, either manually or as part of the process flow, Warehouse Builder writes records that do not comply with defined data rules for the objects contained in the data auditor to error tables. Each object contained in the data auditor has a corresponding error table that stores noncompliant records for that object.

You view all noncompliant records that are written to error tables by using the Repository Browser.

To view error tables created as a result of data auditor execution:

  1. Grant privileges on the error tables as described in "Granting Privileges on Error Tables".

  2. Use the Repository Browser to view the error tables. Perform the following steps:

    1. Open the Repository Browser as described in "Opening the Repository Browser".

    2. View error tables using the Repository Browser as described in "Viewing Error Tables Created as a Result of Data Auditor Execution".

Granting Privileges on Error Tables

Before you view data stored in error tables using the Repository Browser, you must grant privileges on the error tables to the OWBSYS user. This enables the Repository Browser to access error table data.

To grant privileges on error tables:

  1. In SQL*Plus, log in to the schema containing the error tables.

    The error table for an object is stored in the same schema as the object.

  2. Run the SQL script OWB_ORACLE_HOME\owb\rtp\sql\grant_error_table_privileges.sql.

  3. When prompted, enter the name of the error table for which you want to grant privileges.

    If you did not specify a name for the error table of an object using the Error Table Name property, Warehouse Builder provides a default name. For objects that use error tables, the default error table name is the object name suffixed by "_ERR".

  4. Repeat Steps 2 and 3 for each error table to which you want to grant privileges.