Skip Headers
Oracle® Database Performance Tuning Guide
11g Release 2 (11.2)

Part Number E10821-03
Go to Documentation Home
Home
Go to Book List
Book List
Go to Table of Contents
Contents
Go to Index
Index
Go to Master Index
Master Index
Go to Feedback page
Contact Us

Go to previous page
Previous
Go to next page
Next
View PDF

13 Managing Optimizer Statistics

This chapter explains why statistics are important for the query optimizer and how to gather and use optimizer statistics with the DBMS_STATS package.

The chapter contains the following sections:

13.1 Overview of Optimizer Statistics

Optimizer statistics describe details about the database and the objects in the database. The query optimizer uses these statistics to choose the best execution plan for each SQL statement.

Optimizer statistics include the following:

The database stores optimizer statistics in the data dictionary. You can access these statistics using data dictionary views.

Because objects in a database can be constantly changing, statistics must be regularly updated so that they accurately describe these database objects. Oracle Database automatically maintains optimizer statistics.

You can maintain optimizer statistics manually using the DBMS_STATS package. For example, you can save and restore copies of statistics. You can export statistics from one database and import those statistics into another database. For example, you could export statistics from a production system to a test system. You can also lock statistics to prevent them from changing.

13.2 Managing Automatic Optimizer Statistics Collection

Oracle recommends that you enable automatic optimizer statistics collection. In this case, the database automatically collects optimizer statistics for all schema objects in the database for which statistics are absent or stale. The process eliminates many manual tasks associated with managing the optimizer, and significantly reduces the risks of generating poor execution plans because of missing or stale statistics.

Automatic optimizer statistics collection calls the DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC procedure. This internal procedure operates similarly to the DBMS_STATS.GATHER_DATABASE_STATS procedure using the GATHER AUTO option. The main difference is that GATHER_DATABASE_STATS_JOB_PROC prioritizes database objects that require statistics, so that objects that most need updated statistics are processed first, before the maintenance window closes.

This section contains the following topics:

13.2.1 Enabling and Disabling Automatic Optimizer Statistics Collection

The automated maintenance tasks infrastructure (known as AutoTask) schedules tasks to run automatically in Oracle Scheduler windows known as maintenance windows. By default, one window is scheduled for each day of the week. Automatic optimizer statistics collection runs as part of AutoTask and is enabled by default to run in all predefined maintenance windows.

If for some reason automatic optimizer statistics collection is disabled, you can enable it using the ENABLE procedure in the DBMS_AUTO_TASK_ADMIN package:

BEGIN
  DBMS_AUTO_TASK_ADMIN.ENABLE(
    client_name => 'auto optimizer stats collection', 
    operation => NULL, 
    window_name => NULL);
END;
/

When you want to disable automatic optimizer statistics collection, you can disable it using the DISABLE procedure in the DBMS_AUTO_TASK_ADMIN package:

BEGIN
  DBMS_AUTO_TASK_ADMIN.DISABLE(
    client_name => 'auto optimizer stats collection', 
    operation => NULL, 
    window_name => NULL);
END;
/

Automatic optimizer statistics collection relies on the modification monitoring feature, described in "Determining Stale Statistics". If this feature is disabled, then the automatic optimizer statistics collection job will not be able to detect stale statistics. This feature is enabled when the STATISTICS_LEVEL parameter is set to TYPICAL or ALL. TYPICAL is the default value.

See Also:

13.2.2 Considerations When Gathering Statistics

This section discusses:

13.2.2.1 When to Use Manual Statistics

Automatic optimizer statistics collection should be sufficient for most database objects being modified at a moderate speed. However, in some cases the collection may not be adequate. Because the collection runs during maintenance windows, the statistics on tables that are significantly modified throughout the day may become stale. There are typically two types of such objects:

  • Volatile tables that are being deleted or truncated and rebuilt during the course of the day.

  • Objects which are the target of large bulk loads which add 10% or more to the object's total size.

For highly volatile tables, there are two approaches:

  • The statistics on these tables can be set to NULL. When Oracle encounters a table with no statistics, Oracle dynamically gathers the necessary statistics as part of query optimization. This dynamic sampling feature is controlled by the OPTIMIZER_DYNAMIC_SAMPLING parameter, and this parameter should be set to a value of 2 or higher. The default value is 2. The statistics can be set to NULL by deleting and then locking the statistics:

    BEGIN
      DBMS_STATS.DELETE_TABLE_STATS('OE','ORDERS');
      DBMS_STATS.LOCK_TABLE_STATS('OE','ORDERS');
    END;
    /
    

    See "Dynamic Sampling Levels" for information about the sampling levels that can be set.

  • The statistics on these tables can be set to values that represent the typical state of the table. You should gather statistics on the table when the tables has a representative number of rows, and then lock the statistics.

    This may be more effective than automatic optimizer statistic collection, because any statistics generated on the table during the overnight batch window may not be the most appropriate statistics for the daytime workload.

For tables which are being bulk-loaded, the statistics-gathering procedures should be run on those tables immediately following the load process, preferably as part of the same script or job that is running the bulk load.

For external tables, statistics are not collected during GATHER_SCHEMA_STATS, GATHER_DATABASE_STATS, and automatic optimizer statistics collection processing. However, you can collect statistics on an individual external table using GATHER_TABLE_STATS. Sampling on external tables is not supported so the ESTIMATE_PERCENT option should be explicitly set to NULL. Because data manipulation is not allowed against external tables, it is sufficient to analyze external tables when the corresponding file changes.

If the monitoring feature is disabled by setting STATISTICS_LEVEL to BASIC, automatic optimizer statistics collection cannot detect stale statistics. In this case statistics need to be manually gathered. See "Determining Stale Statistics" for information on the automatic monitoring facility.

Another area in which statistics need to be manually gathered are the system statistics. These statistics are not automatically gathered. See "System Statistics" for more information.

Statistics on fixed objects, such as the dynamic performance tables, need to be manually collected using GATHER_FIXED_OBJECTS_STATS procedure. Fixed objects record current database activity; statistics gathering should be done when database has representative activity.

13.2.2.2 Restoring Previous Versions of Statistics

Whenever statistics in dictionary are modified, old versions of statistics are saved automatically for future restoring. Statistics can be restored using RESTORE procedures of DBMS_STATS package. See "Restoring Previous Versions of Statistics" for more information.

13.2.2.3 Locking Statistics

In some cases, you may want to prevent any new statistics from being gathered on a table or schema by the DBMS_STATS_JOB process, such as highly volatile tables discussed in "When to Use Manual Statistics". In those cases, the DBMS_STATS package provides procedures for locking the statistics for a table or schema. See "Locking Statistics for a Table or Schema" for more information.

13.3 Gathering Statistics Manually

If you do not use automatic optimizer statistics collection, then you must run DBMS_STATS to manually collect statistics in all schemas, including system schemas. If the database content changes regularly, then you must also gather statistics regularly to ensure that the statistics accurately represent characteristics of database objects.

This section contains the following topics:

13.3.1 Gathering Statistics with DBMS_STATS Procedures

You can gather statistics with the DBMS_STATS package. This PL/SQL package is also used to modify, view, export, import, and delete statistics.

Note:

Do not use the COMPUTE and ESTIMATE clauses of ANALYZE statement to collect optimizer statistics. These clauses are supported solely for backward compatibility and may be removed in a future release. The DBMS_STATS package collects a broader, more accurate set of statistics, and gathers statistics more efficiently.

You may continue to use ANALYZE statement for other purposes not related to optimizer statistics collection:

  • To use the VALIDATE or LIST CHAINED ROWS clauses

  • To collect information on free list blocks

The DBMS_STATS package can gather statistics on table and indexes and individual columns and partitions of tables. It does not gather cluster statistics. However, you can use DBMS_STATS to gather statistics on individual tables instead of the whole cluster.

When you generate statistics for a table, column, or index, if the data dictionary already contains statistics for the object, then Oracle updates the existing statistics. The older statistics are saved and can be restored later if necessary. See "Restoring Previous Versions of Statistics".

When gathering statistics on system schemas, you can use the procedure DBMS_STATS.GATHER_DICTIONARY_STATS. This procedure gathers statistics for all system schemas, including SYS and SYSTEM, and other optional schemas, such as CTXSYS and DRSYS.

When statistics are updated for a database object, Oracle invalidates any currently parsed SQL statements that access the object. The next time such a statement executes, the statement is re-parsed and the optimizer automatically chooses a new execution plan based on the new statistics. Distributed statements accessing objects with new statistics on remote databases are not invalidated. The new statistics take effect the next time the SQL statement is parsed.

Table 13-1 lists the procedures in the DBMS_STATS package for gathering statistics on database objects.

Table 13-1 Statistics Gathering Procedures in the DBMS_STATS Package

Procedure Collects

GATHER_INDEX_STATS

Index statistics

GATHER_TABLE_STATS

Table, column, and index statistics

GATHER_SCHEMA_STATS

Statistics for all objects in a schema

GATHER_DICTIONARY_STATS

Statistics for all dictionary objects

GATHER_DATABASE_STATS

Statistics for all objects in a database


See Also:

Oracle Database PL/SQL Packages and Types Reference for syntax and examples of all DBMS_STATS procedures

When using any of these procedures, there are several important considerations for statistics gathering:

13.3.1.1 Statistics Gathering Using Sampling

The statistics-gathering operations can utilize sampling to estimate statistics. Sampling is an important technique for gathering statistics. Gathering statistics without sampling requires full table scans and sorts of entire tables. Sampling minimizes the resources necessary to gather statistics.

Sampling is specified using the ESTIMATE_PERCENT argument to the DBMS_STATS procedures. While the sampling percentage can be set to any value, Oracle recommends setting the ESTIMATE_PERCENT parameter of the DBMS_STATS gathering procedures to DBMS_STATS.AUTO_SAMPLE_SIZE to maximize performance gains while achieving necessary statistical accuracy. AUTO_SAMPLE_SIZE lets Oracle determine the best sample size necessary for good statistics, based on the statistical property of the object. Because each type of statistics has different requirements, the size of the actual sample taken may not be the same across the table, columns, or indexes. For example, to collect table and column statistics for all tables in the OE schema with auto-sampling, you could use:

EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS('OE',DBMS_STATS.AUTO_SAMPLE_SIZE);

When the ESTIMATE_PERCENT parameter is manually specified, the DBMS_STATS gathering procedures may automatically increase the sampling percentage if the specified percentage did not produce a large enough sample. This ensures the stability of the estimated values by reducing fluctuations.

13.3.1.2 Parallel Statistics Gathering

The statistics-gathering operations can run either serially or in parallel. The degree of parallelism can be specified with the DEGREE argument to the DBMS_STATS gathering procedures. Parallel statistics gathering can be used in conjunction with sampling. Oracle recommends setting the DEGREE parameter to DBMS_STATS.AUTO_DEGREE. This setting allows Oracle to choose an appropriate degree of parallelism based on the size of the object and the settings for the parallel-related init.ora parameters.

Note that certain types of index statistics are not gathered in parallel, including cluster indexes, domain indexes, and bitmap join indexes.

13.3.1.3 Statistics on Partitioned Objects

For partitioned tables and indexes, DBMS_STATS can gather separate statistics for each partition and global statistics for the entire table or index. Similarly, for composite partitioning, DBMS_STATS can gather separate statistics for subpartitions, partitions, and the entire table or index. The type of partitioning statistics to be gathered is specified in the GRANULARITY argument to the DBMS_STATS gathering procedures.

Depending on the SQL statement being optimized, the optimizer can choose to use either the partition (or subpartition) statistics or the global statistics. Both types of statistics are important for most applications, and Oracle recommends setting the GRANULARITY parameter to AUTO to gather both types of partition statistics.

With partitioned tables, new data is usually loaded into a new partition. As new partitions are added and data loaded, statistics must be gathered on the new partition, and global statistics must be kept up to date. If the INCREMENTAL value for a partition table is set to TRUE, and you gather statistics on that table with the GRANULARITY parameter set to AUTO, Oracle will gather statistics on the new partition and update the global table statistics by scanning only those partitions that have been modified and not the entire table. If the INCREMENTAL value for the partitioned table is set to FALSE (default value), then a full table scan is used to maintain the global statistics. This is a highly resource intensive and time consuming operation for large tables.

Note:

When you set INCREMENTAL to TRUE for a partitioned table, the SYSAUX tablespace consumes additional space to maintain the global statistics.

Use the DBMS_STATS.SET_TABLE_PREF procedure to change the INCREMENTAL value for a partition table. For more information, see Oracle Database PL/SQL Packages and Types Reference.

13.3.1.4 Column Statistics and Histograms

When gathering statistics on a table, DBMS_STATS gathers information about the data distribution of the columns within the table. The most basic information about the data distribution is the maximum value and minimum value of the column. However, this level of statistics may be insufficient for the optimizer's needs if the data within the column is skewed. For skewed data distributions, histograms can also be created as part of the column statistics to describe the data distribution of a given column. Histograms are described in more details in "Viewing Histograms".

Histograms are specified using the METHOD_OPT argument of the DBMS_STATS gathering procedures. Oracle recommends setting the METHOD_OPT to FOR ALL COLUMNS SIZE AUTO. With this setting, Oracle automatically determines which columns require histograms and the number of buckets (size) of each histogram. You can also manually specify which columns should have histograms and the size of each histogram.

Note:

If you need to remove all rows from a table when using DBMS_STATS, use TRUNCATE instead of dropping and re-creating the same table. When a table is dropped, workload information used by the auto-histogram gathering feature and saved statistics history used by the RESTORE_*_STATS procedures will be lost. Without this data, these features will not function properly.

13.3.1.5 Extended Statistics

Oracle can also gather statistics on a group of columns within a table or an expression on a column. For more details on these, refer to:

13.3.1.6 MultiColumn Statistics

When multiple columns from a single table are used together in the where clause of a query (multiple single column predicates), the relationship between the columns can strongly affect the combined selectivity for the column group.

For example, consider the customers table in the SH schema. The columns cust_state_province and country_id are related, with cust_state_province determining the country_id for each customer. Querying the customers table where the cust_state_province is California:

Select count(*)
from sh.customers 
where cust_state_province = 'CA';

returns the following value:

COUNT(*)
----------
    3341

Adding an extra predicate on the country_id column does not change the result when the country_id is 52790 (United States of America).

Select count(*) 
from customers 
where cust_state_province = 'CA' 
and country_id=52790;

returns the same value as the previous query:

COUNT(*)
----------
    3341

However, if the country_id has any other value, such as 52775 (Brazil), as in the following query:

Select count(*) 
from   customers 
where  cust_state_province = 'CA' 
and    country_id=52775;

then the returned value is:

COUNT(*)
----------
       0

With individual column statistics the optimizer has no way of knowing that the cust_state_province and the country_id columns are related. By gathering statistics on these columns as a group (column group), the optimizer will now have a more accurate selectivity value for the group, instead of having to generate the value based on the individual column statistics.

By default, Oracle creates column groups for a table, based on the workload analysis, similar to how it is done for histograms.

You can also create column groups manually by using the DBMS_STATS package. You can use this package to create a column group, get the name of a column group, or delete a column group from a table.

13.3.1.6.1 Creating a Column Group

Use the create_extended_statistics function to create a column group. The create_extended_statistics function returns the system-generated name of the newly created column group. Table 13-2 lists the input parameters for this function:

Table 13-2 Parameters for the create_extended_statistics Function

Parameter Description

owner

Schema owner. NULL indicates current schema.

tab_name

Name of the table to which the column group is being added.

extension

Columns in the column group.


For example, if you want to add a column group consisting of the cust_state_province and country_id columns to the customers table in SH schema:

declare
  cg_name varchar2(30);
begin
 cg_name := dbms_stats.create_extended_stats(null,'customers', '(cust_state_province,country_id)');
end;
/
13.3.1.6.2 Getting a Column Group

Use the show_extended_stats_name function to obtain the name of the column group for a given set of columns. Table 13-3 lists the input parameters for this function:

Table 13-3 Parameters for the show_extended_stats_name Function

Parameter Description

owner

Schema owner. NULL indicates current schema.

tab_name

Name of the table to which the column group belongs.

extension

Name of the column group.


For example, use the following query to obtain the column group name for a set of columns on the customers table:

select sys.dbms_stats.show_extended_stats_name('sh','customers',
       '(cust_state_province,country_id)') col_group_name 
from dual;

You will obtain an output similar to the following:

COL_GROUP_NAME
----------------
SYS_STU#S#WF25Z#QAHIHE#MOFFMM
13.3.1.6.3 Dropping a Column Group

Use the drop_extended_stats function to delete a column group from a table. Table 13-4 lists the input parameters for this function:

Table 13-4 Parameters for the drop_extended_stats Function

Parameter Description

owner

Schema owner. NULL indicates current schema.

tab_name

Name of the table to which the column group belongs.

extension

Name of the column group to be deleted.


For example, the following statement deletes a column group from the customers table:

exec dbms_stats.drop_extended_stats('sh','customers','(cust_state_province,country_id)');
13.3.1.6.4 Monitoring Column Groups

Use the dictionary table user_stat_extensions to obtain information about MultiColumn statistics:

Select extension_name, extension 
from user_stat_extensions 
where table_name='CUSTOMERS';
EXTENSION_NAME                     EXTENSION
-------------------------------------------------------------------------
SYS_STU#S#WF25Z#QAHIHE#MOFFMM_     ("CUST_STATE_PROVINCE","COUNTRY_ID")

Use the following query to find the number of distinct values and find whether a histogram has been created for a column group:

select e.extension col_group, t.num_distinct, t.histogram
  2  from user_stat_extensions e, user_tab_col_statistics t
  3  where e.extension_name=t.column_name
  4  and e.table_name=t.table_name
  5  and t.table_name='CUSTOMERS';
COL_GROUP                             NUM_DISTINCT        HISTOGRAM
-------------------------------------------------------------------------------
("COUNTRY_ID","CUST_STATE_PROVINCE")  145                 FREQUENCY
13.3.1.6.5 Gathering Statistics on Column Groups

The METHOD_OPT argument of the DBMS_STATS package enables you to gather statistics on column groups. If you set the value of this argument to FOR ALL COLUMNS SIZE AUTO, the optimizer will gather statistics on all the existing column groups. To collect statistics on a new column group, specify the group using FOR COLUMNS. The column group will be automatically created as part of statistic gathering.

For example, the following statement creates a new column group for the customers table on the columns cust_state_province, country_id and gathers statistics (including histograms) on the entire table and the new column group:

EXEC DBMS_STATS.GATHER_TABLE_STATS('SH','CUSTOMERS',METHOD_OPT =>
'FOR ALL COLUMNS SIZE SKEWONLY 
FOR COLUMNS (CUST_STATE_PROVINCE,COUNTRY_ID) SIZE SKEWONLY');

Note:

The optimizer will only use MultiColumn statistics with equality predicates.

13.3.1.7 Expression Statistics

When a function is applied to a column in the where clause of a query (function(col1)=constant), the optimizer has no way of knowing how that function will affect the selectivity of the column. By gathering expression statistics on the expression function(col1), the optimizer will have a more accurate selectivity value.

An example of such a function is:

SELECT COUNT(*)
FROM   CUSTOMERS
WHERE  LOWER(CUST_STATE_PROVINCE)='CA';
13.3.1.7.1 Creating Expression Statistics

You can create statistics on an expression as part of the gather_table_stats procedure:

exec dbms_stats.gather_table_stats('sh','customers', method_opt =>
'for all columns size skewonly 
 for columns (lower(cust_state_province)) size skewonly');

You can also use the create_extended_statistics function to accomplish this:

select 
dbms_stats.create_extended_stats(null,'customers','(lower(cust_state_province))') 
from dual;
13.3.1.7.2 Monitoring Expression Statistics

Use the dictionary table user_stat_extensions to obtain information about expression statistics:

Select extension_name, extension 
from user_stat_extensions 
where table_name='CUSTOMERS';
EXTENSION_NAME                    EXTENSION
------------------------------------------------------------------------
SYS_STUBPHJSBRKOIK9O2YV3W8HOUE    (LOWER("CUST_STATE_PROVINCE"))

Use the following query to find the number of distinct values and find whether a histogram has been created:

select e.extension col_group, t.num_distinct, t.histogram
  2  from user_stat_extensions e, user_tab_col_statistics t
  3  where e.extension_name=t.column_name
  4  and t.table_name='CUSTOMERS';
COL_GROUP                        NUM_DISTINCT          HISTOGRAM
------------------------------------------------------------------------
(LOWER("CUST_STATE_PROVINCE"))   145                   FREQUENCY
13.3.1.7.3 Dropping Expression Statistics

Use the drop_extended_stats function to delete expression statistics from a table:

exec dbms_stats.drop_extended_stats(null,'customers','(lower(country_id))');

13.3.1.8 Determining Stale Statistics

Statistics must be regularly gathered on database objects as those database objects are modified over time. To determine whether a given database object needs new database statistics, Oracle provides a table monitoring facility. This monitoring is enabled by default when STATISTICS_LEVEL is set to TYPICAL or ALL. Monitoring tracks the approximate number of INSERTs, UPDATEs, and DELETEs for that table and whether the table has been truncated, since the last time statistics were gathered. The information about changes of tables can be viewed in the USER_TAB_MODIFICATIONS view. Following a data-modification, there may be a few minutes delay while Oracle propagates the information to this view. Use the DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO procedure to immediately reflect the outstanding monitored information kept in the memory.

The GATHER_DATABASE_STATS or GATHER_SCHEMA_STATS procedures gather new statistics for tables with stale statistics when the OPTIONS parameter is set to GATHER STALE or GATHER AUTO. If a monitored table has been modified more than 10%, then these statistics are considered stale and gathered again.

13.3.1.9 User-Defined Statistics

You can create user-defined optimizer statistics to support user-defined indexes and functions. When you associate a statistics type with a column or domain index, Oracle calls the statistics collection method in the statistics type whenever statistics are gathered for database objects.

You should gather new column statistics on a table after creating a function-based index, to allow Oracle to collect column statistics equivalent information for the expression. This is done by calling the statistics-gathering procedure with the METHOD_OPT argument set to FOR ALL HIDDEN COLUMNS.

See Also:

Oracle Database Data Cartridge Developer's Guide for details about implementing user-defined statistics

13.3.2 Setting Preferences for Manual Statistics Gathering

You can use the DBMS_STATS.SET_*_PREFS procedures to set the default values for parameters used by the DBMS_STATS procedures that gather statistics. You can set a preference for each parameter at a table, schema, database, and global level, thus providing a fine granularity of control.

Note:

In previous releases, you used the DBMS_STATS.SET_PARM procedure to set the default parameter values. The scope of these changes was all operations that occurred after running SET_PARM. In Oracle Database 11g, SET_PARM is deprecated.

You can use the DBMS_STATS.SET_*_PREFS procedures to change the following parameters:

  • AUTOSTATS_TARGET (SET_GLOBAL_PREFS only)

  • CASCADE

  • DEGREE

  • ESTIMATE_PERCENT

  • GRANULARITY

  • INCREMENTAL

  • METHOD_OPT

  • NO_INVALIDATE

  • PUBLISH

  • STALE_PERCENT

Table 13-5 lists the DBMS_STATS procedures for setting preferences. Parameter values set in the DBMS_STAT.GATHER_*_STATS procedures overrule other settings. If a parameter has not been set, then the database checks for a table-level preference. If no table preference exists, then the database uses the GLOBAL preference.

Table 13-5 Setting Preferences for Gathering Statistics

Procedure Purpose

SET_TABLE_PREFS

Enables you to change the default values of the parameters used by the DBMS_STATS.GATHER_*_STATS procedures for the specified table only.

SET_SCHEMA_PREFS

Enables you to change the default values of the parameters used by the DBMS_STATS.GATHER_*_STATS procedures for all existing objects in the specified schema.

This procedure calls SET_TABLE_PREFS for each of the tables in the specified schema. Because it uses SET_TABLE_PREFS, calling SET_SCHEMA_PREFS does not affect any new objects created after it has been run. New objects use the GLOBAL_PREF values for all parameters.

SET_DATABASE_PREFS

Enables you to change the default values of the parameters used by the DBMS_STATS.GATHER_*_STATS procedures for all user-defined schemas in the database. You can include system-owned schemas such as SYS and SYSTEM by setting the ADD_SYS parameter to TRUE.

This procedure calls SET_TABLE_PREFS for each table in the specified schema. Because it uses SET_TABLE_PREFS, calling SET_SCHEMA_PREFS does not affect any new objects created after it has been run. New objects use the GLOBAL_PREF values for all parameters.

SET_GLOBAL_PREFS

Enables you to change the default values of the parameters used by the DBMS_STATS.GATHER_*_STATS procedures for any object in the database that does not have an existing table preference.

All parameters default to the global setting unless a table preference is set or the parameter is explicitly set in the DBMS_STATS.GATHER_*_STATS command. Changes made by this procedure will affect any new objects created after it has been run. New objects use the GLOBAL_PREF values for all parameters.

With GLOBAL_PREFS, you can set a default value for the parameter AUTOSTAT_TARGET. This additional parameter controls which objects the automatic statistic gathering job running in the nightly maintenance window looks after. Possible values for this parameter are ALL, ORACLE, and AUTO (default).


See Also:

Oracle Database PL/SQL Packages and Types Reference for syntax and examples of all DBMS_STATS procedures

13.3.3 When to Gather Statistics

When gathering statistics manually, you not only need to determine how to gather statistics, but also when and how often to gather new statistics.

For an application in which tables are being incrementally modified, you may only need to gather new statistics every week or every month. The simplest way to gather statistics in these environments is to use a script or job scheduling tool to regularly run the GATHER_SCHEMA_STATS and GATHER_DATABASE_STATS procedures. The frequency of collection intervals should balance the task of providing accurate statistics for the optimizer against the processing overhead incurred by the statistics collection process.

For tables that are being substantially modified in batch operations, such as with bulk loads, statistics should be gathered on those tables as part of the batch operation. The DBMS_STATS procedure should be called as soon as the load operation completes.

For partitioned tables, there are often cases in which only a single partition is modified. In those cases, statistics can be gathered only on those partitions rather than gathering statistics for the entire table. However, gathering global statistics for the partitioned table may still be necessary.

See Also:

Oracle Database PL/SQL Packages and Types Reference for more information about the GATHER_SCHEMA_STATS and GATHER_DATABASE_STATS procedures in the DBMS_STATS package

13.3.4 Comparing Statistics with DBMS_STATS Functions

DBMS_STATS enables you to compare statistics for a table from two different sources. Table 13-6 lists the functions in the DBMS_STATS package for comparing statistics.

Table 13-6 Functions That Compare Statistics in the DBMS_STATS Package

Procedure Compares

DIFF_TABLE_STATS_IN_PENDING

Pending statistics and statistics as of a timestamp or statistics from dictionary

DIFF_TABLE_STATS_IN_STATTAB

Statistics for a table from two different sources

DIFF_TABLE_STATS_IN_HISTORY

Statistics for a table from two timestamps in past and statistics as of that timestamp


The functions in Table 13-6 also compare the statistics of dependent objects such as indexes, columns, and partitions. They display statistics of the objects from both sources if the difference between those statistics exceeds a certain threshold. The threshold can be specified as an argument to the function, with a default of 10%. Oracle Database uses the statistics corresponding to the first source as basis for computing the differential percentage.

See Also:

Oracle Database PL/SQL Packages and Types Reference for more information about the DIFF_TABLE_STATS_* functions in the DBMS_STATS package

13.4 System Statistics

System statistics describe the system's hardware characteristics, such as I/O and CPU performance and utilization, to the query optimizer. When choosing an execution plan, the optimizer estimates the I/O and CPU resources required for each query. System statistics enable the query optimizer to more accurately estimate I/O and CPU costs, enabling the query optimizer to choose a better execution plan.

When Oracle gathers system statistics, it analyzes system activity in a specified time period (workload statistics) or simulates a workload (noworkload statistics). The statistics are collected using the DBMS_STATS.GATHER_SYSTEM_STATS procedure. Oracle highly recommends that you gather system statistics.

Note:

You must have DBA privileges or GATHER_SYSTEM_STATISTICS role to update dictionary system statistics.

Table 13-7 lists the optimizer system statistics gathered by the DBMS_STATS package and the options for gathering or manually setting specific system statistics.

Table 13-7 Optimizer System Statistics in the DBMS_STAT Package

Parameter Name Description Initialization Options for Gathering or Setting Statistics Unit

cpuspeedNW

Represents noworkload CPU speed. CPU speed is the average number of CPU cycles in each second.

At system startup

Set gathering_mode = NOWORKLOAD or set statistics manually.

Millions/sec.

ioseektim

I/O seek time equals seek time + latency time + operating system overhead time.

At system startup

10 (default)

Set gathering_mode = NOWORKLOAD or set statistics manually.

ms

iotfrspeed

I/O transfer speed is the rate at which an Oracle database can read data in the single read request.

At system startup

4096 (default)

Set gathering_mode = NOWORKLOAD or set statistics manually.

Bytes/ms

cpuspeed

Represents workload CPU speed. CPU speed is the average number of CPU cycles in each second.

None

Set gathering_mode = NOWORKLOAD, INTERVAL, or START|STOP, or set statistics manually.

Millions/sec.

maxthr

Maximum I/O throughput is the maximum throughput that the I/O subsystem can deliver.

None

Set gathering_mode = NOWORKLOAD, INTERVAL, or START|STOP, or set statistics manually.

Bytes/sec.

slavethr

Slave I/O throughput is the average parallel slave I/O throughput.

None

Set gathering_mode = INTERVAL or START|STOP, or set statistics manually.

Bytes/sec.

sreadtim

Single block read time is the average time to read a single block randomly.

None

Set gathering_mode = INTERVAL or START|STOP, or set statistics manually.

ms

mreadtim

Multiblock read is the average time to read a multiblock sequentially.

None

Set gathering_mode = INTERVAL or START|STOP, or set statistics manually.

ms

mbrc

Multiblock count is the average multiblock read count sequentially.

None

Set gathering_mode = INTERVAL or START|STOP, or set statistics manually.

blocks


Unlike table, index, or column statistics, Oracle does not invalidate already parsed SQL statements when system statistics get updated. All new SQL statements are parsed using new statistics.

Oracle offers two options for gathering system statistics:

These options better facilitate the gathering process to the physical database and workload: when workload system statistics are gathered, noworkload system statistics will be ignored. Noworkload system statistics are initialized to default values at the first database startup.

See Also:

Oracle Database PL/SQL Packages and Types Reference for detailed information on the procedures in the DBMS_STATS package for implementing system statistics

13.4.1 Workload Statistics

Workload statistics, introduced in Oracle 9i, gather single and multiblock read times, mbrc, CPU speed (cpuspeed), maximum system throughput, and average slave throughput. The sreadtim, mreadtim, and mbrc are computed by comparing the number of physical sequential and random reads between two points in time from the beginning to the end of a workload. These values are implemented through counters that change when the buffer cache completes synchronous read requests. Since the counters are in the buffer cache, they include not only I/O delays, but also waits related to latch contention and task switching. Workload statistics thus depend on the activity the system had during the workload window. If system is I/O bound—both latch contention and I/O throughput—it will be reflected in the statistics and will therefore promote a less I/O intensive plan after the statistics are used. Furthermore, workload statistics gathering does not generate additional overhead.

In release 9.2, maximum I/O throughput and average slave throughput were added to set a lower limit for a full table scan (FTS).

13.4.1.1 Gathering Workload Statistics

To gather workload statistics, either:

  • Run the dbms_stats.gather_system_stats('start') procedure at the beginning of the workload window, then the dbms_stats.gather_system_stats('stop') procedure at the end of the workload window.

  • Run dbms_stats.gather_system_stats('interval', interval=>N) where N is the number of minutes when statistics gathering will be stopped automatically.

To delete system statistics, run dbms_stats.delete_system_stats(). Workload statistics will be deleted and reset to the default noworkload statistics.

13.4.1.2 Multiblock Read Count

If you gather workload statistics, then the mbrc value gathered as part of the workload statistics is used to estimate the cost of a full table scan. However, during the gathering process of workload statistics, it is possible that Oracle Database will not gather the mbrc and mreadtim values if no table scans are performed during serial workloads, as is often the case with OLTP systems. On the other hand, full table scans occur frequently on DSS systems but may run parallel and bypass the buffer cache. In such cases, Oracle Database still gathers the sreadtim value because index lookup is performed using the buffer cache.

If Oracle Database cannot gather or validate gathered mbrc or mreadtim values, but has gathered sreadtim and cpuspeed values, then only the sreadtim and cpuspeed values are used for costing. In this case, the optimizer uses the value of the initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT to cost a full table scan. However, if DB_FILE_MULTIBLOCK_READ_COUNT is not set or is set to 0 (zero), then the optimizer uses a value of 8 for costing.

13.4.2 Noworkload Statistics

Noworkload statistics consist of I/O transfer speed, I/O seek time, and CPU speed (cpuspeednw). The major difference between workload statistics and noworkload statistics lies in the gathering method.

Noworkload statistics gather data by submitting random reads against all data files, while workload statistics uses counters updated when database activity occurs. ioseektim represents the time it takes to position the disk head to read data. Its value usually varies from 5 ms to 15 ms, depending on disk rotation speed and the disk or RAID specification. The I/O transfer speed represents the speed at which one operating system process can read data from the I/O subsystem. Its value varies greatly, from a few MBs per second to hundreds of MBs per second. Oracle uses relatively conservative default settings for I/O transfer speed.

In Oracle 10g, Oracle uses noworkload statistics and the CPU cost model by default. The values of noworkload statistics are initialized to defaults at the first instance startup:

ioseektim = 10ms
iotrfspeed = 4096 bytes/ms
cpuspeednw = gathered value, varies based on system

If workload statistics are gathered, noworkload statistics will be ignored and Oracle will use workload statistics instead.

13.4.2.1 Gathering Noworkload Statistics

To gather noworkload statistics, run DBMS_STATS.GATHER_SYSTEM_STATS() with no arguments. There will be an overhead on the I/O system during the gathering process of noworkload statistics. The gathering process may take from a few seconds to several minutes, depending on I/O performance and database size.

The information is analyzed and verified for consistency. In some cases, the value of noworkload statistics may remain its default value. In such cases, repeat the statistics gathering process or set the value manually to values that the I/O system has according to its specifications by using the DBMS_STATS.SET_SYSTEM_STATS procedure.

13.5 Managing Statistics

This section includes the following topics:

13.5.1 Pending Statistics

Starting with the 11g Release 2 (11.2), when gathering statistics, you have the option to automatically publish the statistics at the end of the gather operation (default behavior), or to have the new statistics saved as pending. Saving the new statistics as pending allows you to validate the new statistics and publish them only if they are satisfactory.

To check whether the statistics will be automatically published as soon as they are gathered, use the DBMS_STATS package as follows:

SELECT DBMS_STATS.GET_PREFS('PUBLISH') PUBLISH FROM DUAL;

The preceding query returns either TRUE or FALSE. TRUE indicates that the statistics will be published as and when they are gathered, while FALSE indicates that the statistics will be kept pending.

Note:

Published statistics are stored in data dictionary views, such as USER_TAB_STATISTICS and USER_IND_STATISTICS. Pending statistics are stored in views such as USER_TAB_PENDING_STATS and USER_IND_PENDING_STATS.

You can change the PUBLISH setting at either the schema or the table level. For example, to change the PUBLISH setting for the customers table in the SH schema, execute the statement:

Exec dbms_stats.set_table_prefs('SH', 'CUSTOMERS', 'PUBLISH', 'false');

Subsequently, when you gather statistics on the customers table, the statistics will not be automatically published when the gather job completes. Instead, the newly gathered statistics will be stored in the USER_TAB_PENDING_STATS table.

By default, the optimizer uses the published statistics stored in the data dictionary views. If you want the optimizer to use the newly collected pending statistics, set the initialization parameter OPTIMIZER_USE_PENDING_STATISTICS to TRUE (the default value is FALSE), and run a workload against the table or schema:

alter session set optimizer_use_pending_statistics = TRUE;

The optimizer will use the pending statistics instead of the published statistics when compiling SQL statements.If the pending statistics are valid, they can be made public by executing the following statement:

Exec dbms_stats.publish_pending_stats(null, null);

You can also publish the pending statistics for a specific database object. For example, by using the following statement:

Exec dbms_stats.publish_pending_stats('SH','CUSTOMERS');

If you do not want to publish the pending statistics, delete them by executing the following statement:

Exec dbms_stats.delete_pending_stats('SH','CUSTOMERS');

You can export pending statistics using dbms_stats.export_pending_stats function. Exporting pending statistics to a test system enables you to run a full workload against the new statistics.

13.5.2 Restoring Previous Versions of Statistics

Whenever statistics in dictionary are modified, old versions of statistics are saved automatically for future restoration. Statistics can be restored using RESTORE procedures of DBMS_STATS package. These procedures use a time stamp as an argument and restore statistics as of that time stamp. This is useful in case newly collected statistics leads to some sub-optimal execution plans and the administrator wants to revert to the previous set of statistics.

There are dictionary views that display the time of statistics modifications. These views are useful in determining the time stamp to be used for statistics restoration.

  • Catalog view DBA_OPTSTAT_OPERATIONS contain history of statistics operations performed at schema and database level using DBMS_STATS.

  • The views *_TAB_STATS_HISTORY views (ALL, DBA, or USER) contain a history of table statistics modifications.

The old statistics are purged automatically at regular intervals based on the statistics history retention setting and the time of the recent analysis of the system. Retention is configurable using the ALTER_STATS_HISTORY_RETENTION procedure of DBMS_STATS. The default value is 31 days, which means that you would be able to restore the optimizer statistics to any time in last 31 days.

Automatic purging is enabled when STATISTICS_LEVEL parameter is set to TYPICAL or ALL. If automatic purging is disabled, the old versions of statistics need to be purged manually using the PURGE_STATS procedure.

The other DBMS_STATS procedures related to restoring and purging statistics include:

  • PURGE_STATS: This procedure can be used to manually purge old versions beyond a time stamp.

  • GET_STATS_HISTORY_RETENTION: This function can be used to get the current statistics history retention value.

  • GET_STATS_HISTORY_AVAILABILITY: This function gets the oldest time stamp where statistics history is available. Users cannot restore statistics to a time stamp older than the oldest time stamp.

When restoring previous versions of statistics, the following limitations apply:

  • RESTORE procedures cannot restore user-defined statistics.

  • Old versions of statistics are not stored when the ANALYZE command has been used for collecting statistics.

Note:

If you need to remove all rows from a table when using DBMS_STATS, use TRUNCATE instead of dropping and re-creating the same table. When a table is dropped, workload information used by the auto-histogram gathering feature and saved statistics history used by the RESTORE_*_STATS procedures will be lost. Without this data, these features will not function properly.

13.5.3 Exporting and Importing Statistics

Statistics can be exported and imported from the data dictionary to user-owned tables, enabling you to create multiple versions of statistics for the same schema. You can also copy statistics from one database to another database. You may want to do this to copy the statistics from a production database to a scaled-down test database.

Note:

Exporting and importing statistics is a distinct concept from the Data Pump Export and Import utilities.

Before exporting statistics, you first need to create a table for holding the statistics. This statistics table is created using the procedure DBMS_STATS.CREATE_STAT_TABLE. After this table is created, then you can export statistics from the data dictionary into your statistics table using the DBMS_STATS.EXPORT_*_STATS procedures. The statistics can then be imported using the DBMS_STATS.IMPORT_*_STATS procedures.

Note that the optimizer does not use statistics stored in a user-owned table. The only statistics used by the optimizer are the statistics stored in the data dictionary. In order to have the optimizer use the statistics in a user-owned tables, you must import those statistics into the data dictionary using the statistics import procedures.

In order to move statistics from one database to another, you must first export the statistics on the first database, then copy the statistics table to the second database, using the Data Pump Export and Import utilities or other mechanisms, and finally import the statistics into the second database.

Note:

The Data Pump Export and Import utilities export and import optimizer statistics from the database along with the table. When a column has system-generated names, Original Export does not export statistics with the data, but this restriction does not apply to Data Pump Export.

13.5.4 Restoring Statistics Versus Importing or Exporting Statistics

The functionality for restoring statistics is similar in some respects to the functionality of importing and exporting statistics. In general, use the restore capability when:

  • You want to recover older versions of the statistics. For example, to restore the optimizer behavior to an earlier date.

  • You want the database to manage the retention and purging of statistics histories.

You should use EXPORT/IMPORT_*_STATS procedures when:

  • You want to experiment with multiple sets of statistics and change the values back and forth.

  • You want to move the statistics from one database to another database. For example, moving statistics from a production system to a test system.

  • You want to preserve a known set of statistics for a longer period of time than the desired retention date for restoring statistics.

13.5.5 Locking Statistics for a Table or Schema

Statistics for a table or schema can be locked. Once statistics are locked, no modifications can be made to those statistics until the statistics have been unlocked. These locking procedures are useful in a static environment in which you want to guarantee that the statistics never change.

The DBMS_STATS package provides two procedures for locking (LOCK_SCHEMA_STATS and LOCK_TABLE_STATS) and two procedures for unlocking statistics (UNLOCK_SCHEMA_STATS and UNLOCK_TABLE_STATS).

13.5.6 Setting Statistics

You can set table, column, index, and system statistics using the SET_*_STATISTICS procedures. Setting statistics in the manner is not recommended, because inaccurate or inconsistent statistics can lead to poor performance.

13.5.7 Estimating Statistics with Dynamic Sampling

The purpose of dynamic sampling is to improve server performance by determining more accurate estimates for predicate selectivity and statistics for tables and indexes. The statistics for tables and indexes include table block counts, applicable index block counts, table cardinalities, and relevant join column statistics. These more accurate estimates allow the optimizer to produce better performing plans.

You can use dynamic sampling to:

  • Estimate single-table predicate selectivities when collected statistics cannot be used or are likely to lead to significant errors in estimation.

  • Estimate statistics for tables and relevant indexes without statistics.

  • Estimate statistics for tables and relevant indexes whose statistics are too out of date to trust.

This dynamic sampling feature is controlled by the OPTIMIZER_DYNAMIC_SAMPLING parameter. For dynamic sampling to automatically gather the necessary statistics, this parameter should be set to a value of 2 or higher. The default value is 2. See "Dynamic Sampling Levels" to learn about the sampling levels that can be set.

13.5.7.1 How Dynamic Sampling Works

The primary performance attribute is compile time. Oracle Database determines at compile time whether a query would benefit from dynamic sampling. If so, a recursive SQL statement is issued to scan a small random sample of the table's blocks, and to apply the relevant single table predicates to estimate predicate selectivities. The sample cardinality can also be used, in some cases, to estimate table cardinality. Any relevant column and index statistics are also collected.

Depending on the value of the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter, a certain number of blocks are read by the dynamic sampling query.

See Also:

Oracle Database Reference for details about this initialization parameter

13.5.7.2 When to Use Dynamic Sampling

For a query that normally completes quickly (in less than a few seconds), you will not want to incur the cost of dynamic sampling. However, dynamic sampling can be beneficial under any of the following conditions:

  • A better plan can be found using dynamic sampling.

  • The sampling time is a small fraction of total execution time for the query.

  • The query will be executed many times.

Dynamic sampling can be applied to a subset of a single table's predicates and combined with standard selectivity estimates of predicates for which dynamic sampling is not done.

13.5.7.3 How to Use Dynamic Sampling to Improve Performance

You control dynamic sampling with the OPTIMIZER_DYNAMIC_SAMPLING parameter, which can be set to a value from 0 to 10. The default is 2.

  • A value of 0 means dynamic sampling will not be done.

  • Increasing the value of the parameter results in more aggressive application of dynamic sampling, in terms of both the type of tables sampled (analyzed or unanalyzed) and the amount of I/O spent on sampling.

Dynamic sampling is repeatable if no rows have been inserted, deleted, or updated in the table being sampled. The parameter OPTIMIZER_FEATURES_ENABLE turns off dynamic sampling if set to a version before 9.2.0.

13.5.7.4 Dynamic Sampling Levels

The sampling levels are as follows if the dynamic sampling level used is from a cursor hint or from the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter:

  • Level 0: Do not use dynamic sampling.

  • Level 1: Sample all tables that have not been analyzed if the following criteria are met: (1) there is at least 1 unanalyzed table in the query; (2) this unanalyzed table is joined to another table or appears in a subquery or non-mergeable view; (3) this unanalyzed table has no indexes; (4) this unanalyzed table has more blocks than the number of blocks that would be used for dynamic sampling of this table. The number of blocks sampled is the default number of dynamic sampling blocks (32).

  • Level 2: Apply dynamic sampling to all unanalyzed tables. The number of blocks sampled is two times the default number of dynamic sampling blocks.

  • Level 3: Apply dynamic sampling to all tables that meet Level 2 criteria, plus all tables for which standard selectivity estimation used a guess for a predicate that is a potential dynamic sampling predicate. The number of blocks sampled is the default number of dynamic sampling blocks. For unanalyzed tables, the number of blocks sampled is twice the default number of dynamic sampling blocks.

  • Level 4: Apply dynamic sampling to all tables that meet Level 3 criteria, plus all tables that have single-table predicates that reference 2 or more columns. The number of blocks sampled is the default number of dynamic sampling blocks. For unanalyzed tables, the number of blocks sampled is two times the default number of dynamic sampling blocks.

  • Levels 5, 6, 7, 8, and 9: Apply dynamic sampling to all tables that meet the previous level criteria using 2, 4, 8, 32, or 128 times the default number of dynamic sampling blocks respectively.

  • Level 10: Apply dynamic sampling to all tables that meet the Level 9 criteria using all blocks in the table.

The sampling levels are as follows if the dynamic sampling level for a table is set using the DYNAMIC_SAMPLING optimizer hint:

  • Level 0: Do not use dynamic sampling.

  • Level 1: The number of blocks sampled is the default number of dynamic sampling blocks (32).

  • Levels 2, 3, 4, 5, 6, 7, 8, and 9: The number of blocks sampled is 2, 4, 8, 16, 32, 64, 128, or 256 times the default number of dynamic sampling blocks respectively.

  • Level 10: Read all blocks in the table.

    See Also:

    Oracle Database SQL Language Reference to learn about setting the sampling levels with the DYNAMIC_SAMPLING hint

13.5.8 Handling Missing Statistics

When Oracle encounters a table with missing statistics, Oracle dynamically gathers the necessary statistics needed by the optimizer. However, for certain types of tables, Oracle does not perform dynamic sampling. These include remote tables and external tables. In those cases and also when dynamic sampling has been disabled, the optimizer uses default values for its statistics, shown in Table 13-8 and Table 13-9.

Table 13-8 Default Table Values When Statistics Are Missing

Table Statistic Default Value Used by Optimizer

Cardinality

num_of_blocks * (block_size - cache_layer) / avg_row_len

Average row length

100 bytes

Number of blocks

100 or actual value based on the extent map

Remote cardinality

2000 rows

Remote average row length

100 bytes


Table 13-9 Default Index Values When Statistics Are Missing

Index Statistic Default Value Used by Optimizer

Levels

1

Leaf blocks

25

Leaf blocks/key

1

Data blocks/key

1

Distinct keys

100

Clustering factor

800


13.6 Viewing Statistics

This section discusses:

13.6.1 Statistics on Tables, Indexes and Columns

Statistics on tables, indexes, and columns are stored in the data dictionary. To view statistics in the data dictionary, query the appropriate data dictionary view (USER, ALL, or DBA). These DBA_* views include the following:

  • DBA_TABLES and DBA_OBJECT_TABLES

  • DBA_TAB_STATISTICS and DBA_TAB_COL_STATISTICS

  • DBA_TAB_HISTOGRAMS

  • DBA_TAB_COLS

  • DBA_COL_GROUP_COLUMNS

  • DBA_INDEXES and DBA_IND_STATISTICS

  • DBA_CLUSTERS

  • DBA_TAB_PARTITIONS and DBA_TAB_SUBPARTITIONS

  • DBA_IND_PARTITIONS and DBA_IND_SUBPARTITIONS

  • DBA_PART_COL_STATISTICS

  • DBA_PART_HISTOGRAMS

  • DBA_SUBPART_COL_STATISTICS

  • DBA_SUBPART_HISTOGRAMS

See Also:

Oracle Database Reference for information on the statistics in these views

13.6.2 Viewing Histograms

You can store column statistics as histograms. These histograms provide accurate estimates of the distribution of column data. Histograms provide improved selectivity estimates in the presence of data skew, resulting in optimal execution plans with nonuniform data distributions.

Oracle uses the following types of histograms for column statistics:

The type of histogram is stored in the HISTOGRAM column of the *TAB_COL_STATISTICS views (USER and DBA). This column can have values of HEIGHT BALANCED, FREQUENCY, or NONE.

13.6.2.1 Height-Balanced Histograms

In a height-balanced histogram, the column values are divided into buckets so that each bucket contains approximately the same number of rows. The histogram shows where the endpoints fall in the range of values.

Consider a column my_col with values between 1 and 100 and a histogram with 10 buckets. If the data in my_col is uniformly distributed, then the histogram looks similar to Figure 13-1, where the numbers are the endpoint values. For example, the 7th bucket has rows with values between 60 and 70.

Figure 13-1 Height-Balanced Histogram with Uniform Distribution

Description of Figure 13-1 follows
Description of "Figure 13-1 Height-Balanced Histogram with Uniform Distribution"

The number of rows in each bucket is 10% the total number of rows. In this example of uniform distribution, 40% of the rows have values between 60 and 100.

If the data is not uniformly distributed, then the histogram may look like Figure 13-2. In this case, most of the rows have the value 5 for the column. Only 10% of the rows have values between 60 and 100.

Figure 13-2 Height-Balanced Histogram with Non-Uniform Distribution

Description of Figure 13-2 follows
Description of "Figure 13-2 Height-Balanced Histogram with Non-Uniform Distribution"

You can view height-balanced histograms using the USER_TAB_HISTOGRAMS table, as shown in Example 13-1.

Example 13-1 Viewing Height-Balanced Histogram Statistics

BEGIN
  DBMS_STATS.GATHER_table_STATS ( 
    OWNNAME    => 'OE', 
    TABNAME    => 'INVENTORIES', 
    METHOD_OPT => 'FOR COLUMNS SIZE 10 quantity_on_hand' );
END;
/

SELECT COLUMN_NAME, NUM_DISTINCT, NUM_BUCKETS, HISTOGRAM 
FROM   USER_TAB_COL_STATISTICS
WHERE  TABLE_NAME = 'INVENTORIES' 
AND    COLUMN_NAME = 'QUANTITY_ON_HAND';

COLUMN_NAME                    NUM_DISTINCT NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ----------- ---------------
QUANTITY_ON_HAND                        237          10 HEIGHT BALANCED

SELECT ENDPOINT_NUMBER, ENDPOINT_VALUE 
FROM   USER_TAB_HISTOGRAMS
WHERE  TABLE_NAME = 'INVENTORIES'
AND    COLUMN_NAME = 'QUANTITY_ON_HAND'
ORDER BY ENDPOINT_NUMBER;

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
              0              0
              1             27
              2             42
              3             57
              4             74
              5             98
              6            123
              7            149
              8            175
              9            202
             10            353

In the Example 13-1 query output, one row (1-10) corresponds to each bucket in the histogram. Oracle Database added a special 0th bucket to this histogram because the value in the 1st bucket (27) is not the minimum value for the quantity_on_hand column. The 0th bucket holds the minimum value of 0 for quantity_on_hand.

13.6.2.2 Frequency Histograms

In a frequency histogram, each value of the column corresponds to a single bucket of the histogram. Each bucket contains the number of occurrences of this single value. For example, suppose that 36 rows contain the value 1 for column warehouse_id. The endpoint value 1 has an endpoint number 36.

The database automatically creates frequency histograms instead of height-balanced histograms when the number of distinct values is less than or equal to the number of histogram buckets specified. You can view frequency histograms using the USER_TAB_HISTOGRAMS view, as shown in Example 13-2.

Example 13-2 Viewing Frequency Histogram Statistics

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS (
    OWNNAME    => 'OE', 
    TABNAME    => 'INVENTORIES', 
    METHOD_OPT => 'FOR COLUMNS SIZE 20 warehouse_id' );
END;
/

SELECT COLUMN_NAME, NUM_DISTINCT, NUM_BUCKETS, HISTOGRAM 
FROM   USER_TAB_COL_STATISTICS
WHERE  TABLE_NAME = 'INVENTORIES' 
AND    COLUMN_NAME = 'WAREHOUSE_ID';

COLUMN_NAME                    NUM_DISTINCT NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ----------- ---------------
WAREHOUSE_ID                              9           9 FREQUENCY

SELECT   ENDPOINT_NUMBER, ENDPOINT_VALUE 
FROM     USER_TAB_HISTOGRAMS
WHERE    TABLE_NAME = 'INVENTORIES' 
AND      COLUMN_NAME = 'WAREHOUSE_ID'
ORDER BY ENDPOINT_NUMBER;

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
             36              1
            213              2
            261              3
            370              4
            484              5
            692              6
            798              7
            984              8
           1112              9

In Example 13-2, the first bucket is for the warehouse_id of 1. The value appears 36 times in the table, as confirmed by the following query:

oe@PROD> SELECT COUNT(*) FROM inventories WHERE warehouse_id = 1;
 
  COUNT(*)
----------
        36