Oracle® Database VLDB and Partitioning Guide 11g Release 2 (11.2) Part Number E10837-02
This chapter covers tuning in a parallel execution environment and discusses the following topics:
Parallel execution is the ability to apply multiple CPU and I/O resources to the execution of a single database operation. It dramatically reduces response time for data-intensive operations on large databases typically associated with decision support systems (DSS) and data warehouses. You can also implement parallel execution on OLTP systems for batch processing or schema maintenance operations such as index creation. Parallel execution is sometimes called parallelism. Parallelism is the idea of breaking down a task so that, instead of one process doing all of the work in a query, many processes do part of the work at the same time. For example, four processes can work together to calculate the total sales for a year, with each process handling one quarter of the year, instead of a single process handling all four quarters by itself. The improvement in performance can be quite significant.
Parallel execution improves processing for:
Queries requiring large table scans, joins, or partitioned index scans
Creation of large indexes
Creation of large tables (including materialized views)
Bulk inserts, updates, merges, and deletes
You can also use parallel execution to access object types within an Oracle database. For example, you can use parallel execution to access large objects (LOBs).
Parallel execution benefits systems with all of the following characteristics:
Symmetric multiprocessors (SMPs), clusters, or massively parallel systems
Underutilized or intermittently used CPUs (for example, systems where CPU usage is typically less than 30%)
Sufficient memory to support additional memory-intensive processes, such as sorts, hashing, and I/O buffers
If your system lacks any of these characteristics, parallel execution might not significantly improve performance. In fact, parallel execution may reduce system performance on overutilized systems or systems with small I/O bandwidth.
The benefits of parallel execution can be seen in DSS and data warehousing environments. OLTP systems can also benefit from parallel execution during batch processing and during schema maintenance operations such as creation of indexes. The average simple DML or SELECT statements that characterize OLTP applications would not see any benefit from being executed in parallel.
Parallel execution is not normally useful for:
Environments in which the typical query or transaction is very short (a few seconds or less). This includes most online transaction systems. Parallel execution is not useful in these environments because there is a cost associated with coordinating the parallel execution servers; for short transactions, the cost of this coordination may outweigh the benefits of parallelism.
Environments in which the CPU, memory, or I/O resources are already heavily utilized. Parallel execution is designed to exploit additional available hardware resources; if no such resources are available, then parallel execution will not yield any benefits and indeed may be detrimental to performance.
Parallel execution is designed to effectively use multiple CPUs and disks to answer queries quickly. It is very I/O intensive by nature. To achieve optimal performance, each component in the hardware configuration must be sized to sustain the same level of throughput: from the CPUs and the Host Bus Adapters (HBAs) in the compute nodes, to the switches, and on into the I/O subsystem, including the storage controllers and the physical disks. If the system is an Oracle Real Application Clusters (Oracle RAC) environment, then the interconnect also has to be sized appropriately. The weakest link limits the performance and scalability of operations in a configuration.
It is recommended to measure the maximum I/O performance a hardware configuration can achieve without Oracle Database. This measurement can be used as a baseline for future system performance evaluations. Remember, it is not possible for parallel execution to achieve better I/O throughput than the underlying hardware can sustain. Oracle Database provides a free calibration tool called Orion, which is designed to measure the I/O performance of a system by simulating Oracle I/O workloads. A parallel execution typically performs large random I/Os.
You can use parallel execution for any of the following:
Access methods
Some examples are table scans, index fast full scans, and partitioned index range scans.
Join methods
Some examples are nested loop, sort merge, hash, and star transformation.
DDL statements
Some examples are CREATE TABLE AS SELECT, CREATE INDEX, REBUILD INDEX, REBUILD INDEX PARTITION, and MOVE/SPLIT/COALESCE PARTITION.
You can normally use parallel DDL where you use regular DDL. There are, however, some additional details to consider when designing your database. One important restriction is that parallel DDL cannot be used on tables with object or LOB columns.
All of these DDL operations can be performed in NOLOGGING mode for either parallel or serial execution.
The CREATE TABLE statement for an index-organized table can be parallelized either with or without an AS SELECT clause.
Different parallelism is used for different operations. Parallel CREATE (partitioned) TABLE AS SELECT and parallel CREATE (partitioned) INDEX statements run with a degree of parallelism (DOP) equal to the number of partitions.
DML statements
Some examples are INSERT AS SELECT, updates, deletes, and MERGE operations.
Parallel DML (parallel insert, update, merge, and delete) uses parallel execution mechanisms to speed up or scale up large DML operations against large database tables and indexes. You can also use INSERT ... SELECT statements to insert rows into multiple tables as part of a single DML statement. You can normally use parallel DML where you use regular DML.
Although data manipulation language normally includes queries, the term parallel DML refers only to inserts, updates, merges, and deletes done in parallel.
Parallel query
You can parallelize queries and subqueries in SELECT statements, as well as the query portions of DDL statements and DML statements (INSERT, UPDATE, DELETE, and MERGE).
Miscellaneous SQL operations
Some examples are GROUP BY, NOT IN, SELECT DISTINCT, UNION, UNION ALL, CUBE, and ROLLUP, as well as aggregate and table functions.
SQL*Loader
You can parallelize the use of SQL*Loader, where large amounts of data are routinely encountered. To speed up your loads, you can use a parallel direct-path load as in the following example:
sqlldr CONTROL=LOAD1.CTL DIRECT=TRUE PARALLEL=TRUE
sqlldr CONTROL=LOAD2.CTL DIRECT=TRUE PARALLEL=TRUE
sqlldr CONTROL=LOAD3.CTL DIRECT=TRUE PARALLEL=TRUE
With these commands, you provide your user ID and password on the command line. You can also use a parameter file to achieve the same result.
An important point to remember is that indexes are not maintained during a parallel load.
When a user issues a SQL statement, the optimizer decides whether or not to execute it in parallel. If parallel execution is chosen, then the following steps will happen:
The user session or shadow process takes on the role of a coordinator, often called the query coordinator.
The query coordinator will obtain the necessary number of parallel servers.
The SQL statement will be executed as a sequence of operations (a full table scan to perform a join on a non-indexed column, an ORDER BY, and so on). The parallel execution servers will perform each operation in parallel if possible.
When the parallel servers are finished executing, the query coordinator will perform any portion of the work that cannot be executed in parallel. For example, a parallel query with a SUM() operation requires adding the individual sub-totals calculated by each parallel server.
Finally, the query coordinator will return any results to the user.
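The coordinator and server roles in the steps above can be illustrated with a small Python sketch. It is only an analogy under stated assumptions: threads stand in for parallel execution servers, the function name parallel_sum and the way values are sliced are hypothetical, and nothing here reflects how Oracle Database actually divides the work.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(values, dop):
    """Sum `values` with `dop` workers, then combine the subtotals."""
    # Hypothetical slicing: worker i takes every dop-th value starting at i.
    slices = [values[i::dop] for i in range(dop)]
    with ThreadPoolExecutor(max_workers=dop) as pool:
        # Each worker plays the role of a parallel server computing a subtotal.
        partial_sums = list(pool.map(sum, slices))
    # The caller plays the role of the query coordinator: adding the
    # subtotals is the portion of the work that is not done in parallel.
    return sum(partial_sums), partial_sums

sales_amounts = [120, 80, 95, 140, 60, 75, 110, 130]
total, partials = parallel_sum(sales_amounts, dop=4)
print(total, partials)
```

The final addition is cheap compared with the scans, which is why a SUM() parallelizes well even though one step remains serial.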
Each SQL statement undergoes an optimization and parallelization process when it is parsed. After the optimizer determines the execution plan of a statement, the parallel execution coordinator determines the parallelization method for each operation in the plan. For example, the parallelization method might be to parallelize a full table scan by block range or parallelize an index range scan by partition. The coordinator must decide whether an operation can be performed in parallel and, if so, how many parallel execution servers to enlist. The number of parallel execution servers in one set is the degree of parallelism (DOP).
When the parallel execution coordinator examines each operation in a SQL statement's execution plan, it determines the way in which the rows operated on by the operation must be divided or redistributed among the parallel execution servers. As an example of parallel query with intra- and inter-operation parallelism, consider the following query:
EXPLAIN PLAN FOR
SELECT /*+ PARALLEL(4) */ customers.cust_first_name, customers.cust_last_name,
       MAX(QUANTITY_SOLD), AVG(QUANTITY_SOLD)
FROM sales, customers
WHERE sales.cust_id=customers.cust_id
GROUP BY customers.cust_first_name, customers.cust_last_name;

Explained.
Note that a hint has been used in the query to specify the DOP of the tables customers and sales.
Figure 8-1 illustrates the data flow graph or query plan for this query.
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
Plan hash value: 4060011603

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows | Bytes | TQ    | IN-OUT | PQ Distrib |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |  925 | 25900 |       |        |            |
|   1 |  PX COORDINATOR             |           |      |       |       |        |            |
|   2 |   PX SEND QC (RANDOM)       | :TQ10003  |  925 | 25900 | Q1,03 | P->S   | QC (RAND)  |
|   3 |    HASH GROUP BY            |           |  925 | 25900 | Q1,03 | PCWP   |            |
|   4 |     PX RECEIVE              |           |  925 | 25900 | Q1,03 | PCWP   |            |
|   5 |      PX SEND HASH           | :TQ10002  |  925 | 25900 | Q1,02 | P->P   | HASH       |
|*  6 |       HASH JOIN BUFFERED    |           |  925 | 25900 | Q1,02 | PCWP   |            |
|   7 |        PX RECEIVE           |           |  630 | 12600 | Q1,02 | PCWP   |            |
|   8 |         PX SEND HASH        | :TQ10000  |  630 | 12600 | Q1,00 | P->P   | HASH       |
|   9 |          PX BLOCK ITERATOR  |           |  630 | 12600 | Q1,00 | PCWC   |            |
|  10 |           TABLE ACCESS FULL | CUSTOMERS |  630 | 12600 | Q1,00 | PCWP   |            |
|  11 |        PX RECEIVE           |           |  960 |  7680 | Q1,02 | PCWP   |            |
|  12 |         PX SEND HASH        | :TQ10001  |  960 |  7680 | Q1,01 | P->P   | HASH       |
|  13 |          PX BLOCK ITERATOR  |           |  960 |  7680 | Q1,01 | PCWC   |            |
|  14 |           TABLE ACCESS FULL | SALES     |  960 |  7680 | Q1,01 | PCWP   |            |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("SALES"."CUST_ID"="CUSTOMERS"."CUST_ID")

26 rows selected.
Figure 8-1 Data Flow Diagram for Joining Tables
Given two sets of parallel execution servers SS1 and SS2 for the query plan illustrated in Figure 8-1, the execution will proceed as follows: each server set (SS1 and SS2) will have four execution processes because of the PARALLEL hint in the query that specifies the DOP.
Server set SS1 first scans the table customers and sends rows to SS2, which builds a hash table on the rows. In other words, the producers in SS1 and the consumers in SS2 work concurrently: one set scans customers in parallel while the other consumes rows and builds the hash table in parallel to enable the hash join. This is an example of inter-operation parallelism.
After SS1 has finished scanning the entire customers table, it scans the sales table in parallel. It sends its rows to servers in SS2, which then perform the probes to finish the hash join in parallel. After SS1 is done scanning the sales table in parallel and sending the rows to SS2, it switches to performing the GROUP BY in parallel. This is how two server sets run concurrently to achieve inter-operation parallelism across various operators in the query tree.
Another important aspect of parallel execution is the redistribution of rows when they are sent from servers in one server set to another. For the query plan in Figure 8-1, after a server process in SS1 scans a row from the customers table, which server process in SS2 should it send it to? The operator into which the rows are flowing decides the redistribution. In this case, the redistribution of rows flowing up from SS1 performing the parallel scan of customers into SS2 performing the parallel hash join is done by hash partitioning on the join column. That is, a server process scanning customers computes a hash function of the value of the column customers.cust_id to decide the number of the server process in SS2 to send it to. The redistribution method used in parallel queries is explicitly shown in the PQ Distrib column in the EXPLAIN PLAN of the query. In Figure 8-1, "Data Flow Diagram for Joining Tables", this can be seen on lines 5, 8, and 12 of the EXPLAIN PLAN.
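The hash redistribution described above can be sketched in Python. This is a minimal illustration, not Oracle's implementation: crc32 is a stand-in hash (the function Oracle Database uses is not given here), and the name target_consumer and the sample rows are hypothetical.

```python
import zlib

def target_consumer(join_key, dop):
    """Pick the consumer (0..dop-1) that receives a row, based on its join key."""
    # crc32 is an assumed stand-in for the real hash function.
    return zlib.crc32(str(join_key).encode("utf-8")) % dop

DOP = 4
customers = [(101, "Ann"), (102, "Raj"), (103, "Lee")]   # (cust_id, name)
sales = [(101, 5), (103, 2), (101, 7), (102, 1)]         # (cust_id, quantity_sold)

# Each consumer collects the customers rows and sales rows routed to it.
build_side = {n: [] for n in range(DOP)}
probe_side = {n: [] for n in range(DOP)}
for row in customers:
    build_side[target_consumer(row[0], DOP)].append(row)
for row in sales:
    probe_side[target_consumer(row[0], DOP)].append(row)

# Every sales row lands on the consumer that holds its matching customers row,
# so each consumer can complete its part of the join without the others.
for n in range(DOP):
    local_ids = {cust_id for cust_id, _ in build_side[n]}
    assert all(cust_id in local_ids for cust_id, _ in probe_side[n])
```

Because both inputs are hashed on the same column with the same function, matching rows always meet on the same consumer, which is the property the join relies on.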
Operations that require the output of other operations are known as consumer operations. In Figure 8-1, the GROUP BY SORT operation is the consumer of the HASH JOIN operation because GROUP BY SORT requires the HASH JOIN output.
Consumer operations can begin consuming rows as soon as the producer operations have produced rows. In the previous example, while the parallel execution servers are producing rows in the FULL SCAN of the sales table, another set of parallel execution servers can begin to perform the HASH JOIN operation to consume the rows.
Each of the two operations performed concurrently is given its own set of parallel execution servers. Therefore, both query operations and the data flow tree itself have parallelism. The parallelism of an individual operation is called intra-operation parallelism and the parallelism between operations in a data flow tree is called inter-operation parallelism. Due to the producer-consumer nature of the Oracle Database's operations, only two operations in a given tree need to be performed simultaneously to minimize execution time. To illustrate intra- and inter-operation parallelism, consider the following statement:
SELECT * FROM employees ORDER BY last_name;
The execution plan implements a full scan of the employees table. This operation is followed by a sorting of the retrieved rows, based on the value of the last_name column. For the sake of this example, assume the last_name column is not indexed. Also assume that the DOP for the query is set to 4, which means that four parallel execution servers can be active for any given operation.
Figure 8-2 illustrates the parallel execution of the example query.
Figure 8-2 Interoperation Parallelism and Dynamic Partitioning
As you can see from Figure 8-2, there are actually eight parallel execution servers involved in the query even though the DOP is 4. This is because a producer and consumer operator can be performed at the same time (interoperation parallelism).
Also note that all of the parallel execution servers involved in the scan operation send rows to the appropriate parallel execution server performing the SORT operation. If a row scanned by a parallel execution server contains a value for the last_name column between A and G, that row gets sent to the first ORDER BY parallel execution server. When the scan operation is complete, the sorting processes can return the sorted results to the query coordinator, which, in turn, returns the complete query results to the user.
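The range-based routing in this ORDER BY example can be sketched as follows. Only the first range (A to G) comes from the text; the other three boundaries, and the names UPPER_BOUNDS, target_sorter, and parallel_order_by, are assumptions made for illustration.

```python
import bisect

# Inclusive upper bound of each sorter's last_name range. Only "G" (the first
# range, A to G) is taken from the text; the remaining bounds are assumed.
UPPER_BOUNDS = ["G", "M", "S", "Z"]

def target_sorter(last_name):
    """Return the index of the sorter that owns this name's first letter."""
    first_letter = last_name[0].upper()
    index = bisect.bisect_left(UPPER_BOUNDS, first_letter)
    return min(index, len(UPPER_BOUNDS) - 1)   # clamp anything past the last bound

def parallel_order_by(last_names):
    sorters = [[] for _ in UPPER_BOUNDS]
    for name in last_names:                    # scan servers route each row
        sorters[target_sorter(name)].append(name)
    # Each sorter orders only its own range; reading the ranges in order
    # gives the fully sorted result.
    return [name for rows in sorters for name in sorted(rows)]

names = ["Smith", "Baer", "King", "Abel", "Taylor", "Grant", "Hunold"]
print(parallel_order_by(names))
```

Because the ranges do not overlap, no merge step is needed: the coordinator can return the output of the first sorter, then the second, and so on.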
To execute a query in parallel, Oracle Database generally creates a set of producer parallel execution servers and a set of consumer parallel execution servers. The producer server retrieves rows from tables and the consumer server performs operations such as join, sort, DML, and DDL on these rows. Each server in the producer set has a connection to each server in the consumer set. This means that the number of virtual connections between parallel execution servers increases as the square of the degree of parallelism.
Each communication channel has at least one, and sometimes up to four memory buffers, which are allocated from the SHARED_POOL. Multiple memory buffers facilitate asynchronous communication among the parallel execution servers.
A single-instance environment uses at most three buffers for each communication channel. An Oracle Real Application Clusters environment uses at most four buffers for each channel. Figure 8-3 illustrates message buffers and how producer parallel execution servers connect to consumer parallel execution servers.
Figure 8-3 Parallel Execution Server Connections and Buffers
When a connection is between two processes on the same instance, the servers communicate by passing the buffers back and forth in memory (in the SHARED_POOL). When the connection is between processes in different instances, the messages are sent using external high-speed network protocols over the interconnect. In Figure 8-3, the DOP is equal to the number of parallel execution servers, which in this case is n. Figure 8-3 does not show the parallel execution coordinator. Each parallel execution server actually has an additional connection to the parallel execution coordinator. It is important to size the SHARED_POOL adequately when using parallel execution. If there is not enough free space in the SHARED_POOL to allocate the necessary memory buffers for a parallel server, it will fail to start.
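To get a feel for how quickly the connection and buffer counts grow, the following Python sketch applies the figures stated in this section: one channel per producer-consumer pair, and at most three buffers per channel on a single instance or four with Oracle RAC. The function names are hypothetical, and the additional connections to the coordinator are deliberately left out.

```python
def px_channels(dop):
    """Producer-to-consumer channels for one producer set and one consumer set."""
    return dop * dop

def max_message_buffers(dop, rac=False):
    """Upper bound on message buffers for those channels."""
    per_channel = 4 if rac else 3   # limits stated in the text
    return px_channels(dop) * per_channel

for dop in (4, 8, 16):
    print(dop, px_channels(dop), max_message_buffers(dop), max_message_buffers(dop, rac=True))
```

Doubling the DOP quadruples the number of channels, which is one reason a high DOP puts pressure on the memory from which the buffers are allocated.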
The number of parallel execution servers associated with a single operation is known as the degree of parallelism (DOP). Parallel execution is designed to effectively use multiple CPUs. Oracle Database's parallel execution framework enables you to either explicitly choose a specific degree of parallelism or to rely on Oracle to automatically control it.
A specific DOP can be requested from the Oracle Database. For example, you can set a fixed DOP at a table or index level:
ALTER TABLE sales PARALLEL 8;
ALTER TABLE customers PARALLEL 4;
In this case, queries accessing just the sales table use a requested DOP of 8 and queries accessing the customers table request a DOP of 4. A query accessing both the sales and the customers table will be processed with a DOP of 8 and potentially allocate 16 parallel servers (producer/consumer); whenever different DOPs are specified, Oracle Database uses the higher DOP.
If the PARALLEL clause is specified but no degree is listed, the object will get the default DOP. Default parallelism uses a formula to determine the DOP based on the system configuration, as in the following:
For a single instance, DOP = PARALLEL_THREADS_PER_CPU x CPU_COUNT
For an Oracle RAC configuration, DOP = PARALLEL_THREADS_PER_CPU x CPU_COUNT x INSTANCE_COUNT
By default, INSTANCE_COUNT is all of the nodes in the cluster. However, if you have used Oracle RAC services to limit the number of nodes a parallel operation can execute across, then the number of participating nodes is the number of nodes belonging to that service. For example, on a four-node Oracle RAC cluster, with each node having 8 CPU cores, PARALLEL_THREADS_PER_CPU set to 2, and no Oracle RAC services, the default DOP would be 2 x 8 x 4 = 64.
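The default DOP formulas can be checked with a few lines of Python. The function name default_dop is hypothetical; the arguments mirror the parameters named in the formulas, and the four-node example implies a PARALLEL_THREADS_PER_CPU value of 2.

```python
def default_dop(cpu_count, parallel_threads_per_cpu, instance_count=1):
    """Default DOP = PARALLEL_THREADS_PER_CPU x CPU_COUNT x INSTANCE_COUNT."""
    return parallel_threads_per_cpu * cpu_count * instance_count

# Single instance with 8 CPU cores.
print(default_dop(cpu_count=8, parallel_threads_per_cpu=2))

# Four-node cluster from the example: 2 x 8 x 4.
print(default_dop(cpu_count=8, parallel_threads_per_cpu=2, instance_count=4))

# Same cluster, but a service restricts the operation to two nodes.
print(default_dop(cpu_count=8, parallel_threads_per_cpu=2, instance_count=2))
```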
The default DOP algorithm is designed to use maximum resources and assumes that the operation will finish faster if it can use more resources. Default parallelism targets the single-user workload. In a multi-user environment, default parallelism is not recommended.
The DOP for a SQL statement can also be set or limited by the Resource Manager. See Oracle Database Administrator's Guide for more information.
When the parameter PARALLEL_DEGREE_POLICY is set to AUTO, Oracle Database automatically decides if a statement should execute in parallel or not and what DOP it should use. Oracle Database also determines if the statement can be executed immediately or if it will be queued until more system resources are available. Finally, it decides if the statement can take advantage of the aggregated cluster memory or not.
The optimizer automatically determines the DOP for a statement based on the resource requirements of the statement. The optimizer uses the cost of all scan operations (full table scan, index fast full scan, and so on) in the execution plan to determine the necessary DOP for the statement.
However, the optimizer will cap the actual DOP to ensure parallel server processes do not flood the system. This cap is set by the parameter PARALLEL_DEGREE_LIMIT. The default value for this parameter is CPU, which means the number of processes is limited by the number of CPUs on the system (PARALLEL_THREADS_PER_CPU * CPU_COUNT * INSTANCE_COUNT), also known as the default DOP. By adjusting this parameter setting, you can control the maximum DOP the optimizer can choose for a SQL statement.
The DOP determined by the optimizer is shown in the notes section of an execution plan, visible either using the explain plan command or V$SQL_PLAN.
EXPLAIN PLAN FOR
SELECT SUM(AMOUNT_SOLD) FROM SH.SALES;

PLAN TABLE OUTPUT
Plan hash value: 672559287

----------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows | Bytes | Cost(%CPU) | Time     | Pstart | Pstop |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |    1 |     4 |      5 (0) | 00:00:01 |        |       |
|   1 |  SORT AGGREGATE         |          |    1 |     4 |            |          |        |       |
|   2 |   PX COORDINATOR        |          |    1 |     4 |            |          |        |       |
|   3 |    PX SEND QC(RANDOM)   | :TQ10000 |    1 |     4 |      5 (0) |          |        |       |
|   4 |     SORT AGGREGATE      |          |    1 |     4 |            |          |        |       |
|   5 |      PX BLOCK ITERATOR  |          |  960 |  3840 |      5 (0) | 00:00:01 |      1 |    16 |
|   6 |       TABLE ACCESS FULL | SALES    |  960 |  3840 |      5 (0) | 00:00:01 |      1 |    16 |
----------------------------------------------------------------------------------------------------

Note
-----
   - Computed Degree of Parallelism is 2
   - Degree of Parallelism of 2 is derived from scan of object SH.SALES
PARALLEL_MIN_TIME_THRESHOLD is the second initialization parameter that controls automatic DOP. It specifies the minimum execution time a statement should have before the statement is considered for automatic DOP. By default, this is 30 seconds. The optimizer will first calculate a serial execution plan for the SQL statement; if the estimated elapsed execution time is greater than PARALLEL_MIN_TIME_THRESHOLD (30 seconds), the statement will become a candidate for automatic DOP.
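The two controls described for automatic DOP, the time threshold and the PARALLEL_DEGREE_LIMIT cap, combine roughly as in the Python sketch below. This is a simplification under assumptions: the optimizer's cost-based DOP is passed in as ideal_dop rather than computed, and the function name auto_dop is hypothetical.

```python
def auto_dop(estimated_serial_seconds, ideal_dop, degree_limit, min_time_threshold=30):
    """Return the DOP to use; 1 means the statement runs serially."""
    # Statements estimated to finish within the threshold are not candidates.
    if estimated_serial_seconds <= min_time_threshold:
        return 1
    # Candidates get the optimizer's DOP, capped by the degree limit.
    return max(1, min(ideal_dop, degree_limit))

print(auto_dop(estimated_serial_seconds=12, ideal_dop=16, degree_limit=64))    # short statement
print(auto_dop(estimated_serial_seconds=300, ideal_dop=16, degree_limit=64))   # under the cap
print(auto_dop(estimated_serial_seconds=300, ideal_dop=128, degree_limit=64))  # capped
```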
There are two initialization parameters that control automatic DOP, PARALLEL_DEGREE_POLICY and PARALLEL_MIN_TIME_THRESHOLD. They are described in this section, as well as "Automatic Degree of Parallelism" and "Controlling Automatic Degree of Parallelism, Statement Queuing, and In-Memory Parallel Execution".
You can set the DOP using an ALTER SESSION statement, as in the following:
ALTER SESSION SET parallel_degree_policy = limited;
ALTER TABLE emp parallel (degree default);
You can use the PARALLEL hint to force parallelism. It takes an optional parameter, the DOP at which the statement should run. In addition, the NO_PARALLEL hint overrides a PARALLEL parameter in the DDL that created or altered the table. The following example illustrates forcing the statement to be executed in parallel:
SELECT /*+parallel */ ename, dname FROM emp e, dept d WHERE e.deptno=d.deptno;
The following example illustrates forcing the statement to be executed in parallel with a degree of 10:
SELECT /*+ parallel(10) */ ename, dname FROM emp e, dept d WHERE e.deptno=d.deptno;
The following example illustrates forcing the statement to be executed in serial:
SELECT /*+ no_parallel */ ename, dname FROM emp e, dept d WHERE e.deptno=d.deptno;
The following example illustrates computing the DOP the statement should use:
SELECT /*+ parallel(auto) */ ename, dname FROM emp e, dept d WHERE e.deptno=d.deptno;
The following example forces the statement to use Oracle Database 11g, Release 1 (11.1) behavior:
SELECT /*+ parallel(manual) */ ename, dname FROM emp e, dept d WHERE e.deptno=d.deptno;
When the parameter PARALLEL_DEGREE_POLICY is set to AUTO, Oracle will queue SQL statements that require parallel execution, if the necessary parallel server processes are not available. Once the necessary resources become available, the SQL statement will be dequeued and allowed to execute. The queue is a simple First In - First Out queue based on the time a statement was issued.
Statement queuing will begin once the number of parallel server processes active on the system is equal to or greater than PARALLEL_SERVERS_TARGET. By default, this parameter is set to 4 X CPU_COUNT X PARALLEL_THREADS_PER_CPU X ACTIVE_INSTANCES. This is not the maximum number of parallel server processes allowed on the system, but the number available to run parallel statements before statement queuing will be used. It is set lower than the maximum number of parallel server processes allowed on the system (PARALLEL_MAX_SERVERS) to ensure each parallel statement will get all of the parallel server resources required and to prevent overloading the system with parallel server processes. Note that all serial (non-parallel) statements will execute immediately even if statement queuing has been activated.
There are two wait events to help identify if a statement has been queued. A statement waiting on the event PX QUEUING: statement queue is the first statement in the statement queue. Once the necessary resources become available for this statement, it will be dequeued and executed. All other statements in the queue will be waiting on enq: JX - SQL statement queue. Only when a statement gets to the head of the queue will the wait event switch to PX QUEUING: statement queue.
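The queuing behavior can be modeled with a short Python sketch. It is a simplified model, not Oracle's scheduler: it assumes a statement runs only if its servers fit under PARALLEL_SERVERS_TARGET and nothing is already waiting, and the names default_servers_target and StatementQueue are hypothetical.

```python
from collections import deque

def default_servers_target(cpu_count, parallel_threads_per_cpu, active_instances=1):
    """Default stated in the text: 4 x CPU_COUNT x PARALLEL_THREADS_PER_CPU x ACTIVE_INSTANCES."""
    return 4 * cpu_count * parallel_threads_per_cpu * active_instances

class StatementQueue:
    def __init__(self, servers_target):
        self.servers_target = servers_target
        self.active_servers = 0
        self.waiting = deque()          # first in, first out

    def submit(self, name, servers_needed):
        """Run the statement now if it fits and nothing is waiting, else queue it."""
        fits = self.active_servers + servers_needed <= self.servers_target
        if not self.waiting and fits:
            self.active_servers += servers_needed
            return "running"
        self.waiting.append((name, servers_needed))
        return "queued"

    def finish(self, servers_released):
        """Release servers, then start waiting statements strictly in arrival order."""
        self.active_servers -= servers_released
        started = []
        while self.waiting and self.active_servers + self.waiting[0][1] <= self.servers_target:
            name, needed = self.waiting.popleft()
            self.active_servers += needed
            started.append(name)
        return started

queue = StatementQueue(default_servers_target(cpu_count=8, parallel_threads_per_cpu=2))
print(queue.submit("A", 48), queue.submit("B", 32), queue.submit("C", 8))
print(queue.finish(48))
```

In this model statement C waits behind B even though C alone would fit, which is what a strict first-in, first-out queue implies.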
When the parameter PARALLEL_DEGREE_POLICY is set to AUTO, Oracle Database will decide if an object that will be accessed using parallel execution would benefit from being cached in the SGA (buffer cache). The decision to cache an object is based on a well-defined set of heuristics including the size of the object and the frequency with which it is accessed. In an Oracle RAC environment, Oracle Database will affinitize or map pieces of the object into each of the buffer caches on the active instances. By creating this mapping, Oracle Database will automatically know which buffer cache to access to find different parts or pieces of the object. Using this information, Oracle Database will prevent multiple instances from reading the same information from disk over and over again, thus maximizing the amount of memory that can be used to cache objects. If the size of the object is larger than the size of the buffer cache (single instance) or the size of the buffer cache multiplied by the number of active instances in an Oracle RAC cluster, then it will be read using direct path reads.
The initialization parameter PARALLEL_DEGREE_POLICY controls whether or not automatic degree of parallelism (Auto DOP), statement queuing, and in-memory parallel execution will be enabled. This parameter has three possible values:
MANUAL - Disables Auto DOP, statement queuing, and in-memory parallel execution. This setting reverts the behavior of parallel execution to what it was before Oracle Database 11g, Release 2 (11.2). This is the default.
LIMITED - Enables Auto DOP for some statements, but statement queuing and in-memory parallel execution are disabled. Auto DOP is only applied to statements that access tables or indexes decorated explicitly with the PARALLEL clause. Tables and indexes that have a specific DOP specified will use that specified DOP.
AUTO - Enables Auto DOP, statement queuing, and in-memory parallel execution.
By default, the system will only use parallel execution when a parallel degree has been explicitly set on an object or if a parallel hint is specified in the SQL statement. The degree of parallelism used will be exactly what was specified. No statement queuing will occur and parallel execution will not use the buffer cache. If you want Oracle Database to decide the degree of parallelism for all statements, set PARALLEL_DEGREE_POLICY to AUTO. However, if you want Oracle Database to only automatically decide the degree of parallelism for a subset of statements that touch a specific subset of objects, then set PARALLEL_DEGREE_POLICY to LIMITED and set the parallel clause on that subset of objects.
The adaptive multiuser algorithm, which is enabled by default, reduces the degree of parallelism as the load on the system increases. When using Oracle's adaptive parallelism capabilities, the database will use an algorithm at SQL execution time to determine whether a parallel operation should receive the requested DOP or have its DOP lowered to ensure the system is not overloaded.
In a system that makes aggressive use of parallel execution by using a high DOP, the adaptive algorithm will adjust the DOP down with only a few operations running in parallel. While the algorithm will still ensure optimal resource utilization, users may experience inconsistent response times. Using solely the adaptive parallelism capabilities in an environment that requires deterministic response times is not advised. Adaptive parallelism is controlled through the database initialization parameter PARALLEL_ADAPTIVE_MULTI_USER.
When an instance starts up, Oracle Database creates a pool of parallel execution servers, which are available for any parallel operation. The initialization parameter PARALLEL_MIN_SERVERS specifies the number of parallel execution servers that Oracle Database creates at instance startup.
When executing a parallel operation, the parallel execution coordinator obtains parallel execution servers from the pool and assigns them to the operation. If necessary, Oracle Database can create additional parallel execution servers for the operation. These parallel execution servers remain with the operation throughout execution. After the statement has been processed completely, the parallel execution servers return to the pool.
If the number of parallel operations increases, Oracle Database creates additional parallel execution servers to handle incoming requests. However, Oracle Database never creates more parallel execution servers for an instance than the value specified by the initialization parameter PARALLEL_MAX_SERVERS.
If the number of parallel operations decreases, Oracle Database terminates any parallel execution servers that have been idle for a threshold period of time. Oracle Database does not reduce the size of the pool below the value of PARALLEL_MIN_SERVERS, no matter how long the parallel execution servers have been idle.
Oracle Database can process a parallel operation with fewer than the requested number of processes. If all parallel execution servers in the pool are occupied and the maximum number of parallel execution servers has been started, the parallel execution coordinator switches to serial processing.
See Oracle Database Reference for information about using the initialization parameter PARALLEL_MIN_PERCENT and "Tuning General Parameters for Parallel Execution" for information about the PARALLEL_MIN_PERCENT and PARALLEL_MAX_SERVERS initialization parameters.
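The pool behavior described in this section can be approximated with the Python sketch below. It is a rough model under assumptions: the function name allocate_servers and its arguments are hypothetical, and the effect of PARALLEL_MIN_PERCENT is not modeled.

```python
def allocate_servers(requested, idle_in_pool, started, parallel_max_servers):
    """Return (servers granted, servers newly started) for one parallel operation.

    `started` counts every server process that exists, idle or busy.
    A grant of 0 stands for the coordinator running the statement serially.
    """
    # New servers can be created only up to PARALLEL_MAX_SERVERS.
    can_start = max(0, parallel_max_servers - started)
    granted = min(requested, idle_in_pool + can_start)
    # Idle servers are reused first; the remainder must be newly started.
    newly_started = max(0, granted - idle_in_pool)
    return granted, newly_started

print(allocate_servers(requested=8, idle_in_pool=4, started=10, parallel_max_servers=16))
print(allocate_servers(requested=8, idle_in_pool=2, started=14, parallel_max_servers=16))
print(allocate_servers(requested=8, idle_in_pool=0, started=16, parallel_max_servers=16))
```

The three calls show a full grant, a reduced grant, and the case where no server is available, so the operation falls back to serial processing.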
The basic unit of work in parallelism is called a granule. Oracle Database divides the operation being parallelized (for example, a table scan, table update, or index creation) into granules. Parallel execution processes execute the operation one granule at a time. The number of granules and their size correlates with the degree of parallelism (DOP). It also affects how well the work is balanced across query server processes.
Block range granules are the basic unit of most parallel operations, even on partitioned tables. Therefore, from an Oracle Database perspective, the degree of parallelism is not related to the number of partitions.
Block range granules are ranges of physical blocks from a table. Oracle Database computes the number and the size of the granules during runtime to optimize and balance the work distribution for all affected parallel execution servers. The number and size of granules are dependent upon the size of the object and the DOP. Block range granules do not depend on static pre-allocation of tables or indexes. During the computation of the granules, Oracle Database takes the DOP into account and tries to assign granules from different data files to each of the parallel execution servers to avoid contention whenever possible. Additionally, Oracle Database considers the disk affinity of the granules on MPP systems to take advantage of the physical proximity between parallel execution servers and disks.
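As a rough illustration of block range granules, the Python sketch below splits a table's blocks into contiguous ranges whose count depends on the object size and the DOP. The sizing rule is invented for the example: granules_per_server is a hypothetical parameter, and the real computation (including data file and disk affinity) is not described in enough detail here to reproduce.

```python
def block_range_granules(total_blocks, dop, granules_per_server=3):
    """Split blocks 0..total_blocks-1 into contiguous (first, last) ranges."""
    if total_blocks <= 0:
        return []
    # Assumed rule: a few granules per server, never more granules than blocks.
    count = min(total_blocks, dop * granules_per_server)
    base, extra = divmod(total_blocks, count)
    granules, first = [], 0
    for i in range(count):
        size = base + (1 if i < extra else 0)   # spread the remainder evenly
        granules.append((first, first + size - 1))
        first += size
    return granules

granules = block_range_granules(total_blocks=100, dop=4)
print(len(granules), granules[0], granules[-1])
```

Having more granules than servers lets a server that finishes early pick up another granule, which is how one-granule-at-a-time processing balances the work.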
When partition granules are used, a parallel server process works on an entire partition or subpartition of a table or index. Because partition granules are statically determined by the structure of the table or index when a table or index is created, partition granules do not give you the flexibility in parallelizing an operation that block granules do. The maximum allowable DOP is the number of partitions. This might limit the utilization of the system and the load balancing across parallel execution servers.
When partition granules are used for parallel access to a table or index, you should use a relatively large number of partitions (ideally, three times the DOP), so that Oracle Database can effectively balance work across the query server processes.
Partition granules are the basic unit of parallel index range scans, joins between two equipartitioned tables where the query optimizer has chosen to use partition-wise joins, and of parallel operations that modify multiple partitions of a partitioned object. These operations include parallel creation of partitioned indexes, and parallel creation of partitioned tables.
You can tell which types of granules were used by looking at the execution plan of a statement. The line PX BLOCK ITERATOR above the table or index access indicates that block range granules have been used. In the following example plan, you can see this on line 7 of the execution plan just above the TABLE ACCESS FULL on the SALES table.
--------------------------------------------------------------------------------------------------
|Id|Operation                 |Name    |Rows|Bytes|Cost%CPU|Time    |Pst|Pst|TQ   |INOUT|PQDistri|
--------------------------------------------------------------------------------------------------
| 0|SELECT STATEMENT          |        |  17|  153|565(100)|00:00:07|   |   |     |     |        |
| 1| PX COORDINATOR           |        |    |     |        |        |   |   |     |     |        |
| 2|  PX SEND QC(RANDOM)      |:TQ10001|  17|  153|565(100)|00:00:07|   |   |Q1,01|P->S |QC(RAND)|
| 3|   HASH GROUP BY          |        |  17|  153|565(100)|00:00:07|   |   |Q1,01|PCWP |        |
| 4|    PX RECEIVE            |        |  17|  153|565(100)|00:00:07|   |   |Q1,01|PCWP |        |
| 5|     PX SEND HASH         |:TQ10000|  17|  153|565(100)|00:00:07|   |   |Q1,00|P->P |HASH    |
| 6|      HASH GROUP BY       |        |  17|  153|565(100)|00:00:07|   |   |Q1,00|PCWP |        |
| 7|       PX BLOCK ITERATOR  |        | 10M|  85M|  60(97)|00:00:01|  1| 16|Q1,00|PCWC |        |
|*8|        TABLE ACCESS FULL |SALES   | 10M|  85M|  60(97)|00:00:01|  1| 16|Q1,00|PCWP |        |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   8 - filter("CUST_ID"<=22810 AND "CUST_ID">=22300)
When partition granules are used, you will see the line PX PARTITION RANGE above the table or index access in the execution plan. In the example below, on line 6, it says PX PARTITION RANGE ALL because this statement will access all of the 16 partitions in the table. If not all of the partitions are accessed, it will simply say PX PARTITION RANGE.
--------------------------------------------------------------------------------------------------------------
|Id|Operation                               |Name      |Rows|Byte|Cost%CPU|Time    |Ps|Ps|TQ   |INOU|PQDistri|
--------------------------------------------------------------------------------------------------------------
| 0|SELECT STATEMENT                        |          |  17| 153|   2(50)|00:00:01|  |  |     |    |        |
| 1| PX COORDINATOR                         |          |    |    |        |        |  |  |     |    |        |
| 2|  PX SEND QC(RANDOM)                    |:TQ10001  |  17| 153|   2(50)|00:00:01|  |  |Q1,01|P->S|QC(RAND)|
| 3|   HASH GROUP BY                        |          |  17| 153|   2(50)|00:00:01|  |  |Q1,01|PCWP|        |
| 4|    PX RECEIVE                          |          |  26| 234|    1(0)|00:00:01|  |  |Q1,01|PCWP|        |
| 5|     PX SEND HASH                       |:TQ10000  |  26| 234|    1(0)|00:00:01|  |  |Q1,00|P->P|HASH    |
| 6|      PX PARTITION RANGE ALL            |          |  26| 234|    1(0)|00:00:01|  |  |Q1,00|PCWP|        |
| 7|       TABLE ACCESS BY LOCAL INDEX ROWID|SALES     |  26| 234|    1(0)|00:00:01| 1|16|Q1,00|PCWC|        |
|*8|        INDEX RANGE SCAN                |SALES_CUST|  26|    |    1(0)|00:00:01| 1|16|Q1,00|PCWP|        |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   8 - access("CUST_ID"<=22810 AND "CUST_ID">=22300)
To optimize performance, all parallel execution servers should have equal workloads. For SQL statements parallelized by block range or by parallel execution servers, the workload is dynamically divided among the parallel execution servers. This minimizes workload skewing, which occurs when some parallel execution servers perform significantly more work than the other processes.
For the relatively few SQL statements parallelized by partitions, if the workload is evenly distributed among the partitions, you can optimize performance by matching the number of parallel execution servers to the number of partitions or by choosing a DOP in which the number of partitions is a multiple of the number of processes. This applies to partition-wise joins and parallel DML on tables created before Oracle9i Database. See "Limitation on the Degree of Parallelism" for details regarding this topic.
For example, suppose a table has 16 partitions, and a parallel operation divides the work evenly among them. You can use 16 parallel execution servers (DOP equals 16) to do the work in approximately one-sixteenth the time that one process would take. You might also use four processes to do the work in one-fourth the time, or two processes to do the work in one-half the time.
If, however, you use 15 processes to work on 16 partitions, the first process to finish its work on one partition then begins work on the 16th partition; and as the other processes finish their work, they become idle. This configuration does not provide good performance when the work is evenly divided among partitions. When the work is unevenly divided, the performance varies depending on whether the partition that is left for last has more or less work than the other partitions.
Similarly, suppose you use six processes to work on 16 partitions and the work is evenly divided. In this case, each process works on a second partition after finishing its first partition, but only four of the processes work on a third partition while the other two remain idle.
In general, you cannot assume that the time taken to perform a parallel operation on a given number of partitions (N) with a given number of parallel execution servers (P) will be N/P. This formula does not take into account the possibility that some processes might have to wait while others finish working on the last partitions. By choosing an appropriate DOP, however, you can minimize the workload skew and optimize performance.
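The 16-partition examples above can be checked with a little arithmetic. The following Python sketch assumes equally sized partitions that are processed in whole rounds; the function name partition_schedule is hypothetical, and the model ignores startup cost and skew.

```python
import math


def partition_schedule(partitions, dop):
    """Model equal-sized partitions processed in rounds by `dop` servers."""
    rounds = math.ceil(partitions / dop)
    busy_in_last_round = partitions - (rounds - 1) * dop
    return {
        "rounds": rounds,                      # elapsed time in partition units
        "naive_estimate": partitions / dop,    # the N/P figure
        "idle_in_last_round": dop - busy_in_last_round,
    }


for dop in (16, 15, 6, 2):
    print(dop, partition_schedule(16, dop))
```

With 15 servers the operation still needs two rounds and 14 servers sit idle during the second one; with six servers it needs three rounds and two servers are idle in the last round. A DOP that divides the number of partitions evenly leaves no server idle in this model.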
By default, in an Oracle RAC environment, a SQL statement executed in parallel can run across all of the nodes in the cluster. For this cross-node or inter-node parallel execution to perform well, the interconnect in the Oracle RAC environment must be sized appropriately, because inter-node parallel execution may result in a lot of interconnect traffic. If the interconnect has considerably lower bandwidth than the I/O bandwidth from the server to the storage subsystem, it may be better to restrict the parallel execution to a single node or to a limited number of nodes. Inter-node parallel execution will not scale with an undersized interconnect.
Prior to Oracle Database 11g, Release 2 (11.2), the initialization parameters PARALLEL_INSTANCE_GROUP and INSTANCE_GROUPS or database services could be used to limit inter-node parallel execution. In the current release, there is a simpler way to control parallel execution in an Oracle RAC environment using a single initialization parameter, PARALLEL_FORCE_LOCAL. When this parameter is set to TRUE, the parallel server processes can execute only on the Oracle RAC node where the SQL statement was started.
In Oracle Real Application Clusters, services are used to limit the number of instances that participate in a parallel SQL operation. The default service includes all available instances. You can create any number of services, each consisting of one or more instances. Parallel execution servers will only be used on instances which are members of the specified service. See Oracle Real Application Clusters Administration and Deployment Guide for more information about instance groups.
The following types of parallelism are discussed in this section:
You can parallelize queries and subqueries in SELECT statements. You can also parallelize the query portions of DDL statements and DML statements (INSERT, UPDATE, and DELETE). You can also query external tables in parallel.
Parallelization has two components: the decision to parallelize and the DOP. These components are determined differently for queries, DDL operations, and DML operations. To determine the DOP, Oracle Database looks at the reference objects:
Parallel query looks at each table and index, in the portion of the query being parallelized, to determine which is the reference table. The basic rule is to pick the table or index with the largest DOP.
For parallel DML (INSERT, UPDATE, MERGE, and DELETE), the reference object that determines the DOP is the table being modified by an insert, update, merge, or delete operation. Parallel DML also adds some limits to the DOP to prevent deadlock. If the parallel DML statement includes a subquery, the subquery's DOP is the same as the DML operation.
For parallel DDL, the reference object that determines the DOP is the table, index, or partition being created, rebuilt, split, or moved. If the parallel DDL statement includes a subquery, the subquery's DOP is the same as the DDL operation.
See Also:
"Operations That Can Be Parallelized" for information on the query operations that Oracle Database can parallelize
"Parallelizing SQL Statements" for an explanation of how the processes perform parallel queries
"Distributed Transaction Restrictions" for examples of queries that reference a remote object
"Rules for Parallelizing Queries" for information on the conditions for parallelizing a query and the factors that determine the DOP
The following parallel scan methods are supported on index-organized tables:
Parallel fast full scan of a nonpartitioned index-organized table
Parallel fast full scan of a partitioned index-organized table
Parallel index range scan of a partitioned index-organized table
These scan methods can be used for index-organized tables with overflow areas and for index-organized tables that contain LOBs.
Parallel query on a nonpartitioned index-organized table uses parallel fast full scan. The DOP is determined, in decreasing order of priority, by:
A PARALLEL hint (if present)
An ALTER SESSION FORCE PARALLEL QUERY statement
The parallel degree associated with the table, if the parallel degree is specified in the CREATE TABLE or ALTER TABLE statement
The allocation of work is done by dividing the index segment into a sufficiently large number of block ranges and then assigning the block ranges to parallel execution servers in a demand-driven manner. The overflow blocks corresponding to any row are accessed in a demand-driven manner only by the process that owns that row.
Both index range scan and fast full scan can be performed in parallel. For parallel fast full scan, parallelization is exactly the same as for nonpartitioned index-organized tables. For a parallel index range scan on a partitioned index-organized table, the DOP is the minimum of the degree picked up from the previous priority list (like in parallel fast full scan) and the number of partitions in the index-organized table. Depending on the DOP, each parallel execution server gets one or more partitions, each of which contains the primary key index segment and the associated overflow segment, if any.
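The priority list and the partition cap described above can be summarized in a short sketch. The following Python function is an illustrative model, not Oracle Database code: the name iot_scan_dop and its parameters are hypothetical, and returning 1 when no source requests parallelism is an assumption of the sketch.

```python
def iot_scan_dop(hint=None, session_forced=None, table_degree=None,
                 partitions=None):
    """Pick the first DOP source that is set, in decreasing priority.

    Pass `partitions` only for a parallel index range scan on a
    partitioned index-organized table, where it caps the DOP.
    """
    for candidate in (hint, session_forced, table_degree):
        if candidate is not None:
            dop = candidate
            break
    else:
        return 1  # assumption of this sketch: nothing requests parallelism
    if partitions is not None:
        dop = min(dop, partitions)
    return dop


print(iot_scan_dop(hint=8, table_degree=4))          # hint wins
print(iot_scan_dop(table_degree=16, partitions=4))   # capped by partitions
```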
Parallel queries can be performed on object type tables and tables containing object type columns. Parallel query for object types supports all of the features that are available for sequential queries on object types, including:
Methods on object types
Attribute access of object types
Constructors to create object type instances
Object views
PL/SQL and OCI queries for object types
There are no limitations on the size of the object types for parallel queries.
The following restrictions apply to using parallel query for object types.
A MAP function is needed to parallelize queries involving joins and sorts (through ORDER BY, GROUP BY, or set operations). In the absence of a MAP function, the query will automatically be executed serially.
Parallel DML and parallel DDL are not supported with object types, and such statements are always performed serially.
In all cases where the query cannot execute in parallel because of any of these restrictions, the whole query executes serially without giving an error message.
This section discusses some rules for parallelizing queries.
A SELECT statement can be parallelized only if the following conditions are satisfied:
The query includes a parallel hint specification (PARALLEL or PARALLEL_INDEX) or the schema objects referred to in the query have a PARALLEL declaration associated with them.
At least one of the tables specified in the query requires one of the following:
A full table scan
An index range scan spanning multiple partitions
No scalar subqueries are in the SELECT list.
The DOP for a query is determined by the following rules:
The query uses the maximum DOP taken from all of the table declarations involved in the query and all of the potential indexes that are candidates to satisfy the query (the reference objects). That is, the table or index that has the greatest DOP determines the query's DOP (maximum query directive).
If a table has both a parallel hint specification in the query and a parallel declaration in its table specification, the hint specification takes precedence over parallel declaration specification. See Table 8-2 for precedence rules.
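The two rules above, maximum query directive and hint precedence, can be modeled in a few lines. The following Python sketch is illustrative only: the function name query_dop, the dictionaries, and the object names are hypothetical, and the fallback of 1 for a query with no parallel directive is an assumption of the sketch.

```python
def query_dop(declared, hinted=None):
    """Return the DOP of a query from its reference objects.

    declared: {object name: DOP from its PARALLEL declaration}
    hinted:   {object name: DOP from a PARALLEL or PARALLEL_INDEX hint}
    """
    hinted = hinted or {}
    effective = {}
    for name in set(declared) | set(hinted):
        # A hint on an object takes precedence over that object's declaration.
        effective[name] = hinted.get(name, declared.get(name))
    # The reference object with the greatest DOP determines the query's DOP.
    return max(effective.values(), default=1)


print(query_dop({"sales": 4, "customers": 8}))
print(query_dop({"sales": 4, "customers": 8}, hinted={"customers": 2}))
```

In the second call the hint lowers the DOP of customers to 2, so sales, with a declared DOP of 4, becomes the object that determines the query's DOP.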
This section includes the following topics on parallelism for DDL statements:
You can parallelize DDL statements for tables and indexes that are nonpartitioned or partitioned. Table 8-2 summarizes the operations that can be parallelized in DDL statements.
The parallel DDL statements for nonpartitioned tables and indexes are:
CREATE INDEX
CREATE TABLE ... AS SELECT
ALTER INDEX ... REBUILD
The parallel DDL statements for partitioned tables and indexes are:
CREATE INDEX
CREATE TABLE ... AS SELECT
ALTER TABLE ... [MOVE|SPLIT|COALESCE] PARTITION
ALTER INDEX ... [REBUILD|SPLIT] PARTITION
This statement can be executed in parallel only if the (global) index partition being split is usable.
All of these DDL operations can be performed in no-logging mode for either parallel or serial execution.
CREATE TABLE for an index-organized table can be parallelized either with or without an AS SELECT clause.
Different parallelism is used for different operations (see Table 8-2). Parallel CREATE TABLE ... AS SELECT statements on partitioned tables and parallel CREATE INDEX statements on partitioned indexes execute with a DOP equal to the number of partitions.
Parallel DDL cannot occur on tables with object columns. Parallel DDL cannot occur on non-partitioned tables with LOB columns.
Parallel execution lets you parallelize the query and create operations of creating a table as a subquery from another table or set of tables. This can be extremely useful in the creation of summary or rollup tables.
Clustered tables cannot be created and populated in parallel.
Figure 8-4 illustrates creating a table from a subquery in parallel.
Figure 8-4 Creating a Summary Table in Parallel
Parallel DDL is often used to create summary tables or do massive data loads that are stand-alone transactions, which do not always need to be recoverable. By switching off Oracle Database logging, no undo or redo will be generated, so the parallel DDL operation will likely perform better but will become an "all or nothing" operation. In other words, if the operation fails, for whatever reason, you must completely redo the operation; it is not possible to restart it.
If you disable logging during parallel table creation (or any other parallel DDL operation), you should back up the tablespace containing the table once the table is created to avoid loss of the table due to media failure.
Use the NOLOGGING clause of the CREATE TABLE, CREATE INDEX, ALTER TABLE, and ALTER INDEX statements to disable undo and redo log generation.
Creating a table or index in parallel has space management implications that affect both the storage space required during a parallel operation and the free space available after a table or index has been created.
When creating a table or index in parallel, each parallel execution server uses the values in the STORAGE clause of the CREATE statement to create temporary segments to store the rows. Therefore, a table created with a NEXT setting of 4 MB and a PARALLEL DEGREE of 16 consumes at least 64 megabytes (MB) of storage during table creation because each parallel server process starts with an extent of 4 MB. When the parallel execution coordinator combines the segments, some of the segments may be trimmed, and the resulting table may be smaller than the requested 64 MB.
When you create indexes and tables in parallel, each parallel execution server allocates a new extent and fills the extent with the table or index data. Thus, if you create an index with a DOP of 4, the index will have at least four extents initially. This allocation of extents is the same for rebuilding indexes in parallel and for moving, splitting, or rebuilding partitions in parallel.
Serial operations require the schema object to have at least one extent. Parallel creations require that tables or indexes have at least as many extents as there are parallel execution servers creating the schema object.
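The storage arithmetic in the preceding paragraphs can be written out as a brief sketch. The Python function below is a hypothetical helper, not an Oracle Database utility; it assumes that every parallel execution server starts with exactly one extent of the NEXT size.

```python
def space_during_parallel_create(next_extent_mb, dop):
    """Return (minimum MB consumed during creation, minimum initial extents).

    Assumes each parallel execution server allocates one extent of
    `next_extent_mb` megabytes before any trimming takes place.
    """
    minimum_space_mb = next_extent_mb * dop
    minimum_extents = dop
    return minimum_space_mb, minimum_extents


# NEXT of 4 MB with a DOP of 16, as in the example above.
print(space_during_parallel_create(4, 16))
```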
When you create a table or index in parallel, it is possible to create pockets of free space. This occurs when the temporary segments used by the parallel execution servers are larger than what is needed to store the rows.
If the unused space in each temporary segment is larger than the value of the MINIMUM EXTENT parameter set at the tablespace level, then Oracle Database trims the unused space when merging rows from all of the temporary segments into the table or index. The unused space is returned to the system free space and can be allocated for new extents, but it cannot be coalesced into a larger segment because it is not contiguous space (external fragmentation).
If the unused space in each temporary segment is smaller than the value of the MINIMUM EXTENT parameter, then unused space cannot be trimmed when the rows in the temporary segments are merged. This unused space is not returned to the system free space; it becomes part of the table or index (internal fragmentation) and is available only for subsequent inserts or for updates that require additional space.
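The two cases above amount to a comparison against MINIMUM EXTENT. The following Python sketch restates that rule; the function name and the returned labels are hypothetical, and because the text does not say what happens when the two values are equal, the sketch reports that case as unspecified.

```python
def unused_space_outcome(unused_kb, minimum_extent_kb):
    """Classify the unused space left in one temporary segment."""
    if unused_kb > minimum_extent_kb:
        # Trimmed and returned to system free space, but not contiguous.
        return "trimmed (external fragmentation)"
    if unused_kb < minimum_extent_kb:
        # Stays inside the table or index, usable only by later inserts
        # or by updates that need additional space.
        return "kept (internal fragmentation)"
    return "unspecified"  # equality is not covered by the description


print(unused_space_outcome(unused_kb=512, minimum_extent_kb=128))
print(unused_space_outcome(unused_kb=64, minimum_extent_kb=128))
```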
For example, if you specify a DOP of 3 for a CREATE TABLE ... AS SELECT statement, but there is only one data file in the tablespace, then internal fragmentation may occur, as shown in Figure 8-5. The pockets of free space within the internal table extents of a data file cannot be coalesced with other free space and cannot be allocated as extents.
See Oracle Database Performance Tuning Guide for more information about creating tables and indexes in parallel.
Figure 8-5 Unusable Free Space (Internal Fragmentation)
You need to keep the following in mind when parallelizing DDL statements.
DDL operations can be parallelized if a PARALLEL clause (declaration) is specified in the syntax. In the case of CREATE INDEX and ALTER INDEX ... REBUILD or ALTER INDEX ... REBUILD PARTITION, the parallel declaration is stored in the data dictionary.
You can use the ALTER SESSION FORCE PARALLEL DDL statement to override the parallel clauses of subsequent DDL statements in a session.
The DOP is determined by the specification in the PARALLEL clause, unless it is overridden by an ALTER SESSION FORCE PARALLEL DDL statement. A rebuild of a partitioned index is never parallelized.
Parallel clauses in CREATE TABLE and ALTER TABLE statements specify table parallelism. If a parallel clause exists in a table definition, it determines the parallelism of DDL statements as well as queries. If the DDL statement contains explicit parallel hints for a table, however, those hints override the effect of parallel clauses for that table. You can use the ALTER SESSION FORCE PARALLEL DDL statement to override parallel clauses.
The following rules apply:
The CREATE INDEX and ALTER INDEX ... REBUILD statements can be parallelized only by a PARALLEL clause or an ALTER SESSION FORCE PARALLEL DDL statement.
ALTER INDEX ... REBUILD can be parallelized only for a nonpartitioned index, but ALTER INDEX ... REBUILD PARTITION can be parallelized by a PARALLEL clause or an ALTER SESSION FORCE PARALLEL DDL statement.
The scan operation for ALTER INDEX ... REBUILD (nonpartitioned), ALTER INDEX ... REBUILD PARTITION, and CREATE INDEX has the same parallelism as the REBUILD or CREATE operation and uses the same DOP. If the DOP is not specified for REBUILD or CREATE, the default is the number of CPUs.
The ALTER INDEX ... MOVE PARTITION and ALTER INDEX ... SPLIT PARTITION statements can be parallelized only by a PARALLEL clause or an ALTER SESSION FORCE PARALLEL DDL statement. Their scan operations have the same parallelism as the corresponding MOVE or SPLIT operations. If the DOP is not specified, the default is the number of CPUs.
The CREATE TABLE ... AS SELECT statement contains two parts: a CREATE part (DDL) and a SELECT part (query). Oracle Database can parallelize both parts of the statement. The CREATE part follows the same rules as other DDL operations.
The query part of a CREATE TABLE ... AS SELECT statement can be parallelized only if the following conditions are satisfied:
The query includes a parallel hint specification (PARALLEL or PARALLEL_INDEX) or the CREATE part of the statement has a PARALLEL clause specification or the schema objects referred to in the query have a PARALLEL declaration associated with them.
At least one of the tables specified in the query requires one of the following: a full table scan or an index range scan spanning multiple partitions.
The DOP for the query part of a CREATE TABLE ... AS SELECT statement is determined by one of the following rules:
The query part uses the values specified in the PARALLEL clause of the CREATE part.
If the PARALLEL clause is not specified, the default DOP is the number of CPUs.
If the CREATE is serial, then the DOP is determined by the query.
Note that any values specified in a hint for parallelism are ignored.
The CREATE operation of CREATE TABLE ... AS SELECT can be parallelized only by a PARALLEL clause or an ALTER SESSION FORCE PARALLEL DDL statement.
When the CREATE operation of CREATE TABLE ... AS SELECT is parallelized, Oracle Database also parallelizes the scan operation if possible. The scan operation cannot be parallelized if, for example:
The SELECT clause has a NO_PARALLEL hint
The operation scans an index of a nonpartitioned table
When the CREATE operation is not parallelized, the SELECT can be parallelized if it has a PARALLEL hint or if the selected table (or partitioned index) has a parallel declaration.
The DOP for the CREATE operation, and for the SELECT operation if it is parallelized, is specified by the PARALLEL clause of the CREATE statement, unless it is overridden by an ALTER SESSION FORCE PARALLEL DDL statement. If the PARALLEL clause does not specify the DOP, the default is the number of CPUs.
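The DOP rule for the CREATE operation can be sketched as a small decision function. This Python model is illustrative only: the names ctas_dop and DEFAULT are hypothetical, cpu_count stands in for the number of CPUs on the system, and returning None for a serial CREATE is a convention of the sketch.

```python
DEFAULT = "default"  # marker for a PARALLEL clause written without a degree


def ctas_dop(parallel_clause=None, session_forced=None, cpu_count=1):
    """Return the DOP of a parallelized CREATE operation, or None if serial.

    parallel_clause: None (no clause), DEFAULT, or an explicit integer degree.
    session_forced:  degree from ALTER SESSION FORCE PARALLEL DDL, if issued.
    cpu_count:       stand-in for the number of CPUs on the system.
    """
    if session_forced is not None:
        return session_forced      # the session setting overrides the clause
    if parallel_clause is None:
        return None                # no clause and no override: serial CREATE
    if parallel_clause == DEFAULT:
        return cpu_count           # degree omitted: default to CPU count
    return parallel_clause


print(ctas_dop(parallel_clause=4))
print(ctas_dop(parallel_clause=4, session_forced=8))
print(ctas_dop(parallel_clause=DEFAULT, cpu_count=12))
```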
Parallel DML (PARALLEL INSERT, UPDATE, DELETE, and MERGE) uses parallel execution mechanisms to speed up or scale up large DML operations against large database tables and indexes.
Note:
Although DML generally includes queries, in this chapter the term DML refers only to inserts, updates, merges, and deletes.
This section discusses the following parallel DML topics:
Parallel DML is useful in a DSS environment where the performance and scalability of accessing large objects are important. Parallel DML complements parallel query in providing you with both querying and updating capabilities for your DSS databases.
The overhead of setting up parallelism makes parallel DML operations infeasible for short OLTP transactions. However, parallel DML operations can speed up batch jobs running in an OLTP database.
Some of the scenarios where parallel DML is used include:
In a data warehouse system, large tables need to be refreshed (updated) periodically with new or modified data from the production system. You can do this efficiently by using the MERGE statement.
In a DSS environment, many applications require complex computations that involve constructing and manipulating many large intermediate summary tables. These summary tables are often temporary and frequently do not need to be logged. Parallel DML can speed up the operations against these large intermediate tables. One benefit is that you can put incremental results in the intermediate tables and perform parallel update.
In addition, the summary tables may contain cumulative or comparison information which has to persist beyond application sessions; thus, temporary tables are not feasible. Parallel DML operations can speed up the changes to these large summary tables.
Many DSS applications score customers periodically based on a set of criteria. The scores are usually stored in large DSS tables. The score information is then used in making a decision, for example, inclusion in a mailing list.
This scoring activity queries and updates a large number of rows in the table. Parallel DML can speed up the operations against these large tables.
Historical tables describe the business transactions of an enterprise over a recent time interval. Periodically, the DBA deletes the set of oldest rows and inserts a set of new rows into the table. Parallel INSERT ... SELECT and parallel DELETE operations can speed up this rollover task.
Dropping a partition can also be used to delete old rows. However, to do this, the table has to be partitioned by date and with the appropriate time interval.
Batch jobs executed in an OLTP database during off hours have a fixed time window in which the jobs must complete. A good way to ensure timely job completion is to parallelize their operations. As the work load increases, more computer resources can be added; the scaleup property of parallel operations ensures that the time constraint can be met.
A DML statement can be parallelized only if you have explicitly enabled parallel DML in the session, for example, by issuing the following statement:
ALTER SESSION ENABLE PARALLEL DML;
This mode is required because parallel DML and serial DML have different locking, transaction, and disk space requirements and parallel DML is disabled for a session by default.
When parallel DML is disabled, no DML will be executed in parallel even if the PARALLEL hint is used.
When parallel DML is enabled in a session, all DML statements in this session will be considered for parallel execution. However, even if parallel DML is enabled, the DML operation may still execute serially if there are no parallel hints or no tables with a parallel attribute or if restrictions on parallel operations are violated.
The session's PARALLEL DML mode does not influence the parallelism of SELECT statements, DDL statements, and the query portions of DML statements. Thus, if this mode is not set, the DML operation is not parallelized, but scans or join operations within the DML statement may still be parallelized.
UPDATE, MERGE, and DELETE operations are parallelized by partition or subpartition. Update, merge, and delete parallelism are not possible within a partition, nor on a nonpartitioned table. See "Limitation on the Degree of Parallelism" for a possible restriction.
You have two ways to specify parallel directives for UPDATE, MERGE, and DELETE operations (assuming that PARALLEL DML mode is enabled):
Use a parallel clause in the definition of the table being updated or deleted (the reference object).
Use an update, merge, or delete parallel hint in the statement.
Parallel hints are placed immediately after the UPDATE, MERGE, or DELETE keywords in UPDATE, MERGE, and DELETE statements. The hint also applies to the underlying scan of the table being changed.
You can use the ALTER SESSION FORCE PARALLEL DML statement to override parallel clauses for subsequent UPDATE, MERGE, and DELETE statements in a session. Parallel hints in UPDATE, MERGE, and DELETE statements override the ALTER SESSION FORCE PARALLEL DML statement.
The following rule determines whether the UPDATE, MERGE, or DELETE operation should be parallelized:
The UPDATE or DELETE operation will be parallelized if and only if at least one of the following is true:
The table being updated or deleted has a PARALLEL specification.
The PARALLEL hint is specified in the DML statement.
An ALTER SESSION FORCE PARALLEL DML statement has been issued previously during the session.
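The decision rule above, combined with the requirement that parallel DML be enabled in the session, can be expressed as a simple predicate. The following Python sketch is a model only, with hypothetical names; it does not account for the other restrictions on parallel DML that can still force serial execution.

```python
def update_delete_runs_parallel(parallel_dml_enabled, table_parallel=False,
                                hint=False, session_forced=False):
    """Model the decision to parallelize an UPDATE or DELETE operation."""
    if not parallel_dml_enabled:
        # Without ALTER SESSION ENABLE PARALLEL DML, no DML runs in
        # parallel, even if a PARALLEL hint is present.
        return False
    return table_parallel or hint or session_forced


print(update_delete_runs_parallel(True, hint=True))    # True
print(update_delete_runs_parallel(False, hint=True))   # False
print(update_delete_runs_parallel(True))               # False
```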
If the statement contains subqueries or updatable views, then they may have their own separate parallel hints or clauses. However, these parallel directives do not affect the decision to parallelize the UPDATE, MERGE, or DELETE.
Although the parallel hint or clause on the tables is used by both the query and the UPDATE, MERGE, or DELETE portions to determine parallelism, the decision to parallelize the UPDATE, MERGE, or DELETE portion is made independently of the query portion, and vice versa.
The DOP is determined by the same rules as for the queries. Note that, in the case of UPDATE and DELETE operations, only the target table to be modified (the only reference object) is involved. Thus, the UPDATE or DELETE parallel hint specification takes precedence over the parallel declaration specification of the target table. In other words, the precedence order is: MERGE, UPDATE, DELETE hint > Session > Parallel declaration specification of target table. See Table 8-2 for precedence rules.
If the DOP is less than the number of partitions, then the first process to finish work on one partition continues working on another partition, and so on until the work is finished on all partitions. If the DOP is greater than the number of partitions involved in the operation, then the excess parallel execution servers will have no work to do.
Example 8-1 Parallelization: Example 1
UPDATE tbl_1 SET c1=c1+1 WHERE c1>100;
If tbl_1 is a partitioned table and its table definition has a parallel clause, then the update operation is parallelized even if the scan on the table is serial (such as an index scan), assuming that the table has more than one partition with c1 greater than 100.
An INSERT ... SELECT statement parallelizes its INSERT and SELECT operations independently, except for the DOP.
You can specify a parallel hint after the INSERT keyword in an INSERT ... SELECT statement. Because the tables being queried are usually not the same as the table being inserted into, the hint enables you to specify parallel directives specifically for the insert operation.
You have the following ways to specify parallel directives for an INSERT ... SELECT statement (assuming that PARALLEL DML mode is enabled):
SELECT parallel hints specified at the statement
Parallel clauses specified in the definition of tables being selected
INSERT parallel hint specified at the statement
Parallel clause specified in the definition of tables being inserted into
You can use the ALTER SESSION FORCE PARALLEL DML statement to override parallel clauses for subsequent INSERT operations in a session. Parallel hints in insert operations override the ALTER SESSION FORCE PARALLEL DML statement.
The following rule determines whether the INSERT operation should be parallelized in an INSERT ... SELECT statement:
The INSERT operation will be parallelized if and only if at least one of the following is true:
The PARALLEL hint is specified after the INSERT in the DML statement.
The table being inserted into (the reference object) has a PARALLEL declaration specification.
An ALTER SESSION FORCE PARALLEL DML statement has been issued previously during the session.
The decision to parallelize the INSERT operation is made independently of the SELECT operation, and vice versa.
Once the decision to parallelize the SELECT or INSERT operation is made, one parallel directive is picked for deciding the DOP of the whole statement, using the following precedence rule: Insert hint directive > Session > Parallel declaration specification of the inserting table > Maximum query directive.
In this context, maximum query directive means that among multiple tables and indexes, the table or index that has the maximum DOP determines the parallelism for the query operation.
The chosen parallel directive is applied to both the SELECT and INSERT operations.
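The precedence rule for an INSERT ... SELECT statement can be expressed as a small lookup. This Python sketch is a simplified model under stated assumptions: the function and parameter names are hypothetical, and every directive is assumed to carry an explicit degree (a directive that is absent is passed as `None`).

```python
def choose_insert_select_dop(insert_hint=None, session_force=None,
                             insert_table_decl=None, query_object_dops=()):
    """Pick the directive that sets the DOP of the whole statement.

    Precedence: insert hint > session > parallel declaration of the table
    being inserted into > maximum query directive.
    Returns (source, dop), or None when no directive is present.
    """
    candidates = [
        ("insert hint", insert_hint),
        ("session", session_force),
        ("insert table declaration", insert_table_decl),
        # Maximum query directive: the largest DOP among the queried
        # tables and indexes.
        ("maximum query directive", max(query_object_dops, default=None)),
    ]
    for source, dop in candidates:
        if dop is not None:
            return source, dop
    return None

# The insert hint wins over a session-level setting.
print(choose_insert_select_dop(insert_hint=4, session_force=8))
# With no hint or session setting, the inserted table's declaration is
# used even if a queried table declares a higher DOP.
print(choose_insert_select_dop(insert_table_decl=2, query_object_dops=(4, 16)))
```

The chosen degree is then applied to both the SELECT and the INSERT operation, as the text states.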
To execute a DML operation in parallel, the parallel execution coordinator acquires parallel execution servers, and each parallel execution server executes a portion of the work under its own parallel process transaction.
Each parallel execution server creates a different parallel process transaction.
If you use rollback segments instead of Automatic Undo Management, you may want to reduce contention on the rollback segments by limiting the number of parallel process transactions residing in the same rollback segment. See Oracle Database SQL Language Reference for more information.
The coordinator also has its own coordinator transaction, which can have its own rollback segment. In order to ensure user-level transactional atomicity, the coordinator uses a two-phase commit protocol to commit the changes performed by the parallel process transactions.
A session that is enabled for parallel DML may put transactions in the session in a special mode: If any DML statement in a transaction modifies a table in parallel, no subsequent serial or parallel query or DML statement can access the same table again in that transaction. This means that the results of parallel modifications cannot be seen during the transaction.
Serial or parallel statements that attempt to access a table that has already been modified in parallel within the same transaction are rejected with an error message.
If a PL/SQL procedure or block is executed in a parallel DML enabled session, then this rule applies to statements in the procedure or block.
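The access rule above behaves like a small state machine that is tracked for each transaction. The Python class below is a toy model for illustration only: the class name, method names, and the exception type are hypothetical and do not correspond to any Oracle API or error code.

```python
class ParallelDmlTransaction:
    """Toy model of one transaction in a session enabled for parallel DML."""

    def __init__(self):
        # Tables already modified in parallel within this transaction.
        self.modified_in_parallel = set()

    def access(self, table):
        """Any serial or parallel query or DML statement touching `table`."""
        if table in self.modified_in_parallel:
            raise RuntimeError(
                f"{table} was already modified in parallel in this transaction")

    def parallel_dml(self, table):
        """A DML statement that modifies `table` in parallel."""
        self.access(table)  # the DML statement is itself an access
        self.modified_in_parallel.add(table)

    def commit(self):
        """Ending the transaction lifts the restriction."""
        self.modified_in_parallel.clear()


txn = ParallelDmlTransaction()
txn.access("sales")        # allowed: query before the parallel DML
txn.parallel_dml("sales")  # table is now modified in parallel
txn.parallel_dml("costs")  # allowed: a different table
try:
    txn.access("sales")    # rejected: same table, same transaction
except RuntimeError as err:
    print("rejected:", err)
txn.commit()
txn.access("sales")        # allowed again in the next transaction
```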
If you use rollback segments instead of Automatic Undo Management, there are some restrictions when using parallel DML. See Oracle Database SQL Language Reference for information about restrictions for parallel DML and rollback segments.
The time required to roll back a parallel DML operation is roughly equal to the time it takes to perform the forward operation.
Oracle Database supports parallel rollback after transaction and process failures, and after instance and system failures. Oracle Database can parallelize both the rolling forward stage and the rolling back stage of transaction recovery.
See Oracle Database Backup and Recovery User's Guide for details about parallel rollback.
A user-issued rollback, or a rollback after a transaction failure due to a statement error, is performed in parallel by the parallel execution coordinator and the parallel execution servers. The rollback takes approximately the same amount of time as the forward transaction.
Recovery from the failure of a parallel execution coordinator or parallel execution server is performed by the PMON process. If a parallel execution server or a parallel execution coordinator fails, PMON rolls back the work from that process and all other processes in the transaction roll back their changes.
Recovery from a system failure requires a new startup. Recovery is performed by the SMON process and any recovery server processes spawned by SMON. Parallel DML statements may be recovered using parallel rollback. If the initialization parameter COMPATIBLE is set to 8.1.3 or greater, Fast-Start On-Demand Rollback enables terminated transactions to be recovered on demand, one block at a time.
Parallel UPDATE will use the existing free space in the object, while direct-path INSERT gets new extents for the data.
Space usage characteristics may differ between parallel and sequential execution because multiple concurrent child transactions modify the object.
The following restrictions apply to parallel DML (including direct-path INSERT):
Intra-partition parallelism for UPDATE, MERGE, and DELETE operations requires that the COMPATIBLE initialization parameter be set to 9.2 or greater.
The INSERT VALUES statement is never parallelized.
A transaction can contain multiple parallel DML statements that modify different tables, but after a parallel DML statement modifies a table, no subsequent serial or parallel statement (DML or query) can access the same table again in that transaction.
This restriction also exists after a serial direct-path INSERT statement: no subsequent SQL statement (DML or query) can access the modified table during that transaction.
Queries that access the same table are allowed before a parallel DML or direct-path INSERT statement, but not after.
Any serial or parallel statements attempting to access a table that has already been modified by a parallel UPDATE, DELETE, or MERGE, or a direct-path INSERT during the same transaction are rejected with an error message.
Parallel DML operations cannot be done on tables with triggers.
Replication functionality is not supported for parallel DML.
Parallel DML cannot occur in the presence of certain constraints: self-referential integrity, delete cascade, and deferred integrity. In addition, for direct-path INSERT, there is no support for any referential integrity.
Parallel DML can be done on tables with object columns provided you are not touching the object columns.
Parallel DML can be done on tables with LOB columns provided the table is partitioned. However, intra-partition parallelism is not supported.
A transaction involved in a parallel DML operation cannot be or become a distributed transaction.
Clustered tables are not supported.
Parallel UPDATE, DELETE, and MERGE operations are not supported for temporary tables.
Violations of these restrictions cause the statement to execute serially without warnings or error messages (except for the restriction on statements accessing the same table in a transaction, which can cause error messages). For example, an update is serialized if it is on a nonpartitioned table.
By default, you can update the partitioning key of a partitioned table to a new value only if the update does not cause the row to move to a new partition. An update that moves the row to a new partition is possible only if the table is defined with the row movement clause enabled.
The function restrictions for parallel DML are the same as those for parallel DDL and parallel query. See "Parallel Execution of Functions" for more information.
This section describes the interactions of integrity constraints and parallel DML statements.
These types of integrity constraints are allowed. They are not a problem for parallel DML because they are enforced on the column and row level, respectively.
Restrictions for referential integrity occur whenever a DML operation on one table could cause a recursive DML operation on another table. These restrictions also apply when, in order to perform an integrity check, it is necessary to see simultaneously all changes made to the object being modified.
Table 8-1 lists all of the operations that are possible on tables that are involved in referential integrity constraints.
Table 8-1 Referential Integrity Restrictions
DML Statement | Issued on Parent | Issued on Child | Self-Referential |
---|---|---|---|
| | (Not applicable) | Not parallelized | Not parallelized |
| | (Not applicable) | Not parallelized | Not parallelized |
| | Supported | Supported | Not parallelized |
| | Supported | Supported | Not parallelized |
| | Not parallelized | (Not applicable) | Not parallelized |
Delete on tables having a foreign key with delete cascade is not parallelized because parallel execution servers will try to delete rows from multiple partitions (parent and child tables).
DML on tables with self-referential integrity constraints is not parallelized if the referenced keys (primary keys) are involved. For DML on all other columns, parallelism is possible.
A DML operation will not be parallelized if the affected tables contain enabled triggers that may get fired as a result of the statement. This implies that DML statements on tables that are being replicated will not be parallelized.
Relevant triggers must be disabled in order to parallelize DML on the table. Note that, if you enable or disable triggers, the dependent shared cursors are invalidated.
A DML operation cannot be parallelized if it is in a distributed transaction or if the DML or the query operation is against a remote object.
This section contains several examples of distributed transaction processing.
Example 8-4 Distributed Transaction Parallelization
In this example, the DML statement queries a remote object:
INSERT /*+ APPEND PARALLEL (t3,2) */ INTO t3 SELECT * FROM t4@dblink;
The query operation is executed serially without notification because it references a remote object.
Example 8-5 Distributed Transaction Parallelization
In this example, the DML operation is applied to a remote object:
DELETE /*+ PARALLEL (t1, 2) */ FROM t1@dblink;
The DELETE operation is not parallelized because it references a remote object.
Example 8-6 Distributed Transaction Parallelization
In this example, the DML operation is in a distributed transaction:
SELECT * FROM t1@dblink;
DELETE /*+ PARALLEL (t2,2) */ FROM t2;
COMMIT;
The DELETE operation is not parallelized because it occurs in a distributed transaction (which is started by the SELECT statement).
SQL statements can contain user-defined functions written in PL/SQL, in Java, or as external procedures in C that can appear as part of the SELECT list, SET clause, or WHERE clause. When the SQL statement is parallelized, these functions are executed on a per-row basis by the parallel execution server process. Any PL/SQL package variables or Java static attributes used by the function are entirely private to each individual parallel execution process and are newly initialized when each row is processed, rather than being copied from the original session. Because of this, not all functions will generate correct results if executed in parallel.
User-written table functions can appear in the statement's FROM list. These functions act like source tables in that they output rows. Table functions are initialized once during the statement at the start of each parallel execution process. All variables are entirely private to the parallel execution process.
In a SELECT statement or a subquery in a DML or DDL statement, a user-written function may be executed in parallel if it has been declared with the PARALLEL_ENABLE keyword, if it is declared in a package or type and has a PRAGMA RESTRICT_REFERENCES that indicates all of WNDS, RNPS, and WNPS, or if it is declared with CREATE FUNCTION and the system can analyze the body of the PL/SQL code and determine that the code neither writes to the database nor reads or modifies package variables.
Other parts of a query or subquery can sometimes execute in parallel even if a given function execution must remain serial.
See Oracle Database Advanced Application Developer's Guide for information about the PRAGMA RESTRICT_REFERENCES and Oracle Database SQL Language Reference for information about CREATE FUNCTION.
In a parallel DML or DDL statement, as in a parallel query, a user-written function may be executed in parallel if it has been declared with the PARALLEL_ENABLE keyword, if it is declared in a package or type and has a PRAGMA RESTRICT_REFERENCES that indicates all of RNDS, WNDS, RNPS, and WNPS, or if it is declared with CREATE FUNCTION and the system can analyze the body of the PL/SQL code and determine that the code neither reads from nor writes to the database, and neither reads nor modifies package variables.
For a parallel DML statement, any function call that cannot be executed in parallel causes the entire DML statement to be executed serially.
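The eligibility rules for user-written functions can be summarized as a single predicate. The following Python sketch is a simplified model, not Oracle's implementation: the function name, the `context` values, and the `body_proven_safe` flag (standing in for the system's own analysis of a function created with CREATE FUNCTION) are hypothetical, and the sketch assumes the pragma was declared on a function in a package or type.

```python
# Purity assertions required by PRAGMA RESTRICT_REFERENCES in each context.
QUERY_PURITY = {"WNDS", "RNPS", "WNPS"}
DML_DDL_PURITY = {"RNDS", "WNDS", "RNPS", "WNPS"}

def function_may_run_in_parallel(context, parallel_enable=False,
                                 restrict_references=frozenset(),
                                 body_proven_safe=False):
    """Return True if a user-written function may be executed in parallel.

    context: "query" for a SELECT statement or a subquery,
             "dml_ddl" for a parallel DML or DDL statement.
    """
    if context == "query":
        required = QUERY_PURITY
    elif context == "dml_ddl":
        required = DML_DDL_PURITY
    else:
        raise ValueError(f"unknown context: {context!r}")
    return (parallel_enable
            or required <= set(restrict_references)
            or body_proven_safe)

pragma = {"WNDS", "RNPS", "WNPS"}
print(function_may_run_in_parallel("query", restrict_references=pragma))    # True
print(function_may_run_in_parallel("dml_ddl", restrict_references=pragma))  # False: RNDS missing
```

For a parallel DML statement, a `False` result for any function call means the entire statement runs serially, as noted above.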
For an INSERT ... SELECT or CREATE TABLE ... AS SELECT statement, function calls in the query portion are parallelized according to the parallel query rules in the prior paragraph. The query may be parallelized even if the remainder of the statement must execute serially, or vice versa.
In addition to parallel SQL execution, Oracle Database can use parallelism for the following types of operations:
Parallel recovery
Parallel propagation (replication)
Parallel load (external tables and the SQL*Loader utility)
Like parallel SQL, parallel recovery, propagation, and external table loads are performed by a parallel execution coordinator and multiple parallel execution servers. Parallel load using SQL*Loader, however, uses a different mechanism.
The behavior of the parallel execution coordinator and parallel execution servers may differ, depending on what kind of operation they perform (SQL, recovery, or propagation). For example, if all parallel execution servers in the pool are occupied and the maximum number of parallel execution servers has been started:
In parallel SQL and external table loads, the parallel execution coordinator switches to serial processing.
In parallel propagation, the parallel execution coordinator returns an error.
For a given session, the parallel execution coordinator coordinates only one kind of operation. A parallel execution coordinator cannot coordinate, for example, parallel SQL and parallel recovery or propagation at the same time.
See Also:
Oracle Database Utilities for information about parallel load and SQL*Loader
Oracle Database Backup and Recovery User's Guide for information about parallel media recovery
Oracle Database Performance Tuning Guide for information about parallel instance recovery
Oracle Database Advanced Replication for information about parallel propagation
Table 8-2 shows how various types of SQL statements can be parallelized and indicates which methods of specifying parallelism take precedence.
The priority (1) specification overrides priority (2) and priority (3).
The priority (2) specification overrides priority (3).
Table 8-2 Parallelization Priority Order: By Clause, Hint, or Underlying Table/Index Declaration
Parallel Operation | PARALLEL Hint | PARALLEL Clause | ALTER SESSION | Parallel Declaration |
---|---|---|---|---|
Parallel query table scan (partitioned or nonpartitioned table) |
1) |
2) |
3) of table |
|
Parallel query index range scan (partitioned index) |
1) |
2) |
2) of index |
|
Parallel |
1) |
3) of table being updated or deleted from |
||
|
1) |
3) of table being inserted into |
||
|
Takes degree from |
Takes degree from |
Takes degree from |
Takes degree from |
|
1) |
2) of table being selected from |
||
|
Note: Hint in the |
2) |
||
|
Takes degree from |
Takes degree from |
Takes degree from |
Takes degree from |
|
1) |
2) of querying tables or partitioned indexes |
||
Parallel |
2) |
|||
Parallel |
2) |
1) |
||
|
||||
Parallel |
2) |
1) |
||
Parallel |
2) |
Oracle Database computes defaults for the parallel execution parameters based on the value at database startup of CPU_COUNT and PARALLEL_THREADS_PER_CPU. The parameters can also be manually tuned, increasing or decreasing their values to suit specific system configurations or performance goals. For example:
On systems where parallel execution will never be used, PARALLEL_MAX_SERVERS can be set to zero.
On large systems with abundant SGA memory, PARALLEL_EXECUTION_MESSAGE_SIZE can be increased to improve throughput.
Parallel execution is enabled by default.
Initializing and tuning parallel execution involves the following steps:
By default, Oracle Database automatically sets parallel execution parameters, as shown in Table 8-3.
Table 8-3 Parameters and Their Defaults
Parameter | Default | Comments |
---|---|---|
| | | Causes parallel execution SQL to throttle DOP requests to prevent system overload. |
| | | Controls the maximum DOP a statement can have when |
| | | Controls whether or not auto DOP, statement queuing and in-memory parallel execution are used. By default, all of these features are disabled. |
| | 16 KB | The size of the buffers used by the parallel execution servers to communicate among themselves and with the query coordinator. These buffers will be allocated out of the shared pool. |
| | | Restricts parallel execution to the current RAC instance. |
| | | Specifies the maximum number of parallel execution processes and parallel recovery processes for an instance. As demand increases, Oracle Database increases the number of processes from the number created at instance startup up to this value. If you set this parameter too low, some queries may not have a parallel execution process available to them during query processing. If you set it too high, memory resource shortages may occur during peak periods, which can degrade performance. The value of |
| | 0 | Specifies the number of parallel execution processes to be started and reserved for parallel operations when Oracle Database is started up. Increasing this setting can help balance the startup cost of a parallel statement, but will require greater memory usage as these parallel execution processes will not be removed until the database is shut down. |
| | 0 | Specifies the minimum percentage of requested parallel execution processes required for parallel execution. With the default value of 0, a parallel statement will execute serially if no parallel server processes are available. |
| | 30 seconds | The execution time, as estimated by the optimizer, above which a statement is considered for automatic parallel query and automatic derivation of DOP. |
| | | The number of parallel execution server processes available to run queries before statement queuing is used. Note that statement queuing will only be active if |
| | 2 | Describes the number of parallel execution processes or threads that a CPU can handle during parallel execution. |
Note that you can set some parameters in such a way that Oracle Database will be constrained. For example, if you set PROCESSES to 20, you will not be able to get 25 slaves.
If you are sure you want to execute in parallel and want to avoid setting the DOP for a table or modifying the queries involved, you can force parallelism with the following statement:
ALTER SESSION FORCE PARALLEL QUERY;
All subsequent queries will be executed in parallel provided no restrictions are violated. You can also force DML and DDL statements. This clause overrides any parallel clause specified in subsequent statements in the session, but is overridden by a parallel hint.
In typical OLTP environments, for example, the tables are not set parallel, but nightly batch scripts may want to collect data from these tables in parallel. By setting the DOP in the session, the user avoids altering each table in parallel and then altering it back to serial when finished.
This section discusses the following topics:
The parameters that establish resource limits are:
The recommended value for the PARALLEL_FORCE_LOCAL parameter is FALSE.
This parameter specifies whether or not a SQL statement executed in parallel will be restricted to a single instance in an Oracle RAC environment. By setting this parameter to TRUE, you restrict the scope of the parallel server processes to the single RAC instance where the query coordinator is running.
The default value for PARALLEL_MAX_SERVERS is as follows:
(CPU_COUNT x PARALLEL_THREADS_PER_CPU x (2 if PGA_AGGREGATE_TARGET > 0; otherwise 1) x 5)
The value of PGA_AGGREGATE_TARGET can be derived when MEMORY_TARGET is set. See Oracle Database Performance Tuning Guide for descriptions of how to use PGA_AGGREGATE_TARGET in different scenarios. The above formula might not be sufficient for parallel queries on tables with higher DOP attributes. Users who expect to run queries with a higher DOP should set PARALLEL_MAX_SERVERS as follows:
2 x DOP x NUMBER_OF_CONCURRENT_USERS
For example, setting PARALLEL_MAX_SERVERS to 64 will allow you to run four parallel queries simultaneously, assuming that each query is using two slave sets with a DOP of eight for each set.
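The two formulas above are plain arithmetic, so they can be checked quickly. The Python sketch below only restates the formulas from the text; the function names and the sample values for CPU_COUNT and PARALLEL_THREADS_PER_CPU are assumptions made for illustration.

```python
def default_parallel_max_servers(cpu_count, threads_per_cpu, pga_aggregate_target):
    """CPU_COUNT x PARALLEL_THREADS_PER_CPU x (2 if PGA_AGGREGATE_TARGET > 0; otherwise 1) x 5"""
    pga_factor = 2 if pga_aggregate_target > 0 else 1
    return cpu_count * threads_per_cpu * pga_factor * 5

def recommended_parallel_max_servers(dop, concurrent_users):
    """2 x DOP x NUMBER_OF_CONCURRENT_USERS (two server sets per query)."""
    return 2 * dop * concurrent_users

# Hypothetical system: 8 CPUs, 2 threads per CPU, PGA_AGGREGATE_TARGET set.
print(default_parallel_max_servers(8, 2, pga_aggregate_target=1024**3))  # 160

# The example from the text: four concurrent queries, each with a DOP of 8.
print(recommended_parallel_max_servers(dop=8, concurrent_users=4))       # 64
```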
When concurrent users have too many query server processes, memory contention (paging), I/O contention, or excessive context switching can occur. This contention can reduce system throughput to a level lower than if parallel execution were not used. Increase the PARALLEL_MAX_SERVERS value only if the system has sufficient memory and I/O bandwidth for the resulting load.
You can use operating system performance monitoring tools to determine how much memory, swap space and I/O bandwidth are free. Look at the run queue lengths for both your CPUs and disks, as well as the service time for I/Os on the system. Verify that the system has sufficient swap space to add more processes. Limiting the total number of query server processes might restrict the number of concurrent users who can execute parallel operations, but system throughput tends to remain stable.
You can limit the amount of parallelism available to a given user by establishing a resource consumer group for the user. Do this to limit the number of sessions, concurrent logons, and the number of parallel processes that any one user or group of users can have.
Each query server process working on a parallel execution statement is logged on with a session ID. Each process counts against the user's limit of concurrent sessions. For example, to limit a user to 10 parallel execution processes, set the user's limit to 11: one session for the parallel execution coordinator and 10 sessions for the parallel execution processes, which consist of two sets of query servers.
See Oracle Database Administrator's Guide for more information about managing resources with user profiles and Oracle Real Application Clusters Administration and Deployment Guide for more information on querying GV$ views.
The recommended value for the PARALLEL_MIN_PERCENT parameter is 0 (zero).
This parameter enables users to wait for an acceptable DOP, depending on the application in use. Setting this parameter to values other than 0 (zero) causes Oracle Database to return an error when the requested DOP cannot be satisfied by the system at a given time. For example, if you set PARALLEL_MIN_PERCENT to 50, which translates to 50 percent, and the DOP is reduced by 50 percent or greater because of the adaptive algorithm or because of a resource limitation, then Oracle Database returns ORA-12827. For example:
SELECT /*+ FULL(e) PARALLEL(e, 8) */ d.department_id, SUM(SALARY) FROM employees e, departments d WHERE e.department_id = d.department_id GROUP BY d.department_id ORDER BY d.department_id;
Oracle Database responds with this message:
ORA-12827: insufficient parallel query slaves available
This parameter lets you specify in a single instance the number of processes to be started and reserved for parallel operations. The syntax is:
PARALLEL_MIN_SERVERS=n
The n variable is the number of processes you want to start and reserve for parallel operations.
Setting PARALLEL_MIN_SERVERS balances the startup cost against memory usage. Processes started using PARALLEL_MIN_SERVERS do not exit until the database is shut down. This way, when a query is issued the processes are likely to be available.
The recommended value for PARALLEL_SERVERS_TARGET is 4 X CPU_COUNT X PARALLEL_THREADS_PER_CPU X ACTIVE_INSTANCES.
When PARALLEL_DEGREE_POLICY is set to AUTO, statements that require parallel execution will be queued if the number of parallel processes currently in use on the system is equal to or greater than PARALLEL_SERVERS_TARGET. This is not the maximum number of parallel server processes allowed on a system (that is controlled by PARALLEL_MAX_SERVERS). However, PARALLEL_SERVERS_TARGET and statement queuing are used to ensure that each statement that requires parallel execution is allocated the necessary parallel server resources and that the system is not flooded with too many parallel server processes.
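The recommended target and the queuing condition described above can be written down directly. This Python sketch is a simplified restatement of the text, not a description of Oracle internals; the function names and the sample system size are hypothetical.

```python
def recommended_parallel_servers_target(cpu_count, threads_per_cpu, active_instances):
    """4 x CPU_COUNT x PARALLEL_THREADS_PER_CPU x ACTIVE_INSTANCES"""
    return 4 * cpu_count * threads_per_cpu * active_instances

def statement_is_queued(servers_in_use, servers_target, degree_policy="AUTO"):
    """A parallel statement is queued only under PARALLEL_DEGREE_POLICY = AUTO,
    and only when the servers in use have reached PARALLEL_SERVERS_TARGET."""
    return degree_policy.upper() == "AUTO" and servers_in_use >= servers_target

# Hypothetical two-instance cluster with 8 CPUs and 2 threads per CPU.
target = recommended_parallel_servers_target(8, 2, 2)
print(target)                                        # 128
print(statement_is_queued(127, target))              # False
print(statement_is_queued(128, target))              # True
print(statement_is_queued(200, target, "MANUAL"))    # False: queuing inactive
```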
Parallel execution requires memory resources in addition to those required by serial SQL execution. Additional memory is used for communication and passing data between query server processes and the query coordinator.
Oracle Database allocates memory for query server processes from the shared pool. Tune the shared pool as follows:
Allow for other clients of the shared pool, such as shared cursors and stored procedures.
Remember that larger values improve performance in multiuser systems, but smaller values use less memory.
You can then monitor the number of buffers used by parallel execution and compare the shared pool PX msg pool to the current high water mark reported in output from the view V$PX_PROCESS_SYSSTAT.
Note:
If you do not have enough memory available, error message 12853 occurs (insufficient memory for PX buffers: current stringK, max needed stringK). This is caused by having insufficient SGA memory available for PX buffers. You need to reconfigure the SGA to have at least (MAX - CURRENT) bytes of additional memory.
By default, Oracle Database allocates parallel execution buffers from the shared pool.
If Oracle Database displays the following error on startup:
ORA-27102: out of memory SVR4 Error: 12: Not enough space
You should reduce the value for SHARED_POOL_SIZE low enough so your database starts. After reducing the value of SHARED_POOL_SIZE, you might see the error:
ORA-04031: unable to allocate 16084 bytes of shared memory ("SHARED pool","unknown object","SHARED pool heap","PX msg pool")
If so, execute the following query to determine why Oracle Database could not allocate the 16,084 bytes:
SELECT NAME, SUM(BYTES) FROM V$SGASTAT WHERE POOL='shared pool' GROUP BY ROLLUP (NAME);
Your output should resemble the following:
NAME                       SUM(BYTES)
-------------------------- ----------
PX msg pool                   1474572
free memory                    562132
                              2036704
If you specify SHARED_POOL_SIZE and the amount of memory you need to reserve is bigger than the pool, Oracle Database does not allocate all the memory it can get. Instead, it leaves some space. When the query runs, Oracle Database tries to get what it needs. Oracle Database uses the 560 KB and needs another 16 KB when it fails. The error does not report the cumulative amount that is needed. The best way of determining how much more memory is needed is to use the formulas in "Adding Memory for Message Buffers".
To resolve the problem in the current example, increase the value for SHARED_POOL_SIZE. As shown in the sample output, the SHARED_POOL_SIZE is about 2 MB. Depending on the amount of memory available, you could increase the value of SHARED_POOL_SIZE to 4 MB and attempt to start your database. If Oracle Database continues to display an ORA-04031 message, gradually increase the value for SHARED_POOL_SIZE until startup is successful.
After you determine the initial setting for the shared pool, you must calculate additional memory requirements for message buffers and determine how much additional space you need for cursors.
You must increase the value for the SHARED_POOL_SIZE parameter to accommodate message buffers. The message buffers allow query server processes to communicate with each other.
Oracle Database uses a fixed number of buffers for each virtual connection between producer query servers and consumer query servers. Connections increase as the square of the DOP increases. For this reason, the maximum amount of memory used by parallel execution is bound by the highest DOP allowed on your system. You can control this value either by using the PARALLEL_MAX_SERVERS parameter or by using policies and profiles.
To calculate the amount of memory required, use one of the following formulas:
For SMP systems:
mem in bytes = (3 x size x users x groups x connections)
For SMP Real Application Clusters and MPP systems:
mem in bytes = ((3 x local) + (2 x remote)) x (size x users x groups) / instances
Each instance uses the memory computed by the formula.
The terms are:
SIZE = PARALLEL_EXECUTION_MESSAGE_SIZE
USERS = the number of concurrent parallel execution users that you expect to have running with the optimal DOP
GROUPS = the number of query server process groups used for each query
A simple SQL statement requires only one group. However, if your queries involve subqueries which will be processed in parallel, then Oracle Database uses an additional group of query server processes.
CONNECTIONS = (DOP² + 2 x DOP)
If your system is a cluster or MPP, then you should account for the number of instances because this will increase the DOP. In other words, using a DOP of 4 on a two-instance cluster results in a DOP of 8. A value of PARALLEL_MAX_SERVERS times the number of instances divided by four is a conservative estimate to use as a starting point.
LOCAL = CONNECTIONS/INSTANCES
REMOTE = CONNECTIONS - LOCAL
Add this amount to your original setting for the shared pool. However, before setting a value for either of these memory structures, you must also consider additional memory for cursors, as explained in the following section.
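As a worked example of the message buffer formulas, the following Python sketch evaluates both variants. It only transcribes the formulas from this section; the function names and the sample inputs (a 16 KB message size, two concurrent users, one server group) are assumptions chosen for illustration.

```python
def connections(dop):
    """CONNECTIONS = DOP^2 + 2 x DOP"""
    return dop ** 2 + 2 * dop

def smp_message_buffer_bytes(size, users, groups, dop):
    """mem in bytes = 3 x size x users x groups x connections"""
    return 3 * size * users * groups * connections(dop)

def cluster_message_buffer_bytes(size, users, groups, dop, instances):
    """mem in bytes = ((3 x local) + (2 x remote)) x (size x users x groups) / instances

    The result is the memory used by each instance.
    """
    total = connections(dop)
    local = total / instances   # LOCAL = CONNECTIONS / INSTANCES
    remote = total - local      # REMOTE = CONNECTIONS - LOCAL
    return ((3 * local) + (2 * remote)) * (size * users * groups) / instances

SIZE = 16 * 1024  # PARALLEL_EXECUTION_MESSAGE_SIZE of 16 KB

# SMP system, DOP 4: connections = 16 + 8 = 24
print(smp_message_buffer_bytes(SIZE, users=2, groups=1, dop=4))      # 2359296

# Two-instance cluster where a DOP of 4 per instance yields a DOP of 8:
# connections = 64 + 16 = 80, local = 40, remote = 40
print(cluster_message_buffer_bytes(SIZE, users=2, groups=1, dop=8,
                                   instances=2))                     # 3276800.0
```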
The formulas in this section are just starting points. Whether you are using automated or manual tuning, you should monitor usage on an on-going basis to make sure the size of memory is not too large or too small. To do this, tune the shared pool using the following query:
SELECT POOL, NAME, SUM(BYTES) FROM V$SGASTAT WHERE POOL LIKE '%pool%' GROUP BY ROLLUP (POOL, NAME);
Your output should resemble the following:
POOL        NAME                       SUM(BYTES)
----------- -------------------------- ----------
shared pool Checkpoint queue                38496
shared pool KGFF heap                        1964
shared pool KGK heap                         4372
shared pool KQLS heap                     1134432
shared pool LRMPD SGA Table                 23856
shared pool PLS non-lib hp                   2096
shared pool PX subheap                     186828
shared pool SYSTEM PARAMETERS               55756
shared pool State objects                 3907808
shared pool character set memory            30260
shared pool db_block_buffers               200000
shared pool db_block_hash_buckets           33132
shared pool db_files                       122984
shared pool db_handles                      52416
shared pool dictionary cache               198216
shared pool dlm shared memory             5387924
shared pool event statistics per sess      264768
shared pool fixed allocation callback        1376
shared pool free memory                  26329104
shared pool gc_*                            64000
shared pool latch nowait fails or sle       34944
shared pool library cache                 2176808
shared pool log_buffer                      24576
shared pool log_checkpoint_timeout          24700
shared pool long op statistics array        30240
shared pool message pool freequeue         116232
shared pool miscellaneous                  267624
shared pool processes                       76896
shared pool session param values            41424
shared pool sessions                       170016
shared pool sql area                      9549116
shared pool table columns                  148104
shared pool trace_buffers_per_process     1476320
shared pool transactions                    18480
shared pool trigger inform                  24684
shared pool                              52248968
                                         90641768
Evaluate the memory used as shown in your output, and alter the setting for SHARED_POOL_SIZE based on your processing needs.
To obtain more memory usage statistics, execute the following query:
SELECT * FROM V$PX_PROCESS_SYSSTAT WHERE STATISTIC LIKE 'Buffers%';
Your output should resemble the following:
STATISTIC           VALUE
------------------- -----
Buffers Allocated   23225
Buffers Freed       23225
Buffers Current         0
Buffers HWM          3620
The amount of memory used appears in the Buffers Current and Buffers HWM statistics. Calculate a value in bytes by multiplying the number of buffers by the value for PARALLEL_EXECUTION_MESSAGE_SIZE. Compare the high water mark to the parallel execution message pool size to determine if you allocated too much memory. For example, in the first output, the value for large pool as shown in px msg pool is 38,092,812 or 38 MB. The Buffers HWM from the second output is 3,620, which when multiplied by a parallel execution message size of 4,096 is 14,827,520, or approximately 15 MB. In this case, the high water mark has reached approximately 40 percent of its capacity.
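The calculation above can be sketched as a single query. This is only a sketch: it assumes the message size is read from V$PARAMETER, and the column aliases are illustrative.

```sql
-- Convert the buffer counts into bytes using the current message size.
-- Assumption: PARALLEL_EXECUTION_MESSAGE_SIZE is read from V$PARAMETER.
SELECT s.statistic,
       s.value                      AS buffers,
       s.value * TO_NUMBER(p.value) AS bytes
FROM   V$PX_PROCESS_SYSSTAT s,
       V$PARAMETER p
WHERE  p.name = 'parallel_execution_message_size'
AND    (s.statistic LIKE 'Buffers Current%' OR s.statistic LIKE 'Buffers HWM%');

-- Worked example with the figures from the text:
--   3,620 buffers * 4,096 bytes = 14,827,520 bytes (about 15 MB)
--   14,827,520 / 38,092,812 is roughly 0.39, that is, about 40 percent
```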
Before considering the following section, read the descriptions of the MEMORY_TARGET and MEMORY_MAX_TARGET initialization parameters in Oracle Database Performance Tuning Guide and Oracle Database Administrator's Guide. The PGA_AGGREGATE_TARGET initialization parameter need not be set, because MEMORY_TARGET autotunes the SGA and PGA components.
The first group of parameters discussed in this section affects memory and resource consumption for all parallel operations, in particular, for parallel execution. These parameters are:
A second subset of parameters discussed in this section explains parameters affecting parallel DML and DDL.
To control resource consumption, you should configure memory at two levels:
At the database level, so the system uses an appropriate amount of memory from the operating system.
At the operating system level for consistency. On some platforms, you might need to set operating system parameters that control the total amount of virtual memory available, summed across all processes.
A large percentage of the memory used in data warehousing operations (compared to OLTP) is more dynamic. This memory comes from process memory (PGA), and both the size of process memory and the number of processes can vary greatly. Use the PGA_AGGREGATE_TARGET initialization parameter to control both the process memory and the number of processes in such cases. Explicitly setting PGA_AGGREGATE_TARGET along with MEMORY_TARGET ensures that autotuning still occurs but PGA_AGGREGATE_TARGET is not tuned below the specified value.
You can simplify and improve the way PGA memory is allocated by enabling automatic PGA memory management. In this mode, Oracle Database dynamically adjusts the size of the portion of the PGA memory dedicated to work areas, based on an overall PGA memory target explicitly set by the DBA. To enable automatic PGA memory management, you have to set the initialization parameter PGA_AGGREGATE_TARGET. For new installations, PGA_AGGREGATE_TARGET and SGA_TARGET are set automatically by DBCA, and MEMORY_TARGET is zero. That is, automatic memory management is disabled. You can enable it in Enterprise Manager on the Memory Parameters page. Therefore, automatic tuning of the aggregate PGA is enabled by default. However, the aggregate PGA will not grow unless you enable automatic memory management by setting MEMORY_TARGET to a nonzero value.
See Oracle Database Performance Tuning Guide for descriptions of how to use PGA_AGGREGATE_TARGET in different scenarios.
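As a minimal sketch of the setting described above, the following statements inspect the current memory parameters and then set an overall PGA target. The 2G value is a placeholder chosen for illustration, not a recommendation.

```sql
-- Check the current memory-related settings.
SELECT name, value
FROM   V$PARAMETER
WHERE  name IN ('memory_target', 'memory_max_target',
                'sga_target', 'pga_aggregate_target');

-- Set an overall PGA target; 2G is a placeholder value (assumption).
ALTER SYSTEM SET PGA_AGGREGATE_TARGET = 2G;
```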
The PARALLEL_EXECUTION_MESSAGE_SIZE parameter specifies the size of the buffer used for parallel execution messages. The default value is OS-specific, but is typically 16 KB. This value should be adequate for most applications.
The parameters that affect parallel DML and parallel DDL resource consumption are:
Parallel inserts, updates, and deletes require more resources than serial DML operations. Similarly, PARALLEL CREATE TABLE ... AS SELECT and PARALLEL CREATE INDEX can require more resources. For this reason, you may need to increase the value of several additional initialization parameters. These parameters do not affect resources for queries.
For parallel DML and DDL, each query server process starts a transaction. The parallel coordinator uses the two-phase commit protocol to commit transactions; therefore, the number of transactions being processed increases by the DOP. As a result, you might need to increase the value of the TRANSACTIONS initialization parameter.
The TRANSACTIONS parameter specifies the maximum number of concurrent transactions. The default assumes no parallelism. For example, if you have a DOP of 20, you will have 20 new server transactions (or 40, if you have two server sets) and 1 coordinator transaction. In this case, you should increase TRANSACTIONS by 21 (or 41) if the transactions are running in the same instance. If you do not set this parameter, Oracle Database sets it to a value equal to 1.1 x SESSIONS. This discussion does not apply if you are using server-managed undo.
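The arithmetic in this example can be checked with a short query. The sketch below first shows the current settings and then computes the additional transactions for a DOP of 20; the column aliases are illustrative.

```sql
-- Current settings. When TRANSACTIONS is not set explicitly, the text
-- above states that it is derived as 1.1 x SESSIONS.
SELECT name, value
FROM   V$PARAMETER
WHERE  name IN ('sessions', 'transactions');

-- Additional transactions for one parallel DML statement with a DOP of 20:
SELECT 20 + 1     AS one_server_set,   -- 20 server transactions + 1 coordinator = 21
       2 * 20 + 1 AS two_server_sets   -- 40 server transactions + 1 coordinator = 41
FROM   DUAL;
```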
If a system fails when there are uncommitted parallel DML or DDL transactions, you can speed up transaction recovery during startup by using the FAST_START_PARALLEL_ROLLBACK parameter.
This parameter controls the DOP used when recovering terminated transactions. Terminated transactions are transactions that were active before a system failure. By default, the DOP is chosen to be at most two times the value of the CPU_COUNT parameter.
If the default DOP is insufficient, set the parameter to HIGH. This gives a maximum DOP of at most four times the value of the CPU_COUNT parameter. This feature is available by default.
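A minimal sketch of raising the recovery DOP ceiling as described above, followed by a check of the parameters involved:

```sql
-- Raise the recovery DOP ceiling from the default to at most
-- four times CPU_COUNT.
ALTER SYSTEM SET FAST_START_PARALLEL_ROLLBACK = HIGH;

-- Confirm the setting and the CPU count it is based on.
SELECT name, value
FROM   V$PARAMETER
WHERE  name IN ('fast_start_parallel_rollback', 'cpu_count');
```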
This parameter specifies the maximum number of DML locks. Its value should equal the total number of locks on all tables referenced by all users. A parallel DML operation's lock requirement is very different from serial DML. Parallel DML holds many more locks, so you should increase the value of the DML_LOCKS parameter by equal amounts.
Table 8-4 shows the types of locks acquired by coordinator and parallel execution server processes for different types of parallel DML statements. Using this information, you can determine the value required for these parameters.
Table 8-4 Locks Acquired by Parallel DML Statements
Type of Statement | Coordinator Process Acquires: | Each Parallel Execution Server Acquires: |
---|---|---|
Parallel | 1 table lock SX; 1 partition lock X for each pruned (sub)partition | 1 table lock SX; 1 partition lock NULL and 1 partition-wait lock S for each pruned (sub)partition owned by the query server process |
Parallel row-migrating | 1 table lock SX; 1 partition lock X for each pruned (sub)partition; 1 partition lock SX for all other (sub)partitions | 1 table lock SX; 1 partition lock NULL and 1 partition-wait lock S for each pruned partition owned by the query server process; 1 partition lock SX for all other (sub)partitions |
Parallel | 1 table lock SX; partition locks X for all (sub)partitions | 1 table lock SX; 1 partition lock NULL and 1 partition-wait lock S for each (sub)partition |
Parallel | 1 table lock SX; 1 partition lock X for each specified (sub)partition | 1 table lock SX; 1 partition lock NULL and 1 partition-wait lock S for each specified (sub)partition |
Parallel | 1 table lock X | None |
Note:
Table, partition, and partition-wait DML locks all appear as TM locks in the V$LOCK view.
Consider a table with 600 partitions running with a DOP of 100. Assume all partitions are involved in a parallel UPDATE or DELETE statement with no row-migrations.
The coordinator acquires:
1 table lock SX
600 partition locks X
In total, the server processes acquire:
100 table locks SX
600 partition locks NULL
600 partition-wait locks S
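Adding these counts gives the number of DML locks that this one statement holds. The following sketch totals the figures from the example and then lists TM locks from V$LOCK while such a statement runs; the column aliases are illustrative.

```sql
-- Lock totals for the example: 600 partitions, DOP of 100.
SELECT 1 + 600                       AS coordinator_locks,  -- 601
       100 + 600 + 600               AS server_locks,       -- 1,300
       (1 + 600) + (100 + 600 + 600) AS total_dml_locks     -- 1,901
FROM   DUAL;

-- Table, partition, and partition-wait locks all show up as type TM.
SELECT sid, type, id1, lmode, request
FROM   V$LOCK
WHERE  type = 'TM'
ORDER  BY sid;
```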
The parameters that affect I/O are:
These parameters also affect the optimizer, which ensures optimal performance for parallel execution I/O operations.
When you perform parallel updates, merges, and deletes, the buffer cache behavior is very similar to any OLTP system running a high volume of updates.
The recommended value for this parameter is 8 KB or 16 KB.
Set the database block size when you create the database. If you are creating a new database, use a large block size such as 8 KB or 16 KB.
This parameter determines how many database blocks are read with a single operating system READ call. In this release, the default value of this parameter is a value that corresponds to the maximum I/O size that can be performed efficiently. This value is platform-dependent and is 1 MB for most platforms. If you set DB_FILE_MULTIBLOCK_READ_COUNT to an excessively high value, your operating system will lower the value to the highest allowable level when you start your database.
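To see the I/O size that results from the current settings, you can multiply the block size by the multiblock read count. This is a sketch with illustrative column aliases.

```sql
-- Bytes read by one multiblock READ call = block size * blocks per read.
SELECT TO_NUMBER(bs.value)                         AS block_size_bytes,
       TO_NUMBER(mbrc.value)                       AS blocks_per_read,
       TO_NUMBER(bs.value) * TO_NUMBER(mbrc.value) AS bytes_per_read
FROM   V$PARAMETER bs,
       V$PARAMETER mbrc
WHERE  bs.name   = 'db_block_size'
AND    mbrc.name = 'db_file_multiblock_read_count';

-- For example, an 8 KB block size with 128 blocks per read gives
-- 8,192 * 128 = 1,048,576 bytes, that is, 1 MB.
```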
The recommended value for both of these parameters is TRUE.
These parameters enable or disable the operating system's asynchronous I/O facility. They allow query server processes to overlap I/O requests with processing when performing table scans. If the operating system supports asynchronous I/O, leave these parameters at the default value of TRUE. Figure 8-6 illustrates how asynchronous read works.
Asynchronous operations are currently supported for parallel table scans, hash joins, sorts, and serial table scans. However, this feature can require operating system specific configuration and may not be supported on all platforms.
You should perform the following types of monitoring when trying to diagnose parallel execution performance problems:
Oracle's real-time monitoring feature enables you to monitor the performance of SQL statements while they are executing. SQL monitoring is automatically started when a SQL statement runs in parallel or when it has consumed at least five seconds of CPU or I/O time for a single execution. See Oracle Database Performance Tuning Guide for more details.
After your system has run for a few days, you should monitor parallel execution performance statistics to determine whether your parallel processing is optimal. Do this using any of the views discussed in this section.
In Oracle Real Application Clusters, global versions of the views described in this section aggregate statistics from multiple instances. The global views have names beginning with G, such as GV$FILESTAT for V$FILESTAT, and so on.
The V$PX_BUFFER_ADVICE view provides statistics on historical and projected maximum buffer usage by all parallel queries. You can consult this view to reconfigure SGA size in response to insufficient memory problems for parallel queries.
The V$PX_SESSION view shows data about query server sessions, groups, sets, and server numbers. It also displays real-time data about the processes working on behalf of parallel execution. This view includes information about the requested DOP and the actual DOP granted to the operation.
The V$PX_SESSTAT view provides a join of the session information from V$PX_SESSION and the V$SESSTAT view. Thus, all session statistics available to a normal session are available for all sessions performed using parallel execution.
The V$PX_PROCESS view contains information about the parallel processes, including status, session ID, process ID, and other information.
The V$PX_PROCESS_SYSSTAT view shows the status of query servers and provides buffer allocation statistics.
The V$PQ_SESSTAT view shows the status of all current server groups in the system such as data about how queries allocate processes and how the multiuser and load balancing algorithms are affecting the default and hinted values. V$PQ_SESSTAT will be obsolete in a future release.
You might need to adjust some parameter settings to improve performance after reviewing data from these views. In this case, refer to the discussion of "Tuning General Parameters for Parallel Execution". Query these views periodically to monitor the progress of long-running parallel operations.
For many dynamic performance views, you must set the parameter TIMED_STATISTICS to TRUE in order for Oracle Database to collect statistics for each view. You can use the ALTER SYSTEM or ALTER SESSION statements to turn TIMED_STATISTICS on and off.
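For example, the statements below turn timed statistics on for the whole instance or for the current session only, and then off again for the session:

```sql
-- Enable timed statistics for the whole instance.
ALTER SYSTEM SET TIMED_STATISTICS = TRUE;

-- Enable, and later disable, timed statistics for the current session only.
ALTER SESSION SET TIMED_STATISTICS = TRUE;
ALTER SESSION SET TIMED_STATISTICS = FALSE;
```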
As a simple example, consider a hash join between two tables, with a join on a column with only two distinct values. At best, the hash function sends one hash value to parallel execution server A and the other to parallel execution server B. A DOP of two is fine, but if the DOP is four, then at least two parallel execution servers have no work. To discover this type of skew, use a query similar to the following example:
SELECT dfo_number, tq_id, server_type, process, num_rows FROM V$PQ_TQSTAT ORDER BY dfo_number DESC, tq_id, server_type, process;
The best way to resolve this problem might be to choose a different join method; a nested loop join might be the best option. Alternatively, if one of the join tables is small relative to the other, a BROADCAST distribution method can be hinted using the PQ_DISTRIBUTE hint. Note that the optimizer considers the BROADCAST distribution method, but requires OPTIMIZER_FEATURES_ENABLE set to 9.0.2 or higher.
Now, assume that you have a join key with high cardinality, but one of the values contains most of the data, for example, lava lamp sales by year. The only year that had big sales was 1968, and thus, the parallel execution server for the 1968 records will be overwhelmed. You should use the same corrective actions as described previously.
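As a sketch of the hinted BROADCAST distribution mentioned above, the following query broadcasts the rows of the outer table to every query server instead of hashing on the skewed join key. It assumes that customers is the smaller table and that both tables already run in parallel; the table and column names are borrowed from the sample query later in this chapter.

```sql
-- ORDERED fixes the join order (customers, then sales), so sales (s) is the
-- inner table. PQ_DISTRIBUTE(s BROADCAST, NONE) broadcasts the outer table
-- (customers) and leaves the inner table's rows where they are scanned.
SELECT /*+ ORDERED USE_HASH(s) PQ_DISTRIBUTE(s BROADCAST, NONE) */
       c.cust_first_name, c.cust_last_name
FROM   customers c, sales s
WHERE  c.cust_id = s.cust_id;
```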
The V$PQ_TQSTAT view provides a detailed report of message traffic at the table queue level. V$PQ_TQSTAT data is valid only when queried from a session that is executing parallel SQL statements. A table queue is the pipeline between query server groups, between the parallel coordinator and a query server group, or between a query server group and the coordinator. The table queues are represented explicitly in the operation column by PX SEND <partitioning type> (for example, PX SEND HASH) and PX RECEIVE. For backward compatibility, the row labels of PARALLEL_TO_PARALLEL, SERIAL_TO_PARALLEL, or PARALLEL_TO_SERIAL will continue to have the same semantics as previous releases and can be used as before to infer the table queue allocation. In addition, the top of the parallel plan is marked by a new node with operation PX COORDINATOR.
V$PQ_TQSTAT has a row for each query server process that reads from or writes to each table queue. A table queue connecting 10 consumer processes to 10 producer processes has 20 rows in the view. Sum the bytes column and group by TQ_ID, the table queue identifier, to obtain the total number of bytes sent through each table queue. Compare this with the optimizer estimates; large variations might indicate a need to analyze the data using a larger sample.
Compute the variance of bytes grouped by TQ_ID. Large variances indicate workload imbalances. You should investigate large variances to determine whether the producers start out with unequal distributions of data, or whether the distribution itself is skewed. If the data itself is skewed, this might indicate a low cardinality, or low number of distinct values.
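The sum and variance described above can be sketched in one query. Grouping by DFO_NUMBER and SERVER_TYPE in addition to TQ_ID is an assumption made here, so that producer and consumer rows of the same table queue are not added together; run the query in the session that executed the parallel statement.

```sql
-- Total bytes and spread of bytes per table queue.
SELECT dfo_number,
       tq_id,
       server_type,
       COUNT(*)        AS processes,
       SUM(num_rows)   AS total_rows,
       SUM(bytes)      AS total_bytes,
       VARIANCE(bytes) AS bytes_variance   -- large values suggest imbalance
FROM   V$PQ_TQSTAT
GROUP  BY dfo_number, tq_id, server_type
ORDER  BY dfo_number, tq_id, server_type;
```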
Note that the V$PQ_TQSTAT view will be renamed in a future release to V$PX_TQSTAT.
These examples use the dynamic performance views described in "Monitoring Parallel Execution Performance with Dynamic Performance Views".
Use GV$PX_SESSION to determine the configuration of the server group executing in parallel. In this example, session 9 is the query coordinator, while sessions 7 and 21 are in the first group, first set. Sessions 18 and 20 are in the first group, second set. The requested and granted DOP for this query is 2, as shown by the output from the following query:
SELECT QCSID, SID, INST_ID "Inst", SERVER_GROUP "Group", SERVER_SET "Set", DEGREE "Degree", REQ_DEGREE "Req Degree" FROM GV$PX_SESSION ORDER BY QCSID, QCINST_ID, SERVER_GROUP, SERVER_SET;
Your output should resemble the following:
QCSID      SID        Inst       Group      Set        Degree     Req Degree
---------- ---------- ---------- ---------- ---------- ---------- ----------
         9          9          1
         9          7          1          1          1          2          2
         9         21          1          1          1          2          2
         9         18          1          1          2          2          2
         9         20          1          1          2          2          2
For a single instance, use SELECT FROM V$PX_SESSION and do not include the column name Instance ID.
The processes shown in the output from the previous example using GV$PX_SESSION collaborate to complete the same task. The next example shows the execution of a join query to determine the progress of these processes in terms of physical reads. Use this query to track any specific statistic:
SELECT QCSID, SID, INST_ID "Inst", SERVER_GROUP "Group", SERVER_SET "Set", NAME "Stat Name", VALUE FROM GV$PX_SESSTAT A, V$STATNAME B WHERE A.STATISTIC# = B.STATISTIC# AND NAME LIKE 'physical reads' AND VALUE > 0 ORDER BY QCSID, QCINST_ID, SERVER_GROUP, SERVER_SET;
Your output should resemble the following:
QCSID  SID   Inst   Group  Set    Stat Name          VALUE
------ ----- ------ ------ ------ ------------------ ----------
     9     9      1               physical reads           3863
     9     7      1      1      1 physical reads              2
     9    21      1      1      1 physical reads              2
     9    18      1      1      2 physical reads              2
     9    20      1      1      2 physical reads              2
Use the previous type of query to track statistics in V$STATNAME. Repeat this query as often as required to observe the progress of the query server processes.
The next query uses V$PX_PROCESS to check the status of the query servers.
SELECT * FROM V$PX_PROCESS;
Your output should resemble the following:
SERV STATUS    PID    SPID      SID    SERIAL
---- --------- ------ --------- ------ ------
P002 IN USE        16 16955         21   7729
P003 IN USE        17 16957         20   2921
P004 AVAILABLE     18 16959
P005 AVAILABLE     19 16962
P000 IN USE        12 6999          18   4720
P001 IN USE        13 7004           7    234
The V$SYSSTAT and V$SESSTAT views contain several statistics for monitoring parallel execution. Use these statistics to track the number of parallel queries, DMLs, DDLs, data flow operators (DFOs), and operations. Each query, DML, or DDL can have multiple parallel operations and multiple DFOs.
In addition, statistics also count the number of query operations for which the DOP was reduced, or downgraded, due to either the adaptive multiuser algorithm or the depletion of available parallel execution servers.
Finally, statistics in these views also count the number of messages sent on behalf of parallel execution. The following query is an example of how to display these statistics:
SELECT NAME, VALUE FROM GV$SYSSTAT WHERE UPPER (NAME) LIKE '%PARALLEL OPERATIONS%' OR UPPER (NAME) LIKE '%PARALLELIZED%' OR UPPER (NAME) LIKE '%PX%';
Your output should resemble the following:
NAME                                               VALUE
-------------------------------------------------- ----------
queries parallelized                                      347
DML statements parallelized                                 0
DDL statements parallelized                                 0
DFO trees parallelized                                    463
Parallel operations not downgraded                         28
Parallel operations downgraded to serial                   31
Parallel operations downgraded 75 to 99 pct               252
Parallel operations downgraded 50 to 75 pct               128
Parallel operations downgraded 25 to 50 pct                43
Parallel operations downgraded 1 to 25 pct                 12
PX local messages sent                                  74548
PX local messages recv'd                                74128
PX remote messages sent                                     0
PX remote messages recv'd                                   0
The following query shows the current wait state of each slave and query coordinator process on the system:
SELECT px.SID "SID", p.PID, p.SPID "SPID", px.INST_ID "Inst",
       px.SERVER_GROUP "Group", px.SERVER_SET "Set",
       px.DEGREE "Degree", px.REQ_DEGREE "Req Degree", w.event "Wait Event"
FROM GV$SESSION s, GV$PX_SESSION px, GV$PROCESS p, GV$SESSION_WAIT w
WHERE s.sid (+) = px.sid AND s.inst_id (+) = px.inst_id
  AND s.sid = w.sid (+) AND s.inst_id = w.inst_id (+)
  AND s.paddr = p.addr (+) AND s.inst_id = p.inst_id (+)
ORDER BY DECODE(px.QCINST_ID, NULL, px.INST_ID, px.QCINST_ID), px.QCSID,
         DECODE(px.SERVER_GROUP, NULL, 0, px.SERVER_GROUP), px.SERVER_SET, px.INST_ID;
There is considerable overlap between information available in Oracle Database and information available through operating system utilities (such as sar and vmstat on UNIX-based systems). Operating systems provide performance statistics on I/O, communication, CPU, memory and paging, scheduling, and synchronization primitives. The V$SESSTAT view provides the major categories of operating system statistics as well.
Typically, operating system information about I/O devices and semaphore operations is harder to map back to database objects and operations than is Oracle Database information. However, some operating systems have good visualization tools and efficient means of collecting the data.
Operating system information about CPU and memory usage is very important for assessing performance. Probably the most important statistic is CPU usage. The goal of low-level performance tuning is to become CPU bound on all CPUs. Once this is achieved, you can work at the SQL level to find an alternate plan that might be more I/O intensive but use less CPU.
Operating system memory and paging information is valuable for fine tuning the many system parameters that control how memory is divided among memory-intensive data warehouse subsystems like parallel communication, sort, and hash join.
This section contains some ideas for improving performance in a parallel execution environment and includes the following topics:
Oracle Database cannot return results to a user process in parallel. If a query returns a large number of rows, execution of the query might indeed be faster. However, the user process can only receive the rows serially. To optimize parallel execution performance for queries that retrieve large result sets, use PARALLEL CREATE TABLE ... AS SELECT or direct-path INSERT to store the result set in the database. At a later time, users can view the result set serially.
Performing the SELECT in parallel does not influence the CREATE statement. If the CREATE is parallel, however, the optimizer tries to make the SELECT run in parallel also.
When combined with the NOLOGGING option, the parallel version of CREATE TABLE ... AS SELECT provides a very efficient intermediate table facility, for example:
CREATE TABLE summary PARALLEL NOLOGGING AS SELECT dim_1, dim_2 ..., SUM (meas_1) FROM facts GROUP BY dim_1, dim_2;
These tables can also be incrementally loaded with parallel INSERT. You can take advantage of intermediate tables using the following techniques:
Common subqueries can be computed once and referenced many times. This can allow some queries against star schemas (in particular, queries without selective WHERE-clause predicates) to be better parallelized. Note that star queries with selective WHERE-clause predicates using the star-transformation technique can be effectively parallelized automatically without any modification to the SQL.
Decompose complex queries into simpler steps in order to provide application-level checkpoint or restart. For example, a complex multitable join on a database 1 terabyte in size could run for dozens of hours. A failure during this query would mean starting over from the beginning. Using CREATE TABLE ... AS SELECT or PARALLEL INSERT AS SELECT, you can rewrite the query as a sequence of simpler queries that run for a few hours each. If a system failure occurs, the query can be restarted from the last completed step.
Implement manual parallel deletes efficiently by creating a new table that omits the unwanted rows from the original table, and then dropping the original table. Alternatively, you can use the convenient parallel delete feature, which directly deletes rows from the original table.
Create summary tables for efficient multidimensional drill-down analysis. For example, a summary table might store the sum of revenue grouped by month, brand, region, and salesman.
Reorganize tables, eliminating chained rows, compressing free space, and so on, by copying the old table to a new table. This is much faster than export/import and easier than reloading.
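The decomposition technique in the list above can be sketched as follows. All names here are hypothetical (step1_sales_by_cust, step2_cust_report, new_sales, and amount_sold are assumptions for illustration); only the pattern of stepwise CREATE TABLE ... AS SELECT followed by a parallel direct-path INSERT comes from the text.

```sql
-- Step 1: materialize an intermediate result.
CREATE TABLE step1_sales_by_cust PARALLEL NOLOGGING AS
SELECT cust_id, SUM(amount_sold) AS total_sold
FROM   sales
GROUP  BY cust_id;

-- Step 2: build on the stored result. If this step fails, only this
-- step is rerun; step 1 does not have to be repeated.
CREATE TABLE step2_cust_report PARALLEL NOLOGGING AS
SELECT c.cust_last_name, s.total_sold
FROM   customers c, step1_sales_by_cust s
WHERE  c.cust_id = s.cust_id;

-- Incremental load of an intermediate table with a parallel
-- direct-path INSERT. Parallel DML must be enabled in the session.
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(step1_sales_by_cust) */ INTO step1_sales_by_cust
SELECT cust_id, SUM(amount_sold)
FROM   new_sales
GROUP  BY cust_id;

COMMIT;
```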
Be sure to use the DBMS_STATS package to gather optimizer statistics on newly created tables. To avoid I/O bottlenecks, specify a tablespace that is striped across at least as many physical disks as CPUs. To avoid fragmentation in allocating space, the number of files in a tablespace should be a multiple of the number of CPUs. See Oracle Database Data Warehousing Guide for more information about bottlenecks.
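A minimal sketch of gathering statistics on a newly created table, assuming the table is the SUMMARY table from the earlier example and is owned by the current user. Letting DBMS_STATS choose the degree is one possible choice, not a requirement.

```sql
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,                      -- current schema (assumption)
    tabname => 'SUMMARY',                 -- table created in the earlier example
    degree  => DBMS_STATS.AUTO_DEGREE);   -- let Oracle Database pick the DOP
END;
/
```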
Use the EXPLAIN PLAN statement to see the execution plans for parallel queries. EXPLAIN PLAN output shows optimizer information in the COST, BYTES, and CARDINALITY columns. You can also use the utlxplp.sql script to present the EXPLAIN PLAN output with all relevant parallel information.
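As a sketch, the following SQL*Plus session explains a parallel query and then displays the plan in two ways. It assumes the default PLAN_TABLE is available and that the script is in its usual location under the Oracle home; the query is the sample join used later in this section.

```sql
EXPLAIN PLAN FOR
SELECT /*+ PARALLEL */ cust_first_name, cust_last_name
FROM   customers c, sales s
WHERE  c.cust_id = s.cust_id;

-- Option 1: the script named in the text. In SQL*Plus, ? expands to
-- the Oracle home directory.
@?/rdbms/admin/utlxplp.sql

-- Option 2: display the same plan with the DBMS_XPLAN package.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```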
There are several ways to optimize the parallel execution of join statements. You can alter system configuration, adjust parameters as discussed earlier in this chapter, or use hints, such as the PQ_DISTRIBUTE hint.
The key points when using EXPLAIN PLAN are to:
Verify optimizer selectivity estimates. If the optimizer thinks that only one row will be produced from a query, it tends to favor using a nested loop. This could be an indication that the tables are not analyzed or that the optimizer has made an incorrect estimate about the correlation of multiple predicates on the same table. Extended statistics or a hint may be required to provide the optimizer with the correct selectivity or to force the optimizer to use another join method.
Use caution with hash joins on low cardinality join keys. If a join key has few distinct values, then a hash join may not be optimal. If the number of distinct values is less than the DOP, then some parallel query servers may be unable to work on the particular query.
Consider data skew. If a join key involves excessive data skew, a hash join may require some parallel query servers to work more than others. Consider using a hint to cause a BROADCAST distribution method if the optimizer did not choose it. Note that the optimizer will consider the BROADCAST distribution method only if OPTIMIZER_FEATURES_ENABLE is set to 9.0.2 or higher. See "V$PQ_TQSTAT" for further details.
The following example illustrates how the optimizer intends to execute a parallel query.
EXPLAIN PLAN FOR
SELECT /*+ PARALLEL */ cust_first_name, cust_last_name
FROM customers c, sales s
WHERE c.cust_id = s.cust_id;

-----------------------------------------------------------
| Id  | Operation                        | Name           |
-----------------------------------------------------------
|   0 | SELECT STATEMENT                 |                |
|   1 |  PX COORDINATOR                  |                |
|   2 |   PX SEND QC (RANDOM)            | :TQ10000       |
|   3 |    NESTED LOOPS                  |                |
|   4 |     PX BLOCK ITERATOR            |                |
|   5 |      TABLE ACCESS FULL           | CUSTOMERS      |
|   6 |     PARTITION RANGE ALL          |                |
|   7 |      BITMAP CONVERSION TO ROWIDS |                |
|   8 |       BITMAP INDEX SINGLE VALUE  | SALES_CUST_BIX |
-----------------------------------------------------------

Note
-----
   - Computed Degree of Parallelism is 2
   - Degree of Parallelism of 2 is derived from scan of object SH.CUSTOMERS
When you refresh a data warehouse using parallel insert, update, or delete operations, there are additional issues to consider when designing the physical database. These considerations do not affect parallel execution operations. These issues are:
If a parallel restriction is violated, the operation is simply performed serially. If a direct-path INSERT restriction is violated, then the APPEND hint is ignored and a conventional insert is performed. No error message is returned.
For tables created prior to Oracle9i Database release version 9.0.1 or tables that do not have the parallel DML itl invariant property, the DOP is equal to the number of partitions or subpartitions. That means that, if the table is not partitioned, the operation runs serially. To see what tables do not have this property, issue the following statement:
SELECT u.name, o.name FROM obj$ o, tab$ t, user$ u WHERE o.obj# = t.obj# AND o.owner# = u.user# AND bitand(t.property,536870912) != 536870912;
If you have global indexes, a global index segment and global index blocks are shared by server processes of the same parallel DML statement. Even if the operations are not performed against the same row, the server processes can share the same index blocks. Each server transaction needs one transaction entry in the index block header before it can make changes to a block. Therefore, in the CREATE INDEX or ALTER INDEX statements, you should set INITRANS, the initial number of transactions allocated within each data block, to a large value, such as the maximum DOP against this index.
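A sketch of setting INITRANS to match an expected maximum DOP of 16. The index name sales_cust_gidx, the indexed column, and the value 16 are all assumptions chosen for illustration.

```sql
-- Create a global index with room for 16 concurrent transaction entries
-- in each block (hypothetical index name and DOP).
CREATE INDEX sales_cust_gidx ON sales (cust_id) INITRANS 16;

-- Change the setting on an existing index.
ALTER INDEX sales_cust_gidx INITRANS 16;
```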
There is a limitation on the available number of transaction free lists for segments in dictionary-managed tablespaces. Once a segment has been created, the number of process and transaction free lists is fixed and cannot be altered. If you specify a large number of process free lists in the segment header, you might find that this limits the number of transaction free lists that are available. You can abate this limitation the next time you re-create the segment header by decreasing the number of process free lists; this leaves more room for transaction free lists in the segment header.
For UPDATE and DELETE operations, each server process can require its own transaction free list. The parallel DML DOP is thus effectively limited by the smallest number of transaction free lists available on the table and on any of the global indexes the DML statement must maintain. For example, if the table has 25 transaction free lists and the table has two global indexes, one with 50 transaction free lists and one with 30 transaction free lists, the DOP is limited to 25. If the table had had 40 transaction free lists, the DOP would have been limited to 30.
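The limit in this example is simply the smallest of the three free list counts, which can be checked as follows (the column aliases are illustrative):

```sql
SELECT LEAST(25, 50, 30) AS dop_limit_first_case,   -- 25: the table is the limit
       LEAST(40, 50, 30) AS dop_limit_second_case   -- 30: the second index is the limit
FROM   DUAL;
```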
The FREELISTS parameter of the STORAGE clause is used to set the number of process free lists. By default, no process free lists are created.
The default number of transaction free lists depends on the block size. For example, if the number of process free lists is not set explicitly, a 4 KB block has about 80 transaction free lists by default. The minimum number of transaction free lists is 25.
Parallel DDL and parallel DML operations can generate a large amount of redo logs. A single ARCH process to archive these redo logs might not be able to keep up. To avoid this problem, you can spawn multiple archiver processes. This can be done manually or by using a job queue.
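One way to start additional archiver processes manually is sketched below. The text does not name a mechanism, so the use of the LOG_ARCHIVE_MAX_PROCESSES initialization parameter and the value 4 are assumptions made for this example.

```sql
-- Allow up to four archiver processes (placeholder value).
ALTER SYSTEM SET LOG_ARCHIVE_MAX_PROCESSES = 4;

-- List the archiver processes that are not stopped.
SELECT process, status
FROM   V$ARCHIVE_PROCESSES
WHERE  status <> 'STOPPED';
```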
Parallel DML operations dirty a large number of data, index, and undo blocks in the buffer cache during a short period of time. For example, suppose you see a high number of free buffer waits after querying the V$SYSTEM_EVENT view, as in the following query:
SELECT TOTAL_WAITS FROM V$SYSTEM_EVENT WHERE EVENT = 'free buffer waits';
In this case, you should consider increasing the number of DBWn processes. If there are no waits for free buffers, the query will not return any rows.
The [NO]LOGGING clause applies to tables, partitions, tablespaces, and indexes. Virtually no log is generated for certain operations (such as direct-path INSERT) if the NOLOGGING clause is used. The NOLOGGING attribute is not specified at the INSERT statement level but is instead specified when using the ALTER or CREATE statement for a table, partition, index, or tablespace.
When a table or index has NOLOGGING set, neither parallel nor serial direct-path INSERT operations generate redo logs. Processes running with the NOLOGGING option set run faster because no redo is generated. However, after a NOLOGGING operation against a table, partition, or index, if a media failure occurs before a backup is taken, then all tables, partitions, and indexes that have been modified might be corrupted.
Direct-path INSERT operations (except for dictionary updates) never generate undo logs. The NOLOGGING attribute does not affect undo, only redo. To be precise, NOLOGGING allows the direct-path INSERT operation to generate a negligible amount of redo (range-invalidation redo, as opposed to full image redo).
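The following sketch shows the attribute being set on the table rather than on the INSERT statement, as described above. The staging table summary_staging is hypothetical; SUMMARY is the table from the earlier example.

```sql
-- NOLOGGING is an attribute of the table, not of the INSERT statement.
ALTER TABLE summary NOLOGGING;

-- Direct-path insert; with NOLOGGING set, only minimal redo is generated.
INSERT /*+ APPEND */ INTO summary
SELECT * FROM summary_staging;   -- hypothetical source table

COMMIT;

-- Check the logging attribute, then restore it if required.
SELECT table_name, logging
FROM   USER_TABLES
WHERE  table_name = 'SUMMARY';

ALTER TABLE summary LOGGING;
```

Because a NOLOGGING load cannot be recovered from the redo logs, take a backup after the load if the data must survive a media failure.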
For backward compatibility, [UN]RECOVERABLE is still supported as an alternate keyword with the CREATE TABLE statement. This alternate keyword might not be supported, however, in future releases.
At the tablespace level, the logging clause specifies the default logging attribute for all tables, indexes, and partitions created in the tablespace. When an existing tablespace logging attribute is changed by the ALTER TABLESPACE statement, then all tables, indexes, and partitions created after the ALTER statement will have the new logging attribute; existing ones will not change their logging attributes. The tablespace-level logging attribute can be overridden by the specifications at the table, index, or partition level.
The default logging attribute is LOGGING. However, if you have put the database in NOARCHIVELOG mode, by issuing ALTER DATABASE NOARCHIVELOG, then all operations that can be done without logging will not generate logs, regardless of the specified logging attribute.
Multiple processes can work together simultaneously to create an index. By dividing the work necessary to create an index among multiple server processes, Oracle Database can create the index more quickly than if a single server process created the index sequentially.
Parallel index creation works in much the same way as a table scan with an ORDER BY clause. The table is randomly sampled and a set of index keys is found that equally divides the index into the same number of pieces as the DOP. A first set of query processes scans the table, extracts key-rowid pairs, and sends each pair to a process in a second set of query processes based on key. Each process in the second set sorts the keys and builds an index in the usual fashion. After all index pieces are built, the parallel coordinator simply concatenates the pieces (which are ordered) to form the final index.
Parallel local index creation uses a single server set. Each server process in the set is assigned a table partition to scan and for which to build an index partition. Because half as many server processes are used for a given DOP, parallel local index creation can be run with a higher DOP. However, the DOP is restricted to be less than or equal to the number of index partitions you wish to create. To avoid this, you can use the DBMS_PCLXUTIL package.
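A minimal sketch of the DBMS_PCLXUTIL approach follows. It assumes a partitioned table named sales and a local index named sales_cust_lix, both hypothetical, and the job and process counts are arbitrary illustrative values.

```sql
-- Hypothetical table and index names, for illustration only.
-- Step 1: create the local index definition without building any partition.
CREATE INDEX sales_cust_lix ON sales (cust_id) LOCAL UNUSABLE;

-- Step 2: build the index partitions with several concurrent jobs,
-- each job using its own parallel query processes.
BEGIN
  DBMS_PCLXUTIL.BUILD_PART_INDEX(
    jobs_per_batch => 4,        -- index partitions built concurrently
    procs_per_job  => 2,        -- parallel query processes per job
    tab_name       => 'SALES',
    idx_name       => 'SALES_CUST_LIX',
    force_opt      => FALSE);   -- build only partitions marked UNUSABLE
END;
/
```

With this approach the total number of processes at work is roughly jobs_per_batch multiplied by procs_per_job, so each partition build can itself run in parallel rather than being limited to one process per partition.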
You can optionally specify that no redo and undo logging should occur during index creation. This can significantly improve performance but temporarily renders the index unrecoverable. Recoverability is restored after the new index is backed up. If your application can tolerate a window where recovery of the index requires it to be re-created, then you should consider using the NOLOGGING clause.
The PARALLEL clause in the CREATE INDEX statement is the only way in which you can specify the DOP for creating the index. If the DOP is not specified in the parallel clause of CREATE INDEX, then the number of CPUs is used as the DOP. If there is no PARALLEL clause, index creation is done serially.
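The following sketch shows the two forms of the PARALLEL clause alongside NOLOGGING. The table sales and the index names are hypothetical, and the DOP of 8 is an arbitrary value.

```sql
-- Hypothetical table and index names, for illustration only.
-- Explicit DOP of 8; NOLOGGING skips redo generation for the build.
CREATE INDEX sales_prod_ix ON sales (prod_id)
  PARALLEL 8
  NOLOGGING;

-- PARALLEL with no number: the database chooses the DOP.
CREATE INDEX sales_time_ix ON sales (time_id)
  PARALLEL;

-- The PARALLEL and NOLOGGING settings remain on the index as attributes
-- after creation, so reset them if they were wanted only for the build.
ALTER INDEX sales_prod_ix NOPARALLEL;
ALTER INDEX sales_prod_ix LOGGING;
```

Resetting the attributes afterward is a design choice rather than a requirement; it keeps later queries that use the index from inheriting a DOP that was intended only for the build.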
When creating an index in parallel, the STORAGE clause refers to the storage of each of the subindexes created by the query server processes. Therefore, an index created with an INITIAL of 5 MB and a DOP of 12 consumes at least 60 MB of storage during index creation because each process starts with an extent of 5 MB. When the query coordinator process combines the sorted subindexes, some of the extents might be trimmed, and the resulting index might be smaller than the requested 60 MB.
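The arithmetic in this example (12 processes x 5 MB = 60 MB) corresponds to a statement along the following lines. The table and index names are hypothetical, and the query against USER_SEGMENTS is only one possible way to inspect the result.

```sql
-- Hypothetical names. Each of the 12 processes starts with a 5 MB extent,
-- so the build needs at least 12 x 5 MB = 60 MB while it runs.
CREATE INDEX sales_cust_ix ON sales (cust_id)
  STORAGE (INITIAL 5M)
  PARALLEL 12;

-- After the build, see how much space the finished index actually holds.
SELECT segment_name, ROUND(bytes / 1024 / 1024) AS mb
  FROM user_segments
 WHERE segment_name = 'SALES_CUST_IX';
```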
When you add or enable a UNIQUE or PRIMARY KEY constraint on a table, you cannot automatically create the required index in parallel. Instead, manually create an index on the desired columns, using the CREATE INDEX statement and an appropriate PARALLEL clause, and then add or enable the constraint. Oracle Database then uses the existing index when enabling or adding the constraint.
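The two-step approach might look like the following sketch. The table orders, the index orders_pk_ix, and the constraint orders_pk are hypothetical names, and the DOP of 4 is arbitrary.

```sql
-- Hypothetical table, index, and constraint names, for illustration only.
-- Step 1: build the unique index in parallel.
CREATE UNIQUE INDEX orders_pk_ix ON orders (order_id)
  PARALLEL 4;

-- Step 2: add the constraint. Naming the index in USING INDEX makes the
-- reuse explicit rather than relying on the database to pick it up.
ALTER TABLE orders
  ADD CONSTRAINT orders_pk PRIMARY KEY (order_id)
  USING INDEX orders_pk_ix;
```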
Multiple constraints on the same table can be enabled concurrently and in parallel if all the constraints are already in the ENABLE NOVALIDATE state. In the following example, the ALTER TABLE ... ENABLE CONSTRAINT statement performs the table scan that checks the constraint in parallel:
CREATE TABLE a (a1 NUMBER CONSTRAINT ach CHECK (a1 > 0) ENABLE NOVALIDATE) PARALLEL;
INSERT INTO a VALUES (1);
COMMIT;
ALTER TABLE a ENABLE CONSTRAINT ach;
This section provides an overview of parallel DML functionality. The topics covered include:
The functionality available using an INSERT statement can be summarized as follows:
Table 8-5 Summary of INSERT Features
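Insert Type | Parallel | Serial | NOLOGGING |
---|---|---|---|
Conventional | No | Yes | No |
Direct-path INSERT (APPEND) | Yes, but requires: ALTER SESSION ENABLE PARALLEL DML; a PARALLEL attribute on the table or a PARALLEL hint in the statement (the APPEND hint is optional) | Yes, but requires: the APPEND hint | Yes, but requires: the NOLOGGING attribute set for the table or partition |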
If parallel DML is enabled and there is a PARALLEL hint or PARALLEL attribute set for the table in the data dictionary, then inserts are parallel and appended, unless a restriction applies. If neither the PARALLEL hint nor the PARALLEL attribute is present, the insert is performed serially.
The append mode is the default during a parallel insert: data is always inserted into a new block which is allocated to the table. Therefore, the APPEND hint is optional. You should use append mode to increase the speed of INSERT operations, but not when space utilization needs to be optimized. You can use NOAPPEND to override append mode.
The APPEND hint applies to both serial and parallel insert: even serial inserts are faster if you use this hint. APPEND, however, does require more space and locking overhead.
You can use NOLOGGING with APPEND to make the process even faster. NOLOGGING means that no redo log is generated for the operation. NOLOGGING is never the default; use it when you wish to optimize performance. It should not normally be used when recovery is needed for the table or partition. If recovery is needed, be sure to take a backup immediately after the operation. Use the ALTER TABLE [NO]LOGGING statement to set the appropriate value.
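Combining NOLOGGING with APPEND might look like the following sketch. The tables sales_history and sales_stage are hypothetical names used only for illustration.

```sql
-- Hypothetical table names, for illustration only.
ALTER TABLE sales_history NOLOGGING;

-- Serial direct-path insert; with NOLOGGING set, only minimal redo
-- is generated for the loaded data.
INSERT /*+ APPEND */ INTO sales_history
SELECT * FROM sales_stage;

-- A direct-path insert must be committed before the same session
-- can read or modify the table again.
COMMIT;

-- Restore full logging, then back up the affected data promptly.
ALTER TABLE sales_history LOGGING;
```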
When the table or partition has the PARALLEL attribute in the data dictionary, that attribute setting is used to determine parallelism of INSERT, UPDATE, and DELETE statements as well as queries. An explicit PARALLEL hint for a table in a statement overrides the effect of the PARALLEL attribute in the data dictionary.
You can use the NO_PARALLEL hint to override a PARALLEL attribute for the table in the data dictionary. In general, hints take precedence over attributes.
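The interaction between the attribute and the hints can be sketched as follows, using the employees table that appears in the examples in this section. The DOP values 4 and 8 are arbitrary assumptions.

```sql
-- The dictionary attribute requests a DOP of 4 for this table.
ALTER TABLE employees PARALLEL 4;

-- A hint with a different DOP takes precedence for this statement only.
SELECT /*+ PARALLEL(e 8) */ COUNT(*) FROM employees e;

-- NO_PARALLEL requests a serial scan despite the attribute.
SELECT /*+ NO_PARALLEL(e) */ COUNT(*) FROM employees e;
```

Because the table is given the alias e in the FROM clause, the hints reference the alias rather than the table name.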
DML operations are considered for parallelization only if the session is in a PARALLEL DML enabled mode. (Use ALTER SESSION ENABLE PARALLEL DML to enter this mode.) The mode does not affect parallelization of queries or of the query portions of a DML statement.
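A typical session might follow the pattern sketched below. The employees table and its columns are taken from the examples in this section; the DOP of 4 is an arbitrary assumption.

```sql
-- Parallel DML is disabled by default and must be enabled for the session
-- at the start of a transaction.
ALTER SESSION ENABLE PARALLEL DML;

-- Candidate for parallel execution now that the session allows it.
UPDATE /*+ PARALLEL(employees 4) */ employees
   SET sal = sal * 1.1
 WHERE job = 'CLERK';

-- Commit (or roll back) before this session touches the table again.
COMMIT;

ALTER SESSION DISABLE PARALLEL DML;
```

Without the ALTER SESSION statement, the scan that locates the rows can still run in parallel, but the update itself is performed serially.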
In the INSERT ... SELECT statement you can specify a PARALLEL hint after the INSERT keyword, in addition to the hint after the SELECT keyword. The PARALLEL hint after the INSERT keyword applies to the INSERT operation only, and the PARALLEL hint after the SELECT keyword applies to the SELECT operation only. Thus, the parallelism of the INSERT operation and that of the SELECT operation are independent of each other. If one operation cannot be performed in parallel, it has no effect on whether the other operation can be performed in parallel.
The ability to parallelize inserts causes a change in existing behavior if the user has explicitly enabled the session for parallel DML and if the table in question has a PARALLEL attribute set in the data dictionary entry. In that case, existing INSERT ... SELECT statements that have the select operation parallelized can also have their insert operation parallelized.
If you query multiple tables, you can specify multiple SELECT PARALLEL hints and multiple PARALLEL attributes.
Example 8-7 Parallelizing INSERT ... SELECT
Add the new employees who were hired after the acquisition of ACME.
INSERT /*+ PARALLEL(employees) */ INTO employees
SELECT /*+ PARALLEL(ACME_EMP) */ * FROM ACME_EMP;
The APPEND keyword is not required in this example because it is implied by the PARALLEL hint.
The PARALLEL hint (placed immediately after the UPDATE or DELETE keyword) applies not only to the underlying scan operation, but also to the UPDATE or DELETE operation. Alternatively, you can specify UPDATE or DELETE parallelism in the PARALLEL clause specified in the definition of the table to be modified.
If you have explicitly enabled parallel DML for the session or transaction, UPDATE or DELETE statements that have their query operation parallelized can also have their UPDATE or DELETE operation parallelized. Any subqueries or updatable views in the statement can have their own separate PARALLEL hints or clauses, but these parallel directives do not affect the decision to parallelize the update or delete. If these operations cannot be performed in parallel, it has no effect on whether the UPDATE or DELETE portion can be performed in parallel.
Example 8-8 Parallelizing UPDATE and DELETE
Give a 10 percent salary raise to all clerks in Dallas.
UPDATE /*+ PARALLEL(employees) */ employees
   SET SAL = SAL * 1.1
 WHERE JOB = 'CLERK'
   AND DEPTNO IN (SELECT DEPTNO FROM DEPT WHERE LOCATION = 'DALLAS');
The PARALLEL hint is applied to the UPDATE operation as well as to the scan.
Example 8-9 Parallelizing UPDATE and DELETE
Remove all products in the grocery category because the grocery business line was recently spun off into a separate company.
DELETE /*+ PARALLEL(PRODUCTS) */ FROM PRODUCTS
 WHERE PRODUCT_CATEGORY = 'GROCERY';
Again, the parallelism is applied to the scan as well as to the DELETE operation on table PRODUCTS.
Parallel DML combined with the updatable join views facility provides an efficient solution for refreshing the tables of a data warehouse system. To refresh tables is to update them with the differential data generated from the OLTP production system.
In the following example, assume that you want to refresh a table named customers that has columns c_key, c_name, and c_addr. The differential data contains either new rows or rows that have been updated since the last refresh of the data warehouse. In this example, the updated data is shipped from the production system to the data warehouse system by means of ASCII files. These files must be loaded into a temporary table, named diff_customer, before starting the refresh process. You can use SQL*Loader with both the parallel and direct options to efficiently perform this task. You can use the APPEND hint when loading in parallel as well.
Once diff_customer is loaded, the refresh process can be started. It can be performed in two phases or by merging in parallel, as demonstrated in the following:
The following statement is a straightforward SQL implementation of the update using subqueries:
UPDATE customers
   SET (c_name, c_addr) =
       (SELECT c_name, c_addr
          FROM diff_customer
         WHERE diff_customer.c_key = customers.c_key)
 WHERE c_key IN (SELECT c_key FROM diff_customer);
Unfortunately, the two subqueries in this statement affect performance.
An alternative is to rewrite this query using updatable join views. To do this, you must first add a primary key constraint to the diff_customer table to ensure that the modified columns map to a key-preserved table:
CREATE UNIQUE INDEX diff_pkey_ind ON diff_customer(c_key) PARALLEL NOLOGGING;
ALTER TABLE diff_customer ADD PRIMARY KEY (c_key);
You can then update the customers table with the following SQL statement:
UPDATE /*+ PARALLEL(cust_joinview) */
  (SELECT /*+ PARALLEL(customers) PARALLEL(diff_customer) */
          customers.c_name AS c_name,
          customers.c_addr AS c_addr,
          diff_customer.c_name AS c_newname,
          diff_customer.c_addr AS c_newaddr
     FROM customers, diff_customer
    WHERE customers.c_key = diff_customer.c_key) cust_joinview
   SET c_name = c_newname,
       c_addr = c_newaddr;
The base scans feeding the join view cust_joinview are done in parallel. You can then parallelize the update to further improve performance, but only if the customers table is partitioned.
The last phase of the refresh process consists of inserting the new rows from the diff_customer temporary table to the customers table. Unlike the update case, you cannot avoid having a subquery in the INSERT statement:
INSERT /*+PARALLEL(customers)*/ INTO customers
SELECT * FROM diff_customer s
 WHERE s.c_key NOT IN (SELECT c_key FROM customers);
However, you can guarantee that the subquery is transformed into an anti-hash join by using the HASH_AJ hint. Doing so enables you to use parallel INSERT to execute the preceding statement efficiently. Parallel INSERT is applicable even if the table is not partitioned.
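One way the hinted statement might look is sketched below. It assumes that the new rows are those whose c_key value is not yet present in customers, and that the session has already enabled parallel DML; the alias s is an illustrative choice.

```sql
-- Assumes ALTER SESSION ENABLE PARALLEL DML has already been issued.
-- HASH_AJ goes inside the NOT IN subquery that is to become the anti-join.
INSERT /*+ PARALLEL(customers) */ INTO customers
SELECT *
  FROM diff_customer s
 WHERE s.c_key NOT IN (SELECT /*+ HASH_AJ */ c_key FROM customers);

COMMIT;
```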
You can combine updates and inserts into one statement, commonly known as a merge. The following statement achieves the same result as all of the statements in "Updating the Table in Parallel" and "Inserting the New Rows into the Table in Parallel":
MERGE INTO customers
USING diff_customer
   ON (diff_customer.c_key = customers.c_key)
 WHEN MATCHED THEN
   UPDATE SET c_name = diff_customer.c_name,
              c_addr = diff_customer.c_addr
 WHEN NOT MATCHED THEN
   INSERT VALUES (diff_customer.c_key, diff_customer.c_name, diff_customer.c_addr);