Oracle® Database VLDB and Partitioning Guide 11g Release 2 (11.2), Part Number E10837-02
Storage performance in data warehouse environments often translates into I/O throughput (MB/s). For OLTP systems, the number of I/O requests per second (IOPS) is a key performance measure.
This chapter discusses storage management for the database files in a VLDB environment only. Non-database files, including the Oracle Database software, are not discussed because management of those files is no different from a non-VLDB database. Therefore, the focus is on the high availability, performance, and manageability aspects of storage systems for VLDB environments.
This chapter contains the following topics:
Note:
Oracle supports the use of database files on raw devices and on file systems, and supports the use of Automatic Storage Management (ASM) on top of raw devices or logical volumes. Oracle recommends that ASM be used whenever possible.
High availability can be achieved by implementing storage redundancy. In storage terms, redundancy is implemented through mirroring techniques. There are three options for mirroring in a database environment:
Hardware-based mirroring
Using ASM for mirroring
Software-based mirroring not using ASM
Oracle does not recommend software-based mirroring that does not use ASM. The following sections discuss hardware mirroring and mirroring using ASM.
Note:
In a cluster configuration, the software you use must support cluster capabilities. ASM is a cluster file system for Oracle database files.
Most external storage devices support different RAID (Redundant Array of Inexpensive Disks) levels. The most commonly used high availability hardware RAID levels in VLDB environments are RAID 1 and RAID 5. Though less commonly used in VLDB environments, other high availability RAID levels can also be used.
RAID 1 is a basic mirroring technique. Every storage block written to storage is stored twice on different physical devices, as defined by the RAID setup. RAID 1 provides fault tolerance because if one device fails, there is another, mirrored, device that can respond to the request for data. The two writes in a RAID 1 setup are generated at the storage level. RAID 1 requires at least two physical disks to be effective.
Storage devices generally can serve a read request from either the primary copy or the mirror, which may result in better read performance than other RAID configurations designed for high availability. RAID 1 is the simplest high availability implementation in hardware, but it requires double the amount of storage needed to store the data. RAID 1 is often combined with RAID 0 (striping) in RAID 0+1 configurations. In the simplest RAID 0+1 configuration, individual stripes are mirrored across two physical devices.
RAID 5 requires at least three storage devices, but commonly four to six devices are used in a RAID 5 group. When RAID 5 is used, for every data block written to a device, parity is calculated and stored on a different device; on reads, the parity is checked. The parity calculation takes place in the storage layer. RAID 5 provides high availability because if a device fails, then its contents can be rebuilt from the data and parity stored on the other devices.
RAID 5 provides good read performance. Writes may be slowed down by the parity calculation in the storage layer. RAID 5 does not require double the amount of storage; the overhead is a smaller percentage that depends on the number of devices in the RAID 5 group. RAID 5 is relatively complex and, as a result, not all storage devices support a RAID 5 setup.
Automatic Storage Management (ASM) provides software-based mirroring capabilities. ASM supports normal redundancy (mirroring) and high redundancy (triple mirroring). ASM also supports the use of external redundancy, in which case ASM does not perform additional mirroring. ASM normal redundancy can be compared to RAID 1 hardware mirroring.
With ASM mirroring, the mirrored copies are written by the database servers. As a result, writes require more I/O throughput when you use ASM mirroring than when you use hardware-based mirroring. Depending on your configuration and the speed of the hardware RAID controllers, ASM mirroring or hardware RAID may introduce a bottleneck for data loads.
In ASM, you define failure groups to control redundancy placement: ASM mirrors data across failure group boundaries. For example, in a VLDB environment, you can define one failure group per disk array, in which case ASM ensures that mirrored data is placed on a different disk array. That way, you can survive not only the failure of a single disk in a disk array, but also the crash of an entire disk array or the failure of all channels to that disk array. Hardware RAID configurations typically do not support this level of fault tolerance.
ASM using normal redundancy requires double the amount of disk space needed to store the data. High redundancy requires triple the amount of disk space.
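The following statement is a minimal sketch of this approach, assuming two disk arrays; the disk group name, failure group names, and device paths are hypothetical and depend on your storage configuration:

CREATE DISKGROUP data NORMAL REDUNDANCY
  -- Each failure group corresponds to one physical disk array.
  FAILGROUP array1 DISK '/devices/array1/disk1', '/devices/array1/disk2'
  FAILGROUP array2 DISK '/devices/array2/disk1', '/devices/array2/disk2';

With normal redundancy, ASM stores the mirror copy of every extent in a different failure group, so the loss of an entire array still leaves a complete copy of the data.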
To achieve optimum throughput from storage devices, multiple disks must work in parallel. This can be achieved using a technique called striping, which stores data blocks in equi-sized slices (stripes) across multiple devices. Striping enables storage configurations to deliver good performance and throughput.
Optimum storage device performance is a trade-off between seek time and accessing consecutive blocks on disk. In a VLDB environment, a 1 MB stripe size provides a good balance for optimal performance and throughput, both for OLTP systems and data warehouse systems. There are three options for striping in a database environment:
Hardware-based striping
Software-based striping using ASM
Software-based striping not using ASM
You can use a combination of striping techniques, but make sure that stripes are physically stored on different devices in order to realize the performance advantages of striping. From a conceptual perspective, software-based striping not using ASM is very similar to hardware-based striping. The following sections discuss hardware-based striping and striping using ASM.
Note:
In a cluster configuration, the software you use must support cluster capabilities. ASM is a cluster file system for Oracle database files.
Most external storage devices provide striping capabilities. The most commonly used striping techniques to improve storage performance are RAID 0 and RAID 5.
RAID 0 requires at least two devices. Data written to storage is split into stripe-size chunks that are stored alternately across the devices. This technique enables the use of multiple devices and multiple channels to the devices.
Despite its name, RAID 0 is not redundant. Loss of a device in a RAID 0 configuration results in data loss, so RAID 0 should always be combined with some form of redundancy in a mission-critical environment. Database implementations using RAID 0 are often combined with RAID 1, basic mirroring, in RAID 0+1 configurations.
RAID 5 configurations spread data across the available devices in the RAID group using a hardware-specific stripe size. As a result, multiple devices and channels are used to read and write data. Due to its more complex parity calculation, not all storage devices support RAID 5 configurations.
Automatic Storage Management (ASM) always stripes across all devices presented to it in the context of a disk group. A disk group is a logical storage pool in which you create data files. The default ASM stripe size is 1 MB, which is a good stripe size for a VLDB.
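For example, creating a data file in a disk group requires no further striping configuration; a minimal sketch with hypothetical tablespace and disk group names:

-- ASM automatically stripes the data file across all disks in the
-- DATA disk group, in 1 MB allocation units by default.
CREATE TABLESPACE sales_ts DATAFILE '+DATA' SIZE 100G;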
See Also:
Oracle Database Storage Administrator's Guide for more information about ASM configuration
Oracle recommends that you use disks with the same performance characteristics in a disk group. All disks in a disk group should also be the same size for optimum data distribution and hence optimum performance and throughput. The disk group should span as many physical spindles as possible to get the best performance. The disk group configuration for a VLDB does not have to be different from the disk group configuration for a non-VLDB.
See Also:
Oracle Database Storage Administrator's Guide for more details
ASM can be used on top of previously striped storage devices. If you use such a configuration, then make sure not to introduce hot spots by defining disk groups that span logical devices which physically may be using the same resource (disk, controller, or channel to disk) rather than other available resources. Always make sure that ASM stripes are distributed equally across all physical devices.
In an Information Lifecycle Management environment, you cannot use striping across all devices, because all data would then be distributed across all storage pools. In an ILM environment, different storage pools typically have different performance characteristics. Therefore, tablespaces should not span storage pools, and hence data files for the same tablespace should not be stored in multiple storage pools.
Storage in an ILM environment should be configured to use striping across all devices in a storage pool. If you use ASM, then create separate disk groups for different storage pools. With this approach, a tablespace does not store data files in different disk groups. Data can be moved online between tablespaces using partition movement operations in the case of partitioned tables, or using the DBMS_REDEFINITION package when the tables are not partitioned.
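For example, a partition can be relocated to a tablespace in a lower-cost storage pool with a single statement. This is a sketch with hypothetical table, partition, and tablespace names:

-- Move one partition to the low-cost pool's tablespace and rebuild
-- the affected indexes as part of the same operation.
ALTER TABLE sales MOVE PARTITION sales_q1_2005
  TABLESPACE low_cost_ts UPDATE INDEXES;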
Partition placement is not a concern if you stripe across all available devices and distribute the load across all available resources. If you cannot stripe data files across all available devices, then consider partition placement to optimize the use of all available resources (physical disk spindles, disk controllers, and channels to disk).
I/O-intensive queries or DML operations should make optimal use of all available resources. Storing database object partitions in specific tablespaces, each of which uses a different set of hardware resources, enables you to use all resources for operations against a single partitioned database object. Make sure that I/O-intensive operations can use all resources by using an appropriate partitioning technique.
Hash partitioning and hash subpartitioning on a unique or almost unique column or set of columns, with the number of hash partitions equal to a power of 2, is the only technique likely to result in an even workload distribution when using partition placement to optimize I/O resource utilization. Other partitioning and subpartitioning techniques may yield similar benefits, depending on your application.
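A minimal sketch of such a table, with a power-of-2 partition count spread across four tablespaces; all names are hypothetical:

CREATE TABLE sales_hash
  ( cust_id     NUMBER
  , time_id     DATE
  , amount_sold NUMBER )
  PARTITION BY HASH (cust_id)
  PARTITIONS 8
  -- Partitions are assigned to the listed tablespaces round-robin, so
  -- each tablespace (and its set of devices) holds two partitions.
  STORE IN (ts_1, ts_2, ts_3, ts_4);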
Oracle Database enables the creation of bigfile tablespaces. A bigfile tablespace consists of a single data file or temporary file, which can be up to 128 TB in size. The use of bigfile tablespaces can significantly reduce the number of data files for your database. Oracle Database 11g introduces parallel RMAN backup and restore on single data files.
As a result, there is no disadvantage to using bigfile tablespaces, and you may choose to use them to significantly reduce the number of data and temp files.
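A minimal sketch of a bigfile tablespace on an ASM disk group; the tablespace name, disk group, and sizes are hypothetical:

-- One bigfile tablespace replaces what might otherwise be dozens of
-- smallfile data files.
CREATE BIGFILE TABLESPACE sales_big_ts
  DATAFILE '+DATA' SIZE 1T AUTOEXTEND ON NEXT 10G;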
File allocation is a serial process. If you use automatic allocation for your tables and automatically extensible data files, then a large data load can be impacted by the amount of time it takes to extend the file, regardless of whether you use bigfile tablespaces. However, if you preallocate data files and you use multiple data files, then you can spawn multiple processes to add data files concurrently.
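For example, with a smallfile tablespace, space can be preallocated from several sessions at the same time; the names and sizes below are hypothetical:

-- Session 1
ALTER TABLESPACE sales_small_ts ADD DATAFILE '+DATA' SIZE 100G;

-- Session 2, issued concurrently from a separate connection
ALTER TABLESPACE sales_small_ts ADD DATAFILE '+DATA' SIZE 100G;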
A very important characteristic of a VLDB is its large size. Storage scalability and management are important factors in a VLDB environment. The large size introduces the following challenges:
Simple statistics suggest that storage components are more likely to fail because VLDBs use more components.
A small relative growth in a VLDB may amount to a significant absolute growth, possibly resulting in many devices that must be added.
Despite its size, a VLDB's performance and (often) availability requirements are no different from those of smaller systems.
The storage configuration you choose should be able to cope with these challenges. Regardless of whether storage is added or removed, deliberately or accidentally, your system should remain in an optimal state from a performance and high availability perspective.
The stripe and mirror everything (SAME) methodology has been recommended by Oracle for many years as an approach that optimizes high availability, performance, and manageability. To simplify the configuration further, the SAME methodology recommends a fixed stripe size of 1 MB as a good starting point for both OLTP and data warehouse systems. ASM implements the SAME methodology and adds automation on top of it.
To achieve maximum performance, the SAME methodology proposes striping across as many physical devices as possible. This can be achieved without ASM, but if the storage configuration changes, for example, by adding or removing devices, then the layout of the database files on the devices should change as well. ASM performs this task automatically in the background. In most non-ASM environments, re-striping is a major task that often involves manual intervention.
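For example, adding a disk to an ASM disk group triggers an automatic rebalance that restores an even distribution; a minimal sketch with a hypothetical disk group name and device path:

-- ASM redistributes existing data evenly across all disks, including
-- the new one, while the database remains available.
ALTER DISKGROUP data ADD DISK '/devices/array1/disk3' REBALANCE POWER 4;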
In an ILM environment, you apply the SAME methodology to every storage pool.
Configuration of Automatic Storage Management (ASM) for VLDBs is not very different from ASM configuration for non-VLDBs. Certain parameters, such as the memory allocation for the ASM instance, may require higher values.
See Also:
Oracle Database Storage Administrator's Guide for more details
Oracle Database 11g introduces ASM variable allocation units. Large variable allocation units are beneficial for environments that use large sequential I/Os. VLDBs in general, and large data warehouses in particular, are good candidates to take advantage of large allocation units. The allocation unit can be set between 1 MB and 64 MB in powers of two (that is, 1, 2, 4, 8, 16, 32, or 64). If your workload contains a significant number of queries scanning large tables, then you should use large ASM allocation units; use 64 MB for a very large data warehouse system. Large allocation units also reduce the memory requirements for ASM and improve ASM startup time.
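The allocation unit size can only be set when a disk group is created. A minimal sketch, with a hypothetical disk group name and device paths:

-- A 64 MB allocation unit suits the large sequential I/Os typical of
-- data warehouse scans.
CREATE DISKGROUP dw_data EXTERNAL REDUNDANCY
  DISK '/devices/dw/disk1', '/devices/dw/disk2'
  ATTRIBUTE 'au_size' = '64M';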
See Also:
Oracle Database Storage Administrator's Guide for details on how to set up and configure ASM
The Performance page in Oracle Enterprise Manager (Enterprise Manager) provides I/O performance overviews and is useful for monitoring the performance and throughput of the storage configuration. The I/O charts show I/O statistics collected from all database clients and from the underlying system hosting the database. The I/O wait time for a database process represents the amount of time that the process could have been doing useful work if a pending I/O had completed. Oracle Database captures the I/O wait times for all important I/O components in a uniform fashion so that every I/O wait by any Oracle process can be derived from the I/O statistics.
Three graphs display the following I/O performance data:
Single-block I/O latency: production systems should not show latency of more than 10 milliseconds. High latency points to a potential bottleneck in the storage configuration and possibly to hot spots.
I/O megabytes per second: this metric shows the I/O throughput. I/O throughput is an important measure in data warehouse performance.
I/O per second: this metric, commonly referred to as IOPS, is key for an OLTP application. Large OLTP applications with many concurrent users generate a high number of IOPS.
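The same statistics can also be queried directly from the dynamic performance views. A minimal sketch, assuming the V$IOSTAT_FUNCTION view and its request and megabyte columns (values are cumulative since instance startup):

-- Break down I/O by database function (LGWR, DBWR, direct reads, ...).
SELECT function_name,
       small_read_reqs + small_write_reqs           AS small_io_requests,
       large_read_megabytes + large_write_megabytes AS large_io_mb
FROM   v$iostat_function
ORDER  BY function_name;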
Other charts are also available depending on your selection for I/O breakdown:
At the instance level
Monitoring I/O by Function
Monitoring I/O by Type
Monitoring I/O by Consumer Group
At the host level
Total Disk I/O Per Second
Longest I/O Service Time
Because the database resides on a set of disks, the performance of the I/O subsystem is very important to database performance. Important disk statistics include the disk I/Os per second and the length of the service times. These statistics show whether the disk is performing optimally or is being overworked. Oracle Database 2 Day + Performance Tuning Guide provides more information about how to monitor disk I/O utilization in the sections "Monitoring Instance Activity" and "Monitoring Host Activity" of the chapter "Monitoring Real-Time Database Performance".