Oracle® Database 2 Day + Real Application Clusters Guide 11g Release 2 (11.2) Part Number E10743-01
This chapter describes how to administer your Oracle Clusterware environment, including the voting disks and the Oracle Cluster Registry (OCR).
Oracle Real Application Clusters (Oracle RAC) uses Oracle Clusterware as the infrastructure that binds together multiple nodes that then operate as a single server. In an Oracle RAC environment, Oracle Clusterware monitors all Oracle components (such as instances and listeners). If a failure occurs, Oracle Clusterware automatically attempts to restart the failed component and also redirects operations to a surviving component.
Oracle Clusterware includes a high availability framework for managing any application that runs on your cluster. Oracle Clusterware manages applications to ensure they start when the system starts. Oracle Clusterware also monitors the applications to make sure that they are always available. For example, if an application process fails, then Oracle Clusterware attempts to restart the process based on scripts that you customize. If a node in the cluster fails, then you can program application processes that typically run on the failed node to restart on another node in the cluster.
Oracle Clusterware includes two important components: the voting disk and the OCR. The voting disk is a file that manages information about node membership. The OCR is a file that contains information about the cluster node list, instance-to-node mapping information, and information about Oracle Clusterware resource profiles for resources that you have customized.
Each node in a cluster also has its own local OCR, called an Oracle Local Registry (OLR), that is created when Oracle Clusterware is installed. Multiple processes on each node have simultaneous read and write access to the OLR particular to the node on which they reside, whether or not Oracle Clusterware is fully functional. By default, the OLR is located at Grid_home/cdata/$HOSTNAME.olr.
The Oracle Clusterware installation process creates the voting disk and the OCR on shared storage. If you select the option for normal redundant copies during the installation process, then Oracle Clusterware automatically maintains redundant copies of these files to prevent the files from becoming single points of failure. The normal redundancy feature also eliminates the need for third-party storage redundancy solutions. When you use normal redundancy, Oracle Clusterware automatically maintains two copies of the OCR file and three copies of the voting disk file.
You can dynamically add and remove voting disks after installing Oracle RAC. Use the following commands, where path is the fully qualified path of the voting disk that you want to add or remove.
To add or remove a voting disk:
Run the following command as the root user to add a voting disk:
crsctl add css votedisk path
Run the following command as the root user to remove a voting disk:
crsctl delete css votedisk path
Note:
If your cluster is down, then you can use the -force option to add a voting disk without interacting with active Oracle Clusterware daemons. However, you may corrupt your cluster configuration if you use the -force option while a cluster node is active.
High availability configurations have redundant hardware and software that maintain operations by avoiding single points of failure. When a component is down, Oracle Clusterware redirects its managed resources to one of the redundant components. However, if a disaster strikes, or a massive hardware failure occurs, having redundant components might not be enough. To fully protect your system, it is important to have backups of your critical files.
The voting disk records node membership information. A node must be able to access more than half of the voting disks at any time. To avoid simultaneous loss of multiple voting disks, each voting disk should be on a storage device that does not share any components (controller, interconnect, and so on) with the storage devices used for the other voting disks.
For example, if you have five voting disks configured, then a node must be able to access at least three of the voting disks at any time. If a node cannot access the minimum required number of voting disks it is evicted, or removed, from the cluster. After the cause of the failure has been corrected and access to the voting disks has been restored, you can instruct Oracle Clusterware to recover the failed node and restore it to the cluster.
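The majority rule described above reduces to simple integer arithmetic. The following is a minimal sketch of that arithmetic only; the function names are hypothetical helpers for illustration and are not part of Oracle Clusterware.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the "more than half" rule.
# They do not query Oracle Clusterware; they only do the arithmetic.

# Print the smallest number of voting disks a node must be able to access.
min_accessible_votedisks() {
    total=$1
    echo $(( total / 2 + 1 ))
}

# Succeed if a node that can reach $1 of $2 voting disks keeps its membership.
node_keeps_membership() {
    accessible=$1
    total=$2
    [ "$accessible" -ge "$(min_accessible_votedisks "$total")" ]
}

min_accessible_votedisks 5   # prints 3
```

With five voting disks, a node that can reach only two of them fails the check, which corresponds to the eviction case described above.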
If you lose all copies of the voting disk and do not have a backup, the only safe way to re-create the voting disk is to reinstall Oracle Clusterware. Having a backup of the voting disk can drastically simplify the recovery of your system.
The voting disk files are backed up automatically by Oracle Clusterware if the contents of the files have changed in the following ways:
Configuration parameters, for example misscount, have been added or modified
Voting disk add or delete operations have been performed
If a voting disk is damaged, and no longer usable by Oracle Clusterware, you can recreate the voting disk. The voting disk contents are restored from a backup when a new voting file is added; this occurs regardless of whether or not the voting disk file is stored in Oracle Automatic Storage Management (Oracle ASM). If you need to replace a corrupt, damaged, or missing voting disk, then use CRSCTL to first delete the voting disk and then create a new voting disk in the same location.
Restoring a voting disk from a copy created with the operating system dd command is not supported.
Oracle Clusterware automatically creates OCR backups every 4 hours. At any one time, Oracle Clusterware always retains the latest 3 backup copies of the OCR that are 4 hours old, 1 day old, and 1 week old.
You cannot customize the backup frequencies or the number of files that Oracle Clusterware retains. You can use any backup software to copy the automatically generated backup files at least once daily to a different device from where the primary OCR file resides.
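One way to copy the automatically generated backups to another device is a small script run from a daily scheduler. The sketch below assumes the backup files carry an .ocr extension and that both directories are supplied by you; the function name and the example paths are placeholders, not Oracle defaults.

```shell
#!/bin/sh
# copy_ocr_backups SRC_DIR DEST_DIR
# Hypothetical helper: copies every *.ocr file from SRC_DIR to DEST_DIR,
# preserving timestamps. Assumes OCR backup files end in .ocr.
copy_ocr_backups() {
    src=$1
    dest=$2
    [ -d "$src" ] || { echo "no such backup directory: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    for f in "$src"/*.ocr; do
        [ -e "$f" ] || continue        # the glob matched nothing
        cp -p "$f" "$dest"/ || return 1
    done
}

# Example (both paths are placeholders):
# copy_ocr_backups /u01/grid/cdata/mycluster /backup/ocr
```

DEST_DIR should be on a different device from the primary OCR file, as recommended above.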
Use the ocrconfig utility to view the backups generated automatically by Oracle Clusterware.
To find the most recent backup of the OCR:
Run the following command on any node in the cluster:
ocrconfig -showbackup
Use the ocrconfig utility to force Oracle Clusterware to perform a backup of OCR at any time, rather than wait for the automatic backup that occurs at 4-hour intervals. This option is especially useful when you want to obtain a binary backup on demand, such as before you make changes to OCR.
To manually back up the contents of the OCR:
Log in as the root user.
Use the following command to force Oracle Clusterware to perform an immediate backup of the OCR:
ocrconfig -manualbackup
The date and identifier of the recently generated OCR backup are displayed.
(Optional) If you need to change the location for the OCR backup files, use the following command, where directory_name is the new location for the backups:
ocrconfig -backuploc directory_name
The default location for generating backups on Oracle Enterprise Linux systems is Grid_home/cdata/cluster_name, where cluster_name is the name of your cluster and Grid_home is the home directory of your Oracle grid infrastructure software. Because the default backup is on a local file system, Oracle recommends that you include the backup file created with the ocrconfig command as part of your operating system backup using standard operating system or third-party tools.
Tip:
You can use the ocrconfig -backuploc command to change the location where the OCR backups are created.
There are two methods for recovering the OCR. The first method uses automatically generated OCR file copies and the second method uses manually created OCR export files.
In the event of a failure, before you attempt to restore the OCR, verify that the OCR is actually unavailable.
To check the status of the OCR:
Run the following command:
ocrcheck
If this command does not display the message 'Device/File integrity check succeeded' for at least one copy of the OCR, then all copies of the OCR have failed. You must restore the OCR from a backup or OCR export.
If there is at least one copy of the OCR available, you can use that copy to restore the other copies of the OCR.
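The check described above can be scripted by searching the ocrcheck output for the success message. The following is a minimal sketch; ocr_copy_available is a hypothetical helper name, and the sketch assumes the message text appears in the output exactly as quoted above.

```shell
#!/bin/sh
# ocr_copy_available: hypothetical helper that reads ocrcheck output on
# stdin and succeeds when at least one OCR copy reported a successful
# device/file integrity check.
ocr_copy_available() {
    grep -q 'Device/File integrity check succeeded'
}

# Typical use on a cluster node (requires ocrcheck on the PATH):
# if ocrcheck | ocr_copy_available; then
#     echo "At least one OCR copy is intact"
# else
#     echo "All OCR copies have failed; restore from a backup or an export"
# fi
```

Because the function only reads standard input, it can also be run against saved ocrcheck output when diagnosing a problem after the fact.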
When restoring the OCR from automatically generated backups, you first have to determine which backup file you will use for the recovery.
To restore the OCR from an automatically generated backup on an Oracle Enterprise Linux system:
Log in as the root user.
Identify the available OCR backups using the ocrconfig command:
[root]# ocrconfig -showbackup
Review the contents of the backup using the following ocrdump command, where file_name is the name of the OCR backup file whose contents should be written to the file ocr_dump_output_file:
[root]# ocrdump ocr_dump_output_file -backupfile file_name
If you do not specify an output file name, then the OCR contents are written to a file named OCRDUMPFILE in the current directory.
As the root user, stop Oracle Clusterware on all the nodes in your Oracle RAC cluster by executing the following command:
[root]# crsctl stop cluster -all
As the root user, restore the OCR by applying an OCR backup file that you identified earlier with ocrconfig -showbackup. Use the following command, where file_name is the name of the OCR backup file that you want to restore. Make sure that the OCR devices that you specify in the OCR configuration exist, and that these OCR devices are valid, before running this command.
[root]# ocrconfig -restore file_name
As the root user, restart Oracle Clusterware on all the nodes in your cluster by running the following command:
[root]# crsctl start cluster -all
Use the Cluster Verification Utility (CVU) to verify the OCR integrity. Exit the root user account, and as the software owner of the Oracle grid infrastructure run the following command, where the -n all argument retrieves a list of all the cluster nodes that are configured as part of your cluster:
cluvfy comp ocr -n all [-verbose]
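Because the order of the steps matters, it can help to print the full command sequence for review before running anything. The sketch below only prints the commands from the procedure above and executes none of them; ocr_restore_plan and the example file name are hypothetical.

```shell
#!/bin/sh
# ocr_restore_plan BACKUP_FILE
# Hypothetical helper: prints, in order, the commands from the restore
# procedure. Nothing is executed; the output is for review only.
ocr_restore_plan() {
    backup=$1
    [ -n "$backup" ] || { echo "usage: ocr_restore_plan backup_file" >&2; return 1; }
    cat <<EOF
# as root
ocrconfig -showbackup
ocrdump ocr_dump_output_file -backupfile $backup
crsctl stop cluster -all
ocrconfig -restore $backup
crsctl start cluster -all
# as the Oracle grid infrastructure software owner
cluvfy comp ocr -n all
EOF
}

# Example with a placeholder file name:
ocr_restore_plan /path/to/backup_file
```

Printing the plan first makes it easier to confirm that Oracle Clusterware is stopped on all nodes before the restore and restarted only afterward.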
You maintain the Oracle Local Registry using the OCRCHECK, OCRDUMP, and OCRCONFIG utilities with the -local option.
To check the status of the OLR:
As the root user, use the OCRCHECK utility, as shown in the following example:
[root]# ocrcheck -local
This command produces output similar to the following:
Status of Oracle Local Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262132
         Used space (kbytes)      :       9200
         Available space (kbytes) :     252932
         ID                       :  604793089
         Device/File Name         : /u01/grid/cdata/node01.olr
                                    Device/File integrity check succeeded
         Local registry integrity check succeeded
         Logical corruption check succeeded
To view the contents of the OLR:
Use the OCRDUMP utility to display the contents of the OLR to the terminal window that initiated the program, as follows:
ocrdump -local -stdout
To export the OLR to a file:
As the root user, use the OCRCONFIG utility, as shown in the following example:
[root]# ocrconfig -local -export file_name
To import a specified file to the OLR:
As the root user, use the OCRCONFIG utility, as shown in the following example:
[root]# ocrconfig -local -import file_name
To change the location of the OLR file on the local node:
As the root user, use the OCRCONFIG utility to modify the location where the OLR file is stored on the local host:
[root]# ocrconfig -local -repair -replace current_olr_file_name -replacement new_olr_file_name
This section describes how to administer the OCR. The OCR contains information about the cluster node list, which instances are running on which nodes, and information about Oracle Clusterware resource profiles for applications that have been modified to be managed by Oracle Clusterware.
Note:
The operations in this section affect the OCR for the entire cluster. However, the ocrconfig command cannot modify OCR configuration information for nodes that are shut down or for nodes on which Oracle Clusterware is not running. Avoid shutting down nodes while modifying the OCR using the ocrconfig command.
Oracle Clusterware supports up to 5 OCR copies. You can add an OCR location after an upgrade or after completing the Oracle RAC installation. Additional OCR copies provide greater fault tolerance.
To add an OCR file:
As the root user, enter the following command to add a new OCR file:
[root]# ocrconfig -add new_ocr_file_name
This command updates the OCR configuration on all the nodes on which Oracle Clusterware is running.
If you need to change the location of an existing OCR, or change the location of a failed OCR to the location of a working one, you can use the following procedure as long as one OCR file remains online.
To change the location of an OCR or replace an OCR file:
Use the OCRCHECK utility to verify that a copy of the OCR other than the one you are going to replace is online, using the following command:
ocrcheck
Note:
The OCR that you are replacing can be either online or offline.
Use the following command to verify that Oracle Clusterware is running on the node on which you are going to perform the replace operation:
crsctl check cluster -all
As the root user, enter the following command to designate a new location for the specified OCR file:
[root]# ocrconfig -replace source_ocr_file -replacement destination_ocr_file
This command updates the OCR configuration on all the nodes on which Oracle Clusterware is running.
Use the OCRCHECK utility to verify that the OCR replacement file is online:
ocrcheck
To remove an OCR file, at least one copy of the OCR must be online. You can remove an OCR location to reduce OCR-related overhead or to stop mirroring your OCR because you moved the OCR to a redundant storage system, such as a redundant array of independent disks (RAID).
To remove an OCR location from your Oracle RAC cluster:
Use the OCRCHECK utility to ensure that at least one OCR other than the OCR that you are removing is online.
ocrcheck
Note:
Do not perform this OCR removal procedure unless there is at least one active OCR online.
As the root user, run the following command on any node in the cluster to remove a specific OCR file:
[root]# ocrconfig -delete ocr_file_name
This command updates the OCR configuration on all the nodes on which Oracle Clusterware is running.
If one of the nodes in your cluster was not available when you modified the OCR configuration, then you might need to repair the OCR configuration on that node before it is restarted.
To repair an OCR configuration:
As the root user, run one or more of the following commands on the node on which Oracle Clusterware is stopped, depending on the number and type of changes that were made to the OCR configuration:
[root]# ocrconfig -repair -add new_ocr_file_name
[root]# ocrconfig -repair -delete ocr_file_name
[root]# ocrconfig -repair -replace source_ocr_file -replacement dest_ocr_file
These commands update the OCR configuration only on the node from which you run the command.
Note:
You cannot perform these operations on a node on which the Oracle Clusterware daemon is running.
Restart Oracle Clusterware on the node you have just repaired.
As the root user, check the OCR configuration integrity of your cluster using the following command:
[root]# ocrcheck
The OCRCHECK utility displays the data block format version used by the OCR, the available space and used space in the OCR, the ID used for the OCR, and the locations you have configured for the OCR. The OCRCHECK utility calculates a checksum for all the data blocks in all the OCRs that you have configured to verify the integrity of each block. It also returns an individual status for each OCR file as well as a result for the overall OCR integrity check. The following is a sample of the OCRCHECK output:
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262144
         Used space (kbytes)      :      16256
         Available space (kbytes) :     245888
         ID                       :  570929253
         Device/File Name         : +CRS_DATA
                                    Device/File integrity check succeeded
         ...
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check succeeded
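The space figures in the OCRCHECK output can be turned into a usage percentage for monitoring. The following is a minimal sketch; ocr_used_percent is a hypothetical helper, and it assumes the utility prints one field per line with a colon between the label and the value.

```shell
#!/bin/sh
# ocr_used_percent: hypothetical helper that reads ocrcheck output on stdin
# and prints used space as a whole-number percentage of total space.
# Fails if no positive "Total space (kbytes)" value is found.
ocr_used_percent() {
    awk -F: '
        /Total space \(kbytes\)/ { total = $2 + 0 }
        /Used space \(kbytes\)/  { used  = $2 + 0 }
        END {
            if (total <= 0) exit 1
            printf "%d\n", used * 100 / total
        }'
}

# Example using the figures from the sample output:
ocr_used_percent <<'EOF'
         Total space (kbytes)     :     262144
         Used space (kbytes)      :      16256
EOF
# prints 6
```

On a cluster node, the same function can read live output with ocrcheck | ocr_used_percent.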
The OCRCHECK utility creates a log file in the following directory, where Grid_home is the location of the Oracle grid infrastructure installation, and hostname is the name of the local node:
Grid_home/log/hostname/client
The log files have names of the form ocrcheck_nnnnn.log, where nnnnn is the process ID of the operating system session that issued the ocrcheck command.
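Putting the directory and file name conventions together, the full path of a given log file can be built as follows. This is a minimal sketch; ocrcheck_log_path is a hypothetical helper, and the process ID in the example is made up.

```shell
#!/bin/sh
# ocrcheck_log_path GRID_HOME HOSTNAME PID
# Hypothetical helper: prints the expected path of the ocrcheck log file
# for the given Grid home, node name, and process ID.
ocrcheck_log_path() {
    printf '%s/log/%s/client/ocrcheck_%s.log\n' "$1" "$2" "$3"
}

ocrcheck_log_path /u01/grid node01 12345
# prints /u01/grid/log/node01/client/ocrcheck_12345.log
```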
Table 5-1 describes common OCR problems and their corresponding solutions.
Table 5-1 Common OCR Problems and Solutions
Problem | Solution |
---|---|
The OCR is not mirrored. | Run the |
A copy of the OCR has failed and you must replace it. Error messages are being reported in Enterprise Manager or the OCR log file. | Run the |
The OCR configuration was updated incorrectly. | Run the |
You are experiencing a severe performance effect from updating multiple OCR files, or you want to remove an OCR file for other reasons. | Run the |