Oracle® TimesTen In-Memory Database TimesTen to TimesTen Replication Guide Release 11.2.1 Part Number E13072-02 |
|
|
View PDF |
This chapter describes how to design a highly available system and define replication schemes. It includes the following topics:
When you are planning an active standby pair, keep in mind the following restrictions:
You can specify at most 127 subscriber data stores.
Each master and subscriber data store must be on a different node to ensure high availability.
The active master data store and the standby master data store should be on the same LAN.
The clock skew between the active node and the standby node cannot exceed 250 milliseconds.
For the initial set-up, you can create a standby master data store only by duplicating the active master data store with the ttRepAdmin
-duplicate
utility or the ttRepDuplicateEx
C function.
Read-only subscribers can be created only by duplicating the standby master data store. If the standby master data store is unavailable, then the read-only subscribers can be created by duplicating the active master standby store. See "Duplicating a data store".
After failover, the new standby master data store can only be recovered from the active master data store by duplicating the active master data store unless return twosafe replication is used between the active and the standby master data stores. If return twosafe replication is used, the automated master catch-up feature may be used instead. See "Automatic catch-up of a failed master data store".
Operations on replicated tables are not allowed on the standby master data store and the subscriber stores. However, operations on sequences and XLA bookmarks are allowed on the standby master data store and the subscriber stores.
Replication from the standby master data store to the read-only subscribers occurs asynchronously.
ALTER ACTIVE STANDBY PAIR statements can be executed only on the active master data store. If ALTER ACTIVE STANDBY PAIR is executed on the active master data store, then the standby master data store must be regenerated by duplicating the active master data store. All subscribers must also be regenerated from the standby master data store. See "Reversing the roles of the active and standby master data stores".
Before you define the active standby pair, you need to define the DSNs for the active master, standby master and read-only subscriber data stores. On Linux or UNIX, create an odbc.ini
file. On Windows, use the ODBC Administrator to name the data stores and set data store attributes. See "Step 1: Create the DSNs for the master and the subscriber data stores" for an example.
Each data store "name" specified in a replication scheme must match the prefix of the data store file name without the path given for the DataStore
attribute in the DSN definition for the data store. A replication scheme that uses the names specified in the Data Source Name
attributes does not work. To avoid confusion, use the same name for both your DataStore
and Data Source Name
attributes in each DSN definition. For example, if the data store path is directory/subdirectory/foo.ds0
, then foo
is the data store name that you should use.
Use the CREATE ACTIVE STANDBY PAIR SQL statement to create an active standby pair replication scheme. The complete syntax for the CREATE ACTIVE STANDBY PAIR statement is provided in the Oracle TimesTen In-Memory Database SQL Reference.
You must have the ADMIN privilege to use the CREATE ACTIVE STANDBY PAIR statement and to perform other replication operations.
Table 3-1 shows the components of an active standby pair replication scheme and identifies the parameters associated with the topics in this chapter.
Table 3-1 Components of an active standby pair replication scheme
Component | See... |
---|---|
|
|
|
|
|
|
|
|
|
|
|
"Including or excluding data store objects from replication" |
Use the full store name described in "Defining the DSNs for the data stores". The first data store name designates the active master data store. The second data store name designates the standby master data store. Read-only subscriber data stores are indicated by the SUBSCRIBER clause.
You can also specify the hosts where the data stores reside by using an IP address or a literal host name surrounded by double quotes.
The active master data store and the standby master data store should be on separate hosts to achieve a highly available system. Read-only subscribers can be either local or remote. A remote subscriber provides protection from site-specific disasters.
You can provide an optional host ID as part of FullStoreName
:
DataStoreName [ON Host]
Host
can be either an IP address or a literal host name. It is good practice to surround a host name with double quotes.
You can configure your replication scheme with a return service to ensure a higher level of confidence that your replicated data is consistent on the active master and standby master data stores. See "Copying updates between data stores". This section describes how to configure and manage the return receipt and return twosafe services. NO RETURN is the default setting.
The following sections describe the following return service clauses:
TimesTen provides an optional return receipt service to loosely couple or synchronize your application with the replication mechanism.
You can specify the RETURN RECEIPT clause to enable the return receipt service for the standby master data store. With return receipt enabled, when your application commits a transaction for an element on the active master data store, the application remains blocked until the standby acknowledges receipt of the transaction update.
If the standby is unable to acknowledge receipt of the transaction within a configurable timeout period, your application receives a tt_ErrRepReturnFailed
(8170) warning on its commit request. See "Setting the return service timeout period" for more information on the return service timeout period.
You can use the ttRepXactStatus
procedure to check on the status of a return receipt transaction. See "Checking the status of return service transactions" for details.
You can also configure the replication agent to disable the return receipt service after a specific number of timeouts. See "Managing return service timeout errors" for details.
RETURN RECEIPT enables notification of receipt for all transactions. You can use RETURN RECEIPT with the BY REQUEST clause to enable receipt notification only for specific transactions identified by your application.
If you specify RETURN RECEIPT BY REQUEST, you must use the ttRepSyncSet
procedure to enable the return receipt service for a transaction. The call to enable the return receipt service must be part of the transaction (autocommit must be off).
If the standby master data store is unable to acknowledge receipt of the transaction update within a configurable timeout period, the application receives a tt_ErrRepReturnFailed
(8170) warning on its commit request. See "Setting the return service timeout period" for more information on the return service timeout period.
You can use ttRepSyncGet
to check if a return service is enabled and obtain the timeout value. For example:
Command> CALL ttRepSyncGet(); < 01, 45, 1> 1 row found.
TimesTen provides a return twosafe service to fully synchronize your application with the replication mechanism. The return twosafe service ensures that each replicated transaction is committed on the standby master data store before it is committed on the active master data store. If replication is unable to verify the transaction has been committed on the standby, it returns notification of the error. Upon receiving an error, the application can either take a unique action or fall back on preconfigured actions, depending on the type of failure.
When replication is configured with RETURN TWOSAFE, you must disable the AutoCommit
connection attribute.
If the standby is unable to acknowledge commit of the transaction update within a configurable timeout period, the application receives a tt_ErrRepReturnFailed
(8170) warning on its commit request. See "Setting the return service timeout period" for more information on the return service timeout period.
RETURN TWOSAFE enables notification of commit on the standby master data store for all transactions. You can use RETURN TWOSAFE with the BY REQUEST clause to enable notification of a commit on the standby only for specific transactions identified by your application.
If you specify RETURN TWOSAFE BY REQUEST for a standby master data store, you must use the ttRepSyncSet
procedure to enable the return twosafe service for a transaction. The call to enable the return twosafe service must be part of the transaction (autocommit must be off).
If the standby is unable to acknowledge commit of the transaction within the timeout period, the application receives a tt_ErrRepReturnFailed
(8170) warning on its commit request. The application can then chose how to handle the timeout, in the same manner as described for "RETURN TWOSAFE".
The ALTER TABLE statement cannot be used to alter a replicated table that is part of a TWOSAFE BY REQUEST transaction. If DDLCommitBehavior
=1, this operation results in error 8051. If DDLCommitBehavior
=0, the operation succeeds because a commit is performed before the ALTER TABLE operation, resulting in the ALTER TABLE operation being in a new transaction which is not part of the TWOSAFE BY REQUEST transaction.
See "Setting the return service timeout period" for more information on setting the return service timeout period.
You can use ttRepSyncGet
to check if a return service is enabled and obtain the timeout value. For example:
Command> CALL ttRepSyncGet(); < 01, 45, 1> 1 row found.
Table 3-2 lists the optional STORE attributes for the CREATE ACTIVE STANDBY PAIR statement.
Table 3-2 STORE attribute descriptions
Data store attribute | Description |
---|---|
|
Set the return service policy so that return service blocking is disabled after the number of timeouts specified by See "Establishing return service failure/recovery policies". |
|
Set return services on or off when replication is disabled. See "Establishing return service failure/recovery policies". |
|
If See "Establishing return service failure/recovery policies". |
|
Specifies the number of seconds to wait for return service acknowledgement. A value of 0 means that there is no waiting. The default value is 10 seconds. The application can override this timeout setting by using the |
|
Overrides the See "DURABLE COMMIT". |
|
Specifies the default action to be taken for a return service transaction in the event of a timeout. The options are:
This default setting can be overridden for specific transactions by using the |
|
Compresses replicated traffic to reduce the amount of network bandwidth used. |
|
Sets the port number used by a data store to listen for updates from another data store. In an active standby pair, the standby master data store listens for updates from the active master data store. Read-only subscribers listen for updates from the standby master data store. If no See "Port assignments". |
|
Set the maximum number of seconds a data store waits before re-sending a message to an unresponsive data store. In an active standby pair, the active master data store sends messages to the standby master data store. The standby master data store sends messages to the read-only subscribers. |
|
Sets the log failure threshold. |
The rest of this section includes these topics:
If a replication scheme is configured with one of the return services described in "Using a return service", a timeout occurs if the standby master data store is unable to send an acknowledgement back to the active master within the time period specified by TIMEOUT.
The default return service timeout period is 10 seconds. You can specify a different return service timeout period by:
Specifying the RETURN WAIT TIME in the CREATE ACTIVE STANDBY PAIR statement or ALTER ACTIVE STANDBY PAIR statement. A RETURN WAIT TIME of '0' indicates no waiting.
Specifying a different return service timeout period programmatically by calling the ttRepSyncSet
procedure with a new returnWait
parameter.
Once set, the timeout period applies to all subsequent return service transactions until you either reset the timeout period or terminate the application session.
A return service may time out because of a replication failure or because replication is so far behind that the return service transaction times out before it is replicated. However, unless there is a simultaneous replication failure, failure to obtain a return service confirmation from the standby does not necessarily mean the transaction has not been or will not be replicated.
You can set other STORE attributes to establish policies that automatically disable return service blocking if there are excessive timeouts and re-enable return service blocking when conditions improve. See "Managing return service timeout errors" for details.
If the standby master data store is unable to acknowledge the transaction update from the active master data store within the timeout period, the application receives an errRepReturnFailed
warning on its commit request.
The default return service timeout period is 10 seconds. You can specify a different timeout period by:
Using the RETURN WAIT TIME clause in the STORE clause of the CREATE ACTIVE STANDBY PAIR statement
Calling the ttRepSyncSet
procedure with a new returnWait
parameter
A return service may time out or fail because of a replication failure or because replication is so far behind that the return service transaction times out before it is replicated. However, unless there is a simultaneous replication failure, failure to obtain a return service confirmation from the subscriber does not mean the transaction has not or will not be replicated.
You can respond to return service timeouts by:
You may want respond in some manner if replication is stopped or return service timeout failures begin to adversely impact the performance of your replicated system. Your "tolerance threshold" for return service timeouts may depend on the historical frequency of timeouts and the performance/availability equation for your particular application, both of which should be factored into your response to the problem.
When using the return receipt service, you can manually respond by:
Using the ALTER ACTIVE STANDBY PAIR statement to disable return receipt blocking
Calling the ttDurableCommit
procedure to durably commit transactions on the active master data storey that you can no longer verify as being received by the standby
If you decide to disable return receipt blocking, your decision to re-enable it depends on your confidence level that the return receipt transaction is no longer likely to time out.
An alternative to manually responding to return service timeout failures is to establish return service failure and recovery policies in the replication scheme. These policies direct the replication agents to detect changes to the replication state and to keep track of return service timeouts and then automatically respond in a predefined manner.
The following attributes in the CREATE ACTIVE STANDBY PAIR statement set the failure and recovery policies when using a RETURN RECEIPT or RETURN TWOSAFE service:
The policies set by these attributes are applicable for the life of the data store or until changed. The replication agent must be running to enforce these policies.
The RETURN SERVICES {ON | OFF} WHEN [REPLICATION] STOPPED attribute determines whether a return receipt or return twosafe service continues to be enabled or is disabled when replication is stopped. "Stopped" in this context means that either the active master replication agent is stopped (for example, by ttAdmin
-repStop
active
) or the replication state of the standby master data store is set to Stop
or Pause
with respect to the active master data store (for example, by ttRepAdmin
-state stop
standby
). A failed standby that has exceeded the specified FAILTHRESHOLD value is set to the Failed
state, but is eventually set to the Stop
state by the master replication agent.
Note:
A standby master data store may become unavailable for a period of time that exceeds the timeout period specified by RETURN WAIT TIME but still be considered by the master replication agent to be in theStart
state. Failure policies related to timeouts are set by the DISABLE RETURN attribute.RETURN SERVICES OFF WHEN REPLICATION STOPPED disables the return service when replication is stopped and is the default when using the RETURN RECEIPT service. RETURN SERVICES ON WHEN REPLICATION STOPPED allows the return service to continue to be enabled when replication is stopped and is the default when using the RETURN TWOSAFE service.
When a DISABLE RETURN value is set, the data store keeps track of the number of return receipt or return twosafe transactions that have exceeded the timeout period set by RETURN WAIT TIME. If the number of timeouts exceeds the maximum value set by DISABLE RETURN, the application reverts to a default replication cycle in which it no longer waits for the standby to acknowledge the replicated updates.
Specifying SUBSCRIBER is the same as specifying ALL. Both settings refer to the standby master data store.
The DISABLE RETURN failure policy is only enabled when the replication agent is running. If DISABLE RETURN is specified but RESUME RETURN is not specified, the return services remain off until the replication agent for the data store has been restarted.You can cancel this failure policy by stopping the replication agent and specifying DISABLE RETURN with a zero value for NumFailures
. The count of timeouts to trigger the failure policy is reset either when you restart the replication agent, when you set the DISABLE RETURN value to 0, or when return service blocking is re-enabled by RESUME RETURN.
When we say return service blocking is "disabled," we mean that the applications on the master data store no longer block execution while waiting to receive acknowledgements from the subscribers that they received or committed the replicated updates. Note, however, that the master still listens for an acknowledgement of each batch of replicated updates from the standby master data store.
You can establish a return service recovery policy by setting the RESUME RETURN attribute and specifying a resume latency value. When this attribute is set and return service blocking has been disabled for the standby master data store, the return receipt or return twosafe service is re-enabled when the commit-to-acknowledge time for a transaction falls below the value set by RESUME RETURN. The commit-to-acknowledge time is the latency between when the application issues a commit and when the master receives acknowledgement of the update from the subscriber.
The RESUME RETURN policy is enabled only when the replication agent is running. You can cancel a return receipt resume policy by stopping the replication agent and then using ALTER ACTIVE STANDBY PAIR to set RESUME RETURN to zero.
You can set the DURABLE COMMIT attribute to specify the durable commit policy for applications that have return service blocking disabled by DISABLE RETURN. When DURABLE COMMIT is set to ON, it overrides the DurableCommits
attribute on the master data store and forces durable commits for those transactions that have had return service blocking disabled.
If the replication scheme is configured with RETURN SERVICES ON WHEN REPLICATION STOPPED, the replication agent must be running to enforce the DURABLE COMMIT policy.
When you are using the return twosafe service, you can specify how the master replication agent responds to timeouts by setting LOCAL COMMIT ACTION. You can override the setting for specific transactions by calling the localAction
parameter in the ttRepSyncSet
procedure.
The possible actions upon receiving a timeout during replication of a twosafe transaction are:
COMMIT - Upon timeout, the replication agent on the active master data store commits the transaction and no more operations are allowed in the transaction.
NO ACTION - Upon timeout, the replication agent on the active master data store does not commit the transaction. The process recovery commits the transaction. This is equivalent to a forced commit.
If you are replicating over a low-bandwidth network, or if you are replicating massive amounts of data, you can set the COMPRESS TRAFFIC attribute to reduce the amount of bandwidth required for replication. The COMPRESS TRAFFIC attribute compresses the replicated data from the data store specified by the STORE parameter in the CREATE ACTIVE STANDBY PAIR or ALTER ACTIVE STANDBY PAIR statement. TimesTen does not compress traffic from other data stores.
Though the compression algorithm is optimized for speed, enabling the COMPRESS TRAFFIC attribute has some impact on replication throughput and latency.
Example 3-1 Compressing traffic from active master data store
For example, to compress replicated traffic from active master data store dsn1
and leave the replicated traffic from standby master data store dsn2
uncompressed, the CREATE ACTIVE STANDBY PAIR statement looks like:
CREATE ACTIVE STANDBY PAIR dsn1 ON "machine1", dsn2 ON "machine2" SUBSCRIBER dsn3 ON "machine3" STORE dsn1 ON "machine1" COMPRESS TRAFFIC ON;
Example 3-2 Compressing traffic from both master data stores
To compress the replicated traffic from the dsn1
and dsn2
data stores, use:
CREATE ACTIVE STANDBY PAIR dsn1 ON "machine1", dsn2 ON "machine2" SUBSCRIBER dsn3 ON "machine3" STORE dsn1 ON "machine1" COMPRESS TRAFFIC ON STORE dsn2 ON "machine2" COMPRESS TRAFFIC ON;
If you do not assign a PORT attribute, the TimesTen daemon dynamically selects the port. When ports are assigned dynamically in this manner for the replication agents, then the ports of the TimesTen daemons have to match as well.
When statically assigning ports, it is important to specify the full host name, DSN and PORT in the STORE attribute of the CREATE ACTIVE STANDBY PAIR statement.
You can establish a threshold value that, when exceeded, sets an unavailable standby master data store or a read-only subscriber to the Failed
state before the available log space is exhausted.
You can set the log threshold by specifying a STORE clause with a FAILTHRESHOLD value in the CREATE ACTIVE STANDBY PAIR or ALTER ACTIVE STANDBY PAIR statement. The default threshold value is 0, which means "no limit."
If an active master data store sets the standby master data store or a read-only subscriber to the Failed
state, it drops all of the data for the failed data store from its log and transmits a message to the failed data store. If the active master replication agent can communicate with the replication agent of the failed data store, then the message is transmitted immediately. Otherwise, the message is transmitted when the connection is reestablished.
Any application that connects to the failed data store receives a tt_ErrReplicationInvalid
(8025) warning indicating that the data store has been marked Failed
by a replication peer. Once the data store has been informed of its failed status, its state on the active master data store is changed from Failed
to Stop
.
An application can use the ODBC SQLGetInfo
function to check if the data store the application is connected to has been set to the Failed
state.
If a replication host has more than one network interface, you may wish to configure replication to use an interface other than the default interface. Although you must specify the host name returned by the operating system's hostname
command when you specify the data store name, you can configure replication to send or receive traffic over a different interface using the ROUTE clause.
The syntax of the ROUTE clause is:
ROUTE MASTER FullStoreName SUBSCRIBER FullStoreName {{MASTERIP MasterHost | SUBSCRIBERIP SubscriberHost} PRIORITY Priority} [...]
In the context of the ROUTE clause, each master data store is a subscriber of the other master data store and each read-only subscriber is a subscriber of both master data stores.
Example 3-4 Configuring multiple network interfaces
If host machine1
is configured with a second interface accessible by the host name machine1fast
, and machine2
is configured with a second interface at IP address 192.168.1.100, you may specify that the secondary interfaces are used with the replication scheme.
CREATE ACTIVE STANDBY PAIR dns1, dsn2 ROUTE MASTER dsn1 ON "machine1" SUBSCRIBER dsn2 ON "machine2" MASTERIP "machine1fast" PRIORITY 1 SUBSCRIBERIP "192.168.1.100" PRIORITY 1 ROUTE MASTER dsn2 ON "machine2" SUBSCRIBER dsn1 ON "machine1" MASTERIP "192.168.1.100" PRIORITY 1 SUBSCRIBERIP "machine1fast" PRIORITY 1;
Alternately, on a replication host with more than one interface, you may wish to configure replication to use one or more interfaces as backups, in case the primary interface fails or the connection from it to the receiving host is broken. You can use the ROUTE clause to specify two or more interfaces for each master or subscriber that are used by replication in order of priority.
If replication on the master host is unable to bind to the MASTERIP with the highest priority, it will try to connect using subsequent MASTERIP addresses in order of priority immediately. However, if the connection to the subscriber fails for any other reason, replication will try to connect using each of the SUBSCRIBERIP addresses in order of priority before it tries the MASTERIP address with the next highest priority.
Example 3-5 Configuring network priority
If host machine1
is configured with two network interfaces at IP addresses 192.168.1.100 and 192.168.1.101, and host machine2
is configured with two interfaces at IP addresses 192.168.1.200 and 192.168.1.201, you may specify that replication use IP addresses 192.168.1.100 and 192.168.200 to transmit and receive traffic first, and to try IP addresses 192.168.1.101 or 192.168.1.201 if the first connection fails.
CREATE ACTIVE STANDBY PAIR dns1, dns2 ROUTE MASTER dsn1 ON "machine1" SUBSCRIBER dsn2 ON "machine2" MASTERIP "192.168.1.100" PRIORITY 1 MASTERIP "192.168.1.101" PRIORITY 2 SUBSCRIBERIP "192.168.1.200" PRIORITY 1 SUBSCRIBERIP "192.168.1.201" PRIORITY 2;
An active standby pair replicates an entire data store by default. Use the INCLUDE clause to replicate only the tables, cache groups and sequences that are listed in the INCLUDE clause. No other data store objects will be replicated in an active standby pair that is defined with an INCLUDE clause. For example, this INCLUDE clause specifies three tables to be replicated by the active standby pair:
INCLUDE TABLE employees, departments, jobs
You can choose to exclude specific tables, cache groups or sequences from replication by using the EXCLUDE clause of the CREATE ACTIVE STANDBY PAIR statement. Use one EXCLUDE clause for each object type. For example:
EXCLUDE TABLE ttuser.tab1, ttuser.tab2 EXCLUDE CACHE GROUP ttuser.cg1, ttuser.cg2 EXCLUDE SEQUENCE ttuser.seq1, ttuser.seq2