Monday, July 30, 2007

RAC Architecture Overview

Let's begin with a brief overview of RAC architecture.

A cluster is a set of 2 or more machines (nodes) that share or coordinate resources to perform the same task.

A RAC database is 2 or more instances running on a set of clustered nodes, with all instances accessing a shared set of database files.

Depending on the O/S platform, a RAC database may be deployed on a cluster that uses vendor clusterware plus Oracle's own clusterware (Cluster Ready Services), or on a cluster that solely uses Oracle's own clusterware.

Thus, every RAC sits on a cluster that is running Cluster Ready Services. srvctl is the primary tool DBAs use to configure CRS for their RAC database and processes.

Cluster Ready Services and the OCR

Cluster Ready Services, or CRS, is a new feature for 10g RAC. Essentially, it is Oracle's own clusterware. On most platforms, Oracle supports vendor clusterware; in these cases, CRS interoperates with the vendor clusterware, providing high availability support and service and workload management. On Linux and Windows clusters, CRS serves as the sole clusterware. In all cases, CRS provides a standard cluster interface that is consistent across all platforms.

CRS consists of four processes (crsd, ocssd, evmd, and evmlogger) and two disks: the Oracle Cluster Registry (OCR) and the voting disk.

CRS manages the following resources:
The ASM instances on each node
Databases
The instances on each node
Oracle Services on each node

The cluster nodes themselves, including the following processes, or "nodeapps":
VIP
GSD
The listener
The ONS daemon

CRS stores information about these resources in the OCR. If the information in the OCR for one of these resources becomes damaged or inconsistent, then CRS is no longer able to manage that resource. Fortunately, the OCR automatically backs itself up regularly and frequently.

Interacting with CRS and the OCR: srvctl

srvctl is the tool Oracle recommends that DBAs use to interact with CRS and the cluster registry. Oracle does provide several tools to interface with the cluster registry and CRS more directly, at a lower level, but these tools are deliberately undocumented and intended only for use by Oracle Support. srvctl, in contrast, is well documented and easy to use. Using other tools to modify the OCR or manage CRS without the assistance of Oracle Support runs the risk of damaging the OCR.

Using srvctl

Even if you are experienced with 9i srvctl, it's worth taking a look at this section; 9i and 10g srvctl commands are slightly different.
srvctl must be run from the $ORACLE_HOME of the RAC you are administering. The basic format of a srvctl command is
srvctl <command> <target> [options]
where command is one of
enable, disable, start, stop, relocate, status, add, remove, modify, getenv, setenv, unsetenv, config
and the target, or object, can be a database, instance, service, ASM instance, or the nodeapps.
The srvctl commands are summarized in this table:

Table 1. Summary of srvctl commands.

Command: srvctl add, srvctl modify, srvctl remove
Targets: database, instance, service, nodeapps
Description: srvctl add / remove adds/removes the target's configuration information to/from the OCR. srvctl modify allows you to change some of the target's configuration information in the OCR without wiping out the rest.

Command: srvctl relocate
Targets: service
Description: Allows you to reallocate a service from one named instance to another named instance.

Command: srvctl config
Targets: database, service, nodeapps, asm
Description: Lists configuration information for the target from the OCR.

Command: srvctl disable, srvctl enable
Targets: database, instance, service, asm
Description: srvctl disable disables the target, meaning CRS will not consider it for automatic startup, failover, or restart. This option is useful to ensure an object that is down for maintenance is not accidentally automatically restarted. srvctl enable reenables the specified object.

Command: srvctl getenv, srvctl setenv, srvctl unsetenv
Targets: database, instance, service, nodeapps
Description: srvctl getenv displays the environment variables stored in the OCR for the target. srvctl setenv allows these variables to be set, and unsetenv unsets them.

Command: srvctl start, srvctl status, srvctl stop
Targets: database, instance, service, nodeapps, asm
Description: Start, stop, or display status (started or stopped) of the target.

As you can see, srvctl is a powerful utility with a lot of syntax to remember. Fortunately, there are only really two commands to memorize: srvctl -help displays a basic usage message, and srvctl -h displays full usage information for every possible srvctl command.
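
The command/target pairs in Table 1 can be written down as a small lookup, so that a wrapper script can reject an unsupported pair before it ever calls srvctl. This is only a sketch: the function name srvctl_pair_ok is made up for illustration, and the pairs are copied from the table rather than queried from srvctl, so srvctl -h remains the authority.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if Table 1 lists <target> for <command>.
# The pairs are transcribed from the summary table, not taken from srvctl.
srvctl_pair_ok() {
    cmd=$1
    target=$2
    case "$cmd" in
        add|modify|remove|getenv|setenv|unsetenv)
            allowed="database instance service nodeapps" ;;
        relocate)
            allowed="service" ;;
        config)
            allowed="database service nodeapps asm" ;;
        enable|disable)
            allowed="database instance service asm" ;;
        start|stop|status)
            allowed="database instance service nodeapps asm" ;;
        *)
            return 1 ;;
    esac
    for t in $allowed; do
        if [ "$t" = "$target" ]; then
            return 0
        fi
    done
    return 1
}

# Usage: check a pair before building the real command line.
if srvctl_pair_ok relocate service; then
    echo "ok: srvctl relocate service"
fi
if ! srvctl_pair_ok relocate database; then
    echo "not in Table 1: srvctl relocate database"
fi
```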

Examples for using srvctl

Example 1. Bring up the MYSID1 instance of the MYSID database.
[oracle@myserver oracle]$ srvctl start instance -d MYSID -i MYSID1

Example 2. Stop the MYSID database: all its instances and all its services, on all nodes.
[oracle@myserver oracle]$ srvctl stop database -d MYSID

Example 3. Stop the nodeapps on the myserver node. NB: Instances and services also stop.
[oracle@myserver oracle]$ srvctl stop nodeapps -n myserver

Example 4. Add the MYSID3 instance, which runs on the myserver node, to the MYSID clustered database.
[oracle@myserver oracle]$ srvctl add instance -d MYSID -i MYSID3 -n myserver

Example 5. Add a new node, the mynewserver node, to a cluster.
[oracle@myserver oracle]$ srvctl add nodeapps -n mynewserver -o $ORACLE_HOME -A 149.181.201.1/255.255.255.0/eth1
(The -A flag precedes the VIP address specification, given here as address/netmask/interface.)

Example 6. To change the VIP (virtual IP) on a RAC node, here the myserver node, use the command
[oracle@myserver oracle]$ srvctl modify nodeapps -n myserver -A new_address

Example 7. Find out whether the nodeapps on mynewserver are up.
[oracle@myserver oracle]$ srvctl status nodeapps -n mynewserver
VIP is running on node: mynewserver
GSD is running on node: mynewserver
Listener is not running on node: mynewserver
ONS daemon is running on node: mynewserver
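
Status output like this is easy to check from a script. The sketch below reads the output of srvctl status nodeapps on standard input and prints only the components reported as not running. The function name nodeapps_down is hypothetical, and the matching relies on the message wording shown in Example 7, which may differ in other releases.

```shell
#!/bin/sh
# Hypothetical helper: read "srvctl status nodeapps" output on stdin and
# print every line that reports a component as not running.
# Returns 0 only when nothing is reported down.
nodeapps_down() {
    down=$(grep 'is not running on node' || true)
    if [ -n "$down" ]; then
        printf '%s\n' "$down"
        return 1
    fi
    return 0
}

# Sample text copied from Example 7. On a real cluster, pipe in the output of
#   srvctl status nodeapps -n mynewserver
sample='VIP is running on node: mynewserver
GSD is running on node: mynewserver
Listener is not running on node: mynewserver
ONS daemon is running on node: mynewserver'

printf '%s\n' "$sample" | nodeapps_down || echo "nodeapps need attention"
```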

Example 8. Disable the ASM instance on myserver for maintenance.
[oracle@myserver oracle]$ srvctl disable asm -n myserver
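
Disabling and stopping usually go together in a maintenance window: disable first so CRS does not restart the object, then stop it, and reverse the order afterwards. The sketch below strings these commands together for a single instance, reusing the -d and -i flags from Example 1. The function names and the RUN switch are made up for illustration; by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of a maintenance window for one instance, built from the commands in
# Table 1. RUN is a hypothetical switch: by default the commands are only
# printed; set RUN=1 on a real cluster to execute them.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# usage: begin_maintenance <db> <instance>
# Disable first so CRS does not restart the instance, then stop it.
begin_maintenance() {
    run srvctl disable instance -d "$1" -i "$2" &&
    run srvctl stop instance -d "$1" -i "$2"
}

# usage: end_maintenance <db> <instance>
# Re-enable the instance, then start it again.
end_maintenance() {
    run srvctl enable instance -d "$1" -i "$2" &&
    run srvctl start instance -d "$1" -i "$2"
}

begin_maintenance MYSID MYSID1
end_maintenance MYSID MYSID1
```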

Debugging srvctl

Debugging srvctl in 10g couldn't be easier. Simply set the SRVM_TRACE environment variable.
[oracle@myserver bin]$ export SRVM_TRACE=true
Let's repeat Example 7 with SRVM_TRACE set to true:
[oracle@myserver oracle]$ srvctl status nodeapps -n mynewserver
/u01/app/oracle/product/10.1.0/jdk/jre//bin/java -classpath /u01/app/oracle/product/10.1.0/jlib/netcfg.jar:/u01/app/oracle/product/10.1.0/jdk/jre//lib/rt.jar:/u01/app/oracle/product/10.1.0/jdk/jre//lib/i18n.jar:/u01/app/oracle/product/10.1.0/jlib/srvm.jar:/u01/app/oracle/product/10.1.0/jlib/srvmhas.jar:/u01/app/oracle/product/10.1.0/jlib/srvmasm.jar:/u01/app/oracle/product/10.1.0/srvm/jlib/srvctl.jar -DTRACING.ENABLED=true -DTRACING.LEVEL=2 oracle.ops.opsctl.OPSCTLDriver status nodeapps -n mynewserver
[main] [19:53:31:778] [OPSCTLDriver.setInternalDebugLevel:165] tracing is true at level 2 to file null
[main] [19:53:31:825] [OPSCTLDriver.:94] Security manager is set
[main] [19:53:31:843] [CommandLineParser.parse:157] parsing cmdline args
[main] [19:53:31:844] [CommandLineParser.parse2WordCommandOptions:900] parsing 2-word cmdline
[main] [19:53:31:866] [GetActiveNodes.create:212] Going into GetActiveNodes constructor...
[main] [19:53:31:875] [HASContext.getInstance:191] Module init : 16
[main] [19:53:31:875] [HASContext.getInstance:216] Local Module init : 19
...
[main] [19:53:32:285] [ONS.isRunning:186] Status of ora.ganges.ons on mynewserver is true
ONS daemon is running on node: mynewserver
[oracle@myserver oracle]$
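
Trace output buries the one line you usually care about. As a rough sketch, the trace lines can be split off into a file while the normal output stays on the terminal. The function name split_trace and the file names are hypothetical, and the sketch assumes that trace lines start with "[main] [" as in the output above and arrive on standard output.

```shell
#!/bin/sh
# Hypothetical helper: split srvctl output captured with SRVM_TRACE=true.
# Lines starting with "[main] [" (the trace format shown above) go to a trace
# file; every other line is passed through on stdout.
# usage: some_command | split_trace <trace_file>
split_trace() {
    trace_file=$1
    : > "$trace_file"
    while IFS= read -r line; do
        case "$line" in
            "[main] ["*) printf '%s\n' "$line" >> "$trace_file" ;;
            *)           printf '%s\n' "$line" ;;
        esac
    done
}

# Setting the variable on the command line scopes it to that one call:
#   SRVM_TRACE=true srvctl status nodeapps -n mynewserver | split_trace /tmp/srvctl.trc
# Demonstration with two lines taken from the trace above:
printf '%s\n' \
  '[main] [19:53:31:843] [CommandLineParser.parse:157] parsing cmdline args' \
  'ONS daemon is running on node: mynewserver' | split_trace /tmp/srvctl_demo.trc
```

Setting SRVM_TRACE on the command line, rather than exporting it, also avoids leaving tracing switched on for the rest of the session.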

Pitfalls

A little impatience when dealing with srvctl can corrupt your OCR, i.e., put it into a state where the information for a given object is inconsistent or partially missing. Specifically, the srvctl remove command provides the -f option, which forces removal of an object from the OCR. Use this option judiciously, as it can easily put the OCR into an inconsistent state.

Restoring the OCR from an inconsistent state is best done with the assistance of Oracle Support, who will guide you in using the undocumented $CRS_HOME/bin/crs_* tools to repair it. The OCR can also be restored from backup.
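
Before any repair or restore, it helps to know what state the OCR and its backups are in. The sketch below wraps two documented utilities from $CRS_HOME/bin: ocrcheck, which reports on OCR integrity, and ocrconfig -showbackup, which lists the automatic backups. The function name ocr_report is made up for illustration. The restore itself is deliberately left out: it is normally run as root with CRS stopped, and the exact steps should come from the documentation for your release or from Oracle Support.

```shell
#!/bin/sh
# Sketch: inspect the OCR before attempting any repair. ocrcheck and
# ocrconfig live in $CRS_HOME/bin; the guard lets the script fail cleanly on
# a machine where they are not on the PATH.
ocr_report() {
    for tool in ocrcheck ocrconfig; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool not found; add \$CRS_HOME/bin to PATH" >&2
            return 1
        fi
    done
    # Report OCR integrity, then list the automatic backups.
    ocrcheck &&
    ocrconfig -showbackup
}

ocr_report || echo "OCR report unavailable on this host"
```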

Error messages

srvctl errors are PRK% errors, which are not documented in the 10gR1 error messages manual. However, those with a Metalink account can find them documented on Metalink.

Conclusion

srvctl is a powerful tool that will allow you to administer your RAC easily and effectively. In addition, it provides a valuable buffer between the DBA and the OCR, making it more difficult to corrupt the OCR.











The Global Cache Service (GCS) and the Global Enqueue Service (GES) together manage the Cache Fusion processes, resource transfer, and resource escalation among the instances. GES manages enqueues, which are internal Oracle locks; GCS manages the buffer cache across all instances.

GCS and GES together maintain the Global Resource Directory (GRD), which records the current status of the data blocks. The GRD is held in memory and distributed across all instances in the cluster; on each instance it resides in the variable, or shared pool, section of the SGA.

RAC Processes

LMON – The Global Enqueue Service Monitor monitors the entire cluster. It manages instance and process failures and the associated recovery for GCS and GES.

LMDx – The Global Enqueue Service Daemon is the lock agent. It also handles deadlock detection and remote enqueue requests.

LMSx – The Global Cache Service processes handle remote GCS messages. There can be up to 10 of them. LMS is the interconnect process and manages block transfers between instances.

LCKx – Manages global enqueue requests and cross-instance broadcasts.

DIAG – The Diagnosability Daemon monitors the health of the instance and captures data on instance process failures. After 9i, it can also run in non-RAC databases.
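
A quick way to see these processes on a node is to look for them in the process list. The sketch below takes ps -ef output on standard input and reports which of the processes listed above are present for a given instance. The function name rac_procs is hypothetical, the ps lines in the demonstration are made up, and the sketch assumes the usual ora_<process>_<SID> naming on Unix and Linux.

```shell
#!/bin/sh
# Hypothetical helper: given "ps -ef" output on stdin and an instance SID,
# report which of the RAC background processes described above are present.
# Assumes the usual ora_<process>_<SID> naming on Unix and Linux.
# usage: ps -ef | rac_procs <SID>
rac_procs() {
    sid=$1
    ps_out=$(cat)
    for p in lmon lmd lms lck diag; do
        # lmd, lms and lck carry a one-character suffix (lmd0, lms1, ...).
        if printf '%s\n' "$ps_out" | grep -Eq "ora_${p}[0-9a-z]?_${sid}\$"; then
            echo "$p: running"
        else
            echo "$p: not found"
        fi
    done
}

# Demonstration with made-up ps lines for an instance named MYSID1:
printf '%s\n' \
  'oracle 4001 1 0 10:00 ? 00:00:01 ora_lmon_MYSID1' \
  'oracle 4003 1 0 10:00 ? 00:00:01 ora_lmd0_MYSID1' \
  'oracle 4005 1 0 10:00 ? 00:00:02 ora_lms0_MYSID1' \
  'oracle 4009 1 0 10:00 ? 00:00:00 ora_diag_MYSID1' | rac_procs MYSID1
```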

Using a single SPFILE eases administration; the SPFILE has to be located on a clustered file system.

The OCR contains cluster and database configuration information for RAC and CRS.

The voting disk is a file on shared cluster storage, used to maintain cluster integrity and cluster membership.

CRS starts up first, before ASM is up, so the OCR and voting disk cannot be stored in ASM.
$CRS_HOME/log contains the alert log.

ASM – has its own init.ora file, which tells it which instances it has.

Nodeapps: VIP, GSD, the listener, and the Oracle Notification Service (ONS).

VIP (virtual IP) – a CRS resource associated with an IP address.

TAF – Transparent Application Failover

ASM

ASM stores the metadata required to make the files stored within the ASM storage system available to non-ASM Oracle databases.

Two additional processes: RBAL, the rebalancer process, which coordinates rebalance activity for ASM disk groups, and ARBn, the actual rebalancer processes, which perform the data extent movements.

wmos=
  (DESCRIPTION=
    (LOAD_BALANCE=yes)
    (FAILOVER=on)
    (ADDRESS=(PROTOCOL=tcp)(HOST=wmsdevrac1-vip.acme.com)(PORT=1526))
    (ADDRESS=(PROTOCOL=tcp)(HOST=wmsdevrac2-vip.acme.com)(PORT=1526))
    (CONNECT_DATA=
      (SERVICE_NAME=wmos)
      (FAILOVER_MODE=
        (TYPE=select)
        (METHOD=basic))))


Overview of Transparent Application Failover

Uncommitted insert, update, and delete commands are rolled back and must be resubmitted after reconnection. The OCI packages should be used to have the DML operations reissued.

The Oracle Net process carries out TAF functionality. Failover is configured in the tnsnames.ora file. The TAF settings are placed in the net service name entry, within the connect_data section, using the failover_mode and instance_role parameters.
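
When several services need the same TAF settings, the tnsnames.ora entry can be generated rather than typed by hand. The sketch below prints an entry shaped like the wmos entry shown earlier, with failover_mode placed inside connect_data. The function name taf_entry is hypothetical, and the TYPE=select and METHOD=basic values are simply carried over from that entry rather than recommended for every workload.

```shell
#!/bin/sh
# Hypothetical generator: print a TAF-enabled tnsnames.ora entry shaped like
# the wmos entry above.
# usage: taf_entry <alias> <service> <port> <host>...
taf_entry() {
    alias_name=$1
    service=$2
    port=$3
    shift 3
    echo "${alias_name}="
    echo "  (DESCRIPTION="
    echo "    (LOAD_BALANCE=yes)"
    echo "    (FAILOVER=on)"
    # One ADDRESS line per VIP host name given on the command line.
    for host in "$@"; do
        echo "    (ADDRESS=(PROTOCOL=tcp)(HOST=${host})(PORT=${port}))"
    done
    echo "    (CONNECT_DATA="
    echo "      (SERVICE_NAME=${service})"
    echo "      (FAILOVER_MODE="
    echo "        (TYPE=select)"
    echo "        (METHOD=basic))))"
}

taf_entry wmos wmos 1526 wmsdevrac1-vip.acme.com wmsdevrac2-vip.acme.com
```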

Load Balancing

The listener connection load-balancing feature improves connection performance by balancing the number of active connections among multiple dispatchers and instances. In a single-instance environment, the listener selects the least-loaded dispatcher to handle incoming client requests. In an Oracle Database 10g Real Application Clusters environment, connection load balancing can also balance the number of active connections among multiple instances. Because of dynamic service registration, a listener is always aware of all instances and, with the multi-threaded server (MTS, also called shared server), of all dispatchers, regardless of their locations. Based on the load information, a listener decides which instance should receive an incoming client request and, if shared server is configured, which dispatcher. In an MTS configuration, a listener selects a dispatcher in the following order: