Highly available and cost-effective solution with Oracle SE

Purpose

 

Maximum Availability Architecture is the concept of getting the absolute most out of the resources available to a service. We can distinguish two concepts that are generally involved:

 

  • Data availability: keeping the data in more than one location
  • Service availability: running the service in more than one location

 

To achieve this, the architect needs to be prepared for various interrelated issues:

  • Hardware and software issues,
  • Human errors,
  • Natural or man-made disasters,
  • Performance and manageability.

 

Addressing all those points can be quite challenging from a technical perspective and may also require highly qualified staff to manage it.

In addition, the associated cost, depending on the hardware used, can be significant. For many customers it can even become prohibitive, causing some points to deliberately remain unaddressed.

 

So what’s the alternative?

 

When the database doesn’t require partitioning, parallel processing or other options available only with Enterprise Edition, there is a low-cost alternative: the implementation of a stretched cluster, which gives a flexible highly available solution based on Oracle Standard Edition.

 

This article will cover the implementation of this architecture on a low-cost hardware configuration and will highlight some particular points to consider to deploy it successfully.

 

Proposed alternative

The goal here is to maximize availability and to protect the storage, which would otherwise be the single point of failure.

Here is the proposition for a stretched cluster within a single data center, based on Oracle Standard Edition RAC:

 

[Figure: proposed four-node stretched cluster architecture]

Here we use 4 servers with 1 socket each, as Oracle SE is limited to 4 sockets in total. The advantage is that the number of cores per CPU is unlimited. Some vendors offer single-socket servers that can host a very powerful processor with up to 8 cores and a large amount of RAM.

Key features of the configuration (assuming the entire data center is not lost) are:

 

  • Very high availability and flexibility
  • Powerful processing configuration

 

Weaknesses:

 

  • No scalability except by upgrading to more powerful CPUs
  • Not a real Disaster Recovery Solution

 

Associated price

Here are the products to be licensed:

  • Oracle Standard Edition: 70.000 USD (4 sockets)

 

Standard Edition can run on a system with up to 4 sockets, or in a cluster with a maximum of 4 sockets in total.

Cost savings comparison

In the same configuration, Enterprise Edition + RAC would require the following CPU licenses:

E.E.: 4 sockets * 8 cores = 32 cores * 0.5 (core factor) = 16 processor licenses

S.E.: 4 socket licenses
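As a quick sanity check of the totals below, assuming the list prices of the time from the referenced price list (roughly 47.500 USD per processor for EE, 23.000 USD per processor for the RAC option and 17.500 USD per socket for SE):

E.E. + RAC: 16 * (47.500 + 23.000) = 1.128.000 USD
S.E.: 4 * 17.500 = 70.000 USD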

 

Scenario            Price (CPU licensing model)
Oracle EE + RAC     1.128.000 USD
Oracle SE              70.000 USD
Total savings       1.058.000 USD

 

Note: prices are list prices without any discount; support is not included.

Oracle price list

 

This alternative, even with some missing features, offers a good compromise and interesting cost savings.

Technical considerations

 

Every hardware component is redundant, which guarantees the required availability.

 

Keep in mind that we are building our infrastructure on low-cost hardware, so costly enterprise storage subsystems with real-time volume mirroring are out of scope.

 

The storage solution is provided by ASM with a normal redundancy disk group configuration. To guarantee the required high availability in case one storage system is lost, we need at least three locations.

 

Best practice: do not store voting files in the same disk group as database files. I recommend a dedicated grid disk group (for example CRSDATA) to store the cluster voting files, the OCR and the ASM spfile. A mirror copy of the OCR can additionally be placed in any other disk group.
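As a minimal sketch, such a dedicated disk group could be created as follows if it was not defined during installation, assuming one small LUN per SAN with hypothetical device names /dev/asm-crs1 to /dev/asm-crs3. With normal redundancy and three failure groups, the clusterware will place one voting file in each failure group:

[grid@node1 ~]# sqlplus / as sysasm

SQL> create diskgroup CRSDATA normal redundancy
       failgroup san1 disk '/dev/asm-crs1'
       failgroup san2 disk '/dev/asm-crs2'
       failgroup san3 disk '/dev/asm-crs3'
       attribute 'compatible.asm' = '11.2';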

Preparing clusterware disk groups

 

[Figure: LUN definition across the three SANs]

Based on our three-SAN configuration, the required storage is defined as shown above.

In the configuration above, we have defined six LUNs of 2 GB each to store the voting and OCR (cluster registry) files. This gives us 2 disk groups with around 4 GB of available space each, enough to store 3 voting files of 280 MB each plus 2 OCR copies of 280 MB each.

 

During the installation process, the ASM configuration screen will look like this:

[Screenshot: ASM disk group configuration during the Grid Infrastructure installation]

 

After Grid Infrastructure installation

 

Query disk group information

[grid@node1 ~]# sqlplus / as sysasm

SQL> select VOTING_FILES, TOTAL_MB, USABLE_FILE_MB, STATE from v$asm_diskgroup where name='VOTING';

 

V   TOTAL_MB USABLE_FILE_MB STATE
- ---------- -------------- --------
Y       6058           2320 MOUNTED

 

 

Check location of voting disks.

[root@node1 ~]# crsctl query css votedisk

##  STATE    File Universal Id                  File Name          Disk group
--  -----    -----------------                  ---------          ----------

1. ONLINE       684ec807ae794fe6bf3e09626c0a6827 (/dev/asm-Voting3) [VOTING]

2. ONLINE       726b96afa5224f38bf442d8a184f76d4 (/dev/asm-Voting1) [VOTING]

3. ONLINE       492c665794154f4dbf3ed62466e32fdd (/dev/asm-Voting2) [VOTING]

Located 3 voting disk(s).

 

We have one voting disk located in each storage system.
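The OCR side can be checked with the standard ocrcheck utility, which lists the configured OCR locations and their integrity status, and a mirror copy in another disk group (as suggested earlier) can be added with ocrconfig; the +DATA target below is just an example:

[root@node1 ~]# ocrcheck
[root@node1 ~]# ocrconfig -add +DATA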

All cluster components are now fully redundant.

  • The network interface cards are redundant
  • The SAN switches are redundant
  • We have four nodes within our cluster
  • The storage is redundant

 

At any time, we can lose one component and the cluster will still be up and running.

Alternative if three SANs are not available

[Figure: stretched cluster with the third voting file on NFS]

 

The third SAN is replaced by an NFS or iSCSI target to store only the third voting file.

Read: the Oracle white paper on using standard NFS to support a third voting file for extended cluster configurations.
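As a minimal sketch of that approach, the NFS-hosted voting file is a zero-filled file added as a quorum failure group of the disk group; the host name nfsserver and the paths are placeholders, and the mount options follow the ones the white paper recommends for Linux (check the paper for the exact options for your platform):

[root@node1 ~]# mount -t nfs -o rw,bg,hard,intr,rsize=32768,wsize=32768,tcp,noac,vers=3,timeo=600,actimeo=0 nfsserver:/votedisk /voting_disk
[root@node1 ~]# dd if=/dev/zero of=/voting_disk/vote_3 bs=1M count=500

[grid@node1 ~]# sqlplus / as sysasm

SQL> alter system set asm_diskstring='/dev/asm*','/voting_disk/vote_3';
SQL> alter diskgroup VOTING add quorum failgroup nfs disk '/voting_disk/vote_3';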

Using service capabilities

About Services

With 10g, Oracle introduced an automatic workload management facility called services. A single service can represent an application, multiple applications or a subset of a single application, and can be associated with one or more instances of an Oracle RAC database.

The following benefits can be attributed to services:

  • A single entity for managing applications that compete for the same resources
  • Allow each workload to be managed as a single unit
  • Hide the complexity of the cluster from the client

 

To manage workloads, we can define and assign services to particular application operations. We can also use services to manage the workload for different types of work. For example, online users can use one service while reporting users can use a different one.

 

Services are integrated with the Database Resource Manager, which makes it possible to restrict the resources a service may use within an instance.

Assign nodes to functionality

 

The database is purely OLTP, but let’s assume it requires some reporting capability.

We have four nodes in our cluster and, depending on the most important part of the application, we can decide which nodes are used for what. To do that, we will use services to dedicate available nodes to a specific application.

 

To guarantee availability, we have no choice but to assign two nodes per functionality. This can be achieved using services.

Assuming our database name is sales, we are going to define two services: sales_oltp and sales_rep.

 

[oracle@node1 ~]# srvctl add service -d sales -s sales_oltp -r sales1,sales3 -P NONE -e SESSION -m BASIC -w 5 -z 10

[oracle@node1 ~]# srvctl config service -d sales -s sales_oltp

Service name: sales_oltp

Service is enabled

Server pool: sales_sales_oltp

Cardinality: 2

Disconnect: false

Service role: PRIMARY

Management policy: AUTOMATIC

DTP transaction: false

AQ HA notifications: false

Failover type: SESSION

Failover method: BASIC

TAF failover retries: 10

TAF failover delay: 5

Connection Load Balancing Goal: LONG

Runtime Load Balancing Goal: NONE

TAF policy specification: NONE

Edition:

Preferred instances: sales1,sales3

Available instances:

 

 

[oracle@node1 ~]# srvctl add service -d sales -s sales_rep -r sales2,sales4 -P NONE -e SELECT -m BASIC -w 5 -z 10

 

[oracle@node1 ~]# srvctl config service -d sales -s sales_rep

Service name: sales_rep

Service is enabled

Server pool: sales_sales_rep

Cardinality: 2

Disconnect: false

Service role: PRIMARY

Management policy: AUTOMATIC

DTP transaction: false

AQ HA notifications: false

Failover type: SELECT

Failover method: BASIC

TAF failover retries: 10

TAF failover delay: 5

Connection Load Balancing Goal: LONG

Runtime Load Balancing Goal: NONE

TAF policy specification: NONE

Edition:

Preferred instances: sales2,sales4

Available instances:

 

 

[oracle@node1 ~]# srvctl start service -d sales -s sales_oltp

 

[oracle@node1 ~]# srvctl start service -d sales -s sales_rep

 

[oracle@node1 ~]# srvctl status service -d sales -s sales_oltp

Service sales_oltp is running on instance(s) sales1, sales3

 

[oracle@node1 ~]# srvctl status service -d sales -s sales_rep

Service sales_rep is running on instance(s) sales2, sales4

 

Client configuration

Here is a simple tnsnames.ora configuration that provides basic transparent failover functionality.

 

Sales_oltp =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = sales.cust.ch)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = sales_oltp.cust.ch)
    )
  )

 

Sales_rep =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = sales.cust.ch)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = sales_rep.cust.ch)
    )
  )

 

 

 

We now have a failover solution for each service: Node1 and Node3 are dedicated to OLTP, while Node2 and Node4 handle reporting.
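A quick way to verify the routing is to connect through one of the aliases and ask the session which instance and service it landed on (the scott account is just a placeholder):

[oracle@client ~]$ sqlplus scott/tiger@Sales_oltp

SQL> select sys_context('USERENV','INSTANCE_NAME') inst,
            sys_context('USERENV','SERVICE_NAME')  svc
     from dual;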


ASM I/O distribution between OLTP and Reporting

 

Let’s assume our OLTP workload is not I/O demanding. ASM allows specifying a preferred failure group for read operations; this is a well-known best practice in extended cluster configurations. We will use that functionality to limit the reads of the OLTP application to one specific failure group out of the three, for example the one in SAN1.

 

To do that, we set the ASM parameter ASM_PREFERRED_READ_FAILURE_GROUPS on the two nodes that host the OLTP service:

[grid@node1 ~]# sqlplus / as sysasm

SQL> alter system set ASM_PREFERRED_READ_FAILURE_GROUPS='DATA.DATA_0000','FRA.FRA_0000';

System altered.

 

[grid@node3 ~]# sqlplus / as sysasm

SQL> alter system set ASM_PREFERRED_READ_FAILURE_GROUPS='DATA.DATA_0000','FRA.FRA_0000';

System altered.

 

 

 

Reporting will take advantage of the two other failure groups.

[grid@node2 ~]# sqlplus / as sysasm

SQL> alter system set ASM_PREFERRED_READ_FAILURE_GROUPS='DATA.DATA_0001','FRA.FRA_0001';

System altered.

 

[grid@node4 ~]# sqlplus / as sysasm

SQL> alter system set ASM_PREFERRED_READ_FAILURE_GROUPS='DATA.DATA_0002','FRA.FRA_0002';

System altered.

 

 

As long as the specified set of failure groups is available, ASM will read from it; otherwise it will read from the other failure groups.

Write operations are not impacted by the setting of this parameter.
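The effect can be verified on each ASM instance through v$asm_disk, where the PREFERRED_READ column flags the disks the local instance favors for reads:

[grid@node1 ~]# sqlplus / as sysasm

SQL> select name, failgroup, preferred_read from v$asm_disk order by failgroup, name;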

 

[Figure: service placement and ASM preferred read distribution across the four nodes]

Using Oracle Clusterware and VIPs to protect third-party applications

 

Oracle Clusterware is general-purpose cluster software, able to protect any kind of application using a generic agent and a user-defined action script.

 

To manage that, Oracle Clusterware includes a high availability framework that provides an infrastructure to manage any application. It ensures that the applications it manages start when the system starts, respecting service dependencies. Oracle Clusterware also monitors the applications to make sure they are always available. If a node in the cluster fails, processes that normally run on the failed node can be restarted on another node. An application profile describes the details of our application and how Oracle Clusterware will manage it.
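As a minimal sketch, assuming a hypothetical application daemon /opt/my_app/bin/my_appd, the action script implements the four entry points Clusterware calls (start, stop, check, clean) and is then registered as a cluster resource; an application VIP can be created with appvipcfg (resource names, paths and the IP address are placeholders):

#!/bin/bash
# /u01/app/scripts/my_app.scr - action script called by Oracle Clusterware
# with one of the four entry points: start | stop | check | clean
case "$1" in
  start) /opt/my_app/bin/my_appd & ;;                             # start the daemon
  stop)  pkill -f /opt/my_app/bin/my_appd ;;                      # stop it
  check) pgrep -f /opt/my_app/bin/my_appd >/dev/null || exit 1 ;; # non-zero = offline
  clean) pkill -9 -f /opt/my_app/bin/my_appd ;;                   # forced cleanup
esac
exit 0

[root@node1 ~]# appvipcfg create -network=1 -ip=192.168.10.50 -vipname=my_app_vip -user=root
[grid@node1 ~]# crsctl add resource my_app -type cluster_resource -attr "ACTION_SCRIPT=/u01/app/scripts/my_app.scr,CHECK_INTERVAL=30,RESTART_ATTEMPTS=2,START_DEPENDENCIES=hard(my_app_vip),STOP_DEPENDENCIES=hard(my_app_vip)"
[grid@node1 ~]# crsctl start resource my_app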

 

Highlights

 

Strengths:

  • Cost
  • Performance
  • Flexibility
  • Protection level

Weaknesses:

  • Scalability
  • Option limitations
  • Not a DR solution

 

 

Limitations

 

Area                                   Value
Number of CPU sockets in the cluster   4
DR capability                          No
Available options                      Limited to SE
 

 
