Transparent Application Continuity with Data Guard
Oracle databases provide robust and high available solution for storing your data. Besides of that Oracle has implemented many interesting features into its databases and one of them is Transparent Application Continuity or TAC. TAC is a mechanism that transparently hides unavailability of a DB node from the application during relocation or switchover. But when I was about to give it a try I came across a startling fact that all of the documentation I found expected RAC setup. So I have decided to document the steps for setting TAC in Data Guard setup using DBMS_SERVICE. In this article I’ll be setting up the service only for one PDB (with the name PDB01) in multi tenant database. My testing environment is running two instances of DBCS in OCI (Oracle Cloud) in Data Guard setup.
Licensing
TAC/AC requires the Oracle Active Data Guard option, the Oracle RAC One Node option, or the Oracle Real Application Clusters option.
Prerequisites
- I assume you have an environment with two databases in Data Guard setup. I presume that your environment is already working so all necessary bits like open ports, Data Guards setup and connection from your client to DBs is already in place. In case you are using DBCS (with ASM) in OCI you can easily enable Data Guard following this guide https://docs.oracle.com/en-us/iaas/dbcs/doc/enable-oracle-data-guard-db-system.html
- Your DB setup must have installed Oracle Grid. This requirements comes from the point that TAC is using FAN (Fast application notification) to make application aware of what is going on during switchover by sending events.
- Besides having opened listener port in your firewall (typically 1521) you need to open also port 6200 that is used by ONS which is used by FAN.
- For application that doesn’t support TAC by default additional setup is required but that is not subject of this article.
- In case you need to use more advanced setup or simply application that doesn’t support TAC by default you should have a look on Oracle’s documentation for TAC/AC https://www.oracle.com/docs/tech/database/applicationcontinuityformaa.pdf
- The database must be in version 18c or newer but I wouldn’t go into DB older than 19c.
- Oracle client must be in version 19c or newer (for certain cases also 18c can be used).
- Either you use full (thick) client or if you use think client environment variables are set for the client.
Server steps
- Create a service that supports TAC. That basically means that some specific parameters are required like COMMIT_OUTCOME=True or FAILOVER_RESTORE=AUTO. There is one unexpected behavior I came across. I wasn’t able to create the service in one step but instead I had to create it in two steps. One tricky part is that as of today we must use undocumented parameter values (FAILOVER_RESTORE=AUTO, SESSION_STATE_CONSISTENCY=AUTO and FAILOVER_TYPE=AUTO) for the service. For more details on DBMS_SERVICE have a look on Oracle documentation https://docs.oracle.com/en/database/oracle/oracle-database/19/arpls/DBMS_SERVICE.html
exec DBMS_SERVICE.CREATE_SERVICE('pdb01_ha','pdb01_ha');
DECLARE
params dbms_service.svc_parameter_array;
BEGIN
params('FAILOVER_TYPE'):='AUTO';
params('FAILOVER_METHOD'):=DBMS_SERVICE.FAILOVER_METHOD_BASIC;
params('REPLAY_INITIATION_TIMEOUT'):=1800;
params('RETENTION_TIMEOUT'):=86400;
params('FAILOVER_DELAY'):=10;
params('FAILOVER_RETRIES'):=30;
params('DRAIN_TIMEOUT'):=60;
params('commit_outcome'):='true';
params('aq_ha_notifications'):='true';
params('STOP_OPTION'):=DBMS_SERVICE.STOP_OPTION_IMMEDIATE;
params('SESSION_STATE_CONSISTENCY'):='AUTO';
params('FAILOVER_RESTORE'):='AUTO';
DBMS_SERVICE.MODIFY_SERVICE('pdb01_ha',params);
END;
2. When the service is created you need to start the service to make it work.
exec DBMS_SERVICE.start_service(service_name => 'pdb01_ha');
3. When the service exits and is available you still have to do one more very important thing. You must update the PDB state otherwise the service won’t come up after switchover.
alter pluggable database PDB01 save state;
Client steps
- Make sure that you have environment variable set to point to your Oracle Client. Change your location of the client according to your setup.
Windows:
LD_LIBRARY_PATH=C:\app\client_19c\lib
TNS_ADMIN=C:\app\client_19c\network\admin
PATH=%PATH%;C:\app\client_19c\bin
Linux:
LD_LIBRARY_PATH=/opt/app/client_19c/lib
TNS_ADMIN=/opt/app/client_19c/network/admin
PATH=${PATH}:/opt/app/client_19c/bin
2. Update your tnsnames.ora. Please note that service name and both hosts are coming from my setup so you must adjust those parameters according to your environment.
ADG_PDB01_DG=
(DESCRIPTION=
(CONNECT_TIMEOUT=120)
(TRANSPORT_CONNECT_TIMEOUT=4)
(RETRY_COUNT=50)
(RETRY_DELAY=3)
(ADDRESS_LIST=
(FAILOVER = ON)(LOAD_BALANCE = OFF)
(ADDRESS=
(PROTOCOL=TCP)
(HOST=10.0.0.180)
(PORT=1521))
(ADDRESS=
(PROTOCOL=TCP)
(HOST=10.0.0.225)
(PORT=1521)))
(CONNECT_DATA=
(SERVICE_NAME=pdb01_ha.sub10111300471.ppvcn.oraclevcn.com)
))
Perform the test
- Create an user under your PDB (PDB01).
create user test identified by abCD__54321 quota unlimited on users default tablespace users;
grant create session,create table to test;
2. For the sake of testing I will use SQL*Plus 19c. To activate TAC on the sqlplus you must run it with “-ac” switch.
sqlplus -ac test/abCD__54321@ADG_PDB01_DG
SQL> create table test_tab (test_col varchar2(100));
SQL> insert into test_tab values ('test1');
--don't commit yet but do the swithchover in dgmgrl
3. Do switchover in dgmgrl (if you are not using DG Broker you can switchover using SQL*Plus)
$ dgmgrl
DGMGRL> connect sys
Password:
DGMGRL> show configuration
DGMGRL> validate database <Physical Standby>
DGMGRL> switchover to <Physical Standby>
4. During switchover run commit of your transaction. The transaction will be hanging for a while until the switchover is completed but no error is thrown by either database or client.
SQL> commit;
Commit complete.
SQL>
5. You can check which sessions are using failover
--for all sessions
select username,module,machine,service_name,logon_time,osuser,failover_type, failover_method from v$session order by osuser;
--for current user
SELECT USERNAME,MODULE,PROGRAM,FAILOVER_METHOD, FAILOVER_TYPE, FAILED_OVER FROM V$SESSION WHERE username=user;
6. To verify the communication between client and both nodes in Data Guard you can check that there is a traffic over port 6200 that is used by FAN/ONS. That traffic is used to exchange TAC events.
The whole test is below. Switchover took about 50s so commit was waiting without an error until the switchover completes and then the transaction was committed. Note that the session was marked FAILED_OVER=YES once the switchover completed.
DGMGRL> validate database ADB01_fra2q6
Database Role: Physical standby database
Primary Database: ADB01_primary
Ready for Switchover: Yes
Ready for Failover: Yes (Primary Running)
Managed by Clusterware:
ADB01_primary: YES
ADB01_fra2q6 : YES
DGMGRL> switchover to ADB01_fra2q6
Performing switchover NOW, please wait...
Operation requires a connection to database "ADB01_fra2q6"
Connecting ...
Connected to "ADB01_fra2q6"
Connected as SYSDBA.
New primary database "ADB01_fra2q6" is opening...
Oracle Clusterware is restarting database "ADB01_primary" ...
Connected to "ADB01_primary"
Switchover succeeded, new primary is "adb01_fra2q6"
DGMGRL>
SQL> set line 200
SQL> col USERNAME format A15
SQL> col MODULE format A15
SQL> col PROGRAM format A15
SQL> col FAILOVER_METHOD format A15
SQL> col FAILOVER_TYPE format A15
SQL> col FAILED_OVER format A15
SQL> SELECT USERNAME,MODULE,PROGRAM,FAILOVER_METHOD, FAILOVER_TYPE, FAILED_OVER FROM V$SESSION WHERE username=user;
USERNAME MODULE PROGRAM FAILOVER_METHOD FAILOVER_TYPE FAILED_OVER
--------------- --------------- --------------- --------------- --------------- ---------------
TEST SQL*Plus sqlplus.exe BASIC AUTO NO
SQL> select * from test_tab;
no rows selected
SQL> insert into test_tab values ('test1');
1 row created.
SQL> select systimestamp from dual;
SYSTIMESTAMP
---------------------------------------------------------------------------
02-MAR-23 08.14.57.570012 AM +00:00
SQL> commit;
Commit complete.
SQL> select systimestamp from dual;
SYSTIMESTAMP
---------------------------------------------------------------------------
02-MAR-23 08.15.47.734968 AM +00:00
SQL> SELECT USERNAME,MODULE,PROGRAM,FAILOVER_METHOD, FAILOVER_TYPE, FAILED_OVER FROM V$SESSION WHERE username=user;
USERNAME MODULE PROGRAM FAILOVER_METHOD FAILOVER_TYPE FAILED_OVER
--------------- --------------- --------------- --------------- --------------- ---------------
TEST sqlplus.exe sqlplus.exe BASIC AUTO YES
SQL> select * from test_tab;
TEST_COL
----------------------------------------------------------------------------------------------------
test1
Troubleshooting
In case of errors consult your setup with the TAC checklist https://www.oracle.com/technetwork/database/clustering/checklist-ac-6676160.pdf
For troubleshooting of TAC or AC I recommend to check “How to use Application Continuity Protection Check (ACCHK) (Doc ID 2862075.1)” on Oracle support. The Doc elaborates how to use ACCHK to capture and report TAC/AC problems.
Result
As shown above we made a setup and did a test of TAC in Data Guard environment. The client (in our case SQL*Plus) committed a transaction but because of ongoing switchover it waited for the switchover completion and then hanging transaction was replayed. The whole transaction was completed without an error and in high available mode.