
Oracle Drivers configuration for High Availability

... is it a developer's job?

UCP, GridLink, TAF, AC, TAC, FAN… The configuration of Oracle Drivers for application high availability is not an easy job. The developers often care about the minimal working configuration, while the DBAs are busy with the operations. In this session I will try to demystify application server’s connectivity to the database and give a direction toward the highest availability, using Real Application Clusters and new Oracle features like TAC and CMAN TDM.



  1. 1. Oracle Drivers configuration for High Availability is it a developer's job? Ludovico Caldara - Computing Engineer @CERN, Oracle ACE Director
  2. 2. ■ http://www.ludovicocaldara.net ■ @ludodba ■ ludovicocaldara ■ Two decades of DBA experience (Not Only Oracle) ■ ITOUG co-founder ■ OCP (11g, 12c, MySQL) & OCE ■ Italian living in Switzerland Ludovico Caldara
  3. 3. The Large Hadron Collider (LHC) • Largest machine in the world: 27 km, 6000+ superconducting magnets • Emptiest place in the solar system: high vacuum inside the magnets • Hottest spot in the galaxy: lead ion collisions create temperatures 100 000x hotter than the heart of the sun • Fastest racetrack on Earth: protons circulate 11245 times/s (99.9999991% the speed of light)
  4. 4. SQL> select sum(bytes/power(1024,5)) as "PetaBytes" > from dba_data_files; PetaBytes -------------- 1.056794738695 Large databases
  5. 5. Or complex ones
  6. 6. oracle.com/gbtour New Free Tier Always Free Oracle Cloud Infrastructure Services you can use for unlimited time 30-Day Free Trial Free credits you can use for more services +
  7. 7. A new project is coming in your company … and the development starts unsplash.com/@helloquence
  8. 8. Disclaimer • Some oversimplifications • A very complex topic • Requires DBA and developer skills • Assume you know some basic concepts • High availability and failover concepts • Connections to database • Basic NET configurations (SCAN, Listener, Services, TNS) • Assume you have recent DB and client (>=12.2)
  9. 9. "Failure happens all the time. It happens every day in practice. What makes you better is how you react to it." ― Mia Hamm
  10. 10. What do you have to protect? • New network session: Try - Wait - Failover • Established network session: Wait - Retry - Wait - Failover - Replay query/transaction
  11. 11. Factors that influence HA Too many! • Network topology • OS type and configuration • DB version and service configuration • Client version and type • Application design / exception handling
  12. 12. Factors that influence HA Too many! • Network topology • OS type and configuration • DB version and service configuration • Client version and type • Application design / exception handling Our mission today
  13. 13. Factors that influence HA Too many! • Network topology • OS type and configuration • DB version and service configuration • Client version and type • Application design / exception handling Good white-paper: Oracle Client Failover - Under the Hood By Robert Bialek (Trivadis)
  14. 14. A concept that you must know
  15. 15. Database Services • Virtual name for a database endpoint HR_SVC HR_SVC CRM_SVC REP_SVC Registered with the listener Real Applications Cluster / Data Guard
  16. 16. Database Services • Active-Active (RAC, GoldenGate) HR_SVC HR_SVC Real Applications Cluster / Data Guard
  17. 17. Database Services • Active-Passive (RAC, Data Guard, RAC One Node) REP_SVC Real Applications Cluster / Data Guard
  18. 18. Database Services • The DBA can create services with: • srvctl add service • the dbms_service.create_service() PL/SQL procedure • Both methods have parameters for HA • Hint: HA at service level is of little use if the client is not configured properly • Did you know? The service_names parameter is deprecated!
  19. 19. Oracle recommends against using default services (DB_NAME or PDB_NAME) or SID
  20. 20. Recommended descriptor (client >=12.2) HR = (DESCRIPTION = (CONNECT_TIMEOUT=120)(RETRY_COUNT=20) (RETRY_DELAY=3)(TRANSPORT_CONNECT_TIMEOUT=3) (ADDRESS_LIST = (LOAD_BALANCE=on) (ADDRESS=(PROTOCOL=TCP)(HOST=primary-scan)(PORT=1521))) (ADDRESS_LIST = (LOAD_BALANCE=on) (ADDRESS=(PROTOCOL=TCP)(HOST=standby-scan)(PORT=1521))) (CONNECT_DATA=(SERVICE_NAME = HR.cern.ch)))
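The retry parameters in the descriptor above are easier to reason about when written out. The following is a minimal Java sketch, using only the standard library, that assembles the same descriptor as a JDBC thin URL and estimates the time spent waiting between retries. The class and method names (`HaDescriptor`, `build`, `delayBudgetSeconds`) are hypothetical, and the estimate counts only the RETRY_DELAY pauses; the time each connect attempt takes comes on top of it.

```java
import java.util.List;

final class HaDescriptor {
    // Builds one ADDRESS_LIST per SCAN host, mirroring the layout on the slide.
    static String build(List<String> scanHosts, String service,
                        int retryCount, int retryDelaySec) {
        StringBuilder sb = new StringBuilder("(DESCRIPTION=")
            .append("(CONNECT_TIMEOUT=120)")
            .append("(RETRY_COUNT=").append(retryCount).append(")")
            .append("(RETRY_DELAY=").append(retryDelaySec).append(")")
            .append("(TRANSPORT_CONNECT_TIMEOUT=3)");
        for (String host : scanHosts) {
            sb.append("(ADDRESS_LIST=(LOAD_BALANCE=on)")
              .append("(ADDRESS=(PROTOCOL=TCP)(HOST=").append(host)
              .append(")(PORT=1521)))");
        }
        return sb.append("(CONNECT_DATA=(SERVICE_NAME=").append(service)
                 .append(")))").toString();
    }

    // A JDBC thin URL accepts the full descriptor after the '@'.
    static String jdbcThinUrl(String descriptor) {
        return "jdbc:oracle:thin:@" + descriptor;
    }

    // Assumption: counts only the pauses between retries, not the attempts themselves.
    static int delayBudgetSeconds(int retryCount, int retryDelaySec) {
        return retryCount * retryDelaySec;
    }

    public static void main(String[] args) {
        String d = build(List.of("primary-scan", "standby-scan"), "HR.cern.ch", 20, 3);
        System.out.println(jdbcThinUrl(d));
        System.out.println("Seconds spent in retry delays: " + delayBudgetSeconds(20, 3));
    }
}
```

Keeping the descriptor in one place, rather than in a short host:port/service URL, is what lets the client try the standby SCAN when the primary one does not answer.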
  21. 21. Planned Maintenance
  22. 22. Planned Maintenance • CRM sessions exist on instance 1 CRM_SVC Real Applications Cluster / Data Guard
  23. 23. Planned Maintenance • Need to restart instance 1 CRM_SVC Real Applications Cluster / Data Guard
  24. 24. Planned Maintenance • Service relocation: new sessions go to instance 2 CRM_SVC Real Applications Cluster / Data Guard
  25. 25. Planned Maintenance • Service relocation: new sessions go to instance 2 • Problem: what about existing sessions? CRM_SVC Real Applications Cluster / Data Guard
  27. 27. How to drain sessions • You need to know that the service is being relocated • Use Fast Application Notification (FAN)! CRM_SVC Real Applications Cluster / Data Guard ONS
  28. 28. How to drain sessions • You need to know that the service is being relocated • Use Fast Application Notification (FAN)! CRM_SVC Real Applications Cluster / Data Guard ONS register connect
  29. 29. How to drain sessions • You need to know that the service is being relocated • Use Fast Application Notification (FAN)! CRM_SVC Real Applications Cluster / Data Guard ONS stop notification! CRM_SVCstart
  30. 30. How to drain sessions • You need to know that the service is being relocated • Use Fast Application Notification (FAN)! CRM_SVC Real Applications Cluster / Data Guard ONS CRM_SVC disconnect when the transaction is over and reconnect ONS
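The draining behaviour described in these slides can be illustrated with a toy pool. This is a sketch in plain Java, not UCP or any Oracle API: `DrainingPool`, `PooledConn` and `onServiceDown` are hypothetical names, and the "connection" is just an object that remembers which instance it points to. It shows the idea only: on a stop notification idle connections are dropped immediately, while busy ones are closed when the application hands them back.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

final class DrainingPool {
    // Stand-in for a real database connection (hypothetical).
    static final class PooledConn {
        final String instance;
        boolean closed;
        PooledConn(String instance) { this.instance = instance; }
    }

    private final Deque<PooledConn> idle = new ArrayDeque<>();
    private final Set<String> drainingInstances = new HashSet<>();
    private String targetInstance;

    DrainingPool(String initialInstance) { this.targetInstance = initialInstance; }

    // Reuse an idle connection, or open a new one on the current target instance.
    synchronized PooledConn borrow() {
        PooledConn c = idle.poll();
        return (c != null) ? c : new PooledConn(targetInstance);
    }

    // Called at the end of a unit of work: the safe point to drop a draining connection.
    synchronized void release(PooledConn c) {
        if (drainingInstances.contains(c.instance)) {
            c.closed = true;
        } else {
            idle.push(c);
        }
    }

    // What a FAN "service stopped on instance X" notification would trigger.
    synchronized void onServiceDown(String instance, String newInstance) {
        drainingInstances.add(instance);
        targetInstance = newInstance;
        idle.removeIf(c -> {
            if (c.instance.equals(instance)) { c.closed = true; return true; }
            return false;
        });
    }
}
```

The sketch leaves out the matching "service up" handling, which would remove the instance from the draining set again. The point is that nothing is killed mid-transaction, which is why the application must release connections back to the pool regularly.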
  31. 31. FAN at database side • Grid Infrastructure is necessary to register with ONS • ONS must be enabled (default remote port 6200) • 18c: in-band notifications • FAN-enabled service: srvctl add service -db orcl -service hr_svc -rlbgoal [SERVICE_TIME | THROUGHPUT] # for load balancing advisory -notification TRUE # for OCI/ODP.NET connections srvctl relocate service -db orcl -service hr_svc -oldinst orcl1 -newinst orcl2 -drain_timeout 10 # give sessions some time to drain # -force is not specified, so sessions are not killed
  32. 32. FAN at client side import oracle.simplefan.FanEventListener; import oracle.simplefan.FanManager; import oracle.simplefan.FanSubscription; import oracle.simplefan.ServiceDownEvent; [...] FanManager fanMngr = FanManager.getInstance(); onsProps.setProperty("onsNodes", "node1:6200,node2:6200"); fanMngr.configure(onsProps); FanSubscription sub = fanMngr.subscribe(props); sub.addListener(new FanEventListener() { public void handleEvent(ServiceDownEvent event) { System.out.println("Service down event"); System.out.println(event.getReason()); // handle the event } });
  34. 34. Fast Connection Failover (FCF) • Pre-configured FAN integration • Works with connection pools • The application must be pool aware • (borrow/release) • The connection pool leverages FAN events to: • Quickly remove dead connections on a DOWN event • (opt.) Redistribute the load on an UP event
  35. 35. Fast Connection Failover (FCF) • UCP (Universal Connection Pool, ucp.jar) and WebLogic Active GridLink handle FAN out of the box. No code changes! Just enable FastConnectionFailoverEnabled. • Third-party connection pools can implement FCF • If JDBC driver version >= 12.2 • simplefan.jar and ons.jar in CLASSPATH • Connection validation options are set in pool properties • Connection pool can plug javax.sql.ConnectionPoolDataSource • Connection pool checks connections at borrow/release
  37. 37. Fast Connection Failover (FCF) • OCI Connection Pool handles FAN events as well • Need to configure oraaccess.xml properly in TNS_ADMIN • Python’s cx_Oracle, PHP OCI8, etc. have native options • ODP.NET: just set "HA events = true;pooling=true"
  38. 38. Session Draining in 18c • Database invalidates connection at: • Standard connection tests for connection validity (conn.isValid(), CheckConStatus, OCI_ATTR_SERVER_STATUS) • Custom SQL tests for validity (DBA_CONNECTION_TESTS) • SELECT 1 FROM DUAL • SELECT COUNT(*) FROM DUAL • SELECT 1 • BEGIN NULL;END • Add new: execute dbms_app_cont_admin.add_sql_connection_test( 'select * from dual', service_name);
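Since draining in 18c hooks into standard validity checks, a pool or application that tests connections at borrow time benefits without knowing anything about FAN. Below is a minimal sketch of such a check using only the standard `java.sql` and `javax.sql` interfaces. `ValidatedBorrow` and its parameters are hypothetical names, and a real pool such as UCP does this internally when validation on borrow is enabled.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

final class ValidatedBorrow {
    // Returns the first connection that passes isValid(); discards the others.
    static Connection borrow(DataSource ds, int timeoutSec, int maxTries)
            throws SQLException {
        for (int i = 0; i < maxTries; i++) {
            Connection c = ds.getConnection();
            if (c.isValid(timeoutSec)) {
                return c;
            }
            try {
                c.close();              // invalid or draining: give it back and try again
            } catch (SQLException ignored) {
                // the connection is already unusable, nothing more to do
            }
        }
        throw new SQLException("no valid connection after " + maxTries + " attempts");
    }
}
```

A caller would use `ValidatedBorrow.borrow(dataSource, 2, 3)` at the start of each unit of work; the timeout and retry values here are arbitrary.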
  39. 39. “Have we implemented FAN/FCF correctly?” • TEST, TEST, TEST • Relocate services as part of your CI/CD • Application ready for planned maintenance => happy DBA, Dev, DevOps
  40. 40. Why draining? • Draining is the best solution for hiding planned maintenance • No draining: persisting sessions are killed, so the maintenance is unplanned from the application's perspective
  41. 41. Unplanned Maintenance unsplash.com/@darmfield
  42. 42. Unplanned Maintenance (failover) • CRM sessions exist on instance 1 CRM_SVC Real Applications Cluster / Data Guard
  43. 43. Unplanned Maintenance (failover) • CRM sessions exist on instance 1 • The instance crashes. What about running sessions/transactions? CRM_SVC Real Applications Cluster / Data Guard
  44. 44. Unplanned Maintenance (failover) • CRM sessions exist on instance 1 • The instance crashes. What about running sessions/transactions? • (Any maintenance that terminates sessions non-transactionally) CRM_SVC Real Applications Cluster / Data Guard
  45. 45. Transparent Application Failover (TAF) • For OCI drivers only • Automates reconnect • Allows resumable queries (session state restored in 12.2) • Transactions and PL/SQL calls not resumed (rollback)
  46. 46. Transparent Application Failover (TAF) • For OCI drivers only • Automates reconnect • Allows resumable queries (session state restored in 12.2) • Transactions and PL/SQL calls not resumed (rollback) Oracle Net Fetched
  47. 47. Transparent Application Failover (TAF) • For OCI drivers only • Automates reconnect • Allows resumable queries (session state restored in 12.2) • Transactions and PL/SQL calls not resumed (rollback) Oracle Net Fetched Lost
  48. 48. Transparent Application Failover (TAF) • For OCI drivers only • Automates reconnect • Allows resumable queries (session state restored in 12.2) • Transactions and PL/SQL calls not resumed (rollback) Oracle Net Fetched Lost Discarded
  49. 49. Transparent Application Failover (TAF) • For OCI drivers only • Automates reconnect • Allows resumable queries (session state restored in 12.2) • Transactions and PL/SQL calls not resumed (rollback) Oracle Net Fetched Lost Fetched Discarded
  50. 50. Transparent Application Failover (TAF) Server side: srvctl add service -db orcl -service hr_svc -failovertype SELECT -failoverdelay 1 -failoverretry 180 -failover_restore LEVEL1 # restores session state (>=12.2) -notification TRUE Client side: HR = (DESCRIPTION = (FAILOVER=ON) (LOAD_BALANCE=OFF) (ADDRESS=(PROTOCOL=TCP)(HOST=server1)(PORT=1521)) (CONNECT_DATA = (SERVICE_NAME = HR.cern.ch) (FAILOVER_MODE = (TYPE = SESSION) (METHOD = BASIC) (RETRIES = 180) (DELAY = 1) )))
  51. 51. Fast Connection Failover and FAN • Like for planned maintenance, but… • Connection pool recycles dead connections • Application must handle all the exceptions • FAN avoids TCP timeouts!
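"Application must handle all the exceptions" usually means catching the recoverable ones and re-running the unit of work on a fresh connection. The sketch below uses only standard JDBC exception types; `RetryOnFailover` and its parameters are hypothetical. It assumes the unit of work is read-only or idempotent, because without Transaction Guard the application cannot know whether a commit went through before the failure.

```java
import java.sql.SQLRecoverableException;
import java.util.concurrent.Callable;

final class RetryOnFailover {
    // Re-runs the unit of work when the driver reports a recoverable connection error.
    static <T> T run(Callable<T> unitOfWork, int maxAttempts, long delayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SQLRecoverableException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // The callable should borrow a connection, do its work, and release it.
                return unitOfWork.call();
            } catch (SQLRecoverableException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);   // give the pool time to react to the failure
                }
            }
        }
        throw last;                              // every attempt failed
    }
}
```

Any other exception is passed straight to the caller. This is the boilerplate that Application Continuity and Transparent Application Continuity aim to make unnecessary.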
  52. 52. Application Continuity (AC) • Server-side Transaction Guard (included in EE) • Transaction state is recorded upon request • Client-side Replay Driver • Keeps journal of transactions • Replays transactions upon reconnect • JDBC thin 12.1, OCI 12.2
  53. 53. Application Continuity (AC) • AC with UCP: no code change • AC without connection pool: code change PoolDataSource pds = PoolDataSourceFactory.getPoolDataSource(); pds.setConnectionFactoryClassName("oracle.jdbc.replay.OracleDataSourceImpl"); ... conn = pds.getConnection(); // Implicit database request begin // calls protected by Application Continuity conn.close(); // Implicit database request end OracleDataSourceImpl ods = new OracleDataSourceImpl(); conn = ods.getConnection(); ... ((ReplayableConnection)conn).beginRequest(); // Explicit database request begin // calls protected by Application Continuity ((ReplayableConnection)conn).endRequest(); // Explicit database request end
  54. 54. Application Continuity (AC) Service definition: srvctl add service -db orcl -service hr -failovertype TRANSACTION # enable Application Continuity -commit_outcome TRUE # enable Transaction Guard -failover_restore LEVEL1 # restore session state before replay -retention 86400 # commit outcome retained for 1 day -replay_init_time 900 # replay is not initiated after 900 seconds -notification true Special configuration to retain mutable values at replay: GRANT KEEP SEQUENCE ON <SEQUENCE> TO <USER>; GRANT KEEP DATE TIME TO <USER>; GRANT KEEP SYSGUID TO <USER>;
  55. 55. Transparent Application Continuity (TAC) • “New” in 18c for JDBC thin, 19c for OCI • Records session and transaction state server-side • No application change • Replayable transactions are replayed • Non-replayable transactions raise exception • Good driver coverage but check the doc! • Side effects are never replayed
  56. 56. Transparent Application Continuity (TAC) Service definition: srvctl add service -db orcl -service hr -failover_restore AUTO # enable Transparent Application Continuity -failovertype AUTO # enable Transparent Application Continuity -commit_outcome TRUE # enable Transaction Guard -retention 86400 # commit outcome retained for 1 day -replay_init_time 900 # replay is not initiated after 900 seconds -notification true Special configuration to retain mutable values at replay: GRANT KEEP SEQUENCE ON <SEQUENCE> TO <USER>; GRANT KEEP DATE TIME TO <USER>; GRANT KEEP SYSGUID TO <USER>;
  57. 57. Still not clear? • Fast Application Notification to drain sessions • Application Continuity for full control (code change) • Transparent Application Continuity for good HA (no code change)
  58. 58. Connection Manager in Traffic Director Mode CMAN with an Oracle Client “brain”
  59. 59. Classic vs TDM • Classic (CLIENT - cman - DB): SQL*Net is redirected transparently • TDM (CLIENT - cman - DB): CMAN is the end point of client connections and opens its own connection to the DB
  60. 60. Session Failover with TDM CLIENT cman CDBA PDB1 • Client connects to cman:1521/pdb1 CDBA
  61. 61. Session Failover with TDM CLIENT cman CDBA PDB1 • Client connects to cman:1521/pdb1 • Cman opens a connection to pdb1 CDBA
  62. 62. Session Failover with TDM CLIENT cman CDBA PDB1 • Client connects to cman:1521/pdb1 • Cman opens a connection to pdb1 • Upon PDB/service relocate, cman detects the stop and closes the connections at transaction boundaries CDBA
  63. 63. Session Failover with TDM CLIENT cman CDBA • Client connects to cman:1521/pdb1 • Cman opens a connection to pdb1 • Upon PDB/service relocate, cman detects the stop and closes the connections at transaction boundaries • The next request is executed on the surviving instance CDBA PDB1
  64. 64. Session Failover with TDM CLIENT cman CDBA • Client connects to cman:1521/pdb1 • Cman opens a connection to pdb1 • Upon PDB/service relocate, cman detects the stop and closes the connections at transaction boundaries • The next request is executed on the surviving instance • The connection client-cman is intact, the client does not experience a disconnection CDBA PDB1
  65. 65. Magic does not happen, you need to plan
  66. 66. Thank you! Ludovico Caldara - Computing Engineer @CERN, Oracle ACE Director
