A presentation about real-world experiences of running Oracle Database Appliances (ODA VP) in production for nearly 2 years. Given by Simon Haslam and Peter Moore, Principal DBA at Simplyhealth (a long-time Veriton customer), at the UKOUG Systems Event in London on 20 May 2015.
1. Living with the
Oracle Database Appliance
Simon Haslam, Veriton
Peter Moore, Simplyhealth
2. Simon Haslam
Consultant, Veriton &
Technical Director of
Oracle s/w since 1995
Middleware & SOA
WebLogic, SOA, BPM
Peter Moore
Principal Oracle DBA & MW
Admin, Simplyhealth
Oracle s/w since 1988
Oracle DBA for 19 years
Database Administrator
3. Introduction & Background
ODA BM/VP & Sizing of Recovery Area
Hardware Maintenance (ASR & Disk Failures)
Patching
Miscellaneous
4. What is ODA?
Two fast Intel compute nodes
Shared, direct attached storage array including flash
InfiniBand interconnect & 10Gb public networks
Management software (database & virtualisation)
Sold as a single product for $68k (list)
in a slide!
5. [Diagram: two compute nodes and the shared storage, with labels Bulk Data (HDD), Redo Logs, ODA Cache, SSD and HDD. Now with InfiniBand.]
7. Background
Started in 1872
◦ Previously… HSA, BCWA, HealthSure, LHF, Remedi, Medisure, Denplan
Primary business areas
◦ Health Cash Plans
◦ Private Medical Insurance
◦ Dental Capitation
◦ Healthcare delivery
Over 3M customers / 20,000 companies
~1700 Employees
8. Core IT
Product / CRM / Finance Application
~1000 Users / 600 Active
3M Customer records
Java EE and PL/SQL
3rd Party communications platform
RAC (2TB main db), WebLogic, Reports
9. Simplyhealth’s ODAs
[Diagram: the Production and Test ODAs plus a ZFS Appliance. Labels include ODA Base on each node (OLTP, Reporting, Comms and standby databases), VM 1 and VM 2 each holding a TTD container, Test, UAT and APEX portal databases, and RMAN and archive areas.]
11. Virtualized Platform: databases
Database
◦ Each node has an “ODA Base” DomU
◦ Looks a lot like ODA BM: most admin is done from ODA Base
Nodes
◦ Run a special OVS image
Appliance Manager
◦ GUI when you first provision it
◦ oakcli tool
[Diagram: Node 0 and Node 1 each run OVS with a Dom0, a local repo and an ODA Base DomU (Appliance Manager, Database(s), Grid Infrastructure). Shared storage sits below both nodes, with lots of room for app VMs like SOA.]
12. ODA BM or VP?
Simplyhealth chose ODA VP
◦ Initially driven by WebLogic
◦ Turned out to be good for test databases
If in doubt Simon recommends ODA VP:
◦ gives you more flexibility in future (app & probably database)
◦ only moderate extra operational complexity
13. Sizing of RECO
DATA is on outer part of hard disks, RECO on inner
Only set during initial provisioning
[Diagram: DATA and RECO regions on the disks for the two provisioning options, labelled Default: “Local Backup” and “External Backup”.]
14. DATA:RECO Sizes
Disks are physically partitioned according to whether Local or External Backup was chosen
Same ratios for all ODA hardware versions and HIGH/NORMAL redundancy
◦ “Local Backup”: DATA 43% (outer), RECO 57% (inner)
◦ “External Backup”: DATA 86% (outer), RECO 14% (inner)
15. Usable Space Example
ODA X5-2, 1 shelf, NORMAL redundancy
◦ “Local Backup”: DATA 12TB, RECO 16TB
◦ “External Backup”: DATA 24TB, RECO 4TB
REDO 250GB
FLASH 750GB
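As a rough cross-check, the DATA:RECO percentages from the previous slide can be applied to this example. The sketch below assumes a total of 28TB usable, which is simply the sum of the DATA and RECO figures quoted here and not a number taken from Oracle documentation. The function and variable names are made up for illustration.

```python
# DATA:RECO shares as quoted on the "DATA:RECO Sizes" slide.
SPLITS = {
    "Local Backup": (0.43, 0.57),
    "External Backup": (0.86, 0.14),
}

# Assumption: total usable space is the sum of the DATA and RECO
# figures quoted for an ODA X5-2, 1 shelf, NORMAL redundancy.
TOTAL_USABLE_TB = 28.0


def split_usable(total_tb, option):
    """Return (DATA, RECO) usable space in TB for a backup option."""
    data_share, reco_share = SPLITS[option]
    return total_tb * data_share, total_tb * reco_share


for option in SPLITS:
    data_tb, reco_tb = split_usable(TOTAL_USABLE_TB, option)
    print(f"{option}: DATA {data_tb:.1f} TB, RECO {reco_tb:.1f} TB")
```

Rounded to whole terabytes this gives roughly 12/16 for “Local Backup” and 24/4 for “External Backup”, matching the slide.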
24. ASR Funnies
ASR raises one SR per disk… or none… or two…
Sometimes the first we knew that a disk had failed was when Oracle updated the SR
◦ New ODA plug-in for EM is expected to include hardware notifications
26. Our Disk History
We have 2 x dual-shelf ODA X3-2s with 16 SSDs & 88 HDDs
Running for 1.5 years (1.35M HDD-hours)
Total of 6 HDDs have been replaced (i.e. 225k h MTBF)
◦ 5 predicted failures
◦ 1 real failure… bad experience with I/O waits though
No SSDs have failed
Note: a new ZFS SA disk arrived automatically the next morning without the sys admin knowing one had failed! (ODA should be more like this)
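The MTBF figure above is just accumulated HDD-hours divided by the number of replacements. The sketch below reproduces it from the slide's own numbers. It counts all 6 replacements, including the 5 predicted failures, as the slide does. The annualised failure rate is an extra, approximate figure (hours per year divided by MTBF) that is not on the slide.

```python
HOURS_PER_YEAR = 8766  # average year of 365.25 days


def mtbf_hours(device_hours, failures):
    """Mean time between failures: total device-hours per failure."""
    return device_hours / failures


def annualised_failure_rate(mtbf):
    """Approximate fraction of disks failing per year (fine for small rates)."""
    return HOURS_PER_YEAR / mtbf


# Figures quoted on the slide: 1.35M HDD-hours, 6 HDDs replaced.
mtbf = mtbf_hours(1_350_000, 6)
afr = annualised_failure_rate(mtbf)
print(f"MTBF: {mtbf:,.0f} h, approx. AFR: {afr:.1%}")
```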
27. Disk Failure ‘Gotchas’
1 predicted failure fixed itself!
General fiddliness of replacing disks
◦ Firmware updating, getting new disks ONLINE, etc
◦ MOS 1435946.1 & 1496114.1
The replacement disk includes the courier details to collect
the failed one…
◦ this is a European courier who will know nothing about it!
◦ we need the UK courier
Blinking yellow light doesn’t always work?!
29. Patching: It’s Really Good!
Vastly simplified process compared to DIY for full stack
Approx. quarterly ODA-only bundled patches
◦ includes PSU for databases (optional)
Oracle Support says to be no more than 2 versions behind current
There’s probably a backlog of ODA customers on 2.10 (last 11g GI but CPU only to April 2014)
30. prep
• Download & load to patch repositories on ODA nodes
INFRA
• Update INFRA
GI
• Update GI
db
• (optional) Update database Oracle Homes & databases
31. Upgrade Example
ODA 2.10 to 12.1.2.2.0 INFRA, GI, DB PSU
11g→12c CRS/ASM upgrade would have probably been a project pre-ODA
We only have a single 11.2.0.4.x Oracle Home
◦ some people have several, e.g. for different apps
32. prep
• scp p20340774_121220_Linux-x86-64_[12]of2.zip
• oakcli unpack -package p20340774… {for each zip, on each node}
• oakcli update -patch 12.1.2.2.0 --verify
INFRA
• oakcli update -patch 12.1.2.2.0 --infra
GI
• oakcli update -patch 12.1.2.2.0 --gi
db
• oakcli update -patch 12.1.2.2.0 --database
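The ordering of the steps above can be sketched as a small dry-run helper that only prints the commands and never executes them. The helper name and the /tmp staging directory are hypothetical. The slide says the unpack step runs for each zip on each node, but not which node each update is run from, so the sketch makes no assumption about that.

```python
def patch_commands(version, zip_paths):
    """Return the oakcli commands in the order shown on the slide."""
    # prep: unpack every downloaded zip into the patch repository
    commands = [f"oakcli unpack -package {path}" for path in zip_paths]
    # prep: verify before changing anything
    commands.append(f"oakcli update -patch {version} --verify")
    # INFRA, then GI, then (optionally) the database homes
    for stage in ("--infra", "--gi", "--database"):
        commands.append(f"oakcli update -patch {version} {stage}")
    return commands


# Hypothetical staging location for the two patch zips.
zips = [
    "/tmp/p20340774_121220_Linux-x86-64_1of2.zip",
    "/tmp/p20340774_121220_Linux-x86-64_2of2.zip",
]
for command in patch_commands("12.1.2.2.0", zips):
    print(command)
```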
33. 12c GI / 11g PSU Upgrade Timeline
◦ App Prep.: 1h
◦ --infra: 2h 29min
◦ --gi: 1h 12min
◦ --database: 40min
◦ Lost: 1h 10min
◦ Restarting app etc
Elapsed outage for app ~6h
Supposed to be rolling? (all DBs shutdown)
Supposed to be rolling? Both nodes rebooted automatically
Possibly bug in shared repo upgrade
Databases were open for most of the day but we were never sure when they would be shut down… (our lack of experience of ODA patching?)
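As a quick sanity check on the "~6h" figure, the stage durations quoted on this slide can be added up. This sketch assumes the timed stages ran one after another without overlap. The slide does not say which stages counted towards the application outage, so the total is only indicative.

```python
# Stage durations in minutes, as quoted on the slide.
STAGES_MIN = {
    "App Prep.": 60,
    "--infra": 2 * 60 + 29,
    "Lost": 60 + 10,
    "--gi": 60 + 12,
    "--database": 40,
}

total_min = sum(STAGES_MIN.values())
hours, minutes = divmod(total_min, 60)
print(f"Sum of timed stages: {hours}h {minutes}min")
```

The sum comes to about six and a half hours, in line with the elapsed outage quoted.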
35. What happened under the covers?
INFRA updates
◦ BIOS
◦ ILOM
◦ Firmware updated on all disks (except new ones)
◦ OVM 3.2.9
GI updates
◦ CRS 12.1.0.2.2
◦ ASM 12.1.2.x.0 (i.e. inc Flex ASM)
◦ ODA Base to Oracle Linux 5.10 UEK2
Database PSU
◦ Oracle home to 11.2.0.4.5 (plus 12.1.0.2.2, 11.2.0.3.13 if we had them)
◦ Databases updated (some!)
…and probably much more!
36. DB Patch-Set Update
Choose which Oracle Home(s) to apply PSU to
Script loops through databases running in each
updated home & runs catbundle.sql
◦ Recognises standbys - didn’t apply PSU (correctly) but still
shut them down! Perhaps because they shared the home
being patched? Possibly our fault!
38. Strange Error Messages
Some strange messages, but mostly harmless:
◦ Console: “An error occurred while restoring domain oakDom1:
Error: not a valid guest state file: config size read”
But… 2 of us were watching everything very closely
◦ Probably better to just go for a long lunch instead!
39. Patching Wish List
Status/confidence
◦ more timestamps (for checking back later – test vs prod)
◦ a progress indicator for anything taking over ~3 min
e.g. “INFO: Running prepatching on node 0” ~20 mins
Could firmware updates of disks (35 mins) be done in
parallel?
40. Patching Wish List
Help us to understand which parts of process are
rolling (could be different per ODA version) and how to
minimise downtime
◦ Is INFRA ever rolling?
◦ GI rolling?
◦ DB rolling if using RAC or RAC One Node (RON)?
41. Patching Nirvana:
Rolling Upgrades for Everything?!
Size of ODA X5-2 invites DB consolidation
Simplyhealth: Lack of rolling INFRA will drive all non-UAT databases off test ODA
(very hard to test bundled patches on pre-prod/UAT)
O-box SOA Appliance: sold on the strength of its HA, so needs rolling updates below the WebLogic layer
43. NFS Storage for Databases
Oracle ZFS and NFS (e.g. NetApp) are supported
◦ See MOS 1445253.1: External Storage (read/write) Support
◦ Use files over NFS, not via ASM
Uses Direct NFS (dNFS): fast
◦ we have 10 GbE network dedicated to storage
Not so self-contained so perhaps not “the ODA way”
44. An Innovative Approach for Test DBs
Requirement:
◦ To use DB EE NUP licences for test, when the 2 ODA bases are
licensed by RAC processor
Solution:
◦ One large VM on each node with multiple Linux Containers
◦ Test databases within the containers use ZFS SA for storage
Suffers from lack of rolling upgrades for ODA INFRA
Technical Credit/Implementation:
Mark Leeuw & Fabrizio Bordaccini
45. Backup & Disaster Recovery
Data Guard works well of course
ODA VP & ODA Base?
◦ In practice you need to rebuild
VMs running on ODA VP?
◦ Host level backup within VM
◦ ACFS Replication...?
Oracle White Paper:
Backup and Recovery Best Practices for the Oracle Database Appliance (April 2014)
46. Management
Looking forward to trying the new EM 12c R4 ODA
plug-in
Initial ODA VP imaging
◦ Why can’t ODA come with VP image?
◦ Speed of booting .ISO over ILOM if not local
47. Tips
Keep It Simple!
◦ Don’t stray too far from standard ODA design goals
◦ Custom databases running off vDisks will end in tears!
Don’t mess with BIOS!
◦ Simon’s don’t-do-this-at-home node eviction test