Lessons Learned: A DoD Exadata Migration
Chris Bradham
Vizuri – an Operating Division of AEM
•Applied Engineering Management (AEM) Corporation founded in 1986 as a 100% woman-owned business
•More than 25 years of profitable growth
•Headquartered in Chantilly, VA, with offices in major metropolitan areas including Los Angeles, San Antonio, and Jacksonville
•Diversified client base including Fortune 500 companies and major government agencies
•Industry-recognized awards and certifications for performance, capability, and delivery
Chris Bradham
•Lead Architect, Data Services
•Oracle DBA experience 1997 to present (Oracle 7 to 11.2)
•Data Guard, Replication, Materialized Views, GoldenGate, Exadata, RAC
•Part-time Instructor, George Mason University (OCA/OCP)
•Oracle Certified Exadata Implementation Specialist, Oracle Certified Professional (11g), Performance Tuning Certified (11g), ITIL Foundation, Security
firstname.lastname@example.org
What’s being covered?
•Technology Refresh
•Legacy Environment / Options
•Exadata Components
•Security Considerations
•Migration Considerations
•Results of Migration
•Lessons Learned
•References
•Q & A
Background Information
•Global, multi-service DoD web-based housing application
•Over 300 schemas
•750 GB of data
•4,300 active users
•4.2 million logins per year
•4,500 reports generated per day
•AEM Corporation responsible for hosting, operations and maintenance, and the technology refresh
Pre-Tech Refresh Issues
Legacy hardware over six years old
•Slow patching (5 nodes, slower machines)
•Deployments and data updates time-consuming
•Large or complex reports often hung
•Node evictions due to network / disk speed issues
•Oracle 10.2.0.4 support ended in June 2011
Data warehouse delayed by performance requirements (Oracle Streams attempt)
Alternative 1: Based on Legacy Solution
•Virtualized application servers
•Network bonding
•8 Gbps backbone
•EMC disk array
•5 database servers
•Oracle 11gR2 RAC install
Alternative 2: Based on Exadata Solution
•Virtualized application servers
•Network bonding
•40 Gbps InfiniBand backbone
•Oracle Exadata Storage Servers
•2-node Quarter Rack
•Oracle 11gR2 RAC preconfigured
Surprise, we chose Exadata!
X2-2 Quarter Rack Specifications
•2 Xeon-based dual-processor database servers (Sun Fire X4170 M2)
  o24 cores (12 per server)
  o192 GB memory, expandable to 288 GB (96 GB per server, expandable to 144 GB)
  o10 GigE connectivity to the data center: 4 x 10GbE ports (2 per server)
•1.1 TB high-speed flash
•3 Exadata Storage Servers X2-2
  oAll with High Performance 600 GB disks, OR
  oAll with High Capacity 3 TB disks
•2 Sun Datacenter InfiniBand Switch 36
  o36-port managed QDR (40 Gb/s) switch
•1 “Admin” Cisco Ethernet switch
•Keyboard, Video, Mouse (KVM) hardware
•Redundant Power Distribution Units (PDUs)
Can upgrade to a Half Rack, or just add storage
Exadata Selection Points
•Licensing fees made Exadata the low-cost solution
•Total database hardware solution
•Sizable and expandable
•Oracle invested in helping DoD succeed
•Patch strategy
•Storage Indexes / Smart Scan / Smart Flash Cache
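Of the features in the last point, Storage Indexes are the least intuitive. The sketch below is a toy model, not Oracle's implementation: each region (standing in for a storage region on a cell, roughly 1 MB on Exadata) keeps only the minimum and maximum of one column, and an equality scan skips any region whose range cannot contain the target value. The `Region` class and `scan_equal` function are hypothetical names used for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Toy stand-in for one storage region and its in-memory min/max summary."""
    values: list[int]
    low: int = field(init=False)
    high: int = field(init=False)

    def __post_init__(self) -> None:
        # The "storage index" entry: just the column's min and max for this region.
        self.low = min(self.values)
        self.high = max(self.values)


def scan_equal(regions: list[Region], target: int) -> tuple[list[int], int]:
    """Find values equal to target; return (matches, number of regions actually read)."""
    matches: list[int] = []
    regions_read = 0
    for region in regions:
        if target < region.low or target > region.high:
            continue  # min/max proves there is no match here, so the I/O is skipped
        regions_read += 1
        matches.extend(v for v in region.values if v == target)
    return matches, regions_read


regions = [Region([1, 5, 9]), Region([10, 12, 20]), Region([21, 30, 42])]
print(scan_equal(regions, 12))   # ([12], 1): two of three regions skipped
print(scan_equal(regions, 100))  # ([], 0): every region skipped
```

Note that a value inside a region's range (for example 15) still forces that region to be read even when nothing matches; min/max summaries only ever rule regions out.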
Throughput (GB/second)
•2 Gbps Fibre Channel x2: 0.4
•4 Gbps Fibre Channel x2: 0.8
•8 Gbps Fibre Channel x2: 1.6
•Exadata 1/4 - Disk: 5.4
•Exadata 1/2 - Disk: 12.5
•Exadata Full - Disk: 25
•Exadata 1/4 - Disk & Flash: 16
•Exadata 1/2 - Disk & Flash: 37
•Exadata Full - Disk & Flash: 75
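The throughput figures on this slide are easier to compare as time. The sketch below estimates a best-case single pass over the application's 750 GB of data for the Fibre Channel configurations and for the quarter rack this project bought. It assumes the chart values are GB/s (as read from the chart) and that a scan sustains peak bandwidth, which real queries rarely do, so treat the results as lower bounds.

```python
# Peak throughput figures read from the chart, assumed to be in GB/s.
PEAK_GB_PER_S = {
    "2 Gbps Fibre Channel x2": 0.4,
    "4 Gbps Fibre Channel x2": 0.8,
    "8 Gbps Fibre Channel x2": 1.6,
    "Exadata 1/4 - Disk": 5.4,
    "Exadata 1/4 - Disk & Flash": 16.0,
}

DATABASE_GB = 750  # size of the housing application's data


def scan_seconds(size_gb: float, rate_gb_per_s: float) -> float:
    """Best-case time to read size_gb once at a sustained rate."""
    if rate_gb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_gb / rate_gb_per_s


if __name__ == "__main__":
    for config, rate in PEAK_GB_PER_S.items():
        print(f"{config:28s} {scan_seconds(DATABASE_GB, rate):8.1f} s")
```

Under these assumptions, the fastest Fibre Channel option needs about 469 seconds per pass, versus about 139 seconds from quarter-rack disk alone.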
Tech Refresh Challenges
•100% hardware replacement and data center move
•Data center staff responsiveness
•Narrow outage window to avoid negative impact on end users
•System performance, database growth, and network bandwidth
•Exadata unproven in the DoD space at the time (security)
•Upgrading database versions (data/code/reports)
Lots of change: what if issues surface?
Smart Flash Cache Considerations
Helps with…
•Write-through cache avoids caching data that will not be reused
•Holds hot data; much faster than disk for small, random I/O
•Data is not duplicated across the flash caches of other Storage Servers
•Smart Flash Logging reduces log write latency by writing to flash and disk simultaneously, using minimal space (512 MB)
•Write-back cache (requires a later storage server software release)
Don’t touch, except for…
•ALTER TABLE <table_name> STORAGE (CELL_FLASH_CACHE KEEP);
•Creating flash disks out of part of the flash cache
•Reassigning a portion to a TEMP tablespace for index builds
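The log-write point works because each redo write is issued to flash and disk at the same time and is acknowledged as soon as either one completes, so the latency the database sees is the faster of the two. The sketch below models only that rule; the latency samples are hypothetical, and `flash_logged_latencies` is an illustrative name, not an Oracle API.

```python
def flash_logged_latencies(disk_ms: list[float], flash_ms: list[float]) -> list[float]:
    """Per-write latency when each redo write goes to disk and flash at once.

    The write is acknowledged when either device finishes first, so the
    observed latency is the minimum of the two for that write.
    """
    if len(disk_ms) != len(flash_ms):
        raise ValueError("need one flash sample per disk sample")
    return [min(d, f) for d, f in zip(disk_ms, flash_ms)]


# Hypothetical latency samples in milliseconds; the 40 ms disk outlier
# (e.g. a busy spindle) is hidden because flash finishes first.
disk = [2.0, 1.5, 40.0, 2.5]
flash = [0.5, 0.6, 0.4, 3.0]
effective = flash_logged_latencies(disk, flash)
print(effective)                  # [0.5, 0.6, 0.4, 2.5]
print(max(disk), max(effective))  # 40.0 2.5
```

The last sample shows the reverse case: when flash is momentarily slower, disk still wins, so worst-case latency improves without depending on either device alone.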
Database Node Considerations
•Database consolidation
•SGA settings
  oAMM bad! ASMM good! (set minimum values)
•Where’s the shared storage space?
  oDBFS is the answer (fix_control=8, ac_timeout=60, and SGA=2 GB)
•Is everything set up correctly?
  oExachk is the answer
•Indexes / hints / compression
•Huge Pages (reduce overhead)
•Large segments: 8 MB initial / next size with autoallocate
•TEMP: BIGFILE, autoextend 1 GB, uniform 1 MB
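The SGA and Huge Pages points are linked: on Linux, Automatic Memory Management (memory_target) cannot use HugePages, which is one reason ASMM is preferred here. The sketch below estimates the HugePages count (the vm.nr_hugepages setting) for a node hosting several consolidated instances, assuming 2 MB pages. The function name and the 16 GB SGA are hypothetical; in practice, size from the actual shared memory segments reported by `ipcs`, which can be slightly larger than the SGA target, so treat this as a floor.

```python
import math

GB = 1024 ** 3


def hugepages_needed(sga_sizes_bytes: list[int], hugepage_kb: int = 2048) -> int:
    """HugePages to reserve for all SGAs on one database node.

    Each instance's SGA is rounded up to whole pages and the counts are summed,
    since every consolidated database needs its own SGA backed by HugePages.
    hugepage_kb should match Hugepagesize in /proc/meminfo (2048 kB is typical on x86-64).
    """
    page_bytes = hugepage_kb * 1024
    return sum(math.ceil(size / page_bytes) for size in sga_sizes_bytes)


# The 2 GB DBFS instance from this slide plus a hypothetical 16 GB application SGA.
print(hugepages_needed([2 * GB]))           # 1024
print(hugepages_needed([2 * GB, 16 * GB]))  # 9216
```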
Exadata Patch Management
Multiple patches
•InfiniBand (once per year)
•DB nodes / Storage Servers (quarterly)
•Bundle Patch (BP) DB software (quarterly)
•Additional components (Ethernet switch, KVM, PDU)
Bug fixes are included, so applying patches is important
Proceed with caution: one-off patches
Rolling option: time is a consideration
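The rolling-option trade-off is easy to underestimate. In a rolling patch, components (for example, the three storage servers) are patched one at a time so the database stays available; in a non-rolling patch they are patched together during an outage. The sketch below compares elapsed time under that simplification. The per-component minutes are hypothetical, and real runs add pre-checks and post-patch validation on top.

```python
def patch_elapsed(component_minutes: list[int], rolling: bool) -> int:
    """Rough elapsed time to patch a set of components.

    Rolling: one component at a time, no outage, so elapsed time is the sum.
    Non-rolling: all components in parallel during an outage, so elapsed
    time is roughly that of the slowest component.
    """
    if not component_minutes:
        return 0
    return sum(component_minutes) if rolling else max(component_minutes)


# Hypothetical per-cell patch times (minutes) for the three storage servers.
cells = [90, 95, 100]
print(patch_elapsed(cells, rolling=True))   # 285, database stays up
print(patch_elapsed(cells, rolling=False))  # 100, full outage
```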
Security
•DoD 8570 requirements
•Security Technical Implementation Guide (STIG)
  oOracle installation not customizable
  oDBFS and idle_time don’t play well together
  oAutomatic Service Request (ASR) / Configuration Manager limitation
  oGrid Control / third-party certificates (September release)
  oBanners / SQLNET.ORA settings impact tools (DEFAULT_SDU_SIZE=32767, ORA-12541)
Don’t assume security settings will have no impact. Must TEST!!!
Migration Strategies
10.2.0.4 to 11.2.0.x options considered
•DBFS with external tables (5 to 7 GB/sec file system I/O throughput)
•GoldenGate with Data Pump (near-zero downtime)
•Data Pump
Factors
•Maintenance window
•Risk of data loss
•Familiarity with technology
Whatever the choice, perform multiple trial runs to find optimal settings.
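As a sketch of how trial-run results can feed the maintenance-window decision, the code below records each trial's Data Pump PARALLEL setting and measured elapsed time, then picks the fastest run that fits the window. The `TrialRun` type, the timings, and the 8-hour window are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrialRun:
    parallel: int           # Data Pump PARALLEL value used in the trial
    elapsed_minutes: float  # measured end-to-end time for the trial


def best_setting(trials: list[TrialRun], window_minutes: float) -> Optional[TrialRun]:
    """Return the fastest trial that fits the maintenance window, or None."""
    fitting = [t for t in trials if t.elapsed_minutes <= window_minutes]
    return min(fitting, key=lambda t: t.elapsed_minutes, default=None)


# Hypothetical trial results; more parallelism is not always faster.
trials = [TrialRun(4, 540), TrialRun(8, 410), TrialRun(16, 430)]
print(best_setting(trials, window_minutes=480))  # TrialRun(parallel=8, elapsed_minutes=410)
print(best_setting(trials, window_minutes=300))  # None
```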
2011 – Technical Refresh (Data Center Move)
On 9/9/11 at 7pm, application servers at the legacy site were turned off:
•Transferred encrypted Data Pump exports to the data center
•Network outage occurred during the data transfer (2 hours)
•On 9/10/11 at 7am, new-system testing was initiated
•Users were on the system by 3pm
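The cutover window works out as follows. This simply computes elapsed hours between the times listed for the move, assuming they are all in the same local time zone; the 2-hour network outage fell inside the first interval, before testing began.

```python
from datetime import datetime

# Cutover milestones (local time, September 2011).
apps_off = datetime(2011, 9, 9, 19, 0)       # legacy application servers turned off
testing_start = datetime(2011, 9, 10, 7, 0)  # new-system testing initiated
users_on = datetime(2011, 9, 10, 15, 0)      # users back on the system


def hours_between(start: datetime, end: datetime) -> float:
    """Elapsed time between two milestones, in hours."""
    return (end - start).total_seconds() / 3600


print(hours_between(apps_off, testing_start))  # 12.0
print(hours_between(apps_off, users_on))       # 20.0
```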
Migration Timeline (January–September 2011)
Timeline graphic milestones: quarter rack delivered, initial DB setup / test, migration options selection, STIG setup / test, DBFS setup, Grid Control setup, apply BP, migration test loads, CAB, production load / cutover to Exadata, first day on the new system
Chart: Post Tech Refresh Performance (in hours), Legacy vs. Exadata, for Exports, MV Refresh, BOR Batch Process, and Index Rebuild
Exadata Lessons Learned
•Ensure the hosting center can accommodate Exadata’s dimensions
•Staff requirements (more communication necessary)
•Testing required; ideally, 2 Exadata Database Machines
•Smart Scan requires direct path reads on full table scans or fast full index scans, including parallel queries when parallel_degree_policy is not AUTO
•Chained rows defeat Smart Scan
•EHCC: 10x space savings and performance gains (beware DML)
•Data warehouse (EHCC, SGA sizing)
•IORM not heavily used by customers
Roadmap graphic: X2-8 Massive Memory (2010); X3 In-Memory, all I/Os to memory (2012)
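The Smart Scan conditions work well as a quick checklist when reviewing an execution plan. The sketch below encodes only the rules listed on this slide; `ScanProfile` and `smart_scan_candidate` are hypothetical helpers, not Oracle's decision logic (whether a serial scan uses direct path reads also depends on segment size, buffer cache state, and other factors). Treating parallel_degree_policy=AUTO as disqualifying reflects that it can enable in-memory parallel execution, which reads through the buffer cache instead of offloading to the cells.

```python
from dataclasses import dataclass

# Plan operations the slide lists as Smart Scan candidates.
SCAN_OPERATIONS = {"TABLE ACCESS FULL", "INDEX FAST FULL SCAN"}


@dataclass
class ScanProfile:
    operation: str               # row source operation from the execution plan
    direct_path_read: bool       # True if the segment is read via direct path
    parallel: bool               # True for a parallel query
    parallel_degree_policy: str  # init parameter value, e.g. "MANUAL" or "AUTO"
    chained_rows: bool           # True if the segment has chained rows


def smart_scan_candidate(p: ScanProfile) -> bool:
    """Apply the slide's rules of thumb; True means Smart Scan is plausible."""
    if p.operation.upper() not in SCAN_OPERATIONS:
        return False
    if not p.direct_path_read:
        return False  # buffered reads return whole blocks; nothing is offloaded
    if p.parallel and p.parallel_degree_policy.upper() == "AUTO":
        return False  # in-memory parallel execution may read via the buffer cache
    return not p.chained_rows


print(smart_scan_candidate(ScanProfile("TABLE ACCESS FULL", True, True, "MANUAL", False)))  # True
print(smart_scan_candidate(ScanProfile("TABLE ACCESS FULL", True, True, "AUTO", False)))    # False
```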
Exadata Lessons Learned (cont.)
•Grid Control for monitoring / managing components
•Expect CPU utilization to decrease
•Expect disk failures
•Drive types can't be mixed, and switching is pricey
•Standard tuning principles apply (OLTP)
•DB links offer tuning opportunities
•Platinum Support provided major assistance
•Run exachk and opatch before / after patching
•Time and experience are the keys to stability
References (cont.)
•Database Machine and Exadata Storage Server (888828.1)
•Oracle Exadata Database Machine exachk (1070954.1)
•Oracle Exadata Best Practices (757552.1)
•Best Practices for OLTP on the Sun Oracle Database Machine (1269706.1)
•Best Practices for Data Warehousing on Database Machine (1297112.1)
•Oracle Sun Database Machine Application Best Practices for Data Warehousing (1094934.1)
•Oracle Sun Database Machine Diagnosability and Troubleshooting Best Practices (1274324.1)
•Expert Oracle Exadata (Osborne, Johnson, Poder)