Gallup moved its database from on-premises Oracle to Amazon RDS MySQL and Amazon Aurora on AWS to gain cost savings, scalability, high availability, and integration with other AWS services. The transition required developing workarounds for missing Oracle features in MySQL, rewriting stored procedures, and establishing new processes for deployment and operations on AWS. While Amazon RDS MySQL provided a cost-effective Oracle alternative, Gallup looks forward to better support and integration with AWS services in Amazon Aurora. The cloud migration met Gallup's business needs and positions it for scalable analytics and reporting in the future.
2. What to Expect from the Session
- Introduction
- Problem statement
- Why AWS?
- Non-database considerations
- RDS MySQL: Benefits and challenges
- Solution architecture
- Process and DevOps
- Amazon RDS / Amazon Aurora
- Conclusions
3. Introduction – Our Company
GALLUP Inc. has studied human nature and behavior for more than 70
years. Gallup employs many of the world's leading scientists in
management, economics, psychology, and sociology. Gallup performance
management systems help organizations boost organic growth by
increasing customer engagement and maximizing employee productivity
through measurement tools, coursework, and strategic advisory services.
Gallup's 2,000 professionals deliver services at client organizations,
through the Web, at Gallup University’s campuses, and in 40 offices
around the world.
4. Problem Statement
- Scalable reporting & analytics platform
- Cost effective
- Rich analytics capabilities
- Security & encryption (compliance)
- 24x7 availability (HA)
- Replication
- Same & multi-region data segregation
- Ease of administration
5. Why AWS?
- Cost effective
- Traditional/existing model
- Software licensing costs upfront
- Hardware investments
- Hardware/database administration overhead
- Multi-region support
- Patriot Act
- Cross border data transfer
7. Non-Database Considerations: Process
- On-premises
- Existing stable processes
- Optimized over a decade
- Legacy overhead
- Cloud
- New processes
- New toolsets
- Cultural change (data is not within premises)
- Data segregation
8. Non-Database Considerations: Process
- Data migration
- VPC vs. public
- Bandwidth (VPN: Gallup Network <-> Amazon VPC)
- Secure data migration
- Data encryption
- Database
- ETL
9. Non-Database Considerations: Technical
- Resource challenges/skillset gaps
- Experience with MySQL procedures/functions, etc.
- AWS skillsets
- Service layer mindset (HTTP, web services, etc.)
- Oracle skills are portable
- Lots of deficiencies and peculiarities
- Data migration
- Data synchronization issues
- On-premises vs cloud
- Automate - build vs. buy
10. Non-Database Considerations: Technical
- Data migration
- Amazon RDS reporting repository
- Data lakes
- Amazon S3 data repository (unified/global)
- Ad-hoc custom data & analytical deliverables
- Ease of cross-domain data analysis
- AWS Gotchas
- Amazon SQS: Not a conventional queue
- Amazon S3: eventual consistency
- Variable latency/performance of services
12. Amazon RDS MySQL: Challenges (Database)
- Oracle is far more productive and feature-rich
- No AWS component integrations from the DB
- Tough to support primary database applications
- Developer productivity
- Package support non-existent
- Package level variables
- Codebase is scattered
- Better data structure support (e.g., collections)
- Temporary tables
13. Amazon RDS MySQL: Challenges (Database)
- Cursor parameters in procedures
- Dynamic SQL (execute immediate)
- Debugging/logging
- Declare cursors with dynamic SQL
- Global temporary tables
- Support for subqueries in FROM clause
18. Solution Architecture
[Architecture diagram: Gallup Network connected over VPN to Amazon VPC (QA/PROD). Labeled components: Oracle DB, shared directories, Tomcat/Java (QA & Prod), developer VMs/Jenkins, ELB, EC2 Tomcat cluster, EC2 Tomcat data server/RDS++, RDS MySQL, S3, CloudFront-S3, ElastiCache, Amazon Kinesis, SES/SNS, SQS, external reporting, external data integrations.]
19. Solution Architecture – MySQL Workarounds
- Package scope variables
- Session variables to share between stored procedures
- SET @SUPPRESSION_VAL = -1 etc.
- Cursors with dynamic SQL
- Create temporary table and open a cursor
- DECLARE outCursor CURSOR FOR
SELECT * FROM test_tmp_tab;
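The package-scope workaround above can be sketched as follows. This is a minimal illustration, not Gallup's actual code: only `@SUPPRESSION_VAL` comes from the slide, while the procedure names `init_report_settings` and `mask_small_count` and the "counts below 5" rule are assumptions made up for the example.

```sql
DELIMITER //

-- Hypothetical initializer, standing in for an Oracle package's
-- initialization section: sets the shared value once per connection.
CREATE PROCEDURE init_report_settings()
BEGIN
    SET @SUPPRESSION_VAL = -1;
END //

-- Any procedure called later on the same connection can read the
-- session variable, much like a package-level variable.
CREATE PROCEDURE mask_small_count(IN i_count INT, OUT o_value INT)
BEGIN
    -- Assumed rule for illustration: counts below 5 are replaced
    -- by the suppression value.
    SET o_value = IF(i_count < 5, @SUPPRESSION_VAL, i_count);
END //

DELIMITER ;

-- Usage on a single connection
CALL init_report_settings();
CALL mask_small_count(3, @masked);
SELECT @masked;
```

Session variables are untyped, visible only to the connection that set them, and NULL until assigned, so a procedure that depends on one should be called only after the initializer has run on that connection.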
20. Solution Architecture – MySQL Workarounds
- Cursors with dynamic SQL (contd.)
- Write dynamic SQL (populates temporary table)
- SET @v_dyn_sql = CONCAT("INSERT INTO test_tmp_tab
SELECT CONCAT_WS(@TEST1,D1,D2,D3,D4, 'High', ",
IFNULL(i_measure_list, '""'), ") out_val FROM test.test_vw
WHERE D1 in (", i_d1_list, ") AND D2 = ", i_d2_id,
IF(i_measure_list IS NULL, ' AND 1 = 0', ' AND 1 = 1'));
21. Solution Architecture – MySQL Workarounds
- Execute dynamic SQL, which populates temporary table
- PREPARE stmt FROM @v_dyn_sql;
- EXECUTE stmt; DEALLOCATE PREPARE stmt;
- OPEN outCursor;
- Loop through the cursor and build output
- Execute immediate
- Build dynamic SQL
- SET @v_var = CONCAT('SELECT GROUP_CONCAT(D1
ORDER BY D1 SEPARATOR '','') INTO @o_list FROM (
SELECT D1 FROM D WHERE D1 in (', i_D_list, ')) t');
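The cursor-over-dynamic-SQL workaround from the preceding slides can be put together in one procedure. This is a sketch under stated assumptions: the procedure name `build_output`, the parameter sizes, and the simplified `INSERT ... SELECT` are illustrative, and the table `D` with column `D1` is borrowed from the slides' other example. The input list is concatenated into the SQL text, so it is assumed to be validated by the caller.

```sql
DELIMITER //

CREATE PROCEDURE build_output(IN i_d1_list VARCHAR(1000), OUT o_result TEXT)
BEGIN
    DECLARE v_val VARCHAR(255);
    DECLARE v_done INT DEFAULT 0;
    -- A MySQL cursor cannot be declared over dynamic SQL, so it is bound
    -- to a fixed query on the temporary table. The table only has to
    -- exist by the time OPEN runs.
    DECLARE outCursor CURSOR FOR SELECT out_val FROM test_tmp_tab;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_done = 1;

    DROP TEMPORARY TABLE IF EXISTS test_tmp_tab;
    CREATE TEMPORARY TABLE test_tmp_tab (out_val VARCHAR(255));

    -- PREPARE accepts a string literal or a user variable, not a local
    -- variable, so the statement text is built in @v_dyn_sql.
    SET @v_dyn_sql = CONCAT(
        'INSERT INTO test_tmp_tab SELECT D1 FROM D WHERE D1 IN (',
        i_d1_list, ')');
    PREPARE stmt FROM @v_dyn_sql;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;

    -- Loop through the cursor and build the output string.
    SET o_result = '';
    OPEN outCursor;
    read_loop: LOOP
        FETCH outCursor INTO v_val;
        IF v_done = 1 THEN
            LEAVE read_loop;
        END IF;
        SET o_result = CONCAT(o_result, IFNULL(v_val, ''), ';');
    END LOOP;
    CLOSE outCursor;

    DROP TEMPORARY TABLE test_tmp_tab;
END //

DELIMITER ;
```

The declaration order matters in MySQL: local variables first, then cursors, then handlers, all before the first executable statement in the block.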
22. Solution Architecture – MySQL Workarounds
- Execute immediate (contd.)
- SET @o_list = NULL;
- Executing the dynamic SQL
- PREPARE stmt FROM @v_var; EXECUTE stmt;
- DEALLOCATE PREPARE stmt;
- SET o_flist = @o_list;
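Assembled into a single procedure, the `EXECUTE IMMEDIATE` substitute might look like the sketch below. The procedure name `get_d1_list` and the parameter sizes are hypothetical, and the slide's derived table is flattened into a plain `WHERE` clause to keep the example short. As with any concatenated SQL, `i_D_list` is assumed to be trusted input.

```sql
DELIMITER //

CREATE PROCEDURE get_d1_list(IN i_D_list VARCHAR(1000), OUT o_flist TEXT)
BEGIN
    -- Clear any value left over from an earlier call on this connection.
    SET @o_list = NULL;

    -- Build the statement text. The doubled single quotes produce a
    -- literal ',' separator inside the generated SQL.
    SET @v_var = CONCAT(
        'SELECT GROUP_CONCAT(D1 ORDER BY D1 SEPARATOR '','') INTO @o_list ',
        'FROM D WHERE D1 IN (', i_D_list, ')');

    -- PREPARE / EXECUTE / DEALLOCATE stands in for EXECUTE IMMEDIATE.
    PREPARE stmt FROM @v_var;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;

    -- A prepared statement can write only to user variables, so the
    -- result is copied into the OUT parameter afterwards.
    SET o_flist = @o_list;
END //

DELIMITER ;

-- Usage
CALL get_d1_list('1,2,3', @result);
SELECT @result;
```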
23. Solution Architecture – MySQL
- 400+ stored procedures (first phase)
- 200+ tables/views (first phase)
- Support for aggregation data from on-premises
- Support for reporting configuration
- Brand new products (first phase)
- Amazon RDS++
- Amazon SQS/Amazon S3/Amazon SNS/Amazon SES
support from MySQL
- Post stored procedure integrations
24. Process & DevOps
- GitHub (On-premises)
- VPN (Gallup Network <-> Amazon VPC)
- Jenkins (Java deployment)
- DB code deployment
- Stored procedure deployment
- EC2/Chef
- Auto Scaling
- Stress environment (clone of production)
- Automated deployment (sysadmins)
- Ease of multi-region deployment
25. Process & DevOps
- Amazon S3 intermediary deployment repository steps
- Jenkins - Check out Git repo (on-premises)
- Jenkins - Build WAR and deploy to appropriate S3 buckets
- Jenkins - Run scripts on QA EC2 instances to sync WAR files
- Manual script deployment on PROD EC2 instances
- Auto Scaling
- Create an EC2 machine
- Install/deploy (Chef)
- Sync with S3 for WAR files
- Add to ELB
[Deployment diagram. Labeled components: Jenkins (SSH/Git, AWS keys, S3 plugins), Amazon S3 (QA & Prod deploy buckets), QA EC2 (AWS CLI), Prod EC2 (AWS CLI).]
26. Amazon RDS / Amazon Aurora
- Early adopter
- More read instances / lower lag times
- Replication & HA
- Better integration with AWS components in future
- Better DevOps tools for database development in future
- Encryption
- Awaiting this functionality to go forward for our production
rollout
27. Conclusions
- AWS is the right fit for our future
- Cost-effective
- Scalable
- Meets challenging overall business needs
- Amazon RDS MySQL/Amazon Aurora
- A cost-effective alternative to Oracle in the cloud for
supporting scalable applications/workloads
- Better integration with other AWS components (Aurora)