
How Amazon.com Migrates Inventory Management Systems (DAT346) - AWS re:Invent 2018

In this session, learn from the team that migrated Amazon’s inventory and fulfillment management systems (AFT) from Oracle to Amazon Aurora. We focus on the performance and cost benefits to enterprises that migrate critical systems from Oracle to AWS services; the decision frameworks used to pick Amazon Aurora; and best practices in system design and project management.

  1. How Amazon.com Migrates Inventory Management Systems (DAT346)
       Brent Bigonger, Senior Database Engineer, Amazon Fulfillment Technologies (AFT)
       © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  2. Agenda
       • Amazon Fulfillment Technologies (AFT)
       • Technical challenges
       • AFT’s Oracle to Aurora PostgreSQL migration
       • Additional lessons learned
  4. Amazon Fulfillment Technologies
  5. Amazon Fulfillment Technologies
  6. Amazon Fulfillment Technologies
  7. Amazon Fulfillment Technologies
  9. Complexity and Scale
       • 300+ Oracle source databases
       • Minimal downtime requirements
       • Dozens of services
       • Some of the oldest Amazon databases
           • Complex dependencies
           • 20+ years of optimization for Oracle
       • Stringent database call latency requirements
  10. Amazon Fulfillment Technologies: previous architecture
       • Receive, Stow, Pick, Sort, Pack, Ship
       • Inventory
       • Oracle
  11. Challenges in Oracle and on-premises
       • Scalability
           • Difficult to scale
           • Required custom hardware
       • Availability
           • Hardware failures compromised availability
           • Longer time to recover
       • Hardware management
           • Hardware forecasting, ordering and provisioning
           • High operational burden to maintain
  12. Database characteristics
       • Relational: referential integrity with strong consistency, transactions, and hardened scale
           • Query: complex query support via SQL
           • Services: Amazon Aurora, Amazon RDS
       • Key-value: low-latency, key-based queries with high throughput and fast data ingestion
           • Query: simple query methods with filters
           • Service: Amazon DynamoDB
       • Document: indexing and storing of documents with support for query on any property
           • Query: simple query with filters, projections and aggregates
           • Service: Amazon DynamoDB
       • In-memory: microsecond latency, key-based queries, specialized data structures
           • Query: simple query methods with filters
           • Service: Amazon ElastiCache for Redis & Memcached
       • Graph: creating and navigating relations between data easily and quickly
           • Query: easily express queries in terms of relations
           • Service: Amazon Neptune
  13. Amazon Fulfillment Technologies: purpose-built databases
       • Database types: Relational, Key-value, Document, Graph, In-memory, Search
       • Inventory: Receive, Stow, Pick, Sort, Pack, Ship
       • Aurora PostgreSQL
  14. Why we chose Aurora PostgreSQL
       • High performance
       • High availability
       • Feature/SQL parity with Oracle
       • Scalability
           • Vertical
           • Horizontal (up to 15 Reader Instances)
       • Managed service
       • Database snapshots
           • Full production non-prod databases
  16. Migration Preparation | Migration | Post-migration
  17. Design patterns
       • Separate prod/non-prod AWS accounts
       • Architect applications to leverage Readers
           • Provides horizontal read scaling
       • Integrate database clones into development and test cycles
           • Initial performance benchmarks, functional testing, load testing, etc.
       • Encryption in transit and at rest
       • Plan for automation (AWS APIs/CLIs)
       • Multiple non-prod cutovers/cut-backs
  18. Oracle & PostgreSQL considerations
       • Timestamp/time zone differences
           • Oracle: SYSDATE uses the server’s time zone
           • PostgreSQL: clock_timestamp() uses the client’s time zone
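The time zone difference above can be illustrated with a minimal sketch. It assumes two hypothetical fixed offsets (UTC-8 for the database server, UTC+1 for the client session) and shows that one and the same instant produces different wall-clock text depending on whose zone is applied. The offsets and the helper name `wall_clock` are illustrative only and do not come from the talk.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offsets standing in for the server's and the client's zones.
SERVER_TZ = timezone(timedelta(hours=-8), "server")
CLIENT_TZ = timezone(timedelta(hours=1), "client")


def wall_clock(instant: datetime, tz: timezone) -> str:
    """Render a timezone-aware instant as local wall-clock text in tz."""
    return instant.astimezone(tz).strftime("%Y-%m-%d %H:%M")


# One instant, observed at the same moment from both sides.
instant = datetime(2018, 11, 27, 6, 30, tzinfo=timezone.utc)

server_view = wall_clock(instant, SERVER_TZ)  # server-zone reading
client_view = wall_clock(instant, CLIENT_TZ)  # client/session-zone reading

print(server_view)
print(client_view)
```

Code that stored or compared the server-zone reading under Oracle may therefore need an explicit time zone after the move, rather than relying on whatever zone the session happens to use.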
  19. Oracle & PostgreSQL considerations (cont.)
       • Partitioning differences: Oracle != PostgreSQL
       • Inheritance partitioning (PostgreSQL 9.6)
           • Child tables maintained via triggers
       • Declarative partitioning (PostgreSQL 10+)
           • Only range/list supported
           • No global uniqueness (PKs/UKs)
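As a rough sketch of what declarative range partitioning involves in practice, the helper below generates PostgreSQL 10 style `CREATE TABLE ... PARTITION OF ... FOR VALUES FROM ... TO ...` statements for monthly partitions. The table name `inventory_events` and the naming scheme are hypothetical, and the parent table is assumed to already exist with `PARTITION BY RANGE` on a date column.

```python
from datetime import date
from typing import List


def next_month(d: date) -> date:
    """Return the first day of the month after d."""
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)


def monthly_partition_ddl(parent: str, first: date, months: int) -> List[str]:
    """Build one CREATE TABLE ... PARTITION OF statement per month."""
    statements = []
    start = date(first.year, first.month, 1)
    for _ in range(months):
        end = next_month(start)
        name = f"{parent}_{start:%Y_%m}"  # hypothetical naming scheme
        statements.append(
            f"CREATE TABLE {name} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
        )
        start = end
    return statements


ddl = monthly_partition_ddl("inventory_events", date(2018, 12, 15), 2)
for statement in ddl:
    print(statement)
```

The upper bound of a range partition is exclusive, which is why each partition ends on the first day of the following month.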
  20. Oracle & PostgreSQL considerations (cont.)
       COLLATE 'en_US':
         ttest1=> SELECT * FROM bb_sort ORDER BY val;
          id | val
         ----+--------
           2 | three
           3 | t-wo
           1 | ONE
       COLLATE 'C':
         ttest1=> SELECT * FROM bb_sort ORDER BY val;
          id | val
         ----+--------
           1 | ONE
           3 | t-wo
           2 | three
       • Default PostgreSQL collation != Oracle
       • DMS data validation uses 'C' collation
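The 'C' ordering on this slide is plain byte-value ordering, which a short sketch can reproduce. The function name `c_collation_sorted` is illustrative; the three values are the ones from the slide's `bb_sort` table.

```python
def c_collation_sorted(values):
    """Sort strings the way the 'C' collation does: by raw byte values."""
    return sorted(values, key=lambda s: s.encode("utf-8"))


rows = ["three", "t-wo", "ONE"]
c_order = c_collation_sorted(rows)
print(c_order)
```

Uppercase letters sort before lowercase ones, and the hyphen sorts before any letter, so `ONE` comes first and `t-wo` precedes `three`. Any comparison of ordered result sets between source and target has to apply the same rule on both sides.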
  21. Dependency analysis
       • Capture all query and DML activities
           • Shared objects: tables, views, sequences
           • Cross-database materialized views
           • Data Warehouse ETL feeds
       • How?
           • Query historical views (DBA_HIST_%, v$active_session_history)
           • Periodic sampling of active v$ views (v$sql, v$sqlarea)
           • Login monitoring (logon trigger)
  22. Migration Preparation | Migration | Post-migration
  23. Database launch automation
       • Create Reserved Instance
       • Create database
       • AWS Database Migration Service (DMS) kick-off
       • Onboard Data Warehouse ETL feeds
       • Create scheduled jobs
       • Integrate Amazon CloudWatch metrics
       • Application schema creation
       • Enable monitoring
       • Enter database in fleet-wide metadata
  24. Data migration
       • Step 1: Schema Conversion Tool (source: Oracle)
       • Step 2: AWS Database Migration Service (Oracle to PostgreSQL)
  25. Migration methodology
       • Not a “zero” downtime migration
       • Lift-and-shift, but innovate where applicable
       • Start with manual migrations
           • Some pilot FCs (fulfillment centers)
       • Full migration automation after pilot FCs
  26. Migration Requirements
       • Performance
           • No operational impact on FC operations
           • Short downtime
           • Aurora PostgreSQL performance the same as or better than Oracle
       • Migration management
           • Full data load and ongoing replication
           • Continuous data validation
           • Replication monitoring and alarming
       • Automation
           • Automated database provisioning and build
           • Automated data migration
           • Full cutover automation for services and database
           • Support multiple concurrent cutovers
  27. Workflow Engine migration automation
       • DevOps Operator
       • WebUI/CLI
       • System Description
       • Amazon SWF
       • Java Component, DMS Component, SQL Component, Additional Components
       • Inventory Services
       • Migration Control Plane
       • Monitoring Systems, Networking (DNS), Additional Systems
  28. Migration Preparation | Migration | Post-migration
  29. Auto Vacuum
       • Key knob of PostgreSQL architecture
       • CloudWatch alarm for MaximumUsedTransactionIDs
           • Limit of about 2.1 billion un-vacuumed transaction IDs
       • Monitor pg_stat_all_tables
           • Key columns: n_live_tup, n_dead_tup, last_autovacuum
           • Absolute: n_dead_tup
           • Ratio: n_dead_tup/n_live_tup
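A minimal sketch of the alarm logic behind the MaximumUsedTransactionIDs bullet: it expresses the metric as a fraction of the roughly 2.1 billion (2^31) transaction ID limit and compares that against a threshold. The 50% threshold and the function names are assumptions for illustration; the talk does not state which threshold AFT used.

```python
XID_WRAPAROUND_LIMIT = 2**31  # about 2.1 billion transaction IDs


def xid_usage_fraction(max_used_xids: int) -> float:
    """Fraction of the transaction ID space consumed by un-vacuumed XIDs."""
    return max_used_xids / XID_WRAPAROUND_LIMIT


def should_alarm(max_used_xids: int, threshold_fraction: float = 0.5) -> bool:
    """True when usage reaches the (hypothetical) alarm threshold."""
    return xid_usage_fraction(max_used_xids) >= threshold_fraction


print(should_alarm(1_200_000_000))  # roughly 56% of the limit
print(should_alarm(500_000_000))    # roughly 23% of the limit
```

Alarming well below the limit leaves time for a manual vacuum before the database would have to stop accepting writes.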
  30. Auto Vacuum (cont.)
       • Review *autovacuum* parameters
           • Defaults can’t accommodate all situations
       • Type of table
           • INSERT only, INSERT + UPDATE/DELETE
           • fillfactor (table default 100)
       • Highly active tables (heavy DML)
           • autovacuum_vacuum_scale_factor
           • autovacuum_vacuum_threshold
           • autovacuum_analyze_scale_factor
           • autovacuum_analyze_threshold
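To see why the defaults may not suit highly active tables, the sketch below applies the documented PostgreSQL trigger formula: autovacuum vacuums a table once dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * tuples`. The stock PostgreSQL defaults (50 and 0.2) are used as function defaults; the tuned values in the usage lines are hypothetical, and a managed service may ship different defaults.

```python
def autovacuum_vacuum_trigger(reltuples: float,
                              threshold: int = 50,
                              scale_factor: float = 0.2) -> float:
    """Dead-tuple count above which autovacuum vacuums the table."""
    return threshold + scale_factor * reltuples


def needs_vacuum(n_dead_tup: int, reltuples: float,
                 threshold: int = 50, scale_factor: float = 0.2) -> bool:
    """True when n_dead_tup has crossed the trigger point."""
    return n_dead_tup > autovacuum_vacuum_trigger(reltuples, threshold, scale_factor)


# A 100-million-row table with 5 million dead tuples.
print(autovacuum_vacuum_trigger(100_000_000))
print(needs_vacuum(5_000_000, 100_000_000))
# Hypothetical per-table tuning: lower scale factor, higher fixed threshold.
print(needs_vacuum(5_000_000, 100_000_000, threshold=1000, scale_factor=0.01))
```

With the stock defaults, a 100-million-row table accumulates about 20 million dead tuples before autovacuum starts, which is why large, heavily updated tables are often given a smaller per-table scale factor.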
  31. CloudWatch Dashboards and Alarms
  32. Performance Insights
  33. Aurora PostgreSQL Benefits for AFT
       • Performance
           • Scales to largest AFT workloads
           • Supports our strict query latency requirements
           • No-impact snapshot backups
       • Scalability
           • Now scaling up/down takes minutes, not hours
           • Seamless horizontal read scaling
       • Availability
           • Faster failovers
           • Back to high availability in minutes after hardware failure
       • Hardware management
           • Provisioning our hardware in minutes, not months
           • Better use of DBA team resources
       • Cloud-based automation
           • AWS API/CLI enabled management
           • CloudWatch monitoring
           • Build test databases from snapshots
       • Cost
           • No more Oracle license costs
           • Open source with support
  35. Additional lessons
       • Better telemetry in service API call translation to SQL
           • Instrument SQL with comments: "SELECT /* SVC-API-FooClass.java-UUID-v1 */ ..."
       • Auto-commit is typically *on* in clients
           • No database method to set it directly
       • Sequences
           • PostgreSQL caches sequence values per connection
           • Oracle caches sequence values globally
       • Use SSL (sslmode=verify-full)
       • Inserting NULL
           • In Oracle, "" (empty string) and NULL are treated the same
           • In PostgreSQL, "" (empty string) is stored as an empty string
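The SQL-comment instrumentation above could be applied by a small helper such as the following sketch. It places a tag after the first keyword, mirroring the `SELECT /* ... */ ...` form on the slide. The function name `tag_sql` and the four tag fields are hypothetical; the slide's own tag also carries a UUID, which is left out here.

```python
def tag_sql(sql: str, service: str, api: str, source: str, version: str) -> str:
    """Insert a tracing comment after the first keyword of a SQL statement."""
    parts = [service, api, source, version]
    # Strip comment delimiters so a tag can never open or close a comment early.
    safe = "-".join(p.replace("*/", "").replace("/*", "") for p in parts)
    verb, _, rest = sql.lstrip().partition(" ")
    return f"{verb} /* {safe} */ {rest}".rstrip()


tagged = tag_sql(
    "SELECT id FROM bins WHERE id = %s",
    service="inventory", api="GetBin", source="BinDao.java", version="v1",
)
print(tagged)
```

Because the comment travels with the statement text, it shows up wherever the statement is logged or sampled, so a slow query can be traced back to the service call that issued it.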
  36. Additional lessons
       • No global temporary tables
           • Session-level temporary tables instead, with frequent create/drop
       • EXCEPTION blocks consume transaction IDs (XIDs)
           • Even when no exception fires
           • A RAISE notice or error on its own does not
           • Remove EXCEPTION blocks from any triggers
  37. Additional lessons
       • Idle transactions and long-running queries
           • Both prevent dead tuple and multi-transaction cleanup
       • Idle transactions
           • Consider ‘idle_in_transaction_session_timeout’ (PostgreSQL 9.6+)
           • FATAL: terminating connection due to idle-in-transaction timeout
       • Long-running queries
           • Consider ‘old_snapshot_threshold’ (PostgreSQL 9.6+)
           • Similar to “snapshot too old”
  38. Additional lessons - DMS
       • All source tables should have a PK/UK
           • Primary key required for LOBs
           • Primary key or unique index required for data validation
       • Identify tables with LOBs
           • LOB size affects technical approach (Limited LOB Mode, Full LOB Mode)
       • Large replication instances (R4 class) allow many DMS tasks
       • Monitors on replication instances
           • Especially swap usage, which can make task behavior abnormal
       • Consider one heartbeat table per DMS task
       • DMS validation fails if PostgreSQL triggers change data
       • Evaluate date range partitioned table retention
  39. Additional lessons - DMS
       • Challenge
           • Oracle “snapshot too old” (ORA-01555) error
           • Large and highly active tables
       • Mitigation
           • Adjust UNDO retention (undo_retention)
           • Reduce source data retention if possible
           • Evaluate date range partitioned table retention
           • Separate DMS task from other tables
           • Use multiple tasks to achieve parallelism
           • Run full load from Read Only Standby
           • Full load in low peak time
  40. Additional lessons - DMS
       • Challenge
           • Migrating tables with Large Objects (LOBs)
       • Mitigation
           • Table must have a primary key
           • Determine max LOB size (see Notes)
           • Limited LOB Mode (<= 64 KB)
               • Set extra connection attribute ‘failTasksOnLobTruncation=true’ so the task fails instead of truncating LOBs larger than 64 KB
           • Full LOB Mode (> 64 KB)
               • Slower than Limited LOB Mode
               • Set LOB chunk size (KB) below the network's maximum allowed packet size
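The LOB mode decision above can be sketched as a small function that takes the largest LOB size found in the source (in KB) and returns a settings fragment. The 64 KB cut-off comes from the slide. The dictionary keys follow the DMS task-settings names as best understood here (`SupportLobs`, `FullLobMode`, `LimitedSizeLobMode`, `LobMaxSize`, `LobChunkSize`) and should be checked against the current DMS documentation before use; the function name and the default chunk size are assumptions.

```python
LIMITED_LOB_MAX_KB = 64  # cut-off quoted on the slide


def choose_lob_settings(max_lob_kb: float, chunk_kb: int = 64) -> dict:
    """Pick LOB handling from the largest LOB size found in the source (KB)."""
    if max_lob_kb <= LIMITED_LOB_MAX_KB:
        # Limited LOB Mode: faster, but anything above LobMaxSize would be truncated.
        return {
            "SupportLobs": True,
            "FullLobMode": False,
            "LimitedSizeLobMode": True,
            "LobMaxSize": LIMITED_LOB_MAX_KB,
        }
    # Full LOB Mode: slower, LOBs are moved in chunks of chunk_kb.
    return {
        "SupportLobs": True,
        "FullLobMode": True,
        "LimitedSizeLobMode": False,
        "LobChunkSize": chunk_kb,
    }


print(choose_lob_settings(48))
print(choose_lob_settings(512))
```

Tables could then be grouped into separate tasks by the mode they need, so that only the tables with large LOBs pay the cost of Full LOB Mode.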
  41. Additional lessons
       • Useful extensions:
           • pg_stat_statements
           • pgaudit
           • pgstattuple
           • pg_hint_plan
           • pg_repack
           • log_fdw
           • auto_explain
  42. Additional lessons
       • Consider parameters:
           • log_statement = ddl
           • log_connections = on
           • log_disconnections = on
           • log_lock_waits = on
           • log_min_duration_statement = 5000
           • rds.force_ssl = True
           • huge_pages = on (larger instance types)
           • random_page_cost = 1
           • idle_in_transaction_session_timeout = per your needs
           • old_snapshot_threshold = per your needs
  43. Thank you!
       Brent Bigonger, bigonger@amazon.com
  46. Determine max LOB size
       select 'select (max(length(' || COLUMN_NAME || '))/(1024)) as "Size in KB" from '
              || owner || '.' || TABLE_NAME || ';' "maxlobsizeqry"
         from dba_tab_cols
        where owner = '<schema_name>'
          and data_type in ('CLOB','BLOB','LOB');
