AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Come hear about the services AWS provides for managing data and when to use which tool. You will learn about data movement and coordination, as well as data storage and analysis, including when to use relational and NoSQL approaches, Hadoop, and data warehousing. This session will highlight how AWS data services have helped real-world customers.

Notes
  • Obama for America: The system intermingled Ruby on Rails (RoR), Python/Django, PHP, and a host of other front- and mid-tier technologies, creating a robust heterogeneous design. Below that, the use of 10 different structured storage systems reflected a focus on choosing tools suited to the data itself. These included managed relational databases such as Amazon Relational Database Service (Amazon RDS) for MySQL, PostgreSQL, and Microsoft SQL Server; NoSQL and analytics software such as MongoDB, LevelDB, Apache Hadoop, and Vertica; and Amazon S3, Amazon DynamoDB, and Amazon SimpleDB.
  • DynamoDB latency is consistently under 10 ms.
    Most importantly, data in DynamoDB is durable: it is continuously replicated across multiple data centers and persisted to SSD storage.
  • More context: MongoDB, Cassandra

    Variety – can process many different data types (custom SerDes, etc.).
    Velocity – certain packages that run on Hadoop help with real-time data ingestion, such as Flume, Storm, Kafka, and Spark Streaming.
    Volume – designed to work on massive data sets.
  • Start an EMR cluster using the console or CLI tools.
    A master instance group is created that controls the cluster.
    A core instance group is created for the life of the cluster.
    Core instances run the DataNode and TaskTracker daemons.
    Optional task instances can be added or removed to perform work (Spot Instances).
    S3 can be used as the underlying ‘file system’ for input/output data.
    The master node coordinates the distribution of work and manages cluster state.
    Core and task instances read from and write to S3.


  • Volume – pretty high
    Velocity – very high
    Variety – good if it fits into 40K; otherwise some extra lifting is needed.
  • Transcript

    • 1. AWS Government, Education, and Nonprofits Symposium Washington, DC | June 24, 2014 - June 26, 2014. AWS as a Data Platform. Chris Keyser, ckeyser@amazon.com
    • 2. Why AWS? Ease of use | Lower costs
    • 3. Ease of use | Lower costs: no capital investment; pay as you go; no subscriptions; only pay for what you use
    • 4. Ease of use | Lower costs: programmable; zero admin; easy to configure; integrate with existing tools
    • 5. One tool to rule them all
    • 6. Use the right tools
    • 7. Movement and Coordination: Data Pipeline, Direct Connect, Storage Gateway, Import/Export
    • 8. Storage and Analysis Services: Instance Storage (EC2, EBS); SQL Stores (Redshift, RDS); Hadoop (EMR); NoSQL (DynamoDB); Stream (Kinesis); Search (CloudSearch); Storage Services (S3, CloudFront, Glacier)
    • 9. Movement and Coordination
    • 10. Movement and Coordination - Plumbing: Direct Connect (dedicated network pipes); Storage Gateway (storage backup & archiving); Import/Export (ship us your disks)
    • 11. Movement and Coordination - Orchestration: AWS Data Pipeline. Resource management; scheduling, execution, and retry; dependency tracking; failure notification
    • 12. Data Storage and Analysis
    • 13. Storage Services – Object Store: Amazon S3. More than 1.5 million peak requests/sec; designed for 99.999999999% durability; trillions of objects; stores anything; lifecycle and versioning
    • 14. Storage Services - Archive Storage: Amazon Glacier. Low-cost, durable archiving (“cold storage”) for infrequently accessed data; integrated S3 lifecycle policies
    • 15. Storage Services – Edge Caching: Amazon CloudFront. Simple to use with a global footprint; streaming support; large file distribution; private content; S3, EC2, and ELB integration; geo restrictions
    • 17. Instance Storage - Options: Amazon EC2. Ephemeral storage (“local”): you manage backup/restore; high-storage instances available: i2.8xlarge (6.4 TB SSD, 350K IOPS), hs1.8xlarge (48 TB disk). Elastic Block Store (“network-attached storage”): snapshots, encryption, provisioned throughput (IOPS)
    • 18. Instance Storage - Build Your Own on Amazon EC2: NFS, MongoDB, Cassandra, GraphLab, Titan, Kafka, Lustre, Gluster, Flume, Scribe, Presto, …and more
    • 19. SQL Stores - Managed Relational DB: Amazon RDS. MySQL, Oracle, SQL Server, PostgreSQL; backup/restore, high availability; push-button scalability; up to 3 TB and 30K IOPS
    • 20. SQL Stores - Petabyte Data Warehouse: Amazon Redshift. Relational data warehouse; massively parallel; petabyte scale; fully managed; $1,000/TB/year
    • 21. SQL Stores - Amazon Redshift Architecture. Leader node: SQL endpoint; stores metadata; coordinates query execution. Compute nodes: local, columnar storage; execute queries in parallel; backup and restore via S3; parallel load from S3, EMR, or DynamoDB. Hardware optimized for data processing: DW1, 2 TB – 1.6 PB magnetic; DW2, 160 GB – 256 TB SSD; 10 GigE (HPC). Diagram labels: JDBC/ODBC, ingestion, backup, restore
    • 22. NoSQL – Dial Up Capacity: Amazon DynamoDB. NoSQL database; seamless scalability; zero admin; single-digit millisecond latency
    • 23. NoSQL - Durable Low Latency at Scale. Writes: continuously replicated to 3 AZs; quorum acknowledgment; persisted to disk (custom SSD). Reads: strongly or eventually consistent; no trade-off in latency
    • 24. Hadoop – On Demand: Amazon Elastic MapReduce. Hive, Impala, Spark, Pig, MapReduce; easy to use, fully managed; on-demand and spot pricing; persistent and transient clusters; deep integration with S3
    • 25. Hadoop – Tuned for AWS. Diagram: master instance group; core instance group (HDFS); task instance group; Amazon S3, Amazon Redshift, Amazon DynamoDB
    • 27. Streaming - at Scale: Amazon Kinesis. Real-time data collection; seamlessly scale to gigabytes/s; low-cost managed service; EMR integration
    • 28. Streaming - Amazon Kinesis Architecture. Sources: millions of sources producing 100s of terabytes per hour. Front end: authentication, authorization. Storage: durable, highly consistent; replicates data across three data centers (Availability Zones). Stream: ordered stream of events supports multiple readers; inexpensive: $0.028 per million puts. Readers: aggregate analysis in Hadoop or a data warehouse; machine learning algorithms or sliding-window analytics; real-time dashboards and alarms; aggregate and archive to S3
    • 29. Search – Made Simple: Amazon CloudSearch. Fully managed search engine; simple to operate; highly available; user-configurable scaling; advanced feature support: 34 languages, algorithmic stemming, geospatial search, faceted search, suggestions, highlighting, field weighting, …
    • 30. The right tool. At the right time. At the right scale.
    • 31. Thank You. Chris Keyser, ckeyser@amazon.com
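Slide 11 lists what AWS Data Pipeline handles for you: scheduling, execution and retry, dependency tracking, and failure notification. The sketch below is a hypothetical, in-process illustration of those ideas in plain Python. It does not use the Data Pipeline API, and `run_pipeline`, the task names, and the retry policy are all assumptions made for illustration.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, List, Set


def run_pipeline(
    tasks: Dict[str, Callable[[], None]],
    deps: Dict[str, Iterable[str]],
    max_retries: int = 2,
    notify: Callable[[str, Exception], None] = lambda name, err: None,
) -> List[str]:
    """Hypothetical orchestrator: run tasks in dependency order with retries.

    Returns the names of the tasks that succeeded, in execution order.
    """
    graph = {name: set(deps.get(name, ())) for name in tasks}
    unknown = {d for upstream in graph.values() for d in upstream} - set(tasks)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    completed: List[str] = []
    blocked: Set[str] = set()  # tasks that failed, or sit downstream of a failure
    for name in TopologicalSorter(graph).static_order():
        # Dependency tracking: never run a task whose upstream work did not finish.
        if graph[name] & blocked:
            blocked.add(name)
            continue
        # Execution and retry: one initial attempt plus up to max_retries more.
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception as err:
                if attempt == max_retries:
                    blocked.add(name)
                    notify(name, err)  # failure notification after the last attempt
    return completed
```

Skipping downstream tasks instead of running them against missing inputs is the dependency-tracking behavior the slide describes; a real scheduler would also persist state and run on a timetable.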
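Slide 13 quotes S3 as designed for 99.999999999% ("eleven nines") durability. One way to read that figure is as an annual per-object loss probability of 10^-11. The arithmetic below assumes losses are independent across objects, which is a simplifying assumption for illustration, not an AWS guarantee; the function names are hypothetical.

```python
def expected_annual_losses(num_objects: int, nines: int = 11) -> float:
    """Expected objects lost per year if each object independently has a
    10**-nines chance of loss per year (99.999999999% durability = 11 nines)."""
    if num_objects <= 0:
        raise ValueError("num_objects must be positive")
    return num_objects * 10.0 ** -nines


def years_per_lost_object(num_objects: int, nines: int = 11) -> float:
    """Average number of years between single-object losses."""
    return 1.0 / expected_annual_losses(num_objects, nines)


# 10 million objects -> about 1e-4 expected losses per year,
# i.e. roughly one lost object every 10,000 years on average.
print(round(years_per_lost_object(10_000_000)))
```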
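Slide 14 mentions Glacier's integration with S3 lifecycle policies. The sketch below builds a lifecycle rule that moves objects under a prefix to the `GLACIER` storage class after a number of days and optionally expires them later. It only constructs a dict in the shape of S3's lifecycle configuration; the prefix, day counts, rule ID scheme, and the helper name `glacier_lifecycle_config` are illustrative assumptions.

```python
from typing import Any, Dict, Optional


def glacier_lifecycle_config(prefix: str, archive_after_days: int,
                             expire_after_days: Optional[int] = None) -> Dict[str, Any]:
    """Build a single-rule lifecycle configuration that archives to Glacier."""
    if archive_after_days < 0:
        raise ValueError("archive_after_days must be non-negative")
    rule: Dict[str, Any] = {
        "ID": f"archive-{prefix.rstrip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        # Transition objects to cold storage once they are this many days old.
        "Transitions": [{"Days": archive_after_days, "StorageClass": "GLACIER"}],
    }
    if expire_after_days is not None:
        # Sanity check in this sketch: delete only after archiving has happened.
        if expire_after_days <= archive_after_days:
            raise ValueError("expiration must come after the Glacier transition")
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}
```

A configuration like this can be applied with boto3's `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)` on an S3 client; the bucket name is up to you.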
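Slide 21 splits Redshift into a leader node that coordinates query execution and compute nodes that keep local, columnar storage and execute queries in parallel. The toy model below illustrates that division of labor in plain Python and is not how Redshift is implemented. Rows are spread round-robin across hypothetical nodes (similar in spirit to Redshift's EVEN distribution style), each node stores its share column by column, and a leader function sums one column by merging partial results from every node.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Mapping, Sequence

Columns = Dict[str, List[float]]


def to_columns(rows: Sequence[Mapping[str, float]]) -> Columns:
    """Columnar layout: one list per column instead of one record per row.
    Assumes every row has the same columns."""
    columns: Columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def distribute(rows: Sequence[Mapping[str, float]], num_nodes: int) -> List[Columns]:
    """Round-robin rows across compute nodes, each storing its slice by column."""
    if num_nodes < 1:
        raise ValueError("need at least one compute node")
    return [to_columns(rows[i::num_nodes]) for i in range(num_nodes)]


def leader_sum(nodes: List[Columns], column: str) -> float:
    """Leader: fan the query out, let each node scan only one column, merge."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda cols: sum(cols.get(column, [])), nodes))
    return sum(partials)
```

The point of the columnar layout is visible in `leader_sum`: each node reads only the list for the requested column, never the other fields of its rows.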
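Slide 23 says DynamoDB writes are replicated to three Availability Zones with quorum acknowledgment, and that reads can be strongly or eventually consistent. The toy model below shows why a quorum makes both read modes possible. It is a textbook N/W/R quorum, not a description of DynamoDB's internals, and every name in it is hypothetical. With N = 3 replicas, a write is acknowledged after W = 2 copies succeed; a consistent read that consults R = 2 replicas must overlap at least one up-to-date copy because R + W > N, while an eventually consistent read of a single replica may return stale data until background replication catches up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Versioned = Tuple[int, Optional[str]]  # (version, value); version 0 = never written


@dataclass
class Replica:
    """One copy of the data, e.g. in one Availability Zone (toy model)."""
    data: Dict[str, Versioned] = field(default_factory=dict)

    def read(self, key: str) -> Versioned:
        return self.data.get(key, (0, None))


class QuorumStore:
    """Toy N-replica store: a write is acknowledged once W replicas have it."""

    def __init__(self, n: int = 3, w: int = 2, r: int = 2) -> None:
        if not (1 <= w <= n and 1 <= r <= n):
            raise ValueError("quorum sizes must be between 1 and n")
        if r + w <= n:
            raise ValueError("this sketch requires R + W > N so quorums overlap")
        self.replicas = [Replica() for _ in range(n)]
        self.w, self.r = w, r
        self._version = 0
        self._pending: List[Tuple[int, str, int, str]] = []  # (replica, key, version, value)

    def put(self, key: str, value: str) -> None:
        self._version += 1
        for i, replica in enumerate(self.replicas):
            if i < self.w:
                # Synchronous copies: the write is acknowledged once these W succeed.
                replica.data[key] = (self._version, value)
            else:
                # Remaining copies are brought up to date asynchronously.
                self._pending.append((i, key, self._version, value))

    def sync(self) -> None:
        """Finish background replication to the lagging replicas."""
        for i, key, version, value in self._pending:
            if self.replicas[i].read(key)[0] < version:
                self.replicas[i].data[key] = (version, value)
        self._pending.clear()

    def get(self, key: str, consistent: bool = False) -> Optional[str]:
        if consistent:
            # Read R replicas (here the last R, which overlap the first W)
            # and return the value with the newest version.
            newest = max((rep.read(key) for rep in self.replicas[-self.r:]),
                         key=lambda v: v[0])
            return newest[1]
        # Eventually consistent: ask a single replica, which may lag.
        return self.replicas[-1].read(key)[1]
```

In the real service the choice is made per request: DynamoDB's `GetItem` accepts a `ConsistentRead` flag.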
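The speaker note on starting an EMR cluster (a master group, a core group that lives for the life of the cluster, and optional task instances on Spot) and the diagram on slide 25 map onto the instance-group structure that EMR's API accepts. The sketch below only builds that structure as a Python dict. The group names, the `m3.xlarge` instance type in the test, and the helper name `emr_instance_groups` are illustrative assumptions, and field requirements (for example, whether a Spot `BidPrice` is needed) can vary with the EMR API version.

```python
from typing import Any, Dict, List, Optional


def emr_instance_groups(
    instance_type: str,
    core_count: int,
    task_count: int = 0,
    persistent: bool = False,
    task_bid_price: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an EMR-style `Instances` structure with master, core and task groups."""
    if core_count < 1:
        raise ValueError("this sketch expects at least one core instance")
    groups: List[Dict[str, Any]] = [
        # One master node coordinates work and manages cluster state.
        {"Name": "Master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
         "InstanceType": instance_type, "InstanceCount": 1},
        # Core nodes hold HDFS data (DataNode) and run tasks for the cluster's life.
        {"Name": "Core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
         "InstanceType": instance_type, "InstanceCount": core_count},
    ]
    if task_count > 0:
        # Task nodes only add compute, so they can be resized or run on Spot.
        task: Dict[str, Any] = {"Name": "Task", "InstanceRole": "TASK", "Market": "SPOT",
                                "InstanceType": instance_type, "InstanceCount": task_count}
        if task_bid_price is not None:
            task["BidPrice"] = task_bid_price
        groups.append(task)
    # False means a transient cluster that terminates once its steps complete.
    return {"InstanceGroups": groups, "KeepJobFlowAliveWhenNoSteps": persistent}
```

A dict like this can be passed as the `Instances` argument of boto3's EMR `run_job_flow` call, together with a cluster name, software version, and IAM roles that this sketch leaves out. `persistent=False` corresponds to the transient clusters mentioned on slide 24.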
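Slide 28 prices Kinesis puts at $0.028 per million. The arithmetic below turns a sustained records-per-second rate into a monthly put charge. It assumes one put per record and a 30-day month, and it deliberately ignores every other charge (for example, per-shard-hour fees), so treat it as a partial estimate based on the 2014 figure from the slide; `monthly_put_cost` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86_400


def monthly_put_cost(records_per_second: float,
                     price_per_million_puts: float = 0.028,
                     days: int = 30) -> float:
    """Put charge in dollars for a steady rate, assuming one put per record."""
    puts = records_per_second * SECONDS_PER_DAY * days
    return puts / 1_000_000 * price_per_million_puts


# 1,000 records/s for 30 days = 2.592 billion puts, about $72.58 at $0.028/million.
print(f"${monthly_put_cost(1_000):.2f}")
```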
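Slide 28 also lists sliding-window analytics and real-time alarms among the readers of a Kinesis stream. The sketch below is a generic sliding-window sum over an ordered event sequence, with no Kinesis client involved. `sliding_window_alarms`, the `(timestamp, value)` event shape, and the threshold rule are assumptions for illustration, and events are assumed to arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_window_alarms(events: Iterable[Tuple[float, float]],
                          window_seconds: float,
                          threshold: float) -> Iterator[Tuple[float, float]]:
    """Yield (timestamp, window_sum) whenever the sum of values in the
    trailing window (timestamp - window_seconds, timestamp] exceeds threshold."""
    window: Deque[Tuple[float, float]] = deque()
    total = 0.0
    for ts, value in events:
        window.append((ts, value))
        total += value
        # Evict events that have slid out of the trailing window.
        while window and window[0][0] <= ts - window_seconds:
            total -= window.popleft()[1]
        if total > threshold:
            yield ts, total
```

Because each event is appended once and evicted at most once, the work per event is constant on average, which is what makes this kind of rolling aggregate practical for real-time dashboards.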
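The speaker note on variety says data is handled well if it fits into 40k and otherwise needs some extra lifting. One common form of that lifting is splitting an oversized payload into numbered parts and reassembling them on the reading side. The sketch below assumes the limit means 40 × 1024 bytes of payload per record (the note does not name the service or the exact unit). The `Chunk` type and both function names are hypothetical, and a real producer would also have to fit any part metadata inside the service's own record limit.

```python
from dataclasses import dataclass
from typing import List

MAX_BYTES = 40 * 1024  # assumption: "40k" read as 40 KiB of payload


@dataclass(frozen=True)
class Chunk:
    record_id: str
    part: int    # 0-based position of this piece
    total: int   # number of pieces the record was split into
    data: bytes


def split_record(record_id: str, payload: bytes, max_bytes: int = MAX_BYTES) -> List[Chunk]:
    """Split a payload into pieces no larger than max_bytes."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    pieces = [payload[i:i + max_bytes] for i in range(0, len(payload), max_bytes)] or [b""]
    return [Chunk(record_id, n, len(pieces), piece) for n, piece in enumerate(pieces)]


def reassemble(chunks: List[Chunk]) -> bytes:
    """Rebuild the payload; chunks may arrive in any order."""
    ordered = sorted(chunks, key=lambda c: c.part)
    if not ordered or [c.part for c in ordered] != list(range(ordered[0].total)):
        raise ValueError("missing or duplicate parts")
    if len({c.record_id for c in ordered}) != 1:
        raise ValueError("chunks belong to different records")
    return b"".join(c.data for c in ordered)
```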
