AWS as a Data Platform - AWS Symposium 2014 - Washington D.C.

Come hear about the services that AWS provides to manage data and when to use which tools to manage data appropriately. You will learn about both data movement and coordination, as well as data storage and analysis, including when to use relational and NoSQL approaches, Hadoop, and data warehousing. This session will highlight how AWS data services have helped real-world customers.

  • Obama for America: In the system, Ruby on Rails (RoR), Python/Django, PHP, and a host of other front- and mid-tier technologies intermingled, creating a robust heterogeneous design. Below that, the use of 10 different structured storage systems reflected a focus on bringing tools suited to the data itself. Intermingling technologies included relational database management services like Amazon Relational Database Service (Amazon RDS) for MySQL, PostgreSQL, and Microsoft SQL Server; NoSQL software like MongoDB, Apache Hadoop, Vertica, and LevelDB; and Amazon S3, Amazon DynamoDB, and Amazon SimpleDB.
  • DynamoDB latency is under 10 ms and highly consistent. Most importantly, data in DynamoDB is durable: it is constantly replicated across multiple data centers and persisted to SSD storage.
  • More context – MongoDB, Cassandra.
    Variety – can process many different data types, custom SerDes, etc.
    Velocity – certain packages that run on Hadoop help with real-time data ingestion, such as Flume, Storm, Kafka, and Spark Streaming.
    Volume – designed to work on massive data sets.
  • Start an EMR cluster using the console or CLI tools.
    A master instance group is created that controls the cluster.
    A core instance group is created for the life of the cluster.
    Core instances run the DataNode and TaskTracker daemons.
    Optional task instances can be added or removed to perform work (Spot).
    S3 can be used as the underlying "file system" for input/output data.
    The master node coordinates distribution of work and manages cluster state.
    Core and task instances read from and write to S3.
  • Volume – pretty high.
    Velocity – very high.
    Variety – good if it fits into 40k; otherwise you need to do some lifting.

AWS as a Data Platform - AWS Symposium 2014 - Washington D.C. Presentation Transcript

  • AWS Government, Education, and Nonprofits Symposium Washington, DC | June 24, 2014 - June 26, 2014
    AWS as a Data Platform
    Chris Keyser, ckeyser@amazon.com
  • Why AWS? Lower costs; ease of use.
  • Lower costs: no capital investment; pay as you go; no subscriptions; only pay for what you use.
  • Ease of use: programmable; zero admin; easy to configure; integrate with existing tools.
  • One tool to rule them all
  • Use the right tools
  • Movement and Coordination: Data Pipeline, Direct Connect, Storage Gateway, Import / Export
  • Storage and Analysis Services
      – Instance Storage: EC2, EBS
      – SQL Stores: Redshift, RDS
      – Hadoop: EMR
      – NoSQL: DynamoDB
      – Stream: Kinesis
      – Search: CloudSearch
      – Storage Services: S3, CloudFront, Glacier
  • Movement and Coordination
  • Movement and Coordination - Plumbing
      – Direct Connect: dedicated network pipes
      – Storage Gateway: storage backup & archiving
      – Import / Export: ship us your disks
  • Movement and Coordination - Orchestration: AWS Data Pipeline
      – Resource management
      – Scheduling, execution, and retry
      – Dependency tracking
      – Failure notification
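The orchestration features listed for AWS Data Pipeline (dependency tracking, retry, failure notification) can be illustrated with a toy scheduler. This is only a sketch of the general idea, not how Data Pipeline is implemented or configured: `run_pipeline`, its parameters, and the task names in the usage note are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(tasks, deps, max_retries=2, notify=print):
    """Run tasks in dependency order, retrying failures.

    tasks: dict mapping task name -> zero-argument callable
    deps:  dict mapping task name -> set of task names that must finish first
    Returns the list of task names that completed, in execution order.
    """
    graph = {name: deps.get(name, set()) for name in tasks}
    completed = []
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, max_retries + 2):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception as exc:
                if attempt == max_retries + 1:
                    # Out of retries: send a failure notification and stop,
                    # because downstream tasks depend on this one.
                    notify(f"task {name!r} failed after {attempt} attempts: {exc}")
                    return completed
    return completed
```

For example, with hypothetical tasks `copy`, `load`, and `report`, where `load` depends on `copy` and `report` depends on `load`, the tasks run in that order; a task that keeps failing stops the run and triggers one notification.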
  • Data Storage and Analysis
  • Storage Services – Object Store: Amazon S3
      – > 1.5 million peak requests/sec
      – Designed for 99.999999999% durability
      – Trillions of objects
      – Stores anything
      – Lifecycle and versioning
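To put the 99.999999999% durability design target in perspective, the short calculation below estimates the expected number of objects lost per year. It is a back-of-the-envelope sketch that assumes the figure is an annual, per-object probability and that losses are independent; the function names are illustrative only.

```python
from fractions import Fraction

# 99.999999999% ("eleven nines"), taken from the slide.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_loss(object_count, durability=DURABILITY):
    """Expected number of objects lost per year (assumes independent losses)."""
    return object_count * (1 - durability)


def years_per_lost_object(object_count, durability=DURABILITY):
    """Average number of years between single-object losses."""
    return 1 / expected_annual_loss(object_count, durability)


print(years_per_lost_object(10_000_000))  # prints 10000
```

Under these assumptions, storing 10 million objects corresponds to losing one object roughly every 10,000 years on average.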
  • Storage Services - Archive Storage: Amazon Glacier
      – Low cost, durable archiving
      – "Cold storage"
      – Infrequently accessed data
      – Integrated S3 lifecycle policies
  • Storage Services – Edge Caching: Amazon CloudFront
      – Simple to use with global footprint
      – Streaming support
      – Large file distribution
      – Private content
      – S3, EC2 and ELB integration
      – Geo restrictions
  • Instance Storage - Options (Amazon EC2)
      – Ephemeral storage ("local"): you manage backup/restore; high-storage instances available:
          i2.8xlarge – 6.4 TB SSD (350K IOPS)
          hs1.8xlarge – 48 TB disk storage
      – Elastic Block Store: "network attached storage"; snapshot, encryption; provisioned throughput (IOPS)
  • Instance Storage - Build Your Own (Amazon EC2): NFS, MongoDB, Cassandra, GraphLab, Titan, Kafka, Lustre, Gluster, Flume, Scribe, Presto …and more
  • SQL Stores - Managed Relational DB: Amazon RDS
      – MySQL, Oracle, SQL Server, Postgres
      – Backup/restore, high availability
      – Push-button scalability
      – Up to 3 TB and 30K IOPS
  • SQL Stores - Petabyte Data Warehouse: Amazon Redshift
      – Relational data warehouse
      – Massively parallel
      – Petabyte scale
      – Fully managed
      – $1,000/TB/year
  • SQL Stores - Amazon Redshift Architecture
      – Leader node: SQL endpoint; stores metadata; coordinates query execution
      – Compute nodes: local, columnar storage; execute queries in parallel; backup and restore via S3; parallel load from S3, EMR, or DynamoDB
      – HW optimized for data processing: DW1: 2 TB – 1.6 PB magnetic; DW2: 160 GB – 256 TB SSD
      – Diagram labels: 10 GigE (HPC), Ingestion, Backup, Restore, JDBC/ODBC
  • NoSQL – Dial Up Capacity: Amazon DynamoDB
      – NoSQL database
      – Seamless scalability
      – Zero admin
      – Single-digit millisecond latency
  • NoSQL - Durable Low Latency at Scale
      – Writes: continuously replicated to 3 AZs; quorum acknowledgment; persisted to disk (custom SSD)
      – Reads: strongly or eventually consistent; no trade-off in latency
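The idea of quorum acknowledgment across three replicas can be sketched as follows. This is a generic majority-quorum illustration, not DynamoDB's actual replication protocol; the `Replica` class and `quorum_write` function are hypothetical names introduced for the example.

```python
class Replica:
    """A toy in-memory replica standing in for one Availability Zone."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.store = {}

    def write(self, key, value):
        """Persist the value and return True, or return False if unavailable."""
        if not self.healthy:
            return False
        self.store[key] = value
        return True


def quorum_write(replicas, key, value):
    """Acknowledge the write only if a majority of replicas persisted it."""
    quorum = len(replicas) // 2 + 1  # 2 of 3 for three replicas
    acks = sum(1 for replica in replicas if replica.write(key, value))
    return acks >= quorum
```

With three replicas, a write is acknowledged when at least two of them persist it, so the loss of a single replica does not block writes.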
  • Hadoop – On Demand: Amazon Elastic MapReduce
      – Hive, Impala, Spark, Pig, MapReduce
      – Easy to use; fully managed
      – On-demand and spot pricing
      – Persistent and transient clusters
      – Deep integration with S3
  • Hadoop – Tuned for AWS
      – Cluster: master instance group; core instance group (HDFS); task instance group
      – Integrated stores: HDFS, Amazon S3, Amazon Redshift, Amazon DynamoDB
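The master, core, and task instance groups of an EMR cluster can be described as plain data. The sketch below only builds such a description and does not call AWS. The key names are assumed to mirror the instance-group shape used by the EMR API, the choice of Spot for the task group is an assumption based on the optional, removable nature of task instances, and the instance type `m3.xlarge` is an arbitrary placeholder.

```python
def emr_instance_groups(core_count, task_count=0, instance_type="m3.xlarge"):
    """Describe the instance groups for a hypothetical EMR cluster.

    Key names are an assumption modeled on the EMR instance-group request
    shape; nothing here is sent to AWS.
    """
    groups = [
        {   # One master instance controls the cluster.
            "Name": "master",
            "InstanceRole": "MASTER",
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": 1,
        },
        {   # Core instances live for the life of the cluster and hold HDFS.
            "Name": "core",
            "InstanceRole": "CORE",
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        },
    ]
    if task_count > 0:
        # Task instances are optional and can come and go, so Spot fits here.
        groups.append({
            "Name": "task",
            "InstanceRole": "TASK",
            "Market": "SPOT",
            "InstanceType": instance_type,
            "InstanceCount": task_count,
        })
    return groups
```

Keeping the core group on-demand and pushing only the task group to Spot reflects the fact that core instances hold HDFS data, while task instances only perform work.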
  • Streaming - at Scale: Amazon Kinesis
      – Real-time data collection
      – Seamlessly scale to gigabytes/s
      – Low cost managed service
      – EMR integration
  • Streaming - Amazon Kinesis Architecture
      – Millions of sources producing 100s of terabytes per hour
      – Front end: authentication and authorization
      – Durable, highly consistent storage replicates data across three data centers (Availability Zones)
      – Ordered stream of events supports multiple readers
      – Inexpensive: $0.028 per million puts
      – Consumers: aggregate analysis in Hadoop or a data warehouse; machine learning algorithms or sliding window analytics; real-time dashboards and alarms; aggregate and archive to S3
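The quoted price of $0.028 per million puts can be turned into a rough monthly estimate. The sketch below covers put charges only, assumes a steady put rate and a 30-day month, and uses the 2014 figure from the slide; the function name is illustrative.

```python
PRICE_PER_MILLION_PUTS = 0.028  # USD, figure quoted on the slide (2014)
SECONDS_PER_DAY = 86_400


def monthly_put_cost(puts_per_second, days=30):
    """Estimated cost in USD of put requests at a steady rate.

    Only put charges are modeled; any other charges are out of scope.
    """
    total_puts = puts_per_second * SECONDS_PER_DAY * days
    return total_puts / 1_000_000 * PRICE_PER_MILLION_PUTS


print(round(monthly_put_cost(1_000), 2))  # prints 72.58
```

At a sustained 1,000 puts per second, that is 2,592 million puts in 30 days, or about $72.58 in put charges.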
  • Search – Made Simple: Amazon CloudSearch
      – Fully managed search engine
      – Simple to operate
      – Highly available
      – User configurable scaling
      – Advanced feature support: 34 languages, algorithmic stemming, geospatial search, faceted search, suggestions, highlighting, field weighting, …
  • The right tool. At the right time. At the right scale.
  • Thank You
    Chris Keyser, ckeyser@amazon.com