© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
BDM205
Big Data Mini Con
State of the Union
Roger Barga, AWS
November 29, 2016
What is Big Data?
When your data sets become so large and complex
that you have to start innovating around how to
collect, store, process, analyze, and share them.
Big Data services on AWS
Collect: AWS Import/Export, AWS Direct Connect
Store: Amazon S3, Amazon Glacier
Process & Analyze: Amazon EMR, Amazon EC2
Other services shown: Amazon Kinesis, Amazon Machine Learning, Amazon Redshift, Amazon DynamoDB, Amazon Kinesis Analytics, Amazon QuickSight, AWS Database Migration Service, AWS Data Pipeline, Amazon RDS, Aurora, Amazon Elasticsearch Service
Store anything
Object storage
Highly scalable
99.999999999% durability
Amazon S3
Collection and storage
Petabyte-scale data transfer service that
uses Amazon-provided storage devices for
transport.
Copy up to 80TB of data from an on-premises file
system to the Snowball through a 10Gbps
network interface
All data is encrypted with 256-bit GCM
encryption
AWS
Import/Export
Snowball
Collection and storage
E-ink shipping label
Ruggedized
case
“8.5G Impact”
50TB & 80TB
10G network
Relational data warehouse
Massively parallel; Petabyte scale
Fully managed
HDD and SSD Platforms
$1,000/TB/Year; start at $0.25/hour
Amazon Redshift
Structured data processing
Hadoop as a service
Spark, Presto, Flink, HBase, Hive, etc.
Easy to use; fully managed
On-demand and Spot pricing
HDFS & S3 file systems
Amazon EMR
Semi-structured / unstructured data processing
Distributed search and analytics engine
Managed service using Elasticsearch and Kibana
Fully managed - zero admin
Highly available and reliable
Tightly integrated with other AWS services
Amazon Elasticsearch Service
Semi-structured / unstructured data processing
Serverless compute service that runs your
code in response to events.
Extend AWS services with user-defined
custom logic.
Pay only for the requests served and
compute time required - billing in
increments of 100 milliseconds
AWS Lambda
Serverless event processing
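As a minimal sketch of "code that runs in response to events", the Python function below has the shape of a Lambda handler subscribed to a Kinesis stream. It relies on the Kinesis event format, in which each record carries its payload base64-encoded under `kinesis.data`; the `type` field and the counting logic are assumptions made for illustration, not something taken from the talk.

```python
import base64
import json


def handler(event, context):
    """Decode each Kinesis record in the batch and count events by type."""
    counts = {}
    for record in event["Records"]:
        # Kinesis delivers each payload base64-encoded under record["kinesis"]["data"].
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        event_type = payload.get("type", "unknown")  # hypothetical field name
        counts[event_type] = counts.get(event_type, 0) + 1
    return counts
```

The function holds no state between invocations, which is what lets the service run as many copies as the incoming event rate requires.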
Streams: Build your own custom application to
process streaming data using Amazon Kinesis Client
Library. Connectors to S3, DynamoDB, Lambda,
Amazon Redshift, Elasticsearch, Storm spout,…
Firehose: Load massive volumes of streaming data
into S3, Amazon Redshift, Elasticsearch. Inline
processing using Lambda and library of ready to use
templates.
Analytics: Analyze streaming data using standard
SQL, no servers to manage, elastically scale, pay as
you go.
Amazon
Kinesis
Streaming data processing
Fast, powered by SPICE, automatically scales.
Explore, analyze, share insights with anyone.
1/10th the cost of traditional BI solutions.
Broad connectivity with AWS data services,
on-premises data, files and business applications.
Amazon
QuickSight
Visualize and explore
Amazon RDS
Amazon S3 Amazon Redshift
Putting it together
Scale
Scale as your data and business grows
The volume, variety, and velocity at which data is being generated are leaving
organizations with new questions to answer, such as:
Store and analyze all your data, structured and unstructured
from all of your sources, in one centralized location at low cost.
Quickly ingest data without needing to force it into a
pre-defined schema, enabling ad-hoc analysis by applying
schemas on read, not write.
Separating your storage and compute allows you to scale each
component as required, attach multiple data processing and
analytics services to the same data set.
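Schema-on-read can be illustrated with a small Python sketch: the raw records are stored untouched, and each consumer applies its own schema when it reads them. The sample records, field names, and the `read_with_schema` helper are all hypothetical and only stand in for files in a data lake.

```python
import json

# Raw records are kept exactly as they arrived; no schema was enforced on write.
RAW_LINES = [
    '{"user": "u1", "amount": "12.50", "currency": "GBP"}',
    '{"user": "u2", "amount": "7", "note": "no currency field"}',
]


def read_with_schema(lines, schema):
    """Apply a schema (field -> (converter, default)) while reading raw lines."""
    for line in lines:
        record = json.loads(line)
        yield {
            field: convert(record[field]) if field in record else default
            for field, (convert, default) in schema.items()
        }


# Two consumers read the same raw data through different schemas.
donation_schema = {"user": (str, None), "amount": (float, 0.0)}
audit_schema = {"user": (str, None), "currency": (str, "unknown")}

donations = list(read_with_schema(RAW_LINES, donation_schema))
audit_rows = list(read_with_schema(RAW_LINES, audit_schema))
```

Because neither schema was needed at ingest time, a third consumer can later read the `note` field without reloading or migrating anything.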
Scale
S3 Data Lake
Implementing a Data Lake on AWS
Elasticsearch
Starting small is powerful, when you can scale up fast
Scaling up your analytics systems | With AWS | Traditional IT *
Get a new BI server | 20 minutes | 3 months
Upgrade your analytics server to the newest Intel processors and add 16GB memory | 10 minutes | 2 months
Add 500TB of storage | instant | 2 months
Grow a DWH cluster from 8GB to 1PB | 1 hour | 8 months
Build a 1024-node Hadoop cluster | 30 minutes | unlikely
Roll out multi-region production environment | hours | months
* actual provisioning times in a well-organized IT division
Netflix: Using Amazon S3 as the fabric of our big data ecosystem
Tuesday, Nov. 29
5:30pm – 6:30pm
Mirage, St. Croix B
Putting it together
Cost
Putting it together: cost
How much would it cost to process the Twitter fire hose?
S3: $0.025/GB-Mo
Redshift: Starts at $0.25/hour
EC2: Starts at $0.02/hour
Glacier: $0.007/GB-Mo
Kinesis: $0.015/hour per shard (1MB/s in,
2MB/s out); $0.014/million PUTs
500MM tweets/day = ~ 5,800 tweets/sec
2k/tweet is ~12MB/sec (~1TB/day)
$0.015/hour per shard, $0.014/million PUTS
Amazon Kinesis cost is $0.47/hour
Amazon Redshift cost is $0.850/hour (for a 2TB node)
S3 cost is $1.02/hour (no compression)
Total: $2.34/hour – on demand
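The arithmetic on this slide can be reproduced with a rough Python sketch. The Redshift and S3 hourly figures are taken from the slide as given; the sketch assumes "2k" means 2,000 bytes, 1 MB means 10^6 bytes, one PUT per tweet, and one shard per 1 MB/s of ingest. All variable names are illustrative.

```python
import math

TWEETS_PER_DAY = 500_000_000
BYTES_PER_TWEET = 2_000          # "2k/tweet", assumed to mean 2,000 bytes
SHARD_PRICE_PER_HOUR = 0.015     # USD per shard-hour (from the slide)
PUT_PRICE_PER_MILLION = 0.014    # USD per million PUTs (from the slide)
REDSHIFT_PER_HOUR = 0.85         # taken from the slide as given
S3_PER_HOUR = 1.02               # taken from the slide as given

tweets_per_sec = TWEETS_PER_DAY / 86_400
mb_per_sec = tweets_per_sec * BYTES_PER_TWEET / 1_000_000
shards = math.ceil(mb_per_sec)           # 1 MB/s of ingest per shard
puts_per_hour = tweets_per_sec * 3_600   # assumes one PUT per tweet

kinesis_per_hour = (
    shards * SHARD_PRICE_PER_HOUR
    + puts_per_hour / 1_000_000 * PUT_PRICE_PER_MILLION
)
total_per_hour = kinesis_per_hour + REDSHIFT_PER_HOUR + S3_PER_HOUR

print(f"{tweets_per_sec:,.0f} tweets/sec, {mb_per_sec:.1f} MB/sec, {shards} shards")
print(f"Kinesis ${kinesis_per_hour:.2f}/hour, total ${total_per_hour:.2f}/hour")
```

Under these assumptions the sketch arrives at 12 shards, about $0.47/hour for Kinesis, and $2.34/hour in total, which matches the slide. Most of the Kinesis figure comes from PUT charges (roughly 20.8 million PUTs per hour), not from shard-hours.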
Cost
Use only the services you need
Scale only the services you need
Pay for only what you use
Discounts through Reserved Instances, instance
types including Spot, and upfront commitments.
Cost
Putting it together
Scale and security
Putting it together: scale and security
FINRA: Monitor and enforce trading regulations
FINRA handles approximately
75 billion market events every
day to build a holistic picture of
trading in the U.S. Hundreds of
surveillance algorithms run against
massive amounts of data.
FINRA mission
• Deter misconduct by enforcing the rules
• Detect and prevent wrongdoing in US markets
• Discipline those who break the rules
Scale brings unique challenges
• Market volumes are volatile and increasing
• Exchanges are dynamically evolving
• Regulatory rules are created and enhanced
• New securities products are introduced
• Market manipulators innovate
Petabytes of data generated on
premises, brought to AWS, and
stored in an S3 data lake.
Thousands of analytical queries
performed on EMR and Redshift. Over
400 analytics packages.
Stringent security requirements met by
leveraging VPC, VPN, encryption at
rest and in transit, AWS CloudTrail, and
database auditing.
Platform that adapts to market dynamics
[Diagram labels: Web Applications (Analysts; Regulators); Flexible Interactive Queries; Predefined Queries; Surveillance Analytics; Data Management; Data Movement; Data Registration; Version Management; Amazon EMR; Amazon Redshift; Amazon S3]
Store an exabyte of data or more in S3
Analyze GB to PB using standard tools
Encryption of all data at each step
Auditability of all APIs and retrievals
Control egress and ingress points using VPCs
Scale and
security
FINRA: Building a Secure Data Science Platform on AWS
Tuesday, Nov. 29
4:00pm – 5:00pm
Mirage, St. Croix B
Putting it together
Agility and actionable insights
Actionable insights
Demonstration
http://amzn.to/bigdata
Access from a mobile device…
What item most interests you this week?
What item will be the most difficult to explain to
your significant other when you return home?
What will give you the biggest headache this week?
New Amazon Web Services | Blackjack
Networking with Peers | re:Play Party
What item most interests you this week?
What are your colleagues most interested in hearing
about when you return next week?
What will give you the biggest headache this week?
New Amazon Web Services | Blackjack
Networking with Peers | re:Play Party
Demo architecture
[Diagram labels: Amazon Cognito; S3 Bucket (HTML, JavaScript); Kinesis Ingestion Stream; Kinesis Analytics; Kinesis Aggregate Stream; Lambda Function; DynamoDB Table; Raw Device and Quadrant Data; Aggregated Data]
Query shown in the diagram:
SELECT ROWTIME, userId, COUNT(*)
FROM STREAM
GROUP BY userId, FLOOR(ROWTIME TO SECOND)
The demo application
-- Output stream: one row of counts per one-second window.
CREATE OR REPLACE STREAM DESTINATION_SQL_STREAM (
    UNIQUE_USER_COUNT INT, ANDROID_COUNT INT, IOS_COUNT INT, WINDOWS_PHONE_COUNT INT, OTHER_OS_COUNT INT,
    QUADRANT_A_COUNT INT, QUADRANT_B_COUNT INT, QUADRANT_C_COUNT INT, QUADRANT_D_COUNT INT,
    WINDOW_TIME TIMESTAMP);

-- Intermediate stream: de-duplicated user records.
CREATE OR REPLACE STREAM DISTINCT_USER_STREAM (
    COGNITO_ID VARCHAR(64), DEVICE VARCHAR(32), OS VARCHAR(32), QUADRANT CHAR(1), DT TIMESTAMP);

-- Pump 1: keep one row per distinct (user, device, OS, quadrant) in each second.
CREATE OR REPLACE PUMP "DISTINCT_USER_PUMP" AS
INSERT INTO "DISTINCT_USER_STREAM"
SELECT STREAM DISTINCT
    "cognitoId",
    "device",
    "os",
    "quadrant",
    FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO SECOND)
FROM "SOURCE_SQL_STREAM_001";

-- Pump 2: count the de-duplicated rows per second, overall and by OS and quadrant.
-- COUNT ignores NULLs, so each CASE counts only the rows that match its condition.
CREATE OR REPLACE PUMP "OUTPUT_PUMP" AS
INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM
    COUNT("DISTINCT_USER_STREAM".COGNITO_ID) AS UNIQUE_USER_COUNT,
    COUNT((CASE WHEN "DISTINCT_USER_STREAM".OS = 'Android' THEN COGNITO_ID ELSE null END)) AS ANDROID_COUNT,
    COUNT((CASE WHEN "DISTINCT_USER_STREAM".OS = 'iOS' THEN COGNITO_ID ELSE null END)) AS IOS_COUNT,
    COUNT((CASE WHEN "DISTINCT_USER_STREAM".OS = 'Windows Phone' THEN COGNITO_ID ELSE null END)) AS WINDOWS_PHONE_COUNT,
    COUNT((CASE WHEN "DISTINCT_USER_STREAM".OS = 'other' THEN COGNITO_ID ELSE null END)) AS OTHER_OS_COUNT,
    COUNT((CASE WHEN "DISTINCT_USER_STREAM".QUADRANT = 'A' THEN COGNITO_ID ELSE null END)) AS QUADRANT_A_COUNT,
    COUNT((CASE WHEN "DISTINCT_USER_STREAM".QUADRANT = 'B' THEN COGNITO_ID ELSE null END)) AS QUADRANT_B_COUNT,
    COUNT((CASE WHEN "DISTINCT_USER_STREAM".QUADRANT = 'C' THEN COGNITO_ID ELSE null END)) AS QUADRANT_C_COUNT,
    COUNT((CASE WHEN "DISTINCT_USER_STREAM".QUADRANT = 'D' THEN COGNITO_ID ELSE null END)) AS QUADRANT_D_COUNT,
    ROWTIME
FROM "DISTINCT_USER_STREAM"
GROUP BY
    FLOOR("DISTINCT_USER_STREAM".ROWTIME TO SECOND);
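For readers less familiar with streaming SQL, the two pumps can be approximated in plain Python as a batch computation: first de-duplicate records within each one-second window, then count the surviving rows overall, per OS, and per quadrant. This is only an illustrative sketch of the logic, not how Kinesis Analytics executes it; the `Event` type, its `ts` field, and the output key names are assumptions.

```python
from collections import Counter, defaultdict
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    cognito_id: str
    device: str
    os: str
    quadrant: str
    ts: float  # arrival time in seconds (hypothetical stand-in for ROWTIME)


def aggregate_per_second(events: Iterable[Event]) -> dict:
    """De-duplicate events per one-second window, then count them."""
    # Step 1 (like DISTINCT_USER_PUMP): one row per distinct
    # (cognito_id, device, os, quadrant) within each second.
    distinct = defaultdict(set)
    for e in events:
        distinct[int(e.ts)].add((e.cognito_id, e.device, e.os, e.quadrant))

    # Step 2 (like OUTPUT_PUMP): per window, count rows overall, per OS, per quadrant.
    result = {}
    for second, rows in distinct.items():
        counts = Counter()
        for _cid, _device, os_name, quadrant in rows:
            counts["UNIQUE_USER_COUNT"] += 1
            counts[f"OS_{os_name}"] += 1
            counts[f"QUADRANT_{quadrant}"] += 1
        result[second] = counts
    return result
```

As in the SQL, a user who reports two different quadrants within the same second contributes two rows, so the "unique user" count is really a count of distinct rows per window.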
Big data does not mean just batch
• Can be streamed in
• Processed in real time
• Can be used to respond quickly to requests and
actionable events, and to generate business value
You can mix and match
• On-premises and cloud
• Custom development and managed services
Agility
& actionable
insights
Putting it together
Choice and selection
1-click deployment to launch, in
multiple regions around the world
Pay-as-you-go pricing with no long-term
contracts required
2,000+ product listings to browse,
test, and buy software; 290
specific to big data.
Advanced Analytics
Database and Data Enablement
Business Intelligence
Putting it together: choice and selection
AWS Marketplace: Software store with simplified procurement
Largest ecosystem of ISVs & integrators
Tens of thousands of consulting and technology partners
We have a retail mindset
Use our managed big data services
Build or bring your own
Or access thousands in our marketplace
Each customer decides for themselves
Choice &
selection
Richard T. Freeman, Ph.D., Lead Data Engineer and Architect, JustGiving
November 29, 2016
JustGiving:
Event-Driven Data Platform
BDM205
We are
A tech-for-good platform for
events-based fundraising,
charities, and crowdfunding
“Ensure no good cause
goes unfunded”
• The #1 platform for online
social giving in the world
• Peaks in traffic: Ice bucket,
natural disasters
• Raised $4.2bn in donations
• 28.5m users
• 196 countries
• 27,000 good causes
• GiveGraph
• 91 million nodes
• 0.53 billion relationships
Fundraising page
Our requirements
• Limitation in existing SQL Server data warehouse
• Long-running and complex queries for data scientists
• New data sources: API, clickstream, unstructured, log, behavioral
data, etc.
• Easy to add data sources and pipelines
• Reduce time spent on data preparation and experiments
[Diagram labels: Machine learning; Graph processing; Natural language processing; Stream processing; Data ingestion; Data preparation; Automated Pipelines; Insight; Predictions; Measure; Recommendations; Data-driven]
Event-driven data platform at JustGiving [1 of 2]
• JustGiving developed in-house analytics and data science
platform in AWS called RAVEN.
• Reporting, Analytics, Visualization, Experimental, Networks
• Uses event-driven and serverless pipelines rather than
workflows or DAGs
• Messaging, queues, pub/sub patterns
• Separate storage from compute
• Supports scalable, event-driven
• ETL / ELT
• Machine learning
• Natural language processing
• Graph processing
• Allows users to consume raw tables, data blocks, metrics,
KPIs, insight, reports etc.
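The contrast between event-driven pipelines and scheduled workflows or DAGs can be sketched with a tiny in-process publish/subscribe bus in Python. This is not RAVEN's implementation; the `EventBus` class, topic names, and handlers are hypothetical, and in a managed setup the bus would be a messaging or queueing service with the handlers running as serverless functions.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Tiny in-process publish/subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Each subscriber reacts independently; there is no central scheduler or DAG.
        for handler in list(self._subscribers[topic]):
            handler(event)


bus = EventBus()
loaded = []


def on_file_arrived(event: dict) -> None:
    # Hypothetical ETL step: "load" the file, then announce the result.
    bus.publish("table.loaded", {"table": event["key"].removesuffix(".csv")})


def on_table_loaded(event: dict) -> None:
    # Hypothetical downstream step, for example refreshing a metric.
    loaded.append(event["table"])


bus.subscribe("file.arrived", on_file_arrived)
bus.subscribe("table.loaded", on_table_loaded)
bus.publish("file.arrived", {"key": "donations.csv"})
```

Adding a new pipeline stage means subscribing another handler to an existing topic; nothing upstream has to be edited or rescheduled.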
Event-driven data platform at JustGiving [2 of 2]
Serverless streaming analytics and persist stream
The outcome
• Ingest full clickstream
• Near real-time streaming analytics
• Persist streams to Amazon S3 and Amazon Redshift
Amazon Kinesis
• AWS managed services
• Event-driven and serverless
• Scale out and automate complex queries
• Improved productivity
• Data-driven: Measure, insight, predict, recommend
RAVEN platform:
scalable event-driven data platform in AWS
Thank you!
“Ensure no good cause goes unfunded”
Contact:
https://linkedin.com/in/drfreeman
BDM303 - JustGiving: Serverless Data Pipelines, Event-Driven ETL, and Stream Processing
Tuesday 2:30 PM - 3:30 PM
Wednesday, 3:30 PM - 4:30 PM [repeat]
Proven customer success
The vast majority of big data use cases deployed in the cloud
today run on AWS.
Big Data Mini Con sessions
Rooms: Mirage, Bermuda A | Mirage, St. Croix B | Mirage, Event Center B | Mirage, Barbados A
1:00 PM
Bermuda A: Beeswax: Building a Real-Time Streaming Data Platform on AWS
St. Croix B: Big Data Architectural Patterns and Best Practices on AWS
Event Center B: Deep Dive: Amazon EMR Best Practices & Design Patterns
Barbados A: Workshop: Building Your First Big Data Application with AWS
2:30 PM
Bermuda A: JustGiving: Serverless Data Pipelines, Event-Driven ETL, and Stream Processing
St. Croix B: Best Practices for Apache Spark on Amazon EMR
Event Center B: Understanding IoT Data: How to Leverage Amazon Kinesis in Building an IoT Analytics Platform on AWS
4:00 PM
Bermuda A: Analyzing Streaming Data in Real-time with Amazon Kinesis Analytics
St. Croix B: FINRA: Building a Secure Data Science Platform on AWS
Event Center B: Best Practices for Data Warehousing with Amazon Redshift
Barbados A: Workshop: Building Your First Big Data Application with AWS
5:30 PM
Bermuda A: Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service and Kibana
St. Croix B: Netflix: Using Amazon S3 as the fabric of our big data ecosystem
Event Center B: Visualizing Big Data Insights with Amazon QuickSight
Plus, repeats for many sessions throughout the week!
Get started with Big Data on AWS
aws.amazon.com/big-data
Self-paced Online Labs
Big Data Quest
Learn at your own pace and practice working with AWS
services for big data on QwikLABS. (3 Hours | Online)
qwiklabs.com/quests/1
AWS Courses
Big Data on AWS
How to use AWS services to process data with Hadoop &
create big data environments (3 Days | Classroom)
aws.amazon.com/training/course-descriptions/bigdata/
Big Data Technology Fundamentals (FREE!)
Overview of AWS big data solutions for architects or data
scientists new to big data. (3 Hours | Online)
Remember to complete
your evaluations!
Thank you!

More Related Content

What's hot

Getting started with Amazon Kinesis
Getting started with Amazon KinesisGetting started with Amazon Kinesis
Getting started with Amazon KinesisAmazon Web Services
 
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesAmazon Web Services
 
Getting Started with Amazon Kinesis
Getting Started with Amazon KinesisGetting Started with Amazon Kinesis
Getting Started with Amazon KinesisAmazon Web Services
 
ENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersAmazon Web Services
 
AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...
AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...
AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...Amazon Web Services
 
Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
Deep Dive on Object Storage: Amazon S3 and Amazon GlacierDeep Dive on Object Storage: Amazon S3 and Amazon Glacier
Deep Dive on Object Storage: Amazon S3 and Amazon GlacierAdrian Hornsby
 
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)Amazon Web Services
 
AWS Data Transfer Services: Data Ingest Strategies Into the AWS Cloud
AWS Data Transfer Services: Data Ingest Strategies Into the AWS CloudAWS Data Transfer Services: Data Ingest Strategies Into the AWS Cloud
AWS Data Transfer Services: Data Ingest Strategies Into the AWS CloudAmazon Web Services
 
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Amazon Web Services
 
AWS re:Invent 2016: Case Study: How Startups like Mapbox, Ring, Hudl, and Oth...
AWS re:Invent 2016: Case Study: How Startups like Mapbox, Ring, Hudl, and Oth...AWS re:Invent 2016: Case Study: How Startups like Mapbox, Ring, Hudl, and Oth...
AWS re:Invent 2016: Case Study: How Startups like Mapbox, Ring, Hudl, and Oth...Amazon Web Services
 
Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...
Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...
Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...Amazon Web Services
 
Convert and Migrate Your NoSQL Database or Data Warehouse to AWS - July 2017
Convert and Migrate Your NoSQL Database or Data Warehouse to AWS - July 2017Convert and Migrate Your NoSQL Database or Data Warehouse to AWS - July 2017
Convert and Migrate Your NoSQL Database or Data Warehouse to AWS - July 2017Amazon Web Services
 
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...Amazon Web Services
 
Optimizing Storage for Big Data Analytics Workloads
Optimizing Storage for Big Data Analytics WorkloadsOptimizing Storage for Big Data Analytics Workloads
Optimizing Storage for Big Data Analytics WorkloadsAmazon Web Services
 
Data Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and ArchiveData Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and ArchiveAmazon Web Services
 
BDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
BDA403 How Netflix Monitors Applications in Real-time with Amazon KinesisBDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
BDA403 How Netflix Monitors Applications in Real-time with Amazon KinesisAmazon Web Services
 
Migrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar SeriesMigrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar SeriesAmazon Web Services
 
Migrating Large Scale Data Sets to the Cloud
Migrating Large Scale Data Sets to the CloudMigrating Large Scale Data Sets to the Cloud
Migrating Large Scale Data Sets to the CloudAmazon Web Services
 

What's hot (20)

Getting started with Amazon Kinesis
Getting started with Amazon KinesisGetting started with Amazon Kinesis
Getting started with Amazon Kinesis
 
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
 
Getting Started with Amazon Kinesis
Getting Started with Amazon KinesisGetting Started with Amazon Kinesis
Getting Started with Amazon Kinesis
 
ENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million Users
 
AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...
AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...
AWS re:Invent 2016: Achieving Agility by Following Well-Architected Framework...
 
Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
Deep Dive on Object Storage: Amazon S3 and Amazon GlacierDeep Dive on Object Storage: Amazon S3 and Amazon Glacier
Deep Dive on Object Storage: Amazon S3 and Amazon Glacier
 
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)
AWS re:Invent 2016: Deep Dive on Amazon Glacier (STG302)
 
AWS Data Transfer Services: Data Ingest Strategies Into the AWS Cloud
AWS Data Transfer Services: Data Ingest Strategies Into the AWS CloudAWS Data Transfer Services: Data Ingest Strategies Into the AWS Cloud
AWS Data Transfer Services: Data Ingest Strategies Into the AWS Cloud
 
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
Big Data adoption success using AWS Big Data Services - Pop-up Loft TLV 2017
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 
AWS re:Invent 2016: Case Study: How Startups like Mapbox, Ring, Hudl, and Oth...
AWS re:Invent 2016: Case Study: How Startups like Mapbox, Ring, Hudl, and Oth...AWS re:Invent 2016: Case Study: How Startups like Mapbox, Ring, Hudl, and Oth...
AWS re:Invent 2016: Case Study: How Startups like Mapbox, Ring, Hudl, and Oth...
 
Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...
Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...
Active Archiving with Amazon S3 and Tiering to Amazon Glacier - March 2017 AW...
 
Convert and Migrate Your NoSQL Database or Data Warehouse to AWS - July 2017
Convert and Migrate Your NoSQL Database or Data Warehouse to AWS - July 2017Convert and Migrate Your NoSQL Database or Data Warehouse to AWS - July 2017
Convert and Migrate Your NoSQL Database or Data Warehouse to AWS - July 2017
 
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...
AWS re:Invent 2016: Case Study: How Startups Like Smartsheet and Quantcast Ac...
 
Optimizing Storage for Big Data Analytics Workloads
Optimizing Storage for Big Data Analytics WorkloadsOptimizing Storage for Big Data Analytics Workloads
Optimizing Storage for Big Data Analytics Workloads
 
Data Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and ArchiveData Storage for the Long Haul: Compliance and Archive
Data Storage for the Long Haul: Compliance and Archive
 
BDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
BDA403 How Netflix Monitors Applications in Real-time with Amazon KinesisBDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
BDA403 How Netflix Monitors Applications in Real-time with Amazon Kinesis
 
The Best of re:invent 2016
The Best of re:invent 2016The Best of re:invent 2016
The Best of re:invent 2016
 
Migrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar SeriesMigrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar Series
 
Migrating Large Scale Data Sets to the Cloud
Migrating Large Scale Data Sets to the CloudMigrating Large Scale Data Sets to the Cloud
Migrating Large Scale Data Sets to the Cloud
 

Viewers also liked

AWS re:Invent 2016: Understanding IoT Data: How to Leverage Amazon Kinesis in...
AWS re:Invent 2016: Understanding IoT Data: How to Leverage Amazon Kinesis in...AWS re:Invent 2016: Understanding IoT Data: How to Leverage Amazon Kinesis in...
AWS re:Invent 2016: Understanding IoT Data: How to Leverage Amazon Kinesis in...Amazon Web Services
 
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...Amazon Web Services
 
AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...
AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...
AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM...Amazon Web Services
 
AWS re:Invent 2016: Beeswax: Building a Real-Time Streaming Data Platform on ...
AWS re:Invent 2016: Beeswax: Building a Real-Time Streaming Data Platform on ...AWS re:Invent 2016: Beeswax: Building a Real-Time Streaming Data Platform on ...
AWS re:Invent 2016: Beeswax: Building a Real-Time Streaming Data Platform on ...Amazon Web Services
 
AWS re:Invent 2016: Serverless Architectural Patterns and Best Practices (ARC...
AWS re:Invent 2016: Serverless Architectural Patterns and Best Practices (ARC...AWS re:Invent 2016: Serverless Architectural Patterns and Best Practices (ARC...
AWS re:Invent 2016: Serverless Architectural Patterns and Best Practices (ARC...Amazon Web Services
 
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...
AWS re:Invent 2016: JustGiving: Serverless Data Pipelines, Event-Driven ETL, ...Amazon Web Services
 
AWS re:Invent 2016: Analyzing Streaming Data in Real-time with Amazon Kinesis...
AWS re:Invent 2016: Analyzing Streaming Data in Real-time with Amazon Kinesis...AWS re:Invent 2016: Analyzing Streaming Data in Real-time with Amazon Kinesis...
AWS re:Invent 2016: Analyzing Streaming Data in Real-time with Amazon Kinesis...Amazon Web Services
 
AWS re:Invent 2016: FINRA: Building a Secure Data Science Platform on AWS (BD...
AWS re:Invent 2016: FINRA: Building a Secure Data Science Platform on AWS (BD...AWS re:Invent 2016: FINRA: Building a Secure Data Science Platform on AWS (BD...
AWS re:Invent 2016: FINRA: Building a Secure Data Science Platform on AWS (BD...Amazon Web Services
 
AWS re:Invent 2016| HLC301 | Data Science and Healthcare: Running Large Scale...
AWS re:Invent 2016| HLC301 | Data Science and Healthcare: Running Large Scale...AWS re:Invent 2016| HLC301 | Data Science and Healthcare: Running Large Scale...
AWS re:Invent 2016| HLC301 | Data Science and Healthcare: Running Large Scale...Amazon Web Services
 
AWS re:Invent 2016: Real-Time Data Exploration and Analytics with Amazon Elas...
AWS re:Invent 2016: Real-Time Data Exploration and Analytics with Amazon Elas...AWS re:Invent 2016: Real-Time Data Exploration and Analytics with Amazon Elas...
AWS re:Invent 2016: Real-Time Data Exploration and Analytics with Amazon Elas...Amazon Web Services
 
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...Amazon Web Services
 
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...
AWS re:Invent 2016: Billions of Rows Transformed in Record Time Using Matilli...Amazon Web Services
 
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...Amazon Web Services
 
AWS re:Invent 2016: Amazon Aurora Deep Dive (GPST402)
AWS re:Invent 2016: Amazon Aurora Deep Dive (GPST402)AWS re:Invent 2016: Amazon Aurora Deep Dive (GPST402)
AWS re:Invent 2016: Amazon Aurora Deep Dive (GPST402)Amazon Web Services
 
AWS re:Invent 2016: Taking Data to the Extreme (MBL202)
AWS re:Invent 2016: Taking Data to the Extreme (MBL202)AWS re:Invent 2016: Taking Data to the Extreme (MBL202)
AWS re:Invent 2016: Taking Data to the Extreme (MBL202)Amazon Web Services
 
CPA ONE 2016 - Big data: big decisions or big fallacy
CPA ONE 2016 - Big data: big decisions or big fallacyCPA ONE 2016 - Big data: big decisions or big fallacy
CPA ONE 2016 - Big data: big decisions or big fallacyLaurie Desautels
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016Monal Daxini
 
Workshop : Wild Rydes Takes Off - The Dawn of a New Unicorn
Workshop : Wild Rydes Takes Off - The Dawn of a New UnicornWorkshop : Wild Rydes Takes Off - The Dawn of a New Unicorn
Workshop : Wild Rydes Takes Off - The Dawn of a New UnicornAmazon Web Services
 
Introduction to Amazon EC2 Spot Instances
Introduction to Amazon EC2 Spot InstancesIntroduction to Amazon EC2 Spot Instances
Introduction to Amazon EC2 Spot InstancesAmazon Web Services
 
What's New in Spark 2?
What's New in Spark 2?What's New in Spark 2?
What's New in Spark 2?Eyal Ben Ivri
 


AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. BDM205: Big Data Mini Con State of the Union. Roger Barga, AWS. November 29, 2016
  • 2. What is Big Data? When your data sets become so large and complex you have to start innovating around how to collect, store, process, analyze, and share it.
  • 3. Amazon EMR Amazon EC2 Process & Analyze Amazon Glacier Amazon S3 Store AWS Import/Export AWS Direct Connect Collect Amazon Kinesis Amazon Machine Learning Amazon Redshift Amazon DynamoDB Amazon Kinesis Analytics Amazon QuickSight AWS Database Migration Service AWS Data Pipeline Amazon RDS, Aurora Big Data services on AWS Amazon Elasticsearch Service
  • 4. Store anything Object storage Highly scalable 99.999999999% durability Amazon S3 Collection and storage
  • 5. Petabyte-scale data transfer service that uses Amazon-provided storage devices for transport. Copy up to 80TB data from on-prem file system to the Snowball through a 10Gbps network interface All data is encrypted by 256-bit GSM encryption AWS Import/Export Snowball Collection and storage E-ink shipping label Ruggedized case “8.5G Impact” 50TB & 80TB 10G network
  • 6. Relational data warehouse Massively parallel; Petabyte scale Fully managed HDD and SSD Platforms $1,000/TB/Year; start at $0.25/hour Amazon Redshift Structured data processing
  • 7. Hadoop as a service Spark, Presto, Flink, Hbase, Hive, etc. Easy to use; fully managed On-demand and Spot pricing HDFS & S3 file systems Amazon EMR Semi-structured / unstructured data processing
  • 8. Distributed search and analytics engine Managed service using Elasticsearch and Kibana Fully managed - zero admin Highly available and reliable Tightly integrated with other AWS services Amazon Elasticsearch Service Semi-structured / unstructured data processing
  • 9. Serverless compute service that runs your code in response to events. Extend AWS services with user-defined custom logic. Pay only for the requests served and compute time required - billing in increments of 100 milliseconds AWS Lambda Serverless event processing
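The billing rule on this slide (pay per request plus compute time, rounded up to 100-millisecond increments) can be sketched as below. This is an illustration only: the function names and the prices passed in are hypothetical placeholders, not published AWS rates, and the sketch ignores memory size, which also affects real pricing.

```python
import math


def billed_duration_ms(actual_ms: float, increment_ms: int = 100) -> int:
    """Round one invocation's duration up to the next billing increment."""
    if actual_ms <= 0:
        return 0
    return math.ceil(actual_ms / increment_ms) * increment_ms


def invocation_cost(durations_ms, price_per_increment, price_per_request,
                    increment_ms: int = 100) -> float:
    """Total charge for a batch of invocations.

    price_per_increment and price_per_request are caller-supplied,
    hypothetical prices; the slide does not state actual rates.
    """
    increments = sum(
        billed_duration_ms(d, increment_ms) // increment_ms for d in durations_ms
    )
    return increments * price_per_increment + len(durations_ms) * price_per_request
```

Under this rule a 101 ms invocation is billed as 200 ms, which is why short functions benefit from staying just under an increment boundary.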
  • 11. Streams: Build your own custom application to process streaming data using Amazon Kinesis Client Library. Connectors to S3, DynamoDB, Lambda, Amazon Redshift, Elasticsearch, Storm spout,… Firehose: Load massive volumes of streaming data into S3, Amazon Redshift, Elasticsearch. Inline processing using Lambda and library of ready-to-use templates. Analytics: Analyze streaming data using standard SQL, no servers to manage, elastically scale, pay as you go. Amazon Kinesis Streaming data processing
  • 12. Fast, powered by SPICE, automatically scales. Explore, analyze, share insights with anyone. 1/10th the cost of traditional BI solutions. Broad connectivity with AWS data services, on-premises data, files and business applications. Amazon QuickSight Visualize and explore Amazon RDS Amazon S3 Amazon Redshift
  • 14. Scale as your data and business grow The volume, variety, and velocity at which data is being generated are leaving organizations with new questions to answer, such as:
  • 15. Store and analyze all your data, structured and unstructured from all of your sources, in one centralized location at low cost. Quickly ingest data without needing to force it into a pre-defined schema, enabling ad-hoc analysis by applying schemas on read, not write. Separating your storage and compute allows you to scale each component as required, attach multiple data processing and analytics services to the same data set. Scale S3 Data Lake
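A minimal sketch of "schema on read" as described on this slide: raw records are stored untouched, and a schema is applied only when a query reads them. The sample records, field names, and the `read_with_schema` helper are all hypothetical, and plain Python stands in for whatever query engine would actually sit on top of S3.

```python
import json
from datetime import datetime

# Raw lines as they might land in the lake: no schema enforced on write.
RAW_LINES = [
    '{"user": "u1", "ts": "2016-11-29T10:00:00", "amount": "12.50"}',
    '{"user": "u2", "ts": "2016-11-29T10:00:05", "device": "iOS"}',
    'not json at all',
]

# The schema lives with the reader, so different analyses can use different ones.
SCHEMA = {"user": str, "ts": datetime.fromisoformat, "amount": float}


def read_with_schema(lines, schema):
    """Yield one dict per parseable line, casting fields at read time."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable lines are skipped here, but stay in storage
        if not isinstance(record, dict):
            continue
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            try:
                row[field] = cast(value) if value is not None else None
            except (TypeError, ValueError):
                row[field] = None  # present but not castable
        yield row


ROWS = list(read_with_schema(RAW_LINES, SCHEMA))
```

Because nothing was rejected at write time, a later analysis could read the same lines with a schema that includes `device` without re-ingesting anything.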
  • 16. Implementing a Data Lake on AWS Elasticsearch
  • 17. Starting small is powerful, when you can scale up fast. Scaling up your analytics systems (with AWS vs. traditional IT*): Get a new BI server: 20 minutes vs. 3 months. Upgrade your analytics server to the newest Intel processors and add 16GB memory: 10 minutes vs. 2 months. Add 500TB of storage: instant vs. 2 months. Grow a DWH cluster from 8GB to 1PB: 1 hour vs. 8 months. Build a 1024-node Hadoop cluster: 30 minutes vs. unlikely. Roll out multi-region production environment: hours vs. months. * Actual provisioning times in a well-organized IT division.
  • 18. Netflix: Using Amazon S3 as the fabric of our big data ecosystem Tuesday, Nov. 29 5:30pm – 6:30pm Mirage, St. Croix B
  • 20. Putting it together: cost How much would it cost to process the Twitter fire hose?
  • 21. Putting it together: cost How much would it cost to process the Twitter fire hose? S3: $0.025/GB-Mo Redshift: Starts at $0.25/hour EC2: Starts at $0.02/hour Glacier: $0.007/GB-Mo Kinesis: $0.015/shard 1MB/s in; 2MB/out; $0.014/million puts
  • 22. 500MM tweets/day = ~ 5,800 tweets/sec 2k/tweet is ~12MB/sec (~1TB/day) $0.015/hour per shard, $0.014/million PUTS Amazon Kinesis cost is $0.47/hour Amazon Redshift cost is $0.850/hour (for a 2TB node) S3 cost is $1.02/hour (no compression) Total: $2.34/hour – on demand Cost
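The arithmetic on this slide can be reproduced as follows. The sketch assumes "2k/tweet" means roughly 2,000 bytes, one PUT per tweet, and one shard per 1 MB/s of ingest (the per-shard limit quoted on the previous slide). The Amazon Redshift and S3 hourly figures are taken from the slide as given rather than derived.

```python
import math

# Inputs quoted on the slides.
TWEETS_PER_DAY = 500_000_000
BYTES_PER_TWEET = 2_000            # assumption: "2k/tweet" read as ~2 KB
SHARD_PRICE_PER_HOUR = 0.015       # $ per shard-hour
PUT_PRICE_PER_MILLION = 0.014      # $ per million PUTs
SHARD_INGEST_MB_PER_SEC = 1        # 1 MB/s in per shard
REDSHIFT_PER_HOUR = 0.85           # slide figure for a 2TB node
S3_PER_HOUR = 1.02                 # slide figure, no compression

tweets_per_sec = TWEETS_PER_DAY / 86_400                  # about 5,800
mb_per_sec = tweets_per_sec * BYTES_PER_TWEET / 1_000_000  # about 12 MB/s

# Kinesis: enough shards for the ingest rate, plus one PUT per tweet.
shards = math.ceil(mb_per_sec / SHARD_INGEST_MB_PER_SEC)
shard_cost_per_hour = shards * SHARD_PRICE_PER_HOUR
put_cost_per_hour = tweets_per_sec * 3_600 / 1_000_000 * PUT_PRICE_PER_MILLION
kinesis_per_hour = shard_cost_per_hour + put_cost_per_hour

total_per_hour = kinesis_per_hour + REDSHIFT_PER_HOUR + S3_PER_HOUR

print(f"shards needed:    {shards}")
print(f"Kinesis per hour: ${kinesis_per_hour:.2f}")
print(f"total per hour:   ${total_per_hour:.2f}")
```

Most of the Kinesis charge comes from PUTs rather than shard-hours, so batching several tweets into one PUT would lower that line, subject to the per-shard throughput limit.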
  • 23. Use only the services you need Scale only the services you need Pay for only what you use Discounts through Reserved Instances, instance types including Spot, and upfront commitments. Cost
  • 25. Putting it together: scale and security FINRA: Monitor and enforce trading regulations FINRA handles approximately 75 billion market events every day to build a holistic picture of trading in the U.S. Hundreds of surveillance algorithms against massive amounts of data. FINRA mission • Deter misconduct by enforcing the rules • Detect and prevent wrongdoing in US markets • Discipline those who break the rules Scale brings unique challenges • Market volumes are volatile and increasing • Exchanges are dynamically evolving • Regulatory rules are created and enhanced • New securities products are introduced • Market manipulators innovate
  • 26. Petabytes of data generated on premise and brought to AWS and stored in S3 data lake. Thousands of analytical queries performed on EMR and Redshift. Over 400 analytics packages. Stringent security requirements met by leveraging VPC, VPN, Encryption at Rest and In Transit, AWS CloudTrail and database auditing Flexible Interactive Queries Predefined Queries Surveillance Analytics Data Management Data Movement Data Registration Version Management Amazon S3 Platform that adapts to market dynamics Web Applications Analysts; Regulators Amazon EMR Amazon EMR Amazon Redshift
  • 27. Store an exabyte of data or more in S3 Analyze GB to PB using standard tools Encryption of all data at each step Auditability of all APIs and retrievals Control egress and ingress points using VPCs Scale and security FINRA: Building a Secure Data Science Platform on AWS Tuesday, Nov. 29 4:00pm – 5:00pm Mirage, St. Croix B
  • 28. Putting it together Agility and actionable insights
  • 30. What item most interests you this week? What item will be the most difficult to explain to your significant other when you return home? What will give you the biggest headache this week? New Amazon Web Services Blackjack Networking with Peers re:Play Party
  • 31. What item most interests you this week? What are your colleagues most interested in hearing about when you return next week? What will give you the biggest headache this week? New Amazon Web Services Blackjack Networking with Peers re:Play Party
  • 33. Demo architecture: Kinesis Ingestion Stream; Kinesis Analytics; Kinesis Aggregate Stream; Lambda Function; DynamoDB Table; Amazon Cognito; S3 Bucket (HTML, JavaScript). Data flows: Raw Device and Quadrant Data; Aggregated Data. Query shown: SELECT ROWTIME, userId, COUNT(*) FROM STREAM GROUP BY userId, FLOOR(ROWTIME TO SECOND)
  • 34. The demo application

      CREATE OR REPLACE STREAM DESTINATION_SQL_STREAM (
          UNIQUE_USER_COUNT INT,
          ANDROID_COUNT INT,
          IOS_COUNT INT,
          WINDOWS_PHONE_COUNT INT,
          OTHER_OS_COUNT INT,
          QUADRANT_A_COUNT INT,
          QUADRANT_B_COUNT INT,
          QUADRANT_C_COUNT INT,
          QUADRANT_D_COUNT INT,
          WINDOW_TIME TIMESTAMP);

      CREATE OR REPLACE STREAM DISTINCT_USER_STREAM (
          COGNITO_ID VARCHAR(64),
          DEVICE VARCHAR(32),
          OS VARCHAR(32),
          QUADRANT CHAR(1),
          DT TIMESTAMP);

      CREATE OR REPLACE PUMP "DISTINCT_USER_PUMP" AS
      INSERT INTO "DISTINCT_USER_STREAM"
      SELECT STREAM DISTINCT
          "cognitoId", "device", "os", "quadrant",
          FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO SECOND)
      FROM "SOURCE_SQL_STREAM_001";

      CREATE OR REPLACE PUMP "OUTPUT_PUMP" AS
      INSERT INTO "DESTINATION_SQL_STREAM"
      SELECT STREAM
          COUNT("DISTINCT_USER_STREAM".COGNITO_ID) AS UNIQUE_USER_COUNT,
          COUNT((CASE WHEN "DISTINCT_USER_STREAM".OS = 'Android' THEN COGNITO_ID ELSE null END)) AS ANDROID_COUNT,
          COUNT((CASE WHEN "DISTINCT_USER_STREAM".OS = 'iOS' THEN COGNITO_ID ELSE null END)) AS IOS_COUNT,
          COUNT((CASE WHEN "DISTINCT_USER_STREAM".OS = 'Windows Phone' THEN COGNITO_ID ELSE null END)) AS WINDOWS_PHONE_COUNT,
          COUNT((CASE WHEN "DISTINCT_USER_STREAM".OS = 'other' THEN COGNITO_ID ELSE null END)) AS OTHER_OS_COUNT,
          COUNT((CASE WHEN "DISTINCT_USER_STREAM".QUADRANT = 'A' THEN COGNITO_ID ELSE null END)) AS QUADRANT_A_COUNT,
          COUNT((CASE WHEN "DISTINCT_USER_STREAM".QUADRANT = 'B' THEN COGNITO_ID ELSE null END)) AS QUADRANT_B_COUNT,
          COUNT((CASE WHEN "DISTINCT_USER_STREAM".QUADRANT = 'C' THEN COGNITO_ID ELSE null END)) AS QUADRANT_C_COUNT,
          COUNT((CASE WHEN "DISTINCT_USER_STREAM".QUADRANT = 'D' THEN COGNITO_ID ELSE null END)) AS QUADRANT_D_COUNT,
          ROWTIME
      FROM "DISTINCT_USER_STREAM"
      GROUP BY FLOOR("DISTINCT_USER_STREAM".ROWTIME TO SECOND);
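To make the two-pump logic in the demo SQL easier to follow, here is a batch re-creation in plain Python. It is not how Kinesis Analytics executes the query (a streaming engine emits each one-second window as it closes); it only mirrors the two steps: de-duplicate rows per second, then count per OS and per quadrant. The sample events and the `rowtime` field (epoch seconds) are hypothetical.

```python
import math

OS_COLUMNS = {
    "Android": "ANDROID_COUNT",
    "iOS": "IOS_COUNT",
    "Windows Phone": "WINDOWS_PHONE_COUNT",
    "other": "OTHER_OS_COUNT",
}
QUADRANT_COLUMNS = {q: f"QUADRANT_{q}_COUNT" for q in "ABCD"}
ALL_COLUMNS = ["UNIQUE_USER_COUNT", *OS_COLUMNS.values(), *QUADRANT_COLUMNS.values()]


def aggregate_per_second(events):
    """Return {second: {column: count}} for a finite list of events."""
    # Step 1 (DISTINCT_USER_PUMP): SELECT DISTINCT over the four fields
    # plus the timestamp floored to the second.
    distinct_rows = {
        (e["cognitoId"], e["device"], e["os"], e["quadrant"], math.floor(e["rowtime"]))
        for e in events
    }
    # Step 2 (OUTPUT_PUMP): group by second and count with CASE-style filters.
    windows = {}
    for _cognito_id, _device, os_name, quadrant, second in distinct_rows:
        counts = windows.setdefault(second, dict.fromkeys(ALL_COLUMNS, 0))
        counts["UNIQUE_USER_COUNT"] += 1
        if os_name in OS_COLUMNS:
            counts[OS_COLUMNS[os_name]] += 1
        if quadrant in QUADRANT_COLUMNS:
            counts[QUADRANT_COLUMNS[quadrant]] += 1
    return windows


# Hypothetical sample: u1 reports twice within second 10, then once in second 11.
EVENTS = [
    {"cognitoId": "u1", "device": "Pixel", "os": "Android", "quadrant": "A", "rowtime": 10.1},
    {"cognitoId": "u1", "device": "Pixel", "os": "Android", "quadrant": "A", "rowtime": 10.7},
    {"cognitoId": "u2", "device": "iPhone", "os": "iOS", "quadrant": "B", "rowtime": 10.2},
    {"cognitoId": "u1", "device": "Pixel", "os": "Android", "quadrant": "A", "rowtime": 11.0},
]
RESULT = aggregate_per_second(EVENTS)
```

As in the SQL, a user who changes quadrant within the same second produces two distinct rows and is counted twice in `UNIQUE_USER_COUNT`.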
  • 35. Big data does not mean just batch • Can be streamed in • Processed in real time • Can be used to respond quickly to requests and actionable events, generate business value. You can mix and match • On-premises and cloud • Custom development and managed services Agility & actionable insights
  • 36. Putting it together Choice and selection
  • 37. 1-click deployment to launch, in multiple regions around the world Pay-as-you-go pricing with no long term contracts required 2,000+ product listings to browse, test, and buy software; 290 specific to big data. Advanced Analytics Database and Data Enablement Business Intelligence Putting it together: choice and selection AWS Marketplace: Software store with simplified procurement
  • 38. Largest ecosystem of ISVs & integrators Tens of thousands of consulting and technology partners
  • 39. We have a retail mindset Use our managed big data services Build or bring your own Or access thousands in our marketplace Each customer decides for themselves Choice & selection
  • 40. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Richard T. Freeman, Ph.D., Lead Data Engineer and Architect, JustGiving November 29, 2016 JustGiving: Event-Driven Data Platform BDM205
  • 41. We are A tech-for-good platform for events-based fundraising, charities, and crowdfunding “Ensure no good cause goes unfunded” • The #1 platform for online social giving in the world • Peaks in traffic: Ice bucket, natural disasters • Raised $4.2bn in donations • 28.5m users • 196 countries • 27,000 good causes • GiveGraph • 91 million nodes • 0.53 billion relationships
  • 43. Our requirements • Limitation in existing SQL Server data warehouse • Long-running and complex queries for data scientists • New data sources: API, clickstream, unstructured, log, behavioral data, etc. • Easy to add data sources and pipelines • Reduce time spent on data preparation and experiments Machine learning Graph processing Natural language processing Stream processing Data ingestion Data preparation Automated Pipelines Insight Predictions Measure Recommendations Data-driven
  • 44. Event-driven data platform at JustGiving [1 of 2] • JustGiving developed in-house analytics and data science platform in AWS called RAVEN. • Reporting, Analytics, Visualization, Experimental, Networks • Uses event-driven and serverless pipelines rather than workflows or DAGs • Messaging, queues, pub/sub patterns • Separate storage from compute • Supports scalable event driven • ETL / ELT • Machine learning • Natural language processing • Graph processing • Allows users to consume raw tables, data blocks, metrics, KPIs, insight, reports etc.
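The contrast this slide draws between event-driven pipelines and predefined workflows or DAGs can be illustrated with a tiny in-process publish/subscribe sketch. Nothing here reflects RAVEN's actual implementation: the `EventBus` class, topic names, and handlers are hypothetical, and in AWS the same roles would be played by managed messaging, queueing, and function services.

```python
from collections import defaultdict, deque


class EventBus:
    """Minimal pub/sub: handlers react to topics; no central workflow definition."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self._pending = deque()

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, payload):
        self._pending.append((topic, payload))

    def run(self):
        # Deliver events in arrival order until no handler publishes more.
        while self._pending:
            topic, payload = self._pending.popleft()
            for handler in list(self._handlers[topic]):
                handler(self, payload)


log = []


def clean(bus, record):
    cleaned = {key: value.strip() for key, value in record.items()}
    log.append(("clean", cleaned))
    bus.publish("record.cleaned", cleaned)  # downstream steps are not named here


def load(bus, record):
    log.append(("load", record))


def count_metric(bus, record):
    log.append(("metric", record["page"]))


bus = EventBus()
bus.subscribe("record.raw", clean)
bus.subscribe("record.cleaned", load)
bus.subscribe("record.cleaned", count_metric)
bus.publish("record.raw", {"page": " /donate ", "user": " u1 "})
bus.run()
```

Adding a new consumer, such as a machine learning step, only requires another `subscribe` call on an existing topic, which is one way to read the slide's "easy to add data sources and pipelines".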
  • 45. Event-driven data platform at JustGiving [2 of 2]
  • 46. Serverless streaming analytics and persist stream
  • 47. The outcome • Ingest full clickstream • Near real-time streaming analytics • Persist streams to Amazon S3 and Amazon Redshift Amazon Kinesis • AWS managed services • Event-driven and serverless • Scale out and automate complex queries • Improved productivity • Data-driven: Measure, insight, predict, recommend RAVEN platform: scalable event-driven data platform in AWS
  • 48. Thank you! “Ensure no good cause goes unfunded” Contact: https://linkedin.com/in/drfreeman BDM303 - JustGiving: Serverless Data Pipelines, Event-Driven ETL, and Stream Processing Tuesday 2:30 PM - 3:30 PM Wednesday, 3:30 PM - 4:30 PM [repeat]
  • 49. Proven customer success The vast majority of big data use cases deployed in the cloud today run on AWS.
  • 50. Big Data Mini Con sessions Mirage, Bermuda A Mirage, St. Croix B Mirage, Event Center B Mirage, Barbados A 1:00 PM Beeswax: Building a Real-Time Streaming Data Platform on AWS Big Data Architectural Patterns and Best Practices on AWS Deep Dive: Amazon EMR Best Practices & Design Patterns Workshop: Building Your First Big Data Application with AWS 2:30 PM JustGiving: Serverless Data Pipelines, Event-Driven ETL, and Stream Processing Best Practices for Apache Spark on Amazon EMR Understanding IoT Data: How to Leverage Amazon Kinesis in Building an IoT Analytics Platform on AWS 4:00 PM Analyzing Streaming Data in Real-time with Amazon Kinesis Analytics FINRA: Building a Secure Data Science Platform on AWS Best Practices for Data Warehousing with Amazon Redshift Workshop: Building Your First Big Data Application with AWS 5:30 PM Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service and Kibana Netflix: Using Amazon S3 as the fabric of our big data ecosystem Visualizing Big Data Insights with Amazon QuickSight Plus, repeats for many sessions throughout the week!
  • 51. Get started with Big Data on AWS aws.amazon.com/big-data Big Data Quest Learn at your own pace and practice working with AWS services for big data on QwikLABS. (3 Hours | Online) qwiklabs.com/quests/1 Big Data on AWS How to use AWS services to process data with Hadoop & create big data environments (3 Days | Classroom ) aws.amazon.com/training/course-descriptions/bigdata/ Big Data Technology Fundamentals FREE! Overview of AWS big data solutions for architects or data scientists new to big data. (3 Hours | Online) AWS Courses Self-paced Online Labs