Abhishek Sinha
Business Development Manager, AWS
July 18, 2013
@abysinha
sinhaar@amazon.com
Big Data Analytics
Overview
• The Big Data Challenge
• Turning data into actionable information
• Building a big data platform
• Mobilewalla – Big data system in AWS for mobile app audience measurement
• Intel technology for big data
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Highly constrained
Lower cost,
higher throughput
Generated data
Available for analysis
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
Big Gap in turning data into actionable
information
Amazon Web Services helps remove
constraints
1 instance x 100 hours = 100 instances x 1 hour
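The equivalence above is instance-hour arithmetic. A minimal sketch, assuming billing per instance-hour and a job that parallelizes perfectly; the hourly rate is a hypothetical placeholder, not an AWS price:

```python
import math

def job_cost(instances: int, hours_each: float, rate_per_instance_hour: float) -> float:
    """Cost of running `instances` machines for `hours_each` hours apiece."""
    return instances * hours_each * rate_per_instance_hour

RATE = 0.10  # hypothetical price per instance-hour, for illustration only

serial = job_cost(instances=1, hours_each=100, rate_per_instance_hour=RATE)
parallel = job_cost(instances=100, hours_each=1, rate_per_instance_hour=RATE)

# Both runs consume 100 instance-hours; the parallel one finishes in 1 hour.
print(serial, parallel, math.isclose(serial, parallel))
```

The bill is the same either way, so the choice comes down to how soon the answer is needed.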
Big Data Verticals and Use Cases
• Media/Advertising: targeted advertising; image and video processing
• Oil & Gas: seismic analysis
• Retail: recommendation; transactions analysis
• Life Sciences: genome analysis
• Financial Services: Monte Carlo simulations; risk analysis
• Security: anti-virus; fraud detection; image recognition
• Social Network/Gaming: user demographics; usage analysis; in-game metrics
From data to
actionable information
“Who is using our service?”
Identified early mobile usage
Invested heavily in mobile development
Finding signal in the noise of logs
In January 2013, 9,432,061 unique mobile devices used the Yelp mobile app.
4 million+ calls. 5 million+ directions.
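Counting unique devices and actions is the kind of question log analysis answers. A minimal sketch, assuming a hypothetical tab-separated log format (timestamp, device ID, action); the talk does not describe Yelp's actual log format or tooling:

```python
from collections import Counter

# Hypothetical log lines: timestamp <TAB> device_id <TAB> action
SAMPLE_LOG = """\
2013-01-03T10:00:00\tdevice-a\tcall
2013-01-03T10:05:00\tdevice-b\tdirections
2013-01-04T09:30:00\tdevice-a\tdirections
2013-01-05T18:45:00\tdevice-c\tcall
"""

def summarize(log_text: str) -> tuple:
    """Return (number of unique devices, Counter of actions)."""
    devices = set()
    actions = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _timestamp, device_id, action = line.split("\t")
        devices.add(device_id)
        actions[action] += 1
    return len(devices), actions

unique_devices, action_counts = summarize(SAMPLE_LOG)
print(unique_devices, dict(action_counts))
```

At real log volumes the same grouping would run as a distributed job rather than in one process.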
“What kind of movies do people like?”
More than 25 Million Streaming Members
50 Billion Events Per Day
30 Million plays every day
2 billion hours of video in 3 months
4 million ratings per day
3 million searches
Device location, time, day, week, etc.
Social data
Query complements the R3 solution by providing granular search-and-
retrieval functionality for structured and unstructured data stored in FinQloud
Building a
Big-Data Architecture
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Getting your Data into AWS
Amazon S3
Corporate Data
Center
• Console Upload
• FTP
• AWS Import/Export
• S3 API
• Direct Connect
• Storage Gateway
• 3rd Party Commercial Apps
• Tsunami UDP
1
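Which of the transfer options above fits usually depends on data volume and available bandwidth. A rough sketch of the estimate, assuming decimal terabytes and a hypothetical 80% sustained link utilization:

```python
def transfer_days(size_tb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Days needed to move `size_tb` terabytes over a `link_mbps` megabit/s link."""
    bits = size_tb * 1e12 * 8                    # decimal terabytes to bits
    usable_bps = link_mbps * 1e6 * utilization   # sustained throughput in bits/s
    return bits / usable_bps / 86_400            # seconds to days

for size_tb in (0.1, 1, 10, 100):
    print(f"{size_tb:>6} TB over 100 Mbps: {transfer_days(size_tb, 100):.1f} days")
```

When the estimate runs to weeks, shipping disks through AWS Import/Export or provisioning a faster link such as Direct Connect may be worth comparing against a plain network upload.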
Write directly to a data source
[Diagram labels: your application, Amazon EC2, Amazon S3, DynamoDB, any other data store]
2
Queue, pre-process and then write to data source
[Diagram labels: Amazon Simple Queue Service (SQS), Amazon S3, DynamoDB, any other data store]
3
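The queue, pre-process, then write pattern can be sketched in a few lines. This is an illustration only: Python's in-process `queue.Queue` stands in for SQS, a plain list stands in for the data store, and the `preprocess` step is hypothetical.

```python
import json
import queue

work_queue = queue.Queue()   # stand-in for an SQS queue
data_store = []              # stand-in for S3, DynamoDB, or another store

def producer(raw_events):
    """The application enqueues raw events instead of writing them directly."""
    for event in raw_events:
        work_queue.put(json.dumps(event))

def preprocess(message: str) -> dict:
    """Hypothetical clean-up step: parse JSON and normalize the user field."""
    record = json.loads(message)
    record["user"] = record["user"].strip().lower()
    return record

def worker():
    """Drain the queue, pre-process each message, and write it to the store."""
    while True:
        try:
            message = work_queue.get_nowait()
        except queue.Empty:
            break
        data_store.append(preprocess(message))
        work_queue.task_done()

producer([{"user": " Alice ", "action": "play"}, {"user": "BOB", "action": "search"}])
worker()
print(data_store)
```

Decoupling producers from workers lets each side scale independently, which is the reason to put a queue in front of the store.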
Agency Customer: Video Analytics on AWS
[Diagram labels: Elastic Load Balancer, edge servers on EC2, Amazon Simple Queue Service (SQS), workers on EC2, logs, Amazon Simple Storage Service (S3), HDFS cluster, Amazon Elastic MapReduce, reports]
Aggregate and write to data source
[Diagram labels: Flume running on EC2, HDFS, Amazon S3, any other data store]
4
Courtesy http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html
S3 as a “single source of truth”
S3
Choose depending upon design: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Hadoop-based analysis
[Diagram: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, and log aggregation tools feed Amazon EMR]
EMR is Hadoop in the Cloud
What is Amazon Elastic MapReduce (EMR)?
How does EMR work?
• Put the data into S3
• Choose: Hadoop distribution, number of nodes, types of nodes, custom configs, Hive/Pig/etc.
• Launch the cluster using the EMR console, CLI, SDK, or APIs
• Get the output from S3
• You can also store everything in HDFS
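Jobs on a Hadoop cluster follow the map, shuffle, reduce pattern. A minimal in-memory sketch of that pattern as a word count; it runs in one process and makes no EMR or Hadoop calls, so treat it as an illustration of the programming model only:

```python
from collections import defaultdict

def map_phase(line: str):
    """Mapper: emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key: str, values: list) -> tuple:
    """Reducer: sum the counts for one word."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data on AWS", "big clusters for big data"])
print(counts)
```

On a real cluster the mappers and reducers run on different nodes, with input read from and output written to S3 or HDFS.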
What can you run on EMR…
EMR Cluster
Resize Nodes
EMR Cluster
You can easily add and
remove nodes
[Charts: four workload patterns (on and off, fast growth, variable peaks, predictable peaks), with WASTE marked]
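The waste in these charts can be read as capacity that is provisioned but not used. A small sketch of that arithmetic, using made-up hourly demand figures and assuming the cluster could be resized every hour:

```python
def unused_capacity(hourly_demand, provisioned):
    """Node-hours provisioned but not needed, summed over all hours."""
    return sum(max(provisioned - demand, 0) for demand in hourly_demand)

# Hypothetical demand in nodes per hour for a workload with one predictable peak.
demand = [2, 2, 3, 10, 10, 3, 2, 2]

# Fixed cluster sized for the peak hour.
fixed = unused_capacity(demand, provisioned=max(demand))

# Elastic cluster resized to match demand each hour.
elastic = sum(unused_capacity([d], provisioned=d) for d in demand)

print(fixed, elastic)
```

Real resizing is coarser than this idealized hourly match, so actual savings fall somewhere between the two numbers.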
Your choice of tools on Hadoop/EMR
[Diagram: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, and log aggregation tools feed Amazon EMR]
SQL-based processing
[Diagram: the data sources feed Amazon EMR (pre-processing framework) and Amazon Redshift (petabyte-scale columnar data warehouse)]
What is Amazon Redshift?
Amazon Redshift is a fast and powerful, fully managed, petabyte-scale
data warehouse service in the AWS cloud
Easy to provision and scale
No upfront costs, pay as you go
High performance at a low price
Open and flexible with support for popular BI tools
Amazon Redshift is priced to let you analyze all your data
                      Price per hour for      Effective hourly    Effective annual
                      HS1.XL single node      price per TB        price per TB
On-Demand             $0.850                  $0.425              $3,723
1 Year Reservation    $0.500                  $0.250              $2,190
3 Year Reservation    $0.228                  $0.114              $999
Simple Pricing
Number of Nodes x Cost per Hour
No charge for Leader Node
No upfront costs
Pay as you go
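The pricing figures above can be reproduced with a few lines of arithmetic. A sketch, assuming 2 TB of storage per HS1.XL node (implied by the ratio of the per-node to the per-TB price) and 8,760 hours per year:

```python
HOURS_PER_YEAR = 8760
TB_PER_NODE = 2  # assumption: implied by $0.850 per node vs $0.425 per TB

node_price_per_hour = {
    "On-Demand": 0.850,
    "1 Year Reservation": 0.500,
    "3 Year Reservation": 0.228,
}

def cluster_cost(nodes: int, price_per_hour: float, hours: float) -> float:
    """Number of nodes x cost per hour x hours; the leader node is not charged."""
    return nodes * price_per_hour * hours

def annual_price_per_tb(price_per_hour: float) -> int:
    """Effective annual price per TB, rounded to whole dollars."""
    return round(price_per_hour / TB_PER_NODE * HOURS_PER_YEAR)

for plan, price in node_price_per_hour.items():
    print(f"{plan}: ${annual_price_per_tb(price):,} per TB per year")
```

These are the 2013 prices quoted in the talk; current pricing and node types differ.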
Your choice of BI tools on the cloud
[Diagram: the data sources, Amazon EMR (pre-processing framework), and Amazon Redshift]
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Collaboration and sharing insights
[Diagram: the data sources, Amazon EMR, and Amazon Redshift]
Sharing results and visualizations
[Diagram: the data sources, Amazon EMR, and Amazon Redshift, plus a web app server and visualization tools]
Sharing results and visualizations at scale
[Diagram: the data sources, Amazon EMR, and Amazon Redshift, plus a web app server and visualization tools]
Sharing results and visualizations
[Diagram: the data sources, Amazon EMR, and Amazon Redshift, plus business intelligence tools]
Geospatial visualizations
[Diagram: the data sources, Amazon EMR, and Amazon Redshift, plus business intelligence tools, GIS tools on Hadoop, GIS tools, and visualization tools]
Rinse and repeat every day or hour
[Diagram: the data sources, Amazon EMR, Amazon Redshift, visualization tools, business intelligence tools, GIS tools on Hadoop, and GIS tools, plus AWS Data Pipeline]
The complete architecture
[Diagram: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, and log aggregation tools; Amazon EMR; Amazon Redshift; visualization tools; business intelligence tools; GIS tools on Hadoop; GIS tools; AWS Data Pipeline]
Kaushik Dutta
CTO
18 July, 2013
Mobilewalla – App Audience Measurement
With Amazon EC2 Infrastructure
Mobilewalla
• Seattle-based big data venture that has accumulated the largest volumetric
database of app market data in the industry.
• Applying data science techniques on this data, Mobilewalla generates
actionable intelligence of importance to ad agencies, ad tech companies, and
app publishers
• Measuring audience in mobile apps
Traditional audience measurement - Panels & Popularity Persistence
Fundamental to panel driven measurement
Idea of popularity persistence
Large pool
of options
“small” set of
popular choices
99 – 1 rule
Objects popular today → popular 30-60-90 days from today
• Panel can be assumed to eventually gravitate towards the persistent
popular set
Mobilewalla Use Case – App Publishers
• How is my app doing?
– Rank by Category and Country, Reviews, Ratings, Feature mentions,
Sentiment Analysis, Social Media, Audience Profile, Negative Review
Analysis, Upgrades
• Competitive Tracking
– All of the above for competitors presented as overlays
• Audience Analysis
– Demographics, Psychographics
• Alerts
– Notifications upon specific events: review spikes, Twitter spikes
Mobilewalla Use Case – Mobile Ad Tech
• New Publisher Acquisition
– Top N apps & Publishers for a Category / Geography
– Top publishers by audience
• Optimal Traffic Allocation
– Related apps by content
– Related apps by Audience profile
– Behavioral profiles of network apps
• Real-Time, Programmatic Delivery
– API driven access
– Sub 100ms response times
Mobilewalla Approach
[Diagram: social media / web, web crawler, cloud storage (Amazon S3, Amazon EBS, Amazon RDS)]
Mobilewalla Approach – Map-Reduce based analytics
[Diagram: map-reduce analytics jobs over cloud storage (30+ terabytes) on Amazon S3, Amazon EBS, and Amazon RDS]
Mobilewalla – Amazon EC2 Infrastructure
Web Crawler
• 700+ micro to small instances
• Elastic map-reduce – flexibility of allocating a large number of instances for a distributed program running for a short time
• Spot Instances – reduce the cost
Mobilewalla – Amazon EC2 Infrastructure
Cloud Storage
• 50+ Medium to Large instances
• Cassandra DB Nodes – EBS backed
• Distributed in two availability zones in two
different geographical regions
• Flexibility to add nodes as and when required
– allows you to grow with the business
• Region based fail-over
• Tiered storage systems
– Local storage
– Elastic Block Store (EBS)
– S3 Storage
• Considering Amazon Redshift
Mobilewalla – Amazon EC2 Infrastructure
Map Reduce Framework
• Complex analytics jobs on Hadoop
systems in EC2 nodes
• Elastic map-reduce for jobs requiring a large number of nodes on S3 storage systems
Mobilewalla – Amazon EC2 Infrastructure
Analytics Delivery
• Multiple application servers with
load balancers
• High read throughput from data
nodes
• Load balancers (ELB) and fail-over
Amazon Web Services for Mobilewalla - Advantages
• On-Demand and reserved nodes
– Flexibility to add, modify, delete nodes as your business changes
• Tiered storage systems to store and manage terabytes of data
– Flexibility to change the data parameters (reliability, read-throughput, write
throughput) by varying the storage systems of your choice
• Elastic Map-Reduce
– Large-scale map-reduce cluster without getting into the details of managing individual nodes and the map-reduce framework
Amazon EC2 allowed us to size our infrastructure as per our need and data
growth.
Amazon Web Services for Mobilewalla - Suggestions
• Take the initial time to explore all the various offerings of Amazon in data
storage and management, before developing a solution
• Changing the solution architecture for terabytes of data at a later time is a challenge
Thank You
Big Data Analytics
Eddie Toh
Regional Platform Marketing Manager
Pricing & Product Marketing Group
Intel APAC
July 18, 2013
Analysis of Data Can Transform Society
• Create new business models and improve organizational processes.
• Enhance scientific understanding, drive innovation, and accelerate medical cures.
• Increase public safety and improve energy efficiency with smart grids.
Democratizing Analytics gets Value out of Big Data
• Unlock Value in Silicon
• Support Open Platforms
• Deliver Software Value
Intel at the Intersection of Big Data
• HPC: Enabling exascale computing on massive data sets
• Cloud: Helping enterprises build open interoperable clouds
• Open Source: Contributing code and fostering ecosystem
Intel at the Heart of the Cloud
Server
Storage
Network
Scale-Out Platform Optimizations for Big Data
Cost-effective performance
• Intel® Advanced Vector Extensions Technology
• Intel® Turbo Boost Technology 2.0
• Intel® Advanced Encryption Standard New Instructions Technology
Intel® Advanced Vector Extensions Technology
• Newest in a long line of processor instruction innovations
• Increases floating point operations per clock, up to 2X performance (1)
(1) Performance comparison using Linpack benchmark. See backup for configuration details.
For more legal information on performance forecasts go to http://www.intel.com/performance
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests,
such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to
any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating
your contemplated purchases, including the performance of that product when combined with other products.
Intel® Turbo Boost Technology 2.0
More performance: higher turbo speeds maximize performance for single and multi-threaded applications
Intel® Advanced Encryption Standard New Instructions
• Processor assistance for performing
AES encryption - 7 new instructions
• Makes enabled encryption software
faster and stronger
Richer
user
experiences
Power of the Platform built by Intel
TeraSort for 1TB sort
Previous Intel® Xeon® Processor: 4 HRS
• Intel® Xeon® Processor E5 2600: 50% reduction
• Solid-State Drive: 80% reduction
• 10G Ethernet: 50% reduction
• Intel® Distribution for Apache Hadoop: 40% reduction
Result: ~7 MIN
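The TeraSort figures are consistent with the four reductions compounding in sequence. A quick check, assuming each percentage applies to the component listed in the same position on the slide and is taken from the run time left by the previous step:

```python
BASELINE_MINUTES = 4 * 60  # "4HRS" on the previous-generation Xeon platform

# Reduction in run time attributed to each upgrade, applied in sequence.
reductions = [
    ("Intel Xeon processor E5-2600", 0.50),
    ("Solid-state drive", 0.80),
    ("10G Ethernet", 0.50),
    ("Intel Distribution for Apache Hadoop", 0.40),
]

def apply_reductions(minutes: float, steps) -> float:
    """Multiply the run time by (1 - reduction) for each step in turn."""
    for _name, reduction in steps:
        minutes *= 1 - reduction
    return minutes

final_minutes = apply_reductions(BASELINE_MINUTES, reductions)
print(f"{final_minutes:.1f} minutes")
```

The result lands close to the "~7MIN" shown on the slide, which supports reading the percentages as cumulative.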
Cloud
Intelligent Systems
Clients
Virtuous Cycle of Data-Driven Experience
Thank You
Technical Track
Break
Technical Track

AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 

Recently uploaded (20)

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 

AWS Big Data Analytics Architecture

  • 1. Abhishek Sinha Business Development Manager, AWS July 18, 2013 @abysinha sinhaar@amazon.com Big Data Analytics
  • 2. Overview • The Big Data Challenge • Turning data into actionable information • Building a big data platform • Mobilewalla – Big data system in AWS for mobile app audience measurement • Intel technology on big data
  • 3.
  • 4. Generation Collection & storage Analytics & computation Collaboration & sharing
  • 5. Generation Collection & storage Analytics & computation Collaboration & sharing Lower cost, higher throughput
  • 6. Generation Collection & storage Analytics & computation Collaboration & sharing Highly constrained Lower cost, higher throughput
  • 7. Generated data Available for analysis Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011 IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
  • 8. Big Gap in turning data into actionable information
  • 9. Amazon Web Services helps remove constraints
  • 10.
  • 11. 1 instance x 100 hours = 100 instances x 1 hour
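The equality on this slide is instance-hour arithmetic: with per-hour billing, cost follows total instance-hours, not how they are spread across machines. A minimal sketch, assuming a perfectly parallelizable job and a hypothetical flat hourly rate (`HOURLY_RATE` is illustrative, not an AWS price):

```python
# Hypothetical flat rate in USD per instance-hour (illustrative only).
HOURLY_RATE = 0.10


def job_cost(instances: int, hours_each: int, rate: float = HOURLY_RATE) -> float:
    """Cost of running `instances` machines for `hours_each` hours each."""
    return instances * hours_each * rate


serial = job_cost(instances=1, hours_each=100)
parallel = job_cost(instances=100, hours_each=1)

print(f"1 x 100 h: ${serial:.2f}, 100 x 1 h: ${parallel:.2f}")
```

Under these assumptions the bill is the same, but the parallel run returns its answer in one hour instead of one hundred.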
  • 12. Media/Advertising Targeted Advertising Image and Video Processing Oil & Gas Seismic Analysis Retail Recommendation Transactions Analysis Life Sciences Genome Analysis Financial Services Monte Carlo Simulations Risk Analysis Security Anti-virus Fraud Detection Image Recognition Social Network/Gaming User Demographics Usage analysis In-game metrics Big Data Verticals and Use cases
  • 13. From data to actionable information
  • 14. “Who is using our service?”
  • 15. Identified early mobile usage Invested heavily in mobile development Finding signal in the noise of logs
  • 16. 9,432,061 unique mobile devices used the Yelp mobile app. 4 million+ calls. 5 million+ directions. In January 2013
  • 17. “What kind of movies do people like?”
  • 18. More than 25 Million Streaming Members • 50 Billion Events Per Day • 30 Million plays every day • 2 billion hours of video in 3 months • 4 million ratings per day • 3 million searches • Device location, time, day, week, etc. • Social data
  • 19.
  • 20.
  • 21.
  • 22. Query complements the R3 solution by providing granular search-and-retrieval functionality for structured and unstructured data stored in FinQloud
  • 24.
  • 25. Generation Collection & storage Analytics & computation Collaboration & sharing
  • 26. Generation Collection & storage Analytics & computation Collaboration & sharing
  • 27. Getting your Data into AWS (1): Corporate Data Center to Amazon S3 • Console Upload • FTP • AWS Import/Export • S3 API • Direct Connect • Storage Gateway • 3rd Party Commercial Apps • Tsunami UDP
  • 28. Write directly to a data source (2): Your application, Amazon EC2, Amazon S3, DynamoDB, any other data store
  • 29. Queue, pre-process and then write to data source (3): Amazon Simple Queue Service (SQS), Amazon S3, DynamoDB, any other data store
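The queue, pre-process, then write pattern can be sketched in a few lines. This is an in-process illustration only: a standard-library `queue.Queue` stands in for Amazon SQS and a plain dict stands in for S3 or DynamoDB; the event fields (`id`, `device`) are hypothetical.

```python
import json
import queue

# Stand-ins for the services on the slide (assumptions for illustration):
# an in-process queue instead of Amazon SQS, a dict instead of S3/DynamoDB.
incoming = queue.Queue()
data_store = {}


def produce(raw_events):
    """Application side: enqueue raw events without touching the data store."""
    for event in raw_events:
        incoming.put(json.dumps(event))


def preprocess(message: str) -> dict:
    """Worker side: parse and normalize one message before it is stored."""
    event = json.loads(message)
    event["device"] = event["device"].strip().lower()
    return event


def drain():
    """Pull every queued message, pre-process it, and write it to the store."""
    while not incoming.empty():
        event = preprocess(incoming.get())
        data_store[event["id"]] = event
        incoming.task_done()


produce([{"id": 1, "device": " iPhone "}, {"id": 2, "device": "ANDROID"}])
drain()
print(data_store)
```

The point of the queue is decoupling: the application only enqueues, so a slow or unavailable data store does not block it, and workers can be scaled independently.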
  • 30. Agency Customer: Video Analytics on AWS Elastic Load Balancer Edge Servers on EC2 Workers on EC2 Logs Reports HDFS Cluster Amazon Simple Queue Service (SQS) Amazon Simple Storage Service (S3) Amazon Elastic MapReduce
  • 31. Aggregate and write to data source (4): Flume running on EC2, Amazon S3, HDFS, any other data store
  • 33. Choose depending upon design: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools
  • 34. Generation Collection & storage Analytics & computation Collaboration & sharing
  • 35. Hadoop-based analysis: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon EMR
  • 36. EMR is Hadoop in the Cloud What is Amazon Elastic MapReduce (EMR)?
  • 37. How does EMR work? • Put the data into S3 • Choose: Hadoop distribution, number of nodes, types of nodes, custom configs, Hive/Pig/etc. • Launch the cluster using the EMR console, CLI, SDK, or APIs • Get the output from S3 • You can also store everything in HDFS
  • 38. S3 What can you run on EMR… EMR Cluster
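EMR runs Hadoop jobs, whose core programming model is map, shuffle, reduce. The sketch below imitates that model inside a single Python process to show the data flow only; it is not EMR or Hadoop code, and the log format is a made-up example.

```python
from collections import defaultdict

# Hypothetical access-log lines: "<device_id> <action>".
LOG_LINES = [
    "dev-1 search",
    "dev-2 call",
    "dev-1 call",
    "dev-3 directions",
    "dev-2 call",
]


def map_phase(line):
    """Emit (action, 1) for the action found on one log line."""
    _device, action = line.split()
    yield action, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts for one key."""
    return key, sum(values)


mapped = (pair for line in LOG_LINES for pair in map_phase(line))
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

On a real cluster the map and reduce functions run on many nodes in parallel, with input and output typically living in S3 or HDFS as described on the previous slide.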
  • 39. Resize Nodes EMR Cluster You can easily add and remove nodes
  • 40. On and Off • Fast Growth • Predictable peaks • Variable peaks • WASTE
  • 41. On and Off • Fast Growth • Predictable peaks • Variable peaks
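The "WASTE" label refers to capacity that is provisioned for the peak but sits idle the rest of the time. A rough sketch of that comparison, using a made-up hourly demand series and an idealized model in which an elastic cluster is resized every hour to match demand exactly:

```python
# Hypothetical hourly demand, in instances needed (illustrative numbers).
demand = [2, 2, 3, 10, 4, 2, 2, 9, 3, 2]


def fixed_capacity_usage(demand):
    """Provision for the peak all the time; return (provisioned, idle) instance-hours."""
    peak = max(demand)
    provisioned = peak * len(demand)
    idle = provisioned - sum(demand)
    return provisioned, idle


def elastic_usage(demand):
    """Resize every hour to match demand; nothing sits idle in this idealized model."""
    return sum(demand), 0


fixed_total, fixed_idle = fixed_capacity_usage(demand)
elastic_total, elastic_idle = elastic_usage(demand)

print(f"fixed:   {fixed_total} instance-hours, {fixed_idle} idle")
print(f"elastic: {elastic_total} instance-hours, {elastic_idle} idle")
```

Real resizing is neither instant nor free, so the elastic figure is a lower bound rather than a prediction.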
  • 42. Your choice of tools on Hadoop/EMR: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon EMR
  • 43. SQL-based processing: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon EMR (pre-processing framework), Amazon Redshift (petabyte-scale columnar data warehouse)
  • 44. What is Amazon Redshift? Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the AWS cloud • Easy to provision and scale • No upfront costs, pay as you go • High performance at a low price • Open and flexible with support for popular BI tools
  • 45. Amazon Redshift is priced to let you analyze all your data • Columns: price per hour for HS1.XL single node / effective hourly price per TB / effective annual price per TB • On-Demand: $0.850 / $0.425 / $3,723 • 1 Year Reservation: $0.500 / $0.250 / $2,190 • 3 Year Reservation: $0.228 / $0.114 / $999 • Simple Pricing: Number of Nodes x Cost per Hour • No charge for Leader Node • No upfront costs • Pay as you go
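The per-TB columns follow from the per-node price by simple arithmetic. The sketch below reproduces them, assuming 2 TB per HS1.XL node (inferred from the slide's ratio of node price to per-TB price) and 8,760 hours per year; the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760
# Assumption inferred from the slide: node price / per-TB price = 2 TB per HS1.XL node.
TB_PER_NODE = 2

# Hourly price per node in USD, as listed on the slide.
node_price_per_hour = {
    "On-Demand": 0.850,
    "1 Year Reservation": 0.500,
    "3 Year Reservation": 0.228,
}


def per_tb_prices(node_hourly):
    """Return (hourly price per TB, annual price per TB) for one node price."""
    hourly_per_tb = node_hourly / TB_PER_NODE
    return hourly_per_tb, hourly_per_tb * HOURS_PER_YEAR


def cluster_cost_per_hour(nodes, node_hourly):
    """Slide's rule: number of nodes x cost per hour (no charge for the leader node)."""
    return nodes * node_hourly


for plan, price in node_price_per_hour.items():
    hourly, annual = per_tb_prices(price)
    print(f"{plan}: ${hourly:.3f}/TB/hour, ${annual:,.0f}/TB/year")
```

These are the 2013 figures from the slide, not current prices.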
  • 46. Your choice of BI Tools on the cloud: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon EMR (pre-processing framework), Amazon Redshift
  • 47. Generation Collection & storage Analytics & computation Collaboration & sharing
  • 48. Collaboration and Sharing insights: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon EMR, Amazon Redshift
  • 49. Sharing results and visualizations: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon EMR, Amazon Redshift, web app server, visualization tools
  • 50. Sharing results and visualizations and scale: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon EMR, Amazon Redshift, web app server, visualization tools
  • 51. Sharing results and visualizations: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon EMR, Amazon Redshift, Business Intelligence tools
  • 52. Geospatial Visualizations: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon EMR, Amazon Redshift, Business Intelligence tools, GIS tools on Hadoop, GIS tools, visualization tools
  • 53. Rinse and repeat every day or hour
  • 54. Rinse and Repeat: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon EMR, Amazon Redshift, visualization tools, Business Intelligence tools, GIS tools on Hadoop, GIS tools, AWS Data Pipeline
  • 55. The complete architecture: Amazon SQS, Amazon S3, DynamoDB, any SQL or NoSQL store, log aggregation tools, Amazon EMR, Amazon Redshift, visualization tools, Business Intelligence tools, GIS tools on Hadoop, GIS tools, AWS Data Pipeline
  • 56. Kaushik Dutta CTO 18 July, 2013 Mobilewalla – App Audience Measurement With Amazon EC2 Infrastructure
  • 57. Mobilewalla • Seattle-based big data venture that has accumulated the largest volumetric database of app market data in the industry. • Applying data science techniques on this data, Mobilewalla generates actionable intelligence of importance to ad agencies, ad tech companies, and app publishers • Measuring audience in mobile apps
  • 58. Traditional audience measurement - Panels & Popularity Persistence • Fundamental to panel-driven measurement: the idea of popularity persistence • Large pool of options, “small” set of popular choices (99–1 rule) • Objects popular today stay popular 30-60-90 days from today • Panel can be assumed to eventually gravitate towards the persistent popular set
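One way to make popularity persistence concrete is the overlap between the popular set today and the popular set some days later. The metric and the lists below are illustrative assumptions, not Mobilewalla's method or data.

```python
def persistence(top_today, top_later):
    """Fraction of today's popular set that is still popular later."""
    today, later = set(top_today), set(top_later)
    return len(today & later) / len(today)


# Hypothetical top-5 lists (made-up item names, not real data).
top_day_0 = ["a", "b", "c", "d", "e"]
stable_day_30 = ["a", "b", "c", "d", "f"]    # high overlap: a panel built today stays representative
volatile_day_30 = ["a", "x", "y", "z", "w"]  # low overlap: a panel built today goes stale

print(persistence(top_day_0, stable_day_30))
print(persistence(top_day_0, volatile_day_30))
```

A panel-based approach implicitly relies on this fraction staying high over the 30, 60, and 90 day horizons mentioned on the slide.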
  • 59. Mobilewalla Use Case – App Publishers • How is my app doing? – Rank by Category and Country, Reviews, Ratings, Feature mentions, Sentiment Analysis, Social Media, Audience Profile, Negative Review Analysis, Upgrades • Competitive Tracking – All of the above for competitors presented as overlays • Audience Analysis – Demographics, Psychographics • Alerts – Notifications upon specific events: review spikes, Twitter spikes
  • 60. Mobilewalla Use Case – Mobile Ad Tech • New Publisher Acquisition – Top N apps & Publishers for a Category / Geography – Top publishers by audience • Optimal Traffic Allocation – Related apps by content – Related apps by Audience profile – Behavioral profiles of network apps • Real-Time, Programmatic Delivery – API driven access – Sub 100ms response times
  • 61. Mobilewalla Approach Social media / web Web Crawler Cloud Storage Amazon S3 Amazon EBS Amazon RDS
  • 62. Mobilewalla Approach – Map-Reduce based analytics Analytics Analytics Analytics Analytics Map Reduce Analytics Cloud Storage (30+ Terabyte) Amazon S3 Amazon EBS Amazon RDS
  • 63. Mobilewalla – Amazon EC2 Infrastructure Web Crawler • 700+ micro to small instances • Elastic MapReduce – flexibility of allocating a large number of instances for a distributed program running for a short time • Spot Instances – reduce the cost
  • 64. Mobilewalla – Amazon EC2 Infrastructure Cloud Storage • 50+ Medium to Large instances • Cassandra DB Nodes – EBS backed • Distributed in two availability zones in two different geographical regions • Flexibility to add nodes as and when required – allows you to grow with the business • Region based fail-over • Tiered storage systems – Local storage – Elastic Block Store – S3 Storage • Considering Amazon Redshift Amazon S3 Amazon EBS Amazon RDS
  • 65. Mobilewalla – Amazon EC2 Infrastructure Map Reduce Framework • Complex analytics jobs on Hadoop systems in EC2 nodes • Elastic MapReduce for jobs requiring a large number of nodes on S3 storage systems Analytics Analytics Analytics Analytics
  • 66. Mobilewalla – Amazon EC2 Infrastructure Analytics Delivery • Multiple application servers with load balancers • High read throughput from data nodes • Load balancers (ELB) and fail-over
  • 67. Amazon Web Services for Mobilewalla - Advantages • On-Demand and reserved nodes – Flexibility to add, modify, delete nodes as your business changes • Tiered storage systems to store and manage terabytes of data – Flexibility to change the data parameters (reliability, read throughput, write throughput) by varying the storage systems of your choice • Elastic MapReduce – Large-scale map-reduce cluster without getting into the details of managing individual nodes and the map-reduce framework • Amazon EC2 allowed us to size our infrastructure as per our need and data growth.
  • 68. Amazon Web Services for Mobilewalla - Suggestions • Take the time up front to explore Amazon's various data storage and management offerings before developing a solution • Changing the solution architecture for terabytes of data at a later time is a challenge
  • 70. Big Data Analytics Eddie Toh Regional Platform Marketing Manager Pricing & Product Marketing Group Intel APAC July 18, 2013
  • 71. Create new business models and improve organizational processes. Enhance scientific understanding, drive innovation, and accelerate medical cures. Increase public safety and improve energy efficiency with smart grids. Analysis of Data Can Transform Society
  • 72. Democratizing Analytics gets Value out of Big Data • Unlock Value in Silicon • Support Open Platforms • Deliver Software Value
  • 73. Intel at the Intersection of Big Data • HPC: Enabling exascale computing on massive data sets • Cloud: Helping enterprises build open interoperable clouds • Open Source: Contributing code and fostering ecosystem
  • 74. Intel at the Heart of the Cloud Server Storage Network
  • 75. Scale-Out Platform Optimizations for Big Data Cost-effective performance • Intel® Advanced Vector Extensions Technology • Intel® Turbo Boost Technology 2.0 • Intel® Advanced Encryption Standard New Instructions Technology
  • 76. Intel® Advanced Vector Extensions Technology • Newest in a long line of processor instruction innovations • Increases floating point operations per clock, up to 2X performance [1] • [1] Performance comparison using Linpack benchmark. See backup for configuration details. For more legal information on performance forecasts go to http://www.intel.com/performance • Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
  • 77. More Performance Higher turbo speeds maximize performance for single and multi-threaded applications Intel® Turbo Boost Technology 2.0
  • 78. Intel® Advanced Encryption Standard New Instructions • Processor assistance for performing AES encryption - 7 new instructions • Makes enabled encryption software faster and stronger
  • 79. Power of the Platform built by Intel • Richer user experiences • TeraSort for 1TB sort: 4HRS on the previous Intel® Xeon® Processor, ~7MIN on the upgraded platform • Platform upgrades: Intel® Xeon® Processor E5 2600, Solid-State Drive, 10G Ethernet, Intel® Distribution for Apache Hadoop • Reductions shown: 50%, 80%, 50%, 40%
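The slide's figures are consistent if each of the four reductions (50%, 80%, 50%, 40%) applies to the time remaining after the previous one. A quick check, assuming that compounding interpretation, which the slide does not state explicitly:

```python
BASELINE_MINUTES = 4 * 60  # "4HRS" on the slide

# The four reductions shown on the slide, as fractions of the remaining time.
reductions = [0.50, 0.80, 0.50, 0.40]


def apply_reductions(start, cuts):
    """Apply each percentage reduction to whatever time is left after the previous one."""
    remaining = start
    for cut in cuts:
        remaining *= 1 - cut
    return remaining


final_minutes = apply_reductions(BASELINE_MINUTES, reductions)
print(f"{final_minutes:.1f} minutes")
```

Because the reductions multiply, the result does not depend on the order in which the upgrades are applied, and it lands close to the "~7MIN" quoted on the slide.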