SlideShare a Scribd company logo
1 of 16
Download to read offline
Big Data in Production:
Lessons from Running in the
Cloud
Marvin Theimer
Amazon Web Services
From Prototype to Production
Wow!! I got it to work!
How soon can I have a million of those?
2
Production “ilities”
• Scalability: many jobs, running day-in and day-out
• High availability: no excuses for not delivering
results
• Maintainability: seamless upgrades, security
patches, backups, …
• Evolvability: the new person, a year from now, can
add new features to your production system
• …
3
Test Data è Valuable Data
4
Mickey Mouse
Main St
Disneyland USA
John Doe
1234 5th Ave
New York NY 10101
S3: $123
EC2: $450
Redshift: $303
Quicksight: $242
…
?
Metering records Bills My bill
Algorithmic efficiency è “mundane”
efficiency
5
Constants matter
The never-ending battle
against waste
vs.
Fine-Grained Control vs. Ease-of-Use
6
vs.
You’ll never believe the variety of workloads they’ll feed you
Scale è Automation
7
HELP!
High Availability
8
Single points of failure vs. “large scale events”
Where did it go wrong, when, and why??
Step 1
Step 1-a
Step 1-
a-1
Step 1-
1-2
Step 2
Step 2-a
Activity 1
Activity 2
Step 1
Step 2
Activity 3
Step 1
Step 2
X
Security and
Identity/Access Management
9
Mundane Efficiency
• Support for smart usage: usage reports, billing alerts, cost explorers,
life-cycle management, etc.
• Multi-tenant solutions become necessary but take a new way of
thinking
– The cloud is all about elastic resource management. But that’s only part of
the solution
– Also need help with cross-account access management, fine-grained
cost/benefit allocation, etc.
– But things like IAM, detailed billing reports, and Cost Explorer can help.
• Ditto for reusability of services, computations, data, results, etc.
10
From Pioneers to Settlers
11
foreach inputDataSet in sources
computeAlgorithm
From “do it exactly the way I tell you to” to “just do it”
Production readiness has to be built in
The potential of Big Data The production use of Big Data
12
• Automate “everything”
• Security by default
• End-to-end change journals and audit logs
• Serverless, event-driven computing
• Reproducibility, support for data amendments
• Support for relentless cost reduction
Luckily the Cloud can do most of the
heavy lifting for you
• CloudFormation , Opsworks , Spark on EMR
• VPC , IAM , Cloudtrail , AWS Config , KMS , ACM , CloudHSM
, Inspector
• CloudWatch Logs & Events , Lambda
• Storage life-cyclemanagement
– S3 , Glacier , Snowball , DirectConnect
• Tags, usage reports, billing alerts, cost allocation, detailed billing reports, cost
explorers, Trusted Advisor , QuickSight , Redshift
13
THANK YOU.
Images used in this talk
• Slide 2:
– https://pixabay.com/static/uploads/photo/2014/11/20/08/57/success-538725_960_720.jpg
– https://i.ytimg.com/vi/Nc1-gACc4lo/hqdefault.jpg
• Slide 5:
– https://upload.wikimedia.org/wikipedia/commons/8/87/IBM_card_storage.NARA.jpg
– https://upload.wikimedia.org/wikipedia/commons/1/1b/Francois_sagat_in_BLAB_LA_ZOMBIE_by_ARNO_ROCA_%2
84399970229%29.jpg
– https://upload.wikimedia.org/wikipedia/commons/thumb/7/7b/United_States_one_dollar_bill,_obverse.jpg/1024px-
United_States_one_dollar_bill,_obverse.jpg
• Slide 6:
– https://upload.wikimedia.org/wikipedia/commons/b/b8/Manusingmicroscope.jpg
http://images.forbes.com/media/2009/05/04/0504_best-paying-jobs_21.jpg
– https://cdn.milwaukeetool.com/~/media/Images/Power%20Tools/Corded/6538-21/39789_6538-21v3-lg.jpg
– https://www.milwaukeetool.ca/fr//cdn.milwaukeetool.com/~/media/Images/Power%20Tools/Corded/6520-
21/39679_6520-21v2-lg.jpg
– https://s-media-cache-ak0.pinimg.com/236x/b1/1a/af/b11aaf393cf23ef3d2baf979d0c69d22.jpg
• Slide 7:
– http://media.culturemap.com/crop/01/49/633x475/Katy-Freeway-highway-Interstate-10-traffic-traffic-jam-March-
2014_165658.jpg
– https://pixabay.com/static/uploads/photo/2013/07/13/13/41/robot-161367_960_720.png
15
Images used in this talk (cont.)
• Slide 8:
– https://upload.wikimedia.org/wikipedia/commons/2/2c/House_destroyed_by_fire_in_village_Burkovo,_Brilyakosky_Sel
sovet.jpg
– https://upload.wikimedia.org/wikipedia/commons/b/bf/22_May_2011_Joplin_tornado_damage.jp
– http://www.meldrethhistory.org.uk/images/uploaded/originals/post_office_book.jpg
– https://c2.staticflickr.com/8/7410/8981216419_2e53463a33_b.jpg
• Slide 9:
– https://pixabay.com/static/uploads/photo/2013/07/13/10/24/burglar-157142_960_720.png
– http://www.cynic.org.uk/holidays/nyc2002/manhattan-s.jpg
• Slide 11:
– http://www.legendsofamerica.com/photos-americanhistory/William-becknell.jpg
– http://1.bp.blogspot.com/_bvtos4pi3no/S-BtPUuxDkI/AAAAAAAAA0E/RnSb1l7NWPY/s1600/DSCN1071.JPG
– https://pixabay.com/static/uploads/photo/2015/04/24/22/47/san-francisco-738416_960_720.jpg
– https://c3.staticflickr.com/3/2133/2179187226_e2e107e0cd.jpg
• Slide 12:
– http://www.mountainphotography.com/images/275/20051222-Timpanogos-Sage.jpg
– http://www.systemsolutionsdevelopment.com/wp-content/uploads/2015/05/chicagoatnight.jpg
16

More Related Content

What's hot

Analytics-Enabled Experiences: The New Secret Weapon
Analytics-Enabled Experiences: The New Secret WeaponAnalytics-Enabled Experiences: The New Secret Weapon
Analytics-Enabled Experiences: The New Secret Weapon
Databricks
 

What's hot (20)

1 Introduction to Microsoft data platform analytics for release
1 Introduction to Microsoft data platform analytics for release1 Introduction to Microsoft data platform analytics for release
1 Introduction to Microsoft data platform analytics for release
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
 
Data saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overviewData saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overview
 
Data & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architectureData & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architecture
 
Big data on AWS
Big data on AWSBig data on AWS
Big data on AWS
 
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
 
IMCSummit 2015 - Day 2 Developer Track - A Reference Architecture for the Int...
IMCSummit 2015 - Day 2 Developer Track - A Reference Architecture for the Int...IMCSummit 2015 - Day 2 Developer Track - A Reference Architecture for the Int...
IMCSummit 2015 - Day 2 Developer Track - A Reference Architecture for the Int...
 
Big Data – A New Testing Challenge
Big Data – A New Testing ChallengeBig Data – A New Testing Challenge
Big Data – A New Testing Challenge
 
Analysing data analytics use cases to understand big data platform
Analysing data analytics use cases  to understand big data platformAnalysing data analytics use cases  to understand big data platform
Analysing data analytics use cases to understand big data platform
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
 
Data Driven Decisions at Scale
Data Driven Decisions at ScaleData Driven Decisions at Scale
Data Driven Decisions at Scale
 
Building Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemMLBuilding Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemML
 
IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...
IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...
IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...
 
Analytics-Enabled Experiences: The New Secret Weapon
Analytics-Enabled Experiences: The New Secret WeaponAnalytics-Enabled Experiences: The New Secret Weapon
Analytics-Enabled Experiences: The New Secret Weapon
 
Saving Energy in Homes with a Unified Approach to Data and AI
Saving Energy in Homes with a Unified Approach to Data and AISaving Energy in Homes with a Unified Approach to Data and AI
Saving Energy in Homes with a Unified Approach to Data and AI
 
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
 
Intuit Analytics Cloud 101
Intuit Analytics Cloud 101Intuit Analytics Cloud 101
Intuit Analytics Cloud 101
 
How to design and implement a data ops architecture with sdc and gcp
How to design and implement a data ops architecture with sdc and gcpHow to design and implement a data ops architecture with sdc and gcp
How to design and implement a data ops architecture with sdc and gcp
 
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & RedshiftIntroduction to Big Data Technologies:  Hadoop/EMR/Map Reduce & Redshift
Introduction to Big Data Technologies: Hadoop/EMR/Map Reduce & Redshift
 
Pivot 2.0 - The next generation visualization tool for your streaming data
Pivot 2.0 - The next generation visualization tool for your streaming dataPivot 2.0 - The next generation visualization tool for your streaming data
Pivot 2.0 - The next generation visualization tool for your streaming data
 

Viewers also liked

Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Databricks
 

Viewers also liked (20)

Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
 
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
 
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
 
Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow
 
Temporal Operators For Spark Streaming And Its Application For Office365 Serv...
Temporal Operators For Spark Streaming And Its Application For Office365 Serv...Temporal Operators For Spark Streaming And Its Application For Office365 Serv...
Temporal Operators For Spark Streaming And Its Application For Office365 Serv...
 
Huohua: A Distributed Time Series Analysis Framework For Spark
Huohua: A Distributed Time Series Analysis Framework For SparkHuohua: A Distributed Time Series Analysis Framework For Spark
Huohua: A Distributed Time Series Analysis Framework For Spark
 
Recent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced AnalyticsRecent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced Analytics
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Airstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At AirbnbAirstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At Airbnb
 
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtime Data Pipelines with Kafka Connect and Spark Streaming
Building Realtime Data Pipelines with Kafka Connect and Spark Streaming
 
Elasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlibElasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlib
 
Morticia: Visualizing And Debugging Complex Spark Workflows
Morticia: Visualizing And Debugging Complex Spark WorkflowsMorticia: Visualizing And Debugging Complex Spark Workflows
Morticia: Visualizing And Debugging Complex Spark Workflows
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
 
From MapReduce to Apache Spark
From MapReduce to Apache SparkFrom MapReduce to Apache Spark
From MapReduce to Apache Spark
 
GraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQLGraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQL
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity Clusters
 
Huawei Advanced Data Science With Spark Streaming
Huawei Advanced Data Science With Spark StreamingHuawei Advanced Data Science With Spark Streaming
Huawei Advanced Data Science With Spark Streaming
 
Spark at Bloomberg: Dynamically Composable Analytics
Spark at Bloomberg:  Dynamically Composable Analytics Spark at Bloomberg:  Dynamically Composable Analytics
Spark at Bloomberg: Dynamically Composable Analytics
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache Spark
 

Similar to Big Data in Production: Lessons from Running in the Cloud

Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud Platform
ConnectaDigital
 
Lean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big DataLean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big Data
Stylight
 
2016 DSG Webinar Azure HDInsight 2 V4
2016 DSG Webinar Azure HDInsight 2 V42016 DSG Webinar Azure HDInsight 2 V4
2016 DSG Webinar Azure HDInsight 2 V4
Janani Eshwaran
 

Similar to Big Data in Production: Lessons from Running in the Cloud (20)

Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerce
Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerceDon't Let Your Shoppers Drop; 5 Rules for Today’s eCommerce
Don't Let Your Shoppers Drop; 5 Rules for Today’s eCommerce
 
The Cloud Imperative – What, Why, When and How
The Cloud Imperative – What, Why, When and HowThe Cloud Imperative – What, Why, When and How
The Cloud Imperative – What, Why, When and How
 
AWS Webcast - Sales Productivity Solutions with MicroStrategy and Redshift
AWS Webcast - Sales Productivity Solutions with MicroStrategy and RedshiftAWS Webcast - Sales Productivity Solutions with MicroStrategy and Redshift
AWS Webcast - Sales Productivity Solutions with MicroStrategy and Redshift
 
Solving Big Data problems on AWS by Rajnish Malik
Solving Big Data problems on AWS by Rajnish MalikSolving Big Data problems on AWS by Rajnish Malik
Solving Big Data problems on AWS by Rajnish Malik
 
Tapping the cloud for real time data analytics
 Tapping the cloud for real time data analytics Tapping the cloud for real time data analytics
Tapping the cloud for real time data analytics
 
Transitioning to the Cloud: Implications for Reliability, Redundancy & Recove...
Transitioning to the Cloud: Implications for Reliability, Redundancy & Recove...Transitioning to the Cloud: Implications for Reliability, Redundancy & Recove...
Transitioning to the Cloud: Implications for Reliability, Redundancy & Recove...
 
Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...
Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...
Introducing Amazon Kinesis: Real-time Processing of Streaming Big Data (BDT10...
 
FirstEigen Brochure- All clouds.pdf
FirstEigen Brochure- All clouds.pdfFirstEigen Brochure- All clouds.pdf
FirstEigen Brochure- All clouds.pdf
 
Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...
Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...
Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...
 
Effective Cost Management for Amazon EMR
Effective Cost Management for Amazon EMREffective Cost Management for Amazon EMR
Effective Cost Management for Amazon EMR
 
Architecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High PerformanceArchitecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High Performance
 
Swiss API Day - Amazon Web Service - Digital transformation with the AWS Cloud
Swiss API Day - Amazon Web Service - Digital transformation with the AWS CloudSwiss API Day - Amazon Web Service - Digital transformation with the AWS Cloud
Swiss API Day - Amazon Web Service - Digital transformation with the AWS Cloud
 
Building Confidence in Big Data - IBM Smarter Business 2013
Building Confidence in Big Data - IBM Smarter Business 2013 Building Confidence in Big Data - IBM Smarter Business 2013
Building Confidence in Big Data - IBM Smarter Business 2013
 
Using real time big data analytics for competitive advantage
 Using real time big data analytics for competitive advantage Using real time big data analytics for competitive advantage
Using real time big data analytics for competitive advantage
 
Cloudera + Syncsort: Fuel Business Insights, Analytics, and Next Generation T...
Cloudera + Syncsort: Fuel Business Insights, Analytics, and Next Generation T...Cloudera + Syncsort: Fuel Business Insights, Analytics, and Next Generation T...
Cloudera + Syncsort: Fuel Business Insights, Analytics, and Next Generation T...
 
Bridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the CloudBridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the Cloud
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud Platform
 
Lean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big DataLean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big Data
 
Unconstrained Analytics in the Age of Data – Delivering High-Performance Anal...
Unconstrained Analytics in the Age of Data – Delivering High-Performance Anal...Unconstrained Analytics in the Age of Data – Delivering High-Performance Anal...
Unconstrained Analytics in the Age of Data – Delivering High-Performance Anal...
 
2016 DSG Webinar Azure HDInsight 2 V4
2016 DSG Webinar Azure HDInsight 2 V42016 DSG Webinar Azure HDInsight 2 V4
2016 DSG Webinar Azure HDInsight 2 V4
 

More from Jen Aman

More from Jen Aman (20)

Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei ZahariaDeep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
 
Snorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher RéSnorkel: Dark Data and Machine Learning with Christopher Ré
Snorkel: Dark Data and Machine Learning with Christopher Ré
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
 
RISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsRISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time Decisions
 
Spatial Analysis On Histological Images Using Spark
Spatial Analysis On Histological Images Using SparkSpatial Analysis On Histological Images Using Spark
Spatial Analysis On Histological Images Using Spark
 
A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat Detection
 
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In SparkYggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
 
Deploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using SparkDeploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using Spark
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Efficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesEfficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out Databases
 
Livy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache SparkLivy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache Spark
 
GPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And PythonGPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And Python
 
Spark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 FuriousSpark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 Furious
 
Spark on Mesos
Spark on MesosSpark on Mesos
Spark on Mesos
 
EclairJS = Node.Js + Apache Spark
EclairJS = Node.Js + Apache SparkEclairJS = Node.Js + Apache Spark
EclairJS = Node.Js + Apache Spark
 
Spark: Interactive To Production
Spark: Interactive To ProductionSpark: Interactive To Production
Spark: Interactive To Production
 
High-Performance Python On Spark
High-Performance Python On SparkHigh-Performance Python On Spark
High-Performance Python On Spark
 
Scalable Deep Learning Platform On Spark In Baidu
Scalable Deep Learning Platform On Spark In BaiduScalable Deep Learning Platform On Spark In Baidu
Scalable Deep Learning Platform On Spark In Baidu
 
Scaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of ParametersScaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of Parameters
 

Recently uploaded

Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Recently uploaded (20)

BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 

Big Data in Production: Lessons from Running in the Cloud

  • 1. Big Data in Production: Lessons from Running in the Cloud Marvin Theimer Amazon Web Services
  • 2. From Prototype to Production Wow!! I got it to work! How soon can I have a million of those? 2
  • 3. Production “ilities” • Scalability: many jobs, running day-in and day-out • High availability: no excuses for not delivering results • Maintainability: seamless upgrades, security patches, backups, … • Evolvability: the new person, a year from now, can add new features to your production system • … 3
  • 4. Test Data è Valuable Data 4 Mickey Mouse Main St Disneyland USA John Doe 1234 5th Ave New York NY 10101 S3: $123 EC2: $450 Redshift: $303 Quicksight: $242 … ? Metering records Bills My bill
  • 5. Algorithmic efficiency è “mundane” efficiency 5 Constants matter The never-ending battle against waste vs.
  • 6. Fine-Grained Control vs. Ease-of-Use 6 vs. You’ll never believe the variety of workloads they’ll feed you
  • 8. High Availability 8 Single points of failure vs. “large scale events” Where did it go wrong, when, and why?? Step 1 Step 1-a Step 1- a-1 Step 1- 1-2 Step 2 Step 2-a Activity 1 Activity 2 Step 1 Step 2 Activity 3 Step 1 Step 2 X
  • 10. Mundane Efficiency • Support for smart usage: usage reports, billing alerts, cost explorers, life-cycle management, etc. • Multi-tenant solutions become necessary but take a new way of thinking – The cloud is all about elastic resource management. But that’s only part of the solution – Also need help with cross-account access management, fine-grained cost/benefit allocation, etc. – But things like IAM, detailed billing reports, and Cost Explorer can help. • Ditto for reusability of services, computations, data, results, etc. 10
  • 11. From Pioneers to Settlers 11 foreach inputDataSet in sources computeAlgorithm From “do it exactly the way I tell you to” to “just do it”
  • 12. Production readiness has to be built in The potential of Big Data The production use of Big Data 12 • Automate “everything” • Security by default • End-to-end change journals and audit logs • Serverless, event-driven computing • Reproducibility, support for data amendments • Support for relentless cost reduction
  • 13. Luckily the Cloud can do most of the heavy lifting for you • CloudFormation , Opsworks , Spark on EMR • VPC , IAM , Cloudtrail , AWS Config , KMS , ACM , CloudHSM , Inspector • CloudWatch Logs & Events , Lambda • Storage life-cyclemanagement – S3 , Glacier , Snowball , DirectConnect • Tags, usage reports, billing alerts, cost allocation, detailed billing reports, cost explorers, Trusted Advisor , QuickSight , Redshift 13
  • 15. Images used in this talk • Slide 2: – https://pixabay.com/static/uploads/photo/2014/11/20/08/57/success-538725_960_720.jpg – https://i.ytimg.com/vi/Nc1-gACc4lo/hqdefault.jpg • Slide 5: – https://upload.wikimedia.org/wikipedia/commons/8/87/IBM_card_storage.NARA.jpg – https://upload.wikimedia.org/wikipedia/commons/1/1b/Francois_sagat_in_BLAB_LA_ZOMBIE_by_ARNO_ROCA_%2 84399970229%29.jpg – https://upload.wikimedia.org/wikipedia/commons/thumb/7/7b/United_States_one_dollar_bill,_obverse.jpg/1024px- United_States_one_dollar_bill,_obverse.jpg • Slide 6: – https://upload.wikimedia.org/wikipedia/commons/b/b8/Manusingmicroscope.jpg http://images.forbes.com/media/2009/05/04/0504_best-paying-jobs_21.jpg – https://cdn.milwaukeetool.com/~/media/Images/Power%20Tools/Corded/6538-21/39789_6538-21v3-lg.jpg – https://www.milwaukeetool.ca/fr//cdn.milwaukeetool.com/~/media/Images/Power%20Tools/Corded/6520- 21/39679_6520-21v2-lg.jpg – https://s-media-cache-ak0.pinimg.com/236x/b1/1a/af/b11aaf393cf23ef3d2baf979d0c69d22.jpg • Slide 7: – http://media.culturemap.com/crop/01/49/633x475/Katy-Freeway-highway-Interstate-10-traffic-traffic-jam-March- 2014_165658.jpg – https://pixabay.com/static/uploads/photo/2013/07/13/13/41/robot-161367_960_720.png 15
  • 16. Images used in this talk (cont.) • Slide 8: – https://upload.wikimedia.org/wikipedia/commons/2/2c/House_destroyed_by_fire_in_village_Burkovo,_Brilyakosky_Sel sovet.jpg – https://upload.wikimedia.org/wikipedia/commons/b/bf/22_May_2011_Joplin_tornado_damage.jp – http://www.meldrethhistory.org.uk/images/uploaded/originals/post_office_book.jpg – https://c2.staticflickr.com/8/7410/8981216419_2e53463a33_b.jpg • Slide 9: – https://pixabay.com/static/uploads/photo/2013/07/13/10/24/burglar-157142_960_720.png – http://www.cynic.org.uk/holidays/nyc2002/manhattan-s.jpg • Slide 11: – http://www.legendsofamerica.com/photos-americanhistory/William-becknell.jpg – http://1.bp.blogspot.com/_bvtos4pi3no/S-BtPUuxDkI/AAAAAAAAA0E/RnSb1l7NWPY/s1600/DSCN1071.JPG – https://pixabay.com/static/uploads/photo/2015/04/24/22/47/san-francisco-738416_960_720.jpg – https://c3.staticflickr.com/3/2133/2179187226_e2e107e0cd.jpg • Slide 12: – http://www.mountainphotography.com/images/275/20051222-Timpanogos-Sage.jpg – http://www.systemsolutionsdevelopment.com/wp-content/uploads/2015/05/chicagoatnight.jpg 16