SlideShare a Scribd company logo
1 of 9
Download to read offline
Big Data on EC2: Mashing Technology in the Cloud

ShareThis


Paco Nathan, Data Insights Team




ScaleCamp 2009-06-09
Scenario…


✤   Given: >10^6 publishers, >10^9 users, >10^10 urls

✤   Early-stage start-up, < 25 people, seeking to minimize spend on Ops
    and capex for data centers

✤   Serving widgets on page views for popular online publications: ESPN,
    HuffPost, FOX, CS Monitor, CBS Marketwatch, Wired, TechCrunch, etc.

✤   Spikes in popularity of stories leads to elastic demands throughout
    the system architecture: serving API, logging, DW, BI, etc.

✤   Business needs to improve user experience by analyzing how people
    share online content


ScaleCamp 2009-06-09
System Architecture


✤   100% infrastructure based on Amazon AWS

✤   Each component designed for cost-effective, horizontal scale-out

✤   AsterData: infrastructure based on a “hub-and-spoke” pattern of batch
    jobs and data consolidation

✤   Cascading: abstraction layer for tying together system components

✤   Batch jobs run on Amazon Elastic MapReduce and AsterData SQL/MR

✤   Vertical search based on Bixo and Katta


ScaleCamp 2009-06-09
ScaleCamp 2009-06-09
Cascading

✤   Syntax is for humans, APIs are for software

✤   Cascading defines apps in terms of functions applied to data flows,
    incorporates end-points; whereas MR is coarse, key/values are brittle

✤   Direct benefits to team’s responsibilities, process, deliverables

✤   Also consider impact on team size, composition, staffing, training

✤   Ideal for provisioning/monitoring elastic resources, such as EMR

✤   Great potential to allow for migrating apps

✤   (see also: Cascading talk @ Hadoop Summit tomorrow, 6pm)

ScaleCamp 2009-06-09
Amazon Elastic MapReduce

✤   Can scale-out wide and select different instance types at launch

✤   Reads input from S3, writes output to S3, optionally copies logs to S3

✤   Excellent command line tools make dev/test/debug cycle more efficient

✤   Risks for DIY Hadoop clusters: time+cost to acquire leases, data locality,
    etc., — EMR resolves these to make large apps more cost-effective

✤   Simplifies needs for Ops — which can be quite difficult and expensive
    to staff for Big Data apps, and pose a risk to scale-out strategy

✤   See the Cascading examples: LogAnalyzer for CloudFront and Multitool

ScaleCamp 2009-06-09
AsterData nCluster Cloud Edition

✤   Scalable, fault-tolerant, relational database — directly in the cloud

✤   Takes only minutes to go from idea to a prototype which scales well;
    accessible by data scientists who have less coding background

✤   Use of SQL unions on map phase, plus other SQL primitives for shuffle
    and reduce phases, allows for complex MR workflows

✤   Leveraging SQL primitives during shuffle and reduce phases allows for
    automated optimizations

✤   Contributed code among developer community, implementing libraries
    for popular algorithms based on In-Database MapReduce


ScaleCamp 2009-06-09
ShareThis as Case Study

See also: ShareThis case study, "Cascading"
by Chris K Wensel, in…




ScaleCamp 2009-06-09
Contacts:
http://sharethis.com
@pacoid on Twitter
http://cascading.org
http://asterdata.com/product/ncluster_cloud.php
http://aws.amazon.com/elasticmapreduce
http://github.com/emi/bixo
http://katta.sourceforge.net
http://www.hadoopbook.com




ScaleCamp 2009-06-09

More Related Content

What's hot

PowerStream: Propelling Energy Innovation with Predictive Analytics
PowerStream: Propelling Energy Innovation with Predictive Analytics PowerStream: Propelling Energy Innovation with Predictive Analytics
PowerStream: Propelling Energy Innovation with Predictive Analytics SingleStore
 
High availability, real-time and scalable architectures
High availability, real-time and scalable architecturesHigh availability, real-time and scalable architectures
High availability, real-time and scalable architecturesJampp
 
Modeling the Smart and Connected City of the Future with Kafka and Spark
Modeling the Smart and Connected City of the Future with Kafka and SparkModeling the Smart and Connected City of the Future with Kafka and Spark
Modeling the Smart and Connected City of the Future with Kafka and SparkSingleStore
 
Processing 19 billion messages in real time and NOT dying in the process
Processing 19 billion messages in real time and NOT dying in the processProcessing 19 billion messages in real time and NOT dying in the process
Processing 19 billion messages in real time and NOT dying in the processJampp
 
Razorfish - Amazon EMR usecase
Razorfish - Amazon EMR usecaseRazorfish - Amazon EMR usecase
Razorfish - Amazon EMR usecaseguestda111d9
 
Big problems Big Data, simple solutions
Big problems Big Data, simple solutionsBig problems Big Data, simple solutions
Big problems Big Data, simple solutionsClaudio Pontili
 
Multi objective tasks scheduling algorithm for cloud computing throughput opt...
Multi objective tasks scheduling algorithm for cloud computing throughput opt...Multi objective tasks scheduling algorithm for cloud computing throughput opt...
Multi objective tasks scheduling algorithm for cloud computing throughput opt...Shakas Technologies
 
Change Data Capture - Scale by the Bay 2019
Change Data Capture - Scale by the Bay 2019Change Data Capture - Scale by the Bay 2019
Change Data Capture - Scale by the Bay 2019Petr Zapletal
 
Santa Cloud: How Netflix Does Holiday Capacity Planning - South Bay SRE Meetu...
Santa Cloud: How Netflix Does Holiday Capacity Planning - South Bay SRE Meetu...Santa Cloud: How Netflix Does Holiday Capacity Planning - South Bay SRE Meetu...
Santa Cloud: How Netflix Does Holiday Capacity Planning - South Bay SRE Meetu...Coburn Watson
 
Howtomakeyourown gi sdashboard
Howtomakeyourown gi sdashboardHowtomakeyourown gi sdashboard
Howtomakeyourown gi sdashboardGeoMedeelel
 
0 supermapproductsintroduction
0 supermapproductsintroduction0 supermapproductsintroduction
0 supermapproductsintroductionGeoMedeelel
 
02 supermapiclientforjavascriptintroduction
02 supermapiclientforjavascriptintroduction02 supermapiclientforjavascriptintroduction
02 supermapiclientforjavascriptintroductionGeoMedeelel
 
01 supermapiportaloverview
01 supermapiportaloverview01 supermapiportaloverview
01 supermapiportaloverviewGeoMedeelel
 
Autonomous analytics on streaming data
Autonomous analytics on streaming dataAutonomous analytics on streaming data
Autonomous analytics on streaming dataClaudiu Barbura
 
01 supermapiserverintroduction
01 supermapiserverintroduction01 supermapiserverintroduction
01 supermapiserverintroductionGeoMedeelel
 
Scalable crawling with Kafka, scrapy and spark - November 2021
Scalable crawling with Kafka, scrapy and spark - November 2021Scalable crawling with Kafka, scrapy and spark - November 2021
Scalable crawling with Kafka, scrapy and spark - November 2021Max Lapan
 
AWS Customer Presentation - Cloud Made
AWS Customer Presentation - Cloud MadeAWS Customer Presentation - Cloud Made
AWS Customer Presentation - Cloud MadeAmazon Web Services
 
Cloud computing's truly open silver lining: OpenStack
Cloud computing's truly open silver lining: OpenStackCloud computing's truly open silver lining: OpenStack
Cloud computing's truly open silver lining: OpenStackAsociatia ProLinux
 
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and SparkSpark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and SparkSingleStore
 

What's hot (20)

PowerStream: Propelling Energy Innovation with Predictive Analytics
PowerStream: Propelling Energy Innovation with Predictive Analytics PowerStream: Propelling Energy Innovation with Predictive Analytics
PowerStream: Propelling Energy Innovation with Predictive Analytics
 
High availability, real-time and scalable architectures
High availability, real-time and scalable architecturesHigh availability, real-time and scalable architectures
High availability, real-time and scalable architectures
 
Modeling the Smart and Connected City of the Future with Kafka and Spark
Modeling the Smart and Connected City of the Future with Kafka and SparkModeling the Smart and Connected City of the Future with Kafka and Spark
Modeling the Smart and Connected City of the Future with Kafka and Spark
 
Processing 19 billion messages in real time and NOT dying in the process
Processing 19 billion messages in real time and NOT dying in the processProcessing 19 billion messages in real time and NOT dying in the process
Processing 19 billion messages in real time and NOT dying in the process
 
Razorfish - Amazon EMR usecase
Razorfish - Amazon EMR usecaseRazorfish - Amazon EMR usecase
Razorfish - Amazon EMR usecase
 
Big problems Big Data, simple solutions
Big problems Big Data, simple solutionsBig problems Big Data, simple solutions
Big problems Big Data, simple solutions
 
Multi objective tasks scheduling algorithm for cloud computing throughput opt...
Multi objective tasks scheduling algorithm for cloud computing throughput opt...Multi objective tasks scheduling algorithm for cloud computing throughput opt...
Multi objective tasks scheduling algorithm for cloud computing throughput opt...
 
Change Data Capture - Scale by the Bay 2019
Change Data Capture - Scale by the Bay 2019Change Data Capture - Scale by the Bay 2019
Change Data Capture - Scale by the Bay 2019
 
Santa Cloud: How Netflix Does Holiday Capacity Planning - South Bay SRE Meetu...
Santa Cloud: How Netflix Does Holiday Capacity Planning - South Bay SRE Meetu...Santa Cloud: How Netflix Does Holiday Capacity Planning - South Bay SRE Meetu...
Santa Cloud: How Netflix Does Holiday Capacity Planning - South Bay SRE Meetu...
 
Howtomakeyourown gi sdashboard
Howtomakeyourown gi sdashboardHowtomakeyourown gi sdashboard
Howtomakeyourown gi sdashboard
 
0 supermapproductsintroduction
0 supermapproductsintroduction0 supermapproductsintroduction
0 supermapproductsintroduction
 
Practical Experiences With ArcGIS Server
Practical Experiences With ArcGIS ServerPractical Experiences With ArcGIS Server
Practical Experiences With ArcGIS Server
 
02 supermapiclientforjavascriptintroduction
02 supermapiclientforjavascriptintroduction02 supermapiclientforjavascriptintroduction
02 supermapiclientforjavascriptintroduction
 
01 supermapiportaloverview
01 supermapiportaloverview01 supermapiportaloverview
01 supermapiportaloverview
 
Autonomous analytics on streaming data
Autonomous analytics on streaming dataAutonomous analytics on streaming data
Autonomous analytics on streaming data
 
01 supermapiserverintroduction
01 supermapiserverintroduction01 supermapiserverintroduction
01 supermapiserverintroduction
 
Scalable crawling with Kafka, scrapy and spark - November 2021
Scalable crawling with Kafka, scrapy and spark - November 2021Scalable crawling with Kafka, scrapy and spark - November 2021
Scalable crawling with Kafka, scrapy and spark - November 2021
 
AWS Customer Presentation - Cloud Made
AWS Customer Presentation - Cloud MadeAWS Customer Presentation - Cloud Made
AWS Customer Presentation - Cloud Made
 
Cloud computing's truly open silver lining: OpenStack
Cloud computing's truly open silver lining: OpenStackCloud computing's truly open silver lining: OpenStack
Cloud computing's truly open silver lining: OpenStack
 
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and SparkSpark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
Spark Summit West 2017: Real-Time Image Recognition with MemSQL and Spark
 

Viewers also liked

Ahead of the curve with Ecommerce
Ahead of the curve with EcommerceAhead of the curve with Ecommerce
Ahead of the curve with EcommerceRahul Singh
 
Social Media for Atlantic Canada Marketing
Social Media for Atlantic Canada MarketingSocial Media for Atlantic Canada Marketing
Social Media for Atlantic Canada MarketingMediaBadger
 
Women and Social Media in Canada
Women and Social Media in CanadaWomen and Social Media in Canada
Women and Social Media in CanadaAyelet Baron
 
How Canadians Shop
How Canadians ShopHow Canadians Shop
How Canadians Shopguest970d5
 
Kristof Coussement - The Debate: the Future of (Big) Data Analytics Software
Kristof Coussement - The Debate: the Future of (Big) Data Analytics SoftwareKristof Coussement - The Debate: the Future of (Big) Data Analytics Software
Kristof Coussement - The Debate: the Future of (Big) Data Analytics SoftwareBAQMaR
 
Histmojurassicpark 090807051516-phpapp01
Histmojurassicpark 090807051516-phpapp01Histmojurassicpark 090807051516-phpapp01
Histmojurassicpark 090807051516-phpapp01ajeesh n
 
eMarketer Webinar: Demographics in Canada—Age-based Digital Behaviors
eMarketer Webinar: Demographics in Canada—Age-based Digital BehaviorseMarketer Webinar: Demographics in Canada—Age-based Digital Behaviors
eMarketer Webinar: Demographics in Canada—Age-based Digital BehaviorseMarketer
 
Animatronics
AnimatronicsAnimatronics
AnimatronicsPLANB1
 
The Impact of Big Data On Marketing Analytics (UpStream Software)
The Impact of Big Data On Marketing Analytics (UpStream Software)The Impact of Big Data On Marketing Analytics (UpStream Software)
The Impact of Big Data On Marketing Analytics (UpStream Software)Revolution Analytics
 
Ultra wideband technology (UWB)
Ultra wideband technology (UWB)Ultra wideband technology (UWB)
Ultra wideband technology (UWB)Mustafa Khaleel
 
Canada country presentation_presentation
Canada country presentation_presentationCanada country presentation_presentation
Canada country presentation_presentationTeresa Maroto Ramos
 
Wireless Application Protocol ppt
Wireless Application Protocol pptWireless Application Protocol ppt
Wireless Application Protocol pptgo2project
 

Viewers also liked (20)

Ahead of the curve with Ecommerce
Ahead of the curve with EcommerceAhead of the curve with Ecommerce
Ahead of the curve with Ecommerce
 
Social Media for Atlantic Canada Marketing
Social Media for Atlantic Canada MarketingSocial Media for Atlantic Canada Marketing
Social Media for Atlantic Canada Marketing
 
Social media for brands.pdf
Social media for brands.pdfSocial media for brands.pdf
Social media for brands.pdf
 
Canada
CanadaCanada
Canada
 
Women and Social Media in Canada
Women and Social Media in CanadaWomen and Social Media in Canada
Women and Social Media in Canada
 
How Canadians Shop
How Canadians ShopHow Canadians Shop
How Canadians Shop
 
Canadaeconomics
CanadaeconomicsCanadaeconomics
Canadaeconomics
 
Maglev trains
Maglev trainsMaglev trains
Maglev trains
 
Wind power forecasting accuracy and uncertainty in Finland
Wind power forecasting accuracy and uncertainty in FinlandWind power forecasting accuracy and uncertainty in Finland
Wind power forecasting accuracy and uncertainty in Finland
 
Kristof Coussement - The Debate: the Future of (Big) Data Analytics Software
Kristof Coussement - The Debate: the Future of (Big) Data Analytics SoftwareKristof Coussement - The Debate: the Future of (Big) Data Analytics Software
Kristof Coussement - The Debate: the Future of (Big) Data Analytics Software
 
Animatronics Portfolio
Animatronics PortfolioAnimatronics Portfolio
Animatronics Portfolio
 
Histmojurassicpark 090807051516-phpapp01
Histmojurassicpark 090807051516-phpapp01Histmojurassicpark 090807051516-phpapp01
Histmojurassicpark 090807051516-phpapp01
 
eMarketer Webinar: Demographics in Canada—Age-based Digital Behaviors
eMarketer Webinar: Demographics in Canada—Age-based Digital BehaviorseMarketer Webinar: Demographics in Canada—Age-based Digital Behaviors
eMarketer Webinar: Demographics in Canada—Age-based Digital Behaviors
 
Stan winston & animatronics
Stan winston & animatronicsStan winston & animatronics
Stan winston & animatronics
 
Animatronics
AnimatronicsAnimatronics
Animatronics
 
Maglev trains ppt
Maglev trains pptMaglev trains ppt
Maglev trains ppt
 
The Impact of Big Data On Marketing Analytics (UpStream Software)
The Impact of Big Data On Marketing Analytics (UpStream Software)The Impact of Big Data On Marketing Analytics (UpStream Software)
The Impact of Big Data On Marketing Analytics (UpStream Software)
 
Ultra wideband technology (UWB)
Ultra wideband technology (UWB)Ultra wideband technology (UWB)
Ultra wideband technology (UWB)
 
Canada country presentation_presentation
Canada country presentation_presentationCanada country presentation_presentation
Canada country presentation_presentation
 
Wireless Application Protocol ppt
Wireless Application Protocol pptWireless Application Protocol ppt
Wireless Application Protocol ppt
 

Similar to Big Data on EC2: Mashing Tech in the Cloud

Slides: Proven Strategies for Hybrid Cloud Computing with Mainframes — From A...
Slides: Proven Strategies for Hybrid Cloud Computing with Mainframes — From A...Slides: Proven Strategies for Hybrid Cloud Computing with Mainframes — From A...
Slides: Proven Strategies for Hybrid Cloud Computing with Mainframes — From A...DATAVERSITY
 
AWS Start-Up Tour 2009 / ShareThis
AWS Start-Up Tour 2009 / ShareThisAWS Start-Up Tour 2009 / ShareThis
AWS Start-Up Tour 2009 / ShareThisPaco Nathan
 
High Performance Computing on AWS: Accelerating Innovation with virtually unl...
High Performance Computing on AWS: Accelerating Innovation with virtually unl...High Performance Computing on AWS: Accelerating Innovation with virtually unl...
High Performance Computing on AWS: Accelerating Innovation with virtually unl...Amazon Web Services
 
Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS Amazon Web Services
 
Cloud Computing Realities - Getting past the hype and setting your cloud stra...
Cloud Computing Realities - Getting past the hype and setting your cloud stra...Cloud Computing Realities - Getting past the hype and setting your cloud stra...
Cloud Computing Realities - Getting past the hype and setting your cloud stra...Compuware APM
 
Apache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterpriseApache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterprisejbellis
 
ESA and the Cloud
ESA and the CloudESA and the Cloud
ESA and the CloudNetcetera
 
Cloud Computing ...changes everything
Cloud Computing ...changes everythingCloud Computing ...changes everything
Cloud Computing ...changes everythingLew Tucker
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analyticsconfluent
 
Simplify Your Migration to AWS and Cut Costs by 30% with TSO Logic
 Simplify Your Migration to AWS and Cut Costs by 30% with TSO Logic Simplify Your Migration to AWS and Cut Costs by 30% with TSO Logic
Simplify Your Migration to AWS and Cut Costs by 30% with TSO LogicAmazon Web Services
 
Optimizing Cost and Capacity for Compute - CMP302 - Santa Clara AWS Summit
Optimizing Cost and Capacity for Compute - CMP302 - Santa Clara AWS SummitOptimizing Cost and Capacity for Compute - CMP302 - Santa Clara AWS Summit
Optimizing Cost and Capacity for Compute - CMP302 - Santa Clara AWS SummitAmazon Web Services
 
Cloud Has Become the New Normal: TCS
Cloud Has Become the New Normal: TCS Cloud Has Become the New Normal: TCS
Cloud Has Become the New Normal: TCS Amazon Web Services
 
Zeller Edm Summit Agile Deployment Of Predictive Analytics
Zeller Edm Summit   Agile Deployment Of Predictive AnalyticsZeller Edm Summit   Agile Deployment Of Predictive Analytics
Zeller Edm Summit Agile Deployment Of Predictive AnalyticsRonald.Ramos
 
High Performance Computing on AWS
High Performance Computing on AWSHigh Performance Computing on AWS
High Performance Computing on AWSAmazon Web Services
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computingwebscale
 
AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)
AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)
AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)Amazon Web Services
 
Giga Spaces Alternative To GAE_JavaOne 09
Giga Spaces Alternative To GAE_JavaOne 09Giga Spaces Alternative To GAE_JavaOne 09
Giga Spaces Alternative To GAE_JavaOne 09Amnon Raviv
 
AWS Summit 2018 Summary
AWS Summit 2018 SummaryAWS Summit 2018 Summary
AWS Summit 2018 SummaryAshish Mrig
 
Cloud Architecture - Multi Cloud, Edge, On-Premise
Cloud Architecture - Multi Cloud, Edge, On-PremiseCloud Architecture - Multi Cloud, Edge, On-Premise
Cloud Architecture - Multi Cloud, Edge, On-PremiseAraf Karsh Hamid
 
Apache Hadoop India Summit 2011 talk "Making Hadoop Enterprise Ready with Am...
Apache Hadoop India Summit 2011 talk  "Making Hadoop Enterprise Ready with Am...Apache Hadoop India Summit 2011 talk  "Making Hadoop Enterprise Ready with Am...
Apache Hadoop India Summit 2011 talk "Making Hadoop Enterprise Ready with Am...Yahoo Developer Network
 

Similar to Big Data on EC2: Mashing Tech in the Cloud (20)

Slides: Proven Strategies for Hybrid Cloud Computing with Mainframes — From A...
Slides: Proven Strategies for Hybrid Cloud Computing with Mainframes — From A...Slides: Proven Strategies for Hybrid Cloud Computing with Mainframes — From A...
Slides: Proven Strategies for Hybrid Cloud Computing with Mainframes — From A...
 
AWS Start-Up Tour 2009 / ShareThis
AWS Start-Up Tour 2009 / ShareThisAWS Start-Up Tour 2009 / ShareThis
AWS Start-Up Tour 2009 / ShareThis
 
High Performance Computing on AWS: Accelerating Innovation with virtually unl...
High Performance Computing on AWS: Accelerating Innovation with virtually unl...High Performance Computing on AWS: Accelerating Innovation with virtually unl...
High Performance Computing on AWS: Accelerating Innovation with virtually unl...
 
Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS Building a Big Data & Analytics Platform using AWS
Building a Big Data & Analytics Platform using AWS
 
Cloud Computing Realities - Getting past the hype and setting your cloud stra...
Cloud Computing Realities - Getting past the hype and setting your cloud stra...Cloud Computing Realities - Getting past the hype and setting your cloud stra...
Cloud Computing Realities - Getting past the hype and setting your cloud stra...
 
Apache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterpriseApache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterprise
 
ESA and the Cloud
ESA and the CloudESA and the Cloud
ESA and the Cloud
 
Cloud Computing ...changes everything
Cloud Computing ...changes everythingCloud Computing ...changes everything
Cloud Computing ...changes everything
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analytics
 
Simplify Your Migration to AWS and Cut Costs by 30% with TSO Logic
 Simplify Your Migration to AWS and Cut Costs by 30% with TSO Logic Simplify Your Migration to AWS and Cut Costs by 30% with TSO Logic
Simplify Your Migration to AWS and Cut Costs by 30% with TSO Logic
 
Optimizing Cost and Capacity for Compute - CMP302 - Santa Clara AWS Summit
Optimizing Cost and Capacity for Compute - CMP302 - Santa Clara AWS SummitOptimizing Cost and Capacity for Compute - CMP302 - Santa Clara AWS Summit
Optimizing Cost and Capacity for Compute - CMP302 - Santa Clara AWS Summit
 
Cloud Has Become the New Normal: TCS
Cloud Has Become the New Normal: TCS Cloud Has Become the New Normal: TCS
Cloud Has Become the New Normal: TCS
 
Zeller Edm Summit Agile Deployment Of Predictive Analytics
Zeller Edm Summit   Agile Deployment Of Predictive AnalyticsZeller Edm Summit   Agile Deployment Of Predictive Analytics
Zeller Edm Summit Agile Deployment Of Predictive Analytics
 
High Performance Computing on AWS
High Performance Computing on AWSHigh Performance Computing on AWS
High Performance Computing on AWS
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)
AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)
AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)
 
Giga Spaces Alternative To GAE_JavaOne 09
Giga Spaces Alternative To GAE_JavaOne 09Giga Spaces Alternative To GAE_JavaOne 09
Giga Spaces Alternative To GAE_JavaOne 09
 
AWS Summit 2018 Summary
AWS Summit 2018 SummaryAWS Summit 2018 Summary
AWS Summit 2018 Summary
 
Cloud Architecture - Multi Cloud, Edge, On-Premise
Cloud Architecture - Multi Cloud, Edge, On-PremiseCloud Architecture - Multi Cloud, Edge, On-Premise
Cloud Architecture - Multi Cloud, Edge, On-Premise
 
Apache Hadoop India Summit 2011 talk "Making Hadoop Enterprise Ready with Am...
Apache Hadoop India Summit 2011 talk  "Making Hadoop Enterprise Ready with Am...Apache Hadoop India Summit 2011 talk  "Making Hadoop Enterprise Ready with Am...
Apache Hadoop India Summit 2011 talk "Making Hadoop Enterprise Ready with Am...
 

More from George Ang

Wrapper induction construct wrappers automatically to extract information f...
Wrapper induction   construct wrappers automatically to extract information f...Wrapper induction   construct wrappers automatically to extract information f...
Wrapper induction construct wrappers automatically to extract information f...George Ang
 
Opinion mining and summarization
Opinion mining and summarizationOpinion mining and summarization
Opinion mining and summarizationGeorge Ang
 
Huffman coding
Huffman codingHuffman coding
Huffman codingGeorge Ang
 
Do not crawl in the dust 
different ur ls similar text
Do not crawl in the dust 
different ur ls similar textDo not crawl in the dust 
different ur ls similar text
Do not crawl in the dust 
different ur ls similar textGeorge Ang
 
大规模数据处理的那些事儿
大规模数据处理的那些事儿大规模数据处理的那些事儿
大规模数据处理的那些事儿George Ang
 
腾讯大讲堂02 休闲游戏发展的文化趋势
腾讯大讲堂02 休闲游戏发展的文化趋势腾讯大讲堂02 休闲游戏发展的文化趋势
腾讯大讲堂02 休闲游戏发展的文化趋势George Ang
 
腾讯大讲堂03 qq邮箱成长历程
腾讯大讲堂03 qq邮箱成长历程腾讯大讲堂03 qq邮箱成长历程
腾讯大讲堂03 qq邮箱成长历程George Ang
 
腾讯大讲堂04 im qq
腾讯大讲堂04 im qq腾讯大讲堂04 im qq
腾讯大讲堂04 im qqGeorge Ang
 
腾讯大讲堂05 面向对象应对之道
腾讯大讲堂05 面向对象应对之道腾讯大讲堂05 面向对象应对之道
腾讯大讲堂05 面向对象应对之道George Ang
 
腾讯大讲堂06 qq邮箱性能优化
腾讯大讲堂06 qq邮箱性能优化腾讯大讲堂06 qq邮箱性能优化
腾讯大讲堂06 qq邮箱性能优化George Ang
 
腾讯大讲堂07 qq空间
腾讯大讲堂07 qq空间腾讯大讲堂07 qq空间
腾讯大讲堂07 qq空间George Ang
 
腾讯大讲堂08 可扩展web架构探讨
腾讯大讲堂08 可扩展web架构探讨腾讯大讲堂08 可扩展web架构探讨
腾讯大讲堂08 可扩展web架构探讨George Ang
 
腾讯大讲堂09 如何建设高性能网站
腾讯大讲堂09 如何建设高性能网站腾讯大讲堂09 如何建设高性能网站
腾讯大讲堂09 如何建设高性能网站George Ang
 
腾讯大讲堂01 移动qq产品发展历程
腾讯大讲堂01 移动qq产品发展历程腾讯大讲堂01 移动qq产品发展历程
腾讯大讲堂01 移动qq产品发展历程George Ang
 
腾讯大讲堂10 customer engagement
腾讯大讲堂10 customer engagement腾讯大讲堂10 customer engagement
腾讯大讲堂10 customer engagementGeorge Ang
 
腾讯大讲堂11 拍拍ce工作经验分享
腾讯大讲堂11 拍拍ce工作经验分享腾讯大讲堂11 拍拍ce工作经验分享
腾讯大讲堂11 拍拍ce工作经验分享George Ang
 
腾讯大讲堂14 qq直播(qq live) 介绍
腾讯大讲堂14 qq直播(qq live) 介绍腾讯大讲堂14 qq直播(qq live) 介绍
腾讯大讲堂14 qq直播(qq live) 介绍George Ang
 
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍George Ang
 
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍George Ang
 
腾讯大讲堂16 产品经理工作心得分享
腾讯大讲堂16 产品经理工作心得分享腾讯大讲堂16 产品经理工作心得分享
腾讯大讲堂16 产品经理工作心得分享George Ang
 

More from George Ang (20)

Wrapper induction construct wrappers automatically to extract information f...
Wrapper induction   construct wrappers automatically to extract information f...Wrapper induction   construct wrappers automatically to extract information f...
Wrapper induction construct wrappers automatically to extract information f...
 
Opinion mining and summarization
Opinion mining and summarizationOpinion mining and summarization
Opinion mining and summarization
 
Huffman coding
Huffman codingHuffman coding
Huffman coding
 
Do not crawl in the dust 
different ur ls similar text
Do not crawl in the dust 
different ur ls similar textDo not crawl in the dust 
different ur ls similar text
Do not crawl in the dust 
different ur ls similar text
 
大规模数据处理的那些事儿
大规模数据处理的那些事儿大规模数据处理的那些事儿
大规模数据处理的那些事儿
 
腾讯大讲堂02 休闲游戏发展的文化趋势
腾讯大讲堂02 休闲游戏发展的文化趋势腾讯大讲堂02 休闲游戏发展的文化趋势
腾讯大讲堂02 休闲游戏发展的文化趋势
 
腾讯大讲堂03 qq邮箱成长历程
腾讯大讲堂03 qq邮箱成长历程腾讯大讲堂03 qq邮箱成长历程
腾讯大讲堂03 qq邮箱成长历程
 
腾讯大讲堂04 im qq
腾讯大讲堂04 im qq腾讯大讲堂04 im qq
腾讯大讲堂04 im qq
 
腾讯大讲堂05 面向对象应对之道
腾讯大讲堂05 面向对象应对之道腾讯大讲堂05 面向对象应对之道
腾讯大讲堂05 面向对象应对之道
 
腾讯大讲堂06 qq邮箱性能优化
腾讯大讲堂06 qq邮箱性能优化腾讯大讲堂06 qq邮箱性能优化
腾讯大讲堂06 qq邮箱性能优化
 
腾讯大讲堂07 qq空间
腾讯大讲堂07 qq空间腾讯大讲堂07 qq空间
腾讯大讲堂07 qq空间
 
腾讯大讲堂08 可扩展web架构探讨
腾讯大讲堂08 可扩展web架构探讨腾讯大讲堂08 可扩展web架构探讨
腾讯大讲堂08 可扩展web架构探讨
 
腾讯大讲堂09 如何建设高性能网站
腾讯大讲堂09 如何建设高性能网站腾讯大讲堂09 如何建设高性能网站
腾讯大讲堂09 如何建设高性能网站
 
腾讯大讲堂01 移动qq产品发展历程
腾讯大讲堂01 移动qq产品发展历程腾讯大讲堂01 移动qq产品发展历程
腾讯大讲堂01 移动qq产品发展历程
 
腾讯大讲堂10 customer engagement
腾讯大讲堂10 customer engagement腾讯大讲堂10 customer engagement
腾讯大讲堂10 customer engagement
 
腾讯大讲堂11 拍拍ce工作经验分享
腾讯大讲堂11 拍拍ce工作经验分享腾讯大讲堂11 拍拍ce工作经验分享
腾讯大讲堂11 拍拍ce工作经验分享
 
腾讯大讲堂14 qq直播(qq live) 介绍
腾讯大讲堂14 qq直播(qq live) 介绍腾讯大讲堂14 qq直播(qq live) 介绍
腾讯大讲堂14 qq直播(qq live) 介绍
 
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
 
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
 
腾讯大讲堂16 产品经理工作心得分享
腾讯大讲堂16 产品经理工作心得分享腾讯大讲堂16 产品经理工作心得分享
腾讯大讲堂16 产品经理工作心得分享
 

Recently uploaded

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 

Recently uploaded (20)

Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 

Big Data on EC2: Mashing Tech in the Cloud

  • 1. Big Data on EC2: Mashing Technology in the Cloud ShareThis Paco Nathan, Data Insights Team ScaleCamp 2009-06-09
  • 2. Scenario… ✤ Given: >10^6 publishers, >10^9 users, >10^10 urls ✤ Early-stage start-up, < 25 people, seeking to minimize spend on Ops and capex for data centers ✤ Serving widgets on page views for popular online publications: ESPN, HuffPost, FOX, CS Monitor, CBS Marketwatch, Wired, TechCrunch, etc. ✤ Spikes in popularity of stories leads to elastic demands throughout the system architecture: serving API, logging, DW, BI, etc. ✤ Business needs to improve user experience by analyzing how people share online content ScaleCamp 2009-06-09
  • 3. System Architecture ✤ 100% infrastructure based on Amazon AWS ✤ Each component designed for cost-effective, horizontal scale-out ✤ AsterData: infrastructure based on a “hub-and-spoke” pattern of batch jobs and data consolidation ✤ Cascading: abstraction layer for tying together system components ✤ Batch jobs run on Amazon Elastic MapReduce and AsterData SQL/MR ✤ Vertical search based on Bixo and Katta ScaleCamp 2009-06-09
  • 5. Cascading ✤ Syntax is for humans, APIs are for software ✤ Cascading defines apps in terms of functions applied to data flows, incorporates end-points; whereas MR is coarse, key/values are brittle ✤ Direct benefits to team’s responsibilities, process, deliverables ✤ Also consider impact on team size, composition, staffing, training ✤ Ideal for provisioning/monitoring elastic resources, such as EMR ✤ Great potential to allow for migrating apps ✤ (see also: Cascading talk @ Hadoop Summit tomorrow, 6pm) ScaleCamp 2009-06-09
  • 6. Amazon Elastic MapReduce ✤ Can scale-out wide and select different instance types at launch ✤ Reads input from S3, writes output to S3, optionally copies logs to S3 ✤ Excellent command line tools make dev/test/debug cycle more efficient ✤ Risks for DIY Hadoop clusters: time+cost to acquire leases, data locality, etc., — EMR resolves these to make large apps more cost-effective ✤ Simplifies needs for Ops — which can be quite difficult and expensive to staff for Big Data apps, and pose a risk to scale-out strategy ✤ See the Cascading examples: LogAnalyzer for CloudFront and Multitool ScaleCamp 2009-06-09
  • 7. AsterData nCluster Cloud Edition ✤ Scalable, fault-tolerant, relational database — directly in the cloud ✤ Takes only minutes to go from idea to a prototype which scales well; accessible by data scientists who have less coding background ✤ Use of SQL unions on map phase, plus other SQL primitives for shuffle and reduce phases, allows for complex MR workflows ✤ Leveraging SQL primitives during shuffle and reduce phases allows for automated optimizations ✤ Contributed code among developer community, implementing libraries for popular algorithms based on In-Database MapReduce ScaleCamp 2009-06-09
  • 8. ShareThis as Case Study See also: ShareThis case study, "Cascading" by Chris K Wensel, in… ScaleCamp 2009-06-09