Big Data Marketing in the AWS Cloud: Improving Cross-Media Effectiveness
  • We’ve been operating the service for over 3 years now, and in the last year alone we’ve operated over 2 million Hadoop clusters.
  • The Forrester Wave report named Amazon EMR the #1 enterprise Hadoop solution because of its integration with various data stores, its ecosystem of vendors, and the number of customers the service supports.
  • Hi, my name is Anupam Singh. I am the Vice President of Technology at MarketShare.
  • MarketShare builds solutions for marketing organizations at Fortune 100 companies. Our customers provide us with data, and we provide cloud-based analytic applications that improve the efficiency of their marketing.
  • So, what are the big challenges that we face? Our entire business is based on scaling complex data modeling. Our scaling challenges span four major dimensions. Each customer has tens of terabytes of data. The data comes from hundreds of data sources. This data has thousands of variables to analyze. And we need to do this for hundreds of customers. Let us look at the various stages of building a solution that scales.
  • The first stage is bringing the data together. Today’s marketing organization is faced with hundreds of data sources. Consider this picture, where we bring together data from the customer’s website, the advertising logs from their vendors, revenue data from the ERP systems, and variables like seasonality and the economy. As you can see, we have to gather more than 40 data sources in this single picture. Just managing the storage for daily, weekly and monthly updates is a challenge.
  • A lot of this data is machine generated, and it is not ready for analytics. Each data source has to be scrubbed and cleaned through an ETL pipeline before doing analytics. Our ETL pipelines have 20-30 main stages with hundreds of sub-stages. Scheduling these and correcting data errors is one of our biggest technical challenges. We will dive deeper into this later. Once the data has been cleaned, it is ready for analytics.
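The scheduling problem described in this note can be pictured as ordering stages by their dependencies. The following is a minimal sketch only, assuming a small hypothetical set of stage names and dependencies (none of them come from MarketShare's pipeline) and using Python's standard `graphlib` module to group stages into waves that could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical stages: each key maps to the set of stages it depends on.
STAGES = {
    "load_web_logs": set(),
    "load_ad_logs": set(),
    "clean_web_logs": {"load_web_logs"},
    "clean_ad_logs": {"load_ad_logs"},
    "join_sources": {"clean_web_logs", "clean_ad_logs"},
}


def schedule_in_waves(stages):
    """Group stages into waves; every stage in a wave can run in parallel."""
    sorter = TopologicalSorter(stages)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(schedule_in_waves(STAGES), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

With hundreds of sub-stages, the same idea applies: only the dependency table grows, while the scheduling loop stays unchanged.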
  • Many of our customers have never seen these data sources in a single dashboard. Even before running the data through our proprietary modeling platform, we can give our customers dashboards on what were previously data black holes.
  • The term data scientist has been in vogue lately. At MarketShare, we have a large team of modelers who run modeling on the cloud. Once the data has been cleaned up, the modelers run thousands of different equations. Many analytic applications stop their cloud usage at reporting. At MarketShare, we believe that reporting is not enough to answer the questions. Building a predictive model is key to answering business questions on terabytes of data. We use the cloud to build custom models for each one of our customers. We use the power of distributed systems to validate these models for accuracy.
  • Once the models have been prepared, they are deployed in an easy-to-use application. It should be noted that reducing big data should not mean that the user is lost in a forest of reports. At MarketShare, we believe in simplifying access to Big Data. We hide the model complexity behind easy-to-use applications that let our users build many different scenarios for their business.
  • So, what does all this give our customers? We have been able to release many different applications on top of this analytics pipeline. The first one is marketing efficiency. The second application is Attribution. The third one is Dynamic Pricing.
  • So, what makes this pipeline run? Our entire analytics workflow is built using various services from Amazon as building blocks. Our applications are deployed behind the Elastic Load Balancer service. The data is stored in storage services such as S3 and RDS, and we are trying out DynamoDB. Our analytics jobs are executed on dynamic clusters provided by Elastic MapReduce.
  • So, let us quickly go under the hood. Three years ago, we started with a Hadoop cluster to store all our data. Very quickly we noticed two important things about the cluster. The first observation was that however big we made the cluster, jobs kept running into each other. Try as we might, the cluster would get hot for some time when many different stages started executing at the same time. The second observation was how unused the cluster was for long periods of time. So, while we were spending a lot of dollars on this large cluster, our customers were still unhappy with the response times!
  • So, what was our solution? We rewrote our entire data pipeline to run on many different clusters.
  • Big Data Discovery Workshop: brainstorm pilot use cases; identify data sources and formats; review business and financial drivers; recommended use cases; roadmap for data migration and production rollout; reference architecture; estimated pilot cost; next steps.
  • EMR Bootcamp: interactive onsite workshop (not classroom training); work with the customer to architect, install, and configure EMR; run and debug production job flows; the customer’s dataset(s) must be on S3.

    1. Big Data Marketing in the AWS Cloud: Improving Cross-Media Effectiveness
    2. Welcome: Sheri Sullivan, Senior Marketing Manager, Global SI Ecosystem, Amazon Web Services
    3. Webinar Overview
        • Submit your questions using the Q/A tool.
        • A copy of today’s presentation will be made available on the AWS SlideShare channel (http://www.slideshare.net/AmazonWebServices/) and the AWS YouTube channel (http://www.youtube.com/user/AmazonWebServices).
        • Special note: today’s webinar is being recorded.
    4. What We’ll Cover
        • Intro to AWS Database and Big Data Services
        • Customer Use Cases and Solutions
        • Delivering Cross-Media Analytics
        • MarketShare Planner Platform
    5. John Gannon, AWS Business Development Manager, jgannon@amazon.com
    6. Big Data and Databases on AWS: managed services designed to reduce administration, accelerate deployment, and minimize the cost of analysis and experimentation
        • DynamoDB: schema-less data store that enables fast deployment of new applications without the burden of database administration
        • Relational Database Service (RDS): manage existing database applications without the effort required to provision, upgrade, back up and scale highly available instances
        • ElastiCache: accelerate data retrieval performance by caching data in memory and avoiding slower disk-based systems
        • Elastic MapReduce (EMR): Hadoop-based infrastructure service enabling the parallel processing of massive amounts of data
    7. Amazon Relational Database Service: RDS is a fully managed relational database service that is simple to deploy, easy to scale, reliable and cost-effective
        • Choice of Database Engines
        • Fully Managed Service
        • Push Button Scalability
        • Fault Tolerance with Multi-AZ
        • Works with EC2 & ElastiCache
    8. Amazon DynamoDB: DynamoDB is a fully managed NoSQL database service that provides extremely fast and predictable performance with seamless scalability
        • Authors of NoSQL
        • Zero Administration
        • Low Latency SSDs
        • Unlimited Potential Storage and Throughput
    9. Amazon Elastic MapReduce
        • Reduces complexity & cost of Hadoop management
        • Integrates with AWS services and 3rd party vendors
        • Highly customizable
    10. Operated 2 million+ Hadoop clusters last year
    11. Amazon EMR is the #1 Enterprise Hadoop Solution: AWS is “the most prominent Hadoop cloud service provider” and “leads the pack (of Leaders) due to its proven, feature-rich Elastic MapReduce service…” (The Forrester Wave™: Enterprise Hadoop Solutions, Q1 2012)
    12. Success Story
        • Business Challenge: needed a real-time analytics tool to determine dynamic live event pricing during the ticket sales life cycle; optimize event ticket pricing, improve yield management & generate incremental revenue
        • AWS Services: Elastic Load Balancer, Amazon Elastic MapReduce, Amazon SimpleDB, Amazon CloudWatch, Amazon Simple Email Service (SES)
        • Business Benefits: ease of use, reducing developers’ infrastructure management time by 3 hours per day; estimated 80% cost reduction annually, compared to fixed service costs
    13. Anupam Singh, VP, Technology, MarketShare, asingh@marketshare.com
    14. Elastic Data Management: Multi-Cluster, Elastic, Failure Resistant
    15. Who we are: the global marketer partner of choice for understanding, optimizing and driving revenue
        • Products: MarketShare Planner™, MarketShare Price™, MarketShare 360™, MarketShare Optimizer™
        • MarketShare Platform: Cloud modeling | SaaS infrastructure | Data connectors
        • Recognized industry leader
        • Cloud-based software solutions
        • Over half the Fortune 100
        • Strong media and agency partnerships
        • Global presence
        • [Chart: Current Offering (weak to strong) against Strategy (weak to strong), with bands Risky Bets, Contenders, Strong Performers, Leaders]
    16. Scale Complex Modeling
        • Terabytes per customer
        • 1000+ variables
        • 100+ customers
        • 100+ data sources
        • [Diagram: client data arrives over FTP and passes through ETL, Reporting, Modeling and Sim-Opt stages, each with its own stack and tables, into the application; roles shown include Data Architect, Engineer and Modeler]
    17. [Diagram: marketing data sources feeding the FTP, ETL, Reporting, Modeling, Simulation and Application pipeline. Labels include Brand, Product (innovation, quality, service, support, training), Events (conferences, trade shows), Earned media (organic search on Google and Bing, WOM, blogs, social media such as Twitter and Facebook, PR), Owned media (website, content, commerce, in-store displays, shelf space), Paid media (paid search on Google and Bing, banner and video display ads, magazine and newspaper print, TV and radio broadcast, outdoor signs and digital signage, catalog, direct mail, email, mobile), Offering (pricing, promotions, discounts, bundles, coupons), Controllable and Non-controllable factors (competition, seasonality, economy, interest rates, stock market), and the outcomes Awareness, Consideration, Purchase and Sales]
    18. [Pipeline diagram: FTP, ETL, Reporting, Modeling, Simulation, Application]
    19. [Pipeline diagram: FTP, ETL, Reporting, Modeling, Simulation, Application]
    20. [Pipeline diagram: FTP, ETL, Reporting, Modeling, Simulation, Application]
    21. [Pipeline diagram: FTP, ETL, Reporting, Modeling, Simulation, Application]
    22. Many applications in production: Marketing Efficiency, Attribution, Dynamic Pricing
    23. The Technology That Makes It Possible: Elastic Cloud™ on AWS
        • [Diagram: Elastic Load Balancer in front of web and app servers on permanent Amazon EC2 instances; Amazon Elastic MapReduce on Amazon EC2 on-demand instances; managed storage in an Amazon RDS database instance and Amazon Simple Storage Service (S3)]
    24. Issue 1: Giant Hadoop cluster
        • Overwhelmed for small periods
        • Unused for large periods
    25. Solution 1: Partition the data pipeline
        • Identify independent data sources
        • Redesign ETL for independent stages
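One way to picture "identify independent data sources" is to group ETL stages that share no data source, so that each group could run on its own cluster. The sketch below is illustrative only: the stage and source names are hypothetical, and the union-find grouping is a generic technique, not a description of how MarketShare partitioned its pipeline.

```python
from collections import defaultdict

# Hypothetical mapping of ETL stage -> data sources it reads.
STAGE_SOURCES = {
    "web_sessions": {"web_logs"},
    "web_funnel": {"web_logs", "crm"},
    "ad_spend": {"ad_logs"},
    "ad_reach": {"ad_logs", "tv_ratings"},
    "revenue": {"erp"},
}


def partition_stages(stage_sources):
    """Return groups of stages; stages in different groups share no source."""
    parent = {}

    def find(item):
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path halving
            item = parent[item]
        return item

    def union(a, b):
        parent[find(a)] = find(b)

    # Link every stage to each source it reads.
    for stage, sources in stage_sources.items():
        for source in sources:
            union(("stage", stage), ("source", source))

    # Stages that end up under the same root depend on overlapping sources.
    groups = defaultdict(list)
    for stage in stage_sources:
        groups[find(("stage", stage))].append(stage)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    for group in partition_stages(STAGE_SOURCES):
        print(group)
```

Each resulting group is a candidate for an independent workflow; stages inside a group still need ordering among themselves.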
    26. Issue 2: Cluster proliferation
        • Manual bring up and tear down of clusters
        • Dramatic increase in maintenance costs
    27. Solution 2: Cluster proliferation
        • Use Elastic MapReduce
        • Dynamically change the size of the cluster based on volume of data & historical performance
        • [Diagram: three Amazon Elastic MapReduce clusters, each on Amazon EC2 on-demand instances]
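Sizing a cluster from "volume of data & historical performance" can be sketched as a simple throughput estimate. This is an assumption-laden illustration: the function name, parameters and the linear-scaling assumption are all hypothetical, and no Elastic MapReduce API is called here.

```python
import math


def nodes_needed(input_gb, history, deadline_hours, min_nodes=1, max_nodes=50):
    """Estimate a node count for a job from past runs.

    history: list of (gb_processed, node_count, hours) tuples from earlier runs.
    Assumes throughput scales linearly with node count, which real jobs
    only approximate.
    """
    if not history:
        return min_nodes  # no measurements yet: start small
    # Average GB processed per node per hour across past runs.
    rates = [gb / (nodes * hours) for gb, nodes, hours in history]
    gb_per_node_hour = sum(rates) / len(rates)
    wanted = math.ceil(input_gb / (gb_per_node_hour * deadline_hours))
    # Clamp to the allowed range so one odd measurement cannot explode cost.
    return max(min_nodes, min(max_nodes, wanted))


if __name__ == "__main__":
    past_runs = [(100, 5, 2), (300, 10, 3)]  # hypothetical measurements
    print(nodes_needed(400, past_runs, deadline_hours=2))
```

The returned count would then be passed to whatever mechanism launches or resizes the cluster; that step is deliberately left out of the sketch.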
    28. Issue 3: Too many failure points
        • Jobs fail after ~90% completion
        • Rerunning costs $$$
        • [Diagram: several Amazon Elastic MapReduce clusters on Amazon EC2 on-demand instances, with managed storage in Amazon Simple Storage Service (S3)]
    29. Solution 3: Invent technology for partial restarts
        • Collect job stats obsessively
        • Restart based on patented technology called PauseNPlay™
        • [Diagram: several Amazon Elastic MapReduce clusters on Amazon EC2 on-demand instances, with managed storage in Amazon Simple Storage Service (S3)]
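The slides do not describe how PauseNPlay works, so the following is not that technology. It is a minimal sketch of generic checkpointing, assuming a hypothetical `run_with_checkpoints` helper that records each completed stage in a local JSON file so that a rerun resumes at the failed stage rather than from the beginning.

```python
import json
from pathlib import Path


def run_with_checkpoints(stages, state_file):
    """Run (name, callable) stages in order, skipping ones already completed.

    Completed stage names are saved to state_file after each stage, so a
    rerun after a failure resumes at the failed stage instead of the start.
    Returns the names of the stages executed during this call.
    """
    state_path = Path(state_file)
    done = json.loads(state_path.read_text()) if state_path.exists() else []
    executed = []
    for name, stage in stages:
        if name in done:
            continue  # finished in an earlier run
        stage()  # an exception here leaves earlier progress saved
        done.append(name)
        state_path.write_text(json.dumps(done))
        executed.append(name)
    return executed
```

In a multi-cluster setting the state file would live in shared storage rather than on local disk, so that a replacement cluster could pick up where the failed one stopped.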
    30. Summary
        • Design your data pipeline for a multi-cluster environment: write configurable ETL to become independent, partitioned workflows; a cluster that stays up the entire month is not elastic
        • Save your intermediate results in low cost storage: think about compression; do not underestimate schema complexity
        • Loosely coupled architecture has failure points: save state obsessively; build restart-ability into your architecture
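The advice to save intermediate results and think about compression can be illustrated with a small round trip. This sketch assumes gzip-compressed JSON lines written to a local path that stands in for low-cost storage; the function names are hypothetical, and no S3 call is made.

```python
import gzip
import json


def save_intermediate(records, path):
    """Write an iterable of dict records as gzip-compressed JSON lines."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def load_intermediate(path):
    """Read records back in the order they were written."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle]
```

Line-oriented output keeps each record independent, which suits downstream stages that read a file in pieces. The actual saving from compression depends on the data and is not estimated here.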
    31. Programs to help you get started with Big Data on AWS
        • Big Data Discovery Workshop: identify and prioritize target Big Data use cases
        • EMR Bootcamp: deploy a sample use case with real customer data
        • EMR Training: 3 day intensive developer training
    32. EMR Training Schedule
        • Los Angeles, CA – 10/16-10/18
        • Boston, MA – 10/30-11/1
        • Mountain View, CA – 11/13-11/15
        • Dallas, TX – 11/27-11/29
        • New York, NY – 12/11-12/13
        • Visit http://bit.ly/AWS_EMR_Training for class details and registration
    33. Questions? Contact:
        • William Merchan, VP, Business Development, MarketShare, wmerchan@marketshare.com
        • John Gannon, Business Development Manager, AWS, jgannon@amazon.com
