
(ADV303) MediaMath’s Data Revolution with Amazon Kinesis and Amazon EMR | AWS re:Invent 2014



Collecting and processing terabytes of data per day is a challenge for any technology company. As marketers and brands become more sophisticated consumers of data, enabling granular levels of access to targeted subsets of data from outside your firewalls presents new challenges. This session discusses how to build scalable, complex, and cost-effective data processing pipelines using Amazon Kinesis, Amazon EC2 Spot Instances, Amazon EMR, and Amazon Simple Storage Service (S3). Learn how MediaMath revolutionized their data delivery platform with the help of these services to empower product teams, partners, and clients. As a result, a number of innovative products and services are delivered on top of terabytes of online user behavior. MediaMath covers their journey from legacy batch processing and vendor lock-in to a new world where the raw materials to build advanced lookalike models, optimization algorithms, or marketing attribution models are readily available to any engineering team in real time, substantially reducing the time - and cost - of innovation.

Published in: Technology


  1. November 13, 2014 | Las Vegas, NV. Eddie Fagin, VP Engineering, MediaMath; Ian Hummel, Sr. Director Engineering, MediaMath; Adi Krishnan, Sr. PM, Amazon Kinesis
  2. Canonical Data Flow With Amazon Kinesis
  3. Data Warehousing, Hadoop-Style Processing, Stream Processing
     • Data Warehousing: query engine approach; pre-computations such as indices and dimensional views improve performance; historical, structured data; Amazon Redshift
     • Hadoop-Style Processing: Hive/SQL-on-Hadoop/M-R/Spark; batch programs, or other abstractions that break down into MR-style computations; historical, semi-structured data; Amazon EMR
     • Stream Processing: custom computations of relatively simple complexity; continuous processing (filters, sliding windows, aggregates) on infinite data streams; semi-structured or structured data, generated continuously in real time; Amazon Kinesis
  4. Amazon Kinesis: real-time processing; high throughput, elastic; easy to use; S3, Redshift, DynamoDB integrations
  5. Amazon Kinesis (architecture diagram labels)
     • Amazon Web Services; AZ, AZ, AZ
     • Millions of sources producing 100s of terabytes per hour
     • Front end: authentication, authorization
     • Durable, highly consistent storage replicates data across three data centers (availability zones)
     • Ordered stream of events supports multiple readers
     • Aggregate and archive to S3
     • Real-time dashboards and alarms
     • Machine learning algorithms or sliding window analytics
     • Aggregate analysis in Hadoop or a data warehouse
     • Inexpensive: $0.028 per million puts
  6. Amazon EMR: Hadoop/HDFS clusters; Hive, Impala, MapReduce; easy to use, fully managed; on-demand and spot pricing
  7. Diagram labels: Warehouse (analytics, decisioning, optimization, archive); Bidder Data (wins); Site Events; 3rd Party Segments
  8. Diagram labels: Firehose (Kinesis); Decisioning & Optimization; Real-time Analytics; Archive; S3; Bidder Data (wins); Site Events; 3rd Party Segments
  9. Diagram labels: App (metadata); Data mart (Oracle/Postgres); Qubole; Redshift; Hadoop; Scripts; Attribution; Bidders (×3); S3 (×4); EMR (×3); Recurring partition jobs/process jobs; Partners/clients/tools/internal services; Pixels (×3); Realtime; Firehose; Netezza
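
The stream-processing slide names filters, sliding windows, and aggregates as the typical continuous computations on an unbounded stream. The following is a minimal Python sketch of that idea, not MediaMath's or Amazon Kinesis's implementation. The record shape (timestamp, event type), the function name `sliding_window_counts`, and the event type names are all hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import Counter, deque
from typing import Deque, Dict, FrozenSet, Iterable, Iterator, Tuple

# Hypothetical record shape: (timestamp in seconds, event type).
Event = Tuple[float, str]


def sliding_window_counts(
    events: Iterable[Event],
    window_seconds: float,
    wanted: FrozenSet[str],
) -> Iterator[Tuple[float, Dict[str, int]]]:
    """Yield (timestamp, counts per event type) after each accepted event.

    The window covers the interval (ts - window_seconds, ts].
    Assumes events are ordered by timestamp.
    """
    window: Deque[Event] = deque()
    counts: Counter = Counter()
    for ts, kind in events:
        # Filter: ignore event types we are not interested in.
        if kind not in wanted:
            continue
        window.append((ts, kind))
        counts[kind] += 1
        # Slide the window: evict events that are now too old.
        while window and window[0][0] <= ts - window_seconds:
            _, old_kind = window.popleft()
            counts[old_kind] -= 1
            if counts[old_kind] == 0:
                del counts[old_kind]
        # Aggregate: emit a snapshot of the current per-type counts.
        yield ts, dict(counts)


if __name__ == "__main__":
    sample = [(0.0, "win"), (1.0, "pixel"), (2.0, "win"), (3.0, "noise"), (12.0, "win")]
    for ts, snapshot in sliding_window_counts(sample, 10.0, frozenset({"win", "pixel"})):
        print(ts, snapshot)
```

Because the function is a generator over any iterable, it keeps only the events inside the current window in memory, which is the property that makes this style of computation workable on a stream with no end.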