AWS provides a broad portfolio of databases and analytics services including data lakes, data movement services, non-relational databases like DynamoDB and ElastiCache, relational databases like RDS and Aurora, analytics services like Redshift and EMR, and machine learning services like SageMaker and Comprehend. These services are purpose-built to help customers build applications and analyze data.
Short overview of AWS Database and Analytics offerings and an introduction of the day’s topics.
Speaker: Bill Baldwin - Database Technical Evangelist, AWS
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P... - Amazon Web Services
If you are crafting a better customer experience, automating your business, or modernizing your systems, you are likely finding that your data and analytics platform is critical to your success. In this session, we will look at how customers are building on the managed services from Amazon Web Services to meet the needs of the business. Patterns gaining popularity include near-real-time engagement with customers over mobile, combining and analyzing unstructured consumer-behavior data with structured transactional data, and managing spiky data workloads. See how our customers use our managed, elastic, secure, and highly available services to change what is possible.
Big Data on EC2: Mashing Technology in the Cloud - George Ang
This document discusses how a startup serving widgets on popular online publications scaled their infrastructure using Amazon Web Services to handle spikes in traffic from over 1 billion users sharing over 10 billion URLs. They used a hub-and-spoke architecture with components like Cascading, Amazon Elastic MapReduce, and AsterData to analyze user sharing patterns in a cost-effective and horizontally scalable way.
This document discusses cloud computing and Amazon Web Services (AWS). It begins with an introduction to cloud computing, describing it as accessing and delivering services over the internet, such as storage, computing, and software. It then covers the benefits of cloud computing, including scalability, monitoring, and reduced time to market. The document discusses various cloud service models including Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). It also covers cloud deployment models such as public, private, hybrid and community clouds. The remainder focuses on AWS services and provides an overview of compute, storage, database, deployment and security services, along with examples of using Lambda for serverless applications.
In the next five years, 15 to 40 billion additional connected devices are expected to hit the market. How can we handle such volumes and velocity of data?
Introduction to Dynamo storage systems, Riak, Cassandra, time series databases and edge analytics.
Building a real-time, scalable and intelligent programmatic ad buying platform - Jampp
After a brief introduction to programmatic ads and RTB, we go through the evolution of Jampp's data platform to handle the enormous amount of data we need to process.
CTX, a large digital asset exchange, needed to deploy their application to AWS in a highly available, scalable, and compliant environment. Powerup helped provision the infrastructure using CloudFormation according to the AWS Well-Architected Framework and compliance specifications, which automated the infrastructure and application deployment. This resulted in cost savings of 60-70% through the use of spot instances and a secure environment with continuous integration and delivery pipelines. The solution surpassed CTX's expectations by providing scalability and enhanced DevOps and security capabilities.
High availability, real-time and scalable architectures - Jampp
Presented at the Architecture Conference (ArqConf) in Buenos Aires, Argentina. Here is a 10,000ft view of our Real Time Bidding and Stream Processing architecture.
Processing 19 billion messages in real time and NOT dying in the process - Jampp
Here is an introduction to the Jampp architecture for data processing. We walk through our journey of migrating to systems that allow us to process more data in real time.
Rob Rastovich of Appirio presents "Migrating Enterprise Apps to the Cloud (PaaS)" at the SDForum Cloud Services SIG at Stanford University on Tuesday, August 24th.
The cloud market has evolved to the point where it’s no longer enough to just offer virtual machines by the hour. Developers are demanding more, and the largest clouds in the world are providing it. Users expect services like Database-as-a-Service, Cache-as-a-Service and Queue-as-a-Service. These types of services are now the new bar for cloud operators, and represent a shift from basic IaaS to IaaS+. In this talk, we will discuss the increasing pace of innovation around higher value services in the public cloud market, and how this impacts your cloud (be it public or private).
AWS is hosting the first FSI Cloud Symposium in Hong Kong, which will take place on Thursday, March 23, 2017 at the Grand Hyatt Hotel. The event will bring together FSI customers, industry professionals and AWS experts to explore how to turn the dream of transformation, innovation and acceleration into reality by exploiting Cloud, Voice-to-Text and IoT technologies. The packed agenda includes expert sessions on pressing issues such as security and compliance, as well as customer experience sharing on how cloud computing is benefiting the industry.
Speaker: Lijia Xu, Big Data Practice Lead, Professional Services, AWS
Cloudlytics helps you analyze Amazon cloud logs from:
- Amazon S3
- Amazon CloudFront
- Amazon ELB
This presentation gives a basic overview of Cloudlytics features, pricing details, offers for AWS Activate customers, AWS Marketplace info, and a sneak preview of all the analytics. (The Reports section will be covered in detail in our next presentation.)
World's best AWS Cloud Log Analytics & Management Tool - Cloudlytics
This document introduces Cloudlytics, a service that provides analytics and reporting for Amazon Web Services (AWS) cloud logs. It allows users to analyze logs from CloudFront, S3 storage, Elastic Load Balancing, and more to gain insights into end user behavior and optimize AWS costs. The summaries are drag-and-drop customizable and include visual reports on content consumption patterns, popular content, geographic usage, and cost analysis. Cloudlytics claims to be easier, faster, and more cost-effective than alternatives for AWS log analytics.
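The core of this kind of log analytics is parsing access-log records and aggregating them into reports. The sketch below uses a simplified, hypothetical log layout (real S3, CloudFront, and ELB logs carry many more fields) to show the shape of such a summary:

```python
from collections import Counter

# Illustrative only: access-log records in a simplified, hypothetical
# format (timestamp, client IP, HTTP status, bytes sent, path).
LOG_LINES = [
    "2016-05-01T10:00:00Z 203.0.113.7 200 512 /index.html",
    "2016-05-01T10:00:01Z 203.0.113.7 200 2048 /video.mp4",
    "2016-05-01T10:00:02Z 198.51.100.2 404 0 /missing.png",
    "2016-05-01T10:00:03Z 198.51.100.2 200 2048 /video.mp4",
]

def summarize(lines):
    """Aggregate the kind of report a log-analytics tool produces."""
    hits, errors, bytes_out = Counter(), 0, 0
    for line in lines:
        _ts, _ip, status, nbytes, path = line.split()
        hits[path] += 1                      # popular-content report
        bytes_out += int(nbytes)             # bandwidth / cost analysis
        if status.startswith(("4", "5")):    # error-rate report
            errors += 1
    return {"top": hits.most_common(1)[0], "errors": errors, "bytes": bytes_out}

print(summarize(LOG_LINES))
# -> {'top': ('/video.mp4', 2), 'errors': 1, 'bytes': 4608}
```

A production tool would additionally geolocate client IPs and join with billing data, but the aggregation step follows this pattern.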
This talk will serve as a practical introduction to Distributed Tracing. We will see how we can make best use of open source distributed tracing platforms like Hypertrace with Azure and find the root cause of problems and predict issues in our critical business applications beforehand.
This document discusses Infrastructure as a Service (IaaS) and key IaaS services provided by Amazon Web Services (AWS). It introduces AWS IaaS services like Elastic Compute Cloud (EC2) which provides scalable compute capacity, Simple Storage Service (S3) for unlimited storage, and Simple Queue Service (SQS) for reliable messaging between applications. Other services mentioned include SimpleDB for flexible key-value storage and Relational Database Service for managed relational databases. The document explains features and use cases of these AWS IaaS services and how they provide scalable, on-demand infrastructure resources over the internet.
Big Data and Analytics Innovation Summit - Martin Yan
This document discusses how customers can use Amazon Web Services (AWS) for big data projects. It outlines the typical big data pipeline of data generation, collection, storage, sharing, and analysis. It then provides examples of how various customers are using AWS services like S3, EMR, and Redshift across different industries to remove constraints to experimentation and gain competitive advantages from big data insights. Overall, AWS allows customers to focus on their data without having to manage the underlying infrastructure.
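The pipeline described above can be sketched as a chain of stages, with in-memory stand-ins for the services that typically play each role (Kinesis for collection, S3 for storage, Redshift or EMR for analysis). All names and record shapes here are hypothetical:

```python
# generate -> collect -> store -> analyze, in miniature.

def generate():
    # Data generation: e.g. application events
    return [{"user": "a", "amount": 10}, {"user": "b", "amount": 5},
            {"user": "a", "amount": 7}]

def collect(events):
    # Collection: buffer incoming events (Kinesis's role)
    return list(events)

def store(bucket, events):
    # Storage: durable object store (S3's role)
    bucket.extend(events)
    return bucket

def analyze(bucket):
    # Analysis: aggregate per user (Redshift/EMR's role)
    totals = {}
    for e in bucket:
        totals[e["user"]] = totals.get(e["user"], 0) + e["amount"]
    return totals

bucket = []
store(bucket, collect(generate()))
print(analyze(bucket))  # -> {'a': 17, 'b': 5}
```

The point of the managed services is that each stage scales independently without the customer operating the infrastructure behind it.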
Load data from AWS S3 to Snowflake in minutes - syed_javed
Lyftron enables real-time data streaming and bulk loading onto Snowflake, accelerating data movement using Spark compute. It lets businesses make data-driven decisions with cost-effective, scalable data solutions using ANSI SQL on any data, shortening time to insight by 75% and eliminating the complexity of loading data from AWS S3 to Snowflake, so it can be done in minutes.
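Whatever tooling sits on top, loading S3 data into Snowflake ultimately comes down to a `COPY INTO` statement from an external stage. This sketch only builds the SQL text; the table and stage names are hypothetical, and executing it would require a Snowflake connection (not shown):

```python
# Build (not execute) a Snowflake COPY INTO statement for an S3-backed
# external stage. Table, stage, and format options are illustrative.

def build_copy_statement(table, stage, file_format="CSV"):
    return (
        f"COPY INTO {table} "
        f"FROM @{stage} "
        f"FILE_FORMAT = (TYPE = {file_format} SKIP_HEADER = 1)"
    )

sql = build_copy_statement("analytics.events", "my_s3_stage")
print(sql)
```

The external stage (created beforehand with `CREATE STAGE`) is what holds the S3 location and credentials, keeping them out of the load statement itself.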
Euronext, the leading European stock exchange with €3.7 trillion in market cap, built a governed data lake on AWS to analyze data from one of the largest databases in Europe, enriched with 1.5 billion new messages every day. Euronext uses Talend and AWS services (Amazon S3, Amazon Redshift and Amazon EMR) for better agility, elasticity, breadth of functionality and cost savings compared to its previous Netezza-based solution, while guaranteeing data governance and regulatory compliance.
This document discusses how data can be collected from various sources and transformed into useful information using FME. It outlines a three step process: 1) Connect to data from different formats and sources, 2) Clean the data by keeping useful information and filtering out bad data, and 3) Make the data presentable through visualizations, statistics, and reports. Examples of using FME to visualize trial results in PDFs and create dashboards for infrastructure asset management are provided. QlikMaps, Tableau, and other tools for business intelligence and creating visualizations are also mentioned.
Introduction to Data Analysis, Storage & Processing Solutions - Anjani Phuyal
1. The document discusses various data analysis, storage, and processing solutions including data analysis, data analytics, data lakes, data warehouses, data marts, batch processing, and stream processing.
2. It describes challenges of data analytics including volume, velocity, and variety and recommends solutions like Amazon S3, Redshift, EMR, and Kinesis to address these challenges at scale.
3. The key aspects covered are data storage methods like data lakes, data warehouses, and data marts and data processing methods including batch processing using EMR and stream processing using Kinesis.
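The batch-versus-stream distinction above can be shown in miniature: the batch path computes over a complete dataset at rest (EMR's role), while the streaming path updates its result incrementally as each record arrives (Kinesis's role). The data here is made up:

```python
events = [3, 1, 4, 1, 5, 9]

# Batch: one pass over the full, stored dataset
batch_total = sum(events)

# Stream: a running total maintained per arriving record
def stream(events):
    total = 0
    for e in events:
        total += e
        yield total  # an up-to-date answer after every event

running = list(stream(events))
print(batch_total, running[-1])  # both 23
```

Batch gives one answer after all data has landed; streaming gives a continuously updated answer, which is why it suits the velocity challenge.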
This presentation from the AWS Lab at Cloud Expo Europe 2014 explores large-scale data analysis on AWS. The cost of data generation is falling, and storing, analyzing, and sharing data with the tools AWS offers is a low-cost, easy-to-use way to create value from your data assets.
Cloud computing allows users to increase resources and capabilities without investing in new infrastructure by providing subscription-based services over the Internet. It encompasses services like software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS) that extend existing IT capabilities. Cloud computing provides computation, software, data access, and storage without users needing knowledge of the underlying system infrastructure.
1Spatial Australia: Introduction and getting started with FME 2017 - 1Spatial
This document introduces new features in FME 2017 including over 20 new data formats that can be read and written, more than 10 new transformers, updates to existing transformers, improved user interface features for workflows, expanded web services and file system capabilities, an updated data inspector, and new automation capabilities for running workflows on demand or on a schedule. The overall goal of FME is to allow data to flow freely between systems and applications while enabling users to spend more time making decisions rather than struggling with data integration tasks.
Slides: Proven Strategies for Hybrid Cloud Computing with Mainframes — From A... - DATAVERSITY
Mainframes continue to perform mission-critical transaction processing and contain massive amounts of core business data. But digital transformation initiatives and cloud computing have created both opportunities and challenges for unlocking and utilizing this data. Qlik and AWS will share some of the proven strategies from successful customer deployments across a range of different mainframe to cloud use cases, including legacy application modernization, data analytics, and data migrations.
In this presentation, you will learn how to:
• Replicate very large volumes of mainframe data in real time to the cloud
• Automate the creation of analytics-ready data lakes and data warehouses
• Achieve a 30% reduction in cost of compute
This document summarizes an AWS summit that took place in 2014. It had over 3,000 attendees and 25 breakout sessions. The summit celebrated AWS's 8th birthday since launching in 2006. It highlighted how startups, enterprises, the public sector, and systems integrators are using AWS. It also discussed how AWS provides infrastructure as a service with the largest market share according to Gartner, and how its services allow for agility, breadth of platform, and continual innovation.
The introductory morning session will discuss big data challenges and provide an overview of the AWS Big Data Platform. We will also cover:
• How AWS customers leverage the platform to manage massive volumes of data from a variety of sources while containing costs.
• Reference architectures for popular use cases, including: connected devices (IoT), log streaming, real-time intelligence, and analytics.
• The AWS big data portfolio of services, including Amazon S3, Kinesis, DynamoDB, Elastic MapReduce (EMR) and Redshift.
• The latest relational database engine, Amazon Aurora - a MySQL-compatible, highly available relational database engine which provides up to five times better performance than MySQL at one-tenth the cost of a commercial database.
• Amazon Machine Learning – the latest big data service from AWS provides visualization tools and wizards that guide you through the process of creating machine learning (ML) models without having to learn complex ML algorithms and technology.
Getting Started with Amazon Redshift - AWS July 2016 Webinar Series - Amazon Web Services
Traditional data warehouses become expensive and slow down as the volume of your data grows. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze all of your data using existing business intelligence tools for as low as $1000/TB/year. This webinar will provide an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs.
Learning Objectives:
• Get an introduction to Amazon Redshift's massively parallel processing, columnar, scale-out architecture
• Learn how to configure your data warehouse cluster, optimize schema, and load data efficiently
• Get an overview of all the latest features including interleaved sorting and user-defined functions
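The columnar architecture named in the objectives above is easy to illustrate: an analytic query that aggregates one column only has to read that column's values, not every field of every row. The table below is a made-up example:

```python
rows = [  # row-oriented layout: each record stored together
    {"id": 1, "region": "eu", "sales": 100},
    {"id": 2, "region": "us", "sales": 250},
    {"id": 3, "region": "eu", "sales": 50},
]

# Column-oriented layout (the one Redshift uses): one array per column
columns = {
    "id": [1, 2, 3],
    "region": ["eu", "us", "eu"],
    "sales": [100, 250, 50],
}

# SELECT SUM(sales): the columnar engine touches a single array
print(sum(columns["sales"]))            # 400

# A row store must walk every full record to answer the same query
print(sum(r["sales"] for r in rows))    # 400
```

Same answer either way, but the columnar scan reads far less data per query, which is where the warehouse's speed at petabyte scale comes from.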
This document discusses Amazon Web Services' database and analytics services. It begins by noting that 85% of businesses want to be data-driven but only 37% have been successful. It then presents the "data flywheel" concept of breaking from legacy databases, modernizing data infrastructure and data warehouses, turning data into insights, and building data-driven applications to gain momentum with data. The document provides overviews and benefits of AWS services like Amazon Aurora, Athena, Redshift, RDS, DMS, and Elasticsearch. It also introduces new capabilities for these services like machine learning with Aurora, RDS on Outposts, UltraWarm storage for Elasticsearch, and materialized views in Redshift.
This presentation summarizes Amazon Redshift data warehouse service, its architecture and best practices for application development using Amazon Redshift.
The document discusses Microsoft's data platform and cloud services. It highlights:
1) Microsoft's data platform provides intelligence over all data with SQL and Apache Spark, enabling AI and machine learning over any data.
2) Microsoft offers data modernization solutions for migrating to the cloud or managing data on-premises and in hybrid environments.
3) Migrating databases to Azure provides cost savings, security, high performance, and intelligent capabilities through services like Azure SQL Database and Azure Cosmos DB.
Building with Purpose-Built Databases: Match Your Workloads to the Right Da... - Amazon Web Services
In this session, Darin Briskman dives deep into what databases to use for which components of your application. Learn how to evaluate a new workload for the best managed database option based on specific application needs related to data shape, data size at limit, computational requirements, programmability, throughput and latency needs, and more. This session explains the ideal use cases for relational and non-relational database services, including Amazon Aurora, Amazon DynamoDB, Amazon ElastiCache for Redis, Amazon Neptune, and Amazon Redshift.
Darin Briskman, Chief Evangelist, Database, Analytics, & Machine Learning, Amazon Web Services
This overview presentation discusses big data challenges and provides an overview of the AWS Big Data Platform by covering:
- How AWS customers leverage the platform to manage massive volumes of data from a variety of sources while containing costs.
- Reference architectures for popular use cases, including, connected devices (IoT), log streaming, real-time intelligence, and analytics.
- The AWS big data portfolio of services, including, Amazon S3, Kinesis, DynamoDB, Elastic MapReduce (EMR), and Redshift.
- The latest relational database engine, Amazon Aurora: a MySQL-compatible, highly available engine which provides up to five times better performance than MySQL at one-tenth the cost of a commercial database.
Created by: Rahul Pathak,
Sr. Manager of Software Development
NetApp Cloud Data Services & AWS Empower Your Cloud Champions - Amazon Web Services
The document discusses enabling cloud champions with AWS and NetApp cloud data services. It highlights how hyperscale computing is leading the way for government agencies to use data to innovate and reduce costs. NetApp cloud volumes provide enterprise-level file services on AWS to accelerate all types of cloud workloads. Examples are given of how NetApp solutions help with data migration, disaster recovery, and meeting data storage needs on AWS.
MariaDB is an open source relational database that is easy to use, deploy, and extend. It offers lower costs than proprietary databases like Oracle. MariaDB encourages community collaboration and integration of community code and plugins. It has an extensible architecture that allows for things like Galera cluster, InnoDB storage engine, security key management plugins and more. It is used by many large companies and is the default database for major Linux distributions.
AWS March 2016 Webinar Series - Building Big Data Solutions with Amazon EMR a... - Amazon Web Services
Building big data applications often requires integrating a broad set of technologies to store, process, and analyze the increasing variety, velocity, and volume of data being collected by many organizations.
Using a combination of Amazon EMR, a managed Hadoop framework, and Amazon Redshift, a managed petabyte-scale data warehouse, organizations can effectively address many of these requirements.
In this webinar, we will show how organizations are using Amazon EMR and Amazon Redshift to build more agile and scalable big data architectures. We will look at how you can leverage Spark and Presto running on EMR to address multiple data-processing requirements, and we will share best practices and common use cases for integrating EMR and Redshift.
Learning Objectives:
• Best practices for building a big data architecture that includes Amazon EMR and Amazon Redshift
• Understand how to use technologies such as Amazon EMR, Presto and Spark to complement your data warehousing environment
• Learn key use cases for Amazon EMR and Amazon Redshift
Who Should Attend:
• Data architects, Data management professionals, Data warehousing professionals, BI professionals
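A common pattern for the EMR-and-Redshift integration described above is to have a Spark job on EMR write processed results to S3, then load them into Redshift with a COPY statement. A minimal sketch of the load step, with hypothetical table, bucket, and IAM role names (none of these come from the webinar itself):

```python
def build_copy_statement(table, bucket, prefix, iam_role):
    """Build a Redshift COPY statement that loads Parquet output
    written to S3 by a Spark job running on an EMR cluster."""
    return (
        f"COPY {table} "
        f"FROM 's3://{bucket}/{prefix}' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT AS PARQUET;"
    )

sql = build_copy_statement(
    table="clickstream_agg",                    # hypothetical target table
    bucket="example-analytics-bucket",          # placeholder bucket name
    prefix="emr-output/clickstream/",           # placeholder S3 prefix
    iam_role="arn:aws:iam::123456789012:role/RedshiftCopyRole",  # placeholder role
)
print(sql)
```

In practice the generated statement would be executed over a Redshift connection after the EMR step finishes; COPY with `IAM_ROLE` and `FORMAT AS PARQUET` lets Redshift pull the files from S3 in parallel across its compute nodes.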
The Most Trusted In-Memory Database in the World (Altibase)
This document provides an overview of an in-memory database company and its product capabilities. It discusses the company's history and growth, the changing data landscape driving demand for real-time analytics, and how the company's in-memory and hybrid database technologies provide extremely fast transaction processing, high availability, scalability, and flexibility for deploying on-premise or in the cloud. Example customer use cases and implementations are described to demonstrate how the database has helped organizations tackle challenges of high volume data processing and analytics.
Amazon Redshift is a fully managed, cloud-based data warehousing service that makes it simple and cost-effective to analyze petabytes of structured and semi-structured data. It delivers fast query performance through massively parallel processing and columnar storage. Customers such as NTT Docomo, Nasdaq, and Amazon have analyzed petabytes of data faster and at lower cost with Amazon Redshift than with their previous on-premises solutions.
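The columnar-storage point can be illustrated with a toy model in plain Python (this is an analogy, not Redshift's actual implementation): an aggregate over a single column only has to touch that column's bytes in a columnar layout, while a row store must read every field of every row.

```python
# Toy data: 1,000 rows with three fields each.
rows = [{"user_id": i, "region": "eu", "spend": i * 0.5} for i in range(1000)]

# Row-oriented scan: summing one column still touches every field of every row.
row_scan_bytes = sum(len(str(v)) for r in rows for v in r.values())

# Columnar scan: only the "spend" column's values are touched.
spend_column = [r["spend"] for r in rows]
col_scan_bytes = sum(len(str(v)) for v in spend_column)

total = sum(spend_column)
print(f"row-store bytes scanned:    {row_scan_bytes}")
print(f"column-store bytes scanned: {col_scan_bytes}")
```

The columnar scan reads roughly a third of the bytes here; on a wide fact table with hundreds of columns, the gap (and the query speedup) is far larger, which is why analytic warehouses like Redshift use columnar layouts.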
This document provides an overview of an AWS event. It includes details about the AWS business, including $16B in annual revenue and over 135,000 active customers. It discusses the breadth of AWS services and tools available, positioning AWS as a leader in cloud infrastructure. The document outlines how AWS gives customers "superpowers" through its supersonic speed and pace of innovation, and provides examples of how customers are using AWS services to transform their businesses.
This document provides an overview of MariaDB's 2017 roadshow, including what they are doing, where they are going, and who the field CTO is. It discusses trends in the database market moving away from expensive proprietary databases toward lower-cost open source options with subscriptions and community involvement. It highlights cost savings of MariaDB compared to Oracle and MariaDB's extensible architecture and community contributions. It also summarizes MariaDB products and technologies like the database server, MaxScale proxy, and ColumnStore, as well as MariaDB's customers, use cases, services, and how to get started with MariaDB.
Using AWS Purpose-Built Databases to Modernize Your Applications (Amazon Web Services)
As you look to modernize your applications, you will need to consider your database options to meet new application requirements. AWS offers a series of purpose-built databases covering relational, key-value, document, graph, and cache use cases to help you deliver new and enhanced functionality. In this webinar session, we share different modern application architectures and show how to combine database services to meet your requirements. Understand how to modernize your relational databases through easy upgrades with Amazon Relational Database Service, and learn how to migrate from one database to another with AWS Database Migration Service and the AWS Schema Conversion Tool.
Speaker:
Blair Layton, Business Development Manager, Amazon Web Services
Astroinformatics 2014: Scientific Computing on the Cloud with Amazon Web Services (Jamie Kinney)
An overview of Amazon Web Services (AWS) and a survey of scientific computing applications of cloud computing. Examples come from the fields of Astronomy, High Energy Physics and include examples from CERN, NASA and others.
This document discusses how HiFX helped Malayala Manorama, one of the largest media conglomerates in India, build a data lake on AWS. Malayala Manorama faced challenges with data silos, lack of access to data for analysis, and poor data management. HiFX designed a solution connecting multiple data sources and repositories into a unified data pipeline on AWS. Raw data is stored in the data lake on S3 for low-cost storage; Spark on EMR is used for processing, with results stored in Redshift, Druid, and DynamoDB. A custom app called Lens provides business users with visualizations and insights. The benefits include improved user experience, better campaign management, and faster product decisions based on data insights.
20. Architecture diagram: data sources (aggregate ad-serving data, log files, file exports, APIs, client-provided data) feed a Talend data-flow manager; analytics processing runs on Elastic MapReduce, with cloud storage on S3 and HBase/SimpleDB, an OLAP cache and provisioning DB at the edge, and a web application / presentation layer reached via ODBC.
24. Drive a personalized message: a user who recently purchased a home theater system and is now looking for sports games receives a targeted ad (1.7 million per day).
25. We import Atlas transaction-level data: 24 servers compress and upload 200+ GB of data per day to S3 file storage (180 days, roughly half a trillion ICA records).
26. We use EMR to process and segment: a 100-machine cluster is created on demand (3.5 billion records and 71 million unique cookies a day).
27. Process and cost: the whole run completes in about 8 hours every day and is fully automated (previously 2+ days), and it increased ROAS by 500% (to $74).
28. Why AWS:
- Efficient: elastic infrastructure from AWS allows capacity to be provisioned as needed based on load, reducing cost and the risk of processing delays.
- Ease of integration: Amazon Elastic MapReduce with Cascading allows data processing in the cloud without any changes to the underlying algorithms.
- Flexible: Hadoop with Cascading is flexible enough to allow "agile" implementation and unit testing of sophisticated algorithms.
- Adaptable: Cascading simplifies the integration of Hadoop with external ad systems.
- Scalable: AWS infrastructure helps reliably store and process huge (petabyte-scale) data sets.
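The on-demand cluster pattern in these slides maps to today's EMR API. Below is a minimal sketch of a request in the shape expected by boto3's EMR `run_job_flow` call; the job name, instance types, release label, and jar path are assumptions for illustration, not details from the deck, and the actual AWS call is left commented out so the sketch stays self-contained.

```python
# Hypothetical on-demand EMR cluster request: ~100 instances, one daily
# segmentation step, cluster terminates when the step finishes.
job_flow = {
    "Name": "daily-ad-segmentation",          # hypothetical job name
    "ReleaseLabel": "emr-6.15.0",             # assumed EMR release
    "Instances": {
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE",   "InstanceType": "m5.xlarge", "InstanceCount": 99},
        ],
        # False => the cluster shuts down after the steps complete,
        # so you pay only for the ~8-hour daily run.
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    "Steps": [{
        "Name": "segment-cookies",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "s3://example-bucket/jobs/cascading-segment.jar",  # placeholder path
            "Args": ["--input", "s3://example-bucket/atlas-data/"],   # placeholder args
        },
    }],
}

# In real use you would also supply service/instance roles, then submit:
# import boto3
# boto3.client("emr", region_name="us-east-1").run_job_flow(**job_flow)

total_instances = sum(g["InstanceCount"] for g in job_flow["Instances"]["InstanceGroups"])
print(total_instances)  # prints 100
```

Provisioning the cluster per run, as slide 28 notes, is what makes the elastic-capacity economics work: there is no 100-node cluster to pay for during the other 16 hours of the day.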