Cloud as a Data Platform

My slides on how to use the cloud as a data platform, presented at BigDataWeek 2013 Romania:

http://www.eurocloud.ro/en/events/all-there-is-to-know-about-big-data/#.UXZFaUDvlVI

  1. Cloud as a Data Platform: What is (Big) Data? Amazon Data Services
  2. Andrei Savu: founder of Axemblr.com, co-organizer of Bucharest JUG, lead of Apache Provisionr. Passion for automation and data analysis. Connect with me on LinkedIn.
  3. @ Axemblr: data processing infrastructure; deployment automation on IaaS platforms; product: Hadoop On-Demand Appliance; Apache Provisionr (open source); consulting and professional services.
  4. Topics
       ● Introduction to (Big) Data: characteristics, in practice, value
       ● Amazon Data Platform: tools, how they fit
  5. What is (Big) Data? Beyond the Hype (Source)
  6. ... size and speed are relative
  7. Characteristics #1: too big, too fast, unstructured
  8. Volume (#1): "Simple models work better with more data" (The Unreasonable Effectiveness of Data, Alon Halevy, Peter Norvig, and Fernando Pereira, Google). Challenging from a technical perspective: needs scalable storage and distributed, massively parallel query engines.
  9. Velocity (#2): nothing new for financial traders. A tight feedback loop is a competitive advantage. Tools: complex event processing (CEP), online stream summarization (estimation), online aggregation (key-value stores), and long-term storage for batch processing.
  10. Variety (#3): the reality of data is messy, and the format evolves over time. Entity resolution, language detection, etc. Mantra: detect schema, annotate, enrich.
  11. Characteristics #2: In Practice
  12. (Big) data is messy: 80% of the effort goes into identifying sources, integration and cleaning. Messy and disconnected: different systems, different networks, different departments. Consider data markets.
  13. (Big) data has gravity: it tends to attract processing services, and the cost of moving it may be large.
  14. Cloud or in-house?
       Cloud:
       ● for development and exploration
       ● for low usage or variable capacity needs
       In-house:
       ● due to strict regulations
       ● for performance and cost efficiency
  15. People & Data Science: you need a team that combines math, programming and scientific instinct. See "Building data science teams": http://radar.oreilly.com/2011/09/building-data-science-teams.html
  16. (Big) Data Value
  17. ... answer them with data
  18. Enables new products: recommendation engines (think Amazon, Netflix, Facebook, LinkedIn), advanced advertising (more later), advanced search and spelling suggestions (and many more).
  19. Rule of thumb: "Advice to businesses starting out with big data: first, decide what problem you want to solve." (Christer Johnson, IBM’s leader for advanced analytics in North America). Create data-driven business processes (more).
  20. (Big) Data on AWS: http://aws.amazon.com/big-data/
  21. Based on my work at Magnolia Labs Inc. (http://magnolialabs.com/), a San Francisco, CA-based company with R&D in Romania. Various products: RTB (real-time bidding), secure browsing, etc. They are hiring! info@magnolialabs.com
  22. Overview
  23. Amazon S3
  24. Amazon Glacier
  25. Amazon EMR (Elastic MapReduce)
  26. Amazon Data Pipeline
  27. Amazon Redshift
  28. Amazon DynamoDB
  29. How do they fit together?
  30. Thanks! Questions? Andrei Savu: asavu @ axemblr.com
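Slide 8 (Volume) calls for distributed, massively parallel query engines. The core idea is to split the data into partitions, compute a partial result per partition, and merge the partials. The sketch below is a single-machine illustration under stated assumptions: a thread pool stands in for cluster nodes, and the query is an average, which has to travel as a mergeable (sum, count) pair because averaging per-partition averages is wrong when partitions differ in size. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Tuple


def partial_mean_state(partition: Iterable[float]) -> Tuple[float, int]:
    """Per-partition work: return a mergeable (sum, count) pair, not a mean."""
    total, count = 0.0, 0
    for value in partition:
        total += value
        count += 1
    return total, count


def parallel_mean(partitions: List[List[float]], workers: int = 4) -> float:
    """Scatter the partial aggregation to workers, then gather and merge the states."""
    # Threads stand in for cluster nodes in this sketch (an assumption, not a real engine).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        states = list(pool.map(partial_mean_state, partitions))
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    if count == 0:
        raise ValueError("no data")
    return total / count


if __name__ == "__main__":
    # Unequal partition sizes: a naive mean of per-partition means would give 6.0.
    data = [[1.0, 2.0, 3.0], [10.0]]
    print(parallel_mean(data))  # 4.0
```

The same split/partial/merge shape is what lets an engine add nodes as the data grows: only the small (sum, count) states cross the network, not the raw rows.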
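Slide 9 (Velocity) mentions online stream summarization, that is, estimating statistics in bounded memory instead of storing every event. A Count-Min sketch is one common estimator of this kind; it is my choice for illustration, since the slide does not name a specific technique. This minimal version assumes string keys and derives one hash per row from SHA-256 with a row prefix (an assumption; production sketches use faster hash families). `CountMinSketch` is a hypothetical class name.

```python
import hashlib
from typing import List


class CountMinSketch:
    """Fixed-memory frequency estimator for a stream of string keys.

    Estimates never undercount; they may overcount because of hash collisions.
    """

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: str) -> int:
        # A different hash per row, obtained by prefixing the row number to the key.
        digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def estimate(self, key: str) -> int:
        # The minimum over rows is the counter least inflated by collisions.
        return min(self.table[row][self._index(row, key)] for row in range(self.depth))


if __name__ == "__main__":
    sketch = CountMinSketch(width=256, depth=4)
    for symbol in ["AAPL", "GOOG", "AAPL", "MSFT", "AAPL"]:
        sketch.add(symbol)
    print(sketch.estimate("AAPL"))  # at least 3
```

Memory stays at `width * depth` counters no matter how many events arrive; widening the table reduces the overcount.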
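Slide 10 (Variety) gives the mantra "detect schema, annotate, enrich". The first step can be as simple as scanning records and collecting, per field, the types actually observed and whether the field is always present. The sketch below covers only schema detection; it assumes JSON Lines input where every line is a JSON object, and `detect_schema` is a hypothetical name.

```python
import json
from typing import Dict, Iterable


def detect_schema(lines: Iterable[str]) -> Dict[str, Dict[str, object]]:
    """Infer a loose schema from JSON lines: observed types per field and optionality.

    Assumes each line is a JSON object (not an array or scalar).
    """
    types: Dict[str, set] = {}
    seen: Dict[str, int] = {}
    total = 0
    for line in lines:
        record = json.loads(line)
        total += 1
        for field, value in record.items():
            # Python type names: int, float, str, bool, list, dict, NoneType.
            types.setdefault(field, set()).add(type(value).__name__)
            seen[field] = seen.get(field, 0) + 1
    return {
        field: {"types": sorted(types[field]), "optional": seen[field] < total}
        for field in sorted(types)
    }


if __name__ == "__main__":
    records = ['{"id": 1, "name": "Ana"}', '{"id": "2", "lang": "ro"}']
    print(detect_schema(records))
```

Fields that show more than one type (here `id` is both a number and a string) are the ones that usually need cleaning before annotation and enrichment.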
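Slide 13 says data has gravity because moving it can be expensive. A quick calculation makes this concrete: transfer time is simply bits divided by usable bandwidth. The sketch assumes decimal terabytes (10^12 bytes) and an optional efficiency factor for the fraction of nominal bandwidth you actually get; `transfer_days` is a hypothetical helper.

```python
def transfer_days(terabytes: float, megabits_per_second: float, efficiency: float = 1.0) -> float:
    """Days needed to move `terabytes` (decimal TB) over a link of the given speed.

    `efficiency` is the assumed fraction of nominal bandwidth actually achieved.
    """
    if megabits_per_second <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    bits = terabytes * 1e12 * 8
    seconds = bits / (megabits_per_second * 1e6 * efficiency)
    return seconds / 86_400


if __name__ == "__main__":
    # 10 TB over a 100 Mbit/s uplink at full speed: about 9.26 days.
    print(round(transfer_days(10, 100), 2))
```

At that rate, shipping the computation to where the data already lives is often cheaper than shipping the data.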
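Slide 14 recommends the cloud for low usage or variable capacity needs. A deliberately simplified cost model shows why: renting by the hour beats a fixed monthly cost until usage crosses a break-even point. All prices below are hypothetical, and the model ignores staff, storage, data transfer, and discounts; the function names are illustrative only.

```python
def break_even_hours(monthly_fixed_cost: float, hourly_cloud_rate: float) -> float:
    """Monthly usage (hours) above which a fixed-cost in-house setup becomes cheaper."""
    if hourly_cloud_rate <= 0:
        raise ValueError("hourly rate must be positive")
    return monthly_fixed_cost / hourly_cloud_rate


def cheaper_option(hours_used: float, monthly_fixed_cost: float, hourly_cloud_rate: float) -> str:
    """Compare one month of on-demand usage against the fixed cost (ties go to in-house)."""
    cloud_cost = hours_used * hourly_cloud_rate
    return "cloud" if cloud_cost < monthly_fixed_cost else "in-house"


if __name__ == "__main__":
    # Hypothetical: $1,500/month amortized in-house cluster vs $5/hour on demand.
    print(break_even_hours(1500, 5))     # 300.0
    print(cheaper_option(100, 1500, 5))  # cloud
    print(cheaper_option(720, 1500, 5))  # in-house
```

With a month of about 720 hours, anything used less than roughly 40% of the time in this example favors renting, which matches the "low usage or variable capacity" bullet.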
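Slide 18 lists recommendation engines as a product enabled by data. The simplest form is "people who bought X also bought Y": count how often items appear together and recommend the most frequent companions. This is a toy co-occurrence sketch for illustration, not how Amazon, Netflix, Facebook, or LinkedIn actually build theirs; the basket data and function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, Iterable, List


def build_cooccurrence(baskets: Iterable[Iterable[str]]) -> Dict[str, Counter]:
    """Count how often each pair of distinct items appears in the same basket."""
    co: Dict[str, Counter] = defaultdict(Counter)
    for basket in baskets:
        items = set(basket)  # ignore duplicates within one basket
        for a, b in permutations(items, 2):
            co[a][b] += 1
    return co


def recommend(co: Dict[str, Counter], item: str, k: int = 3) -> List[str]:
    """Top-k items most often seen together with `item` (ties broken alphabetically)."""
    ranked = sorted(co.get(item, Counter()).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


if __name__ == "__main__":
    baskets = [["book", "lamp"], ["book", "lamp", "pen"], ["book", "pen"], ["lamp", "mug"]]
    co = build_cooccurrence(baskets)
    print(recommend(co, "book", k=2))  # ['lamp', 'pen']
```

Raw counts favor popular items; real systems typically normalize them (for example by item popularity), which this sketch leaves out.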
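Slide 23 introduces Amazon S3 as the storage layer. One common convention is to lay out object keys with date partitions in `key=value` form, which tools such as Hive can treat as partitions so a query over one day only reads that day's objects. The sketch assumes a dataset name, a bucket name, and UTC timestamps, all hypothetical; `put_object(Bucket=..., Key=..., Body=...)` is the real boto3 S3 client call, and the client is passed in so the key logic can be tested without AWS.

```python
from datetime import datetime, timezone


def partitioned_key(dataset: str, event_time: datetime, filename: str) -> str:
    """Build an S3 object key with Hive-style date partitions (year=/month=/day=).

    Expects a timezone-aware datetime; it is converted to UTC first.
    """
    t = event_time.astimezone(timezone.utc)
    return f"{dataset}/year={t.year:04d}/month={t.month:02d}/day={t.day:02d}/{filename}"


def upload(s3_client, bucket: str, key: str, body: bytes) -> None:
    """Store one object; `s3_client` is expected to be a boto3 S3 client."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)


if __name__ == "__main__":
    when = datetime(2013, 4, 23, 10, 30, tzinfo=timezone.utc)
    key = partitioned_key("clicks", when, "part-0000.json")
    print(key)  # clicks/year=2013/month=04/day=23/part-0000.json
    # With AWS credentials configured (bucket name is hypothetical):
    # import boto3
    # upload(boto3.client("s3"), "my-data-bucket", key, b'{"ad": 1}')
```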
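Slide 25 covers Amazon EMR, which runs Hadoop MapReduce on a managed cluster. The programming model has three steps: mappers emit key/value pairs, the framework groups values by key (the shuffle), and reducers combine each group. The sketch below simulates those steps in one process to show the structure of a job; it is not EMR's API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in an input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    # Keys are sorted here, mirroring the sorted input that Hadoop gives each reducer.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["big data is big", "data has gravity"]))
    # {'big': 2, 'data': 2, 'gravity': 1, 'has': 1, 'is': 1}
```

On a real cluster the map and reduce calls run on different machines and the shuffle moves data over the network; the per-key logic stays the same.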
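Slide 28 covers Amazon DynamoDB, and slide 9 mentions online aggregation in key-value stores. One way to do that is an atomic counter per metric and minute, incremented with an `UpdateItem` call whose `ADD` action creates the attribute if it is missing. The sketch only builds the request for the low-level boto3 client (`update_item`); the table name, the key schema (partition key `metric`, sort key `minute`), and the helper name are hypothetical. `count` is a DynamoDB reserved word, hence the `#c` placeholder.

```python
from typing import Any, Dict


def counter_update_request(table: str, metric: str, epoch_seconds: int) -> Dict[str, Any]:
    """Build a DynamoDB UpdateItem request that atomically increments a per-minute counter.

    Assumed (hypothetical) table layout: partition key "metric" (S), sort key "minute" (N).
    """
    minute = epoch_seconds - epoch_seconds % 60  # truncate to the start of the minute
    return {
        "TableName": table,
        "Key": {"metric": {"S": metric}, "minute": {"N": str(minute)}},
        # ADD starts a missing numeric attribute from zero and increments it atomically.
        "UpdateExpression": "ADD #c :inc",
        "ExpressionAttributeNames": {"#c": "count"},
        "ExpressionAttributeValues": {":inc": {"N": "1"}},
    }


if __name__ == "__main__":
    request = counter_update_request("metrics", "bids", 1366700045)
    print(request["Key"])
    # With AWS credentials and an existing table:
    # import boto3
    # boto3.client("dynamodb").update_item(**request)
```

Keeping the raw events in S3 for batch processing while DynamoDB holds the running counters follows the split the Velocity slide describes between online aggregation and long-term storage.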