Cloud as a Data Platform

My slides on how to use the cloud as a data platform, presented at BigDataWeek 2013 Romania

Presentation Transcript

  • Cloud as a Data Platform
    What is (Big) Data? Amazon Data Services
  • Andrei Savu
    Founder of Axemblr.com
    Co-organizer of Bucharest JUG
    Lead of Apache Provisionr
    Passion for Automation & Data Analysis
    Connect with me on LinkedIn
  • @ Axemblr
    Data Processing Infrastructure
    Deployment Automation on IaaS platforms
    Product: Hadoop On-Demand Appliance
    Apache Provisionr (Open Source)
    Consulting & Professional Services
  • Topics
    Introduction on (Big)Data
    ● Characteristics
    ● In Practice
    ● Value
    Amazon Data Platform
    ● Tools
    ● How they fit
  • What is (Big)Data?
    Beyond the Hype (Source)
  • ... size & speed are relative
  • Characteristics #1
    Too big, Too fast, Unstructured
  • 1. Volume
    "Simple models work better with more data"
    The Unreasonable Effectiveness of Data, Alon Halevy, Peter Norvig, and Fernando Pereira, Google
    Challenging from a technical perspective
    Needs scalable storage
    Distributed query engines (massively parallel)
  • 2. Velocity
    Nothing new for financial traders
    Tight feedback loop as competitive advantage
    Complex event processing (CEPs)
    Online stream summarization (estimation)
    Online aggregation (key-value stores)
    Long term storage for batch processing
  • 3. Variety
    The reality of data is messy and the format evolves over time
    Entity Resolution, Language Detection etc.
    Mantra: Detect Schema, Annotate, Enrich
  • Characteristics #2
    In Practice
  • (Big) data is messy
    80% of the effort goes into identifying sources, integration and cleaning
    Messy and disconnected: different systems, different networks, different departments
    Consider data-markets
  • (Big) data has gravity
    Tends to attract processing services
    The cost of moving may be large
  • Cloud or in-house?
    Cloud:
    ● for development & exploration
    ● low usage or variable capacity needs
    In-house:
    ● due to strict regulations
    ● for performance and cost efficiency
  • People & Data Science
    You need a team that combines: math, programming and scientific instinct
    Building data-science teams
  • (Big)Data Value
  • ... answer them w/ Data
  • Enables New Products
    Recommendation engines (think Amazon, Netflix, Facebook, LinkedIn)
    Advanced advertising (more later)
    Advanced search & spelling suggestions
    (and many more)
  • Rule of thumb
    "Advice to businesses starting out with big data: first, decide what problem you want to solve." *
    Christer Johnson, IBM’s leader for advanced analytics in North America
    * create data-driven business processes (more)
  • (Big)Data on AWS
  • Based on my work at Magnolia Labs Inc.
    San Francisco, CA based company with R&D in Romania
    Various products: RTB (real-time bidding), Secure Browsing etc.
    They are hiring!
  • Overview
  • Amazon S3
  • Amazon Glacier
  • Amazon EMR (Elastic MapReduce)
  • Amazon Data Pipeline
  • Amazon RedShift
  • Amazon DynamoDB
  • How they fit?
  • Thanks! Questions?
    Andrei Savu - asavu @ axemblr.com
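
The Velocity slide lists "online stream summarization (estimation)" and "online aggregation (key-value stores)" without showing what they look like. A minimal sketch of the idea follows, assuming events arrive as (key, value) pairs and that an in-memory dict stands in for a real key-value store. The names `RunningStats` and `aggregate` are hypothetical and do not come from the slides; the update rule is Welford's single-pass algorithm, chosen here only as one common way to summarize a stream without keeping the raw values.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Single-pass summary of a numeric stream (Welford's algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Fold one new observation into the summary in O(1) time and memory.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; defined as 0.0 until two values have been seen.
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


def aggregate(events):
    """Keep one summary per key; the dict is a stand-in for a key-value store."""
    store = defaultdict(RunningStats)
    for key, value in events:
        store[key].update(value)
    return store


# Hypothetical event stream: (key, measured value) pairs.
events = [("ads", 2.0), ("search", 10.0), ("ads", 4.0), ("ads", 6.0)]
store = aggregate(events)
print(store["ads"].count, store["ads"].mean, store["ads"].variance)
```

Only the per-key summaries are held in memory, so the raw events can be written to long-term storage for batch processing, as the same slide suggests.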