Cloud as a Data Platform

My slides on how to use the cloud as a data platform, presented at BigDataWeek 2013 Romania

Usage Rights: © All Rights Reserved

    Cloud as a Data Platform: Presentation Transcript

    • Cloud as a Data Platform
      What is (Big) Data? Amazon Data Services
    • Andrei Savu
      Founder of Axemblr.com
      Co-organizer of Bucharest JUG
      Lead of Apache Provisionr
      Passion for Automation & Data Analysis
      Connect with me on LinkedIn
    • @ Axemblr
      Data Processing Infrastructure
      Deployment Automation on IaaS platforms
      Product: Hadoop On-Demand Appliance
      Apache Provisionr (Open Source)
      Consulting & Professional Services
    • Topics
      Introduction to (Big) Data
        ● Characteristics
        ● In Practice
        ● Value
      Amazon Data Platform
        ● Tools
        ● How they fit
    • What is (Big) Data?
      Beyond the Hype (Source)
    • ... size & speed are relative
    • Characteristics #1: Too big, Too fast, Unstructured
    • 1. Volume
      "Simple models work better with more data"
      The Unreasonable Effectiveness of Data, by Alon Halevy, Peter Norvig, and Fernando Pereira, Google
      Challenging from a technical perspective
      Needs scalable storage
      Distributed query engines (massively parallel)
    • 2. Velocity
      Nothing new for financial traders
      Tight feedback loop as competitive advantage
      Complex event processing (CEP)
      Online stream summarization (estimation)
      Online aggregation (key-value stores)
      Long term storage for batch processing
    • 3. Variety
      The reality of data is messy and the format evolves over time
      Entity Resolution, Language Detection etc.
      Mantra: Detect Schema, Annotate, Enrich
    • Characteristics #2: In Practice
    • (Big) data is messy
      80% of the effort goes into identifying sources, integration and cleaning
      Messy and disconnected: different systems, different networks, different departments
      Consider data markets
    • (Big) data has gravity
      Tends to attract processing services
      The cost of moving may be large
    • Cloud or in-house?
      Cloud:
        ● for development & exploration
        ● low usage or variable capacity needs
      In-house:
        ● due to strict regulations
        ● for performance and cost efficiency
    • People & Data Science
      You need a team that combines: math, programming and scientific instinct
      Building data-science teams
    • (Big) Data Value
    • ... answer them w/ Data
    • Enables New Products
      Recommendation engines (think Amazon, Netflix, Facebook, LinkedIn)
      Advanced advertising (more later)
      Advanced search & spelling suggestions
      (and many more)
    • Rule of thumb
      "Advice to businesses starting out with big data: first, decide what problem you want to solve." *
      Christer Johnson, IBM’s leader for advanced analytics in North America
      * create data-driven business processes (more)
    • (Big) Data on AWS
    • Based on my work at Magnolia Labs Inc.
      San Francisco, CA based company with R&D in Romania
      Various products: RTB (real-time bidding), Secure Browsing etc.
      They are hiring!
    • Overview
    • Amazon S3
    • Amazon Glacier
    • Amazon EMR (Elastic MapReduce)
    • Amazon Data Pipeline
    • Amazon RedShift
    • Amazon DynamoDB
    • How do they fit?
    • Thanks! Questions?
      Andrei Savu - asavu @ axemblr.com
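
The "Online aggregation (key-value stores)" item on the Velocity slide can be illustrated with a minimal sketch. It assumes a plain Python dict stands in for the key-value store and keeps only a running count and mean per key, so raw events never need to be held in memory. The class name `OnlineAggregator`, the campaign keys, and the bid values are hypothetical and not taken from the talk.

```python
from collections import defaultdict


class OnlineAggregator:
    """Per-key running count and mean, updated one event at a time."""

    def __init__(self):
        # Assumption: in-memory dicts stand in for a real key-value store.
        self.count = defaultdict(int)
        self.mean = defaultdict(float)

    def update(self, key, value):
        # Incremental mean update: new_mean = old_mean + (x - old_mean) / n
        self.count[key] += 1
        self.mean[key] += (value - self.mean[key]) / self.count[key]

    def summary(self, key):
        # Returns (number of events seen, running mean) for the key.
        return self.count[key], self.mean[key]


# Hypothetical stream of (campaign, bid) events.
agg = OnlineAggregator()
for campaign, bid in [("a", 1.0), ("b", 4.0), ("a", 3.0), ("a", 5.0)]:
    agg.update(campaign, bid)

print(agg.summary("a"))
print(agg.summary("b"))
```

Each update touches a single key and needs constant memory, which is why this style of aggregation pairs naturally with a key-value store; the raw events can still be written to long term storage for later batch processing.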