Cloud Computing Skepticism






Presentation Transcript

  • Cloud Computing Skepticism
    Abhishek Verma
  • Outline
    Cloud computing hype
    MapReduce Vs Parallel DBMS
    Cost of a cloud
  • Recent Trends
    Amazon S3
    (March 2006)
    Amazon EC2
    (August 2006)
    Google App Engine
    (April 2008)
    Microsoft Azure
    (Oct 2008)
    Facebook Platform
    (May 2007)
  • Tremendous Buzz
  • Gartner Hype Cycle*
    Cloud Computing
    * From http://en.wikipedia.org/wiki/Hype_cycle
  • Blind men and an Elephant
  • “Cloud computing is simply a buzzword used to repackage grid computing and utility computing, both of which have existed for decades.”
    Definition of Cloud Computing
  • “The interesting thing about cloud computing is that we’ve redefined cloud computing to include everything that we already do. […]
    The computer industry is the only industry that is more fashion-driven than women’s fashion.
    Maybe I’m an idiot, but I have no idea what anyone is talking about. What is it? It’s complete gibberish. It’s insane. When is this idiocy going to stop?”
    Larry Ellison
    During Oracle’s Analyst Day
    From http://blogs.wsj.com/biztech/2008/09/25/larry-ellisons-brilliant-anti-cloud-computing-rant/
  • From http://geekandpoke.typepad.com
  • Reliability
    Many enterprises (necessarily or unnecessarily) set their SLA uptimes at 99.99% or higher, which cloud providers have not yet been prepared to match
    • Not clear that all applications require such high service levels
    • IT shops do not always deliver on their SLAs but their failures are less public and customers can’t switch easily
    * SLAs expressed in Monthly Uptime Percentages; Source : McKinsey & Company
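To make the 99.99% figure concrete, the sketch below converts a monthly uptime percentage into a downtime budget. The 30-day month and the 99.9% and 99.95% comparison values are assumptions chosen for illustration; only 99.99% comes from the slide.

```python
# Minutes of downtime permitted per month by a monthly uptime percentage.
# Assumption: a 30-day month; the slide does not state the month length.
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def allowed_downtime_minutes(uptime_percent: float,
                             minutes_in_month: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Return the downtime budget, in minutes, for one month."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return minutes_in_month * (100.0 - uptime_percent) / 100.0


# 99.99% is the slide's figure; the other two values are illustrative.
for sla in (99.9, 99.95, 99.99):
    print(f"{sla}% uptime allows {allowed_downtime_minutes(sla):.2f} minutes of downtime")
```

Under these assumptions, 99.99% leaves a little over four minutes per month, which gives a sense of how tight that target is.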
  • A Comparison of Approaches to Large-Scale Data Analysis*
    Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, Michael Stonebraker
    To appear in SIGMOD ’09
    *Basic ideas from MapReduce - a major step backwards, D. DeWitt and M. Stonebraker
  • MapReduce – A major step backwards
    A giant step backward
    No schemas, CODASYL instead of relational
    A sub-optimal implementation
    Uses brute-force sequential search instead of indexing
    Materializes O(m·r) intermediate files
    Does not incorporate data skew
    Not novel at all
    Represents a specific implementation of well known techniques developed nearly 25 years ago
    Missing most of the common current DBMS features
    Bulk loader, indexing, updates, transactions, integrity constraints, referential Integrity, views
    Incompatible with DBMS tools
    Report writers, business intelligence tools, data mining tools, replication tools, database design tools
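The O(m·r) claim can be illustrated with a small count. This is a sketch of the model the critique describes, in which each of m map tasks writes one file per reduce task; it is not a statement about how any particular MapReduce release lays out its files, and the task counts below are hypothetical.

```python
def intermediate_file_count(map_tasks: int, reduce_tasks: int) -> int:
    """Count intermediate files when every map task writes one file per reduce task."""
    if map_tasks < 0 or reduce_tasks < 0:
        raise ValueError("task counts must be non-negative")
    return map_tasks * reduce_tasks


# Hypothetical job sizes, chosen only to show how quickly the product grows.
for m, r in [(10, 5), (100, 50), (1000, 500)]:
    print(f"{m} map tasks x {r} reduce tasks -> {intermediate_file_count(m, r)} files")
```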
  • MapReduce II*
    MapReduce didn't kill our dog, steal our car, or try and date our daughters. 
    MapReduce is not a database system, so don't judge it as one
    Both analyze and perform computations on huge datasets
    MapReduce has excellent scalability; the proof is Google's use
    Does it scale linearly?
    No scientific evidence
    MapReduce is cheap and databases are expensive
    We are the old guard trying to defend our turf/legacy from the young turks
    Propagation of ideas between sub-disciplines is very slow and sketchy
    Very little information is passed from generation to generation
    * http://www.databasecolumn.com/2008/01/mapreduce-continued.html
  • Tested Systems
    Hadoop 0.19 on Java 1.6, 256MB block size, JVM reuse
    Rack-awareness enabled
    DBMS-X (unnamed)
    Parallel DBMS from a “major relational db vendor”
    Row based, compression enabled
    Vertica (co-founded by Stonebraker)
    Column oriented
    Hardware configuration: 100 nodes
    2.4 GHz Intel Core 2 Duo
    4GB RAM, two 250GB SATA hard disks
    GigE ports, 128Gbps switching fabric
  • Data Loading
    • Hadoop
    • Command line utility
    • DBMS-X
    • LOAD SQL command
    • Administrative command to re-organize data
    • Grep Dataset
    • Record = 10-byte key + 90-byte random value
    • 5.6 million records = 535MB/node
    • Another set = 1TB/cluster
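The per-node figure for the Grep dataset can be checked with a few lines of arithmetic. The sketch assumes the record fields are measured in bytes and that "MB" on the slide means 2^20 bytes; the `make_record` helper and its seed are hypothetical and only show the record layout.

```python
import random

KEY_BYTES = 10                # key size from the slide, read as bytes
VALUE_BYTES = 90              # random value size from the slide, read as bytes
RECORDS_PER_NODE = 5_600_000  # record count from the slide

record_bytes = KEY_BYTES + VALUE_BYTES
node_bytes = RECORDS_PER_NODE * record_bytes
node_mib = node_bytes / 2**20  # assumption: "MB" means 2**20 bytes


def make_record(rng: random.Random) -> bytes:
    """Build one record: a key followed by a random value (hypothetical helper)."""
    key = bytes(rng.getrandbits(8) for _ in range(KEY_BYTES))
    value = bytes(rng.getrandbits(8) for _ in range(VALUE_BYTES))
    return key + value


sample = make_record(random.Random(0))
print(f"record size: {len(sample)} bytes")
print(f"data per node: {node_bytes:,} bytes, about {node_mib:.1f} MiB")
```

The result, roughly 534 MiB, is consistent with the slide's rounded 535MB per node.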
  • Grep Task Results
  • Select Task Results
    SELECT pageURL, pageRank
    FROM Rankings
    WHERE pageRank > X;
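As a rough illustration of what the two kinds of system are asked to do in the selection task, the sketch below runs the filter once as SQL and once as a map-only function. SQLite stands in for the database only because it ships with Python; the rows, the threshold, and the `select_map` function are hypothetical, and the column names follow the Rankings columns used in the join task.

```python
import sqlite3

# Hypothetical toy rows: (pageURL, pageRank).
rows = [("a.example", 5), ("b.example", 42), ("c.example", 17)]
X = 10  # hypothetical threshold

# SQL formulation, with SQLite as a stand-in engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Rankings (pageURL TEXT, pageRank INTEGER)")
conn.executemany("INSERT INTO Rankings VALUES (?, ?)", rows)
sql_result = conn.execute(
    "SELECT pageURL, pageRank FROM Rankings WHERE pageRank > ? ORDER BY pageURL",
    (X,),
).fetchall()


def select_map(record, threshold):
    """Map-only formulation: emit the record if its rank exceeds the threshold."""
    url, rank = record
    if rank > threshold:
        yield url, rank


# No reduce step is needed; the map output is the answer.
mr_result = sorted(pair for rec in rows for pair in select_map(rec, X))

print(sql_result)
print(mr_result)
```

Both formulations return the same rows; the difference the study measures is how each system executes the filter at scale.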
  • Join Task
    SELECT INTO Temp sourceIP,
      AVG(pageRank) AS avgPageRank,
      SUM(adRevenue) AS totalRevenue
    FROM Rankings AS R,
      UserVisits AS UV
    WHERE R.pageURL = UV.destURL
      AND UV.visitDate
      BETWEEN Date('2000-01-15')
      AND Date('2000-01-22')
    GROUP BY UV.sourceIP;
    SELECT sourceIP, totalRevenue
    FROM Temp
    ORDER BY totalRevenue;
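The join task can be tried on toy data with SQLite. This is a sketch under several assumptions: SQLite has no `SELECT INTO`, so `CREATE TABLE Temp AS` is substituted; dates are stored as ISO-8601 strings so that `BETWEEN` compares them correctly; and every row below is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Rankings (pageURL TEXT, pageRank INTEGER);
CREATE TABLE UserVisits (sourceIP TEXT, destURL TEXT, visitDate TEXT, adRevenue REAL);
""")

# Hypothetical rows, not taken from the benchmark data.
conn.executemany("INSERT INTO Rankings VALUES (?, ?)",
                 [("a.example", 10), ("b.example", 30)])
conn.executemany("INSERT INTO UserVisits VALUES (?, ?, ?, ?)", [
    ("192.0.2.1", "a.example", "2000-01-16", 1.0),
    ("192.0.2.1", "b.example", "2000-01-20", 2.0),
    ("192.0.2.2", "a.example", "2000-01-18", 0.5),
    ("192.0.2.2", "b.example", "2000-02-01", 9.0),  # outside the date range
])

# First statement: join, filter by date, and aggregate per source IP.
# CREATE TABLE ... AS replaces SELECT INTO, which SQLite does not support.
conn.execute("""
CREATE TABLE Temp AS
SELECT UV.sourceIP AS sourceIP,
       AVG(R.pageRank) AS avgPageRank,
       SUM(UV.adRevenue) AS totalRevenue
FROM Rankings AS R, UserVisits AS UV
WHERE R.pageURL = UV.destURL
  AND UV.visitDate BETWEEN '2000-01-15' AND '2000-01-22'
GROUP BY UV.sourceIP
""")

# Second statement: read the aggregates ordered by revenue.
result = conn.execute(
    "SELECT sourceIP, totalRevenue FROM Temp ORDER BY totalRevenue"
).fetchall()
print(result)
```

The out-of-range visit is excluded before aggregation, so the second source IP reports only 0.5 in revenue.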
  • Concluding Remarks
    DBMS-X was 3.2 times faster than Hadoop, and Vertica was 2.3 times faster than DBMS-X
    Parallel DBMSs win because of
    B-tree indices that speed up selection operations
    novel storage mechanisms (e.g., column orientation)
    aggressive compression techniques, with the ability to operate directly on compressed data
    sophisticated parallel algorithms for querying large amounts of relational data
    Ease of installation and use
    Fault tolerance?
    Loading data?
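The indexing advantage mentioned above can be sketched in miniature. A sorted Python list with binary search stands in for a B-tree here, which is a simplification and not how any of the tested systems implement their indices; the values and threshold are hypothetical.

```python
import bisect


def scan_select(ranks, threshold):
    """Brute-force sequential search: inspect every value."""
    return [r for r in ranks if r > threshold]


def indexed_select(sorted_ranks, threshold):
    """Binary-search a sorted list to jump straight to the first qualifying value."""
    start = bisect.bisect_right(sorted_ranks, threshold)
    return sorted_ranks[start:]


ranks = [3, 50, 12, 7, 50, 99]  # hypothetical pageRank values
index = sorted(ranks)           # built once, reused by every later query

by_scan = sorted(scan_select(ranks, 12))
by_index = indexed_select(index, 12)
print(by_scan, by_index)
```

The scan touches every value on every query, while the sorted structure finds the starting point in a logarithmic number of comparisons; the price is building and maintaining the index, which is part of why data loading remains an open question on the slide.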
  • References
    “Clearing the air on cloud computing”, McKinsey & Company
    “Clearing the Air - Adobe Air, Google Gears and Microsoft Mesh”, Farhad Javidi
    “A Comparison of Approaches to Large-Scale Data Analysis”, Pavlo et al.
    “MapReduce - a major step backwards”, D. DeWitt and M. Stonebraker