Cloud Computing Skepticism


  1. Abhishek Verma
  2. Outline
     • Cloud computing hype
     • Cynicism
     • MapReduce vs. Parallel DBMS
     • Cost of a cloud
  3. Platform launches
     • Google App Engine (April 2008)
     • Microsoft Azure (Oct 2008)
     • Facebook Platform (May 2007)
     • Amazon EC2 (August 2006)
     • Amazon S3 (March 2006)
     • Salesforce AppExchange (March 2006)
  4. The hype
     • “No less influential than e-business” (Gartner, 2008)
     • “Cloud computing achieves a quicker return on investment” (Lindsay Armstrong of salesforce.com, Dec 2008)
     • “Economic downturn, the appeal of that cost advantage will be greatly magnified” (IDC, 2008)
     • “Revolution, the biggest upheaval since the invention of the PC in the 1970s […] IT departments will have little left to do once the bulk of business computing shifts […] into the cloud” (Nicholas Carr, 2008)
     • “Not only is it faster and more flexible, it is cheaper. […] the emergence of cloud models radically alters the cost benefit decision” (FT, Mar 6, 2009)
     • The economics are compelling, with business applications made three to five times cheaper and consumer applications five to 10 times cheaper (Merrill Lynch, May 2008)
  5. [Figure: cloud computing on the hype cycle] From http://en.wikipedia.org/wiki/Hype_cycle
  6. “Cloud computing is simply a buzzword used to repackage grid computing and utility computing, both of which have existed for decades.” (whatis.com, Definition of Cloud Computing)
  7. “The interesting thing about cloud computing is that we’ve redefined cloud computing to include everything that we already do. […] The computer industry is the only industry that is more fashion-driven than women’s fashion. Maybe I’m an idiot, but I have no idea what anyone is talking about. What is it? It’s complete gibberish. It’s insane. When is this idiocy going to stop?”
     Larry Ellison, during Oracle’s Analyst Day
     From http://blogs.wsj.com/biztech/2008/09/25/larry-ellisons-brilliant-anti-cloud-computing-rant/
  8. From http://geekandpoke.typepad.com
  9. Many enterprises (necessarily or unnecessarily) set their SLA uptimes at 99.99% or higher, which cloud providers have not yet been prepared to match.
     Amazon’s cloud outages receive a lot of exposure …
     • July 20, 2008: Failure due to stranded zombies, lasts 5 hours
     • Feb 15, 2008: Authentication overload leads to two-hour service outage
     • October 2007: Service failure lasts two days
     • October 2006: Security breach where users could see other users’ data
     … and their current SLAs don’t match those of enterprises*
     • Amazon EC2: 99.95%
     • Amazon S3: 99.9%
     * SLAs expressed in Monthly Uptime Percentages; Source: McKinsey & Company
     • Not clear that all applications require such high service levels
     • IT shops do not always deliver on their SLAs, but their failures are less public and customers can’t switch easily
  10. Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, Michael Stonebraker
      To appear in SIGMOD ’09
      * Basic ideas from “MapReduce - a major step backwards”, D. DeWitt and M. Stonebraker
  11. MapReduce criticisms
      • A giant step backward
        • No schemas, CODASYL instead of Relational
      • A sub-optimal implementation
        • Uses brute force sequential search, instead of indexing
        • Materializes O(m·r) intermediate files
        • Does not incorporate data skew
      • Not novel at all
        • Represents a specific implementation of well known techniques developed nearly 25 years ago
      • Missing most of the common current DBMS features
        • Bulk loader, indexing, updates, transactions, integrity constraints, referential integrity, views
      • Incompatible with DBMS tools
        • Report writers, business intelligence tools, data mining tools, replication tools, database design tools
  12. Architectural Element | Parallel Databases | MapReduce
      Schema Support | Structured | Unstructured
      Indexing | B-Trees or Hash based | None
      Programming Model | Relational | CODASYL
      Data Distribution | Projections before aggregation | Logic moved to data, but no optimizations
      Execution Strategy | Push | Pull
      Flexibility | No, but Ruby on Rails, LINQ | Yes
      Fault Tolerance | Transactions have to be restarted in the event of a failure | Yes: Replication, Speculative execution
  13. Responses to the criticism*
      • MapReduce didn't kill our dog, steal our car, or try and date our daughters.
      • MapReduce is not a database system, so don't judge it as one
      • Both analyze and perform computations on huge datasets
      • MapReduce has excellent scalability; the proof is Google's use
      • Does it scale linearly?
      • No scientific evidence
      • MapReduce is cheap and databases are expensive
      • We are the old guard trying to defend our turf/legacy from the young turks
      • Propagation of ideas between sub-disciplines is very slow and sketchy
      • Very little information is passed from generation to generation
      * http://www.databasecolumn.com/2008/01/mapreduce-continued.html
  14. Benchmark setup
      • Hadoop
        • 0.19 on Java 1.6, 256MB block size, JVM reuse
        • Rack-awareness enabled
      • DBMS-X (unnamed)
        • Parallel DBMS from a “major relational db vendor”
        • Row based, compression enabled
      • Vertica (co-founded by Stonebraker)
        • Column oriented
      • Hardware configuration: 100 nodes
        • 2.4 GHz Intel Core 2 Duo
        • 4GB RAM, two 250GB SATA hard disks
        • GigE ports, 128Gbps switching fabric
  15. Data loading and the Grep dataset
      • Hadoop
        • Command line utility
      • DBMS-X
        • LOAD SQL command
        • Administrative command to reorganize data
      • Grep Dataset
        • Record = 10-byte key + 90-byte random value
        • 5.6 million records = 535MB/node
        • Another set = 1TB/cluster
  16. SELECT * FROM Data WHERE field LIKE '%XYZ%';
  17. SELECT pageURL, pageRank FROM Rankings WHERE pageRank > X;
  18. SELECT INTO Temp sourceIP, AVG(pageRank) AS avgPageRank, SUM(adRevenue) AS totalRevenue
      FROM Rankings AS R, UserVisits AS UV
      WHERE R.pageURL = UV.destURL
        AND UV.visitDate BETWEEN Date('2000-01-15') AND Date('2000-01-22')
      GROUP BY UV.sourceIP;
      SELECT sourceIP, totalRevenue, avgPageRank
      FROM Temp
      ORDER BY totalRevenue DESC LIMIT 1;
  19. Results
      • DBMS-X 3.2 times, Vertica 2.3 times faster than Hadoop
      • Parallel DBMSs win because of
        • B-tree indices to speed the execution of selection operations
        • novel storage mechanisms (e.g., column-orientation)
        • aggressive compression techniques with ability to operate directly on compressed data
        • sophisticated parallel algorithms for querying large amounts of relational data
      • Ease of installation and use
      • Fault tolerance?
      • Loading data?
  20. References
      • “Clearing the air on cloud computing”, McKinsey & Company
      • http://geekandpoke.typepad.com/
      • “Clearing the Air - Adobe Air, Google Gears and Microsoft Mesh”, Farhad Javidi
      • http://en.wikipedia.org/wiki/Hype_cycle
      • “A Comparison of Approaches to Large-Scale Data Analysis”, Pavlo et al.
      • “MapReduce - a major step backwards”, D. DeWitt and M. Stonebraker
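
The SLA percentages on slide 9 are easier to compare as a monthly downtime budget. The sketch below is only an illustration of that arithmetic: it assumes a 30-day month, and the function name `allowed_downtime_minutes` is made up for this example.

```python
# Assumption: a 30-day month; real SLA terms define the billing period themselves.
MINUTES_PER_MONTH = 30 * 24 * 60


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Minutes of downtime per month that still satisfy the given uptime percentage."""
    return MINUTES_PER_MONTH * (1 - uptime_percent / 100)


# Uptime figures taken from slide 9.
slas = {"Enterprise target": 99.99, "Amazon EC2": 99.95, "Amazon S3": 99.9}

for name, percent in slas.items():
    minutes = allowed_downtime_minutes(percent)
    print(f"{name}: {percent}% uptime allows about {minutes:.1f} minutes of downtime per month")
```

Under that assumption the budgets come to roughly 4.3, 21.6 and 43.2 minutes per month, so a single five-hour outage (300 minutes) would exceed any of them.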
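
The Grep dataset sizes on slide 15 and the pattern search on slide 16 can be sketched in a few lines. This is a rough illustration, not the benchmark code: the record generator, the uppercase alphabet, the choice to search only the value part, and the names `make_record` and `grep_map` are all assumptions made for this example.

```python
import random
import string

KEY_BYTES = 10                # key size from slide 15
VALUE_BYTES = 90              # random value size from slide 15
RECORDS_PER_NODE = 5_600_000  # records per node from slide 15


def per_node_mib(records: int = RECORDS_PER_NODE) -> float:
    """Raw data size per node in MiB, ignoring any storage overhead."""
    return records * (KEY_BYTES + VALUE_BYTES) / 2**20


def make_record(rng: random.Random) -> str:
    """Hypothetical generator: 100 random uppercase characters (key followed by value)."""
    return "".join(rng.choice(string.ascii_uppercase) for _ in range(KEY_BYTES + VALUE_BYTES))


def grep_map(records, pattern: str = "XYZ"):
    """Map-style full scan: yield records whose value part contains the pattern."""
    for record in records:
        if pattern in record[KEY_BYTES:]:
            yield record


rng = random.Random(0)
sample = [make_record(rng) for _ in range(1000)]
matches = list(grep_map(sample))

print(f"{per_node_mib():.1f} MiB of raw records per node")
print(f"{len(matches)} of {len(sample)} sample records contain 'XYZ'")
```

At 100 bytes per record, 5.6 million records come to about 534 MiB, which is consistent with the 535MB/node figure on the slide. The scan touches every record, which is the brute-force behaviour that slide 11 contrasts with index-based selection.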
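
To see what the join task on slide 18 computes, the query can be run on a toy dataset with Python's built-in `sqlite3` module. This is a sketch under several assumptions: the table columns are inferred from the query, the rows and addresses are invented, `SELECT INTO` is replaced by `CREATE TABLE ... AS` because SQLite lacks it, the intermediate table is renamed `JoinTemp`, and dates are stored as ISO strings so that `BETWEEN` compares them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schemas inferred from the columns used in the slide's query.
CREATE TABLE Rankings (pageURL TEXT, pageRank INTEGER);
CREATE TABLE UserVisits (sourceIP TEXT, destURL TEXT, visitDate TEXT, adRevenue REAL);

-- Invented sample rows, for illustration only.
INSERT INTO Rankings VALUES ('a.example', 10), ('b.example', 30);
INSERT INTO UserVisits VALUES
  ('10.0.0.1', 'a.example', '2000-01-16', 1.0),
  ('10.0.0.1', 'b.example', '2000-01-20', 2.0),
  ('10.0.0.2', 'b.example', '2000-01-18', 2.5),
  ('10.0.0.2', 'a.example', '2000-02-01', 9.0);

-- Step 1: join, filter by date range, aggregate per source IP.
CREATE TABLE JoinTemp AS
  SELECT UV.sourceIP AS sourceIP,
         AVG(R.pageRank) AS avgPageRank,
         SUM(UV.adRevenue) AS totalRevenue
  FROM Rankings AS R, UserVisits AS UV
  WHERE R.pageURL = UV.destURL
    AND UV.visitDate BETWEEN '2000-01-15' AND '2000-01-22'
  GROUP BY UV.sourceIP;
""")

# Step 2: pick the source IP with the highest total revenue.
top = conn.execute(
    "SELECT sourceIP, totalRevenue, avgPageRank "
    "FROM JoinTemp ORDER BY totalRevenue DESC LIMIT 1"
).fetchone()
print(top)  # ('10.0.0.1', 3.0, 20.0)
```

The visit dated 2000-02-01 falls outside the date range, so its larger revenue does not count toward the second address.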
