AWS Partner Presentation - PetaByte Scale Computing on Amazon EC2 with BigData - Vishal Malik, Cognizant
 

Usage Rights

© All Rights Reserved

Presentation Transcript

  • PetaByte Scale Computing on Amazon EC2 with Big Data. Vishal Malik, Head, Cloud CoE, Cognizant
  • PetaByte scale computing on Amazon EC2 with BigData. ©2011, Cognizant
  • Some Background
      • Only 5% of the data on the web today is structured.
      • The challenge is cutting through the noise:
          • Processing huge volumes of data and filtering at scale.
          • Turning raw unstructured data into insights using machine learning (ML) and similar techniques.
          • Adding relevance to data by personalizing content.
          • Applying ML to learn what a user likes and serving more of it (for example, to drive online ad revenue).
      • By 2013, we'll have 650 exabytes on the internet.
      • Real-time sentiment analysis will become more prevalent.
      • The need for a single organization to process 40+ TB of compressed data per day will become more prevalent.
  • The Challenge?How to scale without significant increase in the infrastructure cost (processing & storage).How to do Analytics near real-time, as opposed to guess work! Process 5TB+ (uncompressed data) in less than 1 minute! (today it can take ~ an hour)People are asking new questions everyday, hence the need to have all data in DWH and ability to answer these questions near real-time. (Agile BI!)Feedback loop presenting data in line with user preferences.3 | ©2011, Cognizant
  • RDBMS: the "good and the bad"
      • Good for: relational data and transactions.
      • Bad for:
          • Queues, polling, caching.
          • Social graph tree traversal, NxN relationships.
          • Workloads that don't require ACID for everything.
          • Scaling to petabytes of data.
      • Traditional SQL-based systems:
          • Replication delay and cache eviction produce inconsistent results for the end user.
          • Slow (single threaded).
          • Locks create contention for popular data, so they can't scale to petabytes.
  • Solution?Cost effective way to Process data and, Store dataProcessing side: One of the most popular ones are: Use Hadoop (Open Source MR framework) for back-end distributed processing. Build a sql-like (lightweight) layer on top of Hadoop. Access time is in micro-seconds, moving towards near-real time!Storage side: Popular and very stables ones are: Use S3, SimpleDB (from Amazon’s AWS) etc Private cloud using NoSQL db’s namely Hbase, CouchDB, MongoDB, Riak, Redis etc5 | ©2011, Cognizant
  • Current State of Storage Tiering Solutions
      • Existing solutions:
          • Only a hardware-based option.
          • Cost of implementation is very high, assuming RAID 6, RAID 10+0 and other costly options.
          • Hardware-based solutions are not user friendly, and the policies set are transparent to the user.
          • Purely based on disk storage hardware from a support perspective.
          • Storing 7 TB in 6 hours or less is not possible using current disks with an 80 MB/s write rate.
      • Customers are asking for storage solutions that are cheaper and easier to implement and understand.
          • "We're seeing a big opportunity to position iMoveS where data is growing significantly along with cost"
      • Innovation required?
          • Easy-to-manage storage.
          • Easy-to-implement storage systems.
          • Have a say in the policies set to move data wherever required at the disk level.
          • Visibility of what is happening to my data and how/where it is stored.
          • Better control over where/how/what is stored on my storage systems.
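The "7 TB in 6 hours" claim can be checked with simple arithmetic. The sketch below assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes), a sustained 80 MB/s write rate per disk, and ideal striping with no RAID or filesystem overhead; none of these assumptions are stated on the slide.

```python
# How much can one disk write in 6 hours at 80 MB/s, and how many such
# disks would 7 TB need? Assumptions: decimal units, sustained rate,
# perfect parallel striping, no RAID or filesystem overhead.
import math

MB = 10**6
TB = 10**12

write_rate = 80 * MB          # bytes per second, per disk
window_s = 6 * 3600           # 6 hours in seconds
target_bytes = 7 * TB

per_disk_tb = write_rate * window_s / TB           # capacity of one disk in the window
hours_single = target_bytes / write_rate / 3600    # time for one disk to write 7 TB
disks_needed = math.ceil(target_bytes / (write_rate * window_s))

print(f"one disk writes {per_disk_tb:.3f} TB in 6 hours")
print(f"one disk needs {hours_single:.1f} hours for 7 TB")
print(f"disks needed in parallel: {disks_needed}")
```

Under these assumptions a single disk writes about 1.7 TB in the window, so the target is only reachable by writing to several disks in parallel.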
  • NoSQL DataStores
      • Make storing and retrieving information easier to manage and use:
          • Based on access pattern, migrate data to the right storage engine according to pre-set policies. E.g. stores with < 10% writes go to HBase; checkpoint stores > 50 GB go to HBase; stores < 50 GB go to MongoDB.
          • Understand access patterns to refine and retune the policies under which data migration happens.
      • Make the software-based storage engine do all the intelligent work:
          • Performance gains.
          • High availability.
          • Administration and monitoring.
          • Low cost per gigabyte.
      • All done using the iMoveS Engine:
          • Software-based checkpoint system.
          • Policy-based data object migration.
          • Policy-based data access/storage.
          • Extreme scalability.
          • Great for machine-generated data for analysis.
      • Anyone should be able to store data and not worry about replication, RAID, or mirroring.
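The example policy on this slide (low-write and large stores to HBase, smaller stores to MongoDB) could be expressed as a small routing function. This is a hypothetical sketch, not the iMoveS engine: `StoreProfile` and `choose_engine` are invented names, and the slide leaves two things open that are filled in here as labeled assumptions (the write-ratio rule is checked first, and a store of exactly 50 GB goes to MongoDB).

```python
# Hypothetical policy router for the example thresholds on the slide.
# Not the iMoveS implementation; names and tie-breaking are assumptions.
from dataclasses import dataclass

GB = 10**9  # decimal gigabyte (assumption)


@dataclass(frozen=True)
class StoreProfile:
    name: str
    size_bytes: int
    write_fraction: float  # writes / (reads + writes), between 0.0 and 1.0


def choose_engine(profile: StoreProfile) -> str:
    """Pick a storage engine from the observed access pattern."""
    # Assumption: the write-ratio rule takes precedence over the size rule.
    if profile.write_fraction < 0.10:
        return "HBase"
    if profile.size_bytes > 50 * GB:
        return "HBase"
    # Assumption: exactly 50 GB is unspecified on the slide; send it to MongoDB.
    return "MongoDB"


profiles = [
    StoreProfile("clickstream-archive", 200 * GB, 0.05),
    StoreProfile("checkpoint-large", 80 * GB, 0.40),
    StoreProfile("session-cache", 5 * GB, 0.60),
]
for p in profiles:
    print(f"{p.name}: {choose_engine(p)}")
```

Keeping the thresholds in one function makes it straightforward to retune the policy as access patterns are better understood, which is the refinement loop the slide describes.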
  • Thank You