Transcript of "The Elephant In The Room: Big Data Analytics In the Cloud"

  1. The Elephant In The Room: Big Data Analytics In the Cloud
     Bill Peer, Principal, Infosys Labs
     UP 2012 Cloud Computing Conference, San Francisco, California, December 12, 2012
  2. What’s on the agenda?
     • Definitions
     • Big Data and Analytic Technologies
     • Architecture Stuff
     • Summary
     Appendix: References
  3. Definitions: Big Data
     • Big Data: data processing scenarios wherein the volume, variety, and/or velocity of the data is such that conventional RDBMS and/or Data Warehouse technologies alone do not suffice for the need.
     Bill’s Stake In The Ground:
     • Volume: greater than 100 GB
     • Variety: structured and unstructured (forms, video, blogs, photos, …)
     • Velocity: 10 GB per hour
  4. Definitions: Analytics
     • Analytics: discovery of meaningful patterns in data
     Bill’s Two Uses:
     • Decision Support (to help make a choice)
       - Business Intelligence
       - Operational Intelligence
     • Value Creation (to add worth)
       - Algorithm Discovery
       - Analytics as a Service
  5. Past, Present, Future (images sourced from WikiCommons)
  6. Big Data and Analytic Technologies
  7. Big Data and Analytic Technology: 3 to Know (Hadoop)
     • Based on Google paper published in 2004 (MapReduce)
     • Can be segmented into 2 key capabilities: MapReduce and HDFS
     • Designed to work in a distributed, fault-prone environment
     • MapReduce: processing orchestration framework (great if a problem can be easily divided)
     • HDFS: Hadoop File System (reliable independent of persistence mechanism, by way of multi-node replication)
     • Job based!
       - Pig Latin: language to explore data
       - Hive QL: SQL-like calls
       - Mahout: machine learning collection
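
The "great if a problem can be easily divided" point can be illustrated with a minimal, single-process sketch of the map, shuffle, and reduce steps. This is plain Python, not the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce step: combine every value seen for one key."""
    return key, sum(values)

def word_count(documents):
    # Each document is an independent split, so the map calls could run
    # on different machines; here they simply run one after another.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Hypothetical input splits, standing in for blocks of a large file.
splits = ["big data in the cloud", "big data analytics", "the cloud"]
counts = word_count(splits)
print(counts)
```

Because no map call depends on another, the work divides cleanly across splits; a problem that lacks this property is a poorer fit for the job-based model.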
  8. Big Data and Analytic Technology: 3 to Know (Drill)
     • Based on Google paper published in 2010 (Dremel)
     • Provides analysis of large-scale datasets
     • Designed to work in a distributed environment
     • Query Languages
     • Low-Latency Distributed Execution: columnar-centric storage
     • Apache Incubator Phase
     • Google BigQuery
     “[Dremel] is capable of running aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google.” (src: Google Dremel paper)
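
To show why columnar-centric storage suits aggregation queries, here is a toy sketch in plain Python. It is not the storage format used by Dremel or Drill (which also handle nested records); the sample rows and the helper names `to_columns` and `sum_by` are hypothetical, and every row is assumed to have the same fields.

```python
# Hypothetical click records stored row by row.
rows = [
    {"country": "US", "clicks": 10, "url": "a.example"},
    {"country": "DE", "clicks": 7, "url": "b.example"},
    {"country": "US", "clicks": 5, "url": "c.example"},
]

def to_columns(records):
    """Pivot row-oriented records into one list per field."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

def sum_by(columns, group_col, value_col):
    """Aggregate value_col per distinct value of group_col.

    Only the two named columns are read; every other column
    (here "url") is never touched by the query.
    """
    totals = {}
    for key, value in zip(columns[group_col], columns[value_col]):
        totals[key] = totals.get(key, 0) + value
    return totals

columns = to_columns(rows)
totals = sum_by(columns, "country", "clicks")
print(totals)
```

In a row layout the same query would have to read every field of every record; in the column layout the amount of data scanned grows only with the columns the query names.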
  9. Big Data and Analytic Technology: 3 to Know (Storm)
     • Event streaming platform used by Twitter
     • Allows for continuous real-time data spelunking
     • Designed to work in a distributed environment leveraging clusters
     • Resident Queries: requests for event patterns of interest are continuously watched for
     • Topology Centric: you create graphs of computation
     • Event Streaming is Different: Storm can be used effectively to build a Complex Event Processing (CEP) capability by an enterprise. As with other CEP-type frameworks, it requires a shift to an uncommon perspective to be effective.
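
The ideas of a resident query and a graph of computation can be sketched with chained Python generators. This is not Storm's API (Storm topologies are built from spouts and bolts on the JVM); the stage names, the event fields, and the window and threshold values below are assumptions for illustration.

```python
from collections import deque

def event_source(events):
    """Stand-in for a stream source; a real stream would never end."""
    for event in events:
        yield event

def only_failures(stream):
    """Processing stage: pass along failed events only."""
    for event in stream:
        if event["status"] == "fail":
            yield event

def repeated_failures(stream, window=3, threshold=2):
    """Resident query: the pattern is registered once and checked on
    every arriving event, instead of being run against stored data."""
    recent = deque(maxlen=window)  # users of the last `window` failures
    for event in stream:
        recent.append(event["user"])
        if list(recent).count(event["user"]) >= threshold:
            yield {"alert": "repeated_failure", "user": event["user"]}

# Hypothetical events; the three stages form a simple linear topology.
events = [
    {"user": "ann", "status": "ok"},
    {"user": "bob", "status": "fail"},
    {"user": "ann", "status": "fail"},
    {"user": "bob", "status": "fail"},
    {"user": "cat", "status": "ok"},
]
alerts = list(repeated_failures(only_failures(event_source(events))))
print(alerts)
```

The shift in perspective is that the query stays put while the data flows past it, which is the reverse of querying data at rest.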
  10. A Cloud Centric Big Data NRT Architecture
      [Figure: architecture diagram with CEP and Interactive Query components, spanning three zones labeled Not Cloud, In Cloud, Not Cloud]
      *Architecture graphic is a modified version of WSO2’s BAM picture
  11. Big, Big Data Analytic Architecture Consideration
      • Data Transfer Speed
        • Where is your data? Is it where you will be processing?
        • 1 TB of data takes:
          • 300 hours over a 10 Mbps network
          • 30 hours over a 100 Mbps network
          • 3 hours over a 1 Gbps network
          • 20 minutes over a 10 Gbps network
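
The transfer times can be sanity-checked with a short calculation. The sketch below assumes 1 TB means 10^12 bytes and lets you apply an `efficiency` factor for protocol overhead and contention; that factor and the function name `transfer_hours` are assumptions, not something the slide states.

```python
ONE_TB = 10**12  # bytes, assuming decimal terabytes

def transfer_hours(data_bytes, link_bits_per_second, efficiency=1.0):
    """Hours needed to move data_bytes over a link.

    efficiency is the fraction of the nominal link rate actually
    achieved (1.0 means the full line rate with no overhead).
    """
    effective_bps = link_bits_per_second * efficiency
    return data_bytes * 8 / effective_bps / 3600

LINKS = {
    "10 Mbps": 10e6,
    "100 Mbps": 100e6,
    "1 Gbps": 1e9,
    "10 Gbps": 10e9,
}

for name, bps in LINKS.items():
    print(f"{name}: {transfer_hours(ONE_TB, bps):.2f} hours at line rate")
```

At full line rate this gives about 222 hours for the 10 Mbps link and about 13 minutes for the 10 Gbps link. The slide's rounder figures are somewhat higher, which would be consistent with achieving roughly three quarters of the nominal rate in practice, though the slide does not say how its numbers were derived.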
  12. Framework For Selecting Approach
  13. Summary
      • “approaches for near-real time Business Intelligence and Analytics”
      • “Info. on technologies ranging from Hadoop to Dremel to Event Streaming”
      • “applicability and limitations of these when in the Cloud”
      • “high-level architectures that must be considered will be shared”
      • “entertained, energized, and enlightened”
      • “realistic frame of reference to bring back to their organization”
      • “Journey to the Clouds”
      • “Dumbo can really fly”
  14. Feedback Forms
      Please extract from your wallet one of the feedback forms to the right. Add any commentary you have in the white space, and hand to the presenter after the session.
      Thank you for attending! See you in the Clouds!
  15. References
      • Big Data Spectrum, Infosys. http://www.infosys.com/cloud/resource-center/Pages/big-data-spectrum.aspx
      • Dremel: Interactive Analysis of Web-Scale Datasets, Melnik et al., Google. http://research.google.com/pubs/pub36632.html
      • DrillProposal, Apache. http://wiki.apache.org/incubator/DrillProposal
      • Storm Rationale. https://github.com/nathanmarz/storm/wiki/Rationale
      • WSO2 BAM, WSO2. http://wso2.com/products/business-activity-monitor/
