Open Source SOA in the Cloud: Data Analytics in the Cloud
Tom Plunkett   TomPlunkett@vt.edu
Michael Sick  ...
Data Analytics In The Cloud Soa World

"Data Analytics in the Cloud" was presented at SOA World as part of the SOA & Cloud Computing track. It focuses on open source software, SOA, data analytics, and Apache Hadoop.

Published in: Technology, Education
Transcript of "Data Analytics In The Cloud Soa World"

  1. Open Source SOA in the Cloud: Data Analytics in the Cloud
     Tom Plunkett, TomPlunkett@vt.edu
     Michael Sick, michael.sick@serenesoftware.com
     SOA World 2009
  2. Overview
     Introductions
       • Who are we?
       • Baselines & definitions
     Opportunity
       • Targeted Use Cases
       • Technical convergence & opportunities
       • Commercial opportunities & drivers
     Technology & Standards
       • State of current technology
       • Commercial & FOSS solutions
       • Hadoop Focus
     Challenges
       • Challenges to Meet Target Use Cases
       • Economic challenges & the role of “free”
       • Wide scale challenges in Cloud and data analytics
     Questions
       • Questions
       • Contacts
     This work is licensed under a Creative Commons Attribution 3.0 United States License (Tom Plunkett & Michael Sick).
  3. Data Analytics in the Cloud: Introductions
  4. Tom Plunkett
       • Extensive Federal Government Experience
       • IBM Certified SOA Solution Designer
       • Patents
       • Teaches OOP and Java for Virginia Tech
  5. Michael Sick
       • Commercial & Federal Enterprise Architect
       • Owner: Serene Software Inc. – EA Services Firm
       • Clients include: BAE, USAF, Raytheon, BearingPoint, McGraw-Hill, Sun Microsystems, Badcock Furniture
       • Fascinated by technology, 15 years running
  6. Serene Software
       • Serene is a boutique consulting company focusing on delivery of Enterprise Architecture services and solutions
       • Service Areas
           – IT Governance
           – IT Strategy
           – IT Cost Containment
           – Service Oriented Architectures (SOA)
           – IT Solution Selection
           – IT Audit & Analysis
       • Experience includes: BAE, USAF, Raytheon, BearingPoint, McGraw-Hill, Sun Microsystems, Badcock Furniture, …
       • Founded in 2003 (privately held, no debt) and headquartered in Jacksonville, FL
  7. Draft NIST Definition of Cloud Computing
     A model for enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction.
     Essential Characteristics
       • On-demand self-service
       • Ubiquitous network access
       • Location independent resource pooling
       • Rapid elasticity
       • Measured Service
     Delivery Models
       • Cloud Software as a Service (SaaS)
       • Cloud Platform as a Service (PaaS)
       • Cloud Infrastructure as a Service (IaaS)
     Deployment Models
       • Private cloud
       • Community cloud
       • Public cloud
       • Hybrid cloud
     Source: Draft NIST Definition of Cloud Computing, 06/2009
  8. OSI Open Source Definition
       • Free Redistribution
       • Source Code
       • Derived Works
       • Integrity of The Author's Source Code
       • No Discrimination Against Persons or Groups
       • No Discrimination Against Fields of Endeavor
       • Distribution of License
       • License Must Not Be Specific to a Product
       • License Must Not Restrict Other Software
       • License Must Be Technology-Neutral
     Source: http://www.opensource.org/docs/osd
  9. The Open Group SOA Definition
       • Service-Oriented Architecture (SOA) is an architectural style that supports service orientation
       • Service orientation is a way of thinking in terms of services and service-based development and the outcomes of services
     Source: http://www.opengroup.org/projects/soa/doc.tpl?gdid=10632
  10. Data Clouds & Data Grids – What's the difference?
     The terms Data Cloud and Data Grid are often used interchangeably; we make the following distinctions.
     Data Grids
       • Grid computing system optimized to share large amounts of distributed data
       • Focus on technical capabilities
       • Often combined with computational grid computing systems
       • Data often moved to compute grid for use
       • Often oriented towards highly structured scientific data computing applications
     Data Clouds
       • Focuses on perception of infinite storage, computing capacity
       • Focus on cost, virtualization & flexible capacity
       • Enables scale-up/scale-down economics
       • Data moved rarely, locality is a key feature
       • Clouds thus far focusing on column oriented, massively scalable data stores
     Sources: Wikipedia & [Grossman 1]
  11. Definition: Mashups
       • Web available resource that combines data/functions from two or more external resources
       • Idea of mashup efforts is to reduce the cost of producing and consuming resources
       • Integration should be fast, easy
       • Often focuses on widely available formats/protocols like RSS or Atom over HTTP
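To make the mashup definition concrete, here is a minimal Python sketch that combines entries from one RSS feed and one Atom feed into a single list. The feed contents, the `example.com` links, and the function names are all made up for illustration; a real mashup would fetch the feeds over HTTP rather than use inline strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feeds (inline so the sketch runs without network access).
RSS_SAMPLE = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>Hadoop cluster grows</title><link>http://example.com/a1</link></item>
</channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"><title>Feed B</title>
<entry><title>Cloud pricing update</title><link href="http://example.com/b1"/></entry>
</feed>"""

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def rss_items(xml_text):
    """Extract title/link pairs from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in root.iter("item")
    ]


def atom_entries(xml_text):
    """Extract title/link pairs from an Atom document."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        entries.append({
            "title": entry.findtext("atom:title", namespaces=ATOM_NS),
            "link": link.get("href") if link is not None else None,
        })
    return entries


def mashup(rss_text, atom_text):
    """Combine both sources into one list, sorted by title."""
    return sorted(rss_items(rss_text) + atom_entries(atom_text),
                  key=lambda e: e["title"])


if __name__ == "__main__":
    for entry in mashup(RSS_SAMPLE, ATOM_SAMPLE):
        print(entry["title"], "->", entry["link"])
```

Normalizing both formats into the same small dictionary shape is what keeps the integration "fast, easy": downstream code does not need to know which source an entry came from.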
  12. Data Analytics in the Cloud: Opportunities
  13. Use Case: Cloud Data Analytical Tools for Intelligence Community Field Analyst
     Problem Statement: Analytical tools are obsolete on deployment; field analysts need timely, configurable data analytics. How does cloud-based DA meet the needs of IC analysts?
     Customer Problem
       • Traditional business intelligence tools require years to develop
       • Field Analysts confront situations which are rapidly changing
       • Petabytes of data require analysis
     Cloud Analytical Tools Solution
       • Recomposable Cloud Computing Data Analytical Tools
           – Apache Hadoop
           – Mashups
           – Service-Oriented Architecture
     Customer Value
       • Enabling field analysts to quickly build the analytical tool they need to analyze petabytes of data
  14. Why the “Buzzword” Soup? Convergence of Capabilities
     [Diagram: Free Open Source Software (FOSS), Virtualization, Cloud Computing, SaaS, Data Analytics, Mashups]
     Convergence of capabilities: new opportunities in breadth and depth of DA services
       • Big Data: Cloud disk and data storage engines make petabyte environments available to new clients
       • Value Based Billing: Heavy use of FOSS in the cloud reduces costs directly & indirectly
       • Capacity Scaling: Scaling up/down of capacity in pay-go fashion makes DA available to wider audience
       • Composable UI's: Capability to assemble DA results into various interfaces
  15. Early Data Analytic Cloud Consumers/Providers
     Cloud DA Opportunities, by profile (type: example companies)
     Internet Scale Service Providers
       • Big Internet Companies: Yahoo, Amazon – can build DA on inf. Services
       • SaaS Companies: Force.com – DA & Warehousing to SBA's
       • Social Platforms: Facebook – sell DA access to anon. user info
     Large data-centric Traditional Co's
       • Insurers: BCBS – private clouds across consortium Services
       • Healthcare & Biotech: Kaiser Permanente – common DA services
       • Rating Agencies: S & P – open DA cloud to customers
     Government Organizations
       • Intelligence Community: CIA – private org-wide Cloud Services
       • Defense Managed Services: DISA – offer DA to .mil clients
       • Healthcare Services: SSA – offer DA to fraud prevention analysts
     DAaaS Providers
       • DAaaS Infrastructure: Cloudera – managed Hadoop instances
       • SMB DAaaS Provider: ?? – managed DAaaS, simplified, low cost
  16. Data Analytics in the Cloud: Technology & Standards
  17. Google MapReduce
       • Algorithm for computing distributed problems using a divide and conquer approach with a cluster of nodes
       • Master node Maps input into smaller sub-problems and distributes the work to the cluster. A worker node may further map the work for a further cluster of nodes
       • The worker nodes then process the smaller problems, and return the answers back to the master node
       • Master node then Reduces the set of answers into the answer to the original problem
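The map and reduce steps described on this slide can be sketched as a single-process word count in Python. This is only an illustration of the programming model under simplifying assumptions: there is no cluster, no master or worker processes, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical rather than part of any Hadoop or Google API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key so each key can be reduced on its own."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse all values for one key into a single answer."""
    return key, sum(values)


def word_count(chunks: List[str]) -> Dict[str, int]:
    """Run map over every chunk, group by key, then reduce each group."""
    intermediate = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Each string stands in for a block of input handed to one worker.
    print(word_count(["the cloud stores data", "the cloud scales"]))
```

Because `map_phase` looks at one chunk at a time and `reduce_phase` looks at one key at a time, both steps could be spread across many nodes without changing the result, which is the property the divide and conquer description relies on.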
  18. Apache Hadoop
       • Open Source implementation of the MapReduce algorithms
       • Hadoop can store and process petabytes of data
       • Subprojects include HBase, Chukwa, Hive, Pig, and ZooKeeper
       • Yahoo (more than 100,000 CPUs in >25,000 computers running Hadoop) and other companies make extensive use of Hadoop
  19. As-Is Hadoop Simplified Reference Architecture
     [Diagram: Apache Hadoop with Chukwa, HBase, Zookeeper, Pig and Hive; also labeled: Structured Data, Unstructured Data, ETL, Business Intelligence]
  20. Apache Hadoop Sub-projects
     Chukwa (example company: Yahoo)
       • Data collection system for monitoring and analyzing large distributed systems
     HBase (example company: Yahoo)
       • Similar to Google's BigTable
       • Distributed database for structured data
       • Multi-dimensional sorted map
     Hive (example company: Facebook)
       • Data warehouse infrastructure for large datasets
       • Hive QL query language
     Pig (example company: Yahoo)
       • High-level language for data analysis
       • Compiler for Map-Reduce programs
     Zookeeper (example company: Yahoo)
       • Configuration, Naming, Distributed Synchronization, and group services
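The "multi-dimensional sorted map" idea listed for HBase can be illustrated with a toy in-memory table in Python: values are addressed by row key, column, and timestamp, and row keys are kept in sorted order so that range scans are cheap. The class and method names below are hypothetical and are not the HBase client API; this is a sketch of the data model only, with no distribution or persistence.

```python
import bisect


class SortedMapTable:
    """Toy model: row key -> column -> timestamp -> value, rows kept sorted."""

    def __init__(self):
        self._row_keys = []  # sorted list of row keys
        self._rows = {}      # row key -> {column: {timestamp: value}}

    def put(self, row, column, timestamp, value):
        """Store one versioned cell, inserting the row key in sorted position."""
        if row not in self._rows:
            bisect.insort(self._row_keys, row)
            self._rows[row] = {}
        self._rows[row].setdefault(column, {})[timestamp] = value

    def get(self, row, column):
        """Return the newest version of a cell, or None if it does not exist."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Return row keys in [start_row, stop_row), in sorted order."""
        lo = bisect.bisect_left(self._row_keys, start_row)
        hi = bisect.bisect_left(self._row_keys, stop_row)
        return self._row_keys[lo:hi]


if __name__ == "__main__":
    table = SortedMapTable()
    table.put("user2", "info:name", 1, "Bo")
    table.put("user1", "info:name", 1, "Al")
    table.put("user1", "info:name", 2, "Alice")  # newer version of the same cell
    print(table.get("user1", "info:name"))
    print(table.scan("user1", "user3"))
```

Keeping rows sorted by key is the design choice that matters here: a scan over a key range touches only adjacent rows, which is what lets a store of this kind split a large table into contiguous ranges.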
  21. Data Analytics in the Cloud: Challenges
  22. To-Be Simplified Hadoop Architecture
     [Diagram: Apache Hadoop with HBase, Pig, Chukwa, Zookeeper and Hive; also labeled: REST API, SOAP API, Query Language, Algorithm Library, Structured Data, Unstructured Data, ETL, Business Intelligence]
  23. Key Challenges
     Emerging challenges, by area
     Infrastructure
       • Hardware: Speed of Rack Interconnects, Multi-core
       • Parallelization: Core platform, Data Analytic Components
       • Node Affinity: Make use of super nodes, XML i/o, en/de-crypt
     Adoption
       • Cost: “brutally efficient” pricing, FOSS advantages
       • Cost Models: Accurate, open models of CapEx, OpEx costs
       • Migration Pain: Full warehouse migration, ETL
     Administration
       • Ease of Admin.: Parallel current RDBMS, Warehouse admin
       • Debugging: Distributed debugging, integration w/ Provider
       • Flexible Provisioning: Multi-level provisioning – co., dept, individual
       • System Reporting: Reporting, audit trails, view to DA system
     Input & Analysis
       • ETL Integration: Interface, metadata optimized for ETL loading
       • Intuitive API's: Declarative & programmatic cross language
       • Product Integration: BI, Applications (SAP, Oracle Financial, Lawson)
     Output
       • Data Visualization: Viewing & drill down of very large data sets
       • Intuitive API's: Declarative & programmatic cross language
       • Mashups/Dynamics: Easy discovery of data & functions & workflows
  24. Solutions: Projected & In-Progress
     Emerging challenges, by area
     Infrastructure
       • Hardware: Interconnect $$ dropping, hardware maturing
       • Parallelization: Platforms advance, market for components
       • Node Affinity: Discovery of capability, affinity into Hadoop, …
     Adoption
       • Cost: FOSS's game to lose, small diff * a lot = a lot
       • Cost Models: Industry standard ROI/IRR models for CC
       • Migration Pain: Migration toolkits for traditional DW products
     Administration
       • Ease of Admin.: Integrated & extended admin packages
       • Debugging: Commercial distributed debugging
       • Flexible Provisioning: Multi-level provisioning – co., dept, individual
       • System Reporting: Reporting, audit trails, view to DA system
     Input & Analysis
       • ETL Integration: ETL interface, support of popular packages
       • Intuitive API's: SQL like interface in core, language bindings
       • Product Integration: 3rd party adaptors, IWay et al
     Output
       • Data Visualization: Modeling, meta-data, traceability, and new UI's
       • Intuitive API's: SQL like interface in core, language bindings
       • Mashups/Dynamics: Generic datatypes, discovery services
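The "small diff * a lot = a lot" point under Cost is plain multiplication, and a short Python sketch makes the scale visible. All numbers here are hypothetical and chosen only for illustration (a 2 cent per node-hour price gap on a 1,000 node cluster running all year); they are not figures from the presentation.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_cost_gap_cents(price_gap_cents_per_node_hour, nodes,
                          hours=HOURS_PER_YEAR):
    """Total yearly cost difference, in cents, from a small per-node-hour gap."""
    return price_gap_cents_per_node_hour * nodes * hours


if __name__ == "__main__":
    # Hypothetical inputs: 2 cents per node-hour, 1,000 nodes, one full year.
    gap_cents = annual_cost_gap_cents(2, 1000)
    print(f"${gap_cents / 100:,.2f} per year")
```

Working in integer cents avoids floating point rounding until the final display step.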
  25. Data Analytics in the Cloud: Questions
  26. Questions? & Contact Information
     Michael A. Sick, Principal Architect / Partner
       888.777.1847
       michael.sick@serenesoftware.com
       Address: Serene Software, 116 19th Ave. North, Suite 503, Jacksonville Beach, FL
       URL: www.serenesoftware.com
     Tom Plunkett, Cloud Computing Architect
       888.777.1847
       TomPlunkett@vt.edu
       Address: Serene Software, 116 19th Ave. North, Suite 503, Jacksonville Beach, FL
       URL: www.serenesoftware.com