Data Analytics In The Cloud Soa World
Exploration of Use Cases, Technology, and Upcoming Challenges and Solutions of Cloud Based Data Analytics - Focus on Hadoop

Data Analytics In The Cloud Soa World Presentation Transcript

  • Open Source SOA in the Cloud: Data Analytics in the Cloud
    Tom Plunkett (TomPlunkett@vt.edu)
    Michael Sick (michael.sick@serenesoftware.com)
    SOA World 2009
  • Overview
    • Introductions
      – Who are we?
      – Baselines & definitions
    • Opportunity
      – Targeted use cases
      – Technical convergence & opportunities
      – Commercial opportunities & drivers
    • Technology & Standards
      – State of current technology
      – Commercial & FOSS solutions
      – Hadoop focus
    • Challenges
      – Challenges to meet target use cases
      – Economic challenges & the role of “free”
      – Wide scale challenges in Cloud and data analytics
    • Questions
      – Questions
      – Contacts
    Source: Tom Plunkett & Michael Sick. This work is licensed under a Creative Commons Attribution 3.0 United States License.
  • Data Analytics in the Cloud: Introductions
  • Tom Plunkett
    • Extensive Federal Government experience
    • Java and SOA certifications
    • Patents
    • Teaches OOP and Java for Virginia Tech
  • Michael Sick
    • Commercial & Federal Enterprise Architect
    • Owner: Serene Software Inc. (EA services firm)
    • Clients include: BAE, USAF, Raytheon, BearingPoint, McGraw-Hill, Sun Microsystems, Badcock Furniture
    • Fascinated by technology, 15 years running
  • Serene Software
    • Serene is a boutique consulting company focusing on delivery of Enterprise Architecture services and solutions
    • Service Areas
      – Cloud Computing
      – IT Governance
      – IT Strategy
      – IT Cost Containment
      – Service Oriented Architectures (SOA)
      – IT Solution Selection
      – IT Audit & Analysis
    • Experience includes: BAE, USAF, Raytheon, BearingPoint, McGraw-Hill, Sun Microsystems, Badcock Furniture, …
    • Founded in 2003 (privately held, no debt) and headquartered in Jacksonville, FL
  • Draft NIST Definition of Cloud Computing
    A model for enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction.
    • Essential Characteristics
      – On-demand self-service
      – Ubiquitous network access
      – Location independent resource pooling
      – Rapid elasticity
      – Measured Service
    • Delivery Models
      – Cloud Software as a Service (SaaS)
      – Cloud Platform as a Service (PaaS)
      – Cloud Infrastructure as a Service (IaaS)
    • Deployment Models
      – Private cloud
      – Community cloud
      – Public cloud
      – Hybrid cloud
    Source: Draft NIST Definition of Cloud Computing, 06/2009
  • OSI Open Source Definition
    • Free Redistribution
    • Source Code
    • Derived Works
    • Integrity of The Author's Source Code
    • No Discrimination Against Persons or Groups
    • No Discrimination Against Fields of Endeavor
    • Distribution of License
    • License Must Not Be Specific to a Product
    • License Must Not Restrict Other Software
    • License Must Be Technology-Neutral
    Source: http://www.opensource.org/docs/osd
  • The Open Group SOA Definition
    • Service-Oriented Architecture (SOA) is an architectural style that supports service orientation
    • Service orientation is a way of thinking in terms of services and service-based development and the outcomes of services
    Source: http://www.opengroup.org/projects/soa/doc.tpl?gdid=10632
  • Data Clouds & Data Grids: What's the difference?
    The terms Data Clouds and Data Grids are often used interchangeably; we make the following distinctions.
    • Data Grids
      – Grid computing system optimized to share large amounts of distributed data
      – Focus on technical capabilities
      – Often combined with computational grid computing systems
      – Data often moved to compute grid for use
      – Often oriented towards highly structured scientific data computing applications
    • Data Clouds
      – Focus on perception of infinite storage and computing capacity
      – Focus on cost, virtualization & flexible capacity
      – Enable scale-up/scale-down economics
      – Data moved rarely, locality is a key feature
      – Clouds thus far focusing on column oriented, massively scalable data stores
    Sources: Wikipedia & [Grossman 1]
  • Definition: Mashups
    • Web available resource that combines data/functions from two or more external resources
    • Idea of mashup efforts is to reduce the cost of producing and consuming resources
    • Integration should be fast, easy
    • Often focuses on widely available formats/protocols like RSS or Atom over HTTP
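The mashup idea on the slide above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two RSS documents are made-up inline strings standing in for feeds that a real mashup would fetch over HTTP, and the names `FEED_A`, `FEED_B`, `feed_items`, and `mashup` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed contents; a real mashup would download these over HTTP.
FEED_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>Hadoop release notes</title><link>http://example.com/a1</link></item>
<item><title>Cloud pricing update</title><link>http://example.com/a2</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>Analytics case study</title><link>http://example.com/b1</link></item>
</channel></rss>"""


def feed_items(feed_xml, source):
    """Parse one RSS document and return its items as plain dictionaries."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "source": source,
            "title": item.findtext("title"),
            "link": item.findtext("link"),
        }
        for item in root.iter("item")
    ]


def mashup(feeds):
    """Combine items from several feeds into one list sorted by title."""
    merged = []
    for source, feed_xml in feeds.items():
        merged.extend(feed_items(feed_xml, source))
    return sorted(merged, key=lambda entry: entry["title"])


if __name__ == "__main__":
    for entry in mashup({"A": FEED_A, "B": FEED_B}):
        print(entry["source"], entry["title"], entry["link"])
```

The combined list is itself a new resource that could be republished, which is the sense in which a mashup lowers the cost of both producing and consuming data.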
  • Data Analytics in the Cloud: Opportunities
  • Use Case: Cloud Data Analytical Tools for Intelligence Community Field Analyst
    Problem Statement: Analytical tools are obsolete on deployment; field analysts need timely, configurable data analytics. How does cloud based DA meet the needs of IC analysts?
    • Customer Problem
      – Traditional business intelligence tools require years to develop
      – Field analysts confront situations which are rapidly changing
      – Petabytes of data require analysis
    • Cloud Analytical Tools Solution
      – Recomposable Cloud Computing Data Analytical Tools: Apache Hadoop, Mashups, Service-Oriented Architecture
    • Customer Value
      – Enabling field analysts to quickly build the analytical tool they need to analyze petabytes of data
  • Why the “Buzzword” Soup? Convergence of Capabilities
    [Diagram labels: Free Open Source Software (FOSS), Virtualization, Cloud Computing, Data Analytics, SaaS, Mashups]
    Convergence of capabilities: new opportunities in breadth and depth of DA services
    • Big Data: Cloud disk and data storage engines make petabyte environments available to new clients
    • Value Based Billing: Heavy use of FOSS in the cloud reduces costs directly & indirectly
    • Capacity Scaling: Scaling capacity up and down in pay-as-you-go fashion makes DA available to a wider audience
    • Composable UIs: Capability to assemble DA results into various interfaces
  • Early Data Analytic Cloud Consumers/Providers (Cloud DA Opportunities)
    • Internet Scale Service Providers
      – Big Internet Companies: Yahoo, Amazon (can build DA on inf.)
      – SaaS Companies: Force.com (DA & Warehousing to SBA's)
      – Social Platforms: Facebook (sell DA access to anon. user info)
    • Large Data-centric Traditional Co's
      – Insurers: BCBS (private clouds across consortium)
      – Healthcare & Biotech: Kaiser Permanente (common DA services)
      – Rating Agencies: S & P (open DA cloud to customers)
    • Government Organizations
      – Intelligence Community Services: CIA (private org-wide Cloud)
      – Defense Managed Services: DISA (offer DA to .mil clients)
      – Healthcare Services: SSA (offer DA to fraud prevention analysts)
    • DAaaS Providers
      – DAaaS Infrastructure: Cloudera (managed Hadoop instances)
      – SMB DAaaS Provider: ?? (managed DAaaS, simplified, low cost)
  • Data Analytics in the Cloud: Technology & Standards
  • Google MapReduce
    • Algorithm for computing distributed problems using a divide and conquer approach with a cluster of nodes
    • Master node Maps input into smaller sub-problems and distributes the work to the cluster. A worker node may further map the work for a further cluster of nodes.
    • The worker nodes then process the smaller problems, and return the answers back to the master node
    • Master node then Reduces the set of answers into the answer to the original problem
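The master/worker flow described on the slide above can be sketched as a single-process word count in Python. This is a toy illustration, not Google's or Hadoop's API: the function names (`split_input`, `worker`, `reduce_counts`, `word_count`) are hypothetical, and the workers run in a plain loop where a real cluster would run them on separate nodes.

```python
from collections import defaultdict


def split_input(lines, n_workers):
    """Master, map step: divide the input into one chunk per worker."""
    return [lines[i::n_workers] for i in range(n_workers)]


def worker(chunk):
    """Worker: solve the smaller problem by counting words in its chunk."""
    counts = defaultdict(int)
    for line in chunk:
        for word in line.lower().split():
            counts[word] += 1
    return dict(counts)


def reduce_counts(partials):
    """Master, reduce step: merge the partial answers into the final answer."""
    total = defaultdict(int)
    for partial in partials:
        for word, n in partial.items():
            total[word] += n
    return dict(total)


def word_count(lines, n_workers=3):
    """Run the whole flow: split, process each chunk, then reduce."""
    chunks = split_input(lines, n_workers)
    # In a real cluster each chunk would be sent to a different node.
    partials = [worker(chunk) for chunk in chunks]
    return reduce_counts(partials)


if __name__ == "__main__":
    docs = [
        "the cloud stores data",
        "the cluster processes data",
        "data analytics in the cloud",
    ]
    print(word_count(docs))
```

Because each worker only sees its own chunk and the reduce step only sees partial counts, the same structure scales out by adding workers, which is the property the slide attributes to the approach.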
  • Apache Hadoop
    • Open Source implementation of the MapReduce algorithms
    • Hadoop can store and process petabytes of data
    • Subprojects include HBase, Chukwa, Hive, Pig, and ZooKeeper
    • Yahoo (more than 100,000 CPUs in >25,000 computers running Hadoop) and other companies make extensive use of Hadoop
  • As-Is Hadoop Simplified Reference Architecture
    [Diagram labels: Apache Hadoop, Chukwa, HBase, ZooKeeper, Pig, Hive, Structured Data, Unstructured Data, ETL, Business Intelligence]
  • Apache Hadoop Sub-projects (capabilities and example companies)
    • Chukwa (example company: Yahoo)
      – Data collection system for monitoring and analyzing large distributed systems
    • HBase (example company: Yahoo)
      – Similar to Google's BigTable
      – Distributed database for structured data
      – Multi-dimensional sorted map
    • Hive (example company: Facebook)
      – Data warehouse infrastructure for large datasets
      – Hive QL query language
    • Pig (example company: Yahoo)
      – High-level language for data analysis
      – Compiler for Map-Reduce programs
    • ZooKeeper (example company: Yahoo)
      – Configuration, naming, distributed synchronization, and group services
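The phrase "multi-dimensional sorted map" in the HBase entry above can be made concrete with a toy in-memory model. The sketch below is not the HBase client API. It assumes the BigTable-style indexing by row key, column key, and timestamp, and the class name `SortedCellMap` and its methods are hypothetical.

```python
import bisect


class SortedCellMap:
    """Toy map from (row, column, timestamp) to value, kept in key order."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # key tuple -> stored value

    def put(self, row, column, timestamp, value):
        """Store one version of a cell."""
        key = (row, column, -timestamp)  # negated so newer versions sort first
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, row, column):
        """Return the newest value of a cell, or None if the cell is absent."""
        probe = (row, column, float("-inf"))
        i = bisect.bisect_left(self._keys, probe)
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan(self, start_row, stop_row):
        """Yield (row, column, timestamp, value) for start_row <= row < stop_row."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < stop_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._values[self._keys[i]]
            i += 1


if __name__ == "__main__":
    table = SortedCellMap()
    table.put("row2", "cf:name", 1, "beta")
    table.put("row1", "cf:name", 1, "alpha-old")
    table.put("row1", "cf:name", 2, "alpha-new")
    print(table.get("row1", "cf:name"))
    print(list(table.scan("row1", "row3")))
```

Keeping the keys sorted is what makes range scans over adjacent rows cheap, and the timestamp dimension is what lets several versions of one cell coexist.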
  • Data Analytics in the Cloud: Challenges
  • To-Be Simplified Hadoop Architecture
    [Diagram labels: Apache Hadoop, HBase, Pig, Hive, Chukwa, ZooKeeper, REST API, SOAP API, Query Language, Algorithm Library, Structured Data, Unstructured Data, ETL, Business Intelligence]
  • Key Challenges (Emerging Challenges)
    • Infrastructure
      – Hardware: Speed of rack interconnects, multi-core
      – Parallelization: Core platform, Data Analytic components
      – Node Affinity: Make use of super nodes, XML i/o, en/de-crypt
    • Adoption
      – Cost: “brutally efficient” pricing, FOSS advantages
      – Cost Models: Accurate, open models of CapEx, OpEx costs
      – Migration Pain: Full warehouse migration, ETL
    • Administration
      – Ease of Admin.: Parallel current RDBMS, Warehouse admin
      – Debugging: Distributed debugging, integration w/ Provider
      – Flexible Provisioning: Multi-level provisioning (co., dept, individual)
      – System Reporting: Reporting, audit trails, view to DA system
    • Input & Analysis
      – ETL Integration: Interface, metadata optimized for ETL loading
      – Intuitive APIs: Declarative & programmatic cross language
      – Product Integration: BI, Applications (SAP, Oracle Financial, Lawson)
    • Output
      – Data Visualization: Viewing & drill down of very large data sets
      – Intuitive APIs: Declarative & programmatic cross language
      – Mashups/Dynamics: Easy discovery of data & functions & workflows
  • Solutions: Projected & In-Progress (Emerging Challenges)
    • Infrastructure
      – Hardware: Interconnect $$ dropping, hardware maturing
      – Parallelization: Platforms advance, market for components
      – Node Affinity: Discovery of capability, affinity into Hadoop, …
    • Adoption
      – Cost: FOSS's game to lose, small diff * a lot = a lot
      – Cost Models: Industry standard ROI/IRR models for CC
      – Migration Pain: Migration toolkits for traditional DW products
    • Administration
      – Ease of Admin.: Integrated & extended admin packages
      – Debugging: Commercial distributed debugging
      – Flexible Provisioning: Multi-level provisioning (co., dept, individual)
      – System Reporting: Reporting, audit trails, view to DA system
    • Input & Analysis
      – ETL Integration: ETL interface, support of popular packages
      – Intuitive APIs: SQL-like interface in core, language bindings
      – Product Integration: 3rd party adaptors, IWay et al
    • Output
      – Data Visualization: Modeling, meta-data, traceability, and new UIs
      – Intuitive APIs: SQL-like interface in core, language bindings
      – Mashups/Dynamics: Generic datatypes, discovery services
  • Data Analytics in the Cloud: Questions
  • Questions & Contact Information
    • Principal Architect / Partner: Michael A. Sick
      – Phone: 888.777.1847
      – Email: michael.sick@serenesoftware.com
      – Address: Serene Software, 116 19th Ave. North, Suite 503, Jacksonville Beach, FL
      – URL: www.serenesoftware.com
    • Cloud Computing Architect: Tom Plunkett
      – Phone: 888.777.1847
      – Email: TomPlunkett@vt.edu
      – Address: Serene Software, 116 19th Ave. North, Suite 503, Jacksonville Beach, FL
      – URL: www.serenesoftware.com