Hw09 Data Processing In The Enterprise


Published in: Technology

  1. Hadoop In The Enterprise?
     • Sih Lee & Peter Krey, Innovation & Shared Services, Firmwide Engineering & Architecture
     • Hadoop World, New York City, October 2nd, 2009
     • © 2009 JPMorgan Chase & Co. All rights reserved. Confidential and proprietary to JPMorgan Chase & Co.
  2. Agenda
     • JPMorgan Chase + Open Source
     • Hadoop In The Enterprise?
     • Active POC Pipeline
     • Hadoop Positioning
     • Cost Comparisons
     • Hadoop Additions & Must Haves
     • Q&A
  3. JPMorgan Chase + Open Source
     • Established Multi-Year Open Source History
     • Big Supporter of Industry Standards & Open Source Projects
     • Numerous Production Open Source Implementations
         • QPID (AMQP) - Top Level Apache Project (http://qpid.apache.org/)
         • Tyger - Apache + Tomcat + Spring - Fully Integrated App Server Environment, 30+ OS Components
         • Compute Backbone (CBB) HPC Grid - 1000's of Linux Based Compute Servers
         • MuleSoft.org (a.k.a. MuleSource) Enterprise Message Bus
         • others …
  4. Hadoop In The Enterprise – Economics Driven
     • Many Big Data Lessons Learned From Web 2.0 Community
     • Potential For Large Capex and Opex "Dislocation"
         • Reduced Consumption of Enterprise Premium Resources
         • Grid Computing Economics Brought To Data Intensive Computing
     • Stagnant Data Innovation
     • Enabling & Potentially Disruptive Platform
     • Many Historical Similarities
         • Java, Linux, Tomcat, Web / Internet, …
         • Mini's to Client / Server, Client / Server to Web, Solaris to Linux, …
     • Key Question: What Can Be Built On Top of and Enabled by Hadoop?
  5. Hadoop In The Enterprise – Choice Driven
     • Overuse of Relational Database Containers
         • Institutional "Muscle Memory" … Not Much Else to Choose From
         • Increasingly Large Percentage of Static Data Stored In Proprietary Transactional DB's
         • Over-Normalized Schemas … Still Makes Sense With Cheap Compute & Storage?
     • Enterprise Storage "Prisoners"
         • Captive To The Economics & Technology of "A Few" Vendors
     • Developers Need More Choice
         • Too Much Proprietary, Single-Source Data Infrastructure
     • Increasing Need For Minimal / No System + Storage Admins
  6. Hadoop In The Enterprise – Other Drivers
     • Growing Developer Interest In "No SQL" Data Technologies
         • Open Source, Distributed, Non-relational Databases
     • Growing Influence Of Web 2.0 Technologies & Thinking On Enterprise
         • Hadoop, Cassandra, HBase, Hive, CouchDB, HadoopDB, …, others
         • memcached For Caching
     • FSI Industry Drivers
         • Increased Regulatory Oversight + Reporting = More Data Needed Over Longer Period Of Time
         • Growing Need For Less Expensive Data Repository / Store
         • Increasing Need To Support "One Off" Analysis On Large Data
  7. Active POC Pipeline
     • Growing Stream of Real Projects To Gauge Hadoop "Goodness of Fit"
     • Broad Spectrum of Use Cases
     • Driven By Need To Impact / Dislocate OPEX + CAPEX
     • Evaluated On Metric Based Performance, Functional, And Economic Measures
  8. Hadoop Positioning
     [Positioning chart; layout not recoverable from the transcript. Labels that survive:]
     • Data volume: GB's, TB's –> PB's
     • Workload regions: Index Based Access – Updates / XActns; Index Based Access – Analysis, Lower-Latency; Semi-Structured Analysis, Higher-Latency
     • Plotted systems: Map/Reduce + HDFS; DW1 through DW7; SQLDB1 through SQLDB4; InMemory1
  9. Comparative Storage Cost
     • Bar Graph Slide: "Normalized" SAN + NAS $ per GB per month versus HDFS $ per GB per month
     [Bar graph with bars labeled SAN, NAS, and Hadoop; values not recoverable from the transcript.]
  10. Enterprise Data Warehousing Costs
     • "Normalized" bar chart utilizing retail $ per TB
     [Bar chart: Data Warehouse S/W, $K per TB, by product; vertical axis from $0 to $250; individual values not recoverable from the transcript.]
  11. Hadoop Additions & Must Haves
     • Improved SQL Front-end Tool Interoperability
         • Better Interop With Skills & Content That Firms Already Have
     • Improved Security & ACL enforcement … Kerberos integration?
     • Grow Developer Programming Model Skill Sets
     • Improve Relational Container Integration & Interop For Data Archival
     • Management & Monitoring Tools
     • Improved Developer & Debugging Tools
     • Reduce Latency Via Integration With Open Source Data Caching
         • memcached, others
     • Invitation To FSI or Enterprise Roundtable
  12. Q&A
     • Sih Lee, Head of Innovation & Shared Services, Firmwide Engineering & Architecture, W# 212-622-3038, sih.x.lee@jpmchase.com
     • Peter Krey, Consultant, Innovation & Shared Services, Firmwide Engineering & Architecture, W# 212-622-2926, peter.j.krey@jpmchase.com