
Enterprise Integration of Disruptive Technologies



This talk will detail HSBC's Big Data journey to date, walking through the genesis of the Big Data initiative, which was triggered by continual challenges in delivering data-driven products. The global scale, diversity and legacy of an organization like HSBC present challenges for Hadoop adoption not typically faced by younger companies. Big Data technologies are by their very nature disruptive to the established Enterprise IT environment. Hadoop and the peripheral toolsets in the Big Data ecosystem do not fit comfortably into an Enterprise Data Centre or IT operational processes, and can even prove disruptive to current organizational structures. Alasdair will focus on the steps that HSBC has taken to mitigate concerns about Hadoop and to raise awareness of the game-changing benefits that a successful adoption of the technology will bring. HSBC has taken an innovative approach to proving out the value of the technology: engaging developers with a brakes-off opportunity to use the platform, and placing Hadoop in a competitive scenario with traditional technologies. The Hadoop journey in HSBC was initiated in Scotland, blessed in London and proved out in China.



  1. Big Data Adoption: Enterprise Integration of Disruptive Technologies. Prepared by: Alasdair Anderson. Date: 18 March 2013. PUBLIC
  2. Big Data and the Enterprise !=
  3. Business Context: HSBC (HSS), a business with a lot of data…
     – Global business: a global outsourcer of investment operations
     – Active in 40+ countries and jurisdictions
     – Over 150 operational technology systems
     – Outsourcing is a diverse and incrementally complex business
  4. Challenges in building Big Data Environments
     – Design: ETL is a brittle, one-shot chance at success; "one version of the truth"…; tight coupling to the relational model; any significant change initiates data migration
     – Vertical scale: RDBMSs struggle to scale out; multiple marts increase duplication
     – Run: big batch; appliances are uneconomic; cost increases with proliferation
     – Time to market: months for any given slice, years in total
     – Total cost: any high-volume or low-latency environment requires annual spend in the millions to tens of millions
     – [Diagram: layered architecture Source → Integration → Warehouse → Division Marts → Channels, with ETL between layers; labels include Trades, Position, Corp Actions, Market Data, Client, Exchange, External, Staging, ODS, Enterprise Logical Model, Strategic Marts, Product and Function marts, eCommerce, Reporting and Analytical Tools]
  5. Building Big Data platforms has been an unhappy experience
     – Time to market has increased proliferation, not consolidation
     – Delivery risk is high, as witnessed in industry-wide failure rates
     – Ultimate customer satisfaction is low: we often end up answering yesterday's questions tomorrow
     – The economics of traditional technologies are against the proliferation of analytical platforms:
         – costs increase with the addition of data sources
         – costs of change increase with the addition of data sources
     – Processing ceilings are reached quickly when adding newer sources of data to traditional platforms
  6. Crisis of Supply and Demand: we need a new approach
     High-level requirements…
     – A single data platform that can provide 360-degree views of clients, operations and products
     – Functionally, the platform should support:
         – continual development, integration and deployment
         – parallel development streams
         – integration of poly-structured datasets
         – multiple views on single datasets
         – …and act as an ENABLER of change
     – Non-functionally, the platform should support:
         – a low-cost economic model for analytical platforms
         – scale to terabytes with high-throughput ingest and integration
         – co-existence with our current estate
         – accessibility to business and technology teams
     Enter Hadoop!
  7. Introducing any new technology to an enterprise
     – Adoption lifecycle for Hadoop: Learn, Plan, Build; milestones: Proof of Concept, Business Value, Pilot Projects, Strategic Stack
     – What have we done? What's left, what's next?
  8. Big Data Vision
  9. Big Data Vision: The Agile Information Lifecycle
     – [Diagram: Data Events, Ingest, Processing (MapReduce), Discovery, Analytical Blotters, Application]
     – Insights rarely happen on the first query or build; they are more likely to occur after several iterations on a dataset
  10. Hadoop Proof of Concept Scope: Guangzhou, China
     – Using a vendor package; time to install Hadoop; ease of maintaining the cluster; performance comparison
     – Developing on the cluster; integration of existing applications; building existing code on Hadoop; porting databases to Hadoop
     – Advanced analytics: enhancing an existing analytics package; building out a new modelling service on Hadoop; development skill levels
  11. Proof of Concept Results
     – Hadoop was installed and operational in a week
     – 18 RDBMS warehouse and mart databases were ported to Hadoop in 4 weeks
     – An existing batch that currently takes 3 hours was re-engineered on Hadoop: run time 10 minutes
     – A current Java-based analytics routine was ported onto Hadoop, increasing data coverage and reducing execution time
     – We lost the NameNode and had to rebuild the cluster…
  12. Hadoop Code Day: Guangzhou, China
     – We sponsored a 24-hour code competition to allow the offshore teams to show their stuff
     – We had over 50 volunteers for the event
     – The volunteers were split into teams of 3 and given 24 hours to develop an application using the Proof of Concept cluster
     – One week's training was offered to the participants on a casual basis
     – All the teams delivered…
  13. Next Step: Planning
     – Adoption lifecycle: Learn, Plan, Build; milestones: Proof of Concept, Business Value, Pilot Projects, Strategic Stack
  14. Big Data Plan: Big Data Economics (names removed to protect the innocent)
  15. Hadoop Economics: Technology for Austerity
     – [Chart: Revenue, Margin, Cost]
     – Hadoop speaks to the economics of today: growing product and capacity at the same time as increasing margin
  16. Generic HSBC Big Data Use Cases (ranging from Day 1 Value to Strategic Value)
     – Volume File Processing
         – Characteristics: high-volume, high-throughput processing of legacy flat files, XML or other structured and semi-structured data
         – Current challenges:
             – Cost: high-volume processing predominantly still resides on the mainframe, making low-complexity processing expensive
             – Scale: the ability to grow out mainframe capacity quickly is limited, and the ability to scale on even distributed platforms is limited
     – Big Warehouse
         – Characteristics: a multi-source warehouse analytics environment providing a single data platform across multiple business lines; production of data-derived products
         – Current challenges:
             – Time to market: Data Warehouse / MI projects have proved extremely challenging to implement in HSBC and in the finance industry in general
             – Complexity: data integration of even group standard systems has proved difficult due to the variety of data structures and content
             – Latency: real-time MI is still only available via reporting directly from source
     – Advanced Analytics
         – Characteristics: statistical modelling and what-if analysis on group-wide data across multiple business lines; integration of poly-structured data
         – Current challenges:
             – Scale: traditional analytic data platforms have only been able to scale vertically
             – Cost: the amount of compute power required to perform volume statistical operations is cost-prohibitive
             – Fidelity: analytical calculations are typically run on aggregate totals, leading to a disconnect between events and the derived conclusions or decisions
  17. Big Data Plan: When and Where
  18. Next Step: Planning
     – Adoption lifecycle: Learn, Plan, Build; milestones: Proof of Concept, Business Value, Pilot Projects, Strategic Stack
  19. So we're done? Not quite…
  20. Remaining Challenges
     – Big Data Operations:
         – Is Hadoop anti-virtualisation?
         – High availability / disaster recovery needs to improve
         – Security and data privacy concerns
         – Data federation
     – Big Data Organisation:
         – Segregation of duties: Big Data doesn't want a separate app, database, OS and storage team; the platform demands skilled generalists
     – Hype / Cynicism: USE IT AS A POSITIVE!!!
         – Place Big Data into a competitive situation against your existing information management technologies; if you can't get the job done better/faster/cheaper, then alter your decision tree?
  21. The art of the possible in 24 hours…
     – Hadoop excites…
     – The winners:
         – Hadoop on iPad & Android (and tires)
         – Hadoop on HTML5 & Flex
         – Hadoop & R for Portfolio Optimisation
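Slide 9's Agile Information Lifecycle places MapReduce processing at its centre and argues that insight comes from iterating on the same dataset. As a rough, single-process illustration of that idea (not Hadoop's API, and not HSBC's code), the sketch below imitates the map, shuffle and reduce phases in plain Python and runs two successive questions over the same events. The `map_reduce` helper, the event tuples and their fields are all hypothetical; a real Hadoop job would distribute each phase across the cluster.

```python
from collections import defaultdict

# Hypothetical data events: (event_type, client, amount).
EVENTS = [
    ("trade", "C1", 120.0),
    ("trade", "C2", 80.0),
    ("corporate_action", "C1", 0.0),
    ("trade", "C1", 40.0),
]


def map_reduce(records, mapper, reducer):
    """Single-process imitation of the map -> shuffle -> reduce phases.

    This helper is hypothetical and only mirrors the shape of a MapReduce job.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):  # map: emit (key, value) pairs
            groups[key].append(value)      # shuffle: group values by key
    # reduce: collapse each key's values into a single result
    return {key: reducer(key, values) for key, values in groups.items()}


# Iteration 1: how many events of each type arrived?
count_by_type = map_reduce(EVENTS, lambda e: [(e[0], 1)], lambda k, vs: sum(vs))

# Iteration 2: a refined question on the same dataset, traded amount per client.
amount_by_client = map_reduce(
    EVENTS,
    lambda e: [(e[1], e[2])] if e[0] == "trade" else [],
    lambda k, vs: sum(vs),
)

print(count_by_type)     # {'trade': 3, 'corporate_action': 1}
print(amount_by_client)  # {'C1': 160.0, 'C2': 80.0}
```

The second pass changes only the mapper, not the stored events, which is the property the slide relies on: new questions are asked of the raw data rather than requiring a new ETL build and data migration.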