Enterprise Integration of Disruptive Technologies

This talk details the HSBC Big Data journey to date, walking through the genesis of the Big Data initiative, which was triggered by continual challenges in delivering data-driven products. The global scale, diversity and legacy of an organization like HSBC present challenges for Hadoop adoption not typically faced by younger companies. Big Data technologies are by their very nature disruptive to the established enterprise IT environment. Hadoop and the peripheral toolsets in the Big Data ecosystem do not fit comfortably into an enterprise data centre or IT operational processes, and can even prove disruptive to current organization structures. Alasdair will focus on the steps that HSBC has taken to mitigate concerns about Hadoop and to raise awareness of the game-changing benefits that a successful adoption of the technology will bring. HSBC has taken an innovative approach to proving out the value of the technology, engaging developers with a brakes-off opportunity to use the platform and placing Hadoop in a competitive scenario against traditional technologies. The Hadoop journey in HSBC was initiated in Scotland, blessed in London and proved out in China.

  • In essence: we are a processor of other people's data. Challenges: nobody does data the same way, even within the same systems. The differences are in definitions, formats and content.
  • Dedicated ETL is an expensive way of doing things. Big RDBMS or dedicated appliances are expensive. Marts, marts everywhere. CONCLUSION: high-volume and/or low-latency is very expensive to run. RESULT: people are becoming reluctant to invest in these platforms and are looking for a service that can start small and grow.
  • The road to Damascus... The vision is HSS-only at this point in time. The search for an alternative way of doing things has led us to Hadoop. Hadoop lowers the barrier to entry for compute-style solutions to data problems. CONCLUSION: we view Hadoop as THE future technology for data platforms. RESULT: we have begun the technology adoption process in the bank.
  • Today's biggest business challenge: information management currently represents agility in delivering data integration and flexibility to present multiple views of data. Biggest business opportunity: analytics, scenario modelling and portfolio efficiency measurement. These all require big compute.
  • ...here's what it looks like. Walk left to right and explain MapReduce, contrasting the old way with our vision of the new way. The EDW will be around for some time to come but will be gradually superseded. MapReduce will be implemented via high-level languages. A single warehouse becomes achievable. Marts are demised in favour of views onto the base data. The value-add will come via data discovery, iterative ETL and hypothesis testing. CONCLUSION: Hadoop brings massive compute levels to bear on these problems, affordably. (A minimal MapReduce sketch of this pattern follows these notes.)
  • This is the next-generation ETL. The ETL process becomes truly iterative. Accept that you will get it wrong the first time round; Hadoop makes the penalty for failure minimal. The value-add will come via data discovery, iterative ETL and hypothesis testing. CONCLUSION: ETL moves from brittle to bend-don't-break. RESULT: in building your Big Warehouse, adding additional data, systems or perspectives is a low-tax operation.
  • Where we've got to: go through the key challenges. CONCLUSION: it's a journey, and we're walking through it just now. RESULT: the first two have been addressed; challenges remain.
  • ...our experience was: a vendor Hadoop package makes sense to an organisation like us. Data loads took days, not months, and we were quickly able to automate them, using Apache tools only. BONUS: Calypso data, new for HSS. HACKATHON: an open invite to all markets staff, with the objective of using Hadoop against the business use case. We set judging criteria and ran a straight 24 hours over a weekend, with competition prizes. It was attended by nearly 60 staff, equal to 20% of our China office; 18 teams entered and 17 delivered. The winning application was stunning. CONCLUSION: Hadoop is a great functional fit for our business demand. RESULT: a high level of confidence around the technology.
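The notes above describe marts giving way to MapReduce views computed directly over the base data, with iterative ETL as the value-add. The following is a minimal, purely illustrative sketch of that pattern, not code from the deck: a Hadoop MapReduce job in Java, using the org.apache.hadoop.mapreduce API, that totals notional by product straight from raw comma-separated trade files held on HDFS. The file layout, field positions and class names are assumptions made for the example.

```java
// Illustrative only: a "view" over raw trade files on HDFS, computed on demand
// with MapReduce instead of being materialised in a dedicated mart.
// Assumed (hypothetical) input: CSV rows with product in field 2 and notional in field 5.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ProductNotionalView {

    public static class TradeMapper
            extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split(",");
            if (fields.length < 6) {
                return; // skip malformed rows rather than failing the whole job
            }
            try {
                String product = fields[2];
                double notional = Double.parseDouble(fields[5]);
                context.write(new Text(product), new DoubleWritable(notional));
            } catch (NumberFormatException ignored) {
                // e.g. header rows; iterative ETL tolerates imperfect input
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text product, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            double total = 0.0;
            for (DoubleWritable v : values) {
                total += v.get();
            }
            context.write(product, new DoubleWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "product notional view");
        job.setJarByClass(ProductNotionalView.class);
        job.setMapperClass(TradeMapper.class);
        job.setCombinerClass(SumReducer.class); // summation is associative, so a combiner is safe
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // raw trade files
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // derived view
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Because the aggregate is recomputed from the base files on demand, changing the grouping or adding a source is a re-run rather than a mart rebuild, which is the bend-don't-break property the notes describe.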

    1. Big Data Adoption: Enterprise Integration of Disruptive Technologies. Prepared by Alasdair Anderson, 18 March 2013.
    2. Big Data and the Enterprise !=
    3. Business context: HSBC (HSS), a business with a lot of data
       • A global business and a global outsourcer of investment operations
       • Active in 40+ countries and jurisdictions
       • Over 150 operational technology systems
       • Outsourcing is a diverse and incrementally complex business
    4. Challenges in building Big Data environments
       • Design time: ETL is brittle, one shot at success; one version of the truth; tight coupling to the relational model; any significant change initiates a data migration
       • Run time: RDBMS struggle with scale-out on vertical hardware; multi-marts increase duplication; big batch appliances are uneconomic; cost increases with proliferation
       • Time to market: months for any given slice, years in total
       • Total cost: any high-volume or low-latency environment requires annual spend in the millions to tens of millions
       [Slide diagram: source systems (trades, positions, corporate actions, market data, client, exchange and external feeds) flow through ETL and staging into an ODS, an enterprise logical model and strategic marts, and out to reporting, analytical tools and channels such as eCommerce]
    5. Building Big Data platforms has been an unhappy experience
       • Time to market has increased: proliferation, not consolidation
       • Delivery risk is high, as witnessed in industry-wide failure rates
       • Ultimate customer satisfaction is low; we often end up answering yesterday's questions tomorrow
       • The economics of traditional technologies are against proliferation of analytical platforms: costs increase with the addition of data sources, and so do the costs of change
       • Processing ceilings are reached quickly when adding newer sources of data to traditional platforms
    6. Crisis of supply and demand: we need a new approach
       • High-level requirement: a single data platform that can provide 360-degree views of clients, operations and products
       • Functionally the platform should support continual development, integration and deployment; parallel development streams; integration of poly-structured datasets; and multiple views on single data sets, acting as an enabler of change
       • Non-functionally the platform should offer a low-cost economic model for analytical platforms, scale to terabytes with high-throughput ingest and integration, co-exist with our current estate, and be accessible to business and technology teams
       • Enter Hadoop!
    7. Introducing any new technology to an enterprise. Hadoop adoption lifecycle: Learn (Proof of Concept), Plan (Business Value), Build (Pilot Projects, Strategic Stack). What have we done? What's left, what's next?
    8. Big Data Vision
    9. Big Data Vision: the Agile Information Lifecycle. Data events are ingested and processed with MapReduce, then feed discovery, analytical blotters and applications. Insights rarely happen on the first query or build; they are more likely to occur after several iterations on a dataset.
    10. Hadoop Proof of Concept. Scope: Guangzhou, China
       • Using a vendor Hadoop package; time to install Hadoop; ease of maintaining the cluster; performance comparison
       • Porting existing databases to Hadoop; integration of existing code; building applications on the cluster; developing on Hadoop
       • Advanced analytics on Hadoop; enhancing an existing analytics service; building out a new modelling package; development skill levels
    11. Proof of Concept results
       • Hadoop was installed and operational in a week
       • 18 RDBMS warehouse and mart databases were ported to Hadoop in 4 weeks
       • An existing batch that currently takes 3 hours was re-engineered on Hadoop: run time 10 minutes
       • A current Java-based analytics routine was ported onto Hadoop, increasing data coverage and reducing execution time (a sketch of this wrapping pattern follows the transcript)
       • We lost the namenode and had to rebuild the cluster...
    12. Hadoop Code Day: Guangzhou, China
       • We sponsored a 24-hour code competition to let the offshore teams show their stuff
       • We had over 50 volunteers for the event
       • The volunteers were split into teams of 3 and given 24 hours to develop an application using the Proof of Concept cluster
       • One week's training was offered to the participants on a casual basis
       • All the teams delivered...
    13. Next step: planning. Adoption lifecycle: Learn (Proof of Concept), Plan (Business Value), Build (Pilot Projects, Strategic Stack).
    14. Big Data Plan: Big Data economics (names removed to protect the innocent)
    15. Hadoop economics: technology for austerity. Revenue, margin, cost: Hadoop speaks to the economics of today, growing product and capacity at the same time as increasing margin.
    16. Generic HSBC Big Data use cases (ranging from day-1 value to strategic value)
       • Volume file processing. Characteristics: high-volume, high-throughput processing of legacy flat files, XML or other structured and semi-structured data. Current challenges: cost (high-volume processing predominantly still resides on the mainframe, making low-complexity processing expensive); scale (the ability to grow out mainframe capacity quickly is limited, and the ability to scale on distributed platforms is limited).
       • Big warehouse. Characteristics: a multi-source warehouse analytics environment providing a single data platform across multiple business lines; integration of poly-structured data. Current challenges: time to market (data warehouse and MI projects have proved extremely challenging to implement, in HSBC and in the finance industry in general); complexity (data integration of even group-standard systems has proved difficult due to the variety of data structures and content); latency (real-time MI is still only available via reporting from source directly).
       • Advanced analytics. Characteristics: statistical modelling and what-if analysis on group-wide data across multiple business lines; production of data-derived products. Current challenges: scale (traditional analytic data platforms have only been able to scale vertically); cost (the amount of compute power required to perform volume statistical operations is cost-prohibitive); fidelity (analytical calculations are typically run on aggregate totals, leading to a disconnect between events and the derived conclusions or decisions).
    17. Big Data Plan: when and where
    18. Next step: planning. Adoption lifecycle: Learn (Proof of Concept), Plan (Business Value), Build (Pilot Projects, Strategic Stack).
    19. So we're done? Not quite...
    20. Remaining challenges
       • Big Data operations: is Hadoop anti-virtualisation? High availability and disaster recovery need to improve; security and data privacy concerns; data federation
       • Big Data organisation: segregation of duties; Big Data doesn't want a separate app, database, OS and storage team; the platform demands skilled generalists
       • Hype / cynicism: use it as a positive! Place Big Data into a competitive situation against your existing information management technologies; if it can't get the job done better, faster or cheaper, then alter your decision tree
    21. The art of the possible in 24 hours... Hadoop excites: Hadoop on iPad & Android (and tires), Hadoop on HTML5 & Flex, and the winners: Hadoop & R for portfolio optimisation.
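One of the Proof of Concept results in the transcript was porting an existing Java analytics routine onto Hadoop. The sketch below illustrates one common way such a port can be done and is an assumption rather than the actual HSBC approach: the per-record routine is wrapped in a map-only job so that each input split is processed in parallel across the cluster. LegacyRiskCalculator, its measure() method and the record layout are hypothetical stand-ins for whatever routine is being reused.

```java
// Illustrative only: reusing an existing per-record Java routine inside a
// map-only Hadoop job. No reducer is needed because records are independent.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PortedAnalyticsJob {

    /** Hypothetical stand-in for the existing single-threaded analytics code. */
    public static class LegacyRiskCalculator {
        public double measure(String positionRecord) {
            // The real routine would parse the record and return a risk figure;
            // placeholder logic only.
            return positionRecord.length();
        }
    }

    public static class AnalyticsMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {
        private final LegacyRiskCalculator calculator = new LegacyRiskCalculator();

        @Override
        protected void map(LongWritable offset, Text record, Context context)
                throws IOException, InterruptedException {
            double risk = calculator.measure(record.toString());
            // Emit the original record tagged with its computed measure.
            context.write(new Text(record.toString() + "," + risk), NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "ported analytics");
        job.setJarByClass(PortedAnalyticsJob.class);
        job.setMapperClass(AnalyticsMapper.class);
        job.setNumReduceTasks(0); // map-only: every split runs the routine independently
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Scaling out this way leaves the original routine untouched, which is in the spirit of the transcript's point that the port increased data coverage while reducing execution time.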
