Managing Growing Transaction Volumes Using Hadoop


Practical approach to managing growing data volumes by leveraging Hadoop in your Information Architecture

Published in: Data & Analytics

  1. Managing Growing Transaction Volumes Using Hadoop
     Arvind Purushothaman, Director, IM Practice
     2000 West Park Drive, Westborough, MA 01581, USA. Phone: 508 389 7300. Fax: 508 366 9901.
     The entire contents of this document are subject to copyright with all rights reserved. All copyrightable text and graphics, the selection, arrangement and presentation of all information and the overall design of the document are the sole and exclusive property of Virtusa. Copyright © 2010 Virtusa Corporation. All rights reserved.
  2. Agenda
     • Context Setting
     • CIO’s mandate
     • Coexistence of architectures
     • Evaluation
     • Summary
     © Virtusa Corporation ● Confidential
  3. In the Millennial World, 15 minutes is a long time. During this presentation:
     • 1.8 million tweets will be generated
     • Apple will receive about 700,000 app downloads
     • Brands and organisations will receive around 500,000 likes on Facebook
     • Over 14 million status updates will be posted on Facebook
     • 54,000 photos will be shared on Instagram
     • Over 13 million pieces of new Facebook content will be created
     • Over 3 billion email messages will be sent
     • Google will receive over 30 million search queries
     • Over 8,000 new websites will be created
     Sources: Forrester Research, Hubspot Centre for Social Media, The Social Skinny, AllTwitter
  4. During the course of this presentation, consumers will spend over $5 million shopping online.
     • 44% of companies that tweet acquired new customers
     • 57% of companies that blog acquired new customers
     • Almost 8 new people come onto the internet every second
     • 61% of global internet users research products online
     • 9 out of 10 mobile searches lead to action; over half lead to a purchase
     Sources: Forrester Research, Hubspot Centre for Social Media, The Social Skinny, AllTwitter
  5. Big Data. Big Noise. Big Opportunity.
     Technology enables you to make sense of all available data.
  6. How the Industry Defines Big Data
     • Gartner: Big Data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
     • Forrester: The frontier of a firm’s ability to store, process and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers.
     • IBM: “… Big data is more than simply a matter of size; it is an opportunity to find insights in new and emerging types of data and content, to make your business more agile, and to answer questions that were previously considered beyond your reach …”
     • Oracle: “… Big Data refers to datasets that grow so large that it is difficult to capture, store, manage, share, analyze and visualize with the typical database software tools …”
     Pictured: Website, Network Switches, Social Media, RFID, Transactional / operational systems
  7. CIO’s Manifesto
     • Support business growth through innovation
     • Lower costs
     Neither is optional: you need to lower costs and innovate at the same time.
     In the Information Management world, this means:
     • Exponentially larger data volumes and different types of data
     • More investment in data storage, computing power and licenses
     What is the way forward?
  8. Hadoop as Data Transformation Platform
     [Architecture diagram; only its labels survive, grouped here approximately]
     • Stages: Data Points; Data Stores; Access to BI Platform; Insight Generation
     • Relational/Analytical: Financial Data; Marketing Data; Sales Data; ETL; Data Warehouse (Relational); Data Marts; Data Warehouse Access
     • Big Data: Transactions; Logs; Open Source ETL; Streaming ETL; Big Data Cluster (Hadoop); Raw Data; Parsed Data; Analytic Data Sets; Master Data; Real Time Store (NoSQL); Big Data Access
     • Business Intelligence Platform: Parametric & Ad Hoc Reporting; OLAP; Dashboards; Exploratory Visualization; Direct Data Access; Statistical Analysis; Machine Learning
  9. Hybrid Architecture for a Telecom Client That Leverages HDFS, HBase, and Oracle 11g
     [Architecture diagram; only its labels survive]
     • Integration & Infrastructure Platform SDEDS (APP10765)
     • Bill & Payments Platform ONM (APP10487)
     • HDFS Hadoop cluster: Raw Call Data; CDR Store; MapReduce
     • ICS, OCS, Answered, Unanswered, Diverted, Others
     • REST Gateway; UI Reports
     • ETL; Call Summary Data
     • Oracle DB: Month, Date, Hour Level
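The telecom slide above names raw call data, a MapReduce step, and call summary data held at month, date, and hour level. The following is a minimal in-memory sketch of that kind of aggregation, not the client's actual job. The record layout (timestamp plus one category label), the sample timestamps, and the function names `map_cdr`, `reduce_counts`, and `roll_up` are all assumptions made for illustration; the category labels are taken from the slide.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw call detail records: (ISO timestamp, category label).
# The layout and values are illustrative assumptions, not the real schema.
RAW_CDRS = [
    ("2013-05-01T09:15:00", "Answered"),
    ("2013-05-01T09:47:00", "Unanswered"),
    ("2013-05-01T09:59:00", "Answered"),
    ("2013-05-01T10:05:00", "Diverted"),
    ("2013-05-01T10:30:00", "Answered"),
]


def map_cdr(record):
    """Map step: emit ((month, date, hour, category), 1) for one record."""
    timestamp, category = record
    ts = datetime.fromisoformat(timestamp)
    key = (ts.strftime("%Y-%m"), ts.strftime("%Y-%m-%d"), ts.hour, category)
    return key, 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def roll_up(summary, level):
    """Aggregate the hourly summary to 'month', 'date', or 'hour' level."""
    width = {"month": 1, "date": 2, "hour": 3}[level]
    totals = defaultdict(int)
    for key, count in summary.items():
        # Keep the leading time fields for this level plus the category.
        totals[key[:width] + (key[3],)] += count
    return dict(totals)


hourly_summary = reduce_counts(map_cdr(r) for r in RAW_CDRS)
monthly_summary = roll_up(hourly_summary, "month")
```

On a real cluster the map and reduce functions would run in parallel over files in HDFS, and only the much smaller summary rows would be loaded into the relational database for reporting.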
  10. Technology Components of Hadoop
      • Core: HDFS + MapReduce
      • Data Movement: Sqoop (relational database), Flume (real-time)
      • NoSQL: HBase
      • Scheduling: Oozie
      • Analytics: Cloudera Impala, Tableau with Hive
      • Machine Learning: Mahout
  11. 3W’s: What, Where and When
      • Traditional DW data
      • Semi-structured and unstructured data
      • Historical, infrequently accessed
      • Legal & regulatory
      • Insights post shelf life
      • Post processing: DW
      85% of tables and 50% of columns unused*
      * Source: TDWI
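One way to act on the "historical, infrequently accessed" category above is to list warehouse tables that have not been queried for some time and treat them as candidates for moving to Hadoop. The sketch below assumes a catalog that records a last-access date per table. The table names, dates, the one-year threshold, and the function name `offload_candidates` are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical catalog: table name -> date of the most recent query.
LAST_ACCESSED = {
    "sales_2009": date(2012, 1, 10),
    "sales_current": date(2013, 4, 28),
    "audit_archive": date(2011, 6, 30),
}


def offload_candidates(last_accessed, today, max_idle_days=365):
    """Return tables not queried within max_idle_days, sorted by name."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(name for name, seen in last_accessed.items() if seen < cutoff)


candidates = offload_candidates(LAST_ACCESSED, today=date(2013, 5, 1))
```

The threshold is a policy choice. Legal and regulatory retention rules may require keeping some tables queryable regardless of how rarely they are used.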
  12. Decision Points
      Source: Dr. Amr Awadallah and Dan Graham, “Hadoop and the Data Warehouse: When to Use Which”, copublished by Cloudera, Inc. and Teradata Corporation.
      *HBase.
  13. Cost Considerations
                    ETL            Hadoop
      Hardware      Expensive      Low
      Software      Expensive      Low
      Development   Medium         Medium
      Maintenance   High           Low
      Investment    High upfront   Invest as needed
  14. How Can You Get Started
      • Hadoop as an Enterprise Data Management platform is here to stay
      • Get started by either moving “unused data” or bringing in additional sources and types of data
      • In addition to “back-end” type functions, it provides analytical capabilities in its own right
      • To start small, leverage Hadoop on the cloud
      • Coexistence is going to be the key to successful adoption
      Build a good use case before you start, build a POC, and evangelize it.
  15. US: Boston, New York. UK: Windsor, London. India: Hyderabad, Chennai. Sri Lanka: Colombo.
      © 2010 All rights reserved. Virtusa and all other related logos are either registered trademarks or trademarks of Virtusa Corporation in the United States, the European Union, and/or India. All other company and service names are the property of their respective holders and may be registered trademarks or trademarks in the United States and/or other countries.