  1. Beyond the buzz – what does “Big Data” mean to your organization? Attila Barta, Ph.D., Head of Architecture at Private Client Group and BMO Insurance
  2. BIG DATA WORLD CANADA 2013: Introduction to this presentation
     • This presentation covers the following topics:
       - There is more to Big Data than Hadoop.
       - To understand the Big Data buzz, one has to go back to the beginnings and understand the forces that brought Big Data to life.
       - Is Big Data another buzzword like Semantic Web, Web 2.0 or Cloud?
       - Where are Canadian companies on Big Data in comparison with the world?
       - What a reference Big Data architecture looks like.
       - Big Data at BMO Financial Group.
       - The road ahead: what needs to be done.
     • Note: this presentation reflects the opinions of the author alone and by no means those of BMO Financial Group.
  3. Big Data – How we got here
     • In a 2001 research report [1], Gartner analyst Doug Laney defined data growth challenges and opportunities as being three-dimensional, i.e. increasing volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources). Gartner, and now much of the industry, continue to use this "3Vs" model for describing Big Data [2]. (Source: Wikipedia.)
     • What was happening in 2001? Three major trends:
       - The Sloan Digital Sky Survey began collecting astronomical data in 2000 at a rate of 200 GB/night – volume.
       - Sensor networks (web of things) and streaming databases (Message Oriented Middleware) – velocity.
       - Semi-structured databases and XML native databases, beside object-oriented and relational databases – variety.
     • What happened after 2001?
       - Rise of search engines and portals (Yahoo and Google):
         · Problem: how to store and query, cheaply and in real time, large amounts of (semi-structured) data.
         · Answer: Hadoop on commodity Linux farms.
       - Memory got cheaper – in-memory data grids.
       - Rise of social media – petabytes of pictures, unstructured and semi-structured data.
       - Increased computational power and large memory – visual analytics.
  4. Big Data – Definitions and Examples
     • In 2012, Gartner updated its definition as follows: “Big data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization” [3].
     • In 2012, IDC defined Big Data technologies as “a new generation of technologies and architectures designed to extract value economically from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and/or analysis” [4].
     • In 2012, Forrester characterized Big Data as “increases in data volume, velocity, variety, and variability” [5].
     • Big Data characteristics:
       1. Data Volume: data size on the order of petabytes.
          · Example: on June 13, 2012, Facebook announced that they had reached 100 PB of data. On November 8, 2012, they announced that their warehouse grows by half a PB per day.
       2. Data Velocity: real-time processing of streaming data, including real-time analytics.
          · Example: a jet engine generates 20 TB of data per hour that has to be processed in near real time.
       3. Data Variety: structured, semi-structured, text, images, video, audio, etc.
          · Example: 80% of enterprise data is unstructured. YouTube: 500 TB of video uploaded per year.
       4. Data Variability: data flows can be inconsistent, with periodic peaks.
          · Example: blogs commenting on the new BlackBerry 10; stock market data that reacts to market events.
  5. Big Data – In Canada, where are we?
     • In December 2012, IDC published a study of Big Data in Canada [4], surveying 75 businesses with over 250MM in revenue. The conclusions of the survey are sobering:
       - Less than one tenth of the respondents were familiar with Hadoop (the Big Data framework), and slightly more were familiar with in-memory data grids and in-memory analytics.
       - Only half of Canadian organizations already work with Big Data, in comparison with more than three quarters worldwide.
       - The majority of Canadian companies use mainly internally produced data, with less than a quarter of Canadian organizations using data from non-traditional sources such as social media, web data, RFID tags and GPS.
       - Big Data strategies are delegated to mid-level management, while world-class companies integrate technology decisions at the executive level.
  6. Big Data – What are we missing in Canada?
     • McKinsey Global Institute published “Big Data: The next frontier for innovation, competition and productivity” in May 2011. In the sectors that they examined, they estimated opportunities of hundreds of billions yearly in savings or new business from unleashing the potential of Big Data [6].
     • Big Data immediate business opportunities:
       - Transparent omni-channel information environment – an evolution of multi-channel, characterized by a seamless approach to the consumer experience through all available interaction channels.
       - Sentiment analysis – data from social media enables organizations to perceive and analyze client sentiment in order to better tailor marketing campaigns, products and services.
       - Predictive models – based on real-time data streams, determine likelihood to churn and take pre-emptive actions for customer retention.
       - Social technologies – not only understand the client holistically (the 360-degree view), but also understand the client's network of family, friends and peers in order to build the client 720-degree view.
       - Location data – better understand behaviour and make better offers based on location.
       - Operational improvement – RFID and sensor networks allow retailers to get insights into demand and better manage inventory and supply chains.
  7. Big Data – Reference Architecture
     • Typical architectures for Big Data address the following capabilities:
       1. Real-time complex event processing (including sense and response).
       2. Massive volumes of data (petabytes), relational and non-relational (e.g. social media, location, RFID).
       3. Parallel processing/fast loading, typically based on Hadoop.
       4. High-performance query systems based on in-memory data architectures.
       5. Advanced analytics, e.g. visual analytics, columnar databases.
     [Figure: a variant of the Forrester architecture [5]. Diagram labels: Virtual Infrastructure; Workload Management; Infrastructure Services; Event Mgmt.; Query (SQL, non-SQL); Processing; Advanced Analytics; Stream Processing; Non-relational dbms; Data Management; Relational dbms; Distributed File System; In-Memory Data Grid. Annotations: shared-nothing hardware, massively parallel; commodity, own or rent; massive load via parallel processing; data stream.]
  8. Big Data – at BMO Financial Group
     [Figure: the reference architecture annotated with technologies at BMO Financial Group. Diagram labels: Virtual Infrastructure; Workload Management; Infrastructure Services; Event Mgmt.; Query (SQL, non-SQL); Processing; Advanced Analytics; Client Omni-Channel Interactions; Stream Processing; Non-relational dbms; Data Management; Relational dbms; Distributed File System; In-Memory Data Grid. Technology labels, in diagram order: Tableau, SAS, Spotfire, HANA, Tibco BusinessEvents, Tibco ActiveSpaces, HANA, Sybase IQ, PaaS, IaaS. Legend: Operational; Proof of Concept.]
     • Big Data is a work in progress at BMO Financial Group, with some areas more advanced than others:
       - Event management and in-memory data grids are state of the art.
       - Advanced analytics are in transition to mature.
       - Infrastructure virtualization is in progress.
       - Hadoop infrastructure is not in scope yet.
       - Non-relational capability is in its infancy.
     • Note: the vendor list is by no means exhaustive; these are some of the technologies in use or in PoC.
  9. Big Data – Capabilities at BMO Financial Group
     • How the reference Big Data capabilities are reflected at BMO Financial Group:
       1. Real-time complex event processing (including sense and response):
          · Built a state-of-the-art omni-channel sense and response capability based on a Tibco stack.
          · Deployed a real-time in-bound lead management capability in 2011 that generated a significant increase in up-sell and cross-sell – major new revenue for the Retail Bank.
       2. Massive volumes of data (petabytes), relational and non-relational (e.g. social media, location, RFID):
          · Data volumes are manageable within the current infrastructure.
          · Location data is currently available and is planned to be harvested.
          · There are plans to use social media data for sentiment analysis.
       3. Parallel processing/fast loading, typically based on Hadoop:
          · Not planned; the current ETL investment is performing well.
       4. High-performance query architecture based on in-memory data architectures:
          · Running a state-of-the-art in-memory data grid for real-time event processing as well as for the client 360-degree view.
          · Currently evaluating in-memory data grids for real-time risk management as well as several regulatory requirements, like Anti-Money-Laundering and Client Risk Management.
       5. Advanced analytics, e.g. visual analytics, columnar databases:
          · Several advanced analytics tools are in use, such as Tableau and Sybase IQ, while Tibco Spotfire, HANA and others are currently being evaluated.
  10. Big Data – Impact on Enterprise Information Management
     • Is traditional MDM redundant?
       - By no means. While there are in-memory MDM implementations, it makes more sense to keep the current investment and load only subsets of MDM data into in-memory databases, e.g. the client 360-degree view or any other data elements needed for event management, sense and response or other capabilities.
     • What will happen to the current EDW?
       - Not much; transactional data will still be an important source for BI. However, the full power of parallel query processing and the parallelism built into hardware should be harvested.
       - EDWs should be augmented with social data and location data, either directly or via service providers, in order to provide the foundation for sentiment analysis and predictive modeling.
     • Are ETL tools done?
       - It depends. This is the sweet spot where vendors are pitching Hadoop. However, is your enterprise ready for Hadoop? Are you ready to move to commodity hardware? Do you have the skills for both commodity hardware and Hadoop?
     • Time to retire current BI tools (e.g. Cognos, Business Objects, etc.)?
       - Definitely not; continue to use the current management reports and dashboards.
       - Educate business on the new visual analytics tools and let them decide the way forward.
       - Educate business on the new BI capabilities enabled by in-memory databases.
     • However, be aware of the new competitor that is building its Information Management from scratch and, with the proper Big Data technology, might compromise your established business advantage!
  11. Big Data – Organizational challenges
     • What needs to be done:
       - In Big Data initiatives, business leaders have to take the initiative. The new role of the CIO team is to educate business on Big Data and its opportunities, rather than defining and leading initiatives.
       - CIOs have to take a holistic approach to Big Data by considering all Big Data capabilities and defining strategies accordingly, instead of focusing on individual capabilities like fast ETL loading, for which Hadoop is a quick fix.
       - Adapt the Information Management Strategy to include behaviour-oriented data, like social data, as well as location and sensor data.
       - Change the BI strategy towards commoditization and massively parallel processing.
       - Big Data requires a new skill set for handling Hadoop environments as well as in-memory data and advanced analytics. McKinsey predicts a shortage of more than a hundred thousand Big Data professionals in the US alone [6].
     • Last but not least:
       - Big Data is an evolution of many technologies that have been around for the last decade or so. Although it has the potential to be a technology disruptor, Big Data is rather an important augmentation of current technologies, and if used properly it can provide significant business benefits as well as competitive advantage.
  12. Thank you for your time! Questions?
  13. Appendix
       1. References
       2. Hadoop – a Definition
  14. References
       1. Laney, Douglas. “3D Data Management: Controlling Data Volume, Velocity and Variety”. Gartner, 2001.
       2. Beyer, Mark. “Gartner Says Solving 'Big Data' Challenge Involves More Than Just Managing Volumes of Data”. Gartner, 2011.
       3. Laney, Douglas. “The Importance of 'Big Data': A Definition”. Gartner, 2012.
       4. Wallis, Nigel. “Big Data in Canada: Challenging Complacency for Competitive Advantage”. IDC, 2012.
       5. Gogia, Sanchit. “The Big Deal About Big Data For Customer Engagement”. Forrester, 2012.
       6. Manyika, James, et al. “Big Data: The next frontier for innovation, competition and productivity”. McKinsey Global Institute, 2011.
  15. Hadoop – a Definition
     • Apache Hadoop is an open-source software framework that supports data-intensive distributed applications, licensed under the Apache v2 license. It supports the running of applications on large clusters of commodity hardware. The Hadoop framework transparently provides both reliability and data motion to applications.
     • Hadoop implements a computational paradigm named MapReduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both map/reduce and the distributed file system are designed so that node failures are automatically handled by the framework. It enables applications to work with thousands of computation-independent computers and petabytes of data. Hadoop was derived from Google's MapReduce and Google File System (GFS) papers.
     • The entire Apache Hadoop “platform” is now commonly considered to consist of the Hadoop kernel, MapReduce and Hadoop Distributed File System (HDFS), as well as a number of related projects, including Apache Hive, Apache HBase, and others.
     • Hadoop is written in the Java programming language and is a top-level Apache project being built and used by a global community of contributors. Hadoop and its related projects (Hive, HBase, Zookeeper, and so on) have many contributors from across the ecosystem. Though Java code is most common, any programming language can be used with "streaming" to implement the "map" and "reduce" parts of the system.
     • Source: Wikipedia
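
The MapReduce paradigm described in the Hadoop definition can be illustrated with a small word count. The sketch below is a single-process Python simulation and not Hadoop code: it assumes nothing about the Hadoop API, and the names map_phase, shuffle, reduce_phase and word_count are hypothetical, chosen only for this illustration. It shows how input is split into independent fragments (map), how emitted pairs are grouped by key (the step the framework performs between phases), and how each key is combined into a result (reduce).

```python
from collections import defaultdict
from itertools import chain


def map_phase(fragment):
    """Map: emit a (word, 1) pair for every word in one input fragment.

    Each fragment is processed independently, which is what lets a real
    framework run (or re-run) it on any node of a cluster.
    """
    return [(word.lower(), 1) for word in fragment.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key.

    In a real cluster the framework does this between the map and
    reduce phases; here it is a plain in-memory dictionary.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def word_count(fragments):
    """Run map, shuffle and reduce over a list of text fragments."""
    mapped = chain.from_iterable(map_phase(f) for f in fragments)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Hypothetical input fragments, standing in for blocks of a large file.
    fragments = ["big data is big", "data velocity and data variety"]
    print(word_count(fragments))
```

Because map_phase and reduce_phase only see their own fragment or key, they could run on separate machines without coordination; that independence is the property the definition above refers to when it says fragments may be executed or re-executed on any node.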