Attila Barta, Ph.D.
Head of Architecture at Private Client Group and BMO Insurance
Exploit Big Data to Enhance Enterprise Decision-Making, Productivity and Process Optimization
The 2nd Annual Mobile Enterprise Strategies Summit
Introduction to this presentation
•The purpose of this presentation is to go beyond the buzz and present what “Big Data” means
and its impact on your organization, both in general and in the context of mobile.
•This presentation covers the following topics:
▪ To understand the Big Data buzz, one has to go back to the beginning and understand the forces
that brought Big Data to life.
▪ Is Big Data another buzzword like Semantic Web, Web 2.0 or Cloud?
▪ Where do Canadian companies stand on Big Data in comparison with the rest of the world?
▪ Big Data and Mobile integration points.
▪ What a reference Big Data architecture looks like.
▪ Big Data at BMO Financial Group.
▪ The road ahead: what needs to be done.
•Note: this presentation reflects the opinions of the author alone and not those of BMO Financial Group.
Big Data – How we got here
•In a 2001 research report[1] Gartner analyst Doug Laney defined data growth challenges and opportunities as
being three-dimensional, i.e. increasing volume (amount of data), velocity (speed of data in and out), and
variety (range of data types and sources). Gartner, and now much of the industry, continue to use this "3Vs"
model for describing Big Data[2] (source: Wikipedia).
•What was happening in 2001? Three major trends:
▪ The Sloan Digital Sky Survey began collecting astronomical data in 2000 at a rate of 200GB/night – volume
▪ Sensor networks (web of things) and streaming databases (Message Oriented Middleware) – velocity
▪ Semi-structured databases and XML-native databases alongside object-oriented and relational databases – variety
•What happened after 2001?
▪ Rise of search engines and portals – Yahoo and Google:
• Problem: how to store and query large amounts of (semi-structured) data cheaply and in real time.
• Answer: Hadoop on commodity Linux farms.
▪ Memory got cheaper – in-memory data grids.
▪ Rise of Social Media – petabytes in pictures, unstructured and semi-structured data.
▪ Increased computational power and large memory – visual analytics.
Big Data – Definitions and Examples
•In 2012, Gartner updated its definition as follows: “Big data are high-volume, high-velocity, and/or high-variety
information assets that require new forms of processing to enable enhanced decision making, insight discovery
and process optimization”[3].
•In 2012 IDC defined Big Data technologies as “a new generation of technologies and architectures designed
to extract value economically from very large volumes of a wide variety of data by enabling high-velocity
capture, discovery, and/or analysis”[4].
•In 2012 Forrester characterized Big Data as “increases in data volume, velocity, variety, and variability”[5].
•Big Data Characteristics:
1. Data Volume: data size on the order of petabytes.
• Example: on June 13, 2012 Facebook announced that they had reached 100 PB of data. On
November 8, 2012 they announced that their warehouse grows by half a PB per day.
2. Data Velocity: real-time processing of streaming data, including real-time analytics.
• Example: a jet engine generates 20TB of data per hour that has to be processed in near real time.
3. Data Variety: structured, semi-structured, text, images, video, audio, etc.
• Example: 80% of enterprise data is unstructured. YouTube – 500TB of video uploaded per year.
4. Data Variability: data flows can be inconsistent, with periodic peaks.
• Example: blogs commenting on the new BlackBerry device; stock market data that reacts to market
events.
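To put the volume and velocity examples above in perspective, the short sketch below converts them into comparable rates. It assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and constant, linear growth. It also combines the two Facebook figures even though they were announced on different dates. None of these assumptions comes from the cited sources, and the function names are illustrative only.

```python
# Back-of-the-envelope conversions for the figures quoted above.
# Assumptions (not from the sources): decimal units and constant rates.

TB = 10**12  # bytes in a terabyte (decimal)
PB = 10**15  # bytes in a petabyte (decimal)


def bytes_per_second(total_bytes: float, seconds: float) -> float:
    """Average sustained rate needed to keep up with a data source."""
    return total_bytes / seconds


def days_to_grow(current_bytes: float, daily_growth_bytes: float, factor: float = 2.0) -> float:
    """Days until a store reaches `factor` times its current size at linear growth."""
    return current_bytes * (factor - 1) / daily_growth_bytes


# Jet engine example: 20 TB per hour.
engine_rate = bytes_per_second(20 * TB, 3600)

# Warehouse example: 100 PB, growing by 0.5 PB per day.
doubling_days = days_to_grow(100 * PB, 0.5 * PB)

print(f"Jet engine stream: {engine_rate / 10**9:.2f} GB/s")
print(f"Days for a 100 PB warehouse to double: {doubling_days:.0f}")
```

Under these assumptions the jet engine stream works out to roughly 5.6 GB/s sustained, and the warehouse would double in about 200 days.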
Big Data – In Canada, where are we?
•In December 2012 IDC published a study of Big Data in Canada [4] by surveying 75 businesses with over
250MM in revenue. The conclusions of the survey are sobering:
▪ Less than one tenth of the respondents were familiar with Hadoop (the Big Data framework), and slightly
more were familiar with in-memory data grids and in-memory analytics.
▪ Only half of Canadian organizations already work with Big Data, in comparison with more than three quarters
worldwide.
▪ The majority of Canadian companies use mainly internally produced data, with less than a quarter of
Canadian organizations using data from non-traditional sources such as social media web data, RFID tags
and GPS.
▪ Big Data strategies are delegated to mid-level management, while world-class companies integrate
technology decisions at the executive level.
Big Data – What are we missing in Canada?
•McKinsey Global Institute published “Big Data: The next frontier for innovation, competition and productivity”
in May 2011. In the sectors examined, it estimated opportunities worth hundreds of billions per year in
savings or new business from unleashing the potential of Big Data [6].
•Big Data immediate business opportunities:
▪ Transparent omni-channel information environment – an evolution of multi-channel characterized by a
seamless approach to the consumer experience through all available interaction channels.
▪ Sentiment analysis – data from social media enable organizations to perceive and analyze client
sentiment in order to better tailor marketing campaigns, products and services.
▪ Predictive models – based on real-time data streams, determine the likelihood of churn and take pre-emptive
action for customer retention.
▪ Social technologies – understand not only the client holistically (the 360-degree view), but also the
client's network of family, friends and peers, in order to build the client 720-degree view.
▪ Location data – better understand behaviour and make better offers based on location.
▪ Operational improvement – RFID and sensor networks allow retailers to gain insight into demand and
better manage inventory and supply chains.
Big Data and Mobile integration points
•Mobile data provides a new and challenging data source for enterprise (big) data:
▪ Transparent omni-channel information environment – an evolution of multi-channel characterized by a
seamless approach to the consumer experience through all available interaction channels.
• Mobile channels are not only the newcomer but also the growth strategy for service-oriented
businesses. Mobile devices and tablets are now the norm, and the technologies employed by mobile
devices, like HTML 5, are becoming the norm for all online channels.
▪ Sentiment analysis – data from social media enable organizations to perceive and analyze client
sentiment in order to better tailor marketing campaigns, products and services.
• Mobile devices connected to social media provide the base for instant feedback for sentiment analysis.
▪ Predictive models – based on real-time data streams, determine the likelihood of churn and take pre-emptive
action for customer retention.
• Mobile devices connected to social media also provide opportunities for pre-emptive action, e.g. in order to
improve a bad customer experience.
▪ Social technologies – understand not only the client holistically (the 360-degree view), but also the
client's network of family, friends and peers, in order to build the client 720-degree view.
• It is tempting to use mobile devices to build the client 720-degree view; however, (arguably) only
governments have this authority.
▪ Location data – better understand behaviour and make better offers based on location.
• The location data provided by mobile devices not only provides the base for a better customer
experience (e.g. location-based tailored offers) but is also an important risk management capability.
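As an illustration of the location-based offers mentioned above, the sketch below selects the offers within a given radius of a device. It uses the haversine great-circle formula, which is a common choice for this kind of distance check rather than something the presentation prescribes. The `Offer` type, the offer names, the coordinates and the 1 km radius are all hypothetical.

```python
import math
from typing import List, NamedTuple


class Offer(NamedTuple):
    name: str   # hypothetical offer label
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_offers(lat: float, lon: float, offers: List[Offer], radius_km: float = 1.0) -> List[Offer]:
    """Return the offers within radius_km of the device, nearest first."""
    with_distance = [(haversine_km(lat, lon, o.lat, o.lon), o) for o in offers]
    with_distance.sort(key=lambda pair: pair[0])
    return [offer for distance, offer in with_distance if distance <= radius_km]


# Made-up sample data: a device position and two offers.
offers = [
    Offer("Offer B", 43.7000, -79.4000),
    Offer("Offer A", 43.6490, -79.3800),
]
selected = nearby_offers(43.6487, -79.3817, offers, radius_km=1.0)
print([offer.name for offer in selected])
```

The same distance function could support the risk management use case, for example by flagging a transaction that occurs far from the device's last known position, although that rule is an assumption and not part of the presentation.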
Big Data – Reference Architecture
•Typical architectures for Big Data address the following capabilities:
1. Real-time complex event processing (including sense and response).
2. Massive volumes of data (petabytes), relational and non-relational (e.g. social media, location, RFID).
3. Parallel processing/fast loading, typically based on Hadoop.
4. High-performance query systems based on in-memory data architectures.
5. Advanced analytics, e.g. visual analytics, columnar databases.
[Figure: reference Big Data architecture, a variant of the Forrester architecture [5]. Components: Event Mgmt.;
Query (SQL, non-SQL); Processing; Advanced Analytics; Stream Processing; Non-relational DBMS; Data Management;
Relational DBMS; Distributed File System; In-Memory Data Grid; Infrastructure Services; Virtual Infrastructure
Workload Management. Annotations: data stream, mobile data; massive load via parallel processing; shared-nothing
hardware, massively parallel; commodity, own or rent.]
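The first capability in the list above, real-time complex event processing with sense and response, can be illustrated with a sliding-window rule. The sketch below is plain Python and does not reflect the API of any event-processing product. The event kinds, the threshold of three and the 60-second window are hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque


class SlidingWindowRule:
    """Fire a response when `threshold` matching events for one key
    arrive within `window_seconds` of each other."""

    def __init__(self, kind: str, threshold: int, window_seconds: float):
        self.kind = kind
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._times = defaultdict(deque)  # key -> timestamps inside the window

    def on_event(self, key: str, kind: str, timestamp: float) -> bool:
        """Sense one event; return True when the rule fires (respond)."""
        if kind != self.kind:
            return False
        times = self._times[key]
        times.append(timestamp)
        # Drop timestamps that have slid out of the window.
        while times and timestamp - times[0] > self.window_seconds:
            times.popleft()
        if len(times) >= self.threshold:
            times.clear()  # reset so one burst triggers one response
            return True
        return False


rule = SlidingWindowRule(kind="declined_payment", threshold=3, window_seconds=60)
events = [
    ("client-1", "declined_payment", 0.0),
    ("client-1", "login", 5.0),
    ("client-1", "declined_payment", 20.0),
    ("client-2", "declined_payment", 30.0),
    ("client-1", "declined_payment", 45.0),  # third within 60 s for client-1
    ("client-1", "declined_payment", 200.0),
]
fired = [(key, ts) for key, kind, ts in events if rule.on_event(key, kind, ts)]
print(fired)
```

Keeping only the timestamps inside the window bounds the state held per key, which is one reason such rules pair naturally with an in-memory data grid.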
Big Data – at BMO Financial Group
[Figure: the reference architecture annotated with technologies at BMO Financial Group. Components: Event Mgmt.;
Query (SQL, non-SQL); Processing; Advanced Analytics; Stream Processing; Non-relational DBMS; Data Management;
Relational DBMS; Distributed File System; In-Memory Data Grid; Infrastructure Services; Virtual Infrastructure
Workload Management. Technology labels: client omni-channel interactions, mobile; Spotfire, SAS, Tableau, HANA;
Tibco BusinessEvents; Tibco ActiveSpaces, HANA; Sybase IQ; PaaS, IaaS.]
•Big Data is a work in progress at BMO Financial Group, with some areas more advanced than others:
▪ Event management and in-memory data grids are state of the art.
▪ Advanced analytics are in transition to maturity.
▪ Infrastructure virtualization is in progress.
▪ Hadoop infrastructure is not in scope yet.
▪ Non-relational capability is in its infancy.
Legend: Operational; Proof of Concept.
Note: the vendor list is by no means exhaustive; these are some of the technologies in use or in PoC.
Big Data – Capabilities at BMO Financial Group
•How the reference Big Data capabilities are reflected at BMO Financial Group:
1. Real-time complex event processing (including sense and response):
• Built a state-of-the-art omni-channel sense and response capability based on a Tibco stack.
• Deployed a real-time inbound lead management capability in 2011 that generated a significant increase
in up-sell and cross-sell – major new revenue for the Retail Bank.
2. Massive volumes of data (petabytes), relational and non-relational (e.g. social media, location, RFID):
• Data volumes are manageable within the current infrastructure.
• Location data is currently available, and there are plans to harvest it.
• There are plans to use social media data for sentiment analysis.
3. Parallel processing/fast loading, typically based on Hadoop:
• Not planned; the current ETL investment is performing well.
4. High-performance query architecture based on in-memory data architectures:
• Running a state-of-the-art in-memory data grid for real-time event processing as well as for the client
360-degree view.
• Currently evaluating in-memory data grids for real-time risk management as well as for several regulatory
requirements, like Anti-Money-Laundering and Client Risk Management.
5. Advanced analytics, e.g. visual analytics, columnar databases:
• Several advanced analytics tools are in use, such as Tibco Spotfire, Tableau and Sybase IQ, while
HANA and others are currently being evaluated.
Big Data – Impact on Enterprise Information Management
•Is traditional MDM redundant?
▪ By no means; while there are in-memory MDM implementations, it makes more sense to keep the current
investment and load only subsets of MDM data into in-memory databases, e.g. the client 360-degree view or any
other data elements needed for event management, sense and response, or other capabilities.
•What will happen to the current EDW?
▪ Not much; transactional data will still be an important source for BI. However, the full power of parallel
query processing and the parallelism built into hardware should be harnessed.
▪ EDWs should be augmented with social data and location data, either directly or via service providers, in order
to provide the foundation for sentiment analysis and predictive modeling.
•Are ETL tools done?
▪ It depends. This is the sweet spot where vendors are pitching Hadoop. But is your enterprise ready for
Hadoop? Are you ready to move to commodity hardware? Do you have the skills for both commodity
hardware and Hadoop?
•Time to retire current BI tools (e.g. Cognos, Business Objects, etc.)?
▪ Definitely not; continue to use the current management reports and dashboards.
▪ Educate the business on the new visual analytics tools and let them decide the way forward.
▪ Educate the business on the new BI capabilities enabled by in-memory databases.
•However, be aware of the new competitor that is building its Information Management from scratch and, with
the proper Big Data technology, might compromise your established business advantage!
Big Data – Organizational challenges
•What needs to be done:
▪ In Big Data initiatives, business leaders have to take the initiative. The new role of the CIO team is to
educate the business about Big Data and its opportunities, rather than to define and lead initiatives.
▪ CIOs have to take a holistic approach to Big Data by considering all Big Data capabilities and defining
strategies accordingly, instead of focusing on a few capabilities such as fast ETL loading, for which Hadoop is a
quick fix.
▪ Adapt the Information Management Strategy to include behaviour-oriented data, like social data, as well as
location and sensor data.
▪ Change the BI strategy towards commoditization and massively parallel processing.
▪ Big Data requires a new skill set (Data Scientist) for handling Hadoop environments as well as in-memory
data and advanced analytics. McKinsey predicts a shortage of more than a hundred thousand Big
Data professionals in the US alone [6].
•Last but not least:
▪ Big Data is an evolution of many technologies that have been around for the last decade or so. Although it has
the potential to be a technology disruptor, Big Data is rather an important augmentation of current technologies,
and if used properly it can provide significant business benefits as well as competitive advantage.
Thank you for your time! Questions?
attila.barta@bmo.com
Appendix
1. References
2. Hadoop – a Definition
References
1. Laney, Douglas. “3D Data Management: Controlling Data Volume, Velocity and Variety”, Gartner, 2001.
2. Beyer, Mark. “Gartner Says Solving 'Big Data' Challenge Involves More Than Just Managing Volumes of
Data”, Gartner, 2011.
3. Laney, Douglas. “The Importance of 'Big Data': A Definition”, Gartner, 2012.
4. Wallis, Nigel. “Big Data in Canada: Challenging Complacency for Competitive Advantage”, IDC, 2012.
5. Gogia, Sanchit. “The Big Deal About Big Data For Customer Engagement”, Forrester, 2012.
6. Manyika, James, et al. “Big Data: The next frontier for innovation, competition and productivity”, McKinsey
Global Institute, 2011.
Hadoop – a Definition
•Apache Hadoop is an open-source software framework that supports data-intensive distributed applications,
licensed under the Apache v2 license. It supports the running of applications on large clusters of commodity
hardware. The Hadoop framework transparently provides both reliability and data motion to applications.
•Hadoop implements a computational paradigm named MapReduce, where the application is divided into many
small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition,
it provides a distributed file system that stores data on the compute nodes, providing very high aggregate
bandwidth across the cluster. Both map/reduce and the distributed file system are designed so that node
failures are automatically handled by the framework. It enables applications to work with thousands of
computation-independent computers and petabytes of data. Hadoop was derived from Google's MapReduce
and Google File System (GFS) papers.
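The MapReduce paradigm described above can be sketched in a single process. The example below is a word count with a map step, a grouping ("shuffle") step and a reduce step. It only illustrates the programming model: it is not Hadoop code, it runs on one machine, and it has none of the distribution or failure handling that the framework provides. The helper `run_mapreduce` and the sample input are made up for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one input fragment."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: combine all values emitted for one key."""
    return word, sum(counts)


def run_mapreduce(
    fragments: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for fragment in fragments:            # each fragment is independent work
        for key, value in mapper(fragment):
            groups[key].append(value)     # "shuffle": collect values per key
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


lines = ["big data big buzz", "data in motion"]
word_counts = run_mapreduce(lines, map_words, reduce_counts)
print(word_counts)
```

Because each fragment is mapped independently and each key is reduced independently, a framework can run or re-run any piece of that work on any node, which is the property the definition above relies on.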
•The entire Apache Hadoop “platform” is now commonly considered to consist of the Hadoop kernel,
MapReduce and Hadoop Distributed File System (HDFS), as well as a number of related projects –
including Apache Hive, Apache HBase, and others.
•Hadoop is written in the Java programming language and is a top-level Apache project being built and used
by a global community of contributors. Hadoop and its related projects (Hive, HBase, Zookeeper, and so on)
have many contributors from across the ecosystem. Though Java code is most common, any programming
language can be used with "streaming" to implement the "map" and "reduce" parts of the system.
Source: Wikipedia
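The "streaming" approach mentioned above exchanges records with the map and reduce programs as lines of text, by default with key and value separated by a tab, and the reducer receives its input sorted by key. The sketch below models that line protocol in Python so it can be run without a cluster. The function names are illustrative, and the local `sorted` call merely stands in for the framework's sort phase.

```python
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would write."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per word; the input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines: Iterable[str]) -> List[str]:
    """Local stand-in for the cluster: map, sort by key, reduce."""
    return list(reducer(sorted(mapper(lines))))


result = run_locally(["Big Data big buzz", "data in motion"])
print(result)
```

In an actual streaming job the two functions would live in separate scripts that read from standard input and write to standard output; that wiring is omitted here to keep the example testable.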

More Related Content

What's hot

Big data consumer analytics and the transformation of marketing
Big data consumer analytics and the transformation of marketingBig data consumer analytics and the transformation of marketing
Big data consumer analytics and the transformation of marketingNicha Tatsaneeyapan
 
Emergence of Big Data in Digital Marketing
Emergence of Big Data  in Digital MarketingEmergence of Big Data  in Digital Marketing
Emergence of Big Data in Digital MarketingKrishnan Parasuraman
 
Worst practices in Business Intelligence setup
Worst practices in Business Intelligence setupWorst practices in Business Intelligence setup
Worst practices in Business Intelligence setupThe Marketing Distillery
 
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...ijdpsjournal
 
Reaping the benefits of Big Data and real time analytics
Reaping the benefits of Big Data and real time analyticsReaping the benefits of Big Data and real time analytics
Reaping the benefits of Big Data and real time analyticsThe Marketing Distillery
 
Big Data in the Fund Industry: From Descriptive to Prescriptive Data Analytics
Big Data in the Fund Industry: From Descriptive to Prescriptive Data AnalyticsBig Data in the Fund Industry: From Descriptive to Prescriptive Data Analytics
Big Data in the Fund Industry: From Descriptive to Prescriptive Data AnalyticsBroadridge
 
Bigdata and Social Media Analytics
Bigdata and Social Media Analytics Bigdata and Social Media Analytics
Bigdata and Social Media Analytics Dillip kumar
 
Big data
Big dataBig data
Big dataRiya
 
Thinking Small: Bringing the Power of Big Data to the Masses
Thinking Small:  Bringing the Power of Big Data to the MassesThinking Small:  Bringing the Power of Big Data to the Masses
Thinking Small: Bringing the Power of Big Data to the MassesFlutterbyBarb
 
Bardess Moderated - Analytics and Business Intelligence - Society of Informat...
Bardess Moderated - Analytics and Business Intelligence - Society of Informat...Bardess Moderated - Analytics and Business Intelligence - Society of Informat...
Bardess Moderated - Analytics and Business Intelligence - Society of Informat...bardessweb
 

What's hot (20)

Big data consumer analytics and the transformation of marketing
Big data consumer analytics and the transformation of marketingBig data consumer analytics and the transformation of marketing
Big data consumer analytics and the transformation of marketing
 
Emergence of Big Data in Digital Marketing
Emergence of Big Data  in Digital MarketingEmergence of Big Data  in Digital Marketing
Emergence of Big Data in Digital Marketing
 
Hadoop Overview
Hadoop OverviewHadoop Overview
Hadoop Overview
 
The ABCs of Big Data
The ABCs of Big DataThe ABCs of Big Data
The ABCs of Big Data
 
The dawn of Big Data
The dawn of Big DataThe dawn of Big Data
The dawn of Big Data
 
Worst practices in Business Intelligence setup
Worst practices in Business Intelligence setupWorst practices in Business Intelligence setup
Worst practices in Business Intelligence setup
 
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...
 
Buyer's guide to strategic analytics
Buyer's guide to strategic analyticsBuyer's guide to strategic analytics
Buyer's guide to strategic analytics
 
Making sense of consumer data
Making sense of consumer dataMaking sense of consumer data
Making sense of consumer data
 
Reaping the benefits of Big Data and real time analytics
Reaping the benefits of Big Data and real time analyticsReaping the benefits of Big Data and real time analytics
Reaping the benefits of Big Data and real time analytics
 
Monetize Big Data
Monetize Big DataMonetize Big Data
Monetize Big Data
 
Big Data in the Fund Industry: From Descriptive to Prescriptive Data Analytics
Big Data in the Fund Industry: From Descriptive to Prescriptive Data AnalyticsBig Data in the Fund Industry: From Descriptive to Prescriptive Data Analytics
Big Data in the Fund Industry: From Descriptive to Prescriptive Data Analytics
 
Big data
Big dataBig data
Big data
 
Bigdata and Social Media Analytics
Bigdata and Social Media Analytics Bigdata and Social Media Analytics
Bigdata and Social Media Analytics
 
Big data basics
Big data basicsBig data basics
Big data basics
 
Big Data: Where is the Real Opportunity?
Big Data: Where is the Real Opportunity?Big Data: Where is the Real Opportunity?
Big Data: Where is the Real Opportunity?
 
Big data
Big dataBig data
Big data
 
Thinking Small: Bringing the Power of Big Data to the Masses
Thinking Small:  Bringing the Power of Big Data to the MassesThinking Small:  Bringing the Power of Big Data to the Masses
Thinking Small: Bringing the Power of Big Data to the Masses
 
Big data
Big dataBig data
Big data
 
Bardess Moderated - Analytics and Business Intelligence - Society of Informat...
Bardess Moderated - Analytics and Business Intelligence - Society of Informat...Bardess Moderated - Analytics and Business Intelligence - Society of Informat...
Bardess Moderated - Analytics and Business Intelligence - Society of Informat...
 

Viewers also liked

TOMMY RESUME 02-25-2016
TOMMY RESUME  02-25-2016TOMMY RESUME  02-25-2016
TOMMY RESUME 02-25-2016Tommy Ngo
 
Ramesh resume
Ramesh resumeRamesh resume
Ramesh resumeRamesh R
 
nACT concept presentation
nACT concept presentationnACT concept presentation
nACT concept presentationWolfgang Kluth
 
The Future of Magento Extensibility | Imagine 2013 Technology | Christopher O...
The Future of Magento Extensibility | Imagine 2013 Technology | Christopher O...The Future of Magento Extensibility | Imagine 2013 Technology | Christopher O...
The Future of Magento Extensibility | Imagine 2013 Technology | Christopher O...Atwix
 
Ventajas de la aplicación prezi
Ventajas de la aplicación preziVentajas de la aplicación prezi
Ventajas de la aplicación preziEstefySanz
 
Factores de riesgo durante el puerperio
Factores de riesgo durante el puerperioFactores de riesgo durante el puerperio
Factores de riesgo durante el puerperioFernanda Silva Lizardi
 
Gestión de la disciplina laboral para contribuir a los resultados de la organ...
Gestión de la disciplina laboral para contribuir a los resultados de la organ...Gestión de la disciplina laboral para contribuir a los resultados de la organ...
Gestión de la disciplina laboral para contribuir a los resultados de la organ...Percy Alache
 
CURRICULUM_VITAE 2
CURRICULUM_VITAE 2CURRICULUM_VITAE 2
CURRICULUM_VITAE 2Jason Lloyd
 

Viewers also liked (13)

Diagrama de flujo
Diagrama de flujoDiagrama de flujo
Diagrama de flujo
 
TOMMY RESUME 02-25-2016
TOMMY RESUME  02-25-2016TOMMY RESUME  02-25-2016
TOMMY RESUME 02-25-2016
 
Ramesh resume
Ramesh resumeRamesh resume
Ramesh resume
 
nACT concept presentation
nACT concept presentationnACT concept presentation
nACT concept presentation
 
The Future of Magento Extensibility | Imagine 2013 Technology | Christopher O...
The Future of Magento Extensibility | Imagine 2013 Technology | Christopher O...The Future of Magento Extensibility | Imagine 2013 Technology | Christopher O...
The Future of Magento Extensibility | Imagine 2013 Technology | Christopher O...
 
Expo1
Expo1Expo1
Expo1
 
Рейтинг публикации открытых данных за 2015 год
Рейтинг публикации открытых данных за 2015 годРейтинг публикации открытых данных за 2015 год
Рейтинг публикации открытых данных за 2015 год
 
Fútbol.pdpp
Fútbol.pdppFútbol.pdpp
Fútbol.pdpp
 
Ventajas de la aplicación prezi
Ventajas de la aplicación preziVentajas de la aplicación prezi
Ventajas de la aplicación prezi
 
Factores de riesgo durante el puerperio
Factores de riesgo durante el puerperioFactores de riesgo durante el puerperio
Factores de riesgo durante el puerperio
 
Gestión de la disciplina laboral para contribuir a los resultados de la organ...
Gestión de la disciplina laboral para contribuir a los resultados de la organ...Gestión de la disciplina laboral para contribuir a los resultados de la organ...
Gestión de la disciplina laboral para contribuir a los resultados de la organ...
 
Exposición parrafo introduccion
Exposición parrafo introduccionExposición parrafo introduccion
Exposición parrafo introduccion
 
CURRICULUM_VITAE 2
CURRICULUM_VITAE 2CURRICULUM_VITAE 2
CURRICULUM_VITAE 2
 

Similar to exploit_big_data_v1

Let's make money from big data!
Let's make money from big data! Let's make money from big data!
Let's make money from big data! B Spot
 
UNIT 1 -BIG DATA ANALYTICS Full.pdf
UNIT 1 -BIG DATA ANALYTICS Full.pdfUNIT 1 -BIG DATA ANALYTICS Full.pdf
UNIT 1 -BIG DATA ANALYTICS Full.pdfvvpadhu
 
Big data seminor
Big data seminorBig data seminor
Big data seminorberasrujana
 
The Comparison of Big Data Strategies in Corporate Environment
The Comparison of Big Data Strategies in Corporate EnvironmentThe Comparison of Big Data Strategies in Corporate Environment
The Comparison of Big Data Strategies in Corporate EnvironmentIRJET Journal
 
Aziksa hadoop for buisness users2 santosh jha
Aziksa hadoop for buisness users2 santosh jhaAziksa hadoop for buisness users2 santosh jha
Aziksa hadoop for buisness users2 santosh jhaData Con LA
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Casesaziksa
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notesMohit Saini
 
Introduction to big data – convergences.
Introduction to big data – convergences.Introduction to big data – convergences.
Introduction to big data – convergences.saranya270513
 
Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docx
Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docxProject 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docx
Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docxstilliegeorgiana
 
Ab cs of big data
Ab cs of big dataAb cs of big data
Ab cs of big dataDigimark
 
Big Data Analytics Research Report
Big Data Analytics Research ReportBig Data Analytics Research Report
Big Data Analytics Research ReportIla Group
 
BIG DATA CHAPTER 2 IN DSS.pptx
BIG DATA CHAPTER 2 IN DSS.pptxBIG DATA CHAPTER 2 IN DSS.pptx
BIG DATA CHAPTER 2 IN DSS.pptxmuflehaljarrah
 
Big data - a review (2013 4)
Big data - a review (2013 4)Big data - a review (2013 4)
Big data - a review (2013 4)Sonu Gupta
 

Similar to exploit_big_data_v1 (20)

Unit III.pdf
Unit III.pdfUnit III.pdf
Unit III.pdf
 
Let's make money from big data!
Let's make money from big data! Let's make money from big data!
Let's make money from big data!
 
UNIT 1 -BIG DATA ANALYTICS Full.pdf
UNIT 1 -BIG DATA ANALYTICS Full.pdfUNIT 1 -BIG DATA ANALYTICS Full.pdf
UNIT 1 -BIG DATA ANALYTICS Full.pdf
 
Big data seminor
Big data seminorBig data seminor
Big data seminor
 
uae views on big data
  uae views on  big data  uae views on  big data
uae views on big data
 
Big data Analytics
Big data Analytics Big data Analytics
Big data Analytics
 
The Comparison of Big Data Strategies in Corporate Environment
The Comparison of Big Data Strategies in Corporate EnvironmentThe Comparison of Big Data Strategies in Corporate Environment
The Comparison of Big Data Strategies in Corporate Environment
 
Aziksa hadoop for buisness users2 santosh jha
Aziksa hadoop for buisness users2 santosh jhaAziksa hadoop for buisness users2 santosh jha
Aziksa hadoop for buisness users2 santosh jha
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
 
Identifying the new frontier of big data as an enabler for T&T industries: Re...
Identifying the new frontier of big data as an enabler for T&T industries: Re...Identifying the new frontier of big data as an enabler for T&T industries: Re...
Identifying the new frontier of big data as an enabler for T&T industries: Re...
 
Big data assignment
Big data assignmentBig data assignment
Big data assignment
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notes
 
Introduction to big data – convergences.
Introduction to big data – convergences.Introduction to big data – convergences.
Introduction to big data – convergences.
 
Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docx
Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docxProject 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docx
Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docx
 
Ab cs of big data
Ab cs of big dataAb cs of big data
Ab cs of big data
 
Big data
Big dataBig data
Big data
 
Big data Analytics
Big data AnalyticsBig data Analytics
Big data Analytics
 
Big Data Analytics Research Report
Big Data Analytics Research ReportBig Data Analytics Research Report
Big Data Analytics Research Report
 
BIG DATA CHAPTER 2 IN DSS.pptx
BIG DATA CHAPTER 2 IN DSS.pptxBIG DATA CHAPTER 2 IN DSS.pptx
BIG DATA CHAPTER 2 IN DSS.pptx
 
Big data - a review (2013 4)
Big data - a review (2013 4)Big data - a review (2013 4)
Big data - a review (2013 4)
 

exploit_big_data_v1

  • 1. Attila Barta, Ph.D. Head of Architecture at Private Client Group and BMO Insurance Exploit Big Data to Enhance Enterprise Decision- Making, Productivity and Process Optimization
  • 2. 1The 2nd Annual Mobile Enterprise Strategies Summit Introduction to this presentation •The purpose of this presentation is to go beyond the buzz and present what “Big Data” means and the impact on your organization in general, and in the context of mobile. •This presentation covers the following topics: To understand the Big Data buzz, one has to go to the beginnings and understand the forces that brought Big Data to life. Is Big Data another buzz world like Semantic Web, Web 2.0 or Cloud? Where are Canadian companies on Big Data in comparison with the World? Big Data and Mobile integration points. How a reference Big Data architecture looks like. Big Data at BMO Financial Group. The road ahead, what needs to be done. •Note: this presentation reflects the opinions of the author alone and by no means of BMO Financial Group.
  • 3. 2The 2nd Annual Mobile Enterprise Strategies Summit Big Data – How we got here •In a 2001 research report[1] Gartner analyst Doug Laney defined data growth challenges and opportunities as being three-dimensional, i.e. increasing volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources). Gartner, and now much of the industry, continue to use this "3Vs" model for describing Big Data[2]. (source Wikipedia). •What was happening in 2001? Three major trends:  Sloan Digital Sky Survey began collecting astronomical data in 2000 at a rate of 200GB/night – volume  Sensor networks (web of things) and streaming databases (Message Oriented Middleware) – velocity  Semi-structured databases, XML native databases beside object-oriented, relational databases – variety •What happened after 2001?  Rise of search engines and portals - Yahoo and Google: • Problem: how to store and query (cheaply) in real time large amounts of (semi-structured) data. • Answer: Hadoop on commodity Linux farms.  Memory got cheaper – in-memory data grids.  Rise of Social Media – petabytes in pictures, unstructured and semi-structured data.  Increased computational power and large memory – visual analytics.
  • 4. Big Data – Definitions and Examples
      • In 2012, Gartner updated its definition as follows: “Big data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization” [3].
      • In 2012, IDC defined Big Data technologies as “a new generation of technologies and architectures designed to extract value economically from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and/or analysis” [4].
      • In 2012, Forrester characterized Big Data as “increases in data volume, velocity, variety, and variability” [5].
      • Big Data characteristics:
          1. Data Volume: data size on the order of petabytes.
              ▪ Example: on June 13, 2012, Facebook announced that it had reached 100 PB of data. On November 8, 2012, it announced that its warehouse grows by half a PB per day.
          2. Data Velocity: real-time processing of streaming data, including real-time analytics.
              ▪ Example: a jet engine generates 20 TB of data per hour that has to be processed in near real time.
          3. Data Variety: structured, semi-structured, text, images, video, audio, etc.
              ▪ Example: 80% of enterprise data is unstructured. YouTube: 500 TB of video uploaded per year.
          4. Data Variability: data flows can be inconsistent, with periodic peaks.
              ▪ Example: blogs commenting on the new BlackBerry device; stock market data that reacts to market events.
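To put the volume and velocity examples above in perspective, here is a small back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and linear growth; the helper names are illustrative and are not part of the original slides. The two Facebook figures were announced on different dates, so the doubling estimate is only indicative.

```python
# Back-of-the-envelope arithmetic for the slide's volume and velocity examples.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 PB = 10**15 bytes).
TB = 10**12
PB = 10**15


def sustained_rate_gb_per_s(bytes_per_hour: float) -> float:
    """Convert an hourly data volume into a sustained rate in GB per second."""
    return bytes_per_hour / 3600 / 10**9


def days_to_double(current_bytes: float, growth_bytes_per_day: float) -> float:
    """Days until the data set doubles, assuming constant (linear) daily growth."""
    return current_bytes / growth_bytes_per_day


# Jet engine example: 20 TB of data per hour.
jet_engine_rate = sustained_rate_gb_per_s(20 * TB)

# Facebook example: 100 PB stored, growing by half a PB per day.
warehouse_doubling = days_to_double(100 * PB, 0.5 * PB)

print(f"Jet engine stream: about {jet_engine_rate:.2f} GB/s sustained")
print(f"Warehouse doubles in about {warehouse_doubling:.0f} days")
```

Under these assumptions the jet engine stream works out to roughly 5.6 GB/s sustained, and a 100 PB warehouse growing by 0.5 PB per day would double in about 200 days.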
  • 5. Big Data – In Canada, where are we?
      • In December 2012, IDC published a study of Big Data in Canada [4], surveying 75 businesses with over 250MM in revenue. The conclusions of the survey are sobering:
          ◦ Less than one tenth of the respondents were familiar with Hadoop (the Big Data framework); respondents were slightly more familiar with in-memory data grids and in-memory analytics.
          ◦ Only half of Canadian organizations already work with Big Data, in comparison with more than three quarters worldwide.
          ◦ The majority of Canadian companies use mainly internally produced data, with less than a quarter of Canadian organizations using data from non-traditional sources such as social media web data, RFID tags and GPS.
          ◦ Big Data strategies are delegated to mid-level management, while world-class companies integrate technology decisions at the executive level.
  • 6. Big Data – What are we missing in Canada?
      • McKinsey Global Institute published “Big Data: The next frontier for innovation, competition and productivity” in May 2011. In the sectors they examined, they estimated opportunities of hundreds of billions per year in savings or new business by unleashing the potential of Big Data [6].
      • Big Data immediate business opportunities:
          ◦ Transparent omni-channel information environment – an evolution of multi-channel characterized by a seamless approach to the consumer experience through all available interaction channels.
          ◦ Sentiment analysis – data from social media enables organizations to perceive and analyze client sentiment in order to better tailor marketing campaigns, products and services.
          ◦ Predictive models – based on real-time data streams, determine likelihood to churn and take pre-emptive actions for customer retention.
          ◦ Social technologies – not only understand the client holistically (the 360-degree view), but also understand the client's network of family, friends and peers in order to build the client 720-degree view.
          ◦ Location data – better understand behaviour; better offers based on location.
          ◦ Operational improvement – RFID and sensor networks allow retailers to get insights into demand and better manage inventory and supply chains.
  • 7. Big Data and Mobile integration points
      • Mobile data provides a new and challenging data source for enterprise (big) data:
          ◦ Transparent omni-channel information environment – an evolution of multi-channel characterized by a seamless approach to the consumer experience through all available interaction channels.
              ▪ Mobile channels are not only the newcomer but also the growth strategy for service-oriented businesses. Mobile devices and tablets are now the norm, and the technologies employed by mobile devices, like HTML5, are becoming the norm for all online channels.
          ◦ Sentiment analysis – data from social media enables organizations to perceive and analyze client sentiment in order to better tailor marketing campaigns, products and services.
              ▪ Mobile devices connected to social media provide the base for instant feedback for sentiment analysis.
          ◦ Predictive models – based on real-time data streams, determine likelihood to churn and take pre-emptive actions for customer retention.
              ▪ Mobile devices connected to social media also provide opportunities for pre-emptive action, e.g. in order to improve a bad customer experience.
          ◦ Social technologies – not only understand the client holistically (the 360-degree view), but also understand the client's network of family, friends and peers in order to build the client 720-degree view.
              ▪ It is tempting to use mobile devices to build the client 720-degree view; however, (arguably) only governments have this authority.
          ◦ Location data – better understand behaviour; better offers based on location.
              ▪ The location data provided by mobile devices is not only the base for a better customer experience (e.g. location-based tailored offers) but also an important risk management capability.
  • 8. Big Data – Reference Architecture
      • Typical architectures for Big Data address the following capabilities:
          1. Real-time complex event processing (including sense and response).
          2. Massive volumes of data (petabytes), relational and non-relational (e.g. social media, location, RFID).
          3. Parallel processing/fast loading, typically based on Hadoop.
          4. High-performance query systems based on in-memory data architectures.
          5. Advanced analytics, e.g. visual analytics, columnar databases.
      • [Diagram: a variant of the Forrester architecture [5]. Components: Event Mgmt.; Stream Processing; Query (SQL, non-SQL) Processing; Advanced Analytics; Data Management (Relational dbms, Non-relational dbms, Distributed File System, In-Memory Data Grid); Workload Management; Infrastructure Services; Virtual Infrastructure. Annotations: data stream, mobile data; massive load via parallel processing; shared-nothing hardware, massively parallel; commodity, own or rent.]
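The first capability above, real-time complex event processing with sense and response, can be illustrated with a minimal sliding-window sketch. This is not how any particular product (or the architecture in the diagram) implements it. The class name, the event type (repeated events per client, for example failed mobile logins), the 60-second window and the threshold of 3 are all hypothetical choices made for illustration.

```python
from collections import defaultdict, deque


class SlidingWindowDetector:
    """Sense: count events per client inside a time window.
    Respond: signal when the count reaches a threshold.

    Assumption: timestamps (in seconds) arrive in non-decreasing order per client.
    """

    def __init__(self, window_seconds: float, threshold: int) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._events = defaultdict(deque)  # client_id -> recent timestamps

    def observe(self, client_id: str, timestamp: float) -> bool:
        """Record one event; return True if a response should be triggered."""
        window = self._events[client_id]
        window.append(timestamp)
        # Evict timestamps that have fallen out of the window.
        while window and timestamp - window[0] >= self.window_seconds:
            window.popleft()
        return len(window) >= self.threshold


# Hypothetical event stream: (client_id, timestamp in seconds).
stream = [("c1", 0), ("c2", 5), ("c1", 20), ("c1", 45), ("c1", 130)]

detector = SlidingWindowDetector(window_seconds=60, threshold=3)
alerts = [(client, ts) for client, ts in stream if detector.observe(client, ts)]
print(alerts)
```

In this sketch only the third event for client `c1` within 60 seconds triggers a response. A production event-processing engine would add event correlation across sources, persistence and out-of-order handling, none of which is modelled here.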
  • 9. Big Data – at BMO Financial Group
      • [Diagram: the reference architecture (Event Mgmt.; Stream Processing; Query (SQL, non-SQL) Processing; Advanced Analytics; Data Management with Relational dbms, Non-relational dbms, Distributed File System and In-Memory Data Grid; Workload Management; Infrastructure Services; Virtual Infrastructure) annotated with: Client Omni-Channel Interactions, mobile; Spotfire, SAS, Tableau, HANA; Tibco BusinessEvents; Tibco ActiveSpaces, HANA; Sybase IQ; PaaS, IaaS. Legend: Operational; Proof of Concept.]
      • Big Data is a work in progress at BMO Financial Group, with some areas more advanced than others:
          ◦ Event management and in-memory data grids are state of the art.
          ◦ Advanced analytics are in transition to mature.
          ◦ Infrastructure virtualization is in progress.
          ◦ Hadoop infrastructure is not in scope yet.
          ◦ Non-relational capability is in its infancy.
      • Note: the vendor list is by no means exhaustive; these are some of the technologies in use or in PoC.
  • 10. Big Data – Capabilities at BMO Financial Group
      • How the reference Big Data capabilities are reflected at BMO Financial Group:
          1. Real-time complex event processing (including sense and response):
              ▪ Built a state-of-the-art omni-channel sense and response capability based on a Tibco stack.
              ▪ Deployed a real-time inbound lead management capability in 2011 that generated a significant increase in up-sell and cross-sell: major new revenue for the Retail Bank.
          2. Massive volumes of data (petabytes), relational and non-relational (e.g. social media, location, RFID):
              ▪ Data volumes are manageable within the current infrastructure.
              ▪ Location data is currently available and is planned to be harvested.
              ▪ There are plans to use social media data for sentiment analysis.
          3. Parallel processing/fast loading, typically based on Hadoop:
              ▪ Not planned; the current ETL investment is performing well.
          4. High-performance query architecture based on in-memory data architectures:
              ▪ Running a state-of-the-art in-memory data grid for real-time event processing as well as for the client 360-degree view.
              ▪ Currently evaluating in-memory data grids for real-time risk management as well as several regulatory requirements, like Anti-Money-Laundering and Client Risk Management.
          5. Advanced analytics, e.g. visual analytics, columnar databases:
              ▪ Several advanced analytics tools are in use, such as Tibco Spotfire, Tableau and Sybase IQ, while HANA and others are currently being evaluated.
  • 11. Big Data – Impact on Enterprise Information Management
      • Is traditional MDM redundant?
          ◦ By no means. While there are in-memory MDM implementations, it makes more sense to keep the current investment and load only subsets of MDM data into in-memory databases, e.g. the client 360-degree view or any other data elements needed for event management, sense and response or other capabilities.
      • What will happen with the current EDW?
          ◦ Not much; transactional data will still be an important source for BI. However, the full power of parallel query processing and the parallelism built into hardware should be harvested.
          ◦ EDWs should be augmented with social data and location data, either directly or via service providers, in order to provide the foundation for sentiment analysis and predictive modeling.
      • Are ETL tools done?
          ◦ It depends. This is the sweet spot where vendors are pitching Hadoop. However, is your enterprise ready for Hadoop? Are you ready to move to commodity hardware? Do you have the skills for both commodity hardware and Hadoop?
      • Time to retire current BI tools (e.g. Cognos, Business Objects)?
          ◦ Definitely not; continue to use the current management reports and dashboards.
          ◦ Educate the business on the new visual analytics tools and let them decide the way forward.
          ◦ Educate the business on the new BI capabilities enabled by in-memory databases.
      • However, be aware of the new competitor that is building its Information Management from scratch and, with the proper Big Data technology, might compromise your established business advantage!
  • 12. Big Data – Organizational challenges
      • What needs to be done:
          ◦ In Big Data initiatives, business leaders have to take the initiative. The new role of the CIO team is to educate the business on Big Data and its opportunities, rather than to define and lead initiatives.
          ◦ CIOs have to take a holistic approach to Big Data by considering all Big Data capabilities and defining strategies accordingly, instead of focusing on a few capabilities, like fast ETL loading, for which Hadoop is a quick fix.
          ◦ Adapt the Information Management Strategy to include behaviour-oriented data, like social data, as well as location and sensor data.
          ◦ Change the BI strategy towards commoditization and massively parallel processing.
          ◦ Big Data requires a new skill set (Data Scientist) for handling Hadoop environments as well as in-memory data and advanced analytics. McKinsey predicts a shortage of more than a hundred thousand Big Data professionals in the US alone [6].
      • Last but not least:
          ◦ Big Data is an evolution of many technologies that have been around for the last decade or so. Although it has the potential to be a technology disruptor, Big Data is rather an important augmentation of current technologies; used properly, it can provide significant business benefits as well as competitive advantage.
  • 13. Thank you for your time! Questions? attila.barta@bmo.com
  • 14. Appendix
      1. References
      2. Hadoop – a Definition
  • 15. References
      1. Laney, Douglas. “3D Data Management: Controlling Data Volume, Velocity and Variety”, Gartner, 2001.
      2. Beyer, Mark. “Gartner Says Solving 'Big Data' Challenge Involves More Than Just Managing Volumes of Data”, Gartner, 2011.
      3. Laney, Douglas. “The Importance of 'Big Data': A Definition”, Gartner, 2012.
      4. Wallis, Nigel. “Big Data in Canada: Challenging Complacency for Competitive Advantage”, IDC, 2012.
      5. Gogia, Sanchit. “The Big Deal About Big Data For Customer Engagement”, Forrester, 2012.
      6. Manyika, James et al. “Big Data: The next frontier for innovation, competition and productivity”, McKinsey Global Institute, 2011.
  • 16. Hadoop – a Definition
      • Apache Hadoop is an open-source software framework that supports data-intensive distributed applications, licensed under the Apache v2 license. It supports the running of applications on large clusters of commodity hardware. The Hadoop framework transparently provides both reliability and data motion to applications.
      • Hadoop implements a computational paradigm named MapReduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both MapReduce and the distributed file system are designed so that node failures are automatically handled by the framework. It enables applications to work with thousands of computation-independent computers and petabytes of data. Hadoop was derived from Google's MapReduce and Google File System (GFS) papers.
      • The entire Apache Hadoop “platform” is now commonly considered to consist of the Hadoop kernel, MapReduce and the Hadoop Distributed File System (HDFS), as well as a number of related projects, including Apache Hive, Apache HBase, and others.
      • Hadoop is written in the Java programming language and is a top-level Apache project being built and used by a global community of contributors. Hadoop and its related projects (Hive, HBase, ZooKeeper, and so on) have many contributors from across the ecosystem. Though Java code is most common, any programming language can be used with “streaming” to implement the “map” and “reduce” parts of the system.
      • Source: Wikipedia
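The MapReduce paradigm described above can be sketched as a single-process word count. This simulates the map, shuffle and reduce steps in plain Python to show the data flow only; it does not use Hadoop or its APIs, and the function names are illustrative. On a real cluster the map and reduce fragments would run on different nodes and the framework would handle the shuffle and node failures.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one input fragment."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map over every fragment, shuffle by key, then reduce each group."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big value", "data in motion"])
print(counts)
```

Because each call to `map_phase` depends only on its own fragment and each call to `reduce_phase` only on its own key, the fragments can be executed or re-executed independently, which is the property the definition above relies on.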