Big Data and Fast Data combined – is it possible? An introduction to Big Data architectures. Ulises Fasoli, Senior Consultant, Trivadis. Talk given at the Swiss Data Forum on 24 November 2015 in Lausanne.
The Synapse IoT Stack: Technology Trends in IoT and Big Data (InMobi Technology)
This is the presentation from Big Data November Bangalore Meetup 2014.
http://technology.inmobi.com/events/bigdata-meetup
Talk Outline:
- What does THE HIVE provide?
- Goals of Synapse Tech Stack
- THE HIVE Startups
- Demystifying IoT Market
- Synapse Stack for IoT
- Big Data Challenge
- Synapse Lambda Architecture
- Synapse Components
- Synapse Internals
- AKILI – Synapse Machine Learning
An introduction to data governance, Philippe Bourgeois, Senior Consultant, Trivadis. Talk given at the Swiss Data Forum on 24 November 2015 in Lausanne.
How do you analyze a Petabyte of data?
The Spark Python API, or PySpark, exposes the Spark programming model to Python. Apache® Spark™ is open source and one of the most popular Big Data frameworks for scaling out tasks across a cluster. It was developed to use distributed, in-memory data structures to improve processing speeds for massive amounts of data.
We'll also look into Spark SQL, Apache Spark's module for working with structured data, and MLlib, Apache Spark's scalable machine learning library.
What will you learn?
Perform Big Data analysis with PySpark
Run SQL queries on DataFrames using the Spark SQL module
Use machine learning with the MLlib library
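A minimal sketch of what such an analysis can look like, assuming a local PySpark installation; the input file and column names are hypothetical:

# Minimal PySpark sketch: a DataFrame queried through the Spark SQL module
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pyspark-intro").getOrCreate()

# Load semi-structured data; Spark infers the schema on read
events = spark.read.json("events.json")  # hypothetical input file
events.createOrReplaceTempView("events")

# Plain SQL over the DataFrame via the Spark SQL module
top_users = spark.sql(
    "SELECT user_id, COUNT(*) AS n_events "
    "FROM events GROUP BY user_id ORDER BY n_events DESC LIMIT 10"
)
top_users.show()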
Big data
Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Challenges include capture, storage, analysis, data curation, search, sharing, transfer, visualization, querying, updating and information privacy. The term "big data" often refers simply to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set. "There is little doubt that the quantities of data now available are indeed large, but that’s not the most relevant characteristic of this new data ecosystem."[2] Analysis of data sets can find new correlations to "spot business trends, prevent diseases, combat crime and so on."[3] Scientists, business executives, practitioners of medicine, advertising and governments alike regularly meet difficulties with large data-sets in areas including Internet search, fintech, urban informatics, and business informatics. Scientists encounter limitations in e-Science work, including meteorology, genomics,[4] connectomics, complex physics simulations, biology and environmental research.[5]
Data sets grow rapidly - in part because they are increasingly gathered by cheap and numerous information-sensing Internet of things devices such as mobile devices, aerial sensors (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers and wireless sensor networks.[6][7] The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s;[8] as of 2012, 2.5 exabytes (2.5×10¹⁸ bytes) of data are generated every day.[9] One question for large enterprises is determining who should own big-data initiatives that affect the entire organization.[10]
Relational database management systems and desktop statistics and visualization packages often have difficulty handling big data. The work may require "massively parallel software running on tens, hundreds, or even thousands of servers".[11] What counts as "big data" varies depending on the capabilities of the users and their tools, and expanding capabilities make big data a moving target. "For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration."
Sr. Architect Pradeep Reddy, from Qubole, presents the state of Data Science in enterprise industries today, followed by a deep dive into an end-to-end, real-world machine learning use case. We'll explore the best practices and challenges of big data operations when developing new machine learning features and advanced analytics products at scale in the cloud.
GITEX Big Data Conference 2014 – SAP Presentation (Pedro Pereira)
Big, Fast and Predictive Data: How to Extract Real Business Value – in real time.
90% of the world’s data was created in the last two years. If you can harness it, it will revolutionize the way you do business. Big Data solutions can help extract real business value – in real time.
Slides from the breakfast briefing of 11 December 2013.
In a delicate economic context, "big data" tools provide the speed, flexibility and scalability required to implement enterprise projects that exploit large volumes of information. These technologies are now a reality to be integrated into IT projects.
Klee Group is organizing this themed breakfast with speakers from the Big Data world:
- MongoDB
- Elasticsearch
- CMS Rubedo
Machines learn better with Semantics!
See how taxonomy management and the maintenance of knowledge graphs benefit from machine learning and corpus analysis, and how, in return, machine learning gets improved when using semantic knowledge models for further enrichment.
The core idea behind Hadoop is to distribute both the data and the user software across individual shards within the cluster. The Bigdata Replay method is drastically different in that it packs user software into batches on a single multicore machine and uses circuit emulation to maximize throughput when bringing in data shards for replay. The effect of hotspots, defined as drastically higher access frequency to a small portion of (popular) data, is different in the two platforms. This paper models the difference numerically but in a relative form, which makes it possible to compare the two platforms.
Green Compute and Storage - Why does it Matter and What is in Scope (Narayanan Subramaniam)
Presentation made for BITS students under the auspices of IEEE Goa on the occasion of Lumini '21 - BITS Goa's annual technical symposium. The talk gives an overview of why green compute/storage matters as the Internet explodes with voice, video and other content, consuming 8% (3 TWh) of total global electricity production and rising exponentially to 21% (9 TWh) by 2030. This is likely to be accelerated by the advent of 5G and IoT everywhere. I explore 3 key pillars of computing with respect to "green" and the consequences that need to be mitigated in short order.
The rise of “Big Data” on cloud computing: Review and open research issues
Paper Link: https://www.researchgate.net/publication/264624667_The_rise_of_Big_Data_on_cloud_computing_Review_and_open_research_issues
Unlock Your Data for ML & AI using Data Virtualization (Denodo)
How Denodo Complements a Logical Data Lake in the Cloud
● Denodo does not substitute data warehouses, data lakes, ETLs...
● Denodo enables the use of all of them together, plus other data sources
○ In a logical data warehouse
○ In a logical data lake
○ They are very similar; the only difference is in the main objective
● There are also use cases where Denodo can be used as a data source in an ETL flow
Big Data: Big Numbers, Bigger Questions - a presentation at Big Data Week (Chloe Thomas)
This is a presentation by Simon Spyer from Conduit at the London Big Data Week.
Conduit are Data Value Architects. We transform business through the application of data.
Find out more at www.conduitltd.com and @UkConduit.
This presentation shows how Big Data impacts business and technology and asks (and maybe answers) the question: how new are Big Data and the effects it causes...?
"Performance de l'ingénierie, l'approche Thalès" est une présentation de Françoise Nahabetian Directeur Consulting Excellence opérationnelle chez Thalès Consulting, au cours de la table ronde "expérience client, l'homme au cœur de la transformation" de L'Observatoire de l'Excellence Opérationnelle du 3 Décembre 2015.
This presentation, by big data guru Bernard Marr, outlines in simple terms what Big Data is and how it is used today. It covers the 5 V's of Big Data as well as a number of high value use cases.
Datumize is a software vendor established in 2014 in Barcelona (Spain) working on data integration technology.
We develop innovative products that allow companies to enjoy actionable insights based on Dark Data - data not stored and therefore not used.
Our secret sauce is a proprietary and powerful data collection engine, Datumize Data Collector (DDC), that gets data from fancy sources that most other vendors do not consider.
How do you get started with IoT? Learn how Ayla Networks and mnubo integrate to provide product manufacturers with a turnkey IoT connectivity and analytics solution.
Given the data center industry’s cagey nature – the secrecy around critical infrastructure, the NDAs, and so on – we can’t make specific predictions without substantial risk of looking like total fools. But from conversations with vendors and analysts we can at a minimum get some idea of the directions data center technologies are moving in.
Watch full webinar here: https://bit.ly/2vN59VK
Having started out as the most agile and real-time enterprise data fabric, data virtualization is proving to go beyond its initial promise and is becoming one of the most important enterprise big data fabrics.
Attend this session to learn:
- What data virtualization really is.
- How it differs from other enterprise data integration technologies.
- Why data virtualization is finding enterprise-wide deployment inside some of the largest organizations.
BigDataPilotDemoDays - I-BiDaaS Application to the Manufacturing Sector Webinar (Big Data Value Association)
The new data-driven industrial revolution highlights the need for big data technologies to unlock the potential in various application domains. To this end, BDV PPP projects I-BiDaaS, BigDataStack, Track & Know and Policy Cloud deliver innovative technologies to address the emerging needs of data operations and applications. To fully exploit the sustainability and take full advantage of the developed technologies, the projects onboarded pilots that exhibit their applicability in a wide variety of sectors. In the Big Data Pilot Demo Days, the projects will showcase the developed and implemented technologies to interested end-users from the industry as well as technology providers, for further adoption.
Is it sensible to use Data Vault at all? Conclusions from a project (Capgemini)
The presentation focuses on the question “Is it sensible to use Data Vault at all?” The author outlines the impact of Data Vault on the architecture, the implementation and on the project.
Webinar Industrial Data Space Association: Introduction and Architecture (Thorsten Huelsmann)
The Industrial Data Space Association is an industry- and user-driven initiative to develop a global Industrial Data Space standard and reference architecture that provides data sovereignty. The work is based on use cases and supports certifiable software solutions and business models for the data economy. This webinar by Lars Nagel and Sebastian Steinbuss gives an overview of the Industrial Data Space initiative and explains the Reference Architecture and its main components.
Guest Speaker in the 2nd National level webinar titled "Big Data Driven Solutions to Combat Covid 19" on 4th July 2020, Ethiraj College for Women(Auto), Chennai.
The emergence of social, mobile, cloud, big data and analytics are fundamentally changing how we live, work and interact.
Mobile devices are ubiquitous: changing consumer behaviors, supplanting PCs, generating massive amounts of data and putting new demands on the enterprise to not only support these devices but to adjust the way it does business.
Social technologies are changing the way we interact, communicate and share information – equally generating vast amounts of data and impacting business as they try to unlock the full potential social has to offer.
Cloud technologies bring new scale and efficiency to service delivery, enable more agile ways of doing business and drive business-model innovation. For companies, the cloud also brings information and applications to people at the right time and place.
All of these trends are fueling an explosion of data. Not only do enterprises need to store, manage and secure this data, they also need to derive meaningful insight from these vast amounts of data. Data is the basis of significant opportunity and a source of competitive advantage for all organizations. Data is a new economic asset, the next natural resource.
These trends are spawning new workloads, business processes and technology deployments that are putting unprecedented demands on our IT environments.
Digital transformations require a new hybrid cloud, one that is open by design and frees clients to choose and change environments, data and services as needed. This approach allows cloud apps and services to be rapidly composed using the best relevant data and insights available, while maintaining clear visibility, control and security everywhere. How do you decide where to put data on a hybrid cloud and how to use it? What's the best hybrid cloud strategy in terms of data and workload? How should you use a 50/50 rule or an 80/20 rule, together with user interaction, to evaluate which data and workloads to move to the cloud and which to keep on-premise? Hybrid cloud provides an open platform for innovation, including cognitive computing. Organizations are looking to take shadow IT out of the shadows by providing self-service access to information, and a hybrid cloud strategy allows that. Finally, how can hybrid cloud be used to better manage data sovereignty and compliance?
CA is helping the application economy. Data is the fuel of the application economy – what customers, partners, employees demand. Real business needs for big data: This is about GROWTH for companies. Top line. Better customer experiences, new customers, new revenue. Ultimately mission critical.
Consequently, companies are spinning up new projects, with lots in the pipeline: 84% of you have projects to be deployed within the next year.
Everything counts, structured/unstructured: 94% of you plan to use all data available – systems of record (e.g. MF), unstructured, everything. And everything has changed – tools, technology, processes & people.
Conquer complexity by getting the Big Data big picture here: http://cainc.to/BigData
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
A tale of scale & speed: How the US Navy is enabling software delivery from l... (sonjaschweigert1)
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATOs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Observability Concepts EVERY Developer Should Know - DeveloperWeek Europe (Paige Cruz)
Monitoring and observability aren't traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company's observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring & observability to the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... (Ramesh Iyer)
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation, however, takes real work: vision, leadership and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
The new frontiers of AI in RPA with UiPath Autopilot™ (UiPathCommunity)
In this free online event, organized by the Italian UiPath Community, you can explore the new features of Autopilot, the tool that brings Artificial Intelligence into the development and use of Automations.
📕 Together we will look at some examples of using Autopilot in various tools of the UiPath Suite:
Autopilot for Studio Web
Autopilot for Studio
Autopilot for Apps
Clipboard AI
GenAI applied to Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
GraphRAG is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to part 3 of the UiPath Test Automation using UiPath Test Suite series. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Generative AI Deep Dive: Advancing from Proof of Concept to Production (Aggregage)
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don't know what they don't know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by me and Rik Marselis at the DASA Connect conference on 30 May 2024. We discuss what testing is, then what agile testing is, and finally what Testing in DevOps is. We closed with a lovely workshop in which participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
SAP Sapphire 2024 - ASUG301 Building Better Apps with SAP Fiori (Peter Spielvogel)
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Big data presentation, explanations and use cases in industrial sector
1. Big Data: explanations & use cases in the industrial sector
September 2015
Nicolas SARRAMAGNA
https://fr.linkedin.com/pub/nicolas-sarramagna/19/941/587
2. CONTENTS
What's Big Data?
1. Definition, the 3 Vs
2. General use cases
3. Technologies used
4. Market overview
Big Data in the industrial sector
1. What for?
2. Vision
3. Demo PoC / PoV
3. What's Big Data – the 3 Vs
BIG DATA:
New contexts for data -> the 3 Vs
New business ambitions, new technologies
VOLUME: MASSIFICATION AND AUTOMATION OF DATA EXCHANGES
80% of existing data was created in the last 12 months
30 billion pieces of content on Facebook each month, 5 billion pages on Flickr, 2 billion videos viewed on YouTube each day
VARIETY: MULTIPLICATION OF SOURCES AND TYPES
Mails, documents, logs (applications, networks, systems), databases, sensor data, open data, social networks, blogs, forums, articles, browsing history, geolocation data, …
Structured data (databases), semi-structured (HTML pages, tweets, XML), unstructured (mail content, Excel, PowerPoint, video, audio)
VELOCITY: THE NEED TO COLLECT AND PROCESS DATA IN REAL TIME
Risk management (fraud, security of the information system – SIEM)
Real-time route optimization
Personalized advertising
4. What's Big Data – new technologies
BIG DATA:
More efficient components, but also I/O throughput limits -> grid architectures
New technological know-how: storage of large data volumes in a cluster at lower cost, distributed computing, industrialized data mining, on-demand IT architecture with the cloud
ORIGIN OF BIG DATA
Indexing the web and powering the search engines of Google and Yahoo – around 2006
5. What's Big Data - general IT use cases
COMPLETE THE DATA ARCHITECTURE
Vision of a Data Lake / Enterprise Data Hub
Bring applications closer to the data rather than duplicating data for each application
"Deliver" managed data
REDUCE STORAGE AND COMPUTING COSTS
Big Data technologies use commodity hardware and/or the cloud, plus parallel computing
STRONG TECHNICAL CONSTRAINTS
Handle 1,000+ transactions / second
Collect streams of 1,000+ events / second
Compute with 10+ threads per CPU core
Store 10+ TB data sets for processing
Without big data technologies, these require major software and hardware adaptations
6. What's Big Data - general business use cases
END-USER CENTRIC
Product recommendation
Optimization of ads
PROCESS CENTRIC
Detection of unexpected events: fraud, network incidents, predictive maintenance
Path optimization
DIVERSIFICATION OF THE BUSINESS MODEL
Orange: resale of geolocation data
7. What's Big Data – misconceptions
Misconception: only used for unstructured data -> Reality: used with both structured and unstructured data
Misconception: only needed for massive data sets -> Reality: stores and analyses data of any size
Misconception: only available as open source -> Reality: all the major software vendors are on board
Misconception: replaces my current BI platform -> Reality: it is complementary to our existing BI strategy and investments; Big Data will become essential for Business Intelligence
9. What's Big Data – BI opportunities
[Diagram: THE PAST - BI vs. BIG DATA ANALYTICS]
10. What's Big Data - technologies under the hood - standard Hadoop
[Diagram: the Hadoop platform]
11. What's Big Data - technologies under the hood
COLLECT
Spark, Flume, Sqoop
Inject data into HDFS and NoSQL databases: command line, REST API, Java API, streaming ingestion, bulk ingestion, ingestion from an RDBMS (see the sketch below)
STORAGE
Cloud, Hadoop -> the HDFS distributed file system (large and small data sets)
NoSQL ("not only SQL"): distributed, schema-less databases: CAP theorem; key-value, column-, document- and graph-oriented stores
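The REST route into HDFS mentioned under COLLECT is the WebHDFS API; a rough sketch with Python's requests library (host, port and paths are hypothetical; 50070 was the usual NameNode HTTP port on Hadoop 2.x):

import requests

NAMENODE = "http://namenode.example.com:50070"  # hypothetical NameNode address
HDFS_PATH = "/webhdfs/v1/data/raw/events.json"

# Step 1: ask the NameNode where to write; it replies with a redirect to a DataNode
r = requests.put(NAMENODE + HDFS_PATH + "?op=CREATE&overwrite=true",
                 allow_redirects=False)
datanode_url = r.headers["Location"]

# Step 2: send the actual bytes to that DataNode
with open("events.json", "rb") as f:
    requests.put(datanode_url, data=f)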
12. What's Big Data - technologies under the hood
ANALYSIS
Data Science, Map/Reduce, Spark
Analyze and clean the data
Goal: build a model
Machine Learning: one data set to train the model (67% of the data), one data set to evaluate it (33%) - see the sketch below
VISUALIZATION
DataViz: all the visual representation techniques used for data mining
Makes decision indicators easier to build
Provides indicators whatever the size or type of the data
Innovate: offer new perspectives to discover new opportunities
Tableau, QlikView, Power Pivot
Pull data via an ODBC connector, a JDBC connector, a REST API, or the DataViz tool's native connector
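A sketch of the 67% / 33% train/evaluate split in Spark MLlib; the toy DataFrame and its columns are hypothetical:

from pyspark.sql import SparkSession
from pyspark.ml.linalg import Vectors
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("train-eval-split").getOrCreate()

# Hypothetical toy data: a "features" vector column and a numeric "label"
df = spark.createDataFrame(
    [(Vectors.dense([0.0, 1.1]), 0.0), (Vectors.dense([2.0, 1.0]), 1.0)] * 50,
    ["features", "label"])

# 67% of the data to train the model, 33% to evaluate it
train, test = df.randomSplit([0.67, 0.33], seed=42)

model = LogisticRegression().fit(train)
auc = BinaryClassificationEvaluator().evaluate(model.transform(test))
print("AUC on the 33% hold-out set:", auc)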
13. What's Big Data - technologies under the hood
CONCEPTS OF A BIG DATA ARCHITECTURE
Distributed data and processing: the file system, the jobs (Map/Reduce, Spark, …), the databases (NoSQL) - see the sketch below
Data/processing co-location: replication and processing strategy in Hadoop
Horizontal elasticity: master/worker architecture
Shared nothing: when a node breaks down, no data is lost; each node is independent
Design for failure: when a node breaks down, the cluster continues to work
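To make the distributed map/reduce idea concrete, the classic word count in PySpark; the HDFS paths are hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").getOrCreate()
sc = spark.sparkContext

# Each HDFS block is read by a task on the node that stores it (co-location)
counts = (sc.textFile("hdfs:///data/raw/logs/*.log")   # hypothetical input path
            .flatMap(lambda line: line.split())        # map: line -> words
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))          # reduce: sum per word
counts.saveAsTextFile("hdfs:///data/out/wordcount")    # hypothetical output path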
14. What's Big Data - technologies under the hood
HDFS: HADOOP DISTRIBUTED FILE SYSTEM
Name node: the master of the system; maintains and manages the blocks present on the data nodes
Data nodes: slaves deployed on each machine, providing the actual storage; serve read and write requests from clients
15. What's Big Data – technologies under the hood - storage costs
USE COMMODITY HARDWARE
In Big Data, the data center is not a collection of servers but a collection of co-located CPUs, RAM and local disks
[Chart: what 1 million $ gets you]
16. What's Big Data - market overview
COTS DISTRIBUTIONS (the leaders)
Cloudera, #1
Hortonworks, #2
MapR, #3
CLOUD (BASED ON A DISTRIBUTION)
Microsoft – Azure
Amazon – AWS
APPLIANCE VENDORS, HIGHER COSTS
Teradata
Oracle
17. Big Data – positioning of the distributions
CLOUDERA
Vendor business model: €5-6k / year / node
Amazon deploys Cloudera
More mature than the other distributions
HORTONWORKS
Free; business model based on support: €15k / year per slot of 4 nodes or per slot of 50 TB
Azure and Amazon deploy Hortonworks
Less mature than Cloudera on security and administration
MAPR
Vendor business model
Diverges from standard Hadoop
[Chart: price comparison of Cloudera, Hortonworks and MapR - between distributions, a ratio of 1 to 4]
18. CONTENTS
What's Big Data?
1. Definition, the 3 Vs
2. Use cases
3. Technologies under the hood
4. Market overview
Big Data in the industrial sector
1. What for?
2. Vision
3. Demo PoC / PoV
19. Big Data in the industrial sector – What for? - IT use cases
BUILD A DATA LAKE
Reduce costs, move cold data out of the Data Warehouse
Break down the silos of data storage
Store raw data and be able to work (data mining) on all of the data
Open up the data, enrich it with metadata
LOG ANALYSIS AND MONITORING - SIEM
Monitoring of application, network and system logs -> Splunk
PREDICTIVE MAINTENANCE
Monitoring of sensor data, predicting breakdowns across plants
20. Big Data in the industrial sector – What for? - HR use cases
SKILLS VISION AND MANAGEMENT
Cross-reference information from professional networks (Viadeo, LinkedIn) with internal HR information: build a map of the skills in PO
Build and manage groups of skills, enrich internal HR tools
E-REPUTATION
Follow data about your brand, your competitors and your customers in real time
Monitoring of social networks (Twitter, Facebook), press news, financial news, forums, blogs, …
React quickly to the results if necessary
21. Big Data in the industrial sector – What for? - Marketing use cases
360° VISION OF CUSTOMERS, SUPPLIERS, COMPETITORS
Gather as much information as possible about a company: social, legal, financial, competitive position
Evaluate the risk and the opportunity of working together
VISION OF THE ROI OF PLANTS
Real-time indicators from the plants: investment, number of bumpers and tanks produced
Rank the plants, predict gains
22. Big Data in the industrial sector – Vision & Roadmap
2016: BEGIN TO BUILD A DATA LAKE
Make the data directly available for BI and Data Science, and/or transfer it to a Data Warehouse
Collect data and manage it (who has access, metadata)
Infrastructure: hybrid with cloud / on-premise / appliance?
2016: CREATE A NEW CROSS-DIVISION SERVICE AROUND THE DATA
DataViz: create reporting, use your current DataViz tools -> current BI analysts, no change
Data IS: knows its data and can provide metadata to classify it -> current IS staff, no change
Data engineer: uses collection tools, codes jobs, transforms data -> new skills
Data administrator IT: Big Data architecture integration and monitoring -> new skills
Data analysis & data mining: cross-analyses the data, applies models, designs indicators for the DataViz -> new skills
2016+: IMPLEMENT OTHER USE CASES
Start small and accelerate
23. Big Data in the industrial sector – Data Lake
DATA LAKE / ENTERPRISE DATA HUB / DATA RESERVOIR
Low-cost storage of heterogeneous data (structured, semi-structured and unstructured)
Raw data storage, but data enriched and classified by metadata – a data reservoir, not a SWAMP
Used for data exploration, analysis and data mining
Schema on read: the old ETL becomes ELT (see the sketch below)
Can be used directly for BI (ELT mode)
DATA LAKE AND DATA WAREHOUSE
Completes the sources of the Data Warehouse
Can store cold data from the Data Warehouse
Feeds the Data Warehouse
DATA LAKE VISION
Stores aggregated data, can store all the data
Data-Lake-centric vision: bring the applications to the data, do not copy the data to the applications
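"Schema on read" in practice: the lake stores the raw files untouched and the structure is applied only when querying; a PySpark sketch with hypothetical paths and columns:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("schema-on-read").getOrCreate()

# The raw sensor files were loaded into the lake as-is (E and L, no upfront T)
schema = StructType([StructField("sensor_id", StringType()),
                     StructField("value", DoubleType())])
readings = spark.read.schema(schema).json("hdfs:///lake/raw/sensors/")

# The transformation happens at read time, inside the lake (the T of ELT)
readings.groupBy("sensor_id").avg("value").show()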
24. Big Data in the industrial sector – Data Lake - infrastructure
BIG DATA INFRASTRUCTURE
Hybrid with cloud: NO if you want to keep your data in-house (security); requires network effort and cloud skills
Appliance: infrastructure, licenses, deployment -> very high TCO
On-premise: the best compromise between cost, convenience of deployment and usage
CHOICE: ON-PREMISE INFRASTRUCTURE
Go for Cloudera (better administration and security functionality, "real-time" module: Impala) or Hortonworks
Send your IT staff to training: development, administration, data mining
25. Big Data in the industrial sector – Proof of Concept – Proof of Value
SUBJECT: E-REPUTATION
GOALS
Put in place e-Reputation indicators for your enterprise / competitors / suppliers / customers from various sources: news, social networks
Experiment with big data tools
INDICATORS
Who is talking about it? How (positive, negative, neutral)? What is the content? Where in the world? From which source?
Different views of e-Reputation: financial, HR, societal, commercial
DEMO
"Big Data" : terme designant une rupture avec le traitement traditionnel de la donnee
Le Big Data permet de solutionner de nouvelles problematiques ou des anciennes d’une meilleure maniere
Goulet d’étranglement sur les accès écriture/lecture disque, le débit disque ne suit la croissance des espaces de stockage
Big Data ne remplace pas l’architecture existante du BI mais la complete et la réoriente : applications vers data et non data (et ses duplications) vers applications
Descriptive , Diagnostic : regarder le passé et trouver les raisons d’un succes ou d’un echec -> BI
Predictive : dégager un modèle qui donne les futurs tendances -> BIG DATA
Prescriptive : sous différentes contraintes, déterminer le meilleur moyen d’y parvenir -> BIG DATA
Raconter le cycle de vie de la donnée selon un ordre chrono depuis la source de données jusqu’à la restit.
Ods : data opérationnelles. Edw : entrepots de données data agrégée.
Datamart : /s ens d’un entrepot. Hdfs système de fichiers distribués.
Event -> Kafka (syst. Message distribue) -> Storm (traitement en tps reel du msg, opt.) -> Nosql
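A rough sketch of that last pipeline using the kafka-python client; the broker address, topic name and NoSQL sink are hypothetical placeholders, and the processing step stands in for Storm:

import json
from kafka import KafkaConsumer  # kafka-python package

def store_in_nosql(event):
    # Hypothetical sink, standing in for e.g. a Cassandra or HBase write
    print("would store:", event)

consumer = KafkaConsumer("events",                        # hypothetical topic
                         bootstrap_servers="broker:9092", # hypothetical broker
                         value_deserializer=lambda b: json.loads(b))

for msg in consumer:
    event = msg.value
    # Optional real-time processing (the Storm box), then the NoSQL write
    store_in_nosql(event)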