Data Science with R & Vanilla Air
Business Intelligence & Analytics - Data Science Platform
Patrick Beaucamp – patrick.beaucamp@bpm-conseil.com
General Introduction
II-SDV, Nice 19th April 2016
Presentation Agenda
Landscape for Statistics & Analytics
- Open Source: R, Knime, Weka, RapidMiner
- Commercial: SAS, SPSS, Watson
- A key decision from the FDA for R: December 2014
Demo Platform : Vanilla & Vanilla Air
Business Intelligence versus Data Science
R Platform Introduction
… need for visualization and server readiness!
Introduction
If you can’t find it, it doesn’t exist!
Document Data inside document
Business Intelligence - Subject
Project Initialisation
• Requests for reports and dashboards to visualize data stored in production databases
• Requests to access data from various databases and build global activity reports and KPI projects
• Projects to align numbers with processes, set global rules for KPI calculation, deliver legacy reports, etc.
Focus on
• Data quality & data consistency, using ETL & data quality tools
• Define rules to aggregate data, standardize information and clean data, using Master Data Management tools
• Load data into the data warehouse (ODS, DWH and DTM parts), using ETL tools
• Define reports, dashboards, KPIs and cubes with end users, and adjust the datamart structure to meet their expectations
• Create reports, dashboards, cubes and various metadata to provide access to validated data
• Define workflows to process, for example, data loading + KPI calculation + report creation
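The workflow example in the last bullet (data loading + KPI calculation + report creation) can be sketched as three chained steps. This is only an illustration in plain Python, not how Vanilla implements workflows: the sample rows, the field names `region` and `amount`, and every function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw rows, standing in for data extracted from a production database.
RAW_SALES = [
    {"region": "North", "amount": "120.50"},
    {"region": "South", "amount": "80.00"},
    {"region": "North", "amount": "99.50"},
    {"region": "South", "amount": None},  # incomplete row, dropped at load time
]


def load_data(rows):
    """Data loading: keep complete rows only and convert amounts to numbers."""
    return [
        {"region": r["region"], "amount": float(r["amount"])}
        for r in rows
        if r["amount"] is not None
    ]


def calculate_kpi(rows):
    """KPI calculation: total sales amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def create_report(kpi):
    """Report creation: one plain-text line per region, sorted by region name."""
    return "\n".join(f"{region}: {total:.2f}" for region, total in sorted(kpi.items()))


def run_workflow(rows):
    """Chain the three steps in the order given in the bullet."""
    return create_report(calculate_kpi(load_data(rows)))


if __name__ == "__main__":
    print(run_workflow(RAW_SALES))
```

Keeping each step as its own function mirrors the idea of a workflow: a step can be rerun or replaced without touching the others.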
Business Intelligence
• Reporting
• OLAP (cubes)
• Dashboards
• KPI (performance indicators)
• Maps (OSM support)
ETL & WorkFlow
• Master Data Management
• Data Quality
• Data Profiling
Business Intelligence - Platform
Internal Data → ETL → DWH → BI → DataViz
Business Intelligence Visualisation
Data Science - Subject
Project Initialisation
• Requests to understand why the data shows such results: the business question
• Requests to cross existing information with additional information, to add value to existing data
• Projects to build models to understand data, such as clustering, association, decision trees
• Projects to build forecasting & predictive models
Focus on
• Platform & components, such as a predictive language (R is recommended)
• External data analysis & integration: which external information influences my data?
• Analysing data and building models to explain correlations between data and the impact of input data modifications
• Building statistical, analytical & predictive models
• Providing tools for advanced users to access, visualize and manipulate data
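As a minimal sketch of "explaining correlation" between an external factor and internal data, the snippet below computes a Pearson correlation coefficient by hand. It is written in Python with no dependencies for illustration; in R, the built-in `cor()` function gives the same measure. The two series and the function name `pearson` are made up for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length, with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustration data: daily temperature (external) and units sold (internal).
temperature = [12.0, 15.0, 18.0, 21.0, 24.0]
units_sold = [30.0, 36.0, 41.0, 49.0, 54.0]

r = pearson(temperature, units_sold)
print(f"correlation between temperature and units sold: {r:.3f}")
```

A value close to 1 or -1 suggests a strong linear relationship worth modelling; it does not by itself show that the external factor causes the change.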
Data Science - Platform
Data Acquisition (Internal – External) → Data Lake (Hadoop) → Predictive Engine → Data Viz
Data Science - Visualisation
Data Mining Open Source Landscape
RapidMiner
Weka
Knime
R :
- RStudio & Shiny
- Revolution Analytics (Microsoft R Server)
- Vanilla Air
- ORE (Oracle R Enterprise)
Interfaces Examples (1/2)
Interfaces Examples (2/2)
Commercial Corner
Visualization: Qlik, Tableau
Statistics: Matlab, Statistica, Stata, etc.
Data Mining: SAS, SPSS, IBM Watson
Key Decision from FDA - 2000
Key Decision from FDA - 2014
Document : R-FDA.PDF
R Introduction
What is R?
R is a programming language and software environment for statistical computing and graphics.
www.R-project.org
R Common use cases
R in DataMining Landscape
R Challenges
Need for Development Studio
Web Based
Need for Visualization (1/4)
Shiny (RStudio)
Need for Visualization (2/4)
Jupyter Notebook (Python, Microsoft Azure)
Need for Visualization (3/4)
Apache Zeppelin (incubation project)
Need for Visualization (4/4)
Vanilla Air
R – Need for Enterprise Ready
Vanilla Air
Shiny Server
Microsoft R Server
Oracle R Enterprise
Very recently (end 2015) : R Foundation
Certified Packages Server Side Architecture
Vanilla Air Introduction
With R
Vanilla Air – R Package Management
Vanilla Air – Dataset Management
Vanilla Air – Workspace
Vanilla Air – WorkFlow
Vanilla Air – Parameters
Vanilla Air – Publish to Vanilla
Vanilla Air – Vanilla Visualization
II-SDV Event
Join Us :
WorkShop Wednesday
Thanks for your attention
Try Vanilla Air:
Download and Share your Experience
Questions & Answers
Appendix
A Series of Screenshots
Vanilla & Vanilla Air
Vanilla BI - MetaData Explorer
Vanilla Smart Data - General Introduction
Vanilla BI - Dashboard
Vanilla BI - OLAP
Vanilla BI - KPI
Vanilla Visualisation
Vanilla Air
Vanilla Air
Dataset
Vanilla Air
WorkSpace
Vanilla Air
WorkFlow
Vanilla Air
Markdown Support with Vanilla
Vanilla Smart Data Business Case
• What influences my sales?
• How can weather influence product sales?
• If I have a weather forecast, can I forecast my sales?
Retail Industry
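The last retail question (forecasting sales from a weather forecast) can be illustrated with a one-variable least-squares fit. This is a hedged sketch in plain Python with made-up numbers, not the model used in the business case; in R the equivalent starting point would be `lm()`. The function name `fit_line` and all data values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length, with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up history: past daily temperature (°C) and units sold on those days.
past_temperature = [10.0, 15.0, 20.0, 25.0, 30.0]
past_sales = [20.0, 30.0, 40.0, 50.0, 60.0]

slope, intercept = fit_line(past_temperature, past_sales)

# Hypothetical weather forecast for tomorrow, used as input to the fitted line.
forecast_temperature = 22.0
forecast_sales = slope * forecast_temperature + intercept
print(f"forecast sales at {forecast_temperature} °C: {forecast_sales:.1f} units")
```

A real retail model would need more variables (seasonality, promotions, holidays) and a check of the fit quality before the forecast is trusted.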
Vanilla Smart Data Business Case
• How to find the best price for my product using more data sources?
• How can social media comments on a product influence its price?
Purchase Platform
Vanilla Smart Data Business Case
• Why are some products damaged during transport: which products? Which transporters?
• Which external events, like weather or transport duration, can explain the situation?
• What is the best transporter for specific products based on the weather forecast?
Pharmaceutical Industry
Vanilla Smart Data Business Case
• How do temperature changes and weather impact pathologies?
• How do holidays & weekends impact pathologies?
• How are patients split into different groups, based on pathologies, age, gender …
Hospital Analysis
Vanilla Smart Data Business Case
• How does social media impact sales?
• How to get alerts when social media starts discussing my products
• How to set alerts on various « products / social media activity » (including competition) and evaluate the impact on my sales
Beauty Industry
Documentation and tutorials are available on our web sites
www.bpm-conseil.com
www.vanillasmartdata.com
Thanks for your attention

II-SDV 2016 Patrick Beaucamp - Data Science with R and Vanilla Air
