Meetup Data-science OVH

Data Organisation & Big Data Architecture

Published in: Data & Analytics

  1. Data Organization & Big Data Architecture
  2. Agenda: Data Organization, Big Data Architecture, Recruitment
  3. Data Organization
  4. Line Of Business: HR, Finance, Sales, Customers, Competitors, Markets, Products, Supply, Traffic Acquisition, Communication, Security, Prospects. * If you read this text, work in the data field and are interested in joining us, please go to: https://www.ovh.com/fr/careers/
  5. Use Line Of Business. LOB 1 (Customer): BI Team, Data Science Team. LOB 2 (Support): BI Team, Data Science Team. LOB 3 …: BI Team, Data Science Team.
  6. Data Office: Data Centralization, Datalake, Cleansing, Data Integration, Data Office, CRM, BI Team, Data Science Team • Extracts Data Analyst • Events • Actions Customer Animation • Product Analysis • Global Analysis BUS • Country Analysis SUBS • PAC • Ad hoc Analysis Digital • Onsite • Partner BIZDEV • Campaigns • Text mining Traffic Acquisition • Segmentation • Normalisation Targeting Channel. In case you missed it on the previous slide: if you work in the data field, we are interested in your profile!
  7. Data Maturity. Level 1 (POC): data are manually created or extracted once; data are modified by one data scientist; data are assessed by a data analyst and manually sent to a business analyst post-control.
  8. Data Maturity. Level 1 (POC): data are manually created or extracted once; data are modified by one data scientist; data are assessed by a data analyst and manually sent to a business analyst post-control. Level 2 (Manual): data are manually created on a regular basis; data are manually added to the enterprise model with an automated process; data can be used by all data scientists, data analysts or business analysts.
  9. Data Maturity. Level 1 (POC): data are manually created or extracted once; data are modified by one data scientist; data are assessed by a data analyst and manually sent to a business analyst post-control. Level 2 (Manual): data are manually created on a regular basis; data are manually added to the enterprise model with an automated process; data can be used by all data scientists, data analysts or business analysts. Level 3 (Automatic): data are created through a controlled business process; data are automatically added to the enterprise model; data can be used by all data scientists, data analysts or business analysts.
  10. Data Maturity Matrix. Columns: Customers, Competitors, Products. Levels, from Advanced to Basic: 5: Potential, Strategy; 4: Attrition, New Product; 3: Churn, Rank; 2: Adds, Event; 1: NIC, Pricing; …
  11. Exploration: Code First. Industrialisation: Model First. Data Scientists, Data Analysts, Business Analysts. Analyse, Test, Validation. Data Management Team (Architect + Data Integrator). Business Intelligence Team. Data Lake Team.
  12. Data Lake Team. Tool / Infrastructure. Exploration: Code First. Industrialisation: Model First. Data Scientists, Data Analysts, Business Analysts. Technical model. Analyse, Test, Validation. Data Management Team (Architect + Data Integrator). Business Intelligence Team.
  13. Tool / Infrastructure. Exploration: Code First. Industrialisation: Model First. Data preparation: 80%. Data Scientists, Data Analysts, Business Analysts. Technical model. Machine Learning: 20%. Analyse, Test, Validation. Data Management Team (Architect + Data Integrator). Business Intelligence Team. Data Lake Team.
  14. Tool / Infrastructure. Exploration: Code First. Industrialisation: Model First. Data preparation: 80%. Data Scientists, Data Analysts, Business Analysts. Technical model. Machine Learning: 20%. Analyse, Test, Validation. Data Analysis / Creation. Data Analysis. Data Management Team (Architect + Data Integrator). DataViz Model. Business Intelligence Team. POC. Expose POC. POC Mode. Data Lake Team.
  15. Tool / Infrastructure. Exploration: Code First. Industrialisation: Model First. Data preparation: 80%. Data Scientists, Data Analysts, Business Analysts. Technical model. Machine Learning: 20%. Analyse, Test, Validation. Data Analysis / Creation. Data Analysis. Data Committee. Data Management Team (Architect + Data Integrator). DataViz Model. Enterprise Model Building. Datamart and report building. Business Intelligence Team. DTM. Data Prepare: industrialise. POC. Datastore 360. Level 2 & 3 mode. Expose POC. Enterprise model. POC Mode. Data Lake Team.
  16. Tool / Infrastructure. Exploration: Code First. Industrialisation: Model First. Data preparation: 80%. Data Scientists, Data Analysts, Business Analysts. Technical model. Machine Learning: 20%. Analyse, Test, Validation. Data Analysis / Creation. Data Analysis. Data Committee. Data Management Team (Architect + Data Integrator). DataViz Model. Enterprise Model Building. Datamart and report building. Business Intelligence Team. DTM. Data Prepare: industrialise. Build Datamart and Dashboard. POC. Datastore 360. Expose POC. Enterprise model. POC Mode. Level 2 & 3 mode. Data Lake Team.
  17. Data Committee. Objectives: define data that needs to be added to enterprise data; define priority and owners by subject; industrialise new data production, from Excel to full business process; validate the enterprise model (common vocabulary, business and/or functional model); be informed of evolution. Participants: Data Scientist, Data Analyst, Business Analyst, Data Management Team. Periodicity: every month.
  18. Datastore 360 (EDS 360). Gets all data from front office applications, back office applications and external data. Stores data in a business-oriented model. Responsible for historizing data when this makes sense for the business (What data do we want to keep? What will I need in 20 years?). Exposes data to all applications that require it: Business Intelligence (reporting or datamart), front office applications. History / Current: Client, Product, Activity, … Consumers: Data Scientist, Data Analyst, Business Analyst, DataViz User, APPs (CRM, Support). Access: api, api, Direct read.
  19. Big Data Architecture
  20. Context: ~50 SQL replicas, ~700 DB, ~300K tables, ~100 TB, ~500K events/s
  21. Datalake hardware view: private network, OVH Dedicated server, OVH Public Cloud. High scalability, security, performance, reliability.
  22. Lille Grand Palais – 28 February 2017
  23. Datalake software view: Pig, Flink, Spark, HDFS, HBase, Phoenix, Kafka (Queue), Couchbase
  24. Jobs
      Job | Skills | Output
      Data Analyst | Excel; Dataviz: Tableau, PowerBI | Data strategy
      Data Scientist | Scala, Java, R, Python, Cube | Datasets, Flows, Patterns, Models
      Data Integrator | Flink, HBase, Pig, Spark | Data preparation
      Data Dev Ops | Kafka, HBase, Go, Apache Beam, … | Datalake
  25. Thank you! Join us: ovh.com/fr/careers
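
The three data maturity levels described on slides 7 to 9 can be read as a simple decision rule. The sketch below is only an illustration of that rule, not something shown in the deck: the names `Maturity`, `Dataset`, `maturity_level` and `usable_by_all_roles`, and the two boolean fields, are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Maturity(IntEnum):
    """Maturity levels as named on the slides."""
    POC = 1        # created or extracted once, handled by one data scientist
    MANUAL = 2     # created manually, but on a regular basis
    AUTOMATIC = 3  # created through a controlled business process


@dataclass(frozen=True)
class Dataset:
    # Hypothetical descriptor; the deck does not define such a structure.
    name: str
    produced_regularly: bool           # False: created or extracted once
    controlled_business_process: bool  # True: produced by a controlled process


def maturity_level(dataset: Dataset) -> Maturity:
    """Map a dataset description to one of the three levels."""
    if dataset.controlled_business_process:
        return Maturity.AUTOMATIC
    if dataset.produced_regularly:
        return Maturity.MANUAL
    return Maturity.POC


def usable_by_all_roles(level: Maturity) -> bool:
    """Levels 2 and 3 are open to all data scientists and analysts."""
    return level >= Maturity.MANUAL


if __name__ == "__main__":
    one_off = Dataset("one_off_extract", False, False)
    level = maturity_level(one_off)
    print(one_off.name, level.name, usable_by_all_roles(level))
```

At level 1 the data stay with the people running the proof of concept, which is why `usable_by_all_roles` only returns true from level 2 upward.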

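Slide 18 describes Datastore 360 as keeping a current view of each business entity and historizing data only when that makes sense for the business. A minimal in-memory sketch of that idea follows. It assumes nothing about the real system: the class `Datastore360`, its methods and the `historized_entities` parameter are hypothetical names used for illustration.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

Key = Tuple[str, str]  # (entity name, record key), e.g. ("client", "c1")


class Datastore360:
    """Toy store with a 'current' view and an optional 'history' view."""

    def __init__(self, historized_entities: Iterable[str]) -> None:
        # Entities for which the business decided that history is worth keeping.
        self._historized = set(historized_entities)
        self._current: Dict[Key, Dict[str, Any]] = {}
        self._history: Dict[Key, List[Dict[str, Any]]] = {}

    def upsert(self, entity: str, key: str, record: Dict[str, Any]) -> None:
        """Store the latest record; archive the previous one if historized."""
        slot = (entity, key)
        previous = self._current.get(slot)
        if previous is not None and entity in self._historized:
            self._history.setdefault(slot, []).append(previous)
        self._current[slot] = dict(record)

    def current(self, entity: str, key: str) -> Optional[Dict[str, Any]]:
        """Return the latest version of a record, or None if unknown."""
        return self._current.get((entity, key))

    def history(self, entity: str, key: str) -> List[Dict[str, Any]]:
        """Return earlier versions, oldest first (empty if not historized)."""
        return list(self._history.get((entity, key), []))


if __name__ == "__main__":
    store = Datastore360(historized_entities={"client"})
    store.upsert("client", "c1", {"country": "FR"})
    store.upsert("client", "c1", {"country": "DE"})
    print(store.current("client", "c1"), store.history("client", "c1"))
```

Making historization opt-in per entity mirrors the question on the slide about which data are worth keeping; a real store would also need timestamps and persistent storage.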