
Big Data Analytics for connected home



Big Data Analytics for connected home: a few use cases, some important messages and a small example. Presentation given at CEA Cadarache - Cité des Nouvelles Energies to the strategic committee of ARCSIS (

Published in: Data & Analytics


  1. Big Data Analytics for connected home
     May 22, 2015
     Héloïse Nonne, Senior Data Scientist - Manager
     Data Science Consulting
  2. Data analytics for disconnected homes
     ARIMA models (AutoRegressive Integrated Moving Average):
     $$y_t = \mu + \epsilon_t + \phi_1 y_{t-1} + \dots + \phi_n y_{t-n} - \theta_1 \epsilon_{t-1} - \dots - \theta_n \epsilon_{t-n}$$
     where $y_t$ = electric load at time $t$ and $\epsilon_t$ = noise at time $t$.
       • Very low frequency resolution for local (household) measurements (less than quarterly)
       • Only aggregated data (sum of individual loads) for higher frequency measurements (region, neighborhood)
       • Data storage issues
       • Computation power
       • Limited knowledge at local level
       • Limited predictive power
       • Complex, sophisticated models exist but are difficult to tune
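The ARIMA equation on this slide can be turned into a one-step-ahead forecast by replacing the unknown current noise term with its expected value, zero. The sketch below is a minimal illustration, not the presenter's implementation: the function name `arma_forecast`, the coefficient values and the sample loads are all assumptions made for the example, and the code assumes the series has already been differenced if needed (the "Integrated" part of ARIMA).

```python
from typing import Sequence


def arma_forecast(
    y: Sequence[float],
    eps: Sequence[float],
    mu: float,
    phi: Sequence[float],
    theta: Sequence[float],
) -> float:
    """One-step-ahead forecast of the load y_t.

    y and eps hold past loads and past noise terms, oldest first.
    The unknown current noise eps_t is replaced by its expected value, 0.
    """
    if len(y) < len(phi) or len(eps) < len(theta):
        raise ValueError("not enough history for the requested model order")
    # Autoregressive part: phi_1 * y_{t-1} + ... + phi_n * y_{t-n}
    ar = sum(p * y[-i] for i, p in enumerate(phi, start=1))
    # Moving-average part, subtracted as in the slide's sign convention
    ma = sum(th * eps[-i] for i, th in enumerate(theta, start=1))
    return mu + ar - ma


# Hypothetical loads, residuals and coefficients, for illustration only
forecast = arma_forecast(
    y=[1.0, 2.0], eps=[0.5, -0.5], mu=0.1, phi=[0.5, 0.2], theta=[0.4]
)
print(round(forecast, 3))  # 1.5
```

In practice the coefficients would be estimated from historical load data rather than chosen by hand, which is where the tuning difficulty mentioned on the slide comes in.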
  3. Reducing electricity costs: a complete data ecosystem
       • Weather: sun, wind, cloud cover, humidity, temperature
       • Energy production
       • Energy price
       • Historical data, actual measurement (real-time), forecast
       • Appliances and use
       • Heating
       • Electricity storage
       • Elevators
       • Doors / lights
       • Network activity -> current occupation
       • Renewable energy
       • Shutter orientation
       • Anthropological data
       • Building structure (thermal mass)
     Electricity demand: ????
       • Regional / national scale
       • Local / neighborhood scale
       • Anthropological data: energy consumption patterns
       • Anthropological data: comfort temperature, children at school, activity of occupants, weekday / holiday, hour of day
  4. Multiple sources of data for multiple models
       • Volume
           – vast amounts of data
           – too large to store and analyse using traditional technology
       • Velocity
           – speed at which new data is generated
           – speed at which data change
       • Variety
           – types of data (number, text, images, video)
           – types of sources (real-time, static)
       • Veracity
           – accuracy of data (frequency, errors)
           – quality of data (sampling errors, typos)
  5. Technology choices depend on the use case
       • Transaction-oriented: write/read, logs, transactions
       • Streaming-oriented: compute on the fly, reactivity, real-time decisions
       • Computationally intensive: CPU/GPU bound, complex problem to solve
       • Storage-oriented: loads of data, analysis, algorithms
     Software and hardware: Hadoop, interactive SQL, Tez, Mahout, Spark, HBase, Cassandra, HPC, Storm, Kafka
     Needs: Bank – Stock market; Web logs in/out; Image recognition; Research on DNA, …; Energy load management; Industrial processes; Aeronautics; Customers' Web journey; Bank – Insurance; Customer management; Records, archiving
  6. Data analytics on energy load
       • Anomaly detection: moving average and thresholds, outlier detection
       • Load prediction: ARIMA, neural networks, recurrent neural networks
       • Statistics for reporting on dashboards
       • Identification of consumption patterns: clustering (K-means, DBSCAN), self-organizing maps
     Combined, these support:
       • Recommendations to reschedule appliances
       • Storage of energy (photovoltaic, geothermal, etc.)
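The "moving average and thresholds" approach to anomaly detection listed on this slide can be sketched in a few lines. This is a minimal sketch under stated assumptions: the function name `moving_average_anomalies`, the window length, the threshold multiplier `k` and the sample load values are all hypothetical choices for illustration, not values from the presentation.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def moving_average_anomalies(
    loads: Sequence[float], window: int = 4, k: float = 3.0, min_std: float = 1e-9
) -> List[int]:
    """Return indices whose load deviates from the trailing moving average
    by more than k standard deviations of the trailing window."""
    anomalies = []
    for t in range(window, len(loads)):
        past = loads[t - window:t]          # trailing window, excludes loads[t]
        avg = mean(past)
        std = max(pstdev(past), min_std)    # guard against a perfectly flat window
        if abs(loads[t] - avg) > k * std:
            anomalies.append(t)
    return anomalies


# Hypothetical household loads (kW): one obvious spike at index 5
loads = [1.0, 1.1, 0.9, 1.0, 1.05, 5.0, 1.0]
print(moving_average_anomalies(loads))  # [5]
```

Note that the spike itself inflates the window statistics for the following points, so a production detector would typically exclude flagged points from the trailing window or use a robust statistic such as the median.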
  7. Many use cases
       • Society
           – Detect energy poverty (underheating)
           – Detect people in distress (illnesses, elderly, heat wave, …)
           – Improved safety (fire detection, security, …)
       • Research / knowledge
           – Building optimization (thermal mass, insulation, configuration, window orientation)
           – Consumption patterns
           – Social behaviors
       • Sustainability
           – Optimize use and storage of energy (light management, appliance use, demand reduction, …)
           – Improve comfort in neighborhood
           – Reduce waste (energy, water, appliances)
       • Business
           – Scoring and customer segmentation
           – Predict the demand in energy
           – Predictive maintenance (elevators, HVAC, photovoltaic, …)
           – Cost reduction
     But remain pragmatic and think about the whole picture -> predictive maintenance on light bulbs ??!
  8. Predictive maintenance
     Elevator maintenance: predict failure before breakage
       • Data: shaft speed, vibrations (X, Y, Z), sound measurements, rail vibrations, motor temperature, oil buffer, …
       • Wear, failure: bearing fault, door (shoe deformation), unbalance, misalignment, resonance, …
     Cost reduction and improvement of reliability through predictive maintenance
  9. A predictive maintenance management system
     Requirements
       • Continuous adaptation of diagnostic
       • Build, increase and maintain knowledge
       • Handle large quantity of data
       • Handle uncertainty in diagnostic
       • Assess fault severity
     Challenges
       • Symptoms are a mix of different causes
       • Information is unclear
       • Limited frequency resolution
       • Missing data
       • Noise
     [Diagram: data center, remote management system, richer knowledge from multiple sources]
  10. Bayesian networks
       • Compact representation of entity states or events as random variables
       • Contains knowledge about how states / events are related
           – Qualitative = dependence relations
           – Quantitative = the strengths of the relations
     Variables
       • BF: bearing fault
       • DF: door deformation
       • WU: weight unbalance
       • RN: resonance
       • MA: misalignment
       • AYX: vibration frequency peak on axis A at YX (e.g. X1X, Z2X)
       • TP: temperature > x °C
       • SP: shaft speed frequency peaks
       • SdB: sound > x dB
     [Figure: Bayesian network over the nodes MA, RN, SP, SdB, BF, DF, WU, X1X, X2X, Y1X, Y2X, Z1X, Z2X, TP]
     Advantages
       • Mix a priori knowledge with experimental (real-time) data
       • Explanatory (human understanding of phenomena vs black-box models)
       • Uncertainty management (assessment of probability of failure)
       • Possibility to learn
           – Parameters
           – Structures (events, entities, causes and effects)
     Bayesian network -> decision rules for action
     Absolute need of prior knowledge from professionals
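To make the "uncertainty management" point concrete, here is a minimal sketch of inference in a tiny two-level network: one fault node (BF) with two symptom nodes (TP and one vibration peak) that are assumed conditionally independent given the fault. The structure is far simpler than the network on the slide, and every probability below is a made-up value for illustration; the function name `posterior_fault` is likewise hypothetical.

```python
from typing import Dict, Tuple


def posterior_fault(
    prior: float,
    cpt: Dict[str, Tuple[float, float]],
    evidence: Dict[str, bool],
) -> float:
    """P(fault | evidence) by direct enumeration over the fault node.

    cpt maps each symptom to (P(symptom | fault), P(symptom | no fault)).
    Symptoms are assumed conditionally independent given the fault state.
    """
    joint_fault = prior          # P(fault, evidence), built up factor by factor
    joint_ok = 1.0 - prior       # P(no fault, evidence)
    for symptom, observed in evidence.items():
        p_fault, p_ok = cpt[symptom]
        joint_fault *= p_fault if observed else 1.0 - p_fault
        joint_ok *= p_ok if observed else 1.0 - p_ok
    return joint_fault / (joint_fault + joint_ok)


# Hypothetical numbers: prior P(BF) and symptom tables for TP and X1X
cpt = {"TP": (0.8, 0.1), "X1X": (0.9, 0.2)}
p = posterior_fault(prior=0.1, cpt=cpt, evidence={"TP": True, "X1X": True})
print(round(p, 3))  # 0.8
```

With both symptoms observed, the probability of a bearing fault rises from the 0.1 prior to 0.8, which is the kind of graded diagnosis that a decision rule can then act on. A real system with several interacting causes would use a dedicated Bayesian network library rather than hand-written enumeration.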
  11. Bayesian networks
     [Figure: Bayesian network over the nodes MA, RN, SP, SdB, BF, DF, WU, X1X, X2X, Y1X, Y2X, Z1X, Z2X, TP]
     A priori conditional probability table for WU (experience = 10):
       • True (failure): 0.60
       • False: 0.40
     Update with new experience:
     $$P_{n+1} = \frac{(P_n \times \text{nb\_experiences}) + 1}{\text{nb\_experiences} + 1}$$
     Updated table for WU (experience = 11):
       • True (failure): 0.636
       • False: 0.364
     One can unlearn (forget past, outdated experiences) by using fading tables: add a fading factor in front of the oldest experiences.
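The update rule on this slide, together with the fading idea, can be sketched as follows. The plain update reproduces the slide's numbers: (0.60 × 10 + 1) / 11 ≈ 0.636. The fading variant shown here is one common way to apply a fading factor (scaling the experience count before each update) and is an assumption, since the slide does not spell out the exact formula; the function name `update_probability` and the `fading` parameter are hypothetical.

```python
from typing import Tuple


def update_probability(
    p: float, n_experiences: float, failed: bool, fading: float = 1.0
) -> Tuple[float, float]:
    """Update P(failure) after one new observation.

    fading = 1.0 gives the plain update from the slide; a value in (0, 1)
    down-weights past experience so that outdated cases are gradually forgotten
    (assumed scheme: the experience count is multiplied by the fading factor).
    Returns the new probability and the new experience count.
    """
    faded_n = fading * n_experiences
    outcome = 1.0 if failed else 0.0
    new_p = (p * faded_n + outcome) / (faded_n + 1.0)
    return new_p, faded_n + 1.0


# Slide example: P(WU = True) = 0.60 after 10 experiences, then one more failure
p, n = update_probability(0.60, 10, failed=True)
print(round(p, 3), round(1 - p, 3), n)  # 0.636 0.364 11.0

# Same observation with a hypothetical fading factor of 0.5
p_faded, n_faded = update_probability(0.60, 10, failed=True, fading=0.5)
print(round(p_faded, 3), n_faded)  # 0.667 6.0
```

With fading, the new observation moves the estimate further (0.667 instead of 0.636) because the ten earlier experiences count for only five.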
  12. The big (data) picture
       • Many sources of data: weather, energy production, economic, social and behavioral data, appliance characteristics, current building occupation, activity, etc.
       • Different scales: worldwide, regional, local, individual
       • Different times: historical data, year, month, day, hour, real-time
       • The system is not going to be perfect at once -> design it for constant improvement
       • A single model is useless: each model has its use, and models feed each other with their knowledge and predictions
       • Choose the right model and the right technology according to use case, time cost, energy cost, pragmatism, realism
       • Build models with the professionals who know the problem -> build on existing knowledge
     An efficient system implies close collaboration between business, researchers, manufacturers, maintainers, owners, users, developers, data scientists, data managers, optimization specialists, and end-users
  13. Quantmetry – Data science specialist
     The data pyramid, from base to top:
       • Collect: more and more data available
       • Store: store everything!
       • Analyze: analyze to better understand strong and weak signals
       • Predict: forecast what may happen based on past trends
       • Act: automate decision and action
     Quantmetry supports its clients across every layer of the data pyramid and thereby contributes to their digital transformation through quantitative methods, for concrete results on their business performance.
       • A "pure player" consulting firm in Big Data and data science, whose commercial development started in 2013
       • Advanced statistical methods, machine learning and Big Data technologies
       • 2014: 1.5 million euros in revenue, with strong growth ambitions in France and abroad
       • About twenty data scientists / consultants
  14. Quantmetry's activities
     Two areas (business optimization through data; structuring a Data Lab) across three types of engagement (consulting, support, delivery):
       • Consulting
           – Detection and prioritization of data-driven opportunities
           – Design of IT architecture diagrams
           – Lessons learned and best practices
           – Organization and governance scheme
           – Choice of a technology architecture
       • Support (change management, project management)
           – Scoping, industrialization project
           – Methodology (statistical models and algorithms)
           – Big Data technologies
           – Skills development
           – Recruitment
           – Governance
       • Delivery (pilot projects, industrialization)
           – Data science proofs of concept
           – Technology pilots
           – Industrialization of pilots (API, …)
           – Creation of a Big Data architecture and setup of data flows
  15. Technology watch and experimentation
       • Research themes:
           – Online learning
           – Deep learning and neural networks
           – Industrialization
           – Semantic analysis
           – Energy (time series analysis)
           – Smart cities
           – Improving user experience
       • Active in the Big Data ecosystem: participation in seminars, international conferences, hackathons, Kaggle competitions, partnerships with software vendors… Collaborations with research laboratories and schools.
       • Creation and development of specific products built around Big Data technologies
       • Research and development in data science
  16. Some data science references
       • Lift improvement for acquiring insured customers as banking customers
         [Figure: baseline (logistic regression), lift = 2; gradient boosting with unstructured data and feature engineering, lift = 6]
       • Churn detection for a telecom operator
         [Figure: bar chart of features: cancellation page URL, age, group, number of pages viewed…, session duration]
       • Setting up a Data Lab for an insurer
       • Behavior analysis for a mutual insurance company
       • Optimization of a pricing tool for a B2B distribution company
       • Predictive models of energy consumption
  17. Excellence, Altruism, Results … and Big Data
     Visit our blog