PAN3: Catching Outliers
with Cluster Analysis
Robin Louvet, GE Energy Connections
robin.louvet@ge.com, @rlouvet
2PREDIX TRANSFORM
Agenda
1. Patterns in Time Series
2. Catching Outliers
3. Cluster Analysis
4. Predix Analytics
Patterns in Time Series
Time Series is a predominant raw data type in industry
Signal Processing: Independent Single Samples
Machine Learning: Huge Volume of Historical Samples
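The contrast on this slide can be illustrated with a minimal sketch. It assumes hourly readings from one hypothetical meter and uses synthetic data: the signal processing view analyses one series on its own with a Fourier transform, while the machine learning view stacks many historical days into a matrix that a model can compare row by row. All names and values below are illustrative assumptions.

```python
import numpy as np

# Assumption: hourly readings, 24 samples per day, 7 days for one meter.
hours = np.arange(24 * 7)
# Synthetic single sample: a constant base load plus a daily cycle.
signal = 1.0 + 0.5 * np.sin(2 * np.pi * hours / 24)

# Signal processing view: analyse this one series on its own.
spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
freqs = np.fft.rfftfreq(len(signal), d=1.0)  # cycles per hour
dominant_period_hours = 1.0 / freqs[np.argmax(spectrum)]

# Machine learning view: one row per day, one column per hour,
# so that many historical samples can be compared with each other.
daily_matrix = signal.reshape(7, 24)

print(dominant_period_hours, daily_matrix.shape)
```

With real data the matrix would hold one row per customer-day across a large history, which is the volume the second view relies on.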
Catching Outliers
Spotting abnormal patterns can be critical in
industry:
• Fraudulent Transaction Blocking
• Asset Health Monitoring
• Non-technical Losses On Networks
Cluster Analysis
(source: Wikipedia, Cluster Analysis)
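As an illustration of cluster analysis on time series, here is a minimal sketch of k-means (Lloyd's algorithm) written with NumPy only. It is not the implementation used in the talk: the synthetic "evening peak" and "morning peak" load curves, the deterministic farthest-point start, and the function names `nearest` and `kmeans` are all assumptions made for the example.

```python
import numpy as np

def nearest(curves, centroids):
    """Index of the closest centroid for each curve (Euclidean distance)."""
    dists = np.linalg.norm(curves[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(curves, k, n_iter=50):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    # Deterministic farthest-point start: take the first curve, then keep
    # adding the curve farthest from the centroids chosen so far.
    centroids = curves[:1].copy()
    while len(centroids) < k:
        dists = np.linalg.norm(curves[:, None, :] - centroids[None, :, :], axis=2)
        centroids = np.vstack([centroids, curves[dists.min(axis=1).argmax()]])
    for _ in range(n_iter):
        labels = nearest(curves, centroids)
        # Move each centroid to the mean of the curves assigned to it.
        updated = np.array([
            curves[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids, nearest(curves, centroids)

# Synthetic daily load curves: 20 with an evening peak, 20 with a morning peak.
hours = np.arange(24)
evening = np.exp(-0.5 * ((hours - 19) / 2.0) ** 2)
morning = np.exp(-0.5 * ((hours - 7) / 2.0) ** 2)
rng = np.random.default_rng(42)
curves = np.vstack([
    evening + 0.05 * rng.standard_normal((20, 24)),
    morning + 0.05 * rng.standard_normal((20, 24)),
])

centroids, labels = kmeans(curves, k=2)
print(labels)
```

Each row of `centroids` plays the role of a load archetype. In practice a library implementation such as `sklearn.cluster.KMeans` would be the usual choice; the hand-written version is here only to show the assign-then-update loop.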
Cluster Analysis
Predix Analytics
Predix Analytics
{
  "ntl-detection-py": {
    "tags": {
      "analytic-root": "analytic",
      "driver-root": "driver",
      "driver-main": "driver/AnalyticDriver.py",
      "mapper": "driver",
      "resultprovider": "getOutput"
    }
  },
  "libs": ["boto3"],
  "conda-libs": ["numpy", "scipy", "pandas", "scikit-learn"]
}
Predix Analytics (demo)
General Electric reserves the right to make changes in specifications and features, or discontinue the product or service described at any time, without notice or obligation.
These materials do not constitute a representation, warranty or documentation regarding the product or service featured. Illustrations are provided for informational purposes,
and your configuration may differ. This information does not constitute legal, financial, coding, or regulatory advice in connection with your use of the product or service. Please
consult your professional advisors for any such advice. GE, Predix and the GE Monogram are trademarks of General Electric Company. ©2016 General Electric Company – All
rights reserved.

Predix Transform 2016 - Catching outliers with cluster analysis

Editor's Notes

  • #2 Excited to share my first experience with Predix, especially Predix Analytics. I am personally passionate about machine learning, and maybe some of you are already familiar with this kind of technology. A few weeks ago, GE Digital organized the Predix Discover hackathon, the first hackathon for GE employees (an exciting event). Let me show you how my hackathon team developed a Predix application prototype that is able to "Catch Outliers using Cluster Analysis". I hope this will inspire you to develop amazing analytics on Predix!
  • #3 During this session: first, I will tell you about a publication that inspired me, on identifying patterns in a set of time series. Second, I will talk about fraud detection (catching outliers), which is one of the most mature fields of application for machine learning. Third, I will present one specific family of machine learning algorithms that can help perform fraud detection. Finally, I will describe how my hackathon team implemented an application prototype on Predix.
  • #4 I currently work at GE Energy Connections, and one recurring problem we need to solve for our customers is characterizing and forecasting electrical consumption (load). What you can see here is a graph of weather-normalized hourly electricity consumption from a random sample of 1,000 residential utility customers, for a typical weekday. The chart is taken from a 2014 Opower blog post. Opower is a company that provides cloud-based software to the utility industry and its customers; it was recently acquired by Oracle. They took a database of more than 800,000 load curves (time series) and tried to programmatically identify load curve archetypes (clusters).
  • #5 They found that it was possible to construct load archetypes at scale, that the results of the computation matched the customer categories already covered by the types of customer contracts that Opower manages, and that they could identify new categories of customers (new business!).
  • #6 Signal processing is a well-established discipline, but it applies to a single sample (Fourier transform, …). Machine learning algorithms applied to time series can leverage the huge amount of historical data collected in cloud systems.
  • #12 In conclusion: mention again that this type of analysis can pave the way to new businesses and services.
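
The idea running through these notes, clustering load curves and then catching the ones that fit no cluster, can be sketched as follows. This is an illustration under stated assumptions, not the hackathon team's code: the two archetypes are assumed to come from an earlier clustering step, the data are synthetic, the flat low-consumption curve is a hypothetical suspicious customer, and the threshold of median plus 10 times the median absolute deviation (MAD) is an arbitrary illustrative choice.

```python
import numpy as np

hours = np.arange(24)
# Assumption: two archetypes already produced by a clustering step.
centroids = np.vstack([
    np.exp(-0.5 * ((hours - 19) / 2.0) ** 2),  # evening peak
    np.exp(-0.5 * ((hours - 7) / 2.0) ** 2),   # morning peak
])

# Synthetic "normal" customers: noisy copies of each archetype.
rng = np.random.default_rng(0)
normal = np.vstack([
    centroids[0] + 0.05 * rng.standard_normal((30, 24)),
    centroids[1] + 0.05 * rng.standard_normal((30, 24)),
])
# One hypothetical suspicious curve: nearly flat, very low consumption.
suspicious = np.full((1, 24), 0.05)
curves = np.vstack([normal, suspicious])

# Outlier score: Euclidean distance to the nearest archetype.
dists = np.linalg.norm(curves[:, None, :] - centroids[None, :, :], axis=2)
score = dists.min(axis=1)

# Robust threshold based on the median and the MAD of the scores.
median = np.median(score)
mad = np.median(np.abs(score - median))
outliers = np.flatnonzero(score > median + 10 * mad)

print(outliers)
```

A curve flagged this way is only a candidate for review, for example as a possible non-technical loss; the threshold would need tuning against labelled cases.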