Using Advanced Analytics to Classify
Electrical-Network-Related Incidents
Jessie Nghiem – Energy Safe Victoria
Let’s start with who we are…
• Energy Safe Victoria (ESV)
• Data and Analytics team
• About myself
• Current: part-time Senior Data Scientist and full-time new mom
• Past
• Lead Data Scientist at MLC Life Insurance
• Postdoc researcher at RMIT
• Data Scientist at InfoCentric
• Teaching Associate at Monash
• PhD at Monash
What have we used AI/ML for?
• Audit electrical products being sold online
• Find the correlations between weather conditions and the rate of
network-related ground fire incidents
• Classify electrical-network-related incidents – case study for today
• And more…
The classic free text problem…
• Distribution businesses are required to report critical network incidents to ESV through OSIRIS,
our reporting platform
Let’s take a closer look at the data
• 5 years of data
• 3761 rows and 33 columns
• 3 important free text columns
• Description
• Causes
• Actions taken
• A labelled column of 15 incident categories
• We use the IDEAR utility for data exploration
First we build models using only text columns…
First iteration
• Pipeline: Incident data → Data extraction and cleansing → Preprocessing → Feature Engineering → Model Building → Model Evaluation
• Preprocessing: tokenisation, removing stop words, stemming and lemmatisation
• Feature Engineering: TF-IDF (word/character/n-gram), word embedding, count vectors
• Model Building: Naïve Bayes, Linear Classifier, SVM, ensemble models (Random Forest, Extreme Gradient Boosting)
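The preprocessing and word-level TF-IDF steps can be sketched in plain Python as below. This is only a minimal illustration, not the code used at ESV: the stop-word list, the `crude_stem` suffix stripper and the sample descriptions are all invented for the example, and the weighting is the plain unsmoothed form (term frequency times log of inverse document frequency). A real project would more likely rely on an NLP library and a vectoriser such as scikit-learn's `TfidfVectorizer`, which uses a smoothed variant by default.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real project would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "on", "to", "and", "was", "by", "in"}


def crude_stem(token):
    """Very rough suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenise on letters, drop stop words, then stem each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dict per document (word-level TF-IDF)."""
    tokenised = [preprocess(d) for d in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        if not tokens:
            vectors.append({})
            continue
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Invented incident descriptions, for illustration only.
docs = [
    "Vehicle collided with a pole",
    "Tree branch fell on the conductors",
    "Vehicle hit the pole and conductors fell",
]
vectors = tf_idf(docs)
for doc, vec in zip(docs, vectors):
    print(doc, "->", sorted(vec, key=vec.get, reverse=True)[:2])
```

Terms that occur in only one description (such as the stem of "collided") receive a higher weight than terms shared across descriptions (such as "vehicle"), which is what makes TF-IDF useful for telling categories apart.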
And the winner is…
Algorithm                                      Accuracy
Naive Bayes (Count Vector)                     0.68
Naive Bayes (Word Level TF-IDF)                0.63
Naive Bayes (N-gram level TF-IDF)              0.63
Naive Bayes (Character level TF-IDF)           0.55
Linear Classifier (Count Vector)               0.76
Linear Classifier (Word Level TF-IDF)          0.77
Linear Classifier (N-gram level TF-IDF)        0.68
Linear Classifier (Character level TF-IDF)     0.75
Xgb Classifier (Count Vector)                  0.76
Xgb Classifier (Word Level TF-IDF)             0.76
Xgb Classifier (Character level TF-IDF)        0.76
…                                              …
The Linear Classifier (Logistic Regression) on the Word Level TF-IDF vector
is chosen as the model with the highest accuracy
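For reference, accuracy here is the fraction of incidents whose predicted category matches the labelled one. The sketch below shows that calculation on a few invented labels and then picks the best entry from the accuracies reported on the slide. The `accuracy` helper and the `reported` dictionary are illustrative names, not part of the original work.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented labels, for illustration only.
y_true = ["Vehicle", "Tree", "Connection", "Tree"]
y_pred = ["Vehicle", "Tree", "Tree", "Tree"]
print(accuracy(y_true, y_pred))  # 3 of 4 predictions are correct

# Accuracies as reported on the slide, keyed by (algorithm, features).
reported = {
    ("Naive Bayes", "Count Vector"): 0.68,
    ("Naive Bayes", "Word Level TF-IDF"): 0.63,
    ("Naive Bayes", "N-gram level TF-IDF"): 0.63,
    ("Naive Bayes", "Character level TF-IDF"): 0.55,
    ("Linear Classifier", "Count Vector"): 0.76,
    ("Linear Classifier", "Word Level TF-IDF"): 0.77,
    ("Linear Classifier", "N-gram level TF-IDF"): 0.68,
    ("Linear Classifier", "Character level TF-IDF"): 0.75,
    ("Xgb Classifier", "Count Vector"): 0.76,
    ("Xgb Classifier", "Word Level TF-IDF"): 0.76,
    ("Xgb Classifier", "Character level TF-IDF"): 0.76,
}

# Select the (algorithm, features) pair with the highest reported accuracy.
best = max(reported, key=reported.get)
print(best, reported[best])
```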
This time we throw non-free-text columns in…
Second iteration
• Pipeline: Incident data → Data extraction and cleansing → Preprocessing → Feature Engineering → Model Building → Model Evaluation
• Feature Engineering: TF-IDF (word level)
• Model Building: Linear Classifier (text columns only), Linear Classifier (all columns)
• Accuracy reaches 80%
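One way to picture "throwing the other columns in" is to merge the TF-IDF weights of a record with one-hot indicators for its categorical columns and the raw values of its numeric columns, giving a single feature row per incident. The sketch below assumes hypothetical column names (`network_type`, `voltage_kv`) and invented values, and the `combine_features` helper is illustrative only. In scikit-learn the same idea is usually expressed with `ColumnTransformer`, and a dict of this shape is what `DictVectorizer` accepts.

```python
def combine_features(text_weights, record, categorical_cols, numeric_cols):
    """Merge text weights with one-hot categorical and raw numeric features."""
    # Prefix text terms so they cannot clash with column names.
    features = {f"text:{term}": weight for term, weight in text_weights.items()}
    for col in categorical_cols:
        # One-hot encoding: one indicator feature per (column, value) pair.
        features[f"{col}={record[col]}"] = 1.0
    for col in numeric_cols:
        features[col] = float(record[col])
    return features


# Hypothetical record; the column names and values are invented.
record = {"network_type": "overhead", "voltage_kv": 22}
# Hypothetical word-level TF-IDF weights for the record's free text.
text_weights = {"pole": 0.10, "vehicle": 0.27}

row = combine_features(text_weights, record, ["network_type"], ["voltage_kv"])
print(row)
```

Numeric columns on very different scales from the TF-IDF weights would normally be standardised before being fed to a linear classifier. That step is left out here to keep the sketch short.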
Now we have a good model, so what’s next?
• The output of the model has been fed into the data process for
internal reporting and other advanced analytics
• Work with Deakin University to improve the accuracy of the model
Thank You
LinkedIn: https://www.linkedin.com/in/jessie-nghiem-844a9925/
Email: Jessie.nghiem@energysafe.vic.gov.au

Using AA for incident classification


Editor's Notes

  1. Hello everyone, thanks for giving me this opportunity to share our learning experience from applying advanced analytics in classifying electrical network-related incidents.
  2. Energy Safe Victoria is a technical and safety regulator responsible for the safe generation, supply and use of electricity, gas and pipelines. For electricity, for example, that covers everything from generation, transmission and distribution to installation and equipment. We also license and register electricians, and issue and audit Certificates of Electrical Safety, which are your guarantee that electrical work has been performed by a qualified electrician. We have a small and relatively newly established DA team of 5 people. We extend our capability and resources by collaborating with CSIRO, universities (such as Monash, Deakin and UTS) and external contractors. I am currently a part-time Senior Data Scientist, leading the Advanced Analytics programs and managing R&D funding at ESV, and also a full-time mom of an always fully charged toddler. The pandemic doesn’t help my childcare plan, but I am grateful that ESV has always been very supportive in giving me flexible working arrangements so I can complete the work and enjoy more quality time with my little one. Prior to this role, I worked for MLC Life Insurance, RMIT and InfoCentric. Before being a mom, I enjoyed spending my spare time teaching students at Monash, where I did my PhD in Computer Science.
  3. Our AI/ML learning journey has been quite challenging because the data at ESV was not ready for AA. However, instead of sitting and waiting for the perfect setting to come, we rolled our sleeves up and worked with what we have. A few projects have been started and were well received by our internal customers. We built a proof-of-concept AI solution to audit electrical products being sold on eBay/Amazon using Azure Cognitive Services. We have secured funding from the Senior Committee of Officials (SCO) – Council of Australian Governments to develop this PoC into a full-scale product. Another project is to find which weather factors have a strong influence on the number of ground fire incidents on a given day. This indicates whether the occurrence of an incident can be explained by the weather conditions or by poor performance of the distribution businesses, which may result in further investigation. We are also working with UTS to develop a predictive solution to anticipate the rate of incidents based on weather conditions and other geospatial factors. For today’s case study, I am going to share our experience in developing a text classifier to put network-related incidents into the right bucket, a data enrichment task for the project just mentioned.
  4. The classic free text problem… I believe you can find it in any organization. As background, distribution businesses are legally required to report critical network incidents to ESV through OSIRIS, our reporting platform. Each report goes through a review and approval process and, if needed, an investigation. This dataset feeds into our operational reports, annual network performance report and other analysis. These incidents can be classified into 15 categories, based on the causes of the incidents. However, data input is mostly free text and can be filled in by non-technical people. The causal factors columns are multi-valued, which makes it hard to identify the root cause of the problem. At the moment we have a specialist who goes through the incidents one by one to classify them into the right bucket. This presents an opportunity to use machine learning to automate and optimize the process.
  5. Let’s take a closer look at the data. We have 5 years of data since the platform was launched. It is not a really big dataset as the number of incidents is small (fewer than 4K rows with 33 columns), and we try to make it smaller for the safety of our community. There are 3 free text columns that our staff have used to classify the incidents into 15 categories based on their empirical knowledge. Incidents related to vehicles, connections and trees are the most common. To explore the data, we use the IDEAR utility, a tool written in R/R Markdown by Microsoft.
  6. First we built models that use only text columns. As in any ML project, 70-80% of the project time went into data processing and cleansing. Pre-processing steps include tokenisation, removing stop words, lemmatisation and stemming. This is followed by feature extraction/engineering to prepare for the different types of models we want to try with this dataset. In particular, the generated features are TF-IDF vectors (at word, character and n-gram level), word embeddings and count vectors. For each set of features, we trained different models including Naïve Bayes, Linear Classifier (Logistic Regression), support vector machine, and ensemble models such as Random Forest and Extreme Gradient Boosting, to see which one performs best in terms of accuracy.
  7. And the winner is the Linear Classifier (Logistic Regression), with an accuracy of 77%. That means we found a good model, but it can be better…
  8. This time we throw more columns in, in combination with the features extracted in the previous build. These columns do not have to be free text but can contain categorical or numerical values such as network type, voltage, and whether the incident was caused by technical, work practice or environmental factors. We ran the Logistic Regression model again. As a result, the accuracy reaches 80%, which makes the model a lot more useful.
  9. Now we have a good model, so what is next? The output of the model has been fed into the data process for internal reporting and other advanced analytics. We are working with Deakin University to improve the accuracy of the model. And that concludes my talk.
  10. Thank you for attending my session. I hope you enjoyed it. If you have any questions or are interested in collaborating with us, please feel free to reach out. My LinkedIn profile and work email address are included here. And thank you again.