1. Using Advanced Analytics to Classify
Electrical-Network-Related Incidents
Jessie Nghiem – Energy Safe Victoria
2. Let’s start with who we are…
• Energy Safe Victoria (ESV)
• Data and Analytics team
• About myself
• Current: part-time Senior Data Scientist and full-time new mom
• Past
• Lead Data Scientist at MLC Life Insurance
• Postdoc researcher at RMIT
• Data Scientist at InfoCentric
• Teaching Associate at Monash
• PhD at Monash
3. What have we used AI/ML for?
• Audit electrical products being sold online
• Find correlations between weather conditions and the rate of network-related ground fire incidents
• Classify electrical-network-related incidents – case study for today
• And more…
4. The classic free text problem…
• Distribution businesses are required to report critical network incidents to ESV through OSIRIS,
our reporting platform
5. Let’s take a closer look at the data
• 5 years of data
• 3761 rows and 33 columns
• 3 important free text columns
• Description
• Causes
• Actions taken
• A labelled column of 15 incident categories
• We use the IDEAR utility for data exploration
First we build models using only text columns…
• First iteration pipeline: Incident data → Data extraction and cleansing → Preprocessing → Feature engineering → Model building → Model evaluation
• Preprocessing: tokenisation, stop-word removal, stemming and lemmatisation
• Feature engineering: TF-IDF (word/character/n-gram), word embeddings, count vectors
• Model building: Naïve Bayes, Linear Classifier, SVM, bagging models (Random Forest, Extreme Gradient Boosting)
7. And the winner is…
Algorithm Accuracy
Naive Bayes (Count Vector) 0.68
Naive Bayes (Word Level TF-IDF) 0.63
Naive Bayes (N-gram level TF-IDF) 0.63
Naive Bayes (Character level TF-IDF) 0.55
Linear Classifier (Count Vector) 0.76
Linear Classifier (Word Level TF-IDF) 0.77
Linear Classifier (N-gram level TF-IDF) 0.68
Linear Classifier (Character level TF-IDF) 0.75
Xgb Classifier (Count Vector) 0.76
Xgb Classifier (Word Level TF-IDF) 0.76
Xgb Classifier (Character level TF-IDF) 0.76
… …
The Linear Classifier (Logistic Regression) on the Word Level TF-IDF vectors is chosen, with the highest accuracy (0.77)
8. This time we throw non-free-text columns in…
• Second iteration pipeline: Incident data → Data extraction and cleansing → Preprocessing → Feature engineering → Model building → Model evaluation
• Feature engineering: TF-IDF (word level)
• Model building: Linear Classifier (text columns only) vs Linear Classifier (all columns)
• Accuracy reaches 80%
9. Now we have a good model, so what’s next?
• The output of the model has been fed into the data process for
internal reporting and other advanced analytics
• Work with Deakin University to improve the accuracy of the model
Hello everyone, thanks for giving me this opportunity to share our experience of applying advanced analytics to classify electrical-network-related incidents.
Energy Safe Victoria is a technical and safety regulator responsible for the safe generation, supply and use of electricity, gas and pipelines. When we talk about electricity, for example, that includes everything from generation, transmission and distribution to installation and equipment. We also license and register electricians, and issue and audit Certificates of Electrical Safety, which is your guarantee that electrical work has been performed by a qualified electrician.
We have a small, relatively newly established Data and Analytics team of five people. We extend our capability and resources by collaborating with CSIRO, universities (such as Monash, Deakin and UTS) and external contractors.
I am currently a part-time Senior Data Scientist, leading the Advanced Analytics programs and managing R&D funding at ESV, and also a full-time mom of an always fully charged toddler. The pandemic hasn't helped my childcare plans, but I am grateful that ESV has always been very supportive in giving me flexible working arrangements, so I can complete my work and enjoy more quality time with my little one.
Prior to this role, I worked for MLC Life Insurance, RMIT and InfoCentric. Before becoming a mom, I enjoyed spending my spare time teaching students at Monash, where I did my PhD in Computer Science.
Our AI/ML learning journey has been quite challenging due to the lack of data readiness for advanced analytics at ESV. However, instead of sitting and waiting for the perfect setting, we rolled our sleeves up and worked with what we have. A few projects have been started and well received by our internal customers.
We built a proof-of-concept AI solution to audit electrical products sold on eBay/Amazon, using Azure Cognitive Services. We have secured funding from the Senior Committee of Officials (SCO) of the Council of Australian Governments to develop this PoC into a full-scale product.
Another project is to find which weather factors have a strong influence on the number of ground fire incidents on a given day. This indicates whether the occurrence of an incident can be explained by the weather conditions or by poor performance of the distribution businesses, which can result in further investigation. We are also working with UTS to develop a predictive solution that anticipates the rate of incidents based on weather conditions and other geospatial factors.
For today's case study, I am going to share our experience in developing a text classifier that puts network-related incidents into the right bucket, a data-enrichment task for the project I mentioned earlier.
The classic free-text problem… I believe you can find it in any organisation. As background, distribution businesses are legally required to report critical network incidents to ESV through OSIRIS, our reporting platform. Each report goes through a review and approval process, and an investigation if needed. This dataset feeds into our operational reports, annual network performance report and other analysis. The incidents can be classified into 15 categories, based on their causes. However, the data input is mostly free text and can be filled in by non-technical people. The causal-factors columns are multi-valued, which makes it hard to identify the root cause of the problem. At the moment we have a specialist who goes through the incidents one by one to classify them into the right bucket. This poses an opportunity to use machine learning to automate and optimise the process.
Let’s take a closer look at the data. We have 5 years of data, since the platform was launched. It is not a really big dataset, as the number of incidents is small (fewer than 4K rows, with 33 columns), and we strive to make it even smaller for our community's safety. There are 3 free-text columns that our staff have used to classify the incidents into 15 categories based on their empirical knowledge. Incidents related to vehicles, connections and trees are the most common. To explore the data, we use the IDEAR utility, a tool written in R/R Markdown by Microsoft.
First, we built models that use only the text columns.
As with any ML project, 70-80% of the time was spent on data processing and cleansing. The pre-processing steps include tokenisation, stop-word removal, lemmatisation and stemming. This is followed by feature extraction/engineering to prepare for the different types of models we want to try on this dataset. In particular, the generated features are TF-IDF vectors (at word, character and n-gram level), word embeddings and count vectors. For each set of features, we trained different models, including Naïve Bayes, a linear classifier (logistic regression), a support vector machine, and bagging models such as Random Forest and Extreme Gradient Boosting, to see which one performs best in terms of accuracy.
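The pre-processing steps above (tokenisation, stop-word removal, stemming/lemmatisation) can be sketched roughly as follows. This is a minimal, dependency-free illustration: the stop-word list, the crude suffix-stripping rules (a stand-in for a real stemmer such as NLTK's PorterStemmer) and the sample sentence are all invented for illustration, not taken from the OSIRIS data.

```python
# Minimal sketch of the pre-processing stage: tokenisation, stop-word
# removal, and crude suffix stripping as a stand-in for stemming.
import re

# Tiny illustrative stop-word list (a real build would use a full one).
STOP_WORDS = {"the", "a", "an", "was", "by", "of", "to", "and", "in", "on"}


def tokenise(text: str) -> list[str]:
    # Lowercase and split on runs of non-alphabetic characters.
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def strip_suffix(token: str) -> str:
    # Crude stemming: drop a few common English suffixes.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    return [strip_suffix(t) for t in tokenise(text) if t not in STOP_WORDS]


print(preprocess("The conductor was damaged by falling trees"))
# -> ['conductor', 'damag', 'fall', 'tree']
```

The stemmed tokens are then what the count/TF-IDF vectorisers consume in the feature-engineering stage.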
And the winner is the Linear Classifier (Logistic Regression) on word-level TF-IDF vectors, with 77% accuracy. That means we found a good model, but it can be better…
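In scikit-learn terms, the winning setup (word-level TF-IDF features feeding a logistic-regression classifier) could be sketched roughly as below; the incident texts and labels are invented toy examples, not the real OSIRIS incident reports.

```python
# Rough sketch of the winning combination: word-level TF-IDF features
# feeding a logistic-regression (linear) classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Invented toy incident descriptions and category labels.
texts = [
    "vehicle hit pole causing outage",
    "car collided with power pole",
    "tree branch contacted conductor",
    "fallen tree brought down the line",
]
labels = ["vehicle", "vehicle", "tree", "tree"]

model = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="word")),  # word-level TF-IDF vectors
    ("clf", LogisticRegression(max_iter=1000)),   # the linear classifier
])
model.fit(texts, labels)

print(model.predict(["truck crashed into a pole"]))
```

In practice the real comparison would wrap this pipeline in cross-validation and swap in the other vectorisers and classifiers from the first iteration.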
This time we throw more columns in, in combination with the features extracted in the previous build. These columns do not have to be free text; they can contain categorical or numerical values, such as network type, voltage, and whether the incident was caused by technical, work-practice or environmental factors. We ran the logistic regression model again, and the accuracy reached 80%, making the model a lot more useful.
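One way to combine the word-level TF-IDF text features with the non-free-text columns is scikit-learn's ColumnTransformer, sketched below. The column names (`description`, `network_type`, `voltage_kv`) and all values are invented for illustration and are not the actual OSIRIS schema.

```python
# Sketch of the second iteration: TF-IDF on the free-text column combined
# with categorical and numerical columns via a ColumnTransformer.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Invented toy data; column names are illustrative only.
df = pd.DataFrame({
    "description": [
        "vehicle hit pole causing outage",
        "car collided with power pole",
        "tree branch contacted conductor",
        "fallen tree brought down the line",
    ],
    "network_type": ["distribution", "distribution", "transmission", "distribution"],
    "voltage_kv": [22, 22, 66, 66],
    "category": ["vehicle", "vehicle", "tree", "tree"],
})

features = ColumnTransformer([
    ("text", TfidfVectorizer(analyzer="word"), "description"),          # free-text column
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["network_type"]),  # categorical column
    ("num", "passthrough", ["voltage_kv"]),                             # numerical column
])

model = Pipeline([
    ("features", features),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(df.drop(columns="category"), df["category"])

print(model.predict(df.drop(columns="category")))
```

The ColumnTransformer keeps the per-column encodings in one fitted object, so the same pipeline can score new incident reports end to end.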
Now we have a good model, so what is next?
- The output of the model has been fed into the data process for internal reporting and other advanced analytics.
- We are working with Deakin University to improve the accuracy of the model.
And that concludes my talk.
Thank you for attending my session. I hope you enjoyed it. If you have any questions or are interested in collaborating with us, please feel free to reach out. My LinkedIn profile and work email address are included here. Thank you again!