SlideShare a Scribd company logo
1 of 22
Download to read offline
Governance Services
iTime Project:
Automated Filing
Tom Page
Learn. Connect. Collaborate.
What Is Filing?
Learn. Connect. Collaborate.
What Is Automated Filing?
Learn. Connect. Collaborate.
Real Data is Hard
National
Security
Business
Sensitive
Personal
Data
Learn. Connect. Collaborate.
Records Documents
Learn. Connect. Collaborate.
Planning
Learn. Connect. Collaborate.
Data Collection
Learn. Connect. Collaborate.
Supervised Learning in One Slide
2. Make model better
3. Check quality
1. Split Data
4. Use!
Learn. Connect. Collaborate.
Supervised Learning in One Slide
2. Make model better
3. Check quality
1. Split Data
4. Use!
Learn. Connect. Collaborate.
Design
Train Model
Get Suggestions
Learn. Connect. Collaborate.
Library Choice
+ One of the above
UI
No UI
Java Other (Python)
Learn. Connect. Collaborate.
Weka
+
Learn. Connect. Collaborate.
Train: Find most popular site
Test: Guess at that site
Baseline
17.5%
Learn. Connect. Collaborate.
Algorithms
Accuracy Train Time
(13k docs)
Test Time
(6k docs)
Can
Update?
k Nearest Neighbours 96% 0 2s Y
Decision Tree (PART) 95% 3s 0 N
Hoeffding Tree 89% 1s 0 Y
Naive Bayes 87% 0 1s Y
Neural Network 75% 15m 10s Y
Learn. Connect. Collaborate.
Algorithms
Accuracy Train Time
(13k docs)
Test Time
(6k docs)
Can
Update?
k Nearest Neighbours 96% 0 2s Y
Decision Tree (PART) 95% 3s 0 N
Hoeffding Tree 89% 1s 0 Y
Naive Bayes 87% 0 1s Y
Neural Network 75% 15m 10s Y
Learn. Connect. Collaborate.
Kevin Frans
http://kvfrans.com/simple-algoritms-for-solving-cartpole/
In the end, simple methods are
almost always better when
dealing with simple problems.
Learn. Connect. Collaborate.
Feature Engineering
cm:createdByUser tpage 361 Identifier
cm:createdAt 2019-01-31T12:59:18 1548939558 Unix Timestamp
Raw Data Feature
Learn. Connect. Collaborate.
Feature Engineering
cm:createdByUser tpage 361 Identifier
cm:createdAt 2019-01-31T12:59:18 1548939558 Unix Timestamp
12 Hour of Day
4 Day of Week
cm:name DevCon.pptx DevCon First word
pptx Last word
0,0,0,1,0,0,1,0,... Bag of words
Raw Data Feature
Learn. Connect. Collaborate.
Feature Engineering
cm:createdByUser tpage 361 Identifier
cm:createdAt 2019-01-31T12:59:18 1548939558 Unix Timestamp
12 Hour of Day
4 Day of Week
cm:name DevCon.pptx DevCon First word
pptx Last word
0,0,0,1,0,0,1,0,... Bag of words
Raw Data Feature
Learn. Connect. Collaborate.
Feature Performance
Author 86%
Created At 59%
Day Of Week 23%
Hour Of Day 29%
First Word 60%
Last Word 34%
MIME Type 30%
File Size 25%
91%
Learn. Connect. Collaborate.
Feature Performance
Author 86%
Created At 59%
Day Of Week 23%
Hour Of Day 29%
First Word 60%
Last Word 34%
MIME Type 30%
File Size 25%
91%
97%
Three
guesses
Governance Services
iTime Project:
Automated Filing
Thank you!

More Related Content

Similar to Governance Services iTime Project: Automated Filing - DevCon 2019

Plone, the Python CMS & Web Framework for Advanced Topics and Non-Developers
Plone, the Python CMS & Web Framework for Advanced Topics and Non-DevelopersPlone, the Python CMS & Web Framework for Advanced Topics and Non-Developers
Plone, the Python CMS & Web Framework for Advanced Topics and Non-DevelopersAlexander Loechel
 
Sacrificing the golden calf of "coding"
Sacrificing the golden calf of "coding"Sacrificing the golden calf of "coding"
Sacrificing the golden calf of "coding"Christian Heilmann
 
10 Ways To Improve Your Code( Neal Ford)
10  Ways To  Improve  Your  Code( Neal  Ford)10  Ways To  Improve  Your  Code( Neal  Ford)
10 Ways To Improve Your Code( Neal Ford)guestebde
 
SCAM 2012 Keynote Slides on Cooperative Testing and Analysis by Tao Xie
SCAM 2012 Keynote Slides on Cooperative Testing and Analysis by Tao XieSCAM 2012 Keynote Slides on Cooperative Testing and Analysis by Tao Xie
SCAM 2012 Keynote Slides on Cooperative Testing and Analysis by Tao XieTao Xie
 
Real-world Document Classification with Transfer Learning
Real-world Document Classification with Transfer LearningReal-world Document Classification with Transfer Learning
Real-world Document Classification with Transfer LearningPradeep Thiyyagura
 
PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning" PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning" Joshua Bloom
 
Building a cutting-edge data processing environment on a budget
Building a cutting-edge data processing environment on a budgetBuilding a cutting-edge data processing environment on a budget
Building a cutting-edge data processing environment on a budgetGael Varoquaux
 
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014Austin Ogilvie
 
Computer Tools for Academic Research
Computer Tools for Academic ResearchComputer Tools for Academic Research
Computer Tools for Academic ResearchMiklos Koren
 
JavaScript Best Pratices
JavaScript Best PraticesJavaScript Best Pratices
JavaScript Best PraticesChengHui Weng
 
Software Carpentry and the Hydrological Sciences @ AGU 2013
Software Carpentry and the Hydrological Sciences @ AGU 2013Software Carpentry and the Hydrological Sciences @ AGU 2013
Software Carpentry and the Hydrological Sciences @ AGU 2013Aron Ahmadia
 
[DevDay 2016] Secret tools for a Scrum Team - Speaker: Sebastian Sussman – CI...
[DevDay 2016] Secret tools for a Scrum Team - Speaker: Sebastian Sussman – CI...[DevDay 2016] Secret tools for a Scrum Team - Speaker: Sebastian Sussman – CI...
[DevDay 2016] Secret tools for a Scrum Team - Speaker: Sebastian Sussman – CI...DevDay.org
 
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge DatasetsScaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge DatasetsTuri, Inc.
 
Abcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosasAbcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosasMerce Crosas
 
Grammarly AI-NLP Club #2 - Recent advances in applied chatbot technology - Jo...
Grammarly AI-NLP Club #2 - Recent advances in applied chatbot technology - Jo...Grammarly AI-NLP Club #2 - Recent advances in applied chatbot technology - Jo...
Grammarly AI-NLP Club #2 - Recent advances in applied chatbot technology - Jo...Grammarly
 

Similar to Governance Services iTime Project: Automated Filing - DevCon 2019 (20)

Plone, the Python CMS & Web Framework for Advanced Topics and Non-Developers
Plone, the Python CMS & Web Framework for Advanced Topics and Non-DevelopersPlone, the Python CMS & Web Framework for Advanced Topics and Non-Developers
Plone, the Python CMS & Web Framework for Advanced Topics and Non-Developers
 
10 Ways To Improve Your Code
10 Ways To Improve Your Code10 Ways To Improve Your Code
10 Ways To Improve Your Code
 
Tf itpbapm
Tf itpbapmTf itpbapm
Tf itpbapm
 
Sacrificing the golden calf of "coding"
Sacrificing the golden calf of "coding"Sacrificing the golden calf of "coding"
Sacrificing the golden calf of "coding"
 
10 Ways To Improve Your Code( Neal Ford)
10  Ways To  Improve  Your  Code( Neal  Ford)10  Ways To  Improve  Your  Code( Neal  Ford)
10 Ways To Improve Your Code( Neal Ford)
 
SCAM 2012 Keynote Slides on Cooperative Testing and Analysis by Tao Xie
SCAM 2012 Keynote Slides on Cooperative Testing and Analysis by Tao XieSCAM 2012 Keynote Slides on Cooperative Testing and Analysis by Tao Xie
SCAM 2012 Keynote Slides on Cooperative Testing and Analysis by Tao Xie
 
Real-world Document Classification with Transfer Learning
Real-world Document Classification with Transfer LearningReal-world Document Classification with Transfer Learning
Real-world Document Classification with Transfer Learning
 
PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning" PyData 2015 Keynote: "A Systems View of Machine Learning"
PyData 2015 Keynote: "A Systems View of Machine Learning"
 
Building a cutting-edge data processing environment on a budget
Building a cutting-edge data processing environment on a budgetBuilding a cutting-edge data processing environment on a budget
Building a cutting-edge data processing environment on a budget
 
2014 toronto-torbug
2014 toronto-torbug2014 toronto-torbug
2014 toronto-torbug
 
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014
Applied Data Science: Building a Beer Recommender | Data Science MD - Oct 2014
 
Computer Tools for Academic Research
Computer Tools for Academic ResearchComputer Tools for Academic Research
Computer Tools for Academic Research
 
Ig1 Mashups
Ig1   MashupsIg1   Mashups
Ig1 Mashups
 
JavaScript Best Pratices
JavaScript Best PraticesJavaScript Best Pratices
JavaScript Best Pratices
 
Software Carpentry and the Hydrological Sciences @ AGU 2013
Software Carpentry and the Hydrological Sciences @ AGU 2013Software Carpentry and the Hydrological Sciences @ AGU 2013
Software Carpentry and the Hydrological Sciences @ AGU 2013
 
2014 pycon-talk
2014 pycon-talk2014 pycon-talk
2014 pycon-talk
 
[DevDay 2016] Secret tools for a Scrum Team - Speaker: Sebastian Sussman – CI...
[DevDay 2016] Secret tools for a Scrum Team - Speaker: Sebastian Sussman – CI...[DevDay 2016] Secret tools for a Scrum Team - Speaker: Sebastian Sussman – CI...
[DevDay 2016] Secret tools for a Scrum Team - Speaker: Sebastian Sussman – CI...
 
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge DatasetsScaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
 
Abcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosasAbcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosas
 
Grammarly AI-NLP Club #2 - Recent advances in applied chatbot technology - Jo...
Grammarly AI-NLP Club #2 - Recent advances in applied chatbot technology - Jo...Grammarly AI-NLP Club #2 - Recent advances in applied chatbot technology - Jo...
Grammarly AI-NLP Club #2 - Recent advances in applied chatbot technology - Jo...
 

Recently uploaded

why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsMehedi Hasan Shohan
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 

Recently uploaded (20)

why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software Solutions
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 

Governance Services iTime Project: Automated Filing - DevCon 2019