COMP 7/8150
Data Science I
Sorting out Leaky Records in Payments Log
Kishor Datta Gupta
Computer Science
Goal/Scope
• Develop a classifier for leaky and non-leaky data.
• The Chicago City Council releases its operational logs daily.
• I will try to identify leaky and non-leaky records in the daily purchase
logs.
Lit Review
“The Government should provide opportunities for citizens to
participate in decision-making processes by harnessing collective
knowledge of the society”
“A primary goal of open government, transparency means
disclosure of information about official decisions and activity in
forms that citizens can easily read and use”
In this respect, the Chicago open data web portal has released more than
800 data sets in various machine-readable formats, such as tables, plain
text, or maps, covering various activities of the city authorities.
Leaky Records
• Example daily payment record:
“Day 06-09-2017 For Invoice PVCI17CI018100 paid 2884$ from DEPT OF
GENERAL SERVICES reference contract no 26775”
A record is leaky if it contains information that could help someone mount an attack on
Chicago city infrastructure, or if disclosing it would violate HIPAA, CIPA, or other privacy laws. For example:
• The record reveals information about police or emergency response teams, such as
the Chicago police weapon inventory.
• The record contains details of a restricted place, such as an airport runway electronic signal
system.
• The record contains cybersecurity information about the city's day-to-day operations, especially in the medical
area, such as data storage facility information.
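The example payment sentence above has a regular shape, so its fields can be pulled out before any classification. The following is a minimal sketch in Python (not the R used elsewhere in these slides). The pattern is an assumption fitted to the single example record; `RECORD_RE` and `parse_payment_record` are hypothetical names, and real log entries may be worded differently.

```python
import re

# Assumed pattern, fitted only to the one example sentence on the slide.
RECORD_RE = re.compile(
    r"Day (?P<date>\d{2}-\d{2}-\d{4}) "
    r"For Invoice (?P<invoice>\S+) "
    r"paid (?P<amount>[\d.]+)\$ "
    r"from (?P<department>.+?) "
    r"reference contract no (?P<contract>\d+)"
)


def parse_payment_record(text):
    """Return the fields of one payment sentence as a dict, or None if it does not match."""
    match = RECORD_RE.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    # The date stays a string because the slide does not say whether it is MM-DD or DD-MM.
    fields["amount"] = float(fields["amount"])
    fields["contract"] = int(fields["contract"])
    return fields


example = (
    "Day 06-09-2017 For Invoice PVCI17CI018100 paid 2884$ "
    "from DEPT OF GENERAL SERVICES reference contract no 26775"
)
record = parse_payment_record(example)
print(record)
```

Returning `None` for a non-matching line lets a caller count how many records fall outside the assumed wording instead of failing on the first one.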
..Leaky Records (example)
Raw Data Sample
Daily Purchase Log Example
Invoice | Amount | Date | Department | Contract Number | Vendor Name
PVCI17CI018100 | 2884 | 06-09-2017 | DEPT OF GENERAL SERVICES | 26775 | SOUTHWEST INDUSTRIES
PVCI17CI028443 | 659.56 | 06-09-2017 | DEPT OF GENERAL SERVICES | 30559 | CUMMINS N POWER, LLC
PVCI17CI087902 | 59.58 | 06-09-2017 | DEPT OF GENERAL SERVICES | 33233 | OFFICE DEPOT, INC.
PVCI17CI087954 | 58450 | 06-09-2017 | DEPARTMENT OF POLICE | 25150 | ALLIED SERVICES GROUP, INC.
Data points: 1.24M, updated every day
Raw Data Sample
Contract Information Example
Description | Spec | Rv | Vendor ID | Type | Total Amount | Prc Type
OMP - South Airfield Runway 10R-28L - Site Preparation | 26117 | 119 | 99339 | CONSTRUCTION-AVIATION | 179643.4 | BID
Airfield Lighting Control Vault Improvements - MDW, Spec# 115950, Req# 79785 | 28241 | 17 | 115950 | CONSTRUCTION-AVIATION | 24940 | PRC
Phase 16 Residential Sound Insulation Program ORD -Bid Pkg #2 (200 Homes), Spec# 117222, Req# 81468 | 29398 | 2 | 117222 | CONSTRUCTION-AVIATION | -69837.1 | BID
Data points: 131K, updated every day
Raw Data Sample
Vendor Information Example
Rv | Vendor ID | Type | Vendor Name | Address1 | Address2 | State | Zip
119 | 99339 | CONSTRUCTION-AVIATION | TURNER-CONCRETE STRUCTURES-LINDAHL TRI VENTURE | 55 E MONROE ST | CHICAGO | IL | 60603
17 | 115950 | CONSTRUCTION-AVIATION | DIVANE BROTHERS. ELECTRIC CO. | 2424 N 25TH AVE | FRANKLIN PARK | IL | 60131
2 | 117222 | CONSTRUCTION-AVIATION | ASBACH & VANSELOW INC | 1000 BROWN STREET EFT | WAUCONDA | IL | 60084
Data points: 4,989, updated every month (expected)
Data corpus
Population: all data available on the Chicago city data portal through 30
November 2018
Purchase log data points: 1.24M
Cleansed data points: 119,277
I1   I2   I3   I4   Amm       Contract  Dep  Sprc  PRC  ZIP    Speccode  SpecNum  Result
22   174  101  17   2237.08   33697     16   6     0    60660  65        14       1
127  183  0    0    181184.8  8363      18   7     0    50266  422       34       1
11   181  86   495  59.96     33233     0    0     0    60000  300       10       1
11   181  17   156  3465.25   24932     3    5     1    60101  364       73       1
138  174  0    0    1566.21   19550     12   7     0    60062  263       72       1
188  179  0    0    13720.86  28002     11   2     2    20151  7         90       0
11   181  19   315  9523.5    33233     0    0     0    60000  300       10       1
..Data Corpus
Curated data: 30,000 data points (based on published contract documents
plus manual curation)
Unobserved data: 89,277 data points
Training data: 10,000 data points (7,603 leaky and 2,397 non-leaky)
Testing data: 20,000 data points (15,336 leaky and 4,664 non-leaky)
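The split sizes above can be cross-checked with a few lines of arithmetic. This is a small Python sketch (not the R used elsewhere in these slides); the counts are copied from the slides, while the dictionary layout and the `leaky_share` helper are illustrative assumptions.

```python
# Counts copied from the slides; the structure is only for illustration.
cleansed_total = 119277
splits = {
    "training": {"leaky": 7603, "non_leaky": 2397},
    "testing": {"leaky": 15336, "non_leaky": 4664},
}

# Curated (labeled) records are the training and testing sets together.
curated_total = sum(sum(counts.values()) for counts in splits.values())
unobserved = cleansed_total - curated_total


def leaky_share(counts):
    """Fraction of records in a split that are labeled leaky."""
    return counts["leaky"] / (counts["leaky"] + counts["non_leaky"])


print("curated:", curated_total, "unobserved:", unobserved)
for name, counts in splits.items():
    print(name, "leaky share:", round(leaky_share(counts), 4))
```

Both splits come out at roughly 76% leaky, so a classifier that always answers "leaky" would already score about 76% accuracy on them; that figure is a useful baseline when reading the accuracy values reported later.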
Ontology
Invoice number: Each daily log entry has an invoice number that serves as the payment
description reference.
Examples: CV50165009685, PV85168550294
Specification code: Every purchase order has a specification code
based on purchase type and description.
Specification type: Every purchase order has a specification type
based on purchase type and description. There are 49 unique
specification types.
Department: Purchases are made under different departments; there are
56 departments.
Procurement: The way a vendor gets the work order. There are 14
different types, such as BID, sole source, and joint.
Vendor specification: Vendor type code.
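The cleansed corpus stores fields such as department and procurement type as integers, but the slides do not say how those codes were assigned. One simple possibility, sketched here in Python (not the R used elsewhere in these slides) and purely as an assumption, is to number each distinct value in order of first appearance. `build_codebook` is a hypothetical helper, and the department names are taken from the raw purchase log sample.

```python
def build_codebook(values):
    """Assign each distinct value an integer code in order of first appearance."""
    codebook = {}
    for value in values:
        if value not in codebook:
            codebook[value] = len(codebook)
    return codebook


# Department column of the four sample purchase log rows.
departments = [
    "DEPT OF GENERAL SERVICES",
    "DEPT OF GENERAL SERVICES",
    "DEPT OF GENERAL SERVICES",
    "DEPARTMENT OF POLICE",
]

dep_codes = build_codebook(departments)
encoded = [dep_codes[name] for name in departments]
print(dep_codes)
print(encoded)
```

Codes assigned this way carry no order or distance, so a model that treats them as numeric magnitudes may read structure into them that is not there.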
Case Study
Application developed using public records from the Chicago City
Council
Chicago City Crime is an Android application that gives users a simple tool
to get crime data instantly, based on their current position in Chicago. By becoming more
aware of how and what kinds of crimes have been perpetrated in their area, users can
make informed judgments and take action that will help them and their neighborhood.
Application developed to analyze financial data
Chicago TIF Viewer is a map viewer allowing free access to data and services with
three features: Tax Increment Financing District Information, Ward Contact Information, and US
Census 2010 Unemployment Rates.
[1] Kassen, M. (2013). A promising phenomenon of open data: A case study of the Chicago
open data project. Government Information Quarterly, 30(4), 508-513.
Correlation
library(GGally)  # ggpairs() comes from the GGally package
ggpairs(data=dataa, columns=c("Result","Sprc","PRC","Dep","SPECcode","SpecNum"), title="payment data")
..Linear Model
..Neural Network
Neural Network Result
             Reference
Prediction   0      1
0            164    824
1            1040   2972
Accuracy   0.6272
F1         0.7612704918
Recall     0.7829293994
Precision  0.740777667
RMSE       0.486190425
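The accuracy, precision, recall, and F1 values above follow directly from the confusion matrix when class 1 (leaky) is treated as the positive class, rows are predictions, and columns are reference labels. Here is a short Python sketch of that arithmetic (not the R used elsewhere in these slides); `binary_metrics` is a hypothetical helper name, and RMSE is left out because it cannot be derived from the matrix alone.

```python
def binary_metrics(tn, fn, fp, tp):
    """Compute standard metrics from the four cells of a 2x2 confusion matrix.

    tn: predicted 0, reference 0    fn: predicted 0, reference 1
    fp: predicted 1, reference 0    tp: predicted 1, reference 1
    """
    total = tn + fn + fp + tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Cells read from the neural network confusion matrix on the slide.
nn = binary_metrics(tn=164, fn=824, fp=1040, tp=2972)
for name, value in nn.items():
    print(f"{name}: {value:.4f}")
```

The same function can be applied to the other confusion matrices in these slides to check their reported figures.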
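..SVM
             Reference
Prediction   0      1
0            498    118
1            225    2159
Accuracy   0.8857 (recomputed from the confusion matrix as (498 + 2159) / 3000; the slide repeated the neural network value 0.6272)
F1         0.9264106415
Recall     0.9481774264
Precision  0.9056208054
sigma      0.09105773094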
Decision Tree
..Tree
Training Set
             Reference
Prediction   0      1
0            1133   291
1            1264   7312
Accuracy   0.8445
F1         0.9038878
Recall     0.9617256
Precision  0.8526119
Testing Set
             Reference
Prediction   0      1
0            342    1082
1            2065   6511
Accuracy   0.6853
F1         0.8053683
Recall     0.8575003
Precision  0.7592118
Data Analysis (Using Weka)
For J48 Tree above
Deliverables
• A data classifier that labels each purchase record in the
purchase logs from the Chicago City Council as
  • Leaky
  • Non-leaky
• Evaluation results of performance
I calibrate the model against different classifiers; the accepted accuracy
threshold is 80% and the accepted F1 score is >0.85.
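The acceptance criteria can be applied mechanically to the figures reported in the result slides. The Python sketch below (not the R used elsewhere in these slides) assumes the 80% accuracy threshold is inclusive and the F1 threshold is strict; `meets_thresholds` is a hypothetical helper. The SVM accuracy is recomputed from its confusion matrix because the printed value does not agree with that matrix.

```python
ACCURACY_MIN = 0.80  # assumed inclusive: accuracy >= 0.80
F1_MIN = 0.85        # strict, as written on the slide: F1 > 0.85

# (accuracy, F1) pairs taken from the result slides.
reported = {
    "neural network": (0.6272, 0.7612704918),
    # Accuracy recomputed from the SVM confusion matrix: (498 + 2159) / 3000.
    "svm": ((498 + 2159) / 3000, 0.9264106415),
    "tree (training set)": (0.8445, 0.9038878),
    "tree (testing set)": (0.6853, 0.8053683),
}


def meets_thresholds(accuracy, f1):
    """True when both acceptance criteria are satisfied."""
    return accuracy >= ACCURACY_MIN and f1 > F1_MIN


results = {name: meets_thresholds(acc, f1) for name, (acc, f1) in reported.items()}
for name, passed in results.items():
    print(f"{name}: {'accepted' if passed else 'rejected'}")
```

Under these assumptions the tree passes on its training set but not on its testing set, which is the comparison that matters for a deployed classifier.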

Analyze open data of Chicago city data portal
