PREDICTING OUTCOME OF
LEGAL CASES
Ankita Singh Nilutpal Goswami
Agenda
1. Domain
2. Objective
3. Data Extraction
4. EDA
5. Architecture
6. Model Development
7. Summary and Findings
8. Q & A
Legal Systems of the World
Source: http://saint-claire.org/
Hierarchy of Indian Judiciary
Sources of Law
§ Constitution
§ Legislation
• Ordinary
• Delegated
• Ordinance
§ Judicial Precedent
§ Customs
Background
Indian Judicial System
o Largest judicial machinery, based on the world's longest written constitution
o The Constitution of India has 448 articles in 25 parts, 12 schedules, 5 appendices and 98 amendments
o The Indian Penal Code (IPC) defines crimes/offences and prescribes their punishments
o The Criminal Procedure Code (CrPC) defines the mandatory procedures for pursuing a case
o 24 High Courts and over 600 district courts
o Nearly 5 lakh cases are filed daily in Indian courts
o Approximately 4.5 lakh cases are listed before the courts daily
o About 2.5 lakh cases are disposed of daily
Challenging Facts
| Cases pending | Civil cases | Criminal cases | Total cases | Percentage |
|---|---:|---:|---:|---:|
| > 10 years | 597,166 | 1,691,515 | 2,288,681 | 8.28% |
| Between 5 and 10 years | 1,244,117 | 3,212,377 | 4,456,494 | 16.13% |
| Between 2 and 5 years | 2,542,925 | 5,394,015 | 7,936,940 | 28.73% |
| < 2 years | 3,946,341 | 8,997,935 | 12,944,276 | 46.85% |
| Total pending cases | 8,330,549 | 19,295,842 | 27,626,391 | |
[Pie chart: pending cases by age (< 2 years 47%, 2 to 5 years 29%, 5 to 10 years 16%, > 10 years 8%); a second panel shows "Other" at 76%]
Source: National Judicial Data Grid (as of 18 September 2018)
• Case disposal rates (August 2018)
§ 10 years: 1.5%
§ All cases: 3.8%
• Cases filed daily: ~5 to 8 lakh
• Cases pending registration: ~7.5 lakh
• India has 15 judges for every 1 million people
• 22.2 million undertrials; undertrials outnumber convicts
Objective
• Faster processing of legal issues and cases
• Sourcing "judgement" data and understanding its details
• Evaluating predictions from various machine learning models
• Creating social value by streamlining judicial case intake
Sample Judgement document snapshot
o Case Documents Analyzed – 120
o Data extraction mechanism – manual
o Unique fields extracted – 58
o Total number of final observations - 202
Data
• Nature of Disposal
• Case Type
• Court Number
• Court Name
• Judge
• Judge Gender
• Judgement Date
• Total Number of Sections
• Section 1 through Section 10
• FIR Number/Year
• Police station
• Investigating officer
• Case Number
• Year
• Complainant
• Total Accused
• Accused #
• Accused Name
• Accused Gender
• Accused Age
• Accused Confessed? (plea)
• Date of First Hearing
• Complainant advocate
• Prosecution advocate
• Advocate Defendant
• Number of Prosecution witnesses
• Names of prosecution witnesses
• Prosecution Witnesses (PWs) Examined?
• Number of hostile witnesses
• Defense witnesses
• Charge sheet
• Points for consideration
• Exhibits on behalf of prosecution P series
• Number of exhibits considered
• Exhibits on behalf of court C series
• Exhibits on behalf of accused D series
• Total Number of Material Objects
• Charges proved
• Charges not proved
• Issues Proved
• Issues Not Proved
• Accused released on bail
• Accused committed to prison
• Sentence of Imprisonment granted
• Fine with Imprisonment (Rs)
• Term Served in Prison (days)
• Set off (if any)
• Judgement
• Citations
Original Features
• Source – Publicly available judgement
documents
• Case Documents Analyzed – 120
• Data extraction mechanism – manual
• Unique fields extracted – 58
• Consistent features identified – 15 (the judgement decision is the target variable)
• Total number of final observations - 202
Data
| # | Feature name | Description | Datatype | Values |
|---|---|---|---|---|
| 1 | ipc_420 | Binary indicator: case filed under IPC 420 | Categorical | Yes=1, No=0 |
| 2 | ipc_120b | Binary indicator: case filed under IPC 120B | Categorical | Yes=1, No=0 |
| 3 | ipc_471 | Binary indicator: case filed under IPC 471 | Categorical | Yes=1, No=0 |
| 4 | ipc_468 | Binary indicator: case filed under IPC 468 | Categorical | Yes=1, No=0 |
| 5 | ipc_34 | Binary indicator: case filed under IPC 34 | Categorical | Yes=1, No=0 |
| 6 | jud_gender | Gender of the judge presiding over the case | Categorical | Male=0, Female=1 |
| 7 | jud_date | Date the judgement was delivered | Date | Date |
| 8 | tot_sec | Total number of sections under which the case was filed | Numeric | Number |
| 9 | case_no | Unique case number | Categorical | Multiple factors |
| 10 | comp | Complainant name * | String | Name |
| 11 | tot_accu | Total number of accused in the case | Numeric | Number |
| 12 | accu_gender | Gender of the individual accused | Categorical | Male=0, Female=1 |
| 13 | accu_no | Sequence number of the accused | Categorical | Multiple factors |
| 14 | accu_age | Age of the accused | Numeric | Number |
| 15 | judgement | Judgement given in the case (target variable) | Categorical | Guilty=1, Not Guilty=0 |
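The ipc_* indicators and tot_sec in the feature table can be derived from the manually extracted "Section 1 through Section 10" fields. The sketch below is a minimal illustration, assuming each section was recorded as a string such as "420" or "120B", that unused slots are blank or missing, and that a section repeated across slots counts once; `encode_sections` and `TRACKED_SECTIONS` are hypothetical names, not taken from the original project.

```python
# Hypothetical list of the IPC sections tracked as binary features
TRACKED_SECTIONS = ["420", "120b", "471", "468", "34"]


def encode_sections(sections):
    """Turn one case's raw section fields into ipc_* indicators plus tot_sec.

    Assumes sections are strings like "420" or "120B"; blanks and None are
    skipped, case is ignored, and duplicate sections are counted once.
    """
    normalized = {s.strip().lower() for s in sections if s and s.strip()}
    row = {f"ipc_{sec}": int(sec in normalized) for sec in TRACKED_SECTIONS}
    row["tot_sec"] = len(normalized)
    return row


# Hypothetical case record: the ten section slots, with empty ones left blank
case_sections = ["420", "120B", "34", "", None]
print(encode_sections(case_sections))
```

Normalizing to lower case means "120B" and "120b" map to the same column, which matters when fields are typed by hand.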
Feature Importance
Exploratory Data Analysis
Guilty – 20, Non-Guilty – 182
Exploratory Data Analysis
Density Plot
Exploratory Data Analysis
IPC sections frequency
Exploratory Data Analysis
Correlation Matrix
Architecture
Development Methodology
Initial model development steps
1. Data Collection
2. Feed data to model
3. Prediction
Post-implementation steps
• Feedback
Model Development
• Logistic Regression
• K-Nearest Neighbor
• Random Forest
• Support Vector Machine
Model Development
Logistic Regression
• Pseudo R² = 45.4%: the full model improves on the log-likelihood of the intercept-only (null) model by about 45.4%
• The log-likelihood ratio test rejects the null hypothesis that all coefficients (β) are zero, so at least one coefficient is nonzero.
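The two statistics on this slide can be computed from log-likelihoods alone. The sketch below assumes the reported pseudo R² is McFadden's measure (1 − LL_full / LL_null) and, purely for illustration, that the model was fitted on all 202 observations; the slide states neither. It derives the intercept-only log-likelihood from the class counts on the EDA slide (20 Guilty, 182 Non-Guilty) and backs out the likelihood-ratio statistic that a 45.4% pseudo R² would imply. Function names are illustrative.

```python
import math


def null_log_likelihood(n_pos: int, n_neg: int) -> float:
    """Log-likelihood of an intercept-only logistic model.

    With only an intercept, every case gets the observed base rate as its
    fitted probability, so the log-likelihood depends only on the counts.
    """
    p = n_pos / (n_pos + n_neg)
    return n_pos * math.log(p) + n_neg * math.log(1 - p)


def mcfadden_r2(ll_full: float, ll_null: float) -> float:
    """McFadden's pseudo R-squared: 1 - LL_full / LL_null."""
    return 1 - ll_full / ll_null


def lr_statistic(ll_full: float, ll_null: float) -> float:
    """Likelihood-ratio statistic 2 * (LL_full - LL_null).

    Under the null hypothesis that all slope coefficients are zero, it is
    approximately chi-square distributed, with degrees of freedom equal to
    the number of slope coefficients.
    """
    return 2 * (ll_full - ll_null)


ll_null = null_log_likelihood(20, 182)  # 20 Guilty, 182 Non-Guilty
# Hypothetical: LL_full backed out from the reported 45.4%, assuming McFadden
ll_full = ll_null * (1 - 0.454)

print(f"LL_null = {ll_null:.2f}")
print(f"pseudo R2 = {mcfadden_r2(ll_full, ll_null):.3f}")
print(f"LR statistic = {lr_statistic(ll_full, ll_null):.1f}")
```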
Logistic Regression
Accuracy
• Training sample – 92.9%
• Validation sample – 88.5%
Variable Importance
K-Nearest Neighbor
Cross-Validation
10-fold cross-validation selected k = 7 as the best value.
From the results, accuracy and Kappa decrease after k = 5.
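The k search described on this slide can be sketched in plain Python. The slides do not show code (the argument names on the next slide suggest R's caret), so this is an illustrative stand-in, not the authors' implementation: it assumes numeric features and Euclidean distance, and the toy data is synthetic, not the judgement data. With real features, a column such as accu_age would need scaling so it does not dominate the 0/1 indicators in the distance.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cv_accuracy(X, y, k, folds=10, seed=0):
    """Accuracy of k-NN estimated by `folds`-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        held_out = idx[f::folds]  # every `folds`-th shuffled row forms one fold
        held_set = set(held_out)
        train = [i for i in idx if i not in held_set]
        for i in held_out:
            pred = knn_predict([X[j] for j in train], [y[j] for j in train], X[i], k)
            correct += pred == y[i]
    return correct / len(X)


def select_k(X, y, candidates, folds=10):
    """Return the k with the highest CV accuracy (ties go to the earliest) and all scores."""
    scores = {k: cv_accuracy(X, y, k, folds) for k in candidates}
    return max(scores, key=scores.get), scores


# Synthetic, well-separated toy data (not the judgement data)
X = [(i % 5, i // 5) for i in range(20)] + [(10 + i % 5, 10 + i // 5) for i in range(20)]
y = [0] * 20 + [1] * 20
best, scores = select_k(X, y, candidates=[1, 3, 5, 7, 9])
print(best, scores)
```

Using odd candidates for k avoids split votes in a two-class problem.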
K-Nearest Neighbor
The model was further tuned by setting summaryFunction = twoClassSummary and classProbs = TRUE in caret's trainControl.
The tuned model has a higher accuracy of 93.44%.
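caret's twoClassSummary reports the area under the ROC curve alongside sensitivity and specificity, which is why classProbs = TRUE is needed: the ROC curve is built from predicted class probabilities rather than hard labels. A minimal, library-free sketch of the AUC, computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count half); the scores below are made-up toy values, and `roc_auc` is an illustrative name, not a caret function.

```python
def roc_auc(scores, labels, positive=1):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Over every (positive, negative) pair, count how often the positive case
    has the higher score; a tie counts as half a win.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == positive]
    neg = [s for s, lab in zip(scores, labels) if lab != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: predicted probabilities of class 1 and the true labels
probs = [0.1, 0.4, 0.35, 0.8]
truth = [0, 0, 1, 1]
print(roc_auc(probs, truth))
```

The pairwise loop is O(positives × negatives), which is fine for a dataset of a few hundred rows; larger datasets would normally use a rank-based formula instead.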
Random Forest
Model parameters:
• ntree = 250, as the out-of-bag (OOB) error hardly changes after 250 trees
• mtry = 3, initially set to sqrt(total number of features)
• nodesize = 3, about 1% of the 202 total observations
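Each tree in a random forest is grown on a bootstrap sample of the rows, and the rows a tree never sees provide the out-of-bag (OOB) error estimate used above to choose ntree. Under the standard bootstrap (n draws with replacement), a given row is left out of one tree with probability (1 − 1/n)^n, which is close to 1/e ≈ 0.368. The sketch plugs in the slide's n = 202 and ntree = 250, and also evaluates the classification default mtry = floor(sqrt(p)) used by R's randomForest; p = 14 is a hypothetical stand-in (15 features minus the target), since the slide does not say how many predictors were used.

```python
import math


def oob_probability(n: int) -> float:
    """Probability that a given row is NOT drawn in a bootstrap sample of size n."""
    return (1 - 1 / n) ** n


def default_mtry(p: int) -> int:
    """Classification default for mtry in R's randomForest: floor(sqrt(p))."""
    return math.floor(math.sqrt(p))


n_obs, n_trees = 202, 250  # values from the slide
p_oob = oob_probability(n_obs)
print(f"P(row is out-of-bag for one tree) = {p_oob:.3f} (1/e = {1 / math.e:.3f})")
print(f"Expected trees contributing to each row's OOB prediction = {n_trees * p_oob:.0f}")

# Hypothetical: assumes 14 predictors (15 features minus the target)
print(f"default mtry for p = 14: {default_mtry(14)}")
```

With roughly 90 trees voting on each row's OOB prediction, the OOB error curve is already fairly stable by 250 trees, which is consistent with the choice on the slide.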
Random Forest
Cross-validation with parameter tuning over mtry = 2, 3 and 4
The tuned model has an accuracy of 93.55%.
Support Vector Machine
• The model found 41 support vectors with gamma = 0.017 and cost = 1
• SVM model accuracy: 90.02%
Support Vector Machine
10-fold cross-validation identified the best values as gamma = 0.1 and cost = 1.
The tuned model has an accuracy of 95.04%.
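Gamma and cost are the usual hyperparameters of an SVM with a radial basis function (RBF) kernel, K(x, x') = exp(−γ‖x − x'‖²). The slides do not name the kernel, so RBF is an assumption here (it is the default kernel in R's e1071::svm). The sketch shows how the untuned and tuned gamma values change the similarity between the same pair of points; the two vectors are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Two hypothetical feature vectors at squared distance 1 + 0 + 9 = 10
u = [0, 0, 0]
v = [1, 0, 3]

for gamma in (0.017, 0.1):  # untuned vs. cross-validated gamma from the slides
    print(f"gamma = {gamma}: K(u, v) = {rbf_kernel(u, v, gamma):.3f}")
```

A larger gamma makes similarity decay faster with distance, so each support vector influences a smaller region and the decision boundary can bend more; cost separately sets how heavily margin violations are penalized.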
Summary - Model Performance
Observation
o From the assessment of all the models, the Support Vector Machine gives the highest accuracy, with precision and recall close to the best values.
| Model | Accuracy (%) | Precision (%) | Recall (%) |
|---|---:|---:|---:|
| Decision Trees (Gini) | 82 | 82 | 97 |
| K-Nearest Neighbor | 93 | 93 | 100 |
| Logistic Regression | 88 | 96 | 91 |
| Naïve Bayes | 75 | 76 | 95 |
| Random Forest | 94 | 93 | 100 |
| Support Vector Machines | 95 | 94 | 99 |
Summary and Findings
• Support Vector Machine gives the best accuracy
• SVM and Gradient Boosting gave better precision and recall values
• The data is skewed towards Non-Guilty cases (182 of 202 observations, roughly 90 : 10 in favour of Non-Guilty)
• The model was developed on IPC 420 cases from multiple district and high courts
• The predictions obtained were mostly Non-Guilty
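Because about 90% of the observations are Non-Guilty, a model that always answers Non-Guilty already reaches about 90% accuracy, which puts the accuracies in the performance table in context. Cohen's Kappa, which the KNN tuning also reported, corrects accuracy for this chance agreement. The sketch computes accuracy, precision, recall and Kappa from a 2×2 confusion matrix. It treats Non-Guilty as the positive class, an assumption the slides do not state (it would fit the 100% recall figures for models that mostly predict Non-Guilty). The always-Non-Guilty baseline uses the real class counts from the EDA slide; `classification_metrics` is an illustrative name.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Chance agreement: P(both say positive) + P(both say negative),
    # from the predicted and actual class proportions.
    p_pred_pos, p_true_pos = (tp + fp) / n, (tp + fn) / n
    expected = p_pred_pos * p_true_pos + (1 - p_pred_pos) * (1 - p_true_pos)
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "kappa": kappa}


# Baseline that labels every case Non-Guilty (positive class = Non-Guilty):
# all 182 Non-Guilty cases are true positives, all 20 Guilty cases false positives.
baseline = classification_metrics(tp=182, fp=20, fn=0, tn=0)
for name, value in baseline.items():
    print(f"{name}: {value:.3f}")
```

The baseline scores about 0.901 accuracy and perfect recall but a Kappa of 0, so Kappa (or recall on the Guilty class) is a more telling comparison than accuracy alone on data this skewed.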
Plan Ahead
MOBILE APPLICATION
Q&A
Thanks

Predicting outcome of legal cases using machine learning algorithms, by Ankita Singh (Service Delivery Specialist, IBM India) and Nilutpal Goswami (Senior Manager, Capgemini), presented at CYPHER 2018

  • 1. PREDICTING OUTCOME OF LEGAL CASES Ankita Singh Nilutpal Goswami
  • 2. Ankita Singh Nilutpal Goswami Agenda Domain Objective Data Extraction EDA Architecture Model Development Q & ASummary and Findings 1 2 3 4 5 6 7 8
  • 3. Ankita Singh Nilutpal Goswami Legal Systems of the World Source – Citation (http://saint-claire.org/)
  • 4. Ankita Singh Nilutpal Goswami Hierarchy of Indian Judiciary Sources of Law § Constitution § Legislation • Ordinary • Delegated • Ordinance § Judicial Precedent § Customs
  • 5. Ankita Singh Nilutpal Goswami Background Indian Judicial System o Largest judicial machinery based on the biggest constitution o Constitution of India have 448 articles in 25 parts, 12 schedules, 5 appendices and 98 amendments o Indian Penal Code (IPC) defines various crimes/offences and prescribes the punishment o Criminal Procedure Code (CrPC) defines the mandatory procedures to be carried out pursuing a case o 24 High Courts and over 600 district courts o Nearly 5 lakhs cases are filled daily in Indian Courts o Approximately 4.5 lakhs of cases are put up before the courts daily o About 2.5 lakhs of cases are disposed off daily
  • 6. Ankita Singh Nilutpal Goswami Challenging Facts CASES PENDING CIVIL CASES CRIMINAL CASES TOTAL CASES PERCENTAGE > 10 years 597,166 1,691,515 2,288,681 8.28% Between 5 to 10 years 1,244,117 3,212,377 4,456,494 16.13% Between 2 to 5 years 2,542,925 5,394,015 7,936,940 28.73% < 2 years 3,946,341 8,997,935 12,944,276 46.85% Total Pending Cases 8,330,549 19,295,842 27,626,391 > 10 years 8% Between 5 to 10 years 16% Between 2 to 5 years 29% < 2 years 47% Other 76% Source – National Judicial Data Grid (as on September 18th 2018) v Case Disposal Rates (August 2018) § 10 years – 1.5 % § All cases – 3.8 % v Cases filed daily ~ 5- 8 Lakhs v Cases pending registration ~ 7.5 Lakhs v Has 15 judges for every 1 million of people v 22.2 million undertrials – undertrials outnumber the convicts
  • 7. Ankita Singh Nilutpal Goswami Faster processing of legal issues / cases “Judgement” data sourcing and understanding of the details Evaluating predictions based on various machine learning model Develop social value by means of streamlining the judicial case intake Objective
  • 8. Ankita Singh Nilutpal Goswami Sample Judgement document snapshot o Case Documents Analyzed – 120 o Data extraction mechanism – manual o Unique fields extracted – 58 o Total number of final observations - 202 Data • Nature of Disposal • Case Type • Court Number • Court Name • Judge • Judge Gender • Judgement Date • Total Number of Sections • Section 1 thru Section 10 • FIR Number/Year • Police station • Investigating officer • Case Number • Year • Complainant • Total Accused • Accused # • Accused Name • Accused Gender • Accused Age • Accused Confessed? (plea) • Date Of first Hearing • Complainant advocate • Prosecution advocate • Advocate Defendant • Number of Prosecution witnesses • Names of prosecution witnesses • PW's Examined? • Number of hostile witnesses • Defense witnesses • Charge sheet • Points for consideration • Exhibits on behalf of prosecution P series • Number of exhibits considered • Exhibits on behalf of court Cseries • Exhibits on behalf of accused Dseries • Total Number of Material Objects • Charges proved • Charges not proved • Issues Proved • Issues Not Proved • Accused released on bail • Accused committed to prison • Sentence of Imprisonment granted • Fine with Imprisonment (Rs) • Term Served in Prison(days) • Set off (if any) • Judgement • Citations Original Features
  • 9. Ankita Singh Nilutpal Goswami • Source – Publicly available judgement documents • Case Documents Analyzed – 120 • Data extraction mechanism – manual • Unique fields extracted – 58 • Consistent features identified -15 (Judgement decision is the Target Variable) • Total number of final observations - 202 Data # Feature Name Description Datatype Value 1 ipc_420 Binary indicator to confirm if the case is filed under IPC 420 Categorical Yes=1, No=0 2 ipc_120b Binary indicator to confirm if the case is filed under IPC 120b Categorical Yes=1, No=0 3 ipc_471 Binary indicator to confirm if the case is filed under IPC 471 Categorical Yes=1, No=0 4 ipc_468 Binary indicator to confirm if the case is filed under IPC 468 Categorical Yes=1, No=0 5 ipc_34 Binary indicator to confirm if the case is filed under IPC 34 Categorical Yes=1, No=0 6 jud_gender Gender of the judge presiding over the case Categorical Male=0, Female=1 7 jud_date Date when judgement was meted Date Date 8 tot_sec Total number of sections filed for the case Numeric Number 9 case_no Unique number of the case Categorical Multiple Factors 10 comp Complainant name * String Name 11 tot_accu Total number of accused presented in the case Numeric Number 12 accu_gender Gender of the individual accused Categorical Male=0, Female=1 13 accu_no Sequence number of the accused Categorical Multiple Factors 14 accu_age Age of the accused Numeric Number 15 judgement Judgement given in the case Categorical Guilty=1, Not Guilty=0
  • 10. Feature Importance
  • 11. Exploratory Data Analysis
    Class distribution: Guilty – 20, Non-Guilty – 182 (202 observations)
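The class counts above imply a useful reference point for the accuracies reported later: a model that always predicts Non-Guilty would already be right about 90.1% of the time (182 / 202). The sketch below computes the class shares and this majority-class baseline from the reported counts; it is an illustration added here, not part of the original analysis.

```python
from collections import Counter

# Illustrative sketch using the class counts reported on the EDA slide.
labels = ["Not Guilty"] * 182 + ["Guilty"] * 20

counts = Counter(labels)
total = sum(counts.values())
shares = {cls: round(100 * n / total, 1) for cls, n in counts.items()}

# A trivial classifier that always predicts the most frequent class.
majority_class, majority_count = counts.most_common(1)[0]
baseline_accuracy = majority_count / total

print(shares)  # {'Not Guilty': 90.1, 'Guilty': 9.9}
print(majority_class, round(baseline_accuracy * 100, 1))  # Not Guilty 90.1
```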
  • 12. Exploratory Data Analysis – Density Plot
  • 13. Exploratory Data Analysis – IPC Sections Frequency
  • 14. Exploratory Data Analysis – Correlation Matrix
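A correlation matrix like the one on this slide can be built from pairwise Pearson correlations; for two binary indicators such as ipc_420 and ipc_120b, Pearson correlation reduces to the phi coefficient. The slide does not say which correlation measure was used, so this is one plausible choice. The helpers pearson and correlation_matrix and the toy rows are hypothetical; only the column names come from the data dictionary.

```python
import math
from typing import Dict, List

# Illustrative sketch: helper names and data are hypothetical.


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; for two 0/1 columns this equals the phi coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when a column is constant
    return cov / (sx * sy)


def correlation_matrix(columns: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise correlations between all named columns."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}


# Hypothetical rows, using column names from the data dictionary.
columns = {
    "ipc_420": [1, 1, 1, 1, 0, 1],
    "ipc_120b": [1, 0, 1, 1, 0, 0],
    "accu_age": [34, 51, 29, 45, 38, 60],
    "judgement": [0, 0, 1, 0, 0, 0],
}
for name, row in correlation_matrix(columns).items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```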
  • 15. Architecture
  • 16. Development Methodology
    Initial model development steps
    1. Data collection
    2. Feed data to model
    3. Prediction
    Post-implementation steps
    • Feedback
  • 17. Model Development
    • Logistic Regression
    • K-Nearest Neighbor
    • Random Forest
    • Support Vector Machine
  • 18. Model Development
  • 19. Logistic Regression
    • Pseudo R-squared – 45.4%: the full model improves on the log-likelihood of the intercept-only model by 45.4%.
    • The log-likelihood ratio test rejects the null hypothesis that all coefficients (betas) are zero, so at least one beta is nonzero.
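The slide does not name the pseudo R-squared it reports. A common choice is McFadden's, R² = 1 − ln L_full / ln L_null, and the likelihood-ratio statistic is 2(ln L_full − ln L_null), compared against a chi-squared distribution with as many degrees of freedom as there are predictors. The sketch below computes both. The null log-likelihood follows from the reported 20 / 182 class counts, assuming the model was fitted on all 202 observations; the full-model log-likelihood must come from the fitted model and is not reproduced here. All function names are hypothetical.

```python
import math

# Illustrative sketch: function names are hypothetical helpers.


def null_log_likelihood(n_pos: int, n_neg: int) -> float:
    """Log-likelihood of an intercept-only logistic model.

    Its fitted probability is the sample proportion p = n_pos / n,
    so ln L_null = n_pos * ln(p) + n_neg * ln(1 - p).
    """
    n = n_pos + n_neg
    p = n_pos / n
    return n_pos * math.log(p) + n_neg * math.log(1 - p)


def mcfadden_r2(ll_full: float, ll_null: float) -> float:
    """McFadden's pseudo R-squared: 1 - ln L_full / ln L_null."""
    return 1.0 - ll_full / ll_null


def lr_statistic(ll_full: float, ll_null: float) -> float:
    """Likelihood-ratio statistic 2 * (ln L_full - ln L_null)."""
    return 2.0 * (ll_full - ll_null)


# Null model for the reported class counts (20 Guilty, 182 Non-Guilty).
ll_null = null_log_likelihood(20, 182)
print(round(ll_null, 2))
```

A larger gap between ln L_full and ln L_null raises both quantities, which is why a significant likelihood-ratio test and a moderate pseudo R-squared usually go together.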
  • 20. Logistic Regression
    Accuracy
    • Training sample – 92.9%
    • Validation sample – 88.5%
    Variable Importance
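With only 20 Guilty observations, the way the training and validation samples are drawn affects how stable the validation accuracy is. The slides do not state the split ratio or method; the sketch below shows a stratified split that keeps the Guilty / Non-Guilty ratio similar in both samples, using a hypothetical 30% validation fraction and a hypothetical helper stratified_split.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Illustrative sketch: the helper and the 30% fraction are hypothetical.


def stratified_split(
    labels: Sequence[int], val_fraction: float = 0.3, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, val_indices), sampling each class separately."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train: List[int] = []
    val: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)  # per-class validation size
        val.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(val)


# Labels coded as in the data dictionary: Guilty = 1, Not Guilty = 0.
labels = [0] * 182 + [1] * 20
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 141 61
print(sum(labels[i] for i in val_idx))  # 6
```

An unstratified random split of this data could easily leave only two or three Guilty cases in the validation sample, which makes its accuracy more sensitive to individual predictions.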
  • 21. K-Nearest Neighbor
    Cross-Validation
    • 10-fold cross-validation gave the best value with k = 7.
    • From the results, Accuracy and Kappa decrease after k = 5.
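Choosing k by cross-validation follows a simple pattern: for each candidate k, average the accuracy over 10 held-out folds and keep the best. The slides' models appear to have been built in R with caret, so the pure-Python sketch below only illustrates the procedure. The helpers knn_predict, cv_accuracy and select_k, the toy data and the candidate k values are all hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Illustrative sketch: helper names and toy data are hypothetical.
Example = Tuple[Sequence[float], int]  # (feature vector, class label)


def knn_predict(train: List[Example], x: Sequence[float], k: int) -> int:
    """Majority vote among the k training points closest to x (Euclidean)."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cv_accuracy(data: List[Example], k: int, folds: int = 10, seed: int = 0) -> float:
    """Mean k-NN accuracy over `folds` folds of a shuffled copy of data."""
    if len(data) < folds:
        raise ValueError("need at least one example per fold")
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    scores = []
    for f in range(folds):
        # Examples at positions f, f + folds, f + 2*folds, ... are held out.
        test = [ex for i, ex in enumerate(shuffled) if i % folds == f]
        train = [ex for i, ex in enumerate(shuffled) if i % folds != f]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / folds


def select_k(
    data: List[Example], candidates: Sequence[int], folds: int = 10
) -> Tuple[int, Dict[int, float]]:
    """Cross-validate each candidate k; return the best k and all scores."""
    results = {k: cv_accuracy(data, k, folds) for k in candidates}
    best_k = max(results, key=lambda k: results[k])
    return best_k, results


# Hypothetical toy data: two well-separated 2-D clusters.
rng = random.Random(1)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(40)]
data += [((rng.gauss(4, 1), rng.gauss(4, 1)), 1) for _ in range(40)]
best_k, results = select_k(data, candidates=[1, 3, 5, 7, 9])
print(best_k, round(results[best_k], 3))
```

Odd values of k avoid tied votes in a two-class problem such as Guilty versus Non-Guilty.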
  • 22. K-Nearest Neighbor
    The model was further tuned by setting summaryFunction = twoClassSummary and classProbs = TRUE. The tuned model has a better accuracy of 93.44%.
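caret's twoClassSummary reports the area under the ROC curve together with sensitivity and specificity, and it needs classProbs = TRUE because the ROC curve is computed from predicted class probabilities rather than hard labels. As an illustration of what is being measured (not caret's implementation), the sketch below computes the AUC as the probability that a randomly chosen Guilty case scores higher than a randomly chosen Non-Guilty case, counting ties as one half. The helpers roc_auc and sens_spec, the 0.5 threshold and the example probabilities are hypothetical.

```python
from typing import Dict, Sequence

# Illustrative sketch: helper names and example scores are hypothetical.


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparisons (Mann-Whitney form).

    labels: 1 = Guilty (positive), 0 = Not Guilty; scores: predicted P(Guilty).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def sens_spec(
    scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5
) -> Dict[str, float]:
    """Sensitivity and specificity after thresholding the probabilities."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return {"sens": tp / n_pos, "spec": tn / n_neg}


# Hypothetical predicted probabilities of Guilty for four cases.
scores = [0.8, 0.3, 0.5, 0.2]
labels = [1, 1, 0, 0]
print(roc_auc(scores, labels), sens_spec(scores, labels))  # 0.75 {'sens': 0.5, 'spec': 0.5}
```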
  • 23. Random Forest
    Model parameters
    • ntree = 250, as the OOB error hardly changes beyond 250 trees
    • mtry = 3, initially set to sqrt(total number of features)
    • nodesize = 3, about 1% of the total observations (202 observations)
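The parameter values on this slide can be reproduced with simple arithmetic, under two assumptions the slide does not state: that "total number of features" means the 14 predictors left after removing the target from the 15 consistent features, and that 1% of 202 observations (2.02) was rounded up to a whole number of rows. R's randomForest package uses floor(sqrt(p)) as its default mtry for classification, which gives the same value.

```python
import math

# Illustrative arithmetic; the predictor count and rounding rule are assumptions.
n_observations = 202
n_predictors = 15 - 1  # assumed: 15 consistent features minus the target

# Default mtry for classification in R's randomForest: floor(sqrt(p)).
mtry = math.floor(math.sqrt(n_predictors))

# "1% of the total observations", assumed to be rounded up.
nodesize = math.ceil(0.01 * n_observations)

print(mtry, nodesize)  # 3 3
```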
  • 24. Random Forest
    Cross-validation with parameter tuning over mtry = 2, 3 and 4. The tuned model has an accuracy of 93.55%.
  • 25. Support Vector Machine
    • The model found 41 support vectors with gamma = 0.017 and cost = 1
    • SVM model accuracy – 90.02%
  • 26. Support Vector Machine
    10-fold cross-validation identified the best values as gamma = 0.1 and cost = 1. The tuned model has an accuracy of 95.04%.
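The slides do not name the SVM kernel. Assuming the radial basis function (RBF) kernel, which is the default in libsvm-based tools such as R's e1071, gamma controls how quickly similarity decays with distance: K(x, z) = exp(−γ‖x − z‖²). The sketch below compares the untuned (0.017) and tuned (0.1) gamma values for the same pair of points; cost penalizes margin violations and does not enter the kernel itself. The helper rbf_kernel and the two feature vectors are hypothetical, and in practice the inputs are usually standardized first so that a feature like age does not dominate the distance.

```python
import math
from typing import Sequence

# Illustrative sketch: the helper and vectors are hypothetical; RBF kernel assumed.


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """RBF kernel exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


x = [1.0, 0.0, 1.0, 35.0]  # hypothetical encoded observation
z = [1.0, 1.0, 0.0, 40.0]  # another hypothetical observation

for gamma in (0.017, 0.1):
    print(gamma, round(rbf_kernel(x, z, gamma), 4))
```

A larger gamma makes each support vector's influence more local, so the tuned value of 0.1 yields a more flexible decision boundary than 0.017.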
  • 27. Summary – Model Performance
    Observation
    o Across all models assessed, the Support Vector Machine gives the highest accuracy (95%), and its precision (94%) and recall (99%) are among the highest.

    | Model | Accuracy (%) | Precision (%) | Recall (%) |
    |---|---|---|---|
    | Decision Trees (Gini) | 82 | 82 | 97 |
    | K-Nearest Neighbor | 93 | 93 | 100 |
    | Logistic Regression | 88 | 96 | 91 |
    | Naïve Bayes | 75 | 76 | 95 |
    | Random Forest | 94 | 93 | 100 |
    | Support Vector Machines | 95 | 94 | 99 |
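Precision and recall depend on which class is treated as positive, and the table does not say. The sketch below computes accuracy, precision and recall from predictions for either choice, using hypothetical counts for a validation sample in which the model labels every Non-Guilty case correctly but catches only 2 of 6 Guilty cases. With Not Guilty as the positive class, recall is 100%; with Guilty as the positive class, the same predictions give a recall of only 33%. The helper metrics and all counts are hypothetical.

```python
from typing import Dict, Sequence

# Illustrative sketch: the helper and the counts below are hypothetical.


def metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int) -> Dict[str, float]:
    """Accuracy, precision and recall with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical validation sample: 55 Not Guilty (0) and 6 Guilty (1) cases.
y_true = [0] * 55 + [1] * 6
y_pred = [0] * 55 + [1] * 2 + [0] * 4

print(metrics(y_true, y_pred, positive=0))  # Not Guilty as the positive class
print(metrics(y_true, y_pred, positive=1))  # Guilty as the positive class
```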
  • 28. Summary and Findings
    • Support Vector Machine provides the best accuracy
    • Better precision and recall values were obtained from SVM and Gradient Boosting
    • The data is skewed towards Non-Guilty cases (89 : 11 in favor of Non-Guilty)
    • The model was developed on IPC 420 cases found across multiple District / High Courts
    • The predictions obtained were mostly Non-Guilty
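The finding that predictions were mostly Non-Guilty is easier to quantify with a metric that weights both classes equally. Balanced accuracy, the mean of the recall on each class, scores a predictor that always answers Non-Guilty at 50%, even though its plain accuracy on the 182 : 20 data would be about 90%. This metric was not part of the original analysis; the sketch below, with a hypothetical helper balanced_accuracy, is offered as one way to check the effect of the skew.

```python
from typing import Sequence

# Illustrative sketch: balanced_accuracy is a hypothetical helper.


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean per-class recall, so each class counts equally regardless of size."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Reported class counts: Not Guilty = 0 (182 cases), Guilty = 1 (20 cases).
y_true = [0] * 182 + [1] * 20
always_not_guilty = [0] * len(y_true)

print(balanced_accuracy(y_true, always_not_guilty))  # 0.5
```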
  • 29. Plan Ahead
    Mobile application
  • 30. Q&A – Thanks