3rd INTERNATIONAL CONFERENCE ON SUSTAINABLE
TECHNOLOGIES FOR INDUSTRY 4.0
Date: 18 - 19 December 2021
Natural Language Query to SQL conversion
using Machine Learning Approach
Minhazul Arefin, Kazi Mojammel Hossen and Mohammed Nasir Uddin
Overview
Introduction
Problem Description
Objective of this Paper
Proposed Methodology
Result
Conclusion
Future Works
Natural Language
• Natural language, or ordinary language, is any language that has evolved naturally among humans.
• It can take different forms, such as:
  ◦ Speech
  ◦ Signing
Structured Query Language
• SQL stands for Structured Query Language.
• It is used to communicate with a database.
• Standard SQL commands:
  ◦ Select
  ◦ Insert
  ◦ Update
  ◦ Delete
Introduction
[Figure: Natural Language Processing (NLP), Structured Query Language (SQL), Machine Learning Algorithms]
• NLIDB stands for Natural Language Interface to Database.
Problem Description
• Asking questions in natural language is a convenient and easy way to get answers from a database.
• For non-expert users, the natural language query must be translated into Structured Query Language (SQL).
• Filling in a web form with many fields can be tedious: the user has to navigate the screen, scroll, and look up the scroll box values.
Objective
The main objectives of this research work are:
• To provide algorithms for converting natural language to Structured Query Language (SQL).
• To propose a general framework for efficient processing of natural language queries.
• To extract information from the database.
Contributions
The main contributions of this research work are:
• Designing algorithms for this machine translation system.
• Implementing the proposed translation algorithm and comparing the performance of our approach with state-of-the-art works. Our findings show that a machine learning approach can outperform the existing systems we compared against.
• Using simple algorithms, which improves performance and reduces time complexity.
Methodology
1. Text Preprocessing
• Tokenization
• Escape words
• Part-of-speech tagger
• Word similarity
1.1. Tokenization
• Tokenization is the process of converting a sequence of characters into a sequence of tokens.
• The tokenize function performs the following steps:
  ◦ treats most punctuation characters as separate tokens
  ◦ splits off commas and single quotes when they are followed by whitespace
  ◦ separates periods that appear at the end of a line
Input text: “get names of all students”
Output after tokenization:
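The tokenization step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slides do not name the tokenizer, so the regular expression `TOKEN_PATTERN` and the function `tokenize` are assumptions.

```python
import re

# Hypothetical pattern (not from the slides): a token is either a word,
# optionally with an internal apostrophe, or a single punctuation character.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split text into word tokens and separate punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize("get names of all students")
print(tokens)
```

Whitespace is discarded and each punctuation mark becomes its own token, which mirrors the first step listed above.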
1.2. Escape Words
• The escape-word list is the set of unnecessary words that can occur in the given text; these words are removed.
• It mainly contains:
  ◦ Auxiliary verbs
  ◦ Articles
Input from tokenization step:
Output after removing escape words:
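A minimal sketch of escape-word removal is shown below. The slides say only that the list mainly contains auxiliary verbs and articles, so the contents of `ESCAPE_WORDS` and the example sentence are assumptions chosen for illustration.

```python
# Hypothetical escape-word list: articles plus common auxiliary verbs.
# The actual list used by the authors is not shown in the slides.
ESCAPE_WORDS = {
    "a", "an", "the",
    "am", "is", "are", "was", "were", "be", "been",
    "do", "does", "did", "have", "has", "had",
    "can", "could", "will", "would", "shall", "should", "may", "might", "must",
}


def remove_escape_words(tokens: list[str]) -> list[str]:
    """Drop every token that appears in the escape-word list (case-insensitive)."""
    return [token for token in tokens if token.lower() not in ESCAPE_WORDS]


filtered = remove_escape_words(["show", "the", "students", "who", "are", "enrolled"])
print(filtered)
```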
1.3. Part-Of-Speech Tagger
• Part-of-speech (PoS) tagging is used to classify words into their parts of speech.
• Input from tokenization step:
• Output after part-of-speech tagging:
Here,
VB -> verb, base form
NNS -> noun, common, plural
DT -> determiner, article
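To make the tag legend concrete, here is a toy lookup-based tagger. It is not a trained part-of-speech tagger and is not the tagger used in the slides: `LEXICON` is a hypothetical hand-written word list, and the fallback rule is a crude guess. It also emits two Penn Treebank tags that are not in the legend above: IN (preposition) and NN (singular common noun).

```python
# Hypothetical mini-lexicon mapping a few words to Penn Treebank tags.
LEXICON = {
    "get": "VB", "show": "VB", "list": "VB",
    "all": "DT", "the": "DT", "a": "DT",
    "of": "IN",
}


def pos_tag(tokens: list[str]) -> list[tuple[str, str]]:
    """Tag each token by lexicon lookup, falling back to a plural-noun guess."""
    tagged = []
    for token in tokens:
        word = token.lower()
        if word in LEXICON:
            tag = LEXICON[word]
        elif word.endswith("s"):
            tag = "NNS"  # crude guess: a trailing "s" marks a plural noun
        else:
            tag = "NN"   # default: singular common noun
        tagged.append((token, tag))
    return tagged


tagged = pos_tag(["get", "names", "of", "all", "students"])
print(tagged)
```

A real system would use a statistical tagger trained on annotated text; the lookup table only illustrates the shape of the output.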
1.4. Word Similarity
• In this step, we get all the synonyms of every word that remains after the escape words are removed from the given text.
• For word similarity, we use the WordNet database.
• For example, all synonyms of “phone”:
• For example, the similarity between “telephone” and “phone”:
2. Attribute Extraction
• In this section, we first get the synonyms of the words from the tokenization step.
• Then we match each word against the attribute names using the Jaro-Winkler algorithm.
• It computes the similarity between two strings, and the returned value lies in the interval [0.0, 1.0].
• The similarity is computed as:
$$sim_w = sim_j + l\,p\,(1 - sim_j)$$
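The Jaro-Winkler formula above can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions: the scaling factor `p = 0.1` is the commonly used default and is not given in the slides, and the column names, the `threshold` value, and the helper `best_attribute` are hypothetical.

```python
from typing import Optional


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: matching characters within a window, minus transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two equal characters match only if they are at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order in the two strings.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2  # number of transpositions
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """sim_w = sim_j + l * p * (1 - sim_j), with l capped at max_prefix."""
    sim_j = jaro(s1, s2)
    l = 0
    for a, b in zip(s1, s2):
        if a != b or l == max_prefix:
            break
        l += 1
    return sim_j + l * p * (1 - sim_j)


def best_attribute(word: str, columns: list[str], threshold: float = 0.85) -> Optional[str]:
    """Return the column most similar to `word`, or None if nothing reaches the threshold."""
    best = max(columns, key=lambda column: jaro_winkler(word, column))
    return best if jaro_winkler(word, best) >= threshold else None


# Hypothetical attribute names of a students table.
print(best_attribute("names", ["name", "address", "phone"]))
```

The threshold decides how close a word must be to an attribute name before it is accepted; a suitable value would have to be tuned on real queries.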
Input text:
Output after attribute extraction:
2. Attribute Extraction (continued)
Input text: “get all telephone number, address & name of the students”
Output after attribute extraction:
3. Table Extraction
• This step runs only if the previous step extracts no attributes from the given text.
• First, it finds all table names in the existing database.
• Then it proceeds to the next step.
• For example:
Input text: “show all”
Output after table extraction:
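Finding all table names in an existing database can be sketched as below. The slides do not say which database system is used, so SQLite (through Python's built-in `sqlite3` module) and the example tables `students` and `teachers` are assumptions.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all user tables, read from SQLite's schema table."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Hypothetical in-memory database with two example tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (names TEXT, address TEXT, phone TEXT)")
conn.execute("CREATE TABLE teachers (names TEXT)")

tables = list_tables(conn)
print(tables)
```

Other database systems expose the same information through their own catalog, for example `information_schema.tables`.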
4. Command Extraction
• Here we use a Naive Bayes classifier to detect the SQL command.
• Using Bayes' theorem, the conditional probability can be written as:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
• In our case, suppose we want P(select | get names of all students). Using this theorem, the conditional probability is:
$$P(\text{select} \mid \text{get names of all students}) = \frac{P(\text{get names of all students} \mid \text{select})\,P(\text{select})}{P(\text{get names of all students})}$$
Input text: “get names of all students”
Output command:
Result of command extraction (values in %):
Sentence                     Result   Select   Insert   Delete   Update
Get names of all students    Select   86.97    0.26     12.36    0.39
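The classification step can be illustrated with a small multinomial Naive Bayes classifier written from scratch. This is a sketch under stated assumptions: the training sentences in `TRAINING` are hypothetical (the slides do not show the training data), Laplace (add-one) smoothing is an added choice, and the probabilities it produces will not match the table above.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training sentences labeled with their SQL command.
TRAINING = [
    ("show all students", "select"),
    ("get the address of teachers", "select"),
    ("list names of courses", "select"),
    ("add a new student", "insert"),
    ("insert a teacher record", "insert"),
    ("remove the student named rana", "delete"),
    ("delete all old records", "delete"),
    ("change the address of imran", "update"),
    ("update the phone number", "update"),
]


def train(samples):
    """Count sentences per class and words per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sentence, label in samples:
        words = sentence.lower().split()
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def posteriors(sentence, model):
    """Return P(command | sentence) for every command via Bayes' theorem."""
    class_counts, word_counts, vocab = model
    total_sentences = sum(class_counts.values())
    log_scores = {}
    for label, n_sentences in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_sentences / total_sentences)  # log prior P(command)
        for word in sentence.lower().split():
            # Laplace-smoothed log likelihood P(word | command)
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Dividing by the sum plays the role of the denominator P(sentence).
    top = max(log_scores.values())
    unnormalized = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {label: value / total for label, value in unnormalized.items()}


model = train(TRAINING)
probs = posteriors("get names of all students", model)
print(max(probs, key=probs.get), probs)
```

Working in log space avoids underflow when many small word probabilities are multiplied together.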
5. Condition Extraction
• We use a decision tree classifier to extract the condition from the given input.
• It finds the specific condition appropriate for the given input text.
6. Query Generation
• In this step we start to build the query.
• Query template: Operation Attributes FROM Table Name WHERE Condition
Input text: “get names of all students”
Output after query generation:
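Assembling a query from the extracted parts can be sketched as follows. The function `build_select` and its identifier check are hypothetical and cover only the Select command; falling back to `*` when no attributes were extracted is an assumption about how the table extraction case is handled.

```python
import re
from typing import Optional

# Only plain identifiers are accepted, so extracted names cannot inject SQL.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_select(attributes: list[str], table: str, condition: Optional[str] = None) -> str:
    """Fill the template: SELECT attributes FROM table [WHERE condition]."""
    for name in [*attributes, table]:
        if not IDENTIFIER.match(name):
            raise ValueError(f"invalid identifier: {name!r}")
    columns = ", ".join(attributes) if attributes else "*"
    query = f"SELECT {columns} FROM {table}"
    if condition:
        # The condition is inserted as-is here; real code should bind values as parameters.
        query += f" WHERE {condition}"
    return query


print(build_select(["names"], "students"))
```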
7. Executing the code
• In this step we run the SQL query produced by the query generation step.
Input text: “get names of all students”
Output after query generation:
Input text: “get all phone number, address, name of the students”
Output after query generation:
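Running a generated query can be sketched with Python's built-in `sqlite3` module. The choice of SQLite, the single-column schema of `students`, and the helper `run_query` are assumptions; the five names are the ones shown on the Result slide.

```python
import sqlite3

# Hypothetical in-memory database holding the names from the Result slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (names TEXT)")
conn.executemany(
    "INSERT INTO students (names) VALUES (?)",
    [("Jakir",), ("Minhaz",), ("Jisan",), ("Rana",), ("Imran",)],
)


def run_query(connection: sqlite3.Connection, query: str) -> list[tuple]:
    """Execute a generated SQL query and return all result rows."""
    return connection.execute(query).fetchall()


rows = run_query(conn, "SELECT names FROM students")
names = [row[0] for row in rows]
print(", ".join(names))
```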
Result
Input text: “get names of all students”
Output after building query:
Input text: “SELECT names FROM students”
Output after running query:
Jakir, Minhaz, Jisan, Rana, Imran
Comparison
Sl.  Model              Accuracy (%)  Error Rate (%)  Run Time (s)
1    Generic Model      73.14         26.86           5.29
2    NLIDB for RDBMS    83.6          16.4            7.8
3    Our Study          88.17         11.83           2.929
• We mainly focus on attributes to build an SQL query.
Conclusion
• This research has a substantial impact on the Fourth Industrial Revolution.
• Every automation system or IoT device has a data store, which can be manipulated via plain text.
• Furthermore, an existing database can be used as a knowledge base for AI-powered chatbots.
• These chatbots may be used as virtual assistants.
Future Works
In the future, we intend to:
• support queries that join tables
• develop a more efficient algorithm using other mechanisms for better performance
• provide a deep learning solution for this problem
Thanks!
Any questions?
Why use Naive Bayes
▷ Relatively few training samples are sufficient for training with the Naive Bayes algorithm.
▷ Bias-variance tradeoff: spam/sentiment-type data are often noisy and usually high-dimensional (more predictors than samples, n ≪ p). The naive assumption that predictors are independent of one another is a strong, high-bias one.
▷ By assuming independence of the predictors, we are saying that the covariance matrix of our model has non-zero entries only on the diagonal.
Why use Jaro-Winkler?
▷ Jaro-Winkler gives a matching score between 0.0 and 1.0.
▷ The Jaro algorithm measures the characters two strings have in common, where matching characters may be no farther apart than half the length of the longer string, with consideration for transpositions.
▷ It gives high accuracy.
Jaro-Winkler
• Here we use the Jaro-Winkler algorithm for attribute extraction.
• We match all similar words against the attributes using the Jaro-Winkler algorithm and detect the attributes needed for the specific query.
• Jaro-Winkler is a string edit distance that was developed in the area of record linkage (duplicate detection).
• It computes the similarity between two strings, and the returned value lies in the interval [0.0, 1.0].
• The similarity is computed as:
$$sim_w = sim_j + l\,p\,(1 - sim_j)$$
Where:
  ◦ $sim_j$ is the Jaro similarity of the given strings $s_1$ and $s_2$
  ◦ $l$ is the length of the common prefix at the start of the strings, up to a maximum of four characters
  ◦ $p$ is a constant scaling factor for how much the score is adjusted upwards for having common prefixes
  ◦ The Jaro-Winkler distance $d_w$ is defined as $d_w = 1 - sim_w$
37

More Related Content

What's hot

Presentation Slides of College Management System Report
Presentation Slides of College Management System ReportPresentation Slides of College Management System Report
Presentation Slides of College Management System Report
MuhammadHusnainRaza
 
Liit tyit sem 5 enterprise java unit 5 important questins with solutions for ...
Liit tyit sem 5 enterprise java unit 5 important questins with solutions for ...Liit tyit sem 5 enterprise java unit 5 important questins with solutions for ...
Liit tyit sem 5 enterprise java unit 5 important questins with solutions for ...
tanujaparihar
 
Group By, Order By, and Aliases in SQL
Group By, Order By, and Aliases in SQLGroup By, Order By, and Aliases in SQL
Group By, Order By, and Aliases in SQL
MSB Academy
 
Software Development Methodologies
Software Development MethodologiesSoftware Development Methodologies
Software Development Methodologies
Nicholas Davis
 
Software myths | Software Engineering Notes
Software myths | Software Engineering NotesSoftware myths | Software Engineering Notes
Software myths | Software Engineering Notes
Navjyotsinh Jadeja
 
Placement Cell project
Placement Cell projectPlacement Cell project
Placement Cell projectManish Kumar
 
Online Examination System in .NET & DB2
Online Examination System in .NET & DB2Online Examination System in .NET & DB2
Online Examination System in .NET & DB2
Abhay Ananda Shukla
 
Full report on blood bank management system
Full report on  blood bank management systemFull report on  blood bank management system
Full report on blood bank management system
Jawhar Ali
 
Traning and placement management system
Traning and placement management systemTraning and placement management system
Traning and placement management system
riteshitechnosoft
 
OOP Poster Presentation
OOP Poster PresentationOOP Poster Presentation
OOP Poster Presentation
Md Mofijul Haque
 
Java Thread Synchronization
Java Thread SynchronizationJava Thread Synchronization
Java Thread SynchronizationBenj Del Mundo
 
Software engineering a practitioners approach 8th edition pressman solutions ...
Software engineering a practitioners approach 8th edition pressman solutions ...Software engineering a practitioners approach 8th edition pressman solutions ...
Software engineering a practitioners approach 8th edition pressman solutions ...
Drusilla918
 
Quizz app By Raihan Sikdar
Quizz app By Raihan SikdarQuizz app By Raihan Sikdar
Quizz app By Raihan Sikdar
raihansikdar
 
Final project report
Final project reportFinal project report
Final project reportssuryawanshi
 
Course registration system dfd
Course registration system dfdCourse registration system dfd
Course registration system dfd
Utsav mistry
 
Project report on online examination system
Project report on online examination systemProject report on online examination system
Project report on online examination system
Mo Irshad Ansari
 
Software Engineering Fundamentals
Software Engineering FundamentalsSoftware Engineering Fundamentals
Software Engineering Fundamentals
Rahul Sudame
 
14.project online eamination system
14.project online eamination system14.project online eamination system
14.project online eamination system
jbpatel7290
 
KaGemCo - Mobile Recharge System
KaGemCo - Mobile Recharge SystemKaGemCo - Mobile Recharge System
KaGemCo - Mobile Recharge System
Panos Gemos
 
A generic view of software engineering
A generic view of software engineeringA generic view of software engineering
A generic view of software engineering
Inocentshuja Ahmad
 

What's hot (20)

Presentation Slides of College Management System Report
Presentation Slides of College Management System ReportPresentation Slides of College Management System Report
Presentation Slides of College Management System Report
 
Liit tyit sem 5 enterprise java unit 5 important questins with solutions for ...
Liit tyit sem 5 enterprise java unit 5 important questins with solutions for ...Liit tyit sem 5 enterprise java unit 5 important questins with solutions for ...
Liit tyit sem 5 enterprise java unit 5 important questins with solutions for ...
 
Group By, Order By, and Aliases in SQL
Group By, Order By, and Aliases in SQLGroup By, Order By, and Aliases in SQL
Group By, Order By, and Aliases in SQL
 
Software Development Methodologies
Software Development MethodologiesSoftware Development Methodologies
Software Development Methodologies
 
Software myths | Software Engineering Notes
Software myths | Software Engineering NotesSoftware myths | Software Engineering Notes
Software myths | Software Engineering Notes
 
Placement Cell project
Placement Cell projectPlacement Cell project
Placement Cell project
 
Online Examination System in .NET & DB2
Online Examination System in .NET & DB2Online Examination System in .NET & DB2
Online Examination System in .NET & DB2
 
Full report on blood bank management system
Full report on  blood bank management systemFull report on  blood bank management system
Full report on blood bank management system
 
Traning and placement management system
Traning and placement management systemTraning and placement management system
Traning and placement management system
 
OOP Poster Presentation
OOP Poster PresentationOOP Poster Presentation
OOP Poster Presentation
 
Java Thread Synchronization
Java Thread SynchronizationJava Thread Synchronization
Java Thread Synchronization
 
Software engineering a practitioners approach 8th edition pressman solutions ...
Software engineering a practitioners approach 8th edition pressman solutions ...Software engineering a practitioners approach 8th edition pressman solutions ...
Software engineering a practitioners approach 8th edition pressman solutions ...
 
Quizz app By Raihan Sikdar
Quizz app By Raihan SikdarQuizz app By Raihan Sikdar
Quizz app By Raihan Sikdar
 
Final project report
Final project reportFinal project report
Final project report
 
Course registration system dfd
Course registration system dfdCourse registration system dfd
Course registration system dfd
 
Project report on online examination system
Project report on online examination systemProject report on online examination system
Project report on online examination system
 
Software Engineering Fundamentals
Software Engineering FundamentalsSoftware Engineering Fundamentals
Software Engineering Fundamentals
 
14.project online eamination system
14.project online eamination system14.project online eamination system
14.project online eamination system
 
KaGemCo - Mobile Recharge System
KaGemCo - Mobile Recharge SystemKaGemCo - Mobile Recharge System
KaGemCo - Mobile Recharge System
 
A generic view of software engineering
A generic view of software engineeringA generic view of software engineering
A generic view of software engineering
 

Similar to Natural Language Query to SQL conversion using Machine Learning Approach

A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...
A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...
A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...
Lola Burgueño
 
NLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsNLP and Deep Learning for non_experts
NLP and Deep Learning for non_experts
Sanghamitra Deb
 
Introduction to C ++.pptx
Introduction to C ++.pptxIntroduction to C ++.pptx
Introduction to C ++.pptx
VAIBHAVKADAGANCHI
 
5_RNN_LSTM.pdf
5_RNN_LSTM.pdf5_RNN_LSTM.pdf
5_RNN_LSTM.pdf
FEG
 
Generating test cases using UML Communication Diagram
Generating test cases using UML Communication Diagram Generating test cases using UML Communication Diagram
Generating test cases using UML Communication Diagram
Praveen Penumathsa
 
Java-Intro.pptx
Java-Intro.pptxJava-Intro.pptx
Java-Intro.pptx
VijalJain3
 
Introduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKennaIntroduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKenna
openseesdays
 
Introduction to oop
Introduction to oop Introduction to oop
Introduction to oop Kumar
 
An LSTM-Based Neural Network Architecture for Model Transformations
An LSTM-Based Neural Network Architecture for Model TransformationsAn LSTM-Based Neural Network Architecture for Model Transformations
An LSTM-Based Neural Network Architecture for Model Transformations
Jordi Cabot
 
Programming in java basics
Programming in java  basicsProgramming in java  basics
Programming in java basics
LovelitJose
 
Lec1.ppt
Lec1.pptLec1.ppt
Lec1.ppt
ssuser8bddb2
 
Facial Emotion Detection on Children's Emotional Face
Facial Emotion Detection on Children's Emotional FaceFacial Emotion Detection on Children's Emotional Face
Facial Emotion Detection on Children's Emotional Face
Takrim Ul Islam Laskar
 
240318_JW_labseminar[Attention Is All You Need].pptx
240318_JW_labseminar[Attention Is All You Need].pptx240318_JW_labseminar[Attention Is All You Need].pptx
240318_JW_labseminar[Attention Is All You Need].pptx
thanhdowork
 
Data structure Unit-I Part A
Data structure Unit-I Part AData structure Unit-I Part A
Data structure Unit-I Part A
SSN College of Engineering, Kalavakkam
 
Deep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep FeaturesDeep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep FeaturesTuri, Inc.
 
RAJAT PROJECT.pptx
RAJAT PROJECT.pptxRAJAT PROJECT.pptx
RAJAT PROJECT.pptx
SayedMohdAsim2
 
Searching Algorithms
Searching AlgorithmsSearching Algorithms
Searching Algorithms
Afaq Mansoor Khan
 
Rui Meng - 2017 - Deep Keyphrase Generation
Rui Meng - 2017 - Deep Keyphrase GenerationRui Meng - 2017 - Deep Keyphrase Generation
Rui Meng - 2017 - Deep Keyphrase Generation
Association for Computational Linguistics
 
EKON 23 Code_review_checklist
EKON 23 Code_review_checklistEKON 23 Code_review_checklist
EKON 23 Code_review_checklist
Max Kleiner
 

Similar to Natural Language Query to SQL conversion using Machine Learning Approach (20)

A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...
A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...
A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...
 
NLP and Deep Learning for non_experts
NLP and Deep Learning for non_expertsNLP and Deep Learning for non_experts
NLP and Deep Learning for non_experts
 
Introduction to C ++.pptx
Introduction to C ++.pptxIntroduction to C ++.pptx
Introduction to C ++.pptx
 
5_RNN_LSTM.pdf
5_RNN_LSTM.pdf5_RNN_LSTM.pdf
5_RNN_LSTM.pdf
 
Unit 1
Unit 1Unit 1
Unit 1
 
Generating test cases using UML Communication Diagram
Generating test cases using UML Communication Diagram Generating test cases using UML Communication Diagram
Generating test cases using UML Communication Diagram
 
Java-Intro.pptx
Java-Intro.pptxJava-Intro.pptx
Java-Intro.pptx
 
Introduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKennaIntroduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKenna
 
Introduction to oop
Introduction to oop Introduction to oop
Introduction to oop
 
An LSTM-Based Neural Network Architecture for Model Transformations
An LSTM-Based Neural Network Architecture for Model TransformationsAn LSTM-Based Neural Network Architecture for Model Transformations
An LSTM-Based Neural Network Architecture for Model Transformations
 
Programming in java basics
Programming in java  basicsProgramming in java  basics
Programming in java basics
 
Lec1.ppt
Lec1.pptLec1.ppt
Lec1.ppt
 
Facial Emotion Detection on Children's Emotional Face
Facial Emotion Detection on Children's Emotional FaceFacial Emotion Detection on Children's Emotional Face
Facial Emotion Detection on Children's Emotional Face
 
240318_JW_labseminar[Attention Is All You Need].pptx
240318_JW_labseminar[Attention Is All You Need].pptx240318_JW_labseminar[Attention Is All You Need].pptx
240318_JW_labseminar[Attention Is All You Need].pptx
 
Data structure Unit-I Part A
Data structure Unit-I Part AData structure Unit-I Part A
Data structure Unit-I Part A
 
Deep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep FeaturesDeep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep Features
 
RAJAT PROJECT.pptx
RAJAT PROJECT.pptxRAJAT PROJECT.pptx
RAJAT PROJECT.pptx
 
Searching Algorithms
Searching AlgorithmsSearching Algorithms
Searching Algorithms
 
Rui Meng - 2017 - Deep Keyphrase Generation
Rui Meng - 2017 - Deep Keyphrase GenerationRui Meng - 2017 - Deep Keyphrase Generation
Rui Meng - 2017 - Deep Keyphrase Generation
 
EKON 23 Code_review_checklist
EKON 23 Code_review_checklistEKON 23 Code_review_checklist
EKON 23 Code_review_checklist
 

More from Minhazul Arefin

Controlling Home Appliances adopting Chatbot using Machine Learning Approach
Controlling Home Appliances adopting Chatbot using Machine Learning ApproachControlling Home Appliances adopting Chatbot using Machine Learning Approach
Controlling Home Appliances adopting Chatbot using Machine Learning Approach
Minhazul Arefin
 
Object Detection on Dental X-ray Images using R-CNN
Object Detection on Dental X-ray Images using R-CNNObject Detection on Dental X-ray Images using R-CNN
Object Detection on Dental X-ray Images using R-CNN
Minhazul Arefin
 
Efficient estimation of word representations in vector space (2013)
Efficient estimation of word representations in vector space (2013)Efficient estimation of word representations in vector space (2013)
Efficient estimation of word representations in vector space (2013)
Minhazul Arefin
 
Semantic scaffolds for pseudocode to-code generation (2020)
Semantic scaffolds for pseudocode to-code generation (2020)Semantic scaffolds for pseudocode to-code generation (2020)
Semantic scaffolds for pseudocode to-code generation (2020)
Minhazul Arefin
 
Recurrent neural networks (rnn) and long short term memory networks (lstm)
Recurrent neural networks (rnn) and long short term memory networks (lstm)Recurrent neural networks (rnn) and long short term memory networks (lstm)
Recurrent neural networks (rnn) and long short term memory networks (lstm)
Minhazul Arefin
 
SPoC: search-based pseudocode to code
SPoC: search-based pseudocode to codeSPoC: search-based pseudocode to code
SPoC: search-based pseudocode to code
Minhazul Arefin
 
The rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computingThe rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computing
Minhazul Arefin
 

More from Minhazul Arefin (7)

Controlling Home Appliances adopting Chatbot using Machine Learning Approach
Controlling Home Appliances adopting Chatbot using Machine Learning ApproachControlling Home Appliances adopting Chatbot using Machine Learning Approach
Controlling Home Appliances adopting Chatbot using Machine Learning Approach
 
Object Detection on Dental X-ray Images using R-CNN
Object Detection on Dental X-ray Images using R-CNNObject Detection on Dental X-ray Images using R-CNN
Object Detection on Dental X-ray Images using R-CNN
 
Efficient estimation of word representations in vector space (2013)
Efficient estimation of word representations in vector space (2013)Efficient estimation of word representations in vector space (2013)
Efficient estimation of word representations in vector space (2013)
 
Semantic scaffolds for pseudocode to-code generation (2020)
Semantic scaffolds for pseudocode to-code generation (2020)Semantic scaffolds for pseudocode to-code generation (2020)
Semantic scaffolds for pseudocode to-code generation (2020)
 
Recurrent neural networks (rnn) and long short term memory networks (lstm)
Recurrent neural networks (rnn) and long short term memory networks (lstm)Recurrent neural networks (rnn) and long short term memory networks (lstm)
Recurrent neural networks (rnn) and long short term memory networks (lstm)
 
SPoC: search-based pseudocode to code
SPoC: search-based pseudocode to codeSPoC: search-based pseudocode to code
SPoC: search-based pseudocode to code
 
The rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computingThe rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computing
 

Recently uploaded

Building Electrical System Design & Installation
Building Electrical System Design & InstallationBuilding Electrical System Design & Installation
Building Electrical System Design & Installation
symbo111
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
DfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributionsDfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributions
gestioneergodomus
 
Unbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptxUnbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptx
ChristineTorrepenida1
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
Aditya Rajan Patra
 
PPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testingPPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testing
anoopmanoharan2
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
JoytuBarua2
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
Kamal Acharya
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
obonagu
 
Modelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdfModelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdf
camseq
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Christina Lin
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
SyedAbiiAzazi1
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
SamSarthak3
 
Technical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prismsTechnical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prisms
heavyhaig
 
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
ydteq
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
Dr Ramhari Poudyal
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 

Recently uploaded (20)

Building Electrical System Design & Installation
Building Electrical System Design & InstallationBuilding Electrical System Design & Installation
Building Electrical System Design & Installation
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
DfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributionsDfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributions
 
Unbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptxUnbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptx
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf

Natural Language Query to SQL conversion using Machine Learning Approach

  • 1. 3rd INTERNATIONAL CONFERENCE ON SUSTAINABLE TECHNOLOGIES FOR INDUSTRY 4.0 Date: 18 - 19 December 2021 Natural Language Query to SQL conversion using Machine Learning Approach Minhazul Arefin, Kazi Mojammel Hossen and Mohammed Nasir Uddin
  • 2. Overview: Introduction, Problem Description, Objective of this Paper, Proposed Methodology, Result, Conclusion, Future Works
  • 3. Natural Language • Natural (ordinary) language is any language that has evolved naturally among humans. • It can take different forms, such as speech and signing.
  • 4. Structured Query Language • SQL stands for Structured Query Language. • It is used to communicate with a database. • Standard SQL commands: Select, Insert, Update, Delete
  • 5. Introduction: Natural Language Processing (NLP), Structured Query Language (SQL), Machine Learning Algorithms • NLIDB stands for Natural Language Interface with Database Systems.
  • 6. Problem Description • Asking questions in natural language to get answers from databases is a very convenient and easy method of data access. • For non-expert users, the natural language query must be translated into Structured Query Language (SQL). • Filling in a web form that has many fields can be tedious: the user has to navigate through the screen, scroll, and look up values in scroll boxes.
  • 7. Objective: The main objectives of this research work are to provide algorithms for converting natural language to Structured Query Language (SQL); to propose a general framework for efficient processing of natural language queries; and to extract information from the database.
  • 8. Contributions: The main contributions of this research work are: designing algorithms for this machine translation system; and implementing the proposed translation algorithm and comparing the performance of our approach with state-of-the-art works. Our findings show that the machine learning approach can outperform other existing systems. Using simple algorithms increases performance and reduces time complexity.
  • 10. 1. Text Preprocessing: Tokenization, Escape Words, Part-of-Speech Tagger, Word Similarity
  • 11. 1.1. Tokenization • Tokenization is the process of converting a sequence of characters into a sequence of tokens. • The tokenize function performs the following steps: it treats most punctuation characters as separate tokens; splits off commas and single quotes when followed by whitespace; and separates periods that appear at the end of a line. Input text: “get names of all students”. Output after tokenization:
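The tokenization step can be illustrated with a minimal sketch. It assumes a simple regular-expression tokenizer written for illustration (the function name `tokenize` is hypothetical); it is not the tokenizer used in the slides, which may handle quotes and periods differently.

```python
import re

def tokenize(text: str) -> list[str]:
    """Split text into word tokens and single punctuation tokens (simplified)."""
    # \w+ matches a run of letters/digits; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize("get names of all students")
print(tokens)  # ['get', 'names', 'of', 'all', 'students']
```

Punctuation becomes its own token, so a later step can discard it without touching the words around it.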
  • 12. 1.2. Escape Words • The escape-word list is the set of unnecessary words that occur in the given text. • It mainly contains auxiliary verbs and articles. Input from tokenization step: Output after removing escape words:
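Escape-word removal amounts to filtering tokens against a fixed list. The sketch below assumes a small, hypothetical list of articles and auxiliary verbs; the actual list used in the study is not given in the slides.

```python
# Hypothetical escape-word list: articles and common auxiliary verbs only.
ESCAPE_WORDS = {
    "a", "an", "the",
    "is", "are", "was", "were", "be", "been", "am",
    "do", "does", "did", "have", "has", "had",
    "can", "could", "will", "would", "shall", "should", "may", "might", "must",
}

def remove_escape_words(tokens: list[str]) -> list[str]:
    """Drop every token that appears in the escape-word list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in ESCAPE_WORDS]

print(remove_escape_words(["get", "all", "the", "names", "of", "the", "students"]))
# ['get', 'all', 'names', 'of', 'students']
```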
  • 14. 1.3. Part-of-Speech Tagger • Part-of-speech (PoS) tagging is used to classify words into their parts of speech. • Input from tokenization step: Output after PoS tagging: Here, VB -> verb, base form; NNS -> noun, common, plural; DT -> determiner, article.
  • 16. 1.4. Word Similarity • In this step we get the synonyms of all words that remain after removing escape words from the given text. • For word similarity, we use the WordNet database. • For example, all synonyms of “phone” are: For example, the similarity between ‘telephone’ and ‘phone’:
  • 18. 2. Attribute Extraction • In this section we first get the synonyms of the words from the tokenization step. • Then we match these words against the attributes with the Jaro-Winkler algorithm. • It computes the similarity between two strings, and the returned value lies in the interval [0.0, 1.0]. • The similarity is computed as: $sim_w = sim_j + l\,p\,(1 - sim_j)$
  • 20. 2. Attribute Extraction Input text: Output after attribute extraction:
  • 21. 2. Attribute Extraction (continued) Input text: “get all telephone number, address & name of the students”. Output after attribute extraction:
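A rough sketch of attribute matching with Jaro-Winkler follows. The column names, the threshold of 0.85, and the scaling factor p = 0.1 are assumptions chosen for illustration, and the WordNet synonym expansion described earlier is omitted, so this is not the authors' implementation.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0.0, 1.0]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3

def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """sim_w = sim_j + l * p * (1 - sim_j), with common prefix l capped at 4."""
    sim_j = jaro(s1, s2)
    l = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        l += 1
    return sim_j + l * p * (1 - sim_j)

def extract_attributes(words, columns, threshold=0.85):
    """Return the columns whose best-matching word scores at least `threshold`."""
    found = []
    for word in words:
        best = max(columns, key=lambda c: jaro_winkler(word.lower(), c.lower()))
        if jaro_winkler(word.lower(), best.lower()) >= threshold and best not in found:
            found.append(best)
    return found

# Hypothetical schema columns for a students table.
print(extract_attributes(["get", "names", "of", "all", "students"],
                         ["name", "address", "phone"]))
```

The threshold trades recall for precision: a lower value would let “number” match “name”, which is why a fairly strict cut-off is assumed here.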
  • 22. 3. Table Extraction • This step only runs if the previous step finds no attribute in the given text. • First, this step finds all table names in the existing database. • Then it proceeds to the next step. • For example: Input text: “show all”. Output after table extraction:
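Finding all table names can be sketched as a catalogue query. The example assumes an SQLite database (the slides do not name the database engine) and uses hypothetical tables `students` and `teachers`.

```python
import sqlite3

def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all user tables in an SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]

# Hypothetical in-memory database for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, address TEXT, phone TEXT)")
conn.execute("CREATE TABLE teachers (name TEXT)")
print(list_tables(conn))  # ['students', 'teachers']
```

Other engines expose the same information through their own catalogues, for example `information_schema.tables`.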
  • 24. 4. Command Extraction • Here we use a Naive Bayes classifier to detect the SQL command. • Using Bayes' theorem, the conditional probability can be described as: $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$ • In our case, suppose we want $P(\text{select} \mid \text{get names of all students})$. Using this theorem we get the conditional probability: $P(\text{select} \mid \text{get names of all students}) = \frac{P(\text{get names of all students} \mid \text{select})\,P(\text{select})}{P(\text{get names of all students})}$
  • 25. 4. Command Extraction Input text: “get names of all students”. Result of command extraction:
      Sentence | Result | Select | Insert | Delete | Update
      Get names of all students | Select | 86.97 | 0.26 | 12.36 | 0.39
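Command detection with Naive Bayes can be sketched as a small multinomial classifier with add-one smoothing. The training sentences below are invented for illustration, so the resulting probabilities will not match the figures reported above.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: (sentence, SQL command).
TRAINING = [
    ("get names of all students", "select"),
    ("show all teachers", "select"),
    ("list the address of students", "select"),
    ("add a new student", "insert"),
    ("insert a teacher record", "insert"),
    ("remove the student named rana", "delete"),
    ("delete all old records", "delete"),
    ("change the address of a student", "update"),
    ("update phone number of teacher", "update"),
]

def train(samples):
    """Count class frequencies and per-class word frequencies."""
    class_counts, word_counts, vocab = Counter(), defaultdict(Counter), set()
    for text, label in samples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def posterior(text, model):
    """Return P(command | text) for every command, normalised to sum to 1."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior P(command)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][word] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())
    unnorm = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(unnorm.values())  # plays the role of P(text)
    return {k: v / z for k, v in unnorm.items()}

model = train(TRAINING)
probs = posterior("get names of all students", model)
print(max(probs, key=probs.get))
```

Working in log space keeps the product of many small word probabilities from underflowing.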
  • 26. 5. Condition Extraction • We use a decision tree classifier to extract the condition from the given input. • It finds the specific condition appropriate for the given input text.
  • 27. 6. Query Generation • In this step we start to build the query. • Query template: Operation Attributes FROM Table Name WHERE Condition. Input text: “get names of all students”. Output after query generation:
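The query template can be illustrated by a small builder function. The name `build_query` and its parameters are hypothetical, and the sketch only covers the SELECT-style template shown on the slide.

```python
def build_query(operation, attributes, table, condition=None):
    """Fill the template: Operation Attributes FROM Table [WHERE Condition]."""
    # Fall back to '*' when attribute extraction found nothing.
    columns = ", ".join(attributes) if attributes else "*"
    query = f"{operation.upper()} {columns} FROM {table}"
    if condition:
        query += f" WHERE {condition}"
    return query

print(build_query("select", ["names"], "students"))
# SELECT names FROM students
```

Because identifiers are interpolated directly into the string, the attribute and table names should come only from the database schema, never from raw user text.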
  • 28. 7. Executing the Code • In this step we run the SQL query obtained from the query generation step. Input text: “get names of all students”. Output after query generation: Input text: “get all phone number, address, name of the students”. Output after query generation:
  • 29. Result Input text: “get names of all students”. Output after building query: Input text: “SELECT names FROM students”. Output after running query: Jakir, Minhaz, Jisan, Rana, Imran
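Running the generated query can be sketched with an in-memory SQLite database. The engine and the single-column `students` table are assumptions, and the rows simply reuse the names from the result above.

```python
import sqlite3

def run_query(conn: sqlite3.Connection, query: str) -> list[tuple]:
    """Execute a generated SQL query and return all result rows."""
    return conn.execute(query).fetchall()

# Hypothetical table populated with the names shown in the result slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (names TEXT)")
conn.executemany(
    "INSERT INTO students (names) VALUES (?)",
    [("Jakir",), ("Minhaz",), ("Jisan",), ("Rana",), ("Imran",)],
)

rows = run_query(conn, "SELECT names FROM students")
print(", ".join(row[0] for row in rows))
```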
  • 30. Comparison
      Sl. | Model | Accuracy (%) | Error Rate (%) | Run Time (s)
      1 | Generic Model | 73.14 | 26.86 | 5.29
      2 | NLIDB for RDBMS | 83.6 | 16.4 | 7.8
      3 | Our Study | 88.17 | 11.83 | 2.929
    • We mainly focus on attributes to build an SQL query.
  • 32. Conclusion • This research has substantial importance for the 4th industrial revolution, as every automation system or IoT device has a data store that could be manipulated via plain text. • Furthermore, an existing database can be used as a knowledge base for AI-powered chatbots, which may be used as virtual assistants.
  • 33. Future Works: In future, we intend to work on joining tables; to develop an efficient algorithm using other mechanisms for better performance; and to provide a deep learning solution for this problem.
  • 35. Why use Naive Bayes ▷ Relatively few training samples are sufficient for training with the Naive Bayes algorithm. ▷ Bias-variance tradeoff: spam/sentiment-type data are often noisy and usually high-dimensional (more predictors than samples, n ≪ p). The naive assumption that predictors are independent of one another is a strong, high-bias one. ▷ By assuming independence of predictors, we are saying that the covariance matrix of our model only has non-zero entries on the diagonal.
  • 36. Why use Jaro-Winkler? ▷ Jaro-Winkler gives a matching score between 0.0 and 1.0. ▷ The Jaro algorithm measures the characters two strings have in common, where matching characters may be no farther apart than half the length of the longer string, with consideration for transpositions. ▷ It gives high accuracy.
  • 37. Jaro-Winkler • Here we use the Jaro-Winkler algorithm for attribute extraction. • We match all similar words with attributes using the Jaro-Winkler algorithm and detect the necessary attributes for the specific query. • Jaro-Winkler is a string edit distance that was developed in the area of record linkage (duplicate detection). • It computes the similarity between two strings, and the returned value lies in the interval [0.0, 1.0]. • The similarity is computed as: $sim_w = sim_j + l\,p\,(1 - sim_j)$, where: $sim_j$ is the Jaro similarity for the given strings $s_1$ and $s_2$; $l$ is the length of the common prefix at the start of the strings, up to a maximum of four characters; $p$ is a constant scaling factor for how much the score is adjusted upwards for having common prefixes. • The Jaro-Winkler distance $d_w$ is defined as $d_w = 1 - sim_w$.
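As a worked illustration of the formula, consider the commonly used string pair MARTHA / MARHTA. The value $p = 0.1$ is an assumption (it is the customary default; the slides do not state which value was used), and the Jaro similarity is taken in its standard form with $m$ matching characters and $t$ transpositions.

```latex
\begin{align*}
sim_j &= \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - t}{m}\right)
       = \frac{1}{3}\left(\frac{6}{6} + \frac{6}{6} + \frac{6 - 1}{6}\right) \approx 0.9444 \\
l &= 3 \quad \text{(common prefix ``MAR'')}, \qquad p = 0.1 \\
sim_w &= sim_j + l\,p\,(1 - sim_j) = 0.9444 + 3 \times 0.1 \times (1 - 0.9444) \approx 0.9611 \\
d_w &= 1 - sim_w \approx 0.0389
\end{align*}
```

The shared prefix raises the score from about 0.944 to about 0.961, which is the upward adjustment that $p$ controls.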