International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 03 | Mar-2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 1152
Information Retrieval & Text Analytics Using Artificial Intelligence
Lavanya N1, Dr Mahesh K Kaluti2
1M Tech, CSE, Dept. of Computer Science and Engineering, PESCE, Mandya
2Assistant professor, CSE, Dept. of Computer Science and Engineering, PESCE, Mandya
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract—Information Retrieval is a key technology for
knowledge management. The information seeker formulates a
query that tries to describe the information needed. There is a
need for technologies that allow efficient and effective
processing of huge datasets. An effective document processing
system must be able to recognize structured and
semi-structured forms written in different people's
handwriting. In this paper, we use an OCR method and the
K-means clustering algorithm to read handwritten documents by
computer and to provide the required information using
efficient Information Retrieval approaches based on Artificial
Intelligence.
Keywords— Text Analytics, Information Retrieval, OCR,
Artificial Neural Networks, Information Extraction.
Introduction:
Artificial Intelligence (AI) is defined as intelligence
exhibited by an artificial entity to solve complex problems,
and such a system is generally assumed to be a computer or
machine. Artificial Intelligence is an integration of computer
science and physiology. Intelligence, in simple language, is the
computational part of the ability to achieve goals in the
world. Intelligence is the ability to think, imagine, create,
memorize, understand, recognize patterns, make choices,
adapt to change and learn from experience. Artificial
intelligence is concerned with making computers behave in a
more human-like fashion, and in much less time than a human
takes. Hence it is called Artificial Intelligence.
Text Analytics is the most recent name given to Natural
Language Understanding, Data Mining and Text Mining. In the
last few years a new name, Big Data, has gained popularity. It
refers mainly to unstructured text (or other information
sources) and is used more often in the commercial than in the
academic area, probably because unstructured free text
accounts for 80% of the information in a business context,
including tweets, blogs, wikis and surveys. Text Analytics is
the discovery of new, previously unknown information by
automatically extracting information from different written
resources.
Information retrieval (IR) is the activity of
obtaining information resources relevant to an information
need from a collection of information resources. Searches
can be based on full-text or other content-based indexing.
Information retrieval is the science of searching for
information in a document, searching for documents
themselves, and also searching for metadata that describe
data, and for databases of texts, images or sounds.
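As a rough illustration of full-text indexing, the sketch below builds a small inverted index and answers a keyword query that requires every term to be present. The function names (tokenize, build_index, search), the simple tokenizer and the sample documents are assumptions made for this example; the paper does not specify its indexing scheme.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND query)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical mini-collection, e.g. text produced by an OCR stage.
docs = {
    1: "Optical character recognition of handwritten forms",
    2: "Clustering of text documents with K-means",
    3: "Handwritten text retrieval after character recognition",
}
index = build_index(docs)
print(sorted(search(index, "handwritten recognition")))
```

An inverted index is only one possible content-based index; it is used here because a query then touches only the posting sets of its own terms rather than every document.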
Fig: Overview Of Artificial Intelligence
Optical Character Recognition (OCR) is a process that
converts a printed document or scanned page into ASCII
characters that a computer can recognize. The document
image itself can be machine printed, handwritten, or a
combination of the two. A computer system equipped with
such an OCR system can improve the speed of input
operations and decrease some possible human errors.
Differences in font and size make the recognition task
difficult if preprocessing, feature extraction and recognition
are not robust, and the width of the stroke is also a factor
that affects recognition. Therefore, a good character
recognition approach must eliminate the noise after reading
binary image data, smooth the image for better recognition,
extract features efficiently and classify patterns.
An Artificial Neural Network (ANN) approach has been used for
classification and recognition. It is a computational model
widely used in situations where the problem is complex and
the data is subject to statistical variation. The training and
recognition phases of the ANN have been performed using the
conventional back propagation algorithm with two hidden
layers. The architecture of a neural network determines how
the network transforms its input into output. This
transformation can be viewed as a computation.
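To make the idea of a network as a computation concrete, the following sketch runs the forward pass of a small fully connected network with two hidden layers. The layer sizes, the sigmoid activation, the random initial weights and all function names are assumptions chosen for illustration; the paper does not report them, and the back propagation training step is not shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One fully connected layer followed by a sigmoid activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(inputs, layers):
    """Propagate an input vector through every layer in turn."""
    activation = inputs
    for weights, biases in layers:
        activation = dense(activation, weights, biases)
    return activation


rng = random.Random(0)
# Assumed sizes: 16 input features, two hidden layers of 8 units, 4 classes.
sizes = [16, 8, 8, 4]
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
features = [rng.random() for _ in range(16)]
scores = forward(features, layers)
# The class with the largest output score is taken as the prediction.
predicted_class = max(range(len(scores)), key=scores.__getitem__)
```

With untrained weights the prediction is meaningless; the point is only that the architecture (the list of layer sizes) fixes the sequence of weighted sums and activations that maps an input feature vector to class scores.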
Methodology:
OCR System:
The OCR system follows these steps:
A. Preprocessing
B. Segmentation
C. Feature Extraction
Preprocessing
In the OCR system, text digitization is done by a flatbed
scanner. The digitized images are usually in gray tone, and
for a clear document, a simple histogram-based threshold
approach is sufficient for converting them to two-tone
images.
The histogram of gray values of the pixels shows two
prominent peaks, and a middle gray value located between
the peaks is a good choice for the threshold.
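The choice of a threshold between the two histogram peaks can be sketched as follows. This is a minimal example under stated assumptions: gray values are integers in 0..255, one peak lies in the darker half of the range and the other in the brighter half, and the midpoint of the two peaks is used as the threshold. The function names and the tiny sample image are hypothetical.

```python
def gray_histogram(image, levels=256):
    """Count how many pixels take each gray value in 0..levels-1."""
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1
    return hist


def midpoint_threshold(hist):
    """Pick the gray value midway between the dark and bright peaks.

    Assumption: one peak lies in the lower half of the gray range and the
    other in the upper half, as for dark ink on light paper.
    """
    half = len(hist) // 2
    dark_peak = max(range(half), key=hist.__getitem__)
    bright_peak = max(range(half, len(hist)), key=hist.__getitem__)
    return (dark_peak + bright_peak) // 2


# Tiny synthetic "scan": ink around gray level 30, paper around 220.
image = [
    [220, 220, 30, 220],
    [220, 30, 30, 220],
    [221, 220, 31, 220],
]
threshold = midpoint_threshold(gray_histogram(image))
print(threshold)
```

For noisy or unevenly lit scans the two peaks may not fall on either side of the middle of the range, in which case a more careful peak search would be needed.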
i. Thresholding: This is the first preprocessing stage.
A gray or color image is taken as input and
converted into a binary image based on a threshold.
The output image replaces all pixels in the input
image with luminance greater than the threshold
level with the value 1 (white) and replaces all other
pixels with the value 0 (black). Inverting the binary
image performs the inverse operation, replacing 1's
with 0's and 0's with 1's. In the proposed algorithm
a threshold of 0.9 is found to be good for conversion.
ii. Noise Reduction And Normalization: The binary
image contains some noise, which has to be removed
before the further operations of segmentation and
feature extraction. Different morphological
operations are required for noise reduction.
Morphologically opening the binary image removes
small objects, that is, all connected components
with fewer than a threshold number of pixels, which
brings accuracy to feature extraction. The clean
operation removes isolated pixels (individual 1s
that are surrounded by 0s), that is, small dots in the
image, which is beneficial to bring accuracy to
segmentation. After noise reduction, the unwanted
0's around the character are removed by cropping,
and the image is normalized and used for feature
extraction after segmentation.
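The preprocessing stages above can be sketched in a few lines. The example assumes 8-bit gray input scaled to 0..1 before comparison with the 0.9 level, treats a pixel as isolated when all eight neighbours are 0, and omits the area-based opening and the size normalization. All function names and the sample image are hypothetical, not the authors' implementation.

```python
def binarize(gray, level=0.9):
    """Pixels brighter than level (0..1 scale) become 1 (white), others 0."""
    return [[1 if value / 255.0 > level else 0 for value in row] for row in gray]


def invert(binary):
    """Swap 0s and 1s so that ink becomes 1 and background 0."""
    return [[1 - value for value in row] for row in binary]


def remove_isolated(binary):
    """Clear every 1 whose eight neighbours are all 0."""
    rows, cols = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1:
                continue
            neighbours = sum(
                binary[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            )
            if neighbours == 0:
                cleaned[r][c] = 0
    return cleaned


def crop(binary):
    """Trim all-zero rows and columns around the ink."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    if not rows:
        return []
    return [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]


gray = [
    [255, 255, 255, 255, 255],
    [255,  20,  25, 255, 255],
    [255,  30, 255, 255, 255],
    [255, 255, 255, 255,  10],   # isolated speck of noise
]
ink = invert(binarize(gray))
character = crop(remove_isolated(ink))
```

Inverting before cleaning means that ink pixels are the 1s, so the same "remove isolated 1s" rule deletes stray dots rather than holes in the paper background.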
Segmentation
Segmentation is one of the most important phases of an
OCR system. It subdivides an image into its constituent
regions or objects.
i. Line segmentation: The text lines are detected by
finding the valleys of the projection profile
computed by a row-wise sum of black pixels. This
profile shows large values for the headline of each
individual text line. The position between two
consecutive headlines where the projection profile
height is minimum denotes the boundary between
two text lines.
ii. Word segmentation: The segmented line from the
first stage is taken as input for the second stage,
word segmentation. To segment the text line image
into words, vertical projection profiles are
computed. The projection profile is the histogram
of black pixels per column of the image. In the
profile, the zero-valued valleys represent the spaces
between words, and the line is split into words at
these valleys.
iii. Character Segmentation: Character segmentation
is done after the individual words are identified. To
extract characters from a word, removal of the
headline is essential. For this, first the horizontal
projection of the individual word is computed, and
the rows having the highest projection are
considered to be the headline and removed before
further character segmentation. After removal, the
vertical projection of the individual word is
computed.
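The projection-profile idea behind line and word segmentation can be sketched as below. As a simplification, the example cuts only at valleys where the profile is exactly zero, and it uses an assumed min_gap parameter to tell word gaps from narrower gaps inside a word. The function names and the toy page are hypothetical, and headline removal is not shown.

```python
def row_profile(binary):
    """Horizontal projection: number of ink pixels (1s) in each row."""
    return [sum(row) for row in binary]


def column_profile(binary):
    """Vertical projection: number of ink pixels (1s) in each column."""
    return [sum(col) for col in zip(*binary)]


def runs(profile, min_gap=1):
    """Return (start, end) pairs, end exclusive, of non-zero runs.

    Runs separated by fewer than min_gap zero entries are merged, so a
    larger min_gap splits words rather than individual characters.
    """
    spans = []
    start = None
    gap = 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                spans.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        spans.append((start, len(profile) - gap))
    return spans


def segment_lines(binary):
    """Split a page into text lines at empty rows."""
    return [binary[top:bottom] for top, bottom in runs(row_profile(binary))]


def segment_words(line, min_gap=2):
    """Split one text line into words at wide empty column gaps."""
    return [
        [row[left:right] for row in line]
        for left, right in runs(column_profile(line), min_gap)
    ]


page = [
    [1, 1, 0, 1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
]
lines = segment_lines(page)
words = segment_words(lines[0])
```

On this toy page the single empty row separates two lines, and in the first line the one-column gap stays inside a word while the two-column gap separates two words. Calling runs on the column profile of a word with min_gap=1 would give candidate character boundaries in the same way.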
Fig: Segmentation Using OCR Technique.
Feature Extraction
This step describes the various features selected by us for
classification of the selected characters. Extraction of good
features is the main key to correctly recognizing an unknown
character. A good feature set contains discriminating
information, which can distinguish one object from other
objects. It must also be as robust as possible in order to
prevent generating different feature codes for objects in the
same class.
Fig: Feature Extraction In OCR Technique
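The paper does not list the individual features, so the sketch below uses zoning density purely as an illustrative assumption: the normalized character image is divided into a grid and the fraction of ink pixels in each cell forms the feature vector. The function name, the 2 x 2 grid and the sample glyph are hypothetical.

```python
def zoning_features(char_image, zones=2):
    """Fraction of ink pixels in each cell of a zones x zones grid.

    Assumes a binary image (1 = ink) whose height and width are
    multiples of `zones`, e.g. after size normalisation.
    """
    height, width = len(char_image), len(char_image[0])
    cell_h, cell_w = height // zones, width // zones
    features = []
    for zr in range(zones):
        for zc in range(zones):
            ink = sum(
                char_image[r][c]
                for r in range(zr * cell_h, (zr + 1) * cell_h)
                for c in range(zc * cell_w, (zc + 1) * cell_w)
            )
            features.append(ink / (cell_h * cell_w))
    return features


# Hypothetical 4x4 normalised character (a rough "L" shape).
glyph = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]
vector = zoning_features(glyph)
```

Density per zone changes little when a stroke shifts by a pixel inside its cell, which is one simple way to obtain the robustness within a class that the text asks of a feature set.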
Text analysis: concepts and techniques:
i. Information extraction (IE): It identifies key
phrases and relationships within text. It does this
by looking for predefined sequences in text, a
process usually called pattern matching, typically
based on regular expressions. The most popular
form of IE is named entity recognition (NER). NER
seeks to locate and classify atomic elements in text
into predefined categories (usually matching
pre-established ontologies). NER techniques extract
features such as the names of persons,
organizations, locations, temporal or spatial
expressions, quantities, monetary values, stock
values, percentages, gene or protein names, etc.
There are several tools relevant for this task:
Apache OpenNLP, Stanford Named Entity
Recognizer, LingPipe.
Fig: Overview Of Text Mining Framework.
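Pattern matching with regular expressions, as described for information extraction, can be sketched as follows. The three patterns, the category labels and the sample sentence are assumptions made for this example and cover only a few simple surface forms; the NER tools named above rely on much richer models.

```python
import re

# Illustrative patterns only; each covers a narrow set of surface forms.
PATTERNS = {
    "MONEY": re.compile(r"(?:USD|Rs\.?|\$)\s?\d+(?:,\d{3})*(?:\.\d+)?"),
    "PERCENT": re.compile(r"\d+(?:\.\d+)?\s?%"),
    "DATE": re.compile(r"\b\d{1,2}-\d{1,2}-\d{4}\b"),
}


def extract_entities(text):
    """Return (label, matched text) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    return [(label, value) for _, label, value in sorted(found)]


sample = "On 19-03-2010 the firm reported revenue of $1,250.50, up 12.5% on the year."
entities = extract_entities(sample)
for label, value in entities:
    print(label, value)
```

Names of persons, organizations and locations cannot be captured reliably by fixed patterns like these, which is why dictionary-based or statistical recognizers are normally used for them.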
ii. K-means Clustering Algorithm: Clustering is the
classification of objects into different groups, that
is, the partitioning of a data set into subsets
(clusters) so that the data in each subset share
some common trait, often according to some
defined distance measure. Once clusters are
obtained, each cluster can be examined for other
outcomes such as classification. The goal is for
objects within a cluster to be closer to one another
than to objects in other clusters.
Fig: Document Clustering
Algorithm and flowchart
Fig: Flowchart Of K-Clustering Algorithm
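Since the flowchart itself is not reproduced in the text, the following is a minimal sketch of the standard K-means iteration (assign each point to its nearest centroid, then move each centroid to the mean of its members, and repeat until nothing changes). The random initialization from the data, the squared Euclidean distance, the parameter names and the two-dimensional sample points are assumptions; for documents, the points would be feature vectors derived from the extracted text.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(values) / len(points) for values in zip(*points)]


def kmeans(points, k, max_iter=100, seed=0):
    """Standard K-means iteration: returns (centroids, label per point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            new_centroids.append(mean(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels


# Two well-separated hypothetical groups of feature vectors.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
```

The result depends on the initial centroids in general, so in practice the algorithm is often run several times with different seeds and the partition with the smallest total within-cluster distance is kept.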
Advantages
• Easy to compute.
• Provides a basic metric to extract the most
descriptive terms in a document.
Limitations
• Does not capture position in text, semantics, or
co-occurrences in different documents.
• Cannot capture semantics (e.g. as compared to word
embeddings).
Conclusion:
The computing world has a lot to gain from various AI
approaches. Their ability to learn by example makes them
very flexible and powerful. Furthermore, there is no need to
devise an algorithm in order to perform a specific task, i.e.
there is no need to understand the internal mechanisms of
that task. They are also very well suited for real-time
systems because of their fast response and computational
times, which are due to their parallel architecture. The goal
of artificial intelligence is to create computers whose
intelligence equals or surpasses that of humans. The proposed
methodology will help in information retrieval in the field of
scanned documents. Future work will implement the use of
dictionary words to improve the performance of the OCR
system.
References
[1] Russell G and Perrone M (2012) International
Workshop on Frontiers in Handwriting Recognition ©
IEEE
[2] Zadeh L.A. The concept of a linguistic variable and its
application to approximate reasoning.
[3] P. Cowling, S. Remde, P. Hartley, W. Stewart, J. Stock-
Brooks, T. Woolley(2010), “C-Link Concept Linkage in
Knowledge Repositories”. AAAI Spring Symposium
Series.
[4] C-Link (2015): http://www.conceptlinkage.org
[5] Y. Hassan-Montero, and V Herrero-Solana (2016).
“Improving Tag-Clouds as Visual Information Retrieval
Interfaces”, I International Conference on
Multidisciplinary Information Sciences and
Technologies, InSciT2016.
[6] Wordle (2014): http://www.wordle.net.
[7] Xindong Wu, Senior Member, IEEE "Data Mining: An AI
Perspective" vol.4 no 2 (2004)
[8] Satvika Khanna et al. “Expert Systems Advances in
Education” NCCI 2010 -National Conference on
Computational Instrumentation CSIO Chandigarh,
INDIA, 19-20 March 2010
[9] Kaijun Xu.” Dynamic neuro-fuzzy control design forcivil
aviation aircraft in intelligent landing system. Dept. of
Air Navig. Civil Aviation Flight Univ. of China 2011.
[10] Eike.F Anderson.,”Playing smart artificial intelligencein
computer games” The National Centre for Computer
Animation (NCCA) Bournemouth University UK.
[11] K.R. Chaudhary “Goals, Roots and Sub-fields of Artificial
Intelligence. MBM Engineering College, Jodhpur, India
2012
[12] Text Analytics: the convergence of BigDataandArtificial
Intelligence(2016)-Universal Autonoma de Madrid and
Instituto de Ingeniería del Conocimiento, Madrid, Spain
2ZED Worldwide, Madrid, Spain.
[13] Information Retrieval Using Artificial Intelligence &
Fuzzy Logic For Documents Through OCR(2015) –
PandeyM.K And Nandan Singh.
[14] Google – Word2vec (2016): http://arxiv.org/pdf.

Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 

IRJET- Information Retrieval & Text Analytics using Artificial Intelligence

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 03 | Mar-2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 1152

Information Retrieval & Text Analytics Using Artificial Intelligence
Lavanya N1, Dr Mahesh K Kaluti2
1M Tech, CSE, Dept. of Computer Science and Engineering, PESCE, Mandya
2Assistant Professor, CSE, Dept. of Computer Science and Engineering, PESCE, Mandya

Abstract: Information Retrieval is a key technology for knowledge management. The information seeker formulates a query that tries to describe the information needed. There is a need for technologies that allow efficient and effective processing of huge datasets. An effective document processing system must be able to recognize structured and semi-structured forms written in different people's handwriting. In this paper, we use an OCR method and the K-means clustering algorithm to let a computer system read handwritten documents and provide the required information using efficient Information Retrieval approaches based on Artificial Intelligence.

Keywords: Text Analytics, Information Retrieval, OCR, Artificial Neural Networks, Information Extraction.

Introduction:
Artificial Intelligence (AI) is defined as intelligence exhibited by an artificial entity to solve complex problems; such a system is generally assumed to be a computer or machine. Artificial Intelligence is an integration of computer science and physiology. Intelligence, in simple language, is the computational part of the ability to achieve goals in the world. Intelligence is the ability to think, imagine, create, memorize, understand, recognize patterns, make choices, adapt to change and learn from experience.
Artificial intelligence is concerned with making computers behave like humans, in a more humanlike fashion and in much less time than a human takes. Hence it is called Artificial Intelligence.

Text Analytics is the most recent name given to Natural Language Understanding, Data Mining and Text Mining. In the last few years a new name has gained popularity, Big Data, which refers mainly to unstructured text (or other information sources), more often in the commercial than the academic area, probably because unstructured free text accounts for 80% of the information in a business context, including tweets, blogs, wikis and surveys. Text Analytics is the discovery of new, previously unknown information by automatically extracting information from different written resources.

Information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for metadata that describe data, and for databases of texts, images or sounds.

Fig: Overview of Artificial Intelligence

Optical Character Recognition (OCR) is a process by which a printed document or scanned page is converted into ASCII characters that a computer can recognize. The document image itself can be machine printed, handwritten, or a combination of the two. A computer system equipped with such an OCR system can improve the speed of input operations and decrease some possible human errors. Differences in font and size make the recognition task difficult if preprocessing, feature extraction and recognition are not robust, and the width of the stroke is also a factor that affects recognition.
Therefore, a good character recognition approach must eliminate the noise after reading the binary image data, smooth the image for better recognition, extract features efficiently and classify patterns. An Artificial Neural Network (ANN) approach has been used for classification and recognition. It is a computational model widely used in situations where the problem is complex and the data is subject to statistical variation. The training and recognition phases of the ANN have been performed using the conventional back propagation algorithm with two hidden layers. The architecture of a neural network determines how the network transforms its input into output. This transformation can be viewed as a computation.
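As an illustration of back propagation with two hidden layers, here is a minimal pure-Python sketch. The paper does not state layer sizes, activation function, loss or learning rate, so the sigmoid units, squared-error loss, the sizes `[4, 3, 3, 2]`, the learning rate and the class name `TwoHiddenLayerNet` are all assumptions made for this example, not the authors' configuration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TwoHiddenLayerNet:
    """Fully connected sigmoid network (hypothetical helper, not from the paper)."""

    def __init__(self, sizes, seed=0):
        # sizes = [inputs, hidden1, hidden2, outputs]; values are assumptions.
        rng = random.Random(seed)
        self.weights = [
            [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])
        ]
        self.biases = [[0.0] * n_out for n_out in sizes[1:]]

    def forward(self, x):
        """Return the activations of every layer, input included."""
        activations = [list(x)]
        for W, b in zip(self.weights, self.biases):
            prev = activations[-1]
            activations.append(
                [sigmoid(sum(w * a for w, a in zip(row, prev)) + bj)
                 for row, bj in zip(W, b)]
            )
        return activations

    def train_step(self, x, target, lr=0.5):
        """One gradient-descent step on the loss 0.5 * sum((out - target)^2)."""
        acts = self.forward(x)
        out = acts[-1]
        # Error signal at the output layer (sigmoid derivative is o * (1 - o)).
        delta = [(o - t) * o * (1.0 - o) for o, t in zip(out, target)]
        for k in reversed(range(len(self.weights))):
            prev = acts[k]
            W = self.weights[k]
            # Propagate the error backwards before the weights are changed.
            if k > 0:
                prev_delta = [
                    sum(W[j][i] * delta[j] for j in range(len(delta)))
                    * prev[i] * (1.0 - prev[i])
                    for i in range(len(prev))
                ]
            else:
                prev_delta = []
            for j, d in enumerate(delta):
                for i, a in enumerate(prev):
                    W[j][i] -= lr * d * a
                self.biases[k][j] -= lr * d
            delta = prev_delta
        # Loss measured before this update.
        return 0.5 * sum((o - t) ** 2 for o, t in zip(out, target))
```

In an OCR setting the input vector would hold the features extracted from one character and the target would be a one-hot encoding of its class; repeated calls to `train_step` over the training set correspond to the training phase, and `forward` alone to the recognition phase.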
Methodology:

OCR System:
The following steps have been followed in the OCR system:
A. Preprocessing
B. Segmentation
C. Feature Extraction

Preprocessing
In the OCR system, text digitization is done by a flatbed scanner. The digitized images are usually in gray tone, and for a clear document a simple histogram-based threshold approach is sufficient for converting them to two-tone images. The histogram of the gray values of the pixels shows two prominent peaks, and a middle gray value located between the peaks is a good choice for the threshold.

i. Thresholding: This is the first preprocessing stage. A gray or color image is taken as input and converted into a binary image based on a threshold. The output image replaces all pixels in the input image with luminance greater than the threshold level with the value 1 (white) and all other pixels with the value 0 (black). The inverted binary image performs the inverse operation, replacing 1s with 0s and 0s with 1s. In the proposed algorithm, a threshold of 0.9 was found to work well for conversion.

ii. Noise Reduction and Normalization: The binary image contains some noise, which has to be removed before the later operations, segmentation and feature extraction. Different morphological operations are required for noise reduction. Morphological opening of the binary image removes small objects, that is, all connected components with fewer pixels than a threshold, which improves the accuracy of feature extraction. The clean operation removes isolated pixels (individual 1s that are surrounded by 0s), that is, small dots in the image, which improves the accuracy of segmentation. After noise reduction, unwanted 0s around the character are removed by cropping, and the image is normalized and used for feature extraction after segmentation.
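The thresholding, inversion and clean steps described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' implementation: it assumes the image is a list of rows of luminance values in the range 0 to 1, that the clean step runs on the inverted image (so ink pixels are 1s), and that "surrounded" means all eight neighbours. The function names are hypothetical.

```python
def binarize(gray, threshold=0.9):
    """Pixels brighter than the threshold become 1 (white), all others 0 (black)."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in gray]


def invert(binary):
    """Swap 1s and 0s so that dark ink is represented by 1s."""
    return [[1 - pixel for pixel in row] for row in binary]


def remove_isolated_pixels(binary):
    """Clear every 1 whose eight neighbours are all 0 (a single-pixel dot)."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    cleaned = [row[:] for row in binary]
    for r in range(height):
        for c in range(width):
            if binary[r][c] != 1:
                continue
            neighbours = [
                binary[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
                if (rr, cc) != (r, c)
            ]
            if not any(neighbours):
                cleaned[r][c] = 0
    return cleaned
```

A typical call chain would be `remove_isolated_pixels(invert(binarize(gray)))`. Removing larger specks (the morphological opening mentioned above) needs connected-component labelling, which is left out of this sketch.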
Segmentation
Segmentation is one of the most important phases of an OCR system. Segmentation subdivides an image into its constituent regions or objects.

i. Line segmentation: The text lines are detected by finding the valleys of the projection profile computed by a row-wise sum of black pixels. This profile shows large values for the headline of each individual text line. The position between two consecutive headlines where the projection profile height is minimum denotes the boundary between two text lines.

ii. Word segmentation: To segment the text line image into words, vertical projection profiles are computed. The projection profile is the histogram of the image. In the profile, the zero-valued valleys represent the spaces between words. The segmented line from the first stage is taken as input for the second stage, word segmentation, in which the line is segmented into words.

iii. Character segmentation: Character segmentation is done after the individual words are identified. To extract characters from a word, removal of the headline is essential. For this, the horizontal projection of the individual word is computed first, and the rows having the highest projection are considered the headline and removed before further character segmentation. After removal, the vertical projection of the individual word is computed.

Fig: Segmentation Using OCR Technique

Feature Extraction
This step describes the various features selected by us for classification of the selected characters. Extraction of good features is the main key to correctly recognizing an unknown character. A good feature set contains discriminating information, which can distinguish one object from other objects. It must also be as robust as possible in order to prevent generating different feature codes for objects in the same class.

Fig: Feature Extraction in OCR Technique

Text analysis: concepts and technique:

i. Information extraction (IE): It identifies key phrases and relationships within text. It does this by looking for predefined sequences in text, a process usually called pattern matching, typically based on regular expressions. The most popular form of IE is named entity recognition (NER). NER seeks to locate and classify atomic elements in text into predefined categories (usually matching pre-established ontologies). NER techniques extract features such as the names of persons, organizations, locations, temporal or spatial expressions, quantities, monetary values, stock values, percentages, gene or protein names, etc. Several tools are relevant for this task: Apache OpenNLP, Stanford Named Entity Recognizer and LingPipe.

Fig: Overview of Text Mining Framework

ii. K-means Clustering Algorithm: Clustering is the classification of objects into different groups, or more precisely the partitioning of a data set into subsets (clusters), so that the data in each subset share some common trait, often according to some defined distance measure. Once clusters are obtained, each cluster can be examined for other outcomes such as classification. The goal is to achieve high availability, scalability or both.

Fig: Document Clustering Algorithm and Flowchart
Fig: Flowchart of K-means Clustering Algorithm
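The partitioning idea behind K-means can be shown with a short pure-Python sketch. The paper does not give its distance measure, initialization or stopping rule, so squared Euclidean distance, random initial centroids drawn from the data, and stopping when assignments no longer change are assumptions here; the function names are hypothetical. For document clustering, each point would be a numeric feature vector derived from one document.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iterations=100, seed=0):
    """Partition points into k clusters; returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments
```

Calling `k_means(points, 2)` on two well-separated groups of vectors returns one cluster label per point. The result depends on the initial centroids, so in practice the algorithm is often run with several seeds and the partition with the lowest total within-cluster distance is kept.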
Advantages
- Easy to compute.
- Provides a basic metric to extract the most descriptive terms in a document.

Limitations
- Does not capture position in the text, semantics, or co-occurrences in different documents.
- Cannot capture semantics (e.g. as compared to word embeddings).

Conclusion:
The computing world has a lot to gain from various AI approaches. Their ability to learn by example makes them very flexible and powerful. Furthermore, there is no need to devise an algorithm in order to perform a specific task, i.e. there is no need to understand the internal mechanisms of that task. They are also very well suited to real-time systems because of their fast response and computation times, which are due to their parallel architecture. The goal of artificial intelligence is to create computers whose intelligence equals or surpasses that of humans. The proposed methodology will help in information retrieval in the field of scanned documents. Future work will implement the use of dictionary words to improve the performance of the OCR system.