SlideShare a Scribd company logo
1 of 42
 A Fuzzy Logic intelligent agent for Information Extraction: Introducing a new
Fuzzy Logic-based term weighting scheme (Jorge Ropero, Ariel Gomez, Alejandro
Carrascom Carlos Leon ; Department of Electronic Technology, University of
Seville, Spain ; October 2011)
 A new Fuzzy Logic Based Information Retrieval Model (Slawomir Zadozny, Janusz
Kacprzyk ; Polish Academy of Sciences ; January 2008)
 Information Extraction in a Set of Knowledge Using a Fuzzy Logic Based
Intelligent Agent (Jorge Ropero, Ariel Gomez, Alejandro Carrascom Carlos Leon ;
University of Seville, Spain ; August 2007)
 A method of Information Extraction (IE) in a set of knowledge is proposed to
answer to user consultations using natural language.
 The system is based on fuzzy logic engine, which takes advantages of its flexibility
for managing sets of accumulated knowledge.
 These sets can be built in hierarchic levels by a tree structure.
 The eventual aim of this system is the implementation of an intelligent agent to
manage the information contained in an internet website (portal).
 The quantity of web information grows exponentially.
 Achieving both high recall and precision in IR is one of its most important
objectives.
 IR has been widely used for text classification introducing approaches such as
Vector Space Model (VSM), K nearest neighbor method , Bayesian Classification
model , Neural networks and Support Vector Machine.
 VSM is the most frequently used model.
 We propose a fully novel method using Fuzzy Logic (FL), to extract the required
information and related information too.
 Giving answers to the user consultations in natural language.
 Bearing in mind that non-experts users tend not to be exact in their searches.
 Data Mining (DM) : is an automatic process of analyzing information in order to discover
patters and to build predictive models, some of its applications are e-commerce, Text
mining, e-learning and marketing.
 Information Retrieval (IR) : is an automatic search of the relevant information contained
in a set of knowledge.
 Information Extraction (IE) : Once documents have been retrieved , the challenge is to
extract the required information automatically, so its task is to identify the specific
fragments of a document which constitute its main semantic content.
 The main objective of the designed system must be to let the users find possible
answers to what they are looking for in a huge set of knowledge.
 The whole set must be classified into different objects.
 These objects are the answers to possible user consultations, organized in hierarchic
groups.
 One or more standard questions are assigned for every object.
 Different index terms from each standard question must then be selected in order to
differentiate one object from the others.
 Finally, term weights are assigned to every index term for every level of hierarchy in a
scheme based on VSM, these terms are the inputs to the FL system.
 The system must return to the user the object correspondent with the standard
question, or questions that are more similar to the user consultations.
 The first step is divide the whole set of knowledge into objects, one or more
questions in NL are assigned to every object, the answers to this or these
questions must represent the desired object.
 The second step is the selection of the index terms, which are extracted from the
questions. The index terms represent the most related terms of standard
questions with the represented object.
 We may consider mainly two methods for term weighting:
1. Let an expert in the matter evaluate intuitively the importance of index terms. This
method is simple, but it has the disadvantages of depending exclusively on the
engineer of knowledge.
2. Automate TW by means of a series of rules.
 Given the large quantity of information there is in a web portal, we choose the
second option.
 We propose a VSM method for TW.
 The most widely used method for TW is so-called TF-IDF method.
 Every index term has an associated weight. This weight has a value between 0 and 1
depending on the importance of the term in every hierarchic level.
 The greater is the importance of the term in a level, the higher is the weight of the
term.
 The term weight might not be the same for every hierarchic level.
𝑤𝑡,𝑑 = 𝑡𝑓𝑡,𝑑 . log
𝐷
𝑑′ ∈ 𝐷 | 𝑡 ∈ 𝑑′
Is the term frequency of term t
in document d
is inverse document frequency
(a global parameter). | D | is
the total number of documents
in the document set, the
dominator is the number of
documents containing the
term t.
 The element of the intelligent agent which determines the degree of certainty for a
group of index terms belonging or not to every possible subsets of the whole set of
knowledge is the fuzzy inference engine.
 The Inference engine has several inputs : the weights of selects index terms.
 The Inference engine gives an output : the degree of certainty for a particular
subset in particular level.
 For the fuzzy engine , it is necessary to define:
1. The number of inputs to the inference engine.
2. Input fuzzy sets : input ranges, number of fuzzy sets and shape and range of
membership functions.
3. Output fuzzy sets: output ranges, member of fuzzy sets, and shape and range of
membership functions.
4. Fuzzy rules, they are of the IF … THEN type.
5. Used methods for AND and OR operations and defuzzifying.
 All these parameters must be taken into account to find the optimal configuration
for the inference engine.
 Once standard questions are defined, index terms are extracted from them. Index
terms are the ones that better represent a standard question.
 Every index term is associated with its correspondent term weight. This weight
has a value between 0 and 1 and depends on the importance of a term in a level.
 The higher the importance of a term weight in a level, the higher is the term
weight.
 The final aim of the intelligent agent must be to find object or objects whose
information is more similar to the requested user consultation.
Step Example
Step 1: Web page identified by standard
question/s
- Web page :
www.us.es/univirtual/internetwww.us.es/u
nivirtuail/internet
- Standard question : Which services can I
access as a virtual user at the University
of Seville?
Step 2: Locate standard question/s in
hierarchic structure.
Topic 12: Virtual University
Section 6: Virtual User
Object : 2
Step 3: Extract index terms Index terms : “services” , “virtual” , “user
Step 4 : Term weighting Will discuss it later
 The first goal is to check that the system makes a correct identification of
standard questions with an index of certainty higher than a certain threshold.
 This is related to the concept id recall.
 The second goal is to check weather the required standard question is among the
tree answers with higher degree of certainty.
 This is related to precision.
 Test results for standard questions recognition fit into five categories :
1. The correct question is the only one found or the one that has the highest degree of
certainty.
2. The correct question is one between the two with the highest certainty or is the one
that has the second highest degree of certainty.
3. The correct question is one between the three with the highest certainty or is the
one that has the third highest degree of certainty.
4. The correct question is found but not among the three with the highest degree of
certainty.
5. The correct question is not found.
 Input range corresponds to weight range for every index term.
 We considered three fuzzy sets represented by values LOW, MEDIUM and HIGH.
 All of them are kept as triangular.
 Output which gives the degree of certainty is defined as LOW, LOW-MEDIUM,
MEDIUM-HIGH and HIGH.
 The fact that the input takes 3 fuzzy sets is due to that the number of these sets
are enough so that results are coherent and there are not so many options to let
the number of rules increase.
 Center of gravity defuzzifier.
 The range of values for every input set is :
 LOW, from 0.0 to 0.4 centered in 0.0.
 MEDIUM from 0.2 to 0.8 centered in 0.5.
 HIGH from 0.6 to 0.1 centered in 1.0.
 The range of values for every output fuzzy set is :
 LOW, from 0.0 to 0.4 centered in 0.0.
 MEDIUM-LOW from 0.1 to 0.7 centered in 0.4.
 MEDIUM-HIGH from 0.3 to 0.9 centered in 0.6.
 HIGH from 0.6 to 1.0 centered in 1.0.
Rule number Rule definition Output
R1 IF one or more inputs = HIGH HIGH
R2 IF three inputs = MEDIUM HIGH
R3 IF two inputs = MEDIUM and one input =
LOW
MEDIUM-HIGH
R4 IF one input = MEDIUM and two inputs =
LOW
MEDIUM-LOW
R5 IF all inputs = LOW LOW
 Step 1 : User query in Natural language.
 “Which services can I access as a virtual user at the University of Seville ?”
 Step 2: Index term extraction.
 TiW = Term weight vector for Topic i.
 Step 3: Weight vectors are taken as inputs to the fuzzy engine for every topic.
 TiO = Fuzzy engine output for Topic i.
 *Topics 10 and 12 are considered , threshold 0.4 in our case.
 Step 4: Step 3 is repeated for the next hierarchic level “ Sections of the selected
Topics”.
 TiSjW = Term weight vector for Topic i, Section j.
 TisjO = Fuzzy engine output for Topic i, Section j.
 Step 5 : Step 3 is repeated for the next hierarchic level “Objects of the selected
Sections”.
 TiSjOkW = Term weight vector for Topic i, Section j, Object k.
 TiSjOkO = Fuzzy engine output for Topic i, Section j , Object k.
 We may consider two options to define these weights :
 An expert in the matter should evaluate intuitively the importance of the index term.
 Simple, but it has the disadvantage of depending exclusively on the knowledge engineer. Also not
possible to automate.
 The generation of automated weights by means of a set of rules.
 The most widely used method is the TF-IDF method
 A novel Fuzzy Logic based method.
 Achieves better results in IE.
 The FL-based method has two main advantages :
1. It improves the basic TF-IDF method by creating a table with all keywords and their
corresponding weights for every object, this table will be created in the phase of
keyword extraction from standard questions.
2. The whole method of term weighting is automated and the level of required
expertise for an operator is lower.
 The chosen formula for the tests was the one proposed by Liu et al(2001).
𝑊𝑖𝑘 =
𝑡𝑓𝑖𝑘 × log(
𝑁
𝑛 𝑘 + 0.001
)
𝑘=1
𝑚
𝑡𝑓𝑖𝑘 × log
𝑁
𝑛 𝑘 + 0.001
2
 Here, 𝑡𝑓𝑖𝑘 is the ith term frequency of occurrence in the kth subset
“Topic/Section/Object”, 𝑛 𝑘 is the number of subsets to which the term Ti is
assigned in a collection on N objects.
 As an example we are using the term ‘virtual’ as used in the previous example.
 At Topic level :
o ‘Virtual’ appears 8 times in Topic 12 (tf = 8, K = 12)
o ‘Virtual’ appears twice in other topics (nk = 3)
o There are 12 Topics in total (N = 12)
o Substituting , Wik = 0.20
 At Section level:
o ‘Virtual’ appears 3 times in Section 12.6 (tf = 3, K = 6)
o ‘Virtual’ appears 5 times in other sections in Topic 12 (nk = 6)
o There are 6 Sections in Topic 12 ( N = 6).
o Substituting , Wik = 0.17
 At Object level:
o ‘Virtual’ appears Once in Section 12.6.2 (tf = 1, K = 2)
o ‘Virtual’ appears twice in other Topic 12 (nk = 3)
o There are 3 Objects in Section 12.6 ( N = 3).
o Substituting , Wik = 0.01
 TF-IDF has the disadvantage of not considering the degree of identification of the
object if only the considered Index term is used.
 FL-based term weighting method is defined as a four questions that must be
answered to determine the TW of an index term.
1. Question 1 : How often does an index term appear in other subsets ? –related to IDF.
2. Question 2 : How often does an index term appear in its own subset? – related to TF.
3. Question 3 : Does an index term undoubtedly define an object by itself?
4. Question 4 : Is an index term tied to anther one?
 The answer to these questions gives a series of values which are the inputs to a
Fuzzy logic system , called Weight Assigner. The output of the WA is the definite
weight for the corresponding index term.
 Term weight is partly associated with the question “How often does an index term
appear in other subsets”.
 It is given by a value between 0 and 1.
 0 if it appears many times
 1 if it does not appear in any other subset.
 Term weight values for every Topic for Q1:
 Term weight values for every Section for Q1:
 Term weight values for every Object for Q1:
 To find the term weight associated with “How often does an index term appear in
its own subset ?”
 Q2 is senseless at the level of Object.
 Example for term weight for every Topic and Section:
 Does a term define undoubtedly a standard question ?
 The answer is completely subjective to the answers of “Yes”, ”Rather” and “No”.
 Term weight values for Q3:
 Is an index term tied to another one?
Rule Rule definition Output
R1 IF Q1 = HIGH and Q2 != LOW At least MEDIUM-HIGH
R2 IF Q1 = MEDIUM and Q2 = HIGH At least MEDIUM-HIGH
R3 IF Q1 = HIGH and Q2 = LOW Depends on other Questions
R4 IF Q1 = HIGH and Q2 = LOW Depends on other Questions
R5 IF Q3 = HIGH At least MEDIUM-HIGH
R6 IF Q4 = LOW Descends a level
R7 IF Q4 = MEDIUM If the Output is MEDIUM-
LOW, it depends to LOW
R8 IF (R1 and R2) or (R1 and R5) or (R2 and
R5)
HIGH
R9 In any other case MEDIUM-LOW
Cat1 Cat2 Cat3 Cat4 Cat5 Total
TF-IDF
method
466
(50.98%)
223
(24.40%)
53 (5.80%) 79 (8.64%) 93 (10.18%) 914
FL method 710
(77.68%)
108
(11.82%)
27 (2.95%) 28 (3.06%) 41 (4.49%) 914

More Related Content

What's hot

Sentiment Analysis on Twitter
Sentiment Analysis on TwitterSentiment Analysis on Twitter
Sentiment Analysis on TwitterSubarno Pal
 
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTEA FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTEkevig
 
Modeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector QuantizationModeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector QuantizationTELKOMNIKA JOURNAL
 
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTIONTEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTIONijistjournal
 
Sentiment Analysis on Twitter Data
Sentiment Analysis on Twitter DataSentiment Analysis on Twitter Data
Sentiment Analysis on Twitter DataIRJET Journal
 
Binary search query classifier
Binary search query classifierBinary search query classifier
Binary search query classifierEsteban Ribero
 
Paper id 28201441
Paper id 28201441Paper id 28201441
Paper id 28201441IJRAT
 
IRJET- Personality Recognition using Multi-Label Classification
IRJET- Personality Recognition using Multi-Label ClassificationIRJET- Personality Recognition using Multi-Label Classification
IRJET- Personality Recognition using Multi-Label ClassificationIRJET Journal
 
IRJET- Classifying Twitter Data in Multiple Classes based on Sentiment Class ...
IRJET- Classifying Twitter Data in Multiple Classes based on Sentiment Class ...IRJET- Classifying Twitter Data in Multiple Classes based on Sentiment Class ...
IRJET- Classifying Twitter Data in Multiple Classes based on Sentiment Class ...IRJET Journal
 
MACHINE LEARNING TOOLBOX
MACHINE LEARNING TOOLBOXMACHINE LEARNING TOOLBOX
MACHINE LEARNING TOOLBOXmlaij
 
Sentiment analysis of twitter data
Sentiment analysis of twitter dataSentiment analysis of twitter data
Sentiment analysis of twitter dataBhagyashree Deokar
 
Learning strategy with groups on page based students' profiles
Learning strategy with groups on page based students' profilesLearning strategy with groups on page based students' profiles
Learning strategy with groups on page based students' profilesaciijournal
 
IRJET-Sentiment Analysis in Twitter
IRJET-Sentiment Analysis in TwitterIRJET-Sentiment Analysis in Twitter
IRJET-Sentiment Analysis in TwitterIRJET Journal
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics DomainDrjabez
 
Ijmer 46067276
Ijmer 46067276Ijmer 46067276
Ijmer 46067276IJMER
 
Assessment of Programming Language Reliability Utilizing Soft-Computing
Assessment of Programming Language Reliability Utilizing Soft-ComputingAssessment of Programming Language Reliability Utilizing Soft-Computing
Assessment of Programming Language Reliability Utilizing Soft-Computingijcsa
 
Sensing Trending Topics in Twitter for Greater Jakarta Area
Sensing Trending Topics in Twitter for Greater Jakarta Area Sensing Trending Topics in Twitter for Greater Jakarta Area
Sensing Trending Topics in Twitter for Greater Jakarta Area IJECEIAES
 
IRJET- Machine Learning: Survey, Types and Challenges
IRJET- Machine Learning: Survey, Types and ChallengesIRJET- Machine Learning: Survey, Types and Challenges
IRJET- Machine Learning: Survey, Types and ChallengesIRJET Journal
 
IRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
IRJET- Sentimental Analysis for Online Reviews using Machine Learning AlgorithmsIRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
IRJET- Sentimental Analysis for Online Reviews using Machine Learning AlgorithmsIRJET Journal
 

What's hot (20)

Sentiment Analysis on Twitter
Sentiment Analysis on TwitterSentiment Analysis on Twitter
Sentiment Analysis on Twitter
 
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTEA FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
 
Modeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector QuantizationModeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector Quantization
 
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTIONTEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION
 
Sentiment Analysis on Twitter Data
Sentiment Analysis on Twitter DataSentiment Analysis on Twitter Data
Sentiment Analysis on Twitter Data
 
Binary search query classifier
Binary search query classifierBinary search query classifier
Binary search query classifier
 
Paper id 28201441
Paper id 28201441Paper id 28201441
Paper id 28201441
 
IRJET- Personality Recognition using Multi-Label Classification
IRJET- Personality Recognition using Multi-Label ClassificationIRJET- Personality Recognition using Multi-Label Classification
IRJET- Personality Recognition using Multi-Label Classification
 
IRJET- Classifying Twitter Data in Multiple Classes based on Sentiment Class ...
IRJET- Classifying Twitter Data in Multiple Classes based on Sentiment Class ...IRJET- Classifying Twitter Data in Multiple Classes based on Sentiment Class ...
IRJET- Classifying Twitter Data in Multiple Classes based on Sentiment Class ...
 
MACHINE LEARNING TOOLBOX
MACHINE LEARNING TOOLBOXMACHINE LEARNING TOOLBOX
MACHINE LEARNING TOOLBOX
 
Sentiment analysis of twitter data
Sentiment analysis of twitter dataSentiment analysis of twitter data
Sentiment analysis of twitter data
 
Learning strategy with groups on page based students' profiles
Learning strategy with groups on page based students' profilesLearning strategy with groups on page based students' profiles
Learning strategy with groups on page based students' profiles
 
573 248-259
573 248-259573 248-259
573 248-259
 
IRJET-Sentiment Analysis in Twitter
IRJET-Sentiment Analysis in TwitterIRJET-Sentiment Analysis in Twitter
IRJET-Sentiment Analysis in Twitter
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics Domain
 
Ijmer 46067276
Ijmer 46067276Ijmer 46067276
Ijmer 46067276
 
Assessment of Programming Language Reliability Utilizing Soft-Computing
Assessment of Programming Language Reliability Utilizing Soft-ComputingAssessment of Programming Language Reliability Utilizing Soft-Computing
Assessment of Programming Language Reliability Utilizing Soft-Computing
 
Sensing Trending Topics in Twitter for Greater Jakarta Area
Sensing Trending Topics in Twitter for Greater Jakarta Area Sensing Trending Topics in Twitter for Greater Jakarta Area
Sensing Trending Topics in Twitter for Greater Jakarta Area
 
IRJET- Machine Learning: Survey, Types and Challenges
IRJET- Machine Learning: Survey, Types and ChallengesIRJET- Machine Learning: Survey, Types and Challenges
IRJET- Machine Learning: Survey, Types and Challenges
 
IRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
IRJET- Sentimental Analysis for Online Reviews using Machine Learning AlgorithmsIRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
IRJET- Sentimental Analysis for Online Reviews using Machine Learning Algorithms
 

Similar to A Fuzzy Logic Intelligent Agent for Information Extraction

IRJET - A Survey on Machine Learning Algorithms, Techniques and Applications
IRJET - A Survey on Machine Learning Algorithms, Techniques and ApplicationsIRJET - A Survey on Machine Learning Algorithms, Techniques and Applications
IRJET - A Survey on Machine Learning Algorithms, Techniques and ApplicationsIRJET Journal
 
Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...ijsc
 
Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...ijsc
 
IRJET- Factoid Question and Answering System
IRJET-  	  Factoid Question and Answering SystemIRJET-  	  Factoid Question and Answering System
IRJET- Factoid Question and Answering SystemIRJET Journal
 
Classification of Machine Learning Algorithms
Classification of Machine Learning AlgorithmsClassification of Machine Learning Algorithms
Classification of Machine Learning AlgorithmsAM Publications
 
Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine LearningSentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine LearningIRJET Journal
 
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5ssuser33da69
 
Methodological study of opinion mining and sentiment analysis techniques
Methodological study of opinion mining and sentiment analysis techniquesMethodological study of opinion mining and sentiment analysis techniques
Methodological study of opinion mining and sentiment analysis techniquesijsc
 
Classification of Health Forum Messages using Deep Learning
Classification of Health Forum Messages using Deep LearningClassification of Health Forum Messages using Deep Learning
Classification of Health Forum Messages using Deep LearningSejal Naidu
 
Threat Detection System Using Data-science and NLP
Threat Detection System Using Data-science and NLPThreat Detection System Using Data-science and NLP
Threat Detection System Using Data-science and NLPIRJET Journal
 
Artificial intyelligence and machine learning introduction.pptx
Artificial intyelligence and machine learning introduction.pptxArtificial intyelligence and machine learning introduction.pptx
Artificial intyelligence and machine learning introduction.pptxChandrakalaV15
 
Methodological Study Of Opinion Mining And Sentiment Analysis Techniques
Methodological Study Of Opinion Mining And Sentiment Analysis Techniques  Methodological Study Of Opinion Mining And Sentiment Analysis Techniques
Methodological Study Of Opinion Mining And Sentiment Analysis Techniques ijsc
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsDinusha Dilanka
 
Feature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering TechniquesFeature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering TechniquesIRJET Journal
 
CIS 25 SPRING 2020FINAL Due 1159 PM May 22 (this is a har.docx
CIS 25 SPRING 2020FINAL Due 1159 PM May 22 (this is a har.docxCIS 25 SPRING 2020FINAL Due 1159 PM May 22 (this is a har.docx
CIS 25 SPRING 2020FINAL Due 1159 PM May 22 (this is a har.docxsleeperharwell
 
IRJET- Semantics based Document Clustering
IRJET- Semantics based Document ClusteringIRJET- Semantics based Document Clustering
IRJET- Semantics based Document ClusteringIRJET Journal
 
IRJET- Personality Prediction System using AI
IRJET- Personality Prediction System using AIIRJET- Personality Prediction System using AI
IRJET- Personality Prediction System using AIIRJET Journal
 
A Survey Ondecision Tree Learning Algorithms for Knowledge Discovery
A Survey Ondecision Tree Learning Algorithms for Knowledge DiscoveryA Survey Ondecision Tree Learning Algorithms for Knowledge Discovery
A Survey Ondecision Tree Learning Algorithms for Knowledge DiscoveryIJERA Editor
 

Similar to A Fuzzy Logic Intelligent Agent for Information Extraction (20)

IRJET - A Survey on Machine Learning Algorithms, Techniques and Applications
IRJET - A Survey on Machine Learning Algorithms, Techniques and ApplicationsIRJET - A Survey on Machine Learning Algorithms, Techniques and Applications
IRJET - A Survey on Machine Learning Algorithms, Techniques and Applications
 
Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...
 
Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...Document Classification Using Expectation Maximization with Semi Supervised L...
Document Classification Using Expectation Maximization with Semi Supervised L...
 
IRJET- Factoid Question and Answering System
IRJET-  	  Factoid Question and Answering SystemIRJET-  	  Factoid Question and Answering System
IRJET- Factoid Question and Answering System
 
Classification of Machine Learning Algorithms
Classification of Machine Learning AlgorithmsClassification of Machine Learning Algorithms
Classification of Machine Learning Algorithms
 
Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine LearningSentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
 
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
 
Methodological study of opinion mining and sentiment analysis techniques
Methodological study of opinion mining and sentiment analysis techniquesMethodological study of opinion mining and sentiment analysis techniques
Methodological study of opinion mining and sentiment analysis techniques
 
Classification of Health Forum Messages using Deep Learning
Classification of Health Forum Messages using Deep LearningClassification of Health Forum Messages using Deep Learning
Classification of Health Forum Messages using Deep Learning
 
syllabus-CBR.pdf
syllabus-CBR.pdfsyllabus-CBR.pdf
syllabus-CBR.pdf
 
Threat Detection System Using Data-science and NLP
Threat Detection System Using Data-science and NLPThreat Detection System Using Data-science and NLP
Threat Detection System Using Data-science and NLP
 
Artificial intyelligence and machine learning introduction.pptx
Artificial intyelligence and machine learning introduction.pptxArtificial intyelligence and machine learning introduction.pptx
Artificial intyelligence and machine learning introduction.pptx
 
Methodological Study Of Opinion Mining And Sentiment Analysis Techniques
Methodological Study Of Opinion Mining And Sentiment Analysis Techniques  Methodological Study Of Opinion Mining And Sentiment Analysis Techniques
Methodological Study Of Opinion Mining And Sentiment Analysis Techniques
 
V34132136
V34132136V34132136
V34132136
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning Algorithms
 
Feature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering TechniquesFeature Subset Selection for High Dimensional Data using Clustering Techniques
Feature Subset Selection for High Dimensional Data using Clustering Techniques
 
CIS 25 SPRING 2020FINAL Due 1159 PM May 22 (this is a har.docx
CIS 25 SPRING 2020FINAL Due 1159 PM May 22 (this is a har.docxCIS 25 SPRING 2020FINAL Due 1159 PM May 22 (this is a har.docx
CIS 25 SPRING 2020FINAL Due 1159 PM May 22 (this is a har.docx
 
IRJET- Semantics based Document Clustering
IRJET- Semantics based Document ClusteringIRJET- Semantics based Document Clustering
IRJET- Semantics based Document Clustering
 
IRJET- Personality Prediction System using AI
IRJET- Personality Prediction System using AIIRJET- Personality Prediction System using AI
IRJET- Personality Prediction System using AI
 
A Survey Ondecision Tree Learning Algorithms for Knowledge Discovery
A Survey Ondecision Tree Learning Algorithms for Knowledge DiscoveryA Survey Ondecision Tree Learning Algorithms for Knowledge Discovery
A Survey Ondecision Tree Learning Algorithms for Knowledge Discovery
 

Recently uploaded

dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknowmakika9823
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 

Recently uploaded (20)

꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 

A Fuzzy Logic Intelligent Agent for Information Extraction

  • 1.
  • 2.  A Fuzzy Logic intelligent agent for Information Extraction: Introducing a new Fuzzy Logic-based term weighting scheme (Jorge Ropero, Ariel Gomez, Alejandro Carrascom Carlos Leon ; Department of Electronic Technology, University of Seville, Spain ; October 2011)  A new Fuzzy Logic Based Information Retrieval Model (Slawomir Zadozny, Janusz Kacprzyk ; Polish Academy of Sciences ; January 2008)  Information Extraction in a Set of Knowledge Using a Fuzzy Logic Based Intelligent Agent (Jorge Ropero, Ariel Gomez, Alejandro Carrascom Carlos Leon ; University of Seville, Spain ; August 2007)
  • 3.  A method of Information Extraction (IE) in a set of knowledge is proposed to answer to user consultations using natural language.  The system is based on fuzzy logic engine, which takes advantages of its flexibility for managing sets of accumulated knowledge.  These sets can be built in hierarchic levels by a tree structure.  The eventual aim of this system is the implementation of an intelligent agent to manage the information contained in an internet website (portal).
  • 4.  The quantity of web information grows exponentially.  Achieving both high recall and precision in IR is one of its most important objectives.  IR has been widely used for text classification introducing approaches such as Vector Space Model (VSM), K nearest neighbor method , Bayesian Classification model , Neural networks and Support Vector Machine.  VSM is the most frequently used model.
  • 5.  We propose a fully novel method using Fuzzy Logic (FL), to extract the required information and related information too.  Giving answers to the user consultations in natural language.  Bearing in mind that non-experts users tend not to be exact in their searches.
  • 6.  Data Mining (DM) : is an automatic process of analyzing information in order to discover patters and to build predictive models, some of its applications are e-commerce, Text mining, e-learning and marketing.  Information Retrieval (IR) : is an automatic search of the relevant information contained in a set of knowledge.  Information Extraction (IE) : Once documents have been retrieved , the challenge is to extract the required information automatically, so its task is to identify the specific fragments of a document which constitute its main semantic content.
  • 7.  The main objective of the designed system must be to let the users find possible answers to what they are looking for in a huge set of knowledge.  The whole set must be classified into different objects.  These objects are the answers to possible user consultations, organized in hierarchic groups.  One or more standard questions are assigned for every object.  Different index terms from each standard question must then be selected in order to differentiate one object from the others.  Finally, term weights are assigned to every index term for every level of hierarchy in a scheme based on VSM, these terms are the inputs to the FL system.  The system must return to the user the object correspondent with the standard question, or questions that are more similar to the user consultations.
  • 8.  The first step is divide the whole set of knowledge into objects, one or more questions in NL are assigned to every object, the answers to this or these questions must represent the desired object.  The second step is the selection of the index terms, which are extracted from the questions. The index terms represent the most related terms of standard questions with the represented object.
  • 9.  We may consider mainly two methods for term weighting: 1. Let an expert in the matter evaluate intuitively the importance of index terms. This method is simple, but it has the disadvantages of depending exclusively on the engineer of knowledge. 2. Automate TW by means of a series of rules.  Given the large quantity of information there is in a web portal, we choose the second option.  We propose a VSM method for TW.  The most widely used method for TW is so-called TF-IDF method.
  • 10.
  • 11.  Every index term has an associated weight. This weight has a value between 0 and 1 depending on the importance of the term in every hierarchic level.  The greater is the importance of the term in a level, the higher is the weight of the term.  The term weight might not be the same for every hierarchic level. 𝑤𝑡,𝑑 = 𝑡𝑓𝑡,𝑑 . log 𝐷 𝑑′ ∈ 𝐷 | 𝑡 ∈ 𝑑′ Is the term frequency of term t in document d is inverse document frequency (a global parameter). | D | is the total number of documents in the document set, the dominator is the number of documents containing the term t.
  • 12.  The element of the intelligent agent which determines the degree of certainty for a group of index terms belonging or not to every possible subsets of the whole set of knowledge is the fuzzy inference engine.  The Inference engine has several inputs : the weights of selects index terms.  The Inference engine gives an output : the degree of certainty for a particular subset in particular level.
  • 13.  For the fuzzy engine , it is necessary to define: 1. The number of inputs to the inference engine. 2. Input fuzzy sets : input ranges, number of fuzzy sets and shape and range of membership functions. 3. Output fuzzy sets: output ranges, member of fuzzy sets, and shape and range of membership functions. 4. Fuzzy rules, they are of the IF … THEN type. 5. Used methods for AND and OR operations and defuzzifying.  All these parameters must be taken into account to find the optimal configuration for the inference engine.
  • 14.  Once standard questions are defined, index terms are extracted from them. Index terms are the ones that better represent a standard question.  Every index term is associated with its correspondent term weight. This weight has a value between 0 and 1 and depends on the importance of a term in a level.  The higher the importance of a term weight in a level, the higher is the term weight.  The final aim of the intelligent agent must be to find object or objects whose information is more similar to the requested user consultation.
  • 15. Step Example Step 1: Web page identified by standard question/s - Web page : www.us.es/univirtual/internetwww.us.es/u nivirtuail/internet - Standard question : Which services can I access as a virtual user at the University of Seville? Step 2: Locate standard question/s in hierarchic structure. Topic 12: Virtual University Section 6: Virtual User Object : 2 Step 3: Extract index terms Index terms : “services” , “virtual” , “user Step 4 : Term weighting Will discuss it later
  • 16.  The first goal is to check that the system makes a correct identification of standard questions with an index of certainty higher than a certain threshold.  This is related to the concept id recall.  The second goal is to check weather the required standard question is among the tree answers with higher degree of certainty.  This is related to precision.
  • 17.  Test results for standard questions recognition fit into five categories : 1. The correct question is the only one found or the one that has the highest degree of certainty. 2. The correct question is one between the two with the highest certainty or is the one that has the second highest degree of certainty. 3. The correct question is one between the three with the highest certainty or is the one that has the third highest degree of certainty. 4. The correct question is found but not among the three with the highest degree of certainty. 5. The correct question is not found.
  • 18.  Input range corresponds to weight range for every index term.  We considered three fuzzy sets represented by values LOW, MEDIUM and HIGH.  All of them are kept as triangular.  Output which gives the degree of certainty is defined as LOW, LOW-MEDIUM, MEDIUM-HIGH and HIGH.  The fact that the input takes 3 fuzzy sets is due to that the number of these sets are enough so that results are coherent and there are not so many options to let the number of rules increase.  Center of gravity defuzzifier.
  • 19.  The range of values for every input set is :  LOW, from 0.0 to 0.4 centered in 0.0.  MEDIUM from 0.2 to 0.8 centered in 0.5.  HIGH from 0.6 to 0.1 centered in 1.0.
  • 20.  The range of values for every output fuzzy set is :  LOW, from 0.0 to 0.4 centered in 0.0.  MEDIUM-LOW from 0.1 to 0.7 centered in 0.4.  MEDIUM-HIGH from 0.3 to 0.9 centered in 0.6.  HIGH from 0.6 to 1.0 centered in 1.0.
  • 21. Rule number Rule definition Output R1 IF one or more inputs = HIGH HIGH R2 IF three inputs = MEDIUM HIGH R3 IF two inputs = MEDIUM and one input = LOW MEDIUM-HIGH R4 IF one input = MEDIUM and two inputs = LOW MEDIUM-LOW R5 IF all inputs = LOW LOW
  • 22.  Step 1 : User query in Natural language.  “Which services can I access as a virtual user at the University of Seville ?”
  • 23.  Step 2: Index term extraction.  TiW = Term weight vector for Topic i.
  • 24.  Step 3: Weight vectors are taken as inputs to the fuzzy engine for every topic.  TiO = Fuzzy engine output for Topic i.  *Topics 10 and 12 are considered , threshold 0.4 in our case.
  • 25.  Step 4: Step 3 is repeated for the next hierarchic level “ Sections of the selected Topics”.  TiSjW = Term weight vector for Topic i, Section j.  TisjO = Fuzzy engine output for Topic i, Section j.
  • 26.  Step 5 : Step 3 is repeated for the next hierarchic level “Objects of the selected Sections”.  TiSjOkW = Term weight vector for Topic i, Section j, Object k.  TiSjOkO = Fuzzy engine output for Topic i, Section j , Object k.
  • 27.
  • 28.  We may consider two options to define these weights :  An expert in the matter should evaluate intuitively the importance of the index term.  Simple, but it has the disadvantage of depending exclusively on the knowledge engineer. Also not possible to automate.  The generation of automated weights by means of a set of rules.  The most widely used method is the TF-IDF method  A novel Fuzzy Logic based method.  Achieves better results in IE.
  • 29.  The FL-based method has two main advantages : 1. It improves the basic TF-IDF method by creating a table with all keywords and their corresponding weights for every object, this table will be created in the phase of keyword extraction from standard questions. 2. The whole method of term weighting is automated and the level of required expertise for an operator is lower.
  • 30.  The chosen formula for the tests was the one proposed by Liu et al(2001). 𝑊𝑖𝑘 = 𝑡𝑓𝑖𝑘 × log( 𝑁 𝑛 𝑘 + 0.001 ) 𝑘=1 𝑚 𝑡𝑓𝑖𝑘 × log 𝑁 𝑛 𝑘 + 0.001 2  Here, 𝑡𝑓𝑖𝑘 is the ith term frequency of occurrence in the kth subset “Topic/Section/Object”, 𝑛 𝑘 is the number of subsets to which the term Ti is assigned in a collection on N objects.
  • 31.  As an example we are using the term ‘virtual’ as used in the previous example.  At Topic level : o ‘Virtual’ appears 8 times in Topic 12 (tf = 8, K = 12) o ‘Virtual’ appears twice in other topics (nk = 3) o There are 12 Topics in total (N = 12) o Substituting , Wik = 0.20  At Section level: o ‘Virtual’ appears 3 times in Section 12.6 (tf = 3, K = 6) o ‘Virtual’ appears 5 times in other sections in Topic 12 (nk = 6) o There are 6 Sections in Topic 12 ( N = 6). o Substituting , Wik = 0.17  At Object level: o ‘Virtual’ appears Once in Section 12.6.2 (tf = 1, K = 2) o ‘Virtual’ appears twice in other Topic 12 (nk = 3) o There are 3 Objects in Section 12.6 ( N = 3). o Substituting , Wik = 0.01
  • 32.  TF-IDF has the disadvantage of not considering the degree of identification of the object if only the considered Index term is used.  FL-based term weighting method is defined as a four questions that must be answered to determine the TW of an index term. 1. Question 1 : How often does an index term appear in other subsets ? –related to IDF. 2. Question 2 : How often does an index term appear in its own subset? – related to TF. 3. Question 3 : Does an index term undoubtedly define an object by itself? 4. Question 4 : Is an index term tied to anther one?
  • 33.  The answer to these questions gives a series of values which are the inputs to a Fuzzy logic system , called Weight Assigner. The output of the WA is the definite weight for the corresponding index term.
  • 34.  Term weight is partly associated with the question “How often does an index term appear in other subsets”.  It is given by a value between 0 and 1.  0 if it appears many times  1 if it does not appear in any other subset.
  • 35.  Term weight values for every Topic for Q1:  Term weight values for every Section for Q1:  Term weight values for every Object for Q1:
  • 36.  To find the term weight associated with “How often does an index term appear in its own subset ?”  Q2 is senseless at the level of Object.
  • 37.  Example for term weight for every Topic and Section:
  • 38.  Does a term define undoubtedly a standard question ?  The answer is completely subjective to the answers of “Yes”, ”Rather” and “No”.
  • 39.  Term weight values for Q3:
  • 40.  Is an index term tied to another one?
  • 41. Rule Rule definition Output R1 IF Q1 = HIGH and Q2 != LOW At least MEDIUM-HIGH R2 IF Q1 = MEDIUM and Q2 = HIGH At least MEDIUM-HIGH R3 IF Q1 = HIGH and Q2 = LOW Depends on other Questions R4 IF Q1 = HIGH and Q2 = LOW Depends on other Questions R5 IF Q3 = HIGH At least MEDIUM-HIGH R6 IF Q4 = LOW Descends a level R7 IF Q4 = MEDIUM If the Output is MEDIUM- LOW, it depends to LOW R8 IF (R1 and R2) or (R1 and R5) or (R2 and R5) HIGH R9 In any other case MEDIUM-LOW
  • 42. Cat1 Cat2 Cat3 Cat4 Cat5 Total TF-IDF method 466 (50.98%) 223 (24.40%) 53 (5.80%) 79 (8.64%) 93 (10.18%) 914 FL method 710 (77.68%) 108 (11.82%) 27 (2.95%) 28 (3.06%) 41 (4.49%) 914