Introduction to Expert Systems.pptx

Introduction to Expert Systems
 Title: Author Identification System Using Keywords and
Pattern Frequency
 OBJECTIVE OF SYSTEM: To identify or verify authors
through text analysis.
 How it works: The system analyzes word frequency,
keyword frequency, and part-of-speech pattern frequency
(POS permutations) after keywords.

Problem definition and system
objectives
 PROBLEM DEFINITION: From multiple texts, it is difficult
to identify or verify the authorship of each text. This is
needed in many areas, such as copyright infringement
issues, plagiarism research, or authorship identification
of literary works.
 System Purpose: the system solves the above problem
using text analysis and Natural Language Processing (NLP)
techniques.

Overview of the entire system
 Our system analyzes word frequency, keyword frequency,
and POS patterns after keywords in the input text.
 These data form a feature vector that represents the
writing style of a particular author.
 Finally, these feature vectors are used to identify or verify
authors.
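As a rough illustration of the feature vector idea, here is a minimal sketch that assumes the vector holds the relative frequency of each word in a fixed vocabulary. The names `tokenize` and `feature_vector`, the sample vocabulary, and the sample text are hypothetical; the slides do not specify the representation.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def feature_vector(text, vocabulary):
    """Return the relative frequency of each vocabulary word in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty text
    return [counts[word] / total for word in vocabulary]


# Hypothetical vocabulary and text, for illustration only.
vocabulary = ["the", "really", "lovely"]
vector = feature_vector("The beach is lovely, really lovely.", vocabulary)
print(vector)
```

Using relative rather than raw counts keeps vectors comparable between texts of different lengths.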

Specific details in the system
 Word frequency: Authors tend to favor certain words.
Analyzing the frequency of these words can help identify
authors.
 Keyword frequency: The frequency of certain keywords is
also important because authors frequently write about a
particular subject or topic.
 POS Patterns: Authors also tend to use grammatical
patterns consistently. For example, an author may almost always
follow a particular keyword with the same part of speech.
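The POS pattern idea can be sketched as counting which part of speech directly follows a keyword. This is only an illustration: `TOY_TAGS` is a hypothetical hand-made lookup that stands in for a real part-of-speech tagger, and `pos_after_keyword` is an assumed name, not part of the system described in the slides.

```python
from collections import Counter

# Hypothetical, hand-made tag lookup used only for this illustration;
# a real system would use a trained part-of-speech tagger.
TOY_TAGS = {
    "i": "PRON", "you": "PRON", "am": "VERB",
    "love": "VERB", "miss": "VERB", "happy": "ADJ", "really": "ADV",
}


def pos_after_keyword(tokens, keyword):
    """Count the POS tags of the tokens that directly follow the keyword."""
    patterns = Counter()
    for current, following in zip(tokens, tokens[1:]):
        if current == keyword:
            patterns[TOY_TAGS.get(following, "UNK")] += 1
    return patterns


# Invented sample text.
tokens = "i really love you i really miss you i am really happy".split()
print(pos_after_keyword(tokens, "really"))
```

In this invented sample, "really" is followed by a verb twice and an adjective once; a strong preference of this kind is the sort of habit the POS pattern feature is meant to capture.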

Problem
 Debbie was on her honeymoon and wrote an email to her mother. However,
the mother thought that Debbie might not have written the email and
contacted the police. We need to consider an expert system to ascertain
whether Debbie really wrote the email.

Overall system
 Prepare four data sets: one set of emails received from Debbie after the
marriage (Questioned), one set of emails received from Debbie before the
marriage (Known1), one set of emails received from Jamie (Known2), and an
unspecified set of emails (Reference).
 Using statistical analysis, discover Debbie's and Jamie's respective keywords
and check whether the keywords in the Questioned set match those of either person.

System in detail (1/2)
 Split the sentences in the four data sets into words and count how often
each word occurs. Then sort the words by count.
 For each of the three data sets, compare each word's usage percentage with
the Reference and collect the words that are used least in the Reference
relative to that data set. These become keywords.
 Compare the keywords of the Questioned set with those of the two people and
calculate how many apply. The person with the most applicable keywords is
assumed to be the writer.
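The three steps above can be sketched as follows. This assumes that "keywords" means the words a writer uses far more often than the Reference does, scored by a simple frequency ratio; the slides do not name a statistic. The function names, the `top_n` and `floor` parameters, and the sample emails are all invented for illustration.

```python
import re
from collections import Counter


def relative_freq(text):
    """Map each word to its share of all tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1
    return {w: c / total for w, c in Counter(tokens).items()}


def keywords(known, reference, top_n=2, floor=1e-6):
    """Pick the words used most in `known` relative to `reference`."""
    known_f, ref_f = relative_freq(known), relative_freq(reference)
    # `floor` (an assumed smoothing value) avoids dividing by zero
    # for words that never occur in the reference.
    ratio = {w: f / (ref_f.get(w, 0.0) + floor) for w, f in known_f.items()}
    ranked = sorted(ratio, key=lambda w: (-ratio[w], w))
    return set(ranked[:top_n])


def attribute(questioned, known_texts, reference):
    """Return the label whose keywords appear most in the questioned text."""
    q_words = set(relative_freq(questioned))
    scores = {label: len(keywords(text, reference) & q_words)
              for label, text in known_texts.items()}
    return max(scores, key=scores.get), scores


# Invented toy data standing in for the four data sets.
reference = "the weather is nice and the food is good and we are fine"
known = {
    "Debbie": "the weather is lovely and the food is lovely honestly",
    "Jamie": "the weather is awesome and the food is awesome mate",
}
questioned = "the hotel is lovely honestly"

label, scores = attribute(questioned, known, reference)
print(label, scores)
```

With real emails, a significance measure such as log-likelihood would be a more robust way to rank keywords than a raw ratio, and ties between the two candidates would need explicit handling.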

System in detail (2/2)
Tokenization of sentences
→ Sort by word frequency
→ Find keywords based on references
→ Does the keyword in Known1 apply to the Questioned more than the keyword in Known2?
Yes: Output Debbie label
No: Output Jamie label

Problem/Purpose
 Problem
It is not known who wrote a given sentence.
 Purpose
Determine who wrote the input sentences by comparing them with the datasets.

Overall system
Step 1. Input sentences.
Step 2. Compare them with the datasets (for example, word frequency, keyword frequency, and keyword and pattern frequency).
Step 3. Output the result of Step 2.

System in detail: how
① How to create the dataset?
・Collect many sentences written by the person we want to identify.
・Separate the sentences into words.
・Organize the dataset into word frequencies, keyword frequencies, and keyword and pattern frequencies.

System in detail: how (continued)
② How to compare the input sentences with the datasets?
・Separate the input sentences into words.
・Compare the input words with the dataset to see whether they are similar in word frequency, keyword
frequency, and keyword and pattern frequency.
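One possible way to measure "similarity in word frequency" is cosine similarity between word-count vectors. The slides do not say which measure is used, so this is a swapped-in technique shown as a minimal sketch; `word_counts` and `cosine_similarity` are hypothetical names.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Separate the text into lowercase words and count each one."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no measurable similarity
    return dot / (norm_a * norm_b)


# Invented sample sentences.
dataset = word_counts("we had a lovely day and a lovely dinner")
same_style = word_counts("a lovely day and a lovely walk")
other_style = word_counts("traffic was awful")

print(cosine_similarity(dataset, same_style))
print(cosine_similarity(dataset, other_style))
```

The same function can be applied to keyword counts or keyword and pattern counts, since it only needs two mappings from items to counts.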

Problem/Purpose
Problem:
Were the emails received from Debbie after her marriage written by Debbie herself?
Purpose:
To determine whether the emails were written by Debbie.

Overall system
Input: Emails received from Debbie after marriage
Inference: Word frequency, keyword frequency, keyword pattern
Output: Jamie/Debbie

System in detail 1
・The knowledge needed
Known Dataset: Emails received from Debbie before marriage,
Emails received from Jamie
Reference Dataset: Large collection of emails from many different
senders
・The inference needed
Frequency comparison, Keyword matching, Pattern analysis

System in detail 2
• Frequency comparison
Create a word frequency list from each email dataset and check whether any words or keywords overlap
between them.
• Keyword matching
Split the sentences in the emails into words, count the frequency of each keyword, and search for
keywords that match across datasets.
• Pattern analysis
Calculate the keyword POS patterns of each dataset relative to the reference dataset, and evaluate how
much the keyword patterns in the emails overlap.
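The overlap evaluation could be sketched with the Jaccard index, which divides the number of shared items by the number of distinct items in both sets. The slides do not specify an overlap measure, so this is an assumption, and the (keyword, POS tag) pairs below are invented examples.

```python
def jaccard_overlap(items_a, items_b):
    """Share of items common to both collections: |A & B| / |A | B|."""
    set_a, set_b = set(items_a), set(items_b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty collections have no overlap to measure
    return len(set_a & set_b) / len(union)


# Hypothetical keyword patterns, written as (keyword, following POS tag) pairs.
questioned = {("really", "VERB"), ("so", "ADJ"), ("love", "PRON")}
known1 = {("really", "VERB"), ("so", "ADJ"), ("hope", "PRON")}
known2 = {("totally", "ADJ"), ("love", "PRON")}

print("Known1 overlap:", jaccard_overlap(questioned, known1))
print("Known2 overlap:", jaccard_overlap(questioned, known2))
```

The dataset with the larger overlap would be the more likely source of the questioned emails under this measure. The same function works for plain keyword sets as well as for keyword patterns.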
