Submitted by: Iqra Tamseela
Roll no: 20 21
Topic
Normalization
Presentation
Information Retrieval
Technique
Our Content
• IR tool
Why use information retrieval tools
Retrieval tools are systems created for the retrieval of information. They
are essential as basic building blocks for a system that organizes
recorded information collected by libraries, archives, museums, etc.
What is Normalization
Normalization is the process of organizing data to minimize
redundancy. It usually involves dividing a database into
two or more tables and defining relationships between the tables. ...
For example, in an employee list, each table would contain only one
birthdate field. Normalization organizes data into related tables; it
also eliminates redundancy and increases integrity, which can improve
query performance.
Types of Normalization
• Normalization avoids:
• Duplication of data - The same data is listed in multiple rows of the database.
• Insert anomaly - A record about an entity cannot be inserted into the table without first
inserting information about another entity. Example: cannot enter a customer without a sales order.
• Delete anomaly - A record cannot be deleted without also deleting a record about a related entity.
Example: cannot delete a sales order without deleting all of the customer's information.
• Update anomaly - Information cannot be updated without changing it in many places.
Example: to update customer information, it must be updated for each sales order the customer has placed.
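A minimal sketch of how splitting customers and sales orders into two related tables sidesteps the three anomalies listed above. It uses Python's built-in sqlite3 module; the table names, column names, and sample rows are assumptions made up for illustration, not part of the original slides.

```python
import sqlite3

# In-memory database; the schema and sample data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    city        TEXT NOT NULL
);
CREATE TABLE sales_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL
);
""")

# Insert anomaly avoided: a customer can exist before placing any order.
conn.execute("INSERT INTO customer VALUES (1, 'Asha', 'Lahore')")
conn.execute("INSERT INTO customer VALUES (2, 'Bilal', 'Karachi')")
conn.execute("INSERT INTO sales_order VALUES (10, 1, 250.0)")
conn.execute("INSERT INTO sales_order VALUES (11, 1, 99.5)")

# Update anomaly avoided: the city is stored once, so exactly one row
# changes no matter how many orders the customer has placed.
updated = conn.execute(
    "UPDATE customer SET city = 'Multan' WHERE customer_id = 1"
).rowcount

# Delete anomaly avoided: removing the orders leaves the customer row intact.
conn.execute("DELETE FROM sales_order WHERE customer_id = 1")
remaining_customers = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
remaining_orders = conn.execute("SELECT COUNT(*) FROM sales_order").fetchone()[0]
```

In a single unnormalized table, the same update would have to touch one row per order, and deleting the last order would also erase the customer's details.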
• Normalization ensures that the database is structured in the best possible way.
• To achieve control over data redundancy: there should be no unnecessary
duplication of data in different tables.
• To ensure tables are flexible.
• Searching, sorting, and creating indexes is faster, since tables are narrower and
more rows fit on a data page.
• You usually have more tables
• Index searching is often faster
• A common misunderstanding concerns the term "frequency". To some, it seems to
be a raw count of occurrences. But usually, frequency is a relative value. TF/IDF
is usually a two-fold normalization. First, each document is normalized to
length 1, so there is no bias toward longer or shorter documents.
• Formula (the frequency of term i scaled by the largest term frequency in the document):
• normalized tf_i = tf_i / tf_max
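The formula above can be sketched in a few lines of Python. The function names and the sample sentence are hypothetical. The second function reads "normalized to length 1" as Euclidean (L2) length, which is an assumption, since the slides do not say which norm is meant.

```python
import math
from collections import Counter

def max_tf_normalize(tokens):
    """Scale each raw count by the largest count in the document (tf_i / tf_max)."""
    counts = Counter(tokens)
    if not counts:
        return {}
    tf_max = max(counts.values())
    return {term: count / tf_max for term, count in counts.items()}

def unit_length_normalize(tokens):
    """Scale the count vector to Euclidean length 1 (assumed meaning of 'length 1')."""
    counts = Counter(tokens)
    if not counts:
        return {}
    norm = math.sqrt(sum(count * count for count in counts.values()))
    return {term: count / norm for term, count in counts.items()}

doc = "the cat sat on the mat the end".split()
max_tf = max_tf_normalize(doc)        # most frequent term gets weight 1.0
unit_tf = unit_length_normalize(doc)  # squared weights sum to 1
```

Either way, the weights become relative values, so a term's weight no longer grows just because the document is longer.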
• More complicated SQL is required for multi-table subqueries and
joins.
• Extra work for DBMS can mean slower applications
• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
• Fourth normal form (4NF)
• Fifth normal form (5NF)
Types of Normalization
• Document length normalization adjusts the term frequency or the
relevance score in order to normalize the effect of document length on the
document ranking.
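As a rough illustration of the idea, the sketch below compares a raw term-count score with one divided by document length. The scoring functions and the two toy documents are assumptions for illustration; real systems typically use more refined schemes.

```python
def raw_score(query_terms, doc_tokens):
    """Sum of raw occurrences of the query terms in the document."""
    return sum(doc_tokens.count(term) for term in query_terms)

def length_normalized_score(query_terms, doc_tokens):
    """Raw score divided by the number of tokens in the document."""
    if not doc_tokens:
        return 0.0
    return raw_score(query_terms, doc_tokens) / len(doc_tokens)

short_doc = ["apple", "pie"]
long_doc = short_doc * 10  # same content repeated ten times
query = ["apple"]

raw_short = raw_score(query, short_doc)
raw_long = raw_score(query, long_doc)
norm_short = length_normalized_score(query, short_doc)
norm_long = length_normalized_score(query, long_doc)
```

Without normalization the long document scores ten times higher purely because of its length; after dividing by length, the two documents score the same.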
• We may need to “normalize” words in the indexed text as well as query words into the same
form.
• For example, we want to match U.S.A and USA.
• Tokens are transformed into terms, which are then entered into the index.
• A term is a (normalized) word type, which is an entry in our IR system dictionary.
• We most commonly define equivalence classes of terms implicitly, e.g., by:
• deleting periods to form a term: U.S.A, USA → USA
• deleting hyphens to form a term: anti-discriminatory, antidiscriminatory → antidiscriminatory
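The two rules above (deleting periods and deleting hyphens) can be sketched as follows. The function names, the tiny inverted index, and the sample documents are hypothetical. The same normalization is applied to both indexed tokens and query words, so they land in the same equivalence class.

```python
def normalize_token(token):
    """Map a token to its term by deleting periods and hyphens."""
    return token.replace(".", "").replace("-", "")

def build_index(docs):
    """Build a term -> set of document ids mapping from tokenized documents."""
    index = {}
    for doc_id, tokens in docs.items():
        for token in tokens:
            index.setdefault(normalize_token(token), set()).add(doc_id)
    return index

def search(index, query_word):
    """Normalize the query word the same way before looking it up."""
    return index.get(normalize_token(query_word), set())

docs = {
    1: ["U.S.A", "anti-discriminatory", "law"],
    2: ["USA", "antidiscriminatory", "policy"],
}
index = build_index(docs)
hits = search(index, "U.S.A")
```

Because "U.S.A" and "USA" both map to the term "USA", a query in either form retrieves both documents.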
• Accents: e.g., French résumé vs. resume.
• A simple remedy is to remove accents, but this is not good in cases like
résumé and resume, which are different words with and without the accent.
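A minimal sketch of the "remove accents" remedy, using Python's standard unicodedata module. The function name is hypothetical. It decomposes each character and drops the combining marks, which also shows the drawback noted above: both words collapse to the same term.

```python
import unicodedata

def strip_accents(text):
    """Remove accents by decomposing characters (NFD) and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

noun = strip_accents("résumé")
verb = strip_accents("resume")
collapsed = noun == verb  # the two distinct words can no longer be told apart
```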
Thanks for paying attention

Information Retrieval Techniques and Normalization
