INFORMATION RETRIEVAL (IR)
(PRIVATE VS. PUBLIC)
VENINGSTON. K
Ph.D. Student, Department of CSE,
Government College of Technology, Coimbatore.
veningstonk@gct.ac.in
PRESENTATION OUTLINE
 Public IR
 What is Web IR?
 Overview of Web IR Technologies
 Web IR Models
 Web Search architecture
 Semantic Matching
 Personalization in Web IR
 Challenges in Web based IR
 Challenges in Personalizing Web IR
 Summary Note
 Private IR
 What is Private IR?
 How Does It Work?
 PIR Model
 Approaches to PIR
 PIR Properties
 Summary Note
11/December/2013 AICTE FDP on Web Application Security
WHY INFORMATION RETRIEVAL?
WEB INFORMATION RETRIEVAL
(WEB SEARCH)
 Technologies for helping users to accurately,
quickly, and easily find information on the web
GOAL OF WEB SEARCH
 Accurate: results are relevant, comprehensive, and novel
 Efficient: response time is short
 Easy to Use: good user experience
 Fast task completion
WEB USERS HEAVILY RELY ON SEARCH
ENGINES
HUGE DATA CENTERS
OVERVIEW OF WEB SEARCH
TECHNOLOGIES
 General Web Search, Entity Search, Facet
Search, Question Answering, Multimedia Search
 Ranking, Matching, Retrieval, Document
Understanding, Query Understanding, Crawling,
Indexing, Result Presentation, Anti-spam
 Classification, Clustering, Ranking, Graph
Learning, Tagging, Distributed Computing
WEB SEARCH ARCHITECTURE
[Diagram: a Web Spider collects pages into the document corpus; the IR System takes a query string and the corpus and returns ranked documents (1. Page1, 2. Page2, 3. Page3, ...).]
COMPONENT TECHNOLOGIES FOR WEB IR
 Relevance Ranking
 Importance Ranking
 Web Page Understanding
 Query Understanding
 Crawling
 Indexing
 Search Result Presentation
 Anti-Spam
 Search Log Data Mining / Web Mining
THREE IMPORTANT PROCESSES IN WEB IR
 Retrieval
 Finding documents from inverted index
 Matching
 Calculating relevance score between query and
document pair
 Ranking
 Ranking documents based on relevance scores,
importance scores, etc.
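The three processes above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the function names, the toy documents, and the scoring rule (counting query-term occurrences) are all assumptions made for the example.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def retrieve(index, query):
    """Retrieval: candidate documents that contain at least one query term."""
    candidates = set()
    for term in query.lower().split():
        candidates |= index.get(term, set())
    return candidates

def match(docs, doc_id, query):
    """Matching: a toy relevance score, the count of query-term occurrences."""
    tokens = docs[doc_id].lower().split()
    return sum(tokens.count(term) for term in query.lower().split())

def rank(docs, index, query):
    """Ranking: order candidates by descending score, ties broken by id."""
    scored = [(match(docs, d, query), d) for d in retrieve(index, query)]
    return [d for _, d in sorted(scored, key=lambda p: (-p[0], p[1]))]

# Hypothetical three-page corpus.
docs = {
    "page1": "jaguar car dealer",
    "page2": "jaguar animal habitat jaguar",
    "page3": "stock quotes today",
}
index = build_inverted_index(docs)
ranking = rank(docs, index, "jaguar habitat")
```

Only documents that share a term with the query are ever scored, which is the reason the retrieval step uses the inverted index before matching.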
WEB IR MODELS
 Vector Space Model (Salton 1975)
 Probabilistic Model
 Okapi or BM25 Model (Robertson and Walker 1994)
 Language Model (Ponte and Croft 1998)
 User Model
VECTOR SPACE MODEL
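The slide body did not survive extraction, so the following is only a sketch of the usual vector space formulation, assuming tf-idf term weights and cosine similarity. The smoothed idf variant, the helper names, and the toy documents are assumptions of this example.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one sparse tf-idf vector (term -> weight) per document, plus idf."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed idf (an assumption); other variants are equally common.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()}
               for toks in tokenized]
    return vectors, idf

def cosine(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = ["jaguar car dealer", "jaguar animal habitat", "stock quotes today"]
vectors, idf = tf_idf_vectors(docs)
query = {t: idf.get(t, 0.0) for t in "animal habitat".split()}
scores = [cosine(query, v) for v in vectors]
best = max(range(len(docs)), key=scores.__getitem__)
```

Documents and the query live in the same term space, so ranking reduces to sorting documents by their cosine score against the query vector.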
PROBABILISTIC MODEL
OKAPI OR BM25 MODEL
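The formula on this slide was lost in extraction. As a hedged sketch, the code below uses a commonly cited form of BM25 with parameters k1 = 1.2 and b = 0.75 and a non-negative idf variant; these choices and the function name `bm25_scores` are assumptions, not necessarily what the slide showed.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document against the query with a common BM25 variant."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Non-negative idf variant (assumption).
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates with k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = ["jaguar car dealer", "jaguar animal habitat", "stock quotes today"]
scores = bm25_scores("jaguar habitat", docs)
```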
LANGUAGE MODEL
USER MODEL
 A user model is the set of personal characteristics of
the user that the system maintains
 A user profile can be thought of as a user model
 Types of user models
 Depending on the user being modeled
 Individual
 Canonical (group)
 Depending on Acquisition model
 Explicit (stated)
 Implicit (inferred)
SEMANTIC MATCHING
PERSONALIZATION - ENVIRONMENTS WHERE
IT IS BEING USED
 Databases
 Newsgroups
 Personal Information Management (desktop files, E-mail,
bookmarks, etc.)
 News: electronic journals
 Search engines
 Web sites
 Business
 e-commerce
 e-health
 etc.
OBJECTIVES
 To enhance personalized web search and
retrieval so that results satisfy the user's search
context
 To customize Web Information Retrieval (IR)
for users.
 To provide results specific to individual users.
 This matters because different users
expect different information even for the same query
 To predict whether personalization is required or not
 To develop a computationally intelligent and
efficient algorithm for this personalization task
PERSONALIZATION IN WEB IR [1/2]
 Web Personalization is viewed as an application
of data mining and machine learning techniques
to build models of user behavior that can be
applied to the task of predicting user needs and
adapting future interactions with the ultimate
goal of improved user satisfaction.
PERSONALIZATION IN WEB IR [2/2]
 Initially, search engines were concerned only with
retrieving documents relevant to a query.
 Given the information overload on the web,
it is increasingly difficult for search engines
to satisfy individual user needs.
 Personalization has long been recognized as
an avenue to greatly improve the search
experience.
 It disambiguates web search by modeling
the user's interests and preferences in a
profile.
PROBLEM DESCRIPTION
 Personalization in Web IR
 Customize search results according to each individual user
 Research questions in Personalized Web IR
 What to use to Personalize?
 How to model and represent past search contexts?
 How to Personalize?
 How to use it to improve search results?
 When not to Personalize?
 How to decide whether personalization is required or not?
 How to know whether Personalization helped?
 How to evaluate personalized results?
GENERAL PROBLEM STATEMENT
 When a search query is issued, most search
engines return the same results irrespective of
the user's interests
 Web content lacks semantic structure, which
makes it difficult for the machine to
understand the information provided by the user
 Search engines fall short in identifying the intention of the user
 Search engines fall short in processing inaccurate or ambiguous
queries → imprecise keywords
RELATED WORKS
 Short term personalization - bookmarks
 Long term personalization - browsing history
 Result Diversification - Query reformulation
 Collaborative personalization - for group of
users
 Search interaction personalization - Clicks
 Session based personalization
 Location based personalization
 Task based personalization
 and so on…
ARCHITECTURE OF PERSONALIZATION BASED
WEB IR
[Diagram: a query string is sent to the Personalized IR system, which ranks documents from the Web document corpus (1. Doc1, 2. Doc2, 3. Doc3, ...). User feedback on the ranked documents drives query reformulation into a revised query, and the system returns re-ranked documents (1. Doc2, 2. Doc4, 3. Doc5, ...).]
CHALLENGES FOR WEB IR
 Distributed Data: Documents spread over millions
of different web servers.
 Volatile Data: Many documents change or
disappear rapidly (e.g. dead links).
 Large Volume: Billions of separate documents.
 Unstructured and Redundant Data: No uniform
structure, HTML errors, up to 30% near duplicate
documents.
 Quality of Data: No editorial control, false
information, poor quality writing, typos, etc.
 Heterogeneous Data: Multiple media types (images,
video), languages, character sets, etc.
CHALLENGES FOR PERSONALIZATION IN
WEB IR
 Moving from a system-centered approach to a
user-centered approach to IR
 Modeling the user context in personalized
IR
 Exploiting the user context to enhance
search quality
 Privacy issues
 Evaluation issues
[Callout: focused on in the next part of the presentation]
POSSIBLE APPROACHES TO INFORMATION
RETRIEVAL
 Statistical approaches
◦ Co-occurrence of features between document
and query
◦ Rank documents based on similarity
 Semantic approaches
◦ “Understand” the query, find matching
documents
 User profile approaches
◦ User profiles store approximations of user
interests
BENEFITS OF PERSONALIZED SEARCH
 Resolving ambiguity
 The profile provides context for the query,
which reduces ambiguity.
 Example: a profile of interests distinguishes whether a user
asking about “Jaguar” means the animal or the car
 Revealing hidden treasures
 The profile brings up the most relevant
documents, which could otherwise be hidden beyond
the top results page
 Example: an iPhone owner searches for Google Android. Pages
referring to both would be most interesting
WHERE TO APPLY USER PROFILES?
 The user profile can be applied in several ways
 To modify the query itself → pre-processing
 Query expansion → the user profile is applied to add
terms to the query
 To process the results of a query → post-processing
 To present document snippets
 Adaptation of meta-search
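The two places where a profile can act, before and after retrieval, might look like this in Python. It is a minimal sketch: the profile is assumed to be a dictionary of interest terms with weights, and `expand_query`, `rerank`, and the sample data are hypothetical names invented for the illustration.

```python
def expand_query(query, profile, top_k=1):
    """Pre-processing: append the user's top_k highest-weighted interest terms."""
    terms = query.lower().split()
    by_weight = sorted(profile.items(), key=lambda p: (-p[1], p[0]))
    extra = [t for t, _ in by_weight if t not in terms][:top_k]
    return " ".join(terms + extra)

def rerank(results, profile):
    """Post-processing: sort (doc_id, snippet) pairs by overlap with the profile."""
    def overlap(snippet):
        return sum(profile.get(t, 0.0) for t in set(snippet.lower().split()))
    # sorted() is stable, so results with equal overlap keep the engine's order.
    return sorted(results, key=lambda r: -overlap(r[1]))

# Hypothetical profile of a user interested in cars.
profile = {"car": 0.9, "racing": 0.4}
results = [("doc1", "jaguar animal habitat"), ("doc2", "jaguar car dealer")]

expanded = expand_query("Jaguar", profile)
reranked = rerank(results, profile)
```

Query expansion changes what the engine retrieves, while re-ranking only reorders what it already returned, so the second option can be applied on the client without touching the engine.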
VARIATIONS OF USER PROFILE USAGE
SUMMARY ON IR
 Web Information Retrieval is a very challenging
yet exciting area!
 Solution: learn about the individual user to match the
query with documents
 Personalized Web Information Retrieval
 Promises significant quality improvements. However,
current techniques are far from optimal
 Thus, more research is necessary in the field of IR
 “Computational Intelligence” could be adopted by
search tools to effectively manage searching,
retrieving, filtering, and presenting relevant
information.
PRIVATE INFORMATION RETRIEVAL (PIR)
[1995]
 Goal: allow a user to query a database while hiding the
identity of the data items being retrieved.
 Note: hides the identity of the data items, not the existence of
the interaction with the user.
 Motivation: patent databases, stock quotes, web access,
and so on.
 Paradox(?): imagine buying in a store without the seller
knowing what you buy.
(Encrypting requests is useful against third parties, not
against the owner of the data.)
WHAT IS PRIVATE INFORMATION
RETRIEVAL?
 Real-World Example:
 Suppose there is a movie database and we
want to find information on the movie 'Indian'
 We do not want anyone to know about our
interest in this movie.
THE GOAL OF PIR
 Suppose there is a movie database and we want
to find information on the movie 'Endiran'
 We do not want the database operator to know
about our interest in this movie.
 Users' intentions are to be kept secret
HOW DOES IT WORK?
 Very Simple approach
 Download the entire database
 Improved approach
 Suppose there is a database with blocks D1,…, Dr.
 A client wants to retrieve block Dα from the database
in such a way that the database operator learns
nothing about α.
 Do this without downloading the entire database.
GOLDBERG'S SCHEME
 We can represent a database of r blocks as an r × s
matrix D and get the αth block (αth row) of D
using simple linear algebra
 Dα = eα · D
 where eα = [0 0 … 1 … 0] is a vector of all zeros,
except for a one in coordinate α.
 There are ℓ servers, each with a copy of the
database.
 We secret-share eα into v1, …, vℓ and send one share to
each server.
 Each server computes and sends its response
 ri = vi · D
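The linear-algebra view above can be sketched in Python. The slides only say that eα is secret-shared; this example assumes Shamir secret sharing over a prime field with privacy threshold t, with server i evaluated at the point x = i + 1. The modulus `P`, the function names, and the toy database are assumptions made for the illustration.

```python
import secrets

P = 2**31 - 1  # prime modulus (assumption); database entries must be below P

def share_unit_vector(alpha, r, num_servers, t):
    """Shamir-share each coordinate of e_alpha with a random degree-t polynomial."""
    shares = [[0] * r for _ in range(num_servers)]
    for j in range(r):
        coeffs = [1 if j == alpha else 0]
        coeffs += [secrets.randbelow(P) for _ in range(t)]
        for i in range(num_servers):
            x = i + 1
            shares[i][j] = sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
    return shares

def server_response(v, D):
    """Server side: r_i = v_i . D (row vector times the r x s matrix)."""
    return [sum(v[j] * D[j][c] for j in range(len(D))) % P
            for c in range(len(D[0]))]

def reconstruct(responses, xs):
    """User side: Lagrange-interpolate the responses at 0 to recover D_alpha."""
    block = [0] * len(responses[0])
    for i, xi in enumerate(xs):
        num, den = 1, 1
        for m, xm in enumerate(xs):
            if m != i:
                num = num * (-xm) % P
                den = den * (xi - xm) % P
        lam = num * pow(den, -1, P) % P  # modular inverse, Python 3.8+
        for c, value in enumerate(responses[i]):
            block[c] = (block[c] + lam * value) % P
    return block

# Toy database: 4 blocks (rows) of 3 words each; fetch block alpha = 2.
D = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
shares = share_unit_vector(2, len(D), num_servers=3, t=1)
responses = [server_response(v, D) for v in shares]
block = reconstruct(responses, xs=[1, 2, 3])
```

Because each response is linear in the share, the responses lie on a degree-t polynomial whose value at 0 is eα · D, so any t + 1 correct responses suffice for reconstruction.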
GOLDBERG'S SCHEME
 The responses r1, …, rk are secret shares of Dα (k
is the number of responses)
 What happens if some of the responses are
wrong?
AOL SEARCH LOG DATA SCANDAL
#4417749:
 clothes for age 60
 60 single men
 best retirement city
 jarrett arnold
 jack t. arnold
 jaylene and jarrett arnold
 gwinnett county yellow pages
 rescue of older dogs
 movies for dogs
 sinus infection
Thelma Arnold
62-year-old widow
Lilburn, Georgia
OBSERVATION
 The owners of databases know a lot about the
users!
 This poses a risk to users' privacy.
 E.g. consider a database of stock prices
 What can we do?
 Trust them to protect our privacy,
or
 Use cryptography
HOW CAN CRYPTO HELP?
Note: This problem has nothing to do with
secure communication!
[Diagram: user U communicating with database D]
CURRENT SETTING
[Diagram: user U and database D connected by a secure link]
A new primitive: Private Information Retrieval (PIR)
MODELING PIR
 Server: holds n-bit string x
 n should be thought of as very large
 User: desires
 to retrieve xi and
 to keep i private
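PRIVATE PROTOCOL TO INFORMATION
RETRIEVAL
[Diagram: the SERVER holds x = x1, x2, ..., xn ∈ {0,1}^n; the USER holds an index i ∈ {1, ..., n} and obtains xi. The server cannot tell index i apart from any other index j.]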
NON-PRIVATE PROTOCOL
[Diagram: the USER sends its index i ∈ {1, ..., n} to the SERVER, which holds x = x1, x2, ..., xn and replies with xi.]
 There is NO privacy preservation.
 Communication cost: log n
TRIVIAL PRIVATE PROTOCOL
[Diagram: the SERVER sends x = x1, x2, ..., xn to the USER, who reads off xi.]
 Server sends entire database x to User.
 Information theoretic privacy.
 Communication cost: n
Is this optimal?
“The number of bits communicated
between U and S has to be smaller
than n.”
PROBLEM
 In any 1-server PIR with information
theoretic privacy the communication is at
least n.
POSSIBLE SOLUTIONS
 The user asks for additional random indices.
 Drawback: reveals a lot of information
 Employ general crypto protocols to compute xi
privately.
 Drawback: highly inefficient (polynomial in n).
 Anonymity.
 Note: hides the identity of the user, not the fact that xi is
retrieved.
ANONYMITY - EXAMPLE
 Original Data vs. Anonymized Data
TWO APPROACHES
 Information-Theoretic PIR
 Replicate database among k servers.
 Unconditional privacy against t servers.
 Computational PIR
 Computational privacy, based on cryptographic
assumptions.
INFORMATION THEORETIC PRIVACY
(PERFECT PRIVACY)
 The distribution of the queries the user sends to
any server is independent of the index he/she
wishes to retrieve.
 This means that no server can gain any
information about the user's interest, regardless of
the server's computational power.
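To make the definition concrete, here is a sketch of the classic two-server XOR construction, chosen for this example and not taken from the slides. The user sends a uniformly random selection vector to one server and the same vector with bit i flipped to the other, and the last helper enumerates every query one server can see to show that the set does not depend on i. Function names and the sample database are assumptions.

```python
import secrets
from itertools import product

def make_queries(i, n):
    """User side: a uniformly random n-bit vector and a copy with bit i flipped."""
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = list(q1)
    q2[i] ^= 1
    return q1, q2

def answer(x, q):
    """Server side: XOR of the database bits selected by the query vector."""
    bit = 0
    for xj, qj in zip(x, q):
        bit ^= xj & qj
    return bit

def retrieve(x, i):
    """Both answers cover the same bits except x[i], so their XOR is x[i]."""
    q1, q2 = make_queries(i, len(x))
    return answer(x, q1) ^ answer(x, q2)

def server2_views(i, n):
    """Every query server 2 can receive for index i, one per possible q1."""
    views = []
    for q1 in product((0, 1), repeat=n):
        q2 = list(q1)
        q2[i] ^= 1
        views.append(tuple(q2))
    return sorted(views)

x = [1, 0, 1, 1, 0]
recovered = [retrieve(x, i) for i in range(len(x))]
same_view = all(server2_views(i, 3) == server2_views(0, 3) for i in range(3))
```

Each server on its own sees a uniformly random vector whatever the index is, which is exactly the independence the definition asks for. The guarantee holds only if the two servers do not collude, and each query here is n bits long, so this version is private but not yet communication-efficient.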
COMPUTATIONAL PRIVACY
 The distributions of the queries the user sends to
any server for different indices are computationally
indistinguishable.
 This means that no server can gain any
information about the user's interest, provided that
the server is computationally bounded.
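One way to picture single-server computational privacy is with an additively homomorphic cipher. The sketch below assumes textbook Paillier encryption with toy primes, which is insecure at this size and used for illustration only. The user sends an encrypted unit vector and the server combines the ciphertexts without learning which one encrypts 1. This simple version sends one ciphertext per database bit, so it illustrates the privacy notion rather than low communication. All names are assumptions of the example.

```python
import math
import secrets

def keygen(p=1009, q=1013):
    """Toy Paillier keys from two small primes (assumption: demo sizes only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Enc(m) = (1 + n)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return pow(1 + n, m, n2) * pow(r, n, n2) % n2

def decrypt(n, sk, c):
    lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

def user_query(n, i, size):
    """Encrypt the unit vector e_i coordinate by coordinate."""
    return [encrypt(n, 1 if j == i else 0) for j in range(size)]

def server_answer(n, x, query):
    """Multiply c_j^(x_j): by homomorphism this encrypts sum_j x_j * e_j = x_i."""
    n2 = n * n
    acc = 1
    for xj, cj in zip(x, query):
        acc = acc * pow(cj, xj, n2) % n2
    return acc

x = [1, 0, 1, 1, 0]
n, sk = keygen()
recovered = [decrypt(n, sk, server_answer(n, x, user_query(n, i, len(x))))
             for i in range(len(x))]
```

The server only ever handles ciphertexts, so its view reveals i only if it can break the encryption, which is the "computationally bounded" condition in the definition.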
COMMUNICATION COST
 Multiple servers, information-theoretic
PIR:
 2 servers: communication n^(1/2)
 k servers: communication n^(1/k)
 log n servers: communication poly(log n)
 Single server, computational PIR:
 Communication poly(log n)
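As an illustration of where a square-root cost for two servers can come from, the sketch below arranges the n bits in a square matrix and hides only the column index with a random XOR mask. This is one well-known construction and an assumption about which scheme the slide refers to. Each query and each answer is about √n bits long. Function names are invented for the example.

```python
import math
import secrets

def to_matrix(x):
    """Pad x with zeros and arrange it as an m x m bit matrix, m = ceil(sqrt(n))."""
    m = math.isqrt(len(x) - 1) + 1 if x else 1
    padded = x + [0] * (m * m - len(x))
    return [padded[r * m:(r + 1) * m] for r in range(m)], m

def column_queries(col, m):
    """User side: a random m-bit column mask and a copy with bit col flipped."""
    q1 = [secrets.randbelow(2) for _ in range(m)]
    q2 = list(q1)
    q2[col] ^= 1
    return q1, q2

def server_answer(matrix, q):
    """Server side: for every row, XOR the bits in the selected columns."""
    out = []
    for row in matrix:
        bit = 0
        for value, selected in zip(row, q):
            bit ^= value & selected
        out.append(bit)
    return out

def retrieve(x, i):
    """Fetch x[i] while each server sees only a uniformly random column mask."""
    matrix, m = to_matrix(x)
    row, col = divmod(i, m)
    q1, q2 = column_queries(col, m)
    a1, a2 = server_answer(matrix, q1), server_answer(matrix, q2)
    # The answers differ exactly in column col; keep only the wanted row.
    return a1[row] ^ a2[row]

x = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
recovered = [retrieve(x, i) for i in range(len(x))]
```

The user receives a whole column's worth of information and discards all but one bit, which is how the row index stays hidden without being sent at all.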
K-SERVER PIR
[Diagram: user U, holding index i, exchanges messages with servers S1, S2, ..., Sk, each holding a copy of x ∈ {0,1}^n.]
Correctness: User obtains xi
Privacy: No single server gets information about i
PIR PROPERTIES
Database input: blocks B1, B2, …, Bw
User input: index i ∈ {1, …, w}
• correctness: the user learns Bi
• secrecy (of the user): the database does not learn i
• non-triviality: the total communication is < w
Note: secrecy of the database is not required
These properties need to be defined more formally!
Both parties: polynomial-time randomized interactive algorithms
PIR PROPERTIES
 Correctness
 In every invocation of the protocol the user retrieves
the bit he is interested in (i.e. xi)
 Privacy
 In every invocation of the protocol each server does
not gain any information about the index of the bit
retrieved by the user (i.e. i).
PIR DOESN'T EXIST [1/4]
Correctness, Non-triviality and Secrecy CANNOT be
satisfied simultaneously.
 Def: A transcript T is possible for (i,B) if P(T(i,B) = T) > 0
 Take some T', and look where it is possible:
[Diagram: a grid with indices i as columns and databases B as rows; the cells where T' is possible are marked.]
PIR DOESN'T EXIST [2/4]
secrecy → if T' is possible for some B and i,
then it is possible for B and all the other i's
[Diagram: in the grid of indices i (columns) by databases B (rows), every row that contains T' is filled with T' across all indices.]
PIR DOESN'T EXIST [3/4]
non-triviality → length(transcript) < length(database)
↓
# transcripts < # databases
↓
there has to exist T' that is possible for
two databases B0 and B1
[Diagram: in the grid of indices i (columns) by databases B (rows), the rows for B0 and B1 are both filled with T' across all indices.]
PIR DOESN'T EXIST [4/4]
 B0 and B1 differ on at least one index i'. So, if i' is the input
of the user then
correctness → contradiction
[Diagram: in the rows for B0 and B1, column i' is marked; T' is possible for both (i', B0) and (i', B1).]
THUS, IDEAL PIR DOESN'T EXIST!
 How to bypass the impossibility result?
 Two ideas:
 limit the computing power of a cheating database
 use a larger number of “independent” databases
SUMMARY
 Complexity of PIR
 Communication
 Computation
 Possible Extensions
 Symmetric PIR
 User may not learn any item other than the one he/she
requested
 Searching by key-words
 Public-key encryption with key-word search
REFERENCES
 Xiaohui Tao, Yuefeng Li, and Ning Zhong, “A Personalized Ontology Model for
Web Information Gathering,” IEEE Trans. Knowledge and Data Engg., vol. 23, no.
4, pp. 496-511, April 2011.
 Markus Strohmaier and Mark Kröll, “Acquiring Knowledge about Human Goals from
Search Query Logs,” ACM Transactions on Information Systems, March 2011.
 K.W.-T. Leung, W. Ng, and D.L. Lee, “Deriving Concept-Based User Profiles
from Search Engine Logs,” IEEE Trans. Knowledge and Data Engg., vol. 22,
no. 7, pp. 969-982, July 2010.
 Zhicheng Dou, Ruihua Song, Ji-Rong Wen, and Xiaojie Yuan, “Evaluating the
Effectiveness of Personalized Web Search,” IEEE Trans. Knowledge and Data
Engg., vol. 21, no. 8, pp. 1178-1190, Aug. 2009.
 Y. Li and N. Zhong, “Mining Ontology for Automatically Acquiring Web User
Information Needs,” IEEE Trans. Knowledge and Data Engg., vol. 18, no. 4, pp.
554-568, April 2006.
 Fang Liu, Clement Yu, and Weiyi Meng, “Personalized Web Search for Improving
Retrieval Effectiveness,” IEEE Trans. Knowledge and Data Engg., vol. 16, no.
1, pp. 28-40, January 2004.
 B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan, “Private Information
Retrieval,” Journal of the ACM, vol. 45, no. 6, pp. 965-982, 1998 (conference version: FOCS 1995).
THANKING YOU

一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 

Information Retrieval AICTE FDP at GCT Coimbatore

  • 1. INFORMATION RETRIEVAL (IR) (PRIVATE VS. PUBLIC) VENINGSTON. K Ph.D. Student, Department of CSE, Government College of Technology, Coimbatore. veningstonk@gct.ac.in
  • 2. PRESENTATION OUTLINE
      ◦ Public IR
          - What is Web IR?
          - Overview of Web IR Technologies
          - Web IR Models
          - Web Search Architecture
          - Semantic Matching
          - Personalization in Web IR
          - Challenges in Web-based IR
          - Challenges in Personalizing Web IR
          - Summary Note
      ◦ Private IR
          - What is Private IR?
          - How Does It Work?
          - PIR Model
          - Approaches to PIR
          - PIR Properties
          - Summary Note
    11/December/2013, AICTE FDP on Web Application Security
  • 4. WEB INFORMATION RETRIEVAL (WEB SEARCH)
      ◦ Technologies that help users find information on the web accurately, quickly, and easily
  • 5. GOAL OF WEB SEARCH
      ◦ Goals: Accurate, Efficient, Easy to Use
      ◦ Criteria on the slide: results are relevant; response time is short; good user experience; results are comprehensive; results are novel; fast task completion
  • 6. WEB USERS HEAVILY RELY ON SEARCH ENGINES
  • 8. OVERVIEW OF WEB SEARCH TECHNOLOGIES
      ◦ General Web Search, Entity Search, Facet Search, Question Answering, Multimedia Search
      ◦ Ranking, Matching, Retrieval, Document Understanding, Query Understanding, Crawling, Indexing, Result Presentation, Anti-spam
      ◦ Classification, Clustering, Ranking, Graph Learning, Tagging, Distributed Computing
  • 9. WEB SEARCH ARCHITECTURE
      ◦ [Diagram. Labels: Web Spider, Document corpus, Query String, IR System, Ranked Documents (1. Page1, 2. Page2, 3. Page3, ...)]
  • 10. COMPONENT TECHNOLOGIES FOR WEB IR
      ◦ Relevance Ranking
      ◦ Importance Ranking
      ◦ Web Page Understanding
      ◦ Query Understanding
      ◦ Crawling
      ◦ Indexing
      ◦ Search Result Presentation
      ◦ Anti-Spam
      ◦ Search Log Data Mining / Web Mining
  • 11. THREE IMPORTANT PROCESSES IN WEB IR
      ◦ Retrieval: finding documents from the inverted index
      ◦ Matching: calculating a relevance score for each query-document pair
      ◦ Ranking: ordering documents by relevance scores, importance scores, etc.
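The three processes on slide 11 can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular search engine: the toy corpus, the function names, and the choice of a simple TF-IDF score for the matching step are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

# Toy corpus; the documents and their ids are made up for illustration.
DOCS = {
    "d1": "jaguar is a big cat found in the americas",
    "d2": "the jaguar car company builds luxury cars",
    "d3": "web search engines rank documents for a query",
}

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index

def retrieve(index, query):
    """Retrieval: union of the posting lists of the query terms."""
    candidates = set()
    for term in query.split():
        candidates |= index.get(term, set())
    return candidates

def match(docs, index, query, doc_id):
    """Matching: a simple TF-IDF relevance score for one (query, document) pair."""
    tf = Counter(docs[doc_id].split())
    n_docs = len(docs)
    score = 0.0
    for term in query.split():
        df = len(index.get(term, ()))
        if df:
            score += tf[term] * math.log(1 + n_docs / df)
    return score

def rank(docs, index, query):
    """Ranking: sort retrieved candidates by descending score (ties broken by id)."""
    candidates = retrieve(index, query)
    return sorted(candidates, key=lambda d: (-match(docs, index, query, d), d))

INDEX = build_inverted_index(DOCS)
```

Calling `rank(DOCS, INDEX, "jaguar car")` retrieves the two documents that mention either term and places the one matching both terms first. A real system would add tokenization, stemming, and importance scores on top of this.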
  • 12. WEB IR MODELS
      ◦ Vector Space Model (Salton 1975)
      ◦ Probabilistic Model
      ◦ Okapi or BM25 Model (Robertson and Walker 1994)
      ◦ Language Model (Ponte and Croft 1998)
      ◦ User Model
  • 15. OKAPI OR BM25 MODEL
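Only the title of the BM25 slide survives in this transcript. For reference, the scoring function is commonly written as below; this is the textbook form and is given here as an assumption about what the slide showed. The symbols are: f(q_i, D) is the frequency of query term q_i in document D, |D| is the document length, avgdl is the average document length in the collection, N is the number of documents, n(q_i) is the number of documents containing q_i, and k_1 and b are free parameters (values such as k_1 between 1.2 and 2.0 and b = 0.75 are typical choices).

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{m} \mathrm{IDF}(q_i)\,
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(q_i) = \ln \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}
```

Here m is the number of terms in the query Q. Several variants of the IDF term are in use; the one shown is one common choice.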
  • 17. USER MODEL
      ◦ User models are personal characteristics of the user that the system maintains
      ◦ A user profile can be thought of as a user model
      ◦ Types of user models
          - Depending on the user being modeled: Individual, Canonical (group)
          - Depending on the acquisition model: Explicit (stated), Implicit (inferred)
  • 19. PERSONALIZATION - ENVIRONMENTS WHERE IT IS BEING USED
      ◦ Databases
      ◦ Newsgroups
      ◦ Personal Information Management (desktop files, e-mail, bookmarks, etc.)
      ◦ News: electronic journals
      ◦ Search engines
      ◦ Web sites
      ◦ Business
      ◦ e-commerce
      ◦ e-health
      ◦ etc.
  • 20. OBJECTIVES
      ◦ To enhance personalized web search and retrieval so that it satisfies the user's search context
      ◦ To customize Web Information Retrieval (IR) for users
      ◦ To provide results specific to individual users
          - This is important because different users expect different information even for the same query
      ◦ To predict whether personalization is required or not
      ◦ To develop a computationally intelligent and efficient algorithm for this personalization task
  • 21. PERSONALIZATION IN WEB IR [1/2]
      ◦ Web personalization is viewed as an application of data mining and machine learning techniques to build models of user behavior that can be applied to the task of predicting user needs and adapting future interactions, with the ultimate goal of improved user satisfaction.
  • 22. PERSONALIZATION IN WEB IR [2/2]
      ◦ Initially, search engines were concerned with retrieving documents relevant to a query.
      ◦ Given the information overload on the web, it is increasingly difficult for search engines to satisfy individual user needs.
      ◦ Personalization has long been recognized as an avenue to greatly improve the search experience.
      ◦ It disambiguates web search by modeling the user's interests and preferences in a user profile.
  • 23. PROBLEM DESCRIPTION
      ◦ Personalization in Web IR
          - Customize search results according to each individual user
      ◦ Research questions in Personalized Web IR
          - What to use to personalize? How to model and represent past search contexts?
          - How to personalize? How to use it to improve search results?
          - When not to personalize? How to decide whether personalization is required or not?
          - How to know personalization helped? How to evaluate personalized results?
  • 24. GENERAL PROBLEM STATEMENT
      ◦ When a search query is issued, most search engines return the same results irrespective of the user's interests
      ◦ Lack of semantic structure, which makes it difficult for the machine to understand the information provided by the user
      ◦ Failure to identify the intention of the user
      ◦ Failure to process inaccurate or ambiguous queries
          - Imprecise keywords
  • 25. RELATED WORKS
      ◦ Short-term personalization: bookmarks
      ◦ Long-term personalization: browsing history
      ◦ Result diversification: query reformulation
      ◦ Collaborative personalization: for a group of users
      ◦ Search interaction personalization: clicks
      ◦ Session-based personalization
      ◦ Location-based personalization
      ◦ Task-based personalization
      ◦ and so on
  • 26. ARCHITECTURE OF PERSONALIZATION BASED WEB IR
      ◦ [Diagram. Labels: Web, Document corpus, Query String, Personalized IR, Rankings, Ranked Documents (1. Doc1, 2. Doc2, 3. Doc3, ...), Feedback, Query Reformulation, Revised Query, Re-Ranked Documents (1. Doc2, 2. Doc4, 3. Doc5, ...)]
  • 27. CHALLENGES FOR WEB IR
      ◦ Distributed Data: documents spread over millions of different web servers.
      ◦ Volatile Data: many documents change or disappear rapidly (e.g. dead links).
      ◦ Large Volume: billions of separate documents.
      ◦ Unstructured and Redundant Data: no uniform structure, HTML errors, up to 30% near-duplicate documents.
      ◦ Quality of Data: no editorial control, false information, poor quality writing, typos, etc.
      ◦ Heterogeneous Data: multiple media types (images, video), languages, character sets, etc.
  • 28. CHALLENGES FOR PERSONALIZATION IN WEB IR
      ◦ Moving from a system-centered approach to a user-centered approach to IR
      ◦ Modeling the user context in personalized IR
      ◦ Exploiting the user context to enhance search quality
      ◦ The privacy issues
      ◦ The evaluation issues
      ◦ Callout on the slide: focused on in the next part of the presentation
  • 29. POSSIBLE APPROACHES TO INFORMATION RETRIEVAL
      ◦ Statistical approaches
          - Co-occurrence of features between document and query
          - Rank documents based on similarity
      ◦ Semantic approaches
          - “Understand” the query, find matching documents
      ◦ User profile approaches
          - User profiles store approximations of user interests
  • 30. BENEFITS OF PERSONALIZED SEARCH
      ◦ Resolving ambiguity
          - The profile provides a context for the query in order to reduce ambiguity.
          - Example: the profile of interests makes it possible to tell what a user who asks about “Jaguar” really wants (“Animal” or “Car”)
      ◦ Revealing hidden treasures
          - The profile helps surface the most relevant documents, which could otherwise be hidden beyond the top results page
          - Example: the owner of an iPhone searches for Google Android. Pages referring to both would be most interesting
  • 31. WHERE TO APPLY USER PROFILES?
      ◦ The user profile can be applied in several ways
      ◦ To modify the query itself (pre-processing)
          - Query expansion: the user profile is applied to add terms to the query
      ◦ To process the results of a query (post-processing)
      ◦ To present document snippets
      ◦ Adaptation of meta-search
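The two main uses of a profile named on slide 31, query expansion before the search and re-ranking after it, can be sketched as follows. The profile contents, the weights, and the function names are hypothetical and chosen only to illustrate the idea with the "Jaguar" example from slide 30.

```python
# Hypothetical user profile: interest terms with weights (not from the slides).
PROFILE = {"car": 0.9, "engine": 0.6, "animal": 0.1}

def expand_query(query, profile, top_k=1):
    """Pre-processing: append the top_k highest-weighted profile terms
    that are not already present in the query."""
    terms = query.split()
    by_weight = sorted(profile.items(), key=lambda kv: -kv[1])
    extras = [term for term, _ in by_weight if term not in terms]
    return " ".join(terms + extras[:top_k])

def rerank(results, profile):
    """Post-processing: reorder result texts by the summed profile weight
    of the distinct terms they contain (stable for equal scores)."""
    def profile_score(text):
        return sum(profile.get(term, 0.0) for term in set(text.split()))
    return sorted(results, key=lambda text: -profile_score(text))
```

With this profile, `expand_query("jaguar", PROFILE)` yields `"jaguar car"`, and `rerank` moves car-related results ahead of animal-related ones. Real systems would learn the weights from browsing history or clicks rather than hard-code them.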
  • 32. VARIATIONS OF USER PROFILE USAGE
  • 33. SUMMARY ON IR
      ◦ Web Information Retrieval is a very challenging yet exciting area!
      ◦ Solution: learn about the individual user to match the query with documents
          - Personalized Web Information Retrieval
      ◦ Personalization promises significant quality improvements; however, current approaches are far from optimal
      ◦ Thus, more research is necessary in the field of IR
      ◦ “Computational Intelligence” could be adopted by search tools to effectively manage searching, retrieving, filtering and presenting relevant information.
  • 34. PRIVATE INFORMATION RETRIEVAL (PIR) [1995]
      ◦ Goal: allow a user to query a database while hiding the identity of the data items retrieved.
      ◦ Note: this hides the identity of the data items, not the existence of the interaction with the user.
      ◦ Motivation: patent databases, stock quotes, web access, and so on.
      ◦ Paradox(?): imagine buying in a store without the seller knowing what you buy. (Encrypting requests is useful against third parties, not against the owner of the data.)
  • 35. WHAT IS PRIVATE INFORMATION RETRIEVAL?
      ◦ Real-world example:
          - Suppose there is a movie database and we want to find information on the movie 'Indian'
          - We do not want anyone to know about our interest in this movie.
  • 36. THE GOAL OF PIR
      ◦ Suppose there is a movie database and we want to find information on the movie 'Endiran'
      ◦ We do not want the database operator to know about our interest in this movie.
      ◦ Users' intentions are to be kept secret
  • 37. HOW DOES IT WORK?
      ◦ Very simple approach
          - Download the entire database
      ◦ Improved approach
          - Suppose there is a database with blocks D_1, ..., D_r.
          - A client wants to retrieve block D_α from the database in such a way that the database operator learns nothing about α.
          - Do this without downloading the entire database.
  • 38. GOLDBERG'S SCHEME
      ◦ We can represent a database of r blocks as an r × s matrix D and get the α-th block (α-th row) of D using simple linear algebra
          - D_α = e_α · D
          - where e_α = [0 0 … 1 … 0] is a vector of all zeros, except for a one in coordinate α.
      ◦ There are ℓ servers, each with a copy of the database.
      ◦ We secret-share e_α into v_1, …, v_ℓ and send one share to each server.
      ◦ Each server computes and sends its response
          - r_i = v_i · D
  • 39. GOLDBERG'S SCHEME
      ◦ The responses r_1, …, r_k are secret shares of D_α (k is the number of responses).
      ◦ What happens if some of the responses are wrong?
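The linear-algebra idea on slides 38 and 39 can be illustrated with a short Python sketch. It makes several simplifying assumptions that are not stated on the slides: shares are plain additive shares modulo a prime `P`, block entries are integers smaller than `P`, indices are 0-based, and every server answers honestly. In particular, additive sharing tolerates no wrong or missing responses, which is exactly the issue slide 39 raises.

```python
import secrets

P = 2_147_483_647  # prime modulus (2**31 - 1); an arbitrary choice for this sketch

def share_unit_vector(alpha, r, num_servers):
    """Split e_alpha (length r) into num_servers additive shares modulo P.
    Any num_servers - 1 of the shares are uniformly random on their own."""
    e = [1 if j == alpha else 0 for j in range(r)]
    shares = [[secrets.randbelow(P) for _ in range(r)]
              for _ in range(num_servers - 1)]
    last = [(e[j] - sum(s[j] for s in shares)) % P for j in range(r)]
    return shares + [last]

def server_response(v, D):
    """Each server computes r_i = v_i . D (vector-matrix product) modulo P."""
    num_cols = len(D[0])
    return [sum(v[j] * D[j][c] for j in range(len(D))) % P
            for c in range(num_cols)]

def reconstruct(responses):
    """Add the responses column-wise; the sum equals e_alpha . D = D_alpha."""
    return [sum(col) % P for col in zip(*responses)]
```

A usage example: with `D = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, calling `share_unit_vector(1, 3, 3)`, passing each share to `server_response`, and feeding the answers to `reconstruct` recovers the row `[4, 5, 6]`, while no single server sees anything but a random-looking vector.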
  • 40. AOL SEARCH LOG DATA SCANDAL
      ◦ Queries logged for user #4417749: clothes for age 60; 60 single men; best retirement city; jarrett arnold; jack t. arnold; jaylene and jarrett arnold; gwinnett county yellow pages; rescue of older dogs; movies for dogs; sinus infection
      ◦ Thelma Arnold, 62-year-old widow, Lilburn, Georgia
  • 41. OBSERVATION
      ◦ The owners of databases know a lot about the users!
      ◦ This poses a risk to users' privacy.
          - E.g. consider a database with stock prices
      ◦ What can we do?
          - Trust them to protect our secrecy, or
          - Use cryptography
  • 42. HOW CAN CRYPTO HELP?
      ◦ [Diagram: user U and database D]
      ◦ Note: this problem has nothing to do with secure communication!
  • 43. CURRENT SETTING
      ◦ [Diagram: user U and database D connected by a secure link]
      ◦ A new primitive: Private Information Retrieval (PIR)
  • 44. MODELING PIR
      ◦ Server: holds an n-bit string x
          - n should be thought of as very large
      ◦ User: desires
          - to retrieve x_i, and
          - to keep i private
  • 45. PRIVATE PROTOCOL TO INFORMATION RETRIEVAL
      ◦ [Diagram. SERVER holds x = x_1, x_2, ..., x_n ∈ {0,1}^n; USER holds i ∈ {1, ..., n} and obtains x_i]
  • 46. NON-PRIVATE PROTOCOL
      ◦ [Diagram. USER sends i ∈ {1, ..., n} to SERVER; SERVER, holding x = x_1, x_2, ..., x_n, returns x_i]
      ◦ There is NO privacy preservation.
      ◦ Communication cost: log n
  • 47. TRIVIAL PRIVATE PROTOCOL
      ◦ [Diagram. SERVER, holding x = x_1, x_2, ..., x_n, sends x_1, x_2, ..., x_n to USER; USER reads x_i]
      ◦ Server sends the entire database x to the user.
      ◦ Information-theoretic privacy.
      ◦ Communication cost: n
      ◦ Is this optimal? “The number of bits communicated between U and S has to be smaller than n.”
  • 48. PROBLEM
      ◦ In any 1-server PIR with information-theoretic privacy, the communication is at least n.
  • 49. POSSIBLE SOLUTIONS
      ◦ The user asks for additional random indices.
          - Drawback: reveals a lot of information
      ◦ Employ general crypto protocols to compute x_i privately.
          - Drawback: highly inefficient (polynomial in n).
      ◦ Anonymity.
          - Note: hides the identity of the user, not the fact that x_i is retrieved.
  • 50. ANONYMITY - EXAMPLE
      ◦ Original data vs. anonymized data
  • 51. TWO APPROACHES
      ◦ Information-theoretic PIR
          - Replicate the database among k servers.
          - Unconditional privacy against t servers.
      ◦ Computational PIR
          - Computational privacy, based on cryptographic assumptions.
  • 52. INFORMATION THEORETIC PRIVACY (PERFECT PRIVACY)
      ◦ The distribution of the queries the user sends to any server is independent of the index he/she wishes to retrieve.
      ◦ This means that a server cannot gain any information about the user's interest, regardless of its computational power.
  • 53. COMPUTATIONAL PRIVACY
      ◦ The distributions of the queries the user sends to any server, for different indices, are computationally indistinguishable.
      ◦ This means that a server cannot gain any information about the user's interest, provided that it is computationally bounded.
  • 54. COMMUNICATION COST
      ◦ Multiple servers, information-theoretic PIR:
          - 2 servers: comm. n^(1/2)
          - k servers: comm. n^(1/k)
          - log n servers: comm. poly(log n)
      ◦ Single server, computational PIR:
          - comm. poly(log n)
  • 55. K-SERVER PIR
      ◦ [Diagram. User U, holding index i, talks to servers S_1, S_2, ..., S_k; each server holds x ∈ {0,1}^n]
      ◦ Correctness: the user obtains x_i
      ◦ Privacy: no single server gets information about i
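The correctness and privacy properties of multi-server PIR can be seen in a very simple two-server XOR construction, sketched below. It is offered as an illustration under stated assumptions, not as the scheme behind the costs on slide 54: the function names are made up, the two servers are assumed not to collude, and each query is an n-bit mask, so the communication is not smaller than n.

```python
import secrets

def make_queries(i, n):
    """Pick a uniformly random n-bit mask q1 and a second mask q2 that
    differs from q1 only at position i. Each mask alone is uniformly random."""
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = q1.copy()
    q2[i] ^= 1
    return q1, q2

def answer(x, q):
    """Server side: XOR of the database bits selected by the mask q."""
    acc = 0
    for bit, selected in zip(x, q):
        acc ^= bit & selected
    return acc

def retrieve_bit(x, i):
    """User side: send one mask to each server and XOR the two answers.
    All positions except i cancel out, leaving x[i]."""
    q1, q2 = make_queries(i, len(x))
    return answer(x, q1) ^ answer(x, q2)
```

Each server sees only a uniformly random mask, so it learns nothing about i by itself, yet the user always recovers x[i]. Schemes with sublinear communication build on the same cancellation idea with more structured queries.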
  • 56. PIR PROPERTIES
      ◦ Database input: B_1, B_2, …, B_w
      ◦ User input: an index i ∈ {1, …, w}
      ◦ Correctness: the user learns B_i
      ◦ Secrecy (of the user): the database does not learn i
      ◦ Non-triviality: the total communication is < w
      ◦ Note: secrecy of the database is not required
      ◦ These properties need to be defined more formally!
      ◦ Setting: polynomial-time randomized interactive algorithms
  • 57. PIR PROPERTIES
      ◦ Correctness
          - In every invocation of the protocol, the user retrieves the bit he/she is interested in (i.e. x_i)
      ◦ Privacy
          - In every invocation of the protocol, no server gains any information about the index of the bit retrieved by the user (i.e. i).
  • 58. PIR DOESN'T EXIST [1/4]
      ◦ Correctness, non-triviality and secrecy CANNOT be satisfied simultaneously.
      ◦ Def: a transcript T is possible for (i, B) if P(T(i, B) = T) > 0
      ◦ Take some T' and look at where it is possible.
      ◦ [Diagram: grid of indices i against databases B, with T' marking the cells where T' is possible]
  • 59. PIR DOESN'T EXIST [2/4]
      ◦ secrecy → if T' is possible for some B and i, then it is possible for B and all the other i's
  • 60. PIR DOESN'T EXIST [3/4]
      ◦ non-triviality → length(transcript) < length(database) → # transcripts < # databases → there has to exist a T' that is possible for two databases B0 and B1
  • 61. PIR DOESN'T EXIST [4/4]
      ◦ B0 and B1 differ on at least one index i'. So, if i' is the input of the user, the same transcript T' would have to yield two different values at index i'; correctness → contradiction
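The counting step in part [3/4] can be made explicit. Assuming, for the sake of the sketch, that the database is an n-bit string and that transcripts are bit strings of length strictly less than n, the number of distinct transcripts is bounded as follows:

```latex
\#\{\text{transcripts}\}
  \;\le\; \sum_{\ell = 0}^{n-1} 2^{\ell}
  \;=\; 2^{n} - 1
  \;<\; 2^{n}
  \;=\; \#\{\text{databases}\}
```

By the pigeonhole principle, at least two different databases B0 and B1 must therefore share a possible transcript T', which is what part [4/4] uses to derive the contradiction.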
  • 62. THUS, IDEAL PIR DOESN'T EXIST!
      ◦ How to bypass the impossibility result?
      ◦ Two ideas:
          - limit the computing power of a cheating database
          - use a larger number of “independent” databases
  • 63. SUMMARY
      ◦ Complexity of PIR
          - Communication
          - Computation
      ◦ Possible extensions
          - Symmetric PIR: the user may not learn any item other than the one he/she requested
          - Searching by keywords
          - Public-key encryption with keyword search
  • 64. REFERENCES
      ◦ Xiaohui Tao, Yuefeng Li, and Ning Zhong, “A Personalized Ontology Model for Web Information Gathering”, IEEE Trans. Knowledge and Data Engg., vol. 23, no. 4, pp. 496-511, April 2011.
      ◦ Markus Strohmaier and Mark Kröll, “Acquiring Knowledge about Human Goals from Search Query Logs”, ACM Transactions on Information Systems, March 2011.
      ◦ K.W.-T. Leung, W. Ng, and D.L. Lee, “Deriving Concept-Based User Profiles from Search Engine Logs”, IEEE Trans. Knowledge and Data Engg., vol. 22, no. 7, pp. 969-982, July 2010.
      ◦ Zhicheng Dou, Ruihua Song, Ji-Rong Wen, and Xiaojie Yuan, “Evaluating the Effectiveness of Personalized Web Search”, IEEE Trans. Knowledge and Data Engg., vol. 21, no. 8, pp. 1178-1190, Aug. 2009.
      ◦ Y. Li and N. Zhong, “Mining Ontology for Automatically Acquiring Web User Information Needs”, IEEE Trans. Knowledge and Data Engg., vol. 18, no. 4, pp. 554-568, April 2006.
      ◦ Fang Liu, Clement Yu, and Weiyi Meng, “Personalized Web Search for Improving Retrieval Effectiveness”, IEEE Trans. Knowledge and Data Engg., vol. 16, no. 1, pp. 28-40, January 2004.
      ◦ B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan, “Private Information Retrieval”, Journal of the ACM, vol. 45, no. 6, pp. 965-982, 1998 (preliminary version in FOCS 1995).