1
Searching Keyword-lacking Files based on Latent Interfile Relationships
Tetsutaro Watanabe (Tokyo Tech, Japan)
Takashi Kobayashi (Nagoya U., Japan)
Haruo Yokota (Tokyo Tech, Japan)
ICSOFT 2010: 5th Intl. Conf. on Software and Data Technologies
22 July 2010, Athens, Greece
2
Outline of today's talk
Desktop search is a must-have feature
• But how often do you say “Good boy!” to him?
New desktop search method using the “LATENT” relationships between files
Our major contributions:
• A search method and system that combines inter-file relationships with a full-text search engine
• A method for automatically extracting latent inter-file relationships from file access logs
• A demonstration of the feasibility and performance of our method in experiments with real data
We DON'T care about the contents of files
3
Background and Goal
Information Explosion
1. Background & Goal
2. Related works
3. Proposed
method & system
4. Experiment
5. Conclusion
4
Background
• The number of files in file systems keeps increasing [1]
  - Many files and folders are created and kept every day
• The desktop file system has become a forest of folders!
  - Hard to classify files into appropriate directories
  - Difficult to find a desired file in a deep node
• Desktop search (DS) is a must-have feature
  - Users give up classifying files and traversing the folder forest
  - Powerful desktop search functions are seamlessly merged into current OSes
1. Background and Goal
[1] Agrawal, N., Bolosky, W. J., Douceur, J. R., and Lorch, J. R. A five-year study of file-system metadata. ACM Transactions on Storage, 3(3), 2007.
5
Background (cont.)
• DS can ONLY find files that include the search keywords
  - It is based on a full-text search engine
• It CANNOT find keyword-lacking files, even if they are related to the keywords
• Many related files don't include the keywords
  - Image figures
  - Source data files
  - Papers of related work
  - Source code for experiments
• An explanatory filename is one solution. But…
  - “figure_sect2_ICSOFT2010_FRIDAL_outline.jpg”
[Figure: a research paper and its related files]
6
Our research goal
A search method for keyword-lacking files that match the given keywords
[Figure: within the file system, full-text search covers the files that include the keyword; our target is the files that do not include the keyword but are related to it]
7
To find keyword-lacking files:
• Use metadata (e.g. faceted search)
  - Enables rich search but needs good metadata
  - Works fine for important archived files
  - Can you attach metadata to every file you create?
• Use references (e.g. Google image search)
  - One kind of automatically generatable metadata
  - Images that contain no text can still be found through the text of the documents that refer to them
  - Reference information is (very) rare and costly: it needs a target-specific (syntactic, logical) analyzer, such as an HTML/TeX analyzer, an analyzer for a specific XML document type, or a paper analyzer (to find citations)
So…
8
To find keyword-lacking files:
Research Question:
How can we obtain common, cost-free relationship information?
Our Answer:
Mine it automatically from user activity
9
Related works
1. Background & Goal
2. Related works
3. Proposed
method & system
4. Experiment
5. Conclusion
10
Related works
• Semantic approach [1][2]
  - Attach rich metadata to manage and search files
• Time-based metaphor
  - Search along a timeline of past activity
  - Time-Machine Computing [3], SIS [4], OreDesk [5]
2. Related works
[1] Gifford, D. K. et al. Semantic file systems. In Proc. ACM Symposium on Operating Systems Principles (1991)
[2] Chirita, P. A. et al. Activity based metadata for semantic desktop search. In Proc. Second European Semantic Web Conference (ESWC) (2005)
[3] Rekimoto, J. Time-machine computing: A time-centric approach for the information environment. In Proc. ACM UIST'99 (1999)
[4] Dumais, S. et al. Stuff I've seen: A system for personal information retrieval and re-use. In Proc. SIGIR 2003 (2003)
[5] Ohsawa, R. et al. OreDesk: A tool for retrieving data history based on user operations. In Proc. IEEE International Symposium on Multimedia (ISM) (2006)
11
Related works (cont.)
• Using relationships between files
  - Applying the PageRank idea [6]
  - Using usage analysis techniques [7]
• Integration with full-text search: Connections [8]
  - Calculates interfile relationships from file-related system calls, and uses them in context-based search to find files related to the matching files
[6] Nejdl, W. and Paiu, R. Desktop search: how contextual information influences search results and rankings. In Proc. Workshop on Information Retrieval in Context (IRiX) (2005)
[7] Chirita, P. A. and Nejdl, W. Analyzing user behavior to rank desktop items. In Proc. Intl. Symp. on String Processing and Information Retrieval (SPIRE) (2006)
[8] Soules, C. A. and Ganger, G. R. Connections: Using context to enhance file search. In Proc. ACM Symposium on Operating Systems Principles (2005)
12
Connections [Soules and Ganger 2005]
• Count read-write relations within a time window (N sec)
  - They assume that a written file refers to the files that were read
• Propagate full-text search points along these relations
System call trace log: open(s), read(s), write(s), mmap(s), stat(s), dup(s), link(S,D), rename(S,F)
[Figure: timeline of read() and write() calls on files A, B and C within an N-second window, and the resulting relation graph with edge weights 1 and 2]
Problem: raw file I/O information is NOT enough to analyze user activity
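The read-write counting summarized on this slide can be sketched roughly as follows. This is a simplified illustration and not the Connections implementation: the event tuple layout, the `count_relations` name and the rule that each write links to every distinct file read in the preceding `window` seconds are assumptions made for the example.

```python
from collections import Counter

def count_relations(events, window):
    """Count read->write relations inside a sliding time window.

    events: iterable of (timestamp_sec, op, path), op being "read" or "write".
    Returns a Counter mapping (read_path, written_path) to its count.
    """
    relations = Counter()
    recent_reads = []  # (timestamp, path) of reads still inside the window
    for ts, op, path in sorted(events):
        # Forget reads that are older than the window.
        recent_reads = [(t, p) for t, p in recent_reads if ts - t <= window]
        if op == "read":
            recent_reads.append((ts, path))
        elif op == "write":
            # Assumption: the written file refers to each recently read file.
            for src in {p for _, p in recent_reads if p != path}:
                relations[(src, path)] += 1
    return relations

# Hypothetical trace: A is read before B is written, B before C is written.
trace = [
    (0, "read", "A"), (5, "write", "B"),
    (8, "read", "B"), (12, "write", "C"),
    (60, "write", "C"),  # no read within the window, so no new relation
]
relations = count_relations(trace, window=30)
```

The last write in the trace adds nothing because no read falls inside its window, which hints at why raw I/O events alone say little about what the user was doing.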
13
Proposed method & system
1. Background & Goal
2. Related works
3. Proposed
method & system
4. Experiment
5. Conclusion
FRIDAL: File Retrieval by Inter-file relationship Derived from Access Log
14
Outline of FRIDAL
Basic assumption:
• Files frequently used at the same time are related
Key features:
• Clean the raw file access log to extract the approximate file use duration (AFUD)
• Calculate latent relationships by analyzing the overlap of AFUDs
• Rank files for the given keywords using full-text search and the relationship graph
[Figure: a paper (TeX) and a figure that are used at the same time]
3. Proposed method
15
Approximate File Use Duration (AFUD)
Case 1: The user keeps files open without using them
• The file use duration (FUD) needs trimming
Detect activity:
1) Any activity within a frame of length “Ta” means “(s)he was active”
   -> Eliminate inactive time
2) A long (> “Tb”) inactive time means “(s)he went home”
   -> Eliminate the time after the inactive period
[Figure: FUDs are turned into AFUDs by applying 1) and then 2) against the active time]
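The two trimming rules for Case 1 might be sketched as below. This is one possible reading under stated assumptions: time is cut into fixed frames of length `ta`, a frame counts as active if it contains any event, and an accumulated idle time longer than `tb` discards the rest of the FUD. The function name, the frame alignment at time 0 and the integer-second timestamps are all hypothetical.

```python
import math

def trim_fud(fud, events, ta, tb):
    """Trim one file use duration (start, end) down to its active parts.

    events: timestamps of any user activity (assumed input).
    ta: frame length; a frame with at least one event is treated as active.
    tb: idle time beyond which the user is assumed to have left.
    """
    start, end = fud
    active = {math.floor(t / ta) for t in events}
    segments = []
    idle = 0.0
    frame = math.floor(start / ta)
    while frame * ta < end:
        lo, hi = max(start, frame * ta), min(end, (frame + 1) * ta)
        if frame in active:
            idle = 0.0
            if segments and segments[-1][1] == lo:
                # Extend the previous segment when frames are adjacent.
                segments[-1] = (segments[-1][0], hi)
            else:
                segments.append((lo, hi))
        else:
            idle += hi - lo          # rule 1: inactive time is dropped
            if idle > tb:
                break                # rule 2: drop everything after a long idle
        frame += 1
    return segments

# A file held open for 100 s, with activity only at a few moments.
afuds = trim_fud((0, 100), events=[5, 12, 35, 90], ta=10, tb=30)
```

In this example the activity at 90 s is ignored because more than `tb` seconds of idle time precede it.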
16
Approximate File Use Duration (cont.)
• Case 2: Some applications don't keep files open
  - They have no exclusive access control mechanism, or a different one
  - Only many short FUDs appear
• Detect the application's behavior
  - “Average FUD < Tc” means “the application doesn't lock the file”
  - For such file types, fill the time slots between FUDs within the active time
[Figure: short FUDs inside the active time are merged into AFUDs]
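A minimal sketch of the Case 2 gap filling, assuming that all FUDs passed in belong to one file and lie inside a single active period. The slide applies the test per file type; this example simplifies it to one file, and the `fill_short_fuds` name is hypothetical.

```python
def fill_short_fuds(fuds, tc):
    """Merge many short FUDs of one file into a single AFUD.

    fuds: list of (start, end) intervals inside one active period (assumed).
    tc: threshold on the average FUD length.
    """
    if not fuds:
        return []
    fuds = sorted(fuds)
    mean_len = sum(end - start for start, end in fuds) / len(fuds)
    if mean_len >= tc:
        # The application appears to hold the file open: keep FUDs unchanged.
        return fuds
    # Otherwise fill the slots between the short FUDs.
    return [(fuds[0][0], max(end for _, end in fuds))]

afuds = fill_short_fuds([(0, 1), (20, 22), (50, 51)], tc=5)
```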
17
Calculate latent interfile relationships
• Calculate the interfile relationships from the file use durations (AFUDs)
1. Calculate four relationship elements (COs = co-occurrences)
   T: total time of the COs
   C: number of COs
   D: total length of the time spans between COs
   P: similarity of the timings of the open-file operations
2. Calculate the interfile relationship
   $\mathit{Relationship} = T^{\alpha} \cdot C^{\beta} \cdot D^{\gamma} \cdot P^{\delta}$
[Figure: AFUDs of two files on a time axis, with their co-occurrences marked]
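As a small worked example of the product formula, the sketch below evaluates it for hypothetical element values. The default exponents are the values (1, 1, 0.5, 0.5) reported on the experimental environment slide; the function name and the sample inputs are assumptions.

```python
def relationship(t, c, d, p, alpha=1.0, beta=1.0, gamma=0.5, delta=0.5):
    """Combine the four relationship elements as T^a * C^b * D^g * P^d."""
    return (t ** alpha) * (c ** beta) * (d ** gamma) * (p ** delta)

# Hypothetical elements: 120 s of co-use over 3 COs, 400 s between COs, P = 0.25.
score = relationship(t=120.0, c=3, d=400.0, p=0.25)
# 120 * 3 * sqrt(400) * sqrt(0.25) = 120 * 3 * 20 * 0.5
```

With exponents of 0.5, D and P contribute through their square roots, so T and C dominate the result.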
18
Calculate latent relationships (1 of 3)
• T: total time of the COs
• C: number of COs
• Together: the length and frequency of co-use
$T = \sum_{i=1}^{n} t_i$, $C = n$
[Figure: AFUDs of files x and y on a time axis; their overlaps are the COs c1 to c4, with durations t1 to t4]
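The elements T and C can be computed from two lists of AFUD intervals roughly as follows. This sketch assumes that each overlapping pair of intervals counts as one CO and that the AFUDs of a single file do not overlap each other; the function names are hypothetical.

```python
def co_occurrences(afuds_x, afuds_y):
    """Return the overlaps (COs) between two lists of (start, end) AFUDs."""
    cos = []
    for xs, xe in afuds_x:
        for ys, ye in afuds_y:
            start, end = max(xs, ys), min(xe, ye)
            if start < end:  # keep only overlaps with positive length
                cos.append((start, end))
    return sorted(cos)

def total_time_and_count(cos):
    """T is the summed length of the COs, C is how many there are."""
    return sum(end - start for start, end in cos), len(cos)

x = [(0, 10), (20, 40)]
y = [(5, 25), (30, 35)]
cos = co_occurrences(x, y)
t, c = total_time_and_count(cos)
```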
19
Calculate latent relationships (2 of 3)
• D: total length of the time spans between COs
  - When files are co-used across several tasks, their relationship is stronger than when they are co-used within a single task
$D = \sum_{i=1}^{n-1} d_{i(i+1)}$
[Figure: two AFUD timelines, each with COs separated by the gaps d12 and d23]
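A short sketch of the D element, assuming that d for consecutive COs is the gap between the end of one CO and the start of the next. The `gap_total` name is hypothetical.

```python
def gap_total(cos):
    """Sum the gaps between consecutive COs given as (start, end) intervals."""
    cos = sorted(cos)
    return sum(next_start - prev_end
               for (_, prev_end), (next_start, _) in zip(cos, cos[1:]))

d = gap_total([(5, 10), (20, 25), (30, 35)])  # gaps of 10 and 5
```

With a single CO the sum is empty and D is 0, so under the product formula a pair of files needs at least two COs to obtain a non-zero relationship.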
20
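Calculate latent relationships (3 of 3)
• P: similarity of the timings of the open-file operations
[Equation: piecewise definition of P from sums over the per-co-occurrence terms p_i (i = 1, …, n), with separate cases for a “> 1” and a “<= 1” condition; the full expression is garbled in this transcript]
[Figure: two timelines, with open timings A1, A2 and B1, B2, annotated with p1, p2 and p3 (p3 = 0 in one of them)]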
21
Search files using interfile relationships
1. Run the full-text search with the input keywords
2. Score every file related to the files found by the full-text search (discussed later)
3. Display the files in order of points
[Figure: files in the file system linked by relationships; the full-text search result is extended with related files (the target of the proposed method), giving a search result ranked 1st 25pt, 2nd 20pt, 3rd 15pt, 4th 10pt, 5th 5pt]
22
Score the file point
• Use TF-IDF and the normalized relationship
• Propagate just one hop, to limit computational cost
$\mathit{Point}(F) = \mathit{TFIDF}(F) + \sum_{i} \mathit{TFIDF}(X_i) \cdot \mathit{NormRel}(F, X_i)$
[Figure: full-text search results with TF-IDF scores 10, 20 and 30, linked by normalized relationships 0.5, 0.75 and 1. Final scores: 10 + 10 = 20; 20 + 5 + 0 = 25; 30 + 0 = 30; and, for a file outside the full-text result, 0 + 15 (20 * 0.75) + 30 (30 * 1) = 45]
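The one-hop scoring can be sketched as below. The edge layout is one reading of the figure's numbers (files `a`, `b`, `c` are full-text hits, `f` is a keyword-lacking file); the file names, the undirected treatment of relationships and the `score_files` name are assumptions.

```python
from collections import defaultdict

def score_files(tfidf, norm_rel):
    """Point(F) = TF-IDF(F) + sum of TF-IDF(X) * NormRel(F, X), one hop only.

    tfidf: {file: TF-IDF score} for the full-text hits.
    norm_rel: {(file_a, file_b): normalized relationship}, treated as undirected.
    """
    neighbours = defaultdict(dict)
    for (a, b), weight in norm_rel.items():
        neighbours[a][b] = weight
        neighbours[b][a] = weight
    points = {}
    for f in set(tfidf) | set(neighbours):
        propagated = sum(tfidf.get(x, 0.0) * w
                         for x, w in neighbours[f].items())
        points[f] = tfidf.get(f, 0.0) + propagated
    return points

tfidf = {"a": 10.0, "b": 20.0, "c": 30.0}
norm_rel = {("a", "b"): 0.5, ("b", "f"): 0.75, ("c", "f"): 1.0}
points = score_files(tfidf, norm_rel)
ranking = sorted(points, key=points.get, reverse=True)
```

Because only the original TF-IDF scores are propagated, a file without keywords never passes points on to its neighbours, which keeps the computation to a single pass over the edges.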
23
FRIDAL Implementation
Components: web interface, controller (Java), full-text search engine (Hyper Estraier) with its full-text index, RDBMS, file server (Samba) on the file system
Preparing phase: users use files on the file server -> get access logs -> calculate relationships -> store relationships; make full-text index
Searching phase: search -> full-text search -> search related files -> calculate points -> search result
24
Experiments
1. Background & Goal
2. Related works
3. Proposed
method & system
4. Experiments
5. Conclusion
25
Experimental Environment
• Relationship parameters: (α, β, γ, δ) = (1, 1, 0.5, 0.5), based on a preparatory experiment
• File server: Samba 2.2, hosting the home directories of testers A, B and C
• Access logs of MS Office, LaTeX, image and movie files
Tester   OS          Log period
A        WinXP       319 days
B        WinXP       319 days
C        Win Vista   323 days
4. Experiments
26
Mined latent interfile relations
• The number of relations did not correlate with the size of the logs
  - It depends on what the user was doing
           Lines of logs   #Files   #Relations
Tester A   4,873,703       1,100    17,472
Tester B   4,323,090       713      5,692
Tester C   7,863,206       793      5,236
27
Evaluation 1
Task:
• Find specific files in another user's home directory
Evaluated values:
• The number of queries
• The number of files the user checked before finding the target files
• The number of answer files found
Compared methods:
• FRIDAL
• Full-text search
28
Evaluation 1: Results
Target files:
F1  The paper of tester A
F2  The source of the image files in the paper of tester A
F3  The eight data files for the paper of tester A
F4  The paper of tester C
F5  The source of the image files in the paper of tester C
F6  The data file for the paper of tester C

Smaller cost with FRIDAL:
File   Method      #Queries   #Checked files
F1     FRIDAL      2          1
F1     Full-text   2          15
F4     FRIDAL      1          2
F4     Full-text   1          11
F6     FRIDAL      1          15
F6     Full-text   2          14
Ave.   FRIDAL      1.3        6.0
Ave.   Full-text   1.7        13.3
(The "Found" column of this table held symbol marks that are not legible in the transcript.)

Only FRIDAL can find:
File   Method      #Queries   #Checked files   Found
F2     FRIDAL      1          9                1/1
F2     Full-text   1          6                0/1
F3     FRIDAL      1          4                3/8
F3     Full-text   1          0                0/8
F5     FRIDAL      1          2                1/1
F5     Full-text   1          14               0/1

FRIDAL can find keyword-lacking files, and at smaller cost than full-text search
29
Evaluation 2
• Performance comparison with other methods
  - Six tasks of searching for files in a home directory
  - (Details in Table 4 of our paper)
• Evaluated values
  - Mean 11-point average precision
  - Mean precision and recall in the top 20
Compared methods:
• FRIDAL
• Full-text search
• Directory search
• Connections calculation
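For reference, precision and recall in the top k results can be computed as in the sketch below. The file names are made up for illustration, and dividing precision by the number of results actually returned (rather than by k) when fewer than k exist is an assumption of this example.

```python
def precision_recall_at_k(ranked, relevant, k=20):
    """Precision and recall over the first k entries of a ranked result list."""
    top = ranked[:k]
    hits = sum(1 for f in top if f in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical ranking and answer set for one search task.
ranked = ["paper.tex", "fig1.eps", "notes.txt", "data.csv"]
relevant = {"paper.tex", "fig1.eps", "data.csv", "plot.gp", "raw.dat"}
p, r = precision_recall_at_k(ranked, relevant, k=4)
```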
30
Evaluation 2: Comparison methods
• Directory search
  - A straightforward strategy
  - Search the directories that contain the full-text search results
  [Figure: the full-text search results (1st, 2nd, 3rd, …) are expanded into the directory search ranking: first the files in the same directory as the 1st result, then the files in the same directory as the 2nd result, and so on]
• Connections calculation
  - Use the calculation method of Connections
  - Use the read/write attribute of the file accesses in the access logs instead of read()/write() calls
  - Use the optimal parameter values that the authors reported in their paper
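The directory search baseline described above might look like the following sketch. The ordering of files inside one directory is not given on the slide, so alphabetical order is assumed here; the paths and the `directory_search` name are hypothetical.

```python
import os

def directory_search(fulltext_hits, all_files):
    """Expand ranked full-text hits into the files sharing their directories."""
    by_dir = {}
    for path in all_files:
        by_dir.setdefault(os.path.dirname(path), []).append(path)
    ranking, seen = [], set()
    for hit in fulltext_hits:  # visit directories in full-text rank order
        for path in sorted(by_dir.get(os.path.dirname(hit), [])):
            if path not in seen:
                seen.add(path)
                ranking.append(path)
    return ranking

files = [
    "home/a/paper/main.tex", "home/a/paper/fig.jpg",
    "home/a/data/run1.csv", "home/a/data/run2.csv",
    "home/a/misc/todo.txt",
]
hits = ["home/a/paper/main.tex", "home/a/data/run1.csv"]
result = directory_search(hits, files)
```

Files in directories without any full-text hit, such as `home/a/misc/todo.txt` here, never appear in the result, which is the main limitation of this baseline for files stored far from the documents that use them.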
31
Evaluation 2: Results
Top 20                    Avg. precision   Avg. recall
FRIDAL                    0.72             0.15
Full-text search          0.54             0.12
Directory search          0.61             0.13
Connections calculation   0.48             0.10
FRIDAL has the best score
• The precision of FRIDAL is higher than that of the other methods at low recall
FRIDAL retrieves more relevant files than the other methods near the top of the results, so the desired files can be found efficiently with FRIDAL
32
Conclusion & Future work
1. Background & Goal
2. Related works
3. Proposed
method & system
4. Experiments
5. Conclusion
33
Conclusion
• FRIDAL: a new desktop search method that uses latent relationships to search for keyword-lacking files
  - A method for automatically extracting latent relationships between files from file access logs
  - A search method and system that combines inter-file relationships with a full-text search engine
• Showed the feasibility and performance of FRIDAL in experiments with real data
  - Best performance among the compared methods
34
Future work
• Improve the implementation
  - Support copying, moving and renaming files
  - Support other file access logs (Windows Event Log)
• Improve the calculation of the interfile relationships
  - Filter noise in the calculation of AFUDs
  - Consider read/write (and move, delete…) actions
• Improve our ranking method
  - Detailed analysis of multi-user logs
  - More consideration of time-related information: we need to discuss whether old logs are important or not
35
Thank you! Questions & Comments ?

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 

Searching Keyword-lacking Files based on Latent Interfile Relationships

  • 1. Searching Keyword-lacking Files Based on Latent Interfile Relationships. Tetsutaro Watanabe (Tokyo Tech, Japan), Takashi Kobayashi (Nagoya U., Japan), Haruo Yokota (Tokyo Tech, Japan). ICSOFT 2010, 5th Intl. Conf. on Software and Data Technologies, 22 July 2010, Athens, Greece.
  • 2. Outline of today's talk. Desktop search is a must-have feature, but how often do you say “Good boy!” to it? We present a new desktop search method that uses LATENT relationships between files. We DON'T care about the contents of files. Our major contributions: (1) a search method and system that combine inter-file relationships with a full-text search engine; (2) a method for automatically extracting latent inter-file relationships from file access logs; (3) experiments on real data that show the feasibility and performance of our method.
  • 3. Background and Goal: Information Explosion. Agenda: 1. Background & Goal; 2. Related works; 3. Proposed method & system; 4. Experiment; 5. Conclusion.
  • 4. Background. The number of files in file systems keeps increasing [1]: many files and folders are created and kept every day, and the desktop file system has become a forest of folders. It is hard to classify files into appropriate directories and difficult to find a desired file in a deep node. Desktop search (DS) is therefore a must-have feature: users give up classifying files and traversing the folder forest, and powerful desktop search functions are seamlessly merged into current operating systems. [1] Agrawal, N., Bolosky, W. J., Douceur, J. R., and Lorch, J. R. A five-year study of file-system metadata. ACM Transactions on Storage, 3(3), 2007.
  • 5. Background (cont.). DS can ONLY find files that include the search keywords, because it is based on a full-text search engine. It CANNOT find keyword-lacking files even if they are related to the keywords. Many files related to a research paper do not include its keywords: image figures, source data files, papers of related works, source code for experiments. An explanatory filename is one solution, but it leads to names like “figure_sect2_ICSOFT2010_FRIDAL_outline.jpg”.
  • 6. Our research goal: a search method for keyword-lacking files that match the given keywords. Within the file system, full-text search covers the files that include the keyword; our target is the files that do not include the keyword but are related to it.
  • 7. To find keyword-lacking files: (a) Use metadata (e.g., faceted search). This enables rich search but needs good metadata. It works fine for important archive files, but can you attach metadata to every file you generate? (b) Use references (e.g., Google image search). References are one kind of automatically generatable metadata: an image that contains no text can still be found through the text of the documents that refer to it. However, reference information is (very) rare and costly: it needs a target-specific (syntactic, logical) analyzer, such as an HTML/TeX analyzer, an analyzer for a specific XML document type, or a paper analyzer (to find citations). So…
  • 8. To find keyword-lacking files: (a) Use metadata (e.g., faceted search). This enables rich search but needs good metadata. It works fine for important archive files, but can you attach metadata to every file you generate? (b) Use references (e.g., Google image search). References are one kind of automatically generatable metadata: an image that contains no text can still be found through the text of the documents that refer to it. However, reference information is (very) rare and costly: it needs a target-specific (syntactic, logical) analyzer, such as an HTML/TeX analyzer, an analyzer for a specific XML document type, or a paper analyzer (to find citations). Research question: how can we get common, cost-free relation information? Our answer: mine it automatically from user activity.
  • 9. Related works. Agenda: 1. Background & Goal; 2. Related works; 3. Proposed method & system; 4. Experiment; 5. Conclusion.
  • 10. Related works. Semantic approach [1][2]: attach rich metadata to manage and search files. Time-based metaphor: search along the timeline of past activity, as in Time-machine computing [3], SIS [4], and OreDesk [5]. [1] Gifford, D. K. et al. Semantic file systems. In Proc. ACM Symposium on Operating Systems Principles (1991). [2] Chirita, P. A. et al. Activity based metadata for semantic desktop search. In Proc. Second European Semantic Web Conference (ESWC) (2005). [3] Rekimoto, J. Time-machine computing: a time-centric approach for the information environment. In Proc. ACM UIST '99 (1999). [4] Dumais, S. et al. Stuff I've seen: a system for personal information retrieval and re-use. In Proc. SIGIR 2003 (2003). [5] Ohsawa, R. et al. OreDesk: a tool for retrieving data history based on user operations. In Proc. IEEE International Symposium on Multimedia (ISM) (2006).
  • 11. Related works (cont.). Using relationships between files: applying the PageRank idea [6]; using usage analysis techniques [7]; integrating with full-text search, as in Connections [8], which calculates interfile relationships from system calls on files and, in a context-based search, retrieves files related to the files found. [6] Nejdl, W. and Paiu, R. Desktop search: how contextual information influences search results and rankings. In Proc. Workshop on Information Retrieval in Context (IRiX) (2005). [7] Chirita, P. A. and Nejdl, W. Analyzing user behavior to rank desktop items. In Proc. Intl. Symp. on String Processing and Information Retrieval (SPIRE) (2006). [8] Soules, C. A. and Ganger, G. R. Connections: using context to enhance file search. In Proc. ACM Symposium on Operating Systems Principles (2005).
  • 12. Connections [Soules and Ganger 2005]. Count read-write relations within a time window of N seconds, assuming that a written file refers to the files read; then propagate full-text search points along these relations. System call trace log: open(s), read(s), write(s), mmap(s), stat(s), dup(s), link(S,D), rename(S,F). [Figure: read() and write() calls on files A, B, C along a timeline with an N-second window, and the resulting relation graph.] Problem: raw file I/O information is NOT enough to analyze user activity.
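The read-write counting idea on this slide can be illustrated with a minimal sketch. It is not the Connections implementation: the trace format, the function name `count_read_write_relations`, and the rule "every file read within the last N seconds is linked to the written file" are simplifying assumptions made for illustration.

```python
from collections import defaultdict


def count_read_write_relations(trace, window_sec):
    """Count (read_file -> written_file) pairs inside a sliding time window.

    trace: list of (timestamp_sec, op, path) sorted by time, op is "read" or "write".
    Assumption: a write refers to every other file read in the last window_sec seconds.
    """
    relations = defaultdict(int)
    recent_reads = []  # (timestamp, path) of reads still inside the window
    for ts, op, path in trace:
        # Forget reads that fell out of the window.
        recent_reads = [(t, p) for t, p in recent_reads if ts - t <= window_sec]
        if op == "read":
            recent_reads.append((ts, path))
        elif op == "write":
            for read_path in {p for _, p in recent_reads}:
                if read_path != path:
                    relations[(read_path, path)] += 1
    return dict(relations)


# Hypothetical trace: A is read before B is written, B is read before C is written.
trace = [
    (0, "read", "A"),
    (5, "write", "B"),
    (8, "read", "B"),
    (12, "write", "C"),
    (40, "write", "C"),  # no read within the window, so no relation is added
]
relations = count_read_write_relations(trace, window_sec=10)
print(relations)
```

The last write shows the limitation named on the slide: the count only sees raw I/O calls, not whether the user was actually working with the files.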
  • 13. Proposed method & system: File Retrieval by Inter-file relationship Derived from Access Log (FRIDAL). Agenda: 1. Background & Goal; 2. Related works; 3. Proposed method & system; 4. Experiment; 5. Conclusion.
  • 14. Outline of FRIDAL. Basic assumption: files frequently used at the same time are related (for example, a TeX paper and its figures). Key features: clean the raw file access log to extract approximate file use durations (AFUDs); calculate latent relations by analyzing the overlap of AFUDs; calculate the ranking for keywords using full-text search and the relationship graph.
  • 15. Approximate File Use Duration (AFUD). Case 1: the user keeps files open without using them, so the file use duration (FUD) needs trimming. Detect activity: 1) if any activity exists in a frame of length Ta, the user was active; eliminate inactive time. 2) A long inactive time (> Tb) means the user went home; eliminate the time after it. [Figure: timeline of active time, FUDs, and the AFUDs obtained by applying 1) and 2).]
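The two trimming rules can be sketched as follows. This is one possible reading of the slide, not the authors' code: the helper names, the use of fixed frames of length Ta, and the interpretation of rule 2 as "cut the FUD at the first inactive gap longer than Tb" are assumptions.

```python
def active_intervals(event_times, ta):
    """Rule 1: a frame of length ta with at least one event counts as active.

    Returns merged, sorted (start, end) intervals of active time.
    """
    frames = sorted({int(t // ta) for t in event_times})
    intervals = []
    for frame in frames:
        start, end = frame * ta, (frame + 1) * ta
        if intervals and intervals[-1][1] == start:
            intervals[-1] = (intervals[-1][0], end)  # extend adjacent frame
        else:
            intervals.append((start, end))
    return intervals


def trim_fud(fud, active, tb):
    """Cut one FUD (open_time, close_time) down to its active parts.

    Rule 2 (assumed reading): after an inactive gap longer than tb the user
    is treated as gone, and the rest of the FUD is dropped.
    """
    open_t, close_t = fud
    pieces = []
    last_end = None
    for start, end in active:
        if end <= open_t or start >= close_t:
            continue  # no overlap with this FUD
        if last_end is not None and start - last_end > tb:
            break  # long inactivity: discard everything after it
        piece = (max(start, open_t), min(end, close_t))
        pieces.append(piece)
        last_end = piece[1]
    return pieces


# Hypothetical log: activity in the first minutes, then again a day later.
events = [10, 70, 130, 400, 90000]
active = active_intervals(events, ta=60)
pieces = trim_fud((30, 90030), active, tb=3600)
print(active)
print(pieces)
```

In this example the file stayed open for more than a day, but only the two active stretches at the beginning survive as AFUDs.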
  • 16. Approximate File Use Duration (cont.). Case 2: some applications do not keep files open (they have no, or a different, exclusive access control mechanism), so only many short FUDs appear. Detect the application's manner: an average FUD < Tc means the application does not lock the file. For such file types, fill the time slots between FUDs within active times. [Figure: timeline of active time, FUDs, and AFUDs.]
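A small sketch of the gap-filling rule for applications that do not keep files open. The function name `fill_short_fuds` is hypothetical, the rule is applied to the FUDs of a single file rather than to a whole file type, and each short FUD is assumed to lie inside one active interval.

```python
def fill_short_fuds(fuds, active, tc):
    """Merge short FUDs that fall inside the same active interval.

    fuds:   sorted list of (open_time, close_time) for one file
    active: sorted list of (start, end) active-time intervals
    tc:     threshold on the average FUD length
    """
    if not fuds:
        return []
    average = sum(close - open_ for open_, close in fuds) / len(fuds)
    if average >= tc:
        return list(fuds)  # the application holds the file open: keep FUDs as-is
    filled = []
    for start, end in active:
        inside = [(o, c) for o, c in fuds if o >= start and c <= end]
        if inside:
            # Fill the slots between the first open and the last close.
            filled.append((min(o for o, _ in inside), max(c for _, c in inside)))
    return filled


# Hypothetical FUDs of a file that is opened and closed within seconds.
fuds = [(10, 12), (50, 51), (100, 103), (400, 401)]
active = [(0, 180), (360, 420)]
filled = fill_short_fuds(fuds, active, tc=30)
print(filled)
```

With a threshold below the average FUD length the list is returned unchanged, which corresponds to applications that do hold the file open.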
  • 17. Calculate latent interfile relationships from the AFUDs, using their co-occurrences (COs). 1. Calculate four relationship elements: T, the total time of COs; C, the number of COs; D, the total time of the spans between COs; P, the similarity of the timings of the open-file operations. 2. Calculate the interfile relationship: $\text{Relationship} = T^{\alpha} \cdot C^{\beta} \cdot D^{\gamma} \cdot P^{\delta}$.
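  • 18. Calculate latent relationships (1 of 3). T, the total time of COs, and C, the number of COs, capture the length and frequency of co-use. For n COs with durations $t_1, \dots, t_n$: $C = n$ and $T = \sum_{i=1}^{n} t_i$. [Figure: AFUDs of files x and y on a timeline, with COs c1 to c4 of durations t1 to t4.]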
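  • 19. Calculate latent relationships (2 of 3). D, the total time of the spans between COs: when the user co-uses the files in several tasks, the relation is stronger than when they are co-used in a single task. With $d_{i(i+1)}$ the span between CO i and CO i+1: $D = \sum_{i=1}^{n-1} d_{i(i+1)}$. [Figure: two timelines of AFUDs and COs with spans d12 and d23, one with total D1 and one with total D2.]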
  • 20. Calculate latent relationships (3 of 3). P, the similarity of the timings of the open-file operations: $P = \left( \sum_{i=1}^{n} p_i \right)^{-1}$ if $\sum_{i=1}^{n} p_i > 1$, and $P = 1$ if $\sum_{i=1}^{n} p_i \le 1$. [Figure: timelines of files A and B (A1, A2, B1, B2) with the values p1, p2, and p3 = 0 marked.]
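The four elements and their product can be put together in a short sketch. Several points are assumptions made for illustration: COs are taken as the pairwise overlaps of two files' AFUDs, the span d between consecutive COs is measured from the end of one CO to the start of the next, the p values are passed in directly because the slides do not spell out how they are measured, and P is read as 1 when the sum of the p values is at most 1 and as the reciprocal of that sum otherwise. The default exponents are the values reported in the experiments, (1, 1, 0.5, 0.5).

```python
def co_occurrences(afuds_x, afuds_y):
    """Return the sorted overlaps (COs) between two lists of (start, end) AFUDs."""
    cos = []
    for xs, xe in afuds_x:
        for ys, ye in afuds_y:
            start, end = max(xs, ys), min(xe, ye)
            if start < end:
                cos.append((start, end))
    return sorted(cos)


def relationship(cos, p_values, alpha=1.0, beta=1.0, gamma=0.5, delta=0.5):
    """Relationship = T^alpha * C^beta * D^gamma * P^delta for one file pair."""
    n = len(cos)
    if n == 0:
        return 0.0
    total_time = sum(end - start for start, end in cos)                 # T
    count = n                                                           # C
    span_time = sum(cos[i + 1][0] - cos[i][1] for i in range(n - 1))    # D
    p_sum = sum(p_values)
    similarity = 1.0 if p_sum <= 1 else 1.0 / p_sum                     # P
    return (total_time ** alpha) * (count ** beta) \
        * (span_time ** gamma) * (similarity ** delta)


# Hypothetical AFUDs of files x and y, and hypothetical p values.
afuds_x = [(0, 10), (20, 30)]
afuds_y = [(5, 25)]
cos = co_occurrences(afuds_x, afuds_y)      # two COs: (5, 10) and (20, 25)
score = relationship(cos, p_values=[4.0, 0.0])
print(cos, score)
```

Here T = 10, C = 2, D = 10, and P = 0.25, so the score is 10 · 2 · √10 · √0.25. Under this reading a pair with a single CO has D = 0 and therefore a zero product; how that case is handled is not stated on the slides.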
  • 21. Search files using interfile relationships. 1. Run the full-text search with the input keywords. 2. Score the file points for all files related to the files found by the full-text search (discussed later). 3. Display the files ordered by points. Example search result: 1st 25 pt, 2nd 20 pt, 3rd 15 pt, 4th 10 pt, 5th 5 pt. [Figure: file system with the full-text search result, the relationship edges, and the keyword-lacking files targeted by the proposed method.]
  • 22. Score the file point. Use TF-IDF and the normalized relationship, and propagate just one hop to limit the computational cost: $\mathrm{Point}(F) = \text{TF-IDF}(F) + \sum_i \text{TF-IDF}(X_i) \cdot \mathrm{NormRel}(F, X_i)$, where the $X_i$ are the files related to F. Example from the figure: a file with TF-IDF score 0 that is related to a file scoring 20 (normalized relationship 0.75) and to a file scoring 30 (normalized relationship 1) gets 0 + 15 (20 × 0.75) + 30 (30 × 1) = 45.
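The one-hop propagation can be reproduced with a few lines. This is a sketch under stated assumptions, not FRIDAL's code: the file names `a`, `b`, `c`, `z` are hypothetical labels for the nodes in the slide's figure, the edge layout is inferred from the numbers on the slide, and the normalized relationship is treated as symmetric.

```python
def score_files(tfidf, norm_rel):
    """Point(F) = TF-IDF(F) + sum of TF-IDF(X) * NormRel(F, X) over neighbours X.

    tfidf:    {file: full-text score}; files missing from it score 0
    norm_rel: {(file1, file2): normalized relationship}, each edge listed once
    """
    neighbours = {}
    for (f1, f2), weight in norm_rel.items():
        neighbours.setdefault(f1, []).append((f2, weight))
        neighbours.setdefault(f2, []).append((f1, weight))
    points = {}
    for f in set(tfidf) | set(neighbours):
        propagated = sum(tfidf.get(x, 0.0) * w for x, w in neighbours.get(f, []))
        points[f] = tfidf.get(f, 0.0) + propagated  # one hop only
    return points


# Full-text hits a, b, c and a keyword-lacking file z (no TF-IDF score).
tfidf = {"a": 10, "b": 20, "c": 30}
norm_rel = {("a", "b"): 0.5, ("b", "z"): 0.75, ("c", "z"): 1.0}
points = score_files(tfidf, norm_rel)
ranking = sorted(points, key=points.get, reverse=True)
print(points)
print(ranking)
```

The keyword-lacking file `z` ends up first with 45 points, matching the worked numbers on the slide, because it is strongly related to the two best full-text hits.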
  • 23. FRIDAL implementation. Components: web interface, controller (Java), full-text search engine (Hyper Estraier), RDBMS, and file server (Samba) with its file system. Preparing phase: users use files on the file server; the controller gets the access logs, calculates the relationships, and stores them in the RDBMS; the full-text search engine makes the full-text index. Searching phase: the user searches through the web interface; the controller runs the full-text search, searches the related files, calculates the points, and returns the search result.
  • 24. Experiments. Agenda: 1. Background & Goal; 2. Related works; 3. Proposed method & system; 4. Experiments; 5. Conclusion.
  • 25. Experimental environment. The home directories of three testers were kept on a Samba 2.2 file server: Tester A (Windows XP, 319 days), Tester B (Windows XP, 319 days), Tester C (Windows Vista, 323 days). Access logs were collected for MS Office, LaTeX, image, and movie files. Relationship parameters: (α, β, γ, δ) = (1, 1, 0.5, 0.5), based on a preparatory experiment.
  • 26. Mined latent interfile relations.
      Tester    Lines of logs  #Files  #Relations
      Tester A  4,873,703      1,100   17,472
      Tester B  4,323,090        713    5,692
      Tester C  7,863,206        793    5,236
    The number of relations did not correlate with the size of the logs; it depends on what the tester was doing.
  • 27. Evaluation 1. Task: find specific files in another user's home directory. Measured values: the number of queries; the number of files the user checked before finding the target files; the number of answer files found. Compared methods: FRIDAL and full-text search.
  • 28. Evaluation 1: results.
    Target files: F1, the paper of tester A; F2, the source of the image files in the paper of tester A; F3, the eight data files for the paper of tester A; F4, the paper of tester C; F5, the source of the image files in the paper of tester C; F6, the data file for the paper of tester C.
    Files found by both methods (FRIDAL at smaller cost):
      File  Method     #Queries  #Checked files  Found
      F1    FRIDAL     2          1              yes
      F1    Full-text  2         15              yes
      F4    FRIDAL     1          2              yes
      F4    Full-text  1         11              yes
      F6    FRIDAL     1         15              yes
      F6    Full-text  2         14              yes
      Ave.  FRIDAL     1.3        6.0
      Ave.  Full-text  1.7       13.3
    Files only FRIDAL can find:
      File  Method     #Queries  #Checked files  Found
      F2    FRIDAL     1          9              1/1
      F2    Full-text  1          6              0/1
      F3    FRIDAL     1          4              3/8
      F3    Full-text  1          0              0/8
      F5    FRIDAL     1          2              1/1
      F5    Full-text  1         14              0/1
    FRIDAL can find keyword-lacking files, and at smaller cost than full-text search.
  • 29. Evaluation 2. Performance comparison with other methods. We prepared six tasks that search for files in a home directory (details in Table 4 of our paper). Measured values: the average of the 11-point average precision; the average precision and recall of the top 20 results. Compared methods: FRIDAL, full-text search, directory search, and Connections calculation.
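For readers unfamiliar with the measures named on this slide, the sketch below shows the textbook definitions of precision and recall at a cutoff and of 11-point interpolated average precision. The slides do not say which exact variant was used, so the function names, the interpolation rule, and the toy ranking are assumptions.

```python
def precision_recall_at_k(ranked, relevant, k=20):
    """Precision and recall of the first k results of a ranked list."""
    hits = sum(1 for f in ranked[:k] if f in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def eleven_point_average_precision(ranked, relevant):
    """Mean of the interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    if not relevant:
        return 0.0
    points = []  # (recall, precision) after each rank
    hits = 0
    for rank, f in enumerate(ranked, start=1):
        if f in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))
    total = 0.0
    for level in range(11):
        recall_level = level / 10
        # Interpolated precision: best precision at any recall >= this level.
        reachable = [p for r, p in points if r >= recall_level]
        total += max(reachable) if reachable else 0.0
    return total / 11


# Toy ranking with two relevant files at ranks 1 and 3.
ranked = ["f1", "f2", "f3", "f4"]
relevant = {"f1", "f3"}
p_at_2, r_at_2 = precision_recall_at_k(ranked, relevant, k=2)
avg_11pt = eleven_point_average_precision(ranked, relevant)
print(p_at_2, r_at_2, avg_11pt)
```

In the toy example the interpolated precision is 1.0 up to recall 0.5 and 2/3 above it, which gives an 11-point average of 28/33.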
  • 30. Evaluation 2: comparison methods. Directory search: a straightforward strategy that searches the directories containing the full-text search results. [Figure: full-text search results (1st, 2nd, …, 7th) and the directory search ranking built from each hit and the files in the same directory as that hit.] Connections calculation: uses the calculation method of Connections, with the read/write attribute of the file accesses in the access logs in place of read()/write() calls, and with the optimal parameter values the authors reported in their paper.
  • 31. Evaluation 2: results (top 20).
      Method                   Avg. precision  Avg. recall
      FRIDAL                   0.72            0.15
      Full-text search         0.54            0.12
      Directory search         0.61            0.13
      Connections calculation  0.48            0.10
    FRIDAL has the best score. The precision of FRIDAL is higher than that of the other methods at low recall levels: FRIDAL retrieves more relevant files than the others in the top ranks of the results, so the desired files can be found efficiently with FRIDAL.
  • 32. Conclusion & future work. Agenda: 1. Background & Goal; 2. Related works; 3. Proposed method & system; 4. Experiments; 5. Conclusion.
  • 33. Conclusion. FRIDAL is a new desktop search method that uses latent relationships to search for keyword-lacking files. It consists of a method for automatically extracting latent relationships between files from file access logs, and a search method and system that combine inter-file relationships with a full-text search engine. Experiments with real data show the feasibility and performance of FRIDAL: it performed best among the compared methods.
  • 34. Future work. Improve the implementation: support copying, moving, and renaming files; support other file access logs (Windows Event Log). Improve the calculation of the interfile relationships: filter noise in the calculation of AFUDs; consider read/write (and move, delete, …) actions. Improve our ranking method: detailed analysis of multi-user logs; more consideration of time-related information (we need to discuss whether old logs are important or not).
  • 35. Thank you! Questions & comments?

Editor's Notes

  1. 0:15
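  2. Q (anticipated question): “Why are the other methods not included?” The interface is not ready yet, and the cost of opening a directory has not been determined.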