Cutting Edge Technology used in ePADD
RBMS - June 25, 2015
Peter Chan, Digital Archivist
5 Cutting Edge Technologies in ePADD
1. Intellectual Content Appraisal
2. Lexicon Search
3. Query Generator
4. Attachment Browsing
5. Entity Resolution
Intellectual Content Appraisal
Common practices
• Physical attributes
– File count
– File size
– File listing
– File format
ePADD information extraction
• Intellectual content
– Personal name
– Organizational name
– Location name
• Plus the physical attributes listed
under Common practices
Lexicon Search
Regular search
• One query at a time
• Exact terms
• Search terms cannot be
saved for later use
• Search terms not grouped
ePADD Lexicon search
• Multiple queries at a time
• Stemming search
• Search terms saved for
future use
• Search term groupings
saved for future use
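
The lexicon search described above can be sketched roughly as follows. This is a minimal illustration, not ePADD's implementation: the `crude_stem` suffix stripper stands in for a real stemmer, and the category names, terms, and messages are hypothetical. It shows several saved term groups being run against the messages in one pass, with stemmed rather than exact matching.

```python
import json
import re

def crude_stem(word):
    """Strip a few common English suffixes (assumption: a stand-in for a real stemmer)."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def lexicon_search(lexicon, messages):
    """Run every category of the lexicon at once; return {category: [message ids]}."""
    stemmed = {cat: {crude_stem(t) for t in terms} for cat, terms in lexicon.items()}
    hits = {cat: [] for cat in lexicon}
    for msg_id, text in messages.items():
        tokens = {crude_stem(tok) for tok in re.findall(r"[A-Za-z]+", text)}
        for cat, stems in stemmed.items():
            if stems & tokens:  # any stemmed term of the category occurs in the message
                hits[cat].append(msg_id)
    return hits

# Hypothetical lexicon: search terms grouped into named categories.
lexicon = {
    "finance": ["budget", "invoice", "payment"],
    "travel": ["flight", "hotel"],
}
# Hypothetical messages keyed by id.
messages = {
    1: "Budgets are due Friday.",
    2: "I booked the flights and a hotel.",
    3: "Lunch tomorrow?",
}

saved = json.dumps(lexicon)              # terms and groupings saved for future use
hits = lexicon_search(json.loads(saved), messages)
print(hits)
```

Serializing the lexicon to JSON is only one possible way to keep terms and their groupings for later sessions.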
Query Generator
Regular search
• Users enter search terms
ePADD Query Generator
• System generates search
terms from text provided
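
One way to picture a query generator is sketched below. The slides do not say how ePADD derives its terms, so this is an assumption: it pulls runs of capitalized words out of the supplied text and keeps those that match a list of entities already known from the archive. The function name, the sample text, and the entity list are all hypothetical.

```python
import re

def generate_queries(text, known_entities):
    """Return capitalized phrases found in text that also appear in known_entities."""
    # One or more capitalized words in a row, e.g. "John Smith".
    candidates = re.findall(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", text)
    known = {entity.lower() for entity in known_entities}
    seen, queries = set(), []
    for cand in candidates:
        key = cand.lower()
        if key in known and key not in seen:  # keep each matching term once, in order
            seen.add(key)
            queries.append(cand)
    return queries

# Hypothetical input text and archive entities.
text = "Jane Doe wrote to John Smith from Boston, and John Smith replied."
known_entities = ["John Smith", "Boston", "Acme Press"]

queries = generate_queries(text, known_entities)
print(queries)
```

The user supplies only the text; the search terms come out of the comparison with the archive's own entities.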
Attachment Browsing
Inside the email message
• Email client
– Gmail, Hotmail, Yahoo Mail,
etc.
• Email archiving software
– MailStore
ePADD Attachment Browsing
• Consolidate all attached
images in one place and link
images back to originating
messages
• Consolidate all document
attachments in one place
for easy download
• Consolidate all other
attachments in one place
for easy download
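
The consolidation described above can be sketched with Python's standard `email` package. This is an illustration under assumptions, not ePADD's code: the `DOCUMENT_TYPES` set, the bucket names, and the sample messages are hypothetical. Each attachment is filed under images, documents, or other, together with the Message-ID of the message it came from, so it can be linked back.

```python
from collections import defaultdict
from email.message import EmailMessage

# Assumption: which MIME types count as "documents" is a local choice.
DOCUMENT_TYPES = {"application/pdf", "application/msword"}

def make_message(msg_id, attachments):
    """Build a small test message with the given (filename, maintype, subtype) attachments."""
    msg = EmailMessage()
    msg["Message-ID"] = msg_id
    msg.set_content("See attached.")
    for filename, maintype, subtype in attachments:
        msg.add_attachment(b"placeholder bytes", maintype=maintype,
                           subtype=subtype, filename=filename)
    return msg

def consolidate(messages):
    """Group attachments from all messages by kind, keeping the originating Message-ID."""
    groups = defaultdict(list)
    for msg in messages:
        for part in msg.iter_attachments():
            if part.get_content_maintype() == "image":
                bucket = "images"
            elif part.get_content_type() in DOCUMENT_TYPES:
                bucket = "documents"
            else:
                bucket = "other"
            groups[bucket].append((part.get_filename(), str(msg["Message-ID"])))
    return dict(groups)

# Hypothetical messages.
messages = [
    make_message("<m1@example.org>", [("photo.png", "image", "png"),
                                      ("report.pdf", "application", "pdf")]),
    make_message("<m2@example.org>", [("data.zip", "application", "zip")]),
]

groups = consolidate(messages)
print(groups)
```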
Entity Resolution
Not seen elsewhere
ePADD Disambiguation
• External resolution
– ePADD resolves person names to FAST,
and locations and organizations to
Freebase.
• Internal resolution
– ePADD generates a list of internal
“authority” records consisting of all
recognized entities and multi-word
address book names
– When the user hovers over such
names or acronyms in an email,
possible resolutions to internal
authority records are shown in
decreasing order of confidence.
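
The internal resolution step can be illustrated with a small ranking sketch. The scoring below is an assumption made for illustration (exact match, then acronym match, then shared words); the slides do not describe how ePADD computes confidence, and the authority records are hypothetical.

```python
def initials(name):
    """Initials of the capitalized words in a name, e.g. 'Society of American Archivists' -> 'SAA'."""
    return "".join(word[0] for word in name.split() if word[0].isupper())

def score(mention, record):
    """Hypothetical confidence that a mention refers to an authority record."""
    if mention.lower() == record.lower():
        return 1.0
    m_tokens = mention.lower().split()
    r_tokens = record.lower().split()
    # A single all-caps token that equals the record's initials is treated as an acronym.
    if len(m_tokens) == 1 and mention.isupper() and mention == initials(record):
        return 0.8
    overlap = len(set(m_tokens) & set(r_tokens))
    return 0.6 * overlap / len(r_tokens)

def resolve(mention, authority_records):
    """Return (confidence, record) pairs in decreasing order of confidence."""
    scored = [(score(mention, rec), rec) for rec in authority_records]
    scored = [(s, rec) for s, rec in scored if s > 0]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))

# Hypothetical internal authority records.
records = [
    "Society of American Archivists",
    "Stanford University Libraries",
    "Stanford Alumni Association",
]

acronym_names = [rec for _, rec in resolve("SAA", records)]
partial_names = [rec for _, rec in resolve("Stanford University", records)]
print(acronym_names)
print(partial_names)
```

Ties are broken alphabetically here only to keep the output stable.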