This is for ELM
Ralph LeVan
Sr. Research Scientist
7/14/2016
Code4Lib Midwest
AutoSuggest
Goals
• Return records at keystroke speeds
• Run on an underpowered Unix box
Result
• Precalculate a response record for every
possible legitimate keystroke combination
• Load those records into a Pears database and
expose via SRW
• Client JavaScript takes keystrokes and turns
them into queries to an AutoSuggest servlet
• The thin gateway servlet takes queries, turns
them into SRW requests and passes through the
record returned
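The gateway step above can be sketched as a small function that turns the typed characters into a request URL. This is only a sketch under assumptions: the endpoint URL and the index name `local.key` are hypothetical (the slides give neither), and it assumes the SRW server also accepts URL-style (SRU) `searchRetrieve` requests with a CQL `exact` match on the record key.

```python
from urllib.parse import urlencode

# Hypothetical values: the slides name neither the endpoint nor the index.
SRW_BASE_URL = "http://localhost:8080/SRW/search/autosuggest"
KEY_INDEX = "local.key"


def build_srw_url(keystrokes: str) -> str:
    """Turn the characters typed so far into a searchRetrieve request URL."""
    # Escape backslashes and double quotes so the key stays one quoted CQL term.
    escaped = keystrokes.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": f'{KEY_INDEX} exact "{escaped}"',
        # One precalculated record per key, so one record is all we ask for.
        "maximumRecords": "1",
    }
    return SRW_BASE_URL + "?" + urlencode(params)
```

Because each key maps to one precalculated record, the servlet has no ranking or merging to do at request time; it only forwards the record it gets back.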
How are the records precalculated?
• For each source record, a relevance score is
calculated
– For VIAF, that’s a value in the record
• Names are extracted from the record.
– The names are ranked
– The best name gets the score of the record and
subsequent names get a reduced score
– For each name, a tuple is generated containing the
name, the recordID of the source record, the score for
the name and any other data extracted from the
record
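A minimal sketch of the tuple generation described above. The slides say only that subsequent names get "a reduced score"; the multiplicative factor used here is a placeholder assumption, and the field names and record IDs are hypothetical.

```python
from typing import Iterator, NamedTuple, Sequence


class NameTuple(NamedTuple):
    name: str
    record_id: str
    score: float
    extra: str


# Assumption: the actual reduction rule is not given in the slides.
REDUCTION = 0.9


def tuples_for_record(record_id: str, record_score: float,
                      ranked_names: Sequence[str],
                      extra: str = "") -> Iterator[NameTuple]:
    """Yield one tuple per name; ranked_names must be ordered best first."""
    score = float(record_score)
    for name in ranked_names:
        # The best name carries the record's score; each later name gets less.
        yield NameTuple(name, record_id, score, extra)
        score *= REDUCTION
```

For example, `list(tuples_for_record("rec-1", 100, ["Shakespeare, William", "Shakspere, William"]))` gives two tuples for the same record, the second with a lower score than the first.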
How are the records precalculated?
• The tuples are sorted
• A process reads in all the names that start with
the same letter.
• The first two terms are compared and a top-10
list is started for each set of letters in common
– E.g. Andrew and Anthony each go into the top-10 list
for A and AN.
– AutoSuggest records are generated for the singletons
Andrew and Anthony. The full name is the key for
these records.
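The comparison of two adjacent names can be sketched as follows. Folding to upper case is an assumption, made only because the slides write the keys as A and AN; the function names are hypothetical.

```python
def common_prefix(a: str, b: str) -> str:
    """Return the leading characters that a and b share."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return a[:n]


def shared_keys(a: str, b: str) -> list[str]:
    """List every key (set of letters in common) shared by two adjacent names."""
    shared = common_prefix(a.upper(), b.upper())
    # Each leading slice of the shared prefix is a key that gets a top-10 list.
    return [shared[:i] for i in range(1, len(shared) + 1)]
```

With the names from the slide, `shared_keys("Andrew", "Anthony")` yields the keys A and AN.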
How are the records precalculated?
• The next term is compared to the one that
preceded it
– E.g. Anthony and Astrid are compared
– Astrid is added to the top-10 list for A
– An AutoSuggest record is written for the AN list
• The key for the record is AN
• Each of the names (and associated data) is included in the
record
– An AutoSuggest record is generated for the singleton
Astrid
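The walk over the sorted names might look like the sketch below. It is an illustration, not the production code: it streams `(name, score)` pairs instead of reading a whole letter into memory, it ignores the recordID check, and it assumes the input is sorted by upper-cased name. When one name is a complete prefix of the next (or a duplicate of it), the sketch writes a list under that key instead of a singleton, which is also an assumption.

```python
def common_prefix(a, b):
    """Return the leading characters that a and b share."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return a[:n]


def build_records(sorted_entries, limit=10):
    """Yield (key, names) records from (name, score) pairs sorted by name."""
    open_lists = {}  # shared prefix -> best-first [(score, name), ...]
    prev = None      # (folded name, name, score) of the previous entry

    def add(key, entry):
        best = open_lists.setdefault(key, [])
        best.append(entry)
        best.sort(key=lambda e: -e[0])  # stable sort: ties keep input order
        del best[limit:]

    def close(keep_len):
        # A list is finished once the current name no longer shares its key.
        for key in sorted(k for k in open_lists if len(k) > keep_len):
            yield key, [name for _, name in open_lists.pop(key)]

    for name, score in sorted_entries:
        folded = name.upper()
        if prev is not None:
            p_folded, p_name, p_score = prev
            shared = common_prefix(p_folded, folded)
            # Singleton record for the previous name, keyed by the full name.
            if p_folded not in open_lists and p_folded != shared:
                yield p_name, [p_name]
            yield from close(len(shared))
            for i in range(1, len(shared) + 1):
                key = shared[:i]
                if key not in open_lists:
                    add(key, (p_score, p_name))  # previous name starts the list
                add(key, (score, name))
        prev = (folded, name, score)

    if prev is not None:
        if prev[0] not in open_lists:
            yield prev[1], [prev[1]]
        yield from close(0)
```

Run on Andrew, Anthony and Astrid, this writes the AN record as soon as Astrid is read, and the A record only when the input ends, matching the order described in the slides.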
Top-10 is complicated
• The naïve assumption is that the 10 names with
the highest score would be in the list
• But then all the variations on Shakespeare that start
with S would fill up the S record.
• So, a candidate name for the top-10 list is
checked to see if there is a higher ranking name
with the same recordID before it is added
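The recordID check can be sketched like this. The slides describe only the check for a higher ranking name; replacing a lower ranking name from the same record, as done here, is an assumption, and the record IDs in the example are made up.

```python
def add_candidate(top, name, record_id, score, limit=10):
    """Add a candidate to a best-first list of (score, name, record_id)."""
    for i, (old_score, _, old_id) in enumerate(top):
        if old_id == record_id:
            if old_score >= score:
                # A higher ranking name already represents this record.
                return top
            # Assumption: a better name for the same record replaces the old one.
            del top[i]
            break
    top.append((score, name, record_id))
    top.sort(key=lambda e: -e[0])
    del top[limit:]  # keep only the best `limit` entries
    return top
```

Adding "Shakespeare, William" and then "Shakspere, William" with the same recordID leaves a single Shakespeare entry in the list, so the remaining slots stay available for other records.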
It’s not really that easy
• All the names that start with A won’t fit into
memory.
• We do all of this work in Hadoop
• We partition the tuple input on the first 5 letters
in common
• Process as described before, but write the
shorter fragments (fewer than 5 letters) to a
separate directory
• Combine those lists to produce unified lists (and
records)
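The partition-and-combine idea can be illustrated in plain Python. This is not Hadoop code and not the actual job: the function names are hypothetical, upper-case folding is an assumption, and the recordID check is left out of the merge for brevity.

```python
from collections import defaultdict

PARTITION_LEN = 5  # the slides partition on the first 5 letters


def partition_key(name):
    """Names that share their first five letters land in the same partition."""
    return name.upper()[:PARTITION_LEN]


def merge_short_lists(partial_lists, limit=10):
    """Combine partial top lists that different partitions wrote for short keys.

    partial_lists is an iterable of (key, [(score, name), ...]) pairs; a key
    shorter than five letters, such as "A", can show up in many partitions.
    """
    merged = defaultdict(list)
    for key, entries in partial_lists:
        merged[key].extend(entries)
    # Re-rank each combined list and keep only the best `limit` entries.
    return {
        key: sorted(entries, key=lambda e: -e[0])[:limit]
        for key, entries in merged.items()
    }
```

Each partition can be processed independently because every key of five or more letters is complete within a single partition; only the shorter keys need the second, combining pass.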
Loaded into Pears
• All these generated records are loaded into
Pears
• Lots and lots of records
– The latest AutoSuggest database for VIAF has 341
million records in it.
– VIAF itself has only 31 million records
Thank You!
©2014 OCLC. This work is licensed under a Creative Commons Attribution 3.0 Unported License. Suggested attribution: “This
work uses content from [presentation title] © OCLC, used under a Creative Commons Attribution license:
http://creativecommons.org/licenses/by/3.0/”
Ralph LeVan
levan@oclc.org
