This is for ELM
Ralph LeVan
Sr. Research Scientist
7/14/2016
Code4Lib Midwest
AutoSuggest
Goals
• Return records at keystroke speeds
• Run on an underpowered Unix box
Result
• Precalculate a response record for every
possible legitimate keystroke combination
• Load those records into a Pears database and
expose via SRW
• Client JavaScript takes keystrokes and turns
them into queries to an AutoSuggest servlet
• The thin gateway servlet takes queries, turns
them into SRW requests and passes through the
record returned
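A minimal sketch of the gateway step, assuming the URL-based (SRU) form of the request is used. The endpoint, the index name `local.key`, and the lower-casing of the typed text are assumptions for illustration; the slides do not give them.

```python
from urllib.parse import urlencode

# Hypothetical values: the slides name neither the endpoint nor the index.
BASE_URL = "http://example.org/autosuggest/sru"
KEY_INDEX = "local.key"


def build_request_url(typed: str, base_url: str = BASE_URL) -> str:
    """Turn the characters typed so far into a searchRetrieve URL."""
    key = typed.strip().lower()  # assumed key normalization
    # Escape backslashes and double quotes so the CQL term stays quoted.
    escaped = key.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": f'{KEY_INDEX} exact "{escaped}"',
        "maximumRecords": "1",  # one precalculated record per key
    }
    return base_url + "?" + urlencode(params)


print(build_request_url("An"))
```

Because the answer for every key is already stored, the gateway only has to look up one record and pass it through.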
How are the records precalculated?
• For each source record, a relevance score is
calculated
– For VIAF, that’s a value in the record
• Names are extracted from the record.
– The names are ranked
– The best name gets the score of the record and
subsequent names get a reduced score
– For each name, a tuple is generated containing the
name, the recordID of the source record, the score for
the name and any other data extracted from the
record
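The tuple step might look like the sketch below. The slides say only that subsequent names get "a reduced score"; the halving factor `PENALTY` and the field names are assumptions, not the actual rule.

```python
from typing import NamedTuple


class NameTuple(NamedTuple):
    name: str
    record_id: str
    score: float
    extra: dict


PENALTY = 0.5  # hypothetical: each lower-ranked name gets half the previous score


def make_tuples(record_id, record_score, ranked_names, extra=None):
    """Emit one tuple per name; ranked_names is ordered best name first."""
    extra = extra or {}
    tuples = []
    for rank, name in enumerate(ranked_names):
        # Rank 0 (the best name) keeps the full record score.
        score = record_score * (PENALTY ** rank)
        tuples.append(NameTuple(name, record_id, score, extra))
    return tuples


for t in make_tuples("r1", 100.0, ["Shakespeare, William", "Shakspere, William"]):
    print(t.name, t.record_id, t.score)
```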
How are the records precalculated?
• The tuples are sorted
• A process reads in all the names that start with
the same letter.
• The first two terms are compared and a top-10
list is started for each set of letters in common
– E.g. Andrew and Anthony each go into the top-10 list
for A and AN.
– AutoSuggest records are generated for the singletons
Andrew and Anthony. The full name is the key for
these records.
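The comparison of two neighboring terms can be sketched as follows. The function names are made up for illustration, and upper-casing the keys simply mirrors the A and AN keys shown above.

```python
def common_prefix(a: str, b: str) -> str:
    """Return the leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def shared_keys(a: str, b: str) -> list:
    """List every key (set of letters in common) for which both names qualify."""
    prefix = common_prefix(a.upper(), b.upper())
    return [prefix[:i] for i in range(1, len(prefix) + 1)]


print(shared_keys("Andrew", "Anthony"))
```

For Andrew and Anthony this yields the keys A and AN, the two lists named in the example.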
How are the records precalculated?
• The next term is compared to the one that
preceded it
– E.g. Anthony and Astrid are compared
– Astrid is added to the top-10 list for A
– An AutoSuggest record is written for the AN list
• The key for the record is AN
• Each of the names (and its associated data) is included in the
record
– An AutoSuggest record is generated for the singleton
Astrid
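One way to read the whole pass over the sorted tuples is sketched below. It is a plain in-memory illustration, not the production code: tuples are assumed to be (name, recordID, score) sorted by upper-cased name, and the top-10 cut here is the simple highest-score rule.

```python
def shared_length(a: str, b: str) -> int:
    """Number of leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def top(members, limit=10):
    """Highest-scoring entries first, cut to the list size."""
    return sorted(members, key=lambda t: t[2], reverse=True)[:limit]


def build_records(sorted_tuples, limit=10):
    records = {}     # key -> finished AutoSuggest record
    open_lists = {}  # prefix -> names collected so far
    prev = None
    for tup in sorted_tuples:
        key = tup[0].upper()
        if prev is not None:
            shared = shared_length(prev[0].upper(), key)
            # Lists for prefixes the new name does not share are complete.
            for prefix in [p for p in open_lists if len(p) > shared]:
                records[prefix] = top(open_lists.pop(prefix), limit)
            # Add the name to every shared prefix, seeding new lists
            # with the preceding name.
            for i in range(1, shared + 1):
                open_lists.setdefault(key[:i], [prev]).append(tup)
        records.setdefault(key, [tup])  # singleton keyed on the full name
        prev = tup
    for prefix, members in open_lists.items():
        records[prefix] = top(members, limit)
    return records


names = [("Andrew", "r1", 5.0), ("Anthony", "r2", 9.0), ("Astrid", "r3", 7.0)]
for key, members in sorted(build_records(names).items()):
    print(key, [m[0] for m in members])
```

Because the input is sorted, a list can be written out as soon as the next name stops sharing its prefix, so only the lists for the current name's prefixes are open at any time.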
Top-10 is complicated
• The naïve assumption is that the 10 names with
the highest score would be in the list
• But under that rule, all the variations on Shakespeare that
start with S would crowd into the S record.
• So, a candidate name for the top-10 list is
checked to see if there is a higher ranking name
with the same recordID before it is added
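The recordID check can be sketched like this, assuming candidates are (name, recordID, score) tuples. The sample names and scores are invented for illustration.

```python
def top_names(candidates, limit=10):
    """Keep the best-scoring name per source record, up to limit names."""
    chosen = []
    seen = set()
    for name, record_id, score in sorted(
        candidates, key=lambda t: t[2], reverse=True
    ):
        if record_id in seen:
            continue  # a higher-ranking name from this record is already listed
        seen.add(record_id)
        chosen.append((name, record_id, score))
        if len(chosen) == limit:
            break
    return chosen


candidates = [
    ("Shakespeare, William", "r1", 100.0),
    ("Shakspere, William", "r1", 50.0),
    ("Shaw, George Bernard", "r2", 40.0),
    ("Sartre, Jean-Paul", "r3", 30.0),
]
print(top_names(candidates, limit=2))
```

With a limit of 2, the second Shakespeare variant is skipped and the slot goes to the next distinct record.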
It’s not really that easy
• All the names that start with A won’t fit into
memory.
• We do all of this work in Hadoop
• We partition the tuple input on the first 5 letters
in common
• Process as described before, but write the
shorter fragments (less than 5 letters) to a
separate directory
• Combine those lists to produce unified lists (and
records)
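The partition-and-combine idea, sketched in plain Python rather than as a Hadoop job. The function names and the (prefix, list) input shape are assumptions; the merge repeats the recordID check because variants of one record can land in different partitions.

```python
from collections import defaultdict

PARTITION_LEN = 5  # partition on the first 5 letters, as described above


def partition_key(name: str) -> str:
    """Key that decides which partition a tuple is sent to."""
    return name.upper()[:PARTITION_LEN]


def merge_short_lists(partials, limit=10):
    """Combine partial lists for short prefixes into unified lists.

    partials: iterable of (prefix, [(name, record_id, score), ...]) pairs,
    one pair per partition that contributed to that prefix.
    """
    combined = defaultdict(list)
    for prefix, members in partials:
        combined[prefix].extend(members)
    merged = {}
    for prefix, members in combined.items():
        seen, kept = set(), []
        for name, record_id, score in sorted(
            members, key=lambda t: t[2], reverse=True
        ):
            if record_id not in seen:
                seen.add(record_id)
                kept.append((name, record_id, score))
        merged[prefix] = kept[:limit]
    return merged


partials = [
    ("SHA", [("Shakespeare, William", "r1", 100.0)]),
    ("SHA", [("Shakspere, William", "r1", 50.0),
             ("Shaw, George Bernard", "r2", 40.0)]),
]
print(partition_key("Shakespeare, William"))
print(merge_short_lists(partials)["SHA"])
```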
Loaded into Pears
• All these generated records are loaded into
Pears
• Lots and lots of records
– The latest AutoSuggest database for VIAF has 341
million records in it.
– VIAF itself has only 31 million records
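As a quick check on the scale, using the rounded figures above:

```python
autosuggest_records = 341_000_000  # AutoSuggest database for VIAF
viaf_records = 31_000_000          # VIAF source records

ratio = autosuggest_records / viaf_records
print(f"about {ratio:.0f} AutoSuggest records per VIAF record")
```

That is roughly eleven precalculated records for every source record.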
Thank You!
©2014 OCLC. This work is licensed under a Creative Commons Attribution 3.0 Unported License. Suggested attribution: “This
work uses content from [presentation title] © OCLC, used under a Creative Commons Attribution license:
http://creativecommons.org/licenses/by/3.0/”
Ralph LeVan
levan@oclc.org
