The Coming Explosion of Records at FamilySearch
Ben Baker – bakerb@familysearch.org
To view the presentation slides this handout accompanies, please go to:
https://www.slideshare.net/bakers84/the-coming-explosion-of-records-at-familysearch-presentation
Historical Records Basics
• FamilySearch published its 2 billionth image in April 2018; the 1 billionth was published in June 2014
• FamilySearch continues to digitize nearly 1M images per day from microfilm and from over 320 cameras worldwide
• Many records are only available as images via the catalog
• Despite having 6.3B indexed names, only a fraction of records have been indexed
• Indexing isn’t keeping up with the ability to digitize images, especially in non-English languages
• Only indexed records can be presented as record hints
• Record hinting has already made FamilySearch Family Tree the best-sourced tree in the
world, with over 931M sources attached to persons in the tree
• Currently available record images do not match church membership in some areas
Changing the Records Publication Paradigm
• Several teams at FamilySearch are dedicated to improving the records publication platform
• The Goal: Provide more findable, relevant, curated records for gathering multi-generational
families from around the world
• The aim is to publish and make hintable 20% of the top-tier records in 50 of the
highest-priority countries within 15 years
• Seeking to allow homelands to be more involved in building local content
• Will support on-the-fly user corrections to records and indexing
• Will use automated technologies to accelerate publication
[Chart: Historical Records Images by Region at FamilySearch. Regions shown: North America, Europe and Middle East, Latin America, Asia, Africa/Pacific, Other]
[Chart: LDS Church Membership by Region. Regions shown: North America, Europe and Middle East, Latin America, Asia, Africa/Pacific, Other]
Investigations into Automated Indexing
• Personal Story - 2011 International Conference on Document Analysis and Recognition in Beijing
• Collaboration with other companies to explore handwriting recognition – “not ready yet”
• First “mini explosion” occurred a couple of years ago
o Partnership with GenealogyBank to extract data from born-digital obituaries
o First run indexed 5M obituaries in 10 hours, saving about 150 man-years of indexing
o 23M obituaries indexed as of May 2018, many more coming
o Uses recent advancements in machine learning and artificial intelligence (AI)
o Can produce even more information than manual indexing (e.g., in-law relationships)
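As a rough back-of-the-envelope check on the first-run figures above, the sketch below works out the implied throughput. The three headline numbers come from this handout; the 2,000 working hours per person-year is an assumption made here for illustration only, so the per-obituary time and the speedup are estimates, not FamilySearch figures.

```python
# Back-of-the-envelope arithmetic for the first automated obituary run.
OBITUARIES = 5_000_000       # obituaries indexed in the first run (from the handout)
RUN_HOURS = 10               # duration of that run (from the handout)
PERSON_YEARS_SAVED = 150     # manual indexing effort saved (from the handout)

# ASSUMPTION (not from the handout): one person-year is about 2,000 working hours.
HOURS_PER_PERSON_YEAR = 2_000

# Automated throughput during the run.
machine_rate_per_hour = OBITUARIES / RUN_HOURS

# Manual effort implied by the "150 man-years" figure.
manual_hours = PERSON_YEARS_SAVED * HOURS_PER_PERSON_YEAR
manual_minutes_per_obituary = manual_hours * 60 / OBITUARIES

# Ratio of total manual hours to the run's elapsed hours.
speedup = manual_hours / RUN_HOURS

print(f"Automated rate: {machine_rate_per_hour:,.0f} obituaries per hour")
print(f"Implied manual effort: {manual_minutes_per_obituary:.1f} minutes per obituary")
print(f"Manual hours per automated hour: {speedup:,.0f}")
```

Under that assumption, the run processed 500,000 obituaries per hour, and the savings figure implies roughly 3.6 minutes of human effort per obituary.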
What is Being Done Now
• Refining research code and models to be more stable, reproducible and measurable
• Can now publish 1M obituaries a month, with capacity continuing to increase
• Built on scalable Amazon Web Services infrastructure to meet future demand
Basics of Artificial Intelligence / Machine Learning
• Artificial Intelligence – Machines exhibiting human intelligence
o General AI – still science fiction
o Narrow AI – technologies that perform specific tasks as well as or better than humans
• Machine Learning – A subset of AI. The practice of using algorithms to parse data, learn from it,
and then make a determination or prediction about something in the world
• Put another way, machine learning lets computers learn from data rather than relying on
hand-written rules (i.e. code) to solve problems
• FamilySearch has actually been using machine learning for a while
o Possible duplicates
o Record hints
• Technologies needed to successfully extract information from an obituary
o Natural Language Processing (NLP)
▪ Named entity recognition (NER) – identify the names, dates, places, etc.
▪ Relation extraction – identify relationships between the names, dates & places
o Additional processing to standardize the data and convert it into the format needed for publication
o Notice the steps are similar to what a genealogist would do
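To make the two NLP steps above concrete, here is a minimal sketch of named entity recognition followed by relation extraction on a made-up obituary sentence. It is a hand-written, rule-based toy using regular expressions, not the machine-learned models this handout describes. The patterns, the sample text, the relation labels and the "first person named is the deceased" rule are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    label: str  # "PERSON" or "DATE"
    text: str   # the matched text
    start: int  # character offset in the source text


# Hypothetical toy patterns; a real system would learn these from indexed data.
PERSON = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")
DATE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}\b"
)
# A kinship word followed by a comma and a two-word name, e.g. "wife, Mary Smith".
KIN = re.compile(r"\b(wife|husband|son|daughter), ([A-Z][a-z]+ [A-Z][a-z]+)")
RELATION_FOR_CUE = {"wife": "spouse", "husband": "spouse",
                    "son": "child", "daughter": "child"}


def find_entities(text):
    """Step 1 (NER): tag person names and dates, in order of appearance."""
    entities = [Entity("PERSON", m.group(), m.start()) for m in PERSON.finditer(text)]
    entities += [Entity("DATE", m.group(), m.start()) for m in DATE.finditer(text)]
    return sorted(entities, key=lambda e: e.start)


def find_relations(text, entities):
    """Step 2 (relation extraction): link the deceased to named relatives."""
    people = [e for e in entities if e.label == "PERSON"]
    if not people:
        return []
    # ASSUMPTION: the first person named in the obituary is the deceased.
    deceased = people[0].text
    return [(deceased, RELATION_FOR_CUE[m.group(1)], m.group(2))
            for m in KIN.finditer(text)]


SAMPLE = ("John Smith died on March 3, 1998. He is survived by his wife, "
          "Mary Smith, and his son, David Smith.")

entities = find_entities(SAMPLE)
relations = find_relations(SAMPLE, entities)

for entity in entities:
    print(entity.label, entity.text)
for relation in relations:
    print(relation)
```

Hand-written rules like these break quickly on real obituaries (middle names, nicknames, varied phrasing), which is one reason learning the patterns from previously indexed data is attractive.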
What is Coming in the Future
• Research is already underway and looking very promising for
o Optical Character Recognition (OCR)
o Zoning (e.g., determining where the individual articles are on a newspaper page)
o Handwriting Recognition
• Expanding capabilities into more document and record types
• Beginning to investigate other languages
Document Type               Record Type        Language        Status in May 2018
--------------------------  -----------------  --------------  ------------------------------------------------------
Digital text                Obituaries         English         Already published 23M; working to continuously publish
Typewritten newspaper text  Obituaries         English         Active research
Handwritten text            Wills and deeds    English         Active research
Handwritten calligraphy     Genealogies        Chinese         Preliminary research
Handwritten text            Church records     Spanish         Preliminary research
More document types         More record types  More languages  Expect future “explosions”
What You Can Do
• Indexing is still valuable, especially in non-English languages
• Remember indexed data is the foundation for training machines to auto-index correctly
• Understand your role in correcting records that have been automatically indexed incorrectly
• Be patient as solutions continue to expand, perhaps first to collections that don’t benefit
your own research; remember that we are a global church
• Pray for the Lord’s help to bless these efforts
“We always overestimate the change that will occur in the next two years and
underestimate the change that will occur in the next ten.”
Bill Gates

More Related Content

What's hot

Text analysis-semantic-search
Text analysis-semantic-searchText analysis-semantic-search
Text analysis-semantic-searchDiana Maynard
 
Adding value to NLP: a little semantics goes a long way
Adding value to NLP: a little semantics goes a long wayAdding value to NLP: a little semantics goes a long way
Adding value to NLP: a little semantics goes a long wayDiana Maynard
 
DMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data MiningDMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data MiningPier Luca Lanzi
 
Online text data for machine learning, data science, and research - Who can p...
Online text data for machine learning, data science, and research - Who can p...Online text data for machine learning, data science, and research - Who can p...
Online text data for machine learning, data science, and research - Who can p...Fredrik Olsson
 
Filth and lies: analysing social media
Filth and lies: analysing social mediaFilth and lies: analysing social media
Filth and lies: analysing social mediaDiana Maynard
 
Lessons Learned from Lod Failure and Big Data : The Future Trend
Lessons Learned from Lod Failure and Big Data : The Future Trend Lessons Learned from Lod Failure and Big Data : The Future Trend
Lessons Learned from Lod Failure and Big Data : The Future Trend Konkuk University
 
Capacity Building: Data Science in the University At Rensselaer Polytechnic ...
Capacity Building: Data Science in the University  At Rensselaer Polytechnic ...Capacity Building: Data Science in the University  At Rensselaer Polytechnic ...
Capacity Building: Data Science in the University At Rensselaer Polytechnic ...James Hendler
 
The language of social media
The language of social mediaThe language of social media
The language of social mediaDiana Maynard
 
2013-04-02 Cybertraps for Educators
2013-04-02 Cybertraps for Educators2013-04-02 Cybertraps for Educators
2013-04-02 Cybertraps for EducatorsFrederick Lane
 
Cybertraps for Educators
Cybertraps for EducatorsCybertraps for Educators
Cybertraps for EducatorsFrederick Lane
 
DMTM 2015 - 01 Course Introduction
DMTM 2015 - 01 Course IntroductionDMTM 2015 - 01 Course Introduction
DMTM 2015 - 01 Course IntroductionPier Luca Lanzi
 
Getting Started in Data Science
Getting Started in Data ScienceGetting Started in Data Science
Getting Started in Data ScienceThinkful
 
Artificial Intelligence and the Coming Revolution of Family History - Present...
Artificial Intelligence and the Coming Revolution of Family History - Present...Artificial Intelligence and the Coming Revolution of Family History - Present...
Artificial Intelligence and the Coming Revolution of Family History - Present...bakers84
 
Using language to save the world: interactions between society, behaviour and...
Using language to save the world: interactions between society, behaviour and...Using language to save the world: interactions between society, behaviour and...
Using language to save the world: interactions between society, behaviour and...Diana Maynard
 
HathiTrust--a GovDocs Repository?
HathiTrust--a GovDocs Repository?HathiTrust--a GovDocs Repository?
HathiTrust--a GovDocs Repository?Brian Vetruba
 
Shared Data & Big Data for Libraries
Shared Data & Big Data for LibrariesShared Data & Big Data for Libraries
Shared Data & Big Data for Librariesrobin fay
 
2014-08-27 Cybertraps for Educators: The Professional Perils of 24/7 Communic...
2014-08-27 Cybertraps for Educators: The Professional Perils of 24/7 Communic...2014-08-27 Cybertraps for Educators: The Professional Perils of 24/7 Communic...
2014-08-27 Cybertraps for Educators: The Professional Perils of 24/7 Communic...Frederick Lane
 
Introduction to digital scholarship tools
Introduction to digital scholarship toolsIntroduction to digital scholarship tools
Introduction to digital scholarship toolslibrarianrafia
 
Legal Research in the Age of Cloud Computing
Legal Research in the Age of Cloud ComputingLegal Research in the Age of Cloud Computing
Legal Research in the Age of Cloud ComputingNeal Axton
 

What's hot (20)

Text analysis-semantic-search
Text analysis-semantic-searchText analysis-semantic-search
Text analysis-semantic-search
 
Adding value to NLP: a little semantics goes a long way
Adding value to NLP: a little semantics goes a long wayAdding value to NLP: a little semantics goes a long way
Adding value to NLP: a little semantics goes a long way
 
DMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data MiningDMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data Mining
 
Online text data for machine learning, data science, and research - Who can p...
Online text data for machine learning, data science, and research - Who can p...Online text data for machine learning, data science, and research - Who can p...
Online text data for machine learning, data science, and research - Who can p...
 
Filth and lies: analysing social media
Filth and lies: analysing social mediaFilth and lies: analysing social media
Filth and lies: analysing social media
 
Lessons Learned from Lod Failure and Big Data : The Future Trend
Lessons Learned from Lod Failure and Big Data : The Future Trend Lessons Learned from Lod Failure and Big Data : The Future Trend
Lessons Learned from Lod Failure and Big Data : The Future Trend
 
Capacity Building: Data Science in the University At Rensselaer Polytechnic ...
Capacity Building: Data Science in the University  At Rensselaer Polytechnic ...Capacity Building: Data Science in the University  At Rensselaer Polytechnic ...
Capacity Building: Data Science in the University At Rensselaer Polytechnic ...
 
The language of social media
The language of social mediaThe language of social media
The language of social media
 
2013-04-02 Cybertraps for Educators
2013-04-02 Cybertraps for Educators2013-04-02 Cybertraps for Educators
2013-04-02 Cybertraps for Educators
 
Cybertraps for Educators
Cybertraps for EducatorsCybertraps for Educators
Cybertraps for Educators
 
DMTM 2015 - 01 Course Introduction
DMTM 2015 - 01 Course IntroductionDMTM 2015 - 01 Course Introduction
DMTM 2015 - 01 Course Introduction
 
Getting Started in Data Science
Getting Started in Data ScienceGetting Started in Data Science
Getting Started in Data Science
 
Artificial Intelligence and the Coming Revolution of Family History - Present...
Artificial Intelligence and the Coming Revolution of Family History - Present...Artificial Intelligence and the Coming Revolution of Family History - Present...
Artificial Intelligence and the Coming Revolution of Family History - Present...
 
Using language to save the world: interactions between society, behaviour and...
Using language to save the world: interactions between society, behaviour and...Using language to save the world: interactions between society, behaviour and...
Using language to save the world: interactions between society, behaviour and...
 
HathiTrust--a GovDocs Repository?
HathiTrust--a GovDocs Repository?HathiTrust--a GovDocs Repository?
HathiTrust--a GovDocs Repository?
 
Shared Data & Big Data for Libraries
Shared Data & Big Data for LibrariesShared Data & Big Data for Libraries
Shared Data & Big Data for Libraries
 
2014-08-27 Cybertraps for Educators: The Professional Perils of 24/7 Communic...
2014-08-27 Cybertraps for Educators: The Professional Perils of 24/7 Communic...2014-08-27 Cybertraps for Educators: The Professional Perils of 24/7 Communic...
2014-08-27 Cybertraps for Educators: The Professional Perils of 24/7 Communic...
 
Introduction to digital scholarship tools
Introduction to digital scholarship toolsIntroduction to digital scholarship tools
Introduction to digital scholarship tools
 
Library project ethnographic presentation
Library project ethnographic presentationLibrary project ethnographic presentation
Library project ethnographic presentation
 
Legal Research in the Age of Cloud Computing
Legal Research in the Age of Cloud ComputingLegal Research in the Age of Cloud Computing
Legal Research in the Age of Cloud Computing
 

Similar to The Coming Explosion of Records at FamilySearch Syllabus

Beyond document retrieval using semantic annotations
Beyond document retrieval using semantic annotations Beyond document retrieval using semantic annotations
Beyond document retrieval using semantic annotations Roi Blanco
 
What's the fuss about all this metadata?
What's the fuss about all this metadata?What's the fuss about all this metadata?
What's the fuss about all this metadata?Sara Sterkenburg
 
Enterprise Search Share Point2009 Best Practices Final
Enterprise Search Share Point2009 Best Practices FinalEnterprise Search Share Point2009 Best Practices Final
Enterprise Search Share Point2009 Best Practices FinalMarianne Sweeny
 
Building Corpora from Social Media
Building Corpora from Social MediaBuilding Corpora from Social Media
Building Corpora from Social MediaRichard Littauer
 
Leslie Johnston: Library Big Data Repository Services, Open Repositories 2012
Leslie Johnston: Library Big Data Repository Services, Open Repositories 2012Leslie Johnston: Library Big Data Repository Services, Open Repositories 2012
Leslie Johnston: Library Big Data Repository Services, Open Repositories 2012lljohnston
 
Opinion mining for social media
Opinion mining for social mediaOpinion mining for social media
Opinion mining for social mediaDiana Maynard
 
Spanish 3221
Spanish 3221Spanish 3221
Spanish 3221k-baril
 
Web Content Strategy for Libraries
Web Content Strategy for LibrariesWeb Content Strategy for Libraries
Web Content Strategy for LibrariesChris Evjy
 
Introduction to NLP.pptx
Introduction to NLP.pptxIntroduction to NLP.pptx
Introduction to NLP.pptxbuivantan_uneti
 
From MARC to LOD: preparing Wellcome Library metadata for discovery on the We...
From MARC to LOD: preparing Wellcome Library metadata for discovery on the We...From MARC to LOD: preparing Wellcome Library metadata for discovery on the We...
From MARC to LOD: preparing Wellcome Library metadata for discovery on the We...CILIP MDG
 
Digital Tools, Trends and Methodologies in the Humanities and Social Sciences
Digital Tools, Trends and Methodologies in the Humanities and Social SciencesDigital Tools, Trends and Methodologies in the Humanities and Social Sciences
Digital Tools, Trends and Methodologies in the Humanities and Social SciencesShawn Day
 
Leslie Johnston Keynote, Best Practices Exchange 2011
Leslie Johnston Keynote, Best Practices Exchange 2011Leslie Johnston Keynote, Best Practices Exchange 2011
Leslie Johnston Keynote, Best Practices Exchange 2011lljohnston
 
Webinar: Getting Started with Digitization An Introduction for Libraries-2016...
Webinar: Getting Started with Digitization An Introduction for Libraries-2016...Webinar: Getting Started with Digitization An Introduction for Libraries-2016...
Webinar: Getting Started with Digitization An Introduction for Libraries-2016...TechSoup
 
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...hajinouha0
 
Introduction to nlp
Introduction to nlpIntroduction to nlp
Introduction to nlpAmaan Shaikh
 
Introduction to Information Retrieval
Introduction to Information RetrievalIntroduction to Information Retrieval
Introduction to Information RetrievalCarsten Eickhoff
 
05. EDT 513 Week 5 2023 Searching the Internet.pptx
05. EDT 513 Week 5 2023 Searching the Internet.pptx05. EDT 513 Week 5 2023 Searching the Internet.pptx
05. EDT 513 Week 5 2023 Searching the Internet.pptxGambari Amosa Isiaka
 
Calendars, Chronologies, and Consumer Resources
Calendars, Chronologies, and Consumer ResourcesCalendars, Chronologies, and Consumer Resources
Calendars, Chronologies, and Consumer Resourcesryannoble_1
 

Similar to The Coming Explosion of Records at FamilySearch Syllabus (20)

Beyond document retrieval using semantic annotations
Beyond document retrieval using semantic annotations Beyond document retrieval using semantic annotations
Beyond document retrieval using semantic annotations
 
What's the fuss about all this metadata?
What's the fuss about all this metadata?What's the fuss about all this metadata?
What's the fuss about all this metadata?
 
Enterprise Search Share Point2009 Best Practices Final
Enterprise Search Share Point2009 Best Practices FinalEnterprise Search Share Point2009 Best Practices Final
Enterprise Search Share Point2009 Best Practices Final
 
Building Corpora from Social Media
Building Corpora from Social MediaBuilding Corpora from Social Media
Building Corpora from Social Media
 
Leslie Johnston: Library Big Data Repository Services, Open Repositories 2012
Leslie Johnston: Library Big Data Repository Services, Open Repositories 2012Leslie Johnston: Library Big Data Repository Services, Open Repositories 2012
Leslie Johnston: Library Big Data Repository Services, Open Repositories 2012
 
Opinion mining for social media
Opinion mining for social mediaOpinion mining for social media
Opinion mining for social media
 
Spanish 3221
Spanish 3221Spanish 3221
Spanish 3221
 
Web Content Strategy for Libraries
Web Content Strategy for LibrariesWeb Content Strategy for Libraries
Web Content Strategy for Libraries
 
Introduction to NLP.pptx
Introduction to NLP.pptxIntroduction to NLP.pptx
Introduction to NLP.pptx
 
From MARC to LOD: preparing Wellcome Library metadata for discovery on the We...
From MARC to LOD: preparing Wellcome Library metadata for discovery on the We...From MARC to LOD: preparing Wellcome Library metadata for discovery on the We...
From MARC to LOD: preparing Wellcome Library metadata for discovery on the We...
 
Digital Tools, Trends and Methodologies in the Humanities and Social Sciences
Digital Tools, Trends and Methodologies in the Humanities and Social SciencesDigital Tools, Trends and Methodologies in the Humanities and Social Sciences
Digital Tools, Trends and Methodologies in the Humanities and Social Sciences
 
Leslie Johnston Keynote, Best Practices Exchange 2011
Leslie Johnston Keynote, Best Practices Exchange 2011Leslie Johnston Keynote, Best Practices Exchange 2011
Leslie Johnston Keynote, Best Practices Exchange 2011
 
Webinar: Getting Started with Digitization An Introduction for Libraries-2016...
Webinar: Getting Started with Digitization An Introduction for Libraries-2016...Webinar: Getting Started with Digitization An Introduction for Libraries-2016...
Webinar: Getting Started with Digitization An Introduction for Libraries-2016...
 
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
 
Cataloging Presentation
Cataloging PresentationCataloging Presentation
Cataloging Presentation
 
Introduction to nlp
Introduction to nlpIntroduction to nlp
Introduction to nlp
 
Introduction to Information Retrieval
Introduction to Information RetrievalIntroduction to Information Retrieval
Introduction to Information Retrieval
 
05. EDT 513 Week 5 2023 Searching the Internet.pptx
05. EDT 513 Week 5 2023 Searching the Internet.pptx05. EDT 513 Week 5 2023 Searching the Internet.pptx
05. EDT 513 Week 5 2023 Searching the Internet.pptx
 
Calendars, Chronologies, and Consumer Resources
Calendars, Chronologies, and Consumer ResourcesCalendars, Chronologies, and Consumer Resources
Calendars, Chronologies, and Consumer Resources
 
Content strategy past, present, future
Content strategy past, present, futureContent strategy past, present, future
Content strategy past, present, future
 

More from bakers84

Civil Registration Records in Latin America and Spain - Presentation
Civil Registration Records in Latin America and Spain - PresentationCivil Registration Records in Latin America and Spain - Presentation
Civil Registration Records in Latin America and Spain - Presentationbakers84
 
Civil Registration Records in Latin America and Spain - Handout
Civil Registration Records in Latin America and Spain - HandoutCivil Registration Records in Latin America and Spain - Handout
Civil Registration Records in Latin America and Spain - Handoutbakers84
 
Finding Relatives in Spanish Church Records
Finding Relatives in Spanish Church RecordsFinding Relatives in Spanish Church Records
Finding Relatives in Spanish Church Recordsbakers84
 
Leveraging the Consultant Planner - Presentation
Leveraging the Consultant Planner - PresentationLeveraging the Consultant Planner - Presentation
Leveraging the Consultant Planner - Presentationbakers84
 
Leveraging the Consultant Planner Syllabus
Leveraging the Consultant Planner SyllabusLeveraging the Consultant Planner Syllabus
Leveraging the Consultant Planner Syllabusbakers84
 
A Peek Under the Hood at FamilySearch Syllabus
A Peek Under the Hood at FamilySearch SyllabusA Peek Under the Hood at FamilySearch Syllabus
A Peek Under the Hood at FamilySearch Syllabusbakers84
 
Meaningful Family History in an Hour Syllabus
Meaningful Family History in an Hour SyllabusMeaningful Family History in an Hour Syllabus
Meaningful Family History in an Hour Syllabusbakers84
 
Meaningful Family History In an Hour - Presentation
Meaningful Family History In an Hour - PresentationMeaningful Family History In an Hour - Presentation
Meaningful Family History In an Hour - Presentationbakers84
 
Viewing Closest Relatives in the My Relatives View Paper
Viewing Closest Relatives in the My Relatives View PaperViewing Closest Relatives in the My Relatives View Paper
Viewing Closest Relatives in the My Relatives View Paperbakers84
 
Viewing Closest Relatives in the My Relatives View Poster
Viewing Closest Relatives in the My Relatives View PosterViewing Closest Relatives in the My Relatives View Poster
Viewing Closest Relatives in the My Relatives View Posterbakers84
 
Start and Grow Your Family Tree on FamilySearch.org - Presentation
Start and Grow Your Family Tree on FamilySearch.org - PresentationStart and Grow Your Family Tree on FamilySearch.org - Presentation
Start and Grow Your Family Tree on FamilySearch.org - Presentationbakers84
 
Help! My Family Is All Messed Up on FamilySearch Family Tree!
Help! My Family Is All Messed Up on FamilySearch Family Tree!Help! My Family Is All Messed Up on FamilySearch Family Tree!
Help! My Family Is All Messed Up on FamilySearch Family Tree!bakers84
 
FamilySearch Family Tree Essentials - Find, Take, Teach Webinar
FamilySearch Family Tree Essentials - Find, Take, Teach WebinarFamilySearch Family Tree Essentials - Find, Take, Teach Webinar
FamilySearch Family Tree Essentials - Find, Take, Teach Webinarbakers84
 
What I Wish Everyone in the LDS Church Knew About Family History
What I Wish Everyone in the LDS Church Knew About Family HistoryWhat I Wish Everyone in the LDS Church Knew About Family History
What I Wish Everyone in the LDS Church Knew About Family Historybakers84
 
FamilySearch Insider Tips and Tricks - Syllabus
FamilySearch Insider Tips and Tricks - SyllabusFamilySearch Insider Tips and Tricks - Syllabus
FamilySearch Insider Tips and Tricks - Syllabusbakers84
 
FamilySearch Insider Tips and Tricks - Presentation
FamilySearch Insider Tips and Tricks - PresentationFamilySearch Insider Tips and Tricks - Presentation
FamilySearch Insider Tips and Tricks - Presentationbakers84
 
Finding 'My Tree' Within FamilySearch Family Tree's 'Our Tree'
Finding 'My Tree' Within FamilySearch Family Tree's 'Our Tree'Finding 'My Tree' Within FamilySearch Family Tree's 'Our Tree'
Finding 'My Tree' Within FamilySearch Family Tree's 'Our Tree'bakers84
 
A Whirlwind Tour of FamilySearch Resources - 2013 Presentation
A Whirlwind Tour of FamilySearch Resources - 2013 PresentationA Whirlwind Tour of FamilySearch Resources - 2013 Presentation
A Whirlwind Tour of FamilySearch Resources - 2013 Presentationbakers84
 
Merging People in FamilySearch Family Tree - Presentation
Merging People in FamilySearch Family Tree - PresentationMerging People in FamilySearch Family Tree - Presentation
Merging People in FamilySearch Family Tree - Presentationbakers84
 
A Whirlwind Tour of FamilySearch Resources - 2013 URL List
A Whirlwind Tour of FamilySearch Resources - 2013 URL ListA Whirlwind Tour of FamilySearch Resources - 2013 URL List
A Whirlwind Tour of FamilySearch Resources - 2013 URL Listbakers84
 

More from bakers84 (20)

Civil Registration Records in Latin America and Spain - Presentation
Civil Registration Records in Latin America and Spain - PresentationCivil Registration Records in Latin America and Spain - Presentation
Civil Registration Records in Latin America and Spain - Presentation
 
Civil Registration Records in Latin America and Spain - Handout
Civil Registration Records in Latin America and Spain - HandoutCivil Registration Records in Latin America and Spain - Handout
Civil Registration Records in Latin America and Spain - Handout
 
Finding Relatives in Spanish Church Records
Finding Relatives in Spanish Church RecordsFinding Relatives in Spanish Church Records
Finding Relatives in Spanish Church Records
 
Leveraging the Consultant Planner - Presentation
Leveraging the Consultant Planner - PresentationLeveraging the Consultant Planner - Presentation
Leveraging the Consultant Planner - Presentation
 
Leveraging the Consultant Planner Syllabus
Leveraging the Consultant Planner SyllabusLeveraging the Consultant Planner Syllabus
Leveraging the Consultant Planner Syllabus
 
A Peek Under the Hood at FamilySearch Syllabus
A Peek Under the Hood at FamilySearch SyllabusA Peek Under the Hood at FamilySearch Syllabus
A Peek Under the Hood at FamilySearch Syllabus
 
Meaningful Family History in an Hour Syllabus
Meaningful Family History in an Hour SyllabusMeaningful Family History in an Hour Syllabus
Meaningful Family History in an Hour Syllabus
 
Meaningful Family History In an Hour - Presentation
Meaningful Family History In an Hour - PresentationMeaningful Family History In an Hour - Presentation
Meaningful Family History In an Hour - Presentation
 
Viewing Closest Relatives in the My Relatives View Paper
Viewing Closest Relatives in the My Relatives View PaperViewing Closest Relatives in the My Relatives View Paper
Viewing Closest Relatives in the My Relatives View Paper
 
Viewing Closest Relatives in the My Relatives View Poster
Viewing Closest Relatives in the My Relatives View PosterViewing Closest Relatives in the My Relatives View Poster
Viewing Closest Relatives in the My Relatives View Poster
 
Start and Grow Your Family Tree on FamilySearch.org - Presentation
Start and Grow Your Family Tree on FamilySearch.org - PresentationStart and Grow Your Family Tree on FamilySearch.org - Presentation
Start and Grow Your Family Tree on FamilySearch.org - Presentation
 
Help! My Family Is All Messed Up on FamilySearch Family Tree!
Help! My Family Is All Messed Up on FamilySearch Family Tree!Help! My Family Is All Messed Up on FamilySearch Family Tree!
Help! My Family Is All Messed Up on FamilySearch Family Tree!
 
FamilySearch Family Tree Essentials - Find, Take, Teach Webinar
FamilySearch Family Tree Essentials - Find, Take, Teach WebinarFamilySearch Family Tree Essentials - Find, Take, Teach Webinar
FamilySearch Family Tree Essentials - Find, Take, Teach Webinar
 
What I Wish Everyone in the LDS Church Knew About Family History
What I Wish Everyone in the LDS Church Knew About Family HistoryWhat I Wish Everyone in the LDS Church Knew About Family History
What I Wish Everyone in the LDS Church Knew About Family History
 
FamilySearch Insider Tips and Tricks - Syllabus
FamilySearch Insider Tips and Tricks - SyllabusFamilySearch Insider Tips and Tricks - Syllabus
FamilySearch Insider Tips and Tricks - Syllabus
 
FamilySearch Insider Tips and Tricks - Presentation
FamilySearch Insider Tips and Tricks - PresentationFamilySearch Insider Tips and Tricks - Presentation
FamilySearch Insider Tips and Tricks - Presentation
 
Finding 'My Tree' Within FamilySearch Family Tree's 'Our Tree'
Finding 'My Tree' Within FamilySearch Family Tree's 'Our Tree'Finding 'My Tree' Within FamilySearch Family Tree's 'Our Tree'
Finding 'My Tree' Within FamilySearch Family Tree's 'Our Tree'
 
A Whirlwind Tour of FamilySearch Resources - 2013 Presentation
A Whirlwind Tour of FamilySearch Resources - 2013 PresentationA Whirlwind Tour of FamilySearch Resources - 2013 Presentation
A Whirlwind Tour of FamilySearch Resources - 2013 Presentation
 
Merging People in FamilySearch Family Tree - Presentation
Merging People in FamilySearch Family Tree - PresentationMerging People in FamilySearch Family Tree - Presentation
Merging People in FamilySearch Family Tree - Presentation
 
A Whirlwind Tour of FamilySearch Resources - 2013 URL List
A Whirlwind Tour of FamilySearch Resources - 2013 URL ListA Whirlwind Tour of FamilySearch Resources - 2013 URL List
A Whirlwind Tour of FamilySearch Resources - 2013 URL List
 
