© 2013 by Intellectual Reserve, Inc. All rights reserved.
The Coming Explosion of Records at FamilySearch
BYU Conference on Family History and Genealogy
July 31, 2018
Ben Baker
bakerb@familysearch.org
Background
• Over 8½ years as a Software Engineer at FamilySearch
• Currently on the Automated Content Extraction team
• Try to do my own genealogy and help others
• Hope I’ll be able to help you see a vision of the future
• Go to https://www.slideshare.net/bakers84 or e-mail me (bakerb@familysearch.org) to get a copy of this presentation
• Related printed handout materials are also available
First, Some Basics
Good News
• FamilySearch published its 2 billionth image in April 2018
• The 1 billionth image was published in June 2014
• FamilySearch continues to digitize nearly 1M images per day from microfilm and about 320 cameras worldwide
• FamilySearch has nearly 6.4B indexed names of people in records
• Record hinting has already made FamilySearch Family Tree the best-sourced tree in the world, with over 1B sources attached to persons in the tree
Bad News
• Many records are available only as images via the catalog; only a fraction of records have been indexed
• Indexing isn’t keeping up with the ability to digitize images, especially in non-English languages
• Currently available record images do not match the distribution of Church membership in some areas
• Only indexed records can be presented as record hints
[Chart: Historical Record Images by Region at FamilySearch (North America, Europe and Middle East, Latin America, Other Asia, Africa/Pacific)]
[Chart: LDS Church Membership by Region (same regions)]
Changing the Records Publication Paradigm
• Several teams at FamilySearch are dedicated to improving the records publication platform
• The goal: provide more findable, relevant, curated records for gathering multi-generational families from around the world
• Aim to publish and make hintable 20% of the top-tier records in the 50 highest-priority countries within 15 years
• 58% coverage in North America as of 2017
• Crossed 20% in 3 more countries in 2017 (Denmark, Finland, and Sweden)
• Major release of Mexican records in 2018
• Seeking to let homelands be more involved in building local content
• Will support on-the-fly user corrections to records and indexing
• Will use automated technologies to accelerate publication
[Photo: International Conference on Document Analysis and Recognition (ICDAR) 2011, Beijing Friendship Hotel]
First Mini-Explosion
• Partnership with GenealogyBank to extract data from born-digital obituaries
• First run indexed 5M obituaries in 10 hours, saving about 150 man-years of indexing
• 23M obituaries indexed as of May 2018, with many more coming
• Uses recent advances in machine learning and artificial intelligence (AI)
• Can produce even more information than indexing (e.g., in-law couple relationships)
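The savings figure above can be sanity-checked with simple arithmetic. This sketch uses only the numbers on the slide (5M obituaries, 10 hours, about 150 man-years) plus one assumption that is not in the presentation: a man-year of indexing is taken to be 2,000 working hours (`HOURS_PER_MAN_YEAR`). Under that assumption, the implied manual pace is roughly 17 obituaries per hour, versus about 500,000 per hour for the automated run.

```python
# Back-of-the-envelope check of the first GenealogyBank obituary run.
# Slide figures: 5M obituaries, 10 hours of machine time, ~150 man-years saved.
OBITUARIES = 5_000_000
MACHINE_HOURS = 10
MAN_YEARS_SAVED = 150
# ASSUMPTION (not from the presentation): one man-year of indexing = 2,000 hours.
HOURS_PER_MAN_YEAR = 2_000


def throughput_summary() -> dict:
    """Return machine vs. implied human indexing rates."""
    machine_per_hour = OBITUARIES / MACHINE_HOURS
    human_hours = MAN_YEARS_SAVED * HOURS_PER_MAN_YEAR
    human_per_hour = OBITUARIES / human_hours
    return {
        "machine_obituaries_per_hour": machine_per_hour,
        "implied_human_obituaries_per_hour": human_per_hour,
        "implied_minutes_per_obituary": 60 / human_per_hour,
        "speedup": machine_per_hour / human_per_hour,
    }


if __name__ == "__main__":
    for name, value in throughput_summary().items():
        print(f"{name}: {value:,.1f}")
```

Changing the assumed hours per man-year changes the implied human rate proportionally, but the speedup stays in the tens of thousands for any plausible value.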
Video 1
The GenealogyBank collection of obituaries is available now
Improvements for correcting data are coming
What is Being Done Now
• Refining research code and models to be more stable, reproducible and measurable
• Able to publish 1M obituaries a month now, with capacity continuing to increase
• Built on scalable Amazon Web Services (AWS) infrastructure to meet future demand
How are Artificial Intelligence, Machine Learning and Deep Learning Related?
Artificial Intelligence – Machines exhibiting human intelligence
• General AI – still science fiction
• Narrow AI – technologies that perform specific tasks as well as or better than humans
Machine Learning – The practice of using algorithms to parse data, learn from it, and then make a determination or prediction about something in the world
Deep Learning – Machine learning with much larger neural networks, requiring more training data and computational power
[Diagram: nested circles, with Deep Learning inside Machine Learning inside Artificial Intelligence]
Machine Learning Isn’t Really New
• Been around for decades
• Spam filters in the 1990s
• OCR (Optical Character Recognition)
• FamilySearch already uses it for some things
• Match classifier
• Possible duplicates (person – person)
• Record hinting (person – record)
• FamilySearch is beginning to explore new uses
• Research Team -> Automated Content Extraction
• Exploring Deep Learning and other methods to automatically
understand historical documents
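To make the "match classifier" idea concrete, here is a minimal sketch of how a person-to-record match score could be computed. It is not FamilySearch’s implementation: the `PersonSummary` type, the three features, and the `WEIGHTS`/`BIAS` values are all hypothetical. A real classifier would learn its weights from many pairs already judged as matches or non-matches; here they are hand-set for illustration, and the weighted sum is passed through a logistic (sigmoid) function so the score reads as a probability.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class PersonSummary:
    """Hypothetical, minimal view of a tree person or an indexed record."""
    given: str
    surname: str
    birth_year: Optional[int] = None


# Hypothetical weights and bias. A real match classifier learns these from
# many examples that people have already judged as match / not a match.
WEIGHTS = (2.0, 3.0, 2.5)  # given name, surname, birth year
BIAS = -5.0


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (difflib's ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def year_closeness(a: Optional[int], b: Optional[int]) -> float:
    """1.0 for the same year, falling linearly to 0.0 at 10+ years apart."""
    if a is None or b is None:
        return 0.0
    return 1.0 - min(abs(a - b), 10) / 10


def features(person: PersonSummary, record: PersonSummary) -> tuple:
    return (
        name_similarity(person.given, record.given),
        name_similarity(person.surname, record.surname),
        year_closeness(person.birth_year, record.birth_year),
    )


def match_probability(person: PersonSummary, record: PersonSummary) -> float:
    """Logistic score: sigmoid(BIAS + sum of weight * feature)."""
    z = BIAS + sum(w * f for w, f in zip(WEIGHTS, features(person, record)))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    tree_person = PersonSummary("Mary", "Jensen", 1890)
    candidates = [
        PersonSummary("Mary", "Jensen", 1890),
        PersonSummary("Marie", "Jenson", None),
        PersonSummary("John", "Smith", 1850),
    ]
    for record in candidates:
        print(record, round(match_probability(tree_person, record), 3))
```

A hinting system built this way could show a hint only when the score clears some threshold (for example 0.8, an arbitrary choice here); the same scoring idea applies to person-to-person possible-duplicate checks.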
How is Machine Learning Different from Traditional Programming?
Machine learning lets computers learn from data instead of relying on people to write rules (i.e., code) to solve problems
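Traditional approach: Study the Problem -> Write Rules -> Evaluate -> Launch! (if evaluation falls short: Analyze Errors -> Study the Problem again)
Machine learning approach: Study the Problem -> Train ML Algorithm (on Data) -> Evaluate -> Launch! (if evaluation falls short: Analyze Errors -> Study the Problem again)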
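The two workflows above can be contrasted in a few lines of Python. This is a toy sketch, not anything FamilySearch uses: `is_obituary_rule` is a hand-written rule, while `train_naive_bayes` and `predict` learn word statistics from a tiny, made-up labeled set (`TRAINING`) using multinomial naive Bayes with add-one smoothing, the same family of technique used by many early spam filters. Improving the learned approach means adding labeled data rather than rewriting rules.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


# Traditional programming: a person studies the problem and writes the rule.
def is_obituary_rule(text: str) -> bool:
    words = tokenize(text)
    return "died" in words or "funeral" in words


# Machine learning: the "rule" is estimated from labeled examples.
# Hypothetical, tiny training set: (text, is_obituary).
TRAINING = [
    ("John Smith passed away peacefully, survived by his wife", True),
    ("Services will be held Friday; she is survived by two sons", True),
    ("Mary Jones passed away at home, services at the chapel", True),
    ("The city council voted on the new bridge budget", False),
    ("Local team wins the county championship game", False),
    ("Farmers market opens Saturday on Main Street", False),
]


def train_naive_bayes(examples):
    """Count words per label and documents per label (multinomial naive Bayes)."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in examples:
        word_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    vocabulary = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocabulary


def predict(model, text: str) -> bool:
    """Pick the label with the highest log-probability."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = False, -math.inf
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # prior
        for word in tokenize(text):
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train_naive_bayes(TRAINING)
    snippet = "Ellen Brown passed away Tuesday and is survived by a daughter"
    print("rule:", is_obituary_rule(snippet))
    print("learned:", predict(model, snippet))
```

On the snippet "Ellen Brown passed away Tuesday and is survived by a daughter", the hand-written rule misses the obituary because neither "died" nor "funeral" appears, while the trained model labels it an obituary based on words such as "passed", "survived", and "by".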
Necessary Technologies
• Natural Language Processing (NLP)
• Named entity recognition (NER) – identify the names, dates, places, etc.
• Relation extraction – identify relationships between the names, dates & places
• Additional processing to put the data into a publication format, standardize it, etc.
• Notice the steps are similar to what a genealogist would do
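The NER and relation-extraction steps can be illustrated on a single made-up obituary. This sketch uses hand-written regular expressions purely to show the shape of the output (entities first, then relationships between them); machine-learning systems like those described in this talk rely on trained models rather than regexes. The sample text, the patterns (`DATE_RE`, `DECEDENT_RE`, `RELATIVE_RE`), and the output field names are all hypothetical, and only relationship phrases of the form "her son, Robert Jensen" are recognized.

```python
import re

# Hypothetical sample obituary (not a real record).
SAMPLE = (
    "Mary Ellen Jensen, 84, of Provo, died March 3, 2018. "
    "She is survived by her son, Robert Jensen, and her daughter, Susan Clark."
)

MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"
# NER (toy): full dates such as "March 3, 2018".
DATE_RE = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")
# NER (toy): an opening "Name, age, of Place" phrase identifies the decedent.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)+"
DECEDENT_RE = re.compile(rf"^({NAME}), (\d{{1,3}}), of ([A-Z][a-z]+)")
# Relation extraction (toy): "his/her <relation>, <Name>".
RELATIVE_RE = re.compile(rf"\b(?:his|her) (son|daughter|wife|husband|brother|sister), ({NAME})")


def extract(obituary: str) -> dict:
    """Return toy NER entities and (subject, relation, object) triples."""
    entities = {"dates": DATE_RE.findall(obituary)}
    match = DECEDENT_RE.search(obituary)
    decedent = match.group(1) if match else None
    if match:
        entities.update(decedent=decedent, age=int(match.group(2)), residence=match.group(3))
    relations = [(decedent, relation, name) for relation, name in RELATIVE_RE.findall(obituary)]
    return {"entities": entities, "relations": relations}


if __name__ == "__main__":
    result = extract(SAMPLE)
    print(result["entities"])
    for triple in result["relations"]:
        print(triple)
```

The (subject, relation, object) triples are the kind of output that later "additional processing" steps would standardize and map into a publishable record.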
Identification and Extraction of Data
Live Demos
Lille E. Yeckley 1915-1980
Document Type | Record Type | Language | Status in May 2018
Digital text | Obituaries | English | Already published 23M; working to continuously publish
Typewritten newspaper text | Obituaries | English | Active research
Handwritten text | Wills and deeds | English | Active research
Handwritten calligraphy | Genealogies | Chinese | Preliminary research
Handwritten text | Church records | Spanish | Preliminary research
[Diagram: growth along three axes: more document types, more record types, more languages]
Expect future “explosions”
Video 2
What You Can Do
• Keep indexing
• It is still valuable, especially in non-English languages
• Remember that indexed data is the foundation for training machines to auto-index correctly
• We’ll likely continue to use human indexing to measure how well the machines are doing
• Understand your role in correcting records that have been automatically indexed incorrectly
• Be patient as solutions expand, perhaps first to collections that don’t benefit your research, remembering we are a global church
• Pray for the Lord’s help to bless these efforts
[Diagram: an infinity loop linking Automated Technologies, Truth / Training Data, Indexing, and User Corrections]
“We always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next ten.”
Bill Gates
Tale of Three Decades
1998-2007 – Laying the technological foundation
1996 – GEDCOM 5.5 standard released (still supported)
1999 – PAF 4.0 – First Windows version
2002 – PAF 5.2 – Last major version
2004 – First vault microfilms converted to digital images
2007 – First digital images from the vault published on FamilySearch.org
2008-2017 – Single publicly available tree integrated with historical records
2010 – Launch of FamilySearch record search (>1B names, millions of images)
2006 – FamilySearch indexing began
2007 – FamilySearch Research wiki started
2009 – new.familysearch became available in Utah (limited rollout began in 2007)
2009 – I began to work at FamilySearch
2011 – RootsTech conference began
2013 – Family Tree added – made available to non-LDS patrons
2013 – Memories (photos & stories) initial rollout
2014 – Partnerships with Ancestry, MyHeritage and FindMyPast
2014 – Record hinting
2014 – First FamilySearch mobile app released
2015 – User-to-user messaging
2015 – Printing temple cards from home in 44 languages
2016 – Family Tree moved to scalable servers
2017 – Web indexing
2018 – Family Tree Lite
2018-2027 – Worldwide explosion of records
2017 – Nordic Records – Year of the Viking: Scandinavian records (Sweden, Denmark, Finland), the first 3 of the top 50 countries
2018 – Mexican Civil Records project – 60M records
???? – Billions more indexed records made available via automatic indexing technologies
???? – User corrections of records supported
???? – DNA Features?
???? – ????
Thank you!
I hope you’ve been inspired
Keep an eye out for more explosions

Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 

Recently uploaded (20)

Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 

The Coming Explosion of Records at FamilySearch - Presentation

1. The Coming Explosion of Records at FamilySearch
BYU Conference on Family History and Genealogy
July 31, 2018
Ben Baker
bakerb@familysearch.org
© 2013 by Intellectual Reserve, Inc. All rights reserved.

2. Background
• Over 8½ years as a software engineer at FamilySearch
• Currently on the Automated Content Extraction team
• Try to do my own genealogy and help others
• Hope I’ll be able to help you see a vision of the future
• Go to https://www.slideshare.net/bakers84 or e-mail me (bakerb@familysearch.org) to get a copy of this presentation
• Click here for the related printed handout materials

3. First, Some Basics
Good News
• FamilySearch published its 2 billionth image in April 2018
• The 1 billionth image was published in June 2014
• FamilySearch continues to digitize nearly 1M images per day from microfilm and about 320 cameras worldwide
• FamilySearch has nearly 6.4B indexed names of people in records
• Record hinting has already made FamilySearch Family Tree the best-sourced tree in the world, with over 1B sources attached to persons in the tree
Bad News
• Many records are available only as images via the catalog; only a fraction of records have been indexed
• Indexing isn’t keeping up with the ability to digitize images, especially in non-English languages
• The record images currently available do not match Church membership in some areas
• Only indexed records can be presented as record hints

4. [Charts: Historical Records Images by Region at FamilySearch; LDS Church Membership by Region. Regions: North America, Europe and Middle East, Latin America, Other Asia, Africa/Pacific]

5. Changing the Records Publication Paradigm
• Several teams at FamilySearch are dedicated to improving the records publication platform
• The Goal: Provide more findable, relevant, curated records for gathering multi-generational families from around the world
• Want to publish and make hintable 20% of the top-tier records in 50 of the highest-priority countries within 15 years
  • 58% coverage in North America as of 2017
  • Crossed 20% in 3 more countries in 2017 (Denmark, Finland and Sweden)
  • Major release of Mexican records in 2018
• Seeking to allow homelands to be more involved in building local content
• Will support user corrections to records and indexing on the fly
• Will use automated technologies to accelerate publication

6. [Photo: International Conference on Document Analysis and Recognition (ICDAR) 2011, Beijing Friendship Hotel]

7. First Mini-Explosion
• Partnership with GenealogyBank to extract data from born-digital obituaries
• First run indexed 5M obituaries in 10 hours, saving about 150 man-years of indexing
• 23M obituaries indexed as of May 2018, with many more coming
• Uses recent advancements in machine learning and artificial intelligence (AI)
• Can produce even more information than indexing (e.g., in-law couple relationships)

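As a rough check on the first run's numbers, the arithmetic below converts them into throughput rates. It assumes a man-year of 2,000 working hours (40 hours a week for 50 weeks). That figure is our assumption, not one given in the talk, and changing it changes the implied human rate proportionally.

```python
# Back-of-the-envelope check of the first GenealogyBank run.
# ASSUMPTION: one "man-year" = 2,000 working hours (40 h/week x 50 weeks).
HOURS_PER_MAN_YEAR = 2_000

obituaries = 5_000_000   # obituaries indexed in the first run (from the slide)
machine_hours = 10       # duration of the run (from the slide)
man_years_saved = 150    # human effort saved (from the slide)

# Machine throughput
per_hour_machine = obituaries / machine_hours         # 500,000 per hour
per_second_machine = per_hour_machine / 3600          # about 139 per second

# Implied human throughput under the assumed working year
human_hours = man_years_saved * HOURS_PER_MAN_YEAR    # 300,000 hours
per_hour_human = obituaries / human_hours             # about 16.7 per hour
minutes_per_obit = 60 / per_hour_human                # about 3.6 minutes each

print(f"machine: {per_hour_machine:,.0f}/hour ({per_second_machine:.0f}/second)")
print(f"human:   {per_hour_human:.1f}/hour ({minutes_per_obit:.1f} minutes per obituary)")
```

Under this assumption, 150 man-years works out to a human indexer spending roughly three and a half minutes per obituary.
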
9. The GenealogyBank collection of obituaries is available now
Improvements to data correction are coming

10. What Is Being Done Now
• Refining research code and models to be more stable, reproducible and measurable
• Supporting the ability to publish 1M obituaries a month now, and continuing to increase
• Built on scalable Amazon Web Services to meet any future demands

11. How Are Artificial Intelligence, Machine Learning and Deep Learning Related?
• Artificial Intelligence – machines exhibiting human intelligence
  • General AI – still science fiction
  • Narrow AI – technologies that perform specific tasks as well as or better than humans
• Machine Learning – the practice of using algorithms to parse data, learn from it, and then make a determination or prediction about something in the world
• Deep Learning – using much larger machine learning neural networks, which require more training data and computational power
[Diagram: Deep Learning as a subset of Machine Learning, which is a subset of Artificial Intelligence]

12. Machine Learning Isn’t Really New
• Been around for decades
  • Spam filters in the 1990s
  • OCR (Optical Character Recognition)
• FamilySearch already uses it for some things
  • Match classifier
  • Possible duplicates (person – person)
  • Record hinting (person – record)
• FamilySearch is beginning to explore new uses
  • Research Team -> Automated Content Extraction
  • Exploring deep learning and other methods to automatically understand historical documents

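A match classifier compares a person in the tree with a candidate record (or another person) and scores how likely it is that both describe the same individual. The Python sketch below illustrates the general idea with a logistic score over three hand-picked features. The features, the weights, the `match_probability` function and the example people are all hypothetical, not FamilySearch's actual model. In a real system the weights would be learned from labeled match/non-match pairs.

```python
import math
from difflib import SequenceMatcher

# HYPOTHETICAL hand-set weights for illustration only; a real match
# classifier would learn them from labeled (match / non-match) pairs.
WEIGHTS = {"bias": -4.0, "given_sim": 3.0, "surname_same": 3.0, "year_close": 2.0}


def features(person: dict, record: dict) -> dict:
    """Turn a (tree person, historical record) pair into numeric features in [0, 1]."""
    given_sim = SequenceMatcher(None, person["given"].lower(), record["given"].lower()).ratio()
    surname_same = 1.0 if person["surname"].lower() == record["surname"].lower() else 0.0
    # 1.0 when birth years agree, falling linearly to 0.0 at a gap of 5+ years
    gap = abs(person["birth_year"] - record["birth_year"])
    year_close = max(0.0, 1.0 - gap / 5)
    return {"given_sim": given_sim, "surname_same": surname_same, "year_close": year_close}


def match_probability(person: dict, record: dict) -> float:
    """Logistic score: estimated probability that the record is about this person."""
    f = features(person, record)
    z = WEIGHTS["bias"] + sum(WEIGHTS[name] * value for name, value in f.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical tree person and two candidate records
person = {"given": "Anna", "surname": "Larsen", "birth_year": 1902}
spelling_variant = {"given": "Ana", "surname": "Larsen", "birth_year": 1903}
different_person = {"given": "Erik", "surname": "Berg", "birth_year": 1870}

print(round(match_probability(person, spelling_variant), 3))
print(round(match_probability(person, different_person), 3))
```

A system would then show a hint only when the score clears some threshold. The spelling variant scores high despite the name and one-year differences, while the unrelated record scores near zero.
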
13. How Is Machine Learning Different from Traditional Programming?
Machine learning lets computers learn from data how to solve problems, instead of relying on people to write rules (i.e., code).
• Traditional programming: Study the Problem -> Write Rules -> Evaluate -> Launch! (or Analyze Errors and repeat)
• Machine learning: Study the Problem -> Train ML Algorithm (on Data) -> Evaluate -> Launch! (or Analyze Errors and repeat)

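To make the contrast concrete, here is a small Python sketch using hypothetical data. It puts a hand-written rule next to a learned model. The rule flags newspaper text as an obituary if it contains "died". The model is a tiny multinomial naive Bayes classifier that learns word weights from labeled examples. Naive Bayes is used here only because it is short; the slide does not say which algorithms FamilySearch trains, and the six training sentences are invented.

```python
import math
from collections import Counter

# Traditional programming: a person studies the problem and writes the rule.
def is_obituary_rule(text: str) -> bool:
    return "died" in text.lower()

# Machine learning: the "rules" (word weights) are learned from labeled data.
# A minimal multinomial naive Bayes classifier with add-one smoothing.
def train(examples):
    """examples: list of (text, is_obituary) pairs."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in examples:
        word_counts[label].update(text.lower().split())
        doc_counts[label] += 1
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def predict(model, text: str) -> bool:
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for word in text.lower().split():
            if word in vocab:  # skip words never seen in training
                score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return scores[True] > scores[False]

# Hypothetical labeled training data
examples = [
    ("John Smith passed away peacefully at home", True),
    ("Mary Jones passed away survived by her husband", True),
    ("funeral services will be held on Monday", True),
    ("town council approves new budget", False),
    ("local team wins the state championship", False),
    ("new store opens on main street", False),
]
model = train(examples)

query = "Robert Brown passed away on Friday"
print(is_obituary_rule(query))  # False: the hand-written rule misses it
print(predict(model, query))    # True: the learned model catches it
```

The rule misses an obituary that says "passed away", but the learned model catches it because that phrase appeared in its training examples. Fixing the rule means someone rewriting code. Improving the model means supplying more labeled data, which is why indexed records remain so valuable.
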
14. Necessary Technologies
• Natural Language Processing (NLP)
  • Named entity recognition (NER) – identify the names, dates, places, etc.
  • Relation extraction – identify relationships between the names, dates and places
  • Additional processing to get the data into a format for publication, standardize it, etc.
• Notice the steps are similar to what a genealogist would do

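The three steps can be sketched as a toy pipeline. In this Python example, regular expressions stand in for the trained NER and relation-extraction models the talk describes. The obituary text is invented, and the rule that the first name mentioned is the deceased is our assumption. Real obituaries are far more varied than these patterns allow.

```python
import re
from datetime import datetime

# Toy patterns standing in for trained NER models (illustration only).
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DATE = rf"(?:{MONTHS}) \d{{1,2}}, \d{{4}}"        # e.g. "March 3, 2018"
NAME = r"[A-Z][a-z]+(?: [A-Z]\.)? [A-Z][a-z]+"   # e.g. "John A. Smith"


def to_iso(date_text: str) -> str:
    """Standardize a date like 'March 3, 2018' to ISO 8601 ('2018-03-03')."""
    return datetime.strptime(date_text, "%B %d, %Y").date().isoformat()


def extract(obituary: str) -> dict:
    # Step 1: named entity recognition (find the names and dates).
    names = re.findall(NAME, obituary)
    dates = re.findall(DATE, obituary)

    # Step 2: relation extraction (connect the entities to each other).
    # ASSUMPTION: the first name mentioned is the deceased.
    deceased = names[0] if names else None
    died = re.search(rf"died on ({DATE})", obituary)
    born = re.search(rf"born on ({DATE})", obituary)
    relatives = re.findall(rf"(?:his|her) (wife|husband|son|daughter), ({NAME})", obituary)

    # Step 3: standardize into a publishable record.
    return {
        "deceased": deceased,
        "birth_date": to_iso(born.group(1)) if born else None,
        "death_date": to_iso(died.group(1)) if died else None,
        "relatives": [(name, relation) for relation, name in relatives],
        "all_dates": [to_iso(d) for d in dates],
    }


text = ("John A. Smith, 82, died on March 3, 2018. He was born on May 9, 1935. "
        "He is survived by his wife, Mary Smith, and his son, Robert Smith.")
record = extract(text)
print(record)
```

Each step mirrors what a genealogist does by hand: spot the people and dates, work out who is related to whom, then record the facts in a standard form.
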
16. Live Demos
Lille E. Yeckley, 1915-1980

17.
Document Type | Record Type | Language | Status in May 2018
Digital text | Obituaries | English | Already published 23M; working to publish continuously
Typewritten newspaper text | Obituaries | English | Active research
Handwritten text | Wills and deeds | English | Active research
Handwritten calligraphy | Genealogies | Chinese | Preliminary research
Handwritten text | Church records | Spanish | Preliminary research
More document types, more record types, more languages: expect future “explosions”

19. What You Can Do
• Keep indexing
  • It is still valuable, especially in non-English languages
  • Remember that indexed data is the foundation for training machines to auto-index correctly
  • We’ll also likely continue to use human indexing to measure how the machines are doing
• Understand your role in correcting records that have been automatically indexed incorrectly
• Be patient as solutions continue to expand, perhaps on collections that don’t benefit your research, remembering we are a global church
• Pray for the Lord’s help to bless these efforts

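Measuring how the machines are doing typically means comparing machine-extracted fields against human-indexed values treated as truth. Below is a minimal Python sketch that assumes exact field-by-field comparison and uses hypothetical record IDs and field names. It computes three scores:

- precision: how much of what the machine extracted is right
- recall: how much of the truth the machine found
- F1: the harmonic mean of precision and recall

```python
def evaluate(machine: dict, truth: dict) -> dict:
    """Score machine-extracted fields against human-indexed values.

    Both dicts map (record_id, field_name) -> value; a field counts as
    correct only if the machine value exactly equals the human value.
    """
    correct = sum(1 for key, value in machine.items() if truth.get(key) == value)
    precision = correct / len(machine) if machine else 0.0  # share of machine output that is right
    recall = correct / len(truth) if truth else 0.0         # share of the truth the machine found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical human-indexed truth and machine output for two obituaries
truth = {
    ("obit-1", "name"): "John A. Smith",
    ("obit-1", "death_date"): "2018-03-03",
    ("obit-1", "spouse"): "Mary Smith",
    ("obit-2", "name"): "Ann Lee",
}
machine = {
    ("obit-1", "name"): "John A. Smith",
    ("obit-1", "death_date"): "2018-03-08",  # misread digit
    ("obit-1", "spouse"): "Mary Smith",
    # obit-2 was missed entirely
}
scores = evaluate(machine, truth)
print(scores)
```

Here two of the machine's three fields are right (precision 2/3), and it recovered two of the four true fields (recall 1/2). Continued human indexing is what makes this kind of measurement possible.
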
20. [Diagram: an infinity loop connecting Automated Technologies, Truth / Training Data, Indexing and User Corrections]

21. “We always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next ten.” – Bill Gates

22. Tale of Three Decades
1998-2007 – Laying the technological foundation
• 1996 – GEDCOM 5.5 standard released (still supported)
• 1999 – PAF 4.0 – first Windows version
• 2002 – PAF 5.2 – last major version
• 2004 – First vault microfilms converted to digital images
• 2007 – First digital images from the vault published on FamilySearch.org
2008-2017 – Single publicly available tree integrated with historical records
• 2010 – Launch of FamilySearch record search (>1B names, millions of images)
• 2006 – FamilySearch indexing began
• 2007 – FamilySearch Research Wiki started
• 2009 – new.familysearch became available in Utah (limited rollout began in 2007)
• 2009 – I began to work at FamilySearch
• 2011 – RootsTech conference began
• 2013 – Family Tree added and made available to non-LDS patrons
• 2013 – Memories (photos and stories) initial rollout
• 2014 – Partnerships with Ancestry, MyHeritage and FindMyPast
• 2014 – Record hinting
• 2014 – First FamilySearch mobile app released
• 2015 – User-to-user messaging
• 2015 – Printing temple cards from home in 44 languages
• 2016 – Family Tree moved to scalable servers
• 2017 – Web indexing
• 2018 – Family Tree Lite
2018-2027 – Worldwide explosion of records
• 2017 – Nordic Records, “Year of the Viking”: Scandinavian countries (Sweden, Denmark, Finland) are the first 3 of the top 50 countries
• 2018 – Mexican Civil Records project – 60M records
• ???? – Billions more indexed records made available via automatic indexing technologies
• ???? – User corrections of records supported
• ???? – DNA features?
• ???? – ????

23. Thank you!
I hope you’ve been inspired
Keep an eye out for more explosions