From Purple Prose to
Machine-Checkable Proofs:
Levels of rigor in family history tools
Dr. Luther A. Tychonievich, Ph.D.
Dept. Computer Science, University of Virginia
TSC Coordinator, Family History Information Standards Organisation
These slides: http://www.cs.virginia.edu/luther/RT2015.pdf
Me: ltychonievich@fhiso.org or luther@virginia.edu
Luther Tychonievich 1 of 35
Outline
• Two definitions of “proof”
• Can genealogy be rigorous?
• Can genealogy tools be rigorous?
• An upper bound on possible rigor
• Achieving rigor, and beyond
Two Definitions of Proof
• Persuasive writing intended to
convince someone that an idea is
beyond rational doubt
• A set of derivations that reduce a
non-obvious claim to a set of other
claims and derivation rules
Not Proof
• Purple prose is language that
conceals a lack of substance behind
a superfluity of description
• Structure can be that detail…
• Try a genealogy search for Odin
• Potential for misuse ≠ bad
Outline
• Two definitions of “proof”
• Can genealogy be rigorous?
• Can genealogy tools be rigorous?
• An upper bound on possible rigor
• Achieving rigor, and beyond
Reasons not to trust genealogy
• Amateurs and fee-for-service
• …the two least-trusted sources
• Individuals don’t yield to statistics
• Linked generations compound error
• Genes and probability offer little
hope
Compound Error
• Suppose each parent-child link
correct with 95% confidence
• Great-grandparents = 14 links
• 0.95^14 < 49% chance all correct
• Doubling per-link confidence (halving the error rate)
• Doubles links we can trust
• Gives one extra generation
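The arithmetic on this slide can be checked directly. The sketch below assumes every link is independent and equally reliable, and it reads "doubling per-link confidence" as halving the per-link error rate (95% to 97.5%); both are assumptions, and the function names are illustrative only.

```python
def links_to_generation(generations: int) -> int:
    """Parent-child links needed to reach every ancestor up to a generation.

    Generation 1 is the parents (2 links), generation 2 adds the
    grandparents (4 more links), and so on.
    """
    return sum(2 ** g for g in range(1, generations + 1))


def chance_all_correct(per_link_confidence: float, links: int) -> float:
    """Probability that every link is correct, assuming independent links."""
    return per_link_confidence ** links


great_grandparent_links = links_to_generation(3)  # 2 + 4 + 8 = 14
baseline = chance_all_correct(0.95, great_grandparent_links)

# Assumption: halving the error rate (5% -> 2.5%) is what "doubling
# confidence" means. It keeps roughly the same overall confidence
# across twice as many links.
doubled = chance_all_correct(0.975, 2 * great_grandparent_links)

print(great_grandparent_links, links_to_generation(4))
print(round(baseline, 3), round(doubled, 3))
```

Twice 14 links is 28, just short of the 30 links needed to reach every great-great-grandparent, which is consistent with the slide's "one extra generation".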
Will genetics provide rigor?
• Makes statistical claims about
overlap of ancestry of living people
• Illegitimacy and inbreeding
• Biological vs Social parent
• (see phylogenetic tree research)
Can probability provide rigor?
• Probabilistic reasoning requires
some subset of
• Priors
• Experiments or Ground truth
• Well-defined populations
• Independent distributions
• Not a silver bullet…
Outline
• Two definitions of “proof”
• Can genealogy be rigorous?
• Can genealogy tools be rigorous?
• An upper bound on possible rigor
• Achieving rigor, and beyond
Pop Quiz
• Which of the following is hardest?
(1) Finding the right source
(2) Citing the source right
(3) Drawing the right conclusions
(4) Communicating your reasoning
• Using tool , which is hardest?
Analogy: Proof-Carrying Code
• To prove code works is provably hard
• …or impossible (Rice’s Theorem)
• To communicate a proof is easy
• …if you use the right tool
• Can we make the right tool for
family history?
Outline
• Two definitions of “proof”
• Can genealogy be rigorous?
• Can genealogy tools be rigorous?
• An upper bound on possible rigor
• Achieving rigor, and beyond
On the next slide we’ll see…
• A recorder’s perspective of the
logical flow of genealogy
• Not a suggested research work-flow
[Diagram: a recorder’s perspective of the logical flow of genealogy]
• History happened, creating artifacts
(documents, monuments, memories…)
• Not part of research; should not
appear in tools
• Step 1: discover artifacts
• Digitization helps us all
• Citations are getting better…
• Logs (negative evidence) need work
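As one possible shape for a research log that keeps negative evidence, the sketch below records every search, including those that found nothing. The class, field names, and sample entries are hypothetical and not taken from any existing tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SearchLogEntry:
    """One search that was carried out, whether or not it found anything."""
    searched_on: date
    repository: str                        # where the search was made
    query: str                             # what was looked for
    artifacts_found: tuple[str, ...] = ()  # empty tuple means a negative result


def negative_evidence(log: list[SearchLogEntry]) -> list[SearchLogEntry]:
    """Return the searches that found nothing; these are evidence too."""
    return [entry for entry in log if not entry.artifacts_found]


# Hypothetical sample data, for illustration only.
log = [
    SearchLogEntry(date(2015, 2, 1), "county marriage index",
                   "John Smith, 1850-1860"),
    SearchLogEntry(date(2015, 2, 3), "parish register",
                   "John Smith baptism",
                   artifacts_found=("register p. 12",)),
]

for entry in negative_evidence(log):
    print(entry.repository, "-", entry.query)
```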
• Many tiers: shapes → glyphs →
words → meaning → claims
• Single version = data pollution
• Transcription community much
better at this than us
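One way to avoid the "single version" problem is to store every reading of an artifact alongside its rivals, each tagged with its tier. This is a minimal sketch under that assumption; the tier names follow the slide, while the classes and sample readings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Reading:
    """One interpretation of an artifact at a given tier."""
    tier: str        # e.g. "glyphs", "words", "meaning", "claims"
    content: str
    reader: str      # who proposed this reading
    based_on: Optional["Reading"] = None  # the lower-tier reading it interprets


@dataclass
class Transcription:
    """All readings of one artifact; competing versions sit side by side."""
    artifact: str
    readings: list[Reading] = field(default_factory=list)

    def add(self, reading: Reading) -> Reading:
        self.readings.append(reading)
        return reading

    def alternatives(self, tier: str) -> list[Reading]:
        """Every recorded reading at one tier, not just a preferred one."""
        return [r for r in self.readings if r.tier == tier]


# Hypothetical example: two readers expand the same abbreviation differently.
t = Transcription("parish register, p. 12")
glyphs = t.add(Reading("glyphs", "Jn. Smith", "reader-1"))
t.add(Reading("words", "John Smith", "reader-1", based_on=glyphs))
t.add(Reading("words", "Jonathan Smith", "reader-2", based_on=glyphs))

print([r.content for r in t.alternatives("words")])
```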
• We don’t consciously consider every
claim we see
• How do you detect, let alone store,
subconscious selection?
• Computer could notice for us…
• “Is that my John?”
• Turns claims into people
• The crux of genealogy; most errors
I’ve seen happen here
• Deserves a second slide…
• We need to
1. record matches (and non-matches)
2. allow changes
3. record reason for each match
4. embrace ambiguity
• More tools gaining 1, a few have 2; 3
is unusual, 4 yet to appear
• Deserves a third slide…
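The four needs listed above could be met by something as small as an append-only list of match assertions. The sketch below is one hypothetical design, not a description of any existing tool; all identifiers and sample reasons are invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    SAME = "same"            # need 1: a match
    DIFFERENT = "different"  # need 1: a non-match
    UNDECIDED = "undecided"  # need 4: ambiguity recorded rather than hidden


@dataclass(frozen=True)
class MatchAssertion:
    left: str       # identifier of one record
    right: str      # identifier of the other record
    verdict: Verdict
    reason: str     # need 3: why the researcher reached this verdict


@dataclass
class MatchHistory:
    """Need 2: changes are appended, so earlier reasoning is never erased."""
    assertions: list[MatchAssertion] = field(default_factory=list)

    def record(self, assertion: MatchAssertion) -> None:
        self.assertions.append(assertion)

    def current(self, left: str, right: str) -> Verdict:
        """Most recent verdict for a pair, or UNDECIDED if none was recorded."""
        for a in reversed(self.assertions):
            if {a.left, a.right} == {left, right}:
                return a.verdict
        return Verdict.UNDECIDED


history = MatchHistory()
history.record(MatchAssertion("census-1850/John", "will-1871/John",
                              Verdict.SAME, "same farm named in both"))
history.record(MatchAssertion("census-1850/John", "will-1871/John",
                              Verdict.DIFFERENT, "stated ages are incompatible"))

print(history.current("will-1871/John", "census-1850/John").value)
```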
• Match algebra:
• A = B
• A ≠ B
• {A, B, C, D, …} ≥ 2 individuals
• Matches of any noun (person, place,
event, etc.)
• Often evidence for A = B and A ≠ B
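The three statement forms above can be stored together and checked for contradictions without rejecting any of them. The sketch below uses a union-find structure for the equalities; that choice, the class name, and the reading of the set form as "these names cover at least n distinct individuals" are assumptions rather than anything the slides specify.

```python
class MatchAlgebra:
    """Collects A = B, A != B and set-size statements; reports contradictions."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}
        self._different: list[tuple[str, str]] = []
        self._minimums: list[tuple[frozenset[str], int]] = []

    def _find(self, x: str) -> str:
        """Representative of the group of names currently equated with x."""
        self._parent.setdefault(x, x)
        while self._parent[x] != x:
            self._parent[x] = self._parent[self._parent[x]]  # path halving
            x = self._parent[x]
        return x

    def same(self, a: str, b: str) -> None:
        """Record evidence that a and b are one individual."""
        self._parent[self._find(a)] = self._find(b)

    def different(self, a: str, b: str) -> None:
        """Record evidence that a and b are distinct individuals."""
        self._different.append((a, b))

    def at_least(self, names: set[str], count: int) -> None:
        """Record that the names cover at least `count` distinct individuals."""
        self._minimums.append((frozenset(names), count))

    def conflicts(self) -> list[str]:
        """Describe every statement the recorded equalities contradict."""
        problems = []
        for a, b in self._different:
            if self._find(a) == self._find(b):
                problems.append(f"{a} != {b} contradicts the recorded equalities")
        for names, count in self._minimums:
            groups = {self._find(n) for n in names}
            if len(groups) < count:
                problems.append(
                    f"{sorted(names)} needs {count} individuals, has {len(groups)}")
        return problems


m = MatchAlgebra()
m.same("A", "B")
m.same("B", "C")
m.different("C", "A")          # evidence on both sides is kept, not rejected
m.at_least({"A", "B", "D"}, 2)

print(m.conflicts())
```

Conflicts are reported rather than raised as errors, so contradictory evidence can stay in the data until the researcher resolves it.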
• The “hidden” step today: from
“name: Jane” we infer “gender:
female” but don’t record that we
made an inference.
• Data simple, missing in our tools
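To show how little data the hidden step needs, the sketch below turns the name-to-gender guess into a recorded claim whose source says it was inferred. The name list, class, and function are hypothetical, and a real rule would be a trend rather than a certainty.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    subject: str
    attribute: str
    value: str
    source: str   # a citation, or a note that the claim was inferred


# Hypothetical, deliberately tiny list used only for this illustration.
NAMES_USUALLY_FEMALE = frozenset({"Jane", "Sue", "Anna"})


def infer_gender(name_claim: Claim) -> Optional[Claim]:
    """Make the usually silent name-to-gender guess an explicit claim."""
    if name_claim.attribute != "name":
        return None
    if name_claim.value not in NAMES_USUALLY_FEMALE:
        return None
    return Claim(
        subject=name_claim.subject,
        attribute="gender",
        value="female",
        source=f"inferred from name '{name_claim.value}' ({name_claim.source})",
    )


stated = Claim("persona-1", "name", "Jane", "1850 census, line 14")
inferred = infer_gender(stated)
print(inferred)
```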
• The final belief is logically
uninteresting, but socially vital
• Storing just one belief is a problem
• Many tools store only beliefs and
artifacts found
Outline
• Two definitions of “proof”
• Can genealogy be rigorous?
• Can genealogy tools be rigorous?
• An upper bound on possible rigor
• Achieving rigor, and beyond
Do we want rigorous tools?
• Depends on the audience
• Option of rigor seems good
• Can we make a “proof-carrying
GEDCOM”?
To achieve rigor:
• Embrace ambiguity
• 2+ conflicting versions of each step
• Adopt best-practice transcription
• Match algebra
• Support reasons for matches
• Add inferences explicitly
…and beyond
• Track the “select” step
• Inference rules could allow
• translation-free rationales
• statistically-sound probability
• Mid-research belief: A = B = C ≠ A
• Time-varying rigor could be
revealing data
Outline
• Two definitions of “proof”
• Can genealogy be rigorous?
• Can genealogy tools be rigorous?
• An upper bound on possible rigor
• Achieving rigor, and beyond
• Extra Slides
More on Inferences
• Inference step:
rule + antecedents = consequent
• Antecedents, consequent: claims
• source of consequent: the inference
• support of consequent: the antecedents
• rationale of consequent: the rule
• Support may be enough…
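The vocabulary on this slide maps onto a small data model: a derived claim's source is its inference, its support is the antecedents, and its rationale is the rule. The sketch below is one hypothetical encoding; it also shows how support can be followed back to the artifact citations a claim finally rests on. The sample claim texts and citation are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Inference:
    rule: str                          # name of the rule that was applied
    antecedents: tuple["Claim", ...]   # claims the rule was applied to


@dataclass(frozen=True)
class Claim:
    text: str
    citation: Optional[str] = None         # set when read from an artifact
    inference: Optional[Inference] = None  # set when derived from other claims

    @property
    def source(self):
        """The inference for a derived claim, otherwise the citation."""
        return self.inference if self.inference else self.citation

    @property
    def support(self) -> tuple["Claim", ...]:
        return self.inference.antecedents if self.inference else ()

    @property
    def rationale(self) -> Optional[str]:
        return self.inference.rule if self.inference else None


def ultimate_citations(claim: Claim) -> set[str]:
    """Follow support back to the citations the claim finally rests on."""
    if claim.inference is None:
        return {claim.citation} if claim.citation else set()
    found: set[str] = set()
    for antecedent in claim.support:
        found |= ultimate_citations(antecedent)
    return found


age = Claim("persona-1 was aged 30 in 1850", citation="1850 census, line 14")
born = Claim("persona-1 was born about 1820",
             inference=Inference("census year minus stated age", (age,)))

print(born.rationale)
print(ultimate_citations(born))
```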
Inference “rule”s
• A predicate over antecedents
• A claim built from antecedents
• Could be probabilistic (“trend” not
“rule”)
• Consequent could be claim, match,
non-match, etc.
• Encodes “why” as pure data
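Treating a rule as a predicate plus a claim-builder is what makes a step machine-checkable: a checker re-runs both and compares the result with the stated consequent. The sketch below assumes claims are simple (subject, attribute, value) triples and uses one hypothetical rule whose consequent is a non-match; none of these names come from an existing standard.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a claim is a (subject, attribute, value) triple of strings.
Claim = tuple[str, str, str]


@dataclass(frozen=True)
class Rule:
    name: str
    applies: Callable[[tuple[Claim, ...]], bool]  # predicate over antecedents
    build: Callable[[tuple[Claim, ...]], Claim]   # claim built from antecedents


@dataclass(frozen=True)
class Step:
    rule: Rule
    antecedents: tuple[Claim, ...]
    consequent: Claim


def check(step: Step) -> bool:
    """Valid when the rule applies and rebuilds exactly the stated consequent."""
    return (step.rule.applies(step.antecedents)
            and step.rule.build(step.antecedents) == step.consequent)


def died_before_other_was_born(ants: tuple[Claim, ...]) -> bool:
    if len(ants) != 2:
        return False
    death, birth = ants
    return (death[1] == "death-year" and birth[1] == "birth-year"
            and death[2].isdigit() and birth[2].isdigit()
            and int(death[2]) < int(birth[2]))


def non_match(ants: tuple[Claim, ...]) -> Claim:
    death, birth = ants
    return (death[0], "is-not", birth[0])


# Hypothetical rule: someone who died before another was born is not that person.
lifespans_disjoint = Rule("lifespans do not overlap",
                          died_before_other_was_born, non_match)

death = ("persona-1", "death-year", "1840")
birth = ("persona-2", "birth-year", "1850")
good = Step(lifespans_disjoint, (death, birth),
            ("persona-1", "is-not", "persona-2"))
bad = Step(lifespans_disjoint, (death, birth),
           ("persona-1", "is", "persona-2"))

print(check(good), check(bad))
```

Because the rule is data plus two small functions, the "why" travels with the conclusion and can be re-verified by any tool that knows the rule.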
More on Probability
• Probability a claim is true: 0 or 1
• Probability a rule’s consequent is true: approximately
validated / (validated + invalidated)
• Is “83% chance Jane is John’s
mother” useful?
• What about “99.7% chance either
Jane, Sue, or Anna is John’s mother”?
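The ratio above, and the contrast between the two example statements, can be sketched in a few lines. The counts and the per-candidate split below are invented for illustration (only the 83% and 99.7% figures come from the slide), and summing assumes the candidates are mutually exclusive.

```python
def rule_reliability(validated: int, invalidated: int) -> float:
    """Approximate chance a rule's consequent is true, from its track record."""
    total = validated + invalidated
    if total == 0:
        raise ValueError("rule has never been tested against known outcomes")
    return validated / total


def chance_any(candidates: dict[str, float]) -> float:
    """Chance the answer is among the candidates, assuming at most one is right."""
    return sum(candidates.values())


# Hypothetical track record: the rule held in 83 of 100 checked cases.
print(rule_reliability(83, 17))

# Hypothetical split that is consistent with the slide's two figures.
mother_candidates = {"Jane": 0.83, "Sue": 0.12, "Anna": 0.047}
print(round(chance_any(mother_candidates), 3))
```

The second statement names no single mother, yet it may be the more useful one: it narrows further research to three candidates with little remaining doubt.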
Outline
• Two definitions of “proof”
• Can genealogy be rigorous?
• Can genealogy tools be rigorous?
• An upper bound on possible rigor
• Achieving rigor, and beyond
• Extra Slides
Questions?
These slides: http://www.cs.virginia.edu/luther/RT2015.pdf
Me: ltychonievich@fhiso.org or luther@virginia.edu
