A comparison of three applications running at FamilySearch that use various DataStax technologies. We look at the characteristics of the applications, the design of each application, and how these are facilitated by DSE services.
This document discusses big data and how Hadoop solves the problems of processing and storing extremely large datasets. It introduces Hadoop, describing its main components: HDFS for distributed storage and MapReduce for distributed processing. Hadoop allows applications to run on large clusters of commodity hardware while tolerating failures and scaling easily. The document provides examples of how MapReduce and Hive are used and describes a Twitter sentiment analysis application.
Deduplication detects and eliminates duplicated data but incurs overhead from disk fragmentation, data comparison costs, and increased write latency. To mitigate these issues, the deduplication process can be decentralized, caches can hold fingerprints (hashes) of data, and larger deduplication units can be used, such as whole files or sequences of blocks larger than 4KB, though larger units may decrease the deduplication rate.
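To make the fingerprint-cache idea concrete, here is a minimal Python sketch of chunk-level deduplication, assuming fixed-size chunks, SHA-256 fingerprints, and a simple in-memory store; names like `DedupStore` are illustrative, not from the original slides.

```python
import hashlib

class DedupStore:
    """Toy chunk-level deduplicating store, keyed by content fingerprint."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size     # larger units cut overhead but may cut dedup rate
        self.fingerprint_cache = set()   # in-memory cache of recently seen fingerprints
        self.chunks = {}                 # fingerprint -> chunk (stands in for the disk index)

    def write(self, data: bytes) -> list:
        """Store data, skipping chunks whose fingerprint is already known."""
        recipe = []                      # fingerprints needed to reassemble this write
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            # A cache hit avoids the (simulated) on-disk index lookup and the write.
            if fp not in self.fingerprint_cache:
                if fp not in self.chunks:
                    self.chunks[fp] = chunk
                self.fingerprint_cache.add(fp)
            recipe.append(fp)
        return recipe

    def read(self, recipe: list) -> bytes:
        return b"".join(self.chunks[fp] for fp in recipe)

store = DedupStore()
r1 = store.write(b"A" * 8192 + b"B" * 4096)   # 3 chunks written, only 2 unique
r2 = store.write(b"A" * 4096)                 # pure duplicate: nothing new stored
assert store.read(r1) == b"A" * 8192 + b"B" * 4096
print(len(store.chunks), "unique chunks stored")  # -> 2
```

Raising `chunk_size` reduces fingerprinting and lookup overhead per byte written, but, as the summary notes, coarser units are less likely to match, so the deduplication rate can drop.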
This deck leans toward Hadoop/Hive installation experience and ecosystem concepts. Its content is derived from a yet-to-be-published book, Fundamentals of Big Data.
The Royal Library of Denmark has a complex information environment with separate catalogs and databases that make access fragmented and user unfriendly. They implemented Primo to integrate their local data and provide access to remote article databases through Deep Search and DADS in a single interface. This provides integrated and federated search while handling the challenges of different roles, needs, and access levels. Future development includes expanding article coverage through Primo Central and continuing to improve data cleanup through deduplication and FRBR processing.
Abstract: Cxense Insight helps companies understand their audience and build great online experiences. Our interactive UI and APIs help customers annotate, filter, segment, and target their users based on visited content and actions in real time. Today we already track more than half a billion unique user identities across more than 5,000 websites, contributing more than 10 billion analytics events per month.
To leverage these amounts of data in real time, we built a large distributed system relying on concepts familiar from databases, information retrieval, and data mining. The first part of this talk gives an insight into the challenges, the architecture, and the techniques we have used, while the second part briefly demonstrates our UI and APIs in action. We hope that both parts will be interesting for undergraduate students taking IR/DB courses as well as PhD students, experienced researchers, and staff.
This document provides an overview of Apache Hadoop, including its architecture, components, and applications. Hadoop is an open-source framework for distributed storage and processing of large datasets. It uses Hadoop Distributed File System (HDFS) for storage and MapReduce for processing. HDFS stores data across clusters of nodes and replicates files for fault tolerance. MapReduce allows parallel processing of large datasets using a map and reduce workflow. The document also discusses Hadoop interfaces, Oracle connectors, and resources for further information.
The document describes VeloxDFS, a decentralized distributed file system that manages file metadata using distributed hash tables. It stores file blocks with replication for fault tolerance. VeloxDFS distributes blocks based on hashes and supports clients via shell commands as well as C++ and Java APIs. It aims to improve upon HDFS and Cassandra file systems.
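The summary does not spell out VeloxDFS's placement function, so the following is a hypothetical Python sketch of the general technique it names, hash-based block placement with replication; the node names and the modular-hash scheme are assumptions for illustration.

```python
import hashlib

NODES = ["node-0", "node-1", "node-2", "node-3"]
REPLICATION = 2

def place_block(file_id: str, block_index: int, nodes=NODES, replicas=REPLICATION):
    """Map one block to `replicas` distinct nodes by hashing its identity."""
    key = f"{file_id}:{block_index}".encode()
    start = int.from_bytes(hashlib.md5(key).digest(), "big") % len(nodes)
    # Place copies on consecutive nodes so replicas land on distinct machines.
    return [nodes[(start + r) % len(nodes)] for r in range(replicas)]

for idx in range(4):
    print(f"block {idx} ->", place_block("movie.bin", idx))
```

Real DHT-based systems typically use consistent hashing instead of plain modular hashing, so that adding or removing a node relocates only a small fraction of the blocks.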
Hadoop is an open-source software framework for reliable, scalable, distributed storage and processing of large datasets across clusters of commodity servers. A typical large Hadoop cluster consists of thousands of commodity servers, storing exabytes of data and processing petabytes of data per day. Hadoop uses the Hadoop Distributed File System (HDFS) for storage and MapReduce as its processing engine. HDFS stores data across the nodes of a cluster as blocks and provides redundancy through replication, while MapReduce processes the data in parallel on those same nodes.
This document discusses key concepts for modern software design in big data systems. It covers topics like data structures, algorithms, distributed systems, and performance optimization. Specifically, it discusses techniques like caching, compression, locality, immutability, and consistency models. It provides examples from systems like MapReduce, Hadoop, Spark, Cassandra, and Google's infrastructure. The goal is to understand principles for designing scalable, fault-tolerant, and high-performance big data systems.
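Of the techniques listed, caching is the easiest to show in a few lines. This is a minimal Python sketch using the standard library's memoizing cache; the `expensive_lookup` function and its 100 ms cost are purely illustrative stand-ins for any slow disk or network access.

```python
from functools import lru_cache
import time

@lru_cache(maxsize=1024)
def expensive_lookup(key: str) -> str:
    time.sleep(0.1)          # stand-in for a disk seek or network round trip
    return key.upper()

start = time.perf_counter()
expensive_lookup("user:42")  # miss: pays the full cost
cold = time.perf_counter() - start

start = time.perf_counter()
expensive_lookup("user:42")  # hit: served from memory
warm = time.perf_counter() - start
print(f"cold {cold:.3f}s, warm {warm:.6f}s")
```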
Distributed Computing with Apache Hadoop: Introduction to MapReduce, by Konstantin V. Shvachko
Abstract: The presentation describes
- What is the BigData problem
- How Hadoop helps to solve BigData problems
- The main principles of the Hadoop architecture as a distributed computational platform
- History and definition of the MapReduce computational model
- Practical examples of how to write MapReduce programs and run them on Hadoop clusters
The talk is targeted to a wide audience of engineers who do not have experience using Hadoop.
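As a companion to the abstract's last bullet, here is the MapReduce computational model itself, stripped of the Hadoop machinery, as a word count in plain Python; the three functions mirror the map, shuffle, and reduce phases that a Hadoop cluster would run distributed across many nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document: str):
    """Map: emit (word, 1) for every word, independently per input split."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the grouped values for each key."""
    return {word: sum(counts) for word, counts in groups.items()}

splits = ["big data is big", "hadoop processes big data"]
pairs = chain.from_iterable(map_phase(s) for s in splits)
print(reduce_phase(shuffle(pairs)))
# {'big': 3, 'data': 2, 'is': 1, 'hadoop': 1, 'processes': 1}
```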
Hadoop is a framework for distributed storage and processing of large datasets across clusters of commodity hardware. It uses HDFS for fault-tolerant storage and MapReduce as a programming model for distributed computing. HDFS stores data across clusters of machines and replicates it for reliability. MapReduce allows processing of large datasets in parallel by splitting work into independent tasks. Hadoop provides reliable and scalable storage and analysis of very large amounts of data.
Data model for analysis of scholarly documents in the MapReduce paradigm, by Adam Kawa
This document summarizes a presentation on using Apache Hadoop tools to analyze scholarly documents. It discusses storing metadata and text of scholarly documents and extracting knowledge from them. Requirements for scalable storage, parallel processing, and flexible data models are also outlined. Possible solutions for storing document relationship data as linked RDF triples in HBase and performing analytics using MapReduce, Pig, and Hive are presented.
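The presentation summary names the pattern but not the schema, so the following is a hypothetical sketch of one common way to lay out RDF triples in an HBase-style wide row, modeled here with plain Python dicts; the row-key and column conventions are assumptions for illustration.

```python
# One wide row per subject; each (predicate, object) pair becomes a column,
# mirroring an HBase layout of rowkey=subject, column=predicate:object.
triples = [
    ("paper:123", "cites", "paper:456"),
    ("paper:123", "writtenBy", "author:kawa"),
    ("paper:456", "writtenBy", "author:smith"),
]

table = {}
for subject, predicate, obj in triples:
    table.setdefault(subject, {})[f"{predicate}:{obj}"] = b"\x00"  # value unused

# Subject-centric queries then become a single row read:
print(sorted(table["paper:123"]))
# ['cites:paper:456', 'writtenBy:author:kawa']
```

With this layout, MapReduce, Pig, or Hive jobs can scan rows in parallel, each row carrying all outgoing relationships of one document.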
Dhruba Borthakur presented on Apache Hadoop and Hive. He discussed the architecture of Hadoop Distributed File System (HDFS) and how it is optimized for processing large datasets across commodity hardware. HDFS uses a master/slave architecture with a NameNode that manages metadata and DataNodes that store data blocks. Hive provides a SQL-like interface to query and analyze large datasets stored in HDFS. Facebook uses a large Hadoop cluster to process petabytes of data daily and many engineers are now using Hadoop and Hive. Borthakur proposed several ideas for collaborations between Hadoop and Condor.
EclipseCon Keynote: Apache Hadoop - An Introduction, by Cloudera, Inc.
Todd Lipcon explains why you should be interested in Apache Hadoop, what it is, and how it works. Todd also brings to light the Hadoop ecosystem and real business use cases that revolve around Hadoop and its ecosystem.
Scaling Storage and Computation with Hadoop, by yaevents
Hadoop provides distributed storage and a framework for the analysis and transformation of very large data sets using the MapReduce paradigm. Hadoop partitions data and computation across thousands of hosts and executes application computations in parallel, close to their data. A Hadoop cluster scales computation capacity, storage capacity, and IO bandwidth simply by adding commodity servers. Hadoop is an Apache Software Foundation project; it unites hundreds of developers, and hundreds of organizations worldwide report using Hadoop. This presentation gives an overview of the Hadoop family of projects with a focus on its distributed storage solutions.
Introduction to Hadoop and Hadoop Components, by rebeccatho
This document provides an introduction to Apache Hadoop, which is an open-source software framework for distributed storage and processing of large datasets. It discusses Hadoop's main components of MapReduce and HDFS. MapReduce is a programming model for processing large datasets in a distributed manner, while HDFS provides distributed, fault-tolerant storage. Hadoop runs on commodity computer clusters and can scale to thousands of nodes.
The document summarizes Hadoop Distributed File System (HDFS). HDFS is the primary data storage system used by Hadoop applications to provide scalable and reliable access to data across large clusters. It uses a master-slave architecture with a NameNode that manages file metadata and DataNodes that store file data blocks. HDFS supports big data analytics applications by enabling distributed processing of large datasets in a fault-tolerant manner.
Apache Tajo on Swift: Bringing SQL to the OpenStack World, by Jihoon Son
This slide deck was presented at the SK Telecom T Developer Forum. It contains brief evaluation results of the query execution performance of Tajo on Swift.
I conducted two kinds of experiments. The first compared the performance of Tajo on Swift with its performance on another distributed storage system, HDFS. The second was a scalability test of Swift.
Interestingly, scan performance on Swift is more than two times slower than on HDFS. In addition, task scheduling time on Swift is much greater than on HDFS, which means the query initialization cost is very high.
PostgreSQL is an open source relational database management system. It has over 15 years of active development and supports most operating systems. The tutorial provides instructions on installing PostgreSQL on Linux, Windows, and Mac operating systems. It also gives an overview of PostgreSQL's features and procedural language support.
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze big data for a fraction of the cost of traditional data warehouses. In this session, we take an in-depth look at data warehousing with Amazon Redshift for big data analytics. We cover best practices to take advantage of Amazon Redshift's columnar technology and parallel processing capabilities to deliver high throughput and query performance. We also discuss how to design optimal schemas, load data efficiently, and use work load management.
This document provides an agenda for a presentation on Hadoop. It begins with an introduction to Hadoop and its history. It then discusses data storage and analysis using Hadoop and what Hadoop is not suitable for. The remainder of the document outlines the Hadoop Distributed File System (HDFS), MapReduce framework, and concludes with a practice section involving a demo and discussion.
Dealing with the Challenges of Large Life Science Data Sets from Acquisition ..., by inside-BigData.com
This document summarizes the data management challenges and solutions at the Friedrich Miescher Institute in Basel, Switzerland. It discusses how the Institute generates terabytes of data per year from various life science research technologies. It then outlines the Institute's storage architecture using DDN storage systems, data workflows from acquisition to analysis to archiving, tools for data transfer and sharing, and systems for storage management including quotas, reporting, backups and archiving. The conclusion expresses a desire to collaborate with others to improve data management tools for life science research.
Jim Gray presented on his work with large databases and grid computing. He discussed two major projects - TerraServer and SkyServer/World Wide Telescope. TerraServer is a photo database of the United States containing over 15 TB of imagery data accessed through an SQL database. SkyServer is a database of astronomical data containing images and attributes of celestial objects from surveys like SDSS. Gray discussed lessons learned from building and managing these large databases, and future plans to build databases from inexpensive disk bricks. He advocated for grid computing through web services as a way to federate and access distributed data sources on the internet.
This document discusses handling larger datasets and moving to distributed systems. It begins by explaining different storage sizes, from gigabytes up to exabytes and yottabytes. For data too big to fit in memory, it recommends reading data in chunks, using parallel processing libraries like Dask, and using compiled Python. It then discusses distributed file systems, MapReduce frameworks, and distributed programming platforms like Hadoop and Spark. The document also covers SQL and NoSQL databases, data warehouses, data lakes, and typical big data science team roles including data scientists, engineers, and analysts. It provides examples of distributed systems and concludes with exercises and suggestions for further reading.
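To illustrate the chunked-reading advice, here is a short pandas sketch that aggregates a CSV too large for memory one chunk at a time; the file name `events.csv` and the `user_id` column are hypothetical.

```python
import pandas as pd

# Stream the file in million-row chunks, keeping only a small running aggregate.
totals = {}
for chunk in pd.read_csv("events.csv", chunksize=1_000_000):
    for user, count in chunk.groupby("user_id").size().items():
        totals[user] = totals.get(user, 0) + count

top = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:10]
print(top)
```

Dask's `dask.dataframe.read_csv` applies the same idea but schedules the per-chunk work in parallel across cores or a cluster.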
Managing Security At 1M Events a Second using Elasticsearch, by Joe Alex
The document discusses managing security events at scale using Elasticsearch. Some key points:
- The author manages security logs for customers, collecting, correlating, storing, indexing, analyzing, and monitoring over 1 million events per second.
- Before Elasticsearch, traditional databases couldn't scale to billions of logs, searches took days, and advanced analytics weren't possible. Elasticsearch allows customers to access and search logs in real-time and perform analytics.
- Their largest Elasticsearch cluster has 128 nodes indexing over 20 billion documents per day totaling 800 billion documents. They use Hadoop for long term storage and Spark and Kafka for real-time analytics.
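As a toy illustration of the ingestion side, here is a sketch using the official Elasticsearch Python client (v8-style API, an assumption); the index name and event fields are hypothetical, and a pipeline handling 1M events per second would use the bulk helpers and many parallel writers rather than single `index` calls.

```python
from datetime import datetime, timezone
from elasticsearch import Elasticsearch  # official Python client, v8-style API assumed

es = Elasticsearch("http://localhost:9200")

# Index one security event into a (hypothetical) daily index.
event = {
    "@timestamp": datetime.now(timezone.utc).isoformat(),
    "source_ip": "10.0.0.5",
    "action": "login_failure",
    "severity": "medium",
}
es.index(index="security-events-2024.06", document=event)

# Search recent failures in (near) real time across all daily indices.
hits = es.search(
    index="security-events-*",
    query={"term": {"action": "login_failure"}},
    size=10,
)
print(hits["hits"]["total"])
```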
Difference between Database vs Data Warehouse vs Data Lake, by jeetendra mandal
A database is a collection of structured data that is accessed electronically through a database management system. It stores data to support online transaction processing. Databases provide security, data integrity, querying capabilities, indexing for performance, and flexible deployment options. Common database types include relational, document, key-value, wide-column, and graph databases. Applications across industries rely on databases to store various types of data.
This document summarizes key concepts about physical storage systems from the textbook "Database System Concepts, 7th Ed." by Silberschatz, Korth and Sudarshan. It describes the storage hierarchy from fastest volatile primary storage (e.g. cache, main memory) to slower non-volatile secondary storage (e.g. magnetic disks, flash storage) to slowest tertiary storage (e.g. magnetic tapes). It also discusses various storage media like magnetic disks, flash storage, SSDs and RAID arrays, covering their mechanisms, performance and reliability through redundancy.
The computational requirements of next-generation sequencing are placing a huge demand on IT organisations.
Building compute clusters is now a well understood and relatively straightforward problem. However, NGS applications require large amounts of storage and high IO rates.
This talk details our approach for providing storage for next-gen sequencing applications.
Talk given at BIO-IT World, Europe, 2009.
MySQL NDB Cluster's Asynchronous Parallel Design for High Performance, by Bernd Ocklin
MySQL's NDB Cluster is a partitioned distributed database engine built entirely around a parallel virtual machine with an event-driven asynchronous design. Using this design, NDB can execute even single queries in parallel and scales linearly, handling terabytes of sharded data in real time.
This document provides an overview of WSO2 and their offerings for building big data solutions. WSO2 provides open source components for building complete cloud platforms and is recognized as a leader in application infrastructure by Gartner and Forrester. They discuss the challenges of big data due to the large volumes and speeds at which data is generated today. WSO2's products like BAM and CEP help customers address the full data lifecycle from collection, storage, processing to analytics for big data use cases. The document outlines an example big data architecture implemented using WSO2 components along with other technologies like Cassandra.
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive..., by Merce Crosas
This document discusses the challenges of sharing large-scale and sensitive data and approaches to address them. It describes how data sharing needs to continue supporting discovery, citation, access and reuse of data as datasets increase in size from GBs to TBs and PBs. Current collaborations are working on integrating large datasets with Dataverse and moving computing resources closer to data storage. The document also discusses the DataTags system for sharing sensitive data while maintaining privacy and security.
The Sequence Read Archive (SRA) was created by the National Center for Biotechnology Information (NCBI) to store and distribute raw sequencing data. The SRA evolved from the Trace Archive as sequencing technologies advanced and data volumes increased dramatically. The SRA uses a new data model that stores metadata and data separately. Data is stored in common file formats and compressed to reduce storage needs, while detailed metadata is indexed to enable data discovery and access. The SRA continues to evolve its data model and tools to efficiently manage the exponential growth of sequencing data from multiple technologies and applications.
Big Data Architecture Workshop - Vahid Amiri, by datastack
Big Data Architecture Workshop
This deck is about big data tools, technologies, and layers that can be used in enterprise solutions.
TopHPC Conference
2019
The causes and consequences of too many bits, by Dipesh Lall
The document provides an overview of big data, including definitions of data units like bits and bytes. It discusses how data is growing exponentially in terms of volume, velocity, and variety. Traditional relational database management systems cannot handle this scale of data. Therefore, new approaches like NoSQL ("Not Only SQL") databases and Hadoop were developed to better manage large, diverse, and fast-moving data. These new big data architectures allow problems to be broken into pieces and processed in parallel across many servers for improved speed and scalability compared to traditional approaches. The document concludes by noting that skills like communication, presentation, and understanding business and statistics will be important for working with big data.
This document proposes a petabyte environmental tape archive and library to address the growing data storage needs of researchers generating huge quantities of data from sources like weather forecasting simulations. It would provide long-term storage for important research data that is currently being deleted in many cases due to limited storage options. The proposed system would use a SpectraLogic tape library and active archive software to provide scalable, reliable storage that researchers can afford, with different storage services depending on the data access needs.
Data management for Quantitative Biology - Basics and challenges in biomedical..., by QBiC_Tue
This lecture was presented on April 23, 2015 as the second lecture within the series "Data management for Quantitative Biology" at the University of Tübingen in Germany.
MySpace Chief Data Architect Christa Stelzmuller slides from her talk to the Silicon Valley SQL Server User Group in June 2009. Read about it on the Ginneblog: http://bit.ly/YLzle
We are living in the world of “Big Data”. “Big Data” is mainly expressed with three Vs – Volume, Velocity and Variety. The presentation will discuss how Big Data impacts us and how SAS programmers can use SAS skills in Big Data environment
The presentation will introduce Big Data Storage solution – Hadoop and NoSQL. In Hadoop, the presentation will discuss two major Hadoop capabilities - Hadoop Distributed File System (HDFS) and Map/Reduce (parallel computing in Hadoop). The presentation will show how SAS can work with Hadoop using HDFS LIBNAME, FILENAME, SAS/ACCESS to Hadoop HIVE and SAS GRID Managers to Hadoop YARN. The presentation will also introduce the concepts of NoSQL database for a big data solution.
The presentation will also introduce how SAS can work with a variety of data formats, especially XML and JSON. It will show the use case of converting XML documents to SAS datasets using the LIBNAME XMLV2 XMLMAP statement. It will also introduce REST APIs for extracting data over the internet and will demonstrate how SAS PROC HTTP can move data through a REST API.
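The REST-extraction step the presentation assigns to PROC HTTP looks like this in outline; the sketch below uses Python's standard library instead of SAS, and the endpoint, headers, and field names are hypothetical.

```python
import json
import urllib.request

# Hypothetical JSON endpoint; PROC HTTP plays the same role on the SAS side.
url = "https://api.example.com/v1/records?limit=100"
req = urllib.request.Request(url, headers={"Accept": "application/json"})

with urllib.request.urlopen(req) as resp:
    payload = json.loads(resp.read().decode("utf-8"))

# Flatten the JSON into rows, the step a JSON libname engine would perform in SAS.
rows = [(r.get("id"), r.get("value")) for r in payload.get("items", [])]
print(len(rows), "rows extracted")
```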
How to Get CNIC Information System with Paksim Ga.pptx, by danishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
OpenID AuthZEN Interop Read Out - Authorization, by David Brossard
During Identiverse 2024 and EIC 2024, members of the OpenID AuthZEN WG got together and demoed their authorization endpoints conforming to the AuthZEN API
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Best 20 SEO Techniques To Improve Website Visibility In SERP, by Pixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
UiPath Test Automation using UiPath Test Suite series, part 6, by DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and OpenAI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Generating privacy-protected synthetic data using Secludy and Milvus, by Zilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx, by SitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Taking AI to the Next Level in Manufacturing.pdf, by ssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
5. Ideas and approaches to help build your organization's AI strategy.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many features provide convenience and capability at the cost of security. This best practices guide outlines steps users can take to better protect their personal devices and information.
HCL Notes and Domino License Cost Reduction in the World of DLAU, by panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf, by Malak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
TrustArc Webinar - 2024 Global Privacy Survey, by TrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
CAKE: Sharing Slices of Confidential Data on Blockchain, by Claudio Di Ciccio
Presented at the CAiSE 2024 Forum, Intelligent Information Systems, June 6th, Limassol, Cyprus.
Synopsis: Cooperative information systems typically involve various entities in a collaborative process within a distributed environment. Blockchain technology offers a mechanism for automating such processes, even when only partial trust exists among participants. The data stored on the blockchain is replicated across all nodes in the network, ensuring accessibility to all participants. While this aspect facilitates traceability, integrity, and persistence, it poses challenges for adopting public blockchains in enterprise settings due to confidentiality issues. In this paper, we present a software tool named Control Access via Key Encryption (CAKE), designed to ensure data confidentiality in scenarios involving public blockchains. After outlining its core components and functionalities, we showcase the application of CAKE in the context of a real-world cyber-security project within the logistics domain.
Paper: https://doi.org/10.1007/978-3-031-61000-4_16
Climate Impact of Software Testing at Nordic Testing Days, by Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at a smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.