Enforcing Standards
in Ten Programming Languages
A. Jesse Jiryu Davis and Samantha Ritter
Cat-Herd's Crook
Samantha Ritter
@SamWhoCodes
A. Jesse Jiryu Davis
@JesseJiryuDavis
Agenda
• The problem
• Why the problem lingered
• Failed attempts
• Solution: YAML!
• Why it worked
• The payoff
MongoDB?
MongoDB Drivers
Similar stack, many languages
application driver
• C
• C++
• C#
• Java
• Node.js
• Perl
• PHP
• Python
• Ruby
• Scala
• Go
• Haskell
• Erlang
• Rust
• R
• Swift
• Lua
The Problem
unintentional differences
[Diagram: one driver, two servers with ping times of 10 ms and 20 ms]
For Example
The Spec Sayeth:
“Determine the "nearest" member as the one with the quickest response to a ping.
Choose a member randomly from those at most 15ms "farther" than the nearest member.”
Did We Have To
Let It Linger?
• Too hard to review
• No authority
• (Especially in open source)
False Starts
• Prose description
• Reference implementation
False Starts
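    import random

    def choose_member(servers):
        # Find the fastest server's ping time.
        best_ping_time = min(
            server.ping_time for server in servers)
        # Keep every server at most 15 ms slower than the fastest.
        filtered_servers = [
            server for server in servers
            if server.ping_time <= best_ping_time + 15]
        # Pick one of the eligible servers at random.
        return random.choice(filtered_servers)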
False Starts
• Prose description
• Reference implementation
• Tests in English prose
False Starts
“Given servers with ping times 10ms and 20ms, the driver reads from both equally.”
False Starts
• Prose description
• Reference implementation
• Tests in English prose
• Cucumber
Cucumber
Feature: Local Threshold
In order to support local threshold
As a driver author
I want to verify driver behavior
Scenario: Nearest
Given a replica set with 2 nodes
And node 0 has latency 10
And node 1 has latency 20
And a document written to all nodes
When I track server latency on all nodes
And I query with localThresholdMS 15
Then the query occurs on either node
False Starts
• Prose description
• Reference implementation
• Tests in English prose
• Cucumber
Long
Dark
Night
… but then came the dawn.
Specs in YAML
What’s YAML?
• Yet Another Markup Language
• YAML Ain’t Markup Language
What’s YAML?
• Lightweight configuration language
• A “simple language that describes data”
What’s YAML?
• Most languages can parse YAML
• YAML converts easily to JSON
Why YAML?
• Compared to JSON: more human-readable
• Compared to XML: nicer
• Compared to Cucumber: language-neutral
Why YAML?
• Standard comment format
# A primary server
address: "a:27017"
avg_rtt_ms: 20
type: RSPrimary
tags:
  data_center: nyc

# A secondary server
address: "b:27017"
avg_rtt_ms: 10
type: RSSecondary
tags:
  data_center: sf
Why YAML?
• “programmatic”
# A primary server
- &PrimaryServer
  address: "a:27017"
  avg_rtt_ms: 20
  type: RSPrimary
  tags:
    data_center: nyc

# A group of servers
servers:
  - *PrimaryServer
  - *OtherServer
Test specs in YAML
• One set of tests used by all drivers
• Driver authors write a harness for YAML test suite
• New “mini-language” for each spec, but shared
across all drivers
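A harness of this kind can be quite small. The sketch below is a hypothetical illustration, not any driver's actual harness: the test layout (`description`, `input`, `expected`), the `eligible_ping_times` function and the `run_suite` helper are all assumptions made for the example. It loads JSON rather than YAML only because Python's standard library parses JSON directly.

```python
import json

# Hypothetical shared suite; in practice each test would live in a file
# that every driver's harness reads.
SUITE = json.loads("""
[
  {"description": "both servers within 15ms",
   "input": {"ping_times_ms": [10, 20], "threshold_ms": 15},
   "expected": [10, 20]},
  {"description": "slow server excluded",
   "input": {"ping_times_ms": [10, 30], "threshold_ms": 15},
   "expected": [10]}
]
""")


def eligible_ping_times(ping_times_ms, threshold_ms):
    """Driver code under test: keep servers within the threshold of the fastest."""
    fastest = min(ping_times_ms)
    return [t for t in ping_times_ms if t <= fastest + threshold_ms]


def run_suite(suite, function_under_test):
    """Run every test case and return the descriptions of the failures."""
    failures = []
    for case in suite:
        actual = function_under_test(**case["input"])
        if actual != case["expected"]:
            failures.append(case["description"])
    return failures


failures = run_suite(SUITE, eligible_ping_times)
print("failures:", failures)
```

The only language-specific part is `run_suite`; the test data is identical for every driver, which is the point of sharing the suite.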
YAML unittests
• YAML specs can test simple functionality
# Test for computation of round trip time,
# a value used in server selection
avg_rtt_ms: 3.1
new_rtt_ms: 36
new_avg_rtt: 9.68
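The numbers in this test are consistent with an exponentially weighted moving average using a weighting factor of 0.2, since 0.2 × 36 + 0.8 × 3.1 = 9.68. The sketch below assumes that formula; the function name `update_avg_rtt` and the constant `ALPHA` are illustrative names, and the test is written as a Python dict mirroring the YAML above.

```python
import math

ALPHA = 0.2  # assumed weighting factor; it reproduces the numbers in the test


def update_avg_rtt(avg_rtt_ms, new_rtt_ms, alpha=ALPHA):
    """Fold a new round trip time sample into the running average."""
    return alpha * new_rtt_ms + (1 - alpha) * avg_rtt_ms


# The test case from the slide, as parsed data.
test = {"avg_rtt_ms": 3.1, "new_rtt_ms": 36, "new_avg_rtt": 9.68}

actual = update_avg_rtt(test["avg_rtt_ms"], test["new_rtt_ms"])
# Compare with a tolerance, since these are floating-point values.
assert math.isclose(actual, test["new_avg_rtt"], rel_tol=1e-9)
```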
YAML unittests
• Able to support test-driven
development
• Clearly describe
elementary behavior
YAML integration tests
# Setup for test
topology_description:
  type: ReplicaSetWithPrimary
  servers:
    - &PrimaryServer
      address: "a:27017"
      avg_rtt_ms: 20
      type: RSPrimary
    - &SecondaryServer
      address: "b:27017"
      avg_rtt_ms: 10
      type: RSSecondary

# Operation performed
operation: read
read_preference:
  mode: Nearest

# Expected output,
# both servers are
# suitable
in_latency_window:
  - *PrimaryServer
  - *SecondaryServer
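A driver's harness for this test might look roughly like the following sketch. It is an illustration under assumptions, not a real driver's code: the test is written as a Python dict mirroring the YAML, the 15 ms threshold is taken from the spec excerpt quoted earlier in the talk, and the helper names are hypothetical. It only covers mode Nearest with no tags.

```python
LOCAL_THRESHOLD_MS = 15  # threshold from the spec excerpt quoted earlier

primary = {"address": "a:27017", "avg_rtt_ms": 20, "type": "RSPrimary"}
secondary = {"address": "b:27017", "avg_rtt_ms": 10, "type": "RSSecondary"}

# The parsed test: setup, operation, and expected output.
test = {
    "topology_description": {
        "type": "ReplicaSetWithPrimary",
        "servers": [primary, secondary],
    },
    "operation": "read",
    "read_preference": {"mode": "Nearest"},
    "in_latency_window": [primary, secondary],
}


def in_latency_window(servers, threshold_ms=LOCAL_THRESHOLD_MS):
    """Keep servers whose average RTT is within the threshold of the fastest."""
    fastest = min(s["avg_rtt_ms"] for s in servers)
    return [s for s in servers if s["avg_rtt_ms"] <= fastest + threshold_ms]


def addresses(servers):
    """Compare servers by address, ignoring order."""
    return sorted(s["address"] for s in servers)


actual = in_latency_window(test["topology_description"]["servers"])
assert addresses(actual) == addresses(test["in_latency_window"])
```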
[Diagram: one driver, two servers with ping times of 10 ms and 20 ms]
YAML integration tests
(link at end of talk)
“Testing with Mongo Orchestration”
talk by Emily Stolfo at MongoDB World
A Brave New World
(for specs)
• Better communication
• Better implementations
• More accountability
YAML doesn’t care
A Brave New World
(for specs)
• Better communication
• Better implementations
• More accountability
• Encourages future specs
The Open Source dream…
• We are standardizing
our drivers
• We are encouraging third-party
drivers to follow suit
• We can prove that external drivers
are as trustworthy as internal drivers
bit.ly/cat-herd
References


Editor's Notes

  • #2-#4 Who we are. We're going to teach you to herd cats with effective tools.
  • #6 open source, nosql
  • #7 Alas, drivers had unintentional differences, e.g.:
  • #9 They didn't have this problem due to lack of a spec… and how it was misinterpreted
  • #10 so we had problems like this, how come they lasted for years?
  • #12-#16 Reference impl: no one read it, I didn't maintain it.
  • #17 Closest we got to a real solution. This didn't work for aesthetic and cultural reasons. Also, Gherkin was the wrong tool for the job: not enough languages supported it, so we'd have to write an interpreter in C or whatever language. https://github.com/mongodb/mongo-meta-driver/blob/master/features/topology/replica_set/read_preference.feature https://github.com/mongodb/mongo-functional/tree/master/features
  • #18 Reference impl: no one read it, I didn't maintain it.
  • #22 YAML is an acronym! The first name is an early one. It was changed to the second to point out that it is data-oriented vs. for document markup.
  • #23 A human-readable format for data description. It’s been around for about 15 years.
  • #24 Most languages have some sort of YAML parser. YAML was designed with concepts from C, Perl, and Python, and converts easily to types in similar languages. It also converts easily to JSON (JSON is a native format for MongoDB). When we release a suite of tests, we release it in JSON and YAML.
  • #25 So, why do we like it? Readable (will talk more about this). Nicer than XML (self-explanatory). Language-neutral: parsers exist for most modern languages. And, since YAML was based on several languages, it has no strong relationship to any one in particular :D
  • #26 The first “killer feature” of YAML: comments! JSON has no standard comment format. Comments allow a test to serve as spec description as well as enforcement. These are examples of servers against which we might test server selection (reference Jesse’s earlier example).
  • #27 Second “killer feature:” it is almost programmatic! I mean, it allows you to avoid repeating data. Very useful for tests! Easy to specify similar setup across tests, or express data types repeated in different phases of tests. This is one of those servers from before, given a name (“PrimaryServer”) and then referenced in an array of servers below.
  • #28 So, what do our tests look like? One set of tests for all drivers solves our problem of “different tests testing slightly different things”: no room left for misinterpretation. Driver authors write a harness to parse the YAML test suite. Still some work, but less work, and harnesses look very similar across languages. Mini-language for each new spec: we release a README with each suite of YAML tests to explain how to parse them.
  • #29 But specifically, what do our tests look like? Here is a unit test! What is a unit test? A test that tests a small unit of work, for example, calculation of server round trip time. We need this value to determine latency for server selection.
  • #30 These tests are good! Test-driven development: this is especially helpful for community driver authors who are not privy to the spec’s internal development and discussion, and who are implementing the spec’ed feature for the first time. Again, a clearer description of the spec’s behavior, with less room for misinterpretation. Less work for each driver: authors no longer need to write their own unit tests. Previously this was left up to the driver author, hence our drivers’ test suites have vastly varying coverage :D Good as simple regression tests.
  • #31 TODO: highlight latency window values, add YAML comment. These tests are also good! They test something more like end-to-end functionality. This is a test for server selection, for the example Jesse described: two servers within the latency window. It is very clear what behavior the driver should have.
  • #32 They didn't have this problem due to lack of a spec… and how it was misinterpreted
  • #33-#34 They test something more like end-to-end functionality. This is a test for server selection, for the example Jesse described: two servers within the latency window. It is very clear what behavior the driver should have.
  • #35 Want to know more about how we use these internally? Check out Emily’s talk!
  • #36 Better communication: internally and externally. YAML tests are spec test specs. Better implementations: smaller margin for misinterpretation, test-driven development. Accountability: no special knowledge of languages needed, less time, better spec compliance, more even test coverage across drivers, internally and externally. Central “authority”: YAML doesn’t care.
  • #38 Spec-writing is still a long and difficult process. Can we really ever get 10 different project maintainers to agree on anything?? …but at least the process is worth it now! Specs WILL be adopted once written.
  • #39 We are standardizing our drivers! Fewer surprising differences in behavior. Efficient support is key to open source success: our support team is happy, unified APIs are much easier to support, and customers are happier. Good for us, but better for the people who use our product. More even test coverage is a good thing. We want community driver authors to follow suit! And now they can, and they want to! In the past, authors were upset when we surprised them with specs and very little guidance on how to implement and test them. Now, specs might still be “a surprise”, but at least they are a more well-defined, better-described, and easy-to-test surprise. In the future, we hope to open up the spec-writing process as well as the spec-testing process to the community. Lastly, most importantly, we can prove that external drivers are great! We want to support our community developers, and now we have tangible proof that their drivers are as trustworthy as ones we write internally. This is great! TODO: some kind of hiring plug
  • #40 TODO: - job openings - spec resources drivers guide? - be clear about third-party driver status