Talk delivered at the Nowhere Developers Conference in Bentonville, AR.
March 15, 2018
Video link: https://youtu.be/PkAkDhBfLWk
In this tale, a software team attempts to build a new system to replace their old system that was failing because of its inability to scale. The system they end up building meets all their criteria for scaling, but they discover that it has broken other criteria in ways that they did not anticipate.
David Max is a Senior Software Engineer at LinkedIn in New York City where he helps build software systems to connect the world’s professionals and create economic opportunity for every member of the global workforce. He earned his undergraduate degree in Computer Science from the California Institute of Technology, and has a master's degree in Computer Science from New York University. He has previously worked at Google and in the financial technology field.
About the Conference: Nowhere Developers Conference 2018 brought together hundreds of developers and engineers from across the U.S. for a one-day, pure tech conference in Bentonville, AR on March 15. With a mission of showcasing the incredible development and engineering talent in middle America, the conference featured local, regional, and national speakers from companies like Google, Mozilla, Mailchimp, and Walmart speaking about the state of the industry, software engineering, emerging technology such as machine learning and cryptocurrency, and the region's growing technology ecosystem. For more information, visit www.nowheredevelopers.com or find the conference on Facebook, Twitter, and Instagram.
A Tale of Two Systems - Insights from Software Architecture
1. A TALE OF TWO SYSTEMS:
INSIGHTS FROM
SOFTWARE ARCHITECTURE
DAVID MAX
Senior Software Engineer
2. ABOUT LINKEDIN NEW YORK CITY
● Located in Empire State Building.
● Approximately 90 engineers out of
about 1000 employees total.
● Multiple teams, front end, back end and
data science.
3. WHAT “TWO SYSTEMS”? #nwd2018
System 1
● A working system that is nearing the limits of its capacity.
System 2
● The replacement system designed to address the capacity issues.
○ Solves the capacity problem…
○ …but utterly fails in other ways.
4. ANTI-PATTERN
“A common response to a recurring
problem that is usually ineffective and
risks being highly counterproductive.”
– Wikipedia
“An antipattern is just like a pattern,
except that instead of a solution it gives
something that looks superficially like a
solution but isn’t one.”
– Andrew Koenig
5. COACH VS. ROOKIE
More powerful conceptual
models help us better make
sense of what we see.
6. WHAT THE COACH HAS IS...
“...a set of mental abstractions that allow him to convert his
perceptions of raw phenomena, such as a ball being passed, into a
condensed and integrated understanding of what is happening,
such as the success of an offensive strategy.
The coach watches the same game that the rookie does, but he
understands it better.”
– George Fairbanks, Just Enough Software Architecture
7. THINKING LIKE A
COACH -
CONCEPTUAL MODELS
“Software Architecture refers to the high
level structures of a software system, the
discipline of creating such structures,
and the documentation of these
structures. These structures are needed
to reason about the software system.”
– Wikipedia
“Software architecture is the set of design
decisions which, if made incorrectly, may
cause your project to be cancelled.”
― Eoin Woods
What is Software Architecture?
8. ARCHITECTURALLY SIGNIFICANT REQUIREMENTS (ASRs)
Constraints - Unchangeable design decisions, usually given, sometimes
chosen.
Quality Attributes - Externally visible properties that characterize how
the system operates in a specific context.
Influential Functional Requirements - Features and functions that
require special attention in the architecture.
Other Influencers - Time, knowledge, experience, skills, office politics,
your own geeky biases, and all the other stuff that sways your decision
making.
― Michael Keeling, Design It!
9. QUALITY ATTRIBUTES - STANDARD BLENDER
Pros:
● Powerful motor (550 Watts)
● Sits well on kitchen counter
● Dishwasher safe
Cons:
● Must be plugged in
● Limited portability
(example from Design It! by Michael Keeling)
10. CORDLESS RECHARGEABLE HAND BLENDER
Pros:
● Small, very portable
● Doesn’t need electric outlet to operate
● Very easy to clean
Cons:
● Less powerful (2.5 Watts)
● Needs to be recharged after 20 minutes
● Must hold in hand to operate
11. CHAINSAW BLENDER
Pros:
● Portable, doesn’t need
electric outlet
● Powerful! (37cc gas-powered
engine)
Cons:
● Tad loud
● Emits exhaust unsafe for
indoor use
● Not suitable for kitchen
countertop use
12. TAKEAWAYS
● Three solutions for accomplishing the same task
● Each solution promotes a different set of quality attributes
● Quality attributes often trade off against each other
● The “best” design depends on which properties are most highly valued
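The last takeaway can be made concrete with a small weighted-scoring sketch. The 1-5 attribute scores and the three sets of weights below are hypothetical values chosen for illustration; they are not from the talk or from Design It!. The point is only that the same three designs rank differently once the priorities change.

```python
# Hypothetical 1-5 scores for each blender on three quality attributes.
# These numbers are illustrative assumptions, not measurements.
DESIGNS = {
    "standard": {"power": 4, "portability": 1, "kitchen_friendly": 5},
    "cordless hand": {"power": 1, "portability": 5, "kitchen_friendly": 4},
    "chainsaw": {"power": 5, "portability": 4, "kitchen_friendly": 1},
}


def best_design(weights):
    """Return the name of the design with the highest weighted score."""
    def score(attrs):
        return sum(weights.get(name, 0) * value for name, value in attrs.items())

    return max(DESIGNS, key=lambda name: score(DESIGNS[name]))


# Three hypothetical sets of priorities (higher weight = valued more).
home_cook = {"power": 1, "portability": 0, "kitchen_friendly": 3}
picnic = {"power": 0, "portability": 3, "kitchen_friendly": 1}
outdoor_power = {"power": 3, "portability": 2, "kitchen_friendly": 0}

for label, weights in [("home cook", home_cook),
                       ("picnic", picnic),
                       ("outdoor power", outdoor_power)]:
    print(label, "->", best_design(weights))
```

With these assumed numbers each set of priorities selects a different blender, which is the trade-off the slide describes.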
16. PROBLEMS
● Aggregator terminates with an out-of-memory error on the
largest inputs.
● Task Manager shows there’s plenty of memory left.
● A single memory allocation is requesting well over 500MB at
once, and fails.
WHO NEEDS 500MB at once?
If there is plenty of memory left, why is it failing?
17. WIN32 PROCESS ADDRESS SPACE
80000000 - FFFFFFFF (2 GB): System virtual address space. Reserved for use by system.
00000000 - 7FFFFFFF (2 GB): Per-process virtual address space. Available for use by applications.
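The arithmetic behind the 2 GB / 2 GB split can be checked in a few lines. This is a minimal sketch that assumes the default 32-bit layout shown on the slide; the variable and function names are made up for illustration.

```python
GIB = 2**30  # bytes in one gibibyte

# A 32-bit pointer can address 2**32 distinct bytes.
total_bytes = 2**32

# Address ranges from the slide (inclusive bounds).
user_start, user_end = 0x00000000, 0x7FFFFFFF
system_start, system_end = 0x80000000, 0xFFFFFFFF


def region_size(start, end):
    """Size in bytes of an inclusive address range."""
    return end - start + 1


user_gib = region_size(user_start, user_end) // GIB
system_gib = region_size(system_start, system_end) // GIB

print(total_bytes // GIB)  # 4
print(user_gib)            # 2
print(system_gib)          # 2
```

However much physical memory the machine has, a process in this layout can only map what fits inside its own 2 GB range.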
19. ADDRESS SPACE FRAGMENTATION
Even with plenty of memory available, fragmentation of the
address space means there’s not enough contiguous address space
to fit this new block:
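A toy model of the situation, assuming a hypothetical layout in which four small 50 MB blocks are scattered through a 2 GB address space. The layout and helper names are invented for illustration; the real process layout is not given in the slides. Most of the space is free, yet no single gap is large enough for a 500 MB request.

```python
MB = 2**20


def free_gaps(space_size, allocations):
    """Return (start, size) for every free gap, given (start, size) allocations."""
    gaps = []
    cursor = 0
    for start, size in sorted(allocations):
        if start > cursor:
            gaps.append((cursor, start - cursor))
        cursor = max(cursor, start + size)
    if cursor < space_size:
        gaps.append((cursor, space_size - cursor))
    return gaps


def can_allocate(space_size, allocations, request):
    """A request succeeds only if one contiguous gap can hold all of it."""
    return any(size >= request for _, size in free_gaps(space_size, allocations))


# Hypothetical layout: a 50 MB block placed every 400 MB in a 2 GB space.
space = 2048 * MB
allocations = [(i * 400 * MB, 50 * MB) for i in range(1, 5)]

gaps = free_gaps(space, allocations)
total_free = sum(size for _, size in gaps)
largest_gap = max(size for _, size in gaps)

print(total_free // MB)   # 1848 MB free in total
print(largest_gap // MB)  # 400 MB is the largest contiguous gap
print(can_allocate(space, allocations, 500 * MB))  # False
```

A tool that reports only total free memory would show plenty available here, while the 500 MB request still fails.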
20. COACHABLE MOMENT
● Don’t wait until your system is already blowing up.
● Some scaling problems can’t be solved by buying a bigger computer.
21. LET’S FIX IT!
Symptom: Aggregator is failing with an out-of-memory error.
Reason: Output file is too large to fit in a Win32 memory mapped file.
Analysis: Current implementation can’t scale beyond a certain size output.
Conclusion: We have a scalability problem.
Solution: Replace aggregation data store with a more scalable solution.
28. ROOKIE MISTAKES
● Include all constraints
○ Fixated on scalability
○ Forgot that we also had an important time constraint!
● Quality Attributes
○ Worried mainly about scalability, time to implement, and reducing
changes to other parts of the system.
○ Forgot that quality attributes trade off against each other, and did
not analyze to what extent scalability is an ASR.
● Other differences
○ Single process memory mapped files have different performance
characteristics from in-memory distributed data caches.
29. SIGNIFICANT DIFFERENCES
Scenario - Lots of workers writing to same record.
Memory Mapped File - Best performance because the memory page is
most likely to be in memory. Less likely to need to swap to disk.
[Diagram: five Workers, CPU Cache, Memory Page, Mapped Address Range, File on Disk]
30. IN-MEMORY DISTRIBUTED CACHE
Scenario - Lots of workers writing to same record.
Worst performance when workers write to the
same record on different machines because of
node-to-node synchronization.
[Diagram: four Workers and a cluster of six Nodes]
31. IN-MEMORY DISTRIBUTED CACHE
Scenario - Lots of workers writing to same node.
Poor performance because unable to distribute load.
[Diagram: eight Workers and a cluster of six Nodes]
32. MEMORY MAPPED FILE
Scenario - Every worker writes to a different record.
Worse performance, because fewer cache hits,
more page faults, and more disk I/O.
[Diagram: five Workers, CPU Cache, two Memory Pages, Page Fault, Mapped Address Range, File on Disk]
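The contrast between the two memory-mapped-file scenarios can be sketched by counting how many distinct pages a batch of writes lands on. This is a rough proxy, not a measurement of page faults or disk I/O. The record size, record count, and function name are hypothetical; the slides do not describe the real file format.

```python
import mmap
import tempfile

RECORD_SIZE = 64      # hypothetical fixed record size in bytes
RECORD_COUNT = 4096   # hypothetical number of records in the file


def pages_touched(record_ids):
    """Increment one byte in each listed record through a memory-mapped
    temporary file and return how many distinct pages were written."""
    pages = set()
    with tempfile.TemporaryFile() as f:
        f.truncate(RECORD_SIZE * RECORD_COUNT)  # zero-filled backing file
        with mmap.mmap(f.fileno(), 0) as view:
            for rid in record_ids:
                offset = rid * RECORD_SIZE
                view[offset] = (view[offset] + 1) % 256
                pages.add(offset // mmap.PAGESIZE)
    return len(pages)


# Scenario 1: every write hits the same record, so a single page stays hot.
same_record = pages_touched([7] * 1000)

# Scenario 2: every write hits a different record, spread over the whole file.
different_records = pages_touched(range(RECORD_COUNT))

print(same_record, different_records)
```

The more pages a workload spreads across, the more likely some of them are not resident when written, which is where the extra page faults and disk I/O on the slide come from.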
33. IN-MEMORY DISTRIBUTED CACHE
Scenario - Records associated with particular nodes. Load distributed over nodes.
Best performance. Record locality minimizes node-to-node synchronization.
Distributing connections over the cluster promotes better scaling.
[Diagram: nine Workers and a cluster of six Nodes]
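The three distributed-cache scenarios can be compared with a toy routing model. It assumes each record has one owning node chosen by a stable hash, and it counts a write as "remote" when it arrives at any other node and would therefore need node-to-node synchronization. The node names, the hashing scheme, and the workload are all hypothetical; the slides do not name the cache product or its partitioning strategy.

```python
import zlib

NODES = [f"node-{i}" for i in range(6)]  # hypothetical six-node cluster


def owner(record_key):
    """Map a record key to its owning node using a stable hash."""
    return NODES[zlib.crc32(record_key.encode()) % len(NODES)]


def simulate(writes, pick_node):
    """Return (remote_writes, per_node_load) for a routing policy."""
    remote = 0
    load = dict.fromkeys(NODES, 0)
    for worker_id, key in writes:
        node = pick_node(worker_id, key)
        load[node] += 1
        if node != owner(key):
            remote += 1  # this write would need cross-node synchronization
    return remote, load


# Hypothetical workload: 8 workers each write to the same 100 records.
writes = [(w, f"record-{r}") for w in range(8) for r in range(100)]

# Record locality: each write is routed to the node that owns the record.
remote_local, load_local = simulate(writes, lambda w, key: owner(key))

# Workers pinned to nodes regardless of record: same record, different machines.
remote_pinned, load_pinned = simulate(writes, lambda w, key: NODES[w % len(NODES)])

# Every worker connected to one node: no remote spread, but no load distribution.
remote_hot, load_hot = simulate(writes, lambda w, key: NODES[0])

print("locality:", remote_local, load_local)
print("pinned:  ", remote_pinned, load_pinned)
print("hot node:", remote_hot, load_hot)
```

In this model, routing by record owner is the only policy that produces zero remote writes while still spreading load across the cluster, matching the "best performance" case on the slide.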
34. CONCLUSION
● Thinking about the architecture helps us better understand how what
we are building addresses the important requirements.
● Promoting one quality attribute usually involves some kind of tradeoff.
Software Engineering is the discipline of balancing tradeoffs.
● The architecture is the hardest thing to change after the fact, so it pays
to invest some time up front analyzing the ASRs.
● Don’t wait until your system is falling over to make needed changes.
Less time spent on the architecture up front often means more time
spent doing avoidable rework later.