Lessons From Building and Scaling LinkedIn
Jay Kreps

Video and slides synchronized, mp3 and slide download available at http://bit.ly/145Y513.

Jay Kreps discusses the evolution of LinkedIn's architecture and the lessons learned scaling from a monolithic application to a distributed set of services, and from one database to distributed data stores. Filmed at qconnewyork.com.

Jay Kreps is a Principal Staff Engineer at LinkedIn, where he acts as the technical lead for the data infrastructure and relevance areas. He is the original engineer for several open source projects developed there, including Voldemort (a key-value store), Azkaban (a Hadoop workflow scheduler), and Apache Kafka (a distributed messaging system).

Published in: Technology, Business

Transcript of "Lessons from Building and Scaling LinkedIn"

  1. Lessons From Building and Scaling LinkedIn
     Jay Kreps
  2. InfoQ.com: News & Community Site
     • 750,000 unique visitors/month
     • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese)
     • Post content from our QCon conferences
     • News 15-20 / week
     • Articles 3-4 / week
     • Presentations (videos) 12-15 / week
     • Interviews 2-3 / week
     • Books 1 / month
     Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/linkedin-architecture-stack
  3. Presented at QCon New York
     www.qconnewyork.com
     Purpose of QCon
     - to empower software development by facilitating the spread of knowledge and innovation
     Strategy
     - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams
     - speakers and topics driving the evolution and innovation
     - connecting and catalyzing the influencers and innovators
     Highlights
     - attended by more than 12,000 delegates since 2007
     - held in 9 cities worldwide
  4. what’s changed?
     • data size
     • traffic
     • # engineers
     • code size
  5. what did I learn?
  6. scaling the site
  7. #1: scalability is about managing state
  8. and state belongs in data systems
  9. in the beginning…
  10. …going distributed…
  11. …and now
  12. #2: make simple cheap scalable primitives
  13. #3: ops first
  14. #4: do hard things later
  15. LinkedIn data systems
     • Voldemort: key/value storage
       – 400k peak queries per second
       – 400 tables
     • Kafka: messaging
       – 460k peak unique message writes per second
       – 2.3m peak message reads
     • GraphDB: graph operations
       – 80k peak queries per second
     • Hadoop: batch processing and analytics
       – 5000 nodes
       – 17 PB
       – 30,000 jobs per day
       – 700 users
  16. services
  17. cutting up the application
  18. in the beginning…
  19. …going distributed…
  20. …and now
  21. #5: services (may) scale development
  22. #6: Bad Services << No Services << Good Services
  23. Service layer evolution
     • HTTP + serialized java
     • Protocol Buffers + TCP
     • REST + JSON
  24. #7: the service contract is binary
  25. #8: isolation vs utilization
  26. scaling software engineering
  27. cutting up the code base
  28. #9: build your process
  29. we model the complete lifecycle: review, checkin, build, rollout
  30. governance
  31. communism vs capitalism
  32. #10: treat code as property
  33. #11: but you need effective government
  34. #12: don’t duplicate integration layers
  35. agility at scale
  36. #14: agility requires safety
  37. two cheers for testing
  38. tests are focused on predicting things that will go wrong
  39. #15: measurement > prediction
  40. how to measure cost?
  41. #16: reduce time in error or exposure to error
  42. jay.kreps@linkedin.com
     http://www.linkedin.com/in/jaykreps
     @jaykreps
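
Slides 39 to 41 ask how to measure the cost of a bad change and answer with "reduce time in error or exposure to error". One possible reading, which is an assumption here and not something the slides spell out, is that cost scales with the product of the two: the fraction of traffic that sees the bad change, multiplied by how long it stays live. The minimal Python sketch below illustrates that reading. The `Rollout` class, the `affected_requests` function, and every number in it are hypothetical and are not LinkedIn figures.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """Hypothetical description of how a bad change reaches users."""
    exposure: float          # fraction of traffic hitting the bad change, 0.0 to 1.0
    minutes_in_error: float  # time from release until the change is rolled back


def affected_requests(rollout: Rollout, requests_per_minute: float) -> float:
    """Assumed cost model: number of requests served by the bad change."""
    if not 0.0 <= rollout.exposure <= 1.0:
        raise ValueError("exposure must be between 0 and 1")
    return rollout.exposure * rollout.minutes_in_error * requests_per_minute


# Illustrative scenarios only.
full_release = Rollout(exposure=1.0, minutes_in_error=60)     # baseline
canary_release = Rollout(exposure=0.05, minutes_in_error=60)  # less exposure
fast_rollback = Rollout(exposure=1.0, minutes_in_error=5)     # less time in error

if __name__ == "__main__":
    scenarios = [
        ("full release", full_release),
        ("canary release", canary_release),
        ("fast rollback", fast_rollback),
    ]
    for name, rollout in scenarios:
        print(f"{name}: {affected_requests(rollout, 10_000):,.0f} affected requests")
```

Under this assumed model, either lever shrinks the cost independently: the canary scenario cuts exposure, the fast-rollback scenario cuts time in error, and neither depends on having predicted the failure in advance.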
