Archival Storage at Two Sigma - Josh Leners

This talk is about archival storage at Two Sigma. We begin by presenting CelFS, Two Sigma’s geo-distributed file system, which has been in deployment for over ten years. Although CelFS has scaled to serve tens of petabytes of data, it relies on physical partitioning to provide quality-of-service guarantees, has a high replication overhead, and cannot take advantage of outsourced cold storage (e.g., Amazon’s Glacier or Google’s Coldline). In the rest of the talk, we describe our response to these limitations: Jaks, a new storage system designed to reduce the TCO of CelFS and to serve as the backend for other systems at Two Sigma. We also discuss how we hedge risk in changing such a foundational system.


  1. Archival storage at Two Sigma. Josh Leners, September 13, 2018. www.twosigma.com
  2. What is Two Sigma?
     • Technology company applying a data science platform to investment management
     • Follows the scientific method for finding investment strategies
     • Over 2/3 technical staff; 72% non-financial
     • 10,000 data sources
     • 35 PB of data
     • 95,000 CPUs; 1.7 PB memory
     For illustration purposes only. Not an offer to buy or sell securities. Two Sigma may modify its investment approach and portfolio parameters in the future in any manner that it believes is consistent with its fiduciary duty to its clients. There is no guarantee that Two Sigma or its products will be successful in achieving any or all of their investment objectives. Moreover, all investments involve some degree of risk, not all of which will be successfully mitigated. Please see the last page of this presentation for important disclosure information.
  3. Data at Two Sigma
     • Analysis/news: Bloomberg, Thomson Reuters
     • Market data: prices, order books, trades
     • Other data: “We look beyond the obvious. So we can find connections that lead to the next great investment idea”
     • Modeling/research: “If x, then y and z correlate”
     • Trading tactic: “when x, buy y and sell z”
     • $$$
  4. This talk
     • Celfs: evolution of an archival file store
     • Jaks: a next generation backend
     • What an academic has learned in industry
  5. Celfs: the architecture
     • Celfs stores filesystem snapshots, or views.
     • Root servers name and locate views.
     • Data servers locate and store files.
     [Diagram: a metadata server, two root servers, and ten data servers across the NYC and CHI sites]
  6. Celfs: the data model
     Clients (/home/dir/):
     • 9/21: File 1, File A
     • 9/22: File 1, File 2, File A, File B
     • 9/23: File 3, File C
     Cels:
     • Cel 1: File 1, File A
     • Cel 2: File 2, File B
     • Cel 3: File 3, File C
     Views:
     • 2017-09-21: File 1, File A
     • 2017-09-22: File 1, File A, File 2, File B
     • 2017-09-23: File 1, File A, File 2, File B, File 3, File C
     • LATEST: File 1, File A, File 2, File B, File 3, File C
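One way to read the data model slide above is that each cel holds the files added in one snapshot, and a dated view is the union of all cels published up to that date. The sketch below encodes only that reading; `Cel` and `build_views` are hypothetical names for illustration, not Celfs APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cel:
    """Hypothetical stand-in for a cel: an immutable batch of files."""
    name: str
    files: frozenset


def build_views(cels_by_date):
    """Map each date to the sorted list of files visible in that view.

    Assumption: a view exposes every cel published on or before its date,
    and LATEST mirrors the most recent dated view.
    """
    views = {}
    visible = set()
    for day in sorted(cels_by_date):  # ISO dates sort chronologically
        visible |= cels_by_date[day].files
        views[day] = sorted(visible)
    if views:
        views["LATEST"] = sorted(visible)
    return views


cels_by_date = {
    "2017-09-21": Cel("Cel 1", frozenset({"File 1", "File A"})),
    "2017-09-22": Cel("Cel 2", frozenset({"File 2", "File B"})),
    "2017-09-23": Cel("Cel 3", frozenset({"File 3", "File C"})),
}
views = build_views(cels_by_date)
```

Under this assumption, the 2017-09-22 view contains the files of Cel 1 and Cel 2, matching the view contents listed on the slide.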
  7. Celfs: the teleology
     • Archival storage: root servers and data servers are multi-datacenter
     • CDN: publish information from one datacenter to another with strong consistency guarantees
     • High-bandwidth data source: because cels are randomly distributed, a large view can often use the whole cluster’s bandwidth
  8. Celfs drawback: storage TCO
     • Single unit of scaling: the data server
     • Lots of data center real estate, power, cooling, etc.
     • Data has three total copies (vulnerable to a small number of disk failures)
  9. Celfs drawback: performance isolation and scalability
     [Diagram: five data servers serving large-scale computations]
     • Fairness is based on per-user limits, so a single user can’t utilize the whole system.
     • Cluster-level isolation makes scaling trade-offs worse!
  10. This talk
      • Celfs: evolution of an archival file store
      • Jaks: a next generation backend
      • What an academic learned in industry
  11. JAKS: Just another keystone for storage
      Most simply:
      • put(Object) -> id
      • get(id) -> Object
      • delete(id) -> ok
      Under the hood:
      • Tiered storage
      • End-to-end encryption
      • Quality of service
      • …
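The three-call interface on the slide above can be illustrated with a toy in-memory store. This is a minimal sketch, not the Jaks implementation: the class name `InMemoryObjectStore`, the use of random UUIDs as ids, and the `KeyError` on unknown ids are all assumptions made for the example.

```python
import uuid


class InMemoryObjectStore:
    """Toy stand-in for a put/get/delete object interface."""

    def __init__(self):
        self._objects = {}

    def put(self, obj: bytes) -> str:
        # Assumption: the store, not the caller, picks an opaque id.
        object_id = uuid.uuid4().hex
        self._objects[object_id] = bytes(obj)
        return object_id

    def get(self, object_id: str) -> bytes:
        # Raises KeyError for unknown ids.
        return self._objects[object_id]

    def delete(self, object_id: str) -> str:
        del self._objects[object_id]
        return "ok"


store = InMemoryObjectStore()
object_id = store.put(b"market data")
payload = store.get(object_id)
```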
  12. Storage tiers: where data lives
      [Diagram: tiers arranged by bandwidth/speed and cost/GB: RAM, SSD, erasure-encoded disk arrays, offline storage (Glacier/Coldline/Tape); bandwidth labels: 100s Gbps, 1000s Mbps, 10s Mbps]
  13. JAKS: implementing storage tiers
      [Diagram labels: client, metadata gateway, six data gateways, two metadata servers, consistent metadata store, backing store, other sites]
  14. JAKS: implementing storage tiers
      • Clients only talk to gateways in their site
      • Freedom to change backing store and metadata store
      • Data gateways are the unit of scaling for bandwidth; their RAM/SSDs scale the cache
      • Clients load-balance across gateways to make full use of the cluster
        • Random for metadata
        • Consistent hash for data
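The client-side load balancing named on the slide above (random choice for metadata, consistent hashing for data) can be sketched as follows. The talk does not cover Jaks’s actual hashing strategy, so the ring with virtual nodes, the SHA-256 hash, and the names `HashRing` and `pick_metadata_gateway` are assumptions for illustration only.

```python
import bisect
import hashlib
import random


def _point(key: str) -> int:
    """Place a key on the ring using the first 8 bytes of its SHA-256 digest."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (a generic textbook variant)."""

    def __init__(self, gateways, vnodes=64):
        self._ring = sorted(
            (_point(f"{gateway}#{i}"), gateway)
            for gateway in gateways
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def gateway_for(self, object_id: str) -> str:
        # The first ring point clockwise from the key owns it; wrap at the end.
        index = bisect.bisect_right(self._points, _point(object_id))
        return self._ring[index % len(self._ring)][1]


def pick_metadata_gateway(gateways, rng=random):
    """Metadata requests: any gateway will do, so pick one at random."""
    return rng.choice(list(gateways))


ring = HashRing(["gw-a", "gw-b", "gw-c"])
owner = ring.gateway_for("object-27")
```

The point of hashing data requests is that repeated reads of the same object land on the same gateway, so its RAM/SSD cache stays useful; removing one gateway only remaps the objects that gateway owned.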
  15. Caching in Jaks
      • Data in Jaks can be cached with three policies
        • Pinned: data is guaranteed not to be evicted (regardless of use) until some future point in time
        • Long cycle: data is not evicted until it hasn’t been used for a few weeks
        • Short cycle: data is not evicted until it hasn’t been used for a few days
  16. Measuring access time in Jaks
      • Use two times
        • mtime (when a file was created)
        • atime (when a file was accessed)
      • Can’t use filesystem “atime” because of SSD wear
      • Use off-disk Bloom filters measuring daily access
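The slide above mentions Bloom filters that record daily access. A minimal sketch of that idea keeps one filter per day and asks whether an object may have been touched on any of a set of days. The filter size, hash count, and the names `BloomFilter` and `AccessLog` are assumptions; the slides do not describe the real layout.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


class AccessLog:
    """One Bloom filter per day (hypothetical structure)."""

    def __init__(self):
        self.days = {}

    def record(self, day: str, object_id: str) -> None:
        self.days.setdefault(day, BloomFilter()).add(object_id)

    def maybe_accessed(self, object_id: str, days) -> bool:
        return any(object_id in self.days[d] for d in days if d in self.days)


log = AccessLog()
log.record("2018-10-12", "object-27")
```

A false positive here only delays an eviction, which is a cheap mistake compared with writing an access time to SSD on every read.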
  17. Cache eviction in detail (today is Oct 13)
      [Diagram: cache entries dated Dec 5, Oct 12, Oct 7, Oct 1, Oct 13, and Oct 9, grouped under Long Cycle and Short Cycle]
      Periodically:
      1. Evict aged-out entries
      2. Check space, evict random if full
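The two periodic steps on the slide above can be sketched as a single pass over the cache. The cycle lengths (three weeks and three days), the treatment of an expired pin as aged out, and the exemption of pinned entries from random eviction are assumptions; the slides only say “a few weeks” and “a few days”.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

LONG_CYCLE = timedelta(weeks=3)   # assumption: "a few weeks"
SHORT_CYCLE = timedelta(days=3)   # assumption: "a few days"


@dataclass
class Entry:
    size: int
    policy: str                      # "pinned", "long", or "short"
    last_used: date
    pinned_until: Optional[date] = None


def aged_out(entry: Entry, today: date) -> bool:
    if entry.policy == "pinned":
        # Assumption: a pin that has expired makes the entry evictable.
        return entry.pinned_until is not None and today > entry.pinned_until
    ttl = LONG_CYCLE if entry.policy == "long" else SHORT_CYCLE
    return today - entry.last_used > ttl


def evict(cache: dict, today: date, capacity: int, rng=random) -> None:
    # Step 1: evict aged-out entries.
    for key in [k for k, e in cache.items() if aged_out(e, today)]:
        del cache[key]
    # Step 2: if still over capacity, evict random unpinned entries.
    while sum(e.size for e in cache.values()) > capacity:
        candidates = [k for k, e in cache.items() if e.policy != "pinned"]
        if not candidates:
            break
        del cache[rng.choice(candidates)]


today = date(2018, 10, 13)
cache = {
    "a": Entry(10, "pinned", date(2018, 10, 1), pinned_until=date(2018, 12, 5)),
    "b": Entry(10, "long", date(2018, 10, 7)),
    "c": Entry(10, "short", date(2018, 10, 1)),
    "d": Entry(10, "short", date(2018, 10, 12)),
}
evict(cache, today, capacity=100)
```

With these assumed cycle lengths, only the short-cycle entry last used on Oct 1 ages out; the others survive until space runs short.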
  18. End-to-end encryption
      [Diagram: client, metadata gateway, data gateways, metadata servers, consistent metadata store, and backing store; message labels: get(27), secret, PUT hash(data), 200 OK]
  19. End-to-end encryption details
      • Use an authenticated encryption scheme (AES-OCB)
      • Derive backing store names from the object’s secret
      • End-to-end check is powerful!!
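The slide above says backing store names are derived from an object’s secret but does not say how. The sketch below shows one plausible derivation using HMAC-SHA256 with a fixed label; the function names, the label, and the 32-byte secret size are all assumptions. The AES-OCB encryption itself is not shown, since it is not available in the Python standard library.

```python
import hashlib
import hmac
import secrets


def new_object_secret() -> bytes:
    """Generate a fresh random per-object secret (size is an assumption)."""
    return secrets.token_bytes(32)


def backing_store_name(secret: bytes) -> str:
    """Derive a name that reveals nothing about the secret without knowing it.

    Assumption: HMAC-SHA256 over a fixed label; the talk does not specify
    the derivation function.
    """
    return hmac.new(secret, b"backing-store-name", hashlib.sha256).hexdigest()


secret = new_object_secret()
name = backing_store_name(secret)
```

A derivation of this shape means anyone holding the secret can locate the object, while the backing store sees only an opaque name and ciphertext.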
  20. Performance isolation and bursty workflows
      [Diagram: data servers serving large-scale computations]
      Requirements:
      • Allow a user to take advantage of the whole system if it is idle
      • Prevent oversubscription from degrading service below SLA
  21. Quality of service: admission controllers
      • Need to limit bandwidth resources
        • Inbound/outbound traffic per network interface
        • Inbound/outbound traffic per backend
      • Need to limit fixed resources
        • Database connections (in metadata servers)
        • Staging space (for uncached writes/reads)
  22. Quality of service: queuing and allocation
      • Workload classes: background work, research workflow, trading daemons
      • Users: Rachel, Barry, Tom, Tina, Beth, Ralph, Randy
      • Highest priority: guarantee 30%, gets 10% of excess
      • Medium priority: guarantee 60%, gets 40% of excess
      • Lowest priority: guarantee 10%, gets 50% of excess
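The guarantee and excess percentages on the slide above suggest a two-phase allocation: first satisfy each class up to its guarantee, then split whatever is left among classes that still want more, weighted by their excess share. The sketch below works through that arithmetic. The redistribution loop and the mapping of workload classes to priorities (trading = highest, research = medium, background = lowest) are assumptions, not something the slides state.

```python
def allocate(capacity, demands, classes):
    """Return per-class allocation given guarantees and excess weights."""
    # Phase 1: each class gets its demand, capped at its guarantee.
    alloc = {
        name: min(demands.get(name, 0.0), cfg["guarantee"] * capacity)
        for name, cfg in classes.items()
    }
    spare = capacity - sum(alloc.values())
    hungry = {n for n in classes if demands.get(n, 0.0) > alloc[n]}
    # Phase 2: hand out spare capacity in proportion to excess weights.
    while spare > 1e-9 and hungry:
        total_weight = sum(classes[n]["excess"] for n in hungry)
        if total_weight == 0:
            break
        granted = 0.0
        for name in list(hungry):
            share = spare * classes[name]["excess"] / total_weight
            take = min(share, demands[name] - alloc[name])
            alloc[name] += take
            granted += take
            if demands[name] - alloc[name] <= 1e-9:
                hungry.discard(name)
        spare -= granted
    return alloc


# Assumed mapping of workload classes to the priorities on the slide.
classes = {
    "trading": {"guarantee": 0.30, "excess": 0.10},
    "research": {"guarantee": 0.60, "excess": 0.40},
    "background": {"guarantee": 0.10, "excess": 0.50},
}
demands = {"trading": 10.0, "research": 100.0, "background": 100.0}
alloc = allocate(100.0, demands, classes)
```

In this example trading needs only 10 of its guaranteed 30 units, so 20 units are spare. Research and background split them 4:5, ending at roughly 68.9 and 21.1 units.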
  23. Quality of service: flow control
      • How to allocate resources like network bandwidth?
        • Undersubscribe the OS → sub-optimal utilization
        • Oversubscribe the OS → less control over allocation
      • Need performance feedback to determine how much flow to allocate
      • How can we measure TCP performance from the user level?
  24. Measuring TCP performance from user space
      Server: send 54 KB, wait 27 µs, send 54 KB, …
      • Case 1: client can receive at the maximum allowed rate: send buffer never fills up
      • Case 2: client can’t receive at the maximum allowed rate: send buffer fills up
      Gotchas:
      • This feedback only works when RTT is low
      • Feedback is only effective if transfers are long
      • Still need to account for duty cycle on backend
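The pacing on the slide above implies a target rate of 54 KB per 27 µs, which is about 2 GB/s (16 Gbps) if KB means 1000 bytes. The simulation below illustrates the two cases without touching real sockets: a fast receiver drains the send buffer between chunks, a slow one lets it fill. The buffer size, the receiver rates, and the function names are assumptions; in a real server the “buffer is full” signal might come from a non-blocking send that cannot accept more bytes.

```python
def paced_rate(chunk_bytes: float, wait_s: float) -> float:
    """Target send rate in bytes per second for a send/wait pacing loop."""
    return chunk_bytes / wait_s


def send_buffer_fills(chunk_bytes, wait_s, receiver_bytes_per_s,
                      send_buffer_bytes, n_chunks):
    """Simulate paced sending; return True if the send buffer ever fills."""
    buffered = 0.0
    for _ in range(n_chunks):
        if buffered + chunk_bytes > send_buffer_bytes:
            return True  # Case 2: the client is not keeping up
        buffered += chunk_bytes
        # While the server waits, the receiver drains what it can.
        buffered = max(0.0, buffered - receiver_bytes_per_s * wait_s)
    return False  # Case 1: the client receives at the paced rate


CHUNK = 54_000        # 54 KB, assuming 1 KB = 1000 bytes
WAIT = 27e-6          # 27 microseconds
BUFFER = 1_000_000    # assumed 1 MB send buffer

fast_client = send_buffer_fills(CHUNK, WAIT, 2.5e9, BUFFER, 1000)
slow_client = send_buffer_fills(CHUNK, WAIT, 1.0e9, BUFFER, 1000)
```

The simulation also hints at the “transfers must be long” gotcha: the slow client only reveals itself after enough chunks have accumulated to fill the buffer.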
  25. Quality of service: backpressure
      [Diagram: the data gateway returns a response plus backlog info to the client (Ralph)]
      • Backlog at the server is communicated on every response.
      • Clients use the backlog to rate limit.
      • Rejections (queue too full) lead to exponential backoff.
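A client following the backpressure rules on the slide above needs to track one number: how long to wait before its next request. The sketch below is a minimal version of that bookkeeping. The linear mapping from backlog to delay, the constants, and the name `BackpressureClient` are assumptions; the slides do not say how clients turn backlog into a rate.

```python
class BackpressureClient:
    """Tracks the delay a client should wait before its next request."""

    def __init__(self, base_delay=0.01, max_delay=5.0, per_item_delay=0.001):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.per_item_delay = per_item_delay
        self.rejections = 0
        self.delay = 0.0

    def on_response(self, backlog: int) -> None:
        # A successful response carries the server's backlog; slow down
        # in proportion to it (assumed linear) and reset the backoff.
        self.rejections = 0
        self.delay = min(self.max_delay, backlog * self.per_item_delay)

    def on_rejection(self) -> None:
        # Queue too full: back off exponentially, up to a cap.
        self.rejections += 1
        self.delay = min(self.max_delay, self.base_delay * 2 ** self.rejections)


client = BackpressureClient()
delays = []
for _ in range(3):
    client.on_rejection()
    delays.append(client.delay)
client.on_response(backlog=0)
```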
  26. JAKS: Just another keystone for storage
      Most simply:
      • put(Object) -> id
      • get(id) -> Object
      • delete(id) -> ok
      Under the hood:
      • End-to-end encryption
      • Tiered storage (cached, normal, cold)
      • Quality of service
      • …
      Not covered:
      • Slow clients
      • High-availability restarts
      • Fault tolerance
      • Consistent hashing strategy
      • Geographic replication
  27. This talk
      • Celfs: evolution of an archival file store
      • Jaks: a next generation backend
      • What an academic learned in industry
  28. What an academic learned: measurement
      Grad school: building a measurement framework
      • Need to test hypotheses
      • Need to get graphs into the paper!
      Industry: building a measurement framework
      • Need to validate changes and measure impact (aka “test hypotheses”)
      • Need to understand performance
      • Need to detect and anticipate problems
  29. What an academic learned: hedging risk
      • Celfs is stable, important, and highly integrated
        • Can’t expect people to jump ship voluntarily
      • Need extensive exposure to find bugs and gain confidence
        • Jaks development started in January 2016; end-to-end deployment in March 2016
        • Finally made GA this month (still have a Celfs safety net)
  30. What an academic learned: compatibility
      • Academic: thick clients allow more sophisticated fault-tolerance and scaling
      • Industry: thick clients allow more sophisticated bugs to persist
  31. What an academic learned: build vs. buy decisions
      • Celfs: it’s 2006 and Hadoop is just being born from Apache Nutch
      • Jaks:
        • We want to avoid lock-in
        • Geo-redundancy is not a common ask for vendors
        • We need performance isolation
      • Ultimately, we took a hybrid approach: building gateways
  32. What an academic learned: unexpected failures
      • Jaks is designed to tolerate faults in gateways, backend stores, and other sites
        • Failure handling is the most important part of integration testing
      • Hard to predict all failure scenarios (Byzantine Fault Tolerance won’t help!)
        • Firewall configuration creates a partition to certain hosts
        • MTU settings disable Kerberos negotiation
        • Misuse of the Kerberos library causes authentication failures
        • Stale network info misdirects clients
  33. Placeholder before backup slides
  34. Gateway caching performance as a function of clients reading 100 MB
      [Figure: throughput in MBps (axis 0 to 14,000) versus number of clients (1, 10, 20, 40, 80); series: 0%, 50%, 75%, 90%, and 100% hot]
  35. Small read performance (64 KB)
      [Figure: IOPS (axis 0 to 50,000) versus number of clients (axis 0 to 160); series: 0%, 50%, 75%, 90%, and 100% hot]
