This document describes Scality's experience building its first Node.js project: a TiVo-like cloud service for 25 million users that required a high degree of parallelism and throughput on the order of terabytes per second. It also covers lessons learned about logging performance, the event loop and buffers, and useful Node.js tools.
3. When to use object storage?
1. Need for capacities beyond 100 TB and growing fast
2. Very large number of clients accessing isolated data
3. Objects must be > 100 KB, otherwise use a database
Bucket 1
Object A
Object B
Object C
Bucket 2
Object A
Object B
Object …
Object Z
5. Copyright Scality 2014
Our first Node.js project - Building a TiVo in the Cloud
• 25 million users -> Designed for a high degree of parallelism
• TB/sec -> Need very efficient network transfer
• Scales out by adding nodes and drives
• Proved 30 GB/sec of ingest with 10 servers and 360 drives
[Figure: The Comcast Live Recorder + Chunking application server sends each A/V fragment through Scality FanOut (1 fragment sent with X fanout) to 1 to 10 storage servers (SS1, SS2 ... SS10). Each fragment is erasure coded (7,3): 7 data slices and 3 code slices, plus a metadata chunk, spread across the HDDs.]
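The numbers on these slides can be worked through with a little arithmetic. The sketch below assumes that "(7,3)" means 7 data slices plus 3 code slices per fragment, that the code can recover a fragment from any 7 of its 10 slices, and that the 30 GB/sec of ingest is spread evenly over servers and drives. The function names are hypothetical and only illustrate the calculation.

```javascript
// Assumption: "(7,3)" means 7 data slices plus 3 code slices per fragment.
function erasureProfile(dataSlices, codeSlices) {
  const totalSlices = dataSlices + codeSlices;
  return {
    totalSlices,
    // Raw bytes stored per byte of user data.
    storageOverhead: totalSlices / dataSlices,
    // Slices that can be lost while the fragment stays recoverable,
    // assuming any `dataSlices` slices are enough to rebuild it.
    toleratedLosses: codeSlices,
  };
}

// Even split of an aggregate rate over a number of units (an idealization).
function evenShare(total, units) {
  return total / units;
}

const profile = erasureProfile(7, 3);
const perServerGBps = evenShare(30, 10);         // ingest per server, in GB/sec
const perDriveMBps = evenShare(30, 360) * 1000;  // ingest per drive, in MB/sec

console.log(profile.storageOverhead.toFixed(2), perServerGBps, perDriveMBps.toFixed(1));
```

Under these assumptions the overhead is 10/7 of the user data, each server takes about 3 GB/sec, and each drive takes roughly 83 MB/sec of user data before erasure-coding overhead is counted.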
Test case
Duration: 2 hours
Recordings: 20,000
Batch Size: 2500
Sockets: 1000
RPM per Client: 270
Threads per Client: 63
Latency (seconds)
Average: 0.159
at 95%: 0.319
at 99%: 0.426
7. What we built: Three Key Components
S3-Server
• AWS S3 compatible server
• Open source: https://github.com/scality/s3
• Can use local storage
S3-MetaData
• A distributed metadata database service
• Supports fast bucket & object listing
• Stores ACLs and Users/Groups
S3-Vault
• Security, Identity & Authentication Service
• Provides Accounts/Keys
• Supports AWS IAM Users & Groups
• Interoperable with user directory services (via SAML)
9. Logging is hard
• Challenges
• Logging is expensive as it taxes the Node.js process
• Sending logs as UDP datagrams incurs expensive DNS lookups
• Redundant transformations by bunyan and bunyan-logstash
• Solution: Werelogs
• Produces raw JSON logs along the path of least resistance
• Forward logs to ELK using Filebeat for indexing
• Avoids expensive and redundant transformations
• Ability to track requests across the components with UIDs
• Dump log history on errors
Open source -> http://github.com/scality/werelogs
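The ideas on this slide (raw JSON lines, a UID per request, dumping the log history on errors) can be sketched in a few lines. This is not the werelogs API: the class name, the `req_id` field and the `sink` callback are hypothetical, chosen only to illustrate the technique.

```javascript
const crypto = require('crypto');

// Hypothetical request-scoped logger; NOT the werelogs API.
class RequestLogger {
  constructor(sink, uid = crypto.randomBytes(8).toString('hex')) {
    this.sink = sink;     // function that receives one serialized JSON line
    this.uid = uid;       // the same UID is attached to every entry of the request
    this.history = [];    // debug entries held back until an error occurs
  }

  _entry(level, message, fields) {
    // One raw JSON object per line, with no further transformation.
    return JSON.stringify({
      time: Date.now(), level, req_id: this.uid, message, ...fields,
    });
  }

  debug(message, fields = {}) {
    // Cheap path: serialize and buffer, but do not emit.
    this.history.push(this._entry('debug', message, fields));
  }

  info(message, fields = {}) {
    this.sink(this._entry('info', message, fields));
  }

  error(message, fields = {}) {
    // Dump the buffered history first, then the error itself.
    for (const line of this.history) {
      this.sink(line);
    }
    this.history = [];
    this.sink(this._entry('error', message, fields));
  }
}
```

A sink such as `(line) => process.stdout.write(line + '\n')` writes JSON lines that a shipper like Filebeat could pick up from a file or pipe. Passing the same `uid` to the loggers of other components is one way to follow a request across them.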
13. The performance cycle
Code, Benchmark, … Repeat
• Nagle's algorithm is on by default on sockets -> very high latencies
• The event loop can get backed up quickly -> hunt for all CPU-intensive tasks in the main loop
• Buffers are much more efficient when writing the server response
• Micro-optimizations: Date.now() is faster than new Date()
• Beware of libraries doing way too many things for you
• ES6 support: Babel 5 was killing performance -> moved to Babel 6
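Three of the points above can be illustrated with core Node.js modules only. The sketch below is not Scality's code: the payload and the function names are hypothetical. It disables Nagle's algorithm with `socket.setNoDelay(true)`, answers with a Buffer serialized once at startup, and splits a CPU-heavy loop into slices with `setImmediate` so that the event loop can serve I/O in between. Newer Node.js releases may already disable Nagle by default for HTTP servers.

```javascript
const http = require('http');

// Static payload serialized once at startup (hypothetical content).
const PONG = Buffer.from(JSON.stringify({ status: 'ok' }));

// Disable Nagle's algorithm so small writes are sent immediately.
function disableNagle(socket) {
  socket.setNoDelay(true);
}

// Write the response from a ready-made Buffer instead of rebuilding a string.
function handleRequest(req, res) {
  res.writeHead(200, {
    'Content-Type': 'application/json',
    'Content-Length': PONG.length,
  });
  res.end(PONG);
}

function createServer() {
  const server = http.createServer(handleRequest);
  server.on('connection', disableNagle);
  return server;
}

// CPU-heavy work split into slices; setImmediate yields to the event loop
// between slices so pending I/O callbacks can run.
function sumInSlices(values, sliceSize, done) {
  let total = 0;
  let index = 0;
  function runSlice() {
    const end = Math.min(index + sliceSize, values.length);
    for (; index < end; index++) {
      total += values[index];
    }
    if (index < values.length) {
      setImmediate(runSlice);
    } else {
      done(total);
    }
  }
  runSlice();
}
```

Calling `createServer().listen(8000)` would start the server; the port is an arbitrary choice. Whether a given slice size helps has to be measured, in keeping with the code, benchmark, repeat cycle.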
16. Nifty Node Tools
Getting going
• Airbnb JavaScript Style Guide + Eslint
• babel - babel5 to babel6 with just imports, destructuring and default parameters
• Commander — cool cli tools in minutes
• Async
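For readers unfamiliar with the ES6 features named above, here is a minimal sketch of destructuring and default parameters. The function and field names are hypothetical. The import is written with `require` so the snippet runs under plain Node.js without Babel; the ES6 form is shown in the comment.

```javascript
// With Babel this line could be written: import { format } from 'util';
const { format } = require('util');

// Destructuring in the signature pulls out only the fields that are used;
// the default parameter replaces a manual "extension = extension || 'ts'" check.
function fragmentName({ recordingId, index }, extension = 'ts') {
  return format('%s-%d.%s', recordingId, index, extension);
}

console.log(fragmentName({ recordingId: 'rec42', index: 7 }));
```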
17. Nifty Node Tools
Getting serious
• Level — LevelDB wrapper for node
• Memcached — client library for node
• xml — <parse>yes</parse>
• Profiler — Go fast or go home
19. Nifty Node Tools
Docs and Open Source Code
• Docs are good, but
• Code is even better
• Read the readable stream code and take a nap.
• Then read the transform stream code and create new universes.
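As a taste of what a transform stream can do, the sketch below re-chunks an incoming byte stream into fixed-size pieces using the core `stream.Transform` class. The class name and the chunk size are hypothetical; this only illustrates the mechanism and is not code from the project.

```javascript
const { Transform } = require('stream');

// Hypothetical example: re-chunk a byte stream into fixed-size pieces.
class FixedSizeChunker extends Transform {
  constructor(chunkSize) {
    // Object mode on the readable side keeps each pushed piece separate.
    super({ readableObjectMode: true });
    this.chunkSize = chunkSize;
    this.pending = Buffer.alloc(0);
  }

  _transform(data, encoding, callback) {
    // Append the new bytes, then emit as many full pieces as possible.
    this.pending = Buffer.concat([this.pending, data]);
    while (this.pending.length >= this.chunkSize) {
      this.push(this.pending.subarray(0, this.chunkSize));
      this.pending = this.pending.subarray(this.chunkSize);
    }
    callback();
  }

  _flush(callback) {
    // Emit whatever is left when the input ends.
    if (this.pending.length > 0) {
      this.push(this.pending);
      this.pending = Buffer.alloc(0);
    }
    callback();
  }
}
```

A typical use would be `source.pipe(new FixedSizeChunker(1024 * 1024)).on('data', handlePiece)`, where `source` and `handlePiece` stand in for a readable stream and a consumer supplied by the caller.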