Node.js Web Apps @ eBay Scale
By Dmytro Semenov
Member of Technical Staff @ NodeJS, Cloud and Platform Services, eBay Inc.
dsemenov@ebay.com
Our Journey
Programming Language History
[Timeline diagram with marks at 1995, 2004, 2006, 2013, present and future: Perl, C++, XSL/XML, Java, Java UI, JSP Extended, OSGi, Node, Marko/Lasso (Tessa), Scala, Python, Go, ???; Polyglot initiative]
● Perl/Apache (no layers)
○ Scalability limit of 50K active listings
● C/C++/IIS/XML/XSL (monolithic client/server architecture)
○ Compiler limit of methods per class, very long build times (2+ hours)
● Java/J2EE (3 tier architecture, split build into domain specific)
○ XML/XSL (MSXSL) as presentation layer hit native memory limit
○ Java UI components, proprietary technology, steep learning curve
○ Startup time up to 40 minutes
● Java Stack (embrace open-source, JSP, modular approach)
○ OSGi conflicts at startup, very hard to get everything working at the start
○ Slow startup time (2+ minutes)
● Polyglot initiative to support popular languages/technologies
Node.JS Introduction
● Started in 2013
○ CubeJS: the first attempt, which did not go well
○ Unified NodeJS, eBay and PayPal platform merge
○ Big projects decided to move to NodeJS
● Embedding into application teams
○ Accumulation of issue knowledge base
○ Build confidence and form local knowledge centers
○ Fast feedback
○ Fast project startup
○ Pick new ideas from application teams
Current State
● Node 4.x
● 200 applications and growing
● 80 million requests / day (now 1 billion)
● Platform team of 6 developers
● Very vibrant internal community
○ Local pool of knowledge among teams
○ Self sustaining support by internal community
○ PRs are welcome
○ 45 contributors to eBay NodeJS platform
● 80 platform modules
● 330 total modules
● More agile
○ Startup time <2s
○ Test coverage close to 90%-100%
○ Faster releases (every day) vs. a 2-week cycle
○ Automatic upgrades with semver
○ Modular UI architecture based on UI components
● Better tools
○ flame graphs, waterfall charts, pre-production testing
● Learnings Applied to Java Platform
○ Incorporate best practices from NodeJS
○ Startup time < 1 minute
○ Embrace modular approach and semver
○ Lighter Java stacks
Architecture
NPM: Private
[Diagram: npmjs.ebay.com -> LB -> Kappa instances -> CouchDB (primary), replicated to CouchDB (backup); fallback to registry.npmjs.org]
Multi-Screen World
[Diagram: browser and mobile web (ajax) -> App Server -> Domain Experience Service -> services; native -> API Gateway -> Domain Experience Service]
● Experience Service
○ Provides view models for UI modules
○ View model is specific to the device type (native, size, desktop, tablet)
● NodeJS Web App
○ Talk to experience service
○ Handles UI functionality
○ Renders desktop and mobile web pages
● Native Apps
○ Talk to experience service via API gateway
Module View
[Diagram of module layers:
pm2
node.js
express (3 workers)
kraken-js, lasso/tessa, marko
platform-ebay: analytics, logging, config, service client, lasso plugin
app: services, pages, components]
Don’t Build Pages, Build Modules
The old directory structure (bad)...
src/
  pages/
  components/
src/components/login-form/
  index.js
  style.css
  template.marko
src/pages/login/
  index.js
  style.css
  browser.json
UI Technologies
Templating: Marko
markojs.com
github.com/marko-js/marko
● Async + streaming
● Custom tags
● Browser + Node.js runtime
● Lightweight
● Extremely fast
● Compiles to CommonJS
UI components: Marko Widgets
github.com/marko-js/marko-widgets
● DOM diffing/patching
● Batched updates
● Stateful widgets
● Declarative event binding
● Efficient event delegation
● Lightweight runtime
UI components built using Marko Widgets
Uses Marko templates as the view. Plus:
github.com/marko-js/marko-widgets
● DOM diffing/patching
● Batched updates
● Stateful widgets
● Declarative event binding
● Efficient event delegation
● Lightweight runtime
Speed is king
Node is built on a streaming interface
HTTP is a streaming protocol
So why then do most templating and UI libraries render
as a single chunk?
● Faster perceived page load
● Better time, resource and traffic utilization
● Progressive rendering between browser, frontend and backend
○ Frontend<->Backend: Stream of server-sent events (SSE)
○ Browser<->Frontend: Module render and flush on arrival
● Marko helps:
○ Async fragment support via marko-async
○ Re-order on client side
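The "render and flush on arrival" plus "re-order on client side" idea can be illustrated with a small sketch. This is not how Marko or marko-async are implemented; `renderProgressively`, the fragment shape (`id`, `load`) and the inline-script re-ordering are assumptions made for illustration. The page shell and placeholders are flushed immediately, and each fragment is flushed whenever its data arrives, in any order.

```javascript
'use strict';

// Escape "<" so fragment HTML cannot close the inline <script> early.
const toScriptLiteral = (value) => JSON.stringify(value).replace(/</g, '\\u003c');

// Hypothetical helper: `out` needs write() and end(), like an HTTP response.
function renderProgressively(out, fragments) {
  out.write('<html><body>');
  // Flush a placeholder per fragment so the browser can lay out the page early.
  fragments.forEach((f) => out.write(`<div id="${f.id}"></div>`));

  let pending = fragments.length;
  if (pending === 0) return out.end('</body></html>');

  fragments.forEach((f) => {
    f.load((html) => {
      // Flush on arrival; the script moves the content into its placeholder.
      out.write(`<script>document.getElementById(${toScriptLiteral(f.id)})` +
        `.innerHTML=${toScriptLiteral(html)}</script>`);
      pending -= 1;
      if (pending === 0) out.end('</body></html>');
    });
  });
}

// In-memory stand-ins for the response and for two backend calls.
const chunks = [];
const out = { write: (c) => chunks.push(c), end: (c) => chunks.push(c) };
const callbacks = {};
renderProgressively(out, [
  { id: 'header', load: (cb) => { callbacks.header = cb; } },
  { id: 'recs', load: (cb) => { callbacks.recs = cb; } }
]);
callbacks.recs('<ul><li>item</li></ul>'); // arrives first, flushed first
callbacks.header('<h1>Hi</h1>');
console.log(chunks.join('\n'));
```

The visual order is fixed by the placeholders, so a slow fragment never blocks a fast one further down the page.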
Marko vs React: Server-side rendering
github.com/patrick-steele-idem/marko-vs-react
[Diagram: browser <- http/https, chunked stream -> App Server <- http/https, SSE stream -> Domain Experience Service]
Service Invocation
Why not use the middleware pattern on the client?
[Diagram: in the web app, a request passes through middlewares (cookies, body-parser, analytics) and produces a response; in the service client, a call passes through service invocation handlers (logging, error handling, analytics, circuit breaker, retry, security/oauth) and reaches the service over http/https/sse]
● Generic Service Client with pluggable handlers
● Handler performs a single function
● The handler pipeline is configurable per client
Configuration based:
"my-service-client": {
    "protocol": "http:",
    "hostname": "myservice.com",
    "path": "/path/to/resource",
    "port": 8080,
    "socketTimeout": 1000,
    "pipeline": [
        "logging/handler",
        "circuit-breaker/handler",
        "error/handler"
    ]
}
Code based:
// Build the handler pipeline in code instead of configuration
var serviceClient = require('service-client');
serviceClient.use(require('logging/handler'));
serviceClient.use(require('circuit-breaker/handler'));
serviceClient.use(require('error/handler'));
serviceClient
    .get('http://myservice.com/path/to/resource')
    .end((err, response) => {
        if (err) {
            // response may be undefined when the call fails
            console.error(err);
            return;
        }
        console.log(response.body);
    });
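How a pluggable handler pipeline might work internally can be sketched as follows. This is not the eBay `service-client` module; `createServiceClient`, the handler signature (a function that wraps the next step) and the fake transport are assumptions made for illustration. Each handler performs a single function and the first registered handler ends up outermost.

```javascript
'use strict';

// Hypothetical client: handlers wrap the transport, first registered is outermost.
function createServiceClient(transport) {
  const handlers = [];
  return {
    use(handler) {
      handlers.push(handler);
      return this;
    },
    get(url, callback) {
      const invoke = handlers.reduceRight((next, handler) => handler(next), transport);
      invoke({ method: 'GET', url }, callback);
    }
  };
}

// Logging handler: records the call on the way in and the outcome on the way out.
const loggingHandler = (log) => (next) => (request, callback) => {
  log.push(`-> ${request.method} ${request.url}`);
  next(request, (err, response) => {
    log.push(err ? `<- error: ${err.message}` : `<- ${response.statusCode}`);
    callback(err, response);
  });
};

// Error handler: turns 5xx responses into errors for the caller.
const errorHandler = (next) => (request, callback) => {
  next(request, (err, response) => {
    if (!err && response.statusCode >= 500) {
      err = new Error(`upstream responded with ${response.statusCode}`);
    }
    callback(err, response);
  });
};

// Stand-in for the real http/https/sse transport.
const fakeTransport = (request, callback) => callback(null, { statusCode: 200, body: 'ok' });

const log = [];
const client = createServiceClient(fakeTransport)
  .use(loggingHandler(log))
  .use(errorHandler);

let result;
client.get('http://myservice.com/path/to/resource', (err, response) => {
  result = { err, response };
});
console.log(log, result.response.body);
```

Because every handler has the same shape, a pipeline listed in configuration could be turned into the same chain by requiring each entry and calling `use` on it.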
Resource Bundling and Externalization
Why not deploy everything at once?
[Diagram: browser, App Server, Resource Server, Akamai]
1. GET http://www.ebay.com
2. GET http://rs/check/hp-24512.js
3. PUT http://rs/upload/hp-24512.js
4. <script src="hp-24512.js">
5. GET http://hp-24512.js
6. GET http://hp-24512.js
lasso.js/tessa
JavaScript module bundler + asset pipeline
github.com/lasso-js/lasso
● Supported resources
○ Defined by lasso plugins
○ CSS/JS/Images
○ Templates
○ I18n content
○ Bundle definition
● Lasso plugin
○ Adaptor between lasso and resource server
● Benefits
○ Single build and deployment
○ No synchronization problems between app and resource server deployments
○ Externalization at startup and during runtime on-the-fly
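The check-then-upload flow (steps 2 to 4 of the diagram) could look roughly like the sketch below. The `externalize` function, the in-memory resource server and the MD5-based file naming are assumptions for illustration; the actual lasso plugin and resource server API are not shown in the slides.

```javascript
'use strict';
const crypto = require('crypto');

// Hypothetical adaptor between the bundler and a resource server.
function externalize(bundleName, content, resourceServer) {
  // A content fingerprint in the name means a changed bundle gets a new URL.
  const fingerprint = crypto.createHash('md5').update(content).digest('hex').slice(0, 8);
  const fileName = `${bundleName}-${fingerprint}.js`;
  if (!resourceServer.has(fileName)) {        // step 2: check
    resourceServer.upload(fileName, content); // step 3: upload only when missing
  }
  return `<script src="${resourceServer.baseUrl}/${fileName}"></script>`; // step 4
}

// In-memory stand-in for the resource server (hypothetical host name).
const store = new Map();
const resourceServer = {
  baseUrl: 'http://rs.example',
  uploads: 0,
  has: (name) => store.has(name),
  upload(name, body) {
    store.set(name, body);
    this.uploads += 1;
  }
};

const first = externalize('hp', 'console.log("home");', resourceServer);
const second = externalize('hp', 'console.log("home");', resourceServer);
console.log(first, resourceServer.uploads);
```

Since the upload happens from the app itself, at startup or on the fly, the app and its resources cannot drift apart the way two separate deployments can.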
Configuration Management
[Diagram: the app ships app/config.json and, under /node_modules, /moduleA/config.json and /moduleB/config.json. A config deployment publishes versioned entries to the CMS (app/1.0.1/config: {}, moduleA/1.1.1/config: {}, moduleB/1.0.0/config: {}). The app servers created by the app deployment pull from the CMS every minute]
● Module configuration
○ Local, packaged with module
○ Remote, hot deployed
● Application configuration
○ Local
○ Remote, hot deployed
● Application can override any module configuration
● Configuration can be injected via Admin Console
● Future: code and config “separation”, but
○ Keep app and config together in the git repo and separate them at deployment
○ Easier to manage
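Layered configuration with overrides can be sketched as a deep merge in which later layers win. The `mergeConfig` helper and the precedence order used in the example (module local, then module remote, then application override) are assumptions; the slides only state that the application can override any module configuration.

```javascript
'use strict';

const isPlainObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);

// Later layers win; nested objects are merged key by key, other values are replaced.
// Inputs are never mutated.
function mergeConfig(...layers) {
  return layers.reduce((merged, layer) => {
    Object.keys(layer || {}).forEach((key) => {
      merged[key] = isPlainObject(merged[key]) && isPlainObject(layer[key])
        ? mergeConfig(merged[key], layer[key])
        : layer[key];
    });
    return merged;
  }, {});
}

const effective = mergeConfig(
  { timeout: 1000, retry: { count: 1, backoffMs: 50 } }, // local, packaged with moduleA
  { retry: { count: 2 } },                               // remote, hot deployed for moduleA
  { timeout: 500 }                                       // application override
);
console.log(effective);
```

A hot-deployed change then only needs to re-run the merge with the newly pulled remote layer.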
i18n
● Use krakenjs/spud module
○ Property file format
○ Marko tags/helpers
● Externalizable as resources
● 17 main languages
● Support multiple languages per country
● Application and modules can have localizable content
Folder Structure Example
app/locales/
  US/
    en/
      ProjectName/
        xxx.properties
        yyy.properties
    ru/
      ProjectName/
        xxx.properties
        yyy.properties
  DE/
    de/
      ProjectName/
        xxx.properties
        yyy.properties
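Resolving a content bundle in this country/language layout might look like the sketch below. It does not use krakenjs/spud; `bundlePath` and `parseProperties` are hypothetical helpers, the locale is assumed to arrive as `language-COUNTRY` (for example `ru-US`), and only flat `key=value` lines are parsed.

```javascript
'use strict';
const path = require('path');

// Map a locale such as 'ru-US' onto <root>/<COUNTRY>/<language>/<project>/<bundle>.properties
function bundlePath(root, locale, project, bundle) {
  const [language, country] = locale.split('-');
  return path.posix.join(root, country, language, project, `${bundle}.properties`);
}

// Minimal property parser: blank lines and '#' comments are skipped.
function parseProperties(text) {
  const content = {};
  text.split(/\r?\n/).forEach((line) => {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith('#')) return;
    const separator = trimmed.indexOf('=');
    if (separator === -1) return;
    content[trimmed.slice(0, separator).trim()] = trimmed.slice(separator + 1).trim();
  });
  return content;
}

console.log(bundlePath('app/locales', 'ru-US', 'ProjectName', 'xxx'));
console.log(parseProperties('# greeting\nhello=Hello, {name}\n'));
```

Keeping the country above the language is what allows several languages per country, such as US/en and US/ru.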
Security
● Nsp tool
○ On-demand scan for every application project
○ Security badge for every platform module
● ScanJS
○ Source code scan
● CSRF tokens
● Redirect validation
● XSS
● Rate limiter
Logging & Monitoring
● Logging every transaction/subtransaction
○ Central Logging Repository (CAL) provides log aggregation per pool/box/datacenter
○ Use Domains to maintain context per request and avoid passing it around
● Explicit code instrumentation
○ Support DEBUG, INFO, WARN, ERROR, FATAL
○ Time span to record transaction (start and end)
○ Nested spans
● Health checks/stats monitoring and alerts
● Crash/OOM emails to the owner of the pool
● Early problem detection using traffic mirror
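Explicit instrumentation with nested time spans can be sketched as below. This is not the CAL client; `createTransactionLog` and the record fields are hypothetical, and the sketch only covers synchronous work. Asynchronous code needs request context propagation, which the deck handles with Domains.

```javascript
'use strict';

// Hypothetical span recorder; `now` is injectable so the example is deterministic.
function createTransactionLog(now = Date.now) {
  const records = [];
  const stack = []; // currently open spans, used to compute nesting depth
  return {
    records,
    span(name, work) {
      const record = { name, depth: stack.length, start: now(), end: null, status: 'INFO' };
      records.push(record);
      stack.push(record);
      try {
        return work();
      } catch (err) {
        record.status = 'ERROR';
        throw err;
      } finally {
        record.end = now(); // the span is closed even when work() throws
        stack.pop();
      }
    }
  };
}

let tick = 0;
const log = createTransactionLog(() => tick++);
log.span('GET /home', () => {
  log.span('call experience service', () => {});
  log.span('render', () => {});
});
console.log(log.records);
```

Records with a start, an end and a depth are also the raw material a waterfall chart needs.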
App Resiliency
● Proactive testing in pre-production
○ Traffic mirroring of read-only traffic to the box with new build
○ Easy upgrades
● Handling uncaught errors
○ Domains used to capture context
○ Send email to the owner with stack trace, group and box name, request info
○ Graceful restart
● Handling memory leaks
○ Email sent to the owner with group and box name, request info
○ Graceful restart when memory threshold is reached
● Load shedding when too busy
○ 503 or connection reset to trigger browser DNS fallback
○ Filter bot traffic under heavy load
● Planning for failure
● Hystrix-like service calls
○ Fail fast
○ Circuit breaker
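The fail-fast and circuit-breaker behaviour can be illustrated with a minimal sketch. It is not Hystrix and not the eBay handler; `createCircuitBreaker`, its option names and the synchronous `action` are assumptions chosen to keep the example short (real service calls are asynchronous).

```javascript
'use strict';

// Hypothetical breaker: opens after N consecutive failures, then fails fast
// until resetTimeoutMs has passed, after which one trial call is let through.
function createCircuitBreaker({ failureThreshold = 3, resetTimeoutMs = 5000, now = Date.now } = {}) {
  let failures = 0;
  let openedAt = null;
  return {
    call(action, fallback) {
      const open = openedAt !== null && now() - openedAt < resetTimeoutMs;
      if (open) return fallback(new Error('circuit open')); // fail fast, no service call
      try {
        const result = action();
        failures = 0;
        openedAt = null;
        return result;
      } catch (err) {
        failures += 1;
        if (failures >= failureThreshold) openedAt = now();
        return fallback(err);
      }
    }
  };
}

let clock = 0; // fake clock so the example does not have to wait
const breaker = createCircuitBreaker({ failureThreshold: 2, resetTimeoutMs: 1000, now: () => clock });
let attempts = 0;
const failing = () => { attempts += 1; throw new Error('service down'); };
const fallback = (err) => `fallback: ${err.message}`;

const outcomes = [
  breaker.call(failing, fallback),
  breaker.call(failing, fallback), // second failure opens the circuit
  breaker.call(failing, fallback)  // rejected without calling the service
];
clock = 1500;                      // reset timeout elapsed
outcomes.push(breaker.call(() => 'fresh data', fallback));
console.log(outcomes, attempts);
```

While the circuit is open the failing service receives no traffic at all, which gives it room to recover.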
Performance Optimization
● Fast startup/re-start
○ Cold cache to avoid service invocation
○ Pre-compiling templates @ deployment
○ Pre-externalizing resources @ deployment
● Progressive rendering/streaming @ browser side
● Progressive chunking/streaming @ service side
● Performance tuning
○ Flame Graphs
○ Waterfall charts @ server side
Marko vs React: Server-side rendering
github.com/patrick-steele-idem/marko-vs-react
Tools > Flame Graphs
Flame Graphs
Why not use v8-profiler data?
● How to
○ Use v8-profiler to generate json data file
○ Aggregate stack frames into json
○ Render using d3-flame-graph
● No special environment
● No special steps
● Single button generation
● Can be used in production/dev/qa
● Exposes only the JavaScript side of the code
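The "aggregate stack frames into json" step can be sketched as folding sampled call stacks into a tree. The sample format (arrays of frame names, root first) is an assumption and is not the raw v8-profiler output; the nested `name`/`value`/`children` shape is assumed here to be what a hierarchical renderer such as d3-flame-graph takes as input.

```javascript
'use strict';

// Fold samples into a tree; `value` counts how many samples passed through a frame.
function aggregateStacks(samples) {
  const root = { name: 'root', value: 0, children: [] };
  samples.forEach((stack) => {
    let node = root;
    node.value += 1;
    stack.forEach((frame) => {
      let child = node.children.find((c) => c.name === frame);
      if (!child) {
        child = { name: frame, value: 0, children: [] };
        node.children.push(child);
      }
      child.value += 1;
      node = child;
    });
  });
  return root;
}

const tree = aggregateStacks([
  ['main', 'render', 'escape'],
  ['main', 'render'],
  ['main', 'fetch']
]);
console.log(JSON.stringify(tree, null, 2));
```

The width of a frame in the rendered graph corresponds to its `value`, so the widest frames are where the samples, and therefore the CPU time, concentrate.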
Used For ...
● CPU profiling
● Troubleshooting the problem at hand in production
● Memory leak investigation
● Regular sampling
Flame Graphs: Practice Fire Safety
Tools > Waterfall Charts
● Logs are hard to read
● Timestamps are hard to compare
● We need a faster tool
Why not use the same method used by
developer tools in browser?
Waterfall Charts
● Requires code instrumentation
● Easy and quick to assess what is going on
● Easy to spot synchronous events
● Analyze for possible task parallelization
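A text-only version of a waterfall chart shows why it is easier to read than timestamps in logs. The span shape (`name`, `start`, `end` in milliseconds) and the `renderWaterfall` helper are assumptions for illustration; the actual eBay tool is not shown in the slides. The function expects at least one span.

```javascript
'use strict';

// Draw one bar per span, scaled to `width` characters.
function renderWaterfall(spans, width = 40) {
  const begin = Math.min(...spans.map((s) => s.start));
  const finish = Math.max(...spans.map((s) => s.end));
  const scale = width / Math.max(finish - begin, 1);
  const labelWidth = Math.max(...spans.map((s) => s.name.length));
  return spans.map((s) => {
    const offset = Math.round((s.start - begin) * scale);
    const length = Math.max(1, Math.round((s.end - s.start) * scale));
    return `${s.name.padEnd(labelWidth)} |${' '.repeat(offset)}${'#'.repeat(length)}`;
  }).join('\n');
}

const chart = renderWaterfall([
  { name: 'request', start: 0, end: 100 },
  { name: 'service A', start: 5, end: 45 },
  { name: 'service B', start: 50, end: 90 },
  { name: 'render', start: 90, end: 100 }
], 20);
console.log(chart);
```

In this example the bars for service A and service B do not overlap, which makes it obvious that the two calls run one after the other and are candidates for parallelization.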
Lessons Learned
● Latency: set TCP_NODELAY=true
● Handling request close, finish, error events is important
● No DNS cache out of the box
○ Use OS-level caching to allow restarts
● Avoid stateful modules
● Embedding within App teams to bootstrap works great
● Use cold cache to keep restarts fast
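The first two lessons can be sketched with standard Node.js APIs: `socket.setNoDelay(true)` sets TCP_NODELAY, and the response emits `finish` and `close` events. The `trackRequest` helper and its outcome names are hypothetical; the example uses event emitter stand-ins so that it runs without opening a port.

```javascript
'use strict';
const { EventEmitter } = require('events');

// Hypothetical helper: report exactly one outcome per request.
function trackRequest(req, res, onDone) {
  let settled = false;
  const settle = (outcome, err) => {
    if (settled) return; // 'close' can follow 'finish'; report only the first event
    settled = true;
    onDone(outcome, err);
  };
  if (req.socket && typeof req.socket.setNoDelay === 'function') {
    req.socket.setNoDelay(true); // TCP_NODELAY: do not hold back small writes
  }
  res.on('finish', () => settle('finished'));   // response fully handed to the OS
  res.on('close', () => settle('aborted'));     // connection closed before finish
  req.on('error', (err) => settle('error', err));
  res.on('error', (err) => settle('error', err));
}

// Stand-ins for http.IncomingMessage and http.ServerResponse.
const req = Object.assign(new EventEmitter(), {
  socket: { noDelay: false, setNoDelay(value) { this.noDelay = value; } }
});
const res = new EventEmitter();
const outcomes = [];
trackRequest(req, res, (outcome) => outcomes.push(outcome));
res.emit('close');  // client went away before the response was finished
res.emit('finish'); // ignored, the request is already settled
console.log(outcomes, req.socket.noDelay);
```

Without such handling, an aborted request can leave timers, spans or backend calls running for a client that is no longer there.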
Challenges
● Version control
○ npm shrinkwrap does not guarantee versions
○ switched to Uber shrinkwrap
● App and platform coupling in one build
○ It is still monolithic, platform coupled to app
● Upgrading to major versions
○ Need to stay backwards compatible
○ Teams go at their own pace
● Memory leak analysis
● Debugging
○ Not stable, gets broken frequently
So Far So Good
What’s next?
● 1 billion requests / day
● Decoupling platform from application
○ Moving platform components into separate processes
○ Independent platform deployments
○ More resilient apps
○ Problem isolation (easier memory leak detection)
● Platform microservices
● Docker
● Kubernetes
● SenecaJS
● NodeJS services
References
• Progressive rendering: http://www.ebaytechblog.com/2014/12/08/async-fragments-rediscovering-progressive-html-rendering-with-marko
• AMP (SSE between frontend and backend, streaming): http://www.ebaytechblog.com/2016/06/30/browse-ebay-with-style-and-speed/
• http://www.ebaytechblog.com/2016/06/15/igniting-node-js-flames/
• http://www.ebaytechblog.com/2016/07/14/mastering-the-fire/
• http://www.slideshare.net/tcng3716/ebay-architecture
• Cloud: http://www.computerweekly.com/news/2240222899/Case-study-How-eBay-uses-its-own-OpenStack-private-cloud
• http://www.ebaytechblog.com/2014/10/02/dont-build-pages-build-modules/
• History: http://www.addsimplicity.com/downloads/eBaySDForum2006-11-29.pdf
• lasso: https://www.npmjs.com/package/lasso
• marko: http://markojs.com/
• https://github.com/spiermar/d3-flame-graph
Questions ?
Back slides
