- GS Lab engineers products for scale, speed and agility. They use a microservices architecture with loosely coupled components communicating over REST APIs to allow for distributed and scalable operation.
- Key aspects of their approach include rapid prototyping, incremental development, and multi-tenanting to allow for experimentation and selective feature rollout. Performance is optimized through caching, data replication, and selecting nearby instances.
- Analytics are used to understand user behavior while reliability is ensured through monitoring, logging, and graceful degradation. Security focuses on access control, throttling, and authentication. Devops practices like continuous integration/deployment facilitate smooth releases.
3. Nature of (most) software products today
• delivered as a service
• hosted on the cloud
• accessed via the browser, mobile apps and custom devices
• users spread out globally
• integrate with established 3rd party services and applications
www.gslab.com 3
4. Constraints and expectations from products
• decreasing ‘time to market’
– fast development
– no re-invention of the wheel
• market-driven development
– frequent changes
– continuous releases
– incremental, experimental enhancements
– “platform for testing out market hypotheses”
5. Key measurement parameters for such products
• ability to scale: be ready for viral adoption
• performance: fast and good UX
• usage analytics: understand user behaviour
• reliability: 24x7 uptime, monitoring
• security: protect user data (storage, access and transit)
6. Implications on product design…
• Let's work with a hypothetical video creation, search and sharing service for the enterprise: VideoFoo
8. Architecture
• system of loosely-coupled components
• communicating via REST APIs over HTTP
• multiple instances for scale-out and distributed operation
• interface: self-registration, heartbeat
[Diagram: components (video recorder, doc converter, user session handler, OCR, indexer) running as multiple instances across machines m1, m2 and m3, together with a registry]
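The self-registration and heartbeat interface can be illustrated with a small in-memory registry. This is a minimal sketch, not GS Lab's implementation: the `Registry` class, its method names, the 30-second timeout and the instance URLs are all assumptions. Each component instance registers on startup and then sends periodic heartbeats; instances whose last heartbeat is older than the timeout are dropped from lookups.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Registry:
    """In-memory registry of component instances (hypothetical sketch)."""

    heartbeat_timeout: float = 30.0  # seconds of silence before an instance is dropped
    # component name -> {instance URL -> time of last heartbeat}
    instances: dict = field(default_factory=dict)

    def register(self, component: str, url: str, now: Optional[float] = None) -> None:
        """Called once by an instance on startup (self-registration)."""
        self.heartbeat(component, url, now)

    def heartbeat(self, component: str, url: str, now: Optional[float] = None) -> None:
        """Called periodically by each instance to signal that it is alive."""
        stamp = time.monotonic() if now is None else now
        self.instances.setdefault(component, {})[url] = stamp

    def live_instances(self, component: str, now: Optional[float] = None) -> list[str]:
        """Return instances of `component` whose last heartbeat is recent enough."""
        current = time.monotonic() if now is None else now
        entries = self.instances.get(component, {})
        # Evict instances that have missed their heartbeat window.
        stale = [url for url, seen in entries.items() if current - seen > self.heartbeat_timeout]
        for url in stale:
            del entries[url]
        return sorted(entries)
```

In a deployed system the registry would itself sit behind an HTTP/REST endpoint, and a caller needing a doc converter would pick one of the live instances it returns.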
9. Architecture - II
• simple common component interface
• HTTP/REST communication layer reused across components
– Jetty or Tomcat
– lightweight web servers (lighttpd, nginx)
[Diagram: doc converter component]
10. Architecture – III
• efficient multi-tenanting
– load balancing
– rules to isolate customer-specific traffic to designated instances
– selective features and upgrades for customers
– staggered feature rollout and experimentation
[Diagram: a reverse proxy routes stable customers to m1 (OCR-old) and leading-edge customers to m2 (OCR-new); user session handlers and an analytics engine are also shown]
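One way to express the "rules to isolate customer-specific traffic" is a tenant-to-pool table consulted by the reverse proxy, with round-robin load balancing inside each pool. The sketch below assumes that model; `TenantRouter`, the pool names, host names and tenant names are hypothetical. In practice such rules would usually live in the proxy's own configuration (for example nginx upstream blocks) rather than in application code.

```python
from __future__ import annotations

import itertools


class TenantRouter:
    """Pick a backend instance for a tenant (hypothetical sketch).

    Tenants listed in `rules` are pinned to a named pool (e.g. leading-edge
    customers on instances running new features); everyone else goes to
    `default_pool`. Within a pool, instances are used round-robin.
    """

    def __init__(self, pools: dict[str, list[str]], rules: dict[str, str], default_pool: str):
        self._cycles = {name: itertools.cycle(hosts) for name, hosts in pools.items()}
        self._rules = rules  # tenant -> pool name
        self._default = default_pool

    def route(self, tenant: str) -> str:
        pool = self._rules.get(tenant, self._default)
        return next(self._cycles[pool])


router = TenantRouter(
    pools={"stable": ["stable-1", "stable-2"], "leading-edge": ["edge-1"]},
    rules={"acme": "leading-edge"},
    default_pool="stable",
)
```

Staggered rollout then amounts to moving tenants from the default pool into the leading-edge pool, one rule at a time.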
11. API
• internal API – for front-end
– fast, efficient, informative, backdoors
– all user interfaces (web, mobile*) should be based on this API
• external API – for 3rd party integration
– minimal, clean, secure, versioned, API-keys
[Diagram: product core exposing an internal API and an external API]
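The external API's "versioned, API-keys" properties can be sketched as a dispatcher that rejects unknown keys and keeps each version's handlers separate, so existing consumers are unaffected when a new version is added. The keys, paths, handlers and response shapes below are hypothetical; the internal API would be a separate, richer set of routes that 3rd party consumers never see.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical API keys issued to 3rd party consumers: key -> consumer name.
API_KEYS = {"k-123": "partner-crm"}

# Handlers are registered per (version, resource), so v1 consumers keep
# working unchanged after a v2 with a different response shape is added.
ROUTES: dict[tuple[str, str], Callable[[dict], dict]] = {
    ("v1", "videos"): lambda params: {"videos": ["intro.mp4"]},
    ("v2", "videos"): lambda params: {"items": [{"name": "intro.mp4"}], "next": None},
}


def handle_external(path: str, api_key: Optional[str], params: Optional[dict] = None) -> tuple[int, dict]:
    """Dispatch an external call such as '/v1/videos'; returns (HTTP status, body)."""
    if api_key not in API_KEYS:
        return 401, {"error": "missing or unknown API key"}
    route = tuple(path.strip("/").split("/"))
    if route not in ROUTES:
        return 404, {"error": "unknown version or resource"}
    return 200, ROUTES[route](params or {})
```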
12. Performance
• replicate data for scale-out: requests served in parallel
– relational database replication (master – slave)
– use NoSQL databases for non-transactional data (master – master replication)
– select the “nearest” instance (latency-based routing, geo-IP)
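With master–slave replication, writes must go to the master while reads can be spread across the replicas, which is what lets requests be served in parallel. A minimal sketch of that routing decision follows; `ReplicatedDB` and the host names are hypothetical, and statements are classified by their leading keyword only, which is a simplification.

```python
from __future__ import annotations

import itertools


class ReplicatedDB:
    """Route SQL statements to the master or to read replicas (hypothetical sketch).

    Connections are represented by host names here; a real setup would hold
    driver connection objects instead.
    """

    WRITE_PREFIXES = ("INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP")

    def __init__(self, master: str, replicas: list[str]):
        self.master = master
        # Round-robin over replicas so reads are served in parallel.
        self._replicas = itertools.cycle(replicas or [master])

    def target_for(self, sql: str) -> str:
        # Writes must go to the master; replicas only receive replicated changes.
        if sql.lstrip().upper().startswith(self.WRITE_PREFIXES):
            return self.master
        return next(self._replicas)
```

Because replication is often asynchronous, a read sent to a replica right after a write may not yet see that write; reads that must be current can be sent to the master instead.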
[Diagram: instances in several regions (us-1, us-2, eu-1, sg-1, sg-2, sg-3); which one should serve a given user?]
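The "nearest instance" question can be answered with a geo-IP lookup: map the client's IP address to an approximate location, then pick the closest healthy instance. This sketch assumes the lookup has already produced latitude/longitude (the geo-IP database itself is not shown), uses great-circle (haversine) distance, and invents coordinates for three of the instances. Latency-based routing would replace the distance with measured round-trip times.

```python
from __future__ import annotations

import math
from typing import Iterable, Optional

# Hypothetical instance locations as (latitude, longitude) in degrees.
INSTANCES = {
    "us-1": (37.4, -122.1),
    "eu-1": (53.3, -6.3),
    "sg-1": (1.35, 103.8),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = math.sin((lat2 - lat1) / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_instance(client: tuple[float, float], healthy: Optional[Iterable[str]] = None) -> str:
    """Closest instance to the client, optionally restricted to healthy instances."""
    allowed = set(INSTANCES) if healthy is None else set(healthy)
    candidates = [name for name in INSTANCES if name in allowed]
    return min(candidates, key=lambda name: haversine_km(client, INSTANCES[name]))
```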
13. Performance - II
• cache at every level
– version for cache-busting
[Diagram: caches between the user and the VideoFoo app: endpoint cache (browser), network cache (CDN), application cache (memcached)]
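Versioning for cache-busting works at every cache level: change the name of the cached item instead of trying to invalidate copies held by browsers, the CDN or memcached. The sketch below shows two common forms under assumed names: a content hash embedded in static asset URLs, and a per-namespace version number folded into application-cache keys. `AppCache` is a hypothetical in-memory stand-in for memcached; it keeps stale entries, whereas memcached would eventually evict them.

```python
from __future__ import annotations

import hashlib
from typing import Any, Optional


def versioned_url(path: str, content: bytes) -> str:
    """Embed a short content hash in a static asset URL.

    A changed file gets a new URL, so browsers and the CDN fetch it fresh;
    old copies never need to be invalidated.
    """
    digest = hashlib.sha256(content).hexdigest()[:8]
    return f"{path}?v={digest}"


class AppCache:
    """In-memory application cache with versioned keys (stand-in for memcached)."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}
        self._versions: dict[str, int] = {}  # namespace -> current version

    def _key(self, namespace: str, key: str) -> str:
        return f"{namespace}:v{self._versions.get(namespace, 0)}:{key}"

    def get(self, namespace: str, key: str) -> Optional[Any]:
        return self._data.get(self._key(namespace, key))

    def set(self, namespace: str, key: str, value: Any) -> None:
        self._data[self._key(namespace, key)] = value

    def bust(self, namespace: str) -> None:
        # Bumping the version makes every old key in the namespace unreachable at once.
        # This dict keeps the stale entries; memcached would evict them over time.
        self._versions[namespace] = self._versions.get(namespace, 0) + 1
```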
14. Analytics
• understand how users interact with your product
– feature-level analytics (events)
– use 3rd party libraries for tracking high-level feature usage (Loggr), performance (New Relic) and user profiling (Flurry)
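Feature-level analytics can start as nothing more than logging named events per user. As one example of what that data supports, the sketch below counts how many users reached each step of a multi-step flow, which shows where users drop out. The event names and the `funnel_dropoff` function are hypothetical, and the sketch ignores the time order of events.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Optional


def funnel_dropoff(events: Iterable[tuple[str, str]], steps: list[str]) -> list[int]:
    """Number of users who reached each step of a flow, counting only users
    who also reached every earlier step.

    `events` is a log of (user_id, event_name) pairs. Event order in time is
    ignored, which is a simplification.
    """
    reached: dict[str, set[str]] = defaultdict(set)
    for user, name in events:
        reached[name].add(user)
    counts = []
    survivors: Optional[set[str]] = None
    for step in steps:
        survivors = set(reached[step]) if survivors is None else survivors & reached[step]
        counts.append(len(survivors))
    return counts
```

A sharp fall between two adjacent counts points at the step most users do not complete.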
15. Analytics - II
• all URLs in any external communication (ex. email and mobile notifications, social posts, shares) should include tracking tokens to track fulfillment
[Diagram: sharing from VideoFoo]
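Adding tracking tokens amounts to appending query parameters to every outgoing URL while preserving the parameters already there; when the link is opened, the server can log those parameters to attribute the visit. The parameter names `src` and `cmp` and the example domain are assumptions; only the standard `urllib.parse` module is used.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_tracking_token(url: str, channel: str, campaign: str) -> str:
    """Append tracking parameters to `url`, keeping any query parameters it already has.

    `src` (where the link was sent: email, notification, social post, share) and
    `cmp` (which message or campaign) are hypothetical parameter names.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [("src", channel), ("cmp", campaign)]
    return urlunsplit(parts._replace(query=urlencode(query)))
```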
16. Reliability
• within-app
– monitor startup, shutdown, restart of individual components
• outside-app
– 24x7 external monitoring, alerts (Nagios)
– log analysis (Splunk, Logstash, Kibana)
• graceful degradation in case of 3rd party API failures/delays
– asynchronous loading after page.onLoad()
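Graceful degradation for slow or failing 3rd party APIs can also be sketched server-side as a call with a deadline and a fallback value: the page renders with the fallback (for example, no Twitter feed) instead of hanging. This is a hypothetical helper built on the standard `concurrent.futures` module, not a specific library's API; loading such widgets asynchronously after the page has loaded achieves the same effect in the browser.

```python
from __future__ import annotations

import concurrent.futures as cf
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Shared pool for outbound 3rd party calls.
_POOL = cf.ThreadPoolExecutor(max_workers=4)


def with_fallback(call: Callable[[], T], timeout_s: float, fallback: T) -> T:
    """Run a 3rd party call, but never wait longer than `timeout_s` for it.

    On timeout or on any error raised by `call`, return `fallback` so the rest
    of the page can still be rendered.
    """
    future = _POOL.submit(call)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # A timed-out call keeps running in the background; its result is ignored.
        return fallback


def slow_twitter_feed() -> list[str]:
    """Stand-in for a 3rd party API that is currently slow."""
    time.sleep(0.5)
    return ["latest tweet"]


def broken_like_count() -> int:
    """Stand-in for a 3rd party API that is currently failing."""
    raise ConnectionError("like service unreachable")
```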
17. Security
• strong access-control
– every session/cookie validated for ‘user@tenant’
• API throttling to prevent DoS attacks (mod_qos, mod_security for Apache)
• timed authentication tokens (HMAC signatures) to protect URL resources (external API, CDN)
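Timed authentication tokens can be verified locally by any server holding the shared secret, with no central token authority: the signature is an HMAC over the resource path and an expiry time, in the spirit of hash(data, time, secret). A minimal sketch using the standard `hmac` module follows; the secret, the `expires`/`sig` parameter names and the `path|expires` message format are assumptions.

```python
from __future__ import annotations

import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical secret shared by the app servers and whatever validates URLs (e.g. a CDN edge).
SECRET = b"replace-with-a-real-secret"


def _signature(path: str, expires: int) -> str:
    # hash(data, time, secret): HMAC-SHA256 over the path and expiry time.
    message = f"{path}|{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(path: str, ttl_s: int, now: Optional[float] = None) -> str:
    """Return `path` with an expiry time and signature appended as query parameters."""
    expires = int((time.time() if now is None else now) + ttl_s)
    return f"{path}?{urlencode({'expires': expires, 'sig': _signature(path, expires)})}"


def verify_url(url: str, now: Optional[float] = None) -> bool:
    """Check a signed URL locally; no call to a central authority is needed."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if (time.time() if now is None else now) > expires:
        return False  # expired: a captured URL cannot be replayed indefinitely
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(_signature(parts.path, expires), sig)
```

An expired or tampered URL fails verification, which limits URL replay to the token's lifetime.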
18. Devops
• smooth development → test → staging → production deployment process
• internal private cloud setup (OpenStack, CloudStack)
• easy configuration management (Chef, Puppet)
• test application performance from multiple geographies (websitetest, blitz)
Speaker notes
make use of existing tools, libraries and solutions as much as possible
build on top of these
faster way to gauge feature viability/success as compared to planning and market research!
engineered for scale
no time for re-design when the need is felt
performance
users everywhere, on all kinds of network connections
important for all to experience good performance
reliability
system and feature stability in face of rapid changes
security
customers entrust their data in the cloud
not going to talk about usability/UX here
build a base platform/framework quickly
basic, functional but with missing details
shortcuts (ex. no user management, no analytics)
functional but maybe not high-performance
start with 3rd party components, and replace gradually
incrementally add features (detail) to evolve the product
each component can change without impacting rest of the design drastically
program to interfaces
as long as interface maintained, everything can change underneath
procedure call → RPC → HTTP
binary → text
the choice of HTTP is important: it helps migrate components across machines, languages (bindings) and libraries
works with cURL, shell scripts, etc.
consistent API within and across apps (the whole world is HTTP today)
registry
each instance can have different set of components
mix of IO-intensive and compute-intensive
basis for scale-out
multiple instances of “doc converter” available
conversion requests queued and delivered when a converter is free/available
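Queuing conversion requests until some converter is free is the classic work-queue pattern. In the sketch below the converters are threads in one process rather than separate instances found through the registry, and `convert` is a hypothetical stand-in that only renames the file; both are simplifications.

```python
from __future__ import annotations

import queue
import threading


def convert(doc: str) -> str:
    """Hypothetical stand-in for the real conversion; just renames to .pdf."""
    return doc.rsplit(".", 1)[0] + ".pdf"


def run_converters(docs: list[str], num_converters: int) -> dict[str, tuple[str, str]]:
    """Queue conversion requests; each free converter takes the next one.

    Returns {input doc: (converter name, output doc)}.
    """
    requests: queue.Queue = queue.Queue()
    results: dict[str, tuple[str, str]] = {}
    lock = threading.Lock()

    def converter(name: str) -> None:
        while True:
            doc = requests.get()  # blocks until a request is available
            if doc is None:  # sentinel: no more work for this converter
                return
            output = convert(doc)
            with lock:
                results[doc] = (name, output)

    workers = [threading.Thread(target=converter, args=(f"converter-{i}",)) for i in range(num_converters)]
    for worker in workers:
        worker.start()
    for doc in docs:
        requests.put(doc)
    for _ in workers:
        requests.put(None)  # one sentinel per converter, queued after all real work
    for worker in workers:
        worker.join()
    return results
```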
OCR-old and OCR-new cannot co-exist (incompatible packages)
we cannot change them: they are third-party
so they have to live on different instances
OCR-new is less compute-intensive, and hence the instance can accommodate more components
API-key: standard way to identify consumers
share as little as possible (basic principle of OOP: information hiding)
overhead of changing external API is HIGH (dependencies on API consumers)
desirable to keep it separate from internal API
data replication examples
analytics dashboard served from multiple instances
streaming photos, videos and user-specific content
NoSQL databases
video attributes (do not change frequently, and not critical to reflect immediately)
ex. comments, viewing data
mongoDB, couchDB
geography-based routing
Latency-based routing (AWS) – ANYCAST issue!
Geo-IP databases
entire delivery infrastructure between user and the application is candidate for speed-up
eliminate compute delay (application cache), reduce network latency (CDN), eliminate network latency (browser cache)
application-cache (in-memory hashmap)
aggregated information about objects
listing of key pages (home, category, etc)
listing of first N pages
CDN: used for static, or large, or infrequently changing data
any user transaction still has to go to our server node
cache invalidation is expensive: use versioning for cache-busting instead!
simple ‘event based’ analytics framework to measure feature usage
track where users drop out (ex. step 3-of-4 is the one most users do not complete)
3rd party failures
facebook like
twitter feeds
event notifications
best effort asynchronous notifications (do not wait for HTTP response)
to avoid URL replay attacks
security without requiring centralized authority
centralized token mechanisms don’t scale
local verification
hash(data, time, secret)