From prototype to production - The journey of re-designing SmartUp.io

About Me - whoami
Mate Lang
CTO @ SmartUp.io
● Extrovert geek and tech lover
● Joined SmartUp.io @ Q2 2016
● Adores Linux and OSS, is a general minimalist
● Mining the bits in the startup & product mines, but has been through the valleys of the outsourcing & consultancy scene as well

About SmartUp.io - Who we are
● Startup company with a mobile-first,
gamified, social micro-learning SaaS
○ Initial goal is to help entrepreneurs launch and
take their startup companies to success
without bothering VCs with the exact same
questions
○ Found out, we have a much wider use case:
reach, engage, train & inspire communities to
facilitate learning
● Got a lot of attention from clients and
investors

SmartUp.io - Motivation
● Deloitte predicts a huge disruption opportunity in the corporate learning sector
● Companies’ average net promoter score for their LMS? -8

What is this talk about
● How should I (the geek) attack the problem domain to be successful?
● How did we decompose and redesign a “complex” (a.k.a. messy) monolith into maintainable microservices?
● How did we cope with NFRs (non-functional requirements) like security and performance?

Fact #1 - In software the only constant thing is change

Fact #2 - Geek is the new sexy ;)

Market driven evolution
● Initial product is simple media consumption
○ Users learn by consuming content and competing with each other
○ SmartUp content team writes content on protected administration webapp
○ Users are on the same platform, without isolation
● ACME Inc comes along with proposal to use the platform internally for learning
○ Users are isolated for ACME Inc and FooBar Inc (multi-tenancy)
○ Company needs to be able to write their own content on their “slice” of the platform
● Introduce Communities = isolated instance of our platform suitable to serve the
needs for business consumers

What we had vs. what we wanted to sell

This is what we had
under the hood

Plan is to rewrite from scratch
Team’s estimation of backlog = ~ 7 months

Actual delivery time = 10 months (estimated: 7 months)

Where could it go
wrong?
Right from architecting

Understanding your domain
● Clients use a “smart hack” on V1 to obtain a predefined order of their published
learning material
● By default the platform facilitates “feed-like” behaviour
○ no explicit ordering
○ potentially infinite list of items
● The right question to ask as an engineer:
Why would our clients ever want that?

● Clients want a course-like structure
● That is substantially different from our individual publishing model
○ The content is designed from the beginning in an ordered fashion
○ The completion should be in an ordered fashion
○ Analytics should be able to correlate between completion records
○ It deserves its own management

Geek team shopping list for the re-write
● Scalable and maintainable codebase on all platforms
● Needs to be microservice architecture, because everything is easier with them
(this is a fat lie)
● Automate everything that is possible to automate
● Detailed and helpful documentation
● Needs to ship in a continuous fashion with Docker because Docker is cool (it
actually is)
● Needs to have Infrastructure-as-Code
● Needs to use managed solutions
● Web client and mobile apps useable by your grandma

Microservices - the why
Google Trends - search for term “microservice”

● Haters gonna hate, but there is undeniable interest & adoption in
engineering-led companies
● Microservices take common clean code (SOLID) concerns to system level
○ SRP - a service should be concerned about a single coherent domain
○ OCP - behavior is extended by encapsulating a service with higher-level services.
A change in a remote context should not force a change in a given service.
○ ISP - introducing edge services - specialized backends for clients
● The above seems like common sense, but engineers do fail in designing such
systems

Microservices - the reason we fail
“Often stepping back
you see more, don’t you?”
David Hockney
painter, draughtsman, printmaker, stage designer and photographer

Microservices - my fast service design test
● If you want to know whether you have (not) designed it correctly (false
positives may appear), take the following test
Test for your service design
1. Imagine you have to open source your service. Are you able to do so without
doing code changes, but staying useful to an engineer outside your business
domain? …………………………………
------------------------------ END OF TEST ------------------------------

Let’s see a concrete example

Meet the
“Leaderboard Service*”
*(the tech scene ran out of deity names to name services after,
so no Zeus or Hydra for you)

The leaderboard - proposition
● Service responsible for managing leaderboards
● A leaderboard is an ordered list of players associated with a score
● It needs to allow the near real-time update of such boards, based on individual
score change events
● Allows the player to check the “transaction history” not just the aggregated
state
● It allows the fast retrieval of a certain segment of the board
○ Top X
○ Around X ( X-10, X, X+10)

The leaderboard - collecting points
[Diagram labels: Card, Points, Card]

The leaderboard - under the hood
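public interface LeaderboardService {
    // Creates a new leaderboard from the given request.
    LeaderboardCreationResponseDto createLeaderboard(LeaderboardCreationRequestDto requestDto);
    // Retrieves the top segment of the board ("Top X").
    LeaderboardDto retrieveTopNLeaderboard(String leaderboardId, long size);
    // Retrieves the segment of the board around a given player ("Around X").
    LeaderboardDto retrieveAroundNLeaderboard(String leaderboardId, String playerId, long size);
    // Retrieves a single player of a leaderboard.
    PlayerDto retrievePlayer(String leaderboardId, String playerId);
    // Updates a player's score on a leaderboard.
    void updatePlayerScore(String leaderboardId, String playerId, long score);
}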
● Note that this interface is agnostic of all SmartUp related logic
● Could be reused in any situation where you want to represent entities in a
sorted order by score
● No matter what the player entity is, it’s created lazily when you first update its
score
● Currently used to incentivize consumption, but can be applied to groups of
players, content creators, etc.
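The domain-agnostic contract above can be illustrated with a minimal in-memory sketch. This is not SmartUp's implementation: the class name `InMemoryLeaderboard`, the single-board simplification, and the choice to treat the score argument as a delta are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical single-board, in-memory leaderboard; illustrative only. */
final class InMemoryLeaderboard {
    private final Map<String, Long> scores = new HashMap<>();

    /** Adds a score delta (assumption); the player is created lazily on first update. */
    void updatePlayerScore(String playerId, long delta) {
        scores.merge(playerId, delta, Long::sum);
    }

    /** Player ids ordered by score, highest first; ties broken by id. */
    private List<String> ranked() {
        List<String> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> {
            int byScore = Long.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return ids;
    }

    /** "Top X": the first n players of the board. */
    List<String> topN(int n) {
        List<String> ids = ranked();
        return ids.subList(0, Math.min(n, ids.size()));
    }

    /** "Around X": the player plus up to `radius` neighbours on each side. */
    List<String> aroundN(String playerId, int radius) {
        List<String> ids = ranked();
        int pos = ids.indexOf(playerId);
        if (pos < 0) {
            return List.of(); // unknown player: empty segment
        }
        int from = Math.max(0, pos - radius);
        int to = Math.min(ids.size(), pos + radius + 1);
        return ids.subList(from, to);
    }
}
```

Nothing in the sketch knows about cards, courses, or communities, which is the point of the interface. Sorting on every read is only acceptable for a toy; a real read model would maintain the ordering incrementally.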

The leaderboard - the gain
● Thanks to event sourcing, we can always reconstruct state in case of failure
● Redis is in-memory. HA or not, should it ever go down, a state update would
effectively restore our read-optimized model
● The number of Score Processors scales with the data volume
● Correctness of concurrent state updates is guaranteed by DynamoDB Stream shards
● In case of processor failure, the unprocessed events get picked up once the
service is restored, all in a couple of minutes (for a backlog of up to a couple
of weeks)
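The recovery property of event sourcing claimed for the leaderboard can be sketched as a fold over the event log. The types `ScoreEvent` and `ScoreProjection` and their fields are hypothetical; the actual service uses DynamoDB Streams and Redis, which this sketch deliberately leaves out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical score-change event; field names are assumptions. */
final class ScoreEvent {
    final String playerId;
    final long delta;

    ScoreEvent(String playerId, long delta) {
        this.playerId = playerId;
        this.delta = delta;
    }
}

/** Rebuilds the aggregated (read-optimized) state from the stored events. */
final class ScoreProjection {
    static Map<String, Long> replay(List<ScoreEvent> log) {
        Map<String, Long> totals = new HashMap<>();
        for (ScoreEvent e : log) {
            // Each event is applied in log order; the total is just the sum of deltas.
            totals.merge(e.playerId, e.delta, Long::sum);
        }
        return totals;
    }
}
```

Because the events are the source of truth, losing the aggregated view costs only the time needed to run `replay` again, and the per-player "transaction history" is simply the log filtered by player.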

The Content Service - The proposition
A service that handles the creation and modification of versioned learning material.
Also enables versions to be instantiated and completed by consumers.
Encapsulates both structural and behavioural functionalities, like:
- Structural
- Question text
- Limited number of answers
- Solution explanation
- Behavioural
- Single-choice
- Multiple-choice
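One way to picture the split between structural and behavioural concerns is the sketch below. It is an assumption-laden illustration, not the Content Service's model: the class `Question`, the `Mode` enum, and the `MAX_ANSWERS` limit of 6 are all hypothetical.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical question model: structural fields plus a behavioural mode. */
final class Question {
    enum Mode { SINGLE_CHOICE, MULTIPLE_CHOICE }

    static final int MAX_ANSWERS = 6; // hypothetical limit on the number of answers

    final String text;           // structural: question text
    final List<String> answers;  // structural: limited number of answers
    final Set<Integer> correct;  // indexes of the correct answers
    final String explanation;    // structural: solution explanation
    final Mode mode;             // behavioural: single- or multiple-choice

    Question(String text, List<String> answers, Set<Integer> correct,
             String explanation, Mode mode) {
        if (answers.isEmpty() || answers.size() > MAX_ANSWERS) {
            throw new IllegalArgumentException("need 1.." + MAX_ANSWERS + " answers");
        }
        if (mode == Mode.SINGLE_CHOICE && correct.size() != 1) {
            throw new IllegalArgumentException("single-choice needs exactly one correct answer");
        }
        this.text = text;
        this.answers = List.copyOf(answers);
        this.correct = Set.copyOf(correct);
        this.explanation = explanation;
        this.mode = mode;
    }

    /** A submission is correct only if it selects exactly the correct answers. */
    boolean isCorrect(Set<Integer> selected) {
        return correct.equals(selected);
    }
}
```

The behavioural rule is enforced once, at construction, so every stored version of a question is already known to be consistent with its mode.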

The Content Service - Under the hood

The Content Service - the gain
● Because the Context is an abstract entity, we can support lots of use cases for
consumption rules and stay resilient to change
○ Sharing of consumption records
○ You can re-do content in certain circumstances (e.g. exam mode has a separate context)
● Feedback from our Head Of Content after going live
“Fast, smooth and easy. And really fast. And damn, this thing's fast...”
● The design enables easy clean-up, should we ever need it.
For now we store every change a content creator makes.
“Because I can” - Dr. Bob Kelso, Scrubs

Tuning your
microservices, ’cause
with high throughput
comes high latency

Tuning your microservices
● Simple operation: check the user’s
credentials and request a JWT
● Initial results: not too bad
● P95 responds in 701 ms

● Scale it up
○ 1 OAuth Service
○ 2 User Services
● P95 in ~35k ms
● Almost all requests take longer than 1
second to respond
● 40% of requests FAILED

● Found out there is no HTTP connection pooling -> TCP handshake penalty
● Update Spring Cloud to Edgware
● Set correct timeouts for Ribbon and Hystrix
● Reduce (yes, reduce) Tomcat resources
○ Max-Threads
■ The maximum number of request processing threads to be created
○ Max-Connections
■ The maximum number of connections that the server will accept and
process at any given time
○ Accept-Count
■ The maximum queue length for incoming connection requests when all
possible request processing threads are in use
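The first and third fixes (reuse connections, set explicit timeouts) can be illustrated outside Spring Cloud. The sketch below uses the JDK's `java.net.http.HttpClient` (Java 11+) instead of the Ribbon/Hystrix setup from the talk; the class name `SharedHttp`, the `/oauth/token` path, and both timeout values are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Hypothetical helper: one shared client with explicit timeouts. */
final class SharedHttp {
    // A single client instance per process lets the JDK reuse open connections
    // instead of paying a TCP handshake for every call.
    static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2)) // hypothetical value
            .build();

    /** Builds a token request that gives up after a bounded time. */
    static HttpRequest tokenRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/oauth/token")) // hypothetical path
                .timeout(Duration.ofSeconds(5)) // hypothetical per-request timeout
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```

The design point is the same as in the slides: creating a new client (and connection) per call makes latency grow with load, while a caller without timeouts keeps threads blocked on a struggling downstream service.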

● 0 Failed Requests
● Mean Req/s: ~50
● P95 latency: 148 ms

If microservices have
not solved your
engineering problems...

Try hiring a
“DevOps Engineer”

As per Wikipedia
DevOps =
a software engineering practice that aims at
unifying software development (Dev) and
software operation (Ops).

Bronicorn Release
● Went live October 3, 0600 RO time
● Development environment was used for previous 10 months
● No other environment due to cost reasons
● On the day of release we created
○ Staging (Acceptance Testing) environment
○ Production

45 minutes difference between
deployments (5 minutes active)

How is that possible?
Infrastructure-as-Code

Infrastructure as Code
Definition
Infrastructure as code (IaC) is the process of managing and provisioning computer data
centers through machine-readable definition files, rather than physical hardware
configuration or interactive configuration tools. (as per Wikipedia)
● Just a bunch of shell scripts
● Modern provisioning tools like: Chef, Puppet, Ansible
● Cloud-ready IaC management: AWS CloudFormation, HashiCorp Terraform

Meet Terraform
● DSL based using HCL (HashiCorp Configuration Language)
● Module oriented
● Manages dependencies between resources (e.g. DNS depends on IP)
● Nice interpolation syntax
● Natively manages multiple environments through configuration (e.g. instance
types differ from env to env)
● Workflow = Plan > Review > Apply
● Configurable state backends
● From V0.10 supports pluggable “providers” (e.g. AWS, GCP)

Meet Terraform
Src: Terraform Homepage

Our way of Terraforming
● 4 AWS VPCs
○ Services VPC (For maintenance, and unified connection to other VPCs)
○ SmartUp VPCs (e.g. Dev, Stg, Prod)
● Each service owns its own module along with dependencies
Eg: Leaderboard:
○ Redis
○ Queues
○ DynamoDB Tables & Streams
○ etc
● Peering module to connect Services <-> SmartUp
● Using encrypted S3 for safe state storage

Managing service configuration
● Services pick up their configuration exclusively at runtime
● No mvn package -Pdev|stg|prod
● Build one artifact (docker) and use it everywhere
● Consul as service discovery and configuration storage
● Terraform injects properties into Consul upon execution
● No configuration done in YML
● Using Spring Cloud Config to pick these values up from Consul upon startup
and checking periodically for changes

Each team is responsible for
their delivery process
From design to production

Let’s put it to production
● Loads of deploys in a geek’s life, better make it simple
● A good pipeline will
○ Provide fast feedback before PR integration (build, test & check infra dependencies)
○ Deploy changes to dev ASAP (fail-fast)
○ Streamline production releases so they prevent human error
○ Keep a clear separation between steps
● Preferably define the whole pipeline using code
● Decided to use CircleCI - YAML based Workflows

Thank you for your
kind attention!
Mate Lang
CTO @ smartup.io
mate@smartup.io
twitter: @langmate
medium: @matelang
