My name is Victor Perepelitsky. I'm an R&D Technical Leader at LivePerson, leading the 'Real Time Event Processing Platform' team.
In this Meetup I talked about the journey of creating the platform from scratch - challenges, design decisions, technology choices and more.
During the last 3 years the team has built a Real Time Event Processing Platform, which is currently running in production with thousands of new and migrated customers. It is built to handle hundreds of thousands of requests per second with low-latency response times (under 30 ms round trip).
I went through different topics and stages of this journey and shared details that led to specific choices and results.
“Stateful or Stateless”, “CEP”, “Rules engine”, “Automated performance testing”, “Locking”, “Timing” were a part of the menu.
In this meetup, Kobi Salant - Data Platform Technical Lead & Vladi Feigin - Data System Architect, both from LivePerson, will talk about: Making scale a non-issue for real-time Data apps.
Have you ever tried to build a system that processes hundreds of thousands of events per second in real time and serves more than 1M concurrent visitors?
We're going to talk about the LivePerson real-time stream processing solution doing exactly that. Learn how we empower digital call centers with insights for their critical decision-making processes and never-ending efficiency goals.
In this talk, Sergei Koren, Production Architect at LivePerson, will present HTTP/2, the official successor to HTTP/1.1, and how it will influence the Web as we know it.
Sergei will talk about:
- HTTP/2 history
- The major changes - what they do and don't
- Expected changes to the Web as we use it today
- A proposed checklist for implementation: how and when, from a production point of view (see the sketch below)
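Not from Sergei's slides, but as a minimal sketch of the checklist's "how": Java 11+ ships an HTTP client that negotiates HTTP/2 via ALPN and falls back to HTTP/1.1 automatically, which makes it easy to probe what a given server actually speaks. The target URL below is a placeholder.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class Http2Probe {
    public static void main(String[] args) throws Exception {
        // Request HTTP/2; the client transparently falls back to HTTP/1.1
        // when the server does not support it.
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)
                .build();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("https://www.example.com/")) // placeholder URL
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        // version() reports the protocol actually negotiated via ALPN.
        System.out.println("Negotiated: " + response.version());
        System.out.println("Status: " + response.statusCode());
    }
}
```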
Telling the LivePerson Technology Story at Couchbase [SF] 2013 (LivePerson)
As part of Couchbase[SF]2013, Ido Shilon, R&D Group Leader at LivePerson, discusses LivePerson's project to re-architect their LiveEngage platform backend, with a focus on LivePerson's decision to use NoSQL technologies, the challenges encountered, and the lessons learned.
Video: http://www.youtube.com/watch?v=rYKWFmJEHX0
URP? Excuse You! The Three Metrics You Have to Know (confluent)
(Todd Palino, LinkedIn) Kafka Summit SF 2018
What do you really know about how to monitor a Kafka cluster for problems? Is your most reliable monitoring your users telling you there’s something broken? Are you capturing more metrics than the actual data being produced? Sure, we all know how to monitor disk and network, but when it comes to the state of the brokers, many of us are still unsure of which metrics we should be watching, and what their patterns mean for the state of the cluster. Kafka has hundreds of measurements, from the high-level numbers that are often meaningless to the per-partition metrics that stack up by the thousands as our data grows.
We will thoroughly explore three key monitoring concepts in the broker that will leave you an expert in identifying problems with the least amount of pain:
- Under-replicated Partitions: The mother of all metrics
- Request Latencies: Why your users complain
- Thread pool utilization: How could 80% be a problem?
We will also discuss the necessity of availability monitoring and how to use it to get a true picture of what your users see, before they come beating down your door!
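As a companion to the three bullets above, here is a hedged sketch of reading the first of them over JMX with plain Java. It assumes a broker started with remote JMX enabled on localhost:9999; the MBean name is the one Kafka's ReplicaManager registers for the under-replicated-partitions gauge.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class UrpCheck {
    public static void main(String[] args) throws Exception {
        // Assumes the broker was started with remote JMX on port 9999.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Per-broker gauge exposed by the ReplicaManager.
            ObjectName urp = new ObjectName(
                    "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions");
            Number value = (Number) mbs.getAttribute(urp, "Value");
            // Anything above zero means at least one follower is out of sync.
            System.out.println("UnderReplicatedPartitions = " + value);
        }
    }
}
```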
This presentation is about federated GraphQL in huge enterprises. I explain why big enterprises need distributed GraphQL and why the classic approach does not work.
In this Meetup, Arik Lerner - LivePerson Team Lead of Java Automation, Performance & Resilience - will talk about how we measure our services with End2End testing, which has become one of the most critical monitoring tools at LivePerson.
Over 200K test runs per day provide statistics and insights into problems as they happen.
Arik will go through different topics and stages of the journey and share details that led to the current results.
Topics on the menu: The End2End Insights Awakens
• How we measure our services using synthetic user experience
• Measuring through analytics & insights
• How we collect our data
• How do we debug our services? Hint: video recording, HAR (HTTP Archive), Kibana, dashboard analytics & insights
• Future: app log correlation with End2End data
• Our tools: Selenium, Jenkins and cutting-edge technologies such as Kafka & ELK (Elasticsearch, Logstash and Kibana)
In this Meetup, Arik will host Ali AbuAli, NOC Team Leader, who will talk about how he uses End2End in his day-to-day work.
Flink Forward San Francisco 2018: Andrew Gao & Jeff Sharpe - "Finding Bad Ac..." (Flink Forward)
Within fintech catching fraudsters is one of the primary opportunities for us to use streaming applications to apply ML models in real-time. This talk will be a review of our journey to bring fraud decisioning to our tellers at Capital One using Kafka, Flink and AWS Lambda. We will share our learnings and experiences to common problems such as custom windowing, breaking down a monolith app to small queryable state apps, feature engineering with Jython, dealing with back pressure from combining two disparate streams, model/feature validation in a regulatory environment, and running Flink jobs on Kubernetes.
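The abstract mentions custom windowing; for readers new to Flink, here is a minimal sketch of keyed windowing in the DataStream API. The (cardId, amount) tuples and the one-minute window are illustrative assumptions, not Capital One's actual pipeline or model.

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class FraudWindowSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // (cardId, amount) pairs; a real job would consume these from Kafka.
        DataStream<Tuple2<String, Double>> txns = env.fromElements(
                Tuple2.of("card-1", 10.0),
                Tuple2.of("card-1", 950.0),
                Tuple2.of("card-2", 5.0));
        // Sum spend per card over one-minute tumbling windows; a downstream
        // operator (or ML model) would flag cards with anomalous totals.
        txns.keyBy(t -> t.f0)
            .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
            .sum(1)
            .print();
        env.execute("fraud-window-sketch");
    }
}
```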
A Practical Guide to Selecting a Stream Processing Technology (confluent)
Presented by Michael Noll, Product Manager, Confluent.
Why are there so many stream processing frameworks that each define their own terminology? Are the components of each comparable? Why do you need to know about spouts or DStreams just to process a simple sequence of records? Depending on your application’s requirements, you may not need a full framework at all.
Processing and understanding your data to create business value is the ultimate goal of a stream data platform. In this talk we will survey the stream processing landscape, the dimensions along which to evaluate stream processing technologies, and how they integrate with Apache Kafka. Particularly, we will learn how Kafka Streams, the built-in stream processing engine of Apache Kafka, compares to other stream processing systems that require a separate processing infrastructure.
How Samsung Engineers Do Pre-Commit Builds with Perforce Helix Streams (Perforce)
Get an in-depth look at the life of a pre-commit build at Samsung using Perforce Helix Streams and Electric Cloud’s Electric Commander with Helix Swarm for code review.
Using Perforce Data in Development at Tableau (Perforce)
Data plays a big role at Tableau—not just for our customers, but also throughout our company. Using our own products is not only one of our fundamental company values, but the analysis and discoveries we make are important to track as they shape our development processes and influence our day-to-day decisions. In this talk, we present and analyze a variety of data visualizations based on Perforce data from our development organization and share how it has influenced our infrastructure and development practices.
Supercharge Your Real-time Event Processing with Neo4j's Streams Kafka Connec... (HostedbyConfluent)
Do your event streams use connected-data domains such as fraud detection, live logistics routing, or predicting network outages? How can you maintain the analysis and leverage those connections in real time?
Graph databases differ from traditional, tabular ones in that they treat connections between data as first class citizens. This means they are optimized for detecting and understanding these relationships – providing insight at speed and at scale.
By combining event streams from Kafka along with the power of the Neo4j graph database for interrogating and investigating connections, you make real-time, event-driven intelligent insight a reality.
Neo4j Streams integrates Neo4j with Apache Kafka event streams, serving as a source of data (for instance, Change Data Capture) or as a sink to ingest any kind of Kafka event into your graph. In this session we'll show you how to get up and running with Neo4j Streams and how to sink and source between graphs and streams.
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications (confluent)
When you are running systems in production, clearly you want to make sure they are up and running at all times. But in a distributed system such as Apache Kafka… what does “up and running” even mean?
Experienced Apache Kafka users know what is important to monitor, which alerts are critical and how to respond to them. They don’t just collect metrics - they go the extra mile and use additional tools to validate availability and performance on both the Kafka cluster and their entire data pipelines.
In this presentation we’ll discuss best practices of monitoring Apache Kafka. We’ll look at which metrics are critical to alert on, which are useful in troubleshooting and what may actually be misleading. We’ll review a few “worst practices” - common mistakes that you should avoid. We’ll then look at what metrics don’t tell you - and how to cover those essential gaps.
Flink Forward San Francisco 2018: David Reniz & Dahyr Vergara - "Real-time m..." (Flink Forward)
“Customer experience is the next big battle ground for telcos,” Amit Akhelikar, Global Director of Lynx Analytics, recently proclaimed at TM Forum Live! Asia in Singapore. But how do you fight in this battle? A common approach has been to keep “under control” some well-known network quality indicators, like dropped calls, radio access congestion, availability, and so on; but this has proven not to be enough to keep customers happy, like a siege weapon is not enough to conquer a city. But what if it were possible to know how customers perceive services, at least the most demanded ones, like web browsing or video streaming? That would be like a squad of archers ready for battle. And even having that, how do you extract value from it and take action in no time, giving our skilled archers the right targets? Meet CANVAS (Customer And Network Visualization and AnalyticS), one of the first LATAM implementations of a Flink-based stream processing use case for a telco, which successfully combines leading and innovative technologies like Apache Hadoop, YARN, Kafka, NiFi, Druid and advanced visualizations with Flink core features like non-trivial stateful stream processing (joins, windows and aggregations on event time) and CEP capabilities for alarm generation, delivering a next-generation tool for SOC (Service Operation Center) teams.
SFBigAnalytics_20190724: Monitor kafka like a Pro (Chester Chen)
Kafka operators need to provide guarantees to the business that Kafka is working properly and delivering data in real time, and they need to identify and triage problems so they can solve them before end users notice them. This elevates the importance of Kafka monitoring from a nice-to-have to an operational necessity. In this talk, Kafka operations experts Xavier Léauté and Gwen Shapira share their best practices for monitoring Kafka and the streams of events flowing through it. How to detect duplicates, catch buggy clients, and triage performance issues – in short, how to keep the business’s central nervous system healthy and humming along, all like a Kafka pro.
Speakers: Gwen Shapira, Xavier Leaute (Confluent)
Gwen is a software engineer at Confluent working on core Apache Kafka. She has 15 years of experience working with code and customers to build scalable data architectures. She currently specializes in building real-time reliable data processing pipelines using Apache Kafka. Gwen is an author of “Kafka - the Definitive Guide”, "Hadoop Application Architectures", and a frequent presenter at industry conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects.
Xavier Leaute was one of the first engineers on the Confluent team and is responsible for analytics infrastructure, including real-time analytics in Kafka Streams. He was previously a quantitative researcher at BlackRock. Prior to that, he held various research and analytics roles at Barclays Global Investors and MSCI.
Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent) (Ontico)
HighLoad++ 2017
Hall «Delhi + Calcutta», November 8, 17:00
Abstract:
http://www.highload.ru/2017/abstracts/2978.html
When you are running systems in production, clearly you want to make sure they are up and running at all times. But in a distributed system such as Apache Kafka… what does “up and running” even mean?
...
Pivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan (PivotalOpenSourceHub)
Pivoting Spring XD to Spring Cloud Data Flow: A microservice based architecture for stream processing
Microservice based architectures are not just for distributed web applications! They are also a powerful approach for creating distributed stream processing applications. Spring Cloud Data Flow enables you to create and orchestrate standalone executable applications that communicate over messaging middleware such as Kafka and RabbitMQ that when run together, form a distributed stream processing application. This allows you to scale, version and operationalize stream processing applications following microservice based patterns and practices on a variety of runtime platforms such as Cloud Foundry, Apache YARN and others.
About Sabby Anandan
Sabby Anandan is a Product Manager at Pivotal. Sabby is focused on building products that eliminate the barriers between application development, cloud, and big data.
Flink Forward San Francisco 2018 keynote: Srikanth Satya - "Stream Processin..." (Flink Forward)
Stream Processing in conjunction with a Consistent, Durable, Reliable stream storage is kicking the revolution up a notch in Big Data processing. This modern paradigm is enabling a new generation of data middleware that delivers on the streaming promise of a simplified and unified programming model. From data ingest, transformation, and messaging to search, time series and more, a robust streaming data ecosystem means we’ll all be able to more quickly build applications that solve problems we could not solve before.
Apache Pulsar at Tencent Game: Adoption, Operational Quality Optimization Exp... (StreamNative)
After nearly 10 years of development of Tencent Game big data, the daily data transmission volume can reach 1.7 trillion. As the key component of the big data platform, the MQ system is critical to providing real-time service operational quality assurance, which requires the support of various applications such as real-time game operational service, real-time index data analysis, and real-time personalized recommendation. With the fast growth of the gaming business and the continuous expansion of data, the challenge of real-time service operational quality assurance is also increasing.
In this presentation, we will introduce the development history of Tencent Game big data technology and our practical experience of operational service quality optimization for Apache Pulsar in Tencent Game real-time service scenarios.
Build a High-performance Partner Analytics Platform by Ashish Jadhav and Neer... (Redis Labs)
Build a High-performance Partner Analytics Platform by Ashish Jadhav and Neeraj Sood of Reliance Jio Infocomm - Build a deep learning app with TensorFlow 2.0 & Redis by Jayesh Ahire and Sherin Thomas of Tensorwerk
Camunda launched Zeebe, a new open source project based around microservice orchestration.
With Zeebe, you can decompose long-running and asynchronous business logic into microservices which are then orchestrated using visual workflows. Zeebe itself is extremely fast, horizontally scalable, fault-tolerant and highly available. With Zeebe you can reliably process all your transactions as they happen.
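For a flavor of what that orchestration looks like from code, here is a hedged sketch against the Zeebe Java client (Camunda 8 era, package io.camunda.zeebe.client). The gateway address, the "order-process" BPMN id and the "charge-payment" job type are illustrative assumptions.

```java
import io.camunda.zeebe.client.ZeebeClient;

public class ZeebeSketch {
    public static void main(String[] args) {
        try (ZeebeClient client = ZeebeClient.newClientBuilder()
                .gatewayAddress("localhost:26500") // assumed local gateway
                .usePlaintext()
                .build()) {
            // Start an instance of a visually modeled BPMN workflow.
            client.newCreateInstanceCommand()
                    .bpmnProcessId("order-process") // hypothetical process id
                    .latestVersion()
                    .send()
                    .join();
            // A worker microservice subscribes to one task type in the workflow.
            // (A real worker would stay open instead of closing with the client.)
            client.newWorker()
                    .jobType("charge-payment") // hypothetical job type
                    .handler((jobClient, job) ->
                            jobClient.newCompleteCommand(job.getKey()).send().join())
                    .open();
        }
    }
}
```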
Kafka makes so many things easier to do, from managing metrics to processing streams of data. Yet it seems that so many things we have done to this point in configuring and managing it have been object lessons in how to make our lives, as the plumbers who keep the data flowing, more difficult than they have to be. What are some of our favorites?
* Kafka without access controls
* Multitenant clusters with no capacity controls
* Worrying about message schemas
* MirrorMaker inefficiencies
* Hope and pray log compaction
* Configurations as shared secrets
* One-way upgrades
We’ve made a lot of progress over the last few years improving the situation, in part by focusing some of this incredibly talented community towards operational concerns. We’ll talk about the big mistakes you can avoid when setting up multi-tenant Kafka, and some that you still can’t. And we will talk about how to continue down the path of marrying the hot, new features with operational stability so we can all continue to come back here every year to talk about it.
Modern businesses have data at their core, and this data is changing continuously. How can we harness this torrent of information in real-time? The answer is stream processing, and the technology that has since become the core platform for streaming data is Apache Kafka. Among the thousands of companies that use Kafka to transform and reshape their industries are the likes of Netflix, Uber, PayPal, and AirBnB, but also established players such as Goldman Sachs, Cisco, and Oracle.
Unfortunately, today’s common architectures for real-time data processing at scale suffer from complexity: there are many technologies that need to be stitched and operated together, and each individual technology is often complex by itself. This has led to a strong discrepancy between how we, as engineers, would like to work vs. how we actually end up working in practice.
In this session we talk about how Apache Kafka helps you to radically simplify your data processing architectures. We cover how you can now build normal applications to serve your real-time processing needs — rather than building clusters or similar special-purpose infrastructure — and still benefit from properties such as high scalability, distributed computing, and fault-tolerance, which are typically associated exclusively with cluster technologies. Notably, we introduce Kafka’s Streams API, its abstractions for streams and tables, and its recently introduced Interactive Queries functionality. As we will see, Kafka makes such architectures equally viable for small, medium, and large scale use cases.
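To make the "normal applications" point concrete, here is a minimal sketch using Kafka's Streams API; the topic names and the page-view domain are made up. It shows the stream/table duality the session introduces: a stream of events is aggregated into a continuously updated table, and the app scales by simply running more instances.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class ViewCountsApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "view-counts-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Stream of page-view events keyed by user id (topic name is made up).
        KStream<String, String> views = builder.stream("page-views");
        // Stream -> table: a continuously updated count per user.
        KTable<String, Long> counts = views.groupByKey().count();
        counts.toStream().to("view-counts", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```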
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2y2yPiS.
Colin McCabe talks about the ongoing effort to replace the use of Zookeeper in Kafka: why they want to do it and how it will work. He discusses the limitations they have found and how Kafka benefits both in terms of stability and scalability by bringing consensus in house. He talks about their progress, what work is remaining, and how contributors can help. Filmed at qconsf.com.
Colin McCabe is a Kafka committer at Confluent, working on the scalability and extensibility of Kafka. Previously, he worked on the Hadoop Distributed Filesystem and the Ceph Filesystem.
Given the current economic conditions, many businesses are struggling and may need to take action to not only remain profitable but to remain sustainable. Some organizations may be considering a reduction in force. When exploring the option of a reduction in force, it is important that corporate counsel is involved. Corporate counsel will be able to advise on the legal implications of the reduction, to protect the interests of both the employer and the employees. The following ten points are designed to facilitate the discussion with your legal department when having a reduction in force conversation.
Webcast - Making kubernetes production ready (Applatix)
Slides from our technical webcast where Harry Zhang and Abhinav Das discuss the problems the Applatix engineering team ran into in building large-scale production apps on Kubernetes, and our resulting solutions, tips, and settings to resolve them. Full YouTube video of the webcast at https://www.youtube.com/watch?v=tbD6Rcm2sI8&spfreload=5
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ... (Soroosh Khodami)
Session recording on YouTube:
https://www.youtube.com/watch?v=uWPZQ_HMy10
- Session Description
Do you find yourself bombarded with buzzwords and overwhelmed by the rapid emergence of new technologies? "Stream Processing" is a tech buzzword that has been around for some time but is still unfamiliar to many. Join this session to discover its potential in software systems. I will share insights from Apache Flink, Apache Beam, Google Dataflow, and my experiences at Bol.com (the biggest e-commerce platform in the Netherlands) as we cover:
- Stream Processing overview: main concepts and features
- Apache Beam vs. Spring Boot comparison
- Key Considerations for Using Stream Processing
- Learning strategies to navigate this evolving landscape.
Service Ownership with PagerDuty and Rundeck: Help others help you (TraciMyers5)
Many engineering and operations teams would like to move to a Service Ownership: "You build it, you own it" operating model. However, as with many ancillary objectives driving DevOps across an organization, this is easier said than done. Often this is because teams lack the human-to-technology mechanisms that allow for a culture of service ownership.
Within the context of incident response, teams need to be able to clearly define who is responsible for tending to issues, how they're notified, and who to lean on for help. This is true for non-incident-response scenarios too. How can teams operate at a fast pace and at a large scale, while still maintaining valid and safe service ownership? One of the keys to allowing for service ownership outside of incident response is imbuing an organization with a culture of self-service operations. This is where a service owner builds and delegates self-service mechanisms for end users (non-service-owners) to make use of a given service more safely, while also reducing the number of interruptions to the service creator/owner.
In this webinar, you'll learn:
How self-service helps organizations adopt a ‘You Build it, You Own it’ model
Necessary mechanisms for service owners to create self-service interfaces to address the needs of their service-users
How to apply self-service while continuing to maintain security and compliance standards
How to allow developers and SREs to safely delegate automation as self-service requests to other teams and IT users
Help developers regain productivity and quality of life by doing what they do best: coding
From an idea to production: building a recommender for BBC Sounds (Tatiana Al-Chueyr)
This presentation was given on the 28th of September 2021 at the first MLOps London Meetup
Event website: https://www.meetup.com/mlopslondon/events/280295841/
In the design of electronics and semiconductors, challenges are compounded by the integration of AI, multi-core, real-time software, network, connectivity, diagnostics, and security. Performance limits, battery life, and cost are adoption barriers. It is extremely important to have tools and processes that deliver efficiency throughout the design cycle.
Continuous verification from planning to development addresses the multi-discipline needs of hardware, software, and networks. This unique approach accelerates the design phase, defines the test efforts, and finds defects during specification. Architecture modeling is required to meet timing deadlines, generate the lowest power consumption, attain the highest quality of service, optimize the electronic design system, and design custom components.
Next generation business automation with the Red Hat Decision Manager and Red... (Masahiko Umeno)
This slide deck was presented at Red Hat Tech Exchange 2018 Taiwan, talking about: 1. Our focus area, 2. Application Architecture, 3. Development Method, 4. Organizing Information, 5. Business Process, 6. Case Management. The session received high evaluations (No. 1 in session content across all sessions).
Complex event processing platform handling millions of users - Krzysztof Zarz... (GetInData)
If you want to learn more about it, check out our webinar here: https://www.youtube.com/watch?v=EfGPY_NyYQ8&t=77s
The webinar was organized by GetInData in 2020. During the webinar, we shared our lessons learned from building and running a stream processing platform in production for over 2 years.
Watch more here: https://www.youtube.com/watch?v=EfGPY_NyYQ8
Author: Krzysztof Zarzycki
Linkedin: https://www.linkedin.com/in/kzarzycki/
___
GetInData is a company founded in 2014 by ex-Spotify data engineers. From day one our focus has been on Big Data projects. We bring together a group of the best and most experienced experts in Poland, working with cloud and open-source Big Data technologies to help companies build scalable data architectures and implement advanced analytics over large data sets.
Our experts have vast production experience in implementing Big Data projects for Polish as well as foreign companies, including, among others, Spotify, Play, Truecaller, Kcell, Acast, Allegro, ING, Agora, Synerise, StepStone, iZettle and many others from the pharmaceutical, media, finance and FMCG industries.
https://getindata.com
byteLAKE's CFD Suite (AI-accelerated CFD) (2024-02) (byteLAKE)
► byteLAKE's CFD Suite: Accelerate your Computational Fluid Dynamics (CFD) simulations by leveraging the speed and efficiency of artificial intelligence. Slash simulation times, minimize trial-and-error costs, and supercharge decision-making for heightened productivity. Learn more at www.byteLAKE.com/en/CFDSuite.
Magento Live UK Nexcess Performance & Security Session (Nexcess.net LLC)
Your site's security and performance directly correlate to order volume. A tuned and secure Magento install can instantly mean more sales, and the converse is also true. This session is meant to give you an overview of the importance of security and performance for your e-commerce site, as well as provide steps to make Magento perform as your business grows.
Kubernetes your tests! automation with docker on google cloud platform (LivePerson)
Arik Lerner, Automation Team Leader, and Waseem Hamshawi, Automation Infra Developer, present how to build a large scale automated testing platform by leveraging containers orchestration over GCP, with the ability to scale out and provide fast feedback while maintaining a highly reliable test infrastructure.
The presentation includes a new approach to managing a scalable testing platform of distributed automated tests with Kubernetes and Docker over Google Cloud Platform.
Topics:
• GCP and Kubernetes introduction for automated testing
• Traditional Selenium Grid vs Selenium Standalone with Kubernetes and Docker for Web and Mobile tests
• Distributed and containerized testing environment over container cluster - different use cases
Ephemerals - "Short-lived Testing Endpoints": an open-source project by LivePerson which makes large-scale automation testing a "walk in the park".
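As a rough illustration of the Grid-on-Kubernetes idea (not LivePerson's actual setup), a test simply targets the hub through a cluster Service and the cluster supplies the browser pods; "selenium-hub" below is a hypothetical Service name and 4444 is the standard Grid port.

```java
import java.net.URL;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.remote.RemoteWebDriver;

public class GridOnKubernetesSketch {
    public static void main(String[] args) throws Exception {
        // "selenium-hub" is a hypothetical Kubernetes Service fronting browser pods.
        WebDriver driver = new RemoteWebDriver(
                new URL("http://selenium-hub:4444/wd/hub"), new ChromeOptions());
        try {
            driver.get("https://www.liveperson.com/");
            System.out.println("Title: " + driver.getTitle());
        } finally {
            driver.quit(); // frees the browser session back to the grid
        }
    }
}
```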
In this Meetup, Yaar Reuveni - Team Leader - and Nir Hedvat - Software Engineer - from the LivePerson Data Platform R&D team will talk about the journey we made from the early days of the data platform in production, with high friction and low awareness of issues, to a mature, measurable data platform that is visible and trustworthy.
video: https://www.youtube.com/watch?v=IBC9gcYqNR4
In this talk Efim Dimenstein, Chief Architect at Liveperson will cover the rules and guidelines of building resilient systems, implementing them in real life and lessons learned during the process. The talk will focus on achieving resilience in real life and will feature a lot of examples and lessons learned from building systems currently in production running at extreme scale.
Efim will talk about:
· General resilience guidelines
· How they are implemented in practice
· What changes needed to be implemented to achieve resilience
· Lessons learned
· Summary
Mobile app real-time content modifications using websockets (LivePerson)
We are happy to host Benny Weingarten-Gabbay, Senior Software Engineer at eBay at our offices.
Benny presents BetterContent, a tool that allows editing of an iOS mobile app in runtime, in a fun and easy way.
Read more on our DevBlog:
https://connect.liveperson.com/community/developers/blog/2015/03/26/mobile-app-real-time-content-modifications-using-websockets
Mobile SDK: Considerations & Best Practices (LivePerson)
Mobile SDKs are a great way to make your service or API easily consumable by the large number of developers out there looking for state of the art tools to make their apps stand out in the competitive marketplaces, but building a stable, compatible and successful SDK is quite a challenge.
In this talk we cover the technical and design challenges involved in developing an efficient mobile SDK that is highly compatible with its host mobile app, the various considerations we took into account, and the lessons we've learned while designing and building LivePerson's native mobile SDK.
In this Meetup, Victor Perepelitsky - R&D Technical Leader at LivePerson leading the 'Real Time Event Processing Platform' team - will talk about 'Java 8', 'Stream API', 'Lambda', and 'Method reference'.
Victor will clarify what functional programming is and how you can use Java 8 to create better software.
Victor will also cover some pain points that Java 8 did not solve regarding functionality, and show how you can work around them.
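For readers who want a taste before the talk, here is a small self-contained sketch of the three features named above; the event list is a made-up example, not material from Victor's slides.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class Java8Taste {
    public static void main(String[] args) {
        List<String> events = Arrays.asList("click", "view", "click", "purchase");

        // Lambda + Stream API: count occurrences per event type.
        Map<String, Long> counts = events.stream()
                .collect(Collectors.groupingBy(e -> e, Collectors.counting()));
        counts.forEach((type, n) -> System.out.println(type + " -> " + n));

        // Method references: behavior passed by name instead of a lambda.
        events.stream()
                .map(String::toUpperCase)
                .distinct()
                .forEach(System.out::println);
    }
}
```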
If you are building a service-oriented system and you want to build it for scale as well as flexibility, there are a few questions you need to make sure are asked and answered regarding the data interchange between services and offline persistency of services data. Questions such as:
- How can I change a service API without breaking other services?
- How do I keep data from services consistent over time?
This talk covers the challenges we tackled while building our new service-oriented system, summarizing what we realized would be bad ideas and what the better approaches to data consistency are.
It includes a dive into the Apache Avro technology and how we used it.
It also covers the other supporting infrastructure we created to help us achieve the goal of a consistent yet flexible system.
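To illustrate the first question above, here is a hedged sketch using Avro's Java API: a hypothetical Event record gains a field with a default value, the kind of change that lets new readers keep decoding data written with the old schema.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;

public class AvroEvolutionSketch {
    public static void main(String[] args) {
        // v1 of a hypothetical event record.
        Schema v1 = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"string\"}]}");
        // v2 adds a field WITH a default, so records written with v1 stay readable.
        Schema v2 = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"string\"},"
                + "{\"name\":\"source\",\"type\":\"string\",\"default\":\"unknown\"}]}");
        // Avro can check this mechanically: reader v2 vs. writer v1.
        SchemaCompatibility.SchemaPairCompatibility result =
                SchemaCompatibility.checkReaderWriterCompatibility(v2, v1);
        System.out.println(result.getType()); // COMPATIBLE
    }
}
```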
Apache Avro and Messaging at Scale in LivePerson (LivePerson)
This talk covers the challenges we tackled while building our new service-oriented system: what we realized would be bad ideas, the better approaches to data consistency, how we used the Apache Avro technology, and the other supporting infrastructure we created to help us achieve the goal of a consistent yet flexible system.
Amihay Zer-Kavod is a Senior Software Architect at LivePerson.
In this lecture, Sergei Koren, System Architect on LivePerson's production team, presents data & image compression and its effective usage in modern web and data flows.
Support Office Hour Webinar - LivePerson API (LivePerson)
Course description and agenda
LivePerson enables the creation of innovative applications designed to enhance and extend the functionality of your LivePerson solution, as well as cooperate with partners worldwide.
In this session we will demonstrate the LivePerson API offerings and the development process, with a quick overview of the CHAT API and its basic usage. You will also have an opportunity to ask questions relevant to your business.
Host: Nitay Bartal
Date: July 17, 2014
Time: 11:00 AM - 12:00 PM EST
Duration: 60 minutes
Agenda:
- Leveraging LivePerson APIs to your benefit
- Overview of LivePerson API offerings
- Introduction to LivePerson Developers Network
- Overview of the Development process
- Tools and best practices
- Helpful tips and tricks
- Q&A
SIP - More than meets the eye
Speakers:
Ofer Cohen - VOIP Group Leader, LivePerson
Yossi Maimon - VOIP Technical Leader, LivePerson
An Introduction to the SIP protocol.
SIP's position in telecommunication networks and content services.
What is SIP:
The Session Initiation Protocol (SIP) is a signaling communications protocol, widely used for controlling multimedia communication sessions such as voice and video calls over Internet Protocol (IP) networks.
The protocol defines the messages that are sent between peers which govern establishment, termination and other essential elements of a call. SIP can be used for creating, modifying and terminating sessions consisting of one or several media streams. SIP can be used for two-party (unicast) or multiparty (multicast) sessions. Other SIP applications include video conferencing, streaming multimedia distribution, instant messaging, presence information, file transfer, fax over IP and online games.
(Source: Wikipedia)
My name is Neta Barkay, and I'm a data scientist at LivePerson.
I'd like to share with you a talk I presented at the Underscore Scala community on "Efficient MapReduce using Scalding".
In this talk I reviewed why Scalding fits big data analysis, and how it enables writing quick and intuitive code with the full functionality vanilla MapReduce has, without compromising on efficient execution on the Hadoop cluster. In addition, I presented some examples of Scalding jobs which can be used to get you started, and talked about how you can use Scalding's ecosystem, which includes Cascading and the monoids from the Algebird library.
Read more & Video: https://connect.liveperson.com/community/developers/blog/2014/02/25/scalding-reaching-efficient-mapreduce
Building Enterprise Level End-To-End Monitor System with Open Source Solution... (LivePerson)
Recently, LivePerson's Production moved from traditional monitoring to a new enterprise monitoring system using only open source tools.
Oren Katz (Production Monitoring Team Leader) and Ittiel Savir (Automation Team Leader) will describe the road from concept to implementation at LivePerson.
In the lecture we will talk about the chosen tools, the development process, tips, and how to avoid pitfalls.
Check out Oren's recent blog post on the Subject: http://bit.ly/16i5lDS
Ofer Ron, senior data scientist at LivePerson.
Recently, I've had the pleasure of presenting an introduction to Data Science and data-driven products at DevconTLV.
I focused this talk around the basic ideas of data science, not the technology used, since I thought that far too many times companies and developers rush to play around with "big data" related technologies, instead of figuring out what questions they want to answer, and whether these answers form a successful product.
From a Kafkaesque Story to The Promised Land at LivePerson (LivePerson)
Ran Silberman, developer & technical leader at LivePerson presents how LivePerson moved their data platform from a legacy ETL concept to new "Data Integration" concept of our era.
Kafka is the main infrastructure that forms the backbone of data flow in the new Data Integration. That said, Kafka cannot come by itself. Other supporting systems like Hadoop, Storm, and the Avro protocol were also integrated.
In this lecture Ran will describe the implementation in LivePerson and will share some tips and how to avoid pitfalls.
Read More: https://connect.liveperson.com/community/developers/blog/2013/11/21/from-a-kafkaesque-story-to-the-promised-land
LivePerson Developers is proud to host a meetup about A/B testing by Shlomo Lahav, Chief Scientist at LivePerson.
The lecture will focus on testing and the ability to draw conclusions, especially on the web.
- What is an A/B test?
- How to construct an A/B test properly?
- What are the metrics that can be used?
- Can the results be misleading?
- Errors: bias and statistical errors
- First and second type errors
- Measuring lift, why lift is a biased measure (see the sketch after this list).
- Is it possible to change the test settings during the test?
- How to run multivariate testing effectively?
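The bias point in the list above can be made precise. A sketch of the standard argument, with notation of my own (not from Shlomo's slides): let $\hat p_A$ and $\hat p_B$ be the observed conversion rates of the control and test variants, estimating true rates $p_A$ and $p_B$ from independent samples. The naive lift estimate is

$$\widehat{\mathrm{lift}} = \frac{\hat p_B}{\hat p_A} - 1, \qquad \mathbb{E}\!\left[\frac{\hat p_B}{\hat p_A}\right] = \mathbb{E}[\hat p_B]\,\mathbb{E}\!\left[\frac{1}{\hat p_A}\right] \ge \frac{p_B}{p_A},$$

where the inequality is Jensen's, applied to the convex function $x \mapsto 1/x$. So the ratio-of-rates estimator is biased upward even when the true lift is zero, and the bias grows as the control sample shrinks.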
Introduction to Vertica (Architecture & More) (LivePerson)
LivePersonDev is happy to host this meetup with Zvika Gutkin, an Oracle and Vertica expert DBA at LivePerson and a specialist in BI and Big Data.
At LivePerson, we handle enormous amounts of data. We use Vertica to analyse this data in real time.
In this lecture Zvika will cover the following:
1. Present the architecture of Vertica
2. Compare row store to column store
3. Explain how Vertica achieves fast query times
4. Show a few use cases.
5. Explain what LivePerson does with Vertica and why we chose Vertica.
6. Talk about why we love Vertica and why we hate it.
7. Is Vertica a SQL DB or NoSQL? Is Vertica consistent or eventually consistent?
8. How does Vertica differ from other SQL and NoSQL technologies?
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers, without pulling teeth or pulling your hair out: practical tips and strategies for successful relationship building that leads to closing the deal.
JMeter webinar - integration with InfluxDB and Grafana (RTTS)
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, what agile testing is, and finally what testing in DevOps looks like. We also ran a lovely workshop with the participants, exploring different ways to think about quality and testing across the DevOps infinity loop.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
- Create a campaign using Mailchimp with merge tags/fields
- Send an interactive Slack channel message (using buttons)
- Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
- Your campaign sent to target colleagues for approval
- If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
- But if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
System Revolution - How We Did It
2. System revolution
How we did it
Victor Perepelitsky
questions: www.meetup.com/ILTechTalks/events/226834931/
slideshare: www.slideshare.net/victorperepelitsky
email: victor.prp@gmail.com
3. LivePerson customer example
[Diagram: a visitor from the UK and a salesman exchange chat lines; the salesman gets session state and activity events; a sales manager defines a rule to invite UK visitors to chat, sends invites, and sees reports.]
4. LivePerson at a glance
● Account (brand) - a LivePerson customer
● Visitor - an individual who interacts with the business owner’s brand
● Agent - an account representative who may interact with visitors (examples: technical support, sales)
● Admin - an account representative who defines the business goals and normally manages agents in order to effectively reach them
5. LivePerson at a glance
[Diagram: agents and visitors exchange chat lines and invites; agents get session state and activity events; admins define business rules and see reports. Scale: chat ~2K req/sec, visitors ~100K req/sec, admins under 100 req/sec.]
6. Legacy
[Diagram: the same actors and scales, all served by a single Real Time Server plus an Offline and Reporting component.]
7. Legacy - stateful + account sticky
[Diagram: web servers route each session to the RT server that owns its account: a session from account A goes to the RT server holding accounts A and C, a session from account B to the RT server holding accounts B and D; a third RT server holds accounts E, F, and G.]
9. Legacy - pains
● Hard to scale
● Hard to add new features
● Poor resource utilization
● Poor manageability
● Poor QoS
● Huge friction with customers
10. Let's go back
[Diagram: repeat of the "LivePerson at a glance" picture - agents, visitors, and admins, with chat (~2K req/sec), visitor (~100K req/sec), and admin (under 100 req/sec) scales.]
11. Proper system architecture
[Diagram: the same actors and scales, with the backend split into dedicated real time, offline, reporting, and config components.]
12. The new dream
[Diagram: the backend split into chat, offline, reporting, config, and a "monitor and engage" component hosting business apps / extensions; admin scale grows to under 1K req/sec.]
13. Monitor and engage = shark
Shark manifesto
● Collects and makes available data about individuals (visitors) as they interact with the business owner’s brand (account)
● Acts in real time to engage visitors (chat, ad, call, etc.)
● Is a platform for business logic modules (sharklets) which can be independently developed and deployed
15. Platform requirements
● E2E latency within the DC < 30 ms
● Good resource utilization (CPU > 50%)
● Efficient - at least 500 req/sec per node
● Independent sharklet development lifecycle
● High availability
○ uptime > 99.99999%
○ data loss < 0.01%
● Resilient - no service downtime when an external resource is unavailable (minimal degradation is allowed)
● Business logic correctness - 99.9%
18. Stateless
[Diagram: sessions 1-4 all hit a shared session data store; each request potentially requires access to session data.]
19. Facts that helped us decide
1. Legacy works as “stateful without HA”
2. A small data loss has a tiny customer impact (0.01% loss is good enough)
3. Stateless requires much more resources and initial effort
4. We can add an HA store in the future
23. Legacy - successful patterns
1. Requests are processed in memory
2. External resources are accessed asynchronously to visitor requests
3. Customer rules and data (AccountConfig) are kept in memory and may be updated in the background (a sketch of this pattern follows below)
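A minimal sketch of pattern 3, assuming an immutable config snapshot and a background refresh loop; this is our own illustration, not the actual LivePerson code, and ConfigSource/AccountConfig are hypothetical names:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class AccountConfigHolder {

    private final AtomicReference<AccountConfig> current;
    private final ScheduledExecutorService refresher =
            Executors.newSingleThreadScheduledExecutor();

    public AccountConfigHolder(AccountConfig initial, ConfigSource source) {
        this.current = new AtomicReference<>(initial);
        // Background refresh: visitor requests never wait on the external resource.
        refresher.scheduleAtFixedRate(() -> {
            try {
                current.set(source.load()); // swap in a fresh immutable snapshot
            } catch (Exception e) {
                // Keep serving the last known good config if the source is unavailable.
            }
        }, 5, 5, TimeUnit.SECONDS);
    }

    public AccountConfig get() {
        return current.get(); // lock-free in-memory read on the request path
    }
}

interface ConfigSource {
    AccountConfig load() throws Exception;
}

final class AccountConfig {
    final String accountId;
    final long version;
    AccountConfig(String accountId, long version) {
        this.accountId = accountId;
        this.version = version;
    }
}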
24. Legacy - pains
1. Order of calls (inside code + rules)
2. Business logic is not built from pluggable components
3. HTTP requests are tightly coupled to the logical layers (hard to move toward other protocols such as WebSockets)
26. [Diagram: the Shark runtime. A facade with protocol adapters fronts web visitors, mobile visitors, and agents. Sharklet sync handlers do fast CEP and engagements; sharklet async handlers handle slow actions and external resource access via the message bus; both sides share the Account Runtime Data.]
27. Shark - The Big Parts
1. Facade - decouples real-world protocols from the logical layers
2. CEP - avoids call-order management
3. Sync - very fast in-memory processing
4. Async - allows slow actions and external resource access
5. Account Runtime Store - allows in-memory access to customer configuration
(A sketch of the sync/async contract follows below.)
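A hypothetical sketch of the sharklet contract these parts imply; all names here are illustrative, not the actual LivePerson API:

public interface Sharklet {

    // SYNC side: runs in memory on the request path; must be fast, no blocking I/O.
    void onEvent(SessionEvent event, SyncContext ctx);

    // ASYNC side: invoked via the message bus; slow actions and external calls go here.
    void onTask(Task task, AsyncContext ctx);
}

interface SyncContext {
    AccountSnapshot account();    // consistent in-memory account data
    void emit(Action action);     // e.g. invite the visitor to chat
    void enqueue(Task task);      // defer slow work to the async side
}

interface AsyncContext {
    void publish(Object message); // message bus access for external integration
}

// Marker types for the sketch.
class SessionEvent {}
class Task {}
class Action {}
class AccountSnapshot {}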
33. Drools - we tried to kill it
We had
● played with it - :)
● integrated it into Shark - :)
● made a POC using LivePerson logic - :)
● tested it for performance - :(
44. Fundamental decisions
Stateful or stateless? -> stateful
What are the big parts? -> we have them
Basic technology stack -> chosen
CEP - technology choice -> DIY (in-house)
45. Fundamental decisions
Stateful or stateless? -> stateful
What are the big parts? -> we have them
Basic technology stack -> chosen
CEP - technology choice -> DIY (in-house)
Locking architecture
46. Locking - The model
[Diagram: the world contains accounts; an account (e.g. account A) contains multiple sessions (sessions 1 to 4).]
47. Locking - Legacy pains
● You must be aware of locking when writing business logic
● A write lock on an account freezes all account operations
● Locking became the bottleneck (not the CPU)
● Bugs
48. Locking - Shark solution
● Read/write lock per session
● Write business logic only - no locking awareness
● No write lock on the account - copy on write (see the sketch below)
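A minimal sketch of this scheme, assumed rather than taken from the actual Shark code: a read/write lock per session, and copy-on-write for shared account data so account updates never block readers:

import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;
import java.util.function.UnaryOperator;

public class SessionStore {

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private SessionState state = new SessionState();

    public <T> T read(Function<SessionState, T> reader) {
        lock.readLock().lock();
        try {
            return reader.apply(state);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void write(UnaryOperator<SessionState> writer) {
        lock.writeLock().lock();
        try {
            state = writer.apply(state);
        } finally {
            lock.writeLock().unlock();
        }
    }
}

class AccountHolder {
    // Immutable snapshots: readers take a consistent copy without any lock.
    private final AtomicReference<AccountData> data =
            new AtomicReference<>(new AccountData());

    AccountData snapshot() {
        return data.get(); // no account-wide write lock needed
    }

    void update(UnaryOperator<AccountData> change) {
        data.updateAndGet(change); // copy on write: build and publish a new snapshot
    }
}

class AccountData {}
class SessionState {}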
49. [Diagram: the same Shark runtime picture, annotated - SYNC: a single processing cycle uses a consistent copy of the account data; ASYNC: updates the account data using the copy-on-write pattern.]
51. Fundamental decisions
Stateful or stateless? -> stateful
What are the big parts? -> we have them
Basic technology stack -> chosen
CEP - technology choice -> DIY (in-house)
Locking architecture -> decided
55. Dream = LiveEngage platform
[Diagram: repeat of "The new dream" - chat, offline, reporting, config, and the "monitor and engage" component hosting business apps / extensions, with the same scales.]
56. Rules - from definition to runtime
[Diagram: the admin defines business rules in config; "monitor and engage" (a business app / extension) receives the visitor's activity events and applies the rule: if the visitor meets the conditions -> invite to chat.]
58. What is a rules engine
A rules engine is a pluggable software component that executes business rules. The rules are externalized, i.e. separated from the application code.
65. GRF - Generic Rules Framework
Conditions and outcomes are building blocks that can be combined to create complex rules.
Hard-coded building blocks: TimeOnPage, GeoLocation, InviteToChat
Example rule:
if (timeOnPage(5) and geoLocation(“US”)) execute { inviteToChat() }
(A sketch of how such building blocks might compose follows below.)
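A sketch of how GRF-style building blocks might compose; the names mirror the slide, but the code is our own illustration, not the real framework:

import java.util.function.Predicate;

public class GrfSketch {

    interface Outcome {
        void execute(Visitor visitor);
    }

    static Predicate<Visitor> timeOnPage(int seconds) {
        return v -> v.timeOnPageSeconds >= seconds;
    }

    static Predicate<Visitor> geoLocation(String country) {
        return v -> country.equals(v.country);
    }

    static Outcome inviteToChat() {
        return v -> System.out.println("Inviting visitor " + v.id + " to chat");
    }

    // A rule pairs a (possibly composite) condition with an outcome.
    static void rule(Predicate<Visitor> condition, Outcome outcome, Visitor visitor) {
        if (condition.test(visitor)) {
            outcome.execute(visitor);
        }
    }

    public static void main(String[] args) {
        Visitor visitor = new Visitor("42", "US", 7);
        // if (timeOnPage(5) and geoLocation("US")) execute { inviteToChat() }
        rule(timeOnPage(5).and(geoLocation("US")), inviteToChat(), visitor);
    }

    static class Visitor {
        final String id;
        final String country;
        final int timeOnPageSeconds;
        Visitor(String id, String country, int timeOnPageSeconds) {
            this.id = id;
            this.country = country;
            this.timeOnPageSeconds = timeOnPageSeconds;
        }
    }
}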
66. GRF + CEP = RulesEngine
GeoLocation condition:
trigger when (geo data is changed)
evaluate(geo, accountConfig) {
  if (geo == accountConfig.geo) TRUE else FALSE
}
The condition-type implementor defines the evaluation trigger, instead of relying on automatic detection. (A sketch follows below.)
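A sketch of this idea: every condition type declares which event types should trigger its re-evaluation, so the engine never has to detect relevance automatically. All names here are illustrative assumptions:

import java.util.EnumSet;
import java.util.Set;

interface TriggeredCondition {
    Set<EventType> triggers(); // when to re-evaluate
    boolean evaluate(VisitorEvent event, AccountConfigView config);
}

enum EventType { GEO_CHANGED, PAGE_VIEW, TIME_TICK }

class GeoLocationCondition implements TriggeredCondition {

    @Override
    public Set<EventType> triggers() {
        // Only geo changes are relevant; other events never re-run this condition.
        return EnumSet.of(EventType.GEO_CHANGED);
    }

    @Override
    public boolean evaluate(VisitorEvent event, AccountConfigView config) {
        return config.targetGeo.equals(event.geo);
    }
}

class VisitorEvent {
    String geo;
}

class AccountConfigView {
    String targetGeo;
}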
69. [Diagram: the Shark runtime with the rules engine plugged in - SYNC: detects which conditions should be evaluated and triggers GRF; ASYNC: loads rules from the Account Config into the Shark rules engine via the message bus.]
71. [Diagram: the full runtime - SYNC: CEP, rules, and a report sharklet; ASYNC: integrated with the Account Config Service via the message bus; sharklets A and B run side by side behind the facade and adapters, sharing the Account Runtime Data.]
73. The dream comes true
[Diagram: the target architecture realized - agents, visitors, and admins served by chat, offline, reporting, config, and the "monitor and engage" component hosting business apps / extensions.]
82. Testing methodology
● Unit test - use it
● Integration test - invest here
● System test - try to minimize effort
● Performance
○ Integration - worth it
○ System - choose your tests
85. Testing methodology
How did we test the platform?
We had
● built the main code with tests in mind
● mocked our clients (a minimal sketch follows below)
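A minimal sketch of "mocked our clients", assuming JUnit 4 and Mockito; ChatClient and EngagementService are illustrative names, not the real API:

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

import org.junit.Test;

public class EngagementTest {

    interface ChatClient {
        void invite(String visitorId);
    }

    static class EngagementService {
        private final ChatClient client;
        EngagementService(ChatClient client) { this.client = client; }
        void onVisitorQualified(String visitorId) {
            client.invite(visitorId);
        }
    }

    @Test
    public void qualifiedVisitorGetsInvited() {
        // The platform logic is exercised against a mocked client-facing
        // interface instead of a real web or mobile client.
        ChatClient client = mock(ChatClient.class);
        new EngagementService(client).onVisitorQualified("visitor-42");
        verify(client).invite("visitor-42"); // the mock records the call
    }
}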
86. Java 8
● We moved to Java 8 one year ago
● It was easy :)
● It pushed us toward
○ more expressive code
○ a functional style
○ immutability
Search on YouTube: LivePerson Functional Java 8
(A small illustration follows below.)
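A small illustration of our own (not from the deck) of the Java 8 style the slide refers to: streams, lambdas, and immutable results:

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class Java8Style {
    public static void main(String[] args) {
        List<String> countries = Arrays.asList("US", "UK", "DE", "US");

        // Expressive and functional; the source list is never mutated.
        List<String> usVisitors = countries.stream()
                .filter("US"::equals)
                .collect(Collectors.toList());

        System.out.println(usVisitors); // [US, US]
    }
}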
87. Notes about G1
● Designed for big heaps; minimizes long pauses
● Slated to be the default GC in Java 9
● We tested our system with G1 on a 12 GB heap and received good results (no long GC pauses); see the example flags below
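For reference, a G1 test run like the one on this slide might be launched with flags along these lines; the heap size matches the slide, while the pause target, log settings, and jar name are illustrative assumptions:

java -XX:+UseG1GC \
     -Xms12g -Xmx12g \
     -XX:MaxGCPauseMillis=200 \
     -Xloggc:gc.log -XX:+PrintGCDetails \
     -jar shark.jar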
89. We are happy now
● Horizontal scalability
● Independent and safe business logic development
● Fast development cycles (platform, sharklets, data model)
● Efficient resource utilization
● Fewer bugs (and easier to fix)
● Better QoS
● Overall confidence
91. Future challenges and ideas
● Better high availability
● Deployment with no downtime
● Management tools
● 100K accounts
92. Tips
● Define scope and requirements
● Company commitment is a must
● Work with your clients
● Treat test code as if it runs in production
● Automated perf tests - they help
● Sometimes DIY is the best solution
● Respect legacy - combine old ideas with new technologies
● Understand the complexity and find the simplest solution