1CONFIDENTIAL
BEST PRACTICES OF BUILDING DATA STREAMING API
KANSTANTSIN SLISENKA
APRIL 6, 2017
ABOUT ME
KANSTANTSIN SLISENKA
EPAM Systems, Lead Software Engineer
• Java backend engineer
• Speaker at Java Tech Talks, SEC Online, CMCC Tech Talks, IT Week
• I'm interested in
  – Complex Java backend, SOA, databases
  – High-load, fault-tolerant, distributed systems
AGENDA
1. Streaming and polling
2. Technical implementation of streaming
3. Technical challenges of streaming
4. Some streaming libraries, tools and services
REAL-TIME APPS ARE EVERYWHERE
• Uber, Facebook, Google Maps
• Stock prices
• Messengers
• Social networks
• Real-time dashboards
• Games, …
HOW REAL-TIME APPS WORK
"I just want to hear 3 magical words…"
Polling, long polling, streaming
POLLING
• The client sends requests at a fixed interval; the server returns an empty response until the data source delivers new data
- Not real-time
- Useless calls
LONG POLLING
• The server holds each request until new data arrives and returns an empty response only on timeout; the client then sends the next request
- Not real-time
• No or fewer useless calls
STREAMING
• The client subscribes once; the server sends data over the same connection every time new data arrives
• Real time
• Long-held connection
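The three interaction styles can be contrasted in a few lines. This is a minimal sketch, assuming the data source is modelled as an in-process queue; the function names `poll`, `long_poll` and `stream` are illustrative and not taken from the slides.

```python
import queue


def poll(source: queue.Queue):
    """Polling: answer immediately; None stands for an empty response."""
    try:
        return source.get_nowait()
    except queue.Empty:
        return None


def long_poll(source: queue.Queue, hold_seconds: float):
    """Long polling: hold the request until data arrives or the hold time expires."""
    try:
        return source.get(timeout=hold_seconds)
    except queue.Empty:
        return None  # timeout: empty response, the client simply asks again


def stream(source: queue.Queue, send, stop_marker=None):
    """Streaming: after one subscribe, push every new item over the same connection."""
    while True:
        item = source.get()
        if item is stop_marker:
            break
        send(item)
```

With polling the client pays for every empty answer, with long polling it pays only for timeouts, and with streaming a single long-held connection carries every update.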
TECHNICAL IMPLEMENTATION OF STREAMING
TCP/UDP MULTICAST
Streaming on hardware and network protocol level
• UDP multicast
• Reliable multicast protocols (TCP-like delivery guarantees on top of multicast)
  – Cisco PGM and others
• The most effective network utilization
http://www.java67.com/2016/09/difference-between-tcp-and-udp-in-java.html
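For reference, joining and sending to a UDP multicast group takes only a few socket options. The sketch below uses Python's standard `socket` module; the group address `239.1.1.1` and port `5007` are hypothetical values chosen for illustration, and whether packets are actually delivered depends on the network supporting multicast.

```python
import socket
import struct

GROUP = "239.1.1.1"  # hypothetical multicast group address
PORT = 5007          # hypothetical port


def membership_request(group: str) -> bytes:
    """Build the ip_mreq structure: group address + local interface (0.0.0.0 = any)."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))


def open_sender(ttl: int = 1) -> socket.socket:
    """UDP socket whose datagrams may travel `ttl` router hops."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


def open_receiver(group: str = GROUP, port: int = PORT) -> socket.socket:
    """UDP socket bound to the port and subscribed to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group))
    return sock
```

A sender would call `open_sender().sendto(b"Temp=38.60", (GROUP, PORT))` once, and every receiver that joined the group gets the same datagram without the sender addressing each one.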
WHY TCP/UDP MULTICAST BECAME LESS POPULAR
1. Browser apps became more popular
  • No full TCP/UDP support in browsers
2. Host and network virtualization
  • Virtual and hardware networks are different
  • No benefit from multicast as routers are not aware of virtual hosts
3. Firewall/proxy restrictions
  • Usually only HTTP is not restricted in corporate networks
4. Poor multicast support by hosting providers
  • Multicast is offered at additional cost
  • Poor quality of service
HTTP IS A REQUEST-RESPONSE PROTOCOL
FOREVER LOOP
#HIDDEN IFRAME #AJAX #COMET #HTTP STREAMING
COMET / HTTP STREAMING
BENEFITS
1. Uses only web technologies
  – No more JRE, Flash or browser plugins on the client side
DRAWBACKS
1. HTTP browser limitation
  – Max 6-8 parallel connections per domain
  – Workaround: domain sharding, multiplexing
2. Poor client and server performance
  – HTTP is not used the way it was designed
3. Proxy/firewall/browser kills the request by timeout
4. Need to handle disconnects
Should be used as a fallback only!
EVENT SOURCE API: TURNING HACK INTO STANDARD
• Standard JavaScript API
• No more hidden IFRAMEs
• Browser automatically reconnects
• Long-held HTTP call between browser and server
• One way: from server to browser
• Still poor server performance
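On the wire, an Event Source stream is plain text sent with the `text/event-stream` content type: each event is a group of `field: value` lines terminated by a blank line. A minimal sketch of a server-side formatter follows; the helper name `format_sse` is an assumption made for illustration.

```python
def format_sse(data: str, event=None, event_id=None, retry_ms=None) -> str:
    """Serialize one server-sent event in the text/event-stream format."""
    lines = []
    if retry_ms is not None:
        lines.append(f"retry: {retry_ms}")  # reconnection delay hint for the browser
    if event_id is not None:
        lines.append(f"id: {event_id}")     # sent back as Last-Event-ID on reconnect
    if event is not None:
        lines.append(f"event: {event}")     # event type the client listens for
    for part in data.split("\n"):
        lines.append(f"data: {part}")       # multi-line payloads use one data line per line
    return "\n".join(lines) + "\n\n"        # a blank line ends the event
```

The `id` field is what makes the automatic reconnect useful: the browser reports the last ID it saw, so the server can continue from that point.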
WEB SOCKET: TCP IN BROWSER
1. HTTP handshake (client to server)
    GET /demo HTTP/1.1
    Upgrade: WebSocket
    Connection: Upgrade
    Origin: http://site.com
2. Upgrade response, "switch protocols" header (server to client)
    HTTP/1.1 101 Web Socket Protocol Handshake
    Upgrade: WebSocket
    Connection: Upgrade
3. Switch from HTTP to WebSocket frames over the same TCP connection (ports 80/443), in both directions
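The handshake on the slide is abbreviated. In the standardized protocol (RFC 6455) the client also sends a random `Sec-WebSocket-Key` header and the server must answer with a matching `Sec-WebSocket-Accept` value. A sketch of the server side of that calculation is shown below; the function names are illustrative.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Sec-WebSocket-Accept = base64(SHA-1(key + GUID))."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def upgrade_response(sec_websocket_key: str) -> str:
    """The 101 response that switches the connection to WebSocket frames."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {websocket_accept(sec_websocket_key)}\r\n"
        "\r\n"
    )
```

For the sample key from the RFC, `dGhlIHNhbXBsZSBub25jZQ==`, the accept value is `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`.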
WEB-RTC: UDP + P2P IN BROWSER
• Real-time P2P connection between browsers
• Data, audio, video
• STUN server needed for initial handshake
• https://webrtc.org/
Handshake through the STUN server:
1. Each browser announces its address: "I am 10.0.10.1", "I am 10.0.25.40"
2. Each browser learns the address of its peer: "He is 10.0.25.40", "He is 10.0.10.1"
3. Data, voice and video flow directly between 10.0.10.1 and 10.0.25.40
HTTP/2 SERVER PUSH
• Browser: "May I have index.html?"
• Server: "I think you also need logo.png and styles.css", and sends index.html, logo.png, styles.css
• Just an optimization for page load time
• Not a replacement for WebSocket
PUSH NOTIFICATIONS
VENDOR SERVICES
• Google Cloud Messaging: Android/Chrome
• Apple Push Notification Service: iPhone, iPad, Safari
• Other services: Microsoft, Blackberry, …
Flow between the client app, your back-end and the vendor messaging service:
1. GET TOKEN (client app, from the messaging service)
2. SEND TOKEN (client app to your back-end)
3. STORE TOKEN (your back-end)
4. SEND NOTIFICATION (your back-end to the messaging service)
5. SEND NOTIFICATION (messaging service to the device)
Not a replacement for WebSockets!
https://www.urbanairship.com/push-notifications-explained
COMPARISON OF STREAMING IMPLEMENTATIONS
TCP/UDP multicast
• Use in browser: NO
• Use outside browser: YES
• Technical details: custom protocols over TCP/UDP
• Benefits: hardware and protocol level, most effective network usage
• Drawbacks: doesn't work in browser; can be blocked by proxy/firewall
HTTP Streaming / COMET
• Use in browser: YES
• Use outside browser: YES (makes sense for browser apps)
• Technical details: long HTTP calls
• Benefits: only web technology used
• Drawbacks: negative impact on client and server performance
Event Source API
• Use in browser: YES
• Use outside browser: NO
• Technical details: long HTTP calls
• Benefits: easier to use than COMET
• Drawbacks: negative impact on server performance
WebSocket
• Use in browser: YES
• Use outside browser: YES
• Technical details: HTTP for handshake with subsequent upgrade to TCP
• Benefits: all benefits from TCP and browser apps
• Drawbacks: needs fallback to polling if disabled by firewall/proxy
Web-RTC
• Use in browser: YES
• Use outside browser: YES
• Technical details: P2P UDP; STUN server to exchange IP addresses
• Benefits: all benefits from TCP and browser apps
• Drawbacks: needs intermediate discovery STUN server
DATA STREAMING CHALLENGES
DATA STREAMING CHALLENGES
1. Protocol fallback
2. API design
3. Fault-tolerance
4. Security
ARCHITECTURE OPTIMIZATION
5. Using schemas
6. Sending deltas (snapshot-update)
7. Data merging
8. Replaceable buffer
1. PROTOCOL FALLBACK
Why it is needed:
• Client doesn't support WebSocket
• Firewall/proxy issues
• Unstable network connection
Automatic switch to another protocol:
1. Try WebSocket
2. Then HTTP streaming
3. Then long polling*
4. Then polling*
* Not all applications can tolerate such high latency
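The fallback chain can be sketched as an ordered list of transports that are tried one after another. This is only an illustration: `TransportUnavailable` and `connect_with_fallback` are hypothetical names, and each `connect` callable stands in for a real WebSocket, HTTP streaming or polling client.

```python
class TransportUnavailable(Exception):
    """Raised by a transport that cannot be used (unsupported, blocked, timed out)."""


def connect_with_fallback(transports):
    """Try (name, connect) pairs in order; return the first that succeeds.

    `transports` is ordered from most to least preferred, for example
    WebSocket, HTTP streaming, long polling, polling.
    """
    errors = []
    for name, connect in transports:
        try:
            return name, connect()
        except TransportUnavailable as exc:
            errors.append(f"{name}: {exc}")  # remember why it failed, then fall through
    raise ConnectionError("no transport available: " + "; ".join(errors))
```

Returning the transport name lets the application decide whether the resulting latency is acceptable, since the last two options may be too slow for some use cases.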
2. STREAMING API DESIGN
Three styles, compared by development and support complexity and by performance:
• onMessage
  – Lots of if-else blocks
  – Very hard to maintain
• Publish-Subscribe
  – Logical notion of subscription
  – Trade-off between level of abstraction and performance
• ORM-style
  – High level of abstraction
  – We don't know what exactly happens under API calls
  – Data structures complexity
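To make the publish-subscribe style concrete, here is a minimal in-process sketch. The `Broker` class and topic names are hypothetical; a real streaming API would route messages received from the network instead of local calls.

```python
from collections import defaultdict


class Broker:
    """Publish-subscribe: handlers register per topic instead of one big onMessage."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register a callback and return a function that cancels the subscription."""
        self._subscribers[topic].append(callback)

        def unsubscribe():
            self._subscribers[topic].remove(callback)

        return unsubscribe

    def publish(self, topic, message):
        """Deliver a message to every current subscriber of the topic."""
        for callback in list(self._subscribers.get(topic, ())):
            callback(message)
```

With a single `onMessage` handler, the routing done here by the dictionary lookup would be written by hand as an if-else chain over message types, which is the maintenance problem the slide points at.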
3. FAULT-TOLERANCE
The server keeps a context for each client.
CLIENT/CONNECTION IS DOWN
• Heartbeats between client and server detect a dead connection
• The server keeps the client context until the session/context alive timeout expires
• On reconnect: try to restore the context and send the difference (preferable)
• Or request the data again (HTTP/snapshot + WebSocket)
SERVER IS DOWN
• The client disconnects from Server 1 and connects to another server (Server 2)
• Try to restore the context
• Or request the data again
If streaming no longer works, switch to polling
We are no longer stateless!
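One possible way to implement "restore context + send difference" is to number the messages and keep them in the client context until the session times out. The sketch below is an assumption about how this could be done, not the method used in the talk; `ClientContext` and its fields are hypothetical, and the message log is unbounded for brevity.

```python
import time


class ClientContext:
    """Server-side state kept per client so that a reconnect can resume."""

    def __init__(self, timeout_seconds: float, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.last_heartbeat = clock()
        self.log = []       # (sequence number, message); a real server would trim this
        self.next_seq = 1

    def heartbeat(self):
        """Called whenever a heartbeat from the client arrives."""
        self.last_heartbeat = self.clock()

    def is_alive(self) -> bool:
        """The context survives until the alive timeout passes without heartbeats."""
        return self.clock() - self.last_heartbeat <= self.timeout_seconds

    def record(self, message) -> int:
        """Store an outgoing message and return its sequence number."""
        seq = self.next_seq
        self.log.append((seq, message))
        self.next_seq += 1
        return seq

    def resume(self, last_seen_seq: int):
        """Missed messages after a reconnect, or None if a fresh snapshot is needed."""
        if not self.is_alive():
            return None
        return [message for seq, message in self.log if seq > last_seen_seq]
```

When `resume` returns `None` the client falls back to the second option on the slide and requests the data again.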
4. SECURITY
Request-response
• Protocol: HTTPS
• Authentication: when the HTTP session starts
• Authorization: on each client request
• Log-off: invalidate access token and session
Streaming
• Protocol: WSS
• Authentication: when the HTTP session starts
• Authorization: at the beginning of the connection
• Log-off: invalidate access token and session, terminate the WebSocket connection
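Because a streaming connection is authorized only once, log-off has to actively close it. A minimal sketch of that bookkeeping follows; `Connection` is a stand-in for a real WebSocket connection and `StreamingSessions` is a hypothetical registry, both introduced only for illustration.

```python
class Connection:
    """Stand-in for a real WebSocket connection (hypothetical)."""

    def __init__(self):
        self.is_open = True

    def close(self):
        self.is_open = False


class StreamingSessions:
    """Tracks open connections per access token."""

    def __init__(self, valid_tokens):
        self.valid_tokens = set(valid_tokens)
        self.connections = {}  # token -> list of open connections

    def connect(self, token, connection):
        """Authorization happens once, at the beginning of the connection."""
        if token not in self.valid_tokens:
            raise PermissionError("invalid or expired access token")
        self.connections.setdefault(token, []).append(connection)

    def log_off(self, token):
        """Invalidate the token and terminate every connection opened with it."""
        self.valid_tokens.discard(token)
        for connection in self.connections.pop(token, []):
            connection.close()
```

Without the explicit close, a connection opened before log-off would keep receiving data even though the token is no longer valid.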
5. USING SCHEMAS
Schema version = 1, used by both server and client:
Field     Type
Temp      Decimal
Pressure  Decimal
Status    CONNECTED=1, DISCONNECTED=2
• Don't send field names in each message
• With the schema, the message is just: 25.5 | 751 | 1
• Need to somehow manage different schema versions
• Without the schema, every message repeats the field names:
{
  "sensorData": {
    "temp": 25.5,
    "pressure": 751,
    "status": "CONNECTED"
  }
}
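A schema-based codec for the sensor message can be sketched as follows. The assumption here is that both sides agree on the schema version when the subscription is created, so only values travel in each message; the names `SCHEMAS`, `encode` and `decode` are illustrative.

```python
from decimal import Decimal

STATUS_CODES = {"CONNECTED": 1, "DISCONNECTED": 2}
STATUS_NAMES = {code: name for name, code in STATUS_CODES.items()}

# Schema version -> ordered field names (version 1 matches the slide).
SCHEMAS = {1: ("temp", "pressure", "status")}


def encode(record: dict, version: int = 1) -> str:
    """Write values in schema order, without field names."""
    values = []
    for field in SCHEMAS[version]:
        value = record[field]
        if field == "status":
            value = STATUS_CODES[value]  # enum name -> compact numeric code
        values.append(str(value))
    return " | ".join(values)


def decode(payload: str, version: int = 1) -> dict:
    """Rebuild the record by pairing values with the schema's field names."""
    parts = [part.strip() for part in payload.split("|")]
    record = {}
    for field, text in zip(SCHEMAS[version], parts):
        record[field] = STATUS_NAMES[int(text)] if field == "status" else Decimal(text)
    return record
```

Adding a field means registering a new entry in `SCHEMAS` under a new version number, which is where the version management concern from the slide comes in.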
6. SENDING DELTAS (SNAPSHOT-UPDATE)
The server keeps a data snapshot in memory and sends only the fields that changed.
1. Client to server: subscribe (TEMP, PRESSURE)
2. Server to client, snapshot: Temp=35.50, Pressure=750
3. Temp changes from 35.50 to 38.60; server to client, update: Temp=38.60
4. Pressure changes from 750 to 740; server to client, update: Pressure=740
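The snapshot-update exchange can be sketched with a per-subscription snapshot on the server and a merge function on the client. `Subscription` and `apply_message` are hypothetical names, and messages are modelled as `(kind, fields)` tuples purely for illustration.

```python
class Subscription:
    """Server side: remembers the last values sent and emits only the changes."""

    def __init__(self, fields, send):
        self.fields = tuple(fields)
        self.send = send
        self.snapshot = None  # nothing sent yet

    def on_data(self, values: dict):
        current = {f: values[f] for f in self.fields if f in values}
        if self.snapshot is None:
            self.snapshot = dict(current)
            self.send(("snapshot", dict(current)))  # first message: full state
            return
        delta = {f: v for f, v in current.items() if self.snapshot.get(f) != v}
        if delta:
            self.snapshot.update(delta)
            self.send(("update", delta))            # later messages: changed fields only


def apply_message(state: dict, message) -> dict:
    """Client side: a snapshot replaces the state, an update is merged into it."""
    kind, fields = message
    if kind == "snapshot":
        state.clear()
    state.update(fields)
    return state
```

Replaying the slide's sequence produces one snapshot followed by two single-field updates, and the client ends up with Temp=38.60 and Pressure=740.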
7. DATA GROUPING
Merge multiple messages into one to reduce bandwidth and frequency
Messages on the server:
Time          Price   Quantity
12:40:00.100  121.60  5
12:40:00.150  121.95  10
12:40:00.600  121.70  20
12:40:01.100  121.75  50
12:40:01.900  121.60  100
Messages sent to the client:
Time      Max price (MAX)  Total quantity (SUM)
12:40:00  121.95           35 (5+10+20)
12:40:01  121.75           150 (50+100)
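The grouping in the table can be reproduced with a small aggregation function. This is a sketch that assumes timestamps arrive as `HH:MM:SS.mmm` strings and that one output message per second is wanted; `group_by_second` is an illustrative name.

```python
from decimal import Decimal


def group_by_second(trades):
    """Merge (time, price, quantity) messages into one (second, MAX price, SUM quantity) each."""
    groups = {}  # second -> (max price, total quantity); dicts keep insertion order
    for time, price, quantity in trades:
        second = time.split(".")[0]  # drop the milliseconds
        max_price, total = groups.get(second, (price, 0))
        groups[second] = (max(max_price, price), total + quantity)
    return [(second, price, total) for second, (price, total) in groups.items()]


TRADES = [
    ("12:40:00.100", Decimal("121.60"), 5),
    ("12:40:00.150", Decimal("121.95"), 10),
    ("12:40:00.600", Decimal("121.70"), 20),
    ("12:40:01.100", Decimal("121.75"), 50),
    ("12:40:01.900", Decimal("121.60"), 100),
]
```

Calling `group_by_second(TRADES)` turns the five source messages into the two grouped rows shown on the slide.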
8. WRITE-BEHIND BUFFER
Modifiable buffer: a newer value replaces the buffered one when
• Data has not been sent to the client yet
• But is still in the buffer
Buffer contents:
Time      Temperature  SensorID
12:40:00  23 c         1
12:40:00  30 c         2
UPDATE (replaces the buffered row for sensor 1):
Time      Temperature  SensorID
12:41:00  24 c*        1
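A replaceable buffer can be sketched as a dictionary keyed by sensor ID, so that a newer reading overwrites one that has not been sent yet. `ReplaceableBuffer` is a hypothetical name and the flush trigger (timer, batch size) is left out.

```python
class ReplaceableBuffer:
    """Keeps only the latest unsent reading per sensor."""

    def __init__(self):
        self._pending = {}  # sensor_id -> (time, temperature)

    def put(self, sensor_id, time, temperature):
        """Add a reading, replacing any unsent reading from the same sensor."""
        self._pending[sensor_id] = (time, temperature)

    def flush(self):
        """Return the buffered rows as (time, temperature, sensor_id) and empty the buffer."""
        batch = [(time, temp, sensor_id) for sensor_id, (time, temp) in self._pending.items()]
        self._pending.clear()
        return batch
```

With the slide's data, the 12:41:00 reading of sensor 1 replaces the 12:40:00 one, so the client receives two rows instead of three.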
SOME STREAMING LIBRARIES, TOOLS AND SERVICES
SOCKJS LIBRARY
• Implements fallback
  – WebSocket
  – EventSource
  – COMET
  – Hidden IFRAME
  – Polling
• Integration with Spring
• Multiplexing support
  – https://github.com/sockjs/websocket-multiplex
ATMOSPHERE JAVA FRAMEWORK
• Client and server (Java) components
• Transparently supports
  – WebSockets
  – Server-Sent Events
  – Long polling
  – HTTP streaming (forever frame)
• References
  – https://github.com/Atmosphere/atmosphere
  – http://async-io.org/tutorial.html
LIGHTSTREAMER SELF-HOSTED SERVER
• Self-hosted server
• We need to implement and deploy adapters
DATA ADAPTER
• Connects to external data sources
• Provides data to the Lightstreamer (LS) server
METADATA ADAPTER
• Per user/subscription
• Security and permissions
• Bandwidth/frequency limitations
• Data schemas
PUBNUB CLOUD SERVICE
• Sits between your data source and your client apps
• www.pubnub.com
GOOGLE FIREBASE CLOUD SERVICE
• Cloud NoSQL data storage
• Data is automatically synced to all connected devices
• Covers many issues
  – Failover
  – Protocol fallback
  – Network
  – Scalability
  – Monitoring
  – and many others
• Hides complexity behind the SDK
CONCLUSION
1. Real-time apps are the de facto standard now
2. Use streaming, fall back to long polling or polling
3. Take advantage of TCP/UDP in the browser (WebSocket, Web-RTC)
4. A streaming API is fully stateful
5. Keep optimization techniques in mind when architecting a streaming API
6. Use battle-tested tools and products
REFERENCES
Real-time web technologies overview
– https://www.leggetter.co.uk/
Data streaming frameworks and services
– List: https://www.leggetter.co.uk/real-time-web-technologies-guide
– Lightstreamer: http://www.lightstreamer.com/
– SockJS: https://github.com/sockjs
– PubNub: pubnub.com
– Firebase: https://firebase.google.com/
– Atmosphere: https://github.com/Atmosphere
WebSocket
– https://samsaffron.com/archive/2015/12/29/websockets-caution-required
Server-sent events vs WebSockets
– http://streamdata.io/blog/push-sse-vs-websockets/
Server-sent events
– http://www.html5rocks.com/en/tutorials/eventsource/basics/
Push notifications
– https://www.urbanairship.com/push-notifications-explained
Push notification services with free plans
– https://onesignal.com/
– https://clevertap.com/
– https://goroost.com/
HTTP/2
– https://daniel.haxx.se/blog/2014/04/26/http2-explained/
– https://http2.github.io/
– https://tools.ietf.org/html/rfc7540
– Explanation by Daniel Stenberg, member of IETF HTTPbis working group, developer of Firefox: https://bagder.gitbooks.io/http2-explained/content/
THANK YOU! QUESTIONS?
kslisenko@gmail.com
kslisenko
linkedin.com/in/kslisenko/
Konstantin Slisenko
kanstantsin_slisenka@epam.com
