Apache Pulsar was developed to address several shortcomings of existing messaging systems, including the lack of built-in geo-replication, weaker message durability guarantees, and high message latency.
We will implement a multi-currency quoting application that feeds pricing information to a crypto-currency trading platform deployed around the globe. Given the volatility of crypto-currency prices, sub-second message latency is critical to traders. Equally important is ensuring that consistent quotes are available in all geographical locations, i.e., the price of Bitcoin shown to a trader in the USA should be the same as the price shown to a trader in Hong Kong.
We will highlight the advantages of Apache Pulsar over traditional messaging systems and show how its low latency and replication across multiple geographies make it ideally suited for globally distributed, real-time applications.
Presenter:
David Kjerrumgaard is a teacher, author, experienced Big Data technologist, and technology leader at Streamlio, where he serves as Director of Solution Architecture. Prior to joining Streamlio, David served as Practice Director of professional services at Hortonworks, focusing on delivering real-time stream processing solutions based on Apache NiFi, Apache Kafka, and Apache Storm.
2. • The company wishes to automate a crypto-currency arbitrage discovery platform in order to exploit the spreads between crypto-currency prices offered by brokers across multiple crypto-currency exchanges around the globe.
• Currently, the company has multiple geographically distributed trading desks (North America, Europe, and Asia). Due to financial regulations, each trading desk has accounts within its geographical region only and cannot execute trades outside of its region.
• To overcome this restriction, the solution must provide the capability to perform a trade in one region on behalf of a user in another region.
• In order to minimize risk, the company does not want to retain any open (long or short) positions. Therefore all arbitrage trades must result in a net zero position, i.e., every trade must be cleared.
The Business Use Case
3. • In order to accommodate this, the platform will need to support:
• Geo-replication of trade & quote information ingested from 50+ exchanges.
• The ability to send large crypto-currency payloads between trading desks in order to successfully complete the purchase of a security in one region and the sale of the same security in another.
• Since we are sending actual crypto-currency payloads, security and zero message loss are a MUST.
• Given the volatility of crypto-currency prices, sub-second message latency is critical to identify and exploit any market inefficiencies. All opportunities must be identified, acted upon, and the trades completed before the price moves. Failure to do so could result in trading losses.
Business Use Case Requirements
4. • Intelligent platform for fast-moving data
• Built with open source technology proven at scale at Twitter, Yahoo, Salesforce…
• Founded by a team of data processing veterans from Twitter, Yahoo, Google, …
What is Streamlio?
5. Intelligent platform for fast data
[Architecture diagram: an Interfaces layer (APIs, Libraries & Connectivity) with connectors and clients for data sources such as Storm and Kafka; core layers for Real-time processing, Messaging & queuing, and Stream storage, each marked "Powered by" its underlying open source engine; and Functional Management services: Resource Management, Metadata, Security, Monitoring, and a Management UI.]
7. What is Apache Pulsar?
• Ordering: guaranteed message ordering.
• Multi-tenancy: a single cluster can support many tenants and use cases.
• High throughput: can reach 1.8M messages/s in a single partition.
• Durability: data replicated and synced to disk.
• Geo-replication: out-of-the-box support for geographically distributed applications.
• Proven in production: deployed globally in 10+ data centers with full replication, and has processed more than 100 trillion messages to date.
• Delivery guarantees: at-least-once, at-most-once, and effectively-once.
• Low latency: publish latency of 5 ms at the 99th percentile.
• Highly scalable: can support millions of topics.
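The features above can be exercised in a few lines with the official Pulsar Python client (`pulsar-client` on PyPI). This is a minimal sketch, not the demo's actual code; the broker URL, tenant, namespace, and topic names are placeholders for illustration.

```python
# Minimal produce/consume sketch using the official Pulsar Python client
# (pip install pulsar-client). All names below are illustrative placeholders.

def build_topic(tenant: str, namespace: str, topic: str) -> str:
    """Build a fully qualified persistent Pulsar topic name."""
    return f"persistent://{tenant}/{namespace}/{topic}"

if __name__ == "__main__":
    import pulsar  # third-party: pulsar-client

    client = pulsar.Client("pulsar://localhost:6650")
    topic = build_topic("public", "default", "quotes")

    # Producer: publish a quote.
    producer = client.create_producer(topic)
    producer.send(b'{"symbol": "BTC-USD", "price": 6500.25}')

    # Consumer: subscribe and read it back, then acknowledge.
    consumer = client.subscribe(topic, subscription_name="demo-sub")
    msg = consumer.receive(timeout_millis=5000)
    consumer.acknowledge(msg)
    client.close()
```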
8. • Each of the three trading desks ingests real-time trade and quote information from all of the exchanges within its respective region and publishes it into a shared, geo-replicated Pulsar topic.
• Trade & quote information published in one region is automatically available in the other regions.
• Apache Pulsar provides a built-in WebSocket interface for all of its topics, which is used to drive the trading dashboards used by the traders.
• When an arbitrage opportunity is identified, it is published to the arbitrage topic for further evaluation.
The Solution Architecture
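A dashboard can read the geo-replicated topic through Pulsar's WebSocket consumer endpoint without any Pulsar client library. The sketch below shows the endpoint shape and payload decoding (Pulsar delivers payloads base64-encoded); the host, topic, and subscription names are placeholders, and the connection code assumes the third-party `websocket-client` package.

```python
import base64
import json

def consumer_url(host: str, tenant: str, namespace: str,
                 topic: str, subscription: str) -> str:
    """Pulsar WebSocket consumer endpoint (v2 API) for a persistent topic."""
    return (f"ws://{host}:8080/ws/v2/consumer/persistent/"
            f"{tenant}/{namespace}/{topic}/{subscription}")

def decode_quote(frame: str) -> dict:
    """Pulsar WebSocket frames carry the message payload base64-encoded."""
    msg = json.loads(frame)
    return json.loads(base64.b64decode(msg["payload"]))

if __name__ == "__main__":
    from websocket import create_connection  # pip install websocket-client

    ws = create_connection(
        consumer_url("localhost", "public", "default", "quotes", "dashboard"))
    frame = ws.recv()
    quote = decode_quote(frame)
    # Acknowledge so the broker can advance the subscription cursor.
    ws.send(json.dumps({"messageId": json.loads(frame)["messageId"]}))
    ws.close()
```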
12. • Subscribes to the CryptoCompare streaming API.
• You can subscribe to multiple feeds in a single request, but the API throttles requests to 1,000 per minute.
• Subscription requests have the following format:
• {FeedType}~{ExchangeName}~{FromCurrencySymbol}~{ToCurrencySymbol}
• Publishes the received events to a WebSocket on port 4019, which NiFi subscribes to.
Crypto Currency Data Feed
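The subscription format above is easy to generate programmatically when fanning out over many exchanges. A small sketch; the feed-type code and exchange names are illustrative placeholders, not values taken from the demo, so check the CryptoCompare API documentation for the current codes.

```python
def subscription(feed_type: str, exchange: str,
                 from_sym: str, to_sym: str) -> str:
    """Build a CryptoCompare subscription string of the form
    {FeedType}~{ExchangeName}~{FromCurrencySymbol}~{ToCurrencySymbol}."""
    return "~".join([feed_type, exchange, from_sym, to_sym])

# Illustrative only: feed-type "0" and these exchange names are placeholders.
subs = [subscription("0", ex, "BTC", "USD")
        for ex in ("Coinbase", "Bitstamp", "Kraken")]
```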
14. • The effort to produce the demo was approximately 2 weeks, with the bulk of the time spent on the Apache NiFi processors. Source code for the Apache NiFi processors (which will be contributed back to the Apache project) can be found here:
• https://github.com/openconnectors/nifi-pulsar-bundle
• https://github.com/openconnectors/nifi-pulsar-client-services
• The infrastructure runs on Google Cloud Compute, managed by Kubernetes. Helm charts can be found here:
• https://storage.googleapis.com/streamlio/charts
• Docker images are currently available on DockerHub:
• https://hub.docker.com/r/streamlio/nifi/
• https://hub.docker.com/r/streamlio/crypto-currency/
• Phase 2 of the demo will add automated trading using machine learning, powered by Apache Heron.
Final Comments