Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset and Druid

Developing High Frequency Indicators
Using Real-Time Tick Data
on Apache Superset and Druid
CBRT Big Data Team
Emre Tokel, Kerem Başol, M. Yağmur Şahin
Zekeriya Besiroglu / Komtas Bilgi Yonetimi
21 March 2019 Barcelona

Agenda
WHO WE ARE
CBRT & Our Team
PROJECT DETAILS
Before, Test Cluster,
Phase 1-2-3, Prod
Migration
HIGH FREQUENCY
INDICATORS
Importance & Goals
CURRENT ARCHITECTURE
Apache Kafka, Spark,
Druid & Superset
WORK IN
PROGRESS
Further analyses
FUTURE PLANS
6
5
4
3
2
1

Our Solutions
Data Management
• Data Governance Solutions
• Next Generation Analytics
• 360 Engagement
• Data Security
Analytics
• Data Warehouse Solutions
• Customer Journey Analytics
• Advanced Marketing Analytics Solutions
• Industry-specific analytic use cases
• Online Customer Data Platform
• IoT Analytics
• Analytic Lab Solution
Big Data & AI
• Big Data & AI Advisory Services
• Big Data & AI Accelerators
• Data Lake Foundation
• EDW Optimization / Offloading
• Big Data Ingestion and Governance
• AI Implementation – Chatbot
• AI Implementation – Image Recognition
Security Analytics
• Security Analytic Advisory Services
• Integrated Law Enforcement Solutions
• Cyber Security Solutions
• Fraud Analytics Solutions
• Governance, Risk & Compliance Solutions

• +20 IT , +18 DB&DWH
• +7 BIG DATA
• Lead Archtitect &Big Data /Analytics
@KOMTAS
• Instructor&Consultant
• ITU,MEF,Şehir Uni. BigData Instr.
• Certified R programmer
• Certified Hadoop Administrator

Our Organization
 The Central Bank of the Republic of Turkey is primarily responsible for steering the
monetary and exchange rate policies in Turkey.
o Price stability
o Financial stability
o Exchange rate regime
o The privilege of printing and issuing banknotes
o Payment systems

• Big Data Engineer• Big Data Engineer
M. Yağmur Şahin Emre Tokel Kerem Başol
• Big Data Team Leader

Importance and Goals
 To observe foreign exchange markets in real-time
o Are there any patterns regarding to specific time intervals during the day?
o Is there anything to observe before/after local working hours throughout the whole day?
o What does the difference between bid/ask prices tell us?
 To be able to detect risks and take necessary policy measures in a timely manner
o Developing liquidity and risk indicators based real-time tick data
o Visualizing observations for decision makers in real-time
o Finally, discovering possible intraday seasonality
 Wouldn’t it be great to be able to correlate with news flow as well?

Development of High Frequency Indicators Using Real-Time Tick
Data on Apache Superset and Druid
Phase 1
Prod
migration
Next
phases
Test
Cluster
Phase 2 Phase 3

Test Cluster
 Our first studies on big data have started on very humble servers
o 5 servers with 32 GB RAM for each
o 3 TB storage
 HDP 2.6.0.3 installed
o Not the latest version back then
 Technical difficulties
o Performance problems
o Apache Druid indexing
o Apache Superset maturity

TREP API
Apache
Kafka
Apache NiFi MongoDB
Apache
Zeppelin &
Power BI

Thomson Reuters Enterprise Platform (TREP)
 Thomson Reuters provides its subscribers with an enterprise platform that they can
collect the market data as it is generated
 Each financial instrument on TREP has a unique code called RIC
 The event queue implemented by the platform can be consumed with the provided
Java SDK
 We developed a Java application for consuming this event queue to collect tick-data
according to required RICs

Apache Kafka
 The data flow is very fast and quite dense
o We published the messages containing tick data collected by our Java application to a message
queue
o Twofold analysis: Batch and real-time
 We decided to use Apache Kafka residing on our test big data cluster
 We created a topic for each RIC on Apache Kafka and published data to related topics

Apache NiFi
 In order to manage the flow, we decided to use Apache NiFi
 We used KafkaConsumer processor to consume messages from Kafka queues
 The NiFi flow was designed to be persisted on MongoDB

MongoDB
 We had prepared data in JSON format with our Java application
 Since we have MongoDB installed on our enterprise systems, we decided to persist
this data to MongoDB
 Although MongoDB is not a part of HDP, it seemed as a good choice for our
researchers to use this data in their analyses

Apache Zeppelin
 We provided our researchers with access to Apache Zeppelin and connection to
MongoDB via Python
 By doing so, we offered an alternative to the tools on local computers and provided a
unified interface for financial analysis

Business Intelligence on Client Side
 Our users had to download daily tick-data manually from their Thomson Reuters
Terminals and work on Excel
 Users were then able to access tick-data using Power BI
o We also provided our users with a news timeline along with the tick-data

We needed more!
 We had to visualize the data in real-time
o Analysis on persisted data using MongoDB, PowerBI and Apache Zeppelin was not enough

TREP
API
Apache
Kafka
Apache
Druid
Apache
Superset

Apache Druid
 We needed a database which was able to:
o Answer ad-hoc queries (slice/dice) for a limited window efficiently
o Store historic data and seamlessly integrate current and historic data
o Provide native integration with possible real-time visualization frameworks (preferably from
Apache stack)
o Provide native integration with Apache Kafka
 Apache Druid addressed all the aforementioned requirements
 Indexing task was achieved using Tranquility

Apache Superset
 Apache Superset was the obvious alternative for real-time visualization since tick-data
was stored on Apache Druid
o Native integration with Apache Druid
o Freely available on Hortonworks service stack
 We prepared real-time dashboards including:
o Transaction Count
o Bid / Ask Prices
o Contributor Distribution
o Bid - Ask Spread

We needed more, again!
 Reliability issues with Druid
 Performance issues
 Enterprise integration requirements

Architecture
Internet Data
Enterprise Content
Social Media/Media
Micro Level Data
Commercial Data Vendors
Ingestion
Big Data Platform Data Science
GovernanceData Sources

TREP API Apache Kafka
Apache
Hive + Druid
Integration
Apache Spark
Apache
Superset

Apache Hive + Druid Integration
 After setting up our production environment (using HDP 3.0.1.0) and started to
feed data, we realized that data were scattered and we were missing the option to
co-utilize these different data sources
 We then realized that Apache Hive was already providing Kafka & Druid indexing
service in the form of a simple table creation and querying facility for Druid from
Hive

Apache Spark
 Due to additional calculation requirements of our users, we decided to utilize Apache
Spark
 With Apache Spark 2.4, we used Spark Streaming and Spark SQL contexts together in
the same application
 In our Spark application
o For every 5 seconds, a 30-second window is created
o On each window, outlier boundaries are calculated
o Outlier data points are detected

Current Architecture & Progress So Far
Java Application
Kafka Topic (real-time)
Kafka Topic (windowed)
TREP Event Queue
Consume Publish
Spark Application
Consume
Publish
Druid Datasource
(real-time)
Druid Datasource
(windowed)
Superset Dashboard
(tick data)
Superset Dashboard
(outlier)

Implementing…
 Moving average calculation (20-day window)
 Volatility Indicator
 Average True Range Indicator (moving average)
o [ max(t) - min(t) ]
o [ max(t) - close(t-1) ]
o [ max(t) - close(t-1) ]

To-Do List
 Matching data subscription
 Bringing historical tick data into real-time analysis
 Possible use of machine learning for intraday indicators

Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset and Druid

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset and Druid

Similar to Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset and Druid (20)

More from DataWorks Summit

More from DataWorks Summit (20)

Recently uploaded

Recently uploaded (20)

Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset and Druid

Editor's Notes