
NoSQL in Financial Industry - Pierre Bittner



Since its creation at Yahoo! in 2006 for web search, Hadoop has never stopped evolving, and today it has a strong focus on stream processing and real-time analytics. Scaled Risk aims to accelerate the adoption of Hadoop in the finance industry. This talk explains how we leverage HBase to meet requirements specific to capital markets:
  • handling structured representations of trades with many fast-evolving models, which led us to design a Dynamic Data Schema,
  • a low-latency message bus for extremely fast trading analytics,
  • data coherency and process repeatability in an externally consistent distributed system, supported by as-of-date and versioning mechanisms for regulatory requirements.

Published in: Data & Analytics


  1. SCALED RISK: Next Generation of Financial Platform
     NoSQL in Financial Industry
     Distributed Matters, Barcelona, 21 November 2015
     Pierre Bittner, CTO, SCALED RISK
  2. What is Scaled Risk?
     • WHAT? Integrated Big Data & Analytics Platform
     • WHERE? SaaS or On-Premise
     • HOW? Hadoop/HBase + Low Latency + External Consistency + Flexible Data Schema + In-Memory OLAP
     • FOR? Real-Time Risk Management
  3. Why NoSQL Matters in Financial Industry?
     • Volume / Velocity
       § The New York Stock Exchange generates about 4 to 5 terabytes of data per day.
       § Algo Trading, High Frequency Trading: in 2012, they accounted for 50% of all US equity trading volume. Trades execute in milliseconds or even microseconds.
     • Coherency / Availability / Security
       § Regulatory Report: Intraday Monitoring
       § MTTR, Data Spikes on Market Event, Disaster Recovery, ACL
     • Mixed workloads: Streaming and Historical Analysis, Point-In-Time Comparison
       § BackTesting, Replay (UTC timestamping of all events, FIFO)
       § Lambda architecture, Kappa architecture
     • Need for Multi-tenancy / Data & Process Governance (Data Lake / Data-Centric Architecture)
  4. Why NoSQL Matters in Financial Industry?
     • Real-Time Enterprise-Wide Risk Management: an improved and trustworthy view of global risk, supporting the implementation of upcoming regulations
     • Real-Time Fraud Detection: pre-check, real-time and historical data verification for trades, payments, orders, …
     • Real-Time Market Analytics: on-demand live and historical data analysis on the global market
     Customer Story: Market Exchange, Market Surveillance
  5. Today’s Trading Challenge: On-Demand Live Analysis & Alerts
     Structured Data Feeds
     • OTC Market: Foreign Exchange; Debt Market (Bond); Commodities; Bloomberg, FXAll, …
     • Regulated Market: Securities; Options; NYSE, Eurex; LSE; …
     • Booking Systems: Trader Positions; Intraday Events; Valuation; Volatility, Correlation
     • Referential Data: Counterparties; Analytical Structure; Product Definitions; Mappings
     Unstructured Data Feeds
     • News & Market Analysis: Reuters, BBG; Research
     • Social Media: Twitter; LinkedIn, …
     RT Aggregated Positions
     • Trading: Global Positions; Intraday funding & forecasting; Collateral Optimization
     • Risk: CVA, Counterparty Exposure; Limit, Stress Test
     • Sales: Credit Line; Profitability Indicator; Customer Interests
     • Global: Market Flows; Analyst/Market Correlation
     On-Demand Analysis
     • Market Risk Analysis: Stress per Counterparty
     • Sales: Customer alerts on Market Trends; Recommendation & Lead Generation
     Live Report & Alerts
     • On Market Events; Custom scenario; Market Surveillance
     • Intraday Limit Risk: Automatic Monitoring
  6. Customer Story: On Demand Market Surveillance for Exchange
     • Context: extreme performance and resilience (peak activity > 1M orders per second); low latency
     • Objective: on-demand market analytics out of real-time & historical data; resilient primary storage
     • Problems: high volumes, difficult access to history; SLAs for data & service availability
     • Solution: Scaled Risk at the outflow of the matching engine
     • Benefits: streamlined process, consistent view; high availability and scalability; reduced TCO
     • Result: a single system for storage and computation of spot & historical data for market surveillance
  7. On Demand Market Surveillance: Pilot Perimeter
     High-Level Architecture Candidate
  8. On Demand Market Surveillance: Pilot Perimeter
     Pilot perimeter: suitability of HBase and Scaled Risk in terms of properties and performance. Pilot duration: 2 months.
     • Confirm the Hadoop/HBase technical stack
     • Evaluate Scaled Risk performance
     • Explore Scaled Risk features
     Focus on evaluating the HBase framework
     • HBase performance on Read/Write
     • HBase behavior during a node failure
     • HBase process isolation
     • Global consistency
     Key parts of the architecture
     • Message Bus (Kafka)
     • Storage System (HDFS)
     • Operational Database (HBase)
     • Real-Time Analytics tool (Scaled Risk)
     • History & Data Analytics tool (SR & Spark)
     Benefits of the architecture (streamlined process, cost, …) are not covered in this step.
  9. HBase: Random Access to your Planet-Wide Data
     HBase in a few words: HBase is an open-source, distributed, versioned, non-relational, scalable, wide-column data store.
     • It is the Hadoop database, built mainly on HDFS.
     • It is based on the Google BigTable storage system.
     Key properties
     • Key-value data organization per row. Table is a namespace.
     • Each cell is timestamped
     • ACID per row; rowkey for fast access and data distribution
     • Four primary operations: Get, Put, Delete and Scan
     • Server-side operations with Coprocessors (Observer, Endpoint)
     • Linear scalability, automatic sharding and failover support
     • Strictly consistent
     • Hadoop ecosystem integration: YARN, MapReduce, Hive, Spark
     • Phoenix for SQL flavor
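The data model described on this slide (rows addressed by rowkey, timestamped cells, and the Get, Put, Delete and Scan operations) can be illustrated with a toy in-memory model. This is a minimal sketch and not the HBase client API: the class `ToyTable`, its method signatures, the `as_of` parameter and the `d:qty` / `d:price` column names are all assumptions made for illustration, and real HBase delete markers and version retention behave differently.

```python
_TOMBSTONE = object()  # marks a deleted cell version in this toy model


class ToyTable:
    """Toy model of a versioned wide-column table (hypothetical, not the HBase API)."""

    def __init__(self):
        # rowkey -> {column -> {timestamp -> value}}
        self._rows = {}

    def put(self, rowkey, column, value, ts):
        """Write one cell version at an explicit timestamp."""
        self._rows.setdefault(rowkey, {}).setdefault(column, {})[ts] = value

    def delete(self, rowkey, column, ts):
        """Record a delete marker for the cell at the given timestamp."""
        self.put(rowkey, column, _TOMBSTONE, ts)

    def get(self, rowkey, as_of=None):
        """Return the newest visible value of each column, optionally as of a timestamp."""
        row = {}
        for column, versions in self._rows.get(rowkey, {}).items():
            visible = [ts for ts in versions if as_of is None or ts <= as_of]
            if visible:
                value = versions[max(visible)]
                if value is not _TOMBSTONE:
                    row[column] = value
        return row

    def scan(self, start, stop, as_of=None):
        """Yield (rowkey, row) in rowkey order for start <= rowkey < stop."""
        for rowkey in sorted(self._rows):
            if start <= rowkey < stop:
                row = self.get(rowkey, as_of)
                if row:
                    yield rowkey, row


table = ToyTable()
table.put("order#A", "d:qty", 1, ts=100)
table.put("order#A", "d:price", 49350, ts=100)
table.put("order#A", "d:qty", 3, ts=200)  # later amendment of the same cell
table.put("order#B", "d:qty", 2, ts=150)
table.delete("order#B", "d:qty", ts=300)

latest = table.get("order#A")
as_of_150 = table.get("order#A", as_of=150)
rows_as_of_200 = list(table.scan("order#", "order#~", as_of=200))
rows_now = list(table.scan("order#", "order#~"))
```

Reading with `as_of=150` returns the quantity as it was before the amendment at timestamp 200, which is the kind of point-in-time read that timestamped cells make possible.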
  10. On-Demand Market Surveillance: Functional Architecture
      • NoSQL Wide Column Store: Dynamic Data Schema; schema on read and write; fast, random R/W access
      • Real-Time Distributed OLAP: fast in-memory data processing; full consistency; linear scalability; Open API (Valuation)
      • Real-Time Indexing: advanced index and search for data classification and correlation; semantic reconciliation
      • Real-Time Alerting: alert on analytics (Volume Matching, Cancel Rate); alert on data
      • Other components: Low Latency Internal Bus; Read Isolation; As Of Date; HBase as Storage; Injector (Thrift); REST/API/WebSocket
      [Chart: values per contract, Contract 1 to Contract 4]
  11. On-Demand Market Surveillance: Technical Architecture
      Cluster size and components
      • 3 x Hadoop head nodes (Name Node, Secondary Name Node, HBase Master): HP ProLiant DL360 Gen9 Server, 8 x 900GB 10k rpm SAS, 128 GB RAM, 2 x (10 cores) Intel Xeon CPU E5-2660 v3 @ 2.60GHz, 4 x 1GbE ports and 2 x 10GbE ports
      • 6 x Hadoop worker nodes (each running a Region Server and a Data Node): HP ProLiant DL380 Gen9 Server, 2 x 120GB SSD OS, 15 x 3TB 7.2k rpm SATA, 128 GB RAM, 2 x (10 cores) Intel Xeon CPU E5-2660 v3 @ 2.60GHz, 4 x 1GbE ports and 2 x 10GbE ports, 1 x HP Smart HBA H240ar, 1 x HP Smart HBA H240
      • 1 x HP Loader: HP ProLiant DL380 Gen8, 14 x 1TB 7.2k rpm SAS, 128 GB RAM, 2 x (10 cores) Intel Xeon CPU E5-2670 v2 @ 2.50GHz
      Hadoop cluster details
      • Hadoop HDFS usable size: 60TB (block replication 3, no compression)
      • Hadoop HDFS data disk RAW size: 241TB
      • Hadoop cluster memory: 6 x 128GB = 768GB
      Hadoop components and associated services
      • Hadoop distribution: Hortonworks HDP 2.2 stack
      • Cluster management: HP Insight CMU v7.3
      • HDFS v2.6.0
      • HBase v0.98.4
      • ZooKeeper v3.4.6
      Other details
      • OS: RHEL (Red Hat Enterprise Linux) v6.5, 64-bit
      • Linux filesystem for Hadoop data: ext4
      • JVM used for Hadoop: Oracle Java 1.7.0_67
  12. On Demand Market Surveillance: Functional Consistency
      • Market Exchange data types
        § A unique data flow containing all types of message
        § Order messages
        § Trade messages
        § Test injector generates 1.5M messages in 7 minutes (client limitation)
      • Scaled Risk data exhaustiveness control
        § Dynamic data model with two tables
        § Trade and Order messages are split
        § Test method: message count

        Message Type   Message sub type     Count
        Order          New                  792,546
                       Replace              645,889
                       Status               40,821
                       Others               80
                       (unique order ids)   792,886
        Cancel         n/a                  680,626
        Trade          n/a                  137,573

        Message Type   Count
        Order Table    792,886
        Trade Table    137,573

      • Order and Trade life-cycle control
        § Message fields consistency control
        § Test method: data sampling

        Order Id   Trader   Contract          Qty   Price   Side
        A          6C9      JFFCE150500000F   1     49350   Buy
        B          W90      JFFCE150500000F   2     49350   Sell
        C          MAT      JFFCE150500000F   1     49345   Buy

        Trade Id   Trader   Contract          Qty   Price   Side
        1630       6C9      JFFCE150500000F   1     49350   Buy
        1630       W90      JFFCE150500000F   1     49350   Sell
        1631       MAT      JFFCE150500000F   1     49345   Buy
        1631       W90      JFFCE150500000F   1     49345   Sell
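The life-cycle control by data sampling can be sketched on the sample rows from this slide. This is an illustrative sketch, not the test tool used in the pilot: the names `Order`, `Trade`, `unbalanced_trades` and `overfilled_orders` are hypothetical, and joining trades to orders on (trader, side) is an assumption made only because the sampled trade rows carry no order id.

```python
from collections import defaultdict
from typing import NamedTuple


class Order(NamedTuple):
    order_id: str
    trader: str
    contract: str
    qty: int
    price: int
    side: str


class Trade(NamedTuple):
    trade_id: int
    trader: str
    contract: str
    qty: int
    price: int
    side: str


# Sample rows copied from the slide.
ORDERS = [
    Order("A", "6C9", "JFFCE150500000F", 1, 49350, "Buy"),
    Order("B", "W90", "JFFCE150500000F", 2, 49350, "Sell"),
    Order("C", "MAT", "JFFCE150500000F", 1, 49345, "Buy"),
]
TRADES = [
    Trade(1630, "6C9", "JFFCE150500000F", 1, 49350, "Buy"),
    Trade(1630, "W90", "JFFCE150500000F", 1, 49350, "Sell"),
    Trade(1631, "MAT", "JFFCE150500000F", 1, 49345, "Buy"),
    Trade(1631, "W90", "JFFCE150500000F", 1, 49345, "Sell"),
]


def unbalanced_trades(trades):
    """Return trade ids lacking exactly one Buy and one Sell leg on identical terms."""
    legs = defaultdict(list)
    for trade in trades:
        legs[trade.trade_id].append(trade)
    bad = []
    for trade_id, rows in legs.items():
        sides = sorted(row.side for row in rows)
        terms = {(row.contract, row.qty, row.price) for row in rows}
        if sides != ["Buy", "Sell"] or len(terms) != 1:
            bad.append(trade_id)
    return bad


def overfilled_orders(orders, trades):
    """Return order ids whose traded quantity exceeds the ordered quantity.

    Assumption: trades are matched to orders on (trader, side).
    """
    filled = defaultdict(int)
    for trade in trades:
        filled[(trade.trader, trade.side)] += trade.qty
    return [o.order_id for o in orders if filled[(o.trader, o.side)] > o.qty]


bad_trades = unbalanced_trades(TRADES)
bad_orders = overfilled_orders(ORDERS, TRADES)
```

On the sampled rows both checks come back empty: each trade id has a matching Buy and Sell leg, and order B (quantity 2) is filled by one unit in trade 1630 and one unit in trade 1631.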
  13. On Demand Market Surveillance: Performance Indicators
      • Test scenario: 7 minutes; 1,479,335 messages; stats only on the Order table
      • Sender/Trade (per region): 130K trades per second; 800K on the cluster
      • End-to-end: nominal latency ~200ms; 90% of messages under 412ms
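Two of these indicators are simple to derive from raw measurements: the average injection rate of the test scenario, and a 90th-percentile latency. The sketch below assumes a nearest-rank percentile; the function names and the latency samples are hypothetical, and only the message count and the 7-minute duration come from the slide.

```python
def average_rate(message_count, duration_seconds):
    """Average number of messages per second over the whole run."""
    return message_count / duration_seconds


def percentile_nearest_rank(values, percent):
    """Nearest-rank percentile: smallest value with at least percent% of samples at or below it.

    percent is expected to be an integer between 1 and 100.
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Integer ceiling of percent * n / 100, avoiding floating-point rounding.
    rank = max(1, (percent * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Figures from the test scenario: 1,479,335 messages in 7 minutes.
rate = average_rate(1_479_335, 7 * 60)

# Hypothetical end-to-end latencies in milliseconds, for illustration only.
sample_latencies_ms = [150, 180, 190, 200, 205, 210, 220, 250, 400, 900]
p90 = percentile_nearest_rank(sample_latencies_ms, 90)
```

With the slide's figures the average injection rate works out to roughly 3,500 messages per second, a rate the previous slide attributes to a client limitation of the test injector.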
  14. On Demand Market Surveillance: Fault Tolerance Test
      HBase is designed to be fault tolerant.
      • The node failure shows as the white stripe spanning the whole width of the graph.
      • All nodes are impacted by the failure, not only the killed node (as expected for a CP system).
      • Another white rectangle is displayed before the node failure. It represents all the messages that were correctly inserted before the failure but never flushed to disk.
      • Because the WAL is deactivated by the trade injector (an option), those messages were lost when regions were moved from the killed node to other nodes.
      Graph axes: the X axis is the rowkey prefix, showing the distribution of insertions across the cluster; the Y axis is time. Points spread over the entire width of the X axis mean that the distribution is correct.
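The loss of unflushed messages when the write-ahead log (WAL) is off can be reproduced with a toy model. This is a simplified sketch and not HBase code: `ToyRegionServer` and its methods are hypothetical, recovery is modeled on the same object instead of on another region server, and the WAL is a plain list standing in for a durable log.

```python
class ToyRegionServer:
    """Toy write path: optional WAL, in-memory buffer, flushed store files."""

    def __init__(self, wal_enabled=True):
        self.wal_enabled = wal_enabled
        self.wal = []          # durable log, survives a crash
        self.memstore = {}     # in-memory buffer, lost on a crash
        self.store_files = {}  # flushed data, survives a crash

    def put(self, key, value):
        if self.wal_enabled:
            self.wal.append((key, value))  # log the write before buffering it
        self.memstore[key] = value

    def flush(self):
        """Persist the buffer; logged entries are no longer needed afterwards."""
        self.store_files.update(self.memstore)
        self.memstore.clear()
        self.wal.clear()

    def crash_and_recover(self):
        """Drop the in-memory buffer, then replay whatever the WAL still holds."""
        self.memstore.clear()
        for key, value in self.wal:
            self.memstore[key] = value

    def keys(self):
        return sorted({**self.store_files, **self.memstore})


def surviving_keys(wal_enabled):
    server = ToyRegionServer(wal_enabled)
    server.put("t1", "trade 1")
    server.put("t2", "trade 2")
    server.flush()               # t1 and t2 are now durable
    server.put("t3", "trade 3")  # inserted but never flushed
    server.crash_and_recover()
    return server.keys()


with_wal = surviving_keys(wal_enabled=True)
without_wal = surviving_keys(wal_enabled=False)
```

With the log enabled the unflushed write is replayed after the crash; with it disabled only the flushed writes remain, which matches the white rectangle described above.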
  15. On Demand Market Surveillance: Fault Tolerance Test
      A second test confirms that HBase remains available even if a node fails. The test consists of inserting data into HBase from both YCSB and trade injector clients.
      • YCSB inserts data into a table distributed over 5 nodes.
      • The trade injector inserts data into a table distributed over 4 nodes.
      • The killed node does not impact trade injection.
      Graph axes: the X axis is the rowkey prefix, showing the distribution of insertions across the cluster; the Y axis is time. Points spread over the entire width of the X axis mean that the distribution is correct.
  16. On Demand Market Surveillance: Next Steps
      • Deeper evaluation of HBase: impact of volumes on performance; evaluation of HA Region Servers for data access
      • Wider view of the targeted architecture: overall resilience; overall latency; simplification; hot zone/cold zone; TCO
      • Business requirements of the project: MiFID II impact; new services
  17. Overview of Scaled Risk implementation
      • Extreme flexibility thanks to our OLAP cube and Data Schema: 360° view of the position (As Of Date, explain, multi-aggregation level); in-memory distributed calculation; sub-second end-to-end (push architecture)
      • Low latency internal bus: UDP unicast, acknowledgement by UDP; no region location pain; exactly-once delivery, no message resent, multicast storm prevention
      • Resiliency: HBase RPC poll on message losses; HDFS message storage on overflow and region events
      • Open Architecture: open standards with seamless integration to HBase (coprocessor); Open API (Valuation, FIFO), toolkit approach