
How to Make Norikra Perfect

Stream Processing Casual Talks #1 #streamctjp

  1. How to make Norikra perfect
     Stream Processing Casual Talks #1 #streamctjp, Jul 22, 2016
     Satoshi Tagomori (@tagomoris)
  2. Satoshi "Moris" Tagomori (@tagomoris)
     Fluentd, MessagePack-Ruby, Norikra, ...
     Treasure Data, Inc.
  3. 1) How Norikra is perfect
     2) How to make Norikra more perfect
  4. http://norikra.github.io/
  5. Norikra: Schema-less Stream Processing using SQL
     • Server software, written in JRuby, runs on JVM
     • Open source software (GPLv2)
     • http://norikra.github.io/
     • https://github.com/norikra/norikra
  6. Example query:
       SELECT user.age, COUNT(*) as cnt
       FROM events.win:time_batch(5 mins)
       WHERE current="San Diego" AND attend.$0 AND attend.$1
       GROUP BY user.age
     Input event:
       {"name":"tagomoris", "user":{"age":35, "corp":"LINE", "address":"Tokyo"}, "current":"San Diego", "speaker":true, "attend":[true,true,false, ...]}
     Output:
       {"user.age":35,"cnt":5}, {"user.age":36,"cnt":8}, ...
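For illustration, the following Python sketch evaluates slide 6's query over a single, already closed 5-minute batch. It assumes the path syntax shown on the slide (`user.age` for a nested key, `attend.$0` for an array element) and assumes that an event missing a referenced field simply does not match. Windowing (`win:time_batch`) is left out, and the helpers `get_field` and `run_batch` are hypothetical names, not Norikra or Esper APIs.

```python
from collections import Counter
from typing import Any, Optional


def get_field(event: dict, path: str) -> Optional[Any]:
    """Resolve a dotted path such as 'user.age' or 'attend.$0'.

    A '$N' segment indexes into a list; any other segment is a dict key.
    Returns None if a segment is missing (assumption: a missing field means
    the event does not match the query).
    """
    value: Any = event
    for part in path.split("."):
        if isinstance(value, list) and part.startswith("$") and part[1:].isdigit():
            index = int(part[1:])
            if index >= len(value):
                return None
            value = value[index]
        elif isinstance(value, dict) and part in value:
            value = value[part]
        else:
            return None
    return value


def run_batch(events: list) -> list:
    """Evaluate, for one closed batch (hypothetical helper):
    SELECT user.age, COUNT(*) as cnt WHERE current="San Diego"
      AND attend.$0 AND attend.$1 GROUP BY user.age
    """
    counts: Counter = Counter()
    for event in events:
        if (get_field(event, "current") == "San Diego"
                and get_field(event, "attend.$0")
                and get_field(event, "attend.$1")):
            age = get_field(event, "user.age")
            if age is not None:  # events without user.age never match
                counts[age] += 1
    # One output row per group, in the output shape shown on the slide.
    return [{"user.age": age, "cnt": cnt} for age, cnt in sorted(counts.items())]


if __name__ == "__main__":
    batch = [
        {"name": "tagomoris", "user": {"age": 35, "corp": "LINE", "address": "Tokyo"},
         "current": "San Diego", "speaker": True, "attend": [True, True, False]},
        {"user": {"age": 35}, "current": "Tokyo", "attend": [True, True]},
    ]
    print(run_batch(batch))  # [{'user.age': 35, 'cnt': 1}]
```

The second event is dropped by the `current` filter, so only one row with a count of 1 is produced.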
  7. How Norikra is Perfect
     • Ultra fast bootstrap
     • Schema on read
     • Handling complex (nested) events
     • Dynamic query registration/unregistration
     • Simple Web UI
     • Data connector: Fluentd
     • Extensible: UDF/Listener plugins
     • Performance: good enough for small/middle-sized sites
  8. Schema on Read
     • Query first, data next
     • A query must know what it requires: field names, field types, ...
     • The platform can ingest any data into the processor; each query fetches the events that match its required schema
     [Figure: a schema-less (mixed) data stream carrying events from a billing service and from an API endpoint; query A and query B each receive only the subset of fields they require.]
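The "schema on read" idea on slide 8 can be sketched in a few lines of Python: flatten each incoming event into dotted paths, then hand it to every query whose declared fields are present with the expected types. This is a minimal sketch under stated assumptions; `flatten`, `QuerySchema`, and `route` are hypothetical names, and Norikra itself does this through its Type Definition Manager and Esper type definitions, not with this code.

```python
from dataclasses import dataclass
from typing import Any


def flatten(value: Any, prefix: str = "") -> dict:
    """Flatten nested dicts/lists into dotted paths:
    {"user": {"age": 35}, "attend": [True]} -> {"user.age": 35, "attend.$0": True}
    """
    if isinstance(value, dict):
        children = [(str(key), child) for key, child in value.items()]
    elif isinstance(value, list):
        children = [(f"${i}", child) for i, child in enumerate(value)]
    else:
        return {prefix: value}
    flat: dict = {}
    for key, child in children:
        flat.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    return flat


@dataclass
class QuerySchema:
    """Hypothetical per-query schema: the field paths (and types) a query reads."""
    name: str
    required: dict  # field path -> expected Python type

    def accepts(self, flat: dict) -> bool:
        # Note: isinstance(True, int) is True in Python, so bools pass an int check.
        return all(path in flat and isinstance(flat[path], expected)
                   for path, expected in self.required.items())


def route(event: dict, queries: list) -> dict:
    """Deliver the event to every query whose required fields are present,
    passing each query only the subset of fields it asked for."""
    flat = flatten(event)
    return {q.name: {path: flat[path] for path in q.required}
            for q in queries if q.accepts(flat)}


if __name__ == "__main__":
    queries = [
        QuerySchema("A", {"amount": int, "currency": str}),  # billing query
        QuerySchema("B", {"path": str, "status": int}),      # API query
    ]
    billing = {"user": {"id": 1}, "amount": 300, "currency": "JPY"}
    api = {"user": {"id": 2}, "path": "/v1/items", "status": 200}
    print(route(billing, queries))  # {'A': {'amount': 300, 'currency': 'JPY'}}
    print(route(api, queries))      # {'B': {'path': '/v1/items', 'status': 200}}
```

Because the query declares its schema and the stream does not, both event kinds can share one stream; each query sees only its own field subset.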
  9. Architecture
     [Diagram: Norikra Server (on JVM), containing the Norikra Engine (Esper instance as query engine, Type Definition Manager, Output Event Pool) and an RPC Server (mizuno, i.e. Jetty + Rack, with a Rack RPC Handler); Norikra Client talks to the server via msgpack-rpc-over-http.]
  10. For details :)
      • Norikra: Stream Processing with SQL
        http://www.slideshare.net/tagomoris/norikra-stream-processing-with-sql
      • Norikra: SQL Stream Processing in Ruby
        http://www.slideshare.net/tagomoris/norikra-sql-stream-processing-in-ruby
      • Norikra in Action
        http://www.slideshare.net/tagomoris/norikra-in-action-ver-2014-spring
      • Landscape of Norikra Features
        http://www.slideshare.net/tagomoris/norikra-meetup-features
      • Norikra Recent Updates
        http://www.slideshare.net/tagomoris/norikra-recent-updates
  11. Recent Updates
      • v1.4.0: Jul 19, 2016
        - Added support for the JVM options "-D" and "-agentlib"
        - Updated msgpack version
      • Previous release, v1.3.1: May 7, 2015
        - Explained in the "Norikra Recent Updates" slides
  12. IS IT REALLY PERFECT!?
  13. Good & Bad
      • Good for startups: fast bootstrap, SQL, Web UI, Fluentd plugins, handling complex events, ...
      • Good for middle-sized sites: dynamic query registration, dynamic UDF loading, performance good enough for middle-sized sites (10k events/sec), schema on read, ...
      • Bad for big players: no distribution, no high availability, uncontrollable JVM/Esper behavior (CPU & memory)
  14. Tentative name: Perfect Norikra
  15. Perfect Norikra
      • All features of Norikra, including "Ultra fast bootstrap"
      • RPC API compatible with the original Norikra
      • Distributed execution on any scheduler (YARN? Mesos? or ...?)
      • Automatic failover & retry on failures (HA)
      • Automated optimization for load balancing
      • Dynamic scaling out from 1 to 100 nodes, without any restarts/retries
  16. Rough Sketch
      [Diagram: a master node and processor nodes. Components shown: RPC Server, RPC Handler, Type Definition Manager, Query Compiler, DAG Optimizer / Deoptimizer, DAG Executor, Event Router, Event Buffer. Queries and events enter the system, and events flow between nodes.]
  17. Rough Sketch
      • Brand new query executor
        - SQL parser
        - Query compiler into DAG
        - SQL operators as sub-DAGs (inspired by TimeStream)
        - DAG executor
      • Brand new dataflow manager / nodes
        - Sync/async data replication
        - Barriers for event stream (inspired by Flink)
        - Versioned routing/distribution
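To make "query compiler into DAG" and "SQL operators as sub-DAGs" concrete, here is a toy Python sketch. It compiles a hypothetical `SELECT key, COUNT(*) WHERE ... GROUP BY key` into a DAG in which COUNT(*) is expanded into a sub-DAG (one partial-count node per shard plus a merge node), then runs it with a topological-order executor. Nothing here comes from the talk beyond those two ideas: the node names, `compile_count_query`, and `run_dag` are assumptions, and the slides themselves state that no code exists yet.

```python
from collections import Counter
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable


def compile_count_query(where: Callable[[dict], bool], key: str, shards: int = 2):
    """Compile a hypothetical 'SELECT key, COUNT(*) WHERE <where> GROUP BY key'
    into (operators, predecessors). COUNT(*) becomes a sub-DAG: one
    partial-count node per shard, followed by a merge node."""
    ops: dict = {"filter": lambda inputs: [e for e in inputs[0] if where(e)]}
    preds: dict = {"source": [], "filter": ["source"]}
    for shard in range(shards):
        name = f"partial_count_{shard}"
        # Each partial node counts only the keys hashed to its shard.
        ops[name] = lambda inputs, shard=shard: Counter(
            e[key] for e in inputs[0] if hash(e[key]) % shards == shard)
        preds[name] = ["filter"]
    # Partial counts are disjoint by key, so merging them is a Counter sum.
    ops["merge"] = lambda inputs: sum(inputs, Counter())
    preds["merge"] = [f"partial_count_{s}" for s in range(shards)]
    return ops, preds


def run_dag(ops: dict, preds: dict, events: list) -> dict:
    """Execute every node once, in topological order, feeding each node the
    outputs of its predecessors. Returns every node's output."""
    results: dict = {"source": events}
    for node in TopologicalSorter(preds).static_order():
        if node != "source":
            results[node] = ops[node]([results[p] for p in preds[node]])
    return results


if __name__ == "__main__":
    ops, preds = compile_count_query(lambda e: e["current"] == "San Diego", "age")
    events = [{"age": 35, "current": "San Diego"}, {"age": 36, "current": "San Diego"},
              {"age": 35, "current": "San Diego"}, {"age": 35, "current": "Tokyo"}]
    print(run_dag(ops, preds, events)["merge"])  # Counter({35: 2, 36: 1})
```

Expressing COUNT(*) as partial nodes plus a merge node is what would let an optimizer spread the partial nodes over several processor nodes; a real system would also stream events through the DAG instead of running each node once over a list.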
  18. Dynamic Scaling Out
      • Processing nodes are stateful
        - state is limited by the available memory size
        - growing stream size -> memory overflow :-(
      • The scaling strategy must be dynamic
        - restarting queries (as in static scaling) increases latency
  19. [Chart: Query COUNT(DISTINCT uid) per 1 day; memory usage per node over 7/1 to 7/4, running on 3 nodes.]
  20. [Chart: same query; burst traffic causes a memory overflow on the 3 nodes, and the query crashes (failure).]
  21. Crash & Recovery Strategy (1)
      [Chart: Query COUNT(DISTINCT uid) per 1 day over 7/1 to 7/4; 3 nodes before the crash, 6 nodes for recovery.]
      • After a crash, restart the query with an increased number of nodes
      • After the restart, the query re-reads all data of that window
      • After recovery, all nodes return to real-time calculation
  22. Crash & Recovery Strategy (2)
      • Pros: very easy to implement
      • Cons: requires all data to be stored (distributed filesystem?)
      • Cons: hard to know the number of nodes needed for increasing traffic
      • Cons: the recovery state requires more nodes than the normal state
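Crash & Recovery Strategy (1) and (2) can be simulated in a short Python sketch: COUNT(DISTINCT uid) is hash-partitioned over N nodes, each node has a hypothetical memory limit (`capacity`, counted in distinct uids), and on overflow the query restarts with twice as many nodes (3 to 6, as in the chart) and replays the whole stored window. The doubling rule, the `capacity` model, and the function names are assumptions for illustration; the plain list stands in for the stored data (e.g. a distributed filesystem) that this strategy requires.

```python
class NodeOverflow(Exception):
    """Raised when a node's state outgrows its (simulated) memory."""


def process_window(uids: list, nodes: int, capacity: int) -> int:
    """COUNT(DISTINCT uid) for one window, hash-partitioned over `nodes`.
    `capacity` is a hypothetical per-node limit on distinct uids in memory."""
    state = [set() for _ in range(nodes)]
    for uid in uids:
        node = hash(uid) % nodes
        state[node].add(uid)
        if len(state[node]) > capacity:
            raise NodeOverflow(f"node {node} exceeded {capacity} uids")
    # A uid always hashes to the same node, so partition sizes can be summed.
    return sum(len(s) for s in state)


def run_with_restart(stored_uids: list, nodes: int, capacity: int,
                     max_nodes: int = 64) -> tuple:
    """On a crash, restart with twice as many nodes and replay the whole window
    from storage. Returns (result, node count that finally succeeded)."""
    while True:
        try:
            return process_window(stored_uids, nodes, capacity), nodes
        except NodeOverflow:
            if nodes * 2 > max_nodes:
                raise
            nodes *= 2


if __name__ == "__main__":
    uids = [i % 100 for i in range(300)]  # 100 distinct integer uids
    print(run_with_restart(uids, nodes=3, capacity=20))  # (100, 6)
```

The cons from slide 22 show up directly: the replay needs every event of the window to still be available, and the successful run uses more nodes than the steady state needed.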
  23. Dynamic Scaling Out Strategy (1)
      [Chart: Query COUNT(DISTINCT uid) per 1 day over 7/1 to 7/4; the number of nodes grows (3, 5, 6 nodes) as traffic increases, each stage emits intermediate results, and these are merged for the final result.]
      • Before a crash happens, increase the number of processing nodes
      • Queries always produce intermediate results, together with the number of nodes they are distributed over
      • Query results should be produced by merging the intermediate results
  24. Dynamic Scaling Out Strategy (2)
      • Pros: lower latency, less computing power
      • Cons: every operator must support such (mergeable) calculation, i.e. all of SQL!
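Why every operator must support mergeable intermediate results can be seen with COUNT(DISTINCT uid) itself. In this Python sketch (hypothetical helper names, illustrative data), the window is split into two routing versions, 3 nodes and then 5 nodes after scaling out. Each node keeps its set of distinct uids as the intermediate result, and the final answer merges those sets. Summing per-node counts instead would double-count uids that were routed to different nodes before and after the rescale.

```python
def partial_distinct(uids: list, nodes: int) -> list:
    """One routing version: each of `nodes` nodes keeps the set of distinct
    uids routed to it (its intermediate result)."""
    parts = [set() for _ in range(nodes)]
    for uid in uids:
        parts[hash(uid) % nodes].add(uid)
    return parts


def count_distinct_with_rescale(epochs: list) -> tuple:
    """epochs: (uids, node_count) per routing version within one window.
    Returns (merged result, naive sum of per-node counts)."""
    intermediates = []
    for uids, nodes in epochs:
        intermediates.extend(partial_distinct(uids, nodes))
    merged = set().union(*intermediates)  # merge intermediate results
    naive = sum(len(p) for p in intermediates)  # wrong after rescaling
    return len(merged), naive


if __name__ == "__main__":
    first = list(range(0, 60))     # routed over 3 nodes
    second = list(range(30, 100))  # routed over 5 nodes after scaling out
    print(count_distinct_with_rescale([(first, 3), (second, 5)]))  # (100, 130)
```

Uids 30 to 59 appear in both routing versions, which is where the naive 130 comes from. Exact sets grow with cardinality; a HyperLogLog sketch is also mergeable (by taking the register-wise maximum) in bounded memory. Other operators need their own partial forms too: AVG, for instance, has to carry (sum, count) pairs rather than averages.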
  25. For Dynamic Scaling Out
      • De-optimization of operators
      • Virtual nodes for routing
      • ... and many others
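The slide only names "virtual nodes for routing". One common realization, assumed here rather than taken from the talk, is a consistent-hash ring in which every physical node owns many virtual points. When nodes are added, only the keys that now fall on the new nodes' points move, whereas `hash(key) % n` routing remaps most keys whenever `n` changes. `HashRing`, `stable_hash`, and the node names below are hypothetical.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() of str is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring where each physical node owns `vnodes` virtual nodes."""

    def __init__(self, nodes: list, vnodes: int = 100):
        self.vnodes = vnodes
        self.points: list = []  # sorted (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self.points, (stable_hash(f"{node}#{i}"), node))

    def lookup(self, key: str) -> str:
        """Return the node owning the first virtual point at or after the key's hash."""
        h = stable_hash(key)
        index = bisect.bisect_left(self.points, (h, "")) % len(self.points)
        return self.points[index][1]


if __name__ == "__main__":
    ring = HashRing(["node0", "node1", "node2"])
    keys = [f"uid{i}" for i in range(1000)]
    before = {k: ring.lookup(k) for k in keys}
    ring.add_node("node3")  # scale out from 3 to 5 nodes
    ring.add_node("node4")
    moved = [k for k in keys if ring.lookup(k) != before[k]]
    # Every moved key now lives on one of the new nodes.
    print(len(moved), all(ring.lookup(k) in ("node3", "node4") for k in moved))
```

Combined with a version number on the ring (the "versioned routing" of slide 17), such a scheme could let processor nodes be added mid-window while existing nodes keep most of their state.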
  26. Hard things
      • Resource monitoring & limitation
      • Multi-tenancy
      • UDFs and sandboxing
      • Queries without aggregations
  27. Why not on Spark or Flink?
      • Because of schema-less event processing: it requires a dataflow controlled by the query manager
      • Because of dynamic scaling: it requires a brand new dataflow layer
  28. No bytes implemented yet :P Stay tuned! We are hiring! (Treasure Data)
