Perfect Norikra 2nd Season

Stream Processing Casual Talks #2

  1. Perfect Norikra 2nd Season: Stream Processing Casual Talks #2, 2017/07/27, Satoshi Tagomori (@tagomoris)
  2. Satoshi "Moris" Tagomori (@tagomoris): Fluentd, MessagePack-Ruby, Norikra, ... Treasure Data, Inc.
  3. http://norikra.github.io/
  4. Streaming + SQL
  5. Norikra: Schema-less Stream Processing using SQL • Server software, written in JRuby, runs on the JVM • Open source software (GPLv2) • http://norikra.github.io/ • https://github.com/norikra/norikra
  6. Query: SELECT user.age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) WHERE current="San Diego" AND attend.$0 AND attend.$1 GROUP BY user.age • Input event: {"name":"tagomoris", "user":{"age":35, "corp":"LINE", "address":"Tokyo"}, "current":"San Diego", "speaker":true, "attend":[true,true,false, ...]} • Output: {"user.age":35,"cnt":5}, {"user.age":36,"cnt":8}, ...
  7. How Norikra is Perfect • Ultra fast bootstrap • Schema on read • Handling complex (nested) events • Dynamic query registration/unregistration • Simple Web UI • Data connector: Fluentd • Extensible: UDF/Listener plugins • Performance: good enough for small/mid-sized sites
  8. Schema on Read • Query first, data next • A query must know what it requires: field names, field types, ... • The platform can ingest any data into the processor; each query fetches only the events that match its required schema. (Diagram: a schema-less, mixed data stream of events from a billing service and from an API endpoint; query A and query B each read only the subset of fields they need.)
  9. Architecture (Diagram: a Norikra Server running on the JVM, containing the Norikra Engine (Esper instance as the query engine, Type Definition Manager, Output Event Pool) and an RPC server (mizuno: Jetty + Rack, with a Rack RPC handler); the Norikra Client talks to it via msgpack-rpc-over-http.)
  10. For details :)
      • Norikra: Stream Processing with SQL, http://www.slideshare.net/tagomoris/norikra-stream-processing-with-sql
      • Norikra: SQL Stream Processing in Ruby, http://www.slideshare.net/tagomoris/norikra-sql-stream-processing-in-ruby
      • Norikra in Action, http://www.slideshare.net/tagomoris/norikra-in-action-ver-2014-spring
      • Landscape of Norikra Features, http://www.slideshare.net/tagomoris/norikra-meetup-features
      • Norikra Recent Updates, http://www.slideshare.net/tagomoris/norikra-recent-updates
  11. Recent Updates • v1.4.0: Jul 19, 2016 • Added support for the JVM options "-D" and "-agentlib" • Updated the msgpack version • Previous release, v1.3.1: May 7, 2015 • Explained in the "Norikra Recent Updates" slides
  12. User Companies • LINE Corporation • Kayac Inc. • Mercari, Inc. • (and some/many others)
  13. https://www.slideshare.net/tagomoris/how-to-make-norikra-perfect
  14. Perfect Norikra • All features of Norikra, including "Ultra fast bootstrap" • RPC API compatible with the original Norikra • Distributed execution on any scheduler: YARN? Mesos? or ...? • Automatic failover & retry on failures (HA) • Automated optimization for load balancing • Dynamic scale-out from 1 to 100 nodes, without any restarts/retries
  15. MAKE Norikra PERFECT AGAIN
  16. Features for More Perfection • Loading operator internal states from batch query engines • Sharing operator internal states between queries
  17. Stream Processing • Monitoring, Reporting, Alerting • Fast recommendation • Matching behaviors • and ...
  18. Handling Long Term Data/History (Timeline: website audience data from Jul 24, 2014; purchase a car on Jul 28, 2017 ....? Start a batch query to read 3~4 years of history and offer a nice bonus to a possible customer! But the browser session has already expired......)
  19. Stream Processing on Long Term Data (Timeline: website audience data processed continuously since Jul 24, 2014; purchase a car on Jul 28, 2017: got a nice bonus offer! Jul 28, 2017: got a wrong offer... Rewrite the query and start it without past data... 3 more years required for testing?)
  20. Resume/Restart of Queries • Queries may be stopped/killed for many reasons: cluster version upgrades / migrations, troubles • Queries should be modifiable at any time: wrong logic, data schema upgrades, new business requirements
  21. What we want (Timeline: website audience data processed continuously since Jul 24, 2014; purchase a car on Jul 28, 2017: got a nice bonus offer! Jul 28, 2017: got a wrong offer... Rewrite the query and start it with the long past history)
  22. Load "Running" Queries: load a "running" stream query from batch engines! Submit a stream query → query the history on batch engines and load the result as the intermediate state of the stream query → start to process realtime data
  24. JOINs with Past Data: submit a stream query with a JOIN on past data → query the past data on batch engines and load it into the JOIN → start to process realtime data with the JOIN
  26. True Lambda Architecture • Use just one DSL on both Stream & Batch: SQL! • Ingest the data stream into both Stream & Storage • Handle time windows intelligently: specify the time window outside the DSL • Write once on batch, run anywhere :D
  27. Idempotent Operator State • As a stream operator with realtime data • As a loaded stream operator with past data • Serializable operator internal states
  28. Sharing Operators between Queries (Diagram: Query A, Query B)
  29. SHARED Operators: Sharing Operators between Queries (Diagram: history (stream) and history (batch: 3 - 4 years ago) → JOIN → Query A filter + projection, Query B filter + projection)
  30. Sharing Operators while Updating a Query (Diagram: history (stream) and history (batch: 3 - 4 years ago) → JOIN → Query A filter + projection) Oops, I found a mistake in Query A!
  31. SHARED Operators: Sharing Operators while Updating a Query (Diagram: history (stream) and history (batch: 3 - 4 years ago) → JOIN → Query A filter + projection, Query A' filter + projection) I've just added the updated query...
  32. Sharing Operators while Updating a Query (Diagram: history (stream) and history (batch: 3 - 4 years ago) → JOIN → Query A' filter + projection) It works! I can remove the older one.
  33. Perfect Stream Processing Engine • Exactly the same SQL on both Batch and Stream • A stream processor that can resume queries using batch query engine results: reduces memory usage of JOINs and of historical data • A stream processor that can share operators between queries: reduces total memory usage, and makes it possible to restart/update queries anytime, casually
  34. Perfect Norikra
  35. Named
  36. It still has 0 bytes. Stay tuned! We are hiring! - Treasure Data
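Slide 6's query reads nested fields by dotted names (user.age) and array elements by index (attend.$0). The following minimal Python sketch shows what that query computes over the events of a single 5-minute batch. It assumes that `$N` means "the N-th array element", which is how the slide uses it. The helpers `get_field` and `evaluate_batch` are hypothetical and are not Norikra code, and the windowing itself (win:time_batch) is left out.

```python
from collections import Counter
from typing import Any, Iterable, List


def get_field(event: dict, path: str) -> Any:
    """Hypothetical helper: resolve a dotted name like 'user.age' or 'attend.$0'.

    Assumption: a '$N' segment selects the N-th element of an array.
    Returns None when any step of the path is missing.
    """
    value: Any = event
    for part in path.split("."):
        if isinstance(value, list) and part.startswith("$"):
            index = int(part[1:])
            if index >= len(value):
                return None
            value = value[index]
        elif isinstance(value, dict):
            value = value.get(part)
        else:
            return None
    return value


def evaluate_batch(events: Iterable[dict]) -> List[dict]:
    """Hypothetical: slide 6's SELECT ... GROUP BY user.age over one batch."""
    counts: Counter = Counter()
    for event in events:
        # WHERE current="San Diego" AND attend.$0 AND attend.$1
        if (get_field(event, "current") == "San Diego"
                and get_field(event, "attend.$0")
                and get_field(event, "attend.$1")):
            counts[get_field(event, "user.age")] += 1
    # One output row per group, in first-seen order
    return [{"user.age": age, "cnt": cnt} for age, cnt in counts.items()]
```

The output rows have the same shape as the slide's result, for example {"user.age": 35, "cnt": 2}.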
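Slide 8's "query first, data next" idea can be sketched as routing. Each query declares the fields and types it needs. An event from the mixed stream reaches a query only if it carries all of those fields, and the query then sees just that subset. This is a simplified model, not Norikra's actual type-matching logic. `Query`, `matches` and `dispatch` are hypothetical names, and Python types stand in for Norikra's field types.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Query:
    """Hypothetical query model: a name plus the schema it requires."""
    name: str
    required: Dict[str, type]  # field name -> expected type
    received: List[dict] = field(default_factory=list)


def matches(event: dict, required: Dict[str, type]) -> bool:
    """True if the event has every required field with the expected type."""
    return all(name in event and isinstance(event[name], expected)
               for name, expected in required.items())


def dispatch(event: dict, queries: List[Query]) -> None:
    """Deliver one schema-less event to every query whose schema it satisfies."""
    for query in queries:
        if matches(event, query.required):
            # Project only the fields the query declared
            query.received.append({name: event[name] for name in query.required})
```

In this model, events from the billing service and from the API endpoint share one stream, and each query silently skips the events it cannot use.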
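Slides 22 and 26 combine two ideas. First, the same query logic runs on both batch and stream. Second, a batch result computed over the history becomes the initial state of the stream query. The sketch below assumes the query is a per-user count of "view" events, which is a hypothetical example and not taken from the talk. `count_views` and `RunningQuery` are illustrative names. The test checks that starting from the loaded batch state gives the same answer as streaming the whole history.

```python
from collections import Counter
from typing import Iterable, Optional


def count_views(events: Iterable[dict]) -> Counter:
    """Hypothetical query logic, shared by the batch and the stream side."""
    return Counter(ev["user"] for ev in events if ev.get("action") == "view")


class RunningQuery:
    """Hypothetical stream query whose state can be seeded from a batch result."""

    def __init__(self, initial_state: Optional[Counter] = None) -> None:
        self.state: Counter = Counter(initial_state or {})

    def on_event(self, event: dict) -> None:
        # Apply the same logic to one live event and add it to the loaded state
        self.state.update(count_views([event]))
```

The sketch relies on a count being mergeable: a batch partial result plus stream updates equals the full result. Aggregates without that property would need a different state representation.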
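For slide 24, a JOIN with past data can be pictured as a stream-table join. The batch result is loaded once as a lookup table, and each realtime event is joined against it as it arrives. This Python sketch assumes the past data fits in memory and is keyed by a single field. `load_past_data` and `stream_join` are hypothetical names, and the field names in the test are made up.

```python
from typing import Dict, Iterable, Iterator


def load_past_data(rows: Iterable[dict], key: str) -> Dict[object, dict]:
    """Hypothetical: build the lookup side of the JOIN from a batch query result."""
    return {row[key]: row for row in rows}


def stream_join(live: Iterable[dict], past: Dict[object, dict],
                key: str) -> Iterator[dict]:
    """Hypothetical inner JOIN of realtime events with preloaded past data."""
    for event in live:
        history = past.get(event[key])
        if history is not None:  # drop events that have no matching history
            # Fields of the live event win over same-named past fields
            yield {**history, **event}
```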
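Slide 27 asks for operator state that is the same whether it was built from realtime data or loaded from past data. That requires the state to be serializable. The sketch uses a hypothetical counting operator whose state round-trips through JSON. It assumes string keys, because JSON object keys are always strings. Stopping after a snapshot and resuming from that snapshot produces the same state as an uninterrupted run.

```python
import json
from collections import Counter
from typing import Iterable


class CountOperator:
    """Hypothetical stream operator: counts events per string key."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def process(self, events: Iterable[dict]) -> None:
        for event in events:
            self.counts[event["key"]] += 1

    def snapshot(self) -> str:
        # Deterministic serialization of the internal state
        return json.dumps(dict(self.counts), sort_keys=True)

    @classmethod
    def restore(cls, data: str) -> "CountOperator":
        # Rebuild an operator from a snapshot (keys come back as strings)
        operator = cls()
        operator.counts = Counter(json.loads(data))
        return operator
```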
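Slides 29 to 32 can be modeled as one upstream JOIN operator that holds the loaded history and fans out to per-query filter + projection operators. Queries subscribe and unsubscribe independently. As a result, Query A' can be added next to Query A and A removed afterwards, without reloading the history. This is a hypothetical Python sketch. `SharedJoin` and `make_query` are illustrative names, not Norikra APIs.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class SharedJoin:
    """Hypothetical upstream JOIN operator shared by several downstream queries."""

    def __init__(self, past: Dict[object, Row], key: str) -> None:
        self.past = past  # loaded once, e.g. from a batch query over years of history
        self.key = key
        self.subscribers: Dict[str, Callable[[Row], None]] = {}

    def subscribe(self, name: str, query: Callable[[Row], None]) -> None:
        self.subscribers[name] = query

    def unsubscribe(self, name: str) -> None:
        del self.subscribers[name]

    def on_event(self, event: Row) -> None:
        history = self.past.get(event[self.key])
        if history is None:
            return  # inner JOIN: no history, no output
        joined = {**history, **event}
        for query in list(self.subscribers.values()):
            query(joined)


def make_query(predicate: Callable[[Row], bool], fields: List[str],
               out: List[Row]) -> Callable[[Row], None]:
    """Hypothetical downstream filter + projection operator appending to `out`."""
    def run(row: Row) -> None:
        if predicate(row):
            out.append({f: row[f] for f in fields})
    return run
```

Only the small per-query operators are created and destroyed. The memory-heavy joined history is built once and shared, which is the memory saving slide 33 points to.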
