Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Norikra: SQL Stream Processing In Ruby

5,268 views

Published on

Presentation in RubyConf 2014

Published in: Technology
  • Thank you for norikra. I would really like to know how to make a TARGET auto-create the fields that are, for example, in the slide number 34 ({"name":"tagomoris","user":{"age":35 ...)
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Norikra: SQL Stream Processing In Ruby

  1. 1. Norikra: SQL Stream Processing In Ruby 2014/11/19 RubyConf 2014 DAY 3 Satoshi Tagomori (@tagomoris)
  2. 2. Topics Why I wrote Norikra Norikra overview Norikra queries Use cases in production JRuby for me
  3. 3. Satoshi Tagomori (@tagomoris) Tokyo, Japan LINE Corporation
  4. 4. Monitoring/Data Analytics Overview collect parse clean up process visualize Access logs, store process Application logs, ...
  5. 5. collect parse clean up process visualize store process
  6. 6. collect parse clean up process visualize store process
  7. 7. Fluentd stream aggregation: Good for simple data/calculation collect parse clean up process visualize store process
  8. 8. Our services: More and more different services Many changes in a day (including logging) Many kind of logs for each services Many different metrics for each services
  9. 9. Fluentd stream aggregation: Not good for processing about complex/fragile environment... collect parse clean up process visualize store process
  10. 10. We want to: add/remove queries anytime we want write many queries for a service log stream ignore events without data we want make our service directors / growth hackers to write their own queries!
  11. 11. collect parse clean up process visualize store process
  12. 12. break.
  13. 13. Norikra: Schema-less Stream Processing with SQL Server software, written in JRuby, runs on JVM Open source software (GPLv2) http://norikra.github.io/ https://github.com/norikra/norikra
  14. 14. How To Setup Norikra: Install JRuby download jruby.tar.gz, extract it and export $PATH use rbenv rbenv install jruby-1.7.xx rbenv shell jruby-.. Install Norikra gem install norikra Execute Norikra server norikra start
  15. 15. Norikra Interface: CLI client/Client library: norikra-client norikra-client target open ... norikra-client query add ... tail -f ... | norikra-client event send ... WebUI show status show/add/remove queries HTTP API JSON, MessagePack
  16. 16. Norikra: Schema-less event stream: Add/Remove data fields whenever you want SQL: No more restarts to add/remove queries w/ JOINs, w/ SubQueries w/ UDF (in Java/Ruby as rubygems) Truly Complex events: Nested Hash/Array, accessible directly from SQL
  17. 17. Norikra Queries: (1) SELECT name, age FROM events target
  18. 18. Norikra Queries: (1) {“name”:”tagomoris”, “age”:35, “address”:”Tokyo”, “corp”:”LINE”, “current”:”San Diego”} SELECT name, age FROM events {“name”:”tagomoris”,”age”:35}
  19. 19. Norikra Queries: (1) {“name”:”tagomoris”, “address”:”Tokyo”, “corp”:”LINE”, “current”:”San Diego”} without “age” SELECT name, age FROM events nothing
  20. 20. Norikra Queries: (2) {“name”:”tagomoris”, “address”:”Tokyo”, “corp”:”LINE”, “current”:”San Diego”} SELECT name, age FROM events WHERE current=”San Diego” {“name”:”tagomoris”,”age”:35}
  21. 21. Norikra Queries: (2) {“name”:”nobu”, “age”:0, “address”:”Somewhere”, “corp”:”Heroku”, “current”:”SAN”} current is not “San Diego” SELECT name, age FROM events WHERE current=”San Diego” nothing
  22. 22. Norikra Queries: (3) SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age
  23. 23. Norikra Queries: (3) {“name”:”tagomoris”, “address”:”Tokyo”, “corp”:”LINE”, “current”:”San Diego”} SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age every 5 mins {”age”:35,”cnt”:3}, {“age”:33,”cnt”:1}, ...
  24. 24. Norikra Queries: (4) {“name”:”tagomoris”, “address”:”Tokyo”, “corp”:”LINE”, “current”:”San Diego”} SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age {”age”:35,”cnt”:3}, {“age”:33,”cnt”:1}, ... SELECT max(age) as max FROM events.win:time_batch(5 mins) {“max”:51} every 5 mins
  25. 25. Norikra Queries: (5) {“name”:”tagomoris”, “user:{“age”:35, “corp”:”LINE”, “address”:”Tokyo”}, “current”:”San Diego”, “speaker”:true, “attend”:[true,true,false, ...] } SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age
  26. 26. Norikra Queries: (5) {“name”:”tagomoris”, “user:{“age”:35, “corp”:”LINE”, “address”:”Tokyo”}, “current”:”San Diego”, “speaker”:true, “attend”:[true,true,false, ...] } SELECT user.age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY user.age
  27. 27. Norikra Queries: (5) {“name”:”tagomoris”, “user:{“age”:35, “corp”:”LINE”, “address”:”Tokyo”}, “current”:”San Diego”, “speaker”:true, “attend”:[true,true,false, ...] } SELECT user.age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) WHERE current=”San Diego” AND attend.$0 AND attend.$1 GROUP BY user.age
  28. 28. break. next: use cases
  29. 29. Use case 1: External API call reports for partners (LINE) External API call for LINE Business Connect LINE backend sends requests to partner’s API endpoint using users’ messages http://developers.linecorp.com/blog/?p=3386
  30. 30. Use case 1: External API call reports for partners (LINE) channel gateway partner’s server logs query results MySQL Mail SELECT channelId AS channel_id, reason, detail, count(*) AS error_count, min(timestamp) AS first_timestamp, max(timestamp) AS last_timestamp FROM api_error_log.win:time_batch(60 sec) GROUP BY channelId,reason,detail HAVING count(*) > 0 http://developers.linecorp.com/blog/?p=3386
  31. 31. Use case 1: External API call reports for partners (LINE) API error response summaries http://developers.linecorp.com/blog/?p=3386
  32. 32. Use case 2: Lambda architecture Prompt reports for Ad service console Prompt reports with Norikra + Fixed reports with Hive app serverapp serverapp server app serverapp serverapp server Fluentd HDFS console service execute hive query (daily) fetch query results (frequently) impression logs
  33. 33. Use case 2: Prompt reports for Ad service console SELECT yyyymmdd, hh, campaign_id, region, lang, COUNT(*) AS click, COUNT(DISTINCT member_id) AS uu FROM ( SELECT yyyymmdd, hh, get_json_object(log, '$.campaign.id') AS campaign_id, get_json_object(log, '$.member.region') AS region, get_json_object(log, '$.member.lang') AS lang, get_json_object(log, '$.member.id') AS member_id FROM applog WHERE service='myservice' AND yyyymmdd='20140913' AND get_json_object(log, '$.type')='click' ) x GROUP BY yyyymmdd, hh, campaign_id, region, lang Hive query for fixed reports
  34. 34. Use case 2: Prompt reports for Ad service console Norikra query for prompt reports SELECT campaign.id AS campaign_id, member.region AS region, member.lang AS lang, COUNT(*) AS click, COUNT(DISTINCT member.id) AS uu FROM myservice.win:time_batch(1 hours) WHERE type="click" GROUP BY campaign.id, member.region, member.lang
  35. 35. Use case 3: Realtime access dashboard on Google Platform Access log visualization Count using Norikra (2-step), Store on Google BigQuery Dashboard on Google Spreadsheet + Apps Script http://qiita.com/kazunori279/items/6329df57635799405547 https://www.youtube.com/watch?v=EZkw5TDcCGw
  36. 36. Use case 3: Realtime access dashboard on Google Platform Server Fluentd http://qiita.com/kazunori279/items/6329df57635799405547 https://www.youtube.com/watch?v=EZkw5TDcCGw ngnix access log access logs to BigQuery norikra query results norikra query to aggregate node to aggregate locally
  37. 37. Use case 3: Realtime access dashboard on Google Platform 70 servers, 120,000 requests/sec (or more!) Fluentd logs to store http://qiita.com/kazunori279/items/6329df57635799405547 https://www.youtube.com/watch?v=EZkw5TDcCGw ngnix ngngninxix ngngninxix ngngninxix ngngninxix ngngninxix ngngninxix ngngninxix ngngninxix ngnix Google BigQuery Google Spreadsheet + Apps script ... counts per host total count
  38. 38. Why Norikra is written in JRuby Esper CEP(Complex Event Processing) library, written in Java Rubygems.org Open repository, for public UDF plugins of Norikra provided as gem
  39. 39. JRuby for me Ruby! (by great JRuby developer team!) makes developing Norikra dramatically faster with rubygems and rubygems.org for easy deployment/installation with Java libraries, ex: Jetty, Esper, ... There are not so many users in Tokyo :(
  40. 40. More queries, more simplicity and less latency in data processing Thanks! photo: by my co-workers http://norikra.github.io/ https://github.com/norikra/norikra
  41. 41. See also: http://norikra.github.io/ “Lambda Architecture Platform Using SQL” http://www.slideshare.net/tagomoris/lambda-architecture-using-sql-hadoopcon- 2014-taiwan “Stream processing and Norikra” http://www.slideshare.net/tagomoris/stream-processing-and-norikra “Batch processing and Stream processing by SQL” http://www.slideshare.net/tagomoris/hcj2014-sql “Norikra in Action” http://www.slideshare.net/tagomoris/norikra-in-action-ver-2014-spring http://www.slideshare.net/tagomoris/presentations
  42. 42. Storm or Norikra? Simple and fixed workload for huge traffic Use Storm! Complex and fragile workload for non-huge traffic Use Norikra!
  43. 43. Scalability? 10,000 - 100,000 events/sec on 2CPU 8Core server
  44. 44. HA? Distributed? NO! I have some idea, but I have no time to implement it There are no needs for HA/Distributed processing
  45. 45. Data flow & API? Use Fluentd!

×