Norikra: 
SQL Stream Processing 
In Ruby 
2014/11/19 
RubyConf 2014 DAY 3 
Satoshi Tagomori (@tagomoris)
Topics 
Why I wrote Norikra 
Norikra overview 
Norikra queries 
Use cases in production 
JRuby for me
Satoshi Tagomori (@tagomoris) 
Tokyo, Japan 
LINE Corporation
Monitoring/Data Analytics Overview 
collect parse 
clean up 
process 
visualize 
Access logs, store process 
Application logs, ...
collect parse 
clean up 
process 
visualize 
store process
collect parse 
clean up 
process 
visualize 
store process
Fluentd stream aggregation: 
Good for simple data/calculation 
collect parse 
clean up 
process 
visualize 
store process
Our services: 
More and more different services 
Many changes in a day (including logging) 
Many kind of logs for each services 
Many different metrics for each services
Fluentd stream aggregation: 
Not good for processing 
about complex/fragile environment... 
collect parse 
clean up 
process 
visualize 
store process
We want to: 
add/remove queries anytime we want 
write many queries for a service log stream 
ignore events without data we want 
make our service directors / growth hackers to write their own 
queries!
collect parse 
clean up 
process 
visualize 
store process
break.
Norikra: 
Schema-less Stream Processing with SQL 
Server software, written in JRuby, runs on JVM 
Open source software (GPLv2) 
http://norikra.github.io/ 
https://github.com/norikra/norikra
How To Setup Norikra: 
Install JRuby 
download jruby.tar.gz, extract it and export $PATH 
use rbenv 
rbenv install jruby-1.7.xx 
rbenv shell jruby-.. 
Install Norikra 
gem install norikra 
Execute Norikra server 
norikra start
Norikra Interface: 
CLI client/Client library: norikra-client 
norikra-client target open ... 
norikra-client query add ... 
tail -f ... | norikra-client event send ... 
WebUI 
show status 
show/add/remove queries 
HTTP API 
JSON, MessagePack
Norikra: 
Schema-less event stream: 
Add/Remove data fields whenever you want 
SQL: 
No more restarts to add/remove queries 
w/ JOINs, w/ SubQueries 
w/ UDF (in Java/Ruby as rubygems) 
Truly Complex events: 
Nested Hash/Array, accessible directly from SQL
Norikra Queries: (1) 
SELECT name, age 
FROM events 
target
Norikra Queries: (1) 
{“name”:”tagomoris”, 
“age”:35, “address”:”Tokyo”, 
“corp”:”LINE”, “current”:”San Diego”} 
SELECT name, age 
FROM events 
{“name”:”tagomoris”,”age”:35}
Norikra Queries: (1) 
{“name”:”tagomoris”, 
“address”:”Tokyo”, 
“corp”:”LINE”, “current”:”San Diego”} 
without “age” 
SELECT name, age 
FROM events 
nothing
Norikra Queries: (2) 
{“name”:”tagomoris”, 
“address”:”Tokyo”, 
“corp”:”LINE”, “current”:”San Diego”} 
SELECT name, age 
FROM events 
WHERE current=”San Diego” 
{“name”:”tagomoris”,”age”:35}
Norikra Queries: (2) 
{“name”:”nobu”, 
“age”:0, “address”:”Somewhere”, 
“corp”:”Heroku”, “current”:”SAN”} 
current is not “San Diego” 
SELECT name, age 
FROM events 
WHERE current=”San Diego” 
nothing
Norikra Queries: (3) 
SELECT age, COUNT(*) as cnt 
FROM events.win:time_batch(5 mins) 
GROUP BY age
Norikra Queries: (3) 
{“name”:”tagomoris”, 
“address”:”Tokyo”, 
“corp”:”LINE”, “current”:”San Diego”} 
SELECT age, COUNT(*) as cnt 
FROM events.win:time_batch(5 mins) 
GROUP BY age 
every 5 mins 
{”age”:35,”cnt”:3}, {“age”:33,”cnt”:1}, ...
Norikra Queries: (4) 
{“name”:”tagomoris”, 
“address”:”Tokyo”, 
“corp”:”LINE”, “current”:”San Diego”} 
SELECT age, COUNT(*) as cnt 
FROM 
events.win:time_batch(5 mins) 
GROUP BY age 
{”age”:35,”cnt”:3}, 
{“age”:33,”cnt”:1}, 
... 
SELECT max(age) as max 
FROM 
events.win:time_batch(5 mins) 
{“max”:51} 
every 5 mins
Norikra Queries: (5) 
{“name”:”tagomoris”, 
“user:{“age”:35, “corp”:”LINE”, 
“address”:”Tokyo”}, 
“current”:”San Diego”, 
“speaker”:true, 
“attend”:[true,true,false, ...] 
} 
SELECT age, COUNT(*) as cnt 
FROM events.win:time_batch(5 mins) 
GROUP BY age
Norikra Queries: (5) 
{“name”:”tagomoris”, 
“user:{“age”:35, “corp”:”LINE”, 
“address”:”Tokyo”}, 
“current”:”San Diego”, 
“speaker”:true, 
“attend”:[true,true,false, ...] 
} 
SELECT user.age, COUNT(*) as cnt 
FROM events.win:time_batch(5 mins) 
GROUP BY user.age
Norikra Queries: (5) 
{“name”:”tagomoris”, 
“user:{“age”:35, “corp”:”LINE”, 
“address”:”Tokyo”}, 
“current”:”San Diego”, 
“speaker”:true, 
“attend”:[true,true,false, ...] 
} 
SELECT user.age, COUNT(*) as cnt 
FROM events.win:time_batch(5 mins) 
WHERE current=”San Diego” 
AND attend.$0 AND attend.$1 
GROUP BY user.age
break. 
next: use cases
Use case 1: 
External API call reports for partners (LINE) 
External API call for LINE Business Connect 
LINE backend sends requests to partner’s API 
endpoint using users’ messages 
http://developers.linecorp.com/blog/?p=3386
Use case 1: 
External API call reports for partners (LINE) 
channel 
gateway 
partner’s 
server 
logs 
query 
results 
MySQL Mail 
SELECT 
channelId 
AS 
channel_id, 
reason, 
detail, 
count(*) 
AS 
error_count, 
min(timestamp) 
AS 
first_timestamp, 
max(timestamp) 
AS 
last_timestamp 
FROM 
api_error_log.win:time_batch(60 
sec) 
GROUP 
BY 
channelId,reason,detail 
HAVING 
count(*) 
> 
0 
http://developers.linecorp.com/blog/?p=3386
Use case 1: 
External API call reports for partners (LINE) 
API error response summaries 
http://developers.linecorp.com/blog/?p=3386
Use case 2: Lambda architecture 
Prompt reports for Ad service console 
Prompt reports with Norikra + Fixed reports with Hive 
app 
serverapp 
serverapp 
server 
app 
serverapp 
serverapp 
server 
Fluentd 
HDFS 
console 
service 
execute hive query 
(daily) 
fetch query results 
(frequently) 
impression 
logs
Use case 2: 
Prompt reports for Ad service console 
SELECT 
yyyymmdd, 
hh, 
campaign_id, 
region, 
lang, 
COUNT(*) 
AS 
click, 
COUNT(DISTINCT 
member_id) 
AS 
uu 
FROM 
( 
SELECT 
yyyymmdd, 
hh, 
get_json_object(log, 
'$.campaign.id') 
AS 
campaign_id, 
get_json_object(log, 
'$.member.region') 
AS 
region, 
get_json_object(log, 
'$.member.lang') 
AS 
lang, 
get_json_object(log, 
'$.member.id') 
AS 
member_id 
FROM 
applog 
WHERE 
service='myservice' 
AND 
yyyymmdd='20140913' 
AND 
get_json_object(log, 
'$.type')='click' 
) 
x 
GROUP 
BY 
yyyymmdd, 
hh, 
campaign_id, 
region, 
lang 
Hive query 
for fixed reports
Use case 2: 
Prompt reports for Ad service console 
Norikra query for prompt reports 
SELECT 
campaign.id 
AS 
campaign_id, 
member.region 
AS 
region, 
member.lang 
AS 
lang, 
COUNT(*) 
AS 
click, 
COUNT(DISTINCT 
member.id) 
AS 
uu 
FROM 
myservice.win:time_batch(1 
hours) 
WHERE 
type="click" 
GROUP 
BY 
campaign.id, 
member.region, 
member.lang
Use case 3: 
Realtime access dashboard on Google Platform 
Access log visualization 
Count using Norikra (2-step), Store on Google BigQuery 
Dashboard on Google Spreadsheet + Apps Script 
http://qiita.com/kazunori279/items/6329df57635799405547 
https://www.youtube.com/watch?v=EZkw5TDcCGw
Use case 3: 
Realtime access dashboard on Google Platform 
Server 
Fluentd 
http://qiita.com/kazunori279/items/6329df57635799405547 
https://www.youtube.com/watch?v=EZkw5TDcCGw 
ngnix 
access log 
access logs 
to BigQuery 
norikra query results 
norikra query to aggregate node 
to aggregate locally
Use case 3: 
Realtime access dashboard on Google Platform 
70 servers, 120,000 requests/sec (or more!) 
Fluentd 
logs to store 
http://qiita.com/kazunori279/items/6329df57635799405547 
https://www.youtube.com/watch?v=EZkw5TDcCGw 
ngnix 
ngngninxix ngngninxix ngngninxix ngngninxix 
ngngninxix ngngninxix ngngninxix ngngninxix ngnix 
Google 
BigQuery 
Google 
Spreadsheet 
+ Apps script 
... 
counts per host 
total count
Why Norikra is written in JRuby 
Esper 
CEP(Complex Event Processing) library, written in Java 
Rubygems.org 
Open repository, for public UDF plugins of Norikra provided as gem
JRuby for me 
Ruby! (by great JRuby developer team!) 
makes developing Norikra dramatically faster 
with rubygems and rubygems.org for easy deployment/installation 
with Java libraries, ex: Jetty, Esper, ... 
There are not so many users in Tokyo :(
More queries, more simplicity 
and less latency 
in data processing 
Thanks! 
photo: by my co-workers 
http://norikra.github.io/ 
https://github.com/norikra/norikra
See also: 
http://norikra.github.io/ 
“Lambda Architecture Platform Using SQL” 
http://www.slideshare.net/tagomoris/lambda-architecture-using-sql-hadoopcon- 
2014-taiwan 
“Stream processing and Norikra” 
http://www.slideshare.net/tagomoris/stream-processing-and-norikra 
“Batch processing and Stream processing by SQL” 
http://www.slideshare.net/tagomoris/hcj2014-sql 
“Norikra in Action” 
http://www.slideshare.net/tagomoris/norikra-in-action-ver-2014-spring 
http://www.slideshare.net/tagomoris/presentations
Storm or Norikra? 
Simple and fixed workload for huge traffic 
Use Storm! 
Complex and fragile workload for non-huge traffic 
Use Norikra!
Scalability? 
10,000 - 100,000 events/sec 
on 2CPU 8Core server
HA? Distributed? 
NO! 
I have some idea, but I have no time to implement it 
There are no needs for HA/Distributed processing
Data flow & API? 
Use Fluentd!

Norikra: SQL Stream Processing In Ruby

  • 1.
    Norikra: SQL StreamProcessing In Ruby 2014/11/19 RubyConf 2014 DAY 3 Satoshi Tagomori (@tagomoris)
  • 2.
    Topics Why Iwrote Norikra Norikra overview Norikra queries Use cases in production JRuby for me
  • 3.
    Satoshi Tagomori (@tagomoris) Tokyo, Japan LINE Corporation
  • 7.
    Monitoring/Data Analytics Overview collect parse clean up process visualize Access logs, store process Application logs, ...
  • 9.
    collect parse cleanup process visualize store process
  • 11.
    collect parse cleanup process visualize store process
  • 13.
    Fluentd stream aggregation: Good for simple data/calculation collect parse clean up process visualize store process
  • 14.
    Our services: Moreand more different services Many changes in a day (including logging) Many kind of logs for each services Many different metrics for each services
  • 15.
    Fluentd stream aggregation: Not good for processing about complex/fragile environment... collect parse clean up process visualize store process
  • 16.
    We want to: add/remove queries anytime we want write many queries for a service log stream ignore events without data we want make our service directors / growth hackers to write their own queries!
  • 17.
    collect parse cleanup process visualize store process
  • 18.
  • 20.
    Norikra: Schema-less StreamProcessing with SQL Server software, written in JRuby, runs on JVM Open source software (GPLv2) http://norikra.github.io/ https://github.com/norikra/norikra
  • 21.
    How To SetupNorikra: Install JRuby download jruby.tar.gz, extract it and export $PATH use rbenv rbenv install jruby-1.7.xx rbenv shell jruby-.. Install Norikra gem install norikra Execute Norikra server norikra start
  • 22.
    Norikra Interface: CLIclient/Client library: norikra-client norikra-client target open ... norikra-client query add ... tail -f ... | norikra-client event send ... WebUI show status show/add/remove queries HTTP API JSON, MessagePack
  • 23.
    Norikra: Schema-less eventstream: Add/Remove data fields whenever you want SQL: No more restarts to add/remove queries w/ JOINs, w/ SubQueries w/ UDF (in Java/Ruby as rubygems) Truly Complex events: Nested Hash/Array, accessible directly from SQL
  • 24.
    Norikra Queries: (1) SELECT name, age FROM events target
  • 25.
    Norikra Queries: (1) {“name”:”tagomoris”, “age”:35, “address”:”Tokyo”, “corp”:”LINE”, “current”:”San Diego”} SELECT name, age FROM events {“name”:”tagomoris”,”age”:35}
  • 26.
    Norikra Queries: (1) {“name”:”tagomoris”, “address”:”Tokyo”, “corp”:”LINE”, “current”:”San Diego”} without “age” SELECT name, age FROM events nothing
  • 27.
    Norikra Queries: (2) {“name”:”tagomoris”, “address”:”Tokyo”, “corp”:”LINE”, “current”:”San Diego”} SELECT name, age FROM events WHERE current=”San Diego” {“name”:”tagomoris”,”age”:35}
  • 28.
    Norikra Queries: (2) {“name”:”nobu”, “age”:0, “address”:”Somewhere”, “corp”:”Heroku”, “current”:”SAN”} current is not “San Diego” SELECT name, age FROM events WHERE current=”San Diego” nothing
  • 29.
    Norikra Queries: (3) SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age
  • 30.
    Norikra Queries: (3) {“name”:”tagomoris”, “address”:”Tokyo”, “corp”:”LINE”, “current”:”San Diego”} SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age every 5 mins {”age”:35,”cnt”:3}, {“age”:33,”cnt”:1}, ...
  • 31.
    Norikra Queries: (4) {“name”:”tagomoris”, “address”:”Tokyo”, “corp”:”LINE”, “current”:”San Diego”} SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age {”age”:35,”cnt”:3}, {“age”:33,”cnt”:1}, ... SELECT max(age) as max FROM events.win:time_batch(5 mins) {“max”:51} every 5 mins
  • 32.
    Norikra Queries: (5) {“name”:”tagomoris”, “user:{“age”:35, “corp”:”LINE”, “address”:”Tokyo”}, “current”:”San Diego”, “speaker”:true, “attend”:[true,true,false, ...] } SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age
  • 33.
    Norikra Queries: (5) {“name”:”tagomoris”, “user:{“age”:35, “corp”:”LINE”, “address”:”Tokyo”}, “current”:”San Diego”, “speaker”:true, “attend”:[true,true,false, ...] } SELECT user.age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY user.age
  • 34.
    Norikra Queries: (5) {“name”:”tagomoris”, “user:{“age”:35, “corp”:”LINE”, “address”:”Tokyo”}, “current”:”San Diego”, “speaker”:true, “attend”:[true,true,false, ...] } SELECT user.age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) WHERE current=”San Diego” AND attend.$0 AND attend.$1 GROUP BY user.age
  • 35.
  • 36.
    Use case 1: External API call reports for partners (LINE) External API call for LINE Business Connect LINE backend sends requests to partner’s API endpoint using users’ messages http://developers.linecorp.com/blog/?p=3386
  • 37.
    Use case 1: External API call reports for partners (LINE) channel gateway partner’s server logs query results MySQL Mail SELECT channelId AS channel_id, reason, detail, count(*) AS error_count, min(timestamp) AS first_timestamp, max(timestamp) AS last_timestamp FROM api_error_log.win:time_batch(60 sec) GROUP BY channelId,reason,detail HAVING count(*) > 0 http://developers.linecorp.com/blog/?p=3386
  • 38.
    Use case 1: External API call reports for partners (LINE) API error response summaries http://developers.linecorp.com/blog/?p=3386
  • 39.
    Use case 2:Lambda architecture Prompt reports for Ad service console Prompt reports with Norikra + Fixed reports with Hive app serverapp serverapp server app serverapp serverapp server Fluentd HDFS console service execute hive query (daily) fetch query results (frequently) impression logs
  • 40.
    Use case 2: Prompt reports for Ad service console SELECT yyyymmdd, hh, campaign_id, region, lang, COUNT(*) AS click, COUNT(DISTINCT member_id) AS uu FROM ( SELECT yyyymmdd, hh, get_json_object(log, '$.campaign.id') AS campaign_id, get_json_object(log, '$.member.region') AS region, get_json_object(log, '$.member.lang') AS lang, get_json_object(log, '$.member.id') AS member_id FROM applog WHERE service='myservice' AND yyyymmdd='20140913' AND get_json_object(log, '$.type')='click' ) x GROUP BY yyyymmdd, hh, campaign_id, region, lang Hive query for fixed reports
  • 41.
    Use case 2: Prompt reports for Ad service console Norikra query for prompt reports SELECT campaign.id AS campaign_id, member.region AS region, member.lang AS lang, COUNT(*) AS click, COUNT(DISTINCT member.id) AS uu FROM myservice.win:time_batch(1 hours) WHERE type="click" GROUP BY campaign.id, member.region, member.lang
  • 42.
    Use case 3: Realtime access dashboard on Google Platform Access log visualization Count using Norikra (2-step), Store on Google BigQuery Dashboard on Google Spreadsheet + Apps Script http://qiita.com/kazunori279/items/6329df57635799405547 https://www.youtube.com/watch?v=EZkw5TDcCGw
  • 43.
    Use case 3: Realtime access dashboard on Google Platform Server Fluentd http://qiita.com/kazunori279/items/6329df57635799405547 https://www.youtube.com/watch?v=EZkw5TDcCGw ngnix access log access logs to BigQuery norikra query results norikra query to aggregate node to aggregate locally
  • 44.
    Use case 3: Realtime access dashboard on Google Platform 70 servers, 120,000 requests/sec (or more!) Fluentd logs to store http://qiita.com/kazunori279/items/6329df57635799405547 https://www.youtube.com/watch?v=EZkw5TDcCGw ngnix ngngninxix ngngninxix ngngninxix ngngninxix ngngninxix ngngninxix ngngninxix ngngninxix ngnix Google BigQuery Google Spreadsheet + Apps script ... counts per host total count
  • 45.
    Why Norikra iswritten in JRuby Esper CEP(Complex Event Processing) library, written in Java Rubygems.org Open repository, for public UDF plugins of Norikra provided as gem
  • 46.
    JRuby for me Ruby! (by great JRuby developer team!) makes developing Norikra dramatically faster with rubygems and rubygems.org for easy deployment/installation with Java libraries, ex: Jetty, Esper, ... There are not so many users in Tokyo :(
  • 47.
    More queries, moresimplicity and less latency in data processing Thanks! photo: by my co-workers http://norikra.github.io/ https://github.com/norikra/norikra
  • 48.
    See also: http://norikra.github.io/ “Lambda Architecture Platform Using SQL” http://www.slideshare.net/tagomoris/lambda-architecture-using-sql-hadoopcon- 2014-taiwan “Stream processing and Norikra” http://www.slideshare.net/tagomoris/stream-processing-and-norikra “Batch processing and Stream processing by SQL” http://www.slideshare.net/tagomoris/hcj2014-sql “Norikra in Action” http://www.slideshare.net/tagomoris/norikra-in-action-ver-2014-spring http://www.slideshare.net/tagomoris/presentations
  • 49.
    Storm or Norikra? Simple and fixed workload for huge traffic Use Storm! Complex and fragile workload for non-huge traffic Use Norikra!
  • 50.
    Scalability? 10,000 -100,000 events/sec on 2CPU 8Core server
  • 51.
    HA? Distributed? NO! I have some idea, but I have no time to implement it There are no needs for HA/Distributed processing
  • 52.
    Data flow &API? Use Fluentd!