Ruby on Redis	

Pascal Weemaels	

Koen Handekyn	

Oct 2013
Target

Create a Zip file of PDFs
based on a CSV data file

‣  Linear version

‣  Making it scale with Redis


[flow diagram: parse csv → create pdf (× N) → zip]
Step 1: linear

‣  Parse CSV

•  std lib: require 'csv'

•  docs = CSV.read("#{DATA}.csv")
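
A minimal sketch of what that gives you, assuming a hypothetical invoices.csv whose rows carry invoice_nr, name, street, zip, city (the order create_pdf expects later):

require 'csv'

# hypothetical invoices.csv, one invoice per row:
#   2013-001,Pascal,Main Street 1,9820,Merelbeke
#   2013-002,Koen,Station Road 5,9000,Gent
docs = CSV.read('invoices.csv')   # => array of arrays, one per line
docs.each do |invoice_nr, name, street, zip, city|
  puts "#{invoice_nr}: #{name}, #{street}, #{zip} #{city}"
end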
Simple Templating with String Interpolation

‣  invoice.html

<<Q
<div class="title">
INVOICE #{invoice_nr}
</div>
<div class="address">
#{name}</br>
#{street}</br>
#{zip} #{city}</br>
</div>
Q

‣  Merge data into HTML

•  template = File.new('invoice.html').read

•  html = eval("<<QQQ\n#{template}\nQQQ")
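
A self-contained sketch of the trick with hypothetical values (QQQ is just an arbitrary heredoc delimiter): eval wraps the template text in a heredoc, so Ruby's normal #{...} interpolation fills in the placeholders from the local variables in scope.

invoice_nr = '2013-001'   # hypothetical values
name, street, zip, city = 'Pascal', 'Main Street 1', '9820', 'Merelbeke'

# single-quoted, so nothing is interpolated yet
template = '<div class="title">INVOICE #{invoice_nr}</div>'

# wrap the template in a heredoc and eval it: interpolation happens now
html = eval("<<QQQ\n#{template}\nQQQ")
puts html   # => <div class="title">INVOICE 2013-001</div>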
Step 1: linear

‣  Create PDF

•  Prince XML via the princely gem

•  http://www.princexml.com

•  p = Princely.new
   p.add_style_sheets('invoice.css')
   p.pdf_from_string(html)
Step 1: linear

‣  Create ZIP

•  Zip::ZipOutputStream.open(zipfile_name) do |zos|
     files.each do |file, content|
       zos.put_next_entry(file)
       zos.puts content
     end
   end
Full Code

require 'csv'
require 'princely'
require 'zip/zip'

DATA_FILE = ARGV[0]
DATA_FILE_BASE_NAME = File.basename(DATA_FILE, ".csv")

# create a pdf document from a csv line
def create_pdf(invoice_nr, name, street, zip, city)
  template = File.new('../resources/invoice.html').read
  html = eval("<<WTFMF\n#{template}\nWTFMF")
  p = Princely.new
  p.add_style_sheets('../resources/invoice.css')
  p.pdf_from_string(html)
end

# zip files from hash
def create_zip(files_h)
  zipfile_name = "../out/#{DATA_FILE_BASE_NAME}.#{Time.now.to_s}.zip"
  Zip::ZipOutputStream.open(zipfile_name) do |zos|
    files_h.each do |name, content|
      zos.put_next_entry "#{name}.pdf"
      zos.puts content
    end
  end
  zipfile_name
end

# load data from csv
docs = CSV.read(DATA_FILE) # array of arrays

# create a pdf for each line in the csv
# and put it in a hash
files_h = docs.inject({}) do |files_h, doc|
  files_h[doc[0]] = create_pdf(*doc)
  files_h
end

# zip all pdf's from the hash
create_zip files_h

DEMO
Step 2: from linear ...

[flow diagram: parse csv → create pdf (× N) → zip]
Step 2: ... to parallel

[diagram: parse csv → several create pdf steps running in parallel → zip; candidate mechanism: Threads?]
Multi Threaded

‣  Advantage

•  Lightweight (minimal overhead)

‣  Challenges (or why it is hard)

•  Hard to code: most data structures are not thread safe by default; they
   need synchronized access (see the sketch after this list)

•  Hard to test: different execution paths, timings

•  Hard to maintain

‣  Limitation

•  Single machine - not a solution for horizontal scalability
   beyond the multi-core CPU
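
A minimal sketch of the thread-based variant, assuming create_pdf, create_zip and docs from the linear version; the shared hash needs a Mutex because Ruby's Hash is not safe for concurrent writes:

require 'thread'

files_h = {}
mutex   = Mutex.new

threads = docs.map do |doc|
  Thread.new(doc) do |d|
    pdf = create_pdf(*d)                       # heavy work per invoice
    mutex.synchronize { files_h[d[0]] = pdf }  # synchronized access to the shared hash
  end
end
threads.each(&:join)                           # wait for all PDFs

create_zip files_h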
Step 2: ... to parallel

[diagram: parse csv → several create pdf steps running in parallel → zip; candidate mechanism: ?]
Multi Process

•  Scales across machines

•  Advanced support for debugging and monitoring at the OS level

•  Simpler (code, testing, debugging, ...)

•  Slightly more overhead

BUT
But

all this assumes
"shared state across processes"

[diagram: parse csv → create pdf (× N) → zip, all reading and writing shared state]

Candidates for the shared state: Memcached? SQL? File System? Terracotta? ... OR ...
Hello Redis	

‣  Shared Memory Key Value Store with
High Level Data Structure support 	

•  String (String, Int, Float)	

•  Hash (Map, Dictionary) 	

•  List (Queue) 	

•  Set 	

•  ZSet (ordered by member or score)
About Redis

•  Single threaded: 1 thread to serve them all

•  (fit) Everything in memory

•  "Transactions" (MULTI/EXEC)

•  Expiring keys

•  Lua scripting

•  Publisher-Subscriber

•  Auto Create and Destroy

•  Pipelining

•  But ... full clustering (master-master) is not available (yet)
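
A small sketch of a few of these features through the redis-rb gem (assuming a Redis server on the default localhost:6379):

require 'redis'

r = Redis.new

# expiring keys
r.set 'session:42', 'data'
r.expire 'session:42', 60          # key disappears after 60 seconds

# "transaction": MULTI ... EXEC, commands applied atomically
r.multi do |tx|
  tx.set  'name', 'pascal'
  tx.incr 'counter'
end

# pipelining: batch commands into one round trip
r.pipelined do |pipe|
  10.times { |i| pipe.lpush 'todo', "task #{i}" }
end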
Hello Redis

‣  redis-cli

•  set name "pascal"          = OK
•  incr counter               = 1
•  incr counter               = 2
•  hset pascal name "pascal"
•  hset pascal address "merelbeke"
•  sadd persons pascal
•  smembers persons           = [pascal]
•  keys *
•  type pascal                = hash
•  lpush todo "read"          = 1
•  lpush todo "eat"           = 2
•  lpop todo                  = "eat"
•  rpoplpush todo done        = "read"
•  lrange done 0 -1           = ["read"]
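
The same session through redis-rb, the client used in the rest of the deck (a sketch, assuming a local Redis):

require 'redis'
r = Redis.new

r.set 'name', 'pascal'              # => "OK"
r.incr 'counter'                    # => 1
r.incr 'counter'                    # => 2
r.hset 'pascal', 'name', 'pascal'
r.hset 'pascal', 'address', 'merelbeke'
r.sadd 'persons', 'pascal'
r.smembers 'persons'                # => ["pascal"]
r.type 'pascal'                     # => "hash"
r.lpush 'todo', 'read'              # => 1
r.lpush 'todo', 'eat'               # => 2
r.lpop 'todo'                       # => "eat"
r.rpoplpush 'todo', 'done'          # => "read"
r.lrange 'done', 0, -1              # => ["read"]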
Let Redis Distribute

[diagram: a "parse csv" process hands work to several "create pdf" processes through Redis; a "zip" process collects the results]
Spread the Work

[diagram, step 1: the "parse csv" process pushes each document onto a Redis queue with data and increments a counter; the "create pdf" processes consume from that queue, the "zip" process waits]
Ruby on Redis

‣  Put the PDF-create input data on a queue and do the counter bookkeeping

docs.each do |doc|
  data = YAML::dump(doc)
  r.lpush 'pdf:queue', data
  r.incr 'ctr'          # bookkeeping
end
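
For context, the snippet assumes the setup below (a sketch; the complete MAIN script appears in the Full Code slide later):

require 'csv'
require 'yaml'
require 'redis'

r    = Redis.new            # assumes a Redis server on localhost:6379
docs = CSV.read(ARGV[0])    # same array of arrays as in the linear version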
Create PDF's

[diagram, step 2: each "create pdf" process pops from the queue with data, stores the generated PDF in a hash with pdfs, and decrements the counter]
Ruby on Redis

‣  Read PDF input data from the queue, do the counter bookkeeping,
   put each created PDF in a Redis hash, and signal when ready

while (true)
  _, msg = r.brpop 'pdf:queue'
  doc = YAML::load(msg)
  # name of hash, key = docname, value = pdf
  r.hset('pdf:pdfs', doc[0], create_pdf(*doc))
  ctr = r.decr 'ctr'
  r.rpush 'ready', 'done' if ctr == 0   # DECR is atomic: exactly one worker sees 0
end
Zip When Done

[diagram, step 3: when the counter reaches zero, a "ready" signal is pushed; the "zip" process wakes up and reads the hash with pdfs]
Ruby on Redis

‣  Wait for the ready signal,
   fetch all PDFs,
   and zip them

r.brpop 'ready'               # wait for signal
pdfs = r.hgetall 'pdf:pdfs'   # fetch hash
create_zip pdfs               # zip it
More Parallelism

[diagram: several "parse csv" / "zip" jobs share one queue with data and one pool of "create pdf" workers; each job gets its own counter, its own hash with pdfs, and its own ready queue]
Ruby on Redis

‣  Put the PDF-create input data on a queue and do the counter bookkeeping

# unique id for this input file
UUID = SecureRandom.uuid
docs.each do |doc|
  data = YAML::dump([UUID, doc])
  r.lpush 'pdf:queue', data
  r.incr "ctr:#{UUID}"      # bookkeeping
end
Ruby on Redis

‣  Read PDF input data from the queue, do the counter bookkeeping,
   and put each created PDF in a Redis hash

while (true)
  _, msg = r.brpop 'pdf:queue'
  uuid, doc = YAML::load(msg)
  r.hset(uuid, doc[0], create_pdf(*doc))
  ctr = r.decr "ctr:#{uuid}"
  r.rpush "ready:#{uuid}", 'done' if ctr == 0
end
Ruby on Redis

‣  Wait for the ready signal,
   fetch all PDFs,
   and zip them

r.brpop "ready:#{UUID}"       # wait for signal
pdfs = r.hgetall(UUID)        # fetch this job's hash
create_zip(pdfs)              # zip it
Full Code

require 'csv'
require 'princely'
require 'zip/zip'

DATA_FILE = ARGV[0]
DATA_FILE_BASE_NAME = File.basename(DATA_FILE, ".csv")

# create a pdf document from a csv line
def create_pdf(invoice_nr, name, street, zip, city)
  template = File.new('../resources/invoice.html').read
  html = eval("<<WTFMF\n#{template}\nWTFMF")
  p = Princely.new
  p.add_style_sheets('../resources/invoice.css')
  p.pdf_from_string(html)
end

# zip files from hash
def create_zip(files_h)
  zipfile_name = "../out/#{DATA_FILE_BASE_NAME}.#{Time.now.to_s}.zip"
  Zip::ZipOutputStream.open(zipfile_name) do |zos|
    files_h.each do |name, content|
      zos.put_next_entry "#{name}.pdf"
      zos.puts content
    end
  end
  zipfile_name
end

# load data from csv
docs = CSV.read(DATA_FILE) # array of arrays

# create a pdf for each line in the csv
# and put it in a hash
files_h = docs.inject({}) do |files_h, doc|
  files_h[doc[0]] = create_pdf(*doc)
  files_h
end

# zip all pdf's from the hash
create_zip files_h

LINEAR


require 'csv'
require 'zip/zip'
require 'redis'
require 'yaml'
require 'securerandom'

# zip files from hash
def create_zip(files_h)
  zipfile_name = "../out/#{DATA_FILE_BASE_NAME}.#{Time.now.to_s}.zip"
  Zip::ZipOutputStream.open(zipfile_name) do |zos|
    files_h.each do |name, content|
      zos.put_next_entry "#{name}.pdf"
      zos.puts content
    end
  end
  zipfile_name
end

DATA_FILE = ARGV[0]
DATA_FILE_BASE_NAME = File.basename(DATA_FILE, ".csv")
UUID = SecureRandom.uuid

r = Redis.new
my_counter = "ctr:#{UUID}"

# load data from csv
docs = CSV.read(DATA_FILE) # array of arrays

docs.each do |doc| # distribute!
  r.lpush 'pdf:queue', YAML::dump([UUID, doc])
  r.incr my_counter
end

r.brpop "ready:#{UUID}" # collect!
create_zip(r.hgetall(UUID))

# clean up
r.del my_counter
r.del UUID
puts "All done!"

MAIN


require 'redis'
require 'princely'
require 'yaml'

# create a pdf document from a csv line
def create_pdf(invoice_nr, name, street, zip, city)
  template = File.new('../resources/invoice.html').read
  html = eval("<<WTFMF\n#{template}\nWTFMF")
  p = Princely.new
  p.add_style_sheets('../resources/invoice.css')
  p.pdf_from_string(html)
end

r = Redis.new
while (true)
  _, msg = r.brpop 'pdf:queue'
  uuid, doc = YAML::load(msg)
  r.hset(uuid, doc[0], create_pdf(*doc))
  ctr = r.decr "ctr:#{uuid}"
  r.rpush "ready:#{uuid}", 'done' if ctr == 0
end

WORKER


Key functions (create_pdf and create_zip) remain unchanged;
only the distribution code is new.

DEMO 2
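
A hedged sketch of how the pieces could be started together; worker.rb and main.rb are hypothetical file names for the WORKER and MAIN scripts above:

# launch.rb - hypothetical launcher: start a few workers, then run one job
WORKERS = 3

worker_pids = WORKERS.times.map { Process.spawn('ruby', 'worker.rb') }

# each main.rb run is an independent job with its own UUID,
# so several can run concurrently against the same worker pool
system('ruby', 'main.rb', 'invoices.csv')

worker_pids.each { |pid| Process.kill('TERM', pid) }  # workers loop forever; stop them
Process.waitall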
Multi Language Participants

[diagram: the same queue with data, counters, and hashes with pdfs, with "parse csv" / "zip" and "create pdf" participants that need not all be written in the same language]
Conclusions

From Linear To Multi Process Distributed
is easy with
Redis Shared Memory High Level Data Structures

Atomic Counter for bookkeeping
Queue for work distribution
Queue as Signal
Hash for result sets
