Internship Final Report
Sep 30, 2016
Yuta Iwama
Who am I
• Yuta Iwama (@ganmacs)
• Master’s student, The University of Tokyo
• Research: Programming languages (My
theme is extending language syntax)
• Group: Chiba Shigeru Group
What I did in summer intern
• Add features and enhancements to Fluentd
v0.14.x
What I did in summer intern
• 6 features
• Counter API (Not merged yet)
• Data compression in buffer plugins and forward plugins
• New out_file plugin only for <secondary> section
• A CLI tool to read dumped event data
• Log rotation
• `filter_with_time` method in filter plugins
• 2 enhancements
• Optimizing multiple filter calls
• Add event size to options in a forward protocol
• Some small tasks
I’ll talk about
• Counter API
• Data compression in buffer plugins and
forward plugins
• New out_file plugin only for <secondary>
section
• A CLI tool to read dumped event data
• Log rotation
• Optimizing multiple filter calls
Data Compression
in buffer plugins and
forward plugins
Current buffer plugins and
forward plugins in Fluentd
• Buffer plugins hold data as a string (formatted
with MessagePack or a user's custom format)
• Forward plugins send data as a string (in the
same format as buffer plugins)
• Although the data is serialized with MessagePack, its
footprint is large
• The current approach consumes a lot of memory
and network bandwidth
New buffer plugins and
forward plugins
• String data in buffer plugins can be compressed
• Forward plugins can send and receive compressed data
• This makes it possible to
• Save bandwidth across datacenters
• Speed up transfers and save time
• Reduce memory consumption and IaaS costs (EC2,
etc.)
Implementation
• I used “zlib” in Ruby to implement the
compression/decompression methods
• It is hard to support both the compressed version
and the raw version (to solve this problem, I
used `extend` in Ruby so as not to break the existing
interface)
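A minimal sketch of the `extend` approach described above, not Fluentd's actual code: `Chunk` and `Compressable` are hypothetical names used only for illustration. The raw chunk keeps its plain interface, and compression is mixed into individual objects, so callers that never ask for compression see no change.

```ruby
require 'zlib'

# Hypothetical stand-in for a buffer chunk that stores serialized events.
class Chunk
  def initialize
    @segments = []
  end

  # Raw interface: store the string as-is.
  def append(str)
    @segments << str
  end

  # Raw interface: return everything that was appended.
  def read
    @segments.join
  end

  # Number of bytes actually held by the chunk.
  def bytesize
    @segments.sum(&:bytesize)
  end
end

# Hypothetical mixin. It is applied per object with `extend`,
# so the Chunk class and its existing callers stay untouched.
module Compressable
  # Compress each segment with zlib before handing it to Chunk#append.
  def append(str)
    super(Zlib::Deflate.deflate(str))
  end

  # Decompress each stored segment; the result is a binary string.
  def read
    @segments.map { |segment| Zlib::Inflate.inflate(segment) }.join
  end
end

payload = '{"message":"dummy"}' * 100

raw = Chunk.new
raw.append(payload)

compressed = Chunk.new.extend(Compressable)
compressed.append(payload)

puts "raw bytes:        #{raw.bytesize}"
puts "compressed bytes: #{compressed.bytesize}"
puts "round trip ok:    #{compressed.read == payload.b}"
```

Because the module is attached with `extend`, both kinds of chunk answer the same `append` and `read` calls, which is the property the slide is after.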
New out_file plugin only for
<secondary> section
Background
• Many users use the out_file plugin in <secondary>
sections to dump the buffer when the primary
buffered output plugin fails to flush
• But out_file is too complex and has too many
features for this purpose
• => We need a simple out_file only for
<secondary> sections, just to dump the buffer
New plugin: secondary_out_file
• Only four attributes
• directory: the directory where dumped data is saved
• basename: the file name of the dumped data (default value is
dump.bin)
• append: whether flushed data is appended to an existing file
(default false)
• compress: the type of the file (gzip or txt; default is txt)
• Users can use this plugin by setting only directory
<match >
  @type forward
  ...
  <secondary>
    type secondary_file
    directory log/secondary/
  </secondary>
</match>
A CLI tool which is used for
reading dump data
Background
• Dumped data is created by secondary
plugins (e.g. secondary_out_file) when
primary plugins fail to flush
• We can't read dumped data directly because
it is in a binary format (MessagePack)
in most cases
• => Provide a CLI tool to read dumped data
fluent-binlog-reader
• fluent-binlog-reader is bundled with Fluentd
• It reads dumped data and outputs it in a readable
format
• Users can use Fluentd's formatter plugins as the
output format
$ fluent-binlog-reader --help
Usage: fluent-binlog-reader <command> [<args>]
Commands of fluent-binlog-reader:
cat : Read files sequentially, writing them to standard output.
head : Display the beginning of a text file.
format : Display plugins that you can use.
See 'fluent-binlog-reader <command> --help' for more information on a specific command.
fluent-binlog-reader
$ fluent-binlog-reader head packed.log
2016-08-12T17:24:18+09:00 packed.log {"message":"dummy"}
2016-08-12T17:24:18+09:00 packed.log {"message":"dummy"}
2016-08-12T17:24:18+09:00 packed.log {"message":"dummy"}
2016-08-12T17:24:18+09:00 packed.log {"message":"dummy"}
2016-08-12T17:24:18+09:00 packed.log {"message":"dummy"}
Default format is json format
Using a “csv formatter” to output dumped data
$ fluent-binlog-reader cat --formats=csv -e fields=message packed.log
"dummy"
...
"dummy"
Log rotation
Background
• Fluentd could not rotate its own log files
• As a log file grows, it becomes
difficult to handle
• => Fluentd now supports log rotation to keep log
files at a manageable size
Log rotation
• Two options
• log-rotate-age: The number of old log files
to keep
• log-rotate-size: Maximum log file size
• Uses ServerEngine's log rotation (Fluentd uses
the ServerEngine logger for its own log)
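To illustrate what an age limit and a size limit mean together, here is a small sketch that uses Ruby's standard `Logger` rather than ServerEngine's logger. It assumes that `log-rotate-age` and `log-rotate-size` behave like `Logger`'s `shift_age` and `shift_size` arguments; the helper `write_rotating_log` and the file name `fluentd.log` are made up for the example.

```ruby
require 'logger'
require 'tmpdir'

# Hypothetical helper: write `lines` messages through a rotating logger and
# return the sorted names of the log files left in `dir`.
def write_rotating_log(dir, age:, size:, lines:)
  path = File.join(dir, 'fluentd.log')
  # Logger.new(logdev, shift_age, shift_size):
  #   shift_age  - number of log files to keep
  #   shift_size - maximum size of one log file in bytes
  logger = Logger.new(path, age, size)
  lines.times { |i| logger.info("dummy event #{i} " + 'x' * 80) }
  logger.close
  Dir.children(dir).sort
end

Dir.mktmpdir do |dir|
  files = write_rotating_log(dir, age: 3, size: 1024, lines: 200)
  # Older files get a numeric suffix and the oldest ones are dropped.
  puts files
end
```

Without rotation the same 200 messages would end up in a single, ever-growing file.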
Optimise multiple filter
calls
Background
• If users apply multiple filters to incoming
events, Fluentd creates many EventStream
objects and calls their `add` method many times
• => Remove the useless instantiations of
EventStream and the `add` method calls
Current filters
[e1, e2, e3, e4, e5] => Filter 1 => [e1’, e2’, e3’, e4’, e5’] => ... => Filter n => [e1x, e2x, e3x, e4x, e5x]
For each filter:
1. Create an EventStream object (1 time)
2. Apply a filter to each event (5 times)
3. Add a filtered event to an EventStream object (5 times)
If 10 filters are applied:
1. 10 calls
2. 50 calls
3. 50 calls
Optimised case
[e1, e2, e3, e4, e5] => Filter 1 ... Filter n => [e1x, e2x, e3x, e4x, e5x]
For all filters together:
1. Create an EventStream object (1 time)
2. Apply each filter to each event (n * 5 times)
3. Add a filtered event to an EventStream object (5 times)
If 10 filters are applied:
1. 1 call
2. 50 calls
3. 5 calls
Constraint:
These filters must not implement the `filter_stream` method
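The call counts above can be reproduced with a small sketch. `EventStream` here is a hypothetical stand-in that only counts instantiations and `add` calls; it is not Fluentd's class, and the two helper methods are illustrative names. Filters are modelled as lambdas, and filters that drop events are ignored for simplicity.

```ruby
# Hypothetical stand-in that counts how often it is created and added to.
class EventStream
  @created = 0
  @added = 0

  class << self
    attr_accessor :created, :added

    def reset_counts
      @created = 0
      @added = 0
    end
  end

  attr_reader :events

  def initialize
    self.class.created += 1
    @events = []
  end

  def add(event)
    self.class.added += 1
    @events << event
  end
end

# Current behaviour: every filter builds its own EventStream.
def apply_each_filter(filters, events)
  filters.each do |filter|
    es = EventStream.new
    events.each { |event| es.add(filter.call(event)) }
    events = es.events
  end
  events
end

# Optimised behaviour: pass each event through all filters, then add it once.
def apply_filters_at_once(filters, events)
  es = EventStream.new
  events.each do |event|
    es.add(filters.reduce(event) { |acc, filter| filter.call(acc) })
  end
  es.events
end

filters = Array.new(10) { ->(event) { event + 1 } }
events = [1, 2, 3, 4, 5]

EventStream.reset_counts
apply_each_filter(filters, events)
puts "current:   #{EventStream.created} streams, #{EventStream.added} adds"

EventStream.reset_counts
apply_filters_at_once(filters, events)
puts "optimised: #{EventStream.created} streams, #{EventStream.added} adds"
```

Both versions call a filter 50 times; only the number of stream objects and `add` calls changes.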
Performance
Tool : ruby-prof
ProductName: Mac OS X
ProductVersion: 10.11.6
BuildVersion: 15G31
PROCESSOR: 2.7 GHz Intel Core i5
MEMORY: 8 GB 1867 MHz DDR3
Not optimized: 0.063186
Optimised: 0.051646
About 1.2 times faster
with 10 filters and 1000 events per sec
I’ll talk about
• Counter API
• Data compression in buffer plugins and
forward plugins
• New out_file plugin only for <secondary>
section
• A CLI tool to read dumped event data
• Log rotation
• Optimizing multiple filter calls
Counter API
Motivations
• To collect metrics of Fluentd itself across
processes
• To provide a counter API to 3rd party plugins
• It is useful for implementing counter plugins
(e.g. fluent-plugin-datacounter and
fluent-plugin-flowcounter)
What’s the counter
• The counter:
• A key-value store
• Used for storing the number of occurrences of a
particular event in the specified time
• Provides an API for users to operate on its values (e.g.
inc, reset, etc.)
• Shared between processes
Counter (before)
key value
key1 5
key2 1.2
key3 3
Process 1: inc(key1 => 2)
Process 2: reset(key2)
What’s the counter (cont.)
Counter (after)
key value
key1 7
key2 0
key3 3
Implementation
• RPC server and client
• All operations should be thread-safe
• Cleaning mutex objects for keys
RPC server and client
• Because the counter is shared between
processes, we need a server and clients (the server
stores the counter values and clients manipulate
them by RPC)
• I designed an RPC server and client for the counter
• I used cool.io to implement the RPC server and client
• cool.io provides a high-performance event
framework for Ruby (https://coolio.github.io/)
API to operate counter values
• init: create new value
• reset: reset a counter value
• delete: delete a counter value
• inc: increment or decrement a counter value
• get: fetch a counter value
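The five operations could look roughly like this as a single-process, in-memory sketch. `CounterStore` is a hypothetical name, the RPC layer is left out entirely, and two behaviours are assumptions made for the example: `reset` sets the value back to 0, and operating on a key that was never initialized raises `KeyError`.

```ruby
# Hypothetical in-memory store; names follow the slide, not Fluentd's code.
class CounterStore
  def initialize
    @values = {}
  end

  # init: create a new value (assumed to fail if the key already exists).
  def init(key, initial = 0)
    raise ArgumentError, "#{key} already exists" if @values.key?(key)
    @values[key] = initial
  end

  # reset: set a counter value back to 0 (assumption).
  def reset(key)
    fetch!(key)
    @values[key] = 0
  end

  # delete: remove a counter value and return it.
  def delete(key)
    fetch!(key)
    @values.delete(key)
  end

  # inc: increment (or, with a negative amount, decrement) a counter value.
  def inc(key, amount = 1)
    @values[key] = fetch!(key) + amount
  end

  # get: fetch a counter value.
  def get(key)
    fetch!(key)
  end

  private

  def fetch!(key)
    @values.fetch(key) { raise KeyError, "#{key} is not initialized" }
  end
end

store = CounterStore.new
store.init('key1', 5)
store.init('key2', 1.2)
store.inc('key1', 2)
store.reset('key2')
puts store.get('key1')
puts store.get('key2')
```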
Implementation
• RPC server and client
• All operations should be thread-safe
• Cleaning mutex objects for keys
All operations should be thread-safe
• The counter works with multiple threads
• You need to get a lock per key when you
change a counter value
• The counter stores mutex objects in a hash
(key_name => mutex_object)
How an inc method works
• client in worker1 calls inc(key1 => 2)
• Counter server holds key1 = 2; the mutex hash holds key1 => mutex obj
1. Call an inc method
2. Get a lock for a mutex hash (the mutex hash is locked)
3. Get a lock for a key (key1 is locked)
4. Unlock a mutex hash
5. Change a counter value (key1: 2 => 4)
6. Unlock a key lock (key1 is unlocked)
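The six steps can be sketched with Ruby's `Mutex`. `LockedCounter` is a hypothetical class written for this example, not the code from the pull request; the comments map each line back to the step numbers. The sketch assumes MRI, where the hash of values itself is safe to touch from several threads.

```ruby
# Hypothetical counter with one lock for the mutex hash and one lock per key.
class LockedCounter
  def initialize
    @values = Hash.new(0)
    @mutexes = {}            # key_name => mutex_object
    @hash_lock = Mutex.new   # guards @mutexes itself
  end

  def inc(key, amount)                             # 1. call an inc method
    key_mutex = nil
    @hash_lock.synchronize do                      # 2. lock the mutex hash
      key_mutex = (@mutexes[key] ||= Mutex.new)
      key_mutex.lock                               # 3. lock the key
    end                                            # 4. unlock the mutex hash
    begin
      @values[key] += amount                       # 5. change the counter value
    ensure
      key_mutex.unlock                             # 6. unlock the key lock
    end
  end

  def get(key)
    @values[key]
  end
end

counter = LockedCounter.new
threads = Array.new(8) do
  Thread.new { 1000.times { counter.inc('key1', 2) } }
end
threads.each(&:join)
puts counter.get('key1')
```

The lock on the mutex hash is held only long enough to find and lock the per-key mutex, so updates to different keys do not wait on each other while values are being changed.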
Implementation
• RPC server and client
• All operations should be thread-safe
• Cleaning mutex objects for keys
Mutex objects for keys
• To avoid storing mutex objects for all keys, I
implemented a cleanup thread that removes
the mutex objects of unused keys (like GC)
• This thread removes mutex objects that have
not been used for a certain period
Cleaning up a mutex hash
• If “key1” is not modified for a long period,
“key1” may not be used again
• Counter server holds key1 = 2; the mutex hash holds key1 => mutex obj
1. Start a cleaning thread (once every 15 min)
2. Get a lock for a mutex hash
3. Remove a mutex for an unused key
4. Try to get a lock for the same key
(if this thread can’t get a lock, restore the key-value)
5. Unlock a mutex hash
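A sketch of the cleanup steps, under stated assumptions: `MutexTable`, its `ttl` parameter, and the `last_used` timestamp are hypothetical and only illustrate one way to decide that a key is unused. The time is passed in explicitly so the behaviour can be checked without waiting 15 minutes.

```ruby
# Hypothetical table of per-key mutexes with a periodic cleanup.
class MutexTable
  Entry = Struct.new(:mutex, :last_used)

  def initialize(ttl:)
    @ttl = ttl               # seconds after which a key counts as unused
    @entries = {}            # key_name => Entry
    @hash_lock = Mutex.new
  end

  # Run the block while holding the lock for `key`.
  def with_key(key, now: Time.now)
    entry = nil
    @hash_lock.synchronize do
      entry = (@entries[key] ||= Entry.new(Mutex.new, now))
      entry.last_used = now
      entry.mutex.lock
    end
    begin
      yield
    ensure
      entry.mutex.unlock
    end
  end

  def cleanup(now: Time.now)
    @hash_lock.synchronize do                        # 2. lock the mutex hash
      stale = @entries.select { |_, entry| now - entry.last_used >= @ttl }
      stale.each do |key, entry|
        @entries.delete(key)                         # 3. remove the mutex
        if entry.mutex.try_lock                      # 4. try to lock the key
          entry.mutex.unlock
        else
          @entries[key] = entry                      #    in use: restore it
        end
      end
    end                                              # 5. unlock the mutex hash
  end

  # 1. Start a cleaning thread (every 15 minutes by default).
  def start_cleaner(interval = 15 * 60)
    Thread.new do
      loop do
        sleep interval
        cleanup
      end
    end
  end

  def keys
    @hash_lock.synchronize { @entries.keys }
  end
end

table = MutexTable.new(ttl: 15 * 60)
start = Time.now
table.with_key('key1', now: start) { }
table.with_key('key2', now: start + 1000) { }
table.cleanup(now: start + 1000)
puts table.keys.inspect
```

The `try_lock` check is what keeps the cleanup from dropping a mutex that another thread is holding at that moment.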
Summary
• Added six features and two enhancements to
Fluentd v0.14.x
• The Counter API is not merged yet
• The other PRs have been merged
Impression of intern
• The hardest part for me was designing the
Counter API (it took over a week)
• I learned about developing
middleware that is used by many people
• I want to be more careful with the code I
write (typos, descriptions, comments, etc.)

 
Data Control Language.pptx Data Control Language.pptx
Data Control Language.pptx Data Control Language.pptxData Control Language.pptx Data Control Language.pptx
Data Control Language.pptx Data Control Language.pptx
 

Treasure Data Summer Internship 2016

  • 1. Internship Final Report Sep 30, 2016 Yuta Iwama
  • 2. Who am I • Yuta Iwama (@ganmacs) • Master’s student, The University of Tokyo • Research: Programming languages (my theme is extending language syntax) • Group: Chiba Shigeru Group
  • 3. What I did in the summer internship • Added features and enhancements to Fluentd v0.14.x
  • 4. What I did in the summer internship • 6 features • Counter API (not merged yet) • Data compression in buffer plugins and forward plugins • New out_file plugin only for <secondary> section • A CLI tool to read dumped event data • Log rotation • `filter_with_time` method in filter plugins • 2 enhancements • Optimizing multiple filter calls • Adding event size to the options of the forward protocol • Some small tasks
  • 5. I’ll talk about • Counter API • Data compression in buffer plugins and forward plugins • New out_file plugin only for <secondary> section • A CLI tool to read dumped event data • Log rotation • Optimizing multiple filter calls
  • 6. I’ll talk about • Counter API • Data compression in buffer plugins and forward plugins • New out_file plugin only for <secondary> section • A CLI tool to read dumped event data • Log rotation • Optimizing multiple filter calls
  • 7. Data Compression in buffer plugins and forward plugins
  • 8. Current buffer plugins and forward plugins in Fluentd • Buffer plugins hold data as a string (formatted with MessagePack or a user’s custom format) • Forward plugins send data as a string (same format as buffer plugins) • Although data is serialized with MessagePack, its footprint is still large • The current approach consumes a lot of memory and network bandwidth
  • 9. New buffer plugins and forward plugins • String data in buffer plugins can be compressed • Forward plugins can send and receive compressed data • What this makes possible • Save bandwidth across datacenters • Speed up transfers and save time • Reduce memory consumption and IaaS costs (EC2, etc.)
  • 10. Implementation • I used Ruby’s “zlib” to implement the compression/decompression methods • It’s hard to support both the compressed version and the raw version (to solve this problem, I used Ruby’s `extend` so as not to break the existing interface)
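As a rough illustration of the `extend` idea on slide 10, the sketch below adds gzip compression to a single object without touching its class. `Chunk` and `Compressable` are hypothetical names chosen for this example, not Fluentd's actual classes, and the real implementation may differ in detail.

```ruby
require "zlib"
require "stringio"

# Hypothetical stand-in for a buffer chunk that stores raw string data.
class Chunk
  def initialize
    @data = String.new
  end

  def append(str)
    @data << str
  end

  def read
    @data
  end

  def bytesize
    @data.bytesize
  end
end

# Mixed into individual chunks with `extend`, so Chunk's own interface
# and every uncompressed chunk stay exactly as they were.
module Compressable
  # Each appended piece is stored as its own gzip member.
  def append(str)
    super(Zlib.gzip(str))
  end

  # Decompress every gzip member in order and return the original data.
  def read
    io = StringIO.new(super)
    out = String.new
    until io.eof?
      gz = Zlib::GzipReader.new(io)
      out << gz.read
      unused = gz.unused           # bytes read past the end of this member
      gz.finish                    # close the reader but keep io open
      io.pos -= unused.bytesize if unused
    end
    out
  end
end

raw_chunk = Chunk.new
compressed_chunk = Chunk.new.extend(Compressable)
```

Callers use `append` and `read` on both objects in the same way; only the chunk that was extended pays the compression cost and holds the smaller payload.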
  • 11. New out_file plugin only for <secondary> section
  • 12. Background • Many users use the out_file plugin in <secondary> sections to dump the buffer when the primary buffered output plugin fails to flush • But out_file is too complex and has too many features for this purpose • => We need a simple out_file, only for <secondary> sections, that just dumps the buffer
  • 13. New plugin: secondary_out_file • Only four attributes • directory: the directory where dumped data is saved • basename: the file name of the dumped data (default: dump.bin) • append: whether flushed data is appended to an existing file (default: false) • compress: the type of the file (gzip or txt; default: txt) • Users can use this plugin by setting only directory:
      <match >
        @type forward
        ...
        <secondary>
          type secondary_file
          directory log/secondary/
        </secondary>
      </match>
  • 14. A CLI tool which is used for reading dump data
  • 15. Background • Dumped data is created by secondary plugins (e.g. secondary_out_file) when primary plugins fail to flush • We can't read dumped data directly because in most cases it is in a binary format (MessagePack) • => Provide a CLI tool to read dumped data
  • 16. fluent-binlog-reader • fluent-binlog-reader is bundled with Fluentd • It reads dumped data and outputs it in a readable format • Users can use Fluentd’s formatter plugins as an output format
      $ fluent-binlog-reader --help
      Usage: fluent-binlog-reader <command> [<args>]
      Commands of fluent-binlog-reader:
        cat : Read files sequentially, writing them to standard output.
        head : Display the beginning of a text file.
        format : Display plugins that you can use.
      See 'fluent-binlog-reader <command> --help' for more information on a specific command.
  • 17. fluent-binlog-reader
      $ fluent-binlog-reader head packed.log
      2016-08-12T17:24:18+09:00 packed.log {"message":"dummy"}
      2016-08-12T17:24:18+09:00 packed.log {"message":"dummy"}
      2016-08-12T17:24:18+09:00 packed.log {"message":"dummy"}
      2016-08-12T17:24:18+09:00 packed.log {"message":"dummy"}
      2016-08-12T17:24:18+09:00 packed.log {"message":"dummy"}
    • The default format is json • Using a “csv formatter” to output dumped data
      $ fluent-binlog-reader cat --formats=csv -e fields=message packed.log
      "dummy"
      ...
      "dummy"
  • 19. Background • Fluentd can’t rotate its own log • As the log file grows, it becomes difficult to handle • => Make Fluentd support log rotation to keep log files at a manageable size
  • 20. Log rotation • Two options • log-rotate-age: the number of old log files to keep • log-rotate-size: the maximum log file size • Uses ServerEngine’s log rotation (Fluentd uses the ServerEngine logger for its own log)
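The effect of the two options on slide 20 can be tried out with Ruby's standard `Logger`, which takes a file count (`shift_age`) and a size limit in bytes (`shift_size`). This is only a stand-in: mapping log-rotate-age and log-rotate-size onto those two arguments is an assumption made for illustration, and Fluentd itself goes through ServerEngine's logger rather than this code. `rotate_demo` and the file name are hypothetical.

```ruby
require "logger"
require "tmpdir"

# Writes enough lines to force several rotations, then returns the
# names of the log files left in the directory.
def rotate_demo(dir, age:, size:)
  path = File.join(dir, "fluentd.log")
  logger = Logger.new(path, age, size)   # shift_age, shift_size (bytes)
  200.times { |i| logger.info("event #{i} " + "x" * 50) }
  logger.close
  Dir.children(dir).sort
end

Dir.mktmpdir do |dir|
  puts rotate_demo(dir, age: 3, size: 1024)
end
```

With a small size limit the current file is renamed to `fluentd.log.0` when it fills up, older generations shift to higher suffixes, and the oldest is dropped once the configured number of files is reached.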
  • 22. Background • If users apply multiple filters to incoming events, Fluentd creates many EventStream objects and calls their `add` method many times • => Remove the unnecessary EventStream instantiations and `add` method calls
  • 23. Current filters • [e1, e2, e3, e4, e5] → Filter 1 → [e1’, e2’, e3’, e4’, e5’] → ... → Filter n → [e1x, e2x, e3x, e4x, e5x] • Each filter: 1. Create an EventStream object (1 time) 2. Apply the filter to each event (5 times) 3. Add each filtered event to the EventStream object (5 times) • If 10 filters are applied: step 1 is called 10 times, step 2 is called 50 times, step 3 is called 50 times
  • 24. Optimised case • [e1, e2, e3, e4, e5] → Filter 1 ... Filter n → [e1x, e2x, e3x, e4x, e5x] • 1. Create an EventStream object (1 time) 2. Apply every filter to each event (n * 5 times) 3. Add each filtered event to the EventStream object (5 times) • If 10 filters are applied: step 1 is called 1 time, step 2 is called 50 times, step 3 is called 5 times • Constraint: these filters must not implement the `filter_stream` method
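The call counts on slides 23 and 24 can be reproduced with a small model. `CountingStream` is a hypothetical stand-in for EventStream that only counts instantiations and `add` calls, filters are plain lambdas, and the sketch assumes no filter drops an event; it is not Fluentd's actual code.

```ruby
# Hypothetical stand-in for EventStream that counts how often it is
# instantiated and how often `add` is called.
class CountingStream
  class << self
    attr_accessor :created, :adds
  end
  self.created = 0
  self.adds = 0

  attr_reader :events

  def initialize
    self.class.created += 1
    @events = []
  end

  def add(event)
    self.class.adds += 1
    @events << event
  end
end

# Before: every filter builds its own stream and re-adds every event.
def apply_each_filter(filters, events)
  filters.reduce(events) do |current, filter|
    stream = CountingStream.new
    current.each { |e| stream.add(filter.call(e)) }
    stream.events
  end
end

# After: one stream; each event passes through the whole chain first.
def apply_filter_chain(filters, events)
  stream = CountingStream.new
  events.each do |e|
    stream.add(filters.reduce(e) { |acc, filter| filter.call(acc) })
  end
  stream.events
end
```

Both functions call the filters the same number of times and return the same events; the second only saves the intermediate streams and their `add` calls, which is where the slides locate the gain.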
  • 25. Performance • Tool: ruby-prof • Environment: Mac OS X 10.11.6 (build 15G31), 2.7 GHz Intel Core i5, 8 GB 1867 MHz DDR3 • Not optimized: 0.063186 • Optimised: 0.051646 • About 1.2 times faster (0.063186 / 0.051646 ≈ 1.22) with 10 filters and 1000 events per sec
  • 26. I’ll talk about • Counter API • Data compression in buffer plugins and forward plugins • New out_file plugin only for <secondary> section • A CLI tool to read dumped event data • Log rotation • Optimizing multiple filter calls
  • 28. Motivations • To collect metrics of Fluentd itself across processes • To provide a counter API to 3rd party plugins • It is useful for implementing counter plugins (e.g. fluent-plugin-datacounter and fluent-plugin-flowcounter)
  • 29. What’s the counter • The counter: • A key-value store • Used for storing the number of occurrences of a particular event in a specified time • Provides an API for users to operate on its values (e.g. inc, reset, etc.) • Shared between processes
  • 30. What’s the counter (cont.) • Counter: key1 = 5, key2 = 1.2, key3 = 3 • Process 1: inc(key1 => 2) • Process 2: reset(key2)
  • 31. What’s the counter (cont.) • Counter: key1 = 7, key2 = 1.2, key3 = 0 • Process 1: inc(key1 => 2) • Process 2: reset(key2)
  • 32. Implementation • RPC server and client • All operations should be thread safe • Cleaning mutex objects for keys
  • 33. Implementation • RPC server and client • All operations should be thread safe • Cleaning mutex objects for keys
  • 34. RPC server and client • Because the counter is shared between processes, we need a server and clients (counter values are stored in the server and clients manipulate them by RPC) • I designed an RPC server and client for the counter • I used cool.io to implement the RPC server and client • cool.io provides a high-performance event framework for Ruby (https://coolio.github.io/)
  • 35. API to operate counter values • init: create new value • reset: reset a counter value • delete: delete a counter value • inc: increment or decrement a counter value • get: fetch a counter value
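To make the five operations on slide 35 concrete, here is a minimal in-process sketch. `CounterStore`, its method signatures, and its error behaviour (raising on an unknown or duplicate key) are assumptions for illustration; the counter described in the slides lives in a separate server process and is reached over RPC.

```ruby
# Hypothetical in-process model of the counter's five operations.
class CounterStore
  def initialize
    @values = {}
  end

  # init: create a new value (assumed to fail if the key already exists)
  def init(key, value = 0)
    raise ArgumentError, "#{key} already exists" if @values.key?(key)
    @values[key] = value
  end

  # reset: set an existing counter back to zero
  def reset(key)
    fetch!(key)
    @values[key] = 0
  end

  # delete: remove a counter and return its last value
  def delete(key)
    fetch!(key)
    @values.delete(key)
  end

  # inc: increment, or decrement when `by` is negative
  def inc(key, by = 1)
    @values[key] = fetch!(key) + by
  end

  # get: fetch the current value
  def get(key)
    fetch!(key)
  end

  private

  def fetch!(key)
    @values.fetch(key) { raise KeyError, "#{key} is not initialized" }
  end
end

counter = CounterStore.new
counter.init("key1", 5)
counter.inc("key1", 2)
```

A single `inc` with a signed amount covers both increment and decrement, which matches the slide's description of that operation.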
  • 36. Implementation • RPC server and client • All operations should be thread safe • Cleaning mutex objects for keys
  • 37. All operations should be thread safe • The counter works with multiple threads • You need to get a per-key lock when you change a counter value • The counter stores mutex objects in a hash (key_name => mutex_object)
  • 38. How an inc method works • Client in worker1: inc(key1 => 2) • Counter server: key1 = 2 • Mutex hash: key1 => mutex obj • 1. Call an inc method
  • 39. How an inc method works • Client in worker1: inc(key1 => 2) • Counter server: key1 = 2 • Mutex hash (locked): key1 => mutex obj • 1. Call an inc method 2. Get a lock for the mutex hash
  • 40. How an inc method works • Client in worker1: inc(key1 => 2) • Counter server: key1 = 2 • Mutex hash (locked): key1 => locked • 1. Call an inc method 2. Get a lock for the mutex hash 3. Get a lock for the key
  • 41. How an inc method works • Client in worker1: inc(key1 => 2) • Counter server: key1 = 2 • Mutex hash: key1 => locked • 1. Call an inc method 2. Get a lock for the mutex hash 3. Get a lock for the key 4. Unlock the mutex hash
  • 42. How an inc method works • Client in worker1: inc(key1 => 2) • Counter server: key1 = 4 • Mutex hash: key1 => locked • 1. Call an inc method 2. Get a lock for the mutex hash 3. Get a lock for the key 4. Unlock the mutex hash 5. Change the counter value
  • 43. How an inc method works • Client in worker1: inc(key1 => 2) • Counter server: key1 = 4 • Mutex hash: key1 => unlocked • 1. Call an inc method 2. Get a lock for the mutex hash 3. Get a lock for the key 4. Unlock the mutex hash 5. Change the counter value 6. Unlock the key lock
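The six steps of the inc walkthrough can be sketched as follows. `LockedCounter` is a hypothetical class that keeps values and mutexes in the same process; it assumes a mutex is created on first use of a key and relies on MRI's hash operations being safe for distinct keys, neither of which is stated in the slides.

```ruby
# Hypothetical model of the locking order used by inc.
class LockedCounter
  def initialize
    @values = Hash.new(0)
    @mutexes = {}            # key_name => mutex_object
    @hash_lock = Mutex.new   # guards the mutex hash itself
  end

  def inc(key, by = 1)                            # 1. call inc
    key_mutex = @hash_lock.synchronize do         # 2. lock the mutex hash
      mutex = (@mutexes[key] ||= Mutex.new)
      mutex.lock                                  # 3. lock the key
      mutex
    end                                           # 4. unlock the mutex hash
    begin
      @values[key] += by                          # 5. change the counter value
    ensure
      key_mutex.unlock                            # 6. unlock the key
    end
  end

  def get(key)
    @values[key]
  end
end

counter = LockedCounter.new
threads = Array.new(4) { Thread.new { 100.times { counter.inc("key1", 2) } } }
threads.each(&:join)
```

Taking the key lock before releasing the hash lock means the key's mutex cannot be swapped out between lookup and locking, while the hash lock is still held only briefly so updates to other keys are not serialized behind the value change.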
  • 44. Implementation • RPC server and client • All operations should be thread safe • Cleaning mutex objects for keys
  • 45. Mutex objects for keys • To avoid keeping mutex objects for all keys, I implemented a cleanup thread which removes the mutex objects of unused keys (like GC) • This thread removes mutex objects which have not been used for a certain period
  • 46. Cleaning up a mutex hash • Counter server: key1 = 2 • Mutex hash: key1 => mutex obj • If “key1” is not modified for a long period, it may not be used afterwards
  • 47. Cleaning up a mutex hash • Mutex hash: key1 => mutex obj • If “key1” is not modified for a long period, it may not be used afterwards • 1. Start a cleaning thread (once in 15 min)
  • 48. Cleaning up a mutex hash • Mutex hash (locked): key1 => mutex obj • If “key1” is not modified for a long period, it may not be used afterwards • 1. Start a cleaning thread (once in 15 min) 2. Get a lock for the mutex hash
  • 49. Cleaning up a mutex hash • Mutex hash (locked): empty • If “key1” is not modified for a long period, it may not be used afterwards • 1. Start a cleaning thread (once in 15 min) 2. Get a lock for the mutex hash 3. Remove the mutex for an unused key
  • 50. Cleaning up a mutex hash • Mutex hash (locked): empty • If “key1” is not modified for a long period, it may not be used afterwards • 1. Start a cleaning thread (once in 15 min) 2. Get a lock for the mutex hash 3. Remove the mutex for an unused key 4. Try to get a lock for the same key; if this thread can’t get the lock, restore the key-value pair
  • 51. Cleaning up a mutex hash • Mutex hash: empty • If “key1” is not modified for a long period, it may not be used afterwards • 1. Start a cleaning thread (once in 15 min) 2. Get a lock for the mutex hash 3. Remove the mutex for an unused key 4. Try to get a lock for the same key 5. Unlock the mutex hash
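One pass of the cleanup steps above might look like the sketch below. `MutexTable`, the `@last_used` timestamps, and the idle threshold passed to `cleanup` are assumptions for illustration; the slides only say that mutexes unused "for a certain period" are removed and that the thread runs once in 15 minutes.

```ruby
# Hypothetical table of per-key mutexes with a cleanup pass.
class MutexTable
  def initialize
    @hash_lock = Mutex.new
    @mutexes = {}     # key => Mutex
    @last_used = {}   # key => Time of last use (assumed bookkeeping)
  end

  # Runs the block while holding the mutex for `key`.
  def with_key(key, now: Time.now)
    key_mutex = @hash_lock.synchronize do
      @last_used[key] = now
      mutex = (@mutexes[key] ||= Mutex.new)
      mutex.lock
      mutex
    end
    begin
      yield
    ensure
      key_mutex.unlock
    end
  end

  # One pass of the cleanup thread (step 1 is the scheduling, not shown).
  def cleanup(idle_seconds, now: Time.now)
    @hash_lock.synchronize do                    # 2. lock the mutex hash
      stale = @last_used.select { |_, t| now - t >= idle_seconds }.keys
      stale.each do |key|
        mutex = @mutexes.delete(key)             # 3. remove the mutex
        if mutex.try_lock                        # 4. try to lock the key
          mutex.unlock
          @last_used.delete(key)
        else
          @mutexes[key] = mutex                  # still in use: restore it
        end
      end
    end                                          # 5. unlock the mutex hash
  end

  def size
    @hash_lock.synchronize { @mutexes.size }
  end
end
```

A background thread could call `cleanup` in a loop with `sleep(15 * 60)` between passes. The `try_lock` check is what keeps a mutex that some operation is holding at that moment from disappearing underneath it.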
  • 52. Summary • Add six features and two enhancements to Fluentd v0.14.x • Counter API is not merged yet • Other PRs have been merged
  • 53. Impressions of the internship • The hardest thing for me was designing the counter API (it took over 1 week) • I learned about developing middleware which is used by many people • I want to become more careful with the code I write (typos, descriptions, comments, etc.)