∼ Best Practice for Better Performance ∼
Scala Days 2015 San Francisco Un-conference
2015-03-19 @mogproject
Ad Tech × Scala × Performance Tuning
Agenda
About Demand Side Science
Introduction to Performance Tuning
Best Practice in Development
Japanese language version here:
http://www.slideshare.net/mogproject/scala-41799241
Yosuke Mizutani (@mogproject)

Joined Demand Side Science in April 2013
(thanks to Scala Conference in Japan 2013)
Full-stack engineer (or at least aspiring to be one)
Background: 9 years as an infrastructure engineer
About Me
http://about.me/mogproject
http://mogproject.blogspot.jp (in Japanese)
http://demand-side-science.jp/blog (in Japanese)
http://functional-news.com (in Japanese)
Nov 2012

Established Demand Side Science Inc.
Brief History of DSS
Demand × Side × Science
2013

Developed the private DSP package fractale
Brief History of DSS
Demand × Side × Platform
The advertiser’s side of real-time bidding (RTB) for ads
What is a DSP
Supply Side Platform
Dec 2013

Joined the group of Opt, an e-marketing agency
Oct 2014

Released the dynamic creative tool unis
Brief History of DSS
unis is a third-party ad server that creates
dynamic and/or personalized ads according to configured rules.


http://www.opt.ne.jp/news/pr/detail/id=2492
unis
(Figure labels: items on sale, most popular items, fixed items, re-targeting)
With a venture mindset + the advantages of the Opt group …
Future of DSS
Demand × Side × Science
We will create various products based on Science!
Future of DSS
??? × ??? × Science
For everyone’s …
Future of DSS
Marketer × Publisher × Consumer
happiness!
Future of DSS
Win × Win × Win
DSS and Scala
We have adopted Scala for all products
since the day of establishment
System Architecture Example
RDBMS
NOSQL
log storage
cache
Log Aggregation
Machine Learning
Cache Making
etc.
Today, I will not talk about JavaScript tuning.
System Architecture Example
Agenda
About Demand Side Science
Introduction to Performance Tuning
Best Practice in Development
Resolve an issue
Reduce infrastructure cost

(e.g. Amazon Web Services)
Motivations
Application misbehaves under high load

Poor latency under specific conditions
Batch execution slower than expected
Slow development tools
Resolve an Issue
Very important, especially in the ad tech industry
Costs tend to keep growing
High traffic
Need to respond within a few milliseconds
Big database, big log data
The business requires

Benefit from mass delivery > Infra Investment

Reduce Infrastructure Cost
Performance tuning itself has a
cost (≒ engineer’s time) and a
risk (the possibility of causing new trouble)
that you need to care about.
Don’t lose sight of your goal
Simply scaling the infrastructure up/out
can be the best solution
Don’t aim for perfection
We iterate
Basics of Performance Tuning
Measure metrics × Find bottleneck × Try with hypothesis
Don't take erratic steps.
http://en.wikipedia.org/wiki/Pareto_principle
“80% of a program’s processing time
comes from 20% of the code”
— Pareto Principle
※CAUTION: This is my own impression
Bottlenecks in My Experience
Database (RDBMS/NOSQL): 50%
Async / Thread: 15%
Scala: 10%
OS: 10%
Library: 5%
JVM parameter: 5%
Network: 4%
others: 1%
What is I/O
Memory × Disk × Network
Approximate timing for various operations
http://norvig.com/21-days.html#answers
execute typical instruction            1/1,000,000,000 sec = 1 nanosec
fetch from L1 cache memory             0.5 nanosec
branch misprediction                   5 nanosec
fetch from L2 cache memory             7 nanosec
Mutex lock/unlock                      25 nanosec
fetch from main memory                 100 nanosec
send 2K bytes over 1Gbps network       20,000 nanosec
read 1MB sequentially from memory      250,000 nanosec
fetch from new disk location (seek)    8,000,000 nanosec
read 1MB sequentially from disk        20,000,000 nanosec
send packet US to Europe and back      150 milliseconds = 150,000,000 nanosec
If Typical Instruction Takes 1 second…
https://www.coursera.org/course/reactive week3-2
execute typical instruction            1 second
fetch from L1 cache memory             0.5 seconds
branch misprediction                   5 seconds
fetch from L2 cache memory             7 seconds
Mutex lock/unlock                      ½ minute
fetch from main memory                 1½ minutes
send 2K bytes over 1Gbps network       5½ hours
read 1MB sequentially from memory      3 days
fetch from new disk location (seek)    13 weeks
read 1MB sequentially from disk        6½ months
send packet US to Europe and back      5 years
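The rescaling in the table above (1 nanosecond becomes 1 second) can be reproduced with a few lines of Scala. This is only a sketch: the object and method names are made up for illustration, and months are left out so that no month length has to be assumed.

```scala
import java.util.Locale

object ScaledLatency {
  // Unit sizes in seconds, largest first.
  private val units: Seq[(Double, String)] = Seq(
    (365.0 * 86400, "years"),
    (7.0 * 86400, "weeks"),
    (86400.0, "days"),
    (3600.0, "hours"),
    (60.0, "minutes")
  )

  // Render a duration in seconds using the largest unit that fits.
  def humanize(seconds: Double): String = {
    val (size, name) =
      units.find { case (s, _) => seconds >= s }.getOrElse((1.0, "seconds"))
    "%.1f %s".formatLocal(Locale.ROOT, seconds / size, name)
  }

  // Scaling 1 ns up to 1 s keeps the number and only changes the unit,
  // so a latency of `nanos` nanoseconds reads as `nanos` seconds.
  def scaled(nanos: Double): String = humanize(nanos)
}
```

For example, `ScaledLatency.scaled(20000.0)` (sending 2K bytes over a 1Gbps network) gives "5.6 hours", and `ScaledLatency.scaled(8000000.0)` (one disk seek) gives "13.2 weeks".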
A batch
reads 1,000,000 files of 10KB
from disk
on every run.
Data size:
10KB × 1,000,000 ≒ 10GB
Horrible and True Story
Assuming 1,000,000 seeks are needed,

Estimated time:

8ms × 10^6 + 20ms × 10,000 ≒ 8,200 sec ≒ 2.3 h
If there is one file of 10GB and only one seek is
needed,
Estimated time:
8ms × 1 + 20ms × 10,000 ≒ 200 sec ≒ 3.3 min
Horrible and True Story
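The two estimates can be written as one small function. A minimal sketch, assuming the per-operation costs from the timing table (8 ms per seek, 20 ms per sequentially read MB); the names are illustrative only.

```scala
object DiskReadEstimate {
  val SeekSeconds = 0.008      // one seek to a new disk location
  val ReadSecondsPerMB = 0.020 // sequential read of 1 MB from disk

  // Total time = time spent seeking + time spent reading sequentially.
  def estimateSeconds(seeks: Long, totalMB: Double): Double =
    seeks * SeekSeconds + totalMB * ReadSecondsPerMB
}
```

`DiskReadEstimate.estimateSeconds(1000000L, 10000.0)` comes to about 8,200 seconds, while `DiskReadEstimate.estimateSeconds(1L, 10000.0)` comes to about 200 seconds: the data volume is the same, and almost all of the difference is seek time.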
Have Respect for the Disk Head
http://en.wikipedia.org/wiki/Hard_disk_drive
JVM Trade-offs
JVM Performance Triangle
Memory Footprint ↓
Throughput ↑ Latency ↓
longest pause
time for Full GC
In other words…
JVM Performance Triangle
Compactness
Throughput Responsiveness
C × T × R = a
JVM Performance Triangle
Tuning: vary C, T, R for fixed a
Optimization: increase a
Reference:
Everything I ever learned about JVM performance tuning @twitter by Attila Szegedi
http://www.beyondlinux.com/files/pub/qconhangzhou2011/Everything%20I%20ever%20learned%20about%20JVM%20performance%20tuning%20@twitter%28Attila%20Szegedi%29.pdf
Agenda
About Demand Side Science
Introduction to Performance Tuning
Best Practice in Development
1. Requirement Definition / Feasibility
2. Basic Design
3. Detailed Design
4. Building Infrastructure / Coding
5. System Testing
6. System Operation / Maintenance
Development Process
Only topics related to performance will be covered.
Reach agreement with stakeholders
on the performance requirements
Requirement Definition / Feasibility
How many user IDs?
internet users in Japan: 100 million
unique browsers: 200 ~ x00 million
will they increase?
data expiration cycle?
types of devices / browsers?
opt-out rate?
Requirement Definition / Feasibility
Number of ad delivery requests
Number of impressions per month
In the case of 1 billion / month

=> mean: 400 QPS (Queries Per Second)

=> if peak rate = 250%, then 1,000 QPS
For RTB, bid rate? win rate?
Goal response time? Content size?
Plans for growth?
How about Cookie Sync?
Requirement Definition / Feasibility
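The QPS figures for 1 billion requests per month follow from simple arithmetic. A sketch, assuming a 30-day month (the slide does not say which month length it used); the names are hypothetical.

```scala
object QpsEstimate {
  // Assumption: a month is 30 days.
  val SecondsPerMonth: Long = 30L * 24 * 60 * 60

  // Mean queries per second for a monthly request volume.
  def meanQps(requestsPerMonth: Long): Double =
    requestsPerMonth.toDouble / SecondsPerMonth

  // Peak QPS, with the peak given as a ratio of the mean (2.5 = 250%).
  def peakQps(requestsPerMonth: Long, peakRatio: Double): Double =
    meanQps(requestsPerMonth) * peakRatio
}
```

`QpsEstimate.meanQps(1000000000L)` is roughly 386, which rounds up to the 400 QPS on the slide, and `QpsEstimate.peakQps(1000000000L, 2.5)` is roughly 965, or about 1,000 QPS.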
Number of tracker requests received
When trackers fire
Click rate?
Conversion(*) rate?







* A conversion occurs when the user performs the
specific action that the advertiser has defined as the
campaign goal.

e.g. buying a product in an online store
Requirement Definition / Feasibility
Requirements for aggregation
Indicators to be aggregated
Is unique counting needed?
Any exception rules?
Who uses the results, and when
secondary processing by ad agency?
Update interval
Storage period
Requirement Definition / Feasibility
Hard limits from the business side
Sales plan
Christmas sales?
Annual sales target?
Total budget
The most important thing is to provide numbers,
although it is extremely difficult to estimate them
precisely in the turbulent world of ad tech.
Requirement Definition / Feasibility
Architecture design needs assumed values
Performance testing needs numeric goals
Architecture design
Choose framework
Web framework
Choose database
RDBMS
NOSQL
Basic Design
Threading model design
Reduce blocking
Future-based
Callback & function composition
Actor-based
Message passing
Thread pool design
We can’t know the appropriate thread pool
size until we complete performance
testing in production.
Basic Design
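As an illustration of the Future-based style with an explicitly sized thread pool, here is a minimal sketch using only the Scala standard library. `lookupUser` and `lookupBid` are hypothetical stand-ins for real database or cache calls, and the pool size of 4 is an arbitrary placeholder, not a recommendation.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FutureComposition {
  // Run `body` with a fixed-size pool and shut the pool down afterwards.
  def withPool[A](poolSize: Int)(body: ExecutionContext => A): A = {
    val executor = Executors.newFixedThreadPool(poolSize)
    try body(ExecutionContext.fromExecutorService(executor))
    finally executor.shutdown()
  }

  // Hypothetical lookups; real ones would call a database or cache.
  def lookupUser(id: Int)(implicit ec: ExecutionContext): Future[String] =
    Future(s"user-$id")

  def lookupBid(user: String)(implicit ec: ExecutionContext): Future[Int] =
    Future(user.length * 10)

  // Function composition: the second lookup starts when the first
  // completes, and no thread is blocked while waiting in between.
  def bidFor(id: Int)(implicit ec: ExecutionContext): Future[Int] =
    for {
      user <- lookupUser(id)
      bid  <- lookupBid(user)
    } yield bid

  // Blocking with Await is only for this demo's final result.
  def demo(): Int =
    withPool(4) { implicit ec => Await.result(bidFor(42), 1.second) }
}
```

Keeping the pool size a parameter makes it one of the values to vary, one at a time, during performance testing.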
Database design
Access pattern / number of lookups
Data size per record
Model the size distribution when the size
is not constant
Number of records
Rate of growth / retention period
Memory usage
First, measure the performance of the
database itself
Detailed Design
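The sizing questions above can be turned into a rough estimate. This is a sketch under simplifying assumptions (a discrete record-size distribution, a constant number of new records per day, and records dropped after the retention period); all names and figures are illustrative.

```scala
object RecordSizing {
  // Expected record size from (size in bytes, probability) pairs.
  def meanRecordBytes(distribution: Seq[(Int, Double)]): Double =
    distribution.map { case (bytes, p) => bytes * p }.sum

  // Records held at steady state with constant daily inserts.
  def retainedRecords(newRecordsPerDay: Long, retentionDays: Int): Long =
    newRecordsPerDay * retentionDays

  // Raw data volume, before indexes and storage-engine overhead.
  def datasetBytes(distribution: Seq[(Int, Double)],
                   newRecordsPerDay: Long,
                   retentionDays: Int): Double =
    meanRecordBytes(distribution) * retainedRecords(newRecordsPerDay, retentionDays)
}
```

For instance, if 90% of records are 100 bytes and 10% are 1,000 bytes, the mean record is 190 bytes; with 1,000,000 new records per day kept for 30 days, that is about 5.7 GB of raw data.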
Log design
Consider compression ratio for disk usage
Cache design
Some software needs double the capacity
while processing backups (e.g. Redis)
Detailed Design
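Both points reduce to a multiplication once the inputs are measured. A sketch with hypothetical names: `compressionRatio` is taken here to mean compressed size divided by raw size, and the default headroom factor of 2.0 mirrors the note about backup processing.

```scala
object CapacityEstimate {
  // Disk needed for logs: raw volume shrunk by the measured compression ratio.
  def logDiskBytes(rawBytesPerDay: Long,
                   compressionRatio: Double,
                   retentionDays: Int): Double =
    rawBytesPerDay * compressionRatio * retentionDays

  // Memory to provision for a cache, including headroom for backups.
  def cacheMemoryBytes(datasetBytes: Long, headroomFactor: Double = 2.0): Double =
    datasetBytes * headroomFactor
}
```

The compression ratio should come from compressing a sample of real logs, since it varies a lot with log content.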
Simplicity and clarity come first
“It is far, far easier to make a correct
program fast than it is to make a fast
program correct”
— C++ Coding Standards: 101 Rules, Guidelines, and Best Practices (C++ in-depth series)
Building Infrastructure / Coding
“Premature optimization
is the root of all evil.”
— Donald Knuth
“On the other hand,
we cannot ignore efficiency”
— Jon Bentley
Avoid algorithms that are worse than linear
wherever possible
Measure, don’t guess



http://en.wikipedia.org/wiki/Unix_philosophy
Building Infrastructure / Coding
SBT Plugin for running OpenJDK JMH 

(Java Microbenchmark Harness: Benchmark tool for Java)



https://github.com/ktoso/sbt-jmh
Micro Benchmark: sbt-jmh
Micro Benchmark: sbt-jmh
plugins.sbt:
addSbtPlugin("pl.project13.scala" % "sbt-jmh" % "0.1.6")
build.sbt:
jmhSettings
YourBench.scala (just add an annotation):
import org.openjdk.jmh.annotations.Benchmark

class YourBench {
  @Benchmark
  def yourFunc(): Unit = ??? // write the code to measure here
}
Micro Benchmark: sbt-jmh
Run the benchmark in the sbt console:
> run -i 3 -wi 3 -f 1 -t 1
-i: number of measurement iterations to do
-wi: number of warmup iterations to do
-f: how many times to fork a single benchmark
-t: number of worker threads to run with
[info] Benchmark                            Mode  Samples   Score  Score error  Units
[info] c.g.m.u.ContainsBench.listContains  thrpt        3  41.033       25.573  ops/s
[info] c.g.m.u.ContainsBench.setContains   thrpt        3   6.810        1.569  ops/s
Micro Benchmark: sbt-jmh
Result (excerpted)
By default, throughput score
will be displayed.
(larger is better)
http://mogproject.blogspot.jp/2014/10/micro-benchmark-in-scala-using-sbt-jmh.html
Scala Optimization Example
Use Scala collections correctly
Prefer recursion to function call

by Prof. Martin Odersky in Scala Matsuri 2014
Try optimization libraries
def f(xs: List[Int], acc: List[Int] = Nil): List[Int] = {
  if (xs.length < 4) {
    (xs.sum :: acc).reverse
  } else {
    val (y, ys) = xs.splitAt(4)
    f(ys, y.sum :: acc)
  }
}
Horrible and True Story pt.2
Group a List[Int] into chunks of 4 elements, then
calculate the sum of each chunk
scala> f((1 to 10).toList)
res1: List[Int] = List(10, 26, 19)
Example
Horrible and True Story pt.2
List#length takes time proportional to the
length of the sequence
When the length of the parameter xs is n,

time complexity of List#length is O(n)
Implemented in LinearSeqOptimized#length

https://github.com/scala/scala/blob/v2.11.4/src/library/scala/collection/LinearSeqOptimized.scala#L35-43
Horrible and True Story pt.2
In function f,

xs.length is evaluated n / 4 + 1 times,

because the number of calls to f is
proportional to n
Therefore,

the time complexity of function f is O(n^2)
It becomes too slow for large n
Horrible and True Story pt.2
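One way to remove the quadratic cost while keeping the recursive shape is to stop calling `length` altogether. The following is a sketch, not the author's fix: `isEmpty` is O(1) on a `List`, and each `splitAt(4)` touches at most four elements, so the whole function is linear in n.

```scala
import scala.annotation.tailrec

object GroupSum {
  // Sum each chunk of up to 4 elements in a single pass over the list.
  @tailrec
  def sumByFour(xs: List[Int], acc: List[Int] = Nil): List[Int] =
    if (xs.isEmpty) acc.reverse
    else {
      val (group, rest) = xs.splitAt(4)
      sumByFour(rest, group.sum :: acc)
    }
}
```

`GroupSum.sumByFour((1 to 10).toList)` returns `List(10, 26, 19)`, the same as `f`. The behaviour differs on one edge case: when the length is a multiple of 4 (including the empty list), `f` appends a trailing 0 because it sums the empty remainder, whereas this version does not.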
For your information, the following one-liner does
the same work using a built-in method
scala> (1 to 10).grouped(4).map(_.sum).toList
res2: List[Int] = List(10, 26, 19)
ScalaBlitz
Library for optimizing Scala collections

(using macros)
http://scala-blitz.github.io/
Presentation in Scala Days 2014

https://parleys.com/play/53a7d2c6e4b0543940d9e549/chapter0/about
ScalaBlitz
System feature testing
Interface testing
Performance testing
Reliability testing
Security testing
Operation testing
System Testing
Simple load testing
Scenario load testing
mixed load with typical user operations
Aging test (continuously running test)
Performance Testing
Bundled with Apache
Simple benchmark tool

http://httpd.apache.org/docs/2.2/programs/ab.html

Adequate for simple requirements
Latest version recommended

(a bug in the version pre-installed on Amazon Linux made me sick)
Example
ab - Apache Bench
ab -C <CookieName=Value> -n <NumberOfRequests> -c <Concurrency> "<URL>"
Result example (excerpted)
ab - Apache Bench
Benchmarking example.com (be patient)
Completed 1200 requests
Completed 2400 requests
(omitted)
Completed 10800 requests
Completed 12000 requests
Finished 12000 requests
(omitted)
Concurrency Level: 200
Time taken for tests: 7.365 seconds
Complete requests: 12000
Failed requests: 0
Write errors: 0
Total transferred: 166583579 bytes
HTML transferred: 160331058 bytes
Requests per second: 1629.31 [#/sec] (mean)
Time per request: 122.751 [ms] (mean)
Time per request: 0.614 [ms] (mean, across all concurrent requests)
Transfer rate: 22087.90 [Kbytes/sec] received
(omitted)
Percentage of the requests served within a certain time (ms)
50% 116
66% 138
75% 146
80% 150
90% 161
95% 170
98% 185
99% 208
100% 308 (longest request)
Requests per second
= QPS
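The three headline numbers in the ab output are derived from the request count, the elapsed time, and the concurrency level. The sketch below restates those formulas as documented for ab; the object and method names are invented for illustration.

```scala
object AbMetrics {
  // Requests per second = completed requests / elapsed seconds.
  def requestsPerSecond(completed: Int, seconds: Double): Double =
    completed / seconds

  // Mean time per request as seen by one client, in milliseconds.
  def meanTimePerRequestMs(concurrency: Int, completed: Int, seconds: Double): Double =
    concurrency * seconds * 1000 / completed

  // Mean time per request across all concurrent requests, in milliseconds.
  def meanTimeAcrossConcurrentMs(completed: Int, seconds: Double): Double =
    seconds * 1000 / completed
}
```

With the values from the output above (12,000 requests, 7.365 seconds, concurrency 200) these give about 1,629 requests per second, 122.75 ms, and 0.614 ms. The last digits of the first figure differ slightly from ab's own line because the elapsed time shown in the report is rounded.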
Load testing tool written in Scala
http://gatling.io
Gatling
The era of Apache JMeter is over
Say goodbye to building scenarios in a GUI
With Gatling,
you write load scenarios with a Scala DSL
Gatling
Watch the resources on the load-generator side
Resources of the server (or PC)
Network router (CPU) can be the bottleneck
Don’t tune two or more parameters at the
same time
Keep a change log and the log files
Days for Testing and Tuning
System Operation / Maintenance
Logging × Anomaly Detection × Trends Visualization
Day-to-day logging and monitoring
Application log
GC log
Profiler
Anomaly detection from several metrics
Server resource (CPU, memory, disk, etc.)
abnormal response code
Latency
Trends visualization from several metrics
System Operation / Maintenance
GC log

Add JVM options as follows
JVM Settings
-verbose:gc
-Xloggc:<PathToTheLog>
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=10M
“If someone doesn’t enable
GC logging in production,
I shoot them!”
— Real customer
http://www.oracle.com/technetwork/server-storage/ts-4887-159080.pdf p55
JMX (Java Management eXtensions)

Add JVM options as follows
JVM Settings
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=<PORT NUMBER>
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
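The options above expose the JVM's management beans to remote tools. The same beans can also be read from inside the process through the standard `java.lang.management` API, which is handy for logging GC and heap figures alongside application metrics. A minimal sketch; the `JvmStats` and `GcStat` names are made up for this example.

```scala
import java.lang.management.ManagementFactory

object JvmStats {
  final case class GcStat(name: String, count: Long, timeMs: Long)

  // One entry per garbage collector: collections so far and total time spent.
  def gcStats(): Seq[GcStat] = {
    val beans = ManagementFactory.getGarbageCollectorMXBeans
    (0 until beans.size).map { i =>
      val bean = beans.get(i)
      GcStat(bean.getName, bean.getCollectionCount, bean.getCollectionTime)
    }
  }

  // Heap currently in use, in bytes.
  def heapUsedBytes(): Long =
    ManagementFactory.getMemoryMXBean.getHeapMemoryUsage.getUsed
}
```

Collector names depend on the GC algorithm in use, so code that consumes `gcStats()` should not hard-code them.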
stdout / stderr
Should be redirected to a file
Should NOT be thrown away to /dev/null
The result of a thread dump

(kill -3 <PROCESS_ID>) will be written
here
JVM Settings
SLF4J + Profiler
http://www.slf4j.org/extensions.html
Coding example
Profiler
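import org.slf4j.LoggerFactory
import org.slf4j.profiler.Profiler

val logger = LoggerFactory.getLogger(this.getClass)
val profiler: Profiler = new Profiler(this.getClass.getSimpleName)
profiler.start("A") // start stopwatch A
doA()
profiler.start("B") // stops A and starts stopwatch B
doB()
profiler.stop()
logger.warn(profiler.toString)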
SLF4J + Profiler
Output example
Example:

Log the result of the profiler when
a timeout occurs
Profiler
+ Profiler [BASIC]
|-- elapsed time [A] 220.487 milliseconds.
|-- elapsed time [B] 2499.866 milliseconds.
|-- elapsed time [OTHER] 3300.745 milliseconds.
|-- Total [BASIC] 6022.568 milliseconds.
For catching trends, not for anomaly detection
Operational attention is also needed so that
signs of change are not overlooked
Not only for infrastructure / application metrics, but
also for business indicators
Who uses the console?
System user
System administrator
Application developer
Business manager
Trends Visualization
Grafana (+Graphite)
Graphite - http://graphite.readthedocs.org
Manages and visualizes numeric time-series
data
Grafana - http://grafana.org/
Visualizes Graphite data more stylishly

(Kibana-like)
Grafana (+Graphite)
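Feeding a metric into Graphite can be as simple as writing one line of text to its Carbon listener, which accepts `<metric path> <value> <unix timestamp>` on the plaintext port (2003 by default). The sketch below assumes such a listener is reachable; the metric path and host name are hypothetical.

```scala
import java.io.PrintWriter
import java.net.Socket

object GraphiteClient {
  // One data point in Carbon's plaintext format, newline-terminated.
  def formatLine(path: String, value: Double, epochSeconds: Long): String =
    s"$path $value $epochSeconds\n"

  // Open a connection, send a single line, and close it again.
  // A real client would reuse the connection and batch lines.
  def send(host: String, port: Int, line: String): Unit = {
    val socket = new Socket(host, port)
    try {
      val out = new PrintWriter(socket.getOutputStream)
      out.print(line)
      out.flush()
    } finally socket.close()
  }
}
```

A call such as `GraphiteClient.send("graphite.example.com", 2003, GraphiteClient.formatLine("ads.delivery.latency_ms", 12.5, System.currentTimeMillis / 1000))` would record one latency sample that Grafana can then chart.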
Thank you very much!
"Yosuke Mizutani - Kanagawa, Japan | about.me" - http://about.me/mogproject
"mog project" - http://mogproject.blogspot.jp/
"DSS Tech Blog - Demand Side Science ㈱ の技術ブログ" - http://demand-side-science.jp/blog/
"FunctionalNews - 関数型言語ニュースサイト" - http://functional-news.com/
"『ザ・アドテクノロジー』∼データマーケティングの基礎からアトリビューションの概念まで∼ / 翔泳社 新刊のご紹介" - http://markezine.jp/book/adtechnology/
"オプト、ダイナミック・クリエイティブツール「unis」の提供開始 ∼ パーソナライズ化された広告を自動生成し、広告効果の最大化を目指す ∼ | インターネット広告代理店 オプト" - http://www.opt.ne.jp/news/pr/detail/id=2492
"The Scala Programming Language" - http://www.scala-lang.org/
"Finagle" - https://twitter.github.io/finagle/
"Play Framework - Build Modern & Scalable Web Apps with Java and Scala" - https://www.playframework.com/
"nginx" - http://nginx.org/ja/
"Fluentd | Open Source Data Collector" - http://www.fluentd.org/
"Javaパフォーマンスチューニング(1):Javaパフォーマンスチューニングのルール (1/2) - @IT" - http://www.atmarkit.co.jp/ait/articles/0501/29/news011.html
"パレートの法則 - Wikipedia" - http://ja.wikipedia.org/wiki/パレートの法則
"Teach Yourself Programming in Ten Years" - http://norvig.com/21-days.html#answers
"企業が作る国際ネットワーク最前線 - [4]いまさら聞けない国際ネットワークの基礎知識:ITpro" - http://itpro.nikkeibp.co.jp/article/COLUMN/20100119/343461/
"Coursera" - https://www.coursera.org/course/reactive
"アースマラソン - Wikipedia" - http://ja.wikipedia.org/wiki/アースマラソン
"Hard disk drive - Wikipedia, the free encyclopedia" - http://en.wikipedia.org/wiki/Hard_disk_drive
"Everything I ever learned about JVM performance tuning @twitter(Attila Szegedi).pdf" - http://www.beyondlinux.com/files/pub/qconhangzhou2011/Everything%20I%20ever%20learned%20about%20JVM%20performance%20tuning%20@twitter%28Attila%20Szegedi%29.pdf
"Amazon.co.jp: C++ Coding Standards―101のルール、ガイドライン、ベストプラクティス (C++ in-depth series): ハーブ サッター, アンドレイ アレキサンドレスク, 浜田 光之, Herb Sutter, Andrei Alexandrescu, 浜田 真理: 本" - http://www.amazon.co.jp/gp/product/4894716860
"UNIX哲学 - Wikipedia" - http://ja.wikipedia.org/wiki/UNIX哲学
"ktoso/sbt-jmh" - https://github.com/ktoso/sbt-jmh
"ScalaBlitz | ScalaBlitz" - http://scala-blitz.github.io/
"Parleys.com - Lightning-Fast Standard Collections With ScalaBlitz by Dmitry Petrashko" - https://parleys.com/play/53a7d2c6e4b0543940d9e549/chapter0/about
"mog project: Micro Benchmark in Scala - Using sbt-jmh" - http://mogproject.blogspot.jp/2014/10/micro-benchmark-in-scala-using-sbt-jmh.html
"Gatling Project, Stress Tool" - http://gatling.io/
"WEB+DB PRESS Vol.83|技術評論社" - http://gihyo.jp/magazine/wdpress/archive/2014/vol83
"「Javaの鉱脈」でGatlingの記事を書きました — さにあらず" - http://blog.satotaichi.info/gatling-is-awesome-loadtester
"Garbage Collection Tuning in the Java HotSpot™ Virtual Machine" - http://www.oracle.com/technetwork/server-storage/ts-4887-159080.pdf
"SLF4J extensions" - http://www.slf4j.org/extensions.html
"Graphite Documentation — Graphite 0.10.0 documentation" - http://graphite.readthedocs.org/en/latest/
"Grafana - Graphite and InfluxDB Dashboard and graph composer" - http://grafana.org/
"Grafana - Grafana Play Home" - http://play.grafana.org/#/dashboard/db/grafana-play-home
"不動産関係に使える 無料画像一覧" - http://free-realestate.org/information/list.html
"AI・EPSの無料イラストレーター素材なら無料イラスト素材.com" - http://www.無料イラスト素材.com/
"大体いい感じになるKeynoteテンプレート「Azusa」作った - MEMOGRAPHIX" - http://memo.sanographix.net/post/82160791768
References

ManageIQ - Sprint 236 Review - Slide Deck
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 

Adtech x Scala x Performance tuning

  • 1. Ad Tech × Scala × Performance Tuning ∼ Best Practice for Better Performance ∼ Scala Days 2015 San Francisco Un-conference 2015-03-19 @mogproject
  • 2. Agenda About Demand Side Science Introduction to Performance Tuning Best Practice in Development Japanese language version here: http://www.slideshare.net/mogproject/scala-41799241
  • 3. Yosuke Mizutani (@mogproject)
 Joined Demand Side Science in April 2013
 
 (thanks to Scala Conference in Japan 2013) Full-stack engineer (want to be…) Background: 9-year infrastructure engineer About Me
  • 8. Brief History of DSS (Demand × Side × Science)
 Nov 2012: Established Demand Side Science Inc.
  • 9. Brief History of DSS (Demand × Side × Platform)
 2013: Developed private DSP package fractale
  • 10. Advertiser’s side of realtime ads bidding (RTB) What is DSP Supply Side Platform
  • 11. Brief History of DSS
 Dec 2013: Moved into the group of Opt, the e-marketing agency
 Oct 2014: Released dynamic creative tool unis
  • 12. unis is a third-party ad server which creates dynamic and/or personalized ads under the rules. 
 http://www.opt.ne.jp/news/pr/detail/id=2492 unis items on sale most popular items fixed items re-targeting
  • 13. Future of DSS: With venture mind + advantage of Opt group … (Demand × Side × Science)
  • 14. Future of DSS: We will create various products based on Science! (??? × ??? × Science)
  • 15. Future of DSS: For everyone’s … (Marketer × Publisher × Consumer)
  • 17. DSS and Scala: We adopt Scala for all products from the day of establishment
  • 18. System Architecture Example RDBMS NOSQL log storage cache Log Aggregation Machine Learning Cache Making etc.
  • 19. Today, I will not talk about JavaScript tuning. System Architecture Example
  • 20. Agenda About Demand Side Science Introduction to Performance Tuning Best Practice in Development
  • 21. Resolve an issue Reduce infrastructure cost
 (e.g. Amazon Web Services) Motivations
  • 22. Resolve an Issue
 Application misbehaves under high load
 Bad latency under specific conditions
 Batch execution slower than expected
 Slow development tools
  • 23. Reduce Infrastructure Cost
 Very important, especially in the ad tech industry
 Cost tends to grow bigger and bigger: high traffic; need to respond within a few milliseconds; big database, big log data
 Business requires: benefit from mass delivery > infra investment
  • 24. Don’t lose sight of your goal
 Performance tuning itself has a cost (≒ engineers’ time) and a risk (the possibility of causing new trouble), and you need to care about both
 Simply scaling the infrastructure up or out can be the best solution
 Don’t aim for perfection
  • 25. Basics of Performance Tuning: we iterate Measure metrics × Find bottleneck × Try with hypothesis. Don't take erratic steps.
  • 26. “80% of a program’s processing time comes from 20% of the code” (Pareto Principle) http://en.wikipedia.org/wiki/Pareto_principle
  • 27. Bottleneck in My Experience (※CAUTION: this is my own impression)
 Database (RDBMS/NOSQL): 50%
 Async / Thread: 15%
 Scala: 10%
 OS: 10%
 Library: 5%
 JVM parameter: 5%
 Network: 4%
 others: 1%
  • 28. What is I/O: Memory × Disk × Network
  • 29. Approximate timing for various operations http://norvig.com/21-days.html#answers
 execute typical instruction: 1/1,000,000,000 sec = 1 nanosec
 fetch from L1 cache memory: 0.5 nanosec
 branch misprediction: 5 nanosec
 fetch from L2 cache memory: 7 nanosec
 Mutex lock/unlock: 25 nanosec
 fetch from main memory: 100 nanosec
 send 2K bytes over 1Gbps network: 20,000 nanosec
 read 1MB sequentially from memory: 250,000 nanosec
 fetch from new disk location (seek): 8,000,000 nanosec
 read 1MB sequentially from disk: 20,000,000 nanosec
 send packet US to Europe and back: 150 milliseconds = 150,000,000 nanosec
  • 30. If Typical Instruction Takes 1 second… https://www.coursera.org/course/reactive week3-2
 execute typical instruction: 1 second
 fetch from L1 cache memory: 0.5 seconds
 branch misprediction: 5 seconds
 fetch from L2 cache memory: 7 seconds
 Mutex lock/unlock: ½ minute
 fetch from main memory: 1½ minutes
 send 2K bytes over 1Gbps network: 5½ hours
 read 1MB sequentially from memory: 3 days
 fetch from new disk location (seek): 13 weeks
 read 1MB sequentially from disk: 6½ months
 send packet US to Europe and back: 5 years
  • 31. Horrible and True Story: a batch reads 1,000,000 files of 10KB each from disk on every run. Data size: 10KB × 1,000,000 ≒ 10GB
  • 32. Horrible and True Story
 Assuming 1,000,000 seeks are needed, estimated time: 8ms × 10^6 + 20ms × 10,000 ≒ 8,200 sec ≒ 2.3 h
 If there is one file of 10GB and only one seek is needed, estimated time: 8ms × 1 + 20ms × 10,000 ≒ 200 sec ≒ 3.3 min
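The estimate above can be reproduced with a few lines of Scala. This is only a sketch of the arithmetic: the object and method names are made up for illustration, and the 8 ms seek and 20 ms per MB figures are the approximate values from the timing table, not measurements.

```scala
object DiskEstimate {
  // Approximate costs taken from the timing table (assumed, not measured).
  val SeekMillis: Long = 8L        // one seek to a new disk location
  val ReadMillisPerMB: Long = 20L  // sequential read of 1 MB from disk

  /** Estimated seconds to read `totalMB` megabytes using `seeks` disk seeks. */
  def estimateSeconds(seeks: Long, totalMB: Long): Double =
    (seeks * SeekMillis + totalMB * ReadMillisPerMB) / 1000.0

  def main(args: Array[String]): Unit = {
    // 1,000,000 files of 10 KB each: one seek per file, about 10,000 MB in total.
    val manySmallFiles = estimateSeconds(seeks = 1000000L, totalMB = 10000L)
    // The same data stored as a single 10 GB file: one seek.
    val oneLargeFile = estimateSeconds(seeks = 1L, totalMB = 10000L)
    println(f"many small files: $manySmallFiles%.0f sec")
    println(f"one large file:   $oneLargeFile%.0f sec")
  }
}
```

The seek term dominates the first case and all but disappears in the second, which is the whole point of the story.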
  • 33. Have Respect for the Disk Head http://en.wikipedia.org/wiki/Hard_disk_drive
  • 34. JVM Trade-offs JVM Performance Triangle Memory Footprint ↓ Throughput ↑ Latency ↓ longest pause time for Full GC
  • 35. JVM Performance Triangle, in other words: Compactness, Throughput, Responsiveness
  • 36. JVM Performance Triangle: C × T × R = a
 Tuning: vary C, T, R for fixed a
 Optimization: increase a
 Reference: Everything I ever learned about JVM performance tuning @twitter by Attila Szegedi http://www.beyondlinux.com/files/pub/qconhangzhou2011/Everything%20I%20ever%20learned%20about%20JVM%20performance%20tuning%20@twitter%28Attila%20Szegedi%29.pdf
  • 37. Agenda About Demand Side Science Introduction to Performance Tuning Best Practice in Development
  • 38. Development Process (only topics related to performance will be covered)
 1. Requirement Definition / Feasibility
 2. Basic Design
 3. Detailed Design
 4. Building Infrastructure / Coding
 5. System Testing
 6. System Operation / Maintenance
  • 39. Requirement Definition / Feasibility: reach agreement with stakeholders on the performance requirements
 How many user IDs? (internet users in Japan: 100 million; unique browsers: 200 ~ x00 million)
 Will the number increase?
 Data expiration cycle?
 Types of devices / browsers?
 Opt-out rate?
  • 40. Requirement Definition / Feasibility
 Number of ad delivery requests / number of impressions per month
 In case of 1 billion / month
 => mean: 400 QPS (Queries Per Second)
 => if peak rate = 250%, then 1,000 QPS
 For RTB: bid rate? win rate?
 Goal response time? Content size? Plans for growth? How about Cookie Sync?
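The QPS figures on this slide are rounded. A small sketch of the underlying arithmetic, assuming a 30-day month (the slide does not state the month length, and the names below are illustrative only):

```scala
object QpsEstimate {
  // Assumption: a 30-day month.
  val SecondsPerMonth: Long = 30L * 24 * 60 * 60

  /** Mean queries per second for a monthly request volume. */
  def meanQps(requestsPerMonth: Long): Double =
    requestsPerMonth.toDouble / SecondsPerMonth

  /** Peak QPS when the peak is `peakRatio` times the mean (2.5 = 250%). */
  def peakQps(requestsPerMonth: Long, peakRatio: Double): Double =
    meanQps(requestsPerMonth) * peakRatio

  def main(args: Array[String]): Unit = {
    val monthly = 1000000000L // 1 billion requests per month
    println(f"mean: ${meanQps(monthly)}%.0f QPS")
    println(f"peak: ${peakQps(monthly, 2.5)}%.0f QPS")
  }
}
```

Under these assumptions the mean comes out a little below 400 QPS and the peak a little below 1,000 QPS, so the slide's round numbers leave a small safety margin.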
  • 41. Requirement Definition / Feasibility
 Number of tracker requests received
 Timing of tracker firing
 Click rate? Conversion(*) rate?
 * A conversion occurs when the user performs the specific action that the advertiser has defined as the campaign goal, e.g. buying a product in an online store
  • 42. Requirement Definition / Feasibility: requirements for aggregation
 Indicators to be aggregated
 Is unique counting needed?
 Any exception rules?
 Secondary processing by the ad agency: by whom and when?
 Update interval
 Storage period
  • 43. Requirement Definition / Feasibility Hard limit by business side Sales plan Christmas selling? Annual sales target? Total budget
  • 44. The most important thing is to provide numbers, although it is extremely difficult to approximate precisely in the turbulent world of ad tech. Requirement Definition / Feasibility Architecture design needs assumed value Performance testing needs numeric goal
  • 45. Architecture design Choose framework Web framework Choose database RDBMS NOSQL Basic Design
  • 46. Threading model design Reduce blocking Future based Callback & function composition Actor based Message passing Thread pool design We can’t know the appropriate thread pool size unless we complete performance testing in production. Basic Design
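As an illustration of the Future-based style with function composition, here is a minimal sketch using only the Scala standard library. The lookups are stand-ins for blocking I/O such as a database or cache call, all names are hypothetical, and the pool size of 4 is an arbitrary placeholder, since the appropriate size is only known after performance testing.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FutureComposition {
  // Dedicated pool for blocking calls; the size 4 is a placeholder, not a recommendation.
  private val blockingPool = Executors.newFixedThreadPool(4)
  private val blockingEc: ExecutionContext = ExecutionContext.fromExecutor(blockingPool)

  // Hypothetical stand-ins for blocking lookups (database, cache, ...).
  def lookupUser(id: Int): Future[String] = Future { s"user-$id" }(blockingEc)
  def lookupAd(user: String): Future[String] = Future { s"ad-for-$user" }(blockingEc)

  /** Composes the two lookups without blocking the calling thread. */
  def selectAd(id: Int): Future[String] = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    lookupUser(id).flatMap(lookupAd)
  }

  def shutdown(): Unit = blockingPool.shutdown()

  def main(args: Array[String]): Unit =
    try println(Await.result(selectAd(42), 1.second)) // Await only at the program's edge
    finally shutdown()
}
```

Keeping blocking work on its own pool means a slow database cannot starve the threads that compose results, and the pool size becomes a single parameter to vary during load testing.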
  • 47. Database design Access pattern / Number of lookup Data size per one record Create model of distribution when the size is not constant Number of records Rate of growth / retention period Memory usage At first, measure the performance of the database itself Detailed Design
  • 48. Detailed Design
 Log design: consider the compression ratio for disk usage
 Cache design: some software needs double the capacity while processing a backup (e.g. Redis)
  • 49. Simplicity and clarity come first “It is far, far easier to make a correct program fast than it is to make a fast program correct”
 
 — C++ Coding Standards: 101 Rules, Guidelines, and Best Practices (C ++ in-depth series) Building Infrastructure / Coding
  • 50. — Donald Knuth “Premature optimization is the root of all evil.”
  • 51. — Jon Bentley “On the other hand, we cannot ignore efficiency”
  • 52. Building Infrastructure / Coding
 Avoid algorithms that are worse than linear wherever possible
 Measure, don’t guess
 http://en.wikipedia.org/wiki/Unix_philosophy
  • 53. SBT Plugin for running OpenJDK JMH 
 (Java Microbenchmark Harness: Benchmark tool for Java)
 
 https://github.com/ktoso/sbt-jmh Micro Benchmark: sbt-jmh
  • 54. Micro Benchmark: sbt-jmh (just put an annotation)
 plugins.sbt:
     addSbtPlugin("pl.project13.scala" % "sbt-jmh" % "0.1.6")
 build.sbt:
     jmhSettings
 YourBench.scala:
     import org.openjdk.jmh.annotations.Benchmark

     class YourBench {
       @Benchmark
       def yourFunc(): Unit = ??? // write code to measure
     }
  • 55. Micro Benchmark: sbt-jmh. Run the benchmark in the sbt console
     > run -i 3 -wi 3 -f 1 -t 1
 -i: number of measurement iterations to do
 -wi: number of warmup iterations to do
 -f: how many times to fork a single benchmark
 -t: number of worker threads to run with
  • 56. Micro Benchmark: sbt-jmh. Result (excerpted)
     [info] Benchmark                           Mode  Samples   Score  Score error  Units
     [info] c.g.m.u.ContainsBench.listContains  thrpt       3  41.033       25.573  ops/s
     [info] c.g.m.u.ContainsBench.setContains   thrpt       3   6.810        1.569  ops/s
 By default, the throughput score is displayed (larger is better).
 http://mogproject.blogspot.jp/2014/10/micro-benchmark-in-scala-using-sbt-jmh.html
  • 57. Scala Optimization Example Use Scala collection correctly Prefer recursion to function call
 by Prof. Martin Odersky in Scala Matsuri 2014 Try optimization libraries
  • 58. Horrible and True Story pt.2
 Group a List[Int] into chunks of 4 elements, then calculate the sum of each chunk
     def f(xs: List[Int], acc: List[Int] = Nil): List[Int] = {
       if (xs.length < 4) {
         (xs.sum :: acc).reverse
       } else {
         val (y, ys) = xs.splitAt(4)
         f(ys, y.sum :: acc)
       }
     }
 Example
     scala> f((1 to 10).toList)
     res1: List[Int] = List(10, 26, 19)
  • 59. Horrible and True Story pt.2
 List#length takes time proportional to the length of the sequence
 When the length of the parameter xs is n, the time complexity of List#length is O(n)
 Implemented in LinearSeqOptimized#length
 https://github.com/scala/scala/blob/v2.11.4/src/library/scala/collection/LinearSeqOptimized.scala#L35-43
  • 60. Horrible and True Story pt.2
 In function f, xs.length is evaluated n / 4 + 1 times (once per call), so the number of calls to f is also proportional to n
 Each of those calls pays up to O(n) for xs.length, therefore the time complexity of function f is O(n^2)
 It becomes too slow with big n
  • 61. Horrible and True Story pt.2
 For your information, the following one-liner does the same work using built-in methods
     scala> (1 to 10).grouped(4).map(_.sum).toList
     res2: List[Int] = List(10, 26, 19)
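For comparison, a hand-written version that runs in linear time might look like the sketch below. It is not the code from the talk: `GroupSum` and `sumGroups` are illustrative names, and the function follows the behaviour of the `grouped` one-liner (no trailing 0 for an empty remainder) rather than reproducing `f` exactly. The key difference from `f` is that it never calls `xs.length`; it walks the list once and keeps a running count.

```scala
import scala.annotation.tailrec

object GroupSum {
  /** Sums each group of `size` consecutive elements in a single pass over the list. */
  def sumGroups(xs: List[Int], size: Int = 4): List[Int] = {
    require(size > 0, "size must be positive")

    @tailrec
    def loop(rest: List[Int], count: Int, current: Int, acc: List[Int]): List[Int] =
      rest match {
        case Nil =>
          // Emit the last partial group only if it holds at least one element.
          (if (count > 0) current :: acc else acc).reverse
        case head :: tail =>
          if (count + 1 == size) loop(tail, 0, 0, (current + head) :: acc) // group complete
          else loop(tail, count + 1, current + head, acc)                  // keep accumulating
      }

    loop(xs, 0, 0, Nil)
  }
}
```

Each element is visited once and each step does constant work, so the cost is O(n) instead of O(n^2).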
  • 63. ScalaBlitz
 Library for optimising Scala collections (by using macros) http://scala-blitz.github.io/
 Presentation at Scala Days 2014: https://parleys.com/play/53a7d2c6e4b0543940d9e549/chapter0/about
  • 64. System feature testing Interface testing Performance testing Reliability testing Security testing Operation testing System Testing
  • 65. Simple load testing Scenario load testing mixed load with typical user operations Aging test (continuously running test) Performance Testing
  • 66. ab - Apache Bench
 Simple benchmark tool bundled with Apache
 http://httpd.apache.org/docs/2.2/programs/ab.html
 Adequate for simple requirements
 Latest version recommended (a bug in the version pre-installed on Amazon Linux made me sick)
 Example
     ab -C <CookieName=Value> -n <NumberOfRequests> -c <Concurrency> "<URL>"
  • 67. ab - Apache Bench. Result example (excerpted)
     Benchmarking example.com (be patient)
     Completed 1200 requests
     Completed 2400 requests
     (omitted)
     Completed 10800 requests
     Completed 12000 requests
     Finished 12000 requests
     (omitted)
     Concurrency Level:      200
     Time taken for tests:   7.365 seconds
     Complete requests:      12000
     Failed requests:        0
     Write errors:           0
     Total transferred:      166583579 bytes
     HTML transferred:       160331058 bytes
     Requests per second:    1629.31 [#/sec] (mean)
     Time per request:       122.751 [ms] (mean)
     Time per request:       0.614 [ms] (mean, across all concurrent requests)
     Transfer rate:          22087.90 [Kbytes/sec] received
     (omitted)
     Percentage of the requests served within a certain time (ms)
       50%    116
       66%    138
       75%    146
       80%    150
       90%    161
       95%    170
       98%    185
       99%    208
      100%    308 (longest request)
 Requests per second = QPS
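The three headline numbers in this output are tied together by the request count, the concurrency level and the elapsed time. The sketch below recomputes them from the values shown above; the object and method names are illustrative, and small differences in the last digits are expected because ab works from the unrounded elapsed time.

```scala
object AbMetrics {
  /** Requests per second (mean) = completed requests / time taken. */
  def requestsPerSecond(requests: Long, seconds: Double): Double =
    requests / seconds

  /** Time per request (mean), in ms = concurrency * time taken * 1000 / completed requests. */
  def meanTimePerRequestMs(concurrency: Int, requests: Long, seconds: Double): Double =
    concurrency * seconds * 1000.0 / requests

  /** Time per request across all concurrent requests, in ms = time taken * 1000 / completed requests. */
  def meanTimeAcrossAllMs(requests: Long, seconds: Double): Double =
    seconds * 1000.0 / requests

  def main(args: Array[String]): Unit = {
    // Values copied from the result example: 12000 requests, concurrency 200, 7.365 seconds.
    println(f"requests/sec:                  ${requestsPerSecond(12000L, 7.365)}%.2f")
    println(f"time per request (ms):         ${meanTimePerRequestMs(200, 12000L, 7.365)}%.3f")
    println(f"time per request, all (ms):    ${meanTimeAcrossAllMs(12000L, 7.365)}%.3f")
  }
}
```

Seen this way, the mean latency per request is simply the concurrency level divided by the throughput, which is useful when checking whether a higher concurrency setting actually improved QPS or only increased latency.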
  • 68. Load testing tool written in Scala http://gatling.io Gatling
  • 69. Gatling
 The era of Apache JMeter is over
 Say goodbye to building scenarios with a GUI
 With Gatling, you write load scenarios in a Scala DSL
  • 70. Days for Testing and Tuning
 Watch the resources on the load-generating side: the resources of the server (or PC); the network router (CPU) can be a bottleneck
 Don’t tune two or more parameters at one time
 Keep a change log and the log files
  • 71. System Operation / Maintenance: Logging × Anomaly Detection × Trends Visualization
  • 72. Day-to-day logging and monitoring Application log GC log Profiler Anomaly detection from several metrics Server resource (CPU, memory, disk, etc.) abnormal response code Latency Trends visualization from several metrics System Operation / Maintenance
  • 73. JVM Settings: GC log
 Add JVM options as follows
     -verbose:gc
     -Xloggc:<PathToTheLog>
     -XX:+PrintGCDetails
     -XX:+PrintGCDateStamps
     -XX:+UseGCLogFileRotation
     -XX:NumberOfGCLogFiles=10
     -XX:GCLogFileSize=10M
  • 74. “If someone doesn’t enable GC logging in production, I shoot them!” (Real customer) http://www.oracle.com/technetwork/server-storage/ts-4887-159080.pdf p55
  • 75. JVM Settings: JMX (Java Management eXtensions)
 Add JVM options as follows
     -Dcom.sun.management.jmxremote
     -Dcom.sun.management.jmxremote.port=<PORT NUMBER>
     -Dcom.sun.management.jmxremote.ssl=false
     -Dcom.sun.management.jmxremote.authenticate=false
  • 76. JVM Settings: stdout / stderr
 Should be redirected to a file
 Should NOT be thrown away to /dev/null
 The result of a thread dump (kill -3 <PROCESS_ID>) will be written here
  • 77. Profiler: SLF4J + Profiler http://www.slf4j.org/extensions.html
 Coding example
     import org.slf4j.profiler.Profiler

     val profiler: Profiler = new Profiler(this.getClass.getSimpleName)
     profiler.start("A")
     doA()
     profiler.start("B")
     doB()
     profiler.stop()
     logger.warn(profiler.toString)
  • 78. Profiler: SLF4J + Profiler
 Example: log the result of the profiler when a timeout occurs
 Output example
     + Profiler [BASIC]
     |-- elapsed time [A] 220.487 milliseconds.
     |-- elapsed time [B] 2499.866 milliseconds.
     |-- elapsed time [OTHER] 3300.745 milliseconds.
     |-- Total [BASIC] 6022.568 milliseconds.
  • 79. Trends Visualization
 For catching trends, not for anomaly detection
 Operation is also necessary so that signs of change are not overlooked
 Not only for infrastructure / application metrics, but also for business indicators
 Who uses the console? System user, system administrator, application developer, business manager
  • 81. Grafana (+Graphite)
 Graphite - http://graphite.readthedocs.org : manage and visualize numeric time-series data
 Grafana - http://grafana.org/ : visualize Graphite data more stylishly (or Kibana-like)
  • 82. Thank you very much! ∼ Best Practice for Better Performance ∼ Scala Days 2015 San Francisco Un-conference 2015-03-19 @mogproject
  • 83. References
 "Yosuke Mizutani - Kanagawa, Japan | about.me" - http://about.me/mogproject
 "mog project" - http://mogproject.blogspot.jp/
 "DSS Tech Blog - Demand Side Science ㈱ の技術ブログ" [Demand Side Science Inc. tech blog] - http://demand-side-science.jp/blog/
 "FunctionalNews - 関数型言語ニュースサイト" [functional language news site] - http://functional-news.com/
 "『ザ・アドテクノロジー』∼データマーケティングの基礎からアトリビューションの概念まで∼ / 翔泳社 新刊のご紹介" [The Ad Technology: from the basics of data marketing to the concept of attribution / Shoeisha new book announcement] - http://markezine.jp/book/adtechnology/
 "オプト、ダイナミック・クリエイティブツール「unis」の提供開始 ∼ パーソナライズ化された広告を自動生成し、広告効果の最大化を目指す ∼ | インターネット広告代理店 オプト" [Opt launches the dynamic creative tool "unis": automatically generating personalized ads to maximize advertising effectiveness | Internet ad agency Opt] - http://www.opt.ne.jp/news/pr/detail/id=2492
 "The Scala Programming Language" - http://www.scala-lang.org/
 "Finagle" - https://twitter.github.io/finagle/
 "Play Framework - Build Modern & Scalable Web Apps with Java and Scala" - https://www.playframework.com/
 "nginx" - http://nginx.org/ja/
 "Fluentd | Open Source Data Collector" - http://www.fluentd.org/
 "Javaパフォーマンスチューニング(1):Javaパフォーマンスチューニングのルール (1/2) - @IT" [Java Performance Tuning (1): Rules of Java performance tuning (1/2) - @IT] - http://www.atmarkit.co.jp/ait/articles/0501/29/news011.html
 "パレートの法則 - Wikipedia" [Pareto principle - Wikipedia] - http://ja.wikipedia.org/wiki/パレートの法則
 "Teach Yourself Programming in Ten Years" - http://norvig.com/21-days.html#answers
 "企業が作る国際ネットワーク最前線 - [4]いまさら聞けない国際ネットワークの基礎知識:ITpro" [The front line of international networks built by companies - [4] Basics of international networks: ITpro] - http://itpro.nikkeibp.co.jp/article/COLUMN/20100119/343461/
 "Coursera" - https://www.coursera.org/course/reactive
 "アースマラソン - Wikipedia" [Earth Marathon - Wikipedia] - http://ja.wikipedia.org/wiki/アースマラソン
 "Hard disk drive - Wikipedia, the free encyclopedia" - http://en.wikipedia.org/wiki/Hard_disk_drive
 "Everything I ever learned about JVM performance tuning @twitter(Attila Szegedi).pdf" - http://www.beyondlinux.com/files/pub/qconhangzhou2011/Everything%20I%20ever%20learned%20about%20JVM%20performance%20tuning%20@twitter%28Attila%20Szegedi%29.pdf
 "Amazon.co.jp: C++ Coding Standards―101のルール、ガイドライン、ベストプラクティス (C++ in-depth series): ハーブ サッター, アンドレイ アレキサンドレスク, 浜田 光之, Herb Sutter, Andrei Alexandrescu, 浜田 真理: 本" [C++ Coding Standards: 101 Rules, Guidelines, and Best Practices, Japanese edition] - http://www.amazon.co.jp/gp/product/4894716860
 "UNIX哲学 - Wikipedia" [Unix philosophy - Wikipedia] - http://ja.wikipedia.org/wiki/UNIX哲学
 "ktoso/sbt-jmh" - https://github.com/ktoso/sbt-jmh
 "ScalaBlitz | ScalaBlitz" - http://scala-blitz.github.io/
 "Parleys.com - Lightning-Fast Standard Collections With ScalaBlitz by Dmitry Petrashko" - https://parleys.com/play/53a7d2c6e4b0543940d9e549/chapter0/about
 "mog project: Micro Benchmark in Scala - Using sbt-jmh" - http://mogproject.blogspot.jp/2014/10/micro-benchmark-in-scala-using-sbt-jmh.html
 "Gatling Project, Stress Tool" - http://gatling.io/
 "WEB+DB PRESS Vol.83|技術評論社" [WEB+DB PRESS Vol.83 | Gijutsu-Hyohron] - http://gihyo.jp/magazine/wdpress/archive/2014/vol83
 "「Javaの鉱脈」でGatlingの記事を書きました - さにあらず" [I wrote an article about Gatling for "Java no Koumyaku"] - http://blog.satotaichi.info/gatling-is-awesome-loadtester
 "Garbage Collection Tuning in the Java HotSpot™ Virtual Machine" - http://www.oracle.com/technetwork/server-storage/ts-4887-159080.pdf
 "SLF4J extensions" - http://www.slf4j.org/extensions.html
 "Graphite Documentation - Graphite 0.10.0 documentation" - http://graphite.readthedocs.org/en/latest/
 "Grafana - Graphite and InfluxDB Dashboard and graph composer" - http://grafana.org/
 "Grafana - Grafana Play Home" - http://play.grafana.org/#/dashboard/db/grafana-play-home
 "不動産関係に使える 無料画像一覧" [List of free images usable for real estate] - http://free-realestate.org/information/list.html
 "AI・EPSの無料イラストレーター素材なら無料イラスト素材.com" [Free AI/EPS illustration materials] - http://www.無料イラスト素材.com/
 "大体いい感じになるKeynoteテンプレート「Azusa」作った - MEMOGRAPHIX" [I made "Azusa", a Keynote template that turns out mostly nice - MEMOGRAPHIX] - http://memo.sanographix.net/post/82160791768