Composewell Technologies
High Performance
Haskell
Harendra Kumar

15 Dec 2018
Harendra Kumar
‣More than a decade of systems programming in C

‣Writing Haskell for the last three years

‣Currently focusing on streamly, an ambitious
project that aims to make programming practical
systems in Haskell a joy while delivering C-like
performance.
Haskell Performance
‣Performance can easily be off by 10x or 100x from the best achievable

‣Refactoring can easily affect performance

‣You cannot be confident unless you measure

‣Best practices can easily get you into the ballpark

‣Squeezing out the last drop may be harder

‣With some effort, you can get close to C, or even beat it
A Case Study
Unicode Normalization
A case study
‣Challenge: can we do Unicode normalization as fast as, or
faster than, the best C++ library (ICU)?
The Problem
‣A Unicode character may have multiple equivalent forms
(composed/decomposed).
‣ Composed: Åström (U+00C5 U+0073 U+0074 U+0072 U+00F6 U+006D)

‣ Decomposed: Åström (U+0041 U+030A U+0073 U+0074 U+0072
U+006F U+0308 U+006D)

‣To compare strings we need to bring them to a common
normal form (e.g. NFC/NFD).
Normalized Form
Decomposed (NFD)
‣Sequence of chars: 

‣Starter,Starter,Combining1,Combining2…Starter,Combining1…
‣Look up each character:

‣Does it have a decomposition?

‣If so, replace it with its components

‣Look up its combining class:

‣0 => Starter, non-zero => combining

‣Reorder runs of multiple combining chars by combining class (stable sort)
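The NFD steps above fit in a few lines of list-based Haskell. The following is a minimal sketch, not the unicode-transforms code: decompMap and cccMap are hypothetical toy tables with just enough entries for the Åström example, standing in for maps generated from the Unicode Character Database, and Hangul is ignored. The IntMap lookups mirror the naive approach described later.

```haskell
import Data.Char (ord)
import Data.List (sortOn)
import qualified Data.IntMap.Strict as IM

-- Hypothetical toy decomposition map: only the entries needed for "Åström".
decompMap :: IM.IntMap String
decompMap = IM.fromList
  [ (0x00C5, "A\x030A")  -- Å => A + COMBINING RING ABOVE
  , (0x00F6, "o\x0308")  -- ö => o + COMBINING DIAERESIS
  ]

-- Hypothetical toy combining class map; unlisted chars are starters (0).
cccMap :: IM.IntMap Int
cccMap = IM.fromList [(0x030A, 230), (0x0308, 230), (0x0323, 220)]

ccc :: Char -> Int
ccc c = IM.findWithDefault 0 (ord c) cccMap

-- Replace a char with its components, recursively.
decompose :: Char -> String
decompose c = maybe [c] (concatMap decompose) (IM.lookup (ord c) decompMap)

-- Stable-sort every run of combining chars by combining class.
reorder :: String -> String
reorder [] = []
reorder s@(c : cs)
  | ccc c == 0 = c : reorder cs
  | otherwise  = let (run, rest) = span ((/= 0) . ccc) s
                 in sortOn ccc run ++ reorder rest

nfd :: String -> String
nfd = reorder . concatMap decompose

main :: IO ()
main = do
  let composed   = "\x00C5str\x00F6m"
      decomposed = "A\x030Astro\x0308m"
  print (composed == decomposed)          -- False: different code points
  print (nfd composed == nfd decomposed)  -- True: same normal form
```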
Unicode Character
Database
‣Lookup maps:

‣Decomposition map, ~2000 entries

‣Combining class map, ~1000 entries

‣Algorithmic decomposition of Hangul characters
Naive, Elegant Code
‣Normalization in ~50 lines of core code

‣Use IntMap for database lookup

‣Use Haskell lists for processing

‣Idiomatic code
Naive Implementation
Performance (C++/Haskell)
Use Pattern Match for
Lookup
IntMap vs Pattern Match
Fast Path Decomposition
Lookup
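A rough sketch of a pattern-match lookup with a fast-path guard follows. The four table entries and the names decomposeChar and decomposeFast are hypothetical; a real table would be generated from the UCD. The guard assumes, as holds for the standard Unicode data, that no code point below U+00C0 has a canonical decomposition, so ASCII text never reaches the table.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical generated table as one big case expression. GHC typically
-- compiles a case on Char literals into a decision tree of comparisons,
-- instead of walking an IntMap of boxed keys.
decomposeChar :: Char -> Maybe String
decomposeChar c = case c of
  '\x00C0' -> Just "A\x0300"  -- À
  '\x00C1' -> Just "A\x0301"  -- Á
  '\x00C5' -> Just "A\x030A"  -- Å
  '\x00F6' -> Just "o\x0308"  -- ö
  _        -> Nothing

-- Fast path: nothing below U+00C0 has a canonical decomposition,
-- so the common ASCII case costs a single comparison.
decomposeFast :: Char -> String
decomposeFast c
  | c < '\x00C0' = [c]
  | otherwise    = fromMaybe [c] (decomposeChar c)

main :: IO ()
main = print (concatMap decomposeFast "\x00C5ngstr\x00F6m")
```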
Decomposition
‣Decomposition is recursive

‣Use simple recursion, instead of iterate and zip-with-tail
idioms, to decompose recursively.
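To illustrate the difference, here is a sketch of both styles on a hypothetical one-function table containing a genuine two-level decomposition (U+1E69 decomposes to U+1E63 U+0307, and U+1E63 to 's' U+0323). decomposeIter rebuilds the whole string once per level and compares successive versions; decomposeRec decomposes each component as soon as it appears and never compares strings.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical toy table with a two-level decomposition:
-- U+1E69 => U+1E63 U+0307, and U+1E63 => 's' U+0323.
decomp1 :: Char -> Maybe String
decomp1 '\x1E69' = Just "\x1E63\x0307"
decomp1 '\x1E63' = Just "s\x0323"
decomp1 _        = Nothing

-- Idiomatic but slower: apply one level at a time with iterate and stop
-- at the first fixpoint, found by zipping the sequence with its tail.
decomposeIter :: String -> String
decomposeIter =
  fst . head . dropWhile (uncurry (/=)) . (zip <*> tail)
    . iterate (concatMap (\c -> fromMaybe [c] (decomp1 c)))

-- Simple recursion: decompose each component as soon as it is produced.
decomposeRec :: Char -> String
decomposeRec c = maybe [c] (concatMap decomposeRec) (decomp1 c)

main :: IO ()
main = print (concatMap decomposeRec "\x1E69", decomposeIter "\x1E69")
```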
Fast Path Reordering
‣Original code: 

‣Split into groups, then sortBy combining class (CC)

‣“SCCSCC” => [(S,0), (C,10), (C,11)], [(S,0), (C,5), (C,6)]

‣Optimized code: use custom sorting for the cases when
the sort group size is 1 or 2, falling back to the regular list sort for
the rest.

‣Use a bitmap for a quick combining/non-combining
check; non-combining is the common case.
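A sketch of the reordering fast path, assuming a hypothetical four-entry ccc function in place of the real combining-class table. The bitmap check is not shown; the ccc c == 0 test stands in for it. Groups of one or two marks are handled by pattern matching, and only longer groups fall back to sortBy, which is stable, as canonical ordering requires.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- Hypothetical toy combining-class lookup (0 = starter).
ccc :: Char -> Int
ccc '\x0300' = 230  -- COMBINING GRAVE ACCENT
ccc '\x0308' = 230  -- COMBINING DIAERESIS
ccc '\x0323' = 220  -- COMBINING DOT BELOW
ccc '\x0327' = 202  -- COMBINING CEDILLA
ccc _        = 0

-- Sort one group of combining chars stably by class. Groups of size
-- 1 or 2 dominate real text, so they skip the general list sort.
sortGroup :: String -> String
sortGroup [c] = [c]
sortGroup [c1, c2]
  | ccc c1 > ccc c2 = [c2, c1]  -- swap only if strictly greater: stable
  | otherwise       = [c1, c2]
sortGroup cs = sortBy (comparing ccc) cs

reorder :: String -> String
reorder [] = []
reorder (c : cs)
  | ccc c == 0 = c : reorder cs  -- common case: a starter, nothing to sort
  | otherwise  =
      let (grp, rest) = span ((/= 0) . ccc) cs
      in sortGroup (c : grp) ++ reorder rest

main :: IO ()
main = print (reorder "a\x0308\x0323")
```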
Monolithic Decompose and
Reorder
‣ Original code: reorder . decompose

‣ Optimized code: decomposeAndReorder reorderBuffer, a single pass that decomposes into a reorder buffer
‣ In the common case the buffer has just one char, and it gets flushed
when we get the next char.

‣ We need to sort the buffer only when there is more than one
combining char in it.

‣ Use custom sorting for the two-char case.

‣ Do not use string append for the reorder buffer; manually deconstruct
and reconstruct the list for short, common cases (10% improvement).
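The monolithic version can be sketched as one pass that feeds decomposed chars through a small reorder buffer. The buffer type RBuf, the local helpers and the toy tables are assumptions for illustration, not the library's code. The buffer keeps pending combining chars in reverse, so adding one is a cons rather than a string append.

```haskell
import Data.List (sortOn)

-- Hypothetical toy tables standing in for the generated UCD lookups.
decomp :: Char -> String
decomp '\x00C5' = "A\x030A"
decomp '\x00F6' = "o\x0308"
decomp c        = [c]

ccc :: Char -> Int
ccc '\x030A' = 230
ccc '\x0308' = 230
ccc '\x0323' = 220
ccc _        = 0

-- Reorder buffer: the pending starter (if any) plus the combining chars
-- that followed it, kept in reverse so adding one is O(1), not an append.
data RBuf = RBuf !(Maybe Char) [Char]

-- Single pass: decompose each char and feed the pieces through the buffer.
decomposeAndReorder :: String -> String
decomposeAndReorder = go (RBuf Nothing [])
  where
    go buf []       = flush buf []
    go buf (c : cs) = feed buf (decomp c) cs

    feed buf [] cs = go buf cs
    feed buf@(RBuf st comb) (d : ds) cs
      | ccc d == 0 = flush buf (feed (RBuf (Just d) []) ds cs)  -- new starter
      | otherwise  = feed (RBuf st (d : comb)) ds cs           -- buffer it

    -- Emit the buffer in front of the rest of the output.
    flush (RBuf st comb) rest = maybe id (:) st (sortMarks (reverse comb) ++ rest)

    -- Sorting is needed only with two or more marks; handle two by hand.
    sortMarks []  = []
    sortMarks [x] = [x]
    sortMarks [x, y]
      | ccc x > ccc y = [y, x]
      | otherwise     = [x, y]
    sortMarks xs  = sortOn ccc xs

main :: IO ()
main = print (decomposeAndReorder "\x00C5str\x00F6m")
```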
Hangul Jamo Normalization
‣Use algorithmic decomposition as prescribed by the Unicode standard,
instead of a simple lookup-based approach.

‣Mark the Hangul Jamo case NOINLINE; it is not on the fast path

‣Use quot/rem instead of div/mod

‣Use quotRem instead of separate quot and rem

‣Use unsafeChr instead of chr

‣Use strict values in list buffers

‣Use tuples instead of lists for returning short buffers

‣Localize recursion to the non-Hangul case
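The Hangul tips above can be sketched with the algorithmic decomposition defined by the Unicode standard. The constants are the standard's; the function names are hypothetical. Since the operands are non-negative, quot/rem give the same results as div/mod without their sign adjustments, and unsafeChr is safe here because the results are always valid Jamo code points.

```haskell
import Data.Char (ord)
import GHC.Base (unsafeChr)

-- Hangul syllable constants from the Unicode standard.
sBase, lBase, vBase, tBase, tCount, nCount, sCount :: Int
sBase  = 0xAC00
lBase  = 0x1100
vBase  = 0x1161
tBase  = 0x11A7
tCount = 28
nCount = 588    -- vCount (21) * tCount
sCount = 11172  -- lCount (19) * nCount

isHangul :: Char -> Bool
isHangul c = n >= sBase && n < sBase + sCount
  where n = ord c

-- Slow path, kept out of line so it does not bloat the fast path.
-- Precondition: isHangul c.
{-# NOINLINE decomposeHangul #-}
decomposeHangul :: Char -> String
decomposeHangul c
  | t == 0    = [l', v']
  | otherwise = [l', v', unsafeChr (tBase + t)]
  where
    (l, lv) = (ord c - sBase) `quotRem` nCount  -- one quotRem, not quot + rem
    (v, t)  = lv `quotRem` tCount
    l'      = unsafeChr (lBase + l)  -- in range by construction
    v'      = unsafeChr (vBase + v)

main :: IO ()
main = print (map decomposeHangul (filter isHangul "\xD55C\xAE00"))
```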
Where are we?
(C++/Haskell/C)
Can we do better?
‣Remember we are still using plain Haskell strings! Let’s
do some minimal experiments to test the limits:

stringOp = map (chr . (+ 1) . ord)     -- 17 ms
textOp   = T.map (chr . (+ 1) . ord)   -- 11 ms
textOp   = T.unstream . T.stream       -- 4.0 ms
ICU English normalization              -- 2.7 ms
textOp   = T.unstream . T.stream, with Text's unstream code fixed
           (realloc code marked NOINLINE) -- 1.3 ms
Let’s Apply This
‣Use Text with stream/unstream instead of strings

‣Readjust conditional branches to favor the fast path.
‣Inlining
‣INLINE the isCombining check (+16%)

‣Add NOINLINE to slow path code

‣-funbox-strict-fields
Optimize Reorder Buffer
‣Instead of a list, use a custom data type optimized for
fast path cases:

data Buffer = Empty | One {-# UNPACK #-} !Char | Many [Char]
‣Use a mutable reorder buffer

‣ + 5%
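A sketch of how the Buffer type above might be used, assuming a hypothetical two-entry ccc lookup and hypothetical push/flush helpers. The One constructor covers the common single-mark case without allocating a list, and with optimisation on, the strict unpacked field stores the character unboxed.

```haskell
import Data.List (sortOn)

-- Hypothetical toy combining-class lookup (0 = starter).
ccc :: Char -> Int
ccc '\x0308' = 230
ccc '\x0323' = 220
ccc _        = 0

-- The reorder buffer type from the slide. Empty and One cover the common
-- cases without allocating a list; the strict, unpacked field keeps the
-- Char inline in the One constructor.
data Buffer = Empty | One {-# UNPACK #-} !Char | Many [Char]

-- Add a combining char to the buffer.
push :: Buffer -> Char -> Buffer
push Empty     c = One c
push (One a)   c = Many [a, c]
push (Many cs) c = Many (cs ++ [c])  -- rare: three or more marks

-- Emit the buffered chars in canonical (stable, by class) order.
flush :: Buffer -> String
flush Empty     = []
flush (One c)   = [c]
flush (Many cs) = sortOn ccc cs

main :: IO ()
main = print (flush (push (push Empty '\x0308') '\x0323'))
```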
Where are we now?
(C++/Haskell)
Use the LLVM backend, -fllvm (+10%)
We can do better
‣We can use a non-decomposable-starter lookup for the
fast path. It will cut common-case lookups by half.

‣We have not tried hash lookup yet

‣The ICU C++ library uses the Unicode quick check properties for
optimization; we can do the same to optimize further
at the algorithmic level.

‣Code generation by GHC can possibly be improved; I
raised a couple of tickets about it.
Lessons
‣Using Haskell we can write concise code with acceptable
performance quickly.

‣The code can be optimized to perform as well as C

‣Most of the optimizations we did were algorithmic and
logic-related rather than language-related, mostly
custom handling of the fast path.

‣The most common language-related optimizations are
INLINE annotations. The others are mostly of the
last-drop-squeezing kind.
Performance
Optimization
Ground Rules
‣ MEASURE, define proper benchmarks

‣ ANALYZE, benchmarks may be wrong

‣ OPTIMIZE

‣ Algorithmic optimization first

‣ Biggest gain first 

‣ Optimize where it matters (fast path)

‣ DEBUG

‣ Narrow down by incremental elimination

‣ Narrow down by incremental addition

‣ RATCHET, don’t lose the hard work spent in discovering issues
The three musketeers
1. INLINE

2. SPECIALIZE

3. STRICTIFY
INLINE
Inlining
‣Instead of making a function call, expand the definition of
a function at the call site.
Inlining
(Definition Site)
‣For inlining or specialization to occur in another module the
original RHS of a function must be recorded in the interface file (.hi). 

‣By default GHC may or may not choose to keep the original RHS
in the interface file.

‣INLINABLE => direct the compiler to record the original RHS of
the function in interface file (.hi)

‣INLINE => Like INLINABLE, but also direct the compiler to
actually inline the function at all call sites.

‣-fexpose-all-unfoldings is a way to mark everything INLINABLE
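A minimal sketch of the two definition-site pragmas; square and sumSquares are illustrative names, not from the source. To get the INLINABLE behaviour for a whole module instead, it could be compiled with -fexpose-all-unfoldings.

```haskell
import Data.List (foldl')

-- INLINE: keep the original RHS in the .hi file and inline it at every
-- saturated call site, including call sites in other modules.
{-# INLINE square #-}
square :: Int -> Int
square x = x * x

-- INLINABLE: keep the original RHS in the .hi file so that importing
-- modules can inline or specialise it, but leave the decision to GHC.
{-# INLINABLE sumSquares #-}
sumSquares :: Num a => [a] -> a
sumSquares = foldl' (\acc x -> acc + x * x) 0

main :: IO ()
main = print (square 7, sumSquares [1, 2, 3 :: Int])
```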
Inlining
(Call Site)
‣Prerequisite: function’s original RHS must be available in
the interface file.

‣If the function was marked INLINE at the definition site,
then unconditionally inline it.

‣If the function was not marked INLINE, then the inline
function (from GHC.Exts) can be used at the call site to ask the
compiler to inline it unconditionally.

‣Otherwise, GHC decides whether to inline or not. See the
-funfolding-* and -fmax-inline-* options to control this.
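A sketch of call-site control with GHC.Exts.inline; clamp and clampAll are hypothetical names. The INLINABLE pragma is there to satisfy the prerequisite above that the function's original RHS be recorded in the interface file.

```haskell
import GHC.Exts (inline)

-- Not marked INLINE, so GHC decides on its own whether to inline it.
-- INLINABLE guarantees its RHS is recorded, which 'inline' relies on.
{-# INLINABLE clamp #-}
clamp :: Int -> Int -> Int -> Int
clamp lo hi x = max lo (min hi x)

-- Request unconditional inlining of clamp at this one call site only.
clampAll :: [Int] -> [Int]
clampAll = map (inline clamp 0 255)

main :: IO ()
main = print (clampAll [-5, 100, 300])
```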
When inlining cannot occur
‣The function is not fully applied (unsaturated call)

‣The function is passed as an argument to a function which
itself is not inlined.

‣The function is self-recursive

‣For mutually recursive functions GHC tries not to use a
function with INLINE pragma as a loop breaker.
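The self-recursion restriction is commonly worked around with a non-recursive wrapper around a local loop, as in this sketch (countIf and countIf' are illustrative names). Only the wrapper is inlined; the loop stays recursive but is compiled with the caller's predicate in scope.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Self-recursive: GHC must make it a loop breaker, so it is never
-- inlined, even with an INLINE pragma.
countIf :: (a -> Bool) -> [a] -> Int
countIf _ []       = 0
countIf p (x : xs) = (if p x then 1 else 0) + countIf p xs

-- Same function as a non-recursive wrapper around a local loop. The
-- wrapper can be inlined at each call site, so the loop gets compiled
-- with the caller's predicate known.
{-# INLINE countIf' #-}
countIf' :: (a -> Bool) -> [a] -> Int
countIf' p = go 0
  where
    go !acc []       = acc
    go !acc (x : xs) = go (if p x then acc + 1 else acc) xs

main :: IO ()
main = print (countIf even [1 .. 10 :: Int], countIf' even [1 .. 10 :: Int])
```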
When an INLINE is missing
func :: String -> Stream IO Int -> Benchmark
func name f = bench name $ nfIO $ S.mapM_ (\_ -> return ()) f
• Without an INLINE on func the benchmark takes 50 ms; with INLINE it takes 500 µs, i.e. 100x faster.

• Without marking func INLINE, f cannot be inlined and cannot fuse with
mapM_. So we need an INLINE on both func and f.

• Code depending on fusion is especially sensitive to inlining, because
fusion depends on inlining.

• CPS code is more robust against missing inlining. Direct-style code may
perform much worse than CPS when an INLINE goes
missing; however, it can be much faster than CPS with proper inlining.
NOINLINE for better
performance!
• Many people find it counterintuitive, and even the GHC
manual says you should never need it, but it is fairly
common to get modest performance gains by using NOINLINE.

• Moving the slow-path branch out of the way, into a separate
function marked NOINLINE, helps the fast-path branch
execute more efficiently.

• The noinline function can also be used to prevent inlining at a
particular call site.
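A sketch of the NOINLINE slow-path pattern; upper and upperSlow are hypothetical names. The ASCII fast path is small enough to inline everywhere, while the full Unicode case mapping stays behind a single out-of-line call.

```haskell
import Data.Char (chr, ord, toUpper)

-- Fast path: ASCII, handled inline with cheap comparisons.
{-# INLINE upper #-}
upper :: Char -> Char
upper c
  | c >= 'a' && c <= 'z' = chr (ord c - 32)
  | c < '\x80'           = c
  | otherwise            = upperSlow c

-- Slow path: full Unicode case mapping, kept out of line so that its
-- code is not duplicated into every caller's loop.
{-# NOINLINE upperSlow #-}
upperSlow :: Char -> Char
upperSlow = toUpper

main :: IO ()
main = print (map upper "hello, w\x00F6rld")
```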
SPECIALIZE
(polymorphic code)
Specializing
‣Instead of calling a polymorphic version of a function,
make a copy, specialized to less polymorphic types.

{-# SPECIALIZE consM :: IO a -> Stream IO a -> Stream IO a #-}
consM :: Monad m => m a -> Stream m a -> Stream m a
consM = consMSerial
Specializing
(Definition Site)
‣INLINABLE => direct the compiler to record the original
RHS of the function in interface file (.hi). The function can
then be specialized where it is imported using
SPECIALIZE.

‣SPECIALIZE => direct the compiler to specialize a
function at the given type and use that version wherever
applicable.

‣SPECIALIZE instance => direct the compiler to
specialize a type class instance at the given type.
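A sketch of SPECIALIZE on a small polymorphic function; dot is an illustrative name. The pragmas ask GHC to compile extra monomorphic copies at Int and Double, which calls at those types use directly, without passing a Num dictionary.

```haskell
import Data.List (foldl')

-- Polymorphic: without specialisation each call passes a Num dictionary
-- and goes through its (+) and (*).
dot :: Num a => [a] -> [a] -> a
dot xs ys = foldl' (+) 0 (zipWith (*) xs ys)

-- Ask GHC to also compile monomorphic copies; calls at these types use
-- them directly, with no dictionary passing.
{-# SPECIALIZE dot :: [Int] -> [Int] -> Int #-}
{-# SPECIALIZE dot :: [Double] -> [Double] -> Double #-}

main :: IO ()
main = print (dot [1, 2, 3] [4, 5, 6 :: Int], dot [0.5] [2 :: Double])
```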
Specializing
(Call Site)
‣Prerequisite: function’s original RHS must be available in
the interface file. INLINE or INLINABLE can be used to
ensure that.

‣SPECIALIZE => direct the compiler to specialize an
imported function at the given type for this module.

‣For all local functions or imported functions that have their
RHS available in the interface file, GHC may automatically
specialize them. See -fspecialise-aggressively
too.
Call Pattern Specialization
(Recursive Functions)
‣GHC option -fspec-constr specializes a recursive
function for different constructor cases of its argument.

‣Pass SPEC (exported by GHC.Exts) as an extra strict argument to the
recursive function to direct the compiler to perform spec-constr aggressively.
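A sketch of the SPEC idiom, assuming GHC.Exts exports SPEC as in current base; St and sumEvens are hypothetical names. The loop state is a small sum type, the shape spec-constr targets; the SPEC argument tells the pass to specialise regardless of its usual size and count limits.

```haskell
{-# LANGUAGE BangPatterns #-}
import GHC.Exts (SPEC (..))

-- Hypothetical loop state, a sum type as in stream fusion.
data St = Go !Int | Stop

-- With -O2 (which enables -fspec-constr), the SPEC argument asks GHC to
-- specialise 'go' for each constructor of St, so that the Go/Stop
-- values need not be allocated on every iteration.
sumEvens :: Int -> Int
sumEvens n = go SPEC 0 (Go 0)
  where
    go :: SPEC -> Int -> St -> Int
    go !_ !acc Stop = acc
    go !_ !acc (Go i)
      | i > n     = go SPEC acc Stop
      | even i    = go SPEC (acc + i) (Go (i + 1))
      | otherwise = go SPEC acc (Go (i + 1))

main :: IO ()
main = print (sumEvens 10)
```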
When specialization cannot
occur
‣The function is not fully applied (unsaturated calls)

‣The function calls other functions which cannot be
specialized.

‣The function uses polymorphic recursion

‣The GHC options -Wmissed-specialisations and
-Wall-missed-specialisations can be useful.
STRICTIFY
(Buffers)
Strictness
• Do not keep lazy expressions in memory if they are going to be
reduced eventually anyway; reduce them as soon as possible.

• Keeping them around can be inefficient, can consume more memory and,
more importantly, makes GC expensive.

• As a general rule, be lazy for construction and transformation and
be strict for reduction. Laziness helps when you are processing
something; strictness helps when you are storing or buffering.

• Use a strict accumulator for strict left folds.

• Use strict record fields for records used for buffered storage.
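A sketch contrasting a lazy and a strict accumulator for computing a mean; the names are illustrative. Without optimisation the lazy version builds thunk chains as long as the input (with -O, GHC's strictness analysis may or may not rescue it); the strict version evaluates as it goes.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- Lazy: foldl and a lazy pair build a chain of (+) thunks as long as the
-- list before any addition happens, costing memory and GC time.
meanLazy :: [Double] -> Double
meanLazy xs = s / fromIntegral n
  where
    (s, n) = foldl (\(a, b) x -> (a + x, b + 1)) (0, 0 :: Int) xs

-- Strict: foldl' forces the pair at each step and the bang patterns force
-- its fields, so the fold runs in constant space.
meanStrict :: [Double] -> Double
meanStrict xs = s / fromIntegral n
  where
    (s, n) = foldl' step (0, 0 :: Int) xs
    step (!a, !b) x = (a + x, b + 1)

main :: IO ()
main = print (meanStrict [1 .. 1000000])
```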
Strictify and Unbox
• Bang patterns (!) can be used to mark function arguments (with the
BangPatterns extension) or constructor fields strict, i.e. evaluated when applied.

• Use strict function application, $!.
• Use the UNPACK pragma to keep constructor fields unboxed.

• -funbox-strict-fields is often useful
Measurement
Focus on tests in C, benchmarks in Haskell
Benchmarking Tools
• gauge vs criterion

• We faced several benchmarking issues during streamly
and streaming-benchmarks development

• We made significant improvements to gauge to address those
issues.

• We wrote the bench-show package for robust analysis,
comparison and presentation of benchmarks
Benchmarking Pitfalls
• Benchmarking code needs to be optimized exactly the way
you would optimize the code being benchmarked.

• A missing INLINE in benchmarking code could cause a
huge difference, invalidating the results.

• Benchmarking relies on the rnf implementation; if that itself
is slow (e.g. not marked INLINE) then we may get false
results. We encountered this problem at least once.

• Multiple benchmarks can interfere with each other in ways
you may not be able to detect easily.
Benchmarking Pitfalls
• You may be measuring the cost of doing nothing, even
with nfIO. We generate a random number in IO and pass
it to the computation being benchmarked to avoid the
issue.

• When measuring with nf f arg, remember we are
measuring f and not arg. arg may get evaluated once
and reused.
Gauge Improvements
• Run each benchmark in isolation, in a separate process. This
is a brute-force way to ensure that there is no interference from
other benchmarks. Correct maxrss measurement requires
this.

• Several correctness fixes to measure stats accurately.

• Use getrusage to report many other stats like maxrss, page
faults and context switches. maxrss is especially
useful to get peak memory consumption data.

• Added a --quick mode to run benchmarks quickly (10x faster)
Gauge Improvements
• Provides raw data for each iteration in a CSV file, for
external analysis. This is used by bench-show.

• Better control over measurement process from the CLI

• nfAppIO and whnfAppIO for more reliable
measurements. Contributed by rubenpieters.
Analyzing and Comparing
Performance
(bench-show)
Benchmarking Business
• streamly is a high performance monadic streaming
framework generalizing lists to monads with inherent
concurrency support.

• When a single missing INLINE can degrade performance by 100x,
how do we guarantee performance?

• Measure everything. We have hundreds of benchmarks,
each and every op is benchmarked.

• With such a large number of benchmarks, how do we
analyze the benchmarking output?
Enter bench-show
• Analyses the results using 3 statistical estimators: linear
regression, median and mean
• Finds the difference between two runs and reports the min of
3 estimators

• Computes the percentage regression or improvement

• Sorts and reports by the highest regression, time as well as
space.

• We can automatically report regressions on each commit, by
using a threshold.
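For illustration, the comparison logic can be sketched as follows. This is not bench-show's code or API: Estimates, pctDiff, diffRuns and isRegression are hypothetical, and "the min of 3 estimators" is assumed to mean the difference with the smallest magnitude.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Hypothetical per-benchmark results for one run (times in seconds).
data Estimates = Estimates
  { regression :: Double
  , median     :: Double
  , mean       :: Double
  }

-- Percentage change from the baseline run to the new run;
-- positive means slower (a regression), negative means faster.
pctDiff :: Double -> Double -> Double
pctDiff old new = (new - old) / old * 100

-- Compare the runs with each estimator and keep the change that is
-- smallest in magnitude, so noise in one estimator alone is not reported.
diffRuns :: Estimates -> Estimates -> Double
diffRuns old new =
  minimumBy (comparing abs)
    [pctDiff (f old) (f new) | f <- [regression, median, mean]]

-- Flag a regression only when it exceeds a threshold, in percent.
isRegression :: Double -> Estimates -> Estimates -> Bool
isRegression threshold old new = diffRuns old new > threshold

main :: IO ()
main = do
  let old = Estimates 2.0 2.0 2.0
      new = Estimates 3.0 2.5 4.0
  print (diffRuns old new, isRegression 10 old new)
```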
Reporting Regressions
(% Diff)
Reporting Regressions
(Absolute Delta)
Comparing Packages
• bench-show can group benchmarks arbitrarily and
compare the groups.

• streaming-benchmarks package uses this to compare
various streaming libraries.
Monadic Streaming
Pure Streaming (Time)
Pure Streaming (Space)
References
• https://github.com/composewell/streamly

• https://github.com/composewell/streaming-benchmarks

• https://github.com/composewell/bench-show

• https://github.com/vincenthz/hs-gauge

• https://github.com/composewell/unicode-transforms
Thank You
harendra.kumar@gmail.com
@hk_hooda

How to write a program in any programming languageHow to write a program in any programming language
How to write a program in any programming language
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
 
Odoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Odoo ERP Vs. Traditional ERP Systems – A Comparative AnalysisOdoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Odoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
 

High Performance Haskell

  • 2. Harendra Kumar (Composewell Technologies)
    ‣ More than a decade of systems programming in C
    ‣ Writing Haskell for the last three years
    ‣ Currently focusing on streamly, an ambitious project that aims to make programming practical systems in Haskell a joy while ensuring C-like performance.
  • 3. Haskell Performance
    ‣ Performance can easily be off by 10x or 100x from the best achievable
    ‣ Refactoring can easily affect performance
    ‣ You cannot be confident unless you measure
    ‣ Best practices can easily get you into the ballpark
    ‣ Squeezing out the last drop may be harder
    ‣ With some effort, you can get close to C or even do better
  • 5. Unicode Normalization: A Case Study
    ‣ Challenge: can we do Unicode normalization as fast as, or faster than, the best C++ library (ICU)?
  • 6. The Problem
    ‣ A Unicode character may have multiple forms (composed or decomposed):
      ‣ Åström (U+00C5 U+0073 U+0074 U+0072 U+00F6 U+006D)
      ‣ Åström (U+0041 U+030A U+0073 U+0074 U+0072 U+006F U+0308 U+006D)
    ‣ To compare strings, we need to bring them to a common normal form (e.g. NFC or NFD).
  • 7. Normalization Form Decomposed (NFD)
    ‣ A sequence of chars:
      ‣ Starter, Starter, Combining1, Combining2, … Starter, Combining1, …
    ‣ Look up each character:
      ‣ Does it have a decomposition?
      ‣ If so, replace it with its components
    ‣ Look up its combining class:
      ‣ 0 => starter, non-zero => combining
    ‣ Reorder multiple adjacent combining chars by combining class
  • 8. Unicode Character Database
    ‣ Lookup maps:
      ‣ Decomposition map, ~2000 entries
      ‣ Combining class map, ~1000 entries
    ‣ Algorithmic decomposition of Hangul characters
  • 9. Naive, Elegant Code
    ‣ Normalization in ~50 lines of core code
    ‣ Use IntMap for database lookup
    ‣ Use Haskell lists for processing
    ‣ Idiomatic code
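As a rough illustration of the naive approach, here is a minimal sketch of NFD built from IntMap lookups and plain lists. It is not the original code: decompMap and ccMap are tiny hypothetical excerpts standing in for the full Unicode Character Database maps, and Hangul syllables are not handled.

```haskell
import Data.Char (ord)
import qualified Data.IntMap.Strict as IntMap
import Data.List (sortOn)

-- Tiny hypothetical excerpt of the canonical decomposition map
-- (code point -> components, which may themselves decompose further).
decompMap :: IntMap.IntMap String
decompMap = IntMap.fromList
  [ (0x00C5, "A\x030A")       -- Å => A + combining ring above
  , (0x00F6, "o\x0308")       -- ö => o + combining diaeresis
  , (0x1E63, "s\x0323")       -- ṣ => s + combining dot below
  , (0x1E69, "\x1E63\x0307")  -- ṩ => ṣ + combining dot above
  ]

-- Tiny hypothetical excerpt of the combining class map; absent means 0 (starter).
ccMap :: IntMap.IntMap Int
ccMap = IntMap.fromList [(0x0307, 230), (0x0308, 230), (0x030A, 230), (0x0323, 220)]

combiningClass :: Char -> Int
combiningClass c = IntMap.findWithDefault 0 (ord c) ccMap

-- Decomposition is recursive: components may decompose again.
decompose :: Char -> String
decompose c = maybe [c] (concatMap decompose) (IntMap.lookup (ord c) decompMap)

-- Stable-sort every maximal run of combining characters by combining class.
reorder :: String -> String
reorder [] = []
reorder s@(c : cs)
  | combiningClass c == 0 = c : reorder cs
  | otherwise =
      let (run, rest) = span ((/= 0) . combiningClass) s
      in sortOn combiningClass run ++ reorder rest

nfd :: String -> String
nfd = reorder . concatMap decompose
```

With these tables, both spellings of Åström from slide 6 normalize to the same string: nfd "\x00C5str\x00F6m" == nfd "A\x030Astro\x0308m".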
  • 13. Fast Path Decomposition Lookup
  • 14. Decomposition
    ‣ Decomposition is recursive
    ‣ Decompose recursively with simple recursion instead of iterate and zip-with-tail idioms.
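The slide does not show the original code, so the following is a hypothetical reconstruction of the two styles using a tiny made-up table. The idiom-heavy version applies one level of decomposition to the whole string with iterate and finds the fixpoint by zipping the sequence with its own tail; the direct version simply recurses on each component.

```haskell
-- Tiny hypothetical one-level decomposition table, for illustration only.
table :: [(Char, String)]
table = [('\x1E69', "\x1E63\x0307"), ('\x1E63', "s\x0323")]

-- One level of decomposition for a single character.
step :: Char -> String
step c = case lookup c table of
  Just ds -> ds
  Nothing -> [c]

-- Idiom style: iterate one level at a time over the whole string and stop
-- at the first fixpoint, detected by zipping the sequence with its tail.
decomposeIdiom :: String -> String
decomposeIdiom s =
  case dropWhile (uncurry (/=)) (zip xs (drop 1 xs)) of
    (done, _) : _ -> done
    []            -> s  -- unreachable: iterate produces an infinite list
  where
    xs = iterate (concatMap step) s

-- Simple recursion: decompose each component until nothing is left to do.
decomposeRec :: Char -> String
decomposeRec c = case lookup c table of
  Just ds -> concatMap decomposeRec ds
  Nothing -> [c]
```

The idiom version re-traverses and compares the whole string on every round, which is one plausible reason direct recursion is cheaper.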
  • 15. Fast Path Reordering
    ‣ Original code:
      ‣ Split into groups, then sortBy combining class (CC)
      ‣ "SCCSCC" => [(S,0), (C,10), (C,11)], [(S,0), (C,5), (C,6)]
    ‣ Optimized code: use custom sorting when the sort group has 1 or 2 elements, and fall back to the regular list sort for the rest.
    ‣ Use a bitmap for a quick combining/non-combining check, since non-combining is the common case.
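A minimal sketch of the custom sorting idea, assuming a run is represented as a list of (char, combining class) pairs; the Run type and sortRun name are hypothetical, and the bitmap-based combining check is not shown.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- A run of combining characters paired with their combining classes.
type Run = [(Char, Int)]

-- Sort a run by combining class. Runs of length 1 or 2 are the common
-- cases, so they are handled by pattern matching; longer runs fall back
-- to the general (stable) list sort.
sortRun :: Run -> Run
sortRun [] = []
sortRun r@[_] = r
sortRun r@[x@(_, cx), y@(_, cy)]
  | cx > cy   = [y, x]
  | otherwise = r  -- equal classes keep their input order
sortRun r = sortBy (comparing snd) r
```

Because Data.List.sortBy is stable, the fallback preserves the order of characters with equal combining class, and the two-element case matches it by swapping only on a strictly greater class.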
  • 16. Monolithic Decompose and Reorder
    ‣ Original code: reorder . decompose
    ‣ Optimized code: decomposeAndReorder reorderBuffer
    ‣ In the common case the buffer holds just one char, which gets flushed when the next char arrives.
    ‣ We need to sort the buffer only when it holds more than one combining char.
    ‣ Use custom sorting for the two-char case.
    ‣ Do not use string append for the reorder buffer; manually deconstruct and reconstruct the list for the short, common cases (10% improvement).
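The fused pass can be sketched roughly as follows, again with tiny hypothetical tables. This is one possible realization of the idea, not the original implementation: pending combining characters sit in a short list that grows without (++) in the common cases and is sorted only when it holds two or more characters.

```haskell
import Data.List (sortOn)

-- Tiny hypothetical tables, for illustration only.
decompTable :: [(Char, String)]
decompTable =
  [ ('\x00C5', "A\x030A"), ('\x00F6', "o\x0308")
  , ('\x1E63', "s\x0323"), ('\x1E69', "\x1E63\x0307") ]

ccTable :: [(Char, Int)]
ccTable = [('\x0307', 230), ('\x0308', 230), ('\x030A', 230), ('\x0323', 220)]

combiningClass :: Char -> Int
combiningClass c = maybe 0 id (lookup c ccTable)

decompose :: Char -> String
decompose c = maybe [c] (concatMap decompose) (lookup c decompTable)

-- Single pass: decompose each char; pending combining chars wait in a
-- reorder buffer that is flushed (sorted) when the next starter arrives.
decomposeAndReorder :: String -> String
decomposeAndReorder = go []
  where
    go buf []       = flush buf
    go buf (c : cs) = feed buf (decompose c) cs

    feed buf [] cs = go buf cs
    feed buf (d : ds) cs
      | combiningClass d == 0 = flush buf ++ d : feed [] ds cs
      | otherwise             = feed (snoc buf d) ds cs

    -- Grow the buffer without (++) for the short, common cases.
    snoc [] d  = [d]
    snoc [x] d = [x, d]
    snoc xs d  = xs ++ [d]

    -- Sort only with two or more chars; swap by hand for exactly two.
    flush []  = []
    flush [x] = [x]
    flush [x, y]
      | combiningClass x > combiningClass y = [y, x]
      | otherwise                           = [x, y]
    flush xs = sortOn combiningClass xs
```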
  • 17. Hangul Jamo Normalization
    ‣ Use algorithmic decomposition as prescribed by the Unicode standard, instead of a simple lookup-based approach.
    ‣ Mark the Hangul Jamo case NOINLINE, since it is not on the fast path
    ‣ Use quot/rem instead of div/mod
    ‣ Use quotRem instead of separate quot/rem
    ‣ Use unsafeChr instead of chr
    ‣ Use strict values in list buffers
    ‣ Use tuples instead of lists for returning short buffers
    ‣ Localize recursion to the non-Hangul case
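The algorithmic Hangul decomposition is defined in the Unicode standard (section 3.12, "Conjoining Jamo Behavior"). Below is a sketch applying several of the points above, assuming GHC: unsafeChr comes from GHC.Base and performs no range check, so callers must test isHangul first. The function names are hypothetical.

```haskell
import Data.Char (ord)
import GHC.Base (unsafeChr)

-- Constants of the Unicode Hangul syllable decomposition algorithm.
sBase, lBase, vBase, tBase, lCount, vCount, tCount, nCount, sCount :: Int
sBase  = 0xAC00
lBase  = 0x1100
vBase  = 0x1161
tBase  = 0x11A7
lCount = 19
vCount = 21
tCount = 28
nCount = vCount * tCount  -- 588
sCount = lCount * nCount  -- 11172

-- Cheap range check, suitable for inlining into the fast path.
{-# INLINE isHangul #-}
isHangul :: Char -> Bool
isHangul c = n >= sBase && n < sBase + sCount
  where n = ord c

-- Slow path kept out of line; returns a tuple instead of a list.
-- The trailing consonant is optional. Precondition: isHangul c.
{-# NOINLINE decomposeHangul #-}
decomposeHangul :: Char -> (Char, Char, Maybe Char)
decomposeHangul c = (unsafeChr (lBase + l), unsafeChr (vBase + v), trailing)
  where
    (l, vt)  = (ord c - sBase) `quotRem` nCount
    (v, t)   = vt `quotRem` tCount
    trailing = if t == 0 then Nothing else Just (unsafeChr (tBase + t))
```

For example, decomposeHangul '\xD55C' (한) yields ('\x1112', '\x1161', Just '\x11AB').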
  • 18. Where are we? (C++/Haskell/C)
  • 19. Can we do better?
    ‣ Remember, we are still using plain Haskell strings! Let's do some minimal experiments to test the limits:
      ‣ 17 ms: stringOp = map (chr . (+ 1) . ord)
      ‣ 11 ms: textOp = T.map (chr . (+ 1) . ord)
      ‣ 4.0 ms: textOp = T.unstream . T.stream
      ‣ 2.7 ms: ICU English normalization
      ‣ 1.3 ms: T.unstream . T.stream with Text's unstream code fixed (realloc code marked NOINLINE)
  • 20. Let's Apply This
    ‣ Use Text with stream/unstream instead of strings
    ‣ Readjust conditional branches to favor the fast path
    ‣ Inlining:
      ‣ INLINE the isCombining check (+16%)
      ‣ Add NOINLINE to slow-path code
    ‣ -funbox-strict-fields
  • 21. Optimize Reorder Buffer
    ‣ Instead of a list, use a custom data type optimized for the fast-path cases:
        data Buffer = Empty | One {-# UNPACK #-} !Char | Many [Char]
    ‣ Use a mutable reorder buffer
    ‣ +5%
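A sketch of how such a Buffer type might be used, with hypothetical push and flush operations and a tiny made-up combining class table. Only the Many case ever pays for a sort; the mutable variant mentioned on the slide is not shown.

```haskell
import Data.List (sortOn)

-- Reorder buffer with dedicated constructors for the common short cases;
-- the single char is stored unboxed, directly in the constructor.
data Buffer = Empty | One {-# UNPACK #-} !Char | Many [Char]
  deriving (Eq, Show)

-- Tiny hypothetical combining class table, for illustration only.
combiningClass :: Char -> Int
combiningClass c = case c of
  '\x0307' -> 230
  '\x0308' -> 230
  '\x0323' -> 220
  _        -> 0

-- Add a combining char to the buffer.
push :: Buffer -> Char -> Buffer
push Empty c     = One c
push (One x) c   = Many [x, c]
push (Many xs) c = Many (xs ++ [c])

-- Emit the buffered chars in canonical order; only Many needs sorting.
flush :: Buffer -> String
flush Empty     = []
flush (One c)   = [c]
flush (Many xs) = sortOn combiningClass xs
```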
  • 22. Where are we now? (C++/Haskell)
  • 24. We can do better
    ‣ We can use a non-decomposable starter lookup for the fast path. It will cut common-case lookups by half.
    ‣ We have not tried hash lookups
    ‣ The ICU C++ library uses Unicode quick check properties for optimization; we can do the same to optimize further at the algorithmic level.
    ‣ GHC's code generation can possibly be improved. I raised a couple of tickets about it.
  • 25. Lessons
    ‣ Using Haskell, we can quickly write concise code with acceptable performance.
    ‣ The code can be optimized to perform as well as C.
    ‣ Most of the optimizations we made were algorithmic and logic related rather than language related, mostly custom handling of the fast path.
    ‣ The most common language-related optimizations are INLINE annotations. The others are mostly of the last-drop-squeezing kind.
  • 27. Ground Rules
    ‣ MEASURE: define proper benchmarks
    ‣ ANALYZE: benchmarks may be wrong
    ‣ OPTIMIZE
      ‣ Algorithmic optimization first
      ‣ Biggest gain first
      ‣ Optimize where it matters (fast path)
    ‣ DEBUG
      ‣ Narrow down by incremental elimination
      ‣ Narrow down by incremental addition
    ‣ RATCHET: don't lose the hard work spent discovering issues
  • 28. The three musketeers
    1. INLINE
    2. SPECIALIZE
    3. STRICTIFY
  • 30. Inlining
    ‣ Instead of making a function call, expand the definition of the function at the call site.
  • 31. Inlining (Definition Site)
    ‣ For inlining or specialization to occur in another module, the original RHS of a function must be recorded in the interface file (.hi).
    ‣ By default, GHC may or may not choose to keep the original RHS in the interface file.
    ‣ INLINABLE => direct the compiler to record the original RHS of the function in the interface file (.hi).
    ‣ INLINE => like INLINABLE, but also direct the compiler to actually inline the function at all call sites.
    ‣ -fexpose-all-unfoldings is a way to mark everything INLINABLE.
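A minimal illustration of the two definition-site pragmas; the functions are hypothetical examples. The pragmas matter most when these definitions live in a library module and are used from another module.

```haskell
-- Small, hot predicate: INLINE records the original RHS in the .hi file
-- and asks GHC to inline it at every (saturated) call site.
{-# INLINE isAsciiDigit #-}
isAsciiDigit :: Char -> Bool
isAsciiDigit c = c >= '0' && c <= '9'

-- Larger polymorphic function: INLINABLE only records the original RHS, so
-- importing modules can inline or specialize it, but GHC still decides.
{-# INLINABLE sumDigits #-}
sumDigits :: Num a => String -> a
sumDigits = foldr step 0
  where
    step c acc
      | isAsciiDigit c = fromIntegral (fromEnum c - fromEnum '0') + acc
      | otherwise      = acc
```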
  • 32. Inlining (Call Site)
    ‣ Prerequisite: the function's original RHS must be available in the interface file.
    ‣ If the function was marked INLINE at the definition site, it is inlined unconditionally.
    ‣ If the function was not marked INLINE, the inline function (from GHC.Exts) can be used to ask the compiler to inline it unconditionally.
    ‣ Otherwise, GHC decides whether or not to inline. See the -funfolding-* and -fmax-inline-* options to control this.
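A minimal sketch of call-site inlining with the inline function from GHC.Exts; square and sumSquares are hypothetical. inline only has an effect when the function's unfolding is available, which is always the case within the same module.

```haskell
import GHC.Exts (inline)

-- Deliberately not marked INLINE at its definition site.
square :: Int -> Int
square x = x * x

-- At this call site, ask GHC to inline square unconditionally; this works
-- because square's unfolding is available (same module).
sumSquares :: [Int] -> Int
sumSquares = foldr (\x acc -> inline square x + acc) 0
```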
  • 33. When inlining cannot occur
    ‣ The function is not fully applied
    ‣ The function is passed as an argument to a function that is itself not inlined
    ‣ The function is self-recursive
    ‣ For mutually recursive functions, GHC tries not to use a function with an INLINE pragma as the loop breaker.
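A common way around the self-recursion restriction, sketched here with hypothetical functions, is to wrap the recursive loop in a non-recursive function: GHC cannot inline the recursive sumToRec, but it can inline the wrapper sumTo, carrying the local loop into each call site. GHC considers a call fully applied only when it supplies as many arguments as appear on the left-hand side of the definition, which is why sumTo is defined point-free.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Self-recursive: GHC makes it its own loop breaker, so it would not be
-- inlined even if it carried an INLINE pragma.
sumToRec :: Int -> Int -> Int
sumToRec !acc n
  | n <= 0    = acc
  | otherwise = sumToRec (acc + n) (n - 1)

-- Non-recursive wrapper around a local loop: the wrapper can be inlined,
-- taking the loop with it. With no arguments on the left-hand side,
-- any call counts as fully applied.
{-# INLINE sumTo #-}
sumTo :: Int -> Int
sumTo = go 0
  where
    go !acc n
      | n <= 0    = acc
      | otherwise = go (acc + n) (n - 1)
```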
  • 34. When an INLINE is missing
        func :: String -> Stream IO Int -> Benchmark
        func name f = bench name $ nfIO $ S.mapM_ (\_ -> return ()) f
    • Without an INLINE on func the benchmark takes 50 ms; with INLINE it takes 500 µs, 100x faster.
    • Without marking func INLINE, f cannot be inlined and cannot fuse with mapM_. So we need an INLINE on both func and f.
    • Code that depends on fusion is especially sensitive to inlining, because fusion depends on inlining.
    • CPS code is more robust against missing inlining. Direct-style code may perform much worse than CPS when an INLINE goes missing; however, with proper inlining it can be much faster than CPS.
  • 35. NOINLINE for better performance!
    • Many people find it counterintuitive, and even the GHC manual says you should never need it, but it is pretty common to get modest performance gains by using NOINLINE.
    • Moving the slow-path branch out of the way into a separate function marked NOINLINE helps the fast-path branch execute more efficiently.
    • The noinline function can also be used to prevent inlining of a particular call.
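A minimal sketch of the fast path/slow path split, using a hypothetical character-escaping function: the cheap ASCII test is INLINE, while the rarely taken branch lives in a separate NOINLINE function.

```haskell
import Data.Char (ord)
import Numeric (showHex)

-- Fast path: ASCII needs no work and is by far the most common input.
{-# INLINE escapeChar #-}
escapeChar :: Char -> String
escapeChar c
  | ord c < 0x80 = [c]
  | otherwise    = escapeSlow c

-- Slow path kept out of line, so inlining escapeChar copies only the
-- cheap test and a call, not this code.
{-# NOINLINE escapeSlow #-}
escapeSlow :: Char -> String
escapeSlow c = "\\u" ++ replicate (4 - length hex) '0' ++ hex
  where
    hex = showHex (ord c) ""

escape :: String -> String
escape = concatMap escapeChar
```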
  • 37. Specializing
    ‣ Instead of calling a polymorphic version of a function, make a copy specialized to less polymorphic types:
        {-# SPECIALIZE consM :: IO a -> Stream IO a -> Stream IO a #-}
        consM :: Monad m => m a -> Stream m a -> Stream m a
        consM = consMSerial
  • 38. Specializing (Definition Site)
    ‣ INLINABLE => direct the compiler to record the original RHS of the function in the interface file (.hi). The function can then be specialized where it is imported, using SPECIALIZE.
    ‣ SPECIALIZE => direct the compiler to specialize a function at the given type and use that version wherever applicable.
    ‣ SPECIALIZE instance => direct the compiler to specialize a type class instance at the given type.
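A minimal sketch combining the definition-site pragmas on a hypothetical function. Callers at type [Int] can use the specialized copy, so no Num dictionary is passed at run time; other types use the polymorphic version or, thanks to INLINABLE, a specialization made in the importing module.

```haskell
{-# LANGUAGE BangPatterns #-}

-- INLINABLE keeps the original RHS in the .hi file, so importing modules
-- can specialize it at their own types; SPECIALIZE additionally compiles a
-- dedicated [Int] -> Int copy in this module.
{-# INLINABLE sumOfSquares #-}
{-# SPECIALIZE sumOfSquares :: [Int] -> Int #-}
sumOfSquares :: Num a => [a] -> a
sumOfSquares = go 0
  where
    go !acc []       = acc
    go !acc (x : xs) = go (acc + x * x) xs
```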
  • 39. Specializing (Call Site)
    ‣ Prerequisite: the function's original RHS must be available in the interface file. INLINE or INLINABLE can be used to ensure that.
    ‣ SPECIALIZE => direct the compiler to specialize an imported function at the given type for this module.
    ‣ GHC may automatically specialize local functions, as well as imported functions whose RHS is available in the interface file. See also -fspecialise-aggressively.
  • 40. Call Pattern Specialization (Recursive Functions)
    ‣ The GHC option -fspec-constr specializes a recursive function for the different constructor cases of its argument.
    ‣ Add a SPEC argument to the function and keep it strict, to direct the compiler to perform spec-constr aggressively.
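A minimal sketch of the SPEC idiom, assuming GHC with -O2 (which enables -fspec-constr); sumAndCount and Acc are hypothetical. SPEC comes from GHC.Exts, and the loop takes it as a strict first argument. Whether GHC actually produced the specialized loop can be checked in the Core output (-ddump-simpl).

```haskell
{-# LANGUAGE BangPatterns #-}
import GHC.Exts (SPEC (..))

-- Loop state packed in a constructor that is rebuilt on every iteration:
-- the kind of argument call-pattern specialization can eliminate.
data Acc = Acc !Int !Int

-- The extra SPEC argument, matched strictly, asks SpecConstr to
-- specialize this loop aggressively.
sumAndCount :: [Int] -> (Int, Int)
sumAndCount = go SPEC (Acc 0 0)
  where
    go !_ (Acc s n) []          = (s, n)
    go !sPEC (Acc s n) (x : xs) = go sPEC (Acc (s + x) (n + 1)) xs
```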
  • 41. When specialization cannot occur
    ‣ The function is not fully applied (unsaturated calls)
    ‣ The function calls other functions that cannot be specialized
    ‣ The function uses polymorphic recursion
    ‣ The -Wmissed-specialisations and -Wall-missed-specialisations GHC options can be useful.
  • 43. Strictness
    • Do not keep lazy expressions in memory when they will be reduced eventually anyway; reduce them as soon as possible.
    • Keeping them lazy may be inefficient and consume more memory; more importantly, it makes GC expensive.
    • As a general rule, be lazy for construction and transformation, and strict for reduction. Laziness helps when you are processing something; strictness helps when you are storing or buffering.
    • Use a strict accumulator for strict left folds.
    • Use strict record fields for records used for buffered storage.
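The strict-accumulator and strict-field advice can be combined in a small example (the mean function is hypothetical). foldl' only forces the accumulator to weak head normal form, so without the bangs on Acc's fields the sums would still pile up as thunks inside the record.

```haskell
import Data.List (foldl')

-- Strict fields: the running totals are evaluated whenever an Acc is
-- built, so no chain of (+) thunks can accumulate inside the record.
data Acc = Acc !Double !Int

-- foldl' forces the accumulator to WHNF at every step; thanks to the
-- strict fields, that means fully evaluated totals.
mean :: [Double] -> Double
mean xs =
  case foldl' step (Acc 0 0) xs of
    Acc _ 0 -> 0
    Acc s n -> s / fromIntegral n
  where
    step (Acc s n) x = Acc (s + x) (n + 1)
```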
  • 44. Strictify and Unbox
    • Bang patterns (the BangPatterns extension) mark function arguments strict, and bang annotations mark constructor fields strict, i.e. reduced when applied.
    • Strict function application: $!
    • Use the UNPACK pragma to keep strict constructor fields unboxed.
    • -funbox-strict-fields is often useful
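A small, hypothetical example of the three tools together: strict unpacked fields, bang patterns on loop arguments, and ($!). Unpacking is an optimization, so it is only expected to take effect when compiling with -O.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Strict, unpacked fields: with optimization on, both Ints are stored
-- directly in the constructor instead of as pointers to boxed Ints.
-- (-funbox-strict-fields does this for every strict field.)
data Range = Range {-# UNPACK #-} !Int {-# UNPACK #-} !Int

-- Bang patterns force the loop arguments on every call, so acc never
-- grows into a chain of thunks.
sumRange :: Range -> Int
sumRange (Range lo hi) = go 0 lo
  where
    go !acc !i
      | i > hi    = acc
      | otherwise = go (acc + i) (i + 1)

-- ($!) evaluates its argument to WHNF before applying the function, so
-- the action returns an evaluated Int rather than a thunk.
sumRangeIO :: Range -> IO Int
sumRangeIO r = return $! sumRange r
```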
  • 45. Measurement
    ‣ Focus on tests in C, benchmarks in Haskell
  • 46. Benchmarking Tools
    • gauge vs criterion
    • We faced several benchmarking issues while developing streamly and streaming-benchmarks.
    • We made significant improvements to gauge to address these issues.
    • We wrote the bench-show package for robust analysis, comparison and presentation of benchmarks.
  • 47. Benchmarking Pitfalls
    • Benchmarking code needs to be optimized exactly the way you would optimize the code being benchmarked.
    • A missing INLINE in benchmarking code could make a huge difference, invalidating the results.
    • Benchmarking relies on the rnf implementation; if that itself is slow (e.g. not marked INLINE), we may get false results. We encountered this problem at least once.
    • Multiple benchmarks can interfere with each other in ways that may not be easy to detect.
  • 48. Benchmarking Pitfalls
    • You may be measuring the cost of doing nothing, even with nfIO. To avoid this, we generate a random number in IO and pass it to the computation being benchmarked.
    • When measuring with nf f arg, remember that we are measuring f and not arg; arg may be evaluated once and reused.
  • 49. Gauge Improvements
    • Run each benchmark in isolation, in a separate process. This is a brute-force way to ensure there is no interference from other benchmarks. Correct maxrss measurement requires this.
    • Several correctness fixes to measure stats accurately.
    • Use getrusage to report many other stats, like maxrss, page faults and context switches. maxrss is especially useful for peak memory consumption data.
    • Added a --quick mode to run benchmarks quickly (10x faster).
  • 50. Gauge Improvements
    • Provides raw data for each iteration in a CSV file for external analysis. This is used by bench-show.
    • Better control over the measurement process from the CLI
    • nfAppIO and whnfAppIO for more reliable measurements, contributed by rubenpieters.
  • 52. Benchmarking Business
    • streamly is a high-performance monadic streaming framework that generalizes lists to monads, with inherent concurrency support.
    • When a single missing INLINE can degrade performance by 100x, how do we guarantee performance?
    • Measure everything. We have hundreds of benchmarks; each and every op is benchmarked.
    • With such a large number of benchmarks, how do we analyze the benchmarking output?
  • 53. Enter bench-show
    • Analyzes the results using 3 statistical estimators: linear regression, median and mean
    • Finds the difference between two runs and reports the minimum across the 3 estimators
    • Computes the percentage regression or improvement
    • Sorts and reports by the highest regression, for time as well as space.
    • Can automatically report regressions on each commit, using a threshold.
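A minimal sketch of the "minimum across the three estimators" idea. This is not bench-show's actual API: the Estimates type and the function names are hypothetical, and we assume "minimum" means the percentage difference closest to zero, which keeps the report conservative.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Hypothetical time estimates for one benchmark from a single run.
data Estimates = Estimates
  { regression :: Double  -- linear-regression estimate
  , median     :: Double
  , mean       :: Double
  }

-- Percentage change from the baseline (positive means slower).
percentChange :: Double -> Double -> Double
percentChange base new = (new - base) * 100 / base

-- Compare two runs with all three estimators and keep the difference
-- closest to zero, so noise in one estimator cannot inflate the result.
minDiff :: Estimates -> Estimates -> Double
minDiff base new =
  minimumBy (comparing abs)
    [percentChange (f base) (f new) | f <- [regression, median, mean]]

-- Flag a regression only when even the most conservative difference
-- exceeds the threshold (in percent).
isRegression :: Double -> Estimates -> Estimates -> Bool
isRegression threshold base new = minDiff base new > threshold
```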
  • 56. Comparing Packages
    • bench-show can group benchmarks arbitrarily and compare the groups.
    • The streaming-benchmarks package uses this to compare various streaming libraries.
  • 60. References
    • https://github.com/composewell/streamly
    • https://github.com/composewell/streaming-benchmarks
    • https://github.com/composewell/bench-show
    • https://github.com/vincenthz/hs-gauge
    • https://github.com/composewell/unicode-transforms