© 2019 Composewell Technologies
Streamly: Concurrent Data Flow Programming
Harendra Kumar
15 Nov 2019
Composewell Technologies
About Streamly
Ergonomics of Shell
Safety of Haskell
Speed of C
Magical Concurrency
https://github.com/composewell/streamly
About Me
• C programming
• OS kernel, file systems
• Haskell since 2015
harendra@composewell.com
@hk_hooda
About Composewell
• Develops Streamly
• Provides commercial support
• Designs solutions using Streamly
• Provides training on Streamly
http://www.composewell.com
hello@composewell.com
“Simplicity is a great virtue but it
requires hard work to achieve it
and education to appreciate it. And
to make matters worse: complexity
sells better.”
— Edsger Wybe Dijkstra
Streams
What is a stream?
• A sequence of items of the same type
• Combinators to process the sequence
Pure    Effectful
List    Stream (Effectful List)
Imperative    Functional
Loop          Stream (Modular Loops)
Do I need streams?
• Imperative version: Do I need loops?
Streamly
=
Efficient, Composable and
Concurrent Loops
Who should use Streamly?
• General purpose framework
• Declarative concurrency
• High performance, nearing or beating C
• Some examples:
• Server backends
• Reactive programming
• Real time data analysis
• Streaming data applications
Streamly At a Glance
Engineering Focused Library
(Goals)
Ergonomics      Python and Shell
Performance     C
Composability   Haskell
Safety          Haskell
Simplicity      Use only basic Haskell
Composability
+
Performance
Streamly Version
• Examples in this presentation use
streamly-0.7.0
• Some examples may use APIs from
Streamly.Internal.* modules
• Source code for examples can be found at:
https://github.com/composewell/streamly-examples
Fundamental Operations
(Conceptual)
Operation    Shape
Generate     a -> Stream m b
Transform    Stream m a -> Stream m b
Eliminate    Stream m a -> m b
Fundamental Data Types
(Actual)
Type            Constructor
Unfold m a b    forall s. Unfold (s -> m (Step s b)) (a -> m s)
                (s: state; s -> m (Step s b): generate; a -> m s: inject)
Stream m a      forall s. Stream (s -> m (Step s a)) s
                (s: state; s -> m (Step s a): generate; s: initial state)
Fold m a b      forall s. Fold (s -> a -> m s) (m s) (s -> m b)
                (s: state; s -> a -> m s: accumulate; m s: initial; s -> m b: extract)
Core Modules
Type      Module                  Abbrev.
Unfold    Streamly.Data.Unfold    UF
Stream    Streamly.Prelude        S
Fold      Streamly.Data.Fold      FL
unfold and fold Functions
• S.unfold :: Unfold m a b -> a -> Stream m b
• S.fold :: Fold m b c -> Stream m b -> m c
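The shapes of Unfold, Stream and Fold can be modelled in a few lines of plain Haskell. The following is a minimal sketch using only base, not Streamly's actual implementation: Step is reduced to Yield and Stop, and the names unfold, fold, fromListU and sumF are local stand-ins (assumptions for illustration) for S.unfold, S.fold, UF.fromList and FL.sum.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
module Main where

import Data.Functor.Identity (Identity (..))

-- One step of a stream state machine (simplified to two cases).
data Step s a = Yield a s | Stop

-- Local models that follow the constructor shapes on the slide.
data Unfold m a b = forall s. Unfold (s -> m (Step s b)) (a -> m s)
data Stream m a   = forall s. Stream (s -> m (Step s a)) s
data Fold m a b   = forall s. Fold (s -> a -> m s) (m s) (s -> m b)

-- Turn a seed into a stream: the state starts as Nothing, and the first
-- step runs the inject action to obtain the real initial state.
unfold :: Monad m => Unfold m a b -> a -> Stream m b
unfold (Unfold step inject) seed = Stream step' Nothing
  where
    step' Nothing = inject seed >>= step' . Just
    step' (Just s) = do
        r <- step s
        return $ case r of
            Yield x s' -> Yield x (Just s')
            Stop -> Stop

-- Drive the stream to completion, feeding every element to the fold.
fold :: Monad m => Fold m b c -> Stream m b -> m c
fold (Fold accum initial extract) (Stream step s0) = initial >>= go s0
  where
    go s acc = do
        r <- step s
        case r of
            Yield x s' -> accum acc x >>= go s'
            Stop -> extract acc

-- Stand-in for UF.fromList: the state is the remaining list.
fromListU :: Monad m => Unfold m [a] a
fromListU = Unfold step return
  where
    step (x : xs) = return (Yield x xs)
    step [] = return Stop

-- Stand-in for FL.sum, with a strict accumulator.
sumF :: (Monad m, Num a) => Fold m a a
sumF = Fold (\acc x -> return $! acc + x) (return 0) return

main :: IO ()
main = print (runIdentity (fold sumF (unfold fromListU [1 .. 10 :: Int])))
-- prints 55
```

Composing unfold with fold yields a plain function of type a -> m c, which is why the pair can be read as a loop: the unfold supplies the loop's generator and the fold supplies its accumulator.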
unfold & fold
S.unfold UF.fromList [1..10] -- unfold
& S.fold FL.sum -- fold
unfold & fold = Action
a --unfold--> Stream m b --fold--> m c
a -> m c
unfold & fold = Loop!
a --> Stream m b --> m c
a -> m c
Stream Types
Type           Coercion Modifier
SerialT m a    serially
AsyncT m a     asyncly
AheadT m a     aheadly
Summing an Int Stream
S.unfold UF.fromList [1..10] -- SerialT Identity Int
& S.fold FL.sum -- Identity Int
Summing an Int Stream
(Better Ergonomics)
S.fromList [1..10] -- SerialT Identity Int
& S.sum -- Identity Int
S.fromList = S.unfold UF.fromList
S.sum = S.fold FL.sum
File IO Examples
IO Modules
Module                          Abbrev.
Streamly.FileSystem.Handle      FH
Streamly.FileSystem.File        File
Streamly.Network.Socket         SK
Streamly.Network.Inet.TCP       TCP
Streamly.Data.Unicode.Stream    U
cat
File.toChunks "inFile" -- SerialT IO (Array Word8)
& FH.putChunks -- IO ()
File Copy (cp)
File.toChunks "inFile" -- SerialT IO (Array Word8)
& File.fromChunks "outFile" -- IO ()
Count bytes in a File
(wc -c)
File.toBytes "inFile" -- SerialT IO Word8
& S.fold FL.length -- IO Int
Count Lines in a File
(wc -l)
File.toBytes "inFile" -- SerialT IO Word8
& S.fold nlines -- IO Int
countl :: Int -> Word8 -> Int
countl n ch = if (ch == 10) then n + 1 else n
nlines :: Monad m => Fold m Word8 Int
nlines = FL.mkPure countl 0 id
Count Words in a File
(wc -w)
File.toBytes "inFile" -- SerialT IO Word8
& S.fold nwords -- IO Int
countw :: (Int, Bool) -> Word8 -> (Int, Bool)
countw (n, wasSpace) ch =
if (isSpace $ chr $ fromIntegral ch)
then (n, True)
else (if wasSpace then n + 1 else n, False)
nwords :: Monad m => Fold m Word8 Int
nwords = FL.mkPure countw (0, True) fst
Splitting a Stream
(Composing Folds)
Count Bytes, Lines & Words
(wc -clw)
File.toBytes "inFile" -- SerialT IO Word8
& S.fold ((,,) <$> nlines <*> nwords <*> FL.length)
-- IO (Int, Int, Int)
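Composing folds with <$> and <*> lets all of them consume the input in a single pass. The following is a minimal sketch of how such an Applicative instance can work, using a pure Fold type defined locally with base only. It is an assumption for illustration: Streamly's Fold is monadic and its instance may differ in detail. The names runFold, lengthF and wc are hypothetical.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
module Main where

import Data.Char (chr, isSpace)
import Data.List (foldl')
import Data.Word (Word8)

-- A pure left fold: step function, initial state, extraction function.
data Fold a b = forall s. Fold (s -> a -> s) s (s -> b)

-- Strict pair holding the states of two folds that run side by side.
data Pair a b = Pair !a !b

instance Functor (Fold a) where
    fmap f (Fold step initial extract) = Fold step initial (f . extract)

instance Applicative (Fold a) where
    pure x = Fold (\() _ -> ()) () (const x)
    (Fold stepL initL extractL) <*> (Fold stepR initR extractR) =
        Fold step (Pair initL initR) extract
      where
        -- Each input element is fed to both folds in the same step.
        step (Pair l r) x = Pair (stepL l x) (stepR r x)
        extract (Pair l r) = extractL l (extractR r)

-- Run a fold over a list in one traversal.
runFold :: Fold a b -> [a] -> b
runFold (Fold step initial extract) = extract . foldl' step initial

lengthF :: Fold a Int
lengthF = Fold (\n _ -> n + 1) 0 id

-- Count newline bytes (10 is the ASCII code of '\n').
nlines :: Fold Word8 Int
nlines = Fold (\n ch -> if ch == 10 then n + 1 else n) 0 id

-- Count transitions from whitespace to non-whitespace.
nwords :: Fold Word8 Int
nwords = Fold countw (0, True) fst
  where
    countw (n, wasSpace) ch
        | isSpace (chr (fromIntegral ch)) = (n, True)
        | wasSpace = (n + 1, False)
        | otherwise = (n, False)

-- Lines, words and bytes, computed together.
wc :: Fold Word8 (Int, Int, Int)
wc = (,,) <$> nlines <*> nwords <*> lengthF

main :: IO ()
main = print (runFold wc (map (fromIntegral . fromEnum) "hello world\nfoo bar baz\n"))
-- prints (2,5,24)
```

The combined fold keeps one state per component fold, so adding another counter does not add another traversal of the input.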
Sending output to multiple files (tee)
File.toBytes "inFile" -- SerialT IO Word8
& S.fold (FL.tee (File.write "outFile1")
                 (File.write "outFile2"))
-- IO ((),())
Splitting a file
(split)
type SIO = StateT (Maybe (Handle, Int)) IO
splitFile :: FH.Handle -> IO ()
splitFile inHandle =
File.toBytes "inFile" -- SerialT IO Word8
& S.liftInner -- SerialT SIO Word8
& S.chunksOf2 64 newHandle FH.write2 -- SerialT SIO Word8
& S.evalStateT Nothing -- SerialT IO ()
& S.drain -- IO ()
Uses experimental APIs
Transformation Pipeline
Word Classifier
File.toBytes "inFile" -- SerialT IO Word8
& S.decodeLatin1 -- SerialT IO Char
& S.map toLower -- SerialT IO Char
& S.words FL.toList -- SerialT IO String
& S.filter (all isAlpha) -- SerialT IO String
& toHashMap -- IO (Map String (IORef Int))
See https://github.com/composewell/streamly/blob/master/examples/WordClassifier.hs
Adapted from an example by Patrick Thomson
alter Nothing = fmap Just $ newIORef (1 :: Int)
alter (Just ref) = do
modifyIORef' ref (+ 1)
return (Just ref)
toHashMap = S.foldlM' (flip (Map.alterF alter)) Map.empty
Word Size Histogram
bucket :: Int -> (Int, Int)
bucket n = let i = n `div` 10
           in if i > 9 then (9,n) else (i,n)
File.toBytes "inFile" -- SerialT IO Word8
& S.words FL.length -- SerialT IO Int
& S.map bucket -- SerialT IO (Int, Int)
& S.fold (FL.classify FL.length) -- IO (Map Int Int)
classify routes each (k,v) pair to a Map keyed by k and applies the length fold to the stream of values in each bucket.
Debugging a Pipeline
(trace/tap)
File.toBytes "inFile" -- SerialT IO Word8
& S.words FL.length -- SerialT IO Int
& S.map bucket -- SerialT IO (Int, Int)
& S.trace print -- SerialT IO (Int, Int)
& S.fold (FL.classify FL.length) -- IO (Map Int Int)
S.trace print prints each (k,v) pair as it flows through, leaving the stream unchanged; classify then routes each pair to a Map keyed by k and applies the length fold to the values in each bucket.
Combining Streams
(Composing Unfolds)
Appending N Streams
(cat dir/* > outfile)
Dir.toFiles dirname -- SerialT IO String
& S.concatUnfold File.read -- SerialT IO Word8
& File.fromBytes "outFile" -- IO ()
Outer Product
(Nested Loops)
mult :: (Int, Int) -> Int
mult (x, y) = x * y
from :: Monad m => Unfold m Int Int
from = UF.enumerateFromToIntegral 1000
cross :: Monad m => Unfold m (Int, Int) Int
cross =
UF.outerProduct from from
& UF.map mult
UF.fold cross FL.sum (1,1)
Better Replacement for ListT and LogicT
loops :: SerialT IO ()
loops = do
x <- S.fromList [1,2]
y <- S.fromList [3,4]
S.yieldM $ putStrLn $ show (x, y)
(1,3)
(1,4)
(2,3)
(2,4)
Declarative Concurrency
Lookup words
get :: String -> IO String
get s = liftIO (httpNoBody (parseRequest_ s)) >> return s
fetch :: String -> IO (String, String)
fetch w =
(,) <$> pure w <*> get ("https://www.google.com/search?q=" ++ w)
wordList :: [String]
wordList = ["cat", "dog", "mouse"]
meanings :: [IO (String, String)]
meanings = map fetch wordList
Serially
S.fromListM meanings -- SerialT IO (String, String)
& S.map show -- SerialT IO String
& FH.putStrings -- IO ()
Asynchronously
S.fromListM meanings -- AsyncT IO (String, String)
& asyncly -- SerialT IO (String, String)
& S.map show -- SerialT IO String
& FH.putStrings -- IO ()
Speculatively
(Look Ahead)
S.fromListM meanings -- AheadT IO (String, String)
& aheadly -- SerialT IO (String, String)
& S.map show -- SerialT IO String
& FH.putStrings -- IO ()
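The difference between serial and look-ahead evaluation can be illustrated with base alone. The following is a rough sketch of the idea behind aheadly (run the actions concurrently, but deliver results in input order), not Streamly's implementation: it starts every action eagerly, has no demand-based thread limits, and ignores exceptions. The names startAll, aheadList and slowEcho are hypothetical, and slowEcho stands in for the network call made by fetch.

```haskell
module Main where

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)

-- Start every action in its own thread; each thread writes its result
-- into a dedicated slot.
startAll :: [IO a] -> IO [MVar a]
startAll = mapM start
  where
    start act = do
        slot <- newEmptyMVar
        _ <- forkIO (act >>= putMVar slot)
        return slot

-- Look-ahead evaluation: all actions run concurrently, but the slots are
-- read in input order, so the results come back in input order.
aheadList :: [IO a] -> IO [a]
aheadList acts = startAll acts >>= mapM takeMVar

-- Hypothetical stand-in for a slow lookup: wait, then return the word
-- together with its length.
slowEcho :: Int -> String -> IO (String, Int)
slowEcho delayMs w = threadDelay (delayMs * 1000) >> return (w, length w)

main :: IO ()
main = do
    -- The slowest action is first, yet its result is still delivered first.
    rs <- aheadList [slowEcho 30 "cat", slowEcho 20 "dog", slowEcho 10 "mouse"]
    mapM_ print rs
```

An asyncly-style variant would instead deliver results in completion order, for example by having every thread write to one shared channel.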
Word Lookup Server
S.unfold TCP.acceptOnPort 8090 -- SerialT IO Socket
& S.serially -- AsyncT IO ()
& S.mapM serve -- AsyncT IO ()
& S.asyncly -- SerialT IO ()
& S.drain -- IO ()
lookupWords :: Socket -> IO ()
lookupWords sk =
S.unfold SK.read sk -- SerialT IO Word8
& U.decodeLatin1 -- SerialT IO Char
& U.words FL.toList -- SerialT IO String
& S.serially -- AheadT IO String
& S.mapM fetch -- AheadT IO (String, String)
& S.aheadly -- SerialT IO (String, String)
& S.map show -- SerialT IO String
& S.intercalateSuffix "\n" UF.identity -- SerialT IO String
& S.fold (SK.writeStrings sk) -- IO ()
serve :: Socket -> IO ()
serve sk = finally (lookupWords sk) (close sk)
Rate Control Req/Sec
lookupWords :: Socket -> IO ()
lookupWords sk =
S.unfold SK.read sk -- SerialT IO Word8
& U.decodeLatin1 -- SerialT IO Char
& U.words FL.toList -- SerialT IO String
& serially -- AheadT IO String
& S.mapM fetch -- AheadT IO (String, String)
& S.maxRate 10 -- AheadT IO (String, String)
& S.aheadly -- SerialT IO (String, String)
& S.map show -- SerialT IO String
& S.fold (SK.writeStrings sk) -- IO ()
Rate Control Conns/Sec
S.unfold TCP.acceptOnPort 8090 -- SerialT IO Socket
& S.serially -- AsyncT IO ()
& S.mapM serve -- AsyncT IO ()
& S.maxRate 10 -- AsyncT IO ()
& S.asyncly -- SerialT IO ()
& S.drain -- IO ()
Merging Live Word Streams
S.unfold TCP.acceptOnPort 8090 -- SerialT IO Socket
& S.concatMapWith S.parallel recv -- SerialT IO String
& U.unwords UF.fromList -- SerialT IO Char
& U.encodeLatin1 -- SerialT IO Word8
& File.fromBytes "outFile" -- IO ()
readWords :: Socket -> SerialT IO String
readWords sk =
S.unfold SK.read sk -- SerialT IO Word8
& U.decodeLatin1 -- SerialT IO Char
& U.words FL.toList -- SerialT IO String
recv :: Socket -> SerialT IO String
recv sk = S.finally (liftIO $ close sk) (readWords sk)
Recursive Directory Listing
Concurrently
listDir :: Either String String -> AheadT IO String
listDir (Left dir) =
Dir.toEither dir -- SerialT IO (Either String String)
& S.map (prefixDir dir) -- SerialT IO (Either String String)
& S.consM (return dir)
. S.concatMapWith ahead listDir -- SerialT IO String
listDir (Right file) = S.yield file -- SerialT IO String
S.mapM_ print $ aheadly $ listDir (Left ".")
Demand Scaled Concurrency
• No threads are used if no one is consuming the stream
• Concurrency increases as the consumption rate increases
• maxThreads and maxBuffer can control the limits
Concurrent Folds
(Consume Concurrently)
Write Concurrently to multiple Destinations
FH.getBytes -- SerialT IO Word8
& S.tapAsync (TCP.fromBytes (192,168,1,10) 8091) -- SerialT IO Word8
& S.tapAsync (TCP.fromBytes (192,168,1,11) 8091) -- SerialT IO Word8
& File.fromBytes "outFile" -- IO ()
Concurrent ListT
(Nested Loops)
Non-determinism
(Looping)
loops = do
    x <- each [1,2]
    y <- each [3,4]
    liftIO $ putStrLn $ show (x, y)
main = S.drain $ serially $ loops
main = S.drain $ asyncly $ loops
main = S.drain $ aheadly $ loops
Streaming + Concurrency
=
Reactive Programming
Reactive Programming
• Reactive programs (games, GUI) can be elegantly
expressed by declarative concurrency.
• See the Acid Rain game example in the package
• See the Circling Square example from Yampa, in
the package
https://github.com/composewell/streamly/blob/master/examples/AcidRain.hs
https://github.com/composewell/streamly/blob/master/examples/CirclingSquare.hs
Performance
Micro Benchmarks (GHC 8.8.1)
• A stream of 1 million elements is generated
• unfoldrM is used to generate the stream
• Two types of operations on the stream are
measured:
• single operation applied once
• a mix of operations applied multiple times
• Compiled using GHC 8.8.1
• All benchmarks are single threaded
• Run on a MacBook Pro with an Intel Core i7 processor
• Because there is a lot of variance, the comparison
is in multiples rather than percentage differences.
[Chart: comparison with Haskell lists (GHC 8.8.1), time]
Comparison with lists (GHC-8.8.1)
(Micro Benchmarks)
• Lists are slower than Streamly in most operations; in the
worst case they are 150 times slower.
• Streamly is slower than lists for concatMap and
append operations.
• There is no significant difference in memory
consumption.
[Chart: comparison with streaming libraries (time): streaming-0.2.3.0, conduit-1.3.1.1, pipes-4.3.12]
[Chart: comparison with streaming libraries (memory): streaming-0.2.3.0, conduit-1.3.1.1, pipes-4.3.12]
Comparison with streaming libraries
• All libraries are significantly slower (ranging from
1.2x to 1100x) than streamly for all operations.
• Streaming and streamly both consistently utilize
the same amount of memory across all ops.
• Conduit and pipes spike up to 32x memory in
certain operations.
Comparison With C
Counting Words in C
(The State)
struct statfs fsb;
uintmax_t linect, wordct, charct;
int fd, len;
short gotsp;
uint8_t *p;
uint8_t *buf;
linect = wordct = charct = 0;
if ((fd = open(argv[1], O_RDONLY, 0)) < 0) {
perror("open");
exit(EXIT_FAILURE);
}
if (fstatfs(fd, &fsb)) {
perror("fstatfs");
exit(EXIT_FAILURE);
}
buf = malloc(fsb.f_bsize);
if (!buf) {
perror("malloc");
exit(EXIT_FAILURE);
}
gotsp = 1;
Counting Words in C
(The Logic)
while ((len = read(fd, buf, fsb.f_bsize)) != 0) {
    if (len == -1) {
        perror("read");
        exit(EXIT_FAILURE);
    }
    p = buf;
    while (len > 0) {
        uint8_t ch = *p;
        charct++;
        len -= 1;
        p += 1;
        if (ch == '\n')
            ++linect;
        if (isspace(ch))
            gotsp = 1;
        else if (gotsp) {
            gotsp = 0;
            ++wordct;
        }
    }
}
Counting Words in Haskell
data WordCount = WordCount !Int !Bool
data Counts = Counts !Int !Int !WordCount
initialCounts = Counts 0 0 (WordCount 0 True)
countl :: Int -> Word8 -> Int
countl n ch = if (ch == 10) then n + 1 else n
countw :: WordCount -> Word8 -> WordCount
countw (WordCount n wasSpace) ch =
if (isSpace $ chr $ fromIntegral ch)
then WordCount n True
else WordCount (if wasSpace then n + 1 else n) False
{-# INLINE updateCounts #-}
updateCounts :: Counts -> Word8 -> Counts
updateCounts (Counts c l w) ch = Counts (c + 1) (countl l ch) (countw w ch)
wc :: Handle -> IO Counts
wc h =
S.unfold FH.read h -- SerialT IO Word8
& S.foldl' updateCounts initialCounts -- IO Counts
Word Counting: C vs Haskell
(550 MB text file)
C               Haskell
2.42 seconds    2.17 seconds
Can Haskell be as fast as C?
• Each Haskell combinator represents a small
piece of the loop
• The programmer composes the loop using
these combinators.
• GHC fuses these pieces together (stream
fusion) to create a monolithic loop like C.
• Finally, the structure of the optimized code
produced by GHC resembles the C code, with
both of the loops seen in the C program.
• This is global program optimization, an efficient
big picture is created using smaller pieces.
• GCC can only perform low level optimization
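The idea that each combinator contributes a small piece of the loop can be made concrete with a toy stream-fusion model. The following is a minimal sketch using base only, not Streamly's internal representation: the Stream type is pure, and the names enumFromToS, mapS, filterS and foldlS' are hypothetical. Every combinator is a non-recursive wrapper around the previous step function, and only the final fold contains a recursive loop, so after inlining the compiler has the opportunity to collapse the pipeline into one loop.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE ExistentialQuantification #-}
module Main where

-- Result of asking a stream for its next element.
data Step s a = Yield a s | Skip s | Stop

-- A stream is a step function plus a current state.
data Stream a = forall s. Stream (s -> Step s a) s

-- Generator: the loop counter is the state.
{-# INLINE enumFromToS #-}
enumFromToS :: Int -> Int -> Stream Int
enumFromToS from to = Stream step from
  where
    step i
        | i > to = Stop
        | otherwise = Yield i (i + 1)

-- Transformation: wraps the step function, adds no loop of its own.
{-# INLINE mapS #-}
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream step s0) = Stream step' s0
  where
    step' s = case step s of
        Yield x s' -> Yield (f x) s'
        Skip s' -> Skip s'
        Stop -> Stop

-- Filtering: a rejected element becomes a Skip, so the wrapper stays
-- non-recursive.
{-# INLINE filterS #-}
filterS :: (a -> Bool) -> Stream a -> Stream a
filterS p (Stream step s0) = Stream step' s0
  where
    step' s = case step s of
        Yield x s'
            | p x -> Yield x s'
            | otherwise -> Skip s'
        Skip s' -> Skip s'
        Stop -> Stop

-- Elimination: the only recursive function in the pipeline.
{-# INLINE foldlS' #-}
foldlS' :: (b -> a -> b) -> b -> Stream a -> b
foldlS' f z (Stream step s0) = go z s0
  where
    go !acc s = case step s of
        Yield x s' -> go (f acc x) s'
        Skip s' -> go acc s'
        Stop -> acc

main :: IO ()
main = print (foldlS' (+) 0 (mapS (* 2) (filterS even (enumFromToS 1 10))))
-- prints 60
```

Whether the intermediate Step constructors are actually eliminated depends on the optimization level and on the inlining decisions the compiler makes; the sketch only shows why the structure makes that possible.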
How can GHC perform
global optimizations?
• Strong types and purity make equational
reasoning possible.
• This allows GHC to reliably perform
transformations over the code and fuse parts of
the code to generate efficient code.
• Global program optimization is not possible in
C.
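A small worked instance of equational reasoning is the map fusion law, map f . map g = map (f . g), which holds for any pure f and g. The sketch below uses ordinary lists from base and hypothetical function names; it only checks the law on one input and does not show how GHC applies such rewrites internally.

```haskell
module Main where

-- Two traversals, as the programmer might naturally write them.
twoPasses :: [Int] -> [Int]
twoPasses = map (+ 1) . map (* 2)

-- One traversal; equal to twoPasses by the law
--   map f . map g = map (f . g)
-- The rewrite is valid because f and g have no side effects.
onePass :: [Int] -> [Int]
onePass = map ((+ 1) . (* 2))

main :: IO ()
main = print (twoPasses [1 .. 5] == onePass [1 .. 5])
-- prints True
```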
What are the downsides?
• Stream fusion depends on inlining and the SpecConstr
(constructor specialization) optimization.
• Often careful INLINE annotations are needed.
• Higher order functions require INLINE in the
“right” phase.
• GHC may need more work to perform it fully
reliably
• Due to global optimization, compilation is slower
and may get slower as the size of the program
increases.
Streamly GHC Plugin For
Fusion
• Internal join bindings beyond a size threshold
are not inlined, blocking fusion.
• We mark stream constructors with a pragma to
identify them in the core.
• Join points with such constructors are inlined
irrespective of the size.
• This allows more reliable fusion.
The Project
Current State of The Project
• ~25K LOC, ~16K Doc, ~95 files, 18 contributors
• High quality, tested, production capable
• Some parts of the API may change in near future
• type names may change
• module structure may change
Work In Progress
• Stream parsers
• Concurrent folds
• Splitting and merging transformations
Roadmap
• Shared concurrent state
• Persistent queues
• Vector instructions
• Distributed processing
• A lot more
Thank You!
harendra@composewell.com
twitter: @hk_hooda
https://github.com/composewell/streamly
https://gitter.im/composewell/streamly

Streamly: Concurrent Data Flow Programming

  • 1.
    © 2019 ComposewellTechnologies Streamly: Concurrent Data Flow Programming Harendra Kumar 15 Nov 2019 Composewell Technologies
  • 2.
    © 2019 ComposewellTechnologies Ergonomics of Shell Safety of Haskell Speed of C Magical Concurrency About Streamly https://github.com/composewell/streamly
  • 3.
    © 2019 ComposewellTechnologies About Me • C programming • OS kernel, file systems • Haskell since 2015 harendra@composewell.com @hk_hooda
  • 4.
    © 2019 ComposewellTechnologies About Composewell • Develops Streamly • Provides commercial support • Designs solutions using Streamly • Provides training on Streamly http://www.composewell.com hello@composewell.com
  • 5.
    © 2019 ComposewellTechnologies “Simplicity is a great virtue but it requires hard work to achieve it and education to appreciate it. And to make matters worse: complexity sells better.” — Edsger Wybe Dijkstra
  • 6.
    © 2019 ComposewellTechnologies Streams
  • 7.
    © 2019 ComposewellTechnologies What is a stream? • A sequence of same type of items • Combinators to process the sequence
  • 8.
    © 2019 ComposewellTechnologies Pure Effectful List Stream (Effectful List)
  • 9.
    © 2019 ComposewellTechnologies Imperative Functional Loop Stream (Modular Loops)
  • 10.
    © 2019 ComposewellTechnologies Do I need streams? • Imperative version: Do I need loops?
  • 11.
    © 2019 ComposewellTechnologies Streamly = Efficient, Composable and Concurrent Loops
  • 12.
    © 2019 ComposewellTechnologies Who should use Streamly? • General purpose framework • Declarative concurrency • High performance, nearing or beating C • Some examples: • Server backends • Reactive programming • Real time data analysis • Streaming data applications
  • 13.
    © 2019 ComposewellTechnologies Streamly At a Glance
  • 14.
    © 2019 ComposewellTechnologies Engineering Focused Library (Goals) Ergonomics Python and Shell Performance C Composability Haskell Safety Haskell Simplicity Use only basic Haskell
  • 15.
    © 2019 ComposewellTechnologies Composability + Performance
  • 16.
    © 2019 ComposewellTechnologies Streamly Version • Examples in this presentation use streamly-0.7.0 • Some examples may use APIs from Streamly.Internal.* modules • Source code for examples can be found at: https://github.com/composewell/streamly-examples
  • 17.
    © 2019 ComposewellTechnologies Fundamental Operations (Conceptual) Operation Shape Generate a -> Stream m b Transform Stream m a -> Stream m b Eliminate Stream m a -> m b
  • 18.
    © 2019 ComposewellTechnologies Fundamental Data Types (Actual) Type Constructor Unfold m a b forall s. Unfold (s -> m (Step s b)) (a -> m s) state generate inject Stream m a forall s. Stream (s -> m (Step s a)) s state generate state Fold m a b forall s. Fold (s -> a -> m s) (m s) (s -> m b) state accumulate initial extract
  • 19.
    © 2019 ComposewellTechnologies Core Modules Type Module Abbrev. Unfold Streamly.Data.Unfold UF Stream Streamly.Prelude S Fold Streamly.Data.Fold FL
  • 20.
    © 2019 ComposewellTechnologies unfold and fold Functions • S.unfold :: Unfold m a b -> a -> Stream m b • S.fold :: Fold m b c -> Stream m b -> m c
  • 21.
    © 2019 ComposewellTechnologies unfold & fold fold S.unfold UF.fromList [1..10] & S.fold FL.sum unfold
  • 22.
    © 2019 ComposewellTechnologies unfold & fold = Action fold a -> m c unfolda m cStream m b
  • 23.
    © 2019 ComposewellTechnologies unfold & fold = Loop! a -> m c a m cStream m b
  • 24.
    © 2019 ComposewellTechnologies Stream Types Type Coercion Modifier SerialT m a serially AsyncT m a asyncly AheadT m a aheadly
  • 25.
    © 2019 ComposewellTechnologies Summing an Int Stream S.unfold UF.fromList [1..10] -- SerialT Identity Int & S.fold FL.sum -- Identity Int
  • 26.
    © 2019 ComposewellTechnologies Summing an Int Stream (Better Ergonomics) S.fromList [1..10] -- SerialT Identity Int & S.sum -- Identity Int fromList = S.unfold UF.fromList [1..10] sum = S.fold FL.sum
  • 27.
    © 2019 ComposewellTechnologies File IO Examples
  • 28.
    © 2019 ComposewellTechnologies IO Modules Module Abbrev. Streamly.FileSystem.Handle FH Streamly.FileSystem.File File Streamly.Network.Socket SK Streamly.Network.Inet.TCP TCP Streamly.Data.Unicode.Stream U
  • 29.
    © 2019 ComposewellTechnologies cat File.toChunks “inFile” -- SerialT IO (Array Word8) & FH.putChunks -- IO ()
  • 30.
    © 2019 ComposewellTechnologies File Copy (cp) File.toChunks “inFile” -- SerialT IO (Array Word8) & File.fromChunks “outFile” -- IO ()
  • 31.
    © 2019 ComposewellTechnologies Count bytes in a File (wc -c) File.toBytes “inFile” -- SerialT IO Word8 & S.fold FL.length -- IO Int
  • 32.
    © 2019 ComposewellTechnologies Count Lines in a File (wc -l) File.toBytes “inFile” -- SerialT IO Word8 & S.fold nlines -- IO Int countl :: Int -> Word8 -> Int countl n ch = if (ch == 10) then n + 1 else n nlines :: Monad m => Fold m Word8 Int nlines = FL.mkPure countl 0 id
  • 33.
    © 2019 ComposewellTechnologies Count Words in a File (wc -w) File.toBytes “inFile” -- SerialT IO Word8 & S.fold nwords -- IO Int countw :: (Int, Bool) -> Word8 -> (Int, Bool) countw (n, wasSpace) ch = if (isSpace $ chr $ fromIntegral ch) then (n, True) else (if wasSpace then n + 1 else n, False) nwords :: Monad m => Fold m Word8 Int nwords = FL.mkPure countw (0, True) fst
  • 34.
    © 2019 ComposewellTechnologies Splitting a Stream (Composing Folds)
  • 35.
    © 2019 ComposewellTechnologies Count Bytes, Lines & Words (wc -clw) File.toBytes “inFile” -- SerialT IO Word8 & S.fold ((,,) <$> nlines <*> nwords <*> FL.length) -- IO (Int, Int, Int)
  • 36.
    © 2019 ComposewellTechnologies Sending output to multiple files (tee) File.toBytes “inFile” -- SerialT IO Word8 & S.fold (FL.tee (File.write “outFile1”) (File.write “outFile2")) -- IO ((),())
  • 37.
    © 2019 ComposewellTechnologies Splitting a file (split) type SIO = StateT (Maybe (Handle, Int)) IO splitFile :: FH.Handle -> IO () splitFile inHandle = File.toBytes “inFile” -- SerialT IO Word8 & S.liftInner -- SerialT SIO Word8 & S.chunksOf2 64 newHandle FH.write2 -- SerialT SIO Word8 & S.evalStateT Nothing -- SerialT IO () & S.drain -- IO () Uses experimental APIs
  • 38.
    © 2019 ComposewellTechnologies Transformation Pipeline
  • 39.
    © 2019 ComposewellTechnologies Word Classifier File.toBytes “inFile” -- SerialT IO Word8 & S.decodeLatin1 -- SerialT IO Char & S.map toLower -- SerialT IO Char & S.words FL.toList -- SerialT IO String & S.filter (all isAlpha) -- SerialT IO String & toHashMap -- IO (Map String (IORef Int)) See https://github.com/composewell/streamly/blob/master/examples/WordClassifier.hs Adapted from an example by Patrick Thomson alter Nothing = fmap Just $ newIORef (1 :: Int) alter (Just ref) = do modifyIORef' ref (+ 1) return (Just ref) toHashMap = S.foldlM' (flip (Map.alterF alter)) Map.empty
  • 40.
    © 2019 ComposewellTechnologies Word Size Histogram bucket :: Int -> (Int, Int) bucket n = let i = n `mod` 10 in if i > 9 then (9,n) else (i,n) File.toBytes “inFile” -- SerialT IO Word8 & S.words FL.length -- SerialT IO Int & S.map bucket -- SerialT IO (Int, Int) & S.fold (FL.classify FL.length) -- IO (Map Int Int) classify directs (k,v) stream to a Map applying the length fold to the stream of values in each bucket
  • 41.
    © 2019 ComposewellTechnologies Debugging a Pipeline (trace/tap) File.toBytes “inFile” -- SerialT IO Word8 & S.words FL.length -- SerialT IO Int & S.map bucket -- SerialT IO (Int, Int) & S.trace print -- SerialT IO (Int, Int) & S.fold (FL.classify FL.length) -- IO (Map Int Int) classify directs (k,v) stream to a Map applying the length fold to the stream of values in each bucket
  • 42.
    © 2019 ComposewellTechnologies Combining Streams (Composing Unfolds)
  • 43.
    © 2019 ComposewellTechnologies Appending N Streams (cat dir/* > outfile) Dir.toFiles dirname -- SerialT IO String & S.concatUnfold File.read -- SerialT IO Word8 & File.fromBytes “outFile” -- IO()
  • 44.
    © 2019 ComposewellTechnologies Outer Product (Nested Loops) mult :: (Int, Int) -> Int mult (x, y) = x * y from :: Monad m => Unfold m Int Int from = UF.enumerateFromToIntegral 1000 cross :: Monad m => Unfold m (Int, Int) Int cross = UF.outerProduct from from & UF.map mult UF.fold cross FL.sum (1,1)
  • 45.
    © 2019 ComposewellTechnologies Better Replacement for ListT and LogicT loops :: SerialT IO () loops = do x <- S.fromList [1,2] y <- S.fromList [3,4] S.yieldM $ putStrLn $ show (x, y) (1,3) (1,4) (2,3) (2,4)
  • 46.
    © 2019 ComposewellTechnologies Declarative Concurrency
  • 47.
    © 2019 ComposewellTechnologies Lookup words get :: String -> IO String get s = liftIO (httpNoBody (parseRequest_ s)) >> return s fetch :: String -> IO (String, String) fetch w = (,) <$> pure w <*> get (“https://www.google.com/search?q=“ ++ w) wordList :: [String] wordList = [“cat”, “dog”, “mouse”] meanings :: [IO (String, String)] meanings = map fetch wordList
  • 48.
    © 2019 ComposewellTechnologies Serially S.fromListM meanings -- SerialT IO (String, String) & S.map show -- SerialT IO String & FH.putStrings -- IO ()
  • 49.
    © 2019 ComposewellTechnologies Asynchronously S.fromListM meanings -- AsyncT IO (String, String) & asyncly -- SerialT IO (String, String) & S.map show -- SerialT IO String & FH.putStrings — IO ()
  • 50.
    © 2019 ComposewellTechnologies Speculatively (Look Ahead) S.fromListM meanings -- AheadT IO (String, String) & aheadly -- SerialT IO (String, String) & S.map show -- SerialT IO String & FH.putStrings -- IO ()
  • 51.
    © 2019 ComposewellTechnologies Word Lookup Server S.unfold TCP.acceptOnPort 8090 -- SerialT IO Socket & S.serially -- AsyncT IO () & S.mapM serve -- AsyncT IO () & S.asyncly -- SerialT IO () & S.drain -- IO () lookupWords :: Socket -> IO () lookupWords sk = S.unfold SK.read sk -- SerialT IO Word8 & U.decodeLatin1 -- SerialT IO Char & U.words FL.toList -- SerialT IO String & S.serially -- AheadT IO String & S.mapM fetch -- AheadT IO (String, String) & S.aheadly -- SerialT IO (String, String) & S.map show -- SerialT IO String & S.intercalateSuffix "n" UF.identity -- SerialT IO String & S.fold (SK.writeStrings sk) — IO () serve :: Socket -> IO () serve sk = finally (lookupWords sk) (close sk)
  • 52.
    © 2019 ComposewellTechnologies Rate Control Req/Sec lookupWords :: Socket -> IO () lookupWords sk = S.unfold SK.read sk -- SerialT IO Word8 & U.decodeLatin1 -- SerialT IO Char & U.words FL.toList -- SerialT IO String & serially -- AheadT IO String & S.mapM lookup -- AheadT IO (String, String) & S.maxRate 10 -- AheadT IO (String, String) & S.aheadly -- SerialT IO (String, String) & S.map Show -- SerialT IO String & S.fold Sk.write sk -- IO ()
  • 53.
    © 2019 ComposewellTechnologies Rate Control Conns/Sec S.unfold TCP.acceptOnPort 8090 -- SerialT IO Socket & S.serially -- AsyncT IO () & S.mapM serve -- AsyncT IO () & S.maxRate 10 -- AsyncT IO () & S.asyncly -- SerialT IO () & S.drain -- IO ()
  • 54.
    © 2019 ComposewellTechnologies Merging Live Word Streams S.unfold TCP.acceptOnPort 8090 -- SerialT IO Socket & S.concatMapWith S.parallel recv -- SerialT IO String & U.unwords UF.fromList -- SerialT IO Char & U.encodeLatin1 -- SerialT IO Word8 & File.fromBytes “outFile” -- IO () readWords :: Socket -> SerialT IO String readWords sk = S.unfold SK.read sk -- SerialT IO Word8 & U.decodeLatin1 -- SerialT IO Char & U.words FL.toList -- SerialT IO String recv :: Socket -> SerialT IO String recv sk = S.finally (liftIO $ close sk) (readWords sk)
  • 55.
    © 2019 ComposewellTechnologies Recursive Directory Listing Concurrently listDir :: Either String String -> AheadT IO String listDir (Left dir) = Dir.toEither dir -- SerialT IO (Either String String) & S.map (prefixDir dir) -- SerialT IO (Either String String) & S.consM (return dir) . S.concatMapWith ahead listDir -- SerialT IO String listDir (Right file) = S.yield file -- SerialT IO String S.mapM_ print $ aheadly $ listDir (Left ".")
  • 56.
    © 2019 ComposewellTechnologies Demand Scaled Concurrency • No threads if no one is consuming the stream • Concurrency increases as consuming rate increases • maxThreads and maxBuffer can control the limits
  • 57.
    © 2019 ComposewellTechnologies Concurrent Folds (Consume Concurrently)
  • 58.
    © 2019 ComposewellTechnologies Write Concurrently to multiple Destinations FH.getBytes -- SerialT IO Word8 & S.tapAsync (TCP.fromBytes (192,168,1,10) 8091) -- SerialT IO Word8 & S.tapAsync (TCP.fromBytes (192,168,1,11) 8091) -- SerialT IO Word8 & File.fromBytes “outFile” -- IO ()
  • 59.
    © 2019 ComposewellTechnologies Concurrent ListT (Nested Loops)
  • 60.
    © 2019 ComposewellTechnologies Non-determinism (Looping) loops = $ do x <- each [1,2] y <- each [3,4] liftIO $ putStrLn $ show (x, y) main = S.drain $ serially $ loops main = S.drain $ asyncly $ loops main = S.drain $ aheadly $ loops
  • 61.
    © 2019 ComposewellTechnologies Streaming + Concurrency = Reactive Programming
  • 62.
    © 2019 ComposewellTechnologies Reactive Programming • Reactive programs (games, GUI) can be elegantly expressed by declarative concurrency. • See the Acid Rain game example in the package • See the Circling Square example from Yampa, in the package https://github.com/composewell/streamly/blob/master/examples/AcidRain.hs https://github.com/composewell/streamly/blob/master/examples/CirclingSquare.hs
  • 63.
    © 2019 ComposewellTechnologies Performance
  • 64.
    © 2019 ComposewellTechnologies Micro Benchmarks (GHC 8.8.1) • A stream of 1 million elements is generated • unfoldrM is used to generate the stream • Two types of operations on the stream are measured: • single operation applied once • a mix of operations applied multiple times • Compiled using GHC 8.8.1 • All benchmarks are single threaded • Ran on MacBook Pro with Intel Core i7 processor • Because there is a lot of variance, the comparison is in multiples rather than as percentage diff.
  • 65.
    Comparison with Haskell lists (GHC-8.8.1) (time)
  • 66.
    Comparison with lists (GHC-8.8.1) (Micro Benchmarks)
      • Lists are slower than streamly in most operations; in the worst case, 150 times slower.
      • Streamly is slower than lists for concatMap and append operations.
      • There is no significant difference in memory consumption.
  • 67.
    Comparison with streaming libraries (time)
      streaming-0.2.3.0
      conduit-1.3.1.1
      pipes-4.3.12
  • 68.
    Comparison with streaming libraries (memory)
      streaming-0.2.3.0
      conduit-1.3.1.1
      pipes-4.3.12
  • 69.
    Comparison with streaming libraries
      • All three libraries are slower than streamly for all operations, by factors ranging from 1.2x to 1100x.
      • Both streaming and streamly use a consistent amount of memory across all operations.
      • Conduit and pipes spike up to 32x memory in certain operations.
  • 70.
    Comparison With C
  • 71.
    Counting Words in C (The State)
      struct statfs fsb;
      uintmax_t linect, wordct, charct;
      int fd, len;
      short gotsp;
      uint8_t *p;
      uint8_t *buf;

      linect = wordct = charct = 0;
      if ((fd = open(argv[1], O_RDONLY, 0)) < 0) {
          perror("open");
          exit(EXIT_FAILURE);
      }
      if (fstatfs(fd, &fsb)) {
          perror("fstatfs");
          exit(EXIT_FAILURE);
      }
      buf = malloc(fsb.f_bsize);
      if (!buf) {
          perror("malloc");
          exit(EXIT_FAILURE);
      }
      gotsp = 1;
  • 72.
    Counting Words in C (The Logic)
      while ((len = read(fd, buf, fsb.f_bsize)) != 0) {
          if (len == -1) {
              perror("read");
              exit(EXIT_FAILURE);
          }
          p = buf;
          …
      }

      while (len > 0) {
          uint8_t ch = *p;
          charct++;
          len -= 1;
          p += 1;
          if (ch == '\n')
              ++linect;
          if (isspace(ch))
              gotsp = 1;
          else if (gotsp) {
              gotsp = 0;
              ++wordct;
          }
      }
  • 73.
    Counting Words in Haskell
      import Data.Char (chr, isSpace)
      import Data.Function ((&))
      import Data.Word (Word8)
      import System.IO (Handle)
      import qualified Streamly.Prelude as S
      import qualified Streamly.FileSystem.Handle as FH

      -- Word count so far, and whether the previous byte was whitespace.
      data WordCount = WordCount !Int !Bool
      -- Character count, line count, and word-count state.
      data Counts = Counts !Int !Int !WordCount

      initialCounts = Counts 0 0 (WordCount 0 True)

      countl :: Int -> Word8 -> Int
      countl n ch = if (ch == 10) then n + 1 else n

      countw :: WordCount -> Word8 -> WordCount
      countw (WordCount n wasSpace) ch =
          if (isSpace $ chr $ fromIntegral ch)
          then WordCount n True
          else WordCount (if wasSpace then n + 1 else n) False

      {-# INLINE updateCounts #-}
      updateCounts :: Counts -> Word8 -> Counts
      updateCounts (Counts c l w) ch = Counts (c + 1) (countl l ch) (countw w ch)

      wc :: Handle -> IO Counts
      wc h =
            S.unfold FH.read h                    -- SerialT IO Word8
          & S.foldl' updateCounts initialCounts   -- IO Counts
  • 74.
    Word Counting: C vs Haskell (550 MB text file)
      C        2.42 seconds
      Haskell  2.17 seconds
  • 75.
    Can Haskell be as fast as C?
      • Each Haskell combinator represents a small piece of the loop.
      • The programmer composes the loop using these combinators.
      • GHC fuses these pieces together (stream fusion) to create a monolithic loop like the one in C.
      • The optimized code produced by GHC has the same structure as the C code, including both of the loops seen in the C program.
      • This is global program optimization: an efficient big picture is assembled from smaller pieces.
      • GCC can only perform low-level optimization.
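As an illustration of composing a loop from small combinators, consider the sketch below, assuming streamly-0.7.0. The function name `sumEvenSquares` is hypothetical. Each stage reads as a separate pass over the stream, and the intent is that GHC, compiling with optimization, fuses the stages into one loop; whether fusion actually happens depends on the compiler version and flags.

```haskell
import Data.Function ((&))
import qualified Streamly.Prelude as S

-- Sum the squares of the even numbers from 1 to n.
-- Four combinators, written as four separate stages.
sumEvenSquares :: Int -> IO Int
sumEvenSquares n =
      S.enumerateFromTo 1 n     -- generate 1 .. n
    & S.filter even             -- keep the even numbers
    & S.map (\x -> x * x)       -- square each one
    & S.sum                     -- add them up

main :: IO ()
main = sumEvenSquares 10 >>= print -- prints 220
```

The equivalent C program would be a single hand-written loop containing the test, the multiplication and the accumulation together.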
  • 76.
    How can GHC perform global optimizations?
      • Strong types and purity make equational reasoning possible.
      • This allows GHC to reliably transform the code and fuse parts of it to generate efficient code.
      • Global program optimization is not possible in C.
  • 77.
    What are the downsides?
      • Stream fusion depends on inlining and SpecConstr (constructor specialization) optimizations.
      • Careful INLINE annotations are often needed.
      • Higher-order functions require INLINE in the "right" phase.
      • GHC may need more work to perform fusion fully reliably.
      • Because of global optimization, compilation is slower and may get slower as the size of the program increases.
  • 78.
    Streamly GHC Plugin For Fusion
      • Internal join bindings beyond a size threshold are not inlined, blocking fusion.
      • We mark stream constructors with a pragma to identify them in the core.
      • Join points with such constructors are inlined irrespective of the size.
      • This allows more reliable fusion.
  • 79.
    The Project
  • 80.
    Current State of The Project
      • ~25K LOC, ~16K Doc, ~95 files, 18 contributors
      • High quality, tested, production capable
      • Some parts of the API may change in the near future
        • type names may change
        • module structure may change
  • 81.
    Work In Progress
      • Stream parsers
      • Concurrent folds
      • Splitting and merging transformations
  • 82.
    Roadmap
      • Shared concurrent state
      • Persistent queues
      • Vector instructions
      • Distributed processing
      • A lot more
  • 83.
    Thank You!
      harendra@composewell.com
      twitter: @hk_hooda
      https://github.com/composewell/streamly
      https://gitter.im/composewell/streamly