Haskell-awk

Haskell text processor for the command line
Mario Pastorelli
Introduction
awk
a generic text processor where
“A file is treated as a sequence of records, and by
default each line is a record.” - Alfred V. Aho
developed in 1977 by Alfred Aho, Peter Weinberger, and Brian
Kernighan @ Bell Labs
uses AWK as programming language
    awk 'BEGIN { print "Hello World!" }'

procedural
interpreted
a program is a series of pattern-action pairs
Why another awk?
“Whenever faced with a problem, some people say
`Lets use AWK.' Now, they have two problems.” - D.
Tilbrook
avoid the AWK programming language
use a generic language, not a DSL
    BEGIN{split("a b c c a",a);for(i in a)b[a[i]]=1;r="";for(i in b)r=r" "i;print r}
    nub $ words "a b c c a"

procedural (imperative) vs functional programming for stream
processing
Haskell-awk (Hawk)
a generic text processor where
“A stream is treated as a sequence of records, and
by default each line is a record.”
the same philosophy of awk!
developed in 2013 by me and Samuel Gélineau; the name is a tribute
to awk
uses Haskell as programming language
    hawk '"Hello World!"'

functional
(incrementally) compiled
a program is a Haskell expression
Why Haskell
expressive, clean and concise
    > filter odd [1,2,3,4]
    [1,3]

functions as composable building blocks
    > let wordCount = sum . map (length . words) . lines
    > :type wordCount
    wordCount :: String -> Int
    > wordCount "1 2 3\n4 5 6\n7 8 9"
    9

partial application
    > :type map
    map :: (a -> b) -> [a] -> [b]
    > :type not
    not :: Bool -> Bool
    > :type map not
    map not :: [Bool] -> [Bool]
    > map not [True,False]
    [False,True]

point-free style, laziness ...
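
The last bullet can be made concrete with a small sketch. The names `countLongWords` and `firstSquares` are hypothetical and do not come from the slides; they only illustrate point-free style and laziness with standard Prelude functions.

```haskell
-- Point-free style: the function is written as a pipeline of
-- smaller functions, without naming its argument.
countLongWords :: String -> Int
countLongWords = length . filter ((> 3) . length) . words

-- Laziness: [1 ..] is infinite, but only the three elements that
-- `take` demands are ever computed.
firstSquares :: [Int]
firstSquares = take 3 (map (^ 2) [1 ..])
```

Loaded in GHCi, `countLongWords "a quick brown fox"` counts the words longer than three characters, and `firstSquares` terminates even though it is defined over an infinite list.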
Hawk
Modes
evaluate an expression
    $ hawk '1'
    1
    $ hawk '[1,2]'
    1
    2
    $ hawk '[[1,2],[3,4]]'
    1 2
    3 4

apply an expression to the input
    $ echo '1\n2\n3' | hawk -a 'L.reverse'
    3
    2
    1

map an expression to each record of the input
    $ echo '1 2\n3 4' | hawk -m 'L.reverse'
    2 1
    4 3
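
The difference between the apply and map modes can be modelled in plain Haskell. This is only a sketch under simplifying assumptions: the input is represented as `[[String]]` rather than Hawk's ByteString-based types, and `Records`, `applyMode` and `mapMode` are hypothetical names, not part of Hawk.

```haskell
-- Hypothetical model of the input: records (lines) made of fields (words).
type Records = [[String]]

-- Apply mode (-a): the user expression receives the whole input at once.
applyMode :: (Records -> b) -> Records -> b
applyMode f = f

-- Map mode (-m): the user expression is applied to each record separately.
mapMode :: ([String] -> b) -> Records -> [b]
mapMode = map
```

With this model, `applyMode reverse` reverses the order of the records, while `mapMode reverse` reverses the fields inside each record, which mirrors the two `L.reverse` examples above.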
IO format
The input is, by default, a list of lists of strings, where lines are
separated by \n and words by spaces
    $ echo '1 2\n3 4' | hawk -a 'show'
    [["1","2"],["3","4"]]

The -d and -D options change the field and record delimiters; passing
an empty delimiter disables the corresponding split
    $ echo '1,2;3,4' | hawk -a -d',' -D';' 'show'
    [["1","2"],["3","4"]]
    $ echo '1 2\n3 4' | hawk -a -d'' 'show'
    ["1 2","3 4"]
    $ echo '1 2\n3 4' | hawk -a -d'' -D'' 'show'
    "1 2\n3 4\n"
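
To illustrate how a record delimiter and a field delimiter turn raw text into a list of lists, here is a minimal sketch. It assumes single-character delimiters and `String` input, and it ignores details such as a trailing newline; `splitOnChar` and `parseInput` are hypothetical helpers, not Hawk functions.

```haskell
-- Split a string on a single-character delimiter (hypothetical helper).
splitOnChar :: Char -> String -> [String]
splitOnChar d s = case break (== d) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnChar d rest

-- First split the input into records, then split each record into fields.
parseInput :: Char -> Char -> String -> [[String]]
parseInput recordDelim fieldDelim =
  map (splitOnChar fieldDelim) . splitOnChar recordDelim
```

For example, `parseInput ';' ',' "1,2;3,4"` plays the role of `-D';' -d','` in the command above.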

The output can be any type that is an instance of the Rows typeclass
    class (Show a) => Rows a where
        repr :: ByteString -> a -> [ByteString]
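
A simplified model shows how such a typeclass can turn different result types into output lines. This sketch uses `String` instead of `ByteString` and two hypothetical classes, `RowSketch` and `RowsSketch`; it is not Hawk's actual definition or instance set.

```haskell
import Data.List (intercalate)

-- One value rendered as a single output line (hypothetical class).
class RowSketch a where
  rowSketch :: String -> a -> String

instance RowSketch Int where
  rowSketch _ = show

-- A list is one line: its elements joined by the field delimiter.
instance RowSketch a => RowSketch [a] where
  rowSketch d = intercalate d . map (rowSketch d)

-- One value rendered as a list of output lines (hypothetical class).
class RowsSketch a where
  rowsSketch :: String -> a -> [String]

instance RowsSketch Int where
  rowsSketch d x = [rowSketch d x]

-- A list is several lines: one line per element.
instance RowSketch a => RowsSketch [a] where
  rowsSketch d = map (rowSketch d)
```

Under these assumptions, a number becomes one line, a list becomes one line per element, and a list of lists becomes one delimiter-joined line per inner list, which matches the outputs shown on the Modes slide.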
Examples
get all users of a UNIX system
    $ cat /etc/passwd | hawk -d: -m 'L.head'
    root
    daemon
    ...

select username and userid
    $ cat /etc/passwd | hawk -d: -o'\t' -m '\l -> (l !! 0, l !! 2)'
    root    0
    daemon  1
    ...

sort by username (instead of by user ID)
    $ cat /etc/passwd | hawk -d: -a 'L.sortBy (compare `on` L.head)'
    bin:x:2:2:bin:/bin:/bin/sh
    daemon:x:1:1:daemon:/usr/sbin:/bin/sh
    ...

get the number of users using each shell

    $ cat /etc/passwd | hawk -ad: 'L.map (L.head &&& L.length) . L.group . L.sort . L.map L.last'
    /bin/bash:1
    ...
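
The last pipeline can be tried outside Hawk as an ordinary Haskell function. In this sketch the name `countByLast` is hypothetical, records are plain `[String]` values, and every record is assumed to be non-empty (otherwise `last` would fail).

```haskell
import Control.Arrow ((&&&))
import Data.List (group, sort)

-- Count how many records share the same last field:
-- take the last field of each record, sort so equal values are adjacent,
-- group them, then pair each group's value with its size.
countByLast :: [[String]] -> [(String, Int)]
countByLast = map (head &&& length) . group . sort . map last
```

Applied to records such as `[["root", "/bin/bash"], ["daemon", "/bin/sh"], ["bin", "/bin/sh"]]`, it returns one pair per distinct shell.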
Context
Hawk can be customized using files inside the context directory (by
default ~/.hawk)
The most important file is prelude.hs, which contains the "runtime
context"
    $ cat ~/.hawk/prelude.hs
    {-# LANGUAGE ExtendedDefaultRules, OverloadedStrings #-}
    import Prelude
    import qualified Data.ByteString.Lazy.Char8 as B
    import qualified Data.List as L

for instance, we can add a function for taking elements in an
interval
    $ echo 'takeBetween s e = L.take (e - s) . L.drop s' >> ~/.hawk/prelude.hs
    $ seq 0 10 | hawk -a 'takeBetween 2 4'
    2
    3
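
The same function can be checked in GHCi before it is added to the prelude. This standalone version uses unqualified `take` and `drop` and adds a type signature, which is an addition for illustration and not part of the slide.

```haskell
-- Keep the elements at positions s (inclusive) to e (exclusive).
takeBetween :: Int -> Int -> [a] -> [a]
takeBetween s e = take (e - s) . drop s
```

Evaluating `takeBetween 2 4 [0 .. 10]` gives the two elements printed by the `seq 0 10` example above.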
Implementation
Hawk must be fast
cache the context
use the timestamp to check whether the context has changed since the
last run
compile it with GHC
use locks to compile only once when multiple Hawk instances
are running
    hawk '[1..]' | hawk -a 'L.take 3'

use ByteString instead of String
...
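
The timestamp check described above could look roughly like the following. This is a sketch of the general technique, not Hawk's actual implementation: `needsRecompile` and `contextIsStale` are hypothetical names, and locking is left out entirely.

```haskell
import System.Directory (doesFileExist, getModificationTime)

-- Pure decision: recompile when there is no compiled artifact yet,
-- or when the source was modified after the artifact was produced.
needsRecompile :: Ord t => Maybe t -> t -> Bool
needsRecompile Nothing _ = True
needsRecompile (Just compiledAt) sourceChangedAt = sourceChangedAt > compiledAt

-- IO wrapper: compare the modification times of the context source
-- (e.g. prelude.hs) and of a cached compiled file (hypothetical path).
contextIsStale :: FilePath -> FilePath -> IO Bool
contextIsStale sourcePath compiledPath = do
  compiledExists <- doesFileExist compiledPath
  sourceTime <- getModificationTime sourcePath
  compiledTime <-
    if compiledExists
      then Just <$> getModificationTime compiledPath
      else pure Nothing
  pure (needsRecompile compiledTime sourceTime)
```

Keeping the decision in a pure function makes it easy to test without touching the file system; only `contextIsStale` needs real files.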
Parse and interpret Haskell
Hawk combines two Haskell libraries
haskell-src-exts to deal with Haskell source code
    > import Language.Haskell.Exts.Parser
    > getTopPragmas "{-# LANGUAGE NoImplicitPrelude, OverloadedStrings #-}\n"
    ParseOk [LanguagePragma (SrcLoc {srcFilename = "<unknown>.hs", srcLine = 1, srcColumn = 1})
      [Ident "NoImplicitPrelude",Ident "OverloadedStrings"]]

hint to interpret the user expression
    > import Language.Haskell.Interpreter
    > runInterpreter $ setImports ["Data.Int"] >> interpret "1" (as :: Int)
    Right 1
    > runInterpreter $ setImports ["Data.Int"] >> interpret "foo" (as :: Int)
    Left (WontCompile [GhcError {errMsg = "Not in scope: `foo'"}])
Thank you!
https://github.com/gelisam/hawk
