Pandoc: The Deep Dive
All that is great stands in the storm
● Universal markup converter == "the Swiss Army knife of text markup formats"
● ALL HASKELL
● Example:
pandoc -o myDoc.md myDoc.html
pandoc -f html -t latex hackage.org
pandoc myDoc.txt -o myDoc.pdf
What is Pandoc?
● Reads:
○ Markdown (GitHub, Strict, etc.), HTML, LaTeX, Textile, reStructuredText, JSON
● Writes:
○ Markdown, reStructuredText, HTML, DocBook XML, OpenDocument XML, ODT, RTF, groff man, MediaWiki markup, GNU Texinfo, LaTeX, ConTeXt, EPUB, Textile, Emacs org-mode, Slidy, S5
● Extensions for LaTeX math, tables, etc.
● Note to self: Pandoc in the CLI
What is Pandoc? (pt. 2)
● Performance vis-à-vis scripting languages
● Type safety
● Text.Parsec library
● Hypermuscular list processing (more about FP in general than about Haskell)
Why Haskell?
● One possibility: functions devoted to each type-to-type combination
○ markdownToHTML
○ HTMLtoEPUB
○ 12 × 31 = 372 possibilities (one function per reader/writer pair)
○ FUCK THAT
● Vastly better possibility?
Reader -->
Neutral Haskell data type -->
Writer -->
Converted document
Possible approaches
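The reader/writer approach above can be sketched in a few lines of Haskell. This is a toy under stated assumptions, not Pandoc's code: `Doc`, `readPlain`, `readLines`, `writeHtml`, `writePlain`, and `convert` are hypothetical names, and the neutral type here is only a list of paragraphs.

```haskell
module Main where

import Data.List (intercalate)

-- Hypothetical, deliberately tiny neutral type: a document is a list of
-- paragraphs. Pandoc's real intermediate type is much richer.
newtype Doc = Doc [String]
  deriving (Show, Eq)

type Reader = String -> Doc
type Writer = Doc -> String

-- Reader for plain text: paragraphs are separated by blank lines.
readPlain :: Reader
readPlain = Doc . map unwords . filter (not . null) . splitOnBlank . lines
  where
    splitOnBlank ls = case break null ls of
      (para, [])       -> [para]
      (para, _ : rest) -> para : splitOnBlank rest

-- Reader for a toy format with one paragraph per line.
readLines :: Reader
readLines = Doc . filter (not . null) . lines

writeHtml :: Writer
writeHtml (Doc ps) = intercalate "\n" ["<p>" ++ p ++ "</p>" | p <- ps]

writePlain :: Writer
writePlain (Doc ps) = intercalate "\n\n" ps

readers :: [(String, Reader)]
readers = [("plain", readPlain), ("lines", readLines)]

writers :: [(String, Writer)]
writers = [("html", writeHtml), ("plain", writePlain)]

-- Any reader composes with any writer, so r readers and w writers cover
-- r * w conversions with only r + w functions.
convert :: String -> String -> String -> Maybe String
convert from to input = do
  r <- lookup from readers
  w <- lookup to writers
  pure (w (r input))

main :: IO ()
main = mapM_ putStrLn (convert "plain" "html" "Hello\nworld\n\nSecond paragraph")
```

Adding a new output format in this sketch means writing one more `Writer` and registering it in `writers`; every existing reader can then target it without further work.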
● Semi-stateful, non-opinionated REGEX machine
○ Accumulative: return (x:xs)
○ getParserState
○ modifyState
● Core functions
○ parse
■ parse parser filePath input
■ parse numbers "" "a,b,2,3"
○ many
○ skipMany
○ manyAccum
● type Parser t s = Parsec t s
Text.Parsec
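To make the Parsec bullets above concrete, here is a minimal sketch that parses the slide's sample input `"a,b,2,3"` while counting numeric cells in Parsec's user state. The names `P`, `cell`, and `cells` are hypothetical, and the sketch uses `getState` (user state only) where the slide mentions `getParserState` (which returns the whole parser state: input, position, and user state).

```haskell
module Main where

import Control.Monad (when)
import Data.Char (isDigit)
import Text.Parsec

-- Parser over a String with an Int as user state.
-- (Parsec's type parameters are: stream, user state, result.)
type P = Parsec String Int

-- One comma-separated cell; every cell that is a whole number bumps the
-- counter kept in the user state.
cell :: P String
cell = do
  s <- many1 alphaNum
  when (all isDigit s) (modifyState (+ 1))
  pure s

-- All cells in the input, plus the final value of the user state.
cells :: P ([String], Int)
cells = do
  xs <- cell `sepBy` char ','
  eof
  n <- getState
  pure (xs, n)

main :: IO ()
main = do
  -- runParser is like parse, but also takes the initial user state.
  print (runParser cells 0 "" "a,b,2,3")
  -- A stateless parse in the "parse parser filePath input" shape; the
  -- empty string is the source name used in error messages.
  print (parse (many1 digit `sepBy` char ',') "" "1,2,3")
```

Both calls return an `Either ParseError a`, so a malformed input such as `"a,,b"` comes back as a `Left` with a position and message rather than as an exception.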
● Neutral data types
○ Pandoc = Meta + [Block]
○ Block = [Inline] or nested [Block]
○ Inline
○ etc.
● Reader
○ Applies parsers to documents
○ Documents are treated as lists
● Writer
○ Converts neutral data type into document
○ Again, documents are just structured lists
Basic flow
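The neutral data types and the writer step above might look roughly like this. It is a simplified stand-in, not the real definitions: the constructor names echo those in pandoc-types, but the actual `Block` and `Inline` types have many more constructors and carry attributes and metadata that are omitted here. `Doc`, `tag`, and `escape` are hypothetical helpers.

```haskell
module Main where

-- Simplified stand-ins for Pandoc's intermediate types.
data Inline
  = Str String
  | Space
  | Emph [Inline]
  | Strong [Inline]
  deriving (Show, Eq)

data Block
  = Para [Inline]
  | Header Int [Inline]
  | BlockQuote [Block]
  | BulletList [[Block]]
  deriving (Show, Eq)

newtype Doc = Doc [Block]
  deriving (Show, Eq)

-- The writer is a pair of recursive functions over the two list-shaped types.
inlineToHtml :: Inline -> String
inlineToHtml (Str s)     = concatMap escape s
inlineToHtml Space       = " "
inlineToHtml (Emph xs)   = tag "em" (concatMap inlineToHtml xs)
inlineToHtml (Strong xs) = tag "strong" (concatMap inlineToHtml xs)

blockToHtml :: Block -> String
blockToHtml (Para xs)          = tag "p" (concatMap inlineToHtml xs)
blockToHtml (Header n xs)      = tag ('h' : show n) (concatMap inlineToHtml xs)
blockToHtml (BlockQuote bs)    = tag "blockquote" (concatMap blockToHtml bs)
blockToHtml (BulletList items) =
  tag "ul" (concatMap (tag "li" . concatMap blockToHtml) items)

writeHtml :: Doc -> String
writeHtml (Doc bs) = unlines (map blockToHtml bs)

-- Wrap a body in an opening and closing HTML tag.
tag :: String -> String -> String
tag name body = "<" ++ name ++ ">" ++ body ++ "</" ++ name ++ ">"

-- Escape the characters that are special in HTML text.
escape :: Char -> String
escape '<' = "&lt;"
escape '>' = "&gt;"
escape '&' = "&amp;"
escape c   = [c]

main :: IO ()
main = putStr (writeHtml example)
  where
    example = Doc
      [ Header 1 [Str "Pandoc"]
      , Para [ Str "Readers", Space, Emph [Str "parse"], Str ";", Space
             , Str "writers", Space, Strong [Str "render"], Str "." ]
      ]
```

A reader would produce a `Doc` value like `example` from source text; the writer never needs to know which format that text was in.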
● Readers/Markdown.hs
● Writers/HTML.hs
● Pandoc/Builder.hs
Markdown to HTML
● When doing big, complex things with FP, you're probably going to end up thinking in terms of lists
● Lists are infinitely flexible
● Hard to escape state entirely
○ ReaderState
○ WriterState
● Don't give up
● Force yourself to give a presentation at PDXFunc
General lessons
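One way to see why state is hard to escape: a writer that emits footnotes has to remember them while it walks the document. The sketch below threads that memory through the `State` monad from mtl. The `Inline` type, the `WriterState` record with its `stNotes` field, and both functions are hypothetical illustrations of the idea, not Pandoc's actual definitions.

```haskell
module Main where

import Control.Monad.State

-- Hypothetical inline type with a footnote constructor.
data Inline = Str String | Note String

-- State the writer carries along: footnotes seen so far, newest first.
newtype WriterState = WriterState { stNotes :: [String] }

-- Render one inline; a Note is stored in the state and replaced by a marker.
inlineToText :: Inline -> State WriterState String
inlineToText (Str s) = pure s
inlineToText (Note body) = do
  modify (\st -> st { stNotes = body : stNotes st })
  n <- gets (length . stNotes)
  pure ("[" ++ show n ++ "]")

-- Render the body, then append the collected footnotes in order.
writeText :: [Inline] -> String
writeText inlines = unlines (concat body : footnotes)
  where
    (body, st) = runState (mapM inlineToText inlines) (WriterState [])
    footnotes  = zipWith (\n t -> "[" ++ show n ++ "] " ++ t)
                         [1 :: Int ..] (reverse (stNotes st))

main :: IO ()
main = putStr (writeText
  [ Str "Pandoc", Note "Written in Haskell."
  , Str " converts markup", Note "Via a neutral data type."
  , Str "." ])
```

The state stays local to `writeText`: callers still see a pure function from a list of inlines to a string.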

Pandoc: the deep dive (PDXFunc presentation)
