[FT-11][suhorng] “Poor Man's” Undergraduate Compilers

By suhorng, presented at Functional Thursday Meetup 11.

  1. “Poor Man's” Undergraduate Compilers
     suhorng
     This slide: https://github.com/suhorng/ss/tree/master/ft11
  2. “Poor Man's” Undergraduate Compilers
     What compilers?
     - ss (https://github.com/suhorng/ss/): a minimal functional language compiler
     - compiler13hw (https://github.com/suhorng/compiler13hw/): compiler homework, compiling C-- to MIPS, written in Haskell
     How poor?
     - They are slow.
     - They generate slow code.
  3. ss
     ss originally stands for “small Scheme”.
     - Written in Scheme, compiling a minimal functional language to x86-32 assembly
     - No data types, no optimizations, no need for parsers
     - Only 102 commits!

         suhorng@SHHY-ASPIRE2920 /d/code/test/ss (master)
         $ wc *.ss *.s
            78   363  3645 closure.ss
           192   762  9137 code-gen.ss
            86   465  3588 cps.ss
            11    24   172 issac.ss
           168   811  5584 match-case-simple.ss
            30   115   991 prelude.ss
           313  1495 14334 reg-alloc.ss
           130   477  4942 seq-ir-gen.ss
           104   304  3090 ss.ss
           166   863  8500 type-infer.ss
            85   222  2095 sscrt.s
          1363  5901 56078 total
  4. ss: A Language
     Simply typed λ-calculus with constants and a fixed-point operator. Strict evaluation.

         Terms  e ::= c | x | (+ e1 e2) | (× e1 e2)
                    | (lambda (x1 x2 …) e)
                    | (ifz con th el) | (fix e) | (e1 e2 …)

         Types  t ::= N | () | t1 → t2 | (× t1 t2 …)
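To make the grammar above concrete, here is a minimal sketch of the term language and a strict evaluator in Haskell. It is not part of ss (which is written in Scheme); the constructor names, the reading of `(× e1 e2)` in terms as multiplication, and the `VFix` value that unrolls `(fix f)` only when it is applied are assumptions made for illustration.

```haskell
import qualified Data.Map as M

-- Terms of the ss grammar (hypothetical Haskell encoding; ss itself is Scheme)
data Expr
  = Const Integer          -- c
  | UnitE                  -- ()
  | Var String             -- x
  | Add Expr Expr          -- (+ e1 e2)
  | Mul Expr Expr          -- (× e1 e2), assumed to be multiplication
  | Lam [String] Expr      -- (lambda (x1 x2 …) e)
  | Ifz Expr Expr Expr     -- (ifz con th el): th if con is 0, else el
  | Fix Expr               -- (fix e)
  | App Expr [Expr]        -- (e1 e2 …)
  deriving Show

type Env = M.Map String Value

data Value
  = VNat !Integer
  | VUnit
  | VClosure Env [String] Expr  -- a lambda together with its defining environment
  | VFix Value                  -- (fix f), unrolled only when it is applied

eval :: Env -> Expr -> Value
eval _   (Const n)     = VNat n
eval _   UnitE         = VUnit
eval env (Var x)       = maybe (error ("unbound variable " ++ x)) id (M.lookup x env)
eval env (Add a b)     = arith (+) (eval env a) (eval env b)
eval env (Mul a b)     = arith (*) (eval env a) (eval env b)
eval env (Lam xs body) = VClosure env xs body
eval env (Ifz c t e)   = case eval env c of
  VNat 0 -> eval env t
  VNat _ -> eval env e
  _      -> error "ifz: condition is not a number"
eval env (Fix e)       = VFix (eval env e)
eval env (App f args)  =
  let fv = eval env f
      vs = map (eval env) args
  -- strict evaluation: force the callee and every argument before the call
  in foldr seq (apply fv vs) (fv : vs)

arith :: (Integer -> Integer -> Integer) -> Value -> Value -> Value
arith op (VNat a) (VNat b) = VNat (op a b)
arith _  _        _        = error "arithmetic on a non-number"

apply :: Value -> [Value] -> Value
apply (VClosure cenv xs body) vs
  | length xs == length vs = eval (M.union (M.fromList (zip xs vs)) cenv) body
  | otherwise              = error "arity mismatch"
-- (fix f) v1 … vn  =  (f (fix f)) v1 … vn
apply (VFix f) vs = apply (apply f [VFix f]) vs
apply _ _ = error "applying a non-function"
```

For example, evaluating the loop from the tail-call slide, `((fix (lambda (loop) (lambda (n sum) (ifz n sum (loop (+ n -1) (+ sum n)))))) 5 0)`, gives `VNat 15` in this sketch.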
  5. Passes
         type inference
           ⇓
         CPS transformation
           ⇓
         closure conversion
           ⇓
         transformation into a low-level IR
           ⇓
         register allocation
           ⇓
         machine code generation
  6. Type Inference
     Written as an interpreter:

         [() `(() ,(prim-type 'Unit))]
         [,x (guard (var? x)) `(,x ,(assq x mono-cxt))]
         [(fix ,e)
          (let [(a (fresh-var))
                (built-e (build-type! e mono-cxt))]
            (unify! (expr->type built-e) (fun-type a a))
            `((fix ,built-e) ,a))]
         [(lambda (,[xs ..]) ,e)
          (let* [(obj-as (map (lambda (_) (fresh-var)) xs))
                 (built-e (build-type! e (append (map cons xs obj-as) mono-cxt)))
                 (obj-b (expr->type built-e))
                 (obj-xs (map list xs obj-as))]
            `((lambda (,@obj-xs) ,built-e)
              ,(fun-type (tuple-type obj-as) obj-b)))]
         [(,e1 ,[es ..])
          (let* [(built-e1 (build-type! e1 mono-cxt))
                 (built-es (map (lambda (e) (build-type! e mono-cxt)) es))
                 (obj-a2b (expr->type built-e1))
                 (obj-as (map expr->type built-es))
                 (obj-b (fresh-var))]
            (unify! obj-a2b (fun-type (tuple-type obj-as) obj-b))
            `((,built-e1 ,@built-es) ,obj-b))]
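The Scheme code above walks the term once, creating fresh type variables and calling `unify!` at `fix` and at applications. A minimal Haskell sketch of the same idea follows, assuming a substitution map in place of mutable type variables; `infer`, `unify`, `zonk`, and `typeOf` are hypothetical names, and treating a one-element tuple type as just its element is an assumption based on the `(Int -> Int)` types printed on the later slides.

```haskell
import qualified Data.Map as M
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State (StateT, evalStateT, get, gets, put, modify)

-- A subset of the ss terms (hypothetical Haskell encoding; ss itself is Scheme)
data Expr = Const Integer | Var String | Add Expr Expr | Lam [String] Expr
          | Ifz Expr Expr Expr | Fix Expr | App Expr [Expr]

data Type = TVar Int | TNat | TUnit | TFun Type Type | TTuple [Type]
  deriving (Eq, Show)

-- Assumption: a one-element tuple is identified with its element, matching
-- the (Int -> Int) types the slides print for one-argument functions
tupleType :: [Type] -> Type
tupleType [t] = t
tupleType ts  = TTuple ts

-- Fresh-variable counter plus the substitution that unify extends
data St = St { nextVar :: Int, subst :: M.Map Int Type }
type Infer = StateT St (Either String)

fresh :: Infer Type
fresh = do
  st <- get
  put st { nextVar = nextVar st + 1 }
  return (TVar (nextVar st))

-- Follow bindings of type variables (this replaces the mutation done by unify!)
walk :: Type -> Infer Type
walk t@(TVar v) = gets (M.lookup v . subst) >>= maybe (return t) walk
walk t          = return t

occurs :: Int -> Type -> Infer Bool
occurs v t = do
  t' <- walk t
  case t' of
    TVar u    -> return (u == v)
    TFun a b  -> (||) <$> occurs v a <*> occurs v b
    TTuple ts -> or <$> mapM (occurs v) ts
    _         -> return False

unify :: Type -> Type -> Infer ()
unify a b = do
  a' <- walk a
  b' <- walk b
  case (a', b') of
    (TVar u, TVar v) | u == v -> return ()
    (TVar u, t)              -> bind u t
    (t, TVar v)              -> bind v t
    (TNat, TNat)             -> return ()
    (TUnit, TUnit)           -> return ()
    (TFun x1 y1, TFun x2 y2) -> unify x1 x2 >> unify y1 y2
    (TTuple xs, TTuple ys) | length xs == length ys -> sequence_ (zipWith unify xs ys)
    _ -> lift (Left ("cannot unify " ++ show a' ++ " with " ++ show b'))
  where
    bind v t = do
      cyclic <- occurs v t
      if cyclic then lift (Left "occurs check failed")
                else modify (\st -> st { subst = M.insert v t (subst st) })

infer :: M.Map String Type -> Expr -> Infer Type
infer _   (Const _)   = return TNat
infer cxt (Var x)     = maybe (lift (Left ("unbound " ++ x))) return (M.lookup x cxt)
infer cxt (Add a b)   = mapM_ (\e -> infer cxt e >>= unify TNat) [a, b] >> return TNat
infer cxt (Lam xs e)  = do
  argTys <- mapM (const fresh) xs
  b <- infer (M.union (M.fromList (zip xs argTys)) cxt) e
  return (TFun (tupleType argTys) b)
infer cxt (Ifz c t e) = do
  infer cxt c >>= unify TNat
  tt <- infer cxt t
  infer cxt e >>= unify tt
  return tt
infer cxt (Fix e)     = do       -- (fix e) : a  when  e : a -> a
  te <- infer cxt e
  a <- fresh
  unify te (TFun a a)
  return a
infer cxt (App f args) = do      -- f must have type (× args…) -> b for a fresh b
  tf <- infer cxt f
  tas <- mapM (infer cxt) args
  b <- fresh
  unify tf (TFun (tupleType tas) b)
  return b

-- Apply the final substitution everywhere
zonk :: Type -> Infer Type
zonk t = walk t >>= \t' -> case t' of
  TFun a b  -> TFun <$> zonk a <*> zonk b
  TTuple ts -> TTuple <$> mapM zonk ts
  _         -> return t'

typeOf :: Expr -> Either String Type
typeOf e = evalStateT (infer M.empty e >>= zonk) (St 0 M.empty)
```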
  7. CPS Transformation
     Again an interpreter:

         (define cpsk
           ;; cpsk :: eT -> {(eT -> eC) | kC} -> eC
           (lambda (expr k)
             (match expr
               [(,c ,t) (guard (prim-const? c)) (apply-cont k `(,c ,t))]
               [(,x ,t) (guard (var? x)) (apply-cont k `(,x ,t))]
               [((lambda (,[xs ..]) ,e) ,t)
                (let [(k0 (fresh-var "&"))]
                  (apply-cont k `((lambda ,xs ,k0 ,(cpsk e k0)) ,t)))]
               [((fix ,e) ,t)
                (cpsk e (lambda (v) `((fix ,(mark-type v e) ,(place-cont k)) ,t)))]
               ...

         (define apply-cont
           (lambda (k x)
             (cond [(procedure? k) (k x)]
                   [else `(cont-ap ,k ,x)])))

         (define place-cont
           (lambda (k)
             (cond [(procedure? k)
                    (let [(t (fresh-var "%"))]
                      `(lambda (,t) ,(k t)))]
                   [else k])))
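The key trick in `cpsk` is that a continuation is either a Scheme procedure, applied at compile time so that no administrative redexes are produced, or an object-level continuation variable such as `&1`; `place-cont` turns the former into a `(lambda (%t) …)` only when it has to appear in the output. A minimal Haskell sketch of that idea, for a subset without types, `ifz`, or `fix` (the names `cps`, `K`, `Meta`, `Obj`, and `cpsProgram` are hypothetical):

```haskell
import Control.Monad.Trans.State (State, evalState, get, put)

-- Source subset (hypothetical encoding; types, ifz and fix omitted for brevity)
data Expr = Const Integer | Var String | Add Expr Expr | Lam [String] Expr | App Expr [Expr]

-- CPS target: values, object-level continuations, and serious terms
data Value = VConst Integer | VVar String | VLam [String] String CExpr  -- (lambda xs k body)
  deriving (Eq, Show)
data ContE = KVar String | KLam String CExpr                           -- &k or (lambda (%t) body)
  deriving (Eq, Show)
data CExpr
  = ContAp ContE Value          -- (cont-ap k v)
  | PrimAdd Value Value ContE   -- (+ v1 v2 k)
  | Call Value [Value] ContE    -- (f v… k)
  deriving (Eq, Show)

-- Like cpsk's k argument: a meta-level procedure or an object-level continuation variable
data K = Meta (Value -> State Int CExpr) | Obj String

fresh :: String -> State Int String
fresh prefix = do
  n <- get
  put (n + 1)
  return (prefix ++ show n)

applyCont :: K -> Value -> State Int CExpr
applyCont (Meta k) v = k v
applyCont (Obj k)  v = return (ContAp (KVar k) v)

-- Reify a meta continuation as (lambda (%t) …) only when it must appear in the output
placeCont :: K -> State Int ContE
placeCont (Meta k) = do
  t <- fresh "%"
  body <- k (VVar t)
  return (KLam t body)
placeCont (Obj k) = return (KVar k)

cps :: Expr -> K -> State Int CExpr
cps (Const c)  k = applyCont k (VConst c)
cps (Var x)    k = applyCont k (VVar x)
cps (Lam xs e) k = do
  k0 <- fresh "&"
  body <- cps e (Obj k0)
  applyCont k (VLam xs k0 body)
cps (Add a b)  k =
  cps a $ Meta $ \va ->
  cps b $ Meta $ \vb -> PrimAdd va vb <$> placeCont k
cps (App f args) k =
  cps f $ Meta $ \vf ->
  cpsList args $ \vs -> Call vf vs <$> placeCont k

-- Transform arguments left to right, collecting their values
cpsList :: [Expr] -> ([Value] -> State Int CExpr) -> State Int CExpr
cpsList []       k = k []
cpsList (e : es) k = cps e $ Meta $ \v -> cpsList es (\vs -> k (v : vs))

-- Transform a whole program against the continuation variable named k
cpsProgram :: String -> Expr -> CExpr
cpsProgram k e = evalState (cps e (Obj k)) 1
```

On the example of the next slide, `((lambda (a) ((lambda (b) a) 5)) (+ 1 2))`, this produces `(+ 1 2 (lambda (%3) …))` with the same shape as the slide's output, up to the numbering of fresh names.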
  8. Closure Conversion
     Interpreters again. Computing free variables:

         (match expr
           [(,c ,t) (guard (prim-const? c)) '( () )]
           [(,x ,t) (guard (var? x)) `( ((,x ,t)) )]
           [((lambda (,[xs ..]) ,k ,e) ,t) ; lambda abstraction
            (let [(var/e (uncover-free-vars e))]
              `(,(remove-assoc* (map car xs) (car var/e)) ,var/e))]

     Closure conversion:

         [((,x ,t) __) (guard (var? x))
          (cond [(memq x bound-vars) `(,x ,t)]
                [else `((this-ref ,x) ,t)])]
         [(((lambda (,[xs ..]) ,k ,e) ,t) (,free-vars ,var/e)) ; lambda abstraction
          (let [(fv-ref (map (lambda (x) (closure-convert x `((x)) bound-vars))
                             free-vars))]
            `((closure ,fv-ref
                ((lambda ,xs ,k ,(closure-convert e var/e (map car xs))) ,t))
              ,t))]
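The two halves of the slide, computing free variables and then replacing each free variable with a `(this-ref x)` read from the current closure, can be sketched in Haskell as follows. This is a hypothetical illustration on direct-style terms rather than the CPS terms ss actually converts; `freeVars`, `closureConvert`, and the `CExpr` constructors are assumed names. As on the slide, the values stored in a new closure are computed in the enclosing context, so a variable captured two levels deep is copied out of the middle closure (flat closures).

```haskell
import qualified Data.Set as S

-- Direct-style source subset (hypothetical encoding; the slides convert CPS code)
data Expr = Const Integer | Var String | Add Expr Expr | Lam [String] Expr | App Expr [Expr]

-- Closure-converted terms
data CExpr
  = CConst Integer
  | CVar String                                -- a parameter of the enclosing function
  | CThisRef String                            -- (this-ref x): a captured variable
  | CAdd CExpr CExpr
  | CClosure [(String, CExpr)] [String] CExpr  -- captured values, parameters, body
  | CApp CExpr [CExpr]
  deriving (Eq, Show)

freeVars :: Expr -> S.Set String
freeVars (Const _)    = S.empty
freeVars (Var x)      = S.singleton x
freeVars (Add a b)    = freeVars a `S.union` freeVars b
freeVars (Lam xs e)   = freeVars e `S.difference` S.fromList xs
freeVars (App f args) = S.unions (map freeVars (f : args))

-- bound: the parameters of the function whose body is being converted
closureConvert :: [String] -> Expr -> CExpr
closureConvert _ (Const c) = CConst c
closureConvert bound (Var x)
  | x `elem` bound = CVar x
  | otherwise      = CThisRef x
closureConvert bound (Add a b) = CAdd (closureConvert bound a) (closureConvert bound b)
closureConvert bound lam@(Lam xs e) =
  -- each free variable is fetched in the current context, then copied into the new closure;
  -- inside the new body only its own parameters are bound
  CClosure [ (x, closureConvert bound (Var x)) | x <- S.toList (freeVars lam) ]
           xs
           (closureConvert xs e)
closureConvert bound (App f args) =
  CApp (closureConvert bound f) (map (closureConvert bound) args)
```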
  9. Now the code looks like this. The source program

         ((lambda (a) ((lambda (b) a) 5)) (+ 1 2))

     becomes, after type inference, CPS transformation, and closure conversion:

         ((closure ()
            ((lambda ((argv Unit)) &1
               ((+ (1 Int) (2 Int)
                   (lambda (%2)
                     ((((closure ()
                          ((lambda ((a Int)) &3
                             ((((closure ((a Int))
                                  ((lambda ((b Int)) &4
                                     (cont-ap &4 ((this-ref a) Int)))
                                   (Int -> Int)))
                                (Int -> Int))
                               (5 Int))
                              &3 Int))
                           (Int -> Int)))
                        (Int -> Int))
                       (%2 Int))
                      &1 Int)))
                Int))
             (Unit -> Int)))
          (Unit -> Int))
  10. Now the code looks like this (same program, indented, with most type annotations omitted):

          ((lambda (a) ((lambda (b) a) 5)) (+ 1 2))

          ((closure ()
             ((lambda ((argv Unit)) &1
                ((+ (1 Int) (2 Int)
                    (lambda (%2)
                      ((((closure ()
                           ((lambda ((a Int)) &3
                              ((((closure ((a Int))
                                   ((lambda ((b Int)) &4
                                      (cont-ap &4 ((this-ref a) Int))))))
                                (5 Int))
                               &3)))))
                        (%2 Int))
                       &1))))))))
  11. Low-level IR
      Flatten the continuations and lift functions to the top level:

          (((closure ::fn1 ()) (Unit -> Int))
           ((lambda ::fn1 (Unit -> Int) () ((argv Unit))
              ((%2 : Int <- (1 + 2))
               (tail-call (function ::fn2) %2)))
            (lambda ::fn2 (Int -> Int) () ((a Int))
              ((%f1 : (Int -> Int) <- (closure ::fn3 (a)))
               (tail-call %f1 5)))
            (lambda ::fn3 (Int -> Int) ((a Int)) ((b Int))
              ((ret (this-ref a))))))

      Here comes a machine!
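Lifting functions to the top level, as in the IR above, amounts to giving each closure body a name and leaving behind only `(closure ::fnN (captured …))`. A minimal Haskell sketch of just the lifting step follows (it does not flatten continuations into `tail-call`s); the `CExpr`, `LExpr`, and `TopFun` types and the `fnN` naming scheme are assumptions for illustration.

```haskell
import Control.Monad.Trans.State (State, runState, get, put, modify)

-- Closure-converted input (as a closure-conversion pass might produce it)
data CExpr
  = CConst Integer | CVar String | CThisRef String | CAdd CExpr CExpr
  | CClosure [(String, CExpr)] [String] CExpr   -- captured values, parameters, body
  | CApp CExpr [CExpr]

-- Output: a closure names a top-level function instead of containing its body
data LExpr
  = LConst Integer | LVar String | LThisRef String | LAdd LExpr LExpr
  | LClosure String [LExpr]                     -- like (closure ::fn3 (a))
  | LApp LExpr [LExpr]
  deriving (Eq, Show)

data TopFun = TopFun
  { fnName   :: String
  , fnFree   :: [String]   -- free variables stored in the closure
  , fnParams :: [String]
  , fnBody   :: LExpr
  } deriving (Eq, Show)

-- State: next function number and the functions lifted so far (most recent first)
type Lift = State (Int, [TopFun])

liftExpr :: CExpr -> Lift LExpr
liftExpr (CConst c)    = return (LConst c)
liftExpr (CVar x)      = return (LVar x)
liftExpr (CThisRef x)  = return (LThisRef x)
liftExpr (CAdd a b)    = LAdd <$> liftExpr a <*> liftExpr b
liftExpr (CApp f args) = LApp <$> liftExpr f <*> mapM liftExpr args
liftExpr (CClosure caps params body) = do
  (n, fns) <- get
  put (n + 1, fns)
  let name = "fn" ++ show n          -- number the outer function before its inner ones
  body' <- liftExpr body
  caps' <- mapM (liftExpr . snd) caps
  modify (\(m, fs) -> (m, TopFun name (map fst caps) params body' : fs))
  return (LClosure name caps')

-- The entry expression plus the lifted functions, innermost functions first
liftProgram :: CExpr -> (LExpr, [TopFun])
liftProgram e =
  let (entry, (_, fns)) = runState (liftExpr e) (1, [])
  in (entry, reverse fns)
```

Applied to a closure-converted direct-style version of `((lambda (a) ((lambda (b) a) 5)) (+ 1 2))`, this lifts two functions: `fn1` with parameter `a`, and `fn2` with parameter `b` capturing `a`. The slide has one more function because its input is the CPS-converted program wrapped in a top-level `(lambda ((argv Unit)) …)`.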
  12. Register Allocation
      A MESS :-D
  13. Machine Code Generation
      From pseudo-assembly...

          ((lambda ::fn1 (Unit -> Int) () ((argv Unit)) ((argv Unit)) 1 () ()
             ((make-call-stack 1)
              (eax <- (const 3))
              ((arg 0) <- eax)
              (edi <- (function ::fn2))
              (tail-call 1 (function ::fn2))))
           (lambda ::fn2 (Int -> Int) () ((a Int)) ((a Int) (%f1 Int -> Int)) 2 (ebx) ()
             ((make-call-stack 2)
              (eax <- (function ::fn3))
              ((arg 0) <- eax)
              (eax <- (const 1))
              ((arg 1) <- eax)
              (call-prim 2 make_closure)
              (edx <- (formal a))
              ((closure eax 0) <- edx)
              (make-call-stack 1)
              (ebx <- eax)
              (eax <- (const 5))
              ((arg 0) <- eax)
              (edi <- ebx)
              (tail-call 1 ebx)))
           (lambda ::fn3 (Int -> Int) ((a Int)) ((b Int)) ((a Int) (b Int)) 0 () ()
             ((eax <- (this-ref a))
              return)))
  14. Machine Code Generation
      ...to concrete machine code:

          ((lambda _ss_function_fn1 (Unit -> Int) () ((argv Unit)) ((argv Unit)) ()
             ((sub esp 4)
              (mov eax 3)
              (mov (* (esp + 0)) eax)
              (mov edi _ss_function_fn2)
              (lea edx (* (esp + 4)))
              (mov ecx (* (esp + 4)))
              (mov eax (* (esp + 0)))
              (mov (* (edx + 4)) eax)
              (mov (* (edx)) ecx)
              (mov esp edx)
              (jmp _ss_function_fn2_code)))
           (lambda _ss_function_fn2 (Int -> Int) () ((a Int)) ((a Int) (%f1 Int -> Int)) ()
             ((push ebp)
              (mov ebp esp)
              (sub esp 4)
              (mov (* (ebp - 4)) ebx)
              (sub esp 8)
              (mov eax _ss_function_fn3)
              (mov (* (esp + 0)) eax)
              (mov eax 1)
              (mov (* (esp + 4)) eax)
              (call _ss_prim_make_closure)
              (mov edx (* (ebp + 8)))
              (mov (* (eax + 4)) edx)
              (sub esp 4)
              (mov ebx eax)
              (mov eax 5)
              (mov (* (esp + 0)) eax)
              (mov edi ebx)
              (mov ebx (* (ebp - 4)))
              (mov edx (* (ebp)))
              (mov ecx (* (ebp + 4)))
              (lea ebp (* (esp + 12)))
              (mov eax (* (esp + 0)))
              (mov (* (ebp + 4)) eax)
              (mov (* (ebp)) ecx)
              (mov esp ebp)
              (mov ebp edx)
              (jmp (* (edi)))))
           (lambda _ss_function_fn3 (Int -> Int) ((a Int)) ((b Int)) ((a Int) (b Int)) ()
             ((mov eax (* (edi + 4)))
              (ret 4))))
  15. Tail Calls
      Loops:

          ((fix (lambda (loop)
                  (lambda (n sum)
                    (ifz n sum (loop (+ n -1) (+ sum n))))))
           5 0)

      Compare:

          int sum = 0;
          for (int i = n; i != 0; i = i - 1)
              sum = sum + i;

      Naively implementing tail calls:
      - Place the function call arguments as usual.
      - Move the arguments over the caller's own arguments.
      - Adjust the frame pointer, then jump.
  16. Tail Calls

          ;#######################################
          ; _ss_function_fn3: ((* Int Int) -> Int)
          ; parameters:
          ;   ((n Int) (sum Int))
          ; free variables:
          ;   ((loop ((* Int Int) -> Int)))
          ;#######################################
          _ss_function_fn3_code:
              ; Note: this function doesn't have a frame
              cmp dword [esp + 4], 0
              jne .L1
              mov eax, [esp + 8]
              ret 8                   ; terminating loop
          .L1:
              mov eax, [esp + 8]      ; eax := sum
              add eax, [esp + 4]      ; eax (sum') += n
              mov edx, [esp + 4]
              add edx, -1             ; edx (n') := n - 1
              sub esp, 8              ; place arguments as usual
              mov [esp], edx          ; | sum' | esp+4
              mov [esp + 4], eax      ; | n'   | esp
              mov edi, [edi + 4]      ; load closure pointer
              lea edx, [esp + 8]
              mov ecx, [esp + 8]      ; move new arguments up
              mov eax, [esp + 4]
              mov [edx + 8], eax      ; sum' overrides sum,
              mov eax, [esp]
              mov [edx + 4], eax      ; so does n'!
              mov [edx], ecx
              mov esp, edx
              jmp [edi]
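The argument shuffle in `_ss_function_fn3_code` can be replayed on a toy model of the stack to check that the frame lands in the same place on every iteration, so the loop runs in constant stack space. This Haskell sketch is an illustration only: memory is a map from byte addresses to 32-bit words, the return-address value in the example is made up, and the closure load into `edi` and the final `ret 8` are not modelled.

```haskell
import qualified Data.Map as M

-- Toy model of the x86-32 stack: word contents keyed by byte address, plus esp
data Machine = Machine { esp :: Int, mem :: M.Map Int Int } deriving (Eq, Show)

load :: Machine -> Int -> Int
load m addr = M.findWithDefault 0 addr (mem m)

store :: Int -> Int -> Machine -> Machine
store addr v m = m { mem = M.insert addr v (mem m) }

-- Entry layout of loop(n, sum): [esp] = return address, [esp+4] = n, [esp+8] = sum.
-- Replays the tail call in _ss_function_fn3_code (the closure load into edi is omitted).
tailCall :: Int -> Int -> Machine -> Machine
tailCall n' sum' m0 =
  let m1  = m0 { esp = esp m0 - 8 }                           -- sub esp, 8
      m2  = store (esp m1 + 4) sum' (store (esp m1) n' m1)    -- place arguments as usual
      edx = esp m2 + 8                                        -- lea edx, [esp + 8]
      ecx = load m2 (esp m2 + 8)                              -- mov ecx, [esp + 8] (return address)
      m3  = store (edx + 8) (load m2 (esp m2 + 4)) m2         -- sum' overrides sum
      m4  = store (edx + 4) (load m3 (esp m3)) m3             -- so does n'
      m5  = store edx ecx m4                                  -- mov [edx], ecx
  in m5 { esp = edx }                                         -- mov esp, edx; then jmp [edi]

-- Run the compiled loop until n reaches 0, returning sum and the final machine state
runLoop :: Machine -> (Int, Machine)
runLoop m
  | n == 0    = (s, m)                                        -- mov eax, [esp + 8] (ret 8 not modelled)
  | otherwise = runLoop (tailCall (n - 1) (s + n) m)
  where
    n = load m (esp m + 4)
    s = load m (esp m + 8)
```

Starting from `esp = 1000` with `n = 5` and `sum = 0`, `runLoop` returns 15 and leaves `esp` at 1000 with the original return address still in place.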
  17. compiler13hw
      A compiler for a small subset of C, implemented in Haskell by suhorng & kevin4314 (nothing special).
      Parsing is done using Happy.
      cf. “Happy MonadFix, Easy n-pass compiler” by CindyLinz
  18. compiler13hw: yet the register allocation is still a mess :-D

          do {- read input & parsing -}
             let Right parsedAST = Parser.parse input
             {- semantic check -}
             let compareCompileError ce1 ce2 = compare (errLine ce1) (errLine ce2)
             (ast, ces) <- runWriterT $ censor (sortBy compareCompileError) $ do
               foldedAST <- Const.constFolding parsedAST
               typeInlinedAST <- Desugar.tyDesugar foldedAST
               let decayedAST = Desugar.fnArrDesugar typeInlinedAST
               symbolAST <- SymTable.buildSymTable decayedAST
               typedAST <- TypeCheck.typeCheck symbolAST
               return $ NormalizeAST.normalize typedAST
             when (not $ null ces) $
               mapM_ (putStrLn . show) ces >> exit1
             {- code generation -}
             let adjustedAST = SethiUllman.seull ast
                 llir        = LLIRTrans.llirTrans adjustedAST
                 inlinedLLIR = EmptyBlockElim.elim llir
                 llirFuncs   = LLIR.progFuncs inlinedLLIR
                 llirGlobl   = LLIR.progVars inlinedLLIR
                 llirRegs    = LLIR.progRegs inlinedLLIR
             let mips     = MIPSTrans.transProg $ inlinedLLIR
                 simpMips = BlockOrder.jumpElim . BlockOrder.blockOrder $ mips
             print simpMips
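The `censor (sortBy compareCompileError)` wrapper above lets every pass report errors through the Writer independently, while the user still sees them in line order. A small self-contained sketch of that pattern using `Control.Monad.Trans.Writer` from the transformers package; the `CompileError` type, the toy `Program`, and both passes are hypothetical.

```haskell
import Control.Monad.Trans.Writer (Writer, runWriter, tell, censor)
import Data.List (sortBy)
import Data.Ord (comparing)

-- Hypothetical error type carrying a line number, like the one used on the slide
data CompileError = CompileError { errLine :: Int, errMsg :: String }
  deriving (Eq, Show)

-- A hypothetical toy program: each statement is (line number, identifier it uses)
type Program = [(Int, String)]

-- Pass 1: report uses of undeclared identifiers
checkDeclared :: [String] -> Program -> Writer [CompileError] Program
checkDeclared decls prog = do
  mapM_ (\(l, x) -> tell [CompileError l ("Undeclared identifier '" ++ x ++ "'")])
        [s | s@(_, x) <- prog, x `notElem` decls]
  return prog

-- Pass 2: report identifiers that are reserved words
checkReserved :: [String] -> Program -> Writer [CompileError] Program
checkReserved reserved prog = do
  mapM_ (\(l, x) -> tell [CompileError l ("'" ++ x ++ "' is reserved")])
        [s | s@(_, x) <- prog, x `elem` reserved]
  return prog

-- Run the passes in sequence; censor sorts the combined error log by line number,
-- so errors come out in source order even though each pass reports its own batch
compile :: [String] -> Program -> (Program, [CompileError])
compile decls prog = runWriter $ censor (sortBy (comparing errLine)) $ do
  p1 <- checkDeclared decls prog
  checkReserved ["int"] p1
```

With `compile ["int"] [(1, "int"), (2, "y")]`, the second pass reports line 1 after the first pass has already reported line 2, and `censor` puts the two errors back in line order.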
  19. compiler13hw

          -- all are interpreters
          alphaConvAST s@(S.Block _ _) = runLocal $ alphaConvBlock' s
          alphaConvAST (S.Expr ty line rand rators) =
            S.Expr ty line rand <$> mapM alphaConvAST rators
          alphaConvAST (S.ImplicitCast ty' ty e) =
            S.ImplicitCast ty' ty <$> alphaConvAST e

          buildMStmt (P.While line whcond whcode) = do
            whcond' <- buildMStmts whcond
            whcode' <- runLocal (buildMStmt whcode)
            return $ S.While line whcond' whcode'
          buildMStmt (P.Identifier line name) = do
            currScope <- get
            upperScope <- ask
            let ty = fmap S.varType $ lookup name currScope <|> lookup name upperScope
            case ty of
              Just S.TTypeSyn -> tell [errorAt line $ "Unexpected type synonym '" ++ name ++ "'"]
              Nothing -> tell [errorAt line $ "Undeclared identifier '" ++ name ++ "'"]
              otherwise -> return ()
            return $ S.Identifier (error "buildMStmt:Identifier") line name

          tyCheckAST (S.Expr _ line rator [rand1, rand2])
            | rator `elem` logicOps = do
                rand1' <- tyCheckAST rand1
                rand2' <- tyCheckAST rand2
                let (t1, t2) = (S.getType rand1', S.getType rand2')
                when ((not $ tyIsScalarType t1) || (not $ tyIsScalarType t2)) $
                  tell [errorAt line $ "'" ++ show rator ++ "' is applied to operands of non-scalar typ
                return $ S.Expr S.TInt line rator [rand1', rand2']
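`buildMStmt` reads the current scope with `get` and the enclosing scopes with `ask`, and reports errors with `tell`, which suggests a Reader/Writer/State stack. The slide does not show `runLocal` or the monad it runs in, so this Haskell sketch, using `RWS` from the transformers package, only guesses at that structure; `Stmt`, `checkStmt`, `checkProgram`, and this version of `runLocal` are hypothetical.

```haskell
import Control.Monad (unless)
import Control.Monad.Trans.RWS (RWS, runRWS, ask, local, get, put, modify, tell)

-- Hypothetical mini-language: declarations, uses (with a line number), nested blocks
data Stmt = Decl String | Use Int String | Block [Stmt]

-- Reader: names visible from enclosing blocks; Writer: error messages;
-- State: names declared so far in the current block
type Check = RWS [String] [String] [String]

-- A guess at what a helper like runLocal does: check a nested block with an empty
-- current scope, while the outer declarations become part of the upper scope
runLocal :: Check a -> Check a
runLocal m = do
  cur <- get
  r <- local (cur ++) (put [] >> m)
  put cur
  return r

checkStmt :: Stmt -> Check ()
checkStmt (Decl x) = modify (x :)
checkStmt (Use line x) = do
  currScope <- get
  upperScope <- ask
  unless (x `elem` currScope || x `elem` upperScope) $
    tell ["line " ++ show line ++ ": Undeclared identifier '" ++ x ++ "'"]
checkStmt (Block ss) = runLocal (mapM_ checkStmt ss)

checkProgram :: [Stmt] -> [String]
checkProgram ss = errs
  where ((), _, errs) = runRWS (mapM_ checkStmt ss) [] []
```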
  20. Writing Compilers Using Functional Languages
      - Interpreters love you!
      - Simple ASTs (Pointers? new, delete? Visitor pattern? Wat?)
      - I don't know how to deal with similar ASTs.
      - I don't know how to express constraints on ASTs (e.g. only allow a subset of operators).
      - I haven't thought of good ways to implement register allocation.
