SlideShare a Scribd company logo
1 of 31
Download to read offline
Inside (C)Python
Tom Lee @tglee

 OSCON 2012
19th July, 2012
Overview

• About   me!
 • Implemented unified try/except/finally for Python 2.5 (Georg Brandl’s PEP 341)
 • Support for compiling ASTs to bytecode from within Python programs for 2.6
 • Wannabe compiler nerd.

• Terminology    & brief intro to compilers:
 • Grammars, Scanners & Parsers
 • Parse Trees & Abstract Syntax Trees
 • General architecture of a bytecode compiler

• How   to add new syntax to Python:
 • Python compiler architecture & summary
 • Case study in adding a new keyword
 • Demo!
“Python” vs. “CPython”

• When    I say “Python” in this presentation, I mean CPython.
• Other   Python implementations:
 • Jython
 • PyPy
 • IronPython
 • Stackless Python
 • More...

• Many    of the concepts here are likely applicable, but the details may differ wildly
• Refer   to the slides for my Inside PHP presentation to see how things may differ.
• Python   3.x
Compilers 101: Scanners

• Or   lexical analyzers, or tokenizers
• Input:   raw source code
• Output:   a stream of tokens
                                                 IF_TOK


                                                IDENT("a")
                                    if a == b
                                                 EQ_TOK


                                                IDENT("b")
Compilers 101: Parsers

• Input:   a stream of tokens from the scanner
• Output     is implementation dependent
• The    goal of a parser is to structure to the sequence of tokens from the scanner.
• Parsers    are frequently automatically generated from a grammar/DSL
 • See   parser generators like Yacc/Bison, ANTLR, etc. or e.g. parser combinators in Haskell, Scala, ML.


                         IF_TOK              If

                                                  Compare
                        IDENT("a")
                                                      Eq    Name("a")   Name("b")
                         EQ_TOK


                        IDENT("b")
Compilers 101: Context-free grammars

• Or   simply “grammar”
•A   grammar describes the complete syntax of a (programming) language.
• Usually     expressed in Extended Backus-Naur Form (EBNF)
 • Or   some variant thereof.

• Variants    of EBNF used for a lot of DSL-based parser generators
 • e.g.   Yacc/Bison, ANTLR, etc.
Compilers 101: Parse Tree


• Or     concrete syntax tree.
• “...
    an ordered, rooted tree that represents the [full] syntactic structure of a string
 according to some formal grammar.”
  • http://en.wikipedia.org/wiki/Parse_tree

• Python’s     parser constructs a parse tree based on Grammar/Grammar
  • Must   be manually converted to Abstract Syntax Tree before we do anything fun.
Compilers 101: Abstract Syntax Tree

• AST

• Rooted     tree data structure.
• In-memory,       intermediate representation of the input (source) program.
 • “Abstract”   in that it doesn’t necessarily include every detail that appears in the real syntax.

•A   code generator can traverse an AST and emit target code.
 • e.g.   Python byte code
                                                                    If
• Certain
        optimizations are also convenient at
 the AST level.                                                          Compare

                                                                             Eq      Name("a")         Name("b")
Generalized Compiler Architecture*

    Source files   Source code                Scanner                  Token stream




                                                                           Parser




     Bytecode                                                          Abstract
                    Bytecode              Code Generator
    Interpreter                                                       Syntax Tree




                           * Actually a generalized *bytecode* compiler architecture
Generalized *Python* Compiler Architecture

         Source files                 Source code                       Scanner                            Token stream


                                                                rammar*
                                                       Grammar/G


                       * Grammar/Grammar is processed by Parser/pgen to generate a scanner & parser

                                                                                                      r*
                                                                                             /Gramma           Parser
                                                                                    Grammar
                                                                                              &
                                                                                                   .c
                                                                                        P ython/ast




          Bytecode                                                                                         Abstract
                                       Bytecode                    Code Generator
         Interpreter                                                                                      Syntax Tree

                                                                                                       arser/Py thon.asdl
           n/ceval.c                                        Python/compile.c                          P
     Pytho
Case Study: The “unless” statement

                                                It’s basically
          will_give_you_up = False
                                                  “if not ...”
          unless will_give_you_up:
            print(“Never gonna give you up.”)

          will_let_you_down = True

          unless will_let_you_down:
            print(“Letting you down :’(”)

          -- output --

          Never gonna give you up.
Adding the “unless” construct to (C)Python

1.Describe the syntax of the new construct
2.Add new AST node(s) if necessary
3.Write a parse tree to Abstract Syntax Tree (AST) transform
4.Emit bytecode for your new AST node(s)
Before you start...

• You’ll      need the usual gcc toolchain, GNU Bison, etc.
  • Debian/Ubuntuapt-get install build-essential
  • OSX Xcode command line tools should give you most of what you need.

• Grab       the Python code from Mercurial
  • I’m   working with the stable Python 3.2 branch
    • Only   because tip was breaking in weird & wonderful ways at the time of rehearsal!
  • The   steps described here should also work for any version of Python >= 2.5.x
    • Beware   of cyclic dependencies in the build process when modifying the grammar.
1. Describe the syntax of the new construct

• Or:   describe how your new language construct “looks” in source code
• Modify    the grammar in Grammar/Grammar
 • EBNF-ish  DSL                                               Scanner         Token stream
 • Used by Parser/pgen, Python’s custom
    parser generator, to generate the C code that            rammar*
                                                    Grammar/G
    drives Python’s parser and scanner.
                                                                         rammar*   Parser
                                                              Grammar/G
• Riffing
       on existing constructs is a great way to learn:                 &
                                                                             .c
                                                                  P ython/ast
 how does an if statement look like in the grammar?
• Let’s   specify the syntax of unless
2. Add new AST node(s) if necessary

• Parser/Python.asdl
 • Zephyr Abstract Syntax Definition Language                         -- a DSL for describing ASTs.
 • http://asdl.sourceforge.net/
 • The Python build process generates C code                         describing the AST.
   • See   Parser/asdl_c.py and Parser/asdl.py for details of how this code is generated.


• Can   you represent your new Python construct using existing AST nodes?
 • Great!    Skip this step.

• Let’s    go ahead and add an Unless node anyway.
 • Demonstration          using If and Not nodes if we have time ...                        If

                                                                                                 Compare

                                                                                                     Eq    Name("a")   Name("b")
3. Parse tree to AST transformation

• Recall   the parser that is generated from Grammar/Grammar
• The   generated parser constructs a parse tree without any effort on our part ...
• ...
   BUT! earlier we saw that the
                                                                                            *
                                                                                    Grammar
 code generator takes an AST as input.                                             /                    Parser
                                                                          Grammar
                                                                                    &
                                                                                         .c
                                                                              P ython/ast

• We need to manually transform the parse                                                                                See?

 tree into an AST.
                                                                                                    Abstract
                                                        Code Generator

• Python/ast.c
                                                                                                   Syntax Tree
                                                                                                                     l
                                                              pile   .c                         Parser/P ython.asd
                                                  Py thon/com

• Again,look to the existing if statement for                                     If

                                                                                       Compare


 inspiration.                                                                              Eq        Name("a")   Name("a")
4. Emit bytecode for the new AST node

• This   step is only necessary if you added a new AST node in step 2.
• At   this point our AST is ready to compile to bytecode.
• Traverse    the tree, emitting bytecode as we go.
• What    bytecode instructions do we need to use?
 • What   does the bytecode for a simple if statement look like? How would we modify it?
Intermission: Python VM + bytecode
                                                                       data = []

• Python     has a stack based virtual machine.                        def push(x):
                                                                         data.append(x)
• Values     get pushed onto a data stack.
                                                                       def pop():
• Opcodes       manipulate the stack.                                    return data.pop()
 • e.g.   binary op: pop two values, compute result, push new value.
                                                                       def add():
• Control     flow using unconditional & conditional jumps.               push(pop() + pop())

                                                                       def sub():
                                                                         push(pop() - pop())

                                                                       push(5)      # [5]
                                                                       push(6)      # [5, 6]
                                                                       add()        # [11]
Bytecode for “if x == y: ...”
                           LOAD_NAME x




                           LOAD_NAME y         x == y




                       COMPARE_OP PyCmp_EQ




                       POP_JUMP_IF_FALSE end




                                ...            if …: ...




                                ...




                                ...              end
Bytecode for “if x == y: ...”
                           LOAD_NAME x




                           LOAD_NAME y         x == y




                       COMPARE_OP PyCmp_EQ




                       POP_JUMP_IF_FALSE end




                                ...            if …: ...




                                ...




                                ...              end
Bytecode for “if x == y: ...”
                           LOAD_NAME x




                           LOAD_NAME y         x == y




                       COMPARE_OP PyCmp_EQ




                       POP_JUMP_IF_FALSE end




                                ...            if …: ...




                                ...




                                ...              end
Bytecode for “if x == y: ...”
                           LOAD_NAME x




                           LOAD_NAME y         x == y




                       COMPARE_OP PyCmp_EQ




                       POP_JUMP_IF_FALSE end




                                ...            if …: ...




                                ...




                                ...              end
Bytecode for “if x == y: ...”
                           LOAD_NAME x




                           LOAD_NAME y         x == y




                       COMPARE_OP PyCmp_EQ




                       POP_JUMP_IF_FALSE end




                                ...            if …: ...




                                ...




                                ...              end
Bytecode for “if x == y: ...”
                           LOAD_NAME x




                           LOAD_NAME y         x == y




                       COMPARE_OP PyCmp_EQ




                       POP_JUMP_IF_FALSE end




                                ...            if …: ...




                                ...




                                ...              end
Bytecode for “if x == y: ...”
                           LOAD_NAME x




                           LOAD_NAME y         x == y




                       COMPARE_OP PyCmp_EQ




                       POP_JUMP_IF_FALSE end




                                ...            if …: ...




                                ...




                                ...              end
Bytecode for “if x == y: ...”
                           LOAD_NAME x




                           LOAD_NAME y         x == y




                       COMPARE_OP PyCmp_EQ




                       POP_JUMP_IF_FALSE end




                                ...            if …: ...




                                ...




                                ...              end
Bytecode for “if x == y: ...”
                           LOAD_NAME x




                           LOAD_NAME y         x == y




                       COMPARE_OP PyCmp_EQ




                       POP_JUMP_IF_FALSE end




                                ...            if …: ...




                                ...




                                ...              end
4. Emit bytecode for the new AST node (cont.)

• Python/compile.c
 • Look   for <AST>_kind to track down existing impls for AST nodes -- e.g. ClassDef_kind for classes

• Once    again: how is if implemented?
 • Look   for If_kind

• How     can we modify this to implement unless?
Demo!

• The    usual ./configure && make dance on Linux/OSX.
• With    a little luck, magic happens and you get a binary in the root directory.
• ...   So, uh, does it work?
And exhale.

• Lots    to take in, right?
 • In   my experience, this stuff is best learned bit-by-bit through practice.

• Ask    questions!
 • Google
 • Python-Dev
 • Or hey, ask me...
Thanks!




          oscon@tomlee.co    @tglee    http://github.com/thomaslee

                            http://newrelic.com

More Related Content

What's hot (20)

Playfulness at Work
Playfulness at WorkPlayfulness at Work
Playfulness at Work
 
Your Own Metric System
Your Own Metric SystemYour Own Metric System
Your Own Metric System
 
Python - Introduction
Python - IntroductionPython - Introduction
Python - Introduction
 
Better rspec 進擊的 RSpec
Better rspec 進擊的 RSpecBetter rspec 進擊的 RSpec
Better rspec 進擊的 RSpec
 
Python Workshop
Python WorkshopPython Workshop
Python Workshop
 
Python by Rj
Python by RjPython by Rj
Python by Rj
 
Python ppt
Python pptPython ppt
Python ppt
 
Nethemba - Writing exploits
Nethemba - Writing exploitsNethemba - Writing exploits
Nethemba - Writing exploits
 
Python Tutorial for Beginner
Python Tutorial for BeginnerPython Tutorial for Beginner
Python Tutorial for Beginner
 
Learn Python The Hard Way Presentation
Learn Python The Hard Way PresentationLearn Python The Hard Way Presentation
Learn Python The Hard Way Presentation
 
Os Goodger
Os GoodgerOs Goodger
Os Goodger
 
Python idiomatico
Python idiomaticoPython idiomatico
Python idiomatico
 
Programming with Python - Basic
Programming with Python - BasicProgramming with Python - Basic
Programming with Python - Basic
 
python.ppt
python.pptpython.ppt
python.ppt
 
LoteríA Correcta
LoteríA CorrectaLoteríA Correcta
LoteríA Correcta
 
Majlis Persaraan Pn.Hjh.Normah bersama guru-guru Sesi Petang
Majlis Persaraan Pn.Hjh.Normah bersama guru-guru Sesi PetangMajlis Persaraan Pn.Hjh.Normah bersama guru-guru Sesi Petang
Majlis Persaraan Pn.Hjh.Normah bersama guru-guru Sesi Petang
 
Jerry Shea Resume And Addendum 5 2 09
Jerry  Shea Resume And Addendum 5 2 09Jerry  Shea Resume And Addendum 5 2 09
Jerry Shea Resume And Addendum 5 2 09
 
MMBJ Shanzhai Culture
MMBJ Shanzhai CultureMMBJ Shanzhai Culture
MMBJ Shanzhai Culture
 
Washington Practitioners Significant Changes To Rpc 1.5
Washington Practitioners Significant Changes To Rpc 1.5Washington Practitioners Significant Changes To Rpc 1.5
Washington Practitioners Significant Changes To Rpc 1.5
 
Paulo Freire Pedagpogia 1
Paulo Freire Pedagpogia 1Paulo Freire Pedagpogia 1
Paulo Freire Pedagpogia 1
 

Similar to Inside Python [OSCON 2012]

Compiler Construction
Compiler ConstructionCompiler Construction
Compiler ConstructionAhmed Raza
 
Generating parsers using Ragel and Lemon
Generating parsers using Ragel and LemonGenerating parsers using Ragel and Lemon
Generating parsers using Ragel and LemonTristan Penman
 
Introduction to compiler
Introduction to compilerIntroduction to compiler
Introduction to compilerAbha Damani
 
Open Source Compiler Construction for the JVM
Open Source Compiler Construction for the JVMOpen Source Compiler Construction for the JVM
Open Source Compiler Construction for the JVMTom Lee
 
4 compiler lab - Syntax Ana
4 compiler lab - Syntax Ana4 compiler lab - Syntax Ana
4 compiler lab - Syntax AnaMashaelQ
 
Scalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAMScalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAMfnothaft
 
Os Worthington
Os WorthingtonOs Worthington
Os Worthingtonoscon2007
 
Compier Design_Unit I.ppt
Compier Design_Unit I.pptCompier Design_Unit I.ppt
Compier Design_Unit I.pptsivaganesh293
 
Compier Design_Unit I.ppt
Compier Design_Unit I.pptCompier Design_Unit I.ppt
Compier Design_Unit I.pptsivaganesh293
 
Different phases of a compiler
Different phases of a compilerDifferent phases of a compiler
Different phases of a compilerSumit Sinha
 
Convention-Based Syntactic Descriptions
Convention-Based Syntactic DescriptionsConvention-Based Syntactic Descriptions
Convention-Based Syntactic DescriptionsRay Toal
 
Using ANTLR on real example - convert "string combined" queries into paramete...
Using ANTLR on real example - convert "string combined" queries into paramete...Using ANTLR on real example - convert "string combined" queries into paramete...
Using ANTLR on real example - convert "string combined" queries into paramete...Alexey Diyan
 

Similar to Inside Python [OSCON 2012] (20)

Compiler Construction
Compiler ConstructionCompiler Construction
Compiler Construction
 
Generating parsers using Ragel and Lemon
Generating parsers using Ragel and LemonGenerating parsers using Ragel and Lemon
Generating parsers using Ragel and Lemon
 
python and perl
python and perlpython and perl
python and perl
 
LANGUAGE TRANSLATOR
LANGUAGE TRANSLATORLANGUAGE TRANSLATOR
LANGUAGE TRANSLATOR
 
Introduction to compiler
Introduction to compilerIntroduction to compiler
Introduction to compiler
 
Open Source Compiler Construction for the JVM
Open Source Compiler Construction for the JVMOpen Source Compiler Construction for the JVM
Open Source Compiler Construction for the JVM
 
4 compiler lab - Syntax Ana
4 compiler lab - Syntax Ana4 compiler lab - Syntax Ana
4 compiler lab - Syntax Ana
 
Ch1 (1).ppt
Ch1 (1).pptCh1 (1).ppt
Ch1 (1).ppt
 
Unit1 cd
Unit1 cdUnit1 cd
Unit1 cd
 
OOPSLA Talk on Preon
OOPSLA Talk on PreonOOPSLA Talk on Preon
OOPSLA Talk on Preon
 
Scalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAMScalable up genomic analysis with ADAM
Scalable up genomic analysis with ADAM
 
Os Worthington
Os WorthingtonOs Worthington
Os Worthington
 
Cpcs302 1
Cpcs302  1Cpcs302  1
Cpcs302 1
 
Compier Design_Unit I.ppt
Compier Design_Unit I.pptCompier Design_Unit I.ppt
Compier Design_Unit I.ppt
 
Compier Design_Unit I.ppt
Compier Design_Unit I.pptCompier Design_Unit I.ppt
Compier Design_Unit I.ppt
 
Different phases of a compiler
Different phases of a compilerDifferent phases of a compiler
Different phases of a compiler
 
CD U1-5.pptx
CD U1-5.pptxCD U1-5.pptx
CD U1-5.pptx
 
Convention-Based Syntactic Descriptions
Convention-Based Syntactic DescriptionsConvention-Based Syntactic Descriptions
Convention-Based Syntactic Descriptions
 
Using ANTLR on real example - convert "string combined" queries into paramete...
Using ANTLR on real example - convert "string combined" queries into paramete...Using ANTLR on real example - convert "string combined" queries into paramete...
Using ANTLR on real example - convert "string combined" queries into paramete...
 
SS & CD Module 3
SS & CD Module 3 SS & CD Module 3
SS & CD Module 3
 

Recently uploaded

Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNeo4j
 
How to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptxHow to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptxKaustubhBhavsar6
 
My key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIMy key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIVijayananda Mohire
 
Extra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdfExtra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdfInfopole1
 
Top 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTop 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTopCSSGallery
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0DanBrown980551
 
From the origin to the future of Open Source model and business
From the origin to the future of  Open Source model and businessFrom the origin to the future of  Open Source model and business
From the origin to the future of Open Source model and businessFrancesco Corti
 
Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Muhammad Tiham Siddiqui
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInThousandEyes
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
 
2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdfThe Good Food Institute
 
Graphene Quantum Dots-Based Composites for Biomedical Applications
Graphene Quantum Dots-Based Composites for  Biomedical ApplicationsGraphene Quantum Dots-Based Composites for  Biomedical Applications
Graphene Quantum Dots-Based Composites for Biomedical Applicationsnooralam814309
 
Automation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsAutomation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsDianaGray10
 
AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024Brian Pichman
 
Where developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is goingWhere developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is goingFrancesco Corti
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxNeo4j
 
Technical SEO for Improved Accessibility WTS FEST
Technical SEO for Improved Accessibility  WTS FESTTechnical SEO for Improved Accessibility  WTS FEST
Technical SEO for Improved Accessibility WTS FESTBillieHyde
 
Patch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updatePatch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updateadam112203
 
Planetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl
 
3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud DataEric D. Schabell
 

Recently uploaded (20)

Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4j
 
How to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptxHow to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptx
 
My key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIMy key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAI
 
Extra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdfExtra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdf
 
Top 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTop 10 Squarespace Development Companies
Top 10 Squarespace Development Companies
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0
 
From the origin to the future of Open Source model and business
From the origin to the future of  Open Source model and businessFrom the origin to the future of  Open Source model and business
From the origin to the future of Open Source model and business
 
Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf
 
Graphene Quantum Dots-Based Composites for Biomedical Applications
Graphene Quantum Dots-Based Composites for  Biomedical ApplicationsGraphene Quantum Dots-Based Composites for  Biomedical Applications
Graphene Quantum Dots-Based Composites for Biomedical Applications
 
Automation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsAutomation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projects
 
AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024
 
Where developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is goingWhere developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is going
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
 
Technical SEO for Improved Accessibility WTS FEST
Technical SEO for Improved Accessibility  WTS FESTTechnical SEO for Improved Accessibility  WTS FEST
Technical SEO for Improved Accessibility WTS FEST
 
Patch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updatePatch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 update
 
Planetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile Brochure
 
3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data
 

Inside Python [OSCON 2012]

  • 1. Inside (C)Python Tom Lee @tglee OSCON 2012 19th July, 2012
  • 2. Overview • About me! • Implemented unified try/except/finally for Python 2.5 (Georg Brandl’s PEP 341) • Support for compiling ASTs to bytecode from within Python programs for 2.6 • Wannabe compiler nerd. • Terminology & brief intro to compilers: • Grammars, Scanners & Parsers • Parse Trees & Abstract Syntax Trees • General architecture of a bytecode compiler • How to add new syntax to Python: • Python compiler architecture & summary • Case study in adding a new keyword • Demo!
  • 3. “Python” vs. “CPython” • When I say “Python” in this presentation, I mean CPython. • Other Python implementations: • Jython • PyPy • IronPython • Stackless Python • More... • Many of the concepts here are likely applicable, but the details may differ wildly • Refer to the slides for my Inside PHP presentation to see how things may differ. • Python 3.x
  • 4. Compilers 101: Scanners • Or lexical analyzers, or tokenizers • Input: raw source code • Output: a stream of tokens IF_TOK IDENT("a") if a == b EQ_TOK IDENT("b")
  • 5. Compilers 101: Parsers • Input: a stream of tokens from the scanner • Output is implementation dependent • The goal of a parser is to structure to the sequence of tokens from the scanner. • Parsers are frequently automatically generated from a grammar/DSL • See parser generators like Yacc/Bison, ANTLR, etc. or e.g. parser combinators in Haskell, Scala, ML. IF_TOK If Compare IDENT("a") Eq Name("a") Name("b") EQ_TOK IDENT("b")
  • 6. Compilers 101: Context-free grammars • Or simply “grammar” •A grammar describes the complete syntax of a (programming) language. • Usually expressed in Extended Backus-Naur Form (EBNF) • Or some variant thereof. • Variants of EBNF used for a lot of DSL-based parser generators • e.g. Yacc/Bison, ANTLR, etc.
  • 7. Compilers 101: Parse Tree • Or concrete syntax tree. • “... an ordered, rooted tree that represents the [full] syntactic structure of a string according to some formal grammar.” • http://en.wikipedia.org/wiki/Parse_tree • Python’s parser constructs a parse tree based on Grammar/Grammar • Must be manually converted to Abstract Syntax Tree before we do anything fun.
  • 8. Compilers 101: Abstract Syntax Tree • AST • Rooted tree data structure. • In-memory, intermediate representation of the input (source) program. • “Abstract” in that it doesn’t necessarily include every detail that appears in the real syntax. •A code generator can traverse an AST and emit target code. • e.g. Python byte code If • Certain optimizations are also convenient at the AST level. Compare Eq Name("a") Name("b")
  • 9. Generalized Compiler Architecture* Source files Source code Scanner Token stream Parser Bytecode Abstract Bytecode Code Generator Interpreter Syntax Tree * Actually a generalized *bytecode* compiler architecture
  • 10. Generalized *Python* Compiler Architecture Source files Source code Scanner Token stream rammar* Grammar/G * Grammar/Grammar is processed by Parser/pgen to generate a scanner & parser r* /Gramma Parser Grammar & .c P ython/ast Bytecode Abstract Bytecode Code Generator Interpreter Syntax Tree arser/Py thon.asdl n/ceval.c Python/compile.c P Pytho
  • 11. Case Study: The “unless” statement It’s basically will_give_you_up = False “if not ...” unless will_give_you_up: print(“Never gonna give you up.”) will_let_you_down = True unless will_let_you_down: print(“Letting you down :’(”) -- output -- Never gonna give you up.
  • 12. Adding the “unless” construct to (C)Python 1.Describe the syntax of the new construct 2.Add new AST node(s) if necessary 3.Write a parse tree to Abstract Syntax Tree (AST) transform 4.Emit bytecode for your new AST node(s)
  • 13. Before you start... • You’ll need the usual gcc toolchain, GNU Bison, etc. • Debian/Ubuntuapt-get install build-essential • OSX Xcode command line tools should give you most of what you need. • Grab the Python code from Mercurial • I’m working with the stable Python 3.2 branch • Only because tip was breaking in weird & wonderful ways at the time of rehearsal! • The steps described here should also work for any version of Python >= 2.5.x • Beware of cyclic dependencies in the build process when modifying the grammar.
  • 14. 1. Describe the syntax of the new construct • Or: describe how your new language construct “looks” in source code • Modify the grammar in Grammar/Grammar • EBNF-ish DSL Scanner Token stream • Used by Parser/pgen, Python’s custom parser generator, to generate the C code that rammar* Grammar/G drives Python’s parser and scanner. rammar* Parser Grammar/G • Riffing on existing constructs is a great way to learn: & .c P ython/ast how does an if statement look like in the grammar? • Let’s specify the syntax of unless
  • 15. 2. Add new AST node(s) if necessary • Parser/Python.asdl • Zephyr Abstract Syntax Definition Language -- a DSL for describing ASTs. • http://asdl.sourceforge.net/ • The Python build process generates C code describing the AST. • See Parser/asdl_c.py and Parser/asdl.py for details of how this code is generated. • Can you represent your new Python construct using existing AST nodes? • Great! Skip this step. • Let’s go ahead and add an Unless node anyway. • Demonstration using If and Not nodes if we have time ... If Compare Eq Name("a") Name("b")
  • 16. 3. Parse tree to AST transformation • Recall the parser that is generated from Grammar/Grammar • The generated parser constructs a parse tree without any effort on our part ... • ... BUT! earlier we saw that the * Grammar code generator takes an AST as input. / Parser Grammar & .c P ython/ast • We need to manually transform the parse See? tree into an AST. Abstract Code Generator • Python/ast.c Syntax Tree l pile .c Parser/P ython.asd Py thon/com • Again,look to the existing if statement for If Compare inspiration. Eq Name("a") Name("a")
  • 17. 4. Emit bytecode for the new AST node • This step is only necessary if you added a new AST node in step 2. • At this point our AST is ready to compile to bytecode. • Traverse the tree, emitting bytecode as we go. • What bytecode instructions do we need to use? • What does the bytecode for a simple if statement look like? How would we modify it?
  • 18. Intermission: Python VM + bytecode data = [] • Python has a stack based virtual machine. def push(x): data.append(x) • Values get pushed onto a data stack. def pop(): • Opcodes manipulate the stack. return data.pop() • e.g. binary op: pop two values, compute result, push new value. def add(): • Control flow using unconditional & conditional jumps. push(pop() + pop()) def sub(): push(pop() - pop()) push(5) # [5] push(6) # [5, 6] add() # [11]
  • 19. Bytecode for “if x == y: ...” LOAD_NAME x LOAD_NAME y x == y COMPARE_OP PyCmp_EQ POP_JUMP_IF_FALSE end ... if …: ... ... ... end
  • 20. Bytecode for “if x == y: ...” LOAD_NAME x LOAD_NAME y x == y COMPARE_OP PyCmp_EQ POP_JUMP_IF_FALSE end ... if …: ... ... ... end
  • 21. Bytecode for “if x == y: ...” LOAD_NAME x LOAD_NAME y x == y COMPARE_OP PyCmp_EQ POP_JUMP_IF_FALSE end ... if …: ... ... ... end
  • 22. Bytecode for “if x == y: ...” LOAD_NAME x LOAD_NAME y x == y COMPARE_OP PyCmp_EQ POP_JUMP_IF_FALSE end ... if …: ... ... ... end
  • 23. Bytecode for “if x == y: ...” LOAD_NAME x LOAD_NAME y x == y COMPARE_OP PyCmp_EQ POP_JUMP_IF_FALSE end ... if …: ... ... ... end
  • 24. Bytecode for “if x == y: ...” LOAD_NAME x LOAD_NAME y x == y COMPARE_OP PyCmp_EQ POP_JUMP_IF_FALSE end ... if …: ... ... ... end
  • 25. Bytecode for “if x == y: ...” LOAD_NAME x LOAD_NAME y x == y COMPARE_OP PyCmp_EQ POP_JUMP_IF_FALSE end ... if …: ... ... ... end
  • 26. Bytecode for “if x == y: ...” LOAD_NAME x LOAD_NAME y x == y COMPARE_OP PyCmp_EQ POP_JUMP_IF_FALSE end ... if …: ... ... ... end
  • 27. Bytecode for “if x == y: ...” LOAD_NAME x LOAD_NAME y x == y COMPARE_OP PyCmp_EQ POP_JUMP_IF_FALSE end ... if …: ... ... ... end
  • 28. 4. Emit bytecode for the new AST node (cont.) • Python/compile.c • Look for <AST>_kind to track down existing impls for AST nodes -- e.g. ClassDef_kind for classes • Once again: how is if implemented? • Look for If_kind • How can we modify this to implement unless?
  • 29. Demo! • The usual ./configure && make dance on Linux/OSX. • With a little luck, magic happens and you get a binary in the root directory. • ... So, uh, does it work?
  • 30. And exhale. • Lots to take in, right? • In my experience, this stuff is best learned bit-by-bit through practice. • Ask questions! • Google • Python-Dev • Or hey, ask me...
  • 31. Thanks! oscon@tomlee.co @tglee http://github.com/thomaslee http://newrelic.com

Editor's Notes

  1. \n
  2. \n
  3. \n
  4. \n
  5. \n
  6. \n
  7. \n
  8. \n
  9. \n
  10. \n
  11. \n
  12. \n
  13. \n
  14. \n
  15. \n
  16. \n
  17. \n
  18. \n
  19. \n
  20. \n
  21. \n
  22. \n
  23. \n
  24. \n
  25. \n
  26. \n
  27. \n
  28. \n
  29. \n
  30. \n