Introduction to pygments
https://github.com/roskakori/talks/tree/master/pygraz/pygments
Thomas Aglassinger
http://www.roskakori.at
@TAglassinger
What is pygments?
● Generic syntax highlighter
● Suitable for use in code hosting, forums, wikis
or other applications
● Supports 300+ programming languages and
text formats
● Provides a simple API to
write your own lexers
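To give a rough idea of that API, here is a minimal sketch that highlights a made-up Python snippet as HTML. It uses `highlight()`, `PythonLexer` and `HtmlFormatter` from pygments; the snippet and variable names are only for illustration.

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import PythonLexer

# Made-up snippet to highlight.
code = 'print("Hello world!")\n'

# highlight() lexes the code into tokens and passes them to the formatter,
# which wraps each token in a <span> with a CSS class.
html = highlight(code, PythonLexer(), HtmlFormatter())
print(html)
```

The lexer decides *what* each piece of text is, and the formatter decides *how* it is rendered. So the same lexer can be combined with other formatters.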
Agenda
● Basic usage
● A glimpse at the API: lexers and tokens
● Use case: convert source code
● Use case: write your own lexer
Basic usage
Applications that use pygments
● Wikipedia
● Jupyter notebook
● Sphinx documentation builder
● Trac ticket tracker and wiki
● Bitbucket source code hosting
● Pygount source lines of code counter
(shameless plug)
● And many others
Try it online
Use the command line
● pygmentize -f html -O full,style=emacs -o example.html example.sql
● Renders example.sql to example.html
● Without “-O full,style=emacs”, pygmentize writes only an HTML fragment and you have to provide your own CSS
● Other formats: LaTeX, RTF, ANSI escape sequences
-- Simple SQL example.
select
customer_number,
first_name,
surname,
date_of_birth
from
customer
where
date_of_birth >= '1990-01-01'
and rating <= 20
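The same rendering is also available from Python. Here is a minimal sketch, assuming that the `HtmlFormatter` options `full` and `style` correspond to what `-O full,style=emacs` passes to the formatter. The SQL string is a shortened version of example.sql.

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import SqlLexer

sql = "select customer_number from customer where rating <= 20\n"

# Roughly what "-f html -O full,style=emacs" asks for: a complete HTML
# document with the CSS of the "emacs" style embedded.
full_html = highlight(sql, SqlLexer(), HtmlFormatter(full=True, style='emacs'))

# Without full=True the result is only an HTML fragment; the matching CSS
# can be generated separately and put into your own stylesheet.
fragment = highlight(sql, SqlLexer(), HtmlFormatter(style='emacs'))
css = HtmlFormatter(style='emacs').get_style_defs('.highlight')
print(css)
```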
Choose a specific SQL dialect
● There are many SQL dialects
● Most use “.sql” as file suffix
● Use “-l <lexer>” to choose a specific lexer
● pygmentize -l tsql -f html -O full,style=emacs -o example.html transact.sql
-- Simple Transact-SQL example.
declare @date_of_birth date = '1990-01-01';
select top 10
*
from
[customer]
where
[date_of_birth] = @date_of_birth
order by
[customer_number]
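From Python, `pygments.lexers.get_lexer_by_name()` selects a lexer by the same alias that `-l` uses. Here is a small sketch with a condensed version of the Transact-SQL example above:

```python
from pygments.lexers import get_lexer_by_name

# Same alias as in "pygmentize -l tsql".
lexer = get_lexer_by_name('tsql')

transact_sql = (
    "declare @date_of_birth date = '1990-01-01';\n"
    "select top 10 * from [customer] where [date_of_birth] = @date_of_birth\n"
)
tokens = list(lexer.get_tokens(transact_sql))

print(lexer.name)
for token_type, token_text in tokens[:5]:
    print(token_type, repr(token_text))
```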
A glimpse at the API:
lexers and tokens
What are lexers?
● Lexers split a text into a list of tokens
● Tokens are strings with an assigned meaning
● For example, Python source code might resolve to tokens like:
– Comment: # Some comment
– String: 'Hello\nworld!'
– Keyword: while
– Number: 1.23e-45
● Lexers only see single “words”; parsers see the whole syntax
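The token examples above can be reproduced with pygments' own `PythonLexer`. Here is a minimal sketch with a made-up three-line snippet. Note that the Python lexer may split a string literal into several tokens (quotes, text, escape sequences), so the string shows up as more than one token.

```python
from pygments.lexers import PythonLexer
from pygments.token import Comment, Keyword, Number, String

python_code = (
    "# Some comment\n"
    "while x < 1.23e-45:\n"
    "    print('Hello\\nworld!')\n"
)
tokens = list(PythonLexer().get_tokens(python_code))

# Print only tokens of the categories mentioned above.
for token_type, token_text in tokens:
    if any(token_type in category for category in (Comment, Keyword, Number, String)):
        print(token_type, repr(token_text))
```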
Split a source code into tokens
Source code for example.sql:
-- Simple SQL example.
select
customer_number,
first_name,
surname,
date_of_birth
from
customer
where
date_of_birth >= '1990-01-01'
and rating <= 20
Tokens for example.sql
(Token.Comment.Single, '-- Simple SQL example.\n')
(Token.Keyword, 'select')
(Token.Text, '\n ')
(Token.Name, 'customer_number')
(Token.Punctuation, ',')
...
(Token.Operator, '>')
...
(Token.Literal.String.Single, "'1990-01-01'")
...
(Token.Literal.Number.Integer, '20')
...
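A useful property behind this listing is that the token texts concatenate back to the exact source. This is what makes lexers usable for converters later on. Here is a small sketch, assuming the generic `SqlLexer` and an indented copy of example.sql held in a string. The exact token subtypes (e.g. `Text` vs. `Text.Whitespace`) may differ between pygments versions.

```python
from pygments.lexers import SqlLexer
from pygments.token import Keyword, Number, Operator

example_sql = (
    "-- Simple SQL example.\n"
    "select\n"
    "    customer_number,\n"
    "    first_name,\n"
    "    surname,\n"
    "    date_of_birth\n"
    "from\n"
    "    customer\n"
    "where\n"
    "    date_of_birth >= '1990-01-01'\n"
    "    and rating <= 20\n"
)
tokens = list(SqlLexer().get_tokens(example_sql))
for token in tokens:
    print(token)

# Joining all token texts yields the original source again.
round_trip = ''.join(token_text for _, token_text in tokens)
print(round_trip == example_sql)
```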
Source code to lex example.sql
import pygments.lexers
import pygments.token


def print_tokens(source_path):
    # Read source code into string.
    with open(source_path, encoding='utf-8') as source_file:
        source_text = source_file.read()
    # Find a fitting lexer.
    lexer = pygments.lexers.guess_lexer_for_filename(
        source_path, source_text)
    # Print tokens from source code.
    for items in lexer.get_tokens(source_text):
        print(items)
Source code to lex example.sql
● guess_lexer_for_filename() finds a lexer matching the source code
● lexer.get_tokens() obtains the token sequence
Tokens in pygments
● Tokens are tuples with 2 items:
– Type, e.g. Token.Comment
– Text, e.g. ‘# Some comment’
● Tokens are defined in pygments.token
● Some token types have subtypes, e.g. Comment has
Comment.Single, Comment.Multiline etc.
● In that case, use “in” instead of “==” to check if a
token type matches, e.g.:
if token_type in pygments.token.Comment: ...
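The difference between `in` and `==` can be checked directly on the token types in `pygments.token`. No lexing is needed:

```python
from pygments.token import Comment, string_to_tokentype

# Comment.Single is a subtype of Comment, so "in" matches it ...
print(Comment.Single in Comment)   # True
# ... while "==" only matches the exact same type.
print(Comment.Single == Comment)   # False
# The relation is one-way: a parent type is not contained in its subtype.
print(Comment in Comment.Single)   # False

# Token types can also be looked up from their dotted names.
print(string_to_tokentype('Comment.Single') is Comment.Single)  # True
```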
Convert source code
● Why? To match coding guidelines!
● Example: “SQL keywords must be lower case”, which makes code faster to read
● Despite that, a lot of SQL code uses upper case for keywords
● This is a legacy from the mainframe era, when text editors did not have syntax highlighting
SELECT
CustomerNumber,
FirstName,
Surname
FROM
Customer
WHERE
DateOfBirth >= '1990-01-01'
Convert source code
After converting the keywords to lower case:
select
CustomerNumber,
FirstName,
Surname
from
Customer
where
DateOfBirth >= '1990-01-01'
Convert source code
Check for keywords
and convert them
to lower case
import pygments.lexers
import pygments.token


def lowify_sql_keywords(source_path, target_path):
    # Read source code into string.
    with open(source_path, encoding='utf-8') as source_file:
        source_text = source_file.read()
    # Find a fitting lexer.
    lexer = pygments.lexers.guess_lexer_for_filename(
        source_path, source_text)
    # Lex the source, convert keywords and write target file.
    with open(target_path, 'w', encoding='utf-8') as target_file:
        for token_type, token_text in lexer.get_tokens(source_text):
            # Check for keywords (including subtypes such as Keyword.Reserved)
            # and convert them to lower case.
            if token_type in pygments.token.Keyword:
                token_text = token_text.lower()
            target_file.write(token_text)
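For quick experiments it can help to run the same conversion on a string instead of files. Here is a sketch with a hypothetical helper `lowify_sql_keywords_in_text()`, assuming the generic `SqlLexer` is the right lexer for the input. It should produce the lower case version shown above.

```python
from pygments.lexers import SqlLexer
from pygments.token import Keyword


def lowify_sql_keywords_in_text(source_text):
    """Hypothetical in-memory variant of lowify_sql_keywords()."""
    result = []
    for token_type, token_text in SqlLexer().get_tokens(source_text):
        # Only keywords change; names, strings etc. are copied as they are.
        if token_type in Keyword:
            token_text = token_text.lower()
        result.append(token_text)
    return ''.join(result)


upper_sql = (
    "SELECT\n"
    "    CustomerNumber,\n"
    "    Surname\n"
    "FROM\n"
    "    Customer\n"
    "WHERE\n"
    "    DateOfBirth >= '1990-01-01'\n"
)
print(lowify_sql_keywords_in_text(upper_sql))
```

Because `get_tokens()` strips leading and trailing newlines by default, a file with leading blank lines may not round-trip byte for byte.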
Write your own lexer
Why write your own lexer?
● To support new languages
● To support obscure languages
(mainframe FTW!)
● To support in-house domain-specific languages (DSLs)
How to write your own lexer
● All the gory details:
http://pygments.org/docs/lexerdevelopment/
● For most practical purposes, inherit from
RegexLexer
● Basic knowledge of
regular expressions
required (“import re”)
NanoSQL
● Small subset of SQL
● Comment: -- Some comment
● Keyword: select
● Integer number: 123
● String: 'Hello'; use '' to escape a quote, e.g. 'It''s'
● Name: Customer
● Punctuation: .,;:
External lexers with pygmentize
Use -l <file>:<class> together with -x to load a lexer from a Python source file:
pygmentize -f html -O full,style=emacs \
  -l nanosqllexer.py:NanoSqlLexer -x \
  -o example.html example.nsql
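In Python code, `pygments.lexers.load_lexer_from_file(filename, lexername)` loads a lexer class from a source file in a similar way. Here is a self-contained sketch: it writes a hypothetical, heavily reduced stand-in for nanosqllexer.py (only comments are recognized) to a temporary directory and loads it from there.

```python
import os
import tempfile

from pygments.lexers import load_lexer_from_file
from pygments.token import Comment

# Hypothetical stand-in for nanosqllexer.py: only comments are recognized.
LEXER_MODULE_SOURCE = '''
from pygments.lexer import RegexLexer
from pygments.token import Comment, Text


class NanoSqlLexer(RegexLexer):
    name = 'NanoSQL'
    aliases = ['nanosql']
    filenames = ['*.nsql']
    tokens = {
        'root': [
            (r'--.*?$', Comment),
            (r'[^-]+', Text),
            (r'-', Text),
        ],
    }
'''

with tempfile.TemporaryDirectory() as temp_dir:
    lexer_path = os.path.join(temp_dir, 'nanosqllexer.py')
    with open(lexer_path, 'w', encoding='utf-8') as lexer_file:
        lexer_file.write(LEXER_MODULE_SOURCE)
    # Same "file:class" pair as in "-l nanosqllexer.py:NanoSqlLexer -x".
    lexer = load_lexer_from_file(lexer_path, 'NanoSqlLexer')

tokens = list(lexer.get_tokens('-- Some comment\nselect 1\n'))
print(lexer.name, tokens[0])
```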
Source code for NanoSQL lexer
● Live coding!
● Starting from a skeleton
● Gradually adding regular expressions to render more elements
Skeleton for NanoSQL lexer
from pygments.lexer import RegexLexer, words
from pygments.token import (
    Comment, Keyword, Name, Number, Operator, Punctuation, String, Whitespace)

_NANOSQL_KEYWORDS = (
    'as',
    'from',
    'select',
    'where',
)


class NanoSqlLexer(RegexLexer):
    name = 'NanoSQL'
    aliases = ['nanosql']
    filenames = ['*.nsql']
    tokens = {
        'root': [
            # TODO: Add rules.
        ],
    }
● _NANOSQL_KEYWORDS: words to be treated as keywords
● aliases: names recognized by pygmentize's -l option
● filenames: patterns recognized by get_lexer_for_filename()
Render unknown tokens as Error
● With no rules yet, any text the lexer cannot match is rendered as Error tokens, one character at a time
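This effect can also be observed outside pygmentize. Here is a minimal sketch with a hypothetical rule-less `SkeletonLexer` that has the same empty `'root'` state as the skeleton above. Newlines are emitted as whitespace, and every other character becomes an `Error` token.

```python
from pygments.lexer import RegexLexer
from pygments.token import Error


class SkeletonLexer(RegexLexer):
    """Hypothetical lexer without any rules, like the NanoSQL skeleton."""
    name = 'Skeleton'
    tokens = {
        'root': [],
    }


tokens = list(SkeletonLexer().get_tokens('select 1\n'))
print(tokens[:3])
```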
Detect comments
class NanoSqlLexer(RegexLexer):
    name = 'NanoSQL'
    aliases = ['nanosql']
    filenames = ['*.nsql']
    tokens = {
        'root': [
            (r'--.*?$', Comment),
        ],
    }
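The `$` in the comment rule stops at the end of the line because `RegexLexer` compiles its rules with `re.MULTILINE` by default. Here is a stdlib-only sketch of the difference, using a made-up two-line input:

```python
import re

source = "-- Some comment\nselect 1\n"

# With re.MULTILINE, "$" also matches right before each newline,
# so the comment ends at the end of its line.
with_multiline = re.compile(r'--.*?$', re.MULTILINE).match(source)
print(repr(with_multiline.group()))  # '-- Some comment'

# Without it, "$" only matches at the very end of the text (or just before
# a final newline), and "." cannot cross the newline, so nothing matches.
without_multiline = re.compile(r'--.*?$').match(source)
print(without_multiline)  # None
```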
Detect whitespace
class NanoSqlLexer(RegexLexer):
    name = 'NanoSQL'
    aliases = ['nanosql']
    filenames = ['*.nsql']
    tokens = {
        'root': [
            (r'\s+', Whitespace),
            (r'--.*?$', Comment),
        ],
    }
Detect names
class NanoSqlLexer(RegexLexer):
    name = 'NanoSQL'
    aliases = ['nanosql']
    filenames = ['*.nsql']
    tokens = {
        'root': [
            (r'\s+', Whitespace),
            (r'--.*?$', Comment),
            (r'\w+', Name),
        ],
    }
\w = word character, i.e. [a-zA-Z0-9_] for ASCII text
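The equation `\w = [a-zA-Z0-9_]` holds for ASCII text only. In Python 3, `\w` in str patterns also matches Unicode letters and digits unless `re.ASCII` is given. This sketch assumes the lexer keeps `RegexLexer`'s default flags (`re.MULTILINE`, without `re.ASCII`). Under that assumption, the Name rule would also accept names such as `Straße`.

```python
import re

# In Python 3, \w on str patterns matches Unicode word characters ...
print(re.fullmatch(r'\w+', 'Straße') is not None)                     # True
# ... and only [a-zA-Z0-9_] when the re.ASCII flag is given.
print(re.fullmatch(r'\w+', 'Straße', re.ASCII) is not None)           # False
print(re.fullmatch(r'\w+', 'customer_number', re.ASCII) is not None)  # True
```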
Detect numbers
class NanoSqlLexer(RegexLexer):
    name = 'NanoSQL'
    aliases = ['nanosql']
    filenames = ['*.nsql']
    tokens = {
        'root': [
            (r'\s+', Whitespace),
            (r'--.*?$', Comment),
            (r'\d+', Number),
            (r'\w+', Name),
        ],
    }
\d = digit, e.g. [0-9]
Must be checked before \w
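Why the order matters: `RegexLexer` tries the rules of a state in order and uses the first one that matches. Because `\w` also matches digits, a Name rule placed first would swallow numbers. The following stdlib-only sketch imitates this first-match behavior with a hypothetical `first_match_tokens()` helper. It is a simplification for illustration, not pygments' implementation.

```python
import re


def first_match_tokens(rules, text):
    """Hypothetical mini tokenizer: like RegexLexer, try the rules in order
    and use the first one that matches at the current position."""
    result = []
    position = 0
    while position < len(text):
        for pattern, token_type in rules:
            match = re.compile(pattern).match(text, position)
            if match:
                result.append((token_type, match.group()))
                position = match.end()
                break
        else:
            # No rule matched: emit the single character as an error.
            result.append(('Error', text[position]))
            position += 1
    return result


numbers_first = [(r'\s+', 'Whitespace'), (r'\d+', 'Number'), (r'\w+', 'Name')]
names_first = [(r'\s+', 'Whitespace'), (r'\w+', 'Name'), (r'\d+', 'Number')]

print(first_match_tokens(numbers_first, 'rating 20'))  # "20" becomes a Number
print(first_match_tokens(names_first, 'rating 20'))    # "20" becomes a Name
```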
Detect keywords
class NanoSqlLexer(RegexLexer):
    name = 'NanoSQL'
    aliases = ['nanosql']
    filenames = ['*.nsql']
    tokens = {
        'root': [
            (words(_NANOSQL_KEYWORDS, suffix=r'\b'), Keyword),
            (r'\s+', Whitespace),
            (r'--.*?$', Comment),
            (r'\d+', Number),
            (r'\w+', Name),
        ],
    }
words() takes a sequence of strings and returns an optimized regular expression that matches any of these strings.
\b = word boundary, so “select” does not match the beginning of “selection”
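As far as I know, `words()` builds its pattern with `pygments.regexopt.regex_opt()`. The sketch below calls `regex_opt()` directly to show what `suffix=r'\b'` changes. It is an approximation of what the lexer ends up using, not a dump of it.

```python
import re

from pygments.regexopt import regex_opt

_NANOSQL_KEYWORDS = ('as', 'from', 'select', 'where')

# Roughly the pattern behind words(_NANOSQL_KEYWORDS, suffix=r'\b').
with_boundary = re.compile(regex_opt(_NANOSQL_KEYWORDS, suffix=r'\b'))
# The same keywords without the trailing word boundary.
without_boundary = re.compile(regex_opt(_NANOSQL_KEYWORDS))

print(with_boundary.pattern)
print(with_boundary.match('select customer_number').group())  # select
print(with_boundary.match('selection'))  # None: "select" is followed by "i"
print(without_boundary.match('selection').group())  # select (unwanted match)
```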
Detect punctuation and operators
class NanoSqlLexer(RegexLexer):
    name = 'NanoSQL'
    aliases = ['nanosql']
    filenames = ['*.nsql']
    tokens = {
        'root': [
            (words(_NANOSQL_KEYWORDS, suffix=r'\b'), Keyword),
            (r'\s+', Whitespace),
            (r'--.*?$', Comment),
            (r'\d+', Number),
            (r'\w+', Name),
            (r'[.,;:]', Punctuation),
            (r'[<>=/*+-]', Operator),
        ],
    }
Detect string – finished!
class NanoSqlLexer(RegexLexer):
    name = 'NanoSQL'
    aliases = ['nanosql']
    filenames = ['*.nsql']
    tokens = {
        'root': [
            (words(_NANOSQL_KEYWORDS, suffix=r'\b'), Keyword),
            (r'\s+', Whitespace),
            (r'--.*?$', Comment),
            (r'\d+', Number),
            (r'\w+', Name),
            (r'[.,;:]', Punctuation),
            (r'[<>=/*+-]', Operator),
            ("'", String, 'string'),
        ],
        'string': [
            ("''", String),
            (r"[^']", String),
            ("'", String, '#pop'),
        ],
    }
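● ("'", String, 'string'): change state to 'string'
● ("''", String): double single quote (escaped quote)
● (r"[^']", String): anything except a single quote
● ("'", String, '#pop'): on a single quote, terminate the string and revert the lexer to the previous state ('root')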
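To try the finished lexer without pygmentize, here is the same code assembled into one self-contained module, together with a made-up NanoSQL statement to lex. The statement and the final loop are illustrative additions.

```python
from pygments.lexer import RegexLexer, words
from pygments.token import (
    Comment, Keyword, Name, Number, Operator, Punctuation, String, Whitespace)

_NANOSQL_KEYWORDS = ('as', 'from', 'select', 'where')


class NanoSqlLexer(RegexLexer):
    name = 'NanoSQL'
    aliases = ['nanosql']
    filenames = ['*.nsql']
    tokens = {
        'root': [
            (words(_NANOSQL_KEYWORDS, suffix=r'\b'), Keyword),
            (r'\s+', Whitespace),
            (r'--.*?$', Comment),
            (r'\d+', Number),
            (r'\w+', Name),
            (r'[.,;:]', Punctuation),
            (r'[<>=/*+-]', Operator),
            ("'", String, 'string'),
        ],
        'string': [
            ("''", String),
            (r"[^']", String),
            ("'", String, '#pop'),
        ],
    }


example = "select 'It''s' as Quote from Customer where Rating >= 20; -- done\n"
tokens = list(NanoSqlLexer().get_tokens(example))
for token in tokens:
    print(token)
```

Because the `'string'` rules match one character at a time, a string literal arrives as several consecutive `String` tokens. That makes no difference for highlighting.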
Regex fetish note
You can squeeze string tokens into a single regex rule without the need for a separate state:
(r"'(''|[^'])*'", String),
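The single-rule pattern can be checked with plain `re`. It matches an opening quote, then any mix of doubled quotes and non-quote characters, then the closing quote. So an escaped quote stays inside the string, and two adjacent literals are not merged. The input text is made up for this sketch.

```python
import re

# A quote, then any mix of '' (escaped quote) and non-quote characters,
# then the closing quote.
SQL_STRING = re.compile(r"'(''|[^'])*'")

text = "select 'It''s', 'a' + 'b'"
strings = [match.group() for match in SQL_STRING.finditer(text)]
print(strings)
```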
Conclusion
Summary
● Pygments is a versatile Python package to
syntax highlight over 300 programming
languages and text formats.
● Use pygmentize to create highlighted code as HTML, LaTeX or RTF.
● Use lexers to implement code converters and analyzers.
● Writing your own lexers is simple.

Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 

Recently uploaded (20)

Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 

Introduction to pygments

  • 2. What is pygments? ● Generic syntax highlighter ● Suitable for use in code hosting, forums, wikis or other applications ● Supports 300+ programming languages and text formats ● Provides a simple API to write your own lexers
  • 3. Agenda ● Basic usage ● A glimpse at the API: lexers and tokens ● Use case: convert source code ● Use case: write your own lexer
  • 5. Applications that use pygments ● Wikipedia ● Jupyter notebook ● Sphinx documentation builder ● Trac ticket tracker and wiki ● Bitbucket source code hosting ● Pygount source lines of code counter (shameless plug) ● And many others
  • 8. Use the command line ● pygmentize -f html -O full,style=emacs -o example.html example.sql ● Renders example.sql to example.html ● Without “-O full,style=emacs” you have to provide your own CSS ● Other formats: LaTeX, RTF, ANSI sequences
    -- Simple SQL example.
    select
    customer_number,
    first_name,
    surname,
    date_of_birth
    from
    customer
    where
    date_of_birth >= '1990-01-01'
    and rating <= 20
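The command above can also be reproduced from Python. This is a minimal sketch, assuming pygments is installed; it uses pygments.highlight() with an HtmlFormatter whose full and style options mirror -O full,style=emacs. The names EXAMPLE_SQL, render_sql_to_html() and emacs_css() are illustrative, not part of pygments.

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import SqlLexer

# The example.sql content from the slide, inlined for the sketch.
EXAMPLE_SQL = """-- Simple SQL example.
select
customer_number,
first_name,
surname,
date_of_birth
from
customer
where
date_of_birth >= '1990-01-01'
and rating <= 20
"""


def render_sql_to_html(sql_text, full=True):
    """Roughly what 'pygmentize -f html -O full,style=emacs' produces."""
    # full=True wraps the code in a complete HTML page with embedded CSS.
    formatter = HtmlFormatter(full=full, style='emacs')
    return highlight(sql_text, SqlLexer(), formatter)


def emacs_css(selector='.highlight'):
    """CSS you would have to supply yourself when rendering with full=False."""
    return HtmlFormatter(style='emacs').get_style_defs(selector)
```

With full=False the formatter emits only a `<div class="highlight">` fragment, and the CSS from get_style_defs() has to be served separately. LatexFormatter, RtfFormatter and TerminalFormatter in pygments.formatters cover the other output formats listed on the slide.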
  • 9. Choose a specific SQL dialect ● There are many SQL dialects ● Most use “.sql” as file suffix ● Use “-l <lexer>” to choose a specific lexer ● pygmentize -l tsql -f html -O full,style=emacs -o example.html transact.sql
    -- Simple Transact-SQL example.
    declare @date_of_birth date = '1990-01-01';
    select top 10
    *
    from
    [customer]
    where
    [date_of_birth] = @date_of_birth
    order by
    [customer_number]
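Which lexers claim the .sql suffix depends on the installed pygments version. A minimal sketch that lists them via pygments.lexers.get_all_lexers() and selects a dialect by the same alias that -l accepts, using get_lexer_by_name(); the helper names sql_lexers() and lexer_for_dialect() are hypothetical.

```python
import pygments.lexers


def sql_lexers():
    """Return (name, aliases) for each installed lexer that claims '*.sql' files."""
    matches = []
    for name, aliases, filenames, _mimetypes in pygments.lexers.get_all_lexers():
        if '*.sql' in filenames:
            matches.append((name, aliases))
    return matches


def lexer_for_dialect(alias):
    """Programmatic counterpart of 'pygmentize -l <alias>'."""
    return pygments.lexers.get_lexer_by_name(alias)


for lexer_name, lexer_aliases in sql_lexers():
    print(lexer_name, lexer_aliases)
```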
  • 10. A glimpse at the API: lexers and tokens
  • 11. What are lexers? ● Lexers split a text into a list of tokens ● Tokens are strings with an assigned meaning ● For example, Python source code might be split into tokens like: – Comment: # Some comment – String: 'Hello\nworld!' – Keyword: while – Number: 1.23e-45 ● Lexers only see individual “words”; parsers see the whole syntax
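To see a lexer produce the token kinds listed above, here is a small sketch that runs pygments' PythonLexer over a made-up snippet and groups the token texts by coarse type. The SOURCE string and the group_tokens() helper are illustrative; the exact way the string literal is split into several tokens may differ between pygments versions, which is why only the joined text is relied upon.

```python
from pygments.lexers import PythonLexer
from pygments.token import Comment, Keyword, Number, String

# Python source containing each kind of token listed above.
SOURCE = "# Some comment\nwhile x < 1.23e-45:\n    print('Hello\\nworld!')\n"


def group_tokens(source):
    """Collect token texts for a few coarse token types (subtypes included)."""
    groups = {'comment': [], 'keyword': [], 'number': [], 'string': []}
    for token_type, text in PythonLexer().get_tokens(source):
        if token_type in Comment:
            groups['comment'].append(text)
        elif token_type in Keyword:
            groups['keyword'].append(text)
        elif token_type in Number:
            groups['number'].append(text)
        elif token_type in String:
            # Quotes, plain text and escapes like \n arrive as separate tokens.
            groups['string'].append(text)
    return groups


print(group_tokens(SOURCE))
```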
  • 12. Split source code into tokens Source code for example.sql:
    -- Simple SQL example.
    select
    customer_number,
    first_name,
    surname,
    date_of_birth
    from
    customer
    where
    date_of_birth >= '1990-01-01'
    and rating <= 20
  • 13. Tokens for example.sql
    (Token.Comment.Single, '-- Simple SQL example.\n')
    (Token.Keyword, 'select')
    (Token.Text, '\n ')
    (Token.Name, 'customer_number')
    (Token.Punctuation, ',')
    ...
    (Token.Operator, '>')
    ...
    (Token.Literal.String.Single, "'1990-01-01'")
    ...
    (Token.Literal.Number.Integer, '20')
    ...
  • 14. Source code to lex example.sql
    import pygments.lexers
    import pygments.token
    def print_tokens(source_path):
        # Read source code into string.
        with open(source_path, encoding='utf-8') as source_file:
            source_text = source_file.read()
        # Find a fitting lexer.
        lexer = pygments.lexers.guess_lexer_for_filename(
            source_path, source_text)
        # Print tokens from source code.
        for items in lexer.get_tokens(source_text):
            print(items)
  • 15. Source code to lex example.sql (same code as slide 14, annotated): pygments.lexers.guess_lexer_for_filename() finds a lexer matching the source code, and lexer.get_tokens() obtains the token sequence.
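A variant of print_tokens() that lexes a string instead of a file and skips whitespace makes the token list easier to read. This sketch picks SqlLexer explicitly rather than guessing from the file name (an assumption: guess_lexer_for_filename() may choose a different SQL dialect); EXAMPLE_SQL and significant_tokens() are illustrative names. Whitespace is filtered with `in Token.Text`, because older pygments versions emit Token.Text for it and newer ones Token.Text.Whitespace.

```python
from pygments.lexers import SqlLexer
from pygments.token import Token

EXAMPLE_SQL = """-- Simple SQL example.
select
customer_number,
first_name,
surname,
date_of_birth
from
customer
where
date_of_birth >= '1990-01-01'
and rating <= 20
"""


def significant_tokens(sql_text):
    """Return (token_type, text) pairs from SqlLexer without whitespace tokens."""
    return [
        (token_type, token_text)
        for token_type, token_text in SqlLexer().get_tokens(sql_text)
        # Token.Text.Whitespace is a subtype of Token.Text, so "in" drops both.
        if token_type not in Token.Text
    ]


for token in significant_tokens(EXAMPLE_SQL):
    print(token)
```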
  • 16. Tokens in pygments ● Tokens are tuples with 2 items: – Type, e.g. Token.Comment – Text, e.g. ‘# Some comment’ ● Tokens are defined in pygments.token ● Some token types have subtypes, e.g. Comment has Comment.Single, Comment.Multiline etc. ● In that case, use “in” instead of “==” to check if a token type matches, e.g.: if token_type in pygments.token.Comment: ...
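The difference between == and in can be checked directly on the token types. A minimal sketch, assuming pygments is installed; is_comment() is a hypothetical helper, while is_token_subtype() is provided by pygments.token.

```python
from pygments.token import Comment, Token, is_token_subtype


def is_comment(token_type):
    """True for Comment and all of its subtypes (Comment.Single, ...)."""
    return token_type in Comment


# "==" only matches the exact type; "in" matches the type and its subtypes.
print(Comment.Single == Comment)                   # False
print(Comment.Single in Comment)                   # True
print(is_token_subtype(Comment.Single, Comment))   # True, same as "in"
print(Comment.Single.parent is Comment)            # True
print(is_comment(Token.Comment.Multiline))         # True
```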
  • 18. Convert source code ● Why? To match coding guidelines! ● Example: “SQL keywords must be lower case” (faster to read) ● Despite that, a lot of SQL code uses upper case for keywords. ● This is a legacy of the mainframe era, when text editors had no syntax highlighting.
    SELECT CustomerNumber, FirstName, Surname
    FROM Customer
    WHERE DateOfBirth >= '1990-01-01'
  • 19. Convert source code
    Before:
    SELECT CustomerNumber, FirstName, Surname
    FROM Customer
    WHERE DateOfBirth >= '1990-01-01'
    After:
    select CustomerNumber, FirstName, Surname
    from Customer
    where DateOfBirth >= '1990-01-01'
  • 20. Convert source code
    import pygments.lexers
    import pygments.token
    def lowify_sql_keywords(source_path, target_path):
        # Read source code into string.
        with open(source_path, encoding='utf-8') as source_file:
            source_text = source_file.read()
        # Find a fitting lexer.
        lexer = pygments.lexers.guess_lexer_for_filename(
            source_path, source_text)
        # Lex the source, convert keywords and write target file.
        with open(target_path, 'w', encoding='utf-8') as target_file:
            for token_type, token_text in lexer.get_tokens(source_text):
                # Check for keywords (including subtypes such as
                # Keyword.Reserved) and convert them to lower case.
                if token_type in pygments.token.Keyword:
                    token_text = token_text.lower()
                target_file.write(token_text)
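Working on strings instead of files makes the conversion easy to try out. A minimal sketch, assuming the generic SqlLexer is the right dialect (the slide guesses the lexer from the file name instead); lowify_sql_keywords_in_text() is a hypothetical helper name.

```python
from pygments.lexers import SqlLexer
from pygments.token import Keyword


def lowify_sql_keywords_in_text(sql_text, lexer=None):
    """Return sql_text with every Keyword token (subtypes included) lower-cased."""
    lexer = lexer or SqlLexer()
    parts = []
    for token_type, token_text in lexer.get_tokens(sql_text):
        if token_type in Keyword:
            token_text = token_text.lower()
        parts.append(token_text)
    # Concatenating all token texts reproduces the (converted) source.
    return ''.join(parts)


print(lowify_sql_keywords_in_text(
    "SELECT CustomerNumber, FirstName, Surname FROM Customer "
    "WHERE DateOfBirth >= '1990-01-01'\n"))
```

Only Keyword tokens change, so identifiers such as CustomerNumber and the string literal keep their case. By default pygments also appends a trailing newline if the input lacks one.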
  • 21. Write your own lexer
  • 22. Why write your own lexer? ● To support new languages ● To support obscure languages (mainframe FTW!) ● To support in-house domain specific languages (DSL)
  • 23. How to write your own lexer ● All the gory details: http://pygments.org/docs/lexerdevelopment/ ● For most practical purposes, inherit from RegexLexer ● Basic knowledge of regular expressions required (“import re”)
  • 24. NanoSQL ● Small subset of SQL ● Comment: -- Some comment ● Keyword: select ● Integer number: 123 ● String: 'Hello'; use '' (two single quotes) to escape a quote ● Name: Customer ● Punctuation: .,;:
  • 25. External lexers with pygmentize: use -l together with -x to load a lexer class from a Python file:
    pygmentize -f html -O full,style=emacs -l nanosqllexer.py:NanoSqlLexer -x -o example.html example.nsql
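The -x mechanism has a Python counterpart: pygmentize.lexers' load_lexer_from_file(), which executes a lexer file and instantiates the named class. The sketch below, assuming pygments is installed, writes a deliberately cut-down, hypothetical nanosqllexer.py into a temporary directory so it runs on its own; with the real file you would pass its path directly. highlight_with_lexer_file() and demo() are illustrative names.

```python
import os
import tempfile

from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import load_lexer_from_file

# Hypothetical, cut-down content of nanosqllexer.py, just enough to load.
NANOSQL_LEXER_SOURCE = r"""
from pygments.lexer import RegexLexer
from pygments.token import Comment, Name, Whitespace


class NanoSqlLexer(RegexLexer):
    name = 'NanoSQL'
    aliases = ['nanosql']
    filenames = ['*.nsql']
    tokens = {
        'root': [
            (r'\s+', Whitespace),
            (r'--.*?$', Comment),
            (r'\w+', Name),
        ],
    }
"""


def highlight_with_lexer_file(code, lexer_path, class_name='NanoSqlLexer'):
    """Rough counterpart of 'pygmentize -l <file>:<class> -x -f html'."""
    lexer = load_lexer_from_file(lexer_path, class_name)
    return highlight(code, lexer, HtmlFormatter())


def demo(code='select customer -- comment'):
    """Write the stand-in lexer file, then highlight code with it."""
    with tempfile.TemporaryDirectory() as temp_dir:
        lexer_path = os.path.join(temp_dir, 'nanosqllexer.py')
        with open(lexer_path, 'w', encoding='utf-8') as lexer_file:
            lexer_file.write(NANOSQL_LEXER_SOURCE)
        return highlight_with_lexer_file(code, lexer_path)
```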
  • 26. Source code for NanoSQL lexer ● Live coding! ● Starting from a skeleton ● Gradually adding regular expressions to render more elements
  • 27. Skeleton for NanoSQL lexer
    from pygments.lexer import RegexLexer, words
    from pygments.token import (
        Comment, Keyword, Name, Number, String, Operator, Punctuation, Whitespace)
    # Words to be treated as keywords.
    _NANOSQL_KEYWORDS = (
        'as',
        'from',
        'select',
        'where',
    )
    class NanoSqlLexer(RegexLexer):
        name = 'NanoSQL'
        # Names recognized by pygmentize's -l option.
        aliases = ['nanosql']
        # Patterns recognized by get_lexer_for_filename().
        filenames = ['*.nsql']
        tokens = {
            'root': [
                # TODO: Add rules.
            ],
        }
  • 28. Render unknown tokens as Error: the skeleton has no rules yet, so RegexLexer cannot match anything and emits each character (except line breaks) as an Error token.
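This behavior is easy to observe in isolation. A minimal sketch with a hypothetical SkeletonLexer that, like the skeleton on slide 27, has an empty 'root' state; error_characters() is an illustrative helper.

```python
from pygments.lexer import RegexLexer
from pygments.token import Error


class SkeletonLexer(RegexLexer):
    """Hypothetical stand-in for the NanoSQL skeleton: no rules yet."""
    name = 'Skeleton'
    tokens = {
        'root': [],
    }


def error_characters(text):
    """Return the text of every Error token the skeleton produces."""
    return [value
            for token_type, value in SkeletonLexer().get_tokens(text)
            if token_type is Error]


# Each character becomes its own Error token; the trailing newline does not.
print(error_characters('select 1'))
```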
  • 29. Detect comments
    class NanoSqlLexer(RegexLexer):
        name = 'NanoSQL'
        aliases = ['nanosql']
        filenames = ['*.nsql']
        tokens = {
            'root': [
                (r'--.*?$', Comment),
            ],
        }
  • 30. Detect whitespace
    class NanoSqlLexer(RegexLexer):
        name = 'NanoSQL'
        aliases = ['nanosql']
        filenames = ['*.nsql']
        tokens = {
            'root': [
                (r'\s+', Whitespace),
                (r'--.*?$', Comment),
            ],
        }
  • 31. Detect names
    class NanoSqlLexer(RegexLexer):
        name = 'NanoSQL'
        aliases = ['nanosql']
        filenames = ['*.nsql']
        tokens = {
            'root': [
                (r'\s+', Whitespace),
                (r'--.*?$', Comment),
                (r'\w+', Name),  # \w = [a-zA-Z0-9_] (plus other Unicode word characters)
            ],
        }
  • 32. Detect numbers
    class NanoSqlLexer(RegexLexer):
        name = 'NanoSQL'
        aliases = ['nanosql']
        filenames = ['*.nsql']
        tokens = {
            'root': [
                (r'\s+', Whitespace),
                (r'--.*?$', Comment),
                # \d = [0-9]; must be checked before \w+, which would
                # otherwise also match numbers and report them as Name.
                (r'\d+', Number),
                (r'\w+', Name),
            ],
        }
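Rule order matters because RegexLexer tries the rules of a state from top to bottom and takes the first one that matches at the current position. A small sketch with two hypothetical lexers that differ only in the order of the \d+ and \w+ rules; lex() is an illustrative helper.

```python
from pygments.lexer import RegexLexer
from pygments.token import Name, Number, Whitespace


class NumberFirstLexer(RegexLexer):
    """Hypothetical lexer in the slide's order: digits before word characters."""
    name = 'NumberFirst'
    tokens = {'root': [(r'\s+', Whitespace), (r'\d+', Number), (r'\w+', Name)]}


class NameFirstLexer(RegexLexer):
    """Same rules, word characters first: the number rule can never fire."""
    name = 'NameFirst'
    tokens = {'root': [(r'\s+', Whitespace), (r'\w+', Name), (r'\d+', Number)]}


def lex(lexer_class, text):
    """Return the non-whitespace (token_type, text) pairs."""
    return [(token_type, value)
            for token_type, value in lexer_class().get_tokens(text)
            if token_type is not Whitespace]


print(lex(NumberFirstLexer, '123'), lex(NameFirstLexer, '123'))
```

Putting \d+ first does not break identifiers that contain digits: for abc123 the number rule fails at the first character, so \w+ still claims the whole word.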
  • 33. Detect keywords
    class NanoSqlLexer(RegexLexer):
        name = 'NanoSQL'
        aliases = ['nanosql']
        filenames = ['*.nsql']
        tokens = {
            'root': [
                (words(_NANOSQL_KEYWORDS, suffix=r'\b'), Keyword),
                (r'\s+', Whitespace),
                (r'--.*?$', Comment),
                (r'\d+', Number),
                (r'\w+', Name),
            ],
        }
  • 34. Detect keywords (same code as slide 33, annotated): words() takes a list of strings and returns an optimized regular expression that matches any of these strings. suffix=r'\b' appends \b, a word boundary, so that “selection” is not lexed as the keyword “select” followed by “ion”.
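To look at what words() builds, the sketch below calls its get() method, which returns the regular expression as a string. Note that get() is an internal helper rather than documented API, so treat this as an inspection aid under that assumption; keyword_at_start() is a hypothetical helper.

```python
import re

from pygments.lexer import words

_NANOSQL_KEYWORDS = ('as', 'from', 'select', 'where')

# words(...).get() returns the optimized pattern that RegexLexer would
# otherwise compile behind the scenes (an internal detail of pygments).
KEYWORD_PATTERN = re.compile(words(_NANOSQL_KEYWORDS, suffix=r'\b').get())


def keyword_at_start(text):
    """Return the keyword at the very start of text, or None."""
    match = KEYWORD_PATTERN.match(text)
    return match.group(0) if match else None


print(KEYWORD_PATTERN.pattern)
```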
  • 35. Detect punctuation and operators
    class NanoSqlLexer(RegexLexer):
        name = 'NanoSQL'
        aliases = ['nanosql']
        filenames = ['*.nsql']
        tokens = {
            'root': [
                (words(_NANOSQL_KEYWORDS, suffix=r'\b'), Keyword),
                (r'\s+', Whitespace),
                (r'--.*?$', Comment),
                (r'\d+', Number),
                (r'\w+', Name),
                (r'[.,;:]', Punctuation),
                (r'[<>=/*+-]', Operator),
            ],
        }
  • 36. Detect string – finished!
    class NanoSqlLexer(RegexLexer):
        name = 'NanoSQL'
        aliases = ['nanosql']
        filenames = ['*.nsql']
        tokens = {
            'root': [
                (words(_NANOSQL_KEYWORDS, suffix=r'\b'), Keyword),
                (r'\s+', Whitespace),
                (r'--.*?$', Comment),
                (r'\d+', Number),
                (r'\w+', Name),
                (r'[.,;:]', Punctuation),
                (r'[<>=/*+-]', Operator),
                ("'", String, 'string'),
            ],
            'string': [
                ("''", String),
                (r"[^']", String),
                ("'", String, '#pop'),
            ],
        }
  • 37. Detect string – finished! (same code as slide 36, annotated): in 'root', a single quote changes the state to 'string'. In that state, '' (a doubled single quote) is an escaped quote, [^'] means “anything except a single quote”, and a lone single quote terminates the string and reverts the lexer to the previous state ('root') via '#pop'.
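Put together, the slides yield a complete lexer. The sketch below assembles it into one self-contained, runnable module and lexes a made-up NanoSQL statement; the helpers significant_tokens() and error_tokens() and the SAMPLE text are hypothetical additions. One deliberate deviation: the string state uses [^']+ instead of [^'], so a run of ordinary characters becomes one token rather than one token per character.

```python
from pygments.lexer import RegexLexer, words
from pygments.token import (Comment, Error, Keyword, Name, Number, Operator,
                            Punctuation, String, Whitespace)

_NANOSQL_KEYWORDS = ('as', 'from', 'select', 'where')


class NanoSqlLexer(RegexLexer):
    name = 'NanoSQL'
    aliases = ['nanosql']
    filenames = ['*.nsql']

    tokens = {
        'root': [
            (words(_NANOSQL_KEYWORDS, suffix=r'\b'), Keyword),
            (r'\s+', Whitespace),
            (r'--.*?$', Comment),
            (r'\d+', Number),
            (r'\w+', Name),
            (r'[.,;:]', Punctuation),
            (r'[<>=/*+-]', Operator),
            ("'", String, 'string'),    # opening quote: enter 'string'
        ],
        'string': [
            ("''", String),             # doubled quote = escaped quote
            (r"[^']+", String),         # run of anything except a quote
            ("'", String, '#pop'),      # closing quote: back to 'root'
        ],
    }


SAMPLE = ("-- NanoSQL example\n"
          "select name from customer where rating <= 20 "
          "and surname = 'O''Hara';\n")


def significant_tokens(text):
    """Lex text with NanoSqlLexer and drop whitespace tokens."""
    return [(token_type, value)
            for token_type, value in NanoSqlLexer().get_tokens(text)
            if token_type is not Whitespace]


def error_tokens(text):
    """Return the text of every token NanoSqlLexer could not recognize."""
    return [value for token_type, value in NanoSqlLexer().get_tokens(text)
            if token_type in Error]


for token in significant_tokens(SAMPLE):
    print(token)
```

As written, keyword matching is case-sensitive, so SELECT would be lexed as a Name. Overriding the RegexLexer class attribute flags, for example with re.IGNORECASE | re.MULTILINE, is one way to change that; this goes beyond the slides, and MULTILINE must stay set for the `--.*?$` comment rule.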
  • 38. Regex fetish note: you can squeeze string tokens into a single regex rule without the need for a separate state:
    (r"'(''|[^'])*'", String),
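The single-regex variant can be checked with the standard re module alone. In this minimal sketch the group is written as a non-capturing (?:...) so that findall() returns whole literals; inside a RegexLexer rule the capturing group works as well. find_string_literals() is a hypothetical helper.

```python
import re

# A complete NanoSQL string literal: a quote, then any mix of doubled
# quotes ('') and non-quote characters, then the closing quote.
SQL_STRING = re.compile(r"'(?:''|[^'])*'")


def find_string_literals(text):
    """Return all NanoSQL string literals in text, escaped quotes included."""
    return SQL_STRING.findall(text)


print(find_string_literals("select 'it''s', 'a' from t"))
```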
  • 40. Summary ● Pygments is a versatile Python package to syntax-highlight over 300 programming languages and text formats. ● Use pygmentize to create highlighted code as HTML, LaTeX or RTF. ● Use lexers to implement code converters and analyzers. ● Writing your own lexers is simple.