Here are a few ways a DSL could potentially benefit a current project:
- Simplify or reduce complexity of certain tasks. A DSL tailored to a specific domain or problem could make those tasks easier to understand and perform.
- Improve productivity of non-programmers. A DSL designed with the intended users in mind could allow others like domain experts or analysts to accomplish things without programming.
- Enforce correctness or best practices. Limiting what can be expressed in a DSL reduces the possibility of certain errors and can force compliance with standards.
- Separate logic from implementation. Expressing logic or algorithms in a DSL abstracts them away from implementation details, making them more portable and maintainable.
2. Overview
Why DSLs?
What are some of the most commonly used DSLs? Why do they exist?
A practical example from a project. How to create a parser?
3. DSL = Domain Specific Language
General purpose languages are general; you can do anything with them, but they're not optimized for anything specific.
Domain specific languages, also called mini languages, are made to do one thing well.
Sometimes it’s hard to say which box a language falls into.
4. General vs domain specific: a trade off
General languages can do many things, and do them pretty well too. But they’re not optimized for any particular thing.
A domain specific language is optimized for solving a particular problem. This means that it usually doesn’t work well for other kinds of problems.
Benefits of a DSL in the right place:
● Can be easier to learn
● Smaller feature set
● Limitations to what the user can do
● Easier to reason about the problem (metalinguistic abstraction)
● Can reduce complexity
● Easier to do division of labor with non-programmers
5. Some of the most commonly used DSLs for web programmers
● HTML
○ Reduces complexity a lot. Think of the alternative.
○ But not specific enough, we need more
● CSS
○ Yet another mini-language, because there was a need for it
○ Not enough
○ SASS, LESS
● Markdown
● Django Templating Language, Jinja
● Front-end templating languages of various frameworks
7. Example of a custom-made DSL
We need to match job ads to job titles on a web site that uses Django.
We want non-programmers to be able to understand and write the rules that match job ads to job titles.
Solution: search expressions
8. Search expression that matches ads with the word “driver” in the heading, but not “truck” or “bus”
(heading ~ driver)
NOT
(
heading ~ truck
heading ~ bus
)
Compiles to a Django query something like this:
Q(heading__icontains='driver')&~(Q(heading__icontains='truck')|Q(heading__icontains='bus'))
9. Agent for an artist
(
heading ~ agent
)
AND
(
descr ~ music
descr ~ culture
descr ~ art
descr ~ theater
descr ~ entertainment
)
Compiles to:
Q(heading__icontains='agent')&
(Q(descr__icontains='music')|
Q(descr__icontains='culture')...)
10. Ok, how to make this work?
A few weeks ago, the recommendation was not to roll your own parser using
regular expressions
Rolling your own by hand without using regular expressions is not necessarily
better. The previous implementation was very hard to understand or maintain.
11. Lexer and parser generation lex/yacc style
A lexer converts the string, or sequence of characters, to a sequence of tokens.
Lex is a tool for generating lexers.
A parser generator creates a parser that can go through the sequence of tokens and generate a syntax tree.
Yacc is a parser generator.
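To make the lexing step concrete, here is a minimal sketch of a tokenizer for the search expressions, written with only the standard library. It is not the project's PLY lexer: the token names other than TEXT and APPROXIMATES are assumptions made for illustration, and it ignores line breaks, which the real grammar may treat as significant.

```python
import re

# Hypothetical token table. Only TEXT and APPROXIMATES are names that
# appear in the grammar shown in these slides; the rest are assumed.
TOKEN_SPEC = [
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("APPROXIMATES", r"~"),
    ("EQUALS", r"="),
    ("NOT", r"\bNOT\b"),
    ("AND", r"\bAND\b"),
    ("TEXT", r'"[^"]*"|[^\s()~=]+'),  # quoted phrase or bare word
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token_type, lexeme) pairs, dropping whitespace."""
    return [
        (match.lastgroup, match.group())
        for match in MASTER.finditer(text)
        if match.lastgroup != "SKIP"
    ]


print(tokenize("(heading ~ driver) NOT (heading ~ truck)"))
```

A parser then works on these (type, lexeme) pairs instead of raw characters, which is what keeps the grammar rules short.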
12. List of tools shamelessly stolen from the previous presentation
Lex/Yacc, Flex/Bison, PLY (Python Lex-Yacc)
ANTLR
pyPEG
tdparser for Python
PEG.js
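15. PLY parser
import ply.yacc as yacc

# `tokens` (including TEXT and APPROXIMATES) and INDEXED, the collection of
# searchable field names, are assumed to be defined elsewhere (not shown here).

def p_approximates(p):
    """approximates : field APPROXIMATES value"""
    p[0] = ('~', p[1], p[3])

def p_field(p):
    """field : TEXT"""
    field = p[1].strip()
    if field not in INDEXED:
        raise TypeError("Field {} is not among the accepted ones".format(field))
    p[0] = field

def p_value(p):
    """value : TEXT"""
    p[0] = p[1]

def p_error(p):
    # PLY passes None when the input ends unexpectedly.
    if p is None:
        raise TypeError("Error parsing: unexpected end of expression")
    raise TypeError("Error parsing '%s': %s" % (p.value, p))

_parser = yacc.yacc()

class Parser(object):
    @staticmethod
    def parse(expression):
        if not expression.strip():
            return tuple()
        # Normalize Windows line endings before parsing.
        return _parser.parse(str(expression.replace('\r\n', '\n')))

parser = Parser()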
16. Now we can convert the syntax tree to a Q expression
import re
from django.db.models import Q

def to_q_expression(expression_str):
    try:
        expression = parser.parse(expression_str)
    except TypeError as e:
        # str(e) works on Python 2 and 3; e.message exists only on Python 2.
        raise ValueError(str(e))
    return _to_q_expression(expression)

def _to_q_expression(expression):
    if not expression:
        return Q()
    operator = expression[0]
    if operator == 'and':
        return _to_q_expression(expression[1]) & _to_q_expression(expression[2])
    elif operator == 'andnot':
        return _to_q_expression(expression[1]) & ~_to_q_expression(expression[2])
    elif operator == 'or':
        return _to_q_expression(expression[1]) | _to_q_expression(expression[2])
    elif operator in ('=', '~'):
        field = expression[1].strip()
        value = expression[2]
        if value.startswith(u'"') and value.endswith(u'"'):
            value = value.strip(u'"')
        field_query_type = 'iexact' if operator == '=' else 'icontains'
        if value.startswith(u'*') and value.endswith(u'*'):
            # *foo* means "contains foo"
            value = value[1:-1]
            field_query_type = 'icontains'
        elif value.startswith(u'*'):
            # *foo means "ends with foo"; escape so user input is matched literally
            value = u'.+{}$'.format(re.escape(value[1:]))
            field_query_type = 'iregex'
        elif value.endswith(u'*'):
            # foo* means "starts with foo"
            value = u'^{}.*'.format(re.escape(value[:-1]))
            field_query_type = 'iregex'
        return Q(**{u'{}__{}'.format(field, field_query_type): value})
    raise ValueError("Unknown operator '{}'".format(operator))
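The same recursive walk can be tried without a database. The following is a minimal sketch, not the project's code: a hypothetical `matches` function that interprets the tuple tree directly against a plain dict of ad fields. It assumes the tree shape suggested by the operators above and leaves out wildcard handling to stay short.

```python
def matches(expression, ad):
    """Evaluate a parsed expression tuple against a dict of ad fields."""
    if not expression:
        return True  # an empty expression restricts nothing, like an empty Q()
    operator = expression[0]
    if operator == 'and':
        return matches(expression[1], ad) and matches(expression[2], ad)
    if operator == 'andnot':
        return matches(expression[1], ad) and not matches(expression[2], ad)
    if operator == 'or':
        return matches(expression[1], ad) or matches(expression[2], ad)
    if operator in ('=', '~'):
        text = (ad.get(expression[1]) or '').lower()
        value = expression[2].strip('"').lower()
        # '=' is a case-insensitive exact match, '~' a substring match.
        return text == value if operator == '=' else value in text
    raise ValueError("Unknown operator '{}'".format(operator))


# Assumed tree for the "driver, but not truck or bus" expression.
driver_rule = (
    'andnot',
    ('~', 'heading', 'driver'),
    ('or', ('~', 'heading', 'truck'), ('~', 'heading', 'bus')),
)

print(matches(driver_rule, {'heading': 'Taxi driver'}))
print(matches(driver_rule, {'heading': 'Truck driver wanted'}))
```

Each backend (Q objects, an in-memory matcher, generated code) is just another function over the same tree, so the parser does not need to change.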
17. These query expressions have a weakness: think about the case where you're categorizing hundreds of job titles against thousands of different job ads. That means many queries.
Benefit of having an abstract syntax tree: we can turn it into something other than query expressions. Say, Python comparison functions.
def _to_python_expression(expression):
    if not expression:
        return u''
    operator = expression[0]
    if operator == 'and':
        return u'({} and {})'.format(_to_python_expression(expression[1]), _to_python_expression(expression[2]))
    elif operator == 'andnot':
        return u'({} and not {})'.format(_to_python_expression(expression[1]), _to_python_expression(expression[2]))
    elif operator == 'or':
        return u'({} or {})'.format(_to_python_expression(expression[1]), _to_python_expression(expression[2]))
    elif operator in ('=', '~'):
        field = expression[1]
        value = expression[2]
        if value.startswith(u'"') and value.endswith(u'"'):
            value = value.strip(u'"')
        # Comparisons are case-insensitive: both sides are lowercased.
        value = value.lower()
        field_comparison = (u'{field}.lower() == u"{value}" if {field} else False'
                            if operator == '=' else
                            u'u"{value}" in {field}.lower() if {field} else False')
        if value.startswith(u'*') and value.endswith(u'*'):
            value = value[1:-1]
            field_comparison = u'u"{value}" in {field}.lower() if {field} else False'
        elif value.startswith(u'*'):
            # *foo means "ends with foo", consistent with the Q expression version.
            value = value[1:]
            field_comparison = u'{field}.lower().endswith(u"{value}") if {field} else False'
        elif value.endswith(u'*'):
            # foo* means "starts with foo".
            value = value[:-1]
            field_comparison = u'{field}.lower().startswith(u"{value}") if {field} else False'
        field_reference = u'entry.access_attr("{}")'.format(field.replace(u'__', u'.'))
        comparison = field_comparison.format(field=field_reference, value=value)
        return u'({})\n'.format(comparison)
    raise ValueError("Unknown operator '{}'".format(operator))
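One way the generated strings might be used is to compile each rule once and then apply it to many ads in memory. The sketch below assumes that approach; `Entry`, `compile_rule`, and `contains` are hypothetical helpers, and only the name `entry.access_attr` comes from the generated code. Since it relies on `eval`, it only makes sense for strings produced by your own generator from validated input.

```python
class Entry(object):
    """Hypothetical stand-in for the object the generated code calls `entry`."""

    def __init__(self, **fields):
        self._fields = fields

    def access_attr(self, path):
        # Assumption: a flat lookup; the real method may follow dotted paths.
        return self._fields.get(path)


def compile_rule(python_expression):
    """Compile a generated expression string once and return a predicate."""
    code = compile(python_expression.strip(), '<rule>', 'eval')
    return lambda entry: bool(eval(code, {}, {'entry': entry}))


def contains(field, value):
    # Mirrors the shape of the string generated for the `~` operator.
    ref = 'entry.access_attr("{}")'.format(field)
    return '(u"{}" in {}.lower() if {} else False)\n'.format(value, ref, ref)


# Hand-built equivalent of the "driver, but not truck or bus" rule.
rule_source = '({} and not ({} or {}))'.format(
    contains('heading', 'driver'),
    contains('heading', 'truck'),
    contains('heading', 'bus'),
)
is_driver = compile_rule(rule_source)

ads = [Entry(heading='Taxi Driver'), Entry(heading='Bus driver'), Entry()]
print([is_driver(ad) for ad in ads])
```

Compiling once and reusing the predicate avoids issuing one database query per job title when thousands of ads are already loaded.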
18. Could you benefit from adding a domain specific language to a current project of yours?
How?