Telefónica Digital DevCon 2013
Iván Montes - @drslump
This talk focuses on mainstream languages
Much of what we will talk about is already available
I know your favourite language probably does something
similar, or you know a hack to make it do it.
Scripting languages do have a compiler, they just happen to
include an evaluator too :-)
C++ is purposely left out of any comparison (guys, you need to
simplify the spec!)
If you don’t understand some terminology, ask!
What is the best language?
Good news! You’re probably using it already
You can focus on solving your business problem
You are proficient with it
You can deploy your product without fear
You have tooling available (editors, testing, packaging…)
You are not the only soul who knows how it works :-)
Can it be improved?
Expected functionality is ever-growing
Solution complexity is ever-growing
We add layers and layers to keep it under control
Those layers are usually libraries and frameworks
Your really nice language is not that important anymore
You end up programming for the framework (does it pass the checklist?)
We can split a language into:
Specification, Compiler, Runtime and Libraries
Of those, we only control the Libraries
So we abuse them to solve our problems
But we can end up creating others:
What’s wrong with Libraries?
They are meant for code reuse
We (ab)use them for language
customisation and extensibility
The cycle ends with lots of
abandonware, even if you don’t
Check SourceForge, GitHub
and similar sites
The guy now maintaining
your 2009 project has
megabytes of legacy source
code, with no expertise to
hack it and nobody to offer
The cycle of doom then begins:
Our cool language toolchain
We spend time modifying
editors, build tools, …
The library gets updated or
we want to support a new
Disclaimer: It’s a hard discipline, requiring lots of talent
Basically the current situation in the mainstream market
It’s slowly changing though (guys at Microsoft Research
Designers are realising (again) that we are not totally
What we need is very old technology; we just need it refined.
Premise: Any non-tech-savvy person assumes
that a developer is someone intelligent.
Think highly of yourself!
We want/need more control over our primary tool.
Ask yourself this question:
What do you think has evolved more in the last
25 years: compilers or CAD software?
But I already can!
No, you can’t.
Some compilers support writing add-ons for them.
Being able to write an extension in C that gets called from the
host language doesn’t solve the problem either.
The language itself must define how these extension mechanisms
work. They must be part of its specification.
This way we get extensibility without sacrificing portability or
risking vendor lock-in.
It also helps with versioning: you extend a version of the language,
not a version of a given compiler.
We should be able to extend pretty much anything in our
Extending the syntax and semantics (specification) is doable
but not very mature yet.
Extending the runtime is expensive. You want something
tested, battle-tested and then tested again.
However extending the compiler is ready for prime time!
Preprocessors and Lisp-style macros are cool, don’t get me wrong.
But we need something more refined, almost foolproof.
A compiler basically does three tasks:
Parse (AST construction)
Resolve (name resolution, type binding)
Emit (machine code, bytecode).
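As an illustration only (Python rather than any language from the talk), CPython’s standard library happens to expose each phase separately: `ast.parse` builds the tree, the `symtable` module classifies every name into a scope, and `compile` emits bytecode for the CPython VM. Python is dynamically typed, so its resolve step stops at scopes; a statically typed compiler would also bind types there. The helper names `parse`, `resolve` and `emit` are just labels for this sketch.

```python
import ast
import dis
import symtable

SOURCE = "def area(r):\n    return PI * r * r\n"


def parse(source):
    """Phase 1: source text -> abstract syntax tree."""
    return ast.parse(source)


def resolve(source):
    """Phase 2: classify each name used inside the first function."""
    module = symtable.symtable(source, "<example>", "exec")
    func = module.get_children()[0]  # the symbol table for `area`
    return {
        sym.get_name(): "parameter" if sym.is_parameter()
        else "global" if sym.is_global() else "local"
        for sym in func.get_symbols()
    }


def emit(tree):
    """Phase 3: AST -> bytecode for the CPython virtual machine."""
    return compile(tree, "<example>", "exec")


tree = parse(SOURCE)
print(resolve(SOURCE))
dis.dis(emit(tree))  # human-readable listing of the emitted bytecode
```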
Keep in mind that a bit is a unit of information!
[Chart: memory usage (Kb)]
Compilers produce a lot of information because they need
to understand our code.
Unfortunately, this pretty cool information is thrown away once
compilation is done.
Some languages support reflection, so at least not everything is
lost (note to Java fanboys: type erasure is not cool).
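To make the contrast with type erasure concrete: in Python (used here purely as an example), annotations, including the arguments of generic types, survive compilation and can be read back through reflection with `typing.get_type_hints` and `typing.get_args`. The function `scale` is a made-up example, and the `list[float]` syntax assumes Python 3.9 or later.

```python
from typing import get_args, get_type_hints


def scale(values: list[float], factor: float) -> list[float]:
    """Hypothetical example function; only its annotations matter here."""
    return [v * factor for v in values]


hints = get_type_hints(scale)
print(hints["values"])            # list[float]
print(get_args(hints["values"]))  # (<class 'float'>,)
```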
A lot of very clever people have put a lot of effort, over
several years, into creating a compiler.
It seems like a huge waste of resources not to use it more.
Almost a reality (LLVM, Mono’s mcs, Roslyn, …).
Note: You know that an IDE basically implements an incremental
compiler nowadays, right? Wonder why they feel slow now?
Better editor support
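A small taste of a compiler service: once the parser is reachable as a library, an editor feature such as an outline view becomes a short walk over the tree. This sketch uses Python’s `ast` module; the `outline` helper and its `(kind, qualified name, line)` output format are hypothetical.

```python
import ast


def outline(source):
    """List every class/function as (kind, qualified name, line), like an outline view."""
    entries = []

    def walk(node, prefix):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                kind = "class" if isinstance(child, ast.ClassDef) else "def"
                name = prefix + child.name
                entries.append((kind, name, child.lineno))
                walk(child, name + ".")  # nested definitions get a qualified name
            else:
                walk(child, prefix)

    walk(ast.parse(source), "")
    return entries


SAMPLE = "class Shape:\n    def area(self):\n        return 0\n\ndef main():\n    pass\n"
print(outline(SAMPLE))
```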
Compiler Service (II)
But we want more!
If only we could hook our logic into the compiler like we
do with build systems…
The following would be dirt cheap to have:
Injection of cross-cutting concerns (logging, aspects, …)
Statically checked/expanded DSLs
Quality, security and business-specific enforcement
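Cross-cutting concerns are the easiest case to picture. The sketch below is in Python and assumes we control how the source gets compiled: a tiny compiler pass injects a call to a hypothetical `_trace` hook at the start of every function, so the traced code itself never mentions logging. `InjectTracing` is a made-up name, and async functions are left out for brevity.

```python
import ast


class InjectTracing(ast.NodeTransformer):
    """Insert `_trace('<function name>')` as the first statement of every function."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # handle nested functions too
        call = ast.Call(
            func=ast.Name(id="_trace", ctx=ast.Load()),
            args=[ast.Constant(value=node.name)],
            keywords=[],
        )
        node.body.insert(0, ast.Expr(value=call))
        return node


SOURCE = "def add(a, b):\n    return a + b\n\ndef double(x):\n    return add(x, x)\n"

calls = []
# fix_missing_locations gives the injected nodes the line number of their parent.
tree = ast.fix_missing_locations(InjectTracing().visit(ast.parse(SOURCE)))
namespace = {"_trace": calls.append}
exec(compile(tree, "<traced>", "exec"), namespace)
print(namespace["double"](21), calls)  # 42 ['double', 'add']
```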
Compiler Service (III)
All of a sudden, half our
libraries’ functionality is now
managed by the compiler!
That means for example:
Build systems will notify
errors generated by our
Wait! Won’t we suffer the “cycle
of doom” too?
Editors will autocomplete
and report errors for
The logic is encapsulated
under the compiler public
All the tooling using the
compiler will automatically
support our use cases!
Besides, we have moved some
heavy lifting out of the runtime,
which is pretty cool for resource-constrained
mobile apps, for example.
Extending the compiler is nice but could be made
more user-friendly.
When a pattern comes up again and again in
your code, it is probably better solved with a macro.
Macros are a pretty old concept but they mostly refer
to “text macros”. What we need are AST Macros.
Instead of replacing chunks of text we operate on the
syntax tree of the compiler. Much safer and more powerful!
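The difference between text and AST macros shows up with the classic squaring macro. This is a Python sketch, not the syntax of any macro system from the talk: `text_macro` pastes source text the way a preprocessor would, while `ast_macro` splices the argument’s syntax tree, so operator precedence survives. Both helpers are hypothetical, use the placeholder name `X`, and rely on `ast.unparse` (Python 3.9+).

```python
import ast


def text_macro(template: str, arg: str) -> str:
    """Text macro: paste the argument's source text into the template."""
    return template.replace("X", arg)


class _SubstituteX(ast.NodeTransformer):
    def __init__(self, arg_node):
        self.arg_node = arg_node

    def visit_Name(self, node):
        return self.arg_node if node.id == "X" else node


def ast_macro(template: str, arg: str) -> str:
    """AST macro: substitute the argument's syntax tree for the name X."""
    tree = ast.parse(template, mode="eval")
    arg_node = ast.parse(arg, mode="eval").body
    return ast.unparse(_SubstituteX(arg_node).visit(tree).body)


square = "X * X"
print(text_macro(square, "a + 1"))                  # a + 1 * a + 1
print(ast_macro(square, "a + 1"))                   # (a + 1) * (a + 1)
print(eval(text_macro(square, "a + 1"), {"a": 2}))  # 5, precedence broken
print(eval(ast_macro(square, "a + 1"), {"a": 2}))   # 9, as intended
```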
macro using(expr as Expression):
fp = open('path/to/file.ext')
# Note that the file resource is closed automatically
Normal language syntax constructs that are
somehow quoted (`[| .. |]` in the example).
Their purpose is to make creating complex AST
structures a breeze.
The compiler generates a separate AST for
them. It’s like a template.
When we inject them somewhere the compiler
will apply that template in that AST node.
Injection points in a quasi-quotation (`$(..)` in the example).
Their purpose is to parametrize quasi-quotations
to dynamically build arbitrary AST structures.
Each one resolves to a syntax node, which the compiler
inserts into the template tree.
They keep their lexical information once
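Python has no quasi-quotation syntax, so the following is only an emulation under that limitation: the template is ordinary source parsed into its own AST, placeholder names such as `__x__` play the role of `$(..)` injection points, and spliced nodes keep the line numbers they had in the user’s code. `Splice` and `quasi_quote` are made-up names for this sketch.

```python
import ast
import copy


class Splice(ast.NodeTransformer):
    """Replace placeholder names such as `__x__` with caller-supplied AST nodes."""

    def __init__(self, bindings):
        self.bindings = bindings

    def visit_Name(self, node):
        if node.id in self.bindings:
            # Deep-copy so one node can be spliced twice; its own lineno and
            # col_offset (from the user's source) are kept untouched.
            return copy.deepcopy(self.bindings[node.id])
        return node


def quasi_quote(template, **bindings):
    """Parse `template` into its own AST, then fill in its injection points."""
    tree = Splice(bindings).visit(ast.parse(template))
    return ast.fix_missing_locations(tree)


# "User code": the expression we want to wrap sits on line 3.
user_expr = ast.parse("\n\nprice * quantity\n").body[0].value

expanded = quasi_quote("result = __x__ if __x__ > 0 else 0", __x__=user_expr)
print(ast.unparse(expanded))

namespace = {"price": 3, "quantity": 4}
exec(compile(expanded, "<expanded>", "exec"), namespace)
print(namespace["result"])  # 12
```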
Why are they better?
Syntax is integrated and validated as normal code.
Most of the lexical information is kept, so errors reported
after expansion point to the right place in the source.
Toolchain friendly. They are transparent to consuming
DSL friendly. Capture common patterns to solve your
problems and offer a statically checked interface to fit
Can’t libraries do this?
Yes, but they are not the right tool for the job.
A lot can be done at compile time; there is no need to bloat the
runtime with things that only make development easier.
Unified error reporting: catch incorrect usage at compile
time instead of through uncontrolled runtime exceptions.
The `assert` example shows the difference pretty well:
foo = 10
assert foo != 10
foo = 10
assert foo != 10
# AssertionError: foo != 10
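For comparison, plain Python’s `assert foo != 10` raises a bare `AssertionError` with no message. The sketch below is a minimal compile-time pass in the spirit of the macro version (pytest’s assertion rewriting works along similar lines, though far more thoroughly): it copies the source text of each bare `assert` condition into the message via `ast.unparse` (Python 3.9+). `AssertRewriter`, `compile_with_rich_asserts` and `failure_message` are hypothetical names; note that `compile` honours the interpreter’s optimisation level, so under `python -O` the asserts disappear altogether.

```python
import ast


class AssertRewriter(ast.NodeTransformer):
    """Attach the condition's own source text to every bare `assert`."""

    def visit_Assert(self, node: ast.Assert) -> ast.Assert:
        if node.msg is None:
            node.msg = ast.Constant(value=ast.unparse(node.test))
        return node


def compile_with_rich_asserts(source: str, filename: str = "<rewritten>"):
    tree = AssertRewriter().visit(ast.parse(source, filename))
    return compile(ast.fix_missing_locations(tree), filename, "exec")


def failure_message(source: str):
    """Run `source`; return the AssertionError text, or None if nothing failed."""
    try:
        exec(compile_with_rich_asserts(source), {})
    except AssertionError as err:
        return f"AssertionError: {err}"
    return None


print(failure_message("foo = 10\nassert foo != 10\n"))  # AssertionError: foo != 10
```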
Disclaimer: Not saying it’s a good
idea for a general language
The technology allows it and
current systems have enough
memory to pull it off.
Traditionally parsing involved a
lexer generating tokens and a
parser (either a hand-coded
recursive descent or a generated
PEG-style parsers (and other
pattern matching techniques)
operate the same way on a stream of
chars (source code in a file), a tree
of XML nodes or any other object
They can be made to run in
linear time using
OMeta is an example of a PEG
parser with a built-in extension
Parsing rules (productions) can be
inherited and modified at will using
a pretty simple syntax.
Making the language ours
Units: delay = 10h + 30m, weight = 20Kg
Any sort of DSL would benefit from having its own specific syntax
Reduce verbosity: for i in 1..10
Editors won’t know how to colorise the new syntax
It’s tempting to abuse it
Makes your code less portable
Tightly coupled to the standard grammar
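The extensible-grammar idea and the units example fit together in a few lines of Python. This is not OMeta, just a hand-written sketch of a packrat-style PEG parser in which every production is a method memoised per input position, so a subclass can inherit the grammar and override a single rule. `Grammar` and `UnitsGrammar` are hypothetical names; `UnitsGrammar` teaches `atom` the `10h + 30m` syntax, converting durations to minutes.

```python
import re


class Grammar:
    """Tiny packrat PEG parser: each rule is a method, memoised per (rule, position)."""

    def __init__(self, text):
        self.text = text
        self.memo = {}

    def apply(self, rule, pos):
        key = (rule, pos)
        if key not in self.memo:
            self.memo[key] = getattr(self, rule)(pos)  # subclasses may override rules
        return self.memo[key]  # (value, new_pos), or None on failure

    def token(self, pattern, pos):
        m = re.compile(r"\s*(" + pattern + ")").match(self.text, pos)
        return (m.group(1), m.end()) if m else None

    # expr <- atom ("+" atom)*
    def expr(self, pos):
        first = self.apply("atom", pos)
        if first is None:
            return None
        total, pos = first
        while (plus := self.token(r"\+", pos)) is not None:
            nxt = self.apply("atom", plus[1])
            if nxt is None:
                break  # PEG repetition stops; the "+" is left unconsumed
            total, pos = total + nxt[0], nxt[1]
        return total, pos

    # atom <- number
    def atom(self, pos):
        tok = self.token(r"\d+", pos)
        return (int(tok[0]), tok[1]) if tok else None

    def parse(self):
        result = self.apply("expr", 0)
        if result is None or self.text[result[1]:].strip():
            raise SyntaxError(f"cannot parse {self.text!r}")
        return result[0]


class UnitsGrammar(Grammar):
    """Inherits every production and extends `atom` with an optional h/m unit."""

    UNITS = {"h": 60, "m": 1}

    # atom <- number unit?   (reuses the inherited production)
    def atom(self, pos):
        base = super().atom(pos)
        if base is None:
            return None
        value, pos = base
        unit = re.compile(r"[hm]").match(self.text, pos)
        return (value * self.UNITS[unit.group()], unit.end()) if unit else (value, pos)


print(Grammar("1 + 2 + 3").parse())       # 6
print(UnitsGrammar("10h + 30m").parse())  # 630 (minutes)
```

The base `Grammar` still rejects `10h`, which illustrates the portability cost listed above: code using the extended syntax only parses with the extended grammar.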
Have you checked clang recently?
The syntax of a language is the first and
most important point of interaction.
Language design is still mainly driven
by the target audience expectations
(oh those curly braces everywhere).
The syntax must be very carefully
drafted to avoid ambiguities. Even if
designers still care very much about
When asked if they performed user
tests when designing C#, Hejlsberg
answered “only for the integration in
Visual Studio”. More generally, check
how it integrates with the tools you’re
going to use.
Very terse syntaxes tend to be good
for DSLs and small, targeted
projects. Code is read and reasoned
about much more often than written!
Semantics are as important as
syntax; don’t overlook their implications.
Stay away from languages that
release new keywords and constructs
every other minor version. A syntax
either works or doesn’t, but it can’t target
every use case.
The compiler is our main tool
but still, generally
speaking, it’s a bitch reporting
Error reporting is the main
interface, besides the
language syntax, between
the compiler and the
IDEs partially solve this by
implementing their own
heuristics on the code.
But it should be the compiler!
I don’t want my IDE bloated
with these things.
clang has done a great job
with this. Other compilers
should follow its lead.
Compilers ought to be
extensible in this aspect too,
if we can extend them with
our own logic we must have
a clear way to give feedback
to the user.
Language newcomers tend to make syntax errors until they become fluent.
The common approach is to extend the grammar to include invalid productions with
Grammars tend to get out of hand and become difficult to refactor.
A pretty cool way to improve the errors is to use example snippets of incorrect syntax.
The snippets are fed to the parser, and its state at the point of failure is recorded.
The recorded states are then used to build the final parser, so when it hits an error state that
matches one of them, a custom message can be displayed.
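The idea can be shown on a toy scale. In this Python sketch the “parser” only recognises a `def name():` header and reports each failure as a numbered state; feeding it a few hypothetical bad snippets builds the state-to-message table, which is then consulted when real code fails. A real implementation would record the states of an LR or PEG parser instead, and every name here is made up.

```python
import re

# The toy grammar: a method header is exactly these tokens, in order.
EXPECTED = [("kw", "def"), ("name", None), ("op", "("), ("op", ")"), ("op", ":")]


class ParseError(Exception):
    def __init__(self, state: int):
        super().__init__(f"syntax error in parser state {state}")
        self.state = state


def tokenize(src: str):
    tokens = []
    for text in re.findall(r"[A-Za-z_]\w*|[():]", src):
        if text == "def":
            tokens.append(("kw", text))
        elif text in ("(", ")", ":"):
            tokens.append(("op", text))
        else:
            tokens.append(("name", text))
    return tokens


def parse_header(tokens):
    """Return the method name, or raise ParseError carrying the failing state."""
    for state, (kind, value) in enumerate(EXPECTED):
        tok = tokens[state] if state < len(tokens) else ("eof", None)
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise ParseError(state)
    return tokens[1][1]


def build_messages(examples):
    """Run each known-bad snippet through the parser and remember where it fails."""
    table = {}
    for snippet, message in examples.items():
        try:
            parse_header(tokenize(snippet))
        except ParseError as err:
            table[err.state] = message
    return table


MESSAGES = build_messages({
    "def foo()": "Methods must use a ':' for their body.",
    "def ():": "Expected a method name after 'def'.",
})


def report(src: str) -> str:
    try:
        return f"ok: {parse_header(tokenize(src))}"
    except ParseError as err:
        return MESSAGES.get(err.state, str(err))  # fall back to the generic error


print(report("def bar()"))   # Methods must use a ':' for their body.
print(report("def bar():"))  # ok: bar
```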
def foo() # Note: I forgot the colon
<stdin>(3,13): BCE0043: Unexpected token: <INDENT>.
<stdin>(3,13): BCE0044: expecting "DEDENT", found '<EOL>'.
<stdin>(3,13): BCE0044: expecting "EOF", found '<DEDENT>'.
<stdin>(3,13): BCE0043: Methods must use a ‘:’ for their body.
Structural (Go, TypeScript, …)
Pattern matching solves a lot of
type safety derived issues (bye
The problem with extending them
is that the semantics of the
People can easily reason and
adapt to syntax changes but for
In the current state of the art, it is
probably better to switch
languages completely if the
problem at hand requires a
different model of type safety.
Nominal (Java, C#, …)
Some languages offer a mix of
them (e.g. Scala, ObjC, Boo, …)
Many allow for runtime duck
If your language uses nominal typing, you actually have to embrace
it. Carefully plan your interfaces to get the job done.
C# does it right. If you want to be confident about exposing an API
you can’t make every method virtual (*cough* Java *cough*).
public, protected and private are visibility modifiers, not extension
Structural typing really shines in this regard. But it must be allowed from
the call site, not only the definition site (Go vs Scala). The library
author doesn’t have a clue whether his API works as designed or not :-)
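Python happens to offer both flavours, which makes the contrast easy to show in one sketch: an `abc.ABC` is nominal (a class conforms only by inheriting from it or registering with it), while a `typing.Protocol` is structural and can be declared by the consumer at the call site, without touching the library. `FileLogger`, `WriterABC`, `Writer` and `audit` are hypothetical names. The runtime `isinstance` check on a protocol only looks for method names; signatures are verified by a static checker such as mypy.

```python
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


# "Third-party" library code: its author never heard of our interfaces.
class FileLogger:
    def write(self, message: str) -> None:
        print("log:", message)


# Nominal: conformance requires an explicit declared relationship.
class WriterABC(ABC):
    @abstractmethod
    def write(self, message: str) -> None: ...


# Structural, declared at the call site: anything with a matching `write` fits.
@runtime_checkable
class Writer(Protocol):
    def write(self, message: str) -> None: ...


def audit(sink: Writer) -> None:
    sink.write("user logged in")


print(isinstance(FileLogger(), WriterABC))  # False: no declared relationship
print(isinstance(FileLogger(), Writer))     # True: the shape matches
audit(FileLogger())                         # log: user logged in
```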
Dynamic languages are flawed in this respect: rapid prototyping, but zero
confidence when upgrading.
Imagine you could tap into the
runtime environment and modify
the executing instructions on
We ask the compiler to
regenerate the changes in our
function; the result is a chunk of
bytecode to feed into the VM.
Many projects nowadays deploy
to a virtual machine, which
technically simplifies this.
No need to run an interpreter and
very few limitations.
Still, this is the kind of thing that
would flourish with an extensible
compiler design. No big vendor is
going to release something like
Still, the VM expects bytecode,
we want to write in a higher level
Extensible compilers would allow
bringing Live Coding to
languages that are not
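CPython, used here only as an example VM, already allows a crude version of this: every function object carries its bytecode in `__code__`, and swapping in freshly compiled code changes the behaviour of every existing reference to that function. `live_patch` is a hypothetical helper; CPython refuses the swap with a `ValueError` when the old and new code have a different number of closure variables.

```python
def live_patch(func, new_source: str) -> None:
    """Recompile `new_source` (which must define a function with the same
    name as `func`) and swap its bytecode into the running function object."""
    namespace: dict = {}
    exec(compile(new_source, "<live>", "exec"), namespace)
    replacement = namespace[func.__name__]
    func.__code__ = replacement.__code__          # the new instructions
    func.__defaults__ = replacement.__defaults__  # keep default args in sync


def greet(name):
    return "Hello, " + name


callback = greet  # e.g. a handler registered elsewhere in the running program
print(callback("Ana"))  # Hello, Ana

live_patch(greet, 'def greet(name):\n    return "Hola, " + name\n')
print(callback("Ana"))  # Hola, Ana
```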
Popularised recently by F#
They are basically specialised macros whose purpose is to
generate new types in the program.
Remember when you had to preprocess some XML schema files to
generate Java classes? What if the compiler could do that for you?
Using JSON? Just feed an example message file to the type
provider macro and it will generate a typed interface for it.
This is available in many languages at runtime via reflection.
What’s cool is that now you have it at compile time, with type checking
and autocompletion as obvious benefits.
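Python has no compile-time macros, so this sketch only approximates a type provider: it reads an example JSON message and generates dataclass source code, which could be written to a module before type checking, much like the XML-schema preprocessing step mentioned above but without a separate tool. `provide_type` is a hypothetical name, and the sketch assumes JSON keys are valid Python identifiers and that the first element of a list is representative of the rest.

```python
import json

_SCALARS = {bool: "bool", int: "int", float: "float", str: "str"}


def _annotation(value, key: str, classes: list) -> str:
    """Map one sample value to a type annotation, emitting nested classes."""
    if isinstance(value, dict):
        class_name = key.title().replace("_", "")
        _emit_class(class_name, value, classes)
        return class_name
    if isinstance(value, list):
        inner = _annotation(value[0], key, classes) if value else "object"
        return f"list[{inner}]"
    return _SCALARS.get(type(value), "object")  # e.g. JSON null -> object


def _emit_class(name: str, sample: dict, classes: list) -> None:
    fields = [f"    {k}: {_annotation(v, k, classes)}" for k, v in sample.items()]
    body = "\n".join(fields) if fields else "    pass"
    # Nested classes were appended while building `fields`, so they come first.
    classes.append(f"@dataclass\nclass {name}:\n{body}")


def provide_type(name: str, sample_json: str) -> str:
    """Return Python source defining dataclasses shaped like the sample message."""
    classes: list = []
    _emit_class(name, json.loads(sample_json), classes)
    return "from dataclasses import dataclass\n\n\n" + "\n\n\n".join(classes) + "\n"


SAMPLE = '{"id": 7, "user": {"name": "ana", "is_admin": false}, "tags": ["a", "b"]}'
print(provide_type("Message", SAMPLE))
```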
It’s your responsibility
To get better tools for our job
we have to earn it. Advocate
these concepts to gain traction
in the profession.
If you have to choose between
an extensible language and
one that’s not, weigh the
former appropriately in the
Don’t be afraid of moulding a
language to your needs. Know
your tool and help it help you!
If deploying into a VM (JVM,
risk in mixing languages in a
project that boost our
Forget the idea that only
functional languages are
language can do it too!
Language designers do listen
to their users, even if for
mainstream languages they
are slow to react.
Languages should be
designed for extensibility.
Anyone that can create a
library should also be able to
extend the compiler/language.
Not everything must be done at
Statically typed languages are
closing the gap with dynamic
ones in ease of use and user
experience. We have to review
our current standpoints on this.
If you have the chance to learn
a new framework or a new
language, choose the latter; it’s
much more rewarding.
Extensible compilers will foster
innovation, no need to wait for
Oracle or Microsoft to come up
with new ideas every 3 years.
People will be able to shape
the future by trying new things and
introducing those that work as
part of the standard languages.
Good drivers don’t need to
know about the internals of the
car, but it certainly doesn’t
We need better compilers in
“The limits of my language are
the limits of my world.”