league/commonmark is a well-written, super-configurable Markdown parser for PHP based on the CommonMark spec. In this lightning talk, we'll introduce the CommonMark spec, discuss why it's important, and demonstrate how the league/commonmark project can be used and extended for your own PHP projects.
4. CommonMark is…
A strongly defined, highly compatible specification of Markdown.
Written by people from GitHub, Stack Overflow, Reddit, and others.
Spec includes:
• Strict rules (precedence, parsing order, handling edge cases)
• Specific definitions (ex: “whitespace”, “punctuation”)
• 613 examples
5. Why is it needed?
*I love Markdown*
<p><em>I love Markdown</em></p>
6. Why is it needed?
*I *love* Markdown*
Source: http://johnmacfarlane.net/babelmark2/
22. league/commonmark
A well-written, super-configurable Markdown
parser for PHP based on the CommonMark spec.
1. Supports the full CommonMark standard
2. Easy to implement
3. Easy to customize
4. Full UTF-8 compatibility
5. Well-tested
Alright!
Today we're going to talk about the PHP League’s CommonMark library
But first, a brief introduction of myself
Also JavaScript, C#, and Java
OUT: So let’s go ahead and take a look at this Markdown parser
The league/commonmark library
At its core, it takes Markdown in and spits HTML out
OUT: Now I’ve mentioned the word CommonMark a few times, but what is that?
We’re all familiar with the usual Markdown syntax
This specification dictates exactly how compliant parsers should handle Markdown input
It includes
That’s cool, but do we really NEED a spec? Is Markdown really that complicated?
Well, let’s look at an example
Straightforward example of emphasizing text
What happens if we add two asterisks?
Whole string emphasized with nested inner emphasis, as you’d expect
Another approach some parsers take is two separate emphasis elements
Kinda makes sense
What’s really strange and unexpected
3 other ways of parsing this (22%)
OUTRO:
The CommonMark standard is designed to eliminate this ambiguity
so that your Markdown is handled in a logical, predictable fashion
That’s why we’ve adopted this specification for our library
Several integrations for this library built by the community
This project also caters to advanced users…
Who need more power, control, and flexibility
We have a URL we want to link to using the standard autolinking syntax
Wrapped with a less-than and greater-than sign
“A” tag with href and text label of the URL
Show how our engine does this behind-the-scenes
Start off with Markdown input
Run it through the various sub-parsers which results in an Abstract Syntax Tree
Also known as an AST
Tree structure of PHP objects, each representing a certain type of element
For easier visualization I’m showing what these PHP objects MIGHT look like if we showed their data as XML
Once we’ve got the final AST, we pass that along to the renderers…
…which convert the AST into HTML
Now what’s really cool is that…
You can hook into any of these three aspects, adding “your own custom…”
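The flow described above (Markdown in, sub-parsers build an AST, renderers turn the AST into HTML) can be sketched in a few lines. This is a toy illustration in Python, not the library's PHP API: the `Node` class, the `parse`, `render`, and `convert` functions, and the `example.com` URL are all hypothetical, and the only syntax it understands is paragraphs and `<url>` autolinks.

```python
import html
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One element of the toy AST (hypothetical structure, for illustration only)."""
    kind: str                                   # "document", "paragraph", "text", "link"
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


# Toy autolink rule: a URL wrapped in a less-than and a greater-than sign.
AUTOLINK = re.compile(r"<(https?://[^\s<>]+)>")


def parse(markdown: str) -> Node:
    """Stage 1: turn Markdown text into an AST (paragraphs and autolinks only)."""
    document = Node("document")
    for block in markdown.strip().split("\n\n"):
        paragraph = Node("paragraph")
        position = 0
        for match in AUTOLINK.finditer(block):
            if match.start() > position:
                paragraph.children.append(
                    Node("text", {"value": block[position:match.start()]}))
            url = match.group(1)
            paragraph.children.append(
                Node("link", {"href": url}, [Node("text", {"value": url})]))
            position = match.end()
        if position < len(block):
            paragraph.children.append(Node("text", {"value": block[position:]}))
        document.children.append(paragraph)
    return document


def render(node: Node) -> str:
    """Stage 2: walk the AST and emit HTML, one rule per node type."""
    inner = "".join(render(child) for child in node.children)
    if node.kind == "text":
        return html.escape(node.attrs["value"])
    if node.kind == "link":
        return f'<a href="{html.escape(node.attrs["href"])}">{inner}</a>'
    if node.kind == "paragraph":
        return f"<p>{inner}</p>"
    return inner  # "document" is only a container


def convert(markdown: str) -> str:
    """Markdown in, HTML out."""
    return render(parse(markdown))


print(convert("See <http://example.com> now"))
```

Because parsing and rendering are separate stages that only share the AST, a new syntax needs a parser that produces a node and a renderer that knows how to print it, which is the kind of extension point the talk goes on to describe.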
Now let’s go back to our autolink example
What if we wanted to add similar autolinking functionality, but for Twitter handles?
For example, say we want to enclose the Twitter handle in a similar fashion… which results in a link to that profile page
Let me show you how simple it is to add this feature
Simply create a sub-parser
1. Tell the main parser to stop whenever a less-than sign is encountered
2. Cursor is a simple yet powerful wrapper around the current line
- Current character position
- If the line is indented
- Easily advance through that string
Match this regular expression
Extract just the username from the matched text
That’s it! Now, the APIs and methods might be unfamiliar, but hopefully you can see how features can be added seamlessly with only a few lines of code
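To make those steps concrete, here is a minimal sketch of the same idea in Python. It is not the library's PHP API: the `Cursor` class, `TwitterHandleParser`, and `render_inline` are hypothetical stand-ins, and the `<@username>` syntax, the 1 to 15 character username rule, and the `https://twitter.com/` URL format are assumptions made for the example.

```python
import html
import re
from typing import Optional


class Cursor:
    """Hypothetical wrapper around the current line and a position within it."""

    def __init__(self, line: str) -> None:
        self.line = line
        self.position = 0

    def peek(self) -> Optional[str]:
        """Return the character at the current position, or None at end of line."""
        return self.line[self.position] if self.position < len(self.line) else None

    def match(self, pattern: str) -> Optional[str]:
        """If the pattern matches at the current position, consume and return the text."""
        found = re.compile(pattern).match(self.line, self.position)
        if found is None:
            return None
        self.position = found.end()
        return found.group(0)

    def advance(self, count: int = 1) -> None:
        self.position = min(self.position + count, len(self.line))


class TwitterHandleParser:
    # Step 1: ask the main loop to stop whenever a less-than sign is encountered.
    trigger = "<"

    def parse(self, cursor: Cursor) -> Optional[str]:
        # Step 2: try to match the assumed "<@username>" syntax at the cursor.
        matched = cursor.match(r"<@[A-Za-z0-9_]{1,15}>")
        if matched is None:
            return None  # not a handle; the cursor has not moved
        username = matched[2:-1]  # strip the leading "<@" and the trailing ">"
        return f'<a href="https://twitter.com/{username}">@{username}</a>'


def render_inline(line: str, parsers: list) -> str:
    """Toy main loop: hand over to a sub-parser at its trigger character."""
    cursor = Cursor(line)
    output = []
    while (char := cursor.peek()) is not None:
        fragment = None
        for parser in parsers:
            if parser.trigger == char:
                fragment = parser.parse(cursor)
                if fragment is not None:
                    break
        if fragment is None:
            output.append(html.escape(char))  # ordinary text
            cursor.advance()
        else:
            output.append(fragment)
    return "".join(output)


print(render_inline("Follow <@example_user>!", [TwitterHandleParser()]))
```

The sub-parser returns `None` without moving the cursor when the text is not a handle, so the main loop can fall back to treating the less-than sign as ordinary text.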
Easy to customize to fit your needs with classes and extension points specifically designed with customization in mind
Things like the cursor are UTF-8-aware
Supports Chinese characters, emoji
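A short sketch of why this matters, written in Python rather than PHP and using hypothetical helper names: a cursor that counts bytes can land in the middle of a multi-byte character, while a cursor that counts characters cannot.

```python
text = "你好😀!"
raw = text.encode("utf-8")

# Four characters, but more bytes, because the Chinese characters and the emoji
# each take several bytes in UTF-8.
print(len(text), len(raw))


def char_at_byte_offset(data: bytes, offset: int) -> str:
    """Naive byte-based cursor: decode the single byte found at the offset."""
    return data[offset:offset + 1].decode("utf-8", errors="replace")


def char_at_character_offset(value: str, offset: int) -> str:
    """Character-aware cursor: index by character, not by byte."""
    return value[offset]


# Byte offset 1 falls inside the first Chinese character, so decoding that byte
# alone yields the Unicode replacement character instead of real text.
print(repr(char_at_byte_offset(raw, 1)))

# Character offset 2 is the emoji, intact.
print(char_at_character_offset(text, 2))
```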
94% coverage
Including all 613 tests from the CommonMark spec test suite
Guaranteed to be compatible with all other CommonMark parsers in other languages
So if you’d like to check it out, you can find the library on Packagist under league/commonmark
Installation instructions & documentation can be found
Hopefully you find this library useful and can try it out in your next project.
Thank you so much!