URLs:
In Plain View
Mahmoud Hashemi
May, 2017
I love URLs
The most advanced technology to reach the masses.
Ever.
A piece of the web for everyone
◉ Kids
◉ Santa
◉ Dads
Locating the URL
Frontend
Backend
Non-web
$ git clone git@github.com:mahmoud/boltons.git
URLs are everywhere
The Internet is very leaky.
“Some long ass link you are somehow
suppose to fit into the address bar.”
fnaffoxy2916
Defined Feb 17, 2017
urbandictionary.com
Those Three Words Every
Browser Understands
Uniform
Uniformity means the
mechanism stays the
same, even if the
types of resources
differ.
Resource
A resource can be
anything, even
dynamic content,
representing a
consistent concept.
Locator
Locators are more than
just identifiers; they
have directions for
network lookup.
URLs are like a treasure
map every browser can
read.
The history is long.com
◉ 1992 - W3 hypertext names
◉ 1994 - RFC 1630, 1736, 1737, 1738
◉ 1995 - RFC 1808
◉ 1997 - RFC 2141
◉ 1998 - RFC 2396, 2368
◉ 1999 - RFC 2732
◉ 2002 - RFC 3305
◉ 2005 - RFC 3986 (the gold standard)
◉ 2013 - RFC 6874
◉ 2014 - RFC 7320
◉ 2017 - WHATWG document (the browser bubble)
>67,000
Words spent explicitly defining URLs in the RFCs
#
The overambitious URL
10 years later, even the W3C had to admit it made some mistakes.
Design intent
◉ Simple
◉ Transcribable
◉ No barrier to entry
Usable by humans and computers
The knowable URL
The right amount of URL engineering know-how.
1. The Scheme
https://mahmoud:urls@pyconweb.com/anatomy/scheme?lang=en&rfc=3986#subtitle-2017
◉ Short, case-insensitive
◉ Letters, numbers, +, -, .
◉ Registered with IANA
◉ Determines URL semantics
http, https, ssh, gopher, rsync, mailto, tel, …
~60 in common use
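A minimal sketch of the scheme rules above, using the standard library's urllib.parse (the Python 3 home of urlparse). The uppercase HTTPS input is an illustrative assumption, not something from the slides.

```python
from urllib.parse import urlsplit

# Same slide URL, but with the scheme typed in uppercase (illustrative).
url = "HTTPS://mahmoud:urls@pyconweb.com/anatomy/scheme?lang=en&rfc=3986#subtitle-2017"

# urlsplit lowercases the scheme, since schemes are case-insensitive.
scheme = urlsplit(url).scheme
print(scheme)  # https

# Letters, numbers, +, - and . are all legal, which is what makes
# "plus schemes" such as git+ssh possible.
plus_scheme = urlsplit("git+ssh://git@github.com/mahmoud/boltons.git").scheme
print(plus_scheme)  # git+ssh
```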
2. The Userinfo
https://mahmoud:urls@pyconweb.com/anatomy/userinfo?lang=en&rfc=3986#subtitle-2017
◉ Comes after the scheme
◉ ...
1.5. The Netloc Slashes!
mailto:mahmoud@hatnote.com
vs.
http://blog.hatnote.com
https://mahmoud:urls@pyconweb.com/anatomy/netloc?lang=en&rfc=3986#subtitle-2017
2. The Userinfo
https://mahmoud:urls@pyconweb.com/anatomy/userinfo?lang=en&rfc=3986#subtitle-2017
◉ username:password@
◉ Username and password are base64-encoded into
the Authorization header in HTTP (Basic auth)
◉ Our first percent-encoded field!
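A rough sketch of how userinfo could become an HTTP Basic auth header, assuming the standard library's urllib.parse and base64. Real HTTP clients handle this for you; the variable names here are illustrative.

```python
import base64
from urllib.parse import urlsplit, unquote

url = "https://mahmoud:urls@pyconweb.com/anatomy/userinfo?lang=en&rfc=3986#subtitle-2017"
parts = urlsplit(url)

# Userinfo is percent-encoded, so decode it before building credentials.
username = unquote(parts.username)
password = unquote(parts.password)

# Basic auth sends base64("username:password") in the Authorization header.
token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
header = {"Authorization": f"Basic {token}"}
print(header)
```

Base64 is an encoding, not encryption, so the credentials are only as private as the connection carrying them.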
Percent encoding (aka quoting)
%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20
◉ URLs are built to support non-ASCII
◉ Byte values are replaced with %XX
◉ No standard encoding underneath
○ UTF-8 conventional now
○ Latin-1 one of many before
○ Binary-capable
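The points above can be sketched with urllib.parse's quote and unquote. The word "café" is an illustrative input; the takeaway is that the same text yields different %XX bytes depending on the encoding underneath.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

# UTF-8 is the default (and conventional) encoding.
utf8_quoted = quote("café")
print(utf8_quoted)  # caf%C3%A9

# The same text under Latin-1 produces a different byte, hence a different URL.
latin1_quoted = quote("café", encoding="latin-1")
print(latin1_quoted)  # caf%E9

# Decoding only round-trips if you know which encoding was used.
roundtrip = unquote(latin1_quoted, encoding="latin-1")

# Percent-encoding is binary-capable: any byte value can be carried.
raw = unquote_to_bytes("%00%FF")
```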
3. The Host
https://mahmoud:urls@pyconweb.com/anatomy/host?lang=en&rfc=3986#subtitle-2017
◉ IPv4, [IPv6], or string resolved with DNS
◉ Supports Unicode via Punycode
u'https://bücher.ch' → 'https://xn--bcher-kva.ch'
'https://xn--ggbla1c4e.xn--ngbc5azd/'
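A small sketch of the Punycode step, using Python's built-in "idna" codec. That codec follows the older IDNA 2003 rules, so treat this as an illustration rather than a complete internationalized-domain implementation.

```python
# Unicode hostname from the slide.
host = "bücher.ch"

# Encode to the ASCII form that DNS actually resolves.
ascii_host = host.encode("idna").decode("ascii")
print(ascii_host)  # xn--bcher-kva.ch

# And back again, for display to humans.
unicode_host = ascii_host.encode("ascii").decode("idna")
print(unicode_host)  # bücher.ch
```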
4. The Port
https://mahmoud:urls@pyconweb.com:8080/anatomy/port?lang=en&rfc=3986#subtitle-2017
◉ Positive integers only
◉ Usually registered with IANA
◉ Not emitted if equal to scheme default
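One way the "not emitted if equal to scheme default" convention might look in code. DEFAULT_PORTS and drop_default_port are hypothetical names for this sketch, and the table is deliberately tiny.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical, deliberately small table of scheme defaults.
DEFAULT_PORTS = {"http": 80, "https": 443, "ssh": 22}

def drop_default_port(url):
    """Return url without its port when the port is the scheme's default."""
    parts = urlsplit(url)
    if parts.port is None or parts.port != DEFAULT_PORTS.get(parts.scheme):
        return url
    # The port is the last ":"-separated piece of the netloc; userinfo
    # and [IPv6] brackets come before it and are left untouched.
    netloc = parts.netloc.rsplit(":", 1)[0]
    return urlunsplit(parts._replace(netloc=netloc))

print(drop_default_port("https://pyconweb.com:443/anatomy/port"))
print(drop_default_port("https://pyconweb.com:8080/anatomy/port"))
```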
5. The Path
https://mahmoud:urls@pyconweb.com:8080/anatomy/path?lang=en&rfc=3986#subtitle-2017
◉ Host-local hierarchy
◉ Also percent-encoded
◉ Absolute vs. relative
◉ Almost anything is a path (and a URL)
○ mailto:mahmoud@hatnote.com
○ this|is|not|a|url
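A quick sketch of absolute vs. relative paths, and of how much counts as "a path", using urllib.parse. The "/schedule" target is a made-up example.

```python
from urllib.parse import urljoin, urlsplit

base = "https://pyconweb.com:8080/anatomy/path"

# A relative reference replaces only the last path segment.
relative = urljoin(base, "query")
print(relative)  # https://pyconweb.com:8080/anatomy/query

# An absolute path replaces the whole host-local hierarchy.
absolute = urljoin(base, "/schedule")
print(absolute)  # https://pyconweb.com:8080/schedule

# With no "//" there is no authority, so the rest lands in the path.
mailto_path = urlsplit("mailto:mahmoud@hatnote.com").path
print(mailto_path)  # mahmoud@hatnote.com

# With no scheme either, the whole string parses as a path.
odd_path = urlsplit("this|is|not|a|url").path
print(odd_path)  # this|is|not|a|url
```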
6. The Query String
https://mahmoud:urls@pyconweb.com:8080/anatomy/query?lang=en&rfc=3986#subtitle-2017
◉ My favorite part
◉ Order is preserved
◉ Duplicate keys combine
◉ An ordered multidict!
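The ordered-multidict behavior can be seen with urllib.parse. The repeated lang key is an illustrative addition to the slide URL.

```python
from urllib.parse import urlsplit, parse_qsl, parse_qs

url = "https://pyconweb.com/anatomy/query?lang=en&rfc=3986&lang=fa"
query = urlsplit(url).query

# parse_qsl keeps every pair, in order, duplicates included.
pairs = parse_qsl(query)
print(pairs)  # [('lang', 'en'), ('rfc', '3986'), ('lang', 'fa')]

# parse_qs groups duplicate keys into lists instead.
grouped = parse_qs(query)
print(grouped)  # {'lang': ['en', 'fa'], 'rfc': ['3986']}
```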
7. The Fragment
https://mahmoud:urls@pyconweb.com:8080/anatomy/fragment?lang=en&rfc=3986#subtitle-2017
◉ The frontend developers’ favorite part
◉ Not sent to the server
◉ Based on apartment numbers
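Since the fragment never reaches the server, a client has to peel it off before making the request. A sketch with urllib.parse.urldefrag:

```python
from urllib.parse import urldefrag

url = "https://pyconweb.com:8080/anatomy/fragment?lang=en&rfc=3986#subtitle-2017"

# Split the URL into the part that is requested and the part the client keeps.
request_url, fragment = urldefrag(url)
print(request_url)  # https://pyconweb.com:8080/anatomy/fragment?lang=en&rfc=3986
print(fragment)     # subtitle-2017
```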
A Pythonic Example
Let’s look at some Python
core.py
def func(a1, a2, kw1=None):
    pass
Python is pretty powerful
caller.py
from pkg.mod import func
# powerful
func(arg1, arg2, kw1='val1')
?
But it seems URLs can keep up!!
py://func.module.pkg/arg1/arg2?kw1=val1#awesome
\_/  \_____________/\________/ \______/ \_____/
 |          |           |         |        |
scheme  authority      path     query  fragment
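Just for fun, here is a sketch of how the made-up py:// URL could be unpacked back into the pieces of a call. The py scheme and the url_to_call helper are hypothetical; only urlsplit and parse_qsl are real.

```python
from urllib.parse import urlsplit, parse_qsl

def url_to_call(url):
    """Split a hypothetical py:// URL into the parts of a function call."""
    parts = urlsplit(url)
    # The authority reads right to left, like a domain: func.module.pkg
    *module_parts, func_name = reversed(parts.netloc.split("."))
    return {
        "module": ".".join(module_parts),                  # pkg.module
        "func": func_name,                                 # func
        "args": [s for s in parts.path.split("/") if s],   # positional args
        "kwargs": dict(parse_qsl(parts.query)),            # keyword args
        "comment": parts.fragment,                         # stays client-side
    }

call = url_to_call("py://func.module.pkg/arg1/arg2?kw1=val1#awesome")
print(call)
```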
OK, back to reality.
What about urlparse?
No standard library is perfect...
urlparse design gaps
◉ Mostly RFC1738 (1994) and RFC2396 (1998)
◉ URLs are “just” tuples of strings
◉ Hardcoded schemes (~25)
◉ Crufty APIs
○ urlparse vs. urlsplit
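A short look at two of these gaps, using Python 3's urllib.parse (where the urlparse module now lives). The example.com URL is illustrative.

```python
from urllib.parse import urlparse, urlsplit

url = "http://example.com/path;params?q=1#frag"

# urlparse returns 6 fields and splits ";params" off the last path segment.
parsed = urlparse(url)
print(parsed.path, parsed.params)  # /path params

# urlsplit returns 5 fields and leaves ";params" inside the path.
split = urlsplit(url)
print(split.path)  # /path;params

# Either way the result is "just" a tuple of strings.
print(isinstance(parsed, tuple), len(parsed), len(split))  # True 6 5
```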
What do we do?
pip install hyperlink
◉ RFC3986+
◉ Full-fledged URL type
◉ 58 schemes and counting
◉ Smart conventions
○ Plus schemes (git+ssh, etc.)
○ IPv6 validation
○ normalization
◉ Python 2.6 - 3.6 tested
◉ github.com/mahmoud/hyperlink
◉ hyperlink.readthedocs.io
Hyperlink API highlights
◉ Immutable URL type
◉ URIs for computers, IRIs for humans
>>> from hyperlink import URL
>>> url = URL.from_text('http://example.com/caf%C3%A9/láit')
>>> print(url.to_iri().to_text())
http://example.com/café/láit
>>> print(url.to_uri().to_text())
http://example.com/caf%C3%A9/l%C3%A1it
Want corner cases?
Check hyperlink/test
Hyperlink History and Future
My idea of fun over time:
◉ 2013
○ Build an IO-agnostic HTTP library and spend
way too much time reading URL RFCs
◉ 2017
○ Work with the Twisted project to merge my URL
(boltons.urlutils) with twisted.python.url
◉ Future
○ Work on the Hyper project to bring more
sans-IO web libraries to Python
○ https://github.com/python-hyper/
URLs in short
◉ Flexible
◉ Powerful
◉ Becoming even more useful
URLs are what you make of them.
Any questions?
◉ github.com/mahmoud
◉ twitter.com/mhashemi
◉ sedimental.org
Thanks!

Mahmoud Hashemi - URLs: In Plain View