Software is complicated. Machine learning, microservice architectures, message queues... every few months there's another revolutionary idea to consider, another framework to learn. And underneath so many of these amazing ideas and abstractions is text. When you work in software, you spend your life working with text. Some of those text files are source code, some are configuration files, some of them are documentation.
Editors, revision control systems, programming languages - everything from C# and HTML to Git and VS Code is based on the idea that we're working with "plain text" files. But... what if I told you there's no such thing?
When we say something is a plain text file, we're relying on a huge number of assumptions - about operating systems, editors, file formats, language, culture, history... and, most of the time, that's OK. But when it goes wrong, good old plain text can lead to some of the weirdest bugs you've ever seen.
Why is there Chinese in the SQL event logs? Why has the city of Aarhus disappeared? And why does Magnus Mårtensson always have trouble getting into the USA? Join Dylan Beattie for a fascinating look into the hidden world of text files - from the history of mechanical teletypes, to how emoji skin tones actually work.
We'll look at some memorable bugs, some golden rules for working with plain text, and we'll even find out the story behind the strange saying "PIKE MATCHBOX" and what it has to do with driving in Ukraine.
11. 0 00000000 NUL Null char
1 00000001 SOH Start of Heading
2 00000010 STX Start of Text
3 00000011 ETX End of Text
4 00000100 EOT End of Transmission
5 00000101 ENQ Enquiry
6 00000110 ACK Acknowledgment
7 00000111 BEL Bell
8 00001000 BS Back Space
9 00001001 HT Horizontal Tab
10 00001010 LF Line Feed
11 00001011 VT Vertical Tab
12 00001100 FF Form Feed
13 00001101 CR Carriage Return
14 00001110 SO Shift Out
15 00001111 SI Shift In
16 00010000 DLE Data Link Escape
17 00010001 DC1 Device Control 1 / X-On
18 00010010 DC2 Device Control 2
19 00010011 DC3 Device Control 3 / X-Off
20 00010100 DC4 Device Control 4
21 00010101 NAK Negative Ack
22 00010110 SYN Synchronous Idle
23 00010111 ETB End of Transmission Block
24 00011000 CAN Cancel
25 00011001 EM End of Medium
26 00011010 SUB Substitute
27 00011011 ESC Escape
28 00011100 FS File Separator
29 00011101 GS Group Separator
30 00011110 RS Record Separator
31 00011111 US Unit Separator
Shortcut: Ctrl-A (sends SOH, code 1)
Shortcut: Ctrl-B (sends STX, code 2)
Shortcut: Ctrl-C (sends ETX, code 3)
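The Ctrl shortcuts listed above follow a pattern: on a traditional terminal, holding Ctrl keeps only the low five bits of the key's code, so Ctrl-A sends code 1 (SOH), Ctrl-B sends code 2 (STX), and so on. A minimal Python sketch, assuming that conventional masking behaviour (the helper name `ctrl_code` is made up for illustration):

```python
def ctrl_code(key: str) -> int:
    """Return the control code conventionally sent for Ctrl-<key>."""
    # Keep only the low five bits: 'A' (65, 0b1000001) becomes 1 (0b0000001).
    return ord(key.upper()) & 0b0011111

for key in "ABC":
    print(f"Ctrl-{key} -> {ctrl_code(key)}")
```

The same masking would explain why Ctrl-[ is commonly equivalent to Escape (91 masked to 27) and Ctrl-M to Carriage Return (77 masked to 13).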
20. 64 01000000 @ At symbol
65 01000001 A Uppercase A
66 01000010 B Uppercase B
67 01000011 C Uppercase C
68-87: Uppercase letters D-W
88 01011000 X Uppercase X
89 01011001 Y Uppercase Y
90 01011010 Z Uppercase Z
91 01011011 [ Opening bracket
92 01011100 \ Backslash
93 01011101 ] Closing bracket
94 01011110 ^ Caret - circumflex
95 01011111 _ Underscore
96 01100000 ` Grave accent
97 01100001 a Lowercase a
98 01100010 b Lowercase b
99 01100011 c Lowercase c
100-119: Lowercase letters d-w
120 01111000 x Lowercase x
121 01111001 y Lowercase y
122 01111010 z Lowercase z
123 01111011 { Opening brace
124 01111100 | Vertical bar
125 01111101 } Closing brace
126 01111110 ~ Tilde
127 01111111 DEL Delete
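One thing the table above makes visible is that each lowercase letter's code is its uppercase code plus 32, so the two differ in exactly one bit. A small Python sketch of that property; `toggle_case` and `CASE_BIT` are hypothetical names, and the trick only applies to the ASCII letters A-Z and a-z:

```python
CASE_BIT = 0b0100000  # 32: the only bit that differs between 'A' (65) and 'a' (97)

def toggle_case(ch: str) -> str:
    """Flip the case of a single ASCII letter by toggling one bit."""
    if not (ch.isascii() and ch.isalpha()):
        return ch  # leave digits, punctuation and non-ASCII text untouched
    return chr(ord(ch) ^ CASE_BIT)

print(toggle_case("A"), toggle_case("z"))
print(format(ord("A"), "08b"), format(ord("a"), "08b"))
```

Characters outside ASCII, such as the å in Mårtensson, are returned unchanged here, because the one-bit relationship is a property of the ASCII layout only.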
30. "To provide a single, consistent way to
represent each letter and symbol needed for
all human languages across all computers
and devices."
- Mission statement of the Unicode Consortium