SlideShare a Scribd company logo
Characters encoding and
how to handle it in Ruby
Paweł Cyło
Rzeszów Ruby User Group, 30.11.2016
Characters encoding and
how to handle it in Ruby
• What is a string?
(no, it is not what Google Image search suggests)
• ASCII and ASCII-based 8-bit code pages
• Unicode!
• Working with characters encoding in Ruby
What is a String?
01001000 01100101 01101100 01101100 01101111
00100000 01010111 01101111 01110010 01101100
01100100
"Hello World”
Data + Encoding
There Ain't No Such Thing As Plain Text
ASCII
• Initially designed to work with telegraphs - first version
released in 1963
• 7 bits used in every byte
• 95 human readable characters (32-126) + 33 characters
in control codes block (0-31 and 127)
Characters encoding and how to handle it in Ruby
ASCII-based 8-bit encodings
• “Hey, look! There is one bit not being used in ASCII, lets
get it to work!”
• First 128 characters == ASCII
• Total freedom for last 128 characters
• Unable to handle all existing characters in the world
• Problematic languages mixing in single string
Windows-1252, ISO-8859-1, etc...
The first Unicode version: UCS-2
• Fixed two bytes per character (~65k characters maximum)
• Bizarre Byte Order Mark (BOM) character at the beginning (FE FF
/ FF FE), that was not even always present
• “Look at all these wasted bytes storing zeros! Unicode sucks…”
“Hello":
00 48 00 65 00 6C 00 6C 00 6F
...but it could also be:
48 00 65 00 6C 00 6C 00 6F 00
UTF-8
• Variable-width encoding!
• Code points from 0-127 are stored in a single byte
• Code points 128 and above are stored using 2, 3… in fact, up to 6
bytes
• Code point - every letter is assigned a magic number by the Unicode
consortium which is written like this: U+0639
• Currently supports 128 237 unique characters (and still growing)
• BOM (EF BB BF) is optional and only shows that the encoding is UTF-
8
UTF-8
• Full backward compatibility with ASCII - code points U+0000 to U+007F
• Other variants:
• UTF-32, a 32-bit, fixed-width encoding
• UTF-16, a 16-bit, variable-width encoding
“Hello":
48 65 6C 6C 6F
Strings in Ruby
• Unicode (UTF-8) by default
• String literals allow direct inserting byte
(“xXX“) or UTF-8 code point (“uXXXX“)
• All strings are encoding aware
Strings in Ruby
Is it UTF-8? Not really, the x99 byte is not a valid unicode code
point, but it is available in some ASCII-based encodings. For
example in Windows-1252 code page it stands for the “™”.
Strings in Ruby
Pure ASCII. But it can get more complex:
Ruby String methods to work
with encodings:
• bytes - shows you the bytes that make up a string
• encoding - returns the current encoding of a string
• encode - translates a string to another encoding
(converting characters to their equivalent in the new
encoding)
• force_encoding - shows you what those bytes would look
like interpreted by a different encoding
Characters encoding and how to handle it in Ruby
They can be replaced with predefined default character
Some characters may not be available in other encodings
For those who want more!
• http://www.justinweiss.com/articles/3-steps-to-fix-encoding-problems-in-ruby/
• http://www.justinweiss.com/articles/how-to-get-from-theyre-to-theyre/
• http://www.joelonsoftware.com/articles/Unicode.html
• http://kunststube.net/encoding/
• https://betterexplained.com/articles/unicode/
• http://blog.gatunka.com/2014/04/25/character-encodings-for-modern-programmers/
• https://ruby-doc.org/core-2.3.3/Encoding.html
• http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/
• http://nuclearsquid.com/writings/ruby-1-9-encodings/
Characters encoding and how to handle it in Ruby
Paweł Cyło
pawelcylo@gmail.com
@PawelCylo
Dziękuję za uwagę!

More Related Content

Recently uploaded

Modelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - AmsterdamModelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - Amsterdam
Alberto Brandolini
 
All you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVMAll you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVM
Alina Yurenko
 
Webinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for EmbeddedWebinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for Embedded
ICS
 
What next after learning python programming basics
What next after learning python programming basicsWhat next after learning python programming basics
What next after learning python programming basics
Rakesh Kumar R
 
316895207-SAP-Oil-and-Gas-Downstream-Training.pptx
316895207-SAP-Oil-and-Gas-Downstream-Training.pptx316895207-SAP-Oil-and-Gas-Downstream-Training.pptx
316895207-SAP-Oil-and-Gas-Downstream-Training.pptx
ssuserad3af4
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Julian Hyde
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
Remote DBA Services
 
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
Bert Jan Schrijver
 
How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?
ToXSL Technologies
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
dakas1
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
TheSMSPoint
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
Peter Muessig
 
Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !
Marcin Chrost
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
Green Software Development
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
Philip Schwarz
 
Malibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed RoundMalibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed Round
sjcobrien
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
Green Software Development
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
SOCRadar
 

Recently uploaded (20)

Modelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - AmsterdamModelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - Amsterdam
 
All you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVMAll you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVM
 
Webinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for EmbeddedWebinar On-Demand: Using Flutter for Embedded
Webinar On-Demand: Using Flutter for Embedded
 
What next after learning python programming basics
What next after learning python programming basicsWhat next after learning python programming basics
What next after learning python programming basics
 
316895207-SAP-Oil-and-Gas-Downstream-Training.pptx
316895207-SAP-Oil-and-Gas-Downstream-Training.pptx316895207-SAP-Oil-and-Gas-Downstream-Training.pptx
316895207-SAP-Oil-and-Gas-Downstream-Training.pptx
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
 
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
 
How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
 
Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
 
Malibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed RoundMalibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed Round
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
 

Featured

Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
SpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Lily Ray
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
Rajiv Jayarajah, MAppComm, ACC
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Christy Abraham Joy
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
Vit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
MindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
GetSmarter
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
Alireza Esmikhani
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
Project for Public Spaces & National Center for Biking and Walking
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
DevGAMM Conference
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
Erica Santiago
 
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellGood Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Saba Software
 

Featured (20)

Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
 
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellGood Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
 

Characters encoding and how to handle it in Ruby

  • 1. Characters encoding and how to handle it in Ruby Paweł Cyło Rzeszów Ruby User Group, 30.11.2016
  • 2. Characters encoding and how to handle it in Ruby • What is a string? (no, it is not what Google Image search suggests) • ASCII and ASCII-based 8-bit code pages • Unicode! • Working with characters encoding in Ruby
  • 3. What is a String? 01001000 01100101 01101100 01101100 01101111 00100000 01010111 01101111 01110010 01101100 01100100 "Hello World” Data + Encoding There Ain't No Such Thing As Plain Text
  • 4. ASCII • Initially designed to work with telegraphs - first version released in 1963 • 7 bits used in every byte • 95 human readable characters (32-126) + 33 characters in control codes block (0-31 and 127)
  • 6. ASCII-based 8-bit encodings • “Hey, look! There is one bit not being used in ASCII, lets get it to work!” • First 128 characters == ASCII • Total freedom for last 128 characters • Unable to handle all existing characters in the world • Problematic languages mixing in single string Windows-1252, ISO-8859-1, etc...
  • 7. The first Unicode version: UCS-2 • Fixed two bytes per character (~65k characters maximum) • Bizarre Byte Order Mark (BOM) character at the beginning (FE FF / FF FE), that was not even always present • “Look at all these wasted bytes storing zeros! Unicode sucks…” “Hello": 00 48 00 65 00 6C 00 6C 00 6F ...but it could also be: 48 00 65 00 6C 00 6C 00 6F 00
  • 8. UTF-8 • Variable-width encoding! • Code points from 0-127 are stored in a single byte • Code points 128 and above are stored using 2, 3… in fact, up to 6 bytes • Code point - every letter is assigned a magic number by the Unicode consortium which is written like this: U+0639 • Currently supports 128 237 unique characters (and still growing) • BOM (EF BB BF) is optional and only shows that the encoding is UTF- 8
  • 9. UTF-8 • Full backward compatibility with ASCII - code points U+0000 to U+007F • Other variants: • UTF-32, a 32-bit, fixed-width encoding • UTF-16, a 16-bit, variable-width encoding “Hello": 48 65 6C 6C 6F
  • 10. Strings in Ruby • Unicode (UTF-8) by default • String literals allow direct inserting byte (“xXX“) or UTF-8 code point (“uXXXX“) • All strings are encoding aware
  • 11. Strings in Ruby Is it UTF-8? Not really, the x99 byte is not a valid unicode code point, but it is available in some ASCII-based encodings. For example in Windows-1252 code page it stands for the “™”.
  • 12. Strings in Ruby Pure ASCII. But it can get more complex:
  • 13. Ruby String methods to work with encodings: • bytes - shows you the bytes that make up a string • encoding - returns the current encoding of a string • encode - translates a string to another encoding (converting characters to their equivalent in the new encoding) • force_encoding - shows you what those bytes would look like interpreted by a different encoding
  • 15. They can be replaced with predefined default character Some characters may not be available in other encodings
  • 16. For those who want more! • http://www.justinweiss.com/articles/3-steps-to-fix-encoding-problems-in-ruby/ • http://www.justinweiss.com/articles/how-to-get-from-theyre-to-theyre/ • http://www.joelonsoftware.com/articles/Unicode.html • http://kunststube.net/encoding/ • https://betterexplained.com/articles/unicode/ • http://blog.gatunka.com/2014/04/25/character-encodings-for-modern-programmers/ • https://ruby-doc.org/core-2.3.3/Encoding.html • http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/ • http://nuclearsquid.com/writings/ruby-1-9-encodings/

Editor's Notes

  1. 7 which made your computer beep and 12 which caused the current page of paper to go flying out of the printer and a new one to be fed in
  2. ASCII chart from a 1972 printer manual