Python2’s unicode problem
A primer for the Python3ista
Intro
● This is an entry level talk on a complex problem
● It’s aimed at giving you a peek at the problem...
● ...so you’ll start to have a conceptual understanding
– But solving the problem is another talk
– In fact, having a complete understanding of the problem is another talk
New job, old tech
● So you learned on Python3...
● But your new job requires maintaining Python2...
● What’s this unicode problem everyone’s talking about?
The unicode problem
🎵Oh you wanna know what the big deal is?
I can tell you that
Oh, I can tell you that 🎶
But first, let's learn about PHP
Because every good explanation of a problem begins with an example in PHP.
In PHP, you can define a variable to hold a string. That looks like this:
$a = "1";
and then you can define a second variable to hold an int. Which looks like this:
$b = 1;
and then you can compare those...
if ($a == $b) {
    echo "Equal\n";
}
which outputs this:
$ php -r '$a = "1"; $b = 1; if ($a == $b) { echo "Equal\n"; }'
Equal
Huh
But first, let's learn about PHP
In PHP, you can define a variable to hold a string:
$a = "1";
and then you can define a second variable to hold another string:
$b = "1.0";
And then you can compare those:
if ($a == $b) {
    echo "Also equal\n";
}
which does just what you expect...:
$ php -r '$a = "1"; $b = "1.0"; if ($a == $b) { echo "Also equal\n"; }'
Also equal
Okay, let’s learn about Python3
In Python3, you can define a variable to hold a string (Not idiomatic... bear with me):
a = str("1")
And you can define another variable to hold an int:
b = int(1)
and then you can compare them:
print("Equal") if a == b else print("Unequal")
which shows that Python has chosen a distinctly different path than PHP:
$ python3 -c 'a = str("1"); b = int(1); print("Equal") if a == b else print("Unequal")'
Unequal
Choices
Different languages make different choices
Choices: PHP
● PHP imagines a world where all input is text
● The language should turn text into types the programmer expects
● PHP coerces strings to types before comparing them
Just imagine that PHP is doing this:
if ((float) "1" == (float) "1.0") {
    echo("Equal");
}
Choices: PHP
● But what if you want to compare strings?
Possible...
if ("1" == "1.0") {
    echo("Equal");
}
...with a different operator:
if ("1" === "1.0") {
    echo("Equal");
}
Not always convenient or intuitive to seasoned programmers...
Choices: Python
● Variables are strongly typed
● The language forces the programmer to match up the types
Choices: Python
You could convert:
if int("1") == 1:
    print("Equal")
if "1" == str(1):
    print("Equal")
You could define a different way to compare:
a = "1"
b = [("1", "digit"), ("one", "word")]
for entry in b:
    if a == entry[0]:
        print(entry[1])
Choices: Python
● Variables are strongly typed
● The language forces the programmer to match up the types
● The power is in your hands
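Here is a minimal sketch of what "matching up the types" looks like in Python3. The variable names are only for illustration. The point is that a str and an int quietly compare unequal, mixing them in an operation is a TypeError, and only an explicit conversion makes them meet.

```python
a = "1"   # text
b = 1     # int

# No coercion: a str and an int never compare equal.
print(a == b)        # False

# The programmer chooses the direction of the conversion.
print(int(a) == b)   # True
print(a == str(b))   # True

# Mixing the types in an operation fails loudly instead of guessing.
try:
    a + b
except TypeError as exc:
    print(f"TypeError: {exc}")
```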
Let’s talk about Python3 bytes
In Python3, you can define a text string (an immutable sequence of human readable characters):
a = str("ñ")
And you can define a byte string (an immutable sequence of bytes):
b = bytes(b"\xc3\xb1")
And when you attempt to compare those...
print("Equal") if a == b else print("Unequal")
...they continue to do what you expect:
$ python3 -c 'a = str("ñ"); b = bytes(b"\xc3\xb1"); print("Equal") if a == b else print("Unequal")'
Unequal
Let’s talk about Python3 bytes
If you, the programmer, decide that you want to convert and compare, you can do that:
a = str("ñ")
b = bytes(b"\xc3\xb1").decode("latin1")
if a == b:
    print(f"{a} == {b}: Equal")
else:
    print(f"{a} == {b}: Unequal")
# OUTPUT: ñ == Ã±: Unequal
Let’s talk about Python3 bytes
With the way that you choose to convert having a hand in the results:
a = str("ñ")
b = bytes(b"\xc3\xb1").decode("utf-8")
if a == b:
    print(f"{a} == {b}: Equal")
else:
    print(f"{a} == {b}: Unequal")
# OUTPUT: ñ == ñ: Equal
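To see why the choice of codec matters, here is a small sketch (the variable names are mine, not from the talk). It decodes the same two bytes with both codecs, then goes the other way and encodes the same character with each.

```python
raw = b"\xc3\xb1"

as_utf8 = raw.decode("utf-8")      # two bytes -> one character
as_latin1 = raw.decode("latin-1")  # two bytes -> two characters

print(len(as_utf8), len(as_latin1))   # 1 2
print(as_utf8 == "ñ")                 # True
print(as_latin1 == "ñ")               # False

# Encoding is just as codec-dependent in the other direction.
print("ñ".encode("utf-8"))    # b'\xc3\xb1'
print("ñ".encode("latin-1"))  # b'\xf1'
```

Neither decode is "wrong" as far as Python is concerned. Only the programmer knows which encoding the bytes were written in.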
Let’s talk about idiomatic Python3
● I’ve been using constructors
– Make clear we’re dealing with different types
● Python3 has string literals...
...for text strings:
a = "z"
...and for bytes strings:
b = b"\xc3\xb1"
These are the same as using the constructor:
a = "ñ"
b = b"\xc3\xb1"
if a == b.decode("utf-8"):
    print(f"{a} == {b}: Equal")
else:
    print(f"{a} == {b}: Unequal")
# ñ == b'\xc3\xb1': Equal
Let’s talk about idiomatic Python3
● Python3 has syntactic sugar for byte strings if they only contain characters present in ASCII.
This is the sugar:
a = "z"
b = b"z"
Let's talk about idiomatic Python3
But this is just sugar. They are still different types which compare unequal:
a = "z"
b = b"z"
if a == b:
    print("Equal")
else:
    print("Unequal")
# Unequal
Let's talk about idiomatic Python3
Unless you explicitly convert them:
a = "z"
b = b"z"
if a == b.decode("utf-8"):
    print("Equal")
else:
    print("Unequal")
# Equal
Let's talk about idiomatic Python3
Warning: they may still compare unequal when you decode...
a = "z"
b = b"z"
if a == b.decode("ebcdic-cp-ch"):
    print("Equal")
else:
    print("Unequal")
# Unequal
● ...because b"z" is actually 0x7a
● and the encoding determines which human character that maps to
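A quick sketch of the same idea, spelled out byte by byte. It assumes that "ebcdic-cp-ch" resolves to Python's cp500 codec, which the lookup on the first line checks. The variable names are illustrative.

```python
import codecs

print(codecs.lookup("ebcdic-cp-ch").name)   # cp500

raw = b"z"
print(hex(raw[0]))   # 0x7a

ascii_char = raw.decode("ascii")
ebcdic_char = raw.decode("cp500")

print(ascii_char == "z")    # True
print(ebcdic_char == "z")   # False

# Going the other way, "z" lands on a different byte in EBCDIC.
ebcdic_bytes = "z".encode("cp500")
print(ebcdic_bytes == raw)                  # False
print(ebcdic_bytes.decode("cp500") == "z")  # True
```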
Let’s talk about Python2!
(finally)
Let’s talk about Python2!
● Python2, like PHP, will coerce bytes type to text type (Ouch)
– There are many circumstances when it will coerce
Let’s talk about Python2
● String formatting with %
a = b"test %s" % u"string"  # u"test string"
b = u"test %s" % b"string"  # u"test string"
● Adding strings
a = b"this " + u"too"  # u"this too"
b = u"this " + b"too"  # u"this too"
● Joining strings
a = u"".join([b"yep", b"pers"])  # u"yeppers"
b = b"".join([u"", b"empty too"])  # u"empty too"
● Comparisons
if u"test" == b"test":
    print("Equal")
# Equal
Let’s talk about Python2!
● So what’s the problem?
● Python2 can only coerce ASCII characters
● Attempting to coerce other characters will fail
Let’s talk about Python2
Most coercions that fail, fail with a traceback...
a = u"test %s" % b"coffee"  # u"test coffee"
b = u"test %s" % b"café"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
Let’s talk about Python2
...but with comparisons, the implicit coercion results in a warning
if b"coffee" == u"coffee":
    print("Equal")
# Equal
if b"café" == u"café":
    print("Equal")
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
Let’s talk about Python2
Depending on your locale settings...
$ echo -n 'hello' > 'み.txt'
$ LC_ALL=pt_BR.utf-8 python2 -c 'print(open(u"\u307f.txt").read())'
hello
...encoding can also traceback
$ LC_ALL=pt_BR.iso88591 python2 -c 'print(open(u"\u307f.txt").read())'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256)
Let’s talk about Python2!
● So what’s the problem?
● Python2 can only coerce ASCII characters
● Attempting to coerce other characters will fail
(Ouch) (Ouch)
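If you only have Python3 handy, the behaviour can be approximated. The sketch below is an emulation, not Python2's real machinery: py2_style_add is a made-up helper that decodes the bytes side as ASCII whenever text meets bytes, which is roughly what Python2 does behind your back.

```python
def py2_style_add(left, right):
    """Roughly mimic Python2: when text meets bytes, decode the bytes as ASCII."""
    if isinstance(left, bytes) and isinstance(right, str):
        left = left.decode("ascii")
    elif isinstance(left, str) and isinstance(right, bytes):
        right = right.decode("ascii")
    return left + right


# Pure ASCII bytes: the hidden decode succeeds and nobody notices.
print(py2_style_add(b"this ", "too"))   # this too

# Non-ASCII bytes: the same hidden decode blows up far from its cause.
try:
    py2_style_add("test ", "café".encode("utf-8"))
except UnicodeDecodeError as exc:
    print(f"UnicodeDecodeError: {exc}")
```

The nasty part is that the code works in testing with ASCII data and only fails once a real user named José shows up.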
What is the underlying difference?
● Python3 is modeled around text and bytes being friendly but non-substitutable types
– Explicit conversions exist but not implicit
– Most APIs take one or the other
● Python2 is modeled around text and bytes being largely substitutable
– Plethora of implicit conversions
– Most APIs accept either one...
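A small Python3 sketch of "most APIs take one or the other". The two standard-library calls here (str.join and re.search) are just examples I picked. Each refuses a text and bytes mix and accepts the explicitly converted version.

```python
import re

# str.join only takes str items...
try:
    "".join([b"yep", "pers"])
except TypeError as exc:
    print(f"TypeError: {exc}")

# ...and re refuses to run a bytes pattern over text.
try:
    re.search(b"caf", "café")
except TypeError as exc:
    print(f"TypeError: {exc}")

# The explicit conversion is always available.
joined = "".join([b"yep".decode("ascii"), "pers"])
match = re.search(b"caf".decode("ascii"), "café")
print(joined, match.group(0))   # yeppers caf
```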
Let’s talk about Liskov
● The Liskov Substitution Principle formulates an essential property of good object design
● Behaviors of child objects cannot modify the behaviors of parent objects
● Allows safely substituting the child for the parent
Let’s talk about Liskov
● Text and byte strings do not satisfy Liskov
(The Python authors knew this. There’s no parent-child relationship between them)
– translate()
– decode()
– encode()
● What does that mean for us?
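The three methods above can be poked at directly. This is a Python3 sketch (Python2's types differ in the details), showing that neither type is a subclass of the other, that encode() and decode() each live on only one of them, and that translate() wants a different kind of table on each side.

```python
# No parent-child relationship in either direction.
print(issubclass(bytes, str), issubclass(str, bytes))      # False False

# encode() belongs to text, decode() belongs to bytes.
print(hasattr(str, "encode"), hasattr(str, "decode"))      # True False
print(hasattr(bytes, "decode"), hasattr(bytes, "encode"))  # True False

# translate() exists on both, but with incompatible table types.
text_table = str.maketrans("z", "s")
byte_table = bytes.maketrans(b"z", b"s")
print(type(text_table).__name__, type(byte_table).__name__)       # dict bytes
print("zip".translate(text_table), b"zip".translate(byte_table))  # sip b'sip'

# Handing the text table to the bytes method is a type error.
try:
    b"zip".translate(text_table)
except TypeError as exc:
    print(f"TypeError: {exc}")
```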
Change our expectations
● Would you expect this to work?
– assert [u"one"] + u"two" == [u"one", u"two"]
● How about this?
– assert add(1, u"two") == 3
● So why do we expect this to work?
– assert concat(b"one", u"two") == b"onetwo"
● Two different types; not substitutable: it is up to the caller to decide what to do
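For what it's worth, Python3 treats all three cases the same way. The sketch below uses the + operator as a stand-in for the add() and concat() helpers on the slide, which are not defined in the talk.

```python
attempts = {
    "list + text": lambda: ["one"] + "two",
    "int + text": lambda: 1 + "two",
    "bytes + text": lambda: b"one" + "two",
}

results = {}
for label, attempt in attempts.items():
    try:
        attempt()
        results[label] = "worked"
    except TypeError:
        results[label] = "TypeError"
    print(f"{label} -> {results[label]}")

# The caller decides which type wins, and says so explicitly.
print(b"one" + "two".encode("utf-8"))   # b'onetwo'
print(b"one".decode("utf-8") + "two")   # onetwo
```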
The Unicode Sandwich
● Python2’s text type is called “unicode()”
● Make all human-readable strings unicode type
● Make all binary data str type
– Use a naming convention to identify variables that hold binary data
● Transform to the appropriate type immediately after data enters your application
● Transform to the type an external API expects just before calling the API
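The sandwich can be sketched in a few lines. This is written for Python3 so it runs as-is, and the b_ prefix and the function name are one possible naming convention, not something the talk prescribes. Bytes come in, get decoded once, all the work happens on text, and the result is encoded once on the way out.

```python
def b_shout(b_line, encoding="utf-8"):
    """Take bytes in, hand bytes back; only text is handled in between."""
    # Top slice of bread: decode as soon as the data enters.
    text = b_line.decode(encoding)

    # The filling: everything in here works on text only.
    shouted = text.upper()

    # Bottom slice of bread: encode just before the data leaves.
    return shouted.encode(encoding)


b_input = "café".encode("utf-8")   # pretend this came off a socket or file
b_output = b_shout(b_input)
print(b_output.decode("utf-8"))    # CAFÉ
```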
Writing APIs: General
● Create APIs that accept text if they need human-readable data
● Create APIs that accept bytes if they deal with binary data
● Use a naming convention to identify functions which return bytes
● Avoid making functions which mix text and bytes
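One way those four rules might look in practice. Both functions are hypothetical examples, and the explicit isinstance checks are my own addition to make the "one type only" contract visible. The function that returns bytes carries a b_ prefix.

```python
import hashlib


def collapse_whitespace(title):
    """Text in, text out: human-readable data only."""
    if not isinstance(title, str):
        raise TypeError("collapse_whitespace() wants text, not bytes")
    return " ".join(title.split())


def b_digest(b_data):
    """Bytes in, bytes out: the b_ prefix warns callers about the return type."""
    if not isinstance(b_data, bytes):
        raise TypeError("b_digest() wants bytes, not text")
    return hashlib.sha256(b_data).digest()


print(collapse_whitespace("  unicode   sandwich "))   # unicode sandwich
print(len(b_digest(b"\x00\x01\x02")))                 # 32
```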
Writing APIs: Mixing
So you want to disregard my advice and write functions which allow mixing...
assert concat(b"\xe4\xb8\x80", u"二") == u"一二"
Writing APIs: Mixing
Questions:
● Should this return text or bytes?
● What encoding should it use to convert?
● What should happen when it can’t convert?
assert concat(b"\xe4\xb8\xff", u"二") == u"一二"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte
Writing APIs: Mixing
Answer:
● Return bytes or text: Naming conventions!
● Encoding: Caller is in control
● Handling errors: Caller is in control
assert concat(b"\xe4\xb8\x80", u"二") == u"一二"
def b_concat(str1, str2, encoding="utf-8", errors="strict") -> bytes:
def concat(str1, str2, encoding="utf-8", errors="strict") -> unicode:
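The slides only show the signatures, so here is one possible body for each, written for Python3 (where the text type is str rather than unicode). The _to_text and _to_bytes helpers are my own invention. Treat this as a sketch of the idea, not the talk's implementation.

```python
def _to_text(value, encoding, errors):
    # Hypothetical helper: decode bytes, pass text through untouched.
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    return value


def _to_bytes(value, encoding, errors):
    # Hypothetical helper: encode text, pass bytes through untouched.
    if isinstance(value, str):
        return value.encode(encoding, errors)
    return value


def concat(str1, str2, encoding="utf-8", errors="strict"):
    """Always returns text."""
    return _to_text(str1, encoding, errors) + _to_text(str2, encoding, errors)


def b_concat(str1, str2, encoding="utf-8", errors="strict"):
    """Always returns bytes."""
    return _to_bytes(str1, encoding, errors) + _to_bytes(str2, encoding, errors)


print(concat(b"\xe4\xb8\x80", "二"))     # 一二
print(b_concat(b"\xe4\xb8\x80", "二"))   # b'\xe4\xb8\x80\xe4\xba\x8c'

# The caller picks what happens to bad input: an exception by default...
try:
    concat(b"\xe4\xb8\xff", "二")
except UnicodeDecodeError as exc:
    print(f"UnicodeDecodeError: {exc}")

# ...or replacement characters when asked for.
print(concat(b"\xe4\xb8\xff", "二", errors="replace"))
```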
The danger of mixed APIs
● Now that you know how to write mixed APIs, a reminder not to do it.
● Mixed APIs encourage sloppy programming
● Instead of understanding the types you are using and controlling them, you get used to throwing any type at it and getting useful output.
● Don’t do that.
Exceptions
● Sometimes you’ll have an API that is type-less.
● Like repr()... give it any type of data and get something sensible.
● What else could be like that?
● Debug logging:
– logging.debug("Not a message, an object")
– logging.debug(ConfigParser(filename))
– logging.debug(b"Above would sensibly log the particulars about the ConfigParser object. This logs the particulars about a bytes object")
Exceptions
● Debug logging
– logging.debug("Not a message, an object")
● 10:00:UTC|u"Not a message, an object"
– logging.debug(pathlib.Path("/etc/passwd"))
● 10:00:UTC|PosixPath('/etc/passwd')
– logging.debug(b"Remember: logging objects")
● 10:00:UTC|b"Remember: logging objects"
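A sketch of such a type-less debug logger in Python3. The logger name, the in-memory stream, and the debug_object helper are all illustrative, and the output has no timestamp prefix because the default formatter is used. The trick is simply to log repr() of whatever arrives via the %r format.

```python
import io
import logging
import pathlib

stream = io.StringIO()   # capture the log in memory so it can be inspected
logger = logging.getLogger("typeless_demo")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False


def debug_object(obj):
    """Log the repr() of any object; no type is special-cased."""
    logger.debug("%r", obj)


debug_object("Not a message, an object")
debug_object(pathlib.PurePosixPath("/etc/passwd"))
debug_object(b"Remember: logging objects")

print(stream.getvalue(), end="")
# 'Not a message, an object'
# PurePosixPath('/etc/passwd')
# b'Remember: logging objects'
```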
Finis
● Thanks to Gary Bernhardt of Destroy All Software which inspired the format of this talk
● https://www.destroyallsoftware.com/talks/wat
● Kumar McMillan’s Pycon talk on unicode in Python2; old but good introduction to the solution
● http://farmdev.com/talks/unicode/
● I’m Toshio Kuratomi (@abadger1999, @abadger, and <abadger gmail>)
● Hope you had fun!


GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 

Python2 unicode-pt1

  • 1. Python2’s unicode problem A primer for the Python3ista
  • 3. Intro ● This is an entry level talk on a complex problem
  • 4. Intro ● This is an entry level talk on a complex problem ● It’s aimed at giving you a peek at the problem...
  • 5. Intro ● This is an entry level talk on a complex problem ● It’s aimed at giving you a peek at the problem ● ...so you’ll start to have a conceptual understanding
  • 6. Intro ● This is an entry level talk on a complex problem ● It’s aimed at giving you a peek at the problem ● ...so you’ll start to have a conceptual understanding – But solving the problem is another talk
  • 7. Intro ● This is an entry level talk on a complex problem ● It’s aimed at giving you a peek at the problem ● ...so you’ll start to have a conceptual understanding – But solving the problem is another talk – In fact, having a complete understanding of the problem is another talk
  • 8. New job, old tech ● So you learned on Python3….
  • 9. New job, old tech ● So you learned on Python3…. ● But your new job requires maintaining Python2...
  • 10. New job, old tech ● So you learned on Python3…. ● But your new job requires maintaining Python2... ● What’s this unicode problem everyone’s talking about?
  • 11. The unicode problem 🎵Oh you wanna know what the big deal is?
  • 12. The unicode problem 🎵Oh you wanna know what the big deal is? I can tell you that
  • 13. The unicode problem 🎵Oh you wanna know what the big deal is? I can tell you that Oh, I can tell you that 🎶
  • 14. But first, let's learn about PHP
  • 15. But first, let's learn about PHP Because every good explanation of a problem begins with an example in PHP.
  • 16. But first, let's learn about PHP In PHP, you can define a variable to hold a string
  • 17. But first, let's learn about PHP That looks like this: $a = "1";
  • 18. But first, let's learn about PHP and then you can define a second variable to hold an int $a = "1";
  • 19. But first, let's learn about PHP Which looks like this: $a = "1"; $b = 1;
  • 20. But first, let's learn about PHP and then you can compare those... $a = "1"; $b = 1; if ($a == $b) { echo "Equal\n"; }
  • 21. But first, let's learn about PHP which outputs this: $ php -r '$a = "1"; $b = 1; if ($a == $b) { echo "Equal\n"; }' Equal
  • 22. But first, let's learn about PHP Huh
  • 23. But first, let's learn about PHP In PHP, you can define a variable to hold a string: $a = "1";
  • 24. But first, let's learn about PHP and then you can define a second variable to hold another string: $a = "1"; $b = "1.0";
  • 25. But first, let's learn about PHP And then you can compare those: $a = "1"; $b = "1.0"; if ($a == $b) { echo "Equal\n"; }
  • 26. But first, let's learn about PHP which does just what you expect...: $ php -r '$a = "1"; $b = "1.0"; if ($a == $b) { echo "Also equal\n"; }' Also equal
  • 27. Okay, let’s learn about Python3
  • 28. Okay, let’s learn about Python3 In Python3, you can define a variable to hold a string (Not idiomatic... bear with me): a = str("1");
  • 29. Okay, let’s learn about Python3 And you can define another variable to hold an int: a = str("1"); b = int(1);
  • 30. Okay, let’s learn about Python3 and then you can compare them: a = str("1"); b = int(1); print("Equal") if a == b else print("Unequal")
  • 31. Okay, let’s learn about Python3 which shows that Python has chosen a distinctly different path than PHP: python3 -c 'a = str("1"); b = int(1) ; print("Equal") if a == b else print("Unequal")' Unequal
  • 32. Choices Different languages make different choices
  • 34. Choices: PHP ● PHP imagines a world where all input is text
  • 35. Choices: PHP ● PHP imagines a world where all input is text ● The language should turn text into types the programmer expects
  • 36. Choices: PHP ● PHP imagines a world where all input is text ● The language should turn text into types the programmer expects ● PHP coerces strings to types before comparing them
  • 37. Choices: PHP Just imagine that PHP is doing this: if (float("1") == float("1.0")) { echo("Equal"); }
  • 38. Choices: PHP ● PHP imagines a world where all input is text ● The language should turn text into types the programmer expects ● PHP coerces strings to types before comparing them ● But what if you want to compare strings?
  • 39. Choices: PHP Possible... if ("1" == "1.0") { echo("Equal"); }
  • 40. Choices: PHP …with a different operator: if ("1" === "1.0") { echo("Equal"); }
  • 41. Choices: PHP Not always convenient or intuitive to seasoned programmers... if ("1" === "1.0") { echo("Equal"); }
  • 44. Choices: Python ● Variables are strongly typed ● The language forces the programmer to match up the types
  • 45. Choices: Python You could convert: if int("1") == 1: print("Equal") if "1" == str(1): print("Equal")
  • 46. Choices: Python You could define a different way to compare: a = "1" b = [("1", "digit"), ("one", "word")] for entry in b: if a == entry[0]: print(entry[1])
  • 47. Choices: Python ● Variables are strongly typed ● The language forces the programmer to match up the types ● The power is in your hands
  • 48. Let’s talk about Python3 bytes
  • 49. Let’s talk about Python3 bytes In Python3, you can define a text string (an immutable sequence of human readable characters): a = str("ñ")
  • 50. Let’s talk about Python3 bytes And you can define a byte string (an immutable sequence of bytes): a = str("ñ") b = bytes(b"\xc3\xb1")
  • 51. Let’s talk about Python3 bytes And when you attempt to compare those... a = str("ñ") b = bytes(b"\xc3\xb1") print("Equal") if a == b else print("Unequal")
  • 52. Let’s talk about Python3 bytes ...they continue to do what you expect: python3 -c 'a = str("ñ"); b = bytes(b"\xc3\xb1") ; print("Equal") if a == b else print("Unequal")' Unequal
  • 53. Let’s talk about Python3 bytes If you, the programmer, decide that you want to convert and compare, you can do that: a = str("ñ") b = bytes(b"\xc3\xb1").decode("latin1") if a == b: print(f"{a} == {b}: Equal") else: print(f"{a} == {b}: Unequal") # OUTPUT: ñ == Ã±: Unequal
  • 54. Let’s talk about Python3 bytes With the way that you choose to convert having a hand in the results: a = str("ñ") b = bytes(b"\xc3\xb1").decode("utf-8") if a == b: print(f"{a} == {b}: Equal") else: print(f"{a} == {b}: Unequal") # OUTPUT: ñ == ñ: Equal
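The comparison and the two decodings from these slides can be tried together in one short Python 3 script. This is a minimal sketch; the variable names are chosen here for illustration and do not come from the talk.

```python
text = "ñ"
data = b"\xc3\xb1"  # the two bytes that UTF-8 uses to encode "ñ"

# In Python 3 a str and a bytes object never compare equal.
types_equal = (text == data)

# The same two bytes give different text depending on the codec chosen.
as_latin1 = data.decode("latin1")  # one character per byte
as_utf8 = data.decode("utf-8")     # both bytes form a single character

print(types_equal)                        # False
print(repr(as_latin1), text == as_latin1)
print(repr(as_utf8), text == as_utf8)
```

The bytes never change; only the caller's choice of codec decides whether the decoded text matches.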
  • 55. Let’s talk about idiomatic Python3
  • 56. Let’s talk about idiomatic Python3 ● I’ve been using constructors
  • 57. Let’s talk about idiomatic Python3 ● I’ve been using constructors – Make clear we’re dealing with different types
  • 58. Let’s talk about idiomatic Python3 ● I’ve been using constructors – Make clear we’re dealing with different types ● Python3 has string literals...
  • 59. Let's talk about idiomatic Python3 ...for text strings: a = "z"
  • 60. Let's talk about idiomatic Python3 ...and for byte strings: a = "z" b = b"\xc3\xb1"
  • 61. Let's talk about idiomatic Python3 These are the same as using the constructor: a = "ñ" b = b"\xc3\xb1" if a == b.decode("utf-8"): print(f"{a} == {b}: Equal") else: print(f"{a} == {b}: Unequal") # ñ == b'\xc3\xb1': Equal
  • 62. Let’s talk about idiomatic Python3 ● I’ve been using constructors – Make clear we’re dealing with different types ● Python3 has string literals… ● Python3 has syntactic sugar for byte strings if they only contain characters present in ASCII.
  • 63. Let's talk about idiomatic Python3 This is the sugar: a = "z" b = b"z"
  • 64. Let's talk about idiomatic Python3 But this is just sugar. They are still different types which compare unequal: a = "z" b = b"z" if a == b: print("Equal") else: print("Unequal") # Unequal
  • 65. Let's talk about idiomatic Python3 Unless you explicitly convert them: a = "z" b = b"z" if a == b.decode("utf-8"): print("Equal") else: print("Unequal") # Equal
  • 66. Let's talk about idiomatic Python3 Warning: they may still compare unequal when you decode... a = "z" b = b"z" if a == b.decode("ebcdic-cp-ch"): print("Equal") else: print("Unequal") # Unequal
  • 67. Let's talk about idiomatic Python3 ● ...because b"z" is actually 0x7a ● and the encoding determines which human-readable character that maps to a = "z" b = b"z" if a == b.decode("ebcdic-cp-ch"): print("Equal") else: print("Unequal") # Unequal
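To see why the EBCDIC decode compares unequal, it may help to look at the byte value directly. The sketch below is Python 3 and uses the same codec name as the slide; the variable names are illustrative only.

```python
data = b"z"

# The literal b"z" is sugar for the single byte 0x7a.
byte_value = data[0]

# ASCII maps 0x7a to "z"; the EBCDIC code page maps it to another character.
as_ascii = data.decode("ascii")
as_ebcdic = data.decode("ebcdic-cp-ch")

# Going the other way, "z" is stored as a different byte in EBCDIC.
z_in_ebcdic = "z".encode("ebcdic-cp-ch")

print(hex(byte_value), repr(as_ascii), repr(as_ebcdic), z_in_ebcdic)
```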
  • 68. Let’s talk about Python2! (finally)
  • 69. Let’s talk about Python2! ● Python2 will coerce the bytes type to the text type
  • 70. Let’s talk about Python2! ● Python2 will coerce the bytes type to the text type (Ouch)
  • 71. Let’s talk about Python2! ● Python2 will coerce the bytes type to the text type – There are many circumstances when it will coerce
  • 72. Let’s talk about Python2 ● String formatting with % a = b"test %s" % u"string" # u"test string" b = u"test %s" % b"string" # u"test string"
  • 73. Let’s talk about Python2 ● Adding strings a = b"this" + u"too" # u"this too" b = u"this" + b"too" # u"this too"
  • 74. Let’s talk about Python2 ● Joining strings a = u"".join([b"yep", b"pers"]) # u"yeppers" b = b"".join([u"", b"empty too"]) # u"empty too"
  • 75. Let’s talk about Python2 ● Comparisons if u"test" == b"test": print("Equal") # Equal
  • 76. Let’s talk about Python2! ● So what’s the problem?
  • 77. Let’s talk about Python2! ● So what’s the problem? ● Python2 can only coerce ASCII characters
  • 78. Let’s talk about Python2! ● So what’s the problem? ● Python2 can only coerce ASCII characters ● Attempting to coerce other characters will fail
  • 79. Let’s talk about Python2 Most coercions that fail, fail with a traceback... a = u"test %s" % b"coffee" # u"test coffee" b = u"test %s" % b"café" Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
  • 80. Let’s talk about Python2 … but with comparisons, the implicit coercion results in a warning if b"coffee" == u"coffee": print("Equal") # Equal if b"café" == u"café": print("Equal") __main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  • 81. Let’s talk about Python2 Depending on your locale settings… echo -n 'hello' > 'み.txt' LC_ALL=pt_BR.utf-8 python2 -c 'print(open(u"\u307f.txt").read())' hello
  • 82. Let’s talk about Python2 ...encoding can also traceback echo -n 'hello' > 'み.txt' LC_ALL=pt_BR.iso88591 python2 -c 'print(open(u"\u307f.txt").read())' Traceback (most recent call last): File "<string>", line 1, in <module> UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256)
  • 83. Let’s talk about Python2! ● So what’s the problem? ● Python2 can only coerce ASCII characters ● Attempting to coerce other characters will fail (Ouch) (Ouch)
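Python2 itself is not needed to see why the coercion fails. The sketch below is Python 3 code that imitates the implicit step by decoding with the ASCII codec, which was Python2's default encoding. The helper `py2_style_coerce` is hypothetical and written only for this illustration; it is not a Python2 API.

```python
def py2_style_coerce(value):
    """Hypothetical helper: imitate Python2's implicit bytes-to-text step."""
    if isinstance(value, bytes):
        # Python2 decoded with its default encoding, which was ASCII.
        return value.decode("ascii")
    return value


# Pure ASCII bytes coerce silently, so the bug stays hidden in testing.
ok = "test %s" % py2_style_coerce(b"coffee")

# Non-ASCII bytes blow up at the point of coercion.
try:
    py2_style_coerce("café".encode("utf-8"))
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(ok)
print(error_message)
```

The failure depends on the data rather than on the code path, which is why such bugs tend to surface only once non-ASCII input arrives.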
  • 84. What is the underlying difference?
  • 85. What is the underlying difference? ● Python3 is modeled around text and bytes being friendly but non-substitutable types
  • 86. What is the underlying difference? ● Python3 is modeled around text and bytes being friendly but non-substitutable types – Explicit conversions exist but not implicit
  • 87. What is the underlying difference? ● Python3 is modeled around text and bytes being friendly but non-substitutable types – Explicit conversions exist but not implicit – Most APIs take one or the other
  • 88. What is the underlying difference? ● Python3 is modeled around text and bytes being friendly but non-substitutable types – Explicit conversions exist but not implicit – Most APIs take one or the other ● Python2 is modeled around text and bytes being largely substitutable
  • 89. What is the underlying difference? ● Python3 is modeled around text and bytes being friendly but non-substitutable types – Explicit conversions exist but not implicit – Most APIs take one or the other ● Python2 is modeled around text and bytes being largely substitutable – Plethora of implicit conversions
  • 90. What is the underlying difference? ● Python3 is modeled around text and bytes being friendly but non-substitutable types – Explicit conversions exist but not implicit – Most APIs take one or the other ● Python2 is modeled around text and bytes being largely substitutable – Plethora of implicit conversions – Most APIs accept either one….
  • 92. Let’s talk about Liskov ● The Liskov Substitution Principle formulates a property of good object design
  • 93. Let’s talk about Liskov ● The Liskov Substitution Principle formulates an essential property of good object design
  • 94. Let’s talk about Liskov ● The Liskov Substitution Principle formulates an essential property of good object design ● Behaviors of child objects cannot modify the behaviors of parent objects
  • 95. Let’s talk about Liskov ● The Liskov Substitution Principle formulates an essential property of good object design ● Behaviors of child objects cannot modify the behaviors of parent objects ● Allows safely substituting the child for the parent
  • 96. Let’s talk about Liskov ● Text and byte strings do not satisfy Liskov
  • 97. Let’s talk about Liskov ● Text and byte strings do not satisfy Liskov (The Python authors knew this. There’s no parent-child relationship between them)
  • 98. Let’s talk about Liskov ● Text and byte strings do not satisfy Liskov – translate() – decode() – encode()
  • 99. Let’s talk about Liskov ● Text and byte strings do not satisfy Liskov – translate() – decode() – encode() ● What does that mean for us?
  • 100. Changing our expectations ● Would you expect this to work? – assert [u"one"] + u"two" == [u"one", u"two"]
  • 101. Change our expectations ● Would you expect this to work? – assert [u"one"] + u"two" == [u"one", u"two"] ● How about this? – assert add(1, u"two") == 3
  • 102. Change our expectations ● Would you expect this to work? – assert [u"one"] + u"two" == [u"one", u"two"] ● How about this? – assert add(1, u"two") == 3 ● So why do we expect this to work? – assert concat(b"one", u"two") == b"onetwo" ● Two different types; not substitutable: it is up to the caller to decide what to do
  • 103. Change our expectations ● Would you expect this to work? – assert [u"one"] + u"two" == [u"one", u"two"] ● How about this? – assert add(1, u"two") == 3 ● So why do we expect this to work? – assert concat(b"one", u"two") == b"onetwo"
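Python 3 treats all of these mismatches the same way: it refuses and leaves the decision to the caller. A small sketch follows, where `try_add` is a helper invented here to capture the error instead of stopping the script.

```python
def try_add(left, right):
    """Return left + right, or the TypeError raised by the attempt."""
    try:
        return left + right
    except TypeError as exc:
        return exc


# A list plus a string is refused...
list_result = try_add(["one"], "two")

# ...and in Python 3, so is bytes plus text.
mixed_result = try_add(b"one", "two")

# The caller decides how to bridge the types, here by encoding the text.
explicit = b"one" + "two".encode("utf-8")

print(type(list_result).__name__, type(mixed_result).__name__, explicit)
```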
  • 104. The Unicode Sandwich ● Python2’s text type is called “unicode()”
  • 105. The Unicode Sandwich ● Python2’s text type is called “unicode()” ● Make all human-readable strings unicode type
  • 106. The Unicode Sandwich ● Python2’s text type is called “unicode()” ● Make all human-readable strings unicode type ● Make all binary data str type
  • 107. The Unicode Sandwich ● Python2’s text type is called “unicode()” ● Make all human-readable strings unicode type ● Make all binary data str type – Use a naming convention to identify variables that hold binary data
  • 108. The Unicode Sandwich ● Python2’s text type is called “unicode()” ● Make all human-readable strings unicode type ● Make all binary data str type – Use a naming convention to identify variables that hold binary data ● Transform to the appropriate type immediately after data enters your application
  • 109. The Unicode Sandwich ● Python2’s text type is called “unicode()” ● Make all human-readable strings unicode type ● Make all binary data str type – Use a naming convention to identify variables that hold binary data ● Transform to the appropriate type immediately after data enters your application ● Transform to the type an external API expects just before calling the API
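The sandwich can be sketched in a few lines: bytes at the edges, text in the middle. The example below is written for Python 3 so it runs on a current interpreter. The function names and the `b_` prefix for variables holding bytes are assumptions made for this illustration, following the naming-convention advice above.

```python
def shout(text):
    """Core logic: works on text only and never sees bytes."""
    return text.upper()


def process(b_input, encoding="utf-8"):
    """Decode at the input edge, work on text, encode at the output edge."""
    text = b_input.decode(encoding)   # bytes -> text right after data enters
    result = shout(text)              # the middle of the sandwich is all text
    return result.encode(encoding)    # text -> bytes just before data leaves


b_raw = "café".encode("utf-8")        # stands in for bytes read from a file
b_out = process(b_raw)
print(b_out.decode("utf-8"))
```

Because `shout()` only ever receives text, it needs no knowledge of encodings; only `process()` does.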
  • 110. Writing APIs: General ● Create APIs that accept text if they need human-readable data
  • 111. Writing APIs: General ● Create APIs that accept text if they need human-readable data ● Create APIs that accept bytes if they deal with binary data
  • 112. Writing APIs: General ● Create APIs that accept text if they need human-readable data ● Create APIs that accept bytes if they deal with binary data ● Use a naming convention to identify functions which return bytes
  • 113. Writing APIs: General ● Create APIs that accept text if they need human-readable data ● Create APIs that accept bytes if they deal with binary data ● Use a naming convention to identify functions which return bytes ● Avoid making functions which mix text and bytes
  • 114. Writing APIs: Mixing So you want to disregard my advice and write functions which allow mixing…. assert concat(b"\xe4\xb8\x80", u"二") == u"一二"
  • 115. Writing APIs: Mixing Questions: ● Should this return text or bytes? assert concat(b"\xe4\xb8\x80", u"二") == u"一二"
  • 116. Writing APIs: Mixing Questions: ● Should this return text or bytes? ● What encoding should it use to convert? assert concat(b"\xe4\xb8\x80", u"二") == u"一二"
  • 117. Writing APIs: Mixing Questions: ● Should this return text or bytes? ● What encoding should it use? ● What should happen when it can’t convert? assert concat(b"\xe4\xb8\xff", u"二") == u"一二" Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte
  • 118. Writing APIs: Mixing Answer: ● Return bytes or text: Naming conventions! assert concat(b"\xe4\xb8\x80", u"二") == u"一二" def b_concat(str1, str2) -> bytes: def concat(str1, str2) -> unicode:
  • 119. Writing APIs: Mixing Answer: ● Return bytes or text: Naming conventions! ● Encoding: Caller is in control assert concat(b"\xe4\xb8\x80", u"二") == u"一二" def b_concat(str1, str2, encoding="utf-8") -> bytes: def concat(str1, str2, encoding="utf-8") -> unicode:
  • 120. Writing APIs: Mixing Answer: ● Return bytes or text: Naming conventions! ● Encoding: Caller is in control ● Handling errors: Caller is in control assert concat(b"\xe4\xb8\x80", u"二") == u"一二" def b_concat(str1, str2, encoding="utf-8", errors="strict") -> bytes: def concat(str1, str2, encoding="utf-8", errors="strict") -> unicode:
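The slides give only the signatures. One possible body for each is sketched below, written for Python 3 (where the text type is `str` rather than `unicode`). The function bodies are an assumption made for illustration, not the author's implementation.

```python
def concat(str1, str2, encoding="utf-8", errors="strict"):
    """Return text, decoding any bytes argument as the caller directs."""
    parts = []
    for part in (str1, str2):
        if isinstance(part, bytes):
            part = part.decode(encoding, errors)
        parts.append(part)
    return "".join(parts)


def b_concat(str1, str2, encoding="utf-8", errors="strict"):
    """Return bytes (note the b_ prefix), encoding any text argument."""
    parts = []
    for part in (str1, str2):
        if isinstance(part, str):
            part = part.encode(encoding, errors)
        parts.append(part)
    return b"".join(parts)


print(concat(b"\xe4\xb8\x80", "二"))
print(b_concat(b"\xe4\xb8\x80", "二"))
# With errors="replace" the caller opts out of the traceback on bad bytes.
print(concat(b"\xe4\xb8\xff", "二", errors="replace"))
```

The name tells the caller which type comes back, and `encoding` and `errors` are passed straight through to `decode()` and `encode()`, so the function itself makes no conversion decisions.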
  • 121. The danger of mixed APIs ● Now that you know how to write mixed APIs, a reminder not to do it. ● Mixed APIs encourage sloppy programming ● Instead of understanding the types you are using and controlling them, you get used to throwing any type at an API and getting useful output. ● Don’t do that.
  • 122. Exceptions ● Sometimes you’ll have an API that is type-less. ● Like repr()… give it any type of data and get something sensible. ● What else could be like that? ● Debug logging: – Logging.debug("Not a message, an object") – Logging.debug(Configparser(filename)) – Logging.debug(b"Above would sensibly log the particulars about the ConfigParser object. This logs the particulars about a bytes object")
  • 125. Exceptions ● Debug logging – Logging.debug("Not a message, an object") ● 10:00:UTC|u"Not a message, an object"|
  • 126. Exceptions ● Debug logging – Logging.debug("Not a message, an object") ● 10:00:UTC|u"Not a message, an object" – Logging.debug(pathlib.Path("/etc/passwd"))
  • 127. Exceptions ● Debug logging – Logging.debug("Not a message, an object") ● 10:00:UTC|u"Not a message, an object" – Logging.debug(pathlib.Path("/etc/passwd")) ● 10:00:UTC|PosixPath('/')
  • 128. Exceptions ● Debug logging – Logging.debug("Not a message, an object") ● 10:00:UTC|u"Not a message, an object" – Logging.debug(pathlib.Path("/etc/passwd")) ● 10:00:UTC|PosixPath('/') – Logging.debug(b"Remember: logging objects")
  • 129. Exceptions ● Debug logging – Logging.debug("Not a message, an object") ● 10:00:UTC|u"Not a message, an object" – Logging.debug(pathlib.Path("/etc/passwd")) ● 10:00:UTC|PosixPath('/') – Logging.debug(b"Remember: logging objects") ● 10:00:UTC|b"Remember: logging objects"
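A type-less debug logger of this kind can be approximated with the standard `logging` module by always formatting the argument with `%r`. The wrapper `debug_object` and the logger name are hypothetical, chosen for this sketch. The output goes to an in-memory stream so it can be inspected, and it carries no timestamp prefix like the one shown on the slides.

```python
import io
import logging

stream = io.StringIO()
logger = logging.getLogger("typeless_debug_demo")  # illustrative name
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False  # keep the demo output out of the root logger


def debug_object(obj):
    """Log the repr() of any object, so text and bytes are both accepted."""
    logger.debug("%r", obj)


debug_object("Not a message, an object")
debug_object(b"Remember: logging objects")

output = stream.getvalue()
print(output, end="")
```

Because the function logs a representation of an object rather than a message, it never has to decide how to decode bytes, and the `b` prefix in the output keeps the type visible to the reader.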
  • 130. Finis ● Thanks to Gary Bernhardt of Destroy All Software, whose talk inspired the format of this one ● https://www.destroyallsoftware.com/talks/wat ● Kumar McMillan’s Pycon talk on unicode in Python2; an old but good introduction to the solution ● http://farmdev.com/talks/unicode/ ● I’m Toshio Kuratomi (@abadger1999, @abadger, and <abadger gmail>) ● Hope you had fun!