SlideShare a Scribd company logo
Unicode
in python
We Cover these now
● Unicode history
● terms clarity (code point,BOM,utf-8,utf-16)
● decoding and encoding in python
● how django handles these?
● helpful python modules to tackle it
Note: BOM is used in utf-16.since, it has multi
bytes character code point
How it came?
Americans came up with (7 bit)ASCII
representation with english only alphabets as a
standard to exchange information.(‘A’ - 65, ’a’ -
97)
Rest of the world came up with their
unaccented english characters ('ä', )in their own
way.(messed up)
What causes unicode born?
To exchange information in all languages, we
got some requirements
● Unique and simple rule was needed
● Adoptable across all machines(windows,ibm,
etc..)
● Efficient storage as much possible
Unicode
Unicode = UCS(universal character set) + bit
representation logic
UCS:
character + code point
(‘a’, 97)
bit representation:
BOM = Big endian (or) Little endian
00 48 00 65 00 6C 00 6C 00 6F (or) 48 00 65 00 6C
utf-8 is famous, because
● multi-byte encoding
● variable width encoding
● upto 4 byte code points are allowed by utf-8
● mostly, No need BOM(8 bits)
● memory efficient
How?
for NON-ASCII bytes,
1st byte is reserved to indicate the no of
bytes the char is using(eg.compression)
decoding
Character to Numeric value(code point)
conversion
● from <type 'str'> to <type 'unicode'>
● it throws maximum “UnicodeDecodeError:”
(samples demo)
encoding
● Numeric value(code point) to Characters
● from <type 'unicode'> to <type 'str'>
● it throws maximum “UnicodeEncodeError:”
(samples demo)
Rules to Remember…
● Decode early, Unicode everywhere, Encode late
● UTF-8 is the best guess for an encoding
● chardet.detect()
==========================
in Python 3 this is solved…
● <type 'str'> is a Unicode object
How django handles?
>>> def to_unicode(
... obj, encoding='utf-8'):
... if isinstance(obj, basestring):
... if not isinstance(obj, unicode):
... obj = unicode(obj, encoding)
... return obj
smart_text(s, encoding='utf-8',
strings_only=False, errors='strict')
force_text(s, encoding='utf-8', strings_only=False,
errors='strict')
smart_bytes(s, encoding='utf-8',
How to set your python default
encoding standard?
import sys
>>>reload(sys)
>>>sys.setdefaultencoding(‘utf-8’)
>>>sys.getdefaultencoding
>>>’utf-8’
(or)
# -*- coding: utf-8 -*-
(tell to python you saved <mod_name.py> in
utf-8)
Related python modules..
● chardet.detect()
● unicodedata
● codecs
Thanks for your time
Post your questions.
samples demo….
screenshot2

More Related Content

What's hot

Java Networking
Java NetworkingJava Networking
Java Networking
Sunil OS
 
Pointer in C++
Pointer in C++Pointer in C++
Pointer in C++
Mauryasuraj98
 
Files in java
Files in javaFiles in java
Chapter 07 inheritance
Chapter 07 inheritanceChapter 07 inheritance
Chapter 07 inheritance
Praveen M Jigajinni
 
Classless subnetting
Classless subnettingClassless subnetting
Classless subnetting
Universidad Tecnica de Ambato
 
Python programming : Files
Python programming : FilesPython programming : Files
Python programming : Files
Emertxe Information Technologies Pvt Ltd
 
Functions and modules in python
Functions and modules in pythonFunctions and modules in python
Functions and modules in python
Karin Lagesen
 
Python programming
Python  programmingPython  programming
Python programming
Ashwin Kumar Ramasamy
 
Iptables the Linux Firewall
Iptables the Linux Firewall Iptables the Linux Firewall
Iptables the Linux Firewall
Syed fawad Gillani
 
Python - Regular Expressions
Python - Regular ExpressionsPython - Regular Expressions
Python - Regular Expressions
Mukesh Tekwani
 
Sets in python
Sets in pythonSets in python
1.python interpreter and interactive mode
1.python interpreter and interactive mode1.python interpreter and interactive mode
1.python interpreter and interactive mode
ManjuA8
 
Data types in python lecture (2)
Data types in python lecture (2)Data types in python lecture (2)
Data types in python lecture (2)
Ali ٍSattar
 
Pointers in c++ by minal
Pointers in c++ by minalPointers in c++ by minal
Pointers in c++ by minal
minal kumar soni
 
Shell Scripting in Linux
Shell Scripting in LinuxShell Scripting in Linux
Shell Scripting in Linux
Anu Chaudhry
 
Advantages of Python Learning | Why Python
Advantages of Python Learning | Why PythonAdvantages of Python Learning | Why Python
Advantages of Python Learning | Why Python
EvoletTechnologiesCo
 
Python Datatypes by SujithKumar
Python Datatypes by SujithKumarPython Datatypes by SujithKumar
Python Datatypes by SujithKumar
Sujith Kumar
 
Modules and packages in python
Modules and packages in pythonModules and packages in python
Modules and packages in python
TMARAGATHAM
 
[OOP - Lec 16,17] Objects as Function Parameter and ReturnType
[OOP - Lec 16,17] Objects as Function Parameter and ReturnType[OOP - Lec 16,17] Objects as Function Parameter and ReturnType
[OOP - Lec 16,17] Objects as Function Parameter and ReturnType
Muhammad Hammad Waseem
 
Command line arguments.21
Command line arguments.21Command line arguments.21
Command line arguments.21
myrajendra
 

What's hot (20)

Java Networking
Java NetworkingJava Networking
Java Networking
 
Pointer in C++
Pointer in C++Pointer in C++
Pointer in C++
 
Files in java
Files in javaFiles in java
Files in java
 
Chapter 07 inheritance
Chapter 07 inheritanceChapter 07 inheritance
Chapter 07 inheritance
 
Classless subnetting
Classless subnettingClassless subnetting
Classless subnetting
 
Python programming : Files
Python programming : FilesPython programming : Files
Python programming : Files
 
Functions and modules in python
Functions and modules in pythonFunctions and modules in python
Functions and modules in python
 
Python programming
Python  programmingPython  programming
Python programming
 
Iptables the Linux Firewall
Iptables the Linux Firewall Iptables the Linux Firewall
Iptables the Linux Firewall
 
Python - Regular Expressions
Python - Regular ExpressionsPython - Regular Expressions
Python - Regular Expressions
 
Sets in python
Sets in pythonSets in python
Sets in python
 
1.python interpreter and interactive mode
1.python interpreter and interactive mode1.python interpreter and interactive mode
1.python interpreter and interactive mode
 
Data types in python lecture (2)
Data types in python lecture (2)Data types in python lecture (2)
Data types in python lecture (2)
 
Pointers in c++ by minal
Pointers in c++ by minalPointers in c++ by minal
Pointers in c++ by minal
 
Shell Scripting in Linux
Shell Scripting in LinuxShell Scripting in Linux
Shell Scripting in Linux
 
Advantages of Python Learning | Why Python
Advantages of Python Learning | Why PythonAdvantages of Python Learning | Why Python
Advantages of Python Learning | Why Python
 
Python Datatypes by SujithKumar
Python Datatypes by SujithKumarPython Datatypes by SujithKumar
Python Datatypes by SujithKumar
 
Modules and packages in python
Modules and packages in pythonModules and packages in python
Modules and packages in python
 
[OOP - Lec 16,17] Objects as Function Parameter and ReturnType
[OOP - Lec 16,17] Objects as Function Parameter and ReturnType[OOP - Lec 16,17] Objects as Function Parameter and ReturnType
[OOP - Lec 16,17] Objects as Function Parameter and ReturnType
 
Command line arguments.21
Command line arguments.21Command line arguments.21
Command line arguments.21
 

Similar to Unicode basics in python

Except UnicodeError: battling Unicode demons in Python
Except UnicodeError: battling Unicode demons in PythonExcept UnicodeError: battling Unicode demons in Python
Except UnicodeError: battling Unicode demons in Python
Aram Dulyan
 
Notes on a Standard: Unicode
Notes on a Standard: UnicodeNotes on a Standard: Unicode
Notes on a Standard: Unicode
Elena-Oana Tabaranu
 
UTF-8: The Secret of Character Encoding
UTF-8: The Secret of Character EncodingUTF-8: The Secret of Character Encoding
UTF-8: The Secret of Character Encoding
Bert Pattyn
 
Comprehasive Exam - IT
Comprehasive Exam - ITComprehasive Exam - IT
Comprehasive Exam - IT
guest6ddfb98
 
Coa presentation1
Coa presentation1Coa presentation1
Coa presentation1
rickypatel151
 
micro:bit and JavaScript
micro:bit and JavaScriptmicro:bit and JavaScript
micro:bit and JavaScript
Kenneth Geisshirt
 
Unicode Encoding Forms
Unicode Encoding FormsUnicode Encoding Forms
Unicode Encoding Forms
Mehdi Hasan
 
hashdays 2011: Ange Albertini - Such a weird processor - messing with x86 opc...
hashdays 2011: Ange Albertini - Such a weird processor - messing with x86 opc...hashdays 2011: Ange Albertini - Such a weird processor - messing with x86 opc...
hashdays 2011: Ange Albertini - Such a weird processor - messing with x86 opc...
Area41
 
Lecture_ASCII and Unicode.ppt
Lecture_ASCII and Unicode.pptLecture_ASCII and Unicode.ppt
Lecture_ASCII and Unicode.ppt
Alula Tafere
 
The str/bytes nightmare before python2 EOL
The str/bytes nightmare before python2 EOLThe str/bytes nightmare before python2 EOL
The str/bytes nightmare before python2 EOL
Kir Chou
 
Compgenerations pented
Compgenerations pentedCompgenerations pented
Compgenerations pented
Sajib
 
WhatIsComputer.ppt
WhatIsComputer.pptWhatIsComputer.ppt
WhatIsComputer.ppt
SimonLoh8
 
Using unicode with php
Using unicode with phpUsing unicode with php
Using unicode with php
Elizabeth Smith
 
03 UNIT-2.pdf
03 UNIT-2.pdf03 UNIT-2.pdf
03 UNIT-2.pdf
31ABINESHWARANG
 
Unicode Fundamentals
Unicode Fundamentals Unicode Fundamentals
Unicode Fundamentals
SamiHsDU
 
Arduino by yogesh t s'
Arduino by yogesh t s'Arduino by yogesh t s'
Arduino by yogesh t s'
tsyogesh46
 
Abap slide class4 unicode-plusfiles
Abap slide class4 unicode-plusfilesAbap slide class4 unicode-plusfiles
Abap slide class4 unicode-plusfiles
Milind Patil
 
Using unicode with php
Using unicode with phpUsing unicode with php
Using unicode with php
Elizabeth Smith
 
Embedded C - Day 1
Embedded C - Day 1Embedded C - Day 1
It tools and buisness system.docx
It tools and buisness system.docxIt tools and buisness system.docx
It tools and buisness system.docx
hinditutorialspoint
 

Similar to Unicode basics in python (20)

Except UnicodeError: battling Unicode demons in Python
Except UnicodeError: battling Unicode demons in PythonExcept UnicodeError: battling Unicode demons in Python
Except UnicodeError: battling Unicode demons in Python
 
Notes on a Standard: Unicode
Notes on a Standard: UnicodeNotes on a Standard: Unicode
Notes on a Standard: Unicode
 
UTF-8: The Secret of Character Encoding
UTF-8: The Secret of Character EncodingUTF-8: The Secret of Character Encoding
UTF-8: The Secret of Character Encoding
 
Comprehasive Exam - IT
Comprehasive Exam - ITComprehasive Exam - IT
Comprehasive Exam - IT
 
Coa presentation1
Coa presentation1Coa presentation1
Coa presentation1
 
micro:bit and JavaScript
micro:bit and JavaScriptmicro:bit and JavaScript
micro:bit and JavaScript
 
Unicode Encoding Forms
Unicode Encoding FormsUnicode Encoding Forms
Unicode Encoding Forms
 
hashdays 2011: Ange Albertini - Such a weird processor - messing with x86 opc...
hashdays 2011: Ange Albertini - Such a weird processor - messing with x86 opc...hashdays 2011: Ange Albertini - Such a weird processor - messing with x86 opc...
hashdays 2011: Ange Albertini - Such a weird processor - messing with x86 opc...
 
Lecture_ASCII and Unicode.ppt
Lecture_ASCII and Unicode.pptLecture_ASCII and Unicode.ppt
Lecture_ASCII and Unicode.ppt
 
The str/bytes nightmare before python2 EOL
The str/bytes nightmare before python2 EOLThe str/bytes nightmare before python2 EOL
The str/bytes nightmare before python2 EOL
 
Compgenerations pented
Compgenerations pentedCompgenerations pented
Compgenerations pented
 
WhatIsComputer.ppt
WhatIsComputer.pptWhatIsComputer.ppt
WhatIsComputer.ppt
 
Using unicode with php
Using unicode with phpUsing unicode with php
Using unicode with php
 
03 UNIT-2.pdf
03 UNIT-2.pdf03 UNIT-2.pdf
03 UNIT-2.pdf
 
Unicode Fundamentals
Unicode Fundamentals Unicode Fundamentals
Unicode Fundamentals
 
Arduino by yogesh t s'
Arduino by yogesh t s'Arduino by yogesh t s'
Arduino by yogesh t s'
 
Abap slide class4 unicode-plusfiles
Abap slide class4 unicode-plusfilesAbap slide class4 unicode-plusfiles
Abap slide class4 unicode-plusfiles
 
Using unicode with php
Using unicode with phpUsing unicode with php
Using unicode with php
 
Embedded C - Day 1
Embedded C - Day 1Embedded C - Day 1
Embedded C - Day 1
 
It tools and buisness system.docx
It tools and buisness system.docxIt tools and buisness system.docx
It tools and buisness system.docx
 

Recently uploaded

Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Things to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUUThings to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUU
FODUU
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfAI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
Techgropse Pvt.Ltd.
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 

Recently uploaded (20)

Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Things to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUUThings to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUU
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfAI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 

Unicode basics in python

  • 2. We Cover these now ● Unicode history ● terms clarity (code point,BOM,utf-8,utf-16) ● decoding and encoding in python ● how django handles these? ● helpful python modules to tackle it Note: BOM is used in utf-16.since, it has multi bytes character code point
  • 3. How it came? Americans came up with (7 bit)ASCII representation with english only alphabets as a standard to exchange information.(‘A’ - 65, ’a’ - 97) Rest of the world came up with their unaccented english characters ('ä', )in their own way.(messed up)
  • 4. What causes unicode born? To exchange information in all languages, we got some requirements ● Unique and simple rule was needed ● Adoptable across all machines(windows,ibm, etc..) ● Efficient storage as much possible
  • 5. Unicode Unicode = UCS(universal character set) + bit representation logic UCS: character + code point (‘a’, 97) bit representation: BOM = Big endian (or) Little endian 00 48 00 65 00 6C 00 6C 00 6F (or) 48 00 65 00 6C
  • 6. utf-8 is famous, because ● multi-byte encoding ● variable width encoding ● upto 4 byte code points are allowed by utf-8 ● mostly, No need BOM(8 bits) ● memory efficient How? for NON-ASCII bytes, 1st byte is reserved to indicate the no of bytes the char is using(eg.compression)
  • 7. decoding Character to Numeric value(code point) conversion ● from <type 'str'> to <type 'unicode'> ● it throws maximum “UnicodeDecodeError:” (samples demo)
  • 8. encoding ● Numeric value(code point) to Characters ● from <type 'unicode'> to <type 'str'> ● it throws maximum “UnicodeEncodeError:” (samples demo)
  • 9. Rules to Remember… ● Decode early, Unicode everywhere, Encode late ● UTF-8 is the best guess for an encoding ● chardet.detect() ========================== in Python 3 this is solved… ● <type 'str'> is a Unicode object
  • 10. How django handles? >>> def to_unicode( ... obj, encoding='utf-8'): ... if isinstance(obj, basestring): ... if not isinstance(obj, unicode): ... obj = unicode(obj, encoding) ... return obj smart_text(s, encoding='utf-8', strings_only=False, errors='strict') force_text(s, encoding='utf-8', strings_only=False, errors='strict') smart_bytes(s, encoding='utf-8',
  • 11. How to set your python default encoding standard? import sys >>>reload(sys) >>>sys.setdefaultencoding(‘utf-8’) >>>sys.getdefaultencoding >>>’utf-8’ (or) # -*- coding: utf-8 -*- (tell to python you saved <mod_name.py> in utf-8)
  • 12. Related python modules.. ● chardet.detect() ● unicodedata ● codecs
  • 13. Thanks for your time Post your questions.