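Chinese Character Encoding Handling. 蔡啸 (Cai Xiao), 2011/9/15
Outline: Introduction; Basic Concepts; Practical Applications
Introduction: Garbled text (mojibake) is an annoying and long-standing problem. It has existed for a very long time.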
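Basic Concepts (1)
Character set: a collection of abstract characters that share common features.
GBK: the Chinese character set standard used in mainland China.
CJK: the Chinese, Japanese, and Korean unified ideographs.
Unicode: the universal character set (version 6.0 supports 109,449 characters). It is divided into 17 planes of 65,536 code points each. Plane 0 is also known as the BMP (Basic Multilingual Plane).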
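The plane arithmetic above can be illustrated with a minimal Python 3 sketch. The helper name `plane_of` and the sample characters are assumptions chosen for illustration; they do not come from the slides.

```python
PLANE_COUNT = 17
PLANE_SIZE = 0x10000  # 65,536 code points per plane
TOTAL_CODE_POINTS = PLANE_COUNT * PLANE_SIZE


def plane_of(ch: str) -> int:
    """Return the Unicode plane number (0 to 16) of a single character.

    Hypothetical helper: the plane is the code point divided by 0x10000.
    """
    return ord(ch) >> 16


# "A" and "蔡" sit in plane 0 (the BMP); U+20000 is a CJK ideograph in plane 2.
for ch in ["A", "蔡", "\U00020000"]:
    print(f"U+{ord(ch):04X} is in plane {plane_of(ch)}")
```

Most everyday Chinese characters fall in plane 0, which is why 16-bit representations were long treated as sufficient.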
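Basic Concepts (2)
Character encoding: a code table that maps characters to their binary codes.
ASCII: one of the earliest character encodings; it covers English letters, digits, and basic control characters.
UTF-8
UCS-4: 32-bit representation.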
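To make the idea of a code table concrete, the following Python 3 sketch encodes the same text with several codecs and prints the resulting bytes. It assumes that Python's `utf-32-be` codec is an acceptable stand-in for the 32-bit representation mentioned above; the sample text is chosen for illustration.

```python
text = "A蔡"

# The same two characters map to different byte sequences per encoding.
utf8_hex = text.encode("utf-8").hex()
utf32_hex = text.encode("utf-32-be").hex()  # fixed 4 bytes per character
print("utf-8 :", utf8_hex)
print("utf-32:", utf32_hex)

# ASCII defines only 128 characters, so a Chinese character cannot be encoded.
try:
    "蔡".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("ASCII can encode 蔡:", ascii_ok)
```

UTF-8 uses one byte for "A" and three for "蔡", while the 32-bit form spends four bytes on each.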
Basic Concepts (3): Example
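Practical Applications (Python)
Two string types: ordinary (byte) strings and unicode strings.
Note: in Python 2.x, the encoding of an ordinary string is system dependent. For example, the bytes of s = "蔡啸", shown in hex:
'b2ccd0a5'  # Windows
'e894a1e595b8'  # Linux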
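The two hex values can be reproduced explicitly. This is a sketch in Python 3 (an assumption, since the slide targets 2.x), where `str` is always Unicode and the encoding has to be named when converting to bytes, so the result no longer depends on the operating system.

```python
s = "蔡啸"

gbk_bytes = s.encode("gbk")     # typical for Chinese-locale Windows
utf8_bytes = s.encode("utf-8")  # typical for Linux

print(gbk_bytes.hex())   # b2ccd0a5
print(utf8_bytes.hex())  # e894a1e595b8

# Decoding with the matching codec restores the original text.
round_trip_ok = (gbk_bytes.decode("gbk") == s and utf8_bytes.decode("utf-8") == s)
print(round_trip_ok)
```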
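Practical Applications (JavaScript)
JavaScript represents characters internally as Unicode.
Useful character encoding functions:
escape: returns the Unicode representation of a character (the %uXXXX form)
encodeURI: returns the UTF-8 representation of a character (percent-encoded bytes)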
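The difference between the two representations can be approximated in Python 3. This is only a sketch: `escape_like` is a hypothetical helper that mimics the `%uXXXX` output for code units above U+00FF and does not reproduce the rest of the JavaScript function's behavior, and `urllib.parse.quote` is used here as a stand-in for the percent-encoded UTF-8 form.

```python
from urllib.parse import quote


def escape_like(text: str) -> str:
    """Hypothetical helper: emit %uXXXX for each UTF-16 code unit.

    Only code units above 0xFF are handled; anything else is rejected
    because this sketch does not model that part of the behavior.
    """
    raw = text.encode("utf-16-be")
    parts = []
    for i in range(0, len(raw), 2):
        unit = int.from_bytes(raw[i:i + 2], "big")
        if unit <= 0xFF:
            raise ValueError("sketch only covers code units above U+00FF")
        parts.append(f"%u{unit:04X}")
    return "".join(parts)


unicode_form = escape_like("蔡啸")  # code points, one %uXXXX per code unit
utf8_form = quote("蔡啸")           # percent-encoded UTF-8 bytes
print(unicode_form)
print(utf8_form)
```

The first form exposes the code points (U+8521, U+5578); the second exposes the three UTF-8 bytes behind each character.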
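Practical Applications (HTML)
How does a browser determine the character encoding of a page?
The Content-Type field in the HTTP header.
The charset specified in a meta tag in the HTML page.
The page body data (the browser can inspect the raw bytes of the body to guess the encoding).
Two further factors: the browser's default encoding and the operating system's language.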
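The lookup described above can be sketched as a fallback chain in Python 3. This is a simplified illustration, not how any particular browser is implemented: the function name `pick_charset`, the regular expressions, the 1024-byte scan window, and the "valid UTF-8 means UTF-8" sniffing rule are all assumptions, and the listed order is assumed to be the priority order.

```python
import re

# Matches charset=... inside a <meta> tag near the start of the document.
META_CHARSET = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([\w-]+)', re.IGNORECASE)
HEADER_CHARSET = re.compile(r'charset\s*=\s*"?([\w-]+)', re.IGNORECASE)


def pick_charset(content_type, body, browser_default="gbk"):
    """Hypothetical helper: choose an encoding name for a fetched page."""
    # 1. The Content-Type HTTP header, if it names a charset.
    if content_type:
        match = HEADER_CHARSET.search(content_type)
        if match:
            return match.group(1).lower()
    # 2. A charset declared in a meta tag.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. Sniff the body: bytes that decode as strict UTF-8 are taken as UTF-8.
    try:
        body.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # 4. Otherwise fall back to the browser default.
        return browser_default


print(pick_charset("text/html; charset=UTF-8", b""))
print(pick_charset(None, b'<html><head><meta charset="gbk"></head></html>'))
print(pick_charset(None, "蔡啸".encode("gbk")))
```

Declaring the charset explicitly in the header or meta tag avoids relying on the last two steps, which depend on the reader's environment.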
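Practical Applications (URL)
URL specification: RFC 1738. A URL must consist of English letters, digits, and certain punctuation marks.
When a URL contains Chinese characters, the browser encodes them.
Enter in the address bar: http://www.google.com.hk/search?q=蔡啸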
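What the encoded URL may look like can be sketched with Python 3's `urllib.parse`. Which encoding a given browser actually applies to the query string is not stated in the slides, so both UTF-8 and GBK variants are shown as possibilities.

```python
from urllib.parse import quote, unquote

base = "http://www.google.com.hk/search?q="
query = "蔡啸"

# Each non-ASCII byte becomes %XX; the bytes depend on the chosen encoding.
utf8_url = base + quote(query, encoding="utf-8")
gbk_url = base + quote(query, encoding="gbk")
print(utf8_url)
print(gbk_url)

# The receiver must decode with the same encoding to recover the text.
recovered = unquote("%B2%CC%D0%A5", encoding="gbk")
print(recovered)
```

The server sees only percent-encoded bytes, so it has to know or guess which encoding produced them.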
Summary
Key points:
On Windows, Chinese text defaults to GBK encoding.
On Linux, Chinese text is generally UTF-8 encoded.
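The practical consequence of these two defaults is that bytes written on one system can be misread on the other. A minimal Python 3 sketch of that failure mode follows; the variable names are chosen for illustration.

```python
s = "蔡啸"
utf8_bytes = s.encode("utf-8")  # as typically written on Linux
gbk_bytes = s.encode("gbk")     # as typically written on Chinese Windows

# Reading UTF-8 bytes as GBK does not recover the text: this is mojibake.
garbled = utf8_bytes.decode("gbk", errors="replace")
print(garbled == s)

# Reading GBK bytes as strict UTF-8 fails outright, because 0xB2 cannot
# start a UTF-8 sequence.
try:
    gbk_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print(strict_ok)
```

Stating the encoding explicitly whenever text crosses a system boundary avoids both outcomes.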
Chinese Encoding Issues
