Multimedia Technology - text


  1. Multimedia Technology: Text. S T Nandasara, ADMTC/UCSC
  2. World of Languages
  3. World of Languages – Asian Countries. Source: Ethnologue: Languages of the World. (The exact number of languages may never be determined exactly.)
  4. World of Languages – Asian region. (Half of the world's languages are spoken in only eight countries.)
  5. World of Languages – Asian Countries

     | Country | Number of Languages | Country Population | Official or National Languages |
     |---|---|---|---|
     | Indonesia | 742 | 245,452,739 | Indonesian |
     | India | 427 | 1,095,351,995 | Assamese, Bengali, Bodo, Dogri, English, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Manipuri, Marathi, Marwari, Nepali, Oriya, Panjabi, Sanskrit, Sindhi, Tamil, Telugu, Urdu |
     | China | 241 | 1,313,973,713 | Chinese, Zhuang, Uighur, Hmong, Hani |
     | Philippines | 180 | 89,468,677 | Filipino, English |
     | Malaysia | 147 | 24,385,858 | Malay |
     | Nepal | 125 | 28,287,147 | Nepali, Gurung, Tamang |
     | Myanmar | 109 | 47,382,633 | Burmese |
     | Vietnam | 93 | 84,402,966 | Vietnamese |
     | Laos | 82 | 6,368,481 | Lao |
     | Thailand | 75 | 64,631,595 | Thai |
     | Iran | 74 | 68,688,433 | Arabic, Farsi |
     | Pakistan | 69 | 165,803,560 | Urdu, Panjabi, Sindhi, English |
     | Afghanistan | 45 | 31,056,997 | Dari, Pashto |
     | Bangladesh | 38 | 147,365,352 | Bengali |
     | Bhutan | 24 | 2,279,723 | Dzongkha |
     | Iraq | 23 | 26,783,383 | Arabic, Kurdi |
     | Cambodia | 19 | 13,881,427 | Khmer |
     | Brunei | 17 | 379,444 | Malay, English |
     | Mongolia | 12 | 2,832,224 | Halh Mongolian |
     | Sri Lanka | 8 | 20,222,240 | Sinhala, Tamil, English |
  6. World of Languages – Script Diversity
     • Three major types of script in South, South East and East Asia:
       - In East Asia: Chinese ideographic scripts
       - In South Asia, around the Indian subcontinent and part of South East Asia: scripts influenced by Brahmi
       - In part of South East Asia and Australasia: Roman scripts
     • Two major types of script in West and Central Asia:
       - In Central Asia: historically Arabic, but later changed to Cyrillic
       - In Western Asia: Arabic script is widely used
     • One major type of script in Europe and the West:
       - Roman script
  7. World of Languages – Script in Asia

     | Language | Speakers | Name in own script |
     |---|---|---|
     | Chinese (Mandarin) | 885,000,000 | 普通話 |
     | English | 322,000,000 | English |
     | Arabic (Alarabia) | 280,000,000 | العربية |
     | Bengali | 196,000,000 | বাংলা |
     | Hindi | 182,000,000 | हिन्दी |
     | Portuguese (Português) | 182,000,000 | português |
     | Indonesian | 140,000,000 | Indonesia |
     | Japanese (Nihongo) | 125,000,000 | 日本語 |
     | Hankuko (Korean) | 75,000,000 | 한국어 [韓國語] |
     | Telugu | 73,000,000 | తెలుగు |
     | Vietnamese | 66,897,000 | Tiếng Việt |
     | Marathi | 64,783,000 | मराठी |
     | Tamil | 62,000,000 | தமிழ் |
     | Turkish (Türkçe) | 59,000,000 | Türkçe |
     | Urdu | 54,000,000 | اردو |
     | Gujarati | 44,000,000 | ગુજરાતી |
     | Malayalam | 34,014,000 | മലയാളം |
     | Kannada | 33,663,000 | ಕನ್ನಡ |
     | Punjabi/Panjabi | 25,700,000 | ਪੰਜਾਬੀ / پنجابی |
     | Thai | 21,000,000 | ภาษาไทย |
     | Sindhi | 19,675,000 | سنڌي |
     | Uzbek (Cyrillic) | 18,386,000 | Ўзбек |
     | Bahasa Melayu (Malay) | 17,600,000 | Bahasa melayu |
     | Nepali | 16,200,000 | नेपाली |
     | Filipino (Tagalog) | 14,850,000 | Tagalog |
     | Assamese | 14,604,000 | অসমীয়া |
     | Azeri/Azerbaijani (Cyrillic) | 13,869,000 | Азәрбајҹан дили |
     | Sinhala | 13,218,000 | සිංහල |
     | Zhuang | 10,000,000 | Saw cuengh |
     | Pashto/Pakhto | 9,585,000 | پښتو |
     | Kazakh | 8,000,000 | Қазақ / قازاق |
     | Uighur (Uyghur) | 7,464,000 | Уйғур / ئۇيغۇر |
     | Khmer | 7,063,200 | ភាសាខ្មែរ |
     | Dari | 7,000,000 | دَرِي |
     | Tatar | 7,000,000 | татарча / تاتارچا |
     | Turkmen | 5,397,500 | түркменче |
     | Kashmiri | 4,381,000 | काऽशुर / كٲشُر |
     | Lao | 4,000,000 | ພາສາລາວ |
     | Balinese | 3,800,000 | Bahasa Bali |
     | Kyrgyz | 2,631,420 | Кыргыз |
     | Fijian | 650,000 | vaka-Viti |
     | Maldivian Dhivehi | 280,000 | ދިވެހި |
     | Sanskrit | 194,433 | संस्कृतम् |
     | Tahitian | 150,000 | Te Reo Tahiti |
     | Maori | 70,000 | Te Reo Māori |
     | Hawaiian | 8,000 | Ōlelo Hawaii |
  8. World of Languages – Script in Asia
  9. Nature of Text
     • The most basic medium.
       - Easiest to generate, store and transfer on a PC.
     • Still the best for complex explanation.
       - Using structured text/hypertext
     • Lightweight
       - Smallest-sized medium
     • Static
     • Language dependent (the biggest problem)
  10. Text – Digital Form
      • Input: creation by keyboard or handwriting (handwriting recognition); printed documents (optical character recognition, OCR); human voice (voice recognition).
      • Digital form: text data (character code). ASCII: 7 bit, stored in 8-bit bytes; Unicode: 16 bit (early versions); Universal Character Set: 32 bit.
      • Output: typeface (bitmap font, vector font); voice (text-to-speech).
  11. Indexing and Hypertext
      • Indexing: a rapid random access/search method for large text data.
        - Essential for reference-type applications: dictionary, encyclopedia, etc.
      • Hypertext: a non-sequential navigation structure for large text data.
        - Used in Web pages (HTML)
      • [Figure: a block of large text data with an alphabetical index (a, b, c, d, e; ad, adjust, adorn, am, bi, bot, by).]
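
As a rough illustration of the indexing idea on the slide above, the sketch below builds a tiny inverted index that maps each word to the positions where it occurs, so a lookup does not need to scan the whole text. The function name `build_index` and the sample sentence are made up for this example and are not part of the slides.

```python
from collections import defaultdict


def build_index(text):
    """Map each lowercase word to the list of word positions where it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(text.lower().split()):
        # Strip simple trailing/leading punctuation so "voice," matches "voice".
        index[word.strip(".,;:!?")].append(position)
    return dict(index)


# Hypothetical sample text, used only to exercise the index.
sample = "The telephone and radio for voice, the camera for image."
index = build_index(sample)

# Look up every occurrence of a word directly from the index.
print(index["for"])
print(index["the"])
```

A dictionary lookup of this kind is what makes random access fast; real reference applications typically add sorting, stemming and on-disk storage on top.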
  12. Hypertext, Hypermedia and Multimedia. [Figure: diagram labelled Hypertext, Hypermedia and Multimedia.] A hypermedia system includes the non-linear information links of hypertext systems and the continuous and discrete media of multimedia systems.
  13. Typography
      • Until the end of the 14th century, all writing was done by hand.
      • Typography: the design of the characters that make up text and display type, and the way they are configured on the page.
      • Modern software allows rotating or distorting type and wrapping it around images.
  14. Typography – Evolution of Asian Scripts. [Figure: evolution of a letter (modern Sinhala ණ) along a timeline labelled 3rd century BC, 1st century, 3rd century, 6th century, 8th century, Pallava, 10th century, 12th century and Modern, ending in Kannada, Tamil, Sinhala, Devanagari, Gujarati, Bengali, Oriya, Telugu, Malayalam and Panjabi.]
  15. Typography – Complex Scripts: Bengali, Devanagari, Gujarati, Kannada, Malayalam, Telugu, Sinhala, Tamil, Ranjana, Gurmukhi, Oriya, Tibetan, Khmer, Lao, Thai, Jawani, Thaana, Bagini, Sanskrit
  16. Typography – Complex Vowels
  17. Typography – ASCII & EBCDIC. [Figure: ASCII and EBCDIC.]
  18. Typography – 8 Bit English and Sinhala. 1989: SLASCII, Wadan Tharuwa, SBIOS
  19. The Code Page Problem
      • Characters in most languages are traditionally represented by single-byte values.
        - Allows for 256 characters at most.
        - The real limit for most encodings is 192 characters.
        - This includes letters, digits, punctuation and symbols.
      • When a system is used for a new language, the encoding has to be adapted to use that language's characters.
      • Encodings proliferate.
        - Each language or group of languages gets its own encoding.
        - Different vendors or standards committees devise different encodings, so each language generally has several, often incompatible, encodings.
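
To make the code page problem concrete, here is a minimal sketch that decodes the same single byte under two different 8-bit encodings. The choice of byte (0xE9) and of the two encodings (ISO 8859-1 and Windows-1251) is just an example picked for this note, not something taken from the slides.

```python
# One byte above the 7-bit ASCII range; its meaning depends on the code page.
raw = bytes([0xE9])

as_western = raw.decode("latin-1")   # ISO 8859-1 (Western European)
as_cyrillic = raw.decode("cp1251")   # Windows-1251 (Cyrillic)

# Same stored value, two different characters.
print(hex(raw[0]), repr(as_western), repr(as_cyrillic))
```

Nothing in the byte itself says which reading is intended, which is why untagged data can be silently misread when it moves between systems.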
  20. Multi-byte encodings
      • Some languages (Chinese, Japanese, Korean, etc.) have more than 256 characters.
      • Encoding standards for these languages use sequences of bytes for many characters.
        - In many standards, not all characters are the same number of bytes.
        - Can't tell whether a given byte is a whole character or part of a character.
        - Corruption of one byte can corrupt the whole data stream.
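
The ambiguity described above can be seen with Shift-JIS, used here only as one example of a legacy multi-byte encoding (the slides do not name a specific one). In this sketch the second byte of a two-byte character has the same value as the ASCII backslash, so byte-oriented code cannot tell a whole character from part of one.

```python
text = "表"                           # U+8868, a common Japanese character
encoded = text.encode("shift_jis")    # two bytes in Shift-JIS

# The trail byte is 0x5C, the same value as an ASCII backslash.
trail_is_backslash = encoded[1] == ord("\\")
print(encoded.hex(" "), trail_is_backslash)

# Naive byte-level processing (here: splitting on backslash) cuts the character in half.
pieces = encoded.split(b"\\")
print(pieces)

# In UTF-8, by contrast, every byte of a non-ASCII character is >= 0x80,
# so it can never be mistaken for an ASCII character.
utf8_bytes_all_high = all(b >= 0x80 for b in text.encode("utf-8"))
print(utf8_bytes_all_high)
```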
  21. [Slide without text]
  22. Interoperability problems
      • Can't easily mix languages in a document or system.
      • Data is not tagged with its encoding, so loss can occur when transferring between systems.
      • Most encodings are ASCII-based, so problems are often not seen with English-only data.
      • Two possible solutions:
        - Systematic tagging of textual data with an encoding ID
        - A universal encoding standard with all languages' characters
  23. Encoding space: an ASCII character is 7 bits wide.
  24. Encoding space: most encodings press the eighth bit into service.
  25. Encoding space: early versions of Unicode used 16 bits.
  26. Encoding space: Unicode now uses 21 bits.
  27. Encoding space: plane number, row number, character number.
  28. Unicode
      • The 21-bit encoding space allows for 1,114,112 characters.
      • 95,156 code point values assigned to characters in Unicode 3.2
      • 137,216 code point values set aside for application use
      • 2,114 code point values set aside for non-character use
      • 879,626 code point values reserved for future character assignments
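
The figures on the slide above can be checked with a little arithmetic. The sketch below assumes the standard layout of 17 planes of 65,536 code points each, and splits a code point into the plane, row and character numbers mentioned on the preceding slide. The helper name `split_code_point` is made up for this example.

```python
PLANES = 17                      # planes 0x00 to 0x10
CODE_POINTS_PER_PLANE = 0x10000  # 256 rows of 256 characters

total = PLANES * CODE_POINTS_PER_PLANE

# Breakdown as quoted on the slide for Unicode 3.2.
slide_breakdown = {
    "assigned to characters": 95_156,
    "application (private) use": 137_216,
    "non-character use": 2_114,
    "reserved for future assignment": 879_626,
}
breakdown_sum = sum(slide_breakdown.values())


def split_code_point(cp):
    """Return (plane, row, character) numbers for a code point."""
    return cp >> 16, (cp >> 8) & 0xFF, cp & 0xFF


print(total, breakdown_sum)
print(split_code_point(0x10346))  # GOTHIC LETTER FAIHU lies in plane 1
```

Note that 21 bits could address 2,097,152 values; the 1,114,112 limit comes from using only planes 0 to 0x10.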
  29. The Unicode Encoding Space. [Figure: planes 0 to 10 (hex); plane 0 is the Basic Multilingual Plane.]
  30. The Unicode Encoding Space. [Figure: planes 1 to 10 (hex) are the Supplementary Planes.]
  31. The Unicode Encoding Space. [Figure: plane 1 is the Supplementary Multilingual Plane, plane 2 the Supplementary Ideographic Plane and plane E the Supplementary Special-Purpose Plane.]
  32. The Unicode Encoding Space. [Figure: planes F and 10 are the Private Use Planes.]
  33. The Unicode Encoding Space. [Figure: planes 0 to 10 (hex); plane 0 is the Basic Multilingual Plane.]
  34. The Basic Multilingual Plane, by leading hex digit of the code point: 0–1 General Scripts Area; 2 Symbols Area; 3 CJK punctuation; 4–9 Han; A Yi; A–D Hangul; D Surrogates Area; E–F Private Use Area; F Compatibility Area.
  35. The General Scripts Area

      | Rows (hex) | Scripts |
      |---|---|
      | 00/01 | Latin |
      | 02/03 | IPA, Diacriticals, Greek |
      | 04/05 | Cyrillic, Armenian, Hebrew |
      | 06/07 | Arabic, Syriac, Thaana |
      | 08/09 | Devanagari, Bengali |
      | 0A/0B | Gurmukhi, Gujarati, Oriya, Tamil |
      | 0C/0D | Telugu, Kannada, Malayalam, Sinhala |
      | 0E/0F | Thai, Lao, Tibetan |
      | 10/11 | Myanmar, Georgian, Hangul |
      | 12/13 | Ethiopic, Cherokee |
      | 14/15 | Canadian Aboriginal Syllabics |
      | 16/17 | Ogham, Runic, Philippine, Khmer |
      | 18/19 | Mongolian |
      | 1A/1B | |
      | 1C/1D | |
      | 1E/1F | Latin, Greek |
  36. Unicode Coverage
      • European scripts: Latin, Greek, Cyrillic, Armenian, Georgian, IPA
      • Bidirectional (Middle Eastern) scripts: Hebrew, Arabic, Syriac, Thaana
      • Indic (Indian and Southeast Asian) scripts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala, Thai, Lao, Khmer, Myanmar, Tibetan, Philippine
      • East Asian scripts: Chinese (Han) characters, Japanese (Hiragana and Katakana), Korean (Hangul), Yi
      • Other modern scripts: Mongolian, Ethiopic, Cherokee, Canadian Aboriginal
      • Historical scripts: Runic, Ogham, Old Italic, Gothic, Deseret
      • Punctuation and symbols: numerals, math symbols, scientific symbols, arrows, blocks, geometric shapes, Braille, musical notation, etc.
  37. Characters, Glyphs, and Fonts
      • In computer terms, a character is a grouping of bits (binary ones and zeros) in packages of 8: one or more bytes.
      • There are two broad classes of characters: data characters and control characters.
  38. Characters, Glyphs, and Fonts. The letter A shown in Arial, Times New Roman, Courier New, Giddyup Standard, Bodoni, Papyrus and Forte.
  39. Characters, Glyphs, and Fonts. You can run out of available characters pretty quickly if you allow all those strange foreign, mathematical, scientific, engineering, currency, and other symbols. (Set in Informal Roman.)
  40. Unicode properties. Entry: 0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
      • Representative glyph: A
      • Code point: 0041
      • Name: LATIN CAPITAL LETTER A
      • Semantic properties:
        - General category: Uppercase letter (Lu)
        - Canonical combining class: Standard spacing (0)
        - Bidirectional category: Left-to-right (L)
        - Mirrored: no (N)
        - Lowercase mapping: 0061
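
The properties listed on the slide above can be queried programmatically. The following sketch uses Python's standard `unicodedata` module (a choice made for this note; the slides do not mention any particular library) to read the same fields for U+0041.

```python
import unicodedata

ch = "\u0041"  # LATIN CAPITAL LETTER A

props = {
    "name": unicodedata.name(ch),
    "general_category": unicodedata.category(ch),
    "combining_class": unicodedata.combining(ch),
    "bidi_category": unicodedata.bidirectional(ch),
    "mirrored": bool(unicodedata.mirrored(ch)),
    # Lowercase mapping, formatted as a four-digit hex code point.
    "lowercase": f"{ord(ch.lower()):04X}",
}

for key, value in props.items():
    print(f"{key}: {value}")
```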
  41. Combining characters: one character…
  42. Combining characters: …or two?
  43. Combining characters. Actually, either. Unicode is generative, with accent marks represented with their own code point values: é = U+0065 (e) U+0301 (accent). But common combinations of letters and accents are also given their own code points for convenience: é = U+00E9.
  44. Combining characters. This can be tough, because the two representations are to be treated as absolutely identical: U+0065 U+0301 = U+00E9.
  45. Combining characters. Things can get really wild for characters with more than one accent mark. All of the following represent ộ:
      • 006F (o) 0302 (circumflex) 0323 (dot)
      • 006F (o) 0323 (dot) 0302 (circumflex)
      • 00F4 (o-circumflex) 0323 (dot)
      • 1ECD (o-dot) 0302 (circumflex)
      • 1ED9 (o-circumflex-dot)
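
One common way to treat these spellings as identical is Unicode normalization. The sketch below, which assumes Python's standard `unicodedata.normalize`, shows that the five sequences from the slide above differ as raw strings but collapse to a single form under NFC (composed) and under NFD (decomposed).

```python
import unicodedata

# The five spellings of o-circumflex-dot-below listed on the slide.
spellings = [
    "\u006f\u0302\u0323",
    "\u006f\u0323\u0302",
    "\u00f4\u0323",
    "\u1ecd\u0302",
    "\u1ed9",
]

# As raw code point sequences they are all different.
distinct_raw = len(set(spellings))

# After normalization they all agree.
nfc_forms = {unicodedata.normalize("NFC", s) for s in spellings}
nfd_forms = {unicodedata.normalize("NFD", s) for s in spellings}

print(distinct_raw, nfc_forms, [f"{ord(c):04X}" for c in next(iter(nfd_forms))])

# The simpler two-way case: e + combining acute versus precomposed e-acute.
e_acute_equal_after_nfc = unicodedata.normalize("NFC", "\u0065\u0301") == "\u00e9"
print(e_acute_equal_after_nfc)
```

Comparing strings only after normalizing both sides to the same form is the usual design choice; which form to pick depends on the application.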
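  46. Typography – Complex Vowels Positioning
  47. Smart rendering: Arabic
      • Keyboard (romanized text as it is typed): ba, bab, babi, babib, babibu b
      • Code points: 0628 064E 0628 0650 0628 064F 0020 0628
      • Screen: [rendered Arabic text, shown as an image on the slide]
  48. Smart rendering: Burmese
      • Keyboard (romanized text as it is typed): kr, kru, krui
      • Code points: 1000 1039 101B 102F 102D
      • Screen: [rendered Burmese text, shown as an image on the slide]
  49. Smart rendering: Tamil
      • Keyboard: Ur r y N m k j, then Ur rU yU NU mU kU jU
      • Code points: 0B8A 0BB0 0BB0 0BC2 0BAF 0BC2 0BA3 0BC2 0BAE 0BC2 0B95 0BC2 0B9C 0BC2
      • Screen: [rendered Tamil text, shown as an image on the slide]
  50. Typography – Complex Ligature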
  51. Canonical equivalence. The following sequences are canonically equivalent:
      • 01FA: LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE
      • 212B 0301: ANGSTROM SIGN, COMBINING ACUTE ACCENT
      • 00C5 0301: LATIN CAPITAL LETTER A WITH RING ABOVE, COMBINING ACUTE ACCENT
      • 0041 030A 0301: LATIN CAPITAL LETTER A, COMBINING RING ABOVE, COMBINING ACUTE ACCENT
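
The chain of equivalences on the slide above follows from the canonical decomposition mappings of the individual characters. As a small sketch, Python's standard `unicodedata.decomposition` can be used to read those mappings one step at a time, and NFD normalization applies them fully; the variable names are made up for this example.

```python
import unicodedata

# One-step canonical decompositions, as space-separated hex code points.
step_01fa = unicodedata.decomposition("\u01fa")   # A with ring above and acute
step_212b = unicodedata.decomposition("\u212b")   # angstrom sign
step_00c5 = unicodedata.decomposition("\u00c5")   # A with ring above
print(step_01fa, "|", step_212b, "|", step_00c5)

# Full decomposition (NFD) of the four sequences from the slide.
sequences = ["\u01fa", "\u212b\u0301", "\u00c5\u0301", "\u0041\u030a\u0301"]
fully_decomposed = {unicodedata.normalize("NFD", s) for s in sequences}
print([f"{ord(c):04X}" for c in next(iter(fully_decomposed))])
```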
  52. Case mapping
      • Case mapping may produce strings of different length: 01F0 → 004A 030C
      • Case mapping may depend on the locale: English 0069 → 0049; Turkish/Azeri 0069 → 0130
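
Both points on the slide above can be tried out in a few lines. In this sketch, Python's built-in `str.upper` illustrates the length-changing mapping; it is not locale-aware, so the Turkish/Azeri rule is imitated with a hypothetical helper, `upper_turkic`, written only for this note.

```python
# U+01F0 (small j with caron) has no single-code-point uppercase form.
small_j_caron = "\u01f0"
upper = small_j_caron.upper()
print(len(small_j_caron), len(upper), [f"{ord(c):04X}" for c in upper])

# The default (locale-independent) mapping of i is the dotless capital I.
default_upper_i = "i".upper()


def upper_turkic(text):
    """Hypothetical helper: uppercase using the Turkish/Azeri dotted-i rule."""
    # Map i to capital I with dot above (U+0130) before the default mapping.
    return text.replace("i", "\u0130").upper()


print(f"{ord(default_upper_i):04X}", f"{ord(upper_turkic('i')):04X}")
```

A production system would normally rely on a locale-aware library rather than a hand-written replacement like this.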
  53. Combining characters. Things can get really wild for characters with more than one accent mark. All of the following represent ộ:
      • 006F (o) 0302 (circumflex) 0323 (dot)
      • 006F (o) 0323 (dot) 0302 (circumflex)
      • 00F4 (o-circumflex) 0323 (dot)
      • 1ECD (o-dot) 0302 (circumflex)
      • 1ED9 (o-circumflex-dot)
  54. Typography – Unicode Sinhala. 1987: Unicode Ver. 1.0 Sinhala; 1998: Unicode Ver. 3.0 Sinhala.
  55. Typography – Complex Ligature: ttha in Devanagari; ttha in Tamil; tva in Malayalam; tva in Sinhala
  56. Typography – Complex Ligature
      • U+200C (UTF-8: E2 80 8C): tva with ZWNJ in Malayalam; tva with ZWNJ in Sinhala
      • U+200D (UTF-8: E2 80 8D): tva with ZWJ in Malayalam; tva with ZWJ in Sinhala
  57. Typography – Complex Ligature – UTF-8

      | Code point range | Length | Bit pattern |
      |---|---|---|
      | U+0000 .. U+007F | 1 byte | 0xxx xxxx |
      | U+0080 .. U+07FF | 2 bytes | 110x xxxx 10xx xxxx |
      | U+0800 .. U+FFFF | 3 bytes | 1110 xxxx 10xx xxxx 10xx xxxx |
      | U+10000 .. U+10FFFF | 4 bytes | 1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx |

      Examples:
      • U+0026 AMPERSAND (decimal 38)
      • U+0D85 SINHALA LETTER AYANNA (decimal 3,461)
      • U+4E2D CJK UNIFIED IDEOGRAPH-4E2D (decimal 20,013)
      • U+10346 GOTHIC LETTER FAIHU (decimal 66,374)
      • U+0E12 THAI CHARACTER THO PHUTHAO (decimal 3,602)
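
The bit patterns in the UTF-8 table above translate directly into shifts and masks. The sketch below is a hand-written encoder, `utf8_encode` (a name made up for this note), checked against Python's built-in codec for the example code points on this slide and the ZWNJ/ZWJ values from the previous one. It deliberately skips validation such as rejecting surrogates.

```python
def utf8_encode(cp):
    """Encode one code point by hand, following the UTF-8 bit layout."""
    if cp <= 0x7F:                        # 0xxx xxxx
        return bytes([cp])
    if cp <= 0x7FF:                       # 110x xxxx 10xx xxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:                      # 1110 xxxx 10xx xxxx 10xx xxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


examples = (0x0026, 0x0D85, 0x4E2D, 0x10346, 0x0E12, 0x200C, 0x200D)
for cp in examples:
    encoded = utf8_encode(cp)
    # Cross-check against the built-in UTF-8 codec.
    assert encoded == chr(cp).encode("utf-8")
    print(f"U+{cp:04X} -> {encoded.hex(' ').upper()}")
```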
  58. Typography – Complex Ligature: preventing conjunct forms in Devanagari; half-consonants in Devanagari
  59. Typography – Complex Ligature: Buddha in Sinhala
  60. Typography – Complex Ligature in DB

          <html>
          <head><title>සිංහල</title></head>
          <body>
          <?php
          // Simple connection settings. Assumption: connection.php defines $conn,
          // a mysqli connection handle (the old mysql_* functions were removed in PHP 7).
          include("connection.php");
          mysqli_query($conn, "SET NAMES utf8");   // the main trick: use UTF-8 on the connection
          $result = mysqli_query($conn, "SELECT * FROM sinhala");
          while ($myrow = mysqli_fetch_row($result)) {
              echo $myrow[0];                      // print the first column of each row
          }
          ?>
          </body>
          </html>

          -- The dump for my database storing Sinhala UTF-8 strings is:
          CREATE TABLE `sinhala` (
            `data` varchar(1000) character set utf8 collate utf8_bin default NULL
          ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
          INSERT INTO `sinhala` VALUES ('අම්මා');
  61. Typography: typical typefaces (fonts) and type styles used in word processors
      • Typefaces: Times New Roman, Arial, Symbol, Courier, Impact, Arial Narrow, free hand, Palatino
      • Groups shown: sans serif typefaces, special typefaces, serif typefaces
      • Crazy fonts can be distracting!
      • Type styles: Bold, Italics, Outline
  62. Typography: special effects
      • Kerning increases or decreases the spacing between certain pairs of letters to improve their appearance.
      • Line spacing, or leading
      • Orientation
      • Anti-alias: smooths out text edges by blending them into the background, so that the text is cleaner and more readable when it is large.
  63. Typography: ascender height, cap height, x-height, baseline, descender height
  64. Typography – Tracking & Kerning
  65. Typography – Orientation
  66. Typography – Anti-alias
  67. Typography: special effects cont.
      • Strokes, fills, effects and styles applied to text (stroke, fill, effect, style)
  68. Typography: special effects cont.
      • Attaching text to a path
  69. Typography: special effects cont.
      • Converting text to paths: text converted to paths retains all of its visual attributes, but you can edit it only as paths.
  70. Typography
      • Bitmap font
      • Vector font
        - TrueType: fast, standard, for computer screen and printer
        - Adobe Type 1: precise, professional, used for publishing
      • [Figure: screen from "Fontographer"]
      • Small font rendering: normal; anti-aliased; optimized for LCD screen (ClearType etc.)
  71. Text – Cross-media Technology
      • Voice Recognition
        - Converts voice (sound data) to text data
        - Needs real-time processing
        - Specific speaker/non-specific speaker
      • Text-to-Speech (Speech Synthesis)
        - The computer "dictates" text data
        - Automatic information services, new mail dictation
  72. Text – Cross-media Technology cont…
      • Optical Character Recognition
        - Converts a text bitmap image to real text data
        - Used with an image scanner
      • Handwriting Recognition
        - Similar to OCR, but uses writing order/direction for better recognition
        - Used in PIM (Personal Information Manager) devices (palmtop computers)
  73. Text – Cross-media Technology cont…
      • Machine Translation
        - All text-based techniques are language dependent, so automatic translation is needed.
        - Vertical market: technical document translation. Personal market: Web browsing.
        - Combination of media technologies: automatically translate international telephone messages.
      • Pipeline: Japanese voice → (Japanese voice recognition) → Japanese text data → (machine translation) → English text data → (English speech synthesis) → English voice
  74. File Format
      • .TXT: unformatted text (e.g. Notepad)
      • .DOC: developed by Microsoft (e.g. MS Word)
      • .RTF: Rich Text Format
      • .PDF: Portable Document Format (Adobe)
      • .PS: PostScript, a page description language used mainly for desktop publishing
