Anvita: Web Content for Indian Language, 15 Mar 2001
    Presentation Transcript

    • State of the art in Web design and content creation for Indian languages - Anvita Bajpai.
    • Agenda: Indian Languages on the Web. Language and Indian languages (introduction to language and its categories; Indian languages and scripts; standards for Indian languages); enabling Indian languages on the Web; case studies.
    • Introduction to Language. Language = (Words)*. Word = a string of characters between two spaces. A word is a base form + a suffix or, more fully, base form + case markers + clitics + stress markers.
    • Language Categories. Type I (like English or Hindi): the normal addition to the base form is fundamentally the verbal inflection or the nominal declension, i.e. conjugation and case, accompanied by number and gender markers. Type II (like Marathi or Gujarati): considerable adjuncts to the noun or the verb, together with clitic elements, are joined to the base form to generate words that are considerably complex and long.
    • Language Categories, Type II (examples):
      tEblACA = tEbl + A + CA ("of the table"): root + declension + case marker
      tEblAvar = tEbl + A + var ("on the table"): root + declension + postposition
      tEblAvarCA = tEbl + A + var + CA ("of" on the table): root + declension + postposition + case marker
      tEblAvarCAC = tEbl + A + var + CA + C ("of" on the very table): root + declension + postposition + case marker + stress marker
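The Type II decomposition on this slide can be sketched in a few lines of Python. This is only an illustration, not a method from the presentation: the slot list covers just the four suffixes shown on the slide, the names `SLOTS` and `segment` are hypothetical, and a real Marathi analyser would need a far larger suffix inventory and sandhi handling.

```python
# Ordered suffix slots taken from the slide's examples (transliterated Marathi).
# Each slot is optional, but the slots must appear in this order.
SLOTS = [
    ("A", "declension"),
    ("var", "postposition"),
    ("CA", "case marker"),
    ("C", "stress marker"),
]


def segment(word, root="tEbl"):
    """Split a word into (morph, label) pairs: the root followed by suffix slots."""
    if not word.startswith(root):
        raise ValueError(f"{word!r} does not start with root {root!r}")
    parts = [(root, "root")]
    rest = word[len(root):]
    for morph, label in SLOTS:
        if rest.startswith(morph):
            parts.append((morph, label))
            rest = rest[len(morph):]
    if rest:
        raise ValueError(f"unanalysed remainder {rest!r} in {word!r}")
    return parts


if __name__ == "__main__":
    for w in ["tEblACA", "tEblAvar", "tEblAvarCA", "tEblAvarCAC"]:
        print(w, "=", " + ".join(m for m, _ in segment(w)))
```

A spell-checker for a Type II language could use this kind of slot-by-slot stripping to validate a long word against a short root dictionary instead of listing every inflected form.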
    • Language Categories. Type III (very particular, like Tamil): the adjoined element is not a grammatical unit such as a clitic or a case marker but a meaningful element such as a noun, verb or adjective. Joining these together may or may not be accompanied by sandhi, both internal and external. In other words, words are chains of meaningful elements all joined together. Type III languages also include Type II structures. Example: Paiyan + um + ai = paiyanumai BUT ilai + ukku + ai = ilaiyukkuvai.
    • Language Categories. Type IV (like Tibetan or Bhutanese): words are not separated from one another as distinct entities; only the sentence is demarcated, by a diacritic marker such as a full stop. Sandhi may or may not accompany the concatenation of the elements. A possible solution to this problem is to insert artificial delimiters at the word boundaries. Reference: "Designing Indian language spell-checker, A work manual", C-DAC (GIST Group).
    • Indian Languages. 15 officially recognized languages in India: Hindi, Marathi, Sanskrit, Punjabi, Gujarati, Oriya, Bengali, Assamese, Telugu, Kannada, Malayalam, Tamil, Urdu, Sindhi and Kashmiri. Urdu, Sindhi and Kashmiri are primarily written in Perso-Arabic scripts, but are written in Devanagari too (Sindhi is also written in the Gujarati script).
    • Indian Languages. Indian scripts have evolved from the ancient Brahmi script: they share a common phonetic structure, so a common character set is possible. The Northern scripts: Devanagari, Punjabi, Gujarati, Oriya, Bengali and Assamese. The Southern scripts: Telugu, Kannada, Malayalam and Tamil.
    • Indian Languages - Standards. Attribute mechanism: for selecting different Indian script fonts and display attributes. Extension mechanism: allows the use of more characters along with the ISCII code. These are meant only for environments where no other selection mechanism is available.
    • Indian Languages - Standards. The ISCII code table: a superset of all the characters required in the ten Brahmi-based Indian scripts. For convenience, the alphabet of the official script, Devanagari (with diacritic marks for non-Devanagari alphabets), has been used in the standard. For notational simplicity, elsewhere the term "Indian scripts" implies Brahmi-based Indian scripts. ISFOC: the Indian Script Font Code is used for displaying Indian language characters on various output devices, e.g. screen and printer.
    • Enabling Indian Languages on the Web: content creation in Indian languages. Fonts: selected by font face (Font Face = "DV-TTYogesh"); the font is either local or dynamic. Font encoding standards. Output types: static content, whose major part is the font; dynamic content, which changes according to the client's requirements.
    • Enabling Indian Languages on the Web: dynamic HTML pages. Data is updated on the fly as per a query. Data is stored in ISCII / ISFOC in a database. Required software tools: ActiveX controls / plugins; ISCII-to-ISFOC conversion tools. Data storage (preferred in ISCII): converting from ISCII to any font code is easier; sorting or indexing can be done only in ISCII format; speech synthesis; less disk space.
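Why sorting works on ISCII but not on font codes can be illustrated with a small Python sketch. The three ISCII byte values are taken from the ISCII-91 table as I recall it and should be treated as an assumption; the font glyph codes are entirely invented for illustration and are not a real ISFOC table. Real ISCII-to-ISFOC conversion also has to deal with conjuncts and glyph reordering, which this one-to-one table ignores.

```python
# ISCII byte values for three Devanagari consonants in alphabetical order
# (assumed from the ISCII-91 code table: ka, kha, ga are consecutive).
ISCII = {"ka": 0xB3, "kha": 0xB4, "ga": 0xB5}

# HYPOTHETICAL glyph codes for a display font: invented values, chosen only to
# show that glyph positions need not follow alphabetical order.
HYPOTHETICAL_FONT_CODE = {0xB3: 0x6B, 0xB4: 0x4B, 0xB5: 0x67}


def iscii_to_font(data: bytes) -> bytes:
    """Convert ISCII bytes to (hypothetical) font glyph codes, one byte per glyph."""
    return bytes(HYPOTHETICAL_FONT_CODE[b] for b in data)


def names(records, table):
    """Map single-byte records back to letter names for display."""
    reverse = {code: name for name, code in table.items()}
    return [reverse[r[0]] for r in records]


stored = [bytes([ISCII["ga"]]), bytes([ISCII["ka"]]), bytes([ISCII["kha"]])]

# Sorting the ISCII bytes gives alphabetical order.
by_iscii = sorted(stored)

# Sorting after conversion to font codes gives glyph-table order instead.
font_table = {n: HYPOTHETICAL_FONT_CODE[c] for n, c in ISCII.items()}
by_font = sorted(iscii_to_font(r) for r in stored)

if __name__ == "__main__":
    print("sorted as ISCII:     ", names(by_iscii, ISCII))
    print("sorted as font codes:", names(by_font, font_table))
```

Storing ISCII in the database and converting to a font code only at display time keeps sorting and indexing meaningful, at the cost of a conversion step per page.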
    • Enabling Indian Languages on the Web: displaying data in Indian languages. Local fonts: a set of fonts downloadable on the client side; the methodology is transparent to the user. Dynamic fonts: deliver Windows TrueType fonts to the client side in a transparent way; font delivery is automatic; PFR and EOT formats.
    • Enabling Indian Languages on the Web: dynamic fonts. Based on Bitstream's WebFont technology and Microsoft's Web Embedding Fonts Tool (WEFT). Readers see your HTML documents as you intended, regardless of language or font. EOT (Embedded OpenType format): Microsoft's way of sending encoded fonts to the user; works only in IE 4.0 and above; EOTs are tied to specific URL roots. PFR (Portable Font Resource): multi-platform.
    • Enabling Indian Languages on the Web: PFR (Portable Font Resource), Bitstream's WebFont technology. Any TrueType font can be converted to a dynamic font. Works in both Netscape (4.03 and above) and IE (4.0 and above); in IE, however, there is a one-time download of a control onto the client's machine. URL security: a PFR can be locked to particular URLs. PFRs are more stable than EOTs, but they need encoding changes in IE 5.0: sometimes the user has to set the encoding to 'User Defined' to view the page properly. PFRs and EOTs can coexist on a single site; the browser detects the dynamic font and displays the page accordingly.
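As a rough sketch of how a page might reference both dynamic font formats, the Python helper below emits an HTML `<head>` fragment containing a Netscape-style `<link rel="fontdef">` for the PFR and an `@font-face` rule for the EOT. The helper name `dynamic_font_head` and the example URLs are hypothetical; the font face DV-TTYogesh comes from the slides. The one-time control that IE needs for PFR support is not shown.

```python
from html import escape


def dynamic_font_head(font_family: str, pfr_url: str, eot_url: str) -> str:
    """Return an HTML <head> fragment that references a PFR and an EOT font.

    Netscape 4.x reads the <link rel="fontdef"> element (PFR);
    IE 4.0+ reads the @font-face rule (EOT).
    """
    return "\n".join([
        f'<link rel="fontdef" src="{escape(pfr_url, quote=True)}">',
        '<style type="text/css">',
        "@font-face {",
        f'  font-family: "{font_family}";',
        f"  src: url({eot_url});",
        "}",
        "</style>",
    ])


if __name__ == "__main__":
    # Hypothetical font file locations, for illustration only.
    print(dynamic_font_head(
        "DV-TTYogesh",
        "http://www.example.com/fonts/yogesh.pfr",
        "http://www.example.com/fonts/yogesh.eot",
    ))
```

The page body would then select the font by name, for example with `<font face="DV-TTYogesh">`, and each browser picks up whichever font definition it understands.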
    • Architecture for Web applications enabled for Indian languages. [Diagram. Recoverable labels: an ISCII database (searching and sorting possible); static HTML pages in ISFOC (font code may change; searching and sorting not possible); converters; business logic / Web application; HTTP server; four clients (Client 1* to Client 4*) running IE or IE / Netscape, either view-only or interactive, served through ActiveX, dynamic fonts, manually installed fonts or iPlugin.]
    • Unicode ranges (hexadecimal code points):
      Devanagari   0900 – 097F
      Gurmukhi     0A00 – 0A7F
      Gujarati     0A80 – 0AFF
      Oriya        0B00 – 0B7F
      Bengali      0980 – 09FF
      Telugu       0C00 – 0C7F
      Kannada      0C80 – 0CFF
      Malayalam    0D00 – 0D7F
      Tamil        0B80 – 0BFF
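These ranges make it straightforward to detect the script of a character from its code point. The Python sketch below assumes the standard Unicode block boundaries (with Devanagari taken as 0900 to 097F); the names `BLOCKS` and `script_of` are illustrative, not from the presentation.

```python
# (script name, first code point, last code point), sorted by code point.
BLOCKS = [
    ("Devanagari", 0x0900, 0x097F),
    ("Bengali",    0x0980, 0x09FF),
    ("Gurmukhi",   0x0A00, 0x0A7F),
    ("Gujarati",   0x0A80, 0x0AFF),
    ("Oriya",      0x0B00, 0x0B7F),
    ("Tamil",      0x0B80, 0x0BFF),
    ("Telugu",     0x0C00, 0x0C7F),
    ("Kannada",    0x0C80, 0x0CFF),
    ("Malayalam",  0x0D00, 0x0D7F),
]


def script_of(ch: str):
    """Return the Indian script block containing a single character, or None."""
    cp = ord(ch)
    for name, first, last in BLOCKS:
        if first <= cp <= last:
            return name
    return None


if __name__ == "__main__":
    # U+0915 DEVANAGARI LETTER KA, U+0BA4 TAMIL LETTER TA, and a Latin letter.
    for ch in ["\u0915", "\u0BA4", "a"]:
        print(f"U+{ord(ch):04X}", script_of(ch))
```

A Web application could use such a lookup to choose which font face to request for a run of text.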