  1. 1. Subproject 4: HTML-WML Transcoding System. Jia-Shung Wang, Computer Science Department, National Tsing Hua University. March 27, 2001
  2. 2. Outline <ul><li>Motivation and Issues </li></ul><ul><li>Examples of Transcoding </li></ul><ul><li>System Overview and Translation Flow </li></ul><ul><li>Some HTML to WML Conversion Strategies </li></ul>
  3. 3. Information Appliances <ul><li>Design constraints differ with intended use, which enhances ease of use </li></ul><ul><ul><li>Desktop PC </li></ul></ul><ul><ul><li>Mobile PC </li></ul></ul><ul><ul><li>Desktop “Smart” Phone </li></ul></ul><ul><ul><li>Mobile Telephone </li></ul></ul><ul><ul><li>Personal Digital Assistant </li></ul></ul><ul><ul><li>Set-top Box </li></ul></ul><ul><ul><li>Digital VCR </li></ul></ul><ul><ul><li>… </li></ul></ul><ul><li>Implications: </li></ul><ul><ul><li>Shift from computer design to consumer design </li></ul></ul><ul><ul><li>Heterogeneous “standards,” hybrid networking </li></ul></ul><ul><ul><li>Interactive networking, access on demand, QoS </li></ul></ul>
  4. 4. Motivation <ul><li>Rapidly growing diversity of wireless communication devices </li></ul><ul><li>Rapid growth in the number of HTML web pages available on the Internet </li></ul><ul><li>Need for solutions that let mobile devices with WML browsers access the existing HTML or WML pages on the Internet </li></ul>
  5. 5. Issues <ul><li>Device-enabled service for WML mobile devices with different screen types </li></ul><ul><li>Bandwidth-driven transmission for rapid response and fast delivery </li></ul><ul><li>Use of browsing behavior </li></ul><ul><li>Resizing of images and icons </li></ul><ul><li>Compression of the resulting WML data </li></ul>
  6. 6. Demos of Transcoding <ul><li>Contents from </li></ul><ul><ul><ul><li>enYES 鉅亨網 </li></ul></ul></ul><ul><ul><ul><li>USAtoday </li></ul></ul></ul><ul><ul><ul><li>CS, NTHU </li></ul></ul></ul><ul><ul><ul><li>NTHU </li></ul></ul></ul><ul><ul><ul><li>VOD </li></ul></ul></ul>
  7. 7. Discussions <ul><li>enYES provides two versions, regular HTML and WAP, to serve PC users and mobile device users separately. </li></ul><ul><li>USAtoday also provides a simplified version of its content for Palm users. </li></ul><ul><li>NTHU and CS-NTHU homepages: if the original figures are kept to preserve their link information, the page layout becomes odd (using an HTML browser: Browse-It). </li></ul><ul><li>VOD homepage, one-column text: no significant difference after transcoding. </li></ul>
  8. 24. Usage of Browsing Behavior <ul><li>Automatic translation is complicated by the diversity of content posted on an HTML page. </li></ul><ul><li>A universal conversion strategy that translates every HTML page into sequences of WML decks effectively is unlikely to exist. </li></ul><ul><li>However, it seems a good idea to first classify the HTML page to be translated according to the browsing behavior it serves. </li></ul>
  9. 25. Usage of Browsing Behavior (cont’d) <ul><li>Once the page is classified, we can tell what the client requires. We can then apply a corresponding conversion that extracts the required content step by step and translates it into predictable, small WML documents. </li></ul><ul><li>We believe that, after classification, adequate conversions exist for some kinds of web pages. </li></ul>
  10. 26. Related Works: Transcoding Proxy of IBM alphaWorks <ul><li>Its goal is to manage different versions of content with different fidelities and modalities in order to adapt delivery to different client devices. </li></ul>
  11. 28. Related Works: Intel Quick Web Technology <ul><li>A software capability that helps Internet providers and digital distribution companies increase the delivery speed of Web pages containing photos, drawings and other graphics. </li></ul><ul><li>It uses two key techniques: compression and caching. </li></ul>
  12. 30. Related Works: Spyglass Prism <ul><li>Spyglass Prism dynamically adapts Web content to match various non-PC devices. </li></ul><ul><li>It functions as a proxy server, caches the converted content, and dynamically converts standard HTML to WML. </li></ul>
  13. 31. Related Works: Proxy Architecture for Efficient Web Browsing over Cellular Networks <ul><li>Decreases the access time of WWW browsing in narrowband wireless environments. </li></ul><ul><li>It adopts persistent connections and pipelining in a proxy architecture to improve HTTP processing between the client and the proxy server. </li></ul>
  14. 33. Comparisons between HTML and WML <ul><li>Both make use of tags and attributes. </li></ul><ul><li>Similar character set, syntax and data types. </li></ul><ul><li>Two special elements of WML structure </li></ul><ul><ul><li>Deck and Card </li></ul></ul><ul><li>Different design goals </li></ul><ul><ul><li>HTML: to publish hypertext on the World Wide Web </li></ul></ul><ul><ul><li>WML: for devices with narrow network bandwidth, small displays, limited memory and fewer computational resources. </li></ul></ul>
  15. 34. Examples of HTML and WML. WML (the <wml> element is the deck; each <card> is one screen): <wml> <card> <p> <do type="accept"> <go href="#card2"/> </do> This is the first card... </p> </card> <card id="card2"> <p> This is the second card. </p> </card> </wml> HTML: <html> <head> <title> Example page. </title> </head> <body> <h1> This is a headline. </h1> <p> This is a paragraph. </p> </body> </html>
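
The deck-and-card structure in the WML example can also be produced programmatically. The following is a minimal sketch, not the system's actual generator: it assumes a hypothetical `build_deck` helper and uses Python's standard `xml.etree.ElementTree`. In WML the `<wml>` root element is itself the deck, and each `<card>` is one screen.

```python
import xml.etree.ElementTree as ET


def build_deck(cards):
    """Build a WML deck from (card_id, text, next_card_id) tuples.

    build_deck is a hypothetical helper for illustration only.
    next_card_id may be None when the card has no "accept" action.
    """
    wml = ET.Element("wml")  # the <wml> root is the deck
    for card_id, text, next_id in cards:
        card = ET.SubElement(wml, "card", id=card_id)
        if next_id is not None:
            # "accept" soft key jumps to another card in the same deck
            do = ET.SubElement(card, "do", type="accept")
            ET.SubElement(do, "go", href="#" + next_id)
        paragraph = ET.SubElement(card, "p")
        paragraph.text = text
    return ET.tostring(wml, encoding="unicode")


deck_xml = build_deck([
    ("card1", "This is the first card...", "card2"),
    ("card2", "This is the second card.", None),
])
```

Building the tree with an XML library rather than string concatenation keeps the output well-formed, which WML browsers require.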
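  16. 35. System Overview [Diagram: a Client (WML browser, etc.) sends WAP or HTTP requests to the Translation Server, which contains the HTML Parser, HTML-WML Translator, and WML Generator and returns WML. The Translation Server retrieves HTML and WML documents, multimedia content, CGI scripts, etc. from the Web Server over HTTP.]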
  17. 36. Features <ul><li>An HTML-WML Translator on the Translation Server </li></ul><ul><li>Both HTTP and WAP requests are acceptable. </li></ul><ul><li>Java Servlet API compatible </li></ul><ul><li>Server- and platform-independent </li></ul>
  18. 37. Translation Server: Components and Flow [Diagram: requests and responses pass through the Network Protocol layer and the Proxy; the translation path consists of the HTML Parser, Document Analyzer, Filter, Decks & Cards, Link Builder, and WML Generator.]
  19. 38. Components <ul><li>Gateway </li></ul><ul><ul><li>Accept requests from clients </li></ul></ul><ul><ul><li>Return appropriate responses </li></ul></ul><ul><li>Proxy Servlet </li></ul><ul><ul><li>Get the requested remote documents </li></ul></ul><ul><ul><li>Decide whether to pass a document through or convert it </li></ul></ul><ul><ul><li>Cache the converted results </li></ul></ul>
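
The Proxy Servlet's three duties (fetch, pass or convert, cache) can be sketched as follows. This is an illustrative sketch only, not the servlet's actual code: the class name, the `fetch` and `convert` callables, and the `client_accepts_html` flag are all assumptions. The only external fact used is the registered WML media type, `text/vnd.wap.wml`.

```python
WML_TYPE = "text/vnd.wap.wml"  # registered media type for WML


class TranscodingProxy:
    """Hypothetical proxy: fetch, decide pass/convert, cache conversions."""

    def __init__(self, fetch, convert):
        self.fetch = fetch      # assumed callable: url -> (content_type, body)
        self.convert = convert  # assumed callable: html text -> wml text
        self.cache = {}         # url -> converted WML

    def handle(self, url, client_accepts_html):
        # Serve a cached conversion to WML-only clients when one exists.
        if not client_accepts_html and url in self.cache:
            return WML_TYPE, self.cache[url]
        content_type, body = self.fetch(url)
        # Pass through anything the client can display or that is not HTML.
        if client_accepts_html or content_type != "text/html":
            return content_type, body
        wml = self.convert(body)
        self.cache[url] = wml
        return WML_TYPE, wml
```

A real deployment would also need cache expiry and per-device cache keys, which this sketch leaves out.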
  20. 39. Components (cont’d) <ul><li>HTML Parser </li></ul><ul><ul><li>Parse the HTML document into a parse tree </li></ul></ul><ul><li>Document Analyzer </li></ul><ul><ul><li>Analyze the parse tree </li></ul></ul><ul><li>Filter </li></ul><ul><ul><li>Filter out any objects that are unnecessary or not supported by the client device </li></ul></ul><ul><ul><li>Image/icon resizing </li></ul></ul>
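
The parse-and-filter stage can be illustrated with Python's standard `html.parser`. This is a sketch under stated assumptions: the set of unsupported tags and the helper names `extract_text` and `fit_within` are hypothetical, and the slides do not say which objects the real filter removes or how images are scaled.

```python
from html.parser import HTMLParser

# Assumed list of elements a WML client cannot render.
UNSUPPORTED = {"script", "style", "applet", "object"}


class FilteringParser(HTMLParser):
    """Collect visible text while skipping unsupported elements."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside an unsupported element
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag in UNSUPPORTED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in UNSUPPORTED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.text.append(data.strip())


def extract_text(html):
    parser = FilteringParser()
    parser.feed(html)
    parser.close()
    return parser.text


def fit_within(width, height, max_width, max_height):
    """Scale image dimensions down to fit a screen, keeping aspect ratio."""
    if width <= max_width and height <= max_height:
        return width, height
    if width * max_height > height * max_width:
        # Width is the limiting dimension.
        return max_width, max(1, height * max_width // width)
    return max(1, width * max_height // height), max_height
```

Integer arithmetic in `fit_within` avoids floating-point rounding surprises when computing the scaled size.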
  21. 40. Components (cont’d) <ul><li>Content Divider </li></ul><ul><ul><li>Split a document into multiple small documents </li></ul></ul><ul><li>Link Maker </li></ul><ul><ul><li>Insert extra links so that the small documents can reach one another </li></ul></ul><ul><li>WML Generator </li></ul><ul><ul><li>Produce well-formed WML documents and return them to the Proxy Servlet </li></ul></ul>
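
The Content Divider and Link Maker steps can be sketched together. The splitting rule here (a character budget per card) and the function names `divide` and `link` are assumptions for illustration; the slides do not specify how the real divider chooses split points.

```python
def divide(paragraphs, max_chars):
    """Greedily pack paragraphs into cards of at most max_chars characters.

    A single paragraph longer than max_chars still gets its own card.
    """
    cards, current, size = [], [], 0
    for para in paragraphs:
        if current and size + len(para) > max_chars:
            cards.append(current)
            current, size = [], 0
        current.append(para)
        size += len(para)
    if current:
        cards.append(current)
    return cards


def link(cards):
    """Attach previous/next links so the cards can reach one another."""
    linked = []
    for i, body in enumerate(cards):
        linked.append({
            "id": f"card{i + 1}",
            "body": body,
            "prev": f"#card{i}" if i > 0 else None,
            "next": f"#card{i + 2}" if i + 1 < len(cards) else None,
        })
    return linked
```

For example, `link(divide(["aaaa", "bbbb", "cc", "dddd"], 8))` yields two cards, the first linking forward to `#card2` and the second linking back to `#card1`.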
  22. 41. HTML to WML Conversion Tools <ul><li>Semi-automatic: </li></ul><ul><ul><li>Used for rich HTML documents </li></ul></ul><ul><ul><li>The conversion form is designated manually with the help of analysis and editing tools. </li></ul></ul><ul><ul><li>The resulting forms are distributed to the gateway servers. </li></ul></ul><ul><li>Automatic: </li></ul><ul><ul><li>Used for simple documents, such as News and BBS, … </li></ul></ul>
  23. 42. HTML to WML Conversion Strategies <ul><li>Strategy I: Tables to Lists </li></ul><ul><ul><li>Remove all layout elements, such as tables </li></ul></ul><ul><ul><li>Arrange all content into a single fixed-width column </li></ul></ul><ul><li>Strategy II: One Table One Deck </li></ul><ul><ul><li>Extract each table to form a deck </li></ul></ul>
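
Strategies I and II can be contrasted on the same input. The sketch below uses Python's standard `html.parser`; the `TableCollector` class and the rule that text belongs to its innermost enclosing table are assumptions made for illustration, not the system's documented behavior.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Record text both in document order and grouped by enclosing table."""

    def __init__(self):
        super().__init__()
        self.flat = []    # Strategy I: every text item, in document order
        self.tables = []  # Strategy II: one list of text items per table
        self.stack = []   # indices of the currently open tables

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
            self.stack.append(len(self.tables) - 1)

    def handle_endtag(self, tag):
        if tag == "table" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        self.flat.append(text)
        if self.stack:
            # Assumption: text belongs to its innermost enclosing table.
            self.tables[self.stack[-1]].append(text)


def collect(html):
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    return parser


page = ("<table><tr><td>A</td><td>B</td></tr></table>"
        "<p>X</p><table><tr><td>C</td></tr></table>")
result = collect(page)
single_column = result.flat                  # Tables to Lists
decks = [t for t in result.tables if t]      # One Table One Deck
```

Here `single_column` keeps everything in one list, while `decks` holds one group per table; text outside any table appears only in the flat list.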
  24. 43. HTML to WML Conversion Strategies (cont’d) <ul><li>Strategy III: Preview First </li></ul><ul><ul><li>a. Apply One Table One Deck </li></ul></ul><ul><ul><li>b. Collect the first card of every deck as a preview card </li></ul></ul><ul><ul><li>c. Arrange these preview cards into a preview deck, which is transmitted first; every preview card has a link to its corresponding deck </li></ul></ul>
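
Steps b and c of Preview First can be sketched as a small function over decks that have already been produced by One Table One Deck. The data layout (a deck as a list of card texts) and the `deckN.wml` link names are hypothetical, chosen only to make the example concrete.

```python
def build_preview_deck(decks):
    """Build a preview deck from the first card of every non-empty deck.

    decks: list of decks, each a list of card texts.
    Link targets of the form "deckN.wml" are hypothetical names.
    """
    preview = []
    for number, deck in enumerate(decks, start=1):
        if deck:
            preview.append({
                "summary": deck[0],            # first card acts as the preview
                "href": f"deck{number}.wml",   # link to the full deck
            })
    return preview


decks = [["News headline", "News body"], ["Weather today"], []]
preview_deck = build_preview_deck(decks)
```

Because the preview deck is sent first, the client downloads a full deck only when the user follows one of these links.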
  25. 44. Original Document [Diagram: a <document> containing <table> elements that hold sections 1 to 4. Section 1 has contents 1_1 and 1_2, section 2 has contents 2_1 to 2_5, section 3 has contents 3_1 to 3_7, and section 4 has content 4_1.]
  26. 45. Tables to Lists [Diagram: contents 1_1 to 4_1 of the original document laid out in sequence and split across three decks.]
  27. 46. One Table One Deck [Diagram: the same contents regrouped into five decks.]
  28. 47. Preview First [Diagram: the same contents regrouped into five decks, reached through preview cards.]
  29. 48. Strategy Evaluation <ul><li>Assuming we have S sections in a document and the document is translated to N WML cards. </li></ul><ul><li>Every deck contains at most C cards. </li></ul><ul><li>Assuming that the contents in the same tables are similar. </li></ul>
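  30. 49. Evaluation of Searching After Translation <ul><li>Tables to Lists: user friendliness Worst; average deck access time N/2 </li></ul><ul><li>One Table One Deck: user friendliness Best; average deck access time S/2 </li></ul><ul><li>Preview First: user friendliness Good; average deck access time S/2C </li></ul>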
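
To make the access-time expressions concrete, the sketch below evaluates them for sample values of S (sections), N (cards), and C (maximum cards per deck). It assumes the flattened table is read positionally, so that N/2 belongs to Tables to Lists, S/2 to One Table One Deck, and S/2C to Preview First; the sample values are invented for illustration.

```python
from fractions import Fraction


def average_deck_access(S, N, C):
    """Average deck access time per strategy, as exact fractions.

    S: number of sections, N: number of WML cards,
    C: maximum number of cards per deck.
    """
    return {
        "Tables to Lists": Fraction(N, 2),
        "One Table One Deck": Fraction(S, 2),
        "Preview First": Fraction(S, 2 * C),
    }


# Illustrative values only: 8 sections, 40 cards, at most 4 cards per deck.
costs = average_deck_access(S=8, N=40, C=4)
```

With these values the expressions evaluate to 20, 4, and 1 respectively, which shows how the cost shrinks as the search moves from individual cards to sections to preview cards.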
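  31. 50. Performance Evaluation (sizes in bytes; “Reduction” is the size of the WML decks as a percentage of the HTML page size) <ul><li>Experiment #1: HTML page headers 21,203, text source 8,325, images 280,727; WML decks 16,891; reduction 57.2% without images, 5.4% with images </li></ul><ul><li>Experiment #2: HTML page headers 17,937, text source 6,137, images 126,740; WML decks 11,232; reduction 46.7% without images, 7.4% with images </li></ul><ul><li>Experiment #3: HTML page headers 24,359, text source 9,471, images 176,361; WML decks 7,440; reduction 22.0% without images, 3.5% with images </li></ul><ul><li>Experiment #4: HTML page headers 9,568, text source 20,363, images 17,966; WML decks 12,062; reduction 40.3% without images, 25.2% with images </li></ul>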
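
The percentages in this table can be reproduced from the byte counts: each one is the WML deck size divided by the HTML page size, where the HTML size is headers plus text source, with or without the image bytes. The following sketch checks that arithmetic; the dictionary layout and function name are assumptions, and the numbers are copied from the slide.

```python
# Per experiment: the two non-image HTML figures (headers and text source),
# the image bytes, and the size of the resulting WML decks.
EXPERIMENTS = {
    1: {"html_parts": (21203, 8325), "images": 280727, "wml": 16891},
    2: {"html_parts": (17937, 6137), "images": 126740, "wml": 11232},
    3: {"html_parts": (24359, 9471), "images": 176361, "wml": 7440},
    4: {"html_parts": (9568, 20363), "images": 17966, "wml": 12062},
}


def size_ratios(entry):
    """Return (without_images, with_images) as percentages, 1 decimal."""
    html = sum(entry["html_parts"])
    without_images = 100 * entry["wml"] / html
    with_images = 100 * entry["wml"] / (html + entry["images"])
    return round(without_images, 1), round(with_images, 1)


ratios = {number: size_ratios(entry) for number, entry in EXPERIMENTS.items()}
```

The computed pairs match the slide's figures, which confirms that the column labelled “Reduction” reports the remaining size rather than the amount removed.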
  32. 51. Performance Evaluation (Experiment #1: What’s WAP) [Diagram: deck hierarchies for the “What’s WAP” and “WAP Forum” pages, each entered through a preview deck that links to numbered content decks.]
  33. 52. Performance Evaluation (Experiment #2: NTHU Web Page) [Diagram: deck hierarchies for the “NTHU”, “Current Status”, “History”, and “About NTHU” pages, each entered through a preview deck that links to numbered content decks.]
  34. 53. Performance Evaluation (Experiment #3: NTHU CS Web Page) [Diagram: deck hierarchies for the “NTHU CS” and “Faculty” pages, each entered through a preview deck that links to numbered content decks.]
  35. 54. Performance Evaluation (Experiment #4: IETF Web Page) [Diagram: deck hierarchies for the “IETF”, “Internet-Drafts”, “Internet-Drafts Index”, and “DNSOP” pages, each entered through a preview deck that links to numbered content decks.]
  36. 55. Implementation <ul><li>Goals: portability, reusability, and crash protection. </li></ul><ul><li>Translation server: runs in a Java environment with Java Servlet, Java HTML Tidy, and XML Parser for Java. </li></ul><ul><li>Servlet-enabled servers: Avenida Web Server and Nokia WAP Server </li></ul><ul><li>Platform: Microsoft Windows NT Workstation 4.0 with Service Pack 5 </li></ul>
  37. 56. Summary <ul><li>Designed an HTML-to-WML transcoding system with </li></ul><ul><ul><li>Analysis and filtering of HTML content </li></ul></ul><ul><ul><li>Image/icon resizing </li></ul></ul><ul><ul><li>WML browsing mode design and a WML conversion tool </li></ul></ul><ul><ul><li>Compression and decompression modules for the WML data </li></ul></ul><ul><ul><li>WML transmission control </li></ul></ul>