Web Layout Mining - JECS 29(2)


Imran Sarwar Bajwa, M. Abbas Choudhary [2007], "Web Layout Mining (WLM):A Paradigm for Intelligent Web Layout Design", Egyptian Computer Science Journal (ECSJ), May 2007, Vol. 29, No. 02, pp:33-39


International Journal of Egyptian Computer Society, Volume 29, Number 2

WEB LAYOUT MINING USING NLP: A PARADIGM FOR INTELLIGENT WEB LAYOUT DESIGN

Imran Sarwar Bajwa1, M. Abbas Choudhary2
1 CISUC - Department of Informatics Engineering, University of Coimbra, 3030 Coimbra, Portugal. E-mail: imran@dei.uc.pt
2 Higher Education Commission of Pakistan, Islamabad, Pakistan. E-mail: machoudhary@hec.gov.pk

Abstract: The challenge in designing modern websites is to produce content that follows the latest trends and styles. Common website editors only help to draw an intended layout; the real problem is to design an accurate web layout that meets current demands, trends and styles. Such editors are useful when the user already has a specific layout in mind and is familiar enough with web page layout principles to know what kinds of layouts are possible. It is intrinsically difficult, particularly for those with limited artistic and creative abilities, to design from scratch a good layout that is acceptable in every respect. An automated system is therefore required that can mine the layouts of the desired type of website. The designed system for “Web Layout Mining (WLM)” helps to mine the most popular web layouts from the Internet and to design a web layout that is close to acceptable and has the features that modern requirements demand. The system is based on a rule-based algorithm that helps the user to search for samples related to his website category; the user then chooses a desired web layout and designs his own, with modifications and variations according to his own requirements.

Keywords: Web layout mining, Human Computer Interaction, Information retrieval, Natural language processing.

1. INTRODUCTION
A successful web page design is typically based on the latest web design trends.
The layout of a web page for an educational institute is quite different from the layout of a website designed for showbiz. Similarly, the layout of a sports web page is quite different from that of a website designed for commercial purposes [2]. Therefore, a website designer who is going to design a business, showbiz, sports, education or informative website, or a website for other general content, should be well aware of the current and latest trends and styles in the particular discipline.

A website for an educational institute such as a university or a college usually has a characteristic style, with a menu bar at the top containing links such as home, faculty, admissions, research, contact us, etc. Typically these websites have both static and dynamic pages [5]. Such web pages do not have extensive menus or a large amount of content. They only need to demonstrate the standard of education at the institute, so they need a flexible design that promotes their educational policies and research activities. These websites have more graphical and visual content than normal websites [1]. A person designing an educational website should know which content to highlight and which material to prioritize.

Business and showbiz websites are mostly more colourful and attractive than other websites [14]. They also have many menus and options and are often dynamic. These websites highlight their content using many images and colours. Because this content is heavy in terms of size, website designers have to make the layout efficient and effective so that the site can be browsed easily and quickly [22]. Sports and newspaper websites are very simple and not very colourful, but they generate dynamic content, such as updated scorecards and hourly updates of sports, business and weather news. Websites for other general content also have their own characteristic layout styles.

A web layout is basically the arrangement of the various web contents on a web page, and it is an integral and significant component of the structural design of a website. Websites of different kinds often have their own characteristic web layouts and designs.
For example, business websites use more user forms and reports, whereas informative websites have more menus and more graphics [5]. Introductory websites, such as those of educational institutes and universities, are more regular, while commercial and showbiz websites are more irregular and informal. Showbiz and personal web pages have more animated pictures, audio and video content than any other kind of website. Because of these differences, each website has its own set of requirements for design and development.

2. Problem statement
Generally, WYSIWYG (What You See Is What You Get) editors are used for designing websites [4]. WYSIWYG editors are word-processor-like software that displays the page almost exactly as it will appear after publishing or printing [17]. It is very difficult for inexperienced web designers to design good-looking web page layouts, especially professional-level web pages. WYSIWYG editors only help to draw the intended layout; the problem is to design an accurate web layout that meets current demands, trends and styles [19]. It is intrinsically difficult, particularly for those with limited artistic and creative abilities, to design from scratch a good layout that is acceptable in every respect.
The designed system, “Web Layout Mining (WLM): A new paradigm for Intelligent Web layout Design”, helps to mine the most popular web layouts from the Internet and to design a web layout that is close to acceptable and has the features that modern requirements demand. The system is based on a rule-based algorithm that helps the user to search for samples related to his website category; the user then chooses a desired web layout and designs his own, with modifications and variations according to his own requirements. This is an effective way of designing attractive web pages with less effort and in less time.

3. Intelligent Web Designing
Among the many fields of software engineering, web designing is one of the important ones; it has revolutionized communication, information interchange and business styles. Designing a successful and excellent website is a real technical task. The field of web design comprises various aspects. A web designer has to concentrate on web contents, web technology, web visuals and web economics [3]. Web contents are the actual data, facts and figures placed on a web page. They provide the building blocks for the complete design of the building, that is, the website. Web technology provides the actual functionality of a website in the form of forms, reports, dynamic web content generation and other features [15]. The core functionality of a website depends on the particular web technology used to build it. Web visuals relate primarily to the outlook, shape and look and feel of a website. This is the feature that principally attracts viewers and encourages them to browse that particular website.
Web visuals may consist of static images, animated images, and audio and video streams for a better and longer-lasting impact. Finally, web economics contributes economic considerations where required [3]. Web economics helps web surfers to perform business transactions through the web.

In the early days of web design, websites were small, simple and static. Information was limited and websites were typically specific, so design was easy and straightforward. Nowadays the data and other aspects of a website have grown explosively owing to advances in technologies and requirements [6, 8, 11]. A website can be successful and excellent on the basis of various factors, such as its usefulness, correctness, usability and pleasant appearance. More or less all of these features are directly related to the structural design of a website. Successful and effective websites are useful to their users. A website is useful if it offers both utility and usability.
  • Utility describes the website’s functionality, that is, whether the user can easily meet his requirements and needs.
  • Usability describes the ability to manipulate the site’s features in order to accomplish a particular goal.
  • Correctness is also a noteworthy issue. The user should find precise and relevant information on a particular web page.
  • Pleasant appearance is a main key to the success or failure of a particular website. The more pleasant a website is, the better its chances of success and usefulness.

All four of these features ultimately relate to the layout design of a website and more or less contribute to its success. A website may fail because of a complex and unrealistic design [9]. Unrealistic design means that the functions provided by the website are so confusing that it is not functionally useful. Usable sites are easy to learn, efficient, and help users to accomplish their tasks easily, satisfactorily and without errors [18]. Layout design is difficult because of its vast scope: it involves tangible and intangible factors with a high degree of vitality and subjectivity.

4. Related Work
Generally, interactive software applications with WYSIWYG (What You See Is What You Get) interfaces are used for web design; in them the user can edit the document visually without explicitly typing HTML tags [23]. Many web design tools, such as GoLive, FrontPage, WebSphere, HomePageBuilder and Dreamweaver, are available for creating web pages [16]. This approach is useful when the user already has a specific layout in mind and is familiar enough with web page layout principles to know what kinds of layouts are possible. However, WYSIWYG interfaces are not very helpful in the early stages of design because their editing process does not support quick exploration of multiple possibilities [5].
Research in visual interface layout design emerged with the advent of new visual applications such as web layouts and graphical user interfaces for computer applications. Examples include UIDE [13] and ADDI [14]. Various methods and techniques have been defined to address the problem of automatic web layout generation. These interface applications typically support the design process and also the incorporation of domain-specific preferences [3]. They provide only part of the required functionality, because the mapping of domain objects and their properties onto corresponding visual properties in the layout design is left to the user. WebStyler [6] generates an actual HTML file from a simple sketch. It can help users to quickly obtain an HTML page corresponding to the input sketch.
DENIM [7] is a sketch-based design tool for the early stages of web design. Its authors’ user study showed that the rapid sketch interface is effective for making a design. However, DENIM is designed for professional designers who can easily derive more detailed web pages from their rough sketches.

5. Methods and Materials
The major emphasis of this research was first to search for the intended web layout and then to provide an easy interface for using any of the retrieved web layouts for one’s own purpose. The designed system works like a conventional search engine that mines for appropriate web page layouts. Typically, conventional search engines use keywords [9], while some others use search methods such as SQL and natural-language queries [10]. The user gives his query in simple natural language; the system interprets the query and searches for the desired web layouts. A list of retrieved web layouts is presented to the user. The user selects the desired web layout, and the WLM system extracts only the layout of the website, excluding its textual and image content, so that the user can add his own content to personalize the layout.

The designed WLM system works in two halves. In the first half, the system reads the user’s input text and, after analysis, extracts the necessary information. This information is then used to draw the sample web layouts. In the second half, if the user wants user forms to be drawn automatically, these can also be designed just by providing information about the forms, such as how many text boxes are required, their names and other properties.

5.1. Mining a Web Layout
The designed system, Web Layout Mining (WLM), first searches for the desired type of web layout. For example, suppose a web designer wants to build a news website. He is given the option of searching for his desired web layouts.
As shown in Figure 1.0, the user writes ‘News Website’ in the search toolbar. The user can also use other standard search engines, such as www.google.com and www.altavista.com, to search for appropriate web layouts.

As shown in Figure 1.0, various web layouts of news websites are returned to the user. Each retrieved link has two options. The first option, [open link], opens the actual link of the website; the second option, [personalize], lets the user personalize the web layout for his own website.

5.2. Personalizing a Web Layout
After selecting the web layout that he wants to personalize, the user clicks the website link, and that link is opened in a new window. This new window contains only the web layout of the selected page.
An algorithm has been designed to extract the web layout of the desired web page. It has the following steps.

Step 1 – Read the HTML code of the web page.
Step 2 – Find HTML tags such as <html>, <body>, <frame>, <table>, etc.
Step 3 – Ignore every character that lies outside these tags.
Step 4 – Create a new .html file that consists only of the structure of the website, excluding all actual content such as images and text.

Figure 1.0: Automatically generated sample web-layout from user given preferences

The output of the extracted web layout after following these steps is shown in Figure 2.0.

Figure 2.0: Automatically generated sample web-layout from user given preferences
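The four extraction steps above can be sketched in Python with the standard-library `html.parser` module. This is only an illustrative sketch, not the authors' implementation: the names `STRUCTURAL_TAGS`, `LayoutExtractor`, `extract_layout` and `write_layout` are hypothetical, and the exact set of tags kept is an assumption, since the paper lists only <html>, <body>, <frame>, <table> "etc.".

```python
from html import escape
from html.parser import HTMLParser
from pathlib import Path

# Assumed set of structural tags (Step 2). The paper names <html>, <body>,
# <frame> and <table>; the other entries are illustrative guesses.
STRUCTURAL_TAGS = {"html", "body", "frameset", "frame", "table", "tr", "td"}
VOID_TAGS = {"frame"}  # elements that never take a closing tag


class LayoutExtractor(HTMLParser):
    """Keeps structural tags and drops text, images and all other markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in STRUCTURAL_TAGS:
            return  # Step 3: anything that is not a structural tag is ignored
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
        )
        self.parts.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        if tag in STRUCTURAL_TAGS and tag not in VOID_TAGS:
            self.parts.append(f"</{tag}>")

    # handle_data is deliberately not overridden, so page text is discarded.


def extract_layout(html_source: str) -> str:
    """Steps 1-3: read the HTML source and keep only its structural tags."""
    parser = LayoutExtractor()
    parser.feed(html_source)
    parser.close()
    return "".join(parser.parts)


def write_layout(html_source: str, target: str) -> None:
    """Step 4: store the bare layout in a new .html file."""
    Path(target).write_text(extract_layout(html_source), encoding="utf-8")
```

For example, calling `extract_layout` on a page whose single table cell holds an image and a headline returns the same table skeleton with an empty cell, which the user can then fill with his own content.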
5.3. HTML Code for Web-Layout
After extracting this information, the designed system can generate the related HTML code on the basis of it. Based on the extracted information, the nested-tables technique is used. For this particular example, the following code is generated by the system.

    <html>
    <body>
    <center>
    <table width=760 border=1>
      <tr level=1> <!-- 1st layer -->
        <td width=760 height=140 module=1> Text </td>
      </tr>
      <tr level=2> <!-- 2nd layer -->
        <td>
          <table width=760 border=2>
            <tr level=1>
              <td width=140 height=450 module=1> Text </td>
              <td width=300 height=450 module=1> Text </td>
              <td width=300 height=450 module=1> Text </td>
            </tr>
          </table>
        </td>
      </tr>
      <tr width=760 height=140> <!-- 3rd layer -->
        <td> Text </td>
      </tr>
    </table>
    </center>
    </body>
    </html>

Code -1: Automated HTML generated code

This generated HTML code is stored in a new file. The designed system is flexible in analyzing the given text: in the given example the levels and modules are defined horizontally (first a layer and then its particular modules), and the analysis was successful. The system can also analyze text in which layers and modules are defined vertically (layers are defined first, and then modules are defined with reference to the defined layers).

6. How WLM System Works
The designed system, WLM, first searches for the desired web layouts and then helps the user to personalize a particular web layout. The whole system can be divided into two major halves:
  a- Searching Web-Layouts
  b- Personalizing Web-Layouts
In the first half, the desired type of web layout is searched for on the World Wide Web, and in the second half the selected web layout is personalized for the user’s own website application. The steps performed during web layout mining are detailed below. The system is based on the structural design shown in Figure 3.0.

[Figure 3.0 diagram: running from Input to Layout, the stages are Analyzing User Input, Searching Web-Layouts, Selecting Web-Layouts, HTML Code Generation and Personalizing Web Layout, with side labels SEARCHING and PERSONALIZING.]

Figure 3.0: Structure of Automatic Web Layout Generation using Natural Language Processing Techniques

6.1. Analyzing User Input
This is the first phase; it acquires the input text preferences from the user. The user provides his requirements in the form of paragraphs of text. This module reads the input text character by character and generates words by concatenating those characters. It is the implementation of the lexical phase: lexemes and tokens are generated in this module.

6.2. Searching Web-Layouts
This phase reads the input provided by module 1 in the form of words or tokens. These words are categorized into various classes, such as verbs, helping verbs, nouns, pronouns, adjectives, prepositions, conjunctions, etc., for purposes such as understanding and further processing of the text.
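The lexical phase of Section 6.1 and the word categorization of Section 6.2 can be illustrated with a small Python sketch. It is a simplified stand-in under stated assumptions, not the rule-based framework of the paper: the `LEXICON` table, its entries and the function names `tokenize` and `classify` are all hypothetical, and a plain dictionary lookup replaces whatever rules the designed system applies.

```python
# Hypothetical mini-lexicon mapping words to classes. The paper does not
# publish its word lists, so every entry here is an illustrative assumption.
LEXICON = {
    "i": "pronoun",
    "want": "verb",
    "need": "verb",
    "is": "helping verb",
    "with": "preposition",
    "for": "preposition",
    "and": "conjunction",
    "news": "noun",
    "sports": "noun",
    "website": "noun",
    "layout": "noun",
    "simple": "adjective",
}


def tokenize(text):
    """Section 6.1: build words by concatenating input characters."""
    tokens, current = [], []
    for ch in text:
        if ch.isalnum() or ch == "-":
            current.append(ch.lower())  # still inside a word
        elif current:
            tokens.append("".join(current))  # a delimiter ends the word
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def classify(tokens):
    """Section 6.2: attach a word class to each token via lexicon lookup."""
    return [(token, LEXICON.get(token, "unknown")) for token in tokens]
```

With this sketch, `classify(tokenize("I want a simple news website."))` pairs each word with its class, and words missing from the lexicon, such as "a" here, are tagged "unknown" rather than guessed.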
6.3. Selecting Web-layouts
This phase extracts the objects of the web layout, namely its levels and modules; levels are represented by the <tr> tag and modules by the <td> tag. Other attributes are extracted on the basis of the input provided by the preceding module.

6.4. HTML Code Generation
After the information required to draw particular HTML tags such as <table>, <tr> and <td> has been extracted, this phase generates the actual code, which divides the whole web page into component boxes. These boxes are then used to add content such as text and images.

6.5. Personalizing Web Layout
This is the final phase. It uses the information extracted in the previous phase to generate a new HTML file, in which the HTML code generated in the previous phase is embedded. The output is then provided to the user according to his requirements.

7. Conclusion
The designed system, “Automatic Web Layout Generation using Natural Language Processing Techniques”, was started with the aim not only of supporting experts and saving their time but also of providing a very simple interface to novice users who are not highly skilled in designing HTML pages or in using complex web design software. The user provides his requirements and preferences in simple English text, and the application analyzes the given text after reading it. The desired HTML code is generated on the basis of the extracted information, and a new HTML file is generated that contains the new web layout. The approach is based on a newly designed rule-based framework that is capable of understanding the user’s text and performing the desired task.

8. Future Work
The designed system can be further improved in terms of functionality, as the existing design is only capable of designing the web layout.
Many other tasks remain, such as adding content (text, images, etc.) to the web layout automatically. Furthermore, user forms are increasingly common, and more work is required on the automatic generation of these forms.
9. References
[1] Nikiforos Karamanis and Hisar Maruli Manurung, “Stochastic text structuring using the principle of continuity”, Proceedings of the Second International Conference on Natural Language Generation (INLG-2002), Ramapo Mountains, NY, 2002.
[2] Imran S. Bajwa, M. Asif Naeem, Riaz-Ul-Amin, M. A. Choudhary, “Speech Language Processing Interface for Object-Oriented Application Design using a Rule-based Framework”, 4th International Conference on Computer Applications 2006, Rangoon, Myanmar.
[3] A.R. Ahmad, O. Basir, K. Hassanein, “Intelligent Expert System for Decision Support in the Layout Design”, Working Paper, Systems Design Engineering, University of Waterloo, 2004.
[4] Pant G., Srinivasan P., Menczer F., “Crawling the Web”. In M. Levene and A. Poulovassilis (eds.): Web Dynamics, Springer-Verlag (2004).
[5] Yasunari Hashimoto, Takeo Igarashi, “Retrieving Web Page Layouts using Sketches to Support Example-based Web Design”, Proceedings of EUROGRAPHICS Workshop on Sketch-Based Interfaces and Modeling (2005).
[6] Lin J., Newman M. W., Hong J. I., Landay J.A., “Denim: Finding a Tighter Fit Between Tools and Practice for Web Site Design”. In CHI Letters: Human Factors in Computing Systems, 2, 1 (2000), 510-517.
[7] Hearst M.A., Gross M.D., Landay J.A., Stahovich T.E., “Sketching Intelligent Systems”. IEEE Intelligent Systems, 13, 3 (1998), 10-19.
[8] A.R. Ahmad, O. Basir, K. Hassanein, “Fuzzy Inferencing in the Web Page Layout Design”, Proc. of the 1st Workshop on Web Services: Modeling, Architec. & Infrastructure, France, pp. 33-41, April 2003.
[9] Hu W.C., Chen Y., “An Overview of World Wide Web Search Technologies”. In Proc. of 5th World Multi-Conference on System, Cybernetics and Informatics, (2001).
[10] Florescu D., Levy A., Mendelzon A., “Database Techniques for the World-Wide Web: A Survey”. SIGMOD Record, 27, 3 (1998), 59-74.
[11] K.A. Dowsland, S. Vaid, W.B. Dowsland, “An algorithm for polygon placement using a bottom-left strategy”, Euro J of Op Res., Vol. 141 (Special issue on cutting and packing), pp. 371-381, 2002.
[12] Henderson, James; Merlo, Paola; Petroff, Ivan; Schneider, Gerold (2002), “Using syntactic analysis to increase efficiency in visualising text collections”. In: Tseng, Shu-Chuan (ed.): Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, Taiwan, 335-341.
[13] J. Foley, W. Kim, S. Kovacevic, and K. Murray, “UIDE-An Intelligent User Interface Design Environment”. In J.W. Sullivan and S.W. Taylor (Eds.), Intelligent User Interface, ACM, NY, 1991.
[14] Gould J.D., Lewis C., “Designing for Usability: Key Principles and What Designers Think”. Communications of the ACM, 28, 3 (1985), 300-311.
[15] Google: http://www.google.com/
[16] Hu W.C., Chen Y., “An Overview of World Wide Web Search Technologies”. In Proc. of 5th World Multi-Conference on System, Cybernetics and Informatics, (2001).
[18] Hjaltason G., Samet H., “Contractive Embedding Methods for Similarity Searching in Metric Spaces”. Technical Report TR-4102, Computer Science Department, Univ. of Maryland, (2000).
[19] Ivory M., Hearst M., Sinha R., “Empirically Validated Web Page Design Metrics”, ACM SIGCHI’01 Conference: Human Factors in Computing Systems, (2001), 53-60.
[20] Lee S.Y., Hsu F.J., “2D C-String: A New Spatial Knowledge Representation for Image Database Systems”. Pattern Recognition, 23, 10 (1990), 1077-1087.
[21] Petrakis E.G.M., Faloutsos C., Lin K.L., “ImageMap: An Image Indexing Method Based on Spatial Similarity”. IEEE Transactions on Knowledge and Data Engineering, 14, 5 (2002).
[22] Petrakis E.G.M., Orphanoudakis S.C., “A Generalized Approach to Image Indexing and Retrieval Based on 2-D Strings”. Intelligent Image Database Systems, World Scientific, (1996), 197-218.
[23] Rui Y., Huang T.S., Chang S.F., “Image retrieval: current techniques, promising directions and open issues”. Journal of Visual Communication and Image Representation, 10 (1999), 39-62.
[24] A.R. Ahmad, O. Basir, K. Hassanein, “Efficient Placement Heuristics for Genetic Algorithm based Layout Optimization”, Working Paper, Systems Design Engineering, University of Waterloo, 2003.