Web intelligence-future of next generation web

Web intelligence is the area of study and research of the application of artificial intelligence and information technology on the web in order to create the next generation of products, services and frameworks based on the internet.

This presentation was given by Nijil Y of SOE, CUSAT.

Published in: Technology, Spiritual

  1. 1. WEB INTELLIGENCE Seminar Report Submitted in partial fulfilment of the requirements for the award of the degree of Bachelor of Technology in Computer Science Engineering of Cochin University Of Science And Technology by NIJIL Y (12080050) DIVISION OF COMPUTER SCIENCE SCHOOL OF ENGINEERING COCHIN UNIVERSITY OF SCIENCE AND TECHNOLOGY KOCHI-682022
  3. 3. DIVISION OF COMPUTER SCIENCE SCHOOL OF ENGINEERING COCHIN UNIVERSITY OF SCIENCE AND TECHNOLOGY KOCHI-682022 Certificate Certified that this is a bonafide record of the seminar entitled “WEB INTELLIGENCE” presented by the following student, NIJIL Y, of the VIIth semester, Computer Science and Engineering, in the year 2010, in partial fulfillment of the requirements for the award of the Degree of Bachelor of Technology in Computer Science and Engineering of Cochin University of Science and Technology. Mr. SUDEEP EDAYILAM Seminar guide Dr. DAVID PETER Head Of Division
  4. 4. ACKNOWLEDGEMENT I thank GOD almighty for guiding me throughout the seminar. I would like to thank all those who have contributed to the completion of the seminar and helped me with valuable suggestions for improvement. I am extremely grateful to Dr. David Peter, Head Of Division, Division of Computer Science, for providing me with the best facilities and atmosphere for creative work, guidance and encouragement. I am profoundly indebted to my seminar guide, Mr. Sudheep Elayidom, Sr. Lecturer, Division of Computer Science, for all the help and support extended to me. I thank all staff members of my college and my friends for extending their cooperation during my seminar. Above all, I would like to thank my parents, without whose blessings I would not have been able to accomplish my goal. NIJIL Y
  5. 5. ABSTRACT Web Intelligence is a new direction for scientific research and development that explores the fundamental roles as well as practical impacts of artificial intelligence and advanced information technology for the next generation of Web-empowered systems, services, and environments. Web Intelligence is regarded as the key research field for the development of the Wisdom Web (including the Semantic Web). The Web revolutionizes the way we gather, process, and use information. Despite current technological advances, we still cannot predict what the Web’s next paradigm shift will be. However, we propose that this change will transform the Web into an intelligent entity, hence the term Web intelligence. The next-generation Web will go beyond improved information search and knowledge queries and will help people achieve better ways of living, working, playing, and learning. To fulfil its potential, the intelligent Web’s design and development must incorporate and integrate several fundamental capabilities. A few of its capabilities are Reflexive server propagation, Growth Specialization, Autocatalysis, etc. Intelligent Web agents can use the Problem Solver Mark-up Language (PSML) to specify their roles, settings, and relationships with any other services. The intelligent Web must also have the ability to process and understand natural language. It must understand and correctly judge the meaning of concepts expressed in words such as “good,” “best,” and “season.” WI research incorporates knowledge from existing disciplines, such as artificial intelligence and information technology, in a totally new domain. At the same time, Web Intelligence research also enriches these established disciplines as it introduces new topics and challenges.
  6. 6. TABLE OF CONTENTS
     1 Introduction
     2 Perspectives of WI
     3 Intelligence Exploration
       3.1 A New Field of Science, Technology and Engineering
       3.2 Design Philosophy and Principles of the Web
       3.3 The Laws of the Web
       3.4 The Web Revolution: One Link at a Time
       3.5 The More Things Change, the More They Stay the Same
     4 Components of Web Intelligence
       4.1 Web Data
       4.2 Representation
       4.3 PSML and Web Inference Engine
       4.4 Social Network Intelligence
     5 Computational Web Intelligence
       5.1 Web Uncertainty
       5.2 Computational Web Intelligence for Web Uncertainty
       5.3 Granular Web Intelligence for Web Uncertainty
     6 Trends and Challenges of WI Related Research and Development
       6.1 Intelligent Web Agents
       6.2 From WA to Web-Based Services
     7 Semantic Search Engine
     8 Conclusion
     References
  7. 7. Web Intelligence CHAPTER 1 INTRODUCTION With the rapid growth of the Internet and the World Wide Web (WWW), we have entered a new information age. The Web provides a totally new medium for communication, which goes far beyond traditional communication media such as radio, telephone and television. The Web has a significant impact on both academic research and ordinary daily life. It revolutionizes the way in which information is gathered, stored, processed, presented, shared, and used. The Web offers new opportunities and challenges for many areas, such as business, commerce, marketing, finance, publishing, education, research and development. For computer scientists, the Web introduces many new research topics and provides a new platform on which to reconsider old problems. It might be high time to create a new sub-discipline of computer science covering theories and technologies related to the Web. Web Intelligence is our proposal for this purpose. Through the billions of Web pages created with HTML and XML, or generated dynamically by underlying Web database service engines, the Web captures almost all aspects of human endeavor and provides fertile ground for data mining. However, searching, comprehending, and using the semi-structured information stored on the Web poses a significant challenge because this data is more sophisticated and dynamic than the information that commercial database systems store. To supplement keyword-based indexing, which forms the cornerstone of Web search engines, researchers have applied data mining to Web-page ranking. In this context, data mining helps Web search engines find high-quality Web pages. WI explores the fundamental and practical impact that artificial intelligence and advanced information technology will have on the next generation of Web-empowered systems, services, and environments. 
In an era dominated by the World Wide Web, Grid computing, intelligent-agent technology, and ubiquitous social computing, WI represents information technology’s next challenge. Motivations and Justifications for WI The introduction of Web Intelligence (WI) can be motivated and justified from both academic and industrial perspectives. Two features of the Web make it a useful and unique platform for computer applications and research: its size and its complexity. The Web contains a huge number of interconnected Web documents known as Web pages. For example, the popular search engine Google claimed that it could search 1,346,966,000 pages as of February 2001. The sheer size of the Web leads to difficulties in the storage, Division Of Computer Science , SOE CUSAT Page 1
  8. 8. management, and efficient and effective retrieval of Web documents. The complexity of the Web, in terms of the connectivity and diversity of Web documents, forces us to reconsider many existing information systems, as well as the theories, methodologies and technologies underlying those systems. One has to deal with a heterogeneous collection of structured, unstructured, semi-structured, interrelated, and distributed Web documents consisting of texts, images and sounds, instead of a homogeneous collection of structured and unrelated objects. The latter is the subject of study of many conventional information systems, such as databases, information retrieval, and multimedia systems. To accommodate the needs of the Web, one needs to study the design and implementation of Web-based information systems by combining and extending results from existing intelligent information systems. Existing theories and technologies need to be modified or enhanced to deal with the complexity of the Web. Although individual Web-based information systems are constantly being deployed, advanced issues and techniques for developing and benefiting from the Web remain to be systematically studied. The challenges that the Web poses to computer scientists may justify the creation of the new sub-discipline, WI, for carrying out Web-related research. The Web increases the availability and accessibility of information to a much larger community than any other computer application. The introduction of Personal Computers (PCs) brought computational power to ordinary people. It is the Web that delivers information more effectively to everyone's fingertips. The Web, no doubt, offers a new means of sharing and transmitting information unmatched by other media. The revolution started by the Web is just beginning. New business opportunities, such as e-commerce, e-banking, and e-publication, will increase as the Web matures. 
One can hardly overemphasize the impact of the Web on the business and industrial world. The creation of a new sub-discipline devoted to Web-related research and applications might have significant value in the future. The need for WI may be further illustrated by the fast-growing research and industrial activities currently centered on it. We searched the Web by using the keyword “Web Intelligence” through several search engines in February 2001.
  9. 9. What is Web Intelligence? “Web Intelligence (WI) exploits Artificial Intelligence (AI) and advanced Information Technology (IT) on the Web and Internet.” This definition has the following implications. The basis of WI is AI and IT. The “I” happens to be shared by both “AI” and “IT”, although with a different meaning in each, and “W” defines the platform on which WI research is carried out. The goal of WI is the joint goals of AI and IT on the new platform of the Web. That is, WI applies AI and IT to the design and implementation of Intelligent Web Information Systems (IWIS). An IWIS should be able to perform functions normally associated with human intelligence, such as reasoning, learning, and self-improvement. There may be no standard and non-controversial definition of WI, just as there is no standard definition of AI. One may argue that our definition of WI focuses more on the software aspects of the Web. It is not our intention to exclude any research topic by using the proposed definition. The term Web Intelligence should be considered an umbrella or label for a new branch of research centered on the Web. Our definition simply states the scope and goals of WI. This allows us to include any theories and technologies that either fall within the scope or aim at the same goals. To complement the formal definition, we try to make the picture clearer by listing topics to be covered by WI. WI will be an ever-changing research branch. It will evolve with the development of the Web as a new medium for information gathering, storage, processing, delivery and utilization. It is our expectation that WI will evolve into an inseparable research branch of computer science. Although no one can predict the future in detail and without uncertainty, it is clear that WI will have a huge impact on the application of computers, which in turn will affect our everyday lives.
  10. 10. CHAPTER 2 Perspectives of WI As a new branch of research, Web Intelligence exploits Artificial Intelligence (AI) and Information Technology (IT) on the Web. On the one hand, it may be viewed as applying results from these existing disciplines to a totally new domain. On the other hand, WI may also introduce new problems and challenges to the established disciplines. WI may also be viewed as an enhancement or an extension of AI and IT. It remains to be seen whether WI will become a sub-area of AI and IT or a child of a successful marriage of AI and IT. However, no matter what happens, studies on WI can benefit a great deal from the results, experience, successes and lessons of AI and IT. In their very popular textbook, Russell and Norvig examined different definitions of artificial intelligence from eight other textbooks in order to decide what exactly AI is. They observed that the definitions vary along two dimensions. One dimension deals with the functionality and ability of an AI system, ranging from the thought processes and reasoning ability of the system to its behavior. The other dimension deals with the design philosophy of AI systems, ranging from imitating human problem solving to making rational decisions. The combination of the two dimensions results in four categories of AI systems, adopted from Russell and Norvig: systems that think like humans, systems that think rationally, systems that act like humans, and systems that act rationally. This classification provides a basis for the study of various views of and approaches to AI. It also clearly defines goals in the design of AI systems. 
According to Russell and Norvig, they correspond to four approaches: the cognitive modeling approach (thinking humanly), the Turing test approach (acting humanly), the laws of thought approach (thinking rationally), and the rational agent approach (acting rationally). The two rows separating AI systems in terms of thinking and acting may not be the most suitable classification. Action is normally the final result of a thinking process. One may argue that the class of systems acting humanly is a superset of the class of systems thinking humanly. In contrast, the separation of the human-centered approach and the rationality-centered approach may have significant implications for the study of AI. While earlier research on AI focused more on the human-centered approach, the rationality-centered approach has received more attention recently.
  11. 11. The first column is centered around humans and leads to the treatment of AI as an empirical science involving hypothesis and experimental confirmation. A human-centered approach represents the descriptive view of AI. Under this view, a system is designed by imitating human problem solving. This implies that a system should have the usual human capabilities, such as knowledge representation, natural language processing, reasoning, planning and learning. The performance of an AI system is measured or evaluated through the Turing test. A system is said to be intelligent if it provides human-level performance. Such a descriptive view dominates the majority of earlier studies of expert systems, a special type of AI system. The second column represents the prescriptive or normative view of AI. It deals with theoretical principles and laws that an AI system must follow, instead of imitating humans. That is, a rationalist approach deals with an ideal concept of intelligence, which may be independent of human problem solving. An AI system is rational if it does the right thing and makes the right decision. The normative view of AI is based on well-established disciplines such as mathematics, logic, and engineering. The descriptive and normative views also reflect the experimental and theoretical aspects of AI research. The experimental study represents the descriptive view. It covers theories and models for explaining the workings of the human mind, and applications of AI to solving problems that normally require human intelligence. The theoretical study aims at the development of theories of rationality, and focuses on the foundations of AI. The two views are complementary. Studies in one direction may provide valuable insights into the other. Web Intelligence concerns the design and development of intelligent Web information systems. 
The previous framework for the study of AI can be immediately applied to that of Web Intelligence. More specifically, we can cluster research in WI into the descriptive approach and the normative approach, and cluster Web information systems in terms of thinking and acting. Various research topics can be identified and grouped accordingly. Like AI, a foundation of WI can be established by drawing on results from many related disciplines, including the following: • Mathematics: computation, logic, probability. • Applied Mathematics and Statistics: algorithms, non-classical logics, decision theory, information theory, measurement theory, utility theory, theories of uncertainty, approximate reasoning.
  12. 12. • Psychology: cognitive psychology, cognitive science, human-machine interaction, user interface. • Linguistics: computational linguistics, natural language processing, machine translation. • Information Technology: information science, databases, information retrieval systems, knowledge discovery and data mining, expert systems, knowledge-based systems, decision support systems, intelligent information agents. The topics under each entry are only intended as examples. They do not form an exhaustive list. In the development of AI, we have witnessed the formation of many new sub-branches, such as knowledge-based systems, artificial neural networks, genetic algorithms, and intelligent agents. Recently, non-classical AI topics have received much attention under the name of computational intelligence. Computational intelligence focuses on the computational aspect of intelligent systems. The application of AI in other disciplines also leads to new techniques in the corresponding fields. For instance, Business Intelligence (BI) is a result of applying artificial
  13. 13. intelligence to the business domain. Artificial Intelligence in Medicine has also proved to be a successful application. When viewing WI in such settings, we can identify at least two of its roles. WI may be interpreted as “Web-based Artificial Intelligence”, the study of particular aspects of AI in the context of the Web, in parallel to the study of computational intelligence. WI may also be interpreted as “Artificial Intelligence on the Web”, which regards it as a new application of AI. A more practical goal of WI is the design and implementation of intelligent Web information systems (IWIS). It should be realized that an IWIS is an integrated system containing many sub-systems. To design such a system, it is necessary to apply a variety of theories and technologies. In his work on vision, Marr convincingly made the point that a full understanding of an intelligent system involves explanations at various levels. The same argument is applicable to the development of an IWIS. We can identify at least two levels: the conceptual formulation and the physical implementation. The conceptual formulation deals with the foundations of an IWIS, while the physical implementation concerns its construction. The former depends on mathematics and logic, and the latter depends on algorithms and programming. Each level may be further divided into sub-levels. Research in WI should include topics at all of these levels.
  14. 14. CHAPTER 3 WEB INTELLIGENCE EXPLORATION Web intelligence further explores the transformation of information into knowledge, and of knowledge into wisdom, in its search for the Wisdom Web. Some of the important issues, although they may not be well conceived yet, are briefly discussed in this section. 3.1 A new field of science, technology and engineering The Web, as a new technical and social phenomenon and a growing organism, creates a new field of science that involves multi-disciplinary study and enquiry for the understanding of the Web and its relationships to us. The Web may be studied from many perspectives, such as philosophical foundations, theoretical and technical foundations, applications, and social impacts. Some examples are given below: • Webology, • Web Science, • Web Technology, • Web Engineering, • Weblization. The term webology is coined to label the study of the Web as a new field of science. By postfixing the phrase “science and technology”, one clearly states the scope. By postfixing the phrase “engineering”, one emphasizes the design and implementation aspects. Together, they are driving forces of the information revolution. The term weblization concisely summarizes the development of the Web and web-based systems so far. The process of weblization involves building the Web itself and reconstructing existing tools and systems on the web platform. 3.2 Design philosophy and principles of the Web The design philosophy and principles set the direction of web growth and its ultimate destiny. It may be difficult to compile a non-controversial and complete list. However, examples include Decentralization principle, Universalist principles, Minimum constraint principle,
  15. 15. Separation of form and content principle. The decentralization principle is inherited from the decentralization property of the Internet. The universalist principles cover universal connectivity, universal accessibility, and the diversity of web contents and users. The minimum constraint principle suggests that the Web should be as unconstraining as possible in order to realize its universality. The separation principle deals with the presentation of web documents, in order to achieve location, machine, and application independence. The design principles ensure that the Web has desirable properties such as decentralization, adaptability, evolvability, scalability, universal connectivity and accessibility, affordability, anonymity, diversity, and many others. The Web is able to support communication, collaboration, interaction, and intercreation. 3.3 The laws of the Web Two sets of laws have been studied, namely, the set of laws governing the Web and the set of empirical laws observable on the Web. The Web has given new meaning to publishing and libraries, but has not changed their underlying principles. Noruzi argued that Ranganathan’s Five Laws of Library Science are as applicable today as they were more than 70 years ago. Ranganathan’s Five Laws of Library Science state: • Books are for use. • Every reader his or her book. • Every book its reader. • Save the time of the reader. • The Library is a growing organism. These laws describe a user-oriented, as well as a service-oriented, view of library science. The Web consists of a massive collection of resources. By replacing “book”, “reader”, and “library” with “web resource”, “user”, and “web”, respectively, Noruzi stated the Five Laws of the Web: • Web resources are for use. • Every user his or her web resource. • Every web resource its user. • Save the time of the user. • The Web is a growing organism.
  16. 16. They concisely represent the underlying philosophy of the Web and web services. They also describe the ideal Web: “of the people, by the people, for the people”. Many researchers have studied empirical laws revealed by the Web, whether in its growth, web page distributions, or user surfing patterns. An example set of such laws is reported by Huberman: 1. Power Law of Distribution. 2. Small World Law. 3. Law of Surfing. 4. Law of Congestion. 5. The Free Ride Law. 6. The Law of Downloading. Website designers, webmasters, and organizations can apply such laws to the design of better websites and web resources. 3.4 The Web revolution: one link at a time The story of the invention of the Web and the revolution brought by the Web provides a good case study for web intelligence. It poses a challenge: how to derive insights and wisdom from existing data, information, and knowledge. Regarding the pre-web uses of hypertext links, Berners-Lee commented, “The research community had used the links between paper documents for ages: Tables of contents, indexes, bibliographies, and reference sections are hypertext links.” A crucial question is what we can get from this common knowledge and practice. Two types of approaches have been proposed and studied. One focuses on exploring the potential implications of such knowledge, which led to the creation of a field of science known as citation indexing and analysis. The other focuses on the representation, storage, and access of similar types of data and knowledge using new media as they become available, which led to the invention of the Web. A basic idea of citation indexing and analysis is to index and study the literature of science based on how scientists cite each other. Although it mainly uses bibliographies, citation indexing
  17. 17. and analysis brings more insights into science, publishing, scientific research, and many more fields. Information retrieval systems based on citation indexing and analysis have been implemented and used by scientists for many years. The same methods have been applied or rediscovered in many recent studies, such as web search engines, social network analysis, and so on. A basic idea of the Web is to create a global space in which anything can be linked to anything. The development of the Web emphasizes the implementation of this idea using different types of machines and media. The Web attempts to make the existing associations and links that people had used, either explicitly or implicitly, concrete and computer manageable. Similar concepts had been explored in the pre-web age. Vannevar Bush described a photo-electromechanical machine called the Memex that could make and follow cross-references among microfilm documents. Ted Nelson introduced the concept of hypertext, so that people could use computers to read, write and publish non-linear texts. Doug Engelbart demonstrated a collaborative work space called NLS, which did hypertext browsing, editing, email, and so on. Thanks to the timely invention of the Internet, which provided global connectivity, the dream of the Web became a reality. The revolution of the Web is brought about by a grassroots effort that builds the Web link by link. There are recent research efforts in cross-applications of the two types of approaches. The methods developed for citation indexing and analysis are used and extended to analyze the links and connectivity of the web. Existing systems for citation indexing and analysis are moved to, and new such systems are implemented on, the Web. The above brief description, which is almost common knowledge, is repeated here to serve one special purpose. 
It demonstrates that the great minds of our time bring about revolutions by analyzing what everyone already knows or, alternatively, by implementing what everyone already uses. The question is: Can web intelligence help in the future? 3.5 The more things change, the more they stay the same Now we turn our attention to the other side of the same coin by investigating the things that the revolutions do not change. In spite of the technological changes, the achievements of the current Web and associated systems lie in the process of weblization. The weblization of a specific field or organization does not change its fundamental principles, although it may make it more effective and efficient and let it operate at a different scale. For example, electronic commerce does not change the principles of doing business, but does introduce more dynamics, opportunities, flexibility, and other new properties. Another example is the Five Laws of the Web: the subject
  18. 18. matters are changed, but the philosophy remains the same. Both paper documents and the Web use links. The physical implementations are different, one on paper and the other on computer, but the logical meanings stay more or less the same. The same analytical tools and methods apply to both. The property of “unchangeness” makes it possible to apply the same principles again and again, with possible adaptation and adjustment. The philosophy and principles that have proved effective in the past can be applied to the design and implementation of intelligent web information systems. Some illustrative examples are listed here: Separation of logical view and physical view. Separation of knowledge and inference engine. Keep It Simple, Stupid! The first two separation principles are along the same lines as the separation of content and form principle. The first one is widely used in the design and implementation of database systems. Its application to the Web implies that one can generate many virtual logical views from the same physical web. The second principle is a fundamental one in expert systems. It is applicable to the design of web inference engines. The last rule, also known as the KISS principle, is universally applicable. It has been applied throughout the design of the Web.
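The separation of knowledge and inference engine mentioned above can be illustrated with a minimal sketch. It is not taken from the report: the rule format, the fact names (`has_price`, `is_product_page`, and so on) and the function `forward_chain` are hypothetical, chosen only to show that the rules live in plain data while the engine stays generic.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A rule is (premises, conclusion). Rules are plain data: the knowledge base.
Rule = Tuple[FrozenSet[str], str]

# Hypothetical knowledge about web pages and visitors (illustrative names only).
RULES: List[Rule] = [
    (frozenset({"has_price", "has_add_to_cart"}), "is_product_page"),
    (frozenset({"is_product_page", "visited_twice"}), "visitor_is_interested"),
]


def forward_chain(facts: Iterable[str], rules: List[Rule]) -> Set[str]:
    """Generic engine: keep applying rules until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule when all its premises are known and it adds something new.
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


derived = forward_chain({"has_price", "has_add_to_cart", "visited_twice"}, RULES)
```

Because `forward_chain` never mentions a specific fact, the same engine could be reused with a different rule list, which is the point of the principle.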
  19. 19. CHAPTER 4 Components of Web Intelligence 4.1 Web Data The data available in electronic commerce environments is three-fold and includes server data in the form of log files, site-specific web meta-data representing the structure of the web site, and marketing information, which depends on the products and services provided. Server data is generated by the interactions between the persons browsing an individual site and the web server. This data can be divided into log files and query data. Historically, web servers recording server activity, errors and referrer information used a log file to record each event. It is now standard for web servers to use a combined log file format, called the Common Log File Format. This format combines the server and error logs into one file. More recently, the Extended Log File Format has been used, which consolidates the Common format with additional information, namely the referrer and cookie information. By incorporating referrer information, the output of mining these log files becomes much more useful and actionable in marketing terms. Cookies are tokens generated by the web server and held by the clients. The information stored in a cookie helps to ameliorate the transactionless state of web server HTTP interactions, enabling servers to track client access across their hosted web pages. The logged cookie data is customizable and can contain keys for relating the navigational data to the content of the marketing data, including transactional data. Usually the following information is contained in a cookie: user ID, source IP address, time-to-live, a randomly generated unique ID, and user-defined information. A fourth data source that is typically generated on electronic commerce sites is query data to a web server. This data is usually generated when users of the web site use search or product locator facilities on the web site to search for relevant pages/products. 
This is often user interaction with a product database, via the company’s Internet site. The final source of data is web meta-data. This data describes the structure of the web site and is usually generated dynamically and automatically after a site update. Web meta-data generally includes neighbor pages, leaf nodes and entry points. This information is usually implemented as a site-specific index table, which represents a labeled, directed graph. Meta-data also provides information whether a page has been created statically or dynamically and whether user interaction is required or not. In addition to the structure of a site, web meta-data can also contain information of more Division Of Computer Science , SOE CUSAT semantic nature, usually represented in XML. Page 13
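To make the server log data described above concrete, here is a minimal sketch of parsing one log line. It assumes the widely used NCSA Common Log Format field layout (host, identity, user, timestamp, request, status, size); the sample line and the names CLF_PATTERN and parse_clf_line are illustrative assumptions, and a real site may log additional fields such as referrer or cookie data.

```python
import re
from datetime import datetime

# Assumed field layout: host ident user [time] "request" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf_line(line):
    """Return a dict of typed fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A size of "-" means no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Hypothetical sample line used only for illustration.
SAMPLE = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'

if __name__ == "__main__":
    print(parse_clf_line(SAMPLE))
```

Records parsed this way are the raw material for the web usage mining discussed in the next section.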
  20. 20. Web Intelligence Web Mining Components of Web Intelligence In the context of web intelligence, web mining may be defined as the application of data mining techniques to Internet data. This definition is sometimes extended to include statistical, database optimization, and artificial intelligence techniques. Web mining has been sub-divided into web structure, web usage, and web content mining. Web structure mining is the application of data mining techniques to web site structures. In many cases this may be the entire web, and research on intelligent search engines and intelligent agents is described in many articles. In our research, we define web structure mining as the mining of Internet data together with data about the structure of the site. This may be thought of as enriching the efficacy of the data mining process with domain knowledge. The application of domain knowledge is further discussed in the analytical process section. Web usage mining is the application of data mining to web server log file data, which is described in the earlier section on web data. Web usage mining forms the core of our research in web mining for web intelligence, and log files provide the foundation data for visitor analysis. This type of analysis of the visitors to a web site can be subdivided into technographic and psychographic analysis. Technographic analysis focuses on what is known about the visitor’s technical platform, i.e., operating system, browser, plug-ins, user language, and cookie information. On its own, this information is not a rich source of discriminatory data for visitor profiling, but in conjunction with the homogeneous data sets available after extract, transform and load operations into a data warehouse, it contributes significantly. Psychographic analysis is the examination of what we know about the behavioral patterns of web site visitors. 
This includes the routes taken by visitors through a site, the time spent on each page, route differences based on differing entry points to the site, aggregated route behavior, general click stream behavior, etc. This is the information of most use to web marketers, and is equivalent to marketing intelligence about where shoppers enter the store, where they go in the store, where they leave the store, what they look at but don’t buy, what they buy and how quickly, etc. Web content mining is the application of data and text mining algorithms and techniques to the contents of web pages, usually written in HTML. At its simplest, this entails the extraction of text between HTML tags for headings and titles, or the extraction of the HTML meta tag content. Our research is based upon XML and RDF-based data schemas that help to ensure correctness and proper context.
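The simplest form of web content mining mentioned above, extracting heading, title and meta tag content, can be sketched with Python's standard html.parser module. The class name HeadingExtractor, the choice of tags and the sample page in the usage note are assumptions made for illustration only.

```python
from html.parser import HTMLParser


class HeadingExtractor(HTMLParser):
    """Collect text inside title/heading tags and name/content pairs of meta tags."""

    TARGET_TAGS = {"title", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.current = None   # target tag currently open, if any
        self._buffer = []     # text fragments collected for that tag
        self.texts = []       # list of (tag, text) pairs in document order
        self.meta = {}        # meta name (lower-cased) -> content

    def handle_starttag(self, tag, attrs):
        if tag in self.TARGET_TAGS:
            self.current = tag
            self._buffer = []
        elif tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content

    def handle_data(self, data):
        if self.current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.current:
            self.texts.append((tag, "".join(self._buffer).strip()))
            self.current = None
```

A possible usage: create an extractor, call its feed method with the HTML of a page, and then read the texts and meta attributes. Text inside nested inline tags such as bold is kept, since only the closing of the target tag ends collection.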
  21. 21. Web Intelligence 4.2 Representation Intelligent Web agents can use the Problem Solver Markup Language (PSML) to specify their roles, settings, and relationships with any other services. The intelligent Web must also have the ability to process and understand natural language. It must understand and correctly judge the meaning of concepts expressed in words, such as “good,” “best,” and “season.” Further, the intelligent Web must grasp the granularities of these terms’ corresponding subjects and the location of their ontology definitions. Self-direction and learning In addition to the semantic knowledge that an intelligent search can extract and manipulate, intelligent Web agents must also incorporate a dynamically created source of meta-knowledge that deals with the relationships between concepts and with the spatial or temporal constraint knowledge that planning and executing services use. This allows the agents to resolve their conflicts themselves. To solve specific problems, intelligent Web agents must be able to plan. The planning process uses goals and associated subgoals, as well as constraints. In the intelligent Web, ontologies alone will not be sufficient. Personalization The intelligent Web can personalize interactions by remembering a particular user’s recent encounters and relating the topics and sites that the user accesses during different online sessions. It may further identify other goals and courses of action as a user’s interactions broaden and deepen, providing ever more data upon which to base its recommendations. As part of its personalized approach to user services, the intelligent Web will interact with the user when executing these tasks. In summary, semantics contributes a vital aspect to the intelligent Web. We expect the Web to extend not only the knowledge of artificial assistants, but also their intelligence. 
WI’s Four Levels We can study Web intelligence on at least four conceptual levels, ranging from the lower, hardware-centered level to the higher, application-centered level. This framework builds upon the fast development and application of various Web technologies. • Internet-level communication, infrastructure, and security protocols. At its core, the Web is a computer-network system. WI techniques for this level include Web data prefetching systems built upon Web surfing patterns to resolve latency issues. The intelligence of
  22. 22. Web Intelligence the Web’s prefetching routines comes from an adaptive learning process based on observations of user surfing behavior. • Interface-level multimedia presentation standards. The Web functions as an interface for human-Internet interaction. At this level, the Web interfaces require adaptive cross-language processing, personalized multimedia representation, and multimodal data processing capabilities. • Knowledge-level information processing and management tools. The Web serves as a distributed data and knowledge base. Accessing and manipulating this information requires semantic markup languages to represent the Web’s contents in machine-understandable formats. Agent-based autonomic computing functions such as searching, aggregation, classification, filtering, managing, mining, and discovery can then use this data. • Application-level ubiquitous computing and social intelligence environments. The Web can form the basis for establishing social networks that contain communities of people, organizations, or other social entities. Social relationships such as friendship, co-working, or exchanging information about common interests connect these entities. The study of WI thus encompasses issues central to social network intelligence. Users access the Web’s multimedia content from stationary desktop computers and increasingly from mobile platforms as well. Ubiquitous Web access and computing from various wireless devices requires even greater adaptive personalization. WI should suit these needs well by providing techniques for constructing interest models derived from implicit inferences based on user behavior.
  23. 23. Web Intelligence 4.3 PSML and Web inference engine Distributed inference engines form PSML’s core. These engines can perform automatic reasoning on the Web by incorporating autonomically collected and transformed content and meta-knowledge into locally operational knowledge and databases. A feasible way to implement PSML is to use an existing Prolog-like logic language supplemented with agents that perform dynamic-content updates, meta-knowledge. 4.4 Social network intelligence The social intelligence approach to Web computing presents new opportunities for WI research and development. As the Web becomes an integral part of our society, WI can and should support Web-based social networks at all levels. Study in this area must receive as much attention as Web mining, Web agents, ontologies, and related topics. Web-based computing The intelligent Web seeks to provide not only a medium for seamless information exchange and knowledge sharing, but also the sort of human-crafted resources that encourage sustainable knowledge creation and scientific and social evolution. The intelligent Web will rely on Grid-like service agencies that self-organize, learn, and evolve their courses of action to perform service tasks and transform their identities and interrelationships in communities. These services will also cooperate and compete among themselves to optimize their resources and utilities and those of others. 4.5 Benchmark applications To effectively develop and evaluate systems and applications that address WI research issues, we must consider benchmark applications that will demonstrate these capabilities. Suppose we want to conduct a Web-based search to compile the data and generate a market report for an existing product or a potential new product. To perform these tasks, an information agent will mine and integrate available Web information, which will in turn be passed to a market analysis agent. 
The analysis will involve the quantitative simulation of customer behavior in a marketplace, instantaneously handled by other service agencies involving a large number of Grid agents. Given that the variables can number in the hundreds or thousands, generating one prediction can easily require significant computing resources.
  24. 24. Web Intelligence CHAPTER 5 Computational Web Intelligence and Granular Web Intelligence for Web Uncertainty With the explosive growth of Web data on wired and wireless networks, a challenging problem for a new generation of intelligent Web techniques is how to handle uncertain Web data and make the right decisions under Web uncertainty. It is therefore necessary to develop new intelligent Web techniques for Web applications under different types of uncertainty, including probability, possibility, fuzziness, roughness, randomness, etc. Web Intelligence (WI), a new direction for scientific research and development, exploits Artificial Intelligence (AI) and advanced Information Technology (IT) on the Web and Internet. In general, AI-based Web techniques can be used to handle probabilistic Web data. Since there is a great deal of fuzzy Web data and other kinds of uncertain Web data, we need to apply relevant intelligent techniques to process uncertain Web data that cannot be processed by traditional precise techniques such as Boolean logic. To promote the use of fuzzy logic on the Internet, Zadeh stated that "fuzzy logic may replace classical logic as what may be called the brainware of the Internet" at the 2001 BISC International Workshop on Fuzzy Logic and the Internet (FLINT2001). Fuzzy intelligent agents are used in smart e-Commerce applications, and conceptual fuzzy sets are applied to Web search engines to improve the quality of Web service. Clearly, intelligent e-brainware based on soft computing plays an important role in smart e-Business applications and in building the intelligent Web brain, and soft-computing-based Web techniques can enhance Web QoI (Quality of Intelligence). 
In order to use CI (Computational Intelligence) techniques to build intelligent wired and wireless systems with high QoI, Computational Web Intelligence (CWI) was proposed at the special session on CWI at FUZZ-IEEE'02 of the 2002 World Congress on Computational Intelligence. CWI is a hybrid technology of CI and Web Technology (WT) dedicated to increasing the QoI of e-Business application systems on wired and wireless networks. The main CWI techniques include • Fuzzy Web Intelligence (FWI) • Neural Web Intelligence (NWI) • Evolutionary Web Intelligence (EWI) • Granular Web Intelligence (GWI) • Rough Web Intelligence (RWI) • Probabilistic Web Intelligence (PWI)
  25. 25. Web Intelligence 5.1 WEB UNCERTAINTY The Web holds various data sets distributed over a huge number of computers, just as a human brain contains biological data stored in a large number of biological neurons. The biological data in the human brain are not always precise but are uncertain in most cases due to information incompleteness, linguistic vagueness, imperfect measurement, knowledge limitations, etc. Similarly, Web data on the Internet are usually uncertain rather than accurate because of partial Web information, dynamic Web data, fuzzy Web data, Web ontology, unpredictable Web information, different Web users, different hardware environments, different data formats, etc. So the big challenge is how to design intelligent Web techniques for Web-based applications with uncertainty. With the explosive growth of wired and wireless networks, Web users suffer from huge amounts of raw Web data, because current Web tools still cannot find satisfactory information and knowledge effectively or make decisions correctly in the presence of uncertain Web data, uncertain Web information, uncertain Web knowledge and uncertain Web intelligence. The Internet and wireless networks now connect an enormous number of computing devices including computers, PDAs (Personal Digital Assistants), cell phones, home appliances, etc. CI is used in telecommunication network applications. Clearly, such a huge worldwide networked computing system provides a complex, dynamic and global environment for developing new distributed intelligent theory and technology based on AI, BI (Biological Intelligence) and CI. Therefore, we must design an intelligent Web technology for dealing with Web uncertainty. 5.2 COMPUTATIONAL WEB INTELLIGENCE FOR WEB UNCERTAINTY Zadeh states that traditional (hard) computing is the computational paradigm that underlies artificial intelligence, whereas soft computing is the basis of CI. 
Based on the discussions on CI and AI, the basic conclusion is that CI is different from AI, but CI and AI have a common overlap. In general, hard computing and soft computing can be used in intelligent hard Web applications and intelligent soft Web applications. To enhance the QoI (Quality of Intelligence) of e-Business, Computational Web Intelligence (CWI) is proposed, which uses CI and Web Technology (WT) to build intelligent e-Business applications on the Internet and wireless networks. The concise relation is CWI=CI+WT. Fuzzy logic, neural networks, evolutionary computation, granular computing, rough sets and probabilistic methods are major CI techniques for intelligent
  26. 26. Web Intelligence e-Applications on the Internet and wireless networks. Currently, the six major research areas of CWI are (1) Fuzzy WI (FWI), (2) Neural WI (NWI), (3) Evolutionary WI (EWI), (4) Probabilistic WI (PWI), (5) Granular WI (GWI), and (6) Rough WI (RWI). In the future, more CWI research areas will be added. The six current major CWI techniques are described below. • FWI has two major techniques: fuzzy logic and WT. The main goal of FWI is to design intelligent fuzzy e-agents that deal with the fuzziness of Web data, Web information and Web knowledge, and make good decisions for e-Applications effectively. • NWI has two major techniques: neural networks and WT. The main goal of NWI is to design intelligent neural e-agents that can learn Web knowledge from Web data and Web information and make smart decisions for e-Applications. • EWI has two major techniques: evolutionary computing and WT. The main goal of EWI is to design intelligent evolutionary e-agents that optimize e-Application tasks effectively. • PWI has two major techniques: probabilistic computing and WT. The main goal of PWI is to design intelligent probabilistic e-agents that deal with the probability of Web data, Web information and Web knowledge for e-Applications effectively. • GWI has two major techniques: granular computing and WT. The main goal of GWI is to design intelligent granular e-agents that deal with Web data granules, Web information granules and Web knowledge granules for e-Applications effectively. • RWI has two major techniques: rough sets and WT. The main goal of RWI is to design intelligent rough e-agents that deal with the roughness of Web data, Web information and Web knowledge for e-Applications effectively. CWI can be used to increase the QoI of e-Business applications. CWI has many wired and wireless applications in intelligent e-Business. Currently, FWI, NWI, EWI, PWI, GWI and RWI are the major CWI techniques. 
CWI can be used to deal with the uncertainty and complexity of Web applications. HWI, a broader area
  27. 27. Web Intelligence than CWI, can be applied to more complex e-Business applications. In summary, HWI, which includes CWI, will play an important role in designing smart e-Application systems for wired and wireless users. CWI technology is based on multiple CI techniques and WT. Relevant CI techniques and WT are selected to build a powerful CWI system for a particular e-Business application. 5.3 GRANULAR WEB INTELLIGENCE FOR WEB UNCERTAINTY Granular computing technology can be used to do high-level information processing and knowledge discovery based on data granules that are clustered intelligently from raw data with uncertainty. Since there are huge amounts of Web data at different geographical places, it is natural to use granular computing technology to preprocess raw Web data, then do granular Web data mining, and finally discover granular Web knowledge. So GWI is a general intelligent technology for dealing with raw Web data with uncertainty. Mathematically speaking, to handle Web uncertainty effectively, it is necessary to develop a novel granular set theory. Here, a general framework for granular sets is briefly described to deal with data uncertainty such as Web data uncertainty. Definition 1 (A Granular Set) Let $X$ be a universal set of data elements. A granular set $A$ in $X$ is characterized by $m$ granular membership functions $F_k(x)$ for $x \in X$, with $F_k(x) \in [0,1]$ and $k = 1, 2, \ldots, m$. For example: If $m=1$, a granular set is a fuzzy set (a special case: a crisp set) since one membership function is used. Traditional fuzzy sets use only truth values in [0, 1] to handle data uncertainty. If $m=2$, a granular set is an intuitionistic fuzzy set [25] since two membership functions are used. Intuitionistic fuzzy sets use both truth values and falsity values in [0, 1] to deal with data uncertainty. If $m=3$, a granular set is a neutrosophic set since three membership functions are used. 
For example, interval neutrosophic sets are defined on a truth-membership function, an indeterminacy-membership function and a falsity-membership function. The major advantage of interval neutrosophic sets is that they reduce data uncertainty by using three types of information (truth values, falsity values and indeterminacy values) in order to make the right decision.
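Definition 1 can be illustrated with a small sketch in which a granular set is simply a tuple of membership functions, each returning a grade in [0, 1]. The class name GranularSet, the helper clamp and the "fast response time" membership functions are hypothetical and chosen only for illustration; they are not part of the framework described in the report.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


def clamp(value: float) -> float:
    """Restrict a value to the interval [0, 1]."""
    return max(0.0, min(1.0, value))


@dataclass(frozen=True)
class GranularSet:
    """A granular set described by m membership functions F_1 .. F_m."""

    membership_functions: Sequence[Callable[[float], float]]

    def grades(self, x: float) -> Tuple[float, ...]:
        values = tuple(f(x) for f in self.membership_functions)
        for value in values:
            if not 0.0 <= value <= 1.0:
                raise ValueError("membership grades must lie in [0, 1]")
        return values


# Hypothetical example: how "fast" is a page response time given in milliseconds?
def truth(ms: float) -> float:
    return clamp(1.0 - ms / 1000.0)


def falsity(ms: float) -> float:
    return clamp(ms / 2000.0)


fuzzy_fast = GranularSet((truth,))                    # m = 1: fuzzy set
intuitionistic_fast = GranularSet((truth, falsity))   # m = 2: intuitionistic fuzzy set

if __name__ == "__main__":
    print(fuzzy_fast.grades(500))
    print(intuitionistic_fast.grades(500))
```

A neutrosophic-style set would add a third, indeterminacy function to the tuple in the same way. Note that the sketch only checks the range of each grade; any further constraints a particular set theory places on the grades are not enforced here.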
  28. 28. Web Intelligence We hope that new granular sets and new granular logical systems with four or more membership functions will be developed in the future to handle Web uncertainty effectively and fundamentally. Web uncertainty is a long-term challenge related to many Web applications such as the semantic Web, Web mining, Web knowledge discovery, Web agents, Web search engines, Web security, e-Commerce, e-Business, etc. To handle Web uncertainty, we need to develop relevant intelligent Web technology such as CWI and GWI. Importantly, we need to continue to create new granular sets, such as neutrosophic sets, to try to handle Web uncertainty effectively. Because Web uncertainty is a difficult long-term problem, we need to use different intelligent techniques together. Hybrid Web Intelligence (HWI), a broad hybrid research area, uses AI, CI, BI (Biological Intelligence) and WT to build hybrid intelligent Web systems that handle Web uncertainty effectively and efficiently. In the future, HWI will have many intelligent Web applications under uncertainty. The main HWI applications include (1) intelligent Web agents for e-Applications such as e-Commerce, e-Government, e-Education and e-Health, (2) intelligent Web security systems such as intelligent homeland security systems, (3) intelligent Web bioinformatics systems, (4) intelligent grid computing systems, (5) intelligent wireless mobile agents, (6) intelligent Web expert systems, (7) intelligent Web entertainment systems, (8) intelligent Web services, (9) Web data mining and Web knowledge discovery, (10) intelligent distributed and parallel Web computing systems based on a large number of networked computing resources, and so on.
  29. 29. Web Intelligence CHAPTER 6 Trends and Challenges of WI Related Research and Development Web Intelligence presents excellent opportunities and challenges for the research and development of new-generation Web-based information processing technology, as well as for exploiting business intelligence. With the rapid growth of the Web, research and development on WI have received much attention. We expect that more attention will be focused on WI in the coming years. Many specific applications and systems have been proposed and studied. Several dominant trends can be observed and are briefly reviewed in this section. E-commerce is one of the most important applications of WI. The e-commerce activity that involves the end user is undergoing a significant revolution. The ability to track users’ browsing behavior down to individual mouse clicks has brought the vendor and end customer closer than ever before. It is now possible for a vendor to personalize a product message for individual customers at a massive scale. This is called targeted marketing or direct marketing. Web mining and Web usage analysis play an important role in e-commerce for customer relationship management (CRM) and targeted marketing. Web mining is the use of data mining techniques to automatically discover and extract information from Web documents and services. Zhong et al. proposed a way of mining peculiar data and peculiarity rules that can be used for Web-log mining. They also proposed ways of doing targeted marketing by mining classification rules and market value functions. A challenge is to explore the connection between Web mining and related agent paradigms such as Web farming, that is, the systematic refining of information resources on the Web for business intelligence. Text analysis, retrieval, and Web-based digital libraries form another fruitful research area in WI. 
Topics in this area include semantic models of the Web, text mining, and automatic construction of citations. Abiteboul et al. 
systematically investigated the data on the Web and the features of semi-structured data. Zhong et al. studied text mining on the Web, including automatic construction of ontologies, e-mail filtering systems, and Web-based e-business systems. Web-based intelligent agents are aimed at improving a Web site or providing help to a user. Liu et al. worked on e-commerce agents. Liu and Zhong worked on Web agents and KDDA (Knowledge Discovery and Data Mining Agents). We believe that Web agents will be a very important issue. It is therefore not surprising that we decided to hold the WI conference in
  30. 30. Web Intelligence parallel to the Intelligent Agents conference. In the next section, we provide a more detailed description of intelligent Web agents. The Web itself has been studied from two aspects: the structure of the Web as a graph and the semantics of the Web. Studies on Web structure investigate several structural properties of graphs arising from the Web, including the graph of hyperlinks and the graph induced by connections between distributed search servants. The study of the Web as a graph is not only fascinating in its own right, but also yields valuable insight into Web algorithms for crawling, searching and community discovery, and into the sociological phenomena which characterize its evolution. Studies of the semantics of the Web were initiated by Tim Berners-Lee, the creator of the World Wide Web. In this view the Web is referred to as the “semantic Web”, where information will be machine-processable in ways that support intelligent network services such as information brokers and search agents. The semantic Web requires interoperability standards that address not only the syntactic form of documents but also their semantic content. A semantic Web also lets agents utilize all the data on all Web pages, allowing them to gain knowledge from one site and apply it to logical mappings on other sites for ontology-based Web retrieval and e-business intelligence. Ontologies and agent technology can play a crucial role in enabling such Web-based knowledge processing, sharing, and reuse between applications. A new DARPA program called DAML (DARPA Agent Markup Language) is a step toward a “semantic Web” where agents, search engines and other programs can read DAML mark-up to decipher meaning rather than just the content on a Web site. 
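Returning to the view of the Web as a graph of hyperlinks described above, the following sketch shows two elementary graph computations that crawling and search algorithms build on: a breadth-first visiting order and in-degree counts. The tiny LINKS graph and the function names are hypothetical; a real crawler would fetch pages over the network rather than read an in-memory dictionary.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
LINKS = {
    "home": ["products", "about"],
    "products": ["item1", "home"],
    "about": ["home"],
    "item1": [],
}


def crawl_order(links, start):
    """Visit pages breadth-first from a start page, following hyperlinks once each."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


def in_degrees(links):
    """Count how many hyperlinks point at each page."""
    counts = {page: 0 for page in links}
    for targets in links.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


if __name__ == "__main__":
    print(crawl_order(LINKS, "home"))
    print(in_degrees(LINKS))
```

In-degree is only the crudest structural signal; it is shown here because it makes the "labeled, directed graph" view tangible, not because the report prescribes it.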
6.1 Intelligent Web Agents Intelligent agents are computational entities that are capable of making decisions on behalf of their users and of improving their own performance in dynamically changing and unpredictable task environments. Liu provided a comprehensive overview of related research work in the field of autonomous agents and multi-agent systems, with an emphasis on its theoretical and computational foundations as well as in-depth discussions of useful techniques for developing various embodiments of agent-based systems, such as autonomous robots, collective vision and motion, autonomous animation, and search and segmentation agents. The core of those techniques is the notion of synthetic or emergent autonomy based on behavioral self-organization. Intelligent
  31. 31. Web Intelligence Web Agents (WA) are software programs that primarily serve two important roles: a) autonomous entities for exploring and exploiting Web-based services, and b) prototype entities for exhibiting and explaining Web-generated regularities. These two roles are summarized below. 6.2 From WA to Web-Based Services The first role for WA can be readily described and appreciated by examining the following typical scenarios in which various tasks and objectives are achieved. • Personalized Multimodal Interface WA can provide users with a user-friendly style of presentation that personalizes both the interaction with users and the content presentation. This activity involves the creation of various cognitive aids, including tables, charts, executive summaries, indices, and personalized visual assistants (e.g., graphically animated personas and virtual-reality avatars). WA as interfaces must make electronic services easy to use. The provided cognitive aids must be concise (i.e., accessible with as few manipulations and as little memorization as possible) and consistent (i.e., understandable based on users’ previously customized cognitive styles). • Push and Pull WA can play an important role in dynamically creating pull-and-push advertising. Here, by pull-and-push advertising we mean that a user expresses his or her favorites during the interaction with the agents (pull advertising) and in return the agents search for and deliver information about the favorite items dynamically to the user (push advertising). Such agents can also increase the positive externality of products, that is, the better people are informed about certain products, the more likely the products will be sold. • Pattern Discovery and Self-Organization WA will make it possible to detect which buying patterns are forming among users and how they are structured, and hence to manage online commerce effectively. 
Collaborative recommendation agents can help individual users aggregate into groups, which can in turn form a dynamic marketplace. • Information Gateway WA can provide users with immediate access to the most relevant information. This support encompasses a wide spectrum of information filtering and delivery activities by manipulating various heterogeneous Web sources including databases, data warehouses, newswire, financial reports, newsletters, newsgroups, outbound emails, electronic bulletin boards, and hypermedia documents, and based on
  32. 32. Web Intelligence users’ profiles, tailoring and delivering the retrieved information to the users. The provided summary information must be just-in-time (i.e., delivered whenever it is needed), relevant (i.e., focused on whichever topics the users are concerned with), and up-to-the-minute (i.e., refreshed whenever a new piece of information arrives). An example of an application with this type of agent support is comparison shopping that utilizes WA with mobile and filtering capabilities. Some related experiences have been reported. • Reward WA can motivate users to enter and re-enter a certain electronic service. While an ever-greater proliferation of content continues to consume individuals’ attention, e.g., through push technology to sell something or to support users, WA can play a crucial role in creating a captive audience, in educating it constantly, and even in changing users’ old purchase habits. To be rewarding is to add value. The motivational rewards or incentives can be created by offering free access to certain information and utility resources (e.g., free software download), opportunities to participate in multi-user information/commodity exchange activities (e.g., collaborative recommendation, chat, bidding, and auction), and scheduled plans for promotional deals. • Matchmaking WA can serve as a new means for trading commodities. Since the interests of users as well as the availability of products from dealers can change dynamically from time to time, what usually happens in present-day electronic commerce is: (1) a dealer sells his or her items simply because these are the only items that he or she has at the moment, or (2) a user buys a certain item simply because it is the last item that he or she can find that partially fits his or her need. 
WA-based customized business attempts to change the existing online buying and selling into the following new scenarios: (1) a dealer identifies and offers exactly what users are interested in, and (2) a user finds and purchases what he or she really loves. Some technical issues related to matchmaking have been addressed in the literature. • Decision WA can assist Web users in making decisions. Such decision support may take the form of evaluations or recommendations on the various features of certain specific items, cost-benefit analysis, inference support for optimizing utility and resources with respect to functional, time, and cost requirements, and model-based trend analysis and projections concerning new patterns of demand. • Delegation WA can act on behalf of Web users in online activities. The tasks that may be delegated to WA include matchmaking, server monitoring, negotiation, bidding, auction, transaction, transfer of goods, and follow-up support. This scenario will empower a new paradigm shift from user-centric to user-delegated
  33. 33. Web Intelligence electronic business. The delegations of these tasks may be carried out in either semiautonomous (with users’ intervention on decisions) or fully autonomous manners. To this end, various computational theories and models have been proposed and reported in. • Collaborative Work Support WA can offer the infrastructure support as well as the necessary function for collaboratively solving problems and managing workflow activities Division Of Computer Science , SOE CUSAT Page 27
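The matchmaking scenario described above, in which a dealer offers what users are actually interested in, can be sketched as a similarity ranking between a user's interest profile and the dealers' offers. This is only an illustration and not the method of the report: the `Offer` record, the tag vocabulary, the choice of Jaccard similarity, and the 0.5 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """A dealer's item described by a set of feature tags (hypothetical schema)."""
    dealer: str
    item: str
    tags: frozenset


def jaccard(a, b):
    """Overlap of two tag sets: |a & b| / |a | b|, or 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_offers(interests, offers, threshold=0.5):
    """Return (score, offer) pairs scoring at or above the threshold, best first."""
    scored = [(jaccard(interests, offer.tags), offer) for offer in offers]
    kept = [(score, offer) for score, offer in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


# Illustrative data only; none of these names come from the report.
offers = [
    Offer("dealer_a", "trail shoe", frozenset({"shoes", "running", "trail"})),
    Offer("dealer_b", "office chair", frozenset({"furniture", "office"})),
    Offer("dealer_c", "road shoe", frozenset({"shoes", "running", "road"})),
]
user_interests = frozenset({"shoes", "running", "trail"})

matches = match_offers(user_interests, offers)
for score, offer in matches:
    print(f"{offer.dealer}: {offer.item} ({score:.2f})")
```

Because both user interests and dealer stock change over time, an agent built along these lines would presumably recompute the ranking whenever either side is updated, rather than caching it.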
  34. 34. Web Intelligence
CHAPTER 7 Semantic Search Engine
The framework’s search engine component queries the information generated by the annotation component. It accepts queries posed in SPARQL and returns a set of links to matching resources. A specialized search interface lets users develop an abstract model of a semantic query, pose it to the engine, and then review the resulting matched documents. The search interface gives end users (people who aren’t experts in Semantic Web technologies) a way to access the resources filtered and annotated by the semantic annotator component. Users can also add and delete entities and properties (with related values), interacting with the knowledge base to fine-tune the query and make subsequent searches more accurate.
The key aim for the query interface is to give the user an intuitive and clear abstract query model that hides, as much as possible, the underlying complexity of representation and reasoning. Furthermore, the agents in the search engine’s multi-agent system exhibit various autonomic features that aim to make the system more robust and scalable.
The QS system has been deployed in two commercial test cases in the UK. In the first case, QS was used to examine specific Web-published documents for commercial opportunities matching the business interests of the customer company. In the second deployment, QS was used to perform knowledge-based searches over existing database sources. In evaluating the performance of the search system in both applications, we could see that by using ontological knowledge and ontology-based annotations, users could perform more accurate queries while being returned up to 71 percent fewer documents than with a keyword-based search engine, in the best cases eliminating more than 90 percent of the irrelevant documents.
We are now in the process of further refining these two deployments, and we are planning more industrial deployments in the near future with other UK companies.
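The chapter describes a search interface that lets a user build an abstract query model (an entity type plus property values) which is then posed to the engine as SPARQL. A minimal sketch of that translation step might look as follows. The function name `build_sparql`, the `ex:` namespace, and the class and property names are hypothetical; the framework's actual ontology and query templates are not given in the report.

```python
def build_sparql(entity_class, constraints, limit=20):
    """Turn an abstract query model into a SPARQL SELECT string.

    entity_class: prefixed name of the class to search for, e.g. "ex:Tender".
    constraints:  dict mapping a prefixed property name to a literal value.
    Every name under the ex: prefix is made up for illustration.
    """
    lines = [
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>",
        "PREFIX ex: <http://example.org/schema#>",
        "SELECT DISTINCT ?doc WHERE {",
        f"  ?doc rdf:type {entity_class} .",
    ]
    # One triple pattern per property the user filled in, in a stable order.
    for prop, value in sorted(constraints.items()):
        # Escape backslashes and double quotes so the literal stays well-formed.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  ?doc {prop} "{escaped}" .')
    lines.append("}")
    lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)


query = build_sparql(
    "ex:Tender",
    {"ex:sector": "construction", "ex:region": "UK"},
    limit=10,
)
print(query)
```

Adding or deleting a property in the `constraints` dictionary corresponds to the fine-tuning step the chapter mentions: each added property contributes one more triple pattern and so narrows the set of matching documents.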
  35. 35. Web Intelligence
CHAPTER 8 CONCLUSION
While it may be difficult to define exactly what Web Intelligence (WI) is, one can easily argue for the need to create such a subfield of computer science. With the rapid growth of the Web, we foresee a fast-growing interest in Web Intelligence. Roughly speaking, we define Web Intelligence as a field that “exploits Artificial Intelligence (AI) and advanced Information Technology (IT) on the Web and Internet.” It may be viewed as a marriage of artificial intelligence and information technology in the new setting of the Web. By examining the scope and historical development of artificial intelligence, we discuss some fundamental issues of Web Intelligence in a similar manner. There is no doubt in our minds that results from AI and IT will influence the development of WI.
Instead of searching for a precise and noncontroversial definition of WI, we list topics that may interest researchers working on Web-related issues. In particular, we identify some challenging issues of WI, including e-commerce, studies of Web structures and Web semantics, Web information storage and retrieval, Web mining, and intelligent Web agents. The aims are to examine the performance characteristics of various approaches in Web-based intelligent information technology and to cross-fertilize ideas on the development of Web-based intelligent information systems among different domains.
This report is not intended to be a complete and systematic study of the field, but rather a record of personal observations, scattered (perhaps immature) ideas, general comments, speculations, and opinions. We hope that a careful study of these not yet well-connected points may lead to a web of knowledge for Web intelligence. We examined the Web from several perspectives, which enables us to see clearly the current status, the scope, and the future of Web intelligence research. Web intelligence exploration of the Web was then commented on from a few angles.
A couple of challenges were posed. Finally, Web-based Support Systems (WSS) were used to demonstrate the ideas presented, which may further enhance the Web as a tool “of the people, by the people, for the people.”
  36. 36. Web Intelligence
REFERENCES
[1] Research Challenges and Trends in the New Information Age. Y.Y. Yao, Ning Zhong, Jiming Liu, and Setsuo Ohsuga. IEEE.
[2] Web Intelligence: New Frontiers of Exploration. Yiyu (Y.Y.) Yao, Department of Computer Science, University of Regina, Regina, Saskatchewan. IEEE.
[4] Education and the Semantic Web. Vladan Devedzic, Department of Information Systems and Technologies, FON – School of Business Administration, University of Belgrade.
[5] Computational Web Intelligence and Granular Web Intelligence for Web Uncertainty. Yan-Qing Zhang, Member, IEEE.
