Major Web Intelligence Tools - Artificial Intelligence Laboratory ...

  1. 1. Major Web Intelligence Tools
  2. 2. Web Intelligence Tools <ul><li>I. Collection </li></ul><ul><ul><li>Offline Explorer </li></ul></ul><ul><ul><li>SpidersRUs (AI Lab) </li></ul></ul><ul><ul><li>Google Scholar </li></ul></ul><ul><li>II. Analysis (Data and Text Mining) </li></ul><ul><ul><li>Google APIs </li></ul></ul><ul><ul><li>Google Translation </li></ul></ul><ul><ul><li>GATE </li></ul></ul><ul><ul><li>Arizona Noun Phraser (AI Lab) </li></ul></ul><ul><ul><li>Self-Organizing Map, SOM (AI Lab) </li></ul></ul><ul><ul><li>Weka </li></ul></ul><ul><li>III. Visualization </li></ul><ul><ul><li>NetDraw </li></ul></ul><ul><ul><li>JUNG </li></ul></ul><ul><ul><li>Analyst’s Notebook and Starlight </li></ul></ul>
  3. 3. Collection: Offline Explorer <ul><li>Developed by MetaProducts Corporation, Offline Explorer can download Web sites to your hard disk for offline browsing.</li></ul><ul><li>Advantages of Offline Explorer</li></ul><ul><ul><li>Save Time: download up to 500 files simultaneously.</li></ul></ul><ul><ul><li>Save Yesterday's Web Sites for Tomorrow's Use</li></ul></ul><ul><ul><li>Monitor Web Sites</li></ul></ul><ul><ul><li>Mine your Data</li></ul></ul><ul><ul><ul><li>The TextPipe tool in the Offline Explorer Pro edition can extract or change the desired data, or even export it to a database.</li></ul></ul></ul>
  4. 4. Offline Explorer (screenshot labels: project list; project properties setup window with file filters, URL filters, and other advanced properties; download URLs; download level; file modification check)
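The "download level" and "URL filters" settings in Offline Explorer's project properties can be illustrated with a minimal breadth-first mirror that stops a fixed number of links away from the start page. This is a sketch under stated assumptions, not Offline Explorer's implementation: the in-memory `SITE` dictionary stands in for real HTTP fetches, and `crawl`, `max_level`, and `allowed_prefix` are hypothetical names.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" (URL -> HTML); a real mirror tool fetches over HTTP.
SITE = {
    "http://example.org/": '<a href="/a.html">A</a> <a href="http://other.com/x">X</a>',
    "http://example.org/a.html": '<a href="b.html">B</a>',
    "http://example.org/b.html": '<a href="c.html">C</a>',
    "http://example.org/c.html": "end",
}


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, max_level, allowed_prefix):
    """Breadth-first download of pages up to max_level links from start_url.

    Only URLs starting with allowed_prefix pass the (hypothetical) URL filter.
    Returns URL -> page content, i.e. the offline copy.
    """
    saved = {}
    queue = deque([(start_url, 0)])
    seen = {start_url}
    while queue:
        url, level = queue.popleft()
        page = SITE.get(url)
        if page is None:
            continue  # fetch failed or page not found
        saved[url] = page
        if level == max_level:
            continue  # do not follow links beyond the download level
        parser = LinkExtractor()
        parser.feed(page)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target.startswith(allowed_prefix) and target not in seen:
                seen.add(target)
                queue.append((target, level + 1))
    return saved


mirror = crawl("http://example.org/", max_level=2, allowed_prefix="http://example.org/")
print(sorted(mirror))
```

With `max_level=2`, the start page, `a.html`, and `b.html` are saved; `c.html` is three links away and the off-site link is rejected by the prefix filter.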
  5. 5. SpidersRUs <ul><li>The SpidersRUs Digital Library Toolkit was developed by the Artificial Intelligence Lab at the University of Arizona.</li></ul><ul><li>It provides modular tools for spidering, indexing, and searching, so that digital libraries in different languages can be built in a simple DIY (Do-It-Yourself) way. Users can create their own search engines easily and quickly through a friendly user interface.</li></ul><ul><li>SpidersRUs can automate the development of vertical search engines in different domains and languages. It can work with non-English languages such as Asian and Middle Eastern languages.</li></ul>
  6. 6. SpidersRUs (screenshot: an example of a Chinese search engine built with SpidersRUs, showing keyword search and search results)
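The indexing and searching stages that SpidersRUs packages can be illustrated with a minimal inverted index. This is a generic sketch, not SpidersRUs code: the documents and the `build_index`/`search` names are hypothetical, and the regex tokenizer would not segment Chinese text into words (a multilingual engine needs language-specific tokenizers).

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Inverted index: term -> set of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Boolean AND search: IDs of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Vertical search engines for medical literature",
    2: "Building a digital library in Chinese",
    3: "A digital library search engine toolkit",
}
index = build_index(docs)
print(sorted(search(index, "digital library")))  # [2, 3]
print(sorted(search(index, "search engine")))    # [3]
```

Document 1 does not match "search engine" because it contains "engines"; a real engine would add stemming at indexing time.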
  7. 7. Google Scholar <ul><li>Google Scholar provides a simple way to broadly search for scholarly literature.</li></ul><ul><li>Features of Google Scholar:</li></ul><ul><ul><li>Search diverse sources from one convenient place</li></ul></ul><ul><ul><li>Find papers, abstracts, and citations</li></ul></ul><ul><ul><li>Locate the complete paper through your library or on the Web</li></ul></ul><ul><ul><li>Learn about key papers and scholars in any area of research</li></ul></ul>
  8. 8. Google Scholar (screenshot: search for “Bioterrorism” in Google Scholar; list of papers citing this paper; 366 citations)
  9. 9. Analysis: Google APIs <ul><li>Google provides many APIs to help you quickly develop your own applications.</li></ul><ul><li>Examples of Google APIs:</li></ul><ul><ul><li>Google API for Inlink: discovers which pages link to your website.</li></ul></ul><ul><ul><li>Google Data APIs: provide a simple, standard protocol for reading and writing data on the Web. Several Google services provide a Google Data API, including Google Base, Blogger, Google Calendar, Google Spreadsheets, and Picasa Web Albums.</li></ul></ul><ul><ul><li>Google AJAX Search API: uses JavaScript to embed a simple, dynamic Google search box and display search results in your own Web pages.</li></ul></ul><ul><ul><li>Google Analytics: allows users to gather, view, and analyze data about their website traffic. Users can see which content gets the most visits, as well as average page views and time on site per visit.</li></ul></ul><ul><ul><li>Google Safe Browsing APIs: allow client applications to check URLs against Google's constantly updated blacklists of suspected phishing and malware pages.</li></ul></ul><ul><ul><li>YouTube Data API: integrates online videos from YouTube into your applications.</li></ul></ul>
  10. 10. Example: Google API for Inlink (screenshot: input “link URL” and search; results: all the related inlink Web pages)
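The question an inlink query answers, which pages link to a given URL, is the reverse of the outgoing-link graph a crawler collects. The sketch below inverts a small hypothetical link graph locally; it does not call any Google API, and the page names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical crawl result: page -> pages it links to (its outlinks).
outlinks = {
    "a.org": ["b.org", "c.org"],
    "b.org": ["c.org"],
    "d.org": ["c.org", "a.org"],
}


def invert_links(graph):
    """Build the inlink map: page -> set of pages that link to it."""
    inlinks = defaultdict(set)
    for source, targets in graph.items():
        for target in targets:
            inlinks[target].add(source)
    return inlinks


inlinks = invert_links(outlinks)
print(sorted(inlinks["c.org"]))  # ['a.org', 'b.org', 'd.org']
```

A search engine can answer inlink queries cheaply because it already stores this inverted map alongside its crawl.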
  11. 11. Google Translation <ul><li>Google's machine translation service (Google Translate).</li></ul><ul><li>The input and output languages can be Arabic, Chinese, Dutch, English, French, German, Greek, Italian, Japanese, Korean, Portuguese, Russian, or Spanish.</li></ul><ul><li>Major functions of Google Translation include:</li></ul><ul><ul><li>Search multilingual Web pages</li></ul></ul><ul><ul><ul><li>Search the Internet in one language and get the results in another.</li></ul></ul></ul><ul><ul><li>Translate text</li></ul></ul><ul><ul><ul><li>Translate free text into multiple languages.</li></ul></ul></ul><ul><ul><li>Translate a Web page</li></ul></ul><ul><ul><ul><li>Translate a Web page into multiple languages.</li></ul></ul></ul>
  12. 12. Google Translation (screenshots: translating text from Arabic to English; searching multilingual Web pages; translating a Web page)
  13. 13. GATE <ul><li>General Architecture for Text Engineering (GATE) is a toolkit for text mining. It was developed by the NLP group at the University of Sheffield (UK).</li></ul><ul><li>Information extraction tasks:</li></ul><ul><ul><li>Named Entity Recognition (NE)</li></ul></ul><ul><ul><ul><li>Finds names, places, dates, etc.</li></ul></ul></ul><ul><ul><li>Co-reference Resolution (CO)</li></ul></ul><ul><ul><ul><li>Identifies which mentions in a text refer to the same entity.</li></ul></ul></ul><ul><ul><li>Template Element Construction (TE)</li></ul></ul><ul><ul><ul><li>Adds descriptive information to NE results (using CO).</li></ul></ul></ul><ul><ul><li>Template Relation Construction (TR)</li></ul></ul><ul><ul><ul><li>Finds relations between TE entities.</li></ul></ul></ul><ul><ul><li>Scenario Template Production (ST)</li></ul></ul><ul><ul><ul><li>Fits TE and TR results into specified event scenarios.</li></ul></ul></ul><ul><li>GATE also includes:</li></ul><ul><ul><li>Parsers, stemmers, and information retrieval tools;</li></ul></ul><ul><ul><li>Tools for visualizing and manipulating ontologies; and</li></ul></ul><ul><ul><li>Evaluation and benchmarking tools.</li></ul></ul>
  14. 14. GATE (screenshot labels: project information; results display; attributes) * Picture is from
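Named entity recognition of the kind listed for GATE can be approximated, very roughly, by gazetteer (word-list) lookup plus a pattern rule for dates. The sketch below is a toy illustration of that idea in plain Python; it does not use GATE's API, and the gazetteer entries, the example sentence, and the `find_entities` name are hypothetical.

```python
import re

# Hypothetical gazetteer: surface string -> entity type.
GAZETTEER = {
    "University of Sheffield": "Organization",
    "Sheffield": "Location",
    "Tucson": "Location",
}

# Simple date rule, e.g. "12 March 2005".
DATE_RE = re.compile(
    r"\b\d{1,2} (January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{4}\b"
)


def find_entities(text):
    """Return (start, end, type, string) spans sorted by position.

    Longer gazetteer entries are matched first, so "University of Sheffield"
    is tagged as one Organization rather than containing a Location.
    """
    spans = []
    taken = [False] * len(text)
    for name in sorted(GAZETTEER, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            if not any(taken[m.start():m.end()]):
                spans.append((m.start(), m.end(), GAZETTEER[name], name))
                for i in range(m.start(), m.end()):
                    taken[i] = True
    for m in DATE_RE.finditer(text):
        spans.append((m.start(), m.end(), "Date", m.group(0)))
    return sorted(spans)


text = "The memo was sent from Tucson to the University of Sheffield on 12 March 2005."
for start, end, kind, value in find_entities(text):
    print(kind, value)
```

The longest-match-first rule is one simple way to resolve overlapping gazetteer hits; real NE systems add contextual rules on top.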
  15. 15. Arizona Noun Phraser <ul><li>The Arizona Noun Phraser was developed by the Artificial Intelligence Lab at the University of Arizona.</li></ul><ul><li>The Arizona Noun Phraser is made up of three major components: a tokenizer, a part-of-speech tagger, and a phrase generation tool. It generates precise topic descriptions.</li></ul><ul><ul><li>Tokenizer</li></ul></ul><ul><ul><ul><li>Separates punctuation and symbols from text without affecting content.</li></ul></ul></ul><ul><ul><li>Part-of-Speech (POS) Tagger</li></ul></ul><ul><ul><ul><li>Uses both lexical and contextual disambiguation in POS assignment;</li></ul></ul></ul><ul><ul><ul><li>Lexicons include the Brown Corpus, the Wall Street Journal, and the Specialist Lexicon.</li></ul></ul></ul><ul><ul><li>Phrase Generation</li></ul></ul><ul><ul><ul><li>Uses simple finite state automata (FSA) of noun-phrasing rules;</li></ul></ul></ul><ul><ul><ul><li>Breaks sentences and clauses into grammatically correct noun phrases.</li></ul></ul></ul>
  16. 16. Arizona Noun Phraser
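The three-stage pipeline of the Arizona Noun Phraser (tokenizer, POS tagger, finite-state phrase generator) can be sketched in a few lines. This is not the Arizona Noun Phraser itself: a tiny hypothetical lexicon replaces its Brown Corpus, Wall Street Journal, and Specialist Lexicon resources, there is no contextual disambiguation, and the single pattern "optional determiner, any adjectives, one or more nouns" is an assumed stand-in for its noun-phrasing rules.

```python
import re

# Hypothetical mini-lexicon: DT determiner, JJ adjective, NN noun,
# VB verb, IN preposition. Unknown words default to NN.
LEXICON = {
    "the": "DT", "a": "DT", "neural": "JJ", "large": "JJ",
    "network": "NN", "text": "NN", "collection": "NN", "topics": "NN",
    "maps": "VB", "into": "IN", "of": "IN",
}


def tokenize(text):
    """Separate punctuation from words, keeping every token."""
    return re.findall(r"\w+|[^\w\s]", text)


def tag(tokens):
    """Lexicon lookup; punctuation gets PUNCT, unknown words default to NN."""
    tagged = []
    for tok in tokens:
        if re.fullmatch(r"[^\w\s]", tok):
            tagged.append((tok, "PUNCT"))
        else:
            tagged.append((tok, LEXICON.get(tok.lower(), "NN")))
    return tagged


def noun_phrases(tagged):
    """Run a small FSA for (DT)? (JJ)* (NN)+ and keep the longest match."""
    phrases, i, n = [], 0, len(tagged)
    while i < n:
        state, last_accept, j = "START", None, i
        while j < n:
            t = tagged[j][1]
            if state == "START" and t == "DT":
                state = "DET"
            elif state in ("START", "DET", "ADJ") and t == "JJ":
                state = "ADJ"
            elif t == "NN":
                state = "NOUN"  # accepting state
            else:
                break
            if state == "NOUN":
                last_accept = j
            j += 1
        if last_accept is None:
            i += 1
        else:
            phrases.append(" ".join(w for w, _ in tagged[i:last_accept + 1]))
            i = last_accept + 1
    return phrases


sentence = "The neural network maps a large text collection into topics."
print(noun_phrases(tag(tokenize(sentence))))
# ['The neural network', 'a large text collection', 'topics']
```

Taking the longest accepted match is what lets "a large text collection" come out as one phrase instead of several shorter ones.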
  17. 17. SOM <ul><li>The multi-level self-organizing map (SOM) neural network algorithm was developed by the Artificial Intelligence Lab at the University of Arizona.</li></ul><ul><ul><li>In its 2D map display, topics that frequently co-occur are positioned closer together, and more important topics occupy larger regions.</li></ul></ul>
  18. 18. SOM <ul><li>Developed by the AI Lab at the University of Arizona</li></ul>Example: FMD Paper Content Map (2001-2005) (screenshot labels: different topics; topic region; topic; # of documents belonging to this topic). Warm colors represent new topics.
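The core training loop of a self-organizing map, repeatedly finding the best-matching node for an input vector and pulling that node and its grid neighbours toward the input, can be sketched as below. This is a plain single-level SOM in pure Python with assumed parameters (grid size, linearly decaying learning rate, shrinking Gaussian neighbourhood) and made-up "document" vectors; it is not the AI Lab's multi-level algorithm and does not compute region sizes.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid position of the node whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(data, rows, cols, epochs=200, lr0=0.5, radius0=None, seed=0):
    """Train a rows x cols SOM on equal-length vectors; returns node -> weights."""
    rng = random.Random(seed)
    dim = len(data[0])
    if radius0 is None:
        radius0 = max(rows, cols) / 2
    # Random initial weight vector for each grid node (r, c).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)                     # decaying learning rate
        radius = max(radius0 * (1 - frac), 0.5)   # shrinking neighbourhood
        for x in rng.sample(data, len(data)):     # shuffled pass over data
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_dist = math.dist(node, bmu)
                if grid_dist <= radius:
                    # Gaussian neighbourhood: nodes nearer the BMU move more.
                    h = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
    return weights


# Toy vectors over 4 hypothetical terms; docs 0-1 and docs 2-3 share a topic.
docs = [[1, 1, 0, 0], [0.9, 1, 0.1, 0], [0, 0, 1, 1], [0, 0.1, 1, 0.9]]
weights = train_som(docs, rows=3, cols=3)
for i, d in enumerate(docs):
    print(i, best_matching_unit(weights, d))
```

Because each update moves a weight only part of the way toward an input, similar documents end up mapped to the same or nearby grid cells, which is the property the map display relies on.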
  19. 19. Weka <ul><li>Weka was developed at the University of Waikato in New Zealand.</li></ul><ul><li>Tools include:</li></ul><ul><ul><li>Data preprocessing (e.g., data filters),</li></ul></ul><ul><ul><li>Classification (e.g., BayesNet, KNN, C4.5 Decision Tree, Neural Networks, SVM),</li></ul></ul><ul><ul><li>Regression (e.g., Linear Regression, Isotonic Regression, SVM for Regression),</li></ul></ul><ul><ul><li>Clustering (e.g., Simple K-means, Expectation Maximization (EM), Farthest First),</li></ul></ul><ul><ul><li>Association rules (e.g., Apriori Algorithm, Predictive Accuracy, Confirmation Guided),</li></ul></ul><ul><ul><li>Feature selection (e.g., Cfs Subset Evaluation, Information Gain, Chi-squared Statistic), and</li></ul></ul><ul><ul><li>Visualization (e.g., viewing different two-dimensional plots of the data).</li></ul></ul>
  20. 20. Weka (screenshot labels: different analysis tools; different attributes to choose from; the value set of the chosen attribute and the number of input items with each value)
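The Information Gain criterion in Weka's feature-selection tools scores an attribute by how much knowing its value reduces the entropy of the class: IG(C, A) = H(C) - sum over values v of P(A = v) * H(C | A = v). Below is a plain-Python sketch of that formula (not Weka's own code), applied to the "outlook" attribute of the classic 14-example weather (play-tennis) toy data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """IG = H(class) - sum over attribute values v of P(v) * H(class | v)."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


outlook = ["sunny"] * 5 + ["overcast"] * 4 + ["rainy"] * 5
play = (["no", "no", "no", "yes", "yes"]       # sunny: 2 yes, 3 no
        + ["yes"] * 4                           # overcast: 4 yes
        + ["yes", "yes", "yes", "no", "no"])    # rainy: 3 yes, 2 no
print(round(information_gain(outlook, play), 3))  # 0.247
```

Here H(play) is about 0.940 bits, and splitting on outlook leaves about 0.694 bits, so the gain is about 0.247 bits; ranking every attribute by this score is the basis of information-gain feature selection.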
  21. 21. Visualization: NetDraw <ul><li>NetDraw is an open source program written by Steve Borgatti of Analytic Technologies for visualizing both 1-mode and 2-mode social network data.</li></ul><ul><li>It can handle multiple relations at the same time, and can use node attributes to set the colors, shapes, and sizes of nodes. Pictures can be saved in metafile, JPG, GIF, and bitmap formats.</li></ul><ul><li>Two basic kinds of layout are implemented: a circle layout and an MDS/spring embedding based on geodesic distance. You can also rotate, flip, shift, resize, and zoom configurations.</li></ul>
  22. 22. NetDraw (screenshot labels: display setup of the nodes and relations; different functions; the network, with nodes representing individuals and links representing relations)
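NetDraw's MDS/spring-embedding layout is based on geodesic distance, i.e. the number of edges on the shortest path between two nodes. The sketch below computes geodesic distances by breadth-first search on a small hypothetical undirected network and also places the nodes with the simpler circle layout; it illustrates the inputs to those layouts and is not NetDraw's own code.

```python
import math
from collections import deque


def geodesic_distances(adj, source):
    """BFS shortest-path lengths (in edges) from source; unreachable nodes omitted."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def circle_layout(nodes, radius=1.0):
    """Place nodes evenly on a circle; returns node -> (x, y)."""
    n = len(nodes)
    return {node: (radius * math.cos(2 * math.pi * i / n),
                   radius * math.sin(2 * math.pi * i / n))
            for i, node in enumerate(nodes)}


# Hypothetical 1-mode social network with undirected ties.
edges = [("Ann", "Bob"), ("Bob", "Cal"), ("Cal", "Dee"), ("Ann", "Eve")]
adj = {}
for a, b in edges:
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)

print(geodesic_distances(adj, "Eve"))
# {'Eve': 0, 'Ann': 1, 'Bob': 2, 'Cal': 3, 'Dee': 4}
positions = circle_layout(sorted(adj))
```

An MDS or spring-embedding layout would then try to place nodes so that their on-screen distances approximate these geodesic distances.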
  23. 23. JUNG <ul><li>The Java Universal Network/Graph Framework (JUNG) is a software library for the modeling, analysis, and visualization of data that can be represented as a graph or network. It was developed at the School of Information and Computer Science at the University of California, Irvine.</li></ul><ul><li>The current distribution of JUNG includes implementations of a number of algorithms from graph theory, data mining, and social network analysis:</li></ul><ul><ul><li>Clustering</li></ul></ul><ul><ul><li>Decomposition</li></ul></ul><ul><ul><li>Optimization</li></ul></ul><ul><ul><li>Random Graph Generation</li></ul></ul><ul><ul><li>Statistical Analysis</li></ul></ul><ul><ul><li>Calculation of network distances and flows, and importance measures (centrality, PageRank, HITS, etc.)</li></ul></ul>
  24. 24. JUNG (screenshots: examples of visualization types) * Pictures are from
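PageRank, one of the importance measures in JUNG's list, can be computed by power iteration: each node's score is (1 - d)/N plus d times the score flowing in from the nodes that link to it, each source's score split evenly over its outlinks. The sketch below is plain Python, not JUNG's Java API; the damping factor d = 0.85 is a conventional choice, the graph is hypothetical, and spreading dangling nodes' score uniformly is one common convention among several.

```python
def pagerank(graph, d=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on a dict node -> list of outlink targets."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Score held by dangling nodes (no outlinks) is spread evenly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1 - d) / n + d * dangling / n for v in nodes}
        for u, targets in graph.items():
            if targets:
                share = d * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(graph)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 3))
```

C ranks highest because three nodes link to it, while D, which nothing links to, keeps only the baseline (1 - d)/N = 0.0375; the scores always sum to 1.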
  25. 25. Analyst’s Notebook & Starlight <ul><li>Analyst’s Notebook, by i2: a 2D graph and timeline layout tool for crime and intelligence analysis</li></ul><ul><li>Starlight, by Pacific Northwest Lab (PNL): a 3D network visualization and navigation tool for intelligence analysis</li></ul>
  26. 26. Screenshots: Analyst’s Notebook, i2; Starlight, PNL