For project

This presentation is for the project.

Published in: Education, Technology

  1. Visual Representation of Knowledge Articles as Dynamic Interactive Connected Graph Nodes
     Presented by: Aamir Mushtaq, Jesal Mistry, Kapil Tekwani, Neville Shah
     Internal Guide: Prof. Mrs. Kalyani Waghmare
     External Guides: Mr. Prajwalit Bhopale, Mr. Kiran Kulkarni
     Sponsoring Organization: Infinitely Beta
  2. Overview
     • Introduction
     • Problem Definition
     • ACM Keywords
     • Motivation of the Project
     • Algorithm Used
     • System Flow Diagram
     • System Architecture
     • Mathematical Model
     • Feasibility Analysis
     • System UML Diagrams
     • Main Modules
     • Technologies Used
     • Proposed UI
     • Restrictions, Limitations & Constraints
     • References
     • Paper Publications
  3. Introduction
     • Online knowledge articles have become increasingly popular.
     • Example: Wikipedia is used by students, educators, professionals, and others.
     • Problems faced:
         • Article topics to be studied are not easy to understand
         • Reading them takes too much time
         • They contain too much content
     • Possible solution: create a graphical visualization of knowledge articles.
     • This enables users to obtain an easily understandable overview of an article.
     • In this project we present an innovative technique for visualizing the content and contextual information of web pages for a more effective browsing experience.
  4. Problem Definition
     To implement an easy-to-use, interactive e-learning tool for knowledge articles. It will be implemented as a browser plugin that presents a graphical view of a document as a set of nodes. The main node represents the keyword the user wants to learn about, and the neighboring nodes represent the keywords most prominently related to it. In addition, semantic links connect the nodes: each edge represents the relation between the two keywords it joins.
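The node-and-edge view described in the problem definition can be pictured with a small data structure. This is only a minimal sketch: the class name `KeywordGraph`, its methods, and the sample keywords and relation labels are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordGraph:
    """Keyword nodes joined by edges that carry a relation label."""

    root: str  # the main node: the keyword the user asked about
    # adjacency map: keyword -> {neighboring keyword: relation label}
    edges: dict[str, dict[str, str]] = field(default_factory=dict)

    def add_relation(self, source: str, target: str, relation: str) -> None:
        """Connect two keywords with a labeled (semantic) edge."""
        self.edges.setdefault(source, {})[target] = relation
        self.edges.setdefault(target, {})  # register the target as a node too

    def neighbors(self, keyword: str) -> list[str]:
        """Return the keywords directly related to the given keyword."""
        return sorted(self.edges.get(keyword, {}))


# Hypothetical sample data, for illustration only.
graph = KeywordGraph(root="Graph theory")
graph.add_relation("Graph theory", "Vertex", "is composed of")
graph.add_relation("Graph theory", "Leonhard Euler", "was founded by")
```

Changing the root node, as the Graphical Module later requires, would then amount to pointing `root` at a different keyword and redrawing from its neighbors.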
  5. ACM Keywords
     • H. Information Systems
         • H.2 Database Management
             • H.2.8 Database Applications
                 • Data Mining
         • H.3 Information Storage and Retrieval
             • H.3.3 Information Search and Retrieval
                 • Information filtering
                 • Query formulation
             • H.3.5 Online Information Services
                 • Web-based services
         • H.5 Information Interfaces and Presentation
             • H.5.4 Hypertext/Hypermedia
                 • Architecture
                 • Navigation
     • I. Computing Methodologies
         • I.2 Artificial Intelligence
             • I.2.7 Natural Language Processing
                 • Text Analysis
             • I.2.8 Problem Solving, Control Methods & Search
                 • Dynamic Programming
                 • Graph & Tree Search Strategies
  6. Motivation of the Project
     • Project motivation:
         • Provide a user-friendly solution to the problem described in the introduction
         • Save man-hours overall (a picture is worth a thousand words)
         • Visualization and interactivity enhance interest and make learning fun
         • Knowledge articles are assimilated easily and quickly
         • An overview of a topic is obtained with minimum reading
         • Time spent reading is minimised
     • Personal motivation:
         • Learn new technologies
         • Learn the SDLC
         • Develop project management skills
         • Recognition and rewards
  7. Algorithm Used
     1. Input to the system = URL of a Wikipedia article.
     2. Select the document corresponding to the input URL from the Wikipedia dump, or scrape it. (Document = natural-language words + keywords + links.)
     3. Eliminate the natural-language words.
     4. Count section-wise occurrences of keywords, store them in tables, and calculate weights. Example: weight of a particular keyword in a document = 0.7*cs1 + 0.5*cs2 + 0.3*cs3, where cs1, cs2, cs3 are its section-wise counts.
     5. Create a table for the links in that document; if there is a link for a particular keyword, it adds to the weight of that keyword.
     6. Set a weight threshold that keywords or links must meet to be displayed.
  8. Algorithm Used (cont'd)
     7. Depending on the current depth, use a pre-decided window size to select the top keywords/links for the next level. Example: 20 for level 0, 10 for level 1, 5 for level 2 (tuning required).
     8. To search accurate data efficiently, we work across depths: if a keyword present in a previous-level document occurs many times (say 100) at the next levels, this adds weight to the corresponding keyword in the previous level's table.
     9. The output is a graphical representation of the keywords. If a node (keyword) is a link, it is connected to another node (keyword) of the next level; otherwise expansion stops at that level.
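Steps 6, 7 and 9 above (threshold, per-depth window, and expansion of linked keywords only) can be sketched as a breadth-first expansion. This is a rough illustration under stated assumptions, not the project's implementation: `build_keyword_graph` and the `ranked_keywords` callback are hypothetical names, the window is assumed to apply to the article being expanded at that depth, and the cross-depth weight feedback of step 8 is left out.

```python
from collections import deque

# Window sizes per depth, taken from the example in step 7 (tuning required).
WINDOW_BY_DEPTH = {0: 20, 1: 10, 2: 5}


def build_keyword_graph(root, ranked_keywords, threshold=0.0, windows=None):
    """Expand keywords level by level, keeping the top-weighted ones per article.

    ranked_keywords(title) is a hypothetical callback that returns a list of
    (keyword, weight, is_link) tuples for the article with that title.
    Returns an adjacency map: keyword -> list of child keywords.
    """
    windows = WINDOW_BY_DEPTH if windows is None else windows
    graph = {root: []}
    queue = deque([(root, 0)])
    while queue:
        title, depth = queue.popleft()
        window = windows.get(depth)
        if window is None:  # deeper than the last configured level: stop
            continue
        # Step 6: drop candidates below the display threshold.
        candidates = [c for c in ranked_keywords(title) if c[1] >= threshold]
        # Step 7: keep only the top `window` candidates by weight.
        candidates.sort(key=lambda c: c[1], reverse=True)
        for keyword, _weight, is_link in candidates[:window]:
            graph[title].append(keyword)
            if keyword not in graph:
                graph[keyword] = []
                if is_link:  # Step 9: only linked keywords are expanded further
                    queue.append((keyword, depth + 1))
    return graph
```

A caller would supply its own `ranked_keywords`, for example one backed by the weight tables from steps 4 and 5. Keywords already present in the graph are linked but not expanded a second time, which keeps cyclic links between articles from looping.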
  9. System Flow Diagram (figure; not included in the transcript)
  10. System Architecture (figure; not included in the transcript)
  11. Mathematical Model
      Let $S$ be the system.
      $S = \{U_{inp}, U, D, Q, W_t, K_w, T_{Kw,S}, T_{Kw,Wg}, T_{U,Kw}, T_{U,Kw,Wg}\}$
      • $U_{inp}$ = URL identifier (input to the system).
      • $D$ = database of the WWW, containing web pages as documents $d_i$: $D = \{d_1, d_2, d_3, \ldots, d_n\}$, where $d_i$ is a WWW document (web page).
      • $Q$ = set of all possible queries: $Q = \{q_1, q_2, q_3, \ldots, q_n\}$, where $q_i$ is any given query to be fired on the database.
      • $W_t$ = set of words of a particular document: $W_t = \{w_1, w_2, \ldots, w_n\}$, where $w_i \in d_i$ for $1 \le i \le n$.
      • $K_w$ = set of keywords, $K_w \subseteq W_t$, obtained after $F_{el}$: $K_w = \{k_1, k_2, \ldots, k_m\}$, where $k_i \in W_t$ for $1 \le i \le m$.
      • $U$ = URLs extracted from document $d_i$: $U = \{u_1, u_2, \ldots, u_n\}$, where $u_i \in d_i$.
  12. Mathematical Model
      • $T_{Kw,S}$ = table of keywords and sectional counts, obtained after $F_{cnt}$: $T_{Kw,S} = \{\langle k_1, sA_1, sB_1, sC_1 \rangle, \langle k_2, sA_2, sB_2, sC_2 \rangle, \ldots, \langle k_m, sA_m, sB_m, sC_m \rangle\}$
      • $T_{Kw,Wg}$ = table of keywords and associated weights, obtained after $F_w$: $T_{Kw,Wg} = \{\langle k_1, wg_1 \rangle, \langle k_2, wg_2 \rangle, \ldots, \langle k_m, wg_m \rangle\}$
      • $T_{U,Kw}$ = table of the URLs in $U$ mapped to the keywords-and-weights table $T_{Kw,Wg}$, obtained after $F_{map}$: $T_{U,Kw} = \{\langle u_n, k_m, wg_m \rangle\}$
      • $U_t$ is a mapping of keywords to their respective URLs in $U$.
  13. Mathematical Model
      Functions:
      • $F_{el}(W_t\{\langle w_1, w_2, \ldots, w_n \rangle\}) = K_w$
        $F_{el}$ eliminates all natural-language elements from $W_t$; the resulting set of words are the keywords, collected in $K_w$.
      • $F_{cnt}(K_w\{\langle k_1, k_2, \ldots, k_m \rangle\}) = T_{Kw,S}$
        $F_{cnt}$ returns an array of tuples of keywords and their respective sectional counts, $\{\langle k_m, sA_m, sB_m, sC_m \rangle\}$, which are used to calculate the keyword weights. $T_{Kw,S}$ is the input to $F_w$.
      • $F_w(T_{Kw,S}\{\langle k_m, sA_m, sB_m, sC_m \rangle\}) = T_{Kw,Wg}$
        $F_w$ takes the $T_{Kw,S}$ produced by $F_{cnt}$ as input, calculates the weight associated with each keyword, and returns an array of tuples of keywords and weights, $\{\langle k_m, wg_m \rangle\}$.
      • $F_{map}(U\{\langle u_1, u_2, \ldots, u_n \rangle\}, T_{Kw,Wg}\{\langle k_1, wg_1 \rangle, \langle k_2, wg_2 \rangle, \ldots, \langle k_m, wg_m \rangle\}) = T_{U,Kw,Wg}$
        $F_{map}$ takes $U$ and $T_{Kw,Wg}$ as input, maps the keywords to their respective URLs in $d_i$, and returns an array of URLs with their mapped keywords and weights.
      • $F_{win}(lvl) \in \{5, 10, 20\}$
        $F_{win}$ is a window function that returns the window size, which depends on the current depth/level.
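The functions of the mathematical model can be read as a small pipeline: words, then keywords, then sectional counts, then weights, then URL mapping. The sketch below illustrates that chain under several assumptions that are not in the slides: "natural-language elements" are approximated by a tiny hard-coded stop-word list, the weight coefficients are the 0.7/0.5/0.3 example from the algorithm slide, and `f_map` takes the keyword-to-URL mapping as a plain dictionary. The function names and the `tokenize` helper are hypothetical.

```python
import re
from collections import Counter

# Assumed stand-in for "natural language elements"; a real system would use
# a proper stop-word list (for example from NLTK, which the deck mentions).
STOP_WORDS = {"a", "an", "and", "the", "of", "is", "in", "to", "are"}
# Section coefficients from the algorithm slide's example.
SECTION_COEFFS = (0.7, 0.5, 0.3)
# Window sizes per level from the algorithm slide's example.
WINDOW_SIZES = {0: 20, 1: 10, 2: 5}


def tokenize(text):
    """Hypothetical helper: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def f_el(words):
    """F_el: eliminate natural-language words, leaving the keywords K_w."""
    return {w for w in words if w not in STOP_WORDS}


def f_cnt(keywords, sections):
    """F_cnt: build T_{Kw,S}, the per-section counts of each keyword."""
    counts = [Counter(section) for section in sections]
    return {k: tuple(c[k] for c in counts) for k in keywords}


def f_w(table):
    """F_w: build T_{Kw,Wg} as a weighted sum of the sectional counts."""
    return {
        k: sum(coeff * n for coeff, n in zip(SECTION_COEFFS, counts))
        for k, counts in table.items()
    }


def f_map(keyword_urls, weights):
    """F_map: join URLs with keywords and weights into (url, keyword, weight)."""
    return [(keyword_urls[k], k, w) for k, w in weights.items() if k in keyword_urls]


def f_win(level):
    """F_win: window size for the given depth/level."""
    return WINDOW_SIZES[level]


# Made-up three-section document, for illustration only.
sections = [
    tokenize("The graph is a set of vertices and edges"),
    tokenize("Edges join vertices in the graph"),
    tokenize("A tree is a graph"),
]
words = {w for section in sections for w in section}
weights = f_w(f_cnt(f_el(words), sections))
```

Here "graph" appears once in every section, so its weight is the sum of all three coefficients, while a word that appears only in the last section gets the smallest coefficient alone.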
  14. Feasibility Analysis
      • NP-Hard:
          • The number of keywords and links is not known while scanning the wiki
          • Processing power at the server is not determined in advance
          • The ranking algorithm is exponential in nature
          • A solution is not determined in polynomial time
      • NP-Complete:
          • Assign ranks to keywords and links using the ranking algorithm
          • Use a threshold value to limit links
          • Approximate processing power is calculated for scanning documents
          • Thus the problem is converted from NP-Hard to NP-Complete
  15. Main Modules
      • Extraction Module:
          • Extracts URLs and keywords from the base document and sub-documents
          • Uses frequency and inverse frequency calculations
      • Keyword Analysis Module:
          • Implements the ranking algorithm on keywords
          • Keywords are identified and listed in the relevant tables
      • Linking Module:
          • Checks whether keywords are linked to URLs
          • If linking is not done, there will be duplicate entries of both URLs and keywords
      • Graphical Module:
          • Displays the entire wiki page as a dynamic connected graph
          • Supports dynamically changing the root node
          • Shows the text snippet contained inside each node
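The Extraction Module's "frequency and inverse frequency calculations" are not spelled out in the slides. One common reading is term frequency times inverse document frequency (TF-IDF); the sketch below assumes that reading, and the function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per tokenized document.

    Score = (term count / document length) * log(N / documents containing term).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Made-up base document and sub-document, for illustration only.
docs = [["graph", "node", "graph"], ["graph", "edge"]]
scores = tf_idf(docs)
```

A term that occurs in every document scores zero under this formula, which is why it tends to favor keywords that distinguish the base article from its sub-documents.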
  16. Technologies Used
      • Python: scripting language
      • MathML: Mathematical Markup Language
      • Gremlin or Neo4j for graph operations and storage
      • Ajax: client-side scripting
      • Git: distributed revision control system
      • Python's NLTK libraries for natural language processing
      • Unix shell
  17. Proposed UI: shows the output of a search (figure; not included in the transcript)
  18. Restrictions, Limitations & Constraints
      • The software will search for keywords and links to a maximum depth of 3 levels, including the root level.
      • There will be a limit on the number of links or keywords chosen for further processing.
      • The Wikipedia dump has to be updated for every new release.
      • For articles about natural-language words, the NLP step will itself eliminate those words.
      • For small articles, relevant keywords may not be found properly.
      • A page may not contain a definite number of links.
  19. References
      • Schonhofen, P., "Identifying Document Topics Using the Wikipedia Category Network," IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006), pp. 456-462, 18-22 Dec. 2006
      • Lamberti, F.; Sanna, A.; Demartini, C., "A Relation-Based Page Rank Algorithm for Semantic Web Search Engines," IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 1, pp. 123-136, Jan. 2009
      • Alani, H.; Sanghee Kim; Millard, D.E.; Weal, M.J.; Hall, W.; Lewis, P.H.; Shadbolt, N.R., "Automatic ontology-based knowledge extraction from Web documents," IEEE Intelligent Systems, vol. 18, no. 1, pp. 14-21, Jan.-Feb. 2003
      • Schindler, M.; Vrandečić, D., "Introducing New Features to Wikipedia: Case Studies for Web Science," IEEE Intelligent Systems, vol. 26, no. 1, pp. 56-61, Jan.-Feb. 2011
      • Cheong-Iao Pang; Biuk-Aghai, R.P., "Map-like Wikipedia overview visualization," 2011 International Conference on Collaboration Technologies and Systems (CTS), pp. 53-60, 23-27 May 2011
      • Boukhelifa, N.; Chevalier, F.; Fekete, J., "Real-time aggregation of Wikipedia data for visual analytics," 2010 IEEE Symposium on Visual Analytics Science and Technology (VAST), pp. 147-154, 25-26 Oct. 2010
      • Prato, A.; Ronchetti, M., "Using Wikipedia as a Reference for Extracting Semantic Information from a Text," Third International Conference on Advances in Semantic Processing (SEMAPRO '09), pp. 56-61, 11-16 Oct. 2009
  20. References
      • Taneja, Harmunish; Gupta, Richa, "Web Information Retrieval Using Query Independent Page Rank Algorithm," 2010 International Conference on Advances in Computer Engineering (ACE), pp. 178-182, 20-21 June 2010
      • Pirrone, R.; Pipitone, A.; Russo, G., "Semantic sense extraction from Wikipedia pages," 2010 3rd Conference on Human System Interactions (HSI), pp. 543-547, 13-15 May 2010
      • Wikipedia data from Wikipedia links: http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm
      • Wikipedia database download in XML format: http://dumps.wikimedia.org/ derived from http://en.wikipedia.org/wiki/wikipedia:Database_Download
      • Wikitools from MediaWiki: http://en.wikipedia.org/wiki/MediaWiki
      • Wikipedia Categorization from the Wikipedia website: http://en.wikipedia.org/wiki/Wikipedia:Categorization
  21. Paper Publications
      • Paper title: Visual Representation of Knowledge Articles as Dynamic Interactive Connected Graph Nodes
      • Conferences where the paper was submitted:
          • European Modeling Symposium 2011 (EMS2011)
          • Informatics and Computational Intelligence 2011 (ICI2011)
          • Education and e-learning conference 2011 (EeL2011)
      • Conferences where the paper was accepted:
          • European Modeling Symposium 2011 (EMS2011)
          • Informatics and Computational Intelligence 2011 (ICI2011)
          • Education and e-learning conference 2011 (EeL2011)
      • Journal where the paper was accepted: International Foundation for Modern Education and Scientific Research (INFOMESR)
  22. Paper Publications
      • Backward references:
          • Schonhofen, P., "Identifying Document Topics Using the Wikipedia Category Network," IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006), pp. 456-462, 18-22 Dec. 2006
          • Cheong-Iao Pang; Biuk-Aghai, R.P., "Map-like Wikipedia overview visualization," 2011 International Conference on Collaboration Technologies and Systems (CTS), pp. 53-60, 23-27 May 2011
          • Boukhelifa, N.; Chevalier, F.; Fekete, J., "Real-time aggregation of Wikipedia data for visual analytics," 2010 IEEE Symposium on Visual Analytics Science and Technology (VAST), pp. 147-154, 25-26 Oct. 2010
          • Lamberti, F.; Sanna, A.; Demartini, C., "A Relation-Based Page Rank Algorithm for Semantic Web Search Engines," IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 1, pp. 123-136, Jan. 2009
          • Prato, A.; Ronchetti, M., "Using Wikipedia as a Reference for Extracting Semantic Information from a Text," Third International Conference on Advances in Semantic Processing (SEMAPRO '09), pp. 56-61, 11-16 Oct. 2009
  23. Paper Publications
      • Forward references:
          • Cheong-Iao Pang; Biuk-Aghai, R.P., "Map-like Wikipedia overview visualization," 2011 International Conference on Collaboration Technologies and Systems (CTS), pp. 53-60, 23-27 May 2011
          • Pirrone, R.; Pipitone, A.; Russo, G., "Semantic sense extraction from Wikipedia pages," 2010 3rd Conference on Human System Interactions (HSI), pp. 543-547, 13-15 May 2010
          • Wikipedia data from Wikipedia links: http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm
          • Wikipedia database download in XML format: http://dumps.wikimedia.org/ derived from http://en.wikipedia.org/wiki/wikipedia:Database_Download
          • Wikitools from MediaWiki: http://en.wikipedia.org/wiki/MediaWiki
          • Wikipedia Categorization from the Wikipedia website: http://en.wikipedia.org/wiki/Wikipedia:Categorization
  24. Any Questions?
  25. Thank You
