Ontopia Code Camp



A presentation of the Ontopia product from the Ontopia Code Camp at TMRA 2009.



  1. 1. Ontopia Code Camp<br />TMRA 2009-11-11<br />Lars Marius Garshol & Geir Ove Grønmo<br />
  2. 2. Agenda<br />About you<br />who are you?<br />what do you want from the code camp?<br />About Ontopia<br />The product<br />The future<br />Participating in the project<br />Writing some code!<br />
  3. 3. Some background<br />About Ontopia<br />
  4. 4. Brief history<br />1999-2000<br />private hobby project for Geir Ove<br />2000-2009<br />commercial software sold by Ontopia AS<br />lots of international customers in diverse fields<br />2009-<br />open source project<br />
  5. 5. The project<br />Open source hosted at Google Code<br />Contributors<br />Lars Marius Garshol, Bouvet<br />Geir Ove Grønmo, Bouvet<br />Thomas Neidhart, SpaceApps<br />Lars Heuer, Semagia<br />Hannes Niederhausen, TMLab<br />Stig Lau, Bouvet<br />Baard H. Rehn-Johansen, Bouvet<br />Peter-Paul Kruijssen, Morpheus<br />Quintin Siebers, Morpheus<br />
  6. 6. Current activity (toward 5.1)<br />tolog updates<br />added by LMG<br />Various fixes and optimizations<br />by everyone<br />Toma implementation (in sandbox)<br />by Thomas<br />TMQL implementation (in sandbox)?<br />by Sven Krosse<br />
  7. 7. Architecture and modules<br />The product<br />
  8. 8. The big picture<br />Auto-class.<br />A.N.other<br />A.N.other<br />Other CMSs<br />A.N.other<br />A.N.other<br />DB2TM<br />Portlet support<br />OKP<br />XML2TM<br />Engine<br />CMS integration<br />Data integration<br />Escenic<br />Taxon. import<br />Ontopoly<br />Web service<br />
  9. 9. The engine<br />Core API<br />TMAPI 2.0 support<br />Import/export<br />RDF conversion<br />TMSync<br />Fulltext search<br />Event API<br />tolog query language<br />tolog update language<br />Engine<br />
  10. 10. Query Engine<br />Implementation of Ontopia’s tolog language (based on Prolog and SQL)<br />Allows powerful queries on the topic map data structure<br />Simplifies application development and improves performance<br />Example:<br />select $B, count($A) from <br />instance-of($B, city),<br />{ premiere($A : opera, $B : place) | <br /> premiere($A : opera, $C : place), <br /> located-in($C : containee, $B : container) } <br />order by $A desc?<br />Returns all B's and the corresponding number of A's where B is a city AND EITHER B is the place where A was premiered OR the place where A was premiered is located in B, in decreasing order of A<br />TMSync<br />Configurable module for synchronizing one TM against another<br />define subset of source TM to sync (using tolog)<br />define subset of target TM to sync (using tolog)<br />the module handles the rest<br />Can also be used with non-TM sources<br />create a non-updating conversion from the source to some TM format<br />then use TMSync to sync against the converted TM instead of directly against the source<br />
  11. 11. How TMSync works<br />Define which part of the target topic map you want,<br />Define which part of the source topic map it is the master for, and<br />The algorithm does the rest<br />
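The three steps on this slide amount to a keyed set difference between the two selected subsets. A minimal sketch in plain Java, not Ontopia's TMSync code: topics are modelled as hypothetical `Map<String, String>` entries (identifier to name), and `TmSyncSketch` and `sync` are names assumed for illustration.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: a one-way sync of a target subset against a source subset. */
final class TmSyncSketch {

    /**
     * Makes the selected part of the target equal to the selected part of the source.
     * Keys stand in for topic identities, values for topic names (both hypothetical).
     * Returns a log of the actions taken, sorted by key.
     */
    static Map<String, String> sync(Map<String, String> source, Map<String, String> target) {
        Map<String, String> log = new TreeMap<>();
        // Add or update everything the source is the master for.
        for (Map.Entry<String, String> e : source.entrySet()) {
            String old = target.get(e.getKey());
            if (old == null) {
                target.put(e.getKey(), e.getValue());
                log.put(e.getKey(), "added");
            } else if (!old.equals(e.getValue())) {
                target.put(e.getKey(), e.getValue());
                log.put(e.getKey(), "updated");
            }
        }
        // Remove what the source no longer contains.
        Iterator<String> it = target.keySet().iterator();
        while (it.hasNext()) {
            String key = it.next();
            if (!source.containsKey(key)) {
                it.remove();
                log.put(key, "deleted");
            }
        }
        return log;
    }

    public static void main(String[] args) {
        Map<String, String> source = new HashMap<>();
        source.put("org1", "Ontopia");
        source.put("org2", "Bouvet ASA");
        Map<String, String> target = new HashMap<>();
        target.put("org2", "Bouvet");
        target.put("org3", "Obsolete");
        System.out.println(sync(source, target));
        System.out.println(target.equals(source));
    }
}
```

In the real module the two subsets are selected with tolog queries; here they are simply the two maps passed in.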
  12. 12. If the source is not a topic map<br />TMSync<br />convert.xslt<br />Simply do a normal one-time conversion<br />let TMSync do the update for you<br />In other words, TMSync reduces the update problem to a conversion problem<br />source.xml<br />
  13. 13. The City of Bergen use case<br />Norge.no<br />Service<br />Unit<br />Person<br />LOS<br />City of Bergen<br />LOS<br />
  14. 14. The backends<br />In-memory<br />no persistent storage<br />thread-safe<br />no setup<br />RDBMS<br />transactions<br />persistent<br />thread-safe<br />uses caching<br />clustering<br />Remote<br />uses web service<br />read-only<br />unofficial<br />Engine<br />Memory<br />RDBMS<br />Remote<br />
  15. 15. RDBMS Backend<br />Allows the Engine to use topic maps stored in a relational database<br />Based on a generic topic map schema<br />Necessary when working with very large topic maps<br />Transparent to applications<br />Features<br />Automatically loads data when needed<br />Caches frequently used data<br />Full support for RDBMS transactions<br />Supports tolog-to-SQL compilation<br />Statistical reports for performance tuning<br />Platform support<br />Oracle, MySQL, PostgreSQL, MS SQL Server<br />Test suite available for verifying compatibility with other JDBC-enabled RDBMSes<br />
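"Caches frequently used data" can be pictured as a bounded least-recently-used cache sitting in front of the database. The sketch below uses the JDK's `LinkedHashMap` in access order; it is an assumption about the general technique, not a description of Ontopia's actual caching code, and `LruCacheSketch` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative least-recently-used cache; not Ontopia's actual caching code. */
final class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruCacheSketch(int capacity) {
        super(16, 0.75f, true); // true = keep entries in access order, eldest first
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict once the cache grows past its capacity
    }

    public static void main(String[] args) {
        LruCacheSketch<String, String> cache = new LruCacheSketch<>(2);
        cache.put("topic-1", "Puccini");
        cache.put("topic-2", "Verdi");
        cache.get("topic-1");            // touch topic-1, so topic-2 becomes the eldest
        cache.put("topic-3", "Rossini"); // evicts topic-2
        System.out.println(cache.keySet());
    }
}
```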
  16. 16. DB2TM<br />Upconversion to TMs<br />from RDBMS via JDBC<br />or from CSV<br />Uses XML mapping<br />can call out to Java<br />Supports sync<br />either full rescan<br />or change table<br />TMRAP<br />Nav<br />DB2TM<br />Classify<br />Engine<br />Memory<br />RDBMS<br />Remote<br />
  17. 17. DB2TM example<br />Ontopia<br />+<br />=<br />United Nations<br />Bouvet<br />&lt;relation name=&quot;organizations.csv&quot; columns=&quot;id name url&quot;&gt;<br /> &lt;topic type=&quot;ex:organization&quot;&gt;<br /> &lt;item-identifier&gt;#org${id}&lt;/item-identifier&gt;<br /> &lt;topic-name&gt;${name}&lt;/topic-name&gt;<br /> &lt;occurrence type=&quot;ex:homepage&quot;&gt;${url}&lt;/occurrence&gt;<br /> &lt;/topic&gt;<br />&lt;/relation&gt;<br />
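The mapping above is applied once per row of `organizations.csv`, with each `${column}` placeholder filled from that row. A minimal sketch of that substitution step in plain Java, assuming a simple comma-separated file without quoting; `Db2tmTemplateSketch`, `expand` and `parseRow` are hypothetical names, and the sample row is invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: fills ${column} placeholders from one CSV row. */
final class Db2tmTemplateSketch {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    /** Replaces each ${column} in the template with that column's value in the row. */
    static String expand(String template, Map<String, String> row) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = row.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Unknown column: " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Splits one simple CSV line (no quoting or escaping) into column name -> value. */
    static Map<String, String> parseRow(String[] columns, String line) {
        String[] cells = line.split(",", -1);
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < columns.length; i++) {
            row.put(columns[i], i < cells.length ? cells[i] : "");
        }
        return row;
    }

    public static void main(String[] args) {
        String[] columns = {"id", "name", "url"};
        // Hypothetical row; the slide does not show the CSV contents.
        Map<String, String> row = parseRow(columns, "1,Ontopia,http://www.ontopia.net");
        System.out.println(expand("#org${id}", row)); // item identifier
        System.out.println(expand("${name}", row));   // topic name
        System.out.println(expand("${url}", row));    // homepage occurrence
    }
}
```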
  18. 18. TMRAP<br />Web service interface<br />via SOAP<br />via plain HTTP<br />Requests<br />get-topic<br />get-topic-page<br />get-tolog<br />delete-topic<br />...<br />TMRAP<br />Nav<br />DB2TM<br />Classify<br />Engine<br />Memory<br />RDBMS<br />Remote<br />
  19. 19. Navigator framework<br />Servlet-based API<br />manage topic maps<br />load/scan/delete/create<br />JSP tag library<br />XSLT-like<br />based on tolog<br />JSTL integration<br />TMRAP<br />Nav<br />DB2TM<br />Classify<br />Engine<br />Memory<br />RDBMS<br />Remote<br />
  20. 20. Ontopia Navigator Framework<br />Java API for interacting with TM repository<br />JSP tag library<br />based on tolog<br />kind of like XSLT in JSP with tolog instead of XPath<br />has JSTL integration<br />Undocumented parts<br />web presentation components<br />some wrapped as JSP tags<br />want to build proper portlets from them<br />
  21. 21. http://www.ontopia.net/operamap<br />
  22. 22. Navigator tag library example<br />&lt;%-- assume variable &apos;composer&apos; is already set --%&gt;<br />&lt;p&gt;&lt;b&gt;Operas:&lt;/b&gt;&lt;br/&gt;<br />&lt;tolog:foreach query=&quot;composed-by(%composer% : composer, $OPERA : opera), { premiere-date($OPERA, $DATE) }?&quot;&gt;<br /> &lt;li&gt; &lt;a href=&quot;opera.jsp?id=&lt;tolog:id var=&quot;OPERA&quot;/&gt;&quot;&gt;&lt;tolog:out var=&quot;OPERA&quot;/&gt;&lt;/a&gt;<br /> &lt;tolog:if var=&quot;DATE&quot;&gt; &lt;tolog:out var=&quot;DATE&quot;/&gt; &lt;/tolog:if&gt; &lt;/li&gt;<br />&lt;/tolog:foreach&gt;&lt;/p&gt;<br />
  23. 23. Elmer Preview<br />
  27. 27. Automated classification<br />Undocumented<br />experimental<br />Extracts text<br />autodetects format<br />Word, PDF, XML, HTML<br />Processes text<br />detects language<br />stemming, stop-words<br />Extracts keywords<br />ranked by importance<br />uses existing topics<br />supports compound terms<br />TMRAP<br />Nav<br />DB2TM<br />Classify<br />Engine<br />Memory<br />RDBMS<br />Remote<br />
  28. 28. Example of keyword extraction<br />topic maps 1.0<br />metadata 0.57<br />subject-based class. 0.42<br />Core metadata 0.42<br />faceted classification 0.34<br />taxonomy 0.22<br />monolingual thesauri 0.19<br />controlled vocabulary 0.19<br />Dublin Core 0.16<br />thesauri 0.16<br />Dublin 0.15<br />keywords 0.15<br />
  29. 29. Example #2<br />Automated classification 1.0 5<br />Topic Maps 0.51 14<br />XSLT 0.38 11<br />compound keywords 0.29 2<br />keywords 0.26 20<br />Lars 0.23 1<br />Marius 0.23 1<br />Garshol 0.22 1<br />...<br />
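The two examples show terms ranked by a score whose top value is 1.0. One simple way to get a ranking of that shape is to count terms after removing stop-words and divide by the highest count. This is a sketch under that assumption only: the real classifier also stems, detects language and handles compound terms, none of which is shown, and `KeywordSketch` and its stop-word list are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Illustrative only: frequency-based keyword ranking, normalised to a top score of 1.0. */
final class KeywordSketch {
    private static final Set<String> STOP_WORDS = new HashSet<>(
        Arrays.asList("the", "a", "of", "and", "in", "is", "to", "are", "for"));

    /** Returns terms mapped to scores, highest score first. */
    static LinkedHashMap<String, Double> extract(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Highest count first; ties broken alphabetically so the order is stable.
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        LinkedHashMap<String, Double> ranked = new LinkedHashMap<>();
        if (entries.isEmpty()) {
            return ranked;
        }
        double max = entries.get(0).getValue();
        for (Map.Entry<String, Integer> e : entries) {
            ranked.put(e.getKey(), e.getValue() / max);
        }
        return ranked;
    }

    public static void main(String[] args) {
        String text = "Topic maps and metadata: the topic maps standard "
            + "is a standard for metadata in topic maps.";
        System.out.println(extract(text));
    }
}
```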
  30. 30. So how could this be used?<br />To help users classify new documents in a CMS interface<br />suggest appropriate keywords, screened by user before approval<br />Automate classification of incoming documents<br />this means lower quality, but also lower cost<br />Get an overview of interesting terms in a document corpus<br />classify all documents, extract the most interesting terms<br />this can be used as the starting point for building an ontology<br />(keyword extraction only)<br />
  31. 31. Example user interface<br />The user creates an article<br />this screen then used to add keywords<br />user adjusts the proposals from the classifier<br />
  32. 32. Vizigator<br />Viz<br />Ontopoly<br />Graphical visualization<br />VizDesktop<br />Swing app to configure<br />filter/style/...<br />Vizlet<br />Java applet for web<br />uses configuration<br />loads via TMRAP<br />uses “Remote” backend<br />TMRAP<br />Nav<br />DB2TM<br />Classify<br />Engine<br />Memory<br />RDBMS<br />Remote<br />
  33. 33. The Vizigator<br />Graphical visualization of Topic Maps<br />Two parts<br />VizDesktop: Swing desktop app for configuration<br />Vizlet: Java applet for web deployment<br />Configuration stored in XTM file<br />
  34. 34. Without configuration<br />
  35. 35. With configuration<br />
  36. 36. The Vizigator<br />The Vizigator uses TMRAP<br />the Vizlet runs in the browser (on the client)<br />a fragment of the topic map is downloaded from the server<br />the fragment is grown as needed<br />Server<br />TMRAP<br />
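The "fragment is grown as needed" idea can be sketched as a client-side graph that only learns a topic's neighbours when that topic is expanded. The code below is a plain-Java illustration, not the Vizlet's implementation: the in-memory `server` map stands in for TMRAP requests, and `FragmentGrowthSketch`, `expand` and `loadedTopics` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative only: a client-side fragment that grows one topic at a time. */
final class FragmentGrowthSketch {
    private final Map<String, List<String>> server; // stands in for the server-side topic map
    private final Map<String, Set<String>> fragment = new TreeMap<>(); // what the client holds

    FragmentGrowthSketch(Map<String, List<String>> server) {
        this.server = server;
    }

    /** Simulates one request to the server: fetch a topic and its direct neighbours. */
    void expand(String topic) {
        List<String> neighbours = server.getOrDefault(topic, Collections.<String>emptyList());
        fragment.computeIfAbsent(topic, k -> new TreeSet<>()).addAll(neighbours);
        for (String n : neighbours) {
            // Neighbours become visible, but their own neighbours are not loaded yet.
            fragment.computeIfAbsent(n, k -> new TreeSet<>()).add(topic);
        }
    }

    Set<String> loadedTopics() {
        return fragment.keySet();
    }

    public static void main(String[] args) {
        Map<String, List<String>> server = new HashMap<>();
        server.put("Puccini", Arrays.asList("Tosca", "Lucca"));
        server.put("Tosca", Arrays.asList("Puccini", "Rome"));
        FragmentGrowthSketch client = new FragmentGrowthSketch(server);
        client.expand("Puccini");
        System.out.println(client.loadedTopics()); // Rome is not loaded yet
        client.expand("Tosca");
        System.out.println(client.loadedTopics()); // now Rome is included
    }
}
```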
  37. 37. Ontopoly<br />Viz<br />Ontopoly<br />Generic editor<br />web-based, AJAX<br />meta-ontology in TM<br />Ontology designer<br />create types and fields<br />control user interface<br />build views<br />incremental dev<br />Instance editor<br />guided by ontology<br />TMRAP<br />Nav<br />DB2TM<br />Classify<br />Engine<br />Memory<br />RDBMS<br />Remote<br />
  38. 38. Ontopoly<br />A generic Topic Maps editor, in two parts<br />ontology editor: used to create the ontology and schema<br />instance editor: used to enter instances based on ontology<br />Built with the Web Editor Framework<br />works with both XTM files and topic maps stored in RDBMS backend<br />supports access control to administrative functions, ontology, and instance editors<br />existing topic maps can be imported<br />parts of the ontology can be marked as read-only, or hidden<br />
  40. 40. Typical deployment<br />Viewing application<br />Engine<br />Users<br />DB<br />Backend<br />Ontopoly<br />Frameworks<br />Editors<br />DB<br />TMRAP<br />DB2TM<br />HTTP<br />DB<br />External application<br />Application server<br />
  41. 41. CMS integration<br />The best way to add content functionality to Ontopia<br />the world doesn’t need another CMS<br />better to reuse those which already exist<br />So far two integrations exist<br />Escenic<br />OfficeNet Knowledge Portal<br />more are being worked on<br />
  42. 42. Implementation<br />A CMS event listener<br />the listener creates topics for new CMS articles, folders, etc<br />the mapping is basically the design of the ontology used by this listener<br />Presentation integration<br />it must be possible to list all topics attached to an article<br />conversely, it must be possible to list all articles attached to a topic<br />how close the integration needs to be here will vary, as will the difficulty of the integration<br />User interface integration<br />it needs to be possible to attach topics to an article from within the normal CMS user interface<br />this can be quite tricky<br />Search integration<br />the Topic Maps search needs to also search content in the CMS<br />can be achieved by writing a tolog plug-in<br />
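The first two integration points (an event listener that creates article topics, and lookups in both directions between articles and topics) can be sketched as follows. Everything here is hypothetical: `CmsListener` is an invented callback interface, since each CMS defines its own event API, and `TopicStore` is a plain-Java stand-in for the topic map rather than Ontopia's API.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative only: CMS events feeding a tiny article/topic index. */
final class CmsIntegrationSketch {

    /** Hypothetical callback interface; a real CMS defines its own event API. */
    interface CmsListener {
        void articleCreated(String articleId, String title);
    }

    /** Minimal stand-in for a topic map: article topics plus "is about" links. */
    static final class TopicStore implements CmsListener {
        private final Map<String, String> articleTopics = new TreeMap<>(); // article id -> name
        private final Map<String, Set<String>> topicsByArticle = new TreeMap<>();
        private final Map<String, Set<String>> articlesByTopic = new TreeMap<>();

        @Override
        public void articleCreated(String articleId, String title) {
            articleTopics.put(articleId, title); // the article title becomes the topic name
        }

        /** Records an "is about" link between an existing article topic and a subject topic. */
        void attach(String articleId, String topic) {
            if (!articleTopics.containsKey(articleId)) {
                throw new IllegalArgumentException("No topic for article " + articleId);
            }
            topicsByArticle.computeIfAbsent(articleId, k -> new TreeSet<>()).add(topic);
            articlesByTopic.computeIfAbsent(topic, k -> new TreeSet<>()).add(articleId);
        }

        Set<String> topicsOf(String articleId) {
            return topicsByArticle.getOrDefault(articleId, Collections.<String>emptySet());
        }

        Set<String> articlesAbout(String topic) {
            return articlesByTopic.getOrDefault(topic, Collections.<String>emptySet());
        }
    }

    public static void main(String[] args) {
        TopicStore store = new TopicStore();
        store.articleCreated("a1", "New city council appointed");
        store.attach("a1", "Elections");
        System.out.println(store.topicsOf("a1"));
        System.out.println(store.articlesAbout("Elections"));
    }
}
```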
  43. 43. Articles as topics<br />is about<br />Elections<br />New city council appointed<br />Goal: associate articles with topics<br />mainly to say what they are about<br />typically also want to include other metadata<br />Need to create topics for the articles to do this<br />in fact, a general CMS-to-TM mapping is needed<br />must decide what metadata and structures to include<br />
  44. 44. Mapping issues<br />Article topics<br />what topic type to use?<br />title becomes name? (do you know the title?)<br />include author? include last modified? include workflow state?<br />should all articles be mapped?<br />Folders/directories/sections/...<br />should these be mapped, too?<br />one topic type for all folders/.../.../...?<br />if so, use associations to connect articles to folders<br />use associations to reproduce hierarchical folder structure<br />Multimedia objects<br />should these be included?<br />what topic type? what name? ...<br />
  45. 45. Two styles of mappings<br />Articles as articles<br />Topic represents only the article<br />Topic type is some subclass of “article”<br />“Is about” association connects article into topic map<br />Fields are presentational<br />title, abstract, body<br />Articles as concepts<br />Topic represents some real-world subject (like a person)<br />article is just the default content about that subject<br />Type is the type of the subject (person)<br />Semantic associations to the rest of the topic map<br />works in department, has competence, ...<br />Fields can be semantic<br />name, phone no, email, ...<br />
  46. 46. Article as article<br />Article about building of a new school<br />Is about association to “Primary schools”<br />Topic type is “article”<br />
  47. 47. Article as concept<br />Article about a sports hall<br />Article really represents the hall<br />Topic type is “Location”<br />Associations to<br />city borough<br />events in the location<br />category “Sports”<br />
  54. 54. Two projects<br />
  55. 55. The project<br />A new citizen’s portal for the city administration<br />strategic decision to make portal main interface for interaction with citizens<br />as many services as possible are to be moved online<br />Big project<br />started in late 2004, to continue at least into 2008<br />~5 million Euro spent by launch date<br />1.7 million Euro budgeted for 2007<br />Topic Maps development is a fraction of this (less than 25%)<br />Many companies involved<br />Bouvet/Ontopia<br />Avenir<br />KPMG<br />Karabin<br />Escenic<br />
  56. 56. Simplified original ontology<br />Service catalog<br />Escenic (CMS)<br />LOS<br />Form<br />Article<br />nearly everything<br />Category<br />Service<br />Subject<br />Department<br />Borough<br />External resource<br />Employee<br />Payroll++<br />
  57. 57. Data flow<br />Ontopoly<br />Ontopia<br />Escenic<br />LOS<br />Integration<br />TMSync<br />DB2TM<br />Fellesdata<br />Payroll (Agresso)<br />Dexter/Extens<br />Service Catalog<br />
  58. 58. Conceptual architecture<br />Data<br />sources<br />Oracle Portal<br />Application<br />Ontopia<br />Escenic<br />Oracle Database<br />
  59. 59. The portal<br />
  60. 60. Technical architecture<br />
  61. 61. NRK/Skole<br />Norwegian National Broadcasting (NRK)<br />media resources from the archives<br />published for use in schools<br />integrated with the National Curriculum<br />In production<br />delayed by copyright wrangling<br />Technologies<br />OKS<br />Polopoly CMS<br />MySQL database<br />Resin application server<br />
  62. 62. Curriculum-based browsing (1)<br />Curriculum<br />Social studies<br />High school<br />
  63. 63. Curriculum-based browsing (2)<br />Gender roles<br />
  64. 64. Curriculum-based browsing (3)<br />Feminist movement in the 70s and 80s<br />Changes to the family in the 70s<br />The prime minister’s husband<br />Children choosing careers<br />Gay partnerships in 1993<br />
  65. 65. One video (prime minister’s husband)<br />Metadata<br />Subject<br />Person<br />Related resources<br />Description<br />
  66. 66. Conceptual architecture<br />Polopoly<br />HTTP<br />Ontopia<br />MediaDB<br />Grep<br />DB2TM<br />TMSync<br />RDBMS backend<br />MySQL<br />Editors<br />
  67. 67. Implementation<br />Domain model in Java<br />Plain old Java objects built on<br />Ontopia’s Java API<br />tolog<br />JSP for presentation<br />using JSTL on top of the domain model<br />Subversion for the source code<br />Maven2 to build and deploy<br />Unit tests<br />
  68. 68. What we’d like to see<br />The future<br />
  69. 69. The big picture<br />Auto-class.<br />A.N.other<br />A.N.other<br />Other CMSs<br />A.N.other<br />A.N.other<br />DB2TM<br />Portlet support<br />OKP<br />XML2TM<br />Engine<br />CMS integration<br />Data integration<br />Escenic<br />Taxon. import<br />Ontopoly<br />Web service<br />
  70. 70. CMS integrations<br />The more of these, the better<br />Candidate CMSs<br />Liferay (being worked on at Bouvet)<br />Alfresco (might be started soon)<br />Magnolia<br />Inspera (possible project here)<br />JSR-170 Java Content Repository<br />CMIS (OASIS web service standard)<br />
  71. 71. Portlet toolkit<br />Subversion contains a number of “portlets”<br />basically, Java objects doing presentation tasks<br />some have JSP wrappers as well<br />Examples<br />display tree view<br />list of topics filterable by facets<br />show related topics<br />get-topic-page via TMRAP component<br />Not ready for prime-time yet<br />undocumented<br />incomplete<br />
  72. 72. Ontopoly plug-ins<br />Plugins for getting more data from externals<br />TMSync import plugin<br />DB2TM plugin<br />Subj3ct.com plugin<br />adapted RDF2TM plugin<br />classify plugin<br />...<br />Plugins for ontology fragments<br />menu editor, for example<br />
  73. 73. TMCL<br />Now implementable<br />We’d like to see<br />an object model for TMCL (supporting changes)<br />a validator based on the object model<br />Ontopoly import/export from TMCL (initially)<br />refactor Ontopoly API to make it more portable<br />Ontopoly ported to use TMCL natively (eventually)<br />
  74. 74. Things we’d like to remove<br />OSL support<br />Ontopia Schema Language<br />Web editor framework<br />unfortunately, still used by some major customers<br />Fulltext search<br />the old APIs for this are not really of any use<br />
  75. 75. Management interface<br />Import topic maps (to file or RDBMS)<br />
  76. 76. What do you think?<br />Suggestions?<br />Questions?<br />Plans?<br />Ideas?<br />
  77. 77. Setting up the developer environment<br />Getting started<br />
  78. 78. If you are using Ontopia...<br />...simply download the zip, then<br />unzip,<br />set the classpath,<br />start the server, ...<br />...and you’re good to go<br />
  79. 79. If you are developing Ontopia...<br />You must have<br />Java 1.5 (not 1.6 or 1.7 or ...)<br />Ant 1.6 (or later)<br />Ivy 2.0 (or later)<br />Subversion<br />Then<br />check out the source from Subversion<br />svn checkout http://ontopia.googlecode.com/svn/trunk/ ontopia-read-only<br />ant bootstrap<br />ant dist.jar.ontopia<br />ant test<br />ant dist.ontopia<br />
  80. 80. Beware<br />This is fun, because<br />you can play around with anything you want<br />e.g., my build has a faster TopicIF.getRolesByType<br />you can track changes as they happen in svn<br />However, you’re on your own<br />if it fails, it’s hard to say why<br />maybe it’s your changes, maybe not<br />For production use, official releases are best<br />
  81. 81. Participating etc<br />The project<br />
  82. 82. Our goal<br />To provide the best toolkit for building Topic Maps-based applications<br />We want it to be<br />actively maintained,<br />bug-free,<br />scalable,<br />easy to use,<br />well documented,<br />stable,<br />reliable<br />
  83. 83. Our philosophy<br />We want Ontopia to provide as much useful more-or-less generic functionality as possible<br />New contributions are generally welcome as long as<br />they meet the quality requirements, and<br />they don’t cause problems for others<br />
  84. 84. The sandbox<br />There’s a lot of Ontopia-related code which does not meet those requirements<br />some of it can be very useful,<br />someone may pick it up and improve it<br />The sandbox is for these pieces<br />some are in Ontopia’s Subversion repository,<br />others are maintained externally<br />To be “promoted” into Ontopia a module needs<br />an active maintainer,<br />to be generally useful, and<br />to meet certain quality requirements<br />
  85. 85. Communications<br />Join the mailing list(s)!<br />http://groups.google.com/group/ontopia<br />http://groups.google.com/group/ontopia-dev<br />Google Code page<br />http://code.google.com/p/ontopia/<br />note the “updates” feed!<br />Blog<br />http://ontopia.wordpress.com<br />Twitter<br />http://twitter.com/ontopia<br />
  86. 86. Committers<br />These are the people who run the project<br />they can actually commit to Subversion<br />they can vote on decisions to be made etc<br />Everyone else can<br />use the software as much as they want,<br />report and comment on issues,<br />discuss on the mailing list, and<br />submit patches for inclusion<br />
  87. 87. How to become a committer<br />Participate in the project!<br />that is, get involved first<br />let people get to know you, show some commitment<br />Once you’ve gotten some way into the project you can ask to become a committer<br />best if you have provided some patches first<br />Unless you’re going to commit changes there’s no need to be a committer<br />
  88. 88. Finding a task to work on<br />Report bugs!<br />they exist; if you find any, please report them<br />Look at the open issues<br />there is always testing/discussion to be done<br />Look for issues marked “newbie”<br />http://code.google.com/p/ontopia/issues/list?q=label:Newbie<br />Look at what’s in the sandbox<br />most of these modules need work<br />Scratch an itch<br />if there’s something you want fixed/changed/added...<br />
  89. 89. How to fix a bug<br />First figure out why you think it fails<br />Then write a test case<br />based on your assumption<br />make sure the test case fails (test before you fix)<br />Then fix the bug<br />follow the coding guidelines (see wiki)<br />Then run the test suite<br />verify that you’ve fixed the bug<br />verify that you haven’t broken anything<br />Then submit the patch<br />
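The test-before-fix order described above can be illustrated with a tiny example. The bug, the `normalizeName` utility and the class name are all invented for illustration; they are not Ontopia code. The real test suite is run with `ant test`, whereas this sketch uses plain methods so that it is self-contained.

```java
/** Illustrative only: write the failing test first, then fix, then re-run everything. */
final class BugFixSketch {

    /**
     * Hypothetical utility under repair: trims a name and collapses inner whitespace.
     * The buggy version (kept in the comment below) only trimmed the ends.
     */
    static String normalizeName(String name) {
        // Before the fix: return name.trim();
        return name.trim().replaceAll("\\s+", " ");
    }

    /** Written first, from the assumption about the failure; it failed until the fix went in. */
    static void testCollapsesInnerWhitespace() {
        String actual = normalizeName("  Don   Giovanni ");
        if (!"Don Giovanni".equals(actual)) {
            throw new AssertionError("got: '" + actual + "'");
        }
    }

    /** Guards behaviour that already worked, so the fix cannot silently break it. */
    static void testLeavesCleanNameAlone() {
        String actual = normalizeName("Tosca");
        if (!"Tosca".equals(actual)) {
            throw new AssertionError("got: '" + actual + "'");
        }
    }

    public static void main(String[] args) {
        testCollapsesInnerWhitespace();
        testLeavesCleanNameAlone();
        System.out.println("all tests passed");
    }
}
```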
  90. 90. The test suite<br />Lots of *.test packages in the source tree<br />3148 test cases as of right now<br />test data in ontopia/src/test-data<br />some tests are generators based on files<br />some of the test files come from cxtm-tests.sf.net<br />Run with<br />ant test<br />java net.ontopia.test.TestRunner src/test-data/config/tests.xml test-group<br />
  91. 91. Source tree structure<br />net.ontopia.<br />utils various utilities<br />test various test support code<br />infoset LocatorIF code + cruft<br />persistence OR-mapper for RDBMS backend<br />product cruft<br />xml various XML-related utilities<br />topicmaps next slides<br />
  92. 92. Source tree structure<br />net.ontopia.topicmaps.<br />core core engine API<br />impl engine backends + utils<br />utils utilities (see next slide)<br />cmdlineutils command-line tools<br />entry TM repository<br />nav + nav2 navigator framework<br />query tolog engine<br />viz<br />classify <br />db2tm<br />webed cruft<br />
  93. 93. Source tree structure<br />net.ontopia.topicmaps.utils<br />* various utility classes<br />ltm LTM reader and writer<br />ctm CTM reader<br />rdf RDF converter (both ways)<br />tmrap TMRAP implementation<br />
  94. 94. Let’s write some code!<br />
  95. 95. The engine<br />The core API corresponds closely to the TMDM<br />TopicMapIF, TopicIF, TopicNameIF, ...<br />Compile with<br />ant init compile.ontopia<br />.class files go into ontopia/build/classes<br />ant dist.jar.ontopia # makes a jar<br />
  96. 96. The importers<br />Main class implements TopicMapReaderIF<br />usually, this lets you set up configuration, etc<br />then uses other classes to do the real work<br />XTM importers<br />use an XML parser<br />main work done in XTM(2)ContentHandler<br />some extra code for validation and format detection<br />CTM/LTM importers<br />use Antlr-based parsers<br />real code in ctm.g/ltm.g<br />All importers work via the core API<br />
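The split described for the XTM importers (a thin reader that sets things up, and a content handler that does the real work) can be sketched with the JDK's own SAX classes. This is not Ontopia's `TopicMapReaderIF` or its XTM handler: the XML vocabulary (`topics`, `topic`, `name`) is invented and is not XTM, the handler fills a plain map instead of calling the core API, and `ImporterSketch`, `TopicHandler` and `read` are hypothetical names.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative only: reader front end plus SAX content handler. */
final class ImporterSketch {

    /** Does the real work: turns SAX events into topic id -> name entries. */
    static final class TopicHandler extends DefaultHandler {
        final Map<String, String> topics = new LinkedHashMap<>();
        private String currentId;
        private StringBuilder text;

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if ("topic".equals(qName)) {
                currentId = atts.getValue("id");
            } else if ("name".equals(qName)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (text != null) {
                text.append(ch, start, length); // only collect text inside <name>
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if ("name".equals(qName)) {
                topics.put(currentId, text.toString().trim());
                text = null;
            }
        }
    }

    /** The thin reader: sets up the parser, then hands over to the handler. */
    static Map<String, String> read(String xml) {
        TopicHandler handler = new TopicHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Import failed", e);
        }
        return handler.topics;
    }

    public static void main(String[] args) {
        String xml = "<topics><topic id='puccini'><name>Giacomo Puccini</name></topic></topics>";
        System.out.println(read(xml));
    }
}
```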
  97. 97. Fixing a real bug<br />There is a failing test case in the TM/XML importer<br />So let’s fix that right now...<br />
  98. 98. Find an issue in the issue tracker<br />(Picking one with “Newbie” might be good, <br />but isn’t necessary)<br />Get set up<br />check out the source code<br />build the code<br />run the test suite<br />Then dig in<br />we’ll help you with any questions you have<br />At the end, submit a patch to the issue tracker<br />remember to use the test suite!<br />