Ontopia Code Camp

A presentation of the Ontopia product from the Ontopia Code Camp at TMRA 2009.

Published in: Education, Technology

Transcript

  • 1. Ontopia Code Camp
    TMRA 2009-11-11
    Lars Marius Garshol & Geir Ove Grønmo
  • 2. Agenda
    About you
    who are you?
    what do you want from the code camp?
    About Ontopia
    The product
    The future
    Participating in the project
    Writing some code!
  • 3. Some background
    About Ontopia
  • 4. Brief history
    1999-2000
    private hobby project for Geir Ove
    2000-2009
    commercial software sold by Ontopia AS
    lots of international customers in diverse fields
    2009-
    open source project
  • 5. The project
    Open source hosted at Google Code
    Contributors
    Lars Marius Garshol, Bouvet
    Geir Ove Grønmo, Bouvet
    Thomas Neidhart, SpaceApps
    Lars Heuer, Semagia
    Hannes Niederhausen, TMLab
    Stig Lau, Bouvet
    Baard H. Rehn-Johansen, Bouvet
    Peter-Paul Kruijssen, Morpheus
    Quintin Siebers, Morpheus
  • 6. Current activity (toward 5.1)
    tolog updates
    added by LMG
    Various fixes and optimizations
    by everyone
    Toma implementation (in sandbox)
    by Thomas
    TMQL implementation (in sandbox)?
    by Sven Krosse
  • 7. Architecture and modules
    The product
  • 8. The big picture
    [Diagram labels: Engine, Ontopoly, Web service, Data integration, DB2TM, XML2TM, Taxon. import, CMS integration, Escenic, OKP, Other CMSs, Portlet support, Auto-class., A.N.other]
  • 9. The engine
    Core API
    TMAPI 2.0 support
    Import/export
    RDF conversion
    TMSync
    Fulltext search
    Event API
    tolog query language
    tolog update language
    Engine
  • 10. Query Engine
    Implementation of Ontopia’s tolog language (based on Prolog and SQL)
    Allows powerful queries on the topic map data structure
    Simplifies application development and improves performance
    Example:
    select $B, count($A) from
    instance-of($B, city),
    { premiere($A : opera, $B : place) |
    premiere($A : opera, $C : place),
    located-in($C : containee, $B : container) }
    order by $A desc?
    returns all B's and the corresponding number of A's where B is a city AND EITHER B is the place where A was premiered OR the place where A was premiered is located in B, ordered by decreasing number of A's
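
To make the semantics of the example query concrete, here is a minimal Python sketch that computes the same kind of result over a handful of toy facts. It is not how the tolog engine evaluates queries; the sample data and the function name `operas_per_city` are assumptions made for illustration.

```python
from collections import defaultdict

# Toy facts, made up for this sketch (not taken from a real topic map).
cities = {"Milano", "Venezia"}
premieres = [  # (opera, place)
    ("Otello", "Teatro alla Scala"),
    ("Falstaff", "Teatro alla Scala"),
    ("Nabucco", "Teatro alla Scala"),
    ("Rigoletto", "Teatro La Fenice"),
    ("Attila", "Venezia"),
]
located_in = [  # (containee, container)
    ("Teatro alla Scala", "Milano"),
    ("Teatro La Fenice", "Venezia"),
]


def operas_per_city(cities, premieres, located_in):
    """Count operas per city: premiered in the city itself OR in a place located in it."""
    containers = defaultdict(set)
    for containee, container in located_in:
        containers[containee].add(container)

    operas = defaultdict(set)
    for opera, place in premieres:
        if place in cities:                      # first branch of { ... | ... }
            operas[place].add(opera)
        for city in containers[place] & cities:  # second branch of { ... | ... }
            operas[city].add(opera)

    rows = [(city, len(found)) for city, found in operas.items()]
    # Highest count first; ties broken by city name to keep the output stable.
    return sorted(rows, key=lambda row: (-row[1], row[0]))


result = operas_per_city(cities, premieres, located_in)
```

As in the query, a city with no matching opera does not appear in the result at all.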
  • TMSync
    Configurable module for synchronizing one TM against another
    define subset of source TM to sync (using tolog)
    define subset of target TM to sync (using tolog)
    the module handles the rest
    Can also be used with non-TM sources
    create a non-updating conversion from the source to some TM format
    then use TMSync to sync against the converted TM instead of directly against the source
  • 11. How TMSync works
    Define which part of the target topic map you want,
    Define which part of the source topic map it is the master for, and
    The algorithm does the rest
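
The three steps above can be pictured as a set difference between two selected subsets. The Python sketch below is only an illustration of that idea, not the TMSync implementation: statements are modelled as plain tuples, and the filter function `in_scope` is a hypothetical stand-in for the tolog queries that select each subset.

```python
# Statements are modelled as (subject, property, value) tuples; all data is invented.
source = {
    ("bergen", "type", "city"),
    ("bergen", "name", "Bergen"),
    ("oslo", "type", "city"),
}
target = {
    ("bergen", "type", "city"),
    ("bergen", "name", "Bergen (old)"),
    ("bergen", "comment", "local note"),
    ("trondheim", "type", "city"),
}


def in_scope(statement):
    # Hypothetical subset definition: everything except locally added comments.
    return statement[1] != "comment"


def tmsync(source, target, source_filter, target_filter):
    """Make the selected part of the target equal to the selected part of the source."""
    wanted = {s for s in source if source_filter(s)}
    owned = {s for s in target if target_filter(s)}
    to_add = wanted - owned
    to_remove = owned - wanted
    return (target - to_remove) | to_add, to_add, to_remove


new_target, added, removed = tmsync(source, target, in_scope, in_scope)
```

Statements outside the selected part of the target (here, the local comment) are left alone, which is the point of defining the subsets first.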
  • 12. If the source is not a topic map
    TMSync
    convert.xslt
    Simply do a normal one-time conversion
    let TMSync do the update for you
    In other words, TMSync reduces the update problem to a conversion problem
    source.xml
  • 13. The City of Bergen use case
    [Diagram labels: Norge.no, Service, Unit, Person, LOS, City of Bergen, LOS]
  • 14. The backends
    In-memory
    no persistent storage
    thread-safe
    no setup
    RDBMS
    transactions
    persistent
    thread-safe
    uses caching
    clustering
    Remote
    uses web service
    read-only
    unofficial
    [Diagram labels: Engine, Memory, RDBMS, Remote]
  • 15. RDBMS Backend
    Allows the Engine to use topic maps stored in a relational database
    Based on a generic topic map schema
    Necessary when working with very large topic maps
    Transparent to applications
    Features
    Automatically loads data when needed
    Caches frequently used data
    Full support for RDBMS transactions
    Supports tolog-to-SQL compilation
    Statistical reports for performance tuning
    Platform support
    Oracle, MySQL, PostgreSQL, MS SQL Server
    Test suite available for verifying compatibility with other JDBC-enabled RDBMSes
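
The "loads data when needed" and "caches frequently used data" features can be pictured with a small Python sketch. It uses an in-memory SQLite database and a made-up one-table schema; neither the table nor the function `load_topic_name` comes from Ontopia, whose generic schema and caching layer are considerably more involved.

```python
import sqlite3
from functools import lru_cache

# Hypothetical one-table schema, created in memory for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE topic (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO topic VALUES (?, ?)", [(1, "Puccini"), (2, "Verdi")])
conn.commit()

queries_run = 0  # counts how often the database is actually hit


@lru_cache(maxsize=1024)
def load_topic_name(topic_id):
    """Fetch a topic name on first access; later calls are served from the cache."""
    global queries_run
    queries_run += 1
    row = conn.execute("SELECT name FROM topic WHERE id = ?", (topic_id,)).fetchone()
    return row[0] if row else None


first = load_topic_name(1)   # goes to the database
second = load_topic_name(1)  # answered from the cache
```

The calling code never sees the difference between the two calls, which is what "transparent to applications" amounts to in this toy setting.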
  • 16. DB2TM
    Upconversion to TMs
    from RDBMS via JDBC
    or from CSV
    Uses XML mapping
    can call out to Java
    Supports sync
    either full rescan
    or change table
    [Diagram labels: TMRAP, Nav, DB2TM, Classify, Engine, Memory, RDBMS, Remote]
  • 17. DB2TM example
    [Figure: CSV rows + mapping = topics; labels shown: Ontopia, United Nations, Bouvet]
    <relation name="organizations.csv" columns="id name url">
    <topic type="ex:organization">
    <item-identifier>#org${id}</item-identifier>
    <topic-name>${name}</topic-name>
    <occurrence type="ex:homepage">${url}</occurrence>
    </topic>
    </relation>
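
What the mapping above does to each CSV row can be sketched in a few lines of Python. This is an illustration only and does not use DB2TM: the dictionary `MAPPING` is a hypothetical mirror of the `<topic>` element, the rows and example.org URLs are invented sample data, and topics are represented as plain dictionaries.

```python
import csv
import io
from string import Template

# Inline stand-in for organizations.csv; rows and URLs are invented sample data.
CSV_TEXT = (
    "1,Ontopia,http://example.org/ontopia\n"
    "2,Bouvet,http://example.org/bouvet\n"
)
COLUMNS = ["id", "name", "url"]

# Hypothetical Python mirror of the <topic> element in the mapping.
MAPPING = {
    "type": "ex:organization",
    "item-identifier": Template("#org${id}"),
    "topic-name": Template("${name}"),
    "occurrences": {"ex:homepage": Template("${url}")},
}


def rows_to_topics(csv_text, columns, mapping):
    """Turn each CSV row into one topic by filling in the ${column} placeholders."""
    topics = []
    for values in csv.reader(io.StringIO(csv_text)):
        row = dict(zip(columns, values))
        topics.append({
            "type": mapping["type"],
            "item-identifier": mapping["item-identifier"].substitute(row),
            "topic-name": mapping["topic-name"].substitute(row),
            "occurrences": {
                occ_type: template.substitute(row)
                for occ_type, template in mapping["occurrences"].items()
            },
        })
    return topics


topics = rows_to_topics(CSV_TEXT, COLUMNS, MAPPING)
```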
  • 18. TMRAP
    Web service interface
    via SOAP
    via plain HTTP
    Requests
    get-topic
    get-topic-page
    get-tolog
    delete-topic
    ...
    [Diagram labels: TMRAP, Nav, DB2TM, Classify, Engine, Memory, RDBMS, Remote]
  • 19. Navigator framework
    Servlet-based API
    manage topic maps
    load/scan/delete/create
    JSP tag library
    XSLT-like
    based on tolog
    JSTL integration
    [Diagram labels: TMRAP, Nav, DB2TM, Classify, Engine, Memory, RDBMS, Remote]
  • 20. Ontopia Navigator Framework
    Java API for interacting with TM repository
    JSP tag library
    based on tolog
    kind of like XSLT in JSP with tolog instead of XPath
    has JSTL integration
    Undocumented parts
    web presentation components
    some wrapped as JSP tags
    want to build proper portlets from them
  • 21. http://www.ontopia.net/operamap
  • 22. Navigator tag library example
    <%-- assume variable 'composer' is already set --%>
    <p><b>Operas:</b></p>
    <ul>
    <tolog:foreach query="composed-by(%composer% : composer, $OPERA : opera), { premiere-date($OPERA, $DATE) }?">
      <li>
        <a href="opera.jsp?id=<tolog:id var="OPERA"/>"><tolog:out var="OPERA"/></a>
        <tolog:if var="DATE"><tolog:out var="DATE"/></tolog:if>
      </li>
    </tolog:foreach>
    </ul>
  • 23. Elmer Preview
  • 24.
  • 25.
  • 26.
  • 27. Automated classification
    Undocumented
    experimental
    Extracts text
    autodetects format
    Word, PDF, XML, HTML
    Processes text
    detects language
    stemming, stop-words
    Extracts keywords
    ranked by importance
    uses existing topics
    supports compound terms
    [Diagram labels: TMRAP, Nav, DB2TM, Classify, Engine, Memory, RDBMS, Remote]
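
The processing steps listed on this slide (stop-word removal, stemming, ranking, preference for existing topics) can be sketched as follows. This is a deliberately crude Python illustration and not the classifier's algorithm: the stop-word list, the one-rule stemmer, the `boost` factor and the sample text are all assumptions, and compound terms and language detection are left out.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "are", "can", "in", "is", "of", "the", "to"}


def stem(word):
    # Crude on purpose: strips a trailing plural "s" only.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def extract_keywords(text, known_topics=frozenset(), boost=2.0):
    """Rank terms by frequency, favouring terms that already exist as topics."""
    tokens = re.findall(r"[a-z]+", text.lower())
    terms = [stem(token) for token in tokens if token not in STOP_WORDS]
    scores = {
        term: count * (boost if term in known_topics else 1.0)
        for term, count in Counter(terms).items()
    }
    top = max(scores.values(), default=1.0)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    # Normalise so that the best term scores 1.0.
    return [(term, round(score / top, 2)) for term, score in ranked]


text = (
    "Topic maps organize metadata. Metadata in topic maps links documents "
    "to subjects, and a topic map can hold metadata."
)
keywords = extract_keywords(text, known_topics={"metadata"})
```

Scores are relative to the best-ranked term, which is why the top entry always comes out as 1.0.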
  • 28. Example of keyword extraction
    topic maps 1.0
    metadata 0.57
    subject-based class. 0.42
    Core metadata 0.42
    faceted classification 0.34
    taxonomy 0.22
    monolingual thesauri 0.19
    controlled vocabulary 0.19
    Dublin Core 0.16
    thesauri 0.16
    Dublin 0.15
    keywords 0.15
  • 29. Example #2
    Automated classification 1.0 5
    Topic Maps 0.51 14
    XSLT 0.38 11
    compound keywords 0.29 2
    keywords 0.26 20
    Lars 0.23 1
    Marius 0.23 1
    Garshol 0.22 1
    ...
  • 30. So how could this be used?
    To help users classify new documents in a CMS interface
    suggest appropriate keywords, screened by user before approval
    Automate classification of incoming documents
    this means lower quality, but also lower cost
    Get an overview of interesting terms in a document corpus
    classify all documents, extract the most interesting terms
    this can be used as the starting point for building an ontology
    (keyword extraction only)
  • 31. Example user interface
    The user creates an article
    this screen then used to add keywords
    user adjusts the proposals from the classifier
  • 32. Vizigator
    Viz
    Ontopoly
    Graphical visualization
    VizDesktop
    Swing app to configure
    filter/style/...
    Vizlet
    Java applet for web
    uses configuration
    loads via TMRAP
    uses “Remote” backend
    [Diagram labels: TMRAP, Nav, DB2TM, Classify, Engine, Memory, RDBMS, Remote]
  • 33. The Vizigator
    Graphical visualization of Topic Maps
    Two parts
    VizDesktop: Swing desktop app for configuration
    Vizlet: Java applet for web deployment
    Configuration stored in XTM file
  • 34. Without configuration
  • 35. With configuration
  • 36. The Vizigator
    The Vizigator uses TMRAP
    the Vizlet runs in the browser (on the client)
    a fragment of the topic map is downloaded from the server
    the fragment is grown as needed
    Server
    TMRAP
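
"Download a fragment, then grow it as needed" can be sketched as a client-side graph that only asks the server for the neighbours of a topic the user expands. The Python below is an assumption-laden illustration: `fetch_fragment` stands in for a remote request and does not reproduce any TMRAP call, and the graph data is invented.

```python
# Hypothetical server-side topic map, reduced to an adjacency list.
SERVER_GRAPH = {
    "puccini": ["tosca", "lucca"],
    "tosca": ["puccini", "rome"],
    "lucca": ["puccini"],
    "rome": ["tosca"],
}


def fetch_fragment(topic_id):
    """Stand-in for a remote request: returns the direct neighbours of a topic."""
    return list(SERVER_GRAPH.get(topic_id, []))


class LocalFragment:
    """Client-side copy of the part of the graph seen so far."""

    def __init__(self, fetch):
        self.fetch = fetch
        self.edges = set()
        self.expanded = set()

    def expand(self, topic_id):
        """Fetch neighbours once per topic; return how many new edges arrived."""
        if topic_id in self.expanded:
            return 0
        self.expanded.add(topic_id)
        before = len(self.edges)
        for neighbour in self.fetch(topic_id):
            self.edges.add(frozenset((topic_id, neighbour)))
        return len(self.edges) - before


fragment = LocalFragment(fetch_fragment)
added_first = fragment.expand("puccini")
added_again = fragment.expand("puccini")  # already expanded, no new request
added_tosca = fragment.expand("tosca")    # only the tosca-rome edge is new
```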
  • 37. Ontopoly
    Viz
    Ontopoly
    Generic editor
    web-based, AJAX
    meta-ontology in TM
    Ontology designer
    create types and fields
    control user interface
    build views
    incremental dev
    Instance editor
    guided by ontology
    [Diagram labels: TMRAP, Nav, DB2TM, Classify, Engine, Memory, RDBMS, Remote]
  • 38. Ontopoly
    A generic Topic Maps editor, in two parts
    ontology editor: used to create the ontology and schema
    instance editor: used to enter instances based on ontology
    Built with the Web Editor Framework
    works with both XTM files and topic maps stored in RDBMS backend
    supports access control to administrative functions, ontology, and instance editors
    existing topic maps can be imported
    parts of the ontology can be marked as read-only, or hidden
  • 39.
  • 40. Typical deployment
    [Diagram labels: Users, Viewing application, Editors, Ontopoly, Frameworks, Engine, DB Backend, DB, TMRAP, DB2TM, HTTP, External application, Application server]
  • 41. CMS integration
    The best way to add content functionality to Ontopia
    the world doesn’t need another CMS
    better to reuse those which already exist
    So far two integrations exist
    Escenic
    OfficeNet Knowledge Portal
    more are being worked on
  • 42. Implementation
    A CMS event listener
    the listener creates topics for new CMS articles, folders, etc
    the mapping is basically the design of the ontology used by this listener
    Presentation integration
    it must be possible to list all topics attached to an article
    conversely, it must be possible to list all articles attached to a topic
    how close the integration needs to be here will vary, as will the difficulty of the integration
    User interface integration
    it needs to be possible to attach topics to an article from within the normal CMS user interface
    this can be quite tricky
    Search integration
    the Topic Maps search needs to also search content in the CMS
    can be achieved by writing a tolog plug-in
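
A minimal sketch of the listener and presentation requirements above, in Python. Everything here is hypothetical: the class `TopicMapStore`, its method names and the sample articles are invented for illustration and do not correspond to the Escenic or OKP integrations.

```python
from collections import defaultdict


class TopicMapStore:
    """Toy stand-in for the topic map side of a CMS integration."""

    def __init__(self):
        self.topics = {}                      # article id -> topic data
        self.about = defaultdict(set)         # article id -> subject ids
        self.articles_for = defaultdict(set)  # subject id -> article ids

    # Event listener part: keep topics in step with CMS content.
    def on_article_created(self, article_id, title):
        self.topics[article_id] = {"type": "article", "name": title}

    def on_article_deleted(self, article_id):
        self.topics.pop(article_id, None)
        for subject_id in self.about.pop(article_id, set()):
            self.articles_for[subject_id].discard(article_id)

    # User interface part: attach a topic to an article.
    def attach(self, article_id, subject_id):
        self.about[article_id].add(subject_id)
        self.articles_for[subject_id].add(article_id)

    # Presentation part: lookups in both directions.
    def subjects_of(self, article_id):
        return set(self.about.get(article_id, set()))

    def articles_about(self, subject_id):
        return set(self.articles_for.get(subject_id, set()))


store = TopicMapStore()
store.on_article_created("a1", "New city council appointed")
store.attach("a1", "elections")
store.on_article_created("a2", "Turnout figures")  # invented second article
store.attach("a2", "elections")
store.on_article_deleted("a2")
```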
  • 43. Articles as topics
    is about
    Elections
    New city council appointed
    Goal: associate articles with topics
    mainly to say what they are about
    typically also want to include other metadata
    Need to create topics for the articles to do this
    in fact, a general CMS-to-TM mapping is needed
    must decide what metadata and structures to include
  • 44. Mapping issues
    Article topics
    what topic type to use?
    title becomes name? (do you know the title?)
    include author? include last modified? include workflow state?
    should all articles be mapped?
    Folders/directories/sections/...
    should these be mapped, too?
    one topic type for all folders/.../.../...?
    if so, use associations to connect articles to folders
    use associations to reproduce hierarchical folder structure
    Multimedia objects
    should these be included?
    what topic type? what name? ...
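
One possible answer to the folder questions above, sketched in Python: a single topic type for all folders, with associations reproducing the hierarchy. The association type names `located-in` and `subfolder-of`, the function name and the paths are assumptions made for this sketch, not a prescribed mapping.

```python
def folder_associations(article_paths):
    """Derive folder topics, article topics and hierarchy associations from CMS paths."""
    topics = {}           # path -> topic type
    associations = set()  # (association type, child path, parent path)
    for path in article_paths:
        parts = path.strip("/").split("/")
        for depth in range(1, len(parts) + 1):
            current = "/".join(parts[:depth])
            is_article = depth == len(parts)
            topics[current] = "article" if is_article else "folder"
            if depth > 1:
                parent = "/".join(parts[:depth - 1])
                kind = "located-in" if is_article else "subfolder-of"
                associations.add((kind, current, parent))
    return topics, associations


topics, associations = folder_associations([
    "news/politics/council.html",
    "news/sport/hall.html",
])
```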
  • 45. Two styles of mappings
    Articles as articles
    Topic represents only the article
    Topic type is some subclass of “article”
    “Is about” association connects article into topic map
    Fields are presentational
    title, abstract, body
    Articles as concepts
    Topic represents some real-world subject (like a person)
    article is just the default content about that subject
    Type is the type of the subject (person)
    Semantic associations to the rest of the topic map
    works in department, has competence, ...
    Fields can be semantic
    name, phone no, email, ...
  • 46. Article as article
    Article about building of a new school
    Is about association to “Primary schools”
    Topic type is “article”
  • 47. Article as concept
    Article about a sports hall
    Article really represents the hall
    Topic type is “Location”
    Associations to
    • city borough
    • events in the location
    • category “Sports”
  • 50.
  • 51.
  • 52.
  • 53.
  • 54. Two projects
  • 55. The project
    A new citizen’s portal for the city administration
    strategic decision to make portal main interface for interaction with citizens
    as many services as possible are to be moved online
    Big project
    started in late 2004, to continue at least into 2008
    ~5 million Euro spent by launch date
    1.7 million Euro budgeted for 2007
    Topic Maps development is a fraction of this (less than 25%)
    Many companies involved
    Bouvet/Ontopia
    Avenir
    KPMG
    Karabin
    Escenic
  • 56. Simplified original ontology
    [Diagram labels: Service catalog, Escenic (CMS), LOS, Form, Article, "nearly everything", Category, Service, Subject, Department, Borough, External resource, Employee, Payroll++]
  • 57. Data flow
    [Diagram labels: Ontopoly, Ontopia, Escenic, LOS, Integration, TMSync, DB2TM, Fellesdata, Payroll (Agresso), Dexter/Extens, Service Catalog]
  • 58. Conceptual architecture
    [Diagram labels: Data sources, Oracle Portal, Application, Ontopia, Escenic, Oracle Database]
  • 59. The portal
  • 60. Technical architecture
  • 61. NRK/Skole
    Norwegian National Broadcasting (NRK)
    media resources from the archives
    published for use in schools
    integrated with the National Curriculum
    In production
    delayed by copyright wrangling
    Technologies
    OKS
    Polopoly CMS
    MySQL database
    Resin application server
  • 62. Curriculum-based browsing (1)
    Curriculum
    Social studies
    High school
  • 63. Curriculum-based browsing (2)
    Gender roles
  • 64. Curriculum-based browsing (3)
    Feminist movement in the 70s and 80s
    Changes to the family in the 70s
    The prime minister’s husband
    Children choosing careers
    Gay partnerships in 1993
  • 65. One video (prime minister’s husband)
    [Screenshot callouts: Metadata, Subject, Person, Related resources, Description]
  • 66. Conceptual architecture
    [Diagram labels: Polopoly, HTTP, Ontopia, MediaDB, Grep, DB2TM, TMSync, RDBMS backend, MySQL, Editors]
  • 67. Implementation
    Domain model in Java
    Plain old Java objects built on
    Ontopia’s Java API
    tolog
    JSP for presentation
    using JSTL on top of the domain model
    Subversion for the source code
    Maven2 to build and deploy
    Unit tests
  • 68. What we’d like to see
    The future
  • 69. The big picture
    [Diagram labels: Engine, Ontopoly, Web service, Data integration, DB2TM, XML2TM, Taxon. import, CMS integration, Escenic, OKP, Other CMSs, Portlet support, Auto-class., A.N.other]
  • 70. CMS integrations
    The more of these, the better
    Candidate CMSs
    Liferay (being worked on at Bouvet)
    Alfresco (might be started soon)
    Magnolia
    Inspera (possible project here)
    JSR-170 Java Content Repository
    CMIS (OASIS web service standard)
  • 71. Portlet toolkit
    Subversion contains a number of “portlets”
    basically, Java objects doing presentation tasks
    some have JSP wrappers as well
    Examples
    display tree view
    list of topics filterable by facets
    show related topics
    get-topic-page via TMRAP component
    Not ready for prime-time yet
    undocumented
    incomplete
  • 72. Ontopoly plug-ins
    Plugins for getting more data from external sources
    TMSync import plugin
    DB2TM plugin
    Subj3ct.com plugin
    adapted RDF2TM plugin
    classify plugin
    ...
    Plugins for ontology fragments
    menu editor, for example
  • 73. TMCL
    Now implementable
    We’d like to see
    an object model for TMCL (supporting changes)
    a validator based on the object model
    Ontopoly import/export from TMCL (initially)
    refactor Ontopoly API to make it more portable
    Ontopoly ported to use TMCL natively (eventually)
  • 74. Things we’d like to remove
    OSL support
    Ontopia Schema Language
    Web editor framework
    unfortunately, still used by some major customers
    Fulltext search
    the old APIs for this are not really of any use
  • 75. Management interface
    Import topic maps (to file or RDBMS)
  • 76. What do you think?
    Suggestions?
    Questions?
    Plans?
    Ideas?
  • 77. Setting up the developer environment
    Getting started
  • 78. If you are using Ontopia...
    ...simply download the zip, then
    unzip,
    set the classpath,
    start the server, ...
    ...and you’re good to go
  • 79. If you are developing Ontopia...
    You must have
    Java 1.5 (not 1.6 or 1.7 or ...)
    Ant 1.6 (or later)
    Ivy 2.0 (or later)
    Subversion
    Then
    check out the source from Subversion
    svn checkout http://ontopia.googlecode.com/svn/trunk/ ontopia-read-only
    ant bootstrap
    ant dist.jar.ontopia
    ant test
    ant dist.ontopia
  • 80. Beware
    This is fun, because
    you can play around with anything you want
    e.g., my build has a faster TopicIF.getRolesByType
    you can track changes as they happen in svn
    However, you’re on your own
    if it fails it’s kind of hard to say why
    maybe it’s your changes, maybe not
    For production use, official releases are best
  • 81. Participating etc
    The project
  • 82. Our goal
    To provide the best toolkit for building Topic Maps-based applications
    We want it to be
    actively maintained,
    bug-free,
    scalable,
    easy to use,
    well documented,
    stable,
    reliable
  • 83. Our philosophy
    We want Ontopia to provide as much useful more-or-less generic functionality as possible
    New contributions are generally welcome as long as
    they meet the quality requirements, and
    they don’t cause problems for others
  • 84. The sandbox
    There’s a lot of Ontopia-related code which does not meet those requirements
    some of it can be very useful,
    someone may pick it up and improve it
    The sandbox is for these pieces
    some are in Ontopia’s Subversion repository,
    others are maintained externally
    To be “promoted” into Ontopia a module needs
    an active maintainer,
    to be generally useful, and
    to meet certain quality requirements
  • 85. Communications
    Join the mailing list(s)!
    http://groups.google.com/group/ontopia
    http://groups.google.com/group/ontopia-dev
    Google Code page
    http://code.google.com/p/ontopia/
    note the “updates” feed!
    Blog
    http://ontopia.wordpress.com
    Twitter
    http://twitter.com/ontopia
  • 86. Committers
    These are the people who run the project
    they can actually commit to Subversion
    they can vote on decisions to be made etc
    Everyone else can
    use the software as much as they want,
    report and comment on issues,
    discuss on the mailing list, and
    submit patches for inclusion
  • 87. How to become a committer
    Participate in the project!
    that is, get involved first
    let people get to know you, show some commitment
    Once you’ve gotten some way into the project you can ask to become a committer
    best if you have provided some patches first
    Unless you’re going to commit changes there’s no need to be a committer
  • 88. Finding a task to work on
    Report bugs!
    they exist; if you find any, please report them
    Look at the open issues
    there is always testing/discussion to be done
    Look for issues marked “newbie”
    http://code.google.com/p/ontopia/issues/list?q=label:Newbie
    Look at what’s in the sandbox
    most of these modules need work
    Scratch an itch
    if there’s something you want fixed/changed/added...
  • 89. How to fix a bug
    First figure out why you think it fails
    Then write a test case
    based on your assumption
    make sure the test case fails (test before you fix)
    Then fix the bug
    follow the coding guidelines (see wiki)
    Then run the test suite
    verify that you’ve fixed the bug
    verify that you haven’t broken anything
    Then submit the patch
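
The "test before you fix" step can be shown with a tiny Python example. The bug itself is invented for illustration (a name normaliser that fails to collapse inner whitespace); the point is only the order of work: the same test case fails against the old code and passes against the fix.

```python
import io
import unittest


def normalize_buggy(name):
    # Hypothetical original code: trims the ends but leaves inner runs of spaces.
    return name.strip()


def normalize_fixed(name):
    # The fix: split on any whitespace and rejoin with single spaces.
    return " ".join(name.split())


def run_tests(normalize):
    """Run the same test case against whichever implementation is passed in."""

    class NormalizeNameTest(unittest.TestCase):
        def test_inner_whitespace_is_collapsed(self):
            # Written first, from the assumption about why the code fails.
            self.assertEqual(normalize("  Lars   Marius "), "Lars Marius")

        def test_plain_name_is_unchanged(self):
            # Guards against breaking behaviour that already worked.
            self.assertEqual(normalize("Ontopia"), "Ontopia")

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeNameTest)
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)


before = run_tests(normalize_buggy)  # the new test must fail here
after = run_tests(normalize_fixed)   # and pass here, with nothing else broken
```

If the new test already passes against the unfixed code, the assumption about the cause was wrong and it is worth going back to the first step.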
  • 90. The test suite
    Lots of *.test packages in the source tree
    3148 test cases as of right now
    test data in ontopia/src/test-data
    some tests are generators based on files
    some of the test files come from cxtm-tests.sf.net
    Run with
    ant test
    java net.ontopia.test.TestRunner src/test-data/config/tests.xml test-group
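
"Generators based on files" means that one test is produced per data file rather than written by hand. The Python sketch below illustrates the pattern with `unittest`; the `in`/`baseline` directory layout, the `canonicalize` function under test and the sample files are assumptions for this example and say nothing about how the Ontopia test classes are written.

```python
import io
import tempfile
import unittest
from pathlib import Path


def canonicalize(text):
    """Function under test (made up): sort the lines of a file."""
    return "\n".join(sorted(text.splitlines())) + "\n"


def make_test(input_path, baseline_path):
    def test(self):
        self.assertEqual(canonicalize(input_path.read_text()), baseline_path.read_text())
    return test


def build_suite(data_dir):
    """Generate one test method per input file found in data_dir/in."""
    methods = {}
    for input_path in sorted(Path(data_dir, "in").glob("*.txt")):
        baseline_path = Path(data_dir, "baseline", input_path.name)
        methods["test_" + input_path.stem] = make_test(input_path, baseline_path)
    case = type("FileDrivenTest", (unittest.TestCase,), methods)
    return unittest.defaultTestLoader.loadTestsFromTestCase(case)


with tempfile.TemporaryDirectory() as data_dir:
    for sub in ("in", "baseline"):
        Path(data_dir, sub).mkdir()
    Path(data_dir, "in", "names.txt").write_text("b\na\n")
    Path(data_dir, "baseline", "names.txt").write_text("a\nb\n")
    Path(data_dir, "in", "single.txt").write_text("x\n")
    Path(data_dir, "baseline", "single.txt").write_text("x\n")
    suite = build_suite(data_dir)
    result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```

Adding a new test then only requires dropping a new input file and its baseline into the data directory.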
  • 91. Source tree structure
    net.ontopia.
    utils various utilities
    test various test support code
    infoset LocatorIF code + cruft
    persistence OR-mapper for RDBMS backend
    product cruft
    xml various XML-related utilities
    topicmaps next slides
  • 92. Source tree structure
    net.ontopia.topicmaps.
    core core engine API
    impl engine backends + utils
    utils utilities (see next slide)
    cmdlineutils command-line tools
    entry TM repository
    nav + nav2 navigator framework
    query tolog engine
    viz
    classify
    db2tm
    webed cruft
  • 93. Source tree structure
    net.ontopia.topicmaps.utils
    * various utility classes
    ltm LTM reader and writer
    ctm CTM reader
    rdf RDF converter (both ways)
    tmrap TMRAP implementation
  • 94. Let’s write some code!
  • 95. The engine
    The core API corresponds closely to the TMDM
    TopicMapIF, TopicIF, TopicNameIF, ...
    Compile with
    ant init compile.ontopia
    .class files go into ontopia/build/classes
    ant dist.jar.ontopia # makes a jar
  • 96. The importers
    Main class implements TopicMapReaderIF
    usually, this lets you set up configuration, etc
    then uses other classes to do the real work
    XTM importers
    use an XML parser
    main work done in XTM(2)ContentHandler
    some extra code for validation and format detection
    CTM/LTM importers
    use Antlr-based parsers
    real code in ctm.g/ltm.g
    All importers work via the core API
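
The division of labour described above (a reader class as the entry point, a content handler doing the real work, everything going through the core API) can be sketched with Python's `xml.sax`. The XML format, the `TopicMap` class and the `ToyReader`/`ToyContentHandler` names are all invented for this illustration; they are not XTM and not Ontopia classes.

```python
import io
import xml.sax


class TopicMap:
    """Minimal stand-in for a core API: the importer only calls add_topic."""

    def __init__(self):
        self.topics = {}

    def add_topic(self, topic_id, name):
        self.topics[topic_id] = name


class ToyContentHandler(xml.sax.ContentHandler):
    """Does the real work: turns parse events into core API calls."""

    def __init__(self, topicmap):
        super().__init__()
        self.topicmap = topicmap
        self.current_id = None
        self.text = []

    def startElement(self, name, attrs):
        if name == "topic":
            self.current_id = attrs["id"]
            self.text = []

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        if name == "topic":
            self.topicmap.add_topic(self.current_id, "".join(self.text).strip())
            self.current_id = None


class ToyReader:
    """Entry point: holds configuration, then delegates to the handler."""

    def __init__(self, source):
        self.source = source

    def read(self):
        topicmap = TopicMap()
        xml.sax.parse(self.source, ToyContentHandler(topicmap))
        return topicmap


document = io.BytesIO(b'<topics><topic id="puccini">Giacomo Puccini</topic></topics>')
topicmap = ToyReader(document).read()
```

Because the handler only talks to `add_topic`, the same handler would work against any backend that offers that call, which mirrors the last point on the slide.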
  • 97. Fixing a real bug
    There is a failing test case in the TM/XML importer
    So let’s fix that right now...
  • 98. Find an issue in the issue tracker
    (Picking one with “Newbie” might be good,
    but isn’t necessary)
    Get set up
    check out the source code
    build the code
    run the test suite
    Then dig in
    we’ll help you with any questions you have
    At the end, submit a patch to the issue tracker
    remember to use the test suite!