Expressive Query Answering For Semantic Wikis (20min)


A 20-minute version of the talk

Published in: Technology, Education
  • Hello, welcome to my talk.
  • Semantic wikis have become increasingly popular over the past few years. Their popularity may be attributed to many features of “wikiness”: they are collaborative, simple, easy to learn, tolerant of informality, and able to evolve. A semantic wiki allows you to start from unstructured, raw data and gradually add structure, or even semantics, to the data, by yourself or with others. This approach often works better than many other knowledge management approaches for non-expert users. The part of semantic wikis I love most is that I can use them as a Web-based, light-weight database. A wiki acts as an abstraction over the real data, regardless of whether it is in a relational database, in a triple store, or online somewhere else. It also offers an easily accessible interface through which I can do almost all data management tasks from a browser: modeling, querying, and some inferencing. On top of the wiki abstraction of data, we may build other interesting applications, such as maps, blogs, to-do lists, bibliography repositories, and many other things.
  • Semantic MediaWiki (SMW) is arguably the most popular semantic wiki system currently available. There are a couple of reasons for the success of semantic wikis in general, and of SMW in particular. One prominent property shared by almost all semantic wikis is their simplicity and low cost. Traditionally, to build a semantic application, one needs tools for building ontologies, for annotating data with the ontologies, for querying data, for reasoning with the data and the ontologies, and languages to build the user interface. This involves learning a whole set of languages and tools, such as OWL, Protégé, SPARQL, Jena, Pellet, and Java. For many developers or users, the adoption cost of semantic web technologies is too high and the reward is relatively low. For example, if a gym manager wants to build a website with a little bit of semantics, does it make sense for him to learn the above set of languages, or to hire a semantic web programmer? Semantic wikis fill the gap with a low-cost solution for light-weight semantic applications. SMW, for example, provides an integrated environment for ontology building, data annotation, reasoning and querying, and UI building. As it is built on top of MediaWiki, there are many extensions, from visualization to I/O, that we can use to build applications. SMW provides a simple modeling language and a query language, which are considerably simpler than RDF and SPARQL, respectively. It is in fact quite a powerful tool: it can be seen as a light-weight triple store, and we can build applications on top of it.
  • However, despite its power, we often feel that the expressivity of SMW is too limited. For example, there are no inverse properties in SMW: I cannot say that “has author” is the inverse of “author of”. Developers often need to use complicated templates and other tricks to work around this limitation. Another frequently needed feature is transitive properties. For example, I may want to say that Nashua is a part of New Hampshire, and New Hampshire is a part of the United States; therefore, Nashua is a part of the United States. Similarly, we often need additional expressivity in the query language of SMW. One example is negation, such as finding cities that are not capitals. Another example is counting, such as finding professors who advise more than 5 students.
  • To pick the right level of expressivity for semantic wiki modeling, we need to balance expressiveness and simplicity. For example, why not pick OWL 2 QL, since SMW data is stored in a relational database anyway? Or why not OWL 2 RL, which can be implemented with rule-based reasoning? To find the right mix of supported features, I believe that what matters most is not whether the set is maximally expressive, or whether it is tractable in terms of worst-case time complexity. The right criteria might be (1) whether users need it and (2) whether the adoption cost is low. Keeping this in mind, I selected OWL Prime as the subset of OWL supported in the extended SMW modeling language. For the query language, I extended SMW-QL with negation as failure and cardinality queries.
  • The next question is what semantics to use. OWL adopts the open world assumption (OWA): if something cannot be proven true, it is not necessarily false. Databases and many rule systems, on the other hand, adopt the closed world assumption (CWA). A semantic wiki is in fact closer to a database than to a knowledge base with OWA. When we query a wiki, we are, most of the time, only interested in the knowledge mentioned in the wiki. If something is not said in the wiki, we assume that it is false. If we list two authors for a paper, then by default the paper has just those two authors and no others. As another example, if Berlin is not said to be a person, then Berlin is not a person. The right semantics for SMW is therefore not that of OWL, but a closed world semantics. For this research, I used datalog, which has a descriptive, closed world semantics, with well-understood complexity and mature tool support. For the sake of time, I will not cover the full details of modeling SMW in datalog, but only the new features. You can find more details in the backup slides.
  • This slide shows the translation of the extended SMW-ML into datalog. Their meanings are similar to the corresponding constructs in RDF or OWL, so I will not explain them in detail. One thing worth noting is that the SameAs relation here is weaker than owl:sameAs, so that in counting, even if SameAs(x,y) is true, x and y are still counted as two individuals.
  • This slide shows the translation of an SMW “ask” query into logic program rules. The query asks for cities that are the capital of something. The query is turned into the rule on the right. The head of the rule is a special predicate “result”, which is used to collect all matched results in query answering. Each selection condition is translated into a body item in the rule. This is a very simple example. For other constructs, such as conjunction, disjunction, subquery, and property chain, see the backup slides.
  • This slide shows the translation of the extended query language with negation into datalog. For the second case, why not “C(X), not P(X,Y)”? Suppose we have C(a) and P(a,b). Then the above rule would return a, because C(a) and “not P(a,a)” are both true, even though a does have a P value (namely b). Thus, “C(X), not P(X,Y)” is not a correct translation.
  • Qualified cardinality queries and nonqualified cardinality queries are translated into similar rules using the count function. “thing(x)” is added for the safety of the rule, that is, so that the variable x is bound by a positive body literal. We have a set of rules to ensure that everything is an instance of “thing”.
  • A quick note on the implementation. The backend reasoner I used is DLV, which won the first ASP competition. In theory, other logic program solvers may be used as well. I have tried clasp, which was the winner of the second ASP competition. The performance of DLV and clasp is similar. I have not yet tried other solvers, such as smodels or cmodels, but it should not be too difficult to use them. The implementation has a file-based mode and a database-based mode. In the database-based mode, real-time changes of instance data are captured, but it is in general a little slower than the file-based mode. As a side benefit of this implementation, you are now able to decouple the content storage of the wiki from the semantic data storage of the wiki. As long as you provide an ODBC interface, your semantic data can be stored anywhere, not necessarily locally. This also enables remote querying of another wiki, or federated querying of multiple wikis.
  • This slide shows a screenshot. On the left we show the modeling and query scripts of two pages, using an inverse property and a transitive property. The query result is shown on the right.
  • The next two slides show the scalability results. For data complexity, we measure query time as a function of the dataset size, for a fixed query. It is almost linear. This is largely because building a result set, or in DLV’s terminology, an answer set, requires time linear in the number of facts when the number of non-fact rules is small. In this experiment, we have about 100k triples of facts, but fewer than 100 rules.
  • In the second graph, we can see that the query complexity is almost constant. Query complexity measures, for a fixed dataset, how fast query time increases as a function of query size. I have tried several query patterns, and all of them show constant-time behavior. This is not true for SMW itself, as it translates queries into SQL. An explanation for the constant time complexity is that extended queries are translated into non-ground rules, which are small compared with the size of the ground facts. For this reason, DLV is sensitive to the factbase size in a linear way (probably because of grounding), but is insensitive to the rule set size as long as the factbase is much larger. As most semantic wikis today have fewer than 10k pages and 100k triples, the implementation is probably fast enough for typical wiki users.
  • We have released our work as an extension of Semantic MediaWiki, called SemanticQueryRDFS++. You may try it out. We picked this name because the OWL Prime subset of OWL has been called RDFS 3.0 or RDFS++ by others, and we believe “RDFS++” gives the best intuition of what is supported by our extension.
  • In summary, we have shown that formalizing SMW using datalog allows us to extend SMW to an expressive subset of OWL, to implement an SMW query engine that is scalable for typical uses, and, not mentioned in this talk because it may only be interesting to logicians, to analyze the reasoning complexity of SMW and our extensions. There are a couple of things we want to do in the future. We want to support incremental reasoning so that we don’t have to compute the answer set from scratch every time. We may support customized reasoning rules; if some users need more advanced reasoning, they should be able to add it. Finally, for exchanging data with other semantic web applications, it would be nice to have a translation between SPARQL and the query language of SMW.
  • Expressive Query Answering For Semantic Wikis (20min)
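The transitive and inverse property rules discussed in the notes above can be illustrated with a naive fixpoint computation. The following is only a minimal sketch in Python, not the DLV-based implementation described in the talk; the function name `saturate`, the triple encoding, and the example names (`part_of`, `author_of`, `has_author`, `Alice`, `Paper1`) are assumptions made for illustration.

```python
def saturate(facts, transitive=(), inverse_of=None):
    """Naively apply the rules until no new fact can be derived.

    facts:      set of (property, subject, object) triples.
    transitive: property names P with the rule P(x,y) :- P(x,z), P(z,y).
    inverse_of: dict {P: Q} standing for the rule Q(x,y) :- P(y,x).
    """
    inverse_of = inverse_of or {}
    closed = set(facts)
    while True:
        derived = set()
        for (p, x, z) in closed:
            if p in transitive:
                # Join P(x,z) with every P(z,y) to derive P(x,y).
                for (q, z2, y) in closed:
                    if q == p and z2 == z:
                        derived.add((p, x, y))
            if p in inverse_of:
                # P(x,z) yields Q(z,x) for the declared inverse Q.
                derived.add((inverse_of[p], z, x))
        if derived <= closed:
            return closed  # fixpoint reached
        closed |= derived


facts = {
    ("part_of", "Nashua", "New Hampshire"),
    ("part_of", "New Hampshire", "United States"),
    ("author_of", "Alice", "Paper1"),  # hypothetical example data
}
closed = saturate(
    facts,
    transitive={"part_of"},
    inverse_of={"author_of": "has_author"},
)
```

After saturation, `closed` also contains the derived facts that Nashua is part of the United States and that Paper1 has author Alice. A real datalog engine would use a far more efficient evaluation strategy; the loop here only makes the closed-world, rule-driven reading of the annotations concrete.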

    1. Expressive Query Answering For Semantic Wikis<br />Jie Bao, Rensselaer Polytechnic Institute<br />
    2. Semantic Wiki as a Data Store<br />May 10, 2011<br />
    3. Semantic MediaWiki (SMW)<br />Low-cost solution for light-weight semantic applications<br />Dozens of extensions to build apps.<br />Integrated environment for modeling and querying<br />SMW-ML (Modeling language): subclass/subproperty<br />SMW-QL (Query language): disjunctive query with subquery<br />(detailed SMW expressivity in the backup slides)<br />
    4. However, we often need more expressivity<br />Modeling<br />Inverse property: “has author” <-> “author of”<br />Transitive property: “part of”<br />…<br />Query<br />Negation: find cities that are not capitals<br />Counting: find professors who advise more than 5 students<br />
    5. Desired Expressivity<br />Balance between expressiveness and simplicity<br />Modeling Language: OWL Prime [1]<br />rdfs:subClassOf, subPropertyOf, domain, range<br />owl:TransitiveProperty, SymmetricProperty, FunctionalProperty, InverseFunctionalProperty, inverseOf<br />owl:sameAs, equivalentClass, equivalentProperty<br />Query Language: SMW-QL, plus<br />Negation as failure<br />Cardinality (aggregation)<br />[1]<br />
    6. Formalization<br />Note: Semantic Wiki is NOT an open world (as opposed to OWL)<br />Formalizing OWL Prime with CWA using datalog<br />Descriptive, closed-world semantics<br />Well-understood complexity and mature tool support<br />
    7. SMW-ML+<br />On page “Property:P”:<br />[[Domain::C]] → C(x) :- P(x,y)<br />[[Range::C]] → C(y) :- P(x,y)<br />[[Type::Transitive]] → P(x,y) :- P(x,z), P(z,y)<br />[[Type::Symmetric]] → P(x,y) :- P(y,x)<br />[[Type::Functional]] → SameAs(x,y) :- P(z,x), P(z,y)<br />[[Type::InverseFunctional]] → SameAs(x,y) :- P(x,z), P(y,z)<br />[[Inverse of::Q]] → Q(x,y) :- P(y,x)<br />Note: SameAs is not owl:sameAs!<br />
    8. Translation Rules for SMW-QL<br />{{#ask:<br /> [[Category:City]]<br /> [[capital of::+]] <br />}}<br />result(x) :- City(x), capital_of(x, y) .<br />Other constructs: for conjunction, disjunction, subquery, property chain, etc., see the backup slides<br />
    9. SMW-QL+: Negations<br />{{#askplus:<br /> [[<>Category:C]]<br /> [[Category:D]]<br />}}<br />result(x) :- D(x), not C(x) .<br />{{#askplus:<br /> [[Category:C]]<br /> [[<>P::+]]<br />}}<br />result(x) :- C(x), #count{y: P(x,y)}<=0 .<br />Why not “C(x), not P(x,y)”?<br />
    10. SMW-QL+: (Non)qualified Cardinality<br />{{#askplus:<br /> [[>=3#P::+]]<br />}}<br />result(x) :- thing(x),<br /> #count{y: P(x,y)}>=3 .<br />{{#askplus:<br /> [[>=3#P::<br /> <q>[[Category:D]]</q>]]<br />}}<br />result(x) :- thing(x),<br /> #count{y: P(x,y),D(y)}>=3 .<br />thing(x) is added for safety<br />
    11. Implementation<br />Using DLV as the reasoner<br />Other LP solvers may be used as well<br />Two work modes<br />File-based: reasoning based on a static dump (snapshot) of wiki semantic data.<br />Database-based: reasoning based on a shadow database via ODBC; real-time changes of instance data are captured.<br />Optimization<br />Caching<br />Download:<br />
    12. Example:<br />Inverse property<br />Caching<br />Transitive property<br />
    13. Scalability: Data Complexity<br />Test machine: 2 * Xeon 5365 Quad 3.0GHz 1333MHz /16G / 2 * 1TB<br />Dataset: part of DBLP, 10,396 pages, 100,736 triples<br />{{#askplus: [[Category:Person]] }}<br />Almost linear<br />
    14. Scalability: Query Complexity<br />{{#askplus: [[Knows::<q>[[Knows::<q>[[Knows::<q>…</q>]]</q>]]</q>]] }}<br />Almost constant<br />Dataset: DBLP 100k triples<br />The SemanticQueryRDFS++ extension<br />
    15. Some other work on SMW by us<br />Semantic History – tracking provenance of semantics<br />Tetherless Map – query-based map generation<br />DBLP Import – bibtex to semantic wiki<br />Array Extension – operate on arrays<br />RDFa Extension – RDFa <-> Wiki<br />Joint work with Li Ding, Jin Zheng, Rui Huang<br />
    16. Summary<br />Formalizing SMW using datalog allows us to<br />extend SMW to an expressive subset of OWL.<br />implement an SMW query engine that is scalable enough for typical uses.<br />analyze the reasoning complexity of SMW (not mentioned in the talk)<br />Future Work<br />Incremental reasoning<br />Customized reasoning rules<br />SPARQL <-> SMW-QL+ translations<br />
    17. Backup<br />
    18. Expressivity (SMW 1.5.4)<br />SMW-ML (Modeling Language)<br />category instantiation, e.g., [[Category:C]]<br />property instantiation, e.g., [[P::v]]<br />subclass, e.g., [[Category:C]] (on a category page)<br />subproperty, e.g., [[Subproperty of::Property:P]] (on a property page)<br />SMW-QL (Query Language)<br />conjunction, e.g., [[Category:C]][[P::v]]<br />disjunction, e.g., [[Category:C]] or [[P::v]], [[A||B]] or [[P::v||w]]<br />property chain, e.g., [[P.Q::v]]<br />property wildcard, e.g., [[P::+]]<br />subquery, e.g., [[P::<q>[[Category:C]]</q>]]<br />inverse property, e.g., [[-P::v]]<br />value comparison, e.g., [[P::>3]][[P::<7]][[P::!5]]<br />
    19. Translation Rules for SMW-ML<br />Subproperty: P(x,y) :- Q(x,y) .<br />Subclass: C(x) :- D(x) .<br />Class instance: C(a) .<br />Property instance: P(a,b) .<br />Redirection: a=b.<br />
    20. Translation Rules for SMW-QL<br />result(x) :- _tmp0(x).<br />_tmp0(x) :- A(x), p3(x,x0), x0=category:B.<br />_tmp0(x) :- p(x,x2), p1(x2,x3), p2(x3,x1), _tmp9(x1).<br />_tmp9(x1) :- _tmp12(x1).<br />_tmp12(x1) :- D(x1).<br />_tmp12(x1) :- p1(x1,x4), x4=SomePage.<br />_tmp9(x1) :- thing(x1), x1 != v.<br />_tmp9(x1) :- E(x1).<br />{{#ask: [[Category:A]][[p3::category:B]] or <br /> [[p.p1.p2::<br /><q><br /> [[Category:D]] or [[p1::<q>[[SomePage]]</q>]]<br /> </q><br />||!v<br />||<q>[[Category:E]]</q><br /> ]]<br />}}<br />Constructs illustrated: conjunction, property chain, disjunction, inequality, subquery<br />
    21. Theoretical Complexity<br />Recall that L ⊆ NL ⊆ P ⊆ NP<br />
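The negation and cardinality translations on slides 9 and 10 can be checked on a tiny fact base. The sketch below is a plain Python illustration under the closed world assumption, not DLV code; it assumes that the count ranges over the values y of P(x,y), and all names (`C`, `P`, `at_least`, `advises`, the individuals) are hypothetical.

```python
# Fact base: C(a), C(c), P(a,b). Anything not listed is taken to be false.
C = {"a", "c"}
P = {("a", "b")}
universe = C | {v for pair in P for v in pair}

# Reading of "result(X) :- C(X), not P(X,Y)" with Y ranging over the
# whole universe: X qualifies as soon as ANY Y fails P(X,Y).
wrong = {x for x in C if any((x, y) not in P for y in universe)}

# Count-based translation: X qualifies only if it has zero P-values.
right = {x for x in C if sum(1 for (s, _) in P if s == x) <= 0}


def at_least(prop, n, qualifier=None):
    """Subjects with at least n distinct values of prop (n >= 1).

    If qualifier is given, only values that belong to that category
    are counted (the qualified cardinality case).
    """
    values = {}
    for s, o in prop:
        if qualifier is None or o in qualifier:
            values.setdefault(s, set()).add(o)
    return {s for s, objs in values.items() if len(objs) >= n}


advises = {("prof1", "s1"), ("prof1", "s2"), ("prof1", "s3"), ("prof2", "s1")}
D = {"s1", "s2"}  # hypothetical category used as the qualifier

unqualified = at_least(advises, 3)
qualified = at_least(advises, 2, D)
```

Here `wrong` contains `a` even though `a` has a P-value, while `right` contains only `c`, which is the intended answer to “instances of C without any P value”. For the cardinality queries, `prof1` is the only subject with at least three values, and also the only one with at least two values in `D`.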