The document discusses an experiment in acquiring rich logical knowledge from natural language text using a technique called Textual Logic (TL). TL maps text to logical formulas and vice versa using an interactive disambiguation process. In an experiment, TL was used to represent over 2,500 sentences from a biology textbook as logical formulas using Rulelog, a new knowledge representation that is defeasible, tractable and rich. The resulting logical knowledge covered over 95% of the textbook material and took an average of less than 10 minutes per sentence to author. The study demonstrates progress on rapidly acquiring rich logical knowledge from text and reasoning with such knowledge.
2. Digital Aristotle and Project Halo, Gunning et al., AAAI & IAAI (August 2011)
3. IBM Watson FAQ on QA using logic or NLP
•Classic knowledge-based AI approaches to QA try to logically prove an answer is correct from a logical encoding of the question and all the domain knowledge required to answer it. Such approaches are stymied by two problems:
•the prohibitive time and manual effort required to acquire massive volumes of knowledge and formally encode it as logical formulas accessible to computer algorithms, and
•the difficulty of understanding natural language questions well enough to exploit such formal encodings if available.
•Techniques for dealing with huge amounts of natural language text, such as Information Retrieval, suffer from nearly the opposite problem in that they can always find documents or passages containing some keywords in common with the query but lack the precision, depth, and understanding necessary to deliver correct answers with accurate confidences.
4. Why not QA using logic and NLP?
•What if it was “cheap” to acquire massive volumes of knowledge formally encoded as logical formulas?
•What if it was “easy” to understand natural language questions well enough to exploit such formal encodings?
5. Knowledge Acquisition for Deep QA: Expt.
•Goal 1: represent the knowledge in one chapter of a popular college-level science textbook, at 1st-year college level
•Chapter 7 on cell membranes, in Biology 9th ed., by Campbell et al
•Goal 2: measure the knowledge acquisition (KA) productivity achieved by knowledge engineers (KEs)
•Assess level of effort, quality of resulting logic, and coverage of textbook
•Software used in this case study:
•for translating English to logic
•Automata Linguist™ and KnowBuddy™ (patents pending)
•English Resource Grammar (http://www.delph-in.net/erg/)
•for knowledge representation & reasoning
•Vulcan, Inc.’s SILK (http://www.projecthalo.com/): prototype implementation of Rulelog
6. Summary of Effort & Results
•Captured 3,000+ sentences concerning cellular biology
•hundreds of questions (2 examples herein)
•600 or so sentences directly from Campbell’s Biology textbook
•2,000 or so sentences of supporting or background knowledge
•Sentence length averaged 10 words, ranging up to 25 words
•background knowledge tends to be shorter
•disambiguation of parse typically requires a fraction of a minute
•hundreds of parses common, > 30 per sentence on average
•the correct parse is typically not the parse ranked best by statistical NLP
•Sentences disambiguated and formalized into logic in very few minutes on average
•resulting logic is typically more sophisticated than what skilled logicians usually produce
•Collaborative review and revision of English sentences, disambiguation, and formalization approximately doubled time per sentence over the knowledge base
9. Knowledge Acquisition
•Note: the “parse” ranked first by machine learning techniques is usually not the correct interpretation
10. Query Formulation
•Are the passageways provided by channel proteins hydrophilic or hydrophobic?
11. The Answer is “Hydrophilic”
•Hypothetical query uses “presumption” below
•Presumption yields tuples with skolems
•The answer is on the last line below
14. A Bloom level 4 question
•If a Paramecium swims from a hypotonic environment to an isotonic environment, will its contractile vacuole become more active?
∀(?x9) paramecium(?x9) ⇒ ∃(?x13) (hypotonic(environment)(?x13) ∧ ∃(?x21) (isotonic(environment)(?x21) ∧ ∀₁(?x31) contractile(vacuole)(of(?x9))(?x31) ⇒ if(then)(become(?x31, more(active)(?x31)), swim(from(?x13))(to(?x21))(?x9))))
•The above formula is translated into a hypothetical query, which answers “No”.
15. Textual Logic Approach: Overview
•Logic-based text interpretation & generation, for KA & QA
•Map text to logic (“text interpretation”): for K and Q’s
•Map logic to text (“text generation”): for viewing K, esp. for justifications of answers (A’s)
•Map based on logic
•Textual terminology – phrasal style of K
•Use words/word-senses directly as logical constants
•Natural composition: textual phrase → logical term
•Interactive logical disambiguation technique
•Treats: parse, quantifier type/scope, co-reference, word sense
•Leverages lexical ontology – large-vocabulary, broad-coverage
•Initial restriction to stand-alone sentences – “straightforward” text
•Minimize ellipsis, rhetoric, metaphor, etc.
•Implemented in Automata Linguist
•Leverage defeasibility of the logic
•For rich logical K: handle exceptions and change
•Incl. for NLP itself: “The thing about NL is that there’s a gazillion special cases” [Peter Clark]
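The "natural composition" of textual phrases into logical terms can be sketched as nested applicative (hilog-style) terms. The following is a minimal Python illustration assuming a made-up term constructor, not Automata Linguist's actual representation:

```python
# Illustrative sketch: words/word-senses act directly as logical constants,
# and phrases compose naturally into nested (hilog-style) applicative terms.

def term(functor, *args):
    """Build a term as (functor, args); functors may themselves be terms."""
    return (functor, args)

def show(t):
    """Render a term in applicative notation, e.g. f(a)(b)."""
    if not isinstance(t, tuple):
        return str(t)
    functor, args = t
    return show(functor) + "(" + ",".join(show(a) for a in args) + ")"

# The phrase "contractile vacuole of ?x" composes phrase-by-phrase:
phrase = term(term("contractile", "vacuole"), term("of", "?x"))
print(show(phrase))  # contractile(vacuole)(of(?x))
```

Note how the rendered term matches the shape of subterms in the Bloom-level-4 formula above, e.g. contractile(vacuole)(of(?x9)): the phrase structure of the English is mirrored directly in the logic, so no separately hand-crafted ontology vocabulary is needed.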
16. Requirements on the logical KRR for KA of Rich Logical K
•The logic must be expressively rich – higher-order logic formulas
•As target for the text interpretation
•The logic must handle exceptions and change, gracefully
•Must be defeasible = K can have exceptions, i.e., be “defeated”, e.g., by higher-priority K
•For empirical character of K
•For evolution and combination of KB’s. I.e., for social scalability.
•For causal processes, and “what-if’s” (hypotheticals, e.g., counterfactual)
•I.e., to represent change in K and change in the world
•Inferencing in the logic must be computationally scalable
•Incl. tractable = polynomial-time in worst-case
•(as are SPARQL and SQL databases, for example)
17. Past Difficulties with Rich Logical K
•KRR not defeasible & tractable
•… even when not target of text-based KA
•E.g.:
1. FOL-based – OWL, SBVR, CL: infer garbage
•Perfectly brittle in the face of conflict from errors, confusions, tacit context
2. FOL and previous logic programs: run away
•Recursion through logical functions
18. Rulelog: Overview
•First KRR to meet central challenge: defeasible + tractable + rich
•New rich logic: based on databases, not classical logic
•Expressively extends normal declarative logic programs (LP)
•Transforms into LP
•LP is the logic of databases (SQL, SPARQL) and pure Prolog
•Business rules (BR) – production-rules-ish – have expressive power similar to databases
•LP (not FOL) is “the 99%” of practical structured info management today
•RIF-Rulelog in draft as industry standard (W3C and RuleML)
•Associated new reasoning techniques to implement it
•Prototyped in Vulcan’s SILK
•Mostly open source: Flora-2 and XSB Prolog
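The claim that LP is "the logic of databases" can be illustrated with a naive bottom-up (fixpoint) evaluation of Datalog-style rules. This is a toy Python sketch, not SILK or XSB, and the edge/path predicates are invented for the example:

```python
# Sketch: declarative logic programs evaluated bottom-up to a fixpoint,
# the same style of evaluation that underlies recursive database queries.

facts = {("edge", "a", "b"), ("edge", "b", "c")}

# path(X,Y) :- edge(X,Y).    path(X,Z) :- path(X,Y), edge(Y,Z).
def step(db):
    new = set(db)
    for (p, x, y) in db:
        if p == "edge":
            new.add(("path", x, y))
    for (p, x, y) in db:
        for (q, y2, z) in db:
            if p == "path" and q == "edge" and y == y2:
                new.add(("path", x, z))
    return new

db = set(facts)
while True:                 # iterate to a fixpoint; terminates because the
    nxt = step(db)          # set of derivable facts here is finite
    if nxt == db:
        break
    db = nxt
print(sorted(t for t in db if t[0] == "path"))
```

Real LP engines avoid this naive re-derivation (e.g., via semi-naive evaluation or the tabling mentioned on the next slide), but the declarative reading of the rules is the same.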
19. Rulelog: more details
•Defeasibility based on argumentation theories (AT) [Wan, Grosof, Kifer 2009]
•Meta-rules (~10’s) specify principles of debate, thus when rules have exceptions
•Prioritized conflict handling. Ensures consistent conclusions. Efficient, flexible, sophisticated defeasibility.
•Restraint: semantically clean bounded rationality [Grosof, Swift 2013]
•Leverages “undefined” truth value to represent “not bothering”
•Extends well-foundedness in LP
•Omniformity: higher-order logic formula syntax, incl. hilog, rule id’s
•Omni-directional disjunction. Skolemized existentials.
•Avoids general reasoning-by-cases (cf. unit resolution).
•Sound interchange of K with all major standards for sem web K
•Both FOL & LP, e.g.: RDF(S), OWL-DL, SPARQL, CL
•Reasoning techniques based on extending tabling in LP inferencing
•Truth maintenance, justifications incl. why-not, trace analysis for KA debug, term abstraction, delay subgoals
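The flavor of prioritized conflict handling can be conveyed with a toy Python model. This is far simpler than Rulelog's argumentation theories; the rule ids, priorities, and the bird example are invented for illustration. A candidate conclusion survives only if its rule out-prioritizes every rule arguing the opposite:

```python
# Toy sketch of defeasibility: opposing candidate conclusions conflict,
# and a candidate is concluded only if it defeats all its opposers.

candidates = [
    # (rule_id, priority, (sign, atom))
    ("r_default", 1, ("+", "flies(tweety)")),   # birds typically fly
    ("r_penguin", 2, ("-", "flies(tweety)")),   # penguins don't: an exception
]

def conclusions(cands):
    result = []
    for rid, pri, (sign, atom) in cands:
        opposing = [p for (_, p, (s, a)) in cands if a == atom and s != sign]
        if all(pri > p for p in opposing):   # undefeated: beats every opposer
            result.append(sign + atom)
    return result

print(conclusions(candidates))  # ['-flies(tweety)'] : the exception wins
```

Note that if two opposing rules have equal priority, neither survives, so the conclusion set stays consistent; the real argumentation-theory meta-rules are considerably more sophisticated (rebuttal, refutation, cancellation, etc.).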
20. TL KA – Study Results
•Axiomatized ~2.5k English sentences during 2013:
•One defeasible axiom in Rulelog (SILK syntax) per sentence
•On average, each of these axioms corresponds to > 5 “rules”
•e.g., “rule” as in logic programs (e.g., Prolog) or business rules (e.g., PRR, RIF-PRD)
•<< 10 minutes on average to author, disambiguate, formalize, review & revise a sentence
•The coverage of the textbook material was rated “A” or better for >95% of its sentences
•Collaboration resulted in an average of over 2 authors/editors/reviewers per sentence
•Non-authors rated the logic for >90% of sentences as “A” or better; >95% as “B+” or better
•TBD: how much TL effort will QA testing require?
•TBD: how will TL effort change as TL tooling & process mature?
21. TL KA – Study Results (II)
•Expressive coverage: very good, due to Rulelog
•All sentences were representable but some (e.g., modals) are TBD wrt reasoning
•This and productivity were why background K was mostly specified via TL
•Small shortfalls (< few %) from implementation issues (e.g., numerics)
•Terminological coverage: very good, due to TL approach
•Little hand-crafted logical ontology
•Small shortfalls (< few %) from implementation issues
•Added several hundred mostly domain-specific lexical entries to the ERG
22. TL KA: KE labor, roughly, per page
•(In the study:)
•~~$3-4/word (actual word, not simply 5 characters)
•~~$500-1500/page (~175-350 words/page)
•Same ballpark as: labor to author the text itself
•… for many formal text documents
•E.g., college science textbooks
•E.g., some kinds of business documents
•“Same ballpark” here means same order of magnitude
•TBD: how much TL effort will be needed when K is debugged during QA testing?
•TBD: how will TL effort change as its tooling & process mature?
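As a quick arithmetic check of the slide's per-page range (a sketch using only the slide's own rough figures):

```python
# At roughly $3-4 per word and ~175-350 words per page, the per-page
# KE-labor cost spans roughly the quoted ~~$500-1500/page.
low = 3 * 175    # cheapest rate on the shortest page
high = 4 * 350   # priciest rate on the longest page
print(low, high)  # 525 1400
```

So the endpoints of the word-level estimate do land in the same order of magnitude as the quoted page-level range.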
23. KA Advantages of Approach
•Approach = Textual Logic + Rulelog
•Interactive disambiguation: relatively rapidly produces rich K
•With logical and semantic precision
•Starting from effectively unconstrained text
•Textual terminology: logical ontology emerges naturally
•From the text’s phrasings, rather than needing effort to specify it explicitly and become familiar with it
•Perspective: Textual terminology is also a bridge to work in text mining and “textual entailment”
•Rulelog as rich target logic
•Can handle exceptions and change, and is tractable
•Rulelog supports K interchange (translation and integration)
•Both LP and FOL; all the major semantic tech/web standards (RDF(S), SPARQL, OWL, RIF, CL, SBVR); Prolog, SQL, and production rules. (Tho’ for many of these, with restrictions.)
24. Conclusions
•Research breakthrough on two aspects:
•1. rapid acquisition of rich logical knowledge
•2. reasoning with rich logical knowledge
•Appears to be significant progress on the famous “KA bottleneck” of AI
•“Better, faster, cheaper” logic. Usable on a variety of KRR platforms.
•It’s early days still, so lots remains to do
•Tooling, e.g.: leverage inductive learning to aid disambiguation
•More experiments, e.g.: push on QA; scale up
25. recap: Scalable Rich KA – Requirements
[Diagram: requirements recap: the logic must support text-based KA & QA with disambiguation; the KRR must be defeasible + tractable.]
26. recap: Scalable Rich K – Approach
[Diagram: Textual Logic (textual terminology, interactive disambiguation, a logic-based map between text and logic) provides text-based KA & QA; Rulelog (defeasible + tractable via argumentation theories, restraint, omniformity) provides the logic and KRR.]
27. Usage Context for Approach
[Diagram: the same Textual Logic + Rulelog stack, shown in its usage context: apps, resources (databases, services), specialized UIs, and service interfaces.]
28. Late-Breaking News
•Company created to commercialize approach
Coherent Knowledge Systems
(coherentknowledge.com went live today)
•Target markets: policy-centric, NL QA and HCI
29. Acknowledgements
•This work was supported in part by Vulcan, Inc., as part of the Halo Advanced Research (HalAR) program, within overall Project Halo. HalAR included SILK.
•Thanks to:
•The HalAR/SILK team
•The Project Sherlock team at Automata
•The Project Halo team at Vulcan
•RuleML and W3C, for their cooperation on Rulelog