2. I. Formal definition of, syntax for, and KR applications of Contexts (Microtheories) in Cyc
II. Conceptually, contexts scoped to interoperate can be harnessed to solve a systemic barrier to the
emergence of open and commercial markets for vital information
III. Theoretically, contexts provide a complete basis for the solution when
I. Implemented as containers for normalization models, SKS content sets, and derivation models
II. The expressivity of derivation models is sufficiently constrained
IV. Path to scalable implementation: Case study of risk data production chains in GSIBs
3. Three Problems and a Hypothesis
Preliminary:
What is a Computational Ontology Language?
The role of an ontology in data production is to comprise a set of general statements that, for arbitrary sets of
particular statements S1 in some domain, provide, for a distinct and informative set of statements S2, both
A rational basis for accepting the truth of the statements in S2
A computational basis for automated generation of tokens of the sentences in S2.
A language suitable for this role (production of true statements) is a formal modeling language, L: a
machine-readable set of logical and non-logical terms with both a formal and an informal semantics.
Ontologies make assertions with all of the import of a statement of any synonymous expression by a human
agent, including responsibility for its truth.
4. Microtheories
• First class objects of CycL
• An implicit and explicit means for placement of assertions
#$ist and #$ist-Asserted
• Semantically, a microtheory is simply a set of CycL sentences subject
to a specified, possibly null, set of closure conditions
• Time is modeled as a dimension of contexts according to a particular
interval semantics for temporary truth and is implemented in
inference as a relationship of inclusion between contexts.
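The bullets above can be illustrated with a minimal Python sketch. It is not Cyc's implementation and is deliberately simplified. The class and function names, the example microtheory names (GeneralFinanceMt, RiskDeskMt) and the example sentences are all hypothetical. The only closure condition modeled is inheritance from more general microtheories, and time is reduced to integer intervals where truth over an interval is assumed to carry over to every interval it includes.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Microtheory:
    """A named context: a set of sentences plus links to more general contexts."""
    name: str
    asserted: set = field(default_factory=set)    # sentences placed directly here
    genl_mts: list = field(default_factory=list)  # more general microtheories

    def visible_mts(self):
        """Return this microtheory and every more general one reachable from it."""
        seen, stack = set(), [self]
        while stack:
            mt = stack.pop()
            if mt not in seen:
                seen.add(mt)
                stack.extend(mt.genl_mts)
        return seen


def ist_asserted(mt, sentence):
    """Simplified analogue of #$ist-Asserted: the sentence is placed directly in mt."""
    return sentence in mt.asserted


def ist(mt, sentence):
    """Simplified analogue of #$ist: the sentence is visible from mt via inheritance."""
    return any(sentence in m.asserted for m in mt.visible_mts())


@dataclass(frozen=True)
class Interval:
    """A closed time interval used as the temporal dimension of a context."""
    start: int
    end: int

    def contains(self, other):
        """True when other lies entirely within this interval."""
        return self.start <= other.start and other.end <= self.end


# Hypothetical example data.
bank_fact = ("isa", "AcmeBank", "InvestmentBank")
risk_fact = ("riskRating", "AcmeBank", "High")
general = Microtheory("GeneralFinanceMt", {bank_fact})
desk = Microtheory("RiskDeskMt", {risk_fact}, [general])

print(ist(desk, bank_fact))           # visible through the more general context
print(ist_asserted(desk, bank_fact))  # but not placed directly in RiskDeskMt
print(Interval(2008, 2012).contains(Interval(2009, 2010)))
```

In this sketch a sentence asserted over 2008 to 2012 would be treated as true in any context whose interval falls inside that range, which is one way to read "a relationship of inclusion between contexts".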
5. A Framework of Contexts for Production of Vitally Important Data
• Interpretation Models
• Specify templates that provide an executable basis for translation of arbitrary data elements in
the schema for a particular integrated source
• Specify the means to access the source, such as physical address, connection credentials and
the language for retrieval of content
• SKS Contexts
• A microtheory whose content is exactly that of a particular SKS
• The content is generated by Cyc query access to the source and translation of the retrieved content via
code generated according to the specification implicit in the interpretation model for the SKS
• A system of such contexts in combination with Ontology Models provides a basis
for independently developed methods to ensure the completeness and
correctness of the information production process along with user access to the
complete provenance trace of each final datum
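The relationship between an interpretation model, an SKS context and provenance can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the Cyc SKS machinery. The Template and InterpretationModel shapes, the access string, the source name, the column names and the predicate are all hypothetical, and rows are passed in directly rather than retrieved from a live source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Hypothetical template: turns two columns of a row into a binary sentence."""
    predicate: str
    subject_column: str
    object_column: str


@dataclass(frozen=True)
class InterpretationModel:
    """Hypothetical interpretation model for one integrated source."""
    source_name: str
    access: str        # placeholder for address, credentials and query language
    templates: tuple   # Template instances covering the source schema


def build_sks_context(model, rows):
    """Translate every row into sentences and record where each one came from.

    Returns a dict mapping each sentence to a list of (source name, row index)
    pairs, so the provenance of every datum can be traced back to the source.
    """
    context = {}
    for row_index, row in enumerate(rows):
        for template in model.templates:
            sentence = (
                template.predicate,
                row[template.subject_column],
                row[template.object_column],
            )
            context.setdefault(sentence, []).append((model.source_name, row_index))
    return context


# Hypothetical source content and model.
model = InterpretationModel(
    source_name="customer_db",
    access="placeholder-connection-string",
    templates=(Template("customerCountry", "cust_id", "country"),),
)
rows = [
    {"cust_id": "C1", "country": "JP"},
    {"cust_id": "C2", "country": "DE"},
]
sks_context = build_sks_context(model, rows)
print(sks_context[("customerCountry", "C1", "JP")])
```

Because each sentence keeps its list of originating rows, a downstream derivation could carry those pairs forward and expose a provenance trace for its final results.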
7. Business Questions
1. Which Fortune 500 companies have CEOs that have been board members for a non-US
investment bank?
2. What GS customers have previously made >$1M high risk investments in European
markets within one week of a 2% drop in the AMEX?
3. What was the performance of the DJIA in each 48 hour period subsequent to a mention of
uranium enrichment by an Iranian official?
4. What publicly traded non-US pharmaceutical companies have patent disputes with a US
pharmaceutical company that owns a subsidiary in the country in which the disputant is
based?
5. What Japanese steel manufacturers receive more than 50% of their iron ore from US or
Canadian mines?
6. What, if any, is the shortest sequence of previous co-board members connecting a board
member of AIG with a board member of Merrill?
7. What type of derivative, whose underlier is equity in African corporations, is most heavily
invested in by GS corporate customers in Japan?
10. GS Facts
Market capitalization = $96 Billion
Assets = $860 Billion > Saudi Arabia
Assets under management = $1.3 Trillion ≈ Russia, 12
Total Assets Controlled = 2.1 x 10^12 < India, 8th Globally
Employees = 34,800
Technology budget = $10B
Technology Employees = 9,000
  Development = 4,000
  Operations = 5,000
Servers = 800k
  Compute = 500k
  Storage = 300k
    UAT/Failover = 200k
    Production = 100k
Data Size = 16 Petabytes
Data Quantity = 10^15
“The Weather Desk”
“South American, Tech, Vol 1 Derivatives Desk”
12. Function and Performance
• Semantic synthesis is never the executable bottleneck; known
methods can readily outscale even pipeline capacity
• With specialized/high-end hardware and industrial-scale execution
clusters (1000s of nodes), information derivation rates of
10^10 inferences per second are readily attainable.
• Open question: what is the maximal class of FOL rule expression types
for which derivation of ground sentences is finite and thus verifiable?
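As a point of reference for the open question, function-free, range-restricted Horn rules (Datalog-style rules) are one well-known class for which ground derivation is finite: only finitely many ground atoms can be built from a finite set of constants, so forward chaining reaches a fixpoint. The Python sketch below shows naive forward chaining over such rules. It makes no claim that this is the maximal class, and the predicate and constant names are hypothetical.

```python
def is_variable(term):
    """Variables are strings starting with '?'; everything else is a constant."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, fact, bindings):
    """Extend bindings so that pattern equals fact, or return None on mismatch."""
    if len(pattern) != len(fact):
        return None
    bindings = dict(bindings)
    for p, f in zip(pattern, fact):
        if is_variable(p):
            if bindings.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return bindings


def all_matches(body, facts, bindings):
    """Yield every binding that satisfies all atoms in the rule body."""
    if not body:
        yield bindings
        return
    first, rest = body[0], body[1:]
    for fact in facts:
        extended = match(first, fact, bindings)
        if extended is not None:
            yield from all_matches(rest, facts, extended)


def forward_chain(facts, rules):
    """Apply rules until no new ground fact appears, then return the closure."""
    derived = set(facts)
    while True:
        new = set()
        for head, body in rules:
            for bindings in all_matches(body, derived, {}):
                fact = tuple(bindings.get(term, term) for term in head)
                if fact not in derived:
                    new.add(fact)
        if not new:
            return derived
        derived |= new


# Hypothetical facts and rules: transitive connection via co-board membership.
facts = {("coBoard", "a", "b"), ("coBoard", "b", "c")}
rules = [
    (("connected", "?x", "?y"), [("coBoard", "?x", "?y")]),
    (("connected", "?x", "?z"), [("connected", "?x", "?y"), ("coBoard", "?y", "?z")]),
]
closure = forward_chain(facts, rules)
print(("connected", "a", "c") in closure)
```

Termination here depends on the rules having no function symbols and on every head variable appearing in the body; richer rule types can generate unboundedly many ground terms, which is where the open question begins.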