The document provides information about several online knowledge bases and APIs for text analytics, including Freebase, WordReference API, DBPedia, and others. It describes the size and data available in each knowledge base, how to query or access their data through APIs, and limitations or licensing terms for their use. DBPedia in particular extracts structured data from Wikipedia to create a multilingual linked open data set of over 3.6 million entities that can be queried using SPARQL.
The document summarizes recent developments in semantic search engines. It discusses the principles of the semantic web and languages like RDF, RDFS, and OWL. It then summarizes the Falcons semantic search engine and how it indexes and searches semantic web objects. It also discusses efforts by Google, Yahoo, and Microsoft to incorporate semantic data through rich snippets, SearchMonkey, and Schema.org. Finally, it introduces the Kngine search engine as a new promising engine that aims to go beyond existing sources by indexing structured information on the web.
The document provides an overview of search engine optimization (SEO) concepts, including:
1) The importance of SEO for driving online and offline sales.
2) How search engines work and are composed of web crawlers and databases to index web pages.
3) Key factors search engines use to evaluate and rank pages, such as relevance, importance, links, and content.
4) Techniques for improving rankings, like optimizing titles, meta tags, and adding relevant and quality backlinks.
Computer study lesson - Internet Search (25 Mar 2020) by wmsklang
Here are the answers to your homework questions:
1. Magnets work by the alignment of atomic or subatomic particles called domains that are polarized (given a magnetic "charge"). The magnetic fields of these polarized domains interact and attract or repel other magnetic materials.
2. A spark plug is a device for delivering electric current from an ignition system to the combustion chamber of a spark-ignition engine to ignite the compressed fuel-air mixture by an electric spark, thereby initiating combustion.
3. A light year is the distance that light travels in one year. Since light travels at about 300,000 kilometers (186,000 miles) per second, one light year equals about 9.46 trillion kilometers or 5.88 trillion miles.
This document proposes a content model and API to unify access to different types of content like wikis, RDF, binaries, and more. It aims to be used in projects like NEPOMUK, WAVES, and WIF. The model represents content at different levels of granularity from words to documents. Content can be annotated with semantic statements and metadata. All content is addressable and versioned. The API provides functions for basic CRUD operations as well as fulltext search and auto-completion support through a keyword index.
Leveraging the semantic web meetup, Semantic Search, Schema.org and more by BarbaraStarr2009
A history and description of the adoption of semantic search by the major search and social engines. Covers schema.org, the knowledge graph, and status to date (July 30, 2013). Presented from a search engine point of view.
The document provides guidance on how to conduct effective searches on the internet and evaluate the results. It discusses using specific search terms and operators like "+" and "-" to include or exclude terms. It also covers evaluating search results based on the accuracy, authority, objectivity, currency and coverage of websites. Formatting citations in APA and MLA styles is also addressed.
Google's search engine works by using web crawlers to efficiently crawl and index the web. It produces more satisfying search results than other engines by using techniques like page rank and trust rank to determine the importance and authority of pages. It aims to return the most relevant and trustworthy results for user queries.
The document discusses building semantic web applications using linked open data and ontologies, describing how the speaker's company has built applications like a resource list management tool that collects, organizes, and shares course materials using RDF and SPARQL. Advice is provided on reusing existing ontologies, including links between ontologies, and best practices for URIs, HTTP methods, and handling incomplete or conflicting data from multiple sources.
1) There are several general methods for acquiring web data through R, including reading files directly, scraping HTML/XML/JSON, and using APIs that serve XML/JSON.
2) Scraping web data involves extracting structured information from unstructured HTML/XML pages when no API is available. Packages like rvest and XML can be used to parse and extract the desired data.
3) Many data sources have APIs that allow programmatic access to search, retrieve, or submit data through a set of methods. R packages like taxize and dryad interface with specific APIs to access taxonomic and research data.
This document discusses search engines and web crawling. It begins by defining a search engine as a searchable database that collects information from web pages on the internet by indexing them and storing the results. It then discusses the need for search engines and provides examples. The document outlines how search engines work using spiders to crawl websites, index pages, and power search functionality. It defines web crawlers and their role in crawling websites. Key factors that affect web crawling like robots.txt, sitemaps, and manual submission are covered. Related areas like indexing, searching algorithms, and data mining are summarized. The document demonstrates how crawlers can download full websites and provides examples of open source crawlers.
The document discusses search engines and their history and functioning. It explains that search engines use crawler programs to index web pages and gather keywords to help users find relevant information quickly from the vast World Wide Web. The first search engine Archie was released in 1990 and search engines have since evolved, with companies like Google becoming leaders by consistently improving their algorithms to better understand users' search needs.
This document provides an overview of searching the internet and evaluating websites. It defines key terms like websites, web pages, URLs, domain names, and search engines. It explains how to use search engines like Google and modify searches using Boolean operators and quotation marks. It also lists criteria for evaluating websites such as accuracy, authority, objectivity, currency, and coverage.
RDF presentation at DrupalCon San Francisco 2010 by scorlosquet
The document discusses RDF and the Semantic Web in Drupal 7. It introduces RDF, how resources can be described as relationships between properties and values, and how this turns the web into a giant linked database. It describes Drupal 7's new RDF and RDFa support which exposes entity relationships and allows for machine-readable semantic data. Future improvements discussed include custom RDF mappings, SPARQL querying of site data, and connecting to external RDF sources.
The document discusses Web 3.0 and semantic markup technologies. It covers topics like SaaS/mashups, the semantic web, RDF, microformats, XML, and how proper markup allows information to be reused, integrated and queried across the web in a machine-readable way. The goal is to build a web where all content is structured and can be processed and understood by computers to deliver more intelligent search results.
Web search engines index billions of web pages and handle hundreds of millions of searches per day. They use inverted indexes to quickly search text and return relevant results. Ranking algorithms consider factors like term frequency, popularity, and link analysis using PageRank to determine the most authoritative pages for a given query. Crawling software systematically explores the web by following links to discover and index new pages.
1) JSON-LD has seen widespread adoption: over 2 million HTML pages include it, and it is a required format for Linked Data platforms.
2) A primary goal of JSON-LD was to let JSON developers use it much as they use plain JSON, while also providing mechanisms to reshape JSON documents into a deterministic structure for processing.
3) JSON-LD 1.1 includes additional features like using objects to index into collections, scoped contexts, and framing capabilities.
Smart Crawler: A Two Stage Crawler for Concept Based Semantic Search Engine by iosrjce
The internet is a vast collection of billions of web pages containing terabytes of information arranged on thousands of servers using HTML. The size of this collection is itself a formidable obstacle to retrieving necessary and relevant information, which has made search engines an important part of our lives. Search engines strive to retrieve information that is as relevant as possible, and one of their building blocks is the web crawler. We propose a two-stage framework, Smart Crawler, for efficiently gathering deep web interfaces. In the first stage, Smart Crawler performs site-based searching for center pages with the assistance of search engines, avoiding visits to a large number of pages. To achieve more accurate results for a targeted crawl, Smart Crawler ranks websites to prioritize highly relevant ones for a given topic. In the second stage, Smart Crawler achieves fast in-site searching by excavating the most relevant links with an adaptive link-ranking.
The document provides tips and strategies for effectively searching the internet to find needed information. It discusses using advanced search features like Boolean operators, phrase searching with quotation marks, and limiting searches to specific domains. Search engines like Google index websites differently than directories. Refining searches with operators, phrases, and domain limits can help attract the "needle" of needed information from the large "haystack" of the internet.
Open refine reconciliation service api (dc python 2013_03_05) by Alison Rowland
This document summarizes a presentation about using OpenRefine to reconcile datasets through a custom Reconciliation Service API. Key points:
- OpenRefine allows humans to verify matches between datasets clustered or matched by algorithms through an interface. A reconciliation service API enables matching records to entities in other datasets.
- The Influence Explorer project built such an API to match political donations and lobbying records to entities. It returns potential matches from their database to records queried from OpenRefine.
- Challenges included documentation, compatibility with OpenRefine, and error handling. Future work could add support for extraction, API keys, and contextual data to improve matching. The code is open source online.
"RDFa - what, why and how?" by Mike Hewett and Shamod LacoulShamod Lacoul
The document discusses RDFa (Resource Description Framework in Attributes), which allows adding semantic metadata to web pages. It provides an overview of RDFa and examples of using RDFa to annotate events, people, and other entities on web pages in order to make the information machine-readable. The examples demonstrate how RDFa can be used to embed semantics in HTML and reuse attributes, allowing the HTML and RDF data to coexist in the same document.
The document summarizes various limit and focus commands that can be used in Google searches to narrow results. It provides examples of commands such as intitle, allintitle, inurl, allinurl, site, filetype, daterange, numrange, and others. Each command is explained in 1-2 sentences along with its suggested uses and limitations. Examples are given to demonstrate how each command can be implemented in a Google search.
The document discusses the semantic web and how it can potentially disrupt or benefit online commerce. It provides definitions and explanations of key concepts related to the semantic web including RDF, ontologies, linked data, and semantic search. It outlines how search engines and websites are increasingly adopting and leveraging semantic web technologies like RDFa to provide richer search results and experiences for users.
1. The document discusses search engines and how they work, including how early search engines indexed web pages and how modern search engines use complex algorithms to rank results.
2. It provides tips for effective searching, such as using Boolean operators, limiting searches to specific sites or file types, and taking advantage of advanced search features.
3. The document also covers issues like ensuring search results are reliable, respecting copyright, and potential future developments in search technology like predictive, geo-aware, personalized, and semantic searching.
The Gramsci Project is a multidisciplinary research project that aims to create a knowledge graph and facilitate browsing of information related to Antonio Gramsci's work. It involves developing semi-automatic annotation tools, integrating a triple store with a search interface, and experimenting with dynamically generated facets and rankings. The goal is to allow exploration of annotated texts and enable linking between related people, concepts and documents in Gramsci's body of work.
Doing More with Less: Mash Your Way to Productivity by kevinreiss
This document discusses mashups and how they can be used to increase productivity with low costs and risks. Mashups combine data from various web sources to create new applications or modest improvements to existing ones. They typically require basic HTML, JavaScript, and RSS/XML skills. Many organizations are enabling their content to be used in mashups. Widgets, feeds, and APIs are the main building blocks. With the right tools and skills, libraries and other organizations can create their own mashups to aggregate and display information in new ways.
Common Crawl is a non-profit that makes web data freely accessible. Each crawl captures billions of web pages totaling over 150 terabytes. The data is released without restrictions on Amazon. Common Crawl was founded in 2007 to democratize access to web data at scale. The data has been used for natural language processing, machine learning, analytics, and more. Researchers have extracted tables, links, phone numbers, and parallel text from the data.
How do volunteer open-source projects create and maintain so many compelling, competitive products? What is the Open Source Secret Sauce? Join open-source insider Ted Husted as he takes us deep inside the Apache Software Foundation, to show how the sausages are made.
In this session, you will learn
* Why open source matters;
* How open source development works at the ASF;
* What makes open source projects successful.
The document discusses the history and development of the Semantic Web over the past 20 years. It begins with Tim Berners-Lee originally conceiving of the Semantic Web in 1994, with a vision of machines being able to understand web documents and perform tasks like property transfers. Since then, there have been over 200 talks on the Semantic Web, but the focus was initially on technologies like XML, RDF, and OWL. More recently, Linked Data and RDFa have seen the most usage in applications, while the ontology story remains unclear. Moving forward, bridging the gaps between the linked data and formal ontology views will require addressing challenges like modeling incomplete and decentralized data at web scale.
This slide deck was prepared for a workshop on Linked Data Publishing and Semantic Processing using the Redlink platform (http://redlink.co). The workshop, delivered at the Department of Information Engineering, Computer Science and Mathematics at Università degli Studi dell'Aquila, aimed at providing a general understanding of Semantic Web technologies and how they can be used in real-world use cases such as Salzburgerland Tourismus.
A brief introduction is also included on MICO (Media in Context), a European Union part-funded research project to provide cross-media analysis solutions for online multimedia producers.
Semantic pipes aggregate data from multiple sources to create new data sources, similar to Yahoo! Pipes. Semantic pipes operate on RDF data sources using SPARQL queries. DERI Pipes is a tool for building semantic pipes that defines blocks for processing RDF and other data sources. Semantic mashups may have additional reasoning capabilities beyond basic data aggregation, using semantic web reasoners. They implement behavior through SPARQL queries over RDF data. Examples include mashups over Flickr, book data, and scholarly references.
1. The document discusses the Semantic Web and how publishing structured data using technologies like RDF and SPARQL allows machines to understand information and make connections between different data sources.
2. It describes the Archipel research project which uses Semantic Web technologies like RDF and SPARQL Views to interconnect distributed cultural heritage data and provide new ways to access and combine the data.
3. Participating in the Semantic Web can open up new business opportunities by enabling novel ways of combining and sharing data between organizations.
Nicholas Schiller presented on using APIs to customize library services. He demonstrated how to build a web application using the WorldCat Search API that automatically adds Boolean search terms to a user's query and formats the results. The application was built with PHP for server-side scripting, HTML5 for interface design, and jQuery Mobile to optimize for different devices. The presentation provided examples of APIs, guidelines for API projects, and resources for further learning about APIs and programming.
SADI SWSIP '09 'cause you can't always GET what you want! by Mark Wilkinson
My presentation to the IEEE Asia Pacific Services Computing Conference 2009 - Semantic Web Services In Practice (SWSIP 09) track. This show introduces the SADI (Semantic Automated Discovery and Integration) Framework - the replacement for our earlier explorations with the BioMoby project. In this slideshow I explore what SADI is, and why it is able to generate such interesting (and useful!) Semantic-Webby behaviours from Web Services. I also discuss our current research activities around how we are trying to exploit the SADI system to create much more natural query interfaces for Cardiovascular researchers.
10 best platforms to find free datasets by Aparna Sharma
If “the data is the new oil” then there is a lot of free oil just waiting to be used. And you can do some pretty interesting things with that data, like finding the answer to the question: Is Buffalo, New York really that cold in the winter?
There is plenty of free data out there, ready to be used for school projects, market research, or just for fun. Before you go crazy, however, you should be aware of the quality of the data you find. Here are some great sources of free data and some ways to determine their quality.
All of these dataset sources have strengths, weaknesses, and specialties. All in all, they are great tools, and you can spend a lot of time going down rabbit holes with them.
But if you want to stay focused and find what you need, it’s important to understand the nuances of each source and use their strengths to your advantage.
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017 by Amazon Web Services
As data volumes grow and customers store more data on AWS, they often have valuable data that is not easily discoverable and available for analytics. The AWS Glue Data Catalog provides a central view of your data lake, making data readily available for analytics. We introduce key features of the AWS Glue Data Catalog and its use cases. Learn how crawlers can automatically discover your data, extract relevant metadata, and add it as table definitions to the AWS Glue Data Catalog. We will also explore the integration between AWS Glue Data Catalog and Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.
Information Extraction and Linked Data Cloud by Dhaval Thakker
The document discusses Press Association's semantic technology project which aims to generate a knowledge base using information extraction and the Linked Data Cloud. It outlines Press Association's operations and workflow, and how semantic technologies can be used to develop taxonomies, annotate images, and extract entities from captions into an ontology-based knowledge base. The knowledge base can then be populated and interlinked with external datasets from the Linked Data Cloud like DBpedia to provide a comprehensive, semantically-structured source of information.
Finding knowledge, data and answers on the Semantic Web by ebiquity
Web search engines like Google have made us all smarter by providing ready access to the world's knowledge whenever we need to look up a fact, learn about a topic or evaluate opinions. The W3C's Semantic Web effort aims to make such knowledge more accessible to computer programs by publishing it in machine understandable form.
As the volume of Semantic Web data grows, software agents will need their own search engines to help them find the relevant and trustworthy knowledge they need to perform their tasks. We will discuss the general issues underlying the indexing and retrieval of RDF-based information and describe Swoogle, a crawler-based search engine whose index contains information on over a million RDF documents.
We will illustrate its use in several Semantic Web related research projects at UMBC, including a distributed platform for constructing end-to-end use cases that demonstrate the Semantic Web's utility for integrating scientific data. We describe ELVIS (the Ecosystem Location Visualization and Information System), a suite of tools for constructing food webs for a given location, and Triple Shop, a SPARQL query interface which searches the Semantic Web for data relevant to a given query. ELVIS functionality is exposed as a collection of web services, and all input and output data is expressed in OWL, thereby enabling its integration with Triple Shop and other Semantic Web resources.
Introduction to Graph Databases - a different way to see data, with endless possibilities!
Alex Barbosa Coqueiro, Head of Public Sector Solutions Architecture at AWS for Latin America, Canada & Caribbean, covers graph technology, terminology, how graphs can be applied to real-world business problems, and shares a few examples of graph data models using AWS Cloud services.
Event details: https://www.meetup.com/Serverless-Toronto/events/271595147/
Event recording: https://youtu.be/p96pppoCIGo
For more exciting learning opportunities, join our #ServerlessTO community: https://www.meetup.com/Serverless-Toronto/about/
NEW LAUNCH! How to build graph applications with SPARQL and Gremlin using Ama... by Amazon Web Services
In this session, we will demonstrate how you can easily start using graph databases to solve your business problems. We will demonstrate setting up a Neptune instance, loading a dataset, and using Gremlin and SPARQL via Java to build an application. We will also cover scaling, availability, and administrative aspects of the Neptune service.
Similar to Text Analytics Online Knowledge Base / Database (20)
We are pleased to share with you the latest VCOSA statistical report on the cotton and yarn industry for the month of March 2024.
Starting from January 2024, the full weekly and monthly reports will only be available for free to VCOSA members. To access the complete weekly report with figures, charts, and detailed analysis of the cotton fiber market in the past week, interested parties are kindly requested to contact VCOSA to subscribe to the newsletter.
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr... by Marlon Dumas
This webinar discusses the limitations of traditional approaches to business process simulation based on hand-crafted models with restrictive assumptions. It shows how process mining techniques can be assembled to discover high-fidelity digital twins of end-to-end processes from event data.
Build applications with generative AI on Google Cloud by Márton Kodok
We will explore Vertex AI Model Garden powered experiences and learn more about the integration of these generative AI APIs. We are going to see in action what the Gemini family of generative models offers developers for building and deploying AI-driven applications. Vertex AI includes a suite of foundation models, referred to as the PaLM and Gemini families of generative AI models, which come in different versions. We are going to cover how to use them via API to: execute prompts in text and chat; cover multimodal use cases with image prompts; fine-tune and distill to improve knowledge domains; and run function calls with foundation models to optimize them for specific tasks. At the end of the session, developers will understand how to innovate with generative AI and develop apps using current generative AI industry trends.
Open Source Contributions to Postgres: The Basics POSETTE 2024 by ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
Text Analytics Online Knowledge Base / Database
Contents
FreeBase API
WordReference API
DBPedia
Yahoo APIs
YAGO
TrueKnowledge API
Comparison of DBPedia and FreeBase
Comparison of DBPedia and YAGO
Conclusion
FreeBase API

It is the API provided by Google.
Freebase contains, at the time of writing, more than 20 million topics, more than 3,000 types, and more than 30,000 properties. This is not a small database by any measure. If you were to think of it in terms of relational databases, it is probably the database with the largest number of relational tables (3,000+ types) and the largest number of table columns (30,000+ properties).

Furthermore, Freebase is designed to store the amorphous kind of data that you find in everyday life. To store data about the prolific Bob Dylan, who composed songs, sang and performed, wrote books, and acted in movies, which relational table should we use? The "song composer" table, or the "singer" table, or the "book author" table, or the "film actor" table? The answer is that we need to store data about that same person in all those different tables. This complexity is not limited to prolific people; a building could start out as a church, be turned into a hospital during a war, and later become a tourist destination. The apple is a fruit, but also an ingredient in numerous recipes, the logo of a company, and a literary device in the story of Snow White.

Those millions of topics are very intricately connected. A certain politician might have run a campaign funded by a pharmaceutical company, whose board consists of some people who used to study at some particular Ivy League schools. Topics in different domains (politics, business, education, etc.) are linked together, spanning virtually any combination of tables. Real life is intricately interconnected, and so is Freebase data.

Considering the sheer size and the data modeling complexity of Freebase, we can proudly say: this isn't your father's kind of database. It's a whole new kind of database, one that was specifically designed to play well as a citizen of the web.

Freebase is not only a web site that people can use directly with their browsers; it is also a collection of web services that your own web applications can use to achieve things that wouldn't be possible without additional data, and a hosting platform where you can develop and run your web applications securely on Freebase's own server infrastructure.
Ways to use Freebase:
- Use Freebase's IDs to uniquely identify entities anywhere on the web
- Query Freebase's data using MQL
- Build applications using our API or Acre, our hosted development platform
ABOUT:

Freebase extracts structured data from Wikipedia and makes RDF available. Freebase (the open global structured knowledge base) is a high-profile public instantiation of the Metaweb technology. We use the Metaweb Query Language (MQL) for programmatic queries.
For example, suppose we want to find an object in the database whose type is "/music/artist" and whose name is "The Police", and then return its set of albums. Our query would be:

https://api.freebase.com/api/service/mqlread?query={%22query%22:{%22type%22:%22/music/artist%22,%22name%22:%22The%20Police%22,%22album%22:[]}}

and the result we will get is:
{
"code": "/api/status/ok",
"result": {
"album": [
"Outlandosd'Amour",
"Reggatta de Blanc",
"Zenyattu00e0 Mondatta",
"Ghost in the Machine",
"Synchronicity",
"Every Breath You Take: The Singles",
"Greatest Hits",
"Message in a Box: The Complete Recordings",
"Live!",
"Every Breath You Take: The Classics",
"Their Greatest Hits",
"Can't Stand Losing You",
"Roxanne '97 (Puff Daddy remix)",
"Roxanne '97",
"The Police",
"Greatest Hits",
"The Very Best of Sting & The Police",
"Brimstone and Treacle",
"Can't Stand Losing You",
"De Do DoDo, De Da DaDa",
"Certifiable: Live in Buenos Aires",
"Roxanne",
"2007-09-16: Geneva",
"Live in Boston",
"The 50 Greatest Songs",
5. Text Analytics Online Knowledge Base / Database
Page | 4
"King of Pain",
"Invisible Sun",
"Message in a Bottle",
"Spirits in the Material World",
"Don't Stand So Close to Me '86",
"The Police Live!",
"Synchronocity",
"The Very Best of Sting & The Police",
"When the World Is Running Down (You Can't Go Wrong)"
],
"name": "The Police",
"type": "/music/artist"
},
"status": "200 OK",
"transaction_id": "cache;cache02.p01.sjc1:8101;2012-12-26T10:15:37Z;0031"
}
To make queries, you need a thorough knowledge of the Metaweb Query Language (MQL) architecture and its notation.
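As a minimal sketch (not official client code), the same mqlread call can be issued programmatically. It assumes the api.freebase.com endpoint from the example above is still reachable and uses only the query envelope shown there:

import json
import urllib.parse
import urllib.request

# Build the same MQL envelope as the browser example above:
# find the /music/artist named "The Police" and ask for its albums.
mql = {"query": {"type": "/music/artist",
                 "name": "The Police",
                 "album": []}}

url = ("https://api.freebase.com/api/service/mqlread?query="
       + urllib.parse.quote(json.dumps(mql)))

with urllib.request.urlopen(url) as resp:
    envelope = json.load(resp)

# Per the sample response, the albums come back as a plain list of
# names under "result".
for album in envelope["result"]["album"]:
    print(album)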
You can also write the query in an easier way, by accessing the API through googleapis.com. The following explains in detail the concept of writing a query to be issued from the browser. These are the parameters used when writing the query:
Parameters:

- Query (required, string): The text you want to match against Freebase entities.
- Callback (optional, string): JS method name for JSONP callbacks.
- Domain (optional, string, multiple): A comma-separated list of domain IDs. Search results must include these domains.
- Exact (optional, boolean, default false): Matches only the name and keys "exactly". No normalization of any kind is done at indexing and query time. The text is only broken up on space characters.
- Filter (optional, string, multiple): A filter s-expression.
- Format (optional, string): The keyword "classic" to return the same information the original search API would have.
- Encode (optional, boolean, default false): Whether or not to HTML-escape entities' names.
- Indent (optional, boolean, default false): Whether to indent the JSON.
- Limit (optional, integer ≥ 1, default 20): Return up to this number of results.
- mql_output (optional, string): An MQL query that extracts entity information.
- Prefixed (optional, boolean, default false): Whether or not to match by name prefix (used for autosuggest).
- Start (optional, integer ≥ 0, default 0): Allows paging through results.
- Type (optional, string, multiple): A comma-separated list of type IDs. Search results must include these types.
- Lang (optional, string, multiple): The language you are searching in. Can pass multiple languages.
Here, Query is the text you want to search for. Now comes how to write the query; for that we should know the query string.
Example: Here we have the query string Washington, then:

https://www.googleapis.com/freebase/v1/search?query=Washington&indent=true&limit=222&prefixed=true&lang=en
The output is in JSON format:
Washington Free Base.txt
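As a minimal sketch, the same search call can be made from code. The endpoint and parameter names are the ones documented above, and the values are just the ones from the example URL; the assumption that hits carry "name" and "id" fields is hedged with .get:

import json
import urllib.parse
import urllib.request

# Parameters follow the table above: query text, JSON indenting,
# result limit, prefix matching, and language.
params = urllib.parse.urlencode({
    "query": "Washington",
    "indent": "true",
    "limit": 222,
    "prefixed": "true",
    "lang": "en",
})

url = "https://www.googleapis.com/freebase/v1/search?" + params
with urllib.request.urlopen(url) as resp:
    results = json.load(resp)

for hit in results.get("result", []):
    print(hit.get("name"), hit.get("id"))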
We can also use filters; a number of filters are defined:
Filter: e.g. (any type:/people/person). For more details (filters can work magic), refer to http://wiki.freebase.com/wiki/Search_Cookbook

The filter parameter allows you to create more complex rules and constraints to apply to your query. The filter value is a simple language that supports the following symbols:

- the all, any, should and not operators
- the type, domain, name, alias, with and without operands
- the ( and ) parentheses for grouping and precedence
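For illustration only, a hypothetical search for Washington restricted to people could pass the cookbook example above as a filter, URL-encoding the space (the exact encoding a client must apply is an assumption, not taken from this document):

https://www.googleapis.com/freebase/v1/search?query=Washington&filter=(any%20type:/people/person)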
About the Size and Use of FreeBase:

The size of the FreeBase data source by category, at the latest count, is:
Category Size of Topics
MUSIC 11M
BOOKS 6M
People 2M
TV 1M
Location 1M
Film 877 K
Business 704K
Government 139 K
Here are some topics about FreeBase:
Topic: What is FreeBase?
Solution: Freebase is an open, Creative Commons licensed collection of structured data, and a platform for accessing and manipulating that data via the Freebase API.

Topic: Size of FreeBase
Solution: Freebase contains about 20 million topics (aka entities).

Topic: Is Freebase a wiki?
Solution: No, though it shares some similarities with open wiki projects: Freebase is a free source of information; Freebase is a collaborative project, and Freebase data may be edited by anyone; and most of the data in Freebase is openly licensed under Creative Commons. However, Freebase does not run on wiki software, but on a graph database that represents structured data. Most wikis arrange information primarily in the form of text-based articles, while Freebase houses information in a structured, machine-readable database format.

Topic: Is Freebase a Semantic Web project?
Solution: Yes, Freebase is part of the Semantic Web. We emit Linked Open Data (via RDF) for all our entities, and are involved in various SemWeb projects and communities.

Topic: Where does the information in Freebase come from?
Solution: Initially, Freebase was seeded by pulling in information from a large number of high-quality open data sources, such as Wikipedia, MusicBrainz, and others. The Freebase community, along with the internal Freebase team, continues to drive the growth of the graph, focusing on bulk, algorithmic data imports, data extraction from free text, ongoing synchronization of data feeds, and rigorous quality management.

Topic: What are the limits on use of the API?
Solution: You may use Freebase's API for almost any use, including commercial uses, up to a limit of 100,000 API calls per day. If you are interested in using the Freebase API beyond 100,000 API calls per day, please contact Metaweb.

Topic: What are the rules for using data in Freebase?
Solution: It depends on what type of content it is. Data is available for use under the Creative Commons Attribution Only (CC-BY) license. This means you are free to use it on your site, as long as you credit the Freebase community appropriately; the Freebase attribution policy has all the details. Many of the images in Freebase are also CC-BY, although some images are hosted under different license terms, like the GFDL (which is similar to CC-BY), public domain, or fair use, and you can use the Freebase API to filter your results by license type. Finally, long descriptions that have been pulled in from Wikipedia are licensed under the GFDL.

Topic: What is the relationship between Freebase and Metaweb?
Solution: Metaweb is the commercial entity that sponsored and developed the Freebase platform. Metaweb was acquired by Google in July 2010.

Topic: Will the licensing of information in Freebase ever change?
Solution: No. The data in Freebase has already been licensed under CC-BY, which means it will always be available under that license; adding a new license would not impact the current corpus of data. Furthermore, all of the data in Freebase is available for download, and people are allowed to store it locally.
References:

http://wiki.freebase.com/wiki/FAQ#Is_Freebase_a_wiki.3F
http://wiki.freebase.com/wiki/DBPedia
http://blog.dbpedia.org/2008/11/15/dbpedia-is-now-interlinked-with-freebase-links-to-opencyc-updated/
Word Reference API
The API comes in two varieties: a JSON format and a regular-HTML/web format.
The URL for the HTML API is

http://api.wordreference.com/{api_version}/{API_key}/{dictionary}/{term}

and for the JSON API

http://api.wordreference.com/{api_version}/{API_key}/json/{dictionary}/{term}

where {term} is the term being searched for, {dictionary} is the dictionary you want to search, and {api_version} is the desired version of the API. If {api_version} is omitted, the API will redirect to the latest version automatically. Version upgrades will be posted here; the current version is 0.8.
For translation purposes we use the following:
Examples:
api.wordreference.com/0.8/1/enfr/grin
api.wordreference.com/0.8/1/json/enfr/grin
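A minimal sketch of calling the JSON variety of the API follows. It assumes the placeholder key "1" from the examples above; a real application would use its own API key:

import json
import urllib.request

API_KEY = "1"  # placeholder key from the examples above; use your own key
term = "grin"

# English-to-French dictionary lookup via the JSON API, version 0.8.
url = f"http://api.wordreference.com/0.8/{API_KEY}/json/enfr/{term}"
with urllib.request.urlopen(url) as resp:
    entry = json.load(resp)

print(json.dumps(entry, indent=2))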
For our testing we will use "/thesaurus/" in place of {dictionary}. This API is useful for translating a word from any supported language into English and vice versa, but for word meanings its data source is very limited. We can use it like this:

http://api.wordreference.com/3cd08/json/thesaurus/washington

This will give a result for Washington, but its data source is limited.
So ,
DBPedia API
DBpedia is a project aiming to extract structured content from the information created as part of the Wikipedia project. This structured information is then made available on the World Wide Web. DBpedia allows users to query relationships and properties associated with Wikipedia resources, including links to other related datasets. DBpedia has been described by Tim Berners-Lee as one of the more famous parts of the Linked Data project.
Developer(s): University of Leipzig, Freie Universität Berlin, OpenLink Software
Initial release: 23 January 2007
Stable release: DBpedia 3.8 / 6 August 2012
Written in: Scala, Java, VSP
Operating system: Virtuoso Universal Server
Type: Semantic Web, Linked Data
License: GNU General Public License
Website: dbpedia.org
As of September 2011, the DBpedia dataset describes more than 3.64 million things, out of which 1.83 million are classified in a consistent ontology, including 416,000 persons, 526,000 places, 106,000 music albums, 60,000 films, 17,500 video games, 169,000 organizations, 183,000 species and 5,400 diseases. The DBpedia dataset features labels and abstracts for these 3.64 million things in up to 97 different languages; 2,724,000 links to images and 6,300,000 links to external web pages; 6,200,000 external links into other RDF datasets; 740,000 Wikipedia categories; and 2,900,000 YAGO2 categories. From this dataset, information spread across multiple pages can be extracted; for example, book authorship can be put together from pages about the work or the author.
The DBpedia project uses the Resource Description Framework (RDF) to represent the extracted information. As of September 2011, the DBpedia dataset consists of over 1 billion pieces of information (RDF triples), out of which 385 million were extracted from the English edition of Wikipedia and 665 million were extracted from other language editions.
One of the challenges in extracting information from Wikipedia is that the same concepts can be expressed using different properties in templates, such as birthplace and placeofbirth. Because of this, queries about where people were born would have to search for both of these properties in order to get more complete results. As a result, the DBpedia Mapping Language has been developed to help map these properties to an ontology while reducing the number of synonyms. Due to the large diversity of infoboxes and properties in use on Wikipedia, the process of developing and improving these mappings has been opened to public contributions.

DBpedia extracts factual information from Wikipedia pages, allowing users to find answers to questions where the information is spread across many different Wikipedia articles. Data is accessed using an SQL-like query language for RDF called SPARQL.
About SPARQL

SPARQL (pronounced "sparkle", a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language, that is, a query language for databases, able to retrieve and manipulate data stored in Resource Description Framework format. It was made a standard by the RDF Data Access Working Group (DAWG) of the World Wide Web Consortium, and is considered one of the key technologies of the Semantic Web. On 15 January 2008, SPARQL 1.0 became an official W3C Recommendation.

SPARQL allows a query to consist of triple patterns, conjunctions, disjunctions, and optional patterns. SPARQL allows users to write unambiguous queries.
Query forms

The SPARQL language specifies four different query variations for different purposes.

SELECT query: Used to extract raw values from a SPARQL endpoint; the results are returned in a table format.
CONSTRUCT query: Used to extract information from the SPARQL endpoint and transform the results into valid RDF.
ASK query: Used to provide a simple true/false result for a query on a SPARQL endpoint.
DESCRIBE query: Used to extract an RDF graph from the SPARQL endpoint, the contents of which are left to the endpoint to decide based on what the maintainer deems useful information.

Each of these query forms takes a WHERE block to restrict the query, although in the case of the DESCRIBE query the WHERE is optional.
A simple example: “Write a query to find the capitals of all the countries in Asia.”

PREFIX abc: <http://example.com/exampleOntology#>
SELECT ?capital ?country
WHERE {
  ?x abc:cityname ?capital ;
     abc:isCapitalOf ?y .
  ?y abc:countryname ?country ;
     abc:isInContinent abc:Asia .
}
Note: SPARUL, or SPARQL/Update, is an extension to the SPARQL query language that provides
the ability to add, update, and delete RDF data held within a triple store.
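As an aside on mechanics: queries like the one above can be sent to DBpedia's public SPARQL endpoint over plain HTTP. The following Python sketch assumes the public endpoint at http://dbpedia.org/sparql is reachable; since the abc: namespace above is only an illustrative ontology, the sketch substitutes a real DBpedia query (the capital of India via dbo:capital).

# A minimal sketch, standard library only: send a SELECT query to the
# public DBpedia SPARQL endpoint and print the bound values.
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?capital WHERE { dbr:India dbo:capital ?capital . }
"""

params = urllib.parse.urlencode({
    "query": QUERY,
    "format": "application/sparql-results+json",  # ask for JSON results
})
with urllib.request.urlopen(f"{ENDPOINT}?{params}") as resp:
    results = json.load(resp)

for row in results["results"]["bindings"]:
    print(row["capital"]["value"])  # a DBpedia resource URI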
The DBpedia knowledge base is served as Linked Data on the Web. As DBpedia defines Linked Data URIs for millions of concepts, various data providers have started to set RDF links from their data sets to DBpedia, making DBpedia one of the central interlinking hubs of the emerging Web of Data.
Querying DBpedia:
If we access the Lookup service of the DBpedia API, we can use it as follows (it returns an XML page).
The DBpedia Lookup Service can be used to look up DBpedia URIs by related keywords. Related
means that either the label of a resource matches, or an anchor text that was frequently used
in Wikipedia to refer to a specific resource matches (for example the resource
http://dbpedia.org/resource/Washington can be looked up by the string “Washington”).
The results are ranked by the number of inlinks pointing from other Wikipedia pages to a result page.
Two APIs are offered: Keyword Search and Prefix Search. The URL has the form
http://lookup.dbpedia.org/api/search.asmx/<API>?<parameters>
Keyword Search
The Keyword Search API can be used to find related DBpedia resources for a given string.
The string may consist of a single or multiple words.
Example: Places that have the related keyword “berlin”
http://lookup.dbpedia.org/api/search.asmx/KeywordSearch?QueryClass=place&QueryString=berlin
Prefix Search (i.e. Autocomplete)
The Prefix Search API can be used to implement autocomplete input boxes. For a given partial
keyword like berl the API returns URIs of related DBpedia resources like
http://dbpedia.org/resource/Berlin.
Example: Top five resources for which a keyword starts with “berl”
http://lookup.dbpedia.org/api/search.asmx/PrefixSearch?QueryClass=&MaxHits=5&QueryString=berl
For example, searching for "Washington" with the Keyword Search API (here with QueryClass left empty), we give the query:
http://lookup.dbpedia.org/api/search.asmx/KeywordSearch?QueryClass=&QueryString=Washington&MaxHits=30
The result is an XML document (captured here in WashingtonSearch.txt).
The three parameters are:
• QueryString: a string for which a DBpedia URI should be found.
• QueryClass: a DBpedia class from the ontology that the results should have (for owl#Thing and untyped resources, leave this parameter empty).
  CAUTION: specifying any value that does not represent a DBpedia class will lead to no results (contrary to the previous behavior of the service).
• MaxHits: the maximum number of returned results (default: 5).
Note: the service is not able to find "Francisco D'Souza" when "Francisco" is given as the QueryString, even with MaxHits set to 1000.
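In code, the two Lookup calls above reduce to simple HTTP GETs. A minimal Python sketch, standard library only, which assumes the lookup.dbpedia.org endpoints shown above are reachable and simply returns the raw XML response:

import urllib.parse
import urllib.request

BASE = "http://lookup.dbpedia.org/api/search.asmx"

def lookup(api, query_string, query_class="", max_hits=5):
    """Call the Lookup service; api is 'KeywordSearch' or 'PrefixSearch'."""
    params = urllib.parse.urlencode({
        "QueryClass": query_class,
        "QueryString": query_string,
        "MaxHits": max_hits,
    })
    with urllib.request.urlopen(f"{BASE}/{api}?{params}") as resp:
        return resp.read().decode("utf-8")  # XML document as a string

print(lookup("KeywordSearch", "berlin", query_class="place"))
print(lookup("PrefixSearch", "berl"))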
Every DBpedia resource is described by a label, a short and long English abstract, a link to the
corresponding Wikipedia page, and a link to an image depicting the thing (if available).
If a thing exists in multiple language versions of Wikipedia, then short and long abstracts within
these languages and links to the different language Wikipedia pages are added to the
description. The DBpedia data set contains the following numbers of abstracts per language
(July 2012):
Language Number of Abstracts
English 3,770,000
German 1,244,000
French 1,197,000
Dutch 993,000
Italian 882,000
Spanish 879,000
Polish 848,000
Japanese 781,000
Portuguese 699,000
Swedish 457,000
Chinese 445,000
LICENSE:
DBpedia is derived from Wikipedia and is distributed under the same licensing terms
as Wikipedia itself. As Wikipedia has moved to dual-licensing, we also dual-license DBpedia
starting with release 3.4.
DBpedia data from version 3.4 on is licensed under the terms of the Creative Commons
Attribution-ShareAlike 3.0 license and the GNU Free Documentation License. All DBpedia
releases up to and including release 3.3 are licensed under the terms of the GNU Free
Documentation License only.
Yahoo API
Yahoo does not have a proper API that can be used for the purpose of word disambiguation.
It has APIs such as:
Yahoo! Answers API
Content Analysis API
But these cannot be used for that specific purpose.
Content Analysis API:
The Content Analysis Web Service detects entities/concepts, categories, and relationships within unstructured content. It ranks those detected entities/concepts by their overall relevance, resolves them where possible into Wikipedia pages, and annotates tags with relevant metadata.
RATE LIMITS:
The Content Analysis service is limited to 5,000 queries per IP address per day and to noncommercial use.
Reference: http://developer.yahoo.com/search/content/V2/contentAnalysis.html
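The documentation linked above describes access to Content Analysis through YQL. A minimal sketch under that assumption (the YQL endpoint and the contentanalysis.analyze table name follow those docs and are not verified here):

import urllib.parse
import urllib.request

YQL = "http://query.yahooapis.com/v1/public/yql"
text = "Barack Obama visited Berlin last week."
yql_query = f'select * from contentanalysis.analyze where text="{text}"'
params = urllib.parse.urlencode({"q": yql_query, "format": "json"})

with urllib.request.urlopen(f"{YQL}?{params}") as resp:
    print(resp.read().decode("utf-8"))  # detected entities, ranked by relevance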
Yahoo! Answers API:
Yahoo! Answers is a place where people ask and answer questions on any topic. The Answers
API lets you tap into the collective knowledge of millions of Yahoo! users. Search for expert
advice on any topic, watch for new questions in the Answers categories of your choice, and
keep track of fresh content from your favorite Answers experts.
The Answers API offers the following methods:
questionSearch
Find questions that match your query.
getByCategory
List questions from one of our hundreds of categories, filtered by type. You'll need the
category name or ID, which you can get from questionSearch.
getQuestion
Found an interesting question? getQuestion lists all the details for every answer to the
question ID you specify, including the best answer, if it's been chosen. Get that question
ID from questionSearch or getByCategory.
getByUser
List questions from specific users on Yahoo! Answers. You'll need the user id, which you
can get from any of the other services listed above.
RATE LIMITS
Yahoo! Web Search web services are limited to 5,000 queries per IP per day per API. See the information on rate limiting and the Usage Policy to learn about acceptable uses and how to request additional queries.
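As an illustration of questionSearch above, a minimal sketch assuming the historical V1 REST endpoint at answers.yahooapis.com; YOUR_APP_ID is a placeholder for a registered Yahoo! application id:

import urllib.parse
import urllib.request

URL = "http://answers.yahooapis.com/AnswersService/V1/questionSearch"
params = urllib.parse.urlencode({
    "appid": "YOUR_APP_ID",   # placeholder: register an app id with Yahoo!
    "query": "semantic web",  # free-text search over questions
})
with urllib.request.urlopen(f"{URL}?{params}") as resp:
    print(resp.read().decode("utf-8"))  # XML list of matching questions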
YAGO (Yet Another Great Ontology)
YAGO2s is a huge semantic knowledge base, derived from:
• Wikipedia
• WordNet
• GeoNames
Currently, YAGO2s has knowledge of more than 10 million entities (like persons, organizations, cities, etc.) and contains more than 120 million facts about these entities.
YAGO is special in several ways:
1. The accuracy of YAGO has been manually evaluated, proving a confirmed accuracy of
95%. Every relation is annotated with its confidence value.
2. YAGO is an ontology that is anchored in time and space. YAGO attaches a temporal dimension and a spatial dimension to many of its facts and entities.
3. In addition to a taxonomy, YAGO has thematic domains such as "music" or "science"
from WordNet Domains.
YAGO2s is part of the YAGO-NAGA project at the Max Planck Institute for Informatics in
Saarbrücken/Germany. It is maintained jointly by the Databases and Information Systems
Group and the Ontologies Group.
The YAGO-NAGA project started in 2006 with the goal of building a conveniently searchable,
large-scale, highly accurate knowledge base of common facts in a machine-processible
representation.
They have already harvested knowledge about millions of entities and facts about their
relationships, from Wikipedia and WordNet with careful integration of these two sources. The
resulting knowledge base, coined YAGO, has very high precision and is freely available. The
facts are represented as RDF triples, and they have developed methods and prototype systems
for querying, ranking, and exploring knowledge. The search engine NAGA provides ranked
answers to queries based on statistical models.
where NAGA (Not Another Google Answer) is a new semantic search engine which provides ranked answers to queries based on statistical models.
What it contains:
• It contains all the entities and facts from GeoNames - (from a dump of August 2010).
• It also contains textual and structural data from Wikipedia.
• All links+anchor texts between the YAGO entities.
• All Wikipedia category names.
• The titles of references.
YAGO is particularly suited for disambiguation purposes, as it contains a large number of names
for entities. It also knows the gender of people.
YAGO is the resulting knowledge base; its facts are represented as RDF (Resource Description Framework) triples.
Why YAGO-NAGA:
• Three major research areas:
  – Semantic-Web-style knowledge repositories, such as SUMO, OpenCyc, and WordNet.
  – Large-scale information extraction.
  – Social tagging and Web 2.0 communities that constitute the Social Web; Wikipedia is another example of the Social Web paradigm.
• The challenge is how to extract the important facts from the Web and organize them
into an explicit knowledge base that captures entities and semantic relationships among
them.
How YAGO-NAGA Works?
• YAGO adopts concepts from the standardized SPARQL Protocol and RDF Query
Language for RDF data but extends them through more expressive pattern matching
and ranking.
• The prototype system that implements these features is NAGA.
Growing the Knowledge Base:
YAGO Knowledge Base:
• Combines knowledge from WordNet & Wikipedia.
• Additional gazetteers (geonames.org).
We can explore YAGO through:
• Browsing the YAGO knowledge base:
  https://d5gate.ag5.mpi-sb.mpg.de/webyagospotlx/Browser
• Asking queries on YAGO using SPOTLX patterns, and viewing the results on a map and timeline:
  https://d5gate.ag5.mpi-sb.mpg.de/webyagospotlx/WebInterface
There are more than 13 sub-projects of YAGO-NAGA. One of them is AIDA, a method, implemented in an online tool, for disambiguating mentions of named entities that occur in natural-language text or Web tables:
https://d5gate.ag5.mpi-sb.mpg.de/webaida/
To use these, you should have knowledge of ontology and RDF principles.
Some FAQs about YAGO, in brief:

Q: What is YAGO?
A: YAGO is an ontology, i.e., a database with knowledge about the real world. YAGO contains both entities (such as movies, people, cities, countries, etc.) and facts about these entities (who played in which movie, which city is located in which country, etc.). All in all, YAGO contains 10 million entities and 120 million facts.
Q: What is so special about YAGO?
A: YAGO is special in several ways:
• The accuracy of YAGO has been manually evaluated, proving a confirmed accuracy of 95%. Every relation is annotated with its confidence value.
• YAGO is an ontology that is anchored in time and space: it attaches a temporal dimension and a spatial dimension to many of its facts and entities.
• In addition to a taxonomy, YAGO has thematic domains such as "music" or "science".
Q: What is new in YAGO2s?
A: While preserving the quality and accuracy of its predecessor YAGO2, YAGO2s improves over it in several ways:
• YAGO2s is stored natively in Turtle, making it completely RDF/OWL compliant while still maintaining the fact identifiers that are unique to YAGO.
• The new YAGO2s architecture enables cooperation among several contributors and facilitates debugging and maintenance. The data is divided into themes, so that users can download only particular pieces of YAGO ("YAGO à la carte").
• YAGO2s contains thematic domains such as "music" or "science", which gives a topic structure to YAGO.
Q: How is the taxonomy of YAGO structured?
A: YAGO classifies each entity into a taxonomy of classes. Every entity is an instance of one or multiple classes. Every class (except the root class) is a subclass of one or multiple classes. This yields a hierarchy of classes: the taxonomy. The YAGO taxonomy is the backbone of the ontology, and is designed with much care and attention to correctness. For those interested in the details, the taxonomy consists of four layers:
• The root node of the taxonomy is rdfs:Resource. It includes entities, but also properties, literals, etc. rdfs:Resource has a subclass owl:Thing, which is the class of things (entities).
• Under owl:Thing, there is the class taxonomy from WordNet. Each class name is of the form <wordnet_XXX_YYY>, where XXX is the name of the concept (e.g., singer), and YYY is the WordNet 3.0 synset id of the concept (e.g., 110599806). For example, the class of singers is <wordnet_singer_110599806>. Each class is connected to its more general class by the rdfs:subClassOf relationship.
• The middle layer of the taxonomy consists of classes that have been derived from Wikipedia categories. For example, one class is <wikicategory_American_rock_singers>, derived from the Wikipedia category "American rock singers". Each of these classes is connected to one class of the WordNet layer by an rdfs:subClassOf relationship; in the example, <wikicategory_American_rock_singers> rdfs:subClassOf <wordnet_singer_110599806>. Not all Wikipedia categories become classes in YAGO.
• The lowest layer of the taxonomy is the layer of instances. Instances comprise individual entities such as rivers, people, or movies; for example, this layer contains <Elvis_Presley>. Each instance is connected to one or multiple classes of the higher layers by the relationship rdf:type. In the example: <Elvis_Presley> rdf:type <wikicategory_American_rock_singers>.
This way, you can walk from an instance up to its class by rdf:type, and then further up by rdfs:subClassOf.
Q: Does YAGO have thematic domains?
A: YAGO provides a class hierarchy in the sense of RDF: every subclass represents a set of instances that is a subset of the set of instances of the super class. For example, Elvis Presley is in the class of singers (because Elvis is a singer). This class is a subclass of the class of persons, because every singer is a person. This is different from a thematic domain hierarchy! A thematic domain hierarchy contains items such as "Football", "Sports", "Music", etc.; in such a hierarchy, Elvis is in the domain "Music". The new YAGO2s contains a theme with WordNet Domains, which gives such a thematic domain structure to YAGO.
Q: What is the data format of YAGO2s?
A: Turtle. The YAGO knowledge base is a set of independent, modular full-text files. These files are in the N3/Turtle format, ending in *.ttl. See http://www.w3.org/TeamSubmission/turtle/ for details on this format.
N4: YAGO extends the Turtle format to the "N4 format". In this format, every triple can have an identifier, the fact identifier. The fact identifier is specified as a comment in the line before the triple; as a result, all N4 files are fully backwards compatible with standard Turtle and N3. The fact identifier can appear as a subject in other triples, and this is used to annotate YAGO facts with time and space.
Identifiers: all identifiers in YAGO are standard Turtle identifiers. A number of prefixes are predefined, such as rdf, rdfs, owl, etc. The base is set to the namespace of YAGO, http://yago-knowledge.org/resource/. YAGO also defines its own datatypes, which extend the standard datatypes. Examples of identifiers:
• Entities are written in <>: <Elvis_Presley>
• Strings are written in double quotes, with optional language tags: "Elvis", "Elvis"@en
• Literals are written in double quotes with a datatype: "1977-08-16"^^xsd:date, "70"^^<m> (<m> is the YAGO literal datatype "meter", which is a subclass of "quantity")
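Since the FAQ only states that the fact identifier precedes its triple as a comment, here is a minimal Python parsing sketch; the "#@" comment marker and the file name are assumptions, shown purely to illustrate the pairing of identifiers with triples:

def read_n4_facts(path):
    """Pair each fact identifier with the triple line that follows it."""
    facts = []          # list of (fact_id, triple_line) pairs
    pending_id = None   # identifier waiting for its triple
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith("#@"):          # assumed id-comment marker
                pending_id = line[2:].strip()  # e.g. "<id_42>"
            elif not line.startswith("#"):     # an actual triple line
                facts.append((pending_id, line))
                pending_id = None
    return facts

for fact_id, triple in read_n4_facts("yagoFacts.ttl")[:5]:  # assumed file name
    print(fact_id, triple)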
Q: How do labels work in YAGO?
A: In line with RDF, YAGO distinguishes between the entity (Elvis_Presley) and names for that entity ("Elvis", "The King", "Mr. Presley", etc.). The reason for this distinction is that one entity can have multiple names, and one name can mean multiple entities; consider, e.g., the name "The King", which is highly ambiguous. YAGO links an entity to its name by the relationship rdfs:label. For example, YAGO contains the fact <Elvis_Presley> rdfs:label "Elvis". In addition, YAGO knows, for each entity, its preferred name, designated by the relationship skos:prefLabel; for example, <Elvis_Presley> skos:prefLabel "Elvis Presley". Even if Elvis has multiple names, his standard name is "Elvis Presley". YAGO also contains, for each name, its preferred meaning, designated by <isPreferredMeaningOf>; in the example, <Elvis_Presley> <isPreferredMeaningOf> "Elvis". Even if the word "Elvis" can refer to multiple entities, its default meaning is Elvis Presley.
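A minimal sketch of reading these label relationships with rdflib (a Python RDF library); the theme file name is an assumption about what the YAGO download provides:

from rdflib import Graph, Namespace
from rdflib.namespace import RDFS, SKOS

YAGO = Namespace("http://yago-knowledge.org/resource/")

g = Graph()
g.parse("yagoLabels.ttl", format="turtle")  # assumed theme file name

elvis = YAGO["Elvis_Presley"]
for name in g.objects(elvis, RDFS.label):           # all known names
    print("label:", name)
for preferred in g.objects(elvis, SKOS.prefLabel):  # the standard name
    print("preferred:", preferred)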
Q: How do meta facts work?
A: YAGO gives a fact identifier to each fact. For example, the fact <Elvis_Presley> rdf:type <person> could have the fact identifier <id_42>. In the native N4/TTL version of YAGO, the fact identifiers are given in a comment line before the actual fact; in the TSV version, they are simply an additional column. YAGO contains facts about these fact identifiers. For example, YAGO contains:
<id_42> <occursSince> "1935-01-08"
<id_42> <occursUntil> "1977-08-16"
<id_42> <extractionSource> <http://en.wikipedia.org/wiki/Elvis_Presley>
These facts mean that Elvis was a person from the year 1935 to the year 1977, and that this fact was found in Wikipedia.
Q: What is the difference between YAGO and DBpedia?
A: DBpedia is a community effort to extract structured information from Wikipedia. In this sense, both YAGO and DBpedia share the same goal of generating a structured ontology. The projects differ in their foci: in YAGO, the focus is on precision, the taxonomic structure, and the spatial and temporal dimension. For a detailed comparison of the projects, see Chapter 10.3 of the AI journal paper "YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia".
Q: How can I access YAGO?
A: There are several ways to access YAGO:
1. Online, in person, on the Web Interface.
2. Online, through the SPARQL interface provided by OpenLink.
3. Offline, by downloading the TTL version of YAGO and loading it into an RDF triple store (e.g., Jena).
4. Offline, by downloading the TSV version of YAGO, loading it into a database with the script provided at the bottom of the download page, and using SQL.
YAGO is freely available at http://yago-knowledge.org.
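As a sketch of option 3, one might load the downloaded Turtle themes into rdflib (a Python triple store) instead of Jena and walk the taxonomy described earlier; the theme file names are assumptions:

from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

YAGO = Namespace("http://yago-knowledge.org/resource/")

g = Graph()
g.parse("yagoSimpleTypes.ttl", format="turtle")  # rdf:type facts (assumed name)
g.parse("yagoTaxonomy.ttl", format="turtle")     # rdfs:subClassOf facts (assumed name)

# Walk from an instance up to its classes, then one step further up.
entity = YAGO["Elvis_Presley"]
for cls in g.objects(entity, RDF.type):
    print("instance of:", cls)
    for super_cls in g.objects(cls, RDFS.subClassOf):
        print("  subclass of:", super_cls)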
References:
http://www.mpi-inf.mpg.de/yago-naga/
http://www.mpi-inf.mpg.de/yago-naga/yago/
http://www.mpi-inf.mpg.de/yago-naga/yago/downloads.html
http://www.mpi-inf.mpg.de/yago-naga/javatools/doc/index.html
https://d5gate.ag5.mpi-sb.mpg.de/webyagospotlx/Browser
http://www.mpi-inf.mpg.de/~mtb/pub/yago-qa.ppt
http://faculty.ist.unomaha.edu/ylierler/teaching/material/YAGO-NAGA.pptx
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.85.8206&rep=rep1&type=pdf
Paper: AI journal paper "YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia": http://www.mpi-inf.mpg.de/yago-naga/yago/publications/aij.pdf
Demo paper: http://www.mpi-inf.mpg.de/yago-naga/yago/publications/btw2013d.pdf (with the demo at https://d5gate.ag5.mpi-sb.mpg.de/webyagospo/FlightPlanner)
True Knowledge API
True Knowledge is a new class of Internet search technology aimed at improving the experience of finding known facts on the Web. The True Knowledge Answer Engine gives consumers instant answers to complex questions: request information on any topic and get back results in a processable form. Early areas of strength include geographic knowledge, local time, and geolocation. Natural language questions can also be processed; demos can be viewed at their website.
It is now called Evi.
In January 2012 True Knowledge launched a major new product, Evi (pronounced "eevee"), an artificial intelligence program which can be communicated with using natural language. The company changed its name from True Knowledge to Evi in June 2012.
The True Knowledge Answer Engine attempts to comprehend posed questions by
disambiguating from all possible meanings of the words in the question to find the most likely
meaning of the question being asked. It does this by drawing upon its database of knowledge of
discrete facts. As these facts are stored in a form that the computer can understand, the
answer engine attempts to produce an answer to what it comprehends to be the question by
logically deducing from them.[5]
For example, if one were to type in “What is the birth date of
George W. Bush?”, True Knowledge would reason from the facts “George W. Bush is a
president”, “George W. Bush is a human being”, “A president is a subclass of human being”,
“Date of creation is a more general form for birth date”, and “the 6th of July is the date of
creation for George W. Bush”, to produce the simple answer, “the 6th of July”. True Knowledge
differs from competitors like Freebase and DBpedia in that they offer natural language access.
Unlike the others however, users who post information to True Knowledge granted the
company a "non-exclusive, irrevocable, perpetual licence to use such information to operate
this website and for any other purposes.".
Evi gathers information for its database in two ways: importing it from "credible" external databases (which for them includes Wikipedia) and from user submission following a consistent format and detailed process for input. True Knowledge strives to monitor this user-submitted knowledge in multiple ways. One method involves a system of checks and balances, in some ways similar to Wikipedia's, allowing users to modify or "agree"/"disagree" with information presented by True Knowledge. The system itself also assesses submitted information, because the information is submitted as discrete facts that computers can understand; it is able to reject any facts that are semantically incompatible with other approved knowledge. On November 21, 2008, True Knowledge announced on its official blog that over 100,000 facts had been added by beta users, and as of August 18, 2010 the True Knowledge database contained 283,511,156 facts about 9,237,091 things.
Note :In November 2010, True Knowledge used some 300 million facts to calculate that Sunday
April 11, 1954 was the most boring day since 1900.
The True Knowledge API enables developers to utilize True Knowledge’s functionality in third
party applications.
True Knowledge provides the following API services: the Direct Answer API and the Query API. The Direct Answer API exposes the natural language question answering feature of True Knowledge, while the Query API allows users to bypass the natural language translation system and directly query the knowledge base using a simple query language.
API services consist of HTTP requests and XML responses.
With a free API account, users of the API services must credit True Knowledge and place a prominent link back to http://www.trueknowledge.com/. With the Direct Answer service, the question URL returned in the tk_question_url tag should be used.
You can see the details about its use at the following link:
http://images.trueknowledge.com/blog/wp-content/uploads/2011/02/tk_api_docs.pdf
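The request format lives in the PDF above; as a purely illustrative sketch (the endpoint path and parameter names here are placeholders, not taken from the docs):

import urllib.parse
import urllib.request

API = "https://api.trueknowledge.com/direct_answer"  # hypothetical path
params = urllib.parse.urlencode({
    "question": "What is the birth date of George W. Bush?",
    "api_account_id": "YOUR_ACCOUNT_ID",  # placeholder credentials
    "api_password": "YOUR_PASSWORD",      # placeholder credentials
})
with urllib.request.urlopen(f"{API}?{params}") as resp:
    print(resp.read().decode("utf-8"))  # XML response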
References:
http://www.apihub.com/api/true-knowledge-api
http://en.wikipedia.org/wiki/Evi_%28software%29
http://www.evi.com/q/francisco
http://www.evi.com/
Comparison of Freebase and DBpedia
Freebase is an open-license database which provides data about millions of things from various domains. Freebase has recently released a Linked Data interface to its content (see the release note). As there is a big overlap between DBpedia and Freebase, 2.4 million RDF links have been added to DBpedia pointing at the corresponding things in Freebase. These links can be used to smush and fuse data about a thing from DBpedia and Freebase. For instance, you can use the Marbles Linked Data browser to view data about The Lord of the Rings from Freebase and DBpedia smushed together.
The RDF links to OpenCyc have also been updated, which allows you to use DBpedia instance data together with the conceptual knowledge of OpenCyc.
Major differences between Freebase and DBpedia:
• Freebase imports data from a wide variety of sources, not just Wikipedia; DBpedia focuses on Wikipedia data alone.
• Freebase is owned and funded by Google, an incorporated company; DBpedia is funded by grants/sponsorships from various organisations.
• Freebase is part of the Semantic Web: it emits Linked Open Data (via RDF) for all its entities and is involved in various SemWeb projects and communities, and it is now also connected with DBpedia. DBpedia likewise depends on RDF.
• Freebase is user-editable, and contributions can be made through a public interface; DBpedia requires that you edit Wikipedia for the change to appear in DBpedia.
Other important points:
• DBpedia stores its data as RDF triples in a 3rd-party triple store; Freebase stores its data as n-tuples in a proprietary tuple store.
• Both communities make their data available as RDF, and both provide complete data dumps.
• DBpedia schema mappings can be edited by the community; in Freebase, both schema and data can be edited by the community.
• DBpedia data is automatically generated from Wikipedia several times a year; Wikipedia data is automatically imported into Freebase every two weeks.
• DBpedia lets you query its data via a SPARQL endpoint; Freebase lets you query its data via its MQL API (see the sketch below).
• DBpedia has strong connections to the Semantic Web research community; Freebase has strong connections to the open data / startup community.
• DBpedia tools are predominantly developed by 3rd parties and the open-source community; Freebase tools are predominantly developed by Metaweb and the Freebase community.
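The MQL side of that contrast looks quite different from SPARQL: MQL is query-by-example in JSON. A minimal sketch, assuming the v1 mqlread endpoint; the /location/citytown type is illustrative:

import json
import urllib.parse
import urllib.request

# MQL template: fields set to None are the ones you want filled in.
mql = [{"id": None, "name": "Berlin", "type": "/location/citytown"}]
url = "https://www.googleapis.com/freebase/v1/mqlread?" + urllib.parse.urlencode(
    {"query": json.dumps(mql)}
)
with urllib.request.urlopen(url) as resp:
    print(json.load(resp)["result"])  # matching topics with their Freebase ids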
Comparison of YAGO and DBpedia
Closest to YAGO in spirit is the DBpedia project, which also extracts an ontological knowledge base from Wikipedia.
DBpedia vs. YAGO:
• Taxonomy: the DBpedia project has manually developed its own taxonomy, while YAGO re-uses WordNet and enriches it with the leaf categories from Wikipedia.
• Size of taxonomy: DBpedia's taxonomy has merely 272 classes, while YAGO2 contains about 350,000. YAGO's compatibility with WordNet allows easy linkage and integration with other resources such as Universal WordNet, which has been exploited for YAGO2.
• Relations: DBpedia outsourced the task of pattern definition to its community and uses a much larger number of more diverse extraction patterns, but ends up with redundancies and even inconsistencies; overall, DBpedia contains about 1,100 relations. For extracting relational facts from infoboxes, YAGO2 uses carefully handcrafted patterns and reconciles duplicate infobox attributes (such as birthdate and dateofbirth), mapping them to the same canonical relation; YAGO2 has about 100 relations.
The following key differences explain this big quantitative gap and put the comparison in the perspective of data quality:
• Many relations in DBpedia are very specialized. As an example, take aircraftHelicopterAttack, which links a military unit to a means of transportation. Half of DBpedia's relations have fewer than 500 facts.
• YAGO2's relations have more coarse-grained type signatures than DBpedia's. For example, DBpedia knows the relations Writer, Composer, and Singer, while YAGO2 expresses all of them by hasCreated. On the other hand, it is easy for YAGO2 to infer the exact relationship (Writer vs. Composer) from the types of the entities (Book vs. Song), so the same information is present.
• YAGO2 does not contain inverse relationships. A relationship between two entities is stored only once, in one direction. DBpedia, in contrast, has several relations that are the inverses of other relations (e.g., hasChild/hasParent). This increases the number of relation names without adding information.
• YAGO2 has a sophisticated time and space model, which represents time and space as facts about facts. DBpedia closely follows the infobox attributes in Wikipedia, which leads to relations such as populationAsOf, which contain the validity year for another fact. A similar observation holds for geospatial facts, with relations such as distanceToCardiff.
Overall, DBpedia and YAGO share the same goal and use many similar ideas. At the same time, both projects have also developed complementary techniques and foci. Therefore, the two projects generally inspire, enrich, and help each other. For example, while DBpedia uses YAGO's taxonomy (for its yago:type triples), YAGO relies on DBpedia as an entry point to the Web of Linked Data.
Conclusion
We discussed the APIs above; the following gives a brief report on each, covering license, size, features, and ease of use.
FreeBase API
• License: Creative Commons Attribution Only (CC-BY) license; limit of 100,000 API calls per day.
• Size: 20 million topics, more than 3,000 types, and more than 30,000 properties.
• Features: Freebase extracts structured data from Wikipedia and makes RDF available; Freebase is part of the Semantic Web.
• Ease of use: quite easy to use, as explained in detail above, just by hitting the URL in a browser.

DBpedia
• License: Creative Commons Attribution-ShareAlike 3.0 license and the GNU Free Documentation License.
• Size (abstracts per language): English 3,770,000; German 1,244,000; French 1,197,000; Dutch 993,000; Italian 882,000; Spanish 879,000; Polish 848,000; Japanese 781,000; Portuguese 699,000; Swedish 457,000; Chinese 445,000.
• Features: the DBpedia project uses the Resource Description Framework (RDF) to represent the extracted information. DBpedia extracts factual information from Wikipedia pages, allowing users to find answers to questions where the information is spread across many different Wikipedia articles. Data is accessed using an SQL-like query language for RDF called SPARQL.
• Ease of use: to use this you should have an idea of SPARQL; if you are querying by URL, you have to use SPARQL for complex queries, but there is also the simpler option of passing a few parameters in the URL query and hitting it in the browser.
(An entry here is truncated in the source; its surviving cells read:)
• License: "... to the API per hour by default."
• Size: the dictionary keeps on growing.
• Features: "... sentences. It has recently released a "/thesaurus/" for its use of a dictionary of English."
True Knowledge API
• License: access to portions of the service is provided through an API, enabling people to build applications on top of the platform. The basic service is free, and paid upgrades are offered for additional features and services. Free API accounts have a daily limit (currently 2,000 "tokens") for each person or organisation. To discuss an upgrade, contact partners@evi.com.
• Size: 283,511,156 facts about 9,237,091 things.
• Features: True Knowledge provides two API services, the Direct Answer API and the Query API. The Direct Answer API exposes the natural language question answering feature of True Knowledge, while the Query API allows users to bypass the natural language translation system and directly query the knowledge base using a simple query language. API services consist of HTTP requests and XML responses.
• Ease of use: usage details can be seen at http://images.trueknowledge.com/blog/wp-content/uploads/2011/02/tk_api_docs.pdf
About Yahoo APIs:
We discussed the two APIs which Yahoo provides, i.e.:
Yahoo! Answers API
Content Analysis API
These APIs are for answering questions and analysing content, but they do not have any specific feature that provides word disambiguation.
There are many other data sources which can be used for specific purposes, such as:
KnowItAll
Omega
WolframAlpha
Cyc