Commonsense Knowledge in Wikidata
Filip Ilievski - Pedro Szekely - Daniel Schwabe
submitted to the Wikidata workshop @ ISWC’20
1.1 billion edges
84 million nodes
(May 2020)
‘sister’ of Wikipedia
Q: pictures of animals whose name is grammatically feminine
in German but grammatically masculine in French
Common sense
the basic ability to perceive, understand, and judge things that
are shared by nearly all people and can be reasonably
expected of nearly all people without need for debate
Research questions
Q1: Does Wikidata contain relevant commonsense knowledge?
Q2: If so, is this complementary to other commonsense knowledge sources?
Principles of Commonsense Knowledge
P1: Concepts, not entities
Houses have rooms
Versailles Palace has 700 rooms
P2: Commonness
Container used for storage
Noma subclass of aphthous stomatitis
P3: General-domain knowledge
wheel is part of a car
cholesterol has component cell membrane
Step 1 (P1: Concepts, not entities): keep nodes whose labels use only lowercase alphanumeric characters
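The P1 filter (keep nodes with lowercase alphanumeric characters) could be sketched as below. This is a minimal sketch, not the authors' code: the regular expression, the choice to allow single spaces between words, and the function names are assumptions, and the triples are toy data modeled on the slide's examples.

```python
import re

# Assumption: a "concept" label contains only lowercase letters and digits,
# with single spaces allowed between words.
CONCEPT_LABEL = re.compile(r"[a-z0-9]+(?: [a-z0-9]+)*")


def is_concept_label(label: str) -> bool:
    """Return True if the whole label matches the lowercase alphanumeric pattern."""
    return CONCEPT_LABEL.fullmatch(label) is not None


def keep_concept_edges(edges):
    """Keep (subject, relation, object) triples whose subject and object both look like concepts."""
    return [
        (s, r, o)
        for s, r, o in edges
        if is_concept_label(s) and is_concept_label(o)
    ]


# Toy triples for illustration only.
edges = [
    ("house", "has", "room"),
    ("Versailles Palace", "has", "700 rooms"),
]
concept_edges = keep_concept_edges(edges)
```

Here the capitalized label "Versailles Palace" fails the check, so only the concept-level triple about houses survives.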
Step 2 (P2: Commonness): frequent words ~ common concepts
Usage stats on a large (independent!) corpus
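The P2 idea (frequent words approximate common concepts) could be sketched as a frequency threshold. The slides do not name the corpus or the cutoff, so the counts, the `min_count` parameter, and the rule that every word of both endpoints must be frequent are all assumptions made for illustration.

```python
def keep_common_edges(edges, word_counts, min_count):
    """Keep triples whose subject and object labels consist of frequent words only."""

    def is_common(label):
        words = label.split()
        # Assumption: a multi-word label is common only if every word is common.
        return bool(words) and all(word_counts.get(w, 0) >= min_count for w in words)

    return [(s, r, o) for s, r, o in edges if is_common(s) and is_common(o)]


# Toy corpus counts, invented for illustration only.
word_counts = {
    "container": 5200,
    "storage": 8100,
    "noma": 3,
    "aphthous": 2,
    "stomatitis": 4,
}
edges = [
    ("container", "used for", "storage"),
    ("noma", "subclass of", "aphthous stomatitis"),
]
common_edges = keep_common_edges(edges, word_counts, min_count=100)
```

With these toy counts the medical triple is dropped because its words are rare, while the container triple is kept.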
After steps 1 & 2:
414 relations
421k edges
Step 3 (P3: General-domain knowledge): take the top 50 relations (97.4% of all edges)
Annotate: domain-specific?
Annotate: map to ConceptNet relations
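Selecting the most frequent relations and checking how many edges they cover could look like the sketch below. The function name and the toy triples are assumptions; the deck's figures (top 50 relations, 97.4% of edges) come from the real data and are not reproduced here.

```python
from collections import Counter


def top_relations(edges, k):
    """Return the k most frequent relations and the fraction of edges they cover."""
    counts = Counter(relation for _, relation, _ in edges)
    top = counts.most_common(k)
    covered = sum(n for _, n in top)
    coverage = covered / len(edges) if edges else 0.0
    return [relation for relation, _ in top], coverage


# Toy triples for illustration only.
edges = [
    ("wheel", "part of", "car"),
    ("room", "part of", "house"),
    ("container", "used for", "storage"),
    ("noma", "subclass of", "aphthous stomatitis"),
]
relations, coverage = top_relations(edges, k=1)
```

In this toy example the single most frequent relation, "part of", covers half of the edges; the selected relations would then be annotated by hand.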
Domain-specific relations
cell component
strand orientation
molecular function
biological process
decays to
property constraint
Mapping general-domain relations to ConceptNet
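The two annotation outcomes (drop domain-specific relations, map the rest to ConceptNet) could be applied as in the sketch below. The domain-specific set is taken from the slide; the three mapping pairs are illustrative assumptions, not the authors' annotated mapping, although `/r/IsA`, `/r/PartOf` and `/r/UsedFor` are existing ConceptNet relations.

```python
# Domain-specific relations listed on the slide; edges using them are dropped.
DOMAIN_SPECIFIC = {
    "cell component",
    "strand orientation",
    "molecular function",
    "biological process",
    "decays to",
    "property constraint",
}

# Illustrative mapping only (assumption): relation label -> ConceptNet relation.
WD_TO_CONCEPTNET = {
    "subclass of": "/r/IsA",
    "part of": "/r/PartOf",
    "used for": "/r/UsedFor",
}


def to_conceptnet(edges):
    """Drop domain-specific edges and rewrite mapped relations to ConceptNet names."""
    mapped = []
    for s, r, o in edges:
        if r in DOMAIN_SPECIFIC:
            continue
        target = WD_TO_CONCEPTNET.get(r)
        if target is not None:  # relations without a mapping are skipped
            mapped.append((s, target, o))
    return mapped


# Toy triples for illustration only.
edges = [
    ("wheel", "part of", "car"),
    ("cholesterol", "cell component", "cell membrane"),
]
mapped_edges = to_conceptnet(edges)
```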
How much common sense is there in WD?
Has it been growing over time?
Is WD’s commonsense knowledge novel?
Discussion
1. Integrating Wikidata-CS with ConceptNet and other sources
2. Generalizing over instance-level knowledge
a. birthplace of people -> functional property
3. Missing knowledge types
a. typical/expected quantities (chairs have 4 legs, spiders have 8)
b. agent goals (compete in order to win)
c. symbolism (red - danger)
Conclusions
Common concepts & general relations allow us to distill Wikidata-CS
Wikidata contains some commonsense knowledge (0.01%)
Very little overlap with existing commonsense KGs
Future work:
1. enrich common sense coverage of Wikidata
2. integrate commonsense knowledge across sources
Thanks!
