Using Wikipedia to Determine
Knowledge Domains
Mitchell Burton
Project Aims
• Gather new information from Wikipedia
• Detect the knowledge domain of a document
• Use the domain to select descriptive concepts
• Explore the organisation of domains within
Wikipedia
wikipedia-miner.cms.waikato.ac.nz
Linking Phrases to Concepts
Concepts
• Represented by a single Wikipedia article
• Disambiguated by Wikipedia Miner
• Context provided by article features
• Computed likelihood score
Talk Pages
WikiProjects
• Most Talk pages on Wikipedia are tagged as being of
interest to one or more WikiProjects
• WikiProjects are organising structures for groups of
people to coordinate their activities
• WikiProjects focus on one topic
• WikiProjects are communities that define knowledge
domains
WikiProjects
• WikiProjects rate articles
• Importance
• Quality
Detecting the Domain of a
Document
• Use Wikipedia Miner to generate a list of Topics/Articles
• List all WikiProjects associated with those Articles
• Calculate a weight for each WikiProject
– Sum of the importances of the Articles to that WikiProject
– Top -> 1
– High -> 1/2
– Mid -> 1/4
– Low -> 1/8
• Return a ranked list of WikiProjects for that Document
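The weighting steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `rank_wikiprojects`, the triple-based input format, and the sample ratings are all hypothetical, and treating unrated articles as contributing zero is an assumption.

```python
from collections import defaultdict

# Importance weights as listed on the slide.
IMPORTANCE_WEIGHT = {"top": 1.0, "high": 0.5, "mid": 0.25, "low": 0.125}

def rank_wikiprojects(article_ratings):
    """Rank WikiProjects by the summed importance of a document's articles.

    article_ratings: iterable of (article, wikiproject, importance) triples.
    Returns a list of (wikiproject, weight) pairs, heaviest first.
    """
    weights = defaultdict(float)
    for _article, project, importance in article_ratings:
        # Assumption: unrated or unknown importances contribute nothing.
        weights[project] += IMPORTANCE_WEIGHT.get(importance.lower(), 0.0)
    # Sort by descending weight, then by name so ties are deterministic.
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical ratings for the articles found in one document.
ratings = [
    ("Babe Ruth", "Baseball", "Top"),
    ("Fenway Park", "Baseball", "High"),
    ("Fenway Park", "Boston", "Mid"),
    ("Curveball", "Baseball", "Mid"),
]
ranked = rank_wikiprojects(ratings)
```

With these made-up ratings, Baseball accumulates 1 + 1/2 + 1/4 and Boston 1/4, so Baseball is ranked first.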
Detecting the Domain of a
Document
{{WikiProject Baseball |braves=yes
|class=start |importance=mid
|phillies=yes |padres=yes
|padres-importance=mid}}
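A banner like the one above could be read with a few lines of string handling. This is only a sketch: `parse_banner` is a hypothetical helper, it assumes a single flat template call with no nested templates or links containing `|`, and a real pipeline would more likely use a proper wikitext parser.

```python
def parse_banner(wikitext):
    """Parse one flat {{WikiProject ...}} banner into (project, params)."""
    inner = wikitext.strip()
    if not (inner.startswith("{{") and inner.endswith("}}")):
        raise ValueError("not a template call")
    # Split the template body on pipes; the first part is the template name.
    parts = [p.strip() for p in inner[2:-2].split("|")]
    name = parts[0]
    params = {}
    for part in parts[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    # Drop the "WikiProject" prefix to get the project name, if present.
    if name.startswith("WikiProject"):
        project = name[len("WikiProject"):].strip()
    else:
        project = name
    return project, params

banner = """{{WikiProject Baseball |braves=yes
|class=start |importance=mid
|phillies=yes |padres=yes
|padres-importance=mid}}"""
project, params = parse_banner(banner)
```

Here `project` is the WikiProject name and `params["importance"]` holds the importance rating that feeds the weighting.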
Detecting the Domain of a
Document
{{WPBannerMeta |PROJECT = Baseball |category={{{category|}}} |
listas={{{listas|}}} |IMAGE_LEFT = Mitlogo.svg
|IMAGE_LEFT_SMALL = 30px |IMAGE_LEFT_LARGE = 50px
|QUALITY_SCALE = subpage
|class={{{class|}}}
|auto={{{auto|}}}
|importance={{{importance|}}}
|ASSESSMENT_LINK = Wikipedia:WikiProject Baseball/Assessment
|MAIN_ARTICLE = baseball
Semantic Relatedness
• Measures how close two concepts are
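The slides do not say which relatedness measure was used. One published option associated with Wikipedia Miner is the link-based measure of Milne and Witten (2008), which compares the sets of articles that link to each concept. The sketch below assumes that measure; the function name and the toy link sets are hypothetical.

```python
import math

def link_relatedness(links_a, links_b, total_articles):
    """Link-based relatedness in [0, 1] between two concepts.

    links_a, links_b: ids of the articles that link to each concept.
    total_articles: number of articles in the whole collection.
    """
    a, b = set(links_a), set(links_b)
    common = a & b
    if not common:
        # No shared incoming links: treat the concepts as unrelated.
        return 0.0
    smaller, larger = min(len(a), len(b)), max(len(a), len(b))
    if total_articles <= smaller:
        raise ValueError("total_articles must exceed the size of each link set")
    # Normalised distance: 0 when the link sets coincide, larger as they diverge.
    distance = (math.log(larger) - math.log(len(common))) / (
        math.log(total_articles) - math.log(smaller)
    )
    # Relatedness is one minus the distance, clipped at zero.
    return max(0.0, 1.0 - distance)

# Toy link sets in a collection assumed to hold 100 articles.
r = link_relatedness({1, 2, 3, 4}, {3, 4, 5, 6}, total_articles=100)
```

Identical link sets score 1, disjoint sets score 0, and partial overlap falls in between.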
Document Classification
• Tested on the 20 Newsgroups dataset
– Ken Lang
• (1995) Newsweeder: Learning to filter netnews
(probably…)
– Jason Rennie
• qwone.com/~jason/20Newsgroups/
• About 1000 messages from each of 20 Usenet
Newsgroups (18,828 documents after duplicates are removed)
Document Classification
• alt.atheism -> WikiProject Atheism
• comp.graphics -> WikiProject Computer graphics
• comp.os.ms-windows.misc -> WikiProject Microsoft
• comp.sys.ibm.pc.hardware -> WikiProject Computing
• comp.sys.mac.hardware -> WikiProject Apple Inc.
• comp.windows.x -> WikiProject Linux
• rec.autos -> WikiProject Automobiles
• rec.motorcycles -> WikiProject Motorcycling
• rec.sport.baseball -> WikiProject Baseball
• rec.sport.hockey -> WikiProject Ice Hockey
• sci.crypt -> WikiProject Cryptography
• sci.electronics -> WikiProject Electronics
• sci.med -> WikiProject Medicine
• sci.space -> WikiProject Spaceflight
• soc.religion.christian -> WikiProject Christianity
• talk.politics.guns -> WikiProject Firearms
• talk.politics.mideast -> WikiProject Western Asia
• talk.politics.misc -> WikiProject Politics
• talk.religion.misc -> WikiProject Religion
Document Classification
• One WikiProject assigned to each
Newsgroup
• Documents processed and Newsgroup
predicted using Main Article Semantic
Relatedness
• Random sample of 1700 documents
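One way to read "predicted using Main Article Semantic Relatedness" is: score each candidate newsgroup by how related the document's topics are, on average, to the main article of that newsgroup's WikiProject, then pick the best. The sketch below assumes that reading; `predict_newsgroup`, the averaging step, and the toy relatedness values are hypothetical and not taken from the slides.

```python
def predict_newsgroup(doc_topics, main_articles, relatedness):
    """Pick the newsgroup whose WikiProject main article best matches a document.

    doc_topics: article titles detected in the document.
    main_articles: dict mapping newsgroup -> main article of its WikiProject.
    relatedness: callable (title, title) -> float in [0, 1].
    Returns None when the document has no detected topics.
    """
    if not doc_topics:
        return None

    def score(newsgroup):
        main = main_articles[newsgroup]
        # Assumption: average relatedness over all detected topics.
        return sum(relatedness(t, main) for t in doc_topics) / len(doc_topics)

    # Iterate in sorted order so ties resolve deterministically.
    return max(sorted(main_articles), key=score)

# Made-up relatedness values, for illustration only.
TOY = {
    ("Pitcher", "Baseball"): 0.9,
    ("Pitcher", "Ice hockey"): 0.2,
    ("Home run", "Baseball"): 0.8,
    ("Home run", "Ice hockey"): 0.1,
}

def toy_relatedness(a, b):
    return TOY.get((a, b), 0.0)

main_articles = {
    "rec.sport.baseball": "Baseball",
    "rec.sport.hockey": "Ice hockey",
}
predicted = predict_newsgroup(["Pitcher", "Home run"], main_articles, toy_relatedness)
```

In practice the `relatedness` argument would be whatever semantic relatedness measure the topic detector provides.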
Percent Correctly Classified
[Bar chart: percent correctly classified (0% to 100%) for each newsgroup: comp.graphics, comp.os.ms-windows.misc, comp.sys.ibm.pc.hardware, comp.sys.mac.hardware, comp.windows.x, sci.crypt, sci.electronics, sci.med, sci.space, rec.autos, rec.motorcycles, rec.sport.baseball, rec.sport.hockey, talk.religion.misc, alt.atheism, soc.religion.christian, talk.politics.mideast, talk.politics.guns, talk.politics.misc]
Learning about Wikipedia
• Maps of knowledge produced using inter-
journal citations as links
– Boyack, K. W., Klavans, R., & Börner, K. (2005). Mapping the backbone of
science. Scientometrics, 64(3), 351-374.
– Focus on science or academia
• Some work on mapping Wikipedia
– Holloway, T., Bozicevic, M., & Börner, K. (2007). Analyzing and visualizing the
semantic coverage of Wikipedia and its authors. Complexity, 12(3), 30-40.
– Unclear what we can draw from this
Boyack, Klavans, & Börner. (2005).
Holloway, Bozicevic, Börner. (2007).
Mapping WikiProjects
• Iterate over all Wikipedia Articles
• Gather all WikiProject information
• Links between WikiProjects formed by
common Articles
• Small WikiProjects folded into larger ones
– Smallest WikiProject folded into its largest link
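The mapping steps above can be sketched as below. This is an illustration under stated assumptions: "largest link" is read as the neighbouring WikiProject that shares the most articles, the `min_size` threshold is hypothetical (the slides give no cut-off), and the sample data is made up.

```python
from collections import Counter
from itertools import combinations

def build_links(article_projects):
    """Count articles per WikiProject and shared articles per WikiProject pair."""
    sizes, links = Counter(), Counter()
    for projects in article_projects.values():
        sizes.update(projects)
        # Every pair of projects tagging the same article gains one link.
        for pair in combinations(sorted(projects), 2):
            links[pair] += 1
    return sizes, links

def fold_smallest(article_projects, min_size):
    """Fold projects with fewer than min_size articles into their strongest neighbour."""
    tagged = {a: set(p) for a, p in article_projects.items()}
    while True:
        sizes, links = build_links(tagged)
        # Candidate projects, smallest first (name breaks ties).
        small = sorted((n, p) for p, n in sizes.items() if n < min_size)
        target = None
        for _n, project in small:
            neighbours = {
                (x if y == project else y): count
                for (x, y), count in links.items()
                if project in (x, y)
            }
            if neighbours:
                target = max(sorted(neighbours), key=neighbours.get)
                break
        if target is None:
            # Nothing left to fold, or the remaining small projects are isolated.
            return tagged
        for projects in tagged.values():
            if project in projects:
                projects.discard(project)
                projects.add(target)

# Hypothetical article -> WikiProjects tagging.
articles = {
    "Babe Ruth": {"Baseball", "Biography"},
    "Fenway Park": {"Baseball", "Boston"},
    "Ted Williams": {"Baseball", "Biography"},
    "Boston Common": {"Boston"},
    "Knuckleball": {"Baseball", "Pitching"},
}
folded = fold_smallest(articles, min_size=2)
sizes, links = build_links(folded)
```

In this toy data the one-article Pitching project is folded into Baseball, its only link, and the remaining link counts give the edges of the map.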