Anna Divoli (Pingar Research): Extracting and Mapping SharePoint Content to Create a Custom Taxonomy
Pingar presentation at ShareFEST in Philadelphia (Apr 2013).

Published in: Health & Medicine, Business
Transcript

  • 1. Extracting and Mapping SharePoint Content to Create a Custom Taxonomy
    Anna Divoli, Pingar Research, @annadivoli
  • 2. Why?
    Why automatic generation? Dynamic, fast, cheap, consistent, RDF / flexible…
    Why from a document collection? Focused/specific, optimal for those documents…
    Why taxonomies? Organize knowledge, domain representation, enable automatic tasks…
    Why in SharePoint? All you need is there! Can be used straight away!
  • 3. Talk Overview
    The Team
    The Process
    Evaluation
    Use Cases
      – Withdrawn drug
      – Cancer treatments
      – Re-purposed drug
    Summary
  • 4. Taxonomy Generation Research Team
    Olena Medelyan, Steve Manion, Jeen Broekstra, Anna Divoli, Anna Lan Huang and Ian Witten
    "Constructing a Focused Taxonomy from a Document Collection", ESWC 2013, Montpellier, France
  • 5. Taxonomy Generation Process
    Input: documents stored somewhere
    Analysis: using a variety of tools* and datasets, extract concepts, entities and relations
    Grouping & output: a taxonomy is created that groups the resulting taxonomy terms hierarchically
    Result: Custom Taxonomy
  • 6. How Taxonomy Generation works
  • 7. In 5 Steps!
    Inputs: input docs; preferred top-level terms
    Stores: Document Database (Solr); Concepts & Relations Database (Sesame)
    Steps: (1) import & convert to text; (2) extract concepts; (3) annotate with Linked Data; (4) disambiguate clashing concepts; (5) consolidate taxonomy
    Output: Focused SKOS Taxonomy
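Slide 7 ends the pipeline with a Focused SKOS Taxonomy. The sketch below is a minimal illustration of what such output can look like, not F-STEP's own serializer: it writes {child: {parents}} links as skos:Concept resources with skos:prefLabel and skos:broader in Turtle, using plain string formatting instead of the Sesame store named on the slide. The namespace http://example.org/taxonomy/ and the label-to-IRI scheme are hypothetical.

```python
from urllib.parse import quote

SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/taxonomy/"  # hypothetical namespace for generated concepts


def concept_uri(label: str) -> str:
    """Mint an IRI for a concept from its label (hypothetical naming scheme)."""
    return "<" + BASE + quote(label.replace(" ", "_")) + ">"


def turtle_literal(text: str) -> str:
    """Escape a label as a Turtle string literal tagged as English."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@en'


def to_skos_turtle(broader: dict[str, set[str]]) -> str:
    """Serialize {child label: {parent labels}} as SKOS concepts in Turtle."""
    labels = set(broader) | {p for parents in broader.values() for p in parents}
    lines = [f"@prefix skos: <{SKOS}> .", ""]
    for label in sorted(labels):
        lines.append(f"{concept_uri(label)} a skos:Concept ;")
        for parent in sorted(broader.get(label, set())):
            lines.append(f"    skos:broader {concept_uri(parent)} ;")
        lines.append(f"    skos:prefLabel {turtle_literal(label)} .")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_skos_turtle({"Capital gains tax": {"Tax"}, "Tax": {"Finance"}}))
```

A production exporter would more likely use an RDF library and add alternative labels and links to external resources; plain strings keep the sketch dependency-free.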
  • 8. Step 1. Document input & conversion
    Input documents → (1) convert to text → Document Database
    Current input:
    - directory path, read recursively
    Other possible inputs:
    - docs in a database or a DMS
    - emails + attachments (Exchange)
    - website URL
    - RSS feed
    An external tool converts the different file formats to text, and a database stores the document content.
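A minimal sketch of Step 1 for the "directory path, read recursively" input, assuming only plain-text files: the slide says an external tool converts other formats and a database (Solr, per slide 7) stores the content, and neither is reproduced here. The function name import_collection and the in-memory dict standing in for the document database are illustrative assumptions; keys such as mycollection/document456.txt mimic the document names on the following slides.

```python
from pathlib import Path


def import_collection(root: str, suffixes: tuple[str, ...] = (".txt",)) -> dict[str, str]:
    """Read every matching file under `root`, recursing into subdirectories.

    Returns {path relative to root's parent: text}, so a file
    mycollection/document456.txt keeps its collection prefix. Only plain-text
    files are handled; other formats would need a converter first (not shown).
    """
    root_path = Path(root)
    documents: dict[str, str] = {}
    for path in sorted(root_path.rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            key = path.relative_to(root_path.parent).as_posix()
            documents[key] = path.read_text(encoding="utf-8", errors="replace")
    return documents
```

Keeping the collection name in the key lets several collections share one store, which the case studies later rely on when they compare collections.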
  • 9. Step 2. Extracting concepts
    Documents Database → (2) extract concepts → Concepts Database
    Solr query: http://localhost/solr/select?q=path:mycollection
    Document: document456.txt
    Pingar API:
      Taxonomy terms: Climate and Weather; Leaders; Agreements
      People: Yvo de Boer; Maite Nkoana-Mashabane
      Organizations: Associated Press; South African Council of Churches
      Locations: South Africa
    Wikify:
      Wikipedia terms: South Africa; Yvo de Boer; U.N.; Climate agreements; Associated Press
    Specific terminology: green policies; climate diplomacy
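The Pingar API and Wikify are services, not code shown in the talk, so this is only a stand-in that illustrates the shape of Step 2's output (entity type mapped to terms). It matches names from a small hypothetical gazetteer built from the slide's example, using whole-word regular expressions so that "South Africa" is not reported inside "South African Council of Churches". A real extractor finds names it has never seen and ranks Wikipedia topics; this sketch does neither.

```python
import re

# Hypothetical gazetteer taken from the slide's example; a real system would
# call an entity extractor and a Wikipedia-based term linker instead.
GAZETTEER = {
    "People": ["Yvo de Boer", "Maite Nkoana-Mashabane"],
    "Organizations": ["Associated Press", "South African Council of Churches"],
    "Locations": ["South Africa"],
}


def extract_entities(text: str, gazetteer: dict[str, list[str]] = GAZETTEER) -> dict[str, list[str]]:
    """Return {entity type: names found in text}, matching whole words only."""
    found: dict[str, list[str]] = {}
    for entity_type, names in gazetteer.items():
        hits = [
            name for name in names
            # (?<!\w) and (?!\w) stop "South Africa" matching inside "South African"
            if re.search(r"(?<!\w)" + re.escape(name) + r"(?!\w)", text)
        ]
        if hits:
            found[entity_type] = hits
    return found
```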
  • 10. Step 3. Annotation with meaning
    Concepts Database → (3) annotate with Linked Data → Annotations Database
    mycollection/document456.txt
    Pingar API:
      People: Yvo de Boer; Maite Nkoana-Mashabane
      Organizations: Associated Press; South African Council of Churches
      Locations: South Africa
    Later, this additional information will help create e-Discovery & semantic search solutions.
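Step 3 links extracted entities to Linked Data resources. In this minimal sketch each name is naively mapped to an English Wikipedia URL, which only works when the name equals the article title, and each link is written as an N-Triples line. The document namespace http://example.org/doc/ and the predicate http://example.org/mentions are hypothetical placeholders, not vocabulary used by F-STEP.

```python
from urllib.parse import quote

DOC_BASE = "http://example.org/doc/"      # hypothetical document namespace
MENTIONS = "http://example.org/mentions"  # hypothetical predicate


def wikipedia_uri(title: str) -> str:
    """Naively map a name to an English Wikipedia URL (assumes name == title)."""
    return "https://en.wikipedia.org/wiki/" + quote(title.replace(" ", "_"))


def annotate(doc_path: str, entities: dict[str, list[str]]) -> list[str]:
    """Emit one N-Triples line linking the document to each extracted entity."""
    subject = f"<{DOC_BASE}{quote(doc_path)}>"
    triples = []
    for names in entities.values():
        for name in names:
            triples.append(f"{subject} <{MENTIONS}> <{wikipedia_uri(name)}> .")
    return triples
```

Ambiguous names (the next slide's "Apple") are exactly where this naive mapping breaks, which is why a disambiguation step follows.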
  • 11. Step 4. Discarding irrelevant meanings
    Concepts Database → (4) disambiguate clashing concepts → Final Concepts Database
    Example 1: "Over the past three years, Apple has acquired three mapping companies"
      Candidates: wikipedia.org/wiki/Apple_Corps; freebase.com/view/en/apple_inc
      Two concepts were extracted that are dissimilar: discard the incorrect one.
    Example 2: "For millions of years, the oceans have been filled with sounds from natural sources."
      Candidates: wikipedia.org/wiki/Ocean; www.fao.org/aos/agrovoc#c_4607 (Agrovoc term: Marine areas)
      Two concepts were extracted that are similar: accept both as correct.
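The slide gives the rule (dissimilar clashing concepts: keep only the right one; similar ones: keep both) but not the similarity measure. This sketch assumes bag-of-words cosine similarity over short concept descriptions with an arbitrary threshold of 0.3: if every pair of candidates is at least that similar, all are kept; otherwise only the candidate closest to the sentence's context survives. The candidate descriptions in the test are invented for illustration, and F-STEP's actual measure may differ.

```python
import math
import re
from collections import Counter
from itertools import combinations

STOPWORDS = {"a", "an", "the", "is", "of", "by", "to", "and", "that", "has", "over", "for", "in", "with"}


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts with a tiny stop-word list removed."""
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def disambiguate(context: str, candidates: dict[str, str], threshold: float = 0.3) -> list[str]:
    """Keep all candidates if they are mutually similar, else only the best fit.

    `candidates` maps a concept URI to a short description of that concept.
    """
    vectors = {uri: bag_of_words(desc) for uri, desc in candidates.items()}
    if all(cosine(vectors[u], vectors[v]) >= threshold for u, v in combinations(vectors, 2)):
        return sorted(vectors)  # similar meanings: accept them all
    ctx = bag_of_words(context)
    # dissimilar meanings: keep the one closest to the context, discard the rest
    return [max(vectors, key=lambda uri: cosine(ctx, vectors[uri]))]
```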
  • 12. Step 5. Group taxonomy (a)
    Concepts & Relations Database → (5a) add relations → Focused SKOS Taxonomy
    Building the taxonomy bottom up.
    Example terms: felines, tiger, bird, horse family, zebra, donkey, pigeon, horse, lizard
    Categories: Category:Carnivorous animals, Category:Animals, animals
    Broader (lizard): Squamata/Reptiles/Tetrapods/Vertebrates/Chordates/Animals
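Step 5a builds the hierarchy bottom up from broader terms, as in the slide's lizard example. A minimal sketch, assuming each concept arrives with an ordered chain of broader terms (on the slide these appear to come from category data such as Category:Animals): merging the chains yields parent links, and shared ancestors become the common upper levels. The function names and the tiger chain in the test are hypothetical.

```python
def build_hierarchy(broader_chains: dict[str, list[str]]) -> dict[str, set[str]]:
    """Turn per-concept broader chains into {child: {parents}} links.

    Each chain lists ancestors from the nearest to the most general, e.g.
    lizard: Squamata, Reptiles, Tetrapods, Vertebrates, Chordates, Animals.
    Chains that share ancestors merge into one tree-like structure.
    """
    edges: dict[str, set[str]] = {}
    for concept, chain in broader_chains.items():
        path = [concept, *chain]
        for child, parent in zip(path, path[1:]):
            edges.setdefault(child, set()).add(parent)
    return edges


def top_terms(edges: dict[str, set[str]]) -> set[str]:
    """Parents that have no parent themselves (the taxonomy's top level)."""
    parents = {p for ps in edges.values() for p in ps}
    return parents - set(edges)
```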
  • 13. Step 5. Consolidating taxonomy (b)
    Concepts & Relations Database → (5b) prune relations → Focused SKOS Taxonomy
    Example terms from the slide figure:
      Films and film making; Film stars; Mila Kunis; Daniel Radcliffe; Sally Hawkins; Julianna Margulies
      Association football clubs; Former Football League clubs; Manchester United F.C.; Manchester United F.C.; Manchester City F.C.
      Finance; Economics and finance; Personal finance; Commercial finance; Tax; Capital gains tax; Tax; Capital gains tax
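The slide does not list F-STEP's pruning rules. One common rule, shown here purely as an assumption, is transitive reduction: a direct link such as Capital gains tax → Finance is dropped when the longer path Capital gains tax → Tax → Finance already implies it. The sketch assumes the broader-than graph has no cycles; the function names are hypothetical.

```python
def reaches(edges: dict[str, set[str]], start: str, target: str) -> bool:
    """True if `target` is an ancestor of `start` (iterative depth-first search)."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        for parent in edges.get(node, ()):
            if parent == target:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def prune_shortcuts(edges: dict[str, set[str]]) -> dict[str, set[str]]:
    """Drop child->parent links already implied by a longer path.

    This is transitive reduction and assumes an acyclic graph.
    """
    pruned: dict[str, set[str]] = {}
    for child, parents in edges.items():
        pruned[child] = {
            p for p in parents
            # redundant if another parent of this child already leads up to p
            if not any(reaches(edges, other, p) for other in parents if other != p)
        }
    return pruned
```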
  • 14. Evaluation
    Recall: 75% (compared with a manually generated taxonomy for the same domain)
    Precision: 89% for concepts, 90% for relations (evaluation by 15 human judges)
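The slide reports the figures but not the formulas. Assuming the standard definitions, with G the set of generated concepts and M the set of concepts in the manual taxonomy, the reported numbers read as follows.

```latex
\mathrm{Recall} = \frac{|G \cap M|}{|M|} = 0.75,
\qquad
\mathrm{Precision} = \frac{\text{number of items judged correct}}{\text{number of items judged}}
= 0.89\ (\text{concepts}),\quad 0.90\ (\text{relations})
```

Read this way, 75% recall means that three of every four concepts in the manual taxonomy were also produced automatically. How concepts were matched between the two taxonomies, and how the 15 judges' verdicts were combined, is not stated on the slide.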
  • 15. SharePoint Taxonomy Generation Process
    Analysis: using a variety of tools* and datasets, extract concepts, entities and relations
    Result: Custom Taxonomy
  • 16. Triazolam
    [A benzodiazepine drug used for short-term treatment of acute insomnia. Withdrawn in 1991 in the UK because of the risk of psychiatric adverse drug reactions. It continues to be available in the U.S.]
    Excerpt of the taxonomy generated from:
    - 131 PubMed abstracts of clinical trials on triazolam before 1991
    - 180 PubMed abstracts of clinical trials on triazolam since 1991
    Colors of terms:
    - proposed to group other terms
    - found in both document collections
    - in before-withdrawal docs
    - in since-withdrawal docs
    Taxonomy statistics:
      Concept count: 305
      Edges count: 437
      Intermediate count: 97
      Leaves count: 183
      Labels count: 353
      Nesting counts: 0: 25, 1: 51, 2: 124, 3: 160, 4: 176, 5: 153, 6: 54, 7: 4
      Average depth: 3.6
  • 17-19. [Figures: excerpts of the triazolam taxonomy. Term color legend: proposed to group other terms; in both document collections; in before-withdrawal docs; in since-withdrawal docs.]
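The statistics on slides 16, 20 and 26 are consistent with a three-way split of concepts into top-level terms, intermediate terms and leaves: 305 = 25 + 97 + 183, 308 = 23 + 90 + 195 and 587 = 34 + 188 + 365, where the first addend is the depth-0 nesting count. The sketch below computes such figures from {child: {parents}} links under assumed definitions and a hypothetical function name. Because the triazolam nesting counts add up to 747 against 305 concepts, it assumes nesting counts tally root-to-concept paths, so a concept with several parents is counted once per path; path enumeration can grow quickly on large polyhierarchies. Average depth is left out because its definition is not given.

```python
from collections import Counter


def taxonomy_stats(edges: dict[str, set[str]]) -> dict[str, object]:
    """Structural statistics for a {child: {parents}} taxonomy (assumed acyclic).

    Top-level terms have no parent, leaves have no children, and intermediate
    terms are neither. Nesting counts tally every root-to-concept path by its
    length, so a concept reachable by several paths is counted once per path.
    """
    concepts = set(edges) | {p for ps in edges.values() for p in ps}
    children: dict[str, set[str]] = {c: set() for c in concepts}
    for child, parents in edges.items():
        for parent in parents:
            children[parent].add(child)
    roots = {c for c in concepts if not edges.get(c)}
    leaves = {c for c in concepts if not children[c]} - roots
    intermediate = concepts - roots - leaves

    nesting: Counter = Counter()
    stack = [(r, 0) for r in roots]
    while stack:  # walk every path downward from each top-level term
        node, depth = stack.pop()
        nesting[depth] += 1
        stack.extend((child, depth + 1) for child in children[node])

    return {
        "concepts": len(concepts),
        "edges": sum(len(ps) for ps in edges.values()),
        "roots": len(roots),
        "intermediate": len(intermediate),
        "leaves": len(leaves),
        "nesting": dict(sorted(nesting.items())),
    }
```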
  • 20. Cancer Treatments
    Excerpt of the taxonomy generated from:
    - 200 PubMed abstracts on breast cancer treatments
    - 149 (all) PubMed abstracts on lung cancer treatments
    - 47 (all) PubMed abstracts on gastric cancer treatments
    Colors of terms:
    - proposed to group other terms
    - found in two or more document collections
    - in the breast treatment docs
    - in the stomach treatment docs
    - in the lung treatment docs
    Taxonomy statistics:
      Concept count: 308
      Edges count: 387
      Intermediate count: 90
      Leaves count: 195
      Labels count: 371
      Nesting counts: 0: 23, 1: 52, 2: 99, 3: 138, 4: 137, 5: 159, 6: 60, 7: 36, 8: 6
      Average depth: 3.88
  • 21-25. [Figures: excerpts of the cancer treatments taxonomy. Term color legend: proposed to group other terms; in two or more document collections; in the breast treatment docs; in the stomach treatment docs; in the lung treatment docs.]
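The color legends on slides 16, 20 and 26 suggest a simple rule for labeling each term by where it occurs. A minimal sketch under that reading, with a hypothetical function name and illustrative term sets: a term found in no collection was proposed by the system to group others, a term found in two or more collections gets the shared label, and otherwise it is labeled with its single collection.

```python
def color_category(term: str, occurrences: dict[str, set[str]]) -> str:
    """Assign the legend category for a term, as read off the case-study slides.

    `occurrences` maps each collection name to the set of terms found in it.
    """
    found_in = sorted(name for name, terms in occurrences.items() if term in terms)
    if not found_in:
        return "proposed to group other terms"  # added by the system, not extracted
    if len(found_in) >= 2:
        return "in two or more document collections"
    return f"in the {found_in[0]} docs"
```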
  • 26. Tamoxifen
    Tamoxifen is a drug commonly used to treat breast cancer, but with a subsequent indication for treating bipolar disorder.
    Excerpt of the taxonomy generated from:
    - papers discussing tamoxifen and bipolar disorder: 8 PubMed abstracts and 2 PDFs of full papers (17641532, 18316672)
    - papers discussing tamoxifen and breast cancer: 50 PubMed abstracts and 2 PDFs of full papers (21635709, 12618491)
    - papers discussing tamoxifen but mentioning neither breast cancer nor bipolar disorder: 50 PubMed abstracts and 2 PDFs of full papers (16275887, 19458291)
    Colors of terms:
    - proposed to group other concepts
    - in two or more document collections
    - in the bipolar document collection
    - in the breast cancer document collection
    - in the neither-cancer-nor-bipolar document collection
    Taxonomy statistics:
      Concept count: 587
      Edges count: 751
      Intermediate count: 188
      Leaves count: 365
      Labels count: 718
      Nesting counts: 0: 34, 1: 73, 2: 133, 3: 284, 4: 225, 5: 157, 6: 89, 7: 30, 8: 2
      Average depth: 3.66
  • 27-32. [Figures: excerpts of the tamoxifen taxonomy. Term color legend: proposed to group other concepts; in two or more document collections; in the bipolar document collection; in the breast cancer document collection; in the neither-cancer-nor-bipolar document collection.]
  • 33. Summary
    Entity extraction; Linked Data; disambiguation; consolidation; case studies
  • 34. More?
    bit.ly/f-step
    pingar.com, @PingarHQ
    anna.divoli@pingar.com, @annadivoli
    Focused SKOS Taxonomy Extraction Process (F-STEP) wiki
