Coupled Semi-Supervised Learning for Information Extraction

Andrew Carlson, Justin Betteridge, Richard C. Wang, Estevam R. Hruschka Jr., Tom M. Mitchell, Coupled Semi-Supervised Learning for Information Extraction, in WSDM '10: Proceedings of the third ACM international conference on Web search and data mining

  1. 1. Minsu Ko, 29 April 2013. Coupled Semi-Supervised Learning for Information Extraction. Andrew Carlson, Justin Betteridge, Richard C. Wang, Estevam R. Hruschka Jr., Tom M. Mitchell. In Proceeding WSDM '10 Proceedings of the third ACM international conference on Web search and data mining.
  2. 2. INTRODUCTION
  3. 3. Key Idea: More constraints, much greater accuracy.
     Machine learning techniques used in this paper: Bootstrapping, Coupled Training, Semi-Supervised Learning.
     Consideration | Examples                   | Predicates
     Categories    | academic fields, athletes  | Unary relations
     Relations     | PlaysSport(athlete, sport) | Binary relations
     Much greater accuracy can be achieved by further constraining the learning task, by coupling the semi-supervised training of many extractors for different categories and relations.
  4. 4. Introduction: Read the Web. URL: http://rtw.ml.cmu.edu/
     Project goal: a system that runs 24x7 and continually
     • Extracts knowledge from web text
     • Improves its ability to do so with limited human effort
     Inputs:
     • initial ontology
     • handful of examples of each predicate in ontology
     • the web
     • occasional access to human trainer
  5. 5. MACHINE LEARNING TECHNIQUES - BOOTSTRAPPING
  6. 6. Bootstrapping: Bootstrapping refers to a group of metaphors which refer to a self-sustaining process that proceeds without external help.
     • Start with a small number of labeled seed examples.
     • Iteratively grow the set of labeled examples using high-confidence labels from the current model.
  7. 7. Bootstrapping
     PROS:
     • The existing sample data provides knowledge we can start from.
     • Best as a gradual process.
     • Overall, quality should be better.
     CONS:
     • Difference between sample and target data.
     • After many iterations, accuracy typically declines because errors in labeling accumulate.
  8. 8. MACHINE LEARNING TECHNIQUES - SEMI-SUPERVISED LEARNING
  9. 9. Motivation
     • Supervised learning: accurate, but expensive, requiring many labeled training examples.
     • Any alternatives? Semi-supervised learning! Small number of labeled examples.
  10. 10. Semi-Supervised Learning
      Semi-supervised methods in information extraction:
      • suffer from divergence
      • potential for advances in semi-supervised machine learning
      Extracted knowledge is useful for many applications:
      • Computational advertising: find the best match between a given user in a given context and a suitable advertisement
      • Search
      • Question answering
  11. 11. Semi-Supervised Learning: The Key to Accurate Semi-Supervised Learning
      [Figure: two views of the sentence "Krzyzewski coaches the Blue Devils." Left, labeled "hard (underconstrained) semi-supervised learning problem": a single function coach(NP). Right, labeled "much easier (more constrained) semi-supervised learning problem": noun phrases NP1 and NP2 with categories person, coach, athlete, team, sport and relations coachesTeam(c,t), playsForTeam(a,t), teamPlaysSport(t,s), playsSport(a,s).]
      Key idea: Couple the training of many functions to make unlabeled data more informative.
  12. 12. Semi-Supervised Bootstrap Learning: Bootstrapped Pattern Learning: Countries [Brin 98, Riloff and Jones 99]
      • Instances: Canada, Egypt, France, Germany, Iraq
      • Patterns: "countries except X", "X is the only country", "home country of X"
      • Instances: Pakistan, Sri Lanka, Argentina, Greece, Russia
      • Patterns: "GDP of X", "elected president of X", "X has a multi-party system"
      • .....
      It's underconstrained!
  13. 13. COUPLED TRAINING
  14. 14. Coupled Training
      Central idea to this work:
      • coupling the semi-supervised learning of multiple functions to constrain our learning problem
      • iteratively training classifiers in a self-supervised manner
      Coupling different extraction techniques:
      • Intuition: different extractors make independent errors
      • Strategy (Meta-Bootstrap Learner): only promote instances recommended by multiple techniques
  15. 15. Coupling Constraints
      The functions:
      • category extractor
      • relation extractor
      Extractors decide if a noun phrase or pair of noun phrases is an instance of some category or relation.
  16. 16. Coupling Constraints: Mutual Exclusion
      Mutually exclusive predicates cannot both be satisfied by the same input x.
      General type of Mutual Exclusion: output constraints.
      For two functions $f_a : X \to Y_a$ and $f_b : X \to Y_b$, if we know some constraint on values $y_a$ and $y_b$ for an input $x$, we can require $f_a$ and $f_b$ to satisfy this constraint.
  17. 17. Coupling Constraints: Relation Argument Type Checking
      E.g., the arguments of the CompanyIsInEconomicSector relation are declared to be of the categories Company and EconomicSector.
      General type of Type Checking: compositional constraints.
      For two functions $f_1 : X_1 \to Y_1$ and $f_2 : X_1 \times X_2 \to Y_2$, we may have a constraint on valid $y_1$ and $y_2$ pairs for a given $x_1$ and any $x_2$. We can require $f_1$ and $f_2$ to satisfy this constraint.
  18. 18. Coupling Constraints: Unstructured and Semi-structured Text Features
      Noun phrases on the web appear in two types of contexts: freeform textual contexts and semi-structured contexts.
      General type: multi-view-agreement constraints.
      For a function $f : X \to Y$, if $X$ can be partitioned into two "views" where we write $X = \langle X_1, X_2 \rangle$ and we assume that both $X_1$ and $X_2$ can predict $Y$, then we can learn $f_1 : X_1 \to Y$ and $f_2 : X_2 \to Y$ and constrain them to agree.
  19. 19. Weakness of Bootstrap Techniques: Semantic Drift (Curran 07)
      Without costly human intervention, the extracted terms drift from the meaning of the original seed terms.
      • Seeds: Canada, Egypt, France, Germany, Iraq, ...
      • Patterns: "war with X", "ambassador to X", "war in X", "occupation of X", "invasion of X"
      • Extracted: planet Earth, Freetown, North Africa
  20. 20. Coupling Constraints for Avoiding Semantic Drift: Mutual Exclusion
      [Figure: two panels labeled Positives and Negatives, each starting from the instances Canada, Egypt, France, Germany, Iraq, ...
      One panel shows the patterns "war with X", "ambassador to X", "war in X", "occupation of X", "invasion of X" and the extracted terms planet Earth, Freetown, North Africa.
      The other shows the patterns "nations like X", "countries other than X", "country like X", "nations such as X", "countries , like X" and the extracted terms Pakistan, Sri Lanka, Argentina, Greece, Russia.]
  21. 21. Coupling Constraints for Avoiding Semantic Drift: Type Checking
      Pattern: "X , which is based in Y"
      • Pillar, San Jose: OK
        Type checking arguments: "... companies such as Pillar ...", "... cities like San Jose ..."
      • inclined pillar, foundation plate: Not OK
  22. 22. Example of Coupling Constraints
      [Figure: solid lines: Mutual Exclusion; dashed lines: Type-checking]
  23. 23. Coupled Bootstrap Learner algorithm
  24. 24. ALGORITHMS
  25. 25. Algorithms
      • Investigate the feasibility of improving semi-supervised learning for information extraction with coupling.
      • Focus on extracting facts that are stated multiple times, which we can assess probabilistically using corpus statistics.
      • Do not resolve strings to real-world entities: no synonym resolution, no disambiguation of strings.
  26. 26. Algorithms
      Specific inputs to algorithms:
      • Large text corpus
      • Initial ontology with predefined categories
      • Relations
      • Mutual-exclusion relationships between same-arity predicates
      • Seed instances for all predicates
  27. 27. Coupled Pattern Learner (CPL)
      • CPL learns contextual patterns that are high-precision extractors for each predicate, e.g., "X and other software firms" and "X scored a goal for Y".
      • It uses them to grow a set of high-precision predicate instances.
      • Noun phrases that fill in the X and Y blanks of patterns in sentences in the text corpus are said to co-occur with those patterns.
  28. 28. Coupled Pattern Learner (CPL)
  29. 29. Coupled SEAL (CSEAL)
      Intuition: Coupling can improve SEAL (Set Expander for Any Language).
      CSEAL:
      • A set expansion algorithm
      • CSEAL filters out any document that extracts a candidate instance that is a member of a mutually exclusive predicate.
      • CSEAL only considers candidate relation instances if their arguments are candidate instances for the respective category types.
  30. 30. SEAL: Set Expander for Any Language. SEAL (Wang and Cohen, 2007)
  31. 31. Coupled SEAL (CSEAL)
  32. 32. Meta-Bootstrap Learner (MBL)
      Intuition: The errors made by different extraction techniques should be independent.
      • MBL: CSEAL, CPL
      • The subordinate algorithms do not promote instances on their own.
      • MBL promotes any instance that has been recommended by both techniques while obeying the mutual-exclusion and type-checking constraints specified in the ontology.
  33. 33. Meta-Bootstrap Learner (MBL)
      • Coupling between CPL and CSEAL learned functions
      • Multi-view constraint between each pair
  34. 34. Meta-Bootstrap Learner (MBL)
      In the ontology: categories, relations, seed instances and patterns, type information, mutual exclusion and subset relations.
      Extraction:
      • Arg1 HQ in Arg2 → (CBC || Toronto), (Adobe || San Jose), …
      • Micron || Boise → arg2 is headquarters for chipmaker arg1, arg1 of arg2, arg1 Corp headquarters in arg2, …
      Filtering:
      • CBC || Toronto → not enough evidence
      • arg1 of arg2 → too general
      • arg2 is headquarters for chipmaker arg1 → too specific
      Promote top ranked instances and patterns. Use type-checking.
  35. 35. EXPERIMENTAL EVALUATION
  36. 36. Experimental Evaluation
      • 76 predicates: 32 relations, 44 categories
      • Run different algorithms for 10 iterations:
        MBL: Meta-Bootstrap Learner (CPL + CSEAL)
        CSEAL: Coupled SEAL
        CPL: Coupled Pattern Learner
        SEAL: Uncoupled SEAL
        UPL: Uncoupled Pattern Learner
      • Evaluate correctness of instances with Mechanical Turk
  37. 37. Precision of Promoted Instances
  38. 38. Example Promoted Instances
      Instance                | Predicate
      solomon islands         | country
      stuffit                 | product
      marine industry         | economicSector
      soccer, player          | sportUsesEquipment
      unocal, oil             | companyEconomicSector
      final cut pro, software | productInstanceOf
  39. 39. Example Patterns
      Pattern                    | Predicate
      blockbuster trade for X    | athlete
      airlines, including X      | company
      personal feelings of X     | emotion
      X announced plans to buy Y | companyAcquiredCompany
      X learned to play Y        | athletePlaysSport
      X dominance in Y           | teamPlaysInLeague
  40. 40. Examples of Promoted Facts
      nissan:
        generalizations = {company}
        literalString = {Nissan, NISSAN, nissan}
        acquired = {toyota}
        acquiredBy = renault
        hasOfficeInCountry = {japan, usa, mexico}
        competesWith = {honda}
      ebay:
        generalizations = {company}
        literalString = {eBay, EBay, Ebay, ebay, EBAY, eBAY}
        acquired = {skype, stumbleupon}
        competesWith = {amazon, yahoo, google, microsoft}
        hasOfficeInCountry = {usa, united_kingdom}
  41. 41. Number of New Instances per Category
  42. 42. CONCLUSION
  43. 43. Contribution
      • The first to couple the simultaneous semi-supervised training of category and relation extractors.
      • The first to couple the training of freeform-text extractors and semi-structured web page wrapper inducers by assuming that they make independent errors.
      • Proposes large-scale coupled training as a strategy to significantly improve accuracy in semi-supervised learning, and identifies three distinct types of coupling.
  44. 44. Conclusion
      Coupling:
      • Improves free text pattern learning (CPL)
      • Improves semi-structured IE (CSEAL)
      • Improves separate techniques that make independent errors (MBL)
  45. 45. Thank you. Inquiries and technical support: contact@owl-nest.com
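
The bootstrapping loop of slide 6, the semantic drift of slide 19, and the mutual-exclusion coupling of slides 16 and 20 can be illustrated with a small sketch. This is not the authors' CPL implementation: the co-occurrence list, the function name `bootstrap`, and the simple "drop any pattern that also extracts a rival instance" rule are assumptions made for illustration, and the real algorithms rank candidates with corpus statistics rather than promoting everything.

```python
from itertools import combinations

# Toy (pattern, noun phrase) co-occurrences. This list is a hypothetical
# stand-in for statistics gathered from a large web corpus.
COOCCURRENCES = [
    ("countries other than X", "Canada"),
    ("countries other than X", "Pakistan"),
    ("nations such as X", "Egypt"),
    ("nations such as X", "Greece"),
    ("war in X", "Iraq"),
    ("war in X", "North Africa"),
    ("war in X", "Freetown"),
    ("cities like X", "Freetown"),
    ("cities like X", "San Jose"),
    ("regions such as X", "North Africa"),
]


def bootstrap(seeds, mutually_exclusive=frozenset(), iterations=3):
    """Grow instance sets from seeds.

    seeds: {category: set of seed noun phrases}
    mutually_exclusive: set of frozenset({cat_a, cat_b}) pairs
    """
    instances = {cat: set(s) for cat, s in seeds.items()}
    for _ in range(iterations):
        for cat in instances:
            # Instances of categories declared mutually exclusive with cat.
            rivals = set()
            for other, known in instances.items():
                if frozenset((cat, other)) in mutually_exclusive:
                    rivals |= known
            # Patterns that co-occur with any known instance of cat.
            patterns = {p for p, np in COOCCURRENCES if np in instances[cat]}
            # Coupling: discard patterns that also extract a rival instance.
            patterns = {
                p for p in patterns
                if not any(q == p and np in rivals for q, np in COOCCURRENCES)
            }
            # Promote every noun phrase the surviving patterns extract.
            candidates = {np for p, np in COOCCURRENCES if p in patterns}
            instances[cat] |= candidates - rivals
    return instances


# Uncoupled: a single category, no constraints, so "war in X" causes drift.
uncoupled = bootstrap({"country": {"Canada", "Egypt", "Iraq"}})

# Coupled: three categories trained together under mutual exclusion.
categories = ["country", "city", "region"]
mutex = {frozenset(pair) for pair in combinations(categories, 2)}
coupled = bootstrap(
    {"country": {"Canada", "Egypt", "Iraq"},
     "city": {"Freetown"},
     "region": {"North Africa"}},
    mutually_exclusive=mutex,
)
```

In this toy setting the uncoupled run absorbs Freetown and North Africa into the country category, while the coupled run rejects the pattern "war in X" because it also extracts known cities and regions.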
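
The relation argument type checking of slides 17 and 21 can be sketched as a membership test on each argument. The relation name `companyHeadquarteredInCity`, the `Relation` tuple, and the `type_check` helper are hypothetical names chosen for this example. The instance sets reuse the "Pillar" and "San Jose" example from slide 21.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """A binary relation with declared argument categories."""
    name: str
    arg1_type: str
    arg2_type: str


def type_check(relation, pair, category_instances):
    """Accept a candidate pair only if both arguments are (candidate)
    instances of the categories the relation declares."""
    arg1, arg2 = pair
    return (
        arg1 in category_instances.get(relation.arg1_type, set())
        and arg2 in category_instances.get(relation.arg2_type, set())
    )


# Hypothetical relation for the pattern "X , which is based in Y".
based_in = Relation("companyHeadquarteredInCity", "company", "city")

# Category evidence, e.g. from "companies such as Pillar" and
# "cities like San Jose".
category_instances = {
    "company": {"Pillar", "Adobe"},
    "city": {"San Jose", "Toronto"},
}

ok = type_check(based_in, ("Pillar", "San Jose"), category_instances)
not_ok = type_check(
    based_in, ("inclined pillar", "foundation plate"), category_instances
)
```

Here `ok` is true because both arguments have category support, and `not_ok` is false because neither "inclined pillar" nor "foundation plate" is a known company or city.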
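
Slide 27 says that noun phrases filling the X and Y blanks of a pattern "co-occur" with that pattern. A minimal sketch of collecting such co-occurrences follows. It assumes a crude noun-phrase approximation (a run of capitalized words) instead of a real noun-phrase chunker, and the pattern "X coaches the Y", the example sentences, and the function names are illustrative assumptions rather than details from the paper.

```python
import re

# Crude noun phrase: one or more capitalized words. This is an assumption
# for the sketch; a real system would use a part-of-speech based chunker.
NOUN_PHRASE = r"([A-Z]\w*(?: [A-Z]\w*)*)"


def pattern_to_regex(pattern):
    """Turn a contextual pattern such as "X coaches the Y" into a regex
    with one capture group per blank."""
    pieces = re.split(r"\b([XY])\b", pattern)
    return re.compile("".join(
        NOUN_PHRASE if piece in ("X", "Y") else re.escape(piece)
        for piece in pieces
    ))


def co_occurrences(sentences, patterns):
    """Map each pattern to the set of argument tuples that fill its blanks."""
    found = {p: set() for p in patterns}
    for p in patterns:
        regex = pattern_to_regex(p)
        for sentence in sentences:
            for match in regex.finditer(sentence):
                found[p].add(match.groups())
    return found


sentences = [
    "Krzyzewski coaches the Blue Devils.",
    "Analysts compared Adobe and other software firms.",
]
patterns = ["X coaches the Y", "X and other software firms"]
found = co_occurrences(sentences, patterns)
```

Category patterns yield one-element tuples and relation patterns yield pairs, which mirrors the unary and binary predicates of slide 3.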
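
The promotion rule of the Meta-Bootstrap Learner on slide 32 (promote only what both CPL and CSEAL recommend, subject to the ontology's constraints) can be sketched as a set intersection followed by a mutual-exclusion filter. The function name `mbl_promote` and the example candidate sets are assumptions, and type checking of relation arguments is left out to keep the sketch short.

```python
def mbl_promote(cpl_candidates, cseal_candidates, promoted, mutually_exclusive):
    """Return newly promoted instances per predicate.

    cpl_candidates, cseal_candidates, promoted: {predicate: set of instances}
    mutually_exclusive: set of frozenset({pred_a, pred_b}) pairs
    """
    newly_promoted = {}
    for pred in cpl_candidates.keys() & cseal_candidates.keys():
        # Multi-view agreement: both techniques must recommend the instance.
        agreed = cpl_candidates[pred] & cseal_candidates[pred]
        # Mutual exclusion: block instances already promoted to a rival.
        rivals = set()
        for other, known in promoted.items():
            if frozenset((pred, other)) in mutually_exclusive:
                rivals |= known
        newly_promoted[pred] = agreed - rivals - promoted.get(pred, set())
    return newly_promoted


# Hypothetical candidate sets from the two subordinate learners.
cpl = {"country": {"Pakistan", "Freetown", "Greece"}}
cseal = {"country": {"Pakistan", "Freetown", "Russia"}}
promoted = {"country": {"Canada"}, "city": {"Freetown"}}
mutex = {frozenset(("country", "city"))}

new_instances = mbl_promote(cpl, cseal, promoted, mutex)
```

In this example only Pakistan is promoted: Greece and Russia are each recommended by a single technique, and Freetown is blocked because it is already a promoted city.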
