  • Practical semantic web mining platform
  • What is semantic web mining (SWM)?
    • SWM includes:
      • Semantic Web and RDF
      • Regular Expressions, Web Agents
      • HMMs and Information Extraction
      • Rule Mining, F-Logic, Description Logic
      • Information Integration
      • Planning for Data Gathering
      • Ontologies, Learning, Editing
      • Text Classification
      • Applications: E-Commerce
      • Web services
      • Semantic Web Browser
      • etc
  • Some Background
  • Algorithm/theory of ML
    • Techniques of Machine Learning /Data Mining
      • Bayesian classification/NN/GA
      • Statistical technique
      • Active Learning, Multi-View Learning
      • Risk Minimization/Maximum Entropy Model
  • Annotation
    • Multiple Sources
    • Annotation tools
    • Using ML to automate the process
      • Learn annotation rule
      • Active-learning driven (reduces the number of training samples)
      • Multi-view (improve performance)
      • Multi-view detection (improve again)
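
The active-learning step can be illustrated with uncertainty sampling, one common selection strategy (the slides do not say which strategy is intended). In this minimal sketch, `toy_scorer` and its probabilities are hypothetical stand-ins for a learned annotation model:

```python
from typing import Callable, List, Sequence


def most_uncertain(pool: Sequence[str],
                   prob_positive: Callable[[str], float],
                   k: int = 1) -> List[str]:
    """Return the k unlabeled items whose predicted probability is closest to 0.5."""
    return sorted(pool, key=lambda item: abs(prob_positive(item) - 0.5))[:k]


def toy_scorer(text: str) -> float:
    """Hypothetical model: probability that a text span should be annotated."""
    scores = {"Ph.D. in CS": 0.95, "Java, 5 yrs": 0.55, "Page 2 of 3": 0.05}
    return scores[text]


pool = ["Ph.D. in CS", "Java, 5 yrs", "Page 2 of 3"]
to_label = most_uncertain(pool, toy_scorer, k=1)
print(to_label)  # the span the annotator would be asked to label next
```

Only the spans the model is least sure about are sent to the human annotator, which is how the training set stays small.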
  • Mapping & Link
    • Mapping
      • Find mapping points
      • Find complex mapping points (sub-of, super-of, arithmetic such as 5*(a+b), even conjunctions, etc.)
      • Translate instances based on Mapping
    • Link
      • Find Link Points
      • Find Complex Links
      • Integrate Ontology
    • Mapping/Link detection.
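
Translating instances through simple and complex mapping points can be sketched as a table of functions from the source instance to each target field. All field names below (`full_name`, `a`, `b`, `street`, `city`) are hypothetical; only the `5*(a+b)` expression comes from the slide.

```python
from typing import Any, Callable, Dict

# A mapping assigns each target field a function of the source instance.
Mapping = Dict[str, Callable[[Dict[str, Any]], Any]]

mapping: Mapping = {
    # Simple 1:1 mapping point.
    "name": lambda src: src["full_name"],
    # Complex arithmetic mapping point, as in the slide's 5*(a+b) example.
    "total": lambda src: 5 * (src["a"] + src["b"]),
    # Combination of two source fields (an assumed reading of "conjunct of").
    "address": lambda src: f'{src["street"]}, {src["city"]}',
}


def translate(instance: Dict[str, Any], mapping: Mapping) -> Dict[str, Any]:
    """Build a target-ontology instance by applying every mapping function."""
    return {target: fn(instance) for target, fn in mapping.items()}


source_instance = {"full_name": "Ann Lee", "a": 1, "b": 2,
                   "street": "1 Main St", "city": "Paris"}
print(translate(source_instance, mapping))
```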
  • Mapping & Link
    • Multi-view
      • name
      • Instance
      • Relationship, etc
    • Active learning. Ask the user to resolve the most uncertain mapping/link
    • Multi-view detection. Improve the performance
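
One way to combine the views, sketched under assumptions: the name view uses string similarity, the instance view uses Jaccard overlap, and the candidate on which the two views disagree most is the one shown to the user. The scoring functions and the sample concepts are illustrative, not the method the slides prescribe.

```python
from difflib import SequenceMatcher
from typing import Iterable, List, Tuple

# (source concept, target concept, source instances, target instances)
Candidate = Tuple[str, str, Iterable[str], Iterable[str]]


def name_view(a: str, b: str) -> float:
    """Similarity of concept names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def instance_view(xs: Iterable[str], ys: Iterable[str]) -> float:
    """Jaccard overlap of the instance sets, in [0, 1]."""
    xs, ys = set(xs), set(ys)
    union = xs | ys
    return len(xs & ys) / len(union) if union else 0.0


def combined_score(c: Candidate) -> float:
    """Average of the two views."""
    src, tgt, src_inst, tgt_inst = c
    return (name_view(src, tgt) + instance_view(src_inst, tgt_inst)) / 2


def view_disagreement(c: Candidate) -> float:
    """How far apart the two views are for one candidate mapping."""
    src, tgt, src_inst, tgt_inst = c
    return abs(name_view(src, tgt) - instance_view(src_inst, tgt_inst))


def most_confusing(candidates: List[Candidate]) -> Candidate:
    """The candidate to hand to the user for confirmation."""
    return max(candidates, key=view_disagreement)


candidates: List[Candidate] = [
    ("Salary", "salary", ["8000", "9000"], ["8000", "9000"]),
    ("Salary", "Wage", ["8000", "9000"], ["8000", "9000"]),
]
print(most_confusing(candidates)[:2])
```

Here the names "Salary" and "Wage" look different while their instances coincide, so that pair is the one worth a user's attention.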
  • Indexing
    • What is the difference between semantic indexing (SI) and text indexing/XML indexing?
    • How should the data structure of SI be defined? (Note that the structure should reflect the characteristics of the Semantic Web and ontologies.)
    • How can it be made efficient? (How does it compare with others' work? Is there existing work on it?)
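
As one possible answer to the data-structure question, a semantic index can key on the parts of an RDF-style triple rather than on words. The `TripleIndex` class and the sample triples below are a hypothetical sketch, not a structure proposed in the slides.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleIndex:
    """Indexes (subject, predicate, object) triples by each position."""

    def __init__(self) -> None:
        self._all: Set[Triple] = set()
        self._by_s: Dict[str, Set[Triple]] = defaultdict(set)
        self._by_p: Dict[str, Set[Triple]] = defaultdict(set)
        self._by_o: Dict[str, Set[Triple]] = defaultdict(set)

    def add(self, s: str, p: str, o: str) -> None:
        triple = (s, p, o)
        self._all.add(triple)
        self._by_s[s].add(triple)
        self._by_p[p].add(triple)
        self._by_o[o].add(triple)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Set[Triple]:
        """Return triples matching every given position; None is a wildcard."""
        candidates: Optional[Set[Triple]] = None
        for key, index in ((s, self._by_s), (p, self._by_p), (o, self._by_o)):
            if key is not None:
                hits = index.get(key, set())
                candidates = hits if candidates is None else candidates & hits
        return set(self._all) if candidates is None else set(candidates)


index = TripleIndex()
index.add("job42", "rdf:type", "Job")
index.add("job42", "salary", "9000")
index.add("ann", "rdf:type", "Person")
print(index.match(p="rdf:type", o="Job"))
```

Unlike an inverted text index, a lookup here returns typed statements, so a query can ask for "everything that is a Job" rather than "every page containing the word job".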
  • Semantic Retrieval
    • Domain vs. General
    • Make use of SI & Ontology to improve the performance.
    • Make use of reasoning technique to improve.
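
A small illustration of how an ontology can improve retrieval: expand the query class to all of its subclasses before looking up annotated documents. The class hierarchy and document annotations are made-up sample data.

```python
from typing import Dict, List, Set


def descendants(cls: str, subclasses: Dict[str, List[str]]) -> Set[str]:
    """Return cls together with every direct or indirect subclass."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(subclasses.get(current, []))
    return seen


def semantic_search(query_class: str,
                    subclasses: Dict[str, List[str]],
                    annotations: Dict[str, str]) -> Set[str]:
    """Return documents annotated with the query class or any subclass of it."""
    wanted = descendants(query_class, subclasses)
    return {doc for doc, cls in annotations.items() if cls in wanted}


# Hypothetical ontology fragment and document annotations.
subclasses = {
    "Job": ["EngineeringJob", "SalesJob"],
    "EngineeringJob": ["SoftwareEngineer"],
}
annotations = {"doc1": "SoftwareEngineer", "doc2": "SalesJob", "doc3": "Person"}

print(semantic_search("Job", subclasses, annotations))
```

A plain keyword search for "Job" would miss `doc1`, which is only annotated as `SoftwareEngineer`; the subclass reasoning recovers it.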
  • Reasoning
    • Reasoning rules learning
    • Example: Resumes, Jobs
      • How to find the most appropriate job for individual?
      • How to find the most appropriate person for specified job?
      • Define the Rules: if Person.Age(x)<30 then Job(y).Salary>8000
    • Rule Discovery
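
The example rule can be sketched as a pair of predicates: a condition on the person and a constraint on the job. The `Rule` class, the dictionary fields, and the sample data are assumptions made for illustration; only the rule itself comes from the slide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, int]


@dataclass
class Rule:
    condition: Callable[[Record], bool]   # tested on the person
    constraint: Callable[[Record], bool]  # required of the job


# if Person.Age(x) < 30 then Job(y).Salary > 8000
rules = [Rule(condition=lambda p: p["age"] < 30,
              constraint=lambda j: j["salary"] > 8000)]


def jobs_for(person: Record, jobs: List[Record], rules: List[Rule]) -> List[Record]:
    """Jobs that satisfy every rule whose condition holds for this person."""
    active = [r for r in rules if r.condition(person)]
    return [j for j in jobs if all(r.constraint(j) for r in active)]


def persons_for(job: Record, persons: List[Record], rules: List[Rule]) -> List[Record]:
    """Persons for whom this job violates none of their applicable rules."""
    return [p for p in persons
            if all(r.constraint(job) for r in rules if r.condition(p))]


jobs = [{"id": 1, "salary": 7000}, {"id": 2, "salary": 9000}]
persons = [{"id": 10, "age": 25}, {"id": 11, "age": 40}]

print(jobs_for(persons[0], jobs, rules))
print(persons_for(jobs[0], persons, rules))
```

Rule discovery would then amount to learning such condition/constraint pairs from data instead of writing them by hand.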
  • Applications
    • Jobs & Resumes
    • E-Commerce. E.g. Travel, Tickets, etc.
    • Personal Assistant. Track one's work and interests to find new information automatically.
    • Semantic Web Browser
  • Free discussion for the platform
  • Aspects
    • Data
    • Content
      • What we will do, what we can do, and what we will not do.
      • Semantic web, semantic web services
    • Theory -> may be the basis for SCI
    • Practical application (important!)
    • Proposal & Schedule.
  • Data
    • Data preparation
      • Domains: jobs & resumes, software (from SourceForge), travel web services.
      • Ontology: metadata & instances
    • Work items:
      • Metadata definition -> integrate an ontology editor (Protégé, OntoEdit, or Orient)
      • Instance database -> use annotation or IE techniques to extract information from specific web sites.
      • How to store the data -> use Jena to save it in a database and query it with RQL -> indexing?
  • Content
    • Ontology building, knowledge base building -> use WordNet to assist
    • Composition of web services. If not web services, what else can we do, e.g. jobs & resumes?
    • Annotation & deep annotation: web service annotation, text annotation, even image annotation.
    • Mapping: concept mapping, instance mapping -> translation, merging, meaning negotiation (mapping representation)
    • Data integration: combine annotation and mapping
  • Content
    • Semantic search engine. How is it defined? Simple search = data search; how, then, do we make use of the ontology? Reasoning?
      • How to make it practical, that is, how to do it in our domain? Should it be general or domain-specific?
    • Ontology summary (needs a better name): output the knowledge in an ontology using NLP.
    • Indexing?
    • Tools integration
  • Theory
    • ML, data mining.
      • Inductive learning: NN, Bayes, SVM, GA. Code them, or one of them, ourselves. It will cost us time, but that does not mean the time is wasted.
      • Transductive learning.
      • Selective learning.
      • A more general theory: risk minimization. Note that RM is not an algorithm; it is a framework for ML. Any learning algorithm can be used as its implementation.
    • Active learning + multi-view
      • Reduce the number of training samples.
      • Improve the precision.
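
Assuming the slide refers to risk minimization in the usual statistical-learning sense, the framework can be written as follows. The symbols (loss $L$, hypothesis class $\mathcal{F}$, data distribution $P$, sample size $n$) are introduced here and do not appear in the slides.

```latex
R(f) = \mathbb{E}_{(x,y)\sim P}\bigl[L(f(x), y)\bigr],
\qquad
\hat{R}_n(f) = \frac{1}{n}\sum_{i=1}^{n} L\bigl(f(x_i), y_i\bigr),
\qquad
\hat{f} = \arg\min_{f \in \mathcal{F}} \hat{R}_n(f)
```

Choosing a particular $\mathcal{F}$ and $L$ (for example a neural network with squared loss, or an SVM with hinge loss) yields a concrete learning algorithm, which is the sense in which any learner can serve as an implementation of the framework.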
  • Practical application
    • Jobs & resumes
      • Targets: to find the best qualified resumes/persons for specified job or to find the best jobs for a person.
    • Software from SourceForge, etc.
      • Aim at software composition -> web service composition. Software search.
  • Practical application
    • more?
  • Proposal & schedule
    • Why proposal?
    • Why schedule?
    • Can we work together for the possible platform?
  • Further Reading
  • Further reading on Semantic Annotation
    • A. Kiryakov, B. Popov, et al. Semantic Annotation, Indexing, and Retrieval. 2nd International Semantic Web Conference (ISWC2003), http://www.ontotext.com/publications/index.html#KiryakovEtAl2003
    • [Alani, 2003] Alani, H., Kim, S., Millard, D., Weal, M., Hall, W., Lewis, P. and Shadbolt, N. Automatic Ontology-Based Knowledge Extraction from Web Documents. IEEE Intelligent Systems 18(1): 14-21.
    • [Benjamins, 2002] Richard Benjamins, Jesús Contreras. White Paper Six Challenges for the Semantic Web. Intelligent Software Components. Intelligent software for the networked economy (isoco). April, 2002.
    • [Berners-Lee, 1999] Tim Berners-Lee, Mark Fischetti (Contributor), Michael L. Dertouzos; “Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web”; 1999.
    • [Califf, 1998] Califf M. E. (1998), Relational Learning Techniques for Natural Language Information Extraction, Ph.D. thesis, Univ. Texas, Austin, 1998
    • [Ciravegna, 2001] Fabio Ciravegna. (LP)², an adaptive algorithm for information extraction from web-related texts. In Proceedings of the IJCAI-2001 Workshop on Adaptive Text Extraction and Mining held in conjunction with 17th International Joint Conference on Artificial Intelligence (IJCAI), Seattle, USA, August 2001.
    • [Cohen, 2001] W. Cohen, L. Jensen, A structured wrapper induction system for extracting information from semi-structured documents, in: Proceedings of the Workshop on Adaptive Text Extraction and Mining (IJCAI’01), 2001.
    • [Cunningham, 2002] H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics, 2002.
    • [Czejdo, 2000] B. Czejdo, J. Dinsmore, C. H. Hwang, R. Miller, M. Rusinkiewicz. Automatic Generation of Ontology Based Annotations in XML and Their Use in Retrieval Systems. Proceedings of the First International Conference on Web Information Systems Engineering (WISE'00)-Volume 1. IEEE Computer Society Washington, DC, USA. 2000. 296-300
    • [Dhamankar, 2004] Robin Dhamankar, Yoonkyong Lee, AnHai Doan, Alon Halevy, Pedro Domingos. iMAP: Discovering Complex Semantic Matches between Database Schemas. SIGMOD 2004, June 13-18, 2004, Paris, France.
    • [Dill, 2003] Stephen Dill, Nadav Eiron, David Gibson, Daniel Gruhl, R. Guha, Anant Jhingran, Tapas Kanungo, Kevin S. McCurley, Sridhar Rajagopalan, Andrew Tomkins, John A. Tomlin, Jason Y. Zien. A case for automated large-scale semantic annotation. Journal of Web Semantics: Science, Services and Agents on the World Wide Web. Published by Elsevier B.V. July, 2003:115-132
    • [Eriksson, 1999] H. Eriksson, R. Fergerson, Y. Shahar, and M. Musen. Automatic generation of ontology editors. In Proceedings of the 12th Banff Knowledge Acquisition Workshop, Banff Alberta, Canada, 1999.
    • [Handschuh, 2002] S. Handschuh, S. Staab, F. Ciravegna, S-CREAM—semi-automatic creation of metadata, in: Proceedings of the 13th International Conference on Knowledge Engineering and Management (EKAW 2002), Siguenza, Spain, 2002, pp. 358-372.
    • [Heflin, 2000] J. Heflin, J. Hendler, Searching the web with SHOE, in: AAAI-2000 Workshop on AI for Web Search, Austin, Texas, 2000.
    • [Kahan, 2001] J. Kahan, M.-R. Koivunen, Annotea: an open RDF infrastructure for shared web annotations, in: World Wide Web, 2001, pp. 623-632.
    • [Kogut, 2001] P. Kogut, W. Holmes, AeroDAML: applying information extraction to generate DAML annotations from web pages, 2001.
    • [Kushmerick, 1997] N. Kushmerick, D.S. Weld, R.B. Doorenbos, Wrapper induction for information extraction, in: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 1997, Nagoya, Japan, pp. 729-737.
    • [Leonard, 2001] T. Leonard, H. Glaser, Large scale acquisition and maintenance from the web without source access, http://www.semannot2001.aifb.uni-karlsruhe.de/positionpapers/Leonard.pdf, 2001.
    • [Lerman, 2001] K. Lerman, C. Knoblock, S. Minton, Automatic data extraction from lists and tables in web sources, in: IJCAI-2001 Workshop on Adaptive Text Extraction and Mining, Seattle, WA, August 2001.
    • [Li, 2001] L.Z. Jianming Li, Y. Yu, Learning to generate semantic annotation for domain specific sentences, in: Knowledge Markup and Semantic Annotation Workshop in K-CAP 2001, Victoria, BC, 2001.
    • [Popov, 2003] Borislav Popov, Atanas Kiryakov, Dimitar Manov, Angel Kirilov, Damyan Ognyanoff, and Miroslav Goranov. Towards Semantic Web Information Extraction. In ISWC'03 Workshop on Human Language Technology for the Semantic Web and Web Services, 2003, pp. 1-21.
    • [Schaffer, 1993] Schaffer, C. Selecting a classification method by cross-validation. Machine Learning, 13(1):135-143
    • [Soderland, 1999] Soderland, S. Learning information extraction rules for semi-structured and free text. Machine Learning. 1999,1. 1-44
    • [Soo, 2003] Von-Wun Soo, Chen-Yu Lee, Chung-Cheng Li, Shu Lei Chen and Ching-chih Chen. Automated Semantic Annotation and Retrieval Based on Sharable Ontology and Case-based Learning Techniques. Proceedings of the 2003 Joint Conference on Digital Libraries. 2003 IEEE.
    • [Vargas-Vera, 2001] M. Vargas-Vera, E. Motta, J. Domingue, S. Buckingham Shum, and M. Lanzoni. Knowledge Extraction by using an Ontology-based Annotation Tool. In K-CAP 2001 workshop on Knowledge Markup and Semantic Annotation, Victoria, BC, Canada, October 2001.
    • [Vargas-Vera, 2002] M. Vargas-Vera, E. Motta, J. Domingue, M. Lanzoni, A. Stutt, F. Ciravegna, MnM: ontology driven semiautomatic and automatic support for semantic markup, in: Proceedings of the 13th International Conference on Knowledge Engineering and Management (EKAW 2002), Siguenza, Spain, 2002.
  • Further reading on Ontology Mapping
    • [1] Berger, J. Statistical decision theory and Bayesian analysis. Springer-Verlag. 1985
    • [2] Calvanese, D.; De Giacomo, G.; and Lenzerini, M. 2002. A framework for ontology integration. In Cruz, I.; Decker, S.; Euzenat, J.; and McGuinness, D., eds., The Emerging Semantic Web. IOS Press. 201-214.
    • [3] H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics, 2002.
    • [4] Robin Dhamankar, Yoonkyong Lee, AnHai Doan, et al. iMAP: Discovering Complex Semantic Matches between Database Schemas. Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, 2004. Paris, France: ACM Press.
    • [5] H. Do and E. Rahm. Coma: A system for flexible combination of schema matching approaches. In Proc. of VLDB-2002.
    • [6] Doan, A.H., P. Domingos, A. Halevy: Reconciling Schemas of Disparate Data Sources: A Machine-Learning Approach. SIGMOD 2001.
    • [7] A. Doan, J. Madhavan, P. Domingos, and A. Halevy. Learning to map between ontologies on the semantic web. In Proceedings of the World-Wide Web Conference (WWW-2002), pages 662-673. ACM Press, 2002.
    • [8] J. Kang and J. Naughton. On schema matching with opaque column names and data values. In Proc. of SIGMOD-2003.
    • [9] W. Kim and J. Seo. Classifying schematic and data heterogeneity in multidatabase systems. IEEE Computer, 1991, 24(12):12-18
    • [10] J. Madhavan, P. Bernstein, and E. Rahm. Generic schema matching with cupid. In Proc. of VLDB-2001.
    • [11] A. Maedche, B. Motik, N. Silva and R. Volz. MAFRA - An Ontology MApping FRAmework in the Context of the Semantic Web. In Proceedings of EKAW 2002, Siguenza, Spain. 2002.
    • [12] Alexander Maedche, Steffen Staab: Ontology Learning for the Semantic Web. IEEE Intelligent Systems 16(2): 72-79 (2001)
    • [13] Jayant Madhavan, Philip Bernstein, Kuang Chen, Alon Halevy, and Pradeep Shenoy. Corpus based schema matching. In Proc. of the IJCAI-03 Workshop on Information Integration on the Web (IIWeb-03), 2003.
    • [14] McGuinness D., Fikes R., Rice J., and Wilder S. :An environment for merging and testing large ontologies. Proceedings of the 7th International Conference on Principles of Knowledge Representation and Reasoning. Colorado, USA.
    • [15] S. Melnik, H. Garcia-Molina, and E. Rahm. Similarity flooding: a versatile graph matching algorithm. In Proc. of ICDE-2002.
    • [16] N. F. Noy and M. A. Musen. PROMPT: Algorithm and Tool for Automated Ontology Merging and Alignment. In Proc. of AAAI-2000, pages 450-455, 2000.
    • [17] Nuno Silva and Joao Rocha. Semantic Web Complex Ontology Mapping. IEEE/WIC International Conference on Web Intelligence (WI'03) October 13-17, 2003 Halifax, Canada:82-100
    • [18] Omelayenko, B. RDFT: A Mapping Meta-Ontology for Business Integration; Workshop on Knowledge Transformation for the Semantic Web (KTSW 2002) at ECAI'2002. Lyon, France; 2002:76-83
    • [19] Palopoli, L., G. Terracina, D. Ursino: The System DIKE: Towards the Semi-Automatic Synthesis of Cooperative Information Systems and Data Warehouses. ADBIS-DASFAA 2000, 108-117
    • [20] Park, J. Y., Gennari, J. H. and Musen, M. A.; "Mappings for Reuse in Knowledge-based Systems"; 11th Workshop on Knowledge Acquisition, Modelling and Management (KAW 98); Banff, Canada; 1998.
    • [21] Patrick Pantel, Dekang Lin. Discovering Word Senses from Text. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2002:613-619.
    • [22] Richard Benjamins, Jesús Contreras. White Paper Six Challenges for the Semantic Web. Intelligent Software Components. Intelligent software for the networked economy (isoco). April, 2002.
    • [23] E. Rahm and P. A. Bernstein. A survey of approaches to automatic schema matching. The VLDB Journal, 10:334-350, 2001.
    • [24] Tim Berners-Lee, Mark Fischetti (Contributor), Michael L. Dertouzos; "Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web"; 1999.
    • [25] K. M. Ting and I. H. Witten. Issues in stacked generalization. Journal of Artificial Intelligence Research, 10:271-289, 1999.
    • [26] Wache, H.; Voegele, T.; Visser, U.; Stuckenschmidt, H.;Schuster, G.; Neumann, H.; and Huebner, S. 2001. Ontology-based integration of information - a survey of existing approaches. In Proc. of IJCAI 2001 Workshop on Ontologies and Information Sharing.
    • [27] Wiesman, F., Roos, N., and Vogt, P. (2001). Automatic ontology mapping for agent communication. Technical report.
    • [28] L. Xu and D. Embley. Using domain ontologies to discover direct and indirect matches for schema elements. In Proc. of the Semantic Integration Workshop at ISWC-2003.
  • Further Reading on Machine Learning
    • Muslea. Multi-view plus active learning. (thesis)
    • Tom M. Mitchell. Machine Learning.
    • Richard O. Duda. Pattern Classification. (Second Edition)
    • Zhai-Xiang Chen. Risk Minimization based Information Retrieval. (thesis)
    • Wrapper Induction. Several theses: RAPIER, etc.
    • Data Mining. Han,