New Approach To Personal Network Search Based On Information Extraction (Tin180 Com)

http://tin180.com

  • Welcome to Tsinghua, Professor Dieter. We look forward to your advice and guidance. I am Li Juanzi from the Knowledge Engineering Group in the Department of Computer Science and Technology. I thank Professor Yang for giving me this opportunity to introduce our work on the Semantic Web and web services.
  • This is the processing flow of the contact search. After the user enters a person's name, the system first queries the database. If the database holds contact information for that person, the system returns it directly. If not, the system submits the name to Google. From the documents Google returns, we take the 50 top-ranked documents and feed them to a classifier. Our statistics show that more than 90% of the personal information is located in the 20 top-ranked documents and more than 95% in the 50 top-ranked documents. The classifier identifies whether or not a document contains personal information. Finally, we use an SVM-based method for the extraction and save the extracted data into the database.
  • In non-text filtering, we use similar methods for header, signature, and program code detection. In these methods, we view each text line in an email as an instance for the SVM. For each instance, we define a set of features. The method consists of two stages: training and detection. We use the header as an example to explain how we conduct non-text block detection. In training, we take the training data as input and define two feature sets, one for header start line detection and one for header end line detection. We then use the two feature sets to construct two SVM models. In detection, we use the two SVM models to identify whether or not a line is the start line of a header, and whether or not a line is the end line of a header. We then view the lines between the identified start line and end line as a header. Defining effective features is therefore one of our focuses.
  • That is all for my introduction to our lab. Thank you all.
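The contact search flow described in the notes above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the system's actual code: the function `contact_search`, its parameters, the in-memory dictionary standing in for the database, and the "first value seen wins" merge policy are all hypothetical. The search engine, the document classifier, and the SVM-based extractor are passed in as plain callables so the control flow can be read on its own.

```python
from typing import Callable, Dict, List, Optional

TOP_K = 50  # the notes state that the 50 top-ranked documents are considered

Contact = Dict[str, str]


def contact_search(
    name: str,
    database: Dict[str, Contact],               # hypothetical stand-in for the database
    web_search: Callable[[str], List[str]],      # hypothetical: returns ranked document texts
    has_personal_info: Callable[[str], bool],    # hypothetical stand-in for the classifier
    extract_contact: Callable[[str], Contact],   # hypothetical stand-in for the SVM extractor
) -> Optional[Contact]:
    # Step 1: query the database first and return a stored record directly.
    if name in database:
        return database[name]

    # Step 2: otherwise submit the name to the search engine, keep the top-ranked documents.
    documents = web_search(name)[:TOP_K]

    # Step 3: keep only documents the classifier says contain personal information.
    relevant = [doc for doc in documents if has_personal_info(doc)]

    # Step 4: extract fields from each relevant document.
    # Assumption: the first value seen for a field wins.
    contact: Contact = {}
    for doc in relevant:
        for field, value in extract_contact(doc).items():
            contact.setdefault(field, value)

    if not contact:
        return None

    # Step 5: save the extracted data so the next query is answered from the database.
    database[name] = contact
    return contact


# Usage with toy stand-ins (illustrative only).
def fake_search(name: str) -> List[str]:
    return [f"{name} homepage. Email: jane@example.org", "Unrelated news page"]


def fake_classifier(doc: str) -> bool:
    return "Email:" in doc


def fake_extractor(doc: str) -> Contact:
    return {"email": doc.split("Email:")[1].strip()}


db: Dict[str, Contact] = {}
result = contact_search("Jane Doe", db, fake_search, fake_classifier, fake_extractor)
```

Passing the three components as callables keeps the sketch independent of any particular search API or SVM library; a real deployment would plug trained models in at those points.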

    1. Personal Social Network: A New Approach to Personal Network Search Based on Information Extraction. Jie Tang, Mingcai Hong, Jing Zhang, Bangyong Liang, and Juanzi Li. Knowledge Engineering Group, Department of Computer Science and Technology, Tsinghua University. Sep. 5th, 2006
    2. Personal Social Network
         • Personal social network is an important research area.
         • A person usually has different types of information:
             • Personal profile (including portrait, homepage, position, affiliation, publications, and documents)
             • Contact information (including address, email, telephone, and fax number)
             • Friends
         • Unfortunately, the information is often hidden in heterogeneous and distributed web pages.
    3. Our Approach
         • Personal Social Network = Building + Search + Mining
         • Doc collection
         • Annotation
         • Integration
         • Person search
         • Publication search
         • Association search
         • Expert finding
         • Research interest finding
    4. Processing Flow. Diagram labels: Submitted to; Returned pages; Fed to; Extracting and saving to; Ontology base; Query; Classification Model
    5. Building the Personal Network: >400,000 persons, >700,000 publications
    6. Annotation using SVMs. Personal profile: e.g. image, affiliation, etc. Contact information: fax, email, phone, etc. Diagram labels: Start position model; End position model; Identified info.; Feature sets
    7. Person Search: Search for a person using the name or other information, e.g. affiliation
    8. Publication Search: Searching for a publication using an IR model
    9. Publication Online-View
    10. Association Search: Finding associations between persons (high efficiency, Top-K associations). Usage: to find a partner; to find a person with the same interests
    11. Expert Finding: Finding experts on a topic
    12. Research Interest Finding: Finding research interests for a person
    13. Thank You. Homepage: http://keg.cs.tsinghua.edu.cn/persons/tj
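The two-model boundary detection from the speaker notes (one model for start lines and one for end lines, as on the "Annotation using SVMs" slide) can be sketched as follows. This is only an illustration under stated assumptions: `detect_blocks`, `is_header_start`, and `is_header_end` are hypothetical names, the two line models here are simple hand-written rules standing in for trained SVM models, and pairing each start line with the nearest following end line is an assumed strategy that the notes do not spell out.

```python
import re
from typing import Callable, List, Tuple

# A line model looks at the whole message plus a line index, so it can use
# features of neighbouring lines, and decides whether that line is a boundary.
LineModel = Callable[[List[str], int], bool]


def detect_blocks(lines: List[str], is_start: LineModel, is_end: LineModel) -> List[Tuple[int, int]]:
    """Return (start, end) line index pairs, inclusive, for each detected block."""
    blocks: List[Tuple[int, int]] = []
    i = 0
    while i < len(lines):
        if is_start(lines, i):
            # Assumption: pair the start with the nearest end line at or after it.
            for j in range(i, len(lines)):
                if is_end(lines, j):
                    blocks.append((i, j))
                    i = j  # continue scanning after this block
                    break
        i += 1
    return blocks


# Hand-written stand-ins for the two trained models (NOT SVMs; illustrative only).
HEADER_FIELD = re.compile(r"^(From|To|Cc|Subject|Date):", re.IGNORECASE)


def is_header_start(lines: List[str], i: int) -> bool:
    # A header field line that is not preceded by another header field line.
    return bool(HEADER_FIELD.match(lines[i])) and (i == 0 or not HEADER_FIELD.match(lines[i - 1]))


def is_header_end(lines: List[str], i: int) -> bool:
    # A header field line that is not followed by another header field line.
    last = i == len(lines) - 1
    return bool(HEADER_FIELD.match(lines[i])) and (last or not HEADER_FIELD.match(lines[i + 1]))


email = [
    "From: a@example.org",
    "To: b@example.org",
    "Subject: Hi",
    "",
    "Hello,",
    "See you.",
]
headers = detect_blocks(email, is_header_start, is_header_end)
```

In this toy message `headers` is `[(0, 2)]`, that is, the first three lines are treated as the header. Replacing the two rule functions with trained classifiers over per-line feature vectors would keep `detect_blocks` unchanged.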
