Welcome, Professor Dieter, to Tsinghua. We hope to receive your advice and guidance. I am Li Juanzi from the Knowledge Engineering Group in the Department of Computer Science and Technology. I thank Professor Yang for giving me this opportunity to introduce our work on the Semantic Web and Web services.
This is the processing flow of contact search. After the user enters a person's name, the system first queries the database. If the database already has that person's contact information, the system returns it directly. If not, the system submits the name to Google. From the documents Google returns, we take the 50 top-ranked ones and feed them to a classifier. Our statistics show that more than 90% of personal information is located in the top 20 ranked documents, and more than 95% in the top 50. The classifier identifies whether or not a document contains personal information. Finally, we use an SVM-based method to extract the information and save the extracted data into the database.
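The flow above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's implementation: a plain dict stands in for the database, `web_search` is a hypothetical callable standing in for the Google query, and the trained document classifier and the SVM-based extractor are replaced by simple regular-expression heuristics (`contains_personal_info` and `extract_contact` are hypothetical names). Only the database-first order and the top-50 cutoff come from the text.

```python
import re
from typing import Callable, Dict, List, Optional

TOP_K = 50  # number of top-ranked search results kept (the cutoff from the text)

# Hypothetical patterns used by the regex stand-ins below.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"(?:tel|phone)\s*[:.]?\s*([+\d][\d\s()-]{6,}\d)", re.IGNORECASE)


def contains_personal_info(doc: str) -> bool:
    """Stand-in for the trained document classifier: a pattern heuristic."""
    return bool(EMAIL_RE.search(doc) or PHONE_RE.search(doc))


def extract_contact(doc: str) -> Dict[str, str]:
    """Stand-in for the SVM-based extractor: plain regex matching."""
    record: Dict[str, str] = {}
    email = EMAIL_RE.search(doc)
    if email:
        record["email"] = email.group(0)
    phone = PHONE_RE.search(doc)
    if phone:
        record["phone"] = phone.group(1).strip()
    return record


def contact_search(
    name: str,
    database: Dict[str, Dict[str, str]],
    web_search: Callable[[str], List[str]],
) -> Optional[Dict[str, str]]:
    # 1. Answer directly from the database when the person is already known.
    if name in database:
        return database[name]
    # 2. Otherwise query the search engine and keep only the top-ranked results.
    docs = web_search(name)[:TOP_K]
    # 3. Keep the documents the classifier judges to contain personal information.
    relevant = [doc for doc in docs if contains_personal_info(doc)]
    # 4. Extract fields; setdefault keeps the value from the higher-ranked document.
    record: Dict[str, str] = {}
    for doc in relevant:
        for key, value in extract_contact(doc).items():
            record.setdefault(key, value)
    if not record:
        return None
    # 5. Save the result so later queries are served from the database.
    database[name] = record
    return record


if __name__ == "__main__":
    pages = {"Jane Doe": ["Jane Doe's homepage. Email: jane@example.org Tel: +86 10 1234 5678",
                          "Unrelated news article"]}
    db: Dict[str, Dict[str, str]] = {}
    print(contact_search("Jane Doe", db, lambda n: pages.get(n, [])))
```

Because results are written back to the database, a repeated query for the same person never reaches the search engine; in a real deployment the two regex stand-ins would be replaced by the trained classifier and extractor.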
In non-text filtering, we use similar methods for header, signature, and program code detection. In these methods, we treat each text line in an email as an SVM instance and define a set of features for each instance. The method consists of two stages: training and detection. We use the header as an example to explain how we conduct non-text block detection. In training, we take the training data as input and define two feature sets, one for header start-line detection and one for header end-line detection. We then use the two feature sets to construct two SVM models. In detection, we use the two SVM models to identify whether a line is the start line of a header and whether a line is the end line of a header. We then treat the lines between the identified start line and end line as a header. Defining effective features is therefore one of our main focuses.
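The detection stage described above can be sketched as follows. This is an illustrative sketch, not the trained system: the features in `line_features` (whether a line looks like a `From:`/`To:`/`Subject:` field, and whether its neighbours do) are hypothetical, and the weight tables `START_W` and `END_W` are hand-set placeholders standing in for the two trained models. They are used only because a linear SVM classifies an instance by the sign of its decision function w·x + b, so each trained model reduces to such a weight table. Training is omitted, and the sketch assumes the header span includes both the start and end lines.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical feature: does a line look like an email header field?
HEADER_FIELD = re.compile(r"^(From|To|Cc|Sent|Date|Subject)\s*:", re.IGNORECASE)


def line_features(lines: List[str], i: int) -> Dict[str, float]:
    """Illustrative features for line i (the real feature sets are not given here)."""
    prev_line = lines[i - 1] if i > 0 else ""
    next_line = lines[i + 1] if i + 1 < len(lines) else ""
    return {
        "is_field": float(bool(HEADER_FIELD.match(lines[i]))),
        "prev_is_field": float(bool(HEADER_FIELD.match(prev_line))),
        "next_is_field": float(bool(HEADER_FIELD.match(next_line))),
        "bias": 1.0,  # constant feature, so its weight plays the role of b
    }


# Hand-set placeholders for the two trained linear SVM models (w, with b folded in).
START_W = {"is_field": 2.0, "prev_is_field": -2.0, "next_is_field": 1.0, "bias": -1.5}
END_W = {"is_field": 2.0, "prev_is_field": 1.0, "next_is_field": -2.0, "bias": -1.5}


def decision(weights: Dict[str, float], feats: Dict[str, float]) -> float:
    """Linear SVM decision function w . x + b; a positive score means 'yes'."""
    return sum(weights[name] * value for name, value in feats.items())


def detect_headers(lines: List[str]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) line spans classified as headers."""
    spans: List[Tuple[int, int]] = []
    i = 0
    while i < len(lines):
        if decision(START_W, line_features(lines, i)) > 0:
            # Pair the start line with the first end line at or after it.
            for j in range(i, len(lines)):
                if decision(END_W, line_features(lines, j)) > 0:
                    spans.append((i, j))
                    i = j  # resume scanning after this header
                    break
        i += 1
    return spans


if __name__ == "__main__":
    email = [
        "Hi Bob,",
        "See the thread below.",
        "",
        "From: Alice <alice@example.com>",
        "Sent: Monday, May 1, 2006",
        "To: Bob",
        "Subject: Meeting",
        "",
        "Can we meet tomorrow?",
    ]
    for start, end in detect_headers(email):
        print("\n".join(email[start:end + 1]))
```

Scoring every line independently and then pairing each start line with the next end line keeps the two models simple; a line classified as both a start and an end line yields a one-line header.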
That is all for my introduction to our lab. Thank you all.
Personal Social Network: A New Approach to Personal Network Search Based on Information Extraction
Jie Tang, Mingcai Hong, Jing Zhang, Bangyong Liang, and Juanzi Li
Knowledge Engineering Group, Department of Computer Science and Technology, Tsinghua University
Sep. 5th, 2006
Personal Social Network
<ul>
<li>Personal social network is an important research area.</li>
<li>A person usually has different types of information:
<ul>
<li>Personal profile (including portrait, homepage, position, affiliation, publications, and documents)</li>
<li>Contact information (including address, email, telephone, and fax number)</li>
<li>Friends</li>
</ul>
</li>
<li>Unfortunately, the information is often hidden in heterogeneous and distributed web pages.</li>
</ul>
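The information types listed on this slide can be modelled as a simple record. The sketch below is a hypothetical schema, not the system's actual data model: field names such as `portrait_url` and the `merge_contact` helper are illustrative. Because the information is scattered across many pages, the helper fills only contact fields that are still empty, so values found earlier (for example, from higher-ranked pages) are kept.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ContactInfo:
    address: Optional[str] = None
    email: Optional[str] = None
    telephone: Optional[str] = None
    fax: Optional[str] = None


@dataclass
class PersonalProfile:
    portrait_url: Optional[str] = None  # hypothetical: portrait stored as a URL
    homepage: Optional[str] = None
    position: Optional[str] = None
    affiliation: Optional[str] = None
    publications: List[str] = field(default_factory=list)
    documents: List[str] = field(default_factory=list)


@dataclass
class Person:
    name: str
    profile: PersonalProfile = field(default_factory=PersonalProfile)
    contact: ContactInfo = field(default_factory=ContactInfo)
    friends: List[str] = field(default_factory=list)  # names of related people

    def merge_contact(self, other: ContactInfo) -> None:
        """Fill only the still-empty contact fields from another extracted record."""
        for f in fields(ContactInfo):
            if getattr(self.contact, f.name) is None:
                setattr(self.contact, f.name, getattr(other, f.name))


if __name__ == "__main__":
    p = Person("Jane Doe")
    p.merge_contact(ContactInfo(email="jane@example.org"))
    p.merge_contact(ContactInfo(email="other@example.org", fax="+86 10 0000 0000"))
    print(p.contact)
```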