Semi-automatic Text Mining
Project Proposal for "Future and Emerging Technologies"
in the EU-IST Programme
S. Staab¹, R. Studer, Karlsruhe University
K. Markert, B. Webber, University of Edinburgh
N. Kushmerick, University College Dublin
B. Bremdal, R. Engels, Cognit a.s
http://www.aifb.uni-karlsruhe.de/~sst/Research/Projects/TextMining/
1 Abstract
Motivation: The revolutionary step from printed text to digital documents has led to an
explosive growth of knowledge available (semi-)publicly through the internet or through
community and corporate intranets. With this flood of potentially useful information
comes the urgent need to sift through it, find the golden nuggets of information and
analyze them in order to make informed decisions.
Problem: The vision in text understanding has been that of fully automatic techniques
that may be exploited for purposes such as detecting relevant information in texts,
summarizing that information, or answering questions about texts. Nevertheless,
fully automatic text mining appears to be as distant as ever. Approaches that actually
work rely almost exclusively on information retrieval techniques and hardly exploit the rapid
progress in computational linguistics research. They therefore exhibit well-known limitations,
such as inconclusive summaries or the overabundance of hits returned by search engines like
AltaVista. In addition, text mining in the full sense of the term, i.e. the aggregation and
analysis of information into a piece of knowledge that may lead to an informed action, has
hardly been investigated so far.
Objectives: Our project proposal pursues a threefold objective. First, we want to bridge
the gap between the techniques that are actually used for text mining and current research,
and thus draw on current and upcoming progress in the fields of knowledge acquisition,
computational linguistics, information retrieval, information extraction and machine learning.
Second, we want to exploit the particular characteristics of current web documents. This
implies that we need to consider new web standards for document structuring, viz. XML,
and that we must consider semi-structured information such as that conveyed by layout,
tables or lists.
Finally, we want to go beyond information extraction towards text information
exploitation. This means we want to combine extracted information in order to deduce
knowledge that may not have been in the mind of the authors of the text.
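The second objective mentions semi-structured information conveyed by tables in XML documents. As a minimal illustration only, the sketch below reads a table from a small, hypothetical XHTML fragment and turns each row into a record keyed by the column headers. The markup, the column names and the helper `table_to_records` are assumptions made for this example; the proposal does not prescribe any particular extraction method.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment, e.g. a table taken from an annual report.
SAMPLE = """
<table>
  <tr><th>Segment</th><th>Revenue</th></tr>
  <tr><td>Mobile</td><td>120</td></tr>
  <tr><td>Fixed line</td><td>95</td></tr>
</table>
"""

def table_to_records(xml_text):
    """Turn a simple header-plus-rows table into a list of dicts."""
    root = ET.fromstring(xml_text.strip())
    rows = root.findall("tr")
    # Assumption: the first row holds the column headers.
    headers = [cell.text.strip() for cell in rows[0].findall("th")]
    records = []
    for row in rows[1:]:
        values = [(cell.text or "").strip() for cell in row.findall("td")]
        records.append(dict(zip(headers, values)))
    return records

if __name__ == "__main__":
    for record in table_to_records(SAMPLE):
        print(record)
```

Real report tables are rarely this regular (merged cells, multi-row headers, units in captions), which is one reason layout-aware extraction is treated as a research topic rather than a solved step.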
Method: We consider text mining to be a semi-automatic process that is designed and set up
with a particular application in mind. The design involves the construction of a domain
ontology, the formulation and/or learning of interesting structures with computational
linguistics and/or information retrieval techniques, and the exploration of the
corresponding results. Once the domain-specific text mining application has been set up, a
naive user may run it to extract information and, in particular, to find associations and
rules that were not present in any of the original texts but can only be found by
considering, integrating and comparing various text sources.
¹ Contact: Steffen Staab, AIFB, Karlsruhe University, D-76128 Karlsruhe, email:
staab@aifb.uni-karlsruhe.de, Tel.: +49(0)721/608 7363, Fax: +49(0)721/693717
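To make "associations and rules" found across several text sources more concrete, here is a minimal sketch of one standard technique, association rule mining with support and confidence thresholds, restricted to pairs of concepts. It assumes that an extraction step has already mapped each document to a set of ontology concepts; the concept labels, thresholds and the function `pair_rules` are hypothetical, and the proposal does not specify which mining algorithm would actually be used.

```python
from itertools import combinations

# Hypothetical output of an extraction step: the set of ontology
# concepts found in each document (one set per report).
DOCS = [
    {"merger", "layoffs", "telecom"},
    {"merger", "layoffs"},
    {"merger", "layoffs", "new_ceo"},
    {"telecom", "new_ceo"},
    {"merger", "telecom"},
]

def pair_rules(docs, min_support=0.4, min_confidence=0.7):
    """Return rules lhs -> rhs over concept pairs as (lhs, rhs, support, confidence)."""
    n = len(docs)
    items = sorted(set().union(*docs))
    # Number of documents that mention each single concept.
    single = {i: sum(1 for d in docs if i in d) for i in items}
    rules = []
    for a, b in combinations(items, 2):
        both = sum(1 for d in docs if a in d and b in d)
        support = both / n  # fraction of documents mentioning both concepts
        if support < min_support:
            continue
        # A rule may hold in one direction but not the other.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / single[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

if __name__ == "__main__":
    for rule in pair_rules(DOCS):
        print(rule)
```

With the default thresholds this toy data yields two rules, layoffs → merger (confidence 1.0) and merger → layoffs (confidence 0.75); neither is stated in any single document, which is the kind of cross-document knowledge the Method paragraph describes.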
Scenario: As a case study we choose the mining of annual business reports
and analysts' reports that comment on companies from a particular sector (e.g.,
telecommunications). This scenario is particularly appropriate because
1. It allows the observation of competitors and the detection of trends that are extremely
important for decision makers, such as trends in organizational structures or in
markets and products.
2. These texts cannot be understood in isolation. Rather, the
knowledge that needs to be found lies mostly in the annual changes that take
place and in the comparisons between companies in the same trade.
3. The setting is observed and understood well enough by professionals for them to
verify the techniques we develop.
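Point 2 locates much of the interesting knowledge in the changes between reports. A minimal sketch of that comparison step, assuming facts have already been extracted into `(company, attribute) -> value` mappings per year, could classify them as added, removed or changed. The data layout, the company names "AlphaTel" and "BetaCom" and the function `diff_facts` are all hypothetical.

```python
# Hypothetical extracted facts: (company, attribute) -> value, one dict per year.
FACTS_1999 = {
    ("AlphaTel", "divisions"): "3",
    ("AlphaTel", "main_market"): "fixed line",
    ("BetaCom", "divisions"): "5",
}
FACTS_2000 = {
    ("AlphaTel", "divisions"): "4",
    ("AlphaTel", "main_market"): "fixed line",
    ("BetaCom", "divisions"): "5",
    ("BetaCom", "main_market"): "mobile",
}

def diff_facts(old, new):
    """Classify facts as added, removed, or changed between two years."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    # Facts present in both years whose value differs: (old value, new value).
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return added, removed, changed

if __name__ == "__main__":
    added, removed, changed = diff_facts(FACTS_1999, FACTS_2000)
    print("added:", added)
    print("removed:", removed)
    print("changed:", changed)
```

Comparing companies in the same trade works the same way if the keys are grouped by attribute instead of by year; the hard part in practice is normalizing extracted values so that equal facts actually compare equal.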
2 Chances for Europe
Applying semi-automatic text mining opens up opportunities on several levels:
1 Informed Decisions: Results from our project may deliver critical information to
European businesses, helping them stay competitive and react quickly to new trends
and opportunities.
2 Individual Learning: The more time an individual can spend on understanding
interconnections, and the less time she spends searching for information and
testing hypotheses, the more she profits from the information technology that is
now at hand.
3 Research: Although our scenario addresses a particular business case, many research
questions may profit from semi-automatic text mining, too. Indeed, research hypotheses
may become easier to (pre-)test or even to generate (cf. Hearst (1999)).
All these factors are critical for developing a high potential, both of Europeans and for
Europeans. Informed decisions, faster learning and improved research all work together to
keep Europe competitive.
4 Partner Profile
We consider text mining to be a knowledge acquisition process that should be
supported by learning approaches and by techniques from information retrieval
and computational linguistics. Hence, the consortium includes people from these
different communities:
Prof. Dr. Studer holds a chair for knowledge management at Karlsruhe University. He
has carried out research and organized numerous activities in the fields of knowledge
acquisition, knowledge management and data mining for over 20 years.
Dr. Steffen Staab is a senior researcher and lecturer at Karlsruhe University. His research
interests include knowledge management, ontology engineering, information extraction
and data mining. He is currently the Karlsruhe project manager in GETESS
(http://www.getess.de), a project funded by the German government that aims to build an
information extraction system specific to the tourism domain.
Prof. Dr. Bonnie Webber...
Dr. Katja Markert....
Dr. Nicholas Kushmerick is a College Lecturer in the Department of Computer Science,
University College Dublin, Ireland. Dr. Kushmerick received his Ph.D. from the
University of Washington in 1997, and his dissertation was nominated for the ACM
Distinguished Dissertation award. Dr. Kushmerick has worked in the areas of planning,
machine learning, and information extraction, integration and retrieval. His work
has been published in several international journals, and he has served on the organizing
committees of numerous conferences and workshops. Dr. Kushmerick's current work
focuses on the use of machine learning to scale up knowledge engineering on the
Internet, in service of problems such as information extraction and the design of
intelligent browsing assistants.
Dr. Bernt Arild Bremdal studied Marine Technology in Trondheim, Norway. After
finishing his MSc at the NTNU, he received his PhD from the same university in 1988 for
work on the application of artificial intelligence and rule-based and object-oriented
programming in project planning. After affiliations with a variety of companies, he
co-founded CognIT a.s, which he directs. He is the author of more than 50 articles and
published reports on computer applications in engineering and industry, design and
planning, object-oriented technology and artificial intelligence. His most recent
publication is Braunschweig and Bremdal, "AI in the Petroleum Industry", Volume 2,
Edition Technip, 1996.
Dr. Robert Engels studied Artificial Intelligence, Psychology and (partly) Computer
Science at the University of Amsterdam, NL. He carried out his MSc thesis research on
applications of Inductive Logic Programming in Stockholm, Sweden. In 1999 he received his
PhD from Karlsruhe University for research in the area of Knowledge Discovery and Data
Mining. He has (co-)authored a variety of papers and organized several international and
national (German) workshops on practical applications of Data Mining. He is currently
affiliated with CognIT as a senior systems architect.
The work packages would be split along the following lines (bold face indicates
leadership for a particular work package). The areas covered are Knowledge Acquisition,
Computational Linguistics, Machine Learning and Information Retrieval; the partners
contribute as follows:
Univ. Karlsruhe: Ontology acquisition; Mining Information
Univ. Edinburgh: Information Extraction with Layout
Univ. College Dublin: Wrappers with Ontologies; Mining Information; Indexing and querying structured documents
Cognit: Ontology induction; Understanding XML Texts
5 Partner Addresses
Dr. Steffen Staab, Prof. Dr. Rudi Studer
Institute for Applied Computer Science and Formal Description Methods (AIFB),
Karlsruhe University, D-76128 Karlsruhe, Germany
http://www.aifb.uni-karlsruhe.de/WBS
mailto:staab@aifb.uni-karlsruhe.de,studer@aifb.uni-karlsruhe.de
Dr. Katja Markert, Prof. Dr. Bonnie Webber
Division of Informatics, University of Edinburgh, 80 South Bridge
Edinburgh EH1 1HN, Scotland
http://www.informatics.ed.ac.uk/research/irr/
mailto:markert@cogsci.ed.ac.uk,bonnie@dai.ed.ac.uk
Dr. Nicholas Kushmerick
Department of Computer Science, University College Dublin, Dublin 4, Ireland
http://www.cs.ucd.ie/staff/nick/
mailto:nick@ucd.ie
Dr. Robert Engels, Dr. Bernt Bremdal
Cognit a.s, P.B. 610, N-1754 Halden, Norway
http://www.cognit.no/
mailto:robert.engels@cognit.no,bernt.bremdal@cognit.no
