study or concern about what kinds of things exist
what entities there are in the universe.
The word "ontology" derives from the Greek onto (being) and logia (written or spoken discourse). It is a branch of metaphysics, the study of first principles or the root of things.
For efficient and innovative use of big data, it is important to integrate multiple databases across domains. For example, various public databases have been developed in life science, and finding novel scientific results using them is an essential technique. In social and business areas, open data strategies in many countries promote the diversity of public data, and combining big data with open data is a major challenge. That is, dataset diversity is a problem that must be solved for big data.
Ontology provides systematized knowledge for integrating multiple datasets across domains together with their semantics. Linked Data likewise provides techniques to interlink datasets based on Semantic Web technologies. We consider that combining ontology and Linked Data on the basis of ontological engineering can contribute to solving the diversity problem in big data.
In this talk, I discuss how ontological engineering could be applied to big data with some trial examples.
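As a toy illustration of the kind of ontology-mediated integration the talk describes, the sketch below merges two records that use different field names by mapping both names to a shared ontology term. The field names, mapping, and data are all invented for illustration:

```python
# Toy sketch of ontology-mediated integration: two datasets use different
# field names for the same concept; a small mapping to a shared ontology
# term lets their records be merged. All names here are invented examples.

ontology_map = {
    "gene_symbol": "gene",   # field name used by a life-science database
    "geneName": "gene",      # field name used by an open-data source
}

def normalize(record):
    """Rename fields to their shared ontology terms."""
    return {ontology_map.get(k, k): v for k, v in record.items()}

db_a = [{"gene_symbol": "TP53", "function": "tumor suppression"}]
db_b = [{"geneName": "TP53", "expression": "high"}]

# Merge records that normalize to the same ontology key.
merged = {}
for rec in map(normalize, db_a + db_b):
    merged.setdefault(rec["gene"], {}).update(rec)

print(merged["TP53"])
# -> {'gene': 'TP53', 'function': 'tumor suppression', 'expression': 'high'}
```

In practice the mapping would come from an ontology (e.g. expressed in OWL) rather than a hand-written dictionary, but the principle of aligning field semantics before merging is the same.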
Presentation given during the Computational Linguistics course at UPF (Universitat Pompeu Fabra), covering the following topic:
Information Extraction
Jerry R. Hobbs, University of Southern California Ellen Riloff, University of Utah
Introduction to Ontology Engineering with Fluent Editor 2014Cognitum
An introductory course on Ontology Engineering using Controlled Natural Language. Fluent Editor (FE) is a tool for editing and manipulating ontologies. Its main feature is that it uses controlled natural language (CNL) to communicate with the user; for human users, communicating in CNL is a more suitable alternative to XML-based OWL editors.
This is our presentation from the Third International Conference on Information Systems and Technologies (ICIST 2013), held in Tangier, Morocco, in which we propose a new approach for human assessment of ontologies using an online questionnaire.
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm... - Khirulnizam Abd Rahman
Application of Ontology in Semantic Information Retrieval
by Prof Shahrul Azman from FSTM, UKM
Presentation for MyREN Seminar 2014
Berjaya Hotel, Kuala Lumpur
27 November 2014
A Hybrid Approach to Word Sense Disambiguation With and With... - ijnlc
Word Sense Disambiguation is the classification of the meaning of a word in a precise context, a tricky task in Natural Language Processing that is used in applications such as machine translation, information extraction and retrieval, and automatic or closed-domain question answering systems, because of its semantic perceptiveness. Researchers have tried unsupervised and knowledge-based learning approaches, but such approaches have not proved very helpful. Various supervised learning algorithms have been developed, but in vain, as creating the training corpus, a tagged sense-marked corpus, is tricky. This paper presents a hybrid approach for resolving ambiguity in a sentence based on integrating lexical knowledge and world knowledge. The English WordNet developed at Princeton University, the SemCor corpus, and the JAWS library (Java API for WordNet Searching) have been used for this purpose.
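As a rough illustration of knowledge-based WSD, here is a minimal simplified-Lesk sketch in Python: pick the sense whose dictionary gloss shares the most words with the target word's context. The tiny sense inventory is hand-written for the example and stands in for the real WordNet/SemCor/JAWS stack the paper uses:

```python
# Simplified Lesk word sense disambiguation: choose the sense whose gloss
# has the largest word overlap with the target word's context.
# The sense inventory below is a made-up toy, not WordNet itself.

TOY_SENSES = {
    "bank": {
        "bank.n.01": "financial institution that accepts deposits and lends money",
        "bank.n.02": "sloping land beside a body of water such as a river",
    }
}

STOPWORDS = {"a", "an", "the", "of", "to", "and", "that", "such", "as"}

def tokenize(text):
    """Lowercase, split on whitespace, drop stopwords; return a word set."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def simplified_lesk(word, context):
    """Return the sense id whose gloss overlaps most with the context."""
    context_words = tokenize(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in TOY_SENSES[word].items():
        overlap = len(tokenize(gloss) & context_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk("bank", "He sat on the bank of the river watching the water"))
# -> bank.n.02
```

A real system would draw glosses and sense frequencies from WordNet and train or tune on SemCor, but the overlap-scoring core is the same.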
Ontology and Ontology Libraries: a Critical Study - Debashisnaskar
The concept of the digital library grew in popularity with the development of networking technology. A digital library stores various kinds of documents in digitized format, enabling users to access them smoothly at subsidized cost. In the recent past, a similar concept, the ontology library, has gained popularity among communities such as the Semantic Web, artificial intelligence, information science, philosophy, and linguistics.
Eamonn Maguire: The Open Source ISA Metadata Tracking Framework: From Data Cu... - GigaScience, BGI Hong Kong
Eamonn Maguire's talk on "The Open Source ISA Metadata Tracking Framework: From Data Curation and Management at the Source, to the Linked Data Universe" at ISCB-Asia, December 17th 2012
Model of Energy Generation in Plant by the Cells of The Leafs During the Nigh... - IJERD Editor
It is a known fact that plants generate energy using sunlight and that the intensity of the sun is drastically reduced from 18:00 to 6:00 (overnight). A mathematical model is presented to describe the process and the energy generated by the cells in the leaf of a plant during this period. The model equations are solved, with a graph showing the production level within the stated range of periods. This study assumes that the plant has already generated enough energy, both stored and used. The results show that the plant makes use of existing stored energy, reducing the stored energy level until the next day, when the energy level begins to increase.
A Novel Framework For Numerical Character Recognition With Zoning Distance Fe... - IJERD Editor
Advancements in computer technology have led every organization to implement automatic processing systems for its activities. One example is the recognition of handwritten characters, which has always been a challenging task in image processing and pattern recognition. In this paper we propose zone-based features for recognition of handwritten characters. In this zoning approach, a digit image is divided into 8x8 zones and the centre pixel is computed for each zone. This procedure is repeated sequentially for the entire image. Finally, features are extracted for classification and recognition.
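The zoning idea can be sketched as follows. This toy version computes a per-zone foreground-pixel density (a common, simpler stand-in for the paper's per-zone centre-pixel feature) on an invented 4x4 image split into 2x2 zones:

```python
# Zone-based feature extraction sketch: split a binary digit image into a
# grid of zones and take the foreground-pixel density of each zone as one
# feature. Density stands in here for the paper's per-zone centre pixel.

def zone_density_features(image, zones_per_side):
    """image: list of equal-length rows of 0/1 ints.
    Returns one density value per zone, row-major."""
    h, w = len(image), len(image[0])
    zh, zw = h // zones_per_side, w // zones_per_side
    features = []
    for zr in range(zones_per_side):
        for zc in range(zones_per_side):
            ink = sum(
                image[r][c]
                for r in range(zr * zh, (zr + 1) * zh)
                for c in range(zc * zw, (zc + 1) * zw)
            )
            features.append(ink / (zh * zw))
    return features

# A toy 4x4 "image" split into 2x2 zones.
img = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
print(zone_density_features(img, 2))  # -> [1.0, 0.0, 0.0, 0.75]
```

The resulting feature vector would then be fed to a classifier; with the paper's 8x8 grid this yields 64 features per digit image.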
ACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka - Stuart Chalk
Development of plugins for access to researchers identified in VIVO on the ScientistsDB website. Also developed a plugin to access Elasticsearch from within Eureka.
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ... - Stuart Chalk
An electronic laboratory notebook (ELN) can be characterized as a system that allows scientists to capture the data and resources used in performing scientific experiments. This allows users to easily organize and find their data; however, little information about the scientific process is recorded.
In this paper we highlight the current status of progress toward semantic representation of science in ELNs.
Top 5 MOST VIEWED LANGUAGE COMPUTING ARTICLE - International Journal on Natur... - kevig
Natural Language Processing is a programmed approach to analyzing text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and built software that analyzes, understands, and generates the languages humans use naturally to address computers.
Spark Summit Europe: Share and analyse genomic data at scale - Andy Petrella
Share and analyse genomic data
at scale with Spark, Adam, Tachyon & the Spark Notebook
Sharp intro to Genomics data
What are the Challenges
Distributed Machine Learning to the rescue
Projects: Distributed teams
Research: Long process
Towards Maximum Share for efficiency
Semantics for Bioinformatics: What, Why and How of Search, Integration and An... - Amit Sheth
Amit Sheth's Keynote at Semantic Web Technologies for Science and Engineering Workshop (held in conjunction with ISWC2003), Sanibel Island, FL, October 20, 2003.
Presentation for the BioAssist programmers' face-to-face, November 17, 2008, Utrecht, The Netherlands. BioAssist is a nation-wide bioinformatics support programme.
The Dendro research data management platform: Applying ontologies to long-ter... - João Rocha da Silva
It has been shown that data management should start as early as possible in the research workflow to minimize the risks of data loss. Given the large numbers of datasets produced every day, curators may be unable to describe them all, so researchers should take an active part in the process. However, since they are not data management experts, they must be provided with user-friendly but powerful tools to capture the context information necessary for others to interpret and reuse their datasets. In this paper, we present Dendro, a fully ontology-based collaborative platform for research data management. Its graph data model innovates in the sense that it allows domain-specific lightweight ontologies to be used in resource description, acting as a staging area for later deposit in long-term preservation solutions.
IDENTIFYING THE SEMANTIC RELATIONS ON UNSTRUCTURED DATA - ijistjournal
Ontologies have been applied to many applications in recent years, especially the Semantic Web, information retrieval, information extraction, and question answering. The purpose of a domain-specific ontology is to eliminate conceptual and terminological confusion. It accomplishes this by specifying a set of generic concepts that characterizes the domain, as well as their definitions and interrelationships. This paper describes algorithms for identifying semantic relations and constructing an Information Technology ontology while extracting concepts and objects from different sources. The ontology is constructed from three main resources: ACM, Wikipedia, and unstructured files from the ACM Digital Library. Our algorithms combine natural language processing and machine learning. We use NLP tools such as OpenNLP and the Stanford lexical dependency parser to explore sentences. We then extract these sentences based on English patterns in order to build a training set. We use a random sample from among 245 ACM categories to evaluate our results. The results show that our system yields superior performance.
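A minimal sketch of the pattern-based extraction step, assuming a single surface pattern ("X such as Y") in place of the paper's dependency-parse patterns; the pattern and example sentence are invented for illustration:

```python
import re

# Sketch of pattern-based semantic relation extraction: match a lexical
# surface pattern ("X such as Y") and emit an is-a relation. A single
# regex stands in for the paper's OpenNLP/Stanford-parser pipeline.

SUCH_AS = re.compile(r"(\w[\w ]*?)\s+such as\s+(\w[\w ]*)")

def extract_isa(sentence):
    """Return (hyponym, 'is-a', hypernym) triples found in the sentence."""
    relations = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1).split()[-1]   # head noun of the class phrase
        hyponym = match.group(2).split()[0]     # first named instance
        relations.append((hyponym, "is-a", hypernym))
    return relations

print(extract_isa("We use parsers such as OpenNLP to explore sentences"))
# -> [('OpenNLP', 'is-a', 'parsers')]
```

Taking the last word before the pattern as the class head and the first word after it as the instance is a crude heuristic; dependency parsing, as in the paper, identifies these constituents much more reliably.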
1. Learning for Biomedical Information Extraction with ILP Margherita Berardi Vincenzo Giuliano Donato Malerba
3. What is "Information Extraction"? Filling slots in a database from sub-segments of text. As a task: October 14, 2002, 4:00 a.m. PT For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access." Richard Stallman, founder of the Free Software Foundation, countered saying... Slots to fill: NAME, TITLE, ORGANIZATION
4. What is "Information Extraction"? The same text, after IE:
NAME | TITLE | ORGANIZATION
Bill Gates | CEO | Microsoft
Bill Veghte | VP | Microsoft
Richard Stallman | founder | Free Soft..
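The slot-filling task on these slides can be sketched with a few hand-written patterns. The regexes below are toy stand-ins for a real IE system's extraction rules, tailored to simplified versions of the slide's sentences:

```python
import re

# Toy slot filling for the NAME/TITLE/ORGANIZATION table on the slide:
# each pattern recognizes one phrasing and maps its groups to a table row.
# The patterns and the shortened text are illustrative, not a real system.

PATTERNS = [
    # "Microsoft Corporation CEO Bill Gates" -> (Bill Gates, CEO, Microsoft)
    (re.compile(r"(\w+) Corporation (CEO) (\w+ \w+)"),
     lambda m: (m.group(3), m.group(2), m.group(1))),
    # "Bill Veghte, a Microsoft VP" -> (Bill Veghte, VP, Microsoft)
    (re.compile(r"(\w+ \w+), a (\w+) (VP)"),
     lambda m: (m.group(1), m.group(3), m.group(2))),
    # "Richard Stallman, founder of the Free Software Foundation"
    (re.compile(r"(\w+ \w+), (founder) of the ([\w ]+)"),
     lambda m: (m.group(1), m.group(2), m.group(3))),
]

def fill_slots(text):
    """Return (NAME, TITLE, ORGANIZATION) rows extracted from the text."""
    rows = []
    for pattern, to_row in PATTERNS:
        for match in pattern.finditer(text):
            rows.append(to_row(match))
    return rows

text = ("Microsoft Corporation CEO Bill Gates railed against open source. "
        "'That is a shift for us,' said Bill Veghte, a Microsoft VP. "
        "Richard Stallman, founder of the Free Software Foundation, countered.")
for row in fill_slots(text):
    print(row)
```

Hand-written patterns like these are exactly what the later slides contrast with machine-learned extraction rules: they work on matching phrasings but generalize poorly.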
15. … the learning strategy … Example: parallel search for the predicates even and odd. Seeds: even(0), odd(1). The simplest consistent clauses are found first, independently of the predicates to be learned.
16. … the learning strategy … Example: parallel search for the predicates even and odd. Seeds: even(2), odd(1). A predicate dependency is discovered!
even(X) ← succ(Y,X)
even(X) ← succ(X,Y)
odd(X) ← succ(Y,X)
odd(X) ← succ(X,Y)
even(X) ← succ(Y,X), succ(Z,Y)
odd(X) ← succ(Y,X), even(Y)
odd(X) ← succ(Y,X), zero(Y)
even(X) ← succ(X,Y), succ(Y,Z)
23. Textual portions of papers were categorized into five classes: Abstract, Introduction, Materials & Methods, Discussion, and Results. The abstract of each paper was processed. Avg. no. of categories correctly classified.
First I will introduce the peculiarities of SDM (spatial data mining). They are particularly interesting because the practice of geo-referencing data has caused a growing demand for powerful exploratory data analysis techniques that overcome classical statistical and data mining techniques and, among other things, support the analysis of socio-economic phenomena from a spatial point of view. In this talk I will focus my attention on a specific task, the discovery of spatial association rules. For this purpose I will present ARES, a system to extract association rules from census data, and illustrate an application of ARES to mine spatial association rules on North West England 1998 census data in order to study the mortality risk in the Greater Manchester county.
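The core rule measures behind a system like ARES can be sketched as plain support and confidence computations; the "census areas" and their attributes below are invented for illustration:

```python
# Minimal support/confidence computation for an association rule, the basic
# measures behind association rule miners such as ARES. Each "area" is a
# set of attributes; the data and attribute names are invented examples.

areas = [
    {"high_unemployment", "high_mortality", "urban"},
    {"high_unemployment", "high_mortality"},
    {"high_unemployment", "urban"},
    {"low_unemployment", "urban"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """P(consequent | antecedent) estimated from the transactions."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

# Candidate rule: high_unemployment -> high_mortality
print(support({"high_unemployment", "high_mortality"}, areas))          # -> 0.5
print(confidence({"high_unemployment"}, {"high_mortality"}, areas))     # ~0.667
```

Spatial association rule mining adds spatial predicates (adjacency, containment, distance) to the itemsets, but the support/confidence filtering works the same way.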
What is IE? As a task it is: starting with some text and an empty database with a defined ontology of fields and records, use the information in the text to fill the database.
ML… although this is an area where ML has not yet trounced hand-built systems. In some of the latest evaluations, a hand-built system shared 1st place with an ML system. Many companies are now making a business from IE (from the Web): WasBang, Inxight, Intelliseek, ClearForest.
Data sparseness, robustness
CV, i.e. the data is divided into 5 folds (four are used for training and one for testing, in turn).
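The 5-fold scheme in this note can be sketched as follows; each fold is held out for testing exactly once while the other four form the training set:

```python
# Sketch of k-fold cross-validation: partition the items into k folds,
# then yield (train, test) pairs where each fold is the test set once.

def k_fold_splits(items, k=5):
    folds = [items[i::k] for i in range(k)]  # round-robin fold assignment
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

data = list(range(10))
for train, test in k_fold_splits(data, k=5):
    print(len(train), len(test))  # prints "8 2" on each of the 5 rounds
```

Shuffling the items before splitting is usual in practice so that folds are not biased by the original ordering.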
Initial ILP research dealt with concept learning in the form of predicate definition learning.
ATRE is a multiple-concept learning system, which solves the following problem:
Since the generation of a clause depends on the chosen seed, several seeds have to be chosen such that at least one seed per incomplete predicate definition is kept. Therefore, the search space is actually a forest of as many search trees as the number of chosen seeds. Consider the parallel exploration of the forest for the odd and even numbers. Specialization hierarchies are traversed top-down. The search proceeds towards deeper and deeper levels of the specialization hierarchies until at least a user-defined number of consistent clauses is found. A supervisor task decides whether the search should carry on or not on the basis of the results returned by the concurrent tasks. When the search is stopped, the supervisor selects the "best" consistent clause according to the user's preference criterion. This strategy has the advantage that simpler consistent clauses are found first, independently of the predicates to be learned. First learning step: consistent clauses in red.
Second learning step
If we guarantee the following two conditions: ... then after a finite number of steps a theory T, which is complete and consistent, is built. If we denote by LHM(T_i) the least Herbrand model of a theory T_i, the stepwise construction of theories entails that LHM(T_i) ⊆ LHM(T_{i+1}), for each i ∈ {0, 1, ..., n-1}, since the addition of a clause to a theory can only augment the LHM.
In order to guarantee the first of the two conditions it is possible to proceed as follows. First, a positive example e+ of a predicate p to be learned is selected, such that e+ is not in LHM(T_i). The example e+ is called the seed. Then the space of definite clauses more general than e+ is explored, looking for a clause C, if any, such that neg(LHM(T_i ∪ {C})) = ∅. In this way we guarantee that the second condition above holds as well. When found, C is added to T_i, giving T_{i+1}. If some positive examples are not included in LHM(T_{i+1}), then a new seed is selected and the process is repeated. The second condition is more difficult to guarantee because of the non-monotonicity property. The approach followed in ATRE to remove inconsistency due to the addition of a clause to the theory consists of simple syntactic changes in the theory, which eventually create new layers. The layering of a theory introduces a first variation of the classical separate-and-conquer strategy sketched above, since the addition of a locally consistent clause generated in the conquer stage is preceded by a global consistency check.
Learning multi-relational patterns from multi-relational data and background knowledge. It allows one to navigate the relational structure of the data.