© 2015 IBM Corporation
About Item’s Priority
During extraction, an item’s priority is determined by the character immediately to its left:
Priority 1: The left-hand character is an important punctuation mark.
Priority 2: The left-hand character is a normal punctuation mark.
Priority 3: The left-hand character is not a defined punctuation mark (it may be part of a word).
[Figure: examples of important punctuation and normal punctuation]
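The priority rule can be sketched in a few lines. This is only an illustration: the two punctuation sets below are hypothetical (the slide's figure listing them did not survive), and skipping spaces before inspecting the left-hand character is an assumption, as is treating an item at the very start of the text as priority 1.

```python
# Hypothetical punctuation sets; the real lists were in the slide's figure.
IMPORTANT_PUNCTUATION = {"\n", ".", ":", ";"}
NORMAL_PUNCTUATION = {",", "(", ")", "-"}


def item_priority(text, start):
    """Return 1, 2 or 3 for the item that begins at text[start]."""
    i = start - 1
    # Assumption: spaces and tabs between the item and its left neighbour are ignored.
    while i >= 0 and text[i] in " \t":
        i -= 1
    if i < 0:
        # Assumption: an item at the start of the document gets the top priority.
        return 1
    left = text[i]
    if left in IMPORTANT_PUNCTUATION:
        return 1
    if left in NORMAL_PUNCTUATION:
        return 2
    return 3
```

For example, an item that directly follows a line break would get priority 1 under these assumed sets, while one glued to the end of a word would get priority 3.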
About Item’s Key and Prefix
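For an item:
Prefix: the item’s list mark (an item may not have one).
Key: the item’s sequence mark within a list.
Example prefixes (from the slide’s figure): Section, Chapter, 1., 1.1., 2-, 2-2-, …
Example keys (from the slide’s figure): 2, A, 1, 3, 2, 5, Ii, …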
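One way to picture the prefix/key split is a small regular expression. The pattern and the function name `split_item` are hypothetical, and the example inputs are one reading of the slide's figure, not the module's actual rules.

```python
import re

# Hypothetical pattern: an optional prefix (a word such as "Section", or a run of
# "1." / "2-" style levels) followed by a key (digits, a letter, or a roman numeral).
ITEM_RE = re.compile(
    r"^(?P<prefix>[A-Za-z]+\s+|(?:\d+[.\-])+)?(?P<key>\d+|[A-Za-z]+)[.\-]?$"
)


def split_item(mark):
    """Split a list mark into (prefix, key); the prefix is '' when absent."""
    m = ITEM_RE.match(mark.strip())
    if m is None:
        return None
    return (m.group("prefix") or "").strip(), m.group("key")
```

Under this sketch, `split_item("Section 2")` yields the prefix `"Section"` and the key `"2"`, and `split_item("2-2-5")` yields the prefix `"2-2-"` and the key `"5"`.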
About Tree & Linear Chain
1. Items: Titles extracted from the input document by regular expression.
[Figure: example items A, 1.1, 2, ……, 2.3]
2. Tree: Potential assignments of items, based on the items’ key sequence, format and prefix.
[Figure: example tree with nodes A, B, C and D, highlighting subsections]
3. Linear Chain: a tree in which each node has only one branch.
[Figure: example chain 1.1, 1.2, 1.3]
Note: in the tree-building phase, one child can have many parents. After the pruning phase, each child has exactly one parent, i.e. the result is a linear chain.
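The difference between a tree and a linear chain can be made concrete with a tiny data structure. The `Node` class and both helpers are hypothetical names, and reading "all linear chains" as all root-to-leaf paths is an assumption, not something the slides state.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def is_linear_chain(root):
    """True when every node on the way down has at most one child."""
    node = root
    while node.children:
        if len(node.children) > 1:
            return False
        node = node.children[0]
    return True


def linear_chains(root):
    """Enumerate every root-to-leaf path as a list of labels (assumed reading)."""
    if not root.children:
        return [[root.label]]
    return [
        [root.label] + tail
        for child in root.children
        for tail in linear_chains(child)
    ]
```

A tree whose root has two children is not a linear chain, but each of its root-to-leaf paths is one.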
Extraction
Input: Document Converter (.XML file)
Extraction Module (based on RegExp or style information):
1. Section Pattern
2. Patent Pattern
3. Multilevel Pattern
4. Item Pattern
5. TOC Pattern
6. Sort and remove overlaps
Steps 1-5 include a fast filter that checks each item’s continuity.
Output: Build Forest
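The "sort and remove overlaps" step could look roughly like the following. The tuple layout `(start, end, pattern_rank)` and the tie-breaking policy (earliest start, then longest span, then lowest pattern number) are assumptions made for illustration; the slides do not say how conflicts between patterns are resolved.

```python
def sort_and_remove_overlaps(matches):
    """Keep a non-overlapping subset of (start, end, pattern_rank) matches.

    `end` is exclusive. Assumed policy: earliest start wins; among matches
    with the same start, the longest span wins, then the lowest pattern rank.
    """
    ordered = sorted(matches, key=lambda m: (m[0], -(m[1] - m[0]), m[2]))
    kept = []
    last_end = -1
    for start, end, rank in ordered:
        if start >= last_end:  # does not overlap anything already kept
            kept.append((start, end, rank))
            last_end = end
    return kept
```

A greedy left-to-right pass like this is simple and deterministic, which is usually enough when the candidate spans come from a handful of regular expressions.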
Build Forest
Input: Items from Extractor
1. Detect Subsections: detect subsections and build potential trees.
2. The Forest (Priority 1) and The Low Forest (Priority 2 and 3): build the forests according to each tree’s priority. Items in the low forest are dropped or added depending on the result of the in-line list check.
3. The Forest
4. Prune Forest
Output
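The split into a forest and a low forest can be sketched as a simple partition by priority. Everything here is hypothetical: the function names, the `(priority, tree)` pair layout, and in particular the direction of the in-line list check (this sketch assumes that low-forest trees flagged as in-line lists are dropped and the rest are added), which the slide does not spell out.

```python
def build_forests(trees):
    """Partition (priority, tree) pairs into (forest, low_forest)."""
    forest = [tree for priority, tree in trees if priority == 1]
    low_forest = [tree for priority, tree in trees if priority in (2, 3)]
    return forest, low_forest


def merge_low_forest(forest, low_forest, is_inline_list):
    """Add low-forest trees to the forest unless the check flags them.

    Assumption: a tree recognised as an in-line list is dropped.
    """
    return forest + [tree for tree in low_forest if not is_inline_list(tree)]
```

The merged list then plays the role of "The Forest" that is handed to the pruning stage.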
Prune Forest (detect structure.java)
Input: The Forest
1. Linear Chains: get all linear chains.
2. Valid Linear Chains: verify and validate the chains (filters).
3. Prune with Linear Chains: prune the trees in The Forest with the linear chains.
Output
4. Arrange the output into a hierarchy.
5. Clause Extraction & Filtering
6. Create Clause Annotation
Output for the user
[Figure: flow diagram; part of the flow is labeled "Iteratively"]
Prune Theory
1. The linear chain ends before the tree’s start: nothing to prune.
2. The linear chain starts before the tree’s start.
3. The tree starts before the linear chain’s start.
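The three cases above can be told apart by comparing positions. In this sketch the positions are assumed to be document offsets, the function name `prune_case` is hypothetical, and a chain that starts at exactly the tree's start is (arbitrarily) grouped with case 3, since the slide does not cover that tie.

```python
def prune_case(chain_start, chain_end, tree_start):
    """Return which of the three pruning cases applies (1, 2 or 3)."""
    if chain_end < tree_start:
        return 1  # chain ends before the tree starts: nothing to prune
    if chain_start < tree_start:
        return 2  # chain starts before the tree starts
    # Tree starts before the chain does. Assumption: ties also land here.
    return 3
```

What pruning actually does in cases 2 and 3 is not described on the slide, so the sketch stops at classification.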

Structure Detection & Clause Extraction Module
