“A Novel Text Detection System Based on Character and Link Energies”
Presented by: Arun Patel
Roll No.: 15EC65R18
M.Tech 1st year VIPES, IIT Kharagpur
Algorithm
• This algorithm can detect most text objects under a variety of conditions, including different lighting, different colors, complex backgrounds and low-contrast text.
• The method is robust to the font, size, color and orientation of text and discriminates text objects from other objects effectively.
Fig. 1 Algorithm
Initialization of Candidate Text Objects
• Localize the candidate parts.
• Euler number
• Let $v_i$ and $v_j$ be two candidate parts with widths $W_{v_i}$ and $W_{v_j}$, heights $H_{v_i}$ and $H_{v_j}$, and centroids $C_{v_i}$ and $C_{v_j}$. A candidate link is formed between them if
  $\mathrm{dist}(C_{v_i}, C_{v_j}) \le w_d \cdot \min\left(\max(W_{v_i}, H_{v_i}), \max(W_{v_j}, H_{v_j})\right)$
• Finally, the candidate character parts that are reachable from one another via one or more links are grouped to form a candidate text object.

Fig. 2 Initialization of candidate text objects
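The linking rule and the grouping step can be sketched as below. This is a minimal illustration, not the authors' implementation: each candidate part is reduced to a hypothetical `Part` record (centroid, width, height), the weight `wd` is left as a free parameter because its value is not given here, and "reachable via one or more links" is implemented with a union-find (disjoint-set) structure.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Part:
    """Hypothetical candidate part: centroid (cx, cy) and bounding-box size."""
    cx: float
    cy: float
    w: float
    h: float


def is_linked(a: Part, b: Part, wd: float) -> bool:
    """Link test: dist(Ca, Cb) <= wd * min(max(Wa, Ha), max(Wb, Hb))."""
    dist = math.hypot(a.cx - b.cx, a.cy - b.cy)
    return dist <= wd * min(max(a.w, a.h), max(b.w, b.h))


def group_candidates(parts: List[Part], wd: float) -> List[List[int]]:
    """Group part indices that are reachable from one another via links."""
    parent = list(range(len(parts)))

    def find(x: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(parts)):
        for j in range(i + 1, len(parts)):
            if is_linked(parts[i], parts[j], wd):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(parts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    letters = [Part(0, 0, 10, 15), Part(12, 0, 10, 15), Part(24, 0, 10, 15), Part(200, 0, 10, 15)]
    print(group_candidates(letters, wd=1.0))  # [[0, 1, 2], [3]]
```

Parts 0 and 2 are not linked directly, but both link to part 1, so all three end up in one candidate text; the distant part 3 stays on its own.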
Character Features
• One important characteristic that distinguishes text from other objects is that characters are made up of strokes of approximately uniform thickness, which produces two nearly parallel sets of edges along their boundaries.
• The two edge sets are highly similar in length, orientation and curvature.
• The similarity of the two stroke edges is captured by the gradient vector at each boundary point.
Fig. 3 (a) Edge pairs of strokes; (b) gradient vectors of ‘R’
• Because a character has two nearly parallel edge sets, the gradients at an edge point and at its corresponding point should have approximately opposite directions.
• The distances between edge points and their corresponding points are similar, because stroke width usually changes little.
Fig. 4 Corresponding pairs and links
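The slides do not say how a corresponding point is located. One common choice, borrowed from the stroke width transform (SWT) and assumed here rather than taken from [1], is to walk from an edge pixel along its gradient direction until another edge pixel is hit. The sketch uses plain nested lists for the edge map and gradient angles; `polarity` is a hypothetical parameter that flips the walking direction for dark-on-light versus light-on-dark text.

```python
import math
from typing import List, Optional, Tuple


def find_corresponding_point(
    edges: List[List[bool]],
    theta: List[List[float]],
    r: int,
    c: int,
    polarity: int = 1,
    max_steps: int = 100,
) -> Optional[Tuple[int, int]]:
    """Walk from edge pixel (r, c) along its gradient direction.

    theta holds gradient angles in radians (x = column axis, y = row axis).
    polarity=+1 follows the gradient, -1 walks against it. Returns the first
    other edge pixel met, or None if the ray leaves the image first.
    """
    dx = polarity * math.cos(theta[r][c])
    dy = polarity * math.sin(theta[r][c])
    rows, cols = len(edges), len(edges[0])
    for step in range(1, max_steps + 1):
        rr = int(round(r + step * dy))
        cc = int(round(c + step * dx))
        if not (0 <= rr < rows and 0 <= cc < cols):
            return None
        if (rr, cc) != (r, c) and edges[rr][cc]:
            return rr, cc
    return None


if __name__ == "__main__":
    # Synthetic vertical stroke: left edge at column 2, right edge at column 6,
    # with gradients pointing into the stroke from both sides.
    edges = [[col in (2, 6) for col in range(10)] for _ in range(10)]
    theta = [[0.0 if col == 2 else math.pi for col in range(10)] for _ in range(10)]
    p = (5, 2)
    q = find_corresponding_point(edges, theta, *p)
    print(q, "stroke width:", math.dist(p, q))  # (5, 6) stroke width: 4.0
```

In this toy stroke the two gradients are exactly opposite (0 and $\pi$), and the distance between the pair gives the local stroke width.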
Average Angle Difference of Corresponding Pairs ($D_{angle}$)
• Let $N$ denote the number of edge points of a candidate part, and let $P^{(i)}$ ($1 \le i \le N$) be the $i$-th edge point, with corresponding point $P^{(i)}_{corr}$. The difference of the gradient directions of the corresponding pair $(P^{(i)}, P^{(i)}_{corr})$ is defined as:
  $d_{angle}^{(i)} = \left|\theta_{P^{(i)}} - \theta_{P^{(i)}_{corr}}\right|$
• $D_{angle}$ is the average gradient-direction difference over all corresponding pairs of a candidate part, normalized by $\pi$:
  $D_{angle} = \frac{1}{N\pi} \sum_{i=1}^{N} d_{angle}^{(i)}$
• For an ideal character, the gradients of every corresponding pair are exactly opposite ($d_{angle}^{(i)} = \pi$), so $D_{angle}$ reaches its maximum value of 1.
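A direct transcription of the $D_{angle}$ formula follows. One assumption is added: the raw difference $|\theta_{P} - \theta_{P_{corr}}|$ is wrapped into $[0, \pi]$, so that angles reported in $(-\pi, \pi]$ cannot produce a difference larger than $\pi$ and push $D_{angle}$ above 1.

```python
import math
from typing import Sequence, Tuple


def angle_difference(theta_p: float, theta_q: float) -> float:
    """|theta_p - theta_q| wrapped into [0, pi] (the wrapping is an added assumption)."""
    d = abs(theta_p - theta_q) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def d_angle(pairs: Sequence[Tuple[float, float]]) -> float:
    """D_angle = (1 / (N * pi)) * sum of d_angle(i) over the N corresponding pairs."""
    if not pairs:
        raise ValueError("a candidate part needs at least one corresponding pair")
    total = sum(angle_difference(tp, tq) for tp, tq in pairs)
    return total / (len(pairs) * math.pi)


if __name__ == "__main__":
    ideal = [(0.0, math.pi), (math.pi / 2, -math.pi / 2)]  # exactly opposite gradients
    noisy = ideal + [(0.0, 0.9 * math.pi)]                 # one slightly deformed pair
    print(d_angle(ideal))             # 1.0
    print(round(d_angle(noisy), 4))   # 0.9667
```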
Fraction of Non-Noise Pairs ($F_{non\text{-}noise}$)
• In some cases, however, a character may have smaller $d_{angle}^{(i)}$ values because of noise or deformation. $F_{non\text{-}noise}$ measures the noise and deformation level of a part based on $d_{angle}^{(i)}$:
  $F_{non\text{-}noise} = \frac{1}{N} \sum_{i=1}^{N} h\left(d_{angle}^{(i)}, \beta\right)$
• $h\left(d_{angle}^{(i)}, \beta\right) = 1$ if $d_{angle}^{(i)} > \beta$, and 0 otherwise.
• $F_{non\text{-}noise}$ is thus the fraction of all pairs whose angle difference $d_{angle}^{(i)}$ is greater than $\beta$.
Fig. 5 Noise connections and non-noise connections (Ref. [1])
                           R       A       C       E
$D_{angle}$                0.889   0.865   0.925   0.897
$F_{non\text{-}noise}$     0.754   0.684   0.897   0.806
Fig. 6 $D_{angle}$ and $F_{non\text{-}noise}$ (Ref. [1])
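$F_{non\text{-}noise}$ is a thresholded count over the same per-pair differences $d_{angle}^{(i)}$. In the sketch below $\beta$ is a free parameter, since no value is given here; the $\beta = \pi/2$ used in the example is purely illustrative.

```python
import math
from typing import List, Sequence


def noise_labels(d_angles: Sequence[float], beta: float) -> List[bool]:
    """h(d_angle(i), beta): True (non-noise) if d_angle(i) > beta, else False (noise)."""
    return [d > beta for d in d_angles]


def f_non_noise(d_angles: Sequence[float], beta: float) -> float:
    """F_non-noise = (1/N) * sum of h(d_angle(i), beta)."""
    if not d_angles:
        raise ValueError("need at least one corresponding pair")
    return sum(noise_labels(d_angles, beta)) / len(d_angles)


if __name__ == "__main__":
    # Three pairs with near-opposite gradients and one noisy pair.
    d = [math.pi, 0.95 * math.pi, 0.2 * math.pi, math.pi]
    print(f_non_noise(d, beta=math.pi / 2))  # 0.75
```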
• The non-noise connections are divided into two types: stroke-length connections and stroke-width connections.
• This split makes it possible to separate circle-like objects from characters and to compute the stroke-width feature vector.
• Let $k^{(i)}$ ($1 \le i \le N$) be one of the $N$ non-noise connections of a part, and let $I_k^{(i)}$ be the number of its intersections with other non-noise connections. The two connection types are defined as:
  $k^{(i)} \in$ stroke-length connections if $I_k^{(i)} / N > T_{IS}$, and $k^{(i)} \in$ stroke-width connections otherwise.
• In a circle, every connection intersects all other connections at the center, so all non-noise connections of a circle are stroke-length connections.
• Characters have many more stroke-width connections than non-characters.

Fig. 7 Percentages of stroke-width links in two example images (Ref. [1])
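The stroke-length / stroke-width split can be sketched by treating each non-noise connection as a line segment from an edge point to its corresponding point and counting pairwise crossings with a standard orientation-based segment intersection test. The threshold `t_is` stands in for $T_{IS}$, whose value is not given here; 0.5 is a hypothetical default. The example reproduces the circle argument above: every diameter of a circle crosses all the others, whereas the parallel connections across a straight bar never cross.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def _orient(a: Point, b: Point, c: Point) -> int:
    """Sign of the cross product (b - a) x (c - a): +1, -1, or 0 if collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _on_segment(a: Point, b: Point, c: Point) -> bool:
    """For a point c collinear with ab, True if c lies between a and b."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(s: Segment, t: Segment) -> bool:
    """Standard orientation test for two closed line segments."""
    (p1, p2), (q1, q2) = s, t
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear cases: an endpoint of one segment lies on the other.
    return ((o1 == 0 and _on_segment(p1, p2, q1)) or (o2 == 0 and _on_segment(p1, p2, q2))
            or (o3 == 0 and _on_segment(q1, q2, p1)) or (o4 == 0 and _on_segment(q1, q2, p2)))


def classify_connections(conns: List[Segment], t_is: float = 0.5) -> List[str]:
    """Label connection k(i) 'stroke-length' if I_k(i) / N > T_IS, else 'stroke-width'."""
    n = len(conns)
    labels = []
    for i, s in enumerate(conns):
        hits = sum(1 for j, t in enumerate(conns) if j != i and segments_intersect(s, t))
        labels.append("stroke-length" if hits / n > t_is else "stroke-width")
    return labels


def stroke_width_fraction(labels: List[str]) -> float:
    """Share of stroke-width connections (the percentage shown in Fig. 7, as a fraction)."""
    return labels.count("stroke-width") / len(labels)


if __name__ == "__main__":
    # Circle of radius 5: four diameters, all crossing at the center.
    circle = [((5 * math.cos(a), 5 * math.sin(a)), (-5 * math.cos(a), -5 * math.sin(a)))
              for a in (k * math.pi / 4 for k in range(4))]
    # Straight bar: five parallel horizontal connections that never cross.
    bar = [((0.0, float(y)), (3.0, float(y))) for y in range(5)]
    print(stroke_width_fraction(classify_connections(circle)))  # 0.0
    print(stroke_width_fraction(classify_connections(bar)))     # 1.0
```

Each diameter crosses the other three, so $I_k^{(i)}/N = 3/4$ exceeds the assumed threshold; the bar's connections have no crossings and are all stroke-width connections.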
Vector of Stroke Width ($V_{width}$)
• The stroke-width vector is defined as $V_{width} = [w_d^{(1)}, w_d^{(2)}]$.
• Characters typically have one or two dominant stroke widths, depending on their font.
• Each dominant stroke width $w_d^{(i)}$ is estimated by a weighted average of $w_p^{(i)}$ and its two immediately adjacent neighbors:
  $w_d^{(i)} = \frac{r_1 \left(w_p^{(i)} - 1\right) + w_p^{(i)} + r_2 \left(w_p^{(i)} + 1\right)}{r_1 + 1 + r_2}$
Fig. 8 Histogram of the lengths of stroke-width connections (Ref. [1])
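The weighted average can be sketched as follows. The slides do not define $r_1$, $r_2$ or how the peaks $w_p^{(i)}$ are chosen, so two assumptions are made here and should be checked against [1]: $w_p^{(1)}$ and $w_p^{(2)}$ are the two strongest local maxima of an integer histogram of stroke-width connection lengths (as in Fig. 8), and $r_1$, $r_2$ are the counts of the bins just below and above the peak divided by the peak count, which makes $w_d^{(i)}$ the count-weighted mean of the three bins. When only one peak exists, it is repeated (also an assumption).

```python
from collections import Counter
from typing import Dict, List, Sequence


def refine_peak(hist: Dict[int, int], wp: int) -> float:
    """w_d = (r1*(wp-1) + wp + r2*(wp+1)) / (r1 + 1 + r2).

    ASSUMPTION: r1 and r2 are the neighbor-bin counts divided by the peak count.
    """
    peak = hist[wp]
    r1 = hist.get(wp - 1, 0) / peak
    r2 = hist.get(wp + 1, 0) / peak
    return (r1 * (wp - 1) + wp + r2 * (wp + 1)) / (r1 + 1 + r2)


def stroke_width_vector(widths: Sequence[float]) -> List[float]:
    """V_width = [w_d(1), w_d(2)] from the lengths of stroke-width connections."""
    if not widths:
        raise ValueError("need at least one stroke-width connection")
    hist = Counter(round(w) for w in widths)  # missing bins read as 0
    # Local maxima (leftmost bin of a plateau), strongest first.
    peaks = [w for w in hist if hist[w] > hist[w - 1] and hist[w] >= hist[w + 1]]
    peaks.sort(key=lambda w: hist[w], reverse=True)
    v = [refine_peak(hist, w) for w in peaks[:2]]
    if len(v) == 1:
        v.append(v[0])  # single dominant width: repeat it (assumption)
    return v


if __name__ == "__main__":
    lengths = [2] * 5 + [3] * 10 + [4] * 5 + [7] * 6 + [8] * 3
    print([round(x, 3) for x in stroke_width_vector(lengths)])  # [3.0, 7.333]
```

The symmetric neighbors around 3 leave that peak unchanged, while the one-sided neighbor at 8 pulls the second estimate above 7.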
Character Energy
• For a part $v_i$, $D_{angle}^{(i)}$ and $F_{non\text{-}noise}^{(i)}$ are considered equally important for text detection, and the character energy $E_{char}^{(i)}$ of $v_i$ is defined as:
  $E_{char}^{(i)} = \frac{D_{angle}^{(i)} + F_{non\text{-}noise}^{(i)}}{2}, \quad 0 \le E_{char}^{(i)} \le 1$
• $E_{char}^{(i)}$ can be treated as a measure of the probability that $v_i$ is a character.
• Characters have larger $E_{char}$ than other objects, so it discriminates text objects from non-text objects, and it is robust to the font, size, color and orientation of characters.
• $D_{angle}^{(i)}$ and $F_{non\text{-}noise}^{(i)}$ are correlated.
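Character energy is a plain average of the two features. The sketch below applies it to the $D_{angle}$ and $F_{non\text{-}noise}$ values listed for R, A, C and E in Fig. 6; the resulting energies are computed here from those values, not quoted from [1].

```python
def character_energy(d_angle: float, f_non_noise: float) -> float:
    """E_char = (D_angle + F_non-noise) / 2, with both inputs in [0, 1]."""
    if not (0.0 <= d_angle <= 1.0 and 0.0 <= f_non_noise <= 1.0):
        raise ValueError("D_angle and F_non-noise must lie in [0, 1]")
    return (d_angle + f_non_noise) / 2


if __name__ == "__main__":
    # (D_angle, F_non-noise) pairs from Fig. 6.
    fig6 = {"R": (0.889, 0.754), "A": (0.865, 0.684), "C": (0.925, 0.897), "E": (0.897, 0.806)}
    for ch, (d, f) in fig6.items():
        print(ch, round(character_energy(d, f), 4))
```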
Fig. 9 Two characters, (a) and (b), with different noise/deformation levels (Ref. [1])

       $D_{angle}$   $F_{non\text{-}noise}$   $E_{char}$
(a)    0.8846        0.5950                   0.5950
(b)    0.8847        0.5261                   0.7054
Fig. 10 Character energy (Ref. [1])
Link Energy
• Link energy is computed for every candidate link to measure the probability that the two parts connected by the link are both characters.
• Link energy is computed from two measures:
  1. Similarity in the properties of neighboring parts, such as color, stroke width and size.
  2. Spatial consistency in the direction and distance between neighboring parts in a string of parts.
• For two connected parts $v_i$ and $v_j$, color, stroke width ($V_{width}$), character width and character height are used to capture their similarity:
  $E_{link}^{(i,j)} = \sum_{k=1}^{4} w_k \, S_{i,j}^{(k)}, \quad w_k = 0.25$
  i.e. the mean of the four similarities $S_{i,j}^{(k)}$.
• The higher $E_{link}^{(i,j)}$, the more similar the two parts.
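Similarity Computation of Two Characters
Fig. 11 Link energy (Ref. [1])
• Color:
  $S_{i,j}^{(1)} = \frac{1}{3} \sum_{C \in \{R,G,B\}} \left(1 - \frac{|C_i - C_j|}{255}\right)$
• Stroke width ($V_{width}$):
  $S_{i,j}^{(2)} = \frac{1}{2} \sum_{k=1}^{2} \mathrm{Simi}\left(R_{i,j}^{V}(k)\right), \quad R_{i,j}^{V}(k) = \frac{V_i(k)}{V_j(k)}$
• Character width:
  $S_{i,j}^{(3)} = \mathrm{Simi}\left(R_{i,j}^{W}\right), \quad R_{i,j}^{W} = \frac{W_i}{W_j}$
• Character height:
  $S_{i,j}^{(4)} = \mathrm{Simi}\left(R_{i,j}^{H}\right), \quad R_{i,j}^{H} = \frac{H_i}{H_j}$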
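The four similarities and the link energy can be combined as below. $\mathrm{Simi}(\cdot)$ is not defined on these slides; the sketch assumes $\mathrm{Simi}(R) = \min(R, 1/R)$, which is 1 when the two measurements are equal and falls toward 0 as they diverge. The `PartFeatures` record is hypothetical, and the weights default to $w_k = 0.25$.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class PartFeatures:
    """Hypothetical per-part features used by the link energy."""
    color: Tuple[float, float, float]  # mean (R, G, B), each in 0..255
    v_width: Tuple[float, float]       # V_width = [w_d(1), w_d(2)]
    width: float
    height: float


def simi(a: float, b: float) -> float:
    """ASSUMED Simi(R) = min(R, 1/R) for R = a / b, written as min/max to avoid dividing by 0."""
    hi = max(a, b)
    return min(a, b) / hi if hi > 0 else 1.0


def similarities(p: PartFeatures, q: PartFeatures) -> Tuple[float, float, float, float]:
    """S^(1)..S^(4): color, stroke width, character width, character height."""
    s1 = sum(1 - abs(ci - cj) / 255 for ci, cj in zip(p.color, q.color)) / 3
    s2 = sum(simi(vi, vj) for vi, vj in zip(p.v_width, q.v_width)) / 2
    s3 = simi(p.width, q.width)
    s4 = simi(p.height, q.height)
    return s1, s2, s3, s4


def link_energy(p: PartFeatures, q: PartFeatures,
                weights: Sequence[float] = (0.25, 0.25, 0.25, 0.25)) -> float:
    """E_link = sum over k of w_k * S^(k)."""
    return sum(w * s for w, s in zip(weights, similarities(p, q)))


if __name__ == "__main__":
    a = PartFeatures((200, 30, 30), (3.0, 5.0), width=10, height=20)
    b = PartFeatures((200, 30, 30), (3.0, 5.0), width=20, height=20)
    print(link_energy(a, b))  # 0.875
```

The two parts match in color, stroke width and height, but one is twice as wide, so only $S^{(3)} = 0.5$ pulls the energy below 1.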
Text Unit Energy
• For a text unit containing two parts $v_i$ and $v_j$, the text unit energy $E_{text}^{(i,j)}$ is computed from the character energies $E_{char}^{(i)}$ and $E_{char}^{(j)}$ and the link energy $E_{link}^{(i,j)}$:
  $E_{text}^{(i,j)} = \frac{1}{2}\left[\frac{E_{char}^{(i)} + E_{char}^{(j)}}{2} + E_{link}^{(i,j)}\right]$
• To refine the detected text objects, text units whose energies are smaller than a predefined threshold $T_{text}$ are removed from the text objects.
• The choice of this threshold depends on the characteristics of the dataset; a threshold of 0.7 worked well on several datasets used to test this algorithm.
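The text unit energy and the pruning rule can be sketched as follows. Representing a text object as a list of per-part character energies plus a dictionary of link energies keyed by part-index pairs is an assumption for illustration; `t_text` defaults to the 0.7 reported above.

```python
from typing import Dict, List, Sequence, Tuple


def text_unit_energy(e_char_i: float, e_char_j: float, e_link: float) -> float:
    """E_text = 1/2 * [ (E_char(i) + E_char(j)) / 2 + E_link(i, j) ]."""
    return 0.5 * ((e_char_i + e_char_j) / 2 + e_link)


def prune_text_units(
    e_char: Sequence[float],
    links: Dict[Tuple[int, int], float],
    t_text: float = 0.7,
) -> List[Tuple[int, int]]:
    """Keep the text units (i, j) whose energy is not smaller than T_text."""
    return [
        (i, j)
        for (i, j), e_link in links.items()
        if text_unit_energy(e_char[i], e_char[j], e_link) >= t_text
    ]


if __name__ == "__main__":
    e_char = [0.8, 0.9, 0.5]               # part 2 looks less like a character
    links = {(0, 1): 0.875, (1, 2): 0.5}   # E_link per candidate link
    print(prune_text_units(e_char, links))  # [(0, 1)]
```

The unit (0, 1) scores 0.8625 and survives, while (1, 2) scores 0.6 and is removed from the text object.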
Fig. 12 Text energy (Ref. [1])
Fig. 13 Thresholding of $E_{text}$ and text detection outputs (Ref. [1])
Results on the ICDAR 2003/2005 Datasets
Fig. 14 Experimental outputs (Ref. [2])
Evaluation Results
Algorithm               Precision   Recall
Ashida                  0.55        0.46
Hinnerk Becker          0.62        0.67
SWT                     0.73        0.60
Novel Text Detection    0.74        0.69
References:
• [1] J. Zhang and R. Kasturi, “A novel text detection system based on character and link energies,” IEEE Trans. Image Process., vol. 23, no. 9, pp. 4187–4198, Sep. 2014.
• [2] S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, and R. Young, “ICDAR 2003 robust reading competitions,” in Proc. 7th Int. Conf. Document Anal. Recognit., vol. 2, p. 682, 2003.
• [3] D. Marr and E. Hildreth, “Theory of edge detection,” Proc. Roy. Soc. London B, vol. 207, no. 1167, pp. 187–217, 1980.
Thank You
