          A SELF-ORGANIZING NEURAL SYSTEM FOR LEARNING TO
                     RECOGNIZE TEXTURED SCENES


                 Stephen Grossberg1 and James R. Williamson2


                  Department of Cognitive and Neural Systems
                       and Center for Adaptive Systems
                              Boston University

                    Vision Research, 39 (1999) 1385-1406.


                  All correspondence should be addressed to:


                         Professor Stephen Grossberg
                  Department of Cognitive and Neural Systems
                              Boston University
                              677 Beacon Street
                              Boston, MA 02215
                             Phone: 617-353-7858
                              Fax: 617-353-7755
                         E-mail: steve@cns.bu.edu



Keywords: pattern recognition, boundary segmentation, surface representation, filling-in,
texture classification, neural network, adaptive resonance theory




  1 Supported in part by the Defense Research Projects Agency and the Office of Naval Research
(ONR N00014-95-1-0409) and the Office of Naval Research (ONR N00014-95-1-0657).
  2 Supported in part by the Defense Research Projects Agency and the Office of Naval Research
(ONR N00014-95-1-0409).
Abstract
A self-organizing ARTEX model is developed to categorize and classify textured image
regions. ARTEX specializes the FACADE model of how the visual cortex sees, and the
ART model of how temporal and prefrontal cortices interact with the hippocampal system
to learn visual recognition categories and their names. FACADE processing generates a
vector of boundary and surface properties, notably texture and brightness properties, by
utilizing multi-scale filtering, competition, and diffusive filling-in. Its context-sensitive
local measures of textured scenes can be used to recognize scenic properties that gradually
change across space, as well as abrupt texture boundaries. ART incrementally learns
recognition categories that classify FACADE output vectors, class names of these
categories, and their probabilities. Top-down expectations within ART encode learned
prototypes that pay attention to expected visual features. When novel visual information
creates a poor match with the best existing category prototype, a memory search selects a
new category with which to classify the novel data. ARTEX is compared with
psychophysical data, and is benchmarked on classification of natural textures and
synthetic aperture radar images. It outperforms state-of-the-art systems that use
rule-based, backpropagation, and K-nearest neighbor classifiers.




1    Introduction


1.1 Background and Benchmarks
The brain's unparalleled ability to perceive and recognize a rapidly changing world has
inspired an increasing number of models aimed at exploiting these properties for purposes
of automatic target recognition. On the perceptual side, the brain can cope with variable
illumination levels and noisy scenic data that combine information about edges, textures,
shading, and depth that are overlaid in all parts of a scene. This type of general-purpose
processing enables the brain to deal with a wide range of imagery, both familiar and
unfamiliar. On the recognition side, the brain can autonomously discover and learn
recognition categories and predictive classifications that shape themselves to the statistics
of a changing environment in real time. The present article develops a new self-organizing
neural architecture that combines perceptual and recognition models that exhibit these
desirable properties.
    These models have individually been derived to explain and predict data about how
the brain generates perceptual representations in the striate and prestriate visual
cortices (e.g., Arrington, 1994; Baloch & Grossberg, 1997; Francis & Grossberg, 1996; Gove,
Grossberg, & Mingolla, 1995; Grossberg, 1994, 1997; Grossberg, Mingolla, & Ross, 1997;
Pessoa, Mingolla, & Neumann, 1995) and uses these representations to learn attentive
recognition categories and predictions through interactions between inferotemporal,
prefrontal, and hippocampal cortices (e.g., Bradski & Grossberg, 1995; Carpenter &
Grossberg, 1993; Grossberg, 1995; Grossberg & Merrill, 1996). The perceptual theory in
question is called FACADE theory. It consists of subsystems called the Boundary Contour
System (BCS) and the Feature Contour System (FCS) that generate 3-D boundary and
surface representations that model the cortical interblob and blob processing streams,
respectively. The adaptive categorization and predictive theory is called Adaptive
Resonance Theory, or ART. ART models are capable of stably self-organizing their recognition
codes using either unsupervised or supervised incremental learning in any combination
through time (Carpenter & Grossberg, 1991; Carpenter et al., 1992).
    The present work develops the ARTEX model to classify scenes that include complex
textures, both natural and artificial. The ARTEX architecture was built up from
specialized versions of FACADE and ART models that have been designed to achieve high
competence in classifying textured scenes without also incorporating mechanisms that
are not essential for understanding this competence. Just as the properties of the
FACADE and ART models are emergent properties that are due to interactions of their
various parts, the properties of the ARTEX architecture are also emergent properties due
to interactions within and between its FACADE and ART modules. These new emergent
properties are not merely the sum of the parts of the modules from which they are derived,
and need to be analysed on their own terms.
    In order to understand the emergent properties that are achieved by joining a FACADE
vision preprocessor to an ART adaptive classifier, ARTEX is benchmarked against
state-of-the-art alternative models of texture classification. Our most striking results are
derived through benchmark studies that classify natural textures from the Brodatz (1966)
texture album, which is often used as a standardized test of texture classification models.
ARTEX benchmarks emulated the conditions under which others benchmarked their algorithms
on Brodatz textures. A single trial of on-line incremental category learning by ARTEX
can outperform another leading model's off-line batch learning using a complex rule-based
system (Greenspan, 1996; Greenspan et al., 1994). ARTEX also outperforms K-nearest
neighbor models in both accuracy and data compression, and multilayer perceptrons (back
propagation) in both accuracy and processing time.
    The classification errors that ARTEX does produce are compared with human
perception of texture similarities (Rao & Lohse, 1993, 1996). A correlation exists between
the psychophysically measured similarity between two textures and the probability that
ARTEX will confuse them.
    ARTEX is also used to classify regions in real-world scenes that have been processed
by synthetic aperture radar (SAR). SAR imagery has recently become popular in many
satellite image processing applications because the SAR sensor can penetrate variable
weather conditions (Novak et al., 1990; Waxman et al., 1995). The SAR images present
a challenge for texture classifiers because they contain pixel intensities that vary over
five orders of magnitude and are corrupted by high levels of multiplicative noise, yielding
incomplete and discontinuous boundary and surface representations. Results below on
natural texture and SAR images illustrate how pattern recognition models that are based
on biological principles and mechanisms can outperform models that have been derived
from more traditional engineering concepts.

1.2 Psychophysical Data and Model Properties
At least two different approaches exist to texture classification. In one approach, the focus
is on separating regions with different textures by finding the boundaries between them
(Bergen & Adelson, 1988; Fogel & Sagi, 1989; Gurnsey & Browse, 1989; Malik & Perona,
1990; Rubenstein & Sagi, 1990; Bergen & Landy, 1991). Another approach attempts to
classify the textures within small regions of a scene (Caelli, 1985, 1988; Bovik, Clark,
& Geisler, 1990; Jain & Farrokhnia, 1991; Greenspan et al., 1994). Such an approach
discovers texture boundaries by classifying the textures within each region differently. It
can also classify local regions whose textural properties vary gradually across space, and
thus are not separated by a distinct boundary.
    Gurnsey and Laundry (1992) have provided psychophysical data in support of the
latter type of processing by showing that human texture recognition is only slightly
impaired when the boundaries between different textures in a texture mosaic are blurred.
ARTEX does the latter type of classification. It derives a 17-dimensional feature vector
from multiple-scale boundary features of the BCS and a surface brightness feature
of the FCS. This feature vector utilizes filters of four different scales, as suggested by
psychophysical experiments (Harvey & Gervais, 1978; Richards, 1979; Wilson & Bergen,
1979). The spatial filters are evaluated at four different orientations, thereby leading to a
16-dimensional (4 × 4) feature vector. The 17th dimension is a surface brightness feature.
The ARTEX model uses these feature vectors to generate a context-sensitive classification
of local texture properties. These BCS and FCS operations are designed to be as simple
and fast as possible without incurring a loss of accuracy in classifying texture data.
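    The layout of the 17-dimensional feature vector can be illustrated with a minimal
sketch. It only shows the bookkeeping of 4 scales × 4 orientations plus one brightness
entry. The filters themselves are hypothetical stand-ins chosen for brevity: the names
SCALES, DIRECTIONS, oriented_contrast, and local_brightness, the pixel offsets, and the
3x3 averaging window are all assumptions, not the BCS and FCS equations of the model.

```python
# Illustrative stand-in filters only; these are NOT the BCS/FCS equations.
SCALES = (1, 2, 4, 8)                            # assumed pixel offsets, one per scale
DIRECTIONS = ((0, 1), (1, 1), (1, 0), (1, -1))   # 4 orientations as (row, col) steps


def pixel(image, r, c):
    """Read a pixel, clamping coordinates to the image border."""
    r = min(max(r, 0), len(image) - 1)
    c = min(max(c, 0), len(image[0]) - 1)
    return image[r][c]


def oriented_contrast(image, r, c, scale, direction):
    """Absolute intensity difference across (r, c) along one direction."""
    dr, dc = direction
    ahead = pixel(image, r + scale * dr, c + scale * dc)
    behind = pixel(image, r - scale * dr, c - scale * dc)
    return abs(ahead - behind)


def local_brightness(image, r, c):
    """Mean intensity of the 3x3 neighbourhood (a crude surface feature)."""
    values = [pixel(image, r + i, c + j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
    return sum(values) / len(values)


def feature_vector(image, r, c):
    """16 boundary features (4 scales x 4 orientations) plus 1 brightness."""
    boundary = [oriented_contrast(image, r, c, s, d)
                for s in SCALES for d in DIRECTIONS]
    return boundary + [local_brightness(image, r, c)]
```

Calling feature_vector(image, r, c) on a grayscale image stored as a list of rows
returns a list of length 17 whose last entry is the brightness feature.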
    A large psychophysical literature supports the FACADE hypothesis that the human
brain forms distinct boundary and surface representations before they are bound together
by object recognition categories. Experimental results that support the role of boundary
representations include the following: (1) Object superiority effects occur using outline
stimuli with little surface detail (Davidoff & Donnelly, 1990; Homa, Haver, & Schwartz,
1976). (2) The number of errors in tachistoscopic recognition and the speed of
identification are often comparable using appropriately and inappropriately colored objects
(Mial, Smith, Doherty, & Smith, 1979; Ostergaard & Davidoff, 1985). (3) There is no
difference in recognition speed using black-and-white photographs or line drawings that
are carefully derived from them (Biederman & Ju, 1988).
    Several types of data also implicate a separate surface brightness and color process.
These include the following: (4) Colored surfaces may be bound to an incorrect form
during illusory conjunctions (McLean, Broadbent, & Broadbent, 1983; Stefurak & Boynton,
1986; Treisman & Schmidt, 1982). (5) Color can facilitate object naming if the objects
to be named are structurally similar or degraded (Christ, 1975; Price & Humphreys,
1989). (6) Colors are coded categorically prior to the processing stage at which they
are named (Davidoff, 1991; Rosch, 1975). Two of the most recent studies in support
of the boundary-surface distinction were carried out by Elder and Zucker (1998) and
Rogers-Ramachandran and Ramachandran (1998).
    FACADE theory proposes that 3-D boundary and surface features that are formed
in the prestriate visual cortex are categorized in the inferotemporal cortex (Grossberg,
1994, 1997). Both boundary and surface properties are proposed to be combined during
the categorization process within bottom-up and top-down adaptive pathways that are
modeled by an ART system. Two consequences of this conception are that unambiguous
boundaries can generate category recognition by themselves, and that boundaries can
prime 3-D object representations even if they need to be supplemented by 3-D surface
information in order to achieve unambiguous recognition. Cavanagh (1997) has reported
data consistent with this latter prediction.
    In the ARTEX implementation of this concept, the feature vectors that are formed
from the 17-dimensional boundary and surface features of the FACADE preprocessor are
input to an ART classifier, which categorizes the textures using a biologically-motivated
learning algorithm. Humans learn to discriminate textures by looking at them and
becoming sensitive to their statistical properties in small regions. This is how our model is
trained. Intuitively speaking, model training is like having an observer look at a number
of locations and trying to learn to categorize them based on their local properties. The
ART classifier we used, called Gaussian ARTMAP, or GAM, incrementally constructs
internal categories that have Gaussian receptive fields in the input space, and that map
to output class predictions (Williamson, 1996, 1997). Cells with Gaussian receptive fields
are ubiquitous in the brain, and have been used to model data about how the
inferotemporal cortex learns to categorize visual input patterns (Logothetis et al., 1994). Such
models are not, however, typically able to self-organize their own recognition categories
and to autonomously search for new ones with which to classify novel input patterns.
ART models overcome this weakness by showing how complementary attentional and
orienting systems are designed with which to balance between the processing of familiar and
expected events, on the one hand, and unfamiliar and unexpected events on the other
(Carpenter & Grossberg, 1991; Grossberg, 1980; Grossberg & Merrill, 1996). All learned
categorization goes on within the attentional system. The orienting subsystem is
activated in response to events that are too novel for the attentional system to successfully
categorize them. Interactions between the attentional and orienting subsystems then lead
to a memory search which discovers a more appropriate population of cells with which
to categorize the novel information. These interactions are designed to explain how the
brain continues to learn quickly about huge amounts of new information throughout life,
without being forced to just as quickly forget useful information that it has previously
learned.
    After each input is presented (i.e., each location is observed), GAM automatically
activates cells whose receptive fields adapt to represent the input by amounts proportional
to their level of match with the input. However, if the input is too novel for any existing
receptive field to match the input well enough, then a memory search is triggered which
leads to the selection of a previously uncommitted cell population with which a new
category can be learned. During unsupervised learning, the correct names of the regions that
are being classified are not supplied, and the level of match that is required for a category
to learn is constant. The parameter that determines this degree of match is called the
vigilance parameter because it computationally realizes the intuitive process of being
more or less vigilant in response to information of variable importance (Carpenter &
Grossberg, 1991). Low vigilance allows the network to learn general categories in which many
input exemplars may share the same category prototype. High vigilance enables the
network to learn more specific categories, even categories in which only a single exemplar may
be represented. Thus the choice of vigilance can trade between prototype and exemplar
learning, even within a single ART system. Experimental evidence consistent with
vigilance control has been reported in monkeys when they attempt to perform classifications
during easy vs. difficult discriminations (Spitzer, Desimone, & Moran, 1988).
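    The effect of vigilance on category generality can be illustrated with a toy sketch.
It is a deliberate simplification and not the published GAM equations: only the single
best-matching category learns (GAM adapts cells in proportion to their match), the
receptive-field width sigma is held fixed, and the class name GaussianCategories and its
methods are hypothetical names introduced here.

```python
import math


class GaussianCategories:
    """Toy vigilance-gated learner; not the published GAM equations."""

    def __init__(self, vigilance, sigma=1.0):
        self.vigilance = vigilance   # required match level in (0, 1]
        self.sigma = sigma           # assumed fixed receptive-field width
        self.means = []              # one prototype per committed category
        self.counts = []             # inputs absorbed by each category

    def match(self, x, mean):
        """Gaussian match in (0, 1]; equals 1 when x sits on the prototype."""
        sq = sum((a - b) ** 2 for a, b in zip(x, mean))
        return math.exp(-0.5 * sq / self.sigma ** 2)

    def learn(self, x):
        """Return the index of the category that ends up coding x."""
        if self.means:
            scores = [self.match(x, m) for m in self.means]
            best = max(range(len(scores)), key=scores.__getitem__)
            if scores[best] >= self.vigilance:
                # Match is good enough: move the winning prototype toward x.
                self.counts[best] += 1
                rate = 1.0 / self.counts[best]
                self.means[best] = [m + rate * (a - m)
                                    for a, m in zip(x, self.means[best])]
                return best
        # Too novel (or empty network): commit a previously unused category.
        self.means.append(list(x))
        self.counts.append(1)
        return len(self.means) - 1
```

Presenting the same inputs to networks built with different vigilance values shows the
trade-off described above: a low value lets distant inputs share one prototype, while a
value near 1 commits a new category for almost every distinct input.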
    Learning typically starts with a low vigilance value, which leads to the formation
of the most general categories that are consistent with the input data. Because ART
models are self-organizing, such learning can proceed on its own in an unsupervised mode.
Starting with a low vigilance value conserves memory resources, but it can also create the
tendency, also found in children, to overgeneralize until further learning leads to category
refinement (Chapman, et al., 1986; Clark, 1973; Smith et al., 1985; Smith & Kemler, 1978;
Ward, 1983). For example, it might happen that, after learning a category that classifies
variations on the letter E, the letter F will also activate that category, based on the
visual similarity between the two types of letters. The difference between the letters E
and F is determined by cultural factors, not by visual similarity. Supervised learning
is often essential to prevent errors based on input similarity which do not correspond to
cultural understandings, or other environmentally dependent factors. ART models can
operate in both unsupervised and supervised learning modes, and can switch between the
two seamlessly during the course of learning.
    During supervised learning, the vigilance parameter, or required match level, is raised
if an incorrect prediction is made (e.g., if there is negative reinforcement) by just
enough to trigger a memory search for a new category. This type of vigilance control
sacrifices category generality only when more specific categories are needed to match the
statistical properties of a given environment. Categories of variable generality are hereby
automatically learned based upon the success or failure of previously learned categories
in predicting the correct classification. A block diagram of the ARTEX architecture is
shown in Figure 1.
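    The supervised vigilance control described above can be sketched as follows. This
is a minimal illustration under stated assumptions, not the model's actual equations:
the function names, the dictionary layout of a category, the fixed sigma, the small
increment epsilon, and the choice rule (usage count times match) are all hypothetical
choices made for the example.

```python
import math


def gaussian_match(x, mean, sigma=1.0):
    """Gaussian match in (0, 1] between an input and a category prototype."""
    sq = sum((a - b) ** 2 for a, b in zip(x, mean))
    return math.exp(-0.5 * sq / sigma ** 2)


def supervised_trial(categories, x, label, baseline=0.0, epsilon=1e-6):
    """Run one supervised trial and return the index of the chosen category.

    `categories` is a list of dicts with keys "mean", "count", and "label".
    """
    vigilance = baseline
    matches = [gaussian_match(x, c["mean"]) for c in categories]
    while True:
        eligible = [j for j, m in enumerate(matches) if m >= vigilance]
        if not eligible:
            # Search exhausted: commit a new category carrying the correct name.
            categories.append({"mean": list(x), "count": 1, "label": label})
            return len(categories) - 1
        # Assumed choice rule: favour well-matched, frequently used categories.
        winner = max(eligible, key=lambda j: categories[j]["count"] * matches[j])
        cat = categories[winner]
        if cat["label"] == label:
            cat["count"] += 1
            rate = 1.0 / cat["count"]
            cat["mean"] = [mu + rate * (a - mu) for a, mu in zip(x, cat["mean"])]
            return winner
        # Incorrect prediction: raise vigilance just above the winner's match,
        # which removes it from the search without resetting closer categories.
        vigilance = matches[winner] + epsilon
```

In this sketch vigilance returns to its baseline at the start of every trial, so a
category becomes more specific only where a wrong prediction has actually occurred, as
in the E and F example given earlier.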


2    Multiple-scale Oriented Filter


The ARTEX multiple-scale oriented filter further develops the BCS filter that was
introduced to explain texture data in Grossberg and Mingolla (1985). Variants of this BCS
filter have since become standard in many texture segmentation algorithms (Malik &
Perona, 1989; Sutter, Beck, & Graham, 1989; Bovik et al., 1990; Bergen, 1991; Bergen &
Landy, 1991; Jain & Farrokhnia, 1991; Graham, Beck, & Sutter, 1992; Greenspan et al.,
1994).
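As a rough illustration of what a multiple-scale oriented filter bank computes, the sketch below evaluates four scales at four orientations at one image location, giving the 16 boundary features mentioned in Section 1.2 (the 17th, surface brightness feature is not computed here). The ARTEX filter equations are those of Appendix I and are not reproduced; the odd-symmetric Gabor-like kernel, the scale values in `sigmas`, the zero padding, and the function names are assumptions chosen only to make the idea concrete.

```python
import math

def oriented_response(image, row, col, sigma, theta):
    """Rectified response of an assumed odd-symmetric Gabor-like kernel at (row, col).

    theta is the direction along which contrast is measured, so theta = 0
    responds to vertical edges and theta = pi/2 to horizontal edges.
    """
    radius = int(math.ceil(2 * sigma))
    total = 0.0
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r, c = row + dr, col + dc
            if not (0 <= r < len(image) and 0 <= c < len(image[0])):
                continue  # pixels outside the image contribute zero
            u = dc * math.cos(theta) + dr * math.sin(theta)
            envelope = math.exp(-(dr * dr + dc * dc) / (2 * sigma * sigma))
            total += image[r][c] * envelope * math.sin(math.pi * u / (2 * sigma))
    return abs(total)  # rectification: ignore contrast polarity

def texture_features(image, row, col, sigmas=(1.0, 2.0, 4.0, 8.0), n_orient=4):
    """Return scales x orientations oriented-contrast features, scale-major order."""
    thetas = [k * math.pi / n_orient for k in range(n_orient)]
    return [oriented_response(image, row, col, s, t) for s in sigmas for t in thetas]
```

For example, at a point on a vertical step edge the finest-scale feature for theta = 0 is large, while the finest-scale feature for theta = pi/2 is essentially zero.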
     Figure 2 diagrams the ARTEX version of BCS processing (Stages 1-5) for a single
spatial scale. As in Richards (1979), we used 4 spatial frequency channels. Each channel
computed 4 orientational contrast features. These filter equations and parameters
are described in Appendix I. A functional description is given here. Stage 1 of the BCS
