Virus Detection System
VDS

seak@antiy.net
Outline
The virus trends of 2004
Qualities of an IDS
Mechanisms of a VDS
Data processing
20,047 new kinds of virus in 2004

Category                         Share
Trojan                             45%
Back Door                          20%
Worm                               11%
Other                              10%
Hacker tools                        6%
AD/Erotic pieces                    3%
PE                                  2%
Virus writing generation tool       2%
Script virus                        1%
UNIX                                0%
How a traditional IDS works
Meticulous protocol analysis
Lightweight rule set: no more than 500 records in a rule set.
Unitary software design
Unitary design: when dealing with an extensive, complicated class of incidents, we should classify the events and unify one or more of the processing modules by using an extensible data structure and data set.
AV ware: processing diverges by scan target object.
IDS: processing diverges by protocol.
AVML and Snort
Echo

virus(id="B00801";type="Backdoor";os="Win32";format="pe";name="bo";version="a";size="124928";Port_listen=on[31337];content=|81EC0805000083BC240C05000000535657557D148B8424240500008BAC242005000050E9950500000F85800500008B|;delmark=1)

alert tcp $EXTERNAL_NET any -> $HOME_NET 21 (msg:"Backdoor.bo.a Upload"; content:"|81EC0805000083BC240C05000000535657557D148B8424240500008BAC242005000050E9950500000F85800500008B|";)

alert tcp $EXTERNAL_NET any -> $HOME_NET 139 (msg:"Backdoor.bo.a Copy"; content:"|81EC0805000083BC240C05000000535657557D148B8424240500008BAC242005000050E9950500000F85800500008B|";)
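The relationship between the single AVML record and the two Snort rules above can be sketched in Python. This is only an illustration, not the authors' converter: the `to_snort` function, the dictionary layout and the `PORT_LABELS` map are hypothetical, and only the field values come from the slide.

```python
# Hypothetical sketch: expand one signature record into one Snort-style
# rule per protocol port. Field values come from the slide; the function
# name and the PORT_LABELS map are assumptions made for illustration.

SIGNATURE = {
    "type": "Backdoor",
    "name": "bo",
    "version": "a",
    "content": (
        "81EC0805000083BC240C05000000535657557D148B842424"
        "0500008BAC242005000050E9950500000F85800500008B"
    ),
}

# Destination port -> message suffix (FTP upload and NetBIOS copy on the slide).
PORT_LABELS = {21: "Upload", 139: "Copy"}


def to_snort(sig, port, label):
    """Render one Snort-style alert rule for a single destination port."""
    full_name = f'{sig["type"]}.{sig["name"]}.{sig["version"]}'
    return (
        f"alert tcp $EXTERNAL_NET any -> $HOME_NET {port} "
        f'(msg:"{full_name} {label}"; content:"|{sig["content"]}|";)'
    )


rules = [to_snort(SIGNATURE, port, label) for port, label in PORT_LABELS.items()]
for rule in rules:
    print(rule)
```

Every generated rule repeats the same byte pattern, so a protocol-divided rule set grows with the number of transfer channels rather than with the number of viruses.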
Redundant scans caused by divergence
[Diagram: FTP rules, transfer character rules, NETBIOS rules]
Rule set scaling pressure
Type          Quantity
Email worm        2807
IM-worm            172
P2P-worm          1007
IRC-worm           715
Other worm         675
Total             5376

Besides worms, there are over 20,000 Trojans, backdoors, etc. that transfer over the network. The corresponding rule quantity may exceed 30,000 records.
Algorithm optimization (1)
[Chart: The influence of record quantity on record matching time. x-axis: records (up to 24,000); y-axis: duration (ms), 0 to 5,000]
Below 6,000 rules, the linear growth of matching time with record count is hardly noticeable. Beyond about 10,000 records this changes: performance drops suddenly, until the system becomes simply unusable.
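The slides do not say which matching algorithm produced these measurements. To illustrate why the number of records matters, the sketch below contrasts a naive scan, which searches the data once per signature, with a single-pass Aho-Corasick automaton. Aho-Corasick is used here only as a well-known multi-pattern technique, not as the method VDS uses; all function names and sample patterns are hypothetical.

```python
from collections import deque


def naive_scan(data: bytes, patterns: list) -> set:
    """Search for every pattern separately: work grows with len(patterns)."""
    return {i for i, p in enumerate(patterns) if p in data}


def build_automaton(patterns: list):
    """Build Aho-Corasick goto/fail/output tables for the given patterns."""
    goto = [{}]      # goto[state][byte] -> next state
    fail = [0]       # failure link of each state
    out = [set()]    # indexes of patterns recognised at each state
    for idx, pat in enumerate(patterns):
        state = 0
        for byte in pat:
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        out[state].add(idx)
    # Breadth-first pass: set failure links and inherit outputs.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for byte, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(byte, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def automaton_scan(data: bytes, automaton) -> set:
    """Report all patterns found in a single pass over the data."""
    goto, fail, out = automaton
    found, state = set(), 0
    for byte in data:
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        found |= out[state]
    return found


patterns = [bytes.fromhex("81EC08050000"), b"she", b"he", b"hers"]
data = b"\x00\x81\xec\x08\x05\x00\x00 ushers"
automaton = build_automaton(patterns)
assert naive_scan(data, patterns) == automaton_scan(data, automaton)
```

Both functions return the same set of matched pattern indexes; the difference is that the automaton reads each input byte once regardless of how many signatures are loaded.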
Algorithm optimization (2)
[Chart: Scan methods' and data objects' influence on the speed. x-axis: records (0 to 24,000); y-axis: duration (ms), 0 to 6,000. Series: real rules on network data; Trojan rules on network data; real rules on random data; random rules on network data]
The scanning speed is also affected by the data being matched and the quality of the patterns.
Algorithm optimization (3)
[Chart: Influence on efficiency caused by limiting the approximation of the virus' characteristics. x-axis: records (500 to 29,000); y-axis: speed (kb/s), 0 to 1,200. Series: original, improved]
Key method of designing a VDS
The Unitary Model focuses on matching speed and matching granularity; matching is of foremost importance.
Network traffic data is classified into three types: data matched on the binary level, data needing preprocessing, and data needing specific algorithms.
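One way to picture the three-way classification is a dispatcher that routes each payload to a matching path. This is a sketch under stated assumptions: the category names, the `classify` heuristics (base64 as the example of data needing pre-treatment, gzip as the example of data needing a specific algorithm) and the sample signature are hypothetical and not taken from the VDS design.

```python
import base64
import binascii
import gzip

SIGNATURES = [b"EVIL_MARKER"]  # hypothetical binary signature


def match_binary(payload: bytes) -> bool:
    """Direct byte-level matching against the signature list."""
    return any(sig in payload for sig in SIGNATURES)


def classify(payload: bytes) -> str:
    """Pick a processing path for a payload (illustrative heuristics only)."""
    if payload[:2] == b"\x1f\x8b":  # gzip magic number
        return "specific-algorithm"
    try:
        base64.b64decode(payload, validate=True)
        return "pre-treatment"
    except (binascii.Error, ValueError):
        return "binary"


def scan(payload: bytes) -> bool:
    """Normalise the payload according to its class, then match it."""
    kind = classify(payload)
    if kind == "pre-treatment":
        payload = base64.b64decode(payload, validate=True)
    elif kind == "specific-algorithm":
        payload = gzip.decompress(payload)
    return match_binary(payload)


sample = b"xx EVIL_MARKER xx"
print(scan(sample), scan(base64.b64encode(sample)), scan(gzip.compress(sample)))
```

Whatever the path, every payload ends in the same byte-level matcher, which is why matching speed dominates the design.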
Data flow direction and the level of virus detection
Divided into 4 levels: collection, divergence, detection and processing.
Provides packet scanning, incomplete data scanning and complete data scanning.
[Diagram, bottom to top:
 Data collection level: Sniffer
 Data diffluence level: Protocol analysis and diffluence
 Virus scan level: Packages scan, Stream scan, (File) Scan Complete Dataflow, with pretreatment
 Event process level: Data log / Process backstage
 Side labels: Protocol tag transfer, Cross verification]
System structure
Data efficiency
Virus data output from Harbin Institute of Technology on July 8, 2003.
Statistics from the 26th week of 2005
Unknown virus forewarning system
An unknown worm (I-Worm.Unknow) was detected increasing notably on June 5, 2003. On June 6 it was identified as the virus I-worm.sobig.f.
Event Processing (1)
Detection Events Description Language (DEDL)
We use descriptors to define standard formats for network events and make them support other formats.
Defined elements: event type, event ID, source IP, target IP, event time, and so on. More than 20 such key elements.
Processing methods
Tech-based Internal combine
Parallel combine
Analysis-based Parallel combine
Radiant combine
Convergence combine
Chain combine
Event Processing (2)
If exist
    Net_Action(RPC_Exploit)[IP(1)->IP(2);time(1)]
    Net_Action(RPC_Exploit)[IP(2)->IP(3);time(2)]
and
    time(2)>time(1)
then
    Net_Action(RPC_Exploit)[IP(1)->IP(2)->IP(3)]
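The rule above joins two exploit events into one propagation chain when the target of the first is the source of the second and the second happens later. A minimal Python sketch of that condition follows; the `NetAction` class and the `chain_combine` function are hypothetical names, not part of DEDL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetAction:
    action: str    # event type, e.g. "RPC_Exploit"
    path: tuple    # ordered hosts, e.g. ("IP1", "IP2")
    time: float    # time of the last hop


def chain_combine(first, second):
    """Join two events into one chain if the rule's conditions hold, else None."""
    if (first.action == second.action
            and first.path[-1] == second.path[0]
            and second.time > first.time):
        return NetAction(first.action, first.path + second.path[1:], second.time)
    return None


e1 = NetAction("RPC_Exploit", ("IP1", "IP2"), 1.0)
e2 = NetAction("RPC_Exploit", ("IP2", "IP3"), 2.0)
chain = chain_combine(e1, e2)
print(chain.path)  # ('IP1', 'IP2', 'IP3')
```

Because the result is itself a `NetAction`, it can be combined again with a later event, so chains of any length build up one hop at a time.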
Behavior Classifications
DEDL events:
Net_Action(act)[IP(1),IP(2):445; ;time(1)]
Net_Action(act)[IP(1),IP(3):445; ;time(1)]
….
Net_Action(act)[IP(1),IP(12):445; ;time(1)]
Net_Action(Trans,Worm.Win32.Dvldr)[IP(1)->IP(12);time(1)]
AVML diagnostic behavior regulations:
Virus_act_lib
Virus seek(id="W02872";dport=139,445;trans=netbios)
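The events above show one source contacting many targets on port 445, which is the pattern a `seek` behavior rule describes. The sketch below shows how such a rule might be evaluated against a batch of events. The `classify_seek` function, the tuple layout of the events and the `min_targets` threshold are assumptions for illustration; the slide gives only the rule id and the ports.

```python
from collections import defaultdict

# Rule id and ports are from the slide; min_targets is an assumed threshold.
SEEK_RULE = {"id": "W02872", "dports": {139, 445}, "min_targets": 10}


def classify_seek(events, rule=SEEK_RULE):
    """Return sources whose distinct targets on the rule's ports reach the threshold."""
    targets = defaultdict(set)
    for src, dst, dport in events:
        if dport in rule["dports"]:
            targets[src].add(dst)
    return {src for src, dsts in targets.items() if len(dsts) >= rule["min_targets"]}


# IP1 probes IP2..IP12 on port 445; IP20 makes one unrelated connection.
events = [("IP1", f"IP{n}", 445) for n in range(2, 13)]
events.append(("IP20", "IP21", 80))
print(classify_seek(events))  # {'IP1'}
```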
Data processing
[Diagram: IRC SERVER, IRC SERVER2 and IRC SERVER3 linked by IRC connections to NODE A, NODE B, NODE C and NODE D; Virus.A labels appear on the nodes]
Thoughts
Network virus monitoring has been explored both academically and in products. It has now grown into a new technology with its own direction.
The path of virus defense leads us to the world of freedom.
