Content Aware SIEM™ defined



  1. Content Aware SIEM™ defined, by Dr. Anton Chuvakin and Eric D. Knapp, 1/4/2010
  2. EXECUTIVE SUMMARY

Content Aware SIEM™ (CA-SIEM) represents a new generation of Security Information and Event Management (SIEM) capabilities that extend the value and benefits of SIEM by providing visibility into the contents of applications, documents and protocols. Without content awareness, a SIEM can act only upon the surface details provided by logs. This limits the effectiveness of key SIEM functions, including threat detection, incident response, and compliance reporting, because the data being analyzed lacks sufficient context to support informed, relevant decisions.

As a result, SIEM systems have started to evolve: context information from add-on systems such as Identity Management, Vulnerability Assessment, Configuration Management systems, and others has been used to enhance the security events collected and correlated by the SIEM. While these systems provide a great deal of value to SIEM, the events themselves are still myopic, limited to the summary data provided by the source log files.

Consider the following example:

An email being sent by an admin user to an outside address does not, on the surface, represent a threat. However, if the email contains sensitive company information, either within the body of the email or in an attachment, the activity could indicate insider theft.

However, extending visibility into the actual payload of applications and protocols on the network generates a massive increase in event load, which translates directly into a performance impact on current SIEMs.
Query responses from most SQL-based data stores begin to slow rapidly after just a few million rows of stored data, and content information can quickly generate billions of rows. Even with highly optimized databases or flat-file data stores, current systems lack the scale and performance to deliver the real-time results that rapid-response operations require.

Luckily, while content awareness presents architectural challenges to most SIEM platforms, it is a technology that is available today. NitroSecurity’s NitroView Enterprise Security Manager (ESM), which was built from the start to handle massive volumes of diverse data, logs and content, is the first commercially available Content Aware SIEM.
  3. INTRODUCTION

What is SIEM?

Security information and event management (SIEM) is sometimes defined as a set of technologies for log data collection, aggregation, normalization, retention, analysis and workflow. The analysis of security information includes the collection of “context data” (additional details and metadata, such as DNS names or geo-location information, that are relevant for understanding the impact of collected log entries but are not defined within the original log) and the correlation, visualization and reporting of the newly contextualized information. In addition, SIEM products support related workflow either directly or through the integration of external case management systems, so that this newly contextualized information can be used directly to support the various operations of the Security Operations Center (SOC) or Computer Incident Response Team (CIRT). Similarly, Gartner defined SIEM as “[technology] used to analyze security event data in real time for internal and external threat management, and to collect, store, analyze and report on log data for regulatory compliance and forensics” (source: SIEM research note 2009).

Companies use SIEM tools as enabling technology both for Security Operations Centers (SOCs), where the SIEM is a key component of security monitoring functions, and for compliance reporting in support of various regulatory requirements, including NERC, HIPAA, PCI, and the Sarbanes-Oxley Act (SOX).
SIEM solutions originated from simple event management and aggregation tools, evolving with industry trends until they became advanced security centralization solutions that help organizations optimize various aspects of security management, including incident, risk, policy and vulnerability management. As SIEM evolved, it also provided greater visibility into new areas of the enterprise by expanding the scope of data fed into the SIEM: from log sources such as IDS, IPS, hosts and firewalls to new information sources designed to provide additional context to an event, for example vulnerability assessment information and identity information.

As security and compliance requirements grow even more demanding, however, the effectiveness of the SIEM is under pressure. Where SEM added information context and historical analysis to event management to become SIEM, the need for visibility into actual application usage is driving SIEM towards a further evolution: a new SIEM that is both context and content aware. The new Content Aware SIEM (CA-SIEM) combines context and content with log and event data to provide the next generation of threat detection, incident response, remediation, risk management and compliance services (see Appendices for examples and use cases).
  4. What is context?

As mentioned above, “context” in relation to security management is a matter of data enrichment. Data from various log, event and flow sources is fed into SIEM platforms in order to convert raw data into actionable information about threats, attacks, malware issues and other security-relevant information. Such information is then used to make decisions (by SIEM operators and their managers as well as by the SIEM itself) to investigate, block, reconfigure, etc.

One of the ways a SIEM performs such enrichment is by adding “context information.” Context information is the additional information required to make the limited details available within an event or log more meaningful. Context information does not come from the logs themselves, but originates in the surrounding IT environment, in other information systems inside or outside the organization.

Context Information can be thought of using the analogy of an online retailer: if the book title, price, and ISBN make up the “log record,” then where it was purchased, the store address, the phone number or the author name are all parts of the book’s context. Of course, the book’s contents would be an example of Content Information, which is discussed later.

One of the simplest examples of Context Information is name and, where possible, full identity resolution: DNS names, Windows host names and/or network username details are added to the logs. While the log file may have already provided IP addresses, the added context of a human-readable name makes the log more meaningful. Normally, DNS names are not present in logs, but have to be obtained by queries to a DNS server.
The SIEM tool gathers Context Information from a variety of sources, including:

• Windows name services, DNS and NIS servers: to map addresses to names
• Defined asset groups: internal or external status of an IP address; logical or physical meta-groups
• WHOIS servers: WHOIS information for external addresses shows who owns them and where they are located
• Geo-location: to show the physical location of the system
• Asset and owner information for internal addresses
• Active Directory and LDAP servers: to map user names to actual user identities
• Entitlement servers: to obtain a user’s entitlements
• Asset management systems: to gather information about systems, their ownership, and the compliance relevance of each system or group of systems
• Attack and exploit information: to gather additional details about the log data
• Vulnerability assessment information

An example of log data enrichment is given in Figure A.
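The enrichment step described above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the lookup tables, field names (`dst_ip`, `user`) and host names are all hypothetical stand-ins for data a SIEM would cache from DNS, a directory service and an asset inventory.

```python
from ipaddress import ip_address, ip_network

# Hypothetical, locally cached context tables. A real deployment would
# populate these from DNS, a directory service and an asset inventory.
DNS_CACHE = {"10.1.2.15": "fin-db01.corp.example"}
USER_DIRECTORY = {"jsmith": {"full_name": "John Smith", "groups": ["IT"]}}
ASSET_GROUPS = {"finance": ip_network("10.1.2.0/24")}
INTERNAL_NETS = [ip_network("10.0.0.0/8")]


def enrich(event: dict) -> dict:
    """Return a copy of a raw log event with context fields added."""
    enriched = dict(event)
    addr = ip_address(event["dst_ip"])
    # Name resolution: map the address to a human-readable host name.
    enriched["dst_host"] = DNS_CACHE.get(event["dst_ip"], "unknown")
    # Asset context: internal/external status and logical group membership.
    enriched["dst_internal"] = any(addr in net for net in INTERNAL_NETS)
    enriched["dst_asset_groups"] = sorted(
        name for name, net in ASSET_GROUPS.items() if addr in net
    )
    # Identity context: map the user name to a directory identity.
    user = USER_DIRECTORY.get(event.get("user", ""), {})
    enriched["user_full_name"] = user.get("full_name", "unknown")
    enriched["user_groups"] = user.get("groups", [])
    return enriched


raw = {"user": "jsmith", "dst_ip": "10.1.2.15", "action": "login"}
print(enrich(raw))
```

Keeping the context tables local mirrors the point made in the text: context is fetched from external systems but retained by the SIEM so that lookups during correlation stay fast.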
  5. Figure A: Enriching the surface data found in logs

Access to Context Information requires the SIEM to connect to external systems and then to retain the Context Information locally, so that it can be accessed quickly and used for event analysis, correlation, reporting, and other SIEM functions. Context Information is extremely useful for performing advanced forms of correlation. For example, correlating user role with access hours, or matching the server under attack with a particular department in the company, are useful correlations with context data.

Consider the following examples:

Classic event correlation rule: if a user tries to log in 10 times and fails, but then immediately succeeds in logging in, raise an alert “Password guessing successful.”

Context-aware rule: if a user belongs to an “IT group,” but not to the “IT support subgroup,” and logs into a server in the “finance” group, raise a medium severity alert “Possible unauthorized access.”

The benefit of context-aware correlation is obvious in the above example, as it makes an “unauthorized login” alert more relevant to the security analyst. It also allows the creation of very simple and effective correlation rules, which can be directly derived from corporate security policy.
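As a rough sketch, the two rules above might look like this in Python. The event fields (`user`, `outcome`, `server`) and the group tables are assumptions made for illustration, and "immediately" is simplified to "the next login event for that user" rather than a time window.

```python
def password_guessing(events, threshold=10):
    """Classic rule: `threshold` consecutive failed logins, then a success.

    `events` is assumed to be a time-ordered list of login events.
    """
    failures = {}  # user -> current run of consecutive failures
    alerts = []
    for ev in events:
        user = ev["user"]
        if ev["outcome"] == "failure":
            failures[user] = failures.get(user, 0) + 1
        else:
            if failures.get(user, 0) >= threshold:
                alerts.append((user, "Password guessing successful"))
            failures[user] = 0  # a success ends the run either way
    return alerts


def unauthorized_access(event, user_groups, server_groups):
    """Context-aware rule: IT (but not IT support) user logs into a finance server."""
    if event["outcome"] != "success":
        return None
    groups = user_groups.get(event["user"], set())
    if (
        "IT" in groups
        and "IT support" not in groups
        and "finance" in server_groups.get(event["server"], set())
    ):
        return ("medium", "Possible unauthorized access")
    return None
```

The first rule needs only the log stream, while the second cannot be evaluated at all without the group tables, which is the practical difference between log-only and context-aware correlation.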
  6. What is content?

As mentioned before, log data and context data provide the major input into a SIEM product. Log data shows what activities were occurring, succeeding, failing or being attempted. However, what if we’d like to learn more about the specifics of those activities? For example, what was that downloaded file? What was that email about?

Learning these details becomes more and more important as threats become more complex and as they “move up the stack,” exploiting vulnerabilities at the application and session layers as well as business logic flaws. Current SIEM products cannot answer these questions, and require other expensive security products such as Data Leakage Prevention (DLP) and Database Activity Monitoring (DAM) to be deployed and integrated alongside the SIEM. However, as SIEM evolves to become content aware, the situation is changing, allowing the integration of key DAM and DLP features into the SIEM. Following the above book analogy:

The book’s content provides the ultimate level of detail: the full text that makes up the book’s interior. While title, ISBN and price are part of the book “log record,” and the added context of the author’s other published titles might be obtained from outside sources to add relevance, what is the book about? Does it contain profanity? Does it mention a particular historical figure, or a certain character? How many chapters, words and characters does the book contain? How many times does the word ‘engineering’ appear in the text, and of those, how many occur immediately following the word ‘social’? Only the analysis of the book’s content will answer these questions.
This analogy also explains the value of content compared to the value of log data: while knowing the ISBN and title is very useful and gives an initial impression of the book, it is impossible to make a qualified and informed decision about the book until you read its contents. If it’s a novel, do you like the story? If it’s a reference, does it contain the information that you need? Should you buy it? Without content awareness, you would be ignoring the age-old lesson against “judging a book by its cover.”

Imagine a SIEM product having access to the actual content of the conversation. It’s easy to understand the almost endless opportunities that this would provide to the security analyst. Using another analogy, consider law enforcement: tracing a phone call, or examining call records to know “who called who,” provides a certain value to an investigation, but is largely circumstantial. In contrast, actually recording the contents of the phone call provides hard evidence, with a clear and incontrovertible record of the conversation and any incriminating statements. Admittedly, knowing who called who is key for catching known offenders: common criminals and terrorists, for example. However, knowing what they actually spoke about allows the investigation to go to the next level. The more granular collection and analysis of a conversation (the Content Information of the phone call) based upon the initial transaction (dialing a number to place a call to a certain destination) allows the initiation or escalation of incident response efforts to be more focused, efficient, and effective.
Using SIEM for information security is no different: content truly matters, and facilitates all levels of threat detection and investigation (see ‘Appendix B: Content Analysis in CA-SIEM’). However, adding content information to SIEM runs the risk of overwhelming the tools as well as the analyst with too much information. The SIEM has to be engineered to support massive amounts of data in a scalable manner.
  7. So what is “content” as it relates to SIEM? For the purpose of this discussion we define content as the payload of an application, i.e., what is actually being communicated, transferred, and shared over the network. Logs describe the fact that an activity has taken place on a system or network. Content is what defines the actual nature of the activity. For example:

• Email contents, including attachments
• Social network communication
• Document contents
• Database queries and the size and/or subject matter of their responses
• IM conversation contents

What Legacy SIEM is Missing

The SIEM typically collects data from a variety of sources to provide a base of forensic detail (for detection of events as well as for subsequent investigations), and collects additional Context Information to provide other relevant assessments, such as risk and severity. However, with current SIEMs, the range of data sources collected is still relatively limited. Typical data sources include:

• Firewalls: firewall logs contain information on connections allowed or blocked by the firewall.
• Netflow: similarly, netflow records contain information about systems connecting to other systems and include characteristics such as session time, duration, protocol, and packet and byte counts.
• Router logs: often disabled, router logs are similar to firewall logs and netflow since they contain connectivity information, as well as access ‘permit’ and ‘deny’ events related to network access.
• NIDS/NIPS alerts: these alerts and logs contain information about attacks detected or prevented by the systems as well as information about suspicious activities.
• Operating system logs such as Windows event logs or Unix/Linux syslog: they contain information about routine system operations and system access, as well as various errors and failures.
• Email server logs: these logs contain information about sent and received email, the sender’s and recipients’ addresses, errors, message sizes and other parameters of email messages.
• Proxy logs: these logs contain information about internet connectivity through the proxy server: what sites the user went to and sometimes which actions the user performed on each site.
• Vulnerability Assessment results: results of a standard VA scan can identify specific host and application vulnerabilities, and provide a detailed inventory of discovered assets.

The following log types are rarely sent into a SIEM, even though their use is on the rise. While collection of information from these sources is considered good practice, it also presents new challenges to the SIEM.

• Database logs: native database logs contain queries and database administration commands.
• Application logs: such logs contain literally anything that a developer puts in them, with no standards, limitations or restrictions.
• IAM: Identity and Access Management systems provide user policy context to the SIEM.
• CMDB: Configuration Management Database systems provide configuration policy context to the SIEM.

However, despite the source of these log files being applications and databases, they seldom represent actual application activity or data access. They still represent the surface detail: the “title and ISBN” information that is available in the log.
  8. How Content Awareness Brings Value to SIEM

Before Content-Aware SIEM was created by NitroSecurity, a typical SIEM product was aware of things like source IP address, destination IP address, TCP or UDP port, username, attack type, number of bytes transferred, etc. As discussed earlier, this information is insufficient, especially when considering event correlation, which is the SIEM’s primary mechanism for threat detection. Correlation rules are limited to fairly simple patterns that match known attack patterns, such as a “brute force” login:

Event correlation rule: if a user tries to log in 10 times and fails, but then immediately succeeds in logging in, raise an alert “Password guessing successful.”

Or perhaps a slightly more sophisticated rule:

Event correlation rule: if an IDS alert against a system is followed by a new user being added to the system, raise an alert “System compromise.”

With the addition of context, correlation rules can leverage user roles, IP address information, and asset data for correlation. For example:

Context rule: if a non-admin user accesses a system after hours, alert “Possible fraudulent activity.”

Adding content awareness to SIEM creates unique advantages for the security analyst, allowing the analysis of new content and the correlation of that content information with logs and context information. This results in better visibility and control over the entire IT environment, not just the network. Logs allow the SIEM to see the events occurring on systems and network devices; adding content information explains the nature of these events. From simply knowing that an email was sent, we move to knowing what it was about.
From knowing that an FTP connection was established, we move to knowing what file was copied.

Content data can be analyzed using Content Aware SIEM. Instead of only correlating, summarizing, filtering and reporting on events, we can now filter email contents, correlate SQL keywords with other information and perform other analysis tasks. Examples of how content data can add value to correlation rules are shown below:

Event correlation rule: if 1,000 emails originating from within the company are sent, raise an alert “anomalous email activity.”

Context rule: if 1,000 emails originating from a non-SMTP host within the company are sent to 1,000 unique addresses, raise an alert “possible spam bot,” with increased severity.

Content rule: if 1,000 emails originating from a non-SMTP host within the company, with a ‘reply to’ address in an outside domain, are sent to 1,000 unique addresses with the words ‘account’ and ‘password’ in the body of the email, raise a critical alert “possible spam bot,” with maximum severity.
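The escalation from event rule to context rule to content rule can be expressed as one function that raises the severity as more information becomes available. The sketch below is illustrative only: the email record keys (`src_host`, `to`, `reply_to`, `body`), the severity labels and the `threshold` parameter are assumptions, and the emails are assumed to have been grouped by sending host already.

```python
def evaluate_email_rules(emails, smtp_hosts, company_domain, threshold=1000):
    """Return (alert, severity) for emails sent by a single internal host.

    `emails` is a list of dicts with the hypothetical keys
    src_host, to, reply_to and body.
    """
    if len(emails) < threshold:
        return None
    # Event rule: volume alone is enough for a basic alert.
    result = ("anomalous email activity", "default")

    # Context rule: sender is not a known mail server, recipients are unique.
    from_mail_server = emails[0]["src_host"] in smtp_hosts
    recipients = {e["to"] for e in emails}
    if not from_mail_server and len(recipients) >= threshold:
        result = ("possible spam bot", "increased")

        # Content rule: outside reply-to plus lure words in the body.
        def has_lure(e):
            body = e["body"].lower()
            external_reply = not e["reply_to"].lower().endswith("@" + company_domain)
            return external_reply and "account" in body and "password" in body

        lure_recipients = {e["to"] for e in emails if has_lure(e)}
        if len(lure_recipients) >= threshold:
            result = ("possible spam bot", "maximum")
    return result
```

Each tier only refines the tier before it, so a deployment without content visibility still gets the event and context alerts; the content check simply adds confidence and severity.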
  9. Getting the Most out of Content Correlation

Additional value is realized when logs, context, and content information are correlated together. Such rules can be used for either security or compliance use cases, within the SOC environment or for automated security monitoring. Using content information alone will improve the value of correlation, but the real value of content correlation is achieved when individual rules are combined to create tiered or “composite” rules, which provide an even greater incident detection capability. For example:

Content rule: if outbound application contents contain sensitive terms (e.g., “quarterly report”), raise an alert “possible breach of non-disclosure.”

Composite rule: if user role is “Finance” and an IM conversation topic is “quarterly report” and the IM program is not on the corporate approved list, raise a critical alert “possible fraudulent activity.”

Such rules allow much more comprehensive security monitoring than a legacy SIEM. Bringing this level of visibility into the Content Aware SIEM also brings us one step closer to a true “single pane of glass” CISO dashboard. Previously, the daily operations involved in security monitoring entailed watching the logs; now the views can include the actual contents of network communication and application activity, adding new tools to the arsenal of a SOC analyst, security manager and CSO.

Detection Capability

The CA-SIEM provides improved detection of threats from both inside and outside sources. As networks and operating systems grow more secure, cyber threats continue to move “up the stack” of the OSI model to take advantage of application vulnerabilities.
Monitoring content allows the CA-SIEM to detect these new application- and database-level attacks, and even fraud and business logic abuse. Sending application-level traffic to the CA-SIEM for correlation and analysis allows for the detection of a much broader range of attacks. Single-purpose monitoring tools, including legacy SIEM as well as standalone DAM, DLP and Deep Packet Inspection solutions, cannot provide this breadth of visibility and correlation.

Insider attacks, in contrast, are often simple cases of authorized access abuse. While these are not complex, blended attacks with sophisticated attack vectors, they are often more difficult to detect because they involve authorized users accessing information that is mostly within the realm of expected behavior. Total visibility of authorized user activity (from file transfers to emails to various queries to IM to social networks) enables security analysts to separate legitimate actions from dangerous mistakes and from actual insider abuse.

Finally, if malicious activity is detected, the CA-SIEM provides more relevant data, which is available for live forensic analysis as well as for reconstructing a complete picture of the incident. SIEM log repositories have long been used for forensic analysis, but now we can retain not just the logs, but also the documents and communication records of users, which are often even more useful than logs during investigations (see ‘Appendix C: Use Cases,’ for more examples).
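To make the idea of tiered rules concrete, the content rule and composite rule from “Getting the Most out of Content Correlation” can be sketched as follows. The term list, the approved-program list, the event keys (`user`, `app_type`, `program`, `content`) and the `"alert"` severity label are hypothetical; topic detection is reduced to a simple substring match.

```python
SENSITIVE_TERMS = ["quarterly report"]   # assumed list of sensitive terms
APPROVED_IM_PROGRAMS = {"corp-chat"}     # assumed corporate approved list


def content_rule(text):
    """Content rule: outbound content mentions a sensitive term."""
    lowered = text.lower()
    return any(term in lowered for term in SENSITIVE_TERMS)


def evaluate(event, user_roles):
    """Apply the composite rule first, then fall back to the content rule."""
    sensitive = content_rule(event["content"])
    if (
        sensitive
        and user_roles.get(event["user"]) == "Finance"   # context: user role
        and event["app_type"] == "im"                    # log: application type
        and event["program"] not in APPROVED_IM_PROGRAMS
    ):
        return ("critical", "possible fraudulent activity")
    if sensitive:
        return ("alert", "possible breach of non-disclosure")
    return None
```

The composite rule reuses the content rule as one of its conditions, which is the sense in which individual rules are combined into tiers: log, context and content each contribute one test.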
  10. Why legacy SIEMs are not Content Aware

With all of the added value that content awareness brings to security information management, why aren’t all SIEMs content aware? The issue is one of both performance and complexity: as each event is examined in more detail, the volume of raw event data associated with it increases exponentially. Therefore, when packets are fully decoded to expose the content information, whether from applications or databases, the volume of information that needs to be stored, retrieved, analyzed, and visualized within the user interface quickly becomes unmanageable.

More Data Volume: Performance is Key!

Just as a book’s contents are much larger than its title page, adding content information into the SIEM increases the event load many times over. This massive increase in event load translates directly into a performance impact on the SIEM: the data stores used by most SIEM products begin to slow rapidly after just a few million rows of stored data, and content information can quickly generate billions of rows. This is why most SIEM vendors are unable to add content information into their platforms. Even with highly optimized databases or flat-file data stores, these systems will quickly become unusable. The only way to support this level of event detail for sustained periods of time is to implement a highly scalable purpose-built database, which would require hundreds of man-years of research and development to produce. NitroSecurity, with a history in database development, is able to use the Nitro Extreme Database (NitroEDB), which provides the scalability and performance necessary to support content information (see footnote 1).
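A back-of-envelope calculation helps show how quickly stored rows accumulate. The event rates below are purely illustrative assumptions, not figures from this paper, and the sketch assumes one stored row per event.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def rows_per_day(events_per_second: float) -> int:
    """Rows stored per day at a sustained event rate (one row per event)."""
    return int(events_per_second * SECONDS_PER_DAY)


def days_to_reach(rows: int, events_per_second: float) -> float:
    """Days of sustained collection needed to accumulate `rows` rows."""
    return rows / rows_per_day(events_per_second)


# Illustrative rates only; real rates depend on the environment.
for eps in (100, 1_000, 10_000):
    print(eps, rows_per_day(eps), round(days_to_reach(1_000_000_000, eps), 1))
```

Under these assumptions, a sustained 1,000 events per second stores 86.4 million rows per day, so a store that slows after a few million rows is past that point within the first day.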
More Complexity: Usability is Key!

Another challenge faced by the CA-SIEM is how to present the many types of new information to the operator in a way that remains meaningful and useful. With only log data (the book’s title and ISBN), each event can be clearly defined within the confines of a User Interface (UI). With context information added (the book’s availability, reader score, and other metadata), the UI can begin to grow complex, but the amount of information being visualized is still manageable. However, with content information (the full contents of the book), interface complexity becomes a real issue. Going beyond logs and context into document presentation, even simple query results can raise the challenges of design to a new level. Avoiding “SIEM operator overload” becomes critical for CA-SIEM success, and requires flexible control over how this detail is presented within the interface. Some methods of maintaining usability include:

• Normalize information so that higher-level activity can be presented clearly, while allowing users to “drill in” for greater depth as needed.
• Make it easy to generate “rules,” so that the most important content information can be quickly identified.
• Expose content information to the CA-SIEM’s correlation system, so that content becomes a consideration when identifying potential threats, preserving the content information for forensics but filtering it from initial views within the UI.
1 See “The Evolution of Security Information & Event Management (and the technology that can take us there)”, available at -generation-SIEM/
  11. Conclusion

Content Aware SIEM is available today

While CA-SIEM represents a new generation of SIEM capabilities, these systems are available today. Information security professionals no longer have to settle for a mix of very complex and poorly integrated solutions. For example, enterprise SIEM software that gets data from another vendor’s DAM solution and content information from yet another vendor’s DLP solution will NOT produce a content-aware SIEM; it will simply produce a SIEM with many different types of new logs. A content-aware SIEM can natively understand, integrate and correlate log, context and content data in one tool.

CA-SIEM is the natural evolution of SIEM, but only if built on a datastore that can handle the data

The original Security Event Managers (SEM) started by supporting IDS logs. Bringing in other third-party logs grew the SEM into a Security Information Management (SIM) system, which then evolved further to incorporate contextual information from other sources such as VA and IAM tools, finally becoming what we refer to today as a “Security Information and Event Management” system, or SIEM. Each evolution increased the load placed on the system: how fast events or logs needed to be collected, how much storage was required to support data retention over time, and how quickly the data could be analyzed and accessed in order to produce actionable information.

With SIEM now becoming aware of content information, the strain of information management is being seen again, this time exponentially, due to the depth of event detail being analyzed and retained. Not every SIEM can support these extreme data management requirements.
Even second-generation SIEM platforms can barely handle the logs, much less the context around them, and there is absolutely no chance that adding content will be possible without sacrificing performance and usability. Only SIEM tools built from the start to handle massive volumes of diverse data, logs and content can evolve to the next level and become a true Content Aware SIEM.
Appendix A: A Summary of Content Aware SIEM Capabilities

When selecting a Content Aware SIEM, it is important to ensure that the true features, capabilities, and benefits of a CA-SIEM are provided. The following list is a sample of capabilities that should be offered by the CA-SIEM:

Monitoring and Collection:
• Capability of collecting full content information at required event rates.
  o Baseline collection rates (in EPS) are easy to determine, and can be used to size a SIEM.
  o When looking to collect content information, the traffic load (in Mbps) must also be considered. Make sure the CA-SIEM can handle the data!
• Ability to collect database traffic, both queries and responses, with session detail.
  o Capability to support a broad range of databases, including major SQL variants.
• Ability to decode application traffic to monitor application content.
  o Ability to extract documents, attachments and other content from network communication.
  o Ability to detect and recognize encrypted application traffic.
  o Ability to monitor protocols, including printing protocols, to detect documents being sent to network printers.
  o Ability to monitor content of web applications, including social networking traffic.

Implementation and Deployment:
• Ease of installation, deployment and use.
• Scalability, to support the more severe collection rates associated with content monitoring.
• Scalability, to support the retention and storage of the higher quantity and depth of information provided by content monitoring.
• Performance, to support the real-time correlation and analysis of a broader range of device types and information sources, at greater volumes.
• Performance, to support the more urgent Incident Response requirements associated with detected data loss, fraud, and other critical functions made possible by CA-SIEM.

Usability:
• Capability of the user interface to provide unified data presentation (UDP), including content information.
• Summarization and presentation of log, context and content data.

Correlation and Analysis:
• Real-time correlation of all context and content information, not just logs.
• Ability to match patterns in application content and/or query response data.
• Ability to match patterns and look inside documents, even when compressed (zipped) or sent as attachments.
• Understanding of application-level protocols on the network, such as email, file sharing, IRC and other protocols, in order to detect anomalous behavior.
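As a rough illustration of the "look inside documents, even when compressed (zipped)" capability listed above, the following Python sketch scans a raw payload for watch-list terms and descends into ZIP archives. It is only a sketch under stated assumptions: the function name `scan_payload`, the `SENSITIVE_PATTERNS` watch list and the recursion limit are hypothetical, and it handles only plain text and ZIP containers, not the full range of document formats a CA-SIEM would decode.

```python
import io
import re
import zipfile

# Hypothetical watch list; a real deployment would load its own terms.
SENSITIVE_PATTERNS = [re.compile(r"coke\s+recipe", re.IGNORECASE)]


def scan_payload(data, name="payload", patterns=SENSITIVE_PATTERNS,
                 depth=0, max_depth=3):
    """Return (member_path, matched_text) pairs found in raw bytes.

    ZIP archives are opened in memory and each member is scanned in turn,
    up to max_depth levels of nesting. Anything else is treated as text.
    """
    hits = []
    if depth <= max_depth and zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue
                member = archive.read(info)
                member_name = f"{name}/{info.filename}"
                hits.extend(scan_payload(member, member_name, patterns,
                                         depth + 1, max_depth))
        return hits
    # Not an archive (or nesting limit reached): search the decoded text.
    text = data.decode("utf-8", errors="ignore")
    for pattern in patterns:
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits


if __name__ == "__main__":
    # Build a small zipped "attachment" in memory and scan it.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("notes.txt", "Here is the Coke recipe you asked for.")
    print(scan_payload(buffer.getvalue(), name="attachment.zip"))
    # prints [('attachment.zip/notes.txt', 'Coke recipe')]
```

The nesting limit is there because archives can contain archives; without a bound, a maliciously nested file could tie up the scanner.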
Appendix B: Content Analysis in Content Aware SIEM

The following list is intended to represent a small sample of the questions that can be answered by using a Content Aware SIEM:

Detection of Data Loss and Fraud:
• What are users emailing? What is in the email? Is sensitive, private, or protected information being sent?
• What document(s) are being emailed outside of the company?
• What document(s) are being uploaded to or downloaded from a remote server or website?
• What is being talked about via IM? Are sensitive terms being used? Is protected information being transmitted?
• What queries are being run against corporate database(s)?
• Are failed queries being requested?
• Are database administrators operating outside of expected behaviors?
• Is protected data being sent to network printers? Is it being transferred electronically via email, FTP, IM, etc.?

Enforcement of Corporate IT Policies:
• Are mandatory email disclaimers being removed from emails?
• Are users using applications outside of corporate policy?

Protocol Anomaly Detection:
• Are emails originating from non-SMTP servers?
• Are sessions being rejected?
• Are there underlying network errors (illegal frame sizes, invalid checksums, etc.)?
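One of the protocol anomaly questions above, "Are emails originating from non-SMTP servers?", can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product implements it: the flow record fields (`src_ip`, `dst_ip`, `dst_port`), the function name and the list of approved mail servers are all assumptions, and the addresses come from the reserved documentation ranges.

```python
from collections import Counter

# Assumed inventory of sanctioned mail servers (documentation-range addresses).
APPROVED_SMTP_SERVERS = {"192.0.2.10", "192.0.2.11"}
SMTP_PORT = 25


def find_rogue_smtp_senders(flows, approved=APPROVED_SMTP_SERVERS):
    """Count SMTP connections made by hosts that bypass the approved servers.

    A flow is suspicious when a host that is not an approved mail server
    connects on port 25 to a destination that is not an approved server,
    i.e. it is delivering mail directly instead of relaying it.
    """
    counts = Counter()
    for flow in flows:
        if (flow["dst_port"] == SMTP_PORT
                and flow["src_ip"] not in approved
                and flow["dst_ip"] not in approved):
            counts[flow["src_ip"]] += 1
    return dict(counts)


if __name__ == "__main__":
    sample_flows = [
        # Desktop relaying through the approved server: expected.
        {"src_ip": "198.51.100.7", "dst_ip": "192.0.2.10", "dst_port": 25},
        # Approved server delivering to the outside: expected.
        {"src_ip": "192.0.2.10", "dst_ip": "203.0.113.5", "dst_port": 25},
        # Desktop delivering directly to the outside: suspicious.
        {"src_ip": "198.51.100.9", "dst_ip": "203.0.113.5", "dst_port": 25},
        {"src_ip": "198.51.100.9", "dst_ip": "203.0.113.6", "dst_port": 25},
    ]
    print(find_rogue_smtp_senders(sample_flows))
    # prints {'198.51.100.9': 2}
```

Counting per source host, rather than alerting on each flow, gives the operator a single summary per suspect machine to drill into.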
Appendix C: Use Cases

CA-SIEM’s visibility into application content (data in motion) and database activity (data at rest) provides crossover functionality into other key information security markets. While this includes the support of key features within other functional areas of security, it does not necessarily represent a full replacement of solutions that have been built for these markets. However, the centralized management capabilities of the Content Aware SIEM also provide new value that is lacking from disparate, non-integrated systems, and so CA-SIEM should be considered when evaluating solutions in these markets. Fraud Detection and Data Loss Prevention (DLP) are two obvious examples. In some scenarios users no longer need to procure additional dedicated solutions, which saves costs, because the Content Aware SIEM plays the role of DLP by collecting and analyzing application traffic and other materials transferred over the network.

This can be seen in the following use cases, which are only possible within CA-SIEM:

A. Theft of confidential information
User John Doe logged in as and sent an email to with a document called shoo.doc containing the words "coke recipe" at 12:20PM from the host desktop0232 ( using the SMTP server ( with the subject: “got it.” This activity is detectable with a correlation rule within the CA-SIEM.

B. Use of unauthorized applications
User Joe Rock violated policy by transferring music using an unauthorized application, LimeWire, which he downloaded and installed. He sent large files during work hours, consuming valuable bandwidth.
Further investigation reveals that Joe is a regular offender: he uses Jabber and IRC, and even runs a web server on his desktop, all of which are unauthorized and put the organization at risk. The CA-SIEM can detect the unauthorized download and installation of LimeWire, issuing a notification early in the process. In addition, the large transfers and other policy violations could be detected singly or as part of a more complex rule (if a user downloads unauthorized applications and then exhibits abnormal network usage, alert ‘possible misuse of unauthorized applications’). Finally, the CA-SIEM is able to correlate the user to other observed policy violations, allowing the operator to drill down and discover the additional violations.

C. Cyber slacking in the workplace
Joe Slacker works at MyElectric Company, but during the workday he is also a secret day trader. He connects to daily and spends on average 1 hour each morning and afternoon. He uses the company’s VoIP (SIP) system to make an average of 6 calls daily. He also spends hours on Yahoo Messenger as traderjoe talking to traderbob and tradergill. The CA-SIEM can easily detect this activity by monitoring the content of web applications and protocols, as well as VoIP protocols, and issue an alert: ‘excessive internet activity to non-authorized services.’

D. Find broken business processes
Company eTrader processes trades. The final step in the trade process is the creation of PDFs and printed reports, which must be sent to customers. CA-SIEM discovers that the customers who subscribed to the paperless option are being emailed PDF reports without any password protection or encryption.
E. Use of weak passwords
Company ABC’s security policy requires the use of strong passwords for all user system and application accounts. This is strictly managed for all AD accounts. However, dozens of weak passwords on outward-facing FTP servers, mail servers, and critical web applications would go unmonitored and undetected in security audits because they don’t use AD. Because CA-SIEM fully decodes application and protocol traffic, a notification can easily be generated if a weak password is detected, including relevant user names, so that passwords can be strengthened and the offending users can be given security awareness training.

F. Attack on blind spot
An HP LaserJet 4050 printer is not considered a critical asset, so it is ignored in VA scans and not monitored in the SIEM. The printer supports web services and has been compromised; it is now being used to launch attacks. These attacks are discovered by CA-SIEM, which sees a printer sending non-printer protocols such as HTTP, FTP and SSH instead of expected protocols such as PJL, IPP and LPD/LPR.

G. Dangerous legacy code
Rob Braveheart was a hard worker, but sadly he has been deceased for over a year. His AD account was deleted but not his Unix account. Some scripts that Rob wrote still use Rob's Unix credentials. The scripts run every night. It is against corporate policy to have scripts using hard-coded usernames and passwords in clear text. Rob was a Unix sysadmin, so his ID has administrative privileges, which presents a major risk.
CA-SIEM is able to detect the activity of a ‘ghost user’ through the context provided by Identity Management systems such as Active Directory, as well as visibility into the content information, which shows the presence of disabled usernames in recent application activity.

H. Theft of credentials
Rob Braveheart’s coworker Gill uses Rob’s username and password, which he found embedded in the above scripts. He uses these credentials to snoop on file systems containing sensitive corporate data. He mounts the file systems (which use the SMB protocol), but CA-SIEM provides visibility into SMB authentication events, which reveal the theft of the credentials.

I. Access outside business hours
Gill lives close to the office and comes into the office regularly on weekends. He feels it’s his right to use office stationery, copiers and printers. The abuse would have gone unnoticed if his account couldn’t be linked to printing activity from his desktop. CA-SIEM can easily correlate the normally benign application activity (printing from Gill’s desktop) with additional context (it is the weekend), detect the unauthorized behavior, and issue a notification. Further investigation using CA-SIEM could determine whether the printing was simple misuse of resources or the theft of sensitive information, by examining the content information around the activity.

J. Sensitive data via Web Mail
Cynthia is in customer support at a company that sells retail goods online. She takes customer orders over the phone and needs to validate each customer’s credit card in order to complete the transaction. She keeps track of the card numbers and security codes by typing them into a text file.
One evening she sends herself a webmail message with the text file attached. The CA-SIEM can detect the presence of sensitive credit card information within the text file.

K. Discover spam bots and backdoors
CA-SIEM reported outbound SMTP traffic from an unrecognized SMTP server, which was not discovered by the company’s IDS, IPS or AV solutions. Further drill-down also revealed that the compromised machine had a rootkit that disabled AV. CA-SIEM also reported that a thousand emails had already been sent from the computer with the ‘from’ address of the company’s CEO.
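To make use case J more concrete, the following Python sketch shows one common way to recognize credit card numbers in extracted text: find digit runs of plausible length, then keep only those that pass the Luhn checksum used by payment card numbers. It is an illustration under stated assumptions, not a description of any vendor's detection logic; the function names and the candidate pattern are hypothetical, and the sample number is a widely used test value rather than a real card.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by
# single spaces or hyphens. This pattern is a simplifying assumption.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:      # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text):
    """Return Luhn-valid card number candidates found in text, digits only."""
    found = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group(0))
        if luhn_valid(digits):
            found.append(digits)
    return found


if __name__ == "__main__":
    attachment_text = "order 1007: card 4111 1111 1111 1111 code 123"
    print(find_card_numbers(attachment_text))
    # prints ['4111111111111111']
```

The checksum step matters because long digit strings such as order or tracking numbers are common in business text; requiring a valid Luhn check removes most of those false positives, although a production rule would likely add further checks such as issuer prefixes.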