Security Data Deluge - Zions Bank's Hadoop Based Security Data Warehouse
Claiming the Intersection @ Information Security and Fraud

Brian Christian, CTO and Co-Founder, Zettaset
Michael Fowkes, SVP and Director of Fraud Prevention and Security Analytics, Zions Bancorp
  
Security Data Warehouse
•  A security data warehouse is a massive database intended to aggregate event data across your entire enterprise, for long-term, large-scale security/fraud-related analytics
•  The utility of this system is realized once the data is normalized into a common format and mined by experts with an intimate understanding of the data itself
•  It’s also affordable to the common company
  
Why SDW Today
•  “More data is generated in 3 days than in the history of the world to 2003” – Eric Schmidt
•  Fraudsters continue to innovate and leverage explosive growth of portable computing
•  Fraudsters continue to study “us”
•  Through massive data sets and comprehensive analytic modeling, you/we can begin to study them
  
	
  
SDW isn’t a product
•  Security is never a product; it’s a process
•  There are past “processes” that help build the system
    –  Key example is SIEM: SIEM creates a “Big Data” problem for InfoSec. Instead of dumping that data after 60 days, store ALL the data in the SDW – even the events you’re currently not logging
•  When fraud teams work with security, the common platform will accelerate the program
  
SDW Data Collection
•  The SDW is intended to collect EVERYTHING
    –  Everything in terms of event data, not just security
•  SDW business analysts live by the expression “the more data I receive, the better I feel”
•  All data is created equal – but data mined in certain combinations is more interesting than others
    –  Trust but verify: this goes for both automated controls and human behavior
  
SDW System Availability
•  The system should be easy to use
    –  Average skilled labor to maintain the platform/cluster
•  The system must be fault tolerant
    –  At 700TB – 2PB of data, when hard drives fail the system should maintain its process
•  The SDW should grow as needed without performance degradation
    –  Affordable to meet tomorrow’s demand
  
SDW is used for Mining
•  SDW is where Information Security and Fraud teams meet to solve problems
    –  Most InfoSec and Fraud teams don’t communicate
    –  Silos of data are collapsed into a single view
•  The SDW is a laboratory – not a SIEM
    –  Are there indicators that other users/accounts are susceptible to fraud/attack?
    –  Run the model through the entire database to account for similar attributes
  
What is a Security Data Warehouse
•  A Security Data Warehouse is a massive mineable database
•  The system is horizontally scalable to petabytes of data
•  The data available for analysis is historical, going back many years
•  It’s affordable to the common person!
    –  Risk Management are the common people in IT
  
Why did we build a SDW?
[Diagram: SIEM surrounded by multiple DATA sources]
  
SIEM Issues
•  Rigid data models
•  Did not deal well with unstructured data
•  RDBMS performance with large data sets
•  Limited ways to interact with data
  
What we built

Why Hadoop & Hive
•  Scalability / performance
•  Manage resources
    –  Fair Scheduler
•  Fault tolerance
•  SQL-like language (HiveQL)
    –  Most of the staff had SQL skills
•  Easy application / tool integration
    –  ODBC / JDBC driver
•  Fast data ingestion
•  Can handle unstructured data
•  Flexibility & extensibility
    –  UDFs
    –  Streaming jobs
  
ETL Philosophy
•  “Pre-mine” intelligence during the ETL process
    –  Add value at time of capture (enrichment)
    –  Quickly analyze important data
    –  Automate time-sensitive activities
•  Load all data… no filtering of data that will be loaded into the warehouse
    –  You don’t know what you will want tomorrow
    –  Leverage file compression, RCFiles, and table partitioning to address storage / performance issues
•  Store 2 years’ worth of historical data
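The "add value at time of capture" idea can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the pipeline described in the deck: the field names (`ts`, `src_ip`, `dst_ip`), the `enrich_event` helper, and the blacklist contents are all hypothetical. It tags each event with a `day` value that could serve as a partition key and a blacklist flag, while keeping every original field, in line with the "load all data" rule.

```python
from datetime import datetime

# Hypothetical blacklist; the addresses are documentation-range placeholders.
IP_BLACKLIST = {"203.0.113.7", "198.51.100.23"}


def enrich_event(event, blacklist=IP_BLACKLIST):
    """Return a copy of the event with fields added at capture time.

    Nothing is filtered or dropped: every original field is preserved.
    """
    enriched = dict(event)
    # Derive a day value that could be used as a table partition key.
    ts = datetime.strptime(event["ts"], "%Y-%m-%d %H:%M:%S")
    enriched["day"] = ts.strftime("%Y-%m-%d")
    # "Pre-mine": flag traffic whose source is on the blacklist.
    enriched["src_blacklisted"] = event["src_ip"] in blacklist
    return enriched


if __name__ == "__main__":
    raw = {"ts": "2012-05-26 10:15:00", "src_ip": "203.0.113.7", "dst_ip": "1.1.1.1"}
    print(enrich_event(raw))
```

Doing the lookup once during ETL means later queries can filter on the stored flag instead of repeating the join against the blacklist.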
  
The Team
•  Data Scientist
•  Data Analyst
•  LOB User
•  Data Engineer
•  Data Platform Administrator
  
What was the outcome?
Hadoop jobtracker stats
•  1709 daily ETL and model jobs
•  350 daily employee jobs
  
Data Examples
•  Web server logs
•  OS logs
•  DB logs
•  Proxy server logs
•  SPAM filter logs
•  A/V events
•  DLP events
•  VPN logs
•  DNS logs
•  Firewall logs
•  E-mail logs
•  Router / switch logs
•  IP blacklists
•  Vulnerability scan results
•  Customer database(s)
•  Fraud model alerts
•  Mainframe activity logs
•  HTTP (customer Internet activity) logs
•  ATM/POS transactions
•  Credit card transactions
•  G/L logs
•  ACH, Wire, and Deposit transactions
•  On-line banking application logs
•  Deposits / savings / time account daily balances

Over 120 data sets
  
How users interact with the data
•  Data Scientist
    –  KNIME
    –  R
    –  Tableau
•  Data Analyst
    –  SQuirreL SQL
    –  Hive command line
•  LOB User
    –  Datameer
    –  Custom web app for common queries
         •  Parameterized queries
         •  Output to HTML table or tab delimited file
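The two output formats mentioned for the custom web app can be sketched with the Python standard library. This covers only the rendering step and is an assumption-laden illustration: the helper names `to_tab_delimited` and `to_html_table` are hypothetical, and running the parameterized query itself is out of scope here.

```python
import csv
import html
import io


def to_tab_delimited(columns, rows):
    """Render a header and result rows as tab-delimited text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()


def to_html_table(columns, rows):
    """Render a header and result rows as an HTML table, escaping every cell."""
    head = "".join(f"<th>{html.escape(str(c))}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


if __name__ == "__main__":
    cols = ["dst_ip", "log_entry_count"]
    data = [("1.1.1.1", 3)]
    print(to_tab_delimited(cols, data))
    print(to_html_table(cols, data))
```

Escaping each cell matters because log payloads can contain markup supplied by an attacker.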
  
Sample firewall log query
-- One row per action / destination / protocol / service / rule,
-- with the distinct source IPs and rule names seen, plus a hit count.
SELECT
    collect_set(src_ip) AS src_ips,
    dst_ip,
    protocol,
    action,
    rule_uid,
    collect_set(rule) AS rules,
    count(*) AS log_entry_count
FROM firewall_logs
WHERE day = '2012-05-26'
AND dst_ip = '1.1.1.1'
GROUP BY action, dst_ip, protocol, service, rule_uid
ORDER BY dst_ip, protocol, rule_uid
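For readers unfamiliar with `collect_set`, the grouping in the query above can be mimicked in plain Python. This is only a sketch of the semantics on in-memory rows, not how Hive executes it: the `summarize` helper and the sample field values are hypothetical, and the rows are assumed to be already restricted to a single day.

```python
from collections import defaultdict


def summarize(rows, dst_ip):
    """Group firewall rows for one destination, collecting distinct values.

    A Python set plays the role of Hive's collect_set: duplicates are dropped.
    """
    groups = defaultdict(
        lambda: {"src_ips": set(), "rules": set(), "log_entry_count": 0}
    )
    for r in rows:
        if r["dst_ip"] != dst_ip:  # the WHERE clause
            continue
        # The GROUP BY key.
        key = (r["action"], r["dst_ip"], r["protocol"], r["service"], r["rule_uid"])
        g = groups[key]
        g["src_ips"].add(r["src_ip"])  # collect_set(src_ip)
        g["rules"].add(r["rule"])      # collect_set(rule)
        g["log_entry_count"] += 1      # count(*)
    return dict(groups)


if __name__ == "__main__":
    base = {"dst_ip": "1.1.1.1", "protocol": "tcp", "service": "http",
            "action": "accept", "rule_uid": "r1", "rule": "12"}
    rows = [dict(base, src_ip="10.0.0.1"), dict(base, src_ip="10.0.0.2"),
            dict(base, src_ip="10.0.0.1")]
    print(summarize(rows, "1.1.1.1"))
```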
  	
  
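Dealing with unstructured data
Via Perl:

# INFILE and OUTFILE are assumed to be filehandles that are already open.
while (<INFILE>) {
    # Only process lines that contain a "Transaction Inquiry" record.
    if ($_ =~ /\s+\w+\s+Transaction\sInquiry\s+/) {
        chomp $_;
        # Each line is pipe-delimited: timestamp, IP, port, free-text payload.
        my ($ts, $ip, $port, $payload) = split(/\|/, $_);
        # Reset per line so values never carry over from a previous record.
        my ($product_code, $account, $bank, $agent) = ('', '', '', '');

        if ($payload =~ /\s+Account:\s+\d+\s+Appl:\s+(\w+)\s+/)
            { $product_code = $1; }
        if ($payload =~ /\s+Account:\s+(\d+)\s+Appl:\s+\w+\s+/)
            { $account = $1; }
        if ($payload =~ /\s+\w+\s+Transaction\s+Inquiry\s+(\w+)\s+/)
            { $bank = $1; }
        if ($payload =~ /\s+(\w+)\s+Transaction\s+Inquiry\s+/)
            { $agent = $1; }

        print OUTFILE "$ts|$ip|$port|$product_code|$account|$bank|$agent\n";
    }
}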
  
Dealing with unstructured data
Via Hive:

-- Backslashes are doubled because Hive string literals treat "\" as an escape character.
SELECT
    ts,
    ip,
    port,
    regexp_extract(payload, '\\s+Account:\\s+\\d+\\s+Appl:\\s+(\\w+)\\s+', 1) AS product_code,
    regexp_extract(payload, '\\s+Account:\\s+(\\d+)\\s+Appl:\\s+\\w+\\s+', 1) AS account,
    regexp_extract(payload, '\\s+\\w+\\s+Transaction\\s+Inquiry\\s+(\\w+)\\s+', 1) AS bank,
    regexp_extract(payload, '\\s+(\\w+)\\s+Transaction\\s+Inquiry\\s+', 1) AS agent
FROM mainframe_logs
WHERE day = '2012-05-26'
AND payload RLIKE '\\s+\\w+\\s+Transaction\\sInquiry\\s+'
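To see what the four capture groups pull out, the same patterns can be tried against a single line in Python. This is a sketch under assumptions: the deck does not show a real mainframe log line, so the sample record below is invented to fit the patterns, and the `parse_line` helper is hypothetical.

```python
import re

# The same patterns as the Perl and Hive versions, written as Python raw strings.
FILTER = re.compile(r"\s+\w+\s+Transaction\sInquiry\s+")
PRODUCT = re.compile(r"\s+Account:\s+\d+\s+Appl:\s+(\w+)\s+")
ACCOUNT = re.compile(r"\s+Account:\s+(\d+)\s+Appl:\s+\w+\s+")
BANK = re.compile(r"\s+\w+\s+Transaction\s+Inquiry\s+(\w+)\s+")
AGENT = re.compile(r"\s+(\w+)\s+Transaction\s+Inquiry\s+")


def first_group(pattern, text):
    """Return capture group 1, or an empty string when the pattern does not match."""
    match = pattern.search(text)
    return match.group(1) if match else ""


def parse_line(line):
    """Turn one raw log line into a pipe-delimited structured record.

    Returns None for lines that are not Transaction Inquiry records.
    """
    line = line.rstrip("\n")
    if not FILTER.search(line):
        return None
    # maxsplit=3 keeps the free-text payload whole even if it contains "|".
    ts, ip, port, payload = line.split("|", 3)
    fields = [ts, ip, port,
              first_group(PRODUCT, payload), first_group(ACCOUNT, payload),
              first_group(BANK, payload), first_group(AGENT, payload)]
    return "|".join(fields)


if __name__ == "__main__":
    # Invented sample line; the real payload layout is not given in the deck.
    sample = ("2012-05-26 10:15:00|10.0.0.5|23|"
              " AGENT7 Transaction Inquiry BANK01 Account: 123456 Appl: DDA \n")
    print(parse_line(sample))
```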
  
Predictive analytics examples
•  Spear phishing detection
•  Phishing website detection
•  Fraud detection:
   –  Online banking anomaly detection
   –  ACH / Wire fraud
   –  Monitoring of high-risk employee activities
  
Visualization example
[Figure: “Fraud Events Per Day”, a calendar-style chart with months JAN to DEC across and days Sun to Sat down, one panel each for 2010, 2011, and 2012]
  
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Recently uploaded (20)

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Security data deluge

  • 1. Security Data Deluge - Zions Bank's Hadoop Based Security Data Warehouse
      Claiming the Intersection @ Information Security and Fraud
      Brian Christian, CTO and Co-Founder, Zettaset
      Michael Fowkes, SVP and Director of Fraud Prevention and Security Analytics, Zions Bancorp
  • 2. Security Data Warehouse
      • A security data warehouse is a massive database intended to aggregate event data across your entire enterprise, for long-term, large-scale security/fraud-related analytics
      • The utility of this system is realized once the data is normalized into a common format and mined by experts with intimate understanding of the data itself
      • It's also affordable to the common company
  • 3. Why SDW Today
      • “More data is generated in 3 days than in the history of the world to 2003” – Eric Schmidt
      • Fraudsters continue to innovate and leverage explosive growth of portable computing
      • Fraudsters continue to study “us”
      • Through massive data sets and comprehensive analytic modeling, you/we can begin to study them
  • 4. SDW isn't a product
      • Security is never a product; it's a process
      • There are past “processes” that help build the system
          – Key example is SIEM: SIEM creates a “Big Data” problem for InfoSec. Instead of dumping that data after 60 days, store ALL the data in the SDW, even the events you're currently not logging
      • When fraud teams work with security, the common platform will accelerate the program
  • 5. SDW Data Collection
      • The SDW is intended to collect EVERYTHING
          – Everything in terms of event data, not just security
      • SDW business analysts live by the expression “the more data I receive, the better I feel”
      • All data is created equal, but data mined in certain combinations is more interesting than others
          – Trust but verify: this goes for both automated controls and human behavior
  • 6. SDW System Availability
      • The system should be easy to use
          – Average-skilled labor can maintain the platform/cluster
      • The system must be fault tolerant
          – At 700 TB to 2 PB of data, the system should keep processing when hard drives fail
      • The SDW should grow as needed without performance degradation
          – Affordable to meet tomorrow's demand
  • 7. SDW is used for Mining
      • SDW is where Information Security and Fraud teams meet to solve problems
          – Most InfoSec and Fraud teams don't communicate
          – Silos of data are collapsed into a single view
      • The SDW is a laboratory, not a SIEM
          – Are there indicators that other users/accounts are susceptible to fraud/attack?
          – Run the model through the entire database to account for similar attributes
  • 8. What is a Security Data Warehouse
      • A Security Data Warehouse is a massive mineable database
      • The system is horizontally scalable to petabytes of data
      • The data available for analysis is historical and goes back many years
      • It's affordable to the common person!
          – Risk Management are the common people in IT
  • 9. Why did we build an SDW?
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15. [Figure: diagram labeled SIEM with seven DATA elements]
  • 16. SIEM Issues
      • Rigid data models
      • Did not deal well with unstructured data
      • RDBMS performance with large data sets
      • Limited ways to interact with data
  • 18.
  • 19. Why Hadoop & Hive
      • Scalability / performance
      • Manage resources
          – Fair Scheduler
      • Fault tolerance
      • SQL-like language (HiveQL)
          – Most of the staff had SQL skills
      • Easy application / tool integration
          – ODBC / JDBC driver
      • Fast data ingestion
      • Can handle unstructured data
      • Flexibility & extensibility
          – UDFs
          – Streaming jobs
  • 20.
  • 21.
  • 22. ETL Philosophy
      • “Pre-mine” intelligence during the ETL process
          – Add value at time of capture (enrichment)
          – Quickly analyze important data
          – Automate time-sensitive activities
      • Load all data… no filtering of data that will be loaded into the warehouse
          – You don't know what you will want tomorrow
          – Leverage file compression, RCFiles, and table partitioning to address storage / performance issues
      • Store 2 years' worth of historical data
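The enrichment and partitioning ideas on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline described in the talk: the event fields (`ts`, `src_ip`), the `BLACKLIST` set, and the `day=` directory layout are hypothetical, chosen to line up with the `day` column that the deck's sample queries filter on.

```python
from datetime import datetime

# Hypothetical IP blacklist; in practice this would come from a loaded data set.
BLACKLIST = {"203.0.113.7", "198.51.100.23"}


def enrich(event, blacklist=BLACKLIST):
    """Return a copy of the event with fields added at capture time."""
    enriched = dict(event)
    # "Pre-mine" intelligence: flag known-bad sources while loading.
    enriched["src_blacklisted"] = event["src_ip"] in blacklist
    # Derive the partition key (one partition per calendar day).
    enriched["day"] = datetime.fromisoformat(event["ts"]).date().isoformat()
    return enriched


def partition_path(table, event):
    """Build a day-partitioned storage path for an enriched event."""
    return f"{table}/day={event['day']}"


event = {"ts": "2012-05-26T10:15:00", "src_ip": "203.0.113.7", "dst_ip": "1.1.1.1"}
loaded = enrich(event)
print(partition_path("firewall_logs", loaded), loaded["src_blacklisted"])
```

Every event is kept and only annotated, which matches the "load all data" rule; the day-based path is what lets a later query touch a single partition instead of the whole table.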
  • 23.
  • 24. The Team
      • Data Scientist
      • Data Analyst
      • LOB User
      • Data Engineer
      • Data Platform Administrator
  • 25. What was the outcome?
  • 26. Hadoop jobtracker stats
      • 1709 daily ETL and model jobs
      • 350 daily employee jobs
  • 27.
  • 28. Data Examples
      • Web server logs
      • OS logs
      • DB logs
      • Proxy server logs
      • SPAM filter logs
      • A/V events
      • DLP events
      • VPN logs
      • DNS logs
      • Firewall logs
      • E-mail logs
      • Router / switch logs
      • IP blacklists
      • Vulnerability scan results
      • Customer database(s)
      • Fraud model alerts
      • Mainframe activity logs
      • HTTP (customer Internet activity) logs
      • ATM/POS transactions
      • Credit card transactions
      • G/L logs
      • ACH, Wire, and Deposit transactions
      • On-line banking application logs
      • Deposits / savings / time account daily balances
      Over 120 data sets
  • 29. How users interact with the data
      • Data Scientist
          – KNIME
          – R
          – Tableau
      • Data Analyst
          – SQuirreL SQL
          – Hive command line
      • LOB User
          – Datameer
          – Custom web app for common queries
              • Parameterized queries
              • Output to HTML table or tab-delimited file
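The "parameterized queries" idea for the custom web app might look roughly like the Python sketch below. It is not the application described in the talk: the template, the function names, and the choice to accept only an ISO date and an IPv4 address are assumptions made for illustration. Validating each parameter by parsing it into a typed value is one simple way to keep free-form user text out of the query string.

```python
import ipaddress
from datetime import date

# Hypothetical query template; table and column names are illustrative.
TEMPLATE = (
    "SELECT src_ip, dst_ip, action FROM firewall_logs "
    "WHERE day = '{day}' AND dst_ip = '{dst_ip}'"
)


def build_query(day, dst_ip):
    """Validate user-supplied parameters, then fill in the template."""
    # Both parsers raise ValueError on malformed input, so arbitrary text
    # never reaches the query string. IPv4 only in this sketch.
    safe_day = date.fromisoformat(day).isoformat()
    safe_ip = str(ipaddress.IPv4Address(dst_ip))
    return TEMPLATE.format(day=safe_day, dst_ip=safe_ip)


def to_tab_delimited(header, rows):
    """Render result rows as tab-delimited text, one line per row."""
    lines = ["\t".join(header)]
    lines += ["\t".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"


print(build_query("2012-05-26", "1.1.1.1"))
```

A line-of-business user then supplies only the date and address; the query text itself stays fixed and reviewed.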
  • 30. Sample firewall log query

      SELECT
          collect_set(src_ip) AS src_ips,
          dst_ip,
          protocol,
          action,
          rule_uid,
          collect_set(rule) AS rules,
          count(*) AS log_entry_count
      FROM firewall_logs
      WHERE day = '2012-05-26'
        AND dst_ip = '1.1.1.1'
      GROUP BY action, dst_ip, protocol, service, rule_uid
      ORDER BY dst_ip, protocol, rule_uid
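For readers less familiar with HiveQL, the aggregation in this query can be mimicked in plain Python. The sketch below is an illustration only: the row dictionaries are made-up sample data, the function name is hypothetical, and the slide's extra `service` grouping column is left out for brevity. A `set` plays the role of `collect_set()` and a counter plays the role of `count(*)`.

```python
from collections import defaultdict


def summarize(rows):
    """Group firewall rows and aggregate them like collect_set() and count(*)."""
    groups = defaultdict(
        lambda: {"src_ips": set(), "rules": set(), "log_entry_count": 0}
    )
    for row in rows:
        key = (row["dst_ip"], row["protocol"], row["action"], row["rule_uid"])
        group = groups[key]
        group["src_ips"].add(row["src_ip"])  # a set keeps distinct values only
        group["rules"].add(row["rule"])
        group["log_entry_count"] += 1

    result = []
    # Order by dst_ip, protocol, rule_uid, as the query's ORDER BY does.
    for key in sorted(groups, key=lambda k: (k[0], k[1], k[3])):
        dst_ip, protocol, action, rule_uid = key
        group = groups[key]
        result.append({
            "src_ips": sorted(group["src_ips"]),
            "dst_ip": dst_ip,
            "protocol": protocol,
            "action": action,
            "rule_uid": rule_uid,
            "rules": sorted(group["rules"]),
            "log_entry_count": group["log_entry_count"],
        })
    return result


# Made-up sample rows for illustration.
rows = [
    {"src_ip": "10.0.0.5", "dst_ip": "1.1.1.1", "protocol": "tcp",
     "action": "accept", "rule_uid": "r1", "rule": "7"},
    {"src_ip": "10.0.0.6", "dst_ip": "1.1.1.1", "protocol": "tcp",
     "action": "accept", "rule_uid": "r1", "rule": "7"},
    {"src_ip": "10.0.0.5", "dst_ip": "1.1.1.1", "protocol": "tcp",
     "action": "accept", "rule_uid": "r1", "rule": "7"},
]
summary = summarize(rows)
print(summary)
```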
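  • 31. Dealing with unstructured data
      Via Perl:

      # INFILE and OUTFILE are assumed to be filehandles opened elsewhere.
      while (<INFILE>) {
          # Only process "Transaction Inquiry" records.
          if ($_ =~ /\s+\w+\s+Transaction\sInquiry\s+/) {
              chomp $_;
              my ($ts, $ip, $port, $payload) = split(/\|/, $_);
              # Reset per record so values never carry over from a previous line.
              my ($product_code, $account, $bank, $agent) = ('', '', '', '');

              if ($payload =~ /\s+Account:\s+\d+\s+Appl:\s+(\w+)\s+/)
                  { $product_code = $1; }
              if ($payload =~ /\s+Account:\s+(\d+)\s+Appl:\s+\w+\s+/)
                  { $account = $1; }
              if ($payload =~ /\s+\w+\s+Transaction\s+Inquiry\s+(\w+)\s+/)
                  { $bank = $1; }
              if ($payload =~ /\s+(\w+)\s+Transaction\s+Inquiry\s+/)
                  { $agent = $1; }

              print OUTFILE "$ts|$ip|$port|$product_code|$account|$bank|$agent\n";
          }
      }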
  • 32. Dealing with unstructured data
      Via Hive:

      -- Backslashes are doubled because Hive string literals treat \ as an escape character.
      SELECT
          ts,
          ip,
          port,
          regexp_extract(payload, '\\s+Account:\\s+\\d+\\s+Appl:\\s+(\\w+)\\s+', 1) AS product_code,
          regexp_extract(payload, '\\s+Account:\\s+(\\d+)\\s+Appl:\\s+\\w+\\s+', 1) AS account,
          regexp_extract(payload, '\\s+\\w+\\s+Transaction\\s+Inquiry\\s+(\\w+)\\s+', 1) AS bank,
          regexp_extract(payload, '\\s+(\\w+)\\s+Transaction\\s+Inquiry\\s+', 1) AS agent
      FROM mainframe_logs
      WHERE day = '2012-05-26'
        AND payload RLIKE '\\s+\\w+\\s+Transaction\\sInquiry\\s+'
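The same extraction can be sketched in Python, which makes it easy to try the patterns against a single line before running them at scale. The patterns below are the ones from the two slides, written with explicit `\s`, `\w`, and `\d` classes; the sample log line, the field values in it, and the names `parse_line`, `RECORD`, and `FIELDS` are made up for illustration, since the deck does not show real mainframe log data.

```python
import re

# Filter: only "Transaction Inquiry" records are parsed.
RECORD = re.compile(r"\s+\w+\s+Transaction\sInquiry\s+")

# One capture group per output field, applied to the payload column.
FIELDS = {
    "product_code": re.compile(r"\s+Account:\s+\d+\s+Appl:\s+(\w+)\s+"),
    "account": re.compile(r"\s+Account:\s+(\d+)\s+Appl:\s+\w+\s+"),
    "bank": re.compile(r"\s+\w+\s+Transaction\s+Inquiry\s+(\w+)\s+"),
    "agent": re.compile(r"\s+(\w+)\s+Transaction\s+Inquiry\s+"),
}


def parse_line(line):
    """Turn one pipe-delimited log line into a dict, or None if it is not a match."""
    if not RECORD.search(line):
        return None
    ts, ip, port, payload = line.rstrip("\n").split("|", 3)
    record = {"ts": ts, "ip": ip, "port": port}
    for name, pattern in FIELDS.items():
        match = pattern.search(payload)
        # Fields that are not found become empty strings in this sketch.
        record[name] = match.group(1) if match else ""
    return record


# Hypothetical sample line; real mainframe payloads will differ.
SAMPLE = (
    "2012-05-26 10:15:00|10.0.0.5|4021|"
    " TELLER01 Transaction Inquiry ZB01 Account: 123456 Appl: DDA \n"
)
parsed = parse_line(SAMPLE)
print(parsed)
```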
  • 33. Predictive analytics examples
      • Spear phishing detection
      • Phishing website detection
      • Fraud detection:
          – Online banking anomaly detection
          – ACH / Wire fraud
          – Monitoring of high-risk employee activities
  • 34. Visualization example
      [Figure: Fraud Events Per Day, by month (JAN to DEC) and day of week (Sun to Sat), for 2010, 2011, and 2012]
  • 35. Sessions will resume at 2:25pm