SlideShare a Scribd company logo
1 of 46
Download to read offline
Adversarial	
  Analy-cs	
  101	
  
Strata	
  Conference	
  &	
  Hadoop	
  World	
  
October	
  28,	
  2013	
  
Robert	
  L.	
  Grossman	
  
Open	
  Data	
  Group	
  
Examples	
  of	
  Adversarial	
  Analy-cs	
  	
  
•  Compete	
  for	
  resources	
  
–  Low	
  latency	
  trading	
  
–  Auc-ons	
  

•  Steal	
  someone	
  else’s	
  resources	
  
–  Credit	
  card	
  fraud	
  rings	
  	
  
–  Insurance	
  fraud	
  rings	
  	
  

•  Hide	
  your	
  ac-vity	
  
–  Click	
  spam	
  	
  
–  Exfiltra-on	
  of	
  informa-on	
  
	
  
Conven-onal	
  vs	
  Adversarial	
  Analy-cs	
  
Conven&onal	
  Analy&cs	
  
Type	
  of	
  game	
   Gain	
  /	
  Gain	
  
Subject	
  
Individual	
  
Behavior	
  
DriUs	
  over	
  -me	
  
Transparency	
   Behavior	
  usually	
  open	
  
Automa-on	
   Manual	
  (an	
  individual)	
  

Adversarial	
  Analy&cs	
  
Gain	
  /	
  Loss	
  
Group	
  /	
  ring	
  /	
  organiza-on	
  
Sudden	
  changes	
  
Behavior	
  usually	
  hidden	
  
Some-mes	
  another	
  model	
  
1.	
  Points	
  of	
  Compromise	
  

Source:	
  Wall	
  Street	
  Journal,	
  May	
  4,	
  2007	
  
Case	
  Study:	
  Points	
  of	
  Compromise	
  
•  TJX	
  compromise	
  
•  Wireless	
  Point	
  of	
  Sale	
  
devices	
  compromised	
  	
  
•  Personal	
  informa-on	
  on	
  
451,000	
  individuals	
  taken	
  
•  Informa-on	
  used	
  months	
  
later	
  for	
  fraudulent	
  
purchases.	
  
•  WSJ	
  reports	
  that	
  45.7	
  
million	
  credit	
  card	
  accounts	
  
at	
  risk	
  
•  “TJX's	
  breach-­‐related	
  bill	
  
could	
  surpass	
  $1	
  billion	
  
over	
  five	
  years”	
  

Source:	
  Wall	
  Street	
  Journal,	
  
May	
  4,	
  2007	
  
Learning	
  TradecraU	
  
The	
  adversary	
  is	
  oUen	
  a	
  group,	
  
criminal	
  ring,	
  etc.	
  
It’s	
  Not	
  an	
  Individual…	
  

Source:	
  Jus-ce	
  Department,	
  New	
  York	
  Times.	
  
Gonzalez	
  TradecraU	
  
1.  Exploit	
  vulnerabili-es	
  in	
  wireless	
  networks	
  
outside	
  of	
  retail	
  stores.	
  
2.  Exploit	
  vulnerabili-es	
  in	
  databases	
  of	
  retail	
  
organiza-ons.	
  
3.  Gain	
  unauthorized	
  access	
  to	
  networks	
  that	
  
process	
  and	
  store	
  credit	
  card	
  transac-ons.	
  	
  
4.  Exfiltrate	
  40	
  Million	
  track	
  2	
  records	
  (data	
  on	
  
credit	
  card’s	
  magne-c	
  strip)	
  
Gonzalez	
  TradecraU	
  (cont’d)	
  
5.  Sell	
  track	
  2	
  data	
  in	
  Eastern	
  Europe,	
  USA,	
  and	
  
other	
  places.	
  
6.  Create	
  counterfeit	
  ATM	
  and	
  debit	
  cards	
  using	
  
track	
  2	
  data.	
  
7.  Conceal	
  and	
  launder	
  the	
  illegal	
  proceeds	
  
obtained	
  through	
  anonymous	
  web	
  currencies	
  
in	
  the	
  US	
  and	
  Russia.	
  
Source:	
  US	
  District	
  Court,	
  District	
  of	
  Massachusehs,	
  Indictment	
  of	
  
Albert	
  Gonzalez,	
  August	
  5,	
  2008.	
  
Data	
  Is	
  Large	
  
•  TJX	
  compromise	
  data	
  is	
  too	
  
large	
  to	
  fit	
  into	
  memory	
  
•  Data	
  is	
  difficult	
  to	
  fit	
  into	
  
database	
  
•  Millions	
  of	
  possible	
  points	
  of	
  
compromise	
  
•  Data	
  must	
  be	
  kept	
  for	
  months	
  
to	
  years	
  

Source:	
  Wall	
  Street	
  Journal,	
  May	
  4,	
  2007	
  
2.	
  Introduc-on	
  to	
  	
  
Adversarial	
  Analy-cs	
  
Conven-onal	
  Analy-cs	
  

•  Example:	
  predic-ve	
  
model	
  to	
  select	
  online	
  
ads.	
  
•  Example:	
  predic-ve	
  
model	
  to	
  classify	
  
sen-ment	
  of	
  a	
  customer	
  
service	
  interac-on.	
  

Adversarial	
  Analy-cs	
  

•  Example:	
  commercial	
  
compe-tor	
  trying	
  to	
  
maximize	
  trading	
  gains.	
  
•  Example:	
  criminal	
  gang	
  
ahacking	
  a	
  system	
  and	
  
hiding	
  its	
  behavior.	
  
What’s	
  Different	
  About	
  the	
  Models?	
  

Data	
  
Data	
  size	
  
Model	
  

Conven&onal	
  Analy&cs	
  
Labeled	
  
Small	
  to	
  large	
  
Classifica-on	
  /	
  	
  
Regression	
  

Frequency	
   Monthly,	
  quarterly,	
  
of	
  update	
   yearly	
  

Adversarial	
  Analy&cs	
  
Unlabeled	
  
Small	
  to	
  very	
  large	
  
Clustering,	
  change	
  
detec-on	
  
Hourly,	
  daily,	
  weekly,	
  …	
  
Types	
  of	
  Adversaries	
  
Adversary	
  

Home	
  Team	
  
Use	
  specialized	
  teams	
  
and	
  approaches.	
  
Build	
  custom	
  
analy-c	
  models.	
  

Use	
  products.	
  

Change	
  
the	
  game.	
  

$10,000,000	
  

Create	
  new	
  analy-c	
  
techniques.	
  

$100,000	
  

Use	
  exis-ng	
  analy-c	
  techniques.	
  

$1,000	
  

Source:	
  This	
  approach	
  is	
  based	
  in	
  part	
  on	
  the	
  threat	
  hierarchy	
  in	
  the	
  Defense	
  Science	
  (DSB)	
  
Report	
  on	
  Resilient	
  Military	
  Systems	
  and	
  the	
  Cyber	
  Threat,	
  January,	
  2013	
  
What	
  is	
  Different	
  About	
  the	
  Adversary?	
  
En-ty	
  
What	
  is	
  
modeled?	
  

Conven&onal	
  Analy&cs	
   Adversarial	
  Analy&cs	
  
Person	
  browsing	
  
Gang	
  ahacking	
  a	
  system	
  
Events,	
  en--es	
  
Events,	
  en--es,	
  rings	
  

Behavior	
  

Component	
  of	
  natural	
  
behavior	
  

Wai-ng	
  for	
  opportuni-es	
  

Evolu-on	
  

DriU	
  

Ac-ve	
  change	
  in	
  behavior	
  
	
  

Obfusca-on	
   Ignore	
  

Hide	
  
Upda-ng	
  Models	
  
Conven&onal	
  
Analy&cs	
  
When	
  do	
  you	
  
update	
  model?	
  

Adversarial	
  
Analy&cs	
  

When	
  there	
  is	
  a	
  
major	
  change	
  in	
  
behavior	
  

Frequently,	
  to	
  gain	
  
an	
  advantage.	
  

Process	
  for	
  upda-ng	
   Automa-on	
  and	
  
models	
  
analy-c	
  
infrastructure	
  
helpful.	
  

Automa-on	
  and	
  
analy-c	
  
infrastructure	
  
essen-al.	
  
3.	
  More	
  Examples	
  
Click	
  Fraud	
  

?	
  

•  Is	
  a	
  click	
  on	
  an	
  ad	
  legi-mate?	
  
•  Is	
  it	
  done	
  by	
  a	
  computer	
  or	
  a	
  human?	
  
•  If	
  done	
  by	
  a	
  human,	
  is	
  it	
  an	
  adversary?	
  
Types	
  of	
  Adversaries	
  
Adversary	
  

Home	
  Team	
  
Use	
  specialized	
  teams	
  
and	
  approaches.	
  
Build	
  custom	
  
analy-c	
  models.	
  

Use	
  products.	
  

Change	
  
the	
  game.	
  

$10,000,000	
  

Create	
  new	
  analy-c	
  
techniques.	
  

$100,000	
  

Use	
  exis-ng	
  analy-c	
  techniques.	
  

$1,000	
  

Source:	
  This	
  approach	
  is	
  based	
  in	
  part	
  on	
  the	
  threat	
  hierarchy	
  in	
  the	
  Defense	
  Science	
  (DSB)	
  
Report	
  on	
  Resilient	
  Military	
  Systems	
  and	
  the	
  Cyber	
  Threat,	
  January,	
  2013	
  
Low	
  Latency	
  Trading	
  
Broker/Dealer	
  
ECN	
  

Trades	
  

Market	
  data	
  
Low	
  latency	
  
trader	
  
Adversarial	
  traders	
  

• 
• 
• 
• 

2+	
  billion	
  bids,	
  asks,	
  executes	
  /	
  day	
  
>	
  90%	
  are	
  machine	
  generated	
  
Most	
  are	
  cancelled	
  
Decisions	
  are	
  made	
  in	
  ms	
  

Algorithm	
  

Algorithm	
  

Back	
  tes-ng	
  
Connec-ng	
  Chicago	
  &	
  NYC	
  
Technology	
  

Vendor	
  

Round	
  Trip	
  
Time	
  (ms)	
  

Microwave	
  

Windy	
  
Apple	
  	
  

9	
  .0	
  

Dark	
  fiber,	
  
Spread	
  
shorter	
  path	
   Networks	
  

13.1	
  

Standard	
  
fiber,	
  ISP	
  

14.5+	
  

Various	
  

Source:	
  lowlatency.com,	
  June	
  27,	
  2012;	
  Wired	
  
Magazine,	
  August	
  3,	
  2012	
  
Types	
  of	
  Adversaries	
  
Adversary	
  

Home	
  Team	
  
Use	
  specialized	
  teams	
  
and	
  approaches.	
  
Build	
  custom	
  
analy-c	
  models.	
  

Use	
  products.	
  

Change	
  
the	
  game.	
  

$10,000,000	
  

Create	
  new	
  analy-c	
  
techniques.	
  

$100,000	
  

Use	
  exis-ng	
  analy-c	
  techniques.	
  

$1,000	
  

Source:	
  This	
  approach	
  is	
  based	
  in	
  part	
  on	
  the	
  threat	
  hierarchy	
  in	
  the	
  Defense	
  Science	
  (DSB)	
  
Report	
  on	
  Resilient	
  Military	
  Systems	
  and	
  the	
  Cyber	
  Threat,	
  January,	
  2013	
  
Example:	
  Conficker	
  

Source:	
  Conficker	
  Working	
  Group:	
  Lessons	
  Learned,	
  June	
  2010	
  (Published	
  
January	
  2011),www.confickerworkinggroup.org	
  
Infected	
  PCs	
  

Command	
  and	
  
control	
  center	
  
Source:	
  Conficker	
  Working	
  Group:	
  Lessons	
  Learned,	
  June	
  2010	
  (Published	
  
January	
  2011),www.confickerworkinggroup.org	
  
Source:	
  Conficker	
  Working	
  Group:	
  Lessons	
  Learned,	
  June	
  2010	
  (Published	
  
January	
  2011),www.confickerworkinggroup.org	
  
Types	
  of	
  Adversaries	
  
Adversary	
  

Home	
  Team	
  
Use	
  specialized	
  teams	
  
and	
  approaches.	
  
Build	
  custom	
  
analy-c	
  models.	
  

Use	
  products.	
  

Change	
  
the	
  game.	
  

$10,000,000	
  

Create	
  new	
  analy-c	
  
techniques.	
  

$100,000	
  

Use	
  exis-ng	
  analy-c	
  techniques.	
  

$1,000	
  

Source:	
  This	
  approach	
  is	
  based	
  in	
  part	
  on	
  the	
  threat	
  hierarchy	
  in	
  the	
  Defense	
  Science	
  (DSB)	
  
Report	
  on	
  Resilient	
  Military	
  Systems	
  and	
  the	
  Cyber	
  Threat,	
  January,	
  2013	
  
4.	
  Building	
  Models	
  for	
  	
  
Adversarial	
  Analy-cs	
  
Building	
  Models	
  To	
  	
  
Iden-fy	
  Fraudulent	
  Transac-ons	
  
Building	
  the	
  model:	
  
1.  Get	
  a	
  dataset	
  of	
  labeled	
  
data	
  (i.e.	
  fraud	
  is	
  labeled).	
  
2.  Build	
  a	
  candidate	
  predic-ve	
  
model	
  that	
  scores	
  each	
  
transac-on	
  with	
  likelihood	
  
of	
  fraud.	
  
3.  Compare	
  liU	
  of	
  candidate	
  
model	
  to	
  current	
  champion	
  
model	
  on	
  hold-­‐out	
  dataset.	
  
4.  Deploy	
  new	
  model	
  if	
  
performance	
  is	
  beher.	
  

Scoring	
  transac-ons:	
  
1.  Get	
  next	
  event	
  
(transac-on).	
  
2.  Update	
  one	
  or	
  more	
  
associated	
  feature	
  vectors	
  
(account-­‐based)	
  
3.  Use	
  model	
  to	
  process	
  
updated	
  feature	
  vectors	
  to	
  
compute	
  scores.	
  
4.  Post-­‐process	
  scores	
  using	
  
rules	
  to	
  compute	
  alerts.	
  
Building	
  Models	
  to	
  	
  
Iden-fy	
  Fraud	
  Rings	
  
Driver	
  
Repair	
  shop	
  
Repair	
  shop	
  

Driver	
  

Driver	
  

Clinic	
  
Driver	
  
Driver	
  
Driver	
  
Clinic	
  

•  Look	
  for	
  suspicious	
  paherns	
  and	
  rela-onships	
  
among	
  claims.	
  	
  
•  Look	
  for	
  hidden	
  rela-onships	
  through	
  common	
  
phone	
  numbers,	
  nearby	
  addresses,	
  etc.	
  
Method	
  1:	
  Look	
  for	
  Tes-ng	
  Behavior	
  
•  Adversaries	
  oUen	
  test	
  an	
  approach	
  first.	
  
•  Build	
  models	
  to	
  detect	
  the	
  tes-ng.	
  
Method	
  2:	
  Iden-fy	
  Common	
  Behavior	
  
•  Common	
  cluster	
  algorithms	
  (e.g.	
  k-­‐means)	
  
require	
  specifying	
  the	
  number	
  of	
  clusters	
  
•  For	
  many	
  applica-ons,	
  we	
  keep	
  the	
  number	
  of	
  
clusters	
  k	
  rela-vely	
  small	
  
•  With	
  microclustering,	
  we	
  produce	
  many	
  
clusters,	
  even	
  if	
  some	
  of	
  the	
  them	
  are	
  quite	
  
similar.	
  
•  Microclusters	
  can	
  some-mes	
  be	
  labeled	
  	
  
Microcluster	
  Based	
  Scoring	
  
Good	
  
Cluster	
  
Bad	
  
Cluster	
  

•  The	
  closer	
  a	
  
feature	
  vector	
  is	
  to	
  
a	
  good	
  cluster,	
  the	
  
more	
  likely	
  it	
  is	
  to	
  
be	
  good	
  
•  The	
  closer	
  it	
  is	
  to	
  a	
  
bad	
  cluster,	
  the	
  
more	
  likely	
  it	
  is	
  to	
  
be	
  bad.	
  
Method	
  3:	
  Scaling	
  Baseline	
  Algorithms	
  

Drive	
  by	
  exploits	
  

Source:	
  Wall	
  Street	
  Journal	
  

Points	
  of	
  compromise	
  
What	
  are	
  the	
  Common	
  Elements?	
  
•  Time	
  stamps	
  
•  Sites	
  
–  e.g.	
  Web	
  sites,	
  vendors,	
  computers,	
  network	
  devices	
  

•  En--es	
  

• 
• 
• 
• 

–  e.g.	
  visitors,	
  users,	
  flows	
  
	
  
Log	
  files	
  fill	
  disks;	
  many,	
  many	
  disks	
  
Behavior	
  occurs	
  at	
  all	
  scales	
  
Want	
  to	
  iden-fy	
  phenomena	
  at	
  all	
  scales	
  
Need	
  to	
  group	
  “similar	
  behavior”	
  
Examples	
  of	
  a	
  Site-­‐En-ty	
  Data	
  
Example	
  

Sites	
  

En&&es	
  

Drive-­‐by	
  exploits	
  
Compromised	
  
user	
  accounts	
  
Compromised	
  	
  
payment	
  systems	
  

Web	
  sites	
  
Compromised	
  
computers	
  
Merchants	
  

Computers	
  
User	
  accounts	
  
Cardholders	
  

Source:	
  Collin	
  Benneh,	
  Robert	
  L.	
  Grossman,	
  David	
  Locke,	
  Jonathan	
  Seidman	
  and	
  Steve	
  Vejcik,	
  
MalStone:	
  Towards	
  a	
  Benchmark	
  for	
  Analy-cs	
  on	
  Large	
  Data	
  Clouds,	
  The	
  16th	
  ACM	
  SIGKDD	
  
Interna-onal	
  Conference	
  on	
  Knowledge	
  Discovery	
  and	
  Data	
  Mining	
  (KDD	
  2010),	
  ACM,	
  2010.	
  

37	
  
Exposure	
  Window	
  

sites	
  

en--es	
  

Monitor	
  Window	
  

dk-­‐1	
  

dk	
  

-me	
  

38	
  
The	
  Mark	
  Model	
  
•  Some	
  sites	
  are	
  marked	
  (percent	
  of	
  mark	
  is	
  a	
  
parameter	
  and	
  type	
  of	
  sites	
  marked	
  is	
  a	
  draw	
  
from	
  a	
  distribu-on)	
  
•  Some	
  en--es	
  become	
  marked	
  aUer	
  visi-ng	
  a	
  
marked	
  site	
  (this	
  is	
  a	
  draw	
  from	
  a	
  distribu-on)	
  
•  There	
  is	
  a	
  delay	
  between	
  the	
  visit	
  and	
  the	
  when	
  
the	
  en-ty	
  becomes	
  marked	
  (this	
  is	
  a	
  draw	
  from	
  a	
  
distribu-on)	
  
•  There	
  is	
  a	
  background	
  process	
  that	
  marks	
  some	
  
en--es	
  independent	
  of	
  visit	
  (this	
  adds	
  noise	
  to	
  
problem)	
  
Subsequent	
  Propor-on	
  of	
  Marks	
  
•  Fix	
  a	
  site	
  s[j]	
  
•  Let	
  A[j]	
  be	
  en--es	
  that	
  transact	
  during	
  
ExpWin	
  and	
  if	
  en-ty	
  is	
  marked,	
  then	
  visit	
  
occurs	
  before	
  mark	
  
•  Let	
  B[j]	
  be	
  all	
  en--es	
  in	
  A[j]	
  that	
  become	
  
marked	
  some-me	
  during	
  the	
  MonWin	
  
•  Subsequent	
  propor-on	
  of	
  marks	
  is	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  r[j]	
  =	
  |	
  B[j]	
  |	
  	
  /	
  	
  |	
  A[j]	
  	
  |	
  
•  Easily	
  computed	
  with	
  MapReduce.	
  
	
  
Compu-ng	
  Site	
  En-ty	
  	
  
Sta-s-cs	
  with	
  MapReduce	
  
MalStone	
  B	
  
Hadoop	
  	
  MapReduce	
  v0.18.3	
  
799	
  min	
  
Hadoop	
  Python	
  Streams	
  	
  v0.18.3	
   142	
  min	
  
Sector	
  UDFs	
  v1.19	
  
44	
  min	
  
#	
  Nodes	
  
20	
  nodes	
  
#	
  Records	
  
10	
  Billion	
  
Size	
  of	
  Dataset	
  	
  
1	
  TB	
  
Source:	
  Collin	
  Benneh,	
  Robert	
  L.	
  Grossman,	
  David	
  Locke,	
  Jonathan	
  Seidman	
  and	
  Steve	
  Vejcik,	
  
MalStone:	
  Towards	
  a	
  Benchmark	
  for	
  Analy-cs	
  on	
  Large	
  Data	
  Clouds,	
  The	
  16th	
  ACM	
  SIGKDD	
  
Interna-onal	
  Conference	
  on	
  Knowledge	
  Discovery	
  and	
  Data	
  Mining	
  (KDD	
  2010),	
  ACM,	
  2010.	
  

41	
  
5.	
  Summary	
  
Some	
  Rules	
  
1.  Understand	
  your	
  adversary's	
  tradecraU;	
  he	
  or	
  
she	
  will	
  go	
  aUer	
  the	
  easiest	
  or	
  weakest	
  link.	
  
2.  Your	
  adversary	
  will	
  be	
  crea-ng	
  new	
  
opportuni-es;	
  you	
  must	
  create	
  new	
  models	
  and	
  
analy-c	
  approaches.	
  
3.  Risk	
  is	
  usually	
  viewed	
  as	
  the	
  cost	
  of	
  doing	
  
business,	
  but	
  every	
  once	
  in	
  a	
  while	
  the	
  cost	
  may	
  
be	
  100x	
  –	
  1,000x	
  greater	
  
4.  The	
  quicker	
  you	
  can	
  update	
  your	
  strategy	
  and	
  
models,	
  the	
  greater	
  your	
  advantage.	
  	
  Use	
  
automa-on	
  (e.g.	
  PMML-­‐based	
  scoring	
  engines)	
  
to	
  deploy	
  models	
  more	
  quickly.	
  
Ques-ons?	
  
About	
  Robert	
  L.	
  Grossman	
  
Robert	
  Grossman	
  (@bobgrossman)	
  is	
  the	
  Founder	
  and	
  a	
  Partner	
  
of	
  Open	
  Data	
  Group,	
  which	
  specializes	
  in	
  building	
  predic-ve	
  
models	
  over	
  big	
  data.	
  He	
  is	
  also	
  a	
  Senior	
  Fellow	
  at	
  the	
  
Computa-on	
  Ins-tute	
  	
  and	
  Ins-tute	
  for	
  Genomics	
  and	
  Systems	
  
Biology	
  at	
  the	
  University	
  of	
  Chicago	
  and	
  a	
  Professor	
  in	
  the	
  
Biological	
  Sciences	
  Division.	
  He	
  has	
  led	
  the	
  development	
  of	
  new	
  
open	
  source	
  soUware	
  tools	
  for	
  analyzing	
  big	
  data,	
  cloud	
  
compu-ng,	
  and	
  high	
  performance	
  networking.	
  Prior	
  to	
  star-ng	
  
Open	
  Data	
  Group,	
  he	
  founded	
  Magnify,	
  Inc.,	
  which	
  provides	
  data	
  
mining	
  solu-ons	
  to	
  the	
  insurance	
  industry.	
  Grossman	
  was	
  
Magnify’s	
  CEO	
  un-l	
  2001	
  and	
  its	
  Chairman	
  un-l	
  it	
  was	
  sold	
  to	
  
ChoicePoint	
  in	
  2005.	
  He	
  blogs	
  occasionaly	
  about	
  big	
  data,	
  data	
  
science,	
  and	
  data	
  engineering	
  at	
  rgrossman.com.	
  	
  	
  
For	
  More	
  Informa-on	
  

www.opendatagroup.com	
  
info@opendatagroup.com	
  

More Related Content

What's hot

Is one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical researchIs one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical researchGreg Landrum
 
Some Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your CompanySome Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your CompanyRobert Grossman
 
Mining Big Data using Genetic Algorithm
Mining Big Data using Genetic AlgorithmMining Big Data using Genetic Algorithm
Mining Big Data using Genetic AlgorithmIRJET Journal
 
Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Greg Landrum
 
Massive Data Analysis- Challenges and Applications
Massive Data Analysis- Challenges and ApplicationsMassive Data Analysis- Challenges and Applications
Massive Data Analysis- Challenges and ApplicationsVijay Raghavan
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataRobert Grossman
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...Robert Grossman
 
What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?Robert Grossman
 
Multipleregression covidmobility and Covid-19 policy recommendation
Multipleregression covidmobility and Covid-19 policy recommendationMultipleregression covidmobility and Covid-19 policy recommendation
Multipleregression covidmobility and Covid-19 policy recommendationKan Yuenyong
 
20170406 Genomics@Google - KeyGene - Wageningen
20170406 Genomics@Google - KeyGene - Wageningen20170406 Genomics@Google - KeyGene - Wageningen
20170406 Genomics@Google - KeyGene - WageningenAllen Day, PhD
 
20170428 - Look to Precision Agriculture to Bootstrap Precision Medicine - Cu...
20170428 - Look to Precision Agriculture to Bootstrap Precision Medicine - Cu...20170428 - Look to Precision Agriculture to Bootstrap Precision Medicine - Cu...
20170428 - Look to Precision Agriculture to Bootstrap Precision Medicine - Cu...Allen Day, PhD
 
Some Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data PlatformsSome Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data PlatformsRobert Grossman
 
20170402 Crop Innovation and Business - Amsterdam
20170402 Crop Innovation and Business - Amsterdam20170402 Crop Innovation and Business - Amsterdam
20170402 Crop Innovation and Business - AmsterdamAllen Day, PhD
 
20170315 Cloud Accelerated Genomics - Tel Aviv / Phoenix
20170315 Cloud Accelerated Genomics - Tel Aviv / Phoenix20170315 Cloud Accelerated Genomics - Tel Aviv / Phoenix
20170315 Cloud Accelerated Genomics - Tel Aviv / PhoenixAllen Day, PhD
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Data Science London
 
Data Science: Not Just For Big Data
Data Science: Not Just For Big DataData Science: Not Just For Big Data
Data Science: Not Just For Big DataRevolution Analytics
 
Introduction to data science intro,ch(1,2,3)
Introduction to data science intro,ch(1,2,3)Introduction to data science intro,ch(1,2,3)
Introduction to data science intro,ch(1,2,3)heba_ahmad
 
Data science Innovations January 2018
Data science Innovations January 2018Data science Innovations January 2018
Data science Innovations January 2018suresh sood
 

What's hot (20)

Is one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical researchIs one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical research
 
Cri big data
Cri big dataCri big data
Cri big data
 
Some Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your CompanySome Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your Company
 
Big Data
Big Data Big Data
Big Data
 
Mining Big Data using Genetic Algorithm
Mining Big Data using Genetic AlgorithmMining Big Data using Genetic Algorithm
Mining Big Data using Genetic Algorithm
 
Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...
 
Massive Data Analysis- Challenges and Applications
Massive Data Analysis- Challenges and ApplicationsMassive Data Analysis- Challenges and Applications
Massive Data Analysis- Challenges and Applications
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
 
What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?
 
Multipleregression covidmobility and Covid-19 policy recommendation
Multipleregression covidmobility and Covid-19 policy recommendationMultipleregression covidmobility and Covid-19 policy recommendation
Multipleregression covidmobility and Covid-19 policy recommendation
 
20170406 Genomics@Google - KeyGene - Wageningen
20170406 Genomics@Google - KeyGene - Wageningen20170406 Genomics@Google - KeyGene - Wageningen
20170406 Genomics@Google - KeyGene - Wageningen
 
20170428 - Look to Precision Agriculture to Bootstrap Precision Medicine - Cu...
20170428 - Look to Precision Agriculture to Bootstrap Precision Medicine - Cu...20170428 - Look to Precision Agriculture to Bootstrap Precision Medicine - Cu...
20170428 - Look to Precision Agriculture to Bootstrap Precision Medicine - Cu...
 
Some Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data PlatformsSome Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data Platforms
 
20170402 Crop Innovation and Business - Amsterdam
20170402 Crop Innovation and Business - Amsterdam20170402 Crop Innovation and Business - Amsterdam
20170402 Crop Innovation and Business - Amsterdam
 
20170315 Cloud Accelerated Genomics - Tel Aviv / Phoenix
20170315 Cloud Accelerated Genomics - Tel Aviv / Phoenix20170315 Cloud Accelerated Genomics - Tel Aviv / Phoenix
20170315 Cloud Accelerated Genomics - Tel Aviv / Phoenix
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
 
Data Science: Not Just For Big Data
Data Science: Not Just For Big DataData Science: Not Just For Big Data
Data Science: Not Just For Big Data
 
Introduction to data science intro,ch(1,2,3)
Introduction to data science intro,ch(1,2,3)Introduction to data science intro,ch(1,2,3)
Introduction to data science intro,ch(1,2,3)
 
Data science Innovations January 2018
Data science Innovations January 2018Data science Innovations January 2018
Data science Innovations January 2018
 

Viewers also liked

Practical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsPractical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsRobert Grossman
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...Robert Grossman
 
AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016Robert Grossman
 
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...Robert Grossman
 
Machine Learning
Machine LearningMachine Learning
Machine Learningbutest
 
Big Data - Lab A1 (SC 11 Tutorial)
Big Data - Lab A1 (SC 11 Tutorial)Big Data - Lab A1 (SC 11 Tutorial)
Big Data - Lab A1 (SC 11 Tutorial)Robert Grossman
 
Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)Robert Grossman
 
What Are Science Clouds?
What Are Science Clouds?What Are Science Clouds?
What Are Science Clouds?Robert Grossman
 
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery DataThe Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery DataRobert Grossman
 
Processing Big Data (Chapter 3, SC 11 Tutorial)
Processing Big Data (Chapter 3, SC 11 Tutorial)Processing Big Data (Chapter 3, SC 11 Tutorial)
Processing Big Data (Chapter 3, SC 11 Tutorial)Robert Grossman
 
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Robert Grossman
 
The Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceThe Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceRobert Grossman
 
Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)Robert Grossman
 
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)Robert Grossman
 
Deep Learning: Theory, History, State of the Art & Practical Tools
Deep Learning: Theory, History, State of the Art & Practical ToolsDeep Learning: Theory, History, State of the Art & Practical Tools
Deep Learning: Theory, History, State of the Art & Practical ToolsIlya Kuzovkin
 
AF 2016-I Semana 2 Dia 1
AF 2016-I Semana 2 Dia 1AF 2016-I Semana 2 Dia 1
AF 2016-I Semana 2 Dia 1ORASMA
 
Estado de flujo de efectivo 2009 hoja 2
Estado de flujo de efectivo 2009 hoja 2Estado de flujo de efectivo 2009 hoja 2
Estado de flujo de efectivo 2009 hoja 2tommenik
 

Viewers also liked (20)

Practical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsPractical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large Datasets
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
 
AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016
 
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Big Data - Lab A1 (SC 11 Tutorial)
Big Data - Lab A1 (SC 11 Tutorial)Big Data - Lab A1 (SC 11 Tutorial)
Big Data - Lab A1 (SC 11 Tutorial)
 
Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)
 
What Are Science Clouds?
What Are Science Clouds?What Are Science Clouds?
What Are Science Clouds?
 
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery DataThe Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
 
Processing Big Data (Chapter 3, SC 11 Tutorial)
Processing Big Data (Chapter 3, SC 11 Tutorial)Processing Big Data (Chapter 3, SC 11 Tutorial)
Processing Big Data (Chapter 3, SC 11 Tutorial)
 
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
 
The Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceThe Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of Science
 
Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)
 
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
 
Deep Learning: Theory, History, State of the Art & Practical Tools
Deep Learning: Theory, History, State of the Art & Practical ToolsDeep Learning: Theory, History, State of the Art & Practical Tools
Deep Learning: Theory, History, State of the Art & Practical Tools
 
Flores Tropicales de Costa Rica
Flores Tropicales de Costa RicaFlores Tropicales de Costa Rica
Flores Tropicales de Costa Rica
 
AF 2016-I Semana 2 Dia 1
AF 2016-I Semana 2 Dia 1AF 2016-I Semana 2 Dia 1
AF 2016-I Semana 2 Dia 1
 
Portafolio de finanzas internacionales
Portafolio de finanzas internacionalesPortafolio de finanzas internacionales
Portafolio de finanzas internacionales
 
Estado de flujo de efectivo 2009 hoja 2
Estado de flujo de efectivo 2009 hoja 2Estado de flujo de efectivo 2009 hoja 2
Estado de flujo de efectivo 2009 hoja 2
 
Garun odoleztatzea
Garun odoleztatzeaGarun odoleztatzea
Garun odoleztatzea
 

Similar to Adversarial Analytics - 2013 Strata & Hadoop World Talk

Data mining and Machine learning expained in jargon free & lucid language
Data mining and Machine learning expained in jargon free & lucid languageData mining and Machine learning expained in jargon free & lucid language
Data mining and Machine learning expained in jargon free & lucid languageq-Maxim
 
Building Predictive Analytics on Big Data Platforms
Building Predictive Analytics on Big Data PlatformsBuilding Predictive Analytics on Big Data Platforms
Building Predictive Analytics on Big Data PlatformsOlha Hrytsay
 
Machine Learning, Data Mining, and
Machine Learning, Data Mining, and Machine Learning, Data Mining, and
Machine Learning, Data Mining, and butest
 
Big data-analytics-changing-way-organizations-conducting-business
Big data-analytics-changing-way-organizations-conducting-businessBig data-analytics-changing-way-organizations-conducting-business
Big data-analytics-changing-way-organizations-conducting-businessAmit Bhargava
 
EDF2013: Big Data Tutorial: Marko Grobelnik
EDF2013: Big Data Tutorial: Marko GrobelnikEDF2013: Big Data Tutorial: Marko Grobelnik
EDF2013: Big Data Tutorial: Marko GrobelnikEuropean Data Forum
 
Data mining techniques and dss
Data mining techniques and dssData mining techniques and dss
Data mining techniques and dssNiyitegekabilly
 
Data mining final year project in jalandhar
Data mining final year project in jalandharData mining final year project in jalandhar
Data mining final year project in jalandhardeepikakaler1
 
Data mining final year project in ludhiana
Data mining final year project in ludhianaData mining final year project in ludhiana
Data mining final year project in ludhianadeepikakaler1
 
Explainability and bias in AI
Explainability and bias in AIExplainability and bias in AI
Explainability and bias in AIBill Liu
 
lec01-IntroductionToDataMining.pptx
lec01-IntroductionToDataMining.pptxlec01-IntroductionToDataMining.pptx
lec01-IntroductionToDataMining.pptxAmjadAlDgour
 
Introduction-to-Knowledge Discovery in Database
Introduction-to-Knowledge Discovery in DatabaseIntroduction-to-Knowledge Discovery in Database
Introduction-to-Knowledge Discovery in DatabaseKartik Kalpande Patil
 
Data mining 1 - Introduction (cheat sheet - printable)
Data mining 1 - Introduction (cheat sheet - printable)Data mining 1 - Introduction (cheat sheet - printable)
Data mining 1 - Introduction (cheat sheet - printable)yesheeka
 
Artificial Intelligence Primer
Artificial Intelligence PrimerArtificial Intelligence Primer
Artificial Intelligence PrimerImam Hoque
 
What data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientistsWhat data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientistsHugo Bowne-Anderson
 
Challenges in business analytics
Challenges in business analyticsChallenges in business analytics
Challenges in business analyticsMiklos Koren
 
Data Mining in Operating System
Data Mining in Operating SystemData Mining in Operating System
Data Mining in Operating SystemITz_1
 
6months industrial training in data mining, jalandhar
6months industrial training in data mining, jalandhar6months industrial training in data mining, jalandhar
6months industrial training in data mining, jalandhardeepikakaler1
 

Similar to Adversarial Analytics - 2013 Strata & Hadoop World Talk (20)

Data mining and Machine learning expained in jargon free & lucid language
Data mining and Machine learning expained in jargon free & lucid languageData mining and Machine learning expained in jargon free & lucid language
Data mining and Machine learning expained in jargon free & lucid language
 
Data mining
Data miningData mining
Data mining
 
Building Predictive Analytics on Big Data Platforms
Building Predictive Analytics on Big Data PlatformsBuilding Predictive Analytics on Big Data Platforms
Building Predictive Analytics on Big Data Platforms
 
Machine Learning, Data Mining, and
Machine Learning, Data Mining, and Machine Learning, Data Mining, and
Machine Learning, Data Mining, and
 
Data mining
Data miningData mining
Data mining
 
Big data-analytics-changing-way-organizations-conducting-business
Big data-analytics-changing-way-organizations-conducting-businessBig data-analytics-changing-way-organizations-conducting-business
Big data-analytics-changing-way-organizations-conducting-business
 
EDF2013: Big Data Tutorial: Marko Grobelnik
EDF2013: Big Data Tutorial: Marko GrobelnikEDF2013: Big Data Tutorial: Marko Grobelnik
EDF2013: Big Data Tutorial: Marko Grobelnik
 
Data mining techniques and dss
Data mining techniques and dssData mining techniques and dss
Data mining techniques and dss
 
Data mining final year project in jalandhar
Data mining final year project in jalandharData mining final year project in jalandhar
Data mining final year project in jalandhar
 
Data mining final year project in ludhiana
Data mining final year project in ludhianaData mining final year project in ludhiana
Data mining final year project in ludhiana
 
Explainability and bias in AI
Explainability and bias in AIExplainability and bias in AI
Explainability and bias in AI
 
lec01-IntroductionToDataMining.pptx
lec01-IntroductionToDataMining.pptxlec01-IntroductionToDataMining.pptx
lec01-IntroductionToDataMining.pptx
 
Introduction-to-Knowledge Discovery in Database
Introduction-to-Knowledge Discovery in DatabaseIntroduction-to-Knowledge Discovery in Database
Introduction-to-Knowledge Discovery in Database
 
Data mining 1 - Introduction (cheat sheet - printable)
Data mining 1 - Introduction (cheat sheet - printable)Data mining 1 - Introduction (cheat sheet - printable)
Data mining 1 - Introduction (cheat sheet - printable)
 
Artificial Intelligence Primer
Artificial Intelligence PrimerArtificial Intelligence Primer
Artificial Intelligence Primer
 
What data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientistsWhat data scientists really do, according to 50 data scientists
What data scientists really do, according to 50 data scientists
 
Big data overview
Big data overviewBig data overview
Big data overview
 
Challenges in business analytics
Challenges in business analyticsChallenges in business analytics
Challenges in business analytics
 
Data Mining in Operating System
Data Mining in Operating SystemData Mining in Operating System
Data Mining in Operating System
 
6months industrial training in data mining, jalandhar
6months industrial training in data mining, jalandhar6months industrial training in data mining, jalandhar
6months industrial training in data mining, jalandhar
 

Recently uploaded

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 

Recently uploaded (20)

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 

Adversarial Analytics - 2013 Strata & Hadoop World Talk

  • 1. Adversarial  Analy-cs  101   Strata  Conference  &  Hadoop  World   October  28,  2013   Robert  L.  Grossman   Open  Data  Group  
  • 2. Examples  of  Adversarial  Analy-cs     •  Compete  for  resources   –  Low  latency  trading   –  Auc-ons   •  Steal  someone  else’s  resources   –  Credit  card  fraud  rings     –  Insurance  fraud  rings     •  Hide  your  ac-vity   –  Click  spam     –  Exfiltra-on  of  informa-on    
  • 3. Conven-onal  vs  Adversarial  Analy-cs   Conven&onal  Analy&cs   Type  of  game   Gain  /  Gain   Subject   Individual   Behavior   DriUs  over  -me   Transparency   Behavior  usually  open   Automa-on   Manual  (an  individual)   Adversarial  Analy&cs   Gain  /  Loss   Group  /  ring  /  organiza-on   Sudden  changes   Behavior  usually  hidden   Some-mes  another  model  
  • 4. 1.  Points  of  Compromise   Source:  Wall  Street  Journal,  May  4,  2007  
  • 5. Case  Study:  Points  of  Compromise   •  TJX  compromise   •  Wireless  Point  of  Sale   devices  compromised     •  Personal  informa-on  on   451,000  individuals  taken   •  Informa-on  used  months   later  for  fraudulent   purchases.   •  WSJ  reports  that  45.7   million  credit  card  accounts   at  risk   •  “TJX's  breach-­‐related  bill   could  surpass  $1  billion   over  five  years”   Source:  Wall  Street  Journal,   May  4,  2007  
  • 7. The  adversary  is  oUen  a  group,   criminal  ring,  etc.  
  • 8. It’s  Not  an  Individual…   Source:  Jus-ce  Department,  New  York  Times.  
  • 9. Gonzalez  TradecraU   1.  Exploit  vulnerabili-es  in  wireless  networks   outside  of  retail  stores.   2.  Exploit  vulnerabili-es  in  databases  of  retail   organiza-ons.   3.  Gain  unauthorized  access  to  networks  that   process  and  store  credit  card  transac-ons.     4.  Exfiltrate  40  Million  track  2  records  (data  on   credit  card’s  magne-c  strip)  
  • 10. Gonzalez  TradecraU  (cont’d)   5.  Sell  track  2  data  in  Eastern  Europe,  USA,  and   other  places.   6.  Create  counterfeit  ATM  and  debit  cards  using   track  2  data.   7.  Conceal  and  launder  the  illegal  proceeds   obtained  through  anonymous  web  currencies   in  the  US  and  Russia.   Source:  US  District  Court,  District  of  Massachusehs,  Indictment  of   Albert  Gonzalez,  August  5,  2008.  
  • 11. Data  Is  Large   •  TJX  compromise  data  is  too   large  to  fit  into  memory   •  Data  is  difficult  to  fit  into   database   •  Millions  of  possible  points  of   compromise   •  Data  must  be  kept  for  months   to  years   Source:  Wall  Street  Journal,  May  4,  2007  
  • 12. 2.  Introduc-on  to     Adversarial  Analy-cs  
  • 13. Conven-onal  Analy-cs   •  Example:  predic-ve   model  to  select  online   ads.   •  Example:  predic-ve   model  to  classify   sen-ment  of  a  customer   service  interac-on.   Adversarial  Analy-cs   •  Example:  commercial   compe-tor  trying  to   maximize  trading  gains.   •  Example:  criminal  gang   ahacking  a  system  and   hiding  its  behavior.  
  • 14. What’s  Different  About  the  Models?   Data   Data  size   Model   Conven&onal  Analy&cs   Labeled   Small  to  large   Classifica-on  /     Regression   Frequency   Monthly,  quarterly,   of  update   yearly   Adversarial  Analy&cs   Unlabeled   Small  to  very  large   Clustering,  change   detec-on   Hourly,  daily,  weekly,  …  
  • 15. Types  of  Adversaries   Adversary   Home  Team   Use  specialized  teams   and  approaches.   Build  custom   analy-c  models.   Use  products.   Change   the  game.   $10,000,000   Create  new  analy-c   techniques.   $100,000   Use  exis-ng  analy-c  techniques.   $1,000   Source:  This  approach  is  based  in  part  on  the  threat  hierarchy  in  the  Defense  Science  (DSB)   Report  on  Resilient  Military  Systems  and  the  Cyber  Threat,  January,  2013  
  • 16. What  is  Different  About  the  Adversary?   En-ty   What  is   modeled?   Conven&onal  Analy&cs   Adversarial  Analy&cs   Person  browsing   Gang  ahacking  a  system   Events,  en--es   Events,  en--es,  rings   Behavior   Component  of  natural   behavior   Wai-ng  for  opportuni-es   Evolu-on   DriU   Ac-ve  change  in  behavior     Obfusca-on   Ignore   Hide  
  • 17. Upda-ng  Models   Conven&onal   Analy&cs   When  do  you   update  model?   Adversarial   Analy&cs   When  there  is  a   major  change  in   behavior   Frequently,  to  gain   an  advantage.   Process  for  upda-ng   Automa-on  and   models   analy-c   infrastructure   helpful.   Automa-on  and   analy-c   infrastructure   essen-al.  
  • 19. Click  Fraud   ?   •  Is  a  click  on  an  ad  legi-mate?   •  Is  it  done  by  a  computer  or  a  human?   •  If  done  by  a  human,  is  it  an  adversary?  
  • 20. Types  of  Adversaries   Adversary   Home  Team   Use  specialized  teams   and  approaches.   Build  custom   analy-c  models.   Use  products.   Change   the  game.   $10,000,000   Create  new  analy-c   techniques.   $100,000   Use  exis-ng  analy-c  techniques.   $1,000   Source:  This  approach  is  based  in  part  on  the  threat  hierarchy  in  the  Defense  Science  (DSB)   Report  on  Resilient  Military  Systems  and  the  Cyber  Threat,  January,  2013  
  • 21. Low  Latency  Trading   Broker/Dealer   ECN   Trades   Market  data   Low  latency   trader   Adversarial  traders   •  •  •  •  2+  billion  bids,  asks,  executes  /  day   >  90%  are  machine  generated   Most  are  cancelled   Decisions  are  made  in  ms   Algorithm   Algorithm   Back  tes-ng  
  • 22. Connec-ng  Chicago  &  NYC   Technology   Vendor   Round  Trip   Time  (ms)   Microwave   Windy   Apple     9  .0   Dark  fiber,   Spread   shorter  path   Networks   13.1   Standard   fiber,  ISP   14.5+   Various   Source:  lowlatency.com,  June  27,  2012;  Wired   Magazine,  August  3,  2012  
  • 23. Types  of  Adversaries   Adversary   Home  Team   Use  specialized  teams   and  approaches.   Build  custom   analy-c  models.   Use  products.   Change   the  game.   $10,000,000   Create  new  analy-c   techniques.   $100,000   Use  exis-ng  analy-c  techniques.   $1,000   Source:  This  approach  is  based  in  part  on  the  threat  hierarchy  in  the  Defense  Science  (DSB)   Report  on  Resilient  Military  Systems  and  the  Cyber  Threat,  January,  2013  
  • 24. Example:  Conficker   Source:  Conficker  Working  Group:  Lessons  Learned,  June  2010  (Published   January  2011),www.confickerworkinggroup.org  
  • 25. Infected  PCs   Command  and   control  center  
  • 26. Source:  Conficker  Working  Group:  Lessons  Learned,  June  2010  (Published   January  2011),www.confickerworkinggroup.org  
  • 27. Source:  Conficker  Working  Group:  Lessons  Learned,  June  2010  (Published   January  2011),www.confickerworkinggroup.org  
  • 28. Types  of  Adversaries   Adversary   Home  Team   Use  specialized  teams   and  approaches.   Build  custom   analy-c  models.   Use  products.   Change   the  game.   $10,000,000   Create  new  analy-c   techniques.   $100,000   Use  exis-ng  analy-c  techniques.   $1,000   Source:  This  approach  is  based  in  part  on  the  threat  hierarchy  in  the  Defense  Science  (DSB)   Report  on  Resilient  Military  Systems  and  the  Cyber  Threat,  January,  2013  
  • 29. 4.  Building  Models  for     Adversarial  Analy-cs  
  • 30. Building  Models  To     Iden-fy  Fraudulent  Transac-ons   Building  the  model:   1.  Get  a  dataset  of  labeled   data  (i.e.  fraud  is  labeled).   2.  Build  a  candidate  predic-ve   model  that  scores  each   transac-on  with  likelihood   of  fraud.   3.  Compare  liU  of  candidate   model  to  current  champion   model  on  hold-­‐out  dataset.   4.  Deploy  new  model  if   performance  is  beher.   Scoring  transac-ons:   1.  Get  next  event   (transac-on).   2.  Update  one  or  more   associated  feature  vectors   (account-­‐based)   3.  Use  model  to  process   updated  feature  vectors  to   compute  scores.   4.  Post-­‐process  scores  using   rules  to  compute  alerts.  
  • 31. Building  Models  to     Iden-fy  Fraud  Rings   Driver   Repair  shop   Repair  shop   Driver   Driver   Clinic   Driver   Driver   Driver   Clinic   •  Look  for  suspicious  paherns  and  rela-onships   among  claims.     •  Look  for  hidden  rela-onships  through  common   phone  numbers,  nearby  addresses,  etc.  
  • 32. Method  1:  Look  for  Tes-ng  Behavior   •  Adversaries  oUen  test  an  approach  first.   •  Build  models  to  detect  the  tes-ng.  
  • 33. Method  2:  Iden-fy  Common  Behavior   •  Common  cluster  algorithms  (e.g.  k-­‐means)   require  specifying  the  number  of  clusters   •  For  many  applica-ons,  we  keep  the  number  of   clusters  k  rela-vely  small   •  With  microclustering,  we  produce  many   clusters,  even  if  some  of  the  them  are  quite   similar.   •  Microclusters  can  some-mes  be  labeled    
  • 34. Microcluster  Based  Scoring   Good   Cluster   Bad   Cluster   •  The  closer  a   feature  vector  is  to   a  good  cluster,  the   more  likely  it  is  to   be  good   •  The  closer  it  is  to  a   bad  cluster,  the   more  likely  it  is  to   be  bad.  
  • 35. Method  3:  Scaling  Baseline  Algorithms   Drive  by  exploits   Source:  Wall  Street  Journal   Points  of  compromise  
  • 36. What  are  the  Common  Elements?   •  Time  stamps   •  Sites   –  e.g.  Web  sites,  vendors,  computers,  network  devices   •  En--es   •  •  •  •  –  e.g.  visitors,  users,  flows     Log  files  fill  disks;  many,  many  disks   Behavior  occurs  at  all  scales   Want  to  iden-fy  phenomena  at  all  scales   Need  to  group  “similar  behavior”  
  • 37. Examples  of  a  Site-­‐En-ty  Data   Example   Sites   En&&es   Drive-­‐by  exploits   Compromised   user  accounts   Compromised     payment  systems   Web  sites   Compromised   computers   Merchants   Computers   User  accounts   Cardholders   Source:  Collin  Benneh,  Robert  L.  Grossman,  David  Locke,  Jonathan  Seidman  and  Steve  Vejcik,   MalStone:  Towards  a  Benchmark  for  Analy-cs  on  Large  Data  Clouds,  The  16th  ACM  SIGKDD   Interna-onal  Conference  on  Knowledge  Discovery  and  Data  Mining  (KDD  2010),  ACM,  2010.   37  
  • 38. Exposure  Window   sites   en--es   Monitor  Window   dk-­‐1   dk   -me   38  
  • 39. The  Mark  Model   •  Some  sites  are  marked  (percent  of  mark  is  a   parameter  and  type  of  sites  marked  is  a  draw   from  a  distribu-on)   •  Some  en--es  become  marked  aUer  visi-ng  a   marked  site  (this  is  a  draw  from  a  distribu-on)   •  There  is  a  delay  between  the  visit  and  the  when   the  en-ty  becomes  marked  (this  is  a  draw  from  a   distribu-on)   •  There  is  a  background  process  that  marks  some   en--es  independent  of  visit  (this  adds  noise  to   problem)  
  • 40. Subsequent  Propor-on  of  Marks   •  Fix  a  site  s[j]   •  Let  A[j]  be  en--es  that  transact  during   ExpWin  and  if  en-ty  is  marked,  then  visit   occurs  before  mark   •  Let  B[j]  be  all  en--es  in  A[j]  that  become   marked  some-me  during  the  MonWin   •  Subsequent  propor-on  of  marks  is                                          r[j]  =  |  B[j]  |    /    |  A[j]    |   •  Easily  computed  with  MapReduce.    
  • 41. Compu-ng  Site  En-ty     Sta-s-cs  with  MapReduce   MalStone  B   Hadoop    MapReduce  v0.18.3   799  min   Hadoop  Python  Streams    v0.18.3   142  min   Sector  UDFs  v1.19   44  min   #  Nodes   20  nodes   #  Records   10  Billion   Size  of  Dataset     1  TB   Source:  Collin  Benneh,  Robert  L.  Grossman,  David  Locke,  Jonathan  Seidman  and  Steve  Vejcik,   MalStone:  Towards  a  Benchmark  for  Analy-cs  on  Large  Data  Clouds,  The  16th  ACM  SIGKDD   Interna-onal  Conference  on  Knowledge  Discovery  and  Data  Mining  (KDD  2010),  ACM,  2010.   41  
  • 43. Some  Rules   1.  Understand  your  adversary's  tradecraU;  he  or   she  will  go  aUer  the  easiest  or  weakest  link.   2.  Your  adversary  will  be  crea-ng  new   opportuni-es;  you  must  create  new  models  and   analy-c  approaches.   3.  Risk  is  usually  viewed  as  the  cost  of  doing   business,  but  every  once  in  a  while  the  cost  may   be  100x  –  1,000x  greater   4.  The  quicker  you  can  update  your  strategy  and   models,  the  greater  your  advantage.    Use   automa-on  (e.g.  PMML-­‐based  scoring  engines)   to  deploy  models  more  quickly.  
  • 45. About  Robert  L.  Grossman   Robert  Grossman  (@bobgrossman)  is  the  Founder  and  a  Partner   of  Open  Data  Group,  which  specializes  in  building  predic-ve   models  over  big  data.  He  is  also  a  Senior  Fellow  at  the   Computa-on  Ins-tute    and  Ins-tute  for  Genomics  and  Systems   Biology  at  the  University  of  Chicago  and  a  Professor  in  the   Biological  Sciences  Division.  He  has  led  the  development  of  new   open  source  soUware  tools  for  analyzing  big  data,  cloud   compu-ng,  and  high  performance  networking.  Prior  to  star-ng   Open  Data  Group,  he  founded  Magnify,  Inc.,  which  provides  data   mining  solu-ons  to  the  insurance  industry.  Grossman  was   Magnify’s  CEO  un-l  2001  and  its  Chairman  un-l  it  was  sold  to   ChoicePoint  in  2005.  He  blogs  occasionaly  about  big  data,  data   science,  and  data  engineering  at  rgrossman.com.      
  • 46. For  More  Informa-on   www.opendatagroup.com   info@opendatagroup.com