RAPID PRUNING OF SEARCH SPACE THROUGH HIERARCHICAL MATCHING
Chandra Mouleeswaran
Machine Learning Scientist, ThreatMetrix Inc.
5/2/13
  
My Background
• Machine Learning Scientist at ThreatMetrix Inc.
• Co-Chair, Developer Programs, IntelliFest.org, Oct 2013, San Diego, CA

Career Path
- Siemens Corporate Research: Learning & Expert Systems
- Technology division of Donaldson, Lufkin and Jenrette company (Pershing): Artificial Intelligence Group - Network Monitoring
- Several startups: Classification, Web Crawling, Security, Financial Trading etc.
  
Outline
• Task description
• Approaches
• Why search paradigm?
• Hierarchical matching
• Results
• Acknowledgments
  
The Device Identification Task
• Computationally, it's a CLASSIFICATION problem:
  { a0, a1, a2, a3, ..., an } → { ci }
  ai = ( attribute | field | key ) value
  ci = ( label | signature | class | hash )
• Returning devices should be correctly identified within certain tolerances
• New classes may be created if a good match is not found in the repository of known devices
• Devices age out, based on data retention policy
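The task statement above can be sketched as a small matcher. This is only an illustration under stated assumptions: the similarity measure (share of attribute keys that agree), the `tolerance` value, and the hash-based label for new classes are hypothetical choices, not details given in the talk.

```python
import hashlib


def similarity(query, known):
    """Fraction of attribute keys on which two devices agree."""
    keys = set(query) | set(known)
    if not keys:
        return 0.0
    agree = sum(1 for k in keys
                if k in query and k in known and query[k] == known[k])
    return agree / len(keys)


def identify(query, repository, tolerance=0.75):
    """Return (label, score); create a new class when no match is good enough."""
    best_label, best_score = None, 0.0
    for label, known in repository.items():
        score = similarity(query, known)
        if score > best_score:
            best_label, best_score = label, score
    if best_label is not None and best_score >= tolerance:
        return best_label, best_score
    # No good match: register the device under a new label (here, a hash).
    digest = hashlib.sha1(repr(sorted(query.items())).encode()).hexdigest()
    new_label = digest[:12]
    repository[new_label] = dict(query)
    return new_label, best_score
```

A linear scan like this is exactly what does not scale to hundreds of millions of devices, which is the motivation for the search-based approach in the rest of the deck.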
  
Task Challenges
• Extremely volatile attributes
• There are no pivot attributes to divide and conquer the search space
• Changing distributions
• Emphasis on PRECISION
• Stringent RESPONSE time
  
Engineering Challenges
• Precision (accuracy) and latency (response time) are antagonistic constraints
• Project management

                Repository Size (millions)   Load (TPS)   Latency (ms)
Project start   28                           200          < 100
Present         280                          300          < 100
Change          10X                          1.5X         None
  
Approaches
• Rules engine
• Learning models
• Vector space models

Need an enterprise grade solution!
  
Rules Engine
• No experts
• Number of rules?
• Maintenance?

Not a viable approach!
  
Learning Models
• Most machine learning methods deal predominantly with binary classification problems (e.g. fraud / not fraud) or a small number of target classes
• Few exemplars for each class
• Attribute values may be unbounded
• Attributes may not follow a natural progression
  
Learning Models …
• Unsupervised learning such as clustering methods would make good models, but not good enough to be of practical use. Any simplification process will compromise on accuracy
• Ability to explain is critical
• Tend to ignore domain knowledge

Challenge in providing enterprise solution
  
Thoughts
• No comparable application with such requirements
• Build and deploy a classifier that explains itself easily, scales temporally and offers quick response
• Use domain knowledge to guide verification
• Improve the classifier through machine learning methods by analyzing performance in the field
  
Vector-Space Models
• Similarity-based search makes the vector-space model a good choice for generating selections
• Given the volatile nature of data, information retrieval (IR) systems can adapt easily
• Good at neighborhood search

Sensitive to individual attribute changes!
  
Sources of Inspiration
• Lucene/Solr features
• Documentation from (erstwhile) Lucid Imagination
• Ease with which Lucene/Solr could be installed and explored

Very short learning curve for novices!
  
Feature Selection
• Primitive and derived attributes
• Entropy
• Distribution
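The slide only names entropy as a selection criterion. One plausible reading, shown here as an assumption rather than the speaker's method, is to compute the Shannon entropy of each attribute's observed values: an attribute with near-zero entropy barely distinguishes devices, while a high-entropy attribute discriminates well (though it may also be volatile).

```python
import math
from collections import Counter


def attribute_entropy(values):
    """Shannon entropy (in bits) of the observed values of one attribute."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

For example, an attribute that takes the same value on every device scores 0 bits, and one that splits devices evenly into four groups scores 2 bits.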
  
Domain
• Devices come with structural information but not much grammar or semantics
• Bag-of-words (single field) approach is fast but not precise
• Using all fields is precise but response is slow

Now what?
  
Disjunction Max
• Matrix of all possible combinations of user input query and document fields
• Transforms into a Boolean query of DisjunctionMaxQueries of each row
• Maximum score of sub-clauses is used by DisjunctionMaxQuery
• No single term in user input dominates

This is needed!

Src: SearchHub and LucidWorks
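The scoring rule described above can be illustrated with plain numbers. In Lucene, a DisjunctionMaxQuery scores a document as its best sub-clause score plus a tie-breaker multiple of the remaining sub-clause scores, and the enclosing Boolean query adds up the per-term results. The sketch below reproduces that arithmetic only; the function names and input scores are made up, and normalization factors are ignored.

```python
def dismax_score(field_scores, tie=0.0):
    """Score of one query term across fields: max plus tie * (sum of the rest)."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


def query_score(term_field_scores, tie=0.0):
    """Boolean combination: add up the DisMax score of each row (one row per term)."""
    return sum(dismax_score(row, tie) for row in term_field_scores)
```

With `tie=0.0` a term that matches in three fields counts no more than a term that matches strongly in one, which is why no single term dominates the total.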
  
DisMax Experiments (index size = 60 Million)

Scenario 1
mm=2
Solr fields = { a1, a2, a3 }
Values = { phrase1, phrase2, phrase3 }
Must-Match Clauses
Latency: YES (35 ms)
Precision: NO (20% failure)

Scenario 2
mm = 50 %
Solr fields = { a1 }
Values = { term1, term2, term3, ..., termn }
Should-Match Clauses
Latency: NO (> 2 seconds)
Precision: YES (> 98%)

Possible Workaround
• Look-ahead: Customize Lucene/Solr to do a branch-and-bound search, bail out on some lower bound score
• Minimize candidates for DisMax search
  - reduce total number of Solr instances to search
  - reduce total number of disjunctive terms
    [ Empirical estimate: t_n = 2 * t_{n-1}, where t = time & n = number of disjunctive terms ]
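The empirical estimate implies that query time doubles with every added disjunctive term, so t_n = 2^(n-m) * t_m for any two term counts n and m. A small helper makes the consequence concrete; the term counts in the example are hypothetical, not figures from the talk.

```python
def relative_time(n_terms, base_terms):
    """Expected time with n_terms relative to base_terms, under t_n = 2 * t_(n-1)."""
    return 2.0 ** (n_terms - base_terms)
```

Under this estimate, trimming a query from 10 terms to 7 would cut the expected time to 1/8, which is why reducing the number of terms pays off so quickly.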
  
Phrases over Terms
• Used collocation (co-occurrence matrix) to determine most common phrases
• Delete terms covered by phrases
• Add stop words based on frequency analysis
• Ensure precision is preserved through regression tests

Reduced the number of DisMax terms by 30%
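A minimal sketch of the phrase step, assuming that "co-occurrence" can be approximated by counting adjacent token pairs: frequent pairs become phrases, and the terms they cover are dropped from the query. The threshold and function names are illustrative, and the talk does not say how its matrix was built.

```python
from collections import Counter


def common_phrases(token_lists, min_count=2):
    """Adjacent token pairs seen at least min_count times across the corpus."""
    pairs = Counter()
    for tokens in token_lists:
        pairs.update(zip(tokens, tokens[1:]))
    return {pair for pair, count in pairs.items() if count >= min_count}


def compress(tokens, phrases):
    """Replace terms covered by a known phrase with a single phrase clause."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in phrases:
            out.append(f"{tokens[i]} {tokens[i + 1]}")
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out
```

Each merged pair removes one disjunctive clause from the query, so any such rewrite should be followed by the regression tests mentioned above to confirm precision is unchanged.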
  
Sources of Inspiration
• Planning in a Hierarchy of Abstraction Spaces, Artificial Intelligence, Vol. 5, No. 2, pp. 115-135 (1974)
• Search Reduction in Hierarchical Problem Solving, Proc. of the 9th IJCAI, AAAI Press, Menlo Park, CA (1991)
• Exceptional Data Quality Using Intelligent Matching and Retrieval, AI Magazine, AAAI Press (Spring 2010)
  
Hierarchical Matching
[Diagram labels: Bag of words; Models; Phrases; Filters; DisMax; Query Formulator; Domain-specific patterns; CSV/JSON; Solr instances selector; To Solr Servers]
  
Verification
Conflict Resolution
• Top n candidates are returned from each Solr instance
• They are ranked based on custom verification module
• Ties are broken using recency
• Top candidate is persisted and returned along with custom score
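The conflict-resolution steps can be sketched as a merge-and-rank over the per-instance result lists. The `Candidate` type and the scoring function are stand-ins: the talk does not describe the internals of its custom verification module, so a simple share-of-matching-attributes score is assumed here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    label: str
    attrs: dict
    last_seen: float  # e.g. epoch seconds of the most recent sighting


def verification_score(query, candidate):
    """Stand-in for the verification module: share of query attributes matched."""
    if not query:
        return 0.0
    hits = sum(1 for k, v in query.items() if candidate.attrs.get(k) == v)
    return hits / len(query)


def resolve(query, results_per_instance):
    """Merge the top-n lists, rank by score, break ties by recency."""
    pool = [c for results in results_per_instance for c in results]
    if not pool:
        return None
    best = max(pool, key=lambda c: (verification_score(query, c), c.last_seen))
    return best.label, verification_score(query, best)
```

Sorting on the tuple `(score, last_seen)` means recency only matters when scores are equal, matching the tie-break rule on the slide.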
  
Comments
• DisMax performs multidimensional match
• Extracted multiple filters and arranged them hierarchically
• Separation of selection and evaluation
  - Selection = approximate solution
  - Evaluation = refinement
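The split between selection and evaluation can be illustrated as a cascade: cheap filters narrow the pool from coarse to fine, and a costlier score is computed only on what survives. This is a generic sketch, not the deployed system; the filter and score functions are hypothetical, and falling back to the previous level when a filter empties the pool is an assumption made here.

```python
def same_value(key):
    """Build a cheap filter keeping candidates that agree with the query on one key."""
    return lambda query, cand: query.get(key) == cand.get(key)


def overlap(query, cand):
    """Costlier evaluation step: number of query attributes the candidate matches."""
    return sum(1 for k, v in query.items() if cand.get(k) == v)


def hierarchical_match(query, candidates, filters, score=overlap):
    """Selection by successive filters, then evaluation of the survivors."""
    pool = list(candidates)
    for keep in filters:  # ordered from coarse to fine
        narrowed = [c for c in pool if keep(query, c)]
        if not narrowed:
            break  # assumption: keep the last non-empty level instead of failing
        pool = narrowed
    return max(pool, key=lambda c: score(query, c), default=None)
```

Because each filter only sees the survivors of the previous one, the expensive evaluation runs on a small fraction of the repository.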
  
Where time went..
• Attribute selection
• Ranking
• Optimization
• Index re-generation
• Regression testing
  
Sources for Tune Up
• Scaling Solr, Lucene Revolution, May 2011
• Practical Search with Solr: Beyond just Looking it Up, Lucid Imagination, May 2010
  
Testing
• Precision testing using self and mixed modes
• Latency tests
  - custom harness for stand-alone tests
  - integrated tests with JMeter framework
 
Results	
  
  
Latency Percentiles
[Chart series: original edismax; Initial solution; Optimization 1: Filters, Focused search, verification; Optimization 2: Domain patterns, Stop words, de-dupe]
  
TPS
  
Response Times over Time
  
Project Execution
• Agile Methodology
• Risk mitigation through primary and contingency plans
• Rapid prototyping followed by good software engineering practices
• Evaluating DSE (DataStax) & Solr Cloud
  
Gleanings
• You can classify anything with Lucene/Solr; the lexicon is your own
• The question is not whether Lucene/Solr can solve a particular classification problem, but whether you can prioritize among the many ways of doing it
• If you run into a problem, someone has solved it or will solve it in the near future
  
Gleanings …
• Deal with accuracy before latency
• If precision, latency and scale are all critical to your domain, expect to invest some time in hierarchical abstractions
• "Index once, run anytime, anywhere" does not apply during development
• Throwing all data at Lucene/Solr will not work for mission-critical applications
• Rapid prototyping and willingness to fail
  
Summary

Simplify and match at multiple levels of abstraction
  
Contributors
Chandra Mouleeswaran: Research & Prototyping
Fang Chen: Research & Prototyping
Luke Mertens: Productization & Scalability
Brent Pearson: Release Management
Tracy Hsu: Precision Testing & QA
Srinivas Nayani: Deployment & QA
COMMENTS & FEEDBACK:
Chandra Mouleeswaran
cmouleeswaran@threatmetrix.com
  

More Related Content

What's hot

Artificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software TestingArtificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software TestingLionel Briand
 
OCLR: A More Expressive, Pattern-Based Temporal Extension of OCL
OCLR: A More Expressive, Pattern-Based Temporal Extension of OCLOCLR: A More Expressive, Pattern-Based Temporal Extension of OCL
OCLR: A More Expressive, Pattern-Based Temporal Extension of OCLLionel Briand
 
Automated Testing of Autonomous Driving Assistance Systems
Automated Testing of Autonomous Driving Assistance SystemsAutomated Testing of Autonomous Driving Assistance Systems
Automated Testing of Autonomous Driving Assistance SystemsLionel Briand
 
Scalable and Cost-Effective Model-Based Software Verification and Testing
Scalable and Cost-Effective Model-Based Software Verification and TestingScalable and Cost-Effective Model-Based Software Verification and Testing
Scalable and Cost-Effective Model-Based Software Verification and TestingLionel Briand
 
Dependable Systems -Software Dependability (15/16)
Dependable Systems -Software Dependability (15/16)Dependable Systems -Software Dependability (15/16)
Dependable Systems -Software Dependability (15/16)Peter Tröger
 
Keynote SBST 2014 - Search-Based Testing
Keynote SBST 2014 - Search-Based TestingKeynote SBST 2014 - Search-Based Testing
Keynote SBST 2014 - Search-Based TestingLionel Briand
 
Applications of Machine Learning and Metaheuristic Search to Security Testing
Applications of Machine Learning and Metaheuristic Search to Security TestingApplications of Machine Learning and Metaheuristic Search to Security Testing
Applications of Machine Learning and Metaheuristic Search to Security TestingLionel Briand
 
Scalable Software Testing and Verification of Non-Functional Properties throu...
Scalable Software Testing and Verification of Non-Functional Properties throu...Scalable Software Testing and Verification of Non-Functional Properties throu...
Scalable Software Testing and Verification of Non-Functional Properties throu...Lionel Briand
 
Dependable Systems - Introduction (1/16)
Dependable Systems - Introduction (1/16)Dependable Systems - Introduction (1/16)
Dependable Systems - Introduction (1/16)Peter Tröger
 
Dependable Systems -Dependability Threats (2/16)
Dependable Systems -Dependability Threats (2/16)Dependable Systems -Dependability Threats (2/16)
Dependable Systems -Dependability Threats (2/16)Peter Tröger
 
Analyzing Natural-Language Requirements: The Not-too-sexy and Yet Curiously D...
Analyzing Natural-Language Requirements: The Not-too-sexy and Yet Curiously D...Analyzing Natural-Language Requirements: The Not-too-sexy and Yet Curiously D...
Analyzing Natural-Language Requirements: The Not-too-sexy and Yet Curiously D...Lionel Briand
 
Dependable Systems -Dependability Attributes (5/16)
Dependable Systems -Dependability Attributes (5/16)Dependable Systems -Dependability Attributes (5/16)
Dependable Systems -Dependability Attributes (5/16)Peter Tröger
 
Automated Inference of Access Control Policies for Web Applications
Automated Inference of Access Control Policies for Web ApplicationsAutomated Inference of Access Control Policies for Web Applications
Automated Inference of Access Control Policies for Web ApplicationsLionel Briand
 
Dependable Systems - Summary (16/16)
Dependable Systems - Summary (16/16)Dependable Systems - Summary (16/16)
Dependable Systems - Summary (16/16)Peter Tröger
 
System Testing of Timing Requirements based on Use Cases and Timed Automata
System Testing of Timing Requirements based on Use Cases and Timed AutomataSystem Testing of Timing Requirements based on Use Cases and Timed Automata
System Testing of Timing Requirements based on Use Cases and Timed AutomataLionel Briand
 
Dependable Systems - Hardware Dependability with Diagnosis (13/16)
Dependable Systems - Hardware Dependability with Diagnosis (13/16)Dependable Systems - Hardware Dependability with Diagnosis (13/16)
Dependable Systems - Hardware Dependability with Diagnosis (13/16)Peter Tröger
 
Dependable Systems -Fault Tolerance Patterns (4/16)
Dependable Systems -Fault Tolerance Patterns (4/16)Dependable Systems -Fault Tolerance Patterns (4/16)
Dependable Systems -Fault Tolerance Patterns (4/16)Peter Tröger
 
Testing of Cyber-Physical Systems: Diversity-driven Strategies
Testing of Cyber-Physical Systems: Diversity-driven StrategiesTesting of Cyber-Physical Systems: Diversity-driven Strategies
Testing of Cyber-Physical Systems: Diversity-driven StrategiesLionel Briand
 
Testing Autonomous Cars for Feature Interaction Failures using Many-Objective...
Testing Autonomous Cars for Feature Interaction Failures using Many-Objective...Testing Autonomous Cars for Feature Interaction Failures using Many-Objective...
Testing Autonomous Cars for Feature Interaction Failures using Many-Objective...Lionel Briand
 
OpenSubmit - How to grade 1200 code submissions
OpenSubmit - How to grade 1200 code submissionsOpenSubmit - How to grade 1200 code submissions
OpenSubmit - How to grade 1200 code submissionsPeter Tröger
 

What's hot (20)

Artificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software TestingArtificial Intelligence for Automated Software Testing
Artificial Intelligence for Automated Software Testing
 
OCLR: A More Expressive, Pattern-Based Temporal Extension of OCL
OCLR: A More Expressive, Pattern-Based Temporal Extension of OCLOCLR: A More Expressive, Pattern-Based Temporal Extension of OCL
OCLR: A More Expressive, Pattern-Based Temporal Extension of OCL
 
Automated Testing of Autonomous Driving Assistance Systems
Automated Testing of Autonomous Driving Assistance SystemsAutomated Testing of Autonomous Driving Assistance Systems
Automated Testing of Autonomous Driving Assistance Systems
 
Scalable and Cost-Effective Model-Based Software Verification and Testing
Scalable and Cost-Effective Model-Based Software Verification and TestingScalable and Cost-Effective Model-Based Software Verification and Testing
Scalable and Cost-Effective Model-Based Software Verification and Testing
 
Dependable Systems -Software Dependability (15/16)
Dependable Systems -Software Dependability (15/16)Dependable Systems -Software Dependability (15/16)
Dependable Systems -Software Dependability (15/16)
 
Keynote SBST 2014 - Search-Based Testing
Keynote SBST 2014 - Search-Based TestingKeynote SBST 2014 - Search-Based Testing
Keynote SBST 2014 - Search-Based Testing
 
Applications of Machine Learning and Metaheuristic Search to Security Testing
Applications of Machine Learning and Metaheuristic Search to Security TestingApplications of Machine Learning and Metaheuristic Search to Security Testing
Applications of Machine Learning and Metaheuristic Search to Security Testing
 
Scalable Software Testing and Verification of Non-Functional Properties throu...
Scalable Software Testing and Verification of Non-Functional Properties throu...Scalable Software Testing and Verification of Non-Functional Properties throu...
Scalable Software Testing and Verification of Non-Functional Properties throu...
 
Dependable Systems - Introduction (1/16)
Dependable Systems - Introduction (1/16)Dependable Systems - Introduction (1/16)
Dependable Systems - Introduction (1/16)
 
Dependable Systems -Dependability Threats (2/16)
Dependable Systems -Dependability Threats (2/16)Dependable Systems -Dependability Threats (2/16)
Dependable Systems -Dependability Threats (2/16)
 
Analyzing Natural-Language Requirements: The Not-too-sexy and Yet Curiously D...
Analyzing Natural-Language Requirements: The Not-too-sexy and Yet Curiously D...Analyzing Natural-Language Requirements: The Not-too-sexy and Yet Curiously D...
Analyzing Natural-Language Requirements: The Not-too-sexy and Yet Curiously D...
 
Dependable Systems -Dependability Attributes (5/16)
Dependable Systems -Dependability Attributes (5/16)Dependable Systems -Dependability Attributes (5/16)
Dependable Systems -Dependability Attributes (5/16)
 
Automated Inference of Access Control Policies for Web Applications
Automated Inference of Access Control Policies for Web ApplicationsAutomated Inference of Access Control Policies for Web Applications
Automated Inference of Access Control Policies for Web Applications
 
Dependable Systems - Summary (16/16)
Dependable Systems - Summary (16/16)Dependable Systems - Summary (16/16)
Dependable Systems - Summary (16/16)
 
System Testing of Timing Requirements based on Use Cases and Timed Automata
System Testing of Timing Requirements based on Use Cases and Timed AutomataSystem Testing of Timing Requirements based on Use Cases and Timed Automata
System Testing of Timing Requirements based on Use Cases and Timed Automata
 
Dependable Systems - Hardware Dependability with Diagnosis (13/16)
Dependable Systems - Hardware Dependability with Diagnosis (13/16)Dependable Systems - Hardware Dependability with Diagnosis (13/16)
Dependable Systems - Hardware Dependability with Diagnosis (13/16)
 
Dependable Systems -Fault Tolerance Patterns (4/16)
Dependable Systems -Fault Tolerance Patterns (4/16)Dependable Systems -Fault Tolerance Patterns (4/16)
Dependable Systems -Fault Tolerance Patterns (4/16)
 
Testing of Cyber-Physical Systems: Diversity-driven Strategies
Testing of Cyber-Physical Systems: Diversity-driven StrategiesTesting of Cyber-Physical Systems: Diversity-driven Strategies
Testing of Cyber-Physical Systems: Diversity-driven Strategies
 
Testing Autonomous Cars for Feature Interaction Failures using Many-Objective...
Testing Autonomous Cars for Feature Interaction Failures using Many-Objective...Testing Autonomous Cars for Feature Interaction Failures using Many-Objective...
Testing Autonomous Cars for Feature Interaction Failures using Many-Objective...
 
OpenSubmit - How to grade 1200 code submissions
OpenSubmit - How to grade 1200 code submissionsOpenSubmit - How to grade 1200 code submissions
OpenSubmit - How to grade 1200 code submissions
 

Similar to Rapid pruning of search space through hierarchical matching

Performance Issue? Machine Learning to the rescue!
Performance Issue? Machine Learning to the rescue!Performance Issue? Machine Learning to the rescue!
Performance Issue? Machine Learning to the rescue!Maarten Smeets
 
BsidesLVPresso2016_JZeditsv6
BsidesLVPresso2016_JZeditsv6BsidesLVPresso2016_JZeditsv6
BsidesLVPresso2016_JZeditsv6Rod Soto
 
Agile performance engineering with cloud 2016
Agile performance engineering with cloud   2016Agile performance engineering with cloud   2016
Agile performance engineering with cloud 2016Ken Chan
 
Scam2011 syer
Scam2011 syerScam2011 syer
Scam2011 syerSAIL_QU
 
Always On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraAlways On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraRobbie Strickland
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Spark Summit
 
The Diabolical Developers Guide to Performance Tuning
The Diabolical Developers Guide to Performance TuningThe Diabolical Developers Guide to Performance Tuning
The Diabolical Developers Guide to Performance TuningjClarity
 
Machine Learning and AI: Core Methods and Applications
Machine Learning and AI: Core Methods and ApplicationsMachine Learning and AI: Core Methods and Applications
Machine Learning and AI: Core Methods and ApplicationsQuantUniversity
 
Debugging Skynet: A Machine Learning Approach to Log Analysis - Ianir Ideses,...
Debugging Skynet: A Machine Learning Approach to Log Analysis - Ianir Ideses,...Debugging Skynet: A Machine Learning Approach to Log Analysis - Ianir Ideses,...
Debugging Skynet: A Machine Learning Approach to Log Analysis - Ianir Ideses,...DevOpsDays Tel Aviv
 
2023 Supervised Learning for Orange3 from scratch
2023 Supervised Learning for Orange3 from scratch2023 Supervised Learning for Orange3 from scratch
2023 Supervised Learning for Orange3 from scratchFEG
 
Kaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyKaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyAlon Bochman, CFA
 
The Automation Firehose: Be Strategic & Tactical With Your Mobile & Web Testing
The Automation Firehose: Be Strategic & Tactical With Your Mobile & Web TestingThe Automation Firehose: Be Strategic & Tactical With Your Mobile & Web Testing
The Automation Firehose: Be Strategic & Tactical With Your Mobile & Web TestingPerfecto by Perforce
 
Automated product categorization
Automated product categorizationAutomated product categorization
Automated product categorizationAndreas Loupasakis
 
Automated product categorization
Automated product categorization   Automated product categorization
Automated product categorization Warply
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkIvo Andreev
 
PPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptx
PPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptxPPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptx
PPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptxneju3
 
Rise of the machines -- Owasp israel -- June 2014 meetup
Rise of the machines -- Owasp israel -- June 2014 meetupRise of the machines -- Owasp israel -- June 2014 meetup
Rise of the machines -- Owasp israel -- June 2014 meetupShlomo Yona
 
Electi Deep Learning Optimization
Electi  Deep Learning OptimizationElecti  Deep Learning Optimization
Electi Deep Learning OptimizationNikolas Markou
 

Similar to Rapid pruning of search space through hierarchical matching (20)

Performance Issue? Machine Learning to the rescue!
Performance Issue? Machine Learning to the rescue!Performance Issue? Machine Learning to the rescue!
Performance Issue? Machine Learning to the rescue!
 
BsidesLVPresso2016_JZeditsv6
BsidesLVPresso2016_JZeditsv6BsidesLVPresso2016_JZeditsv6
BsidesLVPresso2016_JZeditsv6
 
Agile performance engineering with cloud 2016
Agile performance engineering with cloud   2016Agile performance engineering with cloud   2016
Agile performance engineering with cloud 2016
 
Scam2011 syer
Scam2011 syerScam2011 syer
Scam2011 syer
 
Always On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraAlways On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on Cassandra
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
 
techniques.ppt
techniques.ppttechniques.ppt
techniques.ppt
 
The Diabolical Developers Guide to Performance Tuning
The Diabolical Developers Guide to Performance TuningThe Diabolical Developers Guide to Performance Tuning
The Diabolical Developers Guide to Performance Tuning
 
Machine Learning and AI: Core Methods and Applications
Machine Learning and AI: Core Methods and ApplicationsMachine Learning and AI: Core Methods and Applications
Machine Learning and AI: Core Methods and Applications
 
Debugging Skynet: A Machine Learning Approach to Log Analysis - Ianir Ideses,...
Debugging Skynet: A Machine Learning Approach to Log Analysis - Ianir Ideses,...Debugging Skynet: A Machine Learning Approach to Log Analysis - Ianir Ideses,...
Debugging Skynet: A Machine Learning Approach to Log Analysis - Ianir Ideses,...
 
2023 Supervised Learning for Orange3 from scratch
2023 Supervised Learning for Orange3 from scratch2023 Supervised Learning for Orange3 from scratch
2023 Supervised Learning for Orange3 from scratch
 
Kaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyKaggle Gold Medal Case Study
Kaggle Gold Medal Case Study
 
Introduction
IntroductionIntroduction
Introduction
 
The Automation Firehose: Be Strategic & Tactical With Your Mobile & Web Testing
The Automation Firehose: Be Strategic & Tactical With Your Mobile & Web TestingThe Automation Firehose: Be Strategic & Tactical With Your Mobile & Web Testing
The Automation Firehose: Be Strategic & Tactical With Your Mobile & Web Testing
 
Automated product categorization
Automated product categorizationAutomated product categorization
Automated product categorization
 
Automated product categorization
Automated product categorization   Automated product categorization
Automated product categorization
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
PPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptx
PPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptxPPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptx
PPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptx
 
Rise of the machines -- Owasp israel -- June 2014 meetup
Rise of the machines -- Owasp israel -- June 2014 meetupRise of the machines -- Owasp israel -- June 2014 meetup
Rise of the machines -- Owasp israel -- June 2014 meetup
 
Electi Deep Learning Optimization
Electi  Deep Learning OptimizationElecti  Deep Learning Optimization
Electi Deep Learning Optimization
 

More from lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
Rapid pruning of search space through hierarchical matching

  • 1. RAPID PRUNING OF SEARCH SPACE THROUGH HIERARCHICAL MATCHING
      Chandra Mouleeswaran
      Machine Learning Scientist, ThreatMetrix Inc.
      5/2/13
  • 2. My Background
      •  Machine Learning Scientist at ThreatMetrix Inc.
      •  Co-Chair, Developer Programs, IntelliFest.org, Oct 2013, San Diego, CA
      Career Path
      -  Siemens Corporate Research: Learning & Expert Systems
      -  Technology division of Donaldson, Lufkin and Jenrette company (Pershing): Artificial Intelligence Group - Network Monitoring
      -  Several startups: Classification, Web Crawling, Security, Financial Trading etc.
  • 3. Outline
      •  Task description
      •  Approaches
      •  Why search paradigm?
      •  Hierarchical matching
      •  Results
      •  Acknowledgments
  • 4. The Device Identification Task
      •  Computationally, it's a CLASSIFICATION problem:
         { a0, a1, a2, a3, ..., an } → { ci }
         ai = ( attribute | field | key ) value
         ci = ( label | signature | class | hash )
      •  Returning devices should be correctly identified within certain tolerances
      •  New classes may be created if a good match is not found in the repository of known devices
      •  Devices age out, based on data retention policy
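The task on slide 4 can be pictured with a toy sketch: return the label of a known device if it matches within a tolerance, otherwise create a new class. The Jaccard similarity, the threshold and the linear scan are all assumptions made for illustration; the deck does not say how similarity is measured, and the rest of the talk is about avoiding exactly this kind of exhaustive scan.

```python
def jaccard(a, b):
    """Overlap between two sets of attribute values (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def identify(attributes, repository, threshold, new_label):
    """Return the label of the best-matching known device, or register a new class.

    repository maps label -> set of attribute values (hypothetical layout).
    """
    best_label, best_sim = None, 0.0
    for label, known in repository.items():
        sim = jaccard(attributes, known)
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_label is not None and best_sim >= threshold:
        return best_label               # returning device, within tolerance
    repository[new_label] = set(attributes)  # no good match: create a new class
    return new_label
```

Aging out devices under a retention policy is left out of the sketch.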
  • 5. Task Challenges
      •  Extremely volatile attributes
      •  There are no pivot attributes to divide and conquer the search space
      •  Changing distributions
      •  Emphasis on PRECISION
      •  Stringent RESPONSE time
  • 6. Engineering Challenges
      •  Precision (accuracy) and latency (response time) are antagonistic constraints
      •  Project management

                       Repository Size (millions)   Load (TPS)   Latency (ms)
      Project start    28                           200          < 100
      Present          280                          300          < 100
      Change           10 X                         1.5 X        None
  • 7. Approaches
      •  Rules engine
      •  Learning models
      •  Vector space models
      Need an enterprise grade solution!
  • 8. Rules Engine
      •  No experts
      •  Number of rules?
      •  Maintenance?
      Not a viable approach!
  • 9. Learning Models
      •  Most machine learning methods deal predominantly with binary classification problems (e.g. fraud / not fraud) or a small number of target classes
      •  Few exemplars for each class
      •  Attribute values may be unbounded
      •  Attributes may not follow a natural progression
  • 10. Learning Models …
      •  Unsupervised learning such as clustering methods would make good models, but not good enough to be of practical use. Any simplification process will compromise on accuracy
      •  Ability to explain is critical
      •  Tend to ignore domain knowledge
      Challenge in providing enterprise solution
  • 11. Thoughts
      •  No comparable application with such requirements
      •  Build and deploy a classifier that explains itself easily, scales temporally and offers quick response
      •  Use domain knowledge to guide verification
      •  Improve the classifier through machine learning methods by analyzing performance in the field
  • 12. Vector-Space Models
      •  Similarity-based search makes the vector-space model a good choice for generating selections
      •  Given the volatile nature of data, information retrieval (IR) systems can adapt easily
      •  Good at neighborhood search
      Sensitive to individual attribute changes!
  • 13. Sources of Inspiration
      •  Lucene/Solr features
      •  Documentation from (erstwhile) Lucid Imagination
      •  Ease with which Lucene/Solr could be installed and explored
      Very short learning curve for novices!
  • 14. Feature Selection
      •  Primitive and derived attributes
      •  Entropy
      •  Distribution
  • 15. Domain
      •  Devices come with structural information but not much grammar or semantics
      •  Bag-of-words (single field) approach is fast but not precise
      •  Using all fields is precise but response is slow
      Now what?
  • 16. Disjunction Max
      •  Matrix of all possible combinations of user input query and document fields
      •  Transforms into a Boolean query of DisjunctionMaxQueries, one for each row
      •  Maximum score of the sub-clauses is used by DisjunctionMaxQuery
      •  No single term in user input dominates
      This is needed!
      Src: SearchHub and LucidWorks
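The scoring idea on slide 16 can be sketched in a few lines. Each row of the matrix holds one query term's score against every field; a row contributes its maximum, plus an optional tie-breaker share of the other fields (Lucene's DisjunctionMaxQuery takes such a tie-breaker multiplier), and the rows are summed as in a Boolean query. This is a simplified illustration: the score values are made up, and real Lucene scoring involves normalization factors that are ignored here.

```python
def dismax_score(field_scores, tie_breaker=0.0):
    """Score of one query term across fields: the best field wins."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    # Non-maximal fields only contribute through the tie-breaker.
    return best + tie_breaker * (sum(field_scores) - best)


def query_score(term_field_scores, tie_breaker=0.0):
    """Sum the per-term DisMax scores, one row per query term."""
    return sum(dismax_score(row, tie_breaker) for row in term_field_scores)


# Hypothetical scores: two query terms (rows) against three fields (columns).
matrix = [
    [0.2, 0.9, 0.4],
    [0.5, 0.1, 0.0],
]
total = query_score(matrix)
```

Because each row is capped by its best field, a term that happens to match in many fields cannot swamp the other terms.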
  • 17. DisMax Experiments (index size = 60 Million)
      Scenario 1: Must-Match Clauses
         mm = 2
         Solr fields = { a1, a2, a3 }
         Values = { phrase1, phrase2, phrase3 }
         Latency: YES (35 ms)
         Precision: NO (20% failure)
      Scenario 2: Should-Match Clauses
         mm = 50 %
         Solr fields = { a1 }
         Values = { term1, term2, term3, ..., termn }
         Latency: NO (> 2 seconds)
         Precision: YES (> 98%)
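For readers unfamiliar with the parameters, the two scenarios map onto standard Solr DisMax request parameters (`defType`, `q`, `qf`, `mm`). The sketch below only builds the query string; the field names a1..a3 come from the slide, while the phrase and term values and the helper name `dismax_params` are placeholders invented for illustration.

```python
from urllib.parse import urlencode


def dismax_params(values, fields, mm, rows=10):
    """Build Solr DisMax request parameters for a list of query values."""
    return {
        "defType": "dismax",
        "q": " ".join(values),     # phrases should arrive already quoted
        "qf": " ".join(fields),    # fields the query is matched against
        "mm": mm,                  # minimum-should-match: a count or a percentage
        "rows": rows,
    }


# Scenario 1: a few phrases across three fields, at least 2 clauses must match.
scenario_1 = dismax_params(['"phrase one"', '"phrase two"', '"phrase three"'],
                           ["a1", "a2", "a3"], "2")

# Scenario 2: many single terms against one field, half of them must match.
scenario_2 = dismax_params(["term1", "term2", "term3", "term4"], ["a1"], "50%")

query_string = urlencode(scenario_2)
```

The resulting string would be appended to a core's select handler URL; host and core names are deployment-specific and are not given in the deck.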
  • 18. Possible Workaround
      •  Look-ahead: Customize Lucene/Solr to do a branch-and-bound search, bail out on some lower bound score
      •  Minimize candidates for DisMax search
         -  reduce total number of Solr instances to search
         -  reduce total number of disjunctive terms
      [ Empirical estimate: t_n = 2 * t_(n-1), where t = time and n = number of disjunctive terms ]
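Taken at face value, the empirical estimate t_n = 2 * t_(n-1) unrolls to t_n = t_1 * 2^(n-1), so every disjunctive term removed roughly halves the query time. A small sketch of that arithmetic follows; the base latency for a single term is a hypothetical input, not a measurement from the talk.

```python
def estimated_latency(n_terms, base_ms=1.0):
    """Unrolled form of t_n = 2 * t_(n-1) with t_1 = base_ms (assumed)."""
    if n_terms < 1:
        raise ValueError("n_terms must be at least 1")
    return base_ms * 2 ** (n_terms - 1)


def speedup(n_before, n_after):
    """How many times faster the estimate gets when terms are removed."""
    return estimated_latency(n_before) / estimated_latency(n_after)
```

Under this estimate, dropping from 10 terms to 7 gives `speedup(10, 7) == 8.0`, which is why trimming terms pays off so quickly.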
  • 19. Phrases over Terms
      •  Used collocation (co-occurrence matrix) to determine most common phrases
      •  Delete terms covered by phrases
      •  Add stop words based on frequency analysis
      •  Ensure precision is preserved through regression tests
      Reduced the number of DisMax terms by 30%
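A minimal sketch of the phrases-over-terms idea, assuming adjacent-token bigram counts stand in for the co-occurrence matrix and a simple count threshold picks the "most common" phrases. The token values and the threshold are invented for illustration.

```python
from collections import Counter


def frequent_bigrams(records, min_count=2):
    """Count adjacent token pairs across records and keep the common ones."""
    counts = Counter()
    for tokens in records:
        counts.update(zip(tokens, tokens[1:]))
    return {pair for pair, count in counts.items() if count >= min_count}


def merge_phrases(tokens, phrases):
    """Replace each pair of terms covered by a phrase with the single phrase."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in phrases:
            out.append(tokens[i] + " " + tokens[i + 1])
            i += 2  # both terms are covered by the phrase
        else:
            out.append(tokens[i])
            i += 1
    return out


records = [
    ["mozilla", "5.0", "windows", "nt"],
    ["mozilla", "5.0", "linux"],
    ["opera", "windows", "nt"],
]
phrases = frequent_bigrams(records)
merged = merge_phrases(records[0], phrases)
```

Here the first record shrinks from four terms to two phrases, so the DisMax query built from it carries fewer disjunctive clauses.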
  • 20. Sources of Inspiration
      •  Planning in a Hierarchy of Abstraction Spaces, Artificial Intelligence, Vol. 5, No. 2, pp. 115-135 (1974)
      •  Search Reduction in Hierarchical Problem Solving, Proc. of the 9th IJCAI, AAAI Press, Menlo Park, CA (1991)
      •  Exceptional Data Quality Using Intelligent Matching and Retrieval, AI Magazine, AAAI Press (Spring 2010)
  • 21. Hierarchical Matching
      [Diagram. Labels: CSV/JSON; Bag of words; Models; Phrases; Filters; DisMax; Domain-specific patterns; Query Formulator; Solr instances selector; To Solr Servers; Verification]
  • 22. Conflict Resolution
      •  Top n candidates are returned from each Solr instance
      •  They are ranked based on custom verification module
      •  Ties are broken using recency
      •  Top candidate is persisted and returned along with custom score
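The conflict-resolution steps can be sketched as a merge followed by a two-key ranking. The `Candidate` fields and the idea that recency is a last-seen timestamp are assumptions; the custom verification module itself is not described in the deck, so its score is treated as a given number.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    device_id: str
    verification_score: float  # output of the (unspecified) verification module
    last_seen: int             # assumed recency signal, e.g. epoch seconds


def resolve(per_instance_results):
    """Merge the top-n lists from each Solr instance and pick one winner.

    Rank by verification score; break ties by most recent last_seen.
    Returns None when no instance returned a candidate.
    """
    merged = [c for results in per_instance_results for c in results]
    if not merged:
        return None
    return max(merged, key=lambda c: (c.verification_score, c.last_seen))


a = Candidate("a", 0.9, 100)
b = Candidate("b", 0.9, 200)
c = Candidate("c", 0.5, 300)
winner = resolve([[a, c], [b]])
```

In this example `a` and `b` tie on score, so the more recently seen `b` wins; persisting the winner is left to the caller.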
  • 23. Comments
      •  DisMax performs multidimensional match
      •  Extracted multiple filters and arranged them hierarchically
      •  Separation of selection and evaluation
         -  Selection = approximate solution
         -  Evaluation = refinement
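The separation of selection and evaluation might look like the sketch below: a chain of cheap filters, ordered coarse to fine, prunes the repository, and only the survivors are scored by a more expensive function. The filter, the attribute names and the overlap score are hypothetical stand-ins, not the filters used in the actual system.

```python
def select(query, repository, filters):
    """Selection: apply filters in order, shrinking the candidate set each time."""
    candidates = list(repository)
    for keep in filters:
        candidates = [d for d in candidates if keep(query, d)]
        if not candidates:
            break  # nothing left to refine
    return candidates


def evaluate(query, candidates, score, top_n=3):
    """Evaluation: rank the surviving candidates with the expensive scorer."""
    ranked = sorted(candidates, key=lambda d: score(query, d), reverse=True)
    return ranked[:top_n]


def same_os(query, device):
    """Hypothetical coarse filter on a single attribute."""
    return query["os"] == device["os"]


def overlap(query, device):
    """Hypothetical refinement score: number of attributes that agree."""
    return sum(1 for key, value in query.items() if device.get(key) == value)


repository = [
    {"id": 1, "os": "win", "browser": "ff", "tz": "utc"},
    {"id": 2, "os": "win", "browser": "chrome", "tz": "utc"},
    {"id": 3, "os": "mac", "browser": "ff", "tz": "utc"},
]
query = {"os": "win", "browser": "ff", "tz": "utc"}
survivors = select(query, repository, [same_os])
best = evaluate(query, survivors, overlap, top_n=1)
```

The point of the split is that the expensive scorer only ever sees the pruned set, which is where the latency saving comes from.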
  • 24. Where time went..
      •  Attribute selection
      •  Ranking
      •  Optimization
      •  Index re-generation
      •  Regression testing
  • 25. Sources for Tune Up
      •  Scaling Solr, Lucene Revolution, May 2011
      •  Practical Search with Solr: Beyond just Looking it Up, Lucid Imagination, May 2010
  • 26. Testing
      •  Precision testing using self and mixed modes
      •  Latency tests
         -  custom harness for stand-alone tests
         -  integrated tests with JMeter framework
  • 28. Latency Percentiles
      [Chart. Series labels: original edismax; Initial solution; Optimization 1: Filters, Focused search, verification; Optimization 2: Domain patterns, Stop words, de-dupe]
  • 30. Response Times over Time
      [Chart]
  • 31. Project Execution
      •  Agile Methodology
      •  Risk mitigation through primary and contingency plans
      •  Rapid prototyping followed by good software engineering practices
      •  Evaluating DSE (DataStax) & Solr Cloud
  • 32. Gleanings
      •  You can classify anything with Lucene/Solr, lexicon is your own
      •  The question is not whether Lucene/Solr can solve a particular classification problem, but whether you can prioritize among the many ways of doing it
      •  If you run into a problem, someone has solved it or will solve it in the near future
  • 33. Gleanings …
      •  Deal with accuracy before latency
      •  If precision, latency and scale are all critical to your domain, expect to invest some time in hierarchical abstractions
      •  "Index once, run anytime, anywhere" does not apply during development
      •  Throwing all data at Lucene/Solr will not work for mission-critical applications
      •  Rapid prototyping and willingness to fail
  • 34. Summary
      Simplify and match at multiple levels of abstraction
  • 35. Contributors
      Chandra Mouleeswaran: Research & Prototyping
      Fang Chen: Research & Prototyping
      Luke Mertens: Productization & Scalability
      Brent Pearson: Release Management
      Tracy Hsu: Precision Testing & QA
      Srinivas Nayani: Deployment & QA
  • 36. COMMENTS & FEEDBACK:
      Chandra Mouleeswaran
      cmouleeswaran@threatmetrix.com