Webinar: Fusion for Business Intelligence

Allan Syiek
Senior Sales Engineer
September 14, 2016
  
Session Objectives

By the end of this session, you will:
– Have a high-level awareness of the variety of search and discovery functionality available
– Select the right product for a particular use case
– Know why this baby is so happy
  
	
  
Agenda

• The Beer and Diaper Legend
• DIKW Pyramid
• What is Enterprise Search?
• Indexing 101
• Statistics vs. Data Mining vs. Machine Learning
• What is Business Intelligence?
• Where Does Fusion Fit?
  
Parable of the Beer and the Diapers

Illustrates the difference between querying and data mining; the story is already firmly enshrined in BI mythology.
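The distinction the parable draws can be made concrete. A query answers a question you already knew to ask ("how many baskets contain beer?"); data mining surfaces a relationship nobody asked about, such as beer and diapers showing up together more often than chance. The sketch below computes support, confidence, and lift for an association rule and scans all item pairs for frequent co-occurrences. The baskets are a tiny made-up example, and the function names are illustrative assumptions, not part of any product.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical toy baskets for illustration only; not real sales data.
BASKETS: List[Set[str]] = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "chips"},
    {"beer", "diapers", "milk"},
    {"bread", "milk"},
]

def count_baskets_with(baskets: List[Set[str]], item: str) -> int:
    """A query: answers a question you already knew to ask."""
    return sum(1 for basket in baskets if item in basket)

def rule_metrics(baskets: List[Set[str]], a: str, b: str) -> Tuple[float, float, float]:
    """Support, confidence and lift of the association rule a -> b."""
    n = len(baskets)
    n_a = count_baskets_with(baskets, a)
    n_b = count_baskets_with(baskets, b)
    n_ab = sum(1 for basket in baskets if a in basket and b in basket)
    support = n_ab / n
    confidence = n_ab / n_a if n_a else 0.0
    lift = confidence / (n_b / n) if n_b else 0.0  # > 1 means "more often than chance"
    return support, confidence, lift

def frequent_pairs(baskets: List[Set[str]], min_support: float = 0.3) -> Dict[Tuple[str, str], float]:
    """Mining: scan every item pair and keep those that co-occur often enough."""
    counts: Counter = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

if __name__ == "__main__":
    print(count_baskets_with(BASKETS, "beer"))
    print(rule_metrics(BASKETS, "beer", "diapers"))
    print(frequent_pairs(BASKETS))
```

On this toy data the beer → diapers rule has support 0.5, confidence 0.75, and a lift of about 1.125, so the pair co-occurs slightly more often than independence would predict; the query alone never reveals that.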
  	
  
 
The DIKW Pyramid (Data, Information, Knowledge, Wisdom)
  
	
  
What is Enterprise Search?

Q. What do you do with a mountain of data located everywhere?
A. It depends… What do you need it for?
  
Aspects of Enterprise Search

• Crawling, Parsing, Indexing, Searching
• Advanced Searches
• Searching Structured Data
• Searching Unstructured Data
• Metadata
• Ranking
• Results
• Access Control
• UI
• Tuning
• Reporting
• Scale and Performance
  
Documents → Index Pipeline → Search Collection ← Query Pipeline ← Search UI

Index Pipeline
• Tika Parser
• Exclusion Filter
• Field Mapper
• HTML Transform Stage
• XML Transform Stage
• OpenNLP Entity Extraction
• Gazetteer Extraction
• Regular Expression
• Aggregating
• JavaScript (custom scripts)
• …and others…

Query Pipeline
• Search Fields/Parameters
• Facets
• Landing Pages
• Boost Documents
• Block Documents
• Security Trimming
• Recommendation Boosting
• Rollup Aggregator
• Sub Query
• JavaScript (custom scripts)
• …and others…
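Conceptually, both pipelines are ordered chains of stages: each index stage receives a document and returns a transformed document (or drops it), and each query stage narrows or reorders the results. The sketch below models that idea with plain Python functions. The stage names echo the slide, but the function names, the dictionary layout, and the `path`, `content_txt`, `emails_ss`, and `acl` fields are hypothetical; this is not Fusion's pipeline API or configuration format.

```python
import re
from typing import Callable, Dict, List, Optional, Set

Doc = Dict[str, object]
Stage = Callable[[Doc], Optional[Doc]]

# --- Index-side stages (hypothetical stand-ins for the slide's stage types) ---

def exclusion_filter(doc: Doc) -> Optional[Doc]:
    # Drop documents whose "path" marks them as drafts; returning None removes the doc.
    return None if str(doc.get("path", "")).startswith("/drafts/") else doc

def field_mapper(doc: Doc) -> Doc:
    # Rename the raw "body" field to the field name the collection expects.
    doc = dict(doc)
    if "body" in doc:
        doc["content_txt"] = doc.pop("body")
    return doc

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def regex_extractor(doc: Doc) -> Doc:
    # Copy every e-mail-like string in the content into a separate multi-valued field.
    doc = dict(doc)
    doc["emails_ss"] = EMAIL_RE.findall(str(doc.get("content_txt", "")))
    return doc

def run_index_pipeline(docs: List[Doc], stages: List[Stage]) -> List[Doc]:
    indexed = []
    for doc in docs:
        current: Optional[Doc] = doc
        for stage in stages:
            current = stage(current)
            if current is None:  # a stage filtered the document out
                break
        if current is not None:
            indexed.append(current)
    return indexed

# --- Query-side stages operating on a ranked result list ---

def security_trim(results: List[Doc], user_groups: Set[str]) -> List[Doc]:
    # Keep only results whose "acl" groups overlap the caller's groups.
    return [r for r in results if set(r.get("acl", [])) & user_groups]

def block_documents(results: List[Doc], blocked: Set[str]) -> List[Doc]:
    # Remove explicitly blocked document IDs.
    return [r for r in results if r["id"] not in blocked]

def boost_documents(results: List[Doc], boosted: Set[str]) -> List[Doc]:
    # Stable sort: boosted docs (key False) move ahead, others keep their order.
    return sorted(results, key=lambda r: r["id"] not in boosted)
```

Keeping each stage a small, independent function is what makes the "…and others…" extensible: a custom script stage is just one more callable in the list.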
 
Indexing 101

A system used to make finding information easier.

Every word is converted into a wordID by using an in-memory hash table: the lexicon.

Occurrences in the current document are translated into hit lists and are written into the forward “barrels”.

The inverted barrels are the forward barrels re-sorted by wordID.
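This terminology (lexicon, hit lists, forward and inverted barrels) comes from the early Google search engine architecture. A much-simplified in-memory sketch of the same flow: a dictionary plays the lexicon, each document yields (docID, wordID, position) hits in document order, and re-sorting those hits by wordID turns the forward view into an inverted one. Real barrels are partitioned and stored on disk with richer hit attributes; none of that is modeled, and all names here are illustrative.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())

def build_index(docs: Dict[int, str]):
    """docs maps docID -> text. Returns (lexicon, forward hits, inverted index)."""
    lexicon: Dict[str, int] = {}             # word -> wordID (the in-memory hash table)
    forward: List[Tuple[int, int, int]] = []  # hits in document order: (docID, wordID, position)
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            word_id = lexicon.setdefault(word, len(lexicon))  # new words get the next ID
            forward.append((doc_id, word_id, pos))
    # Re-sort the forward hits by wordID to obtain the inverted view.
    inverted: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for doc_id, word_id, pos in sorted(forward, key=lambda h: (h[1], h[0], h[2])):
        inverted[word_id].append((doc_id, pos))
    return lexicon, forward, dict(inverted)

def lookup(word: str, lexicon: Dict[str, int], inverted) -> List[int]:
    """IDs of documents containing `word` (empty if the word was never indexed)."""
    word_id = lexicon.get(word.lower())
    if word_id is None:
        return []
    return sorted({doc_id for doc_id, _ in inverted[word_id]})

if __name__ == "__main__":
    lex, fwd, inv = build_index({1: "Beer and diapers", 2: "Diapers on sale", 3: "Cold beer"})
    print(lookup("beer", lex, inv), lookup("diapers", lex, inv))
```

Because each hit keeps its position, the inverted lists could also support phrase queries, which is one reason hit lists store more than a bare document ID.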
  
	
  
Indexing 101 - Ranking

• Score Results for Presentation
  – Weighted by Term Frequency-Inverse Document Frequency (TF-IDF)
  – Clustering
  – Complex proprietary algorithms
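TF-IDF weighting can be written out directly. One common textbook variant scores a document d for query q as the sum, over query terms t, of tf(t, d) × log(N / df(t)), where N is the number of documents and df(t) is how many contain t: frequent-in-this-document terms count more, terms found everywhere count less. This is a teaching sketch with whitespace tokenization; Lucene and Solr use their own scoring formulas (recent versions default to BM25), so real scores will differ.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

def tfidf_rank(query: str, docs: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank docs (docID -> text) by summed TF-IDF of the query terms."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    df: Counter = Counter()  # document frequency: in how many docs each term appears
    for terms in tokenized.values():
        df.update(set(terms))
    scores = {}
    for doc_id, terms in tokenized.items():
        tf = Counter(terms)  # term frequency within this document
        score = 0.0
        for t in query.lower().split():
            if df[t]:  # skip terms that appear in no document
                score += tf[t] * math.log(n_docs / df[t])
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    corpus = {"a": "beer beer diapers", "b": "diapers milk", "c": "milk bread"}
    for doc_id, score in tfidf_rank("beer diapers", corpus):
        print(doc_id, round(score, 3))
```

Here "beer" appears in only one document, so its IDF is larger than that of "diapers", and document "a" (which mentions beer twice) ranks first.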
  
	
  	
  
Indexing 101 - Relevance
  
Statistics vs. Data Mining vs. Machine Learning

– Statistics quantifies numbers
– Data mining explains patterns
– Machine learning predicts with models
– Artificial intelligence behaves and reasons
  
What is Business Intelligence?

• BI technologies provide historical, current, and predictive views of business operations
• Business intelligence is made up of an increasing number of components, including:
  – Multidimensional aggregation and allocation (OLAP: Online Analytical Processing)
  – Denormalization, tagging, and standardization (relational database)
  – Real-time reporting with analytical alerts
  – A method of interfacing with unstructured data sources (data mining)
  – Group consolidation, budgeting, and rolling forecasts
  – Statistical inference and probabilistic simulation
  – Key performance indicator optimization
  – Version control and process management
  – Open item management
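Multidimensional aggregation, the first component listed above, boils down to grouping facts by several dimensions and rolling totals up to coarser levels, much like SQL's GROUP BY CUBE. A minimal sketch over made-up sales rows (the rows, field names, and the `"*"` placeholder for "all values" are hypothetical) that computes every group-by combination of the chosen dimensions:

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical fact rows: region/quarter/product dimensions plus a revenue measure.
ROWS = [
    {"region": "East", "quarter": "Q1", "product": "beer", "revenue": 100},
    {"region": "East", "quarter": "Q2", "product": "diapers", "revenue": 80},
    {"region": "West", "quarter": "Q1", "product": "beer", "revenue": 50},
    {"region": "West", "quarter": "Q1", "product": "diapers", "revenue": 70},
]

def cube(rows: List[dict], dims: List[str], measure: str) -> Dict[Tuple[str, ...], float]:
    """Sum `measure` for every subset of `dims`; '*' marks a rolled-up dimension."""
    totals: Dict[Tuple[str, ...], float] = defaultdict(int)
    for row in rows:
        for k in range(len(dims) + 1):
            for subset in combinations(dims, k):
                key = tuple(row[d] if d in subset else "*" for d in dims)
                totals[key] += row[measure]
    return dict(totals)

if __name__ == "__main__":
    for key, total in sorted(cube(ROWS, ["region", "quarter"], "revenue").items()):
        print(key, total)
```

The key `("*", "*")` is the grand total, `("East", "*")` rolls up all of East's quarters, and `("West", "Q1")` is a single cell; search-engine facets and pivot facets answer many of the same slice-and-dice questions.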
  
Why Fusion for Log Analytics?

• Secure access to dashboards
• ETL of logs using index pipelines
• Spark runs analysis models over logs, and the results are leveraged through an ML index pipeline
• Time series index management
  
Massive-Scale Log Analytics

• Index billions of log events per day, in real time
• Recent-event and historical analysis: analyze logs over time (today, recent, past week, past 30 days, …)
• Easy-to-use dashboards to visualize common questions and allow for ad hoc analysis
• Ability to scale linearly as the business grows … with sub-linear growth in costs!
• Easy to set up, easy to manage, easy to use
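One common way to keep "today, past week, past 30 days" queries cheap at this scale is to partition log events into one index per day, search only the partitions a time window overlaps, and drop or archive whole partitions once they age out. The sketch below shows only that partition-selection arithmetic. The `logs_YYYY_MM_DD` naming scheme and the function names are hypothetical; this illustrates the general technique, not how Fusion's time series index management is implemented.

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List

def partitions_for_window(today: date, days: int, prefix: str = "logs") -> List[str]:
    """Daily partition names covering the last `days` days, today included."""
    if days < 1:
        raise ValueError("days must be at least 1")
    start = today - timedelta(days=days - 1)
    names = []
    for i in range(days):
        day = start + timedelta(days=i)
        names.append(f"{prefix}_{day.strftime('%Y_%m_%d')}")
    return names

def expired_partitions(existing: Iterable[str], today: date, retain_days: int,
                       prefix: str = "logs") -> List[str]:
    """Partitions dated before the retention window: candidates to drop or archive."""
    cutoff = today - timedelta(days=retain_days - 1)
    expired = []
    for name in existing:
        day = datetime.strptime(name[len(prefix) + 1:], "%Y_%m_%d").date()
        if day < cutoff:
            expired.append(name)
    return sorted(expired)

if __name__ == "__main__":
    print(partitions_for_window(date(2016, 9, 14), 7))  # the "past week" window
```

Querying a past-week window then touches seven small indexes instead of one ever-growing one, and retention becomes a matter of deleting whole partitions rather than individual documents.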
Signals & Recommendations

Fusion can capture, store, and aggregate signals from a variety of sources to drive predictive search capabilities and continuous relevancy tuning.

Signals can include:
• Clicks and queries
• Add-to-cart and purchase behavior
• Geolocation
• User behavior and preferences
• User history and past orders
• Device
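Aggregating signals usually means collapsing raw events into per-(query, document) weights that a query stage can later apply as boosts. A minimal sketch, assuming hypothetical event weights (a purchase counts more than a click) and an exponential time decay with a made-up 30-day half-life; Fusion's own aggregation jobs, formulas, and defaults may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical weights: stronger-intent signals count for more.
EVENT_WEIGHTS = {"click": 1.0, "add_to_cart": 3.0, "purchase": 5.0}

def aggregate_signals(
    events: Iterable[Tuple[str, str, str, float]],
    now: float,
    half_life_days: float = 30.0,
) -> Dict[str, Dict[str, float]]:
    """Collapse (query, doc_id, event_type, day) events into per-query doc weights.

    An event's weight halves for every `half_life_days` of age."""
    agg: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for query, doc_id, kind, day in events:
        age = max(0.0, now - day)
        decay = 0.5 ** (age / half_life_days)
        agg[query.lower()][doc_id] += EVENT_WEIGHTS.get(kind, 0.0) * decay
    return {q: dict(docs) for q, docs in agg.items()}

def boost_with_signals(
    results: List[Tuple[str, float]],
    query: str,
    aggregated: Dict[str, Dict[str, float]],
) -> List[Tuple[str, float]]:
    """Re-rank (doc_id, base_score) pairs by adding the aggregated signal weight."""
    weights = aggregated.get(query.lower(), {})
    return sorted(results, key=lambda r: r[1] + weights.get(r[0], 0.0), reverse=True)

if __name__ == "__main__":
    events = [
        ("diapers", "d1", "click", 100),
        ("diapers", "d2", "purchase", 100),
        ("Diapers", "d1", "click", 70),
    ]
    agg = aggregate_signals(events, now=100)
    print(agg)
    print(boost_with_signals([("d1", 2.0), ("d2", 1.0)], "diapers", agg))
```

The decay is what makes the tuning "continuous": yesterday's purchases outweigh last quarter's clicks without any manual rule changes.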
  
Visualization & Insight with SILK

• SILK dashboards provide a rich visual interface for users to search, inspect, and visualize event/log data
• Users can perform ad hoc search and analysis on massive amounts of multi-structured and time series data
• Real-time insights and trends for on-the-fly decision making, using the most accurate and up-to-date data
• Users can share visualizations and dashboards
Where Does Fusion Fit?

Lucidworks Fusion architecture:
• User interfaces: Lucidworks View, Admin UI
• REST API
• Core Services: Connectors (Database, Web, File, Logs, Hadoop, Cloud), Alerting/Messaging, NLP, Pipelines, Blob Storage, Scheduling, Recommenders/Signals, …
• Apache Spark: Cluster Mgr., Workers
• Apache Solr: Shards
• HDFS (optional)
• Apache ZooKeeper (ZK 1 … ZK N): Shared Config Mgmt, Leader Election, Load Balancing
• Security built-in
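In this picture, Solr splits each collection into shards, and ZooKeeper holds the shared cluster state recording which node serves which shard. To show how a document lands on a particular shard, here is a simplified hash-range router: the unsigned 32-bit hash space is divided into contiguous ranges, one per shard, and each document ID is hashed into one range. Solr's default compositeId router follows the same hash-range idea but uses MurmurHash3 and supports routing prefixes; this sketch substitutes CRC32 from the standard library, so its assignments will not match Solr's.

```python
import zlib
from typing import List, Tuple

HASH_SPACE = 2 ** 32  # zlib.crc32 returns an unsigned 32-bit value in Python 3

def shard_ranges(num_shards: int) -> List[Tuple[int, int]]:
    """Split the 32-bit hash space into contiguous, near-equal inclusive ranges."""
    if num_shards < 1:
        raise ValueError("need at least one shard")
    step = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        lo = i * step
        hi = HASH_SPACE - 1 if i == num_shards - 1 else (i + 1) * step - 1
        ranges.append((lo, hi))
    return ranges

def route(doc_id: str, ranges: List[Tuple[int, int]]) -> int:
    """Index of the shard whose range contains the hash of doc_id."""
    h = zlib.crc32(doc_id.encode("utf-8"))
    for i, (lo, hi) in enumerate(ranges):
        if lo <= h <= hi:
            return i
    raise ValueError("hash outside all shard ranges")

if __name__ == "__main__":
    ranges = shard_ranges(4)
    for doc_id in ["doc1", "doc2", "doc3"]:
        print(doc_id, "-> shard", route(doc_id, ranges))
```

Because routing depends only on the ID and the range table, any node that reads the same cluster state from ZooKeeper sends a given document to the same shard; splitting a shard only means splitting its hash range.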
Learn more at: lucenerevolution.org
  
Thank You

Q & A
  
