Fusion for data science: Scalable search and analytics in one
Grant Ingersoll
CTO, Lucidworks
@gsingers
Get Started
https://github.com/LucidWorks/fusion-examples/tree/master/fusion-for-datascience-webinar
• Best-of-breed search solution built on Apache Lucene and Solr
• Easily capture signals like clicks, shares, ratings, etc. and make them actionable
• Powerful data ingestion and analysis capabilities enabling machine learning,
recommendations and positive user feedback loops
• Effortless scale leveraging proven frameworks and algorithms
• Easy integration with big data tools like Hadoop
Fusion Foundations
[Architecture diagram. Labeled components: REST, Proxy/LB; Fusion services (Recs, Pipes, Metrics, NLP, Sched., Blobs, Msging, Connectors); Spark (Worker, Cluster Mgr.); Solr (Shards); HDFS; ZooKeeper (ZK 1 ... ZK N) for shared config mgmt, leader election and load balancing; Signals. Annotations: "Billions of Docs", "Millions of Users", "Optional", "Security woven throughout".]
Fusion Architecture
• Data exploration and visualization
• Easy Ingestion, feature selection and data reduction
• REST APIs for easy integration with commonly used tools
• Quick and Dirty: classification, clustering
• Powerful and scalable aggregations, math/stats framework leveraging Apache Spark
• Out-of-the-box NLP tools for part-of-speech tagging, sentence detection, named entity recognition and more
• OOTB recommenders plus Mahout extensions
Fusion Data Science Use Cases
Lucene: Core search, pluggable ranking, advanced
storage, sparse matrix
Solr: Faceting, function queries, basic stats, scaling, easy
setup, UIMA, basic NLP, search clustering
Fusion: Pipelines, Connectors/Crawlers, Dashboards/UI,
Spark integration, advanced stats, large scale
aggregations
Fusion: Standing on the shoulders
of giants.
Data Exploration Demo
• Ingestion
• 60+ connectors, plus easily push data in using REST APIs
• Feature Selection
• Analyzers for all types
• Easily get/calculate weights for terms and attach payloads
• Term Vectors/Term Dictionary
• Data Reduction
• Filters
• Analyzers
• Data quality tools
Ingestion, Selection, Reduction
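To make "calculate weights for terms" concrete, here is a minimal sketch of term weighting in plain Python. It uses the textbook TF-IDF formula (raw term count times log(N / document frequency)) and whitespace tokenization. Both are assumptions made for illustration; this is not the scoring formula or analyzer chain that Lucene, Solr or Fusion actually apply.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document (textbook TF-IDF, an assumption)."""
    # Naive analyzer stand-in: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

weights = tfidf(["solr search", "solr spark"])
```

A term that appears in every document ("solr" here) gets weight 0, which is one simple basis for data reduction: low-weight terms can be filtered out before further analysis.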
• Math:
• Search is essentially Vector * Matrix
• Aggregations
• Enable advanced computation over both core content and Fusion’s signals
• Make it easy to try out by leveraging Solr
• Ship with prebuilt “named” aggregations to cover common scenarios
Aggregations and Math
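The "Vector * Matrix" remark can be illustrated with a small sketch: represent the query as a term vector, the corpus as a term-document matrix of raw counts, and score each document as the product of the two. The function names and the use of unweighted counts are assumptions for illustration; a real engine uses weighted terms and an inverted index rather than a dense matrix.

```python
from collections import Counter

def term_document_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] is the count of term i in doc j."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix

def score(query, vocab, matrix):
    """Multiply the query's term vector by the term-document matrix."""
    q = Counter(query.lower().split())
    q_vec = [q[term] for term in vocab]
    n_docs = len(matrix[0]) if matrix else 0
    return [
        sum(q_vec[i] * matrix[i][j] for i in range(len(vocab)))
        for j in range(n_docs)
    ]

docs = ["solr search engine", "spark cluster computing", "search with solr and spark"]
vocab, matrix = term_document_matrix(docs)
scores = score("solr search", vocab, matrix)  # one score per document
```

Documents that share no terms with the query score 0; because most matrix entries are 0, sparse storage is the natural fit for this kind of computation.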
• Effortless scale, integrated with Fusion and Solr
• Leverage existing libraries like:
• Mahout
• Deeplearning4j
• GraphX, MLlib
• As easy as:
• bin/spark start
• http://.../aggregator/jobs/twitter/hashtags_per_author?spark=true
Spark FTW!
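As a sketch of how the aggregation job URL above might be assembled from a script: the slide elides the host and any path prefix ("http://.../"), so the base URL below is a placeholder, and the helper name is hypothetical. Only the path segments `aggregator/jobs/<collection>/<job>` and the `spark=true` parameter come from the slide; the HTTP method and response format are not shown there, so this only builds the URL.

```python
from urllib.parse import quote

def aggregation_job_url(base_url, collection, job_name, use_spark=True):
    """Build an aggregator job URL (hypothetical helper; base_url is a placeholder)."""
    segments = ("aggregator", "jobs", collection, job_name)
    path = "/".join(quote(segment, safe="") for segment in segments)
    url = f"{base_url.rstrip('/')}/{path}"
    # Only the spark=true form appears on the slide, so nothing is added otherwise.
    if use_spark:
        url += "?spark=true"
    return url

# "http://fusion.example.com" stands in for the elided part of the slide's URL.
url = aggregation_job_url("http://fusion.example.com", "twitter", "hashtags_per_author")
```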
Aggregations Demo
• Fusion powers recommendation use cases such as:
• People who bought this, bought that
• Related searches, spellings and more
• Session analysis
• Fusion ships with several built-in recommendation options
- Graph-based and collaborative-filtering-based approaches
• Easily enable multi-modal recommendations that combine:
- Content
- Collaborative Filtering
- Spatial
- Historic/Context
Recommendations
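The "people who bought this, bought that" case can be sketched as simple item co-occurrence counting over purchase baskets. This is a generic illustration of the idea, not Fusion's built-in recommender; the function names and sample data are made up for the example.

```python
from collections import defaultdict
from itertools import permutations

def cooccurrence(baskets):
    """Count how often each ordered pair of distinct items shares a basket."""
    counts = defaultdict(lambda: defaultdict(int))
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            counts[a][b] += 1
    return counts

def recommend(item, counts, top_n=3):
    """Return up to top_n items most often bought with `item` (ties broken by name)."""
    related = counts.get(item, {})
    ranked = sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]

baskets = [
    ["laptop", "mouse", "bag"],
    ["laptop", "mouse"],
    ["laptop", "bag", "dock"],
    ["mouse", "pad"],
]
counts = cooccurrence(baskets)
recs = recommend("laptop", counts)
```

A multi-modal recommender would combine a collaborative score like this with other evidence (content similarity, location, session context), for example as a weighted sum.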
• Spark
• APIs for running non-Lucid Spark jobs
• Integration with 3rd party Spark instances (from major Hadoop distros)
• Solr RDD extensions for term dictionary, term vectors
• UI for managing Aggregations
• Full-fledged Graph API
• More Math: matrices, functions, etc.
What's Next
• Lucidworks: http://www.lucidworks.com
• Me: grant@lucidworks.com
• Key Docs:
• https://docs.lucidworks.com
• https://docs.lucidworks.com/display/fusion/Signals+Aggregator+API
• https://docs.lucidworks.com/display/fusion/Aggregator+Functions
• https://docs.lucidworks.com/display/fusion/Signals+Aggregations+and+Recommendations
Resources
Webinar: Fusion for Data Science
