Webinar: Fusion for Data Science

Fusion for Data Science: Scalable search and analytics in one
Grant Ingersoll
CTO, Lucidworks
@gsingers

Get Started
https://github.com/LucidWorks/fusion-examples/tree/master/fusion-for-datascience-webinar

• Best-of-breed search solution built on Apache Lucene and Solr
• Easily capture signals such as clicks, shares, and ratings, and make them actionable
• Powerful data ingestion and analysis capabilities that enable machine learning, recommendations, and positive user feedback loops
• Effortless scale leveraging proven frameworks and algorithms
• Easy integration with big data tools like Hadoop
Fusion Foundations

[Figure: architecture diagram. Labels recoverable from the diagram: Millions of Users; Proxy/LB; REST; Security woven throughout; Connectors; Pipes; NLP; Recs; Signals; Blobs; Metrics; Sched.; Msging; Spark (Worker, Worker, Cluster Mgr.); Solr (Shards, Shards); HDFS; Optional; Billions of Docs; ZooKeeper (ZK 1 ... ZK N): Shared Config Mgmt, Leader Election, Load Balancing.]
Fusion Architecture

• Data exploration and visualization
• Easy ingestion, feature selection and data reduction
• REST APIs for easy integration with commonly used tools
• Quick and dirty: classification, clustering
• Powerful and scalable aggregations, math/stats framework leveraging Apache Spark
• Out-of-the-box NLP tools for part-of-speech tagging, sentence detection, named entity recognition and more
• Out-of-the-box recommenders plus Mahout extensions
Fusion Data Science Use Cases

Lucene: Core search, pluggable ranking, advanced storage, sparse matrix
Solr: Faceting, function queries, basic stats, scaling, easy setup, UIMA, basic NLP, search clustering
Fusion: Pipelines, Connectors/Crawlers, Dashboards/UI, Spark integration, advanced stats, large scale aggregations
Fusion: Standing on the shoulders of giants.

• Ingestion
- 60+ connectors, plus easily push data in using REST APIs
• Feature Selection
- Analyzers for all types
- Easily get/calculate weights for terms and attach payloads
- Term Vectors/Term Dictionary
• Data Reduction
- Filters
- Analyzers
- Data quality tools
Ingestion, Selection, Reduction
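
The idea of calculating weights for terms can be illustrated with a minimal TF-IDF sketch in plain Python. This is only an illustration of term weighting in general: the function name `tfidf_weights`, the whitespace tokenizer, and the sample documents are assumptions made here, and the code does not use or represent Fusion's, Solr's, or Lucene's analyzers or scoring.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per document, using tf * idf.

    Hypothetical helper for illustration; tokenization is a plain
    lowercase whitespace split, not a real analyzer chain.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Made-up sample documents.
docs = ["solr search engine", "spark search jobs", "solr spark integration"]
weights = tfidf_weights(docs)
```

In this sample, a term that appears in only one document ("engine") ends up with a higher weight than one shared by two documents ("solr"), which is the kind of signal feature selection relies on.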

• Math:
- Search is essentially Vector * Matrix
• Aggregations
- Enable advanced computation over both core content and Fusion’s signals
- Make it easy to try out by leveraging Solr
- Ship with prebuilt “named” aggregations to cover common scenarios
Aggregations and Math
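
The "Vector * Matrix" view of search can be sketched as follows: documents form the rows of a term-count matrix, the query becomes a vector over the same vocabulary, and each document's score is a dot product. The helper names `build_matrix` and `score` and the sample documents are assumptions for illustration; real engines use sparse structures and more refined weighting than raw counts.

```python
def build_matrix(docs):
    """Build a dense document-by-term count matrix (illustrative only)."""
    vocab = sorted({term for doc in docs for term in doc.lower().split()})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for term in doc.lower().split():
            row[index[term]] += 1
        matrix.append(row)
    return vocab, matrix


def score(query, vocab, matrix):
    """Score every document as the dot product of query vector and row."""
    index = {term: i for i, term in enumerate(vocab)}
    q = [0] * len(vocab)
    for term in query.lower().split():
        if term in index:  # terms outside the vocabulary contribute nothing
            q[index[term]] += 1
    return [sum(qi * di for qi, di in zip(q, row)) for row in matrix]


# Made-up sample documents.
docs = ["solr search engine", "spark search jobs", "solr spark integration"]
vocab, matrix = build_matrix(docs)
scores = score("solr search", vocab, matrix)  # [2, 1, 1]
```

The first document matches both query terms and scores highest; the other two match one term each.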

• Effortless scale, integrated with Fusion and Solr
• Leverage existing libraries like:
- Mahout
- Deeplearning4j
- GraphX, MLlib
• As easy as:
- bin/spark start
- http://.../aggregator/jobs/twitter/hashtags_per_author?spark=true
Spark FTW!
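
A small sketch of how a client might assemble the aggregation job URL shown above. The slide elides the host and any path prefix, so the base URL below is a placeholder, and reading "twitter" as a collection name and "hashtags_per_author" as an aggregation name is an assumption. The slide does not state the HTTP method, so this sketch only builds the URL and does not send a request.

```python
from urllib.parse import quote, urlencode


def aggregation_job_url(base_url, collection, aggregation, use_spark=True):
    """Build an aggregator job URL in the shape shown on the slide.

    Hypothetical helper: base_url stands in for the elided host and prefix.
    """
    path = "{}/aggregator/jobs/{}/{}".format(
        base_url.rstrip("/"),
        quote(collection, safe=""),
        quote(aggregation, safe=""),
    )
    query = urlencode({"spark": "true" if use_spark else "false"})
    return "{}?{}".format(path, query)


# Placeholder base URL; substitute the address of a real installation.
url = aggregation_job_url(
    "http://fusion.example.invalid/base", "twitter", "hashtags_per_author"
)
```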

• Fusion powers recommendation use cases such as:
- People who bought this, bought that
- Related searches, spellings and more
- Session analysis
• Fusion ships with several built-in recommendation options
- Graph and collaborative filtering based approaches
• Easily enable multi-modal recommendations that combine:
- Content
- Collaborative Filtering
- Spatial
- Historic/Context
Recommendations
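
The "people who bought this, bought that" use case can be sketched with simple co-occurrence counting over purchase baskets. This is a generic item-to-item illustration, not Fusion's built-in recommender; the function names `co_purchase_counts` and `recommend` and the basket data are made up for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations


def co_purchase_counts(baskets):
    """Count how often each ordered pair of distinct items shares a basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            counts[a][b] += 1
    return counts


def recommend(counts, item, k=3):
    """Return up to k items most often bought with `item`.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(
        counts.get(item, Counter()).items(),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return [other for other, _ in ranked[:k]]


# Made-up purchase baskets.
baskets = [
    ["laptop", "mouse", "bag"],
    ["laptop", "mouse"],
    ["laptop", "bag", "dock"],
    ["mouse", "pad"],
]
counts = co_purchase_counts(baskets)
top = recommend(counts, "laptop", k=2)  # ['bag', 'mouse']
```

A multi-modal recommender would blend a score like this with content, spatial, or contextual scores; how those are weighted is a design decision the slides do not specify.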

• Spark
- APIs for running non-Lucid Spark jobs
- Integration with 3rd party Spark instances (from major Hadoop distros)
- Solr RDD extensions for term dictionary, term vectors
• UI for managing Aggregations
• Full-fledged Graph API
• More Math: matrices, functions, etc.
What's Next

• Lucidworks: http://www.lucidworks.com
• Me: grant@lucidworks.com
• Key Docs:
- https://docs.lucidworks.com
- https://docs.lucidworks.com/display/fusion/Signals+Aggregator+API
- https://docs.lucidworks.com/display/fusion/Aggregator+Functions
- https://docs.lucidworks.com/display/fusion/Signals+Aggregations+and+Recommendations
Resources
