1
Introducing Fusion
5.1
WITH DATA SCIENCE TOOLKIT INTEGRATION:
TENSORFLOW, SPACY & SCIKIT-LEARN
Justin Sears, VP of Product Marketing
Sanket Shahane, Data Scientist & Product Manager for Artificial
Intelligence
2
Today’s Speakers
Justin Sears
VP of Product Marketing
Sanket Shahane
Product Manager - Artificial
Intelligence
3
Agenda
• Introduction to Fusion & Version 5.1
• Fusion’s Jupyter Notebook Integration
– Architecture, Scope & Purpose
– Demo: Reading Data, Writing Data, SQL Aggregation
• Deploying ML Models with Seldon Core
– Architecture, Scope & Purpose
– Demo: Deploying a Custom Model
• Q&A
4
Fusion Overview
INTRODUCING FUSION 5.1
5
How We Do It
Fusion leverages existing knowledge & maximizes the velocity of data discovery
Understanding
Content
Understanding
Users
Delivering at
Scale
6
FILTER
VISUALIZATION
ACTIVITY
CONTENT
INDEX
NATURAL
LANGUAGE
BOOSTED
RESULTS
MACHINE
LEARNING
QUERY RULE
MATCHING
USER
SIGNALS
FACET, TOPIC
& CLUSTER
DATA
Human
Generated
System
Generated
Application
Generated
SOLUTION
Digital
Workplace
Digital
Commerce
7
Advanced connectors and ML enrichment,
delivered by intuitive applications,
deployed on-prem, in the cloud or as a PaaS.
DATA
Any format,
any platform
SOLUTION
Personalized
insights for
each user
STORAGE
& SEARCH
INTENT
PREDICTION
APP
CREATION
DATA
INGEST &
PREP
FUSION PLATFORM
Human
Generated
System
Generated
Application
Generated
Digital
Workplace
Digital
Commerce
8
STORAGE & SEARCH
INTENT PREDICTION
APP CREATION
DATA INGEST & PREP
NLP: NER, phrases, POS
Document classification
Anomaly detection
Clustering
Topic detection
Search engine &
data processing
Connectors
ETL pipelines
Scheduling & alerting
SQL engine
Rules engine
Query pipelines
Query intent detector
Automatic relevancy
Signals & query analytics
Recommenders
A/B testing
Modular components
Stateless architecture
User-focused experience
Geospatial mapping
Results preview
Rapid prototyping
SCALABLE OPERATIONS
SECURITY
CDCR
CLOUD
SCALABLE
EXTENSIBLE
9
CLOUD-NATIVE, MICROSERVICES ARCHITECTURE ORCHESTRATED BY KUBERNETES
AUTOSCALING POLICIES DYNAMICALLY MANAGE RESOURCES
NATIVE SUPPORT FOR PYTHON ML MODELS
EASY INTEGRATION WITH DATA SCIENCE TOOLS, E.G. TENSORFLOW, SCIKIT-LEARN, SPACY, JUPYTER NOTEBOOKS
SPARK STREAMING FOR SIGNALS
FUSION 5.1 – CLOUD NATIVE & DATA SCIENCE READY
10
• Workplace apps with a
consumer-like experience
• Contextual, personalized
discovery of insights
• Successful cross-functional
adoption of applied ML
• Employee engagement,
collaboration & retention
The Hyper-Personalized Workplace
11
• Real-time, personalized,
relevant search results
• Proactive recommendations
with ML that work on Day 1
• Machine intelligence at scale,
with merchandisers in charge
• A trove of customer insight to
inform strategic decisions
Hyper-Personalized Commerce
12
Jupyter Notebook
Integration
READING DATA, WRITING DATA & SQL AGGREGATION
13
Architecture, Scope & Purpose
Jupyter Notebook runs as
its own independent
service
Deployed on the analytics
node pool in the Kubernetes
deployment
14
Architecture, Scope & Purpose
Jupyter Notebook Scope
• Interacts with Fusion, Solr Collections, and the
outside world (access permitting)
• Current scope limited to dev and exploration
(should not be used for production workloads)
• Fusion proxy authenticated endpoint
• Supports Scala, Python, and other language
kernels
• Hosts Spark for manipulating heavy data
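As a rough illustration of how a notebook cell might address a Solr collection, the sketch below builds a request URL for Solr's standard `/select` handler using only the Python standard library. The host name `solr.example.internal`, the collection name `products`, and the helper `build_solr_select_url` are hypothetical, and the slides do not show the exact endpoint exposed through the Fusion proxy, so treat the path as an assumption about a plain Solr deployment.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, collection, query="*:*", rows=10, fields=None):
    """Build a URL for Solr's /select handler (hypothetical helper)."""
    params = {"q": query, "rows": rows, "wt": "json"}
    if fields:
        # "fl" is Solr's field-list parameter: a comma-separated list of fields.
        params["fl"] = ",".join(fields)
    return f"{base_url.rstrip('/')}/solr/{collection}/select?{urlencode(params)}"


# Hypothetical host and collection, for illustration only.
url = build_solr_select_url(
    "http://solr.example.internal:8983",
    "products",
    query="title:laptop",
    rows=5,
    fields=["id", "title"],
)
print(url)
```

The resulting URL could then be fetched with any HTTP client available in the notebook, subject to the authentication the Fusion proxy requires.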
15
Architecture, Scope & Purpose
Jupyter Notebook Use Cases
• Explore data from Solr
• Load data into Solr from other storage
sources
• Export data from Solr to other storage sources
• Run Fusion SQL
• Dev and Test custom SQL Aggregations
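To make the idea of a SQL aggregation over signals concrete, here is a minimal sketch that counts clicks per query and document. It uses Python's built-in `sqlite3` as a stand-in engine, not Fusion SQL, and the table name, column names, and sample rows are all invented for illustration.

```python
import sqlite3

# Invented sample signals: (query_text, doc_id, signal_type).
signals = [
    ("laptop", "doc1", "click"),
    ("laptop", "doc1", "click"),
    ("laptop", "doc2", "click"),
    ("phone", "doc3", "click"),
    ("laptop", "doc1", "cart"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signals (query_text TEXT, doc_id TEXT, signal_type TEXT)")
conn.executemany("INSERT INTO signals VALUES (?, ?, ?)", signals)

# Aggregate raw click signals into one row per (query, document) pair.
rows = conn.execute(
    """
    SELECT query_text, doc_id, COUNT(*) AS click_count
    FROM signals
    WHERE signal_type = 'click'
    GROUP BY query_text, doc_id
    ORDER BY click_count DESC, query_text, doc_id
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```

Prototyping the aggregation on a small sample like this makes it easy to check the grouping logic before running it against a full signals collection.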
16
Demo
17
Sample usage
18
Sample usage
19
Sample usage
20
Sample usage
21
Sample usage
22
Sample usage
23
Deploying ML
Models with
Seldon Core
WORKING WITH CUSTOM MODELS
24
What is Data Science Toolkit
Integration?
Data Science Toolkit Integration is a model service that provides seamless integration with
Fusion’s Query and Index Pipelines. It adds intelligence for processing incoming queries and
documents.
Fusion integrates with Seldon Core, an open source framework for model deployment
management.
Objectives
• Streamline production of search-focused ML models
• Reduce data science teams’ dependencies on DevOps teams and vice versa
• Increase productivity, drive experimentation to fail fast, iterate, and improve
Process
Train model
Build and
Publish
Docker
Image
Deploy in
Fusion
Integrate
25
Workflow
Development and
Publishing
1. Develop ML model using choice of
framework.
2. Persist model and other objects
3. Create Docker image consisting of
Python packages, prediction class,
and model objects
4. Publish to a Docker repository
5. Deploy in Fusion using template
job
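Steps 1 to 3 above can be sketched in a few lines of Python. This is a toy example under stated assumptions: the "model" is a plain dictionary rather than a trained TensorFlow, scikit-learn, or spaCy model, the class name `QueryCategoryModel` is hypothetical, and the `predict(X, features_names)` method is modeled on the convention used by Seldon Core's Python wrapper, so check the Seldon Core documentation for the exact contract before relying on it.

```python
import os
import pickle
import tempfile

# Steps 1-2: "train" a toy model and persist it to disk.
# A real model would come from the framework of your choice.
toy_model = {"laptop": "electronics", "sofa": "furniture"}
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(toy_model, f)


# Step 3: a prediction class that would be packaged into the Docker image
# together with the persisted model object.
class QueryCategoryModel:
    def __init__(self, path=model_path):
        # Load the persisted model once, when the container starts.
        with open(path, "rb") as f:
            self.lookup = pickle.load(f)

    def predict(self, X, features_names=None):
        # Return one predicted category per input string.
        return [self.lookup.get(str(x).lower(), "unknown") for x in X]


model = QueryCategoryModel()
predictions = model.predict(["Laptop", "sofa", "bicycle"])
print(predictions)
```

Loading the model in the constructor rather than inside `predict` keeps per-request latency low, since each replica pays the deserialization cost only once.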
26
Usage – Query and Index Pipelines
Query Pipelines
• Process user
queries
• Multiple stages for
specific purposes
• Return results to the
user
Index Pipelines
• Process documents
• Multiple stages for
specific purposes
• Store documents to
Solr for Query
Pipelines
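The notion of a pipeline as an ordered series of single-purpose stages can be sketched generically. The code below is a conceptual illustration only, not Fusion's pipeline API: the stage functions, the `run_pipeline` helper, and the document fields are all hypothetical.

```python
def lowercase_title(doc):
    """Stage 1: normalize the title field."""
    return {**doc, "title": doc["title"].lower()}


def add_word_count(doc):
    """Stage 2: enrich the document with a derived field."""
    return {**doc, "word_count": len(doc["body"].split())}


def run_pipeline(doc, stages):
    """Pass the document through each stage in order."""
    for stage in stages:
        doc = stage(doc)
    return doc


index_pipeline = [lowercase_title, add_word_count]
result = run_pipeline(
    {"title": "Fusion 5.1 Overview", "body": "Cloud native search platform"},
    index_pipeline,
)
print(result)
```

A machine learning stage fits the same pattern: it is one more function in the list, which sends the document or query to a deployed model and attaches the prediction.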
27
Usage
ML Models are immutable Docker
images deployed and scaled
independently.
28
Usage
Seldon Core balances the workload
between model replicas.
29
Usage
ML Service is a proxy and keeps track
of models available in Fusion.
30
Usage
Machine Learning stages interact with
ML Service and Seldon Core via the gRPC
protocol.
31
Usage
ML Models are immutable Docker
images deployed and scaled
independently.
Seldon Core balances the workload
between model replicas.
ML Service is a proxy and keeps track
of models available in Fusion.
Machine Learning stages interact with
ML Service and Seldon Core via the gRPC
protocol.
32
Demo
33
Learn More
Read the blog:
Fusion 5.1 Is Here: Faster Deployment of Data Science and Innovation
https://lucidworks.com/post/latest-fusion-release/
Test drive Fusion on your own! https://lucidworks.com/try/
Try in the Cloud
Try in our Sandbox (GitHub)
Contact us: https://lucidworks.com/contact/
Check out these resources to learn more about Fusion 5.1
34
Questions & Answers
35
THANK YOU

Webinar: Accelerate Data Science with Fusion 5.1


Editor's Notes

  • #2 Janessa to intro slide 1-3, then pass over to Justin on slide 4
  • #5 Justin to intro
  • #6 SCRIPT: I’m about to go into our product detail. I’ll tell you how our products help your team better understand content, better understand users, and deliver digital workplace solutions at scale. We help you understand content better by indexing vast amounts of structured and unstructured content. Our parallel bulk loader uses the power of Apache Spark to connect to any data source and index it in Apache Solr. As Fusion indexes content, it applies AI and natural language processing to cluster and classify documents, recognize named entities (like products, people and places), tag parts of speech and analyze sentiment within text. Because we understand content at index time, the content can be matched to a user’s query intent so that Fusion can provide the most relevant results. Fusion users also generate millions of valuable signals as they interact with the platform. Those signals help our clients uncover connections between different entities and get the right answers to users in the right way. Fusion uncovers connections with technology such as the semantic knowledge graph (SKG), which calculates the relationship strength between various entities and understands how they are related in the content. Fusion’s App Studio helps your clients create rich, purpose-built applications for different types of users, so they can get the answers they need within the appropriate user experience. And finally, we deliver AI-powered finding at scale by allowing you to connect to any data via pre-built or custom-built connectors, APIs, templates or data pipelines. Our customer excellence team will educate you about best practices and provide regular health checks. A Lucidworks Customer Success Manager ensures continued success after launch through ongoing touchpoints and measurement of desired outcomes. And for complex knowledge management deployments, we offer advisory services on your Fusion machine learning models. [NEXT SLIDE]
  • #7 SCRIPT: Those three ways we do it oversimplified what we offer. Now let me go into more of the detail. At Lucidworks, we know that the challenges of AI-powered finding are very complex, because we began working on them 12 years ago. We began with a deep understanding of search and Apache Solr, and we wrote code to ingest data generated by systems, humans and applications. We’ve developed advanced functionality for indexing, clustering, classification, faceting, filtering, relevancy, ranking, analytics, visualization, natural language processing and boosting results. We’ve bundled it all into a Digital Workplace solution available with Lucidworks Fusion. Because we’ve done the work and become the experts at delivering the solution in the world’s largest organizations, you can begin using our platform without diverting time, attention and money away from your core business to build something in-house. Focus on your business, your customers and your employees instead. REFERENCES: Indexing Data: https://doc.lucidworks.com/fusion/2.1/Indexing-Data.html Faceting: https://doc.lucidworks.com/fusion/2.1/Search/Faceting.html Collaborative Filtering: https://doc.lucidworks.com/fusion/2.1/Recommendations_and_Boosting/Collaborative-Filtering.html Relevancy: https://doc.lucidworks.com/fusion-server/4.1/solr-reference-guide/7.4.0/relevance.html Ranking: https://doc.lucidworks.com/fusion-server/4.1/solr-reference-guide/7.4.0/learning-to-rank.html Analytics: https://doc.lucidworks.com/fusion-server/4.1/solr-reference-guide/7.4.0/analytics.html Visualization: https://doc.lucidworks.com/fusion-server/4.1/system-administration/dashboards/display-panels.html
  • #8 SCRIPT: Lucidworks Fusion incorporates AI and Machine Learning throughout the platform to intelligently ingest, explore, and curate the data. When you ingest data into the Fusion platform, it uses AI to cluster, classify and organize content, so that it will be available for user queries. Once the query comes into Fusion Server, it invokes Search AI to understand query intent and personalize results. Fusion App Studio allows teams to quickly create new, personalized, data discovery experiences. All of this can be deployed on-prem, self-hosted on a public cloud service, or managed by Lucidworks in a multi-tenant cloud PaaS. REFERENCES: No customer left behind: How to drive growth by putting personalization at the center of your marketing | https://www.mckinsey.com/business-functions/marketing-and-sales/our-insights/no-customer-left-behind
  • #9 SCRIPT: Here’s some deeper detail on the functions performed by each of Fusion’s components. For example, AI on data ingest includes NLP analysis, document classification, anomaly detection, clustering and topic detection. Fusion Server delivers connectors to hundreds of data sources, and very fast search results, even with extremely large numbers of documents and thousands of concurrent users. AI at query time helps with intent detection, automatic relevancy, query and signal analytics, recommenders and A/B testing. And App Studio helps our customers deliver powerful knowledge discovery through a user-focused experience. Rapid prototyping helps our customers get the experiences right in the beginning and then easily tune them as conditions change. Fusion has a distributed architecture, so its operations are extensible, scalable, and cloud-ready, all with cross data center replication (CDCR) and enterprise-grade security. REFERENCES: SolrCloud: https://doc.lucidworks.com/fusion-server/4.1/solr-reference-guide/7.2.1/solrcloud.html CDCR video: https://youtu.be/fAvO8bHTh-Q Webinar “Secure Solr with Fusion”: https://youtu.be/unREzFNIa7Q
  • #10 SCRIPT: We released Fusion 5.0 in September 2019 and Fusion 5.1 in March 2020. Fusion 5 includes major product improvements such as: A cloud-native microservices architecture orchestrated by Kubernetes. This lets Fusion customers add new cloud-based apps more quickly and easily. Customers who want to run Fusion 5 in their data center can manage it with cloud-like agility, as long as they have Kubernetes on-site. Fusion autoscaling policies let customers expand capacity dynamically, as new apps are introduced or as system load grows. If a new app starts consuming too much CPU, Kubernetes will add nodes and auto-balance the cluster. The Fusion ML microservice natively supports Python machine learning models, allowing data scientists to utilize existing models with Fusion’s Index and Query Pipelines. The most popular data science toolkits integrate natively with Fusion, so data scientists can use TensorFlow, Scikit-learn and spaCy. And in Fusion 5.1, we added Spark streaming for signals ingest and near real-time aggregation of search activity. [NEXT SLIDE]
  • #11 SPEAKER SCRIPT We’ve seen adoption of Fusion at hundreds of companies create hyper-personalized workplaces. These companies give their employees workplace applications with a consumer-like experience. Employees use those for contextual, personalized discovery of insights relevant to their particular role, question and moment. The successful cross-functional adoption of applied machine learning makes broader ML adoption easier within the organization. All of this leads to improved employee engagement, collaboration and retention. [NEXT SLIDE]
  • #12 SPEAKER SCRIPT We’ve seen adoption of Fusion transform commerce at some of the world’s top retailers. Those brands give both B2B and B2C buyers real-time, personalized, relevant search results. Fusion ships with proactive recommendations powered by Spark machine learning jobs that work on Day 1. This means machine intelligence works as an e-commerce operation grows, while allowing merchandisers to keep charge of the human experience. And because Lucidworks customers own the signals data that comes from Fusion, brands can use that data to inform strategic decisions that extend far beyond their websites. [NEXT SLIDE]
  • #33 Janessa to switch over to play video clip OR switch over to screenshare. JUSTIN & SANKET: look over at the Team Chat widget for directions from Janessa SPEAKER: Hold 3 seconds after the video ends before continuing with the slides
  • #35 Look over to the QA box for the first question
  • #36 SCRIPT: Thanks everyone for joining today’s webinar! The slides and video recording will be sent to you in a follow up email, along with the resources to learn more about Fusion. If we didn’t get a chance to answer your question today, don’t worry! A Lucidworks rep will be reaching out to you. If you want to get in touch with us, send us a note at lucidworks.com/contact. Thanks again everyone for joining. Have a great day.