Ingesting and Manipulating Data with JavaScript
Produces the world’s largest open source user conference dedicated to Lucene/Solr
Lucidworks is the primary sponsor of the Apache Solr project
Employs over 40% of the active committers on the Solr project
Contributes over 70% of Solr's open source codebase
Based in San Francisco
Offices in Bangalore, Bangkok, New York City, Raleigh, London
Over 300 customers across the Fortune 1000
Fusion, a Solr-powered platform for search-driven apps
An optimized search experience for every user using relevance boosting and machine learning.
Create custom search and discovery applications in minutes.
Highly scalable search engine and NoSQL datastore that gives you instant access to all your data.
Lucidworks Fusion product suite
• 50+ connectors
• Full SQL compatibility
• End-to-end security
• Multi-dimensional real-time ingestion
• Administration and analytics
• Personalized recommendations
• Machine learning out-of-the-box
• Powerful recommenders and classifiers
• Predictive search
• Point-and-click relevancy tuning
• Quick prototyping
• Fine-grained security
• Stateless architecture
• Support 25+ data platforms
• Full library of components
• Pre-tested reusable modules
Fusion Pipelines
Index Pipeline
Fusion Query Pipeline
Javascript Index Pipeline Stage
This is a Fusion Javascript Pipeline stage
Why Javascript?
Javascript vs Pipeline Stage
o Existential discussion at Lucidworks
o My opinion only…
Pipeline stages
are good for…
And…
Not…
o 20 discrete operations I have to do to convert one field…
o Conditional operations (if this then this, otherwise do this other thing)
o Canned functionality you have elsewhere.
o I don’t want to do anything that feels like programming in form fields…
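As a rough illustration of the first two bullets, here is a sketch of several small steps plus a conditional applied to one field value in plain JavaScript. The helper name normalizeSku, the "sku" field, and the "LEGACY-" prefix rule are hypothetical and invented for this example only; they are not part of Fusion.

```javascript
// Hypothetical helper: several discrete steps plus a conditional, applied to one value.
function normalizeSku(raw) {
  if (raw === null || raw === undefined) {
    return null;
  }
  var value = String(raw).trim().toUpperCase();   // steps 1-2: trim and upper-case
  value = value.replace(/[^A-Z0-9-]/g, "");       // step 3: drop unexpected characters
  value = value.replace(/-{2,}/g, "-");           // step 4: collapse repeated dashes
  if (value.length === 0) {
    return null;                                   // nothing usable left
  }
  // Conditional: purely numeric values get an (invented) legacy prefix.
  if (/^[0-9]+$/.test(value)) {
    value = "LEGACY-" + value;
  }
  return value;
}

// Inside a stage this might be called as follows (assumes a "sku" field exists):
// function (doc) {
//   var sku = normalizeSku(doc.getFirstFieldValue("sku"));
//   if (sku !== null) { doc.setField("sku", sku); }
//   return doc;
// }
```

Keeping the steps in one function keeps the whole conversion readable in one place rather than spread over many stage forms.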
com.lucidworks.apollo.common.pipeline.PipelineDocument
PipelineDocument Highlights
https://doc.lucidworks.com/fusion-pipeline-javadocs/3.1/com/lucidworks/apollo/common/pipeline/PipelineDocument.html
PipelineDocument{
…
addField(name, value);
getAllFieldNames(); //include internal use names
getFieldNames(); //exclude internal use names
getFirstField(name);
getLastField(name);
removeFields(name);
setField(name, value);
...
}
The Javascript Function
Basic
function (doc) {
// do really important things.
return doc;
}
With Context
function (doc, ctx) {
// do really important things.
return doc;
}
https://doc.lucidworks.com/fusion-pipeline-javadocs/3.1/com/lucidworks/apollo/pipeline/Context.html
With Collection
function (doc, ctx, collection) {
// do really important things.
return doc;
}
With solrServer
function (doc, ctx, collection, solrServer) {
// do really important things.
// solrServer can index/query things
return doc;
}
https://doc.lucidworks.com/fusion-pipeline-javadocs/3.1/com/lucidworks/apollo/component/BufferingSolrServer.html
With solrServerFactory (aka SolrClientFactory)
function (doc, ctx, collection, solrServer,
solrServerFactory) {
// do really important things.
// solrServerFactory can look up other collections
return doc;
}
https://doc.lucidworks.com/fusion-pipeline-javadocs/3.1/com/lucidworks/apollo/component/SolrClientFactory.html
Common Problems
Add a Field
function (doc) {
  // setField replaces any values currently in the field with the new one.
  doc.setField('some-new-field', 'some field value');
  // addField adds the field if it does not exist; for a multi-valued field
  // that already has values, the new value is combined with the old ones.
  doc.addField('some-new-field', 'some field value');
  return doc;
}
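The difference between the two calls can be tried outside Fusion with a small stand-in. FakeDoc below is a hypothetical in-memory object, not the Fusion PipelineDocument class; it only mimics the replace-versus-append behaviour described in the comments above, and getValues is an invented accessor for inspecting the result.

```javascript
// Minimal in-memory stand-in for a pipeline document (NOT the Fusion class).
function FakeDoc() {
  this.fields = {};   // field name -> array of values
}

// Replace whatever values the field currently holds.
FakeDoc.prototype.setField = function (name, value) {
  this.fields[name] = [value];
};

// Keep existing values and append the new one; create the field if needed.
FakeDoc.prototype.addField = function (name, value) {
  if (!Object.prototype.hasOwnProperty.call(this.fields, name)) {
    this.fields[name] = [];
  }
  this.fields[name].push(value);
};

// Invented accessor used only to inspect the stand-in.
FakeDoc.prototype.getValues = function (name) {
  return this.fields[name] || [];
};

var sample = new FakeDoc();
sample.addField("tags", "red");
sample.addField("tags", "blue");    // "tags" now holds two values
sample.setField("tags", "green");   // "tags" now holds only "green"
```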
Glue Two Fields
function (doc) {
  var value = "";
  if (doc.hasField("Actor1Geo_Lat") && doc.hasField("Actor1Geo_Long")) {
    value = doc.getFirstFieldValue("Actor1Geo_Lat") + "," +
        doc.getFirstFieldValue("Actor1Geo_Long");
    doc.addField("Actor1Geo_p", value);
  }
  return doc;
}
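If the source data may hold empty or malformed coordinates, the glue step could be guarded first. The sketch below assumes the values arrive as strings or numbers; toLatLon is a hypothetical helper name, and the range checks are an added assumption rather than anything the stage above does.

```javascript
// Hypothetical helper: build a "lat,lon" string, or return null when the inputs are unusable.
function toLatLon(lat, lon) {
  var latNum = parseFloat(lat);
  var lonNum = parseFloat(lon);
  if (isNaN(latNum) || isNaN(lonNum)) {
    return null;                       // missing or non-numeric input
  }
  if (latNum < -90 || latNum > 90 || lonNum < -180 || lonNum > 180) {
    return null;                       // outside the valid coordinate range
  }
  return latNum + "," + lonNum;
}

// Inside the stage above this might be used as:
// var point = toLatLon(doc.getFirstFieldValue("Actor1Geo_Lat"),
//                      doc.getFirstFieldValue("Actor1Geo_Long"));
// if (point !== null) { doc.addField("Actor1Geo_p", point); }
```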
Iterate through the fields
function (doc) {
  // list of doc fields to iterate over
  var fields = doc.getFieldNames().toArray();
  for (var i = 0; i < fields.length; i++) {
    var fieldName = fields[i];
    var fieldValue = doc.getFirstFieldValue(fieldName);
    logger.info("field name: " + fieldName + ", field value: " + fieldValue);
  }
  return doc;
}
Logging
logger.info("field name: " + fieldName + ", field value: " + fieldValue);
fusion/3.1.x/var/log/connectors/connectors.log
Preview a field
function (doc) {
  if (doc.getId() != null) {
    var fromField = "body_t";
    var toField = "preview_t";
    var value = doc.getFirstFieldValue(fromField);
    // guard against a missing field before calling string methods
    value = value ? value : "";
    // replace newlines and tabs with spaces
    var pattern = /\n|\t/g;
    value = value.replace(pattern, " ");
    // keep at most the first 500 characters
    var length = value.length < 500 ? value.length : 500;
    value = value.substr(0, length);
    doc.addField(toField, value);
  }
  return doc;
}
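The string handling in this stage can be pulled into a plain function and checked outside Fusion. This is a minimal sketch; makePreview and its maxLength parameter are hypothetical names introduced here for illustration.

```javascript
// Hypothetical helper: flatten newlines and tabs, then truncate to maxLength characters.
function makePreview(text, maxLength) {
  if (text === null || text === undefined) {
    return "";                                    // nothing to preview
  }
  var value = String(text).replace(/[\n\t]/g, " ");
  return value.length > maxLength ? value.substring(0, maxLength) : value;
}

// Inside the stage above this might be used as:
// doc.addField("preview_t", makePreview(doc.getFirstFieldValue("body_t"), 500));
```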
Bust up a document
function (doc) {
  var field = doc.getFieldValues('price');
  var id = doc.getId();
  var newDocs = [];
  // emit one new document per value of the 'price' field
  for (var i = 0; i < field.size(); i++) {
    newDocs.push({ 'id' : id + '-' + i,
                   'fields' : [ { 'name' : 'subject', 'value' : field.get(i) } ] });
  }
  return newDocs;
}
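The shape of the returned array is easier to see with plain data. The sketch below builds the same id and fields structure from an ordinary JavaScript array; splitIntoDocs is a hypothetical helper, and it takes a JavaScript array, whereas the stage above iterates a Java collection with size() and get(i).

```javascript
// Hypothetical helper: one new document object per value, with a derived id.
function splitIntoDocs(id, values, fieldName) {
  var newDocs = [];
  for (var i = 0; i < values.length; i++) {
    newDocs.push({
      id: id + "-" + i,                               // e.g. "doc1-0", "doc1-1"
      fields: [{ name: fieldName, value: values[i] }]
    });
  }
  return newDocs;
}

var parts = splitIntoDocs("doc1", [9.99, 19.99], "subject");
// parts[0] is { id: "doc1-0", fields: [ { name: "subject", value: 9.99 } ] }
```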
Look up in another collection
function doWork(doc, ctx, collection, solrServer, solrServerFactory) {
  var imports = new JavaImporter(
      org.apache.solr.client.solrj.SolrQuery,
      org.apache.solr.client.solrj.util.ClientUtils);
  with (imports) {
    var sku = doc.getFirstFieldValue("sku");
    if (!doc.hasField("mentions")) {
      var mentions = "";
      var productsSolr = solrServerFactory.getSolrServer("products");
      if (productsSolr != null) {
        var q = "sku:" + sku;
        var query = new SolrQuery();
        query.setRows(100);
        query.setQuery(q);
        var res = productsSolr.query(query);
        // number of results returned (at most 100 because of setRows)
        mentions = res.getResults().size();
        doc.addField("mentions", mentions);
      }
    }
  }
  return doc;
}
Reject a document
function (doc) {
if (doc.hasValue('foo')) {
return null; // stop this document from being indexed.
}
return doc;
}
Java + Javascript
var ArrayList = Java.type("java.util.ArrayList");
var a = new ArrayList();
Next Steps
o Grab Fusion https://lucidworks.com/download/
o Ingest some data
o Create a JavaScript pipeline stage and manipulate the data
o https://doc.lucidworks.com/fusion/latest/Indexing_Data/Custom-JavaScript-Indexing-Stages.html
o Attend a training
o Get support
Thank You

Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 

Ingesting and Manipulating Data with JavaScript

  • 2. Ingesting and Manipulating Data with Javascript
  • 3. • Produces the world’s largest open source user conference dedicated to Lucene/Solr • Lucidworks is the primary sponsor of the Apache Solr project • Employs over 40% of the active committers on the Solr project • Contributes over 70% of Solr's open source codebase • Based in San Francisco • Offices in Bangalore, Bangkok, New York City, Raleigh, London • Over 300 customers across the Fortune 1000 • Fusion, a Solr-powered platform for search-driven apps
  • 5. Lucidworks Fusion product suite • An optimized search experience for every user using relevance boosting and machine learning. • Create custom search and discovery applications in minutes. • Highly scalable search engine and NoSQL datastore that gives you instant access to all your data.
  • 6. • 50+ connectors • Full SQL compatibility • End-to-end security • Multi-dimensional real-time ingestion • Administration and analytics
  • 7. • Personalized recommendations • Machine learning out-of-the-box • Powerful recommenders and classifiers • Predictive search • Point-and-click relevancy tuning
  • 8. • Quick prototyping • Fine-grained security • Stateless architecture • Support 25+ data platforms • Full library of components • Pre-tested reusable modules
  • 17. Javascript vs Pipeline Stage o Existential discussion at Lucidworks o My opinion only…
  • 20. Not… o 20 discrete operations I have to do to convert one field… o Conditional operations (if this then this, otherwise do this other thing) o Canned functionality you have elsewhere. o I don’t want to do anything that feels like programming in form fields…
  • 22. PipelineDocument Highlights https://doc.lucidworks.com/fusion-pipeline-javadocs/3.1/com/lucidworks/apollo/common/pipeline/PipelineDocument.html PipelineDocument{ … addField(name, value); getAllFieldNames(); //include internal use names getFieldNames(); //exclude internal use names getFirstField(name); getLastField(name); removeFields(name); setField(name, value); ... }
  • 24. Basic function (doc) { // do really important things. return doc; }
  • 25. With Context function (doc, ctx) { // do really important things. return doc; } https://doc.lucidworks.com/fusion-pipeline-javadocs/3.1/com/lucidworks/apollo/pipeline/Context.html
  • 26. With Collection function (doc, ctx, collection) { // do really important things. return doc; }
  • 27. With solrServer function (doc, ctx, collection, solrServer) { // do really important things. // solrServer can index/query things return doc; } https://doc.lucidworks.com/fusion-pipeline-javadocs/3.1/com/lucidworks/apollo/component/BufferingSolrServer.html
  • 28. With solrServerFactory aka SolrClientFactory function (doc, ctx, collection, solrServer, solrServerFactory) { // do really important things. // solrServerFactory look up other collections return doc; } https://doc.lucidworks.com/fusion-pipeline-javadocs/3.1/com/lucidworks/apollo/component/SolrClientFactory.html
  • 30. Add a Field function (doc) { // replace any values currently in the field with new ones doc.setField('some-new-field', 'some field value'); // for multi value fields this will combine values with old values if there are any, otherwise it will add a new field. doc.addField('some-new-field', 'some field value'); return doc; }
  • 31. Glue Two Fields function(doc) { var value = ""; if (doc.hasField("Actor1Geo_Lat") && doc.hasField("Actor1Geo_Long")) { value = doc.getFirstFieldValue("Actor1Geo_Lat") + "," + doc.getFirstFieldValue("Actor1Geo_Long"); doc.addField("Actor1Geo_p", value); } return doc; }
  • 32. Iterate through the fields function (doc) { // list of doc fields to iterate over var fields = doc.getFieldNames().toArray(); for (var i=0;i < fields.length;i++) { var fieldName = fields[i]; var fieldValue = doc.getFirstFieldValue(fieldName); logger.info("field name:" +fieldName + ", field value: " + fieldValue); } return doc; }
  • 33. Logging logger.info("field name:" +fieldName + ", field value: " + fieldValue); fusion/3.1.x/var/log/connectors/connectors.log
  • 34. Preview a field function(doc){ if (doc.getId() != null) { var fromField = "body_t"; var toField = "preview_t"; var value = doc.getFirstFieldValue(fromField); value = value ? value : ""; var pattern = /\n|\t/g; value = value.replace(pattern, " "); var length = value.length < 500 ? value.length : 500; value = value.substr(0,length); doc.addField(toField, value); } return doc; }
  • 35. Bust up a document function (doc) { var field = doc.getFieldValues('price'); var id = doc.getId(); var newDocs = []; for (var i = 0; i < field.size(); i++) { newDocs.push( { 'id' : id+'-'+i, 'fields' : [ {'name' : 'subject', 'value' : field.get(i) } ] } ); } return newDocs; }
  • 36. Look up in another collection function doWork(doc, ctx, collection, solrServer, solrServerFactory) { var imports = new JavaImporter( org.apache.solr.client.solrj.SolrQuery, org.apache.solr.client.solrj.util.ClientUtils); with(imports) { var sku = doc.getFirstFieldValue("sku"); if (!doc.hasField("mentions")) { var mentions = ""; var productsSolr = solrServerFactory.getSolrServer("products");
  • 37. Look up in another collection if( productsSolr != null ){ var q = "sku:"+sku; var query = new SolrQuery(); query.setRows(100); query.setQuery(q); var res = productsSolr.query(query); mentions = res.getResults().size(); doc.addField("mentions",mentions); } } } return doc; }
  • 38. Reject a document function (doc) { if (doc.hasValue('foo')) { return null; // stop this document from being indexed. } return doc; }
  • 39. Java + Javascript var ArrayList = Java.type("java.util.ArrayList"); var a = new ArrayList;
  • 40. Next Steps o Grab Fusion https://lucidworks.com/download/ o Ingest some data o Create a JavaScript pipeline stage and manipulate the data o https://doc.lucidworks.com/fusion/latest/Indexing_Data/Custom-JavaScript-Indexing-Stages.html o Attend a training o Get support

Editor's Notes

  1. Hi, I’m Andrew Oliver. My title is Technical Enablement Manager. I’m a Fusion and Solr junkie. I’ve ingested so much data that my laptop is totally full and now I need to start moving it all to the cloud. Today we’re going to talk about how to use the Fusion Javascript index pipeline stage to manipulate data. We’ll go over some common cases and look at some code. This presentation is mainly for the data engineers and people who have to make this stuff work.
  2. Before we get into the topic, I’d like to quickly review that Lucidworks is a San Francisco-based company with offices around the world. We are the primary sponsor of the Apache Solr project, which powers search for some of the Internet’s largest sites and many of the world’s largest companies. Solr is the core of our product, Lucidworks Fusion.
  3. Let’s review Lucidworks Fusion.
  4. Lucidworks Fusion is a platform that includes a highly scalable search engine coupled with AI and machine learning functionality to give you the most relevant personalized results. In addition, we have Fusion App Studio, which automates and accelerates the tasks necessary to develop search applications. The world does not need someone to write another search box with type-ahead and suggestion functionality; just use App Studio, include it and skin it.
  5. Connect to your data wherever it lives with over 50 connectors including databases, intranets, network drives, SharePoint, CRM systems, support tickets, the public web, and the cloud. Access your data your way with the tools you already know with REST APIs and endpoints, text search, analytics, and full SQL queries using familiar commands. Your security model enforced end-to-end from ingest to search including role-based access controls for encryption, masking, and redaction at every level. Multi-dimensional real-time ingestion including documents and data, key-value stores (NoSQL), relational databases (MySQL, Hadoop, JDBC) with graph capabilities to show relationships and detect anomalies. Administration from one unified view for managing and monitoring performance and uptime with load balancing, failover and recovery, and multi-tenancy compatibility.
  6. Personalized recommendations that aggregate user history and actions, and highlight items for exploration and discovery. Machine learning models that are pre-tuned and ready for production add intelligence to your apps. Powerful recommenders and classifiers for collaborative filtering and understanding intent. Predictive search that suggests items and documents before a user even enters a query. Full control over relevancy with simulated preview before going live, and of course rules for boosts and blocks.
  7. Prototypes in hours, not weeks, with a modular library of UI components. Fine-grained security fortified for industries across Fortune 500 organizations and government agencies. Stateless architecture so apps are robust, easy to deploy, and highly scalable. Supports over 25 data platforms including Solr, SharePoint, Elasticsearch, Cloudera, Attivio, FAST, MongoDB, and many more, and of course Fusion Server. Full library of visualization components for charts, pivots, graphs and more. Pre-tested reusable modules include pagination, faceting, geospatial mapping, rich snippets, heatmaps, topic pages, and more.
  8. Let’s get into the meat of the topic at hand. Ingestion and querying in Fusion is governed by pipelines.
  9. Fusion’s ingestion process involves data going into a connector or REST endpoint, through a series of parsers for specific data shapes (like zip files or HTML or Word docs). After data is parsed it is sent through an index pipeline which consists of stages. The last stage sends it to Solr. For developers that remember design patterns, this is the chain of responsibility pattern.
  10. Likewise on the query side, we have a query pipeline that consists of a set of stages, the last of which sends the query to Solr and retrieves the data.
  11. Today we’re mostly going to talk about the index pipeline. You see here that I’ve ingested a series of articles from Wikipedia. I have a connector, a set of parsers and I’ve expanded the index pipeline. It consists of three stages so far. On the right you see a simulated set of results. Fusion has an extensive library of pipeline stages that cover everything from renaming fields to mapping date types to entity extraction using natural language processing techniques. Today we’re really going to talk about the Javascript pipeline stage.
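The chain-of-stages idea can be sketched in a few lines of plain JavaScript. This is only an illustration, not Fusion's implementation: `runPipeline`, the plain-object documents, and the three sample stages are all hypothetical stand-ins.

```javascript
// Hypothetical pipeline runner; Fusion's real pipeline engine differs.
// Each stage receives the document returned by the previous stage.
function runPipeline(stages, doc) {
  var current = doc;
  for (var i = 0; i < stages.length; i++) {
    current = stages[i](current);
    if (current === null || current === undefined) {
      return null; // a stage dropped the document, so stop the chain
    }
  }
  return current;
}

// Three made-up stages operating on plain objects.
var stages = [
  function (doc) { doc.title = String(doc.title).trim(); return doc; },
  function (doc) { return doc.title.length === 0 ? null : doc; },
  function (doc) { doc.indexed = true; return doc; } // stand-in for the final Solr stage
];

var kept = runPipeline(stages, { title: "  Cats " });
var dropped = runPipeline(stages, { title: "   " });
```

Here `kept` passes through all three stages, while `dropped` is rejected by the second stage and never reaches the last one.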
  12. We’re not going to go over the query side of things much today but this is the query workbench and a series of query pipeline stages. Fusion comes with a library of pipeline stages from basic faceting to security trimming to boosting results based on what other users clicked on and advanced machine learning based search recommendations. There is also a Javascript stage on the query pipeline side of things, but we’re going to focus on the index pipeline side today.
  13. Without further ado let’s look at the Javascript index pipeline stage
  14. This is my pipeline for querying Wikipedia cat pictures. I’ve used this in other webinars such as the site search in 1h that I did earlier this year. Like most pipeline stages you can have a condition which governs whether the stage executes at all. I find that a bit less important for the Javascript stage since I can basically include that in the script body anyhow. You can paste a script into the body or click “open editor” and edit it in a larger window. For my cat pictures app, if you recall, I used a script to create a preview field from the content body. That’s what I’m showing part of here.
  15. So we mentioned that Fusion has a lot of pre-built pipeline stages that you can just configure and use to manipulate data. Why would you want to use the Javascript stage?
  16. And this is a debate we have internally at Lucidworks too. Here is where I stand on this.
  17. Prebuilt Pipeline stages are great for complex functionality like NLP entity extraction or machine learning classification or anything where configuration just makes a whole lot more sense than code.
  18. And pipeline stages are great for common types of field transformations like date parsing. Or even where you’re just going to run a regex on one or a series of fields.
  19. But if you’ve got a bunch of things you need to do in order to convert one field, then having a bunch of stages seems less optimal. Additionally, if you have a condition that governs a lot of different functionality or whether a series of things should be done, then I think using a JavaScript stage is a better solution. Moreover, a lot of companies have functionality they’re using elsewhere that is already in JavaScript or is more easily translated. In general, I don’t want to do anything that feels like programming by form fields. Simple configurable data transformations yes…coding via form field...not so much.
  20. The core of what you’ll be working on in the JavaScript index stage is a PipelineDocument. Let’s look at its basic interface.
  21. You can find the Javadoc for the pipeline document at the Lucidworks documentation site. It has a lot of different functions, but the basic ones you’ll use are for adding, removing, getting or setting fields. Some of the common ones are listed here.
  22. In the “body” you’re going to put an anonymous function. Let’s look at the basic forms of it.
  23. This is the most common version where you just want to manipulate a PipelineDocument and return the manipulated one.
  24. Sometimes you need context, like whether the document is a “signal” (basically an event like a click or query) as opposed to normal data. Or you may want to pass key/value data to another pipeline object. If so you can inject the context.
  25. If you want to know the name of the collection you’re operating on you can use this form of the function.
  26. If you’re going to perform index or query operations from inside your pipeline stage you can have the solrServer component injected.
  27. If you’re going to look up things in other collections or manipulate other collections from inside your pipeline you can have the solr client factory injected. This was renamed “solrClientFactory” from “solrServerFactory” but in most of the documentation and examples it’s still shown as solrServerFactory. All of the function elements are injected by index so you can call it solrClientFactory instead if you like. Heck, you can call it bob if you want to.
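Injection by index just means the arguments are matched by position, not by name. A minimal sketch of that, runnable outside Fusion: `callStage` and the stand-in objects are hypothetical and only imitate a caller that passes five positional arguments.

```javascript
// Hypothetical caller standing in for Fusion: it passes five positional arguments.
function callStage(stage) {
  var doc = { id: "doc-1" };
  var ctx = {};
  var collection = "products";
  var solrServer = { name: "solrServer stand-in" };
  var factory = { name: "factory stand-in" };
  return stage(doc, ctx, collection, solrServer, factory);
}

// The fifth parameter is named bob; it still receives the factory
// because only the position matters.
var stage = function (doc, ctx, collection, solrServer, bob) {
  doc.collection = collection;
  doc.factoryName = bob.name;
  return doc;
};

var result = callStage(stage);
```

Renaming a parameter is harmless, but reordering or skipping one is not: a function declared as `(doc, collection)` would receive the context in its second slot.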
  28. So let’s look at common sorts of things you can do with JavaScript. The idea is to give you some code recipes.
  29. If you want to replace a field, you can call doc.setField with the field name and the field value. If you want to add a value, you can use addField. If the field already exists, addField adds another value to it; if not, it adds a new field.
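The replace-versus-append difference is easy to try outside Fusion with a small stand-in. `MockDoc` below is a hypothetical in-memory object, not the real PipelineDocument; it only imitates the setField/addField behaviour described above and returns plain JavaScript arrays rather than Java collections.

```javascript
// Hypothetical in-memory stand-in for PipelineDocument, for local experiments only.
function MockDoc() {
  this.fields = {}; // field name -> array of values
}

MockDoc.prototype.setField = function (name, value) {
  this.fields[name] = [value]; // replace whatever was there
};

MockDoc.prototype.addField = function (name, value) {
  if (!this.fields.hasOwnProperty(name)) {
    this.fields[name] = []; // field did not exist yet, so create it
  }
  this.fields[name].push(value); // keep existing values and append the new one
};

MockDoc.prototype.getFieldValues = function (name) {
  return this.fields.hasOwnProperty(name) ? this.fields[name].slice() : [];
};

var doc = new MockDoc();
doc.addField("tag", "cats");
doc.addField("tag", "pictures");
var afterAdd = doc.getFieldValues("tag"); // both values are kept

doc.setField("tag", "dogs");
var afterSet = doc.getFieldValues("tag"); // only the new value remains
```

A stand-in like this is handy for unit-testing the logic of a stage function before pasting it into the stage body.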
  30. You may want to combine two fields or include conditionals. This shows a latitude being combined with a longitude into a new point field.
  31. Sometimes you want to look through a set of fields. Here we get the field names, iterate through them, and get the values. This is sort of useless in itself, since presumably we’d do more than log, but you get the idea.
  32. Speaking of logging, we can do info, error, debug… What shows up in the log in terms of level is configurable. You’re wondering where these will show up by default... The log messages are emitted in var/log/connectors/connectors.log.
  33. In case you missed my webinar about the cat pictures: above you see that I’ve taken a body_t field that is parsed from a Wikipedia page. I then create a field called preview_t. I grab the value of body_t and operate on it with a regex which ditches the newlines and tabs. Next I trim the field to 500 characters and store it in the preview_t field. Frankly, this is a very simple “preview.” I could also parse the HTML and make sure I don’t get any header information, or grab specific parts of the article, but this is good enough for a demo!
  34. Fusion’s parsers generally do a good job of taking a file and turning it into multiple documents. However, sometimes you need to grab bits and pieces and create new documents. This “busts up” a document and creates a new set of documents. Note that in this case we’re returning a collection of documents instead of just one.
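The preview logic can also be pulled out into a plain function so it can be tested without a running pipeline. This is a sketch under assumptions: `makePreview` is a hypothetical helper name, and the 500-character limit from the slide becomes a parameter.

```javascript
// Hypothetical helper: collapse newlines and tabs to spaces, then truncate.
function makePreview(text, maxLength) {
  if (text === null || text === undefined) {
    return ""; // no body text means an empty preview
  }
  var cleaned = String(text).replace(/\n|\t/g, " ");
  return cleaned.substring(0, maxLength); // substring is safe for short strings
}

var preview = makePreview("Cats\tare\nsmall carnivores", 8);
var shortText = makePreview("short", 500);
var missing = makePreview(null, 500);
```

Inside a stage, the result would be stored with something like `doc.addField("preview_t", makePreview(doc.getFirstFieldValue("body_t"), 500))`, using the same field names as the slide.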
  35. This is what you’d need to build and maintain if you want an Intelligent Search Application