Getting Started With
LucidWorks Enterprise
A Lucid Imagination
Technical White Paper
Getting Started With LucidWorks Enterprise
A Lucid Imagination Technical White Paper • October 2010 Page i
© 2010 by Lucid Imagination, Inc. under the terms of Creative Commons license, as detailed at
http://www.lucidimagination.com/Copyrights-and-Disclaimers/. Version 1.5, published 7 October 2010. Solr,
Lucene, and their logos are trademarks of the Apache Software Foundation.
Abstract
LucidWorks Enterprise is the search solution development platform built on the power of
Apache Solr/Lucene technology, developed by the enterprise search experts at Lucid
Imagination. LucidWorks Enterprise leverages the disruptive innovation of the leading
open source search technology to deliver unmatched scalability to billions of documents,
with subsecond query and faceting response time. By building and expanding the scalable
power of Solr open source technology with vital new features, the search experts at Lucid
Imagination have created an integrated platform that simplifies and empowers predictable,
reliable search application development.
This document is intended to provide you with a basic working knowledge of the
LucidWorks Enterprise search development platform. It provides you with an overview of
the software’s functions and an explanation of how to use them from the provided user
interface, as opposed to from a programming perspective.
You will learn about installation, indexing content from local files, web sites, and databases,
and searching, as well as improving the user experience using features such as user alerts,
auto-complete and spell-check.
This document does not require any previous programming experience.
Table of Contents
Introduction
  What You’ll Learn In This Document
  What This Document Won’t Teach You
  How LucidWorks Enterprise Works
Installation
  Using The Installation Wizard
  Installing Via Command Line
  Testing The Installation
Basic Searching
  Understanding Search Queries
    Searching Individual Fields
    Range Queries
  Faceted Searching
Improving Search Coverage
  Understanding Fields
  Indexing The Local Filesystem
  Indexing HTTP Documents
  Indexing Database Records
  Indexing Solr Documents
  Scheduling Tasks
  Deleting Test Documents
Improving The Search Experience
  User Alerts
  Helping Users Create Their Queries
    Auto-Complete
    Spell-Checking
    Find Similar Links
    Enabling These Functions
    Specifying Fields
    Indexing
Improving Relevancy
  Synonyms
  Stopwords
  Click Scoring
Summary
Next Steps
Introduction
Welcome to LucidWorks Enterprise, the search platform that takes the power of
Solr/Lucene and delivers it to you in one convenient, supported package. This document
will take you from the very beginning of installing the software down through some of the
things you’ll need to know to make the most of it.
LucidWorks Enterprise has been designed to provide you with the search capabilities and
benefits of Solr while still providing the ease of use you need to work efficiently in an
environment in which data is everywhere, and you need to get a handle on it. While it does
provide some great opportunities for programmers to take control and build powerful
search applications using those capabilities, it’s also been designed to take much of the pain
out of using such a complex system.
As such, many of the things you can do with LucidWorks Enterprise can be accomplished
without any programming at all. The administrative user interface provides a way to index
documents for searching, make queries, and even learn about how your system is being
used. It also provides a web interface to most of the functions you’ll need to run your
system.
What You’ll Learn in this Document
This document is meant to give you a running start on getting the most out of LucidWorks
Enterprise. It teaches you about:
• How LucidWorks Enterprise works
• Installing the software and indexing your first test collection
• Searching for data, and how to make the most of the individual fields you’ve indexed
• How to use faceted searches to filter results
• How to index local files (such as word processing documents), web pages (such as
those on local or remote web sites), and even databases
• How to tell LucidWorks about specific attributes of your data
• How to configure user-friendly features such as user alerts for new content,
auto-complete, spell-check, and the ability to find related results that don’t
necessarily contain the specified search term
• How to improve the quality of search results by specifying synonyms and stop
words, common words that should be left out of a search
What This Document Won’t Teach You
This document is intended to get you started using LucidWorks Enterprise; it’s not a
programmer’s guide. In addition to numerous REST-based APIs, LucidWorks Enterprise
gives you complete access to the source code underlying the platform, so you have full
control over query handling, results relevancy, and other factors. For more information on
making use of these capabilities, see the product documentation.
How LucidWorks Enterprise Works
LucidWorks Enterprise works by “indexing” data, or breaking it down into individual
words or terms, each of which is assigned to a “field”, against which you can later search. A
collection of fields is considered a “document”. For example, a PDF on your hard drive
might have fields for “author”, “title”, and “text”, and these three fields make up the
document.
Because you can control not only the names of these fields, but also how they’re treated by
the indexer and the query engine, you have a great deal of flexibility when it comes to
indexing your data. For example, you might index product data out of your database, and
specify that you want the title and description to be indexed (searchable) and that the id
column is to be treated as a unique key, so that if you update the database, each product
“document” in the index can be updated appropriately.
Once you’ve indexed your data, it’s ready for searching. The query parser takes the user’s
request and compares it to the data stored in the index. If it finds a relevant match (or
matches) it then returns information about all of the matching documents.
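The indexing model described above can be sketched in a few lines of Python. This is only a toy illustration of the idea (terms assigned to fields, documents identified by a unique key), not how LucidWorks Enterprise or Solr is actually implemented. The class name, the function names, and the sample documents are all made up for the example.

```python
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase terms (a deliberately naive analyzer)."""
    return text.lower().split()


class ToyIndex:
    """Maps (field, term) pairs to the ids of the documents containing them."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.documents = {}

    def add(self, doc_id, fields):
        # Re-adding a document with the same unique key replaces the old copy.
        if doc_id in self.documents:
            self.delete(doc_id)
        self.documents[doc_id] = fields
        for field, value in fields.items():
            for term in tokenize(value):
                self.postings[(field, term)].add(doc_id)

    def delete(self, doc_id):
        # Remove the stored fields and every posting that pointed at them.
        fields = self.documents.pop(doc_id)
        for field, value in fields.items():
            for term in tokenize(value):
                self.postings[(field, term)].discard(doc_id)

    def search(self, field, term):
        # Return the ids of documents whose field contains the term.
        return sorted(self.postings.get((field, term.lower()), set()))


index = ToyIndex()
index.add("doc1", {"author": "Grant Ingersoll", "title": "Lucid search basics"})
index.add("doc2", {"author": "Jane Doe", "title": "Indexing a database"})
index.add("doc2", {"author": "Jane Doe", "title": "Indexing with Solr"})  # update

print(index.search("title", "indexing"))  # ['doc2']
print(index.search("title", "database"))  # []
```

The second call to add for "doc2" shows why a unique key matters: the updated document replaces the earlier version instead of creating a duplicate.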
LucidWorks Enterprise provides a convenient web-based interface for indexing content,
and for controlling the types of information to be returned for each document that satisfies
a query. You can then build upon the application to decide how to use that data. That said,
the LucidWorks Enterprise interface provides everything you need to see the results of a
query, with no programming required.
But first, you’ll need to install the software.
Installation
At the time of this writing, LucidWorks Enterprise is available as a limited distribution
Developer Access Release, from http://www.lucidimagination.com/lucidworks-enterprise.
Once you download the software, you’re ready to run the installer.
LucidWorks Enterprise provides both a graphical and a command-line option for
installation. Both provide the same options, but the command-line version can be used for
systems where a graphical user interface isn’t an option.
Note that this document is meant to be a guide, and not a comprehensive look at
installation. If you need more details, please search the downloaded documentation for
“Installation”.
Using the Installation Wizard
To install LucidWorks Enterprise using the GUI, perform the following steps:
1) Make sure that you have Java 1.6 (JDK or JRE) installed on your machine. You can
download Java from
http://www.oracle.com/technetwork/java/javase/downloads/index.html.
2) Double-click the installer to start it. If you are using the *.jar file and double-clicking
doesn’t start the installer, go to the command line and type
java -jar lucidworks-enterprise-installer.jar
(Make sure to use the correct filename.)
3) Click Next to go to the system requirements. The typical desktop machine will
handle small-scale data collections (fewer than 100,000 documents), depending on
your specific hardware and existing software, as well as how you intend to use the
data. You will need 8GB to 16GB for a large-scale deployment. LucidWorks
Enterprise runs on Windows (XP or higher), Linux (kernel 2.4 or higher) and MacOS
(10.5 or higher).
4) Click Next. Read the user agreement, make sure it doesn’t contain anything
objectionable, and click “I accept the terms of this license agreement”. Click Next
again.
5) Choose which components you want to install on this machine, and the addresses
from which you want to access them.
The default is to install and activate all components on the local server using ports
8888 and 8989, and for beginning purposes this is just fine. If this configuration
conflicts with existing applications, however, feel free to change the port numbers.
Keep in mind, however, that when they are installed during the same session, the
SearchUI, AdminUI, and Alerts need to use the same port.
If you are installing components on different machines or ports, be sure to alter the
addresses to point to the appropriate server and port.
Click Next to continue.
6) Select the installation path and click Next. Click OK to let the installer create the
new directory.
7) Review the selected options and click Next to install the software. Depending on
your system, this may take several minutes. When the progress bar shows that the
installation is complete, click Next.
8) LucidWorks Enterprise takes a bit longer than usual to start for the first time, so
make sure that Start LucidWorks Enterprise is checked, and then click Next to
continue with the installation.
9) Click the Next button when it becomes available.
10) Decide whether to create shortcuts for other users, and where to place them, and
then click Next.
11) If all has gone well, you will see a screen telling you LucidWorks Enterprise has been
installed, and offering the opportunity to create an installation script. If you are
installing on multiple computers, this script will simplify the process by pre-filling
the values you chose for this installation. Click Done to dismiss the installer.
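Step 5 mentions that the default ports (8888 and 8989) may conflict with existing applications. One way to check before running the installer is a small Python probe like the sketch below. The helper name port_is_free is hypothetical, and the check only tells you whether something is listening on the local machine at the moment you run it.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds, i.e. the port is taken.
        return probe.connect_ex((host, port)) != 0


# The two default ports named in step 5.
for port in (8888, 8989):
    print(port, "free" if port_is_free(port) else "in use")
```

If either port is reported as in use, choose different port numbers in step 5.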
Installing Via Command Line
The process for installing via the command line is virtually identical to installing via the
graphical interface. Follow these steps:
1) Make sure that you have Java 1.6 (JDK or JRE) installed on your machine. You can
download Java from
http://www.oracle.com/technetwork/java/javase/downloads/index.html.
2) In the directory in which the *.jar file is located, execute the following command:
java -jar lucidworks-enterprise-installer.jar --console
(Again, make sure to use the correct file name.)
3) Follow the steps presented in the installer. For information on specific steps, see
the instructions above.
Testing the Installation
You can test the new installation by pulling up the administration user interface in your
browser. To do that, go to:
http://localhost:8989
If everything installed and started properly, you will see the generic search page:
Of course, at this point, you don’t actually have any data indexed, so there’s no point
searching. To index data, you will need to log into the admin UI, so click the “login” link at
the top of the page and log in using:
Username: admin
Password: admin
(This user is pre-installed; before going to production, you will want to either use the User
API or an LDAP directory to manage your users.)
Logging in will bring you to the Quick Start page, where you can index content. Just to have
something to search against, click Local Filesystem and enter the path for a small-ish
directory of documents.
Click Continue.
For now, choose to index this content “immediately”.
Click Finish to go to the Dashboard Summary page.
This page shows you what’s been going on with your system. In this case, we haven’t had
any queries, so we can just see the number of documents that have been indexed. This
page updates automatically, so by watching it, you’ll know when indexing has been
completed. The final data will show under “Recently Completed”.
Finally, we have data to search! Click the Search tab to go to the search page.
Notice that information about the data you’ve indexed already appears on the search page.
In this case, you can see document authors, the data source, and the types of documents.
These notations are called facets, and can be used to narrow results. (We’ll talk more about
facets later.)
Enter a search term and click Search to see the results.
Now that we know everything’s working properly, we can talk about searching itself.
Basic searching
On the surface, searching is pretty simple. Enter a keyword, press “search”, and get results.
And that’s true. It is that simple. But it’s also powerful, in that you have the ability to get
more out of your searching than a simple keyword search. In this section, we’ll discuss
how to get more out of your searches.
Understanding Search Queries
The simplest search query involves just a keyword or phrase, such as when I enter “lucid”
in the search box:
We can also combine terms into a single query. For example, I can find documents that
contain the terms “indexing” and “delete” with
indexing AND delete
In fact, LucidWorks Enterprise uses AND as the default operator unless you specify
something else, such as
indexing OR delete
which finds documents that contain either term.
LucidWorks Enterprise also includes support for a whole range of operators, including
comparative operators such as < and proximity operators such as NEAR, BEFORE, or AFTER. See
the LucidWorks Enterprise User’s Guide for the list.
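If your application assembles query strings from user input, the boolean forms shown above can be produced with a small helper. The following Python sketch is illustrative only. The function name combine is made up, and it covers just the AND and OR operators shown in this section, without escaping special characters in the terms.

```python
def combine(terms, operator="AND"):
    """Join search terms with an explicit boolean operator."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    return f" {operator} ".join(terms)


print(combine(["indexing", "delete"]))        # indexing AND delete
print(combine(["indexing", "delete"], "OR"))  # indexing OR delete
```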
Searching individual fields
Sometimes, however, you want to be more specific. For example, I might want to find all of
my PDF files. If I did a search for just
application/pdf
I’d get no results, because that information isn’t stored in the default search field. Instead, I
could search for
mimeType:application/pdf
This tells LucidWorks Enterprise to search for documents that have a value of
“application/pdf” in the mimeType field.
Range queries
You also have the option to search for a range of values. For example, I can find all of the
documents in my index that have 50 pages or fewer with:
pageCount:[0 TO 50]
Or if I wanted to be even more specific, I could find all PDF files in that range:
mimeType:application/pdf AND pageCount:[0 TO 50]
One place you often find this kind of query is in faceted searches.
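Field and range queries follow a regular pattern, so they are easy to generate. The Python sketch below builds the two queries shown above. The helper names are hypothetical, and the values are inserted as-is, so it assumes they contain no characters that need escaping.

```python
def field_query(field, value):
    """Restrict a term to one field, e.g. mimeType:application/pdf."""
    return f"{field}:{value}"


def range_query(field, low, high):
    """Build a range over a field, e.g. pageCount:[0 TO 50]."""
    return f"{field}:[{low} TO {high}]"


pdfs = field_query("mimeType", "application/pdf")
short = range_query("pageCount", 0, 50)
print(f"{pdfs} AND {short}")
# mimeType:application/pdf AND pageCount:[0 TO 50]
```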
Faceted Searching
The use of facets is one of the great recent advances in searching. Search facets enable
users to “narrow down” their search by a variety of factors. For example, if we go back to
the original keyword search for “lucid”, you can see a number of different options down the
right-hand side of the page:
Notice that each entry includes not just a description of what it is, but also how many
relevant documents there are. So I can see that this data source has 18 HTML documents
that mention “Lucid”, and one that was authored by Grant Ingersoll. If I wanted to narrow
my search to, say, OpenOffice presentations, I could click that link under Type.
This narrows the list from 51 to just 7 documents. If I wanted to, I could further narrow the
list, say, to show only the documents authored by me.
To clear the existing filters and go back to the original search, click the “clear filters” link
under the search box.
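Conceptually, a facet is a count of matching documents grouped by the value of one field, and clicking a facet link applies a filter on that value. The Python sketch below imitates that behavior on a handful of made-up search hits. The field names and data are assumptions for the example, and the real engine computes these counts inside the index, not in application code.

```python
from collections import Counter

# Hypothetical search hits; the fields mirror the facets described above.
hits = [
    {"id": 1, "type": "HTML", "author": "Grant Ingersoll"},
    {"id": 2, "type": "HTML", "author": None},
    {"id": 3, "type": "OpenOffice presentation", "author": None},
    {"id": 4, "type": "PDF", "author": None},
]


def facet_counts(docs, field):
    """Count documents per value of a field, skipping documents without one."""
    return Counter(d[field] for d in docs if d.get(field) is not None)


def apply_filter(docs, field, value):
    """Keep only the documents whose field equals the chosen facet value."""
    return [d for d in docs if d.get(field) == value]


print(facet_counts(hits, "type"))
narrowed = apply_filter(hits, "type", "HTML")
print(len(narrowed), facet_counts(narrowed, "author"))
```

Applying a second filter to the narrowed list corresponds to clicking a further facet, and starting again from the full list of hits corresponds to clearing the filters.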
Now let’s look at getting more data to search.
Improving search coverage
Of course, the quality of search results depends on the quality of data added to the index. If
the appropriate information isn’t in the index, even the most carefully constructed query
isn’t going to find it.
In this section, we’ll show you how to index both local and remote content so that it can be
found by your users.
Understanding Fields
The first thing we’ll need to do before doing any indexing is understand just how
LucidWorks Enterprise looks at the data we’re putting into it.
Each document is made up of one or more fields. You can see a list of existing fields by
clicking the Index tab, and the Fields subtab in the administration user interface.
If you click a field, you’ll see the option to edit the properties of that field. When you’re just
starting out, you’ll want to understand these properties:
• Name: This value is the name by which the field is known, both in the indexing
process and in queries.
• Field Type: This property determines how the field is handled. For example, in this
case, we’re looking at English text (as opposed to German, etc.) rather than a date, a
sortable number, and so on.
• Indexed: This property specifies whether the contents of this field are used to
determine whether a document matches a particular query.
• Stored: This property determines whether the original value of the field is stored,
potentially to be returned as part of a result.
• Multi-valued: This property determines whether a document can have multiple
values for this field.
• Field Default: This value determines the value that will be used for the document if
no value is given when it’s indexed.
• Search by Default: This property determines whether the field will be used in a
search for which the user doesn’t specify a particular field.
• Include in Results: Make sure this option is checked if you want this field to show
up in the search results for this document.
• Highlight: This property determines whether the given term will be shown in
context for this field.
• Facet: This property determines whether this field shows up as an available filter
on the search page. Note that documents without this field won’t show up in the
counts for this facet.
• Use for Deduplication: In all likelihood, you will want to re-index your content as it
changes. This setting enables you to determine how LucidWorks Enterprise knows
this is the “same” document. For a product, it might be a product number. For
HTTP data sources, it might be the URL.
You can also delete and add fields from this interface.
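To keep the properties above straight, it can help to think of each field as a small record of settings. The Python sketch below is a hypothetical model of that record, not the product's API. The class name, the type names "text_en" and "string", and the default values are assumptions. The two checks follow from the definitions above: a field must be indexed to be searched, and its value must be stored to be returned in results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldSettings:
    """Hypothetical mirror of the field properties listed above."""
    name: str
    field_type: str = "text_en"       # assumed type name, for illustration only
    indexed: bool = True              # contents are used to match queries
    stored: bool = True               # original value is kept for results
    multi_valued: bool = False
    default: Optional[str] = None     # value used when a document has none
    search_by_default: bool = False
    include_in_results: bool = True
    highlight: bool = False
    facet: bool = False
    use_for_deduplication: bool = False

    def problems(self):
        """Return setting combinations that cannot work as intended."""
        issues = []
        if self.search_by_default and not self.indexed:
            issues.append("searched by default but not indexed")
        if self.include_in_results and not self.stored:
            issues.append("included in results but not stored")
        return issues


url = FieldSettings("url", field_type="string", use_for_deduplication=True)
body = FieldSettings("body", stored=False)
print(url.problems())   # []
print(body.problems())  # ['included in results but not stored']
```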
Indexing the Local Filesystem
Even if you haven’t given any thought to what documents you’d like to search, you likely
have a ready source of material right on your hard drive. To create a data source from local
files, click the Index tab, then the Sources subtab, and the FileSystem sub-subtab.
Enter a user-friendly name for your new data source and the full directory path in which
the documents are stored. You have the option to drill down into subdirectories or not, as
well as to follow symbolic links or not.
(Note that for security reasons, data indexed from the local filesystem will not
automatically be available via a link from the search results. Search the product
documentation for “linking” for information on how to configure LucidWorks Enterprise to
activate those links.)
Click Create to create the data source.
Note that this process does not start the indexer; we’ll look at that under Scheduling in a
moment.
Indexing HTTP Documents
Another option is to index web documents. You might want to index the contents of your
local intranet, or perhaps you have a repository of content that’s currently available via the
browser. You can also use it to monitor external web sites. To set up a data source of web
content, click the Index tab, then the Sources subtab and the Web sub-subtab.
The Name should be something you’ll recognize later, and the URL is the value at which you
want the crawler to start.
The Allow Paths and Disallow Paths values enable you to control where the crawler goes.
For example, if I were to index my own site, as I’m doing here, I might want only my own
content, so I’ve specified only paths that start with my URL, using a regular expression to
specify the rest of the path. Similarly, I might not want to index my administration pages.
You can use Disallow Paths to allow the crawler to follow links to external sites, but avoid,
say, indexing tens of thousands of tweets.
The Crawl Depth specifies how far down the crawler will go. A depth of 0 crawls only the
specified URL; to get only that page and the pages linked directly by it, specify a depth of 1.
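The interaction between Allow Paths, Disallow Paths, and Crawl Depth can be sketched with a toy crawler. The Python example below walks a made-up link graph in place of fetching real pages. The URLs and function names are hypothetical, and it assumes that a URL must match an allow pattern and no disallow pattern, which may differ in detail from how the product's crawler evaluates its rules.

```python
import re
from collections import deque

# Hypothetical link graph standing in for real pages and their outgoing links.
LINKS = {
    "http://example.com/": ["http://example.com/blog/", "http://example.com/admin/"],
    "http://example.com/blog/": ["http://example.com/blog/post1", "http://other.example.org/"],
    "http://example.com/blog/post1": ["http://example.com/blog/post2"],
}


def allowed(url, allow, disallow):
    """A URL passes if it matches some allow pattern and no disallow pattern."""
    if any(re.match(p, url) for p in disallow):
        return False
    return any(re.match(p, url) for p in allow)


def crawl(start, depth, allow, disallow):
    """Breadth-first walk; a depth of 0 visits only the start URL."""
    seen = [start]
    queue = deque([(start, 0)])
    while queue:
        url, level = queue.popleft()
        if level == depth:
            continue  # do not follow links beyond the requested depth
        for link in LINKS.get(url, []):
            if link not in seen and allowed(link, allow, disallow):
                seen.append(link)
                queue.append((link, level + 1))
    return seen


allow = [r"http://example\.com/.*"]
disallow = [r"http://example\.com/admin/.*"]
print(crawl("http://example.com/", 0, allow, disallow))
print(crawl("http://example.com/", 1, allow, disallow))
```

With a depth of 1, the walk reaches the blog index page but skips the administration page (disallowed) and goes no further down.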
Click Create to create the data source.
Note that this process does not start the indexer; we’ll look at that under Scheduling in a
moment.
Indexing Database Records
Another fertile area for data indexing is the database. Here the effort required is a little
greater, but so are the potential rewards. Before you index any database content, however,
there are two tasks you must accomplish:
1) Determine the fields you’re going to be indexing from your database. It’s unlikely
that the fields with which LucidWorks Enterprise is preconfigured will match your
column names exactly, and unless you map those columns to existing fields, all of
your data will wind up in the text_all field, making it difficult to search.
Fortunately, LucidWorks Enterprise gives you complete and easy control over field
definitions. Use the Fields subtab under the Index tab to create any necessary fields.
2) Make sure the appropriate JDBC driver is available to LucidWorks Enterprise.
LucidWorks Enterprise doesn’t ship with any JDBC drivers, so you will have to upload
your own. To do that, you’ll need to download curl (available at
http://curl.haxx.se/download.html) and use the following command to upload the
*.jar file containing the appropriate driver:
curl -F file=@<filename> http://localhost:8888/api/collections/collection1/jdbcdrivers
Once you’ve accomplished these two steps, you’re ready to create the new data source.
Click the Index tab, then the Sources subtab and the DB sub-subtab.
As usual, choose and enter a recognizable name for the data source, and enter the JDBC
URL, minus any authentication information. For example:
jdbc:mysql://127.0.0.1/productDB
Enter the JDBC driver name. This is the actual class name you would use in a Java
application, such as com.mysql.jdbc.Driver. Enter the username and password for the
database.
Finally, enter the query used to extract the data from the database, mapping each column to
a LucidWorks Enterprise field. For example:
select id as id, prod_name as name, price_pt as price from products
In all likelihood, you will have information about a single item, or “document”, in several
tables; to add it to your index, create the appropriate joins to add all of your data as a single
query.
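You can try out this kind of column-to-field mapping before pointing LucidWorks Enterprise at your database. The Python sketch below uses an in-memory SQLite database as a stand-in for the MySQL database in the example. The categories table and the cat_id column are invented to demonstrate a join. Each row the query returns corresponds to one "document", and each column alias is the field it would map to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by their alias
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, prod_name TEXT,
                           price_pt REAL, cat_id INTEGER);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, cat_name TEXT);
    INSERT INTO categories VALUES (1, 'Printers');
    INSERT INTO products VALUES (3, 'Dokad SPE 3299 Printer', 99, 1);
""")

# Each alias (the name after AS) is the index field the column maps to.
query = """
    SELECT p.id AS id, p.prod_name AS name, p.price_pt AS price, c.cat_name AS cat
    FROM products p JOIN categories c ON c.id = p.cat_id
"""
documents = [dict(row) for row in conn.execute(query)]
print(documents)
```

Checking the aliases this way makes it easier to spot a column that would otherwise end up in the catch-all text field.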
(In some cases, you will have structures that can’t be added with a single query. See the
Programmers Guide to learn how to handle these situations using Solr’s
DataImportHandler.)
Click Create to create the data source.
Note that this process does not start the indexer; we’ll look at that under Scheduling in a
moment.
Indexing Solr Documents
One final method of indexing data involves adding it directly using Solr’s native document
format. To do that, you will need a Solr document, which is just an XML document, such as:
<add>
  <doc>
    <field name='id'>prod3_0</field>
    <field name='data_source'>Auxiliary Data</field>
    <field name='itemId_s'>3</field>
    <field name='itemType_s'>product</field>
    <field name='cat'>Printers</field>
    <field name='name_t'>Dokad SPE 3299 Printer</field>
    <field name='price_td'>99</field>
    <field name='blurb_t'>A great printer that doesn't use a lot of ink.</field>
    <field name='description_t'>Sed ut perspiciatis ...</field>
    <field name='text'>Dokad SPE 3299 Printer99A great printer that doesn't use a lot of ink.Sed ut ...</field>
  </doc>
  <doc>
    <field name='id'>prod4_0</field>
    <field name='data_source'>Auxiliary Data</field>
    <field name='itemId_s'>4</field>
    <field name='itemType_s'>product</field>
    <field name='cat'>Cameras</field>
    <field name='cat'>Accessories</field>
    <field name='name_t'>Kinok UltraCam II</field>
    <field name='price_td'>550</field>
    <field name='blurb_t'>Boy, the Kinok UltraCam II is a great camera, and it hooks up to your printer terrifically.</field>
    <field name='description_t'>Lorem ipsum dolor sit amet, consectetur adipisicing elit, ...</field>
    <field name='text'>Kinok UltraCam II550Boy, the Kinok UltraCam II is a great camera, and it hooks up to your printer terrifically.Lorem ipsum dolor sit amet, ...</field>
  </doc>
</add>
To index this type of document, click the Index tab, then the Sources subtab and the Solr
sub-subtab.
Enter a recognizable name, as well as the path to the actual document. Note that we’ve
added a data_source field that matches what we’ve named the data source. This is because
Solr documents don’t automatically have this field populated, as the other types do, so if we
want that information to show up in the search facets, we need to provide it ourselves.
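If you generate Solr documents from your own data, a short script can produce this format. The following is a minimal sketch in Python, not part of LucidWorks Enterprise; the function name build_add_document and the sample values are illustrative assumptions. It stamps each document with the data_source field described above and turns a list value into a repeated field, like the two cat entries in the example.

```python
import xml.etree.ElementTree as ET


def build_add_document(docs, data_source):
    """Return a Solr <add> XML string for a list of field dictionaries.

    build_add_document is a hypothetical helper, not a LucidWorks API.
    """
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        # Put data_source first so the document shows up in the search facets.
        for name, value in [("data_source", data_source), *fields.items()]:
            # A list value becomes a repeated (multi-valued) field.
            values = value if isinstance(value, list) else [value]
            for item in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = str(item)
    return ET.tostring(add, encoding="unicode")


xml_text = build_add_document(
    [{"id": "prod4_0", "cat": ["Cameras", "Accessories"], "name_t": "Kinok UltraCam II"}],
    data_source="Auxiliary Data",
)
print(xml_text)
```

Save the output to a file and use that file's path when you create the Solr data source.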
Click Create to create the data source.
Note that this process does not start the indexer; we’ll look at that under Scheduling right
now.
Scheduling Tasks
We’re finally ready to look at scheduling the indexing of our data sources. Under the Index
tab, click the Schedules subtab.
Click the icon next to DataSources to expand that option, and click the data source you want
to index to highlight it.
You have the option to start indexing in a specific amount of time (such as 0 seconds to
start immediately) or at a specific time on a specific day.
Also, if your data is likely to change, you can specify the frequency with which you want to
reindex it.
You can also deactivate an index if you’d like to stop the indexing process. Deactivating an
index won’t make the data unavailable, however. To do that, you’ll need to delete it
altogether.
Deleting Test Documents
Deleting a data source is a pretty straightforward process. Click the Index tab, and then the
Sources subtab. Under the list of data source types, you’ll find a Delete button.
To delete a data source, highlight it and press the Delete button. Note that there is no
confirmation dialog. Once you click Delete, it’s gone.
Sort of. While it will no longer be updated, the data indexed as part of that data source is
still in the index, and queries will still return it. To get rid of it altogether, you will need to
call the underlying engine directly.
To delete all the data in your index – and I really do mean all the data in your index – point
your browser to the following URL:
http://localhost:8888/solr/update?stream.body=<delete><query>*:*</query></delete>
This process is the opposite of the Delete button; it gets rid of the indexed data, but leaves the
data sources in place. So you’re ready to either reindex them, or delete them and start over.
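Some browsers and HTTP clients will not accept raw angle brackets in a URL. As a minimal sketch, assuming the same host, port, and update path used in this document, the following Python builds a percent-encoded form of the delete-all URL. It only constructs the URL and does not send the request; the function name delete_all_url is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The delete-all message: the query *:* matches every document in the index.
DELETE_ALL = "<delete><query>*:*</query></delete>"


def delete_all_url(base="http://localhost:8888/solr/update"):
    """Return the update URL with the delete message percent-encoded."""
    return base + "?" + urlencode({"stream.body": DELETE_ALL})


url = delete_all_url()
print(url)

# Decoding the query string recovers the original message unchanged.
decoded = parse_qs(urlsplit(url).query)["stream.body"][0]
```

Paste the printed URL into your browser only when you are certain you want to empty the index.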
Assuming that you haven’t started over, we’re ready to look at enhancing your users’
search experience.
Improving the Search Experience
LucidWorks Enterprise builds on the rich ecosystem around Solr/Lucene, the technology on
which it is built. That means that you have access to all of the best bells and whistles
available with Solr, plus even more, right at your fingertips. In this section, we’ll look at
how to configure some of the most useful.
User Alerts
When you’re dealing with huge amounts of data, one of the biggest challenges is keeping up
with it as it grows. One way to do that is to use user alerts, which notify you when new
content matching your queries has been added to the system.
To set up a user alert for a query, click the “Add this query as alert” link under the search
box.
This link takes you to a page where you can specify the details of where you’d like to
receive the alert.
The alerts arrive at the specified email address using the Name of the alert as the subject
line, so you can specify it in a way that works with your mail filters.
You can also specify how often to check for new data, with the option to limit how often it
actually sends you data.
Now, all that said, by default, email alerts are not enabled when LucidWorks Enterprise
ships, because they require administrator configuration. To enable them, edit the file
<LWE_HOME>/rails/config/alerts.yml
to include the appropriate SMTP information.
Even without additional configuration, however, you can still see the results of an alert. To
do that, save the alert to add it to the list provided when you click the View saved alerts link
under the search box.
By clicking the Preview link for the alert, you can see the latest results for this particular
query.
Helping Users Create Their Queries
Several of the functions available in LucidWorks Enterprise can help you help your users by
providing guidance on what they should be searching for. These include auto-complete,
spell-checking, and “find similar” links.
Auto-complete
One of the best ways to make sure that users don’t wind up with “no results” is to guide
them towards terms that actually exist within the index. And one of the best ways to do
that is using auto-complete functionality.
Auto-complete looks at the characters the user has already entered and offers terms that
start with those characters.
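To make the idea concrete, here is a minimal sketch of prefix matching in Python. It illustrates the concept only and is not how LucidWorks Enterprise implements auto-complete; the suggest function and the term list are assumptions made for the example.

```python
from bisect import bisect_left


def suggest(prefix, sorted_terms, limit=5):
    """Return up to `limit` terms from a sorted list that start with `prefix`."""
    prefix = prefix.lower()
    # In a sorted list, all terms sharing a prefix sit next to each other,
    # so jump to the first candidate and read until the prefix stops matching.
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
        if len(matches) == limit:
            break
    return matches


# Hypothetical terms standing in for the contents of an index.
terms = sorted(["camera", "cameras", "ink", "printer", "printers", "product"])
print(suggest("pri", terms))  # ['printer', 'printers']
```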
Spell-checking
You can also check the spelling of the terms the user has entered against the existing index
of terms, and offer suggestions. For example, if you were to search for “printe”, the system
might suggest “printer” based on the content you have indexed.
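The "printe" example can be sketched with Python's standard difflib module, which ranks candidate terms by string similarity. This is an illustration under assumed data, not the algorithm LucidWorks Enterprise uses; the suggest_spelling function, the term list, and the 0.8 cutoff are all choices made for the example.

```python
from difflib import get_close_matches


def suggest_spelling(word, indexed_terms, max_suggestions=1):
    """Suggest indexed terms that closely resemble a word missing from the index."""
    if word in indexed_terms:
        return []  # the word exists in the index, so there is nothing to correct
    return get_close_matches(word, indexed_terms, n=max_suggestions, cutoff=0.8)


# Hypothetical terms standing in for the contents of an index.
indexed_terms = ["printer", "camera", "ink", "accessories"]
print(suggest_spelling("printe", indexed_terms))  # ['printer']
```

Because suggestions come only from terms already indexed, a user is never steered toward a word that would return no results.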
Find Similar links
The “find similar” functionality helps users by finding content they may not have known
they were looking for. For example, a search for “sports” might find a document that
contains the word “basketball”, even if the document doesn’t include the word “sports” at
all.
If a similar result is available, the link appears under the existing result.
In order for these three functions to work, you’ll need to make sure of three things:
1) Make sure the function you want (auto-complete, spell-checking, and/or find similar) is
enabled.
2) Make sure that at least one field is specified as a source for these terms.
3) For spell-checking and auto-complete, you’ll need to make sure that indexing for
these terms has been performed.
Enabling These Functions
To enable auto-complete, spell-checking, or find similar, click the Queries tab and the
Settings subtab. Click the Search Settings item to highlight it.
Make sure that Enable auto-complete, Enable spell checking, and/or Show “find similar”
links are checked, and click Save Settings.
Specifying Fields
To specify one or more fields for these functions, click the Index tab and the Fields subtab.
Highlight the relevant field.
Make sure that Index for Spell Checking, Index for Auto-complete, and/or Use in “Find
Similar” are checked, and click Save Settings.
Indexing
Finally, make sure that the spell-checking and/or auto-complete information has been
indexed. To do that, click the Index tab and the Schedules subtab. Click the icon next to
Activities to expand it and highlight spelling or auto-complete. Schedule these indexes just
as you would schedule your data sources.
Now that we’ve got good queries, it’s time to make sure they return good results.
Improving Relevance
One of the advantages of using a search platform such as LucidWorks Enterprise is that
results can be ranked by relevance, with the results most likely to be what the user is
looking for at the top of the list. Out of the box, LucidWorks Enterprise does a pretty good
job, but there are ways that you can help LWE improve relevancy even more by including
input from the best computer out there: the human brain.
Synonyms
One way to provide better results is to provide LucidWorks Enterprise with groupings of
words that have the same (or at least similar) meanings. For example, a search for “lawyer”
should probably also find documents that only contain “attorney”. Most industries and
subject areas have their own set of jargon and synonyms, and you can configure them
directly from within the administration user interface.
Click the Queries tab, and then the Settings subtab. Highlight Synonyms and Stopwords,
and then expand the Synonyms entry.
From here, you can add new entries or remove existing entries. Each line is considered a
group; you can add as many comma-delimited terms as you like.
Click Save Settings when you’re finished.
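The "one group per line, comma-delimited" format is easy to reason about in code. The following Python sketch parses such a list and expands a query term to its group. It illustrates the concept only; the function names and the second sample group are assumptions, and LucidWorks Enterprise applies synonyms internally rather than through code like this.

```python
def parse_synonym_groups(text):
    """Parse one comma-delimited synonym group per line."""
    groups = []
    for line in text.splitlines():
        terms = [term.strip().lower() for term in line.split(",") if term.strip()]
        if terms:  # skip blank lines
            groups.append(terms)
    return groups


def expand(term, groups):
    """Return the term plus every synonym from any group that contains it."""
    term = term.lower()
    expanded = {term}
    for group in groups:
        if term in group:
            expanded.update(group)
    return sorted(expanded)


# The first group comes from the example above; the second is hypothetical.
groups = parse_synonym_groups("lawyer, attorney\nprinter, copier")
print(expand("lawyer", groups))  # ['attorney', 'lawyer']
```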
Stopwords
In search parlance, a “stopword” is a word that’s so common that adding it to a query rarely
increases the quality of results, and frequently decreases it. For example, if you did a
search for “the City of Chicago”, “Chicago” would certainly provide good results. “City”
might as well. But how many billions of documents that have nothing to do with Chicago
contain the words “the” and “of”?
Fortunately, LucidWorks Enterprise understands the concept of stopwords, and in most
cases, will eliminate them from your query. It also understands how to handle stopwords
on the back end so that they help improve relevance (for example, by judging the proximity
of two words) rather than hinder it.
LucidWorks Enterprise starts with a list of several dozen stopwords, such as “a”, “and”,
“for”, and so on. You may find, however, that you need to add your own. To do that, click
the Queries tab and the Settings subtab. Highlight Synonyms and Stopwords, and expand
the Stopwords entry.
As with synonyms, you can use this interface to add, edit, or delete stopwords.
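As a rough illustration of what eliminating stopwords from a query means, consider the following Python sketch. It is an assumption-laden example, not the LucidWorks Enterprise implementation: the five-word list is a tiny stand-in for the real default list, and strip_stopwords is a hypothetical function.

```python
# A tiny illustrative list; the real default list has several dozen entries.
STOPWORDS = {"a", "and", "for", "of", "the"}


def strip_stopwords(query, stopwords=STOPWORDS):
    """Drop stopwords from a query, keeping the remaining words in order."""
    words = query.split()
    kept = [word for word in words if word.lower() not in stopwords]
    # If every word is a stopword, keep the query as typed rather than search for nothing.
    return kept or words


print(strip_stopwords("the City of Chicago"))  # ['City', 'Chicago']
```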
Click Scoring
Perhaps the best way for LucidWorks Enterprise to know whether a result is really
relevant for a particular search query is to keep track of whether a human thinks it is. Click
scoring makes that happen.
When you enable click scoring, LucidWorks Enterprise tracks which results are most often
clicked for a particular query, and “boosts” their relevance scores accordingly. It will then
be more likely to present those results higher in the list for that query.
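The general idea can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the Click Scoring Relevance Framework itself; the ClickTracker class, the linear boost formula, and the 0.1 boost value are all assumptions made for the example.

```python
from collections import defaultdict


class ClickTracker:
    """Count clicks per (query, document) pair and boost scores accordingly."""

    def __init__(self, boost_per_click=0.1):
        self.clicks = defaultdict(int)
        self.boost_per_click = boost_per_click  # hypothetical tuning value

    def record_click(self, query, doc_id):
        self.clicks[(query.lower(), doc_id)] += 1

    def rerank(self, query, scored_docs):
        """Sort (doc_id, score) pairs by score multiplied by a click-based boost."""
        def boosted(item):
            doc_id, score = item
            count = self.clicks.get((query.lower(), doc_id), 0)
            return score * (1 + self.boost_per_click * count)

        return sorted(scored_docs, key=boosted, reverse=True)


tracker = ClickTracker()
for _ in range(3):
    tracker.record_click("printer", "prod3_0")

# prod3_0 starts with the lower score, but three clicks lift it to the top.
results = tracker.rerank("printer", [("prod4_0", 1.0), ("prod3_0", 0.9)])
print(results)
```

Note that the boost applies only to the query that received the clicks, which matches the per-query behavior described above.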
Using click scoring requires manual configuration of LucidWorks Enterprise. For
information on how to set it up, search for “Click Scoring Relevance Framework” in the
LucidWorks Enterprise documentation.
Summary
By providing a search development platform with a fast, flexible architecture built on open
source, LucidWorks Enterprise harnesses the power of Solr/Lucene in a convenient, well-
curated package, while sparing you the programming pain that would otherwise be
required to get a basic system up and running.
In this document, we showed how to install a single-server instance of LucidWorks
Enterprise, and how to index local, HTTP, and database content. We also looked at the
basic concepts involved in performing search queries.
We then covered some of the bells and whistles that are available to make your life, and the
lives of your users, easier, and how to configure them.
You should now have a fully functioning search platform, ready for data and customization.
Next Steps
For more information on how Lucid Imagination can help search application developers,
employees, customers, and partners find the information they need, please visit
www.lucidimagination.com to access blog posts, articles, and reviews of dozens of
successful implementations.
Please e-mail specific questions to:
Support and Service: support@lucidimagination.com
Sales and Commercial: sales@lucidimagination.com
Consulting: consulting@lucidimagination.com
Or call: 1.650.353.4057

What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 

Recently uploaded (20)

Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 

Getting started with Lucidworks Enterprise

  • 1. Getting Started With LucidWorks Enterprise A Lucid Imagination Technical White Paper
  • 2. Getting Started With LucidWorks Enterprise A Lucid Imagination Technical White Paper • October 2010 Page i © 2010 by Lucid Imagination, Inc. under the terms of Creative Commons license, as detailed at http://www.lucidimagination.com/Copyrights-and-Disclaimers/. Version 1.5, published 7 October 2010. Solr, Lucene, and their logos are trademarks of the Apache Software Foundation.
  • 3. Abstract

    LucidWorks Enterprise is the search solution development platform built on the power of Apache Solr/Lucene technology, developed by the enterprise search experts at Lucid Imagination. LucidWorks Enterprise leverages the disruptive innovation of the leading open source search technology to deliver unmatched scalability to billions of documents, with subsecond query and faceting response time. By building and expanding the scalable power of Solr open source technology with vital new features, the search experts at Lucid Imagination have created an integrated platform that simplifies and empowers predictable, reliable search application development.

    This document is intended to provide you with a basic working knowledge of the LucidWorks Enterprise search development platform. It provides you with an overview of the software’s functions and an explanation of how to use them from the provided user interface, as opposed to from a programming perspective.

    You will learn about installation, indexing content from local files, web sites, and databases, and searching, as well as improving the user experience using features such as user alerts, auto-complete and spell-check.

    This document does not require any previous programming experience.
  • 4. Table of Contents

    Introduction
      What You’ll Learn In This Document
      What This Document Won’t Teach You
      How LucidWorks Enterprise Works
    Installation
      Using The Installation Wizard
      Installing Via Command Line
      Testing The Installation
    Basic Searching
      Understanding Search Queries
        Searching Individual Fields
        Range Queries
      Faceted Searching
    Improving Search Coverage
      Understanding Fields
      Indexing The Local Filesystem
      Indexing Http Documents
      Indexing Database Records
      Indexing Solr Documents
      Scheduling Tasks
      Deleting Test Documents
    Improving The Search Experience
      User Alerts
      Helping Users Create Their Queries
  • 5.     Auto-Complete
        Spell-Checking
        Find Similar Links
        Enabling These Functions
        Specifying Fields
        Indexing
    Improving Relevancy
      Synonyms
      Stopwords
      Click Scoring
    Summary
    Next Steps
  • 6. Introduction

    Welcome to LucidWorks Enterprise, the search platform that takes the power of Solr/Lucene and delivers it to you in one convenient, supported package. This document will take you from the very beginning of installing the software down through some of the things you’ll need to know to make the most of it.

    LucidWorks Enterprise has been designed to provide you with the search capabilities and benefits of Solr while still providing the ease of use you need to work efficiently in an environment in which data is everywhere, and you need to get a handle on it. While it does provide some great opportunities for programmers to take control and build powerful search applications using those capabilities, it’s also been designed to take much of the pain out of using such a complex system.

    As such, many of the things you can do with LucidWorks Enterprise can be accomplished without any programming at all. The administrative user interface provides a way to index documents for searching, make queries, and even learn about how your system is being used. It also provides a web interface to most of the functions you’ll need to run your system.

    What You’ll Learn in this Document

    This document is meant to give you a running start on getting the most out of LucidWorks Enterprise. It teaches you about:

    - How LucidWorks Enterprise works
    - Installing the software and indexing your first test collection
    - Searching for data, and how to make the most of the individual fields you’ve indexed
    - How to use faceted searches to filter results
    - How to index local files (such as word processing documents), web pages (such as those on local or remote web sites), and even databases
    - How to tell LucidWorks about specific attributes of your data
  • 7. - How to configure user-friendly features such as user alerts for new content, auto-complete, spell-check, and the ability to find related results that don’t necessarily contain the specified search term
    - How to improve the quality of search results by specifying synonyms and stop words (common words that should be left out of a search)

    What This Document Won’t Teach You

    This document is intended to get you started using LucidWorks Enterprise; it’s not a programmer’s guide. In addition to numerous REST-based APIs, LucidWorks Enterprise gives you complete access to the source code underlying the platform, so you have full control over query handling, results relevancy, and other factors. For more information on making use of these capabilities, see the product documentation.

    How LucidWorks Enterprise Works

    LucidWorks Enterprise works by “indexing” data, or breaking it down into individual words or terms, each of which is assigned to a “field”, against which you can later search. A collection of fields is considered a “document”. For example, a PDF on your hard drive might have fields for “author”, “title”, and “text”, and these three fields make up the document.

    Because you can control not only the names of these fields, but also how they’re treated by the indexer and the query engine, you have a great deal of flexibility when it comes to indexing your data. For example, you might index product data out of your database, and specify that you want the title and description to be indexed (searchable) and that the id column is to be treated as a unique key, so that if you update the database, each product “document” in the index can be updated appropriately.

    Once you’ve indexed your data, it’s ready for searching. The query parser takes the user’s request and compares it to the data stored in the index.
    If it finds one or more relevant matches, it returns information about all of the matching documents. LucidWorks Enterprise provides a convenient web-based interface for indexing content, and for controlling the types of information to be returned for each document that satisfies a query. You can then build upon the application to decide how to use that data. That said, the LucidWorks Enterprise interface provides everything you need to see the results of a query, with no programming required.

    But first, you’ll need to install the software.

  • 8. Installation

    At the time of this writing, LucidWorks Enterprise is available as a limited distribution Developer Access Release, from http://www.lucidimagination.com/lucidworks-enterprise. Once you download the software, you’re ready to run the installer.

    LucidWorks Enterprise provides both a graphical and a command-line option for installation. Both provide the same options, but the command-line version can be used for systems where a graphical user interface isn’t an option. Note that this document is meant to be a guide, and not a comprehensive look at installation. If you need more details, please search the downloaded documentation for “Installation”.

    Using the Installation Wizard

    To install LucidWorks Enterprise using the GUI, perform the following steps:

    1) Make sure that you have Java 1.6 (JDK or JRE) installed on your machine. You can download Java from http://www.oracle.com/technetwork/java/javase/downloads/index.html.

    2) Double-click the installer to start it. If you are using the *.jar file and double-clicking doesn’t start the installer, go to the command line and type

         java -jar lucidworks-enterprise-installer.jar

       (Make sure to use the correct filename.)

    3) Click Next to go to the system requirements. A typical desktop machine will handle small-scale data collections (fewer than 100,000 documents), depending on your specific hardware and existing software, as well as how you intend to use the data. You will need 8GB to 16GB for a large-scale deployment. LucidWorks Enterprise runs on Windows (XP or higher), Linux (kernel 2.4 or higher) and MacOS (10.5 or higher).
  • 9. 4) Click Next. Read the user agreement, make sure it doesn’t contain anything objectionable, and click “I accept the terms of this license agreement”. Click Next again.

    5) Choose which components you want to install on this machine, and the addresses from which you want to access them. The default is to install and activate all components on the local server using ports 8888 and 8989, and for beginning purposes this is just fine. If this configuration conflicts with existing applications, feel free to change the port numbers. Keep in mind, however, that when they are installed during the same session, the SearchUI, AdminUI, and Alerts need to use the same port. If you are installing components on different machines or ports, be sure to alter the addresses to point to the appropriate server and port. Click Next to continue.
  • 10. 6) Select the installation path and click Next. Click OK to let the installer create the new directory.

    7) Review the selected options and click Next to install the software. Depending on your system, this may take several minutes. When the progress bar shows that the installation is complete, click Next.

    8) LucidWorks Enterprise takes a bit longer than usual to start for the first time, so make sure that Start LucidWorks Enterprise is checked, and then click Next to continue with the installation.

    9) Click the Next button when it becomes available.

    10) Decide whether to create shortcuts for other users, and where to place them, and then click Next.

    11) If all has gone well, you will see a screen telling you LucidWorks Enterprise has been installed, and offering the opportunity to create an installation script. If you are installing on multiple computers, this script will simplify the process by pre-filling the values you chose for this installation. Click Done to dismiss the installer.

    Installing Via Command Line

    The process for installing via the command line is virtually identical to installing via the graphical interface. Follow these steps:

    1) Make sure that you have Java 1.6 (JDK or JRE) installed on your machine. You can download Java from http://www.oracle.com/technetwork/java/javase/downloads/index.html.

    2) In the directory in which the *.jar file is located, execute the following command:

         java -jar lucidworks-enterprise-installer.jar --console

       (Again, make sure to use the correct file name.)

    3) Follow the steps presented in the installer. For information on specific steps, see the instructions above.
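Both installation paths begin by confirming that Java 1.6 or later is present. As a rough illustration only, the check could be scripted along the following lines. The function names are hypothetical and not part of LucidWorks Enterprise, and the sketch assumes the usual `java -version` banner format, such as `java version "1.6.0_21"`.

```python
import re
import subprocess


def parse_java_version(banner):
    """Extract (major, minor) from a `java -version` banner, or None if absent."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        return None
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return major, minor


def meets_java_16(banner):
    """True when the banner reports Java 1.6 or anything newer."""
    version = parse_java_version(banner)
    return version is not None and version >= (1, 6)


def installed_java_banner():
    """Return the text printed by `java -version`, or None if java is not on the PATH."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    # The JVM writes its version banner to stderr rather than stdout.
    return result.stderr or result.stdout
```

Calling `meets_java_16(installed_java_banner() or "")` before launching the installer would flag a missing or outdated Java runtime early.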
  • 11. Testing the Installation

    You can test the new installation by pulling up the administration user interface in your browser. To do that, go to:

         http://localhost:8989

    If everything installed and started properly, you will see the generic search page.

    Of course, at this point, you don’t actually have any data indexed, so there’s no point searching. To index data, you will need to log into the admin UI, so click the “login” link at the top of the page and log in using:

         Username: admin
         Password: admin

    (This user is pre-installed; before going to production, you will want to use either the User API or an LDAP directory to manage your users.)

    Logging in will bring you to the Quick Start page, where you can index content. Just to have something to search against, click Local Filesystem and enter the path for a small-ish directory of documents.
  • 12. Click Continue. For now, choose to index this content “immediately”. Click Finish to go to the Dashboard Summary page.
  • 13. This page shows you what’s been going on with your system. In this case, we haven’t had any queries, so we can just see the number of documents that have been indexed. This page updates automatically, so by watching it, you’ll know when indexing has been completed. The final data will show under “Recently Completed”.

    Finally, we have data to search! Click the Search tab to go to the search page.
  • 14. Notice that information about the data you’ve indexed already appears on the search page. In this case, you can see document authors, the data source, and the types of documents. These notations are called facets, and can be used to narrow results. (We’ll talk more about facets later.) Enter a search term and click Search to see the results.
  • 15. Now that we know everything’s working properly, we can talk about searching itself.

    Basic Searching

    On the surface, searching is pretty simple. Enter a keyword, press “search”, and get results. And that’s true. It is that simple. But it’s also powerful, in that you have the ability to get more out of your searching than a simple keyword search. In this section, we’ll discuss how to get more out of your searches.

    Understanding Search Queries

    The simplest search query involves just a keyword or phrase, such as when I enter “lucid” in the search box.
  • 16. We can also combine terms into a single query. For example, I can find documents that contain the terms “indexing” and “delete” with

         indexing AND delete

    LucidWorks Enterprise applies AND by default unless you specify another operator, such as

         indexing OR delete

    which finds documents that contain either term. LucidWorks Enterprise also supports a whole range of other operators, including comparative operators such as < and proximity operators such as NEAR, BEFORE, or AFTER. See the LucidWorks Enterprise User’s Guide for the list.
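To make the difference between AND and OR concrete, here is a minimal sketch that composes the two queries shown above and evaluates them against plain text. It is a toy model for illustration, not how the LucidWorks Enterprise query parser is implemented, and the helper names `combine` and `matches` are hypothetical.

```python
def combine(terms, operator="AND"):
    """Join search terms with an explicit boolean operator."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    return f" {operator} ".join(terms)


def matches(text, terms, operator="AND"):
    """Toy evaluation: AND needs every term in the text, OR needs at least one."""
    words = set(text.lower().split())
    hits = [term.lower() in words for term in terms]
    return all(hits) if operator == "AND" else any(hits)


terms = ["indexing", "delete"]
and_query = combine(terms)        # the string "indexing AND delete"
or_query = combine(terms, "OR")   # the string "indexing OR delete"

sample = "how to delete a document"
and_result = matches(sample, terms, "AND")  # False: "indexing" is missing
or_result = matches(sample, terms, "OR")    # True: "delete" is present
```

The AND query narrows the result set, because every term must appear, while the OR query widens it.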
  • 17. Searching Individual Fields

    Sometimes, however, you want to be more specific. For example, I might want to find all of my PDF files. If I did a search for just

         application/pdf

    I’d get no results, because that information isn’t stored in the default search field. Instead, I could search for

         mimeType:application/pdf

    This tells LucidWorks Enterprise to search for documents that have a value of “application/pdf” in the mimeType field.

    Range Queries

    You also have the option to search for a range of values. For example, I can find all of the documents in my index that have 50 pages or fewer with:

         pageCount:[0 TO 50]

    Or if I wanted to be even more specific, I could find all PDF files in that range:

         mimeType:application/pdf AND pageCount:[0 TO 50]

    One place you often find this kind of query is in faceted searches.

    Faceted Searching

    The use of facets is one of the great recent advances in searching. Search facets enable users to “narrow down” their search by a variety of factors. For example, if we go back to the original keyword search for “lucid”, you can see a number of different options down the right-hand side of the page.
  • 18. Notice that each entry includes not just a description of what it is, but also how many relevant documents there are. So I can see that this data source has 18 HTML documents that mention “Lucid”, and one that was authored by Grant Ingersoll. If I wanted to narrow my search to, say, OpenOffice presentations, I could click that link under Type.
  • 19. This narrows the list from 51 to just 7 documents. If I wanted to, I could further narrow the list, say, to show only the documents authored by me. To clear the existing filters and go back to the original search, click the “clear filters” link under the search box. Now let’s look at getting more data to search.
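Before moving on, here is a small Python sketch of the field and range queries described earlier, for readers who prefer to see them composed step by step. The helper names are hypothetical; only the resulting query strings come from this paper.

```python
def field_query(field, value):
    """Restrict a search to one field, e.g. mimeType:application/pdf."""
    return f"{field}:{value}"


def range_query(field, low, high):
    """Match values between low and high, inclusive, e.g. pageCount:[0 TO 50]."""
    return f"{field}:[{low} TO {high}]"


pdf_only = field_query("mimeType", "application/pdf")
short_docs = range_query("pageCount", 0, 50)
query = f"{pdf_only} AND {short_docs}"
print(query)  # mimeType:application/pdf AND pageCount:[0 TO 50]
```

Conceptually, narrowing by a facet is similar: each click adds another field restriction to the query you already have.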
  • 20. Improving search coverage
    Of course, the quality of search results depends on the quality of data added to the index. If the appropriate information isn’t in the index, even the most carefully constructed query isn’t going to find it. In this section, we’ll show you how to index both local and remote content so that it can be found by your users.
    Understanding Fields
    The first thing we’ll need to do before doing any indexing is understand just how LucidWorks Enterprise looks at the data we’re putting into it. Each document is made up of one or more fields. You can see a list of existing fields by clicking the Index tab and then the Fields subtab in the administration user interface.
  • 21. If you click a field, you’ll see the option to edit the properties of that field. When you’re just starting out, you’ll want to understand these properties:
    - Name: This value is the name by which the field is known, both in the indexing process and in queries.
  • 22.
    - Field Type: This property determines how the field is handled. For example, in this case, we’re looking at English text (as opposed to German, etc.) rather than a date, a sortable number, and so on.
    - Indexed: This property specifies whether the contents of this field are used to determine whether a document matches a particular query.
    - Stored: This property determines whether the original value of the field is stored, potentially to be returned as part of a result.
    - Multi-valued: This property determines whether a document can have multiple values for this field.
    - Field Default: This value determines the value that will be used for the document if no value is given when it’s indexed.
    - Search by Default: This property determines whether the field will be used in a search for which the user doesn’t specify a particular field.
    - Include in Results: Make sure this option is checked if you want this field to show up in the search results for this document.
    - Highlight: This property determines whether the given term will be shown in context for this field.
    - Facet: This property determines whether this field shows up as an available filter on the search page. Note that documents without this field won’t show up in the counts for this facet.
  • 23.
    - Use for Deduplication: In all likelihood, you will want to re-index your content as it changes. This setting enables you to determine how LucidWorks Enterprise knows this is the “same” document. For a product, it might be a product number. For HTTP data sources, it might be the URL.
    You can also delete and add fields from this interface.
    Indexing the Local Filesystem
    Even if you haven’t given any thought to what documents you’d like to search, you likely have a ready source of material right on your hard drive. To create a data source from local files, click the Index tab, then the Sources subtab, and the FileSystem sub-subtab. Enter a user-friendly name for your new data source and the full directory path in which the documents are stored. You have the option to drill down into subdirectories or not, as well as to follow symbolic links or not. (Note that for security reasons, data indexed from the local filesystem will not automatically be available via a link from the search results. Search the product documentation for “linking” for information on how to configure LucidWorks Enterprise to activate those links.) Click Create to create the data source.
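To get a feel for what the subdirectory and symbolic-link options change, the following Python sketch lists the files that a crawl of a directory would encounter. It is only an illustration using the standard library; the function name and its parameters are hypothetical, and it makes no claim about how LucidWorks Enterprise walks the filesystem internally.

```python
import os


def list_files(root, recurse=True, follow_links=False):
    """Return the paths of files under root.

    recurse=False keeps only files directly inside root.
    follow_links controls whether symlinked directories are entered.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=follow_links):
        for name in sorted(filenames):
            found.append(os.path.join(dirpath, name))
        if not recurse:
            # Emptying dirnames in place stops os.walk from descending further.
            dirnames.clear()
    return found


if __name__ == "__main__":
    for path in list_files(".", recurse=False):
        print(path)
```

Running a check like this against the directory you plan to index is a quick way to estimate how many documents the data source will pick up.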
  • 24. Note that this process does not start the indexer; we’ll look at that under Scheduling in a moment.
    Indexing HTTP Documents
    Another option is to index web documents. You might want to index the contents of your local intranet, or perhaps you have a repository of content that’s currently available via the browser. You can also use this option to monitor external web sites. To set up a data source of web content, click the Index tab, then the Sources subtab and the Web sub-subtab. The Name should be something you’ll recognize later, and the URL is the address at which you want the crawler to start. The Allow Paths and Disallow Paths values enable you to control where the crawler goes. For example, if I were to index my own site, as I’m doing here, I might want only my own content, so I’ve specified only paths that start with my URL, using a regular expression to specify the rest of the path. Similarly, I might not want to index my administration pages.
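The idea behind Allow Paths and Disallow Paths can be sketched in a few lines of Python. This is a simplified model, not the crawler’s actual logic: the example.com URLs are placeholders, and the assumptions that patterns are matched from the start of the URL and that a disallow rule wins over an allow rule are mine, so check the product documentation for the exact behavior.

```python
import re


def is_allowed(url, allow_patterns, disallow_patterns):
    """Decide whether a crawler should fetch url.

    Assumption: a URL matching any disallow pattern is rejected,
    otherwise it must match at least one allow pattern.
    """
    if any(re.match(pattern, url) for pattern in disallow_patterns):
        return False
    return any(re.match(pattern, url) for pattern in allow_patterns)


# Hypothetical site: index everything on it except the administration pages.
allow = [r"http://www\.example\.com/.*"]
disallow = [r"http://www\.example\.com/admin/.*"]

print(is_allowed("http://www.example.com/blog/post1", allow, disallow))   # True
print(is_allowed("http://www.example.com/admin/users", allow, disallow))  # False
print(is_allowed("http://other.example.org/", allow, disallow))           # False
```

Trying your patterns against a handful of real URLs this way can save a long crawl that indexes the wrong pages.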
  • 25. You can use Disallow Paths to allow the crawler to follow links to external sites, but avoid, say, indexing tens of thousands of tweets. The Crawl Depth specifies how far down the crawler will go. A depth of 0 crawls only the specified URL; to get only that page and the pages linked directly by it, specify a depth of 1. Click Create to create the data source. Note that this process does not start the indexer; we’ll look at that under Scheduling in a moment.
    Indexing Database Records
    Another fertile area for data indexing is the database. Here the effort required is a little greater, but so are the potential rewards. Before you index any database content, however, there are two tasks you must accomplish:
    1) Determine the fields you’re going to be indexing from your database. It’s unlikely that the fields with which LucidWorks Enterprise is preconfigured will match your column names exactly, and unless you map those columns to existing fields, all of your data will wind up in the text_all field, making it difficult to search. Fortunately, LucidWorks Enterprise gives you complete and easy control over field definitions. Use the Fields subtab to create any necessary fields.
    2) Make sure the appropriate JDBC driver is available to LucidWorks Enterprise. LWE doesn’t ship with any JDBC drivers, so you will have to upload your own. To do that, you’ll need to download curl (available at http://curl.haxx.se/download.html) and use the following command to upload the *.jar file with the appropriate drivers:
       curl -F file=@<filename> http://localhost:8888/api/collections/collection1/jdbcdrivers
    Once you’ve accomplished these two steps, you’re ready to create the new data source. Click the Index tab, then the Sources subtab and the DB sub-subtab.
  • 26. As usual, choose and enter a recognizable name for the data source, and enter the JDBC URL, minus any authentication information. For example:
    jdbc:mysql://127.0.0.1/productDB
    Enter the JDBC driver name. This is the actual class name you would use in a Java application, such as com.mysql.jdbc.Driver. Enter the username and password for the database. Finally, enter the query used to extract the data from the database, mapping each column to a LucidWorks Enterprise field. For example:
    select id as id, prod_name as name, price_pt as price from products
    In all likelihood, you will have information about a single item, or “document”, in several tables; to add it to your index, create the appropriate joins to add all of your data as a single query.
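If you want to try out a mapping query before pasting it into the data source form, a throwaway in-memory database works well. The sketch below uses Python’s built-in sqlite3 module as a stand-in for your real database (LucidWorks Enterprise itself connects through JDBC). The categories table and the cat_id column are hypothetical, added only to show a join; the column aliases follow the example query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by their alias

# Hypothetical schema: product details live in one table, categories in another.
conn.executescript("""
    CREATE TABLE categories (cat_id INTEGER PRIMARY KEY, cat_name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, prod_name TEXT,
                           price_pt REAL, cat_id INTEGER);
    INSERT INTO categories VALUES (1, 'Printers');
    INSERT INTO products VALUES (3, 'Dokad SPE 3299 Printer', 99, 1);
""")

# One row per "document"; each alias names the search field it should land in.
rows = conn.execute("""
    SELECT p.id AS id, p.prod_name AS name, p.price_pt AS price, c.cat_name AS cat
    FROM products p
    JOIN categories c ON c.cat_id = p.cat_id
""").fetchall()

documents = [dict(row) for row in rows]
print(documents)
conn.close()
```

The keys of each resulting dictionary are the field names the index would see, which makes it easy to spot a column you forgot to alias.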
  • 27. (In some cases, you will have structures that can’t be added with a single query. See the Programmers Guide to learn how to handle these situations using Solr’s DataImportHandler.) Click Create to create the data source. Note that this process does not start the indexer; we’ll look at that under Scheduling in a moment.
    Indexing Solr Documents
    One final method of indexing data involves adding it directly using Solr’s native document format. To do that, you will need a Solr document, which is just an XML document, such as:
    <add>
      <doc>
        <field name='id'>prod3_0</field>
        <field name='data_source'>Auxiliary Data</field>
        <field name='itemId_s'>3</field>
        <field name='itemType_s'>product</field>
        <field name='cat'>Printers</field>
        <field name='name_t'>Dokad SPE 3299 Printer</field>
        <field name='price_td'>99</field>
        <field name='blurb_t'>A great printer that doesn't use a lot of ink.</field>
        <field name='description_t'>Sed ut perspiciatis ...</field>
        <field name='text'>Dokad SPE 3299 Printer99A great printer that doesn't use a lot of ink.Sed ut ...</field>
      </doc>
      <doc>
        <field name='id'>prod4_0</field>
        <field name='data_source'>Auxiliary Data</field>
        <field name='itemId_s'>4</field>
        <field name='itemType_s'>product</field>
        <field name='cat'>Cameras</field>
        <field name='cat'>Accessories</field>
        <field name='name_t'>Kinok UltraCam II</field>
        <field name='price_td'>550</field>
        <field name='blurb_t'>Boy, the Kinok UltraCam II is a great camera, and it hooks up to your printer terrifically.</field>
        <field name='description_t'>Lorem ipsum dolor sit amet, consectetur adipisicing elit, ...</field>
        <field name='text'>Kinok UltraCam II550Boy, the Kinok UltraCam II is a great camera, and it hooks up to your printer terrifically.Lorem ipsum dolor sit amet, ...</field>
      </doc>
    </add>
  • 28. To index this type of document, click the Index tab, then the Sources subtab and the Solr sub-subtab. Enter a recognizable name, as well as the path to the actual document. Note that we’ve added a data_source field that matches what we’ve named the data source. This is because Solr documents don’t automatically have this field populated, as the other types do, so if we want that information to show up in the search facets, we need to provide it ourselves. Click Create to create the data source. Note that this process does not start the indexer; we’ll look at that under Scheduling right now.
    Scheduling Tasks
    We’re finally ready to look at scheduling the indexing of our data sources. Under the Index tab, click the Schedules subtab.
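A brief aside for anyone producing Solr documents from their own data rather than writing them by hand: the add format shown earlier is easy to generate. Here is a minimal Python sketch using the standard library’s ElementTree. The function name is hypothetical, and the field names are simply borrowed from the sample document; a multi-valued field such as cat is expressed by repeating the field element.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Build a Solr <add> message from a list of documents.

    Each document is a list of (field_name, value) pairs, so a
    multi-valued field can simply appear more than once.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # special characters are escaped on output
    return ET.tostring(add, encoding="unicode")


xml_text = build_add_xml([
    [
        ("id", "prod4_0"),
        ("data_source", "Auxiliary Data"),  # supplied by hand, as noted above
        ("cat", "Cameras"),
        ("cat", "Accessories"),
        ("name_t", "Kinok UltraCam II"),
    ]
])
print(xml_text)
```

Letting a library write the XML also takes care of escaping characters such as & and < in your field values.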
  • 29. Click the icon next to DataSources to expand that option, and click the data source you want to index to highlight it. You have the option to start indexing in a specific amount of time (such as 0 seconds to start immediately) or at a specific time on a specific day. Also, if your data is likely to change, you can specify the frequency with which you want to reindex it. You can also deactivate an index if you’d like to stop the indexing process. Deactivating an index won’t make the data unavailable, however. To do that, you’ll need to delete it altogether.
    Deleting Test Documents
    Deleting a data source is a pretty straightforward process. Click the Index tab, and then the Sources subtab. Under the list of data source types, you’ll find a Delete button.
  • 30. To delete a data source, highlight it and press the Delete button. Note that there is no confirmation dialog. Once you click Delete, it’s gone. Sort of. While it will no longer be updated, the data indexed as part of that data source is still in the index, and queries will still return it. To get rid of it altogether, you will need to call the underlying engine directly. To delete all the data in your index – and I really do mean all the data in your index – point your browser to the following URL:
    http://localhost:8888/solr/update?stream.body=<delete><query>*:*</query></delete>
    This process is the opposite of the Delete button; it gets rid of the data, but not the data sources. You’re then ready to either reindex, or delete the data sources and start over. Assuming that you haven’t started over, we’re ready to look at enhancing your users’ search experience.
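Browsers are usually forgiving about the angle brackets in that delete URL, but scripts and command-line tools often are not. The Python sketch below builds a properly encoded version of the same URL without sending anything. The function name is hypothetical, and the optional commit=true parameter is an assumption on my part: Solr’s update handler generally accepts it to make changes visible immediately, but verify whether your installation needs it.

```python
from urllib.parse import urlencode

DELETE_ALL = "<delete><query>*:*</query></delete>"


def delete_all_url(base="http://localhost:8888/solr/update", commit=False):
    """Return the URL that asks the engine to delete every indexed document.

    This only builds the string; nothing is deleted until you request it.
    """
    params = {"stream.body": DELETE_ALL}
    if commit:
        params["commit"] = "true"  # assumption: see the note above
    return f"{base}?{urlencode(params)}"


print(delete_all_url())
```

Because the function never contacts the server, it is safe to run while you decide whether you really want an empty index.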
  • 31. Improving the Search Experience
    LucidWorks Enterprise is built on Solr/Lucene and draws on the rich ecosystem around it. That means that you have access to all of the best bells and whistles available with Solr, plus even more, right at your fingertips. In this section, we’ll look at how to configure some of the most useful.
    User Alerts
    When you’re dealing with huge amounts of data, one of the biggest challenges is keeping up with it as it grows. One way to do that is to use user alerts, which notify you when new content matching your queries has been added to the system. To set up a user alert for a query, click the “Add this query as alert” link under the search box. This link takes you to a page where you can specify the details of where you’d like to receive the alert.
  • 32. The alerts arrive at the specified email address using the Name of the alert as the subject line, so you can choose a name that works with your mail filters. You can also specify how often to check for new data, with the option to limit how often it actually sends you data. All that said, email alerts are not enabled by default when LucidWorks Enterprise ships, because they require administrator configuration. To enable them, edit the file <LWE_HOME>/rails/config/alerts.yml to include the appropriate SMTP information. Even without additional configuration, however, you can still see the results of an alert. To do that, save the alert to add it to the list provided when you click the View saved alerts link under the search box.
  • 33. By clicking the Preview link for the alert, you can see the latest results for this particular query.
    Helping Users Create Their Queries
    Several of the functions available in LucidWorks Enterprise can help you help your users by providing guidance on what they should be searching for. These include auto-complete, spell-checking, and “find similar” links.
    Auto-complete
    One of the best ways to make sure that users don’t wind up with “no results” is to guide them towards terms that actually exist within the index. And one of the best ways to do that is using auto-complete functionality. Auto-complete looks at the characters the user has already entered and offers terms that start with those characters.
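The prefix matching behind auto-complete can be illustrated with a short Python sketch. It is a toy model over a small, made-up list of terms, not the product’s implementation: it keeps the terms sorted and uses a binary search to jump to the first candidate.

```python
import bisect


def suggest(prefix, sorted_terms, limit=5):
    """Return up to `limit` terms from a sorted list that start with prefix."""
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order means no later term can match
        matches.append(term)
        if len(matches) == limit:
            break
    return matches


# Hypothetical terms, standing in for what has been indexed.
terms = sorted(["index", "indexing", "ink", "printer", "printing"])

print(suggest("ind", terms))  # ['index', 'indexing']
print(suggest("pri", terms))  # ['printer', 'printing']
```

Because suggestions are drawn only from terms that exist, whatever the user picks is guaranteed to return at least one result.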
  • 34. Spell-checking
    You can also check the spelling of the terms the user has entered against the existing index of terms, and offer suggestions. For example, if you were to search for “printe”, the system might suggest “printer” based on the content you have indexed.
    Find Similar links
    The “find similar” functionality helps users by finding content they may not have known they were looking for. For example, a search for “sports” might find a document that contains the word “basketball”, even if the document doesn’t include the word “sports” at all. If a similar result is available, the link appears under the existing result.
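To see why spell-checking against the index is useful, consider this rough Python sketch. It swaps in the standard library’s difflib similarity measure, which is not the technique Solr uses, and the list of terms is made up; the point is only that suggestions come from words your content actually contains.

```python
import difflib


def suggest_spelling(word, indexed_terms, cutoff=0.8):
    """Suggest the closest indexed term, or None if the word is fine or nothing is close."""
    if word in indexed_terms:
        return None  # the word exists in the index, so no correction is needed
    matches = difflib.get_close_matches(word, indexed_terms, n=1, cutoff=cutoff)
    return matches[0] if matches else None


indexed_terms = ["printer", "camera", "ink"]  # hypothetical index contents

print(suggest_spelling("printe", indexed_terms))   # printer
print(suggest_spelling("printer", indexed_terms))  # None
```

A dictionary-based checker might offer “printed” instead; checking against the index keeps suggestions tied to documents the user can actually find.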
  • 35. In order for these three functions to work, you’ll need to make sure of three things:
    1) Make sure auto-complete and/or spell checking are enabled.
    2) Make sure that at least one field is specified as a source for these terms.
    3) For spell-checking and auto-complete, you’ll need to make sure that indexing for these terms has been performed.
    Enabling These Functions
    To enable auto-complete, spell-checking, or find similar, click the Queries tab and the Settings subtab. Click the Search Settings item to highlight it.
  • 36. Make sure that Enable auto-complete, Enable spell checking, and/or Show “find similar” links are checked, and click Save Settings.
    Specifying Fields
    To specify one or more fields for these functions, click the Index tab and the Fields subtab. Highlight the relevant field.
  • 37. Make sure that Index for Spell Checking, Index for Auto-complete, and/or Use in “Find Similar” are checked, and click Save Settings.
    Indexing
    Finally, make sure that the spell-checking and/or auto-complete information has been indexed. To do that, click the Index tab and the Schedules subtab. Click the icon next to Activities to expand it and highlight spelling or auto-complete. Schedule these indexes just as you would schedule your data sources. Now that we’ve got good queries, it’s time to make sure they return good results.
    Improving Relevance
    One of the advantages of using a search platform such as LucidWorks Enterprise is that results can be ranked by relevance, with the results most likely to be what the user is looking for at the top of the list. Out of the box, LucidWorks Enterprise does a pretty good job, but there are ways that you can help LWE improve relevancy even more by including input from the best computer out there: the human brain.
  • 38. Synonyms
    One way to provide better results is to provide LucidWorks Enterprise with groupings of words that have the same (or at least similar) meanings. For example, a search for “lawyer” should probably also find documents that only contain “attorney”. Most industries and subject areas have their own set of jargon and synonyms, and you can configure them directly from within the administration user interface. Click the Queries tab, and then the Settings subtab. Highlight Synonyms and Stopwords, and then expand the Synonyms entry. From here, you can add new entries or remove existing entries. Each line is considered a group; you can add as many comma-delimited terms as you like.
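The “one group per line, comma-delimited” layout is simple enough to model in a few lines of Python. The sketch below is an illustration only: the second group is made up, and lower-casing the terms is my own simplification rather than documented product behavior.

```python
def parse_synonyms(text):
    """Turn lines of comma-delimited terms into a list of synonym groups."""
    groups = []
    for line in text.splitlines():
        terms = [term.strip().lower() for term in line.split(",") if term.strip()]
        if terms:
            groups.append(terms)
    return groups


def expand(term, groups):
    """Return every term in the same group as `term`, or just the term itself."""
    term = term.lower()
    for group in groups:
        if term in group:
            return group
    return [term]


groups = parse_synonyms("lawyer, attorney\ncar, automobile, auto")

print(expand("lawyer", groups))   # ['lawyer', 'attorney']
print(expand("printer", groups))  # ['printer']
```

Seen this way, a search for any member of a group is effectively a search for all of them.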
  • 39. Click Save Settings when you’re finished.
    Stopwords
    In search parlance, a “stopword” is a word that’s so common that adding it to a query rarely increases the quality of results, and frequently decreases it. For example, if you did a search for “the City of Chicago”, “Chicago” would certainly provide good results. “City” might as well. But how many billions of documents that have nothing to do with Chicago contain the words “the” and “of”? Fortunately, LucidWorks Enterprise understands the concept of stopwords, and in most cases, will eliminate them from your query. It also understands how to handle stopwords on the back end so that they help improve relevance (for example, by judging the proximity of two words) rather than hinder it. LucidWorks Enterprise starts with a list of several dozen stopwords, such as “a”, “and”, “for”, and so on. You may find, however, that you need to add your own. To do that, click the Queries tab and the Settings subtab. Highlight Synonyms and Stopwords, and expand the Stopwords entry.
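Here is the Chicago example as a tiny Python sketch. The stopword set is a small illustrative sample, not the list that ships with LucidWorks Enterprise, and real query processing does more than a simple word filter.

```python
# A handful of common words, for illustration only.
STOPWORDS = {"a", "and", "for", "of", "the"}


def remove_stopwords(query, stopwords=STOPWORDS):
    """Drop stopwords from a query, keeping the remaining words in order."""
    return [word for word in query.split() if word.lower() not in stopwords]


print(remove_stopwords("the City of Chicago"))  # ['City', 'Chicago']
```

Adding your own stopwords amounts to growing that set with words that are too common in your particular content to be useful.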
  • 40. As with synonyms, you can use this interface to add, edit, or delete stopwords.
    Click Scoring
    Perhaps the best way for LucidWorks Enterprise to know whether a result is really relevant for a particular search query is to keep track of whether a human thinks it is. Click scoring makes that happen. When you enable click scoring, LucidWorks Enterprise tracks which results are most often clicked for a particular query, and “boosts” their relevance scores accordingly. It will then be more likely to present those results higher in the list for that query. Using click scoring requires manual configuration of LucidWorks Enterprise. For information on how to set it up, search for “Click Scoring Relevance Framework” in the LucidWorks Enterprise documentation.
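To make the idea of click-based boosting concrete, here is a deliberately simple Python sketch. The class, the document IDs, and especially the boost formula are hypothetical choices made for illustration; they are not how the Click Scoring Relevance Framework computes its scores.

```python
import math
from collections import defaultdict


class ClickBooster:
    """Track clicks per (query, document) and boost scores accordingly."""

    def __init__(self):
        self.clicks = defaultdict(int)

    def record_click(self, query, doc_id):
        self.clicks[(query, doc_id)] += 1

    def rerank(self, query, scored_docs):
        """Re-sort (doc_id, score) pairs, favoring frequently clicked results."""
        def boosted(item):
            doc_id, score = item
            count = self.clicks.get((query, doc_id), 0)
            # Hypothetical formula: the boost grows slowly with the click count.
            return score * (1.0 + math.log1p(count))
        return sorted(scored_docs, key=boosted, reverse=True)


booster = ClickBooster()
results = [("doc_a", 1.0), ("doc_b", 0.9)]
print(booster.rerank("printer", results))  # doc_a first: no clicks recorded yet

for _ in range(3):
    booster.record_click("printer", "doc_b")
print(booster.rerank("printer", results))  # doc_b first after three clicks
```

Note that the clicks are stored per query, so a document that is popular for one search does not automatically rise for unrelated ones.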
  • 41. Summary
    By providing a search development platform with a fast, flexible architecture built on open source, LucidWorks Enterprise harnesses the power of Solr/Lucene in a convenient, well-curated package, while sparing you the programming pain that would otherwise be required to get a basic system up and running. In this document, we showed how to install a single-server instance of LucidWorks Enterprise, and how to index local, HTTP, and database content. We also looked at the basic concepts involved in performing search queries. We then covered some of the bells and whistles that are available to make your life, and the lives of your users, easier, and how to configure them. You should now have a fully functioning search platform, ready for data and customization.
    Next Steps
    For more information on how Lucid Imagination can help search application developers, employees, customers, and partners find the information they need, please visit www.lucidimagination.com to access blog posts, articles, and reviews of dozens of successful implementations. Please e-mail specific questions to:
    Support and Service: support@lucidimagination.com
    Sales and Commercial: sales@lucidimagination.com
    Consulting: consulting@lucidimagination.com
    Or call: 1.650.353.4057