Building a data driven search application with LucidWorks SiLK

Confidential and Proprietary © Copyright 2013
Building a Data-Driven Log Application with SiLK
April 21, 2014
Search | Discover | Analyze

Agenda
• Introduction to LucidWorks
• The Continuum of Search
• LucidWorks SiLK
– Enabling Big Data Search
– 360-degree view of customers and systems
– Breakthrough ROI
• Solution Components
• Demonstration
• Summary and Q&A

Speakers
Will Hayes
• Chief Product Officer at LucidWorks
• 15 years of product, marketing and business development experience
• Before LucidWorks, 8 years at Splunk (employee ~9)
• Proud Search Snob
Ravi Krishnamurthy
• Leads LucidWorks’ newly created Solutions team
• 16-year track record of data-driven solutions
– Customer analytics/nano-targeting
– Improving product development operations
– Video processing and transmission
• Establishing search as the paradigm for solving the "last mile problem" of big data

Who is LucidWorks
Commercial entity behind Lucene/Solr, the industry-leading open source search engine:
• 300+ enterprise customers
• Consulting, training, SLAs and “Pro-Active Support” for open source
• LucidWorks platform provides advanced search capabilities directly on Solr:
– Connectors, Entity Extraction, Security, pipelines, rules and more…
• Solutions (e.g. SiLK & LucidWorks App for Splunk) to help streamline use case adoption

Continuum of Search
Application Innovation
• ‘Enterprise Search’ / ‘Intelligent Search’: Intranet Search, Knowledge Base, E-Discovery, E-Commerce
• ‘Big Data Search’: Search on Hadoop, Log Analysis, Fraud Detection
Index Characteristics
• ‘Enterprise Search’: Gigabyte scale, Single instance, Full-text
• ‘Intelligent Search’: Terabyte scale, Cluster-ready, Structured/Unstructured Data, Near real-time
• ‘Big Data Search’: Unlimited scale, Cloud-ready, Handles any data type, Real-time, NoSQL Alternative

Solr is the Choice
• Creates the data access layer leveraged by best-in-class data-driven applications
• Solr is the choice of those building data-driven applications at massive scale

Big Data Search
A Big Data Search index
• Unlimited Scale
• Cloud-ready
• Handles any data type
• Real-time
• NoSQL Alternative
Creates the data access layer
• Ad-Hoc Discovery
• Personalization
• Context
That developers & users demand in their Big Data applications
LucidWorks is the partner of choice of the leading Big Data vendors to deliver next-generation search

Big Data Ecosystem WITHOUT LucidWorks Search
[Diagram labels. Sources: Input Data Stream; Traditional RDBMS/EDW; Doc Stores. Platform for Data Storage and Machine Learning: HDFS; NoSQL; Hadoop; Real-time Processing. Users: Data Scientist; BI Analyst; IT; Product Managers; Business Users; Rest of Org.]
Difficult Getting Value from Data
1. Opaque
2. Narrow views into data
3. Out-of-date
4. Not Actionable
5. Accessible mostly to expert users
6. Expensive, ineffective translation to broader set of users

Solving the Last Mile Problem of Big Data
[Diagram labels. Sources: Input Data Stream; Traditional RDBMS/EDW; Doc Stores. Platform: HDFS; NoSQL; Hadoop; Real-time Processing; Lucene/Solr.]
Directly Access Data and Insights to Drive Actions: Breakthrough ROI
• Predictive
• Relevant
• Actionable
• Timely

Solution Components
[Diagram labels: Gateway; JDBC Connector; Web/File System Crawl; Hadoop Connectors; Connectors; Data Sources; Data Warehouse; Clickstream; Networking; Servers]

Events from App/Server/Web Logs, etc.
• Application Logs
– 2013-12-18 01:37:20,637 INFO core.SolrCore - [collection1] webapp= path=/browse params={fl=lucid_facet&facet.query={!tag%3Done_day}dateCreated:[NOW-1DAY/DAY+TO+NOW/DAY]&facet.query={!tag%3Done_year}dateCreated:[NOW-365DAYS/DAY+TO+NOW/DAY]&start=260&q=faceting&f.project.facet.limit=20&role=DEFAULT&req_type=main&hl.simple.post=</span>&facet.field={!ex%3Dsource}source&facet.field={!ex%3Dsource}list_type&facet.field={!ex%3Dsource}issue_status&facet.field={!ex%3Dsource}lucid_facet&facet.field={!ex%3Dproject}project&facet.field={!ex%3Dauthor_display}author_display} hits=6761 status=0 QTime=14
• Firewall Logs
– Apr 07 2014 10:14:56 eventid='1278457197410173971' severity=severe category="Penetrate/ArpPoisoning" hostId=r signature=3201-2 description="Unix Password File Access Attempt" attacker=110.236.0.15 target=27.96.128.0 target=141.146.8.66 gc_score="-5" gc_riskdelta="3" gc_riskrating="false" gc_deny_packet="true" gc_deny_attacker="false"
• Web Logs
– 50.17.233.225 - - [09/Mar/2014:06:26:50 -0700] "GET / HTTP/1.1" 200 24442 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7"
• Syslogs
– Apr 17 07:00:42 Lucids-MacBook-Pro-25.local Microsoft Outlook[2461]: CGSCopyDisplayUUID: Invalid display 0x18d88a81
• Other: Database Logs, Click Data, Conversions, Social Media (Tweets…), Financial Data, Product Catalogs, Knowledge Base, etc.
• Volume, Variety and Velocity
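Before events like these can be searched, they are usually broken into fields. The following is a minimal sketch of one way to do that for the key=value style of the firewall event, written in Python for illustration. The regular expression and the function name `parse_key_values` are assumptions of this example and are not part of SiLK or LogStash.

```python
import re

# Matches key=value pairs where the value is double-quoted, single-quoted, or bare.
KV_PATTERN = re.compile(r"""(\w+)=(?:"([^"]*)"|'([^']*)'|(\S+))""")


def parse_key_values(line):
    """Return a dict of field name -> list of values (keys such as 'target' can repeat)."""
    fields = {}
    for match in KV_PATTERN.finditer(line):
        key = match.group(1)
        # Exactly one of the three value groups matched; pick that one.
        value = next(g for g in match.groups()[1:] if g is not None)
        fields.setdefault(key, []).append(value)
    return fields


# Shortened copy of the firewall event shown on this slide.
sample = (
    "Apr 07 2014 10:14:56 eventid='1278457197410173971' severity=severe "
    'category="Penetrate/ArpPoisoning" hostId=r signature=3201-2 '
    'description="Unix Password File Access Attempt" attacker=110.236.0.15 '
    'target=27.96.128.0 target=141.146.8.66 gc_score="-5"'
)
event = parse_key_values(sample)
print(event["severity"], event["target"])
```

Values are kept as lists because a key can occur more than once in a single event, as `target` does here. The leading timestamp contains no `=` and is left for a separate date parser.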

Application Development Process
• Understand your Users
• Know your Data
• Prepare and Ingest Data into Solr
• Build Visualizations
• Iterate

Search Analytics—Understand your Users
• Who will use this application?
– Business User (eCommerce or KM), IT and Search Administrators
• What are they interested in?
– What are people searching for?
– Which queries are returning zero hits?
– Which searches are providing slow response times?
– What is my memory & CPU usage, JVM metrics, etc.?
– Is there a trend in my slow searches?
– Is the cache warm-up time very large?
• The first three are of interest to Business Users; Search Admins/Developers are interested in all six.
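Two of these questions (zero-hit queries and slow searches) can be answered from Solr request log lines alone. The sketch below assumes the log layout of the Solr sample shown earlier in this deck (`params={...} hits=N status=N QTime=N`). The function names, the 500 ms threshold and the three shortened log lines are hypothetical and serve only as illustration.

```python
import re
from urllib.parse import parse_qs

# Assumed layout, modelled on the Solr request log sample in this deck:
# "... params={<query string>} hits=<n> status=<n> QTime=<ms>"
REQUEST_PATTERN = re.compile(
    r"params=\{(?P<params>.*)\} hits=(?P<hits>\d+) "
    r"status=(?P<status>\d+) QTime=(?P<qtime>\d+)"
)


def parse_request(line):
    """Extract the user query, hit count and response time from one log line."""
    match = REQUEST_PATTERN.search(line)
    if match is None:
        return None
    params = parse_qs(match.group("params"))
    return {
        "q": params.get("q", [""])[0],
        "hits": int(match.group("hits")),
        "qtime_ms": int(match.group("qtime")),
    }


def summarize(lines, slow_ms=500):
    """Collect zero-hit queries and queries at or above the slow threshold."""
    requests = [r for r in map(parse_request, lines) if r is not None]
    return {
        "zero_hits": [r["q"] for r in requests if r["hits"] == 0],
        "slow": [r["q"] for r in requests if r["qtime_ms"] >= slow_ms],
    }


# Hypothetical, shortened log lines for illustration only.
prefix = "INFO core.SolrCore - [collection1] webapp= path=/browse "
logs = [
    "2013-12-18 01:37:20,637 " + prefix
    + "params={start=260&q=faceting&role=DEFAULT} hits=6761 status=0 QTime=14",
    "2013-12-18 01:38:02,101 " + prefix
    + "params={q=facetting} hits=0 status=0 QTime=9",
    "2013-12-18 01:39:45,312 " + prefix
    + "params={q=solr+cloud} hits=120 status=0 QTime=812",
]
report = summarize(logs)
print(report)
```

In a SiLK-style setup the same fields would be extracted at ingest time and the counting done by Solr facets and dashboards; the script only shows which fields the questions depend on.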

Search Analytics–Know your Data
• Where is the data available?
– Core Logs
– Core Request Logs
– Connector Logs
– MBeans API
– Log4j
• Data Connectors
– LogStash (for this example)
– Hadoop Job Jar
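As a rough illustration of the MBeans source, the sketch below builds a request URL for Solr's MBeans handler and reads cache warm-up times from a response. It assumes the handler is reachable at `/admin/mbeans` under the core and that the `solr-mbeans` value is a flat list alternating category names and objects; the exact layout can vary with Solr version and response settings. The host, the core name and the sample response body are hypothetical.

```python
import json
from urllib.parse import urlencode


def mbeans_url(base_url, core, category=None):
    """Build a URL for Solr's MBeans request handler with statistics enabled."""
    params = {"stats": "true", "wt": "json"}
    if category:
        params["cat"] = category
    return f"{base_url}/{core}/admin/mbeans?{urlencode(params)}"


def cache_warmup_times(response_text):
    """Map cache name -> warmupTime from an MBeans JSON response.

    Assumes 'solr-mbeans' is a flat list alternating category names
    and objects, e.g. ["CACHE", {...}].
    """
    flat = json.loads(response_text)["solr-mbeans"]
    categories = dict(zip(flat[0::2], flat[1::2]))
    return {
        name: bean["stats"]["warmupTime"]
        for name, bean in categories.get("CACHE", {}).items()
        if "warmupTime" in (bean.get("stats") or {})
    }


# Hypothetical response body, trimmed for illustration only.
sample = json.dumps({
    "solr-mbeans": [
        "CACHE",
        {
            "filterCache": {"stats": {"lookups": 40, "warmupTime": 12}},
            "fieldCache": {"stats": {"entries_count": 3}},
        },
    ]
})
url = mbeans_url("http://localhost:8983/solr", "collection1", category="CACHE")
print(url)
print(cache_warmup_times(sample))
```

Polling this endpoint on a schedule and indexing the results alongside the logs is one way to chart cache warm-up time over time.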

Centralized Logging Infrastructures
• Can be built using a combination of LogStash, Apache Flume, Lumberjack, RabbitMQ, Apache Kafka, etc.
• Today’s example uses LogStash; extensive documentation at http://logstash.net/docs/1.4.0
[Diagram labels: Shipper; Shipper; Broker; Indexer]

Search Analytics–Data Ingestion & Visualization
[Diagram labels: Search Logs; LogStash; Solr Output Writer for LogStash (HTTP); Hadoop Connector; GrokIngestMapper; Gateway (Reverse Proxy); Solr/SolrCloud; Visualization; Configurable Dashboards]
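The ingestion step comes down to sending parsed events to Solr over HTTP. The sketch below shows the general shape of such a request in Python, using Solr's JSON update handler. It is not the LogStash output writer itself. The Solr address, the collection name and the field names are assumptions, and the fields would have to exist in the target schema.

```python
import json
import urllib.request


def build_update_request(solr_url, collection, docs, commit=True):
    """Build (but do not send) an HTTP POST that adds JSON documents to Solr."""
    url = f"{solr_url}/{collection}/update"
    if commit:
        url += "?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical event; field names are assumptions and must match the Solr schema.
events = [{
    "id": "search-log-1",
    "timestamp": "2013-12-18T01:37:20.637Z",
    "query_s": "faceting",
    "hits_i": 6761,
    "qtime_i": 14,
}]
request = build_update_request("http://localhost:8983/solr", "collection1", events)
print(request.get_method(), request.full_url)
# To actually index, with a Solr instance running at that address:
# urllib.request.urlopen(request)
```

Committing on every request is convenient for a demo but expensive at log volume; a production pipeline would more likely batch documents and rely on Solr's auto-commit settings.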

DEMO

Q & A
• Contacts
– Will Hayes, Chief Product Officer, will.hayes@lucidworks.com, twitter: @iamwillhayes
– Ravi Krishnamurthy, Director of Solutions, ravi.krishnamurthy@lucidworks.com
• Links
– http://www.lucidworks.com/silk
