
Building a data driven search application with LucidWorks SiLK


LucidWorks SiLK is an open source stack that combines Lucene/Solr with best-in-class open source data ingestion and analytics tools such as Flume, LogStash and Kibana. This webinar explores the features of SiLK and shows attendees how they can benefit from the following:

- A powerful UI to analyze time series data stored in Lucene/Solr
- Creating and sharing visualizations, dashboards and reports
- Discovery and analysis of data coming from servers, applications, devices and more
- Exploration of click, geospatial and social data in ways previously unimaginable

  • McKinsey estimates that search and big data analysis can increase profits in the retail sector by 60%. Increasingly, innovation in this sector means simulation, experimentation and iteration. Access to data and an understanding of user patterns, in order to run different modes, is what drives this growth. These are, of course, techniques that search practitioners have been perfecting for over a decade.
  • Rather than speak solely in the abstract, I shall illustrate how we internally use LucidWorks SILK to get insight from search logs
  • For the Search Analytics case, I am fortunate that my users are sitting next to me
  • I chose LogStash for data transformation and import for two reasons. First, it provides a powerful framework for extracting, grokking and transforming log data into a structured format that Solr can consume and that SILK can use for dashboards. Second, LucidWorks’ Hadoop Connectors have a GrokIngestMapper that allows me to reuse the same LogStash filters to work with larger volumes of files on HDFS (more details on this in a future article).
  • Building a data driven search application with LucidWorks SiLK

    1. Building a Data-Driven Log Application with SILK
       April 21, 2014
       Search | Discover | Analyze
       Confidential and Proprietary © Copyright 2013
    2. Agenda
       • Introduction to LucidWorks
       • The Continuum of Search
       • LucidWorks SILK
         – Enabling Big Data Search
         – 360-degree view of customers and systems
         – Breakthrough ROI
       • Solution Components
       • Demonstration
       • Summary and Q&A
    3. Speakers
       • Chief Product Officer at LucidWorks
       • 15 years product, marketing and BD experience
       • Prior to LW, 8 years @Splunk (Employee ~9)
       • Proud Search Snob
       • Leads LucidWorks’ newly created Solutions team
       • 16-year track record of data-driven solutions
         – Customer analytics/nano-targeting
         – Improving product development operations
         – Video processing and transmission
       • Establishing search as the paradigm for solving the "last mile problem" of big data
    4. Who is LucidWorks
       Commercial entity behind Lucene/Solr, the industry-leading open search engine:
       • 300+ enterprise customers
       • Consulting, training, SLAs and “Pro-Active Support” for open source
       • LucidWorks platform provides advanced search capabilities directly on Solr:
         – Connectors, Entity Extraction, Security, pipelines, rules and more…
       • Solutions (e.g. SiLK & LucidWorks App for Splunk) to help streamline use case adoption.
       Diagram label: Platform
    5. Continuum of Search
       Diagram labels: Application, Innovation Index, Characteristics; ‘Enterprise Search’, ‘Intelligent Search’, ‘Big Data Search’
       • Intranet Search
       • Knowledge Base
       • E-Discovery
       • E-Commerce
       • Gigabyte scale
       • Single instance
       • Full-text
       • Terabyte Scale
       • Cluster-ready
       • Structured/Unstructured Data
       • Near real-time
       • Search on Hadoop
       • Log Analysis
       • Fraud Detection
       • Unlimited Scale
       • Cloud-ready
       • Handles any data type
       • Real-time
       • NoSQL Alternative
    6. Solr is the Choice
       • Creates the data access layer leveraged by best-in-class data-driven applications
       • Is the choice of those building data-driven applications at massive scale
    7. A Big Data Search index
       • Unlimited Scale
       • Cloud-ready
       • Handles any data type
       • Real-time
       • NoSQL Alternative
       Creates the data access layer
       • Ad-Hoc Discovery
       • Personalization
       • Context
       that developers & users demand in their Big Data applications
       Big Data Search is the partner of choice to deliver next generation search by the leading Big Data vendors
    8. Big Data Ecosystem WITHOUT LucidWorks Search
       Diagram labels:
       • Input Data Stream; Traditional RDBMS/EDW; Doc Stores
       • Platform for Data Storage and Machine Learning
       • HDFS; NoSQL; Hadoop; Real-time Processing
       • Data Scientist; BI Analyst; IT
       • Product Managers; Business Users; Rest of Org
       Difficult Getting Value from Data
       1. Opaque
       2. Narrow views into data
       3. Out-of-date
       4. Not Actionable
       5. Accessible mostly to expert users
       6. Expensive, ineffective translation to broader set of users
    9. Solving the Last Mile Problem of Big Data
       Directly Access Data and Insights to Drive Actions: Breakthrough ROI
       • Predictive
       • Relevant
       • Actionable
       • Timely
       Diagram labels: Input Data Stream; Traditional RDBMS/EDW; Doc Stores; HDFS; NoSQL; Hadoop; Real-time Processing; Lucene/Solr
    10. Solution Components
        Diagram labels: Gateway; JDBC Connector; Web/File System Crawl; Data Warehouse; Hadoop Connectors; Clickstream; Networking; Data Sources; Connectors; Servers
    11. Events from App/Server/Web Logs, etc.
        • Application Logs
          – 2013-12-18 01:37:20,637 INFO core.SolrCore - [collection1] webapp= path=/browse params={fl=lucid_facet&facet.query={!tag%3Done_day}dateCreated:[NOW-1DAY/DAY+TO+NOW/DAY]&facet.query={!tag%3Done_year}dateCreated:[NOW-365DAYS/DAY+TO+NOW/DAY]&start=260&q=faceting&f.project.facet.limit=20&role=DEFAULT&req_type=main&hl.simple.post=</span>&facet.field={!ex%3Dsource}source&facet.field={!ex%3Dsource}list_type&facet.field={!ex%3Dsource}issue_status&facet.field={!ex%3Dsource}lucid_facet&facet.field={!ex%3Dproject}project&facet.field={!ex%3Dauthor_display}author_display} hits=6761 status=0 QTime=14
        • Firewall Logs
          – Apr 07 2014 10:14:56 eventid='1278457197410173971' severity=severe category="Penetrate/ArpPoisoning" hostId=r signature=3201-2 description="Unix Password File Access Attempt" attacker=110.236.0.15 target=27.96.128.0 target=141.146.8.66 gc_score="-5" gc_riskdelta="3" gc_riskrating="false" gc_deny_packet="true" gc_deny_attacker="false"
        • Web Logs
          – 50.17.233.225 - - [09/Mar/2014:06:26:50 -0700] "GET / HTTP/1.1" 200 24442 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7"
        • Syslogs
          – Apr 17 07:00:42 Lucids-MacBook-Pro-25.local Microsoft Outlook[2461]: CGSCopyDisplayUUID: Invalid display 0x18d88a81
        • Other: Database Logs, Click Data, Conversions, Social Media (Tweets…), Financial Data, Product Catalogs, Knowledge Base, etc.
        • Volume, Variety and Velocity
    12. Application Development Process
        • Understand your Users
        • Know your Data
        • Prepare and Ingest Data into Solr
        • Build Visualizations
        • Iterate
    13. Search Analytics: Understand your Users
        • Who will use this application?
          – Business User (eCommerce or KM), IT and Search Administrators
        • What are they interested in?
          – What are people searching for?
          – Which queries are returning zero hits?
          – Which searches are providing slow response times?
          – What is my memory & CPU usage, JVM metrics, etc.?
          – Is there a trend in my slow searches?
          – Is the cache warm-up time very large?
        • The first three are of interest to Business Users; Search Admins/Developers are interested in all six.
    14. Search Analytics: Know your Data
        • Where is the data available?
          – Core Logs
          – Core Request Logs
          – Connector Logs
          – MBeans API
          – Log4j
        • Data Connectors
          – LogStash (for this example)
          – Hadoop Job Jar
    15. Centralized Logging Infrastructures
        • Can be built using a combination of LogStash, Apache Flume, Lumberjack, RabbitMQ, Apache Kafka, etc.
        • Today’s example uses LogStash; extensive documentation at http://logstash.net/docs/1.4.0
        Diagram labels: Shipper; Shipper; Broker; Indexer
    16. Search Analytics: Data Ingestion & Visualization
        Diagram labels: Search Logs; LogStash; Solr Output Writer for LogStash (HTTP); Hadoop Connector GrokIngestMapper; Solr/SolrCloud; Gateway (Reverse Proxy); Visualization: Configurable Dashboards
    17. DEMO
        Search | Discover | Analyze
    18. Q & A
        • Contacts
          – Will Hayes, Chief Product Officer, will.hayes@lucidworks.com, twitter: @iamwillhayes
          – Ravi Krishnamurthy, Director of Solutions, ravi.krishnamurthy@lucidworks.com
        • Links
          – http://www.lucidworks.com/silk
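
To make the search-analytics example more concrete, here is a minimal Python sketch of the kind of extraction that slide 11's application log line needs before the first questions on slide 13 (zero-hit queries, slow searches) can be answered. In the deck this step is done with LogStash filters; the code below is only a stand-in written for illustration, not the presenters' configuration. The function names, the regular expression, the 1000 ms "slow" threshold and the two modified sample lines are all assumptions; only the first sample line comes from the slide, shortened to a few of its original parameters.

```python
import re
from datetime import datetime
from urllib.parse import parse_qs

# Assumed layout of a Solr request log line, modelled on the slide 11 sample.
SOLR_REQUEST = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) +(?P<logger>\S+) - "
    r"\[(?P<collection>[^\]]*)\] webapp=(?P<webapp>\S*) "
    r"path=(?P<path>\S+) params=\{(?P<params>.*)\} "
    r"hits=(?P<hits>\d+) status=(?P<status>\d+) QTime=(?P<qtime>\d+)$"
)


def parse_solr_request(line):
    """Return a dict of structured fields, or None if the line does not match."""
    match = SOLR_REQUEST.match(line.strip())
    if match is None:
        return None
    # The params section is an ordinary URL query string.
    params = parse_qs(match.group("params"))
    return {
        "timestamp": datetime.strptime(
            match.group("timestamp"), "%Y-%m-%d %H:%M:%S,%f"
        ),
        "collection": match.group("collection"),
        "path": match.group("path"),
        "query": params.get("q", [""])[0],
        "hits": int(match.group("hits")),
        "status": int(match.group("status")),
        "qtime_ms": int(match.group("qtime")),  # Solr reports QTime in milliseconds
    }


def zero_hit_queries(events):
    """Queries that returned no results."""
    return [e["query"] for e in events if e["hits"] == 0]


def slow_queries(events, threshold_ms=1000):
    """Queries at or above the (assumed) slowness threshold."""
    return [e["query"] for e in events if e["qtime_ms"] >= threshold_ms]


# Shortened copy of the slide's log line (most facet parameters dropped).
SLIDE_LINE = (
    "2013-12-18 01:37:20,637 INFO core.SolrCore - [collection1] webapp= "
    "path=/browse params={start=260&q=faceting&role=DEFAULT&req_type=main} "
    "hits=6761 status=0 QTime=14"
)
# Hypothetical variations of that line, invented for illustration only.
ZERO_HIT_LINE = SLIDE_LINE.replace("q=faceting", "q=fasceting").replace(
    "hits=6761", "hits=0"
)
SLOW_LINE = SLIDE_LINE.replace("QTime=14", "QTime=2500")

events = [parse_solr_request(l) for l in (SLIDE_LINE, ZERO_HIT_LINE, SLOW_LINE)]
print("zero-hit queries:", zero_hit_queries(events))
print("slow queries:", slow_queries(events))
```

Once log lines are reduced to fields such as `query`, `hits` and `qtime_ms`, they can be indexed as documents and the same questions become simple filters or facets in a dashboard, which is the role Solr and the SILK visualizations play in the deck.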
