Building a data driven search application with LucidWorks SiLK

LucidWorks SiLK is an open source stack that combines Lucene/Solr with best-in-class open source data ingestion and analytics tools such as Flume, LogStash and Kibana. This webinar explores the features of SiLK and shows attendees how they can benefit from the following:

- A powerful UI to analyze time series data stored in Lucene/Solr
- Creating and sharing visualizations, dashboards and reports
- Discovery and analysis of data coming from servers, applications, devices and more
- Exploration of click, geospatial and social data in ways previously unimaginable

Published in: Technology

  • McKinsey estimates that search and big data analysis can increase profits in the retail sector by 60%. Increasingly, innovation in this sector means simulation, experimentation and iteration. Access to data and an understanding of user patterns in order to run different modes is what drives this growth. These are, of course, techniques that search practitioners have been perfecting for over a decade.
  • Rather than speak solely in the abstract, I shall illustrate how we internally use LucidWorks SILK to get insight from search logs.
  • For the Search Analytics case, I am fortunate that my users are sitting next to me.
  • I chose LogStash for data transformation and import for two reasons. First, it provides a powerful framework for extracting, grokking and transforming log data into a structured format that Solr can consume and that SILK can use for dashboards. Second, LucidWorks’ Hadoop Connectors have a GrokIngestMapper that allows me to reuse the same LogStash filters to work with larger volumes of files on HDFS (more details on this in a future article).
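
To make "transforming log data into a structured format" concrete, here is a minimal Python sketch of the kind of extraction a grok filter performs on a Solr request log line. It is not LogStash configuration and not the filter used in the demo: the regular expression, the function name `parse_solr_line` and the output field names are all assumptions made for illustration, and the sample line is an abbreviated form of the application log shown on the "Events" slide.

```python
import re
from urllib.parse import parse_qs

# Abbreviated form of the Solr application log line from the "Events" slide;
# the params are shortened here for readability.
SAMPLE = (
    "2013-12-18 01:37:20,637 INFO core.SolrCore - [collection1] webapp= "
    "path=/browse params={start=260&q=faceting&role=DEFAULT} "
    "hits=6761 status=0 QTime=14"
)

# Hypothetical pattern playing the role a grok filter would play in LogStash.
SOLR_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) +(?P<logger>\S+) - "
    r"\[(?P<collection>[^\]]+)\] webapp=(?P<webapp>\S*) "
    r"path=(?P<path>\S+) params=\{(?P<params>.*)\} "
    r"hits=(?P<hits>\d+) status=(?P<status>-?\d+) QTime=(?P<qtime>\d+)$"
)


def parse_solr_line(line):
    """Return a dict of structured fields, or None if the line does not match."""
    match = SOLR_LINE.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    # Numeric fields become integers so they can be aggregated later.
    for key in ("hits", "status", "qtime"):
        event[key] = int(event[key])
    # Pull the user's query text out of the request parameters.
    params = parse_qs(event.pop("params"))
    event["query"] = params.get("q", [""])[0]
    return event


if __name__ == "__main__":
    print(parse_solr_line(SAMPLE))
```

Lines that do not match return `None` rather than raising, so a mixed log file can be filtered in a single pass.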
  • Transcript

    • 1. Confidential and Proprietary © Copyright 2013 Building a Data-Driven Log Application with SILK April 21, 2014 Search | Discover | Analyze
    • 2. Agenda
      • Introduction to LucidWorks
      • The Continuum of Search
      • LucidWorks SILK
        – Enabling Big Data Search
        – 360-degree view of customers and systems
        – Breakthrough ROI
      • Solution Components
      • Demonstration
      • Summary and Q&A
    • 3. Speakers
      • Chief Product Officer at LucidWorks
        – 15 years of product, marketing and BD experience
        – Prior to LW, 8 years @Splunk (Employee ~9)
        – Proud Search Snob
      • Leads LucidWorks’ newly created Solutions team
        – 16-year track record of data-driven solutions: customer analytics/nano-targeting, improving product development operations, video processing and transmission
        – Establishing search as the paradigm for solving the "last mile problem" of big data
    • 4. Who is LucidWorks
      • Commercial entity behind Lucene/Solr, the industry-leading open search engine:
        – 300+ enterprise customers
        – Consulting, training, SLAs and “Pro-Active Support” for open source
      • LucidWorks platform provides advanced search capabilities directly on Solr:
        – Connectors, Entity Extraction, Security, pipelines, rules and more…
      • Solutions (e.g. SiLK & LucidWorks App for Splunk) to help streamline use case adoption
    • 5. Continuum of Search
      • Innovation Index: ‘Enterprise Search’, ‘Intelligent Search’, ‘Big Data Search’
      • Application: Intranet Search, Knowledge Base, E-Discovery, E-Commerce, Search on Hadoop, Log Analysis, Fraud Detection
      • Characteristics
        – ‘Enterprise Search’: Gigabyte scale, Single instance, Full-text
        – ‘Intelligent Search’: Terabyte scale, Cluster-ready, Structured/Unstructured Data, Near real-time
        – ‘Big Data Search’: Unlimited Scale, Cloud-ready, Handles any data type, Real-time, NoSQL Alternative
    • 6. Solr is the Choice
      • Creates the data access layer leveraged by best-in-class data-driven applications
      • Is the choice of those building data-driven applications at massive scale
    • 7. A Big Data Search index
      • Unlimited Scale, Cloud-ready, Handles any data type, Real-time, NoSQL Alternative
      • Creates the data access layer that developers & users demand in their Big Data applications: Ad-Hoc Discovery, Personalization, Context
      • Big Data Search is the partner of choice to deliver next generation search by the leading Big Data vendors
    • 8. Big Data Ecosystem WITHOUT LucidWorks Search
      • Diagram labels: Input Data Stream; Traditional RDBMS/EDW; Doc Stores; Platform for Data Storage and Machine Learning; HDFS; NoSQL; Hadoop; Real-time Processing
      • Users: Data Scientist, BI Analyst, IT, Product Managers, Business Users, Rest of Org
      • Difficult Getting Value from Data
        1. Opaque
        2. Narrow views into data
        3. Out-of-date
        4. Not Actionable
        5. Accessible mostly to expert users
        6. Expensive, ineffective translation to broader set of users
    • 9. Solving the Last Mile Problem of Big Data
      • Diagram labels: Input Data Stream; Traditional RDBMS/EDW; Doc Stores; HDFS; NoSQL; Hadoop; Real-time Processing; Lucene/Solr
      • Directly Access Data and Insights to Drive Actions: Breakthrough ROI
      • Predictive, Relevant, Actionable, Timely
    • 10. Solution Components
      • Gateway
      • Connectors: JDBC Connector, Web/File System Crawl, Hadoop Connectors
      • Data Sources: Data Warehouse, Clickstream, Networking, Servers
    • 11. Events from App/Server/Web Logs, etc.
      • Application Logs
        – 2013-12-18 01:37:20,637 INFO core.SolrCore - [collection1] webapp= path=/browse params={fl=lucid_facet&facet.query={!tag%3Done_day}dateCreated:[NOW-1DAY/DAY+TO+NOW/DAY]&facet.query={!tag%3Done_year}dateCreated:[NOW-365DAYS/DAY+TO+NOW/DAY]&start=260&q=faceting&f.project.facet.limit=20&role=DEFAULT&req_type=main&hl.simple.post=</span>&facet.field={!ex%3Dsource}source&facet.field={!ex%3Dsource}list_type&facet.field={!ex%3Dsource}issue_status&facet.field={!ex%3Dsource}lucid_facet&facet.field={!ex%3Dproject}project&facet.field={!ex%3Dauthor_display}author_display} hits=6761 status=0 QTime=14
      • Firewall Logs
        – Apr 07 2014 10:14:56 eventid='1278457197410173971' severity=severe category="Penetrate/ArpPoisoning" hostId=r signature=3201-2 description="Unix Password File Access Attempt" attacker=110.236.0.15 target=27.96.128.0 target=141.146.8.66 gc_score="-5" gc_riskdelta="3" gc_riskrating="false" gc_deny_packet="true" gc_deny_attacker="false"
      • Web Logs
        – 50.17.233.225 - - [09/Mar/2014:06:26:50 -0700] "GET / HTTP/1.1" 200 24442 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7"
      • Syslogs
        – Apr 17 07:00:42 Lucids-MacBook-Pro-25.local Microsoft Outlook[2461]: CGSCopyDisplayUUID: Invalid display 0x18d88a81
      • Other: Database Logs, Click Data, Conversions, Social Media (Tweets…), Financial Data, Product Catalogs, Knowledge Base, etc.
      • Volume, Variety and Velocity
    • 12. Application Development Process
      • Understand your Users
      • Know your Data
      • Prepare and Ingest Data into Solr
      • Build Visualizations
      • Iterate
    • 13. Search Analytics: Understand your Users
      • Who will use this application?
        – Business User (eCommerce or KM), IT and Search Administrators
      • What are they interested in?
        – What are people searching for?
        – Which queries are returning zero hits?
        – Which searches are providing slow response times?
        – What is my memory & CPU usage, JVM metrics, etc.?
        – Is there a trend in my slow searches?
        – Is the cache warm-up time very large?
      • The first three are of interest to Business Users; Search Admins/Developers are interested in all six.
    • 14. Search Analytics: Know your Data
      • Where is the data available?
        – Core Logs
        – Core Request Logs
        – Connector Logs
        – MBeans API
        – Log4j
      • Data Connectors
        – LogStash (for this example)
        – Hadoop Job Jar
    • 15. Centralized Logging Infrastructures
      • Can be built using a combination of LogStash, Apache Flume, Lumberjack, RabbitMQ, Apache Kafka, etc.
      • Today’s example uses LogStash; extensive documentation at http://logstash.net/docs/1.4.0
      • Diagram labels: Shipper; Shipper; Broker; Indexer
    • 16. Search Analytics: Data Ingestion & Visualization
      • Diagram labels: Search Logs; LogStash; Solr Output Writer for LogStash (HTTP); Hadoop Connector GrokIngestMapper; Solr/Solr Cloud; Gateway (Reverse Proxy); Visualization: Configurable Dashboards
    • 17. DEMO
      • Search | Discover | Analyze
    • 18. Q & A
      • Contacts
        – Will Hayes, Chief Product Officer, will.hayes@lucidworks.com, twitter: @iamwillhayes
        – Ravi Krishnamurthy, Director of Solutions, ravi.krishnamurthy@lucidworks.com
      • Links
        – http://www.lucidworks.com/silk
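
The first three questions on the "Understand your Users" slide (what people search for, which queries return zero hits, which searches are slow) can be sketched in a few lines of Python once log events are structured. This is an illustration only, not the SILK dashboards or the Solr Output Writer for LogStash: the event records, the 1000 ms threshold, the function names and the Solr field names (`query_s`, `hits_i`, `qtime_i`, which assume a schema with `*_s` and `*_i` dynamic fields) are all hypothetical. The last function only builds a JSON array of documents, the shape Solr's JSON update handler accepts; it does not send anything.

```python
import json
from collections import Counter

# Hypothetical structured events, shaped like the output of a log-parsing step.
EVENTS = [
    {"query": "faceting", "hits": 6761, "qtime": 14},
    {"query": "solr cloud", "hits": 0, "qtime": 9},
    {"query": "grok", "hits": 12, "qtime": 1850},
    {"query": "solr cloud", "hits": 0, "qtime": 11},
]

SLOW_MS = 1000  # assumed threshold for a "slow" search, in milliseconds


def top_queries(events, n=10):
    """What are people searching for? Most frequent query strings."""
    return Counter(e["query"] for e in events).most_common(n)


def zero_hit_queries(events):
    """Which queries are returning zero hits? Counted per query string."""
    return Counter(e["query"] for e in events if e["hits"] == 0)


def slow_queries(events, threshold_ms=SLOW_MS):
    """Which searches are slow? Events whose QTime meets the threshold."""
    return [e for e in events if e["qtime"] >= threshold_ms]


def to_solr_update_body(events):
    """Serialize events as a JSON array of documents for a Solr update request.

    Field names are hypothetical and assume dynamic-field suffixes in the schema.
    """
    docs = [
        {"query_s": e["query"], "hits_i": e["hits"], "qtime_i": e["qtime"]}
        for e in events
    ]
    return json.dumps(docs)


if __name__ == "__main__":
    print(top_queries(EVENTS, 3))
    print(zero_hit_queries(EVENTS))
    print(slow_queries(EVENTS))
    print(to_solr_update_body(EVENTS))
```

In practice these aggregations would run as facet and range queries inside Solr rather than in client code; the sketch just shows which fields each question depends on.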
