Solr+Hadoop = Big Data Search
Cloudera, Inc.
From Solr committer Mark Miller
1.
Solr + Hadoop = Big Data Search
Mark Miller
2.
Who Am I?
Cloudera employee, Lucene/Solr committer, Lucene PMC member, Apache member.
First job out of college was in the newspaper archiving business.
First full-time employee at LucidWorks, a startup around Lucene/Solr.
Spent a couple of years as "Core" engineering manager, reporting to the VP of engineering.
3.
Very fast and feature rich 'core' search engine library.
Compact and powerful, Lucene is an extremely popular full-text search library.
Provides low level APIs for analyzing, indexing, and searching text, along with a myriad of related features.
Just the core: either you write the 'glue' or use a higher level search engine built with Lucene.
4.
Solr (pronounced "solar") is an open source enterprise search platform from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Providing distributed search and index replication, Solr is highly scalable. Solr is the most popular enterprise search engine.
- Wikipedia
5.
Search on Hadoop History
• Katta
• Blur
• SolBase
• HBASE-3529
• SOLR-1301
• SOLR-1045
• Ad-Hoc
6.
Family Tree
...
7.
Strengthen the Family Bonds
• No need to build something radically new: we have the pieces we need.
• Focus on integration points.
• Create high quality, first class integrations and contribute the work to the projects involved.
• Focus on integration and quality first, then performance and scale.
8.
SolrCloud
9.
Solr Integration
• Read and write directly to HDFS
• First class custom Directory support in Solr
• Support Solr replication on HDFS
• Other improvements around usability and configuration
10.
Read and Write directly to HDFS
• Lucene did not historically support append-only filesystems.
• "Flexible Indexing" brought support for append-only filesystems.
• Lucene has supported append-only filesystems by default since 4.2.
11.
Lucene Directory Abstraction
It's how Lucene interacts with index files.
Solr uses the Lucene library and offers DirectoryFactory.
class Directory {
        listAll();
        createOutput(file, context);
        openInput(file, context);
        deleteFile(file);
        makeLock(file);
        clearLock(file);
        …
}
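To illustrate why this abstraction matters, here is a minimal Python sketch of a Directory-style interface. It is not Lucene's API: the snake_case method names, the `RamDirectory` class, and the `write_segment` helper are hypothetical, chosen only to mirror the operations listed on the slide. The point is that index-writing code talks to the interface, so the storage behind it (local disk, memory, HDFS) can be swapped.

```python
from abc import ABC, abstractmethod
import io


class Directory(ABC):
    """Hypothetical Python analogue of the Directory operations on the slide."""

    @abstractmethod
    def list_all(self): ...

    @abstractmethod
    def create_output(self, name): ...

    @abstractmethod
    def open_input(self, name): ...

    @abstractmethod
    def delete_file(self, name): ...


class RamDirectory(Directory):
    """Keeps every 'file' in a dict; a stand-in for local disk or HDFS."""

    def __init__(self):
        self._files = {}

    def list_all(self):
        return sorted(self._files)

    def create_output(self, name):
        directory = self

        class _Output(io.BytesIO):
            def close(self):
                # Publish the bytes only when the writer closes the file.
                if not self.closed:
                    directory._files[name] = self.getvalue()
                super().close()

        return _Output()

    def open_input(self, name):
        return io.BytesIO(self._files[name])

    def delete_file(self, name):
        del self._files[name]


def write_segment(directory, name, payload):
    """Index-writing code depends only on the Directory interface."""
    out = directory.create_output(name)
    out.write(payload)
    out.close()


d = RamDirectory()
write_segment(d, "segments_1", b"hello index")
print(d.list_all())
print(d.open_input("segments_1").read())
```

An HDFS-backed variant would implement the same four methods against a remote filesystem client, leaving `write_segment` unchanged.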
12.
Putting the Index in HDFS
• Solr relies on the filesystem cache to operate at full speed.
• HDFS is not known for its random access speed.
• Apache Blur has already solved this with an HdfsDirectory that works on top of a BlockDirectory.
• The "block cache" caches the hot blocks of the index off heap (direct byte array) and takes the place of the filesystem cache.
• We contributed back optional 'write' caching.
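The block cache idea can be sketched as a small LRU cache of fixed-size blocks sitting in front of a slow read function. This is an illustration only, not the Blur or Solr implementation: the class name `BlockCache`, the `slow_read` stand-in, and the block sizes are assumptions, and the sketch keeps blocks in an ordinary in-process dictionary rather than in off-heap memory.

```python
from collections import OrderedDict


class BlockCache:
    """LRU cache of fixed-size blocks in front of a slow read function."""

    def __init__(self, read_block, block_size=8192, max_blocks=1024):
        self._read_block = read_block      # callable(file_name, block_no) -> bytes
        self.block_size = block_size
        self._max_blocks = max_blocks
        self._blocks = OrderedDict()       # (file_name, block_no) -> bytes
        self.misses = 0

    def _get_block(self, name, block_no):
        key = (name, block_no)
        if key in self._blocks:
            self._blocks.move_to_end(key)  # mark as most recently used
            return self._blocks[key]
        self.misses += 1
        data = self._read_block(name, block_no)
        self._blocks[key] = data
        if len(self._blocks) > self._max_blocks:
            self._blocks.popitem(last=False)  # evict the least recently used
        return data

    def read(self, name, offset, length):
        """Random-access read assembled from cached blocks."""
        out = bytearray()
        end = offset + length
        while offset < end:
            block_no, start = divmod(offset, self.block_size)
            block = self._get_block(name, block_no)
            chunk = block[start:start + (end - offset)]
            if not chunk:
                break  # read past the end of the file
            out += chunk
            offset += len(chunk)
        return bytes(out)


# Stand-in for a slow remote file: 100 bytes, served 16 bytes per block.
DATA = bytes(range(100))


def slow_read(name, block_no, size=16):
    return DATA[block_no * size:(block_no + 1) * size]


cache = BlockCache(slow_read, block_size=16, max_blocks=4)
first = cache.read("seg.dat", 10, 20)   # touches blocks 0 and 1
again = cache.read("seg.dat", 10, 20)   # served entirely from the cache
```

After the second call `cache.misses` is still 2, which is the effect the slide describes: repeated random reads of hot index blocks never reach the slow filesystem.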
13.
Putting the TransactionLog in HDFS
• HdfsUpdateLog added; it extends UpdateLog.
• Triggered by setting the UpdateLog dataDir to something that starts with hdfs:/ (no additional configuration necessary).
• Same extensive testing as used on UpdateLog.
14.
Running Solr on HDFS
• Set DirectoryFactory to HdfsDirectoryFactory and set the dataDir to a location in HDFS.
• Set LockType to 'hdfs'.
• Use an UpdateLog dataDir location that begins with 'hdfs:/'.
• Or:
  java -Dsolr.directoryFactory=HdfsDirectoryFactory
       -Dsolr.lockType=solr.HdfsLockFactory
       -Dsolr.updatelog=hdfs://host:port/path -jar start.jar
15.
Solr Replication on HDFS
• While Solr has exposed a pluggable DirectoryFactory for a long time now, it was really quite limited.
• Most glaring, only a local filesystem based Directory would work with replication.
• There were also other, more minor areas that relied on a local filesystem Directory implementation.
16.
Future Solr Replication on HDFS
• Take advantage of the "distributed filesystem" and allow for something similar to HBase regions.
• If a node goes down, the data is still available in HDFS, so allow for that index to be automatically served by a node that is still up if it has the capacity.
[Diagram labels: Solr Node, Solr Node, Solr Node, HDFS]
17.
MR Index Building
• Scalable index creation via map-reduce.
• Many initial 'homegrown' implementations sent documents from the reducer to SolrCloud over HTTP.
• To really scale, you want the reducers to create the indexes in HDFS and then load them up with Solr.
• The ideal impl will allow using as many reducers as are available in your Hadoop cluster, and then merge the indexes down to the correct number of 'shards'.
18.
MR Index Building
[Diagram labels: Mapper: Parse input into indexable document (three times); Arbitrary reducing steps of indexing and merging; End-Reducer (shard 1): Index document; End-Reducer (shard 2): Index document; Index shard 1; Index shard 2]
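The flow in the two slides above (map, index on many reducers, then merge down to the shard count) can be sketched in plain Python. This is a toy model, not the actual Hadoop job: the function names, the tab-separated input format, the dictionary-based "index", and the use of `zlib.crc32` with modulo partitioning are all assumptions made for illustration.

```python
from collections import defaultdict
import zlib


def map_parse(line):
    """Mapper: parse one raw input line into an indexable document."""
    doc_id, text = line.split("\t", 1)
    return {"id": doc_id, "text": text}


def reduce_index(docs):
    """Reducer: build a tiny inverted index (term -> sorted doc ids)."""
    index = defaultdict(set)
    for doc in docs:
        for term in doc["text"].lower().split():
            index[term].add(doc["id"])
    return {term: sorted(ids) for term, ids in index.items()}


def merge_indexes(indexes):
    """Merge several partial indexes into one."""
    merged = defaultdict(set)
    for index in indexes:
        for term, ids in index.items():
            merged[term].update(ids)
    return {term: sorted(ids) for term, ids in merged.items()}


def build_shards(lines, num_reducers, num_shards):
    """Index on num_reducers partitions, then merge down to num_shards."""
    if num_reducers % num_shards != 0:
        # Keeps every reducer's documents inside a single final shard.
        raise ValueError("num_reducers must be a multiple of num_shards")
    buckets = defaultdict(list)
    for line in lines:
        doc = map_parse(line)
        h = zlib.crc32(doc["id"].encode("utf-8"))  # stand-in hash
        buckets[h % num_reducers].append(doc)
    partial = {r: reduce_index(docs) for r, docs in buckets.items()}
    shards = []
    for s in range(num_shards):
        parts = [idx for r, idx in partial.items() if r % num_shards == s]
        shards.append(merge_indexes(parts))
    return shards


lines = ["1\tbig data search", "2\tsolr on hadoop",
         "3\thadoop search", "4\tbig index"]
shards = build_shards(lines, num_reducers=4, num_shards=2)
print(len(shards))
```

Because the reducer count is a multiple of the shard count, a document lands in the same final shard no matter how many reducers did the indexing, which is what lets the job use every available reducer.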
19.
SolrCloud Aware
• Can 'inspect' ZooKeeper to learn about the Solr cluster.
• What URLs to GoLive to.
• The Schema to use when building indexes.
• Match hash -> shard assignments of a Solr cluster.
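Matching hash -> shard assignments means the offline job must route each document to the same shard the live cluster would. A rough sketch of range-based routing follows, assuming each shard owns one contiguous slice of a 32-bit hash space. The helper names are hypothetical, and `zlib.crc32` is only a stand-in; a real job would have to use the cluster's own hash function and the ranges read from ZooKeeper.

```python
import bisect
import zlib

HASH_SPACE = 2 ** 32


def shard_ranges(num_shards):
    """Split the 32-bit hash space into contiguous, near-equal ranges."""
    bounds = [HASH_SPACE * i // num_shards for i in range(num_shards + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(num_shards)]


def shard_for(doc_id, ranges):
    """Return the index of the range that contains the document's hash."""
    h = zlib.crc32(doc_id.encode("utf-8"))  # stand-in hash, unsigned 32-bit
    starts = [lo for lo, _ in ranges]
    return bisect.bisect_right(starts, h) - 1


ranges = shard_ranges(4)
for doc_id in ("doc-1", "doc-2", "doc-3"):
    print(doc_id, "->", "shard", shard_for(doc_id, ranges))
```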
20.
GoLive
• After building your indexes with map-reduce, how do you deploy them to your Solr cluster?
• We want it to be easy, so we built the GoLive option.
• GoLive allows you to easily merge the indexes you have created atomically into a live running Solr cluster.
• Paired with the ZooKeeper Aware ability, this allows you to simply point your map-reduce job to your Solr cluster and it will automatically discover how many shards to build and what locations to deliver the final indexes to in HDFS.
21.
Flume Solr Sync
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.
- Apache Flume Website
22.
Flume Solr Sync
[Diagram labels: Other Logs, Flume Agent, Flume Agent, HDFS, Solr]
23.
SolrCloud Aware
• Can 'inspect' ZooKeeper to learn about the Solr cluster.
• What URLs to send data to.
• The Schema for the collection being indexed to.
24.
HBase Integration
• Collaboration between NGData & Cloudera
• NGData are creators of the Lily data management platform
• Lily HBase Indexer
• Service which acts as an HBase replication listener
• HBase replication features, such as filtering, supported
• Replication updates trigger indexing of updates (rows)
• Integrates Morphlines library for ETL of rows
• AL2 licensed on GitHub: https://github.com/ngdata
25.
HBase Integration
[Diagram labels: HDFS, HBase, interactive load, Indexer(s), Triggers on updates, Solr server (five times)]
26.
Morphlines
• A morphline is a configuration file that allows you to define ETL transformation pipelines
• Extract content from input files, transform content, load content (e.g. to Solr)
• Uses Tika to extract content from a large variety of input documents
• Part of the CDK (Cloudera Development Kit)
27.
Morphlines
[Diagram labels: syslog, Flume Agent, Solr Sink, Command: readLine, Command: grok, Command: loadSolr, Solr]
• Open Source framework for simple ETL
• Ships as part of the Cloudera Development Kit (CDK)
• It's a Java library
• AL2 licensed on GitHub: https://github.com/cloudera/cdk
• Similar to Unix pipelines
• Configuration over coding
• Supports common Hadoop formats: Avro, Sequence file, Text
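The "similar to Unix pipelines" idea can be sketched as a chain of small commands, each taking a record and emitting zero or more records. The following Python is only an analogy, not the Morphlines Java API: `read_line`, `split_key_value`, `make_loader`, and `run_pipeline` are hypothetical names, and a plain list stands in for Solr.

```python
def read_line(line):
    """Turn a raw line into a record with a 'message' field."""
    yield {"message": line.rstrip("\n")}


def split_key_value(record):
    """Hypothetical transform: extract key=value pairs from the message."""
    out = dict(record)
    for token in record["message"].split():
        if "=" in token:
            key, value = token.split("=", 1)
            out[key] = value
    yield out


def make_loader(sink):
    """Final command: 'load' each record into a sink (a list stands in for Solr)."""
    def load(record):
        sink.append(record)
        yield record
    return load


def run_pipeline(commands, inputs):
    """Feed every input through the commands in order, like a Unix pipe."""
    records = list(inputs)
    for command in commands:
        records = [out for rec in records for out in command(rec)]
    return records


sink = []
run_pipeline(
    [read_line, split_key_value, make_loader(sink)],
    ["user=alice action=login\n"],
)
print(sink)
```

In a real morphline the list of commands lives in the configuration file rather than in code, which is the "configuration over coding" point on the slide.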
28.
Morphlines
• Integrate with and load into Apache Solr
• Flexible log file analysis
• Single-line records, multi-line records, CSV files
• Regex based pattern matching and extraction
• Integration with Avro
• Integration with Apache Hadoop Sequence Files
• Integration with SolrCell and all Apache Tika parsers
• Auto-detection of MIME types from binary data using Apache Tika
29.
Morphlines
• Scripting support for dynamic Java code
• Operations on fields for assignment and comparison
• Operations on fields with list and set semantics
• if-then-else conditionals
• A small rules engine (tryRules)
• String and timestamp conversions
• slf4j logging
• Yammer metrics and counters
• Decompression and unpacking of arbitrarily nested container file formats
30.
Morphlines Example Config
morphlines : [
  {
    id : morphline1
    importCommands : ["com.cloudera.**", "org.apache.solr.**"]
    commands : [
      { readLine {} }
      {
        grok {
          dictionaryFiles : [/tmp/grok-dictionaries]
          expressions : {
            message : """<%{POSINT:syslog_pri}>%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}"""
          }
        }
      }
      { loadSolr {} }
    ]
  }
]

Example Input
<164>Feb  4 10:46:14 syslog sshd[607]: listening on 0.0.0.0 port 22

Output Record
syslog_pri:164
syslog_timestamp:Feb  4 10:46:14
syslog_hostname:syslog
syslog_program:sshd
syslog_pid:607
syslog_message:listening on 0.0.0.0 port 22
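To see what the grok expression is doing, the same extraction can be approximated with an ordinary regular expression. The sketch below is not how Morphlines evaluates grok: the sub-patterns are simplified guesses at what POSINT, SYSLOGTIMESTAMP, SYSLOGHOST, DATA, and GREEDYDATA match in the grok dictionaries, and `parse_syslog` is a hypothetical helper name. The field names are those from the config above.

```python
import re

# Simplified stand-ins for the grok dictionary patterns used in the config.
SYSLOG = re.compile(
    r"<(?P<syslog_pri>[1-9][0-9]*)>"
    r"(?P<syslog_timestamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<syslog_hostname>\S+) "
    r"(?P<syslog_program>.*?)"
    r"(?:\[(?P<syslog_pid>[1-9][0-9]*)\])?: "
    r"(?P<syslog_message>.*)"
)


def parse_syslog(line):
    """Return the named fields as a dict, or None if the line does not match."""
    match = SYSLOG.match(line)
    return match.groupdict() if match else None


record = parse_syslog(
    "<164>Feb  4 10:46:14 syslog sshd[607]: listening on 0.0.0.0 port 22"
)
for field, value in record.items():
    print(f"{field}:{value}")
```

Run on the example input, this prints the same six fields as the output record above. When a line carries no `[pid]` part, `syslog_pid` comes back as `None`, matching the optional group in the grok expression.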
31.
Hue Integration
• Simple UI
• Navigated, faceted drill down
• Customizable display
• Full text search, standard Solr API and query language
32.
Cloudera Search
https://ccp.cloudera.com/display/SUPPORT/Downloads
Or Google "cloudera-search-download"
33.
Mark Miller, Cloudera
@heismark