OCTOBER 13-16, 2016 • AUSTIN, TX
Streaming Aggregation, New Horizons for Search
Erick Erickson
Workplace Partners, LLC.
Who am I?
• Erick Erickson
• Lucene/Solr committer
• PMC member
• Independent Consultant (Workplace
Partners, LLC)
• Not the Red State Guy
• XKCD fan
My favorite XKCD cartoon
http://xkcd.com/722/
Agenda
• High-level introduction to why you should care
about Streaming Aggregation (SA hereafter)
• High-level view of Parallel SQL processing built
on SA
• High-level view of Streaming Expressions
• Samples from a mortgage database
• Joel Bernstein will do a deep-dive right after this
presentation
• Assuming you are familiar with Solr concepts
Why SA?
• Solr has always had “issues” when
dealing with very large result sets
• Data returned had to be read from disk
and decompressed
• “Deep paging” paid this price too
• Entire result set returned at once == lots
of memory
Quick Overview of SA
• Built on the “export” capabilities introduced in
Solr 4.10
• Exports “tuples” which must be populated from
docValues fields
• Only exports primitive types, e.g. numeric,
string etc.
• Work can be distributed in parallel to worker
nodes
• Can scale to limits of hardware, 10s of millions of
rows a second with ParallelStreams (we think)
DocValues
• DocValues are basic to SA: they are the only fields
that can be specified in the “fl” list of a
Streaming Aggregation query
• Only Solr “primitive” types (int/tint, long/tlong,
string) are allowed in DocValues fields
• Defined per-field in schema.xml
• Specifically, they cannot be solr.TextField-derived
• The Solr doc may contain any field types at all, the
DocValues restriction is only on the fields that
may be exported in “tuples” for SA
We can do SQL in Solr!
select
agency_code, count(*), sum(loan_amount),
avg(loan_amount), min(loan_amount),
max(loan_amount), avg(applicant_income)
from hmda
where phonetic_name='(eric)'
group by agency_code
having (avg(applicant_income) > 50)
order by agency_code asc
And that’s not all!
• We can program arbitrary operations on complete
result sets
• We can parallelize processing across Solr nodes
• We can process very large result sets in limited
memory
• Design processing rate is 400K rows/node/second
Streaming Aggregation == glue
• Solr is built for returning the top N documents
•  Top N is usually small, e.g. 20 docs
•  Decompress to return fields (fl list)
•  Solr commonly deals with billions of documents
• Analytics:
•  Often memory intensive, especially in distributed
mode, if they can be done at all
•  Are becoming more important to this thing we call
“search”
•  Increasingly important in the era of “big data”
Use the Right Tool
• Three “modes”
• Streaming Aggregation to do arbitrary
operations on large result sets – SolrJ
• Streaming Expressions for a non-Java way to
access Streaming Aggregations – HTTP and SolrJ
• Parallel SQL to do selected SQL operations on
large result sets - SolrJ
• SA’s sweet spot: batch operations
• Complements Solr’s capabilities, applies to
different problems
Why not use an RDBMS?
•  Well, if it’s the best tool, you should
•  RDBMSs are not good search engines though
•  Find the average mortgage value for all
users with a name that sounds like “erick”
•  erik, erich, eric, aerick, erick, arik
•  Critical point: The “tuples” processed can be
those that satisfy any arbitrary Solr query
Why not use Spark?
•  Well, if it’s the best tool, you should
•  I’m still trying to understand when one is
preferable to the other
•  SA only needs Solr, no other infrastructure
Why not just use Solr?
• Well, if it’s the best tool, you should
• What I’d do: exhaust Solr’s capabilities then apply
SA to those kinds of problems that OOB Solr isn’t
satisfactory for, especially those that require
processing very large result sets
How does SA work?
• Simple example of how to get a bunch of rows
back and “do something” with them from a Solr
collection
• You can process multiple streams from entirely
different collections if you choose!
• It’s usually a good idea to sort return sets
• Process all of one kind of thing then move on
• Could write the results to file, connector, etc.
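Why sorting matters when working with more than one stream can be sketched without Solr at all. The example below merges two streams that are each already sorted on the same field, holding only the current head of each in memory. Plain Java iterators of maps stand in for SolrJ streams here; `SortedMerge`, `merge`, and the sample field values are illustrative assumptions, not Solr APIs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

final class SortedMerge {
    /** Merges two streams, each already sorted ascending on sortField, into sink. */
    static void merge(Iterator<Map<String, Object>> a,
                      Iterator<Map<String, Object>> b,
                      String sortField,
                      Consumer<Map<String, Object>> sink) {
        Map<String, Object> headA = a.hasNext() ? a.next() : null;
        Map<String, Object> headB = b.hasNext() ? b.next() : null;
        // Only the two current heads are held in memory at any time.
        while (headA != null || headB != null) {
            boolean takeA = headB == null
                    || (headA != null
                        && key(headA, sortField).compareTo(key(headB, sortField)) <= 0);
            if (takeA) {
                sink.accept(headA);
                headA = a.hasNext() ? a.next() : null;
            } else {
                sink.accept(headB);
                headB = b.hasNext() ? b.next() : null;
            }
        }
    }

    private static String key(Map<String, Object> tuple, String field) {
        return String.valueOf(tuple.get(field));
    }

    public static void main(String[] args) {
        // Hypothetical tuples from two different collections, both sorted by agency_code.
        List<Map<String, Object>> first = List.of(
                Map.of("agency_code", "FDIC", "id", "a1"),
                Map.of("agency_code", "HUD", "id", "a2"));
        List<Map<String, Object>> second = List.of(
                Map.of("agency_code", "FRS", "id", "b1"));
        List<Map<String, Object>> merged = new ArrayList<>();
        merge(first.iterator(), second.iterator(), "agency_code", merged::add);
        merged.forEach(t -> System.out.println(t.get("agency_code") + " " + t.get("id")));
    }
}
```

Because the output is itself sorted, a downstream step can still process all of one kind of thing and then move on.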
Sample Data
• Data set of approx 200M mortgages. Selected
fields:
• Year
• Loan amount (thousands)
• Agency (FDIC, FRS, HUD)
• Reason for loan
• Reason for denial
• No personal data, I added randomly generated
names to illustrate search
Use SA through SolrJ
•  The basic pattern is:
•  Create a Solr query
•  Feed it to the appropriate stream
•  Process the “tuples”
•  Right, what’s a “tuple”? A wrapper for a map:
•  keys are the Solr field names
•  values the contents of those fields: must be docValues
•  Why this restriction? Because getting stored fields is
expensive
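As a rough mental model of the "wrapper for a map" idea, here is a minimal stand-in written in plain Java. `SimpleTuple` is a hypothetical class for illustration only; it mimics the typed getters and the `EOF` marker used in the SolrJ code on the following slides, but it is not the SolrJ `Tuple` implementation.

```java
import java.util.HashMap;
import java.util.Map;

final class SimpleTuple {
    final Map<String, Object> fields;   // keys are field names, values are field contents
    final boolean EOF;                  // true for the marker tuple that ends a stream

    SimpleTuple(Map<String, Object> fields) {
        this.fields = new HashMap<>(fields);
        this.EOF = fields.containsKey("EOF");
    }

    String getString(String key) {
        Object value = fields.get(key);
        return value == null ? null : value.toString();
    }

    long getLong(String key) {
        // Numeric values may arrive as Integer or Long; normalize through Number.
        return ((Number) fields.get(key)).longValue();
    }

    public static void main(String[] args) {
        SimpleTuple tuple = new SimpleTuple(Map.of("agency_code", "FDIC", "loan_amount", 250));
        System.out.println(tuple.getString("agency_code") + " " + tuple.getLong("loan_amount"));
        System.out.println(new SimpleTuple(Map.of("EOF", true)).EOF);
    }
}
```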
Code example
• Here’s a bit of code that
• Accesses a 2-shard SolrCloud collection
• Computes the average mortgage by “agency”,
e.g. HUD, OTS, OCC, OFS, FDIC, NCUA
• For a 217M dataset, 335K results (untuned) took
2.1 seconds
Code example
String zkHost = "169.254.80.84:2181";
// Query parameters for the /export handler
Map<String, String> params = new HashMap<>();
params.put("q", "phonetic_name:eric");
params.put("fl", "loan_amount,agency_code");
params.put("sort", "agency_code asc");
params.put("qt", "/export");
….
CloudSolrStream stream = new CloudSolrStream(zkHost, "hmda", params);
stream.open();
More code
while (true) {
  Tuple tuple = stream.read();
  if (tuple.EOF) {
    break;
  }
  // next slide in here
}
Last Code
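// currentAgency, sum and count are declared before the loop (illustrative names)
String newAgency = tuple.getString("agency_code");
long loanAmount = tuple.getLong("loan_amount");
if (newAgency.equals(currentAgency)) {
  // same agency: add to the current counters
  sum += loanAmount;
  count++;
} else {
  // agency changed: log the average for the finished agency, then reset
  if (count > 0) log(currentAgency + " average: " + ((double) sum / count));
  currentAgency = newAgency;
  sum = loanAmount;
  count = 1;
}
// after the loop ends, log the last agency as well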
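Put together, the three code slides amount to a single pass over tuples sorted by agency. The sketch below shows that pass end to end in plain Java, including the final flush for the last agency. It assumes an iterator of maps in place of `CloudSolrStream`; `AgencyAverages` and `averageByAgency` are hypothetical names.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AgencyAverages {
    /** Expects tuples sorted by agency_code; keeps one running sum and count at a time. */
    static Map<String, Double> averageByAgency(Iterator<Map<String, Object>> sortedTuples) {
        Map<String, Double> averages = new LinkedHashMap<>();
        String currentAgency = null;
        long sum = 0;
        long count = 0;
        while (sortedTuples.hasNext()) {
            Map<String, Object> tuple = sortedTuples.next();
            String agency = (String) tuple.get("agency_code");
            long loanAmount = ((Number) tuple.get("loan_amount")).longValue();
            if (!agency.equals(currentAgency)) {
                // Agency changed: record the finished group, then reset the counters.
                if (count > 0) {
                    averages.put(currentAgency, (double) sum / count);
                }
                currentAgency = agency;
                sum = 0;
                count = 0;
            }
            sum += loanAmount;
            count++;
        }
        // Flush the last group, which no agency change will ever trigger.
        if (count > 0) {
            averages.put(currentAgency, (double) sum / count);
        }
        return averages;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> tuples = List.of(
                Map.of("agency_code", "FDIC", "loan_amount", 100L),
                Map.of("agency_code", "FDIC", "loan_amount", 200L),
                Map.of("agency_code", "HUD", "loan_amount", 50L));
        System.out.println(averageByAgency(tuples.iterator()));
    }
}
```

The result map holds one entry per agency, so memory grows with the number of agencies rather than with the number of tuples read.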
More interestingly
•  Using SA, you can:
•  Join across completely different collections
•  Manipulate data in arbitrary ways to suit your use-case
•  Distribute this load across the solr nodes in a
collection
•  Unlike standard search, SA can use cycles on all the
replicas of a shard
•  Process zillions of buckets without blowing up
memory
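A join across two collections can be pictured as a merge join over two sorted streams. The sketch below is plain Java under stated assumptions: both inputs are sorted ascending on the join field, and the right-hand stream has at most one tuple per key. `MergeJoin`, `innerJoin`, and the `agency_name` field are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

final class MergeJoin {
    /** Inner join of two streams sorted on joinField; right keys are assumed unique. */
    static void innerJoin(Iterator<Map<String, Object>> left,
                          Iterator<Map<String, Object>> right,
                          String joinField,
                          Consumer<Map<String, Object>> sink) {
        Map<String, Object> l = advance(left);
        Map<String, Object> r = advance(right);
        while (l != null && r != null) {
            int cmp = String.valueOf(l.get(joinField))
                    .compareTo(String.valueOf(r.get(joinField)));
            if (cmp < 0) {
                l = advance(left);          // left key has no match, skip it
            } else if (cmp > 0) {
                r = advance(right);         // right key is behind, catch up
            } else {
                Map<String, Object> joined = new HashMap<>(r);
                joined.putAll(l);
                sink.accept(joined);
                l = advance(left);          // keep r: more left tuples may share the key
            }
        }
    }

    private static Map<String, Object> advance(Iterator<Map<String, Object>> it) {
        return it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> loans = List.of(
                Map.of("agency_code", "FDIC", "id", "1"),
                Map.of("agency_code", "FDIC", "id", "2"),
                Map.of("agency_code", "HUD", "id", "3"));
        List<Map<String, Object>> agencies = List.of(
                Map.of("agency_code", "FDIC", "agency_name", "Federal Deposit Insurance Corporation"),
                Map.of("agency_code", "HUD", "agency_name", "Housing and Urban Development"));
        List<Map<String, Object>> out = new ArrayList<>();
        innerJoin(loans.iterator(), agencies.iterator(), "agency_code", out::add);
        out.forEach(System.out::println);
    }
}
```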
Parallel SQL
• Use from SolrJ
• The work can be distributed across multiple
“worker” nodes
• Operations can be combined into complex
statements
• Let’s do our previous example with ParallelSQL
• Currently trunk/6.0 only due to Java 8
requirement for SQL parser. No plan to put in 5.x
Parallel SQL
•  SQL “select” is mapped to Solr Search
•  Order by, Group by and Having are all supported
•  Certain aggregations are supported
•  count, sum, avg, min, max
•  You can get crazy here:
•  having ((sum(fieldC) > 1000) AND (avg(fieldY) <= 10))
•  Following query with numWorkers=2, 612K rows
•  383ms
Sample SQL
select
agency_code, count(*), sum(loan_amount),
avg(loan_amount), min(loan_amount),
max(loan_amount)
from hmda
where phonetic_name='(erich)'
group by agency_code
order by agency_code asc
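For readers who think in Java rather than SQL, the semantics of a group by with aggregates, a having filter, and an ascending order by can be sketched with the standard collections API. This is only an in-memory illustration of what the query computes, not how Solr executes it; `GroupByHaving`, `Loan`, and `summarize` are hypothetical names.

```java
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class GroupByHaving {
    static final class Loan {
        final String agencyCode;
        final long loanAmount;

        Loan(String agencyCode, long loanAmount) {
            this.agencyCode = agencyCode;
            this.loanAmount = loanAmount;
        }
    }

    /**
     * group by agency_code, computing count/sum/avg/min/max of loan_amount,
     * having avg(loan_amount) > minAverage, order by agency_code asc.
     */
    static Map<String, LongSummaryStatistics> summarize(List<Loan> loans, double minAverage) {
        // TreeMap keeps the groups ordered by key, which plays the role of "order by ... asc".
        Map<String, LongSummaryStatistics> byAgency = loans.stream().collect(
                Collectors.groupingBy((Loan loan) -> loan.agencyCode, TreeMap::new,
                        Collectors.summarizingLong((Loan loan) -> loan.loanAmount)));
        // "having": drop groups whose average does not exceed the threshold.
        byAgency.values().removeIf(stats -> stats.getAverage() <= minAverage);
        return byAgency;
    }

    public static void main(String[] args) {
        List<Loan> loans = List.of(
                new Loan("HUD", 10), new Loan("FDIC", 100),
                new Loan("FDIC", 200), new Loan("HUD", 20));
        summarize(loans, 50.0).forEach((agency, stats) ->
                System.out.println(agency + " " + stats));
    }
}
```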
Sample SQL
select
agency_code, count(*), sum(loan_amount),
avg(loan_amount), min(loan_amount),
max(loan_amount)
from hmda <- collection name
where phonetic_name='(eric)' <- Solr search
group by agency_code <- Solr field
order by agency_code asc <- Solr field
Parallel SQL in SolrJ
Map<String, String> params = new HashMap<>();
params.put(CommonParams.QT, "/sql");
params.put("numWorkers", "2");
params.put("sql",
    "select agency_code, count(*), sum(loan_amount), avg(loan_amount), " +
    "min(loan_amount), max(loan_amount), avg(applicant_income) " +
    "from hmda where phonetic_name='eric' " +
    "group by agency_code " +
    "having (avg(applicant_income) > 50) " +
    "order by agency_code asc");
SolrStream stream = new SolrStream("http://ericks-mac-pro:8981/solr/hmda", params);
Parallel SQL in SolrJ
SolrStream stream = new SolrStream("http://ericks-mac-pro:8981/solr/hmda", params);
try {
  stream.open();
  while (true) {
    Tuple tuple = stream.read();
    // stop at the end-of-stream marker before printing anything
    if (tuple.EOF) {
      break;
    }
    dumpTuple(tuple);
    log("");
  }
} finally {
  if (stream != null) stream.close();
}
Sample tuples returned
agency_code=FDIC
max(loan_amount)=972.0
sum(loan_amount)=53307.0
count(*)=224.0
avg(loan_amount)=237.97767857142858
min(loan_amount)=5.0
Sample tuples returned
agency_code=FRS
max(loan_amount)=3000.0
sum(loan_amount)=179702.0
count(*)=834.0
avg(loan_amount)=215.47002398081534
min(loan_amount)=1.0
Current Gotchas
• All fields must be lower case (possibly with
underscores)
• Trunk (6.0) only; a 5.x backport is not planned.
(Calcite)
• Requires solrconfig entries
• Only nodes hosting collections can act as worker
nodes (But not necessarily the queried collection)
• Be prepared to dig, documentation is also
evolving
Streaming expressions
• Provide a simple query language for SolrCloud
that merges search with parallel computing
without Java programming
• Operations can be nested
Streaming Expressions
• Can access at least two ways:
• HTTP
• SolrJ
Streaming Expressions
• Operations:
• search
• merge – can be used with separate collections
• group
• unique
• top
• parallel
Example Code
curl --data-urlencode \
'stream=group(
  search(hmda,q="*:*",
    fl="id,agency_code",
    sort="agency_code asc"),
  by="agency_code asc")' \
http://169.254.80.84:8981/solr/hmda/stream
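The curl request above nests one operation inside another and sends the result as a URL-encoded form field named `stream`. The Java sketch below builds the same expression string and form body using only the standard library. The helper names (`StreamExpressionRequest`, `search`, `group`, `formBody`) are hypothetical string builders, not SolrJ classes, and actually posting the body to the `/stream` endpoint is left to whatever HTTP client you prefer.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class StreamExpressionRequest {
    /** Builds a search(...) expression; values are inserted as given, without escaping. */
    static String search(String collection, String q, String fl, String sort) {
        return "search(" + collection
                + ",q=\"" + q + "\""
                + ",fl=\"" + fl + "\""
                + ",sort=\"" + sort + "\")";
    }

    /** Wraps an inner expression in group(...), showing how operations nest. */
    static String group(String inner, String by) {
        return "group(" + inner + ",by=\"" + by + "\")";
    }

    /** Form body equivalent to curl's --data-urlencode 'stream=...'. */
    static String formBody(String expression) {
        return "stream=" + URLEncoder.encode(expression, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String expression = group(
                search("hmda", "*:*", "id,agency_code", "agency_code asc"),
                "agency_code asc");
        System.out.println(expression);
        System.out.println(formBody(expression));
    }
}
```

Keeping the expression as a composed string makes it easy to log exactly what is sent, which helps when an expression is rejected by the server.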
Response
{"result-set":{"docs":[
{"agency_code":"FDIC","_MAPS_":[
{"agency_code":"FDIC","id":"2004_CD1.CSV_3955"}
…]
{"agency_code":"NCUA","_MAPS_":[
{"agency_code":"NCUA","id":"2004_CD1.CSV_2816"}
…]
{"EOF":true,"RESPONSE_TIME":4}]}}
Future Enhancements
• This capability is quite new (Solr 5.2), with
significant enhancements in every release
• Some is still “baking” in trunk/6.0
• A JDBC Driver so any Java application can treat
Solr like a SQL database, e.g. for visualization
• More user-friendly interfaces (widgets?)
• More docs, how to’s, etc.
• “Select Into”
No time for (some)
•  Oh My. Subclasses of TupleStream:
•  MetricStream
•  RollupStream (for high cardinality faceting)
•  UniqueStream
•  FilterStream (Set operations)
•  MergeStream
•  ReducerStream
•  SolrStream for non-SolrCloud
No time for (cont)
• Parallel execution details
• Distributing SA across “Worker nodes”
• All of the Parallel SQL composition
possibilities
• All of the Streaming Expression
operations
Resources
• Ref guide for streaming expressions: https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
• Solr user’s list: http://lucene.apache.org/solr/resources.html
• Joel Bernstein’s blogs: http://joelsolr.blogspot.com/2015/04/in-line-streaming-aggregation.html
• Parallel SQL Solr JIRA: https://issues.apache.org/jira/browse/SOLR-7560
Resources (cont)
• Streaming expressions JIRA: https://issues.apache.org/jira/browse/SOLR-7377
• Background for SA: http://heliosearch.org/streaming-aggregation-for-solrcloud/
• Background for Parallel SQL: http://heliosearch.org/heliosearch-sql-sub-project/
• Getting the code, compiling, etc.: https://wiki.apache.org/solr/HowToContribute
Thanks!

 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
 

Recently uploaded

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 

Recently uploaded (20)

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 

Streaming Aggregation in Solr - New Horizons for Search: Presented by Erick Erickson

  • 1. October 13-16, 2016 • Austin, TX
  • 2. Streaming Aggregation, New Horizons for Search Erick Erickson Workplace Partners, LLC.
  • 3. Who am I? • Erick Erickson • Lucene/Solr committer • PMC member • Independent Consultant (Workplace Partners, LLC) • Not the Red State Guy • XKCD fan
  • 4. My favorite XKCD cartoon http://xkcd.com/722/
  • 5. Agenda • High-level introduction to why you should care about Streaming Aggregation (SA hereafter) • High-level view of Parallel SQL processing built on SA • High-level view of Streaming Expressions • Samples from a mortgage database • Joel Bernstein will do a deep-dive right after this presentation • Assuming you are familiar with Solr concepts
  • 6. Why SA? • Solr has always had “issues” when dealing with very large result sets • Data returned had to be read from disk and decompressed • “Deep paging” paid this price too • Entire result set returned at once == lots of memory
  • 7. Quick Overview of SA • Built on the “export” capabilities introduced in Solr 4.10 • Exports “tuples” which must be populated from docValues fields • Only exports primitive types, e.g. numeric, string etc. • Work can be distributed in parallel to worker nodes • Can scale to limits of hardware, 10s of millions of rows a second with ParallelStreams (we think)
  • 8. DocValues • DocValues are basic to SA: they are the only fields that can be specified in the “fl” list of a Streaming Aggregation query • Only Solr “primitive” types (int/tint, long/tlong, string) are allowed in DocValues fields • Defined per-field in schema.xml • Specifically, cannot be solr.TextField-derived • The Solr doc may contain any field types at all; the DocValues restriction is only on the fields that may be exported in “tuples” for SA
  • 9. We can do SQL in Solr! select agency_code, count(*), sum(loan_amount), avg(loan_amount), min(loan_amount), max(loan_amount), avg(applicant_income) from hmda where phonetic_name='(eric)' group by agency_code having (avg(applicant_income) > 50) order by agency_code asc
  • 10. And that’s not all! • We can program arbitrary operations on complete result sets • We can parallelize processing across Solr nodes • We can process very large result sets in limited memory • Design processing rate is 400K rows/node/second
  • 11. Streaming Aggregation == glue • Solr is built for returning the top N documents • Top N is usually small, e.g. 20 docs • Decompress to return fields (fl list) • Solr commonly deals with billions of documents • Analytics: • Often memory intensive, especially in distributed mode, if they can be done at all • Are becoming more important to this thing we call “search” • Increasingly important in the era of “big data”
  • 12. Use the Right Tool • Three “modes” • Streaming Aggregation to do arbitrary operations on large result sets – SolrJ • Streaming Expressions for a non-Java way to access Streaming Aggregations – HTTP and SolrJ • Parallel SQL to do selected SQL operations on large result sets – SolrJ • SA’s sweet spot: batch operations • Complements Solr’s capabilities, applies to different problems
  • 13. Why not use an RDBMS? •  Well, if it’s the best tool, you should •  RDBMSs are not good search engines though •  Find the average mortgage value for all users with a name that sounds like “erick” •  erik, erich, eric, aerick, erick, arik •  Critical point: The “tuples” processed can be those that satisfy any arbitrary Solr query
  • 14. Why not use Spark? •  Well, if it’s the best tool, you should •  I’m still trying to understand when one is preferable to the other •  SA only needs Solr, no other infrastructure
  • 15. Why not just use Solr? • Well, if it’s the best tool, you should • What I’d do: exhaust Solr’s capabilities then apply SA to those kinds of problems that OOB Solr isn’t satisfactory for, especially those that require processing very large result sets
  • 16. How does SA work? • Simple example of how to get a bunch of rows back and “do something” with them from a Solr collection • You can process multiple streams from entirely different collections if you choose! • It’s usually a good idea to sort return sets • Process all of one kind of thing then move on • Could write the results to file, connector, etc.
  • 17. Sample Data • Data set of approx 200M mortgages. Selected fields: • Year • Loan amount (thousands) • Agency (FDIC, FRS, HUD) • Reason for loan • Reason for denial • No personal data, I added randomly generated names to illustrate search
  • 18. Use SA through SolrJ •  The basic pattern is: •  Create a Solr query •  Feed it to the appropriate stream •  Process the “tuples” •  Right, what’s a “tuple”? A wrapper for a map: •  keys are the Solr field names •  values the contents of those fields: must be docValues •  Why this restriction? Because getting stored fields is expensive
  • 19. Code example • Here’s a bit of code that • Accesses a 2-shard SolrCloud collection • Computes the average mortgage by “agency”, e.g. HUD, OTS, OCC, OFS, FDIC, NCUA • For a 217M dataset, 335K results (untuned) took 2.1 seconds
  • 20. Code example String zkHost = "169.254.80.84:2181"; Map params = new HashMap(); params.put("q", "phonetic_name:eric"); params.put("fl", "loan_amount,agency_code"); params.put("sort", "agency_code asc"); params.put("qt", "/export"); …. CloudSolrStream stream = new CloudSolrStream(zkHost, "hmda", params); stream.open();
  • 21. More code while (true) { Tuple tuple = stream.read(); if (tuple.EOF) { break; } // next slide in here }
  • 22. Last Code String newAgency = tuple.getString("agency_code"); long loanAmount = tuple.getLong("loan_amount"); if (newAgency.equals(thisAgency)) { add_to_current_counters } else { log(average for this agency); reset_for_next_agency }
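The loop on slides 21 and 22 relies on the tuples arriving sorted by agency_code, so one agency can be finished before the next begins. A minimal sketch of that pattern, assuming plain `Map` objects stand in for Solr tuples and an in-memory iterator stands in for the stream (the class and method names here are hypothetical, not part of SolrJ):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the SolrJ tuple loop: each tuple is a plain Map,
// and the iterator is assumed to be sorted by agency_code already.
class AgencyAverager {

    // Walks the sorted tuples once, keeping only the counters for the current agency.
    static Map<String, Double> averageByAgency(Iterator<Map<String, Object>> sortedTuples) {
        Map<String, Double> averages = new LinkedHashMap<>();
        String currentAgency = null;
        long sum = 0;
        long count = 0;
        while (sortedTuples.hasNext()) {
            Map<String, Object> tuple = sortedTuples.next();
            String agency = (String) tuple.get("agency_code");
            long loanAmount = ((Number) tuple.get("loan_amount")).longValue();
            if (currentAgency != null && !agency.equals(currentAgency)) {
                // Agency changed: record the finished average and reset the counters.
                averages.put(currentAgency, (double) sum / count);
                sum = 0;
                count = 0;
            }
            currentAgency = agency;
            sum += loanAmount;
            count++;
        }
        if (currentAgency != null) {
            // Flush the last agency, which has no following row to trigger the reset.
            averages.put(currentAgency, (double) sum / count);
        }
        return averages;
    }

    // Builds a map-based tuple with the two fields used above.
    static Map<String, Object> tuple(String agency, long loanAmount) {
        Map<String, Object> t = new LinkedHashMap<>();
        t.put("agency_code", agency);
        t.put("loan_amount", loanAmount);
        return t;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(tuple("FDIC", 100));
        rows.add(tuple("FDIC", 200));
        rows.add(tuple("HUD", 50));
        System.out.println(averageByAgency(rows.iterator()));
    }
}
```

Because only one agency's counters are live at a time, memory use does not grow with the number of rows, which is the point of sorting the export.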
  • 23. More interestingly •  Using SA, you can: •  Join across completely different collections •  Manipulate data in arbitrary ways to suit your use-case •  Distribute this load across the solr nodes in a collection •  Unlike standard search, SA can use cycles on all the replicas of a shard •  Process zillions of buckets without blowing up memory
  • 24. Parallel SQL • Use from SolrJ • The work can be distributed across multiple “worker” nodes • Operations can be combined into complex statements • Let’s do our previous example with Parallel SQL • Currently trunk/6.0 only due to Java 8 requirement for SQL parser. No plan to put in 5.x
  • 25. Parallel SQL • SQL “select” is mapped to Solr Search • Order by, Group by and Having are all supported • Certain aggregations are supported • count, sum, avg, min, max • You can get crazy here: • having ((sum(fieldC) > 1000) AND (avg(fieldY) <= 10)) • Following query with numWorkers=2, 612K rows • 383ms
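To make the semantics of group by, having and the supported aggregations concrete, here is a small in-memory sketch of what such a query computes. It is only an illustration using the Java standard library, not how Solr executes Parallel SQL; the `Loan` type and the `aggregate` method are hypothetical names introduced for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// In-memory illustration of: select agency_code, count(*), sum, avg, min, max
// ... group by agency_code having avg(loan_amount) > X order by agency_code asc
class GroupByHavingSketch {

    // Hypothetical row type holding the two fields the query touches.
    static final class Loan {
        final String agencyCode;
        final long loanAmount;

        Loan(String agencyCode, long loanAmount) {
            this.agencyCode = agencyCode;
            this.loanAmount = loanAmount;
        }
    }

    static Map<String, LongSummaryStatistics> aggregate(List<Loan> rows, double minAverage) {
        // "group by agency_code": the TreeMap keeps keys in ascending order ("order by ... asc"),
        // and LongSummaryStatistics carries count, sum, avg, min and max per group.
        Map<String, LongSummaryStatistics> byAgency = rows.stream().collect(
                Collectors.groupingBy(
                        (Loan row) -> row.agencyCode,
                        TreeMap::new,
                        Collectors.summarizingLong((Loan row) -> row.loanAmount)));
        // "having": drop every group whose average is not above the threshold.
        byAgency.values().removeIf(stats -> stats.getAverage() <= minAverage);
        return byAgency;
    }

    public static void main(String[] args) {
        List<Loan> rows = Arrays.asList(
                new Loan("FDIC", 100), new Loan("HUD", 10), new Loan("FDIC", 300));
        aggregate(rows, 50).forEach((agency, stats) ->
                System.out.println(agency + " " + stats));
    }
}
```

Note that having filters whole groups after aggregation, whereas the where clause (the Solr search) filters individual documents before any grouping happens.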
  • 26. Sample SQL select agency_code, count(*), sum(loan_amount), avg(loan_amount), min(loan_amount), max(loan_amount) from hmda where phonetic_name='(erich)' group by agency_code order by agency_code asc
  • 27. Sample SQL select agency_code, count(*), sum(loan_amount), avg(loan_amount), min(loan_amount), max(loan_amount) from hmda <- collection name where phonetic_name='(eric)' group by agency_code order by agency_code asc
  • 28. Sample SQL select agency_code, count(*), sum(loan_amount), avg(loan_amount), min(loan_amount), max(loan_amount) from hmda where phonetic_name='(eric)' <- Solr search group by agency_code order by agency_code asc
  • 29. Sample SQL select agency_code, count(*), sum(loan_amount), avg(loan_amount), min(loan_amount), max(loan_amount) from hmda where phonetic_name='(eric)' group by agency_code <- Solr field order by agency_code asc <- Solr field
  • 30. Parallel SQL in SolrJ Map params = new HashMap(); params.put(CommonParams.QT, "/sql"); params.put("numWorkers", "2"); params.put("sql", "select agency_code, count(*), sum(loan_amount), avg(loan_amount), " + "min(loan_amount), max(loan_amount), avg(applicant_income) from hmda where phonetic_name='eric' " + "group by agency_code " + "having (avg(applicant_income) > 50) " + "order by agency_code asc"); SolrStream stream = new SolrStream("http://ericks-mac-pro:8981/solr/hmda", params);
  • 31. Parallel Sql in SolrJ Map params = new HashMap(); params.put(CommonParams.QT, "/sql"); params.put("numWorkers", "2"); params.put("sql", "select agency_code, count(*), sum(loan_amount), avg(loan_amount), " + "min(loan_amount), max(loan_amount), avg(applicant_income) from hmda where phonetic_name='eric' " + "group by agency_code " + "having (avg(applicant_income) > 50) " + "order by agency_code asc");
  • 32. Parallel SQL in SolrJ SolrStream stream = new SolrStream("http://ericks-mac-pro:8981/solr/hmda", params); try { stream.open(); while (true) { Tuple tuple = stream.read(); dumpTuple(tuple); log(""); if (tuple.EOF) { break; } } } finally { if (stream != null) stream.close(); }
  • 33. Parallel SQL in SolrJ SolrStream stream = new SolrStream("http://ericks-mac-pro:8981/solr/hmda", params); try { stream.open(); while (true) { Tuple tuple = stream.read(); if (tuple.EOF) { break; } dumpTuple(tuple); } } finally { if (stream != null) stream.close(); }
  • 36. Current Gotchas • All field names must be lower case (possibly with underscores) • Trunk (6.0) only; a 5.x backport is not planned (Calcite) • Requires solrconfig entries • Only nodes hosting collections can act as worker nodes (but not necessarily the queried collection) • Be prepared to dig, documentation is also evolving
  • 37. Streaming expressions • Provide a simple query language for SolrCloud that merges search with parallel computing without Java programming • Operations can be nested
  • 38. Streaming Expressions • Can be accessed in at least two ways: • HTTP • SolrJ
  • 39. Streaming Expressions • Operations: • search • merge – can be used with separate collections • group • unique • top • parallel
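As an illustration of what a merge operation over two sorted streams does, here is a minimal sketch that combines two already-sorted iterators into one sorted sequence while holding only one pending element per input. This is a generic two-way merge written for this example under the assumption of non-null elements; `SortedMerge` is a hypothetical class, not Solr's MergeStream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Generic two-way merge of sorted inputs; elements are assumed to be non-null,
// because null is used internally to mark an exhausted input.
class SortedMerge<T> implements Iterator<T> {
    private final Iterator<T> left;
    private final Iterator<T> right;
    private final Comparator<? super T> order;
    private T nextLeft;
    private T nextRight;

    SortedMerge(Iterator<T> left, Iterator<T> right, Comparator<? super T> order) {
        this.left = left;
        this.right = right;
        this.order = order;
        this.nextLeft = advance(left);
        this.nextRight = advance(right);
    }

    // Returns the next element of the input, or null when it is exhausted.
    private static <E> E advance(Iterator<E> input) {
        return input.hasNext() ? input.next() : null;
    }

    @Override
    public boolean hasNext() {
        return nextLeft != null || nextRight != null;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Take from the left input when the right is exhausted or the left head sorts first.
        boolean takeLeft = nextRight == null
                || (nextLeft != null && order.compare(nextLeft, nextRight) <= 0);
        T result;
        if (takeLeft) {
            result = nextLeft;
            nextLeft = advance(left);
        } else {
            result = nextRight;
            nextRight = advance(right);
        }
        return result;
    }

    public static void main(String[] args) {
        Iterator<String> a = Arrays.asList("FDIC", "HUD").iterator();
        Iterator<String> b = Arrays.asList("FRS", "OCC").iterator();
        SortedMerge<String> merged = new SortedMerge<>(a, b, Comparator.naturalOrder());
        List<String> out = new ArrayList<>();
        while (merged.hasNext()) {
            out.add(merged.next());
        }
        System.out.println(out);
    }
}
```

The merge only works if both inputs are sorted on the same key in the same direction, which is one more reason to sort the underlying searches.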
  • 40. Example Code curl --data-urlencode 'stream=group( search(hmda,q="*:*", fl="id,agency_code", sort="agency_code asc"), by="agency_code asc")' http://169.254.80.84:8981/solr/hmda/stream
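For readers who want to send the same expression from Java rather than curl, the sketch below builds the form-encoded request body that `--data-urlencode` produces. It only constructs the body and does not contact a server; the class and method names are hypothetical, and `URLEncoder.encode(String, Charset)` requires Java 10 or later. `URLEncoder` writes a space as `+`, which is a valid form encoding of a space.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds the application/x-www-form-urlencoded body for a streaming expression.
class StreamRequestBody {

    // The parameter name "stream" is taken from the curl example on the slide.
    static String formBody(String expression) {
        return "stream=" + URLEncoder.encode(expression, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String expression = "group(search(hmda,q=\"*:*\",fl=\"id,agency_code\","
                + "sort=\"agency_code asc\"),by=\"agency_code asc\")";
        // POST this body to the collection's /stream handler, e.g. .../solr/hmda/stream.
        System.out.println(formBody(expression));
    }
}
```

Encoding matters here because the expression contains quotes, commas, parentheses and spaces that would otherwise be mangled in transit.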
  • 44. Future Enhancements • This capability is quite new (Solr 5.2), with significant enhancements in every release • Some is still “baking” in trunk/6.0 • A JDBC Driver so any Java application can treat Solr like a SQL database, e.g. for visualization • More user-friendly interfaces (widgets?) • More docs, how-tos, etc. • “Select Into”
  • 45. No time for (some) •  Oh My. Subclasses of TupleStream: •  MetricStream •  RollupStream (for high cardinality faceting) •  UniqueStream •  FilterStream (Set operations) •  MergeStream •  ReducerStream •  SolrStream for non-SolrCloud
  • 46. No time for (cont) • Parallel execution details • Distributing SA across “Worker nodes” • All of the Parallel SQL composition possibilities • All of the Streaming Expression operations
  • 47. Resources • Ref guide for streaming expressions: https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions • Solr user’s list: http://lucene.apache.org/solr/resources.html • Joel Bernstein’s blogs: http://joelsolr.blogspot.com/2015/04/in-line-streaming-aggregation.html • Parallel SQL Solr JIRA: https://issues.apache.org/jira/browse/SOLR-7560
  • 48. Resources (cont) • Streaming expressions JIRA: https://issues.apache.org/jira/browse/SOLR-7377 • Background for SA: http://heliosearch.org/streaming-aggregation-for-solrcloud/ • Background for Parallel SQL: http://heliosearch.org/heliosearch-sql-sub-project/ • Getting the code, compiling, etc.: https://wiki.apache.org/solr/HowToContribute