October 13-16, 2016 • Austin, TX
OLAP Battle: SolrCloud vs. HBase
Dragan Milosevic
Chief Search Architect, zanox
Table of Contents
Introduce zanox and me
What is OLAP?
SolrCloud with Doc-Values
HBase with End-Point Coprocessors
Response Time Comparison
Summary
Questions
Who am I?
Chief Search Architect at Zanox
9+ years of experience in building distributed search systems based on Lucene
Lucid Certified Apache Solr/Lucene Developer
6+ years of using Hadoop and 4+ years of using HBase for data mining
Cloudera Certified Developer for Apache Hadoop and HBase
During my PhD study at Technical University Berlin I applied different machine-learning techniques,
mainly to optimise resource usage while performing distributed search
See my book: “Beyond Centralised Search Engines: An Agent-Based Filtering Framework”
How can you reach me?
dragan.milosevic@zanox.com
http://www.linkedin.com/in/draganmilosevic
Zanox Network
What is OLAP?
OLAP = Online Analytical Processing
What is SolrCloud?
Lucene Stored Fields
Lucene Stored Fields are optimized for retrieving the field values of a relatively
small number of documents (the top hits to be presented to the user)
It is difficult to use them efficiently for reporting because many documents
(thousands, even millions) are selected and their fields will be needed
If stored fields are used for reporting, machine learning has to be used to analyze queries,
determine the optimal order of documents and optimize the loading of hits
Lucene Revolution San Diego 2013: Analytics in OLAP with Lucene and Hadoop
Tip 2
Tip 1
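The difference can be illustrated with a toy sketch in plain Java (this is not Lucene code). Stored fields resemble a row-oriented layout, where summing one field means loading the whole record of every selected document. A column-oriented layout, which is the idea behind Doc-Values, keeps the values of one field for all documents in a single array indexed by document id. The class name, field names and values are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LayoutSketch {
    // Builds one row-oriented record holding all fields of a document.
    static Map<String, Object> doc(float adrank, String title) {
        Map<String, Object> record = new HashMap<>();
        record.put("adrank", adrank);
        record.put("title", title);
        return record;
    }

    // Row-oriented: the full record is fetched even though one field is needed.
    static double sumFromRows(List<Map<String, Object>> rows, int[] docIds, String field) {
        double sum = 0;
        for (int docId : docIds) {
            Map<String, Object> record = rows.get(docId);
            sum += ((Number) record.get(field)).doubleValue();
        }
        return sum;
    }

    // Column-oriented: one array per field, indexed by document id.
    static double sumFromColumn(float[] column, int[] docIds) {
        double sum = 0;
        for (int docId : docIds) {
            sum += column[docId]; // touches only the requested field
        }
        return sum;
    }
}
```

Both methods return the same sum; the point of the sketch is only how much unrelated data each one has to touch per selected document.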
Doc-Values behind SolrCloud
Doc-Values Step-by-Step – schema.xml & Search Component class
Enable Doc-Values in schema.xml for fields that should be aggregated (with stored="false")
<field name="adrank" type="tfloat" indexed="false" stored="false" docValues="true"/>
Implement Search Component that uses Doc-Values and aggregates
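import java.io.IOException;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.NumericDocValues;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.DocIterator;
public class SummingComponent extends SearchComponent {
  // Request the full set of matching documents, not only the top hits
  public void prepare(ResponseBuilder rb) throws IOException
  { rb.setNeedDocSet(true); }
  public void process(ResponseBuilder rb) throws IOException {
    DocIterator itr = rb.getResults().docSet.iterator();
    LeafReader reader = rb.req.getSearcher().getLeafReader();
    // Random-access get(docId) is the Lucene 5/6 Doc-Values API
    NumericDocValues docValues = reader.getNumericDocValues("adrank");
    double sum = 0;
    // Each value is read as the int bits of a float, hence the conversion
    while (itr.hasNext()) { sum += Float.intBitsToFloat((int) docValues.get(itr.nextDoc())); }
    rb.rsp.add("sum", sum);
  }
  public String getDescription() { return "Sums adrank over all matching documents"; } // abstract in SearchComponent, not on the original slide
}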
Tip 3
Doc-Values Step-by-Step – solrconfig.xml
Build a jar with the search component, put it somewhere under contrib and reference it in solrconfig.xml
<lib dir="${solr.install.dir:../../../..}/contrib/zanox/lib" regex=".*\.jar" />
Register the newly created search component for summing in solrconfig.xml
<searchComponent name="summer" class="com.zanox.search.SummingComponent">
  <str name="…">…</str>
</searchComponent>
Use the search component in a request handler
<requestHandler name="/summing" class="solr.SearchHandler">
  <lst name="defaults"> … </lst> …
  <arr name="components">
    <str>query</str>
    <str>summer</str>
  </arr>
</requestHandler>
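Once the handler is registered, the sum could be requested over HTTP. The following is a minimal sketch in plain Java that only builds the request URL. The base URL, port and the collection name `reports` are assumptions; `/summing` is the handler path registered above, and `q`, `rows` and `wt` are standard Solr query parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class SummingRequest {
    // Builds the URL of a request to the /summing handler of one collection.
    static String url(String solrBaseUrl, String collection, String query) {
        try {
            return solrBaseUrl + "/" + collection + "/summing"
                    + "?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&rows=0&wt=json"; // rows=0: only the aggregate is of interest
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical local installation and collection name
        System.out.println(url("http://localhost:8983/solr", "reports", "*:*"));
    }
}
```

The response to such a request should carry the `sum` entry that the component adds to the response.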
What is HBase?
“HBase is an open source, non-relational, distributed database modeled after Google's BigTable and written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed Filesystem), providing BigTable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data …”
https://en.wikipedia.org/wiki/Apache_HBase
“we could simply state that HBase is a faithful, open source implementation of Google’s BigTable. But that would be a bit too simplistic …”
HBase: The Definitive Guide, Lars George
“HBase is a database: the Hadoop database. It’s often described as a sparse, distributed, persistent, multidimensional sorted map, which is indexed by row-key, column key, and timestamp. You’ll hear people refer to it as a key value store, a column family-oriented database …”
HBase in Action, Nick Dimiduk, Amandeep Khurana
HBase Table and Regions
Row-Key Design for Reporting
Many Different Columns
Column Family Challenge
Hundreds of columns have to be assigned to a few column families
The goal is to minimize the number of column families that must be accessed for a requested query
Fewer column families accessed means fewer files to be checked
Because of how the splitting of regions is implemented in HBase region servers,
the difference in size between column families should be minimized
Small column families will be spread across many regions, so many files will have
to be checked while processing queries and response time will increase
Irrelevant columns that are read but not needed for the query should be minimized
Data may be duplicated: a column can be put in multiple column families to optimize access
Important columns that are needed for many queries will be duplicated
But the amount of duplication should be minimized so as not to waste space
NP-hard optimization problem
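The first goal can be made concrete with a small cost function. The sketch below, in plain Java, counts how many column families a query has to touch under a given assignment of columns to families, where a column may be duplicated in several families. It picks families greedily, so the result is an upper bound and not necessarily the true minimum. The class, family and column names are hypothetical, and nothing here is HBase API.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class FamilyCost {
    // families: family name -> columns stored in it (duplicates across families allowed)
    static int familiesTouched(Map<String, Set<String>> families, Set<String> queryColumns) {
        Set<String> missing = new HashSet<>(queryColumns);
        int touched = 0;
        while (!missing.isEmpty()) {
            String best = null;
            int bestGain = 0;
            // Greedy step: take the family that covers most of the still-missing columns
            for (Map.Entry<String, Set<String>> family : families.entrySet()) {
                int gain = 0;
                for (String column : family.getValue()) {
                    if (missing.contains(column)) gain++;
                }
                if (gain > bestGain) { bestGain = gain; best = family.getKey(); }
            }
            if (best == null) {
                throw new IllegalArgumentException("columns not stored in any family: " + missing);
            }
            missing.removeAll(families.get(best));
            touched++;
        }
        return touched;
    }
}
```

Evaluating this function over the expected queries for different candidate assignments shows the effect of duplication: putting two frequently co-requested columns into one additional family reduces the count for that query at the price of extra storage.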
End Point Coprocessors
End-Point Coprocessors provide a way to run custom code on
region servers, and aggregation can be this code
They are therefore analogous to both stored procedures and
map-reduce jobs, as they execute logic where the data resides
Without them, the client has to retrieve all values before
applying the logic, and network bandwidth will probably become
the bottleneck
They are as network-friendly as possible, as only the
summarized results from region servers are sent to the client
HBase provides a high-level interface for calling End-Point
Coprocessors and automatically selects the regions to be queried
Tip 4
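The data flow can be mimicked in plain Java without any HBase dependency. In this sketch each hypothetical `Region` sums its own values locally, and the client merges only one number per region, which is the property that keeps network traffic small. The class and method names are illustrative and are not the HBase coprocessor API.

```java
import java.util.Arrays;
import java.util.List;

final class EndpointSketch {
    // Stands in for a region: holds values and aggregates them where they live.
    static final class Region {
        private final double[] values;

        Region(double... values) { this.values = values; }

        double localSum() {
            double sum = 0;
            for (double v : values) sum += v;
            return sum; // only this single number leaves the region
        }
    }

    // Client side: merge one partial result per region.
    static double clientSum(List<Region> regions) {
        double total = 0;
        for (Region region : regions) total += region.localSum();
        return total;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(new Region(1, 2, 3), new Region(4, 5), new Region());
        System.out.println(clientSum(regions));
    }
}
```

Without the local step the client would have to receive every value of every region before summing, which is the bandwidth problem described above.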
HBase Reporting Guidelines
Queries have to be known and analyzed thoroughly because
Row-keys should be designed so that needed records are saved next to each other
and can therefore be accessed quickly
It has to be decided which column is saved in which column-family so that the
number of accessed column-families is minimized
Unexpected queries can be very expensive because
The structure of row-keys alone decides the order in which records are saved; consequently, the
needed records will not be saved next to each other and cannot be accessed quickly
Even though this sounds like a huge disadvantage, typically only standard reports (queries)
are made available to end-users, so unexpected queries are not possible
Tip 5
Tip 6
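A sorted map can illustrate why the row-key design matters. The sketch assumes a hypothetical composite key made of a zero-padded advertiser id followed by a date, for example `0000042|2016-10-13`. Because keys are kept in lexicographic order, as in an HBase table, all rows of one advertiser within a date range are adjacent and can be read with one range scan. `TreeMap` stands in for the table; this is not HBase API.

```java
import java.util.SortedMap;
import java.util.TreeMap;

final class RowKeySketch {
    // Fixed-width, zero-padded id so that lexicographic order matches numeric order.
    static String rowKey(int advertiserId, String isoDate) {
        return String.format("%07d|%s", advertiserId, isoDate);
    }

    // Sums the values of one advertiser from fromDate (inclusive) to toDate (exclusive).
    static long sumRange(TreeMap<String, Long> table, int advertiserId, String fromDate, String toDate) {
        SortedMap<String, Long> scan =
                table.subMap(rowKey(advertiserId, fromDate), rowKey(advertiserId, toDate));
        long sum = 0;
        for (long value : scan.values()) sum += value; // only adjacent rows are visited
        return sum;
    }
}
```

A query for one date across all advertisers cannot be expressed as a single range over these keys and would have to scan the whole map, which mirrors the cost of unexpected queries.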
Response Time Comparison – Single Column
Tip 7
If aggregation is performed on a single attribute (column), there is no clear winner
Therefore use the technology that you are more familiar with
Response Time Comparison – Multiple Columns
Tip 8
If aggregation is performed on many big attributes (columns), there is a clear winner
Use HBase with row-keys and column-families that are optimized for expected queries
Summary
Tip 1: Do not use Lucene Stored Fields for reporting
Tip 2: If not convinced, watch “Analytics in OLAP with Lucene and Hadoop”
Tip 3: Activate Doc-Values and enjoy great aggregation performance
Tip 4: Plug aggregation code into HBase as an End-Point Coprocessor
Tip 5: Design row-keys based on queries so that needed records are next to each other
Tip 6: Optimize the structure of column-families for queries requesting many aggregations
Tip 7: Pick either technology if aggregation is performed over a single attribute (or a few)
Tip 8: Use optimized HBase for aggregating many big attributes
Questions?

 

More from Lucidworks (20)

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
 

Recently uploaded

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 

Recently uploaded (20)

Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

OLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AG

  • 1. October 13-16, 2016 • Austin, TX
  • 2. OLAP Battle: SolrCloud vs. HBase. Dragan Milosevic, Chief Search Architect, zanox
  • 3. Table of Contents: Introduce zanox and me; What is OLAP?; SolrCloud with Doc-Values; HBase with End-Point Coprocessors; Response Time Comparison; Summary; Questions. Tips 1 to 8 are flagged along the way.
  • 4. Who am I?
      Chief Search Architect at Zanox
      9+ years of experience building distributed search systems based on Lucene
      Lucid Certified Apache Solr/Lucene Developer
      6+ years of using Hadoop and 4+ years of using HBase for data mining
      Cloudera Certified Developer for Apache Hadoop and HBase
      During my PhD studies at Technical University Berlin I applied different machine-learning techniques, mainly to optimise resource usage while performing distributed search
      See my book: “Beyond Centralised Search Engines: An Agent-Based Filtering Framework”
      How can you reach me? dragan.milosevic@zanox.com, http://www.linkedin.com/in/draganmilosevic
  • 6. What is OLAP? OLAP = Online Analytical Processing
  • 9. Lucene Stored Fields
      Lucene Stored Fields are optimized for retrieving field values for a relatively small number of documents (the top hits to be presented to the user). (Tip 1)
      They are difficult to use efficiently for reporting, because many documents (thousands, even millions) are selected and their fields are all needed.
      If stored fields are used for reporting anyway, machine learning has to be used to analyze queries, determine the optimal order of documents, and optimize the loading of hits.
      See the talk “Analytics in OLAP with Lucene and Hadoop”, Lucene Revolution, San Diego 2013. (Tip 2)
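  • 11. Doc-Values Step-by-Step – schema.xml & Search Component class
      Enable Doc-Values in schema.xml for fields that should be aggregated (with stored="false"): (Tip 3)
          <field name="adrank" type="tfloat" indexed="false" stored="false" docValues="true"/>
      Implement a Search Component that uses Doc-Values and aggregates:
          package com.zanox.search;

          import java.io.IOException;
          import org.apache.lucene.index.LeafReader;
          import org.apache.lucene.index.NumericDocValues;
          import org.apache.solr.handler.component.ResponseBuilder;
          import org.apache.solr.handler.component.SearchComponent;
          import org.apache.solr.search.DocIterator;

          public class SummingComponent extends SearchComponent {
              @Override
              public void prepare(ResponseBuilder rb) throws IOException {
                  rb.setNeedDocSet(true); // ask Solr to collect the full set of matching documents
              }

              @Override
              public void process(ResponseBuilder rb) throws IOException {
                  DocIterator itr = rb.getResults().docSet.iterator();
                  LeafReader reader = rb.req.getSearcher().getLeafReader();
                  // Random-access doc-values API of Lucene 5.x/6.x, as used on the slide
                  NumericDocValues docValues = reader.getNumericDocValues("adrank");
                  double sum = 0;
                  while (itr.hasNext()) {
                      // the stored long carries the bits of the float value
                      sum += Float.intBitsToFloat((int) docValues.get(itr.nextDoc()));
                  }
                  rb.rsp.add("sum", sum);
              }

              @Override
              public String getDescription() { // abstract in SearchComponent, omitted on the slide
                  return "Sums the adrank doc-values of all matching documents";
              }
          }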
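The decoding step in the component above can be tried without a running Solr instance. The following is a minimal sketch, not Solr code: it assumes the doc-values column is simply one long per document that carries the raw bits of the float (which is what the slide's use of Float.intBitsToFloat implies), and it models the matching documents as a BitSet. The class name DocValuesSumSketch and its methods are hypothetical.

```java
import java.util.BitSet;

/** Toy model of a numeric doc-values column: one long per document, holding float bits. */
final class DocValuesSumSketch {

    /** Encodes floats as raw IEEE-754 bits, the form the slide's code decodes. */
    static long[] encodeColumn(float[] values) {
        long[] column = new long[values.length];
        for (int doc = 0; doc < values.length; doc++) {
            column[doc] = Float.floatToIntBits(values[doc]);
        }
        return column;
    }

    /** Sums the column over the matching documents only; no other field is touched. */
    static double sum(long[] column, BitSet matchingDocs) {
        double sum = 0;
        for (int doc = matchingDocs.nextSetBit(0); doc >= 0; doc = matchingDocs.nextSetBit(doc + 1)) {
            sum += Float.intBitsToFloat((int) column[doc]);
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] adrank = encodeColumn(new float[] {1.5f, 2.0f, 4.25f, 8.0f});
        BitSet hits = new BitSet();
        hits.set(0);
        hits.set(2);
        hits.set(3);
        System.out.println(sum(adrank, hits)); // adds the values of documents 0, 2 and 3
    }
}
```

The point of the column layout is that the loop reads one contiguous array by document id, whereas stored fields would require loading each matching document.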
  • 12. Doc-Values Step-by-Step – solrconfig.xml
      Build a jar with the search component, put it somewhere in contrib, and add it in solrconfig.xml:
          <lib dir="${solr.install.dir:../../../..}/contrib/zanox/lib" regex=".*\.jar" />
      Register the newly created search component for summing in solrconfig.xml:
          <searchComponent name="summer" class="com.zanox.search.SummingComponent">
            <str name="…">…</str>
          </searchComponent>
      Use the search component in a request handler:
          <requestHandler name="/summing" class="solr.SearchHandler">
            <lst name="defaults"> … </lst>
            …
            <arr name="components">
              <str>query</str>
              <str>summer</str>
            </arr>
          </requestHandler>
  • 13. What is HBase?
      “HBase is an open source, non-relational, distributed database modeled after Google's BigTable and written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed Filesystem), providing BigTable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data …” (https://en.wikipedia.org/wiki/Apache_HBase)
      “we could simply state that HBase is a faithful, open source implementation of Google’s BigTable. But that would be a bit too simplistic …” (HBase: The Definitive Guide, Lars George)
      “HBase is a database: the Hadoop database. It’s often described as a sparse, distributed, persistent, multidimensional sorted map, which is indexed by row-key, column key, and timestamp. You’ll hear people refer to it as a key value store, a column family-oriented database …” (HBase in Action, Nick Dimiduk, Amandeep Khurana)
  • 14. HBase Table and Regions
  • 15. Row-Key Design for Reporting
  • 17. Column Family Challenge
      Hundreds of columns have to be assigned to a few column families
      The goal is to minimize the number of column families that must be accessed for a requested query
          Fewer column families needed means fewer files to be checked
      Because of the way region splitting is implemented in HBase region servers, the difference in size between column families should be minimized
          Small column families will be spread across many regions, so many files will have to be checked while processing queries and response time will increase
      Irrelevant columns that are not needed for a query should be minimized
      Data may be duplicated, with a column placed in multiple column families, to optimize access
          Important columns that are needed for many queries will be duplicated
          But the amount of duplication should be minimized so as not to waste space
      This is an NP-hard optimization problem
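One of the objectives named above, the number of column families a query has to access, can be made concrete with a small cost function. This is an illustrative sketch only: it assumes each column lives in exactly one family (the slide also allows duplication, which is left out here), and the column names, family names and the class ColumnFamilyCostSketch are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Scores a column-to-family layout by how many families a query must open. */
final class ColumnFamilyCostSketch {

    /** Number of distinct column families that hold at least one column of the query. */
    static int familiesTouched(Map<String, String> familyOfColumn, List<String> queryColumns) {
        Set<String> families = new HashSet<>();
        for (String column : queryColumns) {
            families.add(familyOfColumn.get(column));
        }
        return families.size();
    }

    public static void main(String[] args) {
        // Two hypothetical layouts of the same four columns.
        Map<String, String> layoutA = Map.of(
                "clicks", "f1", "views", "f1", "sales", "f2", "commission", "f2");
        Map<String, String> layoutB = Map.of(
                "clicks", "f1", "sales", "f1", "views", "f2", "commission", "f2");
        List<String> query = List.of("clicks", "sales");
        System.out.println("layout A: " + familiesTouched(layoutA, query));
        System.out.println("layout B: " + familiesTouched(layoutB, query));
    }
}
```

For a query over clicks and sales, layout B keeps both columns in one family and so opens fewer files than layout A. Choosing the best layout over hundreds of columns and many queries is the hard part the slide refers to.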
  • 19. End-Point Coprocessors
      End-Point Coprocessors provide a way to run custom code on region servers, and aggregation can be that code (Tip 4)
      They are therefore analogous to both stored procedures and MapReduce jobs, as they execute logic where the data resides
      Without them, the client would have to retrieve all values before applying the logic, and network bandwidth would probably be the bottleneck
      They are as network-friendly as possible, as only the summarized results from region servers are sent to the client
      HBase provides a high-level interface for calling End-Point Coprocessors and automatically selects the regions to be queried
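The data flow described above can be sketched without the HBase API. The following toy model is an assumption-laden illustration, not coprocessor code: each region is a plain list of values, regionSum stands in for the logic an endpoint would run on a region server, and clientSum merges one partial result per region. The class EndpointSketch and both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of endpoint-style aggregation: sum per region, then merge on the client. */
final class EndpointSketch {

    /** Stand-in for the code an endpoint would run inside one region server. */
    static double regionSum(List<Double> regionValues) {
        double sum = 0;
        for (double v : regionValues) {
            sum += v;
        }
        return sum;
    }

    /** Client side: receives one partial result per region and merges them. */
    static double clientSum(List<List<Double>> regions) {
        List<Double> partials = new ArrayList<>();
        for (List<Double> region : regions) {
            partials.add(regionSum(region)); // only this single number would cross the network
        }
        return regionSum(partials);
    }

    public static void main(String[] args) {
        List<List<Double>> regions = List.of(
                List.of(1.0, 2.0),
                List.of(3.5),
                List.of(4.0, 0.5));
        System.out.println(clientSum(regions));
    }
}
```

However many values a region holds, the client receives one number from it, which is why the approach is network-friendly.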
  • 22. HBase Reporting Guidelines
      Queries have to be known and analyzed thoroughly because
          Row-keys should be designed so that needed records are stored next to each other and can therefore be accessed quickly (Tip 5)
          It is necessary to decide which column is stored in which column-family so that the number of accessed column-families is minimized (Tip 6)
      Unexpected queries can be very expensive because
          The structure of the row-keys alone decides the order in which records are stored, so the needed records will not be next to each other and cannot be accessed quickly
      Even though this sounds like a huge disadvantage, typically there are standard reports (queries) that are the only ones made available to end users, so unexpected queries are not possible
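The row-key guideline can be illustrated with a sorted map standing in for an HBase table, since both keep rows ordered by key. This is a sketch under assumptions: the key layout (advertiser id, then date, then record id, all fixed width) is a hypothetical example and not the layout used at zanox, and RowKeySketch with its methods is invented for illustration.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of a row-key layout that keeps the records of one report next to each other. */
final class RowKeySketch {

    /** Hypothetical composite key: fixed-width advertiser id, then date, then record id. */
    static String rowKey(int advertiserId, String yyyymmdd, int recordId) {
        return String.format("%08d|%s|%08d", advertiserId, yyyymmdd, recordId);
    }

    /** Sums one advertiser's values for the dates fromDate..toDate with a single range scan. */
    static double sumRange(NavigableMap<String, Double> table, int advertiserId,
                           String fromDate, String toDate) {
        String start = String.format("%08d|%s|", advertiserId, fromDate);
        // '~' sorts after digits and '|' in ASCII, so this bounds every key of toDate.
        String stop = String.format("%08d|%s|~", advertiserId, toDate);
        double sum = 0;
        for (double value : table.subMap(start, true, stop, false).values()) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        NavigableMap<String, Double> table = new TreeMap<>();
        table.put(rowKey(7, "20161012", 1), 1.0);
        table.put(rowKey(7, "20161013", 1), 2.0);
        table.put(rowKey(7, "20161014", 1), 3.0);
        table.put(rowKey(7, "20161014", 2), 0.5);
        table.put(rowKey(7, "20161015", 1), 10.0);
        table.put(rowKey(8, "20161013", 1), 100.0);
        System.out.println(sumRange(table, 7, "20161013", "20161014"));
    }
}
```

A report by advertiser and date range reads one contiguous slice of keys. A report by date across all advertisers would not, which is the unexpected-query cost the slide describes.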
  • 23. Response Time Comparison – Single Column
      If aggregation is performed on a single attribute (column), there is no clear winner
      Therefore use the technology that you are more familiar with (Tip 7)
  • 24. Response Time Comparison – Multiple Columns
      If aggregation is performed on many big attributes (columns), there is a clear winner
      Use HBase with row-keys and column-families that are optimized for the expected queries (Tip 8)
  • 25. Summary
      Tip 1: Do not use Lucene Stored Fields for reporting
      Tip 2: If not convinced, watch “Analytics in OLAP with Lucene and Hadoop”
      Tip 3: Activate Doc-Values and enjoy great aggregation performance
      Tip 4: Plug aggregation code into HBase as an End-Point Coprocessor
      Tip 5: Design row-keys based on queries so that needed records are next to each other
      Tip 6: Optimize the structure of column-families for queries requesting many aggregations
      Tip 7: Pick any technology if aggregation is performed over a single attribute (or a few)
      Tip 8: Use optimized HBase for aggregating many big attributes