SlideShare a Scribd company logo
1 of 9
Apache Solr Performance and Scalability
Effort Update 2017/7 Palo Alto
Michael Sun wsun@cloudera.com
Scaling Solr in a Reliable Way - Challenges
● Scientific Side
○ Velocity, Volume, Variety
○ Designed to Address a Large Spectrum of Use Cases
○ Dealing with Partial Failures
○ Dependencies
○ ...
Scaling Solr in a Reliable Way - Challenges
● Scientific Side
● Human Side
○ Leaky Abstractions
○ Fast Moving Code Base
○ Hard to Test
○ Knowledge Gap for Operation Engineers
High Level Approach - Tactics
1. Improvement on the ‘Scientific Side’
2. Regression Prevention
3. Investigations - Uncover Issues that Really Matters
4. Field Enablement - Tuning Guide for Production, Capacity Planning
Guide, Design Tutorial etc.
5. Automation - Solr Benchmark Framework
High Level Approach - Tactics cont.
1. Optimization
2. Load Test/Scalability Study
3. Lead Large Scale Study (>1k nodes)
Solr Benchmark Framework - Challenges
1. Lack of Standardization. Hard to Collaborate.
2. Lack of Instrumentations.
3. Little Code Reuse.
4. Most Benchmarks Address a Small Set of Use Cases.
Solr Benchmark Framework - Tactics
1. A standardized, extensible object model
2. A layer of reusable components
3. A set of standardized instruments
4. A set of benchmarks for common use cases
● inc. Solr and Solr dependencies
Solr Benchmark Framework - Object Model
Solr Benchmark Framework - Result
1. Solr Nightly Performance Result
a. Both Solr and Solr dependencies
b. https://michael-sun-github.github.io/index.html (the front page is under construction)
2. Solr Benchmarks https://issues.apache.org/jira/browse/SOLR-10317

More Related Content

What's hot

5 Apache Spark Tips in 5 Minutes
5 Apache Spark Tips in 5 Minutes5 Apache Spark Tips in 5 Minutes
5 Apache Spark Tips in 5 MinutesCloudera, Inc.
 
How to build leakproof stream processing pipelines with Apache Kafka and Apac...
How to build leakproof stream processing pipelines with Apache Kafka and Apac...How to build leakproof stream processing pipelines with Apache Kafka and Apac...
How to build leakproof stream processing pipelines with Apache Kafka and Apac...Cloudera, Inc.
 
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...Spark Summit
 
Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
 Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac... Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...Cloudera, Inc.
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Cloudera, Inc.
 
dplyr Interfaces to Large-Scale Data
dplyr Interfaces to Large-Scale Datadplyr Interfaces to Large-Scale Data
dplyr Interfaces to Large-Scale DataCloudera, Inc.
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveDataWorks Summit/Hadoop Summit
 
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark Summit
 
Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)
Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)
Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)Spark Summit
 
MapR-DB Elasticsearch Integration
MapR-DB Elasticsearch IntegrationMapR-DB Elasticsearch Integration
MapR-DB Elasticsearch IntegrationMapR Technologies
 
Dr. Elephant: Achieving Quicker, Easier, and Cost-Effective Big Data Analytic...
Dr. Elephant: Achieving Quicker, Easier, and Cost-Effective Big Data Analytic...Dr. Elephant: Achieving Quicker, Easier, and Cost-Effective Big Data Analytic...
Dr. Elephant: Achieving Quicker, Easier, and Cost-Effective Big Data Analytic...Spark Summit
 
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...DataStax
 
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and KuduBuilding Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and KuduJeremy Beard
 
RISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsRISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsJen Aman
 
The Next AMPLab: Real-Time, Intelligent, and Secure Computing
The Next AMPLab: Real-Time, Intelligent, and Secure ComputingThe Next AMPLab: Real-Time, Intelligent, and Secure Computing
The Next AMPLab: Real-Time, Intelligent, and Secure ComputingSpark Summit
 
Tachyon and Apache Spark
Tachyon and Apache SparkTachyon and Apache Spark
Tachyon and Apache Sparkrhatr
 

What's hot (20)

5 Apache Spark Tips in 5 Minutes
5 Apache Spark Tips in 5 Minutes5 Apache Spark Tips in 5 Minutes
5 Apache Spark Tips in 5 Minutes
 
How to build leakproof stream processing pipelines with Apache Kafka and Apac...
How to build leakproof stream processing pipelines with Apache Kafka and Apac...How to build leakproof stream processing pipelines with Apache Kafka and Apac...
How to build leakproof stream processing pipelines with Apache Kafka and Apac...
 
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
 
Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
 Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac... Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
Cloudera Data Science Workbench: sparklyr, implyr, and More - dplyr Interfac...
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
 
dplyr Interfaces to Large-Scale Data
dplyr Interfaces to Large-Scale Datadplyr Interfaces to Large-Scale Data
dplyr Interfaces to Large-Scale Data
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
 
Amazon EMR
Amazon EMRAmazon EMR
Amazon EMR
 
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
 
Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)
Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)
Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)
 
MapR-DB Elasticsearch Integration
MapR-DB Elasticsearch IntegrationMapR-DB Elasticsearch Integration
MapR-DB Elasticsearch Integration
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Dr. Elephant: Achieving Quicker, Easier, and Cost-Effective Big Data Analytic...
Dr. Elephant: Achieving Quicker, Easier, and Cost-Effective Big Data Analytic...Dr. Elephant: Achieving Quicker, Easier, and Cost-Effective Big Data Analytic...
Dr. Elephant: Achieving Quicker, Easier, and Cost-Effective Big Data Analytic...
 
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
C* Capacity Forecasting (Ajay Upadhyay, Jyoti Shandil, Arun Agrawal, Netflix)...
 
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and KuduBuilding Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
 
Spark Technology Center IBM
Spark Technology Center IBMSpark Technology Center IBM
Spark Technology Center IBM
 
RISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsRISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time Decisions
 
The Next AMPLab: Real-Time, Intelligent, and Secure Computing
The Next AMPLab: Real-Time, Intelligent, and Secure ComputingThe Next AMPLab: Real-Time, Intelligent, and Secure Computing
The Next AMPLab: Real-Time, Intelligent, and Secure Computing
 
Tachyon and Apache Spark
Tachyon and Apache SparkTachyon and Apache Spark
Tachyon and Apache Spark
 
Facial recognition
Facial recognitionFacial recognition
Facial recognition
 

Similar to Scaling Solr in a Reliable Way and Improving Performance

Scalable and Cost-Effective Model-Based Software Verification and Testing
Scalable and Cost-Effective Model-Based Software Verification and TestingScalable and Cost-Effective Model-Based Software Verification and Testing
Scalable and Cost-Effective Model-Based Software Verification and TestingLionel Briand
 
SolrCloud-Best Practices for Sitecore. Design, build, and devops considerations
SolrCloud-Best Practices for Sitecore. Design, build, and devops considerationsSolrCloud-Best Practices for Sitecore. Design, build, and devops considerations
SolrCloud-Best Practices for Sitecore. Design, build, and devops considerationsSameer Maggon
 
Rapid pruning of search space through hierarchical matching
Rapid pruning of search space through hierarchical matchingRapid pruning of search space through hierarchical matching
Rapid pruning of search space through hierarchical matchinglucenerevolution
 
A Framework for Classifying and Comparing Architecture-Centric Software Evolu...
A Framework for Classifying and Comparing Architecture-Centric Software Evolu...A Framework for Classifying and Comparing Architecture-Centric Software Evolu...
A Framework for Classifying and Comparing Architecture-Centric Software Evolu...Pooyan Jamshidi
 
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...Abdel Salam Sayyad
 
Webinar: How We Evaluated MongoDB as a Relational Database Replacement
Webinar: How We Evaluated MongoDB as a Relational Database ReplacementWebinar: How We Evaluated MongoDB as a Relational Database Replacement
Webinar: How We Evaluated MongoDB as a Relational Database ReplacementMongoDB
 
Performance Optimization of Cloud Based Applications by Peter Smith, ACL
Performance Optimization of Cloud Based Applications by Peter Smith, ACLPerformance Optimization of Cloud Based Applications by Peter Smith, ACL
Performance Optimization of Cloud Based Applications by Peter Smith, ACLTriNimbus
 
Spiral Model in Software Engineering with Case Study
Spiral Model in Software Engineering with Case StudySpiral Model in Software Engineering with Case Study
Spiral Model in Software Engineering with Case StudySahil Bansal
 
8th semester syllabus b sc csit-pawan kafle
8th semester syllabus b sc csit-pawan kafle8th semester syllabus b sc csit-pawan kafle
8th semester syllabus b sc csit-pawan kaflePAWAN KAFLE
 
Just the Job: Employing Solr for Recruitment Search -Charlie Hull
Just the Job: Employing Solr for Recruitment Search -Charlie Hull Just the Job: Employing Solr for Recruitment Search -Charlie Hull
Just the Job: Employing Solr for Recruitment Search -Charlie Hull lucenerevolution
 
Modularity for Automated Assessment: A Design-Space Exploration
Modularity for Automated Assessment: A Design-Space ExplorationModularity for Automated Assessment: A Design-Space Exploration
Modularity for Automated Assessment: A Design-Space ExplorationSteffen Zschaler
 
Adaptive processes advanced course in software design and architecture
Adaptive processes advanced course in software design and architectureAdaptive processes advanced course in software design and architecture
Adaptive processes advanced course in software design and architectureLN Mishra CBAP
 
Apache Solr - An Experience Report
Apache Solr - An Experience ReportApache Solr - An Experience Report
Apache Solr - An Experience ReportNetcetera
 
Apex Enterprise Patterns Galore - Boston, MA dev group meeting 062719
Apex Enterprise Patterns Galore - Boston, MA dev group meeting 062719Apex Enterprise Patterns Galore - Boston, MA dev group meeting 062719
Apex Enterprise Patterns Galore - Boston, MA dev group meeting 062719BingWang77
 
Introduction to Scrum - 1 day workshop
Introduction to Scrum - 1 day workshopIntroduction to Scrum - 1 day workshop
Introduction to Scrum - 1 day workshopEvan Leybourn
 
Why die a developer ? (programming to architecture)
Why die a developer  ? (programming to architecture)Why die a developer  ? (programming to architecture)
Why die a developer ? (programming to architecture)Rajesh M
 
Software Development Methodologiess.pptx
Software Development Methodologiess.pptxSoftware Development Methodologiess.pptx
Software Development Methodologiess.pptxCourtneyFradreck
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment Databricks
 

Similar to Scaling Solr in a Reliable Way and Improving Performance (20)

Scalable and Cost-Effective Model-Based Software Verification and Testing
Scalable and Cost-Effective Model-Based Software Verification and TestingScalable and Cost-Effective Model-Based Software Verification and Testing
Scalable and Cost-Effective Model-Based Software Verification and Testing
 
SolrCloud-Best Practices for Sitecore. Design, build, and devops considerations
SolrCloud-Best Practices for Sitecore. Design, build, and devops considerationsSolrCloud-Best Practices for Sitecore. Design, build, and devops considerations
SolrCloud-Best Practices for Sitecore. Design, build, and devops considerations
 
Rapid pruning of search space through hierarchical matching
Rapid pruning of search space through hierarchical matchingRapid pruning of search space through hierarchical matching
Rapid pruning of search space through hierarchical matching
 
A Framework for Classifying and Comparing Architecture-Centric Software Evolu...
A Framework for Classifying and Comparing Architecture-Centric Software Evolu...A Framework for Classifying and Comparing Architecture-Centric Software Evolu...
A Framework for Classifying and Comparing Architecture-Centric Software Evolu...
 
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...
Evolutionary Search Techniques with Strong Heuristics for Multi-Objective Fea...
 
Webinar: How We Evaluated MongoDB as a Relational Database Replacement
Webinar: How We Evaluated MongoDB as a Relational Database ReplacementWebinar: How We Evaluated MongoDB as a Relational Database Replacement
Webinar: How We Evaluated MongoDB as a Relational Database Replacement
 
Performance Optimization of Cloud Based Applications by Peter Smith, ACL
Performance Optimization of Cloud Based Applications by Peter Smith, ACLPerformance Optimization of Cloud Based Applications by Peter Smith, ACL
Performance Optimization of Cloud Based Applications by Peter Smith, ACL
 
Spiral Model in Software Engineering with Case Study
Spiral Model in Software Engineering with Case StudySpiral Model in Software Engineering with Case Study
Spiral Model in Software Engineering with Case Study
 
8th semester syllabus b sc csit-pawan kafle
8th semester syllabus b sc csit-pawan kafle8th semester syllabus b sc csit-pawan kafle
8th semester syllabus b sc csit-pawan kafle
 
Just the Job: Employing Solr for Recruitment Search -Charlie Hull
Just the Job: Employing Solr for Recruitment Search -Charlie Hull Just the Job: Employing Solr for Recruitment Search -Charlie Hull
Just the Job: Employing Solr for Recruitment Search -Charlie Hull
 
Modularity for Automated Assessment: A Design-Space Exploration
Modularity for Automated Assessment: A Design-Space ExplorationModularity for Automated Assessment: A Design-Space Exploration
Modularity for Automated Assessment: A Design-Space Exploration
 
Adaptive processes advanced course in software design and architecture
Adaptive processes advanced course in software design and architectureAdaptive processes advanced course in software design and architecture
Adaptive processes advanced course in software design and architecture
 
Apache Solr - An Experience Report
Apache Solr - An Experience ReportApache Solr - An Experience Report
Apache Solr - An Experience Report
 
Apex Enterprise Patterns Galore - Boston, MA dev group meeting 062719
Apex Enterprise Patterns Galore - Boston, MA dev group meeting 062719Apex Enterprise Patterns Galore - Boston, MA dev group meeting 062719
Apex Enterprise Patterns Galore - Boston, MA dev group meeting 062719
 
Introduction to Scrum - 1 day workshop
Introduction to Scrum - 1 day workshopIntroduction to Scrum - 1 day workshop
Introduction to Scrum - 1 day workshop
 
Why die a developer ? (programming to architecture)
Why die a developer  ? (programming to architecture)Why die a developer  ? (programming to architecture)
Why die a developer ? (programming to architecture)
 
DevOps 101
DevOps 101DevOps 101
DevOps 101
 
Software Development Methodologiess.pptx
Software Development Methodologiess.pptxSoftware Development Methodologiess.pptx
Software Development Methodologiess.pptx
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment
 
Solr 101
Solr 101Solr 101
Solr 101
 

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Recently uploaded

Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxAleenaTreesaSaji
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physicsvishikhakeshava1
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 

Recently uploaded (20)

Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptx
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physics
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 

Scaling Solr in a Reliable Way and Improving Performance

  • 1. Apache Solr Performance and Scalability Effort Update 2017/7 Palo Alto Michael Sun wsun@cloudera.com
  • 2. Scaling Solr in a Reliable Way - Challenges ● Scientific Side ○ Velocity, Volume, Variety ○ Designed to Address a Large Spectrum of Use Cases ○ Dealing with Partial Failures ○ Dependencies ○ ...
  • 3. Scaling Solr in a Reliable Way - Challenges ● Scientific Side ● Human Side ○ Leaky Abstractions ○ Fast Moving Code Base ○ Hard to Test ○ Knowledge Gap for Operation Engineers
  • 4. High Level Approach - Tactics 1. Improvement on the ‘Scientific Side’ 2. Regression Prevention 3. Investigations - Uncover Issues that Really Matters 4. Field Enablement - Tuning Guide for Production, Capacity Planning Guide, Design Tutorial etc. 5. Automation - Solr Benchmark Framework
  • 5. High Level Approach - Tactics cont. 1. Optimization 2. Load Test/Scalability Study 3. Lead Large Scale Study (>1k nodes)
  • 6. Solr Benchmark Framework - Challenges 1. Lack of Standardization. Hard to Collaborate. 2. Lack of Instrumentations. 3. Little Code Reuse. 4. Most Benchmarks Address a Small Set of Use Cases.
  • 7. Solr Benchmark Framework - Tactics 1. A standardized, extensible object model 2. A layer of reusable components 3. A set of standardized instruments 4. A set of benchmarks for common use cases ● inc. Solr and Solr dependencies
  • 8. Solr Benchmark Framework - Object Model
  • 9. Solr Benchmark Framework - Result 1. Solr Nightly Performance Result a. Both Solr and Solr dependencies b. https://michael-sun-github.github.io/index.html (the front page is under construction) 2. Solr Benchmarks https://issues.apache.org/jira/browse/SOLR-10317

Editor's Notes

  1. Flexible abstraction model