Brahe - Flexible Indexing At Scale
Ben Brown
Software Architect, Cerner Corporation
Who I Am
• Ben Brown
Software Architect
• Cerner
Healthcare IT Company
• Semantic Solutions
Team of 10
Search Services
Fun Stuff
NLP, Medical Ontologies, ML
Chart Search
Taking This
Photo: http://bit.ly/Y7kTJt
Chart Search
Turning it into this
Chart Search Does
• Faceting
• NLP
• Semantic Concept Markup
Makes for a heavy record
(Especially on Solr 1.4)
Where We Started
Started Major Engineering in 2009
IBM Dev Works: http://ibm.co/14ZrtqX
Scale
• Clusters partitioned by client
• Raw and processed data in HDFS
• All processing & indexing done through MapReduce
Shard Size
Limiting Factor ~26 Million Discrete Results Per Shard
Average of 35 Shards Per Client
Range 5 to 140
Query Touch Points
One User Action ~ 4 Queries
35 Shards - 432 Touch Points
140 Shards - 1692 Touch Points
• Works, but not efficient
• More chances for variance on one shard to kill performance
• Failure is a massive config headache
Growth
• Hashed ID does not play well with resizing
• Deploy Again
• Reindex Everything
Document Hash modulo Shard Count
Doc One: Hash(abc123) = 15
Doc Two: Hash(efg456) = 8
Doc Three: Hash(hij789) = 7
3 Shards
Doc One -> Shard 0
Doc Two -> Shard 2
Doc Three -> Shard 1
4 Shards
Doc One -> Shard 3
Doc Two -> Shard 0
Doc Three -> Shard 3
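A small sketch of the modulo scheme above, using the slide's own hash values (15, 8, 7) rather than a real hash function. It reproduces the 3-shard and 4-shard placements, shows that all three documents change shard, and estimates how much moves in a 35 to 36 shard resize, assuming hash values are spread evenly over the integers.

```python
# Hash values as given on the slide; a real system would hash the document ID.
DOC_HASHES = {
    "Doc One (abc123)": 15,
    "Doc Two (efg456)": 8,
    "Doc Three (hij789)": 7,
}


def assign_shards(hashes: dict[str, int], shard_count: int) -> dict[str, int]:
    """Place each document on shard = hash % shard_count."""
    return {doc: h % shard_count for doc, h in hashes.items()}


def moved_documents(before: dict[str, int], after: dict[str, int]) -> list[str]:
    """Documents whose shard differs between two layouts."""
    return [doc for doc in before if before[doc] != after[doc]]


def fraction_moved(old_count: int, new_count: int, num_hashes: int) -> float:
    """Share of hash values 0..num_hashes-1 that land on a new shard after a resize."""
    moved = sum(1 for h in range(num_hashes) if h % old_count != h % new_count)
    return moved / num_hashes


if __name__ == "__main__":
    three = assign_shards(DOC_HASHES, 3)
    four = assign_shards(DOC_HASHES, 4)
    for doc in DOC_HASHES:
        print(f"{doc}: shard {three[doc]} -> shard {four[doc]}")
    print(f"{len(moved_documents(three, four))} of {len(DOC_HASHES)} documents moved")
    # 35 * 36 covers one full cycle of both moduli, so the fraction is exact for it.
    print(f"35 -> 36 shards: {fraction_moved(35, 36, 35 * 36):.1%} of hash values move")
```

For the 35 to 36 case only 35 of 1,260 hash values stay put, so roughly 97% of documents would move, which is why growth meant reindexing everything.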
We Have a Problem
Painful Growth
Lots of Deploys
Variance Risk
Image: http://bit.ly/Y7oBD6
What Would Be Better?
Load Balance at the Client
Automated Failover
Easy Deployments
Simplified Splitting
Minimized Touch Points
Disconnected Stages
Solution
Shift Master to HBase
Image: http://bit.ly/ZXO2na
Why HBase?
Lexically organized keys
Efficient key range scans
Efficient time based scans
We're pretty good at operating it
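One way to see why lexically ordered keys help: if each shard owns a contiguous key range (as HBase regions do), finding a shard is a binary search, a prefix scan touches only the shards covering that prefix, and a split moves only the rows of the shard being split. This is a hypothetical sketch; the `client|patient` key format and the `RangeShards` class are illustrative assumptions, not Brahe's actual code.

```python
import bisect


class RangeShards:
    """Shards that each own a contiguous range of lexically ordered keys."""

    def __init__(self, start_keys: list[str]) -> None:
        # Every shard is named by its start key; "" makes the first range open-ended.
        if "" not in start_keys:
            raise ValueError('start keys must include "" so every key has an owner')
        self.start_keys = sorted(set(start_keys))

    def owner(self, key: str) -> str:
        """Start key of the shard whose range contains `key`."""
        return self.start_keys[bisect.bisect_right(self.start_keys, key) - 1]

    def owners_for_range(self, start: str, stop: str) -> list[str]:
        """Shards overlapping the half-open key range [start, stop)."""
        first = bisect.bisect_right(self.start_keys, start) - 1
        last = max(first, bisect.bisect_left(self.start_keys, stop) - 1)
        return self.start_keys[first:last + 1]

    def split(self, boundary: str) -> None:
        """Split one shard in two; keys outside that shard keep their owner."""
        if boundary in self.start_keys:
            raise ValueError(f"boundary {boundary!r} already exists")
        bisect.insort(self.start_keys, boundary)


if __name__ == "__main__":
    # Hypothetical row keys: 6 clients with 3 patients each.
    keys = [f"client{c}|patient{p:04d}" for c in range(1, 7) for p in range(3)]
    shards = RangeShards(["", "client3", "client5"])
    before = {k: shards.owner(k) for k in keys}
    shards.split("client4")  # split the shard that owned client3 and client4
    moved = [k for k in keys if shards.owner(k) != before[k]]
    print(f"{len(moved)} of {len(keys)} keys moved")
    # A single-client scan touches only the shard covering that prefix.
    print(shards.owners_for_range("client4|", "client4~"))
```

Splitting at `client4` reassigns only the 3 `client4` rows out of 18, where modulo hashing would move nearly all of them.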
Coordinate With ZooKeeper
|-- Index name
|-- Version
|-- Solr Schema/Config
|-- Table Name + Connection Info
|-- Shard Number
|-- Shard Boundary Info
|-- Replica Number
|-- Ephemeral Claim
|-- Solr Connection Info
|-- Ephemeral Online
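The slide lists the coordination nodes, but the extracted text loses their nesting. This sketch assumes a hypothetical nesting (index, then version, then shard, then replica, with the ephemeral claim and online markers under each replica) and a made-up `/brahe` root, only to make the paths concrete.

```python
from dataclasses import dataclass

ROOT = "/brahe"  # hypothetical root znode, not from the talk


@dataclass(frozen=True)
class ReplicaSlot:
    """One claimable shard replica under an index version (assumed nesting)."""

    index: str
    version: int
    shard: int
    replica: int

    @property
    def base(self) -> str:
        return (f"{ROOT}/{self.index}/v{self.version}"
                f"/shards/{self.shard}/replicas/{self.replica}")

    @property
    def claim_path(self) -> str:
        # Ephemeral: disappears when the claiming Solr node's session ends.
        return f"{self.base}/claim"

    @property
    def online_path(self) -> str:
        # Ephemeral: present only while the replica is serving queries.
        return f"{self.base}/online"


def config_path(index: str, version: int) -> str:
    """Where the Solr schema/config and HBase table info would live (assumed)."""
    return f"{ROOT}/{index}/v{version}/config"


def all_slots(index: str, version: int, shards: int, replicas: int) -> list[ReplicaSlot]:
    """Every shard/replica pair a Solr node could try to claim."""
    return [ReplicaSlot(index, version, s, r)
            for s in range(shards) for r in range(replicas)]


if __name__ == "__main__":
    print(ReplicaSlot("chart_search", 3, 12, 1).claim_path)
```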
Custom Core Admin
Works with ZooKeeper for the claim process
Creates the Solr core once a claim succeeds
Controls pulling data from HBase
Claim Process
Image: http://bit.ly/Or317R
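A sketch of a claim process against a hypothetical in-memory stand-in for ZooKeeper. `FakeZooKeeper` mimics only the two properties the process leans on: a create that fails if the node already exists, and ephemeral nodes that vanish when their owner's session ends. Slot paths and the `claim_one` helper are assumptions; core creation and the HBase pull are left as comments.

```python
from collections.abc import Iterable
from typing import Optional


class NodeExistsError(Exception):
    """Raised when creating a node that already exists."""


class FakeZooKeeper:
    """Hypothetical stand-in; only tracks ephemeral nodes and their sessions."""

    def __init__(self) -> None:
        self.ephemeral: dict[str, str] = {}  # path -> owning session id

    def create_ephemeral(self, path: str, session: str) -> None:
        # Real ZooKeeper makes this atomic; a dict check stands in for it here.
        if path in self.ephemeral:
            raise NodeExistsError(path)
        self.ephemeral[path] = session

    def expire_session(self, session: str) -> None:
        """Drop every ephemeral node owned by a dead session."""
        self.ephemeral = {p: s for p, s in self.ephemeral.items() if s != session}


def claim_one(zk: FakeZooKeeper, session: str, slots: Iterable[str]) -> Optional[str]:
    """Claim the first free slot, then (hypothetically) build it and go online."""
    for slot in slots:
        try:
            zk.create_ephemeral(f"{slot}/claim", session)
        except NodeExistsError:
            continue  # another node already owns this shard replica
        # A real node would create its Solr core here and pull its key
        # range from HBase before announcing itself as online.
        zk.create_ephemeral(f"{slot}/online", session)
        return slot
    return None


if __name__ == "__main__":
    zk = FakeZooKeeper()
    slots = [f"/index/shard{s}/replica{r}" for s in range(2) for r in range(2)]
    for node in ["solr-a", "solr-b", "solr-c", "solr-d"]:
        print(node, "->", claim_one(zk, node, slots))
    zk.expire_session("solr-b")  # solr-b dies; its ephemeral nodes vanish
    print("solr-e ->", claim_one(zk, "solr-e", slots))  # takes over solr-b's slot
```

Because claims are ephemeral, a node that dies frees its slot automatically and the next node to look picks it up, which is the automated failover the talk is after.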
Queries
• Client inspects ZooKeeper
• Finds online nodes
   ◦ Only for the keyspace it cares about
   ◦ Issues distributed queries if necessary
• Balances in the Client
• Retries if queries fail
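The query steps above, sketched in Python. `online` stands for a snapshot the client has already read from ZooKeeper, mapping each shard's start key to its online hosts, and `send` is a hypothetical callable that runs the query on one host; both are assumptions, as is treating `ConnectionError` as the retryable failure.

```python
import bisect
import random
from typing import Callable


def shards_for_range(start_keys: list[str], start: str, stop: str) -> list[str]:
    """Start keys of the shards overlapping the half-open range [start, stop)."""
    keys = sorted(start_keys)
    first = max(0, bisect.bisect_right(keys, start) - 1)
    last = max(first, bisect.bisect_left(keys, stop) - 1)
    return keys[first:last + 1]


def query(online: dict[str, list[str]], start: str, stop: str,
          send: Callable[[str], list[str]], rng: random.Random) -> list[str]:
    """Query only the shards covering [start, stop), balancing and retrying per shard."""
    results: list[str] = []
    for shard in shards_for_range(list(online), start, stop):
        hosts = online[shard][:]
        rng.shuffle(hosts)  # balance load across replicas in the client
        for host in hosts:
            try:
                results.extend(send(host))
                break  # this shard answered; move on to the next one
            except ConnectionError:
                continue  # retry on another replica of the same shard
        else:
            raise RuntimeError(f"no replica of shard {shard!r} answered")
    return results


if __name__ == "__main__":
    online = {"": ["host-a", "host-b"], "client5": ["host-c", "host-d"]}

    def send(host: str) -> list[str]:
        if host == "host-a":
            raise ConnectionError(host)  # simulate a dead replica
        return [f"hit from {host}"]

    # A single-client query touches one shard; a wide one fans out to both.
    print(query(online, "client2|", "client2~", send, random.Random(0)))
    print(query(online, "client1", "client9", send, random.Random(0)))
```

Shuffling per shard spreads load without a central balancer, and the `for`/`else` raises only after every replica of a shard has failed.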
End Thoughts
• Keep things simple
• Disconnect your stages
• Keep your touchpoints at a minimum
• Organize your data around your queries
• Use what you’re good at
CONTACT
Ben Brown
http://linkd.in/ZZIBK4
@b_brown
ENGINEERING BLOG
https://engineering.cerner.com/
WE’RE HIRING!
http://www.cerner.com/About_Cerner/Careers/
