Solr
The Search First NoSQL Database
Who Am I?
• Mark Miller: Cloudera employee, Lucene PMC member, Apache member
• Started playing with Lucene in 2006
• Lucene committer since 2008
• Solr committer since 2009
My Dog
Big Data is getting Bigger
• The total Big Data market reached $11.4 billion in 2012
• The Big Data market is projected to reach $18.1 billion in 2013, an annual growth of 61%
• On pace to exceed $47 billion by 2017.
3 basic needs
• Storage
• Processing
• Search
Two Standouts in the Big Data Market
• Hadoop
• NoSQL
Ultimately, the NoSQL market is largely up for
grabs. Each NoSQL database has its related
strengths and weaknesses, and no one NoSQL
database currently “does it all.” Big Data
practitioners must take a number of factors into
consideration when selecting a NoSQL database
to facilitate large-scale transactional workloads,
including scalability, performance, security, and
ease-of-development.
Big Data Vendor Revenue and Market Forecast (Wikibon)
RDBMS
• The classic way to store your data.
• ACID is great, transactions are cool, and SQL is well known and understood.
• Scaling is *hard*, but possible (see Facebook’s MySQL cluster).
• The object-relational ‘impedance mismatch’ sucks.
Search
• Search has been moving from an expensive, complicated option to an affordable, easier-to-use necessity.
• Lots of data begs for the ability to process it, store it, and search it.
Enterprise Search Engines
• Verity - acquired by Autonomy in 2005
• FAST - acquired by Microsoft in 2008
• Endeca - acquired by Oracle in 2011
• Autonomy - acquired by HP in 2011
• Vivisimo - acquired by IBM in 2012
NoSQL
• Not Only SQL rather than ‘No SQL’
• Except that makes little sense...
• “when ‘NoSQL’ is applied to a database, it refers to an ill-defined set of mostly open-source databases, mostly developed in the early 21st century, and mostly not using SQL.” - NoSQL Distilled
NoSQL
• Key-Value
• Columnar
• Document
• Graph
In the beginning...
• BerkeleyDB (1991?)
• Lotus Notes (1989?)
• Bayou (1996?)
In the beginning of the modern era...
• BigTable (Google) (started in 2004, paper in 2006)
• Dynamo (Amazon) (paper in 2007)
Derivatives
• Dynamo: Cassandra, CouchDB, Voldemort, Riak
• BigTable: Cassandra, HBase, Redis, Hypertable, Accumulo
Also...
• AppEngine storage built on BigTable
• DynamoDB - based on the principles of Dynamo
When it comes to NoSQL, Open Source rules the roost.
• I won’t be talking about any solution that is not based on Open Source - only because those solutions are not popular.
• “there’s a notion that NoSQL is an open-source phenomenon.” - NoSQL Distilled
The 2013 Future of Open Source Survey Results
Black Duck and North Bridge
What’s Popular?
• NoSQL database proliferation - NoSQL databases are
a dime a dozen. Why?
• Which solutions should we look at?
indeed.com
• Indeed.com is an employment-related metasearch
engine for job listings
• Indeed is the #1 job site worldwide, with over 100
million unique visitors per month. Indeed is available
in more than 50 countries and 26 languages,
covering 94% of global GDP.
http://db-engines.com
• DB-Engines is an initiative to collect and present
information on database management systems
(DBMS). In addition to established relational DBMS,
systems and concepts of the growing NoSQL area
are emphasized.
• The DB-Engines Ranking is a list of DBMS ranked by
their current popularity. The list is updated monthly.
Popular Search Job Trends
Popular Search Solutions (DB-Engines)
Popular NoSQL Job Trends
Let’s get some context
Compare to Java
Add in Oracle...
NoSQL Database Types
• Key-Value
• Column Family
• Document
• Graph
I’m going to ignore Graph... everyone else seems to...
Popular NoSQL Document Stores (DB-Engines)
Key-Value Stores
Columnar Stores
The Full Popularity Contest
In case you forgot, Oracle is in the NoSQL game...
• Oracle NoSQL
CAP Theorem
The CAP theorem, also known as Brewer's theorem,
states that it is impossible for a distributed computer
system to simultaneously provide all three of the
following guarantees:
• Consistency (all nodes see the same data at the
same time)
• Availability (a guarantee that every request
receives a response about whether it was
successful or failed)
• Partition tolerance (the system continues to
operate despite arbitrary message loss or failure of
part of the system)
CAP Architectures
• For NoSQL, the choice generally boils down to AP or CP. CA does not tolerate network partitions, which a distributed system cannot rule out.
• You have to trade off consistency versus availability.
• AP favors availability over consistency; this is the eventually consistent architecture.
• CP favors consistency over availability.
• Of course, there is a continuum between AP and CP.
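To make the AP/CP trade-off concrete, here is a toy Python model (not taken from any real database): three replicas, a client cut off from two of them, and a write path that either refuses writes without a majority (CP) or accepts them on whatever replicas it can reach (AP). The class names and the simple majority rule are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    data: dict = field(default_factory=dict)


class ToyCluster:
    """Hypothetical three-replica cluster used only to contrast CP and AP writes."""

    def __init__(self, mode, names=("a", "b", "c")):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = {n: Replica(n) for n in names}
        self.reachable = set(names)  # replicas the client can currently talk to

    def partition(self, reachable):
        # Simulate a network partition: only these replicas remain reachable.
        self.reachable = set(reachable)

    def write(self, key, value):
        majority = len(self.replicas) // 2 + 1
        if self.mode == "CP" and len(self.reachable) < majority:
            # CP: refuse the write (give up availability) rather than let
            # replicas on different sides of the partition diverge.
            return False
        # AP, or CP with a majority: apply to every reachable replica.
        # Unreachable replicas stay stale until some later repair step.
        for n in self.reachable:
            self.replicas[n].data[key] = value
        return True


cp, ap = ToyCluster("CP"), ToyCluster("AP")
cp.partition({"a"})  # the client can only reach replica "a"
ap.partition({"a"})
print(cp.write("k", 1), ap.write("k", 1))  # False True
```

The AP cluster stays writable but replicas "b" and "c" now disagree with "a"; that gap is exactly what an eventually consistent system must later reconcile.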
Key Design Decisions
• Data Model - how is the data stored/accessed
• Distribution Model - how is the data distributed
• Conflict Resolution - how is it ensured that the same
update ‘wins’ on each node.
Data Model
• key -> value (opaque)
• key -> document
• column oriented
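The practical difference between the three data models is what the store can see inside a value. The Python sketch below uses plain dicts as stand-ins for each model; the key names and the `find_docs` helper are hypothetical, chosen only to show that a field query needs structured values.

```python
import json

# key -> value (opaque): the store only sees bytes and cannot look inside them.
kv_store = {"user:1": json.dumps({"name": "Ada", "city": "London"}).encode()}

# key -> document: the store understands the fields, so it can query on them.
doc_store = {
    "user:1": {"name": "Ada", "city": "London"},
    "user:2": {"name": "Alan", "city": "Manchester"},
}

# Column oriented (BigTable-style): row key -> column family -> column -> value.
column_store = {
    "user:1": {"info": {"name": "Ada"}, "geo": {"city": "London"}},
}


def find_docs(store, field_name, value):
    """Return the keys of documents whose field equals value."""
    return sorted(k for k, doc in store.items() if doc.get(field_name) == value)


# With opaque values, the client has to fetch and decode the value itself.
decoded = json.loads(kv_store["user:1"].decode())
print(find_docs(doc_store, "city", "London"), decoded["name"])  # ['user:1'] Ada
print(column_store["user:1"]["geo"]["city"])  # London
```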
Distribution Model
• Roughly, how is data distributed across the cluster?
• Sharding, replication, etc.
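A minimal sketch of the two ideas on this slide, assuming the simplest possible scheme: hash each key to one of a fixed number of shards, then place each shard's copies on consecutive nodes. The function names, MD5 as the hash, and the node names are illustrative choices, not any particular database's layout.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard with a stable hash (Python's hash() of str is salted per process)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replicas_for(shard, nodes, replication_factor):
    """Place a shard's copies on consecutive nodes, wrapping around the node list."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor exceeds node count")
    return [nodes[(shard + i) % len(nodes)] for i in range(replication_factor)]


nodes = ["node0", "node1", "node2", "node3"]
shard = shard_for("doc-42", num_shards=4)
print(shard, replicas_for(shard, nodes, replication_factor=2))
```

One weakness of plain hash-mod-N placement is that changing `num_shards` remaps most keys, which is one reason Dynamo-style systems use consistent hashing instead.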
Data Versioning and Consistency
• Essentially, how is data kept consistent across nodes?
• Sequential consistency: ensuring that all nodes apply operations in the same order.
• Update consistency and read consistency.
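One common way to get sequential consistency is to route every update through a single node that assigns sequence numbers, and have every replica apply updates strictly in that order, buffering any that arrive early. The sketch below is a simplified illustration of that idea (no failures, no persistence); `Leader` and `OrderedReplica` are hypothetical names.

```python
class Leader:
    """Assigns every operation a position in one global order."""

    def __init__(self):
        self.next_seq = 0

    def stamp(self, op):
        seq = self.next_seq
        self.next_seq += 1
        return seq, op


class OrderedReplica:
    """Applies operations strictly in sequence order, buffering early arrivals."""

    def __init__(self):
        self.applied = []  # operations applied so far, in sequence order
        self.pending = {}  # seq -> op, waiting for all earlier seqs
        self.expected = 0  # next sequence number this replica may apply

    def receive(self, seq, op):
        self.pending[seq] = op
        while self.expected in self.pending:
            self.applied.append(self.pending.pop(self.expected))
            self.expected += 1


leader = Leader()
ops = [leader.stamp(op) for op in ["set x=1", "set x=2", "delete x"]]
in_order, out_of_order = OrderedReplica(), OrderedReplica()
for seq, op in ops:
    in_order.receive(seq, op)
for seq, op in reversed(ops):  # the network delivered these backwards
    out_of_order.receive(seq, op)
print(in_order.applied == out_of_order.applied)  # True
```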
MongoDB
• Data Model - BSON (binary JSON)
• Distribution Model - sharding, with asynchronous master/slave replication
• Data Versioning and Consistency - master/slave, per-database write lock
MongoDB Search
• Built-in text search. I think of it like the built-in full-text search in an RDBMS: major feature gaps compared with dedicated full-text search engines, and likely major performance gaps.
• It is common to sit a search engine next to MongoDB.
Cassandra
• Data Model - column based, like BigTable
• Distribution Model - similar to Dynamo: consistent hashing, master-master
• Data Versioning and Consistency - timestamps
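Consistent hashing, the Dynamo technique mentioned above, places both nodes and keys on a ring of hash values; a key belongs to the first node found clockwise from it. The sketch below shows the general technique only, not Cassandra's partitioner code: MD5, 64 virtual nodes per node, and the node names are all arbitrary assumptions.

```python
import bisect
import hashlib


def token(value):
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes  # tokens per node; smooths out the distribution
        self.ring = []        # sorted list of (token, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (token(f"{node}#{i}"), node))

    def owners(self, key, n=1):
        """First n distinct nodes found walking clockwise from the key's token."""
        tokens = [t for t, _ in self.ring]
        start = bisect.bisect(tokens, token(key))
        found = []
        for step in range(len(self.ring)):
            node = self.ring[(start + step) % len(self.ring)][1]
            if node not in found:
                found.append(node)
                if len(found) == n:
                    break
        return found


ring = HashRing(["a", "b", "c"])
keys = [f"key{i}" for i in range(1000)]
before = {k: ring.owners(k)[0] for k in keys}
ring.add("d")
after = {k: ring.owners(k)[0] for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Only keys now owned by the new node move; expect roughly a quarter of them.
print(len(moved), all(after[k] == "d" for k in moved))
```

`owners(key, n=3)` also gives a natural replica set: the next distinct nodes clockwise, which is the idea behind Dynamo's preference lists.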
Cassandra Search
• Lucandra
• Solandra
• DataStax Enterprise Search (Solr fields must be
strings)
HBase
• Data Model - column store
• Distribution Model - regions served by region servers
• Versioning and Consistency - strongly consistent
HBase Search
• HBasene (dead?)
• HBASE-SEARCH, HBASE-3529 (dead?)
• Solbase
• Lily
Riak
• Riak is a NoSQL database implementing the principles from Amazon's Dynamo paper.
• Data Model - stores key/value pairs in a high-level namespace called a bucket.
• Data Versioning and Consistency - Riak uses a data structure called a vector clock to reason about causality and staleness of stored values (it can also use timestamps). Either the last write wins, or the client resolves the conflict.
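A vector clock keeps one counter per node; comparing two clocks tells you whether one update descends from the other or whether they are concurrent siblings that need resolving. This is a minimal textbook sketch with plain dicts, not Riak's implementation (which also handles actor ids and clock pruning).

```python
def increment(clock, node):
    """Return a copy of clock with node's counter bumped (one local update)."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def compare(a, b):
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # b descends from a, so b can replace a
    if b_le_a:
        return "after"       # a descends from b
    return "concurrent"      # siblings: last write wins, or the client resolves


def merge(a, b):
    """Clock for a value that resolves both a and b (element-wise maximum)."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


base = increment({}, "x")    # {'x': 1}
on_x = increment(base, "x")  # {'x': 2}
on_y = increment(base, "y")  # {'x': 1, 'y': 1}
print(compare(base, on_x), compare(on_x, on_y))  # before concurrent
```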
Riak Search
• Riak Search - custom search engine, Solr-like API
• Yokozuna
Yokozuna Author Enumerates Common Reasons Custom Search has Failed
• Pretends to be Lucene/Solr
• Lack of analyzers, language support, and features
• Bad performance/resource usage for certain queries
• Basho is not in the business of search
CouchDB
• CouchDB’s data format is JSON stored as documents
(self-contained records with no intrinsic
relationships), grouped into “database” namespaces.
• Conflicts are left to the application to resolve at write
time. CouchDB arbitrarily, but deterministically,
determines a winner and tracks a conflict. The client
must then resolve the conflict.
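The "arbitrary but deterministic" winner matters because every replica can pick the same revision without talking to the others. The sketch below approximates the rule described in the CouchDB documentation (non-deleted leaves beat deleted ones, then the longest revision history wins, then the higher revision id in ASCII order); treat it as an illustration under that assumption, not CouchDB's code. The revision ids are made up.

```python
def rev_number(rev):
    """CouchDB revision ids look like '<generation>-<hash>'."""
    return int(rev.split("-", 1)[0])


def resolve(leaves):
    """leaves: (rev, deleted) pairs for a document's conflicting leaf revisions.

    Returns (winning_rev, conflicting_revs). Every replica computes the same
    answer from the same leaves, so no coordination is needed.
    """
    # Prefer non-deleted leaves, then the longest history, then the higher rev string.
    winner = max(leaves, key=lambda leaf: (not leaf[1], rev_number(leaf[0]), leaf[0]))
    conflicts = sorted(rev for rev, deleted in leaves if not deleted and rev != winner[0])
    return winner[0], conflicts


leaves = [("2-abc", False), ("2-abd", False), ("3-0f1", True)]
print(resolve(leaves))                  # ('2-abd', ['2-abc'])
print(resolve(list(reversed(leaves))))  # same answer regardless of order
```

The losing revisions are kept as conflicts rather than discarded, which is what lets the client resolve them afterwards.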
CouchDB Search
• CouchDB-Lucene
• People usually seem to just sit a search engine next to CouchDB
Redis
• Redis is an open-source, networked, in-memory, key-value data store with optional durability.
• Memcached is a general-purpose distributed memory caching system.
• Redis-Search
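"Optional durability" for an in-memory store is often provided by logging each write before applying it and replaying the log on restart. Redis's append-only file option works in this spirit, but the class below is a hypothetical toy with its own JSON-lines format, not Redis's format or code. It syncs on every write, the safest and slowest policy; real systems typically let you relax this (Redis's `appendfsync` setting, for example).

```python
import json
import os
import tempfile


class DurableKV:
    """Toy in-memory key-value store with an optional append-only log."""

    def __init__(self, path=None):
        self.data = {}
        self.path = path  # None: purely in memory, nothing survives a restart
        if path and os.path.exists(path):
            with open(path, encoding="utf-8") as log:
                for line in log:  # replay every logged write, in order
                    self._apply(json.loads(line))

    def _apply(self, entry):
        if entry["op"] == "set":
            self.data[entry["key"]] = entry["value"]
        else:
            self.data.pop(entry["key"], None)

    def _log(self, entry):
        if self.path:
            with open(self.path, "a", encoding="utf-8") as log:
                log.write(json.dumps(entry) + "\n")
                log.flush()
                os.fsync(log.fileno())  # sync on every write

    def set(self, key, value):
        entry = {"op": "set", "key": key, "value": value}
        self._log(entry)  # log first, then apply in memory
        self._apply(entry)

    def delete(self, key):
        entry = {"op": "del", "key": key}
        self._log(entry)
        self._apply(entry)

    def get(self, key):
        return self.data.get(key)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "kv.log")
    store = DurableKV(path)
    store.set("a", 1)
    store.set("b", 2)
    store.delete("a")
    restarted = DurableKV(path)  # simulate a restart: rebuild state from the log
    print(restarted.get("a"), restarted.get("b"))  # None 2
```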
Adding Search to NoSQL
• Hard to do without a lot of compromise
• Build your own, or use Lucene or a Lucene-based solution
• Nothing has yet set the world on fire...
Adding NoSQL to Search
• Search solutions are generally already a document-based NoSQL solution.
• Seems a lot easier to do than the reverse
• Nothing has yet set the world on fire...
Solr NoSQL Features
• Realtime-Get
• Update Durability
• Atomic Compare and Set
• Versioning and optimistic locking
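Solr's optimistic locking keys off the `_version_` field sent with an update: a value greater than 1 must match the stored version exactly, 1 means the document must exist, a negative value means it must not exist, and 0 (or no value) skips the check; a failed check is rejected with HTTP 409. The class below is a hypothetical in-memory model of those semantics, acting as a compare-and-set on the version; it is not Solr's code, and real version numbers are large clock-derived values rather than a counter.

```python
import itertools


class VersionConflict(Exception):
    """Stands in for the HTTP 409 Conflict that Solr returns on a version mismatch."""


class ToyVersionedIndex:
    """In-memory model of Solr-style optimistic concurrency on the _version_ field."""

    def __init__(self):
        self.docs = {}
        # Real Solr versions are large, clock-derived longs; a counter is enough here.
        self.versions = itertools.count(1000)

    def get(self, doc_id):
        # Like realtime get: the newest version is visible at once, no commit needed.
        doc = self.docs.get(doc_id)
        return dict(doc) if doc is not None else None

    def add(self, doc):
        doc_id = doc["id"]
        expected = doc.get("_version_", 0)
        current = self.docs.get(doc_id)
        if expected > 1 and (current is None or current["_version_"] != expected):
            raise VersionConflict(f"{doc_id}: version {expected} is not current")
        if expected == 1 and current is None:
            raise VersionConflict(f"{doc_id}: document must already exist")
        if expected < 0 and current is not None:
            raise VersionConflict(f"{doc_id}: document must not exist yet")
        # expected == 0 (or absent): no check, plain overwrite.
        stored = {k: v for k, v in doc.items() if k != "_version_"}
        stored["_version_"] = next(self.versions)
        self.docs[doc_id] = stored
        return stored["_version_"]


index = ToyVersionedIndex()
v1 = index.add({"id": "book1", "title": "Solr"})
v2 = index.add({"id": "book1", "title": "Solr, 2nd ed.", "_version_": v1})
try:
    index.add({"id": "book1", "title": "stale edit", "_version_": v1})
except VersionConflict as err:
    print("rejected:", err)
```

Against a real Solr, the equivalent is sending the document with its `_version_` to the update handler and reading the latest copy back through the `/get` realtime-get handler.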
Schemaless?
• NoSQL databases are generally ‘schemaless’
• Convenient in some ways, not in others.
• The implicit schema moves to application code.
• Can’t optimize based on types.
• Note: some people call ‘guessed’ schemas schemaless.
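The catch with a ‘guessed’ schema is that the first value seen for a field fixes its type, and a later document that disagrees is rejected. The toy below illustrates that failure mode; it is a hypothetical sketch, not Solr's field-guessing update processors, and its type names only loosely echo Solr's.

```python
def guess_type(value):
    # Check bool before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"


class GuessingSchema:
    """Fixes each field's type from the first value seen for it (hypothetical toy)."""

    def __init__(self):
        self.fields = {}

    def index(self, doc):
        guessed = {name: guess_type(value) for name, value in doc.items()}
        # Validate the whole document before registering any new fields.
        for name, kind in guessed.items():
            if self.fields.get(name, kind) != kind:
                raise TypeError(f"{name!r} was guessed as {self.fields[name]}, got {kind}")
        for name, kind in guessed.items():
            self.fields.setdefault(name, kind)


schema = GuessingSchema()
schema.index({"id": "1", "price": 10})        # price is now fixed as "long"
try:
    schema.index({"id": "2", "price": 9.99})  # a later document disagrees
except TypeError as err:
    print(err)  # 'price' was guessed as long, got double
```

An explicit schema avoids this by declaring `price` as a floating-point type up front, which is also what lets the engine optimize per type.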
SolrCloud
• Most similar to the MongoDB architecture
• A CP system, though currently eventually consistent.
• The architecture supports adding strong consistency options.
• The length of time an inconsistency is present is
called the inconsistency window.
• SolrCloud has a very small inconsistency window.
Data Model
• key -> document
• Optionally, column oriented
Contact Info
• @heismark
• markrmiller@gmail.com

The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadoop
 

Recently uploaded

Multithreading_in_C++ - std::thread, race condition
Multithreading_in_C++ - std::thread, race conditionMultithreading_in_C++ - std::thread, race condition
Multithreading_in_C++ - std::thread, race condition
Mohammed Sikander
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
Jean Carlos Nunes Paixão
 
Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
Wasim Ak
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
Israel Genealogy Research Association
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
MysoreMuleSoftMeetup
 
Best Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDABest Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDA
deeptiverma2406
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
TechSoup
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
David Douglas School District
 
Digital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion DesignsDigital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion Designs
chanes7
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
tarandeep35
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
vaibhavrinwa19
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
TechSoup
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
Celine George
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
chanes7
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
heathfieldcps1
 

Recently uploaded (20)

Multithreading_in_C++ - std::thread, race condition
Multithreading_in_C++ - std::thread, race conditionMultithreading_in_C++ - std::thread, race condition
Multithreading_in_C++ - std::thread, race condition
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
 
Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
Mule 4.6 & Java 17 Upgrade | MuleSoft Mysore Meetup #46
 
Best Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDABest Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDA
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
 
Digital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion DesignsDigital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion Designs
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
 

SolrCloud: The 'Search First' NoSQL Database (Extended Deep Dive)

  • 1. Solr The Search First NoSQL Database
  • 2. Who Am I? • Mark Miller: Cloudera employee, Lucene PMC member, Apache member • Started playing with Lucene in 2006 • Lucene committer since 2008 • Solr committer since 2009
  • 4. Big Data is getting Bigger • The total Big Data market reached $11.4 billion in 2012 • The Big Data market is projected to reach $18.1 billion in 2013, an annual growth of 61% • On pace to exceed $47 billion by 2017.
  • 5. 3 basic needs • Storage • Processing • Search
  • 6. Two Standouts in the Big Data Market • Hadoop • NoSQL
  • 7. Ultimately, the NoSQL market is largely up for grabs. Each NoSQL database has its related strengths and weaknesses, and no one NoSQL database currently “does it all.” Big Data practitioners must take a number of factors into consideration when selecting a NoSQL database to facilitate large-scale transactional workloads, including scalability, performance, security, and ease-of-development. Big Data Vendor Revenue and Market Forecast (Wikibon)
  • 8. RDBMS • The classic way to store your data. • ACID is great, transactions are cool, SQL is well known and understood. • Scaling is *hard*, but possible (see Facebook’s MySQL cluster) • ‘impedance mismatch’ sucks
  • 9. Search • Search has been moving from an expensive, complicated option to an affordable, easier necessity. • Lots of data begs for the ability to process it, store it, and search it.
  • 10. Enterprise Search Engines • Verity - acquired by Autonomy in 2005 • FAST - acquired by Microsoft in 2008 • Endeca - acquired by Oracle in 2011 • Autonomy - acquired by HP in 2011 • Vivisimo - acquired by IBM in 2012
  • 11. NoSQL • Not Only SQL rather than ‘No SQL’ • Except that makes little sense... • “when ‘NoSQL’ is applied to a database, it refers to an ill-defined set of mostly open-source databases, mostly developed in the early 21st century, and mostly not using SQL.” - NoSQL Distilled
  • 13. In the beginning.. • BerkeleyDB (1991?) • Lotus Notes (1989?) • Bayou (1996?)
  • 14. In the beginning of the modern era... • BigTable (Google) (started in 2004, paper in 2006) • Dynamo (Amazon) (paper in 2007)
  • 15. Derivatives • Dynamo: Cassandra, CouchDB, Voldemort, Riak • BigTable: Cassandra, HBase, Redis, HyperTable, Accumulo
  • 16. Also... • AppEngine storage built on BigTable • DynamoDB - based on the principles of Dynamo
  • 17. When it comes to NoSQL, Open Source rules the roost. • I won’t be talking about any solution that is not based on Open Source - only because those solutions are not popular. • "there’s a notion that NoSQL is an open-source phenomenon.” - NoSQL Distilled
  • 18. The 2013 Future of Open Source Survey Results (Black Duck and North Bridge)
  • 19. What’s Popular? • NoSQL database proliferation - NoSQL databases are a dime a dozen. Why? • Which solutions should we look at?
  • 20. indeed.com • Indeed.com is an employment-related metasearch engine for job listings • Indeed is the #1 job site worldwide, with over 100 million unique visitors per month. Indeed is available in more than 50 countries and 26 languages, covering 94% of global GDP.
  • 21. http://db-engines.com • DB-Engines is an initiative to collect and present information on database management systems (DBMS). In addition to established relational DBMS, systems and concepts of the growing NoSQL area are emphasized. • The DB-Engines Ranking is a list of DBMS ranked by their current popularity. The list is updated monthly.
  • 28. NoSQL Database Types • Key-Value • Column Family • Document • Graph
  • 29. I’m going to ignore Graph...everyone else seems to...
  • 35. In case you forgot, Oracle is in the NoSQL game... • Oracle NoSQL
  • 36. CAP Theorem The CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: • Consistency (all nodes see the same data at the same time) • Availability (a guarantee that every request receives a response about whether it was successful or failed) • Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
  • 37. CAP
  • 38. Architectures • For NoSQL, the choice generally boils down to AP or CP; CA does not support partition tolerance. • You have to trade off consistency versus availability. • AP favors availability over consistency: this is the eventually consistent architecture. • CP favors consistency over availability. • Of course, there is a continuum between AP and CP.
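One common way Dynamo-style stores expose the continuum between AP and CP is tunable quorums: with N replicas, a write waits for W acknowledgements and a read consults R replicas. If R + W > N, every read set overlaps every write set, so at least one replica consulted by a read has seen the latest acknowledged write. The sketch below only illustrates that counting rule; the function names are hypothetical and not any database's API, and real systems (sloppy quorums, concurrent writes) need more than this condition for strong consistency.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every read quorum must intersect every write quorum (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("quorum sizes must be between 1 and N")
    return r + w > n


def classify(n: int, r: int, w: int) -> str:
    # Overlapping quorums lean CP: a read can always reach the newest acknowledged write.
    # Smaller, non-overlapping quorums lean AP: faster and more failure tolerant, but reads may be stale.
    return "consistent reads" if quorums_overlap(n, r, w) else "eventually consistent"


for r, w in [(1, 1), (2, 2), (1, 3), (3, 1)]:
    print(f"N=3 R={r} W={w}: {classify(3, r, w)}")
```

Sliding R and W between 1 and N is what moves such a system along the AP/CP continuum, trading latency and availability for consistency.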
  • 39. Key Design Decisions • Data Model - how is the data stored/accessed • Distribution Model - how is the data distributed • Conflict Resolution - how is it ensured that the same update ‘wins’ on each node.
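A minimal sketch of one conflict-resolution policy, last-write-wins, assuming each write carries a timestamp plus a node id used as a tiebreaker (both fields are hypothetical here). Because every node applies the same deterministic comparison, the same update "wins" everywhere no matter what order the writes arrive in.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import permutations
from typing import Iterable, Optional


@dataclass(frozen=True)
class Write:
    value: str
    timestamp: int  # hypothetical: time assigned by the client or coordinator
    node_id: str    # hypothetical tiebreaker so equal timestamps still resolve the same way


def resolve(current: Optional[Write], incoming: Write) -> Write:
    """Last-write-wins: keep whichever write has the larger (timestamp, node_id) pair."""
    if current is None:
        return incoming
    return max(current, incoming, key=lambda w: (w.timestamp, w.node_id))


def apply_all(writes: Iterable[Write]) -> Optional[Write]:
    """Fold writes, in delivery order, into the single surviving value."""
    state: Optional[Write] = None
    for w in writes:
        state = resolve(state, w)
    return state


writes = [Write("a", 10, "n1"), Write("b", 12, "n2"), Write("c", 12, "n1")]

# Each permutation stands in for a node receiving the same writes in a different order.
winners = {apply_all(order) for order in permutations(writes)}
print(winners)  # one element: the write with value "b"
```

The convergence is bought by silently discarding concurrent writes ("c" is lost even though it shares a timestamp with "b"), which is why some systems prefer version vectors and client-side resolution instead.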
  • 40. Data Model • key -> value (opaque) • key -> document • column oriented
  • 41. Distribution Model • Roughly, how is data distributed across the cluster? • Sharding, replication, etc.
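One simple distribution model is hash-range sharding: hash each document id into a 32-bit space and give each shard a contiguous slice of that space. SolrCloud's default router follows this general idea (as I understand it, with MurmurHash3); the sketch below substitutes MD5 from the standard library, and the helper names are hypothetical.

```python
from __future__ import annotations

import hashlib
from bisect import bisect_right


def hash32(key: str) -> int:
    """Stable 32-bit hash of a document id (MD5-based purely because it is in the stdlib)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


def make_ranges(num_shards: int) -> list[int]:
    """Split the 32-bit space into equal contiguous ranges; return each range's exclusive upper bound."""
    size = 2**32 // num_shards
    bounds = [size * (i + 1) for i in range(num_shards)]
    bounds[-1] = 2**32  # the last shard absorbs any remainder
    return bounds


def shard_for(doc_id: str, bounds: list[int]) -> int:
    # Number of upper bounds <= hash == index of the range that contains the hash.
    return bisect_right(bounds, hash32(doc_id))


bounds = make_ranges(4)
counts = [0] * 4
for i in range(10_000):
    counts[shard_for(f"doc-{i}", bounds)] += 1
print(counts)  # roughly even across the four shards
```

Keeping ranges contiguous (rather than using `hash % num_shards`) is what makes it possible to split one shard's range in two without rehashing every other shard.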
  • 42. Data Versioning and Consistency • Essentially, how is data kept consistent across nodes? • Sequential consistency: ensuring that all nodes apply operations in the same order. • Update consistency and read consistency.
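A minimal sketch of why "the same order" matters: replicas that replay one totally ordered operation log end in the same state, while a replica that applies the same operations in a different order can diverge. The `(op, key, value)` log format is a hypothetical stand-in.

```python
from __future__ import annotations


def apply(ops: list[tuple[str, str, str]]) -> dict[str, str]:
    """Replay (op, key, value) operations against an empty replica."""
    state: dict[str, str] = {}
    for op, key, value in ops:
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown op {op!r}")
    return state


ops = [("set", "k", "1"), ("delete", "k", ""), ("set", "k", "2")]

# Sequential consistency: every replica replays the same totally ordered log.
replica_a = apply(ops)
replica_b = apply(list(ops))

# Without an agreed order, a replica that happens to see the delete last diverges.
replica_c = apply([ops[0], ops[2], ops[1]])

print(replica_a, replica_b, replica_c)  # {'k': '2'} {'k': '2'} {}
```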
  • 43. MongoDB • Data Model - BSON (binary JSON) • Distribution Model - sharded, asynchronous master/slave replication • Data Versioning and Consistency - master/slave, per table write lock
  • 44. MongoDB Search • Built-in text search. I think of it like an RDBMS's built-in full-text search: major feature gaps compared with dedicated full-text search engines, and likely major performance gaps. • Common to sit a search engine next to MongoDB
  • 45. Cassandra • Data Model - column based, like BigTable • Distribution Model - similar to Dynamo: consistent hashing, master-master • Data Versioning and Consistency - timestamps
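Consistent hashing, the Dynamo idea mentioned above, places nodes and keys on the same hash ring; a key belongs to the first node token clockwise from its hash. The sketch below is a toy ring, not Cassandra's partitioner: MD5, 64 virtual tokens per node, and the `HashRing` class are all assumptions chosen for illustration. It demonstrates the property that matters: adding a node moves only a fraction of the keys, and every moved key goes to the new node.

```python
from __future__ import annotations

import hashlib
from bisect import bisect_left


def _hash(key: str) -> int:
    """Stable 64-bit ring position (MD5-based only because it is in the stdlib)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual tokens per node."""

    def __init__(self, nodes: list[str], vnodes: int = 64) -> None:
        self.vnodes = vnodes  # virtual tokens per node; 64 is an arbitrary choice
        self._tokens: list[tuple[int, str]] = []
        self._positions: list[int] = []
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Several tokens per node even out how much of the ring each node owns.
        self._tokens.extend((_hash(f"{node}#{i}"), node) for i in range(self.vnodes))
        self._tokens.sort()
        self._positions = [pos for pos, _ in self._tokens]

    def owner(self, key: str) -> str:
        # First token at or after the key's position, wrapping past the last token.
        idx = bisect_left(self._positions, _hash(key))
        return self._tokens[idx % len(self._tokens)][1]


keys = [f"user:{i}" for i in range(5000)]
ring = HashRing(["n1", "n2", "n3"])
before = {k: ring.owner(k) for k in keys}
ring.add("n4")
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved")  # roughly a quarter, not all of them
```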
  • 46. Cassandra Search • Lucandra • Solandra • DataStax Enterprise Search (Solr fields must be strings)
  • 47. HBase • Data Model - column store • Distribution Model - regions served by region servers • Versioning and Consistency - strongly consistent
  • 48. HBase Search • HBasene (dead?) • HBASE-SEARCH, HBASE-3529 (dead?) • Solbase • Lily
  • 49. • Riak is a NoSQL database implementing the principles from Amazon's Dynamo paper • Data Model - stores key/value pairs in a high level namespace called a bucket. • Data Versioning and Consistency - Riak uses a data structure called a vector clock to reason about causality and staleness of stored values. (Can also use timestamps). Last write wins, or client resolves conflict.
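A minimal sketch of how vector clocks reason about causality: each actor increments its own counter on write, one value supersedes another if its clock is component-wise greater or equal, and if neither dominates the values are concurrent siblings for the client to resolve. This is the textbook technique, not Riak's implementation (which also handles actor ids and clock pruning); all names below are hypothetical.

```python
from __future__ import annotations


def increment(clock: dict[str, int], actor: str) -> dict[str, int]:
    """Return a copy of `clock` recording one more update by `actor`."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a: dict[str, int], b: dict[str, int]) -> bool:
    """True if `a` has seen every update `b` has (component-wise a >= b)."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: dict[str, int], b: dict[str, int]) -> str:
    a_has_b, b_has_a = descends(a, b), descends(b, a)
    if a_has_b and b_has_a:
        return "equal"
    if a_has_b:
        return "after"       # a supersedes b
    if b_has_a:
        return "before"      # b supersedes a
    return "concurrent"      # neither saw the other: keep both as siblings


def merge(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Component-wise maximum: the clock of a value that has seen both histories."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}


v1 = increment({}, "client_x")
v2a = increment(v1, "client_x")   # client_x updates its own earlier value
v2b = increment(v1, "client_y")   # client_y updates the same parent concurrently
print(compare(v2a, v1))           # after
print(compare(v2a, v2b))          # concurrent
resolved = increment(merge(v2a, v2b), "client_x")  # a client reconciles the siblings
print(compare(resolved, v2b))     # after
```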
  • 50. Riak Search • Riak Search - custom search engine, Solr-like API • Yokozuna
  • 51. Yokozuna Author Enumerates Common Reasons Custom Search has Failed • Pretends to be lucene/solr • Lack of analyzer/language/features • Bad performance/resource usage for certain queries • Basho is not in the business of search
  • 52. • CouchDB’s data format is JSON stored as documents (self-contained records with no intrinsic relationships), grouped into “database” namespaces. • Conflicts are left to the application to resolve at write time. CouchDB arbitrarily, but deterministically, determines a winner and tracks a conflict. The client must then resolve the conflict.
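"Arbitrarily, but deterministically" means every replica runs the same comparison over the conflicting leaf revisions and therefore picks the same winner without coordinating. The sketch assumes a rule modeled loosely on CouchDB's (prefer the higher revision generation, then the higher revision string); treat the exact rule, and the made-up revision strings, as assumptions rather than CouchDB's code.

```python
from __future__ import annotations


def rev_key(rev: str) -> tuple[int, str]:
    """Split a revision like '3-abc' into (generation, hash) for ordering."""
    generation, _, digest = rev.partition("-")
    return int(generation), digest


def pick_winner(leaf_revs: list[str]) -> tuple[str, list[str]]:
    """Deterministically choose a winning revision; the rest are tracked as conflicts."""
    ordered = sorted(leaf_revs, key=rev_key, reverse=True)
    return ordered[0], ordered[1:]


# Two replicas edited the same document independently, then replicated to each other.
node_a = ["2-7051cbe5c8faecd085a3fa619e6e6337", "2-b91bb807b4685080c6a651115ff558f5"]
node_b = list(reversed(node_a))  # same leaves, seen in a different order

winner_a, conflicts_a = pick_winner(node_a)
winner_b, conflicts_b = pick_winner(node_b)
print(winner_a == winner_b, conflicts_a)
```

The winner is "arbitrary" in that nothing about it reflects which edit the user intended; the losing revision is kept as a conflict so the client can merge it.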
  • 53. CouchDB Search • CouchDB-Lucene • Seems people usually just sit a search engine next to CouchDB
  • 54. • Redis is an open-source, networked, in-memory, key-value data store with optional durability. • Memcached is a general-purpose distributed memory caching system • Redis-Search
  • 55. Adding Search to NoSQL • Hard to do without a lot of compromise • Build your own, or use Lucene or Lucene based solution • Nothing has yet set the world on fire...
  • 56. Adding NoSQL to Search • Search solutions are generally already a document-based NoSQL solution. • Seems a lot easier to do than the reverse • Nothing has yet set the world on fire...
  • 57. Solr NoSQL Features • Realtime-Get • Update Durability • Atomic Compare and Set • Versioning and optimistic locking
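In Solr, optimistic locking works by sending a `_version_` value with the document to the update handler, and realtime get (the `/get?id=...` handler) returns the latest version of a document even before a commit. The in-memory class below is a hypothetical stand-in, not Solr code; it mimics the documented `_version_` rules as I understand them: greater than 1 must match exactly, 1 means the document must exist, negative means it must not exist, and 0 or absent means no check. The starting version number is arbitrary.

```python
from __future__ import annotations

import itertools


class VersionConflict(Exception):
    """Raised when a _version_ precondition fails (Solr answers with HTTP 409)."""


class TinyStore:
    """Hypothetical in-memory store illustrating Solr-style optimistic concurrency."""

    def __init__(self) -> None:
        self._docs: dict[str, dict] = {}
        self._clock = itertools.count(1000)  # hypothetical increasing versions, always > 1

    def get(self, doc_id: str) -> dict | None:
        # Realtime-get analogue: always returns the latest accepted version.
        return self._docs.get(doc_id)

    def add(self, doc: dict) -> int:
        doc_id = doc["id"]
        expected = doc.get("_version_", 0)
        current = self._docs.get(doc_id)
        if expected > 1 and (current is None or current["_version_"] != expected):
            raise VersionConflict(f"version mismatch for {doc_id}")
        if expected == 1 and current is None:
            raise VersionConflict(f"{doc_id} must already exist")
        if expected < 0 and current is not None:
            raise VersionConflict(f"{doc_id} must not exist")
        # expected == 0 (or absent): overwrite unconditionally.
        new_version = next(self._clock)
        self._docs[doc_id] = {**doc, "_version_": new_version}
        return new_version


store = TinyStore()
v1 = store.add({"id": "book1", "title": "Solr"})                    # no precondition
store.add({"id": "book1", "title": "Solr 4", "_version_": v1})      # matches, accepted
try:
    store.add({"id": "book1", "title": "stale", "_version_": v1})   # v1 is now stale
except VersionConflict as err:
    print("rejected:", err)
print(store.get("book1")["title"])  # Solr 4
```

The usual client loop is read (via realtime get), modify, and write back with the version just read, retrying on conflict.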
  • 58. Schemaless? • NoSQL databases are generally ‘schemaless’ • Convenient in some ways, not in others. • Implicit schema moves to application code. • Can’t optimize based on types. • Note: some are calling ‘guessed’ schemas schemaless.
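A minimal sketch of a "guessed" schema: the first value seen for each field fixes its type. This is only in the spirit of Solr's schemaless mode, which guesses types from incoming values; the type names and guessing order below are assumptions, not Solr's configuration.

```python
from __future__ import annotations


def guess_type(value: str) -> str:
    """Guess a field type from one raw string value: boolean, long, double, else text."""
    if value.lower() in ("true", "false"):
        return "boolean"
    try:
        int(value)
        return "long"
    except ValueError:
        pass
    try:
        float(value)
        return "double"
    except ValueError:
        return "text"


def infer_schema(docs: list[dict[str, str]]) -> dict[str, str]:
    """Fix each field's type from the first value seen, as a guessing schema would."""
    schema: dict[str, str] = {}
    for doc in docs:
        for field, value in doc.items():
            schema.setdefault(field, guess_type(value))
    return schema


docs = [{"id": "1", "price": "10", "in_stock": "true"},
        {"id": "2", "price": "9.99", "in_stock": "false"}]
schema = infer_schema(docs)
print(schema)  # {'id': 'long', 'price': 'long', 'in_stock': 'boolean'}
```

The first document fixes `price` as a long, so the second document's `9.99` no longer fits. Avoiding that means declaring the type up front, which is exactly the implicit schema moving back into the application.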
  • 59. SolrCloud • Most similar to the MongoDB architecture • A CP system, though currently eventually consistent • The architecture supports adding strong consistency options.
  • 60. SolrCloud • The length of time an inconsistency is present is called the inconsistency window. • SolrCloud has a very small inconsistency window.
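A toy model of the inconsistency window: the leader applies a write immediately, a replica applies it `lag` ticks later, and the window is how long a read from the replica could return stale data. The `AsyncReplicatedShard` class, the tick clock, and the `lag` parameter are hypothetical; the model does not describe how SolrCloud actually forwards updates or opens searchers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Replica:
    data: dict[str, str] = field(default_factory=dict)


@dataclass
class AsyncReplicatedShard:
    """Leader applies writes at once; the replica applies each one `lag` ticks later."""
    lag: int
    leader: Replica = field(default_factory=Replica)
    replica: Replica = field(default_factory=Replica)
    pending: list[tuple[int, str, str]] = field(default_factory=list)  # (due_tick, key, value)
    now: int = 0

    def write(self, key: str, value: str) -> None:
        self.leader.data[key] = value
        self.pending.append((self.now + self.lag, key, value))
        self._deliver()  # with lag 0 the replica is updated immediately

    def tick(self) -> None:
        self.now += 1
        self._deliver()

    def _deliver(self) -> None:
        due = [p for p in self.pending if p[0] <= self.now]
        self.pending = [p for p in self.pending if p[0] > self.now]
        for _, key, value in due:  # preserves the leader's write order
            self.replica.data[key] = value


def inconsistency_window(lag: int) -> int:
    """Count ticks during which the replica disagrees with the leader after one write."""
    shard = AsyncReplicatedShard(lag=lag)
    shard.write("k", "v1")
    stale_ticks = 0
    while shard.replica.data.get("k") != shard.leader.data["k"]:
        stale_ticks += 1
        shard.tick()
    return stale_ticks


print(inconsistency_window(0), inconsistency_window(3))  # 0 3
```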
  • 61. Data Model • key -> document • Optionally, column oriented
  • 63. Contact Info • @heismark • markrmiller@gmail.com