Nitro: A Fast, Scalable In-Memory Storage Engine for NoSQL Global Secondary Index
Sarath Lakshman
Sriram Melkote
John Liang
Ravi Mayuram

VLDB 2016
©2016	Couchbase	Inc.	 2	
Background
•  Couchbase is a scalable NoSQL distributed document database
•  Global Secondary Indexes (GSI) are updated asynchronously and scaled independently from main document storage
•  Single-index write performance matters because the index has to keep up with the rate of document mutations
Design considerations
•  Multiple writers for high performance
    -  Utilize the inherent parallelism in the Database Change Protocol (DCP)
    -  Scale single-index write performance across the available CPU cores
•  Lock-free data structures for high concurrency
    -  Writers and readers never block
    -  Maximize utilization of multicore CPUs
•  Fast snapshots
    -  Minimize latency for index queries and reduce staleness of the index
    -  Create read snapshots at a rate of 100 per second
•  Leverage optimizations for memory-resident data structures
Nitro Architecture
•  Implements Insert, Delete, Lookup, and Range Iteration
•  Concurrent partitioned visitors
•  Concurrent bottom-up skiplist build

•  Create point-in-time immutable snapshots for index scans
•  Avoid phantoms and provide scan stability
•  Manage the index snapshot versions in use

•  Remove items that belong to unused snapshots from the skiplist
•  Free items once they are GCed and no longer referenced

•  Create backups from snapshots and recover Nitro after a restart or crash
Skiplist
•  Probabilistically balanced, ordered search data structure
•  Search is similar to binary search over linked lists (O(log n))
•  Item-granular operations, unlike a B+Tree (page oriented)
•  The lock-free skiplist is implemented using compare-and-swap and atomic-add-fetch (see paper for details)
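To make the level-by-level search concrete, here is a minimal single-threaded skiplist sketch in Python. It is not Nitro's lock-free implementation (the slide defers the compare-and-swap details to the paper); the names `SkipList`, `insert`, `contains`, `items` and the parameters `MAX_LEVEL` and `P` are assumptions chosen for illustration.

```python
import random

MAX_LEVEL = 16   # assumed cap on tower height
P = 0.5          # assumed probability of promoting a node one level


class Node:
    def __init__(self, key, level):
        self.key = key
        self.next = [None] * level   # next[i] is the successor at level i


class SkipList:
    def __init__(self, seed=None):
        self.head = Node(None, MAX_LEVEL)   # sentinel, holds no key
        self.level = 1                      # highest level currently in use
        self.rng = random.Random(seed)

    def _random_level(self):
        level = 1
        while level < MAX_LEVEL and self.rng.random() < P:
            level += 1
        return level

    def _find_predecessors(self, key):
        # Walk from the top level down, moving right while the next key is smaller.
        preds = [self.head] * MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.next[i] is not None and node.next[i].key < key:
                node = node.next[i]
            preds[i] = node
        return preds

    def insert(self, key):
        preds = self._find_predecessors(key)
        level = self._random_level()
        self.level = max(self.level, level)
        node = Node(key, level)
        for i in range(level):
            node.next[i] = preds[i].next[i]
            preds[i].next[i] = node

    def contains(self, key):
        candidate = self._find_predecessors(key)[0].next[0]
        return candidate is not None and candidate.key == key

    def items(self):
        node = self.head.next[0]
        while node is not None:
            yield node.key
            node = node.next[0]


sl = SkipList(seed=1)
for k in [30, 10, 20]:
    sl.insert(k)
print(list(sl.items()))   # [10, 20, 30]
```

Each insert touches only the new node and its predecessors, which is the item-granular property the slide contrasts with page-oriented B+Trees.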
Multi-version management
Lifetime of an item: (bornSn, deadSn)

Create Snapshot (Sn=1):
V=10 (bornSn=1, deadSn=0)
V=20 (bornSn=1, deadSn=0)
V=30 (bornSn=1, deadSn=0)

Create Snapshot (Sn=2):
V=10 (bornSn=1, deadSn=0)
V=20 (bornSn=1, deadSn=0)
V=30 (bornSn=1, deadSn=0)
V=15 (bornSn=2, deadSn=0)
V=32 (bornSn=2, deadSn=0)
Multi-version management

Create Snapshot (Sn=3):
V=10 (bornSn=1, deadSn=0)
V=20 (bornSn=1, deadSn=3)
V=30 (bornSn=1, deadSn=0)
V=15 (bornSn=2, deadSn=0)
V=32 (bornSn=2, deadSn=3)
V=32 (bornSn=3, deadSn=0)
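The node states on these slides can be replayed with a small Python sketch. It assumes the numbering the diagrams suggest: a writer tags inserts with the current snapshot number, a delete stamps `deadSn` with the current number, an update is a delete followed by an insert, and creating a snapshot increments the number. `Index`, `Item`, and the method names are hypothetical, and a plain list stands in for the skiplist.

```python
from dataclasses import dataclass


@dataclass
class Item:
    value: int
    born_sn: int
    dead_sn: int = 0   # 0 means "still alive", as on the slides


class Index:
    def __init__(self):
        self.curr_sn = 1   # snapshot number that in-flight writes belong to
        self.items = []    # a plain list stands in for the skiplist

    def insert(self, value):
        self.items.append(Item(value, born_sn=self.curr_sn))

    def delete(self, value):
        # Mark the live version dead instead of unlinking it: older
        # snapshots may still need to see it.
        for item in self.items:
            if item.value == value and item.dead_sn == 0:
                item.dead_sn = self.curr_sn
                return True
        return False

    def update(self, old_value, new_value):
        # An update is modelled as delete + insert in the same snapshot.
        self.delete(old_value)
        self.insert(new_value)

    def create_snapshot(self):
        sn = self.curr_sn
        self.curr_sn += 1
        return sn


idx = Index()
for v in (10, 20, 30):
    idx.insert(v)
idx.create_snapshot()    # Sn=1
idx.insert(15)
idx.insert(32)
idx.create_snapshot()    # Sn=2
idx.delete(20)
idx.update(32, 32)       # same value re-indexed as a new version
idx.create_snapshot()    # Sn=3

state = [(i.value, i.born_sn, i.dead_sn) for i in idx.items]
print(state)
# [(10, 1, 0), (20, 1, 3), (30, 1, 0), (15, 2, 0), (32, 2, 3), (32, 3, 0)]
```

Because deletes only stamp `dead_sn`, writers never rewrite data that an open snapshot may still be reading.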
Multi-version management

Visibility: Iterator (Sn=1)
V=10 (bornSn=1, deadSn=0)
V=20 (bornSn=1, deadSn=3)
V=30 (bornSn=1, deadSn=0)
V=15 (bornSn=2, deadSn=0)
V=32 (bornSn=2, deadSn=3)
V=32 (bornSn=3, deadSn=0)
Multi-version management

Visibility: Iterator (Sn=2)
V=10 (bornSn=1, deadSn=0)
V=20 (bornSn=1, deadSn=3)
V=30 (bornSn=1, deadSn=0)
V=15 (bornSn=2, deadSn=0)
V=32 (bornSn=2, deadSn=3)
V=32 (bornSn=3, deadSn=0)
Multi-version management

Visibility: Iterator (Sn=3)
V=10 (bornSn=1, deadSn=0)
V=20 (bornSn=1, deadSn=3)
V=30 (bornSn=1, deadSn=0)
V=15 (bornSn=2, deadSn=0)
V=32 (bornSn=2, deadSn=3)
V=32 (bornSn=3, deadSn=0)
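The visibility slides differ only in which nodes an iterator may return. Below is a sketch of the rule that the `(bornSn, deadSn)` lifetimes imply, assuming an item is visible to snapshot `sn` when it was born at or before `sn` and was not yet dead at `sn`. The function names `visible` and `scan` are hypothetical; the tuples are the node states from the slides.

```python
# (V, bornSn, deadSn) for each node on the slides; deadSn=0 means alive.
NODES = [
    (10, 1, 0),
    (20, 1, 3),
    (30, 1, 0),
    (15, 2, 0),
    (32, 2, 3),
    (32, 3, 0),
]


def visible(born_sn, dead_sn, sn):
    """Return True if an item with this lifetime belongs to snapshot sn."""
    return born_sn <= sn and (dead_sn == 0 or dead_sn > sn)


def scan(sn):
    """Values an iterator opened on snapshot sn would return, in key order."""
    return sorted(v for v, born, dead in NODES if visible(born, dead, sn))


for sn in (1, 2, 3):
    print(sn, scan(sn))
# 1 [10, 20, 30]
# 2 [10, 15, 20, 30, 32]
# 3 [10, 15, 30, 32]
```

An iterator on Sn=2 still sees V=20 and the older V=32 even though both are dead as of Sn=3, which is how scans stay stable while writers proceed.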
Garbage Collection

Skiplist nodes:
V=1  (bornSn=1, deadSn=2)
V=2  (bornSn=2, deadSn=0)
V=3  (bornSn=1, deadSn=0)
V=4  (bornSn=1, deadSn=2)
V=5  (bornSn=2, deadSn=3)
V=6  (bornSn=3, deadSn=0)
V=7  (bornSn=4, deadSn=0)
V=8  (bornSn=1, deadSn=0)
V=9  (bornSn=3, deadSn=0)
V=10 (bornSn=3, deadSn=4)

Snapshot list: Sn=1 (rfcnt=0), Sn=2 (rfcnt=1), Sn=3 (rfcnt=0), Sn=4 (rfcnt=2)
[Diagram: Snapshot List, GC (Garbage Collection), SMR; node V=1 is shown separately from the skiplist]
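One way to read this slide is as a reachability check: a dead node can be unlinked once no snapshot that could see it is still referenced. The sketch below applies that rule to the node states and reference counts above. Pairing each `rfcnt` with a snapshot number in left-to-right order is an assumption about the diagram, `collectable` is a hypothetical name, and a real collector may be more conservative (for example, by processing snapshots strictly in order).

```python
# (V, bornSn, deadSn) for each node on the slide; deadSn=0 means alive.
NODES = [
    (1, 1, 2), (2, 2, 0), (3, 1, 0), (4, 1, 2), (5, 2, 3),
    (6, 3, 0), (7, 4, 0), (8, 1, 0), (9, 3, 0), (10, 3, 4),
]

# Snapshot number -> reference count, read left to right off the slide.
REFCOUNTS = {1: 0, 2: 1, 3: 0, 4: 2}


def collectable(born_sn, dead_sn, refcounts):
    """A dead item can be unlinked once no referenced snapshot can see it."""
    if dead_sn == 0:
        return False   # still alive in the latest version of the index
    # The item is visible to snapshots born_sn .. dead_sn - 1.
    return all(refcounts.get(sn, 0) == 0 for sn in range(born_sn, dead_sn))


garbage = [v for v, born, dead in NODES if collectable(born, dead, REFCOUNTS)]
print(garbage)   # [1, 4, 10]
```

V=5 is dead but stays in the skiplist because Sn=2, the only snapshot that can see it, is still referenced.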
Safe Memory Reclamation
•  Accessors that started earlier and are still alive can potentially hold references to GCed items
•  Freeing GCed items/nodes can cause dangling references
•  The memory reclaimer has to make sure that no accessor is holding a reference to GCed items
•  This problem does not occur in garbage-collected languages
•  A lock-free SMR algorithm is described in the paper
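The constraint in these bullets can be illustrated with a generic deferred-free scheme. This is a single-threaded Python simulation of an epoch-style approach, not the lock-free SMR algorithm from the paper; `Reclaimer`, `begin_access`, `end_access`, and `retire` are hypothetical names, and appending to `freed` stands in for actually releasing memory.

```python
class Reclaimer:
    """Defers frees until every accessor that might hold a reference has left."""

    def __init__(self):
        self.next_session = 0
        self.active = set()   # ids of accessors currently inside the structure
        self.pending = []     # (barrier, item): item waits for sessions < barrier
        self.freed = []       # stands in for returning memory to the allocator

    def begin_access(self):
        sid = self.next_session
        self.next_session += 1
        self.active.add(sid)
        return sid

    def end_access(self, sid):
        self.active.discard(sid)
        self._reclaim()

    def retire(self, item):
        # The item is already unlinked, so only accessors that started
        # before this point can still hold a reference to it.
        self.pending.append((self.next_session, item))
        self._reclaim()

    def _reclaim(self):
        oldest = min(self.active, default=self.next_session)
        still_pending = []
        for barrier, item in self.pending:
            if oldest >= barrier:
                self.freed.append(item)
            else:
                still_pending.append((barrier, item))
        self.pending = still_pending


r = Reclaimer()
early = r.begin_access()   # accessor that may hold a reference to "node-A"
r.retire("node-A")         # GC unlinked node-A while `early` is still running
late = r.begin_access()    # started after the unlink, cannot reach node-A
print(r.freed)             # []
r.end_access(early)
print(r.freed)             # ['node-A']
```

The item is freed as soon as the early accessor leaves, even though a later accessor is still active, because the later one could never have reached the unlinked node.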
Nitro Backup
[Diagram: Backup worker-1, worker-2, and worker-3 write File-1, File-2, and File-3; GC; Delta files; non-intrusive backup]
Nitro Recovery
[Diagram: File-1, File-2, File-3]
•  Concurrent bottom-up skiplist build
•  Avoids unnecessary CAS conflicts during concurrent insert
•  Snapshot number starts from Sn=1
•  Once the build is complete, additional items are inserted by replaying inserts from the delta files concurrently
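A bottom-up build can be sketched as follows: because the recovered items are already sorted, each node is appended at the tail of every level it belongs to, so no search and no compare-and-swap retry is needed. The sketch assumes each backup file holds a disjoint, sorted key range, builds one segment per file (sequentially here, where Nitro would use concurrent workers), and then joins the segments. `build_bottom_up`, `concat`, and `keys` are hypothetical names.

```python
import random

MAX_LEVEL = 16   # assumed cap on tower height


class Node:
    def __init__(self, key, level):
        self.key = key
        self.next = [None] * level


def build_bottom_up(sorted_keys, seed=None):
    """Link already-sorted keys into a skiplist segment without any searches."""
    rng = random.Random(seed)
    head = Node(None, MAX_LEVEL)
    tails = [head] * MAX_LEVEL        # last node linked so far at each level
    for key in sorted_keys:
        level = 1
        while level < MAX_LEVEL and rng.random() < 0.5:
            level += 1
        node = Node(key, level)
        for i in range(level):
            tails[i].next[i] = node   # append at the tail of level i
            tails[i] = node
    return head, tails


def concat(left, right):
    """Join two segments; every key in `left` must be smaller than any in `right`."""
    left_head, left_tails = left
    right_head, right_tails = right
    for i in range(MAX_LEVEL):
        first = right_head.next[i]
        if first is not None:
            left_tails[i].next[i] = first
            left_tails[i] = right_tails[i]
    return left_head, left_tails


def keys(head, level=0):
    node, out = head.next[level], []
    while node is not None:
        out.append(node.key)
        node = node.next[level]
    return out


# Three backup files, each with a disjoint sorted key range (assumed layout).
files = [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]
segments = [build_bottom_up(f, seed=i) for i, f in enumerate(files)]
head, _ = segments[0]
for seg in segments[1:]:
    concat(segments[0], seg)
print(keys(head))   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Inserts replayed from the delta files would then go through the normal insert path on top of the rebuilt skiplist.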
Couchbase Global Secondary Indexes (GSI)
•  The storage engine needs to maintain two storage structures:
    -  Reverse map
    -  Index
•  The reverse map is used to look up and remove the previous index entry for the docid during an update
•  The index store maintains ordered index entries used by index scans
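The interplay of the two structures can be sketched in a few lines of Python. A sorted list stands in for the ordered index store and a dict for the reverse map; `SecondaryIndex` and its method names are hypothetical, and the sample docids and values are the ones used elsewhere in the deck.

```python
import bisect


class SecondaryIndex:
    def __init__(self):
        self.index = []        # sorted list of (indexed_value, docid) entries
        self.reverse_map = {}  # docid -> indexed_value currently in the index

    def _remove_entry(self, docid):
        old_value = self.reverse_map.pop(docid, None)
        if old_value is not None:
            pos = bisect.bisect_left(self.index, (old_value, docid))
            del self.index[pos]

    def upsert(self, docid, value):
        # The mutation carries only the new document, so the reverse map is
        # the only way to find the stale entry that must be removed.
        self._remove_entry(docid)
        bisect.insort(self.index, (value, docid))
        self.reverse_map[docid] = value

    def delete(self, docid):
        self._remove_entry(docid)

    def scan(self, low, high):
        """Return docids whose indexed value v satisfies low <= v <= high."""
        start = bisect.bisect_left(self.index, (low,))
        result = []
        for value, docid in self.index[start:]:
            if value > high:
                break
            result.append(docid)
        return result


gsi = SecondaryIndex()
gsi.upsert("emp_005", "MountainView")
gsi.upsert("emp_008", "Sunnyvale")
gsi.upsert("emp_005", "Sunnyvale")   # emp_005 changes; the old entry is removed
print(gsi.index)   # [('Sunnyvale', 'emp_005'), ('Sunnyvale', 'emp_008')]
```

Without the reverse map, the stale `('MountainView', 'emp_005')` entry would remain in the index after the update.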
Nitro Integration – Couchbase Memory Optimized Indexes
•  Scalable write performance using multiple writers
•  A simple hash table is used for the reverse map instead of Nitro (avoids concurrency overheads)
•  Periodic backup persists only (indexItem, docid)
•  The reverse map can be reconstructed on the fly during recovery
•  End-to-end indexing latency ~20ms
[Diagram: hash(docid) % n selects writer-1, writer-2, .., writer-n; each writer has its own hash table (HT); Nitro INDEX; Index Scan]
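The routing in the diagram can be sketched as below. Because `hash(docid) % n` always sends a given document to the same writer, each writer can keep a private reverse map with no locking, while all writers share one index. `zlib.crc32` is used as the routing hash purely as an assumption (the slide says only `hash(docid)`), a Python set stands in for the shared Nitro index, and `writer_for` and `Writer` are hypothetical names.

```python
import zlib


def writer_for(docid, n_writers):
    """Pick the writer that owns a document id: hash(docid) % n."""
    return zlib.crc32(docid.encode("utf-8")) % n_writers


class Writer:
    def __init__(self):
        self.reverse_map = {}   # private hash table, touched by one writer only

    def apply(self, docid, value, shared_index):
        old = self.reverse_map.get(docid)
        if old is not None:
            shared_index.discard((old, docid))
        shared_index.add((value, docid))
        self.reverse_map[docid] = value


N_WRITERS = 4
writers = [Writer() for _ in range(N_WRITERS)]
shared_index = set()   # stands in for the shared, concurrent Nitro index

mutations = [("emp_005", "MountainView"), ("emp_008", "Sunnyvale"),
             ("emp_005", "Sunnyvale")]
for docid, value in mutations:
    writers[writer_for(docid, N_WRITERS)].apply(docid, value, shared_index)

print(sorted(shared_index))
# [('Sunnyvale', 'emp_005'), ('Sunnyvale', 'emp_008')]
```

Since only (indexItem, docid) pairs are needed to rebuild each writer's map, this layout is also consistent with reconstructing the reverse map during recovery.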
Storage Optimization for reverse map

DocID      Indexed Item
emp_005    MountainView
emp_008    Sunnyvale

Index Entry (Nitro INDEX)
MountainView:emp_005
Sunnyvale:emp_008

Hash table (HT)
CRC32 Hash    Node Pointers
hash1         [arrow to index entry]
hash2         [arrow to index entry]

•  Direct pointers from hash table to index entry
•  Storage needed for index maintenance reduced by ~50%
•  Index item delete cost reduced from O(log n) to O(1)
•  Optimized multi-entry indexing from a single document
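A sketch of how direct pointers might make deletes O(1): the hash table maps the CRC32 of a docid to references to the index nodes themselves, so removing a document's entries means following pointers rather than searching the ordered index. Everything here is an illustration under assumptions: `IndexNode` and `ReverseMap` are hypothetical names, a boolean `dead` flag stands in for stamping `deadSn`, and checking the docid suffix of the entry to resolve CRC32 collisions is one plausible approach, not something the slide states.

```python
import zlib


class IndexNode:
    """Stands in for a skiplist node holding one index entry."""

    def __init__(self, entry):
        self.entry = entry   # e.g. "MountainView:emp_005"
        self.dead = False    # a real engine would stamp deadSn instead


class ReverseMap:
    """Hash table from CRC32(docid) to direct node pointers."""

    def __init__(self):
        self.buckets = {}    # crc32 -> list of IndexNode references

    def add(self, docid, node):
        key = zlib.crc32(docid.encode("utf-8"))
        self.buckets.setdefault(key, []).append(node)

    def remove_all(self, docid):
        # Marks every entry of this document dead via its pointer,
        # without searching the ordered index.
        key = zlib.crc32(docid.encode("utf-8"))
        kept, removed = [], []
        for node in self.buckets.pop(key, []):
            # Assumed collision check: the docid is the suffix of the entry.
            if node.entry.endswith(":" + docid):
                node.dead = True
                removed.append(node)
            else:
                kept.append(node)
        if kept:
            self.buckets[key] = kept
        return removed


rmap = ReverseMap()
n1 = IndexNode("MountainView:emp_005")
n2 = IndexNode("Sunnyvale:emp_008")
rmap.add("emp_005", n1)
rmap.add("emp_008", n2)
rmap.remove_all("emp_005")
print(n1.dead, n2.dead)   # True False
```

Keeping a list of pointers per hash also accommodates documents that produce more than one index entry.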
Nitro Performance
[Figures: Insert benchmark; Lookup benchmark]
Nitro Performance
[Figures: Get with background Inserts; Throughput scalability with partitions]
End-to-End GSI Performance
GSI index server throughput (items/sec), single index benchmark

Operation    MOI Index    Regular Index
Insert       1,658,031    88,102
Update         822,680    70,802
Delete       1,578,316    80,578
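For a quick sense of scale, the ratio of MOI to regular index throughput can be computed directly from the numbers in the table. The snippet below is plain arithmetic on those figures; the variable names are illustrative.

```python
# Throughput in items/sec, copied from the table: (MOI index, regular index).
THROUGHPUT = {
    "Insert": (1_658_031, 88_102),
    "Update": (822_680, 70_802),
    "Delete": (1_578_316, 80_578),
}

speedup = {op: round(moi / regular, 1) for op, (moi, regular) in THROUGHPUT.items()}
print(speedup)   # {'Insert': 18.8, 'Update': 11.6, 'Delete': 19.6}
```

Updates show the smallest ratio, which fits the earlier description of an update as removing the previous entry and inserting a new one.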
Backup	and	Recovery	time
Q&A
