Building a High-Performance Database with Scala, Akka, and Spark

Here is my talk at Scala by the Bay 2016, Building a High-Performance Database with Scala, Akka, and Spark. Covers integration of Akka and Spark, when to use actors and futures, back pressure, reactive monitoring with Kamon, and more.


  1. Building a High-Performance Database with Scala, Akka, and Spark
     Evan Chan
  2. Who am I
     • User and contributor to Spark since 0.9, Cassandra since 0.6
     • Created Spark Job Server and FiloDB
     • Talks at Spark Summit, Cassandra Summit, Strata, Scala Days, etc.
     • http://velvia.github.io/
  3. Streaming is now King
  4. (Diagram) Message Queue → Events → Stream Processing Layer → State / Database → Happy Users
  5. Why are Updates Important?
     Appends:
     • Streaming workloads. Add new data continuously.
     • Real data is *always* changing. Queries on live, real-time data have business benefits.
     Updates:
     • Idempotency = really simple ingestion pipelines (see the sketch below)
     • Simpler streaming layer: update late events (see Spark 2.0 Structured Streaming)
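To make the idempotency point concrete, here is a minimal sketch (illustrative Scala, not FiloDB code): append-only ingestion duplicates data when a batch is replayed, while upserts keyed on a primary key converge to the same state no matter how many times the batch is replayed.

     // Each event carries a primary key; replays are harmless when
     // ingestion is an upsert keyed on that primary key.
     val batch = Seq(("user1", 10), ("user2", 20), ("user1", 30))

     // Append-only: replaying the batch doubles the rows.
     val appended = batch ++ batch        // 6 rows, with duplicates

     // Upsert by key: replaying converges to the same final state.
     val upserted = (batch ++ batch).foldLeft(Map.empty[String, Int]) {
       case (state, (key, value)) => state.updated(key, value)
     }
     // Map(user1 -> 30, user2 -> 20), however many times we replay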
  6. Introducing FiloDB
     A distributed, versioned, columnar analytics database. With updates. Built for streaming.
     http://www.github.com/filodb/FiloDB
  7. Fast Analytics Storage
     • Scan speeds competitive with Apache Parquet
     • In-memory version significantly faster
     • Flexible filtering along two dimensions
     • Much more efficient and flexible partition key filtering
     • Efficient columnar storage using dictionary encoding and other techniques (see the sketch below)
     • Updatable
     • Spark SQL for easy BI integration
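As a concrete illustration of the dictionary encoding bullet above, here is a minimal sketch (illustrative only, not FiloDB's actual storage format): each distinct string is stored once in a dictionary, and the column itself shrinks to an array of small integer codes.

     // Minimal dictionary-encoding sketch (not FiloDB's real implementation).
     def dictionaryEncode(column: Seq[String]): (Array[String], Array[Int]) = {
       val dict  = column.distinct.toArray                       // each unique value stored once
       val codes = column.map(dict.zipWithIndex.toMap).toArray   // value -> small integer code
       (dict, codes)
     }

     val (dict, codes) = dictionaryEncode(Seq("NY", "SF", "NY", "NY", "SF"))
     // dict  = Array("NY", "SF"); codes = Array(0, 1, 0, 0, 1)
     val decoded = codes.map(dict(_))                            // lossless round-trip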
  8. (Diagram) Message Queue → Events → Spark Streaming → Cassandra (short-term storage, K-V) and FiloDB (events, ad-hoc, batch) → Spark (ad-hoc, SQL, ML) → Dashboards, maps
  9. 100% Reactive
     • Scala
     • Akka Cluster
     • Spark
     • Typesafe Config for all configuration (example below)
     • Scodec, Ficus, Enumeratum, Scalactic, etc.
     • Even most of the performance-critical parts are written in Scala :)
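A small illustration of the Typesafe Config point (the key names below are hypothetical, not FiloDB's actual settings): ConfigFactory.load() merges reference.conf defaults, application.conf, and -D system-property overrides into one immutable config tree.

     import com.typesafe.config.ConfigFactory

     val config = ConfigFactory.load()
     // Hypothetical keys, shown only to illustrate the lookup API.
     val maxRows       = config.getInt("filodb.memtable.max-rows")
     val flushInterval = config.getDuration("filodb.memtable.flush-interval")
     val seedNodes     = config.getStringList("filodb.cluster.seed-nodes")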
 10. Scala, Akka, and Spark
     • Akka - eliminate shared mutable state
     • Remote and cluster make building distributed client-server architectures easy
     • Backpressure and at-least-once delivery are easy to build
     • Failure handling and supervision are critical for databases (see the sketch below)
     • Spark for SQL, DataFrames, ML, interfacing
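A minimal sketch of the supervision point above, using classic Akka with hypothetical actor and exception types (not FiloDB's real hierarchy): the parent declares, per failure type, whether a crashed child is restarted or the failure escalates.

     import akka.actor.{Actor, ActorRef, OneForOneStrategy, Props, SupervisorStrategy}
     import akka.actor.SupervisorStrategy.{Escalate, Restart}
     import scala.concurrent.duration._

     // Hypothetical failure types, for illustration.
     class TransientIOException extends RuntimeException
     class CorruptStateException extends RuntimeException

     class IngestionWorker extends Actor {
       def receive: Receive = { case row => /* ingest the row */ }
     }

     class IngestionSupervisor extends Actor {
       // Restart the child on transient I/O errors; escalate anything
       // that suggests corrupt state so a higher level can decide.
       override val supervisorStrategy: SupervisorStrategy =
         OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute) {
           case _: TransientIOException => Restart
           case _: CorruptStateException => Escalate
         }

       private val worker: ActorRef = context.actorOf(Props(new IngestionWorker), "worker")
       def receive: Receive = { case msg => worker.forward(msg) }
     }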
 11. One FiloDB Node (diagram)
     Data and commands enter the NodeCoordinatorActor (NCA), which manages one DatasetCoordinatorActor (DsCA) per dataset. Each DsCA owns an Active MemTable and a Flushing MemTable; a Reprojector writes flushed data to the ColumnStore.
 12. Akka vs Futures (diagram)
     The same node diagram, annotated: the actor layer (NCA, DsCAs) handles control flow via Akka; the MemTables, Reprojector, and ColumnStore (core I/O) run on Futures.
 13. Akka vs Futures
     • Akka Actors:
       • External FiloDB node API (remote + cluster)
       • Async messaging with clients
       • State management and scheduling (flushing)
     • Futures:
       • Core I/O
       • Columnar data processing / ingestion
       • Type-safe processing stages
     (A sketch of this split follows below.)
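A minimal sketch of this division of labor, with hypothetical names (not FiloDB's actual actors): the actor owns state and messaging, while the I/O runs in a Future whose result is piped back to the actor as an ordinary message.

     import akka.actor.{Actor, ActorLogging}
     import akka.pattern.pipe
     import scala.concurrent.Future

     // Hypothetical messages and store trait, for illustration only.
     case class FlushDataset(dataset: String)
     case class FlushDone(dataset: String)
     trait ColumnStoreLike { def flush(dataset: String): Future[Unit] }

     class DatasetActor(store: ColumnStoreLike) extends Actor with ActorLogging {
       import context.dispatcher   // ExecutionContext for the Future

       def receive: Receive = {
         case FlushDataset(ds) =>
           // The Future runs the I/O off the actor's thread; pipeTo turns
           // its result back into a message, so actor state is only ever
           // touched from the actor's own mailbox.
           store.flush(ds).map(_ => FlushDone(ds)).pipeTo(self)
         case FlushDone(ds) =>
           log.info("Flushed dataset {}", ds)   // safe place to update state
       }
     }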
 14. Akka for Control Flow (diagram)
     A Client on the Driver sends Flush() via the NodeClusterActor (reached through a SingletonClusterProxy) to the NCAs and their DsCAs (DsCA1, DsCA2) on each Executor.
 15. Yes, Akka in Spark
     • Columnar ingestion is stateful - need stickiness of state. This is inherently difficult in Spark.
     • Akka (cluster) gives us a separate, asynchronous control channel to talk to FiloDB ingestors
     • Spark only gives data flow primitives, not async messaging
     • We need to route incoming records to the correct ingestion node. Sorting data is inefficient and forces all nodes to wait for sorting to be done.
     • On failure, can control state recovery and moving state
 16. Data Ingestion Setup (diagram)
     On each Executor, ingestion tasks (task0, task1) feed a Row Source Actor apiece, which sends rows to the local NCA and its DsCAs (DsCA1, DsCA2). A Node Cluster Actor holds the Partition Map that tells each Row Source Actor where to route rows (see the sketch below).
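A minimal sketch of partition-map routing, under assumed names (FiloDB's real Partition Map is richer): the partition key is hashed to a shard, and each shard maps to one coordinator, so every record for a given key always lands on the same node.

     import akka.actor.ActorRef

     // Hypothetical simplification: one coordinator ActorRef per shard.
     final case class PartitionMap(coordinators: IndexedSeq[ActorRef]) {
       def coordinatorFor(partitionKey: String): ActorRef =
         coordinators(math.abs(partitionKey.hashCode % coordinators.size))
     }

     // A Row Source Actor-like sender would then route each record with:
     //   partitionMap.coordinatorFor(row.partitionKey) ! IngestRow(row)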
 17. FiloDB separate nodes (diagram)
     The same ingestion setup as slide 16, except the NCAs and DsCAs run on dedicated FiloDB nodes rather than inside the Spark Executors; the Row Source Actors on the Executors route rows to them via the Node Cluster Actor's Partition Map.
 18. Akka wire protocol
 19. Backpressure (see the sketch below)
     • Assumes receiver is OK, starts sending rows
     • Allows a configurable number of unacked messages before it stops sending
     • Acking is the receiver's way of rate-limiting
     • Automatic retries for at-least-once
     • NACK for when receiver must stop (out of memory or MemTable full)
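A minimal sketch of this ack-window scheme, with hypothetical message names (the real wire protocol differs): the sender pauses once the configured number of unacked rows is reached, each Ack reopens the window, and a Nack stops sending entirely.

     import akka.actor.{Actor, ActorRef}

     // Hypothetical protocol messages, for illustration only.
     case class Row(seqNo: Long, data: String)
     case class Ack(seqNo: Long)
     case object Nack

     class RowSender(receiver: ActorRef, maxUnacked: Int, rows: Iterator[Row])
         extends Actor {
       private var unacked = Set.empty[Long]
       private var paused  = false

       override def preStart(): Unit = fill()

       // Send until the unacked window is full; a real implementation
       // would also buffer sent rows so they can be retried.
       private def fill(): Unit =
         while (!paused && unacked.size < maxUnacked && rows.hasNext) {
           val row = rows.next()
           unacked += row.seqNo
           receiver ! row
         }

       def receive: Receive = {
         case Ack(seqNo) =>    // receiver's rate-limiting signal
           unacked -= seqNo
           fill()
         case Nack =>          // receiver must stop (OOM / MemTable full)
           paused = true
       }
     }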
 20. Testing Akka Cluster
     • MultiNodeSpec / sbt-multi-jvm (skeleton below)
     • AWESOME
     • Test multi-node message routing
     • Test cluster membership and subscription
     • Inject network failures
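A skeletal MultiNodeSpec, assuming akka-multi-node-testkit plus the sbt-multi-jvm plugin are configured in the build (spec and role names here are hypothetical): sbt launches one JVM per role, and barriers synchronize the nodes at named points.

     import akka.remote.testkit.{MultiNodeConfig, MultiNodeSpec}
     import akka.testkit.ImplicitSender
     import org.scalatest.{BeforeAndAfterAll, Matchers, WordSpecLike}

     // Two logical roles; sbt-multi-jvm runs each in its own JVM.
     object TwoNodeConfig extends MultiNodeConfig {
       val first  = role("first")
       val second = role("second")
     }

     // One concrete class per node; the MultiJvm naming convention
     // is what ties these classes to the sbt-multi-jvm plugin.
     class RoutingSpecMultiJvmNode1 extends RoutingSpec
     class RoutingSpecMultiJvmNode2 extends RoutingSpec

     abstract class RoutingSpec extends MultiNodeSpec(TwoNodeConfig)
         with WordSpecLike with Matchers with BeforeAndAfterAll with ImplicitSender {
       import TwoNodeConfig._

       override def initialParticipants: Int = roles.size
       override def beforeAll(): Unit = multiNodeSpecBeforeAll()
       override def afterAll(): Unit  = multiNodeSpecAfterAll()

       "Message routing" must {
         "deliver a message across nodes" in {
           runOn(first) {
             // start an actor here; address peers via node(second)
           }
           enterBarrier("deployed")   // all JVMs rendezvous here
           // per-role assertions go here
           enterBarrier("finished")
         }
       }
     }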
 21. Core: All Futures
     /**
      * Clears all data from the column store for that given projection, for all versions.
      * More like a truncation, not a drop.
      * NOTE: please make sure there are no reprojections or writes going on before calling this
      */
     def clearProjectionData(projection: Projection): Future[Response]

     /**
      * Completely and permanently drops the dataset from the column store.
      * @param dataset the DatasetRef for the dataset to drop.
      */
     def dropDataset(dataset: DatasetRef): Future[Response]

     /**
      * Appends the ChunkSets and incremental indices in the segment to the column store.
      * @param segment the ChunkSetSegment to write / merge to the columnar store
      * @param version the version # to write the segment to
      * @return Success. Future.failure(exception) otherwise.
      */
     def appendSegment(projection: RichProjection,
                       segment: ChunkSetSegment,
                       version: Int): Future[Response]
 22. Kamon Tracing
     def appendSegment(projection: RichProjection,
                       segment: ChunkSetSegment,
                       version: Int): Future[Response] = Tracer.withNewContext("append-segment") {
       val ctx = Tracer.currentContext
       stats.segmentAppend()
       if (segment.chunkSets.isEmpty) {
         stats.segmentEmpty()
         return(Future.successful(NotApplied))
       }
       for { writeChunksResp <- writeChunks(projection.datasetRef, version, segment, ctx)
             writeIndexResp <- writeIndices(projection, version, segment, ctx)
             if writeChunksResp == Success }
       yield {
         ctx.finish()
         writeIndexResp
       }
     }

     private def writeChunks(dataset: DatasetRef,
                             version: Int,
                             segment: ChunkSetSegment,
                             ctx: TraceContext): Future[Response] = {
       asyncSubtrace(ctx, "write-chunks", "ingestion") {
         val binPartition = segment.binaryPartition
         val segmentId = segment.segmentId
         val chunkTable = getOrCreateChunkTable(dataset)
         Future.traverse(segment.chunkSets) { chunkSet =>
           chunkTable.writeChunks(binPartition, version, segmentId,
                                  chunkSet.info.id, chunkSet.chunks, stats)
         }.map { responses => responses.head }
       }
     }
 23. Kamon Tracing
     • http://kamon.io
     • One trace can encapsulate multiple Future steps all executing on different threads
     • Tunable tracing levels
     • Summary stats and histograms for segments
     • Super useful for production debugging of a reactive stack
 24. Kamon Metrics
     • Uses HDRHistogram for much finer and more accurate buckets
     • Built-in metrics for Akka actors, Spray, Akka-Http, Play, etc.

     KAMON trace name=append-segment n=2863 min=765952 p50=2113536 p90=3211264 p95=3981312 p99=9895936 p999=16121856 max=19529728
     KAMON trace-segment name=write-chunks n=2864 min=436224 p50=1597440 p90=2637824 p95=3424256 p99=9109504 p999=15335424 max=18874368
     KAMON trace-segment name=write-index n=2863 min=278528 p50=432128 p90=544768 p95=598016 p99=888832 p999=2260992 max=8355840
 25. Validation: Scalactic
     private def getColumnsFromNames(allColumns: Seq[Column],
                                     columnNames: Seq[String]): Seq[Column] Or BadSchema = {
       if (columnNames.isEmpty) { Good(allColumns) }
       else {
         val columnMap = allColumns.map { c => c.name -> c }.toMap
         val missing = columnNames.toSet -- columnMap.keySet
         if (missing.nonEmpty) { Bad(MissingColumnNames(missing.toSeq, "projection")) }
         else { Good(columnNames.map(columnMap)) }
       }
     }

     for { computedColumns <- getComputedColumns(dataset.name, allColIds, columns)
           dataColumns <- getColumnsFromNames(columns, normProjection.columns)
           richColumns = dataColumns ++ computedColumns
           // scalac has problems dealing with (a, b, c) <- getColIndicesAndType... apparently
           segStuff <- getColIndicesAndType(richColumns, Seq(normProjection.segmentColId), "segment")
           keyStuff <- getColIndicesAndType(richColumns, normProjection.keyColIds, "row")
           partStuff <- getColIndicesAndType(richColumns, dataset.partitionColumns, "partition") }
     yield {

     • Notice how multiple validations compose!
 26. Machine-Speed Scala
     http://github.com/velvia/filo
     https://github.com/filodb/FiloDB/blob/new-storage-format/core/src/main/scala/filodb.core/binaryrecord/BinaryRecord.scala
 27. Filo: High Performance Binary Vectors
     • Designed for NoSQL, not a file format
     • random or linear access
     • on or off heap
     • missing value support
     • Scala only, but cross-platform support possible
     http://github.com/velvia/filo is a binary data vector library designed for extreme read performance with minimal deserialization costs.
 28. Billions of Ops / Sec
     • JMH benchmark: 0.5ns per FiloVector element access / add
     • 2 Billion adds per second - single threaded
     • Who said Scala cannot be fast?
     • Spark API (row-based) limits performance significantly

     val randomInts = (0 until numValues).map(i => util.Random.nextInt)
     val randomIntsAray = randomInts.toArray
     val filoBuffer = VectorBuilder(randomInts).toFiloBuffer
     val sc = FiloVector[Int](filoBuffer)

     @Benchmark
     @BenchmarkMode(Array(Mode.AverageTime))
     @OutputTimeUnit(TimeUnit.MICROSECONDS)
     def sumAllIntsFiloApply(): Int = {
       var total = 0
       for { i <- 0 until numValues optimized } {
         total += sc(i)
       }
       total
     }
 29. Thank you Scala OSS!