A TALE OF BUILDING
A LIBRARY IN SCALA
Yaakov Breuer
Photo License: “CC0 Public Domain” https://creativecommons.org/publicdomain/zero/1.0/
{facebook,github}.com/bryaakov
Agenda
• Background
• The Problem
• Design Goals
• The Solution
• Caching Layer
Background: “CM-Well”
• Distributed high-scale data warehouse
• Combines Big Data with Linked Data
Linked Data is a way of modeling the world as a graph with typed edges:
[Figure: Yaakov --worksAt--> TR]
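To make “a graph with typed edges” concrete, here is a minimal sketch that stores each fact as a subject/predicate/object triple and follows edges of one type. It is only an illustration, not CM-Well's data model. `Triple` and `objectsOf` are hypothetical names, and the IRIs are the ones used in the data-ingest example.

```scala
// Hypothetical minimal model of Linked Data: every fact is one typed edge
// (subject --predicate--> object) in a graph.
final case class Triple(subject: String, predicate: String, obj: String)

val graph: Set[Triple] = Set(
  Triple("http://example.org/Yaakov", "http://example.org/ont/worksAt", "http://permid.org/1-4295861160")
)

// Follow every edge of the given type that leaves the given node.
def objectsOf(g: Set[Triple], subject: String, predicate: String): Set[String] =
  g.collect { case Triple(`subject`, `predicate`, o) => o }
```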
Background: “CM-Well”
• Was created about 9 years ago
• Holds 4B objects in production
• Was open-sourced a year ago
  → Usually not an easy task in large corporations
  → Keeping us in shape
Background: “CM-Well”
[Figure: CM-Well architecture with Play, Kafka, three “BG Process” instances (Akka Streams), Elasticsearch and Cassandra]
Background: “CM-Well”
Example 1: Read/Query
$ curl localhost:9000/_sparql --data-binary '
SELECT ?comp WHERE {
<http://example.org/Yaakov>
<http://example.org/ont/worksAt> ?comp . }'
------------------------------------------
| comp |
|========================================|
| <http://permid.org/1-4295861160> |
------------------------------------------
Background: “CM-Well”
Example 1: Read/Query (what actually happens)
HTTP GET is received:
• Translate payload to a case class: SparqlRequest
• Query Elasticsearch
• Fetch data from Cassandra
• Return human-readable response
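The read path above is a chain of asynchronous steps. The following rough sketch shows how such a chain might compose with Scala `Future`s. `SparqlRequest` matches the name on the slide, but its field, the `Uuid` type, and the `queryIndex`/`fetchData` stubs (standing in for Elasticsearch and Cassandra) are hypothetical. `execute` is the kind of heavy-lifting call that the memoization slides later wrap.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical types: the real request/result models are not shown in the deck.
final case class SparqlRequest(query: String)
final case class Uuid(value: String)

// Stub for "Query Elasticsearch": pretend the index returns matching ids.
def queryIndex(req: SparqlRequest)(implicit ec: ExecutionContext): Future[Seq[Uuid]] =
  Future(Seq(Uuid("u1"), Uuid("u2")))

// Stub for "Fetch data from Cassandra": pretend each id maps to a stored record.
def fetchData(ids: Seq[Uuid])(implicit ec: ExecutionContext): Future[Seq[String]] =
  Future(ids.map(id => s"record-${id.value}"))

// The read path: query the index, then fetch, then render a response.
def execute(req: SparqlRequest)(implicit ec: ExecutionContext): Future[String] =
  for {
    ids     <- queryIndex(req)
    records <- fetchData(ids)
  } yield records.mkString("\n") // simplified "human-readable" rendering

val rendered = Await.result(
  execute(SparqlRequest("SELECT ?comp WHERE { <http://example.org/Yaakov> <http://example.org/ont/worksAt> ?comp . }"))(ExecutionContext.global),
  5.seconds
)
```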
Background: “CM-Well”
Example 2: Data ingest
$ curl -X PUT 'localhost:9000/_in?format=n3' --data-binary \
'@prefix example: <http://example.org/ont/> .
<http://example.org/Yaakov>
example:worksAt <http://permid.org/1-4295861160> .'
{"success":true}
Background: “CM-Well”
Example 2: Data ingest (what actually happens)
HTTP PUT/POST is received:
• Data is parsed
• Kafka messages are produced
• The user gets 200 OK
• (Eventually) the Kafka messages are consumed
• Data is persisted in Cassandra
• Data is indexed in Elasticsearch
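The point of this flow is that the client is acknowledged before the data is durable and searchable. The single-threaded sketch below models that ordering with a `LinkedBlockingQueue` standing in for Kafka and two in-memory maps standing in for Cassandra and Elasticsearch. The `Statement` type, the whitespace-based “parser”, and the `ingest`/`consumeAll` names are hypothetical simplifications, not CM-Well code.

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable

// Hypothetical simplified statement; real N3 parsing is far richer.
final case class Statement(subject: String, predicate: String, obj: String)

val kafka   = new LinkedBlockingQueue[Statement]()         // stands in for Kafka
val storage = mutable.Map.empty[String, List[Statement]] // stands in for Cassandra
val index   = mutable.Map.empty[String, Set[String]]     // stands in for Elasticsearch (object to subjects)

// Write path: parse everything first, enqueue, then acknowledge right away.
def ingest(lines: Seq[String]): Int = {
  val parsed = lines.map { line =>
    line.trim.split("\\s+", 3) match {
      case Array(s, p, o) => Statement(s, p, o)
      case _              => throw new IllegalArgumentException(s"cannot parse: $line")
    }
  }
  parsed.foreach(st => kafka.put(st))
  200 // the "200 OK" is returned before anything is persisted
}

// Background process (run explicitly here): drain the queue, persist, then index.
def consumeAll(): Unit = {
  var st = kafka.poll()
  while (st != null) {
    storage.update(st.subject, st :: storage.getOrElse(st.subject, Nil))
    index.update(st.obj, index.getOrElse(st.obj, Set.empty[String]) + st.subject)
    st = kafka.poll()
  }
}

val status = ingest(Seq("<http://example.org/Yaakov> <http://example.org/ont/worksAt> <http://permid.org/1-4295861160>"))
val persistedBeforeConsume = storage.contains("<http://example.org/Yaakov>") // false
consumeAll()
```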
THE PROBLEM
The Problem
• Normally, we store documents / objects
• We do support large files as objects
• Kafka messages should be small (by default, a broker rejects messages larger than roughly 1 MB) *
• “Any problem in computer science can be solved with another layer of indirection” (David Wheeler)
* https://kafka.apache.org/documentation/#configuration
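One way to read “another layer of indirection” here is to keep large payloads out of Kafka. The producer writes the payload to a key/value store and sends only the key, and the consumer dereferences it (often called the claim-check pattern). The sketch below uses a SHA-256 content hash as the key, in line with the “Key is hash(content)” usage listed later. The `TrieMap` store, the 1024-byte threshold, and the `Inline`/`Reference` message types are hypothetical.

```scala
import java.security.MessageDigest
import scala.collection.concurrent.TrieMap

// Hypothetical in-memory stand-in for the key/value store.
val blobStore = TrieMap.empty[String, Array[Byte]]

// Content-addressed key: hex-encoded SHA-256 of the payload.
def contentKey(bytes: Array[Byte]): String =
  MessageDigest.getInstance("SHA-256").digest(bytes).map("%02x".format(_)).mkString

// Hypothetical limit for sending a payload inline in the message itself.
val MaxInlineBytes = 1024

sealed trait Message
final case class Inline(payload: Array[Byte]) extends Message
final case class Reference(key: String) extends Message

// Producer side: small payloads go inline, large ones are replaced by a key.
def toMessage(payload: Array[Byte]): Message =
  if (payload.length <= MaxInlineBytes) Inline(payload)
  else {
    val key = contentKey(payload)
    blobStore.put(key, payload)
    Reference(key)
  }

// Consumer side: dereference the key to recover the payload.
def fromMessage(msg: Message): Array[Byte] = msg match {
  case Inline(p)      => p
  case Reference(key) => blobStore(key)
}
```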
Design Goals
We wanted…
• A key/value store
• Distributed and persistent
• A put/get API
• To keep it simple
• An in-process solution
Why re-invent the wheel?
(The everlasting trade-off…)
It seems twitter/util has a util-cache module
  → Might be a good fit
  → No persistence
  → Uses Twitter Futures rather than Scala Futures
Other options?
THE SOLUTION:
“zStore”
The zStore Trait
import scala.concurrent.Future

trait ZStore {
  def put(uzid: String, value: Array[Byte]): Future[Unit]
  // Same as put, but the entry expires after secondsToLive seconds
  def put(uzid: String, value: Array[Byte], secondsToLive: Int): Future[Unit]
  def get(uzid: String): Future[Array[Byte]]
  def remove(uzid: String): Future[Unit]
}
Implementations
• ZStoreImpl – uses Cassandra
• ZStoreMem – in memory (for testing)
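The deck names an in-memory `ZStoreMem` for tests but doesn't show its code. Below is a hedged sketch of what an in-memory implementation with TTL support could look like. The name `ZStoreMemSketch`, the injectable clock, and signalling a missing or expired key with a failed `Future` holding `NoSuchElementException` are assumptions, not the library's actual behavior.

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.Future

// The trait as shown on the slide.
trait ZStore {
  def put(uzid: String, value: Array[Byte]): Future[Unit]
  def put(uzid: String, value: Array[Byte], secondsToLive: Int): Future[Unit]
  def get(uzid: String): Future[Array[Byte]]
  def remove(uzid: String): Future[Unit]
}

// Hypothetical in-memory implementation; the clock is injectable for tests.
class ZStoreMemSketch(now: () => Long = () => System.currentTimeMillis()) extends ZStore {
  // value plus optional expiry time in epoch millis
  private val data = TrieMap.empty[String, (Array[Byte], Option[Long])]

  def put(uzid: String, value: Array[Byte]): Future[Unit] =
    Future.successful(data.update(uzid, (value, None)))

  def put(uzid: String, value: Array[Byte], secondsToLive: Int): Future[Unit] =
    Future.successful(data.update(uzid, (value, Some(now() + secondsToLive * 1000L))))

  // Expired entries count as missing and are evicted lazily on read.
  def get(uzid: String): Future[Array[Byte]] = data.get(uzid) match {
    case Some((value, expiry)) if expiry.forall(_ > now()) => Future.successful(value)
    case Some(_) =>
      data.remove(uzid)
      Future.failed(new NoSuchElementException(uzid))
    case None => Future.failed(new NoSuchElementException(uzid))
  }

  def remove(uzid: String): Future[Unit] = Future.successful { data.remove(uzid); () }
}

var clock = 0L
val store = new ZStoreMemSketch(() => clock)
store.put("k", Array[Byte](1, 2, 3), 10)
val beforeExpiry = store.get("k").value // completed successfully
clock = 11000L                          // 11 seconds later
val afterExpiry = store.get("k").value  // failed: entry expired
```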
Usages
• Main usage – large files
  • Key is hash(content)
• Keeping internal state (e.g. Kafka offsets)
• Caching
  • Using TTL
  • Using Memoization
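For the “keeping internal state” usage, the state has to be turned into bytes before it fits the `Array[Byte]` API. Here is a hedged sketch for a Kafka offset (a `Long`). The key layout and the `TrieMap` standing in for zStore are hypothetical.

```scala
import java.nio.ByteBuffer
import scala.collection.concurrent.TrieMap

// Hypothetical stand-in for zStore's byte-oriented put/get.
val stateStore = TrieMap.empty[String, Array[Byte]]

// A Kafka offset is a Long: store it as 8 big-endian bytes.
def encodeOffset(offset: Long): Array[Byte] = ByteBuffer.allocate(8).putLong(offset).array()
def decodeOffset(bytes: Array[Byte]): Long = ByteBuffer.wrap(bytes).getLong()

// Hypothetical key layout: one entry per consumer, topic and partition.
def offsetKey(consumer: String, topic: String, partition: Int): String =
  s"offsets/$consumer/$topic/$partition"

stateStore.update(offsetKey("my-consumer", "my-topic", 0), encodeOffset(12345L))
val restored: Option[Long] = stateStore.get(offsetKey("my-consumer", "my-topic", 0)).map(decodeOffset)
```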
Adding a Caching Layer
Next Level: From zStore to zCache
• zStore has a String => Future[Array[Byte]] API
• We need to generalize it to K => Future[V]
• And let’s use memoization
Memoize
• Traditionally, “memoize” is a function that takes a function and returns a new function with the same signature that caches its results.
• So you can simply wrap an existing heavy-lifting method with it; no refactoring needed.
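For comparison, the traditional in-memory form of memoize fits in a few lines. This is a generic sketch rather than zCache: entries never expire, the cache lives in one process only, and under concurrent calls the same input may occasionally be computed twice.

```scala
import scala.collection.concurrent.TrieMap

// Generic in-memory memoize: returns a function with the same signature
// whose results are cached per input in a map local to this process.
def memoize[K, V](f: K => V): K => V = {
  val cache = TrieMap.empty[K, V]
  (k: K) => cache.getOrElseUpdate(k, f(k))
}

var calls = 0
val slowSquare: Int => Int = { n => calls += 1; n * n }
val fastSquare = memoize(slowSquare)
val results = List(fastSquare(4), fastSquare(4), fastSquare(5))
```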
Memoize Example
Reminder – HTTP GET is received:
• Translate payload to a case class: SparqlRequest
• Query Elasticsearch
• Fetch data from Cassandra
• Return human-readable response
Memoize Example
def handleHttpGet = {
val request: SparqlRequest = ???
val response = execute(request)
response.map(Ok.apply) // 200 OK
}
Memoize Example
val cachedExecute = memoize(execute)
def handleHttpGet = {
val request: SparqlRequest = ???
val response = cachedExecute(request)
response.map(Ok.apply) // 200 OK
}
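When the wrapped function returns a `Future`, as `execute` does here, an in-memory memoize usually caches the `Future` itself, so later callers reuse a computation that may still be running. It should also not keep failures around. The sketch below shows that variant (still in-process, no TTL). `memoizeAsync` is a hypothetical name, and the calling-thread `ExecutionContext` exists only to make the demo deterministic.

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.{ExecutionContext, Future}

// Caches the Future per key; a failed Future is evicted so the next call retries.
def memoizeAsync[K, V](task: K => Future[V])(implicit ec: ExecutionContext): K => Future[V] = {
  val cache = TrieMap.empty[K, Future[V]]
  (k: K) => cache.get(k) match {
    case Some(cached) => cached
    case None =>
      val fresh = task(k)
      cache.putIfAbsent(k, fresh) match {
        case Some(winner) => winner // another caller stored a Future first
        case None =>
          // Registered after insertion, so a failure always finds its entry.
          fresh.failed.foreach(_ => cache.remove(k, fresh))
          fresh
      }
  }
}

// Demo only: run callbacks on the calling thread so eviction is immediate.
val sameThread: ExecutionContext = new ExecutionContext {
  def execute(runnable: Runnable): Unit = runnable.run()
  def reportFailure(cause: Throwable): Unit = cause.printStackTrace()
}

var runs = 0
val flakyTask: Int => Future[Int] = { n =>
  runs += 1
  if (runs == 1) Future.failed(new RuntimeException("boom")) else Future.successful(n * 10)
}
val flaky = memoizeAsync(flakyTask)(sameThread)
val firstTry  = flaky(1).value // failed, and evicted from the cache
val secondTry = flaky(1).value // task runs again and succeeds
val thirdTry  = flaky(1).value // served from the cache
```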
zCache.memoize
When called, it has to do the following:
Given a key,
• Get from zStore (with retries)
• If it exists, return the value
• Else:
  • Evaluate the “task”
  • Put the result in zStore (with TTL)
  • Return the value
zCache.memoize
In order to achieve that, we are going to need:
Given a key, convert from key:K to uzid:String
• Get from zStore (with retries)
• If it exists, return the value, mapped from Array[Byte] to V
• Else:
  • Evaluate the “task”
  • Put the result in zStore (with TTL), mapped from V to Array[Byte]
  • Return the value
zCache: Constructing memoize API
def memoize[K,V](task: K => Future[V])
    : K => Future[V] = ???
zCache: Constructing memoize API
def memoize[K,V](task: K => Future[V])(
    digest: K => String,
    deserializer: Array[Byte] => V,
    serializer: V => Array[Byte]
): K => Future[V] = ???
zCache: Constructing memoize API
def memoize[K,V](task: K => Future[V])(
    digest: K => String,
    deserializer: Array[Byte] => V,
    serializer: V => Array[Byte],
    isCachable: V => Boolean
): K => Future[V] = ???
zCache: Constructing memoize API
def memoize[K,V](task: K => Future[V])(
    digest: K => String,
    deserializer: Array[Byte] => V,
    serializer: V => Array[Byte],
    isCachable: V => Boolean = (_: V) => true
): K => Future[V] = ???
zCache: Constructing memoize API
def memoize[K,V](task: K => Future[V])(
    digest: K => String,
    deserializer: Array[Byte] => V,
    serializer: V => Array[Byte],
    isCachable: V => Boolean = (_: V) => true
)(ttlSeconds: Int = 10, pollingMaxRetries: Int = 5, pollingInterval: Int = 1)
    : K => Future[V] = ???
zCache: Constructing memoize API
def memoize[K,V](task: K => Future[V])(
    digest: K => String,
    deserializer: Array[Byte] => V,
    serializer: V => Array[Byte],
    isCachable: V => Boolean = (_: V) => true
)(ttlSeconds: Int = 10, pollingMaxRetries: Int = 5, pollingInterval: Int = 1)(
    implicit ec: ExecutionContext
): K => Future[V] = ???
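Putting the pieces together, here is one possible body for the final signature, written against the `ZStore` trait. It is a sketch, not the library's code. The following are all assumptions:
• The `ZCache` class that wraps a store
• Treating a failed `get` with `NoSuchElementException` as a cache miss
• Retrying only other failures
• Reading `pollingInterval` as seconds
• Not failing the request when the cache write fails

`MemStore` is a throwaway in-memory store for the demo.

```scala
import java.util.concurrent.{Executors, ThreadFactory, TimeUnit}
import scala.collection.concurrent.TrieMap
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

trait ZStore {
  def put(uzid: String, value: Array[Byte]): Future[Unit]
  def put(uzid: String, value: Array[Byte], secondsToLive: Int): Future[Unit]
  def get(uzid: String): Future[Array[Byte]]
  def remove(uzid: String): Future[Unit]
}

// Throwaway in-memory store for the demo; TTL is ignored.
class MemStore extends ZStore {
  private val m = TrieMap.empty[String, Array[Byte]]
  def put(uzid: String, value: Array[Byte]): Future[Unit] = Future.successful(m.update(uzid, value))
  def put(uzid: String, value: Array[Byte], secondsToLive: Int): Future[Unit] = put(uzid, value)
  def get(uzid: String): Future[Array[Byte]] = m.get(uzid) match {
    case Some(v) => Future.successful(v)
    case None    => Future.failed(new NoSuchElementException(uzid))
  }
  def remove(uzid: String): Future[Unit] = Future.successful { m.remove(uzid); () }
}

class ZCache(zStore: ZStore) {
  // Daemon thread used only to wait between retries without blocking a pool thread.
  private val scheduler = Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
    def newThread(r: Runnable): Thread = { val t = new Thread(r); t.setDaemon(true); t }
  })

  private def after[T](seconds: Int)(f: => Future[T]): Future[T] = {
    val p = Promise[T]()
    scheduler.schedule(new Runnable { def run(): Unit = p.completeWith(f) }, seconds.toLong, TimeUnit.SECONDS)
    p.future
  }

  // Some(bytes) on a hit, None on a miss; other failures are retried, then treated as a miss.
  private def getWithRetries(uzid: String, retriesLeft: Int, interval: Int)(
      implicit ec: ExecutionContext): Future[Option[Array[Byte]]] =
    zStore.get(uzid).map(bytes => Option(bytes)).recoverWith {
      case _: NoSuchElementException => Future.successful(None)
      case _ if retriesLeft > 0      => after(interval)(getWithRetries(uzid, retriesLeft - 1, interval))
      case _                         => Future.successful(None)
    }

  def memoize[K, V](task: K => Future[V])(
      digest: K => String,
      deserializer: Array[Byte] => V,
      serializer: V => Array[Byte],
      isCachable: V => Boolean = (_: V) => true
  )(ttlSeconds: Int = 10, pollingMaxRetries: Int = 5, pollingInterval: Int = 1)(
      implicit ec: ExecutionContext
  ): K => Future[V] = { (key: K) =>
    val uzid = digest(key)
    getWithRetries(uzid, pollingMaxRetries, pollingInterval).flatMap {
      case Some(bytes) => Future.successful(deserializer(bytes)) // hit
      case None => // miss: evaluate the task, then cache the result with a TTL
        task(key).flatMap { value =>
          if (!isCachable(value)) Future.successful(value)
          else zStore.put(uzid, serializer(value), ttlSeconds).map(_ => value).recover { case _ => value }
        }
    }
  }
}

var executions = 0
val cache = new ZCache(new MemStore)
val cachedLength: String => Future[Int] =
  cache.memoize((s: String) => { executions += 1; Future.successful(s.length) })(
    digest = (s: String) => s,
    deserializer = (bytes: Array[Byte]) => new String(bytes, "UTF-8").toInt,
    serializer = (n: Int) => n.toString.getBytes("UTF-8")
  )()(ExecutionContext.global)

val first  = Await.result(cachedLength("hello"), 5.seconds) // miss: runs the task
val second = Await.result(cachedLength("hello"), 5.seconds) // hit: read back from the store
```

Passing `()` for the middle parameter list accepts the defaults for `ttlSeconds`, `pollingMaxRetries` and `pollingInterval`. Swallowing a failed cache write is a design choice: the caller still gets the freshly computed value, at the cost of recomputing it next time.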
THANK YOU
FAQ: Can I reuse this library?
• Probably not as-is…
• You’re welcome to be inspired
• Contacts:
facebook.com/bryaakov
github.com/bryaakov
• Questions?
