Active Data Stores at 30,000ft

Jeffrey Sica
@jeefy

Overview
● Definition
● General Disclaimer
● PostgreSQL
● MongoDB
● ElasticSearch
● Conclusion / Q&A

Definition
Active Data Store
Data that at any point can be queried, manipulated, and transformed within a service
layer
Meaning: Anything within a database, daemon, or service. Manipulating files on the
filesystem doesn’t count.
Exception/Debate: Parquet / HDFS / Big Data Anything

General Disclaimer
I pick technology with this mantra:
The right tool for the right job
I will present with this mantra.
That does not mean technology A can’t be used for purpose B
(especially if a faculty member is set on it)
Demos will use Docker.
Remember containers are ephemeral. Once the container is destroyed, so too is your
data.
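One way to soften the ephemerality warning above is a named Docker volume. The following is a minimal sketch, not part of the original demos: the function names, the container name `postgres-persist`, the volume name `pgdata`, the password `example`, and the pinned `postgres:16` tag are all assumptions, and the mount path is the data directory used by official postgres images of that generation.

```shell
#!/bin/bash
# Sketch: keep PostgreSQL data in a named volume so it survives container removal.
# Assumptions (not from the slides): volume "pgdata", password "example", tag postgres:16.
start_postgres_with_volume() {
  docker run -d --name postgres-persist \
    -e POSTGRES_PASSWORD=example \
    -v pgdata:/var/lib/postgresql/data \
    postgres:16
}

# Removing the container leaves the named volume (and the data) in place.
remove_postgres_container() {
  docker rm -f postgres-persist
}
```

Calling `start_postgres_with_volume`, then `remove_postgres_container`, then `start_postgres_with_volume` again should bring the database back with its earlier contents. `docker volume rm pgdata` discards the data for good.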

PostgreSQL - High Level
● RDBMS - Relational Database Management System
● Enforced relationships (and schemas) between data types / models
● Clustering can be… a chore
● Writes can be slow (and compounded when clustered)
● Read performance reasonable
● Joins/Views a-plenty
● Granular access control
● Queries written using… SQL (shock)
● Memory footprint: Depends on usage
● Overall a solid service
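To make "enforced relationships" and "joins/views" concrete, here is a small sketch that prints a SQL script. The table, column, and view names (`labs`, `samples`, `sample_overview`) and the function name are hypothetical and chosen only for illustration.

```shell
#!/bin/bash
# Sketch: emit SQL that shows a foreign key, a join, and a view.
# All table, column, and view names are made up for this example.
print_relational_demo_sql() {
  cat <<'SQL'
CREATE TABLE labs (id SERIAL PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE samples (
  id SERIAL PRIMARY KEY,
  lab_id INTEGER NOT NULL REFERENCES labs(id),
  label TEXT NOT NULL
);
INSERT INTO labs (name) VALUES ('genomics');
INSERT INTO samples (lab_id, label) VALUES (1, 'sample-a');
-- The next insert should be rejected: no lab with id 99 exists.
INSERT INTO samples (lab_id, label) VALUES (99, 'orphan');
-- A join across the enforced relationship.
SELECT s.label, l.name FROM samples s JOIN labs l ON l.id = s.lab_id;
-- The same join saved as a view.
CREATE VIEW sample_overview AS
  SELECT s.label, l.name AS lab FROM samples s JOIN labs l ON l.id = s.lab_id;
SELECT * FROM sample_overview;
SQL
}
```

Assuming a running PostgreSQL container named `postgres`, the script can be piped in with `print_relational_demo_sql | docker exec -i postgres psql -U postgres`. By default psql reports the foreign-key error and carries on with the remaining statements.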

PostgreSQL - Docker Playground / Connect Info
Zero to SQL Shell with Docker
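#!/bin/bash
# Recent postgres images refuse to start without a superuser password;
# "example" is a placeholder value.
docker run -d --name postgres -e POSTGRES_PASSWORD=example postgres:latest
# Open the SQL shell directly as the postgres user
docker exec -ti postgres psql -U postgres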
● In prompt: “\h” for help, “\q” to quit SQL Shell
● Default port (when exposed) is 5432
● Many GUIs, default (pgAdmin, https://www.pgadmin.org/ ) is fantastic
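The default port only becomes reachable from the host (for a GUI such as pgAdmin, say) once it is published. A minimal sketch follows. The function names, the container name `postgres-exposed`, and the password `example` are assumptions, and the second function presumes a `psql` client is installed on the host.

```shell
#!/bin/bash
# Sketch: publish the container's port 5432 on the host's port 5432.
# Container name and password are placeholder assumptions.
start_postgres_exposed() {
  docker run -d --name postgres-exposed \
    -e POSTGRES_PASSWORD=example \
    -p 5432:5432 \
    postgres:latest
}

# Connect from the host; psql prompts for the password set above.
connect_from_host() {
  psql -h localhost -p 5432 -U postgres
}
```

GUI clients would use the same host, port, and user values as `connect_from_host`.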

MongoDB - High Level
● NoSQL (JSON Document store)
● Schemaless: Record A and Record B can have completely differing schemas
● Clustering and maintenance are fairly easy
● Writes are fairly fast (eventual consistency across cluster)
● Reads are extremely fast
● No tables? No joins. No views.
● Per-Database RBAC (More complex when clustered)
● Custom Query Language (Fairly easy to learn)
● Dedupe: You like it (Depending on storage engine)
● Memory footprint: It’s C++
● Problems in the past give me pause
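As an illustration of "schemaless" and of the query language, the sketch below prints a short script for the Mongo shell. The collection name `records`, the field names, and the function name are hypothetical.

```shell
#!/bin/bash
# Sketch: two documents with entirely different shapes in one collection,
# followed by a query on a field only one of them has.
# Collection and field names are made up for this example.
print_mongo_demo_js() {
  cat <<'JS'
db.records.insertOne({ name: "record-a", tags: ["x", "y"] });
db.records.insertOne({ title: "record-b", size: 42, nested: { ok: true } });
printjson(db.records.find({ size: { $gt: 10 } }).toArray());
JS
}
```

Assuming a running container named `mongo` whose image ships `mongosh`, this could be run with `docker exec mongo mongosh --quiet --eval "$(print_mongo_demo_js)"`. The query should match only the second document, since the first has no `size` field.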

MongoDB - Docker Playground / Connect Info
Zero to Mongo Shell with Docker
#!/bin/bash
docker run -d --name mongo mongo:latest
# Current images ship "mongosh"; the legacy "mongo" shell was removed in MongoDB 6.0
docker exec -ti mongo mongosh
● “exit” exits, “help” helps
● Many GUIs, I prefer “mongoclient” which is third-party OSS
https://docs.mongodb.com/ecosystem/tools/administration-interfaces/

ElasticSearch - High Level
● NoSQL (Document Store)
● Has a “schema” per “index” (think a table but not really)
● Press button: Receive Cluster (so easy even a caveman could do it)
● Writes extremely fast (eventually consistent w/ reads)
● Reads extremely fast (depending on “query”)
● No tables? You guessed it. No joins. Views depend on the client (it reads fast)
● You like security? Hope you like iptables (or have lots of money)
● Query language: R-E-S-T-F-U-L (Sing it like Aretha) on top of Lucene
● Dedupe: You like it
● Memory footprint: It’s Java.
● Self healing, set it and forget it. Very solid platform.

ElasticSearch - Connect
Zero to ElasticSearch “console” with Docker
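#!/bin/bash
# The official image has no "latest" tag, so pin a release (7.17.0 is an assumed example).
# discovery.type=single-node lets a lone container pass the startup checks.
docker run -d --name elastic -e discovery.type=single-node elasticsearch:7.17.0
docker exec -ti elastic bash
# 8.x images enable security by default and reject this plain HTTP request.
curl -i -XGET 'localhost:9200/'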
● It’s R-E-S-T-F-U-L so just curl it
● Many GUIs (Kibana and Grafana do dashboards)
Kid in a candy store for features / GUIs
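Since everything is a REST call, indexing and searching are also just curl. The sketch below assumes it runs inside the container (or with port 9200 published) against a node with security disabled, as on 7.x images. The index name `talks`, the document fields, and the function names are hypothetical.

```shell
#!/bin/bash
# Sketch: index one JSON document, then search for it over HTTP.
# Index name "talks" and the document body are made up for this example.
es_index_doc() {
  curl -s -XPUT 'localhost:9200/talks/_doc/1' \
    -H 'Content-Type: application/json' \
    -d '{"title": "Active Data Stores", "speaker": "jeefy"}'
}

# URI search using Lucene query-string syntax.
es_search() {
  curl -s -XGET 'localhost:9200/talks/_search?q=title:active'
}
```

Running `es_search` immediately after `es_index_doc` may return no hits, because new documents become searchable only after the index refreshes. That is the "eventually consistent w/ reads" behaviour in practice.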

Conclusion / Q&A
● Small sampling of services
● Try to fit the right tool (service) for the right job (data)
● If not: fit the right handle (query/interface) for the right researcher
● If all else fails, or a researcher wants “something completely different”:
contact ARC-TS (Jeremy) and we’ll facilitate a decision
Pick My Brain Time
