Active Data Stores at 30,000ft
Jeffrey Sica
@jeefy
Overview
● Definition
● General Disclaimer
● PostgreSQL
● MongoDB
● ElasticSearch
● Conclusion / Q&A
Definition
Active Data Store
Data that at any point can be queried, manipulated, and transformed within a service layer
Meaning: Anything within a database, daemon, or service. Manipulating files on the filesystem doesn’t count.
Exception/Debate: Parquet / HDFS / Big Data Anything
General Disclaimer
I pick technology with this mantra:
The right tool for the right job
I will present with this mantra.
That does not mean technology A can’t be used for purpose B
(especially if a faculty member is set on it)
Demos will use Docker.
Remember: containers are ephemeral. Once the container is destroyed, so is your data (unless it lives on a mounted volume).
PostgreSQL - High Level
● RDBMS - Relational Database Management System
● Enforced relationships (and schemas) between data types / models
● Clustering can be… a chore
● Writes can be slow (and compounded when clustered)
● Read performance reasonable
● Joins/Views a-plenty
● Granular access control
● Queries written using… SQL (shock)
● Memory footprint: Depends on usage
● Overall a solid service
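To make "enforced relationships" and "joins" concrete, here is a minimal sketch. It assumes SQLite (bundled with Python) as a stand-in so it runs without a server; it is not PostgreSQL itself, and the `labs` and `samples` tables are made-up names. The same SQL ideas carry over, though PostgreSQL enforces foreign keys without the PRAGMA line.

```python
import sqlite3

# SQLite is a stand-in here (assumption), chosen because it needs no server.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch

# Hypothetical schema: every sample must belong to an existing lab.
conn.execute("CREATE TABLE labs (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE samples ("
    " id INTEGER PRIMARY KEY,"
    " lab_id INTEGER NOT NULL REFERENCES labs(id),"
    " label TEXT NOT NULL)"
)

conn.execute("INSERT INTO labs (id, name) VALUES (1, 'genomics')")
conn.execute("INSERT INTO samples (lab_id, label) VALUES (1, 'run-001')")

# A sample pointing at a lab that does not exist is rejected by the database.
try:
    conn.execute("INSERT INTO samples (lab_id, label) VALUES (99, 'orphan')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# A join stitches the related rows back together.
rows = conn.execute(
    "SELECT labs.name, samples.label FROM samples"
    " JOIN labs ON labs.id = samples.lab_id"
).fetchall()
print(rejected, rows)
```

The point of the sketch is that the database, not the application, refuses the orphaned row.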
PostgreSQL - Docker Playground / Connect Info
Zero to SQL Shell with Docker
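#!/bin/bash
# Current postgres images will not start without a superuser password ("example" is a placeholder)
docker run -d --name postgres -e POSTGRES_PASSWORD=example postgres:latest
# Open a psql shell as the postgres superuser inside the container
docker exec -ti postgres psql -U postgres
● In prompt: “\h” for SQL help, “\q” to quit the SQL shell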
● Default port (when exposed) is 5432
● Many GUIs, default (pgAdmin, https://www.pgadmin.org/ ) is fantastic
MongoDB - High Level
● NoSQL (JSON Document store)
● Schemaless: Record A and Record B can have completely differing schemas
● Clustering and maintenance are fairly easy
● Writes are fairly fast (eventual consistency across cluster)
● Reads are extremely fast
● No tables? No traditional joins or views (the aggregation pipeline’s $lookup and read-only views are partial substitutes)
● Per-Database RBAC (More complex when clustered)
● Custom Query Language (Fairly easy to learn)
● Dedupe: You like it (Depending on storage engine)
● Memory footprint: It’s C++
● Problems in the past give me pause
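The "schemaless" bullet can be illustrated without a running server. The sketch below uses plain Python dictionaries as documents; `find` is a hypothetical helper that imitates query-by-example equality matching, not the MongoDB driver API, and the field names are invented.

```python
import json

# Two documents in the same "collection" with completely different shapes.
collection = [
    {"_id": 1, "name": "run-001", "instrument": "sequencer", "reads": 120000},
    {"_id": 2, "name": "survey-a", "respondents": [{"age": 31}, {"age": 47}]},
]


def find(docs, query):
    """Return documents whose fields equal every key/value pair in query.

    Hypothetical helper for illustration only; a real deployment would
    go through a MongoDB client instead.
    """
    return [d for d in docs if all(d.get(k) == v for k, v in query.items())]


matches = find(collection, {"instrument": "sequencer"})
print(json.dumps(matches))
```

Nothing stops the second record from lacking an `instrument` field; it simply never matches a query on it.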
MongoDB - Docker Playground / Connect Info
Zero to Mongo Shell with Docker
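#!/bin/bash
docker run -d --name mongo mongo:latest
# Open the Mongo shell inside the container
# (images from MongoDB 6.0 onward ship "mongosh"; older images use "mongo")
docker exec -ti mongo mongosh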
● “exit” exits, “help” helps
● Default port (when exposed) is 27017
● Many GUIs, I prefer “mongoclient” which is third-party OSS
https://docs.mongodb.com/ecosystem/tools/administration-interfaces/
ElasticSearch - High Level
● NoSQL (Document Store)
● Has a “schema” per “index” (think a table but not really)
● Press button: Receive Cluster (so easy even a caveman could do it)
● Writes extremely fast (eventually consistent w/ reads)
● Reads extremely fast (depending on “query”)
● No tables? You guessed it. No joins. Views depend on the client (it reads fast)
● You like security? Hope you like iptables (or have lots of money)
● Query language: R-E-S-T-F-U-L (Sing it like Aretha) on top of Lucene
● Dedupe: You like it
● Memory footprint: It’s Java.
● Self healing, set it and forget it. Very solid platform.
ElasticSearch - Connect
Zero to ElasticSearch “console” with Docker
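#!/bin/bash
# The official image publishes no "latest" tag, so pin a version (7.17.10 is only an example)
docker run -d --name elastic -e discovery.type=single-node elasticsearch:7.17.10
# Query the REST API from inside the container (give the node a few seconds to start)
docker exec -ti elastic curl -i -XGET 'localhost:9200/'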
● It’s R-E-S-T-F-U-L so just curl it
● Default port (when exposed) is 9200
● Many GUIs (Kibana and Grafana do dashboards)
Kid in a candy store for features / GUIs
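Because the interface is plain HTTP plus JSON, any HTTP client works, not only curl. The sketch below builds (but does not send) a search request with Python's standard library. The host, the `logs` index, and the `message` field are assumptions for illustration; the `_search` endpoint and the `match` query are part of Elasticsearch's REST API.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # default port from the slide; host is an assumption


def build_search(index, field, text):
    """Build (but do not send) a match query against the _search endpoint."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "logs" and "message" are made-up names for this sketch.
request = build_search("logs", "message", "timeout")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it to a running node: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the query easy to inspect before a node is even running.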
Conclusion / Q&A
● Small sampling of services
● Try to fit the right tool (service) for the right job (data)
● If not: fit the right handle (query/interface) for the right researcher
● All else fails or researcher wants “something completely different”
Contact ARC-TS (Jeremy) and we’ll facilitate a decision
Pick My Brain Time

Editor's Notes

  1. Goal is under ten minutes. Slides will be made available for reference. Enjoy the ride and save questions for afterward; the goal is to leave plenty of time for them.
  2. A datastore is an active service with an endpoint you can query, curl, run a client against, etc. That is the basis for this definition. There is some gray area when we start talking about HiveQL, Hadoop, or anything big data. We’re ignoring those for now.
  3. Read the mantra. Live the mantra. It is why anything I design is pluggable and I’m never married to a single solution. If you don’t have access to a machine with Docker and you want to play with some of this later, see me after class. At this point if you want practical hands-on experience with these things, Docker is the path of least resistance. Also this is not a Docker presentation so that’s the extent of that.
  4. Create a table, SELECT *, basic SQL commands. Postgres also DOES have a JSON datatype, but if you’re working with JSON… there are better options. Click for Bobby Tables.
  5. Click for Data Loss. Back in 2011… 1.0 vs 0.98. Yes, they’ve gotten significantly better, but they still have some data assurance issues.
  6. If you’re dealing with large redundant data such as log output (what this was built for), this will win 100% of the time. Click for Aretha.