With the advent of IoT, companies have the opportunity to put larger and larger volumes of machine data to work to optimize operations like manufacturing production, safety, security, and user experience. Yet they are finding that the old paradigms for processing this data do not help mainstream developers keep pace with the velocity of data, new analytic algorithms, and the need for real-time insight. Jodok Batlogg, founder and CTO of Crate.io, believes that the solution to this problem lies at the nexus of modern open source distributed database architectures, machine learning/AI, and IoT networking. These technologies will combine to create a new data management paradigm that moves beyond traditional conceptions of databases. He believes the future lies in a central nervous system, an “operational brain” that connects directly to sensory inputs and applies artificial intelligence to control, predict, and monitor systems and things in real time. In this session, Jodok will use real-world, in-production manufacturing and cybersecurity examples of “operational brains” at work to explain the new paradigm and discuss the concrete steps organizations can take to implement them.
2. Agenda
• Recent history of DBMS
- The NoSQL Era, the good, the bad, the ugly
- Post-NoSQL…Distributed SQL renaissance
• “Things Data” & the “Operational Brain”
• DBMS futures predictions
3. I like databases
25 years in DBMS & software development companies
IMHO…the coolest ways software is changing what’s
possible in life and business…is usually due to some
database changing what’s possible with software.
7. NoSQL, The Good…
• Many, many choices for most any use case
- JSON document stores
- Key-value stores
- Caching
- Time series
- Text search
• Easy, Economical, Developer friendly:
- Scalability
- Fault-tolerance
- Dynamic, flexible schemas (JSON)
- Open source
• Communities!
8. Knowing the CAP Theorem Helped…
• A partitioned database…
- where data is duplicated across multiple machines
- Access to that data can be EITHER
• Highly Consistent (e.g., MongoDB)
• or Highly available (e.g., DynamoDB)
• We learned the sky doesn’t fall if you forfeit ACID
- “Eventual Consistency”
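A minimal sketch of what "eventual consistency" means in practice, assuming a last-write-wins rule and a manual sync step. The `Replica` class and `sync` function are hypothetical names for illustration and do not reflect how MongoDB or DynamoDB work internally.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node holding a copy of the data (hypothetical, for illustration)."""
    name: str
    # Maps key -> (logical timestamp, value)
    store: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        # Last-write-wins: keep whichever value carries the newest timestamp.
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Anti-entropy pass: exchange entries so both replicas converge."""
    for key, (ts, value) in list(a.store.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.store.items()):
        a.write(key, value, ts)


node_a = Replica("a")
node_b = Replica("b")

# During a partition, each side keeps accepting writes (availability).
node_a.write("temp", 20, ts=1)
node_b.write("temp", 25, ts=2)
stale = node_a.read("temp")  # node_a still serves the older value

# Once the partition heals, the replicas reconcile.
sync(node_a, node_b)
converged = node_a.read("temp") == node_b.read("temp")
```

Between the two writes and the sync, a reader on `node_a` sees stale data. That window is the price paid for staying available.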
9. NoSQL, The Bad…
• No standards (i.e., literally, no SQL)
- Harder to learn
- Hard to integrate
• Too many choices, hard to differentiate
- MongoDB vs. Rethink?
- CouchDB vs. Couchbase?
• DBA expertise
- Resizing & rebalancing database clusters
• Brute force query optimization, via code
• Polyglot persistence gone wild
- Use multiple specialized databases in a single system
- Over time, duplicate data storage and sync costs can grow out of control
11. Ten Years Ago …
Was NoSQL a step backwards
in DBMS technology…or a step
forwards?
12. IMHO, NoSQL has been a step forwards
• Greatly expanded the pool of researchers & contributors
• Debunked assumptions about requirements
- SQL access
- ACID / Eventual consistency
• Created open source code and thought leadership on
which the next generation of SQL is being built
15. The next wave of big data
will come from machines
“Things Data”
16. “Operational Brain” … All Software will Eventually Predict, Control, Act…
• Machine Data (sensory stimuli)
• Analyze (real-time, AI, data science)
• Immediate Action (control, alert, predict)
Sensors, Health, Mobile, Security, Logistics, Manufacturing, Automotive
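The machine data, analyze, immediate action loop can be sketched in a few lines. This is an illustrative toy, assuming a rolling z-score as the "analysis" step. The function names, window size, and threshold are all hypothetical, and a production system would use far richer models.

```python
from collections import deque
from statistics import mean, stdev


def analyze(window, reading, threshold=3.0):
    """Flag a reading that deviates strongly from the recent window."""
    if len(window) < 5:
        return False  # not enough history yet
    spread = stdev(window)
    if spread == 0:
        return reading != window[-1]
    return abs(reading - mean(window)) / spread > threshold


def act(reading):
    """Stand-in for an immediate action (control, alert, predict)."""
    return f"ALERT: reading {reading} outside expected range"


def run(readings, window_size=20):
    window = deque(maxlen=window_size)
    actions = []
    for reading in readings:              # machine data (sensory stimuli)
        if analyze(window, reading):      # analyze in real time
            actions.append(act(reading))  # immediate action
        else:
            window.append(reading)        # only normal readings update the baseline
    return actions


actions = run([20.0, 20.1, 19.9, 20.2, 20.0, 20.1, 35.0, 20.0])
```

Only the 35.0 reading triggers an action here. The other readings stay within the rolling baseline.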
17. ALPLA - Smart Factory
• $4B global plastic packaging manufacturer
• Centralized “mission control”
- Informed by connected machines
• 1+ million sensors, across 1500 product lines
- Predictive maintenance & alerting
- Augmented reality connects mission control &
factory floor
• Business transformation:
- Reduced workforce turnover & on-boarding cost
- Lower raw materials waste
- Increased overall equipment effectiveness (OEE)
“It’s incredibly powerful.
Continuous production data
guides decision-making on the
floor in the moment.”
Philipp Lehner, CEO Alpla, USA
18. Smart Systems, AI = Huge Appetite for Data
• Data Variety
- 950 different sensor types
- Operator logs (natural language)
- Material suppliers
- Operator (HR) data
• Data Volume
- 100s of data points per bottle
- Millions of bottles per day
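A back-of-the-envelope estimate of the ingest rate these volumes imply. The slide gives only orders of magnitude, so the figures below (100 points per bottle, 1 million bottles per day) are assumed lower bounds, not ALPLA's actual numbers.

```python
POINTS_PER_BOTTLE = 100        # assumed lower bound for "100s of data points"
BOTTLES_PER_DAY = 1_000_000    # assumed lower bound for "millions of bottles"
SECONDS_PER_DAY = 24 * 60 * 60

points_per_day = POINTS_PER_BOTTLE * BOTTLES_PER_DAY
points_per_second = points_per_day / SECONDS_PER_DAY

print(f"{points_per_day:,} points/day, about {points_per_second:,.0f} points/second")
# prints "100,000,000 points/day, about 1,157 points/second"
```

Even at these conservative assumptions the database has to sustain over a thousand inserts per second around the clock, before counting operator logs or supplier data.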
19. Smart Systems Machine Data Workload
Firehose of
Complex data in
Real-time at the
Edge + Cloud
21. #1 Scale-out Databases & Messaging Middleware Unite
• Message queues were invented to
compensate for DBMS weaknesses
- Downtime
- Slow ingestion
• New scale-out DBs don’t have those
pitfalls
• Scale-out DBs embed MQTT (et al)
listeners
• Lowers hosting costs, complexity,
development time
[Diagram: Devices send MQTT messages directly to a DBMS with an embedded MQTT listener (fast ingest, always-on architecture), versus Devices sending MQTT messages through an MQTT broker / message queue and an MQTT consumer/writer into a DBMS (slow ingest & DB downtime)]
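To make the "versus" side concrete, here is a sketch of the consumer/writer component that sits between a broker and the database. It uses only the standard library: `queue.Queue` stands in for the MQTT broker and `sqlite3` for the DBMS. Both are assumptions for illustration, as are the table layout and message fields.

```python
import json
import queue
import sqlite3


def consumer_writer(messages, conn, batch_size=100):
    """Drain queued messages and write them to the database in batches."""
    batch = []
    written = 0
    while True:
        try:
            payload = messages.get_nowait()
        except queue.Empty:
            break
        record = json.loads(payload)
        batch.append((record["device"], record["ts"], record["value"]))
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", batch)
            written += len(batch)
            batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", batch)
        written += len(batch)
    conn.commit()
    return written


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (device TEXT, ts INTEGER, value REAL)")

# Stand-in for devices publishing to a broker.
broker = queue.Queue()
for i in range(250):
    broker.put(json.dumps({"device": "press-1", "ts": i, "value": 0.5 * i}))

written = consumer_writer(broker, conn)
row_count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

Every piece of this glue code (parsing, batching, flushing, error handling) is something the team has to host and maintain. The slide's argument is that a database with an embedded listener removes this tier.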
22. #2 Time Series Databases = a Fad (again)
• Old problem
- High velocity of timestamped INSERTs,
- Queried as they arrive (usually)
• Time Series DBMS come and go
- Oracle, Informix
- Riak TS (RIP)
- InfluxDB
• Small problem, small market
• It’s a DBMS feature, not a DBMS
• EXCEPTION!
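One way to read "it's a DBMS feature, not a DBMS": the core time series workload (timestamped inserts plus time-bucketed queries) can be expressed in ordinary SQL. The sketch below uses SQLite purely as a convenient stand-in. The table, index, and bucket size are assumptions, and it says nothing about performance at scale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, ts INTEGER, value REAL)")
conn.execute("CREATE INDEX idx_readings_ts ON readings (ts)")

# Three minutes of one-per-second timestamped INSERTs.
rows = [("s1", t, float(t % 10)) for t in range(0, 180)]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

# Downsample into one-minute buckets with plain GROUP BY.
buckets = conn.execute(
    """
    SELECT ts / 60 AS minute, COUNT(*) AS n, AVG(value) AS avg_value
    FROM readings
    GROUP BY minute
    ORDER BY minute
    """
).fetchall()
```

Each row in `buckets` holds the minute index, the number of readings in that minute, and their average value.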
23. #3 SQL Remains Dominant (but evolves)
• SQL developers outnumber NoSQL developers 45:1
• Advances in SQL distributed processing, HA,
JSON (non-tabular) will make NoSQL obsolete
• MongoDB achieved escape velocity, and will be a
“safe bet” for years
• Others will keep shrinking away to nichedom
- Riak, Rethink (RIP)
- Couchbase? Cassandra? …
24. Final Thoughts…
• Keep on inventing in the data center!
• The security, monitoring, prediction, and
automation you do are moving into the
“things” world
• Let’s see what more AI can do