Abstract: Apache Cassandra is known as the go-to database for cloud applications requiring large amounts of data storage with elastic scalability across multiple data centers. Spark is an in-memory analytics framework that supports both real-time and batch processing, with extensions for streaming, machine learning, and SQL. Jeff Carpenter, Technical Evangelist at DataStax, will share how DataStax Enterprise puts these powerful technologies together to solve common use cases in domains including entertainment and IoT. We’ll explore architectures for intelligent applications that leverage DSE to provide real-time operational analytics.
Transcript: Building Intelligent Applications w/ Cassandra, Spark & DataStax by Jeff Carpenter
1. Jeff Carpenter, DataStax
Building Intelligent Applications with Cassandra, Spark and DataStax Enterprise Analytics
Big Data Day LA, 8/5/2017
DataStax is a registered trademark of DataStax, Inc. and its subsidiaries in the United States and/or other countries.
2. KillrVideo – a video sharing application
https://github.com/KillrVideo
3. Who am I?
• Developer
• Architect
• Author
• Technical Evangelist
• Defense
• Hospitality
• R&D
• Distributed Systems
• Large Scale
• Cassandra
4. KillrVideo Capabilities
• Manage user accounts
• Upload and tag videos
• Search videos
– By date, user, rating, tag
• Watch videos
• Comment on videos
• Rate videos
6. Apache Cassandra at a Glance
• First developed by Facebook
• Became a top-level Apache Software Foundation project in 2010
• Distributed, decentralized
• Elastic scalability / high performance
• High availability / fault tolerant
• Tuneable consistency
• Partitioned row store
Apache Cassandra ® Apache Software Foundation
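The consistency bullet on this slide can be made concrete with a little arithmetic. The sketch below is not from the talk; it assumes the usual rule of thumb that a read is guaranteed to see the latest write when the number of replicas read plus the number of replicas written exceeds the replication factor. The function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def read_overlaps_write(read_replicas: int, write_replicas: int,
                        replication_factor: int) -> bool:
    """True when any read set must share a replica with any write set."""
    return read_replicas + write_replicas > replication_factor


rf = 3                                   # assumed replication factor
q = quorum(rf)
print(q)                                 # 2
print(read_overlaps_write(q, q, rf))     # True: QUORUM reads + QUORUM writes
print(read_overlaps_write(1, 1, rf))     # False: ONE + ONE may miss the write
```

Choosing lower levels on either side trades that guarantee for lower latency and higher availability, which is the trade-off the word "tuneable" refers to.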
7. Problems Cassandra is Especially Good At
• Large scale storage
– >10s of TB
• Lots of writes
– Time-series data, IoT
• Statistics and analytics
– For example, as a Spark data source
• Geographic distribution
– Multiple data centers
Use cases: Personalization, Customer 360, Recommendation, Fraud Detection, Inventory Management, Identity Management, Security, Supply Chain
8. KillrVideo Data Tier - Cassandra
• Schemas defined in CQL
– See the videos table definition below
• Query-first design approach
• Microservices own individual tables
// Videos by id
CREATE TABLE IF NOT EXISTS videos (
    videoid uuid PRIMARY KEY,
    userid uuid,
    name text,
    description text,
    location text,
    location_type int,
    preview_image_location text,
    tags set<text>,
    added_date timestamp
);
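The query-first approach named on this slide means designing one table per access pattern and writing the same data to each of them. The sketch below is an in-memory Python analogy, not the KillrVideo code: `videos` stands in for the table above, while `videos_by_tag`, `add_video`, and `find_by_tag` are hypothetical names introduced for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Video:
    videoid: UUID
    name: str
    tags: frozenset


# One structure per query, mirroring one Cassandra table per access pattern.
videos = {}                          # query: "video by id"
videos_by_tag = defaultdict(list)    # query: "videos with a tag" (hypothetical)


def add_video(name, tags):
    """Write the same video to every structure that serves a query."""
    video = Video(uuid4(), name, frozenset(tags))
    videos[video.videoid] = video
    for tag in video.tags:
        videos_by_tag[tag].append(video)   # denormalized copy
    return video


def find_by_tag(tag):
    """Read from the structure built for this query; no scan, no join."""
    return list(videos_by_tag.get(tag, []))


intro = add_video("Intro to Cassandra", {"cassandra", "nosql"})
print([v.name for v in find_by_tag("cassandra")])   # ['Intro to Cassandra']
```

The cost of this design is duplicated data on the write path; the benefit is that each read touches exactly one partition-like lookup.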
9. DataStax Enterprise
KillrVideo Services
KillrVideo Services: Comment Service, Ratings Service, Search Service, Statistics Service, Suggested Videos Service, User Management Service, Video Catalog Service
Cassandra Tables
12. What’s missing?
13. Traditional Analytics Approach
Applications and Services
Data Tier (e.g. DataStax Enterprise)
Hadoop cluster
Business Intelligence and Reporting Tools
ETL or streaming
Analysis
Insights incorporated into applications via code changes
14. DataStax Enterprise
KillrVideo Recommendation Engine
KillrVideo Services: Comment Service, Ratings Service, Search Service, Statistics Service, Suggested Videos Service, User Management Service, Video Catalog Service
Cassandra Tables
DSE Graph
16. KillrVideo Graph Schema
17. DataStax Studio
• Explore, query, and analyze DSE Graph and Cassandra data
• Gremlin Query Language
• Auto-completion, result set visualization, execution management, and much more.
• Friendly Fluent API
18. Apache Spark at a Glance
• Distributed computing framework
• Generalized DAG execution
• Easy Abstraction for Datasets
• Integrated SQL Queries
• Streaming
• Machine Learning Library
Spark SQL, Spark Streaming, MLlib, GraphX, SparkR
Spark Core Engine
20. DataStax Enterprise
KillrVideo Recommendation Engine 2
KillrVideo Services: Comment Service, Ratings Service, Search Service, Statistics Service, Suggested Videos Service, User Management Service, Video Catalog Service
DSE Graph
Mixed workload cluster
23. Flexible deployment options
• Mixed workload: virtual data centers within a cluster separate operational and analytic workloads
• Hybrid cloud: a single cluster spanning on-premises and cloud
• Advanced replication: hub and spoke arrangement of clusters for intermittent connections or compliance
• Multi-instance: take advantage of big iron by running several nodes on one machine
• Tiered storage: offload less frequently used data to lower-cost storage options
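For the mixed workload option, replication in Cassandra is configured per data center on each keyspace using `NetworkTopologyStrategy`. The sketch below builds such a CQL statement as a string; the helper name, the keyspace name `killrvideo`, the data center names `Operational` and `Analytics`, and the replica counts are all assumptions for illustration, not values given in the talk.

```python
def create_keyspace_cql(keyspace: str, replicas_per_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement with per-data-center replication."""
    parts = ["'class': 'NetworkTopologyStrategy'"]
    # One entry per (virtual) data center: name -> number of replicas there.
    parts += [f"'{dc}': {count}" for dc, count in replicas_per_dc.items()]
    options = ", ".join(parts)
    return ("CREATE KEYSPACE IF NOT EXISTS " + keyspace +
            " WITH replication = {" + options + "};")


# Hypothetical layout: three replicas serve the application,
# two more serve Spark jobs in a separate analytics data center.
statement = create_keyspace_cql("killrvideo",
                                {"Operational": 3, "Analytics": 2})
print(statement)
```

Because each data center holds its own replicas, analytic jobs can read from the analytics nodes without competing with application traffic on the operational nodes.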
24. Get trained with DataStax Academy
• Free self-paced courses
• DS101: Introduction to Apache Cassandra™
• DS201: DataStax Enterprise Foundations of Apache Cassandra™
• DS210: DataStax Enterprise Operations with Apache Cassandra™
• DS220: Data Modeling
• DS310: DataStax Enterprise Search
• DS320: DataStax Enterprise Analytics with Apache Spark™
• DS330: DataStax Enterprise Graph
https://academy.datastax.com
25. More Resources
Weekly show
Check out our weekly show on distributed data development topics via YouTube or your favorite podcast app
Success Segments
Short form training on key topics including Graph and Studio at DataStax Academy
O’Reilly Book
Updated for Cassandra 3.X, including CQL, SASI indexes, materialized views, lightweight transactions, DataStax drivers, and more