Apache Cassandra Lunch #67: Moving Data from Cassandra to DataStax Astra

Version 1.0
Moving Data from Cassandra to
DataStax Astra using DSBulk
An Anant Corporation Story.

Cassandra
● Apache Cassandra is an open-source, distributed NoSQL database designed to handle large volumes of data across many servers
● Cassandra clusters can be scaled either by improving the hardware on current nodes (vertical scalability) or by adding more nodes (horizontal scalability)
○ Horizontal scalability is part of why Cassandra is so powerful: inexpensive machines can be added to a cluster to increase its capacity and throughput
● Note: The demo uses open-source Apache Cassandra
○ The same steps work nearly identically with DataStax Enterprise (DSE)

DataStax Astra
● Astra website:
https://www.datastax.com/products/datastax-astra
● DataStax Astra is a fully managed, serverless database built on Apache Cassandra and provided by DataStax
● Some additional features:
○ Stargate APIs: Let developers work with data in a Cassandra-based database like Astra without deep knowledge of CQL
○ Zero Lock-In: Deploy on AWS, GCP and Azure and still
maintain compatibility with open-source Cassandra
○ Global Scale: Data replication across multiple data
centers, availability zones, and multiple regions.
■ Additionally, allows a user to scale an Astra
database up to multiple petabytes of data without
impacting speed or performance
○ 80 GB of storage and 20 million read/write operations for
free every month

DSBulk
● DSBulk: DataStax Bulk Loader for Apache Cassandra is an open-source tool used to load and unload CSV or JSON data into and out of supported databases
● Supported databases:
○ DataStax Astra cloud database
○ DataStax Enterprise (DSE) 4.7 and later
○ Open source Apache Cassandra 2.1 and later
● Introduction and documentation for DSBulk: https://docs.datastax.com/en/dsbulk/doc/dsbulk/dsbulkAbout.html
● GitHub repository for the DataStax DSBulk project: https://github.com/datastax/dsbulk

DSBulk (cont.)
● Commands used in today's presentation/demo:
○ dsbulk load
■ Loads data into a Cassandra/Astra database. When no configuration file is used, the necessary parameters have to be passed on the command line (listed below).
○ dsbulk unload
■ Unloads data from a Cassandra/Astra database into CSV or JSON files. When no configuration file is used, the necessary parameters have to be passed on the command line as well.
○ dsbulk count
■ Counts the rows of a table in a Cassandra/Astra database, which is a quick way to check loaded data.
● Parameters/flags needed when running these commands without a configuration file:
○ -k: keyspace
○ -t: table
○ -b: path to the secure connect bundle (only necessary when connecting to Astra)
○ -u: username, -p: password (for the database)
■ Since an Astra update earlier this year, Astra expects the Client ID and Client Secret in place of the username and password.
■ Can be omitted for a local Cassandra that still runs with default settings, where authentication is disabled (cassandra/cassandra is the default superuser once authentication is enabled)
○ -url: URL or path of the .csv or .json file to load, or the local directory to unload data into
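As a rough illustration of how the flags above combine, here is a minimal dry-run sketch. The helper functions are hypothetical and only print the command lines, so nothing is executed against a database. The keyspace demo_ks, table demo_table, and the paths are placeholders and are not taken from the demo.

```shell
#!/bin/sh
# Dry-run sketch: each helper only PRINTS a dsbulk command line.
# Helper names, keyspace, table, and paths are placeholders (assumptions).

# Load a CSV (URL or local path) into keyspace.table.
load_cmd() {   # usage: load_cmd <source> <keyspace> <table>
    printf 'dsbulk load -url %s -k %s -t %s\n' "$1" "$2" "$3"
}

# Unload keyspace.table into a local directory.
unload_cmd() { # usage: unload_cmd <directory> <keyspace> <table>
    printf 'dsbulk unload -url %s -k %s -t %s\n' "$1" "$2" "$3"
}

# Count the rows in keyspace.table.
count_cmd() {  # usage: count_cmd <keyspace> <table>
    printf 'dsbulk count -k %s -t %s\n' "$1" "$2"
}

load_cmd ./data.csv demo_ks demo_table
unload_cmd ./export demo_ks demo_table
count_cmd demo_ks demo_table
```

To run a printed command for real, paste it into a shell where dsbulk is installed. For Astra, append -b with the secure connect bundle path plus -u and -p.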

Demo Project Slide
● Link to GitHub repo: https://github.com/DataStax-Examples/dsbulk-to-astra/
○ The demo is based on sample data from this GitHub repository
● The demo walks through four main processes using dsbulk:
○ Loading a .csv hosted online into local Cassandra
○ Loading a .csv hosted online into Astra
○ Unloading from local Cassandra to a .csv file
○ Loading from a .csv file into Astra
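The four processes above might look roughly like the following dry-run sketch, which prints one dsbulk command per step instead of executing it. Every value is a placeholder chosen for illustration (the CSV URL, demo_ks, demo_table, the bundle path, and the export directory), and the function names are hypothetical. The actual values come from the demo repository and your own Astra database.

```shell
#!/bin/sh
# Dry-run sketch of the four demo steps: commands are printed, not executed.
# Every value below is a placeholder (assumption), not taken from the demo.
CSV_URL="https://example.com/sample.csv"   # placeholder for the hosted .csv
KEYSPACE="demo_ks"                         # hypothetical keyspace
TABLE="demo_table"                         # hypothetical table
BUNDLE="./secure-connect-bundle.zip"       # placeholder bundle path
EXPORT_DIR="./export"                      # directory for unloaded data
CLIENT_ID="${ASTRA_CLIENT_ID:-CLIENT_ID_PLACEHOLDER}"
CLIENT_SECRET="${ASTRA_CLIENT_SECRET:-CLIENT_SECRET_PLACEHOLDER}"

# Flags that are only needed when the target is Astra.
astra_flags() {
    printf '%s' "-b $BUNDLE -u $CLIENT_ID -p $CLIENT_SECRET"
}

demo_steps() {
    # 1. Hosted .csv -> local Cassandra (add -u/-p if authentication is enabled)
    echo "dsbulk load -url $CSV_URL -k $KEYSPACE -t $TABLE"
    # 2. Hosted .csv -> Astra
    echo "dsbulk load -url $CSV_URL -k $KEYSPACE -t $TABLE $(astra_flags)"
    # 3. Local Cassandra -> .csv files written into EXPORT_DIR
    echo "dsbulk unload -url $EXPORT_DIR -k $KEYSPACE -t $TABLE"
    # 4. Exported .csv files -> Astra
    echo "dsbulk load -url $EXPORT_DIR -k $KEYSPACE -t $TABLE $(astra_flags)"
}

demo_steps
```

Steps 3 and 4 together are the actual migration path from Cassandra to Astra. Running dsbulk count on both sides afterwards is a simple way to compare row counts.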

Resources
● https://www.datastax.com/products/datastax-astra
● https://github.com/datastax/dsbulk
● https://docs.datastax.com/en/dsbulk/doc/dsbulk/dsbulkAbout.html
● https://docs.datastax.com/en/dsbulk/doc/dsbulk/install/dsbulkInstall.html
● https://github.com/DataStax-Examples/dsbulk-to-astra/
● https://github.com/Anant/cassandra.api/
● https://docs.datastax.com/en/dsbulk/doc/dsbulk/getStartedDsbulk.html
● https://www.datastax.com/products/datastax-astra/features
● https://docs.datastax.com/en/dsbulk/doc/dsbulk/dsbulkSimpleLoad.html
● https://docs.datastax.com/en/dsbulk/doc/dsbulk/dsbulkSimpleUnload.html

Strategy: Scalable Fast Data
Architecture: Cassandra, Spark, Kafka
Engineering: Node, Python, JVM, CLR
Operations: Cloud, Container
Rescue: Downtime!! I need help.
www.anant.us | solutions@anant.us | (855) 262-6826
3 Washington Circle, NW | Suite 301 | Washington, DC 20037
