Webinar: Getting Started with Apache Cassandra


Would you like to learn how to use Cassandra but don’t know where to begin? Want to get your feet wet but you’re lost in the desert? Longing for a cluster when you don’t even know how to set up a node? Then look no further! Rebecca Mills, Junior Evangelist at DataStax, will guide you in the webinar “Getting Started with Apache Cassandra...”

You'll get an overview of Planet Cassandra’s resources to get you started quickly and easily. Rebecca will take you down the path that's right for you, whether you are a developer or administrator. Join if you are interested in getting Cassandra up and working in the way that suits you best.

Published in: Technology

  • Part of my job is to help make Cassandra more approachable for everyone
    A lot of people claim that other databases are faster and easier to get up and running with than Cassandra
    I consider it my mission to guide people through the challenges of getting started
  • Maybe you don’t have loads of free time to spend trying to learn how to use a new database
    Sometimes it can be hard navigating your way through tangled docs
    when you really just want a quick taste of what it’s like to use the database
    Today I’m going to give you a brief overview of what it takes, we’ll say the bare minimum steps, to get up and running with Cassandra
    I’m not saying you’ll have your own 100 node cluster going by the end of all this, but at least you’ll have a concept of what it’s like
    So sit back, relax, and let’s go
  • As a Junior Evangelist I try to create awareness for open source Cassandra
    I develop Cassandra themed content like blog posts, video tutorials, webinars, and I also have my Twitter account
    Part of my job is also to step into the shoes of a ‘newbie’ to try to determine what kind of problems people just being introduced to Cassandra might encounter, which may not be obvious to an expert.
  • So if you haven’t already, head to Planet Cassandra and go to the Downloads section
    There you can choose your operating system and the type of DSC (DataStax Community) download that you want
    On the downloads page there are also guides on how to install DSC once you have it
  • Alright, well let’s get going with our instance
  • But before you can fire up your instance, there are a few things that we need to tinker with
    Otherwise Cassandra may not work properly, or may not even start up at all!
    If we were starting up a cluster, this list would get a little longer as we would have to tell the nodes how to share information
    But for now we are just worried about our single instance
    Two things we are concerned about are checking our version of Java and making sure we have access to our data files when they get saved
  • Firstly, you need to make sure you have the latest version of Java (JDK 7) installed on all your nodes
  • You’re going to want to change the location of the data, commit logs and saved caches
    If you leave them as default, you’re going to have to run Cassandra as root in order for it to start, which isn’t ideal of course
    Probably the easiest way to deal with this problem is to set the save location in your home directory
    The location for the saves is configured in the cassandra.yaml file in the conf directory
  • Instead of using the default directory paths we’ll change them all to use our home directory.
    This will guarantee that we have the correct permissions.
  • We’re going to run through this list here now of 5 things you should be able to do quickly when you start up a Cassandra instance
  • So assuming you downloaded the tarball, just go to your install location and run cassandra from the bin directory
  • Once we get our instance started, we can run CQL shell
    CQL is Cassandra Query Language
    Syntactically it’s pretty similar to SQL, so it shouldn’t be too hard if you have a relational database background
    When you run CQL shell, you’ll get a prompt and then you can start communicating with your database
  • So keyspaces hold our data in Cassandra
    They have tables which are made up of rows and columns
    A row represents a single data entry
    Here I’m showing the creation of a keyspace in CQL, never mind the class and replication factor component for now, that’s outside of the scope of this webinar
    And then I created a “users” table within that keyspace, where I assign each column a name and data type
  • Next, we can populate the rows in our table using the INSERT command
    If I ran these 3 insert commands, it would insert 3 rows of user information into the “users” table I made
  • If we wanted to query our database, a “SELECT * FROM users” would return all the rows from the table
    Using a WHERE clause and a specific last name (which we set to be our primary key), it would return the users associated with that last name.
    The PRIMARY KEY (which is also the partition key in this case) determines the partition on disk where the data is located
  • These are examples of what an update and delete look like in CQL
    As you can see it’s pretty familiar looking syntax, it’s just that simple!
  • Two really great tools you can use with Cassandra are OpsCenter and DevCenter
  • DevCenter is a free tool you can download from the DataStax website
    It’s a cool alternative to CQL shell, if you’d prefer a GUI
    You can connect to a local server or remote clusters
  • This is what DevCenter looks like
    You can type most of the same commands here as you would in CQL shell
    It has almost the same functionality, and has a nice visual interface
  • In the connection center, you can save a new connection if you intend to use it frequently, instead of reconnecting over and over each time that you use it
    Here I’m connecting to an instance on my local machine
  • I’m running the same commands here as I was in CQL shell earlier, creating that same demo keyspace
  • Creating that same users table.
    Notice the nice syntax highlighting.
    Also notice the schema window in the upper right corner showing all our keyspaces
  • Insert new records into the database
  • Then select those records and get a nice table view of the data
  • OpsCenter is there to help you manage a Cassandra cluster
    Because managing a lot of machines can be a challenge sometimes
  • It’s easy to make cluster wide configuration changes with OpsCenter, instead of digging through configuration files on the command line
  • You can also diagnose problems with your cluster using OpsCenter
    You can set up graphs to track write latency, read latency, hinted handoffs, etc.
    And these may give you a good indication of the source of a problem
  • So what about multi data center?
    Of course OpsCenter does multi data center! Because it’s Cassandra!
  • You can create a Cassandra instance or cluster in the cloud using the AWS AMI
    You spin these up through OpsCenter
    In the new cluster section, select the cloud option, which only appears if you’re running OpsCenter on an EC2 instance
  • Adding a cluster can be done from a single image and configuration file
    You give the DataStax credentials that were sent to you by email
    As well as the credentials of each node
  • You use your own AWS credentials to create a cluster and configure things like security groups on the fly
  • So DataStax has drivers for Java, Python, C# and C++
    There are a lot of other open source drivers though
    Check out the Client Drivers section of Planet Cassandra and you’ll probably find one in the language you’re looking for
  • Connecting to your cluster using Java is really easy
    First create a cluster object
    Use the builder method to connect to the cluster
    That’s it! It’s just that easy.
  • Here is a simple program that will connect to your database
    Just a few lines of code and you are ready to insert and select data from Cassandra
  • Here is the same situation in Python. I wish I had more to say about this, but it’s essentially the same, very simple
    Create a cluster object and use the connect method. That’s it.
  • Once you have a session, you can use the execute method to run CQL commands
  • So if you’re looking for great resources on Apache Cassandra, you should definitely check out Planet Cassandra
    You’ll find everything you need there: webinars, blog posts, use cases, tutorials
    While you’re there, check out the Try Cassandra section, which I created all the content for
  • Try Cassandra has quick 10-minute tutorials for developers and administrators
    And some walk-through videos that I made to help you guys out
  • Thank you everyone! Are there any questions?
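
The cassandra.yaml change described in the notes above (moving the data, commit log and saved caches directories from the defaults into your home directory) could be scripted roughly as follows. This is only a sketch under a few assumptions: `point_to_home` is a hypothetical helper name, the defaults are assumed to live under `/var/lib/cassandra` as on the slides, and an absolute path is written instead of the literal `$HOME` on the assumption that the YAML file is not passed through a shell.

```python
import os

# Default root used by the three directory settings shown on the slides.
DEFAULT_ROOT = "/var/lib/cassandra"


def point_to_home(yaml_text, home=None):
    """Return yaml_text with the default directories moved under <home>/cassandra.

    Hypothetical helper: it does a plain text substitution and does not
    parse the YAML, so comment lines are deliberately left untouched.
    """
    home = home or os.path.expanduser("~")
    new_root = f"{home}/cassandra"
    lines = []
    for line in yaml_text.splitlines():
        if not line.lstrip().startswith("#"):
            line = line.replace(DEFAULT_ROOT, new_root)
        lines.append(line)
    return "\n".join(lines)


# Sample fragment containing only the three settings from the slide.
sample = """\
data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
saved_caches_directory: /var/lib/cassandra/saved_caches
"""

print(point_to_home(sample, home="/home/alice"))
```

In practice you would read conf/cassandra.yaml from your install location, pass its text through the helper, and write the result back, keeping a copy of the original file first.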
  • Webinar: Getting Started with Apache Cassandra

    1. Getting Started with Apache Cassandra
       Rebecca Mills, Junior Evangelist, DataStax, @rebccamills
       ©2013 DataStax Confidential. Do not
    2. Don’t want to spend an exorbitant amount of time and energy learning a new database?
       • Then you’ve come to the right place!
       • To learn some important basics of Cassandra without ever having to leave your couch
    3. What do I do?
       • Try to create awareness for open source Cassandra
       • Develop content to get people interested in trying
       • Identify problems newcomers might be encountering
       • Develop strategies and material to help with that
    4. Where can you download Cassandra?
       • The easiest way is to head straight to Planet Cassandra
       • http://planetcassandra.or
       • Go to the “Downloads” section, choose your operating system and the version of DSC that you’d like
       • Get crackin’!
    5. Let’s get started
    6. 2 things you should do to get going
       1. Check your version of Java
       2. Edit your cassandra.yaml file to point your Cassandra instance towards your home directory
    7. 1. Check your version of Java
       • To check what version of Java you are using, at the prompt type
           % java -version
       • Be sure to use the latest version (JDK 7) on all nodes
    8. 2. Change default location to save data
       • Don’t run Cassandra as root
       • If we leave the default locations, we will not be able to start Cassandra without root or have access to the directories where our data is being saved.
       • Access the cassandra.yaml file through the Cassandra conf directory
    9. The 3 lines you should change in the cassandra.yaml file (Edit cassandra.yaml):
           data_file_directories:
               - $HOME/cassandra/data    # default: /var/lib/cassandra/data
           commitlog_directory: $HOME/cassandra/commitlog    # default: /var/lib/cassandra/commitlog
           saved_caches_directory: $HOME/cassandra/saved_caches    # default: /var/lib/cassandra/saved_caches
    10. 5 things you can do quickly
        1. Start up an instance
        2. Create a schema with CQL
        3. Inject some data into our instance
        4. Run a query against our database
    11. 1. Start up an instance
        • It’s very simple! Just go to your install location and start it from the bin directory as such:
            $ cd install_location
            $ bin/cassandra
    12. 2. Create a schema with CQL
        • From within your installation directory, start up your CQL shell from within the bin directory
            $ cd install_directory
            $ bin/cqlsh
        • You should see the cqlsh command prompt as such
            Connected to Test Cluster at localhost:9160.
            [cqlsh 4.1.1 | Cassandra 2.0.8 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
            Use HELP for help.
            cqlsh>
    13. 2. Create a schema with CQL
        • A keyspace is a container for our data. Here we are creating a demo keyspace and a users table within. A table consists of rows and columns.
            CREATE KEYSPACE demo WITH REPLICATION = {'class':'SimpleStrategy','replication_factor':1};
            USE demo;
            CREATE TABLE users (
                firstname text,
                lastname text,
                age int,
                email text,
                city text,
                PRIMARY KEY (lastname)
            );
    14. 3. Inject some data into your instance
        • Nothing sadder than an empty database. Here we are populating our “users” table with rows of data using the INSERT command.
            INSERT INTO users (firstname, lastname, age, email, city) VALUES ('John', 'Smith', 46, 'johnsmith@email.com', 'Sacramento');
            INSERT INTO users (firstname, lastname, age, email, city) VALUES ('Jane', 'Doe', 36, 'janedoe@email.com', 'Beverly Hills');
            INSERT INTO users (firstname, lastname, age, email, city) VALUES ('Rob', 'Byrne', 24, 'robbyrne@email.com', 'San Diego');
    15. 4. Make a query against your database
            SELECT * FROM users;
             lastname | age | city          | email               | firstname
            ----------+-----+---------------+---------------------+-----------
             Doe      | 36  | Beverly Hills | janedoe@email.com   | Jane
             Byrne    | 24  | San Diego     | robbyrne@email.com  | Rob
             Smith    | 46  | Sacramento    | johnsmith@email.com | John
            SELECT * FROM users WHERE lastname='Doe';
             lastname | age | city          | email             | firstname
            ----------+-----+---------------+-------------------+-----------
             Doe      | 36  | Beverly Hills | janedoe@email.com | Jane
    16. 5. Make a change to your data
            UPDATE users SET city='San Jose' WHERE lastname='Doe';
            SELECT * FROM users WHERE lastname='Doe';
             lastname | age | city     | email             | firstname
            ----------+-----+----------+-------------------+-----------
             Doe      | 36  | San Jose | janedoe@email.com | Jane
            DELETE FROM users WHERE lastname='Doe';
            SELECT * FROM users;
             lastname | age | city       | email               | firstname
            ----------+-----+------------+---------------------+-----------
             Byrne    | 24  | San Diego  | robbyrne@email.com  | Rob
             Smith    | 46  | Sacramento | johnsmith@email.com | John
    17. Two really neat tools:
        1. OpsCenter
        2. DevCenter
    18. DevCenter
        • Try out your CQL in an easy-to-use tool
        • Has most of the same functionality as cqlsh with a few exceptions
        • Quickly connect to your cluster and keyspace. GO!
    19. OpsCenter
        • OpsCenter makes it easy to manage and configure your cluster!
    20. Change configurations
        • Just a couple clicks and you can reconfigure an entire cluster.
    21. Metrics
        • Diagnose problems with your cluster
    22. How about multi datacenter? Of course!
    23. You can run an AWS AMI from OpsCenter!
        • Run a Cassandra instance/cluster in the cloud!
        • Using Amazon Web Services EC2 Management Console
        • Quickly deploy a Cassandra cluster within a single availability zone through OpsCenter
        • Check out http://www.datastax.com/documentation/cassa
    24. What about the drivers?
        • DataStax provides drivers for Java, Python, C#, and C++
        • There are also many open source community drivers, including Clojure, Go, Node.js and many many more.
    25. Connect to your instance with Java
        • Create a new Java class, com.example.cassandra.SimpleClient for example
        • Add an instance field to hold the cluster reference
            private Cluster cluster;
        • Add an instance method, connect, to your new class. Here you can add your contact point, the IP address of your node.
            public void connect(String node) {
                cluster = Cluster.builder()
                    .addContactPoint(node)
                    .build();
            }
        • Add an instance method, close, to shut down the cluster once you are finished
    26. Connect to your instance with Java
        • In your main class, create a SimpleClient object, call connect, and close it
            public static void main(String[] args) {
                SimpleClient client = new SimpleClient();
                client.connect("<ip_address>");
                client.close();
            }
        • Select some data
            session.execute("SELECT * FROM demo.users");
    27. Connect to your instance in Python
        • Import the Cluster class and create a cluster object
            from cassandra.cluster import Cluster
            cluster = Cluster()
        • This will attempt to connect to a cluster on your local machine. You could also give it an IP address and it will connect to that.
            cluster = Cluster(['<ip_address>'])
        • To connect to a node and begin actually running queries against our instance, we need a session, which is created by calling Cluster.connect()
            cluster = Cluster()
            session = cluster.connect()
        • You can even connect to a particular keyspace
            cluster = Cluster()
            session = cluster.connect('demo')
    28. Connect to your instance in Python
        • Select some data
            results = session.execute("""SELECT * FROM demo.users""")
    29. Get familiar
        • Visit http://planetcassandra.org
        • Your #1 destination for NoSQL Apache Cassandra resources
        • Downloads, webinars, presentations, blog posts, and much, much more!
    30. Try Cassandra
    31. Thank you!! Any Questions?
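
To tie the CQL slides and the Python driver slides together, here is a minimal sketch of running the demo statements through a session's execute method. Several things here are assumptions rather than part of the deck: `run_demo` and `connect_and_run` are hypothetical names, the DataStax Python driver (the `cassandra` package) is assumed to be installed, a node is assumed to be listening at the given address, and `IF NOT EXISTS` was added so the script can be re-run, which assumes Cassandra 2.0 or later.

```python
# CQL statements adapted from the slides; keyspace-qualified so no USE is needed.
DEMO_STATEMENTS = [
    "CREATE KEYSPACE IF NOT EXISTS demo WITH REPLICATION = "
    "{'class': 'SimpleStrategy', 'replication_factor': 1}",
    "CREATE TABLE IF NOT EXISTS demo.users (firstname text, lastname text, "
    "age int, email text, city text, PRIMARY KEY (lastname))",
    "INSERT INTO demo.users (firstname, lastname, age, email, city) "
    "VALUES ('Jane', 'Doe', 36, 'janedoe@email.com', 'Beverly Hills')",
    "UPDATE demo.users SET city = 'San Jose' WHERE lastname = 'Doe'",
]


def run_demo(session):
    """Run the demo statements on any object with an execute(query) method.

    Returns whatever the final SELECT returns, so the caller can iterate
    over the rows.
    """
    for statement in DEMO_STATEMENTS:
        session.execute(statement)
    return session.execute("SELECT * FROM demo.users WHERE lastname = 'Doe'")


def connect_and_run(ip_address="127.0.0.1"):
    """Connect with the DataStax Python driver and print the selected rows.

    The driver is imported here so the rest of the sketch can be read and
    exercised without it; the default address is an assumption.
    """
    from cassandra.cluster import Cluster

    cluster = Cluster([ip_address])
    try:
        session = cluster.connect()
        for row in run_demo(session):
            print(row.firstname, row.lastname, row.city)
    finally:
        cluster.shutdown()
```

Calling `connect_and_run()` against a local instance started with bin/cassandra should print Jane Doe's row with the updated city. Keeping the statements in `run_demo` separate from the connection code makes it easy to try the same CQL from cqlsh or DevCenter first.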