NoSQL
Thenraja Vettivelraj
Swansea University

Contents List
ABSTRACT
1. INTRODUCTION
2. MAIN FEATURES
2.1 COMPARISON WITH SQL
3. EXAMPLE - CASSANDRA
3.1 MAIN FEATURES OF APACHE CASSANDRA
3.2 WHY APACHE CASSANDRA?
3.3 APPLICATIONS
4. DRAWBACKS OF NOSQL
5. SUMMARY
6. REFERENCES

ABSTRACT
NoSQL is undoubtedly one of the emerging fields in data management. It is a powerful and efficient approach to storing and manipulating data. It has no fixed schema and no joins, and it avoids the "ACID" properties [Han, J. et al., 2011]. One of the main advantages of NoSQL is that it can be much faster than SQL databases, and its operational cost is generally lower than that of a relational database. Current trends in storage size, connectedness, architecture and semi-structured data are driving the need for it [Accessed: 25 Feb 2012].

1. INTRODUCTION
The term "NoSQL" has had several interpretations. At first many took it to mean a non-relational database, while others say that "NOSQL" stands for Not Only SQL. Nowadays the term is used as an umbrella term for all databases and data stores that do not follow the relational model. It is not a single technology or product but a class of diverse products and a set of ideas about how to store and manipulate data [Accessed: 24 Feb 2012].

The term first appeared in 1998 [Accessed: 24 Feb 2012], and over the past 3-4 years it has earned its own place in the market because of its tremendous growth. Its strengths are massive scalability, lower cost, schema flexibility, massive data stores and high availability [Accessed: 24 Feb 2012]. Some of the main applications of NoSQL are search engines, data processing and social websites. NoSQL databases generally support neither joins nor the full ACID properties.

There are four main data models in NoSQL, namely
Key-Value Stores
Big Table Clones
Document Databases
Graph Databases
From these we have to choose the right one for the job [Accessed: 25 Feb 2012]. A well-known example of a NoSQL database is Cassandra, which is used by Facebook (a social networking site) and is classed here as a key-value store. It can handle terabytes (TB) of data in a single day, generated by its users.
Bigtable is the example behind the Big Table clones; Google developed its own database in order to have more control over performance and scalability. Google uses it for its search engine, Gmail, Orkut and other Google applications. Neo4j is a very good example of a graph database and is written in Java. Apache CouchDB is an example of a document database and is written in Erlang. Figure 1 compares the four NoSQL data models in a graph of size versus complexity.

2. MAIN FEATURES
The CAP theorem concerns Consistency, Availability and Partition tolerance. According to [Accessed: 11 Mar 2012]:
"Available, Partition-Tolerant (AP) Systems achieve "eventual consistency" through replication and verification. Examples of AP systems are Cassandra and CouchDB.
Consistency means that each client always has the same view of the data.
Availability means that all clients can always read and write.
Partition tolerance means that the system works well across physical network partitions."
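
The idea of "eventual consistency" through replication can be illustrated with a small Python sketch. It does not show how Cassandra or CouchDB are implemented: the Replica class, the synchronise function and the last-write-wins timestamp rule are all assumptions made for this illustration. Two replicas accept writes independently and only agree after they exchange their data.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica that stores key -> (timestamp, value)."""
    name: str
    store: dict = field(default_factory=dict)

    def write(self, key, value, timestamp):
        # Assumed conflict rule: keep the entry with the newest timestamp.
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def synchronise(a, b):
    """Exchange entries so both replicas keep the newest version of each key."""
    for key in set(a.store) | set(b.store):
        for src, dst in ((a, b), (b, a)):
            if key in src.store:
                timestamp, value = src.store[key]
                dst.write(key, value, timestamp)


# Each replica stays available for writes while they are not in contact.
r1, r2 = Replica("r1"), Replica("r2")
r1.write("status", "old", timestamp=1)
r2.write("status", "new", timestamp=2)
print(r1.read("status"), r2.read("status"))  # the views differ at this point

# After the replicas exchange data, both return the same value.
synchronise(r1, r2)
print(r1.read("status"), r2.read("status"))
```

Between the two print calls the clients of r1 and r2 see different data, which is the price an AP system pays for staying available during a partition.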

Figure 1: Comparison of NoSQL data models (size versus complexity; key-value stores, Big Table clones, document databases, graph databases)

2.1 COMPARISON WITH SQL
Compared with SQL, NoSQL has a slight upper hand in scalability and performance. Instead of the SQL language, NoSQL systems use mechanisms such as MapReduce or, for Cassandra, CQL.

3. EXAMPLE - CASSANDRA
Cassandra is one of the best-known NoSQL databases. It is widely used because it can handle large amounts of structured data with no single point of failure and is easy to use. It is written in Java and requires a JVM (Java Virtual Machine) to be installed on the system before the server is started. It is also of the key-value store type. Cassandra supports CQL (Cassandra Query Language). DataStax provides one of the third-party distributions of Cassandra; it includes the Cassandra CQL Shell, in which we create the keyspace and column family.

Figure 2: Cassandra CQL Shell where keyspace and column family created
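
As a rough picture of the keyspace and column family created in Figure 2, the nesting can be sketched in Python with plain dictionaries. This is only a mental model under assumed names: new_keyspace, insert, get and delete are hypothetical helpers, not the API of Cassandra or of any client library, and the row and column names are invented.

```python
from collections import defaultdict


def new_keyspace():
    """Hypothetical keyspace: column family -> row key -> column name -> value."""
    return defaultdict(lambda: defaultdict(dict))


def insert(keyspace, column_family, row_key, column, value):
    # Missing column families and rows are created on first write.
    keyspace[column_family][row_key][column] = value


def get(keyspace, column_family, row_key, column):
    # Returns None when any level of the path does not exist.
    return keyspace.get(column_family, {}).get(row_key, {}).get(column)


def delete(keyspace, column_family, row_key, column):
    # Deleting a column that is not present is silently ignored.
    keyspace.get(column_family, {}).get(row_key, {}).pop(column, None)


demo = new_keyspace()
insert(demo, "example", "row1", "name", "alice")
print(get(demo, "example", "row1", "name"))
delete(demo, "example", "row1", "name")
print(get(demo, "example", "row1", "name"))
```

Rows in the same column family need not share the same columns, which is what schema flexibility means in practice.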
A keyspace is the outermost grouping of our data. It is a collection of column families, and typically each application has one keyspace. The keyspace holds the management and configuration settings for its column families, and its most important setting is the replication factor. In Figure 2 we set the strategy class to SimpleStrategy; the alternative is NetworkTopologyStrategy. We can also create multiple nodes. We then created a column family named example. There are normally two types of column family, namely
Standard column family and
Super column family
Cassandra provides three simple methods: insert, get and delete.

Figure 3: Cassandra Data Modelling (standard column family and super column family)

3.1 MAIN FEATURES OF APACHE CASSANDRA
Partitioning
This is one of the main features of Cassandra: the data we store is partitioned dynamically and distributed across the available nodes in the cluster by a hashing mechanism. Consistent hashing gives a fixed circular space, or "ring". Each node is assigned a random value that denotes its position on the ring. Each data item is identified by a key, which is hashed to give the item's position on the ring; the item is stored on the first node found when moving clockwise from that position.
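
The ring placement can be sketched in Python as follows. A key is stored on the first node at or after its own position, wrapping around at the end of the ring. The HashRing class, the use of MD5 and the choice to derive a node's position from its name (instead of a random value, so that runs are repeatable) are assumptions for illustration; Cassandra's own partitioner is not reproduced here.

```python
import bisect
import hashlib
from typing import Optional


def ring_position(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an assumed choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hashing ring for illustration only."""

    def __init__(self):
        self._tokens = []   # sorted node positions on the ring
        self._owners = {}   # position -> node name

    def add_node(self, name: str, token: Optional[int] = None) -> int:
        # Assumption: the position defaults to a hash of the node name.
        if token is None:
            token = ring_position(name)
        if token in self._owners:
            raise ValueError("position already taken on the ring")
        bisect.insort(self._tokens, token)
        self._owners[token] = name
        return token

    def node_for_position(self, position: int) -> str:
        if not self._tokens:
            raise LookupError("the ring has no nodes")
        # First node at or after the position, wrapping past the last node.
        index = bisect.bisect_left(self._tokens, position)
        if index == len(self._tokens):
            index = 0
        return self._owners[self._tokens[index]]

    def node_for(self, key: str) -> str:
        return self.node_for_position(ring_position(key))


ring = HashRing()
for node in ("node-1", "node-2", "node-3"):
    ring.add_node(node)
print(ring.node_for("user:42"))
```

A useful property of this scheme is that adding a node only moves the keys that now fall to the new node; every other key stays where it was.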

Figure 4: Ring View of Cassandra Test cluster
The figure above shows the ring view of the Cassandra test cluster. It gives each node's token value along with other information such as IP, size and load, and it is available by default in the DataStax web interface (http://localhost:8888/opscenter/index.html).

Scaling the cluster
Cassandra also supports multiple nodes. When a new node is added to an existing system that already has one node, it takes over part of the other node's workload and becomes responsible for the same job as that node. This is done by the bootstrap algorithm, started from a node through the command line utility or through the Cassandra web dashboard.

Figure 5: Cassandra dashboard

3.2 WHY APACHE CASSANDRA?
There are many reasons to choose Cassandra: it can handle terabytes or petabytes of data in a peer-to-peer architecture; it uses CQL (Cassandra Query Language), which resembles SQL; data is replicated to multiple nodes, and hence
there is no single point of failure; it is cloud enabled; data is replicated to more than one location for disaster recovery scenarios, which gives durability and high availability; fault detection and recovery are transparent and follow the gossip protocol; it is easy to use; and no special hardware is required to run it.

3.3 APPLICATIONS
Companies such as Accenture, Twitter and Facebook, among many others, use NoSQL databases in one way or another because of these features. Beyond industry, educational and government sectors have also slowly started using NoSQL databases. For example, "Burt uses Cassandra in their software to help advertisers and agencies improve the efficiency and effect of online campaigns" [Accessed: 11 Mar 2012].

4. DRAWBACKS OF NOSQL
Unlike SQL databases, NoSQL does not have the ACID properties, so we cannot expect the degree of reliability that we get from an SQL database. Many people are unfamiliar with the technology. Unlike commercial SQL databases, NoSQL products often lack sufficient support, since many of them come with only limited support.

5. SUMMARY
NoSQL, in the form of graph databases, key-value databases, Big Table clones and document databases, has made a very big impact on the database field, and most of these products are open source. In my view, I am sure that many will soon migrate from SQL towards NoSQL. In the next two to four years we can expect a major change in the database field because of NoSQL's scalability and other features, but it is unlikely to replace SQL databases. Each database has its pros and cons, and it is our duty to choose the right one.

6. REFERENCES
[Accessed: 24 Feb 2012] Slideshare.net (2010) NoSQL databases. [Online] Available at: http://www.slideshare.net/marin_dimitrov/nosql-databases-3584443
[Accessed: 24 Feb 2012] Perdue, T. (1998) NoSQL - An Overview of NoSQL. [Online] Available at: http://newtech.about.com/od/databasemanagement/a/Nosql.htm
[Accessed: 24 Feb 2012] Tiwari, S. (2011) Professional NoSQL. [e-book] Wrox Programmer to Programmer. Available through: Google Books http://books.google.co.uk/books?id=tv5iO9MnObUC&printsec=frontcover&dq=nosql&hl=en&sa=X&ei=5vw_T9CABMG_0QWtzqyPDw&ved=0CEQQ6AEwAg#v=onepage&q=nosql&f=false
[Han, J. et al., 2011] Han, J. et al. (2011) "Survey on NoSQL database," Pervasive Computing and Applications (ICPCA), 2011 6th International Conference on, pp. 363-366, 26-28 Oct. 2011. doi: 10.1109/ICPCA.2011.6106531
[Accessed: 4 Mar 2012] Slideshare.net (2010) NoSQL or not NoSQL? [Online] Available at: http://www.slideshare.net/ruflin/nosql-or-not-nosql
[Accessed: 25 Feb 2012] Blogs.neotechnology.com (2009) NOSQL: scaling to size and scaling to complexity - Emils Neo Thoughts. [Online] Available at: http://blogs.neotechnology.com/emil/2009/11/nosql-scaling-to-size-and-scaling-to-complexity.html
[Accessed: 25 Feb 2012] Slideshare.net (2011) A NOSQL Overview And The Benefits Of Graph Databases (nosql east 2009). [Online] Available at: http://www.slideshare.net/emileifrem/nosql-east-a-nosql-overview-and-the-benefits-of-graph-databases
[Accessed: 25 Feb 2012] Slideshare.net (2011) NOSQL for Dummies. [Online] Available at: http://www.slideshare.net/thobe/nosql-for-dummies
Leavitt, N. (2010) "Will NoSQL Databases Live Up to Their Promise?," Computer, vol. 43, no. 2, pp. 12-14, Feb. 2010. doi: 10.1109/MC.2010.58. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5410700&isnumber=5410692
[Accessed: 11 Mar 2012] Blog.nahurst.com (2010) Visual Guide to NoSQL Systems - Nathan Hursts Blog. [Online] Available at: http://blog.nahurst.com/visual-guide-to-nosql-systems
[Accessed: 11 Mar 2012] Datastax.com (2011) Cassandra Users | DataStax. [Online] Available at: http://www.datastax.com/cassandrausers