1706409 Zhu Na
1
Agenda
• What is Cassandra
• Main features and known issues
• Demo : Use Cassandra for OLAP
2
What is Cassandra
• Apache Cassandra is:
• Free to download and install
• Open source, still actively developed on GitHub and JIRA
• A NoSQL database management system
• Designed to be distributed
o Handles large amounts of data
o Across many commodity servers
o Provides high availability with no single point of failure
3
Cassandra Query Language (CQL)
• CQL is
• a simple interface for accessing Cassandra
• an alternative to the traditional Structured Query Language (SQL)
• CQL provides native syntax for collections and other common encodings.
• Language drivers are available for Java (JDBC), Python (DBAPI2),
Node.JS (Helenus), Go (gocql) and C++.
4
Something special
• Scalability
• MapReduce support
• Distributed
• Supports replication, including multi-data-center replication
• Fault-tolerant
• Consistency
5
Scalability
6
Map-Reduce
• Hadoop vs Spark
• Spark + Cassandra
7
Distributed : How to store data
• Key features of Cassandra’s distributed architecture are specifically tailored
for multi-data-center deployment.
• Cassandra operates by dividing all data evenly around a cluster of nodes,
which can be visualized as a ring. Nodes generally run on commodity
hardware. Each Cassandra node in the cluster is assigned, and is responsible
for, a token range (essentially a range of hashes defined by a
partitioner).
• Each update or addition of data carries a unique row key (also known as
a primary key). The partition-key part of the primary key is hashed to
determine the replica (node) responsible for the token range that includes
the given row key. The data is then stored in the cluster n times (where n is
defined by the keyspace’s replication factor), once on each replica
responsible for the given row key.
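The ring placement described above can be sketched in a few lines of Python. This is a minimal illustration, not Cassandra's implementation: the MD5-based `token` function stands in for the real partitioner, the node names and tokens are made up, and replicas are simply taken as the next nodes clockwise on the ring.

```python
import bisect
import hashlib


def token(partition_key):
    """Hash a partition key to a 64-bit token (MD5 is a stand-in partitioner)."""
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, node_tokens, replication_factor):
        # Each node owns the token range that ends at its own token.
        self.ring = sorted((t, name) for name, t in node_tokens.items())
        self.tokens = [t for t, _ in self.ring]
        self.replication_factor = replication_factor

    def replicas(self, partition_key):
        # The first node whose token is >= the key's token owns the key;
        # past the last token we wrap around to the start of the ring.
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.ring)
        count = min(self.replication_factor, len(self.ring))
        # Further copies go to the next nodes clockwise.
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


# Hypothetical four-node cluster with evenly spaced tokens.
ring = Ring(
    {"node-a": 0, "node-b": 2**62, "node-c": 2**63, "node-d": 3 * 2**62},
    replication_factor=3,
)
print(ring.replicas("AA"))  # three distinct nodes, always the same for "AA"
```

Because the replica list depends only on the key's hash and the ring layout, any node can work out where a row lives without consulting a central directory.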
8
Distributed : How to read / write data
• Reads are eventually consistent, and the consistency level decides how many replicas
are asked. For example, if a read uses a quorum and the keyspace was created with a
“replication factor” of 3, 2 of the 3 replicas for the requested data would be
contacted, their results merged, and a single result returned to the client.
• For a write request, the coordinator node sends the write, with all
mutated columns, to all replica nodes for the given row key. On each replica:
• The write is first added to the commit log, which ensures durability of the transaction.
• Next, it is also added to the memtable. A memtable is a bounded, in-memory write-back
cache that contains recent writes which have not yet been flushed to an SSTable (a
permanent, immutable, serialized on-disk copy of the table’s data).
• When updates cause a memtable to reach its configured maximum in-memory size, the
memtable is flushed to an immutable SSTable, persisting the data from the memtable
permanently on disk while making room for future updates.
• In the event of a crash or node failure, events are replayed from the commit log, which
prevents the loss of any data from memtables that had not been flushed to disk prior to an
unexpected event such as a power outage or crash.
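The write path on a single replica can be sketched as a toy Python class. It is an illustration only: the class and method names are invented, the memtable limit counts entries instead of bytes, and "disk" is just a Python list.

```python
class Node:
    """Toy model of one replica's write path: commit log, memtable, SSTables."""

    def __init__(self, memtable_limit):
        self.commit_log = []   # append-only record of writes not yet flushed
        self.memtable = {}     # recent writes, held in memory
        self.sstables = []     # immutable, sorted snapshots "on disk"
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. durability first
        self.memtable[key] = value            # 2. then the in-memory table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Persist the memtable as a sorted, read-only SSTable and start afresh.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []   # flushed writes no longer need replaying

    def crash_and_recover(self):
        self.memtable = {}                  # memory is lost in a crash
        for key, value in self.commit_log:  # replay the unflushed writes
            self.memtable[key] = value

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest SSTable first
            for k, v in sstable:
                if k == key:
                    return v
        return None


node = Node(memtable_limit=2)
node.write("a", 1)
node.write("b", 2)   # reaches the limit, so the memtable is flushed
node.write("c", 3)   # lives only in the commit log and the memtable
node.crash_and_recover()
print(node.read("a"), node.read("c"))
```

After the simulated crash, `"a"` is served from the SSTable and `"c"` is rebuilt from the commit log, which is the recovery behaviour the last bullet describes.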
9
Something tricky
• Cassandra is not row-level consistent:
• When inserts and updates to a table
o affect the same row and are processed at approximately the same time,
o they may affect the non-key columns in inconsistent ways
o One update may affect one column while another affects the other,
o resulting in sets of values within the row that were never specified or intended
o On update, Cassandra does not check whether the data conflicts!
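A small Python sketch shows how such a mixed row can arise when the newest timestamp wins for each column separately. The column names, values and timestamps are made up for illustration, and `merge_row` is a simplified model of the reconciliation, not Cassandra's code.

```python
def merge_row(*versions):
    """Merge row versions column by column, keeping the newest timestamp."""
    merged = {}
    for version in versions:
        for column, (value, timestamp) in version.items():
            if column not in merged or timestamp > merged[column][1]:
                merged[column] = (value, timestamp)
    return {column: value for column, (value, _) in merged.items()}


# Each cell is (value, write timestamp); all values here are hypothetical.
stored = {"status": ("SCHEDULED", 1000), "gate": ("A1", 1000)}
client_a = {"status": ("DELAYED", 1001)}   # assumes the gate is still A1
client_b = {"gate": ("C03", 1002)}         # assumes the status is still SCHEDULED

row = merge_row(stored, client_a, client_b)
print(row)
```

The merged row pairs `DELAYED` with `C03`, a combination that neither client wrote or saw, and nothing in the merge flags it as a conflict.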
10
Data model
• The most important thing to know in Cassandra data modeling: the primary key
• The simplest form:
• The first element in our PRIMARY KEY is what we call the partition key.
• The partition key has a special use in Apache Cassandra beyond showing the uniqueness of the
record in the database. Its other purpose, and one that is very critical in distributed systems, is
determining data locality.
• With more elements added:
• All columns listed after the partition key are called clustering columns.
• This is where Cassandra takes a huge break from relational databases. Where the partition key is
important for data locality, the clustering columns specify the order in which the data is arranged
inside the partition. The way we read this is left to right:
• Item one is the partition key.
• Item two is the first clustering column.
• Item three is the second clustering column.
• After inserting data, you should expect your SELECT to return data in the
ascending/descending order of item two within a single partition.
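The roles of the partition key and the clustering columns can be modelled with a short Python sketch. The `Table` class is invented for illustration; the key columns follow the demo's PRIMARY KEY (AIRLINE, destination_airport, origin_airport), and the sample rows are made up.

```python
from collections import defaultdict


class Table:
    """Toy table: rows are grouped by partition key and ordered by clustering columns."""

    def __init__(self, partition_key, clustering_columns):
        self.partition_key = partition_key
        self.clustering_columns = clustering_columns
        # partition key value -> {clustering values -> row}
        self.partitions = defaultdict(dict)

    def insert(self, row):
        clustering = tuple(row[c] for c in self.clustering_columns)
        # Same full primary key means the same row, so an insert acts as an upsert.
        self.partitions[row[self.partition_key]][clustering] = row

    def select(self, partition_value, descending=False):
        partition = self.partitions.get(partition_value, {})
        return [partition[key] for key in sorted(partition, reverse=descending)]


flights = Table("airline", ["destination_airport", "origin_airport"])
flights.insert({"airline": "AA", "destination_airport": "LAX", "origin_airport": "JFK", "delay": 5})
flights.insert({"airline": "AA", "destination_airport": "BOS", "origin_airport": "JFK", "delay": 7})
flights.insert({"airline": "AA", "destination_airport": "LAX", "origin_airport": "JFK", "delay": 9})

for row in flights.select("AA"):
    print(row["destination_airport"], row["origin_airport"], row["delay"])
```

Rows come back ordered by the first clustering column, then the second. The third insert shares its full primary key with the first, so it replaces that row instead of adding a new one, which is worth keeping in mind when choosing key columns.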
11
Demo for flight “delay”
• Maybe we have all experienced being late for a flight, or running
like crazy through the airport to catch the next flight because the previous
one was delayed.
• Did you ever notice that sometimes your flight even departs earlier than
scheduled? How often might this happen?
• While booking the tickets, how could I know that this airline is “always late”, or
that this transfer airport is always crowded, so that I can take my time even if the
first flight is one hour later than scheduled?
• What if we knew where and how to look at this data, and could avoid a
problem that is very likely to happen?
12
Dataset
• Source: Kaggle dataset flight-delay
• flights.csv for USA 2015, all the unscheduled flights; airlines.csv; airports.csv
13
CQL
• Use primary key / clustering key
• No joins
• ALLOW FILTERING
• Gives you more control:
• User-defined function (UDF)
• User-defined aggregate function (UDA)
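A user-defined aggregate is built from a state function that is applied once per row and an optional final function that turns the accumulated state into the result. In CQL these correspond to the SFUNC, FINALFUNC and INITCOND parts of CREATE AGGREGATE. The Python sketch below mimics that flow for an average; the function names and the delay values are made up for illustration.

```python
from functools import reduce


def avg_state(state, delay):
    """State function: folds one row's value into the (count, total) state."""
    if delay is None:   # skip missing values
        return state
    count, total = state
    return (count + 1, total + delay)


def avg_final(state):
    """Final function: turns the accumulated state into the result."""
    count, total = state
    return total / count if count else None


def aggregate(values, state_func, final_func, init_cond):
    # Start from the initial condition, apply the state function row by row,
    # then apply the final function once.
    return final_func(reduce(state_func, values, init_cond))


delays = [12, None, -3, 30]   # made-up arrival delays in minutes
average_delay = aggregate(delays, avg_state, avg_final, (0, 0))
print(average_delay)
```

Keeping the state small, here a pair of numbers, means the aggregate can be computed in one pass over the rows without holding them all in memory.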
14
Start Cassandra (Mac OS) and import the CSV data
• Start Cassandra first in a terminal: /usr/local/apache-cassandra-3.10/bin/cassandra -f
• Then start cqlsh in another terminal tab: /usr/local/apache-cassandra-3.10/bin/cqlsh
• Time for fun in cqlsh:
CREATE KEYSPACE flight WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
USE flight;
CREATE TABLE flight
(YEAR SMALLINT, MONTH SMALLINT,
….
WEATHER_DELAY TEXT,
PRIMARY KEY (AIRLINE, destination_airport, origin_airport));
COPY flight (YEAR, MONTH, DAY, ….., WEATHER_DELAY)
FROM '/Users/nanazhu/Downloads/flights.csv'
WITH HEADER = true AND NULL = 'NULL';
15
Query: I want to fly from JFK to LAX; which
airline / what time should be double-checked?
16
At what time do delays happen most?
17
Query: how many times, and for how much time in total,
does a given airline depart early?
18
Query: how many times, and for how much time in total,
does a given airline arrive late?
19
Reference
• From DataStax
• Using CQL
• DS220: Data Modeling
• From Tutorialspoint
• Cassandra tutorial
20
Questions ?
Thank you
21

Cassandra Tutorial

  • 2. Agenda • What is Cassandra • Main features and known issues • Demo : Use Cassandra for OLAP 2
  • 3. What is Cassandra • Apache Cassandra is a • Free for download and install • Open-source still active on Github and JIRA • NoSQL database management system • designed to be distributed  Handle large amounts of data  Across many commodity servers  Providing high availability with no single point of failure 3
  • 4. Cassandra Query Language (CQL) • CQL is • a simple interface for accessing Cassandra • as an alternative to the traditional Structured Query Language (SQL). • CQL provides native syntaxes for collections and other common encodings Language drivers are available for Java (JDBC), Python (DBAPI2), Node.JS (Helenus), Go (gocql) and C++. 4
  • 5. Something special • Scalability • MapReduce support • Distributed • Supports replication and multi data center replication • Fault-tolerant • consistency 5
  • 7. Map-Reduce • Hadoop vs Spark • Spark + Cassandra 7
  • 8. Distributed : How to store data • Key features of Cassandra’s distributed architecture are specifically tailored for multiple-data center deployment. • Cassandra operates by dividing all data evenly around a cluster of nodes, which can be visualized as a ring. Nodes generally run on commodity hardware. Each Cassandra node in the cluster is responsible for and assigned a token range (which is essentially a range of hashes defined by a partitioner). • Each update or addition of data contains a unique row key (also known as a primary key). The primary key is hashed to determine a replica (or node) responsible for a token range inclusive of a given row key. The data is then stored in the cluster n times (where n is defined by the keyspace’s replication factor), or once on each replica responsible a given query’s row key. 8
  • 9. Distributed : How to read / write data • A read request is processed using eventually consistency, and the keyspace was created with a “replication factor” of 3, 2 of the 3 replicas for the requested data would be contacted, their results merged, and a single result returned to the client. • A write requests, the coordinator node will send a write requests with all mutated columns to all replica nodes for a given row key. • First added to the commit log, which ensures durability of the transaction. • Next, it is also added to the memtable. A memtable is a bounded in memory write-back cache that contains recent writes which have not yet been flushed to an SSTable (a permanent, immutable, and serialized on disk copy of the tables data). • When updates cause a memtable to reach it’s configured maximum in-memory size, the memtable is flushed to an immutable SSTable, persisting the data from the memtable permanently on disk while making room for future updates. • In the event of a crash or node failure, events are replayed from the commit log, which prevents the loss of any data from memtables that had not been flushed to disk prior to an unexpected event such as a power outage or crash. 9
  • 10. Something tricky • Cassandra is not row level consistent : • When inserts and updates into the table o affect the same row ; processed at approximately the same time o may affect the non-key columns in inconsistent ways o One update may affect one column while another affects the other o resulting in sets of values within the row that were never specified or intended oWhen update , Cassandra do not check the data is conflict or not ! 10
  • 11. Data model • The most important thing to know in Cassandra data modeling: The primary key • The simplest form : • The first element in our PRIMARY KEY is what we call a partition key. • The partition key has a special use in Apache Cassandra beyond showing the uniqueness of the record in the database. The other purpose, and one that very critical in distributed systems, is determining data locality. • Added more elements : • All columns listed after the partition key are called clustering columns. • This is where Cassandra take a huge break from relational databases. Where the partition key is important for data locality, the clustering column specifies the order that the data is arranged inside the partition. The way we read this is left to right: • Item one is the partition key • Item two is the first clustering column. • Item three is the second clustering column. • After inserting data, you should expect your SELECT to return data in the ascending/descending order of the item two for a single partition. 11
  • 12. Demo for flight “delay” • Maybe we all experienced being late for catching a flight or running like a crazy in the airport transfer to next flight because the previous one is delayed. • Did you even notice sometimes your flight even fly earlier than scheduled ? How often might this happen ? • How could I know this airlines is “always late” or this transfer airport always crowded so I can take a walk even the first flight is one hour later than scheduled while I am booking the tickets? • If we know where and how to look those data, and avoid some problem if it might have a very high possibility to happen ? 12
  • 13. Dataset • Source: Kaggle dataset flight-delay • flights.csv for all the unscheduled flights in the USA in 2015; airlines.csv; airports.csv
  • 14. CQL • Use primary key / clustering key • No join • ALLOW FILTERING • Gives you more control: • User-defined function (UDF) • User-defined aggregate function (UDA)
  • 15. Start Cassandra (Mac OS) and import the CSV data • Start Cassandra first in a terminal: /usr/local/apache-cassandra-3.10/bin/cassandra -f • Then start cqlsh in another terminal tab: /usr/local/apache-cassandra-3.10/bin/cqlsh • Time for fun in cqlsh:
      CREATE KEYSPACE flight WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
      USE flight;
      CREATE TABLE flight (YEAR SMALLINT, MONTH SMALLINT, …. WEATHER_DELAY TEXT, PRIMARY KEY (AIRLINE, destination_airport, origin_airport));
      COPY flight (YEAR, MONTH, DAY, …..,WEATHER_DELAY) FROM '/Users/nanazhu/Downloads/flights.csv' WITH HEADER = true AND NULL = 'NULL';
  • 16. Query: I want to fly from JFK to LAX; which airline / what time should be double-checked?
  • 17. At what time do delays happen most?
  • 18. Query: how many times / how much time in total does each airline depart early?
  • 19. Query: how many times / how much time in total does each airline arrive late?
  • 20. Reference • From DataStax • Using CQL • DS220: Data Modeling • From Tutorialspoint • Cassandra tutorial

Editor's Notes

  1. Request: development of a small project. Students are strongly encouraged to propose their own ideas for projects. As a suggestion, they can refer to (and also select from) the following list of tools. The project connected to a tool consists, for example, of studying the logical data model(s) adopted by the tool, the native storage data structure it uses, and the query language it provides, and of highlighting further distinguishing features. A demonstration of the basic use of the tool through one or more examples is also required. Presentations connected to projects (possibly through slides) should last around 20 minutes (including the demo).
  2. https://academy.datastax.com/resources/ds101-introduction-cassandra?unit=introduction-cassandra-overview https://academy.datastax.com/resources/getting-started-apache-spark?unit=connecting-spark-reading-data-cassandra
  3. https://www.tutorialspoint.com/cassandra/cassandra_architecture.htm
  4. https://www.tutorialspoint.com/cassandra/cassandra_shell_commands.htm
  5. About
  6. Why do we need hashing? Suppose you are searching for data that might contain “Disney” and you don’t know which node(s) hold it (imagine having to turn over every stone to find it). Do you really think it is a good idea to ask every node “do you have the data ‘Disney’?” Solution: hash the primary key, go directly to the node that holds the data, and do the rest of the operation there!
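The routing idea in this note can be sketched in Python. Several things here are assumptions made for illustration: the `Ring` class and node names are hypothetical, MD5 stands in for the real partitioner hash, and each node's token is derived from its name, whereas a real cluster assigns tokens through its own configuration.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Stand-in hash for illustration only (not Cassandra's partitioner).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy token ring: each node owns the hash range that ends at its token."""

    def __init__(self, nodes):
        self.entries = sorted((token(name), name) for name in nodes)
        self.tokens = [t for t, _ in self.entries]

    def node_for(self, key: str) -> str:
        # First node whose token is >= the key's token, wrapping around the ring.
        index = bisect.bisect_left(self.tokens, token(key))
        return self.entries[index % len(self.entries)][1]


nodes = ["node-a", "node-b", "node-c", "node-d"]
ring = Ring(nodes)
owner = ring.node_for("Disney")
print(owner)
```

The lookup touches no node at all: the owner is computed from the key alone, so the request can go straight to the node responsible for that token range.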
  7. https://www.tutorialspoint.com/cassandra/cassandra_data_model.htm
  8. CREATE OR REPLACE FUNCTION group_time_and_sum_delay(state map<smallint, int>, time smallint, delay int)
       CALLED ON NULL INPUT RETURNS map<smallint, int> LANGUAGE java
       AS 'if (delay != null && delay > 0) state.put(time, delay + state.getOrDefault(time, 0)); return state;';
     CREATE AGGREGATE IF NOT EXISTS group_time_and_sum(smallint, int)
       SFUNC group_time_and_sum_delay STYPE map<smallint, int> INITCOND {};
     select group_time_and_sum(MONTH, ARRIVAL_DELAY) from flight;
     select group_time_and_sum(day_of_week, ARRIVAL_DELAY) from flight;
  9. cqlsh:flight> CREATE OR REPLACE FUNCTION group_airline_and_sum_early(state map<text, int>, airline text, delay int)
       CALLED ON NULL INPUT RETURNS map<text, int> LANGUAGE java
       AS 'if (delay != null && delay < 0) state.put(airline, delay + state.getOrDefault(airline, 0)); return state;';
     cqlsh:flight> CREATE AGGREGATE IF NOT EXISTS group_and_sum_early(text, int)
       SFUNC group_airline_and_sum_early STYPE map<text, int> INITCOND {};
     cqlsh:flight> CREATE OR REPLACE FUNCTION group_airline_and_counter_early(state map<text, int>, airline text, delay int)
       CALLED ON NULL INPUT RETURNS map<text, int> LANGUAGE java
       AS 'if (delay != null && delay < 0) state.put(airline, 1 + state.getOrDefault(airline, 0)); return state;';
     cqlsh:flight> CREATE AGGREGATE IF NOT EXISTS group_and_count_early(text, int)
       SFUNC group_airline_and_counter_early STYPE map<text, int> INITCOND {};

     cqlsh:flight> select group_and_count_early(airline, departure_delay) from flight;
      flight.group_and_count_early(airline, departure_delay)
     --------------------------------------------------------
      {'AA': 459, 'AS': 118, 'B6': 273, 'DL': 476, 'EV': 754, 'F9': 215, 'HA': 25, 'MQ': 228, 'NK': 257, 'OO': 686, 'UA': 588, 'US': 162, 'VX': 39, 'WN': 1341}
     (1 rows)
     Warnings : Aggregation query used without partition key

     cqlsh:flight> select group_and_sum_early(airline, departure_delay) from flight;
      flight.group_and_sum_early(airline, departure_delay)
     ------------------------------------------------------
      {'AA': -3995, 'AS': -3010, 'B6': -1707, 'DL': -4560, 'EV': -7246, 'F9': -2277, 'HA': -419, 'MQ': -2300, 'NK': -1972, 'OO': -8081, 'UA': -2608, 'US': -707, 'VX': -398, 'WN': -3849}
     (1 rows)
     Warnings : Aggregation query used without partition key

     cqlsh:flight> select airline from airlines where iata_code in ('DL','WN');
      airline
     ------------------------
      Delta Air Lines Inc.
      Southwest Airlines Co.
  10. cqlsh:flight> CREATE OR REPLACE FUNCTION group_airline_and_sum_late(state map<text, int>, airline text, delay int)
        CALLED ON NULL INPUT RETURNS map<text, int> LANGUAGE java
        AS 'if (delay != null && delay > 0) state.put(airline, delay + state.getOrDefault(airline, 0)); return state;';
      cqlsh:flight> CREATE AGGREGATE IF NOT EXISTS group_and_sum_late(text, int)
        SFUNC group_airline_and_sum_late STYPE map<text, int> INITCOND {};
      cqlsh:flight> CREATE OR REPLACE FUNCTION group_airline_and_counter_late(state map<text, int>, airline text, delay int)
        CALLED ON NULL INPUT RETURNS map<text, int> LANGUAGE java
        AS 'if (delay != null && delay > 0) state.put(airline, 1 + state.getOrDefault(airline, 0)); return state;';
      cqlsh:flight> CREATE AGGREGATE IF NOT EXISTS group_and_count_late(text, int)
        SFUNC group_airline_and_counter_late STYPE map<text, int> INITCOND {};

      cqlsh:flight> select group_and_sum_late(airline, ARRIVAL_DELAY) from flight;
       flight.group_and_sum_late(airline, arrival_delay)
      ---------------------------------------------------
       {'AA': 13623, 'AS': 2347, 'B6': 8998, 'DL': 13679, 'EV': 38140, 'F9': 8255, 'HA': 434, 'MQ': 9605, 'NK': 10276, 'OO': 28878, 'UA': 17146, 'US': 8407, 'VX': 805, 'WN': 34363}
      (1 rows)
      Warnings : Aggregation query used without partition key

      cqlsh:flight> select group_and_count_late(airline, ARRIVAL_DELAY) from flight;
       flight.group_and_count_late(airline, arrival_delay)
      -----------------------------------------------------
       {'AA': 447, 'AS': 129, 'B6': 225, 'DL': 424, 'EV': 855, 'F9': 232, 'HA': 29, 'MQ': 251, 'NK': 262, 'OO': 897, 'UA': 364, 'US': 192, 'VX': 29, 'WN': 1013}
      (1 rows)
      Warnings : Aggregation query used without partition key

      cqlsh:flight> select airline from airlines where iata_code in ('HA','VX','WN');
       airline
      ------------------------
       Hawaiian Airlines Inc.
       Virgin America
       Southwest Airlines Co.
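The user-defined aggregates in these notes all follow one pattern: start from the INITCOND state (an empty map), then call the state function (SFUNC) once per row and carry the returned map forward. A small Python sketch of that fold, assuming made-up sample rows; the function names mirror the CQL ones above, but this is only an illustration of the logic, not code that runs inside Cassandra.

```python
def group_airline_and_sum_late(state, airline, delay):
    # Mirrors the Java state function body: skip nulls and non-positive delays.
    if delay is not None and delay > 0:
        state[airline] = delay + state.get(airline, 0)
    return state


def group_airline_and_counter_late(state, airline, delay):
    # Same filter, but counts matching rows instead of summing minutes.
    if delay is not None and delay > 0:
        state[airline] = 1 + state.get(airline, 0)
    return state


def aggregate(sfunc, rows, initcond=None):
    """Fold the state function over all rows, starting from the initial state."""
    state = dict(initcond or {})
    for airline, delay in rows:
        state = sfunc(state, airline, delay)
    return state


# Hypothetical (airline, arrival_delay) rows, not taken from the dataset.
sample_rows = [("AA", 10), ("AA", -5), ("DL", None), ("DL", 7), ("AA", 3)]

totals = aggregate(group_airline_and_sum_late, sample_rows)
counts = aggregate(group_airline_and_counter_late, sample_rows)
print(totals, counts)
```

Because the state is a map keyed by airline, a single pass over the table yields a per-airline breakdown, which is how the queries above get a grouped result without a GROUP BY.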