SlideShare a Scribd company logo
1 of 11
Devopam Mittra
Transforming RDBMS data model
to Apache Cassandra
• Data modeling is pivotal for a successful Data Access Layer in Cassandra.
• Unlike a traditional RDBMS table, a ‘Column Family’ aka Cassandra table has to be modeled based on the usage (and
not the storage).
• ‘Searchability’ of information becomes a key driving factor in Cassandra data modeling.
• Schema / structural changes have a huge effort and time impact.
 Storing structured data in Cassandra typically evades attention and detail as modelers prefer to do an as-is structural
copy from the source.
 This approach works well to suit structured query analytics and not beyond, defeating the basic purpose of self-service
analytics.
 Any changes to the structure / model leads to redoing modeling effort including associated testing and validation
 Leveraging ‘Collection’ type to store collated native datatype elements and only retain primary key attributes in main
table structure
(easier said than done , so let’s take a closer look into the approach…)
Here is a sample Orders table (source) Re-ordering non PK columns based on data-type
ORDERNUMBER
QUANTITYORDERED
PRICEEACH
ORDERLINENUMBER
SALES
ORDERDATE
STATUS
QTR_ID
MONTH_ID
YEAR_ID
PRODUCTLINE
MSRP
PRODUCTCODE
CUSTOMERNAME
PHONE
ADDRESSLINE1
ADDRESSLINE2
CITY
STATE
POSTALCODE
COUNTRY
TERRITORY
CONTACTLASTNAME
CONTACTFIRSTNAME
DEALSIZE
BIGINT
INT
DECIMAL
INT
DECIMAL
TIMESTAMP
TEXT
INT
INT
INT
TEXT
INT
TEXT
TEXT
TEXT
TEXT
TEXT
TEXT
TEXT
TEXT
TEXT
TEXT
TEXT
TEXT
TEXT
ORDERNUMBER
ORDERDATE
PRICEEACH
SALES
QUANTITYORDERED
ORDERLINENUMBER
QTR_ID
MONTH_ID
YEAR_ID
MSRP
STATUS
PRODUCTLINE
PRODUCTCODE
CUSTOMERNAME
PHONE
ADDRESSLINE1
ADDRESSLINE2
CITY
STATE
POSTALCODE
COUNTRY
TERRITORY
CONTACTLASTNAME
CONTACTFIRSTNAME
DEALSIZE
BIGINT
TIMESTAMP
DECIMAL
DECIMAL
INT
INT
INT
INT
INT
INT
TEXT
TEXT
TEXT
TEXT
TEXT
TEXT
TEXT
TEXT
TEXT
TEXT
TEXT
TEXT
TEXT
TEXT
TEXT
CREATE TABLE DDL - Traditional approach CREATE TABLE DDL – Proposed Alternate approach
CREATE TABLE IF NOT EXISTS etl_schema.orders (
ordernumber BIGINT,
quantityordered BIGINT,
priceeach DECIMAL,
orderlinenumber INT,
sales DECIMAL,
orderdate TIMESTAMP,
status TEXT,
qtr_id INT,
month_id INT,
year_id INT,
productline TEXT,
msrp INT,
productcode TEXT,
customername TEXT,
phone TEXT,
addressline1 TEXT,
addressline2 TEXT,
city TEXT,
state TEXT,
postalcode TEXT,
country TEXT,
territory TEXT,
contactlastname TEXT,
contactfirstname TEXT,
dealsize TEXT,
PRIMARY KEY (ordernumber,orderdate)
)
WITH CACHING = { 'keys' : 'ALL', 'rows_per_partition' : 200 }
AND CDC = TRUE AND COMMENT = 'Orders Table’ AND CLUSTERING ORDER BY (orderdate DESC)
AND COMPACTION = {'class': 'SizeTieredCompactionStrategy','tombstone_compaction_interval':1800}
AND GC_GRACE_SECONDS = 3600;
CREATE TABLE IF NOT EXISTS etl_schema.orders_new (
ordernumber BIGINT,
orderdate TIMESTAMP,
decimal_map MAP <TEXT, DECIMAL>,
tinyint_map MAP <TEXT, INT>,
text_map MAP <TEXT, TEXT>,
PRIMARY KEY (ordernumber,orderdate)
)
WITH caching = { 'keys' : 'ALL', 'rows_per_partition' : 200 }
AND cdc = TRUE
AND comment = 'Orders Key Value Table'
AND CLUSTERING ORDER BY (ORDERDATE DESC)
AND COMPACTION = {'class': 'SizeTieredCompactionStrategy','tombstone_compaction_interval':1800}
AND gc_grace_seconds = 3600;
CREATE INDEX idx_orders_new_1 ON orders_new (ENTRIES(decimal_map));
CREATE INDEX idx_orders_new_2 ON orders_new (ENTRIES(int_map));
CREATE INDEX idx_orders_new_3 ON orders_new (ENTRIES(text_map));
Note: Don’t use FROZEN maps else structural modification to existing data can’t be done at a later moment
Loading same record into both tables (note the syntax):
INSERT INTO etl_schema.orders
(ordernumber ,quantityordered ,priceeach ,orderlinenumber ,sales ,orderdate ,status ,qtr_id ,month_id ,year_id ,productline ,msrp ,productcode ,customername ,phone
,addressline1 ,addressline2 ,city ,state ,postalcode ,country ,territory ,contactlastname ,contactfirstname ,dealsize)
VALUES
(10107,30,95.7,2,2871,'2003-02-24','Shipped',1,2,2003,'Motorcycles',95,'S10_1678','Land of Toys Inc.','2125557818','897 Long Airport Avenue','
','NYC','NY','10022','USA','NA','Yu','Kwai','Small’);
INSERT INTO etl_schema.orders_new
(ordernumber ,orderdate ,decimal_map ,int_map ,text_map )
VALUES
(10107,'2003-02-24’,
{'priceeach':95.7,'sales':2871},
{'quantityordered':30,'orderlinenumber':2,'qtr_id':1,'month_id':2,'year_id':2003,'msrp':95},
{'status':'Shipped','productline':'Motorcycles','productcode':'S10_1678','customername':'Land of Toys Inc.','phone':'2125557818','addressline1':'897 Long Airport
Avenue','addressline2':'','city':'NYC','state':'NY','postalcode':'10022','country':'USA','territory':'NA','contactlastname':'Yu','contactfirstname':'Kwai','dealsize':'Small'});
Loading the full data file and trying to search information from the modified structure assures that the additional search
capability works as desired !
cqlsh> SELECT COUNT(*) FROM etl_schema.orders_new WHERE text_map CONTAINS 'Michael' ALLOW FILTERING;
count
-------
3
(1 rows)
Warnings :
Aggregation query used without partition key
cqlsh> SELECT COUNT(*) FROM etl_schema.orders_new WHERE text_map CONTAINS KEY 'productline' ALLOW FILTERING;
count
-------
165
(1 rows)
Warnings :
Aggregation query used without partition key
cqlsh> UPDATE etl_schema.orders_new SET text_map = text_map - {'territory'} WHERE ordernumber=10318 AND orderdate='2004-11-
02’; /* removes a ‘column’ from existing row*/
cqlsh> UPDATE etl_schema.orders_new SET text_map = text_map + {'new_territory':'NA'} WHERE ordernumber=10318 AND
orderdate='2004-11-02’; /* adds a ‘column’ in existing row*/
 Benefits of this approach are best for the use case where
‘Search’ criteria is not fixed on a table and evolves over a
period of time (and yet much desired)
 New ‘column’ addition is a seamless process that requires no
changes in the ETL/ELT pipeline
 This approach consumes more space than the conventional
storage by design
 Data ingestion speed slows down as compared to conventional
design
 Tombstones shall continue to be generated for insert/update/deletes as they are for the existing scenarios.
 However, Compaction operations are reduced if Values are suitably updated instead of overwriting the entire map
(syntax). Similarly, new Keys can be added with minimal impact.
 At an average , 2-10 % compaction reduction can be observed with this approach in case your table receives updates
 Searching columns is quite straightforward with this approach for columns
outside the Primary Keys
 Maps can be made searchable by enabling the ENTRIES/KEYS to be indexed.
This further helps to narrow down the search criteria by suitably using
CONTAINS /KEY predicates on otherwise non-searchable columns
 Flexible data model allows column modification ( DML operations) in the table seamlessly, saving the engineering team
time and effort to do so otherwise manually:
 Addition
 Deletion
 Modification of column datatype
 Keys are searchable, making other columns than Primary Key also to be used for lookup, while not adding on the effort
to build and maintain individual indexes
 Model is full compatible with application layer languages (Java, Python etc.) to slice and dice the key-value pairs from
maps using de/serialization techniques available in all libraries for json parsing
Images Sourced from

More Related Content

What's hot

In Sync11 Presentation The Biggest Loser
In Sync11 Presentation The Biggest LoserIn Sync11 Presentation The Biggest Loser
In Sync11 Presentation The Biggest Loserpaulguerin
 
Stored procedure tuning and optimization t sql
Stored procedure tuning and optimization t sqlStored procedure tuning and optimization t sql
Stored procedure tuning and optimization t sqlnishantdavid9
 
Pass 2018 introduction to dax
Pass 2018 introduction to daxPass 2018 introduction to dax
Pass 2018 introduction to daxIke Ellis
 
MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra QUONTRASOLUTIONS
 
Introduction to DAX
Introduction to DAXIntroduction to DAX
Introduction to DAXIke Ellis
 
Intro to DAX Patterns
Intro to DAX PatternsIntro to DAX Patterns
Intro to DAX PatternsEric Bragas
 
Run your queries 14X faster without any investment!
Run your queries 14X faster without any investment!Run your queries 14X faster without any investment!
Run your queries 14X faster without any investment!Knoldus Inc.
 
OER UNIT 4 PARTITION
OER UNIT 4 PARTITIONOER UNIT 4 PARTITION
OER UNIT 4 PARTITIONGirija Muscut
 

What's hot (12)

In Sync11 Presentation The Biggest Loser
In Sync11 Presentation The Biggest LoserIn Sync11 Presentation The Biggest Loser
In Sync11 Presentation The Biggest Loser
 
Stored procedure tuning and optimization t sql
Stored procedure tuning and optimization t sqlStored procedure tuning and optimization t sql
Stored procedure tuning and optimization t sql
 
Cs437 lecture 7-8
Cs437 lecture 7-8Cs437 lecture 7-8
Cs437 lecture 7-8
 
Agreggates i
Agreggates iAgreggates i
Agreggates i
 
Introduction to DAX Language
Introduction to DAX LanguageIntroduction to DAX Language
Introduction to DAX Language
 
Pass 2018 introduction to dax
Pass 2018 introduction to daxPass 2018 introduction to dax
Pass 2018 introduction to dax
 
MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra MSBI and Data WareHouse techniques by Quontra
MSBI and Data WareHouse techniques by Quontra
 
Introduction to DAX
Introduction to DAXIntroduction to DAX
Introduction to DAX
 
Intro to DAX Patterns
Intro to DAX PatternsIntro to DAX Patterns
Intro to DAX Patterns
 
Run your queries 14X faster without any investment!
Run your queries 14X faster without any investment!Run your queries 14X faster without any investment!
Run your queries 14X faster without any investment!
 
OER UNIT 4 PARTITION
OER UNIT 4 PARTITIONOER UNIT 4 PARTITION
OER UNIT 4 PARTITION
 
DAX (Data Analysis eXpressions) from Zero to Hero
DAX (Data Analysis eXpressions) from Zero to HeroDAX (Data Analysis eXpressions) from Zero to Hero
DAX (Data Analysis eXpressions) from Zero to Hero
 

Similar to Cassandra Table Modeling - an alternate approach

The thinking persons guide to data warehouse design
The thinking persons guide to data warehouse designThe thinking persons guide to data warehouse design
The thinking persons guide to data warehouse designCalpont
 
Cassandra Data Modeling
Cassandra Data ModelingCassandra Data Modeling
Cassandra Data ModelingBen Knear
 
How to Fine-Tune Performance Using Amazon Redshift
How to Fine-Tune Performance Using Amazon RedshiftHow to Fine-Tune Performance Using Amazon Redshift
How to Fine-Tune Performance Using Amazon RedshiftAWS Germany
 
Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007paulguerin
 
PostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / ShardingPostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / ShardingAmir Reza Hashemi
 
02 database oprimization - improving sql performance - ent-db
02  database oprimization - improving sql performance - ent-db02  database oprimization - improving sql performance - ent-db
02 database oprimization - improving sql performance - ent-dbuncleRhyme
 
Sql tuning guideline
Sql tuning guidelineSql tuning guideline
Sql tuning guidelineSidney Chen
 
Apache Cassandra Data Modeling with Travis Price
Apache Cassandra Data Modeling with Travis PriceApache Cassandra Data Modeling with Travis Price
Apache Cassandra Data Modeling with Travis PriceDataStax Academy
 
Myth busters - performance tuning 102 2008
Myth busters - performance tuning 102 2008Myth busters - performance tuning 102 2008
Myth busters - performance tuning 102 2008paulguerin
 
SQL Database Performance Tuning for Developers
SQL Database Performance Tuning for DevelopersSQL Database Performance Tuning for Developers
SQL Database Performance Tuning for DevelopersBRIJESH KUMAR
 
sql_server_2016_history_tables
sql_server_2016_history_tablessql_server_2016_history_tables
sql_server_2016_history_tablesarthurjosemberg
 
PostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
PostgreSQL Performance Tables Partitioning vs. Aggregated Data TablesPostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
PostgreSQL Performance Tables Partitioning vs. Aggregated Data TablesSperasoft
 
PPT SQL CLASS.pptx
PPT SQL CLASS.pptxPPT SQL CLASS.pptx
PPT SQL CLASS.pptxAngeOuattara
 
SQL.pptx for the begineers and good know
SQL.pptx for the begineers and good knowSQL.pptx for the begineers and good know
SQL.pptx for the begineers and good knowPavithSingh
 

Similar to Cassandra Table Modeling - an alternate approach (20)

The thinking persons guide to data warehouse design
The thinking persons guide to data warehouse designThe thinking persons guide to data warehouse design
The thinking persons guide to data warehouse design
 
Cassandra Data Modeling
Cassandra Data ModelingCassandra Data Modeling
Cassandra Data Modeling
 
Less08 Schema
Less08 SchemaLess08 Schema
Less08 Schema
 
How to Fine-Tune Performance Using Amazon Redshift
How to Fine-Tune Performance Using Amazon RedshiftHow to Fine-Tune Performance Using Amazon Redshift
How to Fine-Tune Performance Using Amazon Redshift
 
Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007Myth busters - performance tuning 101 2007
Myth busters - performance tuning 101 2007
 
PostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / ShardingPostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / Sharding
 
02 database oprimization - improving sql performance - ent-db
02  database oprimization - improving sql performance - ent-db02  database oprimization - improving sql performance - ent-db
02 database oprimization - improving sql performance - ent-db
 
Sql tuning guideline
Sql tuning guidelineSql tuning guideline
Sql tuning guideline
 
Apache Cassandra Data Modeling with Travis Price
Apache Cassandra Data Modeling with Travis PriceApache Cassandra Data Modeling with Travis Price
Apache Cassandra Data Modeling with Travis Price
 
Myth busters - performance tuning 102 2008
Myth busters - performance tuning 102 2008Myth busters - performance tuning 102 2008
Myth busters - performance tuning 102 2008
 
Chapter16
Chapter16Chapter16
Chapter16
 
SQL Database Performance Tuning for Developers
SQL Database Performance Tuning for DevelopersSQL Database Performance Tuning for Developers
SQL Database Performance Tuning for Developers
 
sql_server_2016_history_tables
sql_server_2016_history_tablessql_server_2016_history_tables
sql_server_2016_history_tables
 
PostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
PostgreSQL Performance Tables Partitioning vs. Aggregated Data TablesPostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
PostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
 
PPT SQL CLASS.pptx
PPT SQL CLASS.pptxPPT SQL CLASS.pptx
PPT SQL CLASS.pptx
 
Rdbms day3
Rdbms day3Rdbms day3
Rdbms day3
 
Database concepts
Database conceptsDatabase concepts
Database concepts
 
SQL
SQLSQL
SQL
 
data-modeling-paper
data-modeling-paperdata-modeling-paper
data-modeling-paper
 
SQL.pptx for the begineers and good know
SQL.pptx for the begineers and good knowSQL.pptx for the begineers and good know
SQL.pptx for the begineers and good know
 

Recently uploaded

Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationBoston Institute of Analytics
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 

Recently uploaded (20)

Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health Classification
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 

Cassandra Table Modeling - an alternate approach

  • 1. Devopam Mittra Transforming RDBMS data model to Apache Cassandra
  • 2. • Data modeling is pivotal for a successful Data Access Layer in Cassandra. • Unlike a traditional RDBMS table, a ‘Column Family’ aka Cassandra table has to be modeled based on the usage (and not the storage). • ‘Searchability’ of information becomes a key driving factor in Cassandra data modeling. • Schema / structural changes have a huge effort and time impact.  Storing structured data in Cassandra typically evades attention and detail as modelers prefer to do an as-is structural copy from the source.  This approach works well to suit structured query analytics and not beyond, defeating the basic purpose of self-service analytics.  Any changes to the structure / model leads to redoing modeling effort including associated testing and validation  Leveraging ‘Collection’ type to store collated native datatype elements and only retain primary key attributes in main table structure (easier said than done , so let’s take a closer look into the approach…)
  • 3. Here is a sample Orders table (source) Re-ordering non PK columns based on data-type ORDERNUMBER QUANTITYORDERED PRICEEACH ORDERLINENUMBER SALES ORDERDATE STATUS QTR_ID MONTH_ID YEAR_ID PRODUCTLINE MSRP PRODUCTCODE CUSTOMERNAME PHONE ADDRESSLINE1 ADDRESSLINE2 CITY STATE POSTALCODE COUNTRY TERRITORY CONTACTLASTNAME CONTACTFIRSTNAME DEALSIZE BIGINT INT DECIMAL INT DECIMAL TIMESTAMP TEXT INT INT INT TEXT INT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT ORDERNUMBER ORDERDATE PRICEEACH SALES QUANTITYORDERED ORDERLINENUMBER QTR_ID MONTH_ID YEAR_ID MSRP STATUS PRODUCTLINE PRODUCTCODE CUSTOMERNAME PHONE ADDRESSLINE1 ADDRESSLINE2 CITY STATE POSTALCODE COUNTRY TERRITORY CONTACTLASTNAME CONTACTFIRSTNAME DEALSIZE BIGINT TIMESTAMP DECIMAL DECIMAL INT INT INT INT INT INT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT TEXT
  • 4. CREATE TABLE DDL - Traditional approach CREATE TABLE DDL – Proposed Alternate approach CREATE TABLE IF NOT EXISTS etl_schema.orders ( ordernumber BIGINT, quantityordered BIGINT, priceeach DECIMAL, orderlinenumber INT, sales DECIMAL, orderdate TIMESTAMP, status TEXT, qtr_id INT, month_id INT, year_id INT, productline TEXT, msrp INT, productcode TEXT, customername TEXT, phone TEXT, addressline1 TEXT, addressline2 TEXT, city TEXT, state TEXT, postalcode TEXT, country TEXT, territory TEXT, contactlastname TEXT, contactfirstname TEXT, dealsize TEXT, PRIMARY KEY (ordernumber,orderdate) ) WITH CACHING = { 'keys' : 'ALL', 'rows_per_partition' : 200 } AND CDC = TRUE AND COMMENT = 'Orders Table’ AND CLUSTERING ORDER BY (orderdate DESC) AND COMPACTION = {'class': 'SizeTieredCompactionStrategy','tombstone_compaction_interval':1800} AND GC_GRACE_SECONDS = 3600; CREATE TABLE IF NOT EXISTS etl_schema.orders_new ( ordernumber BIGINT, orderdate TIMESTAMP, decimal_map MAP <TEXT, DECIMAL>, tinyint_map MAP <TEXT, INT>, text_map MAP <TEXT, TEXT>, PRIMARY KEY (ordernumber,orderdate) ) WITH caching = { 'keys' : 'ALL', 'rows_per_partition' : 200 } AND cdc = TRUE AND comment = 'Orders Key Value Table' AND CLUSTERING ORDER BY (ORDERDATE DESC) AND COMPACTION = {'class': 'SizeTieredCompactionStrategy','tombstone_compaction_interval':1800} AND gc_grace_seconds = 3600; CREATE INDEX idx_orders_new_1 ON orders_new (ENTRIES(decimal_map)); CREATE INDEX idx_orders_new_2 ON orders_new (ENTRIES(int_map)); CREATE INDEX idx_orders_new_3 ON orders_new (ENTRIES(text_map)); Note: Don’t use FROZEN maps else structural modification to existing data can’t be done at a later moment
  • 5. Loading same record into both tables (note the syntax): INSERT INTO etl_schema.orders (ordernumber ,quantityordered ,priceeach ,orderlinenumber ,sales ,orderdate ,status ,qtr_id ,month_id ,year_id ,productline ,msrp ,productcode ,customername ,phone ,addressline1 ,addressline2 ,city ,state ,postalcode ,country ,territory ,contactlastname ,contactfirstname ,dealsize) VALUES (10107,30,95.7,2,2871,'2003-02-24','Shipped',1,2,2003,'Motorcycles',95,'S10_1678','Land of Toys Inc.','2125557818','897 Long Airport Avenue',' ','NYC','NY','10022','USA','NA','Yu','Kwai','Small’); INSERT INTO etl_schema.orders_new (ordernumber ,orderdate ,decimal_map ,int_map ,text_map ) VALUES (10107,'2003-02-24’, {'priceeach':95.7,'sales':2871}, {'quantityordered':30,'orderlinenumber':2,'qtr_id':1,'month_id':2,'year_id':2003,'msrp':95}, {'status':'Shipped','productline':'Motorcycles','productcode':'S10_1678','customername':'Land of Toys Inc.','phone':'2125557818','addressline1':'897 Long Airport Avenue','addressline2':'','city':'NYC','state':'NY','postalcode':'10022','country':'USA','territory':'NA','contactlastname':'Yu','contactfirstname':'Kwai','dealsize':'Small'});
  • 6. Loading the full data file and trying to search information from the modified structure assures that the additional search capability works as desired ! cqlsh> SELECT COUNT(*) FROM etl_schema.orders_new WHERE text_map CONTAINS 'Michael' ALLOW FILTERING; count ------- 3 (1 rows) Warnings : Aggregation query used without partition key cqlsh> SELECT COUNT(*) FROM etl_schema.orders_new WHERE text_map CONTAINS KEY 'productline' ALLOW FILTERING; count ------- 165 (1 rows) Warnings : Aggregation query used without partition key cqlsh> UPDATE etl_schema.orders_new SET text_map = text_map - {'territory'} WHERE ordernumber=10318 AND orderdate='2004-11- 02’; /* removes a ‘column’ from existing row*/ cqlsh> UPDATE etl_schema.orders_new SET text_map = text_map + {'new_territory':'NA'} WHERE ordernumber=10318 AND orderdate='2004-11-02’; /* adds a ‘column’ in existing row*/
  • 7.  Benefits of this approach are best for the use case where ‘Search’ criteria is not fixed on a table and evolves over a period of time (and yet much desired)  New ‘column’ addition is a seamless process that requires no changes in the ETL/ELT pipeline  This approach consumes more space than the conventional storage by design  Data ingestion speed slows down as compared to conventional design
  • 8.  Tombstones shall continue to be generated for insert/update/deletes as they are for the existing scenarios.  However, Compaction operations are reduced if Values are suitably updated instead of overwriting the entire map (syntax). Similarly, new Keys can be added with minimal impact.  At an average , 2-10 % compaction reduction can be observed with this approach in case your table receives updates
  • 9.  Searching columns is quite straightforward with this approach for columns outside the Primary Keys  Maps can be made searchable by enabling the ENTRIES/KEYS to be indexed. This further helps to narrow down the search criteria by suitably using CONTAINS /KEY predicates on otherwise non-searchable columns
  • 10.  Flexible data model allows column modification ( DML operations) in the table seamlessly, saving the engineering team time and effort to do so otherwise manually:  Addition  Deletion  Modification of column datatype  Keys are searchable, making other columns than Primary Key also to be used for lookup, while not adding on the effort to build and maintain individual indexes  Model is full compatible with application layer languages (Java, Python etc.) to slice and dice the key-value pairs from maps using de/serialization techniques available in all libraries for json parsing