Uploaded by DataArt

Big data storages

Big data storage systems hold large volumes of immutable data from sources such as sensors, social networks, and log files. They run as clusters and scale horizontally (more nodes) as well as vertically (bigger nodes) to meet requirements for size, speed, and availability. Common approaches include NoSQL key-value, document, and column-oriented databases such as Redis, MongoDB, and Cassandra. These give up ACID transactions in exchange for performance and scalability; they are also less standardized than relational databases and offer no built-in BI or analytics in queries.

Big Data Storages

Agenda
[Big] Data sources: when does data become big?
What is a cluster? Horizontal and vertical scaling
[Big] Data storage challenges
Disadvantages
NoSQL = Not only SQL
Most popular and trendy

Big Data Storage Concepts
Stores only facts (events); does not analyze them
Immutable
Time series data (based on timestamps and, possibly, origin)
Store everything, delete nothing
Where: messages (email, Twitter), social networks, sensor data (IoT), log files, locations

Cluster. Horizontal and vertical scaling
What is a cluster?
Load balancer
Communication: master/slave architecture
Fault tolerance and replication factor

Big Data Storage Challenges
Size (keeping and searching huge amounts of data)
Speed (data acquisition, data search)
Availability (fault tolerance, partition tolerance)

Disadvantages of Big Data Storages
No transactions (ACID)
Less mature
Big variety of concepts, lack of standardization
No BI or analytics in queries
Administration

Distributed File storage
Amazon

Storages: Key-Value
Examples: Redis, DynamoDB, MemcacheDB, Riak KV, Aerospike, OrientDB

Storages: Document oriented
Examples: Apache CouchDB, Couchbase, MongoDB

Storages: Graphs
Examples: Allegro, Neo4J, OrientDB, Titan

Storages: Column based
Examples: Cassandra, HBase, Accumulo, Vertica

Why Cassandra?

Apache Cassandra: basics
Masterless architecture with read/write anywhere design
All nodes are the same
No single point of failure
Zone support
Linear scalability
CQL: Cassandra Query Language
Availability and partition tolerance, at the cost of eventual consistency
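
The trade-off in the last bullet is tunable per request in Cassandra. A minimal sketch of the usual rule of thumb, assuming a replication factor `rf` and that a write waits for `w` replicas while a read consults `r` replicas: a read is guaranteed to overlap the latest acknowledged write when `r + w > rf`. The function names here are made up for illustration.

```python
def quorum(rf: int) -> int:
    """Size of a majority of replicas for replication factor rf."""
    return rf // 2 + 1


def read_overlaps_latest_write(rf: int, w: int, r: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return r + w > rf


rf = 3
q = quorum(rf)

# Quorum writes plus quorum reads always overlap on at least one replica.
strong = read_overlaps_latest_write(rf, w=q, r=q)

# Writing to one replica and reading from one replica is fast,
# but the read may hit a replica the write has not reached yet.
eventual = read_overlaps_latest_write(rf, w=1, r=1)

print(q, strong, eventual)
```

With smaller `w` and `r` the cluster stays available under more failures, and replicas converge later; that is the eventual consistency the slide refers to.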

Partitioning and Replication

Data modeling

Demo

Editor's Notes

#4
Materialized views, functions, procedures, and triggers in an RDBMS, and why big data storages moved away from them (example: Oracle and a financial report).
UPDATE is dropped in favor of INSERT with a newer timestamp.
Because of the previous point, such data is usually called time series.
Since analytics happens outside the database (batch jobs), it is best to delete nothing: if the jobs contain errors or problems, we can always rerun them and get new results.
Cover the main sources of time series data.
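
The idea of replacing UPDATE with INSERT plus a newer timestamp can be sketched in a few lines of Python. This is only an illustration of the concept, not how any particular storage implements it; the `Event` and `EventLog` names and the sensor readings are assumptions invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One immutable fact: which origin reported which value, and when."""
    origin: str       # e.g. a sensor id (hypothetical naming)
    timestamp: int    # e.g. seconds since the epoch
    value: float


class EventLog:
    """Append-only store: events are inserted, never updated or deleted."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def insert(self, origin: str, timestamp: int, value: float) -> None:
        self._events.append(Event(origin, timestamp, value))

    def latest(self, origin: str) -> Optional[Event]:
        """The 'current' value is the event with the newest timestamp."""
        candidates = [e for e in self._events if e.origin == origin]
        return max(candidates, key=lambda e: e.timestamp, default=None)

    def history(self, origin: str) -> list[Event]:
        """Full time series for one origin, oldest first."""
        return sorted((e for e in self._events if e.origin == origin),
                      key=lambda e: e.timestamp)


log = EventLog()
log.insert("sensor-1", 100, 20.5)
log.insert("sensor-1", 160, 21.0)   # a "change" is just a newer insert
log.insert("sensor-2", 120, 18.2)

print(log.latest("sensor-1"))
print(len(log.history("sensor-1")))
```

Because nothing is overwritten, a batch job that turned out to be buggy can be rerun over the full history.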
#5
Definition of a cluster.
Communication protocols -> master/slave architecture.
Single point of failure.
Data distribution across the cluster, fault tolerance, and replication.
#6
Reminder about the CAP theorem.
People asked about this after the lecture, so explain once more that it is not dogma but rather an important principle to keep in mind.
Consistency, for example, can be interpreted in different ways.
#7
Go over the traditional notion of a transaction and spell out ACID.
Walk through the properties: atomicity, consistency, isolation, durability (example: transferring money to an account).
Big data storages appeared relatively recently compared with RDBMSs.
There is a large number of concepts and implementations for different tasks.
An RDBMS has database normal forms; here there are none. For analytics you need other components (which means learning them, plus budget for launching and administering them).
Administering a cluster is in itself more complex.
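
The money transfer example can be made concrete with Python's built-in `sqlite3` module, used here only as a convenient stand-in for a relational database with ACID transactions. The `accounts` table, the names, and the amounts are assumptions for illustration. The credit is applied before the debit on purpose, so that a failed debit shows the whole transaction being rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # debit fails, credit is undone
print(ok, failed, balances(conn))
```

Most of the storages in this deck do not offer this kind of multi-statement guarantee, which is the first disadvantage listed on the slide.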
#8
S3 is a web service; HDFS is software.
S3 consistency: historically read-after-write for new objects and eventual consistency for overwrites and deletes; since December 2020 S3 provides strong read-after-write consistency.
S3 communication: REST and SOAP.
S3 replication: you don't control it, but you can enable cross-region replication.
HDFS: master/slave architecture (NameNodes, DataNodes).
HDFS: files are split into parts called blocks.
HDFS: automatic recovery.
Adding nodes to a cluster is easy, but removing them is a challenge.
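
How a file becomes blocks spread over DataNodes can be sketched as follows. This is a toy model under stated assumptions: the block size is tiny, placement is simple round-robin (real HDFS placement is rack-aware), and the function and node names are invented for the example. For reference, HDFS commonly defaults to 128 MB blocks and a replication factor of 3.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file's bytes into fixed-size blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, datanodes: list[str],
                   replication: int) -> dict[int, list[str]]:
    """Toy round-robin placement: block b goes to `replication` consecutive nodes."""
    if replication > len(datanodes):
        raise ValueError("not enough datanodes for the requested replication")
    n = len(datanodes)
    return {b: [datanodes[(b + r) % n] for r in range(replication)]
            for b in range(num_blocks)}


def surviving_replicas(placement: dict[int, list[str]],
                       failed_node: str) -> dict[int, list[str]]:
    """Replica locations left for each block after one datanode fails."""
    return {b: [node for node in nodes if node != failed_node]
            for b, nodes in placement.items()}


data = b"0123456789"                       # stand-in for file contents
blocks = split_into_blocks(data, block_size=4)
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"], replication=3)
after_failure = surviving_replicas(placement, "dn3")

print([len(b) for b in blocks])
print(placement)
print(after_failure)   # every block still has copies on other nodes
```

Automatic recovery then amounts to copying each under-replicated block from a surviving replica to another healthy node until the target replication is restored.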
#9
Explain here why SQL queries cannot be executed on NoSQL databases (spell out the term; go through UPDATE, DELETE, COMMIT, and ROLLBACK as examples).
#10
Talk about caching here, using Redis as the example:
Open source
In-memory (Redis holds its database entirely in memory, using the disk only for persistence)
Scalable
All Redis operations are atomic
Rich set of data types
#11
Example: MongoDB
JSON-based documents (sets of key-value pairs)
Dynamic schema
Supports indexing and aggregation queries
#16
There is no point in storing all the data on every node.
How to distribute it across the cluster: the hash ring.
Keeping data safe: replication.
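
A hash ring with replication can be sketched like this. It is a simplified model, assuming one token per node (no virtual nodes) and MD5 as a stand-in hash; Cassandra's own default partitioner uses Murmur3. The `HashRing` class and node names are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a position on the ring (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: list[str], replication_factor: int = 3) -> None:
        self.rf = replication_factor
        # Each node owns one position on the ring, sorted by token.
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key: str) -> list[str]:
        """First node clockwise from the key's token, plus the next rf - 1 nodes."""
        start = bisect.bisect_right(self.tokens, token(key)) % len(self.ring)
        count = min(self.rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = HashRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("sensor-1")
print(owners)   # three distinct nodes; which ones depends on the hash values
```

Each key lands on only `replication_factor` nodes rather than on all of them, and adding a node moves only the keys that fall into its slice of the ring.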
#17
Replication is asynchronous.
Nodes communicate with each other via the gossip protocol.
Every node can handle requests; the node that receives a request acts as its coordinator.
Hinted handoff: if a node drops out, the data it should have received is stored for a while and waits until the node comes back.
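
Hinted handoff can be illustrated with a toy in-process model. Everything here is an assumption for the sake of the sketch: the `Node` class, the `up` flag in place of real failure detection, and the absence of any time limit on stored hints.

```python
class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.data: dict[str, str] = {}
        # Writes this node is holding on behalf of replicas that were down.
        self.hints: list[tuple["Node", str, str]] = []


def coordinate_write(coordinator: Node, replicas: list[Node],
                     key: str, value: str) -> None:
    """The node that received the request forwards the write to every replica."""
    for replica in replicas:
        if replica.up:
            replica.data[key] = value
        else:
            # Replica is unreachable: keep a hint instead of failing the write.
            coordinator.hints.append((replica, key, value))


def replay_hints(coordinator: Node) -> None:
    """Deliver stored hints to replicas that are back; keep the rest."""
    remaining = []
    for target, key, value in coordinator.hints:
        if target.up:
            target.data[key] = value
        else:
            remaining.append((target, key, value))
    coordinator.hints = remaining


n1, n2, n3 = Node("n1"), Node("n2"), Node("n3")
n3.up = False
coordinate_write(n1, [n1, n2, n3], "sensor-1", "21.0")
missed = "sensor-1" not in n3.data      # n3 did not get the write yet

n3.up = True
replay_hints(n1)
print(missed, n3.data, n1.hints)
```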
#18
Partition key
Clustering column
Ordering
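
The roles of these three terms can be modeled in plain Python: the partition key decides which group of rows a row belongs to (and so which nodes hold it), and the clustering column decides the sort order inside that group. The `Table` class and the sensor data are assumptions for illustration. In CQL the corresponding declaration would look roughly like `PRIMARY KEY ((sensor_id), ts)` with `WITH CLUSTERING ORDER BY (ts DESC)`.

```python
import bisect
from collections import defaultdict


class Table:
    """Rows grouped by partition key and kept sorted by clustering column."""

    def __init__(self) -> None:
        # partition key -> list of (clustering value, payload), always sorted
        self.partitions = defaultdict(list)

    def insert(self, partition_key, clustering, value) -> None:
        bisect.insort(self.partitions[partition_key], (clustering, value))

    def select(self, partition_key, start=None, end=None, descending=False):
        """Read one partition, optionally restricted to a clustering range."""
        rows = [(c, v) for c, v in self.partitions.get(partition_key, [])
                if (start is None or c >= start) and (end is None or c <= end)]
        return rows[::-1] if descending else rows


readings = Table()
readings.insert("sensor-1", 160, 21.0)   # inserted out of order on purpose
readings.insert("sensor-1", 100, 20.5)
readings.insert("sensor-2", 120, 18.2)

print(readings.select("sensor-1"))
print(readings.select("sensor-1", start=150))
print(readings.select("sensor-1", descending=True))
```

Queries that name the partition key and a clustering range touch a single sorted partition, which is why data modeling here starts from the queries rather than from normal forms.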