This document summarizes a presentation about modeling data with Cassandra Query Language (CQL) using examples from a Twitter-like application called Twissandra. It introduces CQL as an alternative to Thrift for querying Cassandra and describes how to model users, followers, tweets, timelines and other social media data structures in Cassandra tables. The presentation emphasizes denormalizing data and using materialized views to optimize queries, and concludes by noting that applications can be built in various languages thanks to Cassandra drivers.
Percona Live 4/15/15: Transparent sharding database virtualization engine (DVE)Tesora
Amrith Kumar of Tesora and Peter Boros of Percona present an in-depth exploration of transparent database scale out use the Tesora DVE framework for MySQL.
Cassandra Community Webinar | Introduction to Apache Cassandra 1.2DataStax
Title: Introduction to Apache Cassandra 1.2
Details: Join Aaron Morton, DataStax MVP for Apache Cassandra and learn the basics of the massively scalable NoSQL database. This webinar is will examine C*’s architecture and its strengths for powering mission-critical applications. Aaron will introduce you to core concepts such as Cassandra’s data model, multi-datacenter replication, and tunable consistency. He’ll also cover new features in Cassandra version 1.2 including virtual nodes, CQL 3 language and query tracing.
Speaker: Aaron Morton, Apache Cassandra Committer
Aaron Morton is a Freelance Developer based in New Zealand, and a Committer on the Apache Cassandra project. In 2010, he gave up the RDBMS world for the scale and reliability of Cassandra. He now spends his time advancing the Cassandra project and helping others get the best out of it.
2017 02-07 - elastic & spark. building a search geo locatorAlberto Paro
Presentazione dell'evento EsInRome del 7 Febbraio 2017 - Integrazione Elasticsearch in architettura BigData e facilità di integrazione con Apache Spark.
DDoS Resiliency with Amazon Web Services (SEC305) | AWS re:Invent 2013Amazon Web Services
It's a rough world out there, filled with mega bot nets that threaten the availability of your web service. How do you keep your service running in the event of a 10,000x increase in traffic? Maximizing service availability under DDoS conditions requires thoughtful service architecture, and at times, fast acting operations teams. This presentation covers best practices for DDoS-resilient services.
Cassandra Day SV 2014: Fundamentals of Apache Cassandra Data ModelingDataStax Academy
You know you need Cassandra for it's uptime and scaling, but what about that data model? Let's bridge that gap and get you building your game changing app. We'll break down topics like storing objects and indexing for fast retrieval. You will see by understanding a few things about Cassandra internals, you can put your data model in the spotlight. The goal of this talk is to get you comfortable working with data in Cassandra throughout the application lifecycle. What are you waiting for? The cameras are waiting!
Percona Live 4/15/15: Transparent sharding database virtualization engine (DVE)Tesora
Amrith Kumar of Tesora and Peter Boros of Percona present an in-depth exploration of transparent database scale out use the Tesora DVE framework for MySQL.
Cassandra Community Webinar | Introduction to Apache Cassandra 1.2DataStax
Title: Introduction to Apache Cassandra 1.2
Details: Join Aaron Morton, DataStax MVP for Apache Cassandra and learn the basics of the massively scalable NoSQL database. This webinar is will examine C*’s architecture and its strengths for powering mission-critical applications. Aaron will introduce you to core concepts such as Cassandra’s data model, multi-datacenter replication, and tunable consistency. He’ll also cover new features in Cassandra version 1.2 including virtual nodes, CQL 3 language and query tracing.
Speaker: Aaron Morton, Apache Cassandra Committer
Aaron Morton is a Freelance Developer based in New Zealand, and a Committer on the Apache Cassandra project. In 2010, he gave up the RDBMS world for the scale and reliability of Cassandra. He now spends his time advancing the Cassandra project and helping others get the best out of it.
2017 02-07 - elastic & spark. building a search geo locatorAlberto Paro
Presentazione dell'evento EsInRome del 7 Febbraio 2017 - Integrazione Elasticsearch in architettura BigData e facilità di integrazione con Apache Spark.
DDoS Resiliency with Amazon Web Services (SEC305) | AWS re:Invent 2013Amazon Web Services
It's a rough world out there, filled with mega bot nets that threaten the availability of your web service. How do you keep your service running in the event of a 10,000x increase in traffic? Maximizing service availability under DDoS conditions requires thoughtful service architecture, and at times, fast acting operations teams. This presentation covers best practices for DDoS-resilient services.
Cassandra Day SV 2014: Fundamentals of Apache Cassandra Data ModelingDataStax Academy
You know you need Cassandra for it's uptime and scaling, but what about that data model? Let's bridge that gap and get you building your game changing app. We'll break down topics like storing objects and indexing for fast retrieval. You will see by understanding a few things about Cassandra internals, you can put your data model in the spotlight. The goal of this talk is to get you comfortable working with data in Cassandra throughout the application lifecycle. What are you waiting for? The cameras are waiting!
Wikimedia Content API: A Cassandra Use-caseEric Evans
Among the resources offered by Wikimedia is an API providing low-latency access to full-history content, in many formats. Its results are often the product of computationally intensive transforms, and must be pre-generated and stored to meet latency expectations. Unsurprisingly, there are many challenges to providing low-latency access to such a large data-set, in a demanding, globally distributed environment.
This presentation covers the Wikimedia content API and its use of Apache Cassandra as storage for a diverse and growing set of use-cases. Trials, tribulations, and triumphs, of both a development and operational nature will be discussed.
The Wikimedia Foundation is a non-profit and charitable organization driven by a vision of a world where every human can freely share in the sum of all knowledge. Each month Wikimedia sites serve over 18 billion page views to 500 million unique visitors around the world.
Among the many resources offered by Wikimedia is a public-facing API that provides low-latency, programmatic access to full-history content and meta-data, in a variety of formats. Commonly, results from this system are the product of computationally intensive transformations, and must be pre-generated and persisted to meet latency expectations. Unsurprisingly, there are numerous challenges to providing low-latency storage of such a massive data-set, in a demanding, globally distributed environment.
This talk covers Wikimedia Content API, and it's use of Apache Cassandra, a massively-scalable distributed database, as storage for a diverse and growing set of use-cases. Trials, tribulations, and triumphs, of both a development and operational nature are discussed.
Castle is an open-source project that provides an alternative to the lower layers of the storage stack -- RAID and POSIX filesystems -- for big data workloads, and distributed data stores such as Apache Cassandra.
This presentation from Berlin Buzzwords 2012 provides a high-level overview of Castle and how it is used with Cassandra to improve performance and predictability.
Build Your Own SaaS using Docker. A proof of concept with a simple Memcached SaaS.
See the Memcached as a service application in action at http://www.memcachedasaservice.com
Find the source code on GitHub: https://github.com/jbarbier/SaaS_Memcached
Whether it's statistics, weather forecasting, astronomy, finance, or network management, time series data plays a critical role in analytics and forecasting. Unfortunately, while many tools exist for time series storage and analysis, few are able to scale past memory limits, or provide rich query and analytics capabilities outside what is necessary to produce simple plots; For those challenged by large volumes of data, there is much room for improvement.
Apache Cassandra is a fully distributed second-generation database. Cassandra stores data in key-sorted order making it ideal for time series, and its high throughput and linear scalability make it well suited to very large data sets.
This talk will cover some of the requirements and challenges of large scale time series storage and analysis. Cassandra data and query modeling for this use-case will be discussed, and Newts, an open source Cassandra-based time series store under development at The OpenNMS Group will be introduced.
Wikimedia Content API: A Cassandra Use-caseEric Evans
Among the resources offered by Wikimedia is an API providing low-latency access to full-history content, in many formats. Its results are often the product of computationally intensive transforms, and must be pre-generated and stored to meet latency expectations. Unsurprisingly, there are many challenges to providing low-latency access to such a large data-set, in a demanding, globally distributed environment.
This presentation covers the Wikimedia content API and its use of Apache Cassandra as storage for a diverse and growing set of use-cases. Trials, tribulations, and triumphs, of both a development and operational nature will be discussed.
The Wikimedia Foundation is a non-profit and charitable organization driven by a vision of a world where every human can freely share in the sum of all knowledge. Each month Wikimedia sites serve over 18 billion page views to 500 million unique visitors around the world.
Among the many resources offered by Wikimedia is a public-facing API that provides low-latency, programmatic access to full-history content and meta-data, in a variety of formats. Commonly, results from this system are the product of computationally intensive transformations, and must be pre-generated and persisted to meet latency expectations. Unsurprisingly, there are numerous challenges to providing low-latency storage of such a massive data-set, in a demanding, globally distributed environment.
This talk covers Wikimedia Content API, and it's use of Apache Cassandra, a massively-scalable distributed database, as storage for a diverse and growing set of use-cases. Trials, tribulations, and triumphs, of both a development and operational nature are discussed.
Castle is an open-source project that provides an alternative to the lower layers of the storage stack -- RAID and POSIX filesystems -- for big data workloads, and distributed data stores such as Apache Cassandra.
This presentation from Berlin Buzzwords 2012 provides a high-level overview of Castle and how it is used with Cassandra to improve performance and predictability.
Build Your Own SaaS using Docker. A proof of concept with a simple Memcached SaaS.
See the Memcached as a service application in action at http://www.memcachedasaservice.com
Find the source code on GitHub: https://github.com/jbarbier/SaaS_Memcached
Whether it's statistics, weather forecasting, astronomy, finance, or network management, time series data plays a critical role in analytics and forecasting. Unfortunately, while many tools exist for time series storage and analysis, few are able to scale past memory limits, or provide rich query and analytics capabilities outside what is necessary to produce simple plots; For those challenged by large volumes of data, there is much room for improvement.
Apache Cassandra is a fully distributed second-generation database. Cassandra stores data in key-sorted order making it ideal for time series, and its high throughput and linear scalability make it well suited to very large data sets.
This talk will cover some of the requirements and challenges of large scale time series storage and analysis. Cassandra data and query modeling for this use-case will be discussed, and Newts, an open source Cassandra-based time series store under development at The OpenNMS Group will be introduced.
C*ollege Credit: Data Modeling for Apache CassandraDataStax
Cassandra stores data differently than traditional RDBMS’s. It is these differences that allow for improvements in performance, availability and scalability. Aaron Morton, DataStax MVP for Apache Cassandra will present the basics of the data model and outline the differences clearly. This webinar is 101 level and is suitable for people who are coming from a relational background and just starting to get into Apache Cassandra.
DevOpsDaysRiga 2018: Michiel Rook - Database schema migrations with zero down...DevOpsDays Riga
Does your application or service use a database? When that application changes because of new business requirements, you may need to make changes to the database schema. These database migrations could lead to downtime and can be an obstacle to implementing continuous delivery/deployment.
How can we deal with database migrations when we don’t want our end-users to experience downtime, and want to keep releasing?
In this talk we’ll discuss non-destructive changes, rollbacks, large data sets, useful tools and a few strategies to migrate our data safely, with minimum disruption to production.
Полнотекстовый поиск в PostgreSQL / Александр Алексеев (Postgres Professional)Ontico
РИТ++ 2017, Backend Conf
Зал Сан-Паулу, 6 июня, 11:00
Тезисы:
http://backendconf.ru/2017/abstracts/2780.html
Не всем известно, что в PostgreSQL есть полнотекстовый поиск. Притом, в отличие от некоторых других РСУБД, в PostgreSQL этот поиск совершенно полноценный и способный посоревноваться в скорости и качестве со специализированными решениями. Что не менее интересно, используя полнотекстовый поиск PostgreSQL, вы можете избавиться от дублирования данных, экономя тем самым место на диске, трафик и обеспечивая согласованность данных.
Из этого доклада вы узнаете, как использовать для полнотекстового поиска PostgreSQL GIN, GiST, а также новый RUM-индекс, в чем заключаются преимущества и недостатки названных индексов, как с их помощью сделать поиск по документами или, например, саджестилки, и не только.
Importing Data into Neo4j quickly and easily - StackOverflowNeo4j
In this GraphConnect presentation Mark and Michael show several ways to import large amounts of highly connected data from different formats into Neo4j. Both Cypher's LOAD CSV as well as the bulk importer is demonstrated along with many tips.
We use the well know StackOverflow Q&A site data which is interestingly very graphy.
A presentation about MySQL for beginners. It includes the following topics:
- Introduction
- Installation
- Executing SQL statements
- SQL Language Syntax
- The most important SQL commands
- MySQL Data Types
- Operators
- Basic Syntax
- SQL Joins
- Some Exercise
Presented at Cassandra London (April 7, 2014); The challenges of time-series storage and analytics in OpenNMS, with an introduction to Newts, a new Cassandra-based time-series data store.
Apache Cassandra is a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model.
This presentation, given at FOSDEM in 2010, provides a brief summary of cassandra's history, a high-level overview of the architecture and data model, and showcases some real life use-cases.
1. Cassandra By Example:
Data Modelling with CQL3
Berlin Buzzwords
June 4, 2013
Eric Evans
eevans@opennms.com
@jericevans
2. CQL is...
● Query language for Apache Cassandra
● Almost SQL (almost)
● Alternative query interface First class citizen
● More performant!
● Available since Cassandra 0.8.0 (almost 2
years!)
25. following
-- Users a user is following
CREATE TABLE following (
username text,
followed text,
PRIMARY KEY(username, followed)
);
26. following
-- Meg follows Stewie
INSERT INTO following (username, followed)
VALUES ('meg', 'stewie')
-- Get a list of who Meg follows
SELECT followed FROM following
WHERE username = 'meg'
29. followers
-- The users who follow username
CREATE TABLE followers (
username text,
following text,
PRIMARY KEY(username, following)
);
30. followers
-- Meg follows Stewie
INSERT INTO followers (username, followed)
VALUES ('stewie', 'meg')
-- Get a list of who follows Stewie
SELECT followers FROM following
WHERE username = 'stewie'
31. redux: following / followers
-- @meg follows @stewie
BEGIN BATCH
INSERT INTO following (username, followed)
VALUES ('meg', 'stewie')
INSERT INTO followers (username, followed)
VALUES ('stewie', 'meg')
APPLY BATCH
34. tweets
-- Tweet storage (think: permalink)
CREATE TABLE tweets (
tweetid uuid PRIMARY KEY,
username text,
body text
);
35. tweets
-- Store a tweet
INSERT INTO tweets (
tweetid,
username,
body
) VALUES (
60780342-90fe-11e2-8823-0026c650d722,
'stewie',
'victory is mine!'
)
36. Query tweets by ... ?
● author, time descending
● followed authors, time descending
● date starting / date ending
38. userline
-- Materialized view of the tweets
-- created by user.
CREATE TABLE userline (
username text,
tweetid timeuuid,
body text,
PRIMARY KEY(username, tweetid)
);
39. Wait, WTF is a timeuuid?
● Aka "Type 1 UUID" (http://goo.gl/SWuCb)
● 100 nano second units since Oct. 15, 1582
● Timestamp is first 60 bits (sorts temporally!)
● Used like timestamp, but:
○ more granular
○ globally unique
40. userline
-- Range of tweets for a user
SELECT
dateOf(tweetid), body
FROM
userline
WHERE
username = 'stewie' AND
tweetid > minTimeuuid('2013-03-01 12:10:09')
ORDER BY
tweetid DESC
LIMIT 40
43. timeline
-- Materialized view of tweets from
-- the users username follows.
CREATE TABLE timeline (
username text,
tweetid timeuuid,
posted_by text,
body text,
PRIMARY KEY(username, tweetid)
);
44. timeline
-- Range of tweets for a user
SELECT
dateOf(tweetid), posted_by, body
FROM
timeline
WHERE
username = 'stewie' AND
tweetid > '2013-03-01 12:10:09'
ORDER BY
tweetid DESC
LIMIT 40
45. most recent tweets for @meg
dateOf(posted_at) | posted_by | body
--------------------------+-----------+-------------------
2013-03-19 14:43:15-0500 | stewie | victory is mine!
2013-03-19 13:23:25-0500 | meg | evolve intuit...
2013-03-19 13:23:25-0500 | meg | whiteboard bric...
2013-03-19 13:23:25-0500 | stewie | brand clic...
2013-03-19 13:23:25-0500 | brian | synergize gran...
2013-03-19 13:23:24-0500 | brian | expedite real-t...
2013-03-19 13:23:24-0500 | stewie | generate kil...
2013-03-19 13:23:24-0500 | stewie | grow B2B ...
2013-03-19 13:23:24-0500 | meg | generate intera...
...
46. redux: tweets
-- @stewie tweets
BEGIN BATCH
INSERT INTO tweets ...
INSERT INTO userline ...
INSERT INTO timeline ...
INSERT INTO timeline ...
INSERT INTO timeline ...
...
APPLY BATCH
47. In Conclusion:
● Think in terms of your queries, store that
● Don't fear duplication; Space is cheap to scale
● Go wide; Rows can have 2 billion columns!
● The only thing better than NoSQL, is MoSQL
● Python hater? Java ❤'r?
○ https://github.com/eevans/twissandra-j
● http://tinyurl.com/d0ntklik