The document provides an overview of integrating the Cassandra database including:
- Cassandra is a key-value store that evolved to support tables but lacks SQL features like joins and aggregation.
- It offers predictable performance as data grows and no single point of failure through replication across nodes.
- To write and read from Cassandra, clients connect to nodes and operations are distributed based on partitioning keys, with tunable consistency levels.
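The routing and consistency ideas in the list above can be illustrated with a toy model. This is a minimal sketch, not Cassandra's implementation: it assumes an MD5-based token (Cassandra's default partitioner is Murmur3-based), a hypothetical four-node ring with one token per node, and the common rule of thumb that a read is guaranteed to see the latest acknowledged write when read replicas plus write replicas exceed the replication factor.

```python
import hashlib
from bisect import bisect_right

# Hypothetical node names; a real cluster usually assigns many tokens per node.
NODES = ["node-a", "node-b", "node-c", "node-d"]
RING_SIZE = 2**32


def token(value: str) -> int:
    """Map a string to a position on the ring (toy hash, not Murmur3)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


# Each node owns one token; sorting lets us walk the ring clockwise.
RING = sorted((token(name), name) for name in NODES)


def replicas_for(partition_key: str, replication_factor: int = 3) -> list[str]:
    """Return the nodes that would store the given partition key."""
    tokens = [t for t, _ in RING]
    start = bisect_right(tokens, token(partition_key)) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(replication_factor)]


def overlaps(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True when every read set must intersect every write set."""
    return read_replicas + write_replicas > replication_factor
```

With a replication factor of 3, `overlaps(2, 2, 3)` holds (quorum reads and writes), while `overlaps(1, 1, 3)` does not, which is the trade-off that tunable consistency levels expose.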
Apache Cassandra 2.0 is out - now there's no reason not to ditch that ol' legacy relational system for your important online applications. Cassandra 2.0 includes big-impact features like Lightweight Transactions and Triggers. Do you know about the other new enhancements that got lost in the noise? Let's put the spotlight on all the things! Changes in memory management, file handling and internals: low hype, but they pack a big punch. While we were at it, we also did a bit of house cleaning.
Cassandra By Example: Data Modelling with CQL3 - Eric Evans
CQL is the query language for Apache Cassandra that provides an SQL-like interface. The document discusses the evolution from the older Thrift RPC interface to CQL and provides examples of modeling tweet data in Cassandra using tables like users, tweets, following, followers, userline, and timeline. It also covers techniques like denormalization, materialized views, and batch loading of related data to optimize for common queries.
Further discussion on data modeling with Apache Cassandra. Overview of formal data modeling techniques as well as practical ones. Real-world use cases and associated data models.
At this meetup Patrick McFadin, Solutions Architect at DataStax, will be discussing the most recently added features in Apache Cassandra 2.0, including: Lightweight transactions, eager retries, improved compaction, triggers, and CQL cursors. He'll also be touching on time series data with Apache Cassandra.
Introduction to CQL and Data Modeling with Apache Cassandra - Johnny Miller
Cassandra Meetup, Helsinki February 2014. Introduction to CQL and Data Modeling with Apache Cassandra. You can find the video here: http://bit.ly/jpm_004
Functional data models are great, but how can you squeeze out more performance and make them awesome? Let's talk through some example models, go through the tuning steps and understand the tradeoffs. Many times, just a simple understanding of the underlying internals can make all the difference. I've helped some of the biggest companies in the world do this and I can help you. Do you feel the need for Cassandra 2.0 speed?
The data model is dead, long live the data model - Patrick McFadin
The document discusses how data modeling concepts translate from relational databases to Cassandra. It begins with background on how Cassandra stores data using a row key and columns rather than tables and relations. Common patterns like one-to-many and many-to-many relationships are achieved without foreign keys by duplicating and denormalizing data. The document also covers concepts like UUIDs, transactions, and how some relational features like sequences are handled differently in Cassandra.
Cassandra Day Atlanta 2015: Building Your First Application with Apache Cassa... - DataStax Academy
You’ve heard the talks, followed the tutorials, and done the research. You are a font of Cassandra knowledge. Now it’s time to change the world! (Or at least build something to make your boss happy). In this talk we’ll walk through the process of building KillrVideo, an open source video sharing website where users can upload and share videos, rate them, comment on them, and more. By looking at a real application, we’ll talk about architectural decisions, how the application drives the data model, some pro tips when using the DataStax drivers, and some lessons learned from mistakes made along the way. You’ll leave this session ready to start building your next application (world-changing or otherwise) with Cassandra.
This course is designed to be a “fast start” on the basics of data modeling with Cassandra. We will cover some basic administration information up front that is important to understand as you choose your data model. It is still important to take a proper admin class if you are responsible for a production instance. This course focuses on CQL3, but Thrift shall not be ignored.
Abstract:
Cassandra is a new kind of database: it is more than a single-machine system. It naturally runs in a High-Availability configuration. All nodes in the system are symmetric; there is no single point of failure. As you add machines, failure becomes routine, and Cassandra is built to tolerate that with no interruptions.
Cassandra is linearly scalable with good performance characteristics for very small and very large data stores. Unlike earlier efforts, Cassandra is more than just a key-value store; it is a structured data store which can facilitate complex use cases and queries. Cassandra allows for random access to your data organized into rows and columns.
Cassandra is different, and exciting. This presentation will discuss the pros and cons of using Cassandra, and why it has seen such amazing adoption in the past year.
Bio:
Ben Coverston is Director of Operations at DataStax (formerly known as Riptano), a provider of software, support, services, training, resources and help for Cassandra. He has been involved in enterprise software his entire career. Working in the airline industry, he helped to build some of the highest-volume online booking sites in the world. He saw firsthand the consequences of trying to solve real-world scalability problems at the limit of what traditional relational databases are capable of.
Cassandra Day SV 2014: Fundamentals of Apache Cassandra Data Modeling - DataStax Academy
This document discusses using Cassandra to store and query time series data. It provides examples of modeling weather station data and financial trading data in Cassandra. The key points are:
- Cassandra is well-suited for storing and querying time series data due to its ability to scale out, its resilience, and efficient storage of sequential data.
- Example data models show how to store weather station temperature readings and stock trade events, with timestamps as the primary key to support queries on ranges of time.
- The on-disk layout sequentially stores data, allowing efficient slicing operations to retrieve ranges of records with a single disk seek.
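The layout described in the points above can be mimicked in memory to show why range queries are cheap. The following is a minimal sketch under stated assumptions: the `TimeSeriesStore` class and its method names are hypothetical, each weather station is treated as one partition, and rows within a partition are kept sorted by timestamp so that a time range is a contiguous slice.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import datetime


class TimeSeriesStore:
    """Toy model: one partition per station, rows kept sorted by timestamp."""

    def __init__(self) -> None:
        # partition key -> parallel sorted lists of timestamps and values
        self._times = defaultdict(list)
        self._values = defaultdict(list)

    def insert(self, station_id: str, ts: datetime, temperature: float) -> None:
        """Place the reading at its sorted position inside the partition."""
        times = self._times[station_id]
        pos = bisect_right(times, ts)
        times.insert(pos, ts)
        self._values[station_id].insert(pos, temperature)

    def slice(self, station_id: str, start: datetime, end: datetime):
        """Return (timestamp, value) pairs with start <= timestamp < end."""
        times = self._times[station_id]
        lo = bisect_left(times, start)
        hi = bisect_left(times, end)
        return list(zip(times[lo:hi], self._values[station_id][lo:hi]))
```

Because the rows are already ordered, `slice` only has to locate the two ends of the range and read what lies between them, which mirrors the sequential read the summary describes.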
Cassandra nice use cases and worst anti patterns - Duyhai Doan
This document discusses Cassandra use cases and anti-patterns. Some good use cases include rate limiting, fraud prevention, account validation, and storing sensor time series data. Poor designs include using Cassandra like a queue, storing null values, intensive updates to the same column, and dynamically changing the schema. The document provides examples and explanations of how to properly implement these scenarios in Cassandra.
This summer, coming to a server near you, Cassandra 3.0! Contributors and committers have been working hard on what is the most ambitious release to date. It’s almost too much to talk about, but we will dig into some of the most important, groundbreaking features that you’ll want to use. Indexing changes that will make your applications faster and Spark jobs more efficient. Storage engine changes to get even more density and efficiency from your nodes. Developer-focused features like full JSON support and User Defined Functions. And finally, one of the most requested features, Windows support, has made its arrival. There is more, but you’ll just have to come see for yourself. Get your front row seat and don’t miss it!
Introduction to Data Modeling with Apache Cassandra - DataStax Academy
This document provides an introduction to data modeling with Apache Cassandra. It discusses how Cassandra data models are designed based on the queries an application will perform, unlike relational databases which are designed based on normalization rules. Key aspects covered include avoiding joins by denormalizing data, using a partition key to group related data on nodes, and controlling the clustering order of columns. The document provides examples of modeling time series and tag data in Cassandra.
A lot has changed since I gave one of these talks and man, has it been good. 2.0 brought us a lot of new CQL features and now with 2.1 we get even more! Let me show you some real life data models and those new features taking developer productivity to an all new high. User Defined Types, New Counters, Paging, Static Columns. Exciting new ways of making your app truly killer!
The document discusses data modeling techniques for Cassandra and provides examples for four use cases: shopping cart data, user activity tracking, log collection/aggregation, and user form versioning. For each use case, it describes the business needs, issues with a relational database approach, and proposes a Cassandra data model using CQL. It emphasizes the importance of proper data modeling and getting the model right for a given use case.
My talk on NoSQL at OGF29. [Updated with the OSCON'10 presentation!] Updates do not work reliably on SlideShare, so I also keep the latest version on my blog.
This document provides an overview and examples of modeling data in Apache Cassandra. It begins with an introduction to thinking about data models and queries before modeling, and emphasizes that Cassandra requires modeling around queries due to its limitations on joins and indexes. The document then provides examples of modeling user, video, and other entity data for a video sharing application to support common queries. It also discusses techniques for handling queries that could become hotspots, such as bucketing or adding random values. The examples illustrate best practices for data duplication, materialized views, and time series data storage in Cassandra.
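The two hotspot techniques mentioned above, bucketing and adding random values, both come down to widening the partition key. The sketch below is illustrative only: the function names, the daily bucket granularity, and the `BUCKETS` count of 8 are assumptions rather than details from the presentation.

```python
import random
from datetime import date

BUCKETS = 8  # assumed number of random buckets per hot entity


def time_bucketed_key(entity_id: str, day: date) -> tuple[str, str]:
    """Bound partition size by starting a new partition each day."""
    return (entity_id, day.isoformat())


def random_bucketed_key(entity_id: str, rng: random.Random) -> tuple[str, int]:
    """Spread writes for one hot entity over BUCKETS partitions."""
    return (entity_id, rng.randrange(BUCKETS))


def keys_to_read(entity_id: str) -> list[tuple[str, int]]:
    """A reader must fan out over every random bucket and merge the results."""
    return [(entity_id, bucket) for bucket in range(BUCKETS)]
```

The cost of the random variant shows up in `keys_to_read`: writes are spread out, but every read has to query all buckets, so the bucket count is a trade-off between write distribution and read fan-out.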
The document discusses the evolution of Cassandra's data modeling capabilities over different versions of CQL. It covers features introduced in each version such as user defined types, functions, aggregates, materialized views, and storage attached secondary indexes (SASI). It provides examples of how to create user defined types, functions, materialized views, and SASI indexes in CQL. It also discusses when each feature should and should not be used.
C* Summit 2013: Can't we all just get along? MariaDB and Cassandra by Colin C... - DataStax Academy
The Cassandra Storage Engine allows access to data in a Cassandra cluster from MariaDB. Learn what the Cassandra Storage Engine is and how to make use of it, how we implemented it using dynamic columns in MariaDB. Also, we'll look at CQL, data and command mapping, use cases and benchmarks.
This document summarizes a presentation on Cassandra Query Language version 3 (CQL3). It outlines the motivations for CQL3, provides examples of defining schemas and querying data with CQL3, and notes new features like collection support. The document also reviews changes from earlier versions like improved definition of static and dynamic column families using composite keys.
The document discusses Cassandra Storage Engine (Cassandra SE) in MariaDB, which provides interoperability between MariaDB and Cassandra. It covers an introduction to Cassandra, how Cassandra SE works, data mapping between Cassandra and SQL, command mapping, use cases for Cassandra SE, and benchmarks. It aims to enable accessing Cassandra data from MariaDB using SQL queries.
Montreal User Group - Cloning Cassandra - Adam Hutson
This document provides steps for cloning an Apache Cassandra database cluster. It begins with an introduction and overview. The main steps are:
1. Backup the existing cluster's data, schema, and token assignments and store off-site.
2. Create a new destination cluster matching the original's node count.
3. Restore the backed up data, schema files, and token assignments. Start the new cluster to complete the cloning process.
Cassandra Day Chicago 2015: Advanced Data Modeling - DataStax Academy
The document discusses modeling data in Cassandra using the Chebotko method. It begins by explaining the conceptual, logical, and physical modeling stages of the Chebotko method. It then provides an example of modeling user data in a music database, showing the conceptual model, identifying access patterns, and designing the logical model with tables to satisfy each query. The logical model example shows how to design Cassandra tables for queries about performers, albums, tracks, users and their activities.
Getting Started with Apache Cassandra by Junior Evangelist Rebecca Mills - DataStax Academy
Would you like to learn how to use Cassandra but don’t know where to begin? Want to get your feet wet but you’re lost in the desert? Longing for a cluster when you don’t even know how to set up a node? Then look no further! Rebecca Mills, Junior Evangelist at DataStax, will guide you in the webinar “Getting Started with Apache Cassandra...”
You'll get an overview of Planet Cassandra’s resources to get you started quickly and easily. Rebecca will take you down the path that's right for you, whether you are a developer or administrator. Join if you are interested in getting Cassandra up and working in the way that suits you best.
A 30-minute talk I did at Cassandra Dublin and Cassandra London. Just some things I've learned along the way as I've helped some of the largest users of Cassandra be successful. Learn from other people's mistakes!
Using Apache Cassandra: What is this thing, and how do I use it? - jeremiahdjordan
This is the presentation I gave at the Reflections | Projections conference at UIUC. http://www.acm.uiuc.edu/conference/2013/ It is an introduction to some of the basics of Apache Cassandra, followed by actually getting it up and running. This presentation goes over what Apache Cassandra is and how to get it up and running on your development machine. It then goes over using the DataStax Python Driver and the Cassandra Query Language (CQL) to create tables, write data to them, and then read it back out.
Use Your MySQL Knowledge to Become a MongoDB Guru - Tim Callaghan
Leverage all of your MySQL knowledge and experience to get up to speed quickly with MongoDB.
Presented at Percona Live London 2013 with Robert Hodges of Continuent.
1. The document discusses the motivation and goals in building an Illumos-based OS called OmniOS. The key goals were to have ABI stability, ZFS, zones, DTrace, and be open source while also being installable via network and having consistent multi-architecture support.
2. The document outlines the release cycles established with regular minor weekly updates and major releases every 6 months, as well as goals around commercial support.
3. OmniOS is designed to be minimal, acting as a base for others to build more comprehensive distributions on top of, and is available via various methods like Vagrant boxes, ISOs, and AMIs.
The document discusses RabbitMQ boot steps, which take care of starting the many RabbitMQ subsystems in the proper order and respecting dependencies. It was created by @leastfixedpoint to handle ordering and allow modifying configuration at startup. The boot steps are defined using Erlang modules and module attributes, and their execution involves functions like rabbit_misc:all_module_attributes/1, rabbit:boot_steps/0, and rabbit_misc:build_acyclic_graph/3.
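The ordering problem that boot steps solve is a topological sort over a dependency graph. RabbitMQ implements this in Erlang; the sketch below is only an analogy in Python using the standard library's `graphlib`, and the step names in `BOOT_STEPS` are hypothetical placeholders rather than real RabbitMQ boot steps.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical steps: each name maps to the set of steps it requires.
BOOT_STEPS = {
    "database": set(),
    "file_handle_cache": set(),
    "exchange_registry": {"database"},
    "queue_recovery": {"database", "file_handle_cache"},
    "network_listeners": {"exchange_registry", "queue_recovery"},
}


def boot_order(steps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every step follows its requirements."""
    try:
        return list(TopologicalSorter(steps).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the detected cycle.
        raise ValueError(f"boot steps contain a cycle: {exc.args[1]}") from exc
```

Rejecting cycles up front matters for the same reason an acyclic graph is required in the original design: a cyclic dependency has no valid start order, so it is better to fail at startup than to boot subsystems in an arbitrary sequence.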
Keeping responsive into the future by Chris Mills - Codemotion
Chris Mills will go beyond the obvious, looking at what we can do today to adapt our front-ends to different browsing environments, from mobiles and other alternative devices to older browsers we may be called upon to support. You’ll learn some advanced media query and viewport tricks, including a look at @viewport; insights into responsive images: problems, and current solutions; how to provide usable alternatives to older browsers with Modernizr; what other CSS3 modules provide responsive capabilities; and where media queries are going in the future, with CSS4 media queries.
Introduction to Cassandra and Data Modeling - nickmbailey
This document contains a presentation on Cassandra and how it can be used. It discusses Cassandra's architecture based on Dynamo and BigTable, as well as how it provides availability, scalability, and performance. It covers data modeling techniques in Cassandra like column families, static vs dynamic columns, and using timestamps for time series data. Examples are provided for modeling user login data and social network activity. Anti-patterns like super columns and read-before-write are also discussed. The document concludes with information on an eBay use case involving social signals and recommendations.
The document is a presentation on responsive web design (RWD) given by Zach Leatherman. It discusses the goals of RWD, including providing a flexible grid and flexible media. It also covers potential performance issues with RWD like unnecessary CSS downloads and large images on small screens. The presentation provides solutions to these problems like using media queries to separate CSS and choosing minimal CSS when possible.
Spark Summit 2013 Talk:
At Sharethrough we have deployed Spark to our production environment to support several user facing product features. While building these features we uncovered a consistent set of challenges across multiple streaming jobs. By addressing these challenges you can speed up development of future streaming jobs. In this talk we will discuss the 3 major challenges we encountered while developing production streaming jobs and how we overcame them.
First we will look at how to write jobs to ensure fault tolerance, since streaming jobs need to run 24/7 even under failure conditions. Second we will look at the programming abstractions we created using functional programming and existing libraries. Finally we will look at the way we test all the pieces of a job, from manipulating data through writing to external databases, to give us confidence in our code before we deploy to production.
What can we learn from NoSQL technologies? - Ivan Zoratti
This document summarizes Ivan Zoratti's presentation on NoSQL technologies. It discusses some of the perceived reasons for adopting NoSQL such as flexibility over schemas. It also summarizes key differences between NoSQL and SQL databases, such as schema-less designs and horizontal scaling in NoSQL. Additionally, it covers CAP theorem, examples of NoSQL databases, and when MySQL and NoSQL may each be better fits for different data and application needs.
Escalando una PHP App con DB sharding - PHP ConferenceMatias Paterlini
Presentación del PHP Conference Argentina 2013 sobre escalabilidad horizontal a nivel web y bases de datos para aplicaciones escritas en PHP, trabajando sobre sharding en mysql, amazon web services, y modelo de datos no relacional.
2013 - Matías Paterlini: Escalando PHP con sharding y Amazon Web Services PHP Conference Argentina
The document discusses scaling a PHP application using database sharding and Amazon Web Services. It describes using MySQL databases sharded across multiple database servers to improve scalability. Key points include using an application driver to determine which shard to write to, caching static files and data, and leveraging various Amazon services like S3, EC2, CloudWatch, and Route 53. Benefits of this approach include smaller, faster databases that are easier to manage and scale out writes by adding additional database shards.
This document contains the slides from a presentation by Patrick Chanezon on cloud computing. Some key points from the presentation include:
- Cloud computing has evolved from consumer websites needing to solve problems with large data sets, storage capacity, and scalability. This led to public cloud services from companies like Amazon and Google.
- While infrastructure as a service provides virtualization and scalability, platforms are still needed to build distributed applications. Platform as a service providers aim to make application development easier by providing services and hiding infrastructure details.
- Agile development processes are better suited for the fast iteration cycles needed when developing applications for consumer markets with short product lifetimes. Cloud platforms help enable more agile development.
The document discusses ActiveRecord in Rails. It explains that every model class in Rails corresponds to a database table. When a model is generated using a Rails generator, it creates a migration file that builds the corresponding database table when run. This establishes the connection between models and their database representations.
Rapid Home Provisioning is a new feature in Oracle Grid Infrastructure 12c R2 that provides a simplified way to provision and patch Oracle software and databases. It uses a centralized management server and golden images stored on ACFS to deploy pre-packaged and patched Oracle homes to client nodes. Administrators can easily create working copies of golden images, deploy databases from the working copies, and seamlessly patch databases by moving them to a working copy based on a newer patched golden image with a single command.
MYSQLCLONE is a free and simple tool used to clone MySQL databases from one server to another. It can transfer the entire database including data, schemas, stored procedures, functions and events. The tool connects to the source and destination databases using connection parameters and then transfers the database objects and data in either LOAD or INSERT mode. Quick usage examples are provided to demonstrate transferring the full database, schema objects only, and row data in INSERT mode.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
2. Maurits Lawende
• Working at Dutch Open Projects (DOP) since 2007
• Development and technical design for challenging Drupal sites
• Development of SaaS solutions in PHP & NodeJS
Tuesday 12 November 2013
18. Operations in Cassandra 1.0
• CREATE KEYSPACE name
• USE name
• CREATE COLUMN FAMILY name
• DROP KEYSPACE name
• DROP COLUMN FAMILY name
19. Operations in Cassandra 1.0
• SET columnfamily['row']['column'] = 'value';
• GET columnfamily['row']
• LIST columnfamily
• DEL columnfamily['row']
• DEL columnfamily['row']['column']
21. Operations in Cassandra 1.0
• post['uuid']['title'] = 'First post!';
• user['mau']['firstname'] = 'Maurits';
• user['mau']['lastname'] = 'Lawende';
Resulting rows:
post: row uuid, column title = First post!
user: row mau, columns firstname = Maurits, lastname = Lawende
22. Operations in Cassandra 1.0
Sorted by row key, then column name (all ascending)
• post['uuid']['title'] = 'First post!';
• user['mau']['firstname'] = 'Maurits';
• user['mau']['lastname'] = 'Lawende';
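The ordering described on this slide can be sketched with a toy in-memory model. This is only an illustration that takes the slide's statement at face value: the ColumnFamily class and its set/list methods are hypothetical names, not part of Cassandra or any client library.

```python
from bisect import insort


class ColumnFamily:
    """Toy in-memory column family: rows and columns are returned in sorted order."""

    def __init__(self):
        self.rows = {}  # row key -> sorted list of (column name, value)

    def set(self, row_key, column, value):
        columns = self.rows.setdefault(row_key, [])
        # Drop any previous value for this column, then insert in sorted position.
        columns[:] = [(c, v) for c, v in columns if c != column]
        insort(columns, (column, value))

    def list(self):
        # Rows in ascending row-key order, columns ascending by name.
        return [(key, self.rows[key]) for key in sorted(self.rows)]


user = ColumnFamily()
user.set("mau", "lastname", "Lawende")
user.set("mau", "firstname", "Maurits")
user.set("anne", "firstname", "Anne")
result = user.list()
```

Regardless of insertion order, "anne" is listed before "mau", and firstname before lastname.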
25. Operations in Cassandra 1.0
How to get a list of blogs by "mau"?
• post['uuid']['title'] = 'First post!';
• post['uuid']['user'] = 'mau';
• user['mau']['firstname'] = 'Maurits';
WHERE user = 'mau'
27. Operations in Cassandra 1.0
How to get a list of blogs by "mau"?
• post['uuid']['title'] = 'First post!';
• post['uuid']['user'] = 'mau';
• user['mau']['firstname'] = 'Maurits';
WHERE user = 'mau'
Bad Request: No indexed columns present in by-columns clause with Equal operator
Sequential scans are rejected
29. Operations in Cassandra 1.0
How to get a list of blogs by "mau"?
• post['uuid']['title'] = 'First post!';
• post['uuid']['user'] = 'mau';
• user['mau']['firstname'] = 'Maurits';
WHERE user = 'mau'
Bad Request: No indexed columns present in by-columns clause with Equal operator
Bad Request: Order by is currently only supported on the clustered columns of the PRIMARY KEY
Bad Request: ORDER BY is only supported when the partition key is restricted by an EQ or an IN.
31. Operations in Cassandra 1.0
How to get a list of blogs by "mau"?
• post['uuid']['title'] = 'First post!';
• post['uuid']['user'] = 'mau';
• user['mau']['firstname'] = 'Maurits';
WHERE user = 'mau'
ORDER BY date DESC
LIMIT 10
Only possible when user and date are both in the primary key
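Why the primary key matters here can be sketched with a small model. It assumes a hypothetical table with PRIMARY KEY (user, date), so user selects one partition and date is the sort order inside it. PostsByUser and its methods are illustrative names only.

```python
from bisect import insort
from collections import defaultdict


class PostsByUser:
    """Toy table with PRIMARY KEY (user, date): user partitions, date clusters."""

    def __init__(self):
        self.partitions = defaultdict(list)  # user -> sorted [(date, title)]

    def insert(self, user, date, title):
        # Rows inside a partition are kept sorted by the clustering column.
        insort(self.partitions[user], (date, title))

    def latest(self, user, limit):
        # WHERE user = ? ORDER BY date DESC LIMIT ?
        # Only one partition is touched, and it is simply read backwards.
        return [title for _, title in reversed(self.partitions[user])][:limit]


posts = PostsByUser()
posts.insert("mau", "2013-11-01", "First post!")
posts.insert("mau", "2013-11-12", "Second post!")
posts.insert("anne", "2013-11-05", "Hello")
latest_titles = posts.latest("mau", 10)
```

Because the data is already stored in date order within the partition, the query needs no scan and no sort, which is the reason the restriction exists.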
38. Operations in Cassandra 1.0
• post['uuid']['title'] = 'First post!';
• user['mau']['firstname'] = 'Maurits';
• user['mau']['post001:uuid'] = 'First post!';
• user['mau']['post002:uuid'] = 'Second post!';
Only one query required to get the user profile with the latest posts
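A rough sketch of this denormalized write path, using plain dictionaries in place of column families. The store layout and the publish and profile_with_posts helpers are assumptions made for illustration.

```python
store = {"post": {}, "user": {"mau": {"firstname": "Maurits"}}}


def publish(username, seq, post_id, title):
    # Write the post once under its own key...
    store["post"].setdefault(post_id, {})["title"] = title
    # ...and copy the title into the author's row under a sortable column name.
    store["user"].setdefault(username, {})[f"post{seq:03d}:{post_id}"] = title


def profile_with_posts(username):
    # Single row read: profile columns and post columns come back together.
    row = store["user"][username]
    posts = [value for name, value in sorted(row.items()) if name.startswith("post")]
    return row.get("firstname"), posts


publish("mau", 1, "uuid-a", "First post!")
publish("mau", 2, "uuid-b", "Second post!")
profile = profile_with_posts("mau")
```

The cost moves to the write side: every post is written twice so that the read needs only one lookup.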
40. Beauty?
• Dirty in the SQL world, but:
• It’s a best practice in Big Data
• Don’t think of it as a relational database
• No strict rules on how to use it, just push it to the limits
66. Read repair
• Compares data with 2 other replicas in the background
• Fixes inconsistent and missing data
• At 10% of all reads
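The idea behind read repair can be sketched as follows. This is a simplified model, not Cassandra's implementation: each replica is a dictionary, the copy with the highest write timestamp wins, and the read_repair function name is hypothetical. The 10% sampling from the slide is left out to keep the example deterministic.

```python
def read_repair(replicas, key):
    """Return the newest value for key and overwrite stale or missing copies.

    Each replica is a dict mapping key -> (timestamp, value).
    """
    versions = [replica[key] for replica in replicas if key in replica]
    if not versions:
        return None
    newest = max(versions, key=lambda version: version[0])  # highest timestamp wins
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest  # repair inconsistent or missing data
    return newest[1]


a = {"k": (1, "old")}
b = {"k": (2, "new")}
c = {}
value = read_repair([a, b, c], "k")
```

After the call, all three replicas hold the same (timestamp, value) pair.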
67. Node repair
• Gradually compares all data in nodes with replicas
• Required in conjunction with read repair to fix "forgotten deletes"
68. ACID properties
• Atomic: completed successfully or entirely rolled back
• Consistent: transactions never invalidate the database state
• Isolated: transactions are processed as if sequential
• Durable: completed actions are persistent
69. CAP theorem
Impossible to achieve all three:
• Consistency
• Availability
• Partition tolerance
71. Eventual consistency
• Consistency is not always more important than speed and scalability (doesn’t require locking)
• Best effort
• Configurable consistency level, but no transaction support
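The effect of a configurable consistency level can be expressed as simple arithmetic. In this sketch the function names are illustrative; the rule itself is the usual overlap condition: a read is guaranteed to see the latest acknowledged write when the number of replicas written plus the number of replicas read exceeds the replication factor.

```python
def quorum(replication_factor):
    # QUORUM means a majority of the replicas.
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, write_acks, read_acks):
    # If the written and read replica sets must overlap, every read includes
    # at least one replica that already has the latest write.
    return write_acks + read_acks > replication_factor


rf = 3
quorum_both = reads_see_latest_write(rf, quorum(rf), quorum(rf))
one_both = reads_see_latest_write(rf, 1, 1)
```

With three replicas, QUORUM writes plus QUORUM reads overlap, while ONE plus ONE trades that guarantee for lower latency.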
75. Surrogate keys
Say bye to sequences (not consistent across cluster; counters are for counting)
Native support for UUIDs: f47ac10b-58cc-4372-a567-0e02b2c3d479
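A short sketch of generating such keys on the client, using Python's standard uuid module. The helper names are made up for this example.

```python
import uuid


def new_post_id():
    # Random (version 4) UUID: generated locally, no shared sequence required.
    return uuid.uuid4()


def new_time_id():
    # Time-based (version 1) UUID: embeds the creation timestamp.
    return uuid.uuid1()


post_id = new_post_id()
time_id = new_time_id()
```

Any node or client can create identifiers this way without coordinating with the rest of the cluster, which is what a sequence cannot offer.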
79. Lists
• user['mau']['posts'] = 'uuid';
• CREATE TABLE user (
    username text PRIMARY KEY,
    posts list<uuid>
  );
• Append: UPDATE user SET posts = posts + ['uuid'] WHERE username = 'mau';
• Prepend: UPDATE user SET posts = ['uuid'] + posts WHERE username = 'mau';
80. Sets
• CREATE TABLE user (
    username text PRIMARY KEY,
    emails set<text>
  );
• UPDATE user SET emails = emails + {'mail@example.com'} WHERE username = 'mau';
81. Maps
• CREATE TABLE user (
    username text PRIMARY KEY,
    attending map<timestamp,text>
  );
• UPDATE user SET attending['2013-11-12'] = 'PHPMeetup' WHERE username = 'mau';
• DELETE attending['2013-12-05'] FROM user WHERE username = 'mau';
83. Limits on collections
• 64K (no size check in CQL: SET list = list + ['...'])
• Whole collection loaded in memory when reading / writing
• Not an alternative to wide tables!
86. Wide tables in CQL3
• CREATE TABLE tweets (
    tweet_id uuid PRIMARY KEY,
    author varchar,
    body varchar
  );
• CREATE TABLE timeline (
    user_id varchar,
    tweet_id uuid,
    author varchar,
    body varchar,
    PRIMARY KEY (user_id, tweet_id)
  );
Storage layout of timeline (one wide row per user_id):
user_id mau: uuid:author = anne, uuid:body = Tweet from Anne
user_id mike: uuid:author = david, uuid:body = Tweet from David
For schemaless lovers:
CREATE TABLE name (
  rowkey varchar,
  columnname varchar,
  value blob,
  PRIMARY KEY (rowkey, columnname)
);
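How timeline rows fold into one wide row per user can be sketched like this. The to_cells helper and the sample tweet ids are assumptions for illustration; the "clustering value:column name" cell naming follows the diagram on the slide.

```python
def to_cells(rows):
    """Group CQL-style rows by partition key into 'clustering:column' cells."""
    partitions = {}
    for row in rows:
        cells = partitions.setdefault(row["user_id"], {})
        for column in ("author", "body"):
            # The clustering key (tweet_id) becomes a prefix of the cell name.
            cells[f"{row['tweet_id']}:{column}"] = row[column]
    return partitions


timeline = [
    {"user_id": "mau", "tweet_id": "t1", "author": "anne", "body": "Tweet from Anne"},
    {"user_id": "mau", "tweet_id": "t2", "author": "david", "body": "Tweet from David"},
    {"user_id": "mike", "tweet_id": "t2", "author": "david", "body": "Tweet from David"},
]
wide_rows = to_cells(timeline)
```

Every additional tweet adds cells to the user's existing partition rather than creating a new one, which is what makes the table "wide".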
87. Secondary index
• CREATE INDEX name ON table (column);
• High memory usage when used on high-cardinality columns
91. Iteration
• SELECT * FROM user;
• SELECT token(username), username, country, age FROM user
  WHERE token(username) > 23947239 LIMIT 10;
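The paging pattern behind the second query can be sketched as a loop that remembers the last token seen and asks for the next page after it. The token function below is a stand-in based on MD5, used only so the example runs on its own; Cassandra's partitioner computes the real token. select_page and iterate_all are hypothetical helpers.

```python
import hashlib


def token(key):
    # Stand-in hash producing a signed 64-bit integer (not Cassandra's partitioner).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big", signed=True)


def select_page(usernames, after_token, limit):
    # Models: SELECT ... WHERE token(username) > after_token LIMIT limit
    ordered = sorted(usernames, key=token)
    return [u for u in ordered if after_token is None or token(u) > after_token][:limit]


def iterate_all(usernames, page_size):
    last, seen = None, []
    while True:
        page = select_page(usernames, last, page_size)
        if not page:
            return seen
        seen.extend(page)
        last = token(page[-1])  # continue after the last row of this page


names = ["mau", "anne", "david", "mike", "eric"]
visited = iterate_all(names, 2)
```

Each row is visited exactly once, in token order rather than alphabetical order.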
112. Pig
input_lines = LOAD '/tmp/my-copy-of-all-pages-on-internet' AS (line:chararray);
words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
filtered_words = FILTER words BY word MATCHES '\\w+';
word_groups = GROUP filtered_words BY word;
word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;
ordered_word_count = ORDER word_count BY count DESC;
STORE ordered_word_count INTO '/tmp/number-of-words-on-internet';
113. Hive
SELECT v['ip'], COUNT(1) AS cnt FROM www_access
GROUP BY v['ip']
ORDER BY cnt DESC LIMIT 30;
114. Pig and Hive
• Using MapReduce
• No(t very) predictable performance
• Good for analysis
115. Hack your own
• Not too difficult
• Data can be split into subsets by filtering on tokens
• Application must run on all MapRed nodes
• Probably better performance than Pig / Hive
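Splitting the data by token could look roughly like the sketch below. It assumes signed 64-bit tokens (the range used by the Murmur3 partitioner; other partitioners use a different range), and split_token_ranges is an illustrative helper.

```python
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_ranges(parts):
    """Split the full token range into contiguous (start, end) pairs.

    Each worker would query token(key) > start AND token(key) <= end;
    the first worker uses >= so that MIN_TOKEN itself is covered.
    """
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // parts for i in range(parts)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))


ranges = split_token_ranges(4)
```

Each pair can be handed to a separate worker, which then reads only its own slice of the ring.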
118. Thrift
• Something like SOAP in a binary format
• Tool which generates libraries based on definition files
• Supports many languages (incl. PHP, JS, NodeJS, C, Java, Python, Ruby, ...)
• Also used by HyperTable, HBase, Accumulo and ElasticSearch
• Sole interface before 1.2
120. Binary protocol
• Recommended protocol for Cassandra 1.2
• Few client libraries available
• No binary connectors were available for PHP
  https://github.com/mauritsl/php-cassandra
121. php-cassandra
require('lib/cassandra/Cassandra.php');
use Cassandra\Connection as Cassandra;

// Connect to a node and select the keyspace.
$connection = new Cassandra('localhost', 'keyspace');
// Run a CQL query and iterate over the result rows.
$rows = $connection->query('SELECT * FROM user');
foreach ($rows as $row) {
  print $row->firstname;
  print $row->listfield[0]; // list columns are accessed by index
}
$rows->count();
$rows->getColumns();
123. Rule 1:
Don’t ask for NoSQL drivers for a CMS
124. Cassandra does not fit all
(same story for every NoSQL solution)
125. Every page (or API call) should require only a few queries (ideally one)
126. Static versus Dynamic data
• Static: information that doesn’t change very often
  • E.g.: translations
  • May go in an RDBMS or local storage (files?)
• Dynamic: many changes
  • Changes must be visible on all nodes
  • Use Cassandra
127. Local versus Global data
• Logging
  • Separate logs per node
• Cache
  • Sometimes no need to share cache between nodes
• Statistics
  • Can be kept local for a limited time
128. Local versus Global data
• Sessions
  • Dependent on session stickiness
129. Caching
• Memcache is recommended for local cache
• Cassandra can be used for global cache
  • Has a TTL feature
  INSERT INTO ... (...) VALUES (...) USING TTL 86400
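What a TTL means for a cache entry can be sketched with a small in-memory model. TTLCache is a hypothetical class, and the injectable clock exists only so the behaviour can be shown without waiting a day.

```python
import time


class TTLCache:
    """Toy cache where every entry expires a fixed number of seconds after insert."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.data = {}  # key -> (expires_at, value)

    def insert(self, key, value, ttl):
        # Comparable to INSERT ... USING TTL ttl
        self.data[key] = (self.clock() + ttl, value)

    def get(self, key):
        entry = self.data.get(key)
        if entry is None or entry[0] <= self.clock():
            self.data.pop(key, None)  # expired entries simply disappear
            return None
        return entry[1]


now = [0]
cache = TTLCache(clock=lambda: now[0])
cache.insert("page", "<html>", 86400)
fresh = cache.get("page")
now[0] = 86400  # one day later
stale = cache.get("page")
```

The application never has to delete cache entries itself; a TTL of 86400 seconds keeps the entry for one day.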
131. What about files?
• Use Hadoop Distributed File System (HDFS) or GlusterFS
• Or use Cassandra
132. What about files?
• Split files in chunks to avoid hotspots and save the heap
• Not uncommon to have files in Cassandra
  • github.com/Netflix/astyanax
  • GBs are ok, but do not store TBs
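Chunking a file before storing it might look like the following sketch. The row layout (one row per chunk, keyed by file id and chunk index) and the helper names are assumptions for illustration, not the layout of any particular library.

```python
def split_into_chunks(data, chunk_size):
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_rows(file_id, data, chunk_size):
    # One row per chunk, keyed by (file id, chunk index), so that the chunks
    # of a large file can be spread over several nodes.
    chunks = split_into_chunks(data, chunk_size)
    return {(file_id, index): chunk for index, chunk in enumerate(chunks)}


def reassemble(rows, file_id):
    keys = sorted(key for key in rows if key[0] == file_id)
    return b"".join(rows[key] for key in keys)


payload = b"0123456789" * 10
rows = chunk_rows("report.pdf", payload, 32)
restored = reassemble(rows, "report.pdf")
```

Small chunks keep each individual read and write small, so no single request has to hold the whole file in memory.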
133. Maximum size of cluster?
• No satisfactory answer
• Probably more dependent on network equipment
  • Rack awareness helps here
• Facebook: 150 node cluster, 50TB data (2010)
• Easou: 400 node cluster, 300TB data (300 million images)
135. Minimum size of a cluster?
• Can run on a single node
• 4GB RAM recommended
• Runs fine on 1GB RAM
"Hot data" should fit in RAM
136. Installing Cassandra
• Install JDK
  • Oracle Java recommended but OpenJDK works ok
• Add Cassandra repository
• apt-get install cassandra
• Set listen and seed address (IP address of node and seed)
• (Re)start Cassandra