Speaker(s): Luke Tillman, Apache Cassandra Language Evangelist at DataStax
You’ve heard the talks, followed the tutorials, and done the research. You are a font of Cassandra knowledge. Now it’s time to change the world! (Or at least build something to make your boss happy). In this talk we’ll walk through the process of building KillrVideo, an open source video sharing website where users can upload and share videos, rate them, comment on them, and more. By looking at a real application, we’ll talk about architectural decisions, how the application drives the data model, some pro tips when using the DataStax drivers, and some lessons learned from mistakes made along the way. You’ll leave this session ready to start building your next application (world-changing or otherwise) with Cassandra.
Cassandra nice use cases and worst anti patterns by Duyhai Doan
This document discusses Cassandra use cases and anti-patterns. Some good use cases include rate limiting, fraud prevention, account validation, and storing sensor time series data. Poor designs include using Cassandra like a queue, storing null values, intensive updates to the same column, and dynamically changing the schema. The document provides examples and explanations of how to properly implement these scenarios in Cassandra.
Cassandra Day Chicago 2015: Advanced Data Modeling by DataStax Academy
The document discusses modeling data in Cassandra using the Chebotko method. It begins by explaining the conceptual, logical, and physical modeling stages of the Chebotko method. It then provides an example of modeling user data in a music database, showing the conceptual model, identifying access patterns, and designing the logical model with tables to satisfy each query. The logical model example shows how to design Cassandra tables for queries about performers, albums, tracks, users and their activities.
Cassandra Day SV 2014: Fundamentals of Apache Cassandra Data Modeling by DataStax Academy
This document discusses using Cassandra to store and query time series data. It provides examples of modeling weather station data and financial trading data in Cassandra. The key points are:
- Cassandra is well-suited for storing and querying time series data due to its ability to scale out, its resilience, and efficient storage of sequential data.
- Example data models show how to store weather station temperature readings and stock trade events, with timestamps as the primary key to support queries on ranges of time.
- The on-disk layout sequentially stores data, allowing efficient slicing operations to retrieve ranges of records with a single disk seek.
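As a rough illustration of the layout described in the points above, the following Python sketch models each partition as a sorted run of (timestamp, value) rows and answers a time-range query with one contiguous slice. It is an in-memory toy, not Cassandra or driver code. The names `TimeSeriesStore`, `insert`, and `slice_range`, and the choice of (station id, day) as the partition key, are assumptions made for the example.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict
from datetime import datetime


class TimeSeriesStore:
    """Toy in-memory model: one sorted run of rows per partition."""

    def __init__(self):
        # partition key (station_id, date) -> sorted list of (timestamp, temperature)
        self._partitions = defaultdict(list)

    def insert(self, station_id, ts, temperature):
        # Hypothetical partitioning choice: one partition per station per day.
        key = (station_id, ts.date())
        insort(self._partitions[key], (ts, temperature))

    def slice_range(self, station_id, day, start, end):
        """Return rows with start <= timestamp <= end from a single partition."""
        rows = self._partitions.get((station_id, day), [])
        lo = bisect_left(rows, (start,))
        hi = bisect_right(rows, (end, float("inf")))
        return rows[lo:hi]


store = TimeSeriesStore()
# Rows arrive out of order; the partition keeps them sorted by timestamp.
for hour, temp in [(9, 70.0), (7, 68.5), (8, 69.0), (10, 71.5)]:
    store.insert("station-1", datetime(2014, 6, 1, hour), temp)

morning = store.slice_range(
    "station-1",
    datetime(2014, 6, 1).date(),
    datetime(2014, 6, 1, 8),
    datetime(2014, 6, 1, 9),
)
print([temp for _, temp in morning])  # prints [69.0, 70.0]
```

Because rows within a partition stay ordered by timestamp, the range query reduces to two binary searches and one slice, which is the property the summary attributes to the sequential on-disk layout.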
This document summarizes a presentation on Cassandra Query Language version 3 (CQL3). It outlines the motivations for CQL3, provides examples of defining schemas and querying data with CQL3, and notes new features like collection support. The document also reviews changes from earlier versions like improved definition of static and dynamic column families using composite keys.
Further discussion on Data Modeling with Apache Cassandra. Overview of formal data modeling techniques as well as practical ones. Real-world use cases and associated data models.
Apache Cassandra 2.0 is out - now there's no reason not to ditch that ol' legacy relational system for your important online applications. Cassandra 2.0 includes big impact features like Light Weight Transactions and Triggers. Do you know about the other new enhancements that got lost in the noise? Let's put the spotlight on all the things! Changes in memory management, file handling and internals. Low hype but they pack a big punch. While we were at it, we also did a bit of house cleaning.
This summer, coming to a server near you, Cassandra 3.0! Contributors and committers have been working hard on what is the most ambitious release to date. It’s almost too much to talk about, but we will dig into some of the most important, groundbreaking features that you’ll want to use. Indexing changes that will make your applications faster and Spark jobs more efficient. Storage engine changes to get even more density and efficiency from your nodes. Developer focused features like full JSON support and User Defined Functions. And finally, one of the most requested features, Windows support, has made its arrival. There is more, but you’ll just have to come see for yourself. Get your front row seat and don’t miss it!
Introduction to Data Modeling with Apache Cassandra by DataStax Academy
This document provides an introduction to data modeling with Apache Cassandra. It discusses how Cassandra data models are designed based on the queries an application will perform, unlike relational databases which are designed based on normalization rules. Key aspects covered include avoiding joins by denormalizing data, using a partition key to group related data on nodes, and controlling the clustering order of columns. The document provides examples of modeling time series and tag data in Cassandra.
This document provides an overview and examples of modeling data in Apache Cassandra. It begins with an introduction to thinking about data models and queries before modeling, and emphasizes that Cassandra requires modeling around queries due to its limitations on joins and indexes. The document then provides examples of modeling user, video, and other entity data for a video sharing application to support common queries. It also discusses techniques for handling queries that could become hotspots, such as bucketing or adding random values. The examples illustrate best practices for data duplication, materialized views, and time series data storage in Cassandra.
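One way to picture the bucketing technique mentioned above is to append a small bucket number to a hot partition key on write, then read every bucket and merge the results. The sketch below is a minimal Python illustration under stated assumptions, not code from the presentation: `NUM_BUCKETS`, `bucketed_key`, and `all_bucket_keys` are hypothetical names, and deriving the bucket from a hash of the item id is only one option (the summary also mentions adding random values).

```python
import zlib

NUM_BUCKETS = 8  # hypothetical; sized to how widely a hot key must be spread


def bucketed_key(partition_key: str, item_id: str, num_buckets: int = NUM_BUCKETS) -> tuple:
    """Spread writes for one logical key across num_buckets physical partitions."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    bucket = zlib.crc32(item_id.encode("utf-8")) % num_buckets
    return (partition_key, bucket)


def all_bucket_keys(partition_key: str, num_buckets: int = NUM_BUCKETS) -> list:
    """Keys a reader must query (and merge) to see the whole logical partition."""
    return [(partition_key, b) for b in range(num_buckets)]


write_key = bucketed_key("tag:cats", "video-42")
read_keys = all_bucket_keys("tag:cats")
```

The trade-off is that writes for a popular key no longer land on a single partition, while reads fan out to `num_buckets` partitions instead of one.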
Cassandra By Example: Data Modelling with CQL3 by Eric Evans
CQL is the query language for Apache Cassandra that provides an SQL-like interface. The document discusses the evolution from the older Thrift RPC interface to CQL and provides examples of modeling tweet data in Cassandra using tables like users, tweets, following, followers, userline, and timeline. It also covers techniques like denormalization, materialized views, and batch loading of related data to optimize for common queries.
A lot has changed since I gave one of these talks and man, has it been good. 2.0 brought us a lot of new CQL features and now with 2.1 we get even more! Let me show you some real life data models and those new features taking developer productivity to an all new high. User Defined Types, New Counters, Paging, Static Columns. Exciting new ways of making your app truly killer!
Functional data models are great, but how can you squeeze out more performance and make them awesome? Let's talk through some example models, go through the tuning steps and understand the tradeoffs. Many times, just a simple understanding of the underlying internals can make all the difference. I've helped some of the biggest companies in the world do this and I can help you. Do you feel the need for Cassandra 2.0 speed?
The document summarizes new features and improvements in Apache Cassandra 2.1, including enhanced performance, lightweight transactions, collection indexing, improved counters, incremental repair, and a new row cache. It also discusses Cassandra's use at eBay to power mission-critical features for hundreds of millions of users daily.
The document discusses enhancements and new features in Cassandra 2.1 including user defined types, collection indexing, improved counters, data directory changes, bloom filter improvements, and more efficient repair. It also outlines the new query cache and row cache features in Cassandra.
1) The document discusses new features in Apache Cassandra including JSON support, collections, user-defined types, role-based authorization, user-defined functions, commitlog compression, and DateTieredCompactionStrategy.
2) It also discusses upcoming Cassandra 3.0 features like a new storage engine, hinted handoff improvements, materialized views, and a 3.x development process.
3) Benchmark results are shown for some new features like commitlog compression, DateTieredCompactionStrategy, hinted handoffs, and materialized views which demonstrate performance improvements.
Enabling Search in your Cassandra Application with DataStax Enterprise by DataStax Academy
This document provides an overview of using DataStax Enterprise (DSE) Search to enable full-text search capabilities in Cassandra applications. It discusses how DSE Search integrates Solr/Lucene indexing with the Cassandra database to allow searching of application data without requiring a separate search cluster, external ETL processes, or custom application code for data management. The document also includes examples of different types of searches that can be performed, such as filtering, faceting, geospatial searches, and joins. It concludes with basic steps for getting started with DSE Search such as creating a Solr core and executing search queries using CQL.
Big data 101 for beginners riga dev days by Duyhai Doan
This document provides an overview and introduction to big data concepts for a new project in 2017. It discusses distributed systems theories like time ordering, latency, failure modes, and consensus protocols. It also covers data sharding and replication techniques. The document explains the CAP theorem and how it relates to consistency and availability. Finally, it discusses different distributed systems architectures like master/slave versus masterless designs.
Storing time series data with Apache Cassandra by Patrick McFadin
If you are looking to collect and store time series data, it's probably not going to be small. Don't get caught without a plan! Apache Cassandra has proven itself as a solid choice; now you can learn how to do it. We'll look at possible data models and the choices you have to be successful. Then, let's open the hood and learn about how data is stored in Apache Cassandra. You don't need to be an expert in distributed systems to make this work and I'll show you how. I'll give you real-world examples and work through the steps. Give me an hour and I will upgrade your time series game.
The document discusses data modeling techniques for Cassandra and provides examples for four use cases: shopping cart data, user activity tracking, log collection/aggregation, and user form versioning. For each use case, it describes the business needs, issues with a relational database approach, and proposes a Cassandra data model using CQL. It emphasizes the importance of proper data modeling and getting the model right for a given use case.
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ... by StampedeCon
Learn how to model beyond traditional direct access in Apache Cassandra. Utilizing the DataStax platform to harness the power of Spark and Solr to perform search, analytics, and complex operations in place on your Cassandra data!
The document discusses improvements and new features in Cassandra 2.0 and 2.1, including lightweight transactions using Paxos consensus, cursors for paging through large result sets, and optimizations to Cassandra's memory usage including pushing more data structures off-heap.
Introduction to data modeling with apache cassandra by Patrick McFadin
Are you using relational databases and wondering how to get started with data modeling and Apache Cassandra? Here is an introductory tour, translating from the knowledge you already have to the knowledge you need to be effective with Cassandra development. We cover patterns and anti-patterns. Get going today!
At this meetup Patrick McFadin, Solutions Architect at DataStax, will be discussing the most recently added features in Apache Cassandra 2.0, including: Lightweight transactions, eager retries, improved compaction, triggers, and CQL cursors. He'll also be touching on time series data with Apache Cassandra.
The document summarizes Cassandra developments over the past 5 years, including keynote details from Jonathan Ellis on Cassandra 1.2 and 2.0. Some highlights include improvements to scalability, performance and reliability in Cassandra 1.2, and the introduction of new features in Cassandra 2.0 like lightweight transactions (CAS), improved compaction, and experimental triggers. The keynote outlines changes and removals between the two versions to ease the transition for developers and operators.
Relational systems have always been built on the premise of modeling relationships. As you will see, static schema, one-to-one, many-to-many still have a place in Cassandra. From the familiar, we’ll go into the specific differences in Cassandra and tricks to make your application fast and resilient.
Building your First Application with Cassandra by Luke Tillman
You’ve heard the talks, followed the tutorials, and done the research. You are a font of Cassandra knowledge. Now it’s time to change the world! (Or at least build something to make your boss happy). In this talk we’ll walk through the process of building KillrVideo, an open source video sharing website where users can upload and share videos, rate them, comment on them, and more. By looking at a real application, we’ll talk about architectural decisions, how the application drives the data model, some pro tips when using the DataStax drivers, and some lessons learned from mistakes made along the way. You’ll leave this session ready to start building your next application (world-changing or otherwise) with Cassandra.
Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos by DataStax Academy
Keyboard Cat, Nyan Cat, and of course the world famous Grumpy Cat--it seems like the Internet can’t get enough cat videos. If you were building an application to let users share and consume their fill of videos, how would you go about it? In this talk, we’ll take a look at the data model for KillrVideo, a sample video sharing application similar to YouTube where users can share videos, comment, rate them, and more. You’ll get a practical introduction to Cassandra data modeling, querying with CQL, how the application drives the data model, and how to shift your thinking from the relational world you probably have experience with.
Creating a Python Microservice Tier in Four Sprints with Cassandra, Kafka, an... by Jeffrey Carpenter
Creating a Python microservice tier in four sprints using Cassandra, Kafka, and DSE Graph. The document outlines the steps taken in each sprint: Sprint 1 focused on setting up the basic plumbing of services, Sprint 2 added data access and business logic using Cassandra and DSE Search, Sprint 3 implemented messaging with Kafka, and Sprint 4 developed a graph-based recommender system using DSE Graph. Key lessons learned included decoupling components, using different driver interaction methods, and approaching new technologies incrementally.
In this webinar, we review the benefits of deploying a microservices architecture with Cassandra as your backbone in order to ensure your applications become incredibly reliable. We discuss in detail:
- How to create microservices in Node.js with ExpressJs and Seneca
- Tuning the Node.js driver for Cassandra: error handling, load balancing and degrees of parallelism
- Additional best practices to ensure your systems are highly performant and available
The sample service is available on GitHub: https://github.com/jorgebay/killr-service
Unleash the Power of Video Communication - Office 365 Video vs. Azure Media S... by Gina Montgomery, V-TSP
Is your organization considering the power of Video Communication? Microsoft provides two options that can assist your organization in Video Communication. The first, Azure Media Services, is a robust and fully customizable option that allows you to deliver any media, on virtually any device, with the power of the Azure cloud. Microsoft also provides an out-of-the-box video portal solution in Office 365 that is built on Azure Media Services and SharePoint Online. Come learn the features and benefits of each.
Busy Developers Guide to AngularJS (Tiberiu Covaci) by ITCamp
Since last year, Single Page Applications have grown exponentially in popularity and the framework of choice for many developers is Angular.js. In this session we will go through some of the features that make Angular such a popular framework so you can start using it in your own projects.
Crossroads of Asynchrony and Graceful Degradation by C4Media
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1VmbI3t.
Nitesh Kant describes how embracing asynchrony in the Netflix applications, from networking to business processing, creates gracefully degrading and highly resilient applications. Filmed at qconsf.com.
Nitesh Kant is an engineer in Netflix’s Edge Gateway team, working on Netflix’s asynchronous Inter Process Communication stack. He is the author of RxNetty which forms the core of this stack and is currently moving Zuul to this new architecture.
HTML5: The Parts You Care About - 4/Nov/13 - PrDC Saskatoon, SK by David Wesst
The document is a presentation about HTML5 that discusses its evolution and components. It defines HTML5 as using HTML, CSS, and JavaScript to update web standards for how the world currently uses the web. It outlines the main parts of HTML5 like new elements, tools for defining pages with HTML, styling with CSS compilers and frameworks, and interacting with JavaScript libraries, compilers, and APIs. It provides examples of using new HTML5 features and recommends further resources to learn more.
Capture, record, clip, embed and play, search: video from newbie to ninja by Vito Flavio Lorusso
This document provides an overview of building a video streaming solution using Azure Media Services. It discusses the key components involved including:
1. Creating Media Services and Storage accounts
2. Uploading videos as assets and encoding them
3. Generating thumbnails, subtitles and adaptive bitrate manifests
4. Creating a streaming endpoint and getting streaming URLs
5. Integrating with a web app using the Azure Media Player
The document also briefly covers integrating with Azure Search to enable video search functionality on the web app. It provides code samples for common tasks like uploading, encoding, and playing videos using Media Services and searching using Azure Search.
GraphQL, gRPC, REST, WebFlux, OData: there is a multitude of protocols and formats for implementing APIs on top of a database like Cassandra. At DataStax we have had the chance to implement all of them to try them out. I offer you a tour of the different solutions, with the pros, the cons, and above all a lot of code!
Getting started with Appcelerator Titanium by Techday7
"Getting started with Appcelerator Titanium" by Naga Harish M, Lead Developer of Anubavam Technologies, presented at the Techday7 event on cross-platform application development using Appcelerator Titanium.
This document discusses different types of cloud computing models including private, infrastructure as a service (IaaS), and platform as a service (PaaS). It provides an overview of the Microsoft Cloud including its global data centers, Windows Azure platform, categories of services, and how to get started with a free Windows Azure account.
Towards Functional Programming through Hexagonal ArchitectureCodelyTV
Slides of for the talk "Towards Functional Programming through Hexagonal Architecture" delivered at the Software Crafters Barcelona 2018 conference #scbcn18 by Juanma Serrano from Habla Computing and Javier Ferrer from CodelyTV
This document introduces Adam Tuliper and Christopher Harrison from Microsoft and provides an overview of their session on implementing Entity Framework with MVC. The session will cover introducing Entity Framework, beginning code first development, managing relationships and transactions, and integrating additional features. Attendees will learn how to use Entity Framework to access and manage data in an MVC application.
This document introduces Adam Tuliper and Christopher Harrison from Microsoft and provides an overview of their session on implementing Entity Framework with MVC. The session will cover introducing Entity Framework, beginning code first development, managing relationships and transactions, and integrating additional features. Attendees will learn how to use Entity Framework to access and manage data in an MVC application.
Acercándonos a la Programación Funcional a través de la Arquitectura Hexag...CodelyTV
Slides de la charla "Acercándonos a la Programación Funcional a través de la Arquitectura Hexagonal" en el meetup de Software Crafters Madrid conjuntamente con Scala Madrid el 21/11/2018. Descuento en cursos CodelyTV Pro por verla: http://bit.ly/codelytv19e
The document discusses upgrading the Zimmer Twins Drupal site from version 4.6 to 6. It describes stripping down contributed modules, upgrading the database from 4.6 to 4.7 and then 5 and finally to 6. Many custom modules needed to be converted or recreated for Drupal 6. Several contributed modules were also incorporated. Issues encountered included migrating the site's data from a Latin1 encoding to UTF-8 and slow user logins due to username comparisons.
This document provides an introduction to behavior-driven development (BDD). It defines BDD as a software development process based on test-driven development (TDD) that combines TDD techniques with ideas from domain-driven design and object-oriented analysis to promote communication and collaboration between developers and business analysts. The document outlines the BDD process of writing stories, scenarios, specifications, implementing features, verifying behavior, and refactoring. It provides examples of transforming stories into scenarios and scenarios into tests.
Similar to Cassandra Day Chicago 2015: Building Your First Application with Apache Cassandra (20)
Cassandra Day Chicago 2015: Building Your First Application with Apache Cassandra
1. Building Your First Application with Cassandra
Luke Tillman (@LukeTillman)
Language Evangelist at DataStax
2. Who are you?!
• Evangelist with a focus on the .NET Community
• Long-time .NET Developer
• Recently presented at Cassandra Summit 2014 with Microsoft
3. KillrVideo, a Video Sharing Site
• Think a YouTube competitor
– Users add videos, rate them, comment on them, etc.
– Can search for videos by tag
4. See the Live Demo, Get the Code
• Live demo available at http://www.killrvideo.com
– Written in C#
– Live Demo running in Azure
– Open source: https://github.com/luketillman/killrvideo-csharp
• Interesting use case because of different data modeling challenges and the scale of something like YouTube
– More than 1 billion unique users visit YouTube each month
– 100 hours of video are uploaded to YouTube every minute
5. Agenda
1 Think Before You Model
2 A Data Model for Cat Videos
3 Phase 2: Build the Application
4 Software Architecture, A Love Story
5 The Future
6. Think Before You Model
Or how to keep doing what you’re already doing
7. Getting to Know Your Data
• What things do I have in the system?
• What are the relationships between them?
• This is your conceptual data model
• You already do this in the RDBMS world
8. Some of the Entities and Relationships in KillrVideo
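• User: id, firstname, lastname, email, password
• Video: id, name, description, location, preview_image, tags
• Comment: id, comment
• Relationships (from the slide's entity-relationship diagram):
– User adds Video (1:n), with a timestamp
– User posts Comment (1:n), with a timestamp
– Video features Comment (1:n)
– User rates Video (n:m), with a rating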
9. Getting to Know Your Queries
• What are your application’s workflows?
• How will I access the data?
• Knowing your queries in advance is NOT optional
• Different from RDBMS because I can’t just JOIN or create new indexes to support new queries
10. Some Application Workflows in KillrVideo
• User logs into site
• Show basic information about user
• Show videos added by a user
• Show comments posted by a user
• Search for a video by tag
• Show latest videos added to the site
• Show comments for a video
• Show ratings for a video
• Show video and its details
11. Some Queries in KillrVideo to Support Workflows
Users
• User logs into site: find user by email address
• Show basic information about user: find user by id
Comments
• Show comments for a video: find comments by video (latest first)
• Show comments posted by a user: find comments by user (latest first)
Ratings
• Show ratings for a video: find ratings by video
12. Some Queries in KillrVideo to Support Workflows
Videos
• Search for a video by tag: find video by tag
• Show latest videos added to the site: find videos by date (latest first)
• Show video and its details: find video by id
• Show videos added by a user: find videos by user (latest first)
13. A Data Model for Cat Videos
Because the Internet loves ‘em some cat videos
14. Just How Popular are Cats on the Internet?
http://mashable.com/2013/07/08/cats-bacon-rule-internet/
15. Just How Popular are Cats on the Internet?
http://mashable.com/2013/07/08/cats-bacon-rule-internet/
16. Data Modeling Refresher
• Cassandra limits us to queries that can scale across many nodes
– Include value for Partition Key and optionally, Clustering Column(s)
• We know our queries, so we build tables to answer them
• Denormalize at write time to do as few reads as possible
• Many times we end up with a “table per query”
– Similar to materialized views from the RDBMS world
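The partition key rule above can be sketched as a small check. This is a simplified illustration, not how Cassandra itself validates queries: it ignores secondary indexes, ALLOW FILTERING, and range restrictions, and the function name `is_scalable_query` is hypothetical. The example key layout is the `user_videos` table that appears later in the deck.

```python
def is_scalable_query(partition_key, clustering_columns, restricted):
    """Return True if the restricted columns name every partition key column
    and, optionally, a leading prefix of the clustering columns."""
    restricted = set(restricted)
    # Every partition key column must be given a value.
    if not set(partition_key) <= restricted:
        return False
    remaining = restricted - set(partition_key)
    # Clustering columns may only be restricted in declared order, without gaps.
    prefix_len = 0
    for column in clustering_columns:
        if column in remaining:
            prefix_len += 1
        else:
            break
    return remaining == set(clustering_columns[:prefix_len])


# user_videos: PRIMARY KEY (userid, added_date, videoid)
pk, cc = ["userid"], ["added_date", "videoid"]
print(is_scalable_query(pk, cc, ["userid"]))                # partition key only
print(is_scalable_query(pk, cc, ["userid", "added_date"]))  # key plus clustering prefix
print(is_scalable_query(pk, cc, ["added_date"]))            # missing partition key
print(is_scalable_query(pk, cc, ["userid", "videoid"]))     # skips added_date
```

The first two calls describe queries the table can answer from a single partition; the last two are the kind of query that needs a different table.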
17. Users – The Relational Way
• Single Users table with all user data and an Id Primary Key
• Add an index on email address to allow queries by email
• Workflow: User logs into site. Query: find user by email address.
• Workflow: Show basic information about user. Query: find user by id.
18. Users – The Cassandra Way
• Workflow: User logs into site. Query: find user by email address.
• Workflow: Show basic information about user. Query: find user by id.
CREATE TABLE user_credentials (
    email text,
    password text,
    userid uuid,
    PRIMARY KEY (email)
);
CREATE TABLE users (
    userid uuid,
    firstname text,
    lastname text,
    email text,
    created_date timestamp,
    PRIMARY KEY (userid)
);
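A minimal sketch of how the two tables split the login and profile workflows, with plain dictionaries standing in for the Cassandra tables (each keyed by that table's partition key). The function names and the `password_hash` parameter are assumptions for illustration; the real application is written in C# against the DataStax driver.

```python
import uuid

# In-memory stand-ins for the two tables, keyed by their partition keys.
user_credentials = {}  # email  -> {"password": ..., "userid": ...}
users = {}             # userid -> {"firstname": ..., "lastname": ..., "email": ...}


def register(firstname, lastname, email, password_hash):
    """Write the same user to both tables (denormalize at write time)."""
    userid = uuid.uuid4()
    user_credentials[email] = {"password": password_hash, "userid": userid}
    users[userid] = {"firstname": firstname, "lastname": lastname, "email": email}
    return userid


def log_in(email, password_hash):
    """Find user by email address.
    Roughly: SELECT password, userid FROM user_credentials WHERE email = ?"""
    row = user_credentials.get(email)
    if row is None or row["password"] != password_hash:
        return None
    return row["userid"]


def get_profile(userid):
    """Find user by id.
    Roughly: SELECT firstname, lastname, email FROM users WHERE userid = ?"""
    return users.get(userid)


new_id = register("Sample", "User", "sample@example.com", "hashed-secret")
print(log_in("sample@example.com", "hashed-secret") == new_id)
print(get_profile(new_id)["firstname"])
```

Each workflow reads exactly one table by its partition key, which is the point of keeping two tables instead of one table plus an index.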
19. Videos Everywhere!
• Workflow: Show video and its details. Query: find video by id.
• Workflow: Show videos added by a user. Query: find videos by user (latest first).
CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name text,
    description text,
    location text,
    location_type int,
    preview_image_location text,
    tags set<text>,
    added_date timestamp,
    PRIMARY KEY (videoid)
);
CREATE TABLE user_videos (
    userid uuid,
    added_date timestamp,
    videoid uuid,
    name text,
    preview_image_location text,
    PRIMARY KEY (userid, added_date, videoid)
) WITH CLUSTERING ORDER BY (
    added_date DESC,
    videoid ASC
);
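As a rough sketch of what the two tables imply for the application: adding a video writes to both `videos` and `user_videos`, and the clustering order means a user's videos come back newest first. Dictionaries and a sorted list stand in for the tables here; the function names are hypothetical, and the sort only imitates the clustering order (Cassandra keeps rows ordered on disk rather than sorting per write).

```python
import uuid
from datetime import datetime

videos = {}       # videoid -> full video row
user_videos = {}  # userid  -> rows kept in (added_date DESC, videoid ASC) order


def add_video(userid, name, description, preview_image_location, added_date):
    """Denormalized write: one logical insert touches both tables."""
    videoid = uuid.uuid4()
    videos[videoid] = {
        "userid": userid,
        "name": name,
        "description": description,
        "preview_image_location": preview_image_location,
        "added_date": added_date,
    }
    rows = user_videos.setdefault(userid, [])
    rows.append({
        "added_date": added_date,
        "videoid": videoid,
        "name": name,
        "preview_image_location": preview_image_location,
    })
    # Imitate CLUSTERING ORDER BY (added_date DESC, videoid ASC) with two stable sorts.
    rows.sort(key=lambda r: r["videoid"].int)
    rows.sort(key=lambda r: r["added_date"], reverse=True)
    return videoid


def get_video(videoid):
    """Find video by id (single-partition read on videos)."""
    return videos.get(videoid)


def get_user_videos(userid, limit=10):
    """Find videos by user, latest first (single-partition read on user_videos)."""
    return user_videos.get(userid, [])[:limit]


uploader = uuid.uuid4()
add_video(uploader, "Cat video 1", "first", "/img/1.png", datetime(2015, 4, 1))
add_video(uploader, "Cat video 2", "second", "/img/2.png", datetime(2015, 4, 3))
add_video(uploader, "Cat video 3", "third", "/img/3.png", datetime(2015, 4, 2))
print([row["name"] for row in get_user_videos(uploader)])
# ['Cat video 2', 'Cat video 3', 'Cat video 1']
```

Note that `name` and `preview_image_location` are duplicated into `user_videos`, so a rename has to update both places.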
20. Videos Everywhere!
Considerations When Duplicating Data
• Can the data change?
• How likely is it to change or how frequently will it change?
• Do I have all the information I need to update duplicates and maintain consistency?
• Workflow: Search for a video by tag. Query: find video by tag.
• Workflow: Show latest videos added to the site. Query: find videos by date (latest first).
21. Modeling Relationships – Collection Types
• Cassandra doesn’t support JOINs, but your data will still have relationships (and you can still model that in Cassandra)
• One tool available is CQL collection types
CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name text,
    description text,
    location text,
    location_type int,
    preview_image_location text,
    tags set<text>,
    added_date timestamp,
    PRIMARY KEY (videoid)
);
22. Modeling Relationships – Client Side Joins
CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name text,
    description text,
    location text,
    location_type int,
    preview_image_location text,
    tags set<text>,
    added_date timestamp,
    PRIMARY KEY (videoid)
);
CREATE TABLE users (
    userid uuid,
    firstname text,
    lastname text,
    email text,
    created_date timestamp,
    PRIMARY KEY (userid)
);
• Currently requires a query for the video, followed by a query for the user by id based on the results of the first query
23. Modeling Relationships – Client Side Joins
• What is the cost? Might be OK at small scale
• Does NOT scale
• Avoid when possible
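To make the cost concrete, here is a minimal in-memory sketch. Plain Python dictionaries stand in for the videos and users tables; the sample rows and the helper names (get_row, video_with_author) are illustrative assumptions, not KillrVideo code. Every video shown needs a second lookup that cannot start until the first one returns.

```python
# In-memory stand-ins for the `videos` and `users` tables (made-up rows).
videos = {
    "v1": {"videoid": "v1", "userid": "u1", "name": "Intro to CQL"},
    "v2": {"videoid": "v2", "userid": "u2", "name": "Data Modeling"},
}
users = {
    "u1": {"userid": "u1", "firstname": "Ada", "lastname": "Lovelace"},
    "u2": {"userid": "u2", "firstname": "Alan", "lastname": "Turing"},
}

lookups = 0  # counts simulated round trips to the database


def get_row(table, key):
    """Simulate one query by primary key."""
    global lookups
    lookups += 1
    return table[key]


def video_with_author(videoid):
    """Client-side join: the second query depends on the first one's result."""
    video = get_row(videos, videoid)            # query 1: video by id
    author = get_row(users, video["userid"])    # query 2: user by id
    return {**video, "author": f"{author['firstname']} {author['lastname']}"}


# Rendering a page of two videos costs four lookups, not two.
page = [video_with_author(v) for v in ("v1", "v2")]
```

Copying the author's name into the video row at write time would remove the second lookup, at the price of keeping that duplicate up to date.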
25. Modeling Relationships – Client Side Joins
• Remember the considerations when you duplicate data
• What happens if a user changes their name or email address?
• Can I update the duplicated data?
26. Cassandra Rules Can Impact Your Design
• Video Ratings – use counters to track sum of all ratings and
count of ratings
• Counters are a good example of something with special rules
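-- Not allowed: counter columns cannot share a table with regular (non-key) columns
CREATE TABLE videos (
videoid uuid,
userid uuid,
name text,
description text,
...
rating_counter counter,
rating_total counter,
PRIMARY KEY (videoid)
);
-- Allowed: the counters live in their own table, keyed by videoid
CREATE TABLE video_ratings (
videoid uuid,
rating_counter counter,
rating_total counter,
PRIMARY KEY (videoid)
);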
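The two counters are enough to derive an average rating on the client. The sketch below assumes each new rating adds 1 to rating_counter and the star value to rating_total (in CQL, something like UPDATE video_ratings SET rating_counter = rating_counter + 1, rating_total = rating_total + ? WHERE videoid = ?). The function name and the sample ratings are hypothetical.

```python
def average_rating(rating_total, rating_counter):
    """Average star rating from the two counter values; None when unrated."""
    if rating_counter == 0:
        return None  # avoid dividing by zero for a video nobody has rated
    return rating_total / rating_counter


# Simulate three users rating the same video 5, 4 and 3 stars.
# Each rating is one increment of both counters.
rating_counter = 0
rating_total = 0
for stars in (5, 4, 3):
    rating_counter += 1
    rating_total += stars
```

Storing the sum and the count rather than the average keeps every write a pure increment, which is the only kind of update a counter column accepts.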
27. Single Nodes Have Limits Too
• Latest videos are bucketed by day
• Means all reads/writes to latest videos are going to the same partition (and thus the same nodes)
• Could create a hotspot
Show latest videos added to the site → Find videos by date (latest first)
CREATE TABLE latest_videos (
yyyymmdd text,
added_date timestamp,
videoid uuid,
name text,
preview_image_location text,
PRIMARY KEY (yyyymmdd,
added_date, videoid)
) WITH CLUSTERING ORDER BY (
added_date DESC,
videoid ASC
);
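A rough sketch of how an application might derive the yyyymmdd partition key and decide which day partitions to read for a "latest videos" page. Using UTC days and the two helper names are assumptions made for illustration; the slides do not say how KillrVideo computes the bucket.

```python
from datetime import date, datetime, timedelta, timezone


def day_bucket(added_date):
    """Partition key for latest_videos: the UTC day as yyyymmdd text."""
    return added_date.astimezone(timezone.utc).strftime("%Y%m%d")


def buckets_to_read(today, max_days):
    """Partition keys to query, newest day first.

    The caller would query each day in turn and stop as soon as it has
    collected enough rows for the page.
    """
    return [(today - timedelta(days=n)).strftime("%Y%m%d") for n in range(max_days)]


# A video added just after midnight UTC lands in that day's partition.
key = day_bucket(datetime(2015, 3, 1, 0, 30, tzinfo=timezone.utc))
days = buckets_to_read(date(2015, 3, 1), 3)
```

Because every write for the current day shares one key, all of today's traffic goes to the replicas for that single partition, which is the hotspot the slide warns about.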
28. Single Nodes Have Limits Too
• Mitigate by adding data to the Partition Key to spread load
• Data that’s already naturally a part of the domain
– Latest videos by category?
• Arbitrary data, like a bucket number
– Round robin at the app level
Show latest videos added to the site → Find videos by date (latest first)
CREATE TABLE latest_videos (
yyyymmdd text,
bucket_number int,
added_date timestamp,
videoid uuid,
name text,
preview_image_location text,
PRIMARY KEY (
(yyyymmdd, bucket_number),
added_date, videoid)
) ...
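One way the app-level round robin could look, as a sketch only: NUM_BUCKETS, the helper names, and the single-process counter are all assumptions. Writes rotate through the buckets, while a read has to query every bucket for the day and merge the results.

```python
from itertools import count

NUM_BUCKETS = 4  # assumed value; would be tuned to the write volume

_next_write = count()  # app-level round-robin counter (one process only)


def write_partition_key(yyyymmdd):
    """Pick the (yyyymmdd, bucket_number) partition for the next write."""
    return (yyyymmdd, next(_next_write) % NUM_BUCKETS)


def read_partition_keys(yyyymmdd):
    """All partitions that must be queried to see one day's videos."""
    return [(yyyymmdd, bucket) for bucket in range(NUM_BUCKETS)]


def merge_latest(per_bucket_rows, limit):
    """Merge rows from every bucket into one latest-first list.

    Each row is an (added_date, videoid) tuple.
    """
    merged = [row for rows in per_bucket_rows for row in rows]
    merged.sort(key=lambda row: row[0], reverse=True)
    return merged[:limit]
```

The trade-off is visible in the helpers: writes spread across NUM_BUCKETS partitions, but each read now fans out to NUM_BUCKETS queries and merges on the client.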
29. Phase 2: Build the Application
Phase 3: Profit
Phase 1: Data Model
30. The DataStax Drivers for Cassandra
• Currently Available
– C# (.NET)
– Python
– Java
– NodeJS
– Ruby
– C++
• Will Probably Happen
– PHP
– Scala
– JDBC
• Early Discussions
– Go
– Rust
• Open source, Apache 2 licensed, available on GitHub
– https://github.com/datastax/
31. The DataStax Drivers for Cassandra
Language Bootstrapping Code
C#
Cluster cluster = Cluster.Builder().AddContactPoint("127.0.0.1").Build();
ISession session = cluster.Connect("killrvideo");
Python
from cassandra.cluster import Cluster
cluster = Cluster(contact_points=['127.0.0.1'])
session = cluster.connect('killrvideo')
Java
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect("killrvideo");
NodeJS
var cassandra = require('cassandra-driver');
var client = new cassandra.Client({
contactPoints: ['127.0.0.1'], keyspace: 'killrvideo'
});
32. Use Prepared Statements
• Performance optimization for queries you run repeatedly
• Pay the cost of preparing once (causes roundtrip to Cassandra)
• KillrVideo: looking up a user’s credentials by email address
• Save and reuse the PreparedStatement instance after preparing
PreparedStatement prepared = session.Prepare(
"SELECT * FROM user_credentials WHERE email = ?");
33. Use Prepared Statements
• Bind variable values when ready to execute
• Execution only has to send variable values over the wire
• Cassandra doesn’t have to reparse the CQL string each time
• Remember: Prepare once, bind and execute many
BoundStatement bound = prepared.Bind("luke.tillman@datastax.com");
RowSet rows = await _session.ExecuteAsync(bound);
34. Batch Statements: Use and Misuse
• You can mix and match Simple/Bound statements in a batch
• Batches are Logged (atomic) by default
• Use when you want a group of mutations (statements) to all
succeed or all fail (denormalizing at write time)
• Large batches are an anti-pattern (Cassandra will warn you)
• Not a performance optimization for bulk-loading data
35. KillrVideo: Update a Video’s Name with a Batch
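using System.Threading.Tasks;
using Cassandra;

public class VideoCatalogDataAccess
{
    private readonly ISession _session;
    private readonly PreparedStatement _prepared;

    public VideoCatalogDataAccess(ISession session)
    {
        _session = session;
        // user_videos is keyed by (userid, added_date, videoid), so the UPDATE
        // must restrict every primary key column, including added_date
        _prepared = _session.Prepare(
            "UPDATE user_videos SET name = ? WHERE userid = ? AND added_date = ? AND videoid = ?");
    }

    public async Task UpdateVideoName(UpdateVideoDto video)
    {
        // Assumption: UpdateVideoDto also carries the video's AddedDate
        BoundStatement bound = _prepared.Bind(video.Name, video.UserId, video.AddedDate, video.VideoId);
        var simple = new SimpleStatement("UPDATE videos SET name = ? WHERE videoid = ?",
            video.Name, video.VideoId);

        // Use an atomic batch to send over all the mutations
        var batchStatement = new BatchStatement();
        batchStatement.Add(bound);
        batchStatement.Add(simple);
        await _session.ExecuteAsync(batchStatement);
    }
}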
36. Lightweight Transactions when you need them
• Use when you don’t want writes to step on each other
– Sometimes called Linearizable Consistency
– Similar to the Serializable isolation level in an RDBMS
• Essentially a Compare and Set (CAS) operation using Paxos
• Read the fine print: has a latency cost associated with it
• The canonical example: unique user accounts
37. KillrVideo: LWT to create user accounts
• Returns a column called [applied] indicating success/failure
• Different from relational world where you might expect an
Exception (e.g. PrimaryKeyViolationException or similar)
string cql = "INSERT INTO user_credentials (email, password, userid) " +
"VALUES (?, ?, ?) IF NOT EXISTS";
var statement = new SimpleStatement(cql, user.Email, hashedPassword, user.UserId);
RowSet rows = await _session.ExecuteAsync(statement);
var userInserted = rows.Single().GetValue<bool>("[applied]");
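The calling pattern can be illustrated without a cluster. The sketch below mimics only the result shape of INSERT ... IF NOT EXISTS with a Python dictionary; it is single-process, involves no Paxos, and the function name is hypothetical. When the insert is not applied, the returned row carries the existing values alongside [applied] = false, so the caller branches on a flag instead of catching an exception.

```python
user_credentials = {}  # email -> row; stands in for the user_credentials table


def insert_if_not_exists(email, password_hash, userid):
    """Mimic the result row of INSERT ... IF NOT EXISTS.

    Returns {"[applied]": True} on success, or {"[applied]": False} plus
    the existing row when the email is already taken. Nothing is raised.
    """
    existing = user_credentials.get(email)
    if existing is not None:
        return {"[applied]": False, **existing}
    user_credentials[email] = {
        "email": email,
        "password": password_hash,
        "userid": userid,
    }
    return {"[applied]": True}


first = insert_if_not_exists("a@example.com", "hash-1", "u1")
second = insert_if_not_exists("a@example.com", "hash-2", "u2")  # same email
```

The second call leaves the stored row untouched, which is what makes the pattern suitable for enforcing unique user accounts.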
39. KillrVideo Logical Architecture
• Browser
– Web UI: HTML5 / JavaScript
• Server
– KillrVideo MVC App: serves up Web UI HTML and handles JSON requests from Web UI
• Services
– Comments: tracks comments on videos by users
– Uploads: handles processing, storing, and encoding uploaded videos
– Video Catalog: tracks the catalog of available videos
– User Management: user accounts, login credentials, profiles
• Infrastructure
– Cassandra Cluster (DSE): app data storage for services (e.g. users, comments)
– DataStax OpsCenter: management, provisioning, and monitoring
– Azure Media Services: video encoding, thumbnail generation
– Azure Storage (Blob, Queue): video file and thumbnail image storage
– Azure Service Bus: published events from services for interactions
40. Inside a Simple Service: Video Catalog
• Video Catalog: tracks the catalog of available videos
• Cassandra Cluster (DSE): app data storage for services (e.g. users, comments)
• Azure Service Bus: published events from services for interactions
41. Inside a Simple Service: Video Catalog
Video Catalog and Cassandra Cluster (DSE)
• Stores metadata about videos in Cassandra (e.g. name, description, location, thumbnail location, etc.)
42. Inside a Simple Service: Video Catalog
Video Catalog and Azure Service Bus
• Publishes events about interesting things that happen (e.g. YouTubeVideoAdded, UploadedVideoAccepted, etc.)
43. Inside a More Complicated Service: Uploads
• Uploads: handles processing, storing, and encoding uploaded videos
• Cassandra Cluster (DSE): app data storage for services (e.g. users, comments)
• Azure Storage (Blob, Queue): video file and thumbnail image storage
• Azure Media Services: video encoding, thumbnail generation
• Azure Service Bus: published events from services for interactions
44. Inside a More Complicated Service: Uploads
Uploads and Cassandra Cluster (DSE)
• Stores data about uploaded video file locations, encoding jobs, job status, etc.
45. Inside a More Complicated Service: Uploads
Uploads and Azure Storage (Blob, Queue)
• Stores original and re-encoded video file assets, as well as thumbnail preview images generated
46. Inside a More Complicated Service: Uploads
Uploads and Azure Media Services
• Re-encodes uploaded videos to a format suitable for the web, generates thumbnail image previews
47. Inside a More Complicated Service: Uploads
Uploads and Azure Service Bus
• Publishes events about interesting things that happen (e.g. UploadedVideoPublished, etc.)
48. Event Driven Architecture
• Only the application(s) give commands
• Decoupled: Pub-sub messaging to tell other parts of the system something interesting happened
• Services could be deployed, scaled, and versioned independently (AKA microservices)
Services connected through Azure Service Bus: User Management, Comments, Video Ratings, Sample Data, Search, Statistics, Suggested Videos, Uploads, Video Catalog
49. Event Driven Architecture
Video Catalog (publishing to Azure Service Bus): “Hey, I added this new YouTube video to the catalog!”
Other services on the bus: Search, Suggested Videos
50. Event Driven Architecture
Video Catalog (publishing to Azure Service Bus): “Hey, I added this new YouTube video to the catalog!”
Suggested Videos: “Time to figure out what videos to suggest for that new video.”
Search: “Better index that new video so it shows up in search results.”
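The publish/subscribe flow on these slides can be sketched with a tiny in-process bus. This is a stand-in for illustration, not Azure Service Bus and not KillrVideo's code: the EventBus class, the handler lists, and the sample video id are assumptions. Only the event name YouTubeVideoAdded comes from the talk.

```python
from collections import defaultdict


class EventBus:
    """Tiny in-process publish/subscribe bus (stand-in for a message bus)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        """Register a handler to be called for every event of this type."""
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        """Deliver the payload to every subscriber of this event type."""
        for handler in self._handlers[event_type]:
            handler(payload)


bus = EventBus()
search_index = []      # what the Search service would index
suggestion_queue = []  # what Suggested Videos would process

# Search and Suggested Videos each react to the same event independently.
bus.subscribe("YouTubeVideoAdded", lambda e: search_index.append(e["videoid"]))
bus.subscribe("YouTubeVideoAdded", lambda e: suggestion_queue.append(e["videoid"]))

# Video Catalog announces what happened; it never calls the other services.
bus.publish("YouTubeVideoAdded", {"videoid": "v42", "name": "Demo video"})
```

The publisher knows nothing about its subscribers, so a new consumer can be added without touching the Video Catalog service.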
53. Where do we go with KillrVideo from here?
• Spark or AzureML for video suggestions
• Video search via Solr
• Actors that store state in C* (Akka.NET or Orleans)
• Storing file data (thumbnails, profile pics) in C* using pithos