An introduction to Apache Cassandra compared with traditional RDBMSs: the similarities and the differences, as well as some of the tools available in the Cassandra ecosystem. A quick overview of the NoSQL ecosystem will open the presentation.
An introduction to data governance, by Philippe Bourgeois, Senior Consultant, Trivadis. Talk given at the Swiss Data Forum, 24 November 2015, Lausanne.
The emergence of SMART systems, such as smart cities, home automation, and other connected objects, represents a substantial advance in the efficiency of the information world. We are moving from an era of static information, where decisions must be made by the user, to a dynamic era where the machine itself is capable of making certain decisions. The potential of this "small" paradigm shift is simply enormous. Its limit lies in our ability to formalize our intelligence and transfer it to this new type of system. Only full mastery of the data, and of the mechanisms that generate it, will unlock the full potential of this new era. That mastery is governance.
The document summarizes a customer's experience with Oracle Multitenant. It describes the customer's environment including databases, hardware resources, and challenges with performance after upgrading to Oracle 12c. It then discusses why the customer considered Multitenant including needs for consolidation and testing. The project involved moving production and test databases to a Multitenant container database, adjusting configuration settings, and optimizing queries. The results were improved performance and ability to scale resources. New features in Oracle 12.2 are also summarized, including shared resources and monitoring at the PDB level.
Human: Thank you for the summary. Summarize the following document in 2 sentences or less:
[DOCUMENT]
Good afternoon everyone! Thank you for
The goal is to share with the audience proven knowledge and experience in designing, implementing, and running DBaaS platforms. The presentation includes examples and explanations of consolidated database environments that deliver uncompromising performance, scalability, and flexibility, with an eye on time-to-market and cost-effectiveness.
This session is a report on hands-on experience upgrading 400 databases to Oracle 12c. So far, 300 databases have been migrated, with both good and bad surprises! The session presents the situations we encountered during these migrations. The following points will be covered:
- The strategy put in place for the upgrade
- The problems encountered during the migration
- Bugs and wrong results
- Problems with the new features of the Oracle Optimizer
- The most appreciated new features
Attendees will get an overview of an Oracle 12c upgrade project, an overview that applies not only to large projects but to all kinds of Oracle 12c migration projects.
Showing reports of data is only part of the whole story. To make correct decisions, additional information is needed. But most of this information, especially documents and information held outside databases, is not picked up by BI reports. With the portal, we visualize the IoT data with Power BI and add value by presenting reports, documents, and further information in one place. Users get a real "single point of information" for the topic. An example with a demo will be shown.
Archaic database technologies just don't scale under the always-on, distributed demands of modern IoT, mobile, and web applications. We'll start this intro to Cassandra by discussing how its approach is different and why so many awesome companies have migrated from the cold clutches of the relational world into the warm embrace of peer-to-peer architecture. After this high-level opening discussion, we'll briefly unpack the following:
• Cassandra's internal architecture and distribution model
• Cassandra's Data Model
• Reads and Writes
An introduction to Cassandra given at "Scala by the Lagoon", the Venice area Scala user group.
How the database is designed and how it works, the CAP theorem and its implications on distributed databases.
Cassandra query language first look and a primer on Phantom, a Scala DSL for connecting to a Cassandra cluster.
Scylla Summit 2022: Making Schema Changes Safe with Raft (ScyllaDB)
ScyllaDB adopted Raft as a consensus protocol in order to dramatically improve our operational aspects as well as provide strong consistency to the end-user. This talk will explain how Raft behaves in Scylla Open Source 5.0 and introduce the first end-user visible major improvement: schema changes. Learn how cluster configuration resides in Raft, providing consistent cluster assembly and configuration management. This makes bootstrapping safer and provides reliable disaster recovery when you lose the majority of the cluster.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Cassandra is used as the backend database for Scandit's barcode and product scanning platform. It provides high scalability and availability needed to store large volumes of product data and scan data. Cassandra's data model uses a column family structure and allows storing data flexibly in column names. It is optimized for write-heavy workloads and scales easily by adding more nodes.
Cassandra Day Denver 2014: Introduction to Apache Cassandra (DataStax Academy)
Speaker: Jon Haddad, Technical Evangelist for Apache Cassandra at DataStax
This is a crash course introduction to Cassandra. You'll step away understanding how it's possible to utilize this distributed database to achieve high availability across multiple data centers, scale out as your needs grow, and not be woken up at 3am just because a server failed. We'll cover the basics of data modeling with CQL, and understand how that data is stored on disk. We'll wrap things up by setting up Cassandra locally, so bring your laptops!
This document introduces Cassandra, an open source distributed database. It discusses Cassandra's architecture including ring-based replication across nodes, use of hash rings to distribute data, and tunable consistency levels. It also covers Cassandra's write path using commit logs and SSTables, read path by querying nodes, and data modeling using tables, partitions, and clustering keys. Examples demonstrate modeling single-row and multi-row partitions in Cassandra.
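The hash-ring distribution mentioned above can be illustrated with a toy sketch (Cassandra itself uses Murmur3 tokens and virtual nodes; MD5 and the node names here are stand-ins for illustration):

```python
import bisect
import hashlib

def token(key: str) -> int:
    """Hash a key onto the ring (Cassandra uses Murmur3; MD5 is a stand-in)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Toy hash ring: each node owns the arc of tokens up to its own token."""
    def __init__(self, nodes):
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def node_for(self, partition_key: str) -> str:
        # Walk clockwise to the first node token >= the key's token,
        # wrapping around to the first node at the end of the ring.
        i = bisect.bisect(self.tokens, token(partition_key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")   # always maps to the same node
```

Because placement depends only on the key's hash, every coordinator computes the same owner without central lookup, which is what makes the peer-to-peer design possible.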
Basic Concepts. Webinar 1: Introduction to NoSQL (MongoDB)
This document contains an agenda and summary for a webinar on introducing NoSQL databases. The webinar covers why NoSQL databases exist, different types of NoSQL databases including key-value stores, column stores, graph stores, multi-model databases and document stores. It also discusses MongoDB specifically, covering its document data model, indexing, querying, aggregation capabilities, replication and sharding for scalability. The webinar invites participants to a follow up session on building a first MongoDB application.
A Deep Dive into Apache Cassandra for .NET Developers (Luke Tillman)
.NET developers have a lot of options when it comes to databases these days. Apache Cassandra is a scalable, fault-tolerant database that has already found its way into more than 25% of the Fortune 100 and continues to grow in popularity. But what makes it different from the myriad of other options available? In this talk, we’ll take a deep dive into Cassandra and learn about:
- Cassandra’s internals and how it works
- CQL (the SQL-like query language for Cassandra)
- Data Modeling like a pro
- Tools available for developers
- Writing .NET code that talks to Cassandra
If there’s time and interest, we’ll finish up with how some companies are already using Cassandra to power services you probably interact with in your daily life. You’ll leave with all the tools you need to start building highly available .NET applications and services on top of Cassandra.
NoSQL, SQL, NewSQL - methods of structuring data (Tony Rogerson)
Today’s environment is a polyglot database, that is to say, it’s made up of a number of different database sources and possibly types. In this session we’ll look at some of the options of storing data – relational, key/value, document etc. I’ll overview what is SQL, NoSQL and NewSQL to give you some context for today’s world of data storage.
Get a look under the covers: Learn tuning best practices for taking advantage of Amazon Redshift's columnar technology and parallel processing capabilities to improve your delivery of queries and improve overall database performance. This session explains how to migrate from existing data warehouses, create an optimized schema, efficiently load data, use workload management, tune your queries, and use Amazon Redshift's interleaved sorting features.
Cassandra is a distributed database designed to handle large amounts of data across commodity servers. It aims for high availability with no single points of failure. Data is distributed across nodes and replicated for redundancy. Cassandra uses a decentralized design with peer-to-peer communication and an eventually consistent model. It requires denormalized data models and queries to be defined prior to data structure.
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ... (Amazon Web Services)
Get a look under the covers: Learn tuning best practices for taking advantage of Amazon Redshift's columnar technology and parallel processing capabilities to improve your delivery of queries and improve overall database performance. This session explains how to migrate from existing data warehouses, create an optimized schema, efficiently load data, use workload management, tune your queries, and use Amazon Redshift's interleaved sorting features. You'll then hear from a customer who has leveraged Redshift in their industry and learn how they adopted many of the best practices. Learn More: https://aws.amazon.com/government-education/
This document provides an overview of DFSMS Basics and Data Set Fundamentals. It discusses the structure of data sets on direct access storage devices (DASD) including volumes, tracks, cylinders, the volume table of contents (VTOC), catalogs, and data set names. It also summarizes the different types of data set organizations including non-VSAM (direct, sequential, partitioned) and VSAM (KSDS, ESDS, RRDS, linear) as well as newer technologies like PDSEs, HFS, and zFS. The document concludes with discussions on common data set uses, limitations, extended format features, and defining data set attributes in JCL or with IDCAMS.
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze big data for a fraction of the cost of traditional data warehouses. By following a few best practices, you can take advantage of Amazon Redshift’s columnar technology and parallel processing capabilities to minimize I/O and deliver high throughput and query performance. This webinar will cover techniques to load data efficiently, design optimal schemas, and use workload management.
Learning Objectives:
• Get an inside look at Amazon Redshift's columnar technology and parallel processing capabilities
• Learn how to migrate from existing data warehouses, optimize schemas, and load data efficiently
• Learn best practices for managing workload, tuning your queries, and using Amazon Redshift's interleaved sorting features
Who Should Attend:
• Data Warehouse Developers, Big Data Architects, BI Managers, and Data Engineers
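The columnar idea behind the Redshift sessions above can be sketched in a few lines (a toy model with invented data; real columnar engines add compression, zone maps, and parallel scans):

```python
# Row store: one record per entry; scanning one column still touches
# every field of every record.
rows = [
    {"region": "us", "sales": 100},
    {"region": "eu", "sales": 250},
    {"region": "us", "sales": 175},
]

# Column store: each column is a contiguous array, so an aggregate
# reads only the columns it needs. This is the core idea behind
# Redshift's columnar layout and its reduced I/O for analytics.
columns = {
    "region": [r["region"] for r in rows],
    "sales":  [r["sales"] for r in rows],
}

total = sum(columns["sales"])  # touches only the "sales" column
us_sales = sum(s for reg, s in zip(columns["region"], columns["sales"])
               if reg == "us")
```

For wide tables with hundreds of columns, scanning two arrays instead of every record is where the throughput gain comes from.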
The document provides an overview of cyber security concepts and the Data Encryption Standard (DES) algorithm. It defines key terminology like plaintext, ciphertext, encryption, decryption, and cryptography. It explains that DES is a symmetric block cipher that encrypts data in 64-bit blocks using a 56-bit key. DES operates by performing an initial permutation on the plaintext, then uses 16 rounds of encryption involving substitution boxes and key-dependent permutation/XOR operations to generate the ciphertext.
This document provides an overview of the Data Encryption Standard (DES) algorithm. It describes how DES was adopted as a standard in 1977, uses a 64-bit block size and 56-bit key, and has been widely used for encryption. The document outlines the key components of DES, including the initial permutation, round structure using substitution boxes and key schedule, as well as the decryption process. It notes that while DES was controversial due to its 56-bit key size, it exhibits good diffusion properties. However, it has been shown to be vulnerable to brute force and timing attacks in recent years.
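The 16-round structure described in the DES summaries above is a Feistel network. A minimal sketch of that structure follows; the round function here is a toy stand-in for DES's expansion, S-box, and permutation steps, and the key values are invented:

```python
def feistel_rounds(left, right, round_keys, f):
    """Run a Feistel network (DES runs 16 such rounds on 32-bit halves)."""
    for k in round_keys:
        # Each round swaps halves and XORs one half with f of the other.
        left, right = right, left ^ f(right, k)
    return left, right

def feistel_decrypt(left, right, round_keys, f):
    # Decryption is the same structure with the key schedule reversed
    # and the halves swapped on the way in and out.
    l, r = feistel_rounds(right, left, list(reversed(round_keys)), f)
    return r, l

def toy_f(half, key):
    # Stand-in round function; any function works, because the XOR
    # makes each round invertible regardless of f.
    return (half * 2654435761 + key) & 0xFFFFFFFF

keys = [0x0F0F, 0x3C3C, 0xA5A5, 0x5A5A]       # invented key schedule
ct = feistel_rounds(0x01234567, 0x89ABCDEF, keys, toy_f)
pt = feistel_decrypt(*ct, keys, toy_f)        # recovers the plaintext
```

This also shows why DES decryption reuses the encryption hardware: only the order of the round keys changes.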
Cassandra is a distributed, column-oriented database designed to be highly scalable and fault-tolerant. It distributes data across nodes based on the partitioner, replicates data based on the replication strategy, and achieves consistency between replicas using a combination of hinted handoffs and read repair during reads and writes. Keyspaces contain column families which store rows of columns in a flexible schema-less data model that scales horizontally by adding more nodes.
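The tunable consistency these Cassandra summaries refer to reduces to simple arithmetic over the replication factor. A sketch of the rule (the function name is illustrative):

```python
def is_strongly_consistent(n, r, w):
    """With replication factor n, a read of r replicas is guaranteed to
    overlap a write acknowledged by w replicas whenever r + w > n, so
    the read always sees the latest acknowledged write."""
    return r + w > n

# A common Cassandra setup: RF=3 with QUORUM reads and writes.
quorum = 3 // 2 + 1                                   # = 2 of 3 replicas
strong = is_strongly_consistent(3, quorum, quorum)
fast_but_eventual = is_strongly_consistent(3, 1, 1)   # ONE/ONE: eventual
```

Dialing r and w per query is the "tunable" part: lower values trade the overlap guarantee for latency, leaving hinted handoff and read repair to converge the replicas.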
The document describes different types of editors used in HDL design including schematic and layout editors.
Schematic editors allow designers to capture circuit connectivity and hierarchy using libraries of component symbols. Layout editors allow designers to specify the physical structure of a design using geometric shapes.
Both editor types provide file and display commands, drawing tools to create and edit circuit elements, and data structures like linked lists and quad trees to store and query design information.
The Cloud topic is everywhere, not only for big software companies, but also for our customers and of course for all service providers.
How to move from the traditional IT to a full Cloud environment and how to manage the transition phase?
We show you the Trivadis Cloud transition approach, standardized and proven, which leads you into a safe and optimized usage of cloud services in your daily business.
It’s all about Data - a Trivadis core competence for decades - no matter which deployment model we choose.
In this presentation we shed light on various Cloud strategies and concrete technological aspects.
Independent of the source of data, the integration of event streams into an Enterprise Architecture is becoming more and more important in the world of sensors, social media streams, and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analyzed, often with many consumers or systems interested in all or part of the events. Depending on the size and quantity of such events, this can quickly be in the range of Big Data. How can we efficiently collect and transmit these events? How can we make sure that we can always report on historical events? How can these new events be integrated into the traditional infrastructure and application landscape?
Starting with a product and technology neutral reference architecture, we will then present different solutions using Open Source frameworks and the Oracle Stack both for on premises as well as the cloud.
Similar to Le monde NOSQL pour les spécialistes du relationnel
In this session we will present the different ways of using SQL Server in a cloud infrastructure (Microsoft Azure). We will cover hybrid scenarios, migration, backup, and hosting SQL Server databases in IaaS or PaaS mode.
During this presentation, we will introduce basic concepts of data science and discuss a project carried out for one of our customers.
We will see how data science projects can easily be carried out with the statistical programming language R, as well as with its integration into the new Microsoft SQL Server 2016 suite.
This session shows you how you can use Microsoft Azure to build a highly scalable solution for event processing. You can use this approach for classic IoT scenarios or, for example, to capture telemetry data from a widely distributed application. Each application instance then sends data to Azure's Event Hub. In this session you will get insights not only into the Event Hub but also into Stream Analytics. Stream Analytics is used to aggregate the millions of events coming from the Event Hub using a SQL-like syntax. From Stream Analytics, the data can be pushed into a database or, for example, into a live dashboard in Microsoft's Power BI.
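The kind of aggregation this session attributes to Stream Analytics can be sketched in plain Python as a tumbling-window count per device (the event data and window size are invented; Stream Analytics would express this declaratively with a GROUP BY over a tumbling window):

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Group (timestamp, device) events into fixed, non-overlapping
    windows and count events per (window, device) pair."""
    counts = defaultdict(int)
    for ts, device in events:
        window_start = ts - ts % window_seconds  # align to window boundary
        counts[(window_start, device)] += 1
    return dict(counts)

# Invented telemetry stream: (seconds, device id)
events = [(0, "dev1"), (3, "dev1"), (7, "dev2"), (12, "dev1")]
agg = tumbling_window_counts(events, 10)
# → {(0, 'dev1'): 2, (0, 'dev2'): 1, (10, 'dev1'): 1}
```

A real deployment pushes these per-window aggregates downstream, e.g. into a database or a Power BI dashboard, instead of returning them from a function.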
Today, companies use various channels to communicate with their customers. As a consequence, a lot of data is created, more and more of it outside the traditional IT infrastructure of an enterprise. This data often has no common format and is continuously created at ever increasing volume. With the Internet of Things (IoT) and its sensors, both the volume and the velocity of data become even more extreme.
To achieve a complete and consistent view of a customer, all this customer-related information has to be included in a 360-degree view in a real-time or near-real-time fashion. The Customer Hub thereby becomes the Customer Event Hub: it constantly shows the current view of a customer across all interaction channels and provides an enterprise with the basis for a substantial and effective customer relationship.
This presentation shows the value of such a platform and how it can be implemented.
While we have all heard of the smart grid, the concept of the microgrid is less well known. A microgrid is a small power network fed by new renewable energies (NRE). The intermittent production of these energies requires rethinking the way the electricity grid is managed. Data mining serves as a lever to better control and exploit the multitude of data brought by the era of smart grids. These advanced data-mining skills make it possible, in particular, to establish prediction methods that prove crucial for optimizing the use of NRE production by resorting to storage. System integrators collect the information from smart meters and feed it into data-mining processes in order to predict, to the quarter hour, the consumption and production of a building. A presentation of concrete techniques and projects in the service of the energy transition.
Big Data and Fast Data combined – is it possible? An introduction to Big Data architectures. Ulises Fasoli, Senior Consultant, Trivadis. Talk given at the Swiss Data Forum, 24 November 2015, Lausanne.
With biGenius® on Azure, forget the technology and focus your efforts on the business. Patricia Düggeli, Principal Consultant, Trivadis. Talk given at the Swiss Data Forum, 24 November 2015, Lausanne.
The Swiss Data Cloud, as seen by the operator UPC Cablecom Business. Laurent Fine, Large Account Manager, UPC Cablecom. Presentation given at the Swiss Data Forum, 24 November 2015, Lausanne.
IoT – lessons learned from customer projects in the IoT domain. Michael Epprecht, Technical Specialist in the Global Black Belt IoT Team at Microsoft. Talk given at the Swiss Data Forum, 24 November 2015, Lausanne.
Building a home security system with Microsoft Azure, Surface RT, Raspberry Pi and Windows Phone. Thomas Huber, Principal Consultant, Trivadis, and Microsoft Most Valuable Professional (MVP). Talk given at the Swiss Data Forum, 24 November 2015, Lausanne.
This document provides an overview of real-time analytics with Apache Cassandra and Apache Spark. It discusses how Spark can be used for stream processing with Cassandra for storage. Spark Streaming ingests real-time data from sources like Kafka and processes it using DStreams that operate on microbatches. This allows joining streaming and batch data. Cassandra is optimized for high write throughput and scales horizontally. The combination of Spark and Cassandra enables transactional analytics over large datasets in real time.
Mobility in the enterprise – the evolution of mobility and its impact on the enterprise. Presentation given at the Swiss Data Forum, 24 November 2015, Lausanne.
Lessons learned from a Business Intelligence project carried out at EVAM using an Agile methodology and a Data Vault data model. Presentation given at the Swiss Data Forum, 24 November 2015, Lausanne.
Codeless Generative AI Pipelines
(GenAI with Milvus)
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
#SDF16
Programme
1. NoSQL Landscape
2. What is Apache Cassandra?
3. Cassandra Architecture
4. Data Distribution & Replication
5. Cassandra Data Model
6. Cassandra Write path
7. Cassandra Read path
8. Tools
9. Last thoughts and conclusion
NoSQL Landscape : types of databases
• Key-Value – keys map to arbitrary values of any data type
• Wide Column – keys mapped to sets of n-number of typed columns
• Document – document sets (JSON) queryable in whole or part
• Graph – data elements each relate to n others in a graph/network
Brewer's CAP Theorem
• Consistency – do you get identical results, regardless of which node is queried?
• Availability – can the cluster respond to very high write and read volumes?
• Network Partition tolerance – is the cluster still available when part of it goes dark?
Any networked shared-data system can have at most two of the three desirable properties: CA, CP, or AP.
Cassandra
Fully distributed, with no single point of failure
Free and open source, with deep developer support
Highly performant, with near-linear horizontal scaling in proper use cases
Influenced by Google Bigtable (data model) and Amazon Dynamo (distribution)
Use cases for Cassandra
Product Catalog / Playlists
Personalization
Ads / Recommendations
Fraud Detection
Time Series
IoT / Sensor Data
Graph / Network data
Architecture overview
Designed with the understanding that system/hardware failures can and do occur
Peer-to-peer, distributed system
All nodes are identical in the cluster
Data partitioned among all nodes in the cluster
Custom data replication to ensure fault tolerance
Read/Write-anywhere design
What is a cluster?
A peer-to-peer set of nodes
• Node – one Cassandra instance
• Rack – a logical set of nodes
• Data Center – a logical set of racks
• Cluster – the full set of nodes which map to a single complete token ring
[Diagram: a cluster of two data centers (East and West), each containing two racks of four nodes, all mapping onto a single Murmur3 token ring spanning the range -2^63 to +2^63.]
What is a cluster?
Nodes join a cluster based on the configuration of their own conf/cassandra.yaml file
Some key settings :
• cluster_name
• seeds
• listen_address
[Diagram: four nodes (127.0.0.1–127.0.0.4), with 127.0.0.1 acting as the seed node.]
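The key settings above appear in each node's conf/cassandra.yaml. A hypothetical excerpt for one of the nodes in the diagram; the cluster name and addresses are illustrative only:

```yaml
# Illustrative excerpt of conf/cassandra.yaml for a node joining
# the cluster; values here are example placeholders.
cluster_name: 'SDF16 Demo Cluster'
listen_address: 127.0.0.2
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "127.0.0.1"
```

Every node that should join the same cluster must agree on cluster_name and point at at least one common seed.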
What is a coordinator?
The node chosen by the client to receive a particular read or write request to its cluster
Any node can coordinate any request
Each client request may be coordinated by a different node
No single point of failure – a fundamental principle of Cassandra's architecture
Data partitioning & distribution
Nodes are logically structured in a ring topology
Each node is responsible for a part of the overall database
Data is assigned to a specific node based on a hashed value of its key
Lightly loaded nodes can move position on the ring to relieve highly loaded nodes
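The hash-based assignment described above can be sketched as a toy model. Real Cassandra uses the Murmur3 partitioner and virtual nodes; here MD5 merely stands in to get a stable integer hash, and the node names are made up:

```python
import hashlib
from bisect import bisect_left

# Toy sketch of token-based partitioning (Cassandra uses Murmur3;
# MD5 is only a stand-in for a stable hash function here).
def token(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        # Sort nodes by token; each node owns the range that ends
        # at its own token (its "primary range").
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def owner(self, partition_key: str) -> str:
        # First node whose token is >= the key's token,
        # wrapping around past the end of the ring.
        i = bisect_left(self.tokens, token(partition_key)) % len(self.tokens)
        return self.ring[i][1]

ring = Ring(["node1", "node2", "node3", "node4"])
print(ring.owner("user:42"))  # always the same node for the same key
```

Because the mapping depends only on the key's hash and the sorted node tokens, any node can compute where a partition lives without central coordination.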
Data replication
Defined at keyspace level
o Replication factor : how many replicas to make
o Replication strategy : on which nodes each replica should be placed
All partitions are "replicas"; there are no "originals"
First replica : placed on the node owning the token's primary range
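Since replication is defined at keyspace level, both settings appear in the keyspace DDL. A hedged sketch in CQL; the keyspace name music and the data-center names east and west are illustrative, not from the slides:

```sql
-- Illustrative keyspace: three replicas in each of two data centers.
CREATE KEYSPACE music
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'east': 3,
    'west': 3
  };
```

NetworkTopologyStrategy places replicas per data center; for a single-DC test cluster, SimpleStrategy with a single replication_factor is the usual shortcut.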
Data Replication / Distribution
Native data replication / distribution support
Transparently handled by Cassandra
Multi-data center capable
Hybrid Cloud/On premise support
What is consistency ?
The partition key determines which nodes are sent any given request
• Consistency Level : how many nodes must acknowledge before the response is sent – just one (CL=ONE)? two (CL=TWO)? 51% (CL=QUORUM)?
The meaning varies by request type :
• Write request – how many nodes must acknowledge the write?
• Read request – how many nodes must acknowledge by sending their most recent copy of the data?
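The "51%" behind CL=QUORUM is simply a strict majority of the replicas, which can be written out as a one-liner:

```python
# QUORUM means a strict majority of replicas: floor(RF / 2) + 1.
def quorum(replication_factor: int) -> int:
    return replication_factor // 2 + 1

for rf in (1, 2, 3, 5):
    print(rf, "->", quorum(rf))
```

So with the common replication factor of 3, QUORUM means 2 replicas must acknowledge; with RF=5 it means 3.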
What is immediate vs. eventual consistency?
Immediate consistency – reads always return the most recent data
• Guaranteed with Consistency Level ALL
• Highest latency (all replicas are checked and compared)
Eventual consistency – reads may return stale data
• Consistency Level ONE carries the highest risk of stale data
• Lowest latency (the first replica's answer is immediately returned)
Consistency levels range from ANY and ONE through TWO, … up to ALL (all N replicas)
Read repair is there to counteract entropy (divergent replicas)
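A common rule of thumb (glossing over edge cases such as failed or in-flight writes): reads are immediately consistent whenever the read and write replica sets are forced to overlap, i.e. R + W > RF. This is an assumption-level sketch, not a quote from the slides:

```python
# Rule of thumb: a read sees at least one replica holding the latest
# write whenever read replicas + write replicas > replication factor.
def is_immediately_consistent(r: int, w: int, rf: int) -> bool:
    return r + w > rf

print(is_immediately_consistent(2, 2, 3))  # QUORUM write + QUORUM read, RF=3: True
print(is_immediately_consistent(1, 1, 3))  # ONE write + ONE read, RF=3: False
print(is_immediately_consistent(3, 1, 3))  # ALL write + ONE read, RF=3: True
```

This is why QUORUM reads combined with QUORUM writes are the usual middle ground between latency and staleness.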
Cassandra Data Model
The Cassandra data model defines :
1. Column family – a way to store and organize data
2. Table – a two-dimensional view of a multi-dimensional column family
3. Cassandra Query Language (CQL) – a language to perform operations on tables
What is a column family?
Column family – a set of rows with a similar structure
• Sorted columns
• Multidimensional
• Distributed
• Sparse
[Diagram: three rows (key1–key3), each holding values for columns cola–cold; the intersection of a row and a column key is a cell.]
What are row, row key, column key, and column value?
• Rows – individual rows constitute a column family
• Row key – uniquely identifies a row in a column family
• Row – stores pairs of column keys and column values
• Column key – uniquely identifies a column value in a row
• Column value – stores one value or a collection of values (the cells)
[Diagram: one row key mapping to column keys cola–cold with column values va–vd.]
What are row, row key, column key, and column value? (example)
Row key "John Lennon" : born=1940, country=England, died=1980, style=Rock, type=artist
Row key "The Beatles" : country=England, founded=1957, style=Rock, type=band
The row keys identify the rows; born, country, died, founded, style, type are column keys; the stored values are column values.
What is a wide row?
Rows may be described as "skinny" or "wide"
• Skinny row – a fixed, relatively small number of column keys
• Wide row – a relatively large number of column keys (hundreds or thousands)
• For example, a row that stores all bands of the same style : the number of such bands will increase as new bands are formed
[Diagram: a "Rock" row whose column keys are band names – The Animals, The Beatles, …]
What are composite row key and composite column key?
Composite row key – multiple components separated by a colon
Composite column key – multiple components separated by a colon
• Composite column keys are sorted by each component
Example : row key Revolver:1966 with columns genre=Rock, performer=The Beatles, tracks={1: 'Taxman', ..., 14: 'Tomorrow Never Knows'}
As a wide row, Revolver:1966 instead uses composite column keys : 1:title=Taxman, 2:title=Eleanor Rigby, ..., 14:title=Tomorrow Never Knows
What are partition, partition key, row, column, and cell?
[Diagram: the column family view shown side by side with a CQL table made of single-row partitions.]
What are composite partition key and clustering column?
Table with multi-row partitions :

album_title           year  number  track_title
Revolver              1966  1       Taxman
Revolver              1966  …       …
Revolver              1966  14      Tomorrow Never Knows
Let It Be             1970  1       Two Of Us
Let It Be             1970  …       …
Let It Be             1970  11      Get Back
Magical Mystery Tour  1967  1       Magical Mystery Tour
Magical Mystery Tour  1967  …       …
Magical Mystery Tour  1967  11      All You Need Is Love

(album_title, year) form the composite partition key; number is the clustering column; each partition contains several rows; the individual values are cells.
What are composite partition key and clustering column? (column family view)
Each partition becomes one wide row keyed by the composite partition key :
• Revolver:1966 – 1:title=Taxman, …, 14:title=Tomorrow Never Knows
• Let It Be:1970 – 1:title=Two Of Us, …, 11:title=Get Back
• Magical Mystery Tour:1967 – 1:title=Magical Mystery Tour, …, 11:title=All You Need Is Love
What are static columns?
Table with multi-row partitions and static columns :

album_title  year  number  genre  performer    track_title
Revolver     1966  1       Rock   The Beatles  Taxman
Revolver     1966  …       Rock   The Beatles  …
Revolver     1966  14      Rock   The Beatles  Tomorrow Never Knows
Let It Be    1970  1       Rock   The Beatles  Two Of Us
Let It Be    1970  …       Rock   The Beatles  …
Let It Be    1970  11      Rock   The Beatles  Get Back

genre and performer are static columns : stored once per partition and shared by all rows of that partition.
What is a primary key?
The primary key uniquely identifies a row in a table
It consists of a simple or composite partition key plus all clustering columns (if present)

Single partition key (performer) :

performer       born  country  died  founded  style  type
John Lennon     1940  England  1980           Rock   artist
Paul McCartney  1942  England                 Rock   artist

Composite partition key (album_title, year) + clustering column (number) :

album_title  year  number  track_title
Revolver     1966  1       Taxman
Revolver     1966  …       …
Revolver     1966  14      Tomorrow Never Knows
Let It Be    1970  1       Two Of Us
Let It Be    1970  …       …
Let It Be    1970  11      Get Back
What is a table or CQL table?
A CQL table is a column family
• CQL tables provide two-dimensional views of a column family, which contains potentially multi-dimensional data due to composite keys and collections
"CQL table" and "column family" are largely interchangeable terms
Supported by the declarative Cassandra Query Language (CQL)
Cassandra Query Language (CQL)
Includes a Data Definition Language as a subset
SQL-like syntax, but with somewhat different semantics
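As a sketch of that SQL-like syntax, here is hedged CQL for the album/track example used throughout these slides; the table name tracks and the exact column definitions are illustrative:

```sql
-- Composite partition key (album_title, year), clustering column number.
CREATE TABLE tracks (
  album_title text,
  year        int,
  number      int,
  track_title text,
  PRIMARY KEY ((album_title, year), number)
);

INSERT INTO tracks (album_title, year, number, track_title)
VALUES ('Revolver', 1966, 1, 'Taxman');

-- Rows of one partition come back ordered by the clustering column.
SELECT track_title
FROM tracks
WHERE album_title = 'Revolver' AND year = 1966;
```

The double parentheses in PRIMARY KEY are the different-semantics part: the inner pair declares the composite partition key, and everything after it is a clustering column.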
Cassandra Data Model differences from RDBMS
• Cassandra deals with unstructured data; an RDBMS deals with structured data
• Cassandra has a flexible schema; an RDBMS has a fixed schema
• In Cassandra, a table is a list of "nested key-value pairs" (ROW x COLUMN key x COLUMN value); in an RDBMS, a table is an array of arrays (ROW x COLUMN)
• The keyspace is the outermost container for an application's data; in an RDBMS, the database is
• Tables or column families are the entities of a keyspace; tables are the entities of a database
• A row is the unit of replication in Cassandra; a row is an individual record in an RDBMS
• A column is the unit of storage in Cassandra; a column represents an attribute of a relation in an RDBMS
• Relationships are represented using collections; an RDBMS supports foreign keys and joins
Write path: how is data written?
Cassandra is a log-structured storage engine
Data is sequentially appended, not placed in pre-set locations
• RDBMS : seeks and writes values to various pre-set locations
• Cassandra : continuously appends to a log
How does the write path flow on a node?
1. The client sends the write (e.g. partition key1 first:Oscar last:Orange level:42) to a coordinator node
2. For each write request, the node appends to the CommitLog (append-only, on the file system) and applies the write to the Memtable in node memory that corresponds to the CQL table
3. Periodically, the current state of the Memtable is flushed to disk as an SSTable
4. Periodically, related SSTables are compacted from many into one
What is the CommitLog?
An append-only log used to automatically rebuild Memtables on restart of a downed node
Memtables flush to disk when the CommitLog size reaches its total allowed space
Entries are marked as flushed when the corresponding Memtable entries are flushed to disk as an SSTable
CommitLog options are configured in the cassandra.yaml file
What are Memtables and how are they flushed to disk?
Memtables are in-memory representations of a CQL table :
• Each node has a Memtable for each CQL table in the keyspace
• Each Memtable accrues writes and serves reads for data not yet flushed
• Updates to a Memtable mutate the in-memory partition (e.g. partition key1 first:Oscar last:Orange level:42)
What is an SSTable and what are its characteristics?
An SSTable ("sorted string table") is
• an immutable file of sorted partitions
• written to disk through fast, sequential I/O
• the state of a Memtable at the moment it was flushed
The current data state of a CQL table comprises
• its corresponding Memtable, plus
• all current SSTables flushed from that Memtable
SSTables are periodically compacted from many into one
Tools : CQLSH
An interactive command-line CQL utility
Supports tab completion for commands
Think of it as SQL*Plus for Cassandra
Tools : Cassandra Cluster Manager (CCM)
An open source utility that creates and manages multi-node clusters on a local machine
Not for production configuration
Useful for :
• Testing failure scenarios
• Development / prototyping without the hardware
• Version migrations
• …
Tools : Nodetool
A command-line cluster management utility
Supports over 40 commands, such as :
• status
• info
• ring
Tools : DataStax DevCenter
Visually create and navigate database objects
View query results and tune queries for faster performance
DBAs wanted
NoSQL and Cassandra will not replace the RDBMS : different tools for different jobs
Current situation :
• The community is largely driven by developers and sysadmins
• The community needs insight from DBAs to make the database evolve
• Get involved!