Managing large volumes of data isn’t trivial and needs a plan. Fast Data is how we describe the nature of data in a heavily consumer-driven world. Fast in. Fast out. Is your data infrastructure ready? You will learn some important reference architectures for large-scale data problems. Three main areas are covered:
Organize - Manage the incoming data stream and ensure it is processed correctly and on time. No data left behind.
Process - Analyze volumes of data you receive in near real-time or in a batch. Be ready for fast serving in your application.
Store - Reliably store data in the data models to support your application. Never accept downtime or slow response times.
Apache Cassandra and Spark: You got the lighter, let's start the fire - Patrick McFadin
An introduction to analyzing Apache Cassandra data using Apache Spark. This includes data models, operations topics, and the internals of how Spark interfaces with Cassandra.
Storing time series data with Apache Cassandra - Patrick McFadin
If you are looking to collect and store time series data, it's probably not going to be small. Don't get caught without a plan! Apache Cassandra has proven itself a solid choice, and now you can learn how to use it. We'll look at possible data models and the choices you have to be successful. Then let's open the hood and learn how data is stored in Apache Cassandra. You don't need to be an expert in distributed systems to make this work, and I'll show you how. I'll give you real-world examples and work through the steps. Give me an hour and I will upgrade your time series game.
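To make the data-model idea concrete, here is a hypothetical sketch (not code from the talk) of the most common pattern: partition time series rows by source plus a time bucket, so no single partition grows without bound. All names here are illustrative.

```python
from datetime import datetime, timezone

# Illustrative CQL for such a table:
# CREATE TABLE readings (
#   sensor_id text, day text, ts timestamp, value double,
#   PRIMARY KEY ((sensor_id, day), ts)
# ) WITH CLUSTERING ORDER BY (ts DESC);

def partition_key(sensor_id: str, ts: datetime) -> tuple:
    """Bucket a reading by sensor and day so a partition stays bounded."""
    return (sensor_id, ts.strftime("%Y-%m-%d"))

ts = datetime(2015, 3, 14, 9, 26, tzinfo=timezone.utc)
print(partition_key("sensor-42", ts))  # ('sensor-42', '2015-03-14')
```

The bucket granularity (day, hour, etc.) is a tuning choice: it trades partition size against the number of partitions a query must touch.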
Owning time series with Team Apache - Strata San Jose 2015 - Patrick McFadin
Break out your laptops: this hands-on tutorial is geared toward understanding the basics of how Apache Cassandra stores and accesses time series data. We’ll start with an overview of how Cassandra works and how that can be a perfect fit for time series. Then we will add in Apache Spark as a perfect analytics companion. There will be coding as part of the hands-on tutorial. The goal will be to take an example application and code through the different aspects of working with this unique data pattern. The final section will cover building an end-to-end data pipeline to ingest, process, and store high-speed time series data.
An Introduction to time series with Team Apache - Patrick McFadin
We as an industry are collecting more data every year. IoT, web, and mobile applications send torrents of bits to our data centers that have to be processed and stored, even as users expect an always-on experience—leaving little room for error. Patrick McFadin explores how successful companies do this every day using the powerful Team Apache: Apache Kafka, Spark, and Cassandra.
Patrick walks you through organizing a stream of data into an efficient queue using Apache Kafka, processing the data in flight using Apache Spark Streaming, storing the data in a highly scaling and fault-tolerant database using Apache Cassandra, and transforming and finding insights in volumes of stored data using Apache Spark.
Topics include:
- Understanding the right use case
- Considerations when deploying Apache Kafka
- Processing streams with Apache Spark Streaming
- A deep dive into how Apache Cassandra stores data
- Integration between Cassandra and Spark
- Data models for time series
- Postprocessing without ETL using Apache Spark on Cassandra
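The organize/process/store flow above can be caricatured in a few lines of plain Python. This is a toy sketch with stand-in data structures, not the Kafka, Spark Streaming, or Cassandra APIs; every name here is hypothetical.

```python
from collections import deque

queue = deque()   # stands in for a Kafka topic
store = {}        # stands in for a Cassandra table keyed by partition

def ingest(event):
    """Organize: a producer appends events to the queue."""
    queue.append(event)

def process_batch():
    """Process + store: drain the queue, key each event by
    (sensor, day) the way a time series table would partition it."""
    while queue:
        e = queue.popleft()
        key = (e["sensor"], e["ts"][:10])
        store.setdefault(key, []).append((e["ts"], e["value"]))

for v in [1.0, 2.0]:
    ingest({"sensor": "s1", "ts": "2015-06-01T12:00", "value": v})
process_batch()
print(store[("s1", "2015-06-01")])
```

The real stack replaces each stand-in with a distributed component, but the shape of the pipeline (append, drain in micro-batches, write by partition key) is the same.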
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise - Patrick McFadin
Wait! Back away from the Cassandra secondary index. It’s OK for some use cases, but it’s not an easy button. "But I need to search through a bunch of columns to look for the data and I want to do some regression analysis… and I can’t model that in C*, even after watching all of Patrick McFadin's videos. What do I do?” The answer, dear developer, is in DSE Search and Analytics. With its easy Solr API and Spark integration, you can search and analyze data stored in your Cassandra database to your heart’s content. Take our hand. We will show you how.
Nike Tech Talk: Double Down on Apache Cassandra and Spark - Patrick McFadin
Apache Cassandra has proven to be one of the best solutions for storing and retrieving time series data at high velocity and high volume. This talk will give you an overview of the many ways you can be successful by introducing Apache Cassandra concepts. We will discuss how the storage model of Cassandra is well suited for this pattern and go over examples of how best to build data models. There will also be examples of how you can use Apache Spark along with Apache Cassandra to create a real time data analytics platform. It’s so easy, you will be shocked and ready to try it yourself.
You’ve heard all of the hype, but how can SMACK work for you? In this all-star lineup, you will learn how to create a reactive, scalable, resilient, and performant data processing powerhouse. We will go through the basics of Akka, Kafka, and Mesos and then deep dive into putting them together in an end-to-end (and back again) distributed transaction. Distributed transactions mean producers waiting for one or more consumers to respond. On the back end, you will see how Apache Cassandra and Spark can be combined to add the incredibly scalable storage and data analysis needed for fast data pipelines. With these technologies as a foundation, you have the assurance that scale is never a problem and uptime is the default.
Functional data models are great, but how can you squeeze out more performance and make them awesome? Let's talk through some example models, go through the tuning steps, and understand the tradeoffs. Many times just a simple understanding of the underlying internals can make all the difference. I've helped some of the biggest companies in the world do this, and I can help you. Do you feel the need for Cassandra 2.0 speed?
At this meetup Patrick McFadin, Solutions Architect at DataStax, will be discussing the most recently added features in Apache Cassandra 2.0, including: Lightweight transactions, eager retries, improved compaction, triggers, and CQL cursors. He'll also be touching on time series data with Apache Cassandra.
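Of those 2.0 features, lightweight transactions are the easiest to picture: a write that is applied only if its condition holds, with the outcome reported back in an `[applied]` column. A toy single-node sketch of the semantics of `INSERT ... IF NOT EXISTS` (the real feature runs Paxos across replicas; the function and names below are illustrative):

```python
def insert_if_not_exists(table: dict, key, row):
    """Mimic Cassandra's LWT INSERT ... IF NOT EXISTS: apply the write
    only when no row exists, and report the outcome plus the existing
    row on rejection, as CQL does."""
    if key in table:
        return {"[applied]": False, **table[key]}
    table[key] = row
    return {"[applied]": True}

users = {}
print(insert_if_not_exists(users, "pmcfadin", {"name": "Patrick"}))   # applied
print(insert_if_not_exists(users, "pmcfadin", {"name": "Imposter"}))  # rejected
```

The point of the feature is that this check-then-write is atomic cluster-wide, which a normal read-then-insert from the client cannot guarantee.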
Apache Cassandra is a popular choice for a wide variety of application persistence needs. There are many design choices that can affect uptime and performance. In this talk we'll look at some of the many things to consider, from a single server to multiple data centers. A basic understanding of Cassandra features coupled with client driver features can be a very powerful combination. This talk will be an introduction but will dive deep into the technical details of how Cassandra works.
Maximum Overdrive: Tuning the Spark Cassandra Connector (Russell Spitzer, Dat...) - DataStax
Worried that you aren't taking full advantage of your Spark and Cassandra integration? Well worry no more! In this talk we'll take a deep dive into all of the available configuration options and see how they affect Cassandra and Spark performance. Concerned about throughput? Learn to adjust batching parameters and gain a boost in speed. Always running out of memory? We'll take a look at the various causes of OOM errors and how we can circumvent them. Want to take advantage of Cassandra's natural partitioning in Spark? Find out about the recent developments that let you perform shuffle-less joins on Cassandra-partitioned data! Come with your questions and problems and leave with answers and solutions!
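The batching idea behind those throughput knobs can be sketched independently of the connector. The connector groups statements by partition key so that each batch lands on one replica set; the function and parameter names below are illustrative, not the connector's actual configuration options.

```python
from collections import defaultdict

def group_for_batching(rows, batch_size=3):
    """Group (partition_key, row) pairs by partition key, then split
    each group into bounded batches -- roughly the shape of what the
    connector's batching parameters control."""
    by_partition = defaultdict(list)
    for pk, row in rows:
        by_partition[pk].append(row)
    batches = []
    for pk, rs in by_partition.items():
        for i in range(0, len(rs), batch_size):
            batches.append((pk, rs[i:i + batch_size]))
    return batches

rows = [("p1", 1), ("p2", 2), ("p1", 3), ("p1", 4), ("p1", 5)]
print(group_for_batching(rows))
```

Batching unrelated partitions together instead would force the coordinator to fan writes out, which is one reason tuning these parameters matters.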
About the Speaker
Russell Spitzer Software Engineer, DataStax
Russell Spitzer received a Ph.D. in Bioinformatics before finding his deep passion for distributed software. He found the perfect outlet for this passion at DataStax, where he began on the Automation and Test Engineering team. He recently moved from finding bugs to making bugs as part of the Analytics team, where he works on integration between Cassandra and Spark as well as other tools.
What is in All of Those SSTable Files, Not Just the Data One but All the Rest ... - DataStax
Have you ever wondered what is in all of those SSTable files and how they help Cassandra find and manage your data? If you go to the DataStax website they will give you a high-level explanation of what is in each file. In this talk we will go much deeper, explaining each file and walking through a dump of its contents. We will also explore the differences between Cassandra 2.1 and 3.4.
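For orientation before the deep dive: one SSTable on disk is a set of component files sharing a name prefix (in recent versions, Data.db for the rows, Index.db and Summary.db for lookups, Filter.db for the Bloom filter, plus statistics, compression metadata, a digest, and a table of contents). A small sketch that groups a directory listing by SSTable; the filenames are made-up examples following the real naming pattern.

```python
from collections import defaultdict

# Component files that make up one SSTable in recent Cassandra versions.
COMPONENTS = {"Data.db", "Index.db", "Filter.db", "Summary.db",
              "Statistics.db", "CompressionInfo.db", "Digest.crc32", "TOC.txt"}

def group_sstables(filenames):
    """Group on-disk files by SSTable prefix, e.g. 'ma-5-big-Data.db'
    splits into prefix 'ma-5-big' and component 'Data.db'."""
    tables = defaultdict(set)
    for name in filenames:
        prefix, component = name.rsplit("-", 1)
        if component in COMPONENTS:
            tables[prefix].add(component)
    return dict(tables)

files = ["ma-5-big-Data.db", "ma-5-big-Index.db", "ma-5-big-Filter.db"]
print(group_sstables(files))
```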
About the Speaker
John Schulz Principal Consultant, The Pythian Group
John has 40 years of experience working with data: data in files and in databases, from flat files through ISAM to relational databases and, most recently, NoSQL. For the last 15 years he's worked on a variety of open source technologies including MySQL, PostgreSQL, Cassandra, Riak, Hadoop, and HBase. He has been working with Cassandra since 2010. For the last eighteen months he has been working for The Pythian Group to help their customers improve their existing databases and select new ones.
Spark and Cassandra with the DataStax Spark Cassandra Connector
How it works and how to use it!
Missed Spark Summit but still want to see some slides?
This slide deck is for you!
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...) - DataStax
We built an application based on the principles of CQRS and Event Sourcing using Cassandra and Spark. During the project we encountered a number of challenges and problems with Cassandra and the Spark Connector.
In this talk we want to outline a few of those problems and our actions to solve them. While some problems are specific to CQRS and Event Sourcing applications most of them are use case independent.
About the Speakers
Matthias Niehoff IT-Consultant, codecentric AG
Matthias works as an IT consultant at codecentric AG in Germany. His focus is on big data and streaming applications with Apache Cassandra and Apache Spark, yet he does not lose track of other tools in the area of big data. Matthias shares his experiences at conferences, meetups, and user groups.
Stephan Kepser Senior IT Consultant and Data Architect, codecentric AG
Dr. Stephan Kepser is an expert on cloud computing and big data. He wrote a couple of journal articles and blog posts on subjects of both fields. His interests reach from legal questions to questions of architecture and design of cloud computing and big data systems to technical details of NoSQL databases.
Introduction to data modeling with Apache Cassandra - Patrick McFadin
Are you using relational databases and wondering how to get started with data modeling and Apache Cassandra? Here is a starting tour, translating from the knowledge you already have to the knowledge you need to be effective with Cassandra development. We cover patterns and anti-patterns. Get going today!
Escape From Hadoop: Spark One-Liners for C* Ops - Russell Spitzer
Apache Cassandra and Spark, when combined, can give powerful OLTP and OLAP functionality for your data. We’ll walk through the basics of both of these platforms before diving into applications combining the two. Joins, changing a partition key, or importing data can usually be difficult in Cassandra, but we’ll see how to do these and other operations in a set of simple Spark Shell one-liners!
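The spirit of those one-liners, minus the Spark shell: "changing the partition key" is just re-grouping rows by a different column. A plain-Python stand-in (in Spark this would be a `keyBy`/`groupByKey` over a Cassandra RDD; the data here is invented for illustration):

```python
from itertools import groupby

# Stand-in for rows read from Cassandra: (old_key, user, amount)
rows = [("2015-01", "alice", 10), ("2015-01", "bob", 5), ("2015-02", "alice", 7)]

# "Change the partition key" as a one-liner: re-key by user instead of month.
by_user = {k: [r[2] for r in g]
           for k, g in groupby(sorted(rows, key=lambda r: r[1]), key=lambda r: r[1])}

print(by_user)  # {'alice': [10, 7], 'bob': [5]}
```

What makes this hard inside Cassandra alone is that a table can only be read efficiently by its own partition key; Spark supplies the shuffle that re-keys the data.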
DataStax: An Introduction to DataStax Enterprise Search - DataStax Academy
1) Why We Built DSE Search
2) Basics of the Read and Write Paths
3) Fault-tolerance and Adaptive Routing
4) Analytics with Search and Spark
5) Live Indexing
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using DataStax Ent... - DataStax Academy
Wait! Back away from the Cassandra secondary index. It’s OK for some use cases, but it’s not an easy button. "But I need to search through a bunch of columns to look for the data and I want to do some regression analysis… and I can’t model that in C*, even after watching all of Patrick McFadin's videos. What do I do?” The answer, dear developer, is in DSE Search and Analytics. With its easy Solr API and Spark integration, you can search and analyze data stored in your Cassandra database to your heart’s content. Take our hand. We will show you how.
DataStax Enterprise clients, such as CQLSH or Hadoop- and Spark-based applications, can be precisely configured to achieve a desired behaviour. For a basic use case, we just run a dedicated DSE command and do not care about how all of those pieces are set up to work together, leveraging the goodness of DSE. However, understanding where and what we need to modify to achieve the expected change in the configuration is essential for using DSE efficiently. In this presentation we go through the basic and advanced settings for client applications, including security features and limitations, and DSE patches introduced into integrated Spark. We show the new tools which significantly simplify the configuration of external DSE installations which are used just for accessing a DSE cluster in client mode. Finally, we conclude with hints for configuring the Spark driver from scratch in order to use it in a web application, when running the program through DSE scripts is not feasible.
About the Speaker
Jacek Lewandowski Software engineer, DataStax
Jacek Lewandowski is a software engineer with 13 years of experience. Initially a full-stack developer, he worked as a consultant and a trainer for different companies. Since 2011 he has been using Cassandra as an alternative to SQL in various applications. He is passionate about distributed algorithms, graphs, and functional programming in Scala. He is also a part-time assistant professor popularizing the Cassandra database among students and researchers, and has been working on the DataStax Analytics team for over 2 years.
A lot has changed since I gave one of these talks and man, has it been good. 2.0 brought us a lot of new CQL features, and now with 2.1 we get even more! Let me show you some real-life data models and those new features taking developer productivity to an all-new high: User Defined Types, New Counters, Paging, Static Columns. Exciting new ways of making your app truly killer!
UKOUG Tech15 - Deploying Oracle 12c Cloud Control in Maximum Availability Arc... - Zahid Anwar (OCM)
Common Cloud Control deployments can sometimes be exposed to single points of failure. In this presentation we will discuss these pitfalls and how deploying Cloud Control within the Maximum Availability Architecture can provide a robust system. Aimed at a technical audience, we will dive into providing High Availability and Disaster Recovery for the OMS repository and OMS Web Tier through the use of RAC, Web Tier Clustering, Data Guard, and Storage Replication. We will take our audience through the simple but effective steps required for this type of deployment, in addition to the license implications of using the Maximum Availability Architecture, including what Oracle gives you for free under a restricted-use license. This presentation is based on a recent project completed by our speaker, Zahid Anwar, in which Zahid provided Maximum Availability Architecture for a Cloud Control installation monitoring 6 critical X4-2 Eighth Exadata Machines.
Oracle Drivers configuration for High Availability - Ludovico Caldara
... is it a developer's job?
UCP, GridLink, TAF, AC, TAC, FAN… The configuration of Oracle drivers for application high availability is not an easy job. Developers often care about the minimal working configuration, while the DBAs are busy with operations. In this session I will try to demystify application server connectivity to the database and give a direction toward the highest availability, using Real Application Clusters and new Oracle features like TAC and CMAN TDM.
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016 - StampedeCon
Have you ever wanted to analyze sensor data that arrives every second from across the world? Or maybe you want to analyze intra-day trading prices of millions of financial instruments? Or take all the page views from Wikipedia and compare the hourly statistics? To do this or any other similar analysis, you will need to analyze large sequences of measurements over time. And what better way to do this than with Apache Spark? In this session we will dig into how to consume data, analyze it with Spark, and then store the results in Apache Cassandra.
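The hourly-statistics example in that abstract boils down to a group-by over time buckets. A minimal local sketch (Spark distributes the same shape of computation; the data and function name are illustrative):

```python
from collections import defaultdict
from statistics import mean

def hourly_stats(measurements):
    """Roll (iso_timestamp, value) pairs up into per-hour averages."""
    buckets = defaultdict(list)
    for ts, value in measurements:
        buckets[ts[:13]].append(value)   # 'YYYY-MM-DDTHH' is the hour bucket
    return {hour: mean(vals) for hour, vals in buckets.items()}

data = [("2016-05-01T10:05", 2.0), ("2016-05-01T10:55", 4.0),
        ("2016-05-01T11:10", 6.0)]
print(hourly_stats(data))  # {'2016-05-01T10': 3.0, '2016-05-01T11': 6.0}
```

In the Spark version the bucket becomes the grouping key of a `reduceByKey` or SQL window, and the results would be written back to a Cassandra table keyed by that bucket.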
Webinar - How to Build Data Pipelines for Real-Time Applications with SMACK &... - DataStax
More data is being collected every year. Cloud applications, including IoT, web, and mobile, send torrents of bits to our data centers that have to be processed and stored. In addition, users expect an always-on experience, with little room for error. Numerous companies are successfully doing this every day. In this webinar, you will learn about the convergence of complementary technologies: Spark, Mesos, Akka, Cassandra, and Kafka (SMACK). You will also learn how Apache Kafka can help you get your data under control, and the critical role Kafka plays in your data pipeline.
Webinar recording: https://youtu.be/uwYlwLyv-1s
Webinar Q&A will be posted shortly.
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac... - StreamNative
Despite what the Ghostbusters said, we’re going to go ahead and cross (or, join) the streams. This session covers getting started with streaming data pipelines, leveraging Pulsar’s messaging system alongside one of the most flexible streaming frameworks available, Apache Flink. Specifically, we’ll demonstrate the use of Flink SQL, which provides various abstractions and allows your pipeline to be language-agnostic. So, if you want to leverage the power of a high-speed, highly customizable stream processing engine without the usual overhead and learning curves of the technologies involved (and their interconnected relationships), then this talk is for you. Watch the step-by-step demo to build a unified batch and streaming pipeline from scratch with Pulsar, via the Flink SQL client. This means you don’t need to be familiar with Flink (or even a specific programming language). The examples provided are built for highly complex systems, but the talk itself will be accessible to any experience level.
... or why Oracle still cares about CMAN and why you should do it too
The Oracle Connection Manager (CMAN) is the Swiss-army knife for database connections. It can be used for security, routing, high availability, single-point of contact... Starting with Oracle 18c, it has been extended with the new Traffic Director Mode (CMAN TDM), that allows transparent failover for applications that do not implement it natively.
In this session I will introduce briefly what CMAN is capable of, how to configure it in a high availability environment, and how the new release achieves a higher protection level.
Are your Oracle databases highly available? You have deployed Real Application Clusters (RAC), Data Guard, or Failover Clusters and are well protected against server failures? Great – the prerequisites for a highly available environment are given. However, to assure that backend infrastructure failures also remain transparent to the client, an appropriate configuration is a prerequisite.
This lecture will discuss the Oracle technologies that can be used to achieve automatic client failover functionality. What are the advantages, but also the limitations of these technologies?
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
Flink Forward San Francisco 2022.
At Stripe we have created a complete end to end exactly-once processing pipeline to process financial data at scale, by combining the exactly-once power from Flink, Kafka, and Pinot together. The pipeline provides exactly-once guarantee, end-to-end latency within a minute, deduplication against hundreds of billions of keys, and sub-second query latency against the whole dataset with trillion level rows. In this session we will discuss the technical challenges of designing, optimizing, and operating the whole pipeline, including Flink, Kafka, and Pinot. We will also share our lessons learned and the benefits gained from exactly-once processing.
by Xiang Zhang, Pratyush Sharma & Xiaoman Dong
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...DataStax
Cassandra's support for multiple data centers can bring massive benefits to an organization, however it can also bring painful operational lessons. While there is no recipe for trouble-free multi-DC clusters, the best approach is to understand why you are using one, what Cassandra supports, and how it does it. With this knowledge in your toolkit you will have a better chance of fixing the sort of gremlins that can trouble a globally distributed database.
In this talk Alexander Dejanovski, Consultant at The Last Pickle, will outline the motivations people typically have for running a multi-DC cluster. He will also look at how multiple DCs are supported through all areas of Cassandra, how they impact your application and operations, and how you can always blame the network.
About the Speaker
Alexander DEJANOVSKI Consultant, The Last Pickle
Alexander has been working as a software developer for the last 18 years, mainly for the French leader in express shipments, where he led the effort to build a Cassandra-based architecture and migrate services to it from a traditional RDBMS. He is involved in the Cassandra community through the development of a JDBC wrapper for the DataStax Java Driver. Recently, he joined The Last Pickle as a Cassandra consultant and now helps customers get the best out of Cassandra.
Linux Kernel vs DPDK: HTTP Performance ShowdownScyllaDB
In this session I will use a simple HTTP benchmark to compare the performance of the Linux kernel networking stack with userspace networking powered by DPDK (kernel-bypass).
It is said that kernel-bypass technologies avoid the kernel because it is "slow", but in reality, a lot of the performance advantages that they bring just come from enforcing certain constraints.
As it turns out, many of these constraints can be enforced without bypassing the kernel. If the system is tuned just right, one can achieve performance that approaches kernel-bypass speeds, while still benefiting from the kernel's battle-tested compatibility, and rich ecosystem of tools.
SignalFx engineer Rajiv Kurian's presentation on why we wrote our own Kafka consumer, the performance goals, and the performance gains achieved.
Download the slides to see animations showing hardware details. These slides were converted from Keynote to PowerPoint, so there may be some oddness with slide transitions!
JDD2015: Make your world event driven - Krzysztof DębskiPROIDEA
MAKE YOUR WORLD EVENT DRIVEN
Just after you set up your first microservice you realize that the game has just started. You need to improve latency in your application and reduce unnecessary communication.
To make your architecture fully decoupled you need to embrace asynchronous communication. A good way to achieve that is to switch to an Event-Driven Architecture.
We will see how to use Kafka in your microservices. We will also cover some pitfalls you might face when using Kafka and how to deal with them.
After the talk you will know the tools needed to improve your microservice ecosystem.
If you’re involved in open source work in or around a business, you will inevitably have the discussion, “Is this open source or proprietary?” Do not take this moment lightly. This seemingly easy question is met with strong opinions on both sides. Friendships have been lost. Companies have suffered. It’s as close to religious warfare as we can get in the tech world.
It’s time to call a truce.
There are plenty of valid arguments on both sides. Patrick McFadin outlines the pros and cons of each. Using example scenarios of projects that must decide whether or not they’ll be open source, Patrick explores objective ways to make a decision without descending into chaos and name calling. Even without a completely objective picture, understanding both sides of the argument can help keep you on track and civil. Patrick has been involved in OSS for more years than he likes to admit and would love for his past mistakes to benefit you.
Topics include:
- Key questions to ask to help guide your decision
- Reasons for choosing OSS
- Reasons for staying strictly proprietary
- Considerations for mixing OSS and proprietary models
- Transitioning from one model to the other
Help! I want to contribute to an Open Source project but my boss says no.Patrick McFadin
You love using Open Source Software. It's done right by you and now you want to contribute back. You get your patch all ready and… the boss says no! Don't feel alone. Enterprises everywhere are trying to figure this out. I'll walk you through what risks actually exist for businesses and how you can help manage them. Maybe, armed with some information, your boss will say... yes!
Analyzing Time Series Data with Apache Spark and CassandraPatrick McFadin
You have collected a lot of time series data, so now what? It's not going to be useful unless you can analyze what you have. Apache Spark has become the heir apparent to MapReduce, but did you know you don't need Hadoop? Apache Cassandra is a great data source for Spark jobs! Let me show you how it works, how to get useful information and, the best part, how to store analyzed data back into Cassandra. That's right. Kiss your ETL jobs goodbye and let's get to analyzing. This is going to be an action-packed hour of theory, code and examples, so caffeine up and let's go.
This summer, coming to a server near you, Cassandra 3.0! Contributors and committers have been working hard on what is the most ambitious release to date. It’s almost too much to talk about, but we will dig into some of the most important, groundbreaking features that you’ll want to use. Indexing changes that will make your applications faster and Spark jobs more efficient. Storage engine changes to get even more density and efficiency from your nodes. Developer-focused features like full JSON support and User Defined Functions. And finally, one of the most requested features, Windows support, has made its arrival. There is more, but you’ll just have to come see for yourself. Get your front row seat and don’t miss it!
Further discussion on Data Modeling with Apache Cassandra. Overview of formal data modeling techniques as well as practical. Real-world use cases and associated data models.
Making money with open source and not losing your soul: A practical guidePatrick McFadin
We now live in a world where Open Source Software is as generally accepted as any commercial software. This doesn’t mean OSS lacks commercial aspects, because I’m here to tell you, Open Source is a perfectly viable business model. Don't worry! You don't have to sell your soul to the suits on Wall Street and give up on the core values of open source to make it work. I'm employed by a company that (hopefully) embodies these values with a lot of success. I’ve also interviewed many business leaders in Open Source companies. Let me share some of what I’ve learned so you too can be successful. The topics I will be covering:
- Picking the right open source license
- Business models for monetizing open source
- Engaging the community in a mutually beneficial way
- Competing with commercial alternatives
- The selling process (yes, we have to talk about that)
Time series with Apache Cassandra - Long versionPatrick McFadin
Apache Cassandra has proven to be one of the best solutions for storing and retrieving time series data. This talk will give you an overview of the many ways you can be successful. We will discuss how the storage model of Cassandra is well suited for this pattern and go over examples of how best to build data models.
Apache Cassandra 2.0 is out - now there's no reason not to ditch that ol' legacy relational system for your important online applications. Cassandra 2.0 includes big-impact features like Lightweight Transactions and Triggers. Do you know about the other new enhancements that got lost in the noise? Let's put the spotlight on all the things! Changes in memory management, file handling and internals. Low hype, but they pack a big punch. While we were at it, we also did a bit of house cleaning.
Building Antifragile Applications with Apache CassandraPatrick McFadin
Even with the best infrastructure, failures will occur without warning and are almost guaranteed. Building applications that can resist this fact of life can be both art and science. In this talk, I'll try to eliminate the art portion and focus more on the science. Starting at high level architecture decisions, I will take you through each layer and finally down to actual application code. Using Cassandra as the back end database, we can build layers of fault tolerance that will leave end users completely unaware of the underlying chaos that could be occurring. With a little planning, we can say goodbye to the Fail Whale and the fragility of the traditional RDBMS. Topics will include:
- Application strategies to utilize active-active, diverse datacenters
- Replicating data with the highest integrity and maximum resilience
- Utilizing Cassandra's built-in fault tolerance
- Architecture of private, cloud or hybrid based applications
- Application driver techniques when using Cassandra
A 30-minute talk I did at Cassandra Dublin and Cassandra London. Just some things I've learned along the way as I've helped some of the largest users of Cassandra be successful. Learn from other people's mistakes!
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a PASSION for technology and making things work, along with a knack for helping others understand how things work. He has around 20 years of solution-engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides from my talk with Rik Marselis at the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We also held a lovely workshop with the participants, finding different ways to think about quality and testing in different parts of the DevOps infinity loop.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio, using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology pushes into IT, I found myself wondering, as an “infrastructure container Kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations view. Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and lead you on a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into which approaches I have already gotten working for real.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
31. Kafka
[Diagram: a Collection API producer publishes messages Temp 1–5 to a Temperature topic and Precip 1–5 to a Precipitation topic on the broker. Each topic is split across Partition 0 and Partition 1, with Replication Factor = 2, and a Temperature Processor and a Precipitation Processor consume from their respective topics.]
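The partitioning shown in the slide above can be sketched in code. This is a simplified, hypothetical model, not the real Kafka client: Kafka's default partitioner applies murmur2 hashing to the key bytes, while `assign_partition` below uses a plain byte sum purely to illustrate the idea that the same key always lands in the same partition.

```python
# Toy model of keyed partitioning: hash the message key, take it modulo
# the topic's partition count. Same key -> same partition, every time,
# which is what gives per-key ordering across Partition 0 and Partition 1.

def assign_partition(key: str, num_partitions: int = 2) -> int:
    """Deterministically map a key to one of the topic's partitions."""
    return sum(key.encode()) % num_partitions

topics = {"Temperature": 2, "Precipitation": 2}  # topic -> partition count

readings = [("Temperature", "sensor-a"), ("Temperature", "sensor-b"),
            ("Precipitation", "sensor-a")]

for topic, key in readings:
    p = assign_partition(key, topics[topic])
    # Note sensor-a maps to the same partition in both topics.
    print(f"{topic}/{key} -> partition {p}")
```

Replication Factor = 2 is orthogonal to this: each of those partitions is additionally copied to a second broker for durability.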
32. Kafka
[Diagram: the same topology, now with additional Temperature Processor and Precipitation Processor consumer instances reading the partitioned Temperature and Precipitation topics (Replication Factor = 2), showing how consumers scale out across partitions.]
33. Guarantees
Order
• Within a partition, messages are ordered as they are sent by the producer
• Consumers see messages in the order they were inserted by the producer
Durability
• Messages are delivered at least once
• With a replication factor of N, up to N-1 server failures can be tolerated without losing committed messages
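The at-least-once guarantee above has a practical consequence worth seeing in code: a retried send can duplicate a message, so consumers that must process each message exactly once deduplicate on their side. This is a toy in-memory sketch, not real Kafka; the function names and message values are invented for illustration.

```python
# Simulate at-least-once delivery: if the producer's acknowledgement is
# lost, it resends, and the broker log ends up with a duplicate.

def deliver_at_least_once(messages, fail_ack_on=None):
    """Append each message to the log; a lost ack triggers a resend."""
    log = []
    for i, msg in enumerate(messages):
        log.append(msg)          # first attempt always lands
        if fail_ack_on == i:
            log.append(msg)      # ack lost -> producer retries -> duplicate
    return log

def consume_exactly_once(log):
    """Deduplicate so downstream processing sees each message once."""
    seen, out = set(), []
    for msg in log:
        if msg not in seen:
            seen.add(msg)
            out.append(msg)
    return out

log = deliver_at_least_once(["temp-1", "temp-2", "temp-3"], fail_ack_on=1)
print(log)                        # ['temp-1', 'temp-2', 'temp-2', 'temp-3']
print(consume_exactly_once(log))  # ['temp-1', 'temp-2', 'temp-3']
```

Note that ordering survives the retry: duplicates arrive adjacent to the original, so dedup by key restores the producer's order.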
35. Akka in a nutshell
• Highly concurrent
• Reactive
• Fully distributed
• Completely elastic and resilient
[Diagram: four actors, each paired with its own mailbox.]
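The actor/mailbox pairs in the slide above can be modeled in a few lines. This is a hedged toy sketch, not the Akka API (Akka actors are written in Scala or Java): it only shows the core idea that each actor owns a mailbox and processes its messages one at a time on its own thread.

```python
# Minimal actor model: tell() enqueues into the mailbox (fire-and-forget),
# a dedicated thread drains the mailbox sequentially, so the handler never
# runs concurrently with itself -- no locks needed around actor state.
import queue
import threading

class Actor:
    def __init__(self, handler):
        self.mailbox = queue.Queue()
        self.handler = handler
        self.results = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message):
        """Asynchronous send: just drop the message in the mailbox."""
        self.mailbox.put(message)

    def _run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:          # poison pill stops the actor
                break
            self.results.append(self.handler(msg))

    def stop(self):
        self.mailbox.put(None)
        self._thread.join()

# One actor per concern, mirroring the Temperature/Precipitation processors.
doubler = Actor(lambda n: n * 2)
for n in (1, 2, 3):
    doubler.tell(n)
doubler.stop()
print(doubler.results)  # [2, 4, 6]
```

Senders never block on processing, which is what makes a system of such actors elastic: to scale, you add more actors and route messages among them.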