This document provides an overview of Apache Cassandra and DataStax Enterprise. It discusses what Cassandra is, how it is used across different industries, and its key features, such as scalability and availability. It also covers Cassandra terminology, data distribution, replication strategies, consistency levels, and how reads and writes work in Cassandra.
Data Quality With or Without Apache Spark and Its Ecosystem (Databricks)
Few solutions exist in the open-source community, either as libraries or complete stand-alone platforms, that can be used to assure a certain level of data quality, especially when continuous imports happen. Organisations may consider picking one of the available options: Apache Griffin, Deequ, DDQ, and Great Expectations. In this presentation we'll compare these open-source products across dimensions such as maturity, documentation, extensibility, and features like data profiling and anomaly detection.
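As a minimal illustration of the kind of checks these libraries automate, a hand-rolled profiling pass might look like the sketch below. The column name, thresholds, and sample rows are illustrative assumptions, not the API of any of the tools named above.

```python
# Minimal hand-rolled data-quality checks: null rate and range validation.
# Column names and thresholds are illustrative, not from any library above.

def profile(rows, column, min_value=None, max_value=None):
    """Return (null_rate, out_of_range_count) for one column of dict rows."""
    values = [r.get(column) for r in rows]
    nulls = sum(1 for v in values if v is None)
    null_rate = nulls / len(values) if values else 0.0
    out_of_range = sum(
        1 for v in values
        if v is not None
        and ((min_value is not None and v < min_value)
             or (max_value is not None and v > max_value))
    )
    return null_rate, out_of_range

rows = [{"temp": 21.5}, {"temp": None}, {"temp": 19.0}, {"temp": 999.0}]
print(profile(rows, "temp", min_value=-50, max_value=60))  # (0.25, 1)
```

The libraries compared in the talk wrap this sort of logic in declarative rule sets plus reporting and anomaly detection on top.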
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang (Databricks)
As a general computing engine, Spark can process data from various data management/storage systems, including HDFS, Hive, Cassandra and Kafka. For flexibility and high throughput, Spark defines the Data Source API, which is an abstraction of the storage layer. The Data Source API has two requirements.
1) Generality: support reading/writing most data management/storage systems.
2) Flexibility: customize and optimize the read and write paths for different systems based on their capabilities.
Data Source API V2 is one of the most important features coming with Spark 2.3. This talk will dive into the design and implementation of Data Source API V2, comparing it with Data Source API V1. We also demonstrate how to implement a file-based data source using the Data Source API V2 to show its generality and flexibility.
A Deep Dive into Spark SQL's Catalyst Optimizer with Yin Huai (Databricks)
Catalyst is becoming one of the most important components of Apache Spark, as it underpins all the major new APIs in Spark 2.0 and later versions, from DataFrames and Datasets to Streaming. At its core, Catalyst is a general library for manipulating trees.
In this talk, Yin explores a modular compiler frontend for Spark based on this library that includes a query analyzer, optimizer, and an execution planner. Yin offers a deep dive into Spark SQL's Catalyst optimizer, introducing the core concepts of Catalyst and demonstrating how developers can extend it. You'll leave with a deeper understanding of how Spark analyzes, optimizes, and plans a user's query.
Storing time series data with Apache Cassandra (Patrick McFadin)
If you are looking to collect and store time series data, it's probably not going to be small. Don't get caught without a plan! Apache Cassandra has proven itself a solid choice, and now you can learn how to do it. We'll look at possible data models and the choices you have to make to be successful. Then, let's open the hood and learn about how data is stored in Apache Cassandra. You don't need to be an expert in distributed systems to make this work, and I'll show you how. I'll give you real-world examples and work through the steps. Give me an hour and I will upgrade your time series game.
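A common Cassandra time-series pattern of the kind this talk covers is bucketing rows by a time window so no partition grows without bound. A minimal sketch, with an assumed daily bucket scheme and illustrative names:

```python
from datetime import datetime, timezone

def partition_key(sensor_id, ts, bucket="day"):
    """Derive a (sensor_id, bucket) partition key so one partition holds
    at most one day of readings for one sensor, keeping partitions bounded."""
    assert bucket == "day"  # only daily buckets in this sketch
    return (sensor_id, ts.strftime("%Y-%m-%d"))

ts = datetime(2014, 1, 15, 13, 45, tzinfo=timezone.utc)
print(partition_key("sensor-42", ts))  # ('sensor-42', '2014-01-15')
```

In a real table the bucket would be part of the partition key and the reading timestamp a clustering column, so a day of readings is one sequential read.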
Writing Continuous Applications with Structured Streaming in PySpark (Databricks)
We are in the midst of a Big Data Zeitgeist in which data comes at us fast, in myriad forms and formats, at intermittent intervals or in a continuous stream, and we need to respond to streaming data immediately. This need has created the notion of writing a streaming application that reacts and interacts with data in real time. We call this a continuous application. In this talk we will explore the concepts and motivations behind continuous applications and how the Structured Streaming Python APIs in Apache Spark 2.x enable writing them. We will also examine the programming model behind Structured Streaming and the APIs that support it. Through a short demo and code examples, Jules will demonstrate how to write an end-to-end Structured Streaming application that reacts and interacts with both real-time and historical data to perform advanced analytics using the Spark SQL, DataFrames, and Datasets APIs.
Building a data lake is a daunting task. The promise of a virtual data lake is to provide the advantages of a data lake without consolidating all data into a single repository. With Apache Arrow and Dremio, companies can, for the first time, build virtual data lakes that provide full access to data no matter where it is stored and no matter what size it is.
This presentation briefly describes key features of Apache Cassandra. It was held at the Apache Cassandra Meetup in Vienna in January 2014. You can access the meetup here: http://www.meetup.com/Vienna-Cassandra-Users/
MaxScale switchover, failover, and auto rejoin (Wagner Bianchi)
How the MariaDB MaxScale switchover, failover, and rejoin work under the hood, by Esa Korhonen and Wagner Bianchi.
You can watch the video of the presentation at
https://www.linkedin.com/feed/update/urn:li:activity:6381185640607809536
- Understanding Time Series
- What's the Fundamental Problem
- Prometheus Solution (v1.x)
- New Design of Prometheus (v2.x)
- Data Compression Algorithm
A brief history of Instagram's adoption cycle of the open-source distributed database Apache Cassandra, in addition to details about its use case and implementation. This was presented at the San Francisco Cassandra Meetup at the Disqus HQ in August 2013.
Presenter: Robbie Strickland, Software Development Manager at The Weather Channel
As a reformed CQL critic, I'd like to help dispel the myths around CQL and extol its awesomeness. Most criticism comes from people like me who were early Cassandra adopters and are concerned about the SQL-like syntax, the apparent lack of control, and the reliance on a defined schema. I'll pop open the hood, showing just how the various CQL constructs translate to the underlying storage layer, and in the process I hope to give novices and old-timers alike a reason to love CQL.
Replication and Consistency in Cassandra... What Does it All Mean? (Christopher Bradford, DataStax)
Many users set the replication strategy on their keyspaces to NetworkTopologyStrategy and move on with modeling their data or developing the next big application. But what does that replication strategy really mean? Let's explore replication and consistency in Cassandra.
How are replicas chosen?
Where does node topology (location in a cluster) come into play?
What can I expect when nodes are down and I'm querying with a consistency level of LOCAL_QUORUM?
If a rack goes down, can I still respond to quorum queries?
These questions may be simple to test, but have nuances that should be understood. This talk will dive into these topics in a visual and technical manner. Seasoned Cassandra veterans and new users alike stand to gain knowledge about these critical Cassandra components.
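The quorum questions above reduce to simple arithmetic: a quorum of N replicas is floor(N/2) + 1, so availability depends on how many replicas remain reachable. A toy check (a hedged sketch with invented function names, not driver code):

```python
def quorum(replication_factor):
    """A quorum is a strict majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def can_satisfy_quorum(replication_factor, replicas_up):
    """True if enough replicas are reachable to answer a QUORUM request."""
    return replicas_up >= quorum(replication_factor)

# RF=3: quorum is 2, so losing one replica still answers QUORUM requests.
print(quorum(3))                 # 2
print(can_satisfy_quorum(3, 2))  # True
# Losing two of three replicas breaks quorum.
print(can_satisfy_quorum(3, 1))  # False
```

This is why RF=3 is such a common choice: it tolerates one unreachable replica per datacenter while still serving LOCAL_QUORUM reads and writes.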
About the Speaker
Christopher Bradford Solutions Architect, DataStax
High performance drives Christopher Bradford. He has worked across various industries, including the federal government, higher education, social news syndication, low-latency HD video delivery, and usability research. Chris combines application engineering principles and systems administration experience to design and implement performant systems. He has architected applications and systems to create highly available, fault-tolerant, distributed services in a myriad of environments.
Solr & Cassandra: Searching Cassandra with DataStax Enterprise (DataStax Academy)
Wait! Back away from the Cassandra secondary index. It's OK for some use cases, but it's not an easy button. "But I need to search through a bunch of columns to look for the data… and I can't model that in C*, even after watching all of Patrick McFadin's data modeling videos. What do I do?" The answer, dear developer, is in DSE Search. With its easy Solr API and Lucene indexes (and fault tolerance), you can search data stored in your Cassandra database to your heart's content. Take my hand. I will show you how.
Patrick Guillebert – IT-Tage 2015 – Cassandra NoSQL - Architektur und Anwendu... (Informatik Aktuell)
What makes the NoSQL database Cassandra so scalable and highly available?
Cassandra is a distributed, column-oriented NoSQL database that is easy to evolve; it uses Dynamo's replication mechanisms while exposing BigTable's data structure externally. It is designed for high scalability and fault tolerance in large, distributed systems. Data is stored as key-value relations.
Fault tolerance at scale: a look at how Apache Cassandra achieves fault tolerance via multi-datacenter and/or multi-region replication. Presented by Alex Thompson at the Sydney Cassandra Meetup.
Designing Resilient Application Platforms with Apache Cassandra - Hayato Shim... (jaxLondonConference)
Presented at JAX London 2013
All too often, infrastructure design for deploying Java applications comes as an afterthought for businesses, technical analysts, and application developers. Technology choices are frequently made with no final deployment infrastructure being discussed. The talk will cover design considerations for building resilient applications and application deployment platforms across multiple data centres, and how organisations can leverage technologies such as Apache Cassandra to achieve this.
Apache Cassandra operations have a reputation for being simple on single-datacenter deployments and/or low-volume clusters, but they become far more complex on high-latency multi-datacenter clusters with high volume and/or high throughput: basic Apache Cassandra operations such as repairs, compactions, or hints delivery can have dramatic consequences even on a healthy high-latency multi-datacenter cluster.
In this presentation, Julien will first go through Apache Cassandra multi-datacenter concepts, then show multi-datacenter operations essentials in detail: bootstrapping new nodes and/or datacenters, repair strategy, Java GC tuning, OS tuning, Apache Cassandra configuration, and monitoring.
Based on his three years' experience managing a multi-datacenter cluster running Apache Cassandra 2.0, 2.1, 2.2, and 3.0, Julien will give you tips on how to anticipate and prevent or mitigate issues related to basic Apache Cassandra operations in a multi-datacenter cluster.
About the Speaker
Julien Anguenot VP Software Engineering, iland Internet Solutions, Corp
Julien currently serves as iland's Vice President of Software Engineering. Prior to joining iland, Mr. Anguenot held tech leadership positions at several open-source content management vendors and tech startups in Europe and in the U.S. Julien is a long-time open-source software advocate, contributor, and speaker: a Zope, ZODB, and Nuxeo contributor and a member of the Zope and OpenStack foundations, he has spoken at ApacheCon, the Cassandra Summit, the OpenStack Summit, The WWW Conference, and EuroPython.
Apache Cassandra and The Multi-Cloud by Amanda Moran (Data Con LA)
Distributed databases, and more specifically cloud-native databases, were created to face many of the issues with a traditional relational database. Having a low-latency and highly available database is the key to preventing a multitude of issues. This talk will focus on what distributed databases provide and why it's important. This talk will also focus on how cloud-native databases like Apache Cassandra are the perfect match for multi-cloud architectures, and why multi-cloud is important.
Apache Cassandra operations have a reputation for being quite simple on single-datacenter and/or low-volume clusters, but they become far more complex on high-latency multi-datacenter clusters: basic operations such as repair, compaction, or hints delivery can have dramatic consequences even on a healthy cluster.
In this presentation, Julien will go through Cassandra operations in detail: bootstrapping new nodes and/or datacenters, repair strategies, compaction strategies, GC tuning, OS tuning, removal of large batches of data, and Apache Cassandra upgrade strategy.
Julien will give you tips and techniques on how to anticipate issues inherent to a multi-datacenter cluster: how and what to monitor, hardware and network considerations, as well as data-model and application-level bad designs and anti-patterns that can affect your multi-datacenter cluster's performance.
One of our presentations on the Cassandra database. Aruman implements big-data projects for its many clients; RDBMS-to-Cassandra conversion is one of the tasks undertaken by Aruman.
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft (DataStax Academy)
Companies today are innovating with real-time data to deliver truly amazing customer experiences in the moment. Real-time data management for real-time customer experience is core to staying ahead of the competition and driving revenue growth. Join Trays to learn how Comcast is differentiating itself from its own historical reputation with customer experience strategies.
Introduction to DataStax Enterprise Graph Database (DataStax Academy)
DataStax Enterprise (DSE) Graph is built to manage, analyze, and search highly connected data. DSE Graph, built on Apache Cassandra, delivers continuous uptime along with predictable performance and scale for modern systems dealing with complex and constantly changing data.
Download DataStax Enterprise: Academy.DataStax.com/Download
Start free training for DataStax Enterprise Graph: Academy.DataStax.com/courses/ds332-datastax-enterprise-graph
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra (DataStax Academy)
DataStax Enterprise Advanced Replication supports one-way distributed data replication from remote database clusters that might experience periods of network or internet downtime, benefiting use cases that require a 'hub and spoke' architecture.
Learn more at http://www.datastax.com/2016/07/stay-100-connected-with-dse-advanced-replication
Advanced Replication docs: https://docs.datastax.com/en/latest-dse/datastax_enterprise/advRep/advRepTOC.html
Data modeling is one of the first things to sink your teeth into when trying out a new database. That's why we are going to cover this foundational topic in enough detail for you to get dangerous. Data modeling for relational databases is more than a touch different from the way it's approached with Cassandra. We will address the quintessential query-driven methodology through a couple of different use cases, including working with time series data for IoT. We will also demo a new tool to get you bootstrapped quickly with MovieLens sample data. This talk should give you the basics you need to get serious with Apache Cassandra.
Hear about how Coursera uses Cassandra as the core of its scalable online education platform. I'll discuss the strengths of Cassandra that we leverage, as well as some limitations that you might run into as well in practice.
In the second part of this talk, we'll dive into how best to use the DataStax Java drivers effectively. We'll dig into how the driver is architected, and use this understanding to develop best practices to follow. I'll also share a couple of interesting bugs we've run into at Coursera.
Cassandra @ Sony: The good, the bad, and the ugly part 1 (DataStax Academy)
This talk covers scaling Cassandra for a fast-growing user base. Alex and Isaias will cover new best practices and how to work with the strengths and weaknesses of Cassandra at large scale. They will discuss how to adapt to bottlenecks while providing a rich feature set to the PlayStation community.
Cassandra @ Sony: The good, the bad, and the ugly part 2 (DataStax Academy)
This talk covers scaling Cassandra for a fast-growing user base. Alex and Isaias will cover new best practices and how to work with the strengths and weaknesses of Cassandra at large scale. They will discuss how to adapt to bottlenecks while providing a rich feature set to the PlayStation community.
Welcome to the first live UiPath Community Day Dubai! Join us for this unique occasion to meet our local and global UiPath Community and leaders. You will get a full view of the MEA region's automation landscape and the AI Powered automation technology capabilities of UiPath. Also, hosted by our local partners Marc Ellis, you will enjoy a half-day packed with industry insights and automation peers networking.
Curious about our agenda? Wait no more!
10:00 Welcome note - UiPath Community in Dubai
Lovely Sinha, UiPath Community Chapter Leader, UiPath MVPx3, Hyper-automation Consultant, First Abu Dhabi Bank
10:20 A UiPath cross-region MEA overview
Ashraf El Zarka, VP and Managing Director MEA, UiPath
10:35 Customer Success Journey
Deepthi Deepak, Head of Intelligent Automation CoE, First Abu Dhabi Bank
11:15 The UiPath approach to GenAI with our three principles: improve accuracy, supercharge productivity, and automate more
Boris Krumrey, Global VP, Automation Innovation, UiPath
12:15 Discover how Marc Ellis leverages tech-driven solutions in recruitment and managed services.
Brendan Lingam, Director of Sales and Business Development, Marc Ellis
Securing your Kubernetes cluster: a step-by-step guide to success! (KatiaHIMEUR1)
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 (Albert Hoitingh)
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Generative AI Deep Dive: Advancing from Proof of Concept to Production (Aggregage)
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
The Metaverse and AI: how can decision-makers harness the Metaverse for their... (Jen Stirrup)
The Metaverse is popularized in science fiction, and now it is becoming closer to being a part of our daily lives through the use of social media and shopping companies. How can businesses survive in a world where Artificial Intelligence is becoming the present as well as the future of technology, and how does the Metaverse fit into business strategy when futurist ideas are developing into reality at accelerated rates? How do we do this when our data isn't up to scratch? How can we move towards success with our data so we are set up for the Metaverse when it arrives?
How can you help your company evolve, adapt, and succeed using Artificial Intelligence and the Metaverse to stay ahead of the competition? What are the potential issues, complications, and benefits that these technologies could bring to us and our organizations? In this session, Jen Stirrup will explain how to start thinking about these technologies as an organisation.
UiPath Test Automation using UiPath Test Suite series, part 4 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Enhancing Performance with Globus and the Science DMZ (Globus)
ESnet has led the way in helping national facilities, and many other institutions in the research community, configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
Removing Uninteresting Bytes in Software Fuzzing (Aftab Hussain)
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries: Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns, and DIAR helps you find such seeds.
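The seed-slimming idea can be sketched with a toy loop. This is a naive illustration under invented assumptions, not the actual DIAR technique: behaviour() here is a stand-in for a real coverage signal such as AFL's edge coverage.

```python
def behaviour(data: bytes):
    """Stand-in for a coverage signature: which 'interesting' byte values
    the target reacts to. Real fuzzers would observe edge coverage instead."""
    return frozenset(b for b in data if b in (ord('<'), ord('>')))

def prune_seed(seed: bytes) -> bytes:
    """Drop bytes whose removal leaves the behaviour signature unchanged,
    so later mutations are spent only on bytes that matter."""
    baseline = behaviour(seed)
    kept = bytearray()
    for i, b in enumerate(seed):
        candidate = bytes(kept) + seed[i + 1:]
        if behaviour(candidate) != baseline:
            kept.append(b)  # removing this byte changed behaviour: keep it
    return bytes(kept)

print(prune_seed(b"aa<bb>cc"))  # b'<>'
```

The payoff is the same as described above: a leaner seed means fewer wasted mutations on bytes the target never distinguishes.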
- These are the slides of the talk given at the IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW 2022).
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
4. What is Apache Cassandra?
Apache Cassandra™ is a massively scalable NoSQL database. Cassandra is designed to handle big data workloads across multiple data centers with no single point of failure, providing enterprises with continuous availability without compromising performance.
16. Overview of Data Partitioning in Cassandra
There are two basic data partitioning strategies:
1. Random partitioning – the default and recommended strategy. Partitions data as evenly as possible across all nodes using a hash of every column family row key.
2. Ordered partitioning – stores column family row keys in sorted order across the nodes in a database cluster.
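The hashing behind random partitioning can be sketched as a toy model in Python (not a driver API). Cassandra's RandomPartitioner hashes the row key with MD5 into a token space of 0 .. 2^127 − 1; the evenly sized per-node ranges below are an illustrative assumption:

```python
import hashlib

def token(row_key: bytes) -> int:
    # Random partitioning: hash the row key. Cassandra's RandomPartitioner
    # uses MD5, producing tokens in the range 0 .. 2**127 - 1.
    return int.from_bytes(hashlib.md5(row_key).digest(), "big") % (2**127)

def node_for_key(row_key: bytes, num_nodes: int) -> int:
    # Toy placement: evenly sized contiguous token ranges, one per node.
    range_size = 2**127 // num_nodes
    return min(token(row_key) // range_size, num_nodes - 1)

# Hashing spreads keys roughly evenly across a 4-node cluster:
counts = [0] * 4
for i in range(10_000):
    counts[node_for_key(f"user-{i}".encode(), 4)] += 1
```

Because the hash output is effectively uniform, each node ends up with close to a quarter of the 10,000 keys, which is exactly the "as evenly as possible" property the slide describes.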
20. Data Distribution
[Diagram: token ring with node tokens -9223372036854775808, -5534023222112865485, -1844674407370955162, 1844674407370955161, and 5534023222112865484. The highlighted node owns the token range 1844674407370955162 to 5534023222112865484.]
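Token-range ownership on a ring like this one can be sketched in Python (a toy model using the token values from the slide; in Cassandra, each node owns the range from the previous node's token, exclusive, up to its own token, inclusive):

```python
from bisect import bisect_left

# Node tokens from the ring on the slide (the Murmur3 token space spans
# -2**63 .. 2**63 - 1). Each node owns (previous_token, its_token].
ring = sorted([-9223372036854775808, -5534023222112865485,
               -1844674407370955162, 1844674407370955161,
               5534023222112865484])

def owner(token: int) -> int:
    """Index of the node whose range covers `token`."""
    i = bisect_left(ring, token)
    return i % len(ring)  # past the last token, wrap to the first node
```

A token of 3000000000000000000 falls between 1844674407370955161 and 5534023222112865484, so it lands on the node holding the latter token, matching the ownership shown in the diagram.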
21. Overview of Replication in Cassandra
• Replication is controlled by the replication factor. A replication factor of 1 means there is only one copy of each row in a cluster; a replication factor of 2 means there are two copies of each row stored in a cluster
• Replication is controlled at the keyspace level in Cassandra
[Diagram: ring showing an original row on one node and a copy of the row on another]
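Replica placement can be sketched in the style of Cassandra's SimpleStrategy: the first copy lives on the node that owns the row's token, and the remaining copies go to the next nodes walking clockwise around the ring (a toy model, not the real strategy class):

```python
def replicas(owner_index: int, num_nodes: int, replication_factor: int) -> list[int]:
    # SimpleStrategy-style placement: first replica on the owning node,
    # the rest on the next nodes clockwise around the ring.
    rf = min(replication_factor, num_nodes)  # RF cannot exceed node count
    return [(owner_index + i) % num_nodes for i in range(rf)]

# Replication factor 3 on a 5-node ring, row owned by node 4:
print(replicas(4, 5, 3))  # -> [4, 0, 1]
```

The modulo wrap is what makes the placement "circular": a row owned by the last node on the ring still gets its extra copies, on the first nodes.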
37. Reading and Writing to Cassandra Nodes
• Cassandra has a "location independence" architecture, which allows any user to connect to any node in any data center and read/write the data they need
• All writes are automatically partitioned and replicated throughout the cluster
43. Tunable Data Consistency
• Choose between strong and eventual consistency (one to all replicas responding) depending on the need
• Can be set on a per-operation basis, for both reads and writes
• Handles multi-data center operations

Writes: Any, One, Quorum, Local_Quorum, Each_Quorum, All
Reads: One, Quorum, Local_Quorum, Each_Quorum, All
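The arithmetic behind these levels can be sketched in a few lines (a simplification: QUORUM is a majority of replicas, and a read is guaranteed to see the latest write whenever the read and write replica sets must overlap):

```python
def quorum(replication_factor: int) -> int:
    # QUORUM = a majority of the replicas: floor(RF / 2) + 1
    return replication_factor // 2 + 1

def strongly_consistent(write_replicas: int, read_replicas: int, rf: int) -> bool:
    # Strong consistency when the write and read replica sets are forced
    # to overlap in at least one node: W + R > RF
    return write_replicas + read_replicas > rf

rf = 3
# QUORUM writes + QUORUM reads: 2 + 2 > 3, so reads see the latest write.
# ONE writes + ONE reads: 1 + 1 = 2, not > 3, so only eventual consistency.
```

This is why QUORUM/QUORUM is the common "strong but available" choice: with RF = 3 it tolerates one node being down for both reads and writes.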
61. Writes (what happens within each node)
• Data is first written to a commit log for durability; your data is safe in Cassandra
• Then written to a memtable in memory
• Once the memtable becomes full, it is flushed to an SSTable (sorted strings table)
• Writes are atomic at the row level: all columns are written or updated, or none are. RDBMS-style transactions are not supported

INSERT INTO… → commit log → memtable → SSTable

Cassandra is known for being one of the fastest databases in the industry where write operations are concerned.
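The commit log → memtable → SSTable sequence above can be sketched as a toy per-node model (the class, field names, and size-based flush trigger are illustrative assumptions, not Cassandra's implementation):

```python
class Node:
    """Toy model of the per-node write path: commit log, memtable, SSTables."""

    def __init__(self, memtable_limit: int = 3):
        self.commit_log = []        # append-only log, written first for durability
        self.memtable = {}          # in-memory, mutable, newest data
        self.sstables = []          # immutable tables, sorted by row key on flush
        self.memtable_limit = memtable_limit

    def write(self, row_key, columns):
        self.commit_log.append((row_key, columns))   # 1. commit log first
        self.memtable[row_key] = columns             # 2. then the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                             # 3. flush when "full"

    def flush(self):
        # SSTable = "sorted strings table": rows sorted by key, never mutated
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
```

Writes never touch an SSTable in place, which is a large part of why the write path is so fast: every step is an append or an in-memory update.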
64. Reads (what happens within each node)
• Depending on the frequency of inserts and updates, a record may exist in multiple places; each place must be read to retrieve the entire record
• Data is read from the memtable in memory
• Multiple SSTables may also be read
• Bloom filters prevent excessive reading of SSTables

SELECT * FROM… → memtable, plus Bloom filter → SSTable(s)
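The read path can be sketched as merging the memtable with every SSTable whose Bloom filter says the key might be present (a toy model: real Cassandra reconciles columns by timestamp, while here "newest source wins" via merge order, and the Bloom filter is a minimal hand-rolled one):

```python
import hashlib

class BloomFilter:
    # Tiny Bloom filter: k hash positions in an m-bit array.
    # False positives are possible; false negatives are not.
    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, key: str):
        for i in range(self.k):
            digest = hashlib.md5(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, key: str):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key: str) -> bool:
        return all(self.bits >> p & 1 for p in self._positions(key))

def read(row_key, memtable, sstables):
    # Merge oldest SSTables first, memtable last, so newer values win.
    row = {}
    for bloom, table in sstables:
        if bloom.might_contain(row_key):   # skip SSTables that can't match
            row.update(table.get(row_key, {}))
    row.update(memtable.get(row_key, {}))
    return row
```

The Bloom filter check is what keeps reads cheap: an SSTable on disk is only touched when its filter says the key might be inside, so most non-matching tables are skipped without any I/O.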
69. Security in Cassandra
FEATURES AND BENEFITS

Internal Authentication – manages login IDs and passwords inside the database
+ Ensures only authorized users can access a database system using internal validation
+ Simple to implement and easy to understand
+ No learning curve from the relational world

Object Permission Management – controls who has access to what, and who can do what, in the database
+ Provides granular control over who can add/change/delete/read data
+ Uses familiar GRANT/REVOKE from relational systems
+ No learning curve

Client-to-Node Encryption – protects data in flight to and from a database cluster
+ Ensures data cannot be captured/stolen en route to a server
+ Data is safe both in flight to/from a database and on the database; complete coverage is ensured
70. Advanced Security in DataStax Enterprise
FEATURES AND BENEFITS

External Authentication – uses external security software systems to control security
+ Only authorized users have access to a database system using external validation
+ Uses the most trusted external security systems (Kerberos, LDAP, AD), mainstays in government and finance
+ Single sign-on to all data domains

Transparent Data Encryption – encrypts data at rest
+ Protects sensitive data at rest from theft and from being read at the file system level
+ No changes needed at the application level

Data Auditing – provides a trail of who did and looked at what, and when
+ Supplies admins with an audit trail of all accesses and changes
+ Granular control to audit only what's needed
+ Uses the log4j interface to ensure performant and efficient audit operations
72. DataStax OpsCenter
• Visual, browser-based user interface negates the need to install client software
• Administration tasks carried out in point-and-click fashion
• Allows for visual rebalancing of data across a cluster when new nodes are added
• Contains proactive alerts that warn of impending issues
• Built-in external notification abilities
• Visually perform and schedule backup operations
73. DataStax OpsCenter
A new, 10-node Cassandra (or Hadoop) DSE cluster with OpsCenter, running on AWS in 3 minutes…
1 → 2 → 3 → Done
75. Enterprise Search
• Built-in enterprise search on Cassandra data via Solr integration
• Facets, filtering, geospatial search, text analysis, etc.
• Near-real-time search operations
• Search queries from CQL and REST/Solr
• Addresses standalone Solr shortcomings:
  • No bottleneck: clients can read/write to any Solr node
  • Search index partitioning and replication for scalability and availability
  • Multi-DC support
  • Data durability (standalone Solr lacks a write-ahead log, so data can be lost)
76. [Diagram: Cassandra replication flowing between customer-facing nodes and search nodes]