C* Summit EU 2013: No Whistling Required: Cabs, Cassandra, and Hailo | DataStax Academy
Speaker: Dave Gardner, Architect at Hailo
Video: http://www.youtube.com/watch?v=6cUuE7sTdU0&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=16
Hailo has leveraged Cassandra to build one of the most successful startups in European history. This presentation looks at how Hailo grew from a simple MySQL-backed infrastructure to a resilient Cassandra-backed system running in three data centres globally. Topics covered include: the process of migration, experience running multi-DC on AWS, common data modeling patterns and security implications for achieving PCI compliance.
C* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag Jambhekar | DataStax Academy
We have seen rapid adoption of C* at eBay in the past two years. We have made tremendous efforts to integrate C* into our existing database platforms, including Oracle, MySQL, Postgres, MongoDB, XMP, etc. We have also scaled C* to meet business requirements and encountered technical challenges you only see at eBay scale: 100 TB of data on hundreds of nodes. We will share our experience with deployment automation, management, monitoring, and reporting for both Apache Cassandra and DataStax Enterprise.
Migration Best Practices: From RDBMS to Cassandra without a Hitch | DataStax Academy
Presenter: Duy Hai Doan, Technical Advocate at DataStax
Libon is a messaging service designed to improve mobile communications through free calls, chat and voicemail services, regardless of operator or Internet access provider. As a mobile communications application, Libon processes billions of messages and calls while backing up billions of contact records. Join this webinar to learn best practices and pitfalls to avoid when tackling a migration project from a relational database (RDBMS) to Cassandra, and how Libon is now able to ingest massive volumes of high-velocity data with read and write latencies below 10 milliseconds.
Cassandra Summit 2014: Apache Cassandra Best Practices at eBay | DataStax Academy
Presenter: Feng Qu, Principal DBA at eBay
Cassandra has been adopted widely at eBay in recent years and is used by many end-user-facing applications. I will introduce the best practices we have built over time around system design, capacity planning, deployment automation, monitoring integration, performance analysis and troubleshooting. I will also share our experience working with DataStax support to provide a highly available, highly scalable data store that fits into eBay's infrastructure.
Cassandra at eBay - Cassandra Summit 2013 | Jay Patel
"Buy It Now! Cassandra at eBay" talk at Cassandra Summit 2013
This session will cover various use cases for Cassandra at eBay. It'll start with an overview of eBay's heterogeneous data platform, comprising SQL & NoSQL databases, and where Cassandra fits into it. For each use case, Jay will go into the details of system design, data model & multi-datacenter deployment. To conclude, Jay will summarize the best practices that guide Cassandra usage at eBay.
http://www.datastax.com/company/news-and-events/events/cassandrasummit2013
DataStax C*ollege Credit: What and Why NoSQL? | DataStax
In the first of our bi-weekly C*ollege Credit series, Aaron Morton, DataStax MVP for Apache Cassandra and Apache Cassandra committer, and Robin Schumacher, VP of Product Management at DataStax, will take a look back at the history of NoSQL databases and provide a foundation of knowledge for people looking to get started with NoSQL, or just wanting to learn more about this growing trend. You will learn how to tell whether NoSQL is right for your application, and how to pick a NoSQL database. This webinar is C* 101 level.
Talk given at QCon, London 2014. You can find the video here: http://bit.ly/jpm_001a
This topic will introduce the Cassandra native protocol, native drivers and the Cassandra Query Language (CQL). It is important for developers to be aware of this new way of integrating with and querying Cassandra – without using Thrift or RPC. There are various ways of tuning that integration and modeling your data, all intended to make it easier and more productive to build against Cassandra, with some additional performance benefits. This is a technical session with code excerpts using the Java driver.
Cassandra nyc 2011 ilya maykov - ooyala - scaling video analytics with apac... | ivmaykov
This document discusses scaling video analytics using Apache Cassandra. It provides an overview of Ooyala's video analytics platform and the challenges of scaling to support billions of log pings and terabytes of data daily. Cassandra is used to store over 10 terabytes of historical analytics data covering 4 years of growth. The key challenges addressed are scaling to handle enormous data volumes, providing fast processing and query speeds, supporting deep queries over many dimensions of data, ensuring accuracy, and allowing for rapid developer iteration. The document explains how Cassandra's data model and capabilities help meet these challenges through features like linear scalability, tunable consistency, and a rich data model.
Being able to rapidly iterate on, build, and test your code is key to being a productive developer. Without local automation, working with the numerous platforms and technologies in your stack can become very frustrating. In this webinar, Ben Bromhead, CTO of Instaclustr, will explore best practices to easily integrate Apache Cassandra™ into your development workflow, so you spend more time writing good code and less time fighting your environment.
Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Sto... | DataStax
What did I sell yesterday and how much of my plan did I fulfill today? How do my clients use our offer? What configuration combinations are in demand and what trends are emerging? How can I improve the user experience? These and other questions are frequently asked by board members and stakeholders and must be answered within a short period of time. Especially in companies that provide configurable products, it is important to support the product and pricing managers in short-term and competition-related matters with all the important data in a timely manner. In our use case, Cassandra, Kafka and Flink will take up this challenge. In this session, we will present a reference architecture based on selected use cases and demonstrate what applications arise for companies. We also take a closer look at information privacy and say a few words about data visualisation.
About the Speakers
Alexandra Klimova Big Data Architect, Allianz Deutschland AG
Alexandra has 10 years of experience in both programming and operations. For the last 4 years she has focused on the design and integration of Big Data systems into enterprise platforms. She works on data processing pipelines, distributed systems, real-time processing and data science. Alexandra holds a degree in Computer Science from the Technical University of Munich. She is a certified Hortonworks Hadoop Trainer and Big Data Architect at metafinanz.
Dominique Ronde Big Data Architect, Allianz Deutschland AG
Dominique Ronde is a Big Data Architect at Allianz Deutschland AG, focused on the Cassandra platform. He also enjoys the data analytics side, with Flink and Spark. A real Java nerd since 2002, Dominique is familiar with the programming part, too. He is a certified DataStax Solution Architect.
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store | DataStax Academy
We will present our Office 365 use case scenarios, why we chose Cassandra + Spark, and walk through the architecture we chose for running DSE on Azure.
The presentation will feature demos on how you too can build similar applications.
Ben Bromhead is the co-founder and CTO of Instaclustr, which provides Cassandra-as-a-Service. Instaclustr manages 50+ Cassandra nodes for customers. Early on, Instaclustr encountered issues like a Cassandra bug causing assertion errors for large column names and had to perform an emergency migration for a customer whose self-managed cluster was down for 48 hours. Migrations and real-world usage revealed new challenges compared to initial perfect test scenarios.
There are More Clouds! Azure and Cassandra (Carlos Rolo, Pythian) | C* Summit... | DataStax
This document summarizes Carlos Rolo's presentation on using Cassandra with Azure Resource Manager. It introduces Carlos and his background with distributed systems and Cassandra. It then discusses Pythian, the consulting company, and their expertise in database management. The remainder of the document summarizes key aspects of using Azure, including the different Azure services, storage options, networking, availability sets, and Azure Resource Manager templates for automating deployments of Cassandra clusters on Azure.
The document discusses Apache Cassandra, a distributed database management system designed to handle large amounts of data across many commodity servers. It was developed at Facebook and modeled after Google's Bigtable. The summary discusses key concepts like its use of consistent hashing to distribute data, support for tunable consistency levels, and focus on scalability and availability over traditional SQL features. It also provides an overview of how Cassandra differs from relational databases by not supporting joins, having an optional schema, and using a prematerialized and transaction-less model.
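The consistent hashing mentioned in the summary above can be sketched in a few lines. This is an illustrative toy (an MD5 ring with one token per node and made-up node names), not Cassandra's actual partitioner:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Map any key onto a fixed integer ring (128-bit MD5, purely illustrative).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Toy consistent-hash ring: each node owns the arc of keys up to its token."""

    def __init__(self, nodes):
        self.tokens = sorted((_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        ring = [t for t, _ in self.tokens]
        # First token >= the key's hash owns it; wrap around at the end of the ring.
        i = bisect.bisect(ring, _hash(key)) % len(self.tokens)
        return self.tokens[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")  # deterministic owner for this key
```

The point of the scheme is that adding or removing a node only remaps the keys on the neighbouring arc, rather than reshuffling everything, which is what makes linear scaling practical.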
Low-Latency Data Processing in the Era of Serverless @JavaDayLviv | Nazarii Cherkas
This document discusses low latency data processing in serverless computing. It begins with an overview of serverless and why industries are adopting it. While serverless is well suited for fast development, low latency data processing is challenging due to function startup times. However, approaches like keeping functions warm, using co-located services, and caching hot data externally can help. The document demonstrates a fraud detection solution built on Hazelcast Cloud, which provides an in-memory data grid that can ingest and transform data quickly. In serverless, it is important to consider latency and high availability when choosing data storage, and in-memory computing is well suited for low latency requirements.
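The "caching hot data" tactic mentioned above can be illustrated with a minimal in-memory TTL cache. This is a toy sketch standing in for an external data grid such as Hazelcast, not its API; the key names are invented:

```python
import time

class TTLCache:
    """Toy in-memory cache with per-entry expiry, a stand-in for an external hot-data store."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}

    def put(self, key, value):
        # Record the value together with its expiry deadline.
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires = entry
        if time.monotonic() >= expires:
            # Lazily evict expired entries on read.
            del self._store[key]
            return default
        return value

cache = TTLCache(ttl_seconds=60.0)
cache.put("fraud-rules", {"max_amount": 1000})
hot = cache.get("fraud-rules")
```

In a serverless function, a warm instance could consult such a cache before hitting slower storage, which is the latency argument the talk makes.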
Cassandra Day SV 2014: Beyond Read-Modify-Write with Apache Cassandra | DataStax Academy
This document discusses strategies for updating data in Apache Cassandra beyond using read-modify-write operations. It describes how eventual consistency allows safe updates without locking by propagating changes asynchronously. It also covers Cassandra features like collections, lightweight transactions, and content-addressable storage that provide flexible data models for modern web-scale applications while avoiding the need for read-modify-write in many cases.
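One way to see why the blind writes described above are safe under eventual consistency: each cell carries a write timestamp, and replicas converge by keeping the highest one (last write wins). A simplified sketch of that merge rule, not Cassandra's actual code:

```python
def merge_cell(a, b):
    """Resolve two versions of a cell: the highest timestamp wins (last-write-wins)."""
    # a and b are (timestamp, value) pairs, e.g. as held by two replicas.
    return a if a[0] >= b[0] else b

# Two replicas accepted blind writes for the same cell without any read or lock:
replica1 = (1700000001, "alice@old.example")
replica2 = (1700000005, "alice@new.example")
winner = merge_cell(replica1, replica2)  # the later write prevails on repair
```

Because the merge is deterministic and commutative, replicas can apply updates in any order and still agree, which is what removes the need for read-modify-write in many cases.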
Webinar: Diagnosing Apache Cassandra Problems in Production | DataStax Academy
This document provides guidance on diagnosing problems in Cassandra production systems. It recommends first using OpsCenter to identify issues, then monitoring servers, applications, and logs. Common problems discussed include incorrect timestamps, tombstones slowing queries, not using a snitch, version mismatches, and disk space not being reclaimed. Diagnostic tools like htop, iostat, and nodetool are presented. The document also covers JVM garbage collection profiling to identify issues like early object promotion and long minor GCs slowing the system.
Hardening Cassandra for compliance or paranoia | zznate
Cassandra at-rest encryption, inter-node communication encryption, client-server communication encryption, authentication, authorization, and securing JMX management were discussed. The document provided guidance on implementing encryption at rest using commercial and open-source options, setting up SSL for inter-node and client-server communication using self-signed certificates, carrying over authentication and authorization best practices from RDBMS, and securing JMX access.
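For the client-server encryption mentioned above, the client side mostly amounts to building a TLS context that trusts the cluster's (possibly self-signed) CA. A hedged standard-library sketch; the CA path is a placeholder, and passing an `ssl_context` to your Cassandra driver is an assumption to verify against its documentation:

```python
import ssl

def make_client_tls_context(ca_path=None):
    """Build a verifying TLS context for connecting to an encrypted cluster."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ca_path:
        # With a self-signed setup, ca_path points at the CA certificate
        # you generated and distributed to clients (placeholder path).
        ctx.load_verify_locations(ca_path)
    else:
        ctx.load_default_certs()
    ctx.verify_mode = ssl.CERT_REQUIRED  # always verify the server certificate
    return ctx

ctx = make_client_tls_context()
```

The same idea applies on the server side via `cassandra.yaml`'s encryption options; the context here only covers the client half of the handshake.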
Cisco has a large global IT infrastructure supporting many applications, databases, and employees. The document discusses Cisco's existing customer service and commerce systems (CSCC/SMS3) and some of the performance, scalability, and user experience issues. It then presents a proposed new architecture using modern technologies like Elasticsearch, Cassandra, and microservices to address these issues and improve agility, performance, scalability, uptime, and the user interface.
- Micro-batching involves grouping statements into small batches to improve throughput and reduce network overhead when writing to Cassandra.
- A benchmark was conducted to compare individual statements, regular batches, and partition-aware batches when inserting 1 million rows into Cassandra.
- The results showed that partition-aware batches had shorter runtimes and lower client and cluster CPU usage, and were more performant overall than individual statements and regular batches. However, batching can add latency, so it is better suited to bulk data processing than to real-time workloads.
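The partition-aware batching described above amounts to grouping rows by partition key before sending, so each batch lands on a single replica set. A rough sketch of the grouping step (illustrative only; the `sensor` field is invented, and real code would wrap each group in a driver batch statement):

```python
from collections import defaultdict

def partition_aware_batches(rows, key, batch_size=50):
    """Group rows by partition key, then split each group into size-limited batches."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    batches = []
    for group in groups.values():
        # Cap batch size so no single batch grows unbounded.
        for i in range(0, len(group), batch_size):
            batches.append(group[i:i + batch_size])
    return batches

rows = [{"sensor": f"s{i % 3}", "value": i} for i in range(10)]
batches = partition_aware_batches(rows, key="sensor", batch_size=4)
# Every batch touches exactly one partition (one "sensor" value)
```

Because every batch targets one partition, the coordinator never has to fan writes out to other nodes, which is where the CPU and runtime savings in the benchmark come from.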
This session will address Cassandra's tunable consistency model and cover how developers and companies should adopt a more Optimistic Software Design model.
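The tunable consistency mentioned here follows a simple rule: with replication factor RF, a read at consistency level R is guaranteed to observe a write at level W whenever R + W > RF, because the two replica sets must overlap in at least one node. A quick sketch of the check:

```python
def is_strongly_consistent(read_cl: int, write_cl: int, rf: int) -> bool:
    """True if every read quorum intersects every write quorum (R + W > RF)."""
    return read_cl + write_cl > rf

# RF=3: QUORUM (2) reads + QUORUM (2) writes overlap, so reads see the latest write.
assert is_strongly_consistent(2, 2, 3)
# RF=3: ONE (1) reads + ONE (1) writes may touch disjoint replicas.
assert not is_strongly_consistent(1, 1, 3)
```

Choosing levels below the overlap threshold is precisely the optimistic stance the session argues for: accept briefly stale reads in exchange for lower latency and higher availability.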
Webinar: DataStax Training - Everything you need to become a Cassandra Rockstar | DataStax
The document outlines a training program from DataStax on Apache Cassandra, including an introduction to various courses that cover topics such as core concepts, operations and performance tuning, building scalable Java applications, and data modeling. It provides details on the objectives, length, audience, prerequisites, and agenda for each course. The document also includes a schedule of public course dates and locations for attendees to sign up for training.
DataStax recently announced the general availability of DataStax Enterprise 4.7 (DSE 4.7), the leading database platform purpose-built for the performance and availability demands of web, mobile, and IoT applications. In this product launch webinar, Robin Schumacher, VP of Products, explores the wide range of enhancements in DSE 4.7, including enterprise-class search, analytics, and in-memory.
Dyn delivers exceptional Internet performance. Enabling high-quality services requires data centers around the globe. In order to manage services, customers need timely insight collected from all over the world. Dyn uses DataStax Enterprise (DSE) to deploy complex clusters across multiple datacenters to enable sub-50 ms query responses for hundreds of billions of data points. From granular DNS traffic data to aggregated counts for a variety of report dimensions, DSE at Dyn has been up since 2013 and has shined through upgrades, data center migrations, DDoS attacks and hardware failures. In this webinar, Principal Engineers Tim Chadwick and Rick Bross cover the requirements which led them to choose DSE as their go-to Big Data solution, the path which led to Spark, and the lessons they've learned in the process.
Ruby Driver Explained: DataStax Webinar May 5th 2015 | DataStax
Apache Cassandra is the leading distributed database platform purpose-built for the demands of today's modern web, mobile, and IoT applications. To support this growing demand, DataStax makes it a priority to deliver drivers for today's most common and cutting-edge languages. Join Bulat Shakirzyanov, Software Engineer at DataStax, as he dives into the inner workings of the DataStax Ruby Driver for Apache Cassandra. He will:
• Provide a brief overview of Apache Cassandra's architecture and various terminology
• Walk through the basics for installing and using the DataStax Ruby Driver
• Drill into exactly why the Ruby Driver is fully asynchronous, both from the IO implementation as well as the Cassandra native protocol design
• Look at how the driver uses load balancing policies as well as use cases for various built-in load balancing policies, such as token aware and data center aware round-robin
• Discuss in detail the driver's error handling and fault tolerance and discuss various expected failure modes and how to handle them
• Explain the importance of address resolution policies and take a look at those that ship with the driver
This document discusses SQL Server security enhancements in SQL Server 2014. It covers three main topics:
1) Transparent Data Encryption allows encrypting database and log files for protection both during operations and when backing up to disk or Azure. Encryption can use passwords, asymmetric keys, or certificates.
2) Encryption Key Management allows managing encryption keys through PowerShell, SMO, SSMS and T-SQL. Asymmetric keys or certificates used for encryption must be properly backed up.
3) A new "CONNECT ANY DATABASE" permission allows logins to connect to all current and future databases without other permissions in those databases. This facilitates auditing processes.
• We are sleeping well, and our mobile is ringing and ringing. Message: DISASTER! In this session (on slides) we will NOT talk about preventing a potential disaster (such as BCM); we will talk about: what NOW? This is a new version of my old, well-known session, updated for all the changes that have happened in the DBA world in the last two to three years.
• So, from the ground to the sky and beyond: everything for surviving a disaster. Which tasks should have been finished BEFORE? Does it matter whether SQL Server is virtual or physical? We will talk about systems, databases, people, encryption, passwords, certificates and users.
• In this session (with a few demos) I'll show which parts of our SQL Server environment are critical and how to be prepared for disaster. In some documents I'll show you how to be BEST prepared.
Cassandra nyc 2011 ilya maykov - ooyala - scaling video analytics with apac...ivmaykov
This document discusses scaling video analytics using Apache Cassandra. It provides an overview of Ooyala's video analytics platform and the challenges of scaling to support billions of log pings and terabytes of data daily. Cassandra is used to store over 10 terabytes of historical analytics data covering 4 years of growth. The key challenges addressed are scaling to handle enormous data volumes, providing fast processing and query speeds, supporting deep queries over many dimensions of data, ensuring accuracy, and allowing for rapid developer iteration. The document explains how Cassandra's data model and capabilities help meet these challenges through features like linear scalability, tunable consistency, and a rich data model.
Being able to rapidly iterate on, build, and test your code is key to being a productive developer. Without local automation, working with the numerous platforms and technologies in your stack can become very frustrating. In this webinar, Ben Bromhead CTO of Instaclustr will explore best practices to easily integrate Apache CassandraTM into your development workflow, so you spend more time writing good code and less time fighting your environment.
Real Time Business Intelligence with Cassandra, Kafka and Hadoop - A Real Sto...DataStax
What did I sell yesterday and how much of my plan did I fulfill today? How do my clients use our offer? What configuration combinations are in demand and what trends are emerging? How can I improve the user experience? These and other questions are frequently asked by board members and stakeholders and must be answered within a short period of time. Especially in companies that provide configurable products, it is important to support the product and pricing managers in short-term and competition-related matters with all the important data in a timely manner. In our use case, Cassandra, Kafka and Flink will take up this challenge. In this session, we will present a reference architecture based on selected use cases and demonstrate what applications arise for companies. We also take a closer look to information privacy and give some words about data visualisation.
About the Speakers
Alexandra Klimova Big Data Architect, Allianz Deutschland AG
Alexandra has 10 years of experience in both programing and operations. For the last 4 years she has focused on design and integration of Big Data Systems into enterprise platforms. She is working on data processing pipelines, distributed systems, realtime processing and data science. Alexandra holds a degree in Computer Science from the Technical University Munich. She is certified Hortonworks Hadoop Trainer and Big Data Architect at metafinanz.
Dominique Ronde Big Data Architect, Allianz Deutschland AG
Dominique Ronde is Big Data Architect at Allianz Deutschland AG and focused on the cassandra plattform. He also enjoys the part of data analytics with Flink and Spark. As a real java nerd since 2002 Dominique is familiar with the programming part, too. He is certified DataStax Solution Architect
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreDataStax Academy
We will present our Office 365 use case scenarios, why we chose Cassandra + Spark, and walk through the architecture we chose for running DSE on Azure.
The presentation will feature demos on how you too can build similar applications.
Ben Bromhead is the co-founder and CTO of Instaclustr, which provides Cassandra-as-a-Service. Instaclustr manages 50+ Cassandra nodes for customers. Early on, Instaclustr encountered issues like a Cassandra bug causing assertion errors for large column names and had to perform an emergency migration for a customer whose self-managed cluster was down for 48 hours. Migrations and real-world usage revealed new challenges compared to initial perfect test scenarios.
There are More Clouds! Azure and Cassandra (Carlos Rolo, Pythian) | C* Summit...DataStax
This document summarizes Carlos Rolo's presentation on using Cassandra with Azure Resource Manager. It introduces Carlos and his background with distributed systems and Cassandra. It then discusses Pythian, the consulting company, and their expertise in database management. The remainder of the document summarizes key aspects of using Azure, including the different Azure services, storage options, networking, availability sets, and Azure Resource Manager templates for automating deployments of Cassandra clusters on Azure.
The document discusses Apache Cassandra, a distributed database management system designed to handle large amounts of data across many commodity servers. It was developed at Facebook and modeled after Google's Bigtable. The summary discusses key concepts like its use of consistent hashing to distribute data, support for tunable consistency levels, and focus on scalability and availability over traditional SQL features. It also provides an overview of how Cassandra differs from relational databases by not supporting joins, having an optional schema, and using a prematerialized and transaction-less model.
Low-Latency Data Processing in the Era of Serverless @JavaDayLvivNazarii Cherkas
This document discusses low latency data processing in serverless computing. It begins with an overview of serverless and why industries are adopting it. While serverless is well suited for fast development, low latency data processing is challenging due to function startup times. However, approaches like keeping functions warm, using co-located services, and caching hot data externally can help. The document demonstrates a fraud detection solution built on Hazelcast Cloud, which provides an in-memory data grid that can ingest and transform data quickly. In serverless, it is important to consider latency and high availability when choosing data storage, and in-memory computing is well suited for low latency requirements.
Cassandra Day SV 2014: Beyond Read-Modify-Write with Apache CassandraDataStax Academy
This document discusses strategies for updating data in Apache Cassandra beyond using read-modify-write operations. It describes how eventual consistency allows safe updates without locking by propagating changes asynchronously. It also covers Cassandra features like collections, lightweight transactions, and content-addressable storage that provide flexible data models for modern web-scale applications while avoiding the need for read-modify-write in many cases.
Webinar: Diagnosing Apache Cassandra Problems in ProductionDataStax Academy
This document provides guidance on diagnosing problems in Cassandra production systems. It recommends first using OpsCenter to identify issues, then monitoring servers, applications, and logs. Common problems discussed include incorrect timestamps, tombstones slowing queries, not using a snitch, version mismatches, and disk space not being reclaimed. Diagnostic tools like htop, iostat, and nodetool are presented. The document also covers JVM garbage collection profiling to identify issues like early object promotion and long minor GCs slowing the system.
Hardening cassandra for compliance or paranoiazznate
Cassandra at rest encryption, inter-node communication encryption, client-server communication encryption, authentication, authorization, and securing JMX management were discussed. The document provided guidance on implementing encryption at rest using commercial and open source options, setting up SSL for inter-node and client-server communication using self-signed certificates, implementing authentication and authorization best practices from RBMS, and securing JMX access.
Cisco has a large global IT infrastructure supporting many applications, databases, and employees. The document discusses Cisco's existing customer service and commerce systems (CSCC/SMS3) and some of the performance, scalability, and user experience issues. It then presents a proposed new architecture using modern technologies like Elasticsearch, Cassandra, and microservices to address these issues and improve agility, performance, scalability, uptime, and the user interface.
- Micro-batching involves grouping statements into small batches to improve throughput and reduce network overhead when writing to Cassandra.
- A benchmark was conducted to compare individual statements, regular batches, and partition-aware batches when inserting 1 million rows into Cassandra.
- The results showed that partition-aware batches had shorter runtime, lower client and cluster CPU usage, and was more performant overall compared to individual statements and regular batches. However, it may have higher latency which is better suited for bulk data processing rather than real-time workloads.
This session will address Cassandra's tunable consistency model and cover how developers and companies should adopt a more Optimistic Software Design model.
Webinar: DataStax Training - Everything you need to become a Cassandra RockstarDataStax
The document outlines a training program from DataStax on Apache Cassandra, including an introduction to various courses that cover topics such as core concepts, operations and performance tuning, building scalable Java applications, and data modeling. It provides details on the objectives, length, audience, prerequisites, and agenda for each course. The document also includes a schedule of public course dates and locations for attendees to sign up for training.
DataStax recently announced the general availability of DataStax Enterprise 4.7 (DSE 4.7), the leading database platform purpose-built for the performance and availability demands of web, mobile, and IOT applications. In this product launch webinar, Robin Schumacher, VP of Products, explores the wide range of enhancements in DSE 4.7 including enterprise class search, analytics, and in-memory.
Dyn delivers exceptional Internet Performance. Enabling high quality services requires data centers around the globe. In order to manage services, customers need timely insight collected from all over the world. Dyn uses DataStax Enterprise (DSE) to deploy complex clusters across multiple datacenters to enable sub 50 ms query responses for hundreds of billions of data points. From granular DNS traffic data, to aggregated counts for a variety of report dimensions, DSE at Dyn has been up since 2013 and has shined through upgrades, data center migrations, DDoS attacks and hardware failures. In this webinar, Principal Engineers Tim Chadwick and Rick Bross cover the requirements which led them to choose DSE as their go-to Big Data solution, the path which led to SPARK, and the lessons that we’ve learned in the process.
Ruby Driver Explained: DataStax Webinar May 5th 2015 – DataStax
Apache Cassandra is the leading distributed database platform purpose built for the demands of today's modern Web, Mobile, and IOT applications. To support this growing demand, DataStax makes it a priority to deliver drivers to support today's most common and cutting edge languages. Join Bulat Shakirzyanov, Software Engineer at DataStax, as he dives into the inner-workings of the DataStax Ruby Driver for Apache Cassandra. He will:
• Provide a brief overview of Apache Cassandra's architecture and various terminology
• Walk through the basics for installing and using the DataStax Ruby Driver
• Drill into exactly why the Ruby Driver is fully asynchronous, both from the IO implementation as well as the Cassandra native protocol design
• Look at how the driver uses load balancing policies as well as use cases for various built-in load balancing policies, such as token aware and data center aware round-robin
• Discuss in detail the driver's error handling and fault tolerance, including various expected failure modes and how to handle them
• Explain the importance of address resolution policies and take a look at those that ship with the driver
This document discusses SQL Server security enhancements in SQL Server 2014. It covers three main topics:
1) Transparent Data Encryption allows encrypting database and log files for protection both during operations and when backing up to disk or Azure. Encryption can use passwords, asymmetric keys, or certificates.
2) Encryption Key Management allows managing encryption keys through PowerShell, SMO, SSMS and T-SQL. Asymmetric keys or certificates used for encryption must be properly backed up.
3) A new "CONNECT ANY DATABASE" permission allows logins to connect to all current and future databases without other permissions in those databases. This facilitates auditing processes.
• We are sleeping well. And our mobile is ringing and ringing. The message: DISASTER! In this session (on slides) we will NOT talk about potential disasters (such as BCM); we talk about: what NOW? This is a new version of my old, well-known session, updated for all the changes that have happened in the DBA world in the last two to three years.
• So, from the ground to the sky and beyond: everything for surviving a disaster. Which tasks should have been finished BEFORE? Does it matter whether SQL Server is virtual or physical? We talk about systems, databases, people, encryption, passwords, certificates and users.
• In this session (with a few demos) I'll show which parts of our SQL Server environment are critical and how to be prepared for a disaster. In some documents I'll show you how to be BEST prepared.
This one-sentence document appears to be a copyright notice for RealPage, Inc., stating that they own all trademarks mentioned and reserving all rights.
High Availability of SQL Server 2008 in the Context of SLA Agreements – Tobias Koprowski
This is the second presentation in a four-part series discussing the importance of high availability in the context of SLA agreements. The series is aimed at an ITPro audience and was published live on the VirtualStudy.pl education portal.
In my first session I will introduce everyone to the service formerly known as SQL Azure (now Windows Azure SQL Database). In this Tips and Tricks session I will show which points, features, compatibilities and incompatibilities of SQL Azure are important for DBAs. I will cover functionality, performance, cost, SLA and security aspects.
After the break I will show how we can work with our data in the cloud using SQL Azure and Blob Storage, what backup, restore, encryption and availability functionality is available to us, how we can implement a hybrid environment, and when and why it is (or is not) good practice.
And finally, I hope we will find a few minutes to discuss the future of the DBA (not only in AD 2016).
Introduction to SQL Server Analysis Services 2008 – Tobias Koprowski
This is my presentation from the 17th Polish SQL Server User Group meeting in Wrocław. It's the first part of the Business Intelligence for ITPros quadrilogy cycle.
This document discusses best practices for preparing for and responding to a disaster involving critical IT systems like servers and databases. It emphasizes the importance of regular backups, having recovery procedures documented, testing restores, and defining roles and responsibilities of team members. It provides guidance on backup strategies for SQL Server and SharePoint, including using different types of backups, storing backups offline, and setting backup schedules. It also stresses the value of preparation, being ready to restore from backups, and having contact information and credentials documented in advance in case of an emergency.
A Whistleblowing Report to the United States of Congress submitted by Scott Bennett, 2LT, United States Army (Reserve), 11th Psychological Operations Battalion to the Department of Defense Inspector General, Memorial Day, May 27, 2013
The Betrayal and Cover-Up by the U.S. Government of the Union Bank of Switzerland - Terrorist Threat Finance Connection to Booz Allen Hamilton and U.S. Central Command
Scott Bennett - Shell Game (PDF source: http://projectcamelotportal.com/files/SHELL_GAME.pdf)
Windows Azure SQL Database for Beginners (tips & tricks)
The document provides an overview and introduction to Windows Azure SQL Database including:
- Key features such as scalability, availability, data protection, and programmatic DBA functionality.
- Performance levels are described in DTU (database transaction units) with different tiers for Basic, Standard, and Premium databases.
- Limitations are discussed around database sizing, collations, logins/users, and compatibility with on-premises SQL Server features.
Eventuosity For Event Producers and Service Providers – Justin Panzer
The document describes a cloud-based platform for collaborative event management that allows all event stakeholders including clients, producers, venues, exhibitors, and attendees to be involved in the planning process. The platform provides end-to-end planning and management tools that can be accessed in the office, on the road, or at the venue. It offers complete control for event planners and integrates with core business applications. The platform promises benefits like greater efficiency, improved accuracy, smarter collaboration, and deeper business intelligence for event clients as well as resource optimization, client process integration, and competitive differentiation for event producers.
Presentation of the strategic game MatriX Urban – Андрей Донских
The strategic game MatriX Urban is a specialized version of the MatriX creative platform, designed to find fresh, unconventional solutions for developing territories and single-industry towns, improving residents' quality of life, and finding new formats of cooperation between government bodies, business, the expert community, public organizations and other stakeholders.
MatriX Urban is a creative platform for designing the future of cities and urban development projects.
Active urban communities and citizens understand the need for change and are ready to take responsibility for the present and future of their city, which shows in their readiness to participate constructively in shaping its environment.
More details: http://donskih.ru/matrix/matrix-urban/
Virtual Study Beta Exam 71-663 Exchange 2010 Designing And Deploying Messagin... – Tobias Koprowski
This is my presentation for VirtualStudy.pl as the last part of preparation for the 71-663 beta exam: Pro: Designing and Deploying Messaging Solutions with Microsoft Exchange Server 2010.
Recent news about the pending shortage of data scientists prompts speculation about automation: will machines replace human analysts? We propose a model of automation, and briefly review progress in automated machine learning over the past twenty years. Summarizing the current state of the art, we look at some of the remaining challenges, and the implications for practicing data scientists.
The document discusses how companies can implement next best offer strategies using customer data and signals. It describes how customers' purchasing behaviors have become more complex, influenced by various online sources. It then outlines how SAS software can help companies analyze customer data and behaviors to generate targeted, personalized offers at optimal times through real-time decisioning across all channels. Case studies show how US Bank improved sales and increased customer value using next best offer strategies based on signal and event analysis.
Slideburst #7 - Next Best Action in All Digital Channels – Patrik Svensson
This document discusses using customer data and analytics to deliver personalized next best actions across digital channels. It provides examples of using customer profile and usage data to offer targeted communications and packages to specific customers. The document advocates building customer profiles based on analytics and segmentation techniques. It also argues that delivering personalized next best actions requires changes to data infrastructure and architecture to better integrate customer, usage and event data.
Why is ERISA attorney Thomas Schendt so passionate about stopping retirement plan leakage? Because 401(k) loan defaults and a misunderstanding of plan sponsor requirements are costing plans billions every year. See why he believes this problem has a simple solution.
The taste of food and beverages can be dictated by the cleanliness of your water. Issues such as sediment, chlorine and hardness are often to blame but can be easily prevented.
C* Summit 2013: No Whistling Required: Cabs, Cassandra, and Hailo by Dave Gar... – DataStax Academy
Hailo has leveraged Cassandra to build one of the most successful startups in European history. This presentation looks at how Hailo grew from a simple MySQL-backed infrastructure to a resilient Cassandra-backed system running in three data centers globally. Topics covered include: the process of migration, experience running multi-DC on AWS, common data modeling patterns and security implications for achieving PCI compliance.
ContextSpace is working to develop and support an open source implementation of the Camunda core engine that persists all of its data to Cassandra. This development addresses ACID issues as well as approaches to lock management. ContextSpace plans to integrate this implementation with its own product offering in order to expose data and events generated from its identity, security, roles, messaging and contextual user activities to be managed by Camunda-driven business processes.
Acunu Analytics and Cassandra at Hailo – All Your Base 2013 – Acunu
Hailo, the taxi app, has served more than 5 million passengers in 15 cities and has taken fares of $100 million this year. I'm going to talk about how that rapid growth has been powered by a platform based on Cassandra and operational analytics and insights powered by Acunu Analytics. I'll cover some challenges and lessons learned from scaling fast!
This document discusses efficient data mining solutions using Hadoop, Cassandra, and Spark. It describes Cassandra as a fast, robust, and efficient key-value database but notes it has limitations for certain queries. Spark is presented as an alternative to Hadoop MapReduce that can be 100 times faster for interactive algorithms and data mining. The document demonstrates how Spark can integrate with Cassandra to allow distributed data processing over Cassandra data without needing to clone the data or use other databases. Future extensions are proposed to directly access Cassandra's SSTable files from Spark and extend CQL3 to leverage Spark.
C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman – DataStax Academy
Abstract: A brief intro to how Barracuda Networks uses Cassandra and the ways in which they are replacing their MySQL infrastructure with Cassandra. This presentation includes the lessons they've learned along the way during this migration.
C* Summit EU 2013: Keynote by Jonathan Ellis — Cassandra 2.0 & 2.1 – DataStax Academy
Speaker: Jonathan Ellis, Apache Cassandra Chair & CTO/Co-Founder at DataStax
Keynote presentation on Apache Cassandra 2.0 & 2.1 at Cassandra Summit EU 2013
The document discusses Cassandra 2.1, including:
- New features like user defined types, collection indexing, and more efficient HyperLogLog filters and repair processes.
- Past and ongoing improvements to Cassandra's performance, scalability, reliability and ease of use over its 5 year history and multiple releases.
- Details on Cassandra's architecture like its read path, compaction strategies, and use of on- and off-heap memory.
C* Summit EU 2013: Mixing Batch and Real-Time: Cassandra with Shark – DataStax Academy
Speaker: Richard Low, Analytics Tech Lead at SwiftKey
Video: http://www.youtube.com/watch?v=QTb4HTwVMq0&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=2
Everything Cassandra does is designed for a real-time workload of high volume inserts and frequent small queries. Cassandra has Hadoop and Hive integration, but performing long running ad-hoc queries with these tools is difficult without impacting real-time performance or requires duplicate clusters. This talk will explain how I'm integrating Cassandra with Shark, a drop-in Hive replacement developed by Berkeley's AmpLab. It's designed to give fine grained control over all resource usage so you can safely run arbitrary ad-hoc queries on your existing cluster with controlled and predictable impact.
Mixing Batch and Real-time: Cassandra with Shark (Cassandra Europe 2013) – Richard Low
The document discusses running batch analytics queries on Cassandra databases by using Spark and Shark to directly access the SSTables. Current solutions like running Hive on Cassandra have performance issues. The author's solution uses Spark workers running on Cassandra nodes to read SSTables directly, avoiding the filesystem cache and CQL interface. Performance tests show this approach is 2.5x faster than using the CQL interface and has lower and more predictable query latency, even under write load. The author calls for further development and contributions to the technique.
These are the slides from the intensive Cassandra workshop I held in Madrid as a Meetup: http://www.meetup.com/Madrid-Cassandra-Users/events/225944063/ They cover all the Cassandra core concepts, plus the basic data modelling ones, to get up and running with Cassandra.
An efficient data mining solution by integrating Spark and Cassandra – Stratio
Integrating C* and Spark gives us a system that combines the best of both worlds. The goal of this integration is to obtain better results than using Spark over HDFS, because Cassandra's philosophy is much closer to the RDD philosophy than HDFS's is. The aim is a system that mines all the information stored in C* much more efficiently than if it were stored in HDFS. Cassandra data storage and Spark data mining power: an unrivalled mix.
Apache Cassandra: building a production app on an eventually-consistent DB – Oliver Lockwood
At Sky, we use Cassandra for database persistence in our Online Video Platform - the system which delivers all OTT video content to both Sky and NOW TV customers - and yes, that includes handling huge spikes in traffic both when there's a big Premier League football match and when a new Game of Thrones season comes online!
This talk aims to cover the following topics.
- A brief introduction to Cassandra, including what it’s good for, what it’s not good for, and why. We'll dig into how storage, reads, writes and conflict resolution work.
- Gotchas in an eventually-consistent DB - some interesting problems we encountered and the lessons we learned the hard way.
- Performing database schema and data evolution in Cassandra for a production app.
- Why this is important, and what we did at Sky to ensure consistency of our database schema.
Presented at Geecon Prague on 20th October 2016.
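One of Cassandra's core conflict-resolution rules, last-write-wins on the cell's write timestamp, underlies many of the eventual-consistency gotchas a talk like this covers. A minimal sketch of the per-cell merge (illustrative names and timestamps; Cassandra additionally breaks timestamp ties by comparing the values themselves):

```python
def lww_merge(versions):
    """Last-write-wins: given (write_timestamp, value) pairs seen on
    different replicas, the highest timestamp wins. Cassandra applies
    this per column when replicas disagree at read time."""
    return max(versions, key=lambda v: v[0])

# Two replicas hold different values for the same cell after a
# partition; the later write wins once they reconcile:
replicas = [(1700000001, "title-v1"), (1700000005, "title-v2")]
ts, value = lww_merge(replicas)  # value == "title-v2"
```

Note the design consequence: a write with a skewed (future) client timestamp can silently shadow later writes, one of the classic gotchas in production.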
C* Summit EU 2013: Effective Cassandra Development with Achilles – DataStax Academy
This document discusses Achilles, an open source persistence manager for Cassandra that provides features like entity mapping, common CRUD operations, query DSL, and integration with Spring. It highlights that Achilles was created by developers for developers and aims to support all CQL3 features and upcoming Cassandra features. The presentation encourages effective Cassandra development using Achilles and provides an overview of its capabilities and roadmap.
Effective Cassandra Development with Achilles – Duyhai Doan
This document discusses Achilles, an open source persistence manager for Cassandra that provides features like entity mapping, common CRUD operations, query DSL, and integration with Spring. It highlights that Achilles was created by developers for developers to make Cassandra development more effective. The roadmap includes future support for secondary indexes, bean validation, DAO templates, and new Cassandra 2.0 features.
Apache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOps – Anant Corporation
In this lunch, Johnny will show us how easy it is to start monitoring your Cassandra cluster in minutes. He will explain the various aspects and features of Cassandra that need to be monitored, how to do it, and most importantly why! Approaches for backups and Cassandra repairs will be discussed and explored in detail.
Learn how AxonOps significantly reduces the complexity and overhead when looking after Cassandra and ensures your Cassandra cluster is reliable and resilient.
Experienced developer, DevOps, architect, and AxonOps co-founder, Johnny Miller, has worked with a wide variety of companies – from small start-ups to large enterprises. He has been working with Cassandra for many years and has a deep understanding of the challenges facing modern companies looking to adopt Apache Cassandra.
Business Growth Is Fueled By Your Event-Centric Digital Strategy – zitipoff
The document discusses how event-driven architecture (EDA) can fuel business growth through an event-centric digital strategy. It covers:
1) EDA's role in digital business strategies and how it enables organizations to respond rapidly to events.
2) Key components of an EDA system including Kafka, Spark and Cassandra, and how technologies like these provide benefits such as scalability, fault tolerance and real-time processing.
3) Examples of Netflix and Amazon successfully leveraging EDA for hyper-personalization to retain customers and increase sales.
Pollfish is a survey platform which provides access to millions of targeted users. Pollfish allows easy distribution and targeting of surveys through existing mobile apps (https://www.pollfish.com/). At Pollfish we use Cassandra for different use cases, e.g. as an application data store to maximize write throughput when appropriate, and for our analytics project to find insights in application-generated data. To accomplish our success so far, we use DataStax's DSE 4.6 environment, which integrates Apache Cassandra, Spark and a Hadoop-compatible file system (CFS). We will discuss how we started, how the journey went and the impressions gained so far, along with some tips learned the hard way. This is the result of the joint work of an excellent team here at Pollfish.
Every company likes to brag about their successes, but not many are willing to talk about their failures. At PagerDuty we have been rigorously tracking downtime in order to analyze it and learn from our mistakes - we even blog about these failures publicly.
Despite being a highly available system, we have had three outages caused by problems with our production Cassandra clusters over the past year. We'll take a look at each of these outages: what we saw from the inside, the actions we took to recover, and most importantly the procedures and monitoring that will help prevent it from happening to you.
Slides from my Planning to Fail talk given at PHP North East conference 2013. This is a slightly longer version of the same talk given at the PHP UK conference. The talk was on how you can build resilient systems by embracing failure.
The document discusses planning for failure when building software systems. It notes that as software projects grow larger with more engineers, complexity and the potential for failures increases. The author discusses how the taxi app Hailo has grown significantly and now uses a service-oriented architecture across multiple data centers to improve reliability. Key technologies discussed include Zookeeper, Elasticsearch, NSQ, and Cruftflake which provide distributed and resilient capabilities. The importance of testing failures through simulation is emphasized to improve reliability.
Cassandra concepts, patterns and anti-patterns – Dave Gardner
The document discusses Cassandra concepts, patterns, and anti-patterns. It begins with an agenda that covers choosing NoSQL, Cassandra concepts based on Dynamo and Bigtable, and patterns and anti-patterns of use. It then delves into Cassandra concepts such as consistent hashing, vector clocks, gossip protocol, hinted handoff, read repair, and consistency levels. It also discusses Bigtable concepts like sparse column-based data model, SSTables, commit log, and memtables. Finally, it outlines several patterns and anti-patterns of Cassandra use.
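Consistent hashing, the first concept listed above, is easy to demonstrate: nodes and keys are hashed onto the same ring, and each key belongs to the first node clockwise from its hash, so adding or removing a node only remaps the keys on its arc. A minimal sketch (illustrative node names; real Cassandra assigns explicit tokens and virtual nodes rather than hashing node names):

```python
import bisect
import hashlib

def stable_hash(key):
    """Deterministic hash onto the ring (md5, like old-style
    RandomPartitioner; Python's built-in hash() is salted)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Each node owns the arc of hash space ending at its position;
    a key maps to the first node at or after its hash, wrapping."""
    def __init__(self, nodes):
        self.ring = sorted((stable_hash(n), n) for n in nodes)
        self.positions = [h for h, _ in self.ring]

    def node_for(self, key):
        i = bisect.bisect(self.positions, stable_hash(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")  # same key always maps to the same node
```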
Unique ID generation in distributed systems – Dave Gardner
The document discusses different strategies for generating unique IDs in a distributed system. It covers using auto-incrementing numeric IDs in MySQL, which are not resilient, and various solutions like UUIDs, Twitter Snowflake IDs, and Flickr ticket servers that generate IDs in a distributed and ordered way without coordination between data centers. It also provides code examples of generating Twitter Snowflake-like IDs in PHP without coordination using ZeroMQ.
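The talk's code examples are in PHP; a rough Python equivalent of a Twitter Snowflake-style generator might look like the sketch below. Field widths follow the original scheme (41 timestamp bits, 10 worker bits, 12 sequence bits); a production version would also have to handle backwards clock drift, which this toy does not.

```python
import threading
import time

class SnowflakeLike:
    """Snowflake-style 64-bit IDs: milliseconds since a custom epoch,
    a worker id, and a per-millisecond sequence. Roughly time-ordered
    and requires no coordination between workers."""
    EPOCH = 1288834974657  # Twitter's custom epoch, in ms

    def __init__(self, worker_id):
        assert 0 <= worker_id < 1024  # fits in 10 bits
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            ms = int(time.time() * 1000)
            if ms == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12 bits
                if self.sequence == 0:
                    # sequence exhausted this millisecond: spin to the next
                    while ms <= self.last_ms:
                        ms = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = ms
            return ((ms - self.EPOCH) << 22) | (self.worker_id << 12) | self.sequence

gen = SnowflakeLike(worker_id=7)
a, b = gen.next_id(), gen.next_id()  # b > a: ids sort by creation time
```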
Cassandra's Sweet Spot - an introduction to Apache Cassandra – Dave Gardner
Slides from my NoSQL Exchange 2011 talk introducing Apache Cassandra. This talk explained the fundamental concepts of Cassandra and then demonstrated how to build a simple ad-targeting application using PHP, with a focus on data modeling.
Video of talk: http://skillsmatter.com/podcast/home/cassandra/js-2880
Intro slides from Cassandra London July 2011 – Dave Gardner
The document compares Cassandra and MongoDB, two NoSQL databases. It provides information on their data models, conflict resolution approaches, distribution methods, and differences. A commenter responds that Cassandra and MongoDB have almost nothing in common and that claims of Cassandra dying off are incorrect.
This document provides a summary of various resources about Apache Cassandra, including blog posts on migrating Netflix to Cassandra, indexing in Cassandra, and Cassandra at Twitter. It also lists a book on Cassandra and highlights the key components of the Acunu data platform, which includes Cassandra, management tools, and an easily installed package.
An introduction to DataStax's Brisk (a distribution of Cassandra, Hadoop and Hive). Includes a back story of my own experience with Cassandra plus a demo of Brisk built around a very simple ad-network-type application.
Introduction to Cassandra at London Web Meetup – Dave Gardner
A 15 minute introduction to the Cassandra distributed data store from the February 2011 London Web meetup.
This covers the basics of who is using it, why you might want to use it (due to the large amount of data being collected by Web Apps today) and, most importantly, _what_ it is!
What are the challenges of running Apache Cassandra on Amazon EC2? Is it a good idea?
In this presentation, we explore reasons for and against running the distributed database Cassandra on EC2. We look at the I/O performance of EC2 and
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application... – Alex Pruden
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
Programming Foundation Models with DSPy - Meetup Slides – Zilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency – ScyllaDB
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
Discover top-tier mobile app development services, offering innovative solutions for iOS and Android. Enhance your business with custom, user-friendly mobile applications.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers – akankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Driving Business Innovation: Latest Generative AI Advancements & Success Story – Safe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Essentials of Automations: Exploring Attributes & Automation Parameters – Safe Software
Building automations in FME Flow can save time, money, and help businesses scale by eliminating data silos and providing data to stakeholders in real-time. One essential component to orchestrating complex automations is the use of attributes & automation parameters (both formerly known as “keys”). In fact, it’s unlikely you’ll ever build an Automation without using these components, but what exactly are they?
Attributes & automation parameters enable the automation author to pass data values from one automation component to the next. During this webinar, our FME Flow Specialists will cover leveraging the three types of these output attributes & parameters in FME Flow: Event, Custom, and Automation. As a bonus, they’ll also be making use of the Split-Merge Block functionality.
You’ll leave this webinar with a better understanding of how to maximize the potential of automations by making use of attributes & automation parameters, with the ultimate goal of setting your enterprise integration workflows up on autopilot.
Skybuffer SAM4U tool for SAP license adoption – Tatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Northern Engraving | Nameplate Manufacturing Process - 2024 – Northern Engraving
Manufacturing custom quality metal nameplates and badges involves several standard operations. Processes include sheet prep, lithography, screening, coating, punch press and inspection. All decoration is completed in the flat sheet with adhesive and tooling operations following. The possibilities for creating unique durable nameplates are endless. How will you create your brand identity? We can help!
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way to break data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is re-paid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...Jason Yip
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
10. #CASSANDRA13 CASSANDRASUMMIT2013
• The world’s highest-rated taxi app – over 10,000 five-star reviews
• Over 500,000 registered passengers
• A Hailo e-hail is accepted by a driver every four seconds around
the world
• Hailo operates in ten cities from Tokyo to Toronto in just over
eighteen months of operation
What is Hailo?
11.
• Hailo is a marketplace that facilitates over $100M in run-rate
transactions and is making the world a better place for passengers
and drivers
• Hailo has raised over $50M in financing from the world's best
investors including Union Square Ventures, Accel, the founder of
Skype (via Atomico), Wellington Partners (Spotify), Sir Richard
Branson, and our CEO's mother, Janice
Hailo is growing
13.
Hailo launched in London in November 2011
• Launched on AWS
• Two PHP/MySQL web apps plus a Java backend
• Mostly built by a team of 3 or 4 backend engineers
• MySQL multi-master for single AZ resilience
14.
Why Cassandra?
• A desire for greater resilience – “become a utility”
Cassandra is designed for high availability
• Plans for international expansion around a single consumer app
Cassandra is good at global replication
• Expected growth
Cassandra scales linearly for both reads and writes
• Prior experience
I had experience with Cassandra and could recommend it
15.
The path to adoption
• Largely unilateral decision by developers – a result of a startup
culture
• Replacement of key consumer app functionality, splitting up the
PHP/MySQL web app into a mixture of global PHP/Java services
backed by a Cassandra data store
• Launched into production in September 2012 – originally just
powering North American expansion, before gradually switching
over Dublin and London
20.
Considerations for entity storage
• Do not read the entire entity, update one property and then write
back a mutation containing every column
• Only mutate columns that have been set
• This avoids read-before-write race conditions
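The "only mutate columns that have been set" rule can be sketched in plain Python (a hypothetical helper, not Hailo's actual client code): instead of reading the full entity and writing every column back, the caller builds a mutation containing only the properties it actually changed.

```python
def build_mutation(changes):
    """Return a mutation containing only the columns the caller set.

    `changes` maps column name -> new value; None stands for "not set"
    (a simplification -- a real client would distinguish "unset"
    from "delete this column").
    """
    return {name: value for name, value in changes.items() if value is not None}

# Two concurrent updates touch disjoint columns of the same entity...
update_a = build_mutation({"email": "new@example.com", "phone": None})
update_b = build_mutation({"phone": "+442079460000", "email": None})

# ...so applying them in either order yields the same merged entity,
# with no read-before-write race: neither writer clobbers columns it
# never touched.
entity = {}
entity.update(update_a)
entity.update(update_b)
```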
26.
Considerations for time series storage
• Choose row key carefully, since this partitions the records
• Think about how many records you want in a single row
• Denormalise on write into many indexes
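As a sketch of the row-key choice (Python, with hypothetical names): bucketing the row key by day bounds how many records can land in any single row and spreads the series across partitions.

```python
from datetime import datetime, timezone

def row_key(stream, when):
    """Daily bucket: every record for one stream on one day shares a
    row, so no single row grows without bound."""
    return f"{stream}:{when.strftime('%Y%m%d')}"

t1 = datetime(2013, 6, 12, 9, 30, tzinfo=timezone.utc)
t2 = datetime(2013, 6, 13, 0, 1, tzinfo=timezone.utc)

# Records written a day apart land in different rows (partitions).
```

The bucket width is a tuning decision: a busier stream would want hourly buckets, a quiet one monthly.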
28.
Analytics
• With Cassandra we lost the ability to carry out analytics
eg: COUNT, SUM, AVG, GROUP BY
• We use Acunu Analytics to give us this ability in real time, for pre-planned query templates
• It is backed by Cassandra and therefore highly available, resilient
and globally distributed
• Integration is straightforward
37.
Learn the theory
• Teach each team member the fundamentals
• CQL can encourage an SQL mindset, but it’s important to
understand the underlying data model
• Make a real effort to share knowledge – keep in mind the gulf in
experience for most team members between their old world and
the new world (SQL vs NoSQL)
• Peer review data models
41.
2 clusters, 6 machines per region, 3 regions
(stats cluster pending addition of third DC)
Operational Cluster: ap-southeast-1, us-east-1, eu-west-1
Stats Cluster: us-east-1, eu-west-1
42.
AWS VPCs with OpenVPN links
3 AZs per region
m1.large machines
Provisioned IOPS EBS
Operational Cluster: ~100GB/node
Stats Cluster: ~600GB/node
43.
Backups
• SSTable snapshot
• Used to upload to S3, but this was taking >6 hours and consuming
all our network bandwidth
• Now take EBS snapshot of the SSTable snapshots
44.
Encryption
• Requirement for NYC launch
• We use dmcrypt to encrypt the entire EBS volume
• Chose dmcrypt because it is uncomplicated
• Our tests show a ~1% hit in disk performance, which concurs with
what Amazon suggests
45.
DataStax OpsCenter
• We run the free version
• Offers up easily accessible “one screen” overviews of the activity
of the entire cluster
• Big fans – an easy win
47.
Multi DC
• Something that Cassandra makes trivial
• Would have been very difficult to accomplish active-active inter-DC
replication with a team of 2 without Cassandra
• Rolling repair needed to make it safe (we use LOCAL_QUORUM)
• We schedule “narrow repairs” on different nodes in our cluster
each night
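The nightly rotation can be sketched in a few lines of Python (hypothetical helper, not Hailo's actual tooling): pick a different node each calendar day, round-robin, and have that node run its repair (e.g. via `nodetool repair`).

```python
def node_to_repair(nodes, day_of_year):
    """Round-robin by calendar day: each night one node runs its
    repair, so the whole ring is covered every len(nodes) nights."""
    return nodes[day_of_year % len(nodes)]

ring = ["node1", "node2", "node3", "node4", "node5", "node6"]
tonight = node_to_repair(ring, 37)  # day 37 -> second node in the ring
```

Spreading repairs out like this keeps the cost of anti-entropy off the whole cluster at once while still staying inside gc_grace_seconds over a full cycle.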
48.
Compression
• Our stats cluster was running at ~1.5TB per node
• We didn’t want to add more nodes
• With compression, we are now back to ~600GB
• Easy to accomplish
• `nodetool upgradesstables` on a rolling schedule
53.
Technically, everything is fine…
• Our COO feels that C* is “technically good and beautiful”, a
“perfectly good option”
• Our EVPO says that C* reminds him of a time series database in
use at Goldman Sachs that had “very good performance”
…but there are concerns
57.
Keep the business informed
• Pre-launch, we were tasked with increasing resiliency
• Cassandra addressed immediate business needs, but the trade-offs
involved should have been communicated more clearly
58.
Sing from the same hymn sheet
• A senior founding engineer had doubts about the adoption of
Cassandra until very recently
• In the presence of business doubt, this lack of consistency
amongst developers exacerbated the concerns
• We should have made more effort to make bilateral decisions on
adoption – I don’t think this would have been hard to achieve
63.
Lessons for successful adoption
• Have an advocate, sell the dream
• Learn the fundamentals, get the best out of Cassandra
• Invest in tools to make life easier
• Keep management in the loop, explain the trade-offs
64.
The future
• We will continue to invest in Cassandra as we expand globally
• We will hire people with experience running Cassandra
• We will focus on expanding our reporting facilities
• We aspire to extend our network (1M consumer installs, wallet)
beyond cabs
• We will continue to hire the best engineers in London, NYC and
Asia
Had the idea of talking about “Cassandra at Hailo”. When it came time to actually write the talk, I realised it was going to be quite difficult.
I started using Cassandra in 2010, back in version 0.6. Back then it was quite hard work.
I founded the London meetup group in 2010 and have been flying the C* flag over London ever since. My motivation was to connect with others who were using Cassandra. Back then “swapping war stories” was a common theme. Cassandra was not easy to use.
Fast forward to 2013. 7,429 commits later. Cassandra “just works”. Kudos to the team of committers and contributors who have made this happen.
4:30
Whilst “it just works” is quite compelling, there are still challenges to successful adoption of C* in an organisation. I am going to talk about our experiences at Hailo, from three perspectives: dev, ops and management.
On iOS and Android, live in London, New York, Chicago, Toronto, Boston, Dublin, Madrid
My recommendation was based on the solid design principles behind C*, something I’ve talked about in the past.
13:00
Row key = entity ID, in this instance a 64-bit integer a la Snowflake. Column name = property name. Value = property value. A key point when using this pattern is to only mutate columns that you change.
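The Snowflake-style 64-bit entity ID mentioned in this note can be sketched as follows, using the bit layout of Twitter's original Snowflake (41-bit millisecond timestamp, 10-bit worker id, 12-bit sequence); a sketch, not Hailo's actual ID generator.

```python
EPOCH_MS = 1288834974657  # Twitter Snowflake's custom epoch (4 Nov 2010)

def snowflake_id(unix_ms, worker_id, sequence):
    """Pack a millisecond timestamp (41 bits), worker id (10 bits) and
    per-millisecond sequence (12 bits) into one 64-bit integer.
    Ids generated later compare greater, so they sort by time."""
    return ((unix_ms - EPOCH_MS) << 22) | ((worker_id & 0x3FF) << 12) | (sequence & 0xFFF)

a = snowflake_id(1370000000000, worker_id=1, sequence=0)
b = snowflake_id(1370000000001, worker_id=1, sequence=0)
```

Because the timestamp occupies the high bits, these ids can be generated independently on many nodes yet still give a rough time ordering, which is exactly what a distributed entity store wants.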
Read heavy, demand-driven. Writes consistent.
Time series for storing records of emails sent. In this instance bucketed by a daily row key, for all messages. The column name is a type 1 UUID.
We also denormalise for other indexes, eg: here we store every message sent to a given address under a single row.
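This write-time denormalisation can be sketched in Python (hypothetical structures standing in for column families): each message is written once into the daily bucket and once into a per-address row, trading extra writes for single-row reads.

```python
from collections import defaultdict

# Two "column families", sketched as dicts of row key -> ordered columns.
daily_rows = defaultdict(list)    # row key: "YYYYMMDD" bucket
address_rows = defaultdict(list)  # row key: recipient address

def record_email(day, address, message_id):
    """Denormalise on write: one insert per index we want to read from."""
    daily_rows[day].append(message_id)
    address_rows[address].append(message_id)

record_email("20130612", "a@example.com", "uuid-1")
record_email("20130612", "b@example.com", "uuid-2")
record_email("20130613", "a@example.com", "uuid-3")

# "All mail sent to a@example.com" is now a single-row read.
```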
More writes than reads – most of these reads are actually single entity reads.
Stats service – insert rate at 5k/sec. Responsible for storing business events from all areas of our system.
We are not using CQL.
We can execute AQL queries against Acunu Analytics.
1. Most people have N years of SQL experience where N >= 5
2. It’s possible to shoot yourself in the foot – but this is true of SQL (eg: joins that work with low data volumes)
27:00
London, NYC, Tokyo, Osaka, Dublin, Toronto, Boston, Chicago, Madrid, Barcelona, Washington, Montreal
Our rings, plus key stats (m1.large, 18 nodes in cluster A, 12 nodes in cluster B, 100GB per node in cluster A, ~ 600GB in cluster B)
Sometimes C* works too well. Clearly this cluster needs some attention, but our application is still working fine. We are probably at the point where we need a dedicated C* expert.
I interviewed key people from our management team to gauge their reaction to our C* deployment.
There is a perception that we have made it much harder to get at our data. In the early days at Hailo, when we all worked in one room, developers could execute ad-hoc queries on the fly for management. Nowadays we can’t. The reasons behind this are two-fold: firstly, it is true that ad-hoc queries are harder to execute against C*. But that’s not the whole picture. Much of our data is still in MySQL, and the queries we used to run against this data do not run smoothly either. The perception, however, is that it is the “new database” that is the cause of problems.
It’s easy to cause yourself a “Big Data” problem. Developers collect and store data because they can, without being clear about the business implications.
With the right tools, we could change the picture completely.