Presented at JProf meetup in Minsk on March 14.
Presentation provides some insights on migrating challenge data for millions of active users from MySQL to Cassandra
MariaDB Galera Cluster Webinar by Ivan Zoratti on 13.11.2013. Also available as on demand webinar at http://www.skysql.com/why-skysql/webinars/mariadb-galera-cluster-simple-transparent-highly-available
Cassandra Backups and Restorations Using Ansible (Joshua Wickman, Knewton) | ...DataStax
A solid backup strategy is a DBA's bread and butter. Cassandra's nodetool snapshot makes it easy to back up the SSTable files, but there remains the question of where to put them and how. Knewton's backup strategy uses Ansible for distributed backups and stores them in S3.
Unfortunately, it's all too easy to store backups that are essentially useless due to the absence of a coherent restoration strategy. This problem proved much more difficult and nuanced than taking the backups themselves. I will discuss Knewton's restoration strategy, which again leverages Ansible, yet I will focus on general principles and pitfalls to be avoided. In particular, restores necessitated modifying our backup strategy to generate cluster-wide metadata that is critical for a smooth automated restoration. Such pitfalls indicate that a restore-focused backup design leads to faster and more deterministic recovery.
About the Speaker
Joshua Wickman Database Engineer, Knewton
Dr. Joshua Wickman is currently part of the database team at Knewton, a NYC tech company focused on adaptive learning. He earned his PhD at the University of Delaware in 2012, where he studied particle physics models of the early universe. After a brief stint teaching college physics, he entered the New York tech industry in 2014 working with NoSQL, first with MongoDB and then Cassandra. He was certified in Cassandra at his first Cassandra Summit in 2015.
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...DataStax
Traditionally, machines were statically partitioned across the different services at Uber. In an effort to increase the machine utilization, Uber has recently started transitioning most of its services, including the storage services, to run on top of Mesos. This presentation will describe the initial experience building and operating a framework for running Cassandra on top of Mesos running across multiple datacenters at Uber. This framework automates several Cassandra operations such as node repairs, addition of new nodes and backup/restore. It improves efficiency by co-locating CPU-intensive services as well as multiple Cassandra nodes on the same Mesos agent. It handles failure and restart of Mesos agents by using persistent volumes and dynamic reservations. This talk includes statistics about the number of Cassandra clusters in production, time taken to start a new cluster, add a new node, detect a node failure; and the observed Cassandra query throughput and latency.
About the Speaker
Abhishek Verma Software Engineer, Uber
Dr. Abhishek Verma is currently working on running Cassandra on top of Mesos at Uber. Prior to this, he worked on BorgMaster at Google and was the first author of the Borg paper published in Eurosys 2015. He received an MS in 2010 and a PhD in 2012 in Computer Science from the University of Illinois at Urbana-Champaign, during which he authored more than 20 publications in conferences, journals and books and presented tens of talks.
C* Summit 2013: Cassandra at Instagram by Rick BransonDataStax Academy
Speaker: Rick Branson, Infrastructure Engineer at Instagram
Cassandra is a critical part of Instagram's large scale site infrastructure that supports more than 100 million active users. This talk is a practical deep dive into data models, systems architecture, and challenges encountered during the implementation process.
MariaDB Galera Cluster Webinar by Ivan Zoratti on 13.11.2013. Also available as on demand webinar at http://www.skysql.com/why-skysql/webinars/mariadb-galera-cluster-simple-transparent-highly-available
Cassandra Backups and Restorations Using Ansible (Joshua Wickman, Knewton) | ...DataStax
A solid backup strategy is a DBA's bread and butter. Cassandra's nodetool snapshot makes it easy to back up the SSTable files, but there remains the question of where to put them and how. Knewton's backup strategy uses Ansible for distributed backups and stores them in S3.
Unfortunately, it's all too easy to store backups that are essentially useless due to the absence of a coherent restoration strategy. This problem proved much more difficult and nuanced than taking the backups themselves. I will discuss Knewton's restoration strategy, which again leverages Ansible, yet I will focus on general principles and pitfalls to be avoided. In particular, restores necessitated modifying our backup strategy to generate cluster-wide metadata that is critical for a smooth automated restoration. Such pitfalls indicate that a restore-focused backup design leads to faster and more deterministic recovery.
About the Speaker
Joshua Wickman Database Engineer, Knewton
Dr. Joshua Wickman is currently part of the database team at Knewton, a NYC tech company focused on adaptive learning. He earned his PhD at the University of Delaware in 2012, where he studied particle physics models of the early universe. After a brief stint teaching college physics, he entered the New York tech industry in 2014 working with NoSQL, first with MongoDB and then Cassandra. He was certified in Cassandra at his first Cassandra Summit in 2015.
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...DataStax
Traditionally, machines were statically partitioned across the different services at Uber. In an effort to increase the machine utilization, Uber has recently started transitioning most of its services, including the storage services, to run on top of Mesos. This presentation will describe the initial experience building and operating a framework for running Cassandra on top of Mesos running across multiple datacenters at Uber. This framework automates several Cassandra operations such as node repairs, addition of new nodes and backup/restore. It improves efficiency by co-locating CPU-intensive services as well as multiple Cassandra nodes on the same Mesos agent. It handles failure and restart of Mesos agents by using persistent volumes and dynamic reservations. This talk includes statistics about the number of Cassandra clusters in production, time taken to start a new cluster, add a new node, detect a node failure; and the observed Cassandra query throughput and latency.
About the Speaker
Abhishek Verma Software Engineer, Uber
Dr. Abhishek Verma is currently working on running Cassandra on top of Mesos at Uber. Prior to this, he worked on BorgMaster at Google and was the first author of the Borg paper published in Eurosys 2015. He received an MS in 2010 and a PhD in 2012 in Computer Science from the University of Illinois at Urbana-Champaign, during which he authored more than 20 publications in conferences, journals and books and presented tens of talks.
C* Summit 2013: Cassandra at Instagram by Rick BransonDataStax Academy
Speaker: Rick Branson, Infrastructure Engineer at Instagram
Cassandra is a critical part of Instagram's large scale site infrastructure that supports more than 100 million active users. This talk is a practical deep dive into data models, systems architecture, and challenges encountered during the implementation process.
Cassandra Summit 2014: Cassandra at Instagram 2014DataStax Academy
Presenter: Rick Branson, Infrastructure Engineer at Instagram
As Instagram has scaled to over 200 million users, so has our use of Cassandra. We've built new features and rebuilt old on Cassandra, and it's become an extremely mission-critical foundation of our production infrastructure. Rick will deliver a refresh of our use cases and go deep on the technical challenges we faced during our expansion.
Powering Microservices with Docker, Kubernetes, Kafka, & MongoDBMongoDB
This session introduced technologies such as Docker, Kubernetes, and Kafka, which are driving the microservices revolution. Learn about containers and orchestration – and most importantly, how to exploit them for stateful services such as MongoDB.
What You Will Learn:
* Why organizations are choosing to use microservice architectures, what benefits they deliver, and when they should – or shouldn't – be used.
* Technologies that are used to build microservices – including containers, orchestration, and messaging systems.
* Why MongoDB is a good fit for microservices and what special steps need to be taken to make them work well together.
We will show the advantages of having a geo-distributed database cluster and how to create one using Galera Cluster for MySQL. We will also discuss the configuration and status variables that are involved and how to deal with typical situations on the WAN such as slow, untrusted or unreliable links, latency and packet loss. We will demonstrate a multi-region cluster on Amazon EC2 and perform some throughput and latency measurements in real-time (video http://galeracluster.com/videos/using-galera-replication-to-create-geo-distributed-clusters-on-the-wan-webinar-video-3/)
Lightweight Transactions at Lightning SpeedScyllaDB
This talk will outline the Scylla implementation of Lightweight Transactions (LWT) that brings us to parity with Apache Cassandra. We will cover how to use it, what is working, and what is left to be done. We will also cover what other improvements are in store to improve Scylla's transactional capabilities and why it matters.
Over the years, DevOps has evolved in many aspects: from best practices, to the responsibilities of an operations engineer, to culture, the ever-changing landscape of infrastructure brings new, exciting discoveries… but also some never before seen issues! I always keep a careful record of all the problems we’ve troubleshot because you never know when a solution will come in handy. This session tells a couple of tales of dealing with databases and how troubleshooting has evolved over time.
JCON Online 2021, International Java Community Conference, 07.10.21, Moritz Kammerer (@Moritz Kammerer, Expert Software Engineer at QAware).
== Please download slides in case they are blurred! ===
In his talk we have had a look at how Microservices can be developed with Micronaut. In our slides you can find out if it kept its promise.
Detail behind the Apache Cassandra 2.0 release and what is new in it including Lightweight Transactions (compare and swap) Eager retries, Improved compaction, Triggers (experimental) and more!
• CQL cursors
A brief, but action-packed introduction to DataStax Enterprise Search. In this deck, we'll get an overview of DSE Search's value proposition, see some example CQL search queries, and dive into the details of the indexing and query paths.
AddThis: Scaling Cassandra up and down into containers with ZFSDataStax Academy
ZFS is an advanced file, raid, and volume management system originally developed by Sun Microsystems, 'The Last Word in File Systems' has been unavailable on Linux until recently. AddThis uses ZFS to more effectively scale up dedicated hardware, getting twice the performance at half the cost. ZFS is also fundamental to containerization, allowing nodes from multiple clusters to be co-located with safe persistent storage.
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...DataStax
EmoDB is an open source RESTful data store built on top of Cassandra that stores JSON documents and, most notably, offers a databus that allows subscribers to watch for changes to those documents in real time. It features massive non-blocking global writes, asynchronous cross data center communication, and schema-less json content.
For non-blocking global writes, we created a ""JSON delta"" specification that defines incremental updates to any json document. Each row, in Cassandra, is thus a sequence of deltas that serves as a Conflict-free Replicated Datatype (CRDT) for EmoDB's system of record. We introduce the concept of ""distributed compactions"" to frequently compact these deltas for efficient reads.
Finally, the databus forms a crucial piece of our data infrastructure and offers a change queue to real time streaming applications.
About the Speaker
Fahd Siddiqui Lead Software Engineer, Bazaarvoice
Fahd Siddiqui is a Lead Software Engineer at Bazaarvoice in the data infrastructure team. His interests include highly scalable, and distributed data systems. He holds a Master's degree in Computer Engineering from the University of Texas at Austin, and frequently talks at Austin C* User Group. About Bazaarvoice: Bazaarvoice is a network that connects brands and retailers to the authentic voices of people where they shop. More at www.bazaarvoice.com
- Understanding Time Series
- What's the Fundamental Problem
- Prometheus Solution (v1.x)
- New Design of Prometheus (v2.x)
- Data Compression Algorithm
Every company likes to brag about their successes, but not many are willing to talk about their failures. At PagerDuty we have been rigorously tracking downtime in order to analyze it and learn from our mistakes - we even blog about these failures publicly.
Despite being a highly available system, we have had three outages caused by problems with our production Cassandra clusters over the past year. We'll take a look at each of these outages: what we saw from the inside, the actions we took to recover, and most importantly the procedures and monitoring that will help prevent it from happening to you.
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...DataStax
Cassandra's support for multiple data centers can bring massive benefits to an organization, however it can also bring painful operational lessons. While there is no recipe for trouble free mutli DC clusters, the best approach is to understand why you are using one, what Cassandra supports, and how it does it. With this knowledge in your toolkit you will have a better chance of fixing the sort of gremlins that can trouble a globally distributed database.
In this talk Alexander Dejanovski, Consultant at The Last Pickle, will outline the motivations people typically have for running a multi DC cluster. He will also look at how multiple DC's are supported through all areas of the Cassandra, how it impacts your application and operations, and how you can always blame the network.
About the Speaker
Alexander DEJANOVSKI Consultant, The Last Pickle
Alexander has been working as a software developer for the last 18 years, mainly for the french leader of express shipments. He's been leading there the effort to build a Cassandra based architecture and migrate services to it from traditional RDBMS. He is involved in the Cassandra community through the development of a JDBC wrapper for the DataStax Java Driver. Recently, he joined The Last Pickle as a Cassandra consultant and now helps customers to get the best out of it.
A key feature when monitoring and debugging any Cloud infrastructure is to provide the ability to trace, track, and collate all the individual, discrete steps that compose an event. A typical resource action in OpenStack is often a combination of smaller tasks -- which given the distributed nature of OpenStack -- can fail at unpredictable points in the workflow. By collecting the appropriate events, operators can view all events within Ceilometer, filter on a failed action and trace back the history of related events to spot anomalies or errors. In this talk, we provide an overview of the recent enhancements made in Ceilometer to support the collection of event notifications from OpenStack services. We will describe: how events are processed, transformed and stored in Ceilometer; how you can derive metrics from events; and how it’s possible to track the events of a resource and analyse where errors occur.
C* Summit 2013: How Not to Use Cassandra by Axel LiljencrantzDataStax Academy
At Spotify, we see failure as an opportunity to learn. During the two years we've used Cassandra in our production environment, we have learned a lot. This session touches on some of the exciting design anti-patterns, performance killers and other opportunities to lose a finger that are at your disposal with Cassandra.
Cassandra Summit 2014: Cassandra at Instagram 2014DataStax Academy
Presenter: Rick Branson, Infrastructure Engineer at Instagram
As Instagram has scaled to over 200 million users, so has our use of Cassandra. We've built new features and rebuilt old on Cassandra, and it's become an extremely mission-critical foundation of our production infrastructure. Rick will deliver a refresh of our use cases and go deep on the technical challenges we faced during our expansion.
Powering Microservices with Docker, Kubernetes, Kafka, & MongoDBMongoDB
This session introduced technologies such as Docker, Kubernetes, and Kafka, which are driving the microservices revolution. Learn about containers and orchestration – and most importantly, how to exploit them for stateful services such as MongoDB.
What You Will Learn:
* Why organizations are choosing to use microservice architectures, what benefits they deliver, and when they should – or shouldn't – be used.
* Technologies that are used to build microservices – including containers, orchestration, and messaging systems.
* Why MongoDB is a good fit for microservices and what special steps need to be taken to make them work well together.
We will show the advantages of having a geo-distributed database cluster and how to create one using Galera Cluster for MySQL. We will also discuss the configuration and status variables that are involved and how to deal with typical situations on the WAN such as slow, untrusted or unreliable links, latency and packet loss. We will demonstrate a multi-region cluster on Amazon EC2 and perform some throughput and latency measurements in real-time (video http://galeracluster.com/videos/using-galera-replication-to-create-geo-distributed-clusters-on-the-wan-webinar-video-3/)
Lightweight Transactions at Lightning SpeedScyllaDB
This talk will outline the Scylla implementation of Lightweight Transactions (LWT) that brings us to parity with Apache Cassandra. We will cover how to use it, what is working, and what is left to be done. We will also cover what other improvements are in store to improve Scylla's transactional capabilities and why it matters.
Over the years, DevOps has evolved in many aspects: from best practices, to the responsibilities of an operations engineer, to culture, the ever-changing landscape of infrastructure brings new, exciting discoveries… but also some never before seen issues! I always keep a careful record of all the problems we’ve troubleshot because you never know when a solution will come in handy. This session tells a couple of tales of dealing with databases and how troubleshooting has evolved over time.
JCON Online 2021, International Java Community Conference, 07.10.21, Moritz Kammerer (@Moritz Kammerer, Expert Software Engineer at QAware).
== Please download slides in case they are blurred! ===
In his talk we have had a look at how Microservices can be developed with Micronaut. In our slides you can find out if it kept its promise.
Detail behind the Apache Cassandra 2.0 release and what is new in it including Lightweight Transactions (compare and swap) Eager retries, Improved compaction, Triggers (experimental) and more!
• CQL cursors
A brief, but action-packed introduction to DataStax Enterprise Search. In this deck, we'll get an overview of DSE Search's value proposition, see some example CQL search queries, and dive into the details of the indexing and query paths.
AddThis: Scaling Cassandra up and down into containers with ZFSDataStax Academy
ZFS is an advanced file, raid, and volume management system originally developed by Sun Microsystems, 'The Last Word in File Systems' has been unavailable on Linux until recently. AddThis uses ZFS to more effectively scale up dedicated hardware, getting twice the performance at half the cost. ZFS is also fundamental to containerization, allowing nodes from multiple clusters to be co-located with safe persistent storage.
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...DataStax
EmoDB is an open source RESTful data store built on top of Cassandra that stores JSON documents and, most notably, offers a databus that allows subscribers to watch for changes to those documents in real time. It features massive non-blocking global writes, asynchronous cross data center communication, and schema-less json content.
For non-blocking global writes, we created a ""JSON delta"" specification that defines incremental updates to any json document. Each row, in Cassandra, is thus a sequence of deltas that serves as a Conflict-free Replicated Datatype (CRDT) for EmoDB's system of record. We introduce the concept of ""distributed compactions"" to frequently compact these deltas for efficient reads.
Finally, the databus forms a crucial piece of our data infrastructure and offers a change queue to real time streaming applications.
About the Speaker
Fahd Siddiqui Lead Software Engineer, Bazaarvoice
Fahd Siddiqui is a Lead Software Engineer at Bazaarvoice in the data infrastructure team. His interests include highly scalable, and distributed data systems. He holds a Master's degree in Computer Engineering from the University of Texas at Austin, and frequently talks at Austin C* User Group. About Bazaarvoice: Bazaarvoice is a network that connects brands and retailers to the authentic voices of people where they shop. More at www.bazaarvoice.com
- Understanding Time Series
- What's the Fundamental Problem
- Prometheus Solution (v1.x)
- New Design of Prometheus (v2.x)
- Data Compression Algorithm
Every company likes to brag about their successes, but not many are willing to talk about their failures. At PagerDuty we have been rigorously tracking downtime in order to analyze it and learn from our mistakes - we even blog about these failures publicly.
Despite being a highly available system, we have had three outages caused by problems with our production Cassandra clusters over the past year. We'll take a look at each of these outages: what we saw from the inside, the actions we took to recover, and most importantly the procedures and monitoring that will help prevent it from happening to you.
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...DataStax
Cassandra's support for multiple data centers can bring massive benefits to an organization, however it can also bring painful operational lessons. While there is no recipe for trouble free mutli DC clusters, the best approach is to understand why you are using one, what Cassandra supports, and how it does it. With this knowledge in your toolkit you will have a better chance of fixing the sort of gremlins that can trouble a globally distributed database.
In this talk Alexander Dejanovski, Consultant at The Last Pickle, will outline the motivations people typically have for running a multi DC cluster. He will also look at how multiple DC's are supported through all areas of the Cassandra, how it impacts your application and operations, and how you can always blame the network.
About the Speaker
Alexander DEJANOVSKI Consultant, The Last Pickle
Alexander has been working as a software developer for the last 18 years, mainly for the french leader of express shipments. He's been leading there the effort to build a Cassandra based architecture and migrate services to it from traditional RDBMS. He is involved in the Cassandra community through the development of a JDBC wrapper for the DataStax Java Driver. Recently, he joined The Last Pickle as a Cassandra consultant and now helps customers to get the best out of it.
A key feature when monitoring and debugging any Cloud infrastructure is to provide the ability to trace, track, and collate all the individual, discrete steps that compose an event. A typical resource action in OpenStack is often a combination of smaller tasks -- which given the distributed nature of OpenStack -- can fail at unpredictable points in the workflow. By collecting the appropriate events, operators can view all events within Ceilometer, filter on a failed action and trace back the history of related events to spot anomalies or errors. In this talk, we provide an overview of the recent enhancements made in Ceilometer to support the collection of event notifications from OpenStack services. We will describe: how events are processed, transformed and stored in Ceilometer; how you can derive metrics from events; and how it’s possible to track the events of a resource and analyse where errors occur.
C* Summit 2013: How Not to Use Cassandra by Axel LiljencrantzDataStax Academy
At Spotify, we see failure as an opportunity to learn. During the two years we've used Cassandra in our production environment, we have learned a lot. This session touches on some of the exciting design anti-patterns, performance killers and other opportunities to lose a finger that are at your disposal with Cassandra.
An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian...DataStax
Cassandra is moving away from Thrift to CQL protocol. Along with this change, Thrift based client drivers are not actively supported any more, nor are they exposed to new Cassandra features. This brings many existing Cassandra users into a situation that they need to migrate their Thrift based application to CQL based. A complete solution to Thrift-to-CQL migration requires changes in 3 areas: application code, data model, and existing data. In this session we will focus on how we can migrate existing data effectively in order to reflect data model changes and give you the necessary tools to build from what you have learned to apply it in your environments.
Learning Objectives:
1) Gain an insight of what Cassandra storage engine looks like for version 2.2 and downward
2) Get better idea of how Thrift and CQL difference affects table design
3) Explore an approach to effectively migrate dynamically generated (Thrift) data into a static defined (CQL) table
About the Speaker
Yabin Meng Apache Cassandra / DataStax Enterprise Consultant, Pythian
Yabin is a DataStax certified Architect, Administrator, and Developer. He has been in IT industry for more than 15 years and much of his career is around database related technologies. He has been working with Cassandra for about 2 years and is currently a Cassandra/DSE consultant at Pythian.
Most project managers understand that effective communication is essential to project success. And most are familiar with and use Microsoft Project. However, many project managers are not aware of the rich functional benefits that can be realized from Microsoft’s integration of its products. This webinar will expose and demonstrate why and how the application of integrated Microsoft tools can dramatically improve your communication skills and increase project delivery success.
Cassandra Day 2014: Interactive Analytics with Cassandra and SparkEvan Chan
Take your analytics to the next level by using Apache Spark to accelerate complex interactive analytics using your Apache Cassandra data. Includes an introduction to Spark as well as how to read Cassandra tables in Spark.
Real time Messages at Scale with Apache Kafka and CouchbaseWill Gardella
Kafka is a scalable, distributed publish subscribe messaging system that's used as a data transmission backbone in many data intensive digital businesses. Couchbase Server is a scalable, flexible document database that's fast, agile, and elastic. Because they both appeal to the same type of customers, Couchbase and Kafka are often used together.
This presentation from a meetup in Mountain View describes Kafka's design and why people use it, Couchbase Server and its uses, and the use cases for both together. Also covered is a description and demo of Couchbase Server writing documents to a Kafka topic and consuming messages from a Kafka topic. using the Couchbase Kafka Connector.
Introduction to Apache Spark Workshop at Lambda World 2015 on October 23th and 24th, 2015, celebrated in Cádiz. Speakers: @fperezp and @juanpedromoreno
Github Repo: https://github.com/47deg/spark-workshop
This is a presentation of the popular NoSQL database Apache Cassandra which was created by our team in the context of the module "Business Intelligence and Big Data Analysis".
10 Ways To Get Clients for IT Software Development CompaniesKraftblick
In a real life the ability to deliver “world-changing” IT solutions is not persuasive enough for decision makers to open their corporate wallets. Every IT software development vendor has to promote the expertise of its team!
Kraftblick team has prepared the presentation showcasing 10 ways for IT software development companies to acquire new clients from Internet.
From the original abstract:
If you're already using Cassandra you're already aware of it’s strengths of high availability and linear scalability. The downside to this power is less query flexibility. For an OLTP system with an SLA this is an acceptable tradeoff, but for a data scientist it’s extremely limiting.
Enter Apache Spark. Apache spark complements an existing Cassandra cluster by providing a means of executing arbitrary queries, filters, sorting and aggregation. It’s possible to use functional constructs like map, filter, and reduce, as well as SQL and DataFrames.
In this presentation I’ll show you how to process Cassandra data in bulk or through a Kafka stream using Python. Then we’ll visualize our data using iPython notebooks, leveraging Pandas and matplotlib.
This is an advanced talk. We will assume existing knowledge of Cassandra and CQL.
The Nitty Gritty of Advanced Analytics Using Apache Spark in PythonMiklos Christine
Apache Spark is the next big data processing tool for Data Scientist. As seen on the recent StackOverflow analysis, it's the hottest big data technology on their site! In this talk, I'll use the PySpark interface to leverage the speed and performance of Apache Spark. I'll focus on the end to end workflow for getting data into a distributed platform, and leverage Spark to process the data for advanced analytics. I'll discuss the popular Spark APIs used for data preparation, SQL analysis, and ML algorithms. I'll explain the performance differences between Scala and Python, and how Spark has bridged the gap in performance. I'll focus on PySpark as the interface to the platform, and walk through a demo to showcase the APIs.
Talk Overview:
Spark's Architecture. What's out now and what's in Spark 2.0Spark APIs: Most common APIs used by Spark Common misconceptions and proper techniques for using Spark.
Demo:
Walk through ETL of the Reddit dataset. SparkSQL Analytics + Visualizations of the Dataset using MatplotLibSentiment Analysis on Reddit Comments
WSA: Scaling Web Service to Handle Millions of Requests per SecondWebStackAcademy
In this presentation of Venkata Ramana Gollamudi, Principal Architect at Flipkart, as our esteemed speaker. With his expertise and experience, he'll guide us through the intricate process of “Scaling web service to handle millions of requests per second”.
Key Takeaways:
Understanding the specific challenges of scaling for Indian users
Strategies to ensure service availability, reduce latency, and maintain reliability
Leveraging tools like ELBs, K8s, message queues, and NoSQL databases effectively
Don't miss this unparalleled opportunity to gain invaluable insights and elevate your expertise in scaling web services to new heights.
We eagerly await your participation in this transformative webinar. Let's embark on the journey to mastering scalability together!
Slides from a presentation by Monal Daxini at Disney, Glendale CA about Netflix Open Source Software, Cloud Data Persistence, and Cassandra best Practices
Watch this webcast here: https://www.confluent.io/online-talks/write-great-kafka-connectors/
Apache Kafka with all its simple but needed offerings has left deep footprints in the software industry. And, with the ever-growing and maturing Kafka ecosystem, Kafka Connect allows us to focus on the data transformation rather than handling Kafka nitty-gritty details (which is mandatory if someone is using Kafka’s Producer/Consumer APIs). Kafka Connect in contrast, provides developers and operators a simple way of accessing, transforming and delivering data to connect an organization’s applications with their event streaming platform in the form of connectors.
Confluent's partner HashedIn Technologies has created many Kafka connectors. In this online talk, HashedIn shares their best practices on how to write great ones.
HashedIn will cover:
-In-field best practices for writing great Connectors, including both Sink Connectors and Source Connectors that transform and move data in and out of diverse external systems like AWS SQS, AWS S3, Firebase, Hadoop File System, InfluxDB, JDBC, Prometheus, Salesforce and Windows event logging.
-How to unlock the true potential of the Confluent Platform and move petabytes of data each day with the best possible Kafka Connector promises including exactly/at-least-once message delivery, retry mechanism, restart/rebalance behavior and ordering guarantees.
Deploying MariaDB databases with containers at Nokia NetworksMariaDB plc
Nokia is focused on providing software and products that facilitate rapid development, deployment and scaling of products and services to customers. The Common Software Foundation (CSF) within Nokia develops and supports product reuse by multiple applications within Nokia, including MariaDB. Their focus over the last year has been to develop a containerized MariaDB solution supporting multiple architectures, including both clustering and primary/secondary replication with MariaDB MaxScale. In this talk, Rick Lane discusses this journey of these containerized solutions from development to customer trials, including problems encountered and solutions.
GumGum relies heavily on Cassandra for storing different kinds of metadata. Currently GumGum reaches 1 billion unique visitors per month using 3 Cassandra datacenters in Amazon Web Services spread across the globe.
This presentation will detail how we scaled out from one local Cassandra datacenter to a multi-datacenter Cassandra cluster and all the problems we encountered and choices we made while implementing it.
How did we architect multi-region Cassandra in AWS? What were our experiences in implementing multi-datacenter Cassandra? How did we achieve low latency with multi-region Cassandra and the Datastax Driver? What are the different Cassandra use cases at GumGum? How did we integrate our Cassandra with Spark?
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd KnownDataStax
A brief intro to how Barracuda Networks uses Cassandra and the ways in which they are replacing their MySQL infrastructure, with Cassandra. This presentation will include the lessons they've learned along the way during this migration.
Speaker: Michael Kjellman, Software Engineer at Barracuda Networks
Michael Kjellman is a Software Engineer, from San Francisco, working at Barracuda Networks. Michael works across multiple products, technologies, and languages. He primarily works on Barracuda's spam infrastructure and web filter classification data.
Riddles of Streaming - Code Puzzlers for Fun & Profit (Nick Dearden, Confluen...confluent
Do you think that writing simple, expressive code to react to event streams in real-time can sometimes look just a little too easy ? Or perhaps you’re already a seasoned stream processing expert ? Either way, come prepared to have some fun and guess the answers to our streaming brain-teasers as we highlight some misconceptions and things that may surprise you in the brave new world of continuous stream processing!
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofsAlex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
20 Comprehensive Checklist of Designing and Developing a WebsitePixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
7. Sharded MySQL. Environment
1. MySQL (Percona Server)
2. Hardware configuration:
a. two Intel E2620v2 CPU
b. 128GB RAM
c. 12x800GB Intel SSD, RAID 10
d. two 2Gb network interfaces (bonded)
10. Sharded MySQL. Approach
1. Hibernate changes:
a. Patching 2nd level caching:
i. +environment
ii. -class version
b. More info to debug problems
c. Fixing bugs
2. Own implementation:
a. FitbitTransactional
b. ManagedHibernateSession
3. Dynamic sharding concept (somewhat similar to C*)
12. Sharded MySQL. Data migration
Migration (96 -> 152 shards):
● vBuckets to move: 96579
● 1 bucket migration time: 8 min
● 10 bucketmover * 3 processes - 12 days
13. Sharded MySQL. Data migration
Job
● Setup
a. Ensures vbuckets in read-only mode
b. Waits for servers to reach consensus
● Execute
a. Triggers actions (dump, insert, etc.) on Bucketmover
b. Waits for actions to complete
● Wrap-up
a. Updates shards for vbuckets, re-opens them for writes
b. Advances jobs to next action
14. Sharded MySQL. Schema migration
1. Locks during schema update
Solution: pt-online-schema-change + protobuf
Drawbacks:
1. Split between DML/DDL scripts
2. Binary format (additional data)
3. Additional platform specific tool
message Meta {
optional string name = 1;
optional string intro = 2;
...
repeated string requiredFeatures = 32;
}
message Challenge {
optional Meta meta = 1;
...
optional CWRace cw_race = 6;
}
15. Sharded MySQL. Development
1. Job system across shards
2. Use unsharded databases for lookup tables
3. Do not forget about custom annotation
@PrimaryEntity(entityType = EntityType.SHARDED_ENTITY)
16. Query patterns
1. Create challenge
2. List challenges by user
3. Get challenge team leaderboard by user
4. Post a message
5. List challenge messages
6. Cheer a message
17. MySQL. Not a problem
1. Knowledge Base
2. Response Time
18. Our problems
1. MySQL
a. Scalability
b. Fault tolerance
c. Schema migration
d. Locks
2. Infrastructure cost
a. MemcacheD
b. Redis
22. Apache Cassandra
Apache Cassandra is a free and open-source distributed database
management system designed to handle large amounts of data across
many commodity servers, providing high availability with no single
point of failure. Cassandra offers robust support for clusters spanning
multiple datacenters, with asynchronous masterless replication allowing
low latency operations for all clients.
26. ID migration
1. Create pseudo-random migration UUID based on BIGINT
2. Thank API designers for using string as object ids.
3. Make sure clients are ready for the new length of the id.
4. Migrate API to UUID all over the place
27. DAO (Data Access Object)
1. Create CassandraDAO with the same interface as HibernateDAO
2. Create ProxyAdapterDAO to control which implementation to select
3. Create adapter implementation for each DAO with the same
interface as HibernateDAO
28. Enable shadow writes (percentage)
1. Introduce environment specific settings for shadow writes
2. Adjust ProxyAdapterDAO code to enable shadow writes by
percentage. Various implementations.
3. Analyze performance (StatsD metrics for our code + Cassandra
metrics)
30. Start shadow C* reads with validation
1. Environment specific settings for data validation
2. Adjust ProxyAdapterDAO code to enable simultaneous read from
MySQL and Cassandra
3. Adjust ProxyAdapterDAO to be able to compare objects
4. Logging & investigating data discrepancy.
31. Check validation issues
1. Path
a. Fix code problems
b. Migrate affected challenges again
c. Go to step 1
2. Duration: 1.5 month
32. Turn on read from C*
1. Introduce C* return read percentage in the config settings
2. Still do shadow MySQL reads and validations
3. Increase percentage over time
34. Clean-up
1. Adjust places which are not suitable for C* patterns like look
through all of the shards.
2. Adjust adapters to get rid of Hibernate DAOs. Adapter hierarchy is
still presented
3. Remove obsolete code
4. Clean up MySQL database
35. Challenge Events Migration Example
1. Previous attempts:
a. SWF + SQS
b. MySQL + Job across all shards
2. Now
a. Complication due to C* as a queue performance
b. 16 threads across 1024 buckets
36. Code Redesign. Message cheer example
1. Originally
a. Read
b. Update BLOB
c. Persist
2. Approach
a. Update C* set as a single operation
37. Code Redesign. Life without transactions
1. BATCH
2. Some object as a single source of truth
38.
39. Challenges C*. Current State
1. Two datacenters
2. 18 nodes
3. Hardware
a. 24-core CPU
b. 64 GB RAM
4. RF: 3
40. Results of migration
1. Significant improvement in persistence storage scalability &
management (comparing to MySQL RDBMS)
2. Minimizing number of external points of failures
3. Squashing Technical Debt
4. Created a reusable migration module
41. Cassandra Inconveniences
1. Lack of ACID transactions
2. MultiDC scenarios require concious decisions for
QUORUM/LOCAL_QUORUM.
3. Data denormalization
4. CQL vs SQL limitations
5. Less readable IDs
42. Surprisingly not a big deal
1. Lack of JOINs due to the model
2. Lack of aggregation functions due to the model (we’re on 2.1 now)
3. Eventual consistency
4. IDs format change