This document discusses various strategies for modeling and querying data in Cassandra to avoid common anti-patterns like client-side joins, multi-partition queries, and mutable data. It recommends denormalizing data and using techniques like bucketing and user-defined types to reduce the need for joins. Batch statements are recommended only for writing to the same partition, as writing to multiple partitions in a batch can degrade performance. Logged batches can be used to write related data to multiple tables in an atomic way. The document also discusses an event sourcing approach to modeling mutable data.
Cassandra Day SV 2014: Fundamentals of Apache Cassandra Data ModelingDataStax Academy
You know you need Cassandra for it's uptime and scaling, but what about that data model? Let's bridge that gap and get you building your game changing app. We'll break down topics like storing objects and indexing for fast retrieval. You will see by understanding a few things about Cassandra internals, you can put your data model in the spotlight. The goal of this talk is to get you comfortable working with data in Cassandra throughout the application lifecycle. What are you waiting for? The cameras are waiting!
Cassandra Day SV 2014: Fundamentals of Apache Cassandra Data ModelingDataStax Academy
You know you need Cassandra for it's uptime and scaling, but what about that data model? Let's bridge that gap and get you building your game changing app. We'll break down topics like storing objects and indexing for fast retrieval. You will see by understanding a few things about Cassandra internals, you can put your data model in the spotlight. The goal of this talk is to get you comfortable working with data in Cassandra throughout the application lifecycle. What are you waiting for? The cameras are waiting!
Cassandra Day Atlanta 2015: Building Your First Application with Apache Cassa...DataStax Academy
You’ve heard the talks, followed the tutorials, and done the research. You are a font of Cassandra knowledge. Now it’s time to change the world! (Or at least build something to make your boss happy). In this talk we’ll walk through the process of building KillrVideo, an open source video sharing website where users can upload and share videos, rate them, comment on them, and more. By looking at a real application, we’ll talk about architectural decisions, how the application drives the data model, some pro tips when using the DataStax drivers, and some lessons learned from mistakes made along the way. You’ll leave this session ready to start building your next application (world-changing or otherwise) with Cassandra.
Temporary Cache Assistance (Transients API): WordCamp Phoenix 2014Cliff Seal
We’ll cover the basics of the Transients API, see basic examples, and then discuss common places where this method can be most helpful, like large, complex queries or pulling from an external API. We’ll also discuss how this type of caching is unique, when to use it, and how to scale it for big bursts of traffic.
Follow along with the code examples inside a working plugin: http://logoscreative.co/wcphx14/
Temporary Cache Assistance (Transients API): WordCamp Birmingham 2014Cliff Seal
WordPress has a few built-in ways to cache data that enable rapid development. Understanding your options and how to use them properly in your context is crucial to a performant and scalable site. The Transients API provides a powerful and easy way to store data with an expiration, and it comes with a few under-the-hood perks as well.
Join me in looking at the benefits you can gain from understanding and implementing “transients”. When we’re done, you’ll know what this API is, when it should be used, how to use it, and how to scale it. I’ll give real, useful code examples that you can implement immediately—without boring you to death. You’ll be able to do anything from caching data from a external API (like recent tweets) to storing a large, complex query.
We’ll also cover some of the more obscure aspects of this method, like:
-Object caching/Memcached
-Autoloading
-Race Conditions
-Expired transient cleanup
-Options table bloat
Do yourself and your visitors a favor by utilizing the Transients API. And, as you’ll see in this session, knowing how to use it will make all WordPress’s caching techniques easy to implement.
Cassandra Day Chicago 2015: Advanced Data ModelingDataStax Academy
Speaker(s): Tim Berglund, Global Director of Training at DataStax
You know you need Cassandra for its uptime and scaling, but what about that data model? Let's bridge that gap and get you building your game changing app. We'll break down topics like storing objects and indexing for fast retrieval. You will see by understanding a few things about Cassandra internals, you can put your data model in the spotlight. The goal of this talk is to get you comfortable working with data in Cassandra throughout the application lifecycle. What are you waiting for?
Service discovery and configuration provisioningSource Ministry
Slides from our talk "Service discovery and configuration provisioning" presented by Mariusz Gil at PHP Benelux 2016
Apache Zookeeper or Consul are almost completely unknown in the PHP world, although its use solves a lot of typical problems. In a nutshell, they are a central services of provisioning configuration information, distributed synchronization and coordination of servers/processes. It simplifies the processes of application configuration management, so it is possible to change its settings and operation in real time (eg. feature flagging). During the presentation the typical cases of use of Zookeeper/Consul in PHP applications will be presented, both strictly web and workers running from the CLI.
Design Patterns in Micro-services architectures & GilmourPiyush Verma
Gilmour is a framework for writing micro-services that exchange data over non-http transports. Talk looks at Design Patterns of a service oriented architecture, like Request-Response, Signal-Slots, Service Discovery, Load Balancing, Error detection and how Gilmour addresses those.
"Wix Engineering Media AI Photo Studio", Mykola MykhailychFwdays
In this talk, we will review components of the Wix Engineering AI-based image processing toolbox: super-resolution, automatic enhancement, cutout, etc. We'll share insights about models in production and their metrics. We'll show how ML models can improve user experience and make core products even more helpful.
MongoDB .local Paris 2020: La puissance du Pipeline d'Agrégation de MongoDBMongoDB
Le pipeline d'agrégation a été en mesure d'alimenter votre analyse de données depuis la version 2.2. Dans la version 4.2, nous avons ajouté plus de puissance et vous pouvez maintenant l'utiliser pour des requêtes plus puissantes, des mises à jour et la sortie de vos données dans des collections existantes. Venez découvrir comment vous pouvez tout faire avec le pipeline, y compris les vues uniques, ETL, les cumuls de données et les vues matérialisées.
The JSON data type and functions that support it comprise one of the most interesting features introduced in MySQL 5.7 for application developers. But no feature is a Golden Hammer. We need to apply a little expertise to get the best of it, and avoid misusing it. I’ll show practical examples that work well with JSON, and other scenarios where conventional columns would perform better. Questions addressed in this presentation: How much space does JSON data use, compared to conventional data? What is the performance of querying JSON vs. conventional data? How do I create indexes for JSON data? What kind of data is best to store in JSON? How do I get the best of both worlds?
Speech of Nihad Abbasov, Senior Software Engineer at Digital Classifieds, at Ruby Meditation 27, Dnipro, 19.05.2019
Slideshare -
Next conference - http://www.rubymeditation.com/
How fast is your code? Performance is crucial as your startup grows, and optimizing your application can make a huge impact on user experience. During this talk, you will learn hints, techniques and best practices for improving the overall speed of your Ruby application.
Announcements and conference materials https://www.fb.me/RubyMeditation
News https://twitter.com/RubyMeditation
Photos https://www.instagram.com/RubyMeditation
The stream of Ruby conferences (not just ours) https://t.me/RubyMeditation
Cassandra Day Atlanta 2015: Building Your First Application with Apache Cassa...DataStax Academy
You’ve heard the talks, followed the tutorials, and done the research. You are a font of Cassandra knowledge. Now it’s time to change the world! (Or at least build something to make your boss happy). In this talk we’ll walk through the process of building KillrVideo, an open source video sharing website where users can upload and share videos, rate them, comment on them, and more. By looking at a real application, we’ll talk about architectural decisions, how the application drives the data model, some pro tips when using the DataStax drivers, and some lessons learned from mistakes made along the way. You’ll leave this session ready to start building your next application (world-changing or otherwise) with Cassandra.
Temporary Cache Assistance (Transients API): WordCamp Phoenix 2014Cliff Seal
We’ll cover the basics of the Transients API, see basic examples, and then discuss common places where this method can be most helpful, like large, complex queries or pulling from an external API. We’ll also discuss how this type of caching is unique, when to use it, and how to scale it for big bursts of traffic.
Follow along with the code examples inside a working plugin: http://logoscreative.co/wcphx14/
Temporary Cache Assistance (Transients API): WordCamp Birmingham 2014Cliff Seal
WordPress has a few built-in ways to cache data that enable rapid development. Understanding your options and how to use them properly in your context is crucial to a performant and scalable site. The Transients API provides a powerful and easy way to store data with an expiration, and it comes with a few under-the-hood perks as well.
Join me in looking at the benefits you can gain from understanding and implementing “transients”. When we’re done, you’ll know what this API is, when it should be used, how to use it, and how to scale it. I’ll give real, useful code examples that you can implement immediately—without boring you to death. You’ll be able to do anything from caching data from a external API (like recent tweets) to storing a large, complex query.
We’ll also cover some of the more obscure aspects of this method, like:
-Object caching/Memcached
-Autoloading
-Race Conditions
-Expired transient cleanup
-Options table bloat
Do yourself and your visitors a favor by utilizing the Transients API. And, as you’ll see in this session, knowing how to use it will make all WordPress’s caching techniques easy to implement.
Cassandra Day Chicago 2015: Advanced Data ModelingDataStax Academy
Speaker(s): Tim Berglund, Global Director of Training at DataStax
You know you need Cassandra for its uptime and scaling, but what about that data model? Let's bridge that gap and get you building your game changing app. We'll break down topics like storing objects and indexing for fast retrieval. You will see by understanding a few things about Cassandra internals, you can put your data model in the spotlight. The goal of this talk is to get you comfortable working with data in Cassandra throughout the application lifecycle. What are you waiting for?
Service discovery and configuration provisioningSource Ministry
Slides from our talk "Service discovery and configuration provisioning" presented by Mariusz Gil at PHP Benelux 2016
Apache Zookeeper or Consul are almost completely unknown in the PHP world, although its use solves a lot of typical problems. In a nutshell, they are a central services of provisioning configuration information, distributed synchronization and coordination of servers/processes. It simplifies the processes of application configuration management, so it is possible to change its settings and operation in real time (eg. feature flagging). During the presentation the typical cases of use of Zookeeper/Consul in PHP applications will be presented, both strictly web and workers running from the CLI.
Design Patterns in Micro-services architectures & GilmourPiyush Verma
Gilmour is a framework for writing micro-services that exchange data over non-http transports. Talk looks at Design Patterns of a service oriented architecture, like Request-Response, Signal-Slots, Service Discovery, Load Balancing, Error detection and how Gilmour addresses those.
"Wix Engineering Media AI Photo Studio", Mykola MykhailychFwdays
In this talk, we will review components of the Wix Engineering AI-based image processing toolbox: super-resolution, automatic enhancement, cutout, etc. We'll share insights about models in production and their metrics. We'll show how ML models can improve user experience and make core products even more helpful.
MongoDB .local Paris 2020: La puissance du Pipeline d'Agrégation de MongoDBMongoDB
Le pipeline d'agrégation a été en mesure d'alimenter votre analyse de données depuis la version 2.2. Dans la version 4.2, nous avons ajouté plus de puissance et vous pouvez maintenant l'utiliser pour des requêtes plus puissantes, des mises à jour et la sortie de vos données dans des collections existantes. Venez découvrir comment vous pouvez tout faire avec le pipeline, y compris les vues uniques, ETL, les cumuls de données et les vues matérialisées.
The JSON data type and functions that support it comprise one of the most interesting features introduced in MySQL 5.7 for application developers. But no feature is a Golden Hammer. We need to apply a little expertise to get the best of it, and avoid misusing it. I’ll show practical examples that work well with JSON, and other scenarios where conventional columns would perform better. Questions addressed in this presentation: How much space does JSON data use, compared to conventional data? What is the performance of querying JSON vs. conventional data? How do I create indexes for JSON data? What kind of data is best to store in JSON? How do I get the best of both worlds?
Speech of Nihad Abbasov, Senior Software Engineer at Digital Classifieds, at Ruby Meditation 27, Dnipro, 19.05.2019
Slideshare -
Next conference - http://www.rubymeditation.com/
How fast is your code? Performance is crucial as your startup grows, and optimizing your application can make a huge impact on user experience. During this talk, you will learn hints, techniques and best practices for improving the overall speed of your Ruby application.
Announcements and conference materials https://www.fb.me/RubyMeditation
News https://twitter.com/RubyMeditation
Photos https://www.instagram.com/RubyMeditation
The stream of Ruby conferences (not just ours) https://t.me/RubyMeditation
The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...DataStax
Making sure your Data Model will work on the production cluster after 6 months as well as it does on your laptop is an important skill. It's one that we use every day with our clients at The Last Pickle, and one that relies on tools like the cassandra-stress. Knowing how the data model will perform under stress once it has been loaded with data can prevent expensive re-writes late in the project.
In this talk Christopher Batey, Consultant at The Last Pickle, will shed some light on how to use the cassandra-stress tool to test your own schema, graph the results and even how to extend the tool for your own use cases. While this may be called premature optimisation for a RDBS, a successful Cassandra project depends on it's data model.
About the Speaker
Christopher Batey Consultant / Software Engineer, The Last Pickle
Christopher (@chbatey) is a part time consultant at The Last Pickle where he works with clients to help them succeed with Apache Cassandra as well as a freelance software engineer working in London. Likes: Scala, Haskell, Java, the JVM, Akka, distributed databases, XP, TDD, Pairing. Hates: Untested software, code ownership. You can checkout his blog at: http://www.batey.info
This presentation will walk through some of the key considerations for planning and running load test to ensure your Cassandra application will meet you expected scaling requirements. We will also walk through some examples of using the cassandra-stress tool to construct load test for real-life application scenarios.
About the Speaker
Ben Slater Chief Product Officer, Instaclustr
Instaclustr provides Cassandra and Spark as a managed service in the cloud. As Chief Product Officer, Ben is charged with steering Instaclustr's development roadmap, managing product engineering and overseeing the production support and consulting teams. Ben has over 20 years experience in systems development including previously as lead architect for the product that is now Oracle Policy Automation and over 10 years as a solution architect and project manager for Accenture.
Cassandra Day London 2015: Apache Cassandra Anti-PatternsDataStax Academy
Speaker(s): Christopher Batey, Apache Cassandra Evangelist at DataStax
You wouldn't try and use a saw to hammer a nail so why do people try to use databases as queues? Or perhaps more subtly: rely on atomic updates to mutable data distributed across the world (when you say it out loud is sounds a bit ridiculous right?)
In this talk we're going to go through the most common ways to misuse Cassandra. Some are obvious and you should be using a different kind of technology, like the queue example, but others you just need to change the way you're using Cassandra.
Topics we'll cover:
* Joins (yes we'll get started with the obvious one)
* Mutable data / read then write / breaking the laws of physics
* Really wide rows (but the Internet said wide rows were good?)
* Batches!
* Multi partition queries a.k.a ReadTimeout heaven
* Secondary indexes under the covers
* Tripping over tombstones
Crazy retry policies: I am looking at you Downgrading retry policy
Misusing technology isn't isolated to the distributed database world, I used to work on a small piece of queueing software called WebSphereMQ and what do you know? A good portion of our customers used it as a database!
Cassandra Day London 2015: Getting Started with Apache Cassandra and JavaDataStax Academy
Speaker(s): Christopher Batey, Apache Cassandra Evangelist at DataStax
In this session you’ll learn just enough to get started with NoSQL Apache Cassandra and Java, including how to install, build and try out some basic API calls. You’ll learn the basics of how to code your first application in Java on top of Cassandra, and leave the session feeling confident and excited to take the next step!
Webinar: You made the move to Office 365—now what?ShareGate
In this webinar, Benjamin Niaulin explains how to leverage your Office 365 subscriptions to keep pace with the evolving workplace.
It’s not just SharePoint that needs to go from classic to modern—the way our IT departments think about and use technology in the workplace needs to be updated, too.
[JSS2015] Nouveautés SQL Server 2016:Sécurité,Temporal & Stretch TablesGUSS
SQL Server 2016 a des nouveautés très intéressantes comme le dynamic data masking et le row level security ou encore les stretch tables qui vous permettent l’extension d’une ou plusieurs tables vers une base Azure SQL. Découvrez ces fonctionnalités à travers des exemples d'utilisation.Comment cela fonctionne t-il? Comment peut-il influer sur l'administration de votre base de données? Nous allons essayer de répondre à toutes ces questions ...
Talk given at QCon, London 2014. You can find the video here: http://bit.ly/jpm_001a
This topic will introduce the Cassandra native protocol, native drivers and Cassandra Query Language (CQL). It is important for developers to be aware of this new way of integrating with and querying Cassandra – without using Thrift or RPC. There are various ways of tuning that integration and modeling your data - all intended to make it easier and more productive to build against Cassandra with some additional performance benefits. This is a technical session with code abstracts using the Java driver.
Supercharging WordPress Development - Wordcamp Brighton 2019Adam Tomat
Slide links:
- https://lumberjack.rareloop.com
- https://docs.lumberjack.rareloop.com
- https://github.com/Rareloop/lumberjack-bedrock-installer
- https://github.com/Rareloop/lumberjack
- https://github.com/Rareloop/lumberjack-validation
- https://github.com/Rareloop/hatchet
- rareloop.com/careers
- https://www.rareloop.com/posts/comparing-modern-mvc-wordpress-frameworks/
- https://lizkeogh.com/2017/08/31/reflecting-reality/amp
- https://www.youtube.com/watch?v=uQUxJObxTUs
- https://www.upstatement.com/timber
- https://roots.io/bedrock
---
Often WordPress themes are not easy to change, maintain or fun to work on. This can rule WordPress out as a viable option for bespoke, non-trivial websites.
In this talk we’ll dive into how this happens & look at how we can benefit from software engineering techniques to help make your code easier to change. I’ll also show how using Lumberjack, a powerful MVC framework built on Timber, can be used to power-up your themes.
Light Weight Transactions Under Stress (Christopher Batey, The Last Pickle) ...DataStax
The Strong Consistency provided by QUORUM reads in Cassandra can still lead to read-write-modify problems when applications want to do things such as guarantee uniqueness or sell exactly 300 cinema tickets. Fortunately Light Weight Transactions (LWT) are designed to solve the problems Strong Consistency can not.
In this talk Christopher Batey, Consultant at The Last Pickle, will discuss:
- Syntax and semantics: Theoretical use cases
- How they work under the covers
Then we will go through LWTs in practice:
- How do the number of nodes/replicas/data centres affect performance?
- How does contention (multiple concurrent queries using LWTs) affect availability and performance?
- What consistency guarantees do you get with other LWTs and non-LWTs?
- How does LWT timeout differ from normal write timeout?
- Use case: LWTs as a distributed lock and how it went wrong 5 times.
About the Speaker
Christopher Batey Consultant / Software Engineer, The Last Pickle
Christopher (@chbatey) is a part time consultant at The Last Pickle where he works with clients to help them succeed with Apache Cassandra as well as a freelance software engineer working in London. Likes: Scala, Haskell, Java, the JVM, Akka, distributed databases, XP, TDD, Pairing. Hates: Untested software, code ownership. You can checkout his blog at: http://www.batey.info
A lot has changed since I gave one of these talks and man, has it been good. 2.0 brought us a lot of new CQL features and now with 2.1 we get even more! Let me show you some real life data models and those new features taking developer productivity to an all new high. User Defined Types, New Counters, Paging, Static Columns. Exciting new ways of making your app truly killer!
How to Realize an Additional 270% ROI on SnowflakeAtScale
Companies of all sizes have embraced the power, scale and ease of use of Snowflake’s cloud data platform, along with the promise of cost-savings. But if you aren’t careful, cloud compute usage can sneak up on you and leave you with runaway costs no matter what BI tool you are using.
The presentation from experts from Rakuten Rewards and AtScale shows practical techniques on how you can reduce unnecessary compute and boost BI performance to realize an additional 270% ROI on Snowflake. For the on-demand webinar, go to: https://www.atscale.com/resource/wbr-cloud-compute-cost-snowflake-tableau/
How to Position Your Globus Data Portal for Success Ten Good PracticesGlobus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Understanding Globus Data Transfers with NetSageGlobus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
May Marketo Masterclass, London MUG May 22 2024.pdfAdele Miller
Can't make Adobe Summit in Vegas? No sweat because the EMEA Marketo Engage Champions are coming to London to share their Summit sessions, insights and more!
This is a MUG with a twist you don't want to miss.
Graspan: A Big Data System for Big Code AnalysisAftab Hussain
We built a disk-based parallel graph system, Graspan, that uses a novel edge-pair centric computation model to compute dynamic transitive closures on very large program graphs.
We implement context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations.
These analyses were used to augment the existing checkers; these augmented checkers found 132 new NULL pointer bugs and 1308 unnecessary NULL tests in Linux 4.4.0-rc5, PostgreSQL 8.3.9, and Apache httpd 2.2.18.
- Accepted in ASPLOS ‘17, Xi’an, China.
- Featured in the tutorial, Systemized Program Analyses: A Big Data Perspective on Static Analysis Scalability, ASPLOS ‘17.
- Invited for presentation at SoCal PLS ‘16.
- Invited for poster presentation at PLDI SRC ‘16.
Software Engineering, Software Consulting, Tech Lead, Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Transaction, Spring MVC, OpenShift Cloud Platform, Kafka, REST, SOAP, LLD & HLD.
Top 7 Unique WhatsApp API Benefits | Saudi ArabiaYara Milbes
Discover the transformative power of the WhatsApp API in our latest SlideShare presentation, "Top 7 Unique WhatsApp API Benefits." In today's fast-paced digital era, effective communication is crucial for both personal and professional success. Whether you're a small business looking to enhance customer interactions or an individual seeking seamless communication with loved ones, the WhatsApp API offers robust capabilities that can significantly elevate your experience.
In this presentation, we delve into the top 7 distinctive benefits of the WhatsApp API, provided by the leading WhatsApp API service provider in Saudi Arabia. Learn how to streamline customer support, automate notifications, leverage rich media messaging, run scalable marketing campaigns, integrate secure payments, synchronize with CRM systems, and ensure enhanced security and privacy.
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
OpenMetadata Community Meeting - 5th June 2024OpenMetadata
The OpenMetadata Community Meeting was held on June 5th, 2024. In this meeting, we discussed about the data quality capabilities that are integrated with the Incident Manager, providing a complete solution to handle your data observability needs. Watch the end-to-end demo of the data quality features.
* How to run your own data quality framework
* What is the performance impact of running data quality frameworks
* How to run the test cases in your own ETL pipelines
* How the Incident Manager is integrated
* Get notified with alerts when test cases fail
Watch the meeting recording here - https://www.youtube.com/watch?v=UbNOje0kf6E
Software Engineering, Software Consulting, Tech Lead.
Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Security,
Spring Transaction, Spring MVC,
Log4j, REST/SOAP WEB-SERVICES.
GraphSummit Paris - The art of the possible with Graph TechnologyNeo4j
Sudhir Hasbe, Chief Product Officer, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeAftab Hussain
Understanding variable roles in code has been found to be helpful by students
in learning programming -- could variable roles help deep neural models in
performing coding tasks? We do an exploratory study.
- These are slides of the talk given at InteNSE'23: The 1st International Workshop on Interpretability and Robustness in Neural Software Engineering, co-located with the 45th International Conference on Software Engineering, ICSE 2023, Melbourne Australia
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxrickgrimesss22
Discover the essential features to incorporate in your Winzo clone app to boost business growth, enhance user engagement, and drive revenue. Learn how to create a compelling gaming experience that stands out in the competitive market.
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
11. @chbatey
Adding staff link
CREATE TABLE customer_events(
customer_id text,
staff set<text>,
time timeuuid,
event_type text,
store_name text,
PRIMARY KEY ((customer_id), time));
CREATE TYPE staff(
name text,
favourite_colour text,
job_title text);
CREATE TYPE store(
store_name text,
location text,
store_type text);
12. @chbatey
Queries
select * from customer_events where customer_id = ‘chbatey’
limit 1
select * from staff where name in (staff1, staff2, staff3,
staff4)
Any one of these fails the whole query fails
17. @chbatey
Denormalise with UDTs
CREATE TYPE store (name text, type text, postcode text)
CREATE TYPE staff (name text, fav_colour text, job_title text)
CREATE TABLE customer_events(
customer_id text,
time timeuuid,
event_type text,
store store,
staff set<staff>,
PRIMARY KEY ((customer_id), time));
User defined types
Cassandra 2.1 has landed in DSE 4.7!!!
19. @chbatey
Adding a time bucket
CREATE TABLE customer_events_bucket(
customer_id text,
time_bucket text
time timeuuid,
event_type text,
store store,
staff set<staff>,
PRIMARY KEY ((customer_id, time_bucket), time));
20. @chbatey
Queries
select * from customer_events_bucket where customer_id =
‘chbatey’ and time_bucket IN (‘2015-01-01|0910:01’,
‘2015-01-01|0910:02’, ‘2015-01-01|0910:03’, ‘2015-01-01|
0910:04’)
Often better as multiple async queries
21. @chbatey
Tips for avoiding joins & multi gets
• Say no to client side joins by denormalising
- Much easier with UDTs
• When bucketing aim for at most two buckets for a query
• Get in the habit of reading trace + using a multi node
cluster locally
26. @chbatey
Batch insert - Good idea?
public void storeEvents(ConsistencyLevel consistencyLevel, CustomerEvent... events) {
BatchStatement batchStatement = new BatchStatement(BatchStatement.Type.UNLOGGED);
for (CustomerEvent event : events) {
BoundStatement boundInsert = insertStatement.bind(
customerEvent.getCustomerId(),
customerEvent.getTime(),
customerEvent.getEventType(),
customerEvent.getStaffId(),
customerEvent.getStaffId());
batchStatement.add(boundInsert);
}
session.execute(batchStatement);
}
27. @chbatey
Not so much
client
C
Very likely to fail if nodes are
down / over loaded
Gives far too much work for the coordinator
Ruins performance gains of
token aware driver
32. @chbatey
Logged Batches
• Once accepted the statements will eventually succeed
• Achieved by writing them to a distributed batchlog
• 30% slower than unlogged batches
33. @chbatey
Batch insert - Good idea?
public void storeEvents(ConsistencyLevel consistencyLevel, CustomerEvent... events) {
BatchStatement batchStatement = new BatchStatement(BatchStatement.Type.LOGGED);
for (CustomerEvent event : events) {
BoundStatement boundInsert = insertStatement.bind(
customerEvent.getCustomerId(),
customerEvent.getTime(),
customerEvent.getEventType(),
customerEvent.getStaffId(),
customerEvent.getStaffId());
batchStatement.add(boundInsert);
}
session.execute(batchStatement);
}
38. @chbatey
Distributed mutable state :-(
create TABLE accounts(customer text PRIMARY KEY, balance_in_pence int);
Overly naive example
• Cassandra uses last write wins (no Vector clocks)
• Read before write is an anti-pattern - race conditions and latency
39. @chbatey
Take a page from Event sourcing
create table accounts_log(
customer text,
time timeuuid,
delta int,
primary KEY (customer, time)); All records for the same customer in the same storage row
Transactions ordered by TIMEUUID - UUIDs are your friends in distributed systems
40. @chbatey
Tips
• Avoid mutable state in distributed system, favour
immutable log
• Can roll up and snapshot to avoid calculation getting too
big
• If you really have to then checkout LWTs and do via
CAS - this will be slower
42. @chbatey
Customer events table
CREATE TABLE if NOT EXISTS customer_events (
customer_id text,
staff_id text,
store_type text,
time timeuuid ,
event_type text,
PRIMARY KEY (customer_id, time))
create INDEX on customer_events (staff_id) ;
43. @chbatey
Indexes to the rescue?
customer_id time staff_id
chbatey 2015-03-03 08:52:45 trevor
chbatey 2015-03-03 08:52:54 trevor
chbatey 2015-03-03 08:53:11 bill
chbatey 2015-03-03 08:53:18 bill
rusty 2015-03-03 08:56:57 bill
rusty 2015-03-03 08:57:02 bill
rusty 2015-03-03 08:57:20 trevor
staff_id customer_id
trevor chbatey
trevor chbatey
bill chbatey
bill chbatey
bill rusty
bill rusty
trevor rusty
44. @chbatey
Indexes to the rescue?
staff_id customer_id
trevor chbatey
trevor chbatey
bill chbatey
bill chbatey
staff_id customer_id
bill rusty
bill rusty
trevor rusty
A B
chbatey rusty
customer_id time staff_id
chbatey 2015-03-03 08:52:45 trevor
chbatey 2015-03-03 08:52:54 trevor
chbatey 2015-03-03 08:53:11 bill
chbatey 2015-03-03 08:53:18 bill
rusty 2015-03-03 08:56:57 bill
rusty 2015-03-03 08:57:02 bill
rusty 2015-03-03 08:57:20 trevor
customer_events table
staff_id customer_id
trevor chbatey
trevor chbatey
bill chbatey
bill chbatey
bill rusty
bill rusty
trevor rusty
staff_id index
45. @chbatey
Do it your self index
CREATE TABLE if NOT EXISTS customer_events (
customer_id text,
statff_id text,
store_type text,
time timeuuid ,
event_type text,
PRIMARY KEY (customer_id, time))
CREATE TABLE if NOT EXISTS customer_events_by_staff (
customer_id text,
statff_id text,
store_type text,
time timeuuid ,
event_type text,
PRIMARY KEY (staff_id, time))
46. @chbatey
Anti patterns summary
• Most anti patterns are very obvious in trace output
• Don’t test on small clusters, most problems only occur at
scale
51. @chbatey
Write timeout
• Received acknowledgements
• Required acknowledgements
• Consistency level
• CAS and Batches are more complicated:
- WriteType: SIMPLE, BATCH, UNLOGGED_BATCH,
BATCH_LOG, CAS
52. @chbatey
Batches
• BATCH_LOG
- Timed out waiting for batch log replicas
• BATCH
- Written to batch log but timed out waiting for actual replica
- Will eventually be committed
54. @chbatey
Anti patterns summary
• Most anti patterns are very obvious in trace output
• Don’t test on small clusters, most problems only occur at
scale
59. @chbatey
Datamodel
CREATE TABLE queues (
name text,
enqueued_at timeuuid,
payload blob,
PRIMARY KEY (name, enqueued_at)
);
SELECT enqueued_at, payload
FROM queues
WHERE name = 'queue-1'
LIMIT 1;
60. @chbatey
Trace
activity | source | elapsed
-------------------------------------------+-----------+--------
execute_cql3_query | 127.0.0.3 | 0
Parsing statement | 127.0.0.3 | 48
Peparing statement | 127.0.0.3 | 362
Message received from /127.0.0.3 | 127.0.0.1 | 42
Sending message to /127.0.0.1 | 127.0.0.3 | 718
Executing single-partition query on queues | 127.0.0.1 | 145
Acquiring sstable references | 127.0.0.1 | 158
Merging memtable contents | 127.0.0.1 | 189
Merging data from memtables and 0 sstables | 127.0.0.1 | 235
Read 1 live and 19998 tombstoned cells | 127.0.0.1 | 251102
Enqueuing response to /127.0.0.3 | 127.0.0.1 | 252976
Sending message to /127.0.0.3 | 127.0.0.1 | 253052
Message received from /127.0.0.1 | 127.0.0.3 | 324314
Not good :(
61. @chbatey
Tips
• Avoid lots of deletes + range queries in the same
partition
• Avoid it with data modelling / query pattern:
- Move partition
- Add a start for your range: e.g. enqueued_at >
9d1cb818-9d7a-11b6-96ba-60c5470cbf0e