This document summarizes Richard Low's data modelling workshop from Cassandra Europe 2012. The workshop covers what data modelling is, the factors to consider when designing a data model (such as workload and queries), modelling options in Cassandra (rows, columns, super columns, composite columns), and tools such as counters, expiring columns, and secondary indexes. It works through an example of modelling a scalable messaging application and compares it to a relational database model, with the aim of helping attendees optimise their data layout for their most common queries and operations.
Cassandra EU 2012 - Data modelling workshop by Richard Low
1. Data modelling workshop
Richard Low
rlow@acunu.com @richardalow
Wednesday, 28 March 2012
2. Outline
• What is data modelling?
• What do I need to know to come up with a model?
• Options and available tools
• Denormalisation
• Example and demo: scalable messaging application
4. Data modelling
• How you organise your data
• Store all in one big value?
• Store as columns in one row or lots of rows?
• Use counters?
• Can I avoid read-modify-write?
5. Why care about it?
• Performance
• Ensure good load balancing
• Disk usage
• Future proofing
6. Performance
• Bad data model: do read-modify-write on a large column
• Good data model: just overwrite updated data
• Difference? Could be 100 ops/s vs. 100k ops/s (a 1000x improvement)
7. Performance
• Cacheability
• Ensure your cache isn’t polluted by uncacheable things
• Cached reads are ~100x faster than uncached
14. Keyspaces and Column Families
SQL → Cassandra:
• Database → Keyspace
• Table → Column Family
• Within a column family, each row is identified by a row key and holds its own set of columns (col_1, col_2, ...)
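As a rough sketch of this mapping in modern CQL (the talk itself predates CQL 3, and the keyspace, table, and column names here are illustrative):
CREATE KEYSPACE demo
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
-- A column family: each row is identified by "key" and holds its columns
CREATE TABLE demo.users (
  key   text PRIMARY KEY,
  col_1 text,
  col_2 text
);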
15. Options and tools
• Rows
• Columns
• Supercolumns
• Composite columns
16. Rows and columns
[Diagram: a sparse grid of rows row1..row7 against columns col1..col7; each row holds only the columns it actually has, and different rows can hold different columns]
17. Column options
• Regular columns
• Super columns: columns within columns
• Composite columns: multi-dimensional column names
19. Tools
• Counters: atomic inc and dec
• Expiring columns: TTL
• Secondary indexes: your WHERE clause
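A minimal sketch of these three tools in modern CQL (the original talk used the Thrift API; the table and column names here are illustrative):
-- Counters: atomic increment and decrement
UPDATE page_views SET views = views + 1 WHERE page = '/home';
-- Expiring columns: the value disappears after the TTL (in seconds)
INSERT INTO sessions (id, user) VALUES ('s1', 'alice') USING TTL 3600;
-- Secondary index: enables a WHERE clause on a non-key column
CREATE INDEX ON users (country);
SELECT * FROM users WHERE country = 'UK';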
20. Rows vs columns
• Row key is the shard key
• Need lots of rows for scalability
• Don’t be afraid of large-ish rows
• But don’t make them too big
• Avoid range queries across rows, but use them within rows
21. Range queries
• Within a row:
SELECT col3..col5 FROM Standard1 WHERE KEY=row1
[Diagram: row1 holds col1, col2, col5, col6, col8; the slice returns the columns that fall in the col3..col5 range]
22. Range queries
• Across rows:
SELECT * FROM table WHERE key > row2 LIMIT 2
23. Range queries
SELECT * FROM table WHERE key > row2 LIMIT 2
[Diagram: the rows sit around the ring in token order (row4, row2, row3, row1), not key order, so "key > row2" walks the ring from row2’s token position and can return rows unrelated to the key ordering]
24. Range queries
• Range queries within rows (‘get_slice’) are fine
• Avoid range queries across rows (‘get_range_slices’)
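For comparison, a hedged modern-CQL equivalent of the cross-row scan: with the random partitioner you can only page in token order, never key order (t and k are illustrative names):
SELECT * FROM t WHERE token(k) > token('row2') LIMIT 2;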
25. Batching
• Overhead on each call
• Batch together inserts, better if in the same row
• Reduce read ops: use large get_slice reads
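A minimal sketch of a batched write in modern CQL, assuming an inbox table like the one sketched under slide 39; both inserts land in the same row, which keeps the batch cheap:
BEGIN BATCH
  INSERT INTO inbox (user, msg_id, subject) VALUES ('bob', now(), 'rock?');
  INSERT INTO inbox (user, msg_id, subject) VALUES ('bob', now(), 'paper!');
APPLY BATCH;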
27. Denormalisation
• Hard drive performance constraints:
• Sequential IO at 100s MB/s
• Seek at 100 IO/s
• Avoid random IO
28. Denormalisation
• Store columns accessed at similar times near to each other
• => put them in the same row
• Involves copying
• Copying isn’t bad: pre-flood (2011 Thailand floods) disk prices were <$100 per TB
30. Messaging application
• Users can send messages to other users
• Horizontally scalable
• Expect users to send to lots of recipients
31. Messaging
• In an RDBMS we might have a table for:
• Users
• Messages (sender is unique)
• Mappings, Message → Receiver
32. A relational model
[Example relational DB model:]
Users: Id, username
Messages: Id, Subject, Content, Date, Sender_Id
Msg_Receipt: Id, Message_Id, User_Id, Is_read
Users 1:∞ Messages (via Sender_Id); Messages 1:∞ Msg_Receipt (via Message_Id); Users 1:∞ Msg_Receipt (via User_Id)
33. Querying
Most recent 10 messages sent by a user:
SELECT *
FROM Messages
WHERE Messages.Sender_Id = <id>
ORDER BY Messages.Date DESC
LIMIT 10;
Most recent 10 messages received by a user:
SELECT Messages.*
FROM Messages, Msg_Receipt
WHERE Msg_Receipt.User_Id = <id>
AND Msg_Receipt.Message_Id = Messages.Id
ORDER BY Messages.Date DESC
LIMIT 10;
34. Under the hood
Msg_Receipt (id, msg_id, user_id):
(0, 0, 0), (1, 3, 1), (2, 4, 2), (3, 6000, 0)
Messages (id, subject, ...):
(0, a), (1, b), (2, c), (3, d), (4, e), ..., (6000, x)
User 0’s received messages (ids 0 and 6000) live far apart in the Messages table
35. Under the hood
• Normalisation => seeks
• So denormalise
• Hit capacity limit of one node quickly
36. Back of the envelope...
• 1 M users
• Message size 1 KB
• Each user has 5000 messages
• => 5 TB data (1M users x 5000 messages x 1 KB)
37. Back of the envelope...
• Reading 10 messages => 10 seeks
• If 10k users are active at once, we need 100k seeks/s
• At ~100 seeks/s per disk, we need 1000 disks
• With 8 disks per node and RF 3, that’s 375 nodes (1000 / 8 x 3)
38. Back of the envelope...
• Denormalise: messages are immutable
• Insert them into everyone’s inbox
• Reading 10 messages is now one seek
• Paging is sequential
• => 10x fewer seeks, so 10x fewer nodes: 38 nodes now!
39. In Cassandra
• Use a row per user
• Composite columns, with TimeUUID as ID
• Gives time ordering on messages
• Inserts go to all recipients
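A hedged rendering of this model in modern CQL; the talk used composite columns over Thrift, and CQL 3 clustering columns are their present-day equivalent (the names here are illustrative):
CREATE TABLE inbox (
  user    text,       -- row key: one inbox row per user
  msg_id  timeuuid,   -- TimeUUID column name: gives time ordering
  sender  text,
  subject text,
  body    text,
  PRIMARY KEY (user, msg_id)
) WITH CLUSTERING ORDER BY (msg_id DESC);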
40. Messaging example
From: alice
To: bob, charlie
Subject: rock?
[Inbox rows after the insert; each recipient’s row gains a column m1:]
alice: (no columns yet)
bob: m1 = {sender: alice, subject: rock?}
charlie: m1 = {sender: alice, subject: rock?}
41. Messaging example
From: bob
To: alice, charlie
Subject: paper!
[Inbox rows after the second insert:]
alice: m2 = {sender: bob, subject: paper!}
bob: m1 = {sender: alice, subject: rock?}
charlie: m1 = {sender: alice, subject: rock?}, m2 = {sender: bob, subject: paper!}
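A hedged sketch of the fan-out write for bob’s message, using the illustrative inbox table above; the same TimeUUID identifies the message in every recipient’s row:
-- (the UUID literal is a placeholder TimeUUID)
BEGIN BATCH
  INSERT INTO inbox (user, msg_id, sender, subject)
    VALUES ('alice', 50554d6e-29bb-11e5-b345-feff819cdc9f, 'bob', 'paper!');
  INSERT INTO inbox (user, msg_id, sender, subject)
    VALUES ('charlie', 50554d6e-29bb-11e5-b345-feff819cdc9f, 'bob', 'paper!');
APPLY BATCH;
-- Reading an inbox is then one slice per user, newest first:
SELECT * FROM inbox WHERE user = 'charlie' LIMIT 10;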