Presented at Cassandra London (April 7, 2014): the challenges of time-series storage and analytics in OpenNMS, with an introduction to Newts, a new Cassandra-based time-series data store.
Whether it's statistics, weather forecasting, astronomy, finance, or network management, time series data plays a critical role in analytics and forecasting. Unfortunately, while many tools exist for time series storage and analysis, few are able to scale past memory limits or provide rich query and analytics capabilities beyond what is necessary to produce simple plots. For those challenged by large volumes of data, there is much room for improvement.
Apache Cassandra is a fully distributed second-generation database. Cassandra stores data in key-sorted order, making it ideal for time series, and its high throughput and linear scalability make it well suited to very large data sets.
This talk will cover some of the requirements and challenges of large-scale time series storage and analysis. Cassandra data and query modeling for this use case will be discussed, and Newts, an open source Cassandra-based time series store under development at The OpenNMS Group, will be introduced.
5. OpenNMS: What It Is
● Network Management System
○ Discovery and Provisioning
○ Service monitoring
○ Data collection
○ Event management and notifications
● Java, open source, GPLv3
● Since 1999
7. RRDTool
● Round robin database
● First released 1999
● Time-series storage
● File-based
● Constant-size
● Automatic, amortized aggregation
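
The constant-size, aggregate-on-write behaviour listed above can be illustrated with a toy model. This is a minimal sketch, not RRDTool's actual file format or API: the class name `RoundRobinArchive` and its `rows` and `steps` parameters are hypothetical, and averaging is assumed as the consolidation function.

```python
from collections import deque


class RoundRobinArchive:
    """Toy fixed-size archive: every `steps` raw samples are averaged
    into one row, and only the newest `rows` rows are retained."""

    def __init__(self, rows, steps):
        self.rows = deque(maxlen=rows)  # oldest row is dropped automatically
        self.steps = steps
        self._pending = []              # raw samples awaiting consolidation

    def update(self, value):
        self._pending.append(value)
        if len(self._pending) == self.steps:
            # Aggregation happens at write time, amortized across updates.
            self.rows.append(sum(self._pending) / self.steps)
            self._pending.clear()


archive = RoundRobinArchive(rows=3, steps=2)
for sample in [1, 3, 5, 7, 9, 11, 13, 15]:
    archive.update(sample)

result = list(archive.rows)
print(result)
```

Storage never grows past `rows` entries, but the raw samples are gone once consolidated, so any aggregation not chosen up front cannot be recovered later.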
8. Consider
● 2 IOPS per update (read-update-write)
● 1 RRD per data source (storeByGroup=false)
● 100,000s of data sources, 1,000s of IOPS
● 1,000,000s of data sources, 10,000s of IOPS
● 15,000 RPM SAS drive, ~175-200 IOPS
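
The arithmetic behind these figures can be sketched as follows. The slide does not state a collection interval, so the 300-second interval here is an assumption, and the function names are hypothetical. The 2 IOPS per update and the low end of the 175-200 IOPS per drive range come from the slide.

```python
import math


def required_iops(data_sources, interval_s=300, iops_per_update=2):
    """Sustained IOPS when each data source (one RRD file) is updated
    once per collection interval. interval_s=300 is an assumption."""
    return data_sources * iops_per_update / interval_s


def drives_needed(iops, iops_per_drive=175):
    """Spindles required at the low end of the quoted drive range."""
    return math.ceil(iops / iops_per_drive)


for sources in (300_000, 3_000_000):
    iops = required_iops(sources)
    print(f"{sources:>9,} data sources: {iops:>8,.0f} IOPS, "
          f"{drives_needed(iops)} drives")
```

Under these assumptions, 300,000 data sources need about 2,000 IOPS (12 drives) and 3,000,000 need about 20,000 IOPS (115 drives), which matches the orders of magnitude on the slide.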
9. Also
● Not everything is a graph
● Inflexible
● Incremental backups impractical
● ...
10. Observation #1
We collect and write a great deal; we read (graph) relatively little.
We are optimized for reading everything, always.
11. Observation #2
Samples are naturally collected and graphed together in groups.
Grouping samples that are accessed together is an easy optimization.
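
A rough illustration of the grouping idea, assuming samples arrive as `(resource, timestamp, metric, value)` tuples. The tuple layout, the metric names, and the `group_samples` helper are hypothetical and only meant to show why grouping reduces the number of storage operations.

```python
from collections import defaultdict

# Hypothetical samples: (resource, timestamp, metric, value)
samples = [
    ("router1/eth0", 1000, "ifInOctets", 10),
    ("router1/eth0", 1000, "ifOutOctets", 20),
    ("router1/eth0", 1300, "ifInOctets", 15),
    ("router1/eth0", 1300, "ifOutOctets", 25),
]


def group_samples(samples):
    """Collect metrics gathered together for one resource at one time
    into a single row, so they can be written and read as a unit."""
    rows = defaultdict(dict)
    for resource, timestamp, metric, value in samples:
        rows[(resource, timestamp)][metric] = value
    return dict(rows)


grouped = group_samples(samples)
print(f"{len(samples)} ungrouped writes vs {len(grouped)} grouped writes")
```

With one file per data source, every sample costs its own update. With grouping, metrics that are collected and graphed together share one write and one read.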
12. Project: Newts
Goals:
● Stand-alone time-series data store
● High-throughput
● Horizontally scalable
● Grouped metric storage/retrieval
● Late-aggregating
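
"Late-aggregating" is read here as storing raw samples and computing aggregates only when a query asks for them. The sketch below shows that idea in plain Python. It is not the Newts API: the `aggregate` function, its `resolution` parameter, and the use of averaging are assumptions for illustration.

```python
def aggregate(samples, resolution):
    """Average raw (timestamp, value) samples into buckets of
    `resolution` seconds, computed at read time."""
    buckets = {}
    for timestamp, value in samples:
        start = timestamp - timestamp % resolution  # bucket start time
        buckets.setdefault(start, []).append(value)
    return [(start, sum(values) / len(values))
            for start, values in sorted(buckets.items())]


# Raw samples are stored as collected; nothing is discarded at write time.
raw = [(0, 1.0), (60, 3.0), (300, 5.0), (360, 7.0)]

coarse = aggregate(raw, resolution=300)
fine = aggregate(raw, resolution=60)
print(coarse)
print(fine)
```

Because the raw samples are kept, the same data can be queried at any resolution or with a different aggregate function. The cost moves from the frequent write path to the infrequent read path, in line with Observation #1.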