This document provides an overview of graph databases and Neo4j. It defines what a graph is mathematically and in the context of databases. It describes the key components of Neo4j including nodes, relationships, properties, labels, paths, traversals, and indexes. It also discusses the Cypher query language, performance advantages of Neo4j over SQL databases, and basic requirements and licensing options.
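The components named above (nodes, relationships, properties, labels, and traversals) can be illustrated with a small in-memory sketch. This is plain Python, not Neo4j's API or storage model; the class names `Node`, `Relationship`, and `PropertyGraph`, the `traverse` method, and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set = field(default_factory=set)       # e.g. {"Person"}
    properties: dict = field(default_factory=dict)  # key/value pairs on the node


@dataclass
class Relationship:
    start: int      # id of the node the relationship leaves
    end: int        # id of the node the relationship enters
    rel_type: str   # e.g. "KNOWS"
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy directed property graph: a set of nodes plus typed relationships."""

    def __init__(self):
        self.nodes = {}     # node_id -> Node
        self.outgoing = {}  # node_id -> list of outgoing Relationship objects

    def add_node(self, node_id, labels=(), **properties):
        self.nodes[node_id] = Node(node_id, set(labels), dict(properties))
        self.outgoing.setdefault(node_id, [])

    def add_relationship(self, start, end, rel_type, **properties):
        self.outgoing[start].append(Relationship(start, end, rel_type, dict(properties)))

    def traverse(self, start, rel_type, max_depth):
        """Breadth-first traversal along outgoing relationships of one type."""
        seen = {start}
        queue = deque([(start, 0)])
        reached = []
        while queue:
            node_id, depth = queue.popleft()
            if depth == max_depth:
                continue  # do not expand beyond the depth limit
            for rel in self.outgoing[node_id]:
                if rel.rel_type == rel_type and rel.end not in seen:
                    seen.add(rel.end)
                    reached.append(rel.end)
                    queue.append((rel.end, depth + 1))
        return reached


graph = PropertyGraph()
graph.add_node(1, labels=["Person"], name="Alice")
graph.add_node(2, labels=["Person"], name="Bob")
graph.add_node(3, labels=["Person"], name="Carol")
graph.add_relationship(1, 2, "KNOWS", since=2010)
graph.add_relationship(2, 3, "KNOWS", since=2012)

# Everyone reachable from Alice within two KNOWS hops.
friends_of_friends = [
    graph.nodes[n].properties["name"] for n in graph.traverse(1, "KNOWS", max_depth=2)
]
print(friends_of_friends)
```

In this sketch a traversal only follows the relationships attached to the nodes it visits, which is the intuition usually given for why connected-data queries can be cheaper in a graph store than repeated joins.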
Combine Spring Data Neo4j and Spring Boot to quickl...
Neo4j
Speakers: Michael Hunger (Neo Technology) and Josh Long (Pivotal)
Spring Data Neo4j 3.0 is here and it supports Neo4j 2.0. Neo4j is a tiny graph database with a big punch. Graph databases are eminently suited to asking interesting questions and doing analysis. Want to load the Facebook friend graph? Build a recommendation engine? Neo4j's just the ticket. Join Spring Data Neo4j lead Michael Hunger (@mesirii) and Spring Developer Advocate Josh Long (@starbuxman) for a look at how to build smart, graph-driven applications with Spring Data Neo4j and Spring Boot.
Odessapy2013 - Graph databases and Python
Max Klymyshyn
Page 10: "Я из Одессы я просто бухаю." Translation: "I'm from Odessa, I just drink," meaning he drinks a lot of vodka ^_^ (@tuc @hackernews)
This is a local meme, used when someone asks a question and you would look stupid if you did not have an answer.
An investigation of how PostgreSQL and its latest capabilities (JSONB data type, GIN indices, Full Text Search) can be used to store, index and perform queries on structured Bibliographic Data such as MARC21/MARCXML, breaking the dependence on proprietary and arcane or obsolete software products.
Talk presented at FOSDEM 2016 in Brussels on 31/01/2016. This is a very practical & hands-on presentation with example code which is certainly not optimal ;)
The social graph of Facebook is the most popular application for a graph database. In addition, there are far more exciting applications, such as spatial data, financial trails, indexing, and others. If you combine different graphs, you can evaluate them together with the algorithms known from graph theory. A domain can often be modeled more easily and naturally as a graph. This talk introduces the topic of graph databases and shows how to implement mediated models with large, complex and highly connected data with Neo4j. Subsequently, topics like querying, indexing, and import/export are considered as well.
Php data structures – beyond spl (online version)
Mark Baker
Presentation on the Trie data structure, showing how it works, how it's used and what it can be used for; and an implementation of Tries in PHP... with occasional references to Rugby League
Example code to go with the slides can be found at https://github.com/MarkBaker/Tries
and
https://github.com/MarkBaker/QuadTrees
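For readers unfamiliar with the Trie mentioned in the abstract above, here is a minimal sketch of the idea. It is written in Python rather than PHP and is not taken from the linked repositories; the names `TrieNode`, `Trie`, `insert`, `contains`, and `keys_with_prefix`, as well as the sample keys, are assumptions made for illustration.

```python
class TrieNode:
    def __init__(self):
        self.children = {}        # next character -> child TrieNode
        self.is_terminal = False  # True if a stored key ends at this node
        self.value = None         # optional payload stored with the key


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, value=True):
        """Walk down one character at a time, creating nodes as needed."""
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.is_terminal = True
        node.value = value

    def _find(self, prefix):
        """Return the node reached by following prefix, or None if absent."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, key):
        node = self._find(key)
        return node is not None and node.is_terminal

    def keys_with_prefix(self, prefix):
        """Collect every stored key that starts with prefix, sorted."""
        start = self._find(prefix)
        if start is None:
            return []
        results = []
        stack = [(start, prefix)]
        while stack:
            node, key = stack.pop()
            if node.is_terminal:
                results.append(key)
            for ch, child in node.children.items():
                stack.append((child, key + ch))
        return sorted(results)


trie = Trie()
for team in ["wigan", "widnes", "warrington", "leeds"]:
    trie.insert(team)

print(trie.keys_with_prefix("wi"))
print(trie.contains("wig"))
```

Because keys that share a prefix share the same path from the root, prefix lookups such as autocomplete only visit the subtree below the prefix instead of scanning every key.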
Streaming machine learning is being integrated in Spark 2.1+, but you don’t need to wait. Holden Karau and Seth Hendrickson demonstrate how to do streaming machine learning using Spark’s new Structured Streaming and walk you through creating your own streaming model. By the end of this session, you’ll have a better understanding of Spark’s Structured Streaming API as well as how machine learning works in Spark.
Introducing Apache Spark's Data Frames and Dataset APIs workshop series
Holden Karau
This session of the workshop introduces Spark SQL along with DataFrames and Datasets. Datasets give us the ability to easily intermix relational and functional style programming. So that we can explore the new Dataset API, this iteration will focus on Scala.
Introduction to Spark Datasets - Functional and relational together at last
Holden Karau
Spark Datasets are an evolution of Spark DataFrames which allow us to work with both functional and relational transformations on big data with the speed of Spark.
Beyond Wordcount with spark datasets (and scalaing) - Nike PDX Jan 2018
Holden Karau
Apache Spark is one of the most popular big data systems, but once the shiny finish starts to wear off you can find yourself wondering if you've accidentally deployed a Ford Pinto into production. This talk will look at the challenges that come with scaling Spark jobs. Also, the talk will explore Spark's new(ish) Dataset/DataFrame API, as well as how it’s evolving in Spark 2.3 with improved Python support.
If you're already a Spark user, come to find out why it’s not all your fault. If you aren't already a Spark user, come to find out how to save yourself from some of the pitfalls once you move beyond the example code.
Check out Holden's newest book, High Performance Spark, for more information!
From https://niketechtalksjan2018.splashthat.com/
Beyond shuffling - Scala Days Berlin 2016
Holden Karau
This session will cover our & community experiences scaling Spark jobs to large datasets and the resulting best practices along with code snippets to illustrate.
The planned topics are:
Using Spark counters for performance investigation
Spark collects a large number of statistics about our code, but how often do we really look at them? We will cover how to investigate performance issues and figure out where to best spend our time using both counters and the UI.
Working with Key/Value Data
Replacing groupByKey for awesomeness
groupByKey makes it too easy to accidentally collect individual records which are too large to process. We will talk about how to replace it in different common cases with more memory-efficient operations.
Effective caching & checkpointing
Being able to reuse previously computed RDDs without recomputing can substantially reduce execution time. Choosing when to cache, checkpoint, or what storage level to use can have a huge performance impact.
Considerations for noisy clusters
Functional transformations with Spark Datasets
How to have some of the benefits of Spark’s DataFrames while still having the ability to work with arbitrary Scala code
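The groupByKey topic in the list above can be illustrated without a cluster. The sketch below is plain Python, not Spark code and not taken from the talk: it contrasts materializing every value for a key before aggregating (the pattern that makes a single grouped record grow without bound) with folding each value into a small running accumulator, which is the idea behind Spark operations such as `reduceByKey` and `aggregateByKey`. The function names and sample records are hypothetical.

```python
from collections import defaultdict


def average_by_grouping(records):
    """Group first, aggregate later: every value for a key is held in memory."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)  # the per-key list grows with the data
    return {key: sum(values) / len(values) for key, values in groups.items()}


def average_by_combining(records):
    """Fold each value into a fixed-size (sum, count) accumulator per key."""
    totals = {}
    for key, value in records:
        running_sum, running_count = totals.get(key, (0.0, 0))
        totals[key] = (running_sum + value, running_count + 1)
    return {key: s / c for key, (s, c) in totals.items()}


records = [("a", 1.0), ("b", 4.0), ("a", 3.0), ("b", 6.0), ("a", 5.0)]

print(average_by_grouping(records))
print(average_by_combining(records))
```

Both functions return the same averages; the difference is that the second keeps only two numbers per key regardless of how many values arrive, so a heavily skewed key does not produce one oversized record.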
Talk given at ClojureD conference, Berlin
Apache Spark is an engine for efficiently processing large amounts of data. We show how to apply the elegance of Clojure to Spark - fully exploiting the REPL and dynamic typing. There will be live coding using our gorillalabs/sparkling API.
In the presentation, we will of course introduce the core concepts of Spark, like resilient distributed data sets (RDD). And you will learn how the Spark concepts resemble those well-known from Clojure, like persistent data structures and functional programming.
Finally, we will provide some Do’s and Don’ts for you to kick off your Spark program based upon our experience.
About Paulus Esterhazy and Christian Betz
Being a LISP hacker for several years, and a Java-guy for some more, Chris turned to Clojure for production code in 2011. He’s been Project Lead, Software Architect, and VP Tech in the meantime, interested in AI and data-visualization.
Now, working on the heart of data driven marketing for Performance Media in Hamburg, he turned to Apache Spark for some Big Data jobs. Chris released the API-wrapper ‘chrisbetz/sparkling’ to fully exploit the power of his compute cluster.
Paulus Esterhazy
Paulus is a philosophy PhD turned software engineer with an interest in functional programming and a penchant for hammock-driven development.
He currently works as Senior Web Developer at Red Pineapple Media in Berlin.
A super fast introduction to Spark and glance at BEAM
Holden Karau
Apache Spark is one of the most popular general purpose distributed systems, with built in libraries to support everything from ML to SQL. Spark has APIs across languages including Scala, Java, Python, and R -- with more 3rd party language support (like Julia & C#). Apache BEAM is a cross-platform tool for building on top of different distributed systems, but it’s in its early stages. This talk will introduce the core concepts of Apache Spark, and look to the potential future of Apache BEAM.
Apache Spark has two core abstractions for representing distributed data and computations. This talk will introduce the basics of RDDs and Spark DataFrames & Datasets, and Spark’s method for achieving resiliency. Since it’s a big data talk, we will include the almost required wordcount example, and end the Spark part with follow up pointers on Spark’s new ML APIs. For folks who are interested we’ll then talk a bit about portability, and how Apache BEAM aims to improve portability (as well as its unique approach to cross-language support).
Slides from Holden's talk at https://www.meetup.com/Wellington-Data-Scaling-Chats/events/mdcsdpyxcbxb/
Streaming ML on Spark: Deprecated, experimental and internal APIs galore!
Holden Karau
Slides from: https://www.meetup.com/Sydney-Apache-Spark-User-Group/events/246892684/
Welcome to the first Sydney Spark Meetup in 2018!
We are very glad to have a visiting Apache Spark committer, Holden Karau, to give a talk on streaming machine learning. Title: Streaming ML w/Spark (and why it's a bit painful today & #workingonit)
Apache Spark is one of the most popular distributed systems, and it has built in libraries for both machine learning and streaming. This talk will cover Spark's two streaming libraries, look at the future, and how to make streaming ML work today (for both serving and prediction). If you aren't familiar with Spark, that's ok! We'll spend the first ~5 minutes covering just enough to get through the rest of the talk, and for those of you already familiar you can spend those ~5 minutes downloading the sample code :)
About Holden:
Holden is a transgender Canadian open source developer advocate @ Google with a focus on Apache Spark, BEAM, and related "big data" tools. She is the co-author of Learning Spark, High Performance Spark, and another Spark book that's a bit more out of date. She is a committer on the Apache Spark, SystemML, and Mahout projects. She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal.
A couple of us will be at the doors of 60 Margaret St to let people in until 6.10pm.
Beyond Shuffling - Effective Tips and Tricks for Scaling Spark (Vancouver Sp...
Holden Karau
Beyond Shuffling - Tips & Tricks for scaling your Apache Spark programs. This talk walks through a number of common mistakes which can keep our Spark programs from scaling and examines the solutions, as well as general techniques useful for moving beyond a proof of concept to production. It covers topics like effective RDD re-use, considerations for working with key/value data, and finishes up with a preview of some of the work being done to add code generation to Spark ML.
Java Performance Tips (SoCal Code Camp San Diego 2014)
Kai Chan
Slides for my presentation at SoCal Code Camp, June 29, 2014 (http://www.socalcodecamp.com/socalcodecamp/session.aspx?sid=68942cd0-6714-4753-a218-20d4b48da07d)
Search Engine-Building with Lucene and Solr
Kai Chan
These are the slides for the session I presented at SoCal Code Camp San Diego on July 27, 2013.
http://www.socalcodecamp.com/socalcodecamp/session.aspx?sid=6b28337d-6eae-4003-a664-5ed719f43533
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
Kai Chan
These are the slides for the session I presented at SoCal Code Camp Los Angeles on November 10, 2013.
http://www.socalcodecamp.com/socalcodecamp/session.aspx?sid=8cdfd955-2cd4-44a2-ad08-5353e079685a
During this presentation, Will covers the updates made in the Neo4j 3.0 release. He introduces Bolt (Neo4j's new binary protocol), and shows how developers can start using the Neo4j official drivers, build a stored procedure and take advantage of advanced support for cloud, container and on-premise.
This introduction to graph databases is specifically designed for Enterprise Architects who need to map business requirements to architectural components like graph databases. It explains how and why graphs matter for Enterprise Architecture and reviews the architectural differences between relational and graph models.
These webinar slides are an introduction to Neo4j and Graph Databases. They discuss the primary use cases for Graph Databases and the properties of Neo4j which make those use cases possible. They also cover the high-level steps of modeling, importing, and querying your data using Cypher and touch on RDBMS to Graph.
Intro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin
Caleb Jones
A quick overview of the history, motivation, and uses of graph modeling and graph databases in various industries. Covers a brief introduction to graph databases with an emphasis on the Tinkerpop stack and Gremlin query language. These concepts are then solidified through a hands-on lab modeling a blog engine using Titan and Gremlin.
See more at http://allthingsgraphed.com.
Working With a Real-World Dataset in Neo4j: Import and Modeling
Neo4j
This webinar will cover how to work with a real world dataset in Neo4j, with a focus on how to build a graph from an existing dataset (in this case a series of JSON files). We will explore how to performantly import the data into Neo4j - both in the case of an initial import and scaling writes for your graph application. We will demonstrate different approaches for data import (neo4j-import, LOAD CSV, and using the official Neo4j drivers), and discuss when it makes sense to use each import technique. If you've ever asked these questions, then this webinar is for you!
- How do I design a property graph model for my domain?
- How do I use the official Neo4j drivers?
- How can I deal with concurrent writes to Neo4j?
- How can I import JSON into Neo4j?
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Databricks
Graph data and graph analytics are increasingly important in data science and engineering. Cypher is an open language used for querying and updating graph databases and analytics platforms, which is now available in the Apache Spark environment. Neo4j Morpheus leverages the open source graph language project to integrate data from Neo4j operational graph databases with Hive and JDBC SQL data sources, using new Cypher features like the Property Graph Catalog, named graphs, graph projection, parameterized graph view functions, and graph/table views. Input and output graphs can be loaded and stored as structured collections of DataFrames with strong graph schemas to ensure data consistency and graph query optimization. Property graphs can also be analyzed and transformed using graph algorithms such as those in the GraphFrames project. Besides describing and demonstrating these capabilities, this talk also discusses the Spark Project Improvement Proposal to bring Cypher into Spark 3.0, and outlines current work to unify Cypher with other graph query languages to form a new ISO standard Graph Query Language.
Speakers: Alastair Green, Martin Junghanns
NoSQL no more: SQL on Druid with Apache Calcite
gianmerlino
Druid is an analytics-focused, distributed, scale-out data store. Existing Druid clusters have scaled to petabytes of data and trillions of events, ingesting millions of events every second. Up until version 0.10, Druid could only be queried in a JSON-based language that many users found unfamiliar.
Enter Apache Calcite. It includes an industry-standard SQL parser, validator, and JDBC driver, as well as a cost-based relational optimizer. Calcite bills itself as “the foundation for your next high-performance database” and is used by Hive, Drill, and a variety of other projects. Druid uses Calcite to power Druid SQL, a standards-based query API that vaults Druid out of the NoSQL world and into the SQL world.
Gian Merlino offers an overview of Druid SQL and explains how Druid and Calcite are integrated and why you should stop worrying and learn to love relational algebra in your own projects.
Getting started with Graph Databases & Neo4j
Suroor Wijdan
The presentation gives brief information about graph databases and their usage today. It then covers the popular graph database Neo4j and its Cypher query language, which is used to query the graph.
New Features in Neo4j 3.4 / 3.3 - Graph Algorithms, Spatial, Date-Time & Visu...
jexp
Highlighting the progress in Neo4j 3.3 and 3.4, especially Neo4j Desktop, Graph Algorithms, NLP, Date-Time, Geospatial, and performance. Also featuring the new visualization tool Neo4j Bloom.
Anatomy of Data Frame API : A deep dive into Spark Data Frame API
datamantra
In this presentation, we discuss the internals of the Spark DataFrame API. All the code discussed in this presentation is available at https://github.com/phatak-dev/anatomy_of_spark_dataframe_api
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Securing your Kubernetes cluster: a step-by-step guide to success!
KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Key Trends Shaping the Future of Infrastructure
Cheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
5. What is a Graph in math
● represents a set of connected objects
● graph:
○ vertex (node/point)
○ edge (arc/line/relationship/arrow) - undirected
○ attribute (property) - on node/relationship
● types:
○ undirected graph (a pair): G = (V, E)
○ digraph (directed graph): D = (V, A)
○ mixed graph: G = (V, E, A)
V = {1, 2, 3, 4, 5, 6}
E = {{1, 2}, {1, 5}, {2, 3}, {2, 5}, {3, 4}, {4, 5}, {4, 6}}
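As a minimal sketch (Python is chosen here only for illustration; the helper name `adjacency` is hypothetical), the sets V and E above can be turned into an adjacency map, which is the shape most graph code actually works with:

```python
# Vertex and edge sets copied from the slide; each edge is an unordered pair.
V = {1, 2, 3, 4, 5, 6}
E = [{1, 2}, {1, 5}, {2, 3}, {2, 5}, {3, 4}, {4, 5}, {4, 6}]


def adjacency(vertices, edges):
    """Build {vertex: set of neighbours} for an undirected graph."""
    adj = {v: set() for v in vertices}
    for edge in edges:
        a, b = tuple(edge)
        adj[a].add(b)
        adj[b].add(a)
    return adj


adj = adjacency(V, E)
# Degree of a vertex = number of edges touching it.
degree = {v: len(neighbours) for v, neighbours in adj.items()}
print(adj[2])     # neighbours of vertex 2
print(degree[4])  # number of edges touching vertex 4
```

Every undirected edge contributes to the degree of both endpoints, so the degrees always sum to twice the number of edges.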
6. What is a Graph database
● stores data in a graph and can retrieve vast networks of data
● shines when storing richly-connected data
● consists of nodes, connected by relationships
○ A Graph —records data in→ Nodes —which have→ Properties
○ Nodes —are organized by→ Rels —which also have→ Properties
○ Nodes —are grouped by→ Labels —into→ Sets
○ A Traversal —navigates→ a Graph; it —identifies→ Paths —which order→ Nodes
○ An Index —maps from→ Properties —to either→ Nodes or Rels
○ A Graph Database —manages a→ Graph and —also manages related→ Indexes
8. Graph Traversal
A Traversal —navigates→ a Graph; it —identifies→ Paths —which order→ Nodes
Example questions:
● what music do my friends like that I don’t yet own?
● if this power supply goes down, what web services are affected?
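The first question can be sketched as a two-hop traversal. This is a toy illustration in plain Python, not Neo4j code; the people, albums and the relationship names FRIEND, LIKES and OWNS are all assumptions made up for the example:

```python
# Hypothetical toy data: one dict per relationship type.
friends = {"me": {"ann", "bob"}, "ann": {"me"}, "bob": {"me"}}
likes = {
    "ann": {"Kind of Blue", "Abbey Road"},
    "bob": {"Abbey Road", "Nevermind"},
}
owns = {"me": {"Abbey Road"}}


def recommend(person):
    """Follow FRIEND, then LIKES, and drop whatever `person` already OWNS."""
    liked_by_friends = set()
    for friend in friends.get(person, ()):
        liked_by_friends |= likes.get(friend, set())
    return liked_by_friends - owns.get(person, set())


print(sorted(recommend("me")))
```

Only the neighbourhood of the starting node is touched, which is why this kind of query stays cheap as the rest of the graph grows.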
9. Graph Index
An Index —maps from→ Properties —to either→ Nodes or Rels
Example: find the Account for username master-of-graphs
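Conceptually, such an index is a lookup table from a property value to the matching nodes. A rough sketch in plain Python follows; the node ids, the second account and the helper `build_index` are hypothetical, and real Neo4j indexes are backed by Lucene rather than a dict:

```python
from collections import defaultdict

# Hypothetical nodes keyed by id; labels and properties are illustrative only.
nodes = {
    1: {"label": "Account", "username": "master-of-graphs"},
    2: {"label": "Account", "username": "graph-novice"},
}


def build_index(nodes, key):
    """Map each value of property `key` to the ids of the nodes that carry it."""
    index = defaultdict(set)
    for node_id, props in nodes.items():
        if key in props:
            index[props[key]].add(node_id)
    return index


username_index = build_index(nodes, "username")
account_ids = username_index["master-of-graphs"]
print(account_ids)
```

The lookup gives a starting node directly, so a traversal can begin there instead of scanning every node.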
13. A Graph Database elaborates a Key-Value Store
K* = key
V* = value
14. A Graph Database relates Column-Family
● BigTable databases are an evolution of key-value stores, using "families" to allow grouping of rows
● stored in a graph, the families could become hierarchical, and the relationships among data become explicit
15. A Graph Database navigates a Document Store
D=Document,
S=Subdocument,
V=Value,
D2/S2 = reference
18. Neo4j features
● intuitive, using a graph model for data representation
● reliable, fully transactional, upholds ACID
● durable and fast, using a custom disk-based, native storage engine
● massively scalable, up to several billion nodes/relationships/properties
● highly available, when distributed across multiple machines
● expressive, with a powerful, human-readable declarative graph query language
● fast, with a powerful traversal framework for high-speed graph queries
● embeddable, with a few small jars
● simple, accessible through a convenient REST API or an object-oriented Java API
● indexes are based on Apache Lucene; supports secondary indexes
● has been in commercial development for 10 years and in production for over 7 years (since 2003)
● cross-platform; simple set-up; well documented; open source
● GPL for Community, AGPL for Enterprise
19. Neo4j requirements
● CPU - Intel Core i3/i7
● Memory - 2GB .. 16/32GB
● Disk - 10GB SATA .. SSD w/ SATA
● Filesystem - ext4 .. ext4/ZFS
● Software - Oracle Java 7
20. Neo4j license
● Neo4j Community
○ Open-Source High Performance
○ fully ACID transactional graph database
● Neo4j Enterprise
○ High-Performance Cache (up to 10x faster)
○ Horizontal scalability with Neo4j Clustering (predictable scalability)
○ High-availability and online backups
○ Cache based sharding (shard your graph in memory)
○ Advanced Monitoring (operational metrics)
○ Certified for Windows and Linux
○ Email/Phone Support (10x5, 24x7 hours)
○ Subscriptions
■ Personal (up to 3 devs, $100k annual revenue) = FREE
■ Startups (<$10M funding, <$5M annual revenue) = $12k
■ Business (medium, to Global 2000) = Contact Sales
21. Neo4j vs. MySQL
● for the simple friends of friends query, Neo4j is 60% faster than MySQL
● for friends of friends of friends, Neo4j is 180 times faster
● for the depth four query, Neo4j is 1,135 times faster
● MySQL just chokes on the depth five query
22. Neo4j: Nodes
● fundamental units that form a graph
● can have key/value-style properties
● can be indexed, like relationships, by {key, value} pairs
● represent entities
23. Neo4j: Relationships #1/2
● connect entities and structure the domain
● allow for finding related data
● are always directed (outgoing or incoming)
● are equally well traversed in either direction
● a node can have a relationship to itself
● have a relationship type (label)
25. Neo4j: Properties
● nodes and relationships can have properties
● properties are key-value pairs
○ key is a string
○ values can be either a primitive or an array of one primitive type
■ boolean, String, int, int[], etc.
■ types follow the Java Language Specification
● used for entity attributes, relationship qualities, and metadata
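The property rules above (string key; value is a primitive or an array of a single primitive type) can be expressed as a small check. This is a sketch in plain Python under stated assumptions: Python types stand in for the Java primitives, the function name is hypothetical, and empty lists are rejected only because their element type cannot be inferred here:

```python
# Python stand-ins for the Java primitive types and String.
PRIMITIVES = (bool, int, float, str)


def is_valid_property(key, value):
    """Return True if (key, value) follows the slide's property rules."""
    if not isinstance(key, str):
        return False
    if isinstance(value, PRIMITIVES):
        return True
    if isinstance(value, list) and value:
        first_type = type(value[0])
        # An array must hold elements of exactly one primitive type.
        return first_type in PRIMITIVES and all(
            type(item) is first_type for item in value
        )
    return False


print(is_valid_property("age", 30))
print(is_valid_property("mixed", [1, "a"]))
```

Nested structures such as maps are rejected, which matches the idea that richer structure belongs in additional nodes and relationships rather than inside a property.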
26. Neo4j: Labels
● used to group nodes into sets
● any number of labels, including none
● can be added and removed during runtime
● can be used to mark temporary states for nodes
● label names are case-sensitive
● CamelCase (convention)
27. Neo4j: Paths
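● a path is one or more nodes with connecting relationships
● shortest possible path: a single node (length zero)
● a path of length one: two nodes joined by one relationship
● another path of length one: a node with a relationship to itself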
28. Neo4j: Traversal
● Traversal Framework available out of the box
● means visiting nodes, following relationships by rules
● in most cases only a subgraph is visited
● callback based traversal API
○ you can specify the traversal rules
● traversing breadth- or depth-first
● open Java API
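The idea of a callback-driven traversal can be sketched in a few lines of plain Python. This is not the Neo4j Traversal Framework API; the graph, the function `traverse` and the rule `within_two_hops` are hypothetical names used only to show how a rule callback limits the visit to a subgraph and how the same loop runs breadth- or depth-first:

```python
from collections import deque

# Hypothetical directed graph as {node: [outgoing neighbours]}.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}


def traverse(graph, start, should_expand, breadth_first=True):
    """Visit nodes from `start`; `should_expand(node, depth)` is the rule callback."""
    frontier = deque([(start, 0)])
    visited = set()
    order = []
    while frontier:
        # A queue gives breadth-first order, a stack gives depth-first order.
        node, depth = frontier.popleft() if breadth_first else frontier.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        if should_expand(node, depth):
            for neighbour in graph[node]:
                if neighbour not in visited:
                    frontier.append((neighbour, depth + 1))
    return order


def within_two_hops(node, depth):
    """Rule: do not follow relationships beyond depth 2."""
    return depth < 2


bfs_order = traverse(graph, "a", within_two_hops)
dfs_order = traverse(graph, "a", within_two_hops, breadth_first=False)
print(bfs_order, dfs_order)
```

Node "e" sits three hops away, so neither run reaches it: only the subgraph allowed by the rule is visited.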
29. Neo4j: graph algorithms
● A*: uses the A* algorithm to find the cheapest path between two nodes
● Dijkstra (dijkstra): uses the Dijkstra algorithm to find the cheapest path between two nodes
● PathWithLength: finds all paths of a certain length (depth) between two nodes
● Shortest paths (shortestPath, the default): finds all the shortest paths between two nodes
● All simple paths (allSimplePaths): finds all simple paths (without loops) between two nodes
● All paths (allPaths): finds all available paths between two nodes
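To make "cheapest path" concrete, here is a textbook Dijkstra implementation in plain Python. It is a generic sketch, not Neo4j's implementation; the weighted graph `roads` and its costs are invented for the example:

```python
import heapq


def dijkstra(graph, source, target):
    """Return (cost, path) of the cheapest path, or (inf, []) if unreachable."""
    queue = [(0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in settled:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return float("inf"), []


# Hypothetical weighted, directed graph: {node: {neighbour: cost}}.
roads = {"A": {"B": 1, "C": 4}, "B": {"C": 2, "D": 5}, "C": {"D": 1}, "D": {}}
cost, path = dijkstra(roads, "A", "D")
print(cost, path)
```

The direct-looking routes A-C-D (cost 5) and A-B-D (cost 6) both lose to A-B-C-D (cost 4), which is why a cheapest-path search differs from a fewest-hops search.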
31. Neo4j: Index
● introduced in Neo4j 2.0
● eventually available (populated in the background, so not immediately available for querying)
○ comes online once fully populated
○ failed status (drop and recreate the index)
● can be created on a label (a group of nodes)
● indexes Nodes & Rels
● auto-indexing settings: node_auto_indexing=false, node_keys_indexable
32. Neo4j: Constraints
● can help you keep your data clean
● specify the rules for what your data should look like
● unique constraints are the only available constraint type
33. Neo4j: Data Size
● single server instance
○ nodes = 2^35 (~34 billion)
○ relationships = 2^35 (~34 billion)
○ labels = 2^31 (~2 billion)
○ properties = 2^36 to 2^38 depending on property types (maximum ~274 billion, always at least ~68 billion)
○ relationship types = 2^15 (~32,000)
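The rounded figures on this slide follow directly from the powers of two. A quick check in Python (the dictionary keys are just labels for this example):

```python
# Exponents taken from the slide; values are exact integer powers of two.
limits = {
    "nodes": 2 ** 35,
    "relationships": 2 ** 35,
    "labels": 2 ** 31,
    "properties (min)": 2 ** 36,
    "properties (max)": 2 ** 38,
    "relationship types": 2 ** 15,
}

for name, value in limits.items():
    # Print with thousands separators to compare against the rounded figures.
    print(f"{name}: {value:,}")
```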
34. Cypher Query Language
“Makes the simple things easy, and the complex things possible”
● powerful graph query language
● relatively simple
● declarative grammar (say what you want, not how)
● humane query language
● self-explanatory (based on English prose and neat iconography)
● written in Scala
● pattern-matching (borrows expression approaches from SPARQL)
● aggregation, ordering, limits
● create, update, delete
● structure and most keywords inspired by SQL
● changing rather rapidly (CYPHER 1.9 START ...)
37. Cypher: START / RETURN
“It all starts with the START”
Michael Hunger, Cypher webinar, Sep 2012
● designates the start points
● START is optional (in Neo4j >= 2.0)
Examples:
● START <lookup> RETURN <expression>
● START n=node(0) RETURN n
● START n=node(*) RETURN n.name
38. Cypher: MATCH
● primary way of getting data from the database
● START <lookup> MATCH <pattern> RETURN <expr>
● OPTIONAL MATCH <pattern> RETURN <expr>
Examples:
● MATCH (n) RETURN count(n)
● MATCH (actor:Actor) RETURN actor.name;
● START me=node(0) MATCH (me)--(f) RETURN f.name
● MATCH (n)-[r]->(m) RETURN n AS FROM, r AS `->`, m AS TO
40. Cypher: WHERE
● filters the results
● MATCH <pattern> WHERE <condition> RETURN <expr>
Examples:
● WHERE n.name =~ "(?i)John.*"
● WHERE NOT ..
● WHERE type(rel) =~ "Perso.*"
41. Cypher: RETURN
● creates the result table
● any query can return data
● can be nodes, relationships, or properties on these
● RETURN DISTINCT <expression> AS x
● RETURN aggregate(expr) as alias
● RETURN nodes, rels, properties
● RETURN expressions of funcs and operators
● RETURN aggregation funcs on the above
42. Cypher: etc
● CASE / WHEN / ELSE
● ORDER BY node.key, node2.key, .. ASC|DESC
● LIMIT / SKIP
● WITH (WITH count(*) as c)
● UNION / UNION ALL (combining results from multiple queries)
● USING INDEX/SCAN
● MERGE / SET / DELETE / REMOVE / FOREACH
● Expressions
● Operators
● Comments
● Functions: ALL, ANY, LENGTH, {Math}, {String}, ...
43. Cypher: Transactions
● any updating query will run in a transaction
● ACID
● “it is very important to finish each transaction”
● write lock on node/rel:
○ adding, changing or removing a property on a node/rel
● write lock on node:
○ creating or deleting a node
● write lock on rel and both its nodes:
○ creating or deleting a relationship
45. Cypher: back to SQL #1/5
● SELECT *
  FROM Person
  WHERE name = 'Valentin' AND age > 30
● START person=node:Person(name="Valentin")
  WHERE person.age > 30
  RETURN person
46. Cypher: back to SQL #2/5
● SELECT "Email".*
  FROM "Person"
  JOIN "Email" ON "Person".id = "Email".person_id
  WHERE "Person".name = 'Benedikt'
● START person=node:Person(name="Benedikt")
  MATCH person-[:email]->email
  RETURN email
47. Cypher: back to SQL #3/5
● show me all people that are both actors and directors
● SELECT name FROM Person
  WHERE
  person_id IN (SELECT person_id FROM Actor) AND
  person_id IN (SELECT person_id FROM Director)
● START person=node:Person("name:*")
  WHERE (person)-[:ACTS_IN]->()
  AND (person)-[:DIRECTED]->()
  RETURN person.name
48. Cypher: back to SQL #4/5
● show me all Tom Hanks’s co-actors
● SELECT DISTINCT co_actor.name FROM Person tom
  JOIN Actor a1 ON tom.person_id = a1.person_id
  JOIN Actor a2 ON a1.movie_id = a2.movie_id
  JOIN Person co_actor ON co_actor.person_id = a2.person_id
  WHERE tom.name = 'Tom Hanks'
● START tom=node:Person(name="Tom Hanks")
  MATCH tom-[:ACTS_IN]->movie,
  co_actor-[:ACTS_IN]->movie
  RETURN DISTINCT co_actor.name
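The co-actor pattern (two ACTS_IN relationships meeting at the same movie) can be mimicked over a plain edge list. This is a Python sketch of the logic only, not of how Neo4j executes the query; the small cast list and the function `co_actors` are illustrative assumptions:

```python
# Illustrative ACTS_IN edges as (actor, movie) pairs; a tiny sample, not a dataset.
acts_in = [
    ("Tom Hanks", "Apollo 13"),
    ("Kevin Bacon", "Apollo 13"),
    ("Tom Hanks", "Sleepless in Seattle"),
    ("Meg Ryan", "Sleepless in Seattle"),
    ("Kevin Bacon", "Footloose"),
    ("Lori Singer", "Footloose"),
]


def co_actors(acts_in, name):
    """Actors sharing at least one movie with `name`, without duplicates."""
    # First hop: name -[:ACTS_IN]-> movie
    movies = {movie for actor, movie in acts_in if actor == name}
    # Second hop: co_actor -[:ACTS_IN]-> the same movie
    return sorted(
        {actor for actor, movie in acts_in if movie in movies and actor != name}
    )


print(co_actors(acts_in, "Tom Hanks"))
```

Building a set plays the role of DISTINCT: an actor who shares several movies with the starting person still appears once.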
49. Cypher: back to SQL #5/5
● show me all Lucy’s favorite directors
● SELECT dir.name, count(*) FROM Person lucy
  JOIN Actor ON lucy.person_id = Actor.person_id
  JOIN Director ON Actor.movie_id = Director.movie_id
  JOIN Person dir ON Director.person_id = dir.person_id
  WHERE lucy.name = 'Lucy Liu'
  GROUP BY dir.name
  ORDER BY count(*) DESC
● START lucy=node:Person(name="Lucy Liu")
  MATCH lucy-[:ACTS_IN]->movie,
  director-[:DIRECTED]->movie
  RETURN director.name, count(*)
  ORDER BY count(*) DESC
50. Cypher: back to SQL #6/5
START
  lucy = node:Person(name="Lucy Liu"),
  kevin = node:Person(name="Kevin Bacon")
MATCH
  p = shortestPath( lucy-[:ACTS_IN*]-kevin )
RETURN
  EXTRACT (n in NODES(p):
  COALESCE(n.name?, n.title?))
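A shortest path over unweighted ACTS_IN relationships, ignoring direction as the query above does, amounts to a breadth-first search. The following Python sketch shows that idea on made-up data; the movie titles "Movie 1" to "Movie 3", the actors "Actor A" and "Actor B" and the function name are hypothetical placeholders, not real filmography:

```python
from collections import deque

# Hypothetical ACTS_IN edges (actor, movie); placeholders, not real credits.
acts_in = [
    ("Lucy Liu", "Movie 1"),
    ("Actor A", "Movie 1"),
    ("Actor A", "Movie 2"),
    ("Kevin Bacon", "Movie 2"),
    ("Actor B", "Movie 1"),
    ("Actor B", "Movie 3"),
]


def shortest_path(edges, start, goal):
    """Breadth-first search over undirected edges; returns a node list or None."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(shortest_path(acts_in, "Lucy Liu", "Kevin Bacon"))
```

The returned list alternates actors and movies, which is why the Cypher query falls back from `name` to `title` when extracting a label for each node on the path.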
52. Neo4j: Security
● does not deal with data encryption explicitly
● all security means built into Java can be used
● an encrypted datastore can be used
● webadmin over HTTPS
53. SPARQL
● manipulates data stored in RDF format
● focused on matching sets of triples
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?email
WHERE {
?person a foaf:Person.
?person foaf:name ?name.
?person foaf:mbox ?email.
}
54. Gremlin
● graph traversal language
● scripting language
● Pipe & Filter (similar to jQuery)
● across different graph databases
● based on Groovy (limited to Java)
● not as stable in Neo4j
● XPath like
● ./outE[label="family"]/inV/@name
● g.v(1).out('likes').in('likes').out('likes').groupCount(m)
● g.V.as('x').out.groupCount(m).loop('x'){c++ < 1000}
● g.v(1).in('LOVE_OF').out('SOME_IN').has('title','abc').back(2)
55. Neo4j and PHP
● everyman/neo4jphp < packagist.org
○ PHP wrapper for Neo4j using the REST interface
○ Follows the PSR-0 autoloading standard
○ Basic wrappers for all components
○ Last update - a month ago
○ supports Gremlin
● Neo4j-PHP OGM < a lot of based on
○ Object Graph Mapper, inspired by Doctrine
○ based on DoctrineCommon
○ borrows significantly from the DoctrineORM design
○ uses annotations on classes
○ MIT Licence
● Neo4J PHP REST API client
○ Using Neo4j REST API
○ Node create/find/delete
○ Relationship create/list/filter
56. High Availability with Neo4j
● in HA mode: a single master and zero or more slaves
● slaves synchronize with the master to preserve consistency
● the master writes to a slave before the transaction completes
57. Demo
Neo4j.org Example Datasets:
● DrWho (nodes=1'060; rels=2'286)
● Cineasts Movies & Actors (nodes=64'069; rels=121'778)
● Hubway Data Challenge (nodes=554'674; rels=2'011'904)
GraphGist:
● JIRA and neo4j
● PHP and neo4j
● Kant in neo4j
XSS
65. Analogues
● GrapheneDB - based on Neo4j
● AllegroGraph - closed source, commercial, RDF quad store
○ graph database built around the W3C spec for the Resource Description Framework
○ supports SPARQL, RDFS++, and Prolog
● Sones - closed source, .NET focused
● Virtuoso - closed source, RDF focused
● GraphDB - graph database built in .NET by the German company sones
● InfiniteGraph - goal is to create a graph database with "virtually unlimited scalability"
● FlockDB
67. Conclusion
● best used for graph-style, rich or complex, structured dense data; deep, possibly cyclical graphs of unlimited depth; weighted connections; interconnected data
● new functionality can be added quickly without impacting existing deployments
● schema-less, which forces you to rethink your entire approach to data
● not a silver bullet for all problems