This document provides an overview of using Neo4j and graph databases for bootstrapping recommendations in real-time. It discusses common recommendation approaches like popularity, content-based, collaborative filtering and hybrid recommendations. It also covers challenges of real-time recommendations like processing relationships and accommodating new data continuously. Additionally, it demonstrates sample Cypher queries for calculating similarity between users and providing movie recommendations based on a user's nearest neighbors.
2. About Me
• Max De Marzi - Neo4j Field Engineer
• My Blog: http://maxdemarzi.com
• Find me on Twitter: @maxdemarzi
• Email me: maxdemarzi@gmail.com
• GitHub: http://github.com/maxdemarzi
3. Big Data - What is it good for?
• Absolutely Nothing!
• Benchmarks: Is this performing better than that? Yes, why? Uh.
• Recommendations: You should buy this right now.
• Predictions: You will probably buy this.
8. Collaborative Filtering Recommendations
• Step 1: Collect User Behavior
• Step 2: Find similar Users
• Step 3: Recommend Behavior taken by similar users
• Example: People with similar musical tastes
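The three steps above can be sketched in a few lines of Python. This is only an illustration, not the method used later in the deck: the listening data is hypothetical, and Jaccard overlap stands in as the similarity measure for Step 2.

```python
from collections import Counter

# Step 1: collected behavior (hypothetical listening data).
listens = {
    "ann": {"jazz", "blues", "soul"},
    "bob": {"jazz", "blues", "funk"},
    "cat": {"metal", "punk"},
}


def jaccard(a, b):
    """Overlap of two sets: |a & b| / |a | b| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, behavior, k=1):
    """Suggest items taken by the k users most similar to `user`."""
    mine = behavior[user]
    # Step 2: rank the other users by similarity to `user`.
    neighbors = sorted(
        (u for u in behavior if u != user),
        key=lambda u: jaccard(mine, behavior[u]),
        reverse=True,
    )[:k]
    # Step 3: recommend what the neighbors did that `user` has not done.
    counts = Counter(item for u in neighbors for item in behavior[u] - mine)
    return [item for item, _ in counts.most_common()]
```

With this data, `recommend("ann", listens)` picks bob as the nearest neighbor and suggests the one genre he listens to that ann does not.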
10. Using Relationships for Recommendations
Content-based filtering: Recommend items based on what users have liked in the past
Collaborative filtering: Predict what users like based on the similarity of their behaviors, activities and preferences to others
Diagram: (Person)-[RATED {rating: 7}]->(Movie) and (Person)-[SIMILARITY {value: .92}]-(Person)
12. Benefits of Real-Time Recommendations
Online Retail
• Suggest related products and services
• Increase revenue and engagement
Media and Broadcasting
• Create an engaging experience
• Produce personalized content and offers
Logistics
• Recommend optimal routes
• Increase network efficiency
13. Challenges for Real-Time Recommendations
Make effective real-time recommendations
• Timing is everything in point-of-touch applications
• Base recommendations on current data, not last night’s batch load
Process large amounts of data and relationships for context
• Relevance is king: Make the right connections
• Drive traffic: Get users to do more with your application
Accommodate new data and relationships continuously
• Systems get richer with new data and relationships
• Recommendations become more relevant
14. Relational vs. Graph Models
Relational Model: tables labelled Person MovieRatings, example row MAX
Graph Model: MAX -[RATED]-> Terminator, Toy Story, Titanic
16. Express Complex Queries Easily with Cypher
Find all direct reports and how many people they manage, up to 3 levels down
Cypher Query:
    MATCH (boss)-[:MANAGES*0..3]->(sub),
          (sub)-[:MANAGES*1..3]->(report)
    WHERE boss.name = "John Doe"
    RETURN sub.name AS Subordinate, count(report) AS Total
SQL Query: (figure not included)
19. Cypher Query: Movie Recommendation
    MATCH (watched:Movie {title: "Toy Story"})<-[r1:RATED]-()-[r2:RATED]->(unseen:Movie)
    WHERE r1.rating > 7 AND r2.rating > 7
      AND watched.genres = unseen.genres
      AND NOT ((:Person {username: "maxdemarzi"})-[:RATED|WATCHED]->(unseen))
    RETURN unseen.title, COUNT(*)
    ORDER BY COUNT(*) DESC
    LIMIT 25
What are the Top 25 Movies
• that I haven't seen
• with the same genres as Toy Story
• given high ratings
• by people who liked Toy Story
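To make the logic of this query easier to follow, here is a rough Python equivalent over in-memory data. The ratings and genres below are hypothetical sample data, and the function name `recommend` is made up for illustration; the filter conditions mirror the four bullets above.

```python
from collections import Counter

# Hypothetical sample data: (person, movie) -> rating on a 1-10 scale.
ratings = {
    ("ann", "Toy Story"): 9, ("ann", "Shrek"): 8, ("ann", "Alien"): 9,
    ("bob", "Toy Story"): 8, ("bob", "Shrek"): 9,
    ("cat", "Toy Story"): 5, ("cat", "Cars"): 10,
    ("max", "Toy Story"): 9,
}
genres = {
    "Toy Story": {"Animation", "Comedy"},
    "Shrek": {"Animation", "Comedy"},
    "Cars": {"Animation", "Comedy"},
    "Alien": {"Horror"},
}


def recommend(watched, me, threshold=7, limit=25):
    """Count high ratings of same-genre, unseen movies by fans of `watched`."""
    # People who liked the watched movie (r1.rating > 7).
    fans = {p for (p, m), r in ratings.items() if m == watched and r > threshold}
    # Movies the current user has already rated.
    seen = {m for (p, m) in ratings if p == me}
    counts = Counter(
        m for (p, m), r in ratings.items()
        if p in fans and r > threshold          # r2.rating > 7
        and m != watched
        and genres[m] == genres[watched]        # same genres
        and m not in seen                       # not already seen by me
    )
    return counts.most_common(limit)
```

Here "Cars" is left out because its only fan gave Toy Story a low rating, and "Alien" is left out because its genres differ.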
21. Cypher Query: Ratings of Two Users
    MATCH (p1:Person {name: 'Michael Sherman'})-[r1:RATED]->(m:Movie),
          (p2:Person {name: 'Michael Hunger'})-[r2:RATED]->(m:Movie)
    RETURN m.name AS Movie,
           r1.rating AS `M. Sherman's Rating`,
           r2.rating AS `M. Hunger's Rating`
What are the Movies these 2 users have both rated
23. Cypher Query: Cosine Similarity
    MATCH (p1:Person)-[x:RATED]->(m:Movie)<-[y:RATED]-(p2:Person)
    WITH SUM(x.rating * y.rating) AS xyDotProduct,
         SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength,
         SQRT(REDUCE(yDot = 0.0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength,
         p1, p2
    MERGE (p1)-[s:SIMILARITY]-(p2)
    SET s.similarity = xyDotProduct / (xLength * yLength)
Calculate it for all Person nodes with at least one Movie between them
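The arithmetic behind this query can be checked with a small Python sketch. It assumes each user's ratings are held in a plain dict of movie to rating (a hypothetical layout, not part of the deck). Like the query, it sums only over movies both users have rated.

```python
from math import sqrt


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity over the movies both users rated, or None if undefined."""
    shared = ratings_a.keys() & ratings_b.keys()
    if not shared:
        return None  # no movie in common, so the query would create no relationship
    dot = sum(ratings_a[m] * ratings_b[m] for m in shared)        # xyDotProduct
    len_a = sqrt(sum(ratings_a[m] ** 2 for m in shared))          # xLength
    len_b = sqrt(sum(ratings_b[m] ** 2 for m in shared))          # yLength
    if len_a == 0 or len_b == 0:
        return None  # all-zero ratings would divide by zero
    return dot / (len_a * len_b)
```

One consequence worth keeping in mind: when two users share exactly one movie with positive ratings, the result is always 1.0, so a minimum-overlap threshold may be worth considering before trusting the score.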
25. Cypher Query: Your nearest neighbors
    MATCH (p1:Person {name: 'Grace Andrews'})-[s:SIMILARITY]-(p2:Person)
    WITH p2, s.similarity AS sim
    ORDER BY sim DESC
    LIMIT 5
    RETURN p2.name AS Neighbor, sim AS Similarity
Who are the
• top 5 Persons and their similarity score
• ordered by similarity in descending order
• for Grace Andrews
27. Cypher Query: k-NN Recommendation
    MATCH (m:Movie)<-[r:RATED]-(b:Person)-[s:SIMILARITY]-(p:Person {name: 'Zoltan Varju'})
    WHERE NOT ((p)-[:RATED]->(m))
    WITH m, s.similarity AS similarity, r.rating AS rating
    ORDER BY m.name, similarity DESC
    WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings
    WITH movie, REDUCE(s = 0, i IN ratings | s + i) * 1.0 / SIZE(ratings) AS recommendation
    ORDER BY recommendation DESC
    RETURN movie, recommendation
    LIMIT 25
What are the Top 25 Movies
• that Zoltan Varju has not seen
• using the average rating
• by my top 3 neighbors
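A Python sketch of the same k-NN averaging may help when reading the query. The dict layouts for `ratings` and `similarity` and the function name are assumptions made for illustration. Note that, as in the query, the top 3 neighbors are chosen per movie, from among the neighbors who rated that movie.

```python
def knn_recommend(user, ratings, similarity, k=3, limit=25):
    """Average the ratings of the k most similar raters of each unseen movie.

    ratings:    {person: {movie: rating}}      (hypothetical layout)
    similarity: {person: {neighbor: score}}    (hypothetical layout)
    """
    seen = ratings.get(user, {})
    by_movie = {}
    for neighbor, sim in similarity[user].items():
        for movie, rating in ratings.get(neighbor, {}).items():
            if movie not in seen:  # WHERE NOT (p)-[:RATED]->(m)
                by_movie.setdefault(movie, []).append((sim, rating))
    scored = []
    for movie, pairs in by_movie.items():
        # ORDER BY similarity DESC, then keep the first k ratings.
        top = sorted(pairs, key=lambda pair: pair[0], reverse=True)[:k]
        scored.append((movie, sum(r for _, r in top) / len(top)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]
```

A movie rated by a single neighbor gets that one rating as its score, so a popular movie and an obscure one can rank equally; filtering on a minimum number of raters is one possible refinement.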
29. Recommend Jobs to Job Seekers
What connects them?
• location
• skills
• education
• experience
30. Cypher Query: Job Recommendation
What are the Top 10 Jobs for me
• that are in the same location I’m in
• for which I have the necessary qualifications
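The query itself is not captured in this transcript, so the following Python sketch only illustrates the two conditions listed above. The field names (`location`, `skills`, `requires`, `title`) and the function name are hypothetical, and a job counts as a match only when every required qualification is held.

```python
def recommend_jobs(seeker, jobs, limit=10):
    """Return titles of jobs in the seeker's location whose requirements are all met."""
    matches = []
    for job in jobs:
        if job["location"] != seeker["location"]:
            continue  # different location
        if job["requires"] <= seeker["skills"]:  # set subset: nothing is missing
            matches.append(job["title"])
    return matches[:limit]


# Hypothetical sample data.
seeker = {"location": "Chicago", "skills": {"cypher", "java", "python"}}
jobs = [
    {"title": "Graph Engineer", "location": "Chicago", "requires": {"cypher", "java"}},
    {"title": "DBA", "location": "Chicago", "requires": {"oracle"}},
    {"title": "Data Engineer", "location": "Berlin", "requires": {"cypher"}},
]
```

Jobs are returned in input order here; ranking them (for example by salary or recency) would need data the slide does not describe.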
31. Job Recommendation Results
Perfect Candidate for 100% matches
• missing qualifications can be added quickly
• might encourage exaggerated resumes
32. Just one tiny itsy bitsy problem
Job Boards get paid by
• Number of Applicants to a Job
• Wholesale Resume sales
• Selling your data
33. Recommend Love
Find your soulmate in the graph
• Are they energetic?
• Do they like dogs?
• Have a good sense of humor?
• Neat and tidy, but not crazy about it?
What are the Top 10 Potential Mates for me
• that are in the same location
• are sexually compatible
• have traits I want
• want traits I have
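The four conditions above describe a two-way match, which the Python sketch below tries to make concrete. Everything about the data layout (`location`, `gender`, `seeking`, `has`, `wants`) is an assumption for illustration, as is scoring candidates by the number of traits matched in both directions.

```python
def top_mates(me, people, limit=10):
    """Rank candidates who match `me` in both directions on wanted traits."""
    scored = []
    for other in people:
        if other["name"] == me["name"] or other["location"] != me["location"]:
            continue  # skip myself and anyone in another location
        compatible = (other["gender"] in me["seeking"]
                      and me["gender"] in other["seeking"])
        if not compatible:
            continue
        they_have = len(other["has"] & me["wants"])   # traits I want
        they_want = len(other["wants"] & me["has"])   # traits I have
        if they_have and they_want:                   # must match both ways
            scored.append((other["name"], they_have + they_want))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Hypothetical sample data.
me = {"name": "max", "location": "Chicago", "gender": "m", "seeking": {"f"},
      "has": {"tidy", "funny"}, "wants": {"energetic", "likes dogs"}}
people = [
    me,
    {"name": "ann", "location": "Chicago", "gender": "f", "seeking": {"m"},
     "has": {"energetic", "likes dogs"}, "wants": {"funny"}},
    {"name": "bea", "location": "Chicago", "gender": "f", "seeking": {"m"},
     "has": {"energetic"}, "wants": {"rich"}},
    {"name": "cyd", "location": "Berlin", "gender": "f", "seeking": {"m"},
     "has": {"energetic"}, "wants": {"tidy"}},
    {"name": "dee", "location": "Chicago", "gender": "f", "seeking": {"m"},
     "has": {"likes dogs"}, "wants": {"tidy"}},
]
```

In this sample, bea is dropped because she wants nothing that max has, and cyd is dropped because she is in a different location.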
44. Hacker News Recommendations
• Which stories should I read?
• Which users should I follow?
• What else should I be interested in?
• Who seems to know a lot about X?
• Etc.
45. GraphAware Recommendation Framework
• Ability to trade off recommendation quality for speed
• Ability to pre-compute recommendations
• Built-in algorithms and functions
• Ability to measure recommendation quality
• Ability to easily run in A/B test environments
47. Walmart BUSINESS CASE
World’s largest company by revenue
World’s largest retailer and private employer
SF-based global e-commerce division manages several websites
Founded in 1969 in Bentonville, Arkansas
• Needed online customer recommendations to keep pace with competition
• Data connections provided predictive context, but were not in a usable format
• Solution had to serve many millions of customers and products while maintaining superior scalability and performance
48. Walmart SOLUTION
• Brings customers, preferences, purchases, products and locations into a graph model
• Uses connections to make product recommendations
• Solution deployed across Walmart divisions and websites
49. Global Courier BUSINESS CASE
World’s largest courier
480,000 employees
€55 billion in revenue
Needed new B2C and B2B parcel routing system for its logistics practice
Legacy system neither supported the full network nor the shift to online demands
Needed to replace aging B2B and B2C parcel routing system whose requirements include:
• 24x7 availability
• Peak loads of 5M parcels per day, 3K per second
• Support for complex and diverse software stack
• Predictable performance with linear scalability
• Daily changes to logistics networks
• Route from any point to any point
• Single point of truth for entire network
50. Global Courier SOLUTION
Neo4j provides the ideal domain fit since a logistics network is a graph
• High availability and performance via Neo4j clustering
• Greatly simplified Cypher queries for routing versus relational SQL queries
• Flexible data model that reflects the real logistics world far better than relational
• Easy-to-grasp whiteboard-friendly model
51. eBay BUSINESS CASE
C2C and B2C retail network
Full e-commerce functionality for individuals and businesses
Integrated with logistics vendors for product deliveries
• Needed an offering to compete with Amazon Prime
• Enable customer-selected delivery inside 90 minutes
• Calculate best route option in real-time
• Scale to enable a variety of services
• Offer more predictable delivery times
52. eBay Now SOLUTION
• Acquired UK-based Shutl, a leader in same-day delivery
• Used Neo4j to create eBay Now
• 1000 times faster than the prior MySQL-based solution
• Faster time-to-market
• Improved code quality with 10 to 100 times less query code
53. Classmates BUSINESS CASE
Online yearbook connecting friends from school, work and military in US and Canada
Founded as Memory Lane in Seattle
Develop new social networking capabilities to monetize yearbook-related offerings
• Show all the people I know in a yearbook
• Show yearbooks my friends appear in most often
• Show sections of a yearbook that my friends appear most in
• Show me other schools my friends attended
54. Classmates SOLUTION
Neo4j provides a robust and scalable graph database solution
• 3-instance cluster with cache sharding and disaster-recovery
• 18ms response time for top 4 queries
• 100M nodes and 600M relationships in initial graph, including people, images, schools, yearbooks and pages
• Projected to grow to 1B nodes and 6B relationships
55. National Geographic BUSINESS CASE
Non-profit scientific and educational institution founded in 1888
Covers geography, archaeology, natural science, environment and historical conservation
Journals, online media, radio, TV, documentaries, live events and consumer content and goods
• Improve poor performance of PostgreSQL app
• Increase user engagement by linking to 100+ years of multimedia content
• Improve targeting by understanding subscribers’ interests better
• Recommend content and services to users based on their interests
56. National Geographic SOLUTION
• Enabled complex real-time analytics across eight million users and a century of content
• Delivered robust performance by eliminating triple-nested SQL joins
• Cross-refers users among content, live events, travel, goods and causes
• Neo4j solution much less cumbersome and easier to maintain than previous SQL system
57. Curaspan BUSINESS CASE
Leader in patient management for discharges and referrals
Manages patient referrals for 4600+ health care facilities
Connects providers, payers via web-based patient management platform
Founded in 1999 in Newton, Massachusetts
• Improve poor performance of Oracle solution
• Support more complexity including granular, role-based access control
• Satisfy complex Graph Search queries by discharge nurses and intake coordinators
Find a skilled nursing facility within n miles of a given location, belonging to health care group XYZ, offering speech therapy and cardiac care, and optionally Italian language services
58. Curaspan SOLUTION
• Met fast, real-time performance demands
• Supported queries that span multiple hierarchies including provider and employee-permissions graphs
• Improved data model to handle adding more dimensions to the data such as insurance networks, service areas and care organizations
• Greatly simplified queries, simplifying multi-page SQL statements into one Neo4j function
59. FiftyThree BUSINESS CASE
Maker of Paper, one of the top apps in Apple’s App Store, with millions of users
Based in New York City
• Add social capabilities to digital-paper app
• Support social collaboration across millions of users in new Mix app
• Enable seamless interaction between social and content-asset networks
• Ensure new apps are robust, scalable and fast
60. FiftyThree SOLUTION
• Neo4j data model ideal for social network, content management and access control
• Users create, publish and share designs simply
• Easy to develop and evolve Neo4j-based app
• Integrates well with FiftyThree EC2 architecture
See the Neo4j solution in action: Betting the Company (Literally) on a Graph Database
http://aseemk.com/talks/neo4j-lessons-learned#/
App Store Editor’s Choice
2012 iPad App of Year
Apple Best Apps of 2014
61. Questions
• How does Neo4j fit into my existing infrastructure? As a Service.
• Will Neo4j scale? Yes.