This document contains a presentation on using graph databases for recommendations. It begins with an introduction to graphs and graph theory, then discusses what graph databases are and how they are different from relational databases. It explains how graphs are well-suited for complex querying and representing connected data. The presentation describes how recommendation systems work and how graph algorithms and storing recommendation data in a graph structure provide benefits like real-time recommendations, navigating relationships between items, and efficient operations. It concludes with a demonstration, examples, and discussing future events.
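The graph-style recommendation flow summarized above can be sketched without a database at all. This is a minimal illustration with made-up users and items, not the presentation's actual implementation:

```python
from collections import Counter

# A tiny bipartite "likes" graph: user -> set of liked items.
# Users and items are invented illustration data.
likes = {
    "alice": {"Matrix", "Inception", "Memento"},
    "bob": {"Matrix", "Inception", "Tenet"},
    "carol": {"Matrix", "Tenet"},
}

def recommend(user, likes):
    """Recommend items liked by users who share items with `user`,
    excluding what `user` already likes (a two-hop graph traversal)."""
    my_items = likes[user]
    scores = Counter()
    for other, items in likes.items():
        if other == user or not (items & my_items):
            continue  # no shared taste, skip this user
        for item in items - my_items:
            scores[item] += 1  # one vote per co-liking neighbour
    return [item for item, _ in scores.most_common()]

print(recommend("alice", likes))  # Tenet is liked by both of alice's neighbours
```

In a graph database the same two-hop walk would be a single query over the relationship structure, which is why such lookups stay fast at recommendation time.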
This document discusses using Neo4J to model configuration management databases and hobby projects. It describes converting relational models to graph databases by making components nodes and relations relationships. This allows visualizing the data and running queries to understand dependencies. The document provides an example of modeling an Oracle database configuration and a running competition as a graph. It recommends using Python and libraries like py2neo and Flask to quickly build a web interface for visualizing and interacting with the graph models.
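As a rough illustration of the dependency queries such a graph model enables (component names are invented, and a real setup would use Neo4j rather than a dict):

```python
# Hypothetical CMDB modelled as a graph: component -> components it depends on.
deps = {
    "webshop": ["app_server"],
    "app_server": ["oracle_db"],
    "oracle_db": ["storage", "linux_host"],
    "storage": [],
    "linux_host": [],
}

def transitive_deps(component, deps):
    """Depth-first walk answering 'what does this component ultimately depend on?'"""
    seen = set()
    stack = [component]
    while stack:
        for d in deps[stack.pop()]:
            if d not in seen:
                seen.add(d)
                stack.append(d)
    return seen

print(sorted(transitive_deps("webshop", deps)))
```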
Graph databases store data in graph structures with nodes, edges, and properties. Neo4j is a popular open-source graph database that uses a property graph model. It has a core API for programmatic access, indexes for fast lookups, and Cypher for graph querying. Neo4j provides high availability through master-slave replication and scales horizontally by sharding graphs across instances through techniques like cache sharding and domain-specific sharding.
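A property graph can be mimicked in a few lines to make the model concrete. This toy structure is only an illustration, with none of Neo4j's storage engine, indexing, or Cypher layer:

```python
class PropertyGraph:
    """Minimal in-memory property graph: nodes and relationships both carry properties."""

    def __init__(self):
        self.nodes = {}   # node_id -> properties dict
        self.edges = []   # (source, rel_type, target, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, rel_type, target, **props):
        self.edges.append((source, rel_type, target, props))

    def neighbours(self, node_id, rel_type=None):
        """Follow outgoing relationships, optionally filtered by type."""
        return [t for s, r, t, _ in self.edges
                if s == node_id and (rel_type is None or r == rel_type)]

g = PropertyGraph()
g.add_node("alice", label="Person", age=34)
g.add_node("neo4j", label="Database")
g.add_edge("alice", "USES", "neo4j", since=2019)
print(g.neighbours("alice", "USES"))  # ['neo4j']
```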
AI from your data lake: Using Solr for analytics - DataWorks Summit
Introductory technical session on Apache Solr's (HDP Search) artificial intelligence and machine learning features to discover relationships and insights across big data in the enterprise. Discussions will include how Solr performs graph traversal, anomaly detection, NLP and time-series analysis, and how you can display this data to users with easy-to-create dashboards.
This technical session will review Apache Solr’s streaming expressions, which were introduced in Solr 6.5. With over 100 expressions and evaluators, plus conditional logic, variables, and data structures, these functions bring many features from the relational world into search. Together they form a powerful functional programming language that enables many parallel computing use cases, such as anomaly detection, streaming NLP, graph traversal, and time-series analysis.
In order to discover and analyze big data, third party tools such as Jupyter, Tableau, and Lucidworks Insights will be reviewed.
Speaker
Cassandra Targett, Lucidworks, Director of Engineering
Marcelline Saunders, Lucidworks, Director, Global Partner Enablement
Query-time Nonparametric Regression with Temporally Bounded Models - Patrick ... - Lucidworks
This document summarizes a presentation about query-time nonparametric regression and time routed aliases in Solr. It discusses how nonparametric multiplicative regression was used to continuously predict user interests for an online career coaching system based on click-through data. It also describes how time routed aliases in Solr provide a built-in way to implement time-partitioned indexing of timestamped data across multiple collections while automatically adding and removing collections over time.
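The talk uses nonparametric multiplicative regression; as a hedged stand-in, here is the simpler Nadaraya-Watson kernel estimator, which shows the core idea of nonparametric regression (all data points are invented):

```python
import math

def kernel_regress(x_query, xs, ys, bandwidth=1.0):
    """Nadaraya-Watson estimator: a weighted average of observed ys,
    weighted by a Gaussian kernel on distance to the query point."""
    weights = [math.exp(-((x_query - x) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Made-up click-through observations: position on some axis -> interest score.
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
print(round(kernel_regress(3, xs, ys), 3))  # pulled toward the local mean, 6.0
```

Because no parametric form is assumed, the prediction at a query point depends only on nearby observations, which is what makes query-time evaluation feasible.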
Graph databases are used to represent graph structures with nodes, edges and properties. Neo4j, an open-source graph database, is reliable and fast for managing and querying highly connected data. This session explores how to install and configure Neo4j, create nodes and relationships, query with the Cypher Query Language, import data, and use Neo4j in concert with SQL Server, providing answers and insight, with visual diagrams, about the connected data you have in your SQL Server databases!
Introduction to Machine Learning for Oracle Database Professionals - Alex Gorbachev
This document summarizes a presentation on practical machine learning for database administrators. It discusses using machine learning to classify PL/SQL code as good or bad, classify database schemas, cluster SQL statements, and detect anomalies in database workloads. The presentation covers what machine learning is, why it can be useful for databases, and provides examples of applying machine learning to common DBA problems like code classification. It describes building a naive Bayes classification model in Oracle to classify PL/SQL code, including extracting text features, training and testing the model, and assessing performance.
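A naive Bayes text classifier of the kind described can be sketched in plain Python (the code snippets and labels are invented; the talk builds its model inside Oracle):

```python
import math
from collections import Counter

# Toy training set: PL/SQL-ish snippets labelled good/bad (illustration only).
train = [
    ("open cursor fetch close", "good"),
    ("bulk collect limit forall", "good"),
    ("loop commit inside loop", "bad"),
    ("select star from table in loop", "bad"),
]

def fit(train):
    """Count word frequencies per label: the 'text features' of the model."""
    word_counts = {"good": Counter(), "bad": Counter()}
    label_counts = Counter()
    for text, label in train:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts

def predict(text, word_counts, label_counts):
    """Pick the label maximizing log P(label) + sum log P(word | label)."""
    vocab = {w for c in word_counts.values() for w in c}
    best, best_lp = None, -math.inf
    for label, lc in label_counts.items():
        total = sum(word_counts[label].values())
        lp = math.log(lc / sum(label_counts.values()))
        for w in text.split():
            # Laplace smoothing so unseen words don't zero out the probability.
            lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

wc, lc = fit(train)
print(predict("commit inside loop", wc, lc))
```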
This document provides an overview of effective big data visualization. It discusses information visualization and data visualization, including common chart types like histograms, scatter plots, and dashboards. It covers visualization goals, considerations, processes, basics, and guidelines. Examples of good visualization are provided. Tools for creating infographics are listed, as are resources for learning more about data visualization and references. Overall, the document serves as a comprehensive introduction to big data visualization.
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks... - Rodney Joyce
Number 2 in the Data Science for Dummies series - we'll predict Titanic survival with Databricks, Python, and Spark ML.
These are the slides only (excuse the PowerPoint animation issues) - check out the actual tech talk on YouTube: https://rodneyjoyce.home.blog/2019/05/03/data-science-for-dummies-machine-learning-with-databricks-python-sparkml-tech-talk-1-of-7/
If you have not used Databricks before check out the first talk - Databricks for Dummies.
Here's the rest of the series: https://rodneyjoyce.home.blog/tag/data-science-for-dummies/
1) Data Science overview with Databricks
2) Titanic survival prediction with Azure Machine Learning Studio + Kaggle
3) Data Engineering with Titanic dataset + Databricks + Python
4) Titanic with Databricks + Spark ML
5) Titanic with Databricks + Azure Machine Learning Service
6) Titanic with Databricks + MLS + AutoML
7) Titanic with Databricks + MLFlow
8) Titanic with .NET Core + ML.NET
9) Deployment, DevOps/MLOps and Productionisation
Sparking Science up with Research Recommendations by Maya Hristakeva - Spark Summit
Mendeley Suggest is a personalized article recommender system that recommends relevant research articles to researchers. It uses various recommender algorithms like collaborative filtering and content-based filtering. Spark has proven to be a good alternative to Mahout for the computation layer, though some tuning is required. User-based collaborative filtering has been shown to outperform item-based collaborative filtering and matrix factorization methods for Mendeley Suggest. Offline evaluation is important before deploying recommendations online to test performance and quality.
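User-based collaborative filtering, which the summary says performed best, can be illustrated in a few lines (ratings are made up; a production system like Mendeley Suggest runs at far larger scale on Spark):

```python
import math

# Invented reader -> {article: rating} data.
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 3, "d": 5},
    "u3": {"c": 2, "d": 4, "e": 5},
}

def cosine(r1, r2):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(r1) & set(r2)
    if not shared:
        return 0.0
    dot = sum(r1[i] * r2[i] for i in shared)
    n1 = math.sqrt(sum(v * v for v in r1.values()))
    n2 = math.sqrt(sum(v * v for v in r2.values()))
    return dot / (n1 * n2)

def recommend(user, ratings):
    """Score unseen items by similarity-weighted ratings from other users."""
    scores = {}
    for other, r in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], r)
        for item, val in r.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * val
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("u1", ratings))
```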
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache - Dremio Corporation
From DataEngConf 2017 - Everybody wants to get to data faster. As we move from general solutions to specific optimization techniques, the performance impact grows. This talk discusses how layering in-memory caching, columnar storage and relational caching can combine to provide a substantial improvement in overall data science and analytical workloads. It includes a detailed overview of how you can use Apache Arrow, Calcite and Parquet to achieve orders-of-magnitude improvements in performance over what is currently possible.
Tired of seeing the loading spinner of doom while trying to analyze your big data on Tableau? Learn how Jethro accelerates your database so you can interactively analyze your big data on Tableau and gain the crucial insights that you need without losing your train of thought. Jethro enables you to be completely flexible with no need for partitions in order to speed up the data. This presentation will explain how indexing is a superior architecture for the BI use case when dealing with big data while compared to MPP architecture.
Alex Mang - Patterns for Scalability in Microsoft Azure Applications - Codecamp Romania
The document discusses patterns for scalability in Microsoft Azure applications. It covers queue-based load leveling, competing consumers, and priority queue patterns for handling application load and message processing. It also discusses materialized view and sharding patterns for scaling databases, where materialized views optimize queries and sharding partitions data horizontally across multiple servers. The talk includes demos of priority queue and sharding patterns to illustrate their implementations.
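The sharding pattern mentioned above boils down to routing each record by a stable hash of its key. A minimal sketch with invented shard names:

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]

def shard_for(key, shards=SHARDS):
    """Route a record to a shard by hashing its key (horizontal partitioning).
    md5 gives a hash that is stable across processes, unlike Python's
    built-in hash(), so every node routes the same key the same way."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return shards[digest % len(shards)]

# Every lookup for the same customer id lands on the same shard.
print(shard_for("customer:42") == shard_for("customer:42"))  # True
```

Note that modulo-based routing reshuffles most keys when the shard count changes; consistent hashing is the usual refinement when shards are added and removed frequently.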
This document provides an overview of Neo4j, a graph database management system. It discusses how Neo4j stores data as nodes and relationships, allowing for fast querying of connected data. Traditional relational databases struggle with complex relationships, while NoSQL databases don't support relationships at all. Neo4j addresses these issues through its native graph storage and processing capabilities. The document highlights key Neo4j features like scalability, high performance, and its Cypher query language.
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b... - Spark Summit
Building a machine learning model is an iterative process. A data scientist will build many tens to hundreds of models before arriving at one that meets some acceptance criteria. However, the current style of model building is ad-hoc and there is no practical way for a data scientist to manage models that are built over time. In addition, there are no means to run complex queries on models and related data.
In this talk, we present ModelDB, a novel end-to-end system for managing machine learning (ML) models. Using client libraries, ModelDB automatically tracks and versions ML models in their native environments (e.g. spark.ml, scikit-learn). A common set of abstractions enable ModelDB to capture models and pipelines built across different languages and environments. The structured representation of models and metadata then provides a platform for users to issue complex queries across various modeling artifacts. Our rich web frontend provides a way to query ModelDB at varying levels of granularity.
ModelDB has been open-sourced at https://github.com/mitdbg/modeldb.
Tech-Talk at Bay Area Spark Meetup
Apache Spark(tm) has rapidly become a key tool for data scientists to explore, understand and transform massive datasets and to build and train advanced machine learning models. The question then becomes: how do I deploy these models to a production environment? How do I embed what I have learned into customer-facing data applications? Like all things in engineering, it depends.
In this meetup, we will discuss best practices from Databricks on how our customers productionize machine learning models and do a deep dive with actual customer case studies and live demos of a few example architectures and code in Python and Scala. We will also briefly touch on what is coming in Apache Spark 2.X with model serialization and scoring options.
Scoring at Scale: Generating Follow Recommendations for Over 690 Million Link... - Databricks
The Communities AI team at LinkedIn generates follow recommendations from a large (tens of millions) set of entities for each of our 690+ million members.
knowIT is a collaborative semantic wiki used by Johnson & Johnson to map their IT systems, applications, servers and stakeholders. It aims to capture knowledge about these informatics systems, their relationships and components to answer questions, facilitate knowledge sharing and enable self-service. The wiki uses Semantic MediaWiki and has grown to include systems portfolio management, configuration management and other features to increase IT systems knowledge across the organization.
Options for Data Prep - A Survey of the Current Market - Dremio Corporation
Data comes in many shapes and sizes, and every company struggles to find ways to transform, validate, and enrich data for multiple purposes. The problem has been around as long as data, and the market has an overwhelming number of options. In this presentation we look at the problem and key options from vendors in the market today. Dremio is a new approach that eliminates the need for stand alone data prep tools.
How to Survive as a Data Architect in a Polyglot Database World - Karen Lopez
Karen Lopez talks to data architects and data modelers about how they can best deliver value on modern data-driven projects beyond relational database technologies. She covers NoSQL databases and datastores, which scenarios they best fit and which they don't. She ends with 10 tips for adding more value to polyglot database solutions.
Practical Machine Learning for Smarter Search with Solr and Spark - Jake Mannix
This document discusses using Apache Spark and Apache Solr together for practical machine learning and data engineering tasks. It provides an overview of Spark and Solr, why they are useful together, and then gives an example of exploring and analyzing mailing list archives by indexing the data into Solr with Spark and performing both unsupervised and supervised machine learning techniques.
Programming data access is probably one of the most common tasks in building enterprise solutions. One way or another, we have to store state and data, and relational databases are probably the most common form of storage. The drawback is that object-oriented programming does not map particularly well onto relational tables.
This talk covers the problems that arise when designing the data layer, and how best to bridge the gap between classes in code and tables in the database.
How Graphs Revolutionize Access Management (2014-10-15) - Rik Van Bruggen
This document discusses how graph databases can revolutionize access and identity management. It begins with an introduction to graphs and graph databases, explaining how they are well-suited for complex querying of connected data. The document then argues that graph databases allow for a more accurate representation of real-world identity relationships, which are often multi-dimensional, and enable real-time queries that eliminate the need for integration between different systems. A demonstration of a graph database is provided, followed by examples, licensing information and a question and answer section.
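The core claim, that access checks are path queries over an identity graph, can be shown with a toy two-hop model (users, groups and resources are invented):

```python
# Hypothetical identity graph: user -> groups, group -> resources.
member_of = {"dave": ["admins", "devs"], "erin": ["devs"]}
grants = {"admins": ["prod_db"], "devs": ["ci_server"]}

def can_access(user, resource):
    """Access check as a two-hop traversal:
    user -MEMBER_OF-> group -GRANTS-> resource."""
    return any(resource in grants.get(g, []) for g in member_of.get(user, []))

print(can_access("dave", "prod_db"), can_access("erin", "prod_db"))
```

Real identity relationships add more dimensions (time bounds, delegation, role hierarchies), which is exactly where a graph model keeps the query a path pattern instead of a pile of joins.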
Graphgen aims to help people prototype a graph database by providing a visual tool that eases the generation of nodes and relationships with a Cypher DSL.
Many people struggle not only with creating a good graph model of their domain but also with creating sensible example data to test hypotheses or use cases.
Graphgen targets people with little time but a good enough understanding of their domain model, offering a visual DSL for data model generation that borrows heavily from the Neo4j Cypher graph query language.
The ASCII-art syntax allows even non-technical users to write and read model descriptions that are as concise as plain English yet formal enough to be parseable. The underlying generator takes the DSL inputs (structure, cardinalities and amount ranges) and combines them with a comprehensive fake-data generation library to create realistic datasets of arbitrary size and complexity.
Users can create their own models from the basic building blocks of the DSL and share their data descriptions with others via a simple link.
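A generator in the spirit described, combining structural rules and cardinalities with a pool of fake values, might look like this sketch (names, labels, and probabilities are all invented; Graphgen's own DSL and output are richer):

```python
import random

random.seed(7)  # deterministic output for the example

FIRST_NAMES = ["Ada", "Linus", "Grace", "Alan"]
COMPANIES = ["Acme", "Globex"]

def generate(n_people, works_at_prob=0.8):
    """Generate (Person)-[:WORKS_AT]->(Company) style data: structure and
    cardinality come from the model description, values from a fake-data pool."""
    nodes, rels = [], []
    for i in range(n_people):
        person = {"id": f"p{i}", "label": "Person",
                  "name": random.choice(FIRST_NAMES)}
        nodes.append(person)
        if random.random() < works_at_prob:
            rels.append((person["id"], "WORKS_AT", random.choice(COMPANIES)))
    return nodes, rels

nodes, rels = generate(5)
print(len(nodes), len(rels))
```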
GraphGen: Conducting Graph Analytics over Relational Databases - PyData
This document discusses GraphGen, a tool for conducting graph analytics over relational databases. It begins by introducing graph analytics and its applications. It then discusses the current state of graph analytics, which is fragmented with no single solution. Most organizations store data relationally and have "hidden" graphs that can be extracted. GraphGen provides a declarative language to define nodes and edges to extract these graphs without ETL. It supports various interfaces like Java, Python, and a web application to enable graph analytics over relational data in an intuitive way.
This document discusses document classification using graphs and Neo4j. It introduces hierarchical pattern recognition (HPR) for graph-based document classification. HPR learns deep feature representations in a hierarchy using finite state machines. The features are mapped to a vector space model for classification. The document demonstrates HPR by classifying US presidential speeches by political affiliation, achieving over 70% similarity for predicted vs actual labels. It encourages attendees to get involved in the Neo4j community.
Wanderu – Lessons from Building a Travel Site with Neo4j - Eddy Wong @ GraphC... - Neo4j
Wanderu is a consumer-focused search engine for buses and trains. Eddy will recount the architectural, modeling and other technical “lessons learned” and “lessons unlearned” in implementing our geospatial and search features using Neo4j in the context of a NoSQL polyglot solution.
This document provides an introduction to data modeling with Neo4j. It discusses modeling complex data as a graph using nodes, relationships, properties and labels. It introduces Neo4j as a graph database and its data model of labeled property graphs. It also provides an overview of the Cypher query language and includes an example of modeling a domain to find people with similar skills within a company.
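The "people with similar skills" example can be prototyped outside the database with set overlap. A sketch with invented people and skills:

```python
skills = {
    "ann": {"python", "neo4j", "cypher"},
    "ben": {"python", "cypher", "sql"},
    "cat": {"java", "spring"},
}

def most_similar(person, skills):
    """Rank colleagues by Jaccard overlap of skill sets: the kind of question
    a (:Person)-[:HAS_SKILL]->(:Skill) graph answers with one Cypher pattern."""
    def jaccard(a, b):
        return len(a & b) / len(a | b)
    others = [p for p in skills if p != person]
    return max(others, key=lambda p: jaccard(skills[person], skills[p]))

print(most_similar("ann", skills))  # ben shares python and cypher
```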
Designing and Building a Graph Database Application – Architectural Choices, ...Neo4j
Ian closely looks at design and implementation strategies you can employ when building a Neo4j-based graph database solution, including architectural choices, data modelling, and testing.
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...jaxLondonConference
Presented at JAX London
In this session we'll look at some of the design and implementation strategies you can employ when building a Neo4j-based graph database solution, including architectural choices, data modelling, and testing.
Introducing the MySQL Workbench CASE toolAndrás Bögöly
Introducing the ER Model, and the The MySQL Workbench CASE tool with its Database modeling, database SQL development and some aspects of the change management capabilities.
Efficient Rails Test Driven Development (class 3) by Wolfram ArnoldMarakana Inc.
Learn how to apply the test-first approach to all of your Rails projects. In this six class series, experienced Rails engineer and consultant, Wolfram Arnold applies his real-world perspective to teaching you effective patterns for testing.
In this third of six classes, Wolf covers:
- Controller testing
- View, Helper, Routes Testing
- How much is enough? How much is too much?
** You can get the video and source code from this presentation at: http://marakana.com/f/201 **
All six classes will be available online, so stay tuned! And be sure to check out marakana.com/techtv for more videos on open source training.
Presented by: Wolfram Arnold, in collaboration with Sarah Allen, BlazingCloud.net
Produced by: Marakana
The document discusses how property graph databases like Neo4j can model and query relationship data more effectively than relational or other NoSQL databases. It provides examples of modeling user, movie, and product data as graphs and executing queries in Cypher. It also discusses using the Java Core API and Traversal API to navigate graph data and developing recommendation systems and applications for fraud detection by analyzing patterns in user behaviors and connections.
This document discusses database normalization and Anchor Modeling. It provides reasons for normalizing databases like reducing redundancy and improving integrity. It also lists common objections to normalization. The document then introduces Anchor Modeling as an agile technique for maintaining a highly normalized data model that is easy to evolve over time without downtime. Key aspects of Anchor Modeling like anchors, attributes, ties and knots are explained. Finally, examples are provided of how to model requirements and handle changes using this approach.
The document provides information on entity relationship diagrams (ERDs), including their objectives, components, and how to construct them. An ERD is a graphical representation of entities, attributes, and relationships within a database. It serves as a design tool, documentation, and means to communicate the logical structure. Key aspects covered include identifying entities and attributes, defining relationships and cardinalities, and using standard symbols and notations to draw the ERD.
The document provides an overview of conceptual database design using entity-relationship (ER) modeling. It defines key concepts in ER diagrams like entities, attributes, relationships and their cardinalities. It explains how to model different relationship types like one-to-one, one-to-many and many-to-many. It also covers advanced topics such as weak entities, generalization, specialization and aggregation. The overall purpose is to illustrate how ER diagrams can be used to design databases by visually representing the entities, attributes, and relationships in a domain.
This document discusses entity-relationship (ER) modeling and ER diagrams. It defines key concepts such as entities, attributes, relationships, and cardinalities. It explains how ER diagrams visually represent these concepts using symbols like rectangles, diamonds, and lines. The document also covers ER diagram notation for different types of attributes, keys, roles, and relationship cardinalities. The goal of ER modeling and diagrams is to conceptualize a database without technical details.
The document provides an overview of entity-relationship (ER) modeling concepts used in database design. It defines key terms like entities, attributes, relationships, and cardinalities. It explains how ER diagrams visually represent these concepts using symbols like rectangles, diamonds, and lines. The document also discusses entity types, relationship degrees, key attributes, weak entities, and how to model one-to-one, one-to-many, many-to-one, and many-to-many relationships. Overall, the document serves as a guide to basic ER modeling principles for conceptual database design.
1) The document describes an entity-relationship (ER) diagram for a university database. It identifies the main entities as Department, Course, Module, Lecturer, and Student.
2) The key relationships are that a Department offers multiple Courses, a Course includes multiple Modules, a Lecturer teaches multiple Modules, and a Student enrolls in a Course and takes the Modules required to complete it.
3) The document explains the different components of an ER diagram, including entities, relationships, attributes, keys, and relationship types (one-to-one, one-to-many, many-to-many). It provides examples of how to map an ER diagram to database tables.
The student will understand the basics of the Relational Database Model.
The student will learn Database Administration functions as appropriate for software developers.
The student will learn SQL.
The student will become familiar with the entire implementation cycle of a client server application.
And, you will build one.
The document discusses the entity-relationship (ER) model for conceptual database design. It describes the basic constructs of the ER model including entities, attributes, relationships, keys, and various modeling choices. The ER model is useful for capturing the semantics of an application domain and producing a conceptual schema before logical and physical design.
Exploring NoSQL and implementing through CassandraDileep Kalidindi
This document provides an overview of NoSQL databases and Apache Cassandra. It discusses how data and data modeling have evolved with big data. It introduces key concepts like CAP theorem and ACID vs BASE. It describes various NoSQL implementations like key-value, document, and column-oriented databases. It provides details on Cassandra's architecture, data model, and operational aspects. The document demonstrates Cassandra configuration, CQL usage, and monitoring tools.
This document provides an introduction to database management systems (DBMS). It defines key terminology related to databases and discusses problems with manual databases. It describes the functions and advantages of DBMS, including data representation, transaction management, data sharing, and increased security. Examples of popular DBMS are provided, such as Oracle, Microsoft SQL Server, and MySQL. Database system architecture, data models, and the relational model are overviewed. Finally, entity relationship (ER) modeling is explained as a way to conceptualize data needs and design the database logically before implementation.
HBase and Drill: How Loosely Typed SQL is Ideal for NoSQLMapR Technologies
From the Hadoop Summit 2015 Session with Ted Dunning:
The Apache HBase approach to data has a huge potential for expressing NoSQL-y, non-relational programs. Apache Drill supports SQL for non-relational data. Paradoxically, combining this NoSQL with this SQL tool results in something even better. I will show and explain how to combine HBase and Drill to access time series data and to support high performance secondary indexing.
HBase and Drill: How loosley typed SQL is ideal for NoSQLDataWorks Summit
The document discusses how complex data structures can be modeled in a database using an extended relational model. It begins with an agenda that includes discussing loose typing, examples of what can be done, and looking at a real database with 10-20x fewer tables. It then contrasts the traditional relational model with HBase and discusses how structuring allows complex objects in fields and references between objects. Examples are given of modeling time-series data and music metadata in fewer tables using these techniques. Apache Drill is presented as a way to perform SQL queries over these complex data structures.
Presented at JavaOne 2013, Tuesday September 24.
"Data Modeling Patterns" co-created with Ian Robinson.
"Pitfalls and Anti-Patterns" created by Ian Robinson.
Similar to 20141216 graph database prototyping ams meetup (20)
1 rik van bruggen - intro and state of the graphRik Van Bruggen
This document provides an agenda for a Graphdb-Brussels Meetup event. The agenda includes:
- A presentation on "The State of the Graph" industry by Rik Van Bruggen of Neo4j (19h).
- Three case study presentations on using graphs for CMDB modeling, mapping IT landscapes, and protein association networks (19h15-20h40).
- Pizza will be served around 19h50.
- The event closes at 21h15. The speakers will provide insights into applications of graph databases.
This document summarizes a method for constructing protein networks from public proteomics data. It involves pairing proteins that co-occur in experiments and mapping these pairs to existing knowledge bases to identify biologically related pairs. Over 2300 protein pairs were found with a Jaccard similarity score of at least 0.4, and 71% of these were known associations according to literature. The associated protein pairs are stored in an online database called Tabloid Proteome that allows visualization of the network and detection of indirect protein relations through graph algorithms.
The document discusses the rise of platforms and artificial intelligence. It describes how platforms allow users to create and consume value, and how developers can extend platform functionality through APIs. The rise of platforms also fueled advances in artificial intelligence, as platforms accumulated large amounts of user data that could be used to train machine learning models. Specific examples of platforms discussed include Facebook, Amazon, and Google.
Reinventing Identity and Access Management with Graph DatabasesRik Van Bruggen
This document discusses reinventing identity and access management (IAM) with graph databases. It notes that traditional IAM systems have static views of identity that cannot handle today's complex, dynamic identities for users, things, and services. Graph databases allow for modeling the flexible, multidimensional relationships between entities that are needed for modern IAM. Neo4j is highlighted as a graph database that can represent complex identity relationships and hierarchies and enable real-time traversal of these relationships for advanced access management and compliance checks. Case studies demonstrate how Neo4j has been used successfully for flexible, high performance IAM applications.
This document provides an overview of how to embed graph visualization in an application using Neo4j and D3.js. It discusses:
- Accessing Neo4j via REST API or embedded mode
- Using D3.js for graph visualization by converting Neo4j data to nodes and links format
- Typical application architectures with Neo4j server and client-side components
- An example workflow of querying Neo4j, converting the response for D3.js force layout, and building the force layout
- Requirements including D3.js force layout, Neo4j REST API, and converting between Neo4j and D3.js data formats
20150619 GOTO Amsterdam Conference - What Business can learn from DatingRik Van Bruggen
Talk about how graph databases are used in the Dating industry, and how that use-case pattern can actually be used by lots of other industries - new and old.
This document provides an introduction and overview of graph databases. It begins with an introduction to graphs and their history, then discusses what graph databases are and how they complement relational databases. It introduces Neo4j as an example graph database and describes its key aspects like the labeled property graph data model and Cypher query language. The document then discusses when graph databases are applicable and provides examples. It demonstrates graph querying and concludes with case studies and next steps.
Rik Van Bruggen from Neo Technology presented on graph databases and data innovation. A survey showed that 34.3% of respondents worked with relational databases, while 38.4% worked with graph databases. Rik encouraged connecting with the graph database community through meetup groups, conferences like GraphConnect, or by following him on social media.
This document discusses how graphs can be used to solve big data problems by enabling insights through modeling and analyzing connections in data. It provides examples of how graphs have transformed industries like consumer web, telecommunications, and databases. Specifically, it outlines how graphs can be used to model reality, store data in high fidelity, look for connections in content, customers, users and products, and build these graph-based insights into applications for recommendations, impact analysis, dependency management, and risk analysis.
201411203 goto night on graphs for fraud detectionRik Van Bruggen
This document discusses how graph databases can be useful for fraud detection. It begins with an introduction to graphs and graph theory, then discusses how graph databases work and their advantages over relational databases for complex querying and modeling connected data. The document notes that fraud detection relies on real-time analysis, complex patterns, and graph algorithms to navigate relationships. It provides a short demonstration and discusses case studies where graph databases have been successfully used for fraud detection due to their ability to efficiently handle large, interconnected datasets.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
4. Topics
• Graph model building blocks
• Quick intro to Cypher
• Example modeling process
• Modeling tips
• Recipes for common modeling scenarios
• Refactoring
• Test-driven data modeling
9. Nodes
• Used to represent entities and complex value types in your domain
• Can contain properties
  – Used to represent entity attributes and/or metadata (e.g. timestamps, version)
  – Key-value pairs
    • Java primitives
    • Arrays
    • null is not a valid value
  – Every node can have different properties
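As a minimal Cypher sketch of such a node (the property names and values here are invented for illustration; the :Person label is introduced on a later slide):

```cypher
// A node mixing entity attributes (name, languages) with metadata (created).
// null is not a valid property value -- omit the key instead of storing null.
CREATE (p:Person {name: 'Alice', languages: ['en', 'nl'], created: 1418688000})
RETURN p
```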
10. EnEEes
and
Value
Types
• EnEEes
– Have
unique
conceptual
idenEty
– Change
aWribute
values,
but
idenEty
remains
the
same
• Value
types
– No
conceptual
idenEty
– Can
subsEtute
for
each
other
if
they
have
the
same
value
• Simple:
single
value
(e.g.
colour,
category)
• Complex:
mulEple
aWributes
(e.g.
address)
12. Relationships
• Every relationship has a name and a direction
  – Add structure to the graph
  – Provide semantic context for nodes
• Can contain properties
  – Used to represent quality or weight of relationship, or metadata
• Every relationship must have a start node and end node
  – No dangling relationships
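A short Cypher sketch of a named, directed relationship carrying properties (the relationship type and property names are invented for illustration):

```cypher
// KNOWS is the relationship name; the arrow gives it a direction;
// `strength` and `since` are properties qualifying the relationship.
// Both end nodes exist, so the relationship is not dangling.
CREATE (a {name: 'Alice'})-[:KNOWS {strength: 0.8, since: 2012}]->(b {name: 'Bob'})
```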
13. Relationships (continued)
• Nodes can have more than one relationship
• Nodes can be connected by more than one relationship
• Self relationships are allowed
14. Variable Structure
• Relationships are defined with regard to node instances, not classes of nodes
  – Two nodes representing the same kind of "thing" can be connected in very different ways
• Allows for structural variation in the domain
  – Contrast with relational schemas, where foreign key relationships apply to all rows in a table
• No need to use null to represent the absence of a connection
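To sketch this structural variation (data invented for illustration): two nodes of the same kind, one with a WORKS_FOR relationship and one without, and no null placeholder anywhere:

```cypher
// Alice is employed; Bob is not. Bob's node simply lacks the relationship --
// there is no null foreign key standing in for the missing connection.
CREATE (alice:Person {name: 'Alice'})-[:WORKS_FOR]->(acme:Company {name: 'Acme'}),
       (bob:Person {name: 'Bob'})
```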
16. Labels
• Every node can have zero or more labels
• Used to represent roles (e.g. user, product, company)
  – Group nodes
  – Allow us to associate indexes and constraints with groups of nodes
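For example, indexes and constraints attach to a label rather than to individual nodes. A sketch in the Neo4j 2.x-era syntax that matches the `{name}` parameter style used elsewhere in this deck:

```cypher
// Index all nodes labelled :Person by their name property
CREATE INDEX ON :Person(name);

// Require company names to be unique among nodes labelled :Company
CREATE CONSTRAINT ON (c:Company) ASSERT c.name IS UNIQUE;
```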
17. Four Building Blocks
• Nodes – Entities
• Relationships – Connect entities and structure domain
• Properties – Entity attributes, relationship qualities, and metadata
• Labels – Group nodes by role
21. Method
1. Identify application/end-user goals
2. Figure out what questions to ask of the domain
3. Identify entities in each question
4. Identify relationships between entities in each question
5. Convert entities and relationships to paths
  – These become the basis of the data model
6. Express questions as graph patterns
  – These become the basis for queries
22. Application/End-User Goals
As an employee
I want to know who in the company has similar skills to me
So that we can exchange knowledge
23. Questions To Ask of the Domain
As an employee
I want to know who in the company has similar skills to me
So that we can exchange knowledge
Which people, who work for the same company as me, have similar skills to me?
24. Identify Entities
Which people, who work for the same company as me, have similar skills to me?
Person, Company, Skill
25. Identify Relationships Between Entities
Which people, who work for the same company as me, have similar skills to me?
Person WORKS_FOR Company
Person HAS_SKILL Skill
26. Convert to Cypher Paths
Person WORKS_FOR Company
Person HAS_SKILL Skill
(:Person)-[:WORKS_FOR]->(:Company),
(:Person)-[:HAS_SKILL]->(:Skill)
(":Person" is a label; "[:WORKS_FOR]" is a relationship)
30. Express Question as Graph Pattern
Which people, who work for the same company as me, have similar skills to me?
31. Cypher Query
Which people, who work for the same company as me, have similar skills to me?
MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
      (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
WHERE me.name = {name}
RETURN colleague.name AS name,
       count(skill) AS score,
       collect(skill.name) AS skills
ORDER BY score DESC
32. Graph Pattern
Which people, who work for the same company as me, have similar skills to me?
MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
      (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
WHERE me.name = {name}
RETURN colleague.name AS name,
       count(skill) AS score,
       collect(skill.name) AS skills
ORDER BY score DESC
33. Anchor
PaWern
in
Graph
Which
people,
who
work
for
the
same
company
as
me,
have
similar
skills
to
me?
MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
(company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
WHERE me.name = {name}
RETURN colleague.name AS name,
count(skill) AS score,
collect(skill.name) AS skills
ORDER BY score DESC
If an index for Person.name exists, Cypher will use it
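As a sketch, such an index could be created with the Neo4j 2.x-era syntax that matches the `{name}` parameter form used in the queries in this deck:

CREATE INDEX ON :Person(name)

With the index in place, the `WHERE me.name = {name}` predicate anchors the pattern at a single Person node instead of scanning all people.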
34. Create Projection of Results
Which people, who work for the same company as me, have similar skills to me?
MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
(company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
WHERE me.name = {name}
RETURN colleague.name AS name,
count(skill) AS score,
collect(skill.name) AS skills
ORDER BY score DESC
39. From User Story to Model and Query
MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
(company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
WHERE me.name = {name}
RETURN colleague.name AS name,
count(skill) AS score,
collect(skill.name) AS skills
ORDER BY score DESC
As an employee
I want to know who in the company has similar skills to me
So that we can exchange knowledge
Person WORKS_FOR Company
Person HAS_SKILL Skill
(:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)
Which people, who work for the same company as me, have similar skills to me?
42. Use Relationships When…
• You need to specify the weight, strength, or some other quality of the relationship
• AND/OR the attribute value comprises a complex value type (e.g. address)
• Examples:
– Find all my colleagues who are expert (relationship quality) at a skill (attribute value) we have in common
– Find all recent orders delivered to the same delivery address (complex value type)
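The first example might be sketched by qualifying HAS_SKILL with a property on the relationship itself; the `level` property and the value 'expert' are illustrative, not from the deck:

MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
      (company)<-[:WORKS_FOR]-(colleague)-[r:HAS_SKILL]->(skill)
WHERE me.name = {name}
  AND r.level = 'expert'
RETURN colleague.name AS name,
       collect(skill.name) AS expertSkills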
43. Use Properties When…
• There’s no need to qualify the relationship
• AND the attribute value comprises a simple value type (e.g. colour)
• Examples:
– Find those projects written by contributors to my projects that use the same language (attribute value) as my projects
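A possible sketch of the projects example, with `language` as a simple property on the project node; the `CONTRIBUTED_TO` relationship name is a hypothetical stand-in:

MATCH (me:Person)-[:CONTRIBUTED_TO]->(myProject),
      (myProject)<-[:CONTRIBUTED_TO]-(contributor),
      (contributor)-[:CONTRIBUTED_TO]->(project)
WHERE me.name = {name}
  AND project.language = myProject.language
RETURN DISTINCT project.name AS project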
44. If Performance is Critical…
• Small property lookup on a node will be quicker than traversing a relationship
– But traversing a relationship is still faster than a SQL join…
• However, many small properties on a node, or a lookup on a large string or large array property, will impact performance
– Always performance test against a representative dataset
46. Align With Use Cases
• Relationships are the “royal road” into the graph
• When querying, well-named relationships help discover only what is absolutely necessary
– And eliminate unnecessary portions of the graph from consideration
51. Events and Actions
• Often involve multiple parties
• Can include other circumstantial detail, which may be common to multiple events
• Examples
– Patrick worked for Acme from 2001 to 2005 as a Software Developer
– Sarah sent an email to Lucy, copying in David and Claire
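The Patrick example involves more than two parties plus circumstantial detail, so it lends itself to an intermediate event node rather than a single relationship. A sketch, with the Employment label and the relationship names chosen for illustration:

CREATE (patrick:Person {name: 'Patrick'}),
       (acme:Company {name: 'Acme'}),
       (role:Role {title: 'Software Developer'}),
       (job:Employment {from: 2001, to: 2005}),
       (patrick)-[:HAD_JOB]->(job),
       (job)-[:AT_COMPANY]->(acme),
       (job)-[:AS_ROLE]->(role)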
52. Timeline Trees
• Discrete events
– No natural relationships to other events
• You need to find events at differing levels of granularity
– Between two days
– Between two months
– Between two minutes
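A timeline tree attaches events to a hierarchy of time nodes so ranges can be queried at year, month, or day granularity. A minimal sketch of the structure, with illustrative labels and relationship names:

CREATE (y:Year {value: 2014}),
       (m:Month {value: 6}),
       (d:Day {value: 23}),
       (y)-[:CONTAINS]->(m),
       (m)-[:CONTAINS]->(d),
       (e:Event {name: 'Product launch'}),
       (d)-[:HAS_EVENT]->(e)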
55. Modeling Entities as Relationships
• Limits data model evolution
– A relationship connects two things
– Modeling an entity as a relationship prevents it from being related to more than two things
• Smells:
– Lots of attribute-like properties
– Heavy use of relationship indexes
• Entities hidden in verbs:
– E.g. emailed, reviewed
56. Example: Movie Reviews
• Initial requirements:
– People review films
– Application aggregates reviews from multiple sites
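“Reviewed” is an entity hidden in a verb: promoting it to a Review node lets one review connect a person, a film, and the site it was aggregated from, which a single REVIEWED relationship could not. A sketch with illustrative labels, property names, and values:

CREATE (alice:Person {name: 'Alice'}),
       (film:Film {title: 'The Matrix'}),
       (site:Site {name: 'SomeReviewSite'}),
       (review:Review {rating: 4}),
       (alice)-[:WROTE]->(review),
       (review)-[:OF]->(film),
       (review)-[:PUBLISHED_ON]->(site)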