2. Agenda
Dell Digital Introduction
• Pre-assembly Model value
• Separation of concerns: data preparation vs runtime execution
Evolving from SOA to Serverless Architecture – how Graph became a necessity
• Case for Consolidation & Denormalization
• Increased data density (avoiding proliferation)
Why Neo4J?
• Pure SaaS principles
• Horizontal scalability as foundation
• Flexible schema
• Ad-hoc model to leverage Engine capabilities
• Lessons learned
• Data migration strategy
• Ops management – support, monitoring, alerts, administration, backups
Success story of Pricing Engine
11. Why Graph (Neo4j) Service?
• Data "gluing" mechanism for disconnected source systems
• No proliferation during input; denormalized output
• Natural representation of structure and relations
• Schema-less... almost
• Compliance with declarative modeling
• Efficient traversals including recursion and circular references
• Relations (Edges) are first-class citizens
• Distributed load between Reads & Writes
• Support for server-side plugins
14. Denormalization
• Runtime data collection is expensive
• Self-sufficient runtime packages
• Key-value storage
• Data integrity (old package used till new one is ready)
15. Pricing Engine SaaS offering
• Integrating disconnected Data Sources
• Price Stages as business constructs rather than technology process
• Common domain problems (e.g. Rounding & Compensation)
• Price mutating actions via Formulas
• Capabilities over use-cases
• Ad-hoc business model declaration (no code changes)
• Price explanation
• Data usage insight
19. Graph as common denominator
• Modeling of relationships
• Condition Expression
• DNF
• Problems to solve:
• For a given Product, get all applicable Adjustments
• Get all Products affected by the given Adjustments
(Diagram: example condition expression "family.id==11111 and component.id==C-123" connecting an Adjustment to Item I1 through a Classification node (family==11111) and Components C-123 and C-222 under a Module)
22. Migrating legacy data into new Platform
(Diagram: initial load — Product and Component Crawlers, running as Cloud Functions on K8S, pull from the Product and Component Sources and stream data into Neo4J)
23. Migrating legacy data (cont.)
(Diagram: change events — Product, Component and Adjustment streams feed Cloud Functions that write into Neo4J)
24. Production Data Volume
• Uneven load distribution (peaks and valleys)
• Up to 50 Million payloads per day
• 80% are Small payloads: 2-4 Vertices, 1-4 Edges
• 15% are Medium payloads: 4-12 Vertices, 3-12 Edges
• 5% are Large Payloads: up to 4000 vertices, up to 4000 edges
26. Monitoring & Alerts
Splunk
• Ingest all neo4j related logs
• Configure macros to capture log events
• Create custom dashboards
• Set up alerts
27. Neo4j Performance dashboard with Grafana
Prometheus/Grafana
• Export neo4j host and database metrics to Prometheus
• Set up custom dashboards dedicated to host metrics and DB metrics
• Configure email alerts with Alertmanager
28. No client interruption backup/restore
Customized bash script to perform full/incremental backups
Daily backups with crontab
Backups saved to NAS disk
Delphix to refresh non-prod environments
29. Summary: Benefits for business
Unified pricing experience
• Across domains – authoring, shop, transact
• Across regions
Reduced time to market
• Self-service configuration (no recompilation/redeployment)
• No duplicated functionality across domains
Decoupling business from architectural constraints
• Supporting what-if scenarios
• Virtually unlimited price authoring logic
• True Delta price presentation
Platform stability
• Guaranteed SLA
• 99.999% availability
• Controllable system load
Today we are going to talk about the evolutionary transformation that took place in the Dell Digital organization.
We are going to cover the challenges of our legacy architecture, how we decided to address them, and most importantly what technologies we had to leverage to achieve our goal.
Our special focus is going to be on Graph technology, which turned out to be a perfect fit for our architectural objectives.
Our goal was to create a Pricing Engine service implemented under SaaS principles to deliver a scalable, resilient, zero-loss, flexible and highly customizable solution for Dell's pricing needs.
We are also going to highlight the lessons we've learned along our journey, share some statistics about data volume, and describe our Ops model.
Dell Digital covers the set of services required for operating the Commerce platform. There are several domains addressing various aspects of the sale lifecycle, such as Quote, Payment, Order, Product, Cart and Price.
Each of those domains is its own large ecosystem comprising many services, tools, processes and procedures. There is a wide variety of services in use – from legacy systems to modern cloud-based solutions. An important goal is to keep those systems able to communicate with each other while executing the transition from older frameworks and architectures to newer ones.
The process of gradually replacing and retiring legacy systems is what we refer to as Digital Transformation. To a degree, it is almost a surgical procedure – replacing pieces while the entire commerce platform keeps functioning, so that end customers experience no issues.
Dell Digital is spread across the globe physically and logically. In the past, each region was able to operate successfully with a great degree of isolation. Nowadays, however, isolation can make it extremely hard to unify the operating model and, most importantly, the customer experience.
In the realities of the present world, all business units must function in collaboration and synchronization to achieve the company's global objectives.
However, operating different market segments under a unified set of services is not a trivial task, especially with the baggage of separated and disconnected tools and services.
Dell Digital is taking steps toward unifying the segments, starting from the price-authoring concern and ending with the prices displayed on the customer's screen.
Unified and scalable data management allows Dell to expand its business year after year.
Let's dive a bit into history, to the time when Service Oriented Architecture was the dominant gold standard for enterprise systems.
Many of you at some point have dealt with one of the variations of the Enterprise Service Bus architecture.
There are many existing commercial and custom solutions, but the main principle is the same – there is an Orchestrator service that is responsible for gluing together disconnected services.
In some cases the Orchestrator would talk to a legacy tool, in some cases it would talk directly to a DB, but ultimately all obtained data had to be transformed into some common format that the Orchestrator could act upon.
While the idea of an orchestrator that understands data formats from multiple disconnected systems was a great advancement over the monolith architecture, it had its own challenges.
A typical customer request looked like this:
1. Initiate a request to a Commerce Service, which acts as an Orchestrator.
2. The Orchestrator initiates sequential or parallel communication with external Dependency Services or raw data sources.
3. All responses are obtained, the data is processed, and the response is passed back to the caller.
However, as usually happens, reality is often not as bright as it looks on a diagram. Should one of the Dependency Services fail or simply be unhealthy, the entire customer response is compromised – either from an SLA standpoint or, in the worst case, it cannot be completed at all.
This creates a challenge of indirect coupling between various external systems. Of course, each of the identified challenges can be addressed with some architectural improvements, but all of that would come at the expense of complicating the solution.
And as we all know, the simpler the architecture, the better it is from virtually any perspective.
What is the natural step in improving the runtime request experience? Eliminating runtime dependencies on external services. However, we still need the data, right?
The answer is to make data available without the need to make a trip to an external system. That's where serverless, or event-driven, architecture becomes an attractive option for collecting all required data before using it.
The main architectural focus switches from data preparation at runtime towards using ready-to-consume data prepared ahead of time, so that runtime computational and communication cost is minimal.
Prepare your data offline, use your data online.
Upstream systems feed a stream of changes into background services that are responsible for connecting data from various sources and generating self-sufficient data content. Preparing self-sufficient data content can be referred to as denormalization.
Then at runtime, a request is served by a denormalized data package that has no dependency on external systems, services or data.
All the heavy lifting previously executed by the runtime Orchestrator is now handled by an offline Consolidator, which does not participate in runtime execution and thus is not subject to the runtime SLA.
Since we mentioned the denormalization approach, let's have a brief overview of the different flavors of that concept.
The foundation of data presentation in any commerce platform is the price of a product.
The price of a product that you typically see on a screen is usually taken from a pricing document. However, to get that single number onto a screen, there is complex business and procedural logic involved. For large retailers like Dell, with hundreds of thousands of different products available in different geographical and business segments, setting an individual price per product would be highly inefficient. Instead, prices are authored in different systems while targeting specific properties or attributes. Once price-decision points are consolidated from multiple authoring systems, the denormalization process generates a document that will ultimately be used to display prices to end customers.
In the case of a single price point, the denormalization technique can be referred to as pre-compute. One product has one price within the given context. Pretty simple.
However, Dell has its specifics. Many of Dell's offerings allow for product customization. Each selection change results in a different price.
The immediate temptation might be to generate a denormalized price document for each possible configuration selection. However, many of Dell's solutions have hundreds of different configuration options. Simple math provides interesting details – a single product with just 30 multi-select configuration options would generate more than 1 billion permutations.
Even for Big Data, billions of documents for a single product is probably not something we want to deal with.
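The combinatorics above are easy to verify. Assuming, for simplicity, that each of the 30 options is an independent on/off toggle, each option doubles the number of distinct configurations:

```python
# Number of distinct configurations for a product with 30 independent
# on/off options: each option doubles the count.
options = 30
permutations = 2 ** options
print(permutations)  # 1073741824 — already more than a billion documents
```

With multi-value options (e.g. three memory sizes instead of a toggle) the count grows even faster, so pre-computing a document per configuration is clearly off the table.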
A Dell product is usually represented in the form of a Tree Structure. There is a root node, which is the product itself, comprised of different modules such as memory or hard drive; each module may have different selection options. For example, the memory selection can be between 8GB and 16GB, etc.
There is a default configuration that a customer sees when they navigate to the list of products. The default configuration has its price. The price of a product is affected by its selected Options.
If the default price were the only one required, we could leverage the pre-compute model and get the default price for each product as a record in our denormalized repo.
However, Dell offers the possibility of customization. A customer may say – I do not want the default 1TB drive, I want 2TB. As you can imagine, a 2TB drive is more expensive than a 1TB one; therefore the price calculated for the default configuration is not applicable to the custom configuration.
We could potentially find all possible permutations of the selections and calculate a price for each of them. But this can quickly go out of control as the number of possible configuration options increases.
There is another approach. For each price-forming element in the Tree Structure, there can be an associated object containing actionable price information. With this approach we do not store the product price as a single number. Instead, we create a lightweight model of associations between Nodes and price-forming elements. Then at request time, a customer simply provides the desired configuration as input, and the model executes a rollup of prices to generate the resulting price. This is what we refer to as the pre-assembly model.
Still no dependency on external systems, still denormalized content, but the end number is the result of a lightweight in-memory calculation.
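A minimal sketch of the pre-assembly idea (the package layout, option names and prices here are hypothetical, not the Platform's actual format): instead of one pre-computed price per configuration, the package stores per-option price contributions that are rolled up in memory for whatever selection the customer makes.

```python
# Hypothetical pre-assembled package: per-option price contributions
# stored alongside the product tree, rolled up at request time.
package = {
    "base": 499.0,               # price of the fixed part of the product
    "options": {                 # price deltas per selectable option
        "memory-16gb": 80.0,
        "hdd-2tb": 120.0,
        "hdd-1tb": 0.0,          # default selection, no delta
    },
}

def rollup_price(package, selections):
    """Roll up the price for a customer-chosen configuration."""
    return package["base"] + sum(package["options"][s] for s in selections)

default_price = rollup_price(package, ["hdd-1tb"])                # 499.0
custom_price = rollup_price(package, ["memory-16gb", "hdd-2tb"])  # 699.0
```

One package thus serves every configuration of the product, avoiding the billion-document explosion while keeping the request path free of external calls.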
In order to generate a just-in-time price for the given product configuration, Dell's Pricing Engine needs three main types of data – Product Structure, price-forming objects (Adjustments) and optional Components.
Due to the nature of the business, each of those data sources has an independent lifecycle with no direct correlation to the others. To give a simple example, an 8GB memory stick has its own price while being used in thousands of different Products. Price changes to this memory stick may or may not affect the prices of the products where it is installed. Or a change in Product Structure (e.g. adding or removing configuration Options) may affect the default or custom product prices.
Each of those changes is authored and scoped within its own ecosystem. Changes are streamed out in the form of events. The relation between a Product Structure and its Adjustments may not be known ahead of time. The question becomes – how do we connect those pieces of data together so that we can execute a lightweight just-in-time price calculation? In other words, what do we use as the data consolidation mechanism from which we can generate denormalized pre-assembly product models?
The answer is a Graph system. Graph has become the centerpiece of our data consolidation. Denormalized content is the result of Graph traversal logic.
Why Graph? Why not an RDBMS or some other NoSQL DB?
The answer lies in several major factors – resolved associations (as you might remember, a Product and its Adjustments must be associated) and a flexible, dynamically defined schema represented in the form of ad-hoc relations.
In terms of data processing, we were aiming at no proliferation during intake and denormalized output. For example, an 8GB memory stick can be used in thousands of different products, so its price change may impact thousands of price packages. We intake its price change once, and we get a denormalized output of thousands of affected packages.
The relationship between price Adjustments and Products is defined dynamically by a series of attributes rather than via a predefined schema. This implies that we still use some elements of schema, but by no means are we limited to a rigid set of allowed relations.
The depth of relationship between price-forming elements is not strongly defined. Sometimes it can be a direct one-to-one link; in some cases it can be based on inner elements of various hierarchies; in some cases entities may be considered related if both belong to the same forest. As you can imagine, the traversal logic may become complicated, but Graph can take care of that complexity, leaving us with a simple formulation of the traversal goal.
Let's take a quick look at the data density problem. When there is an event that affects one or more products, we do not want to spend intake time identifying all affected elements. And even less do we want to duplicate the event for every possible relation destination.
The order of event delivery is non-deterministic. Sometimes a Price Adjustment event may arrive before its impacted Product, or the other way around.
To address the disconnected nature of data relations, we simply create potential points of connection, or Context Nodes, based on the model declaration. These may or may not be used at all. The important aspect is to have them ready.
For example, two Product events were ingested. Both got linked to Context objects. Once a Price Adjustment event arrived, it got connected to the existing Context only once. But with that one link we gained the ability to identify the relationship with both Products.
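The Context Node idea can be sketched with plain dictionaries (the attribute key and IDs below are made up for illustration): products and adjustments never link to each other directly at intake; both attach to a shared Context derived from their attributes, so arrival order does not matter and each event is written exactly once.

```python
# Sketch of Context Nodes: both sides attach to a Context keyed by the
# connecting attribute value; the Context is created on first touch.
contexts = {}  # context key -> {"products": set, "adjustments": set}

def context_for(key):
    return contexts.setdefault(key, {"products": set(), "adjustments": set()})

def ingest_product(product_id, family_id):
    context_for(("family", family_id))["products"].add(product_id)

def ingest_adjustment(adj_id, family_id):
    context_for(("family", family_id))["adjustments"].add(adj_id)

# Order of arrival does not matter: two products first, adjustment later.
ingest_product("P1", 11111)
ingest_product("P2", 11111)
ingest_adjustment("A1", 11111)  # written once, reaches both products

affected = contexts[("family", 11111)]["products"]  # {"P1", "P2"}
```

In the real platform these are Graph vertices and edges rather than dictionaries, but the dedup property is the same: one Adjustment edge to the Context fans out to every Product already (or later) linked to it.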
Data at Dell can be authored at different levels. Sometimes it is a business catalog, sometimes a customer-specific catalog, region, country, segment or other grouping unit.
Authoring Scopes are a totally separate set of data that is heavily used by the Pricing Platform.
The relationship between authoring scopes is often not hierarchical in nature. It is more of a forest with possible circular references.
Proliferating pricing data across different authoring scopes can create a data explosion.
In the realities of Graph, we can afford the non-linear nature of relations or membership. Membership groups form forests or clusters. This way, relevant price-related input ingested once becomes available to the entire forest.
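Resolving which scopes can see a given price input is then a reachability question over a graph that may contain cycles. A sketch (the scope names and links are hypothetical) with a visited set to survive circular references:

```python
# Membership groups form forests/clusters with possible circular
# references; a visited set keeps the traversal from looping forever.
members_of = {                  # scope -> scopes that inherit from it
    "global-catalog": ["emea", "amer"],
    "emea": ["de", "uk"],
    "de": ["emea"],             # circular reference, allowed by design
}

def reachable_scopes(start):
    """All scopes that can see data ingested at `start`."""
    seen, stack = set(), [start]
    while stack:
        scope = stack.pop()
        if scope in seen:
            continue            # cycle guard: each scope visited once
        seen.add(scope)
        stack.extend(members_of.get(scope, []))
    return seen

reachable_scopes("emea")  # {'emea', 'de', 'uk'}
```

In the Graph DB this traversal is expressed declaratively, but the key point is identical: data is stored once at its authoring scope, and membership edges make it visible to the whole cluster without copying.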
Despite the fact that we have all relevant data stored in Graph, resolving relationships can be an expensive process. During a runtime request we do not want to spend time traversing the Graph to gather all the pieces of data needed to calculate the price.
Instead, we want to store all price-relevant data packages ahead of time. Preparing such packages is done via scalable background processes that are not subject to the SLA agreement. That's basically where denormalization happens. A single price adjustment element can be included in thousands of packages. Preparing an individual package can take some time; however, what matters is how many packages we can produce within a given time. And this is controlled by the degree of parallelism. Spending, for example, 800ms on a single package does not sound too impressive. However, if within the same 800ms we can generate 10K packages, that's already not that bad.
Once the package is prepared, the Pricing Engine has all the necessary information to calculate the Product price per the customer's selection – in one place, with no external dependency. Packages are stored in a key-value store. With this approach there is no undefined data state – until the new package is ready, the older one is used.
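The "old package until the new one is ready" rule amounts to an atomic key swap in the key-value store. A sketch of the rule (store layout and key names are hypothetical, and a real store would swap via its own atomic put):

```python
# Key-value package store: readers always see a complete package because
# the key is repointed only after the replacement is fully built.
store = {"price-pkg:P1": {"version": 1, "prices": {"default": 499.0}}}

def rebuild_package(product_id, new_prices):
    key = f"price-pkg:{product_id}"
    # Expensive offline preparation happens on a side object first...
    new_pkg = {"version": store[key]["version"] + 1, "prices": new_prices}
    # ...and only this final assignment is the switch-over; until it
    # runs, every reader keeps getting the old, consistent package.
    store[key] = new_pkg

rebuild_package("P1", {"default": 519.0})
```

The essential property is that a package is never mutated in place, so a request arriving mid-rebuild sees a complete (if slightly stale) price package rather than a half-written one.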
Let's review the goals we were trying to achieve while redesigning our Pricing Engine Platform and how graph technology allowed us to achieve them.
In a nutshell, the Pricing Engine is a sort of calculator. For given data, it calculates prices against any product represented as a Tree Structure. The prices are calculated in different traversal directions – from the leaves to the root, from the root to the leaves, and any combination in between.
In addition, it solves common commerce problems – rounding, price compensation, currency conversion, etc. – as well as business problems such as price explanation & breakdown, grouping by price category (price vs tax vs discount vs cost), etc.
From day one we set ourselves a goal – it must be implemented as SaaS. Why? Despite the fact that the Service was created to serve Dell's needs, the variety of cases among Dell's internal customers is no different from serving external customers. That meant we could not afford hard-coded use-case implementations, because hard-coded logic for one customer will not work for another.
Therefore, we had to create a platform where the Pricing service would be able to accommodate any customer via self-service configuration. We created a strict set of rules, such as "Pricing does not author data but only serves data". All price-related data is authored in external systems, while the Platform only facilitates connecting different scopes of rules and data together. Or another rule, which is my favorite – "if we implement this capability, can we advertise it as a selling point if we put this Platform on the market?"
As you can imagine, with such a degree of flexibility, it is virtually impossible to predict and maintain a strong data schema. Instead, relationships between data points are created dynamically based on business context and attributes. Graph allows us to preserve data meaning without going into abstraction layers that require several PhDs to comprehend the content. The principle is simple – create natural relations now, use them later. What I mean by "natural" relations is that by looking at your Graph content you should be able to formulate sentences about the data's meaning in plain English, so that even outsiders would understand.
Let's overview some challenges we had to solve along the way.
A Graph DB, just like any other DB, is subject to the CAP Theorem, according to which a data store can provide only two out of three guarantees.
In the case of Neo4J, the guarantees are Consistency and Availability. So we had to decide whether we could live without Partition Tolerance. The answer was yes, but under certain conditions.
Without sharding, the only option to increase data intake throughput is vertical scaling – increasing computational resources. And even though the throughput limit can be quite high, it is still a limit. So we had to ensure data ingestion throttling was in place to prevent service overload.
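One possible shape of such an ingestion throttle, sketched with a semaphore (the limit value is arbitrary and the actual platform mechanism is not specified in this talk): the intake layer caps how many writes can be in flight toward the Leader at once, so a burst of events queues up instead of overloading the vertically scaled core.

```python
# Simple ingestion throttle: cap concurrent writes so an event burst
# blocks at the intake layer rather than overloading the write Leader.
import threading

MAX_INFLIGHT_WRITES = 8                       # arbitrary illustrative limit
write_slots = threading.BoundedSemaphore(MAX_INFLIGHT_WRITES)

def ingest(payload, write_fn):
    with write_slots:                         # blocks when all slots are taken
        write_fn(payload)
```

Real deployments would more likely throttle at the message-queue consumer or connection-pool level, but the principle is the same: back-pressure before the database, not inside it.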
Here I need to mention that Neo4J Fabric allows for partitioning, but currently only for disjoint Graphs, which is not the case for Pricing data.
Another problem to solve: all Neo4J write transactions are executed against the Leader Core. Since real sharding is not an option, the only way to increase write throughput is via vertical scaling – adding computational power to the Neo4J Cores – which of course increases hardware cost.
The reality of Dell's business is that the amount of data keeps growing. So there must be a strategy to deal with the ever-increasing data volume. Vertical scaling can provide only temporary relief.
Fortunately for Dell, there was an alternative. The data has clearly defined geographic region boundaries. So instead of physical data sharding, we were able to organize "logical sharding", where each region serves only the products for that region. Of course, there is a subset of data that is duplicated between regions, but as we all know – duplicated data is better than poorly organized data.
Our end solution still avoids proliferation. The original message is published once; each region picks only the data it needs, and in some cases the same message may be picked by more than one region.
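Logical sharding can be illustrated as a publish-once / filter-per-region pattern (the region names and predicates below are invented for illustration): every regional intake applies its own predicate to the shared stream, and a message may legitimately match several regions.

```python
# Publish-once, pick-per-region: each regional intake applies its own
# predicate to the shared stream; a message may match several regions.
REGION_FILTERS = {
    "AMER": lambda msg: {"US", "BR"} & set(msg["countries"]),
    "EMEA": lambda msg: {"DE", "UK"} & set(msg["countries"]),
}

def route(message):
    """Return the set of regions that should ingest this message."""
    return {region for region, accepts in REGION_FILTERS.items()
            if accepts(message)}

route({"countries": ["US", "DE"]})  # picked by both AMER and EMEA
route({"countries": ["BR"]})        # picked by AMER only
```

The publisher never needs to know the regions; adding a region means adding a filter, not changing upstream systems.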
Having Neo4J as our Graph service allowed us to achieve a truly data-driven solution. What does that mean?
Any data ingested into our system has two separate categories of properties. The first category drives connection points (or potential connection points) between data types – Product Structure, Adjustments and Components. However, the fact that an Adjustment is linked to a Product does not mean it plays an immediate role in its price calculation. The relations defined at the Graph level are nothing more than "runtime candidates". That means an Adjustment has the potential to be applied.
Whether or not an Adjustment gets applied is subject to the runtime Request Context. This second category of data drives the "final decision" on what is applicable and what is not. This way, two different categories of customers requesting the price of the same product may see two different prices.
Since our focus is Graph, let's look at the first category of relations, which defines price Adjustment candidates. Typically, business authors price-affecting constructs in the form of a Boolean tree of attributes using and/or/not/contains/starts-with/etc. clauses. This is referred to as a condition-expression, which in its raw form is just a string. The question, however, is how this gets applied at the Graph level.
Each message goes through a data decomposition phase which ultimately gets translated into vertices and edges at the Graph level, and the condition-expression plays a major role in forming associations.
An Adjustment becomes relevant to a Product only if one or more conditions are satisfied.
A Condition Expression, being a Binary Tree, can complicate Graph traversal. To simplify traversals, we want to avoid complex logic. For this, we flatten the condition-expression by converting it into Disjunctive Normal Form. A complex expression with nested clauses becomes just a flat list of AND clauses combined by an OR clause. Within each AND clause there is a list of attributes to look for. Each AND clause can be processed separately during traversal, and if at least one AND clause is satisfied, the Adjustment gets associated with the Product. This significantly simplifies Graph traversal.
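The flattening step can be sketched in a few lines (this is an illustrative converter, not the platform's parser, and it assumes NOT has already been pushed down to the leaves): AND distributes over OR, so the result is a flat list of AND clauses joined by OR.

```python
# Sketch: flatten a nested and/or condition tree into Disjunctive Normal
# Form — a list of AND clauses (each a list of leaf conditions), joined
# by OR. NOT is assumed to be already pushed down to the leaves.
from itertools import product

def to_dnf(expr):
    """expr is a leaf string, or a tuple ("and"|"or", [subexpressions])."""
    if isinstance(expr, str):                      # leaf condition
        return [[expr]]
    op, args = expr
    branches = [to_dnf(a) for a in args]
    if op == "or":                                 # OR: concatenate clause lists
        return [clause for b in branches for clause in b]
    # AND: cross-product of sub-clauses, merged into single AND clauses
    return [sum(combo, []) for combo in product(*branches)]

expr = ("and", ["family.id==11111",
                ("or", ["component.id==C-123", "component.id==C-222"])])
to_dnf(expr)
# [['family.id==11111', 'component.id==C-123'],
#  ['family.id==11111', 'component.id==C-222']]
```

Each resulting AND clause can then be checked independently during traversal; one satisfied clause is enough to link the Adjustment to the Product.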
Ultimately, we want only two types of answers from our Graph storage:
For a given Product, give me all relevant Price Adjustments
For a given Adjustment, give me all Products where it is applied
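Once Adjustments are linked to Products through satisfied DNF clauses, both questions reduce to one-hop lookups. An in-memory stand-in for the Graph edges (not actual Neo4J code; IDs are hypothetical) shows why writing the edge once makes both query directions cheap:

```python
# Bidirectional candidate lookup over adjacency sets: the edge is
# written once at ingestion, then both query directions are lookups.
adjustments_of = {}   # product id -> set of adjustment ids
products_of = {}      # adjustment id -> set of product ids

def link(product_id, adjustment_id):
    adjustments_of.setdefault(product_id, set()).add(adjustment_id)
    products_of.setdefault(adjustment_id, set()).add(product_id)

link("P1", "A1")
link("P2", "A1")
link("P1", "A2")

adjustments_of["P1"]  # {"A1", "A2"}: all candidate Adjustments for P1
products_of["A1"]     # {"P1", "P2"}: all Products affected by A1
```

In the Graph DB the second direction comes for free because edges are first-class and traversable both ways; no inverted index has to be maintained by hand.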
While we use a loosely defined schema on ingested data, there can be edge cases that require limited customer-specific data relations or traversals.
Instead of creating a separate DB, we just create an ad-hoc sub-model that exists side by side with the main data content. The question is how the ad-hoc data is processed.
Again, we leverage declarative syntax. Upon data ingestion, if some attribute-matching criteria are met, our data processing pipeline can infer special instructions on how to interpret or decompose the relationships. This may include inferring Labels, defining attributes that should be exposed as separate Nodes, and special traversal instructions such as which Node Labels to hop while traversing the Graph.
This is not a user-friendly type of instruction to specify, but this mechanism allows us to quickly accommodate business needs without any code recompilation or redeployment.
A few tips that we learned along the way. Some of them may be obvious but still were not considered until real-life situations pointed to them. These points are specific to Neo4J and may not be applicable to other Graph solutions.
Neo4J executes all write operations on the Leader Core. Depending on the data load, this may burn a lot of CPU. On the other hand, read operations requiring Graph traversal do not come for free either. However, unlike write operations, they can be executed on Follower Cores or Read Replicas. By explicitly specifying the type of transaction, you can redirect read transactions to less busy instances, thus giving more write room to the Leader. Additionally, read replicas can be scaled horizontally.
Another factor to keep in mind is that Neo4J is most efficient when the entire content fits within the memory of a Core. In our case, we store in Graph only the data that is relevant to defining the relationship between a Product and its Price Adjustments. Any additional information is stored in another DB such as Mongo, Redis or Blob storage. This way you can utilize the Core's memory most efficiently and not waste its CPU cycles on paging.
Another useful trick is separating data by Vertex labeling. Neo4J Vertices can have multiple labels. This way, depending on the Label picked, a Vertex can be seen as part of the main schema or of an ad-hoc schema.
Neo4J considers a write transaction successful if the majority of Cores acknowledge the write operation. However, that does not mean Read Replicas participate in the confirmation. Newly inserted data gets propagated to read replicas at a later stage. But how do we know whether the read replica we hit has already received the data we just inserted? Neo4J has a useful mechanism of bookmarking. Once you execute your write transaction, you get bookmarks. You can pass those bookmarks to your read transactions. If you hit a read replica that has not yet received the new data, the bookmark will hold your request until the data gets delivered to that replica.
Another tip that we learned the hard way: when you use a stretch cluster setup (Cores in separate data centers) and one data center is slower than the others, in order to prevent write operations from happening on that slow Core, you can disallow it from being a Leader. It will still participate in Leader elections but will not become a Leader itself.
A few words about our migration strategy. While analyzing the effort required to convert the old DB content into the Graph repo, we found that investing in a direct data load would not be practical. The reason is that there are a lot of declarative business rules that must be considered when processing data. To accommodate all the business rules for defining relations, the entire pricing functionality would have had to be replicated in the migration tool. Given that this would be a throw-away investment, we decided to go with a different approach.
All data coming into the Pricing Platform arrives in a canonical format that is independent of upstream formats. Conversion is achieved with a set of microservices that we call Adapters. A data event arrives at an Adapter in the upstream system's native format, and the Adapter's responsibility is to convert it into the canonical representation and pass it to the intake endpoint.
So instead of investing in the throw-away effort of a data migration tool, we decided to take a reusable approach. Intake Adapters were extended to operate in Crawling mode. That means that instead of waiting for upstream system events, an Adapter would go to the source system and explicitly pull all available data.
This was like a big-bang approach with maxed-out scaling of Cloud Functions to maximize throughput, while making sure we did not kill our Neo4J instance with the number of connections and write operations.
So, the short answer to our migration strategy – no direct DB migration, just native data intake as if it were new data.
Once the initial big-bang load is over, adapters go back to their normal operation mode and simply receive change notifications, but the end action is still the same – receive data in its native format, convert it into the canonical schema, then submit it to the Pricing Intake endpoint.
This way, the only difference between the initial data migration and normal event processing is the volume of data.
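The adapter's two modes share a single conversion path. A sketch (all names, fields and the canonical shape are hypothetical): crawl mode iterates over everything the source system has, event mode handles one change notification, and both end in the same canonical submit.

```python
# One conversion path, two triggers: crawl mode pulls everything from
# the source system, event mode handles single change notifications —
# both end in the same canonical submission.
def to_canonical(native):
    """Convert an upstream-native record to the canonical intake schema."""
    return {"id": native["productId"], "type": "product", "payload": native}

def handle_event(native_event, submit):
    submit(to_canonical(native_event))           # normal operation mode

def crawl(source_records, submit):
    for record in source_records:                # big-bang initial load
        submit(to_canonical(record))

received = []
crawl([{"productId": "P1"}, {"productId": "P2"}], received.append)
handle_event({"productId": "P3"}, received.append)
# received now holds three canonical messages, regardless of trigger
```

Because the intake endpoint only ever sees canonical messages, it genuinely cannot tell a migrated record from a fresh event, which is exactly what makes the migration tooling reusable.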
A bit of detail about the data volume the Pricing Platform is serving.
Per day we usually receive anywhere between 5 and 50 million events.
Each event represents a Tree Structure. However, not all tree structures are equal. The vast majority of events, about 80%, are small trees representing Price Adjustments – typically no more than 4-5 vertices and edges.
About 15% are medium-size messages, where the number of vertices and edges can go up to 10-12.
And about 5% of all events represent large tree structures, where the number of Vertices and Edges can go up to 4000.
As you can imagine, each of those events has meaning and a potential impact on Price. That means we must guarantee a zero-loss processing system. How this is achieved is a separate subject, but we can say that without bullet-proof infrastructure stability it would be very difficult to accomplish. Let's overview how we manage our infrastructure.
All our Neo4j clusters run on version 4.4.x and are spread across 3 different data centers. Of these 3 DCs, 2 are close by and the third is remote. Each cluster consists of 3 core nodes (one in each data center) and 4 read replicas (2 in each of the close-by data centers, near the app layer). We enabled server groups to make sure the Leader node stays in one of the close-by DCs, as well as to prevent app calls to the remote DC.
Each node has 24 cores, 192 GB RAM and SSD storage, with OEL8 running on VMware. As for users, we use both LDAP and native authentication, with HTTPS communication enabled using the Dell certificate authority.
Splunk for DB log analytics.
We install Splunk agents on all neo4j servers to ingest DB logs into Splunk (neo4j logs, debug logs, query & security logs) in real time. We custom-built multiple dashboards using macros to provide better visibility to the app team.
We also created multiple alerts to identify high-severity incidents like a node losing communication, out of memory, running out of threads, the Neo4j service being down, etc.
Splunk data retention is 90 days, so it's easy to go back and troubleshoot a specific interval.
Prometheus/Grafana captures both host metrics and DB metrics from the neo4j clusters. Alertmanager sends alert emails to the DBA distro.
Grafana is our main troubleshooting tool for all production incidents.
Backups happen from the remote data center's core node, as this node doesn't serve app traffic.
Full backups are performed on a daily basis to a backup NAS disk, and restore validation is performed on a quarterly basis.
All non-prod refreshes happen through Delphix.
All the changes we have spoken about are good as academic concepts, but the real question is how the business benefits from this architecture. We can mention a few benefits.
The unified pricing experience guarantees that all commerce domains and regions deal with a common format understood by all stakeholders.
Time to market is an essential factor in staying competitive. 90% of all new cases are addressed via configuration of existing capabilities. No code recompilation, no redeployment. In the past, each commerce segment and region had its own implementation of the Pricing Service. With the new Engine implemented as SaaS this is no longer the case, letting us save on labor and maintenance.
Another major gain is that the business now has the freedom to experiment with virtually any type of price authoring logic without coordinating with the backend. The Pricing Engine ensures that all real and virtual prices are processed equally. This allows us to accommodate non-linear logic such as tier-based discounts while still giving an accurate delta price presentation. The Delta Price is the price difference between the currently selected configuration and a would-be selected configuration.
And system-wide we've got a stable SLA, because response time no longer depends on dependency services – all relevant information is available ahead of time. In addition, the business no longer needs to coordinate data loads, because the architecture ensures a predictable load on the services that cannot scale horizontally.