The document discusses strategies for loading data in JPA, specifically lazy vs eager loading. It provides examples of when each strategy would be preferred, noting that eager loading is generally better for loading a small amount of related data to avoid additional database queries, while lazy loading is preferred when the related data is large or its needed fields are unknown. It also discusses using lazy loading for large objects like blobs and clobs.
This is a basic tutorial on Spring core.
Best viewed when animations and transitions are supported, e.g., view in MS Powerpoint. So, please try to view it with animation else the main purpose of this presentation will be defeated.
In this session Sam Weaver, Product Manager at MongoDB will introduce MongoDB Compass, a new tool developed by MongoDB that allows you to easily visualize your MongoDB schema and build queries graphically. The session will include a practical demonstration of Compass as well as a discussion of its features and applications.
This is a basic tutorial on Spring core.
Best viewed when animations and transitions are supported, e.g., view in MS Powerpoint. So, please try to view it with animation else the main purpose of this presentation will be defeated.
In this session Sam Weaver, Product Manager at MongoDB will introduce MongoDB Compass, a new tool developed by MongoDB that allows you to easily visualize your MongoDB schema and build queries graphically. The session will include a practical demonstration of Compass as well as a discussion of its features and applications.
Tez is the next generation Hadoop Query Processing framework written on top of YARN. Computation topologies in higher level languages like Pig/Hive can be naturally expressed in the new graph dataflow model exposed by Tez. Multi-stage queries can be expressed as a single Tez job resulting in lower latency for short queries and improved throughput for large scale queries. MapReduce has been the workhorse for Hadoop but its monolithic structure had made innovation slower. YARN separates resource management from application logic and thus enables the creation of Tez, a more flexible and generic new framework for data processing for the benefit of the entire Hadoop query ecosystem.
Importing Data into Neo4j quickly and easily - StackOverflowNeo4j
In this GraphConnect presentation Mark and Michael show several ways to import large amounts of highly connected data from different formats into Neo4j. Both Cypher's LOAD CSV as well as the bulk importer is demonstrated along with many tips.
We use the well know StackOverflow Q&A site data which is interestingly very graphy.
If you’re already a SQL user then working with Hadoop may be a little easier than you think, thanks to Apache Hive. It provides a mechanism to project structure onto the data in Hadoop and to query that data using a SQL-like language called HiveQL (HQL).
This cheat sheet covers:
-- Query
-- Metadata
-- SQL Compatibility
-- Command Line
-- Hive Shell
In this Introduction to Apache Sqoop the following topics are covered:
1. Why Sqoop
2. What is Sqoop
3. How Sqoop Works
4. Importing and Exporting Data using Sqoop
5. Data Import in Hive and HBase with Sqoop
6. Sqoop and NoSql data store i.e. MongoDB
7. Resources
Kafka Streams vs. KSQL for Stream Processing on top of Apache KafkaKai Wähner
Spoilt for Choice – Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka:
Apache Kafka is a de facto standard streaming data processing platform. It is widely deployed as event streaming platform. Part of Kafka is its stream processing API “Kafka Streams”. In addition, the Kafka ecosystem now offers KSQL, a declarative, SQL-like stream processing language that lets you define powerful stream-processing applications easily. What once took some moderately sophisticated Java code can now be done at the command line with a familiar and eminently approachable syntax.
This session discusses and demos the pros and cons of Kafka Streams and KSQL to understand when to use which stream processing alternative for continuous stream processing natively on Apache Kafka infrastructures. The end of the session compares the trade-offs of Kafka Streams and KSQL to separate stream processing frameworks such as Apache Flink or Spark Streaming.
Security is one of fundamental features for enterprise adoption. Specifically, for SQL users, row/column-level access control is important. However, when a cluster is used as a data warehouse accessed by various user groups via different ways, it is difficult to guarantee data governance in a consistent way. In this talk, we focus on SQL users and talk about how to provide row/column-level access controls with common access control rules throughout the whole cluster with various SQL engines, e.g., Apache Spark 2.1, Apache Spark 1.6 and Apache Hive 2.1. If some of rules are changed, all engines are controlled consistently in near real-time. Technically, we enables Spark Thrift Server to work with an identify given by JDBC connection and take advantage of Hive LLAP daemon as a shared and secured processing engine. We demonstrate row-level filtering, column-level filtering and various column maskings in Apache Spark with Apache Ranger. We use Apache Ranger as a single point of security control center.
Working with JSON Data in PostgreSQL vs. MongoDBScaleGrid.io
In this post, we are going to show you tips and techniques on how to effectively store and index JSON data in PostgreSQL vs. MongoDB. Learn more in the blog post: https://scalegrid.io/blog/using-jsonb-in-postgresql-how-to-effectively-store-index-json-data-in-postgresql
Benchmarking is hard. Benchmarking databases, harder. Benchmarking databases that follow different approaches (relational vs document) is even harder.
But the market demands these kinds of benchmarks. Despite the different data models that MongoDB and PostgreSQL expose, many organizations face the challenge of picking either technology. And performance is arguably the main deciding factor.
Join this talk to discover the numbers! After $30K spent on public cloud and months of testing, there are many different scenarios to analyze. Benchmarks on three distinct categories have been performed: OLTP, OLAP and comparing MongoDB 4.0 transaction performance with PostgreSQL's.
What would be faster, MongoDB or PostgreSQL?
[2019] 바르게, 빠르게! Reactive를 품은 Spring KafkaNHN FORWARD
※다운로드하시면 더 선명한 자료를 보실 수 있습니다.
Spring Kafka 2.3에 추가된 Reactive API를 소개합니다.
모니터링시스템에서 감지한 이상 현상을 담당자들에게 통지하는 실제 사례를 중심으로 설명합니다.
Reactive 방식으로 메시지를 발행하고 소비하는 방법을 소개하고, 읽어 들인 이벤트 메시지에 적용해야 할 여러 복잡한 요구 사항을 Rx의 연산자들을 통해 간결하게 구현하는 예제를 공유합니다.
Publisher와 Subscriber 간의 동작 구조를 통해 여러 시스템 그리고 저장소와 연계할 때 주의할 점을 되짚어보고, 특히 Kafka를 이용해서 생길 수 있는 문제와 이를 해결할 방법을 제안합니다.
목차
1. Kafka 메시지를 비동기로 처리하는 방법
2. ReactiveX에서 제공하는 연산자를 활용하는 사례
3. Project Reactor의 내부 구조(Publisher-Subscriber 간 처리 흐름)
대상
- Reactive Programming에 관심 있는 분
- Kafka 등 스트리밍 플랫폼의 메시지 처리량을 높이고 싶은 분
■관련 동영상: https://youtu.be/HzQfJNusnO8
Tez is the next generation Hadoop Query Processing framework written on top of YARN. Computation topologies in higher level languages like Pig/Hive can be naturally expressed in the new graph dataflow model exposed by Tez. Multi-stage queries can be expressed as a single Tez job resulting in lower latency for short queries and improved throughput for large scale queries. MapReduce has been the workhorse for Hadoop but its monolithic structure had made innovation slower. YARN separates resource management from application logic and thus enables the creation of Tez, a more flexible and generic new framework for data processing for the benefit of the entire Hadoop query ecosystem.
Importing Data into Neo4j quickly and easily - StackOverflowNeo4j
In this GraphConnect presentation Mark and Michael show several ways to import large amounts of highly connected data from different formats into Neo4j. Both Cypher's LOAD CSV as well as the bulk importer is demonstrated along with many tips.
We use the well know StackOverflow Q&A site data which is interestingly very graphy.
If you’re already a SQL user then working with Hadoop may be a little easier than you think, thanks to Apache Hive. It provides a mechanism to project structure onto the data in Hadoop and to query that data using a SQL-like language called HiveQL (HQL).
This cheat sheet covers:
-- Query
-- Metadata
-- SQL Compatibility
-- Command Line
-- Hive Shell
In this Introduction to Apache Sqoop the following topics are covered:
1. Why Sqoop
2. What is Sqoop
3. How Sqoop Works
4. Importing and Exporting Data using Sqoop
5. Data Import in Hive and HBase with Sqoop
6. Sqoop and NoSql data store i.e. MongoDB
7. Resources
Kafka Streams vs. KSQL for Stream Processing on top of Apache KafkaKai Wähner
Spoilt for Choice – Kafka Streams vs. KSQL for Stream Processing on top of Apache Kafka:
Apache Kafka is a de facto standard streaming data processing platform. It is widely deployed as event streaming platform. Part of Kafka is its stream processing API “Kafka Streams”. In addition, the Kafka ecosystem now offers KSQL, a declarative, SQL-like stream processing language that lets you define powerful stream-processing applications easily. What once took some moderately sophisticated Java code can now be done at the command line with a familiar and eminently approachable syntax.
This session discusses and demos the pros and cons of Kafka Streams and KSQL to understand when to use which stream processing alternative for continuous stream processing natively on Apache Kafka infrastructures. The end of the session compares the trade-offs of Kafka Streams and KSQL to separate stream processing frameworks such as Apache Flink or Spark Streaming.
Security is one of fundamental features for enterprise adoption. Specifically, for SQL users, row/column-level access control is important. However, when a cluster is used as a data warehouse accessed by various user groups via different ways, it is difficult to guarantee data governance in a consistent way. In this talk, we focus on SQL users and talk about how to provide row/column-level access controls with common access control rules throughout the whole cluster with various SQL engines, e.g., Apache Spark 2.1, Apache Spark 1.6 and Apache Hive 2.1. If some of rules are changed, all engines are controlled consistently in near real-time. Technically, we enables Spark Thrift Server to work with an identify given by JDBC connection and take advantage of Hive LLAP daemon as a shared and secured processing engine. We demonstrate row-level filtering, column-level filtering and various column maskings in Apache Spark with Apache Ranger. We use Apache Ranger as a single point of security control center.
Working with JSON Data in PostgreSQL vs. MongoDBScaleGrid.io
In this post, we are going to show you tips and techniques on how to effectively store and index JSON data in PostgreSQL vs. MongoDB. Learn more in the blog post: https://scalegrid.io/blog/using-jsonb-in-postgresql-how-to-effectively-store-index-json-data-in-postgresql
Benchmarking is hard. Benchmarking databases, harder. Benchmarking databases that follow different approaches (relational vs document) is even harder.
But the market demands these kinds of benchmarks. Despite the different data models that MongoDB and PostgreSQL expose, many organizations face the challenge of picking either technology. And performance is arguably the main deciding factor.
Join this talk to discover the numbers! After $30K spent on public cloud and months of testing, there are many different scenarios to analyze. Benchmarks on three distinct categories have been performed: OLTP, OLAP and comparing MongoDB 4.0 transaction performance with PostgreSQL's.
What would be faster, MongoDB or PostgreSQL?
[2019] 바르게, 빠르게! Reactive를 품은 Spring KafkaNHN FORWARD
※다운로드하시면 더 선명한 자료를 보실 수 있습니다.
Spring Kafka 2.3에 추가된 Reactive API를 소개합니다.
모니터링시스템에서 감지한 이상 현상을 담당자들에게 통지하는 실제 사례를 중심으로 설명합니다.
Reactive 방식으로 메시지를 발행하고 소비하는 방법을 소개하고, 읽어 들인 이벤트 메시지에 적용해야 할 여러 복잡한 요구 사항을 Rx의 연산자들을 통해 간결하게 구현하는 예제를 공유합니다.
Publisher와 Subscriber 간의 동작 구조를 통해 여러 시스템 그리고 저장소와 연계할 때 주의할 점을 되짚어보고, 특히 Kafka를 이용해서 생길 수 있는 문제와 이를 해결할 방법을 제안합니다.
목차
1. Kafka 메시지를 비동기로 처리하는 방법
2. ReactiveX에서 제공하는 연산자를 활용하는 사례
3. Project Reactor의 내부 구조(Publisher-Subscriber 간 처리 흐름)
대상
- Reactive Programming에 관심 있는 분
- Kafka 등 스트리밍 플랫폼의 메시지 처리량을 높이고 싶은 분
■관련 동영상: https://youtu.be/HzQfJNusnO8
Mr. Mario Meštrović showcases some pros and cons of choosing a Micro ORM framework over traditionally well known Entity Framework. Stack Overflow uses Dapper. Lets see why.
Spring Data is a high level SpringSource project whose purpose is to unify and ease the access to different kinds of persistence stores, both relational database systems and NoSQL data stores.
Amazon Webservices for Java Developers - UCI WebinarCraig Dickson
Amazon Web Services (AWS) offers IT infrastructure services to businesses in the form of web services - now commonly known as cloud computing. AWS is an ideal platform to develop on and host enterprise Java applications, due to the zero up front costs and virtually infinite scalability of resources. Learn basic AWS concepts and work with many of the available services. Gain an understanding of how existing JavaEE applications can be migrated to the AWS environment and what the advantages are. Discover how to architect a new JavaEE application from the ground up to leverage the AWS environment for maximum benefit.
How to define candidate level on the interview?
How to do that quickly and precisely?
How to know one's technical level? How to improve it and build successful IT career? I try to answer all those questions in my talk.
Java Persistence API is a collection of classes and methods to persistently store the vast amounts of data into a database which is provided by the Oracle Corporation.
Generally, Java developers use lots of code, or use the proprietary framework to interact with the database, whereas using JPA, the burden of interacting with the database reduces significantly. It forms a bridge between object models (Java program) and relational models (database program).
Java Persistence API (JPA) - A Brief OverviewCraig Dickson
This is a lightning presentation given by Scott Rabon, a member of my development team. He presents a high level overview of the JPA based on his first exposure to it.
Introduction to JPA and Hibernate including examplesecosio GmbH
In this talk, held as part of the Web Engineering lecture series at Vienna University of Technology, we introduce the main concepts of Java Persistence API (JPA) and Hibernate.
The first part of the presentation introduces the main principles of JDBC and outlines the major drawbacks of JDBC-based implementations. We then further outline the fundamental principles behind the concept of object relation mapping (ORM) and finally introduce JPA and Hibernate.
The lecture is accompanied by practical examples, which are available on GitHub.
Spring Data Requery is alternatives of Spring Data JPA
Requery is lightweight ORM for DBMS (MySQL, PostgreSQL, H2, SQLite, Oracle, SQL Server)
Spring Data Requery provide Query By Native Query, Query By Example and Query By Property like Spring Data JPA
Spring Data Requery is better performance than JPA
Building Scalable Stateless Applications with RxJavaRick Warren
RxJava is a lightweight open-source library, originally from Netflix, that makes it easy to compose asynchronous data sources and operations. This presentation is a high-level intro to this library and how it can fit into your application.
Hadoop became the most common systm to store big data.
With Hadoop, many supporting systems emerged to complete the aspects that are missing in Hadoop itself.
Together they form a big ecosystem.
This presentation covers some of those systems.
While not capable to cover too many in one presentation, I tried to focus on the most famous/popular ones and on the most interesting ones.
Hadoop became the most common systm to store big data.
With Hadoop, many supporting systems emerged to complete the aspects that are missing in Hadoop itself.
Together they form a big ecosystem.
This presentation covers some of those systems.
While not capable to cover too many in one presentation, I tried to focus on the most famous/popular ones and on the most interesting ones.
Building Deep Learning Workflows with DL4JJosh Patterson
In this session we will take a look at a practical review of what is deep learning and introduce DL4J. We’ll look at how it supports deep learning in the enterprise on the JVM. We’ll discuss the architecture of DL4J’s scale-out parallelization on Hadoop and Spark in support of modern machine learning workflows. We’ll conclude with a workflow example from the command line interface that shows the vectorization pipeline in Canova producing vectors for DL4J’s command line interface to build deep learning models easily.
Do you want to see live Kubernetes hacking? Come to see interactive demos where your newly registered accounts in a k8s application are hijacked.
This talk guides you through various security risk of Kubernetes, focusing on OWASP Kubernetes Top 10 list. In live demos, you will find out how to exploit a range of vulnerabilities or misconfigurations in your k8s clusters, attacking containers, pods, network, or k8s components, leading to an ultimate compromise of user accounts in an exemplary web application.
You will learn about common mistakes and vulnerabilities along with the best practices for hardening your Kubernetes systems.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
2. About Me
• 15+ professional experience
– SoGware engineer, architect, head of soGware R&D
• Author and speaker
– JavaOne, Devoxx, JavaZone, TheServerSide Java
Symposium, Jazoon, OOPSLA, ASE, others
• Finalizing PhD in Computer Science
• Founder and CTO of Yonita
– Bridge the gap between the industry and the academia
– Automated detecUon and refactoring of soGware defects
– Security, performance, concurrency, databases
• TwiVer: @yonlabs
3. Outline
• MoUvaUon
• Why?
– Expect unexpected!
• What?
– Use cases and corner cases
– Hints on strategies
• How?
– JPA 2.1
• Conclusion
6. Hibernate JPA Provider
Heads of Hydra
@Entity
public class Hydra {
private Long id;
private List<Head> heads = new ArrayList<Head>();
@Id @GeneratedValue
public Long getId() {...}
protected void setId() {...}
@OneToMany(cascade=CascadeType.ALL)
public List<Head> getHeads() {
return Collections.unmodifiableList(heads);
}
protected void setHeads() {...}
}
// new EntityManager and new transaction: creates and persists the hydra with 3 heads
// new EntityManager and new transaction
Hydra found = em.find(Hydra.class, hydra.getId());
7. How Many Queries in 2nd Tx?
@Entity
public class Hydra {
private Long id;
private List<Head> heads = new ArrayList<Head>();
@Id @GeneratedValue
public Long getId() {...}
protected void setId() {...}
@OneToMany(cascade=CascadeType.ALL)
public List<Head> getHeads() {
return Collections.unmodifiableList(heads);
}
protected void setHeads() {...}
}
// new EntityManager and new transaction: creates and persists the hydra with 3 heads
// new EntityManager and new transaction
Hydra found = em.find(Hydra.class, hydra.getId());
(a) 1 select
(b) 2 selects
(c) 1+3 selects
(d) 2 selects, 1 delete, 3
inserts
(e) None of the above
8. How Many Queries in 2nd Tx?
(a) 1 select
(b) 2 selects
(c) 1+3 selects
(d) 2 selects, 1 delete, 3 inserts
(e) None of the above
During commit hibernate checks whether the
collection property is dirty (needs to be re-created)
by comparing Java identities (object references).
9. Another Look
@Entity
public class Hydra {
private Long id;
private List<Head> heads = new ArrayList<Head>();
@Id @GeneratedValue
public Long getId() {...}
protected void setId() {...}
@OneToMany(cascade=CascadeType.ALL)
public List<Head> getHeads() {
return Collections.unmodifiableList(heads);
}
protected void setHeads() {...}
}
// new EntityManager and new transaction: creates and persists the hydra with 3 heads
// new EntityManager and new transaction
// during find only 1 select (hydra)
Hydra found = em.find(Hydra.class, hydra.getId());
// during commit 1 select (heads),1 delete (heads),3 inserts (heads)
10. Lessons Learned
• Expect unexpected ;-)
• Prefer field access mappings
• Operate on collection objects returned by
hibernate
–Don’t change collection references unless you know
what you’re doing
11. Lessons Learned
• Expect unexpected ;-)
• Prefer field access mappings
• Operate on collection objects returned by
hibernate
–Don’t change collection references unless you know
what you’re doing
List<Head> newHeads = new List<>(hydra.getHeads());
Hydra.setHeads(newHeads);
13. Lessons Learned
• A lot of depends on a JPA Provider!
• JPA is a spec
– A great spec, but only a spec
– It says what to implement, not how to implement
• You need to tune an application in a concrete
environment
23. Sum of Salaries By Country
Select All (1)
TypedQuery<Employee> query = em.createQuery(
"SELECT e FROM Employee e", Employee.class);
List<Employee> list = query.getResultList();
// calculate sum of salaries by country
// map: country->sum
Map<String, BigDecimal> results = new HashMap<>();
for (Employee e : list) {
String country = e.getAddress().getCountry();
BigDecimal total = results.get(country);
if (total == null) total = BigDecimal.ZERO;
total = total.add(e.getSalary());
results.put(country, total);
}
24. Sum of Salaries by Country
Select Join Fetch (2)
TypedQuery<Employee> query = em.createQuery(
"SELECT e FROM Employee e
JOIN FETCH e.address", Employee.class);
List<Employee> list = query.getResultList();
// calculate sum of salaries by country
// map: country->sum
Map<String, BigDecimal> results = new HashMap<>();
for (Employee e : list) {
String country = e.getAddress().getCountry();
BigDecimal total = results.get(country);
if (total == null) total = BigDecimal.ZERO;
total = total.add(e.getSalary());
results.put(country, total);
}
25. ReporUng AnU-PaVerns
ProjecUon (3)
Query query = em.createQuery(
"SELECT e.salary, e.address.country
FROM Employee e”);
List<Object[]> list = query.getResultList();
// calculate sum of salaries by country
// map: country->sum
Map<String, BigDecimal> results = new HashMap<>();
for (Object[] e : list) {
String country = (String) e[1];
BigDecimal total = results.get(country);
if (total == null) total = BigDecimal.ZERO;
total = total.add((BigDecimal) e[0]);
results.put(country, total);
}
26. ReporUng AnU-PaVerns
AggregaUon JPQL (4)
Query query = em.createQuery(
"SELECT SUM(e.salary), e.address.country
FROM Employee e
GROUP BY e.address.country”);
List<Object[]> list = query.getResultList();
// already calculated!
27. ReporUng AnU-PaVerns
AggregaUon SQL (5)
Query query = em.createNativeQuery(
"SELECT SUM(e.salary), a.country
FROM employee e
JOIN address a ON e.address_id = a.id
GROUP BY a.country");
List list = query.getResultList();
// already calculated!
29. ProjecUon
JPQL -> Value Object
Query query = em.createQuery(
"SELECT new com.yonita.jpa.vo.EmployeeVO(
e.salary, e.address.country)
FROM Employee e”);
// List<EmployeeVO>
List list = query.getResultList();
30. ProjecUon
JPQL -> Value Object
Query query = em.createQuery(
"SELECT new com.yonita.jpa.CountryStatVO(
sum(e.salary), e.address.country)
FROM Employee e
GROUP BY e.address.country"”);
// List<CountryStatVO>
List list = query.getResultList();
32. ProjecUon
SQL -> Value Object
Query query = em.createNativeQuery(
"SELECT SUM(e.salary), a.country
FROM employee e
JOIN address a ON e.address_id = a.id
GROUP BY a.country", "countryStatVO");
// List<CountryStatVO>
List list = query.getResultList();
33. ProjecUon
Wrap-up
• JPA 2.0
– Only JPQL query to directly produce a value object!
• JPA 2.1
– JPQL and naUve queries to directly produce a value object!
• Managed object
– Sync with database
– L1/L2 cache
• Use cases for Direct Value Object
– ReporUng, staUsUcs, history
– Read-only data, GUI data
– Performance:
• No need for managed objects
• Rich (or fat) managed objects
• Subset of aVributes required
• Gain speed
• Offload an app server
34. AggregaUon
Wrap-up
• JPA 2.0
– Selected aggregaUon funcUons: COUNT, SUM, AVG, MIN, MAX
• JPA 2.1
– All funcUon as supported by a database
– Call any database funcUon with new FUNCTION keyword
• Database-specific aggregate funcUons
– MS SQL: STDEV, STDEVP, VAR, VARP,…
– MySQL: BIT_AND, BIT_OR, BIT_XOR,…
– Oracle: MEDIAN, PERCENTILE,…
– More…
• Use cases
– ReporUng, staUsUcs
– Performance
• Gain speed
• Offload an app server to a database!
35. Loading Strategy: EAGER for sure!
• We know what we want
– Known range of required data in a future
execution path
• We want a little
– A relatively small entity, no need to divide it into
tiny pieces
36. Loading strategy: Usually Better
EAGER!
• Network latency to a database
– Lower number of round-trips to a database with
EAGER loading
37. Loading Strategy: LAZY for sure!
• We don’t know what we want
– Load only required data
– „I’ll think about that tomorrow”
• We want a lot
– Divide and conquer
– Load what’s needed in the first place
38. Large Objects
• Lazy Property Fetching
• @Basic(fetch = FetchType.LAZY)
• Recommended usage
– Blobs
– Clobs
– Formulas
• Remember about byte-code instrumentation,
– Otherwise will not work
– Silently ignores
39. Large Objects
• Lazy Property Fetching
• @Basic(fetch = FetchType.LAZY)
• Recommended usage
– Blobs
– Clobs
– Formulas
• Remember about byte-code instrumentation,
– Otherwise will not work
– Silently ignores
42. Large Objects
• Something smells here
• Do you really need them?
• But do you really need them?
• Ponder on your object model and use cases,
otherwise it’s not gonna work
43. Large Collections
• Divide and conquer!
• Definitely lazy
• You don’t want a really large collection in the
memory
• Batch size
– JPA Provider specific configuration
44. Hibernate: Plant a Tree
@Entity
public class Forest {
@Id @GeneratedValue
private Long id;
@OneToMany
private Collection<Tree> trees = new HashSet<Tree>();
public void plantTree(Tree tree) {
return trees.add(tree);
}
}
// new EntityManager and new transaction: creates and persists a forest with 10.000 trees
// new EntityManager and new transaction
Tree tree = new Tree(“oak”);
em.persist(tree);
Forest forest = em.find(Forest.class, id);
forest.plantTree(tree);
45. How Many Queries in 2nd Tx?
@Entity
public class Forest {
@Id @GeneratedValue
private Long id;
@OneToMany
private Collection<Tree> trees = new HashSet<Tree>();
public void plantTree(Tree tree) {
return trees.add(tree);
}
}
// new EntityManager and new transaction: creates and persists a forest with 10.000 trees
// new EntityManager and new transaction
Tree tree = new Tree(“oak”);
em.persist(tree);
Forest forest = em.find(Forest.class, id);
forest.plantTree(tree);
(a) 1 select, 2 inserts
(b) 2 selects, 2 inserts
(c) 2 selects, 1 delete,
10.000+2 inserts
(d) 2 selects, 10.000
deletes, 10.000+2 inserts
(e) Even more ;-)
46. How Many Queries in 2nd Tx?
(a) 1 select, 2 inserts
(b) 2 selects, 2 inserts
(c) 2 selects, 1 delete, 10.000+2 inserts
(d) 2 selects, 10.000 deletes, 10.000+2 inserts
(e) Even more ;-)
The combination of OneToMany and Collection
enables a bag semantic. That’s why the collection is
re-created.
47. Plant a Tree Revisited
@Entity
public class Orchard {
@Id @GeneratedValue
private Long id;
@OneToMany
private List<Tree> trees = new ArrayList<Tree>();
public void plantTree(Tree tree) {
return trees.add(tree);
}
}
// creates and persists a forest with 10.000 trees
// new EntityManager and new transaction
Tree tree = new Tree(“apple tree”);
em.persist(tree);
Orchard orchard = em.find(Orchard.class, id);
orchard.plantTree(tree);
STILL BAG SEMANTIC
Use OrderColumn or
IndexColumn for list
semantic.
48. Plant a Tree
@Entity
public class Forest {
@Id @GeneratedValue
private Long id;
@OneToMany
private Set<Tree> trees = new HashSet<Tree>();
public void plantTree(Tree tree) {
return trees.add(tree);
}
}
// new EntityManager and new transaction: creates and persists a forest with 10.000 trees
// new EntityManager and new transaction
Tree tree = new Tree(“oak”);
em.persist(tree);
Forest forest = em.find(Forest.class, id);
forest.plantTree(tree);
1. Collection elements
loaded into memory
2. Possibly unnecessary
queries
3. Transaction and locking
schema problems:
version, optimistic
locking
49. Plant a Tree
@Entity public class Forest {
@Id @GeneratedValue
private Long id;
@OneToMany(mappedBy = „forest”)
private Set<Tree> trees = new HashSet<Tree>();
public void plantTree(Tree tree) {
return trees.add(tree);
}
}
@Entity public class Tree {
@Id @GeneratedValue
private Long id;
private String name;
@ManyToOne
private Forest forest;
public void setForest(Forest forest) {
this.forest = forest;
Forest.plantTree(this);
}
}
Set semantic on the
inverse side forces of
loading all trees.
51. Loading strategy: It depends!
• You know what you want
– But it’s dynamic, depending on an execution path
and its parameters
52. Loading strategy: It depends!
• You know what you want
– But it’s dynamic, depending on runtime
parameters
• That was the problem in JPA 2.0
– Fetch queries
– Provider specific extensions
– Different mappings for different cases
• JPA 2.1 comes in handy
53. Entity Graphs in JPA 2.1
• „A template that captures the paths and
boundaries for an operation or query”
• Fetch plans for query or find operations
• Defined by annotations
• Created programmatically
54. Entity Graphs in JPA 2.1
• Defined by annotations
– @NamedEntityGraph, @NamedEntitySubgraph,
@NamedAttributeNode
• Created programmatically
– Interfaces EntityGraph, EntitySubgraph,
AttributeNode
55. Entity Graphs in Query or Find
• Default fetch graph
– Transitive closure of all its attributes specified or
defaulted as EAGER
• javax.persistence.fetchgraph
– Attributes specified by attribute nodes are EAGER,
others are LAZY
• javax.persistence.loadgraph
– Attributes specified by by attribute nodes are
EAGER, others as specified or defaulted
56. Entity Graphs in Query or Find
• Default fetch graph
– Transitive closure of all its attributes specified or
defaulted as EAGER
• javax.persistence.fetchgraph
– Attributes specified by attribute nodes are EAGER,
others are LAZY
• javax.persistence.loadgraph
– Attributes specified by by attribute nodes are
EAGER, others as specified or defaulted
57. Entity Graphs in Query or Find
• Default fetch graph
– Transitive closure of all its attributes specified or
defaulted as EAGER
• javax.persistence.fetchgraph
– Attributes specified by attribute nodes are EAGER,
others are LAZY
• javax.persistence.loadgraph
– Attributes specified by by attribute nodes are
EAGER, others as specified or defaulted
58. Entity Graphs Advantages
• Better hints to JPA providers
• Hibernate now generates smarter queries
– 1 select with joins on 3 tables
– 1 round-trip to a database instead of default N+1
• Dynamic modification of a fetch plan
61. Wrap-up
• Main use cases
– LisUng and reporUng
• JPA
– EnUty graphs (JPA 2.1)
– ProjecUons (JPA 2.0/2.1)
• Performance
– Don’t load if you don’t need
– Don’t execute many small queries if you can execute one big query
– Don’t calculate if a database can
• Tuning
– Tune in your concrete environment
– JPA Providers behave differently!
– Databases behave differently!