On the integrity of data in Java Applications
Lucas Jellema (AMIS)
NLJUG JFall 2013
6th November 2013, Nijkerk, The Nether...
Agenda
• What is integrity?
• Enforcing data constraints
– throughout the application architecture
• Transactions
• Exclus...
3

Definition of Integrity
• Truth
– Nothing but the truth

• The Only Truth
• [Degree of] success or
completeness of
acti...
4

Sufficient Integrity

Integrity

Integrity
7,0
48,23

π

Uncorrupted

33,0000002
“five”

Corruptible

42
Correct
Comple...
5

Conference Application
6

Conference Application

Client
(HTML 5 & Java Script)

Web Tier
JavaServer Faces
POJO
Domain
Model

Business Tier
JPA

...
7

Validation at entry time
8

Validation at entry time
Client and View
9

Validation at entry time
Client and View
More validation at entry time –
bean Validation

10
11

Validation at entry time
Bean Validation in View
12

Engage Bean Validation
in Web Tier
13

Record (Type) level rules
• Program should be Kids
when age < 18; either
Developer or Management
when age > 18
• Using...
14

Type Level Constraints with
Bean Validation
15

Type Level Bean Validation:
Custom Validator
16

Validation Implementation
options & considerations
Native
Mobile Client

Native HTML 5;
JavaScript
Client
(pure HTML 5...
17

But wait – there is more!
• More User Interfaces
• More Attendee Instances
• More Entities & More types of Cons...
18

Domain model
• Attendee
• Speaker
• Session
• Room
• Slot
• Attendance
– Booked
– Realized
19

Multiple-Instances-of-Single-Entity
constraints
• Constraints that cover multiple same type objects/instances
–
–
–
–
...
20

Inter entity constraints
• Attendees can only attend one hands-on session during the conference
• A person cannot atte...
21

Event Analysis
for Inter Entity Constraint
• No more planned session attendances are allowed than the capacity of
the ...
22

Constraint classification
• Based on event-analysis (when can the constraint get
violated) we discern these categories...
23

Nous ne sommes
pas ‘Sans Famille’
24

Nous ne sommes
pas ‘Sans Famille’

Mobile Client

Client
(pure HTML 5 & Java
Script)

Client
(JSF based HTML 5 & Java ...
25

Multiple clients for
Data Source
Client
(pure HTML 5 & Java Script)

Mobile Client

Client
(JSF based HTML 5 & Java Sc...
26

Integrity Enforcement in the
Persistent Store
• All data is available
• Persistent store is the final stop: the buck s...
27

Multiple-Instances-of-Single-Entity
constraints
• No more than 5 conference attendees from the same company
28

Implementation Consideration
for Multiple-Entity-Instance rule
• Implementation – how and where?
–
–
–
–
–

Is the ent...
29

Implementing Multi-Instance
constraint ‘5 max per company’
Register New Attendee – method A
- Ensure L2 Cache is up to...
30

Max 5 per company
JPA Facade enforcement
Max 5 per Company – Flaws in
JPA Enforcement
• Persist does not [always] ‘post to database’
– When more than one attendee ...
32

One thread persisting two
attendees in a row – no flush
Max 5 per Company – Flaws in
JPA Enforcement
• Persist does not [always] ‘post to database’
– When more than one attendee ...
34

Flush after persist for complete
picture
35

JPA Facade enforcement in a
multi-threaded world
Client
HTML 5 & Java Script
Session A

Client
HTML 5 & Java Script
Se...
36

JPA Facade enforcement in a
multi-threaded world
Client
HTML 5 & Java Script
Session A

Client
HTML 5 & Java Script
Se...
37

Two threads inter-leaving
38

Database Solution?
Data Trick – Materialized View
with Check Constraint

39
40

Transactions
• Logically consistent set of data manipulations
– Atomic units of work
– Succeed or fail together
– Any ...
41

Perfect Integrity
42

Fine grained locking

Transaction 1

Transaction 2

insert …
('John','Doe',…)

Attendees

Unique Key UK1 on
(FirstName...
43

Fine grained locking

Transaction 1
insert …
('John','Doe',…)

Transaction 2

insert …
('Jane','Doe',…)
update <JANE>
...
44

Fine grained locking

Transaction 1
insert …
('John','Doe',…)
Lock on
UK1_JOHN_
DOE

Transaction 2

insert …
('Jane','...
45

JPA Facade enforcement
Exclusive Constraint Checking
Client
HTML 5 & Java Script
Session A

Client
HTML 5 & Java Scrip...
46

Two threads and Lock on
Constraint
47

Two threads and Lock on
Constraint
48

Distributed or Global
Transaction
• One logical unit of work - involving data manipulations in multiple
resources (glo...
49

Implementation for
Distributed Transaction
• Typical approach: two-phase commit
– Each resource locks and validates – ...
50

Distributed (aka global)
transaction inside container
• Java EE containers (and various non-EE JTA implementations) su...
51

Distributed transactions
across/outside containers

Step 2:
Payment

Mobile Client

Client
(pure HTML 5 & Java
Script)...
52

Distributed transactions
across/outside containers
• Transaction involving remote containers, Web Services, File Syste...
53

Compensation
• How to implement a compensation mechanism?
• How long after the commit can compensation be requested?
•...
54

RESTful transaction is a
distributed transaction

Client

Resource A

Resource B
Domain Model/JPA Cache

Resource C
55

RESTful transaction is a
distributed transaction

Client

Resource A

Resource B
Domain Model/JPA

Resource C
56

Distributed
Constraints
• Constraints that involve data collections in multiple enterprise resources
Mobile Client

Cl...
57

Distributed Constraints
• Not more than three attendees (resource A) from the same company may
attend a session (resou...
58

Distributed Constraints
• Not more than three attendees (resource A) from the same company may
attend a session (resou...
59

Distributed Constraints
• Not more than three attendees (resource A) from the same company may
attend a session (resou...
61

Java global (distributed) lock
managers
• Within JVM: SynchronousQueue
• Across JVMs: Apache ZooKeeper, HazelCast, Ora...
62

Summary
• Which level of integrity is required?
• Change undermines integrity
– Data change is trigger for constraint ...
64

Handling Integrity Really Well...
Lucas Jellema (AMIS)
Email: lucas.jellema@amis.nl
Twitter: @lucasjellema

Blog: http://technology.amis.nl
Website: http://...

Data Integrity in Java Applications (JFall 2013)


The accuracy, internal quality, and reliability of data is frequently referred to using the term 'data integrity'. Without it, data is less valuable or even useless. This session takes a close look at what data integrity entails and how it can be enforced in multi-tier application architectures using distributed data sources and global transactions. The discussion will make clear which elements are required from any robust implementation of data oriented business rules aka data constraints and it will explain how most existing solutions are not as watertight as is frequently assumed. Steps for achieving reliable constraint enforcement are demonstrated.

Summary
- what is data integrity
- types of data constraints (and various levels: attribute, record, inter-entity)
- what is the notion of a transaction (and a commit)
- data constraint enforcement in various tiers of enterprise applications: user interface (client side), web tier, business service, database
- what are the challenges for implementing data integrity in a multi user environment; what are the additional challenges in an environment with multiple independent data sources
- demonstrate a common implementation of data integrity - starting at the UI and adding additional enforcement working our way down through the tiers
- make clearly visible how, because of multiple sessions, data caching, clustering, etc., most implementations look reasonable enough but lack robustness
- explain and demonstrate how some degree of locking is required to provide true data integrity in a multi-session environment; explain what the finest grained level of locking should be and how that can be implemented.

Published in: Technology

Data Integrity in Java Applications (JFall 2013)

  1. 1. On the integrity of data in Java Applications Lucas Jellema (AMIS) NLJUG JFall 2013 6th November 2013, Nijkerk, The Netherlands
  2. 2. Agenda • What is integrity? • Enforcing data constraints – throughout the application architecture • Transactions • Exclusive Access to … • The Distributed World
  3. 3. 3 Definition of Integrity • Truth – Nothing but the truth • The Only Truth • [Degree of] success or completeness of actions is known
  4. 4. 4 Sufficient Integrity Integrity Integrity 7,0 48,23 π Uncorrupted 33,0000002 “five” Corruptible 42 Correct Complete Consistent Reliable
  5. 5. 5 Conference Application
  6. 6. 6 Conference Application Client (HTML 5 & Java Script) Web Tier JavaServer Faces POJO Domain Model Business Tier JPA RDBMS EJB
  7. 7. 7 Validation at entry time
  8. 8. 8 Validation at entry time Client and View
  9. 9. 9 Validation at entry time Client and View
  10. 10. More validation at entry time – bean Validation 10
  11. 11. 11 Validation at entry time Bean Validation in View
  12. 12. 12 Engage Bean Validation in Web Tier
  13. 13. 13 Record (Type) level rules • Program should be Kids when age < 18; either Developer or Management when age > 18 • Using JavaScript – when either field changes (handle nulls) – on submit of the entire record • Using Bean Validation: custom type validator – in either web-tier or JPA
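The record-level rule on this slide can be sketched in plain Java. This is only an illustration, not the code shown in the talk: the `Attendee`, `Program` and `ProgramAgeRule` names are hypothetical, and because the slide does not say what happens at exactly 18, the sketch assumes 18 counts as an adult. In a Bean Validation setup the same check would typically live in the `isValid` method of a custom `ConstraintValidator` for a type-level constraint annotation.

```java
import java.util.Optional;

/** Programs an attendee can follow (names taken from the slide). */
enum Program { KIDS, DEVELOPER, MANAGEMENT }

/** Hypothetical stand-in for the Attendee entity; only the two fields the rule needs. */
final class Attendee {
    final Integer age;      // nullable: the field may not be filled in yet
    final Program program;  // nullable for the same reason

    Attendee(Integer age, Program program) {
        this.age = age;
        this.program = program;
    }
}

final class ProgramAgeRule {
    /**
     * Returns an error message when the record violates the rule, or empty when it is valid.
     * Assumptions: a missing age or program is left to attribute-level (not-null) checks,
     * and an attendee of exactly 18 is treated as an adult.
     */
    static Optional<String> validate(Attendee a) {
        if (a.age == null || a.program == null) {
            return Optional.empty();
        }
        boolean kidsProgram = a.program == Program.KIDS;
        if (a.age < 18 && !kidsProgram) {
            return Optional.of("Attendees under 18 must be in the Kids program");
        }
        if (a.age >= 18 && kidsProgram) {
            return Optional.of("Attendees of 18 and over must be in Developer or Management");
        }
        return Optional.empty();
    }
}
```

Because the rule spans two attributes, it has to be re-evaluated when either field changes, which is why the slide lists both "when either field changes" and "on submit of the entire record".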
  14. 14. 14 Type Level Constraints with Bean Validation
  15. 15. 15 Type Level Bean Validation: Custom Validator
  16. 16. 16 Validation Implementation options & considerations Native Mobile Client Native HTML 5; JavaScript Client (pure HTML 5 & Java Script) Native HTML 5; JavaScript Client (JSF based HTML 5 & Java Script) Custom; Web Tier JSF Validator; Bean JavaServer Faces Validation Custom; Bean Validation RESTful Services POJO Domain Model Business Tier JPA RDBMS EJB Custom; Bean Validation
  17. 17. 17 But wait – there is more! • More User Interfaces • More Attendee Instances • More Entities & More types of Constraints • More Users, Sessions, and Transactions • More Nodes in the Middle Tier Cluster • More Data Stores
  18. 18. 18 Domain model • Attendee • Speaker • Session • Room • Slot • Attendance – Booked – Realized
  19. 19. 19 Multiple-Instances-of-Single-Entity constraints • Constraints that cover multiple same type objects/instances – Attendee’s Registration Id is unique – No more than 5 conference attendees from the same company – Not more than two sessions by the same speaker – At most one session scheduled per room per slot – Only one keynote session in a slot – Sessions from up to a maximum of three tracks can be scheduled in the same room
  20. 20. 20 Inter entity constraints • Attendees can only attend one hands-on session during the conference • A person cannot attend another session in a slot in which the session (s)he is speaker of is scheduled • No more planned session attendances are allowed than the capacity of the room in which the session is scheduled to take place • If the room capacity is smaller than 100, then no more than 2 people from the same company may sign up for it • Attendees from Amsterdam cannot attend sessions in room 010 • Common challenge: – Many data change events can lead to constraint violation
  21. 21. 21 Event Analysis for Inter Entity Constraint • No more planned session attendances are allowed than the capacity of the room in which the session is scheduled to take place Create, Update (session reference) Update (room reference) Update (capacity [decrease])
  22. 22. 22 Constraint classification • Based on event-analysis (when can the constraint get violated) we discern these categories of constraints – Attribute – Tuple – Entity – Inter Entity • Each category has its own implementation methods, options and considerations – Multi record instance rules cannot meaningfully be enforced in client/web-tier
  23. 23. 23 Nous ne sommes pas ‘Sans Famille’
  24. 24. 24 Nous ne sommes pas ‘Sans Famille’ Mobile Client Client (pure HTML 5 & Java Script) Client (JSF based HTML 5 & Java Script) Web Tier JavaServer Faces RESTful Services POJO Domain Model Business Tier JPA RDBMS EJB
  25. 25. 25 Multiple clients for Data Source Client (pure HTML 5 & Java Script) Mobile Client Client (JSF based HTML 5 & Java Script) Web Tier JavaServer Faces RESTful Services POJO Domain Model EJB Business Tier JPA Mobile Client Client (pure HTML 5 & Java Script) Client (JSF based HTML 5 & Java Script) Web Tier JavaServer Faces RESTful Services POJO Domain Model Business Tier JPA EJB .NET ESB DBA/ Application Admin RDBMS Batch
  26. 26. 26 Integrity Enforcement in the Persistent Store • All data is available • Persistent store is the final stop: the buck stops here – Any alternative data manipulation (channel) has to go to the persistent store – Mobile, Batch, DBA, ESB • Built-in (native) mechanisms for constraint enforcement – Productive development, proven robustness, scalable performance – For example: Column Type, PK/UK, FK, Check; trigger • Transactions • Enforcing integrity is integral part of persisting data – Without final validation, persistent store cannot take responsibility for integrity
  27. 27. 27 Multiple-Instances-of-Single-Entity constraints • No more than 5 conference attendees from the same company
  28. 28. 28 Implementation Consideration for Multiple-Entity-Instance rule • Implementation – how and where? – Is the entire set of data available? – Is all associated info available? – Is the data set stable? – Can the constraint elegantly be implemented (natively? good framework support?) – Are all data access paths covered?
  29. 29. 29 Implementing Multi-Instance constraint ‘5 max per company’ Register New Attendee – method A - Ensure L2 Cache is up to date in terms of Attendees (fetch all attendees into cache) - Inspect the collection of attendees for same company - Persist Attendee if collection does not hold 5 (or more) POJO Domain Model Register New Attendee – method B - Select count of attendees in same company from the Data Store - Inspect the long value - Persist Attendee if long is < 5 Business Tier JPA Attendees L2 Cache Attendees
  30. 30. 30 Max 5 per company JPA Facade enforcement
  31. 31. Max 5 per Company – Flaws in JPA Enforcement • Persist does not [always] ‘post to database’ – When more than one attendee is added in a transaction, prior ones are not counted when the latter are validated Thread 1 POJO Domain Model select count persist select count persist Facade Business Tier JPA Attendees 31
  32. 32. 32 One thread persisting two attendees in a row – no flush
  33. 33. Max 5 per Company – Flaws in JPA Enforcement • Persist does not [always] ‘post to database’ – When more than one attendee is added in a transaction, prior ones are not counted when the latter are validated Thread 1 POJO Domain Model select count persist select count persist commit Facade Business Tier JPA Attendees 33
  34. 34. 34 Flush after persist for complete picture
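The flaw described on the preceding slides (a count query that does not see attendees persisted earlier in the same transaction) can be illustrated with a toy model. This is deliberately not JPA: `ToySession` and `AttendeeFacade` are hypothetical classes that only mimic the distinction between "persisted" and "posted to the database". In a real JPA provider, whether pending inserts are posted before a query runs depends on the flush mode and provider settings.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: rows the database can see versus inserts still buffered in the session. */
final class ToySession {
    private final List<String> postedCompanies = new ArrayList<>();   // visible to count queries
    private final List<String> pendingCompanies = new ArrayList<>();  // persisted, not yet posted

    /** Buffers the insert; nothing reaches the "database" yet. */
    void persist(String company) {
        pendingCompanies.add(company);
    }

    /** Posts all buffered inserts so that later queries can see them. */
    void flush() {
        postedCompanies.addAll(pendingCompanies);
        pendingCompanies.clear();
    }

    /** Stand-in for "select count(*) from attendees where company = ?": posted rows only. */
    long countByCompany(String company) {
        return postedCompanies.stream().filter(company::equals).count();
    }
}

/** Method B from the slides: select count, compare, persist. */
final class AttendeeFacade {
    static final int MAX_PER_COMPANY = 5;
    private final ToySession session;
    private final boolean flushAfterPersist;

    AttendeeFacade(ToySession session, boolean flushAfterPersist) {
        this.session = session;
        this.flushAfterPersist = flushAfterPersist;
    }

    /** Returns false when the registration is rejected by the constraint. */
    boolean register(String company) {
        if (session.countByCompany(company) >= MAX_PER_COMPANY) {
            return false;
        }
        session.persist(company);
        if (flushAfterPersist) {
            session.flush();  // make this attendee count for the next validation
        }
        return true;
    }
}
```

With `flushAfterPersist` set to `false`, seven registrations for one company in a single transaction are all accepted, because every count still sees zero posted rows. With it set to `true`, the sixth and seventh are rejected. Flushing only fixes the single-session case; it does nothing for concurrent sessions.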
  35. 35. 35 JPA Facade enforcement in a multi-threaded world Client HTML 5 & Java Script Session A Client HTML 5 & Java Script Session B Web Tier Thread 1 POJO Domain Model Thread 2 select count persist select count persist Facade Business Tier JPA Attendees
  36. 36. 36 JPA Facade enforcement in a multi-threaded world Client HTML 5 & Java Script Session A Client HTML 5 & Java Script Session B Web Tier Thread 1 POJO Domain Model Thread 2 select count persist commit select count persist commit Facade Business Tier JPA Attendees
  37. 37. 37 Two threads inter-leaving
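The interleaving can be replayed deterministically in a small simulation. The sketch below runs on a single thread and uses two hypothetical `ToyTransaction` objects to stand in for the two sessions; it assumes each transaction sees committed rows plus its own inserts, but never the uncommitted inserts of the other one.

```java
import java.util.ArrayList;
import java.util.List;

/** The committed contents of the Attendees table, shared by all transactions. */
final class SharedTable {
    final List<String> committed = new ArrayList<>();

    long count(String company) {
        return committed.stream().filter(company::equals).count();
    }
}

/** Toy transaction: sees committed rows and its own inserts, nothing else. */
final class ToyTransaction {
    private final SharedTable table;
    private final List<String> own = new ArrayList<>();

    ToyTransaction(SharedTable table) {
        this.table = table;
    }

    long count(String company) {
        return table.count(company) + own.stream().filter(company::equals).count();
    }

    void insert(String company) {
        own.add(company);
    }

    void commit() {
        table.committed.addAll(own);
        own.clear();
    }
}

final class InterleavingDemo {
    /** Replays the schedule from the slides and returns the final number of attendees. */
    static long run() {
        SharedTable table = new SharedTable();
        for (int i = 0; i < 4; i++) {
            table.committed.add("AMIS");      // four attendees already registered
        }
        ToyTransaction t1 = new ToyTransaction(table);
        ToyTransaction t2 = new ToyTransaction(table);

        if (t1.count("AMIS") < 5) t1.insert("AMIS");  // thread 1: select count, persist
        if (t2.count("AMIS") < 5) t2.insert("AMIS");  // thread 2: same, cannot see thread 1's row
        t1.commit();
        t2.commit();
        return table.count("AMIS");
    }
}
```

Both validations pass in isolation, yet after both commits the table holds six attendees for the same company. Neither transaction did anything wrong on its own; the constraint check simply was not exclusive.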
  38. 38. 38 Database Solution?
  39. 39. Data Trick – Materialized View with Check Constraint 39
  40. 40. 40 Transactions • Logically consistent set of data manipulations – Atomic units of work – Succeed or fail together – Any changes inside a transaction are invisible to other sessions/transactions until the transaction completes (commits) – Note: during a transaction, constraints may be violated; the only thing that matters: commit [time] – Transaction ends with successful commit or rollback – In both cases, transaction-owned locks are released • ACID (in RDBMS) – vs BASE (in NoSQL) • Note: post vs. commit with RDBMS – Post means do [all] data manipulation (insert, update, delete) but do not commit [yet] – Only upon commit are changes persisted and published
  41. 41. 41 Perfect Integrity
  42. 42. 42 Fine grained locking Transaction 1 Transaction 2 insert … ('John','Doe',…) Attendees Unique Key UK1 on (FirstName, LastName)
  43. 43. 43 Fine grained locking Transaction 1 insert … ('John','Doe',…) Transaction 2 insert … ('Jane','Doe',…) update <JANE> set firstname ='John' Attendees Unique Key UK1 on (FirstName, LastName)
  44. 44. 44 Fine grained locking Transaction 1 insert … ('John','Doe',…) Lock on UK1_JOHN_ DOE Transaction 2 insert … ('Jane','Doe',…) update <JANE> set firstname ='John' commit Attendees Unique Key UK1 on (FirstName, LastName)
  45. 45. 45 JPA Facade enforcement Exclusive Constraint Checking Client HTML 5 & Java Script Session A Client HTML 5 & Java Script Session B Web Tier Thread 1 POJO Domain Model Thread 2 take lock select count persist Facade commit take lock… select count rollback Business Tier JPA LockMgr ATT_MAX Attendees
  46. 46. 46 Two threads and Lock on Constraint
  47. 47. 47 Two threads and Lock on Constraint
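A minimal in-JVM sketch of "take lock, select count, persist, commit" follows. It is an assumption-laden illustration: `LockManager` and `LockingAttendeeFacade` are hypothetical names, the lock is keyed per company (`ATT_MAX_` plus the company name) as one possible finest-grained choice, and adding to an in-memory list stands in for persist plus commit. It only protects callers inside one JVM.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hands out one named lock per constraint instance (hypothetical helper). */
final class LockManager {
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    ReentrantLock lockFor(String name) {
        return locks.computeIfAbsent(name, n -> new ReentrantLock());
    }
}

final class LockingAttendeeFacade {
    static final int MAX_PER_COMPANY = 5;
    private final LockManager lockManager = new LockManager();
    private final List<String> committedCompanies = new CopyOnWriteArrayList<>();

    /** Check and insert happen while holding the constraint lock for this company. */
    boolean register(String company) {
        ReentrantLock lock = lockManager.lockFor("ATT_MAX_" + company);
        lock.lock();                               // take lock
        try {
            if (count(company) >= MAX_PER_COMPANY) {
                return false;                      // constraint violated: roll back
            }
            committedCompanies.add(company);       // stands in for persist + commit
            return true;
        } finally {
            lock.unlock();                         // released when the "transaction" ends
        }
    }

    long count(String company) {
        return committedCompanies.stream().filter(company::equals).count();
    }

    /** Fires the given number of registrations from a thread pool; returns how many succeeded. */
    static int registerConcurrently(LockingAttendeeFacade facade, String company, int attempts) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        AtomicInteger accepted = new AtomicInteger();
        for (int i = 0; i < attempts; i++) {
            pool.submit(() -> {
                if (facade.register(company)) {
                    accepted.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("registrations did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return accepted.get();
    }
}
```

The important detail is that the lock is held until the transaction ends, not just during the count. Releasing it before commit would reopen the window in which a second session validates against stale data.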
  48. 48. 48 Distributed or Global Transaction • One logical unit of work - involving data manipulations in multiple resources (global transaction composed of local transactions) Mobile Client Client (pure HTML 5 & Java Script) Client (JSF based HTML 5 & Java Script) Web Tier JavaServer Faces RESTful Services POJO Domain Model RDBMS EJB Business Tier RDBMS JCA JMS ERP
  49. 49. 49 Implementation for Distributed Transaction • Typical approach: two-phase commit – Each resource locks and validates – then reports OK or NOK back to the transaction overseer – When all resources have indicated OK then phase two: all resources commit and release locks – When one or more resources signal NOK, then phase two: all resources roll back/undo changes and release locks • With regard to integrity: – With a distributed transaction, the integrity for each participant is handled as before; this will result in ‘constraint-locks’ in multiple separate resources
  50. 50. 50 Distributed (aka global) transaction inside container • Java EE containers (and various non-EE JTA implementations) support global (distributed) transactions within a JVM – JTA (JSR-907) – based on X/Open XA architecture • Key elements are the Transaction Monitor (the container) and Resource Managers (JDBC, EJB, JMS, JCA) • One non-XA resource can participate (file system, email, …) in a global transaction: – All XA-resources perform Phase One – The non-XA resource does its thing – Upon success of the non-XA resource: others perform Phase Two by committing – Upon failure of the non-XA resource: others roll back
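The two phases can be sketched with a tiny coordinator. This is a simplified illustration, not JTA or XA: the `Participant` interface, `RecordingParticipant` and `TwoPhaseCoordinator` are hypothetical, and the sketch ignores timeouts, crashes between the phases and recovery logging, all of which a real transaction manager has to handle.

```java
import java.util.List;

/** A resource taking part in the global transaction (hypothetical interface). */
interface Participant {
    boolean prepare();  // phase one: lock and validate; true = OK, false = NOK
    void commit();      // phase two after unanimous OK: commit and release locks
    void rollback();    // phase two after any NOK: undo changes and release locks
}

/** Test double that records which state it ended up in. */
final class RecordingParticipant implements Participant {
    private final boolean vote;
    String state = "INITIAL";

    RecordingParticipant(boolean vote) {
        this.vote = vote;
    }

    @Override public boolean prepare() { state = "PREPARED"; return vote; }
    @Override public void commit() { state = "COMMITTED"; }
    @Override public void rollback() { state = "ROLLED_BACK"; }
}

final class TwoPhaseCoordinator {
    /** Returns true when the global transaction committed, false when it rolled back. */
    static boolean execute(List<? extends Participant> participants) {
        // Phase one: collect a vote from every participant.
        boolean allOk = true;
        for (Participant p : participants) {
            if (!p.prepare()) {
                allOk = false;
            }
        }
        // Phase two: the same decision is applied to every participant.
        for (Participant p : participants) {
            if (allOk) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allOk;
    }
}
```

A single NOK vote is enough to roll back every participant, including those that voted OK. Each participant keeps its own constraint locks from `prepare` until phase two completes.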
  51. 51. 51 Distributed transactions across/outside containers Step 2: Payment Mobile Client Client (pure HTML 5 & Java Script) Client (JSF based HTML 5 & Java Script) Web Tier JavaServer Faces RESTful Services POJO Domain Model Business Tier JPA RDBMS EJB
  52. 52. 52 Distributed transactions across/outside containers • Transaction involving remote containers, Web Services, File System or any stateless transaction participant • There is no actual common, shared vehicle (like a global XA transaction) – There is not really a coordinated two-phase commit • Transaction consists of – Any resource does its thing – lock, validate, commit (or rollback), report back – If all resources report success: great, done – If one resource reports failure then all other resources should perform ‘compensation’ – i.e. rollback/undo effects of a committed transaction commit Container Local Enterprise Resource Transaction compensate commit Remote/Stateless Enterprise Resource Remote/Stateless Enterprise Resource
  53. 53. 53 Compensation • How to implement a compensation mechanism? • How long after the commit can compensation be requested? • What is the state of the enterprise resource between commit and the compensation expiry time? • Should the invoker notify the resource that compensation is no longer required (so the ‘logical locks’/‘temporary state’ can be updated) – i.e. the global distributed transaction has successfully completed? commit compensate Enterprise Resource
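The commit-then-compensate flow can be sketched as follows. The names (`CompensableResource`, `CompensatingCoordinator`, `RecordingResource`) are hypothetical, and the sketch assumes the resources are invoked one after another and that compensation itself always succeeds, which a real system cannot take for granted.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** A resource that commits on its own and can undo that commit later (hypothetical). */
interface CompensableResource {
    boolean commit();    // lock, validate, commit locally; false = failure (rolled back locally)
    void compensate();   // undo the effects of an earlier successful commit
}

final class CompensatingCoordinator {
    /** Returns true when every resource committed; otherwise compensates and returns false. */
    static boolean execute(List<? extends CompensableResource> resources) {
        Deque<CompensableResource> committed = new ArrayDeque<>();
        for (CompensableResource r : resources) {
            if (r.commit()) {
                committed.push(r);
            } else {
                // Undo the already committed work, most recent first.
                while (!committed.isEmpty()) {
                    committed.pop().compensate();
                }
                return false;
            }
        }
        return true;
    }
}

/** Test double that records which state it ended up in. */
final class RecordingResource implements CompensableResource {
    private final boolean succeeds;
    String state = "INITIAL";

    RecordingResource(boolean succeeds) {
        this.succeeds = succeeds;
    }

    @Override public boolean commit() {
        state = succeeds ? "COMMITTED" : "FAILED";
        return succeeds;
    }

    @Override public void compensate() {
        state = "COMPENSATED";
    }
}
```

Unlike two-phase commit, the work of the first resources is visible to other sessions between their commit and the compensation, so other transactions may already have acted on data that is later undone.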
  54. 54. 54 RESTful transaction is a distributed transaction Client Resource A Resource B Domain Model/JPA Cache Resource C
  55. 55. 55 RESTful transaction is a distributed transaction Client Resource A Resource B Domain Model/JPA Resource C
  56. 56. 56 Distributed Constraints • Constraints that involve data collections in multiple enterprise resources Mobile Client Client (pure HTML 5 & JS) Client (JSF based HTML 5 & Java Script) Web Tier JavaServer Faces RESTful Services POJO Domain Model RDBMS Table Y Business Tier RDBMS Table X EJB JCA JMS ERP
  57. 57. 57 Distributed Constraints • Not more than three attendees (resource A) from the same company may attend a session (resource B) – Insert/Update Attendance requires validation – as does update of Attendee.company Client Client Web Tier Java EE Business Tier Client Web Tier MAX_3_COMP_ATT Java EE Business Tier Distributed Lock Manager ATTENDEES ATTENDANCES
  58. 58. 58 Distributed Constraints • Not more than three attendees (resource A) from the same company may attend a session (resource B) – Insert/Update Attendance requires validation – as does update of Attendee.company Client Client Web Tier Java EE Business Tier Client Web Tier MAX_3_COMP_ATT Java EE Business Tier Distributed Lock Manager ATTENDEES ATTENDANCES
  59. 59. 59 Distributed Constraints • Not more than three attendees (resource A) from the same company may attend a session (resource B) – Insert/Update Attendance requires validation – as does update of Attendee.company Client ESB Client Web Tier Java EE Business Tier Client Web Tier MAX_3_COMP_ATT Java EE Business Tier Distributed Lock Manager ATTENDEES ATTENDANCES
  60. 60. 61 Java global (distributed) lock managers • Within JVM: SynchronousQueue • Across JVMs: Apache ZooKeeper, HazelCast, Oracle Coherence, … JVM JVM JVM
  61. 61. 62 Summary • Which level of integrity is required? • Change undermines integrity – Data change is trigger for constraint validation • Exclusive lock on multi-record validation – released when transaction commits • Ensure that all data access paths are covered – Not all data manipulations may come through the Java middle tier • Transactions may include multiple enterprise resources – That may not be able to participate in a distributed transaction and have to support a compensation mechanism • True integrity and real robustness are very hard to achieve – Much harder than is commonly assumed
  62. 62. 64 Handling Integrity Really Well...
  63. 63. Lucas Jellema (AMIS) Email: lucas.jellema@amis.nl Twitter: @lucasjellema Blog: http://technology.amis.nl Website: http://www.amis.nl