The accuracy, internal quality, and reliability of data are frequently referred to as 'data integrity'. Without it, data is less valuable or even useless. This session takes a close look at what data integrity entails and how it can be enforced in multi-tier application architectures that use distributed data sources and global transactions. The discussion makes clear which elements any robust implementation of data-oriented business rules (aka data constraints) requires, and explains why most existing solutions are not as watertight as is frequently assumed. Steps for achieving reliable constraint enforcement are demonstrated.
Summary
- what is data integrity
- types of data constraints (and various levels: attribute, record, inter-entity)
- what is the notion of a transaction (and a commit)
- data constraint enforcement in various tiers of enterprise applications: user interface (client side), web tier, business service, database
- what are the challenges for implementing data integrity in a multi user environment; what are the additional challenges in an environment with multiple independent data sources
- demonstrate a common implementation of data integrity, starting at the UI and adding enforcement as we work our way down through the tiers
- make clearly visible how, because of multiple sessions, data caching, clustering etc., most implementations look reasonable enough but lack robustness
- explain and demonstrate why some degree of locking is required to provide true data integrity in a multi-session environment; explain what the finest-grained level of locking should be and how it can be implemented.
Data Integrity in Java Applications (JFall 2013)
1. On the integrity of data in Java Applications
Lucas Jellema (AMIS)
NLJUG JFall 2013
6th November 2013, Nijkerk, The Netherlands
2. Agenda
• What is integrity?
• Enforcing data constraints
– throughout the application architecture
• Transactions
• Exclusive Access to …
• The Distributed World
3. Definition of Integrity
• Truth
– Nothing but the truth
• The Only Truth
• [Degree of] success or completeness of actions is known
13. Record (Type) level rules
• Program should be Kids when age < 18; either Developer or Management when age >= 18
• Using JavaScript
– when either field changes (handle nulls)
– on submit of the entire record
• Using Bean Validation: custom type-level (class-level) validator
– in either the web tier or JPA
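A minimal sketch of the record-level rule as plain Java, assuming a hypothetical `Attendee` record with `age` and `program` fields and treating attendees aged 18 and over as adults. With Bean Validation, the same check would sit inside a `ConstraintValidator` behind a custom class-level annotation. That wiring is left out here so the logic stands on its own. Like the JavaScript variant, it skips the rule while either field is still null.

```java
import java.util.Optional;

// Hypothetical entity for illustration; field names are assumptions.
record Attendee(String name, Integer age, String program) {}

/**
 * Record (tuple) level rule: Kids when younger than 18; Developer or Management otherwise.
 * Assumption: attendees aged 18 and over count as adults.
 * In Bean Validation this logic would live in a ConstraintValidator
 * behind a custom class-level annotation; here it is plain Java.
 */
final class ProgramAgeRule {
    private ProgramAgeRule() {}

    /** Returns a violation message, or an empty Optional when the record is valid. */
    static Optional<String> validate(Attendee a) {
        // Handle nulls: only evaluate the rule once both fields have a value;
        // whether they are mandatory is an attribute-level concern.
        if (a.age() == null || a.program() == null) {
            return Optional.empty();
        }
        if (a.age() < 18) {
            return "Kids".equals(a.program())
                    ? Optional.empty()
                    : Optional.of("Attendees younger than 18 must be in the Kids program");
        }
        boolean adultProgram = "Developer".equals(a.program()) || "Management".equals(a.program());
        return adultProgram
                ? Optional.empty()
                : Optional.of("Adult attendees must be in the Developer or Management program");
    }
}
```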
16. Validation Implementation options & considerations
• Mobile Client: native
• Client (pure HTML 5 & JavaScript): native HTML 5; JavaScript
• Client (JSF based HTML 5 & JavaScript): native HTML 5; JavaScript
• Web Tier (JavaServer Faces): custom; JSF Validator; Bean Validation
• RESTful Services: custom; Bean Validation
• Business Tier (POJO Domain Model, EJB, JPA): custom; Bean Validation
• RDBMS
17. But wait – there is more!
• More User Interfaces
• More Attendee Instances
• More Entities & More types of Constraints
• More Users, Sessions, and Transactions
• More Nodes in the Middle Tier Cluster
• More Data Stores
19. Multiple-Instances-of-Single-Entity constraints
• Constraints that cover multiple objects/instances of the same type
– Attendee’s Registration Id is unique
– No more than 5 conference attendees from the same company
– Not more than two sessions by the same speaker
– At most one session scheduled per room per slot
– Only one keynote session in a slot
– Sessions from up to a maximum of three tracks can be scheduled in the same room
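Rules like these need the whole set of instances, not just the record being changed. As a sketch, here is ‘at most one session scheduled per room per slot’ checked over an in-memory list. The `ScheduledSession` and `RoomSlot` records are hypothetical. Uniqueness of the Registration Id has the same shape.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical types; only the fields the rule needs.
record ScheduledSession(String title, String room, int slot) {}
record RoomSlot(String room, int slot) {}

final class RoomSlotRule {
    private RoomSlotRule() {}

    /**
     * "At most one session scheduled per room per slot".
     * The check needs the complete collection of sessions: no single
     * instance can tell on its own whether the rule holds.
     */
    static boolean holds(List<ScheduledSession> allSessions) {
        Set<RoomSlot> taken = new HashSet<>();
        for (ScheduledSession s : allSessions) {
            // Set.add returns false when this room/slot pair was already seen.
            if (!taken.add(new RoomSlot(s.room(), s.slot()))) {
                return false;
            }
        }
        return true;
    }
}
```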
20. Inter entity constraints
• Attendees can attend only one hands-on session during the conference
• A person cannot attend any other session in a slot in which a session they are the speaker of is scheduled
• No more planned session attendances are allowed than the capacity of the room in which the session is scheduled to take place
• If the room capacity is smaller than 100, then no more than 2 people from the same company may sign up for it
• Attendees from Amsterdam cannot attend sessions in room 010
• Common challenge:
– Many data change events can lead to a constraint violation
21. Event Analysis for Inter Entity Constraint
• No more planned session attendances are allowed than the capacity of the room in which the session is scheduled to take place
– Create (attendance)
– Update (session reference of an attendance)
– Update (room reference of a session)
– Update (capacity [decrease] of a room)
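The event analysis can be captured in code as the set of change events after which the capacity rule must be re-checked. All other events can only relax the rule. A sketch with a hypothetical `ChangeEvent` enum. The delete and capacity-increase events are added for contrast and are not on the slide.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical catalogue of data change events in the conference model. */
enum ChangeEvent {
    ATTENDANCE_CREATED,
    ATTENDANCE_SESSION_CHANGED,
    ATTENDANCE_DELETED,
    SESSION_ROOM_CHANGED,
    ROOM_CAPACITY_INCREASED,
    ROOM_CAPACITY_DECREASED
}

final class RoomCapacityRule {
    private RoomCapacityRule() {}

    /** Outcome of the event analysis: only these events can turn the rule from true to false. */
    static final Set<ChangeEvent> VIOLATING_EVENTS = EnumSet.of(
            ChangeEvent.ATTENDANCE_CREATED,
            ChangeEvent.ATTENDANCE_SESSION_CHANGED,
            ChangeEvent.SESSION_ROOM_CHANGED,
            ChangeEvent.ROOM_CAPACITY_DECREASED);

    static boolean needsValidation(ChangeEvent event) {
        return VIOLATING_EVENTS.contains(event);
    }

    /** The check itself, run for the affected session after a relevant event. */
    static boolean holds(long plannedAttendances, int roomCapacity) {
        return plannedAttendances <= roomCapacity;
    }
}
```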
22. Constraint classification
• Based on event analysis (when can the constraint get violated?) we discern these categories of constraints:
– Attribute
– Tuple
– Entity
– Inter Entity
• Each category has its own implementation methods, options and considerations
– Multi-record-instance rules cannot meaningfully be enforced in the client or web tier
24. We are not ‘Sans Famille’ (without family)
[Diagram: Mobile Client; Client (pure HTML 5 & JavaScript); Client (JSF based HTML 5 & JavaScript); Web Tier with JavaServer Faces and RESTful Services; Business Tier with POJO Domain Model, EJB and JPA; RDBMS]
25. Multiple clients for Data Source
[Diagram: two copies of the application stack (Mobile Client; Clients based on pure HTML 5 & JavaScript and on JSF; Web Tier with JavaServer Faces and RESTful Services; Business Tier with POJO Domain Model, EJB and JPA) plus .NET, ESB, DBA/Application Admin and Batch, all working against the same RDBMS]
26. Integrity Enforcement in the Persistent Store
• All data is available
• The persistent store is the final stop: the buck stops here
– Every alternative data manipulation channel ends up in the persistent store
– Mobile, Batch, DBA, ESB
• Built-in (native) mechanisms for constraint enforcement
– Productive development, proven robustness, scalable performance
– For example: column type, PK/UK, FK, check constraint; trigger
• Transactions
• Enforcing integrity is an integral part of persisting data
– Without final validation, the persistent store cannot take responsibility for integrity
28. Implementation Considerations for Multiple-Entity-Instance rules
• Implementation – how and where?
– Is the entire set of data available?
– Is all associated information available?
– Is the data set stable?
– Can the constraint be implemented elegantly (natively? with good framework support?)
– Are all data access paths covered?
29. Implementing Multi-Instance constraint ‘5 max per company’
Register New Attendee – method A
- Ensure the L2 cache is up to date in terms of Attendees (fetch all attendees into the cache)
- Inspect the collection of attendees for the same company
- Persist the Attendee if the collection does not hold 5 (or more)
Register New Attendee – method B
- Select the count of attendees in the same company from the data store
- Inspect the resulting long value
- Persist the Attendee if the long is < 5
[Diagram: Business Tier with POJO Domain Model and JPA; Attendees held in the L2 Cache; Attendees in the data store]
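Both methods, sketched in plain Java against a hypothetical in-memory `AttendeeTable`. No JPA is involved: `findAll` stands in for filling the L2 cache and `countByCompany` for the count query. Note that each method is a check followed by an insert, with nothing preventing another session from acting in between.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row type and in-memory stand-in for the Attendees table.
record AttendeeRow(String name, String company) {}

final class AttendeeTable {
    private final List<AttendeeRow> rows = new ArrayList<>();

    List<AttendeeRow> findAll() {
        return List.copyOf(rows);
    }

    long countByCompany(String company) {
        return rows.stream().filter(r -> r.company().equals(company)).count();
    }

    void insert(AttendeeRow row) {
        rows.add(row);
    }
}

final class RegistrationService {
    static final int MAX_PER_COMPANY = 5;
    private final AttendeeTable table;

    RegistrationService(AttendeeTable table) {
        this.table = table;
    }

    /** Method A: load all attendees (as a cache would) and inspect the collection. */
    boolean registerMethodA(String name, String company) {
        List<AttendeeRow> cache = table.findAll();
        long sameCompany = cache.stream().filter(a -> a.company().equals(company)).count();
        return persistIfBelowMax(sameCompany, name, company);
    }

    /** Method B: let the data store count, then inspect the long value. */
    boolean registerMethodB(String name, String company) {
        return persistIfBelowMax(table.countByCompany(company), name, company);
    }

    // Check-then-act: another session could insert between the count and the insert.
    private boolean persistIfBelowMax(long sameCompany, String name, String company) {
        if (sameCompany >= MAX_PER_COMPANY) {
            return false;
        }
        table.insert(new AttendeeRow(name, company));
        return true;
    }
}
```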
31. Max 5 per Company – Flaws in JPA Enforcement
• Persist does not [always] ‘post to database’
– When more than one attendee is added in a transaction, the earlier ones are not counted when the later ones are validated
[Diagram: Thread 1 calls the Facade (POJO Domain Model, Business Tier): select count, persist, select count, persist; JPA; Attendees]
33. Max 5 per Company – Flaws in JPA Enforcement
• Persist does not [always] ‘post to database’
– When more than one attendee is added in a transaction, the earlier ones are not counted when the later ones are validated
[Diagram: Thread 1 calls the Facade (POJO Domain Model, Business Tier): select count, persist, select count, persist, commit; JPA; Attendees]
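A toy model of the flaw, assuming a persistence context that posts pending changes only at flush or commit time, which is what JPA's `FlushModeType.COMMIT` permits. Under the default `FlushModeType.AUTO`, the JPA specification requires pending changes that could affect a query to be visible to it inside a transaction, so the flaw does not always surface. All class names below are hypothetical. In real JPA code the counterpart of `flush()` is `EntityManager.flush()`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical, no JPA classes) of a persistence context that only
 * posts pending changes at flush or commit time, as with FlushModeType.COMMIT.
 */
final class ToyPersistenceContext {
    private final List<String> database = new ArrayList<>(); // company of each posted attendee
    private final List<String> pending = new ArrayList<>();  // persisted but not yet posted

    void persist(String company) {
        pending.add(company);
    }

    /** A "select count" query: it sees posted rows only. */
    long countInDatabase(String company) {
        return database.stream().filter(company::equals).count();
    }

    /** Post pending changes (in a real RDBMS still invisible to other sessions until commit). */
    void flush() {
        database.addAll(pending);
        pending.clear();
    }

    void commit() {
        flush();
    }
}

final class MaxPerCompanyFacade {
    static final int MAX_PER_COMPANY = 5;

    /** Flawed: attendees persisted earlier in the same transaction are not counted. */
    static boolean registerNaive(ToyPersistenceContext ctx, String company) {
        if (ctx.countInDatabase(company) >= MAX_PER_COMPANY) {
            return false;
        }
        ctx.persist(company);
        return true;
    }

    /** Posts pending changes first, so the count includes this transaction's own inserts. */
    static boolean registerWithFlush(ToyPersistenceContext ctx, String company) {
        ctx.flush();
        return registerNaive(ctx, company);
    }
}
```

Flushing only repairs the count within one transaction. It does nothing about a second session running the same check at the same time.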
35. JPA Facade enforcement in a multi-threaded world
[Diagram: Client Session A and Client Session B (HTML 5 & JavaScript) reach the Web Tier; Thread 1 and Thread 2 each call the Facade (POJO Domain Model, Business Tier): select count, persist; JPA; Attendees]
36. JPA Facade enforcement in a multi-threaded world
[Diagram: Client Session A and Client Session B (HTML 5 & JavaScript) reach the Web Tier; Thread 1 and Thread 2 each call the Facade (POJO Domain Model, Business Tier): select count, persist, commit; JPA; Attendees]
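One way to close this gap, sketched under the assumption of a single JVM: hold an exclusive lock from the count until the commit, and make it as fine-grained as the rule allows. Here that means one lock per company. `LockingRegistration` and its in-memory map are hypothetical stand-ins for the facade and the Attendees table.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical facade that serializes the check-then-act for "max 5 per company"
 * with one lock per company. The lock only works inside a single JVM.
 */
final class LockingRegistration {
    static final int MAX_PER_COMPANY = 5;

    // Stand-in for committed Attendees data: company -> number of attendees.
    private final Map<String, Integer> committed = new ConcurrentHashMap<>();
    // One lock per company: the finest grain at which this rule can be violated.
    private final Map<String, ReentrantLock> companyLocks = new ConcurrentHashMap<>();

    boolean register(String company) {
        ReentrantLock lock = companyLocks.computeIfAbsent(company, c -> new ReentrantLock());
        lock.lock();
        try {
            // Validate: the equivalent of "select count".
            int current = committed.getOrDefault(company, 0);
            if (current >= MAX_PER_COMPANY) {
                return false;
            }
            // Persist and commit while still holding the lock.
            committed.put(company, current + 1);
            return true;
        } finally {
            // Released only after commit (or rejection), never between count and commit.
            lock.unlock();
        }
    }

    int committedCount(String company) {
        return committed.getOrDefault(company, 0);
    }

    /** Starts n threads (think: sessions) registering for one company; returns the successes. */
    static int registerConcurrently(LockingRegistration service, String company, int n) {
        AtomicInteger successes = new AtomicInteger();
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            threads[i] = new Thread(() -> {
                if (service.register(company)) {
                    successes.incrementAndGet();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return successes.get();
    }
}
```

Registrations for different companies still proceed in parallel. The lock lives in one JVM only, so it covers neither other cluster nodes nor data access paths that bypass the Java middle tier, and the lock map grows with every new company.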
39. Data Trick – Materialized View with Check Constraint
40. Transactions
• Logically consistent set of data manipulations
– Atomic units of work
– Succeed or fail together
– Any changes inside a transaction are invisible to other sessions/transactions until the transaction completes (commits)
– Note: during a transaction, constraints may be violated; the only thing that matters is commit [time]
– A transaction ends with a successful commit or a rollback; in both cases, transaction-owned locks are released
• ACID (in RDBMS)
– vs. BASE (in NoSQL)
• Note: post vs. commit with RDBMS
– Post means do [all] data manipulation (insert, update, delete) but do not commit [yet]
– Only upon commit are changes persisted and published
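To illustrate post vs. commit, consider a hypothetical single-threaded toy store. Posted changes are visible only inside their own transaction, commit publishes all of them at once, and rollback discards them. Locking and concurrency are not modelled.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical single-threaded toy store illustrating post vs. commit (not an RDBMS API). */
final class ToyStore {
    private final Map<String, String> committed = new HashMap<>();

    Tx begin() {
        return new Tx();
    }

    /** What other sessions see: committed data only. */
    String readCommitted(String key) {
        return committed.get(key);
    }

    final class Tx {
        private final Map<String, String> posted = new HashMap<>();
        private boolean open = true;

        /** Post: perform the data manipulation without publishing it. */
        void post(String key, String value) {
            requireOpen();
            posted.put(key, value);
        }

        /** Inside the transaction, its own posted changes are visible. */
        String read(String key) {
            return posted.containsKey(key) ? posted.get(key) : committed.get(key);
        }

        /** Publish all posted changes together. */
        void commit() {
            requireOpen();
            committed.putAll(posted);
            end();
        }

        /** Discard all posted changes. */
        void rollback() {
            requireOpen();
            end();
        }

        private void end() {
            posted.clear();
            open = false;
        }

        private void requireOpen() {
            if (!open) {
                throw new IllegalStateException("Transaction has already ended");
            }
        }
    }
}
```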
48. Distributed or Global Transaction
• One logical unit of work involving data manipulations in multiple resources (a global transaction composed of local transactions)
[Diagram: the application stack (Mobile Client; Clients based on pure HTML 5 & JavaScript and on JSF; Web Tier with JavaServer Faces and RESTful Services; Business Tier with POJO Domain Model and EJB) working with two RDBMSs, JCA, JMS and an ERP system]
49. Implementation for Distributed Transaction
• Typical approach: two-phase commit
– Each resource locks and validates, then reports OK or NOK back to the transaction overseer
– When all resources have indicated OK, then phase two: all resources commit and release locks
– When one or more resources signal NOK, then phase two: all resources roll back/undo changes and release locks
• With regard to integrity:
– With a distributed transaction, the integrity for each participant is handled as before; this results in ‘constraint-locks’ in multiple separate resources
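The two phases can be sketched as follows, with a hypothetical `Participant` interface that is deliberately simpler than `javax.transaction.xa.XAResource`. A real transaction manager also logs its decision durably and deals with timeouts and crashed participants, none of which is shown.

```java
import java.util.List;

/** Hypothetical participant contract for a two-phase commit sketch (not XAResource). */
interface Participant {
    /** Phase one: lock and validate, then vote OK (true) or NOK (false). */
    boolean prepare();
    /** Phase two after a unanimous OK: make changes permanent and release locks. */
    void commit();
    /** Phase two after any NOK: undo changes and release locks. */
    void rollback();
}

final class TwoPhaseCoordinator {
    private TwoPhaseCoordinator() {}

    /** Returns true when the global transaction committed. */
    static boolean complete(List<? extends Participant> participants) {
        boolean allOk = true;
        for (Participant p : participants) {
            if (!p.prepare()) {
                allOk = false;
                break; // a single NOK decides the outcome
            }
        }
        // Phase two: the same decision goes to every participant.
        for (Participant p : participants) {
            if (allOk) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allOk;
    }
}

/** Test double that remembers what the coordinator told it. */
final class RecordingParticipant implements Participant {
    private final boolean vote;
    String outcome = "pending";

    RecordingParticipant(boolean vote) {
        this.vote = vote;
    }

    @Override public boolean prepare() { return vote; }
    @Override public void commit() { outcome = "committed"; }
    @Override public void rollback() { outcome = "rolled back"; }
}
```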
50. Distributed (aka global) transaction inside container
• Java EE containers (and various non-EE JTA implementations) support global (distributed) transactions within a JVM
– JTA (JSR 907), based on the X/Open XA architecture
• Key elements are the Transaction Manager (the container) and the Resource Managers (JDBC, EJB, JMS, JCA)
• One non-XA resource (file system, email, …) can participate in a global transaction:
– All XA resources perform phase one
– The non-XA resource does its thing
– Upon success of the non-XA resource: the others perform phase two by committing
– Upon failure of the non-XA resource: the others roll back
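The four steps above can be sketched with hypothetical interfaces. Some JTA implementations call this last resource commit optimization. The non-XA resource runs only after every XA resource has prepared, and its outcome drives phase two. The remaining risk is a crash after the non-XA resource has done its work but before phase two completes.

```java
import java.util.List;

/** Hypothetical XA-capable participant (sketch only, not javax.transaction.xa.XAResource). */
interface XaParticipant {
    boolean prepare();
    void commit();
    void rollback();
}

/** Hypothetical non-XA resource (file system, email, ...): one shot, reports success or failure. */
@FunctionalInterface
interface NonXaResource {
    boolean execute();
}

final class LastResourceCoordinator {
    private LastResourceCoordinator() {}

    static boolean complete(List<? extends XaParticipant> xaResources, NonXaResource lastResource) {
        // 1. All XA resources perform phase one.
        boolean allPrepared = true;
        for (XaParticipant p : xaResources) {
            if (!p.prepare()) {
                allPrepared = false;
                break;
            }
        }
        // 2. Only then does the non-XA resource do its thing (short-circuit &&);
        //    its result decides phase two for all XA resources.
        boolean success = allPrepared && lastResource.execute();
        for (XaParticipant p : xaResources) {
            if (success) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return success;
    }
}

/** Test double: always votes OK in phase one and records the phase-two outcome. */
final class RecordingXa implements XaParticipant {
    String outcome = "pending";

    @Override public boolean prepare() { return true; }
    @Override public void commit() { outcome = "committed"; }
    @Override public void rollback() { outcome = "rolled back"; }
}
```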
51. Distributed transactions across/outside containers
[Diagram: the application stack (Mobile Client; Clients based on pure HTML 5 & JavaScript and on JSF; Web Tier with JavaServer Faces and RESTful Services; Business Tier with POJO Domain Model, EJB and JPA; RDBMS), with a call-out ‘Step 2: Payment’]
52. Distributed transactions across/outside containers
• Transaction involving remote containers, Web Services, the file system or any stateless transaction participant
• There is no actual common, shared vehicle (like a global XA transaction)
– There is not really a coordinated two-phase commit
• The transaction consists of:
– Each resource does its thing – lock, validate, commit (or roll back), report back
– If all resources report success: great, done
– If one resource reports failure, then all other resources should perform ‘compensation’, i.e. roll back/undo the effects of a committed transaction
[Diagram: a Transaction issuing commit and compensate calls to a Container-Local Enterprise Resource and to two Remote/Stateless Enterprise Resources]
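A minimal sketch of compensation, assuming each hypothetical `Step` commits on its own and supplies an undo action. When a step fails, the steps that already committed are compensated in reverse order. This resembles what is often called the saga pattern. Questions such as how long compensation remains possible are not addressed.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical step against a remote/stateless resource: a committed action plus its undo. */
record Step(String name, BooleanSupplier action, Runnable compensation) {}

final class CompensatingTransaction {
    private CompensatingTransaction() {}

    /**
     * Each resource locks, validates and commits on its own (action returns true on success).
     * If one step fails, the steps that already committed are compensated in reverse order.
     */
    static boolean run(List<Step> steps) {
        Deque<Step> committed = new ArrayDeque<>();
        for (Step step : steps) {
            if (step.action().getAsBoolean()) {
                committed.push(step); // most recent commit on top
            } else {
                // The failed step did not commit, so only earlier steps are undone.
                while (!committed.isEmpty()) {
                    committed.pop().compensation().run();
                }
                return false;
            }
        }
        return true;
    }
}
```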
53. Compensation
• How should a compensation mechanism be implemented?
• How long after the commit can compensation be requested?
• What is the state of the enterprise resource between the commit and the compensation expiry time?
• Should the invoker notify the resource that compensation is no longer required (so the ‘logical locks’/‘temporary state’ can be updated)?
– i.e. when the global distributed transaction has successfully completed
[Diagram: commit and compensate calls to an Enterprise Resource]
54. RESTful transaction is a distributed transaction
[Diagram: Client; Resource A; Resource B; Domain Model/JPA Cache; Resource C]
55. RESTful transaction is a distributed transaction
[Diagram: Client; Resource A; Resource B; Domain Model/JPA; Resource C]
56. Distributed Constraints
• Constraints that involve data collections in multiple enterprise resources
[Diagram: the application stack (Mobile Client; Clients based on pure HTML 5 & JavaScript and on JSF; Web Tier with JavaServer Faces and RESTful Services; Business Tier with POJO Domain Model and EJB) working with Table X in one RDBMS, Table Y in another RDBMS, and JCA, JMS and ERP]
57. Distributed Constraints
• Not more than three attendees (resource A) from the same company may attend a session (resource B)
– Insert/update of an Attendance requires validation, as does an update of Attendee.company
[Diagram: clients and web tiers on two Java EE Business Tiers, both using a Distributed Lock Manager that holds lock MAX_3_COMP_ATT; tables ATTENDEES and ATTENDANCES]
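A sketch of the constraint guarded by a lock named after it, assuming a hypothetical `LockManager` interface. `LocalLockManager` works only inside one JVM. Across JVMs, an implementation would have to be backed by a distributed lock manager (no product API is shown). The lock key combines constraint name, company and session, the finest grain at which this rule can be violated. In a real setup the lock would be held until both resources have committed. Validating an update of `Attendee.company`, which would need the locks for every session that attendee attends, is left out.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical lock manager contract; a clustered implementation could sit behind it. */
interface LockManager {
    /** Runs work while holding the exclusive lock identified by key. */
    <T> T withLock(String key, Supplier<T> work);
}

/** In-JVM stand-in only: it does not protect against other JVMs in the cluster. */
final class LocalLockManager implements LockManager {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    @Override
    public <T> T withLock(String key, Supplier<T> work) {
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();
        try {
            return work.get();
        } finally {
            lock.unlock();
        }
    }
}

/** Enforces MAX_3_COMP_ATT across two simulated resources. */
final class AttendanceService {
    static final int MAX_PER_COMPANY_PER_SESSION = 3;

    private final LockManager lockManager;
    // Resource A (ATTENDEES): attendee id -> company.
    private final Map<String, String> attendees = new ConcurrentHashMap<>();
    // Resource B (ATTENDANCES): session id -> ids of attendees in that session.
    private final Map<String, Set<String>> attendances = new ConcurrentHashMap<>();

    AttendanceService(LockManager lockManager) {
        this.lockManager = lockManager;
    }

    void addAttendee(String attendeeId, String company) {
        attendees.put(attendeeId, company);
    }

    /** Insert Attendance, validated while holding the lock for (constraint, company, session). */
    boolean attend(String attendeeId, String sessionId) {
        String company = attendees.get(attendeeId);
        if (company == null) {
            throw new IllegalArgumentException("Unknown attendee " + attendeeId);
        }
        String lockKey = "MAX_3_COMP_ATT:" + company + ":" + sessionId;
        return lockManager.withLock(lockKey, () -> {
            Set<String> present = attendances.computeIfAbsent(sessionId, s -> ConcurrentHashMap.newKeySet());
            if (present.contains(attendeeId)) {
                return true; // already attending: nothing changes
            }
            long sameCompany = present.stream()
                    .filter(id -> company.equals(attendees.get(id)))
                    .count();
            if (sameCompany >= MAX_PER_COMPANY_PER_SESSION) {
                return false;
            }
            present.add(attendeeId);
            return true;
        });
    }
}
```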
59. Distributed Constraints
• Not more than three attendees (resource A) from the same company may attend a session (resource B)
– Insert/update of an Attendance requires validation, as does an update of Attendee.company
[Diagram: an ESB, clients and web tiers on two Java EE Business Tiers, all relying on a Distributed Lock Manager that holds lock MAX_3_COMP_ATT; tables ATTENDEES and ATTENDANCES]
60. Java global (distributed) lock managers
• Within a JVM: SynchronousQueue
• Across JVMs: Apache ZooKeeper, Hazelcast, Oracle Coherence, …
[Diagram: three JVMs]
61. Summary
• Which level of integrity is required?
• Change undermines integrity
– Data change is the trigger for constraint validation
• Exclusive lock on multi-record validation
– Released when the transaction commits
• Ensure that all data access paths are covered
– Not all data manipulations may come through the Java middle tier
• Transactions may include multiple enterprise resources
– Some of which may not be able to participate in a distributed transaction and therefore have to support a compensation mechanism
• True integrity and real robustness are very hard to achieve
– Much harder than is commonly assumed