1. Data journey modelling:
Predicting risk for IT developments
Iliada Eleftheriou, Suzanne M. Embury
and Andy Brass
Practice of Enterprise Modelling (PoEM)
Nov 2016
5. Related work
• Current approaches
– mainly focused on detailed predictions based on
substantial models
– support project managers throughout the
development process, rather than giving a low-cost
indicator for use in early-stage decision making.
– COCOMO, PRINCE, UML, etc.
• Need for a lightweight approach that gives
reliable predictions and can be used early.
03/01/2017 Data journey modelling 5
6. Project aim
• To develop a method that:
– reliably predicts places of costs and risks,
– can be used in early stage decision making.
• Data journey model:
– lightweight technique
– captures the journey of data through complex
networks of people and systems
– identifies socio-technical challenges in the journey
– highlights places of high cost and risk
7. Methods
18 case studies from the NHS domain
• Recent IT developments
• Only 3 successful
IT failure factors
• Technical: e.g. conflicting data formats, data silos
• Social: human- and organisation-related factors
Data movement:
a key indicator of failure
8. Conceptual model
• Data movement anti-patterns:
– movements of data that, under some circumstances,
impose costs on the new development
If the source stores the data in
physical form and the target
requires it in electronic form, a
transformation cost is imposed on
one end of the movement,
e.g. data entry, injection of errors.
• Administrative costs:
• Data sharing agreements
• Governance requirements
• Ethical issues
• Data islands
• Legacy systems
• Clash of grammar: dates,
experience, knowledge.
11. Operational model
Data Journey Model:
A. Landscape: existing journeys of data within an
organisational landscape, happening at any given time.
B. New journey: the data journey needed by the new
functionality.
A data journey landscape captures both the social and the
technical factors that can affect the journey of data.
DATAjourney.org
12. Operational model
• A data journey is a set of data movements
between containers.
• A journey leg moves data through a medium.
• Actors interact with containers.
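The three definitions above can be read as a small data structure. The following is a minimal sketch only, not the authors' implementation: the class names, the `interactions` field and the GP example values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Container:
    """A place where data rests, e.g. a paper form or a database."""
    name: str


@dataclass(frozen=True)
class Actor:
    """A person or role that interacts with containers."""
    name: str


@dataclass(frozen=True)
class JourneyLeg:
    """One movement of data from a source container to a target, through a medium."""
    source: Container
    target: Container
    medium: str


@dataclass
class DataJourney:
    """A set of data movements (legs) between containers."""
    legs: List[JourneyLeg] = field(default_factory=list)
    # (actor, container) pairs: which actors interact with which containers.
    interactions: Set[Tuple[Actor, Container]] = field(default_factory=set)

    def containers(self) -> Set[Container]:
        """All containers that appear at either end of a leg."""
        return {c for leg in self.legs for c in (leg.source, leg.target)}

    def actors_of(self, container: Container) -> Set[Actor]:
        """Actors recorded as interacting with the given container."""
        return {a for a, c in self.interactions if c == container}


# Hypothetical instance, loosely based on a GP requesting a blood test.
gp = Actor("GP")
technician = Actor("Lab technician")
request_form = Container("Blood test request form")
lab_system = Container("Pathology lab system")

journey = DataJourney(
    legs=[JourneyLeg(request_form, lab_system, medium="internal post")],
    interactions={(gp, request_form), (technician, lab_system)},
)
```

Keeping legs and actor interactions as separate collections mirrors the slide: legs connect containers, while actors attach to containers rather than to legs.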
13. Predicting risk
• Data movement anti-patterns: high cost and
risk occurred when data moved between actors
and containers with key discrepancies:
– Change of media (physical to electronic)
– Discontinuity (external organisation)
– Actor’s properties (clash of grammar)
• Need low cost ways to incorporate patterns.
– In some cases, information is readily available.
– Other factors are less obvious (people’s vocabularies)
– Use of proxies
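As a rough illustration of the readily available discrepancies listed above, the check below flags a single movement whose two ends differ in media or in organisation. It is a sketch under assumptions: the `Endpoint` type, its field names and the example values are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Endpoint:
    """One end of a data movement (hypothetical representation)."""
    name: str
    media: str          # "physical" or "electronic"
    organisation: str


def leg_warnings(source: Endpoint, target: Endpoint) -> List[str]:
    """Return the discrepancies between the two ends of one movement."""
    warnings = []
    if source.media != target.media:
        warnings.append("change of media")
    if source.organisation != target.organisation:
        warnings.append("discontinuity")
    return warnings


# Made-up example values.
paper_form = Endpoint("Paper request form", "physical", "GP practice")
lab_database = Endpoint("Lab database", "electronic", "Pathology lab")
lab_archive = Endpoint("Lab archive", "electronic", "Pathology lab")

cross_org = leg_warnings(paper_form, lab_database)
within_lab = leg_warnings(lab_database, lab_archive)
```

Here `cross_org` lists both warnings, while `within_lab` is empty because neither the media nor the organisation changes.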
14. Predicting risk
• Group together the elements of the data
journey diagram with similar properties.
• Overlay groupings onto the landscape to form
boundaries.
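The grouping step can be sketched as follows, assuming each element of the diagram carries one property value (here, an organisation name). The function names and the example landscape are hypothetical. Legs whose two ends fall in different groups are reported as boundary crossings.

```python
from typing import Dict, List, Set, Tuple

Leg = Tuple[str, str]  # (source element, target element)


def group_elements(properties: Dict[str, str]) -> Dict[str, Set[str]]:
    """Group element names by their property value."""
    groups: Dict[str, Set[str]] = {}
    for element, value in properties.items():
        groups.setdefault(value, set()).add(element)
    return groups


def boundary_crossings(legs: List[Leg], properties: Dict[str, str]) -> List[Leg]:
    """Return the legs whose source and target lie in different groups."""
    return [(s, t) for s, t in legs if properties[s] != properties[t]]


# Made-up landscape: each container is labelled with its organisation.
organisation = {
    "GP records": "GP practice",
    "Request form": "GP practice",
    "Lab system": "Pathology lab",
    "Agency database": "External agency",
}
legs = [
    ("GP records", "Request form"),
    ("Request form", "Lab system"),
    ("Lab system", "Agency database"),
]

groups = group_elements(organisation)
crossings = boundary_crossings(legs, organisation)
```

Running the same two functions with a different property map (for example, media instead of organisation) gives a second set of boundaries over the same landscape.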
15. Evaluation
• Retrospective evaluation
• Real world case study
• Results:
– Accurately predicted:
13 out of 19 predictions.
– Also predicted 7 that
had not been found by
humans, but were assessed
as feasible by domain
experts.
• http://datajourney.org/publications/tech_rep_data_journey.pdf
03 January 2017 Iliada Eleftheriou 15
16. Conclusion
• Contributions:
– A set of 32 IT failure factors
– Data movement patterns
– Data journey model:
• Potentially identify opportunities for cost saving
• Next: application to another case study
– Verify the set of boundaries with the genomics team
at St Mary’s Hospital.
17. Data journey modelling: Predicting risk for IT developments.
Iliada Eleftheriou
iliada.eleftheriou@manchester.ac.uk
DATAjourney.org
Editor's Notes
I am Iliada, from the University of Manchester (UoM), and I am now in my fourth and final year.
My project investigates the challenges and risks of moving data across contexts.
Today I will be presenting our paper on how we conceived the data journey model,
a lightweight technique that assists in predicting risk for new IT developments.
Is bigger always better? In the context of cost estimation modelling, that is.
Are bigger, more complex and more detailed cost estimation techniques always preferable?
Often organisations have new requirements coming in, requiring new functionality to be implemented on top of an existing network of people, systems and data.
For example, two departments may merge, requiring their data to be integrated;
existing data may need to be shared with an external agency to create new value;
or additional data may need to be shared with a consumer.
Managers and stakeholders of these organisations have to make a quick decision on whether it is worth proceeding with the new development or not.
It might sound like a simple decision, but in real life it is a bit more complicated.
Here we see a drawing from the King's Fund attempting to show the structure of the National Health Service in the UK. As we can see,
organisations are large and complex, with several sub-organisations and departments, each with its own infrastructure, people, policies, governance and politics.
Experience shows us that integrating new functionality into an already crowded infrastructure causes things to go wrong.
Costs are often underestimated,
projects are given up,
and jobs are lost.
So how can we make a go / no go decision in a defensible way, and avoid any newspaper headlines?
Ideally, we would search the literature for an off-the-shelf cost estimation technique.
Current approaches to managing risk and estimating cost
are mainly focused on creating detailed predictions based on substantial models of the planned development.
They aim to support project managers throughout the development process, rather than giving a low-cost indicator for use in early-stage decision making.
Approaches such as COCOMO, PRINCE, I* and UML
are powerful and very useful, but only later in the cycle. We might have only a day, a week or at most a month to make the decision.
The aim of our project is to help managers and stakeholders of large complex organisations to make better informed decisions on whether to proceed with a new development or not.
To do so, we developed a method that reliably predicts the risk of new developments and can be used in early-stage decision making.
Following the agile methodology ( ), we came up with a rather simplistic model. The data journey model, is a
We analysed 18 case studies from the NHS domain.
They were written by NHS staff and describe recent IT developments.
Surprisingly, only 3 out of the 18 studies were categorised by the authors as having been successful.
The rest were described as having (completely or partly) failed to deliver the expected benefits.
B. We looked for factors influencing the success or failure of a newly introduced development in an existing setting,
and we extracted a set of 32 factors that contributed to the failure of the developments.
We found not just technical issues, e.g. heterogeneous data sources, but also a majority of social factors, related to people and organisations, such as:
resistance to change, lack of shared vision,
governance and ethical issues.
C. A form of data movement, whether between people, systems or organisations, was a key indicator of failure.
Finally, we went through the case studies again and derived generic data movement anti-patterns to serve as early warning signs of failure in a new development.
Data entry is a time consuming process typically done by clerical staff, who may not have a strong understanding of the meaning of the data they are entering.
Errors can easily be injected that may significantly reduce the quality of the information.
We have found 8 anti-patterns so far. Of course, this is not a complete and final list, but it can get us started. I explain each of them in more detail in the paper.
But we can’t just advise managers to avoid any movement of data.
Hence, we propose the data journey model, which builds on the patterns to assist managers in predicting risk.
But let’s begin with an example. Let’s imagine we go to our local doctor, the GP, to request a blood test.
Example used: A GP requests blood test results from a pathology lab. A new external agency requires demographics data from the pathology lab to make workload sharing more effective.
So, let's design a data journey model.
As I mentioned before, the data journey model captures the journey, or movement, of data within and across organisations.
Having modelled the existing journeys of the data and the new journey required by the new functionality, we can predict the places in the journey that may impose high costs and risks on the new development.
From data movement anti-patterns, we found that high cost and risk occurred when data moved between actors and containers with some key discrepancies:
Change of media
Discontinuity (external organisation)
Actor’s properties (clash of grammar): salary band proxy
We need low cost ways of incorporating these factors into the data journey model. In some cases, the information is readily available (like whether a container stores data in physical or electronic form).
However, other factors, like people’s vocabularies, are less obvious. For these factors we use a proxy: some piece of information that is cheap to obtain and approximates the same relationship between the actors and containers as the original factor. For example, we use salary bands as a proxy indicator for the presence of “clash of grammars”, on the grounds that a large difference in salary bands between actors probably indicates a different degree of technical expertise.
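A minimal sketch of the salary band proxy, assuming bands are recorded as integers and that a gap of three or more bands counts as "large". Both the integer encoding and the `min_gap` threshold are illustrative assumptions, not values from the paper.

```python
def clash_of_grammar_suspected(band_a: int, band_b: int, min_gap: int = 3) -> bool:
    """Proxy check: a large salary-band gap hints at differing technical expertise.

    `min_gap` is a hypothetical threshold chosen for illustration only.
    """
    return abs(band_a - band_b) >= min_gap


# Made-up example: a clerical role on band 2 and a consultant-level role on band 7.
wide_gap = clash_of_grammar_suspected(2, 7)
narrow_gap = clash_of_grammar_suspected(5, 6)
```

With these example values `wide_gap` is `True` and `narrow_gap` is `False`. The proxy only marks a place for further investigation; it does not establish that a clash exists.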
To identify the places in which the above factors may impose costs, we group together the elements of the data journey diagram with similar properties. These groupings are overlaid onto the landscape of the data journey model and form boundaries. The places where a journey leg crosses from one grouping into another are the predicted location of the cost/risk introduced by the external organisational factor.
As we can see, the model predicts high-cost places not only in the new functionality, but also in the existing landscape.
The list of places suggests to managers where further investigation of the potential costs is needed.
We did evaluate our model, though that is part of another paper.
Our methodology can potentially be used to identify opportunities for cost saving in an existing system, as well as predicting costs and risks of new developments.
Also, the methodology may be used to assess organisational readiness for various compliance programmes, such as clinical guidelines for management of chronic conditions, like diabetes.
The guidelines can be modelled as sets of data journeys to check whether the organisation follows them or not.
If the organisation does not implement a guideline's data journey, the model will show the cost of compliance to the organisation.
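One way to picture the compliance check is as a set difference between the legs a guideline requires and the legs present in the organisation's landscape. This is a sketch under assumptions: legs are reduced to (source, target) name pairs, and every container name below is an invented placeholder rather than content from a real guideline.

```python
from typing import Set, Tuple

Leg = Tuple[str, str]  # (source container, target container)


def missing_legs(guideline: Set[Leg], landscape: Set[Leg]) -> Set[Leg]:
    """Legs required by the guideline that the landscape does not yet contain."""
    return guideline - landscape


# Invented placeholder journeys for illustration only.
guideline = {
    ("GP records", "Clinic system"),
    ("Clinic system", "Patient letter"),
}
landscape = {
    ("GP records", "Clinic system"),
    ("GP records", "Lab system"),
}

gaps = missing_legs(guideline, landscape)
```

Each leg in `gaps` is a movement the organisation would have to introduce, so it is a candidate place for estimating the cost of compliance.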
For any questions or further clarifications, please don’t hesitate to contact me. My email is: iliada.eleftheriou@manchester.ac.uk