poem_presentation_v5_linkedIn_version


Iliada Eleftheriou

03/01/2017 Data journey modelling 2
Is bigger always better?

The problem

The problem

Related work
• Current approaches:
– mainly focus on detailed predictions based on substantial models
– support project managers throughout the development process, rather than giving a low-cost indicator for use in early-stage decision making
– examples: COCOMO, PRINCE, UML, etc.
• Need for a lightweight approach that gives reliable predictions and can be used early.

Project aim
• To develop a method that:
– reliably predicts where costs and risks will arise,
– can be used in early-stage decision making.
• Data journey model:
– a lightweight technique
– captures the journey of data through complex networks of people and systems
– identifies socio-technical challenges in the journey
– highlights places of high cost and risk

Methods
18 case studies from the NHS domain
• Recent IT developments
• Only 3 successful
IT failure factors
• Technical, e.g. conflicting data formats, data silos
• Social: human and organisational related factors
Data movement: a key indicator of failure

Conceptual model
• Data movement anti-patterns:
– movements of data that, under some circumstances, impose costs on the new development
• Example: if the source stores the data in physical form and the target requests it in electronic form, a transformation cost is implied at either end of the movement (data entry, injection of errors).
• Administrative costs:
• Data sharing agreements
• Governance requirements
• Ethical issues
• Data islands
• Legacy systems
• Clash of grammar: dates, experience, knowledge.

Conceptual model

Operational model

Operational model
Data journey model:
A. Landscape: the existing journeys of data within an organisational landscape, happening at any given time.
B. New journey: the data journey needed by the new functionality.
A data journey landscape captures both the social and the technical factors that can affect the journey of data.
DATAjourney.org

Operational model
• A data journey is a set of data movements between containers.
• A journey leg moves data through media.
• Actors interact with containers.
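
The three definitions above can be sketched as a small data structure. This is only an illustrative reading of the slide, not the authors' implementation: the class names, field names and the "physical"/"electronic" medium values are assumptions, and the GP and pathology lab instance is a made-up example.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Container:
    """A place where data rests, e.g. a paper file or a database (assumed fields)."""
    name: str
    medium: str  # assumed values: "physical" or "electronic"


@dataclass(frozen=True)
class Actor:
    """A person or role that interacts with containers."""
    name: str


@dataclass(frozen=True)
class JourneyLeg:
    """One movement of data between two containers, through a medium."""
    source: Container
    target: Container
    medium: str


@dataclass
class DataJourney:
    """A set of journey legs plus the actor-container interactions around them."""
    legs: List[JourneyLeg] = field(default_factory=list)
    interactions: List[Tuple[Actor, Container]] = field(default_factory=list)

    def containers(self) -> Set[Container]:
        """Return every container that appears at either end of a leg."""
        return {c for leg in self.legs for c in (leg.source, leg.target)}


# Hypothetical instance: a GP receives blood test results from a pathology lab.
gp_system = Container("GP practice system", "electronic")
lab_system = Container("Pathology lab system", "electronic")
gp = Actor("GP")

journey = DataJourney(
    legs=[JourneyLeg(lab_system, gp_system, medium="electronic message")],
    interactions=[(gp, gp_system)],
)
print(sorted(c.name for c in journey.containers()))
```

Keeping containers and actors as separate types mirrors the slide's wording: legs connect containers, while actors only interact with them.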

Predicting risk
• Data movement anti-patterns: high cost and risk occurred when data moved between actors and containers with key discrepancies:
– change of media (physical to electronic)
– discontinuity (external organisation)
– actors' properties (clash of grammar)
• Need low-cost ways to incorporate the patterns:
– in some cases, the information is readily available
– other factors are less obvious (people's vocabularies)
– use of proxies

Predicting risk
• Group together the elements of the data journey diagram with similar properties.
• Overlay the groupings onto the landscape to form boundaries.

Evaluation
• Retrospective evaluation
• Real-world case study
• Results:
– Accurately predicted: 13 out of 19 predictions.
– Also predicted 7 that had not been found by humans, but were assessed as feasible by domain experts.
• http://datajourney.org/publications/tech_rep_data_journey.pdf
03 January 2017 Iliada Eleftheriou 15

Conclusion
• Contributions:
– a set of 32 IT failure factors
– data movement patterns
– the data journey model, which can potentially identify opportunities for cost saving
• Next: application to another case study
– verify the set of boundaries with the genomics team at St Mary's Hospital.

Data journey modelling: Predicting risk for IT developments.
Iliada Eleftheriou
iliada.eleftheriou@manchester.ac.uk
DATAjourney.org


1. Data journey modelling: Predicting risk of IT developments. Iliada Eleftheriou, Suzanne M. Embury and Andy Brass. Principles of Enterprise Modelling, Nov 2016.


Editor's Notes

I am Iliada, from the University of Manchester (UoM), and I am now in my fourth and final year. My project investigates the challenges and risks of moving data across contexts. Today, I will be presenting our paper on how we conceived the data journey model, a lightweight technique that assists in predicting risk for new IT developments.
Is bigger always better? In the context of cost estimation modelling, of course. Are bigger, more complex and more detailed cost estimation techniques always preferable?
Organisations often have new requirements coming in that call for new functionality to be implemented on top of an existing network of people, systems and data. For example: two departments merge and need their data to be integrated; existing data needs to be shared with an external agency to create new value; or additional data needs to be shared with a consumer. Managers and stakeholders of these organisations have to decide quickly whether it is worth proceeding with the new development or not. It might sound like a simple decision, but in real life it is a bit more complicated.
Here we see a drawing from the King's Fund that attempts to map the structure of the National Health Service in the UK. As we can see, organisations are large and complex, with several sub-organisations and departments, each with its own infrastructure, people, policies, governance and politics. Experience shows that integrating new functionality into an already crowded infrastructure causes things to go wrong: costs are often underestimated, projects are abandoned, and jobs are lost. So how can we make a go/no-go decision in a defensible way, and avoid any newspaper headlines?
Ideally, we would search the literature for an off-the-shelf cost estimation technique. Current approaches to managing risk and estimating cost mainly focus on creating detailed predictions based on substantial models of the planned development. They aim to support project managers throughout the development process, rather than giving a low-cost indicator for use in early-stage decision making. Approaches such as COCOMO, PRINCE, I* and UML are powerful and very useful, but only later in the cycle. We might have only a day, a week or at most a month to make the decision.
The aim of our project is to help managers and stakeholders of large, complex organisations make better-informed decisions on whether to proceed with a new development or not. To do so, we developed a method that reliably predicts the risk of new developments and can be used in early-stage decision making. Following the agile methodology ( ), we came up with a rather simplistic model. The data journey model, is a
A. We analysed 18 case studies from the NHS domain. They were written by NHS staff and describe recent IT developments. Surprisingly, only 3 of the 18 studies were categorised by their authors as having been successful; the rest were described as having (completely or partly) failed to deliver the expected benefits. B. We looked for factors influencing the success and failure of a newly introduced development in an existing setting, and extracted a set of 32 factors that contributed to the failure of the developments. We found not just technical issues, such as heterogeneous data sources, but also a majority of social (people- and organisation-related) factors, such as resistance to change, lack of shared vision, and governance and ethical issues. C. Some form of data movement, whether between people, systems or organisations, was a key indicator of failure. Finally, we went through the case studies again and derived generic data movement anti-patterns to serve as early warning signs of failure in a new development.
Data entry is a time-consuming process, typically done by clerical staff who may not have a strong understanding of the meaning of the data they are entering. Errors can easily be injected that may significantly reduce the quality of the information.
We have found 8 anti-patterns so far. Of course, this is not a complete and final list, but it can get us started. I explain each of them in more detail in the paper. But we can't just advise managers to avoid any movement of data.
Hence, we propose the data journey model, which builds on the patterns and assists managers in predicting risk. But let's begin with an example. Let's imagine we go to our local doctor, the GP, to request a blood test. Example used: a GP requests blood test results from a pathology lab. A new external agency requires demographics data from the pathology lab to make workload sharing more effective. So, let's design a data journey model.
As I mentioned before, the data journey model (DJM) models the journey, or movement, of data within and across organisations.
Having modelled the existing journeys of the data and the new journey required by the new functionality, we can predict places in the journey that may impose high costs and risks on the new development.
From the data movement anti-patterns, we found that high cost and risk occurred when data moved between actors and containers with some key discrepancies: change of media; discontinuity (external organisation); and actors' properties (clash of grammar, with salary band as a proxy). We need low-cost ways of incorporating these factors into the data journey model. In some cases, the information is readily available (such as whether a container stores data in physical or electronic form). However, other factors, like people's vocabularies, are less obvious. For these factors we use a proxy: some piece of information that is cheap to apply and approximates the same relationship between the actors and containers as the original factor. For example, we use salary bands as a proxy indicator for the presence of a "clash of grammars", on the grounds that a large difference in salary bands between actors probably indicates a different degree of technical expertise.
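
As a rough illustration of the salary-band proxy, the check could be as simple as comparing two band numbers. This is a sketch under stated assumptions: the function name, the numeric bands and the `min_gap` threshold are all hypothetical and do not come from the paper.

```python
def clash_of_grammar_likely(band_a: int, band_b: int, min_gap: int = 3) -> bool:
    """Flag a possible clash of grammar when two actors' salary bands differ widely.

    min_gap is an illustrative threshold, not a value from the paper.
    """
    return abs(band_a - band_b) >= min_gap


# Hypothetical actors and salary bands, for illustration only.
bands = {"data entry clerk": 2, "lab technician": 6, "consultant": 8}

clerk_vs_consultant = clash_of_grammar_likely(bands["data entry clerk"], bands["consultant"])
technician_vs_consultant = clash_of_grammar_likely(bands["lab technician"], bands["consultant"])
print(clerk_vs_consultant, technician_vs_consultant)
```

The point of a proxy like this is that it is cheap to look up; it only suggests where a closer look at people's vocabularies may be worthwhile.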
To identify the places in which the above factors may impose costs, we group together the elements of the data journey diagram with similar properties. These groupings are overlaid onto the landscape of the data journey model and form boundaries. The places where a journey leg crosses from one grouping into another are the predicted locations of the cost or risk introduced by the external organisational factor. As we can see, the model predicts high-cost places not only in the new functionality but also in the existing landscape. The resulting list of places tells managers where the possible costs need further investigation.
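
The boundary idea described above can be sketched in a few lines: assign each element to a grouping, then report the legs whose two ends fall in different groupings. This is a minimal sketch, not the authors' tool; the element names, the grouping by organisation and the function name are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Leg = Tuple[str, str]  # (source element, target element)


def boundary_crossings(legs: List[Leg], group_of: Dict[str, str]) -> List[Leg]:
    """Return the legs whose source and target belong to different groupings."""
    return [(src, dst) for src, dst in legs if group_of[src] != group_of[dst]]


# Hypothetical grouping of elements by the organisation that owns them.
group_of = {
    "GP practice system": "GP practice",
    "Pathology lab system": "Hospital",
    "Pathology lab archive": "Hospital",
    "Agency system": "External agency",
}

# Hypothetical journey legs in the landscape.
legs = [
    ("Pathology lab system", "GP practice system"),
    ("Pathology lab system", "Pathology lab archive"),
    ("Pathology lab system", "Agency system"),
]

crossings = boundary_crossings(legs, group_of)
for src, dst in crossings:
    print(f"Boundary crossed: {src} -> {dst}")
```

The same function could be run once per property (organisation, media, salary band) by passing a different `group_of` mapping, which matches the idea of overlaying several groupings on one landscape.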
We did evaluate our model, though that is part of another paper.
Our methodology can potentially be used to identify opportunities for cost saving in an existing system, as well as to predict the costs and risks of new developments. The methodology may also be used to assess organisational readiness for various compliance programmes, such as clinical guidelines for the management of chronic conditions like diabetes. The guidelines can be modelled as sets of data journeys to check whether or not the organisation follows them. If the organisation does not implement a data journey that a guideline requires, the model will show the cost of compliance to the organisation.
For any questions or further clarifications, please don’t hesitate to contact me. My email is: iliada.eleftheriou@manchester.ac.uk
