This deck proposes data journey modelling, a method for predicting the risks and costs of IT developments at an early stage. It maps existing and new journeys of data through an organisation's systems and identifies where data movement between different actors or formats could cause problems. The method was evaluated retrospectively against 18 NHS IT case studies, where 13 of its 19 risk predictions proved accurate. The approach offers a lightweight alternative to more complex modelling techniques for early-stage decision making.
1. Data journey modelling: Predicting risk of IT developments
Iliada Eleftheriou, Suzanne M. Embury and Andy Brass
Principles of Enterprise Modelling, Nov 2016
5. Related work
• Current approaches (COCOMO, PRINCE, UML, etc.):
– are mainly focused on detailed predictions based on substantial models,
– support project managers throughout the development process, rather than giving a low-cost indicator for use in early-stage decision making.
• There is a need for a lightweight approach that gives reliable predictions and can be used early.
6. Project aim
• To develop a method that:
– reliably predicts places of cost and risk,
– can be used in early-stage decision making.
• The data journey model:
– is a lightweight technique,
– captures the journey of data through complex networks of people and systems,
– identifies socio-technical challenges along the journey,
– highlights places of high cost and risk.
7. Methods
• 18 case studies from the NHS domain:
– recent IT developments,
– only 3 successful.
• IT failure factors:
– technical, e.g. conflicting data formats, data silos,
– social: human- and organisation-related factors.
• Data movement: a key indicator of failure.
8. Conceptual model
• Data movement anti-patterns: movements of data that, under some circumstances, impose costs on the new development.
• Example (change of media): if the source stores the data in physical form and the target requests it in electronic form, a transformation cost is implied at either end of the movement: manual data entry, with the injection of errors (see the sketch after this list).
• Administrative costs:
– data sharing agreements,
– governance requirements,
– ethical issues.
• Data islands
• Legacy systems
• Clash of grammar: dates, experience, knowledge.
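As a concrete illustration of the media-change anti-pattern above, here is a minimal sketch in Python; it is not code from the deck, and the Medium enum and function name are my own assumptions:

```python
from enum import Enum

class Medium(Enum):
    PHYSICAL = "physical"      # e.g. paper forms, printed reports
    ELECTRONIC = "electronic"  # e.g. database records, messages

def transformation_cost_implied(source: Medium, target: Medium) -> bool:
    """Media-change anti-pattern: if data rests in one medium but is
    requested in another, someone must transform it (e.g. by manual
    data entry), which costs effort and risks injecting errors."""
    return source != target

# A physical-to-electronic movement implies a transformation cost.
assert transformation_cost_implied(Medium.PHYSICAL, Medium.ELECTRONIC)
```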
11. Operational model
Data journey model:
A. Landscape: the existing journeys of data within an organisational landscape, happening at any given time.
B. New journey: the data journey needed by the new functionality.
A data journey landscape captures both the social and the technical factors that can affect the journey of data.
DATAjourney.org
12. Operational model
• A data journey is a set of data movements between containers.
• A journey leg moves data through media.
• Actors interact with containers.
(These concepts are sketched in code below.)
DATAjourney.org
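To make these definitions concrete, here is a minimal sketch of the operational model as Python dataclasses; this is my own illustration under assumed names, not code from the DATAjourney project:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Actor:
    """A person or role that interacts with containers."""
    name: str

@dataclass(frozen=True)
class Container:
    """A place where data rests: a system, database or paper archive."""
    name: str
    organisation: str
    medium: str  # "physical" or "electronic"

@dataclass(frozen=True)
class JourneyLeg:
    """One movement of data between two containers, through some medium."""
    source: Container
    target: Container
    medium: str  # e.g. "post", "email", "system interface"

@dataclass
class DataJourney:
    """A data journey is a set of data movements (legs) between containers."""
    legs: list[JourneyLeg] = field(default_factory=list)

@dataclass
class Landscape:
    """Existing journeys, plus the actors interacting with the containers."""
    journeys: list[DataJourney] = field(default_factory=list)
    actors: list[Actor] = field(default_factory=list)
```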
13. Predicting risk
• Data movement anti-patterns: high cost and risk occurred when data moved between actors and containers with key discrepancies:
– change of media (physical to electronic),
– discontinuity (crossing into an external organisation),
– actors' properties (clash of grammar).
• We need low-cost ways to incorporate these patterns:
– in some cases, the information is readily available,
– other factors are less obvious (e.g. people's vocabularies),
– use of proxies (see the sketch after this list).
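A sketch of how these three checks might be applied to a single journey leg; the dict representation, the field names and the salary-band threshold are all assumptions made for illustration, not the authors' implementation:

```python
SALARY_BAND_GAP = 3  # assumed threshold for the clash-of-grammar proxy

def flag_risks(leg: dict) -> list[str]:
    """Return the data movement anti-patterns that one journey leg triggers."""
    risks = []
    if leg["source_medium"] != leg["target_medium"]:
        risks.append("change of media")           # e.g. physical -> electronic
    if leg["source_org"] != leg["target_org"]:
        risks.append("discontinuity")             # crosses into another organisation
    if abs(leg["source_band"] - leg["target_band"]) >= SALARY_BAND_GAP:
        risks.append("clash of grammar (proxy)")  # salary band approximates vocabulary
    return risks

leg = {"source_medium": "physical", "target_medium": "electronic",
       "source_org": "GP practice", "target_org": "pathology lab",
       "source_band": 3, "target_band": 8}
print(flag_risks(leg))  # all three anti-patterns fire for this leg
```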
14. Predicting risk
• Group together the elements of the data journey diagram with similar properties.
• Overlay the groupings onto the landscape to form boundaries (a sketch of this step follows).
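The editor's notes below explain that risk is predicted wherever a journey leg crosses from one grouping into another; here is a minimal sketch of that overlay step, with hypothetical container and group names:

```python
def predict_risky_legs(legs, group_of):
    """legs: (source, target) container pairs; group_of: container -> group label.
    A leg whose endpoints fall in different groups crosses a boundary and is
    flagged as a predicted location of cost or risk."""
    return [(src, tgt) for src, tgt in legs if group_of[src] != group_of[tgt]]

group_of = {"GP system": "GP practice",
            "lab LIMS": "pathology lab",
            "agency DB": "external agency"}
legs = [("GP system", "lab LIMS"), ("lab LIMS", "agency DB")]
# Both legs cross an organisational boundary, so both are flagged.
print(predict_risky_legs(legs, group_of))
```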
15. Evaluation
• Retrospective evaluation on a real-world case study.
• Results:
– 13 out of 19 predictions were accurate,
– the model also predicted 7 risks that had not been found by humans but were assessed as feasible by domain experts.
• http://datajourney.org/publications/tech_rep_data_journey.pdf
16. Conclusion
• Contributions:
– a set of 32 IT failure factors,
– data movement anti-patterns,
– the data journey model, which can potentially identify opportunities for cost saving.
• Next: application to another case study
– verify the set of boundaries with the genomics team of St Mary's Hospital.
17. Data journey modelling: Predicting risk of IT developments.
Iliada Eleftheriou
iliada.eleftheriou@manchester.ac.uk
DATAjourney.org
Editor's Notes
I am Iliada, I come from the University of Manchester, and I am now in my fourth and final year.
My project investigates challenges and risks of moving data across contexts.
Today, I will be presenting our paper on how we conceived the data journey model;
a lightweight technique that assists in predicting risk for new IT developments.
Is bigger always better? In the context of cost estimation modelling, of course.
Are bigger, more complex and more detailed cost estimation techniques always preferable?
Often organisations have new requirements coming in, requiring new functionality to be implemented on top of an existing network of people, systems and data.
For example, two departments merge and their data needs to be integrated;
existing data needs to be shared with an external agency to create new value;
or additional data needs to be shared with a consumer.
Managers and stakeholders of these organisations will have to make a quick decision on whether it is worth proceeding with the new development or not.
It might sound like a simple decision, but in real life it is a bit more complicated.
Here we see a drawing from the King's Fund attempting to depict the structure of the National Health Service in the UK. As we can see,
organisations are large and complex, with several sub-organisations and departments, each with its own infrastructure, people, policies, governance and politics.
Experience shows us that integrating new functionality into an already crowded infrastructure causes things to go wrong.
Costs are often underestimated,
Projects are given up
And jobs are lost
So how can we make a go / no go decision in a defensible way, and avoid any newspaper headlines?
Ideally, we would search the literature for an off-the-shelf cost estimation technique.
Current approaches to managing risk and estimating the cost
are mainly focused on creating detailed predictions based on substantial models of the planned development.
They aim to support project managers throughout the development process, rather than giving a low-cost indicator for use in early-stage decision making.
Such approaches, like COCOMO, PRINCE, i* and UML,
are powerful and very useful, but later in the cycle. We might have only a day, a week or at most a month to take the decision.
The aim of our project is to help managers and stakeholders of large complex organisations to make better informed decisions on whether to proceed with a new development or not.
To do so, we developed a method that reliably predicts the risks of new developments and can be used in early-stage decision making.
Following an agile methodology, we came up with a rather simple model: the data journey model, a lightweight technique that captures the journey of data through networks of people and systems.
We analysed 18 case studies from the NHS domain.
They were written by NHS staff and describe recent IT developments.
Surprisingly, only 3 out of the 18 studies were categorised by the authors as having been successful.
The rest were described as having (completely or partly) failed to deliver the expected benefits.
B. We looked for factors influencing the success and failure of the newly introduced development in an existing setting,
and we extracted a set of 32 factors that contributed to the failure of the developments.
We found not just technical issues (e.g. heterogeneous data sources) but also a majority of social, people- and organisation-related factors, like:
resistance to change, lack of shared vision,
governance and ethical issues.
C. A form of data movement, whether between people, systems or organisations, was a key indicator of failure.
Finally, we went through the case studies again and derived generic data movement anti-patterns to serve as early warning signs of failure in a new development.
Data entry is a time-consuming process typically done by clerical staff, who may not have a strong understanding of the meaning of the data they are entering.
Errors can easily be injected that may significantly reduce the quality of the information.
We have found 8 anti-patterns so far. Of course, this is not a complete and final list, but it can get us started. I explain each of them in more detail in the paper.
But we can't simply tell managers to avoid any movement of data.
Hence we propose the data journey model, which, based on these anti-patterns, assists managers in predicting risk.
But let's begin with an example. Let's imagine we go to our local doctor, the GP, to request a blood test.
Example used: a GP requests blood test results from a pathology lab, and a new external agency requires demographics data from the pathology lab to make workload sharing more effective.
So, let's design a data journey model.
As I mentioned before, the data journey model captures the journey, or movement, of data within and across organisations.
Having modelled the existing journeys of the data and the new journey required by the new functionality, we can predict the places in the journey that may impose high costs and risks on the new development.
From data movement anti-patterns, we found that high cost and risk occurred when data moved between actors and containers with some key discrepancies:
Change of media
Discontinuity (external organisation)
Actors' properties (clash of grammar), using salary band as a proxy.
We need low-cost ways of incorporating these factors into the data journey model. In some cases, the information is readily available (like whether a container stores data in physical or electronic form).
However, other factors, like people's vocabularies, are less obvious. For these factors we use a proxy: some piece of information which is cheap to apply and approximates the same relationship between the actors and containers as the original factor. For example, we use salary bands as a proxy indicator for the presence of a "clash of grammars", on the grounds that a large difference in salary bands between actors probably indicates a different degree of technical expertise.
To identify the places in which the above factors may impose costs, we group together the elements of the data journey diagram with similar properties. These groupings are overlaid onto the landscape of the data journey model and form boundaries. The places where a journey leg crosses from one grouping into another are the predicted locations of the cost or risk introduced by the external organisational factor.
As we can see, the model doesn't only predict high-cost places of the new functionality, but also of the existing landscape.
The list of places suggests to managers where further investigation of the potential costs is needed.
We did evaluate our model, though that is part of another paper.
Our methodology can potentially be used to identify opportunities for cost saving in an existing system, as well as predicting costs and risks of new developments.
Also, the methodology may be used to assess organisational readiness for various compliance programmes, such as clinical guidelines for management of chronic conditions, like diabetes.
The guidelines can be modelled as sets of data journeys, to check whether the organisation follows them or not.
If the organisation does not implement a guideline's data journey, the model will show the cost of compliance to the organisation.
For any questions or further clarifications, please don’t hesitate to contact me. My email is: iliada.eleftheriou@manchester.ac.uk