These are the slides for my talk "Data Quality as a prerequisite for your business success: when should I start taking care of it?", delivered as an invited keynote at the HackCodeX Forum, which gathered international experts to share their experience and knowledge of emerging technologies and areas such as Artificial Intelligence, Security, Data Quality, Quantum Computing, Sustainability, Open Data, and Privacy.
Tackling data quality problems requires more than a series of tactical, one-off improvement projects. By their nature, many data quality problems extend across, and often beyond, an organization. Addressing these issues requires a holistic architectural approach combining people, process, and technology. Join Donna Burbank and Nigel Turner as they provide practical ways to control data quality issues in your organization.
You Need a Data Catalog. Do You Know Why? (Precisely)
The data catalog has become a popular discussion topic within data management and data governance circles. A data catalog is a central repository that contains metadata describing data sets: how they are defined and where to find them. TDWI research indicates that implementing a data catalog is a top priority among surveyed organizations. The data catalog can also play an important part in the governance process. It provides features that help ensure data quality and compliance, and that trusted data is used for analysis. Without in-depth knowledge of data and its associated metadata, organizations cannot truly safeguard and govern their data.
Join this on-demand webinar to learn more about the data catalog and its role in data governance efforts.
Topics include:
· Data management challenges and priorities
· The modern data catalog – what it is and why it is important
· The role of the modern data catalog in your data quality and governance programs
· The kinds of information that should be in your data catalog and why
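The kinds of information a catalog holds per data set can be sketched as a simple record. This is an illustrative sketch only; the field names and sample values are invented for the example and do not refer to any specific catalog product.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """Illustrative data catalog entry: metadata describing one data set."""
    name: str            # business-friendly data set name
    definition: str      # what the data set represents
    location: str        # where to find it (e.g., a table or path)
    owner: str           # who is accountable for the data
    quality_checks: list = field(default_factory=list)  # checks that build trust

# Hypothetical example entry
entry = CatalogEntry(
    name="customer_master",
    definition="Golden record of active customers",
    location="warehouse.crm.customer_master",
    owner="Customer Data Steward",
    quality_checks=["no duplicate customer_id", "email format validated"],
)
print(entry.location)  # warehouse.crm.customer_master
```

Capturing definition, location, ownership, and quality checks together is what lets a catalog support both discovery and governance.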
How to Build & Sustain a Data Governance Operating Model (DATUM LLC)
Learn how to execute a data governance strategy through creation of a successful business case and operating model.
Originally presented to an audience of 400+ at the Master Data Management & Data Governance Summit.
Visit www.datumstrategy.com for more!
Measuring Data Quality Return on Investment (DATAVERSITY)
Data Quality is an elusive subject that can defy measurement and yet be critical enough to derail any project, strategic initiative, or even a company. The data layer of an organization is a critical component because it is so easy to ignore the quality of that data or to make overly optimistic assumptions about its efficacy. Having Data Quality as a focus is a business philosophy that aligns strategy, business culture, company information, and technology in order to manage data to the benefit of the enterprise. It is a competitive strategy.
Data Governance Best Practices, Assessments, and Roadmaps (DATAVERSITY)
When starting or evaluating the present state of your Data Governance program, it is important to focus on best practices so that you don’t take a “ready, fire, aim” approach. Best practices must be practical and achievable to be selected for your organization, and the program is at risk if a selected best practice is not achieved.
Join Bob Seiner for an important webinar focused on industry best practice around standing up formal Data Governance. Learn how to assess your organization against the practices and deliver an effective roadmap based on the results of conducting the assessment.
In this webinar, Bob will focus on:
- Criteria to select the appropriate best practices for your organization
- How to define the best practices for ultimate impact
- Assessing against selected best practices
- Focusing the recommendations on program success
- Delivering a roadmap for your Data Governance program
This introduction to data governance presentation covers the inter-related DM foundational disciplines (Data Integration / DWH, Business Intelligence, and Data Governance), along with some of the pitfalls and success factors for data governance.
• IM Foundational Disciplines
• Cross-functional Workflow Exchange
• Key Objectives of the Data Governance Framework
• Components of a Data Governance Framework
• Key Roles in Data Governance
• Data Governance Committee (DGC)
• 4 Data Governance Policy Areas
• 3 Challenges to Implementing Data Governance
• Data Governance Success Factors
Data Architecture Strategies: Data Architecture for Digital Transformation (DATAVERSITY)
Digital transformation builds on foundational data management disciplines: MDM, data quality, data architecture, and more. At the same time, combining these foundational data management approaches with other innovative techniques can help drive organizational change as well as technological transformation. This webinar will provide practical steps for creating a data foundation for effective digital transformation.
Data-Ed Webinar: Data Quality Success Stories (DATAVERSITY)
Organizations must realize what it means to utilize data quality management in support of business strategy. This webinar will demonstrate how chronic business challenges can often be attributed to the root problem of poor data quality. Showing how data quality should be engineered provides a useful framework in which to develop an effective approach. Establishing this framework allows organizations to more efficiently identify business and data problems caused by structural issues versus practice-oriented defects, giving them the skill set to prevent these problems from recurring.
Learning Objectives:
Understanding foundational data quality concepts based on the DAMA DMBOK
Utilizing data quality engineering in support of business strategy
Case Studies illustrating data quality success
Data quality guiding principles & best practices
Steps for improving data quality at your organization
Formalize Data Governance with Policies and Procedures (DATAVERSITY)
Policies and procedures lie at the heart of institutionalizing data governance. Data Governance is defined as the act of “executing and enforcing authority” to follow the procedures and enforce the policies. You can formalize Data Governance by clearly defining and following policies and procedures.
Join Bob Seiner for this month’s installment of the Real-World Data Governance webinar series, where he will discuss how data governance can be formalized in parallel with the delivery of data policy and detailed procedures. Challenges associated with changing the behavior of data stewards will be identified, discussed, and resolved during this session.
In this webinar Bob will discuss:
The relationship between Data Governance and Data Policy
Core guidelines to embrace through policy
DG Roles and their importance to following Policies and Procedures
Using RACIs and similar constructs to formalize Data Governance
Measuring the results of formalizing policies and procedures
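A RACI matrix, mentioned above as a construct for formalizing Data Governance, can be represented as a simple mapping from activity to roles. The activities and role names below are hypothetical examples, not a prescribed governance model.

```python
# Hypothetical RACI matrix: activity -> {role: R/A/C/I code}
# R = Responsible, A = Accountable, C = Consulted, I = Informed
raci = {
    "Define data policy": {
        "CDO": "A", "Data Steward": "R", "IT": "C", "Business Users": "I",
    },
    "Approve glossary term": {
        "CDO": "I", "Data Steward": "A", "IT": "C", "Business Users": "R",
    },
}

def accountable(activity: str) -> str:
    """Return the single role accountable (A) for an activity."""
    owners = [role for role, code in raci[activity].items() if code == "A"]
    # A well-formed RACI has exactly one Accountable role per activity
    assert len(owners) == 1, f"{activity!r} must have exactly one 'A' role"
    return owners[0]

print(accountable("Define data policy"))  # CDO
```

Encoding the matrix this way makes the "exactly one Accountable role" rule checkable rather than just a slide-ware convention.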
Glossaries, Dictionaries, and Catalogs Result in Data Governance (DATAVERSITY)
Data catalogs, business glossaries, and data dictionaries house metadata that is important to your organization’s governance of data. People in your organization need to be engaged in leveraging the tools, understanding the data that is available, who is responsible for the data, and knowing how to get their hands on the data to perform their job function. The metadata will not govern itself.
Join Bob Seiner for the webinar where he will discuss how glossaries, dictionaries, and catalogs can result in effective Data Governance. People must have confidence in the metadata associated with the data that you need them to trust. Therefore, the metadata in your data catalog, business glossary, and data dictionary must result in governed data. Learn how glossaries, dictionaries, and catalogs can result in Data Governance in this webinar.
Bob will discuss the following subjects in this webinar:
- Successful Data Governance relies on value from very important tools
- What it means to govern your data catalog, business glossary, and data dictionary
- Why governing the metadata in these tools is important
- The roles necessary to govern these tools
- Governance expected from metadata in catalogs, glossaries, and dictionaries
DAS Slides: Data Governance - Combining Data Management with Organizational ... (DATAVERSITY)
Data Governance is both a technical and an organizational discipline, and getting Data Governance right requires a combination of Data Management fundamentals aligned with organizational change and stakeholder buy-in. Join Nigel Turner and Donna Burbank as they provide an architecture-based approach to aligning business motivation, organizational change, Metadata Management, Data Architecture and more in a concrete, practical way to achieve success in your organization.
Data Governance Takes a Village (So Why is Everyone Hiding?) (DATAVERSITY)
Data governance represents both an obstacle and opportunity for enterprises everywhere. And many individuals may hesitate to embrace the change. Yet if led well, a governance initiative has the potential to launch a data community that drives innovation and data-driven decision-making for the wider business. (And yes, it can even be fun!). So how do you build a roadmap to success?
This session will gather four governance experts, including Mary Williams, Associate Director, Enterprise Data Governance at Exact Sciences, and Bob Seiner, author of Non-Invasive Data Governance, for a roundtable discussion about the challenges and opportunities of leading a governance initiative that people embrace. Join this webinar to learn:
- How to build an internal case for data governance and a data catalog
- Tips for picking a use case that builds confidence in your program
- How to mature your program and build your data community
To take a “ready, aim, fire” approach to implementing Data Governance, many organizations assess themselves against industry best practices. The process is not difficult or time-consuming, and it helps ensure that your activities target your specific needs. Best practices are always a strong place to start.
Join Bob Seiner for this popular RWDG topic, where he will provide the information you need to set your program in the best possible direction. Bob will walk you through the steps of conducting an assessment and share with you a set of typical results from taking this action. You may be surprised at how easy it is to organize the assessment and may hear results that stimulate the actions that you need to take.
In this webinar, Bob will share:
- The value of performing a Data Governance best practice assessment
- A practical list of industry Data Governance best practices
- Criteria to determine if a practice is best practice
- Steps to follow to complete an assessment
- Typical recommendations and actions that result from an assessment
Improving Data Literacy Around Data Architecture (DATAVERSITY)
Data Literacy is an increasing concern, as organizations look to become more data-driven. As the rise of the citizen data scientist and self-service data analytics becomes increasingly common, the need for business users to understand core Data Management fundamentals is more important than ever. At the same time, technical roles need a strong foundation in Data Architecture principles and best practices. Join this webinar to understand the key components of Data Literacy, and practical ways to implement a Data Literacy program in your organization.
The first step towards understanding data assets’ impact on your organization is understanding what those assets mean for each other. Metadata – literally, data about data – is a practice area required by good systems development, and yet is also perhaps the most mislabeled and misunderstood Data Management practice. Understanding metadata and its associated technologies as more than just straightforward technological tools can provide powerful insight into the efficiency of organizational practices and enable you to combine practices into sophisticated techniques supporting larger and more complex business initiatives. Program learning objectives include:
- Understanding how to leverage metadata practices in support of business strategy
- Discussing foundational metadata concepts
- Guiding principles and lessons learned from applying metadata practices to strategy
Metadata strategies include:
- Metadata is a gerund so don’t try to treat it as a noun
- Metadata is the language of Data Governance
- Treat glossaries/repositories as capabilities, not technology
How to Strengthen Enterprise Data Governance with Data Quality (DATAVERSITY)
If your organization is in a highly-regulated industry – or relies on data for competitive advantage – data governance is undoubtedly a top priority. Whether you’re focused on “defensive” data governance (supporting regulatory compliance and risk management) or “offensive” data governance (extracting the maximum value from your data assets, and minimizing the cost of bad data), data quality plays a critical role in ensuring success.
Join our webinar to learn how enterprise data quality drives stronger data governance, including:
The overlaps between data governance and data quality
The “data” dependencies of data governance – and how data quality addresses them
Key considerations for deploying data quality for data governance
In this lecture we discuss data quality in general and data quality in Linked Data. This 50-minute lecture was given to master's students at Trinity College Dublin (Ireland) and covered the following contents:
1) Defining Quality
2) Defining Data Quality - What, Why, Costs
3) Identifying problems early - using a simple semantic publishing process as an example
4) Assessing Linked (big) Data quality
5) Quality of LOD cloud datasets
References can be found at the end of the slides.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).
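The kind of assessment the lecture discusses, identifying problems early by measuring dimensions such as completeness and validity, can be sketched in a few lines. The sample records and the email rule below are invented for illustration.

```python
import re

# Hypothetical sample records with deliberate quality problems
records = [
    {"id": 1, "email": "alice@example.org", "country": "IE"},
    {"id": 2, "email": None,                "country": "IE"},   # missing value
    {"id": 3, "email": "not-an-email",      "country": "XX"},   # invalid format
]

def completeness(field: str) -> float:
    """Fraction of records where the field is present (not None)."""
    return sum(r[field] is not None for r in records) / len(records)

def validity(field: str, pattern: str) -> float:
    """Fraction of non-missing values fully matching a pattern."""
    values = [r[field] for r in records if r[field] is not None]
    return sum(bool(re.fullmatch(pattern, v)) for v in values) / len(values)

print(round(completeness("email"), 2))                      # 0.67
print(round(validity("email", r"[^@]+@[^@]+\.[^@]+"), 2))   # 0.5
```

Separating completeness from validity matters: a missing value and a malformed value have different root causes and different fixes.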
Enterprise Architecture vs. Data Architecture (DATAVERSITY)
Enterprise Architecture (EA) provides a visual blueprint of the organization, and shows key interrelationships between data, process, applications, and more. By abstracting these assets in a graphical view, it’s possible to see key interrelationships, particularly as they relate to data and its business impact across the organization. Join us for a discussion on how Data Architecture is a key component of an overall Enterprise Architecture for enhanced business value and success.
Data Governance and Metadata Management (DATAVERSITY)
Metadata is a tool that improves data understanding, builds end-user confidence, and improves the return on investment in every asset associated with becoming a data-centric organization. Metadata’s use has expanded beyond “data about data” to cover every phase of data analytics, protection, and quality improvement. Data Governance and metadata are connected at the hip in every way possible. As the song goes, “You can’t have one without the other.”
In this RWDG webinar, Bob Seiner will provide a way to renew your energy by focusing on the valuable asset that can make or break your Data Governance program’s success. The truth is metadata is already inherent in your data environment, and it can be leveraged by making it available to all levels of the organization. At issue is finding the most appropriate ways to leverage and share metadata to improve data value and protection.
Throughout this webinar, Bob will share information about:
- Delivering an improved definition of metadata
- Communicating the relationship between successful governance and metadata
- Getting your business community to embrace the need for metadata
- Determining the metadata that will provide the most bang for your buck
- The importance of Metadata Management to becoming data-centric
The Role of Data Governance in a Data Strategy (DATAVERSITY)
A Data Strategy is a plan for moving an organization towards a more data-driven culture. A Data Strategy is often viewed as a technical exercise. A modern and comprehensive Data Strategy addresses more than just the data; it is a roadmap that defines people, process, and technology. The people aspect includes governance, the execution and enforcement of authority, and formalization of accountability over the management of the data.
In this RWDG webinar, Bob Seiner will share where Data Governance fits into an effective Data Strategy. As part of the strategy, the program must focus on the governance of people, process, and technology fixated on treating and leveraging data as a valued asset. Join us to learn about the role of Data Governance in a Data Strategy.
Bob will address the following in this webinar:
- A structure for delivery of a Data Strategy
- How to address people, process, and technology in a Data Strategy
- Why Data Governance is an important piece of a Data Strategy
- How to include Data Governance in the structure of the policy
- Examples of how governance has been included in a Data Strategy
Data Management, Metadata Management, and Data Governance – Working Together (DATAVERSITY)
The data disciplines listed in the title must work together. The key to success is understanding the boundaries and overlaps between the disciplines. Wouldn’t it be great to be able to present the relationships between the disciplines in a single all-in-one diagram? At the end of this webinar, you will be able to do just that.
This new RWDG webinar with Bob Seiner will outline how Data Management, Metadata Management, and Data Governance can be optimized to work together. Bob will share a diagram that has successfully communicated the relationship between these disciplines to leadership, resulting in the disciplines working in harmony and delivering success.
Bob will share the following in this webinar:
- Categories of disciplines focused on managing data as an asset
- A definition of Data Management that embraces numerous data disciplines
- The importance of Metadata Management to all data disciplines
- Why data and metadata require formal governance
- A graphic that effectively exhibits the relationship between the disciplines
It’s been three years since the General Data Protection Regulation shook up how organizations manage data security and privacy, ushering in a new focus on Data Governance. But what is the state of Data Governance today?
How has it evolved? What’s its role now? Building on prior research, erwin by Quest and ESG have partnered on a new study about what’s driving the practice of Data Governance, program maturity and current challenges. It also examines the connections to data operations and data protection, which is interesting given the fact that improving data security is now the No. 1 driver of Data Governance, according to this year’s survey respondents.
So please join us for this webinar to learn about the:
Other primary drivers for enterprise Data Governance programs
Most common bottlenecks to program maturity and sustainability
Advantages of aligning Data Governance with the other data disciplines
In a post-COVID world, data has the power to be even more transformative, and 84% of business and technology professionals say it represents the best opportunity to develop a competitive advantage during the next 12 to 24 months. Let’s make sure your organization has the intelligence it needs about both data and data systems to empower stakeholders in the front and back office to do what they need to do.
Data Quality for AI or AI for Data Quality: Advances in Data Quality Management... (Anastasija Nikiforova)
“Data is the new oil” is only partly true: according to Forbes, data is more than oil, and according to Ataccama, “Manual Data Quality Doesn’t Cut It in 2023”. This was the main driver behind my guest lecture entitled “Data Quality for AI or AI for Data Quality: advances in Data Quality Management for the success and sustainability of emerging technologies, business and society”, as part of which we discussed the role of artificial intelligence in data quality management and the role of data quality for AI, concluding that it is not about “data quality for AI” OR “AI for data quality” but rather about AND.
We also looked at the current market offer of AI-driven data quality management, the pros and cons of these solutions, the prerequisites to take into account when using them (e.g., metadata and its quality for those solutions that derive DQ rules from metadata analysis), and how a potentially more promising solution could be built.
We also looked at the data quality specificities to consider depending on the artifact: a data object (dataset) whose owner is known or unknown (open data), an Information System, a Data Warehouse, a Data Lake, a Data Lakehouse, or a Data Mesh. Where, when, and how does DQ take place in each of them? What are the current trends? And are these indeed trends, or rather hype?
Public data ecosystems in and for smart cities: how to make open / Big / smart... (Anastasija Nikiforova)
This is a set of slides used as part of my keynote "Public data ecosystems in and for smart cities: how to make open / Big / smart / geo data ecosystems value-adding for SDG-compliant Smart Living and Society 5.0", delivered at the 5th International Conference on Advanced Research Methods and Analytics (CARMA 2023): https://carmaconf2023.wordpress.com/keynote-speakers/. Read more here: https://anastasijanikiforova.com/2023/06/30/keynote-at-the-5th-international-conference-on-advanced-research-methods-and-analytics-carma-2023/
This introduction to data governance presentation covers the inter-related DM foundational disciplines (Data Integration / DWH, Business Intelligence and Data Governance). Some of the pitfalls and success factors for data governance.
• IM Foundational Disciplines
• Cross-functional Workflow Exchange
• Key Objectives of the Data Governance Framework
• Components of a Data Governance Framework
• Key Roles in Data Governance
• Data Governance Committee (DGC)
• 4 Data Governance Policy Areas
• 3 Challenges to Implementing Data Governance
• Data Governance Success Factors
Data Architecture Strategies: Data Architecture for Digital TransformationDATAVERSITY
MDM, data quality, data architecture, and more. At the same time, combining these foundational data management approaches with other innovative techniques can help drive organizational change as well as technological transformation. This webinar will provide practical steps for creating a data foundation for effective digital transformation.
Data-Ed Webinar: Data Quality Success StoriesDATAVERSITY
Organizations must realize what it means to utilize data quality management in support of business strategy. This webinar will demonstrate how chronic business challenges can often be attributed to the root problem of poor data quality. Showing how data quality should be engineered provides a useful framework in which to develop an effective approach. Establishing this framework allows organizations to more efficiently identify business and data problems caused by structural issues versus practice-oriented defects; giving them the skillset to prevent these problems from re-occurring.
Learning Objectives:
Understanding foundational data quality concepts based on the DAMA DMBOK
Utilizing data quality engineering in support of business strategy
Case Studies illustrating data quality success
Data quality guiding principles & best practices
Steps for improving data quality at your organization
Formalize Data Governance with Policies and ProceduresDATAVERSITY
Policies and procedures lie at the heart of institutionalizing data governance. Data Governance is defined as the act of “executing and enforcing authority” to follow the procedures and enforce the policies. You can formalize Data Governance by clearly defining and following policies and procedures.
Join Bob Seiner for this month’s installment of the Real-World Data Governance webinar series where he will discuss how data governance can be formalized in parallel to the delivery of data policy and detailed procedures. Challenges associated with the changing the behavior of the data stewards will be identified, discussed and resolved during this session.
In this webinar Bob will discuss:
The relationship between Data Governance and Data Policy
Core guidelines to embrace through policy
DG Roles and their importance to following Policies and Procedures
Using RACIs and similar constructs to formalize Data Governance
Measuring the results of formalizing policies and procedures
Glossaries, Dictionaries, and Catalogs Result in Data GovernanceDATAVERSITY
Data catalogs, business glossaries, and data dictionaries house metadata that is important to your organization’s governance of data. People in your organization need to be engaged in leveraging the tools, understanding the data that is available, who is responsible for the data, and knowing how to get their hands on the data to perform their job function. The metadata will not govern itself.
Join Bob Seiner for the webinar where he will discuss how glossaries, dictionaries, and catalogs can result in effective Data Governance. People must have confidence in the metadata associated with the data that you need them to trust. Therefore, the metadata in your data catalog, business glossary, and data dictionary must result in governed data. Learn how glossaries, dictionaries, and catalogs can result in Data Governance in this webinar.
Bob will discuss the following subjects in this webinar:
- Successful Data Governance relies on value from very important tools
- What it means to govern your data catalog, business glossary, and data dictionary
- Why governing the metadata in these tools is important
- The roles necessary to govern these tools
- Governance expected from metadata in catalogs, glossaries, and dictionaries
DAS Slides: Data Governance - Combining Data Management with Organizational ...DATAVERSITY
Data Governance is both a technical and an organizational discipline, and getting Data Governance right requires a combination of Data Management fundamentals aligned with organizational change and stakeholder buy-in. Join Nigel Turner and Donna Burbank as they provide an architecture-based approach to aligning business motivation, organizational change, Metadata Management, Data Architecture and more in a concrete, practical way to achieve success in your organization.
Data Governance Takes a Village (So Why is Everyone Hiding?)DATAVERSITY
Data governance represents both an obstacle and opportunity for enterprises everywhere. And many individuals may hesitate to embrace the change. Yet if led well, a governance initiative has the potential to launch a data community that drives innovation and data-driven decision-making for the wider business. (And yes, it can even be fun!). So how do you build a roadmap to success?
This session will gather four governance experts, including Mary Williams, Associate Director, Enterprise Data Governance at Exact Sciences, and Bob Seiner, author of Non-Invasive Data Governance, for a roundtable discussion about the challenges and opportunities of leading a governance initiative that people embrace. Join this webinar to learn:
- How to build an internal case for data governance and a data catalog
- Tips for picking a use case that builds confidence in your program
- How to mature your program and build your data community
To take a “ready, aim, fire” tactic to implement Data Governance, many organizations assess themselves against industry best practices. The process is not difficult or time-consuming and can directly assure that your activities target your specific needs. Best practices are always a strong place to start.
Join Bob Seiner for this popular RWDG topic, where he will provide the information you need to set your program in the best possible direction. Bob will walk you through the steps of conducting an assessment and share with you a set of typical results from taking this action. You may be surprised at how easy it is to organize the assessment and may hear results that stimulate the actions that you need to take.
In this webinar, Bob will share:
- The value of performing a Data Governance best practice assessment
- A practical list of industry Data Governance best practices
- Criteria to determine if a practice is best practice
- Steps to follow to complete an assessment
- Typical recommendations and actions that result from an assessment
Improving Data Literacy Around Data ArchitectureDATAVERSITY
Data Literacy is an increasing concern, as organizations look to become more data-driven. As the rise of the citizen data scientist and self-service data analytics becomes increasingly common, the need for business users to understand core Data Management fundamentals is more important than ever. At the same time, technical roles need a strong foundation in Data Architecture principles and best practices. Join this webinar to understand the key components of Data Literacy, and practical ways to implement a Data Literacy program in your organization.
The first step towards understanding data assets’ impact on your organization is understanding how those assets relate to one another. Metadata – literally, data about data – is a practice area required by good systems development, and yet it is also perhaps the most mislabeled and misunderstood Data Management practice. Understanding metadata and its associated technologies as more than just straightforward technological tools can provide powerful insight into the efficiency of organizational practices and enable you to combine practices into sophisticated techniques supporting larger and more complex business initiatives. Program learning objectives include:
- Understanding how to leverage metadata practices in support of business strategy
- Discussing foundational metadata concepts
- Guiding principles and lessons learned from applying metadata practices strategically
Metadata strategies include:
- Metadata is a gerund, so don’t try to treat it as a noun
- Metadata is the language of Data Governance
- Treat glossaries/repositories as capabilities, not technology
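The “capabilities, not technology” point can be made concrete: a business glossary is essentially a lookup service over governed term definitions, whatever tool hosts it. Below is a minimal, hypothetical sketch in Python; all class names, fields, and sample terms are illustrative assumptions, not material from the webinar:

```python
from dataclasses import dataclass, field

@dataclass
class GlossaryTerm:
    """A business-glossary entry: metadata about a data element."""
    name: str
    definition: str
    steward: str                               # accountability hook for governance
    synonyms: list = field(default_factory=list)

class Glossary:
    """The 'capability': resolve any known name or synonym to one governed term."""
    def __init__(self):
        self._terms = {}

    def add(self, term: GlossaryTerm):
        self._terms[term.name.lower()] = term
        for s in term.synonyms:                # synonyms point at the same entry
            self._terms[s.lower()] = term

    def lookup(self, name: str):
        return self._terms.get(name.lower())   # None if the term is ungoverned

glossary = Glossary()
glossary.add(GlossaryTerm(
    name="Customer",
    definition="A party that has purchased at least one product.",
    steward="Sales Data Steward",
    synonyms=["Client", "Account Holder"],
))

# A synonym resolves to the same governed definition and steward.
print(glossary.lookup("client").steward)   # -> Sales Data Steward
```

The design choice this illustrates: consumers depend on the lookup behavior, not on whichever repository product stores the terms.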
How to Strengthen Enterprise Data Governance with Data Quality (DATAVERSITY)
If your organization is in a highly-regulated industry – or relies on data for competitive advantage – data governance is undoubtedly a top priority. Whether you’re focused on “defensive” data governance (supporting regulatory compliance and risk management) or “offensive” data governance (extracting the maximum value from your data assets, and minimizing the cost of bad data), data quality plays a critical role in ensuring success.
Join our webinar to learn how enterprise data quality drives stronger data governance, including:
- The overlaps between data governance and data quality
- The “data” dependencies of data governance – and how data quality addresses them
- Key considerations for deploying data quality for data governance
In this lecture we discuss data quality in general and data quality in Linked Data in particular. This 50-minute lecture was given to master's students at Trinity College Dublin (Ireland) and had the following contents:
1) Defining Quality
2) Defining Data Quality - What, Why, Costs
3) Identifying problems early - using a simple semantic publishing process as an example
4) Assessing Linked (big) Data quality
5) Quality of LOD cloud datasets
References can be found at the end of the slides
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).
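To make the “Defining Data Quality” part of the outline concrete, two classic quality dimensions, completeness and syntactic validity, can be computed over a toy record set. This is a hedged sketch with invented data and an intentionally simplistic email rule, not material from the slides:

```python
import re

# Toy dataset: records with possible missing or malformed values.
records = [
    {"id": 1, "email": "alice@example.com", "age": 34},
    {"id": 2, "email": None,                "age": 29},
    {"id": 3, "email": "not-an-email",      "age": None},
]

def completeness(records, field):
    """Fraction of records where the field is present and non-null."""
    return sum(r.get(field) is not None for r in records) / len(records)

def validity(records, field, pattern):
    """Fraction of non-null values matching a syntactic rule."""
    values = [r[field] for r in records if r.get(field) is not None]
    return sum(bool(re.fullmatch(pattern, v)) for v in values) / len(values)

print(round(completeness(records, "age"), 2))                         # 0.67
print(round(validity(records, "email", r"[^@\s]+@[^@\s]+\.\w+"), 2))  # 0.5
```

Measuring dimensions like these early, before data is published or linked, is exactly the kind of “identifying problems early” step the lecture outline refers to.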
Enterprise Architecture vs. Data Architecture (DATAVERSITY)
Enterprise Architecture (EA) provides a visual blueprint of the organization, and shows key interrelationships between data, process, applications, and more. By abstracting these assets in a graphical view, it’s possible to see key interrelationships, particularly as they relate to data and its business impact across the organization. Join us for a discussion on how Data Architecture is a key component of an overall Enterprise Architecture for enhanced business value and success.
Data Governance and Metadata Management (DATAVERSITY)
Metadata is a tool that improves data understanding, builds end-user confidence, and improves the return on investment in every asset associated with becoming a data-centric organization. Metadata’s use has expanded beyond “data about data” to cover every phase of data analytics, protection, and quality improvement. Data Governance and metadata are connected at the hip in every way possible. As the song goes, “You can’t have one without the other.”
In this RWDG webinar, Bob Seiner will provide a way to renew your energy by focusing on the valuable asset that can make or break your Data Governance program’s success. The truth is metadata is already inherent in your data environment, and it can be leveraged by making it available to all levels of the organization. At issue is finding the most appropriate ways to leverage and share metadata to improve data value and protection.
Throughout this webinar, Bob will share information about:
- Delivering an improved definition of metadata
- Communicating the relationship between successful governance and metadata
- Getting your business community to embrace the need for metadata
- Determining the metadata that will provide the most bang for your buck
- The importance of Metadata Management to becoming data-centric
The Role of Data Governance in a Data Strategy (DATAVERSITY)
A Data Strategy is a plan for moving an organization towards a more data-driven culture. A Data Strategy is often viewed as a technical exercise. A modern and comprehensive Data Strategy addresses more than just the data; it is a roadmap that defines people, process, and technology. The people aspect includes governance, the execution and enforcement of authority, and formalization of accountability over the management of the data.
In this RWDG webinar, Bob Seiner will share where Data Governance fits into an effective Data Strategy. As part of the strategy, the program must focus on the governance of people, process, and technology fixated on treating and leveraging data as a valued asset. Join us to learn about the role of Data Governance in a Data Strategy.
Bob will address the following in this webinar:
- A structure for delivery of a Data Strategy
- How to address people, process, and technology in a Data Strategy
- Why Data Governance is an important piece of a Data Strategy
- How to include Data Governance in the structure of the strategy
- Examples of how governance has been included in a Data Strategy
Data Management, Metadata Management, and Data Governance – Working Together (DATAVERSITY)
The data disciplines listed in the title must work together. The key to success is understanding the boundaries and overlaps between the disciplines. Wouldn’t it be great to be able to present the relationships between the disciplines in a single all-in-one diagram? By the end of this webinar, you will be able to do just that.
This new RWDG webinar with Bob Seiner will outline how Data Management, Metadata Management, and Data Governance can be optimized to work together. Bob will share a diagram that has successfully communicated the relationship between these disciplines to leadership, resulting in the disciplines working in harmony and delivering success.
Bob will share the following in this webinar:
- Categories of disciplines focused on managing data as an asset
- A definition of Data Management that embraces numerous data disciplines
- The importance of Metadata Management to all data disciplines
- Why data and metadata require formal governance
- A graphic that effectively exhibits the relationship between the disciplines
It’s been three years since the General Data Protection Regulation shook up how organizations manage data security and privacy, ushering in a new focus on Data Governance. But what is the state of Data Governance today?
How has it evolved? What’s its role now? Building on prior research, erwin by Quest and ESG have partnered on a new study about what’s driving the practice of Data Governance, program maturity and current challenges. It also examines the connections to data operations and data protection, which is interesting given the fact that improving data security is now the No. 1 driver of Data Governance, according to this year’s survey respondents.
So please join us for this webinar to learn about the:
- Other primary drivers for enterprise Data Governance programs
- Most common bottlenecks to program maturity and sustainability
- Advantages of aligning Data Governance with the other data disciplines
In a post-COVID world, data has the power to be even more transformative, and 84% of business and technology professionals say it represents the best opportunity to develop a competitive advantage during the next 12 to 24 months. Let’s make sure your organization has the intelligence it needs about both data and data systems to empower stakeholders in the front and back office to do what they need to do.
Data Quality for AI or AI for Data quality: advances in Data Quality Manageme... (Anastasija Nikiforova)
“Data is the new oil” is only partly true: according to Forbes, data is more than oil, while according to Ataccama, “Manual Data Quality Doesn’t Cut It in 2023”. This was the main driver behind my guest lecture entitled “Data Quality for AI or AI for Data quality: advances in Data Quality Management for the success and sustainability of emerging technologies, business and society”, in which we discussed the role of artificial intelligence in data quality management and the role of data quality for AI, concluding that it is not about “data quality for AI” OR “AI for data quality”, but rather about AND.
We also looked at the current market offer for AI-driven data quality management, the pros and cons of these solutions, the prerequisites to take into account when using them (e.g., metadata and its quality for solutions that derive DQ rules from metadata analysis), and how a potentially more promising solution could be built.
We also looked at the data quality specificities to consider depending on the artifact – a data object (dataset) whose owner is known or unknown (open data), an Information System, a Data Warehouse, a Data Lake, a Data Lakehouse, a Data Mesh: where, when and how does DQ take place in each of them? What are the current trends? And are these indeed trends, or rather hype?
Public data ecosystems in and for smart cities: how to make open / Big / smar... (Anastasija Nikiforova)
This is a set of slides used as part of my keynote "Public data ecosystems in and for smart cities: how to make open / Big / smart / geo data ecosystems value-adding for SDG-compliant Smart Living and Society 5.0" delivered at the 5th International Conference on Advanced Research Methods and Analytics (CARMA 2023) -> https://carmaconf2023.wordpress.com/keynote-speakers/. read more here -> https://anastasijanikiforova.com/2023/06/30/keynote-at-the-5th-international-conference-on-advanced-research-methods-and-analytics-carma-2023/
OPEN DATA: ECOSYSTEM, CURRENT AND FUTURE TRENDS, SUCCESS STORIES AND BARRIERS (Anastasija Nikiforova)
"OPEN DATA: ECOSYSTEM, CURRENT AND FUTURE TRENDS, SUCCESS STORIES AND BARRIERS" is a set of slides prepared for the guest lecture I delivered to the students of the University of South-Eastern Norway (USN) in October 2021.
BIMCV, Banco de Imagen Medica de la Comunidad Valenciana (María de la Iglesia)
According to Hal Varian (an expert in microeconomics and the economics of information and, since 2002, Chief Economist at Google): “In the coming years, the most attractive job will be that of the statistician. The ability to collect data, understand it, process it, extract its value, visualize it, and communicate it will all be important skills in the coming decades. We now have free and ubiquitous data. What is still missing is the ability to understand that data.”
Data Science - An emerging Stream of Science with its Spreading Reach & Impact (Dr. Sunil Kr. Pandey)
This is my presentation on the topic "Data Science - An emerging Stream of Science with its Spreading Reach & Impact". I have compiled statistics and data from different sources. It may be useful for students and anyone interested in this field of study.
We have entered an era of Big Data. Through better analysis of the large volumes of data that are becoming available, there is potential for faster advances in many scientific disciplines and for improving the profitability and success of many enterprises. However, many technical challenges described in this paper must be addressed before this potential can be realized fully. The challenges include not just the obvious issues of scale, but also heterogeneity, lack of structure, error-handling, privacy, timeliness, provenance, and visualization, at all stages of the analysis pipeline from data acquisition to result interpretation.
How Can Public Data Help Your Organization? An Introduction to DataCommons.org (TechSoup)
Hosted by TechSoup on February 13, 2023.
https://events.techsoup.org/e/mykxzr/
Nonprofit organizations can use data to help communities and funders better understand their work. But how do you know which data to use? And where do you find it? And critically: once you have data to share, how can you use it to tell a story about your organization?
TechSoup is collaborating with DataCommons.org and Tech Impact’s Data Innovation Lab to help answer these questions. We know that organizing the data you need in a meaningful way can be difficult, especially if the data comes from many different places. In this webinar, you will learn how DataCommons.org helps to address this challenge, and how we are working together to make it as easy as possible for small organizations to use public data to share stories about their work and impact.
Supervised Multi Attribute Gene Manipulation For Cancer (paperpublications3)
Abstract: Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviours, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by retrospective tools typical of decision support systems.
They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. Data mining techniques are the result of a long process of research and product development. This evolution began when business data was first stored on computers, continued with improvements in data access, and more recently, generated technologies that allow users to navigate through their data in real time. Data mining takes this evolutionary process beyond retrospective data access and navigation to prospective and proactive information delivery.
The presentation is about the career path in the field of Data Science. Data Science is a multi-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.
Towards High-Value Datasets determination for data-driven development: a syst... (Anastasija Nikiforova)
Slides for the talk delivered as part of the EGOV-CeDEM-ePart 2023 (EGOV2023) conference, examining how high-value dataset (HVD) determination has been reflected in the literature over the years and what these studies have found to date, including the indicators used in them, the stakeholders involved, data-related aspects, and frameworks. This was done by conducting a Systematic Literature Review.
Read the paper here -> https://link.springer.com/chapter/10.1007/978-3-031-41138-0_14
Artificial Intelligence for open data or open data for artificial intelligence? (Anastasija Nikiforova)
This is a presentation used to deliver an invited talk for Babu Banarasi Das University (BBDU, Department of Computer Science and Engineering) Development Program «Artificial Intelligence for Sustainable Development» organized by AI Research Centre, Department of Computer Science & Engineering, ShodhGuru Research Labs, Soft Computing Research Society, IEEE UP Section, Computational Intelligence Society Chapter in 2022. Read more here -> https://anastasijanikiforova.com/2022/09/24/ai-for-open-data-or-open-data-for-ai-an-invited-talk-for-bbdu-development-program-artificial-intelligence-for-sustainable-development%f0%9f%8e%a4/
Overlooked aspects of data governance: workflow framework for enterprise data... (Anastasija Nikiforova)
This presentation is a supplementary material for the article "Overlooked aspects of data governance: workflow framework for enterprise data deduplication" (Azeroual, Nikiforova, Shei) presented at The International Conference on Intelligent Computing, Communication, Networking and Services (ICCNS2023).
Abstract of the paper: Data quality in companies is decisive and critical to the benefits their products and services can provide. However, in heterogeneous IT infrastructures where, e.g., different applications for Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), product management, manufacturing, and marketing are used, duplicates occur, e.g., multiple entries for the same customer or product in a database or information system. There can be several reasons for this, but the result of non-unique or duplicate records is degraded data quality, which ultimately leads to poorer, inefficient, and inaccurate data-driven decisions. For this reason, in this paper we develop a conceptual data governance framework for effective and efficient management of duplicate data and improvement of data accuracy and consistency in large data ecosystems. We present methods and recommendations for companies to deal with duplicate data in a meaningful way.
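As a rough illustration of the deduplication problem the paper addresses (the sketch below is my own simplification, not the framework proposed in the paper), duplicate records are commonly detected by normalizing identifying attributes into a matching key, so that cosmetic differences no longer hide the duplicate:

```python
import re

def match_key(record):
    """Normalize identifying fields into a comparable key.
    (Illustrative rule: lowercase, strip punctuation and whitespace.)"""
    raw = record["name"] + record["city"]
    return re.sub(r"[^a-z0-9]", "", raw.lower())

def deduplicate(records):
    """Keep the first record seen for each matching key."""
    seen, unique = set(), []
    for r in records:
        key = match_key(r)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

customers = [
    {"name": "ACME Corp.", "city": "Berlin"},
    {"name": "acme corp",  "city": "Berlin"},   # same customer, different entry
    {"name": "Nord AG",    "city": "Hamburg"},
]
print(len(deduplicate(customers)))   # -> 2
```

Real deduplication in ERP/CRM landscapes additionally needs survivorship rules (which record wins, how fields are merged) and fuzzy matching; that is where a governance framework like the one in the paper comes in.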
Framework for understanding quantum computing use cases from a multidisciplin... (Anastasija Nikiforova)
This presentation is a supplementary material for the article "Framework for understanding quantum computing use cases from a multidisciplinary perspective and future research directions" (Ukpabi, D.C., Karjaluoto, H., Botticher, A., Nikiforova, A., Petrescu, D.I., Schindler, P., Valtenbergs, V., Lehmann, L., & Yakaryılmaz, A.), available at https://arxiv.org/ftp/arxiv/papers/2212/2212.13909.pdf. The presentation, however, was delivered at QWorld Quantum Science Days 2023, May 29-31.
Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t... (Anastasija Nikiforova)
This presentation was delivered as part of the Data Science Seminar titled “When, Why and How? The Importance of Business Intelligence“ organized by the Institute of Computer Science (University of Tartu) in cooperation with Swedbank.
In this presentation I talked about:
*“Data warehouse vs. data lake – what are they and what is the difference between them?” (structured vs unstructured, static vs dynamic (real-time) data, schema-on-write vs schema-on-read, ETL vs ELT), with further elaboration: What are their goals and purposes? What is their target audience? What are their pros and cons?
*“Is the data warehouse the only data repository suitable for BI?” – no, (today) data lakes can also be suitable. Even more, both are considered the key to “a single version of the truth”. Although, if descriptive BI is the only purpose, it might still be better to stay within a data warehouse. But if you want predictive BI, want to use your data for ML, or do not yet have a specific idea of how you want to use the data but want to be able to explore it effectively and efficiently, a data warehouse might not be the best option.
*“So, the data lake will save me a lot of resources, because I do not have to worry about how to store / allocate the data – just put it in one storage and voila?!” – no, in this case your data lake will turn into a data swamp! And you are forgetting about the data quality you should (must!) be thinking of!
*“But how do you prevent the data lake from becoming a data swamp?” – in short and simple terms, proper data governance & metadata management is the answer (but it is not as easy as it sounds – do not forget about your data engineer and be friendly with him [always… literally always :D]), and also think about the culture in your organization.
*“So, the use of a data warehouse is the key to high-quality data?” – no, it is not! Having ETL does not guarantee the quality of your data (transform & load is not data quality management). Think about data quality regardless of the repository!
*“Are data warehouses and data lakes the only options to consider, or are we missing something?” – we are! The data lakehouse!
*“If a data lakehouse is a combination of the benefits of a data warehouse and a data lake, is it a silver bullet?” – no, it is not! It is another (relatively immature) option to consider that may be the best fit for you, but not a panacea. Dealing with data is (still) not easy…
In addition, in this talk I also briefly introduced the ongoing research into integrating a data lake as a data repository with data wrangling, seeking increased data quality in information systems. In short, this is somewhat like an improved data lakehouse, where we emphasize that data governance and data wrangling need to be integrated to really get the benefits that data lakehouses promise (although we still call it a data lake, since the data lakehouse is not a sufficiently mature concept and has differing definitions).
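The schema-on-write vs. schema-on-read contrast from the talk can be sketched in a few lines: a warehouse-style load validates rows at ingest and rejects nonconforming ones, while a lake-style pipeline stores everything raw and applies the schema only when a consumer reads the data. A hypothetical Python sketch (the schema, functions, and rows are invented for illustration):

```python
# A toy schema: required fields and their expected types.
SCHEMA = {"id": int, "amount": float}

def conforms(row, schema):
    """True if the row has exactly the schema's fields with the right types."""
    return set(row) == set(schema) and all(
        isinstance(row[k], t) for k, t in schema.items())

def load_warehouse(rows):
    """Schema-on-write (warehouse/ETL style): reject bad rows at load time."""
    table = []
    for row in rows:
        if not conforms(row, SCHEMA):
            raise ValueError(f"rejected at ingest: {row}")
        table.append(row)
    return table

def read_lake(raw_rows):
    """Schema-on-read (lake/ELT style): everything was stored raw;
    the schema is applied only now, at consumption time."""
    return [r for r in raw_rows if conforms(r, SCHEMA)]

rows = [{"id": 1, "amount": 9.99}, {"id": "oops", "amount": None}]
print(len(read_lake(rows)))   # -> 1 (the bad row silently falls out at read time)
```

Note the trade-off the sketch exposes: the warehouse load fails loudly on the bad row, while the lake read silently drops it, which is precisely why a lake without data governance and quality monitoring drifts towards a data swamp.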
Putting FAIR Principles in the Context of Research Information: FAIRness for ... (Anastasija Nikiforova)
This presentation is a supplementary material for "Putting FAIR Principles in the Context of Research Information: FAIRness for CRIS and CRIS for FAIRness" (Otmane Azeroual, Joachim Schopfel, Janne Polonen, and Anastasija Nikiforova), a paper presented at the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K), where it received the Best Paper Award. In this presentation we raise a discussion on this topic, showing that the improvement of FAIRness is a dual or bidirectional process, where CRIS promotes and contributes to the FAIRness of data and infrastructures, and FAIR principles push for further improvement in the underlying CRIS data model and format, positively affecting the sustainability of these systems and underlying artifacts. CRIS are beneficial for FAIR, and FAIR is beneficial for CRIS.
See the text here -> https://www.scitepress.org/Link.aspx?doi=10.5220/0011548700003335
Cite as -> Azeroual, O.; Schöpfel, J.; Pölönen, J. and Nikiforova, A. (2022). Putting FAIR Principles in the Context of Research Information: FAIRness for CRIS and CRIS for FAIRness. In Proceedings of the 14th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KMIS, ISBN 978-989-758-614-9; ISSN 2184-3228, pages 63-71. DOI: 10.5220/0011548700003335
Open data hackathon as a tool for increased engagement of Generation Z: to h... (Anastasija Nikiforova)
This is presentation for the paper "Open data hackathon as a tool for increased engagement of Generation Z: to hack or not to hack?" presented at EGETC2022.
A hackathon is known as a form of civic innovation in which participants representing citizens can point out existing problems or social needs and propose a solution. Given the high social, technical, and economic potential of open government data (OGD), the concept of open data hackathons is becoming popular around the world. The concept has become popular in Latvia through the annual hackathons organised for a specific cluster of citizens – Generation Z. This study presents the latest findings on the role of open data hackathons and the benefits they can bring to society, participants, and government alike. First, a systematic literature review is carried out to establish a knowledge base. Then, empirical research on 4 case studies of open data hackathons for Generation Z participants held between 2018 and 2021 in Latvia is conducted to understand which ideas dominated and what the main results of these events were for the OGD initiative. It demonstrates that, despite the widespread belief that young people are indifferent to current societal and natural problems, the ideas developed correspond to the current situation and are aimed at solving these problems, revealing aspects for improvement in the provision of data, infrastructure, culture, and government-related areas.
Barriers to Openly Sharing Government Data: Towards an Open Data-adapted Inno... (Anastasija Nikiforova)
This is the presentation for our ongoing study "Barriers to Openly Sharing Government Data: Towards an Open Data-adapted Innovation Resistance Theory" (Anastasija Nikiforova, Anneke Zuiderwijk) presented at ICEGOV2022 conference – 15th International Conference on Theory and Practice of Electronic Governance (nominated to the Best Paper Awards).
In short, the study aims to develop an Open Government Data-adapted Innovation Resistance Theory model to empirically identify predictors affecting public agencies’ resistance to openly sharing government data. Here we want to understand:
💡 what are the functional and behavioural factors that facilitate or hamper the opening of government data by public organizations?
💡 does IRT provide a new and more complete insight compared to the more traditional UTAUT and TAM? IRT has not yet been applied in this domain, so we are checking whether it should be considered, or whether the models we are already so familiar with remain the best ones
💡 and additionally – did the COVID-19 pandemic have an [obvious/significant] effect on public agencies in terms of their readiness or resistance to openly share government data?
Based on a review of the literature on both IRT research and barriers associated with open data sharing by public agencies, we developed an initial version of the model. Once the model is refined in a qualitative study (interviews with public agencies), we will validate it to study the resistance of public authorities to openly sharing government data in a quantitative study.
Read the paper and cite as -> Nikiforova A., Zuiderwijk A. (2022) Barriers to openly sharing government data: towards an open data-adapted innovation resistance theory, In 15th International Conference on Theory and Practice of Electronic Governance (ICEGOV 2022). Association for Computing Machinery, New York, NY, USA, 215–220, https://doi.org/10.1145/3560107.3560143 – best paper award nominee
Combining Data Lake and Data Wrangling for Ensuring Data Quality in CRIS (Anastasija Nikiforova)
This presentation is a supplementary material for the paper "Combining Data Lake and Data Wrangling for Ensuring Data Quality in CRIS", presented at the 15th International Conference on Current Research Information Systems (CRIS2022) – Linking Research Information across data spaces. It provides an insight into the ongoing study of combining a data lake as a data repository with data wrangling, seeking increased data quality in CRIS systems, although the proposed approach is domain-agnostic and can be used beyond CRIS.
Read the article here -> Azeroual, O., Schöpfel, J., Ivanovic, D., & Nikiforova, A. (2022, May). Combining Data Lake and Data Wrangling for Ensuring Data Quality in CRIS. In CRIS2022: 15th International Conference on Current Research Information Systems --> https://hal.archives-ouvertes.fr/hal-03694519/
The role of open data in the development of sustainable smart cities and smar... (Anastasija Nikiforova)
This presentation is a supplementary material for the guest lecture "The role of open data in the development of sustainable smart cities and smart society" I delivered for the Federal University of Technology – Paraná (Universidade Tecnológica Federal do Paraná (UTFPR)) (Brazil, May 2022).
Data security as a top priority in the digital world: preserve data value by ... (Anastasija Nikiforova)
Today, in the age of information and Industry 4.0, billions of data sources, including but not limited to interconnected devices (sensors, monitoring devices) forming Cyber-Physical Systems (CPS) and the Internet of Things (IoT) ecosystem, continuously generate, collect, process, and exchange data. With the rapid increase in the number of devices and information systems in use, the amount of data is growing. Moreover, due to digitization and the variety of data being continuously produced and processed with a reference to Big Data, their value is also growing, and with it the risk of security breaches and data leaks. The value of data, however, depends on several factors, the most vital being data quality and data security, where the latter can affect the former if the data are accessed and corrupted. Data serve as the basis for decision-making and as input for models, forecasts, simulations etc., which can be of high strategic and commercial / business value. This has become even more relevant during the COVID-19 pandemic, which, in addition to affecting the health, lives, and lifestyle of billions of citizens globally and making life even more digitized, has had a significant impact on business, especially because of the challenges companies have faced in maintaining business continuity in this so-called “new normal”. However, in addition to the cybersecurity threats caused by changes directly related to the pandemic and its consequences, many previously known threats have become even more desirable targets for intruders and hackers. Every year millions of personal records become available online. Moreover, the popularity of IoT Search Engines (IoTSE) has lowered the complexity of searching for connected devices on the internet, allowing even novices, aided by widely available step-by-step guides, to find and gain access to insufficiently protected webcams, routers, databases and other artifacts.
Recent research has demonstrated that weak data and database protection in particular is one of the key security threats. Various measures can be taken to address the issue. The aim of the study to which this presentation refers is to examine whether “traditional” vulnerability registries provide a sufficiently comprehensive view of DBMS security, or whether DBMS holders should intensively and dynamically inspect their systems by referring to Internet of Things Search Engines, moving towards a sustainable and resilient digitized environment. The paper brings attention to this problem and makes the reader think about data security before looking for and introducing more advanced security and protection mechanisms, which, in the absence of the above, may bring no value.
IoTSE-based Open Database Vulnerability inspection in three Baltic Countries:... (Anastasija Nikiforova)
This presentation is devoted to the "IoTSE-based Open Database Vulnerability inspection in three Baltic Countries: ShoBEVODSDT sees you" research paper by Artjoms Daskevics and Anastasija Nikiforova, presented during the International Conference on Internet of Things, Systems, Management and Security (IOTSMS2021), co-located with the 8th International Conference on Social Networks Analysis, Management and Security (SNAMS2021), December 6-9, 2021, Valencia, Spain (online)
Read paper here -> Daskevics, A., & Nikiforova, A. (2021, December). IoTSE-based open database vulnerability inspection in three Baltic countries: ShoBEVODSDT sees you. In 2021 8th International Conference on Internet of Things: Systems, Management and Security (IOTSMS) (pp. 1-8). IEEE -> https://ieeexplore.ieee.org/abstract/document/9704952?casa_token=NfEjYuud0wEAAAAA:6QxucVPuY762I3qzD6D_oWqa0B9eMUFRNMG-E7dyHKohSYIzI0bH1V9bLaAcly_Lp-Ll52ghO5Y
Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can... (Anastasija Nikiforova)
This presentation is a supplementary material for the "Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can Save Your Business" research paper (authored by Anastasija Nikiforova and Natalija Kozmina), presented during the International Conference on Intelligent Data Science Technologies and Applications (IDSTA2021), November 15-16, 2021, Tartu, Estonia (web-based)
Read paper here -> Nikiforova, A., & Kozmina, N. (2021, November). Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can Save Your Business. In 2021 Second International Conference on Intelligent Data Science Technologies and Applications (IDSTA) (pp. 66-73). IEEE -> https://ieeexplore.ieee.org/abstract/document/9660802?casa_token=LFJa20LrXAwAAAAA:wVwhTcCPWqxdloAvDQ3-l98KkkLx70xzG3zNvIIkJbC6wvJ4VxwX_VGc3mmW_7c1T-QJlOtTiao
ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...Anastasija Nikiforova
This presentation is devoted to the "ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detection tool or what Internet of Things Search Engines know about you" research paper developed by Artjoms Daskevics and Anastasija Nikiforova and presented at The International Conference on Intelligent Data Science Technologies and Applications (IDSTA2021), November 15-16, 2021, Tartu, Estonia (web-based).
Read paper here -> Daskevics, A., & Nikiforova, A. (2021, November). ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detection tool or what Internet of Things Search Engines know about you. In 2021 Second International Conference on Intelligent Data Science Technologies and Applications (IDSTA) (pp. 38-45). IEEE.
Invited talk "Open Data as a driver of Society 5.0: how you and your scientif...Anastasija Nikiforova
This presentation was prepared as part of my talk on openness (open data and open science) in the context of Society 5.0 at the International Conference and Expo on Nanotechnology and Nanomaterials. It was a pleasure to receive an invitation to deliver a talk on my recently published article "Smarter Open Government Data for Society 5.0: Are Your Open Data Smart Enough?" (Sensors 2021, 21(15), 5204), which I entitled “Open Data as a driver of Society 5.0: how you and your scientific outputs can contribute to the development of the Super Smart Society and transformation into Smart Living?”. The paper has been briefly discussed in my previous post, so here are just a few words on this talk and the overall experience.
Towards enrichment of the open government data: a stakeholder-centered determ...Anastasija Nikiforova
This set of slides is a part of the presentation prepared and delivered in the scope of the 14th International Conference on Theory and Practice of Electronic Governance (ICEGOV 2021), 6-8 October, 2021, Smart Digital Governance for Global Sustainability
It is based on the paper -> Nikiforova, A. (2021, October). Towards enrichment of the open government data: a stakeholder-centered determination of High-Value Data sets for Latvia. In 14th International Conference on Theory and Practice of Electronic Governance (pp. 367-372) -> https://dl.acm.org/doi/abs/10.1145/3494193.3494243?casa_token=bPeuwmFWwQwAAAAA:ls-xXIPK5uXDHyxtBxqsMJOCuV6ud_ip59BX8n78uJnqvql6e8H9urlDG9zzeNklRmGFwI4sCXU06w
The open lecture “The Potential of Open Data” (“Atvērto datu potenciāls”) took place within the University of Latvia (LU) Faculty of Social Sciences master's course “Data Society Management” and was delivered by Dr.sc.comp. Anastasija Nikiforova, Assistant Professor and Researcher at the LU Faculty of Computing.
Open data are considered a valuable resource whose use can potentially deliver significant economic, technological and social benefits. Achieving this, however, requires a number of preconditions to be met, relating to the data, the infrastructure and the users; in other words, the success factor of an open data initiative is the creation and maintenance of a sustainable open government data ecosystem. The aim of the lecture is to provide insight into the popularity of open data and their potential for the development of technological and economic processes, paying particular attention to their practical applications both in Latvia and beyond, where data are transformed into (innovative) solutions and services. It is also planned to give an overview of the most important aspects that can foster the creation of a sustainable open data ecosystem, enabling anyone interested to transform open data into value.
PhD, Dr.sc.comp. Anastasija Nikiforova is an Assistant Professor at the Faculty of Computing, University of Latvia, and a researcher in its Innovative Information Technologies Laboratory. Dr. Nikiforova's research interests relate to data management, in particular data quality, and to open data. At the LU Faculty of Computing, in addition to the other courses she teaches, she has developed the special seminar “Open Data and Data Quality” and the master's programme course “Open Government Data in a Data-Driven World”. Dr. Nikiforova is an expert of the Latvian Council of Science in Engineering and Technology (Electrical Engineering, Electronics, Information and Communication Technologies) and in Natural Sciences (Computer Science and Informatics), as well as an associate member of LATA (the Latvian Open Technology Association). She is a (co-)author of more than 25 scientific papers, 4 of which have been published in top-ranked Q1 journals.
TIMELINESS OF OPEN DATA IN OPEN GOVERNMENT DATA PORTALS THROUGH PANDEMIC-RELA...Anastasija Nikiforova
This presentation is a supplementary material for the following article -> Nikiforova, A. (2020, October). Timeliness of open data in open government data portals through pandemic-related data: a long data way from the publisher to the user. In 2020 Fourth International Conference on Multimedia Computing, Networking and Applications (MCNA) (pp. 131-138). IEEE.
The paper addresses the “timeliness” of data in open government data (OGD) portals. Timeliness is one of the primary principles of open data and is considered a success factor, while at the same time it is one of the biggest barriers, capable of undermining users' trust in the data and even their desire to use the open data portal at all. However, assessing this aspect is a very difficult task that, in most cases, becomes impossible for open data users, and there is therefore a lack of comparative studies on the timeliness of data across national open data portals. Unfortunately, 2020 provided an opportunity to examine this: it became relatively easy to compare how long the data path from the data holder to the OGD portal is by analysing the timeliness of Covid-19-related data sets relative to the first case observed in a country. The study thus fills the gap in comparative studies by addressing 60 countries and their OGD portals with respect to the timeliness of the data, reporting how many, and which, countries publish open data as quickly as possible. It makes it possible to understand how quickly OGD portals react to emergencies by opening and updating data for further potential reuse, which is essential in the digital data-driven world.
Read paper here -> Nikiforova, A. (2020, October). Timeliness of open data in open government data portals through pandemic-related data: a long data way from the publisher to the user. In 2020 Fourth International Conference on Multimedia Computing, Networking and Applications (MCNA) (pp. 131-138). IEEE. https://ieeexplore.ieee.org/abstract/document/9264298?casa_token=FtfC_6bqZnsAAAAA:TaSnKrE7ZCxLyq5hvxX-X8O2sK_vZYcodTBtxoWOvaOAIFmMmy65f5dIK-kKYxFAMiC5jyl7Eeg
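The timeliness measure the study builds on can be sketched very simply: the lag, in days, between a country's first confirmed case and the first pandemic-related dataset appearing on its OGD portal. The dates and country labels below are invented for illustration, not taken from the paper.

```python
# Minimal sketch of the timeliness comparison: days between the first observed
# case and the first related dataset on the OGD portal, per country.
from datetime import date

# Illustrative dates only; the paper covers 60 real countries.
first_case = {"A": date(2020, 3, 2), "B": date(2020, 3, 10)}
first_dataset = {"A": date(2020, 3, 30), "B": date(2020, 3, 12)}

lag_days = {c: (first_dataset[c] - first_case[c]).days for c in first_case}
ranked = sorted(lag_days, key=lag_days.get)  # most timely portals first
print(lag_days, ranked)
```

Ranking countries by this lag is what makes a cross-portal comparison of "how quickly OGD portals react to emergencies" possible.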
This presentation is a supplementary material for the following article -> Nikiforova, A., Bicevskis, J., & Karnitis, G. (2020, December). Towards a Concurrence Analysis in Business Processes. In 2020 Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS) (pp. 1-6). IEEE.
This paper presents the first steps towards a solution aimed at providing a methodology for analysing concurrent business processes and predicting the probability of incorrect business process execution. The aims of the paper are to (a) review approaches to describing and handling the execution of concurrent processes, focusing mainly on the transaction mechanisms in database management systems, and (b) present the idea and a preliminary version of an algorithm that detects the possibility of incorrect execution of concurrent business processes. Analysing business processes according to the proposed procedure makes it possible to configure transaction processing optimally.
Data Quality as a prerequisite for your business success: when should I start taking care of it?
1. HackCodeX Forum
5.06.2023, Riga, Latvia
DATA QUALITY AS A PREREQUISITE
FOR BUSINESS SUCCESS:
WHEN SHOULD I START
TAKING CARE OF IT?
Anastasija Nikiforova
Assistant Professor of Information Systems, Faculty of Science and Technology,
Institute of Computer Science, Chair of Software Engineering, University of Tartu
European Open Science Cloud (EOSC) Task Force “FAIR metrics and data quality”
2. PHD IN COMPUTER SCIENCE – DATA PROCESSING SYSTEMS AND DATA NETWORKING
RESEARCH INTERESTS: DATA MANAGEMENT WITH A FOCUS ON DATA QUALITY, OPEN
GOVERNMENT DATA, SMART CITY, SOCIETY 5.0, SUSTAINABLE DEVELOPMENT, IOT, HCI,
DIGITIZATION.
✔ASSISTANT PROFESSOR AT THE UNIVERSITY OF TARTU, FACULTY OF SCIENCE AND TECHNOLOGY, INSTITUTE OF COMPUTER SCIENCE,
CHAIR OF SOFTWARE ENGINEERING
✔EUROPEAN OPEN SCIENCE CLOUD TASK FORCE “FAIR METRICS AND DATA QUALITY”
✔EDSC AMBASSADOR (EUROPEAN DIGITAL SKILLS CERTIFICATE, AS PART OF ACTION 9 OF THE DIGITAL EDUCATION ACTION PLAN (2021- 2027) –
JRC/SVQ/2022/OP/0013)
✔IFIP WG8.5 ON ICT AND PUBLIC ADMINISTRATION MEMBER
✔ASSOCIATE MEMBER OF THE LATVIAN OPEN TECHNOLOGY ASSOCIATION
✔EXPERT OF THE LATVIAN COUNCIL OF SCIENCES IN (1) NATURAL SCIENCES – COMPUTER SCIENCE & INFORMATICS, (2) ENGINEERING & TECHNOLOGY-
ELECTRICAL ENGINEERING, ELECTRONICS, ICT, (3) SOCIAL SCIENCES – ECONOMICS & BUSINESS
✔EXPERT OF THE COST – EUROPEAN COOPERATION IN SCIENCE & TECHNOLOGY
✔VISITING RESEARCHER AT THE DELFT UNIVERSITY OF TECHNOLOGY, FACULTY OF TECHNOLOGY, POLICY AND MANAGEMENT (TPM)
✔ASSISTANT PROFESSOR AT THE FACULTY OF COMPUTING, UNIVERSITY OF LATVIA
✔RESEARCHER IN THE INNOVATION LABORATORY, FACULTY OF COMPUTING, UNIVERSITY OF LATVIA
✔IT-EXPERT AT THE LATVIAN BIOMEDICAL RESEARCH AND STUDY CENTRE, BBMRI-ERIC LV NATIONAL NODE
✔ADVISOR FOR THE INSTITUTE FOR SOCIAL AND POLITICAL STUDIES, UNIVERSITY OF LATVIA
✔DATA SECURITY SOLUTIONS, LATVIA
MOST RECENT EXPERIENCE
PAST EXPERIENCE
6. DATA … DATA ARE EVERYWHERE
Sources: Premium Vector | Artificial intelligence logo, icon. vector symbol ai, deep learning blockchain neural network concept. machine learning, artificial intelligence, ai. (freepik.com), Top 10 Successful Data Science Companies in 2023 - Learn | Hevo (hevodata.com),
How to Use Business Intelligence (BI) to Improve Organizational Alignment | Wyn Enterprise (grapecity.com), Machine learning logo - Wi6Labs, Business Intelligence Icon Gráfico por aimagenarium · Creative Fabrica, Open Data – GEOAFRICA,
https://www.gartner.com/en/articles/4-emerging-technologies-you-need-to-know-about?utm_medium=social&utm_source=linkedin&utm_campaign=SM_GB_YOY_GTR_SOC_SF1_SM-SWG&utm_content=&sf267111387=1
7. DATA … DATA ARE EVERYWHERE
M-Files on Twitter: "Data is the New Oil – Especially in Oil and Gas! https://t.co/zFlrvQqlMs https://t.co/qE3Q4aLNQy" / Twitter
8. DATA QUALITY - WHAT, WHY, HOW, 10 BEST PRACTICES & MORE - Enterprise Master Data Management • Profisee
14. “DATA IS THE NEW OIL”: WHY IS IT NOT?
BUT!
✓
Source: Here's Why Data Is Not The New Oil (forbes.com), Image sources: Oil well – Wikipedia, How do we get oil and gas out of the ground? (world-petroleum.org), Customized Silos For Effective Storage of Food | Nextech Solutions (nextechagrisolutions.com)
DATA, LIKE OIL is a source of power,
and those, who control them,
are establishing themselves as «masters of the universe»,
just as oil barons did 100 years ago
15. “DATA IS THE NEW OIL”: WHY IS IT NOT?
OIL
✘ a finite resource
✘ requires huge amounts of resources to be transported to where it is needed
✘ when used, its energy is lost as heat or light, or permanently converted into another form (e.g., plastic)
✘ as the world’s oil reserves dwindle, extracting it becomes increasingly difficult and expensive
✘ oil drilling involves damage to the natural environment and exploitation of finite natural resources
DATA
✓ effectively infinitely durable and reusable
✓ can be replicated indefinitely & moved around the world at the speed of light, at low cost, through fiber optic networks
✓ becomes more useful the more it is used - once processed, data often reveals further applications
✓ becoming increasingly available as computer technology advances
✓ data mining doesn’t intrinsically involve damage to the environment & exploitation of finite natural resources (*apart from the electricity used to run the system)
✘ BUT treating data like oil (storing it in silos) has little benefit & reduces its usefulness
Source: Here's Why Data Is Not The New Oil (forbes.com), Image sources: Oil well – Wikipedia, How do we get oil and gas out of the ground? (world-petroleum.org), Customized Silos For Effective Storage of Food | Nextech Solutions (nextechagrisolutions.com)
16. “IF WE THINK ABOUT DATA AS A POWER SOURCE OR FUEL,
IT WOULD MAKE MORE SENSE TO COMPARE THEM WITH
RENEWABLE SOURCES LIKE THE
SUN, WIND AND TIDES”
-B. Marr, Forbes
Here's Why Data Is Not The New Oil (forbes.com)
Letter from the Editor: Here comes the sun (medicalnewstoday.com), A healthy wind | MIT News | Massachusetts Institute of Technology, Tidal phenomenon: high and low tides | Ponant Magazine
17. AMONG OTHER “NUANCES”,
DATA QUALITY IS USE-CASE DEPENDENT AND DYNAMIC IN NATURE
“ABSOLUTE DATA QUALITY”
(THE DATA QUALITY LEVEL AT WHICH THE DATA WOULD SATISFY
ALL POSSIBLE USE CASES) IS IMPOSSIBLE TO ACHIEVE,
BUT IT IS A GOAL TO BE PURSUED
20. Def. 1: FITNESS-FOR-USE = WARRANTY*
Def. 2: FITNESS-FOR-PURPOSE = UTILITY*
Def. 3: FREE OF ERRORS
*According to ITIL® 4: the framework for the management of IT-enabled services
21. ISO def.: THE DEGREE TO WHICH
DATA SATISFIES THE REQUIREMENTS
OF ITS INTENDED PURPOSE
ISO/IEC 25012
22. IN SIMPLER TERMS… THINK OF WINE…
INTRINSIC - flavor type & intensity
EXTRINSIC - brand, packaging…
Based on ISO 19157,
Langstaff, S. A. (2010). Sensory quality control in the wine industry.
Lacagnina, C., David, R., Nikiforova, A., Kuusniemi, M. E., Cappiello, C., Biehlmaier, O., Wright, L., Schubert, C., Bertino, A., Thiemann, H., & Dennis, R. (2023). Towards a data quality framework for EOSC. Zenodo. https://doi.org/10.5281/zenodo.7515816
23.
24. NOT ONLY ABOUT WHAT, BUT
ALSO ABOUT HOW?
IT IS A PROCESS
25. NOT ONLY ABOUT WHAT, BUT
ALSO ABOUT HOW?
IT IS A PROCESS –
DATA QUALITY MANAGEMENT PROCESS
26.
27. DATA QUALITY MANAGEMENT PROCESS
TOTAL DATA QUALITY MANAGEMENT (TDQM) LIFECYCLE (BY MIT)
DEFINE
MEASURE
ANALYSE
IMPROVE
DEFINE: IDENTIFY RELEVANT DQ DIMENSIONS
MEASURE: PRODUCE DQ METRICS
ANALYSE: IDENTIFY ROOT CAUSES FOR DQ PROBLEMS AND
DETERMINE THE IMPACT OF POOR DQ
IMPROVE: IDENTIFY AND EMPLOY TECHNIQUES FOR
IMPROVING DQ
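One TDQM iteration can be sketched on a toy table. The dimension names, rules, and acceptance threshold below are illustrative assumptions, not prescribed by the MIT TDQM literature.

```python
# A minimal sketch of one Define-Measure-Analyse-Improve iteration on toy data.
import re

records = [
    {"id": 1, "email": "anna@example.com", "age": 34},
    {"id": 2, "email": "not-an-email",     "age": 29},
    {"id": 3, "email": None,               "age": 41},
]

# DEFINE: the relevant DQ dimensions and their rules (illustrative)
rules = {
    "completeness": lambda r: r["email"] is not None,
    "validity": lambda r: r["email"] is not None
        and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", r["email"]) is not None,
}

# MEASURE: produce a DQ metric (share of records passing) per dimension
metrics = {dim: sum(rule(r) for r in records) / len(records)
           for dim, rule in rules.items()}

# ANALYSE: flag dimensions below an (illustrative) acceptance threshold
problems = [dim for dim, score in metrics.items() if score < 0.9]

# IMPROVE: e.g. quarantine failing records for cleansing
quarantine = [r for r in records if not all(rule(r) for rule in rules.values())]

print(metrics, problems, [r["id"] for r in quarantine])
```

In practice the loop repeats: after cleansing, the same dimensions are measured again, which is exactly why TDQM is drawn as a cycle rather than a sequence.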
28. •Lacagnina, C., David, R., Nikiforova, A., Kuusniemi, M. E., Cappiello, C., Biehlmaier, O., Wright, L.,
Schubert, C., Bertino, A., Thiemann, H., & Dennis, R. (2023). Towards a data quality framework
for EOSC. Zenodo. https://doi.org/10.5281/zenodo.7515816
31. IS THERE ANY COMMONLY ACCEPTED DQ DIMENSION
CLASSIFICATION?
https://iso25000.com/index.php/en/iso-25000-standards/iso-25012/136-iso-iec-2012
ISO 25012
SOFTWARE ENGINEERING — SOFTWARE
PRODUCT QUALITY REQUIREMENTS
AND EVALUATION (SQUARE) — DATA
QUALITY MODEL
32. DIMENSIONS VARY IN DEFINITION AND SCOPE
ONE AND THE SAME NOTION CAN REFER TO DIFFERENT DIMENSIONS
ONE AND THE SAME DIMENSION CAN HAVE
DIFFERENT NOTIONS [IN DIFFERENT SOURCES]
DATA QUALITY RULES ARE THEN DEFINED
FOR EACH DIMENSION
METRICS ARE THEN SELECTED FOR THEM
34. ✓ STANDARDIZATION, NORMALIZATION AND PARSING
✓ MATCHING / DEDUPLICATION AND MERGING
✓ DATA CLEANSING
✓ VALIDATION
✓ DATA PROFILING / AUDITING
✓ SOME OF THEM SUPPORT (SEMI-)AUTOMATED DQ RULE RECOGNITION
BASED ON METADATA, BUILT-IN RULES, OR MACHINE LEARNING
DQ TOOLS FOR (SEMI-)AUTOMATED DQM
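Three of the operations listed above (standardization/normalization, matching/deduplication, and merging) can be sketched in a few lines. The field names and normalization choices are illustrative assumptions, not the behaviour of any particular DQ tool.

```python
# Toy sketch of standardization, matching/deduplication, and merging.
contacts = [
    {"name": "  Anna NIKIFOROVA ", "phone": "+371 2000-0000"},
    {"name": "anna nikiforova",    "phone": "20000000"},
    {"name": "John Smith",         "phone": "+371 2111-1111"},
]

def standardize(c):
    # STANDARDIZATION / NORMALIZATION: trim, normalize case, strip phone
    # formatting down to the last 8 digits (an illustrative convention)
    return {"name": " ".join(c["name"].split()).title(),
            "phone": "".join(ch for ch in c["phone"] if ch.isdigit())[-8:]}

def dedupe(rows):
    # MATCHING / DEDUPLICATION: rows with the same standardized name match;
    # MERGING: keep the first record seen per match key
    merged = {}
    for row in rows:
        merged.setdefault(row["name"], row)
    return list(merged.values())

clean = dedupe([standardize(c) for c in contacts])
print(clean)
```

Real tools match on multiple fields with fuzzy similarity and merge field-by-field; the sketch only shows why standardization must precede matching (the two Anna records only collide after normalization).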
40. DATA OBJECT
DATASET
DATABASE DATA REPOSITORY INFORMATION SYSTEM
SOFTWARE
DATA STRUCTURE
NO ONE-SIZE-FITS-ALL
STRUCTURED DATA UNSTRUCTURED DATA
SEMI-STRUCTURED DATA
Image sources: https://monkeylearn.com/blog/semi-structured-data/, https://www.pngitem.com/middle/ioJTTbR_organization-structure-icon-png-download-structures-icon-png/
41. DATA OBJECT
DATASET
DATABASE DATA REPOSITORY INFORMATION SYSTEM
SOFTWARE
DATA WAREHOUSE DATA LAKE
Maybe even something else?
NO ONE-SIZE-FITS-ALL
42. DATA OBJECT
DATASET
DATABASE DATA REPOSITORY INFORMATION SYSTEM
SOFTWARE
Running Analytics on the Data Lake - The Databricks Blog
NO ONE-SIZE-FITS-ALL
44. Implementing a Data Lake or Data Warehouse Architecture for Business Intelligence? | by Lan Chu | Towards Data Science
NB: EXTRACT-TRANSFORM-LOAD
IS NOT DQM!!!
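The slide's point (that moving and reshaping data is not the same as managing its quality) can be made concrete with a toy contrast: a bare ETL transform silently drops what doesn't parse, while a DQ gate measures and reports the violations. All names and rules below are invented for illustration.

```python
# Toy contrast: a plain ETL transform vs. an explicit DQ gate on the same rows.
rows = [{"price": "10.5"}, {"price": "-3"}, {"price": "oops"}]

def etl_transform(rows):
    # plain ETL: reshape what parses, silently ignore the rest
    out = []
    for r in rows:
        try:
            out.append(float(r["price"]))
        except ValueError:
            pass
    return out

def dq_gate(rows):
    # DQM adds measurement: how many rows violate the rule, and which ones
    # (rule here: price must be a non-negative number)
    bad = [r for r in rows
           if not r["price"].replace(".", "", 1).lstrip("-").isdigit()
           or float(r["price"]) < 0]
    return len(bad) / len(rows), bad

rate, bad = dq_gate(rows)
print(etl_transform(rows), rate, bad)
```

ETL happily emits -3.0 as a price and hides the unparsable row; the DQ gate surfaces both problems as a measurable error rate that can feed the Analyse/Improve steps.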
47. Image source: The abstracted future of data engineering | by Justin Gage | Datalogue | Medium
OR HOW TO AVOID GIGO*?
*“GARBAGE IN, GARBAGE OUT”
48. DATA LAKE FOR BI
BUSINESS DATA LAKE
https://www.capgemini.com/wp-content/uploads/2017/07/pivotal_data_lake_vs_traditional_bi_20140805.pdf
49. DATA LAKE
+
DATA WRANGLING
[an asset, not a silver bullet]
✔
Source: https://monkeylearn.com/blog/data-wrangling/, https://www.altair.com/what-is-data-wrangling/ , https://pediaa.com/what-is-the-difference-between-data-wrangling-and-data-cleaning
51. THE DATA WRANGLING PROCESS TO PREPARE DATA AND INTEGRATE IT INTO AN IS
DEPENDING ON THE IS AND THE DESIRED OR REQUIRED TARGET QUALITY*, INDIVIDUAL STEPS
MAY NEED TO BE CARRIED OUT SEVERAL TIMES ➔ !!! DATA WRANGLING IS A CONTINUOUS
PROCESS THAT RECURS AT REGULAR INTERVALS !!!
Information
System
Azeroual, O., Schöpfel, J., Ivanovic, D., & Nikiforova, A. (2022). Combining data lake and
data wrangling for ensuring data quality in CRIS. Procedia Computer Science, 211, 3-16.
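The wrangling pass described on the slide can be sketched as a small pipeline of composable steps, run as one unit so it can be repeated at regular intervals. The step names and contents are illustrative assumptions, not the exact pipeline from the cited paper.

```python
# A sketch of a repeatable wrangling pass: structure -> clean -> validate.
raw = [" 42 ", "n/a", "17", "", "5 "]

def structure(values):
    # coerce raw strings into a uniform shape
    return [v.strip() for v in values]

def clean(values):
    # drop placeholders and empty entries
    return [v for v in values if v and v.lower() != "n/a"]

def validate(values):
    # keep only entries parsable as non-negative integers
    return [int(v) for v in values if v.isdigit()]

def wrangle(values):
    # one pass of the pipeline; in practice this runs at regular intervals
    # against fresh data, echoing "wrangling is a continuous process"
    return validate(clean(structure(values)))

print(wrangle(raw))
```

Packaging the steps as one `wrangle` function is the point: the same pass can be scheduled against each new data delivery before it reaches the information system.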
52. DATA LAKE VS DATA WAREHOUSE
HOW TO TAKE
THE ADVANTAGES OF BOTH?
53. DATA LAKE VS DATA WAREHOUSE
HOW TO TAKE
THE ADVANTAGES OF BOTH?
DATA LAKEHOUSE
54. DATA LAKEHOUSE IS SEEN AS A COMBINATION OF DATA WAREHOUSING WORKLOADS & DATA LAKE ECONOMICS
Running Analytics on the Data Lake - The Databricks Blog
55. Running Analytics on the Data Lake - The Databricks Blog, Build a Lake House Architecture on AWS | AWS Big Data Blog (amazon.com), The Data Lakehouse, the Data Warehouse and a Modern Data platform architecture - Microsoft Community Hub
58. THINK DATA QUALITY FIRST!!! OR TOWARDS DATA
QUALITY BY DESIGN
Guerra-García, C., Nikiforova, A., Jiménez, S., Perez-Gonzalez, H. G., Ramírez-Torres, M., & Ontañon-
García, L. (2023). ISO/IEC 25012-based methodology for managing data quality requirements in the
development of information systems: Towards Data Quality by Design. Data & Knowledge
Engineering, 145,
DAQUAVORD - A METHODOLOGY FOR PROJECT MANAGEMENT OF DATA QUALITY REQUIREMENTS
SPECIFICATION - AIMED AT ELICITING DQ REQUIREMENTS ARISING FROM DIFFERENT USERS’ VIEWPOINTS
THESE DQ REQUIREMENTS SERVE AS DATA QUALITY SOFTWARE REQUIREMENTS DURING THE
DEVELOPMENT OF SOFTWARE THAT TAKES DATA QUALITY INTO ACCOUNT BY DEFAULT.
IT IS BASED ON THE VIEWPOINT-ORIENTED REQUIREMENTS DEFINITION (VORD) METHOD AND
THE LATEST AND MOST WIDELY ACCEPTED ISO/IEC 25012 STANDARD.
59. DATA ARTIFACT
WHAT DQM APPROACH DEPENDS ON?
DEFINITION USER
TIME
DIMENSION
PROCESS PURPOSE
60.
61. MUSK’S TOP PRIORITY: TO IMPROVE THE
PRODUCT…
Q: HOW DOES ONE ENSURE THE RELIABILITY OF DATA
AND DECISIONS MADE BASED ON SAID DATA?
THE ANSWER LIES NOT IN MANAGING THE DATA ALONE,
BUT ALSO THE INFORMATION AROUND AND ABOUT DATA
ACQUISITION, TRANSFORMATIONS AND VISUALIZATION
TO PROVIDE A BETTER UNDERSTANDING AND SUPPORT
DECISION MAKERS
https://www.gqindia.com/get-smart/content/5-things-elon-musk-did-to-become-one-of-the-richest-men-in-the-world
62. https://www.gqindia.com/get-smart/content/5-things-elon-musk-did-to-become-one-of-the-richest-men-in-the-world
MUSK’S TOP PRIORITY: TO IMPROVE THE
PRODUCT…
Q: HOW DOES ONE ENSURE THE RELIABILITY OF DATA
AND DECISIONS MADE BASED ON SAID DATA?
THE ANSWER LIES NOT IN MANAGING THE DATA ALONE,
BUT ALSO THE INFORMATION AROUND AND ABOUT DATA
ACQUISITION, TRANSFORMATIONS AND VISUALIZATION
TO PROVIDE A BETTER UNDERSTANDING AND SUPPORT
DECISION MAKERS
BY FOCUSING ON SUSTAINABLE DATA, CLEAR
DATA GOVERNANCE
AND STRONG DATA MANAGEMENT