INCONSISTENCIES IN BIG DATA
Prepared by,
Minu Joseph
Guided by,
Mr. Thomas Varghese
Contents
• Introduction.
• Problem Statement.
• 3V’s
• Big data.
• Defining Big data.
• Dimensions of big data.
• Sources, applications of big data.
• Inconsistencies in big data.
• Inconsistency induced learning.
• Conclusion.
• References.
Introduction
• A torrent of data is generated and captured in
digital form due to advances in science
and technology.
• Everything we do is increasingly leaving a
digital trace.
• Big data refers to data sets so large and
complex that traditional data processing
applications are inadequate.
Problem Statement
• Big Data: the next big thing in the IT industry.
• Classification of big data inconsistencies.
• Big Data and Big Data analysis in terms of
issues and challenges.
• Inconsistency-Induced Learning: a tool to turn
big data inconsistencies into helpful formulas
for better analysis of results.
Big Data
• Big data can be described by:
Volume
Velocity
Variety
Variability
Veracity
Complexity
What is BIG DATA?
Dimensions In Big Data
Levels of Knowledge
INCONSISTENCIES IN BIG DATA
• Temporal
• Spatial
• Text
• Functional Dependency
Temporal Inconsistencies
• Conflicting information.
• Data items with conflicting circumstances may
coincide or overlap in time.
• Software requirements specifications (SRS)
often contain inconsistent information.
• Inconsistent information affects the
correctness and performance of the system.
• Concurrent programming errors in the
Therac-25 (1985-1987) led to 6 accidents.
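
One way to read "conflicting circumstances that overlap in time" is as a check over time-stamped records. The sketch below is only an illustration under assumptions not found in the slides: the record layout (entity, value, start, end), the sample data, and the function names are all hypothetical, and intervals are treated as half-open.

```python
from itertools import combinations

# Hypothetical records: (entity, value, start, end), with half-open intervals [start, end).
records = [
    ("pump", "on", 0, 10),
    ("pump", "off", 5, 15),
    ("pump", "on", 15, 20),
    ("valve", "open", 0, 8),
]

def overlaps(a_start, a_end, b_start, b_end):
    """Return True when two half-open intervals share any time."""
    return a_start < b_end and b_start < a_end

def temporal_conflicts(rows):
    """Return pairs of rows for the same entity whose values differ while their intervals overlap."""
    found = []
    for a, b in combinations(rows, 2):
        same_entity = a[0] == b[0]
        different_value = a[1] != b[1]
        if same_entity and different_value and overlaps(a[2], a[3], b[2], b[3]):
            found.append((a, b))
    return found

conflicts = temporal_conflicts(records)
```

Here the pump is reported as both "on" and "off" between times 5 and 10, so that pair is flagged; records that merely touch at a boundary are not.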
List of temporal inconsistencies
Spatial Inconsistencies
• Occur in datasets that include geometric
or spatial dimensions.
• Traditional DB systems are enhanced to
include spatially referenced data.
• Spatial inconsistencies can arise from
  ◦ Geometric representation of objects
  ◦ Spatial relationship between objects
  ◦ Aggregation of composite objects.
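
As a rough illustration of an inconsistency in the spatial relationship between objects, the sketch below compares a declared relationship ("these two objects are disjoint") against the stored geometry. Everything in it is an assumption made for the example: the axis-aligned bounding boxes, the object names, and the `declared_disjoint` list do not come from the slides.

```python
# Hypothetical axis-aligned bounding boxes: (min_x, min_y, max_x, max_y).
boxes = {
    "lake": (0, 0, 4, 4),
    "road": (3, 3, 8, 5),
    "park": (10, 10, 12, 12),
}

# Hypothetical metadata claiming that these pairs do not overlap.
declared_disjoint = [("lake", "road"), ("lake", "park")]

def intersects(a, b):
    """Return True when two boxes share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

# A pair is inconsistent when it is declared disjoint but the geometry overlaps.
spatial_conflicts = [
    (p, q) for p, q in declared_disjoint if intersects(boxes[p], boxes[q])
]
```

The lake and road boxes overlap despite being declared disjoint, so that pair is reported; real spatial data would need true geometries rather than bounding boxes.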
Spatial Inconsistencies contd..
Text Inconsistencies
• Inconsistencies found in unstructured natural
language text.
• Data generated from social media, blogs,
emails etc.
• If two texts refer to the same event or
entity, they are said to be co-referent.
• Contradiction Detection detects text
inconsistencies and has many applications.
Text Inconsistencies contd..
Functional Dependency Inconsistency
• A functional dependency requires that when
certain attribute values are equal, other
attribute values must also be equal.
• Many big databases are stored, aggregated,
and cleaned with the help of an RDBMS.
• Here, functional dependencies play an
important role in enforcing the integrity
constraints of the database.
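
The rule that equal values in one set of attributes force equal values in another can be checked directly. The sketch below is a minimal illustration, not an RDBMS feature: the sample rows, the assumed dependency zip -> city, and the helper name `fd_violations` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows; the assumed functional dependency is zip -> city.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "60601", "city": "Chicago"},
    {"zip": "60601", "city": "Boston"},
]

def fd_violations(data, lhs, rhs):
    """Group rows by their lhs values and keep groups with more than one distinct rhs value."""
    seen = defaultdict(set)
    for row in data:
        key = tuple(row[a] for a in lhs)
        seen[key].add(tuple(row[a] for a in rhs))
    return {key: values for key, values in seen.items() if len(values) > 1}

violations = fd_violations(rows, ["zip"], ["city"])
```

The zip code "60601" maps to two different cities, so it is the only key returned; an empty result would mean the data satisfies the dependency.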
Functional Dependency Inconsistency
contd…
• Violations of functional dependencies
result in inconsistencies in data and
information.
Inconsistency Induced Learning
• Improves data quality.
• Helps to enhance big data applications.
• Accommodates lifelong learning by allowing
successive learning episodes to be triggered
by inconsistencies an agent encounters
during its problem-solving episodes.
• The basic idea is to identify the cause of an
inconsistency and then apply cause-specific
heuristics to resolve it.
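
The "identify the cause, then apply a cause-specific heuristic" idea can be pictured as a lookup from cause to resolution rule. The sketch below is a toy illustration only and is not the algorithm from the cited work: the cause names, the two heuristics, the record fields, and the `learned` log are all assumptions made for the example.

```python
# Hypothetical cause-specific heuristics for two conflicting reports of one value.
def keep_most_recent(a, b):
    """Prefer the report with the later timestamp."""
    return a if a["timestamp"] >= b["timestamp"] else b

def keep_more_trusted(a, b):
    """Prefer the report from the more trusted source."""
    return a if a["trust"] >= b["trust"] else b

HEURISTICS = {
    "stale_value": keep_most_recent,
    "unreliable_source": keep_more_trusted,
}

def identify_cause(a, b):
    """Toy rule: reports made at different times are blamed on staleness, otherwise on the source."""
    return "stale_value" if a["timestamp"] != b["timestamp"] else "unreliable_source"

def resolve(a, b, learned):
    """Resolve one inconsistency and record what was learned from the episode."""
    cause = identify_cause(a, b)
    winner = HEURISTICS[cause](a, b)
    learned.append((cause, winner["value"]))
    return winner

learned = []
report_a = {"value": 20, "timestamp": 1, "trust": 0.9}
report_b = {"value": 25, "timestamp": 2, "trust": 0.4}
resolved = resolve(report_a, report_b, learned)
```

Each resolved conflict appends an entry to `learned`, which stands in for the knowledge an agent would accumulate across successive learning episodes.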
Conclusion
• Multidimensional issues and challenges in big
data and big data analysis.
• Types of inconsistencies.
• How to improve quality of big data analysis.
References
• www.slideshare.com
• dl.acm.org
• www.ieeexplore.ieee.org
• D. Zhang, On Temporal Properties of Knowledge Base
Inconsistency. Springer Transactions on Computational
Science.
• M. Schroeck, R. Shockley, J. Smart, D. Romero-Morales, and P.
Tufano, Analytics: the real-world use of big data: how
innovative enterprises extract value from uncertain data,
Executive Report, IBM Institute for Business Value and Said
Business School at the University of Oxford.
• Nasrin Irshad Hussain, Big Data, www.slideshare.com
QUESTIONS?
