2. Group Id : 15
Names: Shweta Dolhare (B120344212)
Snehal Gaikwad (B120344215)
Poonam Ghorpade (B120344219)
Project Title : “Error Detection in Big Data
on Cloud”
Internal Guide : S.B.Jadhav
3. Project Definition:
To develop an approach that efficiently
reduces the time needed to detect errors in
big sensor data on cloud. If an error is found,
the approach also covers error recovery and
storing the data in its original format.
Technical Keywords :
Cloud Computing, Service Composition,
Online Web Services, Hadoop, MySQL,
MapReduce.
4. Introduction :
We need to develop an approach that
efficiently reduces the time needed to detect
errors in big sensor data on cloud. If an
error is found, the approach also covers
error recovery and storing the data in its
original format.
5. Based on the error types and the features
of scale-free networks, we propose a
time-efficient strategy for detecting and
locating errors in big data sets on cloud.
The main aim is to reduce the time required
to detect errors and to provide error-free
transmission of data.
6. Motivation of the Project:
Based on the error types and the features of
scale-free networks, we propose a time-efficient
strategy for detecting and locating errors in
big data sets on cloud.
The main aim is to reduce the time required to
detect errors and to provide error-free
transmission of data.
7. Big Data Processing on Cloud
Service over a network.
Ideal platform for big data storage.
Stream based data management.
Hadoop based framework.
Workload distribution.
Scalability.
Data filtering.
8. Error Detection on Cloud
Error detection.
Error localization.
Complexity analysis.
Algorithm calibration on cloud.
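The detection and localization points above can be illustrated with a minimal sketch. It assumes the data is split into fixed-size blocks and that a CRC-32 checksum is recorded per block; the names `BLOCK_SIZE`, `split_blocks`, `block_checksums` and `locate_errors` are hypothetical and not taken from the project.

```python
import zlib

BLOCK_SIZE = 4  # hypothetical block size, kept tiny for illustration


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Split the data into consecutive fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def block_checksums(data: bytes) -> list:
    """Map step: compute one CRC-32 checksum per block."""
    return list(map(zlib.crc32, split_blocks(data)))


def locate_errors(expected: list, received: bytes) -> list:
    """Return the indices of blocks whose checksum no longer matches."""
    actual = block_checksums(received)
    return [i for i, (e, a) in enumerate(zip(expected, actual)) if e != a]


original = b"sensor-data-0123"
reference = block_checksums(original)            # recorded before storage
corrupted = original[:5] + b"X" + original[6:]   # damage one byte in block 1
damaged_blocks = locate_errors(reference, corrupted)
```

Comparing per-block checksums narrows an error down to a block rather than the whole file, so only the damaged block needs to be repaired.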
9. MODULES :
Module 1 :
1. Create a big data set
Module 2 :
2. Implementation of the algorithm for error
detection
13. Design of Project:
A) Mathematical Model :
Let ‘S’ be the final set describing error
detection in big data.
Identify the inputs as D
S = {D, L, A}
D = {D1, D2, D3, D4 | ‘D’ gives the data
files}
14. Identify the outputs as O
S = {D, L, A}
D = {D1, D2, D3, D4 | ‘D’ gives the data files}
L = {L1, L2 | ‘L’ gives the log files for
upload, download and repair}
A = {A1, A2, A3 | ‘A’ gives the alerts}
15. Identify the functions as ‘F’
S = {D, L, A, F}
F = {F1(), F2(), F3(), F4(), F5(), F6() }
F1 ( V ) :: Upload
F2 ( V ) :: Integrity check
F3 ( V ) :: Log generation
F4 ( T ) :: Alert the system
F5 ( D ) :: Restore the file
F6 ( V ) :: Download the data file
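As a rough sketch of how the six functions in F could fit together, the toy class below keeps the data files D, the log L and the alerts A in memory. Everything here is an assumption made for illustration: the class name `Store`, the use of SHA-1 for the integrity check, and the backup copy used for restore are not specified in the model.

```python
import hashlib


class Store:
    """Toy in-memory stand-in for cloud storage (hypothetical)."""

    def __init__(self):
        self.files = {}     # D: file name -> stored bytes
        self.backups = {}   # assumed pristine copies used by restore
        self.digests = {}   # digest recorded at upload time
        self.log = []       # L: upload, download and repair entries
        self.alerts = []    # A: alerts raised by failed checks

    def upload(self, name, data):                 # upload
        self.files[name] = data
        self.backups[name] = data
        self.digests[name] = hashlib.sha1(data).hexdigest()
        self.write_log("upload", name)

    def integrity_check(self, name):              # integrity check
        current = hashlib.sha1(self.files[name]).hexdigest()
        return current == self.digests[name]

    def write_log(self, action, name):            # log generation
        self.log.append((action, name))

    def alert(self, name):                        # alert the system
        self.alerts.append("integrity check failed: " + name)

    def restore(self, name):                      # restore the file
        self.files[name] = self.backups[name]
        self.write_log("repair", name)

    def download(self, name):                     # download the data file
        if not self.integrity_check(name):
            self.alert(name)
            self.restore(name)
        self.write_log("download", name)
        return self.files[name]


store = Store()
store.upload("D1", b"abc")
store.files["D1"] = b"abX"        # simulate corruption in storage
recovered = store.download("D1")  # check fails, alert raised, file restored
```

In this sketch the integrity check runs on download, so a corrupted file triggers an alert and a repair before the data is returned.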
16. Feasibility Analysis:
This is concerned with specifying the equipment
and software that will successfully satisfy the user
requirements. The technical needs of the system
may vary considerably but might include:
• The facility to produce outputs in a given time.
• Response time under certain conditions.
• Ability to process a certain volume of
transactions at a particular speed.
17. Technical Feasibility :
The facility to produce outputs in a given
time.
Response time under certain conditions.
Ability to process a certain volume of
transactions at a particular speed.
18. NP-hard :
A problem is NP-hard if every problem in
NP can be reduced to it in polynomial time.
NP-complete :
A problem is NP-complete if it is (a) in NP and
(b) NP-hard.
In short:
NP-complete: the most difficult problems in
NP.
Our project comes under NP-complete.
20. Cyclic Redundancy Check
In CRC, a sequence of redundant bits, called
cyclic redundancy check bits, is appended
to the end of the data unit so that the resulting
data unit becomes exactly divisible by a second,
predetermined binary number. The basic
idea of CRC algorithms is simply to treat the
message as an enormous binary number, to
divide it by another fixed binary number,
and to make the remainder from this
division the checksum.
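The division described above can be sketched in a few lines. CRC division is carried out modulo 2, so each subtraction step is an XOR. The function names and the small generator `1101` are chosen only for illustration; real systems use standardized, much longer generators.

```python
def mod2_remainder(bits: str, generator: str) -> str:
    """Divide a bit string by the generator modulo 2 and return the remainder."""
    work = [int(b) for b in bits]
    gen = [int(b) for b in generator]
    for i in range(len(work) - len(gen) + 1):
        if work[i] == 1:                      # subtract (XOR) the generator here
            for j, g in enumerate(gen):
                work[i + j] ^= g
    return "".join(str(b) for b in work[-(len(gen) - 1):])


def crc_encode(data: str, generator: str) -> str:
    """Append the CRC bits so the codeword is exactly divisible."""
    padded = data + "0" * (len(generator) - 1)
    return data + mod2_remainder(padded, generator)


def crc_check(codeword: str, generator: str) -> bool:
    """A received codeword is accepted when the remainder is all zeros."""
    return set(mod2_remainder(codeword, generator)) <= {"0"}


codeword = crc_encode("100100", "1101")     # data 100100, CRC bits 001
ok = crc_check(codeword, "1101")            # True: no error
bad = crc_check("101100001", "1101")        # False: one bit was flipped
```

The receiver repeats the same division on the full codeword; a non-zero remainder signals that the data was altered in transit.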
21. Hamming Code :
Higher information rate.
It encodes and decodes code words.
Detects errors of weight up to 2 (up to 3
with the extended code).
Corrects errors of weight 1.
The key to the Hamming Code is the use of
extra parity bits to allow the identification of a
single error.
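A minimal sketch of the parity-bit idea, using the Hamming(7,4) code as an assumed example: four data bits are protected by three parity bits placed at positions 1, 2 and 4, and the recomputed parities (the syndrome) give the position of a single flipped bit. The function names are illustrative only.

```python
def hamming74_encode(data):
    """Encode 4 data bits as [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code):
    """Correct at most one flipped bit; return (data bits, error position)."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4   # 0 means no error detected
    if position:
        c[position - 1] ^= 1          # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], position


sent = hamming74_encode([1, 0, 1, 1])
received = list(sent)
received[4] ^= 1                      # flip the bit at position 5
data, position = hamming74_decode(received)
```

Because each position is covered by a unique combination of parity bits, the syndrome read as a binary number points directly at the faulty position.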
22. SHA-1
Cryptographic hash function.
Produces a 160-bit hash value, known as a
message digest.
The value is usually written as a hexadecimal
number, 40 digits long.
SHA-1 forms part of several widely used
security applications and protocols,
including TLS and SSL, PGP, SSH, S/MIME,
and IPsec. Those applications can also
use MD5; both MD5 and SHA-1 are descended
from MD4.
23. SHA-1 hashing is also used in distributed
revision control systems like Git, Mercurial,
and Monotone to identify revisions, and to
detect data corruption or tampering.
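A short sketch of using SHA-1 to detect corruption, based on Python's standard `hashlib` module. The helper names `sha1_hex` and `is_intact` are hypothetical: the digest is recorded when the data is stored and compared again when it is read back.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest as a 40-digit hexadecimal string."""
    return hashlib.sha1(data).hexdigest()


def is_intact(data: bytes, expected_digest: str) -> bool:
    """Data is considered intact when its digest matches the recorded one."""
    return sha1_hex(data) == expected_digest


recorded = sha1_hex(b"abc")             # stored alongside the data
unchanged = is_intact(b"abc", recorded)
tampered = is_intact(b"abd", recorded)  # a one-character change alters the digest
```

A hash only detects that something changed; unlike the Hamming code, it cannot indicate which bit is wrong or correct it.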
34. Advantages:
1. Time-efficient approach for detecting
and correcting errors.
2. It works for text, audio and video files.
35. Limitations:
1. It works only for specific kinds of errors.
2. Block size is limited to 1 Gb.
3. Works on public cloud.
36. Paper Submission Details :
Paper Title : “ERROR DETECTION IN
BIG DATA”
The paper has been accepted for publication in
the International Education and Research
Journal, IERJ (E-ISSN: 2454-9916)