“ERROR DETECTION IN
BIGDATA ON CLOUD”
Group Id : 15
Names: Shweta Dolhare (B120344212)
Snehal Gaikwad (B120344215)
Poonam Ghorpade (B120344219)
Project Title : “Error Detection in Big Data
on Cloud”
Internal Guide : S.B.Jadhav
Project Definition:
To develop an approach that efficiently
reduces the time needed to detect errors in
big sensor data on the cloud. If an error is
found, the approach also recovers from it and
stores the data in its original format.
Technical Keywords :
Cloud Computing, Service Composition,
Online Web Services, Hadoop, MySQL,
MapReduce.
Introduction:
We need an approach that efficiently
reduces the time needed to detect errors
in big sensor data on the cloud. If an
error is found, the approach also recovers
from it and stores the data in its original
format.
Based on the error types and the features
of scale-free networks, we propose a
time-efficient strategy for detecting and
locating errors in big data sets on the
cloud. The main aim is to reduce the time
required to detect errors and to provide
error-free transmission of data.
Motivation of the Project:
Based on the error types and the features of
scale-free networks, we propose a time-
efficient strategy for detecting and locating
errors in big data sets on the cloud.
The main aim is to reduce the time required to
detect errors and to provide error-free
transmission of data.
Big Data Processing on Cloud
• Service over a network.
• Ideal platform for big data storage.
• Stream-based data management.
• Hadoop-based framework.
• Workload distribution.
• Scalability.
• Data filtering.
Error Detection on Cloud
• Error detection.
• Error localization.
• Complexity analysis.
• Algorithm calibration on cloud.
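Error localization can be illustrated with a minimal sketch: split the data into fixed-size blocks, compute one checksum per block, and report the blocks whose checksums differ. The block size, the function names and the use of CRC-32 from Python's standard `zlib` module are assumptions made for illustration; the slides do not specify how the project locates errors.

```python
import zlib

BLOCK_SIZE = 4  # bytes per block; deliberately tiny, chosen only for illustration


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Return one CRC-32 value per fixed-size block of data."""
    return [
        zlib.crc32(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def locate_errors(original: bytes, received: bytes,
                  block_size: int = BLOCK_SIZE) -> list:
    """Return the indices of blocks whose checksums differ."""
    ref = block_checksums(original, block_size)
    got = block_checksums(received, block_size)
    n = max(len(ref), len(got))
    # A block that exists on only one side is also reported as an error.
    return [
        i for i in range(n)
        if i >= len(ref) or i >= len(got) or ref[i] != got[i]
    ]
```

Because each block is checked independently, the per-block work could in principle be distributed across workers, which is the kind of workload distribution a Hadoop-based framework provides.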
MODULES:
Module 1: Create a big data set
Module 2: Implementation of the algorithm for
error detection
Module 3: Implementation of file recovery
Module 4: Testing
Flow Diagram of Project:
Upload a file to cloud -> Store the file on cloud ->
Check if file is equal to original file (Yes / No) ->
If no, recover the error
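The flow above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's implementation: the in-memory dictionaries `cloud_store` and `backup_store` are hypothetical stand-ins for cloud storage and for a trusted copy of the original, and the comparison uses a SHA-1 digest, one of the algorithms listed later in the slides.

```python
import hashlib

cloud_store = {}   # hypothetical stand-in for files stored on the cloud
backup_store = {}  # hypothetical stand-in for a trusted copy plus its digest


def digest(data: bytes) -> str:
    """Return the SHA-1 digest of data as a hexadecimal string."""
    return hashlib.sha1(data).hexdigest()


def upload(name: str, data: bytes) -> None:
    """Store the file and remember the original together with its digest."""
    cloud_store[name] = bytearray(data)
    backup_store[name] = (bytes(data), digest(data))


def check_and_recover(name: str) -> bool:
    """Return True if the stored file equals the original.

    Otherwise restore the original copy and return False.
    """
    original, original_digest = backup_store[name]
    if digest(bytes(cloud_store[name])) == original_digest:
        return True
    cloud_store[name] = bytearray(original)  # recover the error
    return False
```

For example, after `upload("reading.txt", b"hello")`, flipping a bit in `cloud_store["reading.txt"]` makes the next `check_and_recover("reading.txt")` return `False` and restore the original bytes.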
System Architecture
Design of Project:
A) Mathematical Model:
Let S be the final set describing the system
"Error detection in big data".
Identify the inputs as D:
D = {D1, D2, D3, D4 | D gives the data files}
Identify the outputs as L and A:
S = {D, L, A}
L = {L1, L2 | L gives the log files for
upload, download and repair}
A = {A1, A2, A3 | A gives the alerts}
Identify the functions as F:
S = {D, L, A, F}
F = {F1(), F2(), F3(), F4(), F5(), F6()}
F1(V) :: Upload
F2(V) :: Integrity check
F3(V) :: Log generation
F4(T) :: Alert the system
F5(D) :: Restore the file
F6(V) :: Download the data file
Feasibility Analysis:
This is concerned with specifying the equipment
and software that will successfully satisfy the
user requirements. The technical needs of the
system may vary considerably but might include:
• The facility to produce outputs in a given time.
• Response time under certain conditions.
• Ability to process a certain volume of
transactions at a particular speed.
Technical Feasibility:
• The facility to produce outputs in a given
time.
• Response time under certain conditions.
• Ability to process a certain volume of
transactions at a particular speed.
NP-hard:
A problem is NP-hard if every problem in NP
can be reduced to it in polynomial time.
NP-complete:
A problem is NP-complete if it is (a) in NP and
(b) NP-hard.
In short:
NP-complete problems are the most difficult
problems in NP.
Our project comes under NP-complete.
Algorithms:
1. Cyclic Redundancy Check.
2. Hamming Code.
3. Secure Hash Algorithm.
• Cyclic Redundancy Check
In CRC, a sequence of redundant bits, called
cyclic redundancy check bits, is appended
to the end of the data unit so that the
resulting data unit becomes exactly divisible
by a second, predetermined binary number. The
basic idea of CRC algorithms is to treat the
message as an enormous binary number, to
divide it by another fixed binary number
(using modulo-2 arithmetic), and to make the
remainder from this division the checksum.
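The division described above can be written out directly as a minimal sketch. Bits are kept as strings of "0" and "1" for readability; the function names and the 4-bit generator used in the example are assumptions chosen for illustration, not parameters taken from the project.

```python
def crc_remainder(message_bits: str, generator_bits: str) -> str:
    """Return the CRC bits for message_bits.

    The message is padded with r zero bits (r = generator length - 1) and
    divided by the generator using modulo-2 (XOR) long division.
    """
    r = len(generator_bits) - 1
    work = list(message_bits + "0" * r)
    for i in range(len(message_bits)):
        if work[i] == "1":
            # Subtract (XOR) the generator aligned at position i.
            for j, g in enumerate(generator_bits):
                work[i + j] = str(int(work[i + j]) ^ int(g))
    return "".join(work[-r:])


def crc_check(codeword_bits: str, generator_bits: str) -> bool:
    """Return True if the codeword is exactly divisible by the generator."""
    r = len(generator_bits) - 1
    work = list(codeword_bits)
    for i in range(len(codeword_bits) - r):
        if work[i] == "1":
            for j, g in enumerate(generator_bits):
                work[i + j] = str(int(work[i + j]) ^ int(g))
    return "1" not in work[-r:]


message = "11010011101100"
generator = "1011"
check_bits = crc_remainder(message, generator)  # "100"
codeword = message + check_bits
```

Here `crc_check(codeword, generator)` is true for the intact codeword and becomes false if any single bit of it is flipped. Production code would normally use a standard polynomial, for example CRC-32 via `zlib.crc32`, rather than a hand-written divider.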
Hamming Code:
• Higher information rate.
• Encodes and decodes code words.
• Detects errors of weight up to 2 (minimum
distance 3) when used for detection only.
• Corrects errors of weight 1.
• The key to the Hamming Code is the use of
extra parity bits to allow the identification of a
single error.
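The parity-bit idea can be illustrated with the classic Hamming(7,4) code, which protects 4 data bits with 3 parity bits. This is a minimal sketch; the slides do not state which Hamming code the project uses, so the code length, bit layout and function names are assumptions.

```python
def hamming74_encode(d):
    """Encode 4 data bits as 7 bits laid out p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Return (data bits, error position).

    The position is 1-based; 0 means no error was detected. A single
    flipped bit is corrected before the data bits are extracted.
    """
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # the syndrome spells out the error position
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], pos
```

Encoding `[1, 0, 1, 1]` gives `[0, 1, 1, 0, 0, 1, 1]`; flipping any one of those seven bits still decodes to `[1, 0, 1, 1]`, with the reported position pointing at the flipped bit.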
SHA-1
• Cryptographic hash function.
• Produces a 160-bit hash value (message
digest).
• The value is usually written as a hexadecimal
number, 40 digits long.
• SHA-1 forms part of several widely used
security applications and protocols,
including TLS and SSL, PGP, SSH, S/MIME,
and IPsec. Those applications can also
use MD5; both MD5 and SHA-1 are descended
from MD4.
• SHA-1 hashing is also used in distributed
revision control systems like Git, Mercurial,
and Monotone to identify revisions, and to
detect data corruption or tampering.
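The digest properties listed above can be checked with Python's standard `hashlib` module. This is a small sketch; the helper names `sha1_hex` and `sha1_stream` are assumptions introduced for illustration.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as a 40-digit hexadecimal string."""
    return hashlib.sha1(data).hexdigest()


def sha1_stream(chunks) -> str:
    """Hash data that arrives in chunks, e.g. a large file read in blocks."""
    h = hashlib.sha1()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


reference = sha1_hex(b"abc")
# a9993e364706816aba3e25717850c26c9cd0d89d
```

Hashing the same bytes in chunks gives the same digest as hashing them at once, so a large file can be verified without loading it fully into memory, and changing even one byte of the input produces a different digest.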
Data Flow Diagram:
(Level 0): User, System, Web service
(Level 1): User, System, Web service, Hadoop,
Generate Hash Key, Hash Key Checker
(Level 2): User, System, Web service, Hadoop,
Generate Hash Key, Hash Key Checker, Upload file,
Repair file, No error message, Restore File
Class Diagram :
State Transition Diagram :
Use Case Diagram :
Activity Diagram :
Component Diagram :
Deployment Diagram :
Sequence Diagram :
Advantages:
1. Time-efficient approach for detecting
and correcting errors.
2. It works for text, audio and video files.
Limitations:
1. It works only for specific kinds of errors.
2. Block size is limited to 1 Gb.
3. Works on public cloud.
Paper Submission Details:
Paper Title: “ERROR DETECTION IN
BIG DATA”
The paper has been accepted for publication in
International Education and Research
Journal (IERJ), E-ISSN: 2454-9916.
Thank you
