Big data

Big Data
Presented by,
Mohamedsalman S
(BIT CSE)

Contents
 Introduction.
 Components.
 Methods.
 What is Hadoop.
 Hadoop Offers.
 MapReduce.
 What is HPCC.
 HPCC Components.
 Big Data Samples.
 Difference between HPCC and Hadoop.
 Privacy and Security Issues.
 Knowledge Discovery.
 Conclusion.

Introduction
 Big data and its analysis are at the center of modern science and
business.
 These data are generated from online transactions, emails, videos,
audio, images, etc.
 They are stored in databases that grow massively and become difficult to
capture, store, manage, and share.
 The volume of data is predicted to double every two years, reaching
about 8 zettabytes by 2015.

Components
 Variety.
Variety makes big data really big.
Big data comes from a great variety of sources.
It generally comes in three types: structured, unstructured, and
semi-structured.
Structured data enters a data warehouse already tagged and is
easily sorted.
Unstructured data is random and difficult to analyze.

Components
Semi-structured data does not conform to fixed fields but contains
tags to separate data elements.
 Volume.
Volume, or the size of data, has now grown beyond terabytes into
petabytes and zettabytes.
 Velocity.
The flow of data is massive and continuous.
Big data should be used as it streams into the organization in order to
maximize its value.
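
The three data types listed under Variety can be illustrated with a small sketch. The sample records below are invented purely for illustration, and the helper name total_amount is hypothetical.

```python
import csv
import io
import json

# Structured: fixed columns, already "tagged" by the header row and
# ready to load into a warehouse table. (Invented sample data.)
structured = io.StringIO("order_id,amount\n1001,25.50\n1002,13.00\n")
rows = list(csv.DictReader(structured))

# Semi-structured: no fixed fields, but tags (keys) separate the elements.
semi_structured = json.loads('{"user": "alice", "tags": ["sale", "mobile"]}')

# Unstructured: free text with no fields or tags; it needs extra
# processing before it can be analyzed.
unstructured = "Customer called to say the delivery arrived late."


def total_amount(records):
    """Sum the 'amount' column of the structured records."""
    return sum(float(record["amount"]) for record in records)


print(total_amount(rows))         # easy: the column is known in advance
print(semi_structured["tags"])    # possible: the tag tells us where to look
print(unstructured.split())       # only raw tokens; meaning must be inferred
```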

Methods
 Organizations face large amounts of new data that arrive in many
different forms.
 Big data has generated a whole new industry of supporting
architectures such as MapReduce.
 MapReduce is a programming framework for distributed computing.
 It was created by Google and uses a divide-and-conquer method.
 MapReduce can be divided into two stages.
Map Step.
Reduce Step.
 Hadoop.
 HPCC.

What is Hadoop?
 Hadoop is an open-source software framework.
 It is a Java-based framework.
 Essentially, it accomplishes two tasks: massive data storage and faster
processing.
 It is not a replacement for a database, data warehouse, or ETL.

Hadoop Offers
 HDFS - responsible for storing data on the cluster.
 MapReduce.
 HBase - distributed database for random read/write access.
 Pig - high-level data processing system.
 Hive - data warehouse application.
 Sqoop - transfers data between relational databases and Hadoop.

MapReduce
 MapReduce is a programming framework for distributed computing.
 It was created by Google and uses a divide-and-conquer method.
 MapReduce can be divided into two stages.
Map Step.
Reduce Step.
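
The two stages above can be sketched as a single-process word count. This is only an illustration of the divide-and-conquer idea, not Hadoop's API: the function names (map_step, shuffle, reduce_step, map_reduce) are hypothetical, and the grouping step that a real framework performs between the two stages is written out by hand.

```python
from collections import defaultdict


def map_step(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key (a framework would do this step)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_step(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    """Run map on every split, group by key, then reduce each group."""
    intermediate = []
    for doc in documents:  # in a cluster, each split could run on its own node
        intermediate.extend(map_step(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_step(key, values) for key, values in grouped.items())


counts = map_reduce(["big data is big", "data flows fast"])
print(counts)
```

Because each call to map_step depends only on its own split, the map work can be spread across machines, and the reduce work can be spread across keys.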

What is HPCC?
 HPCC (High-Performance Computing Cluster) is also known as DAS
(Data Analytics Supercomputer).
 HPCC Systems is a distributed, data-intensive, open-source computing
platform that provides big data workflow management services.
 Unlike Hadoop, HPCC's data model is defined by the user.
 The HPCC platform does not require third-party tools like Greenplum,
Cassandra, RDBMS, or Oozie.

HPCC Components
 HPCC Data Refinery
A massively parallel ETL engine that enables data integration
and provides batch-oriented data manipulation.
 HPCC Data Delivery Engine
A high-throughput, ultra-fast, low-latency query engine.
 Enterprise Control Language
A simple-to-use programming language optimized for big data
operations and query transactions.

Big Data Samples
 Biological science.
 Life sciences.
 Medical records.
 Scientific research.
 Mobile phones.
 Government.

Difference between HPCC and Hadoop

Knowledge Discovery
 Knowledge discovery consists of operations designed to extract
information from complicated data sets.
 Removing noise, handling missing data fields, and calculating time
information.
 Mapping goals to particular data mining methods.
 Choosing a data mining algorithm and method for searching for data
patterns.
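
The cleaning operations above (handling missing fields, then removing noise) can be sketched as follows. This is one possible approach under stated assumptions, not a prescribed method: missing values are filled with the median, "noise" is taken to mean values more than two standard deviations from the mean, and the sample readings and function names are invented for illustration.

```python
from statistics import mean, median, stdev


def fill_missing(records, field):
    """Replace missing (None) values of `field` with the median of known values."""
    known = [r[field] for r in records if r[field] is not None]
    fill = median(known)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


def remove_noise(records, field, max_sigma=2.0):
    """Drop records whose `field` is more than max_sigma std devs from the mean."""
    values = [r[field] for r in records]
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return list(records)
    return [r for r in records if abs(r[field] - mu) <= max_sigma * sigma]


# Invented sensor readings: one missing value and one obvious outlier.
raw = [{"reading": v} for v in [10, 11, None, 9, 10, 12, 10, 11, 9, 500]]

filled = fill_missing(raw, "reading")
cleaned = remove_noise(filled, "reading")
print(len(raw), len(cleaned))
```

The median is used for filling because, unlike the mean, it is not pulled toward the outlier that the next step is about to remove.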

Privacy and Security Issues
 Big data stores must be properly controlled.
 To ensure authentication, a cryptographically secure communication
framework has to be implemented.
 Data must be controlled as specified by regulations, such as those
imposing storage periods.
 Organizations have to consider the legal ramifications of storing data.
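
Two of the points above can be sketched with the Python standard library. The first part uses an HMAC to authenticate a message with a shared secret key, which is only one building block of a secure communication framework. The second part checks a storage period. The function names, the sample message, and the 30-day period are assumptions made for illustration and do not come from any regulation.

```python
import hashlib
import hmac
import secrets
from datetime import date, timedelta


def sign(key: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 tag that proves the sender holds the shared key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Check a tag in constant time to avoid leaking information via timing."""
    return hmac.compare_digest(sign(key, message), tag)


def is_expired(stored_on: date, retention_days: int, today: date) -> bool:
    """Return True once a record has outlived its allowed storage period."""
    return today > stored_on + timedelta(days=retention_days)


key = secrets.token_bytes(32)            # shared secret (hypothetical setup)
message = b"record-42: access granted"   # invented sample message
tag = sign(key, message)

print(verify(key, message, tag))         # genuine message
print(verify(key, b"tampered", tag))     # altered message is rejected
print(is_expired(date(2024, 1, 1), 30, date(2024, 2, 15)))
```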

Conclusion
 Big data is difficult to manage.
 Data must be kept in a secure manner.
 It is used by a growing number of organizations.
