Seminarppt

Guided by :
Prof. N.M.Kandoi
Submitted by:
Ms. Monali D. Akhare
Roll no. 02
BIG DATAANALYTICS
Department of Computer Science &Engineering
Shri Sant Gajanan Maharaj College of Engineering
Shegaon (444203)

Contents
1. Introduction
2. Big Data and Big Data Analytics
3. Literature Review
4. Analysis of Work
5. Proposed Work
6. Applications
7. Future of Big Data
8. Reference

1/16/2017 Topic : Big Data Analytics Roll No.02
Introduction
 Big Data may well be the Next Big Thing in the IT world.
 Big data burst upon the scene in the first decade of the 21st century.
 The first organizations to embrace it were online and startup firms.
 Big data is currently a major topic across a number of fields
including,
-management and marketing
-scientific research
-national security
-government transparency
-open data.

 Big data can bring about dramatic cost reductions, substantial
improvements in the time required to perform a computing task.
 Big Data Analytics for manufacturing applications can be based on
a 5C architecture:
-connection,
-conversion
-cyber
-cognition
-configuration

Big Data and Big Data Analytics
What is Big Data?
 Big data usually includes data sets with sizes beyond the ability of
commonly used software tools to;
-capture
-manage
-process data
-elapsed time.
 But having data bigger it requires different approaches:
-techniques, tools and architecture
 Aim to solve new problems or old problems in a better way.
 Generates value and process very large information from storage
that cannot be analyzed by traditional computing techniques.

The Structure of Big Data
 The various challenges faced in large data management include
scalability, unstructured data, accessibility, real time analytics, fault
tolerance and many more.
 Structured
-Most traditional data sources
 Semi-structured
-Many sources of big data
 Unstructured
-Video data, audio data

 Growth of Big Data is needed
-Increase of storage capacities
-Increase of processing power
-Availability of data(different data types)
Why Big Data?
 IBM claims 90% of today’s stored data was generated in just the
last two years.
How Is Big Data Different?
 Automatically generated by a machine
(e.g. Sensor embedded in an engine)
 Typically an entirely new source of data
(e.g. Use of the internet)

 Examining large amount of data
 Appropriate information
 Identification of hidden patterns, unknown correlations
 Better business decisions: strategic and operational
 Effective marketing, customer satisfaction, increased revenue
 Big Data and analytics a large challenge offering great opportunities:
-understanding the business
-mobile advertising space
What is Big Data Analytics?

Big Data and Analytics Characteristics
Data can be described by the following characteristics:
Volume -The Big word in Big data itself defines the volume. Data volume
measures the amount of data available to an organization.
Variety - Data variety is a measure of
the richness of the data representation
-text
-images
-video
-audio
-web Pages
-e-mail.

Velocity- Speed of generation of data processed to meet the demands,
challenges lie in path of growth and development.
Value - Data value measures the usefulness of data in making decisions.
These reports help these people to find the business trends according to
which they can change their strategies.
Veracity -The quality of the data being captured can vary greatly accuracy
of analysis depends on the veracity of the source data.

Issues in Big Data
 Big data Issues are need not be confused with problems but they
are important to know and crucial to handle
Fig: Explosion in size of Data (Hewlett-Packard Development Company, 2012)

Issues related to the Characteristics
Volume :As data volume increases, the value of different data records will
decrease in proportion .
Velocity :Traditional systems are not capable of performing the analytics on
data which is constantly in motion so velocity management is more than a
bandwidth issue.
Variety :Incompatible data formats, non-aligned data structures, and
inconsistent data semantics .
Value : Business leaders would be just adding value to their business and
getting more profit unlike IT leaders who would have to concern with the
technicalities of storage and processing.

Other Issues......
Storage and Transport Issues
The quantity of data has exploded each time we invented a new
storage medium to handle this issue, data should be processed “in place”
and transmit only resulting information.
Data Management Issues
Given volume, it is impractical to
validate every data item so new approaches
to data qualification and validation are
needed.

Motivation for Big Data and Analytics
 Current tools and technologies are not up to the mark to store and
process huge amount of data.
 They are also unable to extract value from these data Big Data can
help to gain insights and make better decisions.
 Following are some areas where Big Data can play important role:
-Big Data Analytics and Health care
-Big Data Analytic and Intelligence Agencies
-Big Data Analytics and Environment

Literature Review
 Big Data can help to gain insights and make better decisions and
presents an opportunity.
 Technologies being applied to big data include massively parallel
processing (MPP) databases, data mining grids, distributed file
systems, distributed databases, cloud computing platforms.
 A wide variety of techniques and technologies has been developed
and adapted to aggregate, manipulate, analyze, and visualize big
data.

Big Data Technology
 Hadoop is an open source project hosted by Apache Software
Foundation.
 It consists of many small sub projects which belong to the category
of infrastructure for distributed computing.
Hadoop mainly consists of:
-File System (The Hadoop File System)
-Programming Paradigm (Map Reduce)
The other subprojects provide complementary services or they are
building on the core to add higher-level abstractions.

Fig. Hadoop High Level Architecture

 Replication i.e. creating redundant copies of the same data at
different devices so that in case of failure the copy of the data is
available.
 The main problem is of combining the data being read from
different devices.
 Many a methods are available in distributed computing to handle
this problem but still it is quite challenging.
 All the problems discussed are easily handled by Hadoop.
 The problem of failure is handled by the Hadoop Distributed File
System .

 Combining data is handled by Map reduce programming Paradigm
reduces problem of disk reads and writes by providing a
programming model dealing in computation with keys and values.
 Hadoop thus provides: a reliable shared storage and analysis system
 The storage is provided by HDFS and analysis by MapReduce
Fig . HDFS Architecture

BIG DATA is not just HADOOP
Manage & store huge volume
of any data
Hadoop File System
MapReduce
Manage streaming data Stream Computing
Analyze unstructured data Text Analytics Engine
Data WarehousingStructure and control data
Integrate and govern all
data sources
Integration, Data Quality, Security,
Lifecycle Management, MDM
Understand and navigate
federated big data sources
Federated Discovery and Navigation

Big Data Projects
 There are some of the projects which are Big Data using effectively.
-Big Science
-Private Sector
-Governments
-International Development
 Data access project by IBM.
-Pig
-Hive
-Flume
-Hcatalog
-Avro
-Spark

Analysis of Work
 The challenges in Big Data are usually real implementation hurdles
which require immediate attention.
 Any implementation without handling challenges may lead to failure
of technology implementation and some unpleasant result.
 There are many challenges in different sector given below:
- Privacy and security
- Analytical Challenges
- Technical Challenges
- Fault Tolerance : with the incoming of new technologies like cloud
- Scalability : the issue if big data has lead toward cloud
Big Data Issues and Challenges

Big Data Technologies and Risk
The risk associated with Big Data technologies:
 This is a new technology for most organizations so need to
understand other wise will create vulnerabilities.
 User authentication and access to data from multiple
locations may not be sufficiently
controlled.

Proposed Work
Apache Hadoop
Apache Hadoop is open source software library which includes
framework that allows for distributed processing of large data sets
across clusters of computers using simple programming models.
It has variety of options ranging from single computer to thousands
of computers, each of which offering local computation and storage.
Instead of depending on hardware, library itself designed to detect
and handle failure and assure high-availability at application layer.

Fig. Data store and retrival in Apache Hadoop system

Big Data Analytics has numerous proposed work below
Homeland
Security
Smarter
Healthcare
Multi-channel
sales
Telecom
Manufacturing
Traffic Control
Trading
Analytics
Search
Quality

Applications
Government
 The use and adoption of Big Data within governmental processes is
beneficial and allows efficiencies in terms of cost, productivity, and
innovation .
United States of America
 In 2012, the Obama administration announced the Big Data Research and
Development Initiative, to explore how big data could be used to address
important problems faced by the government.
India
 Big data analysis was, in parts, responsible for the BJP and its allies to
win a highly successful Indian General Election 2014.
 The Indian Government utilizes numerous techniques to ascertain how
Indian electorate is responding to government action, as well as ideas for
policy augmentation.

International development
 Advancements in big data analysis offer cost effective opportunities to
improve decision making in critical development areas such as health
care, employment, economic productivity, crime, security, and natural
disaster.
Manufacturing
 Based on TCS 2013 Global Trend Study, improvements in supply
planning and product quality provide the greatest benefit of big data for
manufacturing.
Private sector
 Retail: Walmart handles more than 1 million customer transactions every
hour, which are imported into databases estimated to contain more than
2.5 petabytes of data.
 Retail Banking: FICO Card Detection System protects accounts
worldwide.

Future of Big Data
 $15 billion on software firms only specializing in data
management and analytics.
 This industry on its own is worth more than $100 billion and
growing at almost 10% a year which is roughly twice as fast as the
software business as a whole.
 In February 2012, the open source analyst firm Wikibon released
the first market forecast for Big Data , listing $5.1B revenue in
2012 with growth to $53.4B in 2017
 The McKinsey Global Institute estimates that data volume is
growing 40% per year, and will grow 44x between 2009 and 2020.

References
[1] Katal, A., Wazid, M., Goudar, R.H., “Big data: Issues, challenges, tools and Good
practices”, Sixth International Conference on Contemporary Computing (IC3) 2013.
[2] Stephen K, Frank A, J. Alberto E, William M, “Big Data: Issues and Challenges Moving
Forward”, IEEE, 46th Hawaii International Conference on System Sciences, 2013.
[3] Big Data: Big Promises for Information Security By Rasim Alguliyev Institute of
Information Technology Azerbaijan National Academy of Sciences Baku, Azerbaijan
Yadigar Imamverdiyev Institute of Information Technology Azerbaijan National Academy
of Sciences Baku, Azerbaijan
[4] Big Data analytics frameworks by Parth Chandarana V.E.S.I.T, Chembur ,Mumbai,
India , M. Vijayalakshmi Department of Information Technology, V.E.S.I.T, Chembur
,Mumbai, India 2014 International Conference on Circuits, Systems, Communication and
Inf.
[5]Cloud Security Alliance (CSA): Big Data Analytics for SecurityIntelligence. September
2013.https://cloudsecurityalliance.org/download/big-data-analyticsfor-security-intelligence

Seminarppt

Recommended

Recommended

More Related Content

What's hot

What's hot (19)

Similar to Seminarppt

Similar to Seminarppt (20)

Recently uploaded

Recently uploaded (20)

Seminarppt

Editor's Notes