BIG DATA ANALYTICS
USING HADOOP
SUBMITTED BY
V.N.V. SRIKANTH
138W1A12B4
ABSTRACT
• Big data analytics is the process of examining large data sets
containing a variety of data types (that is, big data) to uncover
hidden patterns, unknown correlations, market trends,
customer preferences and other useful business information.
• The analytical findings can lead to more effective marketing,
new revenue opportunities, better customer service, improved
operational efficiency, competitive advantages over rival
organizations and other business benefits.
• Many big data projects originate from the need to answer
specific business questions. With the right big data analytics
platforms in place, an enterprise can boost sales, increase
efficiency, and improve operations, customer service and risk
management.
MOTIVATION
• By using big data analytics you can extract only the relevant
information from terabytes, petabytes and exabytes, and
analyze it to transform your business decisions for the future.
• With the right big data analytics platforms in place, an
enterprise can boost sales, increase efficiency, and improve
operations, customer service and risk management.
• Hadoop and related tools such as
YARN, MapReduce, Spark, Hive
and Pig, as well as NoSQL
databases, support the
processing of large and diverse
data sets across clustered
systems.
PROBLEM STATEMENT
• The first challenge is in breaking down data silos to access all
data an organization stores in different places and often in
different systems.
• A second big data challenge is in creating platforms that can
pull in unstructured data as easily as structured data.
• This massive volume of data is typically so large that it's
difficult to process using traditional database and software
methods.
PROBLEM STATEMENT (CONT.)
• The above challenges can be overcome by implementing the
following technologies:
   • Parallel database technologies
   • MapReduce
• The best open source tools available are
KNOWLEDGE FROM LITERATURE SURVEY
[Timeline figure; only the year labels 1996, 1996, 1997 and 1996 survive]
KNOWLEDGE FROM LITERATURE SURVEY (CONT.)
[Timeline figure; only the year labels 1998 and 2013 survive]
KNOWLEDGE FROM LITERATURE SURVEY (CONT.)
• 2004: Initial versions of HDFS and MapReduce were implemented.
• 2005: GFS and MapReduce were used to perform operations.
• 2006: Yahoo! created Hadoop based on GFS and MapReduce.
• 2007: Yahoo! started using Hadoop on a 1000-node cluster.
• 2008: Apache took over Hadoop, and it was tested on a 4000-node
cluster.
• 2009: Hadoop successfully sorted a petabyte of data in less than
17 hours, in support of handling billions of searches and indexing
millions of web pages.
• 2011: Hadoop version 1.0 was released.
• 2013: Version 2.0.6 became available.
LITERATURE SURVEY METHODS
[Timeline figure; only the year labels 2003, 2004 and 2006 survive]
LITERATURE SURVEY METHODS (CONT.)
Method | Author(s) | Year
RDBMS (Relational Database Management Systems) | E. F. Codd | 1970
Grid computing | Ian Foster, Carl Kesselman | Early 1990s
Volunteer computing | Luis F. G. Sarmenta | 1996
Hadoop HDFS (based on the Google File System paper) | Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung | 2003
Hadoop MapReduce (based on the Google MapReduce paper) | Jeffrey Dean, Sanjay Ghemawat | 2004
Apache Hadoop | Doug Cutting, Mike Cafarella | 2011 (version 1.0)
DEMERITS OF PREVIOUS METHODS
• Hardware failure:
As soon as we start using many pieces of hardware,
the chance that one will fail is fairly high.
• Combining the data after analysis:
Most analysis tasks need to be able to combine the
data in some way; with data spread across 100 disks,
for example, data read from one disk may need to be
combined with the data from any of the other 99 disks.
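The hardware-failure point above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming every disk fails independently with the same probability over some period; the 1% figure and the function name `prob_any_failure` are illustrative assumptions, not measurements from this presentation.

```python
def prob_any_failure(num_disks: int, p_single: float) -> float:
    """Probability that at least one of num_disks fails.

    Assumes failures are independent and each disk fails with
    probability p_single over the period of interest (an assumption).
    """
    if num_disks < 0:
        raise ValueError("num_disks must be non-negative")
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    # P(at least one fails) = 1 - P(no disk fails)
    return 1.0 - (1.0 - p_single) ** num_disks


if __name__ == "__main__":
    # Hypothetical per-disk failure probability of 1%.
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} disks: {prob_any_failure(n, 0.01):.3f}")
```

Under these assumptions, a single disk fails 1% of the time, but with 100 disks the chance that at least one fails rises to roughly 63%, which is why a cluster has to plan for failure as the normal case.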
HADOOP ADVANTAGES
Apache Hadoop is a framework for running applications on large clusters
built of commodity hardware.
A common way of avoiding data loss is through replication: redundant
copies of the data are kept by the system so that in the event of failure,
there is another copy available. The Hadoop Distributed Filesystem (HDFS)
takes care of this first problem, hardware failure.
The second problem, combining the data, is solved by a simple programming
model, MapReduce. Hadoop is a popular open source implementation of
MapReduce, a powerful tool designed for deep analysis and transformation
of very large data sets.
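The MapReduce programming model mentioned above can be sketched in a few lines. This is a single-process illustration of the map, shuffle and reduce steps using word counting as the task; it does not use the Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names chosen for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on an in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data needs big clusters", "hadoop processes big data"]
    print(word_count(sample))
```

In a real cluster the map calls would run in parallel on the nodes that hold each block of input, and the framework would move the grouped pairs to the reducers; the sketch only shows how the two user-written functions divide the work.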
PROJECT IDEAS RELATED TO THE TOPIC
• Traffic Congestion Control
• Hospital Management
• College Management Systems
CONCLUSION
By using big data analytics you can extract only the relevant
information from terabytes, petabytes and exabytes, and analyze it
to transform your business decisions for the future.
With the right big data analytics platforms in place, an enterprise can
boost sales, increase efficiency, and improve operations, customer
service and risk management.
Pros | Cons
Cost effective | Cluster management is hard
Parallel processing | Single point of failure
Fault tolerance | Security issues
Scalability |