Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
MindScripts Technologies is the leading Big-Data Hadoop training institute in Pune, providing a complete Big-Data Hadoop course with Cloudera certification.

Published in: Education, Technology

  1. Why do I need Hadoop?
  2. Business analytics focuses on developing new insights into business performance based on data and statistical methods.
  3. Problem: too much data
  4. Big Data!
  5. Velocity
     - How fast data is being produced, and how fast it must be processed to meet demand.
     - Have a look through an analytics lens!
  6. Variability
     - Data flows can be highly inconsistent, with periodic peaks.
     - Is something big trending on social media?
     - The difference between variety and variability.
  7. Megabytes, gigabytes, ...
     - Terabyte: to put it in perspective, a terabyte could hold about 300 hours of good-quality video, or 1,000 copies of the Encyclopaedia Britannica.
     - Petabyte: could hold 500 billion pages of standard printed text.
     - Exabyte: it has been said that 5 exabytes would equal all the words ever spoken by mankind.
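These scale claims are easy to sanity-check with back-of-the-envelope arithmetic. A quick sketch, assuming roughly 2 KB per standard printed page and about 3.5 GB per hour of good-quality video (both are assumptions for illustration, not figures from the slides):

```python
# Back-of-the-envelope check of the storage-scale claims above.
KB, GB, TB, PB = 1024, 1024**3, 1024**4, 1024**5

# Assumption: ~2 KB per standard printed page of text.
bytes_per_page = 2 * KB
pages_per_petabyte = PB // bytes_per_page
print(f"pages per petabyte: {pages_per_petabyte:,}")   # ~550 billion

# Assumption: ~3.5 GB per hour of good-quality video.
hours_per_terabyte = TB / (3.5 * GB)
print(f"hours of video per terabyte: {hours_per_terabyte:.0f}")  # roughly 300
```

Both rough figures land in the same ballpark as the slide's claims, which is all such comparisons are meant to do.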
  8. Human-generated data and machine-generated data
  9. Sheer size of Big Data
     - Big Data is unstructured or semi-structured.
     - There is no point in just storing big data if we can't process it.
     - Challenges of Big Data
  10. Hadoop enables a computing solution that is:
     - Scalable: new nodes can be added as needed, without changing data formats, how data is loaded, how jobs are written, or the applications on top.
     - Cost-effective: Hadoop brings massively parallel computing to commodity servers.
     - Flexible: Hadoop is schema-less and can absorb any type of data, structured or not, from any number of sources.
     - Fault-tolerant: when you lose a node, the system redirects work to another copy of the data and continues processing without missing a beat.
  11. Power of MapReduce
  12. Course Content
     - Introduction
     - Hadoop: basic concepts; what is Hadoop?; the Hadoop Distributed File System; how Hadoop MapReduce works; anatomy of a Hadoop cluster
     - Hadoop daemons
       - Master daemons: NameNode, JobTracker, Secondary NameNode
       - Slave daemons: DataNode, TaskTracker
  13. HDFS (Hadoop Distributed File System)
     - Blocks and splits: input splits, HDFS splits
     - Data replication
     - Hadoop rack awareness
     - Data high availability
     - Data integrity
     - Cluster architecture and block placement
     - Accessing HDFS: Java approach, CLI approach
     - Programming practices: developing MapReduce programs in local mode (running without HDFS and MapReduce), pseudo-distributed mode (running all daemons on a single node), and fully distributed mode (running daemons on dedicated nodes)
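The block-and-replication model behind HDFS can be pictured with a small sketch. This is a toy illustration only, assuming the classic 64 MB block size, a replication factor of 3, and naive round-robin placement (real HDFS placement is rack-aware and tracked by the NameNode):

```python
# Toy model of HDFS block splitting and replication. Not the HDFS
# API: real HDFS is rack-aware and tracks blocks in the NameNode.
BLOCK_SIZE = 64 * 1024**2   # classic HDFS default block size (64 MB)
REPLICATION = 3             # default replication factor

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (block_index, block_length) pairs covering the file."""
    blocks, offset = [], 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((len(blocks), length))
        offset += length
    return blocks

def place_replicas(blocks, datanodes, replication=REPLICATION):
    """Naive round-robin replica placement across DataNodes."""
    return {idx: [datanodes[(idx + r) % len(datanodes)]
                  for r in range(replication)]
            for idx, _ in blocks}

blocks = split_into_blocks(200 * 1024**2)           # a 200 MB file
print([length // 1024**2 for _, length in blocks])  # [64, 64, 64, 8]
print(place_replicas(blocks, ["dn1", "dn2", "dn3", "dn4"])[0])
```

Note how the last block is smaller than the block size: HDFS does not pad files out to a block boundary.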
  14. Writing a MapReduce Program
     - Examining a sample MapReduce program, with several examples
     - Basic API concepts: the driver code, the Mapper, the Reducer
     - Hadoop's Streaming API
     - Common MapReduce algorithms: sorting and searching, indexing, classification/machine learning, term frequency-inverse document frequency, word co-occurrence
     - Hands-on exercise: creating an inverted index
     - Identity Mapper, Identity Reducer
     - Exploring well-known problems using MapReduce applications
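The driver/Mapper/Reducer flow named above can be sketched in plain Python for word count. The course itself would implement this in Java against the Hadoop API, so treat this as a conceptual model of the map, shuffle, and reduce phases only:

```python
from collections import defaultdict

# Plain-Python sketch of the MapReduce word-count flow: map emits
# (word, 1) pairs, the shuffle groups values by key, and reduce
# sums them. A real Hadoop job implements Mapper/Reducer in Java.
def mapper(line):
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(word, counts):
    return word, sum(counts)

def run_job(lines):
    mapped = [pair for line in lines for pair in mapper(line)]
    return dict(reducer(w, c) for w, c in shuffle(mapped).items())

print(run_job(["the quick brown fox", "the lazy dog"]))
# {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```

The same three-phase shape underlies every algorithm on the list above; only the mapper and reducer bodies change.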
  15. Debugging MapReduce Programs
     - Testing with MRUnit
     - Logging
     - Other debugging strategies
     Advanced MapReduce Programming
     - A recap of the MapReduce flow
     - The secondary sort
     - Customized input formats and output formats
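The secondary sort mentioned above can be simulated to show what it buys you: values arrive at each reducer already ordered within their key. In Hadoop this takes a composite key plus custom sort and grouping comparators; the sketch below (with made-up month/temperature records) only models the effect:

```python
from itertools import groupby

# Sketch of the secondary-sort pattern: sort by the full composite
# key (natural key, secondary value), then group by the natural key
# only, as Hadoop's grouping comparator would.
records = [("2012-06", 31), ("2012-05", 27), ("2012-06", 28), ("2012-05", 33)]

records.sort(key=lambda kv: (kv[0], kv[1]))

for month, group in groupby(records, key=lambda kv: kv[0]):
    temps = [t for _, t in group]
    print(month, temps)  # values arrive pre-sorted within each month
```

The point of doing this in the framework rather than in the reducer is that the reducer never has to buffer and sort all values for a key in memory.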
  16. Hadoop Ecosystem: HBase
     - HBase concepts, HBase architecture
     - Region server architecture, file storage architecture
     - HBase basics: column access, scans
     - HBase use cases
     - Install and configure HBase on a multi-node cluster
     - Create a database; develop and run sample applications
     - Access data stored in HBase using clients such as Java, Python, and Perl
     - HBase and Hive integration
     - HBase admin tasks: defining schema and basic operations
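The column access and scans listed above operate over a data model that is essentially a sorted, nested map: row key, then column family, then column qualifier. A sketch of that model using nested dicts and invented rows (this is the shape of the data, not the real Java/Thrift client API):

```python
# HBase models data as a sparse, sorted map:
#   row key -> column family -> column qualifier -> value.
# This nested-dict sketch illustrates point gets and range scans.
table = {
    "row1": {"info": {"name": "alice", "city": "pune"}},
    "row2": {"info": {"name": "bob"}},
    "row3": {"info": {"name": "carol", "city": "mumbai"}},
}

def get(row, family, qualifier):
    """Point lookup of a single cell; None if absent (sparse rows)."""
    return table.get(row, {}).get(family, {}).get(qualifier)

def scan(start_row, stop_row):
    """Range scan over sorted row keys, [start_row, stop_row)."""
    return {r: cols for r, cols in sorted(table.items())
            if start_row <= r < stop_row}

print(get("row1", "info", "city"))   # pune
print(list(scan("row1", "row3")))    # ['row1', 'row2']
```

Because rows are kept sorted by key, range scans are cheap, which is why row-key design dominates HBase schema discussions.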
  17. Hive
     - Hive concepts, Hive architecture
     - Install and configure Hive on a cluster
     - Create a database and access it from a Java client
     - Buckets, partitions
     - Joins in Hive: inner joins, outer joins
     - Hive UDF, Hive UDAF, Hive UDTF
     - Develop and run sample applications in Java/Python to access Hive
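Hive's buckets distribute rows across a fixed number of files by hashing the bucketing column, which is what makes sampling and bucketed map joins cheap. A minimal sketch of the idea, using plain modulo on made-up integer user ids in place of Hive's hash:

```python
from collections import defaultdict

# Sketch of Hive-style bucketing: each row lands in a bucket chosen
# by hashing the bucketing column modulo the bucket count. Plain
# modulo on integer ids stands in for Hive's hash here.
NUM_BUCKETS = 4

def bucket_of(user_id, num_buckets=NUM_BUCKETS):
    return user_id % num_buckets

buckets = defaultdict(list)
for user_id in [3, 7, 8, 12, 15]:
    buckets[bucket_of(user_id)].append(user_id)

print(dict(buckets))  # {3: [3, 7, 15], 0: [8, 12]}
```

Two tables bucketed the same way on the join key can be joined bucket-by-bucket, which is the basis of Hive's bucket map join.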
  18. Pig
     - Pig basics
     - Install and configure Pig on a cluster
     - Pig vs. MapReduce and SQL; Pig vs. Hive
     - Write sample Pig Latin scripts
     - Modes of running Pig: in the Grunt shell, programming in Eclipse, running as a Java program
     - Pig UDFs, Pig macros
     Flume
     - Flume concepts
     - Install and configure Flume on a cluster
     - Create a sample application to capture logs from Apache using Flume
  19. Sqoop
     - Getting Sqoop
     - A sample import
     - Database imports; controlling the import
     - Imports and consistency
     - Direct-mode imports
     - Performing an export
  20. Contact Us
     MindScripts Technologies, 2nd Floor, Siddharth Hall, Near Ranka Jewellers, Behind HP Petrol Pump, Karve Rd, Pune 411004
     MindScripts Technologies, C8, 2nd Floor, Sant Tukaram Complex, Pradhikaran, Above Savali Hotel, Opp Nigdi Bus Stand, Nigdi, Pune 411044
     Call: 9595957557 / 8805674210 / 9764560238 / 9767427924 / 9881371828
     www.mindscripts.com | info@mindscripts.com