Big Data and Hadoop Ecosystem Tools

 Introduction to Big Data
 Properties of Big Data
 Introduction to Hadoop
 Core components in Hadoop
 MapReduce
 Hadoop Ecosystem tools
 Conclusion

Introduction to Big Data
Big data is data that exceeds the storage capacity and processing power of conventional systems.
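Properties of Big Data
According to IBM:
Volume: the amount of data
Velocity: the speed at which data is generated and must be processed
Variety: the different forms the data takes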

1. Structured data
e.g., tables in an RDBMS
2. Semi-structured data
e.g., log files
3. Unstructured data
e.g., text, audio, video, and images

HDFS (Hadoop Distributed File System)
Name Node
 Master of the system
 Maintains and manages the metadata for the blocks stored on the Data Nodes
Data Node
 Slaves that provide the actual storage
 Serve read and write requests for the blocks they hold
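To make the Name Node / Data Node split concrete, here is a minimal sketch, assuming the 128 MB default block size of Hadoop 2.x and later. `ToyNameNode`, `split_into_blocks`, the node names, and the path are hypothetical illustrations, not HDFS classes: the Name Node only records which blocks make up a file and where they live, while the Data Nodes hold the bytes.

```python
# Default HDFS block size since Hadoop 2.x (dfs.blocksize = 128 MB).
BLOCK_SIZE = 128 * 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the sizes (in bytes) of the blocks a file of `file_size` bytes occupies."""
    full, rest = divmod(file_size, block_size)
    # The last block holds only the remaining bytes; an empty file has no blocks.
    return [block_size] * full + ([rest] if rest else [])


class ToyNameNode:
    """Hypothetical in-memory stand-in for the Name Node's metadata.

    Only the mapping below is kept here; the block bytes themselves
    would live on the Data Nodes.
    """

    def __init__(self, datanodes: list[str]) -> None:
        self.datanodes = list(datanodes)
        self.namespace: dict[str, list[str]] = {}   # file path -> block ids
        self.block_locations: dict[str, str] = {}   # block id -> Data Node
        self._next_block = 0

    def add_file(self, path: str, size: int) -> list[str]:
        block_ids = []
        for _ in split_into_blocks(size):
            block_id = f"blk_{self._next_block}"
            # Round-robin assignment is an assumption; real HDFS uses a placement policy.
            node = self.datanodes[self._next_block % len(self.datanodes)]
            self.block_locations[block_id] = node
            block_ids.append(block_id)
            self._next_block += 1
        self.namespace[path] = block_ids
        return block_ids

    def locate(self, path: str) -> list[tuple[str, str]]:
        """Answer a client's 'where are the blocks of this file?' request."""
        return [(b, self.block_locations[b]) for b in self.namespace[path]]


nn = ToyNameNode(["dn1", "dn2", "dn3"])
nn.add_file("/data/log.txt", 300 * 1024 * 1024)  # 300 MB -> 128 + 128 + 44 MB
print(nn.locate("/data/log.txt"))  # [('blk_0', 'dn1'), ('blk_1', 'dn2'), ('blk_2', 'dn3')]
```

A client reads a file by asking the Name Node for these locations and then streaming each block directly from the listed Data Node.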

Features of HDFS
 Highly fault-tolerant
 High throughput
 Suitable for applications with large data sets
 Write once, read many times
 Can be built from commodity hardware
 Replicates data across different Data Nodes
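Fault tolerance comes from the replication in the last point. The sketch below assumes the HDFS default replication factor of 3 and places each block's replicas on distinct Data Nodes by simple rotation; `place_replicas` and `readable_after_failure` are hypothetical helpers, and real HDFS uses a rack-aware placement policy instead of rotation.

```python
REPLICATION = 3  # HDFS default (dfs.replication)


def place_replicas(block_ids, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct Data Nodes (simple rotation)."""
    if replication > len(datanodes):
        raise ValueError("need at least as many Data Nodes as replicas")
    n = len(datanodes)
    return {
        block: [datanodes[(i + k) % n] for k in range(replication)]
        for i, block in enumerate(block_ids)
    }


def readable_after_failure(placement, failed):
    """True if every block still has at least one replica on a live Data Node."""
    failed = set(failed)
    return all(any(dn not in failed for dn in nodes) for nodes in placement.values())


nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_replicas(["blk_0", "blk_1", "blk_2"], nodes)
print(placement["blk_2"])                                    # ['dn3', 'dn4', 'dn1']
print(readable_after_failure(placement, ["dn1", "dn2"]))         # True
print(readable_after_failure(placement, ["dn1", "dn2", "dn3"]))  # False
```

With three replicas, any two Data Nodes can fail without losing a block; losing three can make some blocks unavailable.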

HDFS is not a good fit for:
 Low-latency data access (quickly accessing small amounts of data)
 Lots of small files
 Multiple writers and arbitrary file modifications
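Small files hurt because the Name Node tracks every file and every block in memory, and each non-empty file needs at least one block. A rough count, assuming the 128 MB default block size and positive file sizes (`namespace_objects` is a hypothetical helper), for the same 10,000 MB of data stored two ways:

```python
import math

BLOCK_SIZE_MB = 128  # default HDFS block size


def namespace_objects(file_sizes_mb: list[int], block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """Count the file and block entries the Name Node must track (sizes assumed > 0)."""
    blocks = sum(math.ceil(size / block_size_mb) for size in file_sizes_mb)
    return len(file_sizes_mb) + blocks


many_small = namespace_objects([1] * 10_000)  # 10,000 files of 1 MB each
one_large = namespace_objects([10_000])       # one file of 10,000 MB
print(many_small, one_large)  # 20000 80
```

The data volume is identical, but the small-file layout needs 250 times as many metadata entries.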

Hive
 Familiar to SQL users
 Originally developed at Facebook
 Queries are written in HiveQL (Hive Query Language)
 Hive translates HiveQL queries into MapReduce jobs that run internally
 Can load thousands of rows at a time
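As a rough illustration of how a query becomes MapReduce work, consider a HiveQL query like SELECT word, COUNT(*) FROM words GROUP BY word over a hypothetical table `words` with one word per row. The pure-Python sketch below mirrors the map, shuffle, and reduce phases conceptually; it is not Hive's actual query plan, which adds optimizations on top of this pattern.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(rows: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every input row."""
    for word in rows:
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle/sort: group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reducer: COUNT(*) per key is the sum of the emitted 1s."""
    return {key: sum(values) for key, values in groups.items()}


def group_by_count(rows):
    return reduce_phase(shuffle_phase(map_phase(rows)))


print(group_by_count(["big", "data", "big", "hadoop"]))
# {'big': 2, 'data': 1, 'hadoop': 1}
```

On a cluster, many mappers run in parallel over different blocks, and the shuffle routes every occurrence of a key to the same reducer.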

Sqoop
Imports data from an RDBMS into HDFS
Exports data from HDFS to an RDBMS
Can store imported data in HBase
Can load imported data into Hive
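Sqoop parallelizes an import by splitting the table on a column (by default the primary key): it queries the column's minimum and maximum, divides that range among the mappers (four by default), and each mapper writes its own part file. The sketch below imitates that idea with an in-memory SQLite table; `compute_splits`, `import_table`, and the `employees` table are hypothetical, and Sqoop's own split arithmetic may produce different boundaries.

```python
import sqlite3


def compute_splits(lo: int, hi: int, num_mappers: int) -> list[tuple[int, int]]:
    """Divide the inclusive range [lo, hi] into at most num_mappers contiguous ranges."""
    size, extra = divmod(hi - lo + 1, num_mappers)
    splits, start = [], lo
    for i in range(num_mappers):
        length = size + (1 if i < extra else 0)
        if length == 0:
            break  # fewer distinct values than mappers
        splits.append((start, start + length - 1))
        start += length
    return splits


def import_table(conn, table, split_col, num_mappers=4):
    """Return one list of comma-delimited lines per split (one 'part file' per mapper).

    Assumes a non-empty table with an integer split column; table and column
    names are trusted identifiers in this sketch.
    """
    lo, hi = conn.execute(f"SELECT MIN({split_col}), MAX({split_col}) FROM {table}").fetchone()
    parts = []
    for start, end in compute_splits(lo, hi, num_mappers):
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE {split_col} BETWEEN ? AND ? ORDER BY {split_col}",
            (start, end),
        ).fetchall()
        parts.append([",".join(str(v) for v in row) for row in rows])
    return parts


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [(i, f"emp{i}") for i in range(1, 11)])
print(compute_splits(1, 10, 4))                  # [(1, 3), (4, 6), (7, 8), (9, 10)]
print(import_table(conn, "employees", "id")[0])  # ['1,emp1', '2,emp2', '3,emp3']
```

Because each range is read by an independent query, the mappers can pull from the database at the same time.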

Pig
 Does not require much knowledge of programming or SQL
 Simplifies work that would otherwise need hand-written MapReduce programs
 Originally developed at Yahoo
 Uses its own scripting language, Pig Latin
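A Pig Latin script describes a dataflow built from operators such as LOAD, FILTER, GROUP, and FOREACH ... GENERATE, which Pig compiles into MapReduce jobs. The Python sketch below only mirrors the logic of such a dataflow step by step; the (user, bytes) records and `pig_style_pipeline` are hypothetical and this is not Pig's API.

```python
from collections import defaultdict

# Hypothetical input: (user, bytes) records, as a LOAD statement might produce.
records = [("alice", 500), ("bob", 2000), ("alice", 3000), ("bob", 4000), ("carol", 100)]


def pig_style_pipeline(records, min_bytes):
    # FILTER records BY bytes > min_bytes
    filtered = [(user, size) for user, size in records if size > min_bytes]
    # GROUP filtered BY user
    grouped = defaultdict(list)
    for user, size in filtered:
        grouped[user].append(size)
    # FOREACH grouped GENERATE group, COUNT(...), SUM(...)
    return sorted((user, len(sizes), sum(sizes)) for user, sizes in grouped.items())


print(pig_style_pipeline(records, 1000))  # [('alice', 1, 3000), ('bob', 2, 6000)]
```

Each named step in a Pig Latin script corresponds to one line of logic here, which is why users can express jobs without writing mapper and reducer classes.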

 Runs as a server
 Coordinates more than one job at a time
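A job coordinator of this kind (in the Hadoop ecosystem, Apache Oozie is one such server) typically treats a workflow as a dependency graph: a job starts only after the jobs it depends on have finished, and independent jobs can run at the same time. A minimal sketch using Python's standard `graphlib`, with hypothetical job names and every job assumed to succeed:

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
workflow = {
    "import_orders": set(),
    "import_customers": set(),
    "clean_orders": {"import_orders"},
    "join": {"clean_orders", "import_customers"},
    "report": {"join"},
}


def run_in_waves(graph):
    """Return batches of jobs; jobs in the same batch have no pending dependencies
    and could run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # pretend every job in this wave succeeded
    return waves


for wave in run_in_waves(workflow):
    print(wave)
# ['import_customers', 'import_orders']
# ['clean_orders']
# ['join']
# ['report']
```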

HBase
 NoSQL database
 Column-oriented format
 Data can be stored and processed in place (random, real-time reads and writes)
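In HBase's data model, a cell is addressed by row key, column (family:qualifier), and timestamp; column families are declared with the table, and rows are kept sorted by row key. The in-memory sketch below models only that addressing scheme; `ToyTable` and the sample rows are hypothetical and are not the HBase client API.

```python
import time
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory model of an HBase-style table:
    row key -> 'family:qualifier' -> {timestamp: value}."""

    def __init__(self, families):
        self.families = set(families)  # column families are fixed up front
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts=None):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        ts = time.time_ns() if ts is None else ts
        self.rows[row][column][ts] = value  # older versions are kept

    def get(self, row, column):
        """Return the newest version of a cell, or None if it does not exist."""
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start, stop):
        """Yield (row, latest cells) for start <= row < stop in sorted row-key order."""
        for row in sorted(self.rows):
            if start <= row < stop:
                yield row, {col: v[max(v)] for col, v in self.rows[row].items()}


t = ToyTable(["info"])
t.put("user1", "info:name", "Ann", ts=1)
t.put("user1", "info:name", "Anna", ts=2)
t.put("user2", "info:city", "Pune", ts=1)
print(t.get("user1", "info:name"))       # Anna
print(list(t.scan("user1", "user2")))    # [('user1', {'info:name': 'Anna'})]
```

Because each row stores only the columns it actually has, sparse data costs nothing for the absent cells.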

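Conclusion
 Hadoop can handle any type of data: structured, semi-structured, and unstructured
 Open source, from Apache
 Fault tolerant
 Provides tools for users with different skills (e.g., Hive for SQL users, Pig for scripting)
 Processes large data sets quickly by working in parallel across many nodes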