Over the next 3-5 years this technology will play a major role, because data is growing in an uncontrolled way: data generated by intelligent systems, social networks, and mobile phones has doubled in the last 5 years, and this growth leads to big data.
Big data has 3 characteristics, called the 3 V's:
Velocity of data.
Variety of data.
Volume of data.
The big data problem is solved by Hadoop, which uses the Hadoop Distributed File System (HDFS) to provide huge storage in distributed form, and the MapReduce framework to meet the need for high computing power.
2. Content
Introduction to big data.
Data sources.
What is Hadoop?
Why Hadoop?
How Hadoop works.
MapReduce algorithm.
Problems.
Conclusion.
3. Introduction to big data
Doug Cutting and Mike Cafarella were involved in a project called “Nutch”.
Data which cannot be processed by traditional systems.
Problems faced by many organisations such as Google, IBM, Facebook, etc.
Explosive growth of data – difficult to make sense of.
3 V's – velocity, variety, volume.
4. Data sources
Facebook generates >25 TB of data daily.
Airbus generates >10 TB every 30 minutes.
Smartphones: >5 billion camera phones, many of them GPS-enabled.
Internet users: >2 billion people, and Cisco estimates internet traffic at 8 ZB per year.
E-mail: 300 billion messages sent every day.
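To put the Airbus figure in perspective, here is a quick back-of-the-envelope conversion (assuming, for illustration, that the >10 TB per 30 minutes rate holds continuously):

```python
# Convert the slide's per-30-minute figure to a daily rate.
tb_per_30_min = 10
tb_per_day = tb_per_30_min * 2 * 24  # 48 half-hour periods in a day
print(tb_per_day)  # 480 TB per day
```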
5. What is Hadoop?
Open-source software for storing and processing big data.
Distributed.
Framework.
Massive data storage.
Faster processing.
6. Why Hadoop?
Low cost – HDFS runs on inexpensive commodity hardware.
Computing power.
Scalability.
Storage flexibility.
Inherent data processing and self-healing capabilities.
Handles large data, heavy computation, and unstructured data.
7. How Hadoop works
HDFS – a Java-based distributed file system that can store all kinds of data.
MAPREDUCE – a software programming model for processing large data sets in parallel.
YARN – a resource-management framework for scheduling and handling resource requests from distributed applications.
PIG – a platform for manipulating data stored in HDFS.
HIVE – a data warehouse built on Hadoop.
ZOOKEEPER – an application that coordinates distributed processes.
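To make the MAPREDUCE entry above concrete, here is a minimal word-count sketch in the style of a Hadoop Streaming job (the word-count task, function names, and file paths are illustrative assumptions, not from the slides):

```python
from itertools import groupby

def mapper(lines):
    """Map phase of a word count: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(sorted_records):
    """Reduce phase: records arrive sorted by key, so equal words are adjacent."""
    keyed = (rec.split("\t") for rec in sorted_records)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(n) for _, n in group)}"

# In a real Hadoop Streaming job, the mapper and reducer would each be a small
# script reading stdin and printing to stdout, wired together with something like
#   hadoop jar hadoop-streaming.jar -input in/ -output out/ \
#       -mapper mapper.py -reducer reducer.py
# Hadoop performs the shuffle/sort between the two phases; here we imitate it
# locally with sorted():
records = sorted(mapper(["to be or", "not to be"]))
counts = list(reducer(records))  # ['be\t2', 'not\t1', 'or\t1', 'to\t2']
```

The tab-separated text format matters: Hadoop Streaming treats everything before the first tab as the key when it sorts and groups records between the map and reduce phases.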
8. MapReduce algorithm
Large data set -> split into smaller chunks mapped to different computers -> intermediate results grouped by key -> combined on a single computer (the reducer) -> output.
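The chain of arrows above can be traced with a small, self-contained simulation (the per-year maximum-temperature task and all names here are illustrative assumptions, not from the slides):

```python
from collections import defaultdict

# Input: raw "year,temperature" records, split into chunks as HDFS would
# split a large file ("large data -> smaller chunks").
records = ["1949,111", "1949,78", "1950,0", "1950,22", "1950,-11"]
chunks = [records[:2], records[2:]]

def map_fn(record):
    """Map: parse one record into a (year, temperature) pair."""
    year, temp = record.split(",")
    return year, int(temp)

# Map step: in a real cluster each chunk would run on a different computer.
mapped = [map_fn(r) for chunk in chunks for r in chunk]

# Shuffle step: group intermediate pairs by key (done by the framework).
groups = defaultdict(list)
for year, temp in mapped:
    groups[year].append(temp)

# Reduce step: combine each group into a single output value.
result = {year: max(temps) for year, temps in groups.items()}
print(result)  # {'1949': 111, '1950': 22}
```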
9. Problems
MapReduce – not suitable for iterative and interactive analytic tasks.
MapReduce is file-intensive – it creates many intermediate files.
Talent gap.
Fragmented data security issues.
Lack of tools for data quality and standardization.
10. Conclusion
Select the right projects for Hadoop implementation.
Rethink and adapt existing architecture to Hadoop.
Plan for the availability of skills and resources before starting.
Prepare to deliver trusted data for areas that impact business insight and operations.
Adopt lean and agile integration principles.
Gain an edge over the competition.