Apache Hadoop, Big Data & MapReduce

WHY BIG DATA:
“More data usually beats better algorithms.”

GOOD NEWS:
“Big data is here.”

BAD NEWS:
We are struggling to store and analyze it.

KEY PROBLEM:
“Storage increased, not speed.” Storage capacity has grown much faster than the speed at which the data can be read back.

SOLUTION:
Parallelism: spread the data over many disks and read from them at the same time.

Implementing parallelism, however, raises some noteworthy problems:
- Hardware failure
- Combining data

Hadoop overcomes these problems by using:
- HDFS (Hadoop Distributed File System)
- MapReduce (use of keys and values)
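To make the "keys and values" idea concrete, here is a minimal word-count sketch in plain Python. It does not use Hadoop or any Hadoop API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical and chosen only for illustration, and everything runs in one process, whereas a real job would spread the map and reduce work across many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (key, value) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine every value seen for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over an iterable of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Illustrative input only; each string stands in for one line of a large file.
    records = ["big data is here", "more data beats better algorithms"]
    print(run_job(records))
```

Because each record is mapped independently and each key is reduced independently, the work can be split across machines, and the shuffle step is what addresses the "combining data" problem.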
In a nutshell, Hadoop provides:
- A reliable shared storage (by HDFS)
- A reliable analysis system (by MapReduce)

MAPREDUCE:
- The entire dataset, or a good portion of it, is processed for each query.
- MapReduce is a batch query processor.
- It is already used by Mailtrust, Rackspace’s mail division, for handling big data.

MAPREDUCE VS RDBMS:

CONCLUSION:
This overview is not a thorough treatment of the topic. Further research will clarify and sharpen it, and more information will be added in the coming days.