HADOOP COURSE CONTENT
(Includes theoretical as well as practical sessions)

Table of Contents (With Real-Time Faculty)

1. Basics of Parallel Programming
   a. Multithreading
   b. OpenMP (Open Multi-Processing) and MPI (Message Passing Interface)
   c. Performance tuning and optimization
      i. Matrix multiplication
      ii. Unique word count problem
2. Distributed Computing Concepts
3. Hadoop Overview
   a. Why Hadoop?
   b. Brief history of Hadoop
   c. Architecture of Hadoop
   d. Overview of HDFS (Hadoop Distributed File System) and the MapReduce (MR) framework
   e. Overview of problems solved with Hadoop
      i. Data Mining
      ii. Web Mining
      iii. Natural Language Processing
      iv. K-means clustering
      v. Sentiment Analysis
4. MapReduce Programming Model
   a. Execution details of the MapReduce framework
   b. Word count problem solved using the MapReduce programming model
   c. Data mining on the Wikipedia data set
5. Hadoop Ecosystem
6. Hadoop Programming Languages
   a. Pig
   b. Hadoop Pipes (C++)
   c. Hadoop Streaming
   d. Hadoop and R
7. Distributed Database Concepts
   a. RDBMS vs. NoSQL databases
   b. Overview of HBase and Cassandra
8. Advanced MapReduce Programming (chaining Mappers and Reducers)
9. Case Studies
   A. Data mining on the Wikipedia data set using
      a. Batch-mode processing (MapReduce)
      b. Hive
      c. HBase and Hive
   B. Web mining using Apache Nutch, Apache Solr and Hadoop
   C. Web log processing using Flume and Hadoop
   D. Complex event processing using Flume, Hadoop and EPL (Event Processing Language)
   E. Integrating Hadoop and RDBMS

Prerequisites:
(1) Hands-on Core Java / C++ / R / Python programming
(2) Hands-on parallel/multithreaded programming
(3) Query language (SQL or EPL) (optional)
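As a rough illustration of the word count problem listed under the MapReduce Programming Model topic, the following minimal sketch simulates the three phases a MapReduce job passes through (map, shuffle/sort, reduce) in plain Python. It does not use any Hadoop API; the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are hypothetical, chosen only for this example. The mapper emits (word, 1) pairs, similar in spirit to a Hadoop Streaming mapper writing tab-separated `word<TAB>1` lines. For simplicity it lowercases words and splits on whitespace only, without stripping punctuation.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(lines):
    """Map: emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def shuffle_phase(pairs):
    """Shuffle/sort: sort pairs by key, then group all values for each key.

    In Hadoop the framework performs this step between the map and reduce
    tasks; here it is simulated with an in-memory sort.
    """
    ordered = sorted(pairs, key=itemgetter(0))
    for word, group in groupby(ordered, key=itemgetter(0)):
        yield word, [count for _, count in group]


def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    for word, counts in grouped:
        yield word, sum(counts)


def word_count(lines):
    """Chain the three phases and return a dict mapping word -> total count."""
    return dict(reduce_phase(shuffle_phase(map_phase(lines))))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    for word, total in sorted(word_count(docs).items()):
        print(word, total)
```

Sorting before grouping matters here: `itertools.groupby` only groups consecutive equal keys, which loosely mirrors how Hadoop hands each reducer its keys in sorted order.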