Big Data
1. Harvesting Twitter Data Using Big Data Techniques
Sridhar Mamella
MSc – Big Data and Business Intelligence | 2014–2015
Introduction
This project aims to develop the analytical skills needed to
study big data and to provide a solid foundation for building
the solutions and applications required to manipulate it.
Results
Word cloud depicting the most frequently used words in English-language tweets
Heat maps illustrating the regions with the most tweeting activity
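Below is a minimal Python sketch of the aggregation behind a regional heat map. It assumes each tweet has already been reduced to a (latitude, longitude) pair. The poster does not say how locations were extracted or which tool drew the maps. The helper `bin_tweets` and its default 1-degree cell size are hypothetical choices. The sketch snaps each coordinate to a grid cell and counts tweets per cell, and those counts are what a heat map would colour.

```python
import math
from collections import Counter


def bin_tweets(coords, cell_deg=1.0):
    """Hypothetical helper: count tweets per grid cell.

    Assumes `coords` is an iterable of (lat, lon) pairs already
    extracted from geotagged tweets.
    """
    counts = Counter()
    for lat, lon in coords:
        # floor() keeps negative coordinates in the correct cell
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


sample = [(51.5, -0.12), (51.9, -0.8), (53.4, -2.2), (51.2, -0.5)]
counts = bin_tweets(sample)
print(counts.most_common(1))  # [((51, -1), 3)]
```

A smaller `cell_deg` gives a finer-grained map at the cost of sparser cells.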
Conclusion
• Evaluated the common statistical analysis and machine
learning techniques used to manipulate data
• Utilised current big data technologies
• Selected and employed an appropriate tool
Technologies
• R
• Hadoop
• Hive
• Pig
The Hadoop project contains four modules: Hadoop Common,
the Hadoop Distributed File System (HDFS), Hadoop YARN
and Hadoop MapReduce.
Together these modules provide the shared utilities that other
Hadoop projects build on, distributed storage for large data
sets and parallel batch processing of those data sets, with
YARN scheduling jobs and managing cluster resources.
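Hadoop MapReduce runs a job as three stages. A map phase emits key/value pairs, a shuffle groups those pairs by key, and a reduce phase aggregates each group. The single-process Python sketch below only simulates that data flow for counting tweet words. It does not use any Hadoop API, and the function names are hypothetical. On a real cluster, similar mapper and reducer logic could be run through Hadoop Streaming, which exchanges records over standard input and output.

```python
from collections import defaultdict
from itertools import chain


def mapper(tweet):
    # Map: emit (word, 1) for every lower-cased token
    for word in tweet.lower().split():
        yield word, 1


def shuffle(pairs):
    # Shuffle: group values by key, as Hadoop does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    # Reduce: sum the partial counts for one word
    return key, sum(values)


def word_count(tweets):
    mapped = chain.from_iterable(mapper(t) for t in tweets)
    grouped = shuffle(mapped)
    return dict(reducer(k, v) for k, v in sorted(grouped.items()))


tweets = ["Big data is big", "Hadoop handles big data"]
result = word_count(tweets)
print(result)  # {'big': 3, 'data': 2, 'hadoop': 1, 'handles': 1, 'is': 1}
```

Word frequencies of this kind are the input a word cloud is drawn from.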
A variety of other projects complement the Hadoop modules
by providing specialized services:
• Apache Hive
• Apache Spark
• Apache Ambari
• Apache Pig
• Apache HBase
Future Work
• Work with more complex datasets
• Use Apache Sqoop
• Implement RHadoop
Hadoop Ecosystem
Analytics
• Working with R
• Building Matrices
• K-means algorithm
• Creating Word Clouds
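The poster lists R for the analytics steps. As a language-neutral illustration, the Python sketch below builds a small term-document matrix from tweets and clusters the documents with a basic k-means (Lloyd's algorithm). It is not the poster's R code. The function names are hypothetical, only the standard library is used, and the first k documents serve as initial centroids so that the result is reproducible.

```python
import math


def term_document_matrix(docs):
    """Rows are documents, columns are vocabulary terms (raw counts)."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    matrix = [[toks.count(term) for term in vocab] for toks in tokenized]
    return vocab, matrix


def kmeans(points, k, iterations=10):
    """Lloyd's algorithm; the first k points are the initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


docs = [
    "hadoop hive pig",
    "word cloud r",
    "hive pig hadoop hdfs",
    "r word cloud kmeans",
]
vocab, matrix = term_document_matrix(docs)
labels, centroids = kmeans(matrix, k=2)
print(labels)  # [0, 1, 0, 1]
```

Taking the first k points as initial centroids keeps the example deterministic. Practical k-means implementations usually use random or k-means++ initialisation with several restarts.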
Hadoop
• HDFS
• Hive – HQL
• Pig – Pig Latin