2. About Company
Xcelerator is a collaborative learning community
that brings together all stakeholders to create an
experiential and contextual learning platform. It
does this by offering industry-relevant projects and
learning, and a platform for students to engage with
industry experts. The purpose of these project-based
engagements is to ensure that what students learn is
relevant to the nature of the work that goes on in the
industry, and to bring students closer to the real
needs of the industry. The platform offers contextual
learning around these projects, so that content is
available on demand and closely linked to the task
at hand.
3. INTRODUCTION
Data Analytics.
Data Visualization.
Why R?
Overview of the Project
•Scope of the Project.
•Learning Outcomes
•Key Skills
•Hardware
•Software
Tasks I need to do..
•Task 1
•Task 2
Flow chart
Applications
4. Why do we need Data Analytics?
Data Analytics helps organizations harness their data and use it to identify new
opportunities.
•Cost Reduction
•Better Marketing and Product Analysis
•Faster, better decision making
•Organization Analysis
What is Data Visualization?
Visualization gives us visual access to huge amounts of data in an easily digestible form.
Well-designed data graphics are usually the simplest and, at the same time, the most
powerful.
5. DATA VISUALIZATION
R Programming Language.
QlikView.
Python.
Spark.
These are the major tools available in the market to
retrieve and analyze data from social media.
6. Why R?
Programming and Statistical Language
Data Analysis and Visualization
Simple and Easy to Learn
Free and Open Source
7. PROJECT ABSTRACT
The objective of this project is to retrieve data from social media such as Twitter, perform
exploratory analysis and show the results in a visual form. The project involves researching
social media analytics, identifying what data needs to be collected, and then doing the analysis.
SCOPE OF THE PROJECT
•Collecting data from social media.
•Storing the data in Neo4j database, performing exploratory data analysis and showing the
result of the analysis in a visual form.
LEARNING OUTCOMES
•Store data in Neo4j graph database
•Write Cypher queries to retrieve data from the Neo4j database
•Develop R functions that perform exploratory analysis of unstructured data stored in the Neo4j database
•Use R visualization packages to show the analysis in a visual form
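As an illustration of the Cypher outcome above, here is a minimal sketch in R that builds a query counting tweets per hashtag. The graph schema (Tweet and Hashtag nodes linked by a HAS_TAG relationship), the property name and the function name are assumptions made for this example, not part of the project brief.

```r
# Build a Cypher query that returns the most used hashtags.
# Assumed (hypothetical) schema: (:Tweet)-[:HAS_TAG]->(:Hashtag {name})
build_top_hashtags_query <- function(limit = 10L) {
  limit <- as.integer(limit)
  stopifnot(!is.na(limit), limit > 0L)
  sprintf(
    paste(
      "MATCH (t:Tweet)-[:HAS_TAG]->(h:Hashtag)",
      "RETURN h.name AS hashtag, count(t) AS tweets",
      "ORDER BY tweets DESC",
      "LIMIT %d",
      sep = "\n"
    ),
    limit
  )
}

query <- build_top_hashtags_query(5)
cat(query, "\n")

# Running the query needs a live Neo4j server and the RNeo4j package
# (the URL and credentials are placeholders):
# graph  <- RNeo4j::startGraph("http://localhost:7474/db/data/",
#                              username = "neo4j", password = "<password>")
# result <- RNeo4j::cypher(graph, query)  # returns a data frame
```

Keeping the query text in a small function makes it easy to test without a database and to reuse from the analysis step.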
8. KEY SKILLS
•R
•Data analysis
•Data Visualization
•Neo4j
HARDWARE
•Operating System - Windows/Linux (Ubuntu)
•Minimum 4 GB RAM, 500 GB Hard disk
SOFTWARE
•R (https://cran.r-project.org/)
•RStudio (https://www.rstudio.com/products/rstudio/download/)
•Neo4j (https://neo4j.com/)
9. TASKS I NEED TO DO...
TASK 1: Environment Setup
Setting up the environment for project implementation.
TASK 2: Data Acquisition from Twitter
The purpose of this task is to collect Twitter data for storage in the Neo4j graph database.
TASK 3: Data Storage
In this task, the collected data needs to be stored in the Neo4j database.
TASK 4: Data Analysis and Visualization
Perform exploratory analysis of the data and show the analysis in a visual form.
10. TASK 1
•Go through the resources provided to install and familiarize yourself with R and Neo4j
•Install the latest version of R and RStudio
•R (https://cran.r-project.org/)
•RStudio (https://www.rstudio.com/products/rstudio/download/)
•Install Neo4j (https://neo4j.com/)
•Install the Neo4j driver for R (the R package "RNeo4j")
Output/Results:
•R, RStudio and Neo4j installed on the machine.
•New R project created.
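A minimal sketch of how the package check for this task might look in R. Only RNeo4j is named in this task; igraph appears later in Task 4, and twitteR is an assumed choice for the Twitter step. The helper name missing_packages is hypothetical. RNeo4j may no longer be available from CRAN, in which case it would have to be installed from its source repository instead.

```r
# Return the packages from `pkgs` that are not installed yet.
# Note: requireNamespace() loads each package it finds as a side effect.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# RNeo4j is named in Task 1 and igraph in Task 4; twitteR is an assumption.
required   <- c("RNeo4j", "twitteR", "igraph")
to_install <- missing_packages(required)
print(to_install)

# Uncomment to install whatever is missing from CRAN:
# if (length(to_install) > 0) install.packages(to_install)
```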
11. TASK 2
•Create a new Twitter app using the link provided. Go through the various
settings provided in the app. Some of these details will need to be
added to your R script.
•Write an R function to retrieve Twitter data, for example the text of tweets
for a specific hashtag, or profile details such as followers. To
complete this task you will have to research and decide what data you
want to collect and store in the Neo4j graph database.
•Install and include the necessary R packages to retrieve the tweets.
Output/Results:
•R function that can be used to retrieve Twitter data.
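The sketch below shows one way the retrieval step could be organized. The commented lines assume the twitteR package and the four credentials from your Twitter app (the variable names are placeholders); access rules for the Twitter API have changed over time, so treat that part as illustrative only. The runnable part uses a small hand-made data frame as a stand-in for real tweets and extracts the hashtags from the text with base R. The function name extract_hashtags is hypothetical.

```r
# Assumed retrieval step (not run here): needs the twitteR package, network
# access and the keys from your Twitter app.
# twitteR::setup_twitter_oauth(consumer_key, consumer_secret,
#                              access_token, access_secret)
# raw    <- twitteR::searchTwitter("#rstats", n = 100)
# tweets <- twitteR::twListToDF(raw)   # one row per tweet, with a `text` column

# Extract the distinct, lower-cased hashtags from each tweet text.
# Returns a list with one character vector per input string.
extract_hashtags <- function(text) {
  matches <- regmatches(text, gregexpr("#[[:alnum:]_]+", text))
  lapply(matches, function(m) unique(tolower(m)))
}

# Stand-in data so the example runs without Twitter access.
tweets <- data.frame(
  id   = c("1", "2", "3"),
  text = c("Learning #rstats and #Neo4j today",
           "Graphs are fun #neo4j",
           "No tags in this one"),
  stringsAsFactors = FALSE
)
tweets$hashtags <- extract_hashtags(tweets$text)
print(tweets$hashtags)
```

Lower-casing the tags means "#Neo4j" and "#neo4j" end up as the same hashtag when the data is stored later.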
12. TASK 3
•Build a graph database.
•Create an R function to build the database.
•Explore the functions in the RNeo4j package
and use them to add nodes, relationships,
constraints and indexes.
•Store the collected hashtags or tweets.
Output/Results:
•Neo4j graph database created and populated with data.
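A possible shape for the storage step, sketched under assumptions: the Tweet and Hashtag labels, the HAS_TAG relationship and both function names are hypothetical, and store_edges is only defined here, not called, because it needs the RNeo4j package and a running Neo4j server. The first function is plain base R and turns tweets and their hashtags into one row per relationship.

```r
# Flatten tweet ids and their hashtag lists into one row per (tweet, hashtag).
tweet_tag_edges <- function(ids, hashtags) {
  data.frame(
    tweet_id = rep(ids, lengths(hashtags)),
    hashtag  = as.character(unlist(hashtags)),
    stringsAsFactors = FALSE
  )
}

# Write the edges to Neo4j (assumed schema; intended for an empty database).
# `graph` is a connection object created with RNeo4j::startGraph().
store_edges <- function(graph, edges) {
  # getOrCreateNode() relies on a uniqueness constraint for the label and key.
  RNeo4j::addConstraint(graph, "Tweet", "id")
  RNeo4j::addConstraint(graph, "Hashtag", "name")
  for (i in seq_len(nrow(edges))) {
    tweet <- RNeo4j::getOrCreateNode(graph, "Tweet", id = edges$tweet_id[i])
    tag   <- RNeo4j::getOrCreateNode(graph, "Hashtag", name = edges$hashtag[i])
    RNeo4j::createRel(tweet, "HAS_TAG", tag)
  }
  invisible(nrow(edges))
}

edges <- tweet_tag_edges(
  ids      = c("1", "2", "3"),
  hashtags = list(c("#rstats", "#neo4j"), "#neo4j", character(0))
)
print(edges)
```

Tweets without hashtags produce no rows, so they create no relationships in the graph.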
13. TASK 4
Network analysis is used to investigate and visualize the
inter-relationships between entities (individuals, things).
Examples of network structures include social media
networks, friendship networks, collaboration networks and
disease transmission.
Network and graph theory are extensively used across different
fields, such as biology (pathway analysis and protein-protein
interaction visualization), finance, social sciences, economics,
communication, history, computer science, etc.
In this task, you'll learn:
•the basic terms of network analysis and visualization.
•how to create static networks using igraph (R base plot) and
related R packages.
•how to create arc diagram, treemap and dendrogram layouts.
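A minimal sketch of a static network plot with igraph, using a small made-up edge list of tweets and hashtags in place of data read from Neo4j. The data and column names are assumptions for illustration; the plot is skipped if igraph is not installed.

```r
# Made-up tweet-to-hashtag edges standing in for data read from Neo4j.
edges <- data.frame(
  tweet   = c("tweet1", "tweet1", "tweet2", "tweet3"),
  hashtag = c("#rstats", "#neo4j", "#neo4j", "#neo4j"),
  stringsAsFactors = FALSE
)

# Simple exploratory summary: how often each hashtag is used.
tag_counts <- sort(table(edges$hashtag), decreasing = TRUE)
print(tag_counts)

# Static network plot: the first two columns are read as edge endpoints.
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(edges, directed = FALSE)
  plot(
    g,
    vertex.size      = 10 + 5 * igraph::degree(g),  # bigger node = more links
    vertex.label.cex = 0.8,
    main             = "Tweets and hashtags"
  )
}
```

Sizing nodes by degree makes the most used hashtags stand out at a glance.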
14. PROJECT SCHEDULE
TASK 1: 12-06-19 TO 14-06-19
TASK 2: 17-06-19 TO 21-06-19
TASK 3: 24-06-19 TO --------
TASK 4: 21-06-19 TO 23-06-19
15. Flow chart: Data acquisition from social network
[Flow chart. Main stages: Project Overview, Input Data, Prerequisites, Tasks, End. Other labels: Solution Process Overview, Learning Outcome, Student Deliverables, Key Skills, Hardware, Software, Task 1, Task 2, Task 3, Task 4.]
16. Applications of Data Acquisition from Social Network
•Social media is a very large data source: both the volume of data and the velocity
at which it is generated are huge.
•Analysis of this data with RStudio and Neo4j can be useful in many ways. For example, we can create
highly customized advertisements for users.
•We can perform real-time analysis of data collected from social media.
•We can perform exploratory analysis of the data and show the analysis in a visual
form.
17. Mission
The mission is to analyze massive
and complex data in order to extract
useful information.
Vision
The vision of the project is to
facilitate the highest quality data
science education, research and
industrial collaboration.