2. Big data Analytics
Introduction
Unstructured data comprises many different types of data.
Unstructured data is a generic label for describing data
that is not contained in a database or some other type of
data structure.
Unstructured data takes almost any form and is present
everywhere.
More than 90% of social media data is unstructured.
Importance of Unstructured Data
Every minute, there are more than 6,000 pictures shared
on social media sites and more than 200 million emails
sent.
Analyzing social content such as tweets, Facebook posts,
and transcripts of support calls gives a clear view of
how customers perceive value and which issues they face.
Unstructured data isn't well organized or easy to access,
but companies that analyze this data and integrate it into
their information management landscape can significantly
improve employee productivity.
Examples of Unstructured Data
E-mail messages, word processing documents, videos,
photos, audio files, presentations, and web pages.
Other examples of unstructured data may include books,
journals, health records, metadata, analog data, and
images.
Influence of Unstructured data on Social media
Social media needs to be part of the business
strategy as a channel for interacting with clients
and customers.
Relevant statistics include the number of Twitter
followers, Facebook likes, LinkedIn connections, and
blog subscribers.
Social media platforms such as Facebook are growing
enormously, and so is the massive amount of
unstructured data they collect.
Twitter sees about 175 million tweets each day and
has more than 465 million accounts.
Technologies
Data mining
Pattern Recognition
Operations Research
Social Network Analytics (Facebook, Twitter, LinkedIn)
Natural Language Processing
RapidMiner
RapidMiner provides an integrated environment for
machine learning, data mining, text mining, and
predictive analytics.
It is a powerful tool with an easy-to-use, intuitive
graphical interface for designing analytic processes.
It is written in Java.
It runs on all major platforms and operating systems.
It saves time by identifying possible errors and
suggesting quick fixes.
It supports .csv, Excel, and binary files.
Weka
Weka is a collection of machine learning algorithms.
It contains tools for data pre-processing,
classification, regression, clustering, association
rules, and visualization.
It is written in Java and runs on almost any
platform.
It offers a large collection of different data mining
algorithms.
Python
Python can be connected with R by installing the
“Rserve” package.
High-level language that is easy to read and interpret.
Free and open source, runs on all platforms.
R language
R is a very effective statistical tool and well worth the
effort to learn.
R is polymorphic, which means that the same function
can be applied to different types of objects.
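As a rough illustration of this idea, the sketch below shows one function name behaving differently depending on the type of its argument, much as R generics such as summary() or print() do. It uses Python's functools.singledispatch as an analogue, not R's own dispatch mechanism, and the function name describe is hypothetical.

```python
from functools import singledispatch


@singledispatch
def describe(obj):
    """Fallback used when no type-specific version is registered."""
    return f"object of type {type(obj).__name__}"


@describe.register
def _(obj: list):
    # Version chosen automatically when the argument is a list.
    return f"list with {len(obj)} elements"


@describe.register
def _(obj: str):
    # Version chosen automatically when the argument is a string.
    return f"text with {len(obj)} characters"


@describe.register
def _(obj: dict):
    # Version chosen automatically when the argument is a dict of columns.
    return f"table with columns {sorted(obj)}"
```

Calling describe([1, 2, 3]), describe("tweet"), and describe(3.5) goes through the same name each time, but a different implementation is selected for each argument type.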
R has more than 4,000 packages in various specializations,
available from multiple repositories such as CRAN
(Comprehensive R Archive Network).
R can import data from CSV, Excel, and SAS files and
produce output in PDF, JPG, and PNG formats as well as
tables.
R language
Working with RStudio: loading packages and extracting
the tweets.
Unstructured data Analysis for Motor Insurance
Extracting data related to the motor insurance sector
from social media.
Searching by company names and keywords.
Getting the tweets from Twitter and analyzing the data.
Sentiment analysis.
User interface.
What type of insurance can be given or any fraud detection?
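The filtering and sentiment steps listed above could be sketched roughly as follows. This is a minimal word-counting approach under stated assumptions, not the method used in the deck: the keyword list, the positive and negative word lists, and the sample tweets are all made up for illustration, and a real analysis would use a curated lexicon or a trained model.

```python
import re

# Hypothetical keywords and word lists, chosen only for illustration.
KEYWORDS = {"motor insurance", "car insurance", "claim", "premium"}
POSITIVE = {"good", "great", "fast", "helpful", "happy"}
NEGATIVE = {"bad", "slow", "fraud", "rejected", "unhappy"}

# Made-up tweets, not real data.
SAMPLE_TWEETS = [
    "Great service, my car insurance claim was fast",
    "My claim got rejected, bad support",
    "Lovely weather today",
]


def is_relevant(tweet):
    """Keep a tweet if it mentions any keyword (simple substring match)."""
    text = tweet.lower()
    return any(keyword in text for keyword in KEYWORDS)


def sentiment_score(tweet):
    """Count positive words minus negative words."""
    words = re.findall(r"[a-z']+", tweet.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(score):
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyze(tweets):
    """Filter tweets by keyword, then attach a sentiment label to each."""
    return [(t, label(sentiment_score(t))) for t in tweets if is_relevant(t)]
```

Running analyze(SAMPLE_TWEETS) keeps only the two insurance-related tweets and labels each one. Substring matching is deliberately crude here (for example, "claim" also matches "claimed"), so the keyword step would need tightening in practice.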
Extracting data from Twitter using R
A Twitter app needs to be created to obtain the following credentials:
api_key
api_token
access_token
access_secret
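One way to handle these four values is to keep them out of the script and read them from the environment before any request is made. The sketch below covers only that step and does not call Twitter. The field names are taken from the list above; the TWITTER_ prefix on the environment variables and the function name load_credentials are assumptions made for illustration.

```python
import os

# Field names follow the list above; the environment-variable
# naming scheme (TWITTER_<FIELD>) is an assumption.
CREDENTIAL_FIELDS = ("api_key", "api_token", "access_token", "access_secret")


def load_credentials(env=None):
    """Collect the four credentials, failing early if any is missing."""
    env = os.environ if env is None else env
    creds = {}
    missing = []
    for field in CREDENTIAL_FIELDS:
        value = env.get("TWITTER_" + field.upper())
        if value:
            creds[field] = value
        else:
            missing.append(field)
    if missing:
        raise ValueError("missing Twitter credentials: " + ", ".join(missing))
    return creds
```

Failing early with the names of the missing fields tends to be easier to debug than an authentication error returned later by the API.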
Comparison between data mining tools
Characteristic | R | RapidMiner | Weka
Purpose | Statistics, clustering, and analytics | Data mining, classification | Data mining, association
Data Import | .xlsx, .csv, RODBC, .txt | .csv, .xlsx, binary files | .csv, .arff
Specialization | Has a large number of users in the fields of bioinformatics and social science | Specialized for business solutions that include predictive analysis and statistical computing | Best suited for mining association rules and machine learning techniques
Advantages | Purely statistical | Visualization, parameter optimization | Ease of use and machine learning