3. BIG DATA!!!
It describes the large volume of structured and
unstructured data.
Big data is high-volume, high-velocity and high-variety
information assets that demand cost-efficient,
innovative forms of information processing for
enhanced insight and decision making.
4. BIG DATA IS ABOUT....
1. Storing and accessing large amounts of data
2. Processing high-volume data streams
3. Converting unstructured data to structured data
4. Making sense of the data
5. Predictive technologies
5. WHY IS BIG DATA IMPORTANT?
Organizations can take data from any source and
analyse it to find answers that enable:
*Cost reduction
*Time reduction
*New product development
*Optimized offerings
*Smart decision making
7. VOLUME
Organizations collect large amounts of data from a
variety of sources, including
business transactions, social media and
machine-to-machine data.
An estimated 90% of all data ever created was created
in the past two years.
8. VELOCITY
The speed at which data is created, stored,
analysed and visualized.
In the past, processing and updating data
took a substantial amount of time.
In the big data era, data is created in real time,
and with the availability of the internet, machines
can pass on data the moment it is created.
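As a sketch of what processing data "the moment it is created" can look like, here is a toy stream pipeline in Python. The sensor readings and the window size are invented for illustration; real streams would come from a message queue or socket rather than a list:

```python
import time
from collections import deque

def sensor_stream(readings, delay=0.0):
    """Simulate a real-time stream: yield one reading at a time."""
    for reading in readings:
        time.sleep(delay)  # in a real system the delay comes from the source
        yield reading

def rolling_average(stream, window=3):
    """Process each reading as it arrives, keeping only a small window."""
    recent = deque(maxlen=window)
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)

# Each average is produced as soon as its reading arrives,
# without waiting for the whole dataset.
averages = list(rolling_average(sensor_stream([10, 12, 11, 15, 14])))
```

Because both stages are generators, nothing is buffered beyond the window: this is the velocity idea of analysing data in flight instead of after a batch load.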
9. VARIETY
In the past, most of the data that was created was
structured data.
Now most of the data stored is unstructured
data.
The data can be structured, unstructured,
semi-structured or complex structured data.
This wide variety of data requires different types of
analysis and techniques to store all the raw data.
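To make the variety point concrete, here is a small Python sketch that coerces structured (a dict), semi-structured (a JSON string) and unstructured (free text) inputs into one common shape. The `{'name', 'age'}` schema and the text pattern are invented for illustration:

```python
import json
import re

def normalize(record):
    """Turn structured, semi-structured and unstructured inputs
    into one common {'name': ..., 'age': ...} record."""
    if isinstance(record, dict):                 # structured
        return {"name": record["name"], "age": int(record["age"])}
    try:                                         # semi-structured (JSON text)
        return normalize(json.loads(record))
    except (json.JSONDecodeError, TypeError):
        pass
    m = re.search(r"(\w+) is (\d+) years old", record)  # unstructured text
    if m:
        return {"name": m.group(1), "age": int(m.group(2))}
    return None

rows = [normalize(r) for r in (
    {"name": "Ada", "age": 36},
    '{"name": "Alan", "age": 41}',
    "Grace is 37 years old",
)]
```

Each input format needs its own handling code, which is exactly why variety demands different analysis and storage techniques.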
10. VERACITY
Making sure that the data is accurate.
It requires processes that keep bad data from
accumulating in the system.
Incorrect data can cause problems for the
organization as well as for the consumer.
So the organization must ensure that the data stored
is correct and that the analysis performed on it is correct.
By this the organization can become information-
centric.
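A minimal sketch of "keeping bad data from accumulating" is a validation gate at ingestion time. The rules below (a plausible age range, an `@` in the email) are hypothetical stand-ins for a real organization's data-quality checks:

```python
def is_valid(record):
    """Hypothetical veracity rules: age must be 0-130,
    email must at least contain an '@'."""
    return (
        isinstance(record.get("age"), int)
        and 0 <= record["age"] <= 130
        and "@" in record.get("email", "")
    )

def ingest(records):
    """Split incoming records into clean data and a reject pile for review."""
    clean, rejected = [], []
    for r in records:
        (clean if is_valid(r) else rejected).append(r)
    return clean, rejected

clean, rejected = ingest([
    {"age": 29, "email": "a@example.com"},
    {"age": 999, "email": "b@example.com"},   # impossible age
    {"age": 41, "email": "not-an-email"},     # malformed email
])
```

Keeping a reject pile, rather than silently dropping records, lets the organization audit where bad data comes from.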
11. VARIABILITY
Big data is extremely variable.
Variability is the rapid change in the meaning of the
content of the data.
So to perform a proper analysis, algorithms need
to be able to understand the context and be
able to decipher the exact meaning of a word in
that context.
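As a toy illustration of deciphering a word's meaning from context, the sketch below scores each candidate sense of an ambiguous word by counting context clues. The senses and clue words are entirely invented; real disambiguation uses statistical or neural models:

```python
# Invented sense inventory for one ambiguous word.
SENSES = {
    "apple": {
        "fruit":   {"eat", "pie", "tree", "juice"},
        "company": {"iphone", "stock", "mac", "store"},
    },
}

def sense_of(word, sentence):
    """Pick the sense whose clue words overlap the sentence the most."""
    tokens = set(sentence.lower().split())
    scores = {
        sense: len(clues & tokens)
        for sense, clues in SENSES[word].items()
    }
    return max(scores, key=scores.get)

s1 = sense_of("apple", "I ate an apple pie after lunch")
s2 = sense_of("apple", "Apple stock rose after the iphone launch")
```

The same word resolves to different meanings depending on the surrounding tokens, which is the variability problem in miniature.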
12. VISUALIZATION
Making the vast amount of data
comprehensible in a manner that is easy to
understand and read.
With the right analysis and visualization, raw data
can be put to use; without them, it remains
essentially useless.
Visualization can mean complex graphs that
include many variables of data yet remain
understandable and readable.
13. VALUE
The ultimate objective of any big data project should
be to generate some sort of value for the company
doing the analysis.
The value is in the analysis done on the data and in
how the data is turned into information.
The value is in how the organization uses the data
and turns into an information-centric company that
relies on insights derived from data analysis for its
decision-making.
15. Data storage is the process of storing and managing
data.
A good data storage provider offers an
infrastructure on which to run all the
other analytics tools, as well as a place to
store and query the data.
Some of the storage tools are:
Hadoop
Cloudera
MongoDB
Talend
16. Data cleaning is the process of refining and reshaping
data into a usable data set.
This is done before data mining
The tools used are:
OpenRefine
DataCleaner
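The kind of refining these tools automate can be sketched in a few lines of Python: trim whitespace, normalize case, and drop empty or duplicate rows before mining. The field names are hypothetical:

```python
def clean(rows):
    """Sketch of the refine/reshape step: trim whitespace, normalize
    case, and drop empty values and exact duplicates before mining."""
    seen, out = set(), []
    for row in rows:
        name = row.get("name", "").strip().title()
        city = row.get("city", "").strip().title()
        if not name:
            continue                  # drop unusable rows
        key = (name, city)
        if key in seen:
            continue                  # drop exact duplicates
        seen.add(key)
        out.append({"name": name, "city": city})
    return out

tidy = clean([
    {"name": "  alice ", "city": "paris"},
    {"name": "ALICE", "city": " Paris "},   # same person, messier formatting
    {"name": "", "city": "Oslo"},           # no name, unusable
])
```

Tools like OpenRefine apply the same idea interactively and at scale, with clustering to catch near-duplicates that exact matching misses.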
17. Data mining is the process of discovering insights
within a database.
The aim of data mining is to make predictions and
decisions based on the data that we have.
The tools used are:
RapidMiner
IBM SPSS Modeler
Oracle data mining
Teradata
FramedData
Kaggle
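As a minimal sketch of "making predictions from the data we have", here is a 1-nearest-neighbour classifier in plain Python. The (height, weight) → size data is made up; the tools listed above implement far richer versions of this idea:

```python
# Hypothetical training data: (height_cm, weight_kg) -> shirt size.
TRAINING = [
    ((160, 55), "S"),
    ((175, 70), "M"),
    ((190, 95), "L"),
]

def predict(point):
    """Predict the label of the closest known example (1-nearest-neighbour)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(TRAINING, key=lambda pair: dist(pair[0], point))[1]

label = predict((172, 68))
```

The prediction comes entirely from patterns already in the stored data, which is the essence of mining for decisions.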
18. Data analysis is about breaking down the
data being mined and assessing the impact
of those patterns over time.
Analytics is about asking specific questions
and finding the answers in the data.
The tools used are:
Qubole
BigML
Statwing
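"Asking a specific question and finding the answer in the data" can be sketched as a simple aggregation. The sales figures below are invented; hosted tools like the ones above run the same kind of query over much larger datasets:

```python
from collections import defaultdict

# The question: which region had the highest total sales?
SALES = [
    {"region": "north", "amount": 120},
    {"region": "south", "amount": 340},
    {"region": "north", "amount": 200},
    {"region": "south", "amount": 90},
]

totals = defaultdict(int)
for sale in SALES:
    totals[sale["region"]] += sale["amount"]

best_region = max(totals, key=totals.get)  # the answer, straight from the data
```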
19. Visualizations are a clear and easy way to convey
complex data insights.
The tools used are:
Tableau
Silk
CartoDB
Chartio
Plot.ly
Datawrapper
20. Data integration platforms are the glue between
each program.
They connect the data between source and
destination.
The tools used are:
Blockspring
Pentaho
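At its core, that glue is an extract-transform-load step. The sketch below moves records from a "source" to a "destination" store, renaming fields to match the destination schema; both stores are in-memory stand-ins and the field names are hypothetical:

```python
# In-memory stand-ins for a real source system and destination system.
source = [{"user_id": 1, "fullName": "Ada Lovelace"},
          {"user_id": 2, "fullName": "Alan Turing"}]
destination = []

def integrate(src, dst):
    for record in src:                   # extract from the source
        transformed = {                  # transform to the destination schema
            "id": record["user_id"],
            "name": record["fullName"],
        }
        dst.append(transformed)          # load into the destination

integrate(source, destination)
```

Platforms like Pentaho package this extract/transform/load loop with connectors for real databases, APIs and files.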
21. Some of the languages used for data
manipulation are:
R
Python
RegEx
XPath
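Regular expressions, one of the languages listed above, are often used from inside Python for quick data manipulation. This sketch pulls dates out of messy log lines and reformats them; the log lines are invented:

```python
import re

lines = [
    "order 17 shipped on 2024-03-05",
    "order 18 shipped on 2024-11-21",
]

# Extract ISO dates, then rewrite them as day/month/year.
dates = [re.search(r"\d{4}-\d{2}-\d{2}", line).group() for line in lines]
reformatted = [re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", d)
               for d in dates]
```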
22. The data has to be collected before storing, analysing
or visualizing it.
Data extraction is the process of taking something
unstructured, like a webpage, and turning it into a
structured table.
Once the data is in a structured format, manipulation
can be done.
One widely used tool for data extraction is import.io.
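The webpage-to-table idea can be sketched with only the Python standard library. The parser below collects table cells from an HTML snippet into structured rows; tools like import.io automate the same extraction at scale, without hand-written parsers:

```python
from html.parser import HTMLParser

class CellCollector(HTMLParser):
    """Collect <td> cell text from each <tr> into a row of a table."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())

parser = CellCollector()
parser.feed("<table><tr><td>Ada</td><td>36</td></tr>"
            "<tr><td>Alan</td><td>41</td></tr></table>")
table = parser.rows  # unstructured markup is now structured rows
```

Once the data is in this row form, all the cleaning, mining and analysis steps from the earlier slides can be applied to it.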