2. Overview
Introduction to Big Data
Characteristics of Big Data
Challenges in Big Data
Big Data Trends
Data Scientists and their roles
7/12/20172
3. Introduction to Big Data
Information that can’t be processed or analyzed using
traditional processes or tools.
Data sets that are very large and complex
Data that is difficult to capture, store, process,
search, and analyze
‘Big data’ is similar to ‘small data’, but bigger.
4. Characteristics of Big Data
Volume
• Data size
• Data is generated by machines, networks and human
interaction on systems like social media
• 2.5 exabytes of data are produced every day, which is equivalent
to 90 years of HD video
Velocity
• The pace at which data flows from sources such as business
processes, machines, networks, and human interaction with social
media or mobile devices
• The data flow is massive and continuous
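To relate volume to velocity, the daily figure quoted above can be converted into an average sustained rate. This is a rough sketch only: it assumes a decimal (SI) exabyte of 10^18 bytes and a perfectly even flow over 24 hours, neither of which the slide states, and the function name is hypothetical.

```python
# Convert a daily data volume into an average sustained rate.
# Assumption: 1 exabyte = 10**18 bytes (decimal/SI definition).
BYTES_PER_EXABYTE = 10**18
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_bytes_per_second(exabytes_per_day: float) -> float:
    """Average bytes per second if the daily volume arrived evenly."""
    return exabytes_per_day * BYTES_PER_EXABYTE / SECONDS_PER_DAY


rate = average_rate_bytes_per_second(2.5)
print(f"{rate / 10**12:.1f} TB/s")  # prints "28.9 TB/s"
```

Real traffic is bursty, so peak rates would be higher than this average.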
5. Characteristics of Big Data
Variety
• Data heterogeneity: structured and unstructured data
• Many sources and types of data, both structured and
unstructured
• Data comes in the form of emails, photos, videos,
monitoring devices, PDFs, audio, etc.
Veracity
• Uncertainty of accuracy and authenticity of data
• Biases, noise and abnormality in data.
• Whether the data being stored and mined is meaningful to the
problem being analyzed
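One way the noise and abnormality mentioned above might be surfaced is a robust outlier check. The sketch below uses the modified z-score based on the median absolute deviation (MAD); this technique, the threshold of 3.5, the function name, and the sample readings are illustrative choices, not something the slides prescribe.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no measurable spread, so nothing can be flagged
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical sensor readings with one obviously abnormal value.
readings = [20.1, 19.8, 20.3, 20.0, 95.0, 19.9]
print(flag_outliers(readings))  # prints [95.0]
```

Flagged values still need human judgment: an outlier may be a sensor fault or a genuine event worth analyzing.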
6. Characteristics of Big Data
Validity
• Whether the data is correct and accurate for the intended
use
• Valid data is key to making the right decisions
Volatility
• How long the data remains valid and how long it should be stored
• Need to determine the point at which data is no longer
relevant to the current analysis
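The volatility question can be expressed as a simple retention rule. The following is a minimal sketch, assuming each record carries a `timestamp` field and that relevance is decided purely by age; the field name, the 30-day window, and the sample records are all hypothetical.

```python
from datetime import datetime, timedelta


def still_relevant(records, now, max_age):
    """Keep only records whose timestamp is within max_age of now."""
    cutoff = now - max_age
    return [r for r in records if r["timestamp"] >= cutoff]


# Hypothetical records and a fixed reference time for repeatability.
now = datetime(2017, 7, 12)
records = [
    {"id": 1, "timestamp": datetime(2017, 7, 1)},
    {"id": 2, "timestamp": datetime(2017, 3, 15)},
]
recent = still_relevant(records, now, max_age=timedelta(days=30))
print([r["id"] for r in recent])  # prints [1]
```

In practice the window depends on the analysis: what is stale for real-time monitoring may still be useful for long-term trends.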
7. Challenges in Big Data
Fault tolerance: ability to handle failures
Scalability: ability to handle data as it grows over time
Heterogeneity: ability to handle various kinds of data
8. Big Data in Information System
Unstructured data handling capability
Real time data processing
Predictive analytics and in-memory analytics
9. Big Data Trends
NoSQL database: for handling unstructured data
Cloud-based analytics: migrating data to a cloud platform
Deep learning: algorithms used for mining data
In-memory analytics: speeds up analytical processing
10. Data Scientist
A person who analyzes and interprets data to assist in
decision making.
The people who understand how to fish out answers to
important business questions from today's tsunami of
unstructured information
A hybrid of data hacker, analyst, communicator, and
trusted adviser
11. Roles and Skills Of Data Scientist
Use technologies that make taming big data possible,
including Hadoop, and related open-source tools, cloud
computing, and data visualization.
Make discoveries while swimming in a pool of data
Bring structure to large quantities of formless data and make
analysis possible
Write code
12. Roles and Skills Of Data Scientist
Communicate what they’ve learned and suggest its
implications for new business directions
Fashion their own tools and even conduct academic-style
research
Be creative in displaying information visually and making the
patterns they find clear and compelling