2. Outline
Big Data : An Introduction
Big Data Analytics
Big Data Analytics : Applications and Business Prosperity
Big Data Technology
Big Data : Issues and Challenges
Conclusion
10/3/2018 Don Bosco College, Yelagiri hills.
4. Introduction
Data
Facts and pieces of information collected together for reference or analysis
Information processed or stored by computers and other electronic devices
Text, image, audio, video, etc.
5. Introduction
Big data is similar to ordinary data, but it does not behave the same way.
The term ‘big data’ applies to information that cannot be
processed or handled using traditional processes or tools
1 Byte = 8 Bits
1 Kilobyte (KB) = 1024 Bytes
1 Megabyte (MB) = 1024 KB
1 Gigabyte (GB) = 1024 MB
1 Terabyte (TB) = 1024 GB
1 Petabyte (PB) = 1024 TB
1 Exabyte (EB) = 1024 PB
1 Zettabyte (ZB) = 1024 EB
1 Yottabyte (YB) = 1024 ZB
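The ladder above can be expressed as a small conversion helper. This is a minimal Python sketch assuming the slide's 1024-based steps; note that the IEC standard names these binary units KiB, MiB, and so on, while SI prefixes (kB, MB) strictly mean powers of 1000. The names `to_bytes` and `human_readable` are illustrative, not from any library.

```python
# Unit ladder from the slide: each step up is a factor of 1024 (1 byte = 8 bits).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value: float, unit: str) -> float:
    """Convert a value in the given unit to bytes using 1024-based steps."""
    return value * 1024 ** UNITS.index(unit)


def human_readable(num_bytes: float) -> str:
    """Express a byte count in the largest unit whose value is still at least 1."""
    step = 0
    while num_bytes >= 1024 and step < len(UNITS) - 1:
        num_bytes /= 1024
        step += 1
    return f"{num_bytes:.2f} {UNITS[step]}"


if __name__ == "__main__":
    print(to_bytes(1, "KB"))                  # prints 1024
    print(to_bytes(1, "PB") * 8)              # number of bits in one petabyte
    print(human_readable(to_bytes(3, "TB")))  # prints "3.00 TB"
```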
6. Definition
There is no single standard definition.
“Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.” -Gartner.
“Big data exceeds the reach of commonly used hardware environments and software tools to capture, manage, and process it within a tolerable elapsed time for its user population.” -Teradata Magazine article, 2011.
8. Introduction
Characteristics of Big Data.
Volume:
Huge amounts of data at rest (terabytes to petabytes).
Velocity:
Data in motion (streaming data).
Variety:
Different types of data (image, audio, text, video, etc.).
9. Introduction
Characteristics of Big Data
Now researchers include more V’s
Veracity
Value
Variability
...
Victory
13. Sources of Big Data
What is big data?
Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone.
Data comes from everywhere:
sensors used to gather climate information
posts to social media sites
digital pictures and videos
purchase transaction records
cell phone GPS signals, etc.
This data is big data.
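To put the quoted figure into the units of the size ladder, a quick calculation helps. This Python sketch takes the 2.5 quintillion (2.5 × 10^18) bytes per day at face value and converts it to daily volume in SI exabytes and binary exbibytes, plus the implied average rate per second; the function names are illustrative.

```python
DAILY_BYTES = 2.5e18           # "2.5 quintillion bytes" per day, as quoted above
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume(base: int, power: int) -> float:
    """Daily volume in units of base**power bytes (base 1000 for SI, 1024 for binary)."""
    return DAILY_BYTES / base ** power


def bytes_per_second() -> float:
    """Average data-creation rate implied by the daily figure."""
    return DAILY_BYTES / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"{daily_volume(1000, 6):.2f} EB per day (SI)")        # prints "2.50 EB per day (SI)"
    print(f"{daily_volume(1024, 6):.2f} EiB per day (binary)")   # prints "2.17 EiB per day (binary)"
    print(f"{bytes_per_second() / 1000**4:.2f} TB per second")   # prints "28.94 TB per second"
```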
14. Sources of Big Data
Web & E-commerce
Bank / Credit card
Transactional
Mobile
Social
Video & Preference
Machine & Sensor
Retail POS
Each of these sources becomes big data.
15. Who is generating big data?
The model of generating/consuming data has changed
Old Model: A few companies generate data; all others consume it.
New Model: All of us generate data, and all of us consume it.
17. What does Big Data look like?
[Figure: “What we know or see” vs. “What’s actually there”]
18. Area of Applications
Health care / Biotech.
E – Governance.
Social Networks / Social Media.
Weather Forecasting.
Education data.
19. Area of Applications
Banking / Insurance / Finance.
Retail industries.
CRM / Customer Analytics.
Airways, etc.
21. Definition
Big data analytics is the process of examining
enormous amounts of data of a variety of types to
uncover hidden patterns, unknown correlations and other
useful information.
Example:
Searches in “friends” networks at social-networking sites involve graphs with hundreds of millions of nodes and many billions of edges.
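A friends-network search like this is typically a breadth-first traversal of the graph. The Python sketch below finds everyone within a given number of hops on a tiny hypothetical graph (`SAMPLE_GRAPH` and `friends_within` are illustrative names); at social-network scale the same idea would run on a distributed graph engine rather than a single in-memory dictionary.

```python
from collections import deque

# Hypothetical friendship lists (undirected, so each edge is stored in both directions).
SAMPLE_GRAPH = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "erin"],
    "dave": ["bob", "frank"],
    "erin": ["carol"],
    "frank": ["dave"],
}


def friends_within(graph: dict[str, list[str]], start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search: map each person reachable from `start` to their hop distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if dist[person] == max_hops:
            continue  # do not expand past the hop limit
        for friend in graph.get(person, []):
            if friend not in dist:
                dist[friend] = dist[person] + 1
                queue.append(friend)
    del dist[start]  # exclude the starting person from the result
    return dist


if __name__ == "__main__":
    print(friends_within(SAMPLE_GRAPH, "alice", 2))
    # prints {'bob': 1, 'carol': 1, 'dave': 2, 'erin': 2}
```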
22. Why Is Big Data Analytics Feasible?
Increased storage capacities
Next generation products
Cost Reduction
Faster and better decision making
Communication networking
Improved services or products
Distributed processing technologies
23. Stages in Big Data Analytics
24. Available Analytic Methods
Traditional Data Processing systems
Information Processing using statistical tools
Knowledge Engineering and Intelligence Systems
Business Analytics using Data mining
Business Intelligence
Genetic Algorithms
Machine learning algorithms
Exploratory data analysis, etc.
25. Types of Big Data Analytics
Descriptive: What happened?
Predictive: What will happen?
Prescriptive: What should happen?
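The three questions can be contrasted on a toy example. This minimal Python sketch assumes hypothetical weekly sales figures: the descriptive step summarizes them, the predictive step fits a least-squares trend line to forecast next week, and the prescriptive step turns that forecast into an order quantity using an assumed safety-stock rule. Real systems would use far richer models; all names here are illustrative.

```python
import math
from statistics import mean


def describe(sales: list[float]) -> dict:
    """Descriptive: what happened?"""
    return {"total": sum(sales), "mean": mean(sales), "max": max(sales)}


def predict_next(sales: list[float]) -> float:
    """Predictive: what will happen? Fit a least-squares line over the time index."""
    n = len(sales)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    x_bar = (n - 1) / 2
    y_bar = mean(sales)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(sales)) / sum(
        (x - x_bar) ** 2 for x in range(n)
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * n  # value of the fitted line at the next time step


def recommend_order(forecast: float, on_hand: float, safety_stock: float = 10) -> int:
    """Prescriptive: what should happen? Order enough to cover the forecast plus a buffer."""
    return max(0, math.ceil(forecast + safety_stock - on_hand))


if __name__ == "__main__":
    weekly_sales = [100, 110, 120, 130]  # hypothetical data
    print(describe(weekly_sales))
    forecast = predict_next(weekly_sales)
    print(forecast)                               # prints 140.0
    print(recommend_order(forecast, on_hand=50))  # prints 100
```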
27. Activities in Analytics
Analysis of data is a process of inspecting, cleaning, transforming and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making.
Inspecting
Cleaning
Transforming
Modeling
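The four activities can be chained into a tiny pipeline. This Python sketch assumes a hypothetical log of temperature readings in which empty strings and "n/a" are unreadable entries and -999 is an assumed "missing value" code; the "model" here is only a statistical summary (mean and population standard deviation), standing in for whatever model a real analysis would fit.

```python
from statistics import mean, pstdev
from typing import Optional

RAW_READINGS = ["21.5", "22.0", "", "n/a", "23.5", "-999", "22.5"]  # hypothetical sensor log
MISSING_CODE = -999.0  # assumed sentinel meaning "no reading"


def parse(text: str) -> Optional[float]:
    """Return the reading as a float, or None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def inspect_data(raw: list[str]) -> dict:
    """Inspecting: count the records and how many are not numbers at all."""
    non_numeric = sum(1 for r in raw if parse(r) is None)
    return {"records": len(raw), "non_numeric": non_numeric}


def clean_data(raw: list[str]) -> list[float]:
    """Cleaning: drop non-numeric entries and the missing-value sentinel."""
    return [v for v in map(parse, raw) if v is not None and v != MISSING_CODE]


def transform_data(celsius: list[float]) -> list[float]:
    """Transforming: convert Celsius readings to Fahrenheit."""
    return [c * 9 / 5 + 32 for c in celsius]


def model_data(values: list[float]) -> dict:
    """Modeling: a simple statistical summary of the cleaned values."""
    return {"mean": mean(values), "std": pstdev(values)}


if __name__ == "__main__":
    print(inspect_data(RAW_READINGS))   # prints {'records': 7, 'non_numeric': 2}
    celsius = clean_data(RAW_READINGS)  # [21.5, 22.0, 23.5, 22.5]
    print(model_data(celsius))
    print(model_data(transform_data(celsius)))
```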
28. Why Are New Analytical Methods Needed?
Big in Size – (Volume)
Unstructured data – (Variety)
To analyze the streaming data (High-Velocity)
Distributed data
Need for parallel analytics
30. Key Technologies for Big data
DFS (Distributed File System):
Large files are split into parts
Move file parts into a cluster
Fault-tolerant through replication across nodes while being rack-aware
MapReduce:
Move algorithms close to the data by structuring them for
parallel execution so that each task works on a part of the data. The
power of Simplicity!
NoSQL:
A NoSQL (often interpreted as Not Only SQL) database
provides a mechanism for storage and retrieval of data that is modeled
in means other than the tabular relations used in relational databases.
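The MapReduce idea can be sketched with the customary word-count example. This is a single-machine Python simulation under stated assumptions: a thread pool stands in for a cluster, each string in `chunks` stands in for a file block stored on a different node, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, not Hadoop's API. The structure is the point: each map task sees only its own chunk, and the reduce step only combines values grouped by key.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_phase(chunk: str) -> list[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one chunk of the input."""
    return [(word, 1) for word in chunk.lower().split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key."""
    return key, sum(values)


def word_count(chunks: list[str]) -> dict[str, int]:
    # Each chunk is mapped by its own task, as if it ran on the node holding that block.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, chunks))
    grouped = shuffle(chain.from_iterable(mapped))
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    blocks = ["big data is big", "data in motion", "big insight"]
    print(word_count(blocks))
```

In a real framework the shuffle moves data across the network between map and reduce nodes, which is why keeping the map step local to the data matters.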
31. Key Technologies for Big data
Three key technologies that can help to handle big data:
Information management for big data: Manage data as
a strategic, core asset, with ongoing process control
High-performance analytics for big data: Gain rapid
insights from big data and the ability to solve increasingly
complex problems
Flexible deployment options for big data: Choose between on-premises and hosted software-as-a-service (SaaS) approaches
32. Technologies for Big Data
Fast Processors and Massively Parallel Processing (MPP)
Distributed File System
Apache Hadoop
Data-Intensive Computing Strategies
Low-Cost Storage, In-Memory Processing
33. Technologies for Big Data
Hadoop Distributions
Hortonworks
Cloud Operating System
Cloud Foundry: by VMware
OpenStack: worldwide participation and well-known companies
Storage
Fusion-io: not open source, but very supportive of open-source projects; flash-aware applications.
34. Development Platforms and Tools
Python: general-purpose programming language widely used for data analysis.
Apache Mahout: machine learning library.
R: language and environment widely used for statistics and data mining.
Storm: stream processing, open-sourced by Twitter.
Giraph: graph processing, used at scale by Facebook.
37. Big Data: Issues & Challenges
38. Challenges
The bottleneck is:
In technology
New architectures, algorithms, and techniques are needed
Also in technical skills
There is a lack of experts in using the new technology
40. Challenges
Internet of Things related
The amount of data that must be sorted, improved, integrated, analyzed and managed is huge.
Sensor devices constantly send updates about moisture, light and movement.
A real-time stream analytics platform that can handle big data is needed, together with a scalable infrastructure to support it.
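One common building block of such a platform is an aggregate that is updated incrementally as each reading arrives, so the stream never has to be stored in full. The Python sketch below keeps a fixed-size sliding window over the most recent sensor readings and returns the running average; the class name and window size are illustrative assumptions, and production stream processors such as Storm distribute this kind of computation across many machines.

```python
from collections import deque


class SlidingWindowAverage:
    """Running average of the last `size` readings, updated in O(1) per reading."""

    def __init__(self, size: int):
        if size <= 0:
            raise ValueError("window size must be positive")
        self.window = deque(maxlen=size)
        self.total = 0.0

    def update(self, reading: float) -> float:
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]  # the oldest reading is about to be evicted
        self.window.append(reading)       # deque drops the oldest item automatically
        self.total += reading
        return self.total / len(self.window)


if __name__ == "__main__":
    moisture = SlidingWindowAverage(size=3)
    for value in [40, 42, 44, 46]:  # hypothetical moisture readings
        print(moisture.update(value))  # prints 40.0, 41.0, 42.0, 44.0 on successive lines
```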
41. Challenges
Cloud computing related
Traditional WAN-based transport methods cannot move
terabytes of data at the speed dictated by businesses
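A back-of-the-envelope calculation shows why. The Python sketch below computes the ideal time to move a given amount of data over a link of a given speed; the link speeds are assumed example values, and real WAN throughput is usually lower because of protocol overhead and contention, modeled here only by a crude `efficiency` factor.

```python
def transfer_hours(data_bytes: float, link_bits_per_second: float, efficiency: float = 1.0) -> float:
    """Hours needed to move data_bytes over a link, ignoring latency and retransmissions."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = data_bytes * 8
    seconds = bits / (link_bits_per_second * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    ONE_TB = 1e12  # decimal terabyte
    for label, bps in [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)]:
        print(f"1 TB over {label}: {transfer_hours(ONE_TB, bps):.1f} h")
```

Even at a perfectly utilized 100 Mbit/s, a single terabyte takes roughly 22 hours, and data volumes grow by orders of magnitude from there.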
43. Challenges: Storage related
Clearly, there are not enough hard disks/devices.
Distributed storage is still not enough; manufacturers cannot make enough storage devices in time.
Write speed to devices; need for bigger data paths/data buses
44. Challenges: Management related
Data Collection
Organize the varieties of data
Need for distributed environments
Need for new analytical methodology
45. Challenges: Processing related
Integrating data using filters
Deciding “what” data to process and “how”
Effective data processing system design
Latency and Bandwidth
Streaming data processing
46. Challenges: Big data visualization
Meeting the need for speed
Understanding the data
Addressing data quality
Displaying meaningful results
48. For Researchers
Research institutes and companies are recruiting more data scientists for research and development.
R&D opportunities exist in fields such as
Telecom industry
Retail industry
Social networks
Healthcare industry and so on.
49. For Students
Develop deep analytical skills to qualify for analyst positions
Gain basic knowledge of optimization techniques, data mining, machine learning algorithms, etc.
Keep an eye on evolving technologies