2. What is Big Data?
• Data sets whose size is beyond the ability of
commonly used software tools to capture,
curate, manage, and process the data within a
tolerable elapsed time.
• Difficult to process using on-hand database
management tools or traditional data
processing applications.
3. Big Data
• Big data may well be the next big thing in the
IT world.
• Requires new techniques, tools, and architectures
4. Big Data today:
• Facebook's system processes 2.5 billion pieces
of content and 500+ terabytes of data each
day. It’s pulling in 2.7 billion Like actions and
300 million photos per day, and it scans
roughly 105 terabytes of data each half hour.
• Walmart handles more than 1 million
customer transactions every hour.
6. Volume
• The size of data is increasing day by day,
minute by minute
• Big data deals with extremely large volumes of data
• Facebook's system processes 2.5 billion pieces of
content and 500+ terabytes of data each day. It’s
pulling in 2.7 billion Like actions and 300 million
photos per day, and it scans roughly 105
terabytes of data each half hour.
• Walmart handles more than 1 million customer
transactions every hour.
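To make these volumes easier to picture, the daily and hourly figures above can be converted into per-second rates. The sketch below is only back-of-the-envelope arithmetic; it assumes decimal units (1 TB = 10^12 bytes) and a perfectly even load over the day, neither of which the slide states.

```python
# Rough per-second rates derived from the figures quoted on this slide.
# Assumptions (not from the source): 1 TB = 10**12 bytes, load spread evenly.
TB = 10**12
GB = 10**9
SECONDS_PER_DAY = 24 * 60 * 60

# 500 TB of new data per day, expressed in GB per second
ingest_gb_per_s = 500 * TB / SECONDS_PER_DAY / GB

# 105 TB scanned every half hour, expressed in GB per second
scan_gb_per_s = 105 * TB / (30 * 60) / GB

# 300 million photos per day, expressed in photos per second
photos_per_s = 300_000_000 / SECONDS_PER_DAY

# 1 million Walmart transactions per hour, expressed per second
walmart_tx_per_s = 1_000_000 / 3600

print(f"ingest:   {ingest_gb_per_s:.2f} GB/s")   # about 5.79 GB/s
print(f"scan:     {scan_gb_per_s:.1f} GB/s")     # about 58.3 GB/s
print(f"photos:   {photos_per_s:.0f} per second")      # about 3472
print(f"walmart:  {walmart_tx_per_s:.0f} per second")  # about 278
```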
7. Velocity
• High-frequency stock trading algorithms reflect
market changes within microseconds
• Machine-to-machine processes exchange data
between billions of devices
• Infrastructure and sensors generate massive log
data in real time that has to be transferred and
processed with minimal delay
• Online gaming systems support millions of
concurrent users, each producing multiple inputs
per second
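One common way to cope with fast-arriving events is to process them incrementally instead of storing everything first. The sketch below is an illustration only: the class name `SlidingWindowCounter` and the sample timestamps are hypothetical, and a production stream processor would be considerably more involved.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts how many events arrived within the last `window` seconds."""

    def __init__(self, window: float):
        self.window = window
        self._times = deque()  # timestamps of recent events, oldest first

    def add(self, timestamp: float) -> int:
        """Record one event and return the number of events in the window."""
        self._times.append(timestamp)
        # Evict events that have fallen out of the window.
        while self._times and self._times[0] <= timestamp - self.window:
            self._times.popleft()
        return len(self._times)


# Hypothetical event timestamps (in seconds), processed one at a time.
counter = SlidingWindowCounter(window=1.0)
counts = [counter.add(t) for t in (0.0, 0.2, 0.9, 1.1, 2.5)]
print(counts)  # [1, 2, 3, 3, 1]
```

Each event is handled in amortized constant time, so the counter keeps up with the stream without rescanning old data.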
8. Variety
• Today, data comes in various formats, types, and
structures.
• Text, numerical, images, 3D graphics, audio,
video, time series, sequences
• Structured, semi-structured, and unstructured
data
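The difference between the three categories shows up in how each one is read. The sketch below uses small made-up samples (all values are hypothetical) and only the Python standard library: a CSV table as structured data, a JSON document as semi-structured data, and free text as unstructured data.

```python
import csv
import io
import json
import re

# Structured: fixed columns, every row has the same fields.
structured = "user_id,amount\n42,19.99\n43,5.00\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing, but fields can nest or be missing.
semi_structured = '{"user_id": 42, "tags": ["sale", "mobile"], "device": {"os": "android"}}'
doc = json.loads(semi_structured)

# Unstructured: no schema at all; structure has to be extracted.
unstructured = "Loved the sale! Ordered 2 items from my phone."
words = re.findall(r"[a-z]+", unstructured.lower())

print(rows[0]["amount"])    # 19.99 (still a string; CSV carries no types)
print(doc["device"]["os"])  # android
print(words)
```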
9. Generation of Big data
• Sensor technologies and networks
• Scientific instruments
• Social media and networks
• Online marketing and banking
10. Big Data Analytics
• Examining large amounts of data
• Extracting appropriate information
• Identification of hidden patterns, unknown
correlations
• Competitive advantage
• Better business decisions: strategic and
operational
• Effective marketing, customer satisfaction,
increased revenue
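As a small illustration of looking for correlations, the sketch below computes a Pearson correlation coefficient between two series. The function name `pearson` and the numbers are invented for the example; real analytics would run on far larger data and would need to separate correlation from causation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up weekly figures, purely for illustration.
ad_spend = [10, 20, 30, 40, 50]
revenue = [12, 24, 33, 46, 52]

r = pearson(ad_spend, revenue)
print(f"correlation: {r:.3f}")  # close to +1 means the series rise together
```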
11. Distributed System
• A distributed system is a model in which
components located on networked computers
communicate and coordinate their actions by
passing messages. The components interact
with each other in order to achieve a common
goal.
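The message-passing idea can be sketched in a few lines. In the example below, threads and in-memory queues stand in for networked computers and network links, which is a simplifying assumption; the names `worker`, `inbox`, and `outbox` are hypothetical.

```python
import queue
import threading


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """A component that only communicates through messages."""
    while True:
        message = inbox.get()
        if message is None:       # sentinel message: shut down
            break
        outbox.put(sum(message))  # reply with a partial result


inbox, outbox = queue.Queue(), queue.Queue()
node = threading.Thread(target=worker, args=(inbox, outbox))
node.start()

# The coordinator sends work as messages and never touches the worker's state.
chunks = ([1, 2, 3], [4, 5], [6])
for chunk in chunks:
    inbox.put(chunk)
inbox.put(None)
node.join()

partials = [outbox.get() for _ in chunks]
total = sum(partials)
print(partials, total)  # [6, 9, 6] 21
```

The coordinator and the worker share a common goal (the overall sum) but cooperate only by exchanging messages, which is the property the definition above describes.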
12. Role of Distributed Systems in Big Data
• Distributed computing and parallel processing techniques can make
a significant difference in the latency experienced by customers,
suppliers, and partners. Many big data applications depend on low
latency because of their requirements for speed and the volume and
variety of the data.
• Provides the capability to process and analyze huge amounts of
data in near real time.
• Helps to meet big data demands
• Big data systems take advantage of available hardware by automating
processes like load balancing and optimization across a huge cluster
of nodes.
• Analysts are able to use and process all the data rather than
settling for snapshots.
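The pattern behind these points is to split the data, process the pieces in parallel, and combine the partial results. The sketch below shows that pattern on a single machine, with a thread pool standing in for a cluster of nodes (an assumption made for brevity); `count_words`, `parallel_word_count`, and the sample log lines are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk: str) -> Counter:
    """Work done independently on one piece of the data."""
    return Counter(chunk.split())


def parallel_word_count(chunks, workers: int = 4) -> Counter:
    """Process every chunk in parallel, then merge the partial counts."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The pool hands chunks to idle workers, a simple form of load balancing.
        for partial in pool.map(count_words, chunks):
            total.update(partial)
    return total


# Made-up log fragments, one per "node".
logs = ["error ok ok", "ok error", "ok ok ok"]
totals = parallel_word_count(logs)
print(totals["ok"], totals["error"])  # 6 2
```

Because every chunk is counted, the result covers all the data and not just a sample, which mirrors the last bullet above.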
13. Who are the data scientists?
• Data scientists are a new breed of analytical
data experts who have the technical skills to
solve complex problems – and the curiosity to
explore what problems need to be solved.
• Part mathematician, part computer scientist
and part trend-spotter.
• Because they straddle both the business and
IT worlds, they’re highly sought-after and
well-paid.
15. Skills required for a data scientist
• Curious and explorative mindset
• Ability to question existing practices and
devise alternatives
• Strong analytical skills
• Effective communication skills for diverse
audiences
• Business problem-solving skills
• Cross-functional team management skills
16. Role and job duties of a data scientist
• Collecting large amounts of unruly data and transforming it into a
more usable format.
• Solving business-related problems using data-driven techniques.
• Working with a variety of programming languages, including SAS, R
and Python.
• Having a solid grasp of statistics, including statistical tests and
distributions.
• Staying on top of analytical techniques such as machine learning,
deep learning and text analytics.
• Communicating and collaborating with both IT and business.
• Looking for order and patterns in data, as well as spotting trends
that can help a business’s bottom line.
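As an illustration of the first duty, turning unruly data into a more usable format, the sketch below normalizes a few messy records and drops the ones that cannot be repaired. The function `clean_record`, the field names, and the sample rows are all hypothetical; real cleaning rules depend on the data set.

```python
from typing import Optional


def clean_record(raw: dict) -> Optional[dict]:
    """Return a normalized record, or None if the record is unusable."""
    name = raw.get("name", "").strip().title()
    amount_text = str(raw.get("amount", "")).replace("$", "").replace(",", "").strip()
    try:
        amount = float(amount_text)
    except ValueError:
        return None  # amount is not a number
    if not name:
        return None  # name is missing
    return {"name": name, "amount": amount}


# Made-up raw input with inconsistent formatting.
raw_rows = [
    {"name": "  alice SMITH ", "amount": "$1,200.50"},
    {"name": "bob", "amount": "n/a"},
    {"name": "", "amount": "30"},
    {"name": "carol", "amount": 75},
]

cleaned = [record for record in map(clean_record, raw_rows) if record is not None]
print(cleaned)
# [{'name': 'Alice Smith', 'amount': 1200.5}, {'name': 'Carol', 'amount': 75.0}]
```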