Diman Maharjan
070bex412
Pulchowk Campus
BIG DATA
An OVERVIEW
What is Big Data?
• Data sets with sizes beyond the ability of
commonly used software tools to capture,
curate, manage, and process the data within a
tolerable elapsed time.
• Difficult to process using on-hand database
management tools or traditional data
processing applications.
Big Data
• Big data may well be the next big thing in the IT
world.
• It requires new techniques, tools, and architectures
Big Data today:
• Facebook system processes 2.5 billion pieces
of content and 500+ terabytes of data each
day. It’s pulling in 2.7 billion Like actions and
300 million photos per day, and it scans
roughly 105 terabytes of data each half hour.
• Walmart handles more than 1 million
customer transactions every hour.
Characteristics of Big Data
3 V’s
VOLUME, VARIETY, VELOCITY
Volume
• The size of data is increasing day by day,
minute by minute
• Big data deals with extremely large volumes of data
• Facebook system processes 2.5 billion pieces of
content and 500+ terabytes of data each day. It’s
pulling in 2.7 billion Like actions and 300 million
photos per day, and it scans roughly 105
terabytes of data each half hour.
• Walmart handles more than 1 million customer
transactions every hour.
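The figures on this slide are easier to compare once they are turned into per-second rates. The sketch below is plain arithmetic under two assumptions of ours: decimal units (1 TB = 10^12 bytes) and "500+" read as exactly 500, so the results are lower-bound averages rather than measured peaks.

```python
# Back-of-the-envelope throughput from the figures quoted on this slide.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 GB = 10**9 bytes).
TB = 10**12
GB = 10**9

SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def scan_rate_gb_per_s(terabytes, seconds):
    """Average rate in GB/s for `terabytes` processed over `seconds`."""
    return terabytes * TB / seconds / GB


def daily_scan_tb(terabytes_per_half_hour):
    """Total terabytes per day at a constant half-hourly volume."""
    return terabytes_per_half_hour * 48  # 48 half-hours in a day


def per_second(events_per_hour):
    """Average events per second for an hourly event count."""
    return events_per_hour / SECONDS_PER_HOUR


scan_rate = scan_rate_gb_per_s(105, 1800)               # 105 TB every half hour
daily_scan = daily_scan_tb(105)
ingest_rate = scan_rate_gb_per_s(500, SECONDS_PER_DAY)  # assumed 500 TB of new data per day
walmart_tps = per_second(1_000_000)                     # 1 million transactions per hour

print(f"Facebook scan rate:   {scan_rate:.1f} GB/s")
print(f"Facebook daily scan:  {daily_scan:,.0f} TB")
print(f"Facebook ingest rate: {ingest_rate:.1f} GB/s")
print(f"Walmart transactions: {walmart_tps:.0f} per second")
```

On these assumptions Facebook scans about 58 GB every second (over 5,000 TB a day, far more than the roughly 5.8 GB/s of new data it takes in), and Walmart averages about 278 transactions per second.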
Velocity
• High-frequency stock trading algorithms reflect
market changes within microseconds
• Machine-to-machine processes exchange data
between billions of devices
• Infrastructure and sensors generate massive log
data in real time that has to be transferred and
processed quickly
• Online gaming systems support millions of
concurrent users, each producing multiple inputs
per second
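High-velocity data is usually summarized as it arrives instead of being stored first and queried later. A minimal sketch of one common building block, a sliding-window counter that reports how many events arrived in the last few seconds, assuming timestamps arrive in order; the class and its names are illustrative, not taken from any particular system.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen in the last `window` seconds of a stream.

    Assumes timestamps arrive in non-decreasing order, as they would
    from a single ordered event log.
    """

    def __init__(self, window):
        self.window = window
        self._timestamps = deque()  # event times, oldest first

    def add(self, timestamp):
        """Record one event and evict events that fell out of the window."""
        self._timestamps.append(timestamp)
        self._evict(timestamp)

    def count(self, now):
        """Number of events in the half-open interval (now - window, now]."""
        self._evict(now)
        return len(self._timestamps)

    def rate(self, now):
        """Average events per second over the current window."""
        return self.count(now) / self.window

    def _evict(self, now):
        # The oldest events sit at the left end; drop them once they are too old.
        while self._timestamps and self._timestamps[0] <= now - self.window:
            self._timestamps.popleft()


counter = SlidingWindowCounter(window=1.0)
for t in [0.1, 0.2, 0.5, 0.9, 1.05, 1.3]:
    counter.add(t)
print(counter.count(now=1.3))
```

Each event is appended and evicted once, so the work per event is constant on average. Keeping every timestamp costs memory proportional to the events in the window; at millions of inputs per second, per-bucket counts (e.g. one counter per 10 ms) are a common cheaper alternative.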
Variety
• Today, data comes in various formats, types, and
structures.
• Text, numerical, images, 3D graphics, audio,
video, time series, sequences
• Structured, semi-structured, and unstructured
data
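The three structural categories differ mainly in how much work it takes to pull usable fields out of them. A small sketch using hypothetical payment records: a CSV table (structured), JSON objects whose fields may be missing (semi-structured), and free text (unstructured), all normalized into one record shape. The sample data and the regex are illustrative assumptions.

```python
import csv
import io
import json
import re

# Hypothetical sample inputs, one per kind of data named on the slide.
STRUCTURED_CSV = "user_id,amount\n1,19.99\n2,5.00\n"
SEMI_STRUCTURED_JSON = '[{"user_id": 3, "amount": 12.5, "tags": ["promo"]}, {"user_id": 4}]'
UNSTRUCTURED_TEXT = "User 5 paid 7.25 today. User 6 only browsed."


def from_structured(text):
    """Fixed schema: every row has the same columns, so parsing is direct."""
    return [
        {"user_id": int(row["user_id"]), "amount": float(row["amount"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def from_semi_structured(text):
    """Self-describing but flexible: fields may be missing or extra."""
    records = []
    for obj in json.loads(text):
        if "amount" in obj:  # skip objects that lack the field we need
            records.append({"user_id": obj["user_id"], "amount": float(obj["amount"])})
    return records


def from_unstructured(text):
    """No schema: structure must be inferred, here with a simple regex."""
    pattern = re.compile(r"User (\d+) paid (\d+(?:\.\d+)?)")
    return [
        {"user_id": int(uid), "amount": float(amt)}
        for uid, amt in pattern.findall(text)
    ]


records = (
    from_structured(STRUCTURED_CSV)
    + from_semi_structured(SEMI_STRUCTURED_JSON)
    + from_unstructured(UNSTRUCTURED_TEXT)
)
for r in records:
    print(r)
```

The regex only works because the sample text is predictable; real unstructured data (images, audio, open-ended text) usually needs far heavier extraction techniques.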
Generation of Big data
• Sensor technologies and networks
• Scientific instruments
• Social media and networks
• Online marketing and banking
Big Data Analytics
• Examining large amounts of data
• Extracting appropriate information
• Identification of hidden patterns, unknown
correlations
• Competitive advantage
• Better business decisions: strategic and
operational
• Effective marketing, customer satisfaction,
increased revenue
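One simple way to surface "unknown correlations" is to compute a correlation coefficient for every pair of columns and flag the strong ones. A minimal sketch using the Pearson coefficient on hypothetical weekly store figures; the column names, numbers, and the 0.8 threshold are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strong_correlations(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    found = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            found.append((a, b, round(r, 3)))
    return found


# Hypothetical weekly figures for a store; the numbers are illustrative only.
weekly = {
    "ad_spend": [10, 12, 15, 18, 20, 25],
    "visits": [200, 230, 290, 340, 380, 470],
    "returns": [5, 9, 4, 8, 6, 7],
}

print(strong_correlations(weekly))
```

Checking every pair grows quadratically with the number of columns, which is one reason such scans are run on distributed infrastructure at big-data scale. A column with zero variance would raise a division error here, and a strong correlation is a lead for further analysis, not proof of cause.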
Distributed System
• A distributed system is a model in which
components located on networked computers
communicate and coordinate their actions by
passing messages. The components interact
with each other in order to achieve a common
goal.
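The definition above can be illustrated on a single machine: a sketch in which threads stand in for networked computers and queues stand in for the network. The components share no state and coordinate only by passing messages; the coordinator/worker roles and the squaring task are illustrative assumptions. A real system would send the same messages over sockets, RPC, or a message broker.

```python
import queue
import threading


def worker(name, inbox, coordinator_inbox):
    """Receive (task_id, value) messages, square the value, reply to the coordinator."""
    while True:
        msg = inbox.get()
        if msg is None:  # shutdown signal
            break
        task_id, value = msg
        coordinator_inbox.put((name, task_id, value * value))


def run(values, num_workers=3):
    """Coordinator: distribute tasks by message and collect the replies."""
    coordinator_inbox = queue.Queue()
    inboxes = [queue.Queue() for _ in range(num_workers)]
    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}", inbox, coordinator_inbox))
        for i, inbox in enumerate(inboxes)
    ]
    for t in threads:
        t.start()

    # Send each task to a worker in round-robin order.
    for task_id, value in enumerate(values):
        inboxes[task_id % num_workers].put((task_id, value))

    # Collect exactly one reply per task; replies may arrive in any order.
    results = {}
    for _ in values:
        _worker_name, task_id, squared = coordinator_inbox.get()
        results[task_id] = squared

    # Tell every worker to stop, then wait for them to finish.
    for inbox in inboxes:
        inbox.put(None)
    for t in threads:
        t.join()
    return [results[i] for i in range(len(values))]


print(run([1, 2, 3, 4, 5]))
```

Tagging each message with a task id is what lets the coordinator reassemble results that arrive out of order, a situation every distributed system has to handle.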
Role of Distributed system in Big Data
• Distributed computing and parallel processing techniques can make
a significant difference in the latency experienced by customers,
suppliers, and partners. Many big data applications are dependent
on low latency because of the big data requirements for speed and
the volume and variety of the data.
• Provides the capability to process and analyze huge amounts of
data in near real time.
• Helps to meet the demands of big data
• Big data systems take advantage of available hardware by automating
processes like load balancing and optimization across a huge cluster
of nodes.
• Analysts are able to use and process all the data rather than settling for
snapshots.
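The partition, process-in-parallel, and merge pattern behind these points can be sketched on one machine. In this sketch, a map-and-reduce word count over hypothetical documents, a thread pool stands in for the nodes of a cluster: each partition is counted independently and the partial results are merged at the end. The function names and sample data are illustrative assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(items, num_parts):
    """Split items into num_parts nearly equal chunks (round-robin)."""
    return [items[i::num_parts] for i in range(num_parts)]


def map_count(lines):
    """Map step: count words in one partition, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-partition counts into one total."""
    total = Counter()
    for c in partials:
        total.update(c)
    return total


def word_count(lines, num_nodes=4):
    parts = partition(lines, num_nodes)
    # The pool hands partitions to whichever worker is free, a simple form of
    # the automatic load balancing that cluster frameworks perform.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(map_count, parts))
    return reduce_counts(partials)


docs = [
    "big data needs distributed systems",
    "distributed systems process big data",
    "data scientists analyze data",
]
print(word_count(docs, num_nodes=2).most_common(3))
```

Because no partition depends on another, adding nodes only changes how the data is split, not the result. In CPython, threads do not speed up CPU-bound counting; `ProcessPoolExecutor` or a real cluster would be used for that, while the structure of the computation stays the same.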
Who are the data scientists?
• Data scientists are a new breed of analytical
data experts who have the technical skills to
solve complex problems and the curiosity to
explore what problems need to be solved.
• Part mathematician, part computer scientist
and part trend-spotter.
• Because they straddle both the business and
IT worlds, they’re highly sought-after and well-paid.
Data Scientist Skillsets
Skills required for a data scientist
• Curious and explorative mindset
• Ability to question existing practices and
devise alternatives
• Strong analytical skills
• Effective communication skills for diverse
audiences
• Business problem-solving skills
• Cross-functional team management skills
Role and job duties of a data scientist
• Collecting large amounts of unruly data and transforming it into a
more usable format.
• Solving business-related problems using data-driven techniques.
• Working with a variety of programming languages, including SAS, R
and Python.
• Having a solid grasp of statistics, including statistical tests and
distributions.
• Staying on top of analytical techniques such as machine learning,
deep learning and text analytics.
• Communicating and collaborating with both IT and business.
• Looking for order and patterns in data, as well as spotting trends
that can help a business’s bottom line.
Thank you