Big data

Principles of Database Design
Textbook Reference:
Oracle Big Data Handbook

Today we will discuss:
• What is Data?
• Why Big Data?
• How is it Different?
• Characteristics of Big Data
• Applications of Big Data
• Benefits of Big Data
• Future of Big Data
To be continued …

What is Data?
• Data can be any character, text, word,
number, picture, sound, or video. Without
context, it means little or nothing to a
human.
• Information is data that is useful and usually
formatted in a manner that allows it to be
understood by a human.

Big Data vs. Small Data
Big Data
• The big picture
• Encompasses many
different types of data
• Unstructured data
• Unfocused
• Difficult to interpret
Small Data
• The small picture
• Mostly homogeneous
• Structured
• Focused
• Easily interpreted

Why the hype around Big Data?
• It aims to solve new problems, or old
problems in a better way
• Big Data generates value from the storage
and processing of very large quantities of
digital information

How is big data different?
• Automatically generated by a machine (e.g.
a sensor embedded in an engine)
• Typically an entirely new source of data (e.g.
use of the internet)
• Not designed to be human-friendly (e.g. text
streams)
• May not have much value
• Need to focus on the important part

How big is big data?
• Analysts predict that by 2020, there will be 5,200
gigabytes of data on every person in the world.
• On average, people send about 500 million tweets per
day.
• The average U.S. customer uses 1.8 gigabytes of data
per month on his or her cell phone plan.
• Walmart processes one million customer transactions
per hour.
• Amazon sells 600 items per second.
• On average, each person who uses email receives 88
emails per day and sends 34. That adds up to more than
200 billion emails each day.
• MasterCard processes 74 billion transactions per year.
• Commercial airlines make about 5,800 flights per day.
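
To make these figures easier to compare, the quoted per-hour, per-day and per-year numbers can be converted to per-second rates. The short sketch below does only that arithmetic; it takes the figures from this slide at face value and assumes a 365-day year.

```python
# Convert the figures quoted on the slide to approximate per-second rates.
# The input numbers are the slide's own; nothing here verifies them.
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumption: non-leap year

rates_per_second = {
    "tweets": 500_000_000 / SECONDS_PER_DAY,
    "Walmart transactions": 1_000_000 / SECONDS_PER_HOUR,
    "MasterCard transactions": 74_000_000_000 / SECONDS_PER_YEAR,
    "emails": 200_000_000_000 / SECONDS_PER_DAY,
}

for name, rate in rates_per_second.items():
    print(f"{name}: about {rate:,.0f} per second")
```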

Big data is not so much about how big
the data is; it is about the value within
the data

Characteristics of Big Data
• Volume: Data Quantity
• Variety: Data Types
• Velocity: Data Speed

Volume
Refers to the vast amount of data that is generated
every second

Volume
• Today, Facebook ingests 500 terabytes of new
data every day.
• A Boeing 737 generates 240 terabytes of
flight data during a single flight across the US.
• Smartphones and the data they create and
consume, together with sensors embedded
into everyday objects, will soon result in
billions of new, constantly updated data feeds
containing environmental, location, and other
information, including video.

Velocity
Refers to the speed at which new data is
generated

Velocity
• Clickstreams and ad impressions capture user
behavior at millions of events per second
• High-frequency stock trading algorithms
reflect market changes within microseconds
• Machine-to-machine processes exchange data
between billions of devices
• Infrastructure and sensors generate massive
log data in real time
• Online gaming systems support millions of
concurrent users, each producing multiple
inputs per second.
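
Velocity is usually measured as events per unit of time. As a minimal sketch (the class name, window size and timestamps are illustrative assumptions, not part of the original slides), a sliding-window counter can estimate the current event rate of a stream:

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen during the last `window_seconds` seconds."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, timestamp: float) -> None:
        """Register one event; timestamps are assumed to arrive in order."""
        self._timestamps.append(timestamp)
        self._evict(timestamp)

    def rate(self, now: float) -> float:
        """Events per second over the window that ends at `now`."""
        self._evict(now)
        return len(self._timestamps) / self.window_seconds

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()


counter = SlidingWindowCounter(window_seconds=1.0)
for i in range(10):
    counter.record(i * 0.25)  # synthetic events every 250 ms
print(counter.rate(now=2.25))
```

A real clickstream or sensor pipeline would spread this work across many machines; the sketch only shows the idea of measuring speed over a moving window.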

Variety
Refers to the different types of data

Variety
• Big Data analysis includes different types of
data
• Geospatial data, 3D data, audio and video,
and unstructured text, including log files and
social media.
• Traditional database systems were designed
to address smaller volumes of structured
data, fewer updates, and a predictable,
consistent data structure.

Some common Types
• Activity Data
• Conversation Data
• Photo and Video Image Data
• Sensor Data
• IoT Data
• Scientific Data
• Geo-spatial Data
• Biological Data

Veracity – The 4th V
Refers to the messiness or trustworthiness of
the data

Big Data Sources
• Users
• Systems
• Sensors
• Applications

Storing Big Data
Overview of Big Data stores
• Data models: key-value, graph, document,
column-family
• Hadoop Distributed File System (HDFS)
• HBase
• Hive
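
To make the four data models concrete, the sketch below expresses one made-up customer record in each of them using plain Python structures. The record, keys and field names are illustrative assumptions; no real database or client library is involved.

```python
import json

# Key-value: an opaque value looked up by a single key.
key_value = {"customer:42": '{"name": "Ada", "city": "Paris"}'}

# Document: a nested, self-describing record that can be queried by field.
document = {"_id": 42, "name": "Ada", "address": {"city": "Paris"}, "tags": ["vip"]}

# Column-family: row key -> column family -> column -> value.
column_family = {
    "42": {
        "profile": {"name": "Ada", "city": "Paris"},
        "activity": {"last_login": "2012-02-01"},
    }
}

# Graph: nodes plus labelled edges between them.
nodes = {42: {"name": "Ada"}, 7: {"name": "Bob"}}
edges = [(42, "FOLLOWS", 7)]

# The same question ("which city?") is answered differently in each model.
print(json.loads(key_value["customer:42"])["city"])  # value must be decoded first
print(document["address"]["city"])                   # navigate nested fields
print(column_family["42"]["profile"]["city"])        # row, then family, then column
print([nodes[dst]["name"] for src, rel, dst in edges if src == 42 and rel == "FOLLOWS"])
```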

Storing Big Data
Analyzing your data characteristics
• Selecting data sources for analysis
• Eliminating redundant data
• Establishing the role of NoSQL
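
One simple way to eliminate redundant data is to fingerprint each record and keep only the first occurrence of each fingerprint. The sketch below is an assumption about how this could be done, not a method from the slides; the helper names and the normalisation rules (trim whitespace, ignore case and key order) are hypothetical.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Hash a record after normalising key order, case and whitespace."""
    normalised = {key: str(value).strip().lower() for key, value in record.items()}
    payload = json.dumps(normalised, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def deduplicate(records):
    """Keep the first occurrence of each distinct record."""
    seen = set()
    unique = []
    for record in records:
        key = fingerprint(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


records = [
    {"name": "Ada", "city": "Paris"},
    {"city": " paris ", "name": "ada"},  # same customer, different formatting
    {"name": "Bob", "city": "Rome"},
]
print(len(deduplicate(records)))
```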

Data Analytics
• Examining large amounts of data
• Identification of hidden patterns, unknown
correlations
• Better business decisions: strategic and
operational
• Effective marketing, customer satisfaction,
increased revenue
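
As a small illustration of finding a correlation, the sketch below computes the Pearson correlation coefficient between two series. The advertising and revenue numbers are synthetic and the function name is an assumption; real analytics would run on far larger data sets and with dedicated tools.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(variance_x * variance_y)


# Synthetic weekly figures, for illustration only.
ad_spend = [10, 20, 30, 40, 50]
revenue = [12, 25, 29, 41, 55]

r = pearson(ad_spend, revenue)
print(f"correlation between ad spend and revenue: {r:.2f}")
```

A value close to 1 suggests the two series move together, which is a prompt for further analysis rather than proof of cause and effect.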

Where is processing hosted?
• Distributed Servers / Cloud (e.g. Amazon EC2)

Where is data stored?
• Distributed Storage (e.g. Amazon S3)

What is the programming model?
• Distributed Processing (e.g. MapReduce)
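
The MapReduce programming model can be sketched on a single machine as three steps: map each input to key/value pairs, group the pairs by key, and reduce each group to one result. The word-count example below is a plain-Python illustration with hypothetical function names; it does not use Hadoop or any real MapReduce API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


documents = ["big data is big", "data has value"]
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

In a real cluster the map and reduce calls run in parallel on different servers, and the shuffle step moves data between them.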

How is data stored and indexed?
• High-performance schema-free databases (e.g. MongoDB)

What operations are performed on data?
• Analytic / Semantic Processing

Risks of Big Data
• Organizations can be overwhelmed by the data
• Need the right people solving the right
problems
• Costs can escalate too fast
• It isn't necessary to capture 100% of the data
• Data privacy
• Self-regulation
• Legal regulation

Better understand and
target customers

Understand and
Optimize Business

Improving Security
and Law Enforcement

Benefits of Big Data
• Ability to make better decisions and take
meaningful actions at the right time.
• Technologies like Hadoop give you the scale
and flexibility to store data before you know
how you are going to process it.
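
Storing data before deciding how to process it is often called schema-on-read. The sketch below imitates the idea with an in-memory list standing in for a distributed file system; the function names and log records are illustrative assumptions, not Hadoop APIs.

```python
import json

raw_store = []  # stands in for raw files kept in a distributed file system


def ingest(line: str) -> None:
    """Store the raw line as-is; no schema is enforced at write time."""
    raw_store.append(line)


def read_with_schema(fields):
    """Apply a schema only when reading; missing fields become None."""
    for line in raw_store:
        record = json.loads(line)
        yield {field: record.get(field) for field in fields}


ingest('{"user": "ada", "page": "/home", "ms": 120}')
ingest('{"user": "bob", "page": "/cart", "device": "phone"}')

# Two different questions asked of the same raw data, decided after storage.
page_views = list(read_with_schema(["user", "page"]))
devices = list(read_with_schema(["user", "device"]))
print(page_views)
print(devices)
```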

Benefits of Big Data
• Organizations are using big data to target
customer-centric outcomes, tap into internal
data and build a better information
ecosystem.
• Technologies such as MapReduce, Hive and
Impala enable you to run queries without
changing the data structures underneath.

Future of Big Data
• $15 billion has been spent on software firms
specializing only in data management and
analytics.
• This industry on its own is worth more than
$100 billion and is growing at almost 10% a
year, which is roughly twice as fast as the
software business as a whole.
• In February 2012, the open source analyst
firm Wikibon released the first market
forecast for Big Data, listing $5.1B revenue in
2012 with growth to $53.4B in 2017.
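
The Wikibon forecast implies a compound annual growth rate that can be worked out from the two figures quoted. The sketch below uses the standard CAGR formula; the function name is an assumption, and the inputs are the slide's numbers taken at face value.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1 / years) - 1


# $5.1B in 2012 growing to $53.4B in 2017, as quoted in the forecast.
growth = cagr(5.1, 53.4, 2017 - 2012)
print(f"Implied annual growth: {growth:.1%}")
```

The result comes out at roughly 60% a year, far above the "almost 10%" quoted for the wider data management and analytics industry.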

Thank you
You can download the presentation at
slideshare.com/enfarose
