BIG DATA
Syed Measum Haider Naqvi
Network Specialist
WHAT IS BIG DATA
Big data is defined as any kind of data source that has at least three
shared characteristics:
▪ Extremely large Volumes of data
▪ Extremely high Velocity of data
▪ Extremely wide Variety of data
DEFINING BIG DATA
Big data usually refers to the following kinds of data:
▪ Traditional enterprise data: Includes customer information from CRM
systems, transactional ERP data, web store transactions, and general
ledger data.
▪ Machine-generated /sensor data: Includes Call Detail Records (“CDR”),
weblogs, smart meters, manufacturing sensors, equipment logs (often
referred to as digital exhaust), trading systems data.
▪ Social data: Includes customer feedback streams, micro-blogging sites
like Twitter, and social media platforms like Facebook.
BIG DATA CHARACTERISTICS
Big data has four key characteristics:
▪ Volume: Machine-generated data is produced in much higher quantities than
non-traditional data.
▪ Velocity: Social media data streams – while not as massive as machine-generated data
– produce a huge influx of opinions and relationships valuable to customer
relationship management.
▪ Variety: Traditional data formats tend to be relatively well defined by a data schema
and change slowly. In contrast, non-traditional data formats exhibit a dizzying rate of
change. As new services are added, new sensors deployed, or new marketing
campaigns executed, new data types are needed to capture the resultant information.
▪ Value: The economic value of different data varies significantly. Typically, there is good
information hidden amongst a larger body of non-traditional data; the challenge is
identifying what is valuable and then transforming and extracting that data for
analysis.
EMERGENCE OF BIG DATA
Huge amounts of data are produced as a result of democratization and
ecosystem factors such as the following:
▪ Mobility trends: Mobile devices, mobile events and sharing, and
sensory integration
▪ Data access and consumption: Internet, interconnected systems, social
networking, and convergent interfaces and access models (Internet,
search and social networking, and messaging)
▪ Ecosystem capabilities: Major changes in the information processing
model, the availability of open source frameworks, general-purpose
computing, and unified network integration
BIG DATA MOVES INTO THE ENTERPRISE
The demands that traditional enterprise data models place on application, database, and storage
resources have grown over the years, and the cost and complexity of these models have increased
along the way. Meeting the needs of big data has driven rapid change in the
fundamental models that define the way that big data is stored, analyzed, and accessed. The new
models are based on a scaled-out, shared-nothing architecture, bringing new challenges to
enterprises to decide what technologies to use, where to use them, and how. One size no longer
fits all, and the traditional model is now being expanded to incorporate new building blocks that
address the tasks of big data with new information processing frameworks purpose-built to meet
big data’s requirements. However, these purpose-built systems also must meet the inherent
requirement for integration into current business models, data plans, and network infrastructures.
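As an illustration of the scaled-out, shared-nothing idea: each record is routed to exactly one
node by hashing its key, so nodes hold disjoint partitions and capacity grows by adding nodes. A
minimal Python sketch (the node count and record fields are hypothetical, not drawn from any
particular product):

    import hashlib

    NUM_NODES = 4  # hypothetical cluster size

    def node_for(key: str) -> int:
        """Route a record to exactly one node by hashing its key."""
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % NUM_NODES

    # Shared-nothing: each node owns only its own partition and shares
    # no disk or memory with the others.
    partitions = {n: [] for n in range(NUM_NODES)}
    for record in ({"user": "alice"}, {"user": "bob"}, {"user": "carol"}):
        partitions[node_for(record["user"])].append(record)

    print(partitions)  # records spread across independent nodes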
BIG DATA MOVES INTO THE ENTERPRISE
In traditional data warehousing terms, organizing data is called data integration. Because there is
such a high volume of big data, there is a tendency to organize data at its original storage
location, thus saving both time and money by not moving around huge volumes of data. The
infrastructure required for organizing big data must be able to process and manipulate data in the
original storage location; support very high throughput (often in batch) to deal with large data
processing steps; and handle a large variety of data formats, from unstructured to structured.
BIG DATA COMPONENTS
Two main building blocks are being added to the enterprise stack to accommodate big data:
▪ Hadoop
▪ NoSQL
HADOOP
Hadoop is a technology that allows huge data volumes to be organized and processed while
keeping the data on the original storage cluster. For example, the Hadoop Distributed File System
(HDFS) serves as the long-term storage system for web logs. These web logs are turned into
browsing behavior (sessions) by running MapReduce programs on the cluster and generating
aggregated results on the same cluster. These aggregated results are then loaded into a
relational DBMS.
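As a rough illustration of this flow, the sketch below simulates the map and reduce phases in a
single Python process, aggregating web-log lines into per-user page-view counts. The log format
and the counting aggregation are illustrative assumptions standing in for real sessionization
logic; on a real cluster each phase runs distributed across the HDFS data nodes.

    from collections import defaultdict

    # Assumed log format: "timestamp user_id url" (space-separated).
    RAW_LOGS = [
        "2015-01-01T10:00 alice /home",
        "2015-01-01T10:01 bob /cart",
        "2015-01-01T10:02 alice /checkout",
    ]

    def map_phase(lines):
        """Map step: emit one (user_id, 1) pair per web-log line."""
        for line in lines:
            fields = line.split()
            if len(fields) >= 3:
                yield fields[1], 1

    def reduce_phase(pairs):
        """Reduce step: sum the pairs per user. On a cluster, the
        shuffle phase groups pairs by key before reducers run."""
        totals = defaultdict(int)
        for user_id, n in pairs:
            totals[user_id] += n
        return dict(totals)

    # The aggregated result is what would be loaded into the relational DBMS.
    print(reduce_phase(map_phase(RAW_LOGS)))  # {'alice': 2, 'bob': 1}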
NOSQL
NoSQL systems are designed to capture all data without categorizing and parsing it upon entry into
the system, and therefore the data is highly varied. SQL systems, on the other hand, typically place
data in well-defined structures and impose metadata on the data captured to ensure consistency
and validate data types.
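This contrast can be sketched with Python's built-in sqlite3 standing in for a SQL system and a
plain dictionary standing in for a document-style NoSQL store; the table layout and sample
records are hypothetical:

    import sqlite3

    # Schema-on-write: the SQL table declares structure and types up
    # front and enforces them when data enters the system.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (user_id TEXT NOT NULL, amount REAL)")
    db.execute("INSERT INTO events VALUES (?, ?)", ("alice", 9.99))
    try:
        db.execute("INSERT INTO events VALUES (?, ?)", (None, 5.00))
    except sqlite3.IntegrityError as err:
        print("rejected on write:", err)  # NOT NULL constraint failed

    # Schema-on-read: a document store (a plain dict standing in for one)
    # accepts any shape; structure is imposed only when the data is read.
    doc_store = {
        "evt1": {"user_id": "alice", "amount": 9.99},
        "evt2": {"tweet": "loving the new store!", "lang": "en"},
    }
    for doc in doc_store.values():
        print(doc.get("user_id", "<no user>"), doc)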