BIG DATA
Syed Measum Haider Naqvi
Network Specialist
WHAT IS BIG DATA
Big data is defined as any kind of data source that has at least three
shared characteristics:
▪ Extremely large Volumes of data
▪ Extremely high Velocity of data
▪ Extremely wide Variety of data
DEFINING BIG DATA
Big data usually refers to the following kinds of data:
▪ Traditional enterprise data: Includes customer information from CRM
systems, transactional ERP data, web store transactions, and general
ledger data.
▪ Machine-generated /sensor data: Includes Call Detail Records (“CDR”),
weblogs, smart meters, manufacturing sensors, equipment logs (often
referred to as digital exhaust), trading systems data.
▪ Social data: Includes customer feedback streams, micro-blogging sites
like Twitter, and social media platforms like Facebook.
BIG DATA CHARACTERISTICS
Big data has four key characteristics:
▪ Volume: Machine-generated data is produced in much higher quantities than
non-traditional data.
▪ Velocity: Social media data streams – while not as massive as machine-generated data
– produce a huge influx of opinions and relationships valuable to customer
relationship management.
▪ Variety: Traditional data formats tend to be relatively well defined by a data schema
and change slowly. In contrast, non-traditional data formats exhibit a dizzying rate of
change. As new services are added, new sensors deployed, or new marketing
campaigns executed, new data types are needed to capture the resultant information.
▪ Value: The economic value of different data varies significantly. Typically, there is good
information hidden amongst a larger body of non-traditional data; the challenge is
identifying what is valuable and then transforming and extracting that data for
analysis.
EMERGENCE OF BIG DATA
Huge amounts of data are produced as a result of democratization and
ecosystem factors such as the following:
▪ Mobility trends: Mobile devices, mobile events and sharing, and
sensory integration
▪ Data access and consumption: Internet, interconnected systems, social
networking, and convergent interfaces and access models (Internet,
search and social networking, and messaging)
▪ Ecosystem capabilities: Major changes in the information processing
model, the availability of open source frameworks, general-purpose
computing, and unified network integration
BIG DATA MOVES INTO THE ENTERPRISE
The demands that traditional enterprise data models place on application, database, and storage
resources have grown over the years, and the cost and complexity of these models have increased
along the way. Meeting the needs of big data has driven rapid change in the
fundamental models that define the way that big data is stored, analyzed, and accessed. The new
models are based on a scaled-out, shared-nothing architecture, bringing new challenges to
enterprises to decide what technologies to use, where to use them, and how. One size no longer
fits all, and the traditional model is now being expanded to incorporate new building blocks that
address the tasks of big data with new information processing frameworks purpose-built to meet
big data’s requirements. However, these purpose-built systems also must meet the inherent
requirement for integration into current business models, data plans, and network infrastructures.
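As an illustration of the scaled-out, shared-nothing idea: each record is routed to exactly one
node by hashing its key, so nodes hold disjoint partitions and capacity grows by adding nodes. A
minimal Python sketch (the node count and record fields are hypothetical, not drawn from any
particular product):

    import hashlib

    NUM_NODES = 4  # hypothetical cluster size

    def node_for(key: str) -> int:
        """Route a record to exactly one node by hashing its key."""
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % NUM_NODES

    # Shared-nothing: each node owns only its own partition and shares
    # no disk or memory with the others.
    partitions = {n: [] for n in range(NUM_NODES)}
    for record in ({"user": "alice"}, {"user": "bob"}, {"user": "carol"}):
        partitions[node_for(record["user"])].append(record)

    print(partitions)  # records spread across independent nodes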
BIG DATA MOVES INTO THE ENTERPRISE
In traditional data warehousing terms, organizing data is called data integration. Because there is
such a high volume of big data, there is a tendency to organize data at its original storage
location, thus saving both time and money by not moving around huge volumes of data. The
infrastructure required for organizing big data must be able to process and manipulate data in the
original storage location; support very high throughput (often in batch) to deal with large data
processing steps; and handle a large variety of data formats, from unstructured to structured.
BIG DATA COMPONENTS
Two main building blocks are being added to the enterprise stack to accommodate big data:
▪ Hadoop
▪ NoSQL
HADOOP
Hadoop is a technology that allows huge data volumes to be organized and processed while
keeping the data on the original storage cluster. For example, the Hadoop Distributed File System
(HDFS) serves as the long-term storage system for web logs. These web logs are turned into
browsing behavior (sessions) by running MapReduce programs on the cluster and generating
aggregated results on the same cluster. These aggregated results are then loaded into a
relational DBMS.
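As a rough illustration of this flow, the sketch below simulates the map and reduce phases in a
single Python process, aggregating web-log lines into per-user page-view counts. The log format
and the counting aggregation are illustrative assumptions standing in for real sessionization
logic; on a real cluster each phase runs distributed across the HDFS data nodes.

    from collections import defaultdict

    # Assumed log format: "timestamp user_id url" (space-separated).
    RAW_LOGS = [
        "2015-01-01T10:00 alice /home",
        "2015-01-01T10:01 bob /cart",
        "2015-01-01T10:02 alice /checkout",
    ]

    def map_phase(lines):
        """Map step: emit one (user_id, 1) pair per web-log line."""
        for line in lines:
            fields = line.split()
            if len(fields) >= 3:
                yield fields[1], 1

    def reduce_phase(pairs):
        """Reduce step: sum the pairs per user. On a cluster, the
        shuffle phase groups pairs by key before reducers run."""
        totals = defaultdict(int)
        for user_id, n in pairs:
            totals[user_id] += n
        return dict(totals)

    # The aggregated result is what would be loaded into the relational DBMS.
    print(reduce_phase(map_phase(RAW_LOGS)))  # {'alice': 2, 'bob': 1}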
NOSQL
NoSQL systems are designed to capture all data without categorizing and parsing it upon entry into
the system, and therefore the data is highly varied. SQL systems, on the other hand, typically place
data in well-defined structures and impose metadata on the data captured to ensure consistency
and validate data types.
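This contrast can be sketched with Python's built-in sqlite3 standing in for a SQL system and a
plain dictionary standing in for a document-style NoSQL store; the table layout and sample
records are hypothetical:

    import sqlite3

    # Schema-on-write: the SQL table declares structure and types up
    # front and enforces them when data enters the system.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (user_id TEXT NOT NULL, amount REAL)")
    db.execute("INSERT INTO events VALUES (?, ?)", ("alice", 9.99))
    try:
        db.execute("INSERT INTO events VALUES (?, ?)", (None, 5.00))
    except sqlite3.IntegrityError as err:
        print("rejected on write:", err)  # NOT NULL constraint failed

    # Schema-on-read: a document store (a plain dict standing in for one)
    # accepts any shape; structure is imposed only when the data is read.
    doc_store = {
        "evt1": {"user_id": "alice", "amount": 9.99},
        "evt2": {"tweet": "loving the new store!", "lang": "en"},
    }
    for doc in doc_store.values():
        print(doc.get("user_id", "<no user>"), doc)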