UNIT : II
Characteristics of Data
• Composition: deals with the structure of data, i.e. the sources of data, types of data and nature of data.
• Condition: deals with the state of data.
• Context: deals with the generation of data and the sensitivity of data.
Evolution of Big Data
• 1970s: Data was essentially primitive and structured.
• 1980s and 1990s: Relational databases evolved, so this was the era of data-intensive applications.
• 2000 and beyond: The WWW and IoT have led to structured, unstructured and multimedia data.
Big Data
Define Big Data
• It's anything beyond imagination.
• Today's BIG may be tomorrow's NORMAL.
• Terabytes, petabytes or zettabytes of data.
• It is about the 3 Vs.
• In 2001, industry analyst Doug Laney defined "Big Data" in terms of the three Vs (3Vs): Volume, Velocity and Variety.
• In 2012, Gartner updated this definition: "Big Data" is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
• Big data is an evolving term that describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information.
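To get a feel for the units mentioned above, the following sketch converts between them. It assumes decimal (SI) prefixes, where each step up is a factor of 1,000; the 1 TB drive size is only an illustrative assumption.

```python
# Decimal (SI) storage units, expressed in bytes.
UNITS = {
    "TB": 10**12,  # terabyte
    "PB": 10**15,  # petabyte
    "EB": 10**18,  # exabyte
    "ZB": 10**21,  # zettabyte
}


def convert(value, from_unit, to_unit):
    """Convert a quantity of data between two units listed in UNITS."""
    return value * UNITS[from_unit] / UNITS[to_unit]


def drives_needed(total, unit, drive_size_tb=1):
    """Number of drives of the given size (in TB) needed to hold `total` units."""
    total_bytes = total * UNITS[unit]
    drive_bytes = drive_size_tb * UNITS["TB"]
    # Ceiling division, so a partially filled drive still counts as one drive.
    return -(-total_bytes // drive_bytes)


if __name__ == "__main__":
    print(convert(1, "PB", "TB"))   # terabytes in one petabyte
    print(drives_needed(1, "ZB"))   # 1 TB drives needed to hold one zettabyte
```

The point of the arithmetic is that a zettabyte is a billion terabytes, which is why data at this scale cannot live on a single machine.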
Challenges with Big Data
Capture
Storage
Curation
Search
Analysis
Transfer
Visualization
Privacy
Characteristics of Big Data
Big Data is defined by three characteristics:
• Extremely large Volume of data
• Extremely high Velocity of data
• Extremely wide Variety of data
Other characteristics of data which are not definitional for Big Data
• Veracity and Validity: deal with abnormality, accuracy and correctness of the data.
• Volatility: deals with how long the data remains valid.
• Variability: deals with data flow, which is highly inconsistent.
Why Big Data?
• More data leads to more accurate analysis.
• More accurate analysis leads to more confidence in decision making.
• More confident decisions lead to greater impact in terms of enhancing operational efficiency, reducing cost and time, innovating new products and services, optimizing offerings, etc.
Are we only consumers, or also information producers?
Consider one scenario:
1. A text message inviting you to attend a party.
2. Use of a credit/debit card at the petrol pump.
3. The point-of-sale system at Archie's shop.
4. Photographs and posts on social networking sites.
5. Likes and comments on your post.
Each of these everyday actions produces data.
BI Versus Big Data
Business Intelligence (BI)
1. All of the enterprise's data is housed in a central server.
2. A typical database server scales vertically.
3. BI data is analyzed in offline mode.
4. BI is about structured data.
5. Move data to code.
Big Data
1. Data resides in a distributed file system.
2. A distributed file system scales horizontally.
3. Big Data is analyzed in both real-time and offline mode.
4. Big Data is about a variety of data.
5. Move code to data.
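The difference between "move data to code" and "move code to data" can be sketched in a few lines. This is a single-process illustration, not a real cluster: the node names and sales figures are hypothetical, and the "network cost" is simply the number of values that would have to be shipped.

```python
# Hypothetical cluster: each "node" holds one partition of sales records.
PARTITIONS = {
    "node-1": [120, 80, 45],
    "node-2": [300, 10],
    "node-3": [60, 60, 60, 20],
}


def move_data_to_code(partitions):
    """BI style: ship every record to a central server, then compute there."""
    shipped = [amount for records in partitions.values() for amount in records]
    return sum(shipped), len(shipped)  # (result, values sent over the network)


def move_code_to_data(partitions):
    """Big Data style: run the function where the data lives, ship partial results."""
    partials = [sum(records) for records in partitions.values()]  # computed "locally"
    return sum(partials), len(partials)  # (result, values sent over the network)


if __name__ == "__main__":
    print(move_data_to_code(PARTITIONS))
    print(move_code_to_data(PARTITIONS))
```

Both functions return the same total, but the second ships one value per node instead of one value per record, and the gap widens as partitions grow.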
Typical Data Warehouse Environment
[Figure: source systems (ERP, Enterprise Resource Planning; CRM, Customer Relationship Management; third-party apps; legacy systems) feed the Data Warehouse, which supports reporting/dashboarding, OLAP, ad hoc querying and modeling.]
Typical Hadoop Environment
[Figure: sources (web logs, images and videos, docs and PDFs, social media) feed Hadoop (HDFS and MapReduce), which connects to operational systems, the data warehouse, data marts and the ODS (Operational Data Store).]
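As a rough illustration of the MapReduce programming model used in a Hadoop environment, here is a minimal single-process sketch that counts HTTP status codes in web-log lines. It does not use the Hadoop API; the log format and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_job`) are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical web-log lines in the form "<client> <path> <HTTP status>".
LOG_LINES = [
    "10.0.0.1 /index.html 200",
    "10.0.0.2 /missing 404",
    "10.0.0.1 /about.html 200",
    "10.0.0.3 /index.html 200",
    "10.0.0.2 /private 403",
]


def map_phase(line):
    """Map: emit one (key, value) pair per input record."""
    status = line.split()[-1]
    yield status, 1


def shuffle_phase(pairs):
    """Shuffle: group all values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(LOG_LINES))
```

In a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the shuffle moves data across the network; the structure of the job stays the same.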
Functional Requirements of Big Data
Big Data passes through four stages:
(1) Collection
(2) Integration
(3) Analysis
(4) Actions / Decisions
Big Data Stack
• The Big Data technical stack describes a layered architecture.
• It is a way to think about Big Data.
• It deals with
– Storage
– Analytics
– Reporting
– Applications
• Let's watch this video.
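Layer 0: Redundant Physical Infrastructure
Layer 1: Security Infrastructure
Layer 2: Operational Databases
Layer 3: Organizing Data Services and Tools
Layer 4: Analytical Data Warehouses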
Layer 0 (Redundant Physical Infrastructure):
Deals with hardware, network and so on.
• Performance: How responsive does the system need to be? Very fast infrastructure tends to be very expensive.
• Availability: Do you need a 100% uptime guarantee of service? Highly available infrastructure is very expensive.
• Scalability: How big does your infrastructure need to be? How much disk space is needed?
• Flexibility: How quickly can you add more resources to the infrastructure?
• Cost: What can you afford?
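The availability question can be made concrete with a little arithmetic. The sketch below converts an availability target into expected downtime per year and shows why redundancy helps. It assumes component failures are independent, which real infrastructure only approximates.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(availability):
    """Expected yearly downtime for an availability given as a fraction (0.999 = 99.9%)."""
    return (1 - availability) * HOURS_PER_YEAR


def redundant_availability(availability, copies):
    """Availability of `copies` redundant components, assuming independent failures.

    The service is down only when every copy is down at the same time.
    """
    return 1 - (1 - availability) ** copies


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        print(f"{target:.2%} uptime allows {downtime_hours_per_year(target):.2f} h/year of downtime")
    print(f"two 99% servers in parallel: {redundant_availability(0.99, 2):.4%}")
```

Under the independence assumption, two inexpensive 99% components in parallel reach roughly the availability of a single 99.99% component, which is the economic argument for redundant physical infrastructure.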
Layer 1 (Security Infrastructure):
Security and privacy requirements for Big Data are similar to the requirements for conventional data environments.
• Data Access: Data should be available only to authorized persons.
• Application Access: Most APIs offer protection from unauthorized usage or access.
• Data Encryption: This is the most challenging aspect of security in a Big Data environment.
• Threat Detection: The inclusion of mobile devices and social networks exponentially increases both the amount of data and the opportunities for security threats.
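One common way an API can protect itself from unauthorized usage is to require each request to carry a signature computed with a shared secret. The following is a minimal sketch of that idea using HMAC-SHA256 from the Python standard library; the secret, the request strings and the function names are hypothetical, and a production system would also need key management, replay protection and transport encryption.

```python
import hashlib
import hmac

# Hypothetical shared secret issued to one authorized client application.
API_SECRET = b"example-secret-do-not-reuse"


def sign(message: bytes, secret: bytes = API_SECRET) -> str:
    """Return a hex HMAC-SHA256 signature for the request body."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def is_authorized(message: bytes, signature: str, secret: bytes = API_SECRET) -> bool:
    """Accept the request only if the signature matches; compare in constant time."""
    expected = sign(message, secret)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    request = b"GET /reports"
    signature = sign(request)
    print(is_authorized(request, signature))          # signed by the right client
    print(is_authorized(b"GET /admin", signature))    # body was changed after signing
```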
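Layer 2 (Operational Databases):
• A Big Data environment needs a fast and scalable database engine.
• Using an RDBMS for every Big Data workload is often not a practical solution because of performance, scale or cost.
• Choose the proper database for the type of data you need to handle.
• Your database must support ACID (Atomicity, Consistency, Isolation, Durability) transactional behavior.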
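The atomicity and consistency parts of ACID can be demonstrated with Python's built-in `sqlite3` module. This is a small sketch, not a Big Data database: the `accounts` table, the names and the balances are hypothetical, and the `CHECK` constraint stands in for a business rule that must never be violated.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


if __name__ == "__main__":
    conn = make_db()
    print(transfer(conn, "alice", "bob", 30), balances(conn))
    print(transfer(conn, "alice", "bob", 500), balances(conn))
```

In the failing transfer, the credit to the destination account has already been applied when the debit violates the constraint; the rollback undoes both, so no partial update is ever visible.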
Layer 3 (Organizing Data Services and Tools):
Organizing data services and tools capture, validate and assemble various Big Data elements into contextually relevant collections.
Because Big Data is massive, these tools need to provide integration, translation, normalization and scale.
Technologies in this layer are as follows:
• A Distributed File System
• Serialization Services
• Coordination Services
• Extract, Transform and Load (ETL) Tools
• Workflow Services
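The capture, validate and assemble steps of this layer map naturally onto an Extract, Transform and Load (ETL) flow. The sketch below shows the three steps on a tiny in-memory CSV; the column names, the sample rows and the use of a plain list as the target store are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical raw export with inconsistent casing, stray whitespace and one bad row.
RAW_CSV = """customer,country,amount
 Alice ,in,120.50
BOB,IN,80
carol,us,not-a-number
"""


def extract(text):
    """Extract: read rows from the source as dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: validate and normalize each row; skip rows that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # validation failed, so the row is dropped
        clean.append({
            "customer": row["customer"].strip().title(),
            "country": row["country"].strip().upper(),
            "amount": amount,
        })
    return clean


def load(rows, target):
    """Load: append the cleaned rows to the target store (a list stands in here)."""
    target.extend(rows)
    return len(rows)


if __name__ == "__main__":
    warehouse = []
    loaded = load(transform(extract(RAW_CSV)), warehouse)
    print(loaded, warehouse)
```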
Layer 4 (Analytical Data Warehouses):
• The data warehouse and data marts contain normalized data gathered from a variety of sources and assembled to facilitate analysis of the business.
• They are used for the creation of reports and the visualization of disparate data items.
Big Data Analytics:
Big Data analytics requires proper analytical tools.
This architecture lists three classes of tools:
• Reporting and dashboards: these tools provide a "user-friendly" representation of information.
• Visualization
• Analytics and Advanced Analytics
Big Data Applications:
You need to choose the categories of applications.
