Big Data Analytics
Kalimullah Lone
Department of Information Technology
National Institute of Technology, Srinagar
Topics to be Covered
 Big Data
 Big Data Evolution
 Types of Data
 Elements of Big Data
 Big Data Challenges
 Big data trends
 Big Data Analytic Process
 Applications
 Data Analytics
Big Data
 Extremely large data sets that may be analyzed computationally to reveal patterns, trends and
associations
 The process of capturing or collecting Big Data is called Datafication
 Big Data refers to data sets whose size makes it difficult for commonly used software tools
to capture, interpret, manage, and process them within a reasonable time frame.
Big Data Evolution
 1940s- An American librarian anticipated a shortage of shelves and storage space due
to the rapid increase in information
 1960s- A paper named “Automatic Data Compression” aimed to reduce the
size of data to increase storage capacity and the rate of data transmission
 1970s- Japan's Ministry of Telecommunication started a project to study
information flow and to track the volume of information circulating in the
country
 1990s- Digital storage systems became more economical. John Mashey used
the term Big Data in his research work
 2000 onwards- Methods to streamline information emerged, the 3Vs of data were
defined, and data from countless sources made data big
Types of Data
 Structured
 Unstructured
Structured Data
 A data set that has defined, repeating patterns
 The patterns make it easy to sort, read and process the data
 It is stored in tabular form
 The main sources of structured data are relational databases, comma-separated
values (CSV) files and multidimensional databases (used in data
warehouse technology)
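
The point about tabular data being easy to sort, read and process can be sketched in a few lines of Python. This is only an illustration: the CSV content, the column names (`order_id`, `customer`, `amount`) and the helper functions are hypothetical and not taken from the slides.

```python
import csv
import io

# Hypothetical CSV content; column names are illustrative only.
RAW_CSV = """order_id,customer,amount
1,Asha,250.0
2,Bilal,120.5
3,Asha,75.0
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one dict per row."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


def total_by_customer(rows):
    """Sum the 'amount' column for each customer."""
    totals = {}
    for row in rows:
        name = row["customer"]
        totals[name] = totals.get(name, 0.0) + float(row["amount"])
    return totals


rows = load_rows(RAW_CSV)
# Because every row has the same fields, sorting is a one-liner.
rows_sorted = sorted(rows, key=lambda r: float(r["amount"]))
totals = total_by_customer(rows)
```

Because every record follows the same pattern, sorting and aggregation need no per-record special cases.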
Unstructured Data
 A data set that might or might not have logical or repeating patterns
 It consists of inconsistent data obtained from social media, email, audio,
text, images, videos and satellites
 About 80% of enterprise data is unstructured in nature
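
As a rough illustration of what working with unstructured text involves, the sketch below pulls a small amount of structure (hashtag counts) out of free-form posts. The sample posts and the `extract_hashtags` helper are hypothetical; real social media data would need far more cleaning.

```python
import re
from collections import Counter

# Hypothetical free-text posts; there are no fixed fields to rely on.
POSTS = [
    "Loving the new phone! #tech #gadgets",
    "Flight delayed again... #travel",
    "Big data meets IoT #Tech",
]


def extract_hashtags(text):
    """Return the lower-cased hashtags found in a piece of free text."""
    return [tag.lower() for tag in re.findall(r"#(\w+)", text)]


# Count how often each hashtag appears across all posts.
tag_counts = Counter(tag for post in POSTS for tag in extract_hashtags(post))
```

Unlike the tabular case, the structure here has to be imposed by the analyst before any sorting or counting is possible.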
Data Sources of Unstructured Data [Data collected on 12/03/2017]
Big Data Elements
 Volume
 Velocity
 Variety
Big Data Size
Big Data Challenges
 Meeting the need for speed
 Understanding the data
 Addressing data quality
 Displaying meaningful results
 Dealing with outliers
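
One common way to deal with outliers is the interquartile range (IQR) rule. The slides do not name a method, so the sketch below is an assumption: it flags values lying more than 1.5 IQRs outside the quartiles, and the sample readings are hypothetical.

```python
from statistics import quantiles


def find_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sensor readings with one suspicious value.
readings = [10, 12, 11, 13, 12, 11, 95]
outliers = find_outliers(readings)
```

Whether a flagged value is an error to discard or a signal worth investigating remains a judgment call for the analyst.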
Big Data Trends
 Movement to the cloud
 Aggregation of digital, unstructured and machine (IoT) data
 The use of more dark data [information inside papers, images, videos]
 Big data is no longer just Hadoop
 Variety, not volume or velocity, drives big data
 Spark and machine learning light up big data
 Convergence of IoT, cloud, and big data
Reporting
 A process in which data is organized and summarized in an easy-to-understand
format
 It enables organizations to monitor various performance parameters and
improve customer satisfaction
 A process in which raw data is transformed into useful information
 Reports help organizations to understand what is happening
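
A minimal sketch of reporting, assuming hypothetical raw records of the form (region, response time): the raw data is grouped and summarized so that a reader can see what is happening at a glance. The field names and figures are illustrative only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records: (region, response_time_seconds)
RAW = [("north", 30), ("south", 45), ("north", 50), ("south", 15), ("north", 40)]


def build_report(records):
    """Group raw records by region and summarize each group."""
    grouped = defaultdict(list)
    for region, value in records:
        grouped[region].append(value)
    return {
        region: {"count": len(vals), "mean": mean(vals), "max": max(vals)}
        for region, vals in sorted(grouped.items())
    }


report = build_report(RAW)
for region, stats in report.items():
    print(f"{region:<6} count={stats['count']} mean={stats['mean']:.1f} max={stats['max']}")
```

The report says what the response times are per region; explaining why they differ is the job of analysis.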
Analysis
 Analysis is a process in which data and reports are examined to get insights
from them
 These insights help organizations to plan new strategies, introduce new
products and improve customer satisfaction
 Analysis, in simple terms, transforms information into insights
 Analysis helps organizations to understand why it is happening and what
action can be taken for it
 It provides answers to the questions being raised
Two perspectives of Analysis
 Decision-oriented analysis:
 Traditional business intelligence approach
 Uses results of analysis in the process of making business decisions
 Action-oriented analysis:
 It is used when a critical situation calls for a rapid response or action
 A particular pattern emerges or specific types of data are detected, and an appropriate
action must be taken
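
Action-oriented analysis can be pictured as a rule that watches incoming data and fires when a pattern appears. The sketch below is one possible reading, not a method from the slides: it raises an alert when a hypothetical reading stays above a threshold for several consecutive samples.

```python
def monitor(readings, threshold, window):
    """Return the indexes at which `window` consecutive readings exceeded `threshold`."""
    streak = 0
    alerts = []
    for i, value in enumerate(readings):
        # Extend the streak on a high reading, reset it otherwise.
        streak = streak + 1 if value > threshold else 0
        if streak == window:
            alerts.append(i)  # the pattern has just been detected
    return alerts


# Hypothetical temperature stream; alert after 3 consecutive readings above 90.
stream = [70, 72, 91, 93, 95, 80, 92, 94, 96]
alerts = monitor(stream, threshold=90, window=3)
```

In a real system each alert index would trigger an action immediately instead of being collected in a list.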
Big Data Analytic Process
 1) Business Understanding:
 Determine business objectives
 Assess situation
 Determine data mining goals
 Produce a project plan
 2) Data Collection:
 Collect initial data
 Describe data
 Explore data
 Verify data quality
 3) Data Preparation:
 Select data
 Clean data
 Construct data
 Integrate data
 Format data
 4) Data modelling:
 Select modelling technique
 Generate test design
 Build model
 Assess model
 5) Data evaluation:
 Evaluate results
 Review process
 Determine next steps
 6) Deployment:
 Plan deployment
 Plan monitoring & maintenance
 Produce final report
 Review project
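
Steps 3 to 5 above (data preparation, modelling, evaluation) can be sketched on a toy problem. Everything here is an assumption made for illustration: the records, the choice of a least-squares line as the modelling technique, and the hold-out of the last row as the test design.

```python
from statistics import mean

# Hypothetical raw records (ad_spend, sales); None marks a missing value.
RAW = [(1.0, 3.1), (2.0, 4.9), (None, 7.0), (3.0, 7.2),
       (4.0, 8.8), (5.0, None), (6.0, 13.1)]


def prepare(records):
    """Data preparation: keep only rows where both fields are present."""
    return [(x, y) for x, y in records if x is not None and y is not None]


def fit_line(points):
    """Modelling: ordinary least squares fit of y = a * x + b."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    a = sxy / sxx
    return a, y_bar - a * x_bar


def mean_abs_error(model, points):
    """Evaluation: average absolute gap between predictions and actual values."""
    a, b = model
    return mean([abs(y - (a * x + b)) for x, y in points])


clean = prepare(RAW)
train, test = clean[:-1], clean[-1:]  # test design: hold out the last row
model = fit_line(train)
error = mean_abs_error(model, test)
```

If the error on the held-out data is too large, the "determine next steps" stage would typically loop back to data preparation or to a different modelling technique.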
Types of Big Data Analytics
 Descriptive Analytics: [Deals with the past]
 It tells us, “What happened in the business?”
 Provides trends of past or current business events
 It performs in-depth analysis of historical and current data to reveal the reasons for
success and failure
 Predictive Analytics: [Deals with the future]
 It is about understanding and predicting the future
 It answers the question, “What could happen?”
 It uses statistics, data mining techniques and machine learning to forecast future outcomes
 Prescriptive Analytics: [Deals with both]
 It answers, “What should we do?”, on the basis of complex data obtained from descriptive
and predictive analytics
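
The three types can be contrasted on one small, hypothetical demand series. The forecasting rule (last value plus the average month-over-month change) and the reorder rule are deliberately naive assumptions chosen for brevity, not techniques prescribed by the slides.

```python
from statistics import mean

# Hypothetical monthly demand for one product.
monthly_demand = [100, 110, 120, 130]


def describe(history):
    """Descriptive: summarize what happened."""
    return {"mean": mean(history), "last": history[-1]}


def predict_next(history):
    """Predictive: naive forecast, last value plus the average monthly change."""
    changes = [later - earlier for earlier, later in zip(history, history[1:])]
    return history[-1] + mean(changes)


def prescribe(forecast, stock):
    """Prescriptive: recommend an action based on the forecast."""
    return "reorder" if forecast > stock else "hold"


summary = describe(monthly_demand)
forecast = predict_next(monthly_demand)
action = prescribe(forecast, stock=125)
```

Each stage consumes the output of the previous one, which is why prescriptive analytics is described as dealing with both past and future.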
Application Area
 Transportation
 Education
 Travel
 Government [Elections]
 Healthcare
 Telecom
 Consumer goods industry
 Aviation industry
 Social media
 Insurance sector [relation with social media]
Skills Required
 Technical Skills:
 Understanding Hadoop components, such as HDFS, MapReduce, Pig, Hive, etc.
 Knowledge of Natural Language Processing
 Knowledge of statistical analysis methods
 Knowledge of Machine Learning
 Soft Skills:
 Basic understanding of how data sets organize data
 Analytical ability