Arun Kumar
MSc (Computer Science),
Don Bosco College, Yelagiri Hills.
10/3/2018
Outline
 Big Data : An Introduction
 Big Data Analytics
 Big Data Analytics : Applications and Business
prosperity
 Big Data Technology
 Big Data : Issues and Challenges
 Conclusion
Big Data:
An Introduction
Introduction
 Data
 Facts and pieces of information collected together
for reference or analysis
 Information processed or stored by computers and
other electronic devices
 Text, image, audio, video, etc.
Introduction
 Big data is similar to data, but it does not behave the
same way
 The term ‘big data’ applies to information that cannot be
processed or handled using traditional processes or tools
1 byte = 8 bits
1 kilobyte = 1024 bytes
1 megabyte = 1024 kilobytes
1 gigabyte = 1024 megabytes
1 terabyte = 1024 gigabytes
1 petabyte = 1024 terabytes
1 exabyte = 1024 petabytes
1 zettabyte = 1024 exabytes
1 yottabyte = 1024 zettabytes
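The unit ladder on this slide can be sketched in a few lines of Python. This is only an illustration: it assumes the slide's factor of 1024 between neighbouring units, and the function names `bytes_in` and `to_largest_unit` are made up for this example.

```python
# Unit names follow the ladder on the slide; each step is a factor of 1024.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def bytes_in(unit):
    """Number of bytes in one `unit`, using the slide's factor of 1024."""
    return 1024 ** UNITS.index(unit)


def to_largest_unit(num_bytes):
    """Divide by 1024 until the value drops below 1024 or the units run out."""
    value = float(num_bytes)
    index = 0
    while value >= 1024 and index < len(UNITS) - 1:
        value /= 1024
        index += 1
    return value, UNITS[index]


if __name__ == "__main__":
    value, unit = to_largest_unit(5 * bytes_in("petabyte"))
    print(f"{value:g} {unit}")
```

For example, `to_largest_unit(1536)` gives 1.5 kilobytes, because 1536 / 1024 = 1.5.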
Definition
 There is no single standard definition.
 Big data is high-volume, high-velocity and high-
variety information assets that demand cost-effective,
innovative forms of information processing for enhanced
insight and decision making.
-Gartner.
 “Big data exceeds the reach of commonly used hardware
environments and software tools to capture, manage,
and process it within a tolerable elapsed time for its user
population.” - Teradata Magazine article, 2011.
Introduction
 Characteristics of Big Data
(Diagram: the three characteristics of Big Data: Volume, Velocity, Variety.)
Introduction
 Characteristics of Big Data.
 Volume:
 Huge size of data (terabytes to petabytes) at rest.
 Velocity:
 Data in motion (streaming data).
 Variety:
 Varieties of data (image, audio, text, video, etc.).
Introduction
 Characteristics of Big Data
 Now researchers include more V’s
 Veracity
 Value
 Variability
...
 Victory
Volume
Variety
Velocity
Sources of Big Data
 What is big data?
 Every day, we create 2.5 quintillion bytes of data
— so much that 90% of the data in the world today has been created
in the last two years alone.
 Data comes from everywhere:
 sensors used to gather climate information
 posts to social media sites
 digital pictures and videos
 purchase transaction records
 cell phone GPS signals, etc.
 This data is big data.
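To get a feel for the figure quoted above, the daily volume can be turned into a per-second rate. This is a rough sketch: it reads "quintillion" as 10^18 (short scale) and uses decimal units, unlike the factor-of-1024 ladder shown earlier. The function names are invented for this example.

```python
BYTES_PER_DAY = 2.5e18          # 2.5 quintillion bytes, as quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def bytes_per_second(bytes_per_day=BYTES_PER_DAY):
    """Average creation rate implied by a daily volume."""
    return bytes_per_day / SECONDS_PER_DAY


def days_to_accumulate(target_bytes, bytes_per_day=BYTES_PER_DAY):
    """How many days it takes to reach `target_bytes` at a constant daily rate."""
    return target_bytes / bytes_per_day


if __name__ == "__main__":
    rate_tb = bytes_per_second() / 1e12   # decimal terabytes per second
    print(f"about {rate_tb:.1f} TB created per second")
```

At this rate the world would be producing data on the order of tens of decimal terabytes every second, which is one way to picture why traditional tools struggle.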
Sources of Big Data
 Web & Ecommerce
 Bank/Credit card transactional
 Mobile
 Social
 Video & Preference
 Machine & Sensor
 Retail POS
Together, these sources become big data.
Who is generating big data?
 The Model of Generating/Consuming Data has
Changed
Old Model: A few companies generate data; all others consume
data
New Model: All of us generate data, and all of us consume
data
What does Big Data look like?
(Figure contrasting “what we know or see” with “what’s actually there”.)
Area of Applications
 Health care / Biotech.
 E-Governance.
 Social Networks / Social Media.
 Weather Forecasting.
 Education data.
Area of Applications
 Banking / Insurance / Finance.
 Retail industries.
 CRM / Customer Analytics.
 Airways, etc.
Big Data Analytics
Definition
 Big data analytics is the process of examining
enormous amounts of data of a variety of types to
uncover hidden patterns, unknown correlations and other
useful information.
 Example:
Searches in “friends” networks at social-networking
sites involve graphs with hundreds of millions of nodes
and many billions of edges.
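The “friends” network example can be pictured with a toy graph. The sketch below is not how any particular social-networking site works; the people, the `FRIENDS` dictionary and the `within_hops` function are all invented for illustration. It uses breadth-first search to find everyone within a given number of friendship links.

```python
from collections import deque

# Hypothetical toy friendship graph: person -> set of direct friends (edges).
FRIENDS = {
    "ana": {"ben", "cy"},
    "ben": {"ana", "dee"},
    "cy": {"ana", "dee"},
    "dee": {"ben", "cy", "eli"},
    "eli": {"dee"},
}


def within_hops(graph, start, max_hops):
    """Breadth-first search: people reachable from `start` in at most `max_hops` edges."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in graph.get(person, ()):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    del distance[start]  # the starting person is not their own friend
    return distance


if __name__ == "__main__":
    print(within_hops(FRIENDS, "ana", 2))
```

On a graph with hundreds of millions of nodes the same search no longer fits on one machine, which is why the later slides turn to distributed processing.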
Why Is Big Data Analytics Feasible?
 Increased storage capacities
 Next generation products
 Cost Reduction
 Faster and better decision making
 Communication networking
 Improved services or products
 Distributed processing technologies
Stages in Big Data Analytics
Available Analytic Methods
 Traditional Data Processing systems
 Information Processing using statistical tools
 Knowledge Engineering and Intelligence Systems
 Business Analytics using Data mining
 Business Intelligence
 Genetic Algorithms
 Machine learning algorithms
 Exploratory data analysis, etc.
Types of Big Data Analytics
 Descriptive: What has happened?
 Predictive: What will happen?
 Prescriptive: What should happen?
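The three types of analytics can be contrasted on one small data set. Everything here is an assumption made for illustration: the sales figures are invented, the prediction is a plain least-squares line, and the "prescription" is a toy reorder rule rather than a real optimization method.

```python
from statistics import mean

# Hypothetical monthly sales figures, invented for this example.
SALES = [100.0, 110.0, 120.0, 130.0]


def describe(history):
    """Descriptive: summarize what has happened."""
    return {"mean": mean(history), "min": min(history), "max": max(history)}


def predict_next(history):
    """Predictive: fit a least-squares line to (0, y0), (1, y1), ... and extrapolate one step.

    Needs at least two data points.
    """
    n = len(history)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(history)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return intercept + slope * n


def prescribe(history, stock_on_hand):
    """Prescriptive: recommend an action based on the forecast (toy rule)."""
    shortfall = predict_next(history) - stock_on_hand
    return f"order {shortfall:g} units" if shortfall > 0 else "no order needed"


if __name__ == "__main__":
    print(describe(SALES))
    print(predict_next(SALES))
    print(prescribe(SALES, stock_on_hand=120))
```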
The Cycle of Big Data Management
 Capture
 Organize
 Integrate
 Analyze
 Act
Activities in Analytics
 Analysis of data is a process of inspecting, cleaning,
transforming, and modeling data with the goal of discovering
useful information, suggesting conclusions, and supporting
decision-making.
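The four activities named on this slide (inspecting, cleaning, transforming, modeling) can be walked through on a tiny example. This is a minimal sketch under stated assumptions: the readings in `RAW` are invented, and the "model" is deliberately the simplest one possible, a constant equal to the mean.

```python
from statistics import mean

# Hypothetical raw sensor readings; None and "n/a" stand in for bad records.
RAW = ["21.5", "22.0", None, "n/a", "23.5", "22.0"]


def _to_float(record):
    """Return the record as a float, or None if it cannot be parsed."""
    try:
        return float(record)
    except (TypeError, ValueError):
        return None


def inspect(records):
    """Inspecting: count how many records are missing or non-numeric."""
    bad = sum(1 for r in records if _to_float(r) is None)
    return {"total": len(records), "bad": bad}


def clean(records):
    """Cleaning: drop records that cannot be parsed as numbers."""
    return [r for r in records if _to_float(r) is not None]


def transform(records):
    """Transforming: convert the surviving strings to floats."""
    return [float(r) for r in records]


def model(values):
    """Modeling: a constant model equal to the mean of the values."""
    return mean(values)


if __name__ == "__main__":
    print(inspect(RAW))
    print(model(transform(clean(RAW))))
```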
Why Are New Analytical Methods Needed?
 Big in Size – (Volume)
 Unstructured data – (Variety)
 To analyze the streaming data (High-Velocity)
 Distributed
 Need of parallel analytics
Big Data Technology
Key Technologies for Big data
 DFS (Distributed File System):
 Large files are split into parts
 Move file parts into a cluster
 Fault-tolerant through replication across nodes while being
rack-aware
 MapReduce:
Move algorithms close to the data by structuring them for
parallel execution so that each task works on a part of the data. The
power of Simplicity!
 NoSQL:
A NoSQL (often interpreted as Not Only SQL) database
provides a mechanism for storage and retrieval of data that is modeled
in means other than the tabular relations used in relational databases.
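The MapReduce idea described above (each task works on a part of the data, and the partial results are then combined) is often introduced with a word count. The sketch below runs on a single machine in plain Python and does not use Hadoop or any other framework; the function names are chosen for this example only.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key, here by summing them."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    # In a real cluster each chunk could be mapped on a different node;
    # here the chunks are simply processed one after another.
    emitted = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(emitted))


if __name__ == "__main__":
    print(word_count(["big data big", "data is big"]))
```

Because `map_phase` looks at one chunk only, the chunks can be processed independently, which is what allows the work to be moved to where the data is stored.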
Key Technologies for Big data
Three key technologies that can help to handle big data:
 Information management for big data: Manage data as
a strategic, core asset, with ongoing process control
 High-performance analytics for big data: Gain rapid
insights from big data and the ability to solve increasingly
complex problems
 Flexible deployment options for big data: Choose
between on-premises or hosted, software-as-a-service
(SaaS) approaches
Technologies for Big data
 Fast Processors and Massively Parallel Processing
(MPP)
 Distributed File System
 Apache Hadoop
 Data Intensive Computing Strategies
 Low-cost storage, In-Memory Processing
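The distributed file system idea from the earlier slide (split large files into parts, spread the parts over a cluster, replicate them for fault tolerance) can be sketched as follows. This is a simplified illustration, not the placement policy of HDFS or any real system: replicas are assigned round-robin, rack awareness is ignored, and the node names are hypothetical.

```python
def split_into_blocks(data, block_size):
    """Split a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Round-robin placement: each block is stored on `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [nodes[(block + r) % len(nodes)]
                            for r in range(replication)]
    return placement


def readable_after_failure(placement, failed_node):
    """The file stays readable if every block still has a surviving replica."""
    return all(any(node != failed_node for node in replicas)
               for replicas in placement.values())


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    layout = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"], replication=2)
    print(layout)
    print(readable_after_failure(layout, "n2"))
```

With two or more replicas per block, losing any single node leaves at least one copy of every block, which is the fault tolerance the slide refers to.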
Technologies for Big data
 Hadoop Distributions
 Hortonworks
 Cloud Operating System
 Cloud Foundry: by VMware
 OpenStack: worldwide participation and well-known
companies
 Storage
 Fusion-io: not open source, but very supportive of open
source projects; flash-aware applications.
Development Platforms and Tools
 Python: general-purpose programming language, popular for data analysis.
 Mahout: machine learning library (Apache Mahout).
 R: language and environment for statistical computing, popular for data mining.
 Storm: stream processing, open-sourced by Twitter.
 Giraph: graph processing, used at scale by Facebook.
Databases
 NoSQL Databases
 MongoDB
 Cassandra
 HBase (Hadoop)
 SQL Databases
 MySQL: owned by Oracle
 PostgreSQL: object-relational database
 TokuDB: storage engine that improves RDBMS performance
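To make the difference between the NoSQL and SQL databases listed on this slide more concrete, the sketch below contrasts a fixed-column table with a collection of documents. It uses only Python dictionaries and tuples and does not call the API of MongoDB, Cassandra, HBase or any other product; the records and the `find` helper are invented for illustration.

```python
# Tabular (relational) view: every row has the same fixed columns.
COLUMNS = ("id", "name", "city")
ROWS = [(1, "Asha", "Chennai"), (2, "Ravi", "Vellore")]

# Document view: each record describes itself, and documents in the
# same collection may carry different fields, including nested ones.
DOCUMENTS = {
    "1": {"name": "Asha", "city": "Chennai", "tags": ["student"]},
    "2": {"name": "Ravi", "devices": [{"type": "phone", "os": "android"}]},
}


def row_to_document(row):
    """Turn one table row into a document keyed by column name."""
    return dict(zip(COLUMNS, row))


def find(collection, **criteria):
    """Return ids of documents whose top-level fields match all criteria."""
    return [doc_id for doc_id, doc in collection.items()
            if all(doc.get(key) == value for key, value in criteria.items())]


if __name__ == "__main__":
    print(row_to_document(ROWS[0]))
    print(find(DOCUMENTS, name="Asha"))
```

Note that the second document has no `city` field at all; a relational table would need a NULL or a schema change to represent the same thing.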
Visualization tools
 Maps
 Charts (pie, bar, plot, etc.)
 Graphs
Big Data: Issues &
Challenges
Challenges
The bottleneck is:
 In technology
 New architecture, algorithms, techniques are needed
 Also in technical skills
 Lack of experts in using the new technology
(Figure: data sources feeding into Big Data Analytics.)
Challenges: Internet of Things related
 The amount of data that needs to be sorted, improved,
integrated, analyzed and managed is huge.
 Sensor devices constantly chatter updates about
moisture, light and movement.
 A real-time stream data analytics platform that can handle
Big Data, and a scalable infrastructure to support it, are needed.
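Stream analytics for sensor data means updating results as each reading arrives instead of storing everything first. A minimal sketch of that idea is a rolling mean over the most recent readings. The class name `RollingMean`, the window size and the sample readings are assumptions made for this example, and a real platform would also distribute the work across machines.

```python
from collections import deque


class RollingMean:
    """Keep the mean of the most recent `size` readings (size must be at least 1)."""

    def __init__(self, size):
        self.window = deque(maxlen=size)
        self.total = 0.0

    def update(self, reading):
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]  # oldest value is about to be evicted
        self.window.append(reading)
        self.total += reading
        return self.total / len(self.window)


def process(stream, size=3):
    """Consume readings one at a time, as they would arrive from a sensor."""
    rolling = RollingMean(size)
    for reading in stream:
        yield rolling.update(reading)


if __name__ == "__main__":
    moisture = [10, 20, 30, 40]  # hypothetical sensor readings
    print(list(process(moisture, size=3)))
```

Each update costs the same small amount of work regardless of how many readings have already passed, which is the property a streaming system needs.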
Challenges: Cloud computing related
 Traditional WAN-based transport methods cannot move
terabytes of data at the speed dictated by businesses
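A back-of-the-envelope calculation shows why moving terabytes over a WAN is slow. The link speeds and the `efficiency` factor below are hypothetical values chosen for round numbers, and the estimate ignores latency, protocol overhead and retransmissions.

```python
TERABYTE = 10 ** 12   # decimal terabyte, chosen for round numbers
MBPS = 10 ** 6        # one megabit per second


def transfer_seconds(num_bytes, link_bits_per_second, efficiency=1.0):
    """Time to push `num_bytes` through a link at the given usable fraction of its speed."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return (num_bytes * 8) / (link_bits_per_second * efficiency)


def hours(seconds):
    return seconds / 3600


if __name__ == "__main__":
    seconds = transfer_seconds(TERABYTE, 100 * MBPS)
    print(f"1 TB over a 100 Mbit/s link: about {hours(seconds):.1f} hours")
```

Under these assumptions a single terabyte already takes most of a day on a 100 Mbit/s link, so tens of terabytes can take weeks.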
Classified Issues & Challenges
 Storage
 Management
 Processing
 Visualization
Challenges: Storage related
 Not enough hard disks/storage devices for the volume of data.
 Distributed storage alone is still not enough; manufacturers
cannot make storage devices fast enough.
 Write speed to devices; need for bigger data paths/data buses.
Challenges: Management related
 Data Collection
 Organize the varieties of data
 Need of distributed environments
 Need of new analytical methodology
Challenges: Processing related
 Integrating data using Filters
 “What” data and “how”?
 Effective Data processing system Design
 Latency and Bandwidth
 Streaming data processing
Challenges: Big data visualization
 Meeting the need for speed
 Understanding the data
 Addressing data quality
 Displaying meaningful results
Conclusion
For Researchers
 Research institutes and companies are recruiting more data
scientists for research and development.
 Research opportunities in R&D exist in fields
such as
 Telecom industry
 Retail industry
 Social networks
 Healthcare industry, and so on.
For Students
 Develop deep analytical skills to qualify for analyst positions
 Build basic knowledge of optimization techniques, data
mining, machine learning algorithms, etc.
 Keep an eye on evolving technologies
Thank you

Big Data analytics

  • 1. Arun Kumar MSc(Computer Science), Don Bosco College Yelagirihills. 10/3/20181 Don Bosco College, Yelagiri hills.
  • 2. Outline: Big Data: An Introduction; Big Data Analytics; Big Data Analytics: Applications and Business Prosperity; Big Data Technology; Big Data: Issues and Challenges; Conclusion.
  • 3. Big Data: An Introduction
  • 4. Introduction. Data: facts and pieces of information collected together for reference or analysis; information processed or stored by computers and other electronic devices; text, image, audio, video, etc.
  • 5. Introduction. Big data resembles ordinary data, but it does not behave the same way. The term ‘big data’ applies to information that cannot be processed or handled using traditional processes or tools. Units of data: 1 byte = 8 bits; 1024 bytes = 1 kilobyte; 1024 kilobytes = 1 megabyte; 1024 megabytes = 1 gigabyte; 1024 gigabytes = 1 terabyte; 1024 terabytes = 1 petabyte; 1024 petabytes = 1 exabyte; 1024 exabytes = 1 zettabyte; 1024 zettabytes = 1 yottabyte.
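
The unit ladder on slide 5 can be sketched as a small conversion helper. This is a minimal illustration, assuming the slide's factor of 1024 between adjacent units; the names `UNITS`, `to_bytes`, and `humanize` are hypothetical and do not come from the slides.

```python
# Units from slide 5, smallest to largest; each step is a factor of 1024.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]

BITS_PER_BYTE = 8  # 1 byte = 8 bits, as on the slide


def to_bytes(value, unit):
    """Convert a value expressed in `unit` to a number of bytes."""
    return value * 1024 ** UNITS.index(unit)


def humanize(n_bytes):
    """Return (value, unit) using the largest unit that keeps value below 1024."""
    value = float(n_bytes)
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return value, unit
        value /= 1024


if __name__ == "__main__":
    print(to_bytes(1, "terabyte"))   # number of bytes in one terabyte
    print(humanize(1024 ** 5))       # one petabyte expressed as (value, unit)
```

Under the same assumption, `humanize(2.5e18)` places the "2.5 quintillion bytes" per day quoted on slide 13 at roughly 2.17 exabytes.
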
  • 6. Definition. There is no single standard definition. “Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.” (Gartner) “Big data exceeds the reach of commonly used hardware environments and software tools to capture, manage, and process it within a tolerable elapsed time for its user population.” (Teradata Magazine article, 2011)
  • 7. Introduction. Characteristics of Big Data: Volume, Velocity, Variety.
  • 8. Introduction. Characteristics of Big Data. Volume: huge size of data (terabytes to petabytes) at rest. Velocity: data in motion (streaming data). Variety: varieties of data (image, audio, text, video, etc.).
  • 9. Introduction. Characteristics of Big Data. Researchers now include more V's: Veracity, Value, Variability, ..., Victory.
  • 10. Volume
  • 11. Variety
  • 12. Velocity
  • 13. Sources of Big Data. What is big data? Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. Data comes from everywhere: sensors used to gather climate information; posts to social media sites; digital pictures and videos; purchase transaction records; cell phone GPS signals, etc. This data is big data.
  • 14. Sources of Big Data: Web & Ecommerce; Bank/Credit Card Transactional; Mobile; Social; Video & Preference; Machine & Sensor; Retail POS. All of this becomes big data.
  • 15. Who is generating big data? The model of generating and consuming data has changed. Old model: a few companies generate data and all others consume it. New model: all of us generate data, and all of us consume data.
  • 16.
  • 17. What does Big Data look like? What we know or see versus what's actually there.
  • 18. Areas of Application: Health care / Biotech; E-Governance; Social Networks / Social Media; Weather Forecasting; Education data.
  • 19. Areas of Application: Banking / Insurance / Finance; Retail industries; CRM / Customer Analytics; Airways, etc.
  • 20. Big Data Analytics
  • 21. Definition. Big data analytics is the process of examining enormous amounts of data of a variety of types to uncover hidden patterns, unknown correlations and other useful information. Example: searches in “friends” networks at social-networking sites involve graphs with hundreds of millions of nodes and many billions of edges.
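
The "friends" network search mentioned on slide 21 can be illustrated with a breadth-first search over a tiny in-memory graph. This is only a sketch under stated assumptions: the graph, the names in it, and the function `degrees_of_separation` are invented for illustration, and a graph with billions of edges would need a distributed graph engine rather than a single Python dictionary.

```python
from collections import deque


def degrees_of_separation(friends, start, target):
    """Return the smallest number of friendship hops from start to target.

    `friends` maps each person to a list of direct friends.
    Returns None when no chain of friendships connects the two people.
    """
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        for friend in friends.get(person, ()):
            if friend == target:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None


# Hypothetical toy network, not taken from the slides.
toy_network = {
    "ana": ["bo", "cy"],
    "bo": ["ana", "dee"],
    "cy": ["ana"],
    "dee": ["bo", "eli"],
    "eli": ["dee"],
    "zed": [],
}

if __name__ == "__main__":
    print(degrees_of_separation(toy_network, "ana", "eli"))
```

Breadth-first search visits people in order of distance, so the first time the target is reached the hop count is minimal.
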
  • 22. Why Is Big Data Analytics Feasible? Increased storage capacities; next-generation products; cost reduction; faster and better decision making; communication networking; improved services or products; distributed processing technologies.
  • 23. Stages in Big Data Analytics
  • 24. Available Analytic Methods: traditional data processing systems; information processing using statistical tools; Knowledge Engineering and Intelligence Systems; business analytics using data mining; business intelligence; genetic algorithms; machine learning algorithms; exploratory data analysis, etc.
  • 25. Types of Big Data Analytics. Descriptive: what has happened? Predictive: what will happen? Prescriptive: what should happen?
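
The three types of analytics on slide 25 can be contrasted on one small dataset. The sketch below is an assumption-laden toy: the weekly sales figures, the straight-line trend model, and the restocking rule are all made up for illustration, and the function names are hypothetical.

```python
def describe(sales):
    """Descriptive: summarize what has happened (average past sales)."""
    return sum(sales) / len(sales)


def predict_next(sales):
    """Predictive: fit a least-squares line and extrapolate one step ahead."""
    n = len(sales)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(sales) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, sales))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept + slope * n


def prescribe_restock(forecast, stock_on_hand):
    """Prescriptive: recommend an action (units to order) from the forecast."""
    return max(0.0, forecast - stock_on_hand)


# Made-up weekly unit sales, for illustration only.
weekly_sales = [10, 12, 14, 16]

if __name__ == "__main__":
    forecast = predict_next(weekly_sales)
    print(describe(weekly_sales), forecast, prescribe_restock(forecast, 5))
```

Each step consumes the output of the previous kind of analysis: the summary describes the past, the forecast extends it, and the recommendation turns the forecast into a decision.
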
  • 26. The Cycle of Big Data Management: Capture, Organize, Integrate, Analyze, Act.
  • 27. Activities in Analytics. Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making.
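
The four activities named on slide 27 (inspecting, cleaning, transforming, modeling) can be chained as a tiny pipeline. This is a minimal sketch: the temperature records are invented, the "model" is just an average, and all function names are hypothetical.

```python
from statistics import mean

# Made-up raw records; one reading is missing.
raw_records = [
    {"city": "A", "temp_f": "68"},
    {"city": "B", "temp_f": None},
    {"city": "C", "temp_f": "86"},
]


def inspect_missing(rows):
    """Inspecting: count records with a missing temperature."""
    return sum(1 for row in rows if row["temp_f"] is None)


def clean(rows):
    """Cleaning: drop records that have no temperature."""
    return [row for row in rows if row["temp_f"] is not None]


def transform(rows):
    """Transforming: parse the text value and convert Fahrenheit to Celsius."""
    return [
        {"city": row["city"], "temp_c": (float(row["temp_f"]) - 32) * 5 / 9}
        for row in rows
    ]


def model(rows):
    """Modeling: the simplest possible model, the average temperature."""
    return mean(row["temp_c"] for row in rows)


if __name__ == "__main__":
    print(inspect_missing(raw_records))
    print(model(transform(clean(raw_records))))
```
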
  • 28. Why are new analytical methods needed? Big in size (Volume); unstructured data (Variety); streaming data to analyze (high Velocity); distributed data; need for parallel analytics.
  • 29. Big Data Technology
  • 30. Key Technologies for Big Data. DFS (Distributed File System): large files are split into parts; file parts are moved into a cluster; fault tolerance comes from replication across nodes, with rack awareness. MapReduce: move algorithms close to the data by structuring them for parallel execution so that each task works on a part of the data. The power of simplicity! NoSQL: a NoSQL (often interpreted as Not Only SQL) database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.
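
The split-then-process idea behind DFS and MapReduce on slide 30 can be imitated in a single process with the classic word count. This is a sketch only: the function names are hypothetical, the chunks stand in for file parts stored on different nodes, and a real framework such as Hadoop would run the map tasks in parallel across a cluster and handle replication, which this toy does not.

```python
from collections import defaultdict


def split_into_chunks(lines, chunk_size):
    """Split the input into parts, as a distributed file system would."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce: sum the values collected for each word."""
    return {word: sum(values) for word, values in groups.items()}


def word_count(lines, chunk_size=2):
    mapped = []
    for chunk in split_into_chunks(lines, chunk_size):
        # Each chunk is independent, so these calls could run on separate nodes.
        mapped.extend(map_chunk(chunk))
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["big data", "big cluster", "data data"]))
```

Because each map call sees only its own chunk, the work can be moved to wherever that chunk is stored, which is the "move algorithms close to the data" point made on the slide.
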
  • 31. Key Technologies for Big Data. Three key technologies that can help to handle big data. Information management for big data: manage data as a strategic, core asset, with ongoing process control. High-performance analytics for big data: gain rapid insights from big data and the ability to solve increasingly complex problems. Flexible deployment options for big data: choose between on-premises or hosted, software-as-a-service (SaaS) approaches.
  • 32. Technologies for Big Data: fast processors and Massively Parallel Processing (MPP); Distributed File System; Apache Hadoop; data-intensive computing strategies; low-cost storage; in-memory processing.
  • 33. Technologies for Big Data. Hadoop distributions: Hortonworks. Cloud operating systems: Cloud Foundry (by VMware); OpenStack (worldwide participation and well-known companies). Storage: fusion-io (not open source, but very supportive of open-source projects; flash-aware applications).
  • 34. Development Platforms and Tools. Python: general-purpose programming language. Mahout: machine learning library (Apache), not a programming language. R: one of the best data mining tools. Storm: stream processing (open-sourced by Twitter). Giraph: graph processing (used at Facebook).
  • 35. Databases. NoSQL databases: MongoDB; Cassandra; HBase (Hadoop). SQL databases: MySQL (owned by Oracle); PostgreSQL (object-relational database); TokuDB (improves RDBMS performance).
  • 36. Visualization Tools: maps; charts (pie, bar, plot, etc.); graphs.
  • 37. Big Data: Issues & Challenges
  • 38. Challenges. The bottleneck is in technology: new architectures, algorithms, and techniques are needed. It is also in technical skills: there is a lack of experts in using the new technology.
  • 39. Data sources; Big Data Analytics
  • 40. Challenges: Internet of Things related. The amount of data that needs to be sorted, improved, integrated, analyzed, and managed is huge. Sensor devices constantly chatter updates about moisture, light, and movement. This calls for a real-time stream data analytics platform that can handle big data, and a scalable infrastructure to support it.
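
Analyzing sensor updates as they arrive, as slide 40 describes, means computing results incrementally instead of storing everything first. A minimal sketch of that idea is a rolling average over the most recent readings; the function name `rolling_mean`, the window size, and the moisture values are assumptions made for illustration, and a production stream platform would add distribution and fault tolerance on top.

```python
from collections import deque


def rolling_mean(stream, window):
    """Yield the mean of the last `window` readings after each new reading."""
    recent = deque(maxlen=window)  # old readings fall off automatically
    for reading in stream:
        recent.append(reading)
        yield sum(recent) / len(recent)


# Made-up moisture readings from a hypothetical sensor.
moisture_readings = [10, 20, 30, 40]

if __name__ == "__main__":
    for value in rolling_mean(moisture_readings, window=2):
        print(value)
```

Because the function is a generator, it keeps only `window` readings in memory no matter how long the stream runs.
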
  • 41. Challenges: Cloud computing related. Traditional WAN-based transport methods cannot move terabytes of data at the speed dictated by businesses.
  • 42. Classified Issues & Challenges: Storage; Management; Processing; Visualization.
  • 43. Challenges: Storage related. Clearly not enough hard disks/devices. Distributed storage is still not enough; manufacturers cannot make enough storage devices in time. Speed in writing to devices; bigger data paths/data buses are needed.
  • 44. Challenges: Management related. Data collection; organizing the varieties of data; need for distributed environments; need for new analytical methodology.
  • 45. Challenges: Processing related. Integrating data using filters; “what” data and “how”?; effective data processing system design; latency and bandwidth; streaming data processing.
  • 46. Challenges: Big data visualization. Meeting the need for speed; understanding the data; addressing data quality; displaying meaningful results.
  • 47. Conclusion
  • 48. For Researchers. Research institutes and companies are inviting more data scientists for research and development. There are R&D opportunities in fields such as the telecom industry, retail industry, social networks, healthcare industry, and so on.
  • 49. For Students. Develop deep analytical skills to land analyst positions; acquire basic knowledge of optimization techniques, data mining, machine learning algorithms, etc.; keep an eye on evolving technologies.
  • 50. Thank you