Faculty Name: Namrata Sharma / Arjun S. Parihar
Year/Branch: 3rd/CSE
Subject Code: CS-503(A)
Subject Name: Data Analytics
Learning Objectives
In this session you will learn about:
• Hadoop Ecosystem
• Data discovery
• Open source technology for Big Data Analytics
• Cloud and Big Data
Hadoop
• Apache Hadoop is one of the most widely used open-source frameworks for Big Data.
• The Hadoop ecosystem revolves around three main components:
  • HDFS
  • MapReduce
  • YARN
• Apart from these, other Hadoop ecosystem components also play an important role in boosting Hadoop's functionality.
Hadoop Components
1.1 HDFS
• Hadoop Distributed File System (HDFS) is the primary storage system of Hadoop.
• HDFS stores very large files on a cluster of commodity hardware.
• It is designed to store a small number of large files rather than a huge number of small files.
• HDFS stores data reliably even in the case of hardware failure, because each block is replicated on several DataNodes.
• It provides high-throughput access to application data by reading blocks in parallel.
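The preference for a few large files over many small ones can be illustrated with a little arithmetic. The sketch below is only an illustration: it assumes the common HDFS defaults of a 128 MB block size and a replication factor of 3 (both are configurable per cluster), and the function names are hypothetical, not part of any Hadoop API.

```python
import math

BLOCK_SIZE_MB = 128   # assumed default HDFS block size (configurable)
REPLICATION = 3       # assumed default replication factor (configurable)

def block_count(file_size_mb):
    """Number of HDFS blocks needed to hold one file of the given size."""
    return math.ceil(file_size_mb / BLOCK_SIZE_MB)

def namenode_entries(file_sizes_mb):
    """Total number of blocks the NameNode has to track for a set of files."""
    return sum(block_count(size) for size in file_sizes_mb)

def raw_storage_mb(file_sizes_mb):
    """Disk space used across the cluster once every block is replicated."""
    return sum(file_sizes_mb) * REPLICATION

one_large = [1024]        # a single 1 GB file
many_small = [1] * 1024   # 1024 files of 1 MB each (same total data)

print(namenode_entries(one_large))    # 8
print(namenode_entries(many_small))   # 1024
print(raw_storage_mb(one_large))      # 3072
```

Both layouts hold the same amount of data, but the small-file layout leaves the NameNode with far more block entries (plus one file entry per file) to keep in memory.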
Components of HDFS
1.1.1 NameNode
• It works as the master in a Hadoop cluster.
• The NameNode stores metadata, i.e. the number of blocks, their replicas and other details.
• This metadata is held in memory on the master.
• The NameNode directs the slave nodes (DataNodes), for example by telling them to create, delete or replicate blocks.
• It should be deployed on reliable hardware, as it is the centerpiece of HDFS.
1.1.2 DataNode
• It works as a slave in a Hadoop cluster.
• In HDFS, the DataNode is responsible for storing the actual data.
• The DataNode also performs read and write operations as requested by clients.
• DataNodes can be deployed on commodity hardware.
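The split between metadata on the master and actual data on the slaves can be sketched with a toy in-memory model. Everything here is hypothetical and simplified: the class names are not Hadoop classes, the block size is a few bytes so the demo stays readable, and round-robin placement stands in for the rack-aware placement real HDFS uses.

```python
from itertools import cycle

class DataNode:
    """Slave: stores the actual block contents."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}   # block_id -> bytes

class NameNode:
    """Master: keeps only metadata (which blocks exist and where they live)."""
    def __init__(self, datanodes, block_size=4, replication=2):
        self.datanodes = datanodes
        self.block_size = block_size        # tiny on purpose, for the demo
        self.replication = replication      # must not exceed len(datanodes)
        self.metadata = {}                  # filename -> [(block_id, [datanode names])]
        self._placement = cycle(datanodes)  # simplistic round-robin placement

    def write(self, filename, data):
        entries = []
        for start in range(0, len(data), self.block_size):
            block_id = f"{filename}#{start // self.block_size}"
            targets = [next(self._placement) for _ in range(self.replication)]
            for node in targets:
                node.blocks[block_id] = data[start:start + self.block_size]
            entries.append((block_id, [node.name for node in targets]))
        self.metadata[filename] = entries

    def read(self, filename, failed=()):
        # In real HDFS the client gets block locations from the NameNode and then
        # reads from the DataNodes directly; both steps are folded together here.
        by_name = {node.name: node for node in self.datanodes}
        data = b""
        for block_id, holders in self.metadata[filename]:
            alive = [name for name in holders if name not in failed]
            data += by_name[alive[0]].blocks[block_id]
        return data

nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
namenode = NameNode(nodes)
namenode.write("f.txt", b"hello hadoop")
print(namenode.metadata["f.txt"])
print(namenode.read("f.txt", failed={"dn1"}))   # still readable with dn1 down
```

Because every block has a replica on a second DataNode, the file can still be reassembled when one node is marked as failed.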
1.2 MapReduce
• Hadoop MapReduce is the data processing layer of Hadoop.
• It processes large volumes of structured and unstructured data stored in HDFS.
• MapReduce processes this data in parallel.
• It does this by dividing the submitted job into a set of independent tasks (sub-jobs).
• In Hadoop, MapReduce works by breaking the processing into phases: Map and Reduce.
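The phases can be illustrated with the classic word-count example. This is a single-process Python sketch of the idea only; it does not use the Hadoop API, and the function names are chosen for illustration. On a real cluster the map and reduce calls would run as independent tasks on different nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    # Each line could be mapped independently (in parallel on a cluster).
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    # Each key could likewise be reduced independently.
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(["big data on hadoop", "hadoop stores big files"])
print(counts)
```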
1.3 YARN
• Hadoop YARN provides resource management.
• It is often described as the operating system of Hadoop.
• It is responsible for managing and monitoring workloads and for implementing security controls.
• It is a central platform for delivering data governance tools across Hadoop clusters.
• YARN allows multiple data processing engines, such as real-time streaming and batch processing, to run on the same cluster.
Components of YARN
Resource Manager
• It is a cluster-level component and runs on the master machine.
• It manages resources and schedules applications running on top of YARN.
• It has two components: the Scheduler and the Applications Manager.
Node Manager
• It is a node-level component.
• It runs on each slave machine.
• It continuously communicates with the Resource Manager to remain up to date.
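The interplay between the two components can be sketched as a toy model. The classes below are hypothetical and far simpler than YARN itself: each Node Manager reports its free memory, and the Resource Manager places a container on the first node that has room. Real YARN schedulers (such as the Capacity or Fair scheduler) apply much richer policies.

```python
class NodeManager:
    """Node-level component: tracks the resources of one slave machine."""
    def __init__(self, name, memory_mb):
        self.name = name
        self.free_mb = memory_mb

    def heartbeat(self):
        """Report the current free memory to the Resource Manager."""
        return self.name, self.free_mb

class ResourceManager:
    """Cluster-level component: decides where each container runs."""
    def __init__(self, node_managers):
        self.nodes = {nm.name: nm for nm in node_managers}
        self.view = {}          # last reported free memory per node
        self.allocations = []   # (application, node, memory_mb)

    def collect_heartbeats(self):
        for nm in self.nodes.values():
            name, free_mb = nm.heartbeat()
            self.view[name] = free_mb

    def allocate(self, app, memory_mb):
        """Place one container on the first node with enough free memory."""
        self.collect_heartbeats()
        for name, free_mb in self.view.items():
            if free_mb >= memory_mb:
                self.nodes[name].free_mb -= memory_mb
                self.allocations.append((app, name, memory_mb))
                return name
        return None   # no node can host the container right now

rm = ResourceManager([NodeManager("node1", 2048), NodeManager("node2", 4096)])
first = rm.allocate("app-1", 1536)
second = rm.allocate("app-2", 1024)
third = rm.allocate("app-3", 8192)
print(first, second, third)
```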
Data discovery
Data discovery is the collection and analysis of data from various sources to gain insight from hidden patterns and trends.
It is the first step in fully harnessing an organization’s data to inform critical business decisions.
Through the data discovery process, data is gathered, combined, and analyzed in a sequence of steps.
The goal is to make messy and scattered data clean, understandable, and user-friendly.
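The gather, combine and analyze sequence can be sketched in a few lines of Python. The two data sources, their field names and the values are invented for illustration; a real pipeline would read from files, databases or APIs.

```python
from collections import defaultdict

# Step 1: gather (two hypothetical sources with inconsistent formatting)
crm_rows = [{"region": " North ", "sales": "120"}, {"region": "south", "sales": "80"}]
web_rows = [{"region": "NORTH", "sales": "30"}, {"region": "South", "sales": ""}]

def clean(row):
    """Normalise the region name and convert sales to a number; drop incomplete rows."""
    region = row["region"].strip().title()
    sales = row["sales"].strip()
    if not region or not sales:
        return None
    return {"region": region, "sales": float(sales)}

def discover(*sources):
    combined = [row for source in sources for row in source]       # Step 2: combine
    cleaned = [c for c in map(clean, combined) if c is not None]   # Step 3: clean
    totals = defaultdict(float)                                    # Step 4: analyze
    for row in cleaned:
        totals[row["region"]] += row["sales"]
    return dict(totals)

totals = discover(crm_rows, web_rows)
print(totals)   # {'North': 150.0, 'South': 80.0}
```

The row with a missing sales value is dropped during cleaning, and the differently formatted region names end up in the same group.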
According to Gartner, “Big Data Discovery” is the next big trend in analytics.
Hottest trends of the last few years in analytics:
• Big Data
• Data Discovery
• Data Science
What are the Benefits of Data Discovery?
Data discovery provides a framework for firms to unlock and act upon the insights contained within their data.
It transforms messy and unstructured data to facilitate and enhance its analysis. Data discovery allows firms to:
• Gather actionable insights
• Save time
• Scale data across teams
• Clean and reuse data
Data Discovery Tools
We know we want to collect, store, organize, analyze and share data.
But we have limited resources.
What is Cloud Computing?
Cloud computing is a fast-growing technology that has established itself in the next generation of the IT industry and business.
Cloud Service Model
The cloud service model typically consists of Platform as a Service (PaaS), Software as a Service (SaaS), and Infrastructure as a Service (IaaS).
Cloud Process
Case Study
• Application
  • Call center surveillance
• Background
  • Previously: voice data
• Goal for a new system
  • Monitor data and voice
  • Multiple data sources
  • Advanced correlations
• Ever-growing data
• Deeper correlation
• Tight performance
A Classic Case for..
• Cost
• Business impact
Big Data in the Cloud
Big Data in the cloud: managing Big Data on the cloud
• Auto-start VMs
• Install and configure app components
• Monitor
• Repair
• (Auto) scale
Big Data in the cloud
• Reduce the infrastructure cost
• Choose the right cloud for the job
Big Data in the cloud: reducing the operational complexity
• Consistent management
• Automation through the entire stack
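The monitor and (auto) scale steps listed for managing Big Data on the cloud can be sketched as a simple threshold rule. This is only an illustration under stated assumptions: the function name, the CPU thresholds and the worker limits are all hypothetical, and real cloud platforms expose their own auto-scaling policies.

```python
def desired_workers(current, cpu_utilisation, low=0.30, high=0.75,
                    min_workers=1, max_workers=10):
    """Return the worker count to aim for, given average CPU utilisation (0.0 to 1.0).

    The thresholds and limits are illustrative assumptions, not platform defaults.
    """
    if cpu_utilisation > high:
        target = current + 1   # overloaded: add one worker VM
    elif cpu_utilisation < low:
        target = current - 1   # mostly idle: remove one worker VM to save cost
    else:
        target = current       # within the comfortable band: no change
    return max(min_workers, min(max_workers, target))

print(desired_workers(3, 0.90))   # 4
print(desired_workers(3, 0.10))   # 2
print(desired_workers(3, 0.50))   # 3
```

Scaling down when the cluster is idle is what ties automation to the goal of reducing infrastructure cost.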
Predictive Analytics
Predictive analytics is the practice of extracting insights from an existing data set with the help of data mining, statistical modeling and machine learning techniques, and using them to predict unobserved or unknown events. It involves:
• Identifying cause-effect relationships across the variables in the historical data.
• Discovering hidden insights and patterns with the help of data mining techniques.
• Applying observed patterns to unknowns in the past, present or future.
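As a minimal illustration of applying an observed pattern to an unknown, the sketch below fits a straight line to a few historical points with ordinary least squares and extrapolates one step ahead. The monthly sales figures are invented, and a fitted trend on its own does not establish a cause-effect relationship.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical historical data: sales observed in months 1 to 4
months = [1, 2, 3, 4]
sales = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_line(months, sales)
prediction = slope * 5 + intercept   # apply the observed pattern to month 5
print(slope, intercept, prediction)
```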
Thanks!

data analytics lecture4.pptx
