Presentation On Big Data & Data Mining
Introduction
Data mining and big data analytics both
examine data to uncover hidden patterns,
unknown correlations and other useful
information that can be used to make
better decisions.
Definitions:
• Big Data is a phrase used to
mean a massive volume of
both structured and
unstructured data that is so
large it is difficult to process
using traditional database
and software techniques.
• Data mining is about
finding new information in a
lot of data. The information
obtained from data
mining is hopefully both
new and useful. In many
cases, data is stored so it
can be used later.
Interesting Facts
• The volume of business data worldwide, across all companies, doubles
every 1.2 years (previously every 1.5 years).
• About 2,500 quadrillion (2.5 quintillion) bytes of data are produced daily, and
more than 90 percent of all data was produced within the past two years.
• An ordinary person today processes more data in a day than a 16th-century
individual did in an entire lifetime.
• In recent years, the cost of storage and processing power has dropped significantly.
• Bad data or poor data quality costs US businesses $600 billion annually.
• By 2015, 4.4 million IT jobs will be created globally to support big data
(Gartner).
• Facebook processes 10 TB of data every day; Twitter processes 7 TB.
• In 2012, Google had over 3 million servers processing over 2 trillion searches per
year (only 22 million in 2000).
Characteristics of Big Data
Volume - the quantity of data
Variety - the range of data types and categories
Velocity - the speed at which data is generated or processed
Variability - inconsistency in the data
Complexity - the difficulty of managing the data
Big Data Mining Algorithm
• Big data applications gather information from many sources.
• To mine the data in the usual way, we would need to bring all the distributed
data to one centralized site. This is prohibitive because of high data
transmission costs and privacy concerns.
• Mining and correlating models are therefore the key steps: patterns
discovered at the various sources have to be combined.
• Global data mining is done through a two-step process.
  • Model level
  • Knowledge level
• At the data level, each local site uses its local data to calculate data
statistics and shares them with the other sites in order to obtain the global
data distribution.
• At the model level, each site produces a local pattern by mining its
local data.
• By sharing these local patterns with the other local sites, we can produce a
single global pattern.
• At the knowledge level, model correlation analysis investigates the
relevance between models generated from various data sources to
determine how closely the data sources are correlated with each other,
and how to form accurate decisions based on models built from
autonomous sources.
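
The model-level idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm from the slides: the "local pattern" is assumed to be simple item counts, the site data and the names `mine_local_pattern` and `merge_patterns` are hypothetical, and the merge keeps items whose global support reaches a chosen threshold. The point it shows is that only summaries leave each site, never the raw records.

```python
from collections import Counter

def mine_local_pattern(transactions):
    """Model level: summarize one site's raw data as item counts (a local pattern)."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # count each item once per transaction
    return counts, len(transactions)

def merge_patterns(local_patterns, min_support):
    """Combine the shared local patterns into one global pattern."""
    total_counts = Counter()
    total_transactions = 0
    for counts, n in local_patterns:
        total_counts.update(counts)
        total_transactions += n
    # Keep only items whose global support reaches the threshold.
    return {
        item: count / total_transactions
        for item, count in total_counts.items()
        if count / total_transactions >= min_support
    }

# Hypothetical data held at two autonomous sites.
site_a = [["milk", "bread"], ["milk", "eggs"], ["bread"]]
site_b = [["milk"], ["eggs", "bread"], ["milk", "bread"]]

local_patterns = [mine_local_pattern(site) for site in (site_a, site_b)]
global_pattern = merge_patterns(local_patterns, min_support=0.5)
print(global_pattern)
```

Summing counts works here because item counts are additive across sites; richer local models would need a more careful combination step.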
Applications of Big Data
• Healthcare organizations can achieve better insight into disease trends
and patient treatments.
• Public sector agencies can catch fraud and other threats in real time.
• Applications of multimedia data:
  • Finding the travel patterns of travelers
  • CCTV camera footage
  • Photos and videos from social networks
  • Recommender systems
• Integration and mining of biological data from various sources in biological
networks, by the NSF (National Science Foundation).
• Classifying big data streams at run time, by the Australian Research
Council.
Applications of Data Mining
• In healthcare, data mining uses data and analytics to identify best practices
that improve care and reduce costs.
• Market basket analysis is a modelling technique based on the theory that if
you buy a certain group of items, you are more likely to buy another group of
items. This technique may allow the retailer to understand the purchase
behaviour of a buyer.
• An emerging field called Educational Data Mining is concerned with
developing methods that discover knowledge from data originating in
educational environments.
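
Market basket analysis is usually expressed through two measures, support and confidence. The sketch below is a minimal illustration, assuming a handful of hypothetical baskets and the helper names `support` and `confidence` (neither comes from the slides): support is the share of baskets containing a group of items, and confidence is how often the second group appears among baskets that already contain the first.

```python
def support(transactions, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for basket in transactions if items <= set(basket))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Among baskets with `antecedent`, the fraction that also contain `consequent`."""
    base = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base if base else 0.0

# Hypothetical purchase data.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "eggs"],
]

rule_support = support(baskets, ["bread", "butter"])
rule_confidence = confidence(baskets, ["bread"], ["butter"])
print(rule_support, rule_confidence)
```

A retailer would read a high confidence for "bread, then butter" as a hint to place or promote the two items together.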
DATA MINING CHALLENGES WITH BIG DATA
• The main challenge for an intelligent database is handling big data. The
important thing is to scale to the large amount of data and to provide
solutions to the problems characterized by the HACE theorem.
Challenges
Location of big data sources - big data is commonly stored in different locations
Volume of big data - the size of the data grows continuously
Hardware resources - RAM capacity
Privacy - medical reports, bank transactions
Having domain knowledge
Getting meaningful information
Solutions
Parallel programming
An efficient computing platform does not use centralized data storage; instead,
the data is distributed across large-scale storage.
Restricting access to the data
BIG Data Mining Tools
• Hadoop
• Apache S4
• Storm
• Apache Mahout
• MOA
Hadoop
• Hadoop is an open-source software platform for scalable, distributed
computing, developed as an Apache Software Foundation project.
• The Apache Hadoop software library is a framework that allows for the
distributed processing of large data sets across clusters of computers
using simple programming models.
• Hadoop provides fast and reliable analysis of both structured and
unstructured data.
• It is designed to scale up from single servers to thousands of machines,
each offering local computation and storage.
• Hadoop uses the MapReduce programming model to mine data.
• A MapReduce program splits the input data set into independent
subsets, which are processed in parallel by map tasks.
• Map() procedure that performs filtering and sorting
• Reduce() procedure that performs a summary operation
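
The Map() and Reduce() roles can be illustrated with the classic word count. The sketch below is plain Python that imitates the programming model in a single process; it does not use the Hadoop API, and the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are assumptions made for this example. On a real cluster, each input split would go to a separate map task and the grouping step would be handled by the framework.

```python
from collections import defaultdict

def map_phase(split):
    """Map(): emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle(mapped_pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce(): summarize all values for one key (here, by summing them)."""
    return key, sum(values)

def word_count(splits):
    # Each split is mapped independently; here the map calls simply run in turn.
    mapped = [pair for split in splits for pair in map_phase(split)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))

# Hypothetical input, already divided into independent subsets.
splits = ["big data mining", "data mining tools", "big data"]
counts = word_count(splits)
print(counts)
```

Because no map call depends on another split, the map step is the part that can be spread across many machines.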
Data Mining Software
• Weka - open-source software for data mining
• RapidMiner - an open-source system for data and text mining
• KNIME - an open-source data integration, processing, analysis, and exploration
platform
• The Mahout machine learning library - mining large data sets. It supports
recommendation mining, clustering, classification and frequent itemset mining.
• Rattle - a GUI for data mining using R
From the dawn of civilization until
2003, humankind generated five
exabytes of data. Now we produce
five exabytes every two days…and
the pace is accelerating.
Eric Schmidt,
Executive Chairman, Google

Editor's Notes

  1. Sources: social networks, satellite data, geographical data, live streaming data