Welcome to Our Presentation
Presentation on
Big Data and Data Mining
Introduction
Big Data is a term for data sets that are so large
or complex that traditional data processing application
software is inadequate to deal with them.
Data mining is the computing process of
discovering patterns in large data sets involving
methods at the intersection of machine
learning, statistics, and database systems.
Important Info
• About 2,500 quadrillion bytes of data are produced daily, and more than 90 percent
of the data were produced within the past two years.
• An ordinary person processes more data in a day than a 16th-century individual did
in an entire lifetime.
• The volume of business data worldwide, across all companies, doubles every 1.2
years (previously every 1.5 years).
• Bad data or poor data quality costs US businesses $600 billion annually.
• Gartner predicted that 4.4 million IT jobs would be created globally by 2015 to
support big data.
• Facebook processes 10 TB of data every day; Twitter processes 7 TB.
• In 2012, Google had over 3 million servers processing over 2 trillion searches per
year (only 22 million in 2000).
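The doubling claim above can be turned into a small projection. This is only a sketch: the 1.2-year doubling period is the figure from the slide, while the function name `projected_volume` and the starting volume of 1.0 (an arbitrary unit) are assumptions made for illustration.

```python
# Sketch: project data volume under a fixed doubling period.
# The 1.2-year period is taken from the slide; everything else is illustrative.

def projected_volume(initial: float, years: float, doubling_period: float = 1.2) -> float:
    """Return the volume after `years` if it doubles every `doubling_period` years."""
    return initial * 2 ** (years / doubling_period)


if __name__ == "__main__":
    # Compare the current doubling period with the earlier 1.5-year figure.
    for period in (1.2, 1.5):
        growth = projected_volume(1.0, 6.0, doubling_period=period)
        print(f"doubling every {period} years: about {growth:.0f}x after 6 years")
```

Under these assumptions, shortening the doubling period from 1.5 to 1.2 years doubles the growth over six years (five doublings instead of four).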
The 4 V's of Big Data
Volume
• Data Quantity
Velocity
• Data Speed
Variety
• Data Types
Variability
• Inconsistency
Big Data Mining Algorithm
• Big data applications gather information from many sources.
• To mine the data in the usual way, we would need to gather all the distributed
data at a centralized site. This is prohibitive because of high data transmission
costs and privacy concerns.
• Most mining tasks aim to find correlations or patterns that can only be
discovered by combining a variety of sources.
• Global data mining is done through a two-step process:
  ◦ Model level
  ◦ Knowledge level
• At the data level, each local site uses its local data to calculate data
statistics and shares this information with the other sites, so that the global
data distribution can be obtained.
• At the model level, each site mines its local data to produce local patterns.
• By sharing these local patterns with the other local sites, we can produce a
single global pattern.
• At the knowledge level, model correlation analysis investigates the relevance
between models generated from various data sources to determine how
closely the data sources are correlated with each other, and how to form
accurate decisions based on models built from autonomous sources.
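The levels described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the algorithm from the slides: all function names and the sample transactions are hypothetical, "patterns" are reduced to frequently co-occurring item pairs, and model correlation is approximated by Jaccard overlap. Note one simplification: a pair that is infrequent at every site but frequent overall would be missed, because each site shares only the pairs that pass its local threshold.

```python
# Toy sketch of data-, model-, and knowledge-level steps in distributed mining.
# All names and data are hypothetical; raw records never leave a site.
from collections import Counter
from itertools import combinations


def local_statistics(values):
    """Data level: summarize local data as (count, sum) instead of shipping rows."""
    return len(values), sum(values)


def global_mean(stats):
    """Combine per-site (count, sum) pairs into the global mean."""
    total_count = sum(count for count, _ in stats)
    total_sum = sum(total for _, total in stats)
    return total_sum / total_count


def local_patterns(transactions, min_support):
    """Model level: item pairs occurring in at least `min_support` local transactions."""
    counts = Counter()
    for items in transactions:
        counts.update(combinations(sorted(set(items)), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


def global_patterns(site_patterns, min_support):
    """Merge the pattern counts shared by each site into one global pattern set."""
    merged = Counter()
    for patterns in site_patterns:
        merged.update(patterns)  # adds counts for pairs reported by several sites
    return {pair: n for pair, n in merged.items() if n >= min_support}


def model_similarity(patterns_a, patterns_b):
    """Knowledge level: Jaccard overlap between two sites' pattern sets."""
    a, b = set(patterns_a), set(patterns_b)
    return len(a & b) / len(a | b) if a | b else 0.0


if __name__ == "__main__":
    site_a = [["milk", "bread"], ["milk", "bread", "eggs"], ["bread", "eggs"]]
    site_b = [["milk", "bread"], ["milk", "eggs"], ["milk", "bread"]]
    patterns_a = local_patterns(site_a, min_support=2)
    patterns_b = local_patterns(site_b, min_support=2)
    print(global_patterns([patterns_a, patterns_b], min_support=3))
    print(model_similarity(patterns_a, patterns_b))
```

Only summaries (counts, sums, and pattern counts) cross site boundaries, which is what addresses the transmission-cost and privacy concerns raised in the slide.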
DATA MINING CHALLENGES WITH BIG DATA
• The main challenge for an intelligent database is handling Big Data.
The key issue is scaling to very large amounts of data; the HACE theorem
(Heterogeneous, Autonomous sources with Complex and Evolving relationships)
provides a framework for addressing these problems.
Challenges
• Hardware resources: RAM capacity
• Location of Big Data sources: Big Data are commonly
stored in different locations
• Volume of the Big Data: the size of the Big Data grows
continuously
• Privacy: medical reports, bank transactions
• Having domain knowledge
• Getting meaningful information
Solutions
• Parallel computing and programming
• An efficient computing platform does not rely on
centralized data storage; instead, the data are
distributed across large-scale storage.
• Restricting access to the data
Big Data Mining Tools
• Hadoop
• Apache S4
• Storm
• Apache Mahout
• MOA
Hadoop
• Hadoop is an open-source software platform for scalable, distributed computing,
developed as an Apache Software Foundation project.
• The Apache Hadoop software library is a framework that allows for the distributed
processing of large data sets across clusters of computers using simple
programming models.
• Hadoop provides fast and reliable analysis of both structured and
unstructured data.
• It is designed to scale up from single servers to thousands of machines, each
offering local computation and storage.
• Hadoop uses the MapReduce programming model to mine data.
• A MapReduce job splits the input dataset into independent subsets,
which are processed in parallel by map tasks.
  ◦ The Map() procedure performs filtering and sorting.
  ◦ The Reduce() procedure performs a summary operation.
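The Map() and Reduce() steps can be illustrated with a word count, the customary first MapReduce example. The sketch below is plain Python that mimics the programming model in a single process; it does not use the Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
# Sketch of the MapReduce model in plain Python (not the Hadoop API).
# On a real cluster each split would be handled by a separate map task.
from collections import defaultdict


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle(mapped_pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: summarize the values of each key, here by summing the counts."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    """Run map over every split, then shuffle and reduce the combined output."""
    mapped = [pair for split in splits for pair in map_phase(split)]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Two independent input splits, each a list of lines.
    splits = [["big data", "data mining"], ["Big Data mining"]]
    print(word_count(splits))
```

Because each call to `map_phase` depends only on its own split, the map calls could run on different machines; only the grouped intermediate pairs need to be brought together for the reduce step.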
Applications of Big Data
• Healthcare organizations can achieve better insight into disease trends and
patient treatments.
• Public sector agencies can catch fraud and other threats in real time.
• Applications of multimedia data:
  ◦ Finding the travel patterns of travelers
  ◦ CCTV camera footage
  ◦ Photos and videos from social networks
• Recommender systems
• Integration and mining of biological data from various sources in biological
networks, by the NSF (National Science Foundation).
• Classifying Big Data streams at run time, by the Australian Research Council.
Advantages
• Fast response
• Extraction of useful information
• Prediction of required data from a large amount of data
• Better presentation of results in the form of visualizations
We Are: The Genius
• Gopesh Singha: 1519
• Md. Mizanur Rahman: 1524
• Kawsar Ahmed: 1531
• Hasan Pervez: 1520

Editor's Notes

  1. In 2012, a presidential election debate between Barack Obama and Mitt Romney triggered about 10 million tweets within 2 hours. The well-known photo-sharing website Flickr faces a similar problem: it receives about 1.8 million photographs every day, each about 2 MB in size, so it needs approximately 3.6 TB of storage capacity per day. These situations show why Big Data applications have become necessary.
  2. Sources: social networks, satellite data, geographical data, live streaming data.
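
The Flickr storage estimate in note 1 is simple arithmetic and can be checked directly. The sketch below assumes decimal units (1 TB = 1,000,000 MB), which is the convention under which the note's figure works out; the constant and function names are hypothetical.

```python
# Sketch: check the daily storage estimate from the editor's note.
# Assumes decimal units: 1 TB = 1,000,000 MB.
PHOTOS_PER_DAY = 1_800_000
PHOTO_SIZE_MB = 2
MB_PER_TB = 1_000_000


def daily_storage_tb(photos: int, size_mb: float) -> float:
    """Return the storage needed per day, in terabytes."""
    return photos * size_mb / MB_PER_TB


if __name__ == "__main__":
    print(f"{daily_storage_tb(PHOTOS_PER_DAY, PHOTO_SIZE_MB):.1f} TB per day")
```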