A Seminar on
Machine Learning Using Big Data
Seminar Guide: Prof. A. K. Hase
Presenter: Mr. Vaibhav Kurkute
15-04-2017
Content
1. History & Traditional Database
2. Introduction
3. Data Mining
4. What Is Machine Learning?
5. Types of Learning
6. Supervised Learning Algorithms
7. Unsupervised Learning Algorithms
8. Case Studies
9. Future Scope & Tools
10. Conclusion
History
• Old source of data: telephone (text or voice)
• Invention of the computer and its business uses
• Old data storage
• 21st-century evolution
• Traditional databases and their drawbacks
• Structured data
– Use of a MySQL database
• Machine-generated data
• Unstructured data
– Use of MongoDB, i.e., a NoSQL database
– Hadoop Distributed File System (HDFS), HBase, Hive
Introduction to Big Data
• Generated fast, mostly in unstructured form
• Continuously processed and analyzed
• Large amounts of data, like a million rows in an Excel sheet
• Different types of data, mostly unstructured
• The goal is to get knowledge out of this data
1. Google processes 20 petabytes of data every day
2. Facebook receives thousands of status updates every hour
How big is Big Data?
• Web: Google's index is estimated at 45 billion pages
• Transaction data: 5-50 TB/day
• Satellite image feeds: ~1 TB/day/satellite
• Sensor networks/arrays
– CERN Large Hadron Collider: ~100 petabytes/day
• Biological data: 1-10 TB/day/sequencer
• TV: 2 TB/day/channel; YouTube: 4 TB/day uploaded
• Digitized telephony: ~100 petabytes/day
Data Mining
• Data mining is of no use unless it yields useful information from the data
• The aim is to mine insights from the data and make them useful
• It turns previously unknown patterns in data into knowledge
• What can it be used for?
1. Predicting future trends
2. Allowing businesses to make proactive, knowledge-driven decisions
3. E.g., from your travel history on Yatra.com, one can identify your hometown
4. E.g., Snyder & Vini Facebook status
Machine Learning
• The machine learns on its own
• No need to tell the machine exactly what to do
• No need to hand-code explicit rules
• We provide what we call the training data set
• Algorithms learn patterns from it so as to create knowledge from data
• Example:
If we give sample inputs and outputs like
2 -> f(x) -> 4, 3 -> f(x) -> 9 and
4 -> f(x) -> 16, then 5 -> f(x) -> ?
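The input/output example (2 -> 4, 3 -> 9, 4 -> 16) can be turned into a small learning exercise. The sketch below is only one possible illustration, not the presenter's method: it assumes the unknown f(x) is a power law y = e^c * x^p and estimates p and c by least squares on the logarithms. The names fit_power_law and predict are hypothetical helpers introduced here.

```python
import math


def fit_power_law(samples):
    """Least-squares fit of log(y) = p * log(x) + c; returns (p, c)."""
    xs = [math.log(x) for x, _ in samples]
    ys = [math.log(y) for _, y in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the best-fit line in log-log space.
    p = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    c = mean_y - p * mean_x
    return p, c


def predict(x, p, c):
    """Apply the learned power law to a new input."""
    return math.exp(p * math.log(x) + c)


# Training data from the slide: inputs paired with their known outputs.
training_data = [(2, 4), (3, 9), (4, 16)]
p, c = fit_power_law(training_data)
prediction = predict(5, p, c)
print(round(p, 3), round(prediction, 3))
```

The learned exponent should come out very close to 2, so the prediction for 5 is close to 25. The power-law form is an assumption of this sketch; with a different assumed model family, the same three samples could lead to a different answer.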
Machine Learning
• Here are a few examples:
1. Google's self-driving cars
2. Blocking of suspicious credit card transactions and spam mail
3. Recommendation engines on e-commerce sites
4. Facebook friend suggestions
“People worry that computers will get too smart and take over the world, but the real
problem is that they're too stupid and they've already taken over the world”
Type: Supervised Learning
• The training data comes with correct answers, i.e., labeled examples for the computer
• Use the training data to train the algorithm
• Apply the trained model to data without a correct answer
• These behave like predictive algorithms
Type: Unsupervised Learning
• No labeled examples for the computer, i.e., the training data has no correct answers
• We give only the raw data to the algorithm
• We still choose which algorithm to use
• These behave like exploratory algorithms
• We supply only the input data, not the output
• Example:
Differentiating correctly between the faces of a horse, a cat and a human (clustering of data)
Unsupervised Algorithm
• Clustering:
– Splitting records into groups that are not defined in advance
– Each group holds data with similar properties
– E.g., K-means clustering
• Association:
– Seeing what often appears together with what
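K-means clustering, named on the unsupervised algorithm slide, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the 2-D points are made-up toy data, and the starting centroids are simply the first k points so the run is repeatable (real implementations usually pick them at random). The function name kmeans is a hypothetical helper, not a library call.

```python
def kmeans(points, k, iterations=10):
    """Group points into k clusters by repeatedly moving centroids to cluster means."""
    # Assumption for this sketch: deterministic start from the first k points.
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            distances = [sum((a - b) ** 2 for a, b in zip(point, centroid))
                         for centroid in centroids]
            clusters[distances.index(min(distances))].append(point)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(
                    tuple(sum(column) / len(cluster) for column in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments are stable, so stop early
        centroids = new_centroids
    return centroids, clusters


# Hypothetical toy data: two visibly separate groups, with no labels supplied.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

No correct answers are given to the algorithm; the two groups emerge only from the distances between the points, which is what makes this unsupervised.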
Supervised Algorithm
• Classification:
– Assigning records to predefined groups
– E.g., recognizing handwritten digits, or classifying emails as spam or not spam
• Regression (predictive analysis):
– Predicting a numeric output value using training data
• Common algorithms:
– Naïve Bayes classifier
– Decision trees
– Nearest neighbors (kNN)
– Neural networks
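As a sketch of classification with nearest neighbors (kNN), the example below labels an email as spam or not spam by a majority vote among its k closest training examples. The two features (number of links, number of exclamation marks) and all the data values are hypothetical, chosen only to illustrate the idea; knn_predict is a helper defined here, not a library function.

```python
from collections import Counter


def knn_predict(training, query, k=3):
    """Return the majority label among the k training examples nearest to query."""
    nearest = sorted(
        training,
        key=lambda example: sum((a - b) ** 2 for a, b in zip(example[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled training data: (links, exclamation marks) -> label.
training = [
    ((0, 0), "not spam"),
    ((1, 0), "not spam"),
    ((0, 1), "not spam"),
    ((5, 4), "spam"),
    ((6, 5), "spam"),
    ((4, 6), "spam"),
]

print(knn_predict(training, (5, 5)))
print(knn_predict(training, (1, 1)))
```

The groups ("spam", "not spam") are fixed in advance and every training record carries its correct answer, which is what distinguishes this from clustering.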
Supervised Algorithm
• Classification:
– Assigning records to predefined groups
– E.g., data used by a motor vehicle company to decide where to sell
• Regression (predictive analysis):
– Predicting a numeric output value using training data
• Common algorithms:
– Naïve Bayes classifier
– Decision trees
– Nearest neighbors (kNN)
– Neural networks
Apriori Algorithm
• A type of unsupervised learning
• Predictions are made from patterns found in past transaction data
• Association rule mining expresses these patterns as if-then conditions
• CASE STUDY 1:
How does Amazon predict which products will be sold together?
Case Study (Amazon)
• It is a type of market basket analysis
• Information of this type is expressed in the form of “if-then” statements
• The rules are computed from the data
• Examine all possible rules between items in an if-then format
• Select only those that are most likely to be indicators of true dependence
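One way to judge whether an if-then rule reflects a true dependence is to compute its support, confidence and lift. These measures are standard in association rule mining but are not named on the slides, so treat their use here as an assumption. The baskets below are made-up toy data, not Amazon data.

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the 'if' part, the fraction also containing the 'then' part."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def lift(antecedent, consequent, baskets):
    """Confidence relative to how common the 'then' part is anyway; above 1 suggests dependence."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


# Hypothetical transactions, one set of items per purchase.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
    {"butter", "jam"},
    {"milk", "jam"},
]

# Rule under test: if bread, then butter.
rule_confidence = confidence({"bread"}, {"butter"}, baskets)
rule_lift = lift({"bread"}, {"butter"}, baskets)
print(round(rule_confidence, 3), round(rule_lift, 3))
```

In this toy data, two of the three bread baskets also contain butter, while butter appears in only half of all baskets, so the lift is above 1 and the rule would be kept as a candidate.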
Case Study (Amazon)
• Generate frequent itemsets
• Start with single items, then build itemsets with two items, then with three items
• An itemset is frequent based on how many transactions in the database include it
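The level-by-level generation of frequent itemsets can be sketched as follows. This is a simplified illustration of the Apriori idea, not a production implementation: the baskets are hypothetical, min_count is an assumed threshold, and frequent_itemsets is a helper defined only for this example.

```python
from itertools import combinations


def frequent_itemsets(baskets, min_count):
    """Return {itemset: count} for every itemset found in at least min_count baskets."""
    items = sorted({item for basket in baskets for item in basket})
    current = [frozenset([item]) for item in items]  # level 1: single items
    frequent = {}
    size = 1
    while current:
        # Count how many transactions include each candidate itemset.
        counts = {c: sum(1 for basket in baskets if c <= basket) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        size += 1
        # Build the next level by joining frequent itemsets of the current level.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Apriori pruning: every smaller subset must itself be frequent.
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent


# Hypothetical transactions, one set of items per purchase.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"milk", "jam"},
]

result = frequent_itemsets(baskets, min_count=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Jam appears in only one basket, so it is dropped at the first level and never takes part in any two-item or three-item candidate. That early pruning is what keeps the search manageable on large transaction databases.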
Tools
1. R
2. Python (SciPy, scikit-learn)
3. MATLAB (to generate output in graphical form)
4. SPSS
5. SAS
Real life application
• Some real-life applications of machine learning:
– Recommender systems: suggesting similar people on Facebook/LinkedIn, or similar movies, books, etc. on Amazon
– Business applications: customer segmentation, customer retention, targeted marketing, etc.
– Medical applications: disease diagnosis
– Banking: credit card issuance, fraud detection, etc.
– Language translation, text to speech and speech to text
Future scope
• Companies such as Google, Facebook, Microsoft and Bank of America (BoA) already use ML; those that do not are at a disadvantage.
• With the growing use of IoT (households, businesses, industries, etc.), there is a need to analyze data continuously and draw conclusions from it using machine learning.
• With connected devices, we now have access to far more data and, along with it, an increased need to manage and understand what we know.
• In the future, users will receive more precise recommendations, and ads will become both more effective and less annoying.
Conclusion
• Machine learning can efficiently support fraud and error detection systems.
• Association rules are often the most accurate approach for suggesting products in market basket analysis.
• ML can play a useful role in the different phases of software engineering, such as planning, analysis, design and testing.
• It is most useful in analyzing data generated by the sensors used in IoT.
“Machine Learning is like magic where you can get an answer to any question”
Thank You
Any Questions?
