Approaches to ML Techniques on Real World Data
A Demo on Behavioral Analysis in Social Networking Sites
Venkata Ramana C
Real World Data
• Social Networking Sites
• Blogs
• Forums
• Tweets …
Aim
• Create an environment where you define your own rules
  for how your data is shared on the Web.
Technique
• Active Learning
Active Learning
• The key idea behind active learning is that a machine learning
  algorithm can achieve greater accuracy with fewer labelled
  training instances if it is allowed to choose the data from
  which it learns.

• An active learner may pose queries, in the form of unlabelled
  instances, to be labelled by an oracle (e.g., a human
  annotator).
Scenario
Uncertainty Sampling
• The project applies the technique as follows:
• The user is asked a question about each friend, as below,
  followed by which profile features the user wants to share
  with that friend.
• How are you associated with your friend? (or)
• What do you have in common with your friend?
• 1. Personal
• 2. Don't know how (shall take some time to decide?)
• 3. We have ...x.y.z... (specify) in common. [This will form a group.]
Pseudo Algorithm
• Input: an initial small training set L and a pool of
  unlabeled data U.
• Use L to train the initial classifier C.
   – Repeat
      • Use the current classifier C to label all unlabeled
        examples in U.
      • Use uncertainty sampling to select the m most
        informative unlabeled examples, and ask the
        oracle H to label them.
      • Augment L with these m new examples, and
        remove them from U.
      • Use L to retrain the current classifier C.
   – Until the predefined stopping criterion SC is met.
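The loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the demo: the classifier is a small naive Bayes-style counter over Yes/No features, "most informative" is taken to mean a predicted probability closest to 0.5, the stopping criterion SC is a fixed number of rounds, and the names train, prob_share, active_learn and oracle_h are hypothetical.

```python
from collections import defaultdict

def train(labelled):
    """Count-based classifier C (assumed): per-column label counts."""
    class_counts = defaultdict(int)
    feat_counts = defaultdict(int)  # (column, value, label) -> count
    for features, label in labelled:
        class_counts[label] += 1
        for col, value in enumerate(features):
            feat_counts[(col, value, label)] += 1
    return class_counts, feat_counts

def prob_share(model, features):
    """Estimated probability of 'Share', with add-one smoothing."""
    class_counts, feat_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label in ("Share", "NotShare"):
        n = class_counts.get(label, 0)
        score = (n + 1) / (total + 2)
        for col, value in enumerate(features):
            score *= (feat_counts.get((col, value, label), 0) + 1) / (n + 2)
        scores[label] = score
    return scores["Share"] / (scores["Share"] + scores["NotShare"])

def active_learn(L, U, oracle, m=1, max_rounds=10):
    """Pool-based active learning with uncertainty sampling."""
    L, U = list(L), list(U)
    C = train(L)                       # initial classifier from L
    rounds = 0
    while U and rounds < max_rounds:   # stopping criterion SC (assumed)
        # Score every example in U with C; least confident first.
        ranked = sorted(U, key=lambda x: abs(prob_share(C, x) - 0.5))
        for x in ranked[:m]:
            L.append((x, oracle(x)))   # ask the oracle H for the label
            U.remove(x)
        C = train(L)                   # retrain on the augmented L
        rounds += 1
    return C, L, U

# Hypothetical oracle standing in for the human user.
def oracle_h(features):
    return "NotShare" if features[1] == "Yes" else "Share"

L0 = [(("Yes", "No", "No"), "Share"), (("No", "Yes", "No"), "NotShare")]
U0 = [("No", "No", "Yes"), ("Yes", "Yes", "No"),
      ("No", "Yes", "Yes"), ("Yes", "No", "Yes")]
model, L, U = active_learn(L0, U0, oracle_h, m=1, max_rounds=2)
```

With m=1 and two rounds, two friends move from the pool U into the labelled set L, and the classifier is retrained after each answer.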
So When is This Useful?
Friend Groups
Active Learning for Privacy
Courtesy: Privacy Wizards for Social Networking Sites. WWW2010
Concepts
• Gini Impurity
   - the expected error rate if a label drawn at random from
     the set is applied to a randomly chosen item in the set.
• Entropy
   - how mixed up a set is.
 p(i) = frequency(outcome) = count(outcome) / count(total rows)
 Entropy = -(sum of p(i) x log2(p(i)) over all outcomes)
Courtesy: Collective Intelligence by Toby Segaran
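Both measures are short enough to compute by hand or in code. The following is a minimal sketch, assuming labels arrive as a plain Python list; the function names are illustrative and are not taken from the demo. Entropy carries a leading minus sign so that it is non-negative, and Gini impurity is written as one minus the sum of squared class frequencies.

```python
from collections import Counter
from math import log2

def class_probabilities(labels):
    """p(i) = count(outcome) / count(total rows)."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]

def entropy(labels):
    """Entropy in bits: sum of -p(i) * log2(p(i)) over all outcomes."""
    return sum(-p * log2(p) for p in class_probabilities(labels))

def gini_impurity(labels):
    """Chance of mislabelling an item when labels are drawn at random."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))

labels = ["Share", "NotShare", "Share"]
print(round(entropy(labels), 3))        # 0.918
print(round(gini_impurity(labels), 3))  # 0.444
```

A set with a single label scores 0 on both measures, and an even two-way mix gives the maximum for two classes (entropy 1 bit, Gini impurity 0.5).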
CART
 (Classification and Regression Trees)
• Decision tree classifiers are simple to view and
  interpret.
• If-Then rules.
Application Data.
Id       group1   group2   .....   DOB
123      Yes      No       .....   Share
124      No       Yes      .....   NotShare
......   ...      .....    .....
129      Yes      Yes      .....   Share
Demo
• We will train decision trees on the answers given for
  some of our friends in social networks, and use them
  to set privacy preferences.
Demo Results:
• [u'project', u'srm', u'sssg'] => Feature Vector
    [0, 1, 2] => mapped by CART algorithm
• The goal is to decide 'Share' or 'NotShare' for each of:
   DOB, zip, religion, phone, email.
For DOB:
  ['Yes', 'No', 'No', 'Share'] (from Vasu)
  ['No', 'Yes', 'No', 'NotShare'] (from Cigith)
  ['No', 'No', 'Yes', 'Share'] (from Harish)
Decision Tree for DOB:
                  0: Yes?
               F /       \ T
      {'Share': 2}     {'NotShare': 1}
Zip Code:
• [['Yes', 'No', 'No', 'Share'], ['No', 'Yes', 'No',
  'NotShare'], ['No', 'No', 'Yes', 'Share']]
                    1: Yes?
            T->                F->
{'NotShare': 1}                    {'Share': 2}
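The zip-code tree can be reproduced from the three rows listed for it. The sketch below is an assumption about how such a split might be chosen, not the demo's actual code: it tries each group column, splits on the value 'Yes', and keeps the column with the largest entropy reduction. The helper names entropy, split and best_split are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(rows):
    """Entropy of the class label, stored in the last column."""
    counts = Counter(row[-1] for row in rows)
    return sum(-(n / len(rows)) * log2(n / len(rows)) for n in counts.values())

def split(rows, column, value="Yes"):
    """Partition rows by whether the given column equals value."""
    true_rows = [r for r in rows if r[column] == value]
    false_rows = [r for r in rows if r[column] != value]
    return true_rows, false_rows

def best_split(rows):
    """Return (column, gain) for the split with the highest information gain."""
    base = entropy(rows)
    best_gain, best_column = 0.0, None
    for column in range(len(rows[0]) - 1):
        true_rows, false_rows = split(rows, column)
        if not true_rows or not false_rows:
            continue  # a one-sided split tells us nothing
        p = len(true_rows) / len(rows)
        gain = base - p * entropy(true_rows) - (1 - p) * entropy(false_rows)
        if gain > best_gain:
            best_gain, best_column = gain, column
    return best_column, best_gain

rows = [["Yes", "No", "No", "Share"],
        ["No", "Yes", "No", "NotShare"],
        ["No", "No", "Yes", "Share"]]
column, gain = best_split(rows)
true_rows, false_rows = split(rows, column)
print(column,
      dict(Counter(r[-1] for r in true_rows)),
      dict(Counter(r[-1] for r in false_rows)))
# 1 {'NotShare': 1} {'Share': 2}
```

Column 1 separates the rows perfectly, so both branches are pure and the tree needs only one test.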
References
• Collective Intelligence by Toby Segaran.
• Privacy Wizards for Social Networking Sites. WWW2010.
