
Application of Clustering in Data Science using Real-life Examples

Clustering data into subsets is an important task for many data science applications. It is considered one of the most important unsupervised learning techniques. Keeping this in mind, we have come up with a free webinar, ‘Application of Clustering in Data Science using Real-life Examples.’

Published in: Education, Technology
  • Netflix uses 1 petabyte to store the videos for streaming.
    BitTorrent Sync has transferred over 30 petabytes of data since its pre-alpha release in January 2013.
    The 2009 movie Avatar is reported to have taken over 1 petabyte of local storage at Weta Digital for the rendering of the 3D CGI effects.
    One petabyte of average MP3-encoded songs (for mobile, roughly one megabyte per minute) would require about 2,000 years to play.
  • News groups as clusters
  • Transcript of "Application of Clustering in Data Science using Real-life Examples"

    1. Data Science Webinar Series: Applications of Clustering in Real Life (www.edureka.in/data-science). View Data Science Courses at: www.edureka.in/data_science. Twitter @edurekaIN, Facebook /edurekaIN, use #AskEdureka for Questions.
    2. Meet Your Instructor: Mr. Kumaran Ponnambalam, Director, Data Engineering & PS, Transera Inc, San Francisco Bay Area
    3. Meet Your Instructor
    4. Objectives. At the end of this session, you will be able to:
       • Understand Data Science applications and prospects
       • Get an overview of Machine Learning
       • Understand the difference between Supervised and Unsupervised Learning
       • Learn Clustering and K-means Clustering
       • Implement K-means clustering in R
    5. Data Science Applications: Wine Recommendation
    6. Data Science Applications: Pizza Hut
    7. Data Science Applications: Netflix
    8. Data Science Applications: Summarize News
    9. How about this?
    10. What’s Common in these Applications? According to Wikipedia: "Data science is the study of the generalizable extraction of knowledge from data, yet the key word is science." These scenarios involve:
        • Storing, organizing and integrating huge amounts of unstructured data
        • Processing and analyzing the data
        • Extracting knowledge and insights from the data, and predicting the future from it
        Storage of big data is done in Hadoop. For more details on Hadoop, please refer to the Big Data and Hadoop blog: http://www.edureka.in/blog/category/big-data-and-hadoop/
        Processing, analyzing, and extracting knowledge and insights are done through Machine Learning.
    11. Data Science: Demand Supply Gap. Filled jobs vs unfilled jobs in big data (Vacancy/Filled, %):
        • Big Data Analyst: 50 filled, 50 unfilled
        • Big Data Architect: 43 filled, 57 unfilled
        • Big Data Engineer: 44 filled, 56 unfilled
        • Big Data Research Analyst: 31 filled, 69 unfilled
        • Big Data Visualizer: 23 filled, 77 unfilled
        • Data Scientist: 18 filled, 82 unfilled
        Source: Gartner Says Big Data Creates Big Jobs: 4.4 Million IT Jobs Globally to Support Big Data By 2015, http://www.gartner.com/newsroom/id/2207915
    12. Data Science: Job Trends
    13. Machine Learning Categories. Types of Learning:
        • Supervised Learning: inferring a function from labelled training data.
        • Unsupervised Learning: trying to find hidden structure in unlabelled data.
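The two definitions on this slide can be contrasted with a small sketch. It is written in Python purely for illustration (the webinar's own demo is in R), and the data values, the labels "small" and "large", and the helper names `fit_threshold`, `predict` and `split_at_widest_gap` are all assumptions made up for this example, not part of the deck.

```python
# Supervised: labelled training data -> infer a function (here, a cut-off value).
labelled = [(1.0, "small"), (1.5, "small"), (2.0, "small"),
            (8.0, "large"), (9.0, "large"), (10.0, "large")]


def fit_threshold(samples):
    """Learn a cut-off halfway between the means of the two labelled classes."""
    small = [x for x, label in samples if label == "small"]
    large = [x for x, label in samples if label == "large"]
    return (sum(small) / len(small) + sum(large) / len(large)) / 2


threshold = fit_threshold(labelled)


def predict(x):
    """The inferred function: map a new value to a label."""
    return "small" if x < threshold else "large"


# Unsupervised: no labels at all -> look for hidden structure
# (here, simply the widest gap between sorted values).
unlabelled = [1.2, 9.5, 1.8, 8.7, 2.1, 9.9]


def split_at_widest_gap(values):
    """Split sorted values into two groups at the largest gap between neighbours."""
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


groups = split_at_widest_gap(unlabelled)
```

In the first half the labels drive what is learned; in the second half the grouping comes only from the values themselves, which is the setting clustering works in.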
    14. Machine Learning Categories. What category do the applications below fall into? Supervised Learning; Supervised Learning; Unsupervised Learning; Unsupervised Learning
    15. Common Machine Learning Algorithms, by type of learning:
        • Supervised Learning algorithms: Naïve Bayes, Support Vector Machines, Random Forests, Decision Trees
        • Unsupervised Learning algorithms: K-means, Fuzzy Clustering, Hierarchical Clustering
    16. Clustering
    17. Clustering: Scenarios. The following scenarios can be addressed with clustering:
        • A telephone company needs to establish its network by putting up towers in a region it has acquired. A clustering algorithm can find tower locations such that all its users receive optimum signal strength.
        • The Miami DEA wants to make its law enforcement more stringent and has decided to station its patrol vans across the area so that areas with high crime rates are in the vicinity of the patrol vans.
        • A hospital care chain wants to open a series of emergency-care wards, keeping in mind the most accident-prone areas in a region.
    18. Some More Use-Cases of Clustering
        • Organizing data into clusters shows the internal structure of the data. Ex. Clusty and clustering genes
        • Sometimes the partitioning is the goal. Ex. Market segmentation
        • Prepare for other AI techniques. Ex. Summarize news (cluster and then find centroid)
        • Discovery in data. Ex. Underlying rules, reoccurring patterns, topics, etc.
    19. What is Clustering? Organizing data into clusters such that there is:
        • High intra-cluster similarity
        • Low inter-cluster similarity
        • Informally, finding natural groupings among objects
        http://en.wikipedia.org/wiki/Cluster_analysis
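One way to make "high intra-cluster similarity, low inter-cluster similarity" concrete is to compare average distances within a cluster to average distances between clusters. The sketch below is in Python for illustration only; treating Euclidean distance as the inverse of similarity is an assumption, and the two point sets and function names are made up for the example.

```python
from itertools import combinations, product
from math import dist

# Two hypothetical clusters of 2-D points.
cluster_a = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
cluster_b = [(8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]


def mean_intra_distance(cluster):
    """Average distance between all pairs of points inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def mean_inter_distance(first, second):
    """Average distance between points taken from two different clusters."""
    pairs = list(product(first, second))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


intra_a = mean_intra_distance(cluster_a)
intra_b = mean_intra_distance(cluster_b)
inter_ab = mean_inter_distance(cluster_a, cluster_b)
```

For a good grouping, both intra-cluster averages should come out much smaller than the inter-cluster average: points are close to their own group and far from the other one.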
    20. K-Means Clustering
    21. K-Means Clustering. The process by which objects are classified into a number of groups so that they are as dissimilar as possible from one group to another, but as similar as possible within each group. The objects in group 1 should be as similar as possible, but there should be a large difference between an object in group 1 and an object in group 2. The attributes of the objects are allowed to determine which objects should be grouped together. (Diagram: total population divided into Group 1, Group 2, Group 3 and Group 4.)
    22. K-Means: Pizza Hut Clustering Example
    23. Let us suppose the following points are the delivery locations for pizza.
    24. Let's place three cluster centres (C1, C2, C3) at random.
    25. Find the distance from each point to the cluster centres, as shown.
    26. Assign each point to its nearest cluster centre, based on the distance between each centre and the point.
    27. Move each cluster centre to the mean of the points assigned to it, then locate the nearest points again.
    28. Move the cluster centres again, recalculate the distances, and locate the nearest points.
    29. Form the three clusters.
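The walkthrough on slides 23 to 29 can be sketched as a short loop: assign every point to its nearest centre, move each centre to the mean of its points, and repeat until the centres stop moving. This is a minimal Python sketch under stated assumptions, not the webinar's R demo: the delivery coordinates are invented, and the starting centres are fixed by hand instead of being chosen at random so that the run is reproducible.

```python
from math import dist


def assign(points, centres):
    """Assignment step: index of the nearest centre for each point."""
    return [min(range(len(centres)), key=lambda i: dist(p, centres[i]))
            for p in points]


def update(points, labels, centres):
    """Update step: move each centre to the mean of the points assigned to it."""
    new_centres = []
    for i, old in enumerate(centres):
        members = [p for p, label in zip(points, labels) if label == i]
        if not members:  # empty cluster: leave the centre where it was
            new_centres.append(old)
            continue
        new_centres.append(
            tuple(sum(axis) / len(members) for axis in zip(*members)))
    return new_centres


def k_means(points, centres, max_iter=100):
    """Alternate the two steps until the centres no longer change."""
    for _ in range(max_iter):
        new_centres = update(points, assign(points, centres), centres)
        if new_centres == centres:
            break
        centres = new_centres
    return centres, assign(points, centres)


# Hypothetical delivery locations and hand-picked starting centres C1, C2, C3.
deliveries = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
              (8.0, 8.0), (8.0, 9.0), (9.0, 8.0),
              (1.0, 8.0), (2.0, 9.0), (1.0, 9.0)]
start = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]

centres, labels = k_means(deliveries, start)
```

With real data the starting centres would normally be random, and different starts can lead to different final clusters, so it is common to run the loop several times and keep the best result.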
    30. Elbow method. The elbow curve plots the objective function value, i.e., the distortion, against k. The value of k should be such that even if we increase k from here on, the distortion remains nearly constant. This is the ideal value of k for the clusters created.
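A rough sketch of the elbow idea follows, in Python for illustration. It assumes distortion means the sum of squared distances from each point to its nearest centre, which is the usual K-means objective but is not spelled out on the slide. The 1-D points, the hand-written centres (cluster means typed in directly, not the output of a K-means run) and the `tolerance` rule in `pick_elbow` are all assumptions made for this example.

```python
from math import dist


def distortion(points, centres):
    """Sum of squared distances from each point to its nearest centre."""
    return sum(min(dist(p, c) for c in centres) ** 2 for p in points)


def pick_elbow(distortions, tolerance=0.1):
    """Smallest k after which one more cluster lowers the distortion by less
    than `tolerance` times the distortion at the smallest k."""
    ks = sorted(distortions)
    baseline = distortions[ks[0]]
    for k, next_k in zip(ks, ks[1:]):
        if distortions[k] - distortions[next_k] < tolerance * baseline:
            return k
    return ks[-1]


# Hypothetical 1-D data with two obvious groups.
points = [(1.0,), (2.0,), (9.0,), (10.0,)]

# Centres for k = 1..4, written out by hand for illustration.
centres_for_k = {
    1: [(5.5,)],
    2: [(1.5,), (9.5,)],
    3: [(1.5,), (9.0,), (10.0,)],
    4: [(1.0,), (2.0,), (9.0,), (10.0,)],
}

curve = {k: distortion(points, c) for k, c in centres_for_k.items()}
best_k = pick_elbow(curve)
```

Here the distortion falls sharply from k = 1 to k = 2 and barely changes afterwards, so the sketch picks k = 2. In practice the elbow is often judged by eye from the plotted curve; the threshold rule is only one simple way to automate that.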
    31. Let’s Look at Another Scenario. Now let us consider another clustering scenario: the data from “Google page rank”. Notice that the data given here are sentences and not vectors. Can we apply K-means clustering to it? To analyze this type of data we use the “TF-IDF algorithm”, which converts these attributes to vectors. We will take a deep dive into TF-IDF in module 3 of this course.
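To illustrate how sentences can become vectors that K-means can work on, here is a minimal Python sketch of one common TF-IDF variant: term frequency is the word count divided by sentence length, and inverse document frequency is the natural log of the number of sentences divided by the number of sentences containing the word. The slide does not say which variant the course uses, so the weighting, the whitespace tokenisation and the three sample sentences are all assumptions.

```python
from math import log


def tf_idf(sentences):
    """Turn each sentence into a vector with one TF-IDF weight per vocabulary word."""
    docs = [s.lower().split() for s in sentences]
    vocab = sorted({word for doc in docs for word in doc})
    n_docs = len(docs)
    # Number of sentences in which each word appears at least once.
    doc_freq = {w: sum(w in doc for doc in docs) for w in vocab}
    vectors = [
        [(doc.count(w) / len(doc)) * log(n_docs / doc_freq[w]) for w in vocab]
        for doc in docs
    ]
    return vocab, vectors


# Hypothetical sentences standing in for the text data on the slide.
sentences = [
    "google ranks pages",
    "pages link to pages",
    "pizza delivery is fast",
]

vocab, vectors = tf_idf(sentences)
```

Every sentence now has a numeric vector of the same length, so distances between sentences can be computed and the clustering steps described earlier apply unchanged.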
    32. Demo. More information on R setup and applications at: http://www.edureka.in/blog/category/business-analytics-with-r/
    33. Course Topics
        • Module 1: Introduction to Data Science
        • Module 2: Basic Data Manipulation using R
        • Module 3: Machine Learning Techniques using R, Part 1 (Clustering; TF-IDF and Cosine Similarity; Association Rule Mining)
        • Module 4: Machine Learning Techniques using R, Part 2 (Supervised and Unsupervised Learning; Decision Tree Classifier)
        • Module 5: Machine Learning Techniques using R, Part 3 (Random Forest Classifier; Naïve Bayes Classifier)
        • Module 6: Introduction to Hadoop Architecture
        • Module 7: Integrating R with Hadoop
        • Module 8: Mahout Introduction and Algorithm Implementation
        • Module 9: Additional Mahout Algorithms and Parallel Processing in R
        • Module 10: Project
    34. Questions? Enroll for the complete course at: www.edureka.in/data_science. Please don’t forget to fill in the survey report. Class recording and presentation will be available in 24 hours at: http://www.edureka.in/blog/application-of-clustering-in-data-science-using-real-life-examples/
