This document provides an overview of data mining and related topics from a professor's lecture. It discusses:
- The growth of data and need for automated analysis, leading to the emergence of data mining in the late 1980s.
- The data mining process involves selecting, cleaning, transforming, mining, and evaluating data to discover useful patterns. Common data mining tasks include classification, clustering, associations, and prediction.
- Not all patterns discovered will be interesting, and it is difficult to find all and only the interesting patterns due to issues of completeness and optimization in the data mining process. Background knowledge can help address these issues.
Slides for the 2016/2017 edition of the Data Mining and Text Mining Course at the Politecnico di Milano. The course is also part of the joint program with the University of Illinois at Chicago.
3. Prof. Pier Luca Lanzi
Why Data Mining?
“Necessity is the mother of invention”
Explosive Growth of Data
Pressing need for the automated analysis of massive data
Emerged in the late 1980s
Major developments in the mid 1990s
4. Prof. Pier Luca Lanzi
http://www.iflscience.com/technology/amount-data-internet-generates-every-minute-crazy/
5. Prof. Pier Luca Lanzi
Evolution of Technology
• 1960s: data collection, database creation, & network DBMS
• 1970s: relational data model, relational DBMS implementation
• 1980s: RDBMS, advanced data models (extended-relational, OO,
deductive, etc.); application-oriented DBMS
(spatial, scientific, engineering, etc.)
• 1990s: data mining, data warehousing, multimedia databases,
and Web databases
• 2000s: stream data management and mining, web technology (XML,
data integration), global information systems
• 2010s: social networks, NoSQL, unstructured data, etc.
6. Prof. Pier Luca Lanzi
http://launchhack.com/content/25-cartoons-give-current-big-data-hype-perspective/
7. Prof. Pier Luca Lanzi
What is the Commercial Viewpoint?
• Huge amounts of data are being collected
and warehoused every day
§ Web data, e-commerce
§ Purchases at department stores
§ Bank/Credit Card transactions
• Computers have become
cheaper and more powerful
• Competitive pressure is strong
to provide better, customized services
(e.g., CRM or Customer Relationship Management)
• Poor data across businesses and the government costs
a huge amount of money
8. Prof. Pier Luca Lanzi
What is the Scientific Viewpoint?
• Data collected and stored at
enormous speeds (GB/hour)
§remote sensors on a satellite
§telescopes scanning the skies
§microarrays generating gene
expression data
§scientific simulations
generating terabytes of data
• Traditional techniques infeasible for raw data
• Data mining may help scientists
§in classifying and segmenting data
§in Hypothesis Formation
9. Prof. Pier Luca Lanzi
Examples
• Customer attrition
§ Given customer information for the past months
§ Predict who is likely to attrite next month,
or estimate customer value
• Credit assessment
§ Given a loan application
§ Predict whether the bank should approve the loan
• Customer segmentation
§ Given information about the customers
§ Identify interesting groups among them
• Community detection
§ Given a social network of users
§ Identify communities based on their connections
(friendship relation, discussions, etc.)
17. Prof. Pier Luca Lanzi
Data Mining
The non-trivial process of identifying
(1) valid, (2) novel, (3) potentially useful,
and (4) understandable patterns in data.
18. Prof. Pier Luca Lanzi
An Example Using Contact Lens Data
Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
Young           Myope                   No           Reduced               None
Young           Myope                   No           Normal                Soft
Young           Myope                   Yes          Reduced               None
Young           Myope                   Yes          Normal                Hard
Young           Hypermetrope            No           Reduced               None
Young           Hypermetrope            No           Normal                Soft
Young           Hypermetrope            Yes          Reduced               None
Young           Hypermetrope            Yes          Normal                Hard
Pre-presbyopic  Myope                   No           Reduced               None
Pre-presbyopic  Myope                   No           Normal                Soft
Pre-presbyopic  Myope                   Yes          Reduced               None
Pre-presbyopic  Myope                   Yes          Normal                Hard
Pre-presbyopic  Hypermetrope            No           Reduced               None
Pre-presbyopic  Hypermetrope            No           Normal                Soft
Pre-presbyopic  Hypermetrope            Yes          Reduced               None
Pre-presbyopic  Hypermetrope            Yes          Normal                None
Presbyopic      Myope                   No           Reduced               None
Presbyopic      Myope                   No           Normal                None
Presbyopic      Myope                   Yes          Reduced               None
Presbyopic      Myope                   Yes          Normal                Hard
Presbyopic      Hypermetrope            No           Reduced               None
Presbyopic      Hypermetrope            No           Normal                Soft
Presbyopic      Hypermetrope            Yes          Reduced               None
Presbyopic      Hypermetrope            Yes          Normal                None
19. Prof. Pier Luca Lanzi
An Example of a Possible Pattern
if astigmatism = yes
and tear production rate = normal
and spectacle prescription = myope
then recommendation = hard
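As a rough illustration, the rule above can be checked against the 24 contact lens records from the previous slide. The sketch below counts how many records the rule covers and how often its prediction is right. The function and variable names are made up for this example, and "coverage" and "confidence" are used here in their usual informal sense rather than as terms defined on the slides.

```python
# Contact lens records as listed on the slide:
# (age, spectacle prescription, astigmatism, tear production rate, recommended lenses)
ROWS = [
    ("young", "myope", "no", "reduced", "none"),
    ("young", "myope", "no", "normal", "soft"),
    ("young", "myope", "yes", "reduced", "none"),
    ("young", "myope", "yes", "normal", "hard"),
    ("young", "hypermetrope", "no", "reduced", "none"),
    ("young", "hypermetrope", "no", "normal", "soft"),
    ("young", "hypermetrope", "yes", "reduced", "none"),
    ("young", "hypermetrope", "yes", "normal", "hard"),
    ("pre-presbyopic", "myope", "no", "reduced", "none"),
    ("pre-presbyopic", "myope", "no", "normal", "soft"),
    ("pre-presbyopic", "myope", "yes", "reduced", "none"),
    ("pre-presbyopic", "myope", "yes", "normal", "hard"),
    ("pre-presbyopic", "hypermetrope", "no", "reduced", "none"),
    ("pre-presbyopic", "hypermetrope", "no", "normal", "soft"),
    ("pre-presbyopic", "hypermetrope", "yes", "reduced", "none"),
    ("pre-presbyopic", "hypermetrope", "yes", "normal", "none"),
    ("presbyopic", "myope", "no", "reduced", "none"),
    ("presbyopic", "myope", "no", "normal", "none"),
    ("presbyopic", "myope", "yes", "reduced", "none"),
    ("presbyopic", "myope", "yes", "normal", "hard"),
    ("presbyopic", "hypermetrope", "no", "reduced", "none"),
    ("presbyopic", "hypermetrope", "no", "normal", "soft"),
    ("presbyopic", "hypermetrope", "yes", "reduced", "none"),
    ("presbyopic", "hypermetrope", "yes", "normal", "none"),
]


def rule_applies(row):
    """Condition part of the rule: astigmatism = yes, tear rate = normal, prescription = myope."""
    _age, prescription, astigmatism, tear_rate, _lenses = row
    return astigmatism == "yes" and tear_rate == "normal" and prescription == "myope"


covered = [row for row in ROWS if rule_applies(row)]      # records the rule fires on
correct = [row for row in covered if row[4] == "hard"]    # ... where "hard" is right

coverage = len(covered) / len(ROWS)       # fraction of records the rule applies to
confidence = len(correct) / len(covered)  # fraction of covered records predicted correctly

print(f"covered: {len(covered)} of {len(ROWS)} records")
print(f"coverage: {coverage:.3f}, confidence: {confidence:.2f}")
```

On this small table the rule fires on three records and all three recommend hard lenses, so it is fully accurate here but applies to only a small part of the data.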
21. Prof. Pier Luca Lanzi
How Can We Evaluate a Pattern?
• Is it valid?
§The pattern has to be valid with respect
to a certainty level (e.g., the rule holds in 86% of the cases)
• Is it novel?
§Is the relation between astigmatism and
hard contact lenses already well-known?
• Is it useful? Is it actionable?
§The pattern should provide information
useful to the bank for assessing credit risk
• Is it understandable?
22. Prof. Pier Luca Lanzi
but there is another important question …
was it “worth” finding it? (I mean $ worth)
how much did the search cost?
how much value did it bring?
23. Prof. Pier Luca Lanzi
Example of Cost-Based
Model Evaluation
• A bank has a predictive model that can identify risky loans with an
accuracy of 72%
• Your company develops a model that improves on this
by 3 percentage points, reaching an accuracy of 75%
• Is this a good result?
• We might simply look at the 3-point improvement, but giving out
loans has a cost that depends on the type of error we make
24. Prof. Pier Luca Lanzi
Example of Cost-Based
Model Evaluation
• Predictive accuracy in this case is defined as
(true positives + true negatives) / (total number of loans)
• True Positives, True Negatives
§ Safe and risky loans predicted as safe and risky respectively
• False Positive Errors
§ We accept a risky loan which we predicted was safe
§ We are likely not to get the money back
(let’s say on average 30000 euros)
• False Negative Errors
§ We don’t give a safe loan since we predicted it was risky
§ We will lose the interest money
(let’s say on average 10000 euros)
25. Prof. Pier Luca Lanzi
Example of Cost-Based
Model Evaluation
• Original Model
§1576 false positives and 1224 false negatives
§Total cost is 59525443
• Our Model
§1407 false positives and 1093 false negatives
§Total cost is 53147717 (more than 6 million saved)
• What if we can change the way our model makes mistakes?
§1093 false positives and 1407 false negatives
§Total cost becomes 46852283 (more than 12 million saved)
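The comparison above can be approximated with a few lines of arithmetic. The sketch below assumes every false positive costs exactly the average 30000 euros and every false negative exactly the average 10000 euros mentioned earlier. Under that assumption the totals come out as round figures close to, but not identical with, the totals on the slide, which were presumably computed from the actual amount of each loan. The names `total_cost` and `SCENARIOS` are made up for this example.

```python
# Average error costs quoted in the lecture (euros).
COST_FALSE_POSITIVE = 30_000  # risky loan accepted: money likely not returned
COST_FALSE_NEGATIVE = 10_000  # safe loan refused: interest lost


def total_cost(false_positives, false_negatives):
    """Total cost of a model's errors, assuming every error costs the average."""
    return (false_positives * COST_FALSE_POSITIVE
            + false_negatives * COST_FALSE_NEGATIVE)


# (false positives, false negatives) for the three cases in the lecture.
SCENARIOS = {
    "original model": (1576, 1224),
    "our model": (1407, 1093),
    "our model, errors swapped": (1093, 1407),
}

baseline = total_cost(*SCENARIOS["original model"])
for name, (fp, fn) in SCENARIOS.items():
    cost = total_cost(fp, fn)
    print(f"{name}: {cost:,} euros (saved {baseline - cost:,})")
```

The third case makes the same number of errors as the second (2500), yet costs over 6 million euros less, because more of the errors fall on the cheaper side.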
26. Prof. Pier Luca Lanzi
What is the General Idea?
• Build computer programs that navigate through databases
automatically, seeking regularities or patterns
• There will be problems
§Most patterns are banal and uninteresting
§Most patterns are spurious, inexact, or contingent on
accidental coincidences in the particular dataset used
§Real data is imperfect: Some parts will be garbled,
and some will be missing
• Algorithms need to be robust enough to cope with imperfect
data and to extract regularities that are inexact but useful
35. Prof. Pier Luca Lanzi
Statistics, Machine Learning,
and Data Mining
• Statistics is more theory-based and focuses on testing hypotheses
• Machine learning is more heuristic-based, focuses on building
programs that learn, and is more general than Data Mining
• Data Mining
§Integrates theory and heuristics
§Focuses on the entire process of discovery, including
data cleaning, learning, integration and visualization
Distinctions are blurred!
37. Prof. Pier Luca Lanzi
Why Is It Different?
• Tremendous amount of data
§ High scalability to handle terabytes of data
• High-dimensionality of data
§ Micro-array may have tens of thousands of dimensions
• High complexity of data
§ Data streams and sensor data
§ Time-series data, temporal data, sequence data
§ Structured data, graphs, social networks and multi-linked data
§ Heterogeneous databases and legacy databases
§ Spatial, spatiotemporal, multimedia, text and Web data
§ Software programs, scientific simulations
• New and sophisticated applications
38. Prof. Pier Luca Lanzi
Knowledge Discovery Process
selection
cleaning
transformation
mining
evaluation
39. Prof. Pier Luca Lanzi
Knowledge Discovery Process
What are the main steps?
• Learning the application domain to extract
relevant prior knowledge and goals
• Data selection
• Data cleaning
• Data reduction and transformation
• Mining
§ Select the mining approach: classification,
regression, association, clustering, etc.
§ Choose the mining algorithm(s)
§ Perform mining: search for patterns of interest
• Pattern evaluation and knowledge presentation
§ visualization, transformation,
removing redundant patterns, etc.
• Use of discovered knowledge
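To make the steps above concrete, here is a minimal sketch of the selection, cleaning, transformation, mining, and evaluation stages chained together. Everything in it is an assumption made for illustration: the customer records are invented, the age groups borrow the {20-39} = young and {40-59} = middle_aged grouping used elsewhere in this lecture, and the "mining" step is just a majority count per group rather than a real algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical raw records, invented for this example.
RAW_RECORDS = [
    {"id": 1, "age": 23, "city": "Milano", "churned": "yes"},
    {"id": 2, "age": 31, "city": "Roma", "churned": "yes"},
    {"id": 3, "age": 35, "city": "Milano", "churned": "yes"},
    {"id": 4, "age": 28, "city": "Torino", "churned": "no"},
    {"id": 5, "age": 45, "city": "Roma", "churned": "no"},
    {"id": 6, "age": 52, "city": "Milano", "churned": "no"},
    {"id": 7, "age": 48, "city": "Torino", "churned": "no"},
    {"id": 8, "age": 57, "city": "Roma", "churned": "yes"},
    {"id": 9, "age": None, "city": "Milano", "churned": "no"},
    {"id": 10, "age": 38, "city": "Roma", "churned": "yes"},
]


def select(records):
    """Selection: keep only the attributes relevant to the question."""
    return [{"age": r["age"], "churned": r["churned"]} for r in records]


def clean(records):
    """Cleaning: drop records with a missing age."""
    return [r for r in records if r["age"] is not None]


def transform(records):
    """Transformation: replace the numeric age with an age group."""
    def group(age):
        if 20 <= age <= 39:
            return "young"
        if 40 <= age <= 59:
            return "middle_aged"
        return "other"
    return [(group(r["age"]), r["churned"]) for r in records]


def mine(pairs):
    """Mining: for each age group, find the majority outcome and its confidence."""
    counts = defaultdict(Counter)
    for age_group, outcome in pairs:
        counts[age_group][outcome] += 1
    patterns = []
    for age_group, outcomes in counts.items():
        outcome, hits = outcomes.most_common(1)[0]
        patterns.append((age_group, outcome, hits / sum(outcomes.values())))
    return patterns


def evaluate(patterns, min_confidence=0.8):
    """Evaluation: keep only patterns that are reliable enough."""
    return [p for p in patterns if p[2] >= min_confidence]


patterns = evaluate(mine(transform(clean(select(RAW_RECORDS)))))
for age_group, outcome, confidence in patterns:
    print(f"if age group = {age_group} then churned = {outcome} ({confidence:.0%})")
```

In this toy run the mining step proposes two candidate patterns, and the evaluation step keeps only the one for young customers.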
40. Prof. Pier Luca Lanzi
What are the typical
Data Mining tasks?
41. Prof. Pier Luca Lanzi
What are the Major Data Mining
Tasks?
• Classification: predicting an item class
• Clustering: finding clusters in data
• Associations: frequently occurring events…
• Visualization: to facilitate human discovery
• Summarization: describing a group
• Deviation Detection: finding changes
• Estimation: predicting a continuous value
• Link Analysis: finding relationships
• Many more appear as time goes by, e.g., “sentiment” analysis
and “opinion” mining
43. Prof. Pier Luca Lanzi
Input variable: LSTAT - % lower status of the population
Output variable: MEDV - Median value of owner-occupied homes in $1000's
44. Prof. Pier Luca Lanzi
Input variable: LSTAT - % lower status of the population
Output variable: MEDV - Median value of owner-occupied homes in $1000's
51. Prof. Pier Luca Lanzi
Data Mining Tasks: Associations
[Figure: market baskets made up of items such as Bread, Peanuts, Milk, Fruit, Jam, Soda, Chips, Steak, Cheese, and Yogurt]
Is there something interesting to be noted?
52. Prof. Pier Luca Lanzi
Data Mining Tasks: Associations
• Finds interesting associations and/or correlation relationships
among a large set of data items.
• E.g., 98% of people who purchase tires and auto accessories also
get automotive services done
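A statement like the one above is usually backed by two numbers: how often the items occur together (support) and how often the consequent appears when the antecedent does (confidence). The sketch below computes both for one rule. The baskets are hypothetical and are not the ones shown on the slide; the item names and the rule {Soda} -> {Chips} are chosen only for illustration.

```python
# Hypothetical market baskets, invented for this example.
BASKETS = [
    {"Bread", "Milk", "Jam"},
    {"Soda", "Chips", "Milk"},
    {"Soda", "Chips", "Bread"},
    {"Soda", "Chips", "Jam", "Milk"},
    {"Fruit", "Milk", "Peanuts"},
    {"Soda", "Fruit", "Milk"},
]


def count(itemset):
    """Number of baskets that contain every item in the itemset."""
    return sum(1 for basket in BASKETS if itemset <= basket)


def support(itemset):
    """Fraction of all baskets that contain the itemset."""
    return count(itemset) / len(BASKETS)


def confidence(antecedent, consequent):
    """Among baskets with the antecedent, the fraction that also hold the consequent."""
    return count(antecedent | consequent) / count(antecedent)


rule_support = support({"Soda", "Chips"})
rule_confidence = confidence({"Soda"}, {"Chips"})
print(f"{{Soda}} -> {{Chips}}: support {rule_support:.2f}, confidence {rule_confidence:.2f}")
```

Here three of the six baskets contain both items, and three of the four baskets with Soda also contain Chips.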
53. Prof. Pier Luca Lanzi
Data Mining Tasks: Other Tasks
• Outlier analysis
§ Outlier: a data object that does not comply
with the general behavior of the data
§ It can be considered as noise or exception
but is quite useful in fraud detection, rare events analysis
• Trend and evolution analysis
§ Trend and deviation: regression analysis
§ Sequential pattern mining, periodicity analysis
§ Similarity-based analysis
• Text Mining, Topic Modeling, Graph Mining, Data Streams
• Sentiment Analysis, Opinion Mining, etc.
• Other pattern-directed or statistical analyses
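As a small illustration of the outlier analysis mentioned above, the sketch below flags values that lie far from the mean of the data. The z-score test and the threshold of 2 are assumptions chosen for this example (the lecture does not prescribe a method), and the transaction amounts are invented.

```python
from statistics import mean, pstdev

# Hypothetical transaction amounts, invented for this example.
AMOUNTS = [52, 48, 50, 47, 53, 49, 51, 250]


def z_score_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds
    `threshold` population standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing can stand out
    return [v for v in values if abs(v - mu) / sigma > threshold]


outliers = z_score_outliers(AMOUNTS)
print("outliers:", outliers)
```

Whether the flagged value is noise to discard or the fraud case worth investigating depends on the application, which is the point made on the slide.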
55. Prof. Pier Luca Lanzi
Are all the “Discovered” Patterns
Interesting?
• Data Mining may generate thousands of patterns,
but not all of them are interesting.
• Interestingness measures: a pattern is interesting if it is easily understood by
humans, valid on new or test data with some degree of certainty, potentially
useful, novel, or validates some hypothesis that a user seeks to confirm
• Objective vs. subjective interestingness measures:
• Objective: based on statistics and structures of patterns, e.g., support,
confidence, etc.
• Subjective: based on user’s belief in the data, e.g., unexpectedness, novelty, etc.
56. Prof. Pier Luca Lanzi
Can We Find All and Only
Interesting Patterns?
• Completeness
§Find all the interesting patterns
§Can a data mining system find all
the interesting patterns?
§Association vs. classification vs. clustering
• Optimization
§Search for only interesting patterns:
§Can a data mining system find only
the interesting patterns?
§Two approaches: (1) first generate all the patterns and then
filter out the uninteresting ones; (2) generate only the
interesting patterns (mining query optimization)
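The two approaches above can be contrasted on frequent itemsets, taking "interesting" to mean "appears in at least MIN_COUNT transactions". That definition is an assumption for this sketch, as are the transactions. The first function enumerates every possible itemset and filters afterwards. The second grows itemsets level by level and only builds candidates whose subsets are all frequent, in the style of the well-known Apriori pruning idea. Both return the same itemsets, but the second examines far fewer candidates.

```python
from itertools import combinations

# Hypothetical transactions, invented for this example.
TRANSACTIONS = [
    frozenset({"bread", "jam", "milk"}),
    frozenset({"soda", "chips", "milk"}),
    frozenset({"soda", "chips", "bread"}),
    frozenset({"soda", "chips", "jam", "milk"}),
    frozenset({"fruit", "milk", "peanuts"}),
]
MIN_COUNT = 2  # assumed interestingness threshold
ITEMS = sorted(set().union(*TRANSACTIONS))


def count(itemset):
    """Number of transactions that contain the whole itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def generate_then_filter():
    """Approach 1: enumerate every non-empty itemset, then filter."""
    candidates = [frozenset(c)
                  for k in range(1, len(ITEMS) + 1)
                  for c in combinations(ITEMS, k)]
    frequent = {c for c in candidates if count(c) >= MIN_COUNT}
    return frequent, len(candidates)


def generate_only_frequent():
    """Approach 2: level-wise search that never builds a candidate
    with an infrequent subset."""
    level = [frozenset([item]) for item in ITEMS]
    frequent, examined = set(), 0
    while level:
        examined += len(level)
        survivors = {c for c in level if count(c) >= MIN_COUNT}
        frequent |= survivors
        size = len(level[0]) + 1
        # Join surviving itemsets into candidates one item larger.
        joined = {a | b for a in survivors for b in survivors if len(a | b) == size}
        # Keep a candidate only if all its smaller subsets survived.
        level = [c for c in joined
                 if all(frozenset(s) in survivors for s in combinations(c, size - 1))]
    return frequent, examined


brute_result, brute_examined = generate_then_filter()
pruned_result, pruned_examined = generate_only_frequent()
print("same itemsets found:", brute_result == pruned_result)
print("candidates examined:", brute_examined, "vs", pruned_examined)
```

With seven distinct items the exhaustive approach examines all 127 non-empty itemsets, while the level-wise approach examines 18.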
57. Prof. Pier Luca Lanzi
What About Background Knowledge?
• A typical kind of background knowledge: Concept hierarchies
• Schema hierarchy
§ E.g., Street < City < ProvinceOrState < Country
• Set-grouping hierarchy
§ E.g., {20-39} = young, {40-59} = middle_aged
• Operation-derived hierarchy
§ email address: hagonzal@cs.uiuc.edu
§ login-name < department < university < country
• Rule-based hierarchy
§ LowProfitMargin (X) <= Price(X, P1) and Cost (X, P2)
and (P1 - P2) < $50
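The four kinds of hierarchy above can each be written as a small function or lookup. The sketch below is one possible encoding, with all function names invented for this example. Ages outside the two groups given on the slide are left unmapped, and the last component of the e-mail address is returned as-is even though the slide labels that level "country".

```python
# Schema hierarchy from the slide, ordered from most specific to most general.
SCHEMA_HIERARCHY = ["Street", "City", "ProvinceOrState", "Country"]


def is_more_general(level_a, level_b):
    """True if level_a sits above level_b in the schema hierarchy."""
    return SCHEMA_HIERARCHY.index(level_a) > SCHEMA_HIERARCHY.index(level_b)


def age_group(age):
    """Set-grouping hierarchy: map an age to the group named on the slide."""
    if 20 <= age <= 39:
        return "young"
    if 40 <= age <= 59:
        return "middle_aged"
    return None  # the slide defines only these two groups


def email_hierarchy(address):
    """Operation-derived hierarchy: split an address of the form
    login@department.university.suffix into its levels."""
    login, domain = address.split("@")
    department, university, suffix = domain.split(".")
    return [login, department, university, suffix]


def low_profit_margin(price, cost):
    """Rule-based hierarchy: LowProfitMargin(X) holds when P1 - P2 < 50."""
    return (price - cost) < 50


print(is_more_general("Country", "City"))
print(age_group(25), age_group(45))
print(email_hierarchy("hagonzal@cs.uiuc.edu"))
print(low_profit_margin(price=120, cost=90))
```

A mining system can use such mappings to roll data up to a coarser level, for example by mining patterns over age groups instead of individual ages.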
58. Prof. Pier Luca Lanzi
https://www.youtube.com/watch?v=CO2mGny6fFs
59. Prof. Pier Luca Lanzi
Assignments
• Mining of Massive Datasets (Chapter 1)