The algorithm ALG2 generalized the target data, even if in an incomplete way. The formal definitions of Information System,...
Knowledge System is an extension of the following notion of Pawlak’s information system. Information System is a syst...
A knowledge system based on the information system I = (U, A, VA, f) is a system KI = (...
g is a partial function called knowledge information function (k-function) g : P(U) × (A ∪ E) → (VA ∪ VE) ...
We use the above notion of knowledge system to define the granules of the universe and the granularity of the system, a...
A knowledge system K = (P(U), A, E, VA, VE, g) is called exact if and only if all its granules GrK form a partition of...
We put all the above observations into a formal notion of a semantic model. Semantic Model is a system SM ...
The semantic model is always built for a given application. The target data is represented first in a form the targe...
The semantic model based on our examples is as follows. SM = (P(U), K, G), where: • U = {x1, x2, ...x...
Data Mining as Generalization. We model data mining as a process of generalization in terms of the generalization relati...
Observe that for K0, K1, K2 from our examples grK0 = 1 ≤ 5 = grK1 ≤ 7 = grK2, and the system K2 is the most general. ...
Data Mining Operators G. In data mining process the preprocessing and data mining proper are disjoint, inclusive/exlu...
We also provide detailed formal definitions, their motivation, and discussion of these two classes. Data Mining and prep...
The main idea behind the concept of the operator is to capture not only the fact that data mining techniques generalize ...
We prove the following theorem. Theorem: Let Gclass, Gclust and Gassoc be the sets of all classification, clustering, and as...
Data Mining Process. Definition: Any sequence K1, K2, ..., Kn (n ≥ 1) of data mining states is called a data pr...
The data mining process consists of the preprocessing process (that might be empty) and the data mining proper process...
Granular Model: Syntax-Semantic Duality of Data Mining. Granular Model is a system GM = (SM, DM, |=) where: • SM is a...
Descriptive Model. With any Semantic Model SM = (P(U), K, G) we associate its descriptive counterpart defined ...
DK ≠ ∅ and DK ⊆ P(E) is a set of descriptions of knowledge states. As in the case of the semantic model, we build the descri...
For example, a neural network with its nodes and weights can be seen as a formal description (in an appropriate desc...
Granular Model is a system GM = (SM, DM, |=) where: • SM is a Semantic Model; • DM is a Descriptive Model; •...
Stage 2: For each K ∈ K and descriptive expression F ∈ EK, we define what it means that D is satisfied in K; i.e. we ...
Stage 4: We use the satisfaction relation |=K to define, for each K ∈ K, the set DK ⊆ P(EK) of descriptions of its own k...
Part 3: TRACING THE HISTORY. Mathematics Genealogy Project, genealogy.math.ndsu.nodak.edu ...
We all have a history. We are all mathematicians. Mission Statement of the Mathematics Genealogy Project defines a mathemat...
The Genealogy Project solicits information from all schools who participate in the development of research level mathe...
Below are some links (sequences of connected people) for a computer scientist. Any two people in the sequence are listed i...
A mathematician would say: For any element A of the sequence, if A has more than one adviser, then for any 1 ≤ k ≤ n, a...
Link to Nicolaus Copernicus (Mikolaj Kopernik). He has 1598 descendants. Anita Wasilewska, Ph.D. Warsaw Unive...
1. Johann August Bach, Magister philosophiae, Universitat Leipzig, 1744, 1. Christian Kustner, Magister philosophiae,...
1. Melchior Jostel, Magister artium, Medicinae Dr., Martin Luther Universitat, Halle Wittenberg, 1583, 1600, 1. Valentin...
Georg von Peuerbach, Magister artium, Universitat Wien, 1440, Johannes von Gmunden, Magister artium, Universitat Wie...
Link to Gottfried Leibniz (54209 descendants), Immanuel Kant (2176 descendants), and Desider...
Immanuel Kant, Ph.D. Universitat Konigsberg 1770, Martin Knutzen, Dr. Phil. Universitat Konigsberg, 1732, Christian v...
(Snel van Royen) Snellius, Artium liberalium Magister, Universitat zu Koln, Ruprecht Karls Universitat Heidelberg, 157...
Link to Pierre-Simon Laplace (50295 descendants) and Jean Le Rond d’Alembert. Anita Wasilewska, Ph.D. Warsaw ...
Link to Emile Borel (2506 descendants), Leonhard Euler (52555 descendants). Anita Wasilew...
1. Joseph Lagrange, no degree, student of Leonhard Euler, Ph.D. Universitat Basel, 1726, Dr. med. Universitat Basel, 169...
Link to Andrei Markov (4824 descendants), and Pafnuty Chebyshev (5964 descendants). Anita Wasilewska, Ph.D. Warsaw ...
MY PhD COUSINS include Kurt Gödel, Alan Turing, Alonzo Church, Roman Sikorski, Zdzislaw Pawlak and many others.... I am sure some o...
In Stony Brook CS Department I traced 10 of them. WE ALL ARE A BIG SCIENTIFIC FAMILY!
Descriptive Granularity - Building Foundations of Data Mining

Associate Professor Anita Wasilewska gave a lecture on "Descriptive Granularity" in the Distinguished Lecturer Series - Leon The Mathematician.

More Information available at:
http://dls.csd.auth.gr

  1. DESCRIPTIVE GRANULARITY
     Building Foundations of Data Mining
     In Memory of my Professors: Zdzislaw Pawlak, Helena Rasiowa and Roman Sikorski
     Anita Wasilewska
     Computer Science Department
     Stony Brook University
     Stony Brook, NY
  2. Part 1: INTRODUCTION
  3. We all have scientific history;
     All problems we work on have history;
     It is important to trace the history of the problems we work on;
     We all build scientific history;
     The future belongs to us, and so does the past.
  4. We all have scientific history
     Here is my LATEST history (of building Foundations of Data Mining).
     1995-1998: I supervised the PhD thesis of Ernestina Menasalvas, now Professor and a Vice-Rector of Madrid Polytechnic.
     We (with some others) went from
     building models for concrete implementations (1996-2002) to
     developing a general language for Foundations of Data Mining (2002-2004) to
     building a general foundational model for Data Mining (2005- ).
  5. It has been a slow process, but finally a community and specialized conferences developed, and books started to appear:
     Foundations and Novel Approaches in Data Mining, T.Y. Lin, S. Ohsuga, C. J. Liau, and X. Hu, editors, Springer 2006;
     Data Mining: Foundations and Practice, Tsau Young Lin, Ying Xie, Anita Wasilewska, Churn-Jung Liau, editors, Studies in Computational Intelligence (SCI) 118, Springer-Verlag 2008;
     and a field Foundations of Data Mining was created.
     We all build the scientific history and it takes TIME and patience to do so.
  6. Our work in Data Mining Foundations matured and finally we were invited by T.Y. Lin to write a 20-page entry about our research in the Encyclopedia of Complexity and Systems Science, published by Springer in 2008.
     The Encyclopedia is Springer’s latest and prestigious initiative, with its Board of Editors including, among others, Ahmed Zewail, Nobel Prize in Chemistry; Thomas Schelling, Nobel Prize in Economics; Richard E. Stearns, 1993 Turing Award; Pierre-Louis Lions, 1994 Fields Medal; and Lotfi Zadeh, IEEE Medal of Honor.
     All entries were by invitation only, and the inclusion of our work shows the recognition of the need for foundational studies in newly developing domains.
  7. All problems we work on have history
     Short History of Foundational Studies
     The origins of Foundational Studies can be traced back to David Hilbert, a German mathematician, recognized as one of the most influential and universal mathematicians of the 19th and early 20th centuries.
  8. Hilbert Problems: In 1900, at the Paris conference of the International Congress of Mathematicians, he proposed 23 problems for the future century.
     Several of them turned out to be very influential for 20th century mathematics and later Computer Science.
     Of the cleanly-formulated Hilbert problems,
     TEN problems: 3, 7, 10, 11, 13, 14, 17, 19, 20, and 21 have solutions that are accepted by consensus.
  9. TWO problems: 1 and 2 are FOUNDATIONAL problems; 1, concerning the Continuum Hypothesis, was solved by Cohen in 1963, and 2, concerning the Consistency of Arithmetic, was solved by Gödel and Gentzen in 1936.
     FIVE problems: 5, 9, 15, 18, and 22 have partial solutions.
     FOUR problems: 4, 6, 16, and 23 are too loosely formulated to ever be described as solved.
     TWO problems: 8 (the Riemann Hypothesis; the Goldbach conjecture is a part of it) and 12 are still OPEN, both being in number theory.
  10. The Riemann hypothesis was proposed by Bernhard Riemann (1859).
      It is a conjecture about the distribution of the zeros of the Riemann zeta function which states that all non-trivial zeros have real part 1/2.
      The Riemann hypothesis implies results about the distribution of prime numbers that are in some ways as good as possible.
      Along with suitable generalizations, it is considered by some mathematicians to be the most important unresolved problem in pure mathematics.
  11. Pierre Deligne proved in 1973 an analogue of the Riemann Hypothesis for zeta functions of varieties defined over finite fields.
      The full version of the hypothesis remains unsolved, although computer calculations have shown that the first 10 trillion zeros lie on the critical line.
  12. Goldbach’s conjecture (1742) is one of the oldest unsolved problems in number theory and in all of mathematics. It states:
      Every even integer greater than 2 can be expressed as the sum of two primes.
      For example: 4 = 2 + 2, 6 = 3 + 3, 8 = 3 + 5, 10 = 7 + 3 or 5 + 5, 12 = 5 + 7, 14 = ....
      T. Oliveira e Silva is running a distributed computer search that has verified the conjecture for n ≤ 1.609 × 10^18 and some higher small ranges up to 4 × 10^18.
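The statement of the conjecture can be checked by hand for small numbers. The sketch below is only an illustration of the statement, not of the distributed search mentioned above; the function names `is_prime` and `goldbach_pairs` are hypothetical and use plain trial division.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small illustrative inputs."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def goldbach_pairs(n):
    """Return all pairs (p, q) of primes with p <= q and p + q == n."""
    if n <= 2 or n % 2 != 0:
        raise ValueError("n must be an even integer greater than 2")
    return [(p, n - p) for p in range(2, n // 2 + 1)
            if is_prime(p) and is_prime(n - p)]


for even in (4, 6, 8, 10, 12, 14):
    print(even, goldbach_pairs(even))
```

A non-empty list for every even input is exactly what the conjecture asserts; the loop only confirms it for the handful of values listed.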
  13. Hilbert Program
      Hilbert proposed, in 1920, a research project that became known as Hilbert’s Program.
      1. He wanted mathematics to be formulated on a solid and complete logical foundation.
      2. He believed that in principle this could be done, by showing that all of mathematics follows from a correctly-chosen finite system of axioms and that some such axiom system is provably consistent.
      3. He also believed that one can have such a system in which proofs of theorems can be deduced automatically from the way the theorems are built.
  14. In 1931 Kurt Gödel showed that Hilbert’s grand plan 1. and 2. was impossible as stated.
      Gödel proved, in what is now called Gödel’s Incompleteness Theorem, that any non-contradictory formal system which is comprehensive enough to include at least arithmetic cannot demonstrate its completeness by way of its own axioms.
      In 1933-34 Gerhard Gentzen gave a positive answer to 3. in the case of classical propositional logic, and a partially positive answer in the case of (semi-undecidable) predicate logic.
      Nevertheless, Hilbert’s and Gödel’s work led to the development of recursion theory and then mathematical logic and foundations of mathematics as autonomous disciplines.
  15. Gentzen’s work led to the development of Proof Theory and Automated Theorem Proving as separate Mathematics and Computer Science domains.
      Gödel inspired works of Alonzo Church and Alan Turing that became the basis for theoretical computer science and also led to the further development of a unique phenomenon called the Polish School of Mathematics and later to the creation of Foundational Studies in Computer Science.
  16. Personal History: my Master Thesis in Computer Science (under Pawlak and Rasiowa) consisted of a solution of Gentzen’s conjecture for Modal S4 and S5 Logics, and consequently I also developed the world’s first theorem prover for S4 Modal Logic in 1967.
      As a result I spent the first 15 years of my scientific life (before coming to the USA) working in Proof Theory for non-classical logics, formulated (as a pure mathematician) a General Theory of Gentzen Type Formalizations, and established various results about connections and relationships between certain Classes of Logics, Formal Languages and Theory of Programs (as a computer scientist).
  17. Polish School of Mathematics
      The term Polish School of Mathematics refers to groups of mathematicians of the 1920’s and 1930’s working on common subjects.
      The two main groups were situated in Warsaw and Lvov (now Lviv, the biggest city in Western Ukraine).
      We hence talk more specifically about the Warsaw and Lvov Schools of Mathematics, and additionally of the Lvov-Warsaw School of Logic working in Warsaw.
  18. Any list of important twentieth century mathematicians contains Polish names in a frequency out of proportion to the size of the country.
      Poland was partitioned by Russia, Germany, and Austria and was under foreign domination for more than 120 years, from 1795 until the end of World War I.
      What was to become known as the Polish School of Mathematics was possible because it was carefully planned, agreed upon, and executed.
  19. Independent Poland was created in 1918 and the University of Warsaw re-opened with Janiszewski, Mazurkiewicz, and Sierpinski as professors of mathematics.
      They chose logic, set theory, point-set topology, and real functions as the area of concentration.
      The journal Fundamenta Mathematicae was founded in 1920 and is still in print.
      It was the first specialized mathematical journal in the world.
  20. The choice of title was deliberate, to reflect that all areas published there were to be connected with foundational studies.
      It should be remembered that at the time these areas had not yet received full acceptance by the mathematical community.
      The choice reflected both insight and courage.
  21. The notable mathematicians of the Warsaw and Lvov Schools of Mathematics were, among others, Stefan Banach, Stanislaw Ulam and, after the war, Roman Sikorski.
      Stefan Banach was a self-taught mathematics prodigy and the founder of modern functional analysis.
      Mathematical concepts named after Banach include the Banach-Tarski paradox, Hahn-Banach theorem, Banach-Steinhaus theorem, Banach-Mazur game and Banach spaces.
  22. Stanislaw Ulam emigrated to America just before the war and became an American mathematician of Polish-Jewish origin.
      He participated in the Manhattan Project and originated the Teller-Ulam design of thermonuclear weapons.
      He also invented nuclear pulse propulsion and developed a number of mathematical tools in number theory, set theory, ergodic theory and algebraic topology.
  23. Roman Sikorski’s reputation was established by his outstanding results in Boolean algebras, functional analysis, theory of distributions, measure theory, general topology, descriptive set theory, and in Algebraic Mathematical Logic (in collaboration with Rasiowa).
      In axiomatic set theory, the Rasiowa-Sikorski Lemma is one of the most fundamental facts used in the technique of forcing.
  24. The notable logicians of the Lvov-Warsaw School of Logic were:
      Alfred Tarski - since 1942 in Berkeley and founder of the American School of Foundations of Mathematics,
      Jan Lukasiewicz,
      Andrzej Mostowski,
      and, after the Second World War, Helena Rasiowa.
  25. Helena Rasiowa became, in 1977, the founder of Fundamenta Informaticae, the world’s first journal specializing in foundations of computer science.
      The choice of the title Fundamenta Informaticae was again deliberate.
      It reflected not only the subject, but also stressed that the new research area being developed in Warsaw is a direct continuation of the tradition of the Foundational Studies of the Polish School of Mathematics.
  26. Part 2: DESCRIPTIVE GRANULARITY
      A Model for Data Mining
  27. We present here a formal syntax and semantics for a notion of a descriptive granularity.
      We do so in terms of three abstract models: Descriptive, Semantic, and Granular.
      The Descriptive model formalizes the syntactical concepts and properties of the data mining, or learning, process.
      The Semantic model formalizes its semantical properties.
      The Granular model establishes a relationship between the Descriptive and Semantic models in terms of a formal satisfaction relation.
  28. Data Mining - Informal Definition
      One of the main goals of Data Mining is to provide comprehensible descriptions of information extracted from databases.
      We are hence interested in building models for descriptive data mining, i.e. data mining whose main goal is to produce a set of descriptions in a language easily comprehensible to the user.
  29. The descriptions come in different forms.
      In the case of classification problems it might be a set of characteristic or discriminant rules, a decision tree, or a neural network with a fixed set of weights.
      In the case of association analysis it is a set of associations (frequent item sets), or association rules with accuracy parameters.
      In the case of cluster analysis it is a set of clusters, each of which has its own description and a cluster name.
  30. In the case of approximate classification by Rough Set analysis it is usually a set of discriminant or characteristic rules (with or without accuracy parameters) or a set of decision tables.
      Data Mining results are usually presented to the user in their descriptive, i.e. syntactic, form as it is the most natural form of communication. But the Data Mining process is deeply semantical in its nature. We hence build our Granular Model on two levels: syntactic and semantic.
  31. SYNTAX
      We understand by syntax, or syntactical concepts, simple relations among symbols and expressions of formal symbolic languages.
      A symbolic language is a pair L = (A, E), where A is an alphabet and E is the set of expressions of L.
      The expressions of formal languages, even if created with a specific meaning in mind, do not themselves carry any meaning; they are just finite sequences of certain symbols. The meaning is assigned to them by establishing a proper semantics.
  32. SEMANTICS
      Semantics for a given symbolic language L assigns a specific interpretation in some domain to all symbols and expressions of the language.
      It also involves related ideas such as truth and model. They are called semantical concepts to distinguish them from the syntactical ones.
  33. MODEL
      The word model is used in many situations and has many meanings, but they all reflect some parts, if not all, of its following formal meaning.
      A structure M, called also an interpretation, is a model for a set E0 ⊆ E of expressions of a formal language L if and only if every expression E ∈ E0 is true in M.
  34. All our Models are abstract structures that allow us to formalize some general properties of the Data Mining process and address the semantics-syntax duality inherent to any Data Mining process.
      Moreover, they allow us to provide a formal definition of a generalization and of Data Mining as the process of information generalization.
  35. The notion of generalization is defined in terms of granularity of steps of the process.
      Data is represented in the model in a form of Knowledge Systems.
      Each Knowledge System has a granularity associated with it and the process changes, or not, its granularity.
      Granularity is crucial for defining some notions and components of the model, hence the Granular Model name.
  36. Granular Model
      Granular Model is a system GM = (SM, DM, |=) where:
      • SM is a Semantic Model;
      • DM is a Descriptive Model;
      • |= ⊆ P(U) × E is called a satisfaction relation, where U is the universe of SM and E is the set of descriptions defined by the DM.
      Satisfaction |= establishes the truth relationship between the data mining model and the descriptive model.
  37. Semantic Model definition motivation.
      The first step in any data mining procedure is to drop the key attribute.
      This step allows us to introduce similarities in the database, as records do not have their unique identification anymore.
      The input into the data mining process is hence always a data table obtained from the target data by removal of the key attribute.
      We call it a target data table.
  38. As the next step we represent, following the Rough Set model, our target data table as Pawlak’s Information System with the universe U by adding a new, non-attribute column for the record names, i.e. objects of U. We take this set U as the universe of our model SM.
      Why Information System?
      We want to model Data Mining as a process of generalization.
      In order to model this process we first have to define what it means, from a semantical point of view, that one stage of the process is more general than the other.
  39. The idea behind it is very simple. It is the same as saying that (a + b)^2 = a^2 + 2ab + b^2 is a more general formula than the formula (2 + 3)^2 = 2^2 + 2 · 2 · 3 + 3^2.
      This means that one description (formula) is more general than the other if it describes more objects.
      From a semantical point of view it means that the data mining process consists of putting objects (records) in sets of objects.
      From a syntactical point of view the data mining process consists of building descriptions (in terms of attribute, value of attribute pairs) of these sets of objects, with some extra parameters, if needed.
  40. To model a situation that allows us to talk about descriptions of sets of records (objects) we extend the notion of Pawlak’s model of information system to our notion of Knowledge System.
      The universe of a knowledge system contains some subsets of U, i.e. elements of P(U).
      For example, a target data table (after preprocessing), the corresponding representation by Pawlak’s information system, and a knowledge system with universe U of granularity one are as follows.
  41. Target Data Table T0

      a1      a2      a3
      small   small   medium
      medium  small   medium
      small   small   medium
      big     small   small
      medium  medium  big
      small   small   medium
      big     small   small
      medium  medium  big
      small   small   medium
      big     small   medium
      medium  medium  small
      small   small   medium
      big     small   big
      medium  medium  small

      Target Information System I0

      U    a1      a2      a3
      x1   small   small   medium
      x2   medium  small   medium
      x3   small   small   medium
      x4   big     small   small
      x5   medium  medium  big
      x6   small   small   medium
      x7   big     small   small
      x8   medium  medium  big
      x9   small   small   medium
      x10  big     small   medium
      x11  medium  medium  small
      x12  small   small   medium
      x13  big     small   big
      x14  medium  medium  small
  42. Knowledge System of granularity one (all objects are one-element sets) corresponding to target table T0 is as follows.

      Target Knowledge System K0

      P1(U)   a1      a2      a3
      {x1}    small   small   medium
      {x2}    medium  small   medium
      {x3}    small   small   medium
      {x4}    big     small   small
      {x5}    medium  medium  big
      {x6}    small   small   medium
      {x7}    big     small   small
      {x8}    medium  medium  big
      {x9}    small   small   medium
      {x10}   big     small   medium
      {x11}   medium  medium  small
      {x12}   small   small   medium
      {x13}   big     small   big
      {x14}   medium  medium  small
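The step from the information system I0 to the knowledge system K0 can be sketched in a few lines of Python. This is only an illustration under assumptions: the dictionary-based representation and the helper names `information_system` and `granularity_one_system` are hypothetical, and attribute values are abbreviated as s, m, b for small, medium, big.

```python
# Target information system I0 from the slides: 14 objects, attributes a1..a3.
# Values are abbreviated: s = small, m = medium, b = big.
ROWS = """
x1 s s m
x2 m s m
x3 s s m
x4 b s s
x5 m m b
x6 s s m
x7 b s s
x8 m m b
x9 s s m
x10 b s m
x11 m m s
x12 s s m
x13 b s b
x14 m m s
"""
ATTRIBUTES = ("a1", "a2", "a3")


def information_system(text):
    """Map each object name to a dict of its attribute values."""
    table = {}
    for line in text.split("\n"):
        if line.strip():
            name, *values = line.split()
            table[name] = dict(zip(ATTRIBUTES, values))
    return table


def granularity_one_system(info):
    """Hypothetical helper: wrap every object in a one-element granule."""
    return {frozenset([name]): row for name, row in info.items()}


I0 = information_system(ROWS)
K0 = granularity_one_system(I0)
print(len(I0), "objects;", len(K0), "granules of size one")
```

Using `frozenset` keys makes the universe of the knowledge system a collection of subsets of U, which is the point of the extension from objects to granules.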
  43. Assume now that we have applied some algorithm ALG1 and it has returned the following set D = {D1, D2, ..., D7} of descriptions.
      D1 : (a1 = s) ∩ (a2 = s) ∩ (a3 = m),
      D2 : (a1 = m) ∩ (a2 = s) ∩ (a3 = m),
      D3 : (a1 = m) ∩ (a2 = m) ∩ (a3 = b),
      D4 : (a1 = m) ∩ (a2 = m) ∩ (a3 = s),
      D5 : (a1 = b) ∩ (a2 = s) ∩ (a3 = s),
      D6 : (a1 = b) ∩ (a2 = s) ∩ (a3 = m),
      D7 : (a1 = b) ∩ (a2 = s) ∩ (a3 = b).
  44. Questions
      Q1: How well does this set of descriptions describe our original data, i.e. how accurate is the algorithm ALG1 we have used to find them?
      Q2: How accurate is the knowledge we have thus obtained out of our data?
      The answer is formulated in terms of the target information system with the universe U, and the sets S(D) defined (after Pawlak) for any description D ∈ D as follows.
      S(D) = {x ∈ U : D}.
      We call S(D) the truth set for D.
  45. Intuitively, the sets S(D) = {x ∈ U : D} contain all records (i.e. their identifiers) with the same description given in terms of attribute, value of attribute pairs.
      The descriptions do not need to utilize all attributes of the target data, as is often the case, and one of the ultimate goals of data mining is to find descriptions with as few attributes as possible.
  46. In association analysis the descriptions can represent the frequent item sets.
      For example, for a frequent three-item set D = i1i2i3, the truth set S(D) represents all transactions that contain items i1, i2, i3.
      In general, descriptions come in different forms, depending on the data mining goal and application.
      We define formally a general form of descriptions as a part of the Descriptive Model.
  47. For the target data and descriptions Di ∈ D presented in the above examples the sets S(Di) are as follows.
      S1 = S(D1) = {x ∈ U : D1} = {x1, x3, x6, x9, x12},
      S2 = S(D2) = {x ∈ U : D2} = {x2},
      S3 = S(D3) = {x ∈ U : D3} = {x5, x8},
      S4 = S(D4) = {x ∈ U : D4} = {x11, x14},
      S5 = S(D5) = {x ∈ U : D5} = {x4, x7},
      S6 = S(D6) = {x ∈ U : D6} = {x10},
      S7 = S(D7) = {x ∈ U : D7} = {x13}.
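The truth sets above can be recomputed mechanically from the target information system. The following sketch assumes a description is encoded as a dictionary of attribute, value pairs; that encoding and the function name `truth_set` are illustrative choices, not part of the original model.

```python
# Target information system I0 (s = small, m = medium, b = big).
ROWS = """
x1 s s m
x2 m s m
x3 s s m
x4 b s s
x5 m m b
x6 s s m
x7 b s s
x8 m m b
x9 s s m
x10 b s m
x11 m m s
x12 s s m
x13 b s b
x14 m m s
"""
ATTRIBUTES = ("a1", "a2", "a3")
I0 = {}
for line in ROWS.split("\n"):
    if line.strip():
        name, *values = line.split()
        I0[name] = dict(zip(ATTRIBUTES, values))

# Descriptions D1..D7 returned by ALG1, as attribute-value conjunctions.
DESCRIPTIONS = {
    "D1": {"a1": "s", "a2": "s", "a3": "m"},
    "D2": {"a1": "m", "a2": "s", "a3": "m"},
    "D3": {"a1": "m", "a2": "m", "a3": "b"},
    "D4": {"a1": "m", "a2": "m", "a3": "s"},
    "D5": {"a1": "b", "a2": "s", "a3": "s"},
    "D6": {"a1": "b", "a2": "s", "a3": "m"},
    "D7": {"a1": "b", "a2": "s", "a3": "b"},
}


def truth_set(description, info):
    """S(D): all objects whose row satisfies every pair in the description."""
    return {name for name, row in info.items()
            if all(row[attr] == value for attr, value in description.items())}


TRUTH_SETS = {label: truth_set(d, I0) for label, d in DESCRIPTIONS.items()}
for label, members in TRUTH_SETS.items():
    print(label, sorted(members, key=lambda x: int(x[1:])))
```

Because a description may mention only some attributes, the same function also works for shorter descriptions that leave attributes unconstrained.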
  48. We represent our results in the form of a Knowledge System as follows.

Resulting Knowledge System K1

P(U)                     a1  a2  a3
{x1, x3, x6, x9, x12}    s   s   m
{x2}                     m   s   m
{x5, x8}                 m   m   b
{x11, x14}               m   m   s
{x4, x7}                 b   s   s
{x10}                    b   s   m
{x13}                    b   s   b

The same system written with the names of the truth sets:

P(U)  a1  a2  a3
S1    s   s   m
S2    m   s   m
S3    m   m   b
S4    m   m   s
S5    b   s   s
S6    b   s   m
S7    b   s   b
  49. The representation of data mining results in the form of a knowledge system allows us to define how good the knowledge obtained by a given algorithm is.
In our case the knowledge obtained describes 100% of our target data, as S1 ∪ S2 ∪ S3 ∪ ... ∪ S7 = {x1, x2, ..., x14} = U.
Observe that the sets S1, ..., S7 are also disjoint and non-empty, i.e. they form a partition of the universe U.
We define such knowledge as exact.
  50. Moreover, we can see that the resulting system K1 is more general than the input data K0 because its granularity is higher than the granularity of K0.
Definition: Granularity of a knowledge system is the maximum of the cardinalities of its granules, i.e. of the elements of its universe.
The granularity of all Target Knowledge Systems is one.
Granularity of K1 is max{|S1|, ..., |S7|} = max{5, 1, 2, 2, 2, 1, 1} = 5.
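Exactness and granularity of K1 can be checked directly from its granules. This is a small sketch under the assumption that a knowledge system is represented only by its list of granules; the function names is_partition and granularity are illustrative.

```python
U = {f"x{i}" for i in range(1, 15)}

K1_GRANULES = [
    {"x1", "x3", "x6", "x9", "x12"},
    {"x2"},
    {"x5", "x8"},
    {"x11", "x14"},
    {"x4", "x7"},
    {"x10"},
    {"x13"},
]


def is_partition(granules, universe):
    """True if the granules are non-empty, pairwise disjoint and cover the universe."""
    if any(not g for g in granules):
        return False
    union = set().union(*granules)
    # The sizes add up to the size of the union exactly when no element is shared.
    return union == universe and sum(len(g) for g in granules) == len(union)


def granularity(granules):
    """Maximum cardinality over all granules."""
    return max(len(g) for g in granules)


print("K1 exact:", is_partition(K1_GRANULES, U))
print("granularity of K1:", granularity(K1_GRANULES))
```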
  51. Now assume that we have applied to our target data T (represented by K0) another algorithm ALG2, and it returned two descriptions D1, D2 under the condition that we need only descriptions of length 2 and with frequency ≥ 30%. The descriptions are:
D1 : (a1 = s) ∩ (a2 = s),
D2 : (a2 = s) ∩ (a3 = m).
Now we evaluate:
S1 = S(D1) = {x1, x3, x6, x9, x12},
S2 = S(D2) = {x1, x2, x3, x6, x9, x10, x12}.
  52. Incorporating the algorithm parameters imposed by ALG2 into our Knowledge System we obtain the following table.

Resulting Knowledge System K2

P(U)  a1  a2  a3  # of attr  frequency
S1    s   s   -   2          36%
S2    -   s   m   2          50%

The sets S1, S2 do not form a partition of the universe U, as S1 ∩ S2 ≠ ∅ and, moreover, S1 ∪ S2 ≠ U.
The knowledge obtained by the algorithm ALG2 is hence not exact.
It describes only 50% of the target data (S1 ∪ S2 = S2, i.e. 7 of the 14 records), and what is described is described following certain (frequency) conditions.
Of course K2 is more general than K0.
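The frequency column of K2 and the failure of exactness can be recomputed from the two truth sets as listed on the previous slide. A minimal sketch; the helper percent and the variable names are illustrative.

```python
U = {f"x{i}" for i in range(1, 15)}
S1 = {"x1", "x3", "x6", "x9", "x12"}
S2 = {"x1", "x2", "x3", "x6", "x9", "x10", "x12"}


def percent(part, whole):
    """Share of `whole` covered by `part`, rounded to a whole percent."""
    return round(100 * len(part) / len(whole))


overlap = S1 & S2   # non-empty overlap means the granules are not disjoint
covered = S1 | S2   # records described by at least one description

print("frequency of S1:", percent(S1, U), "%")
print("frequency of S2:", percent(S2, U), "%")
print("coverage of U:", percent(covered, U), "%")
print("exact:", not overlap and covered == U)
```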
  53. The algorithm ALG2 generalized the target data, even if in an incomplete way.
The formal definitions of Information System, Knowledge and Target Knowledge Systems, and their granularity and exactness are as follows.
  54. Knowledge System is an extension of the following notion of Pawlak's information system.
Information System is a system I = (U, A, VA, f), where U ≠ ∅ is called a set of objects, A ≠ ∅ and VA ≠ ∅ are called the set of attributes and the set of values of attributes, respectively, and f is called an information function, f : U × A −→ VA.
  55. A knowledge system based on the information system I = (U, A, VA, f) is a system KI = (P(U), A, E, VA, VE, g), where
E is a finite set of knowledge attributes (k-attributes) such that A ∩ E = ∅,
VE is a finite set of values of k-attributes.
  56. g is a partial function called the knowledge information function (k-function), g : P(U) × (A ∪ E) −→ (VA ∪ VE), such that
(i) g | ( ⋃x∈U {x} × A ) = f, i.e. g restricted to the one-element granules agrees with f,
(ii) ∀S ∈ P(U) ∀a ∈ A ((S, a) ∈ dom(g) ⇒ g(S, a) ∈ VA),
(iii) ∀S ∈ P(U) ∀e ∈ E ((S, e) ∈ dom(g) ⇒ g(S, e) ∈ VE).
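One possible way to represent the partial k-function g is a dictionary keyed by (granule, attribute) pairs. The sketch below assumes that reading and checks conditions (i)-(iii) on a small fragment. The k-attribute "frequency", the value set VE and the function name is_k_function are hypothetical, and condition (i) is read as "wherever g is defined on a one-element granule, it agrees with f".

```python
A = ("a1", "a2", "a3")      # attributes of the information system
E = ("frequency",)          # hypothetical k-attribute
VA = {"s", "m", "b"}        # attribute values
VE = set(range(0, 101))     # hypothetical k-values: whole percentages

# Fragment of f for two objects (values follow the example descriptions).
f = {
    ("x2", "a1"): "m", ("x2", "a2"): "s", ("x2", "a3"): "m",
    ("x10", "a1"): "b", ("x10", "a2"): "s", ("x10", "a3"): "m",
}

# Partial function g as a dict; missing keys mean "undefined".
g = {(frozenset({x}), a): v for (x, a), v in f.items()}
g[(frozenset({"x2", "x10"}), "a2")] = "s"
g[(frozenset({"x2", "x10"}), "a3")] = "m"
g[(frozenset({"x2", "x10"}), "frequency")] = 14  # 2 of 14 records, rounded


def is_k_function(g, f, A, E, VA, VE):
    """Check conditions (i)-(iii) on every pair where g is defined."""
    for (S, b), value in g.items():
        if b in A:
            if value not in VA:                       # condition (ii)
                return False
            if len(S) == 1:
                (x,) = S
                if f.get((x, b)) != value:            # condition (i)
                    return False
        elif b in E:
            if value not in VE:                       # condition (iii)
                return False
        else:
            return False                              # unknown attribute
    return True


print(is_k_function(g, f, A, E, VA, VE))
```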
  57. We use the above notion of knowledge system to define the granules of the universe and the granularity of the system, and hence later, the granularity of the data mining process.
Granule: Any set S ∈ P(U), i.e. S ⊆ U, is called a granule of U.
Granularity of S: The cardinality |S| of S is called the granularity of S.
Granule Universe: The set GrK = {S ∈ P(U) : ∃b ∈ (E ∪ A) ((S, b) ∈ dom(g))} is called the granule universe of KI.
Granularity of K: The number grK = max{|S| : S ∈ GrK} is called the granularity of K.
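With g stored as a dictionary keyed by (granule, attribute) pairs (an assumed representation, not prescribed by the model), the granule universe GrK and the granularity grK follow directly from the domain of g. The fragment g_K1 below lists only a few entries of K1 for illustration.

```python
def granule_universe(g):
    """Gr_K: every granule S on which g is defined for at least one attribute."""
    return {S for (S, _attribute) in g}


def system_granularity(g):
    """gr_K: the maximum cardinality over the granule universe."""
    return max(len(S) for S in granule_universe(g))


# A fragment of the k-function of K1 (not the full table).
g_K1 = {
    (frozenset({"x1", "x3", "x6", "x9", "x12"}), "a1"): "s",
    (frozenset({"x2"}), "a1"): "m",
    (frozenset({"x5", "x8"}), "a1"): "m",
    (frozenset({"x5", "x8"}), "a3"): "b",
}

print(len(granule_universe(g_K1)), "granules, granularity", system_granularity(g_K1))
```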
  58. A knowledge system K = (P(U), A, E, VA, VE, g) is called exact if and only if its granules, i.e. the elements of GrK, form a partition of the universe U.
Operators: In our Model we represent data mining algorithms as certain operators.
For example, our ALG1 is represented in the semantic model by an operator p1 acting on some subset of a set K of knowledge systems, such that p1(K0) = K1.
ALG2 is represented in the model by an operator p2, also acting on some (possibly different) subset of the set K of knowledge systems, such that p2(K0) = K2.
  59. We put all the above observations into a formal notion of a semantic model.
Semantic Model is a system SM = (P(U), K, G), where:
 • U ≠ ∅ is the universe;
 • K ≠ ∅ is a set of knowledge systems, also called data mining process states;
 • G ≠ ∅ is the set of operators;
 • Each operator p ∈ G is a partial function on the set of all data mining process states, i.e. p : K −→ K.
  60. The semantic model is always built for a given application.
The target data is represented first in the form of the target information system with the universe U, and then in the form of the target knowledge system K0, as we showed in our examples.
  61. The semantic model based on our examples is as follows.
SM = (P(U), K, G), where:
 • U = {x1, x2, ..., x14};
 • K = {K0, K1, K2};
 • G = {p1, p2};
 • Each pi ∈ G (i = 1, 2) is a partial function pi : K −→ K, such that p1(K0) = K1, p2(K0) = K2.
  62. Data Mining as Generalization
We model data mining as a process of generalization in terms of the generalization relation based on a notion of granularity and generalization operators.
Definition: A relation ⪯ ⊆ K × K is called a generalization relation if the following condition holds for any K, K′ ∈ K:
K ⪯ K′ if and only if grK ≤ grK′,
where grK denotes the granularity of K.
  63. Observe that for K0, K1, K2 from our examples grK0 = 1 ≤ 5 = grK1 ≤ 7 = grK2, and the system K2 is the most general. But at the same time K1 is exact and K2 is not exact, so we have a trade-off between exactness and generality.
Definition: An operator g ∈ G is called a generalization operator if for any K, K′ ∈ K such that g(K) = K′, we have that K ⪯ K′.
Observe that both operators p1, p2 in our example are generalization operators.
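The generalization relation and the generalization-operator property can be checked on the three example states. This is a sketch under the assumption that a state is represented by its list of granules and that an operator is a dictionary (a partial function) from state names to state names; all identifiers are illustrative.

```python
STATES = {
    "K0": [{f"x{i}"} for i in range(1, 15)],
    "K1": [
        {"x1", "x3", "x6", "x9", "x12"}, {"x2"}, {"x5", "x8"},
        {"x11", "x14"}, {"x4", "x7"}, {"x10"}, {"x13"},
    ],
    "K2": [
        {"x1", "x3", "x6", "x9", "x12"},
        {"x1", "x2", "x3", "x6", "x9", "x10", "x12"},
    ],
}

# Operators as partial functions on state names: p1(K0) = K1, p2(K0) = K2.
p1 = {"K0": "K1"}
p2 = {"K0": "K2"}


def granularity(name):
    """gr_K for the state with the given name."""
    return max(len(S) for S in STATES[name])


def generalizes(k, k_prime):
    """The relation K ⪯ K': holds exactly when gr_K <= gr_K'."""
    return granularity(k) <= granularity(k_prime)


def is_generalization_operator(op):
    """Every pair (K, op(K)) on which op is defined must satisfy K ⪯ op(K)."""
    return all(generalizes(k, k_prime) for k, k_prime in op.items())


print([granularity(k) for k in ("K0", "K1", "K2")])
print(is_generalization_operator(p1), is_generalization_operator(p2))
```

An operator that maps K2 back to K0 would fail the check, since it lowers the granularity.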
  64. Data Mining Operators G
In the data mining process the preprocessing and data mining proper are disjoint, inclusive/exclusive categories.
The preprocessing is an integral and very important stage of the data mining process and needs as careful an analysis as the data mining proper.
Our framework allows us to distinguish two disjoint classes of operators: the preprocessing operators Gprep and the data mining proper operators Gdm, and we put G = Gprep ∪ Gdm.
  65. We also provide detailed formal definitions of these two classes, their motivation, and discussion.
Data mining and preprocessing operators define different kinds of generalization.
The model presented in our examples didn't include the preprocessing stage; it used the data mining proper operators only.
  66. The main idea behind the concept of the operator is to capture not only the fact that data mining techniques generalize the data but also to categorize existing methods.
We define within our model three classes of data mining operators: classification Gclass, clustering Gclust, and association Gassoc.
We don't include in our analysis purely statistical methods like regression, etc.
  67. We prove the following theorem.
Theorem: Let Gclass, Gclust and Gassoc be the sets of all classification, clustering, and association operators, respectively. The following conditions hold.
(1) Gclass ≠ Gclust ≠ Gassoc,
(2) Gassoc ∩ Gclass = ∅,
(3) Gassoc ∩ Gclust = ∅.
  68. Data Mining Process
Definition: Any sequence K1, K2, ..., Kn (n ≥ 1) of data mining states is called a data preprocessing process if there is a preprocessing operator G ∈ Gprep such that G(Ki) = Ki+1, i = 1, 2, ..., n − 1.
Definition: Any sequence K1, K2, ..., Kn (n ≥ 1) of data mining states is called a data mining proper process if there is a data mining proper operator G ∈ Gdm such that G(Ki) = Ki+1, i = 1, 2, ..., n − 1.
  69. The data mining process consists of the preprocessing process (that might be empty) and the data mining proper process.
We know that the sets Gprep and Gdm are disjoint. This justifies the following definition.
Definition: A data mining process is any sequence K1, K2, ..., Kn (n ≥ 1) of data mining states such that K1, ..., Ki (0 ≤ i ≤ n) is a preprocessing process and Ki+1, ..., Kn is a data mining proper process.
  70. Granular Model
Syntax-Semantic Duality of Data Mining
Granular Model is a system GM = (SM, DM, |=), where:
 • SM is a Semantic Model;
 • DM is a Descriptive Model;
 • |= ⊆ P(U) × E is called a satisfaction relation, where U is the universe of SM and E is the set of descriptions defined by the DM.
Satisfaction |= establishes the relationship between the semantic model and the descriptive model.
  71. Descriptive Model
With any Semantic Model SM = (P(U), K, G) we associate its descriptive counterpart defined below.
A Descriptive Model is a system DM = (L, E, DK), where:
L = (A, E) is called a descriptive language;
A is a countably infinite set called the alphabet;
E ≠ ∅ and E ⊆ A∗ is the set of descriptive expressions of L;
  72. DK ≠ ∅ and DK ⊆ P(E) is a set of descriptions of knowledge states.
As in the case of the semantic model, we build the descriptive model for a given application.
We define here only a general form of the model.
We assume, however, that whatever the application is, the descriptions are always built in terms of attributes and values of the attributes, some logical connectives, some predicates and some extra parameters, if needed.
The commonly used descriptions have the form (a = v) to denote that the attribute a has a value v, but one might also use, as is often done, a predicate form a(v) or a(x, v) instead.
  73. For example, a neural network with its nodes and weights can be seen as a formal description (in an appropriate descriptive language), and the knowledge states would represent changes in parameters during the neural network training process.
The model we build here is a model for what we call descriptive data mining, i.e. data mining for which the goal of the data mining process is to produce a set of descriptions in a language easily comprehensible to the user.
For that purpose, in the model we identify the decision tree constructed by the classification by Decision Tree algorithm with the set of discriminant rules obtained from the tree.
  74. Granular Model is a system GM = (SM, DM, |=), where:
 • SM is a Semantic Model;
 • DM is a Descriptive Model;
 • |= ⊆ P(U) × E is called a satisfaction relation, where U is the universe of SM and E is the set of descriptions defined by the DM.
Satisfaction |= establishes the relationship between the semantic model and the descriptive model.
We define the Satisfaction |= component of the Granular Model GM in the following stages.
Stage 1: For each K ∈ K, we define its own descriptive language LK = (AK, EK).
  75. Stage 2: For each K ∈ K and descriptive expression F ∈ EK, we define what it means that F is satisfied in K; i.e. we define a satisfaction relation |=K.
Stage 3: For each K ∈ K and descriptive expression F ∈ EK, we define what it means that F is true in K, i.e. |=K F.
  76. Stage 4: We use the satisfaction relation |=K to define, for each K ∈ K, the set DK ⊆ P(EK) of descriptions of its own knowledge.
Stage 5: We use the languages LK to define the descriptive language L.
Stage 6: We use the descriptive expressions EK of LK to define the set E of descriptive expressions of L.
Stage 7: We use the satisfaction relations |=K to define the satisfaction relation |= of the Granular Model GM.
  77. Part 3: TRACING THE HISTORY
Mathematics Genealogy Project
genealogy.math.ndsu.nodak.edu
  78. We all have a history
We are all mathematicians
The Mission Statement of the Mathematics Genealogy Project defines a mathematician as follows.
"... Throughout this project when we use the word "mathematics" or "mathematician" we mean that word in a very inclusive sense. Thus, all relevant data from statistics, computer science, or operations research is welcome...."
Computer Science classification within the project is: Mathematics Subject Classification: 68 Computer Science.
  79. The Genealogy Project solicits information from all schools that participate in the development of research level mathematics and from all individuals who may know desired information. That includes Computer Science as well.
For them, and for history, we are all mathematicians.
  80. Below are some links (sequences of connected people) for a computer scientist.
Any two consecutive people in the sequence are listed in the order PhD student, Adviser.
If a person has more than one adviser, the adviser's name is preceded by a number; i.e.
adviser 1 is listed as 1. adviser Name,
adviser 2 is listed as 2. adviser Name, etc.
  81. A mathematician would say: For any element A of the sequence, if A has n > 1 advisers, then for any 1 ≤ k ≤ n, the adviser k is listed as k. Name of the adviser k; the number in front of the name is omitted otherwise.
  82. Link to Nicolaus Copernicus (Mikolaj Kopernik). He has 1598 descendants.
Anita Wasilewska, Ph.D. Warsaw University, 1975, Poland, Helena Rasiowa, Ph.D. Warsaw University, 1950, Andrzej Mostowski, Ph.D. Warsaw University, 1938, 2. Alfred Tarski, Ph.D. Warsaw University, 1924, Stanislaw Lesniewski, Ph.D. University of Lvov, 1912, Kazimierz Twardowski, Ph.D. Universitat Wien, 1891, Franz Clemens Brentano, Ph.D. Eberhard Karls Universitat, Tubingen, 1862, 2. Friedrich Adolf Trendelenburg, Dr. phil. Universitat Leipzig, 1826, 1. Georg Ludwig Konig, Artium Liberalium Magister, Georg August Universitat, Gottingen, 1790, Christian Heyne, Magister Juris, Universitat Leipzig, 1752,
  83. 1. Johann August Bach, Magister philosophiae, Universitat Leipzig, 1744, 1. Christian Kustner, Magister philosophiae, Universitat Leipzig, 1742, Johann Ernesti, Magister philosophiae, Universitat Leipzig, 1730, Johann Gesner, Magister artium, Friedrich Schiller Universitat Jena, 1715, Johann Buddeus, Magister artium, Martin Luther Universitat, Halle Wittenberg, 1687, Michael Walther, Jr., Magister artium, Theol. Dr., Martin Luther Universitat, Halle Wittenberg, 1661, 1687, 2. Johann Quenstedt, Magister artium, Theol. Dr., Universitat Helmstedt, Martin Luther Universitat, Halle Wittenberg, 1643, 1644, Christoph Notnagel, Magister artium, Martin Luther Universitat, Halle Wittenberg, 1630, Ambrosius Rhodius, Magister artium, Medicinae Dr., Martin Luther Universitat, Halle Wittenberg, 1600, 1610,
  84. 1. Melchior Jostel, Magister artium, Medicinae Dr., Martin Luther Universitat, Halle Wittenberg, 1583, 1600, 1. Valentin Otto, Magister artium, Martin Luther Universitat, Halle Wittenberg, 1570, Georg Joachim Rheticus, Magister artium, Martin Luther Universitat, Halle Wittenberg, 1535, 2. Nicolaus Copernicus, Juris utriusque Doctor, Uniwersytet Jagiellonski (Cracow Jagellonian University), Universita di Bologna, Universita degli Studi di Ferrara, Universita di Padova, 1499, Poland-Italy, 2. Domenico Novara da Ferrara, Universita di Firenze, 1483, 1. Johannes Regiomontanus, Magister artium, Universitat Leipzig, Universitat Wien, 1457,
  85. Georg von Peuerbach, Magister artium, Universitat Wien, 1440, Johannes von Gmunden, Magister artium, Universitat Wien, 1406, Heinrich von Langenstein, Magister artium, Theol. Dr., Universite de Paris, 1363, 1375, unknown.
Heinrich von Langenstein, 1375, is my "oldest" ancestor.
THERE ARE 3 more lines of ancestry; also interesting, if not so illustrious. Here they are.
  86. Link to Gottfried Leibniz (54209 descendants), Immanuel Kant (2176 descendants), and Desiderius Erasmus of Rotterdam (57416 descendants)
Anita Wasilewska, Ph.D. Warsaw University, 1975, Poland, Helena Rasiowa, Ph.D. Warsaw University, 1950, Andrzej Mostowski, Ph.D. Warsaw University, 1938, 2. Alfred Tarski, Ph.D. Warsaw University, 1924, Stanislaw Lesniewski, Ph.D. University of Lvov, 1912, Kazimierz Twardowski, Ph.D. Universitat Wien, 1891, Franz Clemens Brentano, Ph.D. Eberhard Karls Universitat, Tubingen, 1862, 2. Friedrich Adolf Trendelenburg, Dr. phil. Universitat Leipzig, 1826, 2. Karl Reinhold, Ph.D.,
  87. Immanuel Kant, Ph.D. Universitat Konigsberg, 1770, Martin Knutzen, Dr. phil. Universitat Konigsberg, 1732, Christian von Wolff, Dr. phil., Universitat Leipzig, 1700, 2. Gottfried Leibniz, Dr. jur. Universitat Altdorf, 1666, 2. Christiaan Huygens, Artium Liberalium Magister, Juris utriusque Doctor, Universiteit Leiden, Universite d'Angers, 1647, 1655, Frans van Schooten, Jr., Artium Liberalium Magister, Universiteit Leiden, 1635, Jacobus Golius, Artium Liberalium Magister, Philosophiae Doctor, Universiteit Leiden, 1612, 1621, 1. Willebrord (Snel van Royen) Snellius, Artium Liberalium Magister, Universiteit Leiden, 1607, 2. Rudolph
  88. (Snel van Royen) Snellius, Artium Liberalium Magister, Universitat zu Koln, Ruprecht Karls Universitat Heidelberg, 1572, 1. Valentine Naibod, Magister Artium, Martin Luther Universitat, Halle Wittenberg, Universitat Erfurt, Erasmus Reinhold, Magister Artium, Martin Luther Universitat, Halle Wittenberg, 1535, Jakob Milich, Liberalium Artium Magister, Med. Dr., Albert Ludwigs Universitat Freiburg, Breisgau, Universitat Wien, 1520, 1524, Desiderius Erasmus Roterodamus (sometimes known as Desiderius Erasmus of Rotterdam), University of Paris, Theologiae Baccalaureus, College de Montaigu, 1497, Jan Standonck, Magister Artium, Theol. Dr., College Sainte-Barbe, College de Montaigu, 1474, 1490, unknown.
  89. Link to Pierre-Simon Laplace (50295 descendants) and Jean Le Rond d'Alembert
Anita Wasilewska, Ph.D. Warsaw University, 1975, Poland, Helena Rasiowa, Ph.D. Warsaw University, 1950, Andrzej Mostowski, Ph.D. Warsaw University, 1938, 1. Kazimierz Kuratowski, Ph.D. Warsaw University, 1921, 1. Stefan Mazurkiewicz, Ph.D. University of Lvov, 1913, Waclaw Sierpinski, Ph.D. Uniwersytet Jagiellonski, 1906, 1. Stanislaw Zaremba, Ph.D. Universite Paris IV-Sorbonne, 1889, Gaston Darboux, Ph.D. Ecole Normale Superieure, Paris, 1866, Michel Chasles, Ph.D. Ecole Polytechnique, 1814, Simeon Poisson, Ph.D. Ecole Polytechnique, 1800, 2. Pierre-Simon Laplace, Ph.D., Jean Le Rond d'Alembert, unknown.
  90. Link to Emile Borel (2506 descendants) and Leonhard Euler (52555 descendants)
Anita Wasilewska, Ph.D. Warsaw University, 1975, Poland, Helena Rasiowa, Ph.D. Warsaw University, 1950, Andrzej Mostowski, Ph.D. Warsaw University, 1938, 2. Zygmunt Janiszewski, Ph.D. Ecole Normale Superieure, Paris, 1911, Henri Lebesgue, Ph.D. Universite Henri Poincare Nancy 1, 1902, Emile Borel, Ph.D. Ecole Normale Superieure, Paris, 1893, Gaston Darboux, Ph.D. Ecole Normale Superieure, Paris, 1866, Michel Chasles, Ph.D. Ecole Polytechnique, 1814, Simeon Poisson, Ph.D. Ecole Polytechnique, 1800,
  91. 1. Joseph Lagrange, no degree, student of Leonhard Euler, Ph.D. Universitat Basel, 1726, 1. Johann Bernoulli, Dr. med. Universitat Basel, 1694, Jacob Bernoulli, Dr. hab. Sci. Universitat Basel, 1684, Gottfried Wilhelm Leibniz, Dr. jur. Universitat Altdorf, 1666, 1. Erhard Weigel, Ph.D. Universitat Leipzig, 1650, unknown.
  92. Link to Andrei Markov (4824 descendants) and Pafnuty Chebyshev (5964 descendants)
Anita Wasilewska, Ph.D. Warsaw University, 1975, Poland, Helena Rasiowa, Ph.D. Warsaw University, 1950, Andrzej Mostowski, Ph.D. Warsaw University, 1938, 1. Kazimierz Kuratowski, Ph.D. Warsaw University, 1921, 1. Stefan Mazurkiewicz, Ph.D. University of Lvov, 1913, Waclaw Sierpinski, Ph.D. Uniwersytet Jagiellonski, 1906, 2. Georgy Fedoseevich Voronoy, Ph.D. University of St. Petersburg, 1896, Andrei Markov, Ph.D. University of St. Petersburg, 1884, Pafnuty Chebyshev, Ph.D. University of St. Petersburg, 1849, Nikolai Dmitrievich Brashman, Ph.D. Moscow State University, 1834, Joseph Johann von Littrow, Ph.D., unknown.
  93. MY PhD COUSINS include
Kurt Goedel
Alan Turing
Alonzo Church
Roman Sikorski
Zdzislaw Pawlak
and many others.... I am sure some of them are in this room!
  94. In the Stony Brook CS Department I traced 10 of them.
WE ARE ALL A BIG SCIENTIFIC FAMILY!
