Slideshow transcript
Slide 1: Data Mining, Truth, Justice, the American Way, and the Flying Spaghetti Monster tim@menzies.us Ph.D. LCSEE, WVU, 20 Sept 2007
Slide 2: Expose, and hose • "Part of education is to expose people to different schools of thought." - President George Bush, August 1, 2005 • "Part of science is to expose people to the critical and continual (re)evaluation of ideas." - Some guy called Timm, September 20, 2007
Slide 3: "Look up in the sky! It's a bird! It's a plane! It's Superman!" "Yes, it's Superman, strange visitor from another planet who came to Earth with powers and abilities far beyond those of mortal men." "Superman, who can change the course of mighty rivers, bend steel in his bare hands; and who, disguised as Clark Kent, mild-mannered reporter for a great metropolitan newspaper, fights a never ending battle for truth, justice, and the American way." • Why a never-ending battle? • How to ensure justice? • How to make lottsa $$? • How to find truth?
Slide 4: So, tonight • Notions of certainty • Standards for debate • Surprises • Nothing is “truth” but many more things are false • And some things are useful • Implications for humility • And for justice
Slide 5: God gave me a brain. I take it (s)he wants me to use it. • Mark of the rational: while not dead; do review and revise assumptions; done • Entertain a wide range of ideas, but don’t necessarily accept them • Demand evidence that lets you repeat/refute/improve prior conclusions • But what of faith? That is another talk. There is room for the divine in my universe. But in my test tubes? Not too much
Slide 6: Data miners: agents that automate the creation and review of new ideas
Mountains of data:
@relation weather.symbolic
@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
Tablespoons of knowledge:
outlook = sunny
| humidity = high: no
| humidity = normal: yes
outlook = overcast: yes
outlook = rainy
| windy = TRUE: no
| windy = FALSE: yes
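The tree on this slide can be checked against the 14 rows it was learned from. What follows is a minimal Python sketch, assuming the rows and the tree are as transcribed on the slide; the names `WEATHER` and `tree_predict` are illustrative and do not come from the talk.

```python
# The 14 rows of weather.symbolic: outlook, temperature, humidity, windy, play
WEATHER = [row.split(",") for row in """\
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no""".splitlines()]


def tree_predict(outlook, temperature, humidity, windy):
    """Hand-coded copy of the slide's decision tree (temperature is never used)."""
    if outlook == "sunny":
        return "no" if humidity == "high" else "yes"
    if outlook == "overcast":
        return "yes"
    # Remaining case: outlook == "rainy"
    return "no" if windy == "TRUE" else "yes"


# Count how many training rows the tree reproduces
correct = sum(tree_predict(*row[:4]) == row[4] for row in WEATHER)
print(f"{correct}/{len(WEATHER)} training rows classified correctly")
```

The tree reproduces every row, but that is performance on the training data only, which slide 26 warns against trusting.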
Slide 7: Data doubling every 20 months: Internet, Radio Frequency Identification (RFID) tracking, on-line shopping (patterns of sales tracked at Amazon). So now we can automatically learn answers to many questions; e.g. What eggs to select for IVF? What will software cost to develop? What diseases does a patient have? Which loan applications to fund? What houses will have the best resale value? Which parts of the program need more inspection? What products are best to sell to what markets? What cows to keep and which to send to the abattoir? How to teach a satellite to distinguish between cloud shadows and oil spills? How much electricity will be needed in two hours, i.e. what coal-powered generators to fire up?
Slide 8: More fundamentally, what can we say about the world, with any certainty? Same data, different data miners, different conclusions. Every miner is biased by: • Evaluation bias • Language: what is the “shape” of the models we can learn? Decision trees, equations, etc. • Search: pruning the possibly infinite space of candidate models; what not to explore • Over-fitting avoidance: how to stop the learner fixating on noise, e.g. pruning back decision trees
Slide 9: Any learning scheme has many biases • Bias lets us ignore “stuff”. • Without it, we don’t know what is important or dull, we can’t summarize, generalize. • Without bias, we can’t learn from the past • Bias blinds us but lets us see the future • But changing biases changes what we best believe • No wonder truth is a never-ending battle
Slide 10: Generalizing from the past works • Sometimes, very clearly: heavy smokers have a 2000% to 3000% higher chance of lung cancer • Learned theories perform very well on new data • But ... the “best” learned theory can be a moveable feast.
Slide 11: So, a relativistic soup? No certainty? No way to plan effective actions? No way to rule out absurd notions?
Slide 12: I don’t want to offend any one, but… Should I even say this in a public place? … I think that once there were no cell phones or iPods, or clothes, or countries, or language, or human society, or 4-valved hearts, or homeostasis, or organs, or brains, or planets, or stars, or matter. Where the net energy in-flow is positive… the universe selects for self-perpetuating systems, an exponentially decreasing number of which are of exponentially increasing complexity. • "Part of education is to expose people to different schools of thought." - President George Bush, August 1, 2005 • Shouldn’t I have to give credence to all theories? Evolution, intelligent design, pirates cause global warming?
Slide 13: The Church of the Flying Spaghetti Monster (FSM) • Founded in 2005 by OSU physics graduate Bobby Henderson • A protest against the decision by the Kansas State Board of Education to require the teaching of intelligent design as an alternative to biological evolution • Henderson wrote to the board professing belief in a supernatural Creator called the Flying Spaghetti Monster • He demanded that his "Pastafarian" theory of creation be taught in science classrooms.
Slide 14: FSM is not about religion • It is a mistake to view FSM as anti-religion • Rather, FSM is anti-anti-scientific rigor • No one in their right mind would ever believe this nonsense, and that’s the point • Truth is a never-ending battle • We must have standards to assess scientific theories, to reject absurdities • Or any nonsense can be released on this world, e.g. “Global warming is caused by pirates.”
Slide 15: Wikipedia on FSM • FSM: an invisible, undetectable Flying Spaghetti Monster • Evidence for evolution was planted by FSM to test Pastafarians' faith • FSM changes the results of measurements, like radiocarbon dating, via His Noodly Appendage. • Heaven contains beer volcanoes and a stripper factory. • Hell is similar, but with stale beer and diseased strippers. • Pirates are "absolute divine beings" and the original Pastafarians. • Their image as "thieves and outcasts" is misinformation spread by Christian theologians in the Middle Ages and Hare Krishnas. • Pirates are "peace-loving explorers and spreaders of good will" who distributed candy to small children. • Global warming, earthquakes, hurricanes, and other natural disasters are a direct effect of the shrinking numbers of pirates since the 1800s.
Slide 16: FSM “proof” of the divinity of pirates • A case study on how not to present data • X-axis deliberately misleading • Crazy? Yes! • But would you recognize such craziness if you saw it again?
Slide 17: What is the “best” weight-loss diet?
Slide 18: How lucky for those in power that people don't think. - Adolf Hitler, i.e. people trying to sell you their diet book
Slide 19: What is the “best” programming language?
Slide 20: To our peril, we trust old ideas too much • Columbia ice strike: size 1200 in³, speed 477 mph (relative to vehicle) • Certified as “safe” by the CRATER micro-meteorite model • A typical experiment in CRATER’s test database: size 3 in³ piece of debris, speed under 150 mph.
Slide 21: Value of estrogen (NYT magazine, Sept 16, 2007) • 1990s: American Heart Association recommends hormone replacement therapy for older women to ward off heart disease and osteoporosis. • 2001: 15 million Americans filling H.R.T. prescriptions annually • 2002: estrogen therapy exposed as a hazard, not a benefit, for health • Failure of scientific method: benefits of estrogen were reported from large observational studies, not randomized trials • Repeated epidemiological finding: randomized trials rarely support conclusions from observational studies. • So forget what you’ve read about: anti-oxidants like vitamins E & C & beta carotene preventing heart disease; fiber preventing colon cancer
Slide 22: So, why is FSM silly? And please, rest assured, it is very very silly stuff indeed. • Theories need an entrance exam • Many possible theories, one for each bias • Demand that a theory has passed at least some operational test before we condone it, act on it. • If no reason to accept the new, don’t • Trust the most what has been challenged the most - Karl Popper
Slide 23: No things are “right”, but some things are “useful” • Sure, one data set supports many theories. • But there are many many more theories that are unsupported. • No model is right, but some things are useful (perform well on test data) - George Box • And many many many more ideas are useless: can’t make predictions; not defined enough to support (possible) refutation
Slide 24: Wolfgang Pauli • The "conscience of physics", the critic to whom his colleagues were accountable. • Scathing in his dismissal of poor theories, often labeling them ganz falsch, utterly false. • But “ganz falsch” was not his most severe criticism. • He hated theories so unclearly presented as to be untestable and unevaluatable: worse than wrong because they could not be proven wrong, and not properly belonging within the realm of science, even though posing as such. • Famously, he wrote of one such unclear paper: "This paper is not right. It is not even wrong."
Slide 25: Believe those who seek the truth; doubt those who find it -Andre Gide.
Slide 26: Don’t test once on just the training data • Study more than the average performance • Also look at the variance • E.g. here, no significant difference on new data after X=8
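One way to act on this advice is to repeat a train/test split many times and report the spread of the scores as well as their mean. The sketch below is only an illustration under stated assumptions: the learner is a trivial majority-class predictor, the labels mimic the 9 yes / 5 no split of the weather data, and the function names and parameters (`repeated_holdout`, `repeats`, `test_fraction`) are hypothetical, not taken from the talk.

```python
import random
import statistics
from collections import Counter


def majority_learner(train_labels):
    """Simplest possible 'theory': always predict the most common training label."""
    winner = Counter(train_labels).most_common(1)[0][0]
    return lambda: winner


def repeated_holdout(labels, repeats=20, test_fraction=0.33, seed=1):
    """Score the learner on held-out data over many random splits."""
    rng = random.Random(seed)
    n_test = max(1, int(len(labels) * test_fraction))
    scores = []
    for _ in range(repeats):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        predict = majority_learner(train)
        # Accuracy on rows the learner never saw during training
        scores.append(sum(predict() == y for y in test) / len(test))
    return scores


LABELS = ["yes"] * 9 + ["no"] * 5   # assumed class distribution
scores = repeated_holdout(LABELS)
print(f"mean={statistics.mean(scores):.2f} stdev={statistics.stdev(scores):.2f}")
```

Two learners with the same mean score can differ widely in standard deviation, so both numbers are worth reporting.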
Slide 27: If something works, poke it till it breaks • i) Sort attributes on “infogain” • ii) Learn using first N attributes • (Plots for four data sets: labor, soybean, diabetes, anneal) • A few variables are (often) enough
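Step (i) can be sketched in a few lines of Python. The example below assumes the standard entropy-based definition of information gain and uses the 14-row weather data from slide 6; the names `ROWS`, `NAMES`, `entropy` and `info_gain` are illustrative. Step (ii), learning from only the first N ranked attributes, is not shown.

```python
import math
from collections import Counter

NAMES = ["outlook", "temperature", "humidity", "windy"]
ROWS = [row.split(",") for row in """\
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no""".splitlines()]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, col):
    """Entropy of the class minus its expected entropy after splitting on col."""
    remainder = 0.0
    for value in set(r[col] for r in rows):
        subset = [r[-1] for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder


gains = {name: info_gain(ROWS, i) for i, name in enumerate(NAMES)}
ranked = sorted(NAMES, key=gains.get, reverse=True)
for name in ranked:
    print(f"{name:12s} {gains[name]:.3f}")
```

On this data the ranking puts outlook first and temperature last, which matches the tree on slide 6 never testing temperature.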
Slide 28: Living with Uncertainty • Check how training set size affects the theory
Slide 29: Living with Uncertainty • Launch learners with anomaly detection and repair tools
Slide 30: Living with uncertainty: count, alert, fix • An incremental discretizer + a Bayes classifier where all inputs are mono-classified • Track average max likelihood for data, processing in “eras” of X instances • Count: stuff seen in past • Alert: if new counts different • Fix: find delta new to old (contrast set learning) • Linear time inference, very, very fast; tiny memory footprint • And, it works [Orrego, 2004] • F15 simulator data [courtesy B. Cukic]: five flights (a, b, c, d, e), each with a different off-nominal condition imposed at “time” 15 • Off-nominal condition not present in prior data • In all cases, massive change detected
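The "count" and "alert" steps can be illustrated with a small Python sketch. This is not the implementation from [Orrego, 2004]; it is a simplified stand-in that assumes inputs are already discretized into symbols, scores each era by its average log-likelihood under independent per-column counts, and raises an alert when that average falls sharply. The class name `EraMonitor` and the parameters `era_size` and `tolerance` are hypothetical. The "fix" step (contrast set learning) is omitted.

```python
import math
from collections import Counter


class EraMonitor:
    """Count symbols per column; score each era of rows against past counts."""

    def __init__(self, n_columns, era_size=5, tolerance=1.0):
        self.counts = [Counter() for _ in range(n_columns)]
        self.seen = 0                # rows folded into the counts so far
        self.era_size = era_size
        self.tolerance = tolerance   # allowed drop in average log-likelihood
        self.buffer = []
        self.baseline = None         # average log-likelihood of the previous era
        self.era = 0
        self.alerts = []             # indices of eras that raised an alert

    def _loglike(self, row):
        total = 0.0
        for counter, value in zip(self.counts, row):
            # Laplace-style smoothing: unseen symbols get a small, non-zero probability
            total += math.log((counter[value] + 1) / (self.seen + len(counter) + 1))
        return total

    def add(self, row):
        self.buffer.append(row)
        if len(self.buffer) == self.era_size:
            self._close_era()

    def _close_era(self):
        if self.seen:  # the first era has no past to be compared with
            avg = sum(self._loglike(r) for r in self.buffer) / len(self.buffer)
            # Alert: the new era looks much less likely than the previous one
            if self.baseline is not None and avg < self.baseline - self.tolerance:
                self.alerts.append(self.era)
            self.baseline = avg
        # Count: fold the era into the running counts
        for row in self.buffer:
            for counter, value in zip(self.counts, row):
                counter[value] += 1
        self.seen += len(self.buffer)
        self.buffer = []
        self.era += 1


monitor = EraMonitor(n_columns=2)
for _ in range(15):          # three eras of nominal behaviour
    monitor.add(("a", "x"))
for _ in range(5):           # one era of symbols never seen before
    monitor.add(("z", "q"))
print("alerts raised in eras:", monitor.alerts)
```

Each row costs one counter lookup and one update per column, so the work is linear in the data and memory grows only with the number of distinct symbols.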
Slide 31: Living with uncertainty • Life is a balance between: • Policy #1: exploration. Tolerate the sub-optimal, a little. Doing crazy things to learn new things • Policy #2: exploitation. Fix your theories and base your work on those fixed ideas. • Kuhn: most “science” is puzzle solving… within existing paradigms. Sometimes the paradigm breaks down… prompting revolutionary research • Human young: do crazy things (take long trips); less craziness as we grow older
Slide 32: Tolerance of “exploration” • Critical to the American way • America: history of tolerance and acceptance • 1945: 400 German rocket scientists choose to surrender to the Yankees, not the Russians • They choose their post-war life based on their perceptions of American ideology • Hence,
Slide 33: Tolerance = hi-tech = $$$ • R. Florida: The Economic Geography of Talent, Annals of the Association of American Geographers 92(4), 2002, pp. 743-755 • Best predictor for hi-tech industry: R² 0.42 to “coolness”; R² 0.49 to cultural amenities; R² 0.50 to median house value; R² 0.77 to “diversity” index
Slide 34: Data Mining, Truth, Justice, the American Way & Flying Spaghetti Monsters • “Superman, fights a never ending battle for truth, justice, and the American way." • No “truth”, all is biased. • Old conclusions must be constantly re-assessed • To make $$, institutionalize exploration and tolerance • A healthy hi-tech needs tolerance to support exploration • and that the FSM is silly, but would consider revising that view if new evidence emerges
Slide 35: Expose, and hose • "Part of education is to expose people to different schools of thought." - President George Bush, August 1, 2005 • "Part of science is to expose people to the critical and continual (re)evaluation of ideas." - Some guy called Timm, September 20, 2007


