ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor Daniel Martin Katz

Published in: Education, Technology, Business

  1. 1. Complex Systems Models in the Social Sciences (Lecture 6) daniel martin katz illinois institute of technology chicago kent college of law @computational danielmartinkatz.com computationallegalstudies.com
  2. 2. Today I Would Like to Sketch (In Part) Where the World is Heading
  3. 3. Simply Put
  4. 4. A Much More Data Driven World
  5. 5. Highlighting the Data Deluge 2008 2009 2010
  6. 6. Highlighting the Data Deluge 2011 2011
  7. 7. Highlighting the Data Deluge
  8. 8. Before Talking About the Specific Applications
  9. 9. Some Broad Trends
  10. 10. What is Driving the Big Data Revolution?
  11. 11. This is the Era of “Big Data”: Decreasing Data Storage Costs; Increasing Computing Power; Fundamentally Altering the Scope of Scientific & Technical Possibility
  12. 12. Moore’s Law!
  13. 13. And How Big is ‘Big’?
  14. 14. How Much Data Is a Petabyte?
  15. 15. How Much Data Is a Petabyte?
  16. 16. How Much Data Is a Petabyte?
  17. 17. How Much Data Is a Petabyte?
  18. 18. How Much Data Is a Petabyte?
  19. 19. How Much Data Is a Petabyte?
  20. 20. How Much Data Is a Petabyte?
  21. 21. How Much Data Is a Petabyte?
  22. 22. How Much Data Is a Petabyte?
  23. 23. Kryder’s Law!
  24. 24. How Much Data Is a Petabyte?
  25. 25. Implications for the Economy
  26. 26. Erik Brynjolfsson is the Schussel Family Professor at the MIT Sloan School of Management, Director of the MIT Center for Digital Business, Chair of the MIT Sloan Management Review, and the Editor of the Information Systems Network. Andrew McAfee, a principal research scientist at MIT’s Center for Digital Business, studies the ways that information technology (IT) affects business.
  27. 27. [Figure: a sequence of repeated doublings, drawn first as dots and then labeled from 128 up to 9QT]
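The doubling sequence on slide 27 can be reproduced in a few lines. This is a minimal sketch, assuming the sequence starts at 1 and contains 64 values; that count is an assumption inferred from the labels running from 128 up to roughly 9 quintillion, and the function name `doublings` is hypothetical.

```python
def doublings(n_steps=64):
    """Return [1, 2, 4, ...] with n_steps entries, each double the previous.

    n_steps=64 is an assumption; the slide itself does not state a count.
    """
    return [2 ** k for k in range(n_steps)]


values = doublings()

# The first labeled value on the slide (128) is the eighth entry.
print(values[7])
# The last entry is about 9.2 quintillion.
print(values[-1])
# The running total is one less than the next doubling.
print(sum(values))
```

The point of the exercise is that most of the total arrives in the last few steps: the final value alone exceeds the sum of all earlier values.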
  28. 28. Implications for Science + Engineering
  29. 29. How About This One ...
  30. 30. 2004 DARPA Grand Challenge
  31. 31. Goal: Build a Driverless Car that Could Travel 150 miles
  32. 32. Winning Vehicle Traveled only Eight Miles
  33. 33. Fast Forward to 2012 ...
  34. 34. Some Applicable Terms That Will Drive The Future
  35. 35. Natural Language Processing, Clustering, Knowledge Representation, Machine Learning, Feature Selection, Feature Extraction, Classification
  36. 36. http://www.drewconway.com/zia/?p=2378
  37. 37. A Simple Demo of Machine Learning
  38. 38. Smarter Than You Think ...
  39. 39. Smarter Than You Think ...
  40. 40. A Brief Discussion of the McKinsey Study
  41. 41. http://www-01.ibm.com/software/data/bigdata/industry.html
  42. 42. Financial Services: Help agents find patterns, understand context and flag abuse by analyzing customer correspondence in conjunction with structured information. Predict consumer behavior by using cross-line of business details to correlate consumer debt ratios and transaction patterns. Automate entity identification and risk profiling based on SEC and other regulatory filings, with the ability to assemble, model and stay current on large volumes of information. Consume, analyze and act on real-time market data while maintaining sub-millisecond response times, even under extreme data loads.
  43. 43. Retail: Analyze social media data in conjunction with customer buying data to predict customer behavior, and track sentiment and brand perception. Monitor social media sources for rumors, deliberate false information, and impersonation of employees to more quickly understand and correct misinformation. Issue marketing promotions, analyze their success in real time, and adapt promotions to optimize outcome. Analyze linkages between specific online advertising and recommendations to buying behavior so that you can strategically merchandise and place effective advertising. Run highly complex pattern detection algorithms on years, rather than months, of transaction data, allowing the organization to rapidly detect and respond to new fraud scenarios and exposures. (Also true of Financial Services.)
  44. 44. Healthcare: Perform complex real-time analytics on physiological streams of data in ICU environments to detect life-threatening conditions in time to proactively intervene. Manage and analyze real-time sensor data to assist chronic disease patients. Capture and analyze clinical information from electronic health records to speed the creation and diffusion of medical knowledge. Track a wide variety of data streams and leverage prior benchmarks to potentially help expose the early signs of an epidemic.
  45. 45. Telecom: Perform real-time mediation with the ability to handle billions of call detail records per day. Process real-time call data to predict customer churn and remediate customer satisfaction issues (e.g., dropped calls) as soon as they happen. Enable real-time geo-mapping and marketing promotions. Analyze social media data with customer buying data to predict customer behavior and track sentiment and brand perception. Unlock the insights embedded in call center voice recordings by doing voice-to-text conversions, then performing advanced text analytics on the converted recordings.
  46. 46. The Artificial Intelligence Revolution is On
  47. 47. The Artificial Intelligence Revolution is On
  48. 48. http://www.youtube.com/watch?v=lI-M7O_bRNg&feature=relmfu Watch This On Your Own
  49. 49. http://www.youtube.com/watch?v=DywO4zksfXw&feature=related Watch This On Your Own
  50. 50. A Brief Word About Data Driven Theory Building
  51. 51. Hypothesis Testing is the Core of Mainstream Science
  52. 52. Deduction
  53. 53. Popperian Falsification
  54. 54. Partial or Complete Induction
  55. 55. Is the Alternative
  56. 56. In Case You Did not Know
  57. 57. This is an Inductive Age
  58. 58. This is the Age of Aspirational Spelling (Spelling is 1.0 Thinking)
  59. 59. The Inverse Problem
  60. 60. Kepler v. Newton
  61. 61. Netflix Prize -- From AT&T Labs http://www.youtube.com/watch?v=ImpV70uLxyw
  62. 62. The Music Genome Project
  63. 63. A Brief Word About Data Driven Theory Building: One Billion Clicks Can Be the Basis for Theory Generation. The Hypothesis Testing Frame is Not the Only Way to Do Science
  64. 64. A Brief Word About Data Driven Theory Building: Validation Can Be Achieved Through Either (1) Out of Sample Testing or (2) Forward Prediction. Example: http://en.wikipedia.org/wiki/Netflix_Prize http://www.wired.com/epicenter/2009/09/how-the-netflix-prize-was-won/
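The two validation routes named on slide 64 can be sketched on a toy problem. This is a minimal illustration, not the Netflix Prize procedure: the data are synthetic (a noisy line with made-up coefficients), the model is an ordinary least squares line, and the helper names `fit_line` and `mse` are hypothetical. Out of sample testing holds back a random subset; forward prediction fits on the earlier points and scores on the later ones.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted line on (xs, ys)."""
    intercept, slope = model
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data (an assumption for illustration): y = 2 + 3x plus unit noise.
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1.0) for x in xs]

# (1) Out of sample testing: fit on a random 70%, score on the held-out 30%.
order = list(range(100))
rng.shuffle(order)
train, held_out = order[:70], order[70:]
model = fit_line([xs[i] for i in train], [ys[i] for i in train])
holdout_error = mse(model, [xs[i] for i in held_out], [ys[i] for i in held_out])

# (2) Forward prediction: fit on the first 70 points, score on the last 30.
forward_model = fit_line(xs[:70], ys[:70])
forward_error = mse(forward_model, xs[70:], ys[70:])

print(model, holdout_error, forward_error)
```

In both cases the score is computed only on data the model never saw during fitting, which is what separates validation from goodness of fit.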
  65. 65. Additional Examples: This is the Age of Aspirational Spelling (Spelling is 1.0 Thinking)
  66. 66. Additional Examples: Collaborative Filtering (People Who Bought X Also Bought Y)
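The "People Who Bought X Also Bought Y" idea on slide 66 can be sketched with simple co-purchase counts. This is one of the most basic forms of collaborative filtering and is shown only as an illustration; the baskets are invented, and the names `also_bought` and `recommend` are hypothetical rather than any retailer's actual method.

```python
from collections import Counter
from itertools import permutations


def also_bought(baskets):
    """Count, for every item, how often each other item shares a basket with it."""
    co_counts = {}
    for basket in baskets:
        # Each ordered pair (x, y) of distinct items in a basket adds one vote.
        for x, y in permutations(sorted(set(basket)), 2):
            co_counts.setdefault(x, Counter())[y] += 1
    return co_counts


def recommend(co_counts, item, k=2):
    """Return up to k items most often bought together with `item`."""
    return [other for other, _ in co_counts.get(item, Counter()).most_common(k)]


# Invented purchase histories, for illustration only.
baskets = [
    ["book", "lamp", "pen"],
    ["book", "pen"],
    ["book", "lamp"],
    ["pen", "ink"],
    ["book", "pen", "ink"],
]

co_counts = also_bought(baskets)
print(recommend(co_counts, "book"))
print(recommend(co_counts, "ink", k=1))
```

No item descriptions are needed: the recommendation comes entirely from the behavior of other buyers, which is the defining feature of the approach.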
  67. 67. The Science of Similarity: What Makes a Set of High Dimensional Objects ‘Similar’? Movies, People, Words, Music, Books. Crowd Sourced Human Reasoning is Helping Develop a ‘New’ Science of Similarity
  68. 68. The Science of Similarity: Computer Forensics (Drawn from Jesse Kornblum): 1. Feature Extraction 2. Feature Selection 3. Comparison 4. Clustering 5. Classification 6. ??? The ‘???’ means: which features to extract; which similarity measure to use; which classification algorithm
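The three open choices on slide 68 (features, similarity measure, classification algorithm) can be made concrete with one possible set of answers. This is a minimal sketch under stated assumptions: character trigrams as features, Jaccard similarity as the measure, and nearest labeled example as the classifier. None of these choices is taken from Kornblum's work, and the example strings and labels are invented.

```python
def trigrams(text):
    """Feature extraction: the set of 3-character substrings of the lowercased text."""
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}


def jaccard(a, b):
    """Comparison: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify(text, labeled_examples):
    """Classification: return the label of the most similar labeled example."""
    features = trigrams(text)
    best = max(labeled_examples, key=lambda pair: jaccard(features, trigrams(pair[0])))
    return best[1]


# Invented labeled examples, for illustration only.
labeled_examples = [
    ("quarterly earnings report", "finance"),
    ("holiday photo album", "personal"),
]

print(jaccard(trigrams("earnings report"), trigrams("earnings report")))
print(classify("earnings report draft", labeled_examples))
```

Swapping any one of the three functions (for example word tokens instead of trigrams, or cosine similarity instead of Jaccard) can change which objects count as similar, which is why the slide leaves those choices as ‘???’.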
  69. 69. Machine Learning
  70. 70. http://www.youtube.com/watch?v=yDLKJtOVx5c&feature=results_video&playnext=1&list=PLD0F06AA0D2E8FFBA
  71. 71. http://www.youtube.com/watch?v=EFrgVDniDqU&feature=related
  72. 72. If you Want a Full Course in the Topic It is Free From Stanford University
  73. 73. http://www.forbes.com/sites/oreillymedia/2012/01/05/goodbye-information-economy-hello-feedback-economy/
  74. 74. © daniel martin katz michael j bommarito Machine Learning is the heart of predictive analytics
  75. 75. Legal Analytics: Professor Daniel Martin Katz, Professor Michael J Bommarito II
  76. 76. Some Machine Learning Methods. Supervised: Statistical models (Bayesian, e.g., Naïve Bayes Classification; Frequentist, e.g., Ordinary Least Squares), Neural Networks (NN), Support Vector Machines (SVM), Random Forests (RF), Genetic Algorithms (GA). Semi/Unsupervised: Neural Networks (NN), Clustering (K-means, Hierarchical, Radial Basis (RBF), Graph)
  77. 77. http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
  78. 78. The family of machine learning methods: classification, clustering, regression, dimension reduction
  79. 79. Quick Example of the Methods
  80. 80. Take A Look At These 12 (Adapted from Slides By Victor Lavrenko and Nigel Goddard @ University of Edinburgh)
  81. 81. 72 Female Human; 3 Female Horse; 36 Male Human; 21 Male Human; 67 Male Human; 29 Female Human; 54 Male Human; 44 Male Human; 50 Male Human; 42 Female Human; 6 Male Dog; 7 Female Human
  82. 82. Classification (Supervised Learning): f( ) = Gender? [Figure: a decision boundary separating female from male]
  83. 83. Classification (Supervised Learning): f( ) = Gender? Regression (Supervised Learning): f( ) = Age? [Figure: numeric age values attached to the examples]
  84. 84. Classification (Supervised Learning): f( ) = Gender? Regression (Supervised Learning): f( ) = Age? Multi Class Classification (Supervised Learning): f( ) = Loan Application? Labels: Yes, No, Maybe (or Yes, Perhaps, No). Multiclass = Boundary Hyperplane
  85. 85. Classification (Supervised Learning): f( ) = Gender? Regression (Supervised Learning): f( ) = Age? Multi Class Classification (Supervised Learning): f( ) = Loan Application? Labels: Yes, No, Maybe (or Yes, Perhaps, No). Multiclass = Boundary Hyperplane. Clustering (Unsupervised Learning): f( ) = Group? (Cluster)
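The three task types on slides 82 to 85 can be tried on the twelve records from slide 81. This is a rough sketch with loud caveats: the slides apply f( ) to pictures, whereas the only inputs available here are the listed age and gender, so the models below are stand-ins chosen for brevity (nearest neighbour on age, a per-group mean, and two-cluster k-means). All function names are hypothetical.

```python
# (age, gender, kind) for the twelve examples listed on slide 81.
records = [
    (72, "Female", "Human"), (3, "Female", "Horse"), (36, "Male", "Human"),
    (21, "Male", "Human"), (67, "Male", "Human"), (29, "Female", "Human"),
    (54, "Male", "Human"), (44, "Male", "Human"), (50, "Male", "Human"),
    (42, "Female", "Human"), (6, "Male", "Dog"), (7, "Female", "Human"),
]


def classify_gender(age):
    """Classification (supervised): copy the gender of the record nearest in age."""
    nearest = min(records, key=lambda r: abs(r[0] - age))
    return nearest[1]


def predict_age(gender):
    """Regression (supervised): least squares prediction of age from gender alone,
    which is the mean age of the records sharing that gender."""
    ages = [a for a, g, _ in records if g == gender]
    return sum(ages) / len(ages)


def two_means(values, iters=20):
    """Clustering (unsupervised): 1-D k-means with k=2, started at the min and max."""
    lo, hi = float(min(values)), float(max(values))
    for _ in range(iters):
        a = [v for v in values if abs(v - lo) <= abs(v - hi)]
        b = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo, new_hi = sum(a) / len(a), sum(b) / len(b)
        if (new_lo, new_hi) == (lo, hi):
            break  # assignments no longer change the centers
        lo, hi = new_lo, new_hi
    return sorted(a), sorted(b)


young, old = two_means([age for age, _, _ in records])
print(classify_gender(70), predict_age("Female"), young, old)
```

The supervised functions need the labels (gender, age) to learn from, while `two_means` sees only the ages and finds the groups on its own, which mirrors the supervised versus unsupervised split on the slides.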
  86. 86. The family of machine learning methods: classification, clustering, regression, dimension reduction
  87. 87. Why I Am Interested in (Obsessed with) Analogy
  88. 88. Analogy
  89. 89. Deep End of Human Reasoning?
  90. 90. What Makes an Analogy Convincing?
  91. 91. Analogical Reasoning
  92. 92. Law
  93. 93. Thinking Like a Lawyer
  94. 94. Psychology
  95. 95. The Availability Heuristic (Related but not exactly on point)
  96. 96. Science
  97. 97. Interdisciplinary Scholarship
  98. 98. Markets
  99. 99. Emerging Technology, Venture Capital, Pricing Model, Arbitrage Detection
  100. 100. Emerging Technology, Venture Capital, Pricing Model, Arbitrage Detection: Analogy
  101. 101. How Much is Facebook Worth? Do I have the right analogy? 1 billion, 10 billion, 100 billion, 500 billion
  102. 102. The Science of Similarity
  103. 103. and of course
  104. 104. The Curse of Dimensionality
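One face of the curse of dimensionality that matters for similarity is distance concentration: as the number of dimensions grows, the nearest and farthest neighbours of a point end up almost equally far away. The sketch below illustrates this with uniformly random points in a unit cube; the point count, seed, and the name `relative_contrast` are assumptions made for the illustration.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest distance from a random query point
    to n_points random points, all drawn uniformly from the unit cube in `dim` dimensions."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


# The contrast shrinks as the dimension grows.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

When the contrast approaches zero, "most similar" and "least similar" become hard to tell apart, which is why feature selection and dimension reduction appear alongside similarity in this lecture.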
  105. 105. The Revolution in Soft Artificial Intelligence
  106. 106. Computer Forensics (Drawn from Jesse Kornblum): 1. Feature Extraction 2. Feature Selection 3. Comparison 4. Clustering 5. Classification 6. ??? The ‘???’ means: which features to extract; which similarity measure to use; which classification algorithm
  107. 107. My hope is that the AI Revolution will allow us to better understand analogical reasoning, as it is critical for a large number of questions we care about ...
  108. 108. http://blog.ted.com/2010/07/14/when_ideas_have/ When Ideas Have Sex By Matt Ridley
