ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor Daniel Martin Katz
Published in: Education, Technology, Business

  1. Complex Systems Models in the Social Sciences (Lecture 6). Daniel Martin Katz, Illinois Institute of Technology, Chicago-Kent College of Law. @computational, danielmartinkatz.com, computationallegalstudies.com
  2. Today I Would Like to Sketch (In Part) Where the World is Heading
  3. Simply Put
  4. A Much More Data Driven World
  5. Highlighting the Data Deluge: 2008, 2009, 2010
  6. Highlighting the Data Deluge: 2011, 2011
  7. Highlighting the Data Deluge
  8. Before Talking About the Specific Applications
  9. Some Broad Trends
  10. What is Driving the Big Data Revolution?
  11. This is the Era of “Big Data”: Decreasing Data Storage Costs; Increasing Computing Power; Fundamentally Altering the Scope of Scientific & Technical Possibility
  12. Moore’s law!
  13. And How Big is ‘Big’?
  14. How Much Data Is a Petabyte?
  15. How Much Data Is a Petabyte?
  16. How Much Data Is a Petabyte?
  17. How Much Data Is a Petabyte?
  18. How Much Data Is a Petabyte?
  19. How Much Data Is a Petabyte?
  20. How Much Data Is a Petabyte?
  21. How Much Data Is a Petabyte?
  22. How Much Data Is a Petabyte?
  23. Kryder’s law!
  24. How Much Data Is a Petabyte?
  25. Implications for the Economy
  26. Erik Brynjolfsson is the Schussel Family Professor at the MIT Sloan School of Management, Director of the MIT Center for Digital Business, Chair of the MIT Sloan Management Review, and the Editor of the Information Systems Network. Andrew McAfee, a principal research scientist at MIT’s Center for Digital Business, studies the ways that information technology (IT) affects business.
  27. [Figure: a grid of 64 successive doublings. The first seven values appear only as dots; the labeled values run 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131k, 262k, 524k, 1M, 2M, 4M, 8M, 16M, 33M, 67M, 134M, 268M, 536M, 1B, 2B, 4B, 8B, 17B, 34B, 68B, 137B, 274B, 549B, 1T, 2T, 4T, 8T, 17T, 35T, 70T, 140T, 281T, 562T, 1Q, 2Q, 4Q, 9Q, 18Q, 36Q, 72Q, 144Q, 288Q, 576Q, 1QT, 2QT, 4QT, 9QT.]
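The labeled values on this slide are consistent with powers of two, so the arithmetic can be reproduced in a few lines. This is a minimal sketch under two assumptions that the slide does not state: that the grid has 64 cells with cell k holding 2**k, and that the labels "Q" and "QT" abbreviate quadrillion and quintillion.

```python
# Successive doublings: cell k (counting from 0) holds 2**k.
# Assumption: 64 cells, matching the 7 dotted entries plus 57 labeled ones.
doublings = [2 ** k for k in range(64)]

first_labeled = doublings[7]    # 128, the first value printed on the slide
about_one_million = doublings[20]  # 1,048,576, printed as "1M"
last_cell = doublings[63]       # about 9.2 quintillion, printed as "9QT"

# The sum of all 64 cells is one less than the next doubling.
total = sum(doublings)

print(first_labeled, about_one_million, last_cell, total)
```

The last cell alone holds more than all earlier cells combined, which is the usual point of this kind of illustration of exponential growth.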
  28. Implications for Science + Engineering
  29. How About This One ...
  30. 2004 DARPA Grand Challenge
  31. Goal: Build a Driverless Car that Could Travel 150 miles
  32. Best-Performing Vehicle Traveled only About Eight Miles (No Vehicle Finished the Course)
  33. Fast Forward to 2012 ...
  34. Some Applicable Terms That Will Drive The Future
  35. Natural Language Processing; Clustering; Knowledge Representation; Machine Learning; Feature Selection; Feature Extraction; Classification
  36. http://www.drewconway.com/zia/?p=2378
  37. A Simple Demo of Machine Learning
  38. Smarter Than You Think ...
  39. Smarter Than You Think ...
  40. A Brief Discussion of the McKinsey Study
  41. http://www-01.ibm.com/software/data/bigdata/industry.html
  42. Financial Services: Help agents find patterns, understand context and flag abuse by analyzing customer correspondence in conjunction with structured information. Predict consumer behavior by using cross-line of business details to correlate consumer debt ratios and transaction patterns. Automate entity identification and risk profiling based on SEC and other regulatory filings, with the ability to assemble, model and stay current on large volumes of information. Consume, analyze and act on real-time market data while maintaining sub-millisecond response times, even under extreme data loads.
  43. Retail: Analyze social media data in conjunction with customer buying data to predict customer behavior, and track sentiment and brand perception. Monitor social media sources for rumors, deliberate false information, and impersonation of employees to more quickly understand and correct misinformation. Issue marketing promotions, analyze their success in real time, and adapt promotions to optimize outcome. Analyze linkages between specific online advertising and recommendations to buying behavior so that you can strategically merchandise and place effective advertising. Run highly complex pattern detection algorithms on years, rather than months, of transaction data, allowing the organization to rapidly detect and respond to new fraud scenarios and exposures. (Also true of financial services.)
  44. Healthcare: Perform complex real-time analytics on physiological streams of data in ICU environments to detect life-threatening conditions in time to proactively intervene. Manage and analyze real-time sensor data to assist chronic disease patients. Capture and analyze clinical information from electronic health records to speed the creation and diffusion of medical knowledge. Track a wide variety of data streams and leverage prior benchmarks to potentially help to expose the early signs of an epidemic.
  45. Telecom: Perform real-time mediation with the ability to handle billions of call detail records per day. Process real-time call data to predict customer churn and remediate customer satisfaction issues (ex: dropped calls) as soon as they happen. Enable real-time geo-mapping and marketing promotions. Analyze social media data with customer buying data to predict customer behavior and track sentiment and brand perception. Unlock the insights embedded in call center voice recordings by doing voice-to-text conversions, then performing advanced text analytics on the converted recordings.
  46. The Artificial Intelligence Revolution is On
  47. The Artificial Intelligence Revolution is On
  48. http://www.youtube.com/watch?v=lI-M7O_bRNg&feature=relmfu (Watch This On Your Own)
  49. http://www.youtube.com/watch?v=DywO4zksfXw&feature=related (Watch This On Your Own)
  50. A Brief Word About Data Driven Theory Building
  51. Hypothesis Testing is the Core of Mainstream Science
  52. Deduction
  53. Popperian Falsification
  54. Partial or Complete Induction
  55. Is the Alternative
  56. In Case You Did not Know
  57. This is an Inductive Age
  58. This is the Age of Aspirational Spelling (Spelling is 1.0 Thinking)
  59. The Inverse Problem
  60. Kepler v. Newton
  61. Netflix Prize (from AT&T Labs): http://www.youtube.com/watch?v=ImpV70uLxyw
  62. The Music Genome Project
  63. A Brief Word About Data Driven Theory Building: One Billion Clicks Can Be the Basis for Theory Generation. The Hypothesis Testing Frame is not the only way to do Science.
  64. A Brief Word About Data Driven Theory Building: Validation Can Be Achieved Through Either (1) Out of Sample Testing or (2) Forward Prediction. Example: http://en.wikipedia.org/wiki/Netflix_Prize and http://www.wired.com/epicenter/2009/09/how-the-netflix-prize-was-won/
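Out-of-sample testing can be illustrated with a small sketch: fit a model on one part of the data and score it only on points it never saw. The data, the every-fourth-point split, and the helper names below are hypothetical choices made for illustration; they are not taken from the slides or from the Netflix Prize.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Hypothetical data: a linear trend plus a small deterministic wobble.
xs = list(range(20))
ys = [2 * x + 1 + ((7 * x) % 5 - 2) for x in xs]

# Hold out every fourth point; the model is fit without seeing these.
train = [(x, y) for x, y in zip(xs, ys) if x % 4 != 0]
test = [(x, y) for x, y in zip(xs, ys) if x % 4 == 0]

slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
test_targets = [y for _, y in test]

# Score the fitted line and a naive baseline on the held-out points only.
model_mse = mean_squared_error([slope * x + intercept for x, _ in test], test_targets)
train_mean = sum(y for _, y in train) / len(train)
baseline_mse = mean_squared_error([train_mean] * len(test), test_targets)

print(model_mse, baseline_mse)
```

Forward prediction follows the same pattern, except that the held-out points are the ones that come later in time rather than a sample drawn from the same period.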
  65. This is the Age of Aspirational Spelling (Spelling is 1.0 Thinking): Additional Examples
  66. Additional Examples: People Who Bought X Also Bought Y (Collaborative Filtering)
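A "people who bought X also bought Y" list can be approximated by counting how often two items appear in the same basket. The sketch below uses item co-occurrence counts, which is only the simplest member of the collaborative filtering family; the baskets and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories, one set of items per customer.
baskets = [
    {"X", "Y", "Z"},
    {"X", "Y"},
    {"X", "W"},
    {"Y", "Z"},
]


def co_purchase_counts(baskets):
    """Count, for every item, how often each other item shares a basket with it."""
    counts = {}
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def also_bought(item, baskets, top_n=1):
    """Return the items most often bought together with `item`."""
    counts = co_purchase_counts(baskets)
    return [other for other, _ in counts.get(item, Counter()).most_common(top_n)]


print(also_bought("X", baskets))
```

Real recommenders usually normalize these counts (for example by item popularity) so that best sellers do not dominate every list.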
  67. The Science of Similarity: What Makes a Set of High Dimensional Objects ‘Similar’? Movies, People, Words, Music, Books. Crowd Sourced Human Reasoning is Helping Develop a ‘New’ Science of Similarity.
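One common way to make "similar" concrete for high dimensional objects is to represent each object as a vector of features and compare the angles between vectors. The sketch below uses cosine similarity, which is one choice among many and not necessarily the measure the lecture has in mind; the rating vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical ratings of three movies by the same four viewers.
movie_a = [5, 4, 0, 1]
movie_b = [4, 5, 0, 0]
movie_c = [0, 1, 5, 4]

sim_ab = cosine_similarity(movie_a, movie_b)
sim_ac = cosine_similarity(movie_a, movie_c)
print(sim_ab, sim_ac)  # movie_a is closer to movie_b than to movie_c
```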
  68. The Science of Similarity: Computer Forensics (Drawn from Jesse Kornblum). 1. Feature Extraction 2. Feature Selection 3. Comparison 4. Clustering 5. Classification 6. ??? The ‘???’ means: which features to extract; which similarity measure to use; which classification algorithm.
  69. Machine Learning
  70. http://www.youtube.com/watch?v=yDLKJtOVx5c&feature=results_video&playnext=1&list=PLD0F06AA0D2E8FFBA
  71. http://www.youtube.com/watch?v=EFrgVDniDqU&feature=related
  72. If you Want a Full Course in the Topic It is Free From Stanford University
  73. http://www.forbes.com/sites/oreillymedia/2012/01/05/goodbye-information-economy-hello-feedback-economy/
  74. Machine Learning is the heart of predictive analytics © daniel martin katz michael j bommarito
  75. Legal Analytics: Professor Daniel Martin Katz, Professor Michael J Bommarito II
  76. Some Machine Learning Methods. Supervised: Statistical models (Bayesian, e.g., Naïve Bayes Classification; Frequentist, e.g., Ordinary Least Squares); Neural Networks (NN); Support Vector Machines (SVM); Random Forests (RF); Genetic Algorithms (GA). Semi/Unsupervised: Neural Networks (NN); Clustering (K-means, Hierarchical, Radial Basis (RBF), Graph).
  77. http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
  78. The family of machine learning methods: classification, clustering, regression, dimension reduction
  79. Quick Example of the Methods
  80. Adapted from Slides By Victor Lavrenko and Nigel Goddard @ University of Edinburgh. Take A Look at These 12
  81. 72, Female, Human; 3, Female, Horse; 36, Male, Human; 21, Male, Human; 67, Male, Human; 29, Female, Human; 54, Male, Human; 44, Male, Human; 50, Male, Human; 42, Female, Human; 6, Male, Dog; 7, Female, Human
  82. Classification (Supervised Learning): f( ) → Gender? The classes female and male are separated by a decision boundary.
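A decision boundary can be sketched with a nearest-centroid classifier: average the labeled examples of each class, then assign a new point to the class whose average is closest. The boundary is the set of points equidistant from the two averages. The two numeric features and all values below are hypothetical; the slides use pictures, and this is not presented as the method behind them.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def train_nearest_centroid(examples):
    """Supervised step: compute one centroid per label from labeled examples."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(points) for label, points in by_label.items()}


def predict(model, features):
    """Return the label whose centroid is nearest to the given features."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical labeled training data: (feature_1, feature_2), label.
examples = [
    ((1.0, 1.2), "female"), ((0.8, 1.0), "female"), ((1.1, 0.9), "female"),
    ((3.0, 3.1), "male"), ((2.9, 3.3), "male"), ((3.2, 2.8), "male"),
]

model = train_nearest_centroid(examples)
print(predict(model, (1.0, 1.0)), predict(model, (3.0, 3.0)))
```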
  83. Classification (Supervised Learning): f( ) → Gender? The classes female and male are separated by a decision boundary. Regression (Supervised Learning): f( ) → Age? The output is a number.
  84. Classification (Supervised Learning): f( ) → Gender? The classes female and male are separated by a decision boundary. Multi Class Classification (Supervised Learning): f( ) → Loan Application? The outcomes shown are Yes, No, Maybe, Perhaps. Multiclass = Boundary Hyperplane. Regression (Supervised Learning): f( ) → Age? The output is a number.
  85. Classification (Supervised Learning): f( ) → Gender? The classes female and male are separated by a decision boundary. Multi Class Classification (Supervised Learning): f( ) → Loan Application? The outcomes shown are Yes, No, Maybe, Perhaps. Multiclass = Boundary Hyperplane. Regression (Supervised Learning): f( ) → Age? The output is a number. Clustering (Unsupervised Learning): f( ) → Group? The output is a cluster.
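Clustering differs from the supervised tasks in that no labels are given; the algorithm has to find the groups itself. As a minimal sketch, the code below runs one-dimensional k-means on the twelve ages listed on slide 81. Clustering on age alone, the choice of two clusters, and the deterministic starting centers are all assumptions made for illustration; the slides cluster pictures, not ages.

```python
def kmeans_1d(values, k=2, max_iter=100):
    """Plain k-means on a list of numbers (requires k >= 2).

    Initial centers are spread evenly between the minimum and maximum,
    a deterministic choice made here so the result is reproducible.
    """
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments no longer change
        centers = new_centers
    return centers, clusters


ages = [72, 3, 36, 21, 67, 29, 54, 44, 50, 42, 6, 7]
centers, clusters = kmeans_1d(ages, k=2)
print(sorted(clusters[0]), sorted(clusters[1]))
```

No labels were supplied, yet the ages fall into a younger and an older group; what those groups mean is left to the analyst.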
  86. The family of machine learning methods: classification, clustering, regression, dimension reduction
  87. Why I Am Interested in (Obsessed with) Analogy
  88. Analogy
  89. Deep End of Human Reasoning?
  90. What Makes an Analogy Convincing?
  91. Analogical Reasoning
  92. Law
  93. Thinking Like a Lawyer
  94. Psychology
  95. The Availability Heuristic (Related but not exactly on point)
  96. Science
  97. Interdisciplinary Scholarship
  98. Markets
  99. Emerging Technology; Venture Capital; Pricing Model; Arbitrage Detection
  100. Emerging Technology; Venture Capital; Pricing Model; Arbitrage Detection; Analogy
  101. How Much is Facebook Worth? Do I have the right analogy? 1 billion, 10 billion, 100 billion, 500 billion
  102. The Science of Similarity
  103. and of course
  104. The Curse of Dimensionality
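One face of the curse of dimensionality is that distances lose contrast: in many dimensions the nearest and farthest neighbors of a point end up almost equally far away, which weakens any similarity measure built on distance. The sketch below measures this with uniformly random points; the point count, the dimensions compared, and the seed are arbitrary choices.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(
            math.sqrt(sum((q - p) ** 2 for q, p in zip(query, point)))
        )
    return (max(distances) - min(distances)) / min(distances)


low_dim = relative_contrast(2)
high_dim = relative_contrast(500)
print(low_dim, high_dim)  # contrast shrinks sharply as dimension grows
```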
  105. The Revolution in Soft Artificial Intelligence
  106. Computer Forensics (Drawn from Jesse Kornblum). 1. Feature Extraction 2. Feature Selection 3. Comparison 4. Clustering 5. Classification 6. ??? The ‘???’ means: which features to extract; which similarity measure to use; which classification algorithm.
  107. My hope is that the AI Revolution will allow us to better understand analogical reasoning, as it is critical for a large number of questions we care about ...
  108. http://blog.ted.com/2010/07/14/when_ideas_have/ When Ideas Have Sex, By Matt Ridley
