Predictive Analytics - BarCamp Boston 2011

An overview of the state of the art in predictive analytics technology.

Presented at BarCamp Boston 2011.

Published in: Technology

  1. predictive analytics: the future of predicting the future. Vedant Misra, vedant.misra@gmail.com, Boston BarCamp 2011
  2. the big picture: We are witnessing a data explosion. "Everywhere you look, the quantity of information in the world is soaring. According to one estimate, mankind created 150 exabytes of data in 2005. This year, it will create 1,200 exabytes." The Data Deluge. The Economist, Feb 25, 2010. P.S. 1 exabyte is 1 million terabytes.
  3. the big picture: We are witnessing a data explosion. "we create as much information* in two days now as we did from the dawn of man through 2003" -Larry Page, CEO, Google. *This is mostly lolcats and duckface photos.
  4. the problem: data → information → knowledge
  5. modus operandi
     1. Ingest data
        • structured
        • unstructured
     2. Digest data
        • NLP
        • entity extraction
     3. Spit data back up
        • visualization
        • federated search
  6. the state of the art: Omniture, Stratify, Jedox, Bime, Kosmix, I2, SpotFire, QuidScoremind, Birst, Predixion Software, PivotLink, GoodData, Endeca, FSI, Informatica, IBM, Kofax, SPSS, Data Applied, Mathematica, Matlab, Octave, R, Stata, Statistica, ROOT, Geant, Attensity360, Sysomos, SAS, ISS CIDNE, Centrifuge Systems, Prediction Company, CASA, Info Mesa, FreeBase, YouCalc, Inxight
  7. Palantir.
  8. Digital Reasoning
  9. IBM DeepQA
  10. ingesting data
      • Structured information
        • explicitly defined format
        • relationships are clear
        • CSVs, relational databases, XLS
      • Unstructured information
        • no data model
        • mixed text, numbers, figures
        • emails, webpages, books, health records, call logs, phone recordings, video footage
  11. digesting data
      • Do NLP
        • tokenize
        • determine POS
        • lemmatize
      • Extract entities
      • Categorize entities using a dynamic ontology
      • Geographical tagging
      • Associative net
  12. spitting up data
      • powerful visualizations
      • federated search
        • geospatial, spatial, temporal
        • persistent background search (alerts)
  13. complications
      • high-resolution access control
      • source, date, location, and other metadata for tracking pedigree and lineage
      • adding insight and new data back into a data layer
      • air-gapped networks
      • revisioning databases
      • real-time hypothesis and intuition sharing
  14. what's left?
      • deep analytics: platforms that understand
      • replacing IA with AI
      • even fancier statistical methods: naive Bayes classifier, support vector machine, kernel estimation, neural networks, k-nearest neighbor, k-means clustering, kernel PCA, hierarchical clustering, linear regression, neural networks, Gaussian process regression, principal component analysis, independent component analysis, hidden Markov models, maximum entropy Markov models, Kalman filters, particle filters, Bayesian networks, Markov random fields, bootstrap aggregating, ensemble averaging...
  15. what's left?
      • more science of prediction:
        • modelling and validation
        • genetic algorithms for finding symbolic expressions
      • when are systems unpredictable?
      • describing groups with game theory
      • when is individual behavior important?
  16. thanks!
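
The "digesting data" steps on slide 11 (tokenize, extract entities, categorize them against an ontology) might be sketched roughly as follows. This is only a toy illustration using Python's standard library, not how any of the products named in the deck work. The `ONTOLOGY` table, the capitalized-run heuristic, and all function names are assumptions made up for this example.

```python
import re

# Hypothetical toy ontology: known entity names mapped to categories.
# A real "dynamic ontology" would be far richer and editable at runtime.
ONTOLOGY = {
    "Boston": "place",
    "Google": "organization",
    "Larry Page": "person",
}


def tokenize(text):
    """Split text into word tokens and standalone punctuation marks."""
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", text)


def extract_entities(tokens):
    """Group runs of consecutive capitalized tokens into candidate entities.

    This is a crude heuristic (an assumption of this sketch): it will also
    pick up ordinary capitalized words at the start of a sentence.
    """
    entities, run = [], []
    for tok in tokens:
        if tok[0].isupper():
            run.append(tok)
        else:
            if run:
                entities.append(" ".join(run))
            run = []
    if run:
        entities.append(" ".join(run))
    return entities


def categorize(entities):
    """Look each candidate entity up in the ontology, defaulting to 'unknown'."""
    return {name: ONTOLOGY.get(name, "unknown") for name in entities}


if __name__ == "__main__":
    tokens = tokenize("Larry Page spoke about data at Google near Boston.")
    print(categorize(extract_entities(tokens)))
```

Part-of-speech tagging and lemmatization are left out here because they need trained models or lexicons that the standard library does not provide.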
