Machine Learning in the Age of Big Data: New Approaches and Business Applications

Presentation at University of Lisbon on Machine Learning and big data.
Deep learning algorithms and applications to credit risk analysis, churn detection and recommendation algorithms

  • Storytelling: Google Flu
  • Invariants, parity, connectivity

Transcript

  • 1. Armando Vieira, Closer, Armando.lidinwise.com
  • 2. 1. Machine Learning: finding features, patterns & representations
       2. The connectionist approach: Neural Networks
       3. Applications
       4. The Deep Learning “revolution”: a step closer to the brain?
       5. Applications
       6. The Big Data deluge: better algorithms & more data
  • 3. Was “Deep Blue” intelligent? How about Watson? Or Google? Have machines reached the intelligence level of a rat? … Let’s be pragmatic: I’ll call “intelligent” any device capable of surprising me!
  • 4. Architectures: Hebb concept, MLP, deep networks
  • 5. 1943 – McCulloch & Pitts + Hebb
       1968 – Rosenblatt perceptron and the Minsky argument, or why a good theory may kill an even better idea
       1985 – Rumelhart perceptron
       2006 – Hinton Deep Learning (Boltzmann) networks
       All together: Watson, Google et al.
  • 6. Input builds up on receptors (dendrites)
       The cell has an input threshold
       When the cell’s threshold is breached, an activation is fired down the axon.
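
A minimal sketch of the threshold unit described in slide 6, assuming a McCulloch-Pitts style neuron; the weights and threshold below are illustrative, not taken from the presentation:

```python
import numpy as np

def threshold_neuron(inputs, weights, threshold):
    """Fire (output 1) only if the weighted input accumulated on the
    'dendrites' breaches the cell's threshold; otherwise stay silent."""
    activation = np.dot(weights, inputs)
    return 1 if activation >= threshold else 0

# Illustrative weights/threshold (not from the slides): an AND-like gate.
print(threshold_neuron(np.array([1, 1]), np.array([0.6, 0.6]), 1.0))  # -> 1
print(threshold_neuron(np.array([1, 0]), np.array([0.6, 0.6]), 1.0))  # -> 0
```
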
  • 7. The visual cortex
  • 8. A step closer to success thanks to a training algorithm: backpropagation
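
The slides do not show code, but a toy backpropagation loop makes the idea concrete: a tiny two-layer network trained on XOR with plain NumPy, where the output error is pushed backwards through the layers to update both weight matrices. The architecture, learning rate and data are illustrative assumptions, not the presenter's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)                # 2 inputs -> 4 hidden units
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)                # 4 hidden -> 1 output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: squared-error gradient propagated layer by layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # gradient-descent updates (learning rate 0.5 is an arbitrary choice)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2))   # should approach [[0], [1], [1], [0]]
```
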
  • 9. Training is nothing more than fitting: regression, classification, recommendations
       The problem is that we have to find a way to represent the world (extract features)
  • 10. FRUSTRATION (chart: Age vs. Money)
  • 11. A simpler hypothesis has a lower error rate
  • 12. ANNs are very hard to optimize
       Lots of local minima (traps for stochastic gradient descent)
       Permutation invariance (no unique solution)
       How to stop training?
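
One common answer to “how to stop training?” is early stopping against a held-out validation set. A hedged sketch with scikit-learn's MLPClassifier on synthetic data (all hyperparameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic data; stands in for any real classification problem.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Hold out 20% of the training data and stop as soon as the validation
# score stops improving, rather than descending into an overfitted minimum.
clf = MLPClassifier(hidden_layer_sizes=(50,),
                    early_stopping=True,
                    validation_fraction=0.2,
                    n_iter_no_change=10,
                    max_iter=500,
                    random_state=0)
clf.fit(X, y)
print("stopped after", clf.n_iter_, "epochs")
```
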
  • 13. Neural networks are incredibly powerful algorithms
       But they are also wild beasts that should be treated with great care
       It’s very easy to fall into the GIGO trap
       Problems like overfitting, sub-optimization, bad conditioning and wrong interpretation are common
  • 14. Interpretation of outputs
       - Loss function
       - Outputs ≠ probabilities
       - Where to draw the line?
       - Be VERY careful when interpreting the outputs of ML algorithms: you don’t always get what you see
       Input preparation
       - Clean & balance the data
       - Normalize it properly
       - Remove unneeded features, create new ones
       - Handle missing values
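
Those preparation steps can be chained into a single pipeline so they are applied consistently at training and prediction time. A sketch with scikit-learn, assuming a tabular dataset with missing values and unbalanced classes; X_train, y_train and X_test are hypothetical placeholders:

```python
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

prep_and_model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),           # fill in missing values
    ("scale", StandardScaler()),                            # normalize the features
    ("clf", LogisticRegression(class_weight="balanced")),   # compensate class imbalance
])

# prep_and_model.fit(X_train, y_train)
# prep_and_model.predict_proba(X_test)   # interpret these scores with care (see slide 14)
```
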
  • 15. Rutherford Backscattering (RBS)
       Credit risk & scoring
       Churn prediction (CDR)
       Prediction of hotel demand with Google Trends
       AdWords optimization
  • 16. Ion beam analysis (MeV/amu): RBS, channelling, NRA, PIXE, ERDA
  • 17. (RBS spectra of a 25 Å Ge layer under 400 nm of Si: yield (arb. units) vs. channel for (a) beam energies of 1.2, 1.6 and 2 MeV, (b) scattering angles of 120°, 140° and 180°, (c) angles of incidence of 0°, 25° and 50°.)
  • 18. Architecture                    Train set error   Test set error
        (I, 100, O)                     6.3               11.7
        (I, 250, O)                     5.2               10.1
        (I, 100, 80, O)                 3.6               5.3
        (I, 100, 50, 20, O)             4.2               5.1
        (I, 100, 80, 50, O)             3.0               4.1
        (I, 100, 80, 80, O)             2.8               4.7
        (I, 100, 50, 100, O)            3.0               4.2
        (I, 100, 80, 80, 50, O)         3.2               4.1
        (I, 100, 80, 50, 30, 20, O)     3.8               5.3
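
The table compares train and test errors for networks of increasing depth on the RBS problem. The original data is not reproduced here, but the same kind of architecture scan can be sketched with scikit-learn on synthetic data (the dataset, architectures and scores below are illustrative only):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the RBS spectra used in the original study.
X, y = make_regression(n_samples=2000, n_features=128, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for hidden in [(100,), (250,), (100, 80), (100, 80, 50), (100, 80, 80, 50)]:
    net = MLPRegressor(hidden_layer_sizes=hidden, max_iter=500, random_state=0)
    net.fit(X_tr, y_tr)
    print(hidden, "test R^2:", round(net.score(X_te, y_te), 3))
```
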
  • 19. (Scatter plots comparing ANN predictions with data for depth and dose, both in units of 10^15 at/cm².)
  • 20. (Surface plot of the score as a function of EBIT and current ratio.)
  • 21. Before After
  • 23. Neural networks are good when: many training data are available; variables are continuous; the relevant features are known; the mapping is unique.
       Neural networks are less useful when: the problem is linear; there are few data compared to the size of the search space; the data is high-dimensional; there are long-range correlations.
       They are black boxes.
  • 24. Characteristic                 Traditional methods (Von Neumann)         Artificial neural networks
        Logic                          Deductive                                 Inductive
        Processing principle           Logical                                   Gestalt
        Processing style               Sequential                                Distributed (parallel)
        Functions realised through     Concepts, rules, calculations             Concepts, images, categories, maps
        Connections between concepts   Programmed a priori                       Dynamic, evolving
        Programming                    Through a limited set of rigid rules      Self-programmable (given an appropriate architecture)
        Learning                       By rules                                  By examples (analogies)
        Self-learning                  Through internal algorithmic parameters   Continuously adaptable
        Tolerance to errors            Mostly none                               Inherent
  • 25. ANNs are massive correlation & feature-extraction machines: isn’t that what intelligence is all about?
       Knowledge is embedded in a messy network of weights
       Capable of modelling an arbitrarily complex mapping
  • 26. We need thousands of examples for training. Why?
       Priors
       Algorithms are simple: complexity lies in the data
  • 27. Hinton et al, 2006
  • 28. “Quasi” unsupervised machines
       Extract and combine subtle features in the data
       Build high-level representations (abstractions)
       Capable of knowledge transfer
       Can handle (very) high-dimensional data
       Are deep and broad: millions of synapses
       Work both ways: up and down
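
In the spirit of Hinton's 2006 layer-wise training (though not the presenter's implementation), a rough sketch: two Bernoulli RBMs stacked as unsupervised feature extractors with a simple classifier on top. The digits dataset and every hyperparameter here are assumptions for illustration:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import minmax_scale

X, y = load_digits(return_X_y=True)
X = minmax_scale(X)                      # RBMs expect inputs in [0, 1]

# Each RBM learns a layer of features from the output of the previous one;
# only the final logistic regression uses the labels.
model = Pipeline([
    ("rbm1", BernoulliRBM(n_components=256, learning_rate=0.05, n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)
print("training accuracy:", round(model.score(X, y), 3))
```
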
  • 29. Learning features that are not mutually exclusive
  • 30. Top at image identification (in some cases it beats humans)
       Top at video classification
       Top at real-time translation
       Top at gene identification
       Reverse engineering: can replicate complex human behaviour, like walking
       Data visualization and text disambiguation (river bank / bank bailout)
       Kaggle
  • 31. (Diagram: Data → Features → Results.)
  • 32. Better algorithms + powerful computers + available data = MAGIC
  • 33. In 2 years we produce more data (and garbage) than accumulated over all of history
       Zettabytes of data (10^21 bytes) produced every year
  • 34. Machine learning molecules (**)
  • 35. Most ML algorithms work better (sometimes much better) simply by throwing more data at them
       And now we have more data. Plenty of it!
       Which is signal and which is noise? Let the machines decide (they are good at it)
       Where do humans stand in this equation? We are feeding the machines!
  • 36. Don’t look for causation; welcome correlations
       Messiness: prepare to get your hands dirty
       Don’t expect definitive answers. Only communists have them!
       Stop searching for God’s equation
       Keep theories at bay and let the data speak
       Exactitude may not be better than “estimations”
       Forget about keeping data clean and organized
       Data is alive and wild. Don’t imprison it
  • 37. Flu prediction
       Netflix movie rating contest
       New York City building security
       Used cars
       Veg food -> airport
       Prediction of rare events (fraud) and why it’s important
  • 38. A step closer to the brain? Yes and no
       What is missing?
       Predictive analytics (crime before it occurs)?
       Algorithms that learn & adapt
       Replace humans?
       Augment reality
       Big Data & algorithms are revolutionizing the world. Fast!
  • 39. Recommendations (Amazon, Netflix, Facebook)
       Trading (70% of Wall Street trading is done by algorithms)
       Identifying your partner, recruiting, votes
       Images, video, voice, translation (real time)
       Where are we heading? NSA? Black boxes?
  • 40. Deeplearning.net
       Hinton Google talks
       “Too big to know”
       Big Data: a new revolution that will transform business
       Machine Learning in R
  • 41. Matlab (several packages – google for them)
       R (CRAN repository), Rminer
       Python (scikit-learn)
       C++ (mainly on GitHub)
       Torch
       More on Deeplearning.net
  • 42. Recommend an unseen item i to a user u based on the engagement of other users with items 1 to 8. Items recommended in this case are i2 followed by i1.
  • 43. Item-based recommendation for a user ua based on a neighbourhood of k = 3. Items recommended in this case are i3 followed by i4. (Item-based CF is superior to user-based CF, but it requires a lot of information, such as ratings or user interaction with the product.)
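
A minimal item-based collaborative-filtering sketch along the lines of slides 42-43: cosine similarity between item columns of a small, made-up user-item engagement matrix, recommending the unseen item most similar to what the user already engaged with (the matrix, users and items are hypothetical, not those in the figures):

```python
import numpy as np

# Hypothetical engagement matrix: rows are users, columns are items i1..i5.
R = np.array([
    [1, 0, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 1, 1, 1, 0],
    [1, 0, 1, 1, 0],
], dtype=float)

def item_similarity(M):
    """Cosine similarity between item columns (item-item matrix)."""
    M = M / (np.linalg.norm(M, axis=0, keepdims=True) + 1e-9)
    return M.T @ M

sim = item_similarity(R)
user = R[0]                          # recommend for the first user
scores = sim @ user                  # similarity of each item to the user's items
scores[user > 0] = -np.inf           # mask items the user has already seen
print("recommend item index:", int(np.argmax(scores)))
```
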