This document discusses machine learning concepts including algorithms, data inputs/outputs, runtimes, and trends in academia vs industry. It notes that while academia focuses on algorithm complexity, industry prioritizes data-driven approaches using large datasets. Ensemble methods combining many simple models generally perform better than single complex models. Specific ML techniques discussed include word segmentation using n-gram probabilities, perceptrons for classification, SVD for recommendations and clustering, and crowdsourcing ensembles. The key lessons are that simple models with large data outperform complex models with less data, and that embracing many small independent models through ensembles is effective.
This was the presentation for the Microsoft Community Technology Update of 2016. The idea was to introduce the concept of Machine Learning and show that it's easy to get started if you are keen. My objective was also to communicate how some of the algorithms work, and that they require no more than a basic understanding of math to get going, sometimes not even that.
The algorithms we covered were Support Vector Machines (SVM), Decision Trees using R2D3, and Neural Networks for classification. We used the TensorFlow Playground to help explain the Neural Network and Deep Learning concepts.
I gave an analogy of how the Machine Learning process is like making a smoothie: your algorithm is the recipe, your data are the ingredients, your computer is the blender, and the smoothie is the model you develop. I used the same example to convey the concepts of training, validation, and testing. Type 1 and Type 2 errors were covered as well, together with the metrics of recall and precision. Finally, I closed the session with some good resources for getting started with Machine Learning at all skill levels, including references to websites, courses, Kaggle competitions, podcasts, cheat sheets, and books.
Data Science: A Mindset for Productivity
Keynote at 2015 Ronin Labs West Coast CTO Summit
https://www.eventjoy.com/e/west-coast-cto-summit-2015
Abstract
Data science isn't just about using a collection of technologies and algorithms. Data science requires a mindset that solves problems at a higher level of abstraction. How do we model utility when we think about optimization? How do we decide which hypotheses to test? How do we allocate our scarce resources to make progress?
There are no silver bullets. But I'll share what I've learned from a variety of contexts over the course of my work at Endeca, Google, and LinkedIn; and I hope you'll leave this talk with some practical wisdom you can apply to your next data science project.
The “best” price for a product or service is one that maximizes profits, not necessarily the price that sells the most units. This presentation uses real-world examples to explore how Excel’s Solver functionality can be used to calculate the optimal price for any product or service.
A fast-paced introduction to Deep Learning concepts, such as activation functions, cost functions, and backpropagation, followed by a quick dive into CNNs. Basic knowledge of vectors, matrices, and elementary calculus (derivatives) is helpful in order to derive the maximum benefit from this session.
Next we'll see a simple neural network using Keras, followed by an introduction to TensorFlow and TensorBoard. (Bonus points if you know Zorn's Lemma, the Well-Ordering Theorem, and the Axiom of Choice.)
These days it is easy to amass absurdly large quantities of data. But once you have it, how do you extract information from these amorphous mountains of data? In this mini-course we present the MapReduce programming model: how it works, what it is for, and how to build applications with it. We will also look at Elastic MapReduce, Amazon's service that creates MapReduce clusters on demand, so that you worry not about administering and gaining access to a cluster of machines, but about how to make your code digest your data in a distributed fashion. We will see practical examples in action and code through some challenges together.
This presentation focuses on Deep Learning (DL) concepts, such as neural networks, backprop, activation functions, and Convolutional Neural Networks. You'll also learn how to incorporate Deep Learning in Android applications. Basic knowledge of matrices is helpful for this session, which is targeted primarily to beginners.
This presentation introduces Deep Learning (DL) concepts, such as neural networks, backprop, activation functions, and Convolutional Neural Networks, followed by an Angular application that uses TypeScript in order to replicate the TensorFlow playground.
Slides used during the virtual conference NetCoreConf on April 04, 2020. The session was an introduction to Machine Learning for .NET developers, using ML.NET as the main framework.
A Gentle Introduction to Coding ... with Python (Tariq Rashid)
A gentle introduction to coding (programming) for complete beginners. Starting from the basics (electrical wires) and proceeding through variables, data structures, loops, and functions, then exploring libraries for visualisation and specialist tools. Finally we use Flask to make a very simple Twitter-clone web application.
Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it.
We will review some modern machine learning applications, understand a variety of machine learning problem definitions, and go through particular approaches to solving machine learning tasks.
In 2015, Amazon and Microsoft introduced services to perform machine learning tasks in the cloud. Microsoft Azure Machine Learning offers a streamlined experience for all data scientist skill levels, from setting up with only a web browser, to using drag-and-drop gestures and simple data flow graphs to set up experiments.
We will briefly review Azure ML Studio features and run a machine learning experiment.
We present basic concepts of machine learning, such as supervised and unsupervised learning, types of tasks, how some algorithms work, neural networks, and deep learning concepts, and show how to apply them in your work.
The ABC of Implementing Supervised Machine Learning with Python (Ruby Shrestha)
Machine learning has clearly reached significant heights. However, knowing and understanding how small problems can be solved from a machine learning perspective is necessary to form a good base, appreciate the implementation process, and get started in this domain. Therefore, in this post, I would like to talk about the ABC of implementing supervised machine learning with Python by walking through a simple example: adding two numbers. To put it simply, I would like to make a machine learn to add; in other words, I would like to develop a predictive model that can add. Sounds simple, right? View the presentation for more details.
Similar to Intelligent Ruby + Machine Learning
JavaScript is great, but let's face it, being stuck with just JavaScript in the browser is no fun.
Why not write and run Ruby in the browser, on the client, and on the server as part of your next web application?
No Callbacks, No Threads - RailsConf 2010 (Ilya Grigorik)
Multi-threaded servers compete for the global interpreter lock (GIL) and incur the cost of continuous context switching, potential deadlocks, or plain wasted cycles. Asynchronous servers, on the other hand, create a mess of callbacks and errbacks, complicating the code. But what if you could get all the benefits of asynchronous programming while preserving the synchronous look and feel of the code: no threads, no callbacks?
Ruby Proxies for Scale, Performance, and Monitoring - GoGaRuCo - igvita.com (Ilya Grigorik)
A high-performance proxy server is less than a hundred lines of Ruby code and it is an indispensable tool for anyone who knows how to use it. In this session we will first walk through the basics of event-driven architectures and high-performance network programming in Ruby using the EventMachine framework.
A look at the technologies and the architecture behind the emerging real-time web. We will discuss XMPP/Jabber and AMQP protocols and explore the advantages of each over the commonly used HTTP request-response cycle. As part of the workshop we will look at the available tools and libraries and work through simple examples of creating an event driven, real-time service.
3. “Machine learning is a discipline that is concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data”
4. Algorithm, Data Input, Data Output, Runtime: ML & AI in academia, and how it's commonly taught
5. Algorithm, Data Input, Data Output, Runtime: ML & AI in the real world, or at least where the trends are going
13. Data, growing at an exponential rate, is often no longer scarce… in fact, we (Rubyists) are responsible for generating a lot of it…
14. Mo' data, mo' problems? Many data inputs, each with its own runtime: does it require more resources? Are we no better off…?
15. “Mitigating the Paucity-of-Data Problem: Exploring the Effect of Training Corpus Size on Classifier Performance for Natural Language Processing” Michelle Banko, Eric Brill http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.646 “More input data vs. Better Algorithms”
16. “Data-Driven Learning”: "We were able to significantly reduce the error rate, compared to the best system trained on the standard training set size, simply by adding more training data... We see that even out to a billion words the learners continue to benefit from additional training data."
18. Word segmentation is tricky: "Wordsegmentationistricky" → "Word|segmentation|is|tricky" (and languages like Chinese, e.g. 新星歐唐尼爾 保守特立獨行, have no spaces at all). Strategy 1: grammar for dummies. Strategy 2: natural language toolkit (encode a language model). Strategy 3: take a guess! NLP with big data: Google does this better than anyone else…
19. Word segmentation: take a guess! Estimate the probability of every segmentation and pick the best performer: P(W) × P(ordsegmentationistricky), P(Wo) × P(rdsegmentationistricky), …, P(Word) × P(segmentationistricky); argmax P(W) = ?
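The per-split products above can be sketched in a few lines of Ruby. This is a toy illustration: the four-word unigram table is hypothetical (real systems estimate probabilities from web-scale n-gram counts), and unknown strings get a tiny smoothing probability.

```ruby
# Hypothetical unigram probabilities; anything not listed gets 1e-10.
PROBS = Hash.new(1e-10).merge(
  "word" => 0.01, "segmentation" => 0.001, "is" => 0.05, "tricky" => 0.002
)

# Recursively try every split point (memoized) and keep the segmentation
# that maximizes the product of the word probabilities.
def segment(text, memo = {})
  return [1.0, []] if text.empty?
  memo[text] ||= (1..text.length).map { |i|
    head, tail = text[0...i], text[i..-1]
    tail_prob, tail_words = segment(tail, memo)
    [PROBS[head] * tail_prob, [head] + tail_words]
  }.max_by { |prob, _| prob }
end

prob, words = segment("wordsegmentationistricky")
# words == ["word", "segmentation", "is", "tricky"]
```

With this table, the fully known segmentation wins because any split containing an unknown chunk is penalized by the 1e-10 smoothing factor.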
21. Word segmentation: take a guess! Adding a new language: scrape the web, count the words, done. That's how Google does it, and does it well…
22. Of course, smarter algorithms still matter! Don't get me wrong…
23. Learning vs. compression: closely correlated concepts. If we can identify significant concepts within a dataset, then we can represent a large dataset with fewer bits. Conversely, if we can represent our data with fewer bits (compress it), then we have identified "significant" concepts!
25. Predicting a "tasty fruit" with the perceptron algorithm (y = mx + b). Axes: color and feel; red = not tasty, green = tasty. Exercise: maximize the margin. http://bit.ly/bMcwhI
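For the curious, a minimal perceptron sketch in Ruby. The feature values below are made up for illustration, not the slide's dataset; the weights and bias play the role of the m and b in y = mx + b.

```ruby
# A bare-bones perceptron: [color, feel] features, label +1 = tasty, -1 = not.
class Perceptron
  def initialize(dims, rate = 0.1)
    @w = Array.new(dims, 0.0)  # weights (the "m" in y = mx + b)
    @b = 0.0                   # bias    (the "b")
    @rate = rate
  end

  def predict(x)
    score = @b + @w.each_with_index.map { |w, i| w * x[i] }.reduce(:+)
    score >= 0 ? 1 : -1
  end

  def train(examples, epochs = 20)
    epochs.times do
      examples.each do |x, label|
        next if predict(x) == label
        # misclassified: nudge the separating line toward this example
        @w = @w.each_with_index.map { |w, i| w + @rate * label * x[i] }
        @b += @rate * label
      end
    end
  end
end

# toy linearly separable data: [[color, feel], label]
data = [[[1.0, 1.0], 1], [[0.9, 0.8], 1], [[0.1, 0.2], -1], [[0.0, 0.1], -1]]
model = Perceptron.new(2)
model.train(data)
model.predict([1.0, 0.9])  # => 1 (tasty)
```

The update rule only fires on mistakes, which is why the perceptron converges on linearly separable data and stalls otherwise (the next slide's point).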
26. Where the perceptron breaks down (green = positive, purple = negative): we need a better model…
27. Idea: y = x². Throw the data into a "higher-dimensional" space (green = positive, purple = negative). Perfect! http://bit.ly/dfG7vD
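The y = x² trick fits in a tiny Ruby sketch. The 1-D points below are made up: the positives sit between the negatives, so no single threshold on x separates them, but after lifting each point to (x, x²) a horizontal line does.

```ruby
# Toy 1-D data: positives in the middle, negatives on the outside.
positives = [-1.0, 0.0, 1.0]
negatives = [-3.0, -2.0, 2.0, 3.0]

lift = ->(x) { [x, x**2] }  # map into a "higher-dimensional" space

# In the lifted space the horizontal line y = 2.5 cleanly splits the classes.
threshold = 2.5
separable = positives.map(&lift).all? { |_, y| y < threshold } &&
            negatives.map(&lift).all? { |_, y| y > threshold }
# separable == true
```

This is the intuition behind kernel methods: instead of a smarter boundary in the original space, use a simple boundary in a richer space.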
28. Support Vector Machines: that's the core insight! Simple as that. http://bit.ly/a2oyMu
require 'SVM'
sp = Problem.new
sp.addExample("spam", [1, 1, 0])
sp.addExample("ham", [0, 1, 1])
pa = Parameter.new
m = Model.new(sp, pa)
m.predict([1, 0, 0])
30. A bit of linear algebra for good measure: Singular Value Decomposition. Any M×N matrix (where M >= N) can be decomposed into an M×M matrix (call it U), an M×N matrix (call it S), and an N×N matrix (call it V). Observation: we can use this decomposition to approximate the original M×N matrix (by fiddling with S and then recomputing U × S × V). (The slide shows a users-by-items matrix: Ben, Fred, Tom, James, Bob against items A, B, C, D.)
31. SVD in action: the bread and butter of computer vision systems.
32. gem install linalg to do the heavy lifting… http://bit.ly/9lXuOL
require 'linalg'
m = Linalg::DMatrix[[1, 0, 1, 0], [1, 1, 1, 1], ...]
# Compute the SVD decomposition
u, s, vt = m.singular_value_decomposition
# ... compute user similarity
# ... make recommendations based on similar users!
34. Raw data similarity? Given:
1. AAAA AAA AAAA AAA AAAAA
2. BBBBB BBBBBB BBBBB BBBBB
3. AAAA BBBBB AAA BBBBB AA
similarity(1, 3) > similarity(1, 2), and similarity(2, 3) > similarity(1, 2). Yeah… but how did you figure that out? Some of you ran Lempel-Ziv on it… learning & compression are closely correlated concepts.
35. Exercise: cluster your iTunes library. Clustering with Zlib: no knowledge of the domain, just straight-up compression. Similarity = amount of space saved when compressed together vs. individually.
require 'zlib'
require 'pp'

files = Dir['data/*']

def deflate(*files)
  z = Zlib::Deflate.new
  z.deflate(files.collect { |f| open(f).read }.join(""), Zlib::FINISH).size
end

pairwise = files.combination(2).collect do |f1, f2|
  a = deflate(f1)
  b = deflate(f2)
  both = deflate(f1, f2)
  { :files => [f1, f2], :score => (a + b) - both }
end

pp pairwise.sort { |a, b| b[:score] <=> a[:score] }.first(20)
36. “Ensemble Methods in Machine Learning”, Thomas G. Dietterich (2000): “Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a vote of their predictions… ensembles can often perform better than any single classifier.” (The slide shows one algorithm per data input, each with its own runtime, feeding a single data output.)
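The voting step itself is tiny. Here is a toy Ruby sketch with hypothetical lambda "models" (nothing like the Netflix Prize systems) showing an ensemble outvoting a single member's mistake:

```ruby
# Majority vote: each member classifies the input, the most common answer wins.
def vote(models, input)
  models.map { |m| m.call(input) }
        .group_by { |v| v }
        .max_by { |_, votes| votes.size }
        .first
end

# Task: "is n even?" -- each member has a different blind spot.
members = [
  ->(n) { n == 2 ? false : n.even? },  # wrong only on 2
  ->(n) { n == 3 ? true  : n.even? },  # wrong only on 3
  ->(n) { n.even? }                    # correct on these inputs
]

vote(members, 2)  # => true  (the mistaken member is outvoted)
vote(members, 3)  # => false (likewise)
```

As long as the members' errors are independent enough, the majority corrects any single member, which is Dietterich's point in miniature.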
37. The Ensemble = 30+ members; BellKor = 7 members. http://nyti.ms/ccR7ul
38. Collaborative, collaborative filtering? Unfortunately, the GitHub crowd didn't buy into the idea…
require 'open-uri'

class Crowdsource
  def initialize
    load_leaderboard  # scrape GitHub contest leaders
    parse_leaders     # find their top-performing results
    fetch_results     # download the best results
    cleanup_leaders   # clean up missing or incorrect data
    crunchit          # build an ensemble
  end
  # ...
end

Crowdsource.new
41. Complex ideas are constructed on simple ideas: explore the simple ideas. More resources + more data + more models = collaborative, data-driven learning.
42. Phew, time for questions? Hope this convinced you to explore the area further…
Collaborative Filtering with Ensembles: http://www.igvita.com/2009/09/01/collaborative-filtering-with-ensembles/
Support Vector Machines in Ruby: http://www.igvita.com/2008/01/07/support-vector-machines-svm-in-ruby/
SVD Recommendation System in Ruby: http://www.igvita.com/2007/01/15/svd-recommendation-system-in-ruby/
gem install ai4r (http://ai4r.rubyforge.org/)
Editor's Notes
Now, I believe that as the Rails ecosystem grows and becomes older, end-to-end performance becomes only more important, because all of a sudden the projects are larger and more successful, and they're feeling the pain of "scaling the Rails stack".