How to Approach Data Science Problems from Start to End, by Polong Lin (林伯龍), at the Taiwan Data Science Conference (台灣資料科學年會)
Polong Lin is a Data Scientist at IBM. He is a regular speaker on data science and develops content for free data education on bigdatauniversity.com using open data tools on datascientistworkbench.com. Polong earned his M.Sc. at the Univ. of Tsukuba.
Machine Learning in the Age of Big Data: New Approaches and Business Applicat..., by Armando Vieira
Presentation at University of Lisbon on Machine Learning and big data.
Deep learning algorithms and applications to credit risk analysis, churn detection and recommendation algorithms
This document provides an introduction to data science, noting that 90% of the world's data was generated in the last two years. It discusses the fields of computer science, business, statistics, and data science. It describes two types of data scientists: statisticians who specialize in analysis and developers who specialize in building tools. It also lists some popular programming languages and visualization tools used in data science like Python, R, and Tableau. Finally, it provides some tips for those interested in data science such as learning design, public speaking, coding, and finding value.
This document provides an introduction to machine learning. It discusses that machine learning focuses on learning about processes in the world rather than just memorizing data. It also covers the main types of machine learning: supervised learning which learns mappings between examples and labels; unsupervised learning which learns structure from unlabeled examples; and reinforcement learning which learns to take actions to maximize rewards. The document explains that machine learning requires representing data as feature vectors and using models with optimization techniques to find parameters that generalize to new data rather than overfitting the training data.
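The supervised-learning loop that summary describes — feature vectors, a model, an optimization step, and a check that parameters generalize rather than overfit — can be sketched with a toy perceptron. All data below is invented purely for illustration:

```python
# Toy supervised learning: learn a linear separator from labeled feature
# vectors, then check that it generalizes to held-out examples.

def train_perceptron(examples, labels, epochs=20, lr=0.1):
    """Find weights w such that sign(w . x) matches the labels."""
    w = [0.0] * len(examples[0])
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
            if pred != y:  # optimization step: nudge w toward the mistake
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

# Feature vectors: [bias, feature1, feature2]; label +1 iff feature1 > feature2.
train_x = [[1, 2, 0], [1, 3, 1], [1, 0, 2], [1, 1, 3]]
train_y = [1, 1, -1, -1]
w = train_perceptron(train_x, train_y)

# Generalization: evaluate on points the model never saw during training.
test_x = [[1, 5, 1], [1, 1, 5]]
print([predict(w, x) for x in test_x])  # → [1, -1]
```

The point of the held-out test set is exactly the generalization concern the summary raises: a model that merely memorized the four training vectors would tell us nothing about these two new ones.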
This document provides an overview of machine learning tools and languages. It discusses Python, R, and MATLAB as the most commonly used tools. For each tool, it lists advantages and disadvantages. Python is highlighted as the number one language for machine learning due to its many libraries and large user community. R is best for time series analysis and causal inference. MATLAB is still a leading tool for signal processing but lacks machine learning libraries. The document also provides resources for learning machine learning foundations and examples.
Image Recognition using Machine Learning (Makine Öğrenmesi ile Görüntü Tanıma), by Ali Alkan
The document provides an introduction to image processing and recognition using machine learning. It discusses how deep learning uses hierarchical neural networks inspired by the human brain to learn representations of image data without requiring manual feature engineering. Deep learning has been applied successfully to problems like computer vision through convolutional neural networks. The document also describes how KNIME can be used as an open-source platform to visually build and run deep learning models for image processing tasks and integrate with other tools. It highlights several image processing and deep learning nodes available in KNIME.
2017: The Many Faces of Artificial Intelligence: From AI to Big Data - A Hist..., by Leandro de Castro
(1) Artificial intelligence has evolved significantly since its origins in the 1930s and 1940s, with pioneering work by Turing, McCulloch and Pitts, and others. (2) The field experienced periods of optimism and funding in the 1960s followed by a "winter" in the 1970s due to lack of progress. (3) New approaches in the 1980s-1990s like neural networks, expert systems, and increased computing power led to a rebirth of the field. (4) Today, areas like machine learning, deep learning, big data and natural language processing are driving advances, powered by technologies from companies pursuing both general and specialized AI.
BSidesLV 2013 - Using Machine Learning to Support Information Security, by Alex Pinto
Big Data, Data Science, Machine Learning and Analytics are a few of the new buzzwords that have invaded our industry of late. Again we are being sold a unicorn-laden, silver-bullet panacea by heavy-handed marketing folks, evoking an expected pushback from the most enlightened members of our community. However, as was the case before, there might just be enough technical meat in there to help out with our security challenges and the overwhelming odds we face every day. And if so, what do we as a community have to know about these technologies in order to be better professionals? Can we really use the data we have been collecting to help automate our security decision-making? Is a robot going to steal my job?
If you are interested in what is behind this marketing buzz and are not scared of a little math, this talk offers some insights into applying Machine Learning techniques to data any of us have easy access to, and tries to bring home the point that if all of this technology can be used to show us “better” ads on social media and track our behavior online (and a bit more than that), it can also be used to defend our networks.
This document discusses whether big data analysis is more of a "systems" task or "human" task. It presents research showing that software defect prediction, even when conducted by top experts using the same datasets and algorithms over many years, shows little improvement and high variability. This suggests that human factors like biases are important. The document proposes using data mining on source code and social media to classify developers by expertise and identify groups who could share knowledge to reduce defects. It outlines an initial approach using parsers, classifiers like Naive Bayes to distinguish novices from experts, and seeking larger datasets from partners. The goal is to strengthen the "human" aspects of big data analysis.
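The Naive Bayes step proposed above — distinguishing novices from experts from mined text — can be sketched with a small multinomial Naive Bayes over token counts. The tokens and labels here are invented for illustration and are not taken from the study's data:

```python
# Sketch of a novice-vs-expert text classifier: multinomial Naive Bayes
# with Laplace smoothing over per-class token counts.
import math
from collections import Counter

def train_nb(docs, labels):
    """Return per-class (log-prior, smoothed log-likelihood table)."""
    classes = set(labels)
    vocab = {t for d in docs for t in d}
    model = {}
    for c in classes:
        class_docs = [d for d, l in zip(docs, labels) if l == c]
        counts = Counter(t for d in class_docs for t in d)
        total = sum(counts.values())
        model[c] = (
            math.log(len(class_docs) / len(docs)),
            {t: math.log((counts[t] + 1) / (total + len(vocab))) for t in vocab},
        )
    return model

def classify(model, doc):
    def score(c):
        prior, lik = model[c]
        return prior + sum(lik.get(t, 0.0) for t in doc)
    return max(model, key=score)

# Hypothetical tokenized messages mined from developers.
docs = [["fix", "typo"], ["refactor", "api"], ["fix", "bug", "tests"],
        ["help", "stuck"], ["how", "compile"], ["help", "error"]]
labels = ["expert", "expert", "expert", "novice", "novice", "novice"]
model = train_nb(docs, labels)
print(classify(model, ["fix", "tests"]))     # leans "expert"
print(classify(model, ["help", "compile"]))  # leans "novice"
```

The same shape scales to real source-code and social-media features; only the tokenization and the labeled training set change.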
IT Cluster Skolkovo Presentation at FRUCT.org Conference, by Albert Yefimov
Skolkovo Foundation aims to foster innovation in Russia by funding over 900 startups through grants, venture funds, and partnerships with global corporations and universities; some successful startups include Datadvance, Rock Flow Dynamics, and Synesis which have received grants and grown their staff, revenue, intellectual property, and partnerships. The foundation focuses on key areas like IT, energy efficiency, life sciences, and nuclear technologies to drive economic and technological development in Russia.
Three Laws of Trusted Data Sharing: Building a Better Business Case for Dat..., by CS, NcState
The document discusses three laws of trusted data sharing based on research in software engineering quality prediction. The first law is to only share the essential "corners" of the data rather than all data. The second law is to anonymize the data in the corners before sharing. The third law is never to mutate the data across important "decision boundaries". The research found that building models from a small percentage of shared and privatized data in this way produced better results than using all the original raw data. The author plans to apply these laws of data sharing to other domains like smart cities and healthcare to investigate the costs and benefits of data sharing.
The Art and Science of Analyzing Software Data, by CS, NcState
This document summarizes an ICSE'14 tutorial on analyzing software data. The tutorial covers various topics:
- Organizational issues, like talking to users to understand goals, knowing the software domain to avoid misinterpretations, questioning the data, and seeing data science as cyclic.
- Qualitative methods like discovering information needs through surveys and interviews.
- Quantitative methods like data reduction techniques and privacy-preserving sharing.
- Open issues like data instabilities, model comparisons, and ensemble techniques.
The document emphasizes understanding the user's perspective and software domain knowledge to properly analyze data and avoid incorrect conclusions. Case studies show how missing this domain knowledge led analyses down wrong paths.
Agile Research in Information Systems Field: Analysis from Knowledge Transfor..., by Ilia Bider
Presentation at the 8th IADIS International Conference on Information systems. Pre-proceedings available at: http://bit.ly/1QPEZS5
Due to the relative success of agile methods in software development, the idea of having agile processes has begun to be tested in other areas, for example, agile business process development. This trend has already reached the research community, and some materials have appeared that suggest using agility in research projects. Analysis of these suggestions, however, shows that they do not go beyond finding a superficial analogy between the concepts of software development and research projects. The paper presents a deeper analysis of the concept of agile research in Information Systems (IS) based on examining research projects from the knowledge transformation perspective. As a basis for analysis, the SECI model of Nonaka is used. Based on this analysis, several suggestions are made on how to conduct agile research in IS, e.g. prioritize relevance over rigor, test early for a practical purpose, use one's own experience and reflections, etc. It is also shown that some research types, like action research and design science, are more suitable for conducting agile research than others. The paper also analyzes the risks of non-agile research and presents an example in which they are revealed.
Presentation at Data ScienceTech Institute campuses, Paris and Nice, May 2016, including Intro, Data Science History and Terms; 10 Real-World Data Science Lessons; Data Science Now: Polls & Trends; Data Science Roles; Data Science Job Trends; and Data Science Future
Visually Exploring Patent Collections for Events and Patterns, by Xiaoyu Wang
My talk on Patent Visualization at the 3rd IEEE Workshop on Interactive Visual Text Analytics. The primary focus is to introduce the Scalable Visual Analytics research that my team is working on. The workshop paper can be found at: http://vialab.science.uoit.ca/textvis2013/papers/Ankam-TextVis2013.pdf
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...), by Alex Pinto
The document discusses machine learning-based security monitoring. It begins with an introduction of the speaker, Alex Pinto, and an agenda that will include a discussion of anomaly detection versus classification techniques. It then covers some history of anomaly detection research dating back to the 1980s. It also discusses challenges with anomaly detection, such as the curse of dimensionality with high-dimensional data and lack of ground truth labels. The document emphasizes communicating these machine learning concepts clearly.
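The curse of dimensionality mentioned above has a quick empirical illustration: as dimensions grow, the nearest and farthest neighbors of a query point become nearly equidistant, which undermines distance-based anomaly scores. A minimal sketch with random data:

```python
# Relative contrast (max dist - min dist) / min dist collapses as the
# number of dimensions grows, even though the data stays uniformly random.
import numpy as np

rng = np.random.default_rng(0)
contrasts = {}
for dim in (2, 10, 100, 1000):
    points = rng.random((500, dim))   # uniform cloud in the unit hypercube
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    contrasts[dim] = (dists.max() - dists.min()) / dists.min()
    print(f"dim={dim:5d}  relative contrast={contrasts[dim]:.2f}")
```

In low dimensions the nearest point is far closer than the farthest; in high dimensions the ratio shrinks toward zero, so "unusually far away" stops being a meaningful anomaly signal without dimensionality reduction or better features.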
GALE: Geometric Active Learning for Search-Based Software Engineering, by CS, NcState
Multi-objective evolutionary algorithms (MOEAs) help software engineers find novel solutions to complex problems. When automatic tools explore too many options, they are slow to use and hard to comprehend. GALE is a near-linear time MOEA that builds a piecewise approximation to the surface of best solutions along the Pareto frontier. For each piece, GALE mutates solutions towards the better end. In numerous case studies, GALE finds comparable solutions to standard methods (NSGA-II, SPEA2) using far fewer evaluations (e.g. 20 evaluations, not 1,000). GALE is recommended when a model is expensive to evaluate, or when some audience needs to browse and understand how an MOEA has made its conclusions.
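The Pareto frontier that GALE approximates can be made concrete with a plain dominance filter: a candidate survives if no other candidate is at least as good on every objective and strictly better on at least one (minimizing both objectives). This is a didactic sketch of the concept, not GALE itself, and the candidate values are hypothetical:

```python
# Extract the Pareto frontier of (cost, defects) pairs, both minimized.
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_frontier(candidates):
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other != c)]

# Hypothetical (cost, defects) pairs for candidate project configurations.
candidates = [(1, 9), (2, 7), (3, 8), (4, 4), (5, 5), (6, 2), (9, 1)]
print(pareto_frontier(candidates))  # → [(1, 9), (2, 7), (4, 4), (6, 2), (9, 1)]
```

This brute-force filter is quadratic in the number of candidates; GALE's contribution is reaching a good approximation of this frontier with far fewer model evaluations.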
This document provides an introduction to machine learning. It discusses how machine learning gives computers the ability to learn without being explicitly programmed. It also discusses how machine learning is used widely by major companies and has become integral to many businesses. Finally, it covers different machine learning techniques including supervised learning methods like classification, regression, and artificial neural networks as well as unsupervised learning methods like clustering.
A Pragmatic Perspective on Software Visualization, by Arie van Deursen
Slides of the keynote presentation at the 5th International IEEE/ACM Symposium on Software Visualization, SoftVis 2010. Salt Lake City, USA, October 2010.
1. Knowledge discovery in production requires automation due to the growth of information, devices, and knowledge workers.
2. A core dataflow model engine is needed to preprocess data and compose networked intelligence solutions for emerging applications.
3. Product solutions include hybrid SaaS factory subscriptions and applications via an open marketplace to deliver business value such as increased productivity and test time reduction for electronics manufacturing customers.
DeepLearning4J and Spark: Successes and Challenges - François Garillot, by sparktc
Deeplearning4J is an open-source, distributed deep learning library written for Java and Scala. It provides tools for training neural networks on distributed systems. While large companies can distribute training across many servers, Deeplearning4J allows other organizations to do distributed training as well. It includes libraries for vectorization, linear algebra, data preprocessing, model definition and training. The library aims to make deep learning more accessible to enterprises by allowing them to train models on their own large datasets.
DeepLearning4J and Spark: Successes and Challenges - François Garillot, by sparktc
At the recent sold-out Spark & Machine Learning Meetup in Brussels, François Garillot of Skymind delivered a lightning talk called DeepLearning4J and Spark: Successes and Challenges.
Specifically, François offered a tour of the DeepLearning4J architecture intermingled with applications. He went over the main blocks of this deep learning solution for the JVM, which includes GPU acceleration, a custom n-dimensional array library, a parallelized Swiss-army data-loading tool, and deep learning and reinforcement learning libraries — all with an easy-access interface.
Along the way, he pointed out the strategic points of parallelization of computation across machines and gave insight on where Spark helps — and where it doesn't.
This document provides an introduction to deep learning, including definitions of artificial intelligence, machine learning, and deep learning. It discusses examples of inputs and outputs in deep learning systems, potential applications, common Python libraries like Keras, and conclusions. The key takeaways are that deep learning uses neural networks to learn patterns at different levels of abstraction, it involves training models on data and using the models to make inferences on new data, and libraries like Keras and TensorFlow are commonly used.
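The train-then-infer workflow that summary describes can be shown at its smallest scale: fit a single sigmoid neuron by gradient descent, then run inference with the learned parameters. This is a didactic sketch of the underlying idea, not Keras or TensorFlow code:

```python
# Smallest possible "training then inference": one sigmoid neuron
# learning the OR function by gradient descent on cross-entropy loss.
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Training data: (x1, x2) -> label for logical OR.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w1 = w2 = b = 0.0
lr = 0.5
for _ in range(2000):  # training: adjust parameters to reduce error
    for (x1, x2), y in data:
        p = sigmoid(w1 * x1 + w2 * x2 + b)
        grad = p - y  # d(cross-entropy)/d(pre-activation) for a sigmoid unit
        w1 -= lr * grad * x1
        w2 -= lr * grad * x2
        b -= lr * grad

# Inference: apply the trained model to inputs.
preds = [round(sigmoid(w1 * x1 + w2 * x2 + b)) for (x1, x2), _ in data]
print(preds)  # → [0, 1, 1, 1]
```

Deep learning stacks many such units into layers so that successive layers learn patterns at different levels of abstraction, exactly the hierarchy the summary mentions; libraries like Keras automate the gradient step shown here by hand.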
Algorithm Marketplace and the New "Algorithm Economy", by Diego Oppenheimer
Diego Oppenheimer discusses the rise of algorithm marketplaces and the new "algorithm economy". Key points include:
- Advances in machine learning, computer vision, speech recognition and natural language processing are enabling algorithms to interpret unstructured data at scale.
- Algorithm marketplaces allow algorithms to be hosted, discovered, monetized and composed modularly to address a wide range of use cases across many industries.
- The algorithm economy will lower barriers to applying machine intelligence and foster innovation as algorithms become reusable assets that creators and users can both benefit from.
Using Algorithmia to leverage AI and Machine Learning APIsRakuten Group, Inc.
We are entering a new era of software development. Companies are realizing that AI and machine learning are critical to success in business, both to save cost on repetitive tasks, and to enable to new features and products that would be impossible without machine intelligence. Algorithmia makes these tools available through web APIs that makes tools like computer vision and natural language processing available to companies everywhere. Kenny will talk about how sharing of intelligent APIs can improve your applications.
https://rakutentechnologyconference2016.sched.org/event/8aS5/using-algorithmia-to-leverage-ai-and-machine-learning-apis
Rakuten Technology Conference 2016
http://tech.rakuten.co.jp/
This document provides an introduction to deep learning with Microsoft's Cognitive Toolkit (CNTK). It discusses key deep learning concepts and how they are implemented in CNTK, including neural networks, backpropagation, loss functions, and common network architectures like convolutional neural networks. It also outlines several of Microsoft's products that use deep learning like Cortana, Bing, and Skype Translator. Examples of training deep learning models with CNTK on datasets like MNIST using logistic regression, multi-layer perceptrons, and CNNs are also presented.
Vertex has invested in companies across geographies addressing different industry applications leveraging AI to transform their service offerings. Read more on the trends and waves of AI developments observed.
The document summarizes the evolution of artificial intelligence (AI) from the 1950s to the present. It discusses three waves of AI development: handcrafted knowledge in the early period, statistical learning from the 1960s to 1980s, and contextual adaptation from the 1990s onward. Recent advances are driven by increased computing power, data availability, and new algorithms. Deep learning is increasingly important and applications include voice control, natural language processing, and computer vision. While AI has great potential, a lack of talent and data is creating a bifurcated ecosystem with large tech firms at the top.
Infusing Social Data Analytics into Future Internet applications for Manufact...Michael Petychakis
This document discusses using social data analytics to enhance future internet applications for manufacturing. It describes developing a cloud-based solution called FITMAN-Analyzer that collects unstructured data from social networks and websites. FITMAN-Analyzer then performs natural language processing, sentiment analysis, and trend analysis to extract useful knowledge for manufacturers. The solution is designed to be domain-independent, require no coding skills, and provide real-time streaming and visualization of results.
More information, visit: http://www.godatadriven.com/accelerator.html
Data scientists aren’t a nice-to-have anymore, they are a must-have. Businesses of all sizes are scooping up this new breed of engineering professional. But how do you find the right one for your business?
The Data Science Accelerator Program is a one year program, delivered in Amsterdam by world-class industry practitioners. It provides your aspiring data scientists with intensive on- and off-site instruction, access to an extensive network of speakers and mentors and coaching.
The Data Science Accelerator Program helps you assess and radically develop the skills of your data science staff or recruits.
Our goal is to deliver you excellent data scientists that help you become a data driven enterprise.
The right tools
We teach your organisation the proven data science tools.
The right hands
We are trusted by many industry leading partners.
The right experience
We've done big data and data science at many clients, we know what the real world is like.
The right experts
We have a world class selection of lecturers that you will be working with.
Vincent D. Warmerdam
Jonathan Samoocha
Ivo Everts
Rogier van der Geer
Ron van Weverwijk
Giovanni Lanzani
The right curriculum
We meet twice a month. Once for a lecture, once for a hackathon.
Lectures
The RStudio stack.
The art of simulation.
The iPython stack.
Linear modelling.
Operations research.
Nonlinear modelling.
Clustering & ensemble methods.
Natural language processing.
Time series.
Visualisation.
Scaling to big data.
Advanced topics.
Hackathons
Scrape and mine the internet.
Solving multiarmed bandit problems.
Webdev with flask and pandas as a backend.
Build an automation script for linear models.
Build a heuristic tsp solver.
Code review your automation for nonlinear models.
Build a method that outperforms random forests.
Build a markov chain to generate song lyrics.
Predict an optimal portfolio for the stock market.
Create an interactive d3 app with backend.
Start up a spark cluster with large s3 data.
You pick!
Interested?
Ping us here. signal@godatadriven.com
High time to add machine learning to your information security stackMinhaz A V
Machine learning and deep learning techniques are increasingly being used for cybersecurity applications like malware detection, spam filtering, and anomaly detection. As attacks become more sophisticated, machine learning can help security teams focus on important threats by analyzing large amounts of data. While machine learning is a powerful tool, security experts still need to provide guidance on what problems to solve and how to structure machine learning pipelines and evaluate results. Individuals and organizations should embrace machine learning by participating in online courses and challenges to gain hands-on experience applying these techniques.
This document provides a summary of image classification using deep learning. It begins with an introduction to the speaker and their background. It then discusses key concepts in image classification like image types (e.g. raster, vector), feature extraction using convolutional and pooling layers, classification using dense layers and activation functions, and model training. It provides examples of datasets like cats vs dogs and how to balance classes. Finally, it discusses model saving, transformers, and provides homework on modifying the image classification code.
This document discusses an introduction to deep learning and provides information about frameworks, courses, and recent advances. It includes the following:
1. An overview of deep learning techniques and frameworks like TensorFlow, MXNet, and Torch. Popular online courses from Stanford, Oxford, and Coursera are also listed.
2. Questions from attendees about potential deep learning applications in areas like social networks, NLP, robotics, fraud detection, medicine, and more.
3. Suggestions for potential deep learning solutions like QA systems, chatbots, fraud detection, image recognition, and topic extraction from text. Attendees are encouraged to apply deep learning to solve problems in their respective domains.
This presentation will discuss leveraging analytics and machine learning techniques like deep learning, long short term memory networks, and gradient boosted machines for security applications like threat assessment. The presenter will compare current machine learning technologies and discuss best practices for applying predictive modeling to security problems, including data acquisition, feature selection, and model validation. The talk is part of a security roundtable event and will be followed by a lab exercise on developing predictive models.
Big data and artificial intelligence have developed through an iterative process where increased data leads to improved infrastructure which then enables the collection of even more data. This virtuous cycle began with the rise of the internet and web data in the 1990s. Modern frameworks like Hadoop and algorithms like MapReduce established the infrastructure needed to analyze large, distributed datasets and fuel machine learning applications. Deep learning techniques are now widely used for tasks involving images, text, video and other complex data types, with many companies seeking to gain advantages by leveraging proprietary datasets.
Building High Available and Scalable Machine Learning ApplicationsYalçın Yenigün
This document discusses building high available and scalable machine learning products. It begins with an introduction to data-driven products and machine learning concepts like supervised and unsupervised learning. It then discusses six key challenges in building machine learning products at iyzico: 1) models need testing on real data before production, 2) response times must be under 0.1 seconds, 3) data is dynamic, 4) high availability and fail fast is required, 5) continuous delivery of machine learning models, and 6) simulating aggregated features from batch data. It provides examples of techniques used at iyzico to address these challenges like Spark for predictions, schemaless databases, circuit breakers, devops for machine learning, and Redis for
DeepLearning is not just a hype - it outperforms state-of-the-art ML algorithms. One by one. In this talk we will show how DeepLearning can be used for detecting anomalies on IoT sensor data streams at high speed using DeepLearning4J on top of different BigData engines like ApacheSpark and ApacheFlink. Key in this talk is the absence of any large training corpus since we are using unsupervised machine learning - a domain current DL research threats step-motherly. As we can see in this demo LSTM networks can learn very complex system behavior - in this case data coming from a physical model simulating bearing vibration data. Once draw back of DeepLearning is that normally a very large labaled training data set is required. This is particularly interesting since we can show how unsupervised machine learning can be used in conjunction with DeepLearning - no labeled data set is necessary. We are able to detect anomalies and predict braking bearings with 10 fold confidence. All examples and all code will be made publicly available and open sources. Only open source components are used.
This document provides an overview of getting started with data science using Python. It discusses what data science is, why it is in high demand, and the typical skills and backgrounds of data scientists. It then covers popular Python libraries for data science like NumPy, Pandas, Scikit-Learn, TensorFlow, and Keras. Common data science steps are outlined including data gathering, preparation, exploration, model building, validation, and deployment. Example applications and case studies are discussed along with resources for learning including podcasts, websites, communities, books, and TV shows.
2. Machine learning & AI: why now?
• Because: Big Data
• We have (a lot) more (digital) data
• from the Web
• from sensors (including cameras, mobile devices)
• from transactional business systems
• We have faster computers
• parallel ‘cluster’ computing is mainstream
• CPUs, GPUs, FPGAs, ASICs
• We have lots of useful open source software
• for data management, data pipelining & analytics
• driven by social media giants
Advanced Technologies for Industry 4.0
3. Big Data 2017: per year…
(bar chart: NASDAQ, the US Census, the US Library of Congress, the NOAA archive and YouTube, with per-year volumes ranging from 1PB to 15PB)
5. Big Data 2017: per year…
CERN archive: 73PB · searches on Google: 98PB · uploads to Facebook: 180PB
6. Big Data 2017: per year…
CERN archive: 73PB · searches on Google: 98PB · uploads to Facebook: 180PB
…2025, per year:
Square Kilometre Array Telescope, Phase 1: 300PB · High Luminosity Large Hadron Collider: 1,000PB · Square Kilometre Array Telescope, Phase 2: 1,000PB
7. Different kinds of “big”
• Big Data are typically measured in three ways
• volume – from gigabytes to terabytes to petabytes
• velocity – data streams at you or changes rapidly
• variety – no longer are data in nice, neat tables
• some folk add others
• veracity, verifiability, validity, value…
• Big Data come in many flavours
• very large transaction databases
• very large social graphs
• very large image collections
• very large numbers of sensor feeds
• etc.
8. Data [ science | engineering | management ]
• ~20% Data science: analytics · statistics · machine learning
• ~40% Data engineering: data movement · data pipelines · data tech deployment (“data dev ops”) · database design · data preparation & cleaning
• ~40% Data management: data storage · data formats · metadata management · data preservation & backup · data preparation & cleaning
9. Machine learning
“Machine learning is the science of getting computers
to act without being explicitly programmed.”
– Andrew Ng, Stanford University
• Two main kinds of machine learning
• unsupervised learning finds patterns in data without being told exactly what to look for
• e.g. for clustering, fitting
• supervised learning uses labelled training data to build a model, which is then used to make predictions
• e.g. for classification
10. Unsupervised learning in action: k-means clustering
(animation: the cluster means move with each iteration)
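The clustering loop behind the animation can be sketched in a few lines of Python. This is an illustrative toy, not the slide's actual demo: the data points, k = 2 and the first-k initialisation are all invented here.

```python
import math

def kmeans(points, k, iters=20):
    """Lloyd's algorithm: assign points to the nearest mean, then move the means."""
    means = points[:k]  # naive init: first k points (real code would randomise)
    for _ in range(iters):
        # Assignment step: each point joins the cluster with the closest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, means[j]))
            clusters[i].append(p)
        # Update step: each mean moves to the centroid of its cluster.
        means = [
            tuple(sum(c) / len(pts) for c in zip(*pts)) if pts else m
            for pts, m in zip(clusters, means)
        ]
    return means, clusters

# Two obvious groups of 2-D points; k-means should find their centres.
data = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
means, clusters = kmeans(data, k=2)
```

With real data you would randomise the initial means and rerun a few times, since k-means only finds a local optimum.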
11. Unsupervised learning: limitations of k-means
• Clusters are assumed to be the same size
• Clusters defined by density are not handled well
12. Unsupervised learning as art
(figure: clustering results for minPts = 5 with ε = 0.7, ε = 0.8 and ε = 0.9)
• Plenty of other unsupervised learning algorithms
• distribution-based clustering
• density-based clustering… etc.
• More complex ones have more free parameters
• tweaking is as much art as science
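The minPts/ε captions above are the knobs of density-based clustering in the DBSCAN style (the slide does not name the algorithm; DBSCAN is assumed here as the standard example). A pure-Python sketch shows what the two parameters do: ε sets how close neighbours must be, and minPts sets how many neighbours make a point "core".

```python
import math

def dbscan(points, eps, min_pts):
    """Density-based clustering: label each point with a cluster id, -1 = noise."""
    n = len(points)
    # Precompute each point's eps-neighbourhood (including the point itself).
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1              # too sparse: provisionally noise
            continue
        cluster += 1                    # i is a core point: grow a new cluster
        labels[i] = cluster
        seeds = list(neighbours[i])
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster     # noise reclaimed as a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                seeds.extend(neighbours[j])   # j is also core: keep expanding
    return labels

# Two tight blobs plus one outlier; with eps = 0.5 the outlier stays noise.
pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
       (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1),
       (10, 0)]
labels = dbscan(pts, eps=0.5, min_pts=4)
```

Shrinking ε below the blob spacing turns everything into noise; growing it past the gap merges the blobs, which is exactly the sensitivity the three panels illustrate.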
13. Supervised learning: classifying irises
(scatter plot of iris measurements; three classes: setosa, versicolor, virginica; unlabelled points marked “?”)
Versicolor iris image courtesy of David Berger under a CC-BY licence
• Crunch data on flower size and shape to identify its type (class label)
• label = F (petal, sepal)
14. Supervised learning: step 1 – training
• Need labelled (i.e. already classified) data
• want to train a model to recognise the classes from the data (i.e. find F())
• class label is the dependent variable
• rest of the data are independent variables, or predictors
• Split your big data set into training & test sets
• 70/30 or 60/40 or so
• Feed training data into model-learning software
• e.g. neural net, decision tree…
• Result: a classifier model F: label = F (petal, sepal)

Training data (example):
petal sepal label
1.5 5.2 setosa
1.2 4.6 setosa
4.1 6.0 versicolor
5.2 6.0 virginica
6.0 7.2 virginica
… … …

(diagram: training data → modelling software → classifier)
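Step 1 can be sketched end-to-end in Python. The numbers extend the slide's toy table with a few invented rows, and a 1-nearest-neighbour rule stands in for the unspecified modelling software; the 70/30 split is as suggested above.

```python
import math

# (petal, sepal) measurements with class labels: the slide's toy table plus a
# few invented rows so every class appears on both sides of the split.
data = [
    ((1.5, 5.2), "setosa"), ((4.1, 6.0), "versicolor"), ((5.2, 6.0), "virginica"),
    ((1.2, 4.6), "setosa"), ((4.5, 6.2), "versicolor"),
    ((6.0, 7.2), "virginica"), ((1.4, 5.1), "setosa"), ((5.3, 6.5), "virginica"),
]

# Split into training and test sets, roughly 70/30. (A real pipeline would
# shuffle first; these rows are already interleaved by class.)
split = int(0.7 * len(data))
train, test = data[:split], data[split:]

def fit(train_rows):
    """'Learn' F from labelled data: here, a 1-nearest-neighbour classifier."""
    def F(petal, sepal):
        nearest = min(train_rows, key=lambda row: math.dist(row[0], (petal, sepal)))
        return nearest[1]
    return F

F = fit(train)   # label = F(petal, sepal)
```

The held-back test rows never touch the fit step, which is what makes the evaluation on the next slide honest.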
15. Supervised learning: step 2 – evaluation
• Feed test data into classifier model F
• Count hits, misses vs your known labels
• true positives, false positives…
• Good enough? Good to go!
• Not good enough? Go back, tweak your modelling software, try again

Test data (example):
petal sepal label
1.4 5.1 setosa
5.3 6.5 virginica
4.5 6.2 virginica
… … …

Classifier output:
petal sepal label model says…
1.4 5.1 setosa setosa
5.3 6.5 virginica virginica
4.5 6.2 virginica versicolor
… … … …
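Step 2 is just bookkeeping: compare the model's column against the known labels. A sketch using the three test rows shown above:

```python
# (petal, sepal, true label, model's prediction) from the slide's test table.
results = [
    (1.4, 5.1, "setosa", "setosa"),
    (5.3, 6.5, "virginica", "virginica"),
    (4.5, 6.2, "virginica", "versicolor"),
]

# Overall accuracy: fraction of test rows where the model agrees with the label.
hits = sum(1 for *_, truth, pred in results if truth == pred)
accuracy = hits / len(results)

# Per-class true/false positive counts, the raw numbers behind precision.
counts = {}
for *_, truth, pred in results:
    tally = counts.setdefault(pred, {"tp": 0, "fp": 0})
    tally["tp" if truth == pred else "fp"] += 1

print(f"accuracy = {accuracy:.2f}")   # 2 of 3 test rows correct
print(counts)
```

The third row is a false positive for versicolor (and a miss for virginica), which is exactly the kind of count that sends you back to tweak the modelling software.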
16. Advanced supervised learning: deep learning
• Deep learning: “learn multiple levels of representations that correspond to different levels of abstraction” – Wikipedia
• An old-fashioned neural net is 1 layer deep
• Deep learning neural nets are… deeper!
• multi-layer NNs, deep NNs, recurrent NNs, convolutional NNs
• e.g. deep learning for image recognition
• look at flat pixel data… (1 layer)
• …and edge detection in the image data… (another layer)
• …and different scales of the image data… (another layer)
• all in the same modelling framework
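“Levels of representation” is function composition: each layer transforms the previous layer's output, and training adjusts the weights at every level at once. A toy forward pass through two fully connected layers, with weights invented purely for illustration:

```python
def relu(v):
    """Standard non-linearity: negative activations are clipped to zero."""
    return [max(0.0, x) for x in v]

def dense(weights, biases, inputs):
    """One fully connected layer: output_i = sum_j w[i][j] * x[j] + b[i]."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

# Layer 1 might learn low-level features (edges); layer 2 combines them.
w1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
w2, b2 = [[1.0, 1.0]], [-0.5]

x = [2.0, 1.0]
h = relu(dense(w1, b1, x))   # hidden representation (first level)
y = dense(w2, b2, h)         # final output (second level)
```

A “deep” net simply stacks more of these layers, so each level can build abstractions out of the level below.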
18. Deep learning: spotting solar panels
• Accuracy: 99.60%!
• Careful! A classifier that always says “background” is 98.75% accurate
• precision is a better measure!
• Precision: 84.54%
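The accuracy trap above is easy to reproduce. Assuming 10,000 pixels of which 1.25% truly belong to solar panels (numbers chosen to match the slide's 98.75% baseline), a classifier that always answers "background" scores high accuracy but zero precision:

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + fp + tn + fn)

def precision(tp, fp):
    """Of the pixels we flagged as 'solar panel', how many really are?"""
    return tp / (tp + fp) if (tp + fp) else 0.0

# 10,000 pixels, 125 of them (1.25%) truly solar panel.
# The lazy classifier predicts 'background' everywhere:
tp, fp, tn, fn = 0, 0, 9875, 125
lazy_accuracy = accuracy(tp, fp, tn, fn)    # looks impressive
lazy_precision = precision(tp, fp)          # it never finds a panel
```

With heavily imbalanced classes, always report precision (and recall) alongside accuracy.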
19. Advanced supervised learning: reinforcement learning
• Reinforcement learning allows software “agents” to “explore”
• don’t need labelled data
• just set up an environment & go
• An agent:
• takes actions in an environment
• which is interpreted into a reward…
• and a representation of the state…
• which are fed back into the agent
• Good example is DeepMind’s AlphaGo Zero
• two versions of the agent play Go against each other
• learn winning strategies by beating the other guy
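The action/reward/state loop can be made concrete with tabular Q-learning on a deliberately tiny invented environment: two states, two actions, one rewarding transition. It shares only the feedback structure with something like AlphaGo Zero, none of the scale.

```python
import random

# Toy environment: action 1 in state 0 leads to state 1; action 1 in state 1
# earns the only reward and resets to state 0. Action 0 goes nowhere.
def step(state, action):
    if state == 0:
        return (1, 0.0) if action == 1 else (0, 0.0)
    return (0, 1.0) if action == 1 else (1, 0.0)

random.seed(1)
q = [[0.0, 0.0], [0.0, 0.0]]      # q[state][action]: estimated long-run value
alpha, gamma, eps = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate

state = 0
for _ in range(2000):
    # epsilon-greedy: mostly exploit the best known action, sometimes explore.
    if random.random() < eps:
        action = random.randrange(2)
    else:
        action = max((0, 1), key=lambda a: q[state][a])
    next_state, reward = step(state, action)
    # Q-learning update: nudge q towards reward + discounted best future value.
    q[state][action] += alpha * (reward + gamma * max(q[next_state]) - q[state][action])
    state = next_state

# The greedy policy the agent has learned from reward feedback alone.
policy = [max((0, 1), key=lambda a: q[s][a]) for s in (0, 1)]
```

No labels were ever supplied; the agent discovers that action 1 is best in both states purely from the reward signal fed back through the loop.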
20. Machine learning and artificial intelligence
• Today’s ML is principally pattern recognition
• IF data.looksLike(pedestrian) THEN report(‘Pedestrian’);
• This can be a powerful tool for decision support
• Think of AI as taking the next step, to decision making:
• IF data.looksLike(pedestrian) THEN brakes.On(now);
• Generally, we want to use empirical data to take the next-best-action
• whether a human is in, on or out of the loop
21. The future of AI
• State-of-the-art in AI-driven robotics:
• a team at Nanyang Technological University, Singapore got two industrial robots to assemble (most of) an IKEA STEFAN chair in c. 20 mins (The Economist, April 2018)
• Current research topics are transfer learning…
• can a machine learn the rules of Go (yes) then figure out how to apply them to the game of Chess (not yet)
• …and curiosity-based learning
• continuing the reinforcement-learning trend
• Hardware is becoming specialised
• GPUs (graphics processing units) and more
• Excellent source: https://www.stateof.ai/
• Nathan Benaich, Ian Hogarth (UK AI VCs), June 2018
22. Be problem-driven, not data-driven
• Big Data / AI / ML is not a silver bullet
• Don’t start with the tech – start with the problem
• Don’t look at “your” data and ask what can I do with them?
• Look at your business and ask, what can I do better?
• improve operational efficiency (data management)
• understand my customers better (data science/ML)
• measure or monitor things with sensors (data engineering)
• simulate things digitally (data engineering/management)
• automate processes/decisions (ML/AI)
Editor's Notes
We have a lot of data, but we need techniques, tools and machines to understand and interpret the data and make use of it. This is where machine learning and AI come into play: the data, powerful machines and open-source software are all available.
Although we call this ‘unsupervised’, we have actually told the computer to divide the dataset into three groups; we could have said 2 or 4. This is the ‘k’ value. The ‘means’ part signifies that a data point is assigned to the cluster that has the closest mean (average) value. The algorithm tries to place the points so that the sum of the distances from each point to the mean of its cluster is the minimum for the whole set. Run the animation and watch the crosses (the mean of each cluster) move as the algorithm progresses towards a better solution.
This method does not work for all datasets. These are standard datasets that are used to show that the method can break down.
Setting the ringed parameter to differing values produces different results. These diagrams show unsupervised learning where the number of clusters is not given. Instead there are parameters defining the minimum number of points and how far they are allowed to be from the centre of a cluster while still being counted as part of it. Varying the distance parameter changes the results significantly. Which value is correct? There is no correct answer to that!
Iris is a type of flower with three categories. The picture shows versicolor.
Sepal is the part of the flower that supports the petals (usually green); it is not shown that well in this diagram.
How would you classify the question marks?
First one is quite easy
Second one is quite easy
Third one is trickier
---
Plot comes from:
library(lattice)  # xyplot and trellis.par.get come from the lattice package
my.plot <- xyplot(Sepal.Length ~ Petal.Length, data = iris, groups = Species,
  panel = panel.superpose,
  col.line = trellis.par.get("strip.background")$col,
  col.symbol = trellis.par.get("strip.shingle")$col,
  key = list(title = "Iris Data", x = .15, y = .85, corner = c(0, 1),
    border = TRUE,
    points = list(col = trellis.par.get("strip.shingle")$col[1:3],
      pch = trellis.par.get("superpose.symbol")$pch[1:3],
      cex = trellis.par.get("superpose.symbol")$cex[1:3]),
    text = list(levels(iris$Species))))
print(my.plot)  # lattice plots must be printed to render
The algorithm (set of steps) used to train a model.
It’s quite an important point here that it’s not a good idea to use your training data as test data. This is why you hold some back (as on the previous slide). It’s possible to end up with a very misleading accuracy by ‘overtraining’ the algorithm so that it performs with very high accuracy on the training set, but is no good for data it has not ‘seen’ before.
Neural networks usually have several layers. Deep learning comes from “deep neural networks”, i.e. a deep learning model contains a lot of layers.
For further information, see:
https://www.mathworks.com/discovery/deep-learning.html
https://devblogs.nvidia.com/deep-learning-nutshell-core-concepts/
CNNs are commonly used for image processing. A CNN consists of an input and an output layer, as well as multiple hidden layers. The hidden layers of a CNN typically consist of convolutional layers, pooling layers, fully connected layers and normalization layers. Convolutional layers apply a convolution operation to the input, passing the result to the next layer.
See: https://en.wikipedia.org/wiki/Convolutional_neural_network#Convolutional
i.e. PPV: 84.54% of the predicted solar panel pixels are solar panel pixels
i.e. TPR: 83.78% of the pixels that belong to a solar panel are correctly predicted as solar panel pixels
To learn more, see the Deepmind page:
https://deepmind.com/blog/alphago-zero-learning-scratch/