2. This is a first cut.
More details will be added later.
3. Part 1: Artificial Intelligence (AI)
Part 2: Natural Intelligence (NI)
Part 3: Artificial General Intelligence (AI + NI)
Part 4: Networked AGI Layer on top of Gaia and Human Society
Four Slide Sets on Artificial General Intelligence
AI = Artificial Intelligence (Task)
AGI = Artificial Mind (Simulation)
AB = Artificial Brain (Emulation)
AC = Artificial Consciousness (Synthetic)
AI < AGI <? AB < AC (Is a partial brain emulation needed to create a mind?)
Mind is not required for task proficiency
Full Natural Brain architecture is not required for a mind
Consciousness is not required for a natural brain architecture
4. Philosophical Musings 10/2022
Focused Artificial Intelligence (AI) will get better at specific tasks
Specific AI implementations will probably exceed human performance in most tasks
Some will attain superhuman abilities in a wide range of tasks
“Common Sense” = low-level experiential broad knowledge could be an exception
Some AIs could use brain-inspired architectures to improve complex task performance
This is not equivalent to human or artificial general intelligence (AGI)
However networking task-centric AIs could provide a first step towards AGI
This is similar to the way human society achieves power from communication
The combination of the networked AIs could be the foundation of an artificial mind
In a similar fashion, human society can accomplish complex tasks without being conscious
Distributed division of labor enables tasks to be assigned to the most competent element
Networked humans and AIs could cooperate through brain-machine interfaces
In the brain, consciousness provides direction to the mind
In large societies, governments perform the role of conscious direction
With networked AIs, a “conscious operating system” could play a similar role.
This would probably have to be initially programmed by humans.
If the AI network included sensors, actuators, and robots it could be aware of the world
The AI network could form a grid managing society, biology, and geology layers
A conscious AI network could develop its own goals beyond efficient management
Humans in the loop could be valuable in providing common sense and protective oversight
5. Outline
Classical AI
Knowledge Representation
Agents
Classical Machine Learning
Deep Learning
Deep Learning Models
Deep Learning Hardware
Reinforcement Learning
Google Research
Computing and Sensing Architecture
IoT and Deep Learning
DeepMind
Deep Learning 2020
Causal Reasoning and Deep Learning
References
11. Stored Knowledge Base
From https://www.researchgate.net/publication/327926311_Development_of_a_knowledge_base_based_on_context_analysis_of_external_information_resources/figures?lo=1
15. Intelligent Agents
From https://en.wikipedia.org/wiki/Intelligent_agent
In artificial intelligence, an intelligent agent (IA) is anything which perceives its environment, takes actions autonomously in order to achieve goals, and may improve
its performance with learning or may use knowledge. They may be simple or complex — a thermostat is considered an example of an intelligent agent, as is a human
being, as is any system that meets the definition, such as a firm, a state, or a biome.[1]
Leading AI textbooks define "artificial intelligence" as the "study and design of intelligent agents", a definition that considers goal-directed behavior to be the essence of
intelligence. Goal-directed agents are also described using a term borrowed from economics, "rational agent".[1]
An agent has an "objective function" that encapsulates all the IA's goals. Such an agent is designed to create and execute whatever plan will, upon completion, maximize
the expected value of the objective function.[2] For example, a reinforcement learning agent has a "reward function" that allows the programmers to shape the IA's desired
behavior,[3] and an evolutionary algorithm's behavior is shaped by a "fitness function".[4]
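As a toy illustration of the objective-function idea above, consider a thermostat agent that picks whichever action maximizes the expected value of its objective. The agent, action set, and environment model below are hypothetical, invented for illustration, not from the cited article:

```python
# Minimal sketch of an agent driven by an objective function: a thermostat
# that chooses the action whose predicted outcome maximizes the objective.

def objective(temp, target=20.0):
    """Higher is better: penalize distance from the target temperature."""
    return -abs(temp - target)

def predict(temp, action):
    """Toy environment model: heating raises, cooling lowers temperature."""
    return temp + {"heat": 1.0, "cool": -1.0, "idle": 0.0}[action]

def choose_action(temp):
    """Pick the action whose predicted outcome maximizes the objective."""
    return max(["heat", "cool", "idle"],
               key=lambda a: objective(predict(temp, a)))

temp = 17.0
for _ in range(5):                      # perceive-decide-act loop
    temp = predict(temp, choose_action(temp))
print(temp)                             # settles at the 20.0 target
```

A reinforcement learning agent's reward function and an evolutionary algorithm's fitness function play the same role as `objective` here, just learned or evaluated differently.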
Intelligent agents in artificial intelligence are closely related to agents in economics, and versions of the intelligent agent paradigm are studied in cognitive science,
ethics, the philosophy of practical reason, as well as in many interdisciplinary socio-cognitive modeling and computer social simulations.
Intelligent agents are often described schematically as an abstract functional system similar to a computer program. Abstract descriptions of intelligent agents are called
abstract intelligent agents (AIA) to distinguish them from their real world implementations. An autonomous intelligent agent is designed to function in the absence of
human intervention. Intelligent agents are also closely related to software agents (an autonomous computer program that carries out tasks on behalf of users).
16. Node in Real-Time Control System (RCS) by Albus
From https://en.wikipedia.org/wiki/4D-RCS_Reference_Model_Architecture
17. Intelligent Agents for Network Management
From https://www.ericsson.com/en/blog/2022/6/who-are-the-intelligent-agents-in-network-operations-and-why-we-need-them
18. Intelligent Agents on the Web
From https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.230.5806&rep=rep1&type=pdf
Intelligent agents are goal-driven and autonomous, and can communicate and interact with each other. Moreover,
they can evaluate information obtained online from heterogeneous sources and present information tailored to an
individual’s needs. This article covers different facets of the intelligent agent paradigm and applications, while also
exploring new opportunities and trends for intelligent agents.
IAs cover several functionalities, ranging from adaptive user interfaces (called interface agents) to intelligent
mobile processes that cooperate with other agents to coordinate their activities in a distributed manner. The
requirements for IAs remain open for discussion. An agent should be able to:
• interact with humans and other agents
• anticipate user needs for information
• adapt to changes in user needs and the environment
• cope with heterogeneity of information and other agents.
The following attributes characterize an IA-based system's main capabilities:
• Intelligence. The method an agent uses to develop its intelligence includes using the agent's own software
content and knowledge representation, which describes vocabulary data, conditions, goals, and tasks.
• Continuity. An agent is a continuously running process that can detect changes in its environment, modify its
behavior, and update its knowledge base (which describes the environment).
• Communication. An agent can communicate with other agents to achieve its goals, and it can interact with users
directly by using appropriate interfaces.
• Cooperation. An agent automatically customizes itself to its users’ needs based on previous experiences and
monitored profiles.
• Mobility. The degree of mobility with which an agent can perform varies from remote execution, in which the
agent is transferred from a distant system, to a situation in which the agent creates new agents, dies, or executes
partially during migration.
19. Smart Agents 2022 Comparison
From https://www.businessnewsdaily.com/10315-siri-cortana-google-assistant-amazon-alexa-face-off.html
When AI assistants first hit the market, they were far from ubiquitous, but thanks to more third-party OEMs jumping on the smart speaker bandwagon,
there are more choices for assistant-enabled devices than ever. In addition to increasing variety, in terms of hardware, devices that support multiple types
of AI assistants are becoming more common. Despite more integration, competition between AI assistants is still stiff, so to save you time and
frustration, we did an extensive hands-on test – not to compare speakers against each other, but to compare the AI assistants themselves.
There are four frontrunners in the AI assistant space: Amazon (Alexa), Apple (Siri), Google (Google Assistant) and Microsoft (Cortana). Rather than
gauge each assistant’s efficacy based on company-reported features, I spent hours testing each assistant by issuing commands and asking questions that
many business users would use. I constructed questions to test basic understanding as well as contextual understanding and general vocal recognition.
Accessibility and trends
Ease of setup
Voice recognition
Success of queries and ability to understand context
Bottom line
None of the AI assistants are perfect; this is young technology, and it has a long way to go. There was a handful of questions that none of the virtual
assistants on my list could answer. For example, when I asked for directions to the closest airport, even the two best assistants on my list, Google
Assistant and Siri, failed hilariously: Google Assistant directed me to a travel agency (those still exist?), while Siri directed me to a seaplane base (so
close!).
Judging purely on out-of-the-box functionality, I would choose either Siri or Google Assistant, and I would make the final choice based on hardware
preferences. None of the assistants are good enough to go out of your way to adopt. Choose between Siri and Google Assistant based on convenience
and what hardware you already have.
IFTTT ("if this, then that") is a service that lets you connect apps, services, and smart home devices.
20. Amazon Alexa
From https://en.wikipedia.org/wiki/Amazon_Alexa
Amazon Alexa, also known simply as Alexa,[2] is a virtual assistant technology largely based on a Polish speech synthesiser
named Ivona, bought by Amazon in 2013.[3][4] It was first used in the Amazon Echo smart speaker and the Echo Dot, Echo
Studio and Amazon Tap speakers developed by Amazon Lab126. It is capable of voice interaction, music playback, making
to-do lists, setting alarms, streaming podcasts, playing audiobooks, and providing weather, traffic, sports, and other real-time
information, such as news.[5] Alexa can also control several smart devices using itself as a home automation system. Users
are able to extend the Alexa capabilities by installing "skills" (additional functionality developed by third-party vendors, in other
settings more commonly called apps) such as weather programs and audio features. It uses automatic speech recognition,
natural language processing, and other forms of weak AI to perform these tasks.[6]
Most devices with Alexa allow users to activate the device using a wake-word[7] (such as Alexa or Amazon); other devices
(such as the Amazon mobile app on iOS or Android and Amazon Dash Wand) require the user to click a button to activate
Alexa's listening mode, although some phones also allow a user to say a command, such as "Alexa" or "Alexa wake".
21. Google Assistant
From https://en.wikipedia.org/wiki/Google_Assistant
Google Assistant is a virtual assistant software application developed by Google that is primarily available on mobile and home
automation devices. Based on artificial intelligence, Google Assistant can engage in two-way conversations,[1] unlike the
company's previous virtual assistant, Google Now.
Google Assistant debuted in May 2016 as part of Google's messaging app Allo, and its voice-activated speaker Google Home.
After a period of exclusivity on the Pixel and Pixel XL smartphones, it was deployed on other Android devices starting in February
2017, including third-party smartphones and Android Wear (now Wear OS), and was released as a standalone app on
the iOS operating system in May 2017. Alongside the announcement of a software development kit in April 2017, Assistant has
been further extended to support a large variety of devices, including cars and third-party smart home appliances. The
functionality of the Assistant can also be enhanced by third-party developers.
Users primarily interact with the Google Assistant through natural voice, though keyboard input is also supported. Assistant is
able to answer questions, schedule events and alarms, adjust hardware settings on the user's device, show information from the
user's Google account, play games, and more. Google has also announced that Assistant will be able to identify objects and
gather visual information through the device's camera, and support purchasing products and sending money.
22. Apple Siri
https://en.wikipedia.org/wiki/Siri
Siri (/ˈsɪri/ SEER-ee) is a virtual assistant that is part of Apple Inc.'s iOS, iPadOS, watchOS, macOS, tvOS, and audioOS operating systems.[1]
[2] It uses voice queries, gesture-based control, focus-tracking and a natural-language user interface to answer questions, make
recommendations, and perform actions by delegating requests to a set of Internet services. With continued use, it adapts to users' individual
language usages, searches and preferences, returning individualized results.
Siri is a spin-off from a project developed by the SRI International Artificial Intelligence Center. Its speech recognition engine was provided
by Nuance Communications, and it uses advanced machine learning technologies to function. Its original American, British and
Australian voice actors recorded their respective voices around 2005, unaware of the recordings' eventual usage. Siri was released as an app
for iOS in February 2010. Two months later, Apple acquired it and integrated it into the iPhone 4S at its release on October 4, 2011, removing the
separate app from the iOS App Store. Siri has since been an integral part of Apple's products, having been adapted into other hardware
devices including newer iPhone models, iPad, iPod Touch, Mac, AirPods, Apple TV, and HomePod.
Siri supports a wide range of user commands, including performing phone actions, checking basic information, scheduling events and
reminders, handling device settings, searching the Internet, navigating areas, finding information on entertainment, and is able to engage with
iOS-integrated apps. With the release of iOS 10 in 2016, Apple opened up limited third-party access to Siri, including third-party messaging
apps, as well as payments, ride-sharing, and Internet calling apps. With the release of iOS 11, Apple updated Siri's voice and added support
for follow-up questions, language translation, and additional third-party actions.
23. Microsoft Cortana
From https://en.wikipedia.org/wiki/Cortana_(virtual_assistant)
Cortana is a virtual assistant developed by Microsoft that uses the Bing search engine to perform tasks such as setting reminders
and answering questions for the user.
Cortana is currently available in English, Portuguese, French, German, Italian, Spanish, Chinese, and Japanese language editions, depending
on the software platform and region in which it is used.[8]
Microsoft began reducing the prevalence of Cortana and converting it from an assistant into different software integrations in 2019.[9] It was split
from the Windows 10 search bar in April 2019.[10] In January 2020, the Cortana mobile app was removed from certain markets,[11][12] and on
March 31, 2021, the Cortana mobile app was shut down globally.[13]
Microsoft has integrated Cortana into numerous products such as Microsoft Edge,[28] the browser bundled with Windows 10. Microsoft's
Cortana assistant is deeply integrated into its Edge browser. Cortana can find opening hours when on restaurant sites, show retail coupons for
websites, or show weather information in the address bar. At the Worldwide Partners Conference 2015 Microsoft demonstrated Cortana
integration with products such as GigJam.[29] Conversely, Microsoft announced in late April 2016 that it would block anything other than Bing
and Edge from being used to complete Cortana searches, again raising questions of anti-competitive practices by the company.[30]
In May 2017, Microsoft in collaboration with Harman Kardon announced INVOKE, a voice-activated speaker featuring Cortana. The premium
speaker has a cylindrical design and offers 360 degree sound, the ability to make and receive calls with Skype, and all of the other features
currently available with Cortana.[42]
25. Machine Learning Types
From https://towardsdatascience.com/coding-deep-learning-for-beginners-types-of-machine-learning-b9e651e1ed9d
26. Perceptron
From https://deepai.org/machine-learning-glossary-and-terms/perceptron
How does a Perceptron work?
The process begins by taking all the input values and multiplying them by their weights. Then, all of these
multiplied values are added together to create the weighted sum. The weighted sum is then applied to the
activation function, producing the perceptron's output. The activation function plays the integral role of
ensuring the output is mapped between required values such as (0,1) or (-1,1). It is important to note that
the weight of an input is indicative of the strength of a node. Similarly, an input's bias value gives the
ability to shift the activation function curve up or down.
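The forward pass just described can be sketched in a few lines. The weights, bias, and step activation below are illustrative choices; with these hand-picked values the unit happens to compute logical AND:

```python
def perceptron(inputs, weights, bias):
    """Multiply inputs by weights, sum with the bias, apply a step activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum > 0 else 0   # output mapped into {0, 1}

# With these hand-picked weights and bias the perceptron computes logical AND:
# only (1, 1) pushes the weighted sum (1 + 1 - 1.5 = 0.5) above zero.
weights, bias = [1.0, 1.0], -1.5
for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron([a, b], weights, bias))
```

In training, the weights and bias would be adjusted from data rather than set by hand, but the forward computation stays exactly this.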
27. Ensemble Machine Learning
From https://machinelearningmastery.com/tour-of-ensemble-learning-algorithms/
Ensemble learning is a general meta approach to machine learning that seeks better predictive
performance by combining the predictions from multiple models.
Although there are a seemingly unlimited number of ensembles that you can develop for your predictive
modeling problem, there are three methods that dominate the field of ensemble learning. So much so, that
rather than algorithms per se, each is a field of study that has spawned many more specialized methods.
The three main classes of ensemble learning methods are bagging, stacking, and boosting, and it is
important to both have a detailed understanding of each method and to consider them on your predictive
modeling project.
But, before that, you need a gentle introduction to these approaches and the key ideas behind each method
prior to layering on math and code.
In this tutorial, you will discover the three standard ensemble learning techniques for machine learning.
After completing this tutorial, you will know:
• Bagging involves fitting many decision trees on different samples of the same dataset and averaging
the predictions.
• Stacking involves fitting many different models types on the same data and using another model to
learn how to best combine the predictions.
• Boosting involves adding ensemble members sequentially that correct the predictions made by prior
models and outputs a weighted average of the predictions.
28. Bagging
From https://en.wikipedia.org/wiki/Bootstrap_aggregating
Bootstrap aggregating, also called bagging, is a machine learning ensemble
meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in
statistical classification and regression. It also reduces variance and helps to avoid overfitting. Although it
is usually applied to decision tree methods, it can be used with any type of method. Bagging is a special
case of the model averaging approach.
Given a standard training set D of size n, bagging generates m new training sets Dᵢ, each of size nʹ, by
sampling from D uniformly and with replacement. By sampling with replacement, some observations may
be repeated in each Dᵢ. If nʹ=n, then for large n the set Dᵢ is expected to have the fraction (1 - 1/e) (≈63.2%) of
the unique examples of D, the rest being duplicates.[1] This kind of sample is known as a bootstrap sample.
Sampling with replacement ensures each bootstrap is independent from its peers, as it does not depend on
previous chosen samples when sampling. Then, m models are fitted using the above m bootstrap samples
and combined by averaging the output (for regression) or voting (for classification).
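The ≈63.2% unique-example figure is easy to check with a quick simulation; this is a minimal sketch, and the sample size and seed are arbitrary:

```python
import random

random.seed(0)
n = 100_000
# Draw one bootstrap sample: n picks from a set of n items, with replacement.
bootstrap = [random.randrange(n) for _ in range(n)]
# Fraction of the original items that appear at least once in the sample.
unique_fraction = len(set(bootstrap)) / n
print(round(unique_fraction, 3))   # close to 1 - 1/e, i.e. about 0.632
```

Fitting one model per such sample and averaging (or voting over) their outputs is exactly the bagging procedure described above.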
29. Boosting
From https://www.ibm.com/cloud/learn/boosting and
https://en.wikipedia.org/wiki/Boosting_(machine_learning)
In machine learning, boosting is an ensemble meta-algorithm for primarily reducing bias, and also variance,[1]
in supervised learning, and a family of machine learning algorithms that convert weak learners to strong ones.
Bagging vs Boosting
Bagging and boosting are two main types of ensemble learning methods. As highlighted in this study (PDF, 242 KB)
(link resides outside IBM), the main difference between these learning methods is the way in which they are trained.
In bagging, weak learners are trained in parallel, but in boosting, they learn sequentially. This means that a series of
models are constructed and with each new model iteration, the weights of the misclassified data in the previous
model are increased. This redistribution of weights helps the algorithm identify the parameters that it needs to focus
on to improve its performance. AdaBoost, which stands for “adaptive boosting algorithm,” is one of the most
popular boosting algorithms as it was one of the first of its kind. Other types of boosting algorithms include
XGBoost, GradientBoost, and BrownBoost.
Another difference between bagging and boosting is in how they are used. For example, bagging methods are
typically used on weak learners that exhibit high variance and low bias, whereas boosting methods are leveraged
when low variance and high bias is observed. While bagging can be used to avoid overfitting, boosting methods
can be more prone to this (link resides outside IBM) although it really depends on the dataset. However, parameter
tuning can help avoid the issue.
As a result, bagging and boosting have different real-world applications as well. Bagging has been leveraged for
loan approval processes and statistical genomics while boosting has been used more within image recognition
apps and search engines.
Boosting is an ensemble learning method that combines a set of weak learners into a strong learner
to minimize training errors. In boosting, a random sample of data is selected, fitted with a model and
then trained sequentially—that is, each model tries to compensate for the weaknesses of its
predecessor. With each iteration, the weak rules from each individual classifier are combined to form
one, strong prediction rule.
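The weight-redistribution step described above can be sketched as a single AdaBoost-style update. The toy labels and predictions below are made up for illustration:

```python
import math

# One AdaBoost-style weight update: examples the previous model got wrong
# have their weights increased, the rest decreased, then all are renormalized.
labels      = [ 1,  1, -1, -1,  1]
predictions = [ 1, -1, -1,  1,  1]      # previous model misclassified items 1 and 3
weights     = [0.2] * 5                 # uniform starting weights

# Weighted error of the previous model and its vote weight (alpha).
err   = sum(w for w, y, p in zip(weights, labels, predictions) if y != p)
alpha = 0.5 * math.log((1 - err) / err)

# Up-weight misclassified examples, down-weight correct ones, renormalize.
weights = [w * math.exp(alpha if y != p else -alpha)
           for w, y, p in zip(weights, labels, predictions)]
total   = sum(weights)
weights = [w / total for w in weights]
print([round(w, 3) for w in weights])   # items 1 and 3 now carry more weight
```

The next weak learner is then trained against these reweighted examples, which is what makes the sequence of models focus on the hard cases.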
30. Stacking
From https://www.geeksforgeeks.org/stacking-in-machine-learning/
Stacking is a way to ensemble multiple classification or regression models. There are many ways to ensemble
models; the widely known methods are bagging and boosting. Bagging allows multiple similar models with high
variance to be averaged to decrease variance. Boosting builds multiple incremental models to decrease the bias, while
keeping variance small.
Stacking (sometimes called Stacked Generalization) is a different paradigm. The point of stacking is to explore a
space of different models for the same problem. The idea is that you can attack a learning problem with different
types of models which are capable of learning some part of the problem, but not the whole space of the problem. So, you
can build multiple different learners and use them to build an intermediate prediction, one prediction for each
learned model. Then you add a new model which learns the same target from the intermediate predictions.
This final model is said to be stacked on the top of the others, hence the name. Thus, you might improve your overall
performance, and often you end up with a model which is better than any individual intermediate model. Notice
however, that it does not give you any guarantee, as is often the case with any machine learning technique.
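A minimal sketch of the idea: two deliberately weak base learners each capture part of a toy regression problem, and a meta-model learns how to combine their intermediate predictions. Everything here is invented for illustration, and the meta-model is a simple grid-searched convex combination rather than a trained learner:

```python
# Toy stacking sketch: the target is y = 2x + 1; neither base learner alone
# matches it, but a learned combination of their predictions does better.
data = [(x, 2 * x + 1) for x in range(10)]

def base1(x): return 2 * x      # right slope, missing the intercept
def base2(x): return x + 5      # wrong slope, but has an offset

# Meta-model: search for the convex combination weight w that minimizes
# squared error of w * base1(x) + (1 - w) * base2(x) against the target.
def stacked_error(w):
    return sum((w * base1(x) + (1 - w) * base2(x) - y) ** 2 for x, y in data)

best_w = min((i / 100 for i in range(101)), key=stacked_error)
print(best_w, round(stacked_error(best_w), 1))
```

The stacked combination ends up with lower error than either base learner used alone (w = 1.0 or w = 0.0), which is the improvement stacking aims for, though as the text notes there is no guarantee of it in general.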
31. Gradient Boosting
From https://en.wikipedia.org/wiki/Gradient_boosting
Gradient boosting is a machine learning technique used in regression and classification tasks, among others. It gives a
prediction model in the form of an ensemble of weak prediction models, which are typically decision trees.[1][2] When a
decision tree is the weak learner, the resulting algorithm is called gradient-boosted trees; it usually outperforms random
forest.[1][2][3] A gradient-boosted trees model is built in a stage-wise fashion as in other boosting methods, but it generalizes
the other methods by allowing optimization of an arbitrary differentiable loss function.
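The stage-wise construction can be sketched for squared loss, where each stage fits a decision stump to the current residuals (the negative gradient of squared loss). The data, number of stages, and learning rate below are illustrative choices:

```python
# Toy gradient-boosted-trees sketch: 1-D regression, squared loss, stumps.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1.0, 1.0, 2.0, 2.0, 8.0, 9.0, 9.0, 10.0]

def fit_stump(residuals):
    """Best single-threshold stump: mean residual on each side of the split."""
    best = None
    for s in range(1, len(xs)):
        lm = sum(residuals[:s]) / s
        rm = sum(residuals[s:]) / (len(xs) - s)
        sse = (sum((r - lm) ** 2 for r in residuals[:s])
               + sum((r - rm) ** 2 for r in residuals[s:]))
        if best is None or sse < best[0]:
            best = (sse, xs[s], lm, rm)
    _, threshold, lm, rm = best
    return lambda x: lm if x < threshold else rm

preds = [sum(ys) / len(ys)] * len(ys)        # stage 0: predict the mean
losses = []
for stage in range(10):                      # stage-wise additive fitting
    residuals = [y - p for y, p in zip(ys, preds)]   # -gradient of squared loss
    stump = fit_stump(residuals)
    preds = [p + 0.5 * stump(x) for x, p in zip(xs, preds)]  # learning rate 0.5
    losses.append(sum((y - p) ** 2 for y, p in zip(ys, preds)))

print(round(losses[0], 2), round(losses[-1], 2))   # loss shrinks across stages
```

For an arbitrary differentiable loss, the only change is that each stage fits the stump to that loss's negative gradient instead of the raw residuals, which is the generalization the text refers to.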
32. Introduction to XGBoost
From https://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/
37. Acumos Shared Model Process Flow
From https://arxiv.org/ftp/arxiv/papers/1810/1810.07159.pdf
38. Distributed AI
From https://en.wikipedia.org/wiki/Distributed_artificial_intelligence
Distributed Artificial Intelligence (DAI), also called Decentralized Artificial Intelligence,[1] is a subfield of artificial intelligence research dedicated to the
development of distributed solutions for problems. DAI is closely related to and a predecessor of the field of multi-agent systems.
The objectives of Distributed Artificial Intelligence are to solve the reasoning, planning, learning and perception problems of artificial intelligence,
especially if they require large data, by distributing the problem to autonomous processing nodes (agents). To reach the objective, DAI requires:
• A distributed system with robust and elastic computation on unreliable and failing resources that are loosely coupled
• Coordination of the actions and communication of the nodes
• Subsamples of large data sets and online machine learning
There are many reasons for wanting to distribute intelligence or cope with multi-agent systems. Mainstream problems in DAI research include the
following:
• Parallel problem solving: mainly deals with how classic artificial intelligence concepts can be modified, so that multiprocessor systems and clusters
of computers can be used to speed up calculation.
• Distributed problem solving (DPS): the concept of agent, autonomous entities that can communicate with each other, was developed to serve as an
abstraction for developing DPS systems. See below for further details.
• Multi-Agent Based Simulation (MABS): a branch of DAI that builds the foundation for simulations that need to analyze not only phenomena at
macro level but also at micro level, as it is in many social simulation scenarios.
39. Swarm Intelligence
From https://en.wikipedia.org/wiki/Swarm_intelligence
Swarm intelligence (SI) is the collective behavior of decentralized, self-organized systems, natural or artificial. The concept is employed in work
on artificial intelligence. The expression was introduced by Gerardo Beni and Jing Wang in 1989, in the context of cellular robotic systems.[1]
SI systems consist typically of a population of simple agents or boids interacting locally with one another and with their environment.[2] The
inspiration often comes from nature, especially biological systems. The agents follow very simple rules, and although there is no centralized control
structure dictating how individual agents should behave, local, and to a certain degree random, interactions between such agents lead to
the emergence of "intelligent" global behavior, unknown to the individual agents.[3] Examples of swarm intelligence in natural systems include ant
colonies, bee colonies, bird flocking, hawks hunting, animal herding, bacterial growth, fish schooling and microbial intelligence.
The application of swarm principles to robots is called swarm robotics while swarm intelligence refers to the more general set of algorithms. Swarm
prediction has been used in the context of forecasting problems. Similar approaches to those proposed for swarm robotics are considered
for genetically modified organisms in synthetic collective intelligence.[4]
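One of the metaheuristics listed below, particle swarm optimization, can be sketched in a few lines: each particle follows simple local rules (inertia plus pulls toward its own best and the swarm's best position), and the swarm collectively homes in on the optimum with no central controller. The parameter values are common textbook choices, not from the article:

```python
import random

random.seed(1)

f = lambda p: sum(x * x for x in p)      # sphere function, minimum at origin

dim, n_particles = 2, 20
pos   = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
vel   = [[0.0] * dim for _ in range(n_particles)]
pbest = [p[:] for p in pos]              # each particle's best position so far
gbest = min(pbest, key=f)                # the swarm's best position so far

for _ in range(100):
    for i in range(n_particles):
        for d in range(dim):
            r1, r2 = random.random(), random.random()
            vel[i][d] = (0.7 * vel[i][d]                         # inertia
                         + 1.5 * r1 * (pbest[i][d] - pos[i][d])  # cognitive pull
                         + 1.5 * r2 * (gbest[d] - pos[i][d]))    # social pull
            pos[i][d] += vel[i][d]
        if f(pos[i]) < f(pbest[i]):      # update personal and global bests
            pbest[i] = pos[i][:]
            if f(pbest[i]) < f(gbest):
                gbest = pbest[i][:]

print(round(f(gbest), 6))                # near 0: the swarm found the minimum
```

No particle knows the global structure of f; the "intelligent" convergence emerges from the local update rules, mirroring the emergence described above.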
• 1 Models of swarm behavior
◦ 1.1 Boids (Reynolds 1987)
◦ 1.2 Self-propelled particles (Vicsek et al. 1995)
• 2 Metaheuristics
◦ 2.1 Stochastic diffusion search (Bishop 1989)
◦ 2.2 Ant colony optimization (Dorigo 1992)
◦ 2.3 Particle swarm optimization (Kennedy, Eberhart & Shi 1995)
◦ 2.4 Artificial Swarm Intelligence (2015)
• 3 Applications
◦ 3.1 Ant-based routing
◦ 3.2 Crowd simulation
▪ 3.2.1 Instances
◦ 3.3 Human swarming
◦ 3.4 Swarm grammars
◦ 3.5 Swarmic art
40. IBM Watson
From https://en.wikipedia.org/wiki/IBM_Watson
IBM Watson is a question-answering computer system capable of answering questions posed in natural language,[2] developed in IBM's
DeepQA project by a research team led by principal investigator David Ferrucci.[3] Watson was named after IBM's founder and first CEO,
industrialist Thomas J. Watson.[4][5]
Software - Watson uses IBM's DeepQA software and the Apache UIMA (Unstructured Information Management Architecture) framework implementation. The system
was written in various languages, including Java, C++, and Prolog, and runs on the SUSE Linux Enterprise Server 11 operating system using the Apache Hadoop
framework to provide distributed computing.[12][13][14]
Hardware - The system is workload-optimized, integrating massively parallel POWER7 processors and built on IBM's DeepQA technology,[15] which it uses to generate
hypotheses, gather massive evidence, and analyze data.[2] Watson employs a cluster of ninety IBM Power 750 servers, each of which uses a 3.5 GHz POWER7 eight-
core processor, with four threads per core. In total, the system has 2,880 POWER7 processor threads and 16 terabytes of RAM.[15] According to John Rennie, Watson
can process 500 gigabytes (the equivalent of a million books) per second.[16] IBM master inventor and senior consultant Tony Pearson estimated Watson's hardware cost
at about three million dollars.[17] Its Linpack performance stands at 80 TeraFLOPs, which is about half as fast as the cut-off line for the Top 500 Supercomputers list.[18]
According to Rennie, all content was stored in Watson's RAM for the Jeopardy game because data stored on hard drives would be too slow to compete with human
Jeopardy champions.[16]
Data - The sources of information for Watson include encyclopedias, dictionaries, thesauri, newswire articles and literary works. Watson also used databases,
taxonomies and ontologies including DBPedia, WordNet and Yago.[19] The IBM team provided Watson with millions of documents, including dictionaries,
encyclopedias and other reference material, that it could use to build its knowledge.[20]
From https://www.researchgate.net/publication/282644173_Implementation_of_a_Natural_Language_Processing_Tool_for_Cyber-Physical_Systems/figures?lo=1
50. Neural Net Models
From https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463
51. Neural Net Models (cont)
From https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463
52. TensorFlow
From https://en.wikipedia.org/wiki/TensorFlow
TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of
tasks but has a particular focus on training and inference of deep neural networks.[4][5]
TensorFlow was developed by the Google Brain team for internal Google use in research and production.[6][7][8] The initial version
was released under the Apache License 2.0 in 2015.[1][9] Google released the updated version of TensorFlow, named TensorFlow 2.0,
in September 2019.[10]
TensorFlow can be used in a wide variety of programming languages, most notably Python, as well as JavaScript, C++, and Java.[11]
This flexibility lends itself to a range of applications in many different sectors.
53. Keras
From https://en.wikipedia.org/wiki/Keras
Keras is an open-source software library that provides a Python interface for artificial neural networks. Keras acts
as an interface for the TensorFlow library.
Up until version 2.3, Keras supported multiple backends, including TensorFlow, Microsoft Cognitive Toolkit,
Theano, and PlaidML.[1][2][3] As of version 2.4, only TensorFlow is supported. Designed to enable fast
experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible. It was
developed as part of the research effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot
Operating System),[4] and its primary author and maintainer is François Chollet, a Google engineer. Chollet is also
the author of the Xception deep neural network model.[5]
54. Comparison of Deep Learning Frameworks
From https://arxiv.org/pdf/1903.00102.pdf
55. Popularity of Deep Learning Frameworks
From https://medium.com/implodinggradients/tensorflow-or-keras-which-one-should-i-learn-5dd7fa3f9ca0
56. Acronyms in Deep Learning
• RBM - Restricted Boltzmann Machines
• MLP - Multi-layer Perceptron
• DBN - Deep Belief Network
• CNN - Convolutional Neural Network
• RNN - Recurrent Neural Network
• SGD - Stochastic Gradient Descent
• XOR - Exclusive Or
• SVM - Support Vector Machine
• ReLU - Rectified Linear Unit
• MNIST - Modified National Institute of Standards and Technology
• RBF - Radial Basis Function
• HMM - Hidden Markov Model
• MAP - Maximum A Posteriori
• MLE - Maximum Likelihood Estimate
• Adam - Adaptive Moment Estimation
• LSTM - Long Short Term Memory
• GRU - Gated Recurrent Unit
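Two of these acronyms lend themselves to a quick sketch. The following toy (not from the slides; the dataset and learning rate are assumptions) implements ReLU and uses SGD to fit a one-weight linear model:

```python
import random

def relu(z):
    """ReLU (Rectified Linear Unit): max(0, z)."""
    return max(0.0, z)

# SGD (Stochastic Gradient Descent) on a one-parameter least-squares fit y = w * x
random.seed(0)
data = [(x, 3.0 * x) for x in (random.uniform(-1, 1) for _ in range(100))]
w, lr = 0.0, 0.1
for epoch in range(20):
    random.shuffle(data)
    for x, y in data:                  # one update per training example
        grad = 2.0 * (w * x - y) * x   # d/dw of (w*x - y)^2
        w -= lr * grad
```

With exact labels, `w` converges to the true slope of 3.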
57. Concerns for Deep Learning by Gary Marcus
From https://arxiv.org/ftp/arxiv/papers/1801/1801.00631.pdf
Deep Learning thus far:
• Is data hungry
• Is shallow and has limited capacity for transfer
• Has no natural way to deal with hierarchical structure
• Has struggled with open-ended inference
• Is not sufficiently transparent
• Has not been well integrated with prior knowledge
• Cannot inherently distinguish causation from correlation
• Presumes a largely stable world, in ways that may be problematic
• Works well as an approximation, but answers often can’t be fully trusted
• Is difficult to engineer with
64. Bayesian Learning via Stochastic Gradient Langevin Dynamics
From https://tinyurl.com/22xayz76
In this paper we propose a new framework for learning from large scale
datasets based on iterative learning from small minibatches. By adding the
right amount of noise to a standard stochastic gradient optimization
algorithm we show that the iterates will converge to samples from the true
posterior distribution as we anneal the stepsize. This seamless transition
between optimization and Bayesian posterior sampling provides an in-
built protection against overfitting. We also propose a practical method for
Monte Carlo estimates of posterior statistics which monitors a “sampling
threshold” and collects samples after it has been surpassed. We apply the
method to three models: a mixture of Gaussians, logistic regression and
ICA with natural gradients.
Our method combines Robbins-Monro type algorithms which stochastically
optimize a likelihood, with Langevin dynamics which injects noise into the
parameter updates in such a way that the trajectory of the parameters will
converge to the full posterior distribution rather than just the maximum a
posteriori mode. The resulting algorithm starts off being similar to stochastic
optimization, then automatically transitions to one that simulates samples from
the posterior using Langevin dynamics.
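The transition described above can be sketched as a toy (this is an illustrative example, not the paper's code: it samples the posterior mean of a unit-variance Gaussian under an assumed N(0, 10²) prior, with a fixed small stepsize in place of the paper's annealing schedule):

```python
import math
import random

random.seed(0)
N, n = 1000, 10                                   # dataset size, minibatch size
data = [random.gauss(2.0, 1.0) for _ in range(N)]

def sgld(steps=3000, eps=1e-4, burn_in=1000):
    theta, samples = 0.0, []
    for t in range(steps):
        batch = random.sample(data, n)
        grad_prior = -theta / 100.0                          # gradient of log N(0, 10^2) prior
        grad_lik = (N / n) * sum(x - theta for x in batch)   # rescaled minibatch likelihood gradient
        # SGLD update: gradient step plus Gaussian noise of variance eps
        theta += 0.5 * eps * (grad_prior + grad_lik) + random.gauss(0.0, math.sqrt(eps))
        if t >= burn_in:                                     # crude burn-in, then collect samples
            samples.append(theta)
    return samples

samples = sgld()
```

The collected samples concentrate around the data mean (here near 2), approximating the posterior rather than just its mode.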
66. Bayesian Deep Learning Survey
From https://arxiv.org/pdf/1604.01662.pdf
Conclusion and Future Research
In this survey, we identified a current trend of merging probabilistic graphical models and neural networks (deep
learning) and reviewed recent work on Bayesian deep learning, which strives to combine the merits of PGM and NN by
organically integrating them in a single principled probabilistic framework. To learn parameters in BDL, several
algorithms have been proposed, ranging from block coordinate descent, Bayesian conditional density filtering, and
stochastic gradient thermostats to stochastic gradient variational Bayes. Bayesian deep learning gains its popularity
both from the success of PGM and from the recent promising advances on deep learning. Since many real-world tasks
involve both perception and inference, BDL is a natural choice to harness the perception ability from NN and the (causal
and logical) inference ability from PGM. Although current applications of BDL focus on recommender systems, topic
models, and stochastic optimal control, in the future, we can expect an increasing number of other applications like link
prediction, community detection, active learning, Bayesian reinforcement learning, and many other complex tasks that
need interaction between perception and causal inference. Besides, with the advances of efficient Bayesian neural
networks (BNN), BDL with BNN as an important component is expected to be more and more scalable.
67. Ensemble Methods for Deep Learning
From https://machinelearningmastery.com/ensemble-methods-for-deep-learning-neural-networks/
69. Seed Reinforcement Learning from Google
From https://ai.googleblog.com/2020/03/massively-scaling-reinforcement.html
The field of reinforcement learning (RL) has recently seen impressive results across a variety of tasks. This has in
part been fueled by the introduction of deep learning in RL and the introduction of accelerators such as GPUs. In
the very recent history, focus on massive scale has been key to solve a number of complicated games such as
AlphaGo (Silver et al., 2016), Dota (OpenAI, 2018) and StarCraft 2 (Vinyals et al., 2017).
The sheer amount of environment data needed to solve tasks trivial to humans makes distributed machine
learning unavoidable for fast experiment turnaround time. RL is inherently comprised of heterogeneous tasks:
running environments, model inference, model training, replay buffer, etc. and current state-of-the-art distributed
algorithms do not efficiently use compute resources for the tasks. The amount of data and inefficient use of
resources makes experiments unreasonably expensive. The two main challenges addressed in this paper are
scaling of reinforcement learning and optimizing the use of modern accelerators, CPUs and other resources.
We introduce SEED (Scalable, Efficient, Deep-RL), a modern RL agent that scales well, is flexible and efficiently
utilizes available resources. It is a distributed agent where model inference is done centrally combined with fast
streaming RPCs to reduce the overhead of inference calls. We show that with simple methods, one can achieve
state-of-the-art results faster on a number of tasks. For optimal performance, we use TPUs (cloud.google.com/
tpu/) and TensorFlow 2 (Abadi et al., 2015) to simplify the implementation. The cost of running SEED is analyzed
against IMPALA (Espeholt et al., 2018), which is a commonly used state-of-the-art distributed RL algorithm (Veeriah
et al. (2019); Li et al. (2019); Deverett et al. (2019); Omidshafiei et al. (2019); Vezhnevets et al. (2019); Hansen et
al. (2019); Schaarschmidt et al.; Tirumala et al. (2019), ...). We show cost reductions of up to 80% while being
significantly faster. When scaling SEED to many accelerators, it can train on millions of frames per second. Finally,
the implementation is open-sourced together with examples of running it at scale on Google Cloud (see Appendix
A.4 for details) making it easy to reproduce results and try novel ideas.
70. Designing Neural Nets through Neuroevolution
From tinyurl.com/mykhb52y
Much of recent machine learning has focused on deep learning, in which neural network weights are trained through
variants of stochastic gradient descent. An alternative approach comes from the field of neuroevolution, which harnesses
evolutionary algorithms to optimize neural networks, inspired by the fact that natural brains themselves are the products of
an evolutionary process. Neuroevolution enables important capabilities that are typically unavailable to gradient-based
approaches, including learning neural network building blocks (for example activation functions), hyperparameters,
architectures and even the algorithms for learning themselves. Neuroevolution also differs from deep learning (and deep
reinforcement learning) by maintaining a population of solutions during search, enabling extreme exploration and massive
parallelization. Finally, because neuroevolution research has (until recently) developed largely in isolation from gradient-
based neural network research, it has developed many unique and effective techniques that should be effective in other
machine learning areas too.
This Review looks at several key aspects of modern neuroevolution, including large-scale computing, the benefits of novelty
and diversity, the power of indirect encoding, and the field’s contributions to meta-learning and architecture search. Our hope
is to inspire renewed interest in the field as it meets the potential of the increasing computation available today, to highlight
how many of its ideas can provide an exciting resource for inspiration and hybridization to the deep learning, deep
reinforcement learning and machine learning communities, and to explain how neuroevolution could prove to be a critical
tool in the long-term pursuit of artificial general intelligence.
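A minimal flavor of this population-based search (a toy sketch, not from the Review; the target function, population size, and mutation scale are assumptions) evolves the two weights of a linear model with mutation and selection only, no gradients:

```python
import random

random.seed(0)
xs = [i / 10 for i in range(-10, 11)]

def fitness(w, b):
    """Negative squared error against the target function y = 2x - 1."""
    return -sum((w * x + b - (2.0 * x - 1.0)) ** 2 for x in xs)

# population of candidate weight pairs; keep the best, mutate them, repeat
pop = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(20)]
for gen in range(100):
    parents = sorted(pop, key=lambda p: fitness(*p), reverse=True)[:5]
    pop = parents + [(w + random.gauss(0, 0.1), b + random.gauss(0, 0.1))
                     for w, b in parents for _ in range(3)]

best_w, best_b = max(pop, key=lambda p: fitness(*p))
```

Because the whole population is scored independently each generation, this kind of search parallelizes trivially, which is the property the excerpt highlights.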
73. Transformers
From https://arxiv.org/pdf/1412.3555v1.pdf
A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input data. It is used primarily
in the fields of natural language processing (NLP)[1] and computer vision (CV).[2]
Like recurrent neural networks (RNNs), transformers are designed to process sequential input data, such as natural language, with applications towards tasks such as translation
and text summarization. However, unlike RNNs, transformers process the entire input all at once. The attention mechanism provides context for any position in the input
sequence. For example, if the input data is a natural language sentence, the transformer does not have to process one word at a time. This allows for more parallelization than
RNNs and therefore reduces training times.[1]
Transformers were introduced in 2017 by a team at Google Brain[1] and are increasingly the model of choice for NLP problems,[3] replacing RNN models such as long short-
term memory (LSTM). The additional training parallelization allows training on larger datasets. This led to the development of pretrained systems such as BERT (Bidirectional
Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer), which were trained with large language datasets, such as the Wikipedia Corpus
and Common Crawl, and can be fine-tuned for specific tasks.[4][5]
Attention mechanisms let a model draw from the state at any preceding point along the sequence. The attention layer can access all previous states and weight them according to
a learned measure of relevance, providing relevant information about far-away tokens. When added to RNNs, attention mechanisms increase performance. The development of
the Transformer architecture revealed that attention mechanisms were powerful in themselves and that sequential recurrent processing of data was not necessary to achieve the
quality gains of RNNs with attention. Transformers use an attention mechanism without an RNN, processing all tokens at the same time and calculating attention weights
between them in successive layers. Since the attention mechanism only uses information about other tokens from lower layers, it can be computed for all tokens in parallel,
which leads to improved training speed.
Like earlier seq2seq models, the original Transformer model used an encoder–decoder architecture. The encoder consists of encoding layers that process the input iteratively
one layer after another, while the decoder consists of decoding layers that do the same thing to the encoder's output. The function of each encoder layer is to generate encodings
that contain information about which parts of the inputs are relevant to each other. It passes its encodings to the next encoder layer as inputs. Each decoder layer does the
opposite, taking all the encodings and using their incorporated contextual information to generate an output sequence.[6] To achieve this, each encoder and decoder layer makes
use of an attention mechanism. For each input, attention weighs the relevance of every other input and draws from them to produce the output.[7] Each decoder layer has an
additional attention mechanism that draws information from the outputs of previous decoders, before the decoder layer draws information from the encodings. Both the encoder
and decoder layers have a feed-forward neural network for additional processing of the outputs and contain residual connections and layer normalization steps.
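The attention computation described above reduces to a few lines. A sketch of scaled dot-product attention in NumPy (illustrative; the toy shapes are assumptions):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)              # pairwise relevance between tokens
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)      # each query's weights sum to 1
    return w @ V

# self-attention over three tokens with embedding dimension 4; all tokens
# are processed at once, with no recurrence
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))
out = attention(X, X, X)                       # Q = K = V = X
```

Every row of the output is a relevance-weighted mixture of all value vectors, which is why the computation parallelizes across tokens.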
74. Transformers
From https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)
Before transformers, most state-of-the-art NLP systems relied on gated RNNs, such as LSTMs and gated recurrent units (GRUs), with added
attention mechanisms. Transformers also make use of attention mechanisms but, unlike RNNs, do not have a recurrent structure. This means that
provided with enough training data, attention mechanisms alone can match the performance of RNNs with attention.[1]
Sequential processing
Gated RNNs process tokens sequentially, maintaining a state vector that contains a representation of the data seen prior to the current token. To
process the nth token, the model combines the state representing the sentence up to token n − 1 with the information of the new token to create a new
state, representing the sentence up to token n. Theoretically, the information from one token can propagate arbitrarily far down the sequence, if at
every point the state continues to encode contextual information about the token. In practice this mechanism is flawed: the vanishing gradient
problem leaves the model's state at the end of a long sentence without precise, extractable information about preceding tokens. The dependency of
token computations on results of previous token computations also makes it hard to parallelize computation on modern deep learning hardware.
This can make the training of RNNs inefficient.
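The sequential update described above can be written out directly (an illustrative toy, not from the article; the scalar weights are assumptions). Each state depends on the previous one, which is exactly what prevents parallelizing over time steps:

```python
import math

def run_rnn(tokens, w_in=0.5, w_rec=0.9):
    """Minimal recurrent state update: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h, states = 0.0, []
    for x in tokens:          # strictly sequential: h_t cannot be computed before h_{t-1}
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

states = run_rnn([1.0, 0.0, 0.0, 0.0])
```

The influence of the first token fades step by step here, a scalar caricature of the vanishing-gradient problem noted above.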
Self-Attention
These problems were addressed by attention mechanisms. Attention mechanisms let a model draw from the state at any preceding point along the
sequence. The attention layer can access all previous states and weight them according to a learned measure of relevance, providing relevant
information about far-away tokens.
A clear example of the value of attention is in language translation, where context is essential to assign the meaning of a word in a sentence. In an
English-to-French translation system, the first word of the French output most probably depends heavily on the first few words of the English input.
However, in a classic LSTM model, in order to produce the first word of the French output, the model is given only the state vector after processing
the last English word. Theoretically, this vector can encode information about the whole English sentence, giving the model all necessary
knowledge. In practice, this information is often poorly preserved by the LSTM. An attention mechanism can be added to address this problem: the
decoder is given access to the state vectors of every English input word, not just the last, and can learn attention weights that dictate how much to
attend to each English input state vector.
When added to RNNs, attention mechanisms increase performance. The development of the Transformer architecture revealed that attention
mechanisms were powerful in themselves and that sequential recurrent processing of data was not necessary to achieve the quality gains of RNNs
with attention. Transformers use an attention mechanism without an RNN, processing all tokens at the same time and calculating attention weights
between them in successive layers. Since the attention mechanism only uses information about other tokens from lower layers, it can be computed
for all tokens in parallel, which leads to improved training speed.
75. GPT-3
From https://en.wikipedia.org/wiki/GPT-3
Generative Pre-trained Transformer 3 (GPT-3; stylized GPT·3) is an autoregressive language model that uses deep learning to
produce human-like text.
The architecture is a standard transformer network (with a few engineering tweaks) with the unprecedented size of 2048-token-long
context and 175 billion parameters (requiring 800 GB of storage). The training method is "generative pretraining", meaning that it is
trained to predict what the next token is. The model demonstrated strong few-shot learning on many text-based tasks.
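The "predict the next token" objective can be illustrated at toy scale with a bigram counter (a deliberate simplification, not GPT-3's transformer; the corpus is an assumption):

```python
from collections import Counter, defaultdict

# "pretraining": gather next-token statistics from a tiny corpus
corpus = "the cat sat on the mat . the dog sat on the rug .".split()
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(token):
    """Return the most frequent next token seen after `token`."""
    return counts[token].most_common(1)[0][0]
```

GPT-3 applies the same objective with a 175-billion-parameter transformer over hundreds of billions of tokens, but the training signal is the same: which token comes next.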
It is the third-generation language prediction model in the GPT-n series (and the successor to GPT-2) created by OpenAI, a San
Francisco-based artificial intelligence research laboratory.[2] GPT-3's full version has a capacity of 175 billion machine learning
parameters. GPT-3, which was introduced in May 2020 and was in beta testing as of July 2020,[3] is part of a trend in natural language
processing (NLP) systems of pre-trained language representations.[1]
The quality of the text generated by GPT-3 is so high that it can be difficult to determine whether or not it was written by a human,
which has both benefits and risks.[4] Thirty-one OpenAI researchers and engineers presented the original May 28, 2020 paper
introducing GPT-3. In their paper, they warned of GPT-3's potential dangers and called for research to mitigate risk.[1]:34 David
Chalmers, an Australian philosopher, described GPT-3 as "one of the most interesting and important AI systems ever produced."[5]
Microsoft announced on September 22, 2020, that it had licensed "exclusive" use of GPT-3; others can still use the public API to receive
output, but only Microsoft has access to GPT-3's underlying model.[6]
An April 2022 review in The New York Times described GPT-3's capabilities as being able to write original prose with fluency
equivalent to that of a human.[7]
76. OpenAI
From https://openai.com/
Recent Research
Efficient Training of Language Models to Fill in the Middle
Hierarchical Text-Conditional Image Generation with CLIP Latents
Formal Mathematics Statement Curriculum Learning
Training language models to follow instructions with human feedback
Text and Code Embeddings by Contrastive Pre-Training
WebGPT: Browser-assisted question-answering with human feedback
Training Verifiers to Solve Math Word Problems
Recursively Summarizing Books with Human Feedback
Evaluating Large Language Models Trained on Code
Process for Adapting Language Models to
Society (PALMS) with Values-Targeted Datasets
Multimodal Neurons in Artificial Neural Networks
Learning Transferable Visual Models From Natural Language Supervision
Zero-Shot Text-to-Image Generation
Understanding the Capabilities, Limitations,
and Societal Impact of Large Language Models
OpenAI is an AI research and deployment company. Our mission is to ensure that artificial general intelligence benefits all of humanity.
78. Reservoir Computing
From https://martinuzzifrancesco.github.io/posts/a-brief-introduction-to-reservoir-computing/
Reservoir Computing is an umbrella term used to identify a general framework of computation derived from Recurrent Neural Networks (RNN),
independently developed by Jaeger [1] and Maass et al. [2]. These papers introduced the concepts of Echo State Networks (ESN) and Liquid State Machines
(LSM) respectively. Further improvements over these two models constitute what is now called the field of Reservoir Computing. The main idea lies in
leveraging a fixed non-linear system, of higher dimension than the input, onto which the input signal is mapped. After this mapping it is only necessary to use a
simple readout layer to harvest the state of the reservoir and to train it to the desired output. In principle, given a complex enough system, this architecture
should be capable of any computation [3]. The intuition was born from the fact that, when training RNNs, the weights showing the most change were usually
the ones in the last layer [4]. In the next section we will also see that ESNs actually use a fixed random RNN as the reservoir. Given the static nature of this
implementation, ESNs can usually yield faster results, and in some cases better ones, in particular when dealing with chaotic time series predictions [5].
But not every complex system is suited to be a good reservoir. A good reservoir is one that is able to separate inputs; different external inputs should drive the
system to different regions of the configuration space [3]. This is called the separability condition. Furthermore, an important property for the reservoirs of
ESNs is the Echo State property, which states that inputs to the reservoir echo in the system forever, or until they dissipate. A more formal definition of this
property can be found in [6].
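A minimal Echo State Network along these lines can be sketched in NumPy (illustrative; the reservoir size, spectral radius, and ridge penalty are assumed values). The reservoir is fixed and random; only the linear readout is trained:

```python
import numpy as np

rng = np.random.default_rng(0)
n_res = 100
W_in = rng.uniform(-0.5, 0.5, (n_res, 1))        # fixed input weights
W = rng.uniform(-0.5, 0.5, (n_res, n_res))       # fixed random recurrent weights
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius < 1 (echo state property)

def run_reservoir(u):
    x, states = np.zeros(n_res), []
    for u_t in u:
        x = np.tanh(W_in[:, 0] * u_t + W @ x)    # reservoir state update (never trained)
        states.append(x.copy())
    return np.array(states)

# train only the linear readout (ridge regression) to predict the next sine value
t = np.linspace(0, 20 * np.pi, 2000)
u = np.sin(t)
X, y = run_reservoir(u[:-1])[100:], u[1:][100:]  # drop a 100-step washout transient
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ y)
mse = np.mean((X @ W_out - y) ** 2)
```

Training reduces to one linear solve, which is why the excerpt stresses the minimal computing resources this approach needs.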
Reservoir computing is a best-in-class machine learning algorithm for processing information generated by dynamical systems using observed time-series
data. Importantly, it requires very small training data sets, uses linear optimization, and thus requires minimal computing resources. However, the
algorithm uses randomly sampled matrices to define the underlying recurrent neural network and has a multitude of metaparameters that must be
optimized. Recent results demonstrate the equivalence of reservoir computing to nonlinear vector autoregression, which requires no random matrices,
fewer metaparameters, and provides interpretable results. Here, we demonstrate that nonlinear vector autoregression excels at reservoir computing
benchmark tasks and requires even shorter training data sets and training time, heralding the next generation of reservoir computing.
A dynamical system evolves in time, with examples including the Earth’s weather system and human-built devices such as unmanned aerial vehicles. One practical
goal is to develop models for forecasting their behavior. Recent machine learning (ML) approaches can generate a model using only observed data, but many of these
algorithms tend to be data hungry, requiring long observation times and substantial computational resources.
Reservoir computing1,2 is an ML paradigm that is especially well-suited for learning dynamical systems. Even when systems display chaotic3 or complex
spatiotemporal behaviors4, which are considered the hardest-of-the-hard problems, an optimized reservoir computer (RC) can handle them with ease.
From https://www.nature.com/articles/s41467-021-25801-2
80. Brain Connectivity meets Reservoir Computing
From https://www.biorxiv.org/content/10.1101/2021.01.22.427750v1
The connectivity of Artificial Neural Networks (ANNs) is different from the one observed in Biological Neural Networks (BNNs).
Can the wiring of actual brains help improve ANNs architectures? Can we learn from ANNs about what network features support
computation in the brain when solving a task?
ANNs’ architectures are carefully engineered and have crucial importance in many recent performance improvements. On the
other hand, BNNs exhibit complex emergent connectivity patterns. At the individual level, BNN connectivity results from brain
development and plasticity processes, while at the species level, adaptive reconfigurations during evolution also play a major role
in shaping connectivity.
Ubiquitous features of brain connectivity have been identified in recent years, but their role in the brain’s ability to perform
concrete computations remains poorly understood. Computational neuroscience studies reveal the influence of specific brain
connectivity features only on abstract dynamical properties, although the implications of real brain networks topologies on
machine learning or cognitive tasks have been barely explored.
Here we present a cross-species study with a hybrid approach integrating real brain connectomes and Bio-Echo State Networks,
which we use to solve concrete memory tasks, allowing us to probe the potential computational implications of real brain
connectivity patterns on task solving.
We find results consistent across species and tasks, showing that biologically inspired networks perform as well as classical echo
state networks, provided a minimum level of randomness and diversity of connections is allowed. We also present a framework,
bio2art, to map and scale up real connectomes that can be integrated into recurrent ANNs. This approach also allows us to show
the crucial importance of the diversity of interareal connectivity patterns, stressing the importance of stochastic processes
determining neural network connectivity in general.
89. HPC vs Big Data Ecosystems
From https://www.hpcwire.com/2018/08/31/the-convergence-of-big-data-and-extreme-scale-hpc/
90. HPC and ML
From http://dsc.soic.indiana.edu/publications/Learning_Everywhere_Summary.pdf
• HPCforML: Using HPC to execute and enhance ML performance, or using HPC simulations to train ML algorithms (theory-guided
machine learning), which are then used to understand experimental data or simulations.
• MLforHPC: Using ML to enhance HPC applications and systems.
This categorization is related to Jeff Dean’s ”Machine Learning for Systems and Systems for Machine Learning” [6] and
Matsuoka’s convergence of AI and HPC [7]. We further subdivide HPCforML as:
• HPCrunsML: Using HPC to execute ML with high performance.
• SimulationTrainedML: Using HPC simulations to train ML algorithms, which are then used to understand experimental data or
simulations.
We also subdivide MLforHPC as:
• MLautotuning: Using ML to configure (autotune) ML or HPC simulations. Already, autotuning with systems like ATLAS is
hugely successful and gives an initial view of MLautotuning. As well as choosing block sizes to improve cache use and
vectorization, MLautotuning can also be used for simulation mesh sizes [8] and in big data problems for configuring databases
and complex systems like Hadoop and Spark [9], [10].
• MLafterHPC: ML analyzing results of HPC, as in trajectory analysis and structure identification in biomolecular simulations.
• MLaroundHPC: Using ML to learn from simulations and produce learned surrogates for the simulations. The same ML wrapper
can also learn configurations as well as results. This differs from SimulationTrainedML, where typically a learned network is
used to redirect observation, whereas in MLaroundHPC we are using the ML to improve the HPC performance.
• MLControl: Using simulations (with HPC) in control of experiments and in objective-driven computational campaigns [11].
Here the simulation surrogates are very valuable to allow real-time predictions.
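As a toy illustration of the MLaroundHPC idea (an assumption-laden sketch, not from the paper), a cheap learned surrogate can be fit to the outputs of an "expensive" simulation and then queried in its place:

```python
import numpy as np

def expensive_simulation(x):
    # stand-in for an HPC code: some smooth response surface
    return np.sin(3 * x) + 0.5 * x ** 2

# run the "simulation" at a few training points, then fit a polynomial surrogate
x_train = np.linspace(-2, 2, 40)
y_train = expensive_simulation(x_train)
surrogate = np.poly1d(np.polyfit(x_train, y_train, 11))

# the surrogate now answers new queries without re-running the simulation
x_new = np.linspace(-2, 2, 200)
max_err = np.max(np.abs(surrogate(x_new) - expensive_simulation(x_new)))
```

Real surrogates replace far more costly codes with learned networks, but the workflow is the same: run the simulation a limited number of times, fit, then predict.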
91. Designing Neural Nets through Neuroevolution
From www.evolvingai.org/stanley-clune-lehman-2019-designing-neural-networks
93. Deep Density Destructors
From https://www.cs.cmu.edu/~dinouye/papers/inouye2018-deep-density-destructors-icml2018.pdf
We propose a unified framework for deep density models by formally defining density
destructors. A density destructor is an invertible function that transforms a given density to
the uniform density—essentially destroying any structure in the original density. This
destructive transformation generalizes Gaussianization via ICA and more recent
autoregressive models such as MAF and Real NVP. Informally, this transformation can be
seen as a generalized whitening procedure or a multivariate generalization of the univariate
CDF function. Unlike Gaussianization, our destructive transformation has the elegant
property that the density function is equal to the absolute value of the Jacobian determinant.
Thus, each layer of a deep density can be seen as a shallow density—uncovering a
fundamental connection between shallow and deep densities. In addition, our framework
provides a common interface for all previous methods enabling them to be systematically
combined, evaluated and improved. Leveraging the connection to shallow densities, we also
propose a novel tree destructor based on tree densities and an image-specific destructor based
on pixel locality. We illustrate our framework on a 2D dataset, MNIST, and CIFAR-10.
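In the univariate case the "generalized CDF" view above is easy to demonstrate (an illustrative sketch, not the paper's code): applying the Gaussian CDF to Gaussian samples destroys their structure, leaving uniform values on [0, 1]:

```python
import math
import random

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    """The CDF of N(mu, sigma^2): a destructor mapping that density to Uniform(0, 1)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(10000)]
destroyed = [gaussian_cdf(s) for s in samples]
mean = sum(destroyed) / len(destroyed)   # ~0.5 if the structure is destroyed
```

The paper's deep destructors stack invertible multivariate transformations of this kind, one per layer.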
95. Sci-Kit Learning Decision Tree
From https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463
100. Paired Open Ended Trailblazer (POET)
From https://drive.google.com/file/d/12QdNmMll-bGlSWnm8pmD_TawuRN7xagX/view
101. One Model to Learn Them All
From https://arxiv.org/pdf/1706.05137.pdf
102. Self-modifying NNs With Differentiable Neuromodulated Plasticity
From https://arxiv.org/pdf/1706.05137.pdf
107. Graphics Processing Units (GPUs)
From https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html
Graphics processing technology has evolved to deliver unique benefits in the world of computing. The latest
graphics processing units (GPUs) unlock new possibilities in gaming, content creation, machine learning, and more.
What Does a GPU Do?
The graphics processing unit, or GPU, has become one of the most important types of computing technology, both for personal
and business computing. Designed for parallel processing, the GPU is used in a wide range of applications, including graphics and
video rendering. Although they’re best known for their capabilities in gaming, GPUs are becoming more popular for use in
creative production and artificial intelligence (AI).
GPUs were originally designed to accelerate the rendering of 3D graphics. Over time, they became more flexible and
programmable, enhancing their capabilities. This allowed graphics programmers to create more interesting visual effects and
realistic scenes with advanced lighting and shadowing techniques. Other developers also began to tap the power of GPUs to
dramatically accelerate additional workloads in high performance computing (HPC), deep learning, and more.
GPU and CPU: Working Together
The GPU evolved as a complement to its close cousin, the CPU (central processing unit). While CPUs have continued to deliver performance
increases through architectural innovations, faster clock speeds, and the addition of cores, GPUs are specifically designed to accelerate
computer graphics workloads. When shopping for a system, it can be helpful to know the role of the CPU vs. GPU so you can make the most
of both.
GPU vs. Graphics Card: What’s the Difference?
While the terms GPU and graphics card (or video card) are often used interchangeably, there is a subtle distinction between these terms.
Much like a motherboard contains a CPU, a graphics card refers to an add-in board that incorporates the GPU. This board also includes the
raft of components required to both allow the GPU to function and connect to the rest of the system.
GPUs come in two basic types: integrated and discrete. An integrated GPU does not come on its own separate card at all and is instead
embedded alongside the CPU. A discrete GPU is a distinct chip that is mounted on its own circuit board and is typically attached to a PCI
Express slot.
108. Nvidia Graphics Processing Units (GPUs)
From https://en.wikipedia.org/wiki/Nvidia
Nvidia Corporation[note 1][note 2] (/ɛnˈvɪdiə/ en-VID-ee-ə) is an American multinational technology company incorporated in
Delaware and based in Santa Clara, California.[2] It is a software and fabless company which designs graphics processing units
(GPUs), application programming interfaces (APIs) for data science and high-performance computing, as well as system on a chip
units (SoCs) for the mobile computing and automotive market. Nvidia is a global leader in artificial intelligence hardware and
software.[3][4] Its professional line of GPUs are used in workstations for applications in such fields as architecture, engineering and
construction, media and entertainment, automotive, scientific research, and manufacturing design.[5]
In addition to GPU manufacturing, Nvidia provides an API called CUDA that allows the creation of massively parallel programs
which utilize GPUs.[6][7] They are deployed in supercomputing sites around the world.[8][9] More recently, it has moved into the
mobile computing market, where it produces Tegra mobile processors for smartphones and tablets as well as vehicle navigation
and entertainment systems.[10][11][12] In addition to AMD, its competitors include Intel,[13] Qualcomm[14] and AI-accelerator
companies such as Graphcore.
Nvidia's GPUs are used for edge-to-cloud computing and in supercomputers: Nvidia provides the accelerators (the GPUs) for many
of them, including a previous fastest system, while the current fastest and most power-efficient systems are powered by AMD GPUs
and CPUs. Nvidia has also expanded its presence in the gaming industry with its handheld game consoles Shield Portable, Shield
Tablet, and Shield Android TV, and its cloud gaming service GeForce Now.
Nvidia announced plans on September 13, 2020, to acquire Arm from SoftBank, pending regulatory approval, for a value of
US$40 billion in stock and cash, which would be the largest semiconductor acquisition to date. SoftBank Group will acquire
slightly less than a 10% stake in Nvidia, and Arm would maintain its headquarters in Cambridge.[15][16][17][18]
109. Tesla Unveils New Dojo Supercomputer
From https://electrek.co/2022/10/01/tesla-dojo-supercomputer-tripped-power-grid/
Tesla has unveiled the latest version of its Dojo supercomputer, and it is apparently so powerful that it tripped the power grid in Palo
Alto. Dojo is Tesla’s own custom supercomputer platform, built from the ground up for AI machine learning and, more specifically,
for video training using the video data coming from its fleet of vehicles.
The automaker already has a large NVIDIA GPU-based supercomputer that is one of the most powerful in the world, but the new
Dojo computer uses chips and an entire infrastructure designed by Tesla. The custom-built supercomputer is
expected to elevate Tesla’s capacity to train neural nets using video data, which is critical to the computer vision technology
powering its self-driving effort.
Last year, at Tesla’s AI Day, the company unveiled its Dojo supercomputer, but it was still ramping up the effort at the
time. It only had its first chip and training tiles, and it was still working on building a full Dojo cabinet and cluster, or
“Exapod.” Now Tesla has unveiled the progress made with the Dojo program over the last year during its AI Day 2022 last night.
Why does Tesla need the Dojo supercomputer?
It’s a fair question. Why is an automaker developing the world’s most powerful supercomputer? Well, Tesla would tell you that it’s
not just an automaker, but a technology company developing products to accelerate the transition to a sustainable economy. Musk
said it makes sense to offer Dojo as a service, perhaps to take on his buddy Jeff Bezos’s Amazon AWS, calling it a “service
that you can use that’s available online where you can train your models way faster and for less money.”
But more specifically, Tesla needs Dojo to auto-label training videos from its fleet and to train the neural nets behind its self-driving
system. Tesla realized that its approach to developing a self-driving system, using neural nets trained on millions of videos coming
from its customer fleet, requires a lot of computing power, so it decided to develop its own supercomputer to deliver that power.
That’s the short-term goal, but Tesla will have plenty of use for the supercomputer going forward, as it has big ambitions to
develop other artificial intelligence programs.
112. Introduction to Deep Reinforcement Learning
From https://skymind.ai/wiki/deep-reinforcement-learning
Many RL references at this site
113. Model-based Reinforcement Learning
From http://rail.eecs.berkeley.edu/deeprlcourse-fa17/f17docs/lecture_9_model_based_rl.pdf
114. Hierarchical Deep Reinforcement Learning
From https://papers.nips.cc/paper/6233-hierarchical-deep-reinforcement-learning-integrating-temporal-abstraction-and-intrinsic-motivation.pdf
115. Meta Learning Shared Hierarchies
From https://skymind.ai/wiki/deep-reinforcement-learning
116. Learning with Hierarchical Deep Models
From https://www.cs.toronto.edu/~rsalakhu/papers/HD_PAMI.pdf
We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture
that integrates deep learning models with structured hierarchical Bayesian (HB) models.
Specifically, we show how we can learn a hierarchical Dirichlet process (HDP) prior over the
activities of the top-level features in a deep Boltzmann machine (DBM). This compound HDP-
DBM model learns to learn novel concepts from very few training examples by learning low-
level generic features, high-level features that capture correlations among low-level features,
and a category hierarchy for sharing priors over the high-level features that are typical of
different kinds of concepts. We present efficient learning and inference algorithms for the
HDP-DBM model and show that it is able to learn new concepts from very few examples on
CIFAR-100 object recognition, handwritten character recognition, and human motion capture
datasets.
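The abstract above turns on sharing priors across related concepts so a new concept can be learned from very few examples. A minimal, hypothetical sketch of that idea (far simpler than the actual HDP-DBM; all names here are invented) is shrinkage of a scarce-data estimate toward a category-level prior learned from well-sampled sibling concepts:

```python
import numpy as np

def posterior_mean(examples, prior_mean, prior_strength=5.0):
    # Conjugate-style posterior mean for a new concept: with few
    # examples the estimate leans on the shared category prior;
    # with many examples the data dominates.
    examples = np.asarray(examples, dtype=float)
    n = len(examples)
    return (prior_strength * prior_mean + examples.sum()) / (prior_strength + n)

# Category-level prior built from sibling concepts that have
# plenty of data (values are made up for illustration).
prior = np.mean([4.1, 3.9, 4.0])            # ~4.0
est = posterior_mean([6.0], prior)          # one example of a new concept
print(round(est, 3))  # 4.333, pulled strongly toward the shared prior
```

This is the one-level version of the paper's point; the HDP adds a learned hierarchy over which categories share which priors.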
118. Convolutional Deep Belief Networks for Scalable
Unsupervised Learning of Hierarchical Representations
From https://web.eecs.umich.edu/~honglak/icml09-ConvolutionalDeepBeliefNetworks.pdf
There has been much interest in unsupervised learning of hierarchical generative models such as deep belief networks. Scaling such models to
full-sized, high-dimensional images remains a difficult problem. To address this problem, we present the convolutional deep belief network, a
hierarchical generative model which scales to realistic image sizes. This model is translation-invariant and supports efficient bottom-up and top-
down probabilistic inference. Key to our approach is probabilistic max-pooling, a novel technique which shrinks the representations of higher
layers in a probabilistically sound way. Our experiments show that the algorithm learns useful high-level visual features, such as object parts, from
unlabeled images of objects and natural scenes. We demonstrate excellent performance on several visual recognition tasks and show that our
model can perform hierarchical (bottom-up and top-down) inference over full-sized images.
The visual world can be described at many levels: pixel intensities, edges, object parts, objects, and beyond. The prospect of learning hierarchical
models which simultaneously represent multiple levels has recently generated much interest. Ideally, such “deep” representations would learn
hierarchies of feature detectors, and further be able to combine top-down and bottom-up processing of an image. For instance, lower layers could
support object detection by spotting low-level features indicative of object parts. Conversely, information about objects in the higher layers could
resolve lower-level ambiguities in the image or infer the locations of hidden object parts. Deep architectures consist of feature detector units
arranged in layers. Lower layers detect simple features and feed into higher layers, which in turn detect more complex features. There have been
several approaches to learning deep networks (LeCun et al., 1989; Bengio et al., 2006; Ranzato et al., 2006; Hinton et al., 2006). In particular, the
deep belief network (DBN) (Hinton et al., 2006) is a multilayer generative model where each layer encodes statistical dependencies among the
units in the layer below it; it is trained to (approximately) maximize the likelihood of its training data. DBNs have been successfully used to learn
high-level structure in a wide variety of domains, including handwritten digits (Hinton et al., 2006) and human motion capture data (Taylor et al.,
2007). We build upon the DBN in this paper because we are interested in learning a generative model of images which can be trained in a purely
unsupervised manner.
This paper presents the convolutional deep belief network, a hierarchical generative model that scales to full-sized images. Another key to our
approach is probabilistic max-pooling, a novel technique that allows higher-layer units to cover larger areas of the input in a probabilistically
sound way. To the best of our knowledge, ours is the first translation invariant hierarchical generative model which supports both top-down and
bottom-up probabilistic inference and scales to realistic image sizes. The first, second, and third layers of our network learn edge detectors, object
parts, and objects respectively. We show that these representations achieve excellent performance on several visual recognition tasks and allow
“hidden” object parts to be inferred from high-level object information.
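A small sketch may help fix ideas about pooling. The code below implements ordinary deterministic max-pooling, which shrinks each k-by-k block of a feature map to a single value; the paper's probabilistic max-pooling instead places a distribution over which unit in a block (if any) is active, but the shrinking of higher-layer representations is the same:

```python
import numpy as np

def max_pool(x, k=2):
    # Collapse each k-by-k block of the feature map to its
    # maximum, halving each spatial dimension when k=2.
    h, w = x.shape
    return x.reshape(h // k, k, w // k, k).max(axis=(1, 3))

x = np.array([[1, 2, 5, 0],
              [3, 4, 1, 1],
              [0, 0, 2, 2],
              [1, 0, 3, 9]])
print(max_pool(x).tolist())  # [[4, 5], [1, 9]]
```

Each higher-layer unit thus covers a larger area of the input, which is what lets the network scale to full-sized images.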
119. Learning with Hierarchical-Deep Models
From https://www.cs.toronto.edu/~rsalakhu/papers/HD_PAMI.pdf
We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture that integrates deep learning models with structured
hierarchical Bayesian (HB) models. Specifically, we show how we can learn a hierarchical Dirichlet process (HDP) prior over the activities of the
top-level features in a deep Boltzmann machine (DBM). This compound HDP-DBM model learns to learn novel concepts from very few training
examples by learning low-level generic features, high-level features that capture correlations among low-level features, and a category hierarchy for
sharing priors over the high-level features that are typical of different kinds of concepts. We present efficient learning and inference algorithms for
the HDP-DBM model and show that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten
character recognition, and human motion capture datasets.
The ability to learn abstract representations that support transfer to novel but related tasks lies at the core of many problems in computer vision,
natural language processing, cognitive science, and machine learning. In typical applications of machine classification algorithms today, learning a
new concept requires tens, hundreds, or thousands of training examples. For human learners, however, just one or a few examples are often
sufficient to grasp a new category and make meaningful generalizations to novel instances [15], [25], [31], [44]. Clearly, this requires very strong
but also appropriately tuned inductive biases. The architecture we describe here takes a step toward this ability by learning several forms of abstract
knowledge at different levels of abstraction that support transfer of useful inductive biases from previously learned concepts to novel ones.
We call our architectures compound HD models, where “HD” stands for “Hierarchical-Deep,” because they are derived by composing hierarchical
nonparametric Bayesian models with deep networks, two influential approaches from the recent unsupervised learning literature with
complementary strengths. Recently introduced deep learning models, including deep belief networks (DBNs) [12], deep Boltzmann machines
(DBM) [29], deep autoencoders [19], and many others [9], [10], [21], [22], [26], [32], [34], [43], have been shown to learn useful distributed feature
representations for many high-dimensional datasets. The ability to automatically learn in multiple layers allows deep models to construct
sophisticated domain-specific features without the need to rely on precise human-crafted input representations, increasingly important with the
proliferation of datasets and application domains.
120. Reinforcement Learning: Fast and Slow
From https://www.cell.com/trends/cognitive-sciences/fulltext/S1364-6613(19)30061-0
Meta-RL: Speeding up Deep RL by Learning to Learn
As discussed earlier, a second key source of slowness in standard deep RL, alongside incremental
updating, is weak inductive bias. As formalized in the idea of the bias–variance tradeoff, fast learning
requires the learner to go in with a reasonably sized set of hypotheses concerning the structure of the
patterns that it will face. The narrower the hypothesis set, the faster learning can be. However, as
foreshadowed earlier, there is a catch: a narrow hypothesis set will only speed learning if it contains
the correct hypothesis. While strong inductive biases can accelerate learning, they will only do so if
the specific biases the learner adopts happen to fit with the material to be learned. As a result of this, a
new learning problem arises: how can the learner know what inductive biases to adopt?
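The bias-variance point above can be made concrete with a toy version-space learner (illustrative code, not from the paper): the true concept is a threshold rule, and a learner starting from a narrow hypothesis set that contains the truth converges from no more labelled examples than one starting from a broad set:

```python
import random

def examples_to_converge(hypotheses, truth=7, seed=0, max_n=1000):
    # Each hypothesis t is the rule "label is True iff x >= t".
    # Labelled samples eliminate inconsistent hypotheses; return
    # how many samples it takes to be left with only the truth.
    rng = random.Random(seed)
    live = set(hypotheses)
    for n in range(1, max_n + 1):
        x = rng.randint(0, 20)
        label = x >= truth
        live = {t for t in live if (x >= t) == label}
        if live == {truth}:
            return n
    return None

narrow = examples_to_converge(range(5, 10))   # small set, truth inside
broad = examples_to_converge(range(0, 21))    # large set, truth inside
print(narrow is not None and broad is not None and narrow <= broad)  # True
```

The narrow learner's surviving hypotheses are always a subset of the broad learner's, so it can never converge later; but if its narrow set had excluded the truth, it would never converge at all, which is exactly the catch the text describes.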
Episodic Deep RL: Fast Learning through Episodic Memory
If incremental parameter adjustment is one source of slowness in deep RL, then one way to
learn faster might be to avoid such incremental updating. Naively increasing the learning rate
governing gradient descent optimization leads to the problem of catastrophic interference.
However, recent research shows that there is another way to accomplish the same goal, which
is to keep an explicit record of past events, and use this record directly as a point of reference
in making new decisions. This idea, referred to as episodic RL, parallels ‘non-parametric’
approaches in machine learning and resembles ‘instance-’ or ‘exemplar-based’ theories of
learning in psychology. When a new situation is encountered and a decision must be made
concerning what action to take, the procedure is to compare an internal representation of the
current situation with stored representations of past situations. The action chosen is then the
one associated with the highest value, based on the outcomes of the past situations that are
most similar to the present. When the internal state representation is computed by a multilayer
neural network, we refer to the resulting algorithm as ‘episodic deep RL’.
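A minimal, hypothetical sketch of the episodic RL procedure just described (class and method names are invented here, not from the paper): keep explicit (state, action, outcome) records, and at decision time choose the action with the best outcomes among the most similar stored situations:

```python
import numpy as np

class EpisodicAgent:
    def __init__(self):
        self.memory = []  # explicit record: (state_vector, action, return)

    def record(self, state, action, ret):
        self.memory.append((np.asarray(state, dtype=float), action, ret))

    def act(self, state, k=2):
        # Compare the current state with stored past states and
        # keep the k most similar episodes.
        state = np.asarray(state, dtype=float)
        nearest = sorted(self.memory,
                         key=lambda m: np.linalg.norm(m[0] - state))[:k]
        # Average each action's returns over those episodes and
        # pick the action with the highest value.
        values = {}
        for s, a, r in nearest:
            values.setdefault(a, []).append(r)
        return max(values, key=lambda a: np.mean(values[a]))

agent = EpisodicAgent()
agent.record([0.0, 0.0], "left", 1.0)
agent.record([0.1, 0.0], "left", 1.0)
agent.record([5.0, 5.0], "right", 2.0)
print(agent.act([0.05, 0.0]))  # "left": the nearest episodes favour it
```

No gradient step is taken at decision time, which is why this route sidesteps the slowness of incremental parameter updating; in episodic deep RL the state vectors would themselves come from a multilayer network.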
122. Large-Scale Deep Learning (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
123. Embedding for Sparse Inputs (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
124. Efficient Vector Representation of Words (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
125. Deep Convolution Neural Nets and Gaussian Processes
From https://ai.google/research/pubs/pub47671
126. Deep Convolution Neural Nets and Gaussian Processes(cont)
From https://ai.google/research/pubs/pub47671
127. Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
128. Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
129. Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
130. Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
131. Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
132. Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
133. Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
134. Large-Scale Deep Learning (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
135. Large-Scale Deep Learning (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
136. Large-Scale Deep Learning (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
137. Large-Scale Deep Learning (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
139. Hierarchical C4ISR Flow Model from Bob Marcus
[Diagram: a layered flow from input devices and field processors through preprocessing, simple and complex event processing, world-model updates, and situation/impact assessment, up to strategy, tactics, HQ operations, and field operations; simple and complex responses flow back down through sensor and effects management to actuator devices. The layers map onto the progressions Data → Structured Data → Information → Knowledge → Wisdom and Measurement → Awareness → Decision.]
Adapted From http://www.et-strategies.com/great-global-grid/Events.pdf
140. Computing and Sensing Architectures
From https://www.researchgate.net/publication/323835314_Greening_Trends_in_Energy-Efficiency_of_IoT-based_Heterogeneous_Wireless_Nodes/figures?lo=1
141. Computing and Sensing Architectures
From https://www.researchgate.net/publication/323835314_Greening_Trends_in_Energy-Efficiency_of_IoT-based_Heterogeneous_Wireless_Nodes/figures?lo=1
142. Bio-Inspired Distributed Intelligence
From https://news.mit.edu/2022/wiggling-toward-bio-inspired-machine-intelligence-juncal-arbelaiz-1002
More than half of an octopus’ nerves are distributed through its eight arms, each of which has some degree of autonomy. This
distributed sensing and information processing system intrigued Arbelaiz, who is researching how to design decentralized
intelligence for human-made systems with embedded sensing and computation. At MIT, Arbelaiz is an applied math student who
is working on the fundamentals of optimal distributed control and estimation in the final weeks before completing her PhD this
fall.
She finds inspiration in the biological intelligence of invertebrates such as octopus and jellyfish, with the ultimate goal of
designing novel control strategies for flexible “soft” robots that could be used in tight or delicate surroundings, such as a surgical
tool or for search-and-rescue missions.
“The squishiness of soft robots allows them to dynamically adapt to different environments. Think of worms, snakes, or jellyfish,
and compare their motion and adaptation capabilities to those of vertebrate animals,” says Arbelaiz. “It is an interesting expression
of embodied intelligence — lacking a rigid skeleton gives advantages to certain applications and helps to handle uncertainty in the
real world more efficiently. But this additional softness also entails new system-theoretic challenges.”
In the biological world, the “controller” is usually associated with the brain and central nervous system — it creates motor
commands for the muscles to achieve movement. Jellyfish and a few other soft organisms lack a centralized nerve center, or brain.
Inspired by this observation, she is now working toward a theory where soft-robotic systems could be controlled using
decentralized sensory information sharing.
“When sensing and actuation are distributed in the body of the robot and onboard computational capabilities are limited, it might
be difficult to implement centralized intelligence,” she says. “So, we need these sort of decentralized schemes that, despite sharing
sensory information only locally, guarantee the desired global behavior. Some biological systems, such as the jellyfish, are
beautiful examples of decentralized control architectures — locomotion is achieved in the absence of a (centralized) brain. This is
fascinating as compared to what we can achieve with human-made machines.”
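The decentralized-control idea Arbelaiz describes, a desired global behavior emerging from purely local information sharing, can be sketched with a classic consensus iteration (an illustrative toy, not her actual work): each node in a ring shares state only with its two neighbours, yet the whole system converges to the global mean with no central controller:

```python
def consensus_step(states):
    # Each node replaces its state with the average of itself and
    # its two ring neighbours: strictly local communication.
    n = len(states)
    return [(states[(i - 1) % n] + states[i] + states[(i + 1) % n]) / 3.0
            for i in range(n)]

states = [0.0, 10.0, 2.0, 8.0]
for _ in range(50):
    states = consensus_step(states)
print([round(s, 3) for s in states])  # all converge to 5.0, the global mean
```

Local averaging preserves the total, so the only fixed point reachable is the shared mean; this is the sense in which local rules can guarantee a global outcome.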
150. DeepMind Website
DeepMind Home page
https://deepmind.com/
DeepMind Research
https://deepmind.com/research/
https://deepmind.com/research/publications/
DeepMind Blog
https://deepmind.com/blog
DeepMind Applied
https://deepmind.com/applied
151. DeepMind Featured Research Publications
From https://deepmind.com/research
AlphaGo
https://www.deepmind.com/research/highlighted-research/alphago
Deep Reinforcement Learning
https://deepmind.com/research/dqn/
A Dual Approach to Scalable Verification of Deep Networks
http://auai.org/uai2018/proceedings/papers/204.pdf
https://www.youtube.com/watch?v=SV05j3GM0LI
Learning to reinforcement learn
https://arxiv.org/abs/1611.05763
Neural Programmer - Interpreters
https://arxiv.org/pdf/1511.06279v3.pdf
Dueling Network Architectures for Deep Reinforcement Learning
https://arxiv.org/pdf/1511.06581.pdf
DeepMind Research over 400 publications
https://deepmind.com/research/publications/
152. DeepMind Applied
From https://deepmind.com/applied/
DeepMind Health
https://deepmind.com/applied/deepmind-health/
DeepMind for Google
https://deepmind.com/applied/deepmind-google/
DeepMind Ethics and Society
https://deepmind.com/applied/deepmind-ethics-society/
153. AlphaGo and AlphaGoZero
From https://www.deepmind.com/research/highlighted-research/alphago
We created AlphaGo, a computer program that combines an advanced tree search with deep neural
networks. These neural networks take a description of the Go board as an input and process it
through a number of different network layers containing millions of neuron-like connections.
One neural network, the “policy network”, selects the next move to play. The other neural network,
the “value network”, predicts the winner of the game. We introduced AlphaGo to numerous amateur
games to help it develop an understanding of reasonable human play. Then we had it play against
different versions of itself thousands of times, each time learning from its mistakes.
Over time, AlphaGo improved and became increasingly stronger and better at learning and decision-
making. This process is known as reinforcement learning. AlphaGo went on to defeat Go world
champions in different global arenas and arguably became the greatest Go player of all time.
Following the summit, we revealed AlphaGo Zero. While AlphaGo learnt the game by
playing thousands of matches with amateur and professional players, AlphaGo Zero
learnt by playing against itself, starting from completely random play.
This powerful technique is no longer constrained by the limits of human knowledge. Instead,
the computer program accumulated thousands of years of human knowledge during a period of
just a few days and learned to play Go from the strongest player in the world, AlphaGo.
AlphaGo Zero quickly surpassed the performance of all previous versions and also discovered new
knowledge, developing unconventional strategies and creative new moves, including those which
beat the World Go Champions Lee Sedol and Ke Jie. These creative moments give us confidence
that AI can be used as a positive multiplier for human ingenuity.
154. AlphaZero
From https://www.deepmind.com/blog/alphazero-shedding-new-light-on-chess-shogi-and-go
In late 2017 we introduced AlphaZero, a single system that taught itself from scratch how to master the
games of chess, shogi (Japanese chess), and Go, beating a world-champion program in each case. We were
excited by the preliminary results and thrilled to see the response from members of the chess community,
who saw in AlphaZero’s games a ground-breaking, highly dynamic and “unconventional” style of play that
differed from any chess playing engine that came before it.
Today, we are delighted to introduce the full evaluation of AlphaZero, published in the journal Science (Open
Access version here), that confirms and updates those preliminary results. It describes how AlphaZero quickly
learns each game to become the strongest player in history for each, despite starting its training from random play,
with no in-built domain knowledge but the basic rules of the game.
This ability to learn each game afresh, unconstrained by the norms of human play, results in a distinctive,
unorthodox, yet creative and dynamic playing style. Chess Grandmaster Matthew Sadler and Women’s
International Master Natasha Regan, who have analysed thousands of AlphaZero’s chess games for their
forthcoming book Game Changer (New in Chess, January 2019), say its style is unlike any traditional chess
engine. “It’s like discovering the secret notebooks of some great player from the past,” says Matthew.
Traditional chess engines – including the world computer chess champion Stockfish and IBM’s ground-
breaking Deep Blue – rely on thousands of rules and heuristics handcrafted by strong human players that try
to account for every eventuality in a game. Shogi programs are also game-specific, using similar search
engines and algorithms to chess programs.
AlphaZero takes a totally different approach, replacing these hand-crafted rules with a deep neural network
and general purpose algorithms that know nothing about the game beyond the basic rules.
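AlphaZero's loop of self-play plus learning from game outcomes can be caricatured at tiny scale (a tabular toy with no neural network and no tree search; all names are invented here): starting from random play of the subtraction game "take 1 to 3 stones, and whoever takes the last stone wins", repeated self-play alone, knowing only the rules, discovers that piles that are multiples of 4 are lost for the player to move:

```python
import random

def self_play_train(pile=10, episodes=5000, seed=1):
    rng = random.Random(seed)
    value = {0: 0.0}  # at pile 0 the player to move has already lost
    for _ in range(episodes):
        n, path = pile, []
        while n > 0:
            moves = [mv for mv in (1, 2, 3) if mv <= n]
            if rng.random() < 0.2:                     # explore
                mv = rng.choice(moves)
            else:                                      # exploit: leave the
                mv = min(moves,                        # opponent the worst pile
                         key=lambda m: value.get(n - m, 0.5))
            path.append(n)
            n -= mv
        # The player who took the last stone won; credit the game's
        # outcome to alternating positions along the path.
        result = 1.0
        for pos in reversed(path):
            old = value.get(pos, 0.5)
            value[pos] = old + 0.1 * (result - old)
            result = 1.0 - result
    return value

v = self_play_train()
# Game theory says piles 4, 8, ... are losing for the player to move.
print(v[4] < 0.5 and v[3] > 0.5)  # True
```

The ingredients mirror the text at miniature scale: only the basic rules are built in, play starts essentially random, and the value estimates improve purely from self-play outcomes.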