Artificial General
Intelligence 1
Bob Marcus
robert.marcus@et-strategies.com
Part 1 of 4 parts: Artificial Intelligence and Machine Learning
This is a first cut.
More details will be added later.
Part 1: Artificial Intelligence (AI)
Part 2: Natural Intelligence(NI)
Part 3: Artificial General Intelligence (AI + NI)
Part 4: Networked AGI Layer on top or Gaia and Human Society
Four Slide Sets on Artificial General Intelligence
AI = Artificial Intelligence (Task)
AGI = Artificial Mind (Simulation)
AB = Artificial Brain (Emulation)
AC = Artificial Consciousness (Synthetic)
AI < AGI <? AB < AC (Is a partial brain emulation needed to create a mind?)
Mind is not required for task proficiency
Full Natural Brain architecture is not required for a mind
Consciousness is not required for a natural brain architecture
Philosophical Musings 10/2022
Focused Artificial Intelligence (AI) will get better at specific tasks
Specific AI implementations will probably exceed human performance in most tasks
Some will attain superhuman abilities in a wide range of tasks
“Common Sense” = low-level experiential broad knowledge could be an exception
Some AIs could use brain-inspired architectures to improve complex task performance
This is not equivalent to human or artificial general intelligence (AGI)
However, networking task-centric AIs could provide a first step towards AGI
This is similar to the way human society achieves power from communication
The combination of the networked AIs could be the foundation of an artificial mind
In a similar fashion, human society can accomplish complex tasks without being conscious
Distributed division of labor enables tasks to be assigned to the most competent element
Networked humans and AIs could cooperate through brain-machine interfaces
In the brain, consciousness provides direction to the mind
In large societies, governments perform the role of conscious direction
With networked AIs, a “conscious operating system” could play a similar role.
This would probably have to be initially programmed by humans.
If the AI network included sensors, actuators, and robots it could be aware of the world
The AI network could form a grid managing society, biology, and geology layers
A conscious AI network could develop its own goals beyond efficient management
Humans in the loop could be valuable in providing common sense and protective oversight
Outline
Classical AI
Knowledge Representation
Agents
Classical Machine Learning
Deep Learning
Deep Learning Models
Deep Learning Hardware
Reinforcement Learning
Google Research
Computing and Sensing Architecture
IoT and Deep Learning
DeepMind
Deep Learning 2020
Causal Reasoning and Deep Learning
References
Classical AI
Classical Paper Awards 1999-2022
Top 100 AI Start-ups
From https://singularityhub.com/2020/03/30/the-top-100-ai-startups-out-there-now-and-what-theyre-working-on/
Classical AI Tools
Lisp
https://en.wikipedia.org/wiki/Lisp_(programming_language)
Prolog
https://www.geeksforgeeks.org/prolog-an-introduction/
Knowledge Representation
https://en.wikipedia.org/wiki/Knowledge_representation_and_reasoning
Decision Trees
https://en.wikipedia.org/wiki/Decision_tree
Forward and Backward Chaining
https://www.section.io/engineering-education/forward-and-backward-chaining-in-ai/
Constraint Satisfaction
https://en.wikipedia.org/wiki/Constraint_satisfaction
OPS5
https://en.wikipedia.org/wiki/OPS5
Classical AI Systems
CYC
https://en.wikipedia.org/wiki/Cyc
Expert Systems
https://en.wikipedia.org/wiki/Expert_system
XCON
https://en.wikipedia.org/wiki/Xcon
MYCIN
https://en.wikipedia.org/wiki/Mycin
MYCON
https://www.slideshare.net/bobmarcus/1986-multilevel-constraintbased-configuration-article
https://www.slideshare.net/bobmarcus/1986-mycon-multilevel-constraint-based-configuration
Knowledge Representation
Stored Knowledge Base
From https://www.researchgate.net/publication/327926311_Development_of_a_knowledge_base_based_on_context_analysis_of_external_information_resources/figures?lo=1
Pre-defined Models
From https://intelligence.org/2015/07/27/miris-approach/
Agents
AI Agents
From https://www.geeksforgeeks.org/agents-artificial-intelligence/
Intelligent Agents
From https://en.wikipedia.org/wiki/Intelligent_agent
In artificial intelligence, an intelligent agent (IA) is anything which perceives its environment, takes actions autonomously in order to achieve goals, and may improve
its performance with learning or may use knowledge. They may be simple or complex — a thermostat is considered an example of an intelligent agent, as is a human
being, as is any system that meets the definition, such as a firm, a state, or a biome.[1]
Leading AI textbooks define "artificial intelligence" as the "study and design of intelligent agents", a definition that considers goal-directed behavior to be the essence of
intelligence. Goal-directed agents are also described using a term borrowed from economics, "rational agent".[1]
An agent has an "objective function" that encapsulates all the IA's goals. Such an agent is designed to create and execute whatever plan will, upon completion, maximize
the expected value of the objective function.[2] For example, a reinforcement learning agent has a "reward function" that allows the programmers to shape the IA's desired
behavior,[3] and an evolutionary algorithm's behavior is shaped by a "fitness function".[4]
Intelligent agents in artificial intelligence are closely related to agents in economics, and versions of the intelligent agent paradigm are studied in cognitive science,
ethics, the philosophy of practical reason, as well as in many interdisciplinary socio-cognitive modeling and computer social simulations.
Intelligent agents are often described schematically as an abstract functional system similar to a computer program. Abstract descriptions of intelligent agents are called
abstract intelligent agents (AIA) to distinguish them from their real world implementations. An autonomous intelligent agent is designed to function in the absence of
human intervention. Intelligent agents are also closely related to software agents (an autonomous computer program that carries out tasks on behalf of users).
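To make the agent abstraction above concrete, here is a minimal Python sketch (not from the cited article): a thermostat-style agent with an explicit objective function that it tries to maximize through its actions; the temperature values are illustrative placeholders.

class ThermostatAgent:
    """Toy intelligent agent: perceives a temperature, acts to pursue a goal."""
    def __init__(self, target_temp=20.0):
        self.target_temp = target_temp                 # the agent's goal

    def objective(self, temp):
        # "Objective function": higher is better as the temperature nears the target
        return -abs(temp - self.target_temp)

    def act(self, percept):
        # Pick the action expected to maximize the objective function
        return "heat_on" if percept < self.target_temp else "heat_off"

agent = ThermostatAgent()
for temp in [17.5, 19.0, 21.2]:                        # simulated percepts
    print(temp, agent.act(temp), agent.objective(temp))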
Node in Real-Time Control System (RCS) by Albus
From https://en.wikipedia.org/wiki/4D-RCS_Reference_Model_Architecture
Intelligent Agents for Network Management
From https://www.ericsson.com/en/blog/2022/6/who-are-the-intelligent-agents-in-network-operations-and-why-we-need-them
Intelligent Agents on the Web
From https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.230.5806&rep=rep1&type=pdf
Intelligent agents are goal-driven and autonomous, and can communicate and interact with each other. Moreover,
they can evaluate information obtained online from heterogeneous sources and present information tailored to an
individual’s needs. This article covers different facets of the intelligent agent paradigm and applications, while also
exploring new opportunities and trends for intelligent agents.
IAs cover several functionalities, ranging from adaptive user interfaces (called interface agents) to intelligent
mobile processes that cooperate with other agents to coordinate their activities in a distributed manner. The
requirements for IAs remain open for discussion. An agent should be able to:
• interact with humans and other agents
• anticipate user needs for information
• adapt to changes in user needs and the environment
• cope with heterogeneity of information and other agents.
The following attributes characterize an IA-based system’s main capabilities:
• Intelligence. The method an agent uses to develop its intelligence includes using the agent’s own software
content and knowledge representation, which describes vocabulary data, conditions, goals, and tasks.
• Continuity. An agent is a continuously running process that can detect changes in its environment, modify its
behavior, and update its knowledge base (which describes the environment).
• Communication. An agent can communicate with other agents to achieve its goals, and it can interact with users
directly by using appropriate interfaces.
• Cooperation. An agent automatically customizes itself to its users’ needs based on previous experiences and
monitored profiles.
• Mobility. The degree of mobility with which an agent can perform varies from remote execution, in which the
agent is transferred from a distant system, to a situation in which the agent creates new agents, dies, or executes
partially during migration.
Smart Agents 2022 Comparison
From https://www.businessnewsdaily.com/10315-siri-cortana-google-assistant-amazon-alexa-face-off.html
When AI assistants first hit the market, they were far from ubiquitous, but thanks to more third-party OEMs jumping on the smart speaker bandwagon,
there are more choices for assistant-enabled devices than ever. In addition to increasing variety, in terms of hardware, devices that support multiple types
of AI assistants are becoming more common. Despite more integration, competition between AI assistants is still stiff, so to save you time and
frustration, we did an extensive hands-on test – not to compare speakers against each other, but to compare the AI assistants themselves.
There are four frontrunners in the AI assistant space: Amazon (Alexa), Apple (Siri), Google (Google Assistant) and Microsoft (Cortana). Rather than
gauge each assistant’s efficacy based on company-reported features, I spent hours testing each assistant by issuing commands and asking questions that
many business users would use. I constructed questions to test basic understanding as well as contextual understanding and general vocal recognition.
Accessibility and trends
Ease of setup
Voice recognition
Success of queries and ability to understand context
Bottom line
None of the AI assistants are perfect; this is young technology, and it has a long way to go. There was a handful of questions that none of the virtual
assistants on my list could answer. For example, when I asked for directions to the closest airport, even the two best assistants on my list, Google
Assistant and Siri, failed hilariously: Google Assistant directed me to a travel agency (those still exist?), while Siri directed me to a seaplane base (so
close!).
Judging purely on out-of-the-box functionality, I would choose either Siri or Google Assistant, and I would make the final choice based on hardware
preferences. None of the assistants are good enough to go out of your way to adopt. Choose between Siri and Google Assistant based on convenience
and what hardware you already have.
IFTTT = "if this, then that," is a service that lets you connect apps, services, and smart home devices.
Amazon Alexa
From https://en.wikipedia.org/wiki/Amazon_Alexa
Amazon Alexa, also known simply as Alexa,[2] is a virtual assistant technology largely based on a Polish speech synthesiser
named Ivona, bought by Amazon in 2013.[3][4] It was first used in the Amazon Echo smart speaker and the Echo Dot, Echo
Studio and Amazon Tap speakers developed by Amazon Lab126. It is capable of voice interaction, music playback, making
to-do lists, setting alarms, streaming podcasts, playing audiobooks, and providing weather, traffic, sports, and other real-time
information, such as news.[5] Alexa can also control several smart devices using itself as a home automation system. Users
are able to extend the Alexa capabilities by installing "skills" (additional functionality developed by third-party vendors, in other
settings more commonly called apps) such as weather programs and audio features. It uses automatic speech recognition,
natural language processing, and other forms of weak AI to perform these tasks.[6]
Most devices with Alexa allow users to activate the device using a wake-word[7] (such as Alexa or Amazon); other devices
(such as the Amazon mobile app on iOS or Android and Amazon Dash Wand) require the user to click a button to activate
Alexa's listening mode, although some phones also allow a user to say a command, such as "Alexa" or "Alexa wake".
Google Assistant
From https://en.wikipedia.org/wiki/Google_Assistant
Google Assistant is a virtual assistant software application developed by Google that is primarily available on mobile and home
automation devices. Based on artificial intelligence, Google Assistant can engage in two-way conversations,[1] unlike the
company's previous virtual assistant, Google Now.
Google Assistant debuted in May 2016 as part of Google's messaging app Allo, and its voice-activated speaker Google Home.
After a period of exclusivity on the Pixel and Pixel XL smartphones, it was deployed on other Android devices starting in February
2017, including third-party smartphones and Android Wear (now Wear OS), and was released as a standalone app on
the iOS operating system in May 2017. Alongside the announcement of a software development kit in April 2017, Assistant has
been further extended to support a large variety of devices, including cars and third-party smart home appliances. The
functionality of the Assistant can also be enhanced by third-party developers.
Users primarily interact with the Google Assistant through natural voice, though keyboard input is also supported. Assistant is
able to answer questions, schedule events and alarms, adjust hardware settings on the user's device, show information from the
user's Google account, play games, and more. Google has also announced that Assistant will be able to identify objects and
gather visual information through the device's camera, and support purchasing products and sending money.
Apple Siri
https://en.wikipedia.org/wiki/Siri
Siri (/ˈsɪri/ SEER-ee) is a virtual assistant that is part of Apple Inc.'s iOS, iPadOS, watchOS, macOS, tvOS, and audioOS operating systems.[1]
[2] It uses voice queries, gesture based control, focus-tracking and a natural-language user interface to answer questions, make
recommendations, and perform actions by delegating requests to a set of Internet services. With continued use, it adapts to users' individual
language usages, searches and preferences, returning individualized results.
Siri is a spin-off from a project developed by the SRI International Artificial Intelligence Center. Its speech recognition engine was provided
by Nuance Communications, and it uses advanced machine learning technologies to function. Its original American, British and
Australian voice actors recorded their respective voices around 2005, unaware of the recordings' eventual usage. Siri was released as an app
for iOS in February 2010. Two months later, Apple acquired it and integrated it into the iPhone 4S at its release on October 4, 2011, removing the
separate app from the iOS App Store. Siri has since been an integral part of Apple's products, having been adapted into other hardware
devices including newer iPhone models, iPad, iPod Touch, Mac, AirPods, Apple TV, and HomePod.
Siri supports a wide range of user commands, including performing phone actions, checking basic information, scheduling events and
reminders, handling device settings, searching the Internet, navigating areas, finding information on entertainment, and is able to engage with
iOS-integrated apps. With the release of iOS 10 in 2016, Apple opened up limited third-party access to Siri, including third-party messaging
apps, as well as payments, ride-sharing, and Internet calling apps. With the release of iOS 11, Apple updated Siri's voice and added support
for follow-up questions, language translation, and additional third-party actions.
Microsoft Cortana
From https://en.wikipedia.org/wiki/Cortana_(virtual_assistant)
Cortana is a virtual assistant developed by Microsoft that uses the Bing search engine to perform tasks such as setting reminders
and answering questions for the user.
Cortana is currently available in English, Portuguese, French, German, Italian, Spanish, Chinese, and Japanese language editions, depending
on the software platform and region in which it is used.[8]
Microsoft began reducing the prevalence of Cortana and converting it from an assistant into different software integrations in 2019.[9] It was split
from the Windows 10 search bar in April 2019.[10] In January 2020, the Cortana mobile app was removed from certain markets,[11][12] and on
March 31, 2021, the Cortana mobile app was shut down globally.[13]
Microsoft has integrated Cortana into numerous products such as Microsoft Edge,[28] the browser bundled with Windows 10. Microsoft's
Cortana assistant is deeply integrated into its Edge browser. Cortana can find opening hours when on restaurant sites, show retail coupons for
websites, or show weather information in the address bar. At the Worldwide Partners Conference 2015 Microsoft demonstrated Cortana
integration with products such as GigJam.[29] Conversely, Microsoft announced in late April 2016 that it would block anything other than Bing
and Edge from being used to complete Cortana searches, again raising questions of anti-competitive practices by the company.[30]
In May 2017, Microsoft in collaboration with Harman Kardon announced INVOKE, a voice-activated speaker featuring Cortana. The premium
speaker has a cylindrical design and offers 360 degree sound, the ability to make and receive calls with Skype, and all of the other features
currently available with Cortana.[42]
Classical Machine Learning
Machine Learning Types
From https://towardsdatascience.com/coding-deep-learning-for-beginners-types-of-machine-learning-b9e651e1ed9d
Perceptron
From https://deepai.org/machine-learning-glossary-and-terms/perceptron
How does a Perceptron work?
The process begins by taking all the input values and multiplying them by their weights. Then, all of these
multiplied values are added together to create the weighted sum. The weighted sum is then applied to the
activation function, producing the perceptron's output. The activation function plays the integral role of
ensuring the output is mapped between required values such as (0,1) or (-1,1). It is important to note that
the weight of an input is indicative of the strength of a node. Similarly, an input's bias value gives the
ability to shift the activation function curve up or down.
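A minimal Python/NumPy sketch of the forward pass just described (the weights, bias, and inputs are illustrative values, not taken from the source):

import numpy as np

def perceptron(x, w, b):
    weighted_sum = np.dot(w, x) + b       # multiply inputs by weights, add the bias
    return 1 if weighted_sum > 0 else 0   # step activation maps the sum to {0, 1}

x = np.array([0.5, -1.0, 2.0])            # inputs
w = np.array([0.4,  0.3, 0.9])            # weights (strength of each input)
b = -0.5                                  # bias shifts the activation threshold
print(perceptron(x, w, b))                # -> 1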
Ensemble Machine Learning
From https://machinelearningmastery.com/tour-of-ensemble-learning-algorithms/
Ensemble learning is a general meta approach to machine learning that seeks better predictive
performance by combining the predictions from multiple models.
Although there are a seemingly unlimited number of ensembles that you can develop for your predictive
modeling problem, there are three methods that dominate the field of ensemble learning. So much so, that
rather than algorithms per se, each is a field of study that has spawned many more specialized methods.
The three main classes of ensemble learning methods are bagging, stacking, and boosting, and it is
important to both have a detailed understanding of each method and to consider them on your predictive
modeling project.
But, before that, you need a gentle introduction to these approaches and the key ideas behind each method
prior to layering on math and code.
In this tutorial, you will discover the three standard ensemble learning techniques for machine learning.
After completing this tutorial, you will know:
• Bagging involves fitting many decision trees on different samples of the same dataset and averaging
the predictions.
• Stacking involves fitting many different models types on the same data and using another model to
learn how to best combine the predictions.
• Boosting involves adding ensemble members sequentially that correct the predictions made by prior
models and outputs a weighted average of the predictions.
Bagging
From https://en.wikipedia.org/wiki/Bootstrap_aggregating
Bootstrap aggregating, also called bagging, is a machine learning ensemble
meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in
statistical classification and regression. It also reduces variance and helps to avoid overfitting. Although it
is usually applied to decision tree methods, it can be used with any type of method. Bagging is a special
case of the model averaging approach.
Given a standard training set D of size n, bagging generates m new training sets D_i, each of size nʹ, by
sampling from D uniformly and with replacement. By sampling with replacement, some observations may
be repeated in each D_i. If nʹ = n, then for large n the set D_i is expected to have the fraction (1 - 1/e) (≈63.2%) of
the unique examples of D, the rest being duplicates.[1] This kind of sample is known as a bootstrap sample.
Sampling with replacement ensures each bootstrap is independent from its peers, as it does not depend on
previously chosen samples when sampling. Then, m models are fitted using the above m bootstrap samples
and combined by averaging the output (for regression) or voting (for classification).
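As a hedged illustration (assuming scikit-learn is installed; not part of the quoted article), bagging decision trees on a synthetic dataset might look like this; BaggingClassifier's default base learner is a decision tree.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
# m = 50 bootstrap samples, one decision tree fitted per sample; predictions combined by voting
bagging = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0)
print(cross_val_score(bagging, X, y, cv=5).mean())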
Boosting
From https://www.ibm.com/cloud/learn/boosting and
https://en.wikipedia.org/wiki/Boosting_(machine_learning)
In machine learning, boosting is an ensemble meta-algorithm for primarily reducing bias, and also variance[1]
in supervised learning, and a family of machine learning algorithms that convert weak learners to strong ones
Bagging vs Boosting
Bagging and boosting are two main types of ensemble learning methods. As highlighted in this study (PDF, 242 KB)
(link resides outside IBM), the main difference between these learning methods is the way in which they are trained.
In bagging, weak learners are trained in parallel, but in boosting, they learn sequentially. This means that a series of
models are constructed and with each new model iteration, the weights of the misclassified data in the previous
model are increased. This redistribution of weights helps the algorithm identify the parameters that it needs to focus
on to improve its performance. AdaBoost, which stands for “adaptive boosting algorithm,” is one of the most
popular boosting algorithms as it was one of the first of its kind. Other types of boosting algorithms include
XGBoost, GradientBoost, and BrownBoost.
Another difference between bagging and boosting is in how they are used. For example, bagging methods are
typically used on weak learners that exhibit high variance and low bias, whereas boosting methods are leveraged
when low variance and high bias are observed. While bagging can be used to avoid overfitting, boosting methods
can be more prone to this (link resides outside IBM) although it really depends on the dataset. However, parameter
tuning can help avoid the issue.
As a result, bagging and boosting have different real-world applications as well. Bagging has been leveraged for
loan approval processes and statistical genomics while boosting has been used more within image recognition
apps and search engines.
Boosting is an ensemble learning method that combines a set of weak learners into a strong learner
to minimize training errors. In boosting, a random sample of data is selected, fitted with a model and
then trained sequentially—that is, each model tries to compensate for the weaknesses of its
predecessor. With each iteration, the weak rules from each individual classifier are combined to form
one, strong prediction rule.
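A hedged scikit-learn sketch of the boosting idea (illustrative, not from the quoted sources): AdaBoost fits weak learners sequentially, re-weighting misclassified samples at each iteration; its default weak learner is a depth-1 decision tree (a decision stump).

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
boosting = AdaBoostClassifier(n_estimators=100, random_state=0)   # sequential weak learners
print(cross_val_score(boosting, X, y, cv=5).mean())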
Stacking
From https://www.geeksforgeeks.org/stacking-in-machine-learning/
Stacking is a way to ensemble multiple classification or regression models. There are many ways to ensemble
models; the most widely known are bagging and boosting. Bagging averages multiple similar models with high
variance to decrease variance. Boosting builds multiple incremental models to decrease the bias, while
keeping variance small.
Stacking (sometimes called Stacked Generalization) is a different paradigm. The point of stacking is to explore a
space of different models for the same problem. The idea is that you can attack a learning problem with different
types of models which are capable of learning some part of the problem, but not the whole space of the problem. So, you
can build multiple different learners and you use them to build an intermediate prediction, one prediction for each
learned model. Then you add a new model which learns from the intermediate predictions the same target.
This final model is said to be stacked on the top of the others, hence the name. Thus, you might improve your overall
performance, and often you end up with a model which is better than any individual intermediate model. Notice
however, that it does not give you any guarantee, as is often the case with any machine learning technique.
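A hedged scikit-learn sketch of stacking (illustrative only): heterogeneous base models produce intermediate predictions, and a final logistic regression learns how to combine them.

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
base_learners = [("rf", RandomForestClassifier(random_state=0)),
                 ("svm", SVC(random_state=0)),
                 ("knn", KNeighborsClassifier())]
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression())   # meta-model stacked on top
print(cross_val_score(stack, X, y, cv=5).mean())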
Gradient Boosting
From https://en.wikipedia.org/wiki/Gradient_boosting
Gradient boosting is a machine learning technique used in regression and classification tasks, among others. It gives a
prediction model in the form of an ensemble of weak prediction models, which are typically decision trees.[1][2] When a
decision tree is the weak learner, the resulting algorithm is called gradient-boosted trees; it usually outperforms random
forest.[1][2][3] A gradient-boosted trees model is built in a stage-wise fashion as in other boosting methods, but it generalizes
the other methods by allowing optimization of an arbitrary differentiable loss function.
Introduction to XGBoost
From https://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/
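A hedged sketch of the gradient-boosted trees described above, using scikit-learn's GradientBoostingClassifier; the xgboost package covered in the linked introduction offers a similar interface (e.g. xgboost.XGBClassifier) if it is installed. The dataset here is synthetic and purely illustrative.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
gbt = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=3,
                                 random_state=0)   # trees added stage-wise to minimize a differentiable loss
print(cross_val_score(gbt, X, y, cv=5).mean())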
Terminology
・SoftMax https://en.wikipedia.org/wiki/Softmax_function
・SoftPlus https://en.wikipedia.org/wiki/Rectifier_(neural_networks)#Softplus
・Logit https://en.wikipedia.org/wiki/Logit
・Sigmoid https://en.wikipedia.org/wiki/Sigmoid_function
・Logistic Function https://en.wikipedia.org/wiki/Logistic_function
・Tanh https://brenocon.com/blog/2013/10/tanh-is-a-rescaled-logistic-sigmoid-function/
・ReLu https://en.wikipedia.org/wiki/Rectifier_(neural_networks)
・Maxpool: selects the maximum value within each pooling window of a convolutional neural network layer
Relationships
・SoftMax(z1, z2): components e^zi / (e^z1 + e^z2); each component lies in (0, 1)
・Sigmoid = Logistic: σ(z) = 1/(1 + e^-z); range (0, 1); equals the first component of SoftMax(z, 0); its inverse is the Logit
・Tanh: range (-1, 1); equals the first minus the second component of SoftMax(z, -z), i.e. 2σ(2z) - 1
・Logit: x = log(p/(1 - p)); range (-∞, +∞); inverse of the Sigmoid; applied to the first component of SoftMax(z1, z2) it gives log(first component / second component) = z1 - z2
・SoftPlus: log(1 + e^z); range (0, ∞); smooth approximation of ReLU; its derivative is the Sigmoid
・ReLU: max(0, z); range [0, ∞)
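A small NumPy check of the relationships listed above (values are arbitrary; this is an illustrative sketch, not from the linked references):

import numpy as np

def softmax(z):
    e = np.exp(np.asarray(z, dtype=float))
    return e / e.sum()

z = 0.7
sigmoid = 1.0 / (1.0 + np.exp(-z))
softplus = lambda x: np.log1p(np.exp(x))
h = 1e-6

print(np.isclose(sigmoid, softmax([z, 0.0])[0]))                 # sigmoid(z) is the first component of softmax(z, 0)
print(np.isclose(np.tanh(z),
                 softmax([z, -z])[0] - softmax([z, -z])[1]))     # tanh(z) = difference of the softmax(z, -z) components
print(np.isclose(np.log(sigmoid / (1.0 - sigmoid)), z))          # logit(p) = log(p/(1-p)) inverts the sigmoid
print(np.isclose((softplus(z + h) - softplus(z - h)) / (2 * h),
                 sigmoid))                                       # the derivative of softplus is the sigmoid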
Terminology (continued)
・Heteroscedastic https://en.wiktionary.org/wiki/scedasticity
・Maxout https://stats.stackexchange.com/questions/129698/what-is-maxout-in-neural-network/298705
・Cross-Entropy https://en.wikipedia.org/wiki/Cross_entropy H(P, Q) = -E_P[log Q]
・Joint Entropy https://en.wikipedia.org/wiki/Joint_entropy H(X, Y) = -E_p(X,Y)[log p(X, Y)]
・KL Divergence https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence KL(P, Q) = E_P[log P - log Q]
・H(P, Q) = H(P) + KL(P, Q), i.e. -E_P[log Q] = -E_P[log P] + {E_P[log P] - E_P[log Q]}
・Mutual Information https://en.wikipedia.org/wiki/Mutual_information I(X; Y) = KL(p(x, y), p(x)p(y))
・Ridge Regression and Lasso Regression
https://hackernoon.com/practical-machine-learning-ridge-regression-vs-lasso-a00326371ece
・Logistic Regression https://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch12.pdf
・Dropout https://en.wikipedia.org/wiki/Dropout_(neural_networks)
・RMSProp and AdaGrad and AdaDelta and Adam
https://www.quora.com/What-are-differences-between-update-rules-like-AdaDelta-RMSProp-AdaGrad-and-AdaM
・Pooling https://www.quora.com/Is-pooling-indispensable-in-deep-learning
・Boltzmann Machine https://en.wikipedia.org/wiki/Boltzmann_machine
・Hyperparameters
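A quick NumPy sanity check of the identity H(P, Q) = H(P) + KL(P, Q) listed above, using two arbitrary small discrete distributions (illustrative sketch):

import numpy as np

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.1, 0.6, 0.3])

entropy       = -np.sum(p * np.log(p))            # H(P)    = -E_P[log P]
cross_entropy = -np.sum(p * np.log(q))            # H(P, Q) = -E_P[log Q]
kl            =  np.sum(p * np.log(p / q))        # KL(P, Q) = E_P[log P - log Q]

print(np.isclose(cross_entropy, entropy + kl))    # -> True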
Reinforcement Learning Book
From https://www.andrew.cmu.edu/course/10-703/textbook/BartoSutton.pdf
Acumos Shared Model Process Flow
From https://arxiv.org/ftp/arxiv/papers/1810/1810.07159.pdf
Distributed AI
From https://en.wikipedia.org/wiki/Distributed_artificial_intelligence
Distributed Artificial Intelligence (DAI) also called Decentralized Artificial Intelligence[1] is a subfield of artificial intelligence research dedicated to the
development of distributed solutions for problems. DAI is closely related to and a predecessor of the field of multi-agent systems.
The objectives of Distributed Artificial Intelligence are to solve the reasoning, planning, learning and perception problems of artificial intelligence,
especially if they require large data, by distributing the problem to autonomous processing nodes (agents). To reach the objective, DAI requires:
• A distributed system with robust and elastic computation on unreliable and failing resources that are loosely coupled
• Coordination of the actions and communication of the nodes
• Subsamples of large data sets and online machine learning
There are many reasons for wanting to distribute intelligence or cope with multi-agent systems. Mainstream problems in DAI research include the
following:
• Parallel problem solving: mainly deals with how classic artificial intelligence concepts can be modified, so that multiprocessor systems and clusters
of computers can be used to speed up calculation.
• Distributed problem solving (DPS): the concept of agent, autonomous entities that can communicate with each other, was developed to serve as an
abstraction for developing DPS systems. See below for further details.
• Multi-Agent Based Simulation (MABS): a branch of DAI that builds the foundation for simulations that need to analyze not only phenomena at
macro level but also at micro level, as it is in many social simulation scenarios.
Swarm Intelligence
From https://en.wikipedia.org/wiki/Swarm_intelligence
Swarm intelligence (SI) is the collective behavior of decentralized, self-organized systems, natural or artificial. The concept is employed in work
on artificial intelligence. The expression was introduced by Gerardo Beni and Jing Wang in 1989, in the context of cellular robotic systems.[1]
SI systems consist typically of a population of simple agents or boids interacting locally with one another and with their environment.[2] The
inspiration often comes from nature, especially biological systems. The agents follow very simple rules, and although there is no centralized control
structure dictating how individual agents should behave, local, and to a certain degree random, interactions between such agents lead to
the emergence of "intelligent" global behavior, unknown to the individual agents.[3] Examples of swarm intelligence in natural systems include ant
colonies, bee colonies, bird flocking, hawks hunting, animal herding, bacterial growth, fish schooling and microbial intelligence.
The application of swarm principles to robots is called swarm robotics while swarm intelligence refers to the more general set of algorithms. Swarm
prediction has been used in the context of forecasting problems. Similar approaches to those proposed for swarm robotics are considered
for genetically modified organisms in synthetic collective intelligence.[4]
• 1 Models of swarm behavior
◦ 1.1 Boids (Reynolds 1987)
◦ 1.2 Self-propelled particles (Vicsek et al. 1995)
• 2 Metaheuristics
◦ 2.1 Stochastic diffusion search (Bishop 1989)
◦ 2.2 Ant colony optimization (Dorigo 1992)
◦ 2.3 Particle swarm optimization (Kennedy, Eberhart & Shi 1995)
◦ 2.4 Artificial Swarm Intelligence (2015)
• 3 Applications
◦ 3.1 Ant-based routing
◦ 3.2 Crowd simulation
▪ 3.2.1 Instances
◦ 3.3 Human swarming
◦ 3.4 Swarm grammars
◦ 3.5 Swarmic art
IBM Watson
From https://en.wikipedia.org/wiki/IBM_Watson
IBM Watson is a question-answering computer system capable of answering questions posed in natural language,[2] developed in IBM's
DeepQA project by a research team led by principal investigator David Ferrucci.[3] Watson was named after IBM's founder and first CEO,
industrialist Thomas J. Watson.[4][5]
Software - Watson uses IBM's DeepQA software and the Apache UIMA (Unstructured Information Management Architecture) framework implementation. The system
was written in various languages, including Java, C++, and Prolog, and runs on the SUSE Linux Enterprise Server 11 operating system using the Apache Hadoop
framework to provide distributed computing.[12][13][14]
Hardware - The system is workload-optimized, integrating massively parallel POWER7 processors and built on IBM's DeepQA technology,[15] which it uses to generate
hypotheses, gather massive evidence, and analyze data.[2] Watson employs a cluster of ninety IBM Power 750 servers, each of which uses a 3.5 GHz POWER7 eight-
core processor, with four threads per core. In total, the system has 2,880 POWER7 processor threads and 16 terabytes of RAM.[15] According to John Rennie, Watson
can process 500 gigabytes (the equivalent of a million books) per second.[16] IBM master inventor and senior consultant Tony Pearson estimated Watson's hardware cost
at about three million dollars.[17] Its Linpack performance stands at 80 TeraFLOPs, which is about half as fast as the cut-off line for the Top 500 Supercomputers list.[18]
According to Rennie, all content was stored in Watson's RAM for the Jeopardy game because data stored on hard drives would be too slow to compete with human
Jeopardy champions.[16]
Data - The sources of information for Watson include encyclopedias, dictionaries, thesauri, newswire articles and literary works. Watson also used databases,
taxonomies and ontologies including DBPedia, WordNet and Yago.[19] The IBM team provided Watson with millions of documents, including dictionaries,
encyclopedias and other reference material, that it could use to build its knowledge.[20]
From https://www.researchgate.net/publication/282644173_Implementation_of_a_Natural_Language_Processing_Tool_for_Cyber-Physical_Systems/figures?lo=1
Deep Learning
Three Types of Deep Learning
From https://www.slideshare.net/TerryTaewoongUm/introduction-to-deep-learning-with-tensorflow
Convolutional Neural Networks
https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
Convolutional Neural Nets Comparison (2016)
From https://medium.com/@culurciello/analysis-of-deep-neural-networks-dcf398e71aae
Reference: https://towardsdatascience.com/neural-network-architectures-156e5bad51ba
Recurrent Neural Networks
From https://medium.com/deep-math-machine-learning-ai/chapter-10-deepnlp-recurrent-neural-networks-with-math-c4a6846a50a2
From colah.github.io/posts/2015-08-Understanding-LSTMs/
Recurrent Neural Networks and Long Short Term Memory
Dynamical System View on Recurrent Neural Networks
From https://openreview.net/pdf?id=ryxepo0cFX
From https://arxiv.org/pdf/1412.3555v1.pdf
Gated Recurrent Units vs Long Short Term Memory
Deep Learning Models
From https://arxiv.org/pdf/1712.04301.pdf
Neural Net Models
From https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463
Neural Net Models (cont)
From https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463
TensorFlow
From https://en.wikipedia.org/wiki/TensorFlow
TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of
tasks but has a particular focus on training and inference of deep neural networks.[4][5]
TensorFlow was developed by the Google Brain team for internal Google use in research and production.[6][7][8] The initial version
was released under the Apache License 2.0 in 2015.[1][9] Google released the updated version of TensorFlow, named TensorFlow 2.0,
in September 2019.[10]
TensorFlow can be used in a wide variety of programming languages, most notably Python, as well as Javascript, C++, and Java.[11]
This flexibility lends itself to a range of applications in many different sectors.
Keras
From https://en.wikipedia.org/wiki/Keras
Keras is an open-source software library that provides a Python interface for artificial neural networks. Keras acts
as an interface for the TensorFlow library.
Up until version 2.3, Keras supported multiple backends, including TensorFlow, Microsoft Cognitive Toolkit,
Theano, and PlaidML.[1][2][3] As of version 2.4, only TensorFlow is supported. Designed to enable fast
experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible. It was
developed as part of the research effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot
Operating System),[4] and its primary author and maintainer is François Chollet, a Google engineer. Chollet is also
the author of the Xception deep neural network model.[5]
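As a hedged illustration of the Keras interface (assuming TensorFlow is installed; the data below is random and purely illustrative), a tiny fully connected classifier can be defined, compiled, and trained in a few lines:

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

X = np.random.rand(256, 20).astype("float32")      # toy data for illustration only
y = (X.sum(axis=1) > 10).astype("float32")
model.fit(X, y, epochs=3, batch_size=32, verbose=0)
print(model.evaluate(X, y, verbose=0))             # [loss, accuracy] on the toy data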
Comparison of Deep Learning Frameworks
From https://arxiv.org/pdf/1903.00102.pdf
Popularity of Deep Learning Frameworks
From https://medium.com/implodinggradients/tensorflow-or-keras-which-one-should-i-learn-5dd7fa3f9ca0
Acronyms in Deep Learning
• RBM - Restricted Boltzmann Machines
• MLP - Multi-layer Perceptron
• DBN - Deep Belief Network
• CNN - Convolutional Neural Network
• RNN - Recurrent Neural Network
• SGD - Stochastic Gradient Descent
• XOR - Exclusive Or
• SVM - Support Vector Machine
• ReLu - Rectified Linear Unit
• MNIST - Modified National Institute of Standards and Technology
• RBF - Radial Basis Function
• HMM - Hidden Markov Model
• MAP - Maximum A Posteriori
• MLE - Maximum Likelihood Estimate
• Adam - Adaptive Moment Estimation
• LSTM - Long Short Term Memory
• GRU - Gated Recurrent Unit
Concerns for Deep Learning by Gary Marcus
From https://arxiv.org/ftp/arxiv/papers/1801/1801.00631.pdf
Deep Learning thus far:
• Is data hungry
• Is shallow and has limited capacity for transfer
• Has no natural way to deal with hierarchical structure
• Has struggled with open-ended inference
• Is not sufficiently transparent
• Has not been well integrated with prior knowledge
• Cannot inherently distinguish causation from correlation
• Presumes a largely stable world, in ways that may be problematic
• Works well as an approximation, but answers often can’t be fully trusted
• Is difficult to engineer with
Watson Architecture
From https://seekingalpha.com/article/4087604-much-artificial-intelligence-ibm-watson
How transferable are features in deep neural networks?
From http://cs231n.github.io/transfer-learning/
Transfer Learning
From https://www.mathematik.hu-berlin.de/~perkowsk/files/thesis.pdf
More Transfer Learning
From https://towardsdatascience.com/transfer-learning-from-pre-trained-models-f2393f124751
More Transfer Learning
From http://ruder.io/transfer-learning/
Bayesian Deep Learning
From https://alexgkendall.com/computer_vision/bayesian_deep_learning_for_safe_ai/
Bayesian Learning via Stochastic Gradient Langevin Dynamics
From https://tinyurl.com/22xayz76
In this paper we propose a new framework for learning from large scale
datasets based on iterative learning from small minibatches. By adding the
right amount of noise to a standard stochastic gradient optimization
algorithm we show that the iterates will converge to samples from the true
posterior distribution as we anneal the stepsize. This seamless transition
between optimization and Bayesian posterior sampling provides an in-
built protection against overfitting. We also propose a practical method for
Monte Carlo estimates of posterior statistics which monitors a “sampling
threshold” and collects samples after it has been surpassed. We apply the
method to three models: a mixture of Gaussians, logistic regression and
ICA with natural gradients.
Our method combines Robbins-Monro type algorithms which stochastically
optimize a likelihood, with Langevin dynamics which injects noise into the
parameter updates in such a way that the trajectory of the parameters will
converge to the full posterior distribution rather than just the maximum a
posteriori mode. The resulting algorithm starts off being similar to stochastic
optimization, then automatically transitions to one that simulates samples from
the posterior using Langevin dynamics.
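A toy sketch of the update rule described above, not the paper's implementation: a gradient step on the log posterior plus Gaussian noise whose variance matches the annealed step size. Here the target posterior is simply a standard normal, so the (mini-batch) gradient of the log posterior is just -theta.

import numpy as np

rng = np.random.default_rng(0)
theta, samples = 3.0, []
for t in range(1, 20001):
    eps = 0.5 * t ** (-0.55)                    # polynomially annealed step size
    grad_log_post = -theta                      # stand-in for the stochastic gradient estimate
    theta += 0.5 * eps * grad_log_post + rng.normal(0.0, np.sqrt(eps))   # Langevin update
    if t > 5000:
        samples.append(theta)                   # collect samples after a burn-in "sampling threshold"

print(np.mean(samples), np.std(samples))        # should be roughly (0, 1) for this toy posterior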
Deterministic Variational Inference for Robust Bayesian NNs
From https://openreview.net/pdf?id=B1l08oAct7
Bayesian Deep Learning Survey
From https://arxiv.org/pdf/1604.01662.pdf
Conclusion and Future Research
In this survey, we identified a current trend of merging probabilistic graphical models and neural networks (deep
learning) and reviewed recent work on Bayesian deep learning, which strives to combine the merits of PGM and NN by
organically integrating them in a single principled probabilistic framework. To learn parameters in BDL, several
algorithms have been proposed, ranging from block coordinate descent, Bayesian conditional density filtering, and
stochastic gradient thermostats to stochastic gradient variational Bayes. Bayesian deep learning gains its popularity
both from the success of PGM and from the recent promising advances on deep learning. Since many real-world tasks
involve both perception and inference, BDL is a natural choice to harness the perception ability from NN and the (causal
and logical) inference ability from PGM. Although current applications of BDL focus on recommender systems, topic
models, and stochastic optimal control, in the future, we can expect an increasing number of other applications like link
prediction, community detection, active learning, Bayesian reinforcement learning, and many other complex tasks that
need interaction between perception and causal inference. Besides, with the advances of efficient Bayesian neural
networks (BNN), BDL with BNN as an important component is expected to be more and more scalable.
Ensemble Methods for Deep Learning
From https://machinelearningmastery.com/ensemble-methods-for-deep-learning-neural-networks/
Comparing Loss Functions
From Neural Networks and Deep Learning Book
Seed Reinforcement Learning from Google
From https://ai.googleblog.com/2020/03/massively-scaling-reinforcement.html
The field of reinforcement learning (RL) has recently seen impressive results across a variety of tasks. This has in
part been fueled by the introduction of deep learning in RL and the introduction of accelerators such as GPUs. In
the very recent history, focus on massive scale has been key to solve a number of complicated games such as
AlphaGo (Silver et al., 2016), Dota (OpenAI, 2018) and StarCraft 2 (Vinyals et al., 2017).
The sheer amount of environment data needed to solve tasks trivial to humans, makes distributed machine
learning unavoidable for fast experiment turnaround time. RL is inherently comprised of heterogeneous tasks:
running environments, model inference, model training, replay buffer, etc. and current state-of-the-art distributed
algorithms do not efficiently use compute resources for the tasks. The amount of data and inefficient use of
resources makes experiments unreasonably expensive. The two main challenges addressed in this paper are
scaling of reinforcement learning and optimizing the use of modern accelerators, CPUs and other resources.
We introduce SEED (Scalable, Efficient, Deep-RL), a modern RL agent that scales well, is flexible and efficiently
utilizes available resources. It is a distributed agent where model inference is done centrally combined with fast
streaming RPCs to reduce the overhead of inference calls. We show that with simple methods, one can achieve
state-of-the-art results faster on a number of tasks. For optimal performance, we use TPUs (cloud.google.com/
tpu/) and TensorFlow 2 (Abadi et al., 2015)to simplify the implementation. The cost of running SEED is analyzed
against IMPALA (Espeholt et al., 2018) which is a commonly used state-of-the-art distributed RL algorithm (Veeriah
et al. (2019); Li et al. (2019); Deverett et al. (2019); Omidshafiei et al. (2019); Vezhnevets et al. (2019); Hansen et
al. (2019); Schaarschmidt et al.; Tirumala et al. (2019), ...). We show cost reductions of up to 80% while being
significantly faster. When scaling SEED to many accelerators, it can train on millions of frames per second. Finally,
the implementation is open-sourced together with examples of running it at scale on Google Cloud (see Appendix
A.4 for details) making it easy to reproduce results and try novel ideas.
Designing Neural Nets through Neuroevolution
From tinyurl.com/mykhb52y
Much of recent machine learning has focused on deep learning, in which neural network weights are trained through
variants of stochastic gradient descent. An alternative approach comes from the field of neuroevolution, which harnesses
evolutionary algorithms to optimize neural networks, inspired by the fact that natural brains themselves are the products of
an evolutionary process. Neuroevolution enables important capabilities that are typically unavailable to gradient-based
approaches, including learning neural network building blocks (for example activation functions), hyperparameters,
architectures and even the algorithms for learning themselves. Neuroevolution also differs from deep learning (and deep
reinforcement learning) by maintaining a population of solutions during search, enabling extreme exploration and massive
parallelization. Finally, because neuroevolution research has (until recently) developed largely in isolation from gradient-
based neural network research, it has developed many unique and effective techniques that should be effective in other
machine learning areas too.
This Review looks at several key aspects of modern neuroevolution, including large-scale computing, the benefits of novelty
and diversity, the power of indirect encoding, and the field’s contributions to meta-learning and architecture search. Our hope
is to inspire renewed interest in the field as it meets the potential of the increasing computation available today, to highlight
how many of its ideas can provide an exciting resource for inspiration and hybridization to the deep learning, deep
reinforcement learning and machine learning communities, and to explain how neuroevolution could prove to be a critical
tool in the long-term pursuit of artificial general intelligence.
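To make the idea concrete, here is a minimal, hedged Python sketch of neuroevolution (not taken from the Review): a population of weight vectors for a tiny 2-3-1 network is evolved by mutation and selection alone, with no gradient descent, on the XOR task.

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

def forward(w, x):
    W1, b1, W2, b2 = w[:6].reshape(2, 3), w[6:9], w[9:12], w[12]
    h = np.tanh(x @ W1 + b1)                       # 2-3-1 network
    return 1 / (1 + np.exp(-(h @ W2 + b2)))

def fitness(w):
    pred = np.array([forward(w, x) for x in X])
    return -np.mean((pred - y) ** 2)               # higher is better

pop = rng.normal(0, 1, size=(50, 13))              # population of weight vectors
for gen in range(300):
    scores = np.array([fitness(w) for w in pop])
    parents = pop[np.argsort(scores)[-10:]]        # keep the 10 fittest (elitism)
    children = parents[rng.integers(0, 10, 40)] + rng.normal(0, 0.2, size=(40, 13))  # mutation
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(w) for w in pop])]
print([round(float(forward(best, x)), 2) for x in X])   # should approach [0, 1, 1, 0]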
Illuminating Search Spaces by Mapping Elites
From https://arxiv.org/pdf/1504.04909.pdf
From https://blog.openai.com/reinforcement-learning-with-prediction-based-rewards/#implementationjump
Reinforcement Learning with Prediction-based Rewards
From https://arxiv.org/pdf/1412.3555v1.pdf
A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input data. It is used primarily
in the fields of natural language processing (NLP)[1] and computer vision (CV).[2]
Like recurrent neural networks (RNNs), transformers are designed to process sequential input data, such as natural language, with applications towards tasks such as translation
and text summarization. However, unlike RNNs, transformers process the entire input all at once. The attention mechanism provides context for any position in the input
sequence. For example, if the input data is a natural language sentence, the transformer does not have to process one word at a time. This allows for more parallelization than
RNNs and therefore reduces training times.[1]
Transformers were introduced in 2017 by a team at Google Brain[1] and are increasingly the model of choice for NLP problems,[3] replacing RNN models such as long short-
term memory (LSTM). The additional training parallelization allows training on larger datasets. This led to the development of pretrained systems such as BERT (Bidirectional
Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer), which were trained with large language datasets, such as the Wikipedia Corpus
and Common Crawl, and can be fine-tuned for specific tasks.[4][5]
Attention mechanisms let a model draw from the state at any preceding point along the sequence. The attention layer can access all previous states and weight them according to
a learned measure of relevance, providing relevant information about far-away tokens. When added to RNNs, attention mechanisms increase performance. The development of
the Transformer architecture revealed that attention mechanisms were powerful in themselves and that sequential recurrent processing of data was not necessary to achieve the
quality gains of RNNs with attention. Transformers use an attention mechanism without an RNN, processing all tokens at the same time and calculating attention weights
between them in successive layers. Since the attention mechanism only uses information about other tokens from lower layers, it can be computed for all tokens in parallel,
which leads to improved training speed.
Like earlier seq2seq models, the original Transformer model used an encoder–decoder architecture. The encoder consists of encoding layers that process the input iteratively
one layer after another, while the decoder consists of decoding layers that do the same thing to the encoder's output. The function of each encoder layer is to generate encodings
that contain information about which parts of the inputs are relevant to each other. It passes its encodings to the next encoder layer as inputs. Each decoder layer does the
opposite, taking all the encodings and using their incorporated contextual information to generate an output sequence.[6] To achieve this, each encoder and decoder layer makes
use of an attention mechanism. For each input, attention weighs the relevance of every other input and draws from them to produce the output.[7] Each decoder layer has an
additional attention mechanism that draws information from the outputs of previous decoders, before the decoder layer draws information from the encodings. Both the encoder
and decoder layers have a feed-forward neural network for additional processing of the outputs and contain residual connections and layer normalization steps.
Transformers
From https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)
Transformers
Before transformers, most state-of-the-art NLP systems relied on gated RNNs, such as LSTMs and gated recurrent units (GRUs), with added
attention mechanisms. Transformers also make use of attention mechanisms but, unlike RNNs, do not have a recurrent structure. This means that
provided with enough training data, attention mechanisms alone can match the performance of RNNs with attention.[1]
Sequential processing
Gated RNNs process tokens sequentially, maintaining a state vector that contains a representation of the data seen prior to the current token. To
process the n-th token, the model combines the state representing the sentence up to token n - 1 with the information of the new token to create a new
state, representing the sentence up to token n. Theoretically, the information from one token can propagate arbitrarily far down the sequence, if at
every point the state continues to encode contextual information about the token. In practice this mechanism is flawed: the vanishing gradient
problem leaves the model's state at the end of a long sentence without precise, extractable information about preceding tokens. The dependency of
token computations on results of previous token computations also makes it hard to parallelize computation on modern deep learning hardware.
This can make the training of RNNs inefficient.
Self-Attention
These problems were addressed by attention mechanisms. Attention mechanisms let a model draw from the state at any preceding point along the
sequence. The attention layer can access all previous states and weight them according to a learned measure of relevance, providing relevant
information about far-away tokens.
A clear example of the value of attention is in language translation, where context is essential to assign the meaning of a word in a sentence. In an
English-to-French translation system, the first word of the French output most probably depends heavily on the first few words of the English input.
However, in a classic LSTM model, in order to produce the first word of the French output, the model is given only the state vector after processing
the last English word. Theoretically, this vector can encode information about the whole English sentence, giving the model all necessary
knowledge. In practice, this information is often poorly preserved by the LSTM. An attention mechanism can be added to address this problem: the
decoder is given access to the state vectors of every English input word, not just the last, and can learn attention weights that dictate how much to
attend to each English input state vector.
When added to RNNs, attention mechanisms increase performance. The development of the Transformer architecture revealed that attention
mechanisms were powerful in themselves and that sequential recurrent processing of data was not necessary to achieve the quality gains of RNNs
with attention. Transformers use an attention mechanism without an RNN, processing all tokens at the same time and calculating attention weights
between them in successive layers. Since the attention mechanism only uses information about other tokens from lower layers, it can be computed
for all tokens in parallel, which leads to improved training speed.
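A minimal NumPy sketch of the scaled dot-product self-attention computation described above (the projection matrices and token embeddings are random placeholders standing in for learned weights and real inputs):

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv               # project tokens to queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # pairwise relevance between all token positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over the sequence dimension
    return weights @ V                             # each output mixes the values of all tokens

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
X = rng.normal(size=(seq_len, d_model))            # toy token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)         # -> (5, 8); all positions computed in parallel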
From https://en.wikipedia.org/wiki/GPT-3
GPT-3
Generative Pre-trained Transformer 3 (GPT-3; stylized GPT·3) is an autoregressive language model that uses deep learning to
produce human-like text.
The architecture is a standard transformer network (with a few engineering tweaks) with the unprecedented size of 2048-token-long
context and 175 billion parameters (requiring 800 GB of storage). The training method is "generative pretraining", meaning that it is
trained to predict what the next token is. The model demonstrated strong few-shot learning on many text-based tasks.
It is the third-generation language prediction model in the GPT-n series (and the successor to GPT-2) created by OpenAI, a San
Francisco-based artificial intelligence research laboratory.[2] GPT-3's full version has a capacity of 175 billion machine learning
parameters. GPT-3, which was introduced in May 2020, and was in beta testing as of July 2020,[3] is part of a trend in natural language
processing (NLP) systems of pre-trained language representations.[1]
The quality of the text generated by GPT-3 is so high that it can be difficult to determine whether or not it was written by a human,
which has both benefits and risks.[4] Thirty-one OpenAI researchers and engineers presented the original May 28, 2020 paper
introducing GPT-3. In their paper, they warned of GPT-3's potential dangers and called for research to mitigate risk.[1]:34 David
Chalmers, an Australian philosopher, described GPT-3 as "one of the most interesting and important AI systems ever produced."[5]
Microsoft announced on September 22, 2020, that it had licensed "exclusive" use of GPT-3; others can still use the public API to receive
output, but only Microsoft has access to GPT-3's underlying model.[6]
An April 2022 review in The New York Times described GPT-3's capabilities as being able to write original prose with fluency
equivalent to that of a human.[7]
OpenAI
From https://openai.com/
Recent Research
Efficient Training of Language Models to Fill in the Middle
Hierarchical Text-Conditional Image Generation with CLIP Latents
Formal Mathematics Statement Curriculum Learning
Training language models to follow instructions with human feedback
Text and Code Embeddings by Contrastive Pre-Training
WebGPT: Browser-assisted question-answering with human feedback
Training Verifiers to Solve Math Word Problems
Recursively Summarizing Books with Human Feedback
Evaluating Large Language Models Trained on Code
Process for Adapting Language Models to
Society (PALMS) with Values-Targeted Datasets
Multimodal Neurons in Artificial Neural Networks
Learning Transferable Visual Models From Natural Language Supervision
Zero-Shot Text-to-Image Generation
Understanding the Capabilities, Limitations,
and Societal Impact of Large Language Models
OpenAI is an AI research and deployment company. Our mission is to ensure that artificial general intelligence benefits all of humanity.
From https://deepbrainai.io/?www.deepbrainai.io=
Deep Brain
Reservoir Computing
From https://martinuzzifrancesco.github.io/posts/a-brief-introduction-to-reservoir-computing/
Reservoir Computing is an umbrella term used to identify a general framework of computation derived from Recurrent Neural Networks (RNN),
independently developed by Jaeger [1] and Maass et al. [2]. These papers introduced the concepts of Echo State Networks (ESN) and Liquid State Machines
(LSM) respectively. Further improvements over these two models constitute what is now called the field of Reservoir Computing. The main idea lies in
leveraging a fixed non-linear system, of higher dimension than the input, onto which the input signal is mapped. After this mapping it is only necessary to use a
simple readout layer to harvest the state of the reservoir and to train it to the desired output. In principle, given a complex enough system, this architecture
should be capable of any computation [3]. The intuition was born from the fact that in training RNNs most of the times the weights showing most change were
the ones in the last layer [4]. In the next section we will also see that ESNs actually use a fixed random RNN as the reservoir. Given the static nature of this
implementation usually ESNs can yield faster results and in some cases even better, in particular when dealing with chaotic time series predictions [5].
But not every complex system is suited to be a good reservoir. A good reservoir is one that is able to separate inputs; different external inputs should drive the
system to different regions of the configuration space [3]. This is called the separability condition. Furthermore an important property for the reservoirs of
ESNs is the Echo State property which states that inputs to the reservoir echo in the system forever, or util they dissipate. A more formal definition of this
property can be found in [6].
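The following minimal sketch shows the structure the passage describes: a fixed random reservoir driven by the input, with only a linear readout trained by ridge regression. The reservoir size, spectral radius, leak rate, and the toy sine-wave task are assumptions chosen for illustration, not values from the cited papers.

```python
import numpy as np

# Minimal Echo State Network sketch (illustrative only).
rng = np.random.default_rng(0)
n_reservoir, n_input = 200, 1
spectral_radius, leak = 0.9, 1.0

# Fixed random input and reservoir weights (never trained).
W_in = rng.uniform(-0.5, 0.5, (n_reservoir, n_input))
W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # common echo-state heuristic

def run_reservoir(u_seq):
    """Drive the fixed reservoir with the input sequence and collect its states."""
    x = np.zeros(n_reservoir)
    states = []
    for u in u_seq:
        x = (1 - leak) * x + leak * np.tanh(W_in @ np.atleast_1d(u) + W @ x)
        states.append(x.copy())
    return np.array(states)

# Train only the linear readout with ridge regression on a toy prediction task.
u_train = np.sin(0.1 * np.arange(1000))           # toy input signal
y_train = np.roll(u_train, -1)                    # predict the next value
X = run_reservoir(u_train)
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_reservoir), X.T @ y_train)

y_pred = X @ W_out                                # the readout is a single linear map
```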
Reservoir computing is a best-in-class machine learning algorithm for processing information generated by dynamical systems using observed time-series
data. Importantly, it requires very small training data sets, uses linear optimization, and thus requires minimal computing resources. However, the
algorithm uses randomly sampled matrices to define the underlying recurrent neural network and has a multitude of metaparameters that must be
optimized. Recent results demonstrate the equivalence of reservoir computing to nonlinear vector autoregression, which requires no random matrices,
fewer metaparameters, and provides interpretable results. Here, we demonstrate that nonlinear vector autoregression excels at reservoir computing
benchmark tasks and requires even shorter training data sets and training time, heralding the next generation of reservoir computing.
A dynamical system evolves in time, with examples including the Earth’s weather system and human-built devices such as unmanned aerial vehicles. One practical
goal is to develop models for forecasting their behavior. Recent machine learning (ML) approaches can generate a model using only observed data, but many of these
algorithms tend to be data hungry, requiring long observation times and substantial computational resources.
Reservoir computing1,2 is an ML paradigm that is especially well-suited for learning dynamical systems. Even when systems display chaotic3 or complex
spatiotemporal behaviors4, which are considered the hardest-of-the-hard problems, an optimized reservoir computer (RC) can handle them with ease.
From https://www.nature.com/articles/s41467-021-25801-2
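A rough sketch of the nonlinear vector autoregression idea mentioned in the Nature abstract: the random reservoir is replaced by a deterministic feature vector built from time-delayed inputs and their quadratic products, and only a ridge-regression readout is trained. The delay depth, ridge parameter, and toy time series below are illustrative assumptions, not values from the paper.

```python
import numpy as np
from itertools import combinations_with_replacement

def nvar_features(u, k=2):
    """Build [1, delayed inputs, quadratic terms] for each time step t >= k."""
    rows = []
    for t in range(k, len(u)):
        lin = u[t - k:t][::-1]                          # u_{t-1}, ..., u_{t-k}
        quad = [a * b for a, b in combinations_with_replacement(lin, 2)]
        rows.append(np.concatenate(([1.0], lin, quad)))
    return np.array(rows)

rng = np.random.default_rng(0)
u = np.sin(0.2 * np.arange(600)) + 0.05 * rng.standard_normal(600)  # toy series
k, ridge = 2, 1e-6

X = nvar_features(u, k)          # deterministic features, no random matrices
y = u[k:]                        # one-step-ahead targets
W = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y)

pred = X @ W                     # linear readout over the nonlinear features
print("training RMSE:", np.sqrt(np.mean((pred - y) ** 2)))
```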
Reservoir Computing Trends
From https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.709.514&rep=rep1&type=pdf
Brain Connectivity meets Reservoir Computing
From https://www.biorxiv.org/content/10.1101/2021.01.22.427750v1
The connectivity of Artificial Neural Networks (ANNs) is different from the one observed in Biological Neural Networks (BNNs).
Can the wiring of actual brains help improve ANNs architectures? Can we learn from ANNs about what network features support
computation in the brain when solving a task?
ANNs’ architectures are carefully engineered and have crucial importance in many recent performance improvements. On the
other hand, BNNs exhibit complex emergent connectivity patterns. At the individual level, BNNs’ connectivity results from brain
development and plasticity processes, while at the species level, adaptive reconfigurations during evolution also play a major role
in shaping connectivity.
Ubiquitous features of brain connectivity have been identified in recent years, but their role in the brain’s ability to perform
concrete computations remains poorly understood. Computational neuroscience studies reveal the influence of specific brain
connectivity features only on abstract dynamical properties, although the implications of real brain network topologies on
machine learning or cognitive tasks have been barely explored.
Here we present a cross-species study with a hybrid approach integrating real brain connectomes and Bio-Echo State Networks,
which we use to solve concrete memory tasks, allowing us to probe the potential computational implications of real brain
connectivity patterns on task solving.
We find results consistent across species and tasks, showing that biologically inspired networks perform as well as classical echo
state networks, provided a minimum level of randomness and diversity of connections is allowed. We also present a framework,
bio2art, to map and scale up real connectomes that can be integrated into recurrent ANNs. This approach also allows us to show
the crucial importance of the diversity of interareal connectivity patterns, stressing the importance of stochastic processes
determining neural network connectivity in general.
Deep Learning Models
Sharing Models
From https://arxiv.org/ftp/arxiv/papers/1810/1810.07159.pdf
Summary of Deep Learning Models: Survey
From https://arxiv.org/pdf/1712.04301.pdf
Deep Learning Acronyms
From https://arxiv.org/pdf/1712.04301.pdf
Deep Learning Hardware
From https://medium.com/iotforall/using-deep-learning-processors-for-intelligent-iot-devices-1a7ed9d2226d
Deep Learning MIT
From https://deeplearning.mit.edu/
ONNX
From http://onnx.ai/
GitHub ONNX Models
From https://github.com/onnx/models
HPC vs Big Data Ecosystems
From https://www.hpcwire.com/2018/08/31/the-convergence-of-big-data-and-extreme-scale-hpc/
HPC and ML
From http://dsc.soic.indiana.edu/publications/Learning_Everywhere_Summary.pdf
• HPCforML: Using HPC to execute and enhance ML performance, or using HPC simulations to train ML algorithms (theory-guided machine learning), which are then used to understand experimental data or simulations.
• MLforHPC: Using ML to enhance HPC applications and systems.
This categorization is related to Jeff Dean’s “Machine Learning for Systems and Systems for Machine Learning” [6] and Matsuoka’s convergence of AI and HPC [7]. We further subdivide HPCforML as:
• HPCrunsML: Using HPC to execute ML with high performance.
• SimulationTrainedML: Using HPC simulations to train ML algorithms, which are then used to understand experimental data or simulations.
We also subdivide MLforHPC as:
• MLautotuning: Using ML to configure (autotune) ML or HPC simulations. Already, autotuning with systems like ATLAS is hugely successful and gives an initial view of MLautotuning. As well as choosing block sizes to improve cache use and vectorization, MLautotuning can also be used for simulation mesh sizes [8] and in big data problems for configuring databases and complex systems like Hadoop and Spark [9], [10].
• MLafterHPC: ML analyzing results of HPC, as in trajectory analysis and structure identification in biomolecular simulations.
• MLaroundHPC: Using ML to learn from simulations and produce learned surrogates for the simulations (a minimal surrogate sketch follows this list). The same ML wrapper can also learn configurations as well as results. This differs from SimulationTrainedML, where typically a learnt network is used to redirect observation, whereas in MLaroundHPC we are using the ML to improve the HPC performance.
• MLControl: Using simulations (with HPC) in control of experiments and in objective-driven computational campaigns [11]. Here the simulation surrogates are very valuable to allow real-time predictions.
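A minimal sketch of the MLaroundHPC surrogate idea referenced in the list above, assuming a toy stand-in for the expensive simulation and a simple polynomial fit as the learned surrogate; real MLaroundHPC work would use a neural network and a genuine HPC code.

```python
import numpy as np

def expensive_simulation(x):
    """Stand-in for an HPC code: maps an input parameter to an observable."""
    return np.sin(3.0 * x) * np.exp(-0.5 * x)

# 1. Run the "simulation" at a modest number of training points.
x_train = np.linspace(0.0, 3.0, 40)
y_train = expensive_simulation(x_train)

# 2. Fit a surrogate (here a degree-8 polynomial) to the simulation outputs.
coeffs = np.polyfit(x_train, y_train, deg=8)
surrogate = np.poly1d(coeffs)

# 3. Use the cheap surrogate for fast predictions at new inputs.
x_new = np.linspace(0.0, 3.0, 500)
max_err = np.max(np.abs(surrogate(x_new) - expensive_simulation(x_new)))
print(f"max surrogate error on [0, 3]: {max_err:.4f}")
```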
Designing Neural Nets through Neuroevolution
From www.evolvingai.org/stanley-clune-lehman-2019-designing-neural-networks
Go Explore Algorithm
From http://www.evolvingai.org/files/1901.10995.pdf
Deep Density Destructors
From https://www.cs.cmu.edu/~dinouye/papers/inouye2018-deep-density-destructors-icml2018.pdf
We propose a unified framework for deep density models by formally defining density
destructors. A density destructor is an invertible function that transforms a given density to
the uniform density—essentially destroying any structure in the original density. This
destructive transformation generalizes Gaussianization via ICA and more recent
autoregressive models such as MAF and Real NVP. Informally, this transformation can be
seen as a generalized whitening procedure or a multivariate generalization of the univariate
CDF function. Unlike Gaussianization, our destructive transformation has the elegant
property that the density function is equal to the absolute value of the Jacobian determinant.
Thus, each layer of a deep density can be seen as a shallow density—uncovering a
fundamental connection between shallow and deep densities. In addition, our framework
provides a common interface for all previous methods enabling them to be systematically
combined, evaluated and improved. Leveraging the connection to shallow densities, we also
propose a novel tree destructor based on tree densities and an image-specific destructor based
on pixel locality. We illustrate our framework on a 2D dataset, MNIST, and CIFAR-10.
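A one-dimensional illustration of the destructor idea in the abstract: the CDF of a distribution is an invertible map that "destroys" its structure by sending samples to the uniform density, and the density equals the absolute value of the (1-D) Jacobian of that map. The Gaussian example below is an assumption for illustration, not the paper's code.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=10_000)          # samples from N(1, 2^2)

# The CDF is an invertible "destructor": it maps the samples to Uniform(0, 1).
destructor = lambda x: norm.cdf(x, loc=1.0, scale=2.0)
u = destructor(x)
print("transformed samples look uniform:",
      u.min() > 0, u.max() < 1, abs(u.mean() - 0.5) < 0.02)

# density(x) = |d/dx destructor(x)|; for the CDF this is exactly the pdf.
x0, eps = 0.3, 1e-5
numeric_jacobian = (destructor(x0 + eps) - destructor(x0 - eps)) / (2 * eps)
print(numeric_jacobian, norm.pdf(x0, loc=1.0, scale=2.0))  # these agree
```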
Predictive Perception
From https://www.quantamagazine.org/to-make-sense-of-the-present-brains-may-predict-the-future-20180710/
Sci-Kit Learning Decision Tree
From https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463
Imitation Learning
From https://drive.google.com/file/d/12QdNmMll-bGlSWnm8pmD_TawuRN7xagX/view
Imitation Learning
From https://drive.google.com/file/d/12QdNmMll-bGlSWnm8pmD_TawuRN7xagX/view
Generative Adversarial Networks (GANs)
From https://skymind.ai/wiki/generative-adversarial-network-gan
Deep Generative Network-based Activation Management (DGN-AMs)
From https://arxiv.org/pdf/1605.09304.pdf
Paired Open Ended Trailblazer (POET)
From https://drive.google.com/file/d/12QdNmMll-bGlSWnm8pmD_TawuRN7xagX/view
One Model to Learn Them All
From https://arxiv.org/pdf/1706.05137.pdf
Self-modifying NNs With Differentiable Neuromodulated Plasticity
From https://arxiv.org/pdf/1706.05137.pdf
Stein Variational Gradient Descent
From https://arxiv.org/pdf/1706.05137.pdf
Linux Foundation Deep Learning (LFDL) Projects
From https://lfdl.io/projects/
Linux Foundation Deep Learning (LFDL) Projects
From https://lfdl.io/projects/
Deep Learning Hardware
Graphical Processing Units (GPU)
From https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html
Graphics processing technology has evolved to deliver unique benefits in the world of computing. The latest
graphics processing units (GPUs) unlock new possibilities in gaming, content creation, machine learning, and more.
What Does a GPU Do?
The graphics processing unit, or GPU, has become one of the most important types of computing technology, both for personal
and business computing. Designed for parallel processing, the GPU is used in a wide range of applications, including graphics and
video rendering. Although they’re best known for their capabilities in gaming, GPUs are becoming more popular for use in
creative production and artificial intelligence (AI).
GPUs were originally designed to accelerate the rendering of 3D graphics. Over time, they became more flexible and
programmable, enhancing their capabilities. This allowed graphics programmers to create more interesting visual effects and
realistic scenes with advanced lighting and shadowing techniques. Other developers also began to tap the power of GPUs to
dramatically accelerate additional workloads in high performance computing (HPC), deep learning, and more.
GPU and CPU: Working Together
The GPU evolved as a complement to its close cousin, the CPU (central processing unit). While CPUs have continued to deliver performance
increases through architectural innovations, faster clock speeds, and the addition of cores, GPUs are specifically designed to accelerate
computer graphics workloads. When shopping for a system, it can be helpful to know the role of the CPU vs. GPU so you can make the most
of both.
GPU vs. Graphics Card: What’s the Difference?
While the terms GPU and graphics card (or video card) are often used interchangeably, there is a subtle distinction between these terms.
Much like a motherboard contains a CPU, a graphics card refers to an add-in board that incorporates the GPU. This board also includes the
raft of components required to both allow the GPU to function and connect to the rest of the system.
GPUs come in two basic types: integrated and discrete. An integrated GPU does not come on its own separate card at all and is instead
embedded alongside the CPU. A discrete GPU is a distinct chip that is mounted on its own circuit board and is typically attached to a PCI
Express slot.
NVidia Graphical Processing Units (GPU)
From https://en.wikipedia.org/wiki/Nvidia
Nvidia Corporation[note 1][note 2] (/ɛnˈvɪdiə/ en-VID-ee-ə) is an American multinational technology company incorporated in
Delaware and based in Santa Clara, California.[2] It is a software and fabless company which designs graphics processing units
(GPUs), application programming interfaces (APIs) for data science and high-performance computing, as well as system on a chip
units (SoCs) for the mobile computing and automotive market. Nvidia is a global leader in artificial intelligence hardware and
software.[3][4] Its professional line of GPUs are used in workstations for applications in such fields as architecture, engineering and
construction, media and entertainment, automotive, scientific research, and manufacturing design.[5]
In addition to GPU manufacturing, Nvidia provides an API called CUDA that allows the creation of massively parallel programs
which utilize GPUs.[6][7] They are deployed in supercomputing sites around the world.[8][9] More recently, it has moved into the
mobile computing market, where it produces Tegra mobile processors for smartphones and tablets as well as vehicle navigation
and entertainment systems.[10][11][12] In addition to AMD, its competitors include Intel,[13] Qualcomm[14] and AI-accelerator
companies such as Graphcore.
Nvidia's GPUs are used in edge-to-cloud computing and in supercomputers; Nvidia provides the accelerators (GPUs) for many of
them, including a former fastest system, although the current fastest and most power-efficient systems are powered by AMD
GPUs and CPUs. Nvidia has also expanded its presence in the gaming industry with its handheld game consoles
Shield Portable, Shield Tablet, and Shield Android TV, and its cloud gaming service GeForce Now.
Nvidia announced plans on September 13, 2020, to acquire Arm from SoftBank, pending regulatory approval, for a value of
US$40 billion in stock and cash, which would be the largest semiconductor acquisition to date. SoftBank Group will acquire
slightly less than a 10% stake in Nvidia, and Arm would maintain its headquarters in Cambridge.[15][16][17][18]
Tesla unveils new Dojo Supercomputer
From https://electrek.co/2022/10/01/tesla-dojo-supercomputer-tripped-power-grid/
Tesla has unveiled its latest version of its Dojo supercomputer and it’s apparently so powerful that it tripped the power grid in Palo
Alto. Dojo is Tesla’s own custom supercomputer platform built from the ground up for AI machine learning and more specifically
for video training using the video data coming from its fleet of vehicles.
The automaker already has a large NVIDIA GPU-based supercomputer that is one of the most powerful in the world, but the new
Dojo custom-built computer is using chips and an entire infrastructure designed by Tesla. The custom-built supercomputer is
expected to elevate Tesla’s capacity to train neural nets using video data, which is critical to its computer vision technology
powering its self-driving effort.
Last year, at Tesla’s AI Day, the company unveiled its Dojo supercomputer, but the company was still ramping up its effort at the
time. It only had its first chip and training tiles, and it was still working on building a full Dojo cabinet and cluster or
“Exapod.” Now Tesla has unveiled the progress made with the Dojo program over the last year during its AI Day 2022 last night.
Why does Tesla need the Dojo supercomputer?
It’s a fair question. Why is an automaker developing the world’s most powerful supercomputer? Well, Tesla would tell you that it’s
not just an automaker, but a technology company developing products to accelerate the transition to a sustainable economy. Musk
said it makes sense to offer Dojo as a service, perhaps to take on his buddy Jeff Bezos’s Amazon AWS, calling it a “service
that you can use that’s available online where you can train your models way faster and for less money.”
But more specifically, Tesla needs Dojo to auto-label training videos from its fleet and train its neural nets to build its self-driving
system. Tesla realized that its approach to developing a self-driving system using neural nets trained on millions of videos coming
from its customer fleet requires a lot of computing power, and it decided to develop its own supercomputer to deliver that power.
That’s the short-term goal, but Tesla will have plenty of use for the supercomputer going forward as it has big ambitions to
develop other artificial intelligence programs.
Linux Foundation Deep Learning (LFDL) Projects
From https://lfdl.io/projects/
Reinforcement Learning
Introduction to Deep Reinforcement Learning
From https://skymind.ai/wiki/deep-reinforcement-learning
Many RL references at this site
Model-based Reinforcement Learning
From http://rail.eecs.berkeley.edu/deeprlcourse-fa17/f17docs/lecture_9_model_based_rl.pdf
Hierarchical Deep Reinforcement Learning
From https://papers.nips.cc/paper/6233-hierarchical-deep-reinforcement-learning-integrating-temporal-abstraction-and-intrinsic-motivation.pdf
Meta Learning Shared Hierarchy
From https://skymind.ai/wiki/deep-reinforcement-learning
Learning with Hierarchical Deep Models
From https://www.cs.toronto.edu/~rsalakhu/papers/HD_PAMI.pdf
We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture
that integrates deep learning models with structured hierarchical Bayesian (HB) models.
Specifically, we show how we can learn a hierarchical Dirichlet process (HDP) prior over the
activities of the top-level features in a deep Boltzmann machine (DBM). This compound HDP-
DBM model learns to learn novel concepts from very few training examples by learning low-
level generic features, high-level features that capture correlations among low-level features,
and a category hierarchy for sharing priors over the high-level features that are typical of
different kinds of concepts. We present efficient learning and inference algorithms for the
HDP-DBM model and show that it is able to learn new concepts from very few examples on
CIFAR-100 object recognition, handwritten character recognition, and human motion capture
datasets.
Transfer Learning
From http://cs231n.github.io/transfer-learning/
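A hedged sketch of the standard transfer-learning recipe discussed in resources like the cs231n notes linked above: reuse a CNN pretrained on ImageNet as a fixed feature extractor and train only a new final classifier for the target task. The class count, optimizer settings, and the random batch below are illustrative assumptions, not part of the cited material.

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)        # pretrained feature extractor
for param in model.parameters():
    param.requires_grad = False                 # freeze all pretrained weights

num_classes = 10                                # hypothetical target task
model.fc = nn.Linear(model.fc.in_features, num_classes)   # new trainable head

optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a random batch (stand-in for real data).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))
logits = model(images)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```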
Convolutional Deep Belief Networks for Scalable
Unsupervised Learning of Hierarchical Representations
From https://web.eecs.umich.edu/~honglak/icml09-ConvolutionalDeepBeliefNetworks.pdf
There has been much interest in unsupervised learning of hierarchical generative models such as deep belief networks. Scaling such models to
full-sized, high-dimensional images remains a difficult problem. To address this problem, we present the convolutional deep belief network, a
hierarchical generative model which scales to realistic image sizes. This model is translation-invariant and supports efficient bottom-up and top-
down probabilistic inference. Key to our approach is probabilistic max-pooling, a novel technique which shrinks the representations of higher
layers in a probabilistically sound way. Our experiments show that the algorithm learns useful high-level visual features, such as object parts, from
unlabeled images of objects and natural scenes. We demonstrate excellent performance on several visual recognition tasks and show that our
model can perform hierarchical (bottom-up and top-down) inference over full-sized images.
The visual world can be described at many levels: pixel intensities, edges, object parts, objects, and beyond. The prospect of learning hierarchical
models which simultaneously represent multiple levels has recently generated much interest. Ideally, such “deep” representations would learn
hierarchies of feature detectors, and further be able to combine top-down and bottom-up processing of an image. For instance, lower layers could
support object detection by spotting low-level features indicative of object parts. Conversely, information about objects in the higher layers could
resolve lower-level ambiguities in the image or infer the locations of hidden object parts. Deep architectures consist of feature detector units
arranged in layers. Lower layers detect simple features and feed into higher layers, which in turn detect more complex features. There have been
several approaches to learning deep networks (LeCun et al., 1989; Bengio et al., 2006; Ranzato et al., 2006; Hinton et al., 2006). In particular, the
deep belief network (DBN) (Hinton et al., 2006) is a multilayer generative model where each layer encodes statistical dependencies among the
units in the layer below it; it is trained to (approximately) maximize the likelihood of its training data. DBNs have been successfully used to learn
high-level structure in a wide variety of domains, including handwritten digits (Hinton et al., 2006) and human motion capture data (Taylor et al.,
2007). We build upon the DBN in this paper because we are interested in learning a generative model of images which can be trained in a purely
unsupervised manner.
This paper presents the convolutional deep belief network, a hierarchical generative model that scales to full-sized images. Another key to our
approach is probabilistic max-pooling, a novel technique that allows higher-layer units to cover larger areas of the input in a probabilistically
sound way. To the best of our knowledge, ours is the first translation invariant hierarchical generative model which supports both top-down and
bottom-up probabilistic inference and scales to realistic image sizes. The first, second, and third layers of our network learn edge detectors, object
parts, and objects respectively. We show that these representations achieve excellent performance on several visual recognition tasks and allow
“hidden” object parts to be inferred from high-level object information.
Learning with Hierarchical-Deep Models
From https://www.cs.toronto.edu/~rsalakhu/papers/HD_PAMI.pdf
We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture that integrates deep learning models with structured
hierarchical Bayesian (HB) models. Specifically, we show how we can learn a hierarchical Dirichlet process (HDP) prior over the activities of the
top-level features in a deep Boltzmann machine (DBM). This compound HDP-DBM model learns to learn novel concepts from very few training
examples by learning low-level generic features, high-level features that capture correlations among low-level features, and a category hierarchy for
sharing priors over the high-level features that are typical of different kinds of concepts. We present efficient learning and inference algorithms for
the HDP-DBM model and show that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten
character recognition, and human motion capture datasets
The ability to learn abstract representations that support transfer to novel but related tasks lies at the core of many problems in computer vision,
natural language processing, cognitive science, and machine learning. In typical applications of machine classification algorithms today, learning a
new concept requires tens, hundreds, or thousands of training examples. For human learners, however, just one or a few examples are often
sufficient to grasp a new category and make meaningful generalizations to novel instances [15], [25], [31], [44]. Clearly, this requires very strong
but also appropriately tuned inductive biases. The architecture we describe here takes a step toward this ability by learning several forms of abstract
knowledge at different levels of abstraction that support transfer of useful inductive biases from previously learned concepts to novel ones.
We call our architectures compound HD models, where “HD” stands for “Hierarchical-Deep,” because they are derived by composing hierarchical
nonparametric Bayesian models with deep networks, two influential approaches from the recent unsupervised learning literature with
complementary strengths. Recently introduced deep learning models, including deep belief networks (DBNs) [12], deep Boltzmann machines
(DBM) [29], deep autoencoders [19], and many others [9], [10], [21], [22], [26], [32], [34], [43], have been shown to learn useful distributed feature
representations for many high-dimensional datasets. The ability to automatically learn in multiple layers allows deep models to construct
sophisticated domain-specific features without the need to rely on precise human-crafted input representations, which is increasingly important with the
proliferation of datasets and application domains.
Reinforcement Learning: Fast and Slow
From https://www.cell.com/trends/cognitive-sciences/fulltext/S1364-6613(19)30061-0
Meta-RL: Speeding up Deep RL by Learning to Learn
As discussed earlier, a second key source of slowness in standard deep RL, alongside incremental
updating, is weak inductive bias. As formalized in the idea of the bias–variance tradeoff, fast learning
requires the learner to go in with a reasonably sized set of hypotheses concerning the structure of the
patterns that it will face. The narrower the hypothesis set, the faster learning can be. However, as
foreshadowed earlier, there is a catch: a narrow hypothesis set will only speed learning if it contains
the correct hypothesis. While strong inductive biases can accelerate learning, they will only do so if
the specific biases the learner adopts happen to fit with the material to be learned. As a result of this, a
new learning problem arises: how can the learner know what inductive biases to adopt?
Episodic Deep RL: Fast Learning through Episodic Memory
If incremental parameter adjustment is one source of slowness in deep RL, then one way to
learn faster might be to avoid such incremental updating. Naively increasing the learning rate
governing gradient descent optimization leads to the problem of catastrophic interference.
However, recent research shows that there is another way to accomplish the same goal, which
is to keep an explicit record of past events, and use this record directly as a point of reference
in making new decisions. This idea, referred to as episodic RL, parallels ‘non-parametric’
approaches in machine learning and resembles ‘instance-’ or ‘exemplar-based’ theories of
learning in psychology. When a new situation is encountered and a decision must be made
concerning what action to take, the procedure is to compare an internal representation of the
current situation with stored representations of past situations. The action chosen is then the
one associated with the highest value, based on the outcomes of the past situations that are
most similar to the present. When the internal state representation is computed by a multilayer
neural network, we refer to the resulting algorithm as ‘episodic deep RL’.
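A minimal sketch of the episodic RL scheme described above: keep an explicit record of past situations, actions, and outcomes, and choose the action whose most similar stored situations yielded the highest return. The k-nearest-neighbour value estimate and the toy state vectors are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class EpisodicMemory:
    def __init__(self, n_actions, k=5):
        self.n_actions = n_actions
        self.k = k
        self.records = []                        # (state vector, action, return)

    def store(self, state, action, ret):
        self.records.append((np.asarray(state, dtype=float), action, ret))

    def value(self, state, action):
        """Average return of the k most similar stored states for this action."""
        matches = [(np.linalg.norm(s - state), r)
                   for s, a, r in self.records if a == action]
        if not matches:
            return 0.0
        matches.sort(key=lambda d_r: d_r[0])
        return float(np.mean([r for _, r in matches[:self.k]]))

    def act(self, state):
        state = np.asarray(state, dtype=float)
        return int(np.argmax([self.value(state, a) for a in range(self.n_actions)]))

# Usage: store a few fictitious episodes, then query a new situation.
memory = EpisodicMemory(n_actions=2)
memory.store([0.1, 0.0], action=0, ret=1.0)
memory.store([0.9, 1.0], action=1, ret=0.2)
memory.store([0.2, 0.1], action=0, ret=0.8)
print(memory.act([0.15, 0.05]))    # picks the action that paid off in similar states
```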
Google Research featuring Jeff Dean
Large-Scale Deep Learning (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
Embedding for Sparse Inputs (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
Efficient Vector Representation of Words (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
Deep Convolution Neural Nets and Gaussian Processes
From https://ai.google/research/pubs/pub47671
Deep Convolution Neural Nets and Gaussian Processes(cont)
From https://ai.google/research/pubs/pub47671
Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
Google’s Inception Network
From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
Large-Scale Deep Learning (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
Large-Scale Deep Learning (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
Large-Scale Deep Learning (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
Large-Scale Deep Learning (Jeff Dean)
From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
Computing and Sensing Architecture
Hierarchical C4ISR Flow Model from Bob Marcus
[Figure: diagram labels include Input Devices, Measurement, Field Processors, Preprocess, Simple Event Processing, Complex Event Processing, World Model Update, New World Model, Situation Impact, Object Process, Awareness, Decision, Simple Response, Complex Response, Update Plan, Create New Goals and Plan, Sensor and Effects Management, Actuator Devices, Field Operations, HQ Operations, Tactics, Strategy, and the progression Data, Structured Data, Information, Knowledge, Wisdom.]
Adapted From http://www.et-strategies.com/great-global-grid/Events.pdf
Computing and Sensing Architectures
From https://www.researchgate.net/publication/323835314_Greening_Trends_in_Energy-Efficiency_of_IoT-based_Heterogeneous_Wireless_Nodes/figures?lo=1
Computing and Sensing Architectures
From https://www.researchgate.net/publication/323835314_Greening_Trends_in_Energy-Efficiency_of_IoT-based_Heterogeneous_Wireless_Nodes/figures?lo=1
Bio-Inspired Distributed Intelligence
From https://news.mit.edu/2022/wiggling-toward-bio-inspired-machine-intelligence-juncal-arbelaiz-1002
More than half of an octopus’ nerves are distributed through its eight arms, each of which has some degree of autonomy. This
distributed sensing and information processing system intrigued Arbelaiz, who is researching how to design decentralized
intelligence for human-made systems with embedded sensing and computation. At MIT, Arbelaiz is an applied math student who
is working on the fundamentals of optimal distributed control and estimation in the final weeks before completing her PhD this
fall.
She finds inspiration in the biological intelligence of invertebrates such as octopus and jellyfish, with the ultimate goal of
designing novel control strategies for flexible “soft” robots that could be used in tight or delicate surroundings, such as a surgical
tool or for search-and-rescue missions.
“The squishiness of soft robots allows them to dynamically adapt to different environments. Think of worms, snakes, or jellyfish,
and compare their motion and adaptation capabilities to those of vertebrate animals,” says Arbelaiz. “It is an interesting expression
of embodied intelligence — lacking a rigid skeleton gives advantages to certain applications and helps to handle uncertainty in the
real world more efficiently. But this additional softness also entails new system-theoretic challenges.”
In the biological world, the “controller” is usually associated with the brain and central nervous system — it creates motor
commands for the muscles to achieve movement. Jellyfish and a few other soft organisms lack a centralized nerve center, or brain.
Inspired by this observation, she is now working toward a theory where soft-robotic systems could be controlled using
decentralized sensory information sharing.
“When sensing and actuation are distributed in the body of the robot and onboard computational capabilities are limited, it might
be difficult to implement centralized intelligence,” she says. “So, we need these sort of decentralized schemes that, despite sharing
sensory information only locally, guarantee the desired global behavior. Some biological systems, such as the jellyfish, are
beautiful examples of decentralized control architectures — locomotion is achieved in the absence of a (centralized) brain. This is
fascinating as compared to what we can achieve with human-made machines.”
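As a toy illustration of the decentralized control idea in the passage, the sketch below has nodes that share information only with their immediate neighbours on a ring, yet the whole network converges to a global quantity (the average of all local sensor readings). The topology, gain, and iteration count are assumptions chosen for illustration.

```python
import numpy as np

n = 10
rng = np.random.default_rng(0)
readings = rng.uniform(0.0, 1.0, n)          # one local sensor reading per node
x = readings.copy()

for _ in range(200):
    x_new = x.copy()
    for i in range(n):
        left, right = x[(i - 1) % n], x[(i + 1) % n]              # neighbours on a ring
        x_new[i] = x[i] + 0.3 * ((left - x[i]) + (right - x[i]))  # local averaging only
    x = x_new

# Every node ends up holding the global average despite purely local exchanges.
print(np.allclose(x, readings.mean(), atol=1e-3))
```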
IoT and Deep Learning
From https://cse.buffalo.edu/~lusu/papers/Computer2018.pdf
Deep Learning for IoT
Deep Learning for IoT Overview: Survey
From https://arxiv.org/pdf/1712.04301.pdf
Deep Learning for IoT Overview: Survey
From https://arxiv.org/pdf/1712.04301.pdf
Standardized IoT Data Sets: Survey
From https://arxiv.org/pdf/1712.04301.pdf
Standardized IoT Data Sets: Survey
From https://arxiv.org/pdf/1712.04301.pdf
DeepMind
DeepMind Website
DeepMind Home page
https://deepmind.com/
DeepMind Research
https://deepmind.com/research/
https://deepmind.com/research/publications/
DeepMind Blog
https://deepmind.com/blog
DeepMind Applied
https://deepmind.com/applied
DeepMind Featured Research Publications
From https://deepmind.com/research
AlphaGo
https://www.deepmind.com/research/highlighted-research/alphago
Deep Reinforcement Learning
https://deepmind.com/research/dqn/
A Dual Approach to Scalable Verification of Deep Networks
http://auai.org/uai2018/proceedings/papers/204.pdf
https://www.youtube.com/watch?v=SV05j3GM0LI
Learning to reinforcement learn
https://arxiv.org/abs/1611.05763
Neural Programmer - Interpreters
https://arxiv.org/pdf/1511.06279v3.pdf
Dueling Network Architectures for Deep Reinforcement Learning
https://arxiv.org/pdf/1511.06581.pdf
DeepMind Research: over 400 publications
https://deepmind.com/research/publications/
DeepMind Applied
From https://deepmind.com/applied/
DeepMind Health
https://deepmind.com/applied/deepmind-health/
DeepMind for Google
https://deepmind.com/applied/deepmind-google/
DeepMind Ethics and Society
https://deepmind.com/applied/deepmind-ethics-society/
AlphaGo and AlphaGoZero
From https://www.deepmind.com/research/highlighted-research/alphago
We created AlphaGo, a computer program that combines advanced search tree with deep neural
networks. These neural networks take a description of the Go board as an input and process it
through a number of different network layers containing millions of neuron-like connections.
One neural network, the “policy network”, selects the next move to play. The other neural network,
the “value network”, predicts the winner of the game. We introduced AlphaGo to numerous amateur
games to help it develop an understanding of reasonable human play. Then we had it play against
different versions of itself thousands of times, each time learning from its mistakes.
Over time, AlphaGo improved and became increasingly stronger and better at learning and decision-
making. This process is known as reinforcement learning. AlphaGo went on to defeat Go world
champions in different global arenas and arguably became the greatest Go player of all time.
Following the summit, we revealed AlphaGo Zero. While AlphaGo learnt the game by
playing thousands of matches with amateur and professional players, AlphaGo Zero
learnt by playing against itself, starting from completely random play.
This powerful technique is no longer constrained by the limits of human knowledge. Instead,
the computer program accumulated thousands of years of human knowledge during a period of
just a few days and learned to play Go from the strongest player in the world, AlphaGo.
AlphaGo Zero quickly surpassed the performance of all previous versions and also discovered new
knowledge, developing unconventional strategies and creative new moves, including those which
beat the World Go Champions Lee Sedol and Ke Jie. These creative moments give us confidence
that AI can be used as a positive multiplier for human ingenuity.
AlphaZero
From https://www.deepmind.com/blog/alphazero-shedding-new-light-on-chess-shogi-and-go
In late 2017 we introduced AlphaZero, a single system that taught itself from scratch how to master the
games of chess, shogi (Japanese chess), and Go, beating a world-champion program in each case. We were
excited by the preliminary results and thrilled to see the response from members of the chess community,
who saw in AlphaZero’s games a ground-breaking, highly dynamic and “unconventional” style of play that
differed from any chess playing engine that came before it.
Today, we are delighted to introduce the full evaluation of AlphaZero, published in the journal Science (Open
Access version here), that confirms and updates those preliminary results. It describes how AlphaZero quickly
learns each game to become the strongest player in history for each, despite starting its training from random play,
with no in-built domain knowledge but the basic rules of the game.
This ability to learn each game afresh, unconstrained by the norms of human play, results in a distinctive,
unorthodox, yet creative and dynamic playing style. Chess Grandmaster Matthew Sadler and Women’s
International Master Natasha Regan, who have analysed thousands of AlphaZero’s chess games for their
forthcoming book Game Changer (New in Chess, January 2019), say its style is unlike any traditional chess
engine. “It’s like discovering the secret notebooks of some great player from the past,” says Matthew.
Traditional chess engines – including the world computer chess champion Stockfish and IBM’s ground-
breaking Deep Blue – rely on thousands of rules and heuristics handcrafted by strong human players that try
to account for every eventuality in a game. Shogi programs are also game specific, using similar search
engines and algorithms to chess programs.
AlphaZero takes a totally different approach, replacing these hand-crafted rules with a deep neural network
and general purpose algorithms that know nothing about the game beyond the basic rules.
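As a small illustration of how AlphaZero-style search combines these two ingredients, the sketch below computes the PUCT score used to decide which move to explore next: the value estimate Q from earlier simulations plus an exploration bonus proportional to the policy-network prior P and inversely related to the visit count. The example numbers and the exploration constant are made-up placeholders, not values from DeepMind.

```python
import numpy as np

def puct_scores(q, p, n_visits, c_puct=1.5):
    """Q, P, and visit counts are arrays over the legal moves at one search node."""
    q, p, n_visits = map(np.asarray, (q, p, n_visits))
    total = n_visits.sum()
    u = c_puct * p * np.sqrt(total) / (1.0 + n_visits)   # exploration bonus
    return q + u

q = [0.10, 0.30, -0.05]        # mean value of each move from prior simulations
p = [0.60, 0.25, 0.15]         # policy-network prior over the moves
n = [20, 5, 0]                 # how often each move has been explored so far

scores = puct_scores(q, p, n)
print(scores, "-> explore move", int(np.argmax(scores)))
```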
  • 9. Classical AI Systems CYC https://en.wikipedia.org/wiki/Cyc Expert Systems https://en.wikipedia.org/wiki/Expert_system XCON https://en.wikipedia.org/wiki/Xcon MYCIN https://en.wikipedia.org/wiki/Mycin MYCON https://www.slideshare.net/bobmarcus/1986-multilevel-constraintbased-configuration-article https://www.slideshare.net/bobmarcus/1986-mycon-multilevel-constraint-based-configuration
  • 11. Stored Knowledge Base From https://www.researchgate.net/publication/327926311_Development_of_a_knowledge_base_based_on_context_analysis_of_external_information_resources/figures?lo=1
  • 15. Intelligent Agents From https://en.wikipedia.org/wiki/Intelligent_agent In artificial intelligence, an intelligent agent (IA) is anything which perceives its environment, takes actions autonomously in order to achieve goals, and may improve its performance with learning or may use knowledge. They may be simple or complex — a thermostat is considered an example of an intelligent agent, as is a human being, as is any system that meets the definition, such as a firm, a state, or a biome.[1] Leading AI textbooks define "artificial intelligence" as the "study and design of intelligent agents", a definition that considers goal-directed behavior to be the essence of intelligence. Goal-directed agents are also described using a term borrowed from economics, "rational agent".[1] An agent has an "objective function" that encapsulates all the IA's goals. Such an agent is designed to create and execute whatever plan will, upon completion, maximize the expected value of the objective function.[2] For example, a reinforcement learning agent has a "reward function" that allows the programmers to shape the IA's desired behavior,[3] and an evolutionary algorithm's behavior is shaped by a "fitness function".[4] Intelligent agents in artificial intelligence are closely related to agents in economics, and versions of the intelligent agent paradigm are studied in cognitive science, ethics, the philosophy of practical reason, as well as in many interdisciplinary socio-cognitive modeling and computer social simulations. Intelligent agents are often described schematically as an abstract functional system similar to a computer program. Abstract descriptions of intelligent agents are called abstract intelligent agents (AIA) to distinguish them from their real world implementations. An autonomous intelligent agent is designed to function in the absence of human intervention. Intelligent agents are also closely related to software agents (an autonomous computer program that carries out tasks on behalf of users).
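To make the objective-function idea above concrete, here is a minimal sketch of a rational agent loop in Python. The environment dynamics, the target-seeking objective, and all names below are hypothetical illustrations for this document, not taken from the slide or the cited article.

import random

def objective(state):
    # Hypothetical objective function: prefer states close to a target value of 10.
    return -abs(state - 10.0)

def expected_value(state, action, n_samples=50):
    # Monte Carlo estimate of the expected objective after taking `action`
    # in `state`, under an assumed noisy transition model.
    outcomes = (state + action + random.gauss(0.0, 0.5) for _ in range(n_samples))
    return sum(objective(s) for s in outcomes) / n_samples

def rational_agent(percept, actions=(-1.0, 0.0, 1.0)):
    # Choose the action that maximizes the expected value of the objective function.
    return max(actions, key=lambda a: expected_value(percept, a))

state = 0.0
for _ in range(20):                                # perceive-decide-act loop
    action = rational_agent(state)                 # perceive the current state, choose an action
    state += action + random.gauss(0.0, 0.5)       # the environment responds stochastically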
  • 16. Node in Real-Time Control System (RCS) by Albus From https://en.wikipedia.org/wiki/4D-RCS_Reference_Model_Architecture
  • 17. Intelligent Agents for Network Management From https://www.ericsson.com/en/blog/2022/6/who-are-the-intelligent-agents-in-network-operations-and-why-we-need-them
  • 18. Intelligent Agents on the Web From https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.230.5806&rep=rep1&type=pdf Intelligent agents are goal-driven and autonomous, and can communicate and interact with each other. Moreover, they can evaluate information obtained online from heterogeneous sources and present information tailored to an individual’s needs. This article covers different facets of the intelligent agent paradigm and applications, while also exploring new opportunities and trends for intelligent agents. IAs cover several functionalities, ranging from adaptive user interfaces (called interface agents) to intelligent mobile processes that cooperate with other agents to coordinate their activities in a distributed manner. The requirements for IAs remain open for discussion. An agent should be able to: • interact with humans and other agents • anticipate user needs for information • adapt to changes in user needs and the environment • cope with heterogeneity of information and other agents. The following attributes characterize an IA-based system’s main capabilities: • Intelligence. The method an agent uses to develop its intelligence includes using the agent’s own software content and knowledge representation, which describes vocabulary data, conditions, goals, and tasks. • Continuity. An agent is a continuously running process that can detect changes in its environment, modify its behavior, and update its knowledge base (which describes the environment). • Communication. An agent can communicate with other agents to achieve its goals, and it can interact with users directly by using appropriate interfaces. • Cooperation. An agent automatically customizes itself to its users’ needs based on previous experiences and monitored profiles. • Mobility. The degree of mobility with which an agent can perform varies from remote execution, in which the agent is transferred from a distant system, to a situation in which the agent creates new agents, dies, or executes partially during migration.
  • 19. Smart Agents 2022 Comparison From https://www.businessnewsdaily.com/10315-siri-cortana-google-assistant-amazon-alexa-face-off.html When AI assistants first hit the market, they were far from ubiquitous, but thanks to more third-party OEMs jumping on the smart speaker bandwagon, there are more choices for assistant-enabled devices than ever. In addition to increasing variety, in terms of hardware, devices that support multiple types of AI assistants are becoming more common. Despite more integration, competition between AI assistants is still stiff, so to save you time and frustration, we did an extensive hands-on test – not to compare speakers against each other, but to compare the AI assistants themselves. There are four frontrunners in the AI assistant space: Amazon (Alexa), Apple (Siri), Google (Google Assistant) and Microsoft (Cortana). Rather than gauge each assistant’s efficacy based on company-reported features, I spent hours testing each assistant by issuing commands and asking questions that many business users would use. I constructed questions to test basic understanding as well as contextual understanding and general vocal recognition. Accessibility and trends Ease of setup Voice recognition Success of queries and ability to understand context Bottom line None of the AI assistants are perfect; this is young technology, and it has a long way to go. There was a handful of questions that none of the virtual assistants on my list could answer. For example, when I asked for directions to the closest airport, even the two best assistants on my list, Google Assistant and Siri, failed hilariously: Google Assistant directed me to a travel agency (those still exist?), while Siri directed me to a seaplane base (so close!). Judging purely on out-of-the-box functionality, I would choose either Siri or Google Assistant, and I would make the final choice based on hardware preferences. None of the assistants are good enough to go out of your way to adopt. Choose between Siri and Google Assistant based on convenience and what hardware you already have IFTTT = "if this, then that," is a service that lets you connect apps, services, and smart home devices.
  • 20. Amazon Alexa From https://en.wikipedia.org/wiki/Amazon_Alexa Amazon Alexa, also known simply as Alexa,[2] is a virtual assistant technology largely based on a Polish speech synthesiser named Ivona, bought by Amazon in 2013.[3][4] It was first used in the Amazon Echo smart speaker and the Echo Dot, Echo Studio and Amazon Tap speakers developed by Amazon Lab126. It is capable of voice interaction, music playback, making to-do lists, setting alarms, streaming podcasts, playing audiobooks, and providing weather, traffic, sports, and other real-time information, such as news.[5] Alexa can also control several smart devices using itself as a home automation system. Users are able to extend the Alexa capabilities by installing "skills" (additional functionality developed by third-party vendors, in other settings more commonly called apps) such as weather programs and audio features. It uses automatic speech recognition, natural language processing, and other forms of weak AI to perform these tasks.[6] Most devices with Alexa allow users to activate the device using a wake-word[7] (such as Alexa or Amazon); other devices (such as the Amazon mobile app on iOS or Android and Amazon Dash Wand) require the user to click a button to activate Alexa's listening mode, although, some phones also allow a user to say a command, such as "Alexa" or "Alexa wake".
  • 21. Google Assistant From https://en.wikipedia.org/wiki/Google_Assistant Google Assistant is a virtual assistant software application developed by Google that is primarily available on mobile and home automation devices. Based on artificial intelligence, Google Assistant can engage in two-way conversations,[1] unlike the company's previous virtual assistant, Google Now. Google Assistant debuted in May 2016 as part of Google's messaging app Allo, and its voice-activated speaker Google Home. After a period of exclusivity on the Pixel and Pixel XL smartphones, it was deployed on other Android devices starting in February 2017, including third-party smartphones and Android Wear (now Wear OS), and was released as a standalone app on the iOS operating system in May 2017. Alongside the announcement of a software development kit in April 2017, Assistant has been further extended to support a large variety of devices, including cars and third-party smart home appliances. The functionality of the Assistant can also be enhanced by third-party developers. Users primarily interact with the Google Assistant through natural voice, though keyboard input is also supported. Assistant is able to answer questions, schedule events and alarms, adjust hardware settings on the user's device, show information from the user's Google account, play games, and more. Google has also announced that Assistant will be able to identify objects and gather visual information through the device's camera, and support purchasing products and sending money.
  • 22. Apple Siri https://en.wikipedia.org/wiki/Siri Siri (/ˈsɪri/ SEER-ee) is a virtual assistant that is part of Apple Inc.'s iOS, iPadOS, watchOS, macOS, tvOS, and audioOS operating systems.[1] [2] It uses voice queries, gesture based control, focus-tracking and a natural-language user interface to answer questions, make recommendations, and perform actions by delegating requests to a set of Internet services. With continued use, it adapts to users' individual language usages, searches and preferences, returning individualized results. Siri is a spin-off from a project developed by the SRI International Artificial Intelligence Center. Its speech recognition engine was provided by Nuance Communications, and it uses advanced machine learning technologies to function. Its original American, British and Australian voice actors recorded their respective voices around 2005, unaware of the recordings' eventual usage. Siri was released as an app for iOS in February 2010. Two months later, Apple acquired it and integrated into iPhone 4S at its release on 4 October, 2011, removing the separate app from the iOS App Store. Siri has since been an integral part of Apple's products, having been adapted into other hardware devices including newer iPhone models, iPad, iPod Touch, Mac, AirPods, Apple TV, and HomePod. Siri supports a wide range of user commands, including performing phone actions, checking basic information, scheduling events and reminders, handling device settings, searching the Internet, navigating areas, finding information on entertainment, and is able to engage with iOS-integrated apps. With the release of iOS 10 in 2016, Apple opened up limited third-party access to Siri, including third-party messaging apps, as well as payments, ride-sharing, and Internet calling apps. With the release of iOS 11, Apple updated Siri's voice and added support for follow-up questions, language translation, and additional third-party actions.
  • 23. Microsoft Cortana From https://en.wikipedia.org/wiki/Cortana_(virtual_assistant) Cortana is a virtual assistant developed by Microsoft that uses the Bing search engine to perform tasks such as setting reminders and answering questions for the user. Cortana is currently available in English, Portuguese, French, German, Italian, Spanish, Chinese, and Japanese language editions, depending on the software platform and region in which it is used.[8] Microsoft began reducing the prevalence of Cortana and converting it from an assistant into different software integrations in 2019.[9] It was split from the Windows 10 search bar in April 2019.[10] In January 2020, the Cortana mobile app was removed from certain markets,[11][12] and on March 31, 2021, the Cortana mobile app was shut down globally.[13] Microsoft has integrated Cortana into numerous products such as Microsoft Edge,[28] the browser bundled with Windows 10. Microsoft's Cortana assistant is deeply integrated into its Edge browser. Cortana can find opening hours when on restaurant sites, show retail coupons for websites, or show weather information in the address bar. At the Worldwide Partners Conference 2015 Microsoft demonstrated Cortana integration with products such as GigJam.[29] Conversely, Microsoft announced in late April 2016 that it would block anything other than Bing and Edge from being used to complete Cortana searches, again raising questions of anti-competitive practices by the company.[30] In May 2017, Microsoft in collaboration with Harman Kardon announced INVOKE, a voice-activated speaker featuring Cortana. The premium speaker has a cylindrical design and offers 360 degree sound, the ability to make and receive calls with Skype, and all of the other features currently available with Cortana.[42]
  • 25. Machine Learning Types From https://towardsdatascience.com/coding-deep-learning-for-beginners-types-of-machine-learning-b9e651e1ed9d
  • 26. Perceptron From https://deepai.org/machine-learning-glossary-and-terms/perceptron How does a Perceptron work? The process begins by taking all the input values and multiplying them by their weights. Then, all of these multiplied values are added together to create the weighted sum. The weighted sum is then applied to the activation function, producing the perceptron's output. The activation function plays the integral role of ensuring the output is mapped between required values such as (0,1) or (-1,1). It is important to note that the weight of an input is indicative of the strength of a node. Similarly, an input's bias value gives the ability to shift the activation function curve up or down.
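The weighted-sum-plus-activation computation described above fits in a few lines of Python. The step activation, learning rate, and the toy AND dataset below are illustrative assumptions, not part of the slide.

def perceptron_output(inputs, weights, bias):
    # Weighted sum of inputs plus bias, passed through a step activation
    # that maps the result into the required range (here 0 or 1).
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum >= 0 else 0

def train_perceptron(samples, labels, lr=0.1, epochs=20):
    # Classic perceptron rule: nudge weights toward misclassified examples.
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - perceptron_output(x, weights, bias)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Toy example: learn the logical AND function.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)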
  • 27. Ensemble Machine Learning From https://machinelearningmastery.com/tour-of-ensemble-learning-algorithms/ Ensemble learning is a general meta approach to machine learning that seeks better predictive performance by combining the predictions from multiple models. Although there are a seemingly unlimited number of ensembles that you can develop for your predictive modeling problem, there are three methods that dominate the field of ensemble learning. So much so, that rather than algorithms per se, each is a field of study that has spawned many more specialized methods. The three main classes of ensemble learning methods are bagging, stacking, and boosting, and it is important to both have a detailed understanding of each method and to consider them on your predictive modeling project. But, before that, you need a gentle introduction to these approaches and the key ideas behind each method prior to layering on math and code. In this tutorial, you will discover the three standard ensemble learning techniques for machine learning. After completing this tutorial, you will know: • Bagging involves fitting many decision trees on different samples of the same dataset and averaging the predictions. • Stacking involves fitting many different models types on the same data and using another model to learn how to best combine the predictions. • Boosting involves adding ensemble members sequentially that correct the predictions made by prior models and outputs a weighted average of the predictions.
  • 28. Bagging From https://en.wikipedia.org/wiki/Bootstrap_aggregating Bootstrap aggregating, also called bagging (from bootstrap aggregating), is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree methods, it can be used with any type of method. Bagging is a special case of the model averaging approach. Given a standard training set D of size n, bagging generates m new training sets D_i, each of size n′, by sampling from D uniformly and with replacement. By sampling with replacement, some observations may be repeated in each D_i. If n′ = n, then for large n the set D_i is expected to have the fraction (1 − 1/e) (≈63.2%) of the unique examples of D, the rest being duplicates.[1] This kind of sample is known as a bootstrap sample. Sampling with replacement ensures each bootstrap sample is independent from its peers, as it does not depend on previously chosen samples when sampling. Then, m models are fitted using the above m bootstrap samples and combined by averaging the output (for regression) or voting (for classification).
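A minimal sketch of the procedure just described, in plain Python with a pluggable base learner. The mean-predicting "weak" model and the synthetic dataset are stand-ins chosen only for illustration.

import random

def bootstrap_sample(data):
    # Draw n examples uniformly with replacement (one bootstrap sample D_i).
    n = len(data)
    return [random.choice(data) for _ in range(n)]

def fit_mean_model(sample):
    # Stand-in weak regressor: predicts the mean of its training targets.
    targets = [y for _, y in sample]
    mean_y = sum(targets) / len(targets)
    return lambda x: mean_y

def bagging_fit(data, m=25, fit=fit_mean_model):
    # Fit m models on m bootstrap samples.
    return [fit(bootstrap_sample(data)) for _ in range(m)]

def bagging_predict(models, x):
    # Regression: average the individual predictions (use voting for classification).
    return sum(model(x) for model in models) / len(models)

D = [(x, 2.0 * x + random.gauss(0, 0.1)) for x in range(20)]
ensemble = bagging_fit(D)
print(bagging_predict(ensemble, 5))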
  • 29. Boosting From https://www.ibm.com/cloud/learn/boosting and https://en.wikipedia.org/wiki/Boosting_(machine_learning) In machine learning, boosting is an ensemble meta-algorithm for primarily reducing bias, and also variance,[1] in supervised learning, and a family of machine learning algorithms that convert weak learners to strong ones. Bagging vs Boosting: Bagging and boosting are two main types of ensemble learning methods. As highlighted in this study (PDF, 242 KB) (link resides outside IBM), the main difference between these learning methods is the way in which they are trained. In bagging, weak learners are trained in parallel, but in boosting, they learn sequentially. This means that a series of models are constructed and with each new model iteration, the weights of the misclassified data in the previous model are increased. This redistribution of weights helps the algorithm identify the parameters that it needs to focus on to improve its performance. AdaBoost, which stands for “adaptive boosting,” is one of the most popular boosting algorithms as it was one of the first of its kind. Other types of boosting algorithms include XGBoost, GradientBoost, and BrownBoost. Another difference between bagging and boosting is in how they are used. For example, bagging methods are typically used on weak learners that exhibit high variance and low bias, whereas boosting methods are leveraged when low variance and high bias are observed. While bagging can be used to avoid overfitting, boosting methods can be more prone to this (link resides outside IBM), although it really depends on the dataset. However, parameter tuning can help avoid the issue. As a result, bagging and boosting have different real-world applications as well. Bagging has been leveraged for loan approval processes and statistical genomics, while boosting has been used more within image recognition apps and search engines. Boosting is an ensemble learning method that combines a set of weak learners into a strong learner to minimize training errors. In boosting, a random sample of data is selected, fitted with a model and then trained sequentially—that is, each model tries to compensate for the weaknesses of its predecessor. With each iteration, the weak rules from each individual classifier are combined to form one strong prediction rule.
  • 30. Stacking From https://www.geeksforgeeks.org/stacking-in-machine-learning/ Stacking is a way to ensemble multiple classification or regression models. There are many ways to ensemble models; the widely known ones are bagging and boosting. Bagging allows multiple similar models with high variance to be averaged to decrease variance. Boosting builds multiple incremental models to decrease the bias, while keeping variance small. Stacking (sometimes called Stacked Generalization) is a different paradigm. The point of stacking is to explore a space of different models for the same problem. The idea is that you can attack a learning problem with different types of models which are capable of learning some part of the problem, but not the whole space of the problem. So, you can build multiple different learners and use them to build an intermediate prediction, one prediction for each learned model. Then you add a new model which learns the same target from the intermediate predictions. This final model is said to be stacked on top of the others, hence the name. Thus, you might improve your overall performance, and often you end up with a model which is better than any individual intermediate model. Notice, however, that it does not give you any guarantee, as is often the case with any machine learning technique.
  • 31. Gradient Boosting From https://en.wikipedia.org/wiki/Gradient_boosting Gradient boosting is a machine learning technique used in regression and classification tasks, among others. It gives a prediction model in the form of an ensemble of weak prediction models, which are typically decision trees.[1][2] When a decision tree is the weak learner, the resulting algorithm is called gradient-boosted trees; it usually outperforms random forest.[1][2][3] A gradient-boosted trees model is built in a stage-wise fashion as in other boosting methods, but it generalizes the other methods by allowing optimization of an arbitrary differentiable loss function.
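The stage-wise construction described above can be sketched for squared-error regression, where each new small tree is fitted to the current residuals (the negative gradient of the loss). The use of scikit-learn's DecisionTreeRegressor for the weak learners, and the learning rate, depth, and toy data, are illustrative choices rather than a reference implementation.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_stages=50, lr=0.1, max_depth=2):
    # Start from a constant prediction, then repeatedly fit a small tree
    # to the current residuals and add it to the ensemble.
    f0 = float(np.mean(y))
    prediction = np.full_like(y, f0, dtype=float)
    trees = []
    for _ in range(n_stages):
        residuals = y - prediction          # negative gradient of the squared-error loss
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        prediction += lr * tree.predict(X)
        trees.append(tree)
    return f0, trees

def gradient_boost_predict(f0, trees, X, lr=0.1):
    return f0 + lr * sum(tree.predict(X) for tree in trees)

X = np.linspace(0, 10, 100).reshape(-1, 1)
y = np.sin(X).ravel()
f0, trees = gradient_boost_fit(X, y)
y_hat = gradient_boost_predict(f0, trees, X)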
  • 32. Introduction to XG Boost From https://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/
  • 33. Terminology ・SoftMax https://en.wikipedia.org/wiki/Softmax_function ・SoftPlus https://en.wikipedia.org/wiki/Rectifier_(neural_networks)#Softplus ・Logit https://en.wikipedia.org/wiki/Logit ・Sigmoid https://en.wikipedia.org/wiki/Sigmoid_function ・Logistic Function https://en.wikipedia.org/wiki/Logistic_function ・Tanh https://brenocon.com/blog/2013/10/tanh-is-a-rescaled-logistic-sigmoid-function/ ・ReLu https://en.wikipedia.org/wiki/Rectifier_(neural_networks) ・Maxpool Selects the maximum in subsets of convolutional neural nets layer ・
  • 34. Relationships: a diagram relating SoftMax, SoftPlus, Sigmoid (= Logistic), Tanh, Logit, and ReLU, listing their output ranges and expressing their inverses and derivatives in terms of the two-component SoftMax (the Sigmoid as the first component of SoftMax(z, 0), Tanh built from SoftMax(z, −z), and the Logit as the inverse of the Logistic function).
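The relationships the diagram points at can be written out explicitly; these identities are standard and are added here for reference rather than recovered from the slide:

\sigma(z) = \frac{1}{1 + e^{-z}} = \mathrm{softmax}(z, 0)_1, \qquad
\tanh(z) = \mathrm{softmax}(z, -z)_1 - \mathrm{softmax}(z, -z)_2 = 2\sigma(2z) - 1,

\mathrm{softplus}(z) = \log(1 + e^{z}), \qquad \frac{d}{dz}\,\mathrm{softplus}(z) = \sigma(z),

\mathrm{logit}(p) = \log\frac{p}{1-p} = \sigma^{-1}(p), \qquad \mathrm{ReLU}(z) = \max(0, z).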
  • 35. Terminology (continued) ・Heteroscedastic https://en.wiktionary.org/wiki/scedasticity ・Maxout https://stats.stackexchange.com/questions/129698/what-is-maxout-in-neural-network/298705 ・Cross-Entropy https://en.wikipedia.org/wiki/Cross_entropy H(P,Q) = -Ep(log q) ・Joint Entropy https://en.wikipedia.org/wiki/Joint_entropy H(X,Y) = -Ep(X,Y)(log p(X,Y)) ・KL Divergence https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence ・H(P,Q) = H(P) + KL(P,Q), i.e. -Ep(log q) = -Ep(log p) + {Ep(log p) - Ep(log q)} ・Mutual Information https://en.wikipedia.org/wiki/Mutual_information KL(p(x,y), p(x)p(y)) ・Ridge Regression and Lasso Regression https://hackernoon.com/practical-machine-learning-ridge-regression-vs-lasso-a00326371ece ・Logistic Regression https://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch12.pdf ・Dropout https://en.wikipedia.org/wiki/Dropout_(neural_networks) ・RMSProp and AdaGrad and AdaDelta and Adam https://www.quora.com/What-are-differences-between-update-rules-like-AdaDelta-RMSProp-AdaGrad-and-AdaM ・Pooling https://www.quora.com/Is-pooling-indispensable-in-deep-learning ・Boltzmann Machine https://en.wikipedia.org/wiki/Boltzmann_machine ・Hyperparameters
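Writing the information-theoretic terms above in one consistent notation (a standard restatement, not taken from the slide):

H(P, Q) = -\mathbb{E}_{P}[\log Q], \qquad H(P) = -\mathbb{E}_{P}[\log P], \qquad
D_{\mathrm{KL}}(P \,\|\, Q) = \mathbb{E}_{P}\!\left[\log \tfrac{P}{Q}\right] \ge 0,

H(P, Q) = H(P) + D_{\mathrm{KL}}(P \,\|\, Q), \qquad
I(X; Y) = D_{\mathrm{KL}}\big(p(x, y) \,\|\, p(x)\,p(y)\big) = H(X) + H(Y) - H(X, Y).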
  • 36. Reinforcement Learning Book From https://www.andrew.cmu.edu/course/10-703/textbook/BartoSutton.pdf
  • 37. Acumos Shared Model Process Flow From https://arxiv.org/ftp/arxiv/papers/1810/1810.07159.pdf
  • 38. Distributed AI From https://en.wikipedia.org/wiki/Distributed_artificial_intelligence Distributed Artificial Intelligence (DAI) also called Decentralized Artificial Intelligence[1] is a subfield of artificial intelligence research dedicated to the development of distributed solutions for problems. DAI is closely related to and a predecessor of the field of multi-agent systems. The objectives of Distributed Artificial Intelligence are to solve the reasoning, planning, learning and perception problems of artificial intelligence, especially if they require large data, by distributing the problem to autonomous processing nodes (agents). To reach the objective, DAI requires: • A distributed system with robust and elastic computation on unreliable and failing resources that are loosely coupled • Coordination of the actions and communication of the nodes • Subsamples of large data sets and online machine learning There are many reasons for wanting to distribute intelligence or cope with multi-agent systems. Mainstream problems in DAI research include the following: • Parallel problem solving: mainly deals with how classic artificial intelligence concepts can be modified, so that multiprocessor systems and clusters of computers can be used to speed up calculation. • Distributed problem solving (DPS): the concept of agent, autonomous entities that can communicate with each other, was developed to serve as an abstraction for developing DPS systems. See below for further details. • Multi-Agent Based Simulation (MABS): a branch of DAI that builds the foundation for simulations that need to analyze not only phenomena at macro level but also at micro level, as it is in many social simulation scenarios.
  • 39. Swarm Intelligence From https://en.wikipedia.org/wiki/Swarm_intelligence Swarm intelligence (SI) is the collective behavior of decentralized, self-organized systems, natural or artificial. The concept is employed in work on artificial intelligence. The expression was introduced by Gerardo Beni and Jing Wang in 1989, in the context of cellular robotic systems.[1] SI systems consist typically of a population of simple agents or boids interacting locally with one another and with their environment.[2] The inspiration often comes from nature, especially biological systems. The agents follow very simple rules, and although there is no centralized control structure dictating how individual agents should behave, local, and to a certain degree random, interactions between such agents lead to the emergence of "intelligent" global behavior, unknown to the individual agents.[3] Examples of swarm intelligence in natural systems include ant colonies, bee colonies, bird flocking, hawks hunting, animal herding, bacterial growth, fish schooling and microbial intelligence. The application of swarm principles to robots is called swarm robotics while swarm intelligence refers to the more general set of algorithms. Swarm prediction has been used in the context of forecasting problems. Similar approaches to those proposed for swarm robotics are considered for genetically modified organisms in synthetic collective intelligence.[4] • 1 Models of swarm behavior ◦ 1.1 Boids (Reynolds 1987) ◦ 1.2 Self-propelled particles (Vicsek et al. 1995) • 2 Metaheuristics ◦ 2.1 Stochastic diffusion search (Bishop 1989) ◦ 2.2 Ant colony optimization (Dorigo 1992) ◦ 2.3 Particle swarm optimization (Kennedy, Eberhart & Shi 1995) ◦ 2.4 Artificial Swarm Intelligence (2015) • 3 Applications ◦ 3.1 Ant-based routing ◦ 3.2 Crowd simulation ▪ 3.2.1 Instances ◦ 3.3 Human swarming ◦ 3.4 Swarm grammars ◦ 3.5 Swarmic art
  • 40. IBM Watson From https://en.wikipedia.org/wiki/IBM_Watson IBM Watson is a question-answering computer system capable of answering questions posed in natural language,[2] developed in IBM's DeepQA project by a research team led by principal investigator David Ferrucci.[3] Watson was named after IBM's founder and first CEO, industrialist Thomas J. Watson.[4][5] Software -Watson uses IBM's DeepQA software and the Apache UIMA (Unstructured Information Management Architecture) framework implementation. The system was written in various languages, including Java, C++, and Prolog, and runs on the SUSE Linux Enterprise Server 11 operating system using the Apache Hadoop framework to provide distributed computing.[12][13][14] Hardware -The system is workload-optimized, integrating massively parallel POWER7 processors and built on IBM's DeepQA technology,[15] which it uses to generate hypotheses, gather massive evidence, and analyze data.[2] Watson employs a cluster of ninety IBM Power 750 servers, each of which uses a 3.5 GHz POWER7 eight- core processor, with four threads per core. In total, the system has 2,880 POWER7 processor threads and 16 terabytes of RAM.[15] According to John Rennie, Watson can process 500 gigabytes (the equivalent of a million books) per second.[16] IBM master inventor and senior consultant Tony Pearson estimated Watson's hardware cost at about three million dollars.[17] Its Linpack performance stands at 80 TeraFLOPs, which is about half as fast as the cut-off line for the Top 500 Supercomputers list.[18] According to Rennie, all content was stored in Watson's RAM for the Jeopardy game because data stored on hard drives would be too slow to compete with human Jeopardy champions.[16] Data -The sources of information for Watson include encyclopedias, dictionaries, thesauri, newswire articles and literary works. Watson also used databases, taxonomies and ontologies including DBPedia, WordNet and Yago.[19] The IBM team provided Watson with millions of documents, including dictionaries, encyclopedias and other reference material, that it could use to build its knowledge.[20] From https://www.researchgate.net/publication/282644173_Implementation_of_a_Natural_Language_Processing_Tool_for_Cyber-Physical_Systems/figures?lo=1
  • 42. Three Types of Deep Learning From https://www.slideshare.net/TerryTaewoongUm/introduction-to-deep-learning-with-tensorflow
  • 44. Convolutional Neural Nets Comparison (2016) From https://medium.com/@culurciello/analysis-of-deep-neural-networks-dcf398e71aae Reference: https://towardsdatascience.com/neural-network-architectures-156e5bad51ba
  • 45. Recurrent Neural Networks From https://medium.com/deep-math-machine-learning-ai/chapter-10-deepnlp-recurrent-neural-networks-with-math-c4a6846a50a2
  • 47. Dynamical System View on Recurrent Neural Networks From https://openreview.net/pdf?id=ryxepo0cFX
  • 49. Deep Learning Models From https://arxiv.org/pdf/1712.04301.pdf
  • 50. Neural Net Models From https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463
  • 51. Neural Net Models (cont) From https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463
  • 52. TensorFlow From https://en.wikipedia.org/wiki/TensorFlow TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks.[4][5] TensorFlow was developed by the Google Brain team for internal Google use in research and production.[6][7][8] The initial version was released under the Apache License 2.0 in 2015.[1][9] Google released the updated version of TensorFlow, named TensorFlow 2.0, in September 2019.[10] TensorFlow can be used in a wide variety of programming languages, most notably Python, as well as Javascript, C++, and Java.[11] This flexibility lends itself to a range of applications in many different sectors.
  • 53. Keras From https://en.wikipedia.org/wiki/Keras Keras is an open-source software library that provides a Python interface for artificial neural networks. Keras acts as an interface for the TensorFlow library. Up until version 2.3, Keras supported multiple backends, including TensorFlow, Microsoft Cognitive Toolkit, Theano, and PlaidML.[1][2][3] As of version 2.4, only TensorFlow is supported. Designed to enable fast experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible. It was developed as part of the research effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System),[4] and its primary author and maintainer is François Chollet, a Google engineer. Chollet is also the author of the Xception deep neural network model.[5]
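As a concrete illustration of the Keras interface described above, here is a minimal sketch of defining and training a small classifier with tf.keras using the standard Sequential/compile/fit pattern. The layer sizes, optimizer, and synthetic data are arbitrary choices made for the example.

import numpy as np
import tensorflow as tf

# Synthetic binary-classification data (illustrative only).
X = np.random.rand(1000, 20).astype("float32")
y = (X.sum(axis=1) > 10).astype("float32")

# A small fully connected network built with the Keras Sequential API.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)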
  • 54. Comparison of Deep Learning Frameworks From https://arxiv.org/pdf/1903.00102.pdf
  • 55. Popularity of Deep Learning Frameworks From https://medium.com/implodinggradients/tensorflow-or-keras-which-one-should-i-learn-5dd7fa3f9ca0
  • 56. Acronyms in Deep Learning • RBM - Restricted Boltzmann Machines • MLP - Multi-layer Perceptron • DBN - Deep Belief Network • CNN - Convolution Neural Network • RNN - Recurrent Neural Network • SGD - Stochastic Gradient Descent • XOR - Exclusive Or • SVM - SupportVector Machine • ReLu - Rectified Linear Unit • MNIST - Modified National Institute of Standards and Technology • RBF - Radial Basis Function • HMM - Hidden Markovv Model • MAP - Maximum A Postiori • MLE - Maximum Likelihood Estimate • Adam - Adaptive Moment Estimation • LSTM - Long Short Term Memory • GRU - Gated Recurrent Unit
  • 57. Concerns for Deep Learning by Gary Marcus From https://arxiv.org/ftp/arxiv/papers/1801/1801.00631.pdf Deep Learning thus far: • Is data hungry • Is shallow and has limited capacity for transfer • Has no natural way to deal with hierarchical structure • Has struggled with open-ended inference • Is not sufficiently transparent • Has not been well integrated with prior knowledge • Cannot inherently distinguish causation from correlation • Presumes a largely stable world, in ways that may be problematic • Works well as an approximation, but answers often can’t be fully trusted • Is difficult to engineer with
  • 59. How transferable are features in deep neural networks? From http://cs231n.github.io/transfer-learning/
  • 61. More Transfer Learning From https://towardsdatascience.com/transfer-learning-from-pre-trained-models-f2393f124751
  • 62. More Transfer Learning From http://ruder.io/transfer-learning/
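The transfer learning slides above point to surveys of the topic; as one concrete illustration of the recipe they discuss (reusing a pretrained convolutional base and training only a new head), here is a hedged tf.keras sketch. The choice of MobileNetV2, the input size, and the binary classification head are assumptions for the example, not taken from the slides.

import tensorflow as tf

# Load a convolutional base pretrained on ImageNet, without its classifier head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the transferred features

# Attach a small task-specific head and train only the new layers.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(new_task_images, new_task_labels, epochs=5)  # supply your own dataset here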
  • 63. Bayesian Deep Learning From https://alexgkendall.com/computer_vision/bayesian_deep_learning_for_safe_ai/
  • 64. Bayesian Learning via Stochastic Gradient Langevin Dynamics From https://tinyurl.com/22xayz76 In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small minibatches. By adding the right amount of noise to a standard stochastic gradient optimization algorithm we show that the iterates will converge to samples from the true posterior distribution as we anneal the stepsize. This seamless transition between optimization and Bayesian posterior sampling provides an in-built protection against overfitting. We also propose a practical method for Monte Carlo estimates of posterior statistics which monitors a “sampling threshold” and collects samples after it has been surpassed. We apply the method to three models: a mixture of Gaussians, logistic regression and ICA with natural gradients. Our method combines Robbins-Monro type algorithms which stochastically optimize a likelihood, with Langevin dynamics which injects noise into the parameter updates in such a way that the trajectory of the parameters will converge to the full posterior distribution rather than just the maximum a posteriori mode. The resulting algorithm starts off being similar to stochastic optimization, then automatically transitions to one that simulates samples from the posterior using Langevin dynamics.
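The parameter update the abstract describes (a stochastic gradient step on the log-posterior plus properly scaled Gaussian noise) has the standard form below, restated from the general SGLD literature rather than copied from the paper:

\Delta\theta_t = \frac{\varepsilon_t}{2}\left(\nabla \log p(\theta_t) + \frac{N}{n}\sum_{i=1}^{n} \nabla \log p(x_{t_i} \mid \theta_t)\right) + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, \varepsilon_t),

where \varepsilon_t is the annealed step size, N is the dataset size, and n is the minibatch size; as \varepsilon_t shrinks, the injected noise dominates the stochastic-gradient noise and the iterates approximate posterior samples.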
  • 65. DeterministicVariational Inference for Robust Bayesian NNs From https://openreview.net/pdf?id=B1l08oAct7
  • 66. Bayesian Deep Learning Survey From https://arxiv.org/pdf/1604.01662.pdf Conclusion and Future Research In this survey, we identified a current trend of merging probabilistic graphical models and neural networks (deep learning) and reviewed recent work on Bayesian deep learning, which strives to combine the merits of PGM and NN by organically integrating them in a single principled probabilistic framework. To learn parameters in BDL, several algorithms have been proposed, ranging from block coordinate descent, Bayesian conditional density filtering, and stochastic gradient thermostats to stochastic gradient variational Bayes. Bayesian deep learning gains its popularity both from the success of PGM and from the recent promising advances on deep learning. Since many real-world tasks involve both perception and inference, BDL is a natural choice to harness the perception ability from NN and the (causal and logical) inference ability from PGM. Although current applications of BDL focus on recommender systems, topic models, and stochastic optimal control, in the future, we can expect an increasing number of other applications like link prediction, community detection, active learning, Bayesian reinforcement learning, and many other complex tasks that need interaction between perception and causal inference. Besides, with the advances of efficient Bayesian neural networks (BNN), BDL with BNN as an important component is expected to be more and more scalable
  • 67. Ensemble Methods for Deep Learning From https://machinelearningmastery.com/ensemble-methods-for-deep-learning-neural-networks/
  • 68. Comparing Loss Functions From Neural Networks and Deep Learning Book
  • 69. Seed Reinforcement Learning from Google From https://ai.googleblog.com/2020/03/massively-scaling-reinforcement.html The field of reinforcement learning (RL) has recently seen impressive results across a variety of tasks. This has in part been fueled by the introduction of deep learning in RL and the introduction of accelerators such as GPUs. In the very recent history, focus on massive scale has been key to solve a number of complicated games such as AlphaGo (Silver et al., 2016), Dota (OpenAI, 2018) and StarCraft 2 (Vinyals et al., 2017). The sheer amount of environment data needed to solve tasks trivial to humans makes distributed machine learning unavoidable for fast experiment turnaround time. RL is inherently comprised of heterogeneous tasks: running environments, model inference, model training, replay buffer, etc., and current state-of-the-art distributed algorithms do not efficiently use compute resources for the tasks. The amount of data and inefficient use of resources makes experiments unreasonably expensive. The two main challenges addressed in this paper are scaling of reinforcement learning and optimizing the use of modern accelerators, CPUs and other resources. We introduce SEED (Scalable, Efficient, Deep-RL), a modern RL agent that scales well, is flexible and efficiently utilizes available resources. It is a distributed agent where model inference is done centrally combined with fast streaming RPCs to reduce the overhead of inference calls. We show that with simple methods, one can achieve state-of-the-art results faster on a number of tasks. For optimal performance, we use TPUs (cloud.google.com/tpu/) and TensorFlow 2 (Abadi et al., 2015) to simplify the implementation. The cost of running SEED is analyzed against IMPALA (Espeholt et al., 2018) which is a commonly used state-of-the-art distributed RL algorithm (Veeriah et al. (2019); Li et al. (2019); Deverett et al. (2019); Omidshafiei et al. (2019); Vezhnevets et al. (2019); Hansen et al. (2019); Schaarschmidt et al.; Tirumala et al. (2019), ...). We show cost reductions of up to 80% while being significantly faster. When scaling SEED to many accelerators, it can train on millions of frames per second. Finally, the implementation is open-sourced together with examples of running it at scale on Google Cloud (see Appendix A.4 for details), making it easy to reproduce results and try novel ideas.
  • 70. Designing Neural Nets through Neuroevolution From tinyurl.com/mykhb52y Much of recent machine learning has focused on deep learning, in which neural network weights are trained through variants of stochastic gradient descent. An alternative approach comes from the field of neuroevolution, which harnesses evolutionary algorithms to optimize neural networks, inspired by the fact that natural brains themselves are the products of an evolutionary process. Neuroevolution enables important capabilities that are typically unavailable to gradient-based approaches, including learning neural network building blocks (for example activation functions), hyperparameters, architectures and even the algorithms for learning themselves. Neuroevolution also differs from deep learning (and deep reinforcement learning) by maintaining a population of solutions during search, enabling extreme exploration and massive parallelization. Finally, because neuroevolution research has (until recently) developed largely in isolation from gradient-based neural network research, it has developed many unique and effective techniques that should be effective in other machine learning areas too. This Review looks at several key aspects of modern neuroevolution, including large-scale computing, the benefits of novelty and diversity, the power of indirect encoding, and the field’s contributions to meta-learning and architecture search. Our hope is to inspire renewed interest in the field as it meets the potential of the increasing computation available today, to highlight how many of its ideas can provide an exciting resource for inspiration and hybridization to the deep learning, deep reinforcement learning and machine learning communities, and to explain how neuroevolution could prove to be a critical tool in the long-term pursuit of artificial general intelligence.
  • 71. Illuminating Search Spaces by Mapping Elites From https://arxiv.org/pdf/1504.04909.pdf
  • 73. From https://arxiv.org/pdf/1412.3555v1.pdf A transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input data. It is used primarily in the fields of natural language processing (NLP)[1] and computer vision (CV).[2] Like recurrent neural networks (RNNs), transformers are designed to process sequential input data, such as natural language, with applications towards tasks such as translation and text summarization. However, unlike RNNs, transformers process the entire input all at once. The attention mechanism provides context for any position in the input sequence. For example, if the input data is a natural language sentence, the transformer does not have to process one word at a time. This allows for more parallelization than RNNs and therefore reduces training times.[1] Transformers were introduced in 2017 by a team at Google Brain[1] and are increasingly the model of choice for NLP problems,[3] replacing RNN models such as long short- term memory (LSTM). The additional training parallelization allows training on larger datasets. This led to the development of pretrained systems such as BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer), which were trained with large language datasets, such as the Wikipedia Corpus and Common Crawl, and can be fine-tuned for specific tasks.[4][5] Attention mechanisms let a model draw from the state at any preceding point along the sequence. The attention layer can access all previous states and weight them according to a learned measure of relevance, providing relevant information about far-away tokens. When added to RNNs, attention mechanisms increase performance. The development of the Transformer architecture revealed that attention mechanisms were powerful in themselves and that sequential recurrent processing of data was not necessary to achieve the quality gains of RNNs with attention. Transformers use an attention mechanism without an RNN, processing all tokens at the same time and calculating attention weights between them in successive layers. Since the attention mechanism only uses information about other tokens from lower layers, it can be computed for all tokens in parallel, which leads to improved training speed. Like earlier seq2seq models, the original Transformer model used an encoder–decoder architecture. The encoder consists of encoding layers that process the input iteratively one layer after another, while the decoder consists of decoding layers that do the same thing to the encoder's output. The function of each encoder layer is to generate encodings that contain information about which parts of the inputs are relevant to each other. It passes its encodings to the next encoder layer as inputs. Each decoder layer does the opposite, taking all the encodings and using their incorporated contextual information to generate an output sequence.[6] To achieve this, each encoder and decoder layer makes use of an attention mechanism. For each input, attention weighs the relevance of every other input and draws from them to produce the output.[7] Each decoder layer has an additional attention mechanism that draws information from the outputs of previous decoders, before the decoder layer draws information from the encodings. 
Both the encoder and decoder layers have a feed-forward neural network for additional processing of the outputs and contain residual connections and layer normalization steps. Transformers
  • 74. From https://en.wikipedia.org/wiki/Transformer_(machine_learning_model) Transformers Before transformers, most state-of-the-art NLP systems relied on gated RNNs, such as LSTMs and gated recurrent units (GRUs), with added attention mechanisms. Transformers also make use of attention mechanisms but, unlike RNNs, do not have a recurrent structure. This means that provided with enough training data, attention mechanisms alone can match the performance of RNNs with attention.[1] Sequential processing Gated RNNs process tokens sequentially, maintaining a state vector that contains a representation of the data seen prior to the current token. To process the n-th token, the model combines the state representing the sentence up to token n − 1 with the information of the new token to create a new state, representing the sentence up to token n. Theoretically, the information from one token can propagate arbitrarily far down the sequence, if at every point the state continues to encode contextual information about the token. In practice this mechanism is flawed: the vanishing gradient problem leaves the model's state at the end of a long sentence without precise, extractable information about preceding tokens. The dependency of token computations on the results of previous token computations also makes it hard to parallelize computation on modern deep learning hardware. This can make the training of RNNs inefficient. Self-Attention These problems were addressed by attention mechanisms. Attention mechanisms let a model draw from the state at any preceding point along the sequence. The attention layer can access all previous states and weight them according to a learned measure of relevance, providing relevant information about far-away tokens. A clear example of the value of attention is in language translation, where context is essential to assign the meaning of a word in a sentence. In an English-to-French translation system, the first word of the French output most probably depends heavily on the first few words of the English input. However, in a classic LSTM model, in order to produce the first word of the French output, the model is given only the state vector after processing the last English word. Theoretically, this vector can encode information about the whole English sentence, giving the model all necessary knowledge. In practice, this information is often poorly preserved by the LSTM. An attention mechanism can be added to address this problem: the decoder is given access to the state vectors of every English input word, not just the last, and can learn attention weights that dictate how much to attend to each English input state vector. When added to RNNs, attention mechanisms increase performance. The development of the Transformer architecture revealed that attention mechanisms were powerful in themselves and that sequential recurrent processing of data was not necessary to achieve the quality gains of RNNs with attention. Transformers use an attention mechanism without an RNN, processing all tokens at the same time and calculating attention weights between them in successive layers. Since the attention mechanism only uses information about other tokens from lower layers, it can be computed for all tokens in parallel, which leads to improved training speed.
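The attention weighting described in both Transformer slides is usually written as scaled dot-product attention; the formula below is the standard one from the Transformer literature, added here for reference:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,

where Q, K, and V are the query, key, and value matrices and d_k is the key dimension; each layer computes this in parallel over several learned attention heads.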
  • 75. From https://en.wikipedia.org/wiki/GPT-3 GPT-3 Generative Pre-trained Transformer 3 (GPT-3; stylized GPT·3) is an autoregressive language model that uses deep learning to produce human-like text. The architecture is a standard transformer network (with a few engineering tweaks) with the unprecedented size of 2048-token-long context and 175 billion parameters (requiring 800 GB of storage). The training method is "generative pretraining", meaning that it is trained to predict what the next token is. The model demonstrated strong few-shot learning on many text-based tasks. It is the third-generation language prediction model in the GPT-n series (and the successor to GPT-2) created by OpenAI, a San Francisco-based artificial intelligence research laboratory.[2] GPT-3's full version has a capacity of 175 billion machine learning parameters. GPT-3, which was introduced in May 2020, and was in beta testing as of July 2020,[3] is part of a trend in natural language processing (NLP) systems of pre-trained language representations.[1] The quality of the text generated by GPT-3 is so high that it can be difficult to determine whether or not it was written by a human, which has both benefits and risks.[4] Thirty-one OpenAI researchers and engineers presented the original May 28, 2020 paper introducing GPT-3. In their paper, they warned of GPT-3's potential dangers and called for research to mitigate risk.[1]:34 David Chalmers, an Australian philosopher, described GPT-3 as "one of the most interesting and important AI systems ever produced."[5] Microsoft announced on September 22, 2020, that it had licensed "exclusive" use of GPT-3; others can still use the public API to receive output, but only Microsoft has access to GPT-3's underlying model.[6] An April 2022 review in The New York Times described GPT-3's capabilities as being able to write original prose with fluency equivalent to that of a human.[7]
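"Generative pretraining" as described above amounts to maximizing the likelihood of each token given its preceding context; in standard notation (a restatement, not from the article):

p(x_1, \ldots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_1, \ldots, x_{t-1}),

so training maximizes \sum_t \log p(x_t \mid x_{<t}) over a large text corpus, and at generation time the model is sampled one token at a time, each conditioned on everything generated so far.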
  • 76. OpenAI From https://openai.com/ Recent Research Efficient Training of Language Models to Fill in the Middle Hierarchical Text-Conditional Image Generation with CLIP Latents Formal Mathematics Statement Curriculum Learning Training language models to follow instructions with human feedback Text and Code Embeddings by Contrastive Pre-Training WebGPT: Browser-assisted question-answering with human feedback Training Verifiers to Solve Math Word Problems Recursively Summarizing Books with Human Feedback Evaluating Large Language Models Trained on Code Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets Multimodal Neurons in Artificial Neural Networks Learning Transferable Visual Models From Natural Language Supervision Zero-Shot Text-to-Image Generation Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models OpenAI is an AI research and deployment company. Our mission is to ensure that artificial general intelligence benefits all of humanity.
  • 78. Reservoir Computing From https://martinuzzifrancesco.github.io/posts/a-brief-introduction-to-reservoir-computing/ Reservoir Computing is an umbrella term used to identify a general framework of computation derived from Recurrent Neural Networks (RNN), independently developed by Jaeger [1] and Maass et al. [2]. These papers introduced the concepts of Echo State Networks (ESN) and Liquid State Machines (LSM) respectively. Further improvements over these two models constitute what is now called the field of Reservoir Computing. The main idea lies in leveraging a fixed non-linear system, of higher dimension than the input, onto which the input signal is mapped. After this mapping it is only necessary to use a simple readout layer to harvest the state of the reservoir and to train it to the desired output. In principle, given a complex enough system, this architecture should be capable of any computation [3]. The intuition was born from the fact that in training RNNs, most of the time the weights showing the most change were the ones in the last layer [4]. In the next section we will also see that ESNs actually use a fixed random RNN as the reservoir. Given the static nature of this implementation, ESNs can usually yield faster results, and in some cases even better ones, in particular when dealing with chaotic time series predictions [5]. But not every complex system is suited to be a good reservoir. A good reservoir is one that is able to separate inputs; different external inputs should drive the system to different regions of the configuration space [3]. This is called the separability condition. Furthermore an important property for the reservoirs of ESNs is the Echo State property, which states that inputs to the reservoir echo in the system forever, or until they dissipate. A more formal definition of this property can be found in [6]. Reservoir computing is a best-in-class machine learning algorithm for processing information generated by dynamical systems using observed time-series data. Importantly, it requires very small training data sets, uses linear optimization, and thus requires minimal computing resources. However, the algorithm uses randomly sampled matrices to define the underlying recurrent neural network and has a multitude of metaparameters that must be optimized. Recent results demonstrate the equivalence of reservoir computing to nonlinear vector autoregression, which requires no random matrices, fewer metaparameters, and provides interpretable results. Here, we demonstrate that nonlinear vector autoregression excels at reservoir computing benchmark tasks and requires even shorter training data sets and training time, heralding the next generation of reservoir computing. A dynamical system evolves in time, with examples including the Earth’s weather system and human-built devices such as unmanned aerial vehicles. One practical goal is to develop models for forecasting their behavior. Recent machine learning (ML) approaches can generate a model using only observed data, but many of these algorithms tend to be data hungry, requiring long observation times and substantial computational resources. Reservoir computing1,2 is an ML paradigm that is especially well-suited for learning dynamical systems. Even when systems display chaotic3 or complex spatiotemporal behaviors4, which are considered the hardest-of-the-hard problems, an optimized reservoir computer (RC) can handle them with ease. From https://www.nature.com/articles/s41467-021-25801-2
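A minimal echo state network along the lines described above: a fixed random reservoir, a leaky nonlinear state update, and a ridge-regression readout trained on the harvested states. The reservoir size, spectral-radius rescaling, and the toy sine-wave prediction task are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 1, 200

# Fixed random input and reservoir weights; rescale the reservoir so its
# spectral radius is below 1 (a common heuristic for the echo state property).
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

def run_reservoir(inputs, leak=0.3):
    # Drive the fixed nonlinear reservoir with the input signal and harvest its states.
    states = np.zeros((len(inputs), n_res))
    x = np.zeros(n_res)
    for t, u in enumerate(inputs):
        x = (1 - leak) * x + leak * np.tanh(W_in @ np.atleast_1d(u) + W @ x)
        states[t] = x
    return states

# Toy task: one-step-ahead prediction of a sine wave.
u = np.sin(np.linspace(0, 20 * np.pi, 2000))
target = np.roll(u, -1)
S = run_reservoir(u)

# Only the linear readout is trained (ridge regression on the reservoir states).
ridge = 1e-6
W_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_res), S.T @ target)
prediction = S @ W_out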
  • 79. Reservoir Computing Trends From https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.709.514&rep=rep1&type=pdf
  • 80. Brain Connectivity meets Reservoir Computing From https://www.biorxiv.org/content/10.1101/2021.01.22.427750v1 The connectivity of Artificial Neural Networks (ANNs) is different from the one observed in Biological Neural Networks (BNNs). Can the wiring of actual brains help improve ANNs architectures? Can we learn from ANNs about what network features support computation in the brain when solving a task? ANNs’ architectures are carefully engineered and have crucial importance in many recent performance improvements. On the other hand, BNNs’ exhibit complex emergent connectivity patterns. At the individual level, BNNs connectivity results from brain development and plasticity processes, while at the species level, adaptive reconfigurations during evolution also play a major role shaping connectivity. Ubiquitous features of brain connectivity have been identified in recent years, but their role in the brain’s ability to perform concrete computations remains poorly understood. Computational neuroscience studies reveal the influence of specific brain connectivity features only on abstract dynamical properties, although the implications of real brain networks topologies on machine learning or cognitive tasks have been barely explored. Here we present a cross-species study with a hybrid approach integrating real brain connectomes and Bio-Echo State Networks, which we use to solve concrete memory tasks, allowing us to probe the potential computational implications of real brain connectivity patterns on task solving. We find results consistent across species and tasks, showing that biologically inspired networks perform as well as classical echo state networks, provided a minimum level of randomness and diversity of connections is allowed. We also present a framework, bio2art, to map and scale up real connectomes that can be integrated into recurrent ANNs. This approach also allows us to show the crucial importance of the diversity of interareal connectivity patterns, stressing the importance of stochastic processes determining neural networks connectivity in general.
  • 83. Summary of Deep Learning Models: Survey From https://arxiv.org/pdf/1712.04301.pdf
  • 84. Deep Learning Acronyms From https://arxiv.org/pdf/1712.04301.pdf
  • 85. Deep Learning Hardware From https://medium.com/iotforall/using-deep-learning-processors-for-intelligent-iot-devices-1a7ed9d2226d
  • 86. Deep Learning MIT From https://deeplearning.mit.edu/
  • 88. GitHub ONNX Models From https://github.com/onnx/models
  • 89. HPC vs Big Data Ecosystems From https://www.hpcwire.com/2018/08/31/the-convergence-of-big-data-and-extreme-scale-hpc/
  • 90. HPC and ML From http://dsc.soic.indiana.edu/publications/Learning_Everywhere_Summary.pdf HPCforML: Using HPC to execute and enhance ML performance, or using HPC simulations to train ML algorithms (theory-guided machine learning), which are then used to understand experimental data or simulations. MLforHPC: Using ML to enhance HPC applications and systems. This categorization is related to Jeff Dean’s ”Machine Learning for Systems and Systems for Machine Learning” [6] and Matsuoka’s convergence of AI and HPC [7]. We further subdivide HPCforML as: • HPCrunsML: Using HPC to execute ML with high performance. • SimulationTrainedML: Using HPC simulations to train ML algorithms, which are then used to understand experimental data or simulations. We also subdivide MLforHPC as: • MLautotuning: Using ML to configure (autotune) ML or HPC simulations. Autotuning with systems like ATLAS is already hugely successful and gives an initial view of MLautotuning; as well as choosing block sizes to improve cache use and vectorization, MLautotuning can also be used for simulation mesh sizes [8] and, in big data problems, for configuring databases and complex systems like Hadoop and Spark [9], [10]. • MLafterHPC: ML analyzing the results of HPC, as in trajectory analysis and structure identification in biomolecular simulations. • MLaroundHPC: Using ML to learn from simulations and produce learned surrogates for the simulations (illustrated in the sketch below); the same ML wrapper can also learn configurations as well as results. This differs from SimulationTrainedML, where typically a learnt network is used to redirect observation, whereas in MLaroundHPC the ML is used to improve HPC performance. • MLControl: Using simulations (with HPC) in control of experiments and in objective-driven computational campaigns [11]; here the simulation surrogates are very valuable, allowing real-time predictions.
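As a hedged illustration of the MLaroundHPC idea, the sketch below trains a cheap learned surrogate for an expensive simulation and then queries the surrogate instead of rerunning the simulation. The toy simulation function, model size, and sample counts are invented for illustration and are not from the cited paper.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

def expensive_simulation(x):
    """Stand-in for an HPC simulation mapping parameters to an observable."""
    return np.sin(3 * x[:, 0]) * np.exp(-x[:, 1] ** 2)

# Run the "simulation" on a modest set of parameter samples.
X_train = rng.uniform(-1, 1, size=(500, 2))
y_train = expensive_simulation(X_train)

# Train a cheap surrogate that can be queried far faster than the simulation.
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
surrogate.fit(X_train, y_train)

# The surrogate now answers new parameter queries without rerunning the simulation.
X_new = rng.uniform(-1, 1, size=(5, 2))
print("surrogate :", surrogate.predict(X_new))
print("simulation:", expensive_simulation(X_new))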
  • 91. Designing Neural Nets through Neuroevolution From www.evolvingai.org/stanley-clune-lehman-2019-designing-neural-networks
  • 92. Go Explore Algorithm From http://www.evolvingai.org/files/1901.10995.pdf
  • 93. Deep Density Destructors From https://www.cs.cmu.edu/~dinouye/papers/inouye2018-deep-density-destructors-icml2018.pdf We propose a unified framework for deep density models by formally defining density destructors. A density destructor is an invertible function that transforms a given density to the uniform density—essentially destroying any structure in the original density. This destructive transformation generalizes Gaussianization via ICA and more recent autoregressive models such as MAF and Real NVP. Informally, this transformation can be seen as a generalized whitening procedure or a multivariate generalization of the univariate CDF function. Unlike Gaussianization, our destructive transformation has the elegant property that the density function is equal to the absolute value of the Jacobian determinant. Thus, each layer of a deep density can be seen as a shallow density—uncovering a fundamental connection between shallow and deep densities. In addition, our framework provides a common interface for all previous methods enabling them to be systematically combined, evaluated and improved. Leveraging the connection to shallow densities, we also propose a novel tree destructor based on tree densities and an image-specific destructor based on pixel locality. We illustrate our framework on a 2D dataset, MNIST, and CIFAR-10.
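To make the destructor idea concrete in one dimension, the sketch below uses the CDF of a fitted Gaussian as the destructive transformation: it maps samples to an approximately uniform density, and its derivative (the Jacobian in 1D) is exactly the model density. The Gaussian fit and sample size are illustrative assumptions, not the paper's richer destructors (ICA, autoregressive flows, tree and image-specific variants).

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Data drawn from a structured density (a shifted, scaled Gaussian here).
x = rng.normal(loc=2.0, scale=0.5, size=5000)

# Fit a shallow density model and use its CDF as the "destructor".
mu, sigma = x.mean(), x.std()
u = stats.norm.cdf(x, mu, sigma)        # destroyed representation: ~Uniform(0, 1)

# In 1D the Jacobian of the CDF is the pdf, so the model density is
# the absolute derivative of the destructive transformation.
density_at_x = stats.norm.pdf(x, mu, sigma)

# Check that the transformed samples look uniform (Kolmogorov-Smirnov test).
ks_stat, p_value = stats.kstest(u, "uniform")
print(f"KS statistic vs Uniform(0,1): {ks_stat:.3f} (p = {p_value:.2f})")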
  • 95. Sci-Kit Learning Decision Tree From https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463
  • 96. Imitation Learning From https://drive.google.com/file/d/12QdNmMll-bGlSWnm8pmD_TawuRN7xagX/view
  • 97. Imitation Learning From https://drive.google.com/file/d/12QdNmMll-bGlSWnm8pmD_TawuRN7xagX/view
  • 98. Generative Adversarial Networks (GANs) From https://skymind.ai/wiki/generative-adversarial-network-gan
  • 99. Deep Generative Network-based Activation Management (DGN-AMs) From https://arxiv.org/pdf/1605.09304.pdf
  • 100. Paired Open Ended Trailblazer (POET) From https://drive.google.com/file/d/12QdNmMll-bGlSWnm8pmD_TawuRN7xagX/view
  • 101. One Model to Learn Them All From https://arxiv.org/pdf/1706.05137.pdf
  • 102. Self-modifying NNs With Differentiable Neuromodulated Plasticity From https://arxiv.org/pdf/1706.05137.pdf
  • 103. Stein Variational Gradient Descent From https://arxiv.org/pdf/1706.05137.pdf
  • 104. Linux Foundation Deep Learning (LFDL) Projects From https://lfdl.io/projects/
  • 105. Linux Foundation Deep Learning (LFDL) Projects From https://lfdl.io/projects/
  • 107. Graphical Processing Units (GPU) From https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html Graphics processing technology has evolved to deliver unique benefits in the world of computing. The latest graphics processing units (GPUs) unlock new possibilities in gaming, content creation, machine learning, and more. What Does a GPU Do? The graphics processing unit, or GPU, has become one of the most important types of computing technology, both for personal and business computing. Designed for parallel processing, the GPU is used in a wide range of applications, including graphics and video rendering. Although they’re best known for their capabilities in gaming, GPUs are becoming more popular for use in creative production and artificial intelligence (AI). GPUs were originally designed to accelerate the rendering of 3D graphics. Over time, they became more flexible and programmable, enhancing their capabilities. This allowed graphics programmers to create more interesting visual effects and realistic scenes with advanced lighting and shadowing techniques. Other developers also began to tap the power of GPUs to dramatically accelerate additional workloads in high performance computing (HPC), deep learning, and more. GPU and CPU: Working Together The GPU evolved as a complement to its close cousin, the CPU (central processing unit). While CPUs have continued to deliver performance increases through architectural innovations, faster clock speeds, and the addition of cores, GPUs are specifically designed to accelerate computer graphics workloads. When shopping for a system, it can be helpful to know the role of the CPU vs. GPU so you can make the most of both. GPU vs. Graphics Card: What’s the Difference? While the terms GPU and graphics card (or video card) are often used interchangeably, there is a subtle distinction between these terms. Much like a motherboard contains a CPU, a graphics card refers to an add-in board that incorporates the GPU. This board also includes the raft of components required to both allow the GPU to function and connect to the rest of the system. GPUs come in two basic types: integrated and discrete. An integrated GPU does not come on its own separate card at all and is instead embedded alongside the CPU. A discrete GPU is a distinct chip that is mounted on its own circuit board and is typically attached to a PCI Express slot.
  • 108. NVidia Graphical Processing Units (GPU) From https://en.wikipedia.org/wiki/Nvidia Nvidia Corporation (/ɛnˈvɪdiə/ en-VID-ee-ə) is an American multinational technology company incorporated in Delaware and based in Santa Clara, California.[2] It is a software and fabless company which designs graphics processing units (GPUs), application programming interfaces (APIs) for data science and high-performance computing, as well as system on a chip units (SoCs) for the mobile computing and automotive market. Nvidia is a global leader in artificial intelligence hardware and software.[3][4] Its professional line of GPUs is used in workstations for applications in such fields as architecture, engineering and construction, media and entertainment, automotive, scientific research, and manufacturing design.[5] In addition to GPU manufacturing, Nvidia provides an API called CUDA that allows the creation of massively parallel programs which utilize GPUs.[6][7] They are deployed in supercomputing sites around the world.[8][9] More recently, it has moved into the mobile computing market, where it produces Tegra mobile processors for smartphones and tablets as well as vehicle navigation and entertainment systems.[10][11][12] In addition to AMD, its competitors include Intel,[13] Qualcomm[14] and AI-accelerator companies such as Graphcore. Nvidia's GPUs are used for edge-to-cloud computing and for supercomputers (Nvidia provides the accelerators, i.e. the GPUs, for many of them, including a previous fastest system, although the current fastest and most power-efficient systems are powered by AMD GPUs and CPUs). Nvidia expanded its presence in the gaming industry with its handheld game consoles Shield Portable, Shield Tablet, and Shield Android TV and its cloud gaming service GeForce Now. Nvidia announced plans on September 13, 2020, to acquire Arm from SoftBank, pending regulatory approval, for a value of US$40 billion in stock and cash, which would have been the largest semiconductor acquisition to date; under the proposed deal, SoftBank Group would have acquired slightly less than a 10% stake in Nvidia, and Arm would have maintained its headquarters in Cambridge.[15][16][17][18] The acquisition was ultimately abandoned in February 2022 amid regulatory objections.
  • 109. Tesla unveils new Dojo Supercomputer From https://electrek.co/2022/10/01/tesla-dojo-supercomputer-tripped-power-grid/ Tesla has unveiled the latest version of its Dojo supercomputer, and it’s apparently so powerful that it tripped the power grid in Palo Alto. Dojo is Tesla’s own custom supercomputer platform built from the ground up for AI machine learning and, more specifically, for video training using the video data coming from its fleet of vehicles. The automaker already has a large NVIDIA GPU-based supercomputer that is one of the most powerful in the world, but the new Dojo custom-built computer uses chips and an entire infrastructure designed by Tesla. The custom-built supercomputer is expected to elevate Tesla’s capacity to train neural nets using video data, which is critical to the computer vision technology powering its self-driving effort. Last year, at Tesla’s AI Day, the company unveiled its Dojo supercomputer, but it was still ramping up the effort at the time. It only had its first chip and training tiles, and it was still working on building a full Dojo cabinet and cluster, or “Exapod.” Now Tesla has unveiled the progress made with the Dojo program over the last year during its AI Day 2022 last night. Why does Tesla need the Dojo supercomputer? It’s a fair question. Why is an automaker developing the world’s most powerful supercomputer? Well, Tesla would tell you that it’s not just an automaker, but a technology company developing products to accelerate the transition to a sustainable economy. Musk said it makes sense to offer Dojo as a service, perhaps to take on Jeff Bezos’s Amazon AWS, calling it a “service that you can use that’s available online where you can train your models way faster and for less money.” But more specifically, Tesla needs Dojo to auto-label training videos from its fleet and train its neural nets to build its self-driving system. Tesla realized that its approach to developing a self-driving system, using neural nets trained on millions of videos coming from its customer fleet, requires a lot of computing power, and it decided to develop its own supercomputer to deliver that power. That’s the short-term goal, but Tesla will have plenty of use for the supercomputer going forward, as it has big ambitions to develop other artificial intelligence programs.
  • 110. Linux Foundation Deep Learning (LFDL) Projects From https://lfdl.io/projects/
  • 112. Introduction to Deep Reinforcement Learning From https://skymind.ai/wiki/deep-reinforcement-learning Many RL references at this site
  • 113. Model-based Reinforcement Learning From http://rail.eecs.berkeley.edu/deeprlcourse-fa17/f17docs/lecture_9_model_based_rl.pdf
  • 114. Hierarchical Deep Reinforcement Learning From https://papers.nips.cc/paper/6233-hierarchical-deep-reinforcement-learning-integrating-temporal-abstraction-and-intrinsic-motivation.pdf
  • 115. Meta Learning Shared Hierarchy From https://skymind.ai/wiki/deep-reinforcement-learning
  • 116. Learning with Hierarchical Deep Models From https://www.cs.toronto.edu/~rsalakhu/papers/HD_PAMI.pdf We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture that integrates deep learning models with structured hierarchical Bayesian (HB) models. Specifically, we show how we can learn a hierarchical Dirichlet process (HDP) prior over the activities of the top-level features in a deep Boltzmann machine (DBM). This compound HDP-DBM model learns to learn novel concepts from very few training examples by learning low-level generic features, high-level features that capture correlations among low-level features, and a category hierarchy for sharing priors over the high-level features that are typical of different kinds of concepts. We present efficient learning and inference algorithms for the HDP-DBM model and show that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten character recognition, and human motion capture datasets.
  • 118. Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations From https://web.eecs.umich.edu/~honglak/icml09-ConvolutionalDeepBeliefNetworks.pdf There has been much interest in unsupervised learning of hierarchical generative models such as deep belief networks. Scaling such models to full-sized, high-dimensional images remains a difficult problem. To address this problem, we present the convolutional deep belief network, a hierarchical generative model which scales to realistic image sizes. This model is translation-invariant and supports efficient bottom-up and top-down probabilistic inference. Key to our approach is probabilistic max-pooling, a novel technique which shrinks the representations of higher layers in a probabilistically sound way. Our experiments show that the algorithm learns useful high-level visual features, such as object parts, from unlabeled images of objects and natural scenes. We demonstrate excellent performance on several visual recognition tasks and show that our model can perform hierarchical (bottom-up and top-down) inference over full-sized images. The visual world can be described at many levels: pixel intensities, edges, object parts, objects, and beyond. The prospect of learning hierarchical models which simultaneously represent multiple levels has recently generated much interest. Ideally, such “deep” representations would learn hierarchies of feature detectors, and further be able to combine top-down and bottom-up processing of an image. For instance, lower layers could support object detection by spotting low-level features indicative of object parts. Conversely, information about objects in the higher layers could resolve lower-level ambiguities in the image or infer the locations of hidden object parts. Deep architectures consist of feature detector units arranged in layers. Lower layers detect simple features and feed into higher layers, which in turn detect more complex features. There have been several approaches to learning deep networks (LeCun et al., 1989; Bengio et al., 2006; Ranzato et al., 2006; Hinton et al., 2006). In particular, the deep belief network (DBN) (Hinton et al., 2006) is a multilayer generative model where each layer encodes statistical dependencies among the units in the layer below it; it is trained to (approximately) maximize the likelihood of its training data. DBNs have been successfully used to learn high-level structure in a wide variety of domains, including handwritten digits (Hinton et al., 2006) and human motion capture data (Taylor et al., 2007). We build upon the DBN in this paper because we are interested in learning a generative model of images which can be trained in a purely unsupervised manner. This paper presents the convolutional deep belief network, a hierarchical generative model that scales to full-sized images. Another key to our approach is probabilistic max-pooling, a novel technique that allows higher-layer units to cover larger areas of the input in a probabilistically sound way. To the best of our knowledge, ours is the first translation-invariant hierarchical generative model which supports both top-down and bottom-up probabilistic inference and scales to realistic image sizes. The first, second, and third layers of our network learn edge detectors, object parts, and objects respectively.
We show that these representations achieve excellent performance on several visual recognition tasks and allow “hidden” object parts to be inferred from high-level object information.
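The shrinking effect of pooling can be shown with ordinary (deterministic) max pooling; the sketch below is a NumPy illustration only, and does not reproduce the paper's probabilistic max-pooling, which is a stochastic variant defined inside the CDBN's energy model.

import numpy as np

def max_pool(feature_map, block=2):
    """Summarize each block x block region of a detection layer by its maximum,
    shrinking the representation passed to the next layer."""
    h, w = feature_map.shape
    h, w = h - h % block, w - w % block              # trim to a multiple of block
    blocks = feature_map[:h, :w].reshape(h // block, block, w // block, block)
    return blocks.max(axis=(1, 3))

rng = np.random.default_rng(0)
detection_layer = rng.random((8, 8))                 # toy hidden-unit activations
pooled = max_pool(detection_layer, block=2)
print(detection_layer.shape, "->", pooled.shape)     # (8, 8) -> (4, 4)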
  • 119. Learning with Hierarchical-Deep Models From https://www.cs.toronto.edu/~rsalakhu/papers/HD_PAMI.pdf We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture that integrates deep learning models with structured hierarchical Bayesian (HB) models. Specifically, we show how we can learn a hierarchical Dirichlet process (HDP) prior over the activities of the top-level features in a deep Boltzmann machine (DBM). This compound HDP-DBM model learns to learn novel concepts from very few training examples by learning low-level generic features, high-level features that capture correlations among low-level features, and a category hierarchy for sharing priors over the high-level features that are typical of different kinds of concepts. We present efficient learning and inference algorithms for the HDP-DBM model and show that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten character recognition, and human motion capture datasets. The ability to learn abstract representations that support transfer to novel but related tasks lies at the core of many problems in computer vision, natural language processing, cognitive science, and machine learning. In typical applications of machine classification algorithms today, learning a new concept requires tens, hundreds, or thousands of training examples. For human learners, however, just one or a few examples are often sufficient to grasp a new category and make meaningful generalizations to novel instances [15], [25], [31], [44]. Clearly, this requires very strong but also appropriately tuned inductive biases. The architecture we describe here takes a step toward this ability by learning several forms of abstract knowledge at different levels of abstraction that support transfer of useful inductive biases from previously learned concepts to novel ones. We call our architectures compound HD models, where “HD” stands for “Hierarchical-Deep,” because they are derived by composing hierarchical nonparametric Bayesian models with deep networks, two influential approaches from the recent unsupervised learning literature with complementary strengths. Recently introduced deep learning models, including deep belief networks (DBNs) [12], deep Boltzmann machines (DBM) [29], deep autoencoders [19], and many others [9], [10], [21], [22], [26], [32], [34], [43], have been shown to learn useful distributed feature representations for many high-dimensional datasets. The ability to automatically learn in multiple layers allows deep models to construct sophisticated domain-specific features without the need to rely on precise human-crafted input representations, increasingly important with the proliferation of datasets and application domains.
  • 120. Reinforcement Learning: Fast and Slow From https://www.cell.com/trends/cognitive-sciences/fulltext/S1364-6613(19)30061-0 Meta-RL: Speeding up Deep RL by Learning to Learn. As discussed earlier, a second key source of slowness in standard deep RL, alongside incremental updating, is weak inductive bias. As formalized in the idea of the bias-variance tradeoff, fast learning requires the learner to go in with a reasonably sized set of hypotheses concerning the structure of the patterns that it will face. The narrower the hypothesis set, the faster learning can be. However, as foreshadowed earlier, there is a catch: a narrow hypothesis set will only speed learning if it contains the correct hypothesis. While strong inductive biases can accelerate learning, they will only do so if the specific biases the learner adopts happen to fit with the material to be learned. As a result, a new learning problem arises: how can the learner know what inductive biases to adopt? Episodic Deep RL: Fast Learning through Episodic Memory. If incremental parameter adjustment is one source of slowness in deep RL, then one way to learn faster might be to avoid such incremental updating. Naively increasing the learning rate governing gradient descent optimization leads to the problem of catastrophic interference. However, recent research shows that there is another way to accomplish the same goal, which is to keep an explicit record of past events, and use this record directly as a point of reference in making new decisions. This idea, referred to as episodic RL, parallels ‘non-parametric’ approaches in machine learning and resembles ‘instance-’ or ‘exemplar-based’ theories of learning in psychology. When a new situation is encountered and a decision must be made concerning what action to take, the procedure is to compare an internal representation of the current situation with stored representations of past situations. The action chosen is then the one associated with the highest value, based on the outcomes of the past situations that are most similar to the present. When the internal state representation is computed by a multilayer neural network, we refer to the resulting algorithm as ‘episodic deep RL’.
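The nearest-neighbour lookup behind episodic RL can be sketched briefly. The Python example below stores (state embedding, action, return) records and picks the action whose most similar past episodes had the highest outcomes; the embedding, Euclidean similarity, and the value of k are illustrative assumptions rather than any specific published agent.

import numpy as np
from collections import defaultdict

class EpisodicController:
    """Toy episodic RL: value estimates come directly from stored episodes."""

    def __init__(self, k=5):
        self.k = k
        self.memory = defaultdict(list)   # action -> list of (embedding, return)

    def store(self, embedding, action, episode_return):
        self.memory[action].append((np.asarray(embedding, float), episode_return))

    def value(self, embedding, action):
        records = self.memory[action]
        if not records:
            return 0.0
        embedding = np.asarray(embedding, float)
        dists = [np.linalg.norm(embedding - e) for e, _ in records]
        nearest = np.argsort(dists)[: self.k]          # most similar past situations
        return float(np.mean([records[i][1] for i in nearest]))

    def act(self, embedding, actions):
        return max(actions, key=lambda a: self.value(embedding, a))

# After each episode, store what happened; at decision time, choose the action
# whose similar past situations led to the best outcomes.
ctrl = EpisodicController(k=3)
ctrl.store([0.1, 0.2], action=0, episode_return=1.0)
ctrl.store([0.9, 0.8], action=1, episode_return=5.0)
print(ctrl.act([0.85, 0.75], actions=[0, 1]))          # -> 1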
  • 122. Large-Scale Deep Learning (Jeff Dean) From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
  • 123. Embedding for Sparse Inputs (Jeff Dean) From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
  • 124. Efficient Vector Representation of Words (Jeff Dean) From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
  • 125. Deep Convolution Neural Nets and Gaussian Processes From https://ai.google/research/pubs/pub47671
  • 126. Deep Convolution Neural Nets and Gaussian Processes(cont) From https://ai.google/research/pubs/pub47671
  • 127. Google’s Inception Network From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
  • 128. Google’s Inception Network From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
  • 129. Google’s Inception Network From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
  • 130. Google’s Inception Network From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
  • 131. Google’s Inception Network From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
  • 132. Google’s Inception Network From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
  • 133. Google’s Inception Network From https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202
  • 134. Large-Scale Deep Learning (Jeff Dean) From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
  • 135. Large-Scale Deep Learning (Jeff Dean) From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
  • 136. Large-Scale Deep Learning (Jeff Dean) From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
  • 137. Large-Scale Deep Learning (Jeff Dean) From http://static.googleusercontent.com/media/research.google.com/en//people/jeff/CIKM-keynote-Nov2014.pdf
  • 138. Computing and Sensing Architecture
  • 139. Simple Event Processing and Complex Event Processing: Hierarchical C4ISR Flow Model from Bob Marcus [diagram: measurements from input devices and field processors flow upward through Data, Structured Data, Information, Knowledge, and Wisdom toward awareness and decision; simple event processing handles situation assessment and simple responses in field operations, while complex event processing updates the world model, strategy, and tactics in HQ operations, creating new goals and plans that drive sensor and effects management and actuator devices] Adapted From http://www.et-strategies.com/great-global-grid/Events.pdf
  • 140. Computing and Sensing Architectures From https://www.researchgate.net/publication/323835314_Greening_Trends_in_Energy-Efficiency_of_IoT-based_Heterogeneous_Wireless_Nodes/figures?lo=1
  • 141. Computing and Sensing Architectures From https://www.researchgate.net/publication/323835314_Greening_Trends_in_Energy-Efficiency_of_IoT-based_Heterogeneous_Wireless_Nodes/figures?lo=1
  • 142. Bio-Inspired Distributed Intelligence From https://news.mit.edu/2022/wiggling-toward-bio-inspired-machine-intelligence-juncal-arbelaiz-1002 More than half of an octopus’ nerves are distributed through its eight arms, each of which has some degree of autonomy. This distributed sensing and information processing system intrigued Arbelaiz, who is researching how to design decentralized intelligence for human-made systems with embedded sensing and computation. At MIT, Arbelaiz is an applied math student who is working on the fundamentals of optimal distributed control and estimation in the final weeks before completing her PhD this fall. She finds inspiration in the biological intelligence of invertebrates such as octopus and jellyfish, with the ultimate goal of designing novel control strategies for flexible “soft” robots that could be used in tight or delicate surroundings, such as a surgical tool or for search-and-rescue missions. “The squishiness of soft robots allows them to dynamically adapt to different environments. Think of worms, snakes, or jellyfish, and compare their motion and adaptation capabilities to those of vertebrate animals,” says Arbelaiz. “It is an interesting expression of embodied intelligence — lacking a rigid skeleton gives advantages to certain applications and helps to handle uncertainty in the real world more efficiently. But this additional softness also entails new system-theoretic challenges.” In the biological world, the “controller” is usually associated with the brain and central nervous system — it creates motor commands for the muscles to achieve movement. Jellyfish and a few other soft organisms lack a centralized nerve center, or brain. Inspired by this observation, she is now working toward a theory where soft-robotic systems could be controlled using decentralized sensory information sharing. “When sensing and actuation are distributed in the body of the robot and onboard computational capabilities are limited, it might be difficult to implement centralized intelligence,” she says. “So, we need these sort of decentralized schemes that, despite sharing sensory information only locally, guarantee the desired global behavior. Some biological systems, such as the jellyfish, are beautiful examples of decentralized control architectures — locomotion is achieved in the absence of a (centralized) brain. This is fascinating as compared to what we can achieve with human-made machines.”
  • 143. IoT and Deep Learning
  • 145. Deep Learning for IoT Overview: Survey From https://arxiv.org/pdf/1712.04301.pdf
  • 146. Deep Learning for IoT Overview: Survey From https://arxiv.org/pdf/1712.04301.pdf
  • 147. Standardized IoT Data Sets: Survey From https://arxiv.org/pdf/1712.04301.pdf
  • 148. Standardized IoT Data Sets: Survey From https://arxiv.org/pdf/1712.04301.pdf
  • 150. DeepMind Website DeepMind Home page https://deepmind.com/ DeepMind Research https://deepmind.com/research/ https://deepmind.com/research/publications/ DeepMind Blog https://deepmind.com/blog DeepMind Applied https://deepmind.com/applied
  • 151. DeepMind Featured Research Publications From https://deepmind.com/research AlphaGo https://www.deepmind.com/research/highlighted-research/alphago Deep Reinforcement Learning https://deepmind.com/research/dqn/ A Dual Approach to Scalable Verification of Deep Networks http://auai.org/uai2018/proceedings/papers/204.pdf https://www.youtube.com/watch?v=SV05j3GM0LI Learning to reinforcement learn https://arxiv.org/abs/1611.05763 Neural Programmer - Interpreters https://arxiv.org/pdf/1511.06279v3.pdf Dueling Network Architectures for Deep Reinforcement Learning https://arxiv.org/pdf/1511.06581.pdf DeepMind Research over 400 publications https://deepmind.com/research/publications/
  • 152. DeepMind Applied From https://deepmind.com/applied/ DeepMind Health https://deepmind.com/applied/deepmind-health/ DeepMind for Google https://deepmind.com/applied/deepmind-google/ DeepMind Ethics and Society https://deepmind.com/applied/deepmind-ethics-society/
  • 153. AlphaGo and AlphaGo Zero From https://www.deepmind.com/research/highlighted-research/alphago We created AlphaGo, a computer program that combines an advanced search tree with deep neural networks. These neural networks take a description of the Go board as an input and process it through a number of different network layers containing millions of neuron-like connections. One neural network, the “policy network”, selects the next move to play. The other neural network, the “value network”, predicts the winner of the game. We introduced AlphaGo to numerous amateur games to help it develop an understanding of reasonable human play. Then we had it play against different versions of itself thousands of times, each time learning from its mistakes. Over time, AlphaGo improved and became increasingly stronger and better at learning and decision-making. This process is known as reinforcement learning. AlphaGo went on to defeat Go world champions in different global arenas and arguably became the greatest Go player of all time. Following the summit, we revealed AlphaGo Zero. While AlphaGo learnt the game by playing thousands of matches with amateur and professional players, AlphaGo Zero learnt by playing against itself, starting from completely random play. This powerful technique is no longer constrained by the limits of human knowledge. Instead, the computer program accumulated thousands of years of human knowledge during a period of just a few days and learned to play Go from the strongest player in the world, AlphaGo. AlphaGo Zero quickly surpassed the performance of all previous versions and also discovered new knowledge, developing unconventional strategies and creative new moves, including those which beat the World Go Champions Lee Sedol and Ke Jie. These creative moments give us confidence that AI can be used as a positive multiplier for human ingenuity.
  • 154. AlphaZero From https://www.deepmind.com/blog/alphazero-shedding-new-light-on-chess-shogi-and-go In late 2017 we introduced AlphaZero, a single system that taught itself from scratch how to master the games of chess, shogi (Japanese chess), and Go, beating a world-champion program in each case. We were excited by the preliminary results and thrilled to see the response from members of the chess community, who saw in AlphaZero’s games a ground-breaking, highly dynamic and “unconventional” style of play that differed from any chess-playing engine that came before it. Today, we are delighted to introduce the full evaluation of AlphaZero, published in the journal Science (Open Access version here), that confirms and updates those preliminary results. It describes how AlphaZero quickly learns each game to become the strongest player in history for each, despite starting its training from random play, with no in-built domain knowledge but the basic rules of the game. This ability to learn each game afresh, unconstrained by the norms of human play, results in a distinctive, unorthodox, yet creative and dynamic playing style. Chess Grandmaster Matthew Sadler and Women’s International Master Natasha Regan, who have analysed thousands of AlphaZero’s chess games for their forthcoming book Game Changer (New in Chess, January 2019), say its style is unlike any traditional chess engine. “It’s like discovering the secret notebooks of some great player from the past,” says Matthew. Traditional chess engines, including the world computer chess champion Stockfish and IBM’s ground-breaking Deep Blue, rely on thousands of rules and heuristics handcrafted by strong human players that try to account for every eventuality in a game. Shogi programs are also game specific, using similar search engines and algorithms to chess programs. AlphaZero takes a totally different approach, replacing these hand-crafted rules with a deep neural network and general-purpose algorithms that know nothing about the game beyond the basic rules.
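The "deep network plus general-purpose search" recipe can be hinted at with a small sketch. The Python example below implements a simplified PUCT-style move selection of the kind used in such systems, combining a policy prior with accumulated value estimates; the exploration constant, node structure, and the stub policy/value function are illustrative assumptions, not DeepMind's implementation.

import math
import random

C_PUCT = 1.5   # assumed exploration constant

class Node:
    """One candidate move in the search tree."""
    def __init__(self, prior):
        self.prior = prior        # P(s, a) from the policy network
        self.visits = 0           # N(s, a)
        self.value_sum = 0.0      # accumulated value-network estimates

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select(children):
    """PUCT rule: trade off the value estimate Q against the policy prior P,
    discounted by how often each move has already been explored."""
    total = sum(c.visits for c in children.values())
    def score(item):
        move, node = item
        u = C_PUCT * node.prior * math.sqrt(total + 1) / (1 + node.visits)
        return node.q() + u
    return max(children.items(), key=score)[0]

def policy_value_stub(state):
    """Stand-in for the trained networks: uniform priors and a random value."""
    moves = state["legal_moves"]
    return {m: 1 / len(moves) for m in moves}, random.uniform(-1, 1)

# One tiny search step on a toy state.
state = {"legal_moves": ["a", "b", "c"]}
priors, value = policy_value_stub(state)
children = {m: Node(p) for m, p in priors.items()}
move = select(children)
children[move].visits += 1
children[move].value_sum += value
print("selected move:", move)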