CS290c – Machine Learning
Final Project


Examples and Implications of Physics Math in Machine Learning


Daniel A. O’Leary
In the course of lecture and discussion for this class, it appeared that there exists a
pattern of discovery for machine learning theories. Several instances of machine
learning research exhibit the following steps:
    • a topic or problem is identified
    • some reasonable solutions are suggested
    • one theory is chosen as best
    • this theory is investigated and implemented
    • it is discovered that the mathematics of this solution coincides with the
      mathematics of some property in physics
    • the physics math reveals new facts regarding the nature of the problem
      being investigated.
Not only does this seem to occur with enough frequency to suggest that physics and
machine learning may share a common theme, but the ideas that come out of these
examples of mathematical crossover are often the elegant, “big” ideas that shape and
propel a discipline forward. It therefore seems reasonable that another way to search
for these “big” leaps in understanding would be to look at the physics that applies to
machine learning, identify trends or similarities, and extrapolate from these trends
possible future mathematical crossovers in machine learning.
        I propose to investigate the use of physics principles as metaphors in machine
learning. Specifically, I will review energy, temperature, mean field theory, and
momentum. In doing so, I hope to discover an aspect of these topics in physics that has
been overlooked in the machine learning community; essentially, I hope to extend
physics analogies to illuminate a machine learning problem. Failing that, I would like to
reach a conclusion about which topics in physics are most likely to be used in machine
learning in the future. Having identified such areas, a “reverse engineering” process
might be performed: a search among existing machine learning questions for patterns
that correspond to the math of these high-likelihood physics topics. In identifying these
similarities, our understanding of machine learning can be advanced through existing
physics knowledge.
Energy
Energy is a challenging topic in physics; it is simultaneously obvious and elusive. We all
know roughly what energy is (generally, a measure of the ability to do work); however,
energy has neither a consistent shape nor a discrete unit. It is easy to conceive of, but
hard to define. Adding to this difficulty, Einstein’s famous E = mc² suggests that the line
we draw between matter and energy is purely a mental construct. It is perfectly valid to
suggest that any discussion in physics is, in fact, a discussion of energy.
        Nonetheless, we can put these issues aside, accept the intuitive definition of
energy for now (the ability to do work), and concentrate on energy’s applications in
machine learning. Energy in a neural network first appeared in 1982, with Hopfield’s
network, and has since become commonplace. A review of the literature shows that
energy is consistently used as a function of the weights, whose minima describe the
attractors within the network; in Boltzmann distributions [Dayan]; within Gibbs free
energy [Csato]; and as a tool for classification, regression, constraint satisfaction, and
determining latent variables [Yann], all of which derive from the Hopfield energy
equation [Hertz].
        Within the context of a neural network, energy (H) is defined by the weights
between agreeing and disagreeing variables. Consider binary variables S_i and S_j, each
reporting positive or negative one, with weight w_ij between them. The energy of the
network is

    H = -(1/2) Σ_ij w_ij S_i S_j,

such that weights between agreeing variables decrease the system’s energy and weights
between disagreeing variables increase it. The factor of 1/2 accounts for the fact that
w_ij = w_ji, so each weighted edge is counted twice; halving the sum removes this
redundancy. Each unit updates as S_i = sign(Σ_j w_ji S_j): S_i takes the sign of the net
weighted input that connects to it.
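To make the definitions above concrete, here is a minimal sketch in Python (my own, not
from the paper or [Hertz]) of the energy function and the sign update rule; the Hebbian
outer-product weights in the demo are an assumption used to store a single pattern.

```python
import numpy as np

def hopfield_energy(w, s):
    """H = -(1/2) * sum_ij w_ij S_i S_j; the 1/2 corrects for each
    symmetric edge being counted twice."""
    return -0.5 * s @ w @ s

def update_sweep(w, s, rng):
    """Asynchronous update S_i = sign(sum_j w_ji S_j); with symmetric
    weights and zero diagonal, each flip can only lower (or keep) H."""
    s = s.copy()
    for i in rng.permutation(len(s)):
        s[i] = 1 if w[i] @ s >= 0 else -1
    return s

# Demo: store one pattern with a Hebbian outer product, corrupt it,
# and watch the dynamics fall back into the pattern's energy minimum.
rng = np.random.default_rng(0)
pattern = rng.choice([-1, 1], size=16)
w = np.outer(pattern, pattern).astype(float)
np.fill_diagonal(w, 0.0)                  # no self-connections

s = pattern.copy()
s[:4] *= -1                               # flip four bits
print("energy before:", hopfield_energy(w, s))
s = update_sweep(w, s, rng)
print("energy after: ", hopfield_energy(w, s))
print("pattern recovered:", bool(np.array_equal(s, pattern)))
```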
        The graph of an energy function can be considered as a landscape of hills and
valleys (see figure 1). The patterns that constitute the training set act as attractors and
sit at minima of the landscape. Some other minima are echoes of the surrounding
attractors; these are called spurious mixture states because they mimic attractors but are
in fact mixtures of true attractors. Further, some spurious minima are unrelated to the
attractors; these are called spin glass states and have their basis in a different physics
topic, too great a digression to investigate here. Ultimately, it is these minima and their
relationship to the attractors that make the energy function useful.

Figure 1: An example of an energy landscape.
        Energy in the neural network context is compared to the energy exhibited by an
array of micromagnets, each magnet having a spin that corresponds to a machine
learning variable’s positive or negative one output. Such an array of micromagnets
naturally settles into a minimum of lowest energy, just like the neural network. Further,
this physics metaphor extends into the other physics metaphors in the machine learning
world. It also exhibits the qualities that I suggest correlate among the physics math used
in machine learning: the magnetic array example is discrete and binary, it deals with
physics at the atomic level, and it deals with energy as electrostatic forces.


Temperature
        Temperature in the physics world can be thought of as a measure of the energy
contained in an object. For example, the difference between a cold piece of iron and a
warm piece of iron is that the warm piece contains more energy. This energy can be
pictured as the velocity of the atoms in the substance: the warm piece of iron has the
same atoms as the cold piece, but its atoms are moving faster.

Figure 2: An energy function exhibiting spurious minima.
        Within the machine learning context, temperature is an expression of noise within
a system and a parameter controlling the update rule. Temperature plays a role in
Boltzmann machines and Helmholtz free energy equations [Tanaka]. Temperature decay
is used in annealing to avoid spurious minima [Galland].
        In machine learning, temperature (T) usually enters as the inverse temperature
β = 1/T. Machine learning temperature exhibits properties similar to temperature within
a micromagnetic array (the same array discussed under energy). In our previous example
of micromagnetic arrays, the spin of an atom was determined solely by the spins of its
surrounding atoms; in fact, this is only true as the material approaches absolute zero. At
higher temperatures, the energy in the material (indicated by its temperature) can cause
a spin to flip. Spin therefore becomes probabilistic: S_i = ±1 with probability F_β(±h_i),
where h_i = Σ_j w_ij S_j is the local field and

    F_β(h) = 1 / (1 + exp(-2βh)).

This is a sigmoid that flattens as temperature increases, becoming completely flat at a
critical temperature T_c. The critical temperature is thus the temperature at which
knowledge of the spin states offers no indication of what a connected spin state will be
(the point at which noise overwhelms any inference). Temperature is used to reduce the
impact of spurious minima by raising the noise level above the spurious minima’s depth,
thereby allowing the state to be “kicked out” of spurious minima and to continue
searching for attractors. Being an extension of the magnetic array, this physics property
also has the qualities of being discrete and binary, occurring at the atomic level, and
involving energy as an electrostatic force.

Figure 3: An energy function whose spurious minima are reduced through temperature.
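A sketch of how this stochastic rule and a temperature-decay (annealing) schedule might
look in code; the geometric decay schedule and all parameter values here are my
assumptions, not prescriptions from the sources.

```python
import numpy as np

def stochastic_sweep(w, s, beta, rng):
    """One asynchronous sweep of the rule above: unit i takes the value
    +1 with probability F_beta(h_i) = 1/(1 + exp(-2*beta*h_i)).
    As beta -> infinity this recovers the deterministic sign rule;
    as beta -> 0 every unit becomes a fair coin flip."""
    s = s.copy()
    for i in rng.permutation(len(s)):
        h = w[i] @ s
        s[i] = 1 if rng.random() < 1.0 / (1.0 + np.exp(-2.0 * beta * h)) else -1
    return s

def anneal(w, s, t_start=2.0, t_end=0.05, sweeps=200, seed=0):
    """Decay the temperature geometrically: start hot enough for noise
    to kick the state out of shallow spurious minima, then cool toward
    the deterministic rule so the state settles into a deep minimum."""
    rng = np.random.default_rng(seed)
    decay = (t_end / t_start) ** (1.0 / sweeps)
    t = t_start
    for _ in range(sweeps):
        s = stochastic_sweep(w, s, beta=1.0 / t, rng=rng)
        t *= decay
    return s
```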

Mean Field Theory
        Mean field theory is a powerful yet simple concept: the many spin states that act
on an array element are aggregated into a single average spin state (the mean field), and
that single state is used in place of all of them. This greatly simplifies the problem and
offers insight into the average spin state ⟨S_i⟩ of an array element.

Figure 4: Predictability of the spin state ⟨S⟩ with respect to temperature T; ⟨S⟩ stays near
1 at low temperature and falls to zero at T_c.
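The mean-field picture can be demonstrated in a few lines. Under the usual ferromagnetic
simplification (uniform coupling scaled to 1, an assumption of mine rather than anything
in the paper), the average spin obeys the self-consistency equation ⟨S⟩ = tanh(β⟨S⟩),
which a fixed-point iteration solves directly:

```python
import numpy as np

def mean_spin(beta, iters=2000):
    """Solve the mean-field self-consistency equation m = tanh(beta*m)
    by fixed-point iteration, starting from a nearly ordered state."""
    m = 0.9
    for _ in range(iters):
        m = np.tanh(beta * m)
    return m

# <S> stays near 1 at low T and collapses to 0 above Tc = 1,
# reproducing the sharp transition sketched in figure 4.
for t in [0.2, 0.5, 0.8, 0.95, 1.05, 1.5]:
    print(f"T = {t:>4}: <S> = {mean_spin(1.0 / t):.3f}")
```

The same abrupt vanishing of ⟨S⟩ above T_c is what produces the sharp retrieval
transition discussed below.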

        The transition of mean field theory to machine learning is trivial: having closely
aligned the math with the magnetic array example, mean field theory applies directly.
Further, the complexity of finding the minima of an energy function grows exponentially
with the number of paths, such that many interesting problems are intractable simply
because of the number of paths their graphs contain. Mean field theory makes such
intractable problems manageable and greatly extends the use of Boltzmann machines
[Tanaka], Gibbs free energy, and belief propagation [Csato].
        On a conceptual level, mean field theory offers an unexpected insight into the
impact of noise on pattern recognition. Just as temperature in the physics model creates
a sharp change in behavior at a particular noise level, noise in the machine learning
model produces a similar phase transition with respect to the number of correctly
labeled bits. “One might have assumed naively that the behavior would change smoothly
as T was varied, but in a large system this is often not the case. . . . In the present context
[mean field theory] says that a large network abruptly cease[s] to function at all if a
certain noise level is exceeded.” [Hertz] As shown in figure 5, when the critical
temperature of a stochastic network is reached, an input pattern offers no more insight
than random guessing.

Figure 5: The number of correct bits retrieved in a pattern (N_correct) with respect to
temperature; retrieval collapses to chance at T_c.
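This collapse is easy to reproduce numerically. The following Monte Carlo sketch is my
own construction (the single stored pattern, the 1/n Hebbian scaling, the network size,
and the sweep counts are all assumptions): it stores one pattern in a stochastic Hopfield
network, runs the dynamics at several temperatures, and reports the fraction of correctly
retrieved bits, which drops abruptly toward chance (0.5) near T_c = 1.

```python
import numpy as np

def fraction_correct(t, n=500, sweeps=50, seed=0):
    """Store one pattern via the Hebb rule (scaled by 1/n), run the
    stochastic dynamics at temperature t starting from the pattern,
    and return the fraction of bits still matching it."""
    rng = np.random.default_rng(seed)
    p = rng.choice([-1, 1], size=n)
    w = np.outer(p, p) / n
    np.fill_diagonal(w, 0.0)
    s, beta = p.copy(), 1.0 / t
    for _ in range(sweeps):
        for i in rng.permutation(n):
            h = w[i] @ s
            s[i] = 1 if rng.random() < 1.0 / (1.0 + np.exp(-2.0 * beta * h)) else -1
    return float(np.mean(s == p))

for t in [0.5, 0.8, 0.95, 1.05, 1.2]:
    print(f"T = {t:>4}: fraction correct = {fraction_correct(t):.2f}")
```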


Momentum
        Momentum, in the physics sense, is the product of mass and velocity. When a
bowling ball is thrown down a lane and released from the player’s hand, it is no longer
acted on by external forces (assuming a frictionless vacuum), yet it continues to move
forward. It is momentum that allows the ball to move in the absence of a maintained
force.
        In machine learning, momentum refers to a damping effect placed on weight
changes in back-propagation algorithms. The momentum parameter (α) appears in the
following equation for the change in weights [Hertz, 123]:

    Δw_pq(t+1) = -η (∂E/∂w_pq) + α Δw_pq(t)

This momentum acts as an averaging force, compelling each new weight change to
coincide with the previous ones.
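As a sketch of how this update behaves (the learning rate η and momentum α values are
illustrative choices of mine, not values from [Hertz]), consider gradient descent with
momentum on a toy quadratic error surface:

```python
import numpy as np

def momentum_step(grad_fn, w, dw_prev, eta=0.02, alpha=0.9):
    """dw(t+1) = -eta * dE/dw + alpha * dw(t): the new weight change is
    the downhill step plus a fraction of the previous change, so the
    updates behave like an average of recent gradients."""
    dw = -eta * grad_fn(w) + alpha * dw_prev
    return w + dw, dw

# Toy error surface E(w) = 1/2 w^T A w with a poorly conditioned A,
# where momentum damps the oscillation along the steep direction.
A = np.diag([1.0, 25.0])
grad_fn = lambda w: A @ w
w, dw = np.array([1.0, 1.0]), np.zeros(2)
for _ in range(100):
    w, dw = momentum_step(grad_fn, w, dw)
print(w)   # close to the minimum at the origin
```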
        I include momentum in this paper for several reasons. First, it is a physics concept
and as such clearly falls into the category of topics under investigation. Second, it deals
with energy, so it coincides with the physics topics discussed so far. Third, it differs from
the previous topics in that it involves large objects (like a car, or a ball rolling to a stop),
unlike the atomic scope of the previous topics. Finally, it offers a counter-example to the
trends I identify as consistent among physics properties used in machine learning:
momentum is not discrete, it is not binary, and it is not concerned with energy in an
electrostatic sense. It is my suspicion that machine learning comes to the term
momentum not through its mathematical similarity to physical momentum, but through
the common usage of the word. In common usage, momentum means impetus, a driving
force, which is very different from the physics term.


Conclusion
        Having recognized a high prevalence of physics metaphors in machine learning, I
entered into this paper with two goals. The first was to review the examples of physics
math in machine learning with the belief that there exists some extension of these physics
principles that applies to machine learning but has not been explored by the machine
learning community.
        I made no discovery extending machine learning based on the physics metaphors
that machine learning employs; nonetheless, I believe that such extensions exist. These
metaphors have been revealed in the past and I am confident they will continue to be
revealed in the future. I did not discover any extension because, in this endeavor, my
reach exceeded my grasp. It was a little naive (and perhaps a little arrogant) to think that
I could review the literature for a few weeks and identify a pattern that had been missed
by a community that has been investigating the topic since its inception. However, and I
cannot state this strongly enough, I do not mean to imply that absence of proof is proof
of absence. I remain confident in the premise of my search: that the math shared by
machine learning and physics represents a “hidden truth”, that both are linked by a truth
regarding the way in which information is transferred and processed, and that until such
a truth is revealed, the more mature science of physics will offer many insights into the
relatively new endeavor of machine learning.
        My second goal for this paper was to look at the physics math that is applied to
machine learning and draw a conclusion about which aspects of physics are most likely
to offer insight into fruitful future research in machine learning. Obviously, not all
physics knowledge applies to machine learning, but by identifying the qualities of the
physics topics that have proven most applicable, we can extrapolate which physics topics
are more likely to offer insights in the future. My conclusions in this area are far from
earth-shattering. For example, the idea that physics representations of objects at their
smallest level, where events are discrete and binary, are most applicable to the discrete
and binary questions in machine learning is a very reasonable, perhaps even intuitive,
notion.
        Quantum mechanics deals with quantized information, at the atomic level, with a
high propensity to exhibit binary behavior. By and large, the physics that applies to
machine learning shares these qualities. This alone suggests that research in this direction
is worth considering. Add to that the fact that quantum mechanics’ narrow, non-intuitive
nature places it outside the sphere of interest of most computer scientists, and it becomes
likely that machine learning, viewed through the lens of quantum math, is an
underdeveloped topic. I feel confident in suggesting that this is an area worthy of greater
investigation, and I count that as a successful result for the second goal of this paper.
References


Hertz, J., Krogh, A., and Palmer, R. G. Introduction to the Theory of Neural Computation.
Addison-Wesley, Redwood City, CA, 1991.


Reif, F. Fundamentals of Statistical and Thermal Physics. McGraw-Hill, 1985.


Dayan, P., Hinton, G. E., Neal, R. M., and Zemel, R. S. “The Helmholtz machine.”
Neural Computation, vol. 7, no. 5, Sept. 1995, pp. 889–904.


Csató, L., Opper, M., and Winther, O. “TAP Gibbs free energy, belief propagation and
sparsity.” Advances in Neural Information Processing Systems 14, 2002, pp. 657–664.


Tanaka, T. “Mean-field theory of Boltzmann machine learning.” Physical Review E,
vol. 58, no. 2, Aug. 1998, pp. 2302–2310.


Kivinen, J., and Warmuth, M. K. “Boosting as entropy projection.” Proceedings of the
Twelfth Annual Conference on Computational Learning Theory, ACM, 1999, pp. 134–144.


Galland, C. C. “The limitations of deterministic Boltzmann machine learning.” Network:
Computation in Neural Systems, vol. 4, no. 3, Aug. 1993, pp. 355–379.


LeCun, Y., and Huang, F. J. “Loss functions for discriminative training of energy-based
models.” Proceedings of the Tenth International Workshop on Artificial Intelligence and
Statistics (AIStats 2005), 2005.
