Neural networks...

Published in: Education, Technology
1. 1. Neural Networks (TEC-833) B.Tech (EC – VIII Sem) – Spring 2012 dcpande@gmail.com 9997756323
2. 2. Mid Term Syllabus
• Introduction: Brain and Machine, Biological neurons and their mathematical model, Artificial Neural Networks, Benefits and Applications, Architectures, Learning Process (paradigms and algorithms), Correlation Matrix Memory, Adaptation.
• Supervised Learning – I: Pattern space and weight space, Linearly and non-linearly separable classes, Decision boundary, Hebbian learning and its limitation, Perceptron, Perceptron convergence theorem, Logic function implementations
• LMS Algorithm: LMS Algorithm
• Supervised Learning – II: Multilayer Perceptrons, XOR problem
3. 3. End Sem Syllabus
• Introduction: Brain and Machine, Biological neurons and their mathematical model, Artificial Neural Networks, Benefits and Applications, Architectures, Learning Process (paradigms and algorithms), Correlation Matrix Memory, Adaptation.
• Supervised Learning – I: Pattern space and weight space, Linearly and non-linearly separable classes, Decision boundary, Hebbian learning and its limitation, Perceptron, Perceptron convergence theorem, Logic function implementations
• LMS Algorithm: Wiener-Hopf equations, Steepest Descent search method, LMS Algorithm, Convergence considerations in the mean and mean square, Adaline, Learning curve, Learning rate annealing schedules
• Supervised Learning – II: Multilayer Perceptrons, Backpropagation algorithm, XOR problem, Training modes, Optimum learning, Local minima, Network pruning techniques
• Unsupervised Learning: Clustering, Hamming networks, Maxnet, Simple competitive learning, Winner-take-all networks, Learning Vector Quantizers, Counterpropagation Networks, Self Organizing Maps (Kohonen Networks), Adaptive Resonance Theory
• Associative Models: Hopfield Networks (Discrete and Continuous), Storage Capacity, Energy function and minimization, Brain-state-in-a-box neural network
• Applications of ANN and MATLAB Simulation: Character Recognition, Control Applications, Data Compression, Self Organizing Semantic Maps
4. 4. References
• Neural Networks: A Comprehensive Foundation – Simon Haykin (Pearson Education)
• Neural Networks: A Classroom Approach – Satish Kumar (Tata McGraw Hill)
• Fundamentals of Neural Networks – Laurene Fausett (Pearson Education)
• ftp://ftp.sas.com/pub/neural/FAQ.html
• MATLAB neural network toolbox and related help notes
5. 5. Inputs to Neural Networks• Biology• Graph Theory• Algorithms• Artificial Intelligence• Control Systems• Signal Theory
6. 6. Minsky’s challenge (adapted from Minsky, Singh and Sloman (2004))
• Chart: reasoning methods placed on a grid of Number of Causes (Few to Many) against Number of Effects (Few to Many)
• Labels on the chart: Easy; Linear, Statistical; Symbolic Logical Reasoning; Case Based Reasoning; Ordinary Qualitative Reasoning; Analogy Based Reasoning; Classical AI; Connectionist, Neural Network, Fuzzy Logic; Intractable
7. 7. Who uses Neural Networks (Area: Use)
• Computer Scientists: To understand properties of non-symbolic information processing; learning systems
• Engineers: In many areas including signal processing and automatic control
• Statisticians: As flexible, non-linear regression and classification models
• Physicists: To model phenomena in statistical mechanics and other tasks
• Cognitive Scientists: To describe models of thinking and consciousness and other high level brain functions
• Neuro-physiologists: To describe and explore memory, sensory functions, motor functions and other mid-level brain functions
• Biologists: To interpret nucleotide sequences
• Philosophers etc.: For their own reasons
8. 8. Brain vs the computer
Source: http://scienceblogs.com/developingintelligence/2007/03/why_the_brain_is_not_like_a_co.php
• Brain: Brains are analogue (neuronal firing rate, asynchronous, leakiness). Computer: Computers are digital.
• Brain: Brain uses content-addressable memory. Computer: Computers use byte addressable memory.
• Brain: Brain is a massively parallel machine. Computer: Computers are modular and serial.
• Brain: Processing speed is not fixed in the brain; there is no system clock. Computer: Processing speed is fixed; there is a system clock.
• Brain: Short-term memory only holds pointers to long term memory. Computer: RAM has isomorphic data.
• Brain: No hardware/software distinction can be made with respect to the brain or mind. Computer: Computers have a clear distinction between hardware and software.
• Brain: Synapses are far more complex than electrical logic gates. Computer: Electrical gates are simpler in function and mechanism.
• Brain: Processing and memory are performed by the same components in the brain. Computer: Processing and memory are performed by different components in the computer.
• Brain: The brain is a self-organizing system. Computer: Computers are usually not self organizing.
• Brain: Brains have bodies and use them. Computer: Computers do not usually use their bodies.
• Brain: The brain capacity is much larger than any computer. Computer: Computer capacities though large are still not comparable with those of the brain.
9. 9. Neuro products and application areas• Academia Research • Market Segmentation• Automotive Industry • Medical Diagnosis• Bio Informatics • Meteorological Research• Cancer Detection • Optical Character Recognition• Computer Gaming • Pattern Recognition• Credit Ratings • Predicting Business Expenses• Drug Interaction Prediction • Real Estate Evaluations• Electrical Load Balancing • Robotics• Financial Forecasting • Sales Forecasting• Fraud Detection • Search Engines• Human Resources • Software Security• Image Recognition • Speech Recognition• Industrial Plant Modeling • Sports Betting• Machine Control • Sports Handicap Predictions• Machine Diagnostics
10. 10. Applications of ANN (Sample Examples)• Non linear statistical data modeling tools• Function Approximation/ Mapping• Pattern recognition in data• Noise Cancellation (LMS) in signaling systems• Time Series Predictions• Control and Steering of Autonomous Vehicles (Feedforward)• Protein structure prediction and RNA splice junction identification• Sonar/radar/image/astronomy/handwriting target recognition/ classification• Call admission control for improving QOS in telecommunications (ATM) networks• Software engineering project management• Reinforcement Learning in Robotics (Backpropagation)• Pattern Completion (Hopfield)• Object recognition (Hopfield)• Clustering and Character Recognition (ART)• Neural Information Retrieval System(Machine parts retrieval at Boeing) (ART)• Neural Phonetic Typewriter (SOM)• Control of Robot Arms (SOM)• Vector Quantization (SOM)• Radar based classification (SOM)• Brain Modeling (SOM)• Feature mapping of language data (SOM)• Organization of massive document collection (SOM)
11. 11. Neuroscience basics I• 100 B (10**11) neurons in brain• Each neuron has 10K (10**4) synapses on average• Thus 10**15 connections• A lifetime of 80 years is 2.5B seconds.
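The two figures on this slide can be checked with a short calculation. The 365.25 days per year used below is an assumption for illustration; the slide gives only the rounded result.

```latex
\begin{align*}
10^{11}\ \text{neurons} \times 10^{4}\ \tfrac{\text{synapses}}{\text{neuron}} &= 10^{15}\ \text{connections} \\
80\ \text{years} \times 365.25\ \tfrac{\text{days}}{\text{year}} \times 86400\ \tfrac{\text{s}}{\text{day}} &\approx 2.52 \times 10^{9}\ \text{s} \approx 2.5\ \text{B seconds}
\end{align*}
```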
12. 12. Structural organization of levels in brain (top to bottom): Central Nervous System → Interregional Circuits (Systems) → Local Circuits (Maps/Networks) → Neurons → Dendritic Trees → Neural Microcircuits → Synapses → Molecules
13. 13. Structural organization of levels in brain (Churchland)
14. 14. Neuroscience Basics II• Brain structures – Cerebrum • Frontal Lobe • Temporal Lobe • Parietal Lobe • Occipital Lobe • Central Sulcus • Sylvian fissure – Cerebellum – Brain Stem • Corpus callosum • Thalamus • Hypothalamus • Midbrain • Pons • Medulla
15. 15. Brain Anatomy
16. 16. Brain Areas
17. 17. Homunculus
18. 18. Nervous System
19. 19. Sympathetic and Parasympathetic nerves
20. 20. Neuron - I
21. 21. Neuron - II
22. 22. Sensory and Motor pathways
23. 23. Wiring the Brain
25. 25. Neurotransmitters
• Neurotransmitters are endogenous chemicals which transmit signals from a neuron to a target cell across a synapse.
• Excitatory: Glutamate (memory storage), Acetylcholine (neuromuscular junction)
• Inhibitory: Gamma Amino Butyric Acid (GABA) (brain), Glycine (spinal cord)
• Also listed: Dopamine (brain reward system), Norepinephrine, Serotonin (regulation of appetite, sleep, memory, learning, mood, behaviour), Substance P
26. 26. Action Potential
27. 27. Action Potential
28. 28. Types of Neural Networks
• Based on learning algorithm: Supervised and Unsupervised
• Associativity in supervised learning: Auto Associative and Hetero Associative
• Based on network topology: Feed forward and feedback / recurrent
• Based on kind of data accepted: Categorical variables, Quantitative variables
• Based on transfer function used: Linear, Non-linear
• Based on number of layers: Single Layer, Multilayer
29. 29. ANN Architecture Taxonomy
• Supervised
  – Feed forward
    • Linear: Hebbian, Perceptron, Adaline, Higher Order, Functional Link
    • MLP (Multilayer Perceptron): Back Propagation, Cascade Correlation, Quick Prop, RPROP
    • RBF Networks: Orthogonal Least Squares
    • CMAC: Cerebellar Model Articulation Controller
    • Classification Only: LVQ (Learning Vector Quantization), PNN (Probabilistic Neural Network)
    • Regression Only: GRNN (General Regression Neural Network)
  – Feedback
    • BAM (Bidirectional Associative Memory)
    • Boltzmann Machine
    • Recurrent Time Series: Back Propagation through time, Elman, FIR, Jordan, Real time recurrent network, Recurrent Back propagation, TDNN (Time Delay Neural Nets)
  – Competitive: ARTMAP, Fuzzy ARTMAP, Gaussian ARTMAP, Counter propagation, Neocognitron
• Unsupervised
  – Competitive
    • Vector Quantization: Grossberg, Kohonen, Conscience
    • Self Organizing Map: Kohonen, GTM, Local Linear
    • Adaptive Resonance Theory: ART1, ART2, ART2A, ART3, Fuzzy ART
    • DCL (Differential Competitive Learning)
  – Dimension Reduction: Hebbian, Oja, Sanger, Differential Hebbian
  – Auto Association: Linear Auto Associator, BSB (Brain State in a Box), Hopfield
• Non learning: Hopfield, various networks for optimization
30. 30. Learning Rules• Error correction learning• Memory based learning• Hebbian learning• Competitive learning• Boltzmann learning
31. 31. Error Correction Learning
• Error signal: $e_k(n) = d_k(n) - y_k(n)$
• A control mechanism applies a sequence of corrective adjustments to the synaptic weights of neuron $k$
• Index of performance, the instantaneous value of the error energy: $E(n) = \frac{1}{2} e_k^2(n)$
• Delta rule or Widrow-Hoff rule: $\Delta w_{kj}(n) = \eta\, e_k(n)\, x_j(n)$
• Weight update: $w_{kj}(n+1) = w_{kj}(n) + \Delta w_{kj}(n)$
• Using the unit delay operator: $w_{kj}(n) = z^{-1}[w_{kj}(n+1)]$
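The delta rule above can be sketched in a few lines of Python for a single linear neuron. This is a minimal illustration, not the course's own code: the function name `delta_rule_step`, the learning rate 0.1, and the toy target d = 2·x1 − x2 are all assumptions chosen for the example.

```python
def delta_rule_step(weights, inputs, desired, eta=0.1):
    """One error-correction update for a single linear neuron k.

    Implements e = d - y and w_j <- w_j + eta * e * x_j.
    """
    y = sum(w * x for w, x in zip(weights, inputs))  # actual response y_k(n)
    e = desired - y                                  # error signal e_k(n)
    new_weights = [w + eta * e * x for w, x in zip(weights, inputs)]
    return new_weights, e


# Hypothetical training set generated by d = 2*x1 - 1*x2 (assumption).
samples = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]

w = [0.0, 0.0]
for _ in range(200):                 # repeated passes over the examples
    for x, d in samples:
        w, e = delta_rule_step(w, x, d, eta=0.1)

print(w)  # weights approach [2.0, -1.0]
```

Because the toy data are exactly linear, the error shrinks on every pass; with noisy data the weights would instead hover around the least-squares solution.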
32. 32. Euclidean Distance• Ordinary distance between two points that can be measured with a ruler.• In multi dimensional case it is the distance between two vectors.
33. 33. Memory based learning
• Binary pattern classification with input-output pairs $\{(x_i, d_i)\}_{i=1}^{N}$
• Nearest Neighbor Rule
  – $x_{N'} \in \{x_1, x_2, \dots, x_N\}$ is the nearest neighbor of $x_{test}$ if $\min_i d(x_i, x_{test}) = d(x_{N'}, x_{test})$
  – Here $d(x_i, x_{test})$ is the Euclidean distance between the vectors $x_i$ and $x_{test}$
  – $x_{test}$ is assigned the class of $x_{N'}$
• Cover and Hart (1967): nearest neighbor rule for pattern classification. Assumptions are:
  – The classified examples $(x_i, d_i)$ are independently and identically distributed according to the joint probability distribution of the example
  – The sample size N is infinitely large
  – Then the probability of classification error is bounded above by twice the Bayes probability of error, the minimum probability of error over all decision rules.
• Radial basis function network for curve fitting (approximation problem in higher dimensional space)
34. 34. Hebbian Learning
• Repeated or persistent firing changes synaptic weight due to increased efficiency
• Associative learning at cellular level
  – Time dependent mechanism
  – Local mechanism
  – Interactive mechanism
  – Conjunctional or correlational mechanism
  – General form: $\Delta w_{kj}(n) = F(y_k(n), x_j(n))$
  – Hebb’s hypothesis: $\Delta w_{kj}(n) = \eta\, y_k(n)\, x_j(n)$
  – Covariance hypothesis: $\Delta w_{kj}(n) = \eta\, (y_k(n) - \bar{y})(x_j(n) - \bar{x})$, where $\bar{x}$ and $\bar{y}$ are time-averaged values of the input and output signals
• Synaptic modifications can be Hebbian, Anti-Hebbian, or non-Hebbian.
• There is evidence for Hebbian learning in the hippocampus, which plays an important role in learning and memory
35. 35. Competitive Learning
• The output neurons compete among themselves to become active
• Elements of competitive learning rule (Rumelhart and Zipser (1985))
  – A set of neurons that are identical except for randomly distributed synaptic weights
  – Limit on strength of each neuron
  – Winner takes all mechanism
• Used as feature detectors
• Has feed forward (excitatory) connections
• Has lateral (inhibitory) connections
• Weight update: $\Delta w_{kj}(n) = \eta\, (x_j - w_{kj})$ if neuron $k$ wins, and $\Delta w_{kj}(n) = 0$ if neuron $k$ loses
36. 36. Boltzmann Learning
• Stochastic model of a neuron
  – $x = +1$ with probability $P(v)$
  – $x = -1$ with probability $1 - P(v)$
  – $P(v) = \dfrac{1}{1 + \exp(-v/T)}$
  – T is a pseudo temperature used to control the uncertainty in firing (noise level)
• Stochastic learning algorithm rooted in statistical mechanics
• Neurons in recurrent structure
• Operate in binary manner
• Energy function: $E = -\frac{1}{2} \sum_{j} \sum_{k,\, k \ne j} w_{kj}\, x_k\, x_j$
• Flip a random neuron from state $x_k$ to state $-x_k$ at some temperature T with probability $P(x_k \to -x_k) = \dfrac{1}{1 + \exp(-\Delta E_k / T)}$, where $\Delta E_k$ is the energy change resulting from the flip
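The nearest neighbor rule can be illustrated with a small Python sketch. The function names and the four stored examples are hypothetical; only the rule itself (assign the label of the stored example with the smallest Euclidean distance to the test vector) comes from the slide.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def nearest_neighbor_classify(examples, x_test):
    """Return the label d of the stored example (x, d) closest to x_test."""
    nearest_x, nearest_d = min(examples, key=lambda pair: euclidean(pair[0], x_test))
    return nearest_d


# Hypothetical stored examples (x_i, d_i) for two classes, 0 and 1.
examples = [
    ((0.0, 0.0), 0),
    ((0.1, 0.2), 0),
    ((1.0, 1.0), 1),
    ((0.9, 1.1), 1),
]

label = nearest_neighbor_classify(examples, (0.8, 0.9))
print(label)  # 1, because (0.8, 0.9) lies nearest to a class-1 example
```

Nothing is "trained" here: all of the computation happens at recall time, which is what makes this a memory based method.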
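As a rough illustration of the two hypotheses on this slide, the sketch below applies one update of each. The function names, the learning rate, and the use of a separate average per input line are assumptions made for the example (the slide writes a single average for the input).

```python
def hebb_update(w, x, y, eta=0.1):
    """Hebb's hypothesis: w_j <- w_j + eta * y * x_j."""
    return [wj + eta * y * xj for wj, xj in zip(w, x)]


def covariance_update(w, x, y, x_avg, y_avg, eta=0.1):
    """Covariance hypothesis: w_j <- w_j + eta * (y - y_avg) * (x_j - x_avg_j)."""
    return [wj + eta * (y - y_avg) * (xj - xa) for wj, xj, xa in zip(w, x, x_avg)]


# Active input and active output: the Hebbian weight can only grow.
w_hebb = hebb_update([0.0, 0.0], [1.0, 0.5], y=1.0)

# Output above its average but input below its average: the weight shrinks.
w_cov = covariance_update([0.0], [0.2], y=0.9, x_avg=[0.5], y_avg=0.5)

print(w_hebb, w_cov)
```

With non-negative signals the plain Hebbian update never decreases a weight, so repeated application grows without bound; the covariance form allows both strengthening and weakening, which is one way of reading the "limitation" mentioned in the syllabus.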
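A winner-takes-all update might look like the following sketch. The slide does not say how the winner is chosen; picking the neuron whose weight vector is closest to the input in Euclidean distance is an assumption here, as are the function name and the sample numbers.

```python
def competitive_step(weights, x, eta=0.5):
    """One winner-takes-all update.

    weights: list of weight vectors, one per output neuron.
    Winner selection by smallest squared Euclidean distance is an assumption.
    Only the winner moves: w_k <- w_k + eta * (x - w_k); losers are unchanged.
    """
    def sq_dist(w):
        return sum((xj - wj) ** 2 for xj, wj in zip(x, w))

    winner = min(range(len(weights)), key=lambda k: sq_dist(weights[k]))
    updated = [list(w) for w in weights]
    updated[winner] = [wj + eta * (xj - wj) for wj, xj in zip(weights[winner], x)]
    return updated, winner


weights = [[0.0, 0.0], [1.0, 1.0]]
new_weights, winner = competitive_step(weights, [0.9, 0.8], eta=0.5)
print(winner, new_weights)  # neuron 1 wins and moves halfway toward the input
```

Repeating this step over many inputs pulls each weight vector toward the centre of the cluster it keeps winning, which is why such neurons act as feature detectors.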
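The formulas on this slide translate directly into code. The sketch below is illustrative only: the function names, the two-neuron weight matrix, and the seeded random generator are assumptions, and the flip probability is evaluated for whatever energy change is passed in, without committing to a sign convention the slide does not spell out.

```python
import math
import random


def firing_probability(v, T):
    """P(v) = 1 / (1 + exp(-v / T)) for the stochastic neuron."""
    return 1.0 / (1.0 + math.exp(-v / T))


def sample_state(v, T, rng):
    """Return +1 with probability P(v), otherwise -1."""
    return 1 if rng.random() < firing_probability(v, T) else -1


def energy(w, x):
    """E = -1/2 * sum over j != k of w[k][j] * x[k] * x[j]."""
    n = len(x)
    return -0.5 * sum(
        w[k][j] * x[k] * x[j] for k in range(n) for j in range(n) if j != k
    )


def flip_probability(delta_e, T):
    """P(x_k -> -x_k) = 1 / (1 + exp(-delta_e / T))."""
    return 1.0 / (1.0 + math.exp(-delta_e / T))


# Hypothetical two-neuron machine with a symmetric weight of 1.
w = [[0.0, 1.0], [1.0, 0.0]]
print(energy(w, [1, 1]), energy(w, [1, -1]))

rng = random.Random(0)                 # seeded for repeatability
print(sample_state(0.5, T=1.0, rng=rng))
```

At high T both probabilities approach 1/2 (firing is nearly random); as T falls toward zero the neuron behaves almost deterministically.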
37. 37. Credit Assignment Problem in Distributed Systems• Assignment of credit or blame for overall outcome to internal decisions• Credit assignment problem has two parts: – Temporal Credit Assignment Problem – Structural Credit Assignment Problem• Credit Assignment problem becomes more complex in multilayer feed forward neural nets.
38. 38. Supervised Learning• Knowledge is represented by a series of input-output examples• Environment provides training vector to both teacher and Neural Network• Teacher or Trainer provides Desired response• Neural Network provides Actual response• Error Signal = Desired response – Actual response• Adjustment is carried out iteratively to make the neural network emulate the teacher.• The mean square error function can be visualized as a multidimensional error-performance surface with the free parameters as coordinates.• Identification of local or global minimum is done using steepest gradient descent method.
39. 39. Reinforcement learning/ Neuro-dynamic Programming (Learning with a Critic)• Critic converts a primary reinforcement signal from environment to a heuristic reinforcement signal• system learns under delayed reinforcement after observation of temporal sequences• goal is to minimize the cumulative cost of actions over a sequence of steps• Problems: – No teacher to provide desired response – Learning machine must solve temporal credit assignment problem• Reinforcement learning is related to Dynamic Programming
40. 40. Unsupervised Learning (Self Organized Learning)• No external teacher or critic• Provision for task independent measure of quality of learning• Free parameters are optimized with respect to that measure• Network becomes tuned to statistical regularities in data• It develops ability to form internal representations for encoding features of input and create new classes automatically• Competitive Learning rule is used for Unsupervised learning• Two layers: input layer and competitive layer
41. 41. Learning Applications• Pattern Association• Pattern Recognition• Function Approximation• Control• Filtering• Beam forming
42. 42. Pattern Association
• Cognition uses association in distributed memory:
  – $x_k \to y_k$; key pattern → memorized pattern
  – Two phases:
    • Storage phase (training)
    • Recall phase (a noisy or distorted version of a key pattern is presented)
    • For $x = x_j$: $y = y_j$ is perfect recall; $y \ne y_j$ is an error
• Two types:
  – Auto associative memory:
    • Output set of patterns is the same as the input set: $y_k = x_k$
    • Used for pattern retrieval
    • Input and output spaces have the same dimensionality
    • Uses unsupervised learning
  – Hetero associative memory:
    • Output set of patterns is different from the input set: $y_k \ne x_k$
    • Used in other pattern association tasks
    • Input and output spaces may or may not have the same dimensionality
    • Uses supervised learning
43. 43. Pattern Recognition
• Process whereby a received pattern is assigned to one of a prescribed number of classes (categories)
• Two stages:
  – Training session
  – New patterns
• Patterns can be considered as points in multidimensional decision space (MDS)
• MDS is divided into regions, each associated with a class
• Decision boundaries are determined by the training process
• The boundaries are constructed statistically because of the variability within and between classes
• Machine has two parts:
  – Feature Extraction (Unsupervised network)
  – Classification (Supervised network)
  – m-dimensional observation (data) space → q-dimensional feature space → r-dimensional decision space
• Alternative approach:
  – A single multilayer feed forward network using a supervised learning algorithm
  – Feature extraction is done in the hidden layer
44. 44. Function Approximation
• I/O mapping: $d = f(x)$
• Function $f(\cdot)$ is unknown
• A set of labeled examples is available: $T = \{(x_i, d_i)\}_{i=1}^{N}$
• Goal: find $F(\cdot)$ such that $\|F(x) - f(x)\| < \varepsilon$ for all $x$
• Used in
  – System model identification
  – Inverse system model identification
45. 45. Control
• Reference signal is compared with feedback signal
• Error signal e is fed to the neural network controller (NNC)
• Output u of the NNC is fed to the plant as input
• Plant output is y (part of which is sent as feedback)
• Jacobian of partial derivatives: $J = \{\partial y_k / \partial u_j\}$
• Two approaches:
  – Indirect Learning
  – Direct Learning
46. 46. Filtering
• To extract information from noisy data
• Filter used for:
  – Filtering (estimating the current value from data measured up to and including the current time)
  – Smoothing (estimating the value at a given time using data measured both up to and after that time)
  – Prediction (forecasting future data based on current and past data)
• In filtering
  – Cocktail party problem
  – Blind signal separation
  – Here $x(n) = A\, u(n)$, where A is the mixing matrix
  – Need a demixing matrix W to recover the original signal
• In prediction
  – Error correction learning
  – x(n) provides the desired response and is used for training
  – A form of model building, where the network acts as the model
  – When prediction is non-linear, NNs are a powerful method because non-linear processing units can be used for its construction
  – However, if the dynamic range of the time series is unknown, a linear output unit is the most reasonable choice
47. 47. Beam forming• Spatial form of filtering• To provide attentional selectivity in the presence of noise• Used in radar and sonar systems• Detect and track a target of interest in the presence of receiver noise and interfering signals (e.g. from jammers)• Task is complicated by: – Target signal can be from an unknown direction – No prior information about interfering signals• Generalized Side Lobe Canceller (GSLC) consisting of: – Array of antenna elements: which samples the observed signals – A linear combiner: acts as a spatial filter and provides the desired response (i.e. for main lobe) – A signal blocking matrix: to cancel leakage from side lobes – A neural network : to accommodate variations in interfering signals• Neural network adjusts its free parameters and acts as an attentional neurocomputer.
48. 48. Associative Memory• Memory is relatively enduring neural alterations induced by the interaction of an organism with its environment.• Activity must be stored in memory through a learning process• Memory may be short term or long term• Associative memory – Distributed – Stimulus (key) pattern and response (stored) pattern vectors – Information is stored in memory by setting up a spatial pattern of neural activities across a large number of neurons – Information in stimulus also contains storage location and address for retrieval – High degree of resistance to noise and damage of a diffusive kind – May be interactions between different patterns stored in memory and thus errors in recall process
49. 49. Memory and noise
• For a linear network: $y_k = W(k)\, x_k$
• Total experience gained: $M = \sum_{k=1}^{q} W(k)$
• Memory matrix built up recursively: $M_k = M_{k-1} + W(k)$, $k = 1, \dots, q$
• Estimate of the memory matrix: $\hat{M} = \sum_{k=1}^{q} y_k x_k^T$
• Correlation matrix memory: $\hat{M} = Y X^T$, where X = key matrix and Y = memorized matrix
• Recall: $y = \hat{M} x_j$
• For key vectors of unit length, $y = y_j + v_j$, where the noise vector $v_j$ is due to cross talk between the key vector $x_j$ and all other key vectors stored in memory
• For a linear signal space, the cosine of the angle between vectors $x_k$ and $x_j$ is $\cos(x_k, x_j) = \dfrac{x_k^T x_j}{\|x_k\|\, \|x_j\|}$
• Noise vector: $v_j = \sum_{k=1,\, k \ne j}^{q} \cos(x_k, x_j)\, y_k$
50. 50. Orthogonality, Community and Errors
• The memory associates perfectly (noise vector is zero) when the key vectors are orthonormal, i.e. $x_k^T x_j = 1$ when $k = j$ and $0$ when $k \ne j$
• If key patterns are not orthogonal or highly separated, this leads to confusion and errors
• The community of a set of patterns $\{x_{key}\}$ can be such that $x_k^T x_j \ge \gamma$ for $k \ne j$
• If the lower bound $\gamma$ is large enough, the memory may fail to distinguish the response y from that of any other key pattern contained in the set $\{x_{key}\}$
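The effect of orthogonality on recall can be seen in a small correlation matrix memory built from outer products. This is a sketch under stated assumptions: the helper names, the three-dimensional keys, and the two stored patterns are all made up for the illustration.

```python
def outer(y, x):
    """Outer product y x^T as a list of rows."""
    return [[yi * xj for xj in x] for yi in y]


def build_memory(keys, patterns):
    """Correlation matrix memory: sum over k of y_k x_k^T."""
    rows, cols = len(patterns[0]), len(keys[0])
    memory = [[0.0] * cols for _ in range(rows)]
    for x, y in zip(keys, patterns):
        term = outer(y, x)
        memory = [[m + t for m, t in zip(mrow, trow)] for mrow, trow in zip(memory, term)]
    return memory


def recall(memory, x):
    """y = M x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in memory]


patterns = [[1.0, 0.0], [0.0, 1.0]]          # memorized patterns y_1, y_2

# Orthonormal keys: recall is exact.
keys_orth = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
y_clean = recall(build_memory(keys_orth, patterns), keys_orth[0])

# Unit-length but non-orthogonal keys (cosine 0.6): cross talk appears.
keys_close = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0]]
y_noisy = recall(build_memory(keys_close, patterns), keys_close[0])

print(y_clean)  # [1.0, 0.0]
print(y_noisy)  # [1.0, 0.6] = y_1 + 0.6 * y_2
```

In the second case the extra 0.6 in the recalled vector is exactly the cosine between the two keys multiplied by the other stored pattern, matching the noise vector expression.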
51. 51. Adaptation
• Spatiotemporal nature of learning
• Experience has a temporal structure for all animals, from insects to humans, so an animal can adapt its behavior
• In time-stationary environments
  – supervised learning is possible
  – synaptic weights can be frozen after learning
  – learning system relies on memory
• In non-time-stationary environments
  – supervised learning is inadequate
  – network needs a way to track the statistical variations in the environment with time
  – desirable for neural network to continually adapt its free parameters to respond in real time
  – this requires continuous learning
  – Linear adaptive filters perform continuous learning
    • Used in radar, sonar, communications, seismology, biomedical signal processing
    • In a mature state of development
    • Nonlinear adaptive filters: development not yet mature.
52. 52. Pseudo stationary process
• Neural network requires stable time for computation
• How can it adapt to signals varying in time?
• Many non stationary processes change slowly enough for the process to be considered pseudo stationary over a window of short enough duration.
  – Speech signal: 10 – 30 ms
  – Radar returns from ocean surface: few seconds
  – Long range weather forecasting: few minutes
  – Long range stock market trends: few days
• Retrain network at regular intervals (dynamic approach)
  – Select a window short enough for data to be considered pseudo stationary
  – Use the sampled data to train the network
  – Keep data samples in a FIFO, add new sample and drop oldest data sample
  – Use updated data window to retrain and repeat
• Network undergoes continual training with time ordered examples
• Non linear filter: a generalization of linear adaptive filters
• Resources available must be fast enough to complete the computation in one sampling period.
53. 53. Rosenblatt’s perceptron
• Type: feed forward
• Neuron layers: 1 I/P, 1 O/P
• Input value types: binary
• Activation function: Hard Limiter
• Learning method: Supervised
• Learning Algorithm: Perceptron learning rule (error-correction learning)
• Used in: Simple logic operations; pattern classification
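The FIFO retraining loop described on this slide can be sketched with a bounded `collections.deque`. The "network" here is replaced by a stand-in (a one-weight least-squares predictor x(n) ≈ w·x(n−1)) purely to keep the example short; that stand-in, the window size, and the synthetic stream are all assumptions.

```python
from collections import deque


def fit_predictor(window):
    """Stand-in for network training: least-squares fit of x(n) ~ w * x(n-1)."""
    samples = list(window)
    num = sum(a * b for a, b in zip(samples[:-1], samples[1:]))
    den = sum(a * a for a in samples[:-1])
    return num / den if den else 0.0


def track(stream, window_size):
    """Retrain on a sliding window each time a new sample arrives."""
    window = deque(maxlen=window_size)   # FIFO: appending drops the oldest sample
    history = []
    for sample in stream:
        window.append(sample)
        if len(window) == window_size:
            history.append(fit_predictor(window))
    return history


# Synthetic non-stationary stream: growth by 1.1 per step, then decay by 0.9.
stream = [1.0]
for n in range(1, 40):
    ratio = 1.1 if n < 20 else 0.9
    stream.append(stream[-1] * ratio)

history = track(stream, window_size=5)
print(history[0], history[-1])  # close to 1.1 at the start, close to 0.9 at the end
```

The estimate follows the change in the stream because old samples leave the window; a window that is too long would blur the two regimes together, and one that is too short would make the fit noisy on real data.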
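A single-layer perceptron with a hard limiter can learn a simple logic function such as AND. The sketch below assumes the error-correction form of the perceptron update, a 0/1 output coding, zero initial weights, and a fixed number of passes; all names are illustrative.

```python
def hard_limiter(v):
    """Threshold activation: 1 if v >= 0, else 0."""
    return 1 if v >= 0 else 0


def predict(w, b, x):
    return hard_limiter(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_perceptron(samples, eta=1.0, epochs=20):
    """Supervised training: w <- w + eta * (d - y) * x, b <- b + eta * (d - y)."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, d in samples:
            e = d - predict(w, b, x)          # error is -1, 0 or +1
            w = [wj + eta * e * xj for wj, xj in zip(w, x)]
            b += eta * e
    return w, b


# Truth table of the AND function with binary inputs.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w, b = train_perceptron(and_samples)
outputs = [predict(w, b, x) for x, _ in and_samples]
print(outputs)  # [0, 0, 0, 1]
```

AND is linearly separable, so the perceptron convergence theorem guarantees that the updates stop after finitely many corrections; the same code would never settle on XOR, which is not linearly separable.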