Neural networks...



Published in: Education, Technology


  1. 1. Neural Networks (TEC-833), B.Tech (EC – VIII Sem) – Spring 2012 9997756323
  2. 2. Mid Term Syllabus• Introduction: – Brain and Machine, Biological neurons and its mathematical model, Artificial Neural Networks, Benefits and Applications, Architectures, Learning Process (paradigms and algorithms), Correlation Matrix Memory, Adaptation.• Supervised learning – I: – Pattern space and weight space, Linearly and non-linearly separable classes, decision boundary, Hebbian learning and limitation, Perceptron, Perceptron convergence theorem, Logic Functions implementations• LMS Algorithm: – LMS Algorithm,• Supervised Learning – II: – Multilayer Perceptrons, XOR problem,
  3. 3. End Sem Syllabus• Introduction: Brain and Machine, Biological neurons and its mathematical model, Artificial Neural Networks, Benefits and Applications, Architectures, Learning Process (paradigms and algorithms), Correlation Matrix Memory, Adaptation.• Supervised learning – I: Pattern space and weight space, Linearly and non-linearly separable classes, decision boundary, Hebbian learning and limitation, Perceptron, Perceptron convergence theorem, Logic Functions implementations• LMS Algorithm: Wiener-Hopf equations, Steepest Descent search method, LMS Algorithm, Convergence consideration in mean and mean square, Adaline, Learning curve, Learning rate annealing schedules• Supervised Learning – II: Multilayer Perceptrons, Backpropagation algorithm, XOR problem, Training modes, Optimum learning, Local minima, Network pruning techniques• Unsupervised learning: Clustering, Hamming networks, Maxnet, Simple competitive learning, Winner-take-all networks, Learning Vector Quantizers, Counterpropagation Networks, Self Organizing Maps (Kohonen Networks), Adaptive Resonance Theory• Associative Models: Hopfield Networks (Discrete and Continuous), Storage Capacity, Energy function and minimization, Brain-state-in-a-box neural network• Applications of ANN and MATLAB Simulation: Character Recognition, Control Applications, Data Compression, Self Organizing Semantic Maps
  4. 4. References• Neural Networks: A Comprehensive Foundation – Simon Haykin (Pearson Education)• Neural Networks: A Classroom Approach – Satish Kumar (Tata McGraw Hill)• Fundamentals of Neural Networks – Laurene Fausett (Pearson Education)• MATLAB neural network toolbox and related help notes
  5. 5. Inputs to Neural Networks• Biology• Graph Theory• Algorithms• Artificial Intelligence• Control Systems• Signal Theory
  6. 6. Minsky’s challenge (adapted from Minsky, Singh and Sloman (2004))• Figure: reasoning methods arranged by number of causes (few to many) against number of effects (few, ordinary, many)• Labels in the figure: Symbolic Logical Reasoning; Case Based Reasoning; Intractable; Ordinary Qualitative Reasoning; Classical AI; Analogy Based Reasoning; Easy; Linear, Statistical; Connectionist, Neural Network, Fuzzy Logic
  7. 7. Who uses Neural Networks (Area: Use)• Computer Scientists: To understand properties of non-symbolic information processing; learning systems• Engineers: In many areas including signal processing and automatic control• Statisticians: As flexible, non-linear regression and classification models• Physicists: To model phenomena in statistical mechanics and other tasks• Cognitive Scientists: To describe models of thinking and consciousness and other high level brain functions• Neuro-physiologists: To describe and explore memory, sensory functions, motor functions and other mid-level brain functions• Biologists: To interpret nucleotide sequences• Philosophers etc.: For their own reasons
  8. 8. Brain vs the computer• Brain: Brains are analogue (neuronal firing rate, asynchronous, leakiness) – Computer: Computers are digital• Brain: Brain uses content-addressable memory – Computer: Computers use byte addressable memory• Brain: Brain is a massively parallel machine – Computer: Computers are modular and serial• Brain: Processing speed is not fixed in the brain; there is no system clock – Computer: Processing speed is fixed; there is a system clock• Brain: Short-term memory only holds pointers to long term memory – Computer: RAM has isomorphic data• Brain: No hardware/software distinction can be made with respect to the brain or mind – Computer: Computers have a clear distinction between hardware and software• Brain: Synapses are far more complex than electrical logic gates – Computer: Electrical gates are simpler in function and mechanism• Brain: Processing and memory are performed by the same components in the brain – Computer: Processing and memory are performed by different components in the computer• Brain: The brain is a self-organizing system – Computer: Computers are usually not self organizing• Brain: Brains have bodies and use them – Computer: Computers do not usually use their bodies• Brain: The brain capacity is much larger than any computer – Computer: Computer capacities though large are still not comparable with those of the brain
  9. 9. Neuro products and application areas• Academia Research • Market Segmentation• Automotive Industry • Medical Diagnosis• Bio Informatics • Meteorological Research• Cancer Detection • Optical Character Recognition• Computer Gaming • Pattern Recognition• Credit Ratings • Predicting Business Expenses• Drug Interaction Prediction • Real Estate Evaluations• Electrical Load Balancing • Robotics• Financial Forecasting • Sales Forecasting• Fraud Detection • Search Engines• Human Resources • Software Security• Image Recognition • Speech Recognition• Industrial Plant Modeling • Sports Betting• Machine Control • Sports Handicap Predictions• Machine Diagnostics
  10. 10. Applications of ANN (Sample Examples)• Non linear statistical data modeling tools• Function Approximation/ Mapping• Pattern recognition in data• Noise Cancellation (LMS) in signaling systems• Time Series Predictions• Control and Steering of Autonomous Vehicles (Feedforward)• Protein structure prediction and RNA splice junction identification• Sonar/radar/image/astronomy/handwriting target recognition/ classification• Call admission control for improving QOS in telecommunications (ATM) networks• Software engineering project management• Reinforcement Learning in Robotics (Backpropagation)• Pattern Completion (Hopfield)• Object recognition (Hopfield)• Clustering and Character Recognition (ART)• Neural Information Retrieval System(Machine parts retrieval at Boeing) (ART)• Neural Phonetic Typewriter (SOM)• Control of Robot Arms (SOM)• Vector Quantization (SOM)• Radar based classification (SOM)• Brain Modeling (SOM)• Feature mapping of language data (SOM)• Organization of massive document collection (SOM)
  11. 11. Neuroscience basics I• 100 B (10**11) neurons in brain• Each neuron has 10K (10**4) synapses on average• Thus 10**15 connections• A lifetime of 80 years is 2.5B seconds.
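
The arithmetic on the slide above can be checked in a few lines. This is only an illustrative sketch: the 365.25-day year and the derived "connections per second of life" figure are assumptions added here, not numbers taken from the slides.

```python
# Figures quoted on the slide
neurons = 10**11
synapses_per_neuron = 10**4
connections = neurons * synapses_per_neuron   # 10**15

# 80 years in seconds (365.25-day years is an assumption)
lifetime_seconds = 80 * 365.25 * 24 * 3600    # roughly 2.5 billion

# Derived, illustrative figure: average connections per second of life
connections_per_second = connections / lifetime_seconds
```
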
  12. 12. Structural organization of levels in brain Central Nervous System Interregional Circuits (Systems) Local Circuits (Maps/Networks) Neurons Dendritic Trees Neural microcircuits Synapses Molecules
  13. 13. Structural organization of levels in brain (Churchland)
  14. 14. Neuroscience Basics II• Brain structures – Cerebrum • Frontal Lobe • Temporal Lobe • Parietal Lobe • Occipital Lobe • Central Sulcus • Sylvian fissure – Cerebellum – Brain Stem • Corpus callosum • Thalamus • Hypothalamus • Midbrain • Pons • Medulla
  15. 15. Brain Anatomy
  16. 16. Brain Areas
  17. 17. Homunculus
  18. 18. Nervous System
  19. 19. Sympathetic and Parasympathetic nerves
  20. 20. Neuron - I
  21. 21. Neuron - II
  22. 22. Sensory and Motor pathways
  23. 23. Wiring the Brain
  24. 24. Synapses
  25. 25. Neurotransmitters• Neurotransmitters are endogenous chemicals which transmit signals from a neuron to a target cell across a synapse.• Excitatory – Glutamate (memory storage) – Acetylcholine (neuromuscular junction) – Dopamine (brain reward system) – Norepinephrine – Serotonin (regulation of appetite, sleep, memory, learning, mood, behaviour) – Substance P• Inhibitory – Gamma Amino Butyric Acid (GABA) (brain) – Glycine (spinal cord)
  26. 26. Action Potential
  27. 27. Action Potential
  28. 28. Types of Neural Networks• Based on learning algorithms: Supervised and Unsupervised• Associativity in supervised learning: Auto Associative and Hetero Associative• Based on network topology: Feed forward and feedback / recurrent• Based on kind of data accepted: Categorical variables, Quantitative variables• Based on transfer function used: Linear, Non-linear• Based on number of layers: Single Layer, Multilayer
  29. 29. ANN Architecture Taxonomy• Supervised, feed forward – Linear: Hebbian, Perceptron, Adaline, Higher Order, Functional Link – MLP (Multilayer Perceptron): Back Propagation, Cascade Correlation, Quick Prop, RPROP – RBF Networks: Orthogonal Least Squares – CMAC: Cerebellar Model Articulation Controller – Classification only: LVQ (Learning Vector Quantization), PNN (Probabilistic Neural Network) – Regression only: GNN (General Regression Neural Network)• Supervised, feedback – BAM (Bidirectional Associative Memory) – Boltzmann Machine – Recurrent time series: Back Propagation through time, Elman, FIR, Jordan, Real time recurrent network, Recurrent Back propagation, TDNN (Time Delay Neural Nets)• Supervised, competitive – ARTMAP, Fuzzy ARTMAP, Gaussian ARTMAP, Counter propagation, Neocognitron• Unsupervised, competitive – Vector Quantization: Grossberg, Kohonen, Conscience – Self Organizing Map: Kohonen, GTM, Local Linear – Adaptive Resonance Theory: ART1, ART2, ART2A, ART3, Fuzzy ART – DCL (Differential Competitive Learning)• Unsupervised, dimension reduction – Hebbian, Oja, Sanger, Differential Hebbian• Unsupervised, auto association – Linear Auto Associator, BSB (Brain State in a Box), Hopfield• Non learning – Hopfield, various networks for optimization
  30. 30. Learning Rules• Error correction learning• Memory based learning• Hebbian learning• Competitive learning• Boltzmann learning
  31. 31. Error Correction Learning• Error signal: $e_k(n) = d_k(n) - y_k(n)$• Control mechanism to apply a series of corrective adjustments• Index of performance or instantaneous value of error energy: $E(n) = \frac{1}{2} e_k^2(n)$• Delta rule or Widrow-Hoff rule – Thus $\Delta w_{kj}(n) = \eta\, e_k(n)\, x_j(n)$• And $w_{kj}(n+1) = w_{kj}(n) + \Delta w_{kj}(n)$• Using unit delay operator: $w_{kj}(n) = z^{-1}[w_{kj}(n+1)]$
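
The delta rule on the slide above can be sketched for a single neuron as follows. This is a minimal illustration, assuming a linear neuron (the slide does not fix the activation function); the function name `delta_rule_step` and the toy input, target and learning rate are hypothetical choices made for the example.

```python
def delta_rule_step(w, x, d, eta):
    """One Widrow-Hoff update for a single linear neuron k (assumed linear)."""
    y = sum(wj * xj for wj, xj in zip(w, x))               # actual response y_k(n)
    e = d - y                                              # error signal e_k(n)
    w_next = [wj + eta * e * xj for wj, xj in zip(w, x)]   # w_kj(n+1)
    return w_next, e

# Toy data (assumed): one input pattern presented repeatedly
w = [0.0, 0.0]
x, d = [1.0, 2.0], 1.0
errors = []
for n in range(20):
    w, e = delta_rule_step(w, x, d, eta=0.1)
    errors.append(0.5 * e * e)   # instantaneous error energy E(n)
```

With these values the error energy shrinks at every step, which is the corrective behaviour the slide describes; a learning rate that is too large would make the same loop diverge.
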
  32. 32. Euclidean Distance• Ordinary distance between two points that can be measured with a ruler.• In multi dimensional case it is the distance between two vectors.
  33. 33. Memory based learning• Binary pattern classification: – with input-output pairs $\{(x_i, d_i)\}_{i=1}^{N}$• Nearest Neighbor Rule – $x_{N'} \in \{x_1, x_2, \dots, x_N\}$ is the nearest neighbor of $x_{test}$ – if $\min_i d(x_i, x_{test}) = d(x_{N'}, x_{test})$ – where $d(x_i, x_{test})$ is the Euclidean distance between the vectors $x_i$ and $x_{test}$; $x_{test}$ is then assigned the class of $x_{N'}$• Cover and Hart (1967): nearest neighbor rule for pattern classification. Assumptions are: – The classified examples $(x_i, d_i)$ are independently and identically distributed according to the joint probability distribution of the example – The sample size $N$ is infinitely large. Then the probability of classification error is bounded above by twice the Bayes probability of error, the minimum probability of error over all decision rules.• Radial basis function network for curve fitting (approximation problem in higher dimensional space)
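
The nearest neighbor rule, together with the Euclidean distance it relies on, can be sketched as below. The helper names and the four stored examples are hypothetical and chosen only to illustrate the rule.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def nearest_neighbor_classify(examples, x_test):
    """examples: list of (x_i, d_i) pairs. Returns the label d of the closest x_i."""
    x_near, d_near = min(examples, key=lambda pair: euclidean(pair[0], x_test))
    return d_near

# Toy stored examples (assumed): two points per class
examples = [([0.0, 0.0], 0), ([0.1, 0.2], 0), ([1.0, 1.0], 1), ([0.9, 1.2], 1)]
label = nearest_neighbor_classify(examples, [0.8, 0.9])
```
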
  34. 34. Hebbian Learning• Repeated or persistent firing changes synaptic weight due to increased efficiency• Associative learning at cellular level – Time dependent mechanism – Local mechanism – Interactive mechanism – Conjunctional or correlational mechanism – Here $\Delta w_{kj}(n) = F(y_k(n), x_j(n))$ – Hebb’s hypothesis: $\Delta w_{kj}(n) = \eta\, y_k(n)\, x_j(n)$ – Covariance hypothesis: $\Delta w_{kj}(n) = \eta\, (y_k(n) - \bar{y})(x_j(n) - \bar{x})$, where $\bar{x}$ and $\bar{y}$ are the average values of the input and output signals• Synaptic modifications can be Hebbian, Anti-Hebbian, or non-Hebbian.• Evidence for Hebbian learning in the Hippocampus which plays an important role in learning and memory
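
The two update rules on the slide above can be compared in a short sketch. The function names, the per-input averages `x_av`, and the numbers are assumptions made for illustration; in practice the averages would be estimated over time.

```python
def hebb_update(w, x, y, eta):
    """Hebb's hypothesis: weight change proportional to y_k * x_j."""
    return [wj + eta * y * xj for wj, xj in zip(w, x)]

def covariance_update(w, x, y, x_av, y_av, eta):
    """Covariance hypothesis: uses deviations from the average activity."""
    return [wj + eta * (y - y_av) * (xj - xa) for wj, xj, xa in zip(w, x, x_av)]

# Plain Hebbian learning: repeated co-activity keeps increasing the weights
w_hebb = [0.0, 0.0]
for _ in range(3):
    w_hebb = hebb_update(w_hebb, [1.0, 0.5], 1.0, eta=0.1)

# Covariance rule: a below-average input paired with an above-average
# output lowers the weight, so weights can decrease as well as increase
w_cov = covariance_update([0.0, 0.0], [0.2, 0.5], 1.0,
                          x_av=[0.5, 0.5], y_av=0.5, eta=0.1)
```
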
  35. 35. Competitive Learning• The output neurons compete among themselves to become active• Elements of competitive learning rule (Rumelhart and Zipser (1985)) – A set of neurons that are all the same except for randomly distributed synaptic weights – A limit on the strength of each neuron – A winner-takes-all mechanism• Use as feature detectors• Has feed forward (excitatory) connections• Has lateral (inhibitory) connections• Here $\Delta w_{kj}(n) = \eta\,(x_j - w_{kj})$ if neuron $k$ wins, and $\Delta w_{kj}(n) = 0$ if neuron $k$ loses
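
One winner-takes-all update can be sketched as follows. Choosing the winner as the neuron with the largest inner product between its weight vector and the input is an assumption of this sketch (the slide only says that one neuron wins); the function name and the numbers are hypothetical.

```python
def competitive_step(weights, x, eta):
    """Update only the winning neuron's weights; returns the winner's index.

    weights: one weight vector per output neuron (modified in place).
    Winner selection by largest inner product is an assumption.
    """
    fields = [sum(wj * xj for wj, xj in zip(w, x)) for w in weights]
    winner = max(range(len(weights)), key=fields.__getitem__)
    weights[winner] = [wj + eta * (xj - wj)
                       for wj, xj in zip(weights[winner], x)]
    return winner

weights = [[1.0, 0.0], [0.0, 1.0]]
winner = competitive_step(weights, [0.8, 0.6], eta=0.5)
```

The winning weight vector moves part of the way toward the input pattern, while the losing neuron's weights stay untouched.
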
  36. 36. Boltzmann Learning• Stochastic model of a neuron – $x = +1$ with probability $P(v)$ – $x = -1$ with probability $1 - P(v)$ – $P(v) = 1/(1 + \exp(-v/T))$ – $T$ is a pseudo temperature used to control the uncertainty in firing (noise level)• Stochastic learning algorithm rooted in statistical mechanics• Neurons in recurrent structure• Operate in binary manner• Energy function – Here $E = -\frac{1}{2} \sum_{j} \sum_{k, k \ne j} w_{kj} x_k x_j$• Flip a random neuron from state $x_k$ to state $-x_k$ at some temperature $T$ with probability• $P(x_k \to -x_k) = 1/(1 + \exp(-\Delta E_k/T))$
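
The stochastic neuron model and the energy function above can be sketched as below. The function names are hypothetical, the sum in `energy` skips self-connections as an assumption, and the seeded random generator is used only to make the illustration repeatable.

```python
import math
import random

def firing_probability(v, T):
    """P(v) = 1 / (1 + exp(-v / T)); T is the pseudo temperature."""
    return 1.0 / (1.0 + math.exp(-v / T))

def stochastic_neuron(v, T, rng):
    """Returns +1 with probability P(v), otherwise -1."""
    return 1 if rng.random() < firing_probability(v, T) else -1

def energy(w, x):
    """E = -1/2 * sum over j != k of w[k][j] * x[k] * x[j] (no self-connections)."""
    n = len(x)
    return -0.5 * sum(w[k][j] * x[k] * x[j]
                      for k in range(n) for j in range(n) if j != k)

rng = random.Random(0)
samples = [stochastic_neuron(0.0, 1.0, rng) for _ in range(1000)]
fraction_on = samples.count(1) / len(samples)   # about one half when v = 0

w = [[0.0, 1.0], [1.0, 0.0]]
e_aligned = energy(w, [1, 1])     # states agree with the positive weight
e_opposed = energy(w, [1, -1])    # states disagree
```

Lowering `T` makes the neuron nearly deterministic, while raising it pushes the firing probability toward one half regardless of `v`.
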
  37. 37. Credit Assignment Problem in Distributed Systems• Assignment of credit or blame for overall outcome to internal decisions• Credit assignment problem has two parts: – Temporal Credit Assignment Problem – Structural Credit Assignment Problem• Credit Assignment problem becomes more complex in multilayer feed forward neural nets.
  38. 38. Supervised Learning• Knowledge is represented by a series of input-output examples• Environment provides training vector to both teacher and Neural Network• Teacher or Trainer provides Desired response• Neural Network provides Actual response• Error Signal = Desired response – Actual response• Adjustment is carried out iteratively to make the neural network emulate the teacher.• The mean square error function can be visualized as a multidimensional error-performance surface with the free parameters as coordinates.• Identification of local or global minimum is done using steepest gradient descent method.
  39. 39. Reinforcement learning/ Neuro-dynamic Programming (Learning with a Critic)• Critic converts a primary reinforcement signal from environment to a heuristic reinforcement signal• system learns under delayed reinforcement after observation of temporal sequences• goal is to minimize the cumulative cost of actions over a sequence of steps• Problems: – No teacher to provide desired response – Learning machine must solve temporal credit assignment problem• Reinforcement learning is related to Dynamic Programming
  40. 40. Unsupervised Learning (Self Organized Learning)• No external teacher or critic• Provision for task independent measure of quality of learning• Free parameters are optimized with respect to that measure• Network becomes tuned to statistical regularities in data• It develops ability to form internal representations for encoding features of input and create new classes automatically• Competitive Learning rule is used for Unsupervised learning• Two layers: input layer and competitive layer
  41. 41. Learning Applications• Pattern Association• Pattern Recognition• Function Approximation• Control• Filtering• Beam forming
  42. 42. Pattern Association• Cognition uses association in distributed memory: – $x_k \to y_k$; key pattern -> memorized pattern – Two phases: • storage phase (training) • recall phase (noisy or distorted version of key pattern presented) • $y = y_j$ for $x = x_j$ (perfect recall) • $y \ne y_j$ for $x = x_j$ (error)• Two types: – Auto associative memory: • Output set of patterns is the same as the input set: $y_k = x_k$ • Used for pattern retrieval • Input and output spaces have the same dimensionality • Uses unsupervised learning – Hetero associative memory: • Output set of patterns is different from the input set: $y_k \ne x_k$ • Used in other pattern association tasks • Input and output spaces may or may not have the same dimensionality • Uses supervised learning
  43. 43. Pattern Recognition• Process whereby a received pattern is assigned to one of a prescribed number of classes (categories)• Two stages: – Training session – Presentation of new patterns• Patterns can be considered as points in multidimensional decision space (MDS)• MDS is divided into regions, each associated with a class• Decision boundaries are determined by the training process• Boundary definition is by a statistical mechanism due to variability within and between classes• Machine has two parts: – Feature Extraction (Unsupervised network) – Classification (Supervised network) – m-dimensional observation (data) space -> q-dimensional feature space -> r-dimensional decision space• Alternative approach: – A single multilayer feed forward network using a supervised learning algorithm – Feature extraction is done in the hidden layer
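  44. 44. Function Approximation• I/O mapping: $d = f(x)$• Function $f(\cdot)$ is unknown• A set of labeled examples is available – $T = \{(x_i, d_i)\}_{i=1}^{N}$• Goal: the mapping $F(\cdot)$ realized by the network should satisfy $\|F(x) - f(x)\| < \epsilon$ for all $x$, where $\epsilon$ is a small positive number• Used in – System model identification – Inverse system model identification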
  45. 45. Control• Reference signal is compared with feedback signal• Error signal e is fed to the neural network controller (NNC)• Output u of the NNC is fed to the plant as input• Plant output is y (part of which is sent as feedback)• Jacobian of the plant: $J = \{\partial y_k / \partial u_j\}$ (partial derivatives of plant outputs with respect to plant inputs)• Two approaches: – Indirect Learning – Direct Learning
  46. 46. Filtering• To extract information from noisy data• Filter used for: – Filtering (estimating the current value using data measured up to and including the current time) – Smoothing (estimating a value at an earlier time, so that data measured after that time can also be used) – Prediction (forecasting future data based on current and past data)• In filtering – Cocktail party problem – Blind signal separation – Here $x(n) = A\,u(n)$, where $A$ is the mixing matrix – Need a demixing matrix $W$ to recover the original signal• In prediction – Error correction learning – $x(n)$ provides the desired response and is used for training – A form of model building, where the network acts as the model – When prediction is non-linear, NNs are a powerful method because non-linear processing units can be used for its construction – However if the dynamic range of the time series is unknown, a linear output unit is the most reasonable choice
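
The role of the demixing matrix can be shown with a small sketch. It assumes the mixing matrix `A` is known and simply inverts it to show what `W` has to achieve; in genuine blind signal separation `A` is unknown and `W` must be learned from the observations alone. The helper names and the 2x2 numbers are hypothetical.

```python
def mat_vec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def inverse_2x2(A):
    """Inverse of a 2x2 matrix; raises ValueError if it is singular."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1.0, 0.5], [0.25, 1.0]]   # mixing matrix (assumed known for this sketch)
u = [2.0, -1.0]                 # original source signals at time n
x = mat_vec(A, u)               # observed mixture x(n) = A u(n)

W = inverse_2x2(A)              # ideal demixing matrix
u_recovered = mat_vec(W, x)     # recovers the sources
```
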
  47. 47. Beam forming• Spatial form of filtering• To provide attentional selectivity in the presence of noise• Used in radar and sonar systems• Detect and track a target of interest in the presence of receiver noise and interfering signals (e.g. from jammers)• Task is complicated by: – Target signal can be from an unknown direction – No prior information about interfering signals• Generalized Side Lobe Canceller (GSLC) consisting of: – Array of antenna elements: which samples the observed signals – A linear combiner: acts as a spatial filter and provides the desired response (i.e. for main lobe) – A signal blocking matrix: to cancel leakage from side lobes – A neural network : to accommodate variations in interfering signals• Neural network adjusts its free parameters and acts as an attentional neurocomputer.
  48. 48. Associative Memory• Memory is relatively enduring neural alterations induced by the interaction of an organism with its environment.• Activity must be stored in memory through a learning process• Memory may be short term or long term• Associative memory – Distributed – Stimulus (key) pattern and response (stored) pattern vectors – Information is stored in memory by setting up a spatial pattern of neural activities across a large number of neurons – Information in stimulus also contains storage location and address for retrieval – High degree of resistance to noise and damage of a diffusive kind – May be interactions between different patterns stored in memory and thus errors in recall process
  49. 49. Memory and noise• For a linear network: $y_k = W(k)\,x_k$• Total experience gained: $M = \sum_{k=1}^{q} W(k)$• Memory matrix recursion: $M_k = M_{k-1} + W(k)$, $k = 1, \dots, q$• Estimate of memory matrix: $\hat{M} = \sum_{k=1}^{q} y_k x_k^T$• Correlation matrix memory: $\hat{M} = Y X^T$• $X$ = key matrix; $Y$ = memorized matrix• Recall: $y = \hat{M} x_j$• With key vectors normalized to unit length, $y = y_j + v_j$, where the noise vector $v_j$ is due to cross talk between key vector $x_j$ and all other key vectors stored in memory• In a linear signal space, the cosine of the angle between vectors $x_k$ and $x_j$ is $\cos(x_k, x_j) = x_k^T x_j / (\|x_k\|\,\|x_j\|)$• Noise vector for unit-length keys: $v_j = \sum_{k=1, k \ne j}^{q} \cos(x_k, x_j)\, y_k$
  50. 50. Orthogonality, Community and Errors• The memory associates perfectly (noise vector is zero) when the key vectors are orthonormal, i.e. $x_k^T x_j = 1$ when $k = j$ and $0$ when $k \ne j$• If the key patterns are neither orthogonal nor highly separated, recall suffers from confusion and errors• The community of a set of patterns $\{x_{key}\}$ can be such that $x_k^T x_j \ge \gamma$ for $k \ne j$• If the lower bound $\gamma$ is large enough, the memory may fail to distinguish the response $y$ from that of any other key pattern contained in the set $\{x_{key}\}$
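
The recall behaviour of a correlation matrix memory, with and without orthonormal keys, can be sketched as below. The helper names and the stored patterns are hypothetical values chosen so that the cross-talk term is easy to check by hand.

```python
def outer(y, x):
    """Outer product y x^T as a list of rows."""
    return [[yi * xj for xj in x] for yi in y]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_vec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def build_memory(keys, values):
    """Memory matrix as the sum of outer products y_k x_k^T."""
    M = [[0.0] * len(keys[0]) for _ in values[0]]
    for x, y in zip(keys, values):
        M = mat_add(M, outer(y, x))
    return M

values = [[5.0, 1.0], [-2.0, 3.0]]

# Orthonormal keys: recall is perfect
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
recalled = mat_vec(build_memory(keys, values), keys[0])

# Unit-length but non-orthogonal keys: the second pattern leaks into recall
# in proportion to the cosine between the keys (0.6 here)
keys_close = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0]]
noisy = mat_vec(build_memory(keys_close, values), keys_close[0])
```

In the second case the recalled vector equals the stored pattern plus 0.6 times the other stored pattern, which is the cross-talk noise vector for this pair of keys.
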
  51. 51. Adaptation• Spatiotemporal nature of learning• Experience has a temporal structure for animals ranging from insects to humans, which lets an animal adapt its behavior• In a time-stationary environment – supervised learning is possible – synaptic weights can be frozen after learning – learning system relies on memory• In non-time-stationary environments – supervised learning is inadequate – network needs a way to track the statistical variations in the environment with time – desirable for the neural network to continually adapt its free parameters to respond in real time – this requires continuous learning – Linear adaptive filters perform continuous learning • Used in radar, sonar, communications, seismology, biomedical signal processing • In a mature state of development • Nonlinear adaptive filters: development not yet mature.
  52. 52. Pseudo stationary process• Neural network requires stable time for computation• How can it adapt to signals varying in time?• Many non stationary processes change slowly enough for the process to be considered pseudo stationary over a window of short enough duration. – Speech signal: 10 – 30 ms – Radar returns from ocean surface: few seconds – Long range weather forecasting: few minutes – Long range stock market trends: few days• Retrain network at regular intervals, dynamic approach – Select a window short enough for data to be considered pseudo stationary – Use the sampled data to train the network – Keep data samples in a FIFO, add new sample and drop oldest data sample – Use updated data window to retrain and repeat• Network undergoes continual training with time ordered examples• Non linear filter: a generalization of linear adaptive filters• Resources available must be fast enough to complete the computation in one sampling period.
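
The FIFO retraining loop described above can be sketched as follows. The function `sliding_window_retrain` is a hypothetical helper, and `mean_model` is only a stand-in for whatever training routine the network would actually use.

```python
from collections import deque

def sliding_window_retrain(samples, window_size, train):
    """Retrain on the most recent window_size samples each time one arrives."""
    window = deque(maxlen=window_size)   # FIFO: a full deque drops its oldest item
    models = []
    for s in samples:
        window.append(s)
        if len(window) == window_size:
            models.append(train(list(window)))
    return models

def mean_model(window):
    """Placeholder 'training' step: the model is just the window mean."""
    return sum(window) / len(window)

models = sliding_window_retrain([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3, mean_model)
```

Each retraining call has to finish before the next sample arrives, which is the timing constraint stated in the last bullet of the slide.
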
  53. 53. Rosenblatt’s perceptron• Type: feed forward• Neuron layers: 1 I/P, 1 O/P• Input value types: binary• Activation function: Hard Limiter• Learning method: Supervised• Learning Algorithm: Perceptron learning rule (error correction)• Used in: Simple logic operations; pattern classification
  54. 54. Perceptron weight updates
  55. 55. Perceptron
  56. 56. Perceptron Convergence Theorem• 1: Initialization: set $w(0) = 0$• 2: Activation: at time step $n$, activate the perceptron by applying continuous valued input vector $x(n)$ and desired response $d(n)$• 3: Computation of Actual Response: compute the actual response of the perceptron – $y(n) = \mathrm{sgn}(w^T(n)\,x(n))$• 4: Adaptation of weight vector: update the weight vector of the perceptron: – $w(n+1) = w(n) + \eta\,[d(n) - y(n)]\,x(n)$ – where $d(n) = +1$ if $x(n)$ belongs to class $C_1$ and $d(n) = -1$ if $x(n)$ belongs to class $C_2$• 5: Continuation: increment time step $n$ by one and go back to step 2
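
The steps above can be sketched as a short training loop. This is an illustration under stated assumptions: a constant +1 bias input is prepended to each pattern, `sgn(0)` is taken as +1, and the logical AND data set, the function names and the epoch limit are hypothetical choices.

```python
def sgn(v):
    """Hard limiter; sgn(0) = +1 is an assumption of this sketch."""
    return 1 if v >= 0 else -1

def train_perceptron(samples, eta=1.0, max_epochs=100):
    """samples: list of (x, d) with d in {+1, -1}; x includes a +1 bias input."""
    w = [0.0] * len(samples[0][0])                       # step 1: w(0) = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, d in samples:                             # step 2: present x(n), d(n)
            y = sgn(sum(wi * xi for wi, xi in zip(w, x)))    # step 3
            if y != d:
                mistakes += 1
            # step 4: the update is zero whenever y equals d
            w = [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]
        if mistakes == 0:                                # a full pass without errors
            break
    return w

# Logical AND: class C1 (+1) only when both inputs are 1
and_samples = [([1, 0, 0], -1), ([1, 0, 1], -1), ([1, 1, 0], -1), ([1, 1, 1], 1)]
w = train_perceptron(and_samples)
predictions = [sgn(sum(wi * xi for wi, xi in zip(w, x))) for x, _ in and_samples]
```

Because AND is linearly separable, the loop stops after a finite number of corrections; a non-separable problem such as XOR would instead run until `max_epochs`.
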
  57. 57. LMS Rule• Also known as: – Delta rule – Adaline rule – Widrow-Hoff rule
  58. 58. Neural Network Hardware• Hardware runs orders of magnitude faster than software• Two approaches: – General, but probably expensive, system that can be reprogrammed for many kinds of tasks • e.g. Adaptive Solutions CNAPS – Specialized but cheap chip to do one thing very quickly and efficiently. • e.g. IBM ZISC• Number of neurons varies from 10 to 10**6• Precision is mostly limited to 16 bit fixed point for weights and 8 bit fixed point for outputs• Recurrent NNs may require output of >16 bits• Performance is measured in – Number of multiply and accumulate operations in unit time (MCPS: millions of connections per second) – Rate of weight updates (MCUPS: millions of connection updates per second)
  59. 59. NN Hardware categories• Neurocomputers – Standard chips • Sequential + Accelerator • Multiprocessor – Neuro chips • Analog • Digital • Hybrid
  60. 60. Hardware Implementation (Accelerator Boards)• Accelerator boards – Most frequently used commercial neural hardware • Relatively cheap • Widely available • Simple to connect to PCs, workstations • Have user friendly software tools • However usually specialized for certain tasks and may lack flexibility – Based on neural network chips • IBM ZISC036: 36 neurons; RBF network; RCE (or ROI algorithm) • PCI card: 19 chips, 684 prototypes • Can process 165,000 patterns per second, where patterns are 64 8-bit element vectors. • SAIC Sigma-1 • Neuro Turbo • HNC – Some use just fast DSPs
  61. 61. Hardware Implementation (General Purpose Processors)• Neuro computers built from general purpose Processors – BSP400 – COKOS – RAP (Ring Array Processor) • Used for development of connectionist algorithms for speech recognition • 4 to 40 TMS320C20 DSPs • Connected via ring of Xilinx FPGAs • VME bus to connect to host computer • 57 MCPS in feed forward mode • 13.2 MCPS in back propagation training