B.Eng. Final Year Project Presentation

Design and implementation of a neural-network-based image compression engine, the Final Year Project of Jesu Joseph and Shibu Menon at Nanyang Technological University. The project received the best possible grade and strong accolades from the research center.


  1. A parallel architecture for image compression (2004)
     An FYP presentation by: Jesu Joseph, Shibu Menon
     (Every slide carries the navigation bar: Introduction | Algorithm | Architecture | Results | Conclusion | Q&A)
  2. Agenda
     - Introduction (Jesu Joseph)
     - Algorithm (Shibu Menon)
     - Architecture (Shibu Menon)
     - Results (Jesu Joseph)
     - Conclusion (Jesu Joseph)
  3. Introduction
  4. Why image compression?
     - Low-color displays
     - Real-time video compression and streaming
     - Storage savings
     - Applications include:
       - Digital camera displays
       - Device-to-device video streaming
       - Devices with limited storage
  5. Artificial neural networks
     - Inspired by the human brain
     - Human learning depends on:
       - Time
       - Quality of the brain
       - Complexity of the input
     - PC processing: INPUT >> SOFTWARE >> HARDWARE >> OUTPUT
  6. Artificial neural networks
     - Ease of upgrade
     - Self-correction
     - Self-improvement
     - Self-learning
  7. Our project
     - Use neural-network techniques to design and implement a stand-alone chip for image compression:
       - Self-learning
       - Self-improvement
       - Ease of upgrade
  8. Our project
     - Stage 1 - Algorithm:
       - Study the theoretical algorithm
       - Optimize the algorithm for real-time performance
       - Optimize the algorithm for ease of implementation
     - Stage 2 - Architecture:
       - Block-level architecture design
       - Component-level hardware design
       - Hardware coding (Verilog)
       - Design generation
  9. Our project
     - Stage 3 - Testing:
       - Simulation of individual modules with test benches
       - Data verification of individual modules
       - Result verification
     - Stage 4 - Implementation:
       - Synthesis
       - FPGA testing
  10. Basics of image compression
      (R,G,B) = (20,48,206)_10 = (14,30,CE)_16 = (00010100, 00110000, 11001110)_2
      - Normal format: 24 bits per pixel; 16,777,216 (16 million) possible colors
      - 512x512 image size: 512 x 512 x 3 bytes ≈ 800 kB
      - 16-color format: 4 bits per pixel
      - Image size in 16-color format: 512 x 512 x 0.5 bytes ≈ 130 kB
      - About 6:1 compression of image size and bandwidth
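The size arithmetic above can be checked with a short sketch (Python here, standing in for the slide's figures; the function name is illustrative):

```python
def image_bytes(width, height, bits_per_pixel):
    # Raw image size: one bits_per_pixel sample per pixel, 8 bits per byte.
    return width * height * bits_per_pixel // 8

full = image_bytes(512, 512, 24)   # 24-bit RGB: 786,432 bytes (~800 kB)
idx  = image_bytes(512, 512, 4)    # 4-bit palette index: 131,072 bytes (~130 kB)
ratio = full // idx                # 24 bits vs 4 bits per pixel: 6:1
```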
  11. Our design
      - Color chooser: reduces the roughly 170 distinct colors of the image to 16
      - Encoder: represents each of the 16 colors by a 4-bit number, yielding the compressed image
      - Codebook excerpt:
        0000 -> (00,00,00)
        0001 -> (10,10,A3)
        0010 -> (39,0A,9D)
        0011 -> (40,68,90)
        ...
        1111 -> (FF,FF,FF)
  12. Our design
      Input pixels -> learn the image -> create the codebook -> improve the codebook -> encode the image -> compressed image -> decode the image
      The codebook maps each 4-bit code to a 24-bit color, e.g. 0000 -> (00,00,00), ..., 1111 -> (FF,FF,FF).
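The encode/decode halves of the pipeline can be sketched as a plain vector quantizer. The codebook below contains only the five entries shown on the slide (the elided middle entries are not filled in), so the codes here run 0-4 rather than 0-15; this is a behavioral sketch, not the hardware's encoder:

```python
# The five codebook entries visible on the slide, as (R, G, B) tuples.
codebook = [(0x00, 0x00, 0x00), (0x10, 0x10, 0xA3), (0x39, 0x0A, 0x9D),
            (0x40, 0x68, 0x90), (0xFF, 0xFF, 0xFF)]

def encode(pixel):
    # Pixel-by-pixel encoding: emit the index of the nearest codebook color.
    dist = lambda c: sum(abs(a - b) for a, b in zip(c, pixel))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def decode(code):
    # Decoding is a plain lookup in the LUT.
    return codebook[code]
```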
  13. Algorithm
  14. ALGORITHM
      - Kohonen algorithm: steps
      - Modifications made: why these modifications
      - Main functions: what the algorithm does
      - Algorithm steps: 8-bit and 7-bit
      - Advantage of the 7-bit algorithm
  15. OVERALL ALGORITHM
      - Learning: the image is fed to the Kohonen algorithm, which produces the LUT (16 neurons, each holding a weight; stored as 16 address/data pairs).
      - Encoding: the encoder compresses the image pixel by pixel against the LUT.
      - Decompression uses the LUT (plus the MSB plane in 7-bit mode).
  16. KOHONEN ALGORITHM
      - Assumed:
        - Neuron weights (w): denote the position of each neuron in 3-D space
        - Serial presentation of training vectors (x)
        - Time-dependent learning rate α(t)
        - Learning count t
      - The 3-D space represents R, G and B (e.g. neurons at w1 and w2, and an input training vector x).
  17. STEPS
      STEP 1: Find the closest neuron (neuron c):
        ||X(t) - W_c(t)|| = min_i {||X(t) - W_i(t)||}
      STEP 2: Update the weight of the winning neuron and of the neurons in its topological neighborhood:
        W_i(t+1) = W_i(t) + α(t)·{X(t) - W_i(t)},  for i ∈ N_c(t) (the neighborhood)
      Iterate STEP 1 and STEP 2.
  18. ALGORITHM MODIFICATION
      - Why change?
        - Computational expense
        - Hardware complexity
        - Efficiency: the 7-bit implementation
      - Avoid multiplication and recursive logic blocks
      - Tradeoff: time vs. efficiency vs. complexity
      - Modifications discussed where relevant
  19. MODIFIED ALGORITHM
      STEP 1: Training vector X = (Xr, Xg, Xb) is input to all N neurons. Neurons are initialized with weight vectors w using gray-scale initialization (r = g = b).
      STEP 2: Each neuron calculates its weight difference from the input vector as a Manhattan distance, Σ_j |W_ij - X_j| for the i-th neuron, j ∈ {r, g, b}. Manhattan distance is used instead of Euclidean distance; it denotes the absolute distance of the neuron from the input vector.
      STEP 3: The neuron with the minimum Manhattan distance is chosen and denoted the winner; the minimum-distance neuron can be found by binary/recursive searching.
      STEP 4: Neurons in the topological neighborhood are chosen and denoted neighbors. The usual Kohonen algorithm shrinks the neighborhood with a function f(t), e.g. d = d0(1 - t/T); the modified algorithm uses an expanding sphere instead.
      STEP 5: Update the neuron weights: W_i(t+1) = W_i(t) + α(t)·{X(t) - W_i(t)}, with learning rate α ∈ {1/2, 1/4, 1/8, 1/16, ...}.
      Repeat for the next input vector; stop after a fixed number of iterations.
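A behavioral sketch of one modified training step (Python, not the project's Verilog; function names and the example `shift` value are illustrative). The two hardware-motivated choices are visible directly: Manhattan distance needs no multipliers, and restricting α to powers of two turns the update multiply into an arithmetic right shift:

```python
def grayscale_init(n=16):
    # Gray-scale initialization (r = g = b), matching the slide's table:
    # neuron i starts at weight 8*i on each channel (0x00, 0x08, ..., 0x78).
    return [(8 * i,) * 3 for i in range(n)]

def manhattan(w, x):
    # |Wr-Xr| + |Wg-Xg| + |Wb-Xb|: cheaper in hardware than Euclidean distance.
    return sum(abs(wi - xi) for wi, xi in zip(w, x))

def update(w, x, shift):
    # Learning rate alpha = 2**-shift, so alpha*(X - W) is just a right shift.
    return tuple(wi + ((xi - wi) >> shift) for wi, xi in zip(w, x))

def train_step(weights, x, shift=2):
    # One presentation of training vector x: choose the winner by minimum
    # Manhattan distance, then move it toward x. Returns the winner's index.
    winner = min(range(len(weights)), key=lambda i: manhattan(weights[i], x))
    weights[winner] = update(weights[winner], x, shift)
    return winner
```

(The real architecture also updates a topological neighborhood and modulates α by the frequency count; this sketch shows only the winner update.)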
  20. 7-bit algorithm
      (Figure: an 8-bit component in the range 0000_0001 to 1111_1111 maps to a 7-bit component, e.g. 000_0001.)
  21. 7-BIT vs. 8-BIT ALGORITHM
      7-bit algorithm:
      - Neuron weight components are 7 bits each
      - Input vector components are 7 bits each (conversion needed)
      - Image reconstruction is complex and involves looking up the MSB plane
      - Requires storage of the MSB plane and the LUT
      8-bit algorithm:
      - Neuron weight components are 8 bits each (R = 8, G = 8 and B = 8)
      - Input vector components are 8 bits each
      - Image reconstruction is a simple matter of looking up pixel values in the lookup table
      - Requires storage of only the LUT
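One plausible reading of the 7-bit conversion, sketched below: each 8-bit channel is split into its most significant bit (collected into the MSB plane; the three MSBs select one of 8 octants of color space) and a 7-bit remainder that the network learns on. The exact bit packing the project used is an assumption here:

```python
def split_pixel(r, g, b):
    # Hypothetical 7-bit conversion: pack the three channel MSBs into a
    # 3-bit octant code for the MSB plane, keep 7-bit remainders for learning.
    msb = ((r >> 7) << 2) | ((g >> 7) << 1) | (b >> 7)
    low = (r & 0x7F, g & 0x7F, b & 0x7F)
    return msb, low

def rejoin(msb, low):
    # Reconstruction reattaches the stored MSB-plane bits to the decoded
    # 7-bit values: the extra lookup step that 8-bit mode avoids.
    r, g, b = low
    return ((((msb >> 2) & 1) << 7) | r,
            (((msb >> 1) & 1) << 7) | g,
            ((msb & 1) << 7) | b)
```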
  22. ARCHITECTURE
  23. SYSTEM ARCHITECTURE
      Broadcast architecture:
      - The central (global) controller broadcasts control signals to the neurons.
      - Neurons have the ability to take control of the global bus.
      - Arbitration eliminates contention for the central bus.
      - Hardware-efficient, since a rich interconnection would require an infeasible number of I/O pins.
      - Expansion of the neural network is simplified.
      - Only one feedback signal runs from the network to the controller.
      Neuron structure:
      - All arithmetic computations use the same block, "ARITHMETIC_UNIT".
      - The variable learning rate is implemented with a shift register.
      (Diagram: the global controller connected by the shared bus to neurons 1-16.)
  24. ALGORITHM TO ARCHITECTURE
      Step 1: FC initialization
      - The global controller (GC) fills in the frequency counter (FC) for all neurons.
      - The frequency counter starts from a threshold value and is decremented for the winning neuron; a neuron is disabled when FC = 0.
      - GC: ini_freq is asserted with the data on DATA[8:0].
      Step 2: Weight initialization
      - 2 cycles per neuron: address broadcast, then the data.
      - GC, cycle 1: the address on DATA[8:0] with add_cyc; the particular neuron is selected.
      - GC, cycle 2: the W_R value (= W_G = W_B) on DATA[8:0] together with MEM_ADD[2:0]; the data is read by the RAM.
      - Gray-scale initialization table:
        Add  W_R  W_G  W_B
        0    00   00   00
        1    08   08   08
        2    10   10   10
        ...
        15   78   78   78
      Step 3: Manhattan distance calculation
      - The arithmetic unit of each neuron calculates |W_R - X_R| + |W_G - X_G| + |W_B - X_B| and stores it in register T2.
      - GC:
        1. W_R address on RAM_ADD[2:0] with mem_rd asserted.
        2. The first input's red value on DATA[8:0]; assert st_cal.
        3. On the first posedge(S0), read T_2R into memory and T_2RC into the flag register; repeat steps 1-2 with the green value.
        4. On the second posedge(S0), read T_2G into memory and T_2GC into the flag register; repeat steps 1-2 with the blue value.
        5. On the third posedge(S0), read T_2B into memory and T_2BC into the flag register.
        6. On the next negedge(S0), read T_2 into memory using data_en.
      Step 4: Find the minimum Manhattan distance
      - Any number less than 512 can be guessed in 10 steps by binary searching, using the "<" or ">" relation between the guessed number and the original number.
      - The neuron with the least distance has its Wnr flag set.
      - GC:
        1. The T_2 value is written to MEM_OUT[8:0].
        2. The number 512 is broadcast to the negative terminal of the differentiator.
        3. If S1 is high, the least Manhattan distance lies below 512.
        4. Steps 1-3 are repeated with successive values (256, 128, ...) chosen from S1 after each stage.
        5. At the 10th cycle, min(T_2) is on DATA[8:0] and the Wnr flag is loaded; the winning neuron(s) have Wnr set to 1.
      Step 5: FC update and winner weight broadcast
      - The frequency counter is decremented and the winning neuron takes control of the bus; T2 is calculated in each neuron.
      - GC:
        1. dec_freq is asserted and the winning neuron decrements its frequency counter.
        2. nrn_ctrl is asserted with the W_R MEM_ADD[2:0]; the winning neuron broadcasts its W_R value on DATA[8:0], while all neurons, including the winner, put their W_R value on MEM_OUT[8:0].
        3. GC asserts st_cal; step 2 is repeated with W_G and W_B.
        4. The accumulated value is written to T2 using data_en, MEM_ADD[2:0] and mem_en.
      Step 6: Winner updating
      - The learning rate for the winner is based on the value in the frequency count register.
      - The red value is updated as W_R = W_R + (T_2RC * T_2R * α(f)), where α(f) is the learning rate as a function of frequency; the step is repeated for the green and blue values.
      - Only the neuron with the Wnr flag set updates these values.
      - The learning rate is implemented with a shift register, e.g. α = 0.875 = 1/2 + 1/4 + 1/8.
      Step 7: Neighbor determination and updating
      - All neurons have a T1 value stored (distance from the winning neuron).
      - The GC broadcasts neighborhood-size values; neurons with T1 < neighborhood size fall in the neighborhood and have their Nbr flag set.
      - Neighbor neurons are updated using W_R = W_R + (T_2RC)(T_2R / 16).
      - The neighborhood size increases progressively.
      Step 8: Iterate for the next input pixel
      - Stop after a fixed number of iterations.
      Registers used in the architecture: w_R(7), w_G(7), w_B(7) - weight vectors; T_1(9) - distance from the winning neuron; T_2(9), T_2R(7), T_2G(7), T_2B(7) - distance from the input pixel; T_2RC(1), T_2GC(1), T_2BC(1) - sign flags; FC(18) - frequency counter.
  25. ARCHITECTURAL NOVELTIES
      Implementation of 7-bit learning: 7-bit learning, mapping of pixels to an octal space and encoding the MSB plane with the image are new theoretical ideas that we implemented in hardware. This mode should theoretically produce better-quality images than 8-bit mode, because the neurons are more closely packed in a smaller space, so each pixel elicits a stronger response from the structure.
      Implementation of both the 8-bit and the 7-bit learning algorithm: the same hardware can process an image in either 7-bit or 8-bit mode; a single push-button switch on the FPGA board sets the mode for a cycle. This is useful because certain images give a better output in 7-bit mode than in 8-bit mode, or vice versa, and the two can be compared in later studies. It was done with future upgrades in mind: a module could be added that calculates the mean-square error for both the 7-bit and 8-bit images and selects the better one.
  26. ARCHITECTURAL NOVELTIES
      Integration of the encoding hardware with the learning hardware: this integration ensures faster compression and reduces the hardware overhead. It was done with an eye to future practical application of the hardware to real-time video compression, rather than just stand-alone images.
      Implementation of the variable learning rate: a variable learning rate (1/2, 1/4, etc.) is a novel feature of this architecture. It updates each neighbor according to its distance from the winning neuron, rather than by a fixed amount. Neighbors are updated based on 5 ranges of distance from the winner, with the update amounts derived from theoretical calculations.
  27. ARCHITECTURAL NOVELTIES
      Implementation of the learning rate depending on the frequency count: the frequency count value is calculated so that all neurons get an equal chance of being the winner. At the same time, the algorithm ensures that a neuron that has won most often is not updated as much as those that have been less lucky. This is not seen in other similar algorithms.
      Implementation of neighbor updating: neighbor updating together with winner updating is another novel feature of our algorithm. It complicates the design, but the output quality is considerably improved compared to other architectures.
  28. ARCHITECTURAL NOVELTIES
      Hardware features:
      - No redundant hardware.
      - All computations are done using the same hardware blocks (shift register and arithmetic unit).
      - All control features are vested in the global controller.
  29. Results
  30. Testing strategy: data verification and result verification
  31. Testing strategy - data verification
      - Modelsim SE 5.5a
      - 4 random input pixels, 1 loop, 16 neurons, 7/8 bits
      - Signals for each module viewed and verified
      - Major verifications:
        - Correct winner selection
        - GC state transitions
        - Correct neighbor/winner updates
        - Correct number of encoded/decoded pixels
      (Test hierarchy: under Modelsim, TEST.V and TEST_INPUT drive TOP.V, the top-level synthesizable module, containing NEURON_ARRAY, GLOBAL_CONTROLLER and ENCODER.)
  32. Testing strategy: result verification
  33. Testing strategy - result verification
      Flow (from the diagram): input.tiff and Config.dat feed a C program that generates the configuration file; TOP.V (the top-level synthesizable module: NEURON_ARRAY, GLOBAL_CONTROLLER, ENCODER) produces Encoded_image.dat; Decode.v produces Decoded_image.dat, which is converted to Output.tiff.
  34. Result-verification runs
      - 16 neurons, 1 loop, 7-bit
      - 16 neurons, 1 loop, 8-bit
      - 16 neurons, 5 loops, 7-bit
      - 16 neurons, 5 loops, 8-bit
      - 16 neurons, 10 loops, 7-bit
      - 16 neurons, 10 loops, 8-bit
      - 32 neurons, 1 loop, 7-bit
      - 32 neurons, 1 loop, 8-bit
      - 32 neurons, 5 loops, 7-bit
      - 32 neurons, 5 loops, 8-bit
      - 32 neurons, 10 loops, 7-bit
      - 32 neurons, 10 loops, 8-bit
  35. Testing strategy - result verification
      Mean square error = Σ [(R_o - R_c)² + (G_o - G_c)² + (B_o - B_c)²]
      Reported values: 2963.244129 and 2780.932903.
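The slide's error measure can be computed directly. Note that as written it is a sum of squared channel differences over the pixels; dividing by the pixel count would give a per-pixel mean (the sketch returns the sum, to match the formula as shown):

```python
def squared_error_sum(original, compressed):
    # Sum over all pixels of (Ro-Rc)^2 + (Go-Gc)^2 + (Bo-Bc)^2, comparing
    # the original image against its compressed-then-decoded version.
    return sum((ro - rc) ** 2 + (go - gc) ** 2 + (bo - bc) ** 2
               for (ro, go, bo), (rc, gc, bc) in zip(original, compressed))
```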
  36. Screen captures
  37. Synthesis
      Flow: Verilog code, constraints and technology libraries feed the synthesis tool, which produces the optimized net-list, schematic and prototype model; in-signal and out-signal files are used for verification. Tool: Xilinx ISE Series 4.1i.
  38. Synthesis report
      Chip: top-optimized
      Type: optimized implementation; Source: top, up to date
      Status: 0 errors, 0 warnings, 0 messages
      Export: exported after last optimization
      Chip create time: 0.000000 s; chip optimize time: 598.734000 s
      FSM synthesis: ONEHOT
      Target: Xilinx VIRTEX V800HQ240, speed grade -4
      Optimize for: speed; optimization effort: low; frequency: 50 MHz
      Is module: no; keep I/O pads: no
      Number of flip-flops: 3129; number of latches: 0
  39. FPGA implementation
      - Board: XESS Corporation XSV
      - FPGA: Xilinx Virtex V800HQ240
      - CPLD: Xilinx XC95108
      - Memory: Winbond AS7C4096, two 512K x 16-bit banks
      - 1 on-board DIP switch, 2 push buttons, 9 bar-graph LEDs
  40. FPGA implementation - run sequence
      1. Upload the configuration files and the image to the on-board memory.
      2. Upload the FPGA bit file to the CPLD.
      3. BAR LED 1 glows: the FPGA is configured.
      4. Press push button 1 (START) to start the learning process.
      5. BAR LEDs 2-5 glow as 2, 4, 6 and 10 loops complete.
      6. BAR LED 6 glows: encoding completed.
      7. Download the image and convert it to TIFF format.
  41. FPGA implementation (board photographs)
  42. Conclusion
  43. Conclusion
      1. The 7-bit process performs better than the 8-bit process.
      2. Suitable for real-time encoding and streaming of video images (about 12 seconds at 5 MHz).
      3. Use of the frequency count register gives better images.
      4. More loops give a better image (in 8-bit mode, beyond 5 loops); this is similar to human learning.
  44. Recommendations
      1. The algorithm can be modified to improve learning time.
      2. Real-time video compression with 2 parallel learning chips.
      3. Both 7-bit and 8-bit modes in the same hardware.
      4. MSB-plane compression.
  45. Q & A
