Vision Algorithmics
Adrian F. Clark
VASE Laboratory, Computer Science & Electronic Engineering
University of Essex, Colchester, CO4 3SQ
⟨alien@essex.ac.uk⟩
This essay explores vision software development. Rather than focussing on how to use languages,
libraries and tools, it instead explores some of the more difficult problems to solve: having the
wrong software model, numerical problems, and so on. It also touches on non-vision software that
often proves useful, and offers some guidelines on the practicalities of developing vision software.
Contents

1 Introduction
2 Failure of Programming Model
3 Programming Versus Theory
4 Numerical Issues
5 The Right Algorithm in The Right Place
6 Vision Packages
7 Massaging Program Outputs
8 Concluding Remarks
1 Introduction
The purpose of a PhD is to give training and experience in research. The principal output of a
PhD is one or more contributions to the body of human knowledge. This might suggest that the
main concern of a thesis is describing the contribution, the “Big Idea,” but the reality is different:
the thesis must present experimental evidence, analysed in a robust way, that demonstrates that
the Big Idea is indeed valid. For theses in the computer vision area, this gathering and analysis
of experimental evidence invariably involves programming algorithms. Indeed, informal surveys
have found that PhD students working in vision typically spend over 50% of their time in
programming, debugging and testing algorithms. Hence, becoming experienced in vision software
development and evaluation is an important, if not essential, element of research training.
Many researchers consider the programming of vision algorithms to be straightforward
because it simply instantiates the underlying mathematics. This attitude is naïve, for it is well
known that mathematically-tractable solutions do not guarantee viable computational algorithms.
Indeed, the author considers a programming language to be a formal notation for expressing
algorithms, one that can be checked by a machine, and a program to be the formal description of
an algorithm that incorporates mathematical, numerical and procedural aspects.
Rather than present a dusty survey of the vision algorithms available in the 100+ image
processing packages available commercially or for free, this essay tries to do something different:
it attempts to illustrate, through the use of examples, the major things that you should bear in
mind when coding up your own algorithms or looking at others’ implementations. It concentrates
largely on what can go wrong. The main idea the author is trying to instill is a suspicion of all
results produced by software! Example code is written in C but the principles are equally valid in
C++, Java, Matlab, Python or any other programming language you may use.
Having made you thoroughly paranoid about software issues, the document will then briefly
consider what packages are available free of charge both to help you write vision software and
analyse the results that come from it. That will be followed by a brief discussion that leads into
the exploration of performance characterisation in a separate lecture in the Summer School.
2 Failure of Programming Model
Let us consider the first area in which computer vision programs often go wrong. The problem,
which is a fundamental one in programming terms, is that the conceptual model one is using as
the basis of the software has deficiencies.
Perhaps the best way to illustrate this is by means of an example. The following C code
performs image differencing, a simple technique for isolating moving areas in image sequences.
typedef unsigned char byte;

void sub_ims (byte **i1, byte **i2, int ny, int nx)
{
    int y, x;

    for (y = 0; y < ny; y++)
        for (x = 0; x < nx; x++)
            i1[y][x] = i1[y][x] - i2[y][x];
}
The procedure consists of two nested loops, the outer one scanning down the lines of the image
and the inner one accessing each pixel of the line; it was written with exposition in mind, not
speed. Let us consider what is good and bad about the code.
Firstly, the code accesses the pixels in the correct order. Two-dimensional arrays in C are
represented as ‘arrays of arrays,’ which means that the last subscript should cycle fastest to access
adjacent memory locations. On PC- and workstation-class machines, all of which have virtual
memory subsystems, failure to do this could lead to large numbers of page faults, substantially
slowing down the code. Incidentally, it is commonly reported that this double-subscript approach
is dreadfully inefficient as it involves multiplications to subscript into the array — but that is
complete bunkum!
There are several bad aspects to the code. Arrays i1 and i2 are declared as being of type
unsigned char, which means that pixel values must lie in the range 0–255; so the code is
constrained to work with 8-bit imagery. This is a reasonable assumption for the current generation
of video digitizers, but the software cannot be used on data recorded from a 10-bit flat-bed scanner,
or from a 14-bit remote sensing satellite, without throwing some potentially important resolution
away.
What happens if, for some particular values of the indices y and x, the value in i2 is greater
than that in i1? The subtraction will yield a negative value, one that cannot be written back
into i1. Some run-time systems will generate an exception; others will silently reduce the result
modulo 256 and continue, so that the user erroneously believes that the operation succeeded.
In any case, one will not get what one expected. The simplest way around both this and the
previous problem is to represent each pixel as a 32-bit signed integer rather than as an unsigned
byte. Although images will then occupy four times as much memory (and require longer to read
from disk, etc.), it is always better to get the right answer slowly than an incorrect one quickly.
Alternatively, one could even represent pixels as 32-bit floating-point numbers as they provide the
larger dynamic range (at the cost of reduced accuracy) needed for things like Fourier transforms.
On modern processors, the penalty in computation speed is negligible for either 32-bit integers or
floating-point numbers. Indeed, because of the number of integer units on a single processor and
the way its pipelines are filled, floating-point code can sometimes run faster than integer code!
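As a concrete sketch of that fix, the subtraction can widen each pixel to a signed 32-bit value and write into an int result array; the function and array names below are illustrative, not from the code above:

```c
typedef unsigned char byte;

/* Difference two 8-bit images into a signed 32-bit result array so that
   negative differences are preserved rather than wrapped modulo 256. */
void sub_ims_wide (byte **i1, byte **i2, int **out, int ny, int nx)
{
    int y, x;

    for (y = 0; y < ny; y++)
        for (x = 0; x < nx; x++)
            out[y][x] = (int) i1[y][x] - (int) i2[y][x];
}
```

With this representation a difference of, say, 10 − 20 really is stored as −10 rather than as 246.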
You should note that this problem with overflowing the representation is not restricted to
subtraction: addition and multiplication have at least as much potential to wreak havoc with the
data. Division requires even more care for, with a purely integer representation, one must either
recognise that the fractional part of any division will be discarded or remember to include code
that rounds the result, as in
i1[y][x] = (i1[y][x] + i2[y][x] / 2) / i2[y][x];
which adds half the divisor before dividing, so that the truncation inherent in integer division
becomes rounding to the nearest integer; this form works for imagery of any bit depth.
There are ways of avoiding these types of assumptions. Although it is not really appropriate
to go into these in detail in this document, an indication of the general approaches is relevant.
Firstly, by making a class (or structured data type in non-object-oriented languages such as C)
called ‘image’, one can arrange to record the data type of the pixels in each image; a minimal
approach in C is:
typedef struct {
    int type;
    void *data;
} image;
...
image *i1, *i2;
Inside the processing routine, one can then select separate loops based on the value of i1->type.
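A minimal sketch of such dispatch, repeating the structure above for completeness; the PIX_BYTE and PIX_INT tag names and the get_pixel helper are invented for illustration:

```c
typedef unsigned char byte;

enum { PIX_BYTE, PIX_INT };          /* hypothetical pixel-type tags */

typedef struct {
    int   type;                      /* one of the PIX_ tags          */
    void *data;                      /* really byte ** or int ** etc. */
} image;

/* Return one pixel as an int, whatever the underlying representation. */
int get_pixel (const image *im, int y, int x)
{
    switch (im->type) {
    case PIX_BYTE: return ((byte **) im->data)[y][x];
    case PIX_INT:  return ((int  **) im->data)[y][x];
    default:       return 0;         /* unknown tag; real code should signal an error */
    }
}
```

The same switch-on-tag shape serves for whole processing loops, not just single pixels, at the cost of one loop body per supported type.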
The code also has serious assumptions implicitly built in. It assumes that images comprise only
one channel of information, so that colour or multi-spectral data require multiple invocations.
More seriously, it assumes that an image can be held entirely in computer memory. Again, this is
reasonable for imagery grabbed from a video camera but perhaps unreasonable for 6000 × 6000
pixel satellite data or for images recorded from 24 Mpixel digital cameras being processed on an
embedded system.
The traditional way of avoiding having to store the entire image in memory is to employ line-
by-line access:
for (y = 0; y < ny; y++) {
    buf = getline (y);
    for (x = 0; x < nx; x++)
        ...operate on buf[x]...
    putline (y, buf);
}
This doesn’t actually have to involve line-by-line access to a disk file as getline can return a
pointer to the line of an image held in memory: it is logical line-by-line access. This approach has
been used by the author and colleagues to produce code capable of being used unchanged on both
serial and parallel hardware as well as on clusters, grids and clouds.
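One possible in-memory realisation of this scheme, as a sketch; the names are illustrative, and getline is renamed getline_mem to avoid colliding with the POSIX getline:

```c
typedef unsigned char byte;

#define NY 480                       /* hypothetical image height */
#define NX 640                       /* hypothetical image width  */

static byte store[NY][NX];           /* the image held in memory  */

/* Logical line-by-line access: return a pointer to line y of the
   in-memory image.  A disk-backed variant could read the line into
   a scratch buffer here instead, with no change to the calling loop. */
byte *getline_mem (int y)
{
    return store[y];
}

/* With in-memory storage there is nothing to write back, because the
   buffer handed out by getline_mem aliases the stored line. */
void putline_mem (int y, byte *buf)
{
    (void) y;
    (void) buf;
}
```

Swapping in file-backed versions of these two routines changes the storage strategy without touching any processing code.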
3 Programming Versus Theory
Sticking with this notion of using familiar image processing operations to illustrate problems, let us
consider convolution performed in image space. This involves multiplying the pixels in successive
regions of the image with a set of coefficients, putting the result of each set of computations
back into the middle of the region. For example, the well-known Laplacean operator uses the
coefficients

    -1 -1 -1
    -1  8 -1
    -1 -1 -1

while the coefficients

    1 1 1
    1 1 1
    1 1 1

yield a 3 × 3 blur. When programming up any convolution operator that uses such a set of
coefficients, the most important design question is: what happens at the edges? There are several
approaches in regular use:
• don’t process the edge region, which is a one-pixel border for a 3 × 3 ‘mask’ of coefficients,
for example;
• reduce the size of the mask as one approaches the edge;
• imagine the image is reflected along its first row and column and program the edge code
accordingly;
• imagine the image wraps around cyclically.
Which of these is used is often thought to be down to the whim of the programmer; but only one
of these choices matches the underlying mathematics.
The correct solution follows from the description of the operator as a convolution performed in
image space. In signal processing, convolution is usually performed in Fourier space: the image is
Fourier transformed to yield a spectrum; that spectrum is multiplied by a filter, and the resulting
product inverse-transformed. The reason that convolutions are not programmed in this way in
practice is that, for small-sized masks, the Fourier approach requires more computation, even
when using the FFT discussed below. The use of Fourier transformation, or more precisely the
discrete Fourier transform, involves implicit assumptions; the one that is relevant here is that the
data being transformed are cyclic in nature, as though the image is the unit cell of an infinitely-
repeating pattern. So the only correct implementation, at least as far as the theory is concerned,
is the last option listed above.
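A sketch of a 3 × 3 convolution that implements the cyclic boundary condition; the modulo arithmetic wraps the indices so the image behaves as the unit cell of an infinitely-repeating pattern (illustrative code, not from the text):

```c
/* 3x3 convolution with cyclic (wrap-around) boundary handling.
   in and out are ny x nx float images; mask holds the coefficients. */
void convolve3x3_cyclic (float **in, float **out, float mask[3][3],
                         int ny, int nx)
{
    int y, x, dy, dx;

    for (y = 0; y < ny; y++)
        for (x = 0; x < nx; x++) {
            float sum = 0.0f;
            for (dy = -1; dy <= 1; dy++)
                for (dx = -1; dx <= 1; dx++) {
                    int yy = (y + dy + ny) % ny;   /* wrap vertically   */
                    int xx = (x + dx + nx) % nx;   /* wrap horizontally */
                    sum += mask[dy + 1][dx + 1] * in[yy][xx];
                }
            out[y][x] = sum;
        }
}
```

Adding ny and nx before taking the remainder keeps the operands non-negative, since in C the sign of `%` on a negative operand would otherwise produce an invalid index.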
4 Numerical Issues
One thing that seems to be forgotten far too often these days is that floating-point arithmetic is
not particularly accurate. The actual representation of a floating-point number is m × 2ᵉ, analogous
to scientific notation, where m, the mantissa, can be thought of as having the binary point
immediately to its left and e, the exponent, is the power of two that ‘scales’ the mantissa. Hence,
(IEEE-format) 32-bit floating-point corresponds to 7–8 decimal digits and 64-bit to roughly twice
that. Because of this and other issues, floating-point arithmetic is significantly less accurate than a
pocket calculator! The most common problem is that, in order to add or subtract two numbers,
the representation of the smaller number must be changed so that it has the same exponent as
the larger, and this involves shifting the binary digits of its mantissa to the right. This means
that, if the numbers differ by a factor of about 10⁷, all the digits of the mantissa are shifted out and
the smaller number effectively becomes zero. The ‘obvious’ solution is to use 64-bit floating-point
(double in C) but this simply postpones the problem to bigger numbers; it does not solve it.
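The absorption effect is easy to demonstrate. The sketch below (illustrative helper names) asks whether adding 1 to a number of a given magnitude survives rounding:

```c
/* Does x + 1 round straight back to x at this magnitude?  The volatile
   temporaries stop the compiler from keeping the sum in a wider
   register, which would hide the absorption being demonstrated. */
int plus_one_is_lost_f (float x)
{
    volatile float sum = x + 1.0f;
    return sum == x;
}

int plus_one_is_lost_d (double x)
{
    volatile double sum = x + 1.0;
    return sum == x;
}
```

With 32-bit floats (about 7 significant digits), plus_one_is_lost_f(1.0e8f) yields 1; with doubles the same thing happens only at much larger magnitudes, such as 10¹⁷.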
You might think that these sorts of problems will not occur in computer vision; after all, images
generally involve numbers only in the range 0–255. But that is simply not true; let us consider
two examples of where this kind of problem can bite the unwary programmer.
The first example is well-known in numerical analysis. The task of solving a quadratic equation
crops up surprisingly frequently, perhaps in fitting to a maximum when trying to locate things
accurately in images, or when intersecting vectors with spheres when working on 3D problems.
The solution to

    ax² + bx + c = 0

is something almost everyone learns at school:

    x = (−b ± √(b² − 4ac)) / 2a

and indeed this works well for many quadratic problems. But there is a numerical problem hidden
in there.
When the discriminant, b² − 4ac, involves values for which b² ≫ 4ac, the nature of floating-point
subtraction alluded to above can make 4ac → 0 relative to b², so that the square root of the
discriminant becomes ±b ... and this means that the smaller of the two solutions to the equation
is computed as −b + b = 0.
Fortunately, numerical analysts realized this long ago and have devised mathematically-equivalent
formulations that do not suffer from the same numerical instability. If we first calculate
    q = −½ ( b + sgn(b) √(b² − 4ac) )

then the two solutions to the quadratic are given by

    x₁ = c/q   and   x₂ = q/a.
Even code that looks straightforward can lead to difficult-to-find numerical problems. Let us
consider yet another simple image processing operation, finding the standard deviation (SD) of
the pixels in an image. The definition of the SD is straightforward enough
    σ = √( (1/N) Σᵢ (xᵢ − x̄)² )
where x̄ is the mean of the image and N the number of pixels in it. (We shall ignore the distinction
between population and sample SDs, which is negligible for images.) If we program this equation
to calculate the SD, the code has to make two passes through the image: the first calculates the
mean, x̄, while the second finds deviations from the mean and hence calculates the SD.
Most of us probably learnt in statistics courses how to re-formulate this: substitute the defini-
tion of the mean in this equation and simplify the result. We end up needing to calculate
    Σ x² − (Σ x)² / N
which can be programmed up using a single pass through the image as follows.
float sd (float **im, int ny, int nx)
{
    float v, var, sum = 0.0, sum2 = 0.0;
    int y, x;

    for (y = 0; y < ny; y++) {
        for (x = 0; x < nx; x++) {
            v = im[y][x];
            sum = sum + v;
            sum2 = sum2 + v * v;
        }
    }
    v = nx * ny;
    var = (sum2 - sum * sum / v) / v;
    if (var <= 0.0) return 0.0;
    return sqrt (var);
}
I wrote such a routine myself while doing my PhD and it was used by about thirty people on a
daily basis for two or three years before someone came to see me with a problem: on his image,
the routine was returning a value of zero for the SD, even though there definitely was variation in
the image.
After a few minutes’ examination of the image, we noticed that the image in question com-
prised large values but had only a small variation. This identified the source of the problem: the
single-pass algorithm involves a subtraction and the quantities involved, which represented the
sums of squares or similar, ended up differing by less than one part in ~10⁷ and hence yielded
inaccurate results from my code. I introduced two fixes: the first was to change sum and sum2 to
be double rather than float; the second was to look for cases where the subtraction in question
yielded a result that was small or negative and, in those cases, calculate only the mean and then
make a second pass through the image. It’s a fairly ad hoc solution but proved adequate, and I
and my colleagues haven’t been bitten again.
5 The Right Algorithm in The Right Place
The fast Fourier transform (FFT) is a classic example of the distinction between mathematics and
‘algorithmics.’ The underlying discrete Fourier transform is normally represented as a matrix
multiplication; hence, for N data points, O(N²) complex multiplications are required to evaluate it.
The (radix-2) FFT algorithm takes the matrix multiplication and, by exploiting symmetry properties
of the transform kernel, reduces the number of multiplications required to O(N log₂ N); for a
256 × 256 pixel image, this is a saving of roughly

    ( 256² / (256 × 8) )² = ( 2¹⁶ / 2¹¹ )² = 1024 times!
These days, an FFT is entirely capable of running in real time on even a moderately fast processor;
a DFT is not.
On the other hand, one must not use cute algorithms blindly. When median filtering, for
example, one obviously needs to determine the median of a set of numbers, and the obvious
way to do that is to sort the numbers into order and choose the middle value. Reaching for our
textbook, we find that the ‘quicksort’ algorithm (see, for example, [1]) has the best performance
for random data, again with computation increasing as O(N log₂ N), so we choose that. But
O(N log₂ N) is its best-case performance; its worst-case performance is O(N²), so if we have an
unfortunately-ordered set of data, quicksort is a poor choice! A better compromise is probably
Shell’s sort algorithm: its best-case performance isn’t as good as that of the quicksort but its
worst-case performance is nowhere near as bad — and the algorithm is simpler to program and
debug.
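For reference, Shell’s method really is only a few lines; the sketch below uses the simple gap-halving sequence (better gap sequences improve its worst case further):

```c
/* Shell sort: insertion sort applied over progressively smaller gaps.
   Iterative, with no recursion overhead, and simple to debug. */
void shell_sort (int *a, int n)
{
    int gap, i, j, v;

    for (gap = n / 2; gap > 0; gap /= 2)
        for (i = gap; i < n; i++) {
            v = a[i];
            /* Slide gap-spaced elements up until v's slot is found. */
            for (j = i; j >= gap && a[j - gap] > v; j -= gap)
                a[j] = a[j - gap];
            a[j] = v;
        }
}
```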
Even selecting the ‘best’ algorithm is not the whole story. Median filtering involves working
on many small sets of numbers rather than one large set, so the way in which the sort algorithm’s
performance scales is not the overriding factor. Quicksort is most easily implemented recursively,
and the time taken to save registers and generate a new call frame will probably swamp the com-
putation, even on a RISC processor; so we are pushed towards something that can be implemented
iteratively with minimal overhead — which might even mean a bubble sort! But wait: why bother
sorting at all? There are median-finding algorithms that do not involve re-arranging data, either
by using multiple passes over the data or by histogramming. One of these is almost certainly
faster.
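As a sketch of the histogramming idea for 8-bit data: one pass counts the values and a short scan finds the middle, with no rearrangement of the data (illustrative code, not from the text):

```c
/* Median of n 8-bit values via a 256-bin histogram: O(n + 256) with no
   sorting or data movement.  For even n this returns the upper of the
   two middle values. */
unsigned char median_u8 (const unsigned char *v, int n)
{
    int hist[256] = {0};
    int i, count = 0;

    for (i = 0; i < n; i++)
        hist[v[i]]++;                /* one pass to count values      */
    for (i = 0; i < 256; i++) {
        count += hist[i];            /* walk bins until the middle    */
        if (count > n / 2)
            return (unsigned char) i;
    }
    return 0;                        /* unreachable for n > 0         */
}
```

For the small windows used in median filtering this avoids both the sorting overhead and quicksort’s call-frame costs entirely.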
Where do you find out about numerical algorithms? The standard reference these days is [2],
though I should warn you that many numerical analysts do not consider its algorithms to be state-
of-the-art, or even necessarily good. (A quick web-search will turn up many discussions regarding
the book and its contents.) The best numerical algorithms that the author is aware of are those
licensed from NAg, the Numerical Algorithms Group. All UK higher education establishments
should have NAg licenses.
Considerations of possible numerical issues such as the ones highlighted here are important if
you need to obtain accurate answers or if your code forms part of an active vision system.
6 Vision Packages
If you’ve been reading through this document carefully, you’re probably wondering whether it is
ever possible to produce vision software that works reliably. Well, it is possible — but you cannot
tell whether or not a piece of software has been written carefully and uses sensible numerical tech-
niques without looking at its source code. For that reason, the author, like many other researchers,
has a strong preference for open-source software.
What is out there in the open-source world? There are many vision packages, though
all of them are the results of fairly small groupings of researchers and developers. Recent years
have seen the research community focus on OpenCV and Matlab. As a rule of thumb, people who
need their software to run in real time or have huge amounts of data to crunch through choose
OpenCV, while people whose focus is algorithm development often work with Matlab. As both
of these are covered elsewhere in this Summer School, here are a few others that you might be
interested in looking at:
VXL: http://vxl.sourceforge.net/
A heavily-templated C++ class library being developed jointly by people in a few research
groups, including (the author believes) Oxford and Manchester. VXL grew out of Target Jr,
which was a prototype of the Image Understanding Environment, a well-funded initiative
to develop a common programming platform for the vision community. (Unfortunately, the
IUE’s aspirations were beyond what could be achieved with the hardware and software of
the time and it failed.)
Tina: http://www.tina-vision.net/
Tina is probably the only vision library currently around that makes a conscious effort to
provide facilities that are statistically robust. While this is a great advantage and a lot of
work has been put into making it more portable and easy to work with, it is fair to say that
a significant amount of effort needs to be put into learning to use it.
EVE: http://vase.essex.ac.uk/software/eve.html
EVE, the Easy Vision Environment, is a pure-Python vision package built on top of numpy.
It aims to provide easy-to-use functionality for common image processing and computer
vision tasks, the intention being for them to be used during interactive sessions, from the
Python interpreter’s command prompt or from an enhanced interpreter such as ipython as
well as in scripts. It does not use Python’s OpenCV wrappers, giving the Python developer
an independent check of OpenCV functionality.
In the author’s opinion, the biggest usability problem with current computer vision packages
is that they cater principally for the programmer: they rarely provide facilities for prototyping
algorithms via a command language or for producing graphical user interfaces; and their visual-
isation capabilities are typically not great. These deficiencies are ones that Matlab, for example,
addresses very well.
Essentially all of the vision packages available today, free or commercial, are designed for a
single processor; they cannot spread computation across a cluster, for example, or migrate compu-
tation onto a GPU. That is a problem for algorithms that may take minutes to run and consequently
tens of hours to evaluate. There is a definite need for something that does this, provides a script-
ing language and good graphical capabilities, is based around solid algorithms, and is well tested.
OpenCL (not to be confused with OpenGL, the well-known 3D rendering package) may help us
achieve that, and the author has observed a few GPU versions of vision algorithms (SIFT, ransac,
etc.) starting to appear.
With the exception of OpenCV, which includes a (limited) machine learning library, vision
packages work on images. Once information has been extracted from images, you usually have to
find other facilities. This is another distinct shortcoming, as the author is certain that many task-
specific algorithms are very weak in their processing of the data extracted from images. However,
there are packages around that can help:
Image format conversion. It sometimes seems that there are as many different image formats
as there are research groups! You are almost certain to need to convert the format of some
image data files, and there are two good toolkits around that make this very easy, both of
which are available for Windows and Unix (including Linux and MacOS X). Firstly, there is
NETPBM, an enhancement of Jef Poskanzer’s PBMPLUS (http://netpbm.sourceforge.
net/): this is a set of programs that convert most popular formats to and from its own
internal format. If you use an image format within your research group that isn’t already
supported by NETPBM, the author’s advice to you is to write compatible format converters
as quickly as possible; they’ll prove useful at some point. The second package worthy of
mention is ImageMagick (http://www.imagemagick.org/), which includes both a good
converter and a good display program. There are also interfaces between ImageMagick and
the Perl, Python and Tcl interpreters, which can be useful.
Machine learning. Weka (http://www.cs.waikato.ac.nz/ml/weka/) is a Java machine learn-
ing package really intended for data mining. It contains tools for data pre-processing, classi-
fication, regression, clustering, association rules, and visualization. I’ve heard good reports
about its facilities — if your data can be munged into something that it can work on.
Statistical software. The statistics research community has been good at getting its act together,
much better than the vision community, and much of its research now provides facilities that
interface to R (http://www.r-project.org/). R is a free implementation of S (and S+),
commercial statistical packages that grew out of work at Bell Laboratories. R is available
for both Unix and Windows (it’s a five-minute job to install on Debian Linux, for example)
and there are books around that provide introductions for all levels of users. Unusually for
free software, it provides excellent facilities for graphical displays of data.
Neural networks. Many good things have been said about netlab (http://www.ncrg.aston.
ac.uk/netlab/), a Matlab plug-in that provides facilities rarely seen elsewhere.
Genetic algorithms. Recent years have seen genetic algorithms gain in popularity as a robust
optimisation mechanism. There are many implementations of GAs around the Net; but the
author has written a simple, real-valued GA package with a root-polishing option which
seems to be both fast and reliable. It’s available in both C and Python; do contact him if you
would like a copy.
Whatever you choose to work with, whether from this list or not, treat any results with the same
suspicion as you treat those from your own code.
7 Massaging Program Outputs
One of the things you will inevitably have to do is take the output from a program and manipulate
it into some other form. This usually occurs when “plugging together” two programs or when
trying to gather success and failure rates. Two specific things will make your life easier:
• learn a scripting language; the most suitable is currently Python, largely because of the
excellent numpy and scipy extensions and its OpenCV functionality;
• don’t build a graphical user interface (GUI) into your program.
Scripting languages are designed to be used as software ‘glue’ between programs; they all provide
easy-to-use facilities for processing textual information, including regular expressions. Which of
them you choose is largely up to personal preference; the author has used Lua, Perl, Python, Ruby
and Tcl — after a long time working with Perl and Tcl, he finally settled on Python. Indeed, most of
the author’s programming is done in a scripting language as it is really a much more time-efficient
way to work.
Building a GUI into a piece of research code is a mistake because it makes it practically im-
possible to use in any kind of automated processing. A much better solution is to produce a
comprehensive command-line interface, one that allows most internal parameters and thresholds
to be set, and then use one of the scripting languages listed above to produce the GUI. For ex-
ample, the author built the interface shown in Figure 1 using Tcl and its Tk graphical toolkit in
about two hours. The interface allows the user to browse through a set of nearly 8,000 images and
an expert’s opinion of them, fine-tuning them when the inevitable mistakes crop up. This 250-line
GUI script works on Unix (including Linux and MacOS X) and Windows entirely unchanged.
Figure 1: Custom-designed graphical interface using a scripting language
8 Concluding Remarks
The aim of this essay is to make you aware of the major classes of problem that can and do crop
up in vision software. Beyond that, the major message to take home is not to believe results, either
those from your own software or anyone else’s, without checking. A close colleague of the author’s
will often spend a day getting a feel for the nature of her imagery, and she can spot a problem
in a technique better than anyone else I’ve ever worked with. That is not, I suggest, entirely
coincidence.
Another important thing is to see if changing tuning parameters has the effect that you expect.
For example, one of the author’s students recently wrote some stereo software. Its accuracy was
roughly as anticipated but, when she explored the software using simulated imagery at my
suggestion, she found that accuracy decreased as the image size increased. She was uneasy about
this and, after checking with me for a ‘second opinion,’ took a good look at her software. Sure
enough, she found a bug and, re-running the tests after fixing it, found that accuracy then
increased with image size, in keeping with both our expectations. Without both a feel for the problem and
carrying out experiments to see that expected results do indeed occur, the bug may have lain
dormant in the code for weeks or months, only to become apparent after more research had been
built on it — effort that would have to be repeated.
This idea of exploring an algorithm by varying its parameters is actually an important one.
Measuring the success and failure rates as each parameter is varied allows the construction of re-
ceiver operating characteristic (ROC) curves, currently one of the most popular ways of presenting
performance data in vision papers.
However, it is possible to go somewhat further. Let us imagine we have written a program
that processes images of the sky and attempts to estimate the proportion of the image that has
cloud cover (a surprisingly difficult task); this is one of the problems that the data in Figure 1 are
used for. A straightforward evaluation of the algorithm, the kind presented in most papers, would
simply report how often the amount of cloud cover detected by the program matches the expert’s
opinion. However, as the expert also indicated the predominant type of cloud in the image, we
can explore the algorithm in much more detail. We might expect that a likely cause of erroneous
classification is mistaking thin, wispy cloud as blue sky. As both cirrus and altostratus are thin and
wispy, we can run the program against the database of images and see how often the estimate
of cloud cover is wrong when these types of cloud occur. This approach is much, much more
valuable: it gathers evidence to see if there is a particular type of imagery on which our program
will fail, and that tells us where to expend effort in improving the algorithm. This is performance
characterisation, the first step towards producing truly robust vision algorithms — and that forms
the basis of the material that is presented in a separate Summer School lecture.
References
[1] Robert Sedgewick. Algorithms in C. Addison-Wesley, 1990.
[2] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical
Recipes in C. Cambridge University Press, second edition, 1992.