Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

Cancer diagnosis remains an essentially human task: almost universally, the process requires extracting tissue (biopsy) and having a human examine its microstructure. To improve diagnoses that rest on limited and inconsistent morphologic knowledge, a new approach has recently been proposed that uses molecular spectroscopic imaging to exploit microscopic chemical composition for diagnosis. In contrast to visible-light imaging, this approach produces very large data sets, since each pixel contains the full molecular vibrational spectrum of all chemical species present. Here, we propose data handling and analysis strategies that enable computer-based diagnosis of human prostate cancer using a novel genetics-based machine learning technique (NAX). We apply this technique to demonstrate both fast learning and accurate classification that, additionally, scale well under parallelization. Preliminary results indicate that this approach can improve current clinical practice in diagnosing prostate cancer.
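The data-handling setting described above can be sketched in a few lines. This is an illustrative outline only: the band count, class names, synthetic spectra, and the nearest-centroid classifier are assumptions standing in for the NAX genetics-based learner, not the authors' implementation. The point it shows is structural — a spectroscopic image is a cube (height × width × bands), and flattening it to (n_pixels, n_bands) reduces diagnosis to independent per-pixel classification, which is why the workload parallelizes well.

```python
import numpy as np

# Illustrative sketch only: dimensions, labels, and the nearest-centroid
# rule are hypothetical stand-ins for the NAX classifier in the paper.
rng = np.random.default_rng(0)

N_BANDS = 128                      # vibrational-spectrum samples per pixel (assumed)
CLASSES = ["benign", "malignant"]

# Synthetic training spectra: each class fluctuates around its own mean spectrum.
means = {c: rng.normal(size=N_BANDS) for c in CLASSES}
train = {c: means[c] + 0.1 * rng.normal(size=(50, N_BANDS)) for c in CLASSES}
centroids = np.stack([train[c].mean(axis=0) for c in CLASSES])

# A tiny 2x3-pixel "image": flatten the cube so each row is one pixel's spectrum.
cube = means["malignant"] + 0.1 * rng.normal(size=(2, 3, N_BANDS))
pixels = cube.reshape(-1, N_BANDS)

# Classify every pixel spectrum by its nearest class centroid; each row of
# `dists` holds one pixel's distance to every class.
dists = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
labels = np.array(CLASSES)[dists.argmin(axis=1)].reshape(2, 3)
print(labels)
```

Because each pixel is classified independently, the `pixels` array can be split across workers with no coordination, matching the parallel scaling the abstract reports.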
Tackling the Management Challenges of Server Consolidation on Multi-core SystemsThe Linux Foundation
This document discusses server consolidation challenges on multi-core systems. It finds that hypervisor overhead increases significantly under high system load. Frequent context switching accounts for a large portion of hypervisor CPU cycles. Optimizing the credit scheduler to reduce context switching frequency improves performance by lowering hypervisor overhead by 22% and increasing performance per CPU utilization by 15%.
Progress and directions for LAPACK and ScaLAPACK as of 2005. See <a href="http://www.netlib.org/lapack/">LAPACK at Netlib</a> and <a href="http://www.netlib.org/lapack-dev/">lapack-dev</a> for more current information.
Deep learning based gaze detection system for automobile drivers using nir ca...Jaey Jeong
This document summarizes a research paper on developing a deep learning-based gaze detection system for automobile drivers using an NIR camera sensor. The system uses a CNN model to classify the driver's gaze into 17 zones based on facial images. Experimental results show the system achieved over 90% accuracy on both internal and open datasets, outperforming previous methods. The proposed method provides an effective way to monitor driver distraction without compromising safety.
This document describes finite element analysis (FEA) simulations performed on a micromirror design using ANSYS Workbench. Structural and modal analyses were conducted. The structural analysis determined stresses, deformation, and whether the mirror would fail at 5 degrees. The modal analysis identified six resonant frequencies and deformation shapes. The mirror was found to withstand loads but did not rotate properly due to oversupporting.
1. The document discusses using information graphics in health technology assessment to visually present scientific evidence that informs health policy decisions.
2. The author conducted a review of 98 health technology assessment reports from 2003-2007 which found that graphics were used in every report except one, with an average of 0.20 graphics per page compared to 0.58 tables per page.
3. The author also interviewed 5 advisors from NICE who noted that graphics are particularly useful for presenting complex data with multiple outcomes, subgroups, or variables, and when there are time limitations or a need to focus or compare results.
Optimization of parameter settings for GAMG solver in simple solver, OpenFOAM...Masashi Imano
The document summarizes presentations given by Masashi Imano of OCAEL Co. Ltd. at OpenFOAM study meetings for beginners in Kansai and Kanto, Japan. It discusses optimizing parameters for the GAMG solver in OpenFOAM, including the number of cells in the coarsest grid level. Testing on a 16-node SGI cluster showed the optimal range was 32-1024 cells. It also discusses parameters like merge levels, number of smoothing sweeps, and their effect on solver speed for different node counts. The document provides guidance on selecting parameters for the GAMG solver in OpenFOAM simulations.
The Ottawa Hospital Cancer Centre implemented a TomoTherapy program for image-guided IMRT. Two TomoTherapy units were installed and are used to treat over 18 patients per day. Therapists are involved in all aspects of treatment planning and delivery using a full scope practice model. This allows for streamlined, accurate treatment and reduced toxicities. Over 300 patients have been treated since 2005 over a variety of cancer sites. Planning time has decreased as experience increased. Therapists feel this model allows them to better contribute to patient care.
Do not Match, Inherit: Fitness Surrogates for Genetics-Based Machine Learning...Xavier Llorà
A byproduct benefit of using probabilistic model-building genetic algorithms is the creation of cheap and accurate surrogate models. Learning classifier systems---and genetics-based machine learning in general---can greatly benefit from such surrogates which may replace the costly matching procedure of a rule against large data sets. In this paper we investigate the accuracy of such surrogate fitness functions when coupled with the probabilistic models evolved by the x-ary extended compact classifier system (xeCCS). To achieve such a goal, we show the need that the probabilistic models should be able to represent all the accurate basis functions required for creating an accurate surrogate. We also introduce a procedure to transform populations of rules based into dependency structure matrices (DSMs) which allows building accurate models of overlapping building blocks---a necessary condition to accurately estimate the fitness of the evolved rules.
This document appears to be a thesis submitted by Conor McMenamin for their B.Sc. in Computational Thinking at Maynooth University. The thesis investigates existing standards for selecting elliptic curves for use in elliptic curve cryptography (ECC) and whether it is possible to manipulate the standards to exploit weaknesses. It provides background on elliptic curve theory, cryptography, and standards. The document outlines requirements and proposes designing a system to test manipulating the standards by choosing curves with a user-selected parameter ("BADA55") to simulate exploiting a weakness. It describes implementing and testing the system before concluding and discussing future work.
20160219 - M. Agostini - Nuove tecnologie per lo studio del DNA tumorale libe...Roberto Scarafia
Nano Inspired Biomedicine Laboratory
1 Department of Surgical, Oncological and Gastroenterological Sciences, University of Padua, Italy.
2 Istituto di Ricerca Pediatrica- Città della Speranza, Padova, Italy.
Gaze transformers use vision transformers for gaze estimation from facial images. A hybrid model combines a CNN for image features with a transformer. It outperforms pure transformer and CNN models. Ablation studies show removing self-attention or convolutional layers hurts performance. Pre-training on a large dataset helps transformers achieve state-of-the-art results, and future methods may rely more on pre-training for gaze estimation tasks.
Paper published in The Journal of Pipeline Engineering: A practical Approach to Pipeline Corrosion Modelling: Part 2 - Short-term integrity forecasting
Accurate protein-protein docking with rapid calculationMasahito Ohue
This document describes improving protein-protein docking prediction accuracy while maintaining rapid calculation speed. The researchers:
1) Proposed adding a simple hydrophobic interaction model to MEGADOCK's docking score, which previously only considered shape complementarity and electrostatics.
2) The new model averages atomic contact energy values for receptor surface atoms, allowing hydrophobic interactions to be incorporated with one convolution calculation instead of multiple as in other methods.
Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging
1. Towards Better than Human Capability in
Diagnosing Prostate Cancer
Using Infrared Spectroscopic Imaging
Xavier Llorà1, Rohith Reddy2,3, Brian Matesic2, Rohit Bhargava2,3
1 National Center for Supercomputing Applications & Illinois Genetic Algorithms Laboratory
2 Department of Bioengineering
3 Beckman Institute for Advanced Science and Technology
University of Illinois at Urbana-Champaign
Supported by AFOSR FA9550-06-1-0370, NSF at ISS-02-09199
DoD W81XWH-07-PRCP-NIA and the Faculty Fellows program at NCSA
GECCO 2007 HUMIES 1
2. Motivation
• The American Cancer Society estimated 234,460 new cases of
prostate cancer in 2006.
• Screening test:
– Digital rectal examination
– Prostate specific antigen (PSA) level
• Patients with suspicious results undergo a biopsy
• 1 million people undergo biopsies in the US alone per year
• Pathologists diagnose
– Crucial for the therapy
– Human accuracy (error < 5%)
– Costs
3. Current Diagnosis Procedure
• Biopsy, staining, microscopy, and manual recognition have been the diagnostic
procedure for the last 150 years.
4. Advances on Fourier Transform IR Imaging
• Infrared spectroscopy is a classical technique for
measuring chemical composition of specimens.
• At specific frequencies, the vibrational modes of
molecules are resonant with the frequency of infrared
light.
• Microscopy has developed to the point that resolution
matches a pixel with a cell (and keeps improving).
• It allows starting from the same data (stained tissue)
• Generates large volumes of data
5. Advances on Fourier Transform IR Imaging
6. Spectrum Analysis
• Microscopes generate a lot of data
• Per spot, the spectral signature requires GBs of storage
• Bhargava et al. (2005): feature extraction for tissue identification
• More than 200 potential features per spectrum (cell/pixel)
• First methodology that allowed tissue identification
7. Human Activity
• As mentioned earlier: an area of exclusively human activity
• Two key tasks:
– Using the spectra identify tissue type
– Using filtered tissue diagnose samples
• Both tasks:
– Require learning
– Can be modeled as supervised learning problems
• Challenges:
– Very large volumes of information
– Scalability and efficiency are a priority
– Interpretability of the models
8. Genetics-Based Machine Learning
• GA-driven learning mechanisms
• Mainly rule based models
• Pittsburgh approach
• Inherently parallel process
• GBML is a good candidate for very large problems
• Rule matching is known to be the governing factor in
the execution time (Llorà & Sastry, 2006)
9. Current Off-the-Shelf Systems
• There is a wide variety of GBML/LCS implementations
• Most of them:
– Oriented to run experiments on a single processor
– Have large memory footprints
– Typical problem = tens of attributes + thousands of
records
– Little attention to efficient implementation and
acceleration techniques (Llorà & Sastry, 2006)
• Cancer diagnosis overwhelms them:
– Hundreds of features
– Millions of records
11. NAX Mechanics
• The basic procedure:
1. Create an empty decision list
2. GA evolves a maximally accurate and maximally
general rule using the available instances
3. Add the evolved rule to the decision list
4. Remove all the instances covered by the rule
5. If there are uncovered instances go to step 2
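The five steps above are a classic sequential-covering loop. A minimal sketch in Python, where `evolve_rule(remaining)` is a hypothetical oracle standing in for the GA of step 2 (and is assumed to return a rule covering at least one remaining instance):

```python
def learn_decision_list(instances, evolve_rule):
    """Sequential covering, following steps 1-5 on the slide.

    `evolve_rule(remaining)` stands in for the GA of step 2 and is
    assumed to return a rule that covers at least one instance."""
    decision_list = []                         # step 1: empty decision list
    remaining = list(instances)
    while remaining:                           # step 5: loop while uncovered
        rule = evolve_rule(remaining)          # step 2: evolve one rule
        decision_list.append(rule)             # step 3: add it to the list
        remaining = [x for x in remaining
                     if not rule.matches(x)]   # step 4: drop covered instances
    return decision_list
```

At classification time the list is scanned in order and the first matching rule fires, which is why each rule only needs to be accurate on the instances the earlier rules left uncovered.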
12. A Little Story about Hardware
• SIMD (Single Instruction Multiple Data) architectures were hot
in the ‘80s supercomputing scene
• SIMD was widely used to perform binary operations on
two vector operands (Cray)
• Those processors were very expensive
• Consumer products took another path, the scalar one
– No SIMD support in hardware (left to the software)
– The massive spread of demand for CPUs made them cheaper
and cheaper
• Side effect:
– The hot trend in the ‘90s supercomputing scene became building
machines with large numbers of “cheap” processors
13. The Consumer Market Strikes Back
• Computer games and multimedia applications
– Use a particular type of matrix operations
– Graphics heavily use 4x4 matrix operations
– Digital signal processing applications also take advantage of it
• In late ‘90s Intel introduced SIMD instructions on Pentium chips via
MMX
– Multimedia oriented instructions
– Vector operations for fixed-size blocks
– Goal: accelerate multimedia apps via hardware
• Nowadays most vendors provide “multimedia” vector instruction
sets
– Intel: MMX, SSE, SSE2, SSE3
– AMD: 3Dnow!, 3Dnow+! (also support Intel’s MMX, SSE, SSE2)
– IBM/Motorola: AltiVec
14. A Simple Example (I/II)
• Match = a simple aligned ‘and’ and ‘equal’
Encoding (two bits per attribute): value 0 → 10, value 1 → 01, don’t care # → 11

Matched example:
Instance (0101): 10 01 10 01
Condition (01##): 10 01 11 11
Temp = Instance & Condition: 10 01 10 01
Temp == Instance: 11 11 11 11 → matched

Not-matched example:
Instance (1001): 01 10 10 01
Condition (01##): 10 01 11 11
Temp = Instance & Condition: 00 00 10 01
Temp == Instance: 10 01 11 11 → not matched
• Vector operations allow different manipulations
• 4 floats can be manipulated at once (spectra features)
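With a two-bits-per-attribute encoding (0 → 10, 1 → 01, # → 11), the whole match collapses to one AND and one equality test, which is exactly what a SIMD unit does across many attributes at once. A minimal sketch on plain Python integers (helper names are illustrative, not taken from NAX):

```python
# Two bits per attribute: value 0 -> 0b10, value 1 -> 0b01, '#' -> 0b11.
ENC = {'0': 0b10, '1': 0b01, '#': 0b11}

def encode(s):
    """Pack a string like '01##' into one integer, two bits per attribute."""
    word = 0
    for ch in s:
        word = (word << 2) | ENC[ch]
    return word

def matches(condition, instance):
    """One AND plus one equality test, as on the slide:
    the rule matches iff (instance & condition) == instance."""
    return (instance & condition) == instance

cond = encode('01##')    # 0b10011111
inst_a = encode('0101')  # 0b10011001, matches cond
inst_b = encode('1001')  # 0b01101001, does not match cond
```

A 128-bit SSE register applies the same AND/compare to 64 attributes in a single pair of instructions; the Python version only illustrates the logic.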
16. Exploiting the Inherent Parallelism
• Rule matching dominates the overall execution time
• Fitness calculation accounts for > 99% of it
• The parallelization method focused on reducing
communication cost
• The idea
– Most of the time is spent evaluating
– Parallelize the evaluation
– No master/slave
– All processors run the same GA seeded in the same manner
– Each processor only evaluates a chunk of the population (N/p)
– Broadcast the fitness of the chunk to the other processors
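The scheme above can be sketched by simulating the p identically seeded processors in a loop: each one evaluates only its N/p chunk, and the per-chunk fitness vectors are then shared with everyone (an allgather, in MPI terms). Function names are illustrative:

```python
def chunk_bounds(n, p, rank):
    """Half-open [lo, hi) bounds of processor `rank`'s share of n items,
    spreading any remainder over the lowest-ranked processors."""
    base, extra = divmod(n, p)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi

def evaluate_population(population, fitness, p):
    """Each of the p (simulated) processors evaluates only its chunk;
    the per-chunk results are then 'broadcast' (here: concatenated),
    so every processor ends up with the full fitness vector."""
    gathered = []
    for rank in range(p):
        lo, hi = chunk_bounds(len(population), p, rank)
        gathered.extend(fitness(ind) for ind in population[lo:hi])
    return gathered
```

Because every processor runs the same GA with the same seed, the only traffic is the fitness exchange itself, which keeps the communication cost low.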
18. Prostate Cancer Data
1. Tissue identification
– Modeled as a supervised learning problem
– (Features, tissue type)
– The goal: Accurately retrieve epithelial tissue
2. Diagnosis
– Modeled as a supervised learning problem
– (Features, diagnosis)
– The goal: Accurately diagnose each cell (pixel) and
aggregate those diagnoses to generate a spot
(patient) diagnosis
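The slide leaves the pixel-to-spot aggregation rule unspecified; a minimal sketch assuming simple majority voting over the per-pixel labels (the voting rule is an assumption, not stated in the deck):

```python
from collections import Counter

def diagnose_spot(pixel_labels):
    """Aggregate per-pixel diagnoses into one spot (patient) diagnosis.
    Majority vote is assumed here; the deck does not specify the rule."""
    counts = Counter(pixel_labels)
    label, _ = counts.most_common(1)[0]
    return label
```

Aggregation is why spot-level accuracy can exceed pixel-level accuracy: scattered per-pixel errors are outvoted as long as most pixels in the spot are classified correctly.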
23. Filtered Tissue is Accurately Diagnosed
• Pixel cross-validation accuracy (87.34%)
• Spot accuracy
– 68 of 69 malignant spots
– 70 of 71 benign spots
• Human-competitive computer-aided diagnosis system
is possible
• First published results that fall in the range of
human error (<5%)
24. Breakthrough
• Current best published results, examples from
different fields:
– Image analysis – 77% accuracy [1] (cancer/no cancer)
– Raman spectroscopy – 86% accuracy [2]
– Genomic analysis – 76% accuracy [3] (low grade/high grade cancer)
1. R. Stotzka et al., Anal. Quant. Cytol. Histol., 17, 204-218 (1995).
2. P. Crow et al., Urol., 65, 1126-1130 (2005).
3. L. True et al., Proc. Natl. Acad. Sci. USA, 103(29), 10991-10996 (2006).
25. Conclusions
• Humans are the ultimate and only source of diagnosis
• FTIR imaging provides information about chemical
signatures and structure
• Large volumes of data forced efficient GBML design
• Diagnosis requires two steps
• The results on prostate cancer are human competitive
• No previous method has been able to match
pathologist accuracy
26. Towards Better than Human Capability in
Diagnosing Prostate Cancer
Using Infrared Spectroscopic Imaging
Xavier Llorà1, Rohith Reddy2,3, Brian Matesic2, Rohit Bhargava2,3
1 National Center for Supercomputing Applications & Illinois Genetic Algorithms Laboratory
2 Department of Bioengineering
3 Beckman Institute for Advanced Science and Technology
University of Illinois at Urbana-Champaign
Supported by AFOSR FA9550-06-1-0370, NSF at ISS-02-09199
DoD W81XWH-07-PRCP-NIA and the Faculty Fellows program at NCSA