1. Classification of tumor metastasis data
using a quantum kernel-based algorithm
Team: QKBA
Mentors: Leo (NVIDIA), Jay (NVIDIA)
Mentees: Tai-Yue Li (National Synchrotron Radiation Research Center), Cheng-Fang Su (National Yang Ming Chiao Tung University), Chung-Hsing Hsu (Asia University), Kuan-Cheng Chen (QuEST), Ka-Lok Ng (Asia University)
tim312508@gmail.com, scf1204@nycu.edu.tw
2. Introduction
[Diagram: cancer cells spread from the primary tumor through the blood or lymphatic system to form a new tumor, leading to high mortality.]
We need an effective method to predict tumor metastasis so that patients can take preventive action in advance.
4. Introduction
Currently, we don't have effective methods for finding biomarkers of tumor metastasis.
How effective is the classification of tumor metastasis data using the QSVM algorithm?
Supervised learning with quantum-enhanced feature spaces, Nature 567, 209–212 (2019)
5. Introduction
[Diagram: two data points x1 and x2, each with two features, are mapped from the classical data space into a quantum feature space by a quantum feature map circuit U(x), a non-linear map that uses n qubits for n features. Applying U(x1) followed by U†(x2) to |0…0⟩ and measuring the expectation value at the zero state gives the transition amplitude of the two data points, |⟨0|U†(x2)U(x1)|0⟩|². Computing this value for all 0.5·m² pairs of the m data points yields the quantum kernel matrix, which is passed to a classical SVM for classification. The computation time cost scales as 2^n with the number of qubits (from the number of features) and as 0.5·m² with the number of data points.]
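The transition-amplitude kernel above can be sketched with plain NumPy statevector math. The RY angle-encoding feature map below is an illustrative assumption — the feature map in the Nature paper is deeper and includes entangling gates:

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def feature_state(x):
    """Apply U(x)|0...0>: one RY(x_f) per qubit (toy angle encoding)."""
    U = np.array([[1.0]])
    for theta in x:              # n features -> n qubits
        U = np.kron(U, ry(theta))
    psi0 = np.zeros(U.shape[0])
    psi0[0] = 1.0
    return U @ psi0

def quantum_kernel(X):
    """K[i, j] = |<0|U†(x_j) U(x_i)|0>|^2, the transition amplitude."""
    states = [feature_state(x) for x in X]
    m = len(states)
    K = np.empty((m, m))
    for i in range(m):           # only 0.5*m^2 pairs are distinct
        for j in range(m):
            K[i, j] = abs(np.vdot(states[j], states[i])) ** 2
    return K

X = np.array([[0.1, 0.7], [1.2, 0.3]])   # 2 data points, 2 features -> 2 qubits
K = quantum_kernel(X)
```

The resulting K is symmetric with a unit diagonal and can be passed to a classical SVM (e.g., scikit-learn's `SVC(kernel='precomputed')`).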
6. Issue
Insufficient computing resources and long computing times are the main problems.
We screened out 48 important gene-expression features for tumor metastasis classification.
Ø With one qubit per feature, the amount of calculation required is as high as 2 to the 48th power (2^48).
Classical computer
• Difficult to perform
Real quantum computer
• Calculation cost is very high
• Results will be affected by noise
Although principal component analysis (PCA) can reduce the number of qubits enough to make the program executable on a classical computer:
Ø As the number of qubits and the amount of data increase, the calculation time still increases significantly.
Ø We also need to know which genes are related to tumor metastasis, and PCA components no longer correspond to individual genes.
Therefore, a better solution is to alleviate the limitation on computing resources.
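For reference, the PCA reduction mentioned above can be sketched in a few lines of NumPy. The shapes follow the poster (334 samples × 48 genes); the random data is only a placeholder:

```python
import numpy as np

def pca_reduce(X, k):
    """Project samples onto the top-k principal components (SVD-based PCA).
    Reducing 48 gene-expression features to k features means the QSVM
    circuit needs only k qubits instead of 48."""
    Xc = X - X.mean(axis=0)                     # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                        # scores in the top-k subspace

rng = np.random.default_rng(0)
X = rng.normal(size=(334, 48))   # placeholder for 334 samples x 48 genes
Z = pca_reduce(X, 8)             # 8 qubits instead of 48
```

The trade-off noted above remains: each component is a mixture of all 48 genes, so per-gene interpretability is lost.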
7. Solution
We use the cuQuantum SDK to accelerate the GPU-simulated QSVM.
cuQuantum provides two technologies, cuStateVec and cuTensorNet, to accelerate the matrix calculations on the GPU.
The quantum circuit operations can be regarded as a series of matrix operations.
[Diagram: the QSVM quantum circuit — U(x1) followed by U†(x2) and a measurement M on n qubits initialized to |0⟩ — rewritten as a series of 2^n × 2^n matrix operations on the statevector.]
In this hackathon we:
1. Solved the slow-calculation problem in Qiskit.
2. Installed cuStateVec to support a GPU quantum computer simulator.
3. Used cuTensorNet to address the insufficient-memory problem.
https://developer.nvidia.com/cuquantum-sdk
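The "series of matrix operations" view can be sketched on the CPU with NumPy; GPU backends such as cuStateVec accelerate exactly these dense matrix-vector products. This sketch applies a plain Hadamard layer rather than the QSVM feature map:

```python
import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)   # Hadamard gate
I2 = np.eye(2)

def gate_on(n, q, g):
    """Lift a single-qubit gate g on qubit q to the full 2^n x 2^n space."""
    M = np.array([[1.0]])
    for i in range(n):
        M = np.kron(M, g if i == q else I2)
    return M

n = 3
psi = np.zeros(2 ** n)
psi[0] = 1.0                       # |000>
for q in range(n):                 # a layer of Hadamards, one matrix at a time
    psi = gate_on(n, q, H) @ psi   # dense 2^n x 2^n matrix-vector product
```

Each step touches a vector of length 2^n, which is why the cost (and memory) scales exponentially with the qubit count and why offloading these products to the GPU pays off.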
8. Result
[Plot: QSVM runtime test — time (min, log scale 0.01–100) versus number of qubits (17–35) for six configurations: 334 data and 2 data, each on CPU, cuStateVec with an A6000, and cuStateVec with 2× A100. Measured times include 35.8, 34.1, 21.7, 3, 1.9, and 1.1 min, with GPU speedups of 1.6×, 11.4×, 19×, and 33×.]
10. Future work
Ø We will complete the cuTensorNet code for running the QSVM quantum circuit.
Ø We need to solve the overfitting problem in our study.
Ø We will investigate how data size, circuit depth, and qubit number jointly limit QSVM operation time.
[Diagram: data size, circuit depth, and qubit number all feed into the QSVM cuQuantum runtime.]
[Plot: train and test accuracy (84–102%) versus dimension (0–24).]
11. Thanks
In this hackathon we learned a great deal about cuQuantum and GPUs, and we hope these tools can accelerate quantum computing research in the future.
Using the cuQuantum SDK to accelerate GPU-simulated quantum computing brings hope to our research.
13. Result
Generative modeling is not typically used for binary classification tasks, as it is
designed to model the probability distribution of the data rather than to classify it
directly. However, it is possible to use generative models for binary classification tasks
by combining them with a discriminative model.
One approach is to use a generative model, such as a Gaussian mixture model or a
variational autoencoder, to model the distribution of the data for each class. Then, a
discriminative model, such as a logistic regression or a neural network, can be trained
on the latent variables generated by the generative model to perform binary
classification.
Another approach is to use a generative adversarial network (GAN) to generate
synthetic data for each class. The discriminator of the GAN can then be trained to
classify the real and synthetic data as belonging to one of the two classes.
In both cases, it is important to ensure that the generative model is able to capture the
underlying distribution of the data for each class accurately. Otherwise, the
performance of the discriminative model will be limited.
14. Result
Here are the steps for the 1st approach, using a generative model and a discriminative model for binary classification:
1. Define the problem: Let's say we want to classify images as either cats or dogs. We have a relatively small dataset of labeled images, and we want to use a generative model to augment the dataset and improve the performance of our classifier.
2. Train the generative model: We train a generative model, such as a Gaussian mixture model or a variational autoencoder, on the labeled images in our dataset. The generative model learns the underlying probability distribution of the data for each class (i.e., the distribution of cat images and the distribution of dog images). Once trained, it can generate synthetic images of cats and dogs that are similar to the real images in the dataset.
3. Generate synthetic data: Using the trained generative model, we generate synthetic images of cats and dogs to augment our dataset. We can control the number and type of synthetic images by adjusting the input to the generative model.
4. Train the discriminative model: We train a discriminative model, such as a logistic regression or a neural network, on the latent variables produced by the generative model. The discriminative model learns to classify the images as either cats or dogs based on those latent variables.
5. Evaluate the performance: We evaluate the classifier on a separate test set to see whether the synthetic data has improved its performance. If it has, we can continue generating synthetic data to augment the dataset further.
Overall, using a generative model to model the data distribution and generate synthetic data can be a powerful tool for improving the performance of binary classifiers, especially when the original dataset is small or unbalanced.
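These steps can be sketched end-to-end with deliberately simple stand-ins: one diagonal Gaussian per class as the generative model and a hand-rolled logistic regression on per-class log-likelihoods as the discriminative model. All data and hyperparameters below are illustrative, not from the study:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 2-D, 2-class data as a stand-in for image features (illustrative).
X0 = rng.normal(loc=-1.0, scale=1.0, size=(100, 2))   # class 0 ("cats")
X1 = rng.normal(loc=+1.0, scale=1.0, size=(100, 2))   # class 1 ("dogs")

# Step 2 - generative model: one diagonal Gaussian per class.
def fit_gaussian(X):
    return X.mean(axis=0), X.var(axis=0) + 1e-6

def log_likelihood(X, mu, var):
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)

g0, g1 = fit_gaussian(X0), fit_gaussian(X1)

# Step 3 - generate synthetic samples from each class model to augment.
S0 = rng.normal(g0[0], np.sqrt(g0[1]), size=(100, 2))
S1 = rng.normal(g1[0], np.sqrt(g1[1]), size=(100, 2))
X = np.vstack([X0, S0, X1, S1])
y = np.array([0] * 200 + [1] * 200)

# Step 4 - discriminative model: logistic regression (gradient descent)
# on the two per-class log-likelihoods, used here as simple latent features.
F = np.column_stack([log_likelihood(X, *g0), log_likelihood(X, *g1)])
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-np.clip(F @ w + b, -30, 30)))
    w -= 0.1 * F.T @ (p - y) / len(y)
    b -= 0.1 * np.mean(p - y)

# Step 5 - evaluate (on the training pool here; use a held-out set in practice).
pred = (1 / (1 + np.exp(-np.clip(F @ w + b, -30, 30))) > 0.5).astype(int)
accuracy = (pred == y).mean()
```

A real pipeline would replace the single Gaussian with a mixture model or VAE and the likelihood features with the model's latent codes, but the structure of the five steps is the same.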
15. Result
Here's an example of the 2nd approach, using a GAN for binary classification:
1. Define the problem: Let's say we want to classify images as either cats or dogs. We have a dataset of cat and dog images, but it is relatively small, and we want to generate synthetic images to augment the dataset and improve the performance of our classifier.
2. Build the GAN: We build a GAN consisting of a generator and a discriminator. The generator takes in a random noise vector and outputs a synthetic image, while the discriminator takes in an image and predicts whether it is real (i.e., from the dataset) or fake (i.e., produced by the generator). We train the two networks adversarially: the generator tries to fool the discriminator, and the discriminator tries to correctly classify images as real or fake.
3. Train the GAN: We train the GAN on the images in our dataset until the generator produces realistic-looking synthetic images of cats and dogs.
4. Generate synthetic data: Once the GAN is trained, we use the generator to produce synthetic images of cats and dogs. We can control the number and type of synthetic images by adjusting the random noise vector fed to the generator.
5. Train the classifier: We combine the real and synthetic images into an augmented dataset and train our binary classifier on it. The classifier should classify both the real and synthetic images accurately.
6. Evaluate the performance: We evaluate the classifier on a separate test set to see whether the synthetic data has improved its performance. If it has, we can continue generating synthetic data to augment the dataset further.
Overall, using a GAN to generate synthetic data can be a powerful tool for improving the performance of binary classifiers, especially when the original dataset is small or unbalanced.
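The adversarial loop above can be illustrated with a minimal 1-D GAN written from scratch in NumPy. Everything here — the tiny MLPs, learning rate, and the N(4, 0.5) "real" distribution — is an illustrative assumption, not the image GAN the text describes:

```python
import numpy as np

rng = np.random.default_rng(0)

def init(n_in, n_h, n_out):
    """Tiny one-hidden-layer MLP parameters: [W1, b1, W2, b2]."""
    return [rng.normal(0, 0.5, (n_in, n_h)), np.zeros(n_h),
            rng.normal(0, 0.5, (n_h, n_out)), np.zeros(n_out)]

def mlp(params, x):
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2, h                     # (output/logits, hidden layer)

G, D = init(1, 8, 1), init(1, 8, 1)           # generator and discriminator
real = rng.normal(4.0, 0.5, (64, 1))          # "real" data: N(4, 0.5)
lr = 0.05

for step in range(200):
    # --- discriminator step: push D(real) -> 1 and D(fake) -> 0 ---
    fake, _ = mlp(G, rng.normal(size=(64, 1)))
    for x, target in [(real, 1.0), (fake, 0.0)]:
        logits, h = mlp(D, x)
        p = 1 / (1 + np.exp(-logits))
        gl = (p - target) / len(x)            # d(BCE)/d(logits)
        gh = (gl @ D[2].T) * (1 - h ** 2)     # backprop through tanh
        D[2] -= lr * h.T @ gl; D[3] -= lr * gl.sum(0)
        D[0] -= lr * x.T @ gh; D[1] -= lr * gh.sum(0)
    # --- generator step: make D classify fakes as real (non-saturating) ---
    z = rng.normal(size=(64, 1))
    hG = np.tanh(z @ G[0] + G[1])
    fake = hG @ G[2] + G[3]
    logits, hD = mlp(D, fake)
    p = 1 / (1 + np.exp(-logits))
    gl = (p - 1.0) / len(z)
    gfake = ((gl @ D[2].T) * (1 - hD ** 2)) @ D[0].T   # dL/d(fake samples)
    ghG = (gfake @ G[2].T) * (1 - hG ** 2)
    G[2] -= lr * hG.T @ gfake; G[3] -= lr * gfake.sum(0)
    G[0] -= lr * z.T @ ghG; G[1] -= lr * ghG.sum(0)

synthetic, _ = mlp(G, rng.normal(size=(200, 1)))   # step 4: augmentation data
```

After training, `synthetic` plays the role of the generated images in steps 4 and 5: it is combined with the real samples to form the augmented training set for the classifier.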