7 Computational Giants of Massive Data Analysis
Instructor: Assoc. Prof. PhD. Nguyễn Thanh Bình
Master students:
Đoàn Đức Thế Anh (22C01001)
Võ Nam Thục Đoan (22C01004)
Nguyễn Ngọc Bảo Trân (22C01021)
Trần Trung Hiếu (22C01009)
CHAPTER 10
Massive data analysis:
• cannot be processed on a stand-alone computer
• requires the use of existing (distributed and parallel) hardware platforms
• poses challenges to traditional statistical methods and algorithms
• calls for an overall system architecture
Tasks of machine learning / data mining (with naive complexities):
1. Querying: orthogonal range-search, nearest-neighbor O(N); all-nearest-neighbors O(N²)
2. Density estimation: mixture of Gaussians, kernel density estimation O(N²); kernel conditional density estimation O(N³)
3. Classification: decision tree, nearest-neighbor classifier O(N²); support vector machine O(N³)
4. Regression: linear regression, LASSO, kernel regression O(N²); Gaussian process regression O(N³)
5. Dimension reduction: PCA, non-negative matrix factorization, kernel PCA O(N³); maximum variance unfolding O(N³)
6. Clustering: k-means, mean-shift O(N²); hierarchical (FoF) clustering O(N³)
7. Testing and matching: MST O(N³); bipartite cross-matching O(N³); n-point correlation 2-sample testing O(Nⁿ)
The “7 Computational Giants” of Data (computational problem types):
1. Basic statistics
2. Generalized N-body problems
3. Graph-theoretic computations
4. Linear-algebraic computations
5. Optimization
6. Integration
7. Alignment problems
Basic statistics
• Descriptive statistics: summarize the data and provide insights into its
  – central tendency: mean, median, mode
  – variability: variance, standard deviation, count, min, max, quartiles, skewness, and kurtosis
  – frequency distribution
N data points → O(N) calculations
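To make the O(N) claim concrete, here is a minimal one-pass sketch (plain Python, no external dependencies; the function name describe and the sample data are illustrative, and the variance uses Welford's online algorithm):

import math

def describe(xs):
    """One pass over N data points: O(N) time, O(1) extra space."""
    n, mean, m2 = 0, 0.0, 0.0
    lo, hi = float("inf"), float("-inf")
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n          # running mean (Welford)
        m2 += delta * (x - mean)   # running sum of squared deviations
        lo, hi = min(lo, x), max(hi, x)
    var = m2 / (n - 1) if n > 1 else 0.0
    return {"count": n, "mean": mean, "std": math.sqrt(var),
            "min": lo, "max": hi}

print(describe([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))

Median and quartiles additionally require sorting or selection; the summaries above need only a single O(N) pass.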
Basic statistics
• Inferential statistics:
  – generalize results to larger populations based on small samples
  – look at how things change over time
  – use sampling methods to find samples that are representative of the whole population
  – determine what is happening
N data points → O(N²) calculations
Why is statistical computing important in research and decision-making?
• Evidence-based analysis
• Exploring relationships between variables
• Evaluating the effectiveness of interventions
• Contributing to improved outcomes
• A vital role in fields such as healthcare, finance, marketing, and the social sciences
Basic statistics - Challenges
• High dimensionality → noise accumulation, spurious correlations, incidental endogeneity
• High dimensionality + large sample size → heavy computational cost, algorithmic instability
• Big² Data (from multiple sources, at different time points, using different technologies) → heterogeneity, experimental variations, statistical biases
These effects lead to wrong statistical inference and false scientific conclusions.
Basic statistics - Solutions
• New statistical thinking: variable selection, dimension reduction, new regularization methods, independence screening
• New computational methods: the development of new computational infrastructure and data storage methods
Generalized N-body problem
• In the 17th century, Sir Isaac Newton formulated:
  – the laws of motion
  – the law of universal gravitation
→ the behavior of objects and their interactions
→ Origin of the N-body problem: predicting the motions of N celestial objects interacting with each other gravitationally
• Karl Fritiof Sundman solved the case n = 3
• L. K. Babadzanjanz and Qiudong Wang generalized the solution to n > 3
N-body problem (figure captions):
• Three bodies with equal mass [published 2000]
• Three bodies of unequal mass
• Two pairs of bodies orbiting about each other
• An orbit discovered in 2008 by Tiancheng Ouyang, Duokui Yan, and Skyler Simmons at BYU
Generalized N-body problem - Challenges
• Numerical approximations
• Chaotic behavior
• Interdisciplinary nature
• Main obstacle: O(N²) pairwise interactions
Generalized N-body problem - Solutions
• Barnes-Hut Algorithm [Barnes and Hut, 87]: naive pairwise summation costs N(N-1)/2 = O(N²). Build a quadtree/octree over the points; for a query point xᵢ and a cell R of side length s at distance r, if s/r < θ, approximate the cell's contribution by its centroid:
  Σ_{xⱼ ∈ R} K(xᵢ, xⱼ) ≈ N_R · K(xᵢ, x̄_R),
where N_R is the number of points in R and x̄_R its centroid → O(N log N).
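As a concrete illustration (not from the slides): a minimal 2-D Barnes-Hut sketch in Python, assuming a softened 1/r kernel; THETA, Cell, bh_sum, and the random data are all illustrative choices.

import random

THETA = 0.5  # opening angle: a cell is "far enough" when size/distance < THETA

class Cell:
    """A quadtree cell tracking its point count and centroid."""
    def __init__(self, x0, y0, size):
        self.x0, self.y0, self.size = x0, y0, size
        self.n, self.cx, self.cy = 0, 0.0, 0.0
        self.children = None
        self.point = None

    def insert(self, p):
        # update running centroid and count
        self.cx = (self.cx * self.n + p[0]) / (self.n + 1)
        self.cy = (self.cy * self.n + p[1]) / (self.n + 1)
        self.n += 1
        if self.n == 1:              # empty leaf: store the point
            self.point = p
            return
        if self.children is None:    # split a full leaf into 4 quadrants
            h = self.size / 2
            self.children = [Cell(self.x0 + dx * h, self.y0 + dy * h, h)
                             for dy in (0, 1) for dx in (0, 1)]
            old, self.point = self.point, None
            self._child_for(old).insert(old)
        self._child_for(p).insert(p)

    def _child_for(self, p):
        h = self.size / 2
        ix = 1 if p[0] >= self.x0 + h else 0
        iy = 1 if p[1] >= self.y0 + h else 0
        return self.children[iy * 2 + ix]

def kernel(xi, xj):
    """Illustrative kernel K(x_i, x_j): a softened 1/r potential."""
    dx, dy = xi[0] - xj[0], xi[1] - xj[1]
    return 1.0 / (dx * dx + dy * dy + 1e-6) ** 0.5

def bh_sum(cell, xi):
    """Approximate sum_j K(xi, x_j) over the points in `cell`."""
    if cell.n == 0:
        return 0.0
    if cell.children is None and cell.n == 1:   # single-point leaf: exact
        return 0.0 if cell.point is xi else kernel(xi, cell.point)
    dx, dy = xi[0] - cell.cx, xi[1] - cell.cy
    r = (dx * dx + dy * dy) ** 0.5
    if r > 0 and cell.size / r < THETA:         # far cell: N_R * K(xi, centroid)
        return cell.n * kernel(xi, (cell.cx, cell.cy))
    return sum(bh_sum(c, xi) for c in cell.children)

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(2000)]
root = Cell(0.0, 0.0, 1.0)
for p in pts:
    root.insert(p)
exact = sum(kernel(pts[0], q) for q in pts if q is not pts[0])
print(f"exact={exact:.2f}  barnes-hut={bh_sum(root, pts[0]):.2f}")

Smaller θ trades speed for accuracy; θ = 0 recovers the exact O(N²) sum.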
Generalized N-body problem - Solutions
• Fast Multipole Method [Greengard and Rokhlin 1987]: the naive sum Φ(x) = Σᵢ K(x, xᵢ) over all N(N-1)/2 = O(N²) pairs is approximated by a multipole/Taylor expansion of order p on a quadtree → O(N).
[Callahan-Kosaraju 95]: O(N) is impossible for a log-depth tree.
Linear Algebraic computations
Problems involve matrix operations: solving linear systems, finding eigenvalues and eigenvectors, inverses, orthogonality, ...
Examples: linear regression, SVD, PCA, clustering, graph analysis, image processing (edge detection, compression, blurring, ...)
(Figure captions: linear regression, SVD, PCA, clustering, kernel for edge detection)
Linear Algebraic computations - Challenges
• Matrices with slowly decaying spectra → high computational complexity, sensitivity to noise.
• Nearly singular matrices, det(M) ≈ 0 → nearly non-invertible, sensitive to small changes in the matrix entries.
→ Some solution approaches:
• Truncated SVD, regularization, pseudoinverse via SVD
• Random sampling + statistical methods
  E.g.: choose a random submatrix from the given matrix, based on suitable probability distributions, to approximate the SVD of the whole.
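As a concrete illustration of the random-sampling idea, here is a minimal randomized-SVD sketch in NumPy; the rank k, oversampling p, and matrix sizes are illustrative, and this follows the standard random-projection recipe rather than anything prescribed by the slides.

import numpy as np

def randomized_svd(A, k, p=10):
    """Approximate the top-k SVD of A via random projection:
    O(mnk) work instead of a full O(mn*min(m,n)) SVD."""
    n = A.shape[1]
    # 1. Sketch A's column space with a random Gaussian test matrix.
    omega = np.random.default_rng(0).standard_normal((n, k + p))
    Q, _ = np.linalg.qr(A @ omega)        # orthonormal basis for the sketch
    # 2. Project A onto the basis and do a small exact SVD.
    B = Q.T @ A                           # (k+p, n), small
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]

A = np.random.default_rng(1).standard_normal((2000, 500))
U, s, Vt = randomized_svd(A, k=20)
err = np.linalg.norm(A - U @ np.diag(s) @ Vt) / np.linalg.norm(A)
print(f"relative error of rank-20 approximation: {err:.3f}")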
Linear Algebraic computations - Challenges
Other challenges:
• Optimization problems: generic LA approaches yield high training accuracy, which can cause overfitting
  → gradient descent, random sampling
• The data grows so massive that it cannot be stored or handled by a single device
  → distributed linear algebra (see the block-partitioned sketch below)
(Figure captions: gradient descent; matrices checkerboard-distributed on TPUs during multiplication)
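To sketch the checkerboard idea: each device owns one block of each operand, and the full product is assembled from block products. The NumPy toy below partitions the matrices on a single machine purely to illustrate the distribution pattern; the block size and shapes are illustrative.

import numpy as np

def blocked_matmul(A, B, bs):
    """Compute C = A @ B from bs-by-bs blocks, mimicking a 2-D
    (checkerboard) layout where each device owns one block of C."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2 and m % bs == k % bs == n % bs == 0
    C = np.zeros((m, n))
    for i in range(0, m, bs):          # block-row of C (device row)
        for j in range(0, n, bs):      # block-column of C (device column)
            for l in range(0, k, bs):  # accumulate partial block products
                C[i:i+bs, j:j+bs] += A[i:i+bs, l:l+bs] @ B[l:l+bs, j:j+bs]
    return C

rng = np.random.default_rng(0)
A, B = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
print(np.allclose(blocked_matmul(A, B, bs=4), A @ B))  # True

In a real distributed setting, each (i, j) block lives on its own device and only the needed A and B blocks are communicated.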
Optimization
Optimization problems appear in statistical methods from early on, and frequently.
E.g.: semidefinite programming in manifold learning.
→ Optimization generally focuses on minimizing/maximizing an objective function, from unconstrained to constrained problems, both convex and non-convex.
(Figure captions: linear programming; quadratic programming)
Optimization - Challenges
• A large number of variables and constraints
• Finding a global solution for non-convex problems is an open problem
• Problems with integer constraints (integer programming)
• Challenging problems, such as high-dimensional nonlinear objective problems, may contain multiple local optima in which deterministic optimization algorithms may get stuck
Optimization
Some approaches:
• Exploit the particular mathematical form of certain problems to find more effective optimizers.
  E.g.: Sequential Minimal Optimization decomposes the SVM training problem into sub-problems by iteratively selecting 2 Lagrange multipliers to solve.
• Stochastic optimization (introduce randomness) + online learning.
  E.g.: Stochastic Gradient Descent iteratively updates parameters with a random subset of the data instead of the entire data set (see the sketch below).
(Figure caption: online learning)
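A minimal mini-batch SGD sketch for linear least squares; the learning rate, batch size, and synthetic data are illustrative, and this is the generic recipe rather than code from the slides.

import numpy as np

rng = np.random.default_rng(0)
# Synthetic linear-regression data: y = X @ w_true + noise.
X = rng.standard_normal((10_000, 5))
w_true = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ w_true + 0.1 * rng.standard_normal(10_000)

w = np.zeros(5)
lr, batch = 0.01, 32
for step in range(2_000):
    idx = rng.integers(0, len(X), size=batch)   # random mini-batch
    Xb, yb = X[idx], y[idx]
    grad = 2 / batch * Xb.T @ (Xb @ w - yb)     # gradient of mean squared error
    w -= lr * grad                              # parameter update
print(np.round(w, 2))  # close to w_true

Each step touches only `batch` rows, so the cost per update is independent of N.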
Optimization
Some approaches:
• Distributed optimization.
  E.g.: TensorFlow, PyTorch distribute the optimization process (a) across processors, (b) across multiple nodes.
Graph-Theoretic Computations
• Graph-theoretic computations
involve traversing graphs, which
can be the data itself or
represent statistical models.
• Common statistical
computations on graphs include
betweenness centrality and
commute distances, used to
identify nodes or communities of
interest.
• Large-scale, sparse graphs
present computational
challenges for these
computations.
Challenges and Approaches
• Challenges: high interconnectivity in graphs, large maximal clique size, and memory constraints.
• Notable approaches:
  • Sampling and disk-based methods for handling large graphs.
  • Parallel/distributed approaches using sparse linear algebra or graph concepts.
  • Graph partitioning and linear algebraic preconditioning for efficient computations.
  • Transformation of graphical model inference problems into optimization or variational methods.
  • Sampling and parallel/distributed approaches for graphical model inference.
Additional applications:
• Manifold learning methods: Isomap requires an all-pairs-shortest-paths computation.
• Single-linkage hierarchical clustering: equivalent to computing a minimum spanning tree.
• These examples highlight the intersection between graph computations and distance-based or N-body-type problems.
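To illustrate the MST connection: cutting the k-1 heaviest MST edges yields exactly the clusters of single-linkage clustering with k clusters. A minimal Kruskal sketch in pure Python; the union-find helper and the tiny edge list are illustrative.

def kruskal_mst(n, edges):
    """edges: list of (weight, u, v). Returns the MST edge list.
    With union-find this is O(E log E), dominated by the sort."""
    parent = list(range(n))
    def find(x):                      # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    mst = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                  # edge joins two components: keep it
            parent[ru] = rv
            mst.append((w, u, v))
    return mst

# 5 points on a line at positions 0, 1, 2, 10, 11; pairwise distances as edges.
pos = [0, 1, 2, 10, 11]
edges = [(abs(pos[i] - pos[j]), i, j)
         for i in range(5) for j in range(i + 1, 5)]
print(kruskal_mst(5, edges))
# Dropping the heaviest MST edge (weight 8) gives clusters {0,1,2} and {3,4}.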
Integration in Data Analysis
• Integration is a key computation in data analysis, essential for Bayesian inference and statistical modeling.
• Challenges arise with high-dimensional integrals, requiring specialized approaches.
Approaches to High-Dimensional Integration
1. Markov Chain Monte Carlo (MCMC)
   – Default approach for high-dimensional integration (see the sketch after this list).
   – Utilizes a sequence of random samples to approximate the integral.
   – Widely used in Bayesian inference and random-effects models.
2. Approximate Bayesian Computation (ABC) Methods
   – Operate on summary data to accelerate computation.
   – Useful for cases where exact inference is challenging.
   – Achieve acceleration by working with population means or variances.
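A minimal Metropolis-Hastings sketch estimating E[||x||²] under a 10-dimensional standard Gaussian; the target, proposal scale, and sample counts are illustrative, and the exact answer is 10.

import numpy as np

rng = np.random.default_rng(0)
d = 10
log_target = lambda x: -0.5 * x @ x    # log-density of N(0, I_d), up to a constant

x = np.zeros(d)
samples, accepted = [], 0
for step in range(50_000):
    prop = x + 0.5 * rng.standard_normal(d)        # random-walk proposal
    if np.log(rng.random()) < log_target(prop) - log_target(x):
        x, accepted = prop, accepted + 1           # accept the move
    samples.append(x @ x)                          # integrand ||x||^2

burn = 5_000                                       # discard warm-up samples
print(f"E[||x||^2] ~= {np.mean(samples[burn:]):.2f} (exact: {d})")
print(f"acceptance rate: {accepted / 50_000:.2f}")

Only evaluations of the (unnormalized) density are needed, which is what makes MCMC viable when the normalizing integral itself is intractable.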
Alternative Approaches and Strategies
1. Population Monte Carlo
   – Form of adaptive importance sampling.
   – Enhances the efficiency of Monte Carlo integration.
   – Particularly useful for certain sequential models, such as particle filtering.
2. Variational Methods
   – Convert integration problems into optimization problems.
   – Provide a general framework for approximate inference.
   – Offer an alternative strategy to address high-dimensional integration challenges.
3. Optimization-Based Point Estimation
   – Skirts the full integration problem.
   – Used in approaches like maximum a posteriori inference and empirical Bayesian inference.
   – Involves optimizing point estimates rather than performing full Bayesian inference.
Alignment
Genomic data science
Genomic data science emerged as a field in the
1990s to bring together two laboratory activities:
Experimentation: Generating genomic
information from studying the genomes of
living organisms
Data analysis: Using statistical and
computational tools to analyze and
visualize genomic data, which includes
processing and storing data and using
algorithms and software to make
predictions based on available genomic
data
Facts
• Data about a single human genome sequence alone would take up 200 gigabytes.
• An estimated 40 exabytes will be needed to store the genome-sequence data generated worldwide by 2025.
DNA to RNA to Protein, Illustrating the Genetic Code
Sequence alignment
Questions about sequences
1. Biological question: “How similar are the genomes of humans and
chimpanzees?”
– Computational question: Given two sequences r and s, compute
their similarity, sim(s,r)
2. Biological question: “This gene causes obesity in mice. Do humans
have the same gene?”
– Computational question: Given a sequence r (the mouse gene)
and a database D of sequences (all human genes), find
sequences s in D where sim(r,s) is above a threshold
Questions about sequences
3. Biological question: “We know some mutations of this gene cause sickle-cell anemia. We have the sequences of 100 patients and 100 normal people. Let's find out the disease-causing mutations.”
   – Computational question: Given two sets of sequences of different lengths, find an alignment that maximizes the overall similarity. Then look for mutations that are unique to one group.
Patients:  ACGCGT   ACGCGT   ACGCGT
           CGCGT    _CGCGT   _CGCGT
           ACGCGA   ACGCGA   ACGCGA
Control:   AGCTT    A_GCTT   A_GCTT
           ACGCTT   ACGCTT   ACGCTT
           ACGCTA   ACGCTA   ACGCTA
Performing alignment makes it easy to compute the similarity between two sequences.
Scoring function
To compare the similarity of two strings up to changes such as mutation, insertion, and deletion. For the string AGGCCTC:
Mutation:  AGG A CTC
Insertion: AGG G CTCT
Deletion:  AGG . CTC
Symbols:
Match: +m
Mismatch: -s
Gap: -d
Simple scoring function: F = (#matches) × m - (#mismatches) × s - (#gaps) × d
The total score reflects the quality of the alignment.
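The editor's notes point to a Needleman-Wunsch demo; here is a minimal sketch of that algorithm using the scoring function above (the values m = 1, s = 1, d = 1 are illustrative).

def needleman_wunsch(r, s, m=1, mis=1, gap=1):
    """Global alignment score: F = #matches*m - #mismatches*mis - #gaps*gap.
    Classic O(len(r) * len(s)) dynamic program."""
    R, S = len(r), len(s)
    F = [[0] * (S + 1) for _ in range(R + 1)]
    for i in range(1, R + 1):
        F[i][0] = -i * gap                 # prefix of r aligned against gaps
    for j in range(1, S + 1):
        F[0][j] = -j * gap
    for i in range(1, R + 1):
        for j in range(1, S + 1):
            diag = F[i-1][j-1] + (m if r[i-1] == s[j-1] else -mis)
            F[i][j] = max(diag,            # match / mismatch
                          F[i-1][j] - gap, # gap in s
                          F[i][j-1] - gap) # gap in r
    return F[R][S]

print(needleman_wunsch("ACGCGT", "CGCGT"))  # 4: one leading gap, five matches

Tracing back through the table recovers the alignment itself, e.g. _CGCGT for the pair above.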
Standard of alignment: the highest score?
Problems
Solutions
Thank you for your time 😊
Editor's Notes

• #2 Massive data refers to a large amount of data that is too difficult to process using traditional tools like spreadsheets or text processors. It can exist in structured or unstructured form and consists of petabytes and exabytes of data. Big data can be analyzed for insights that improve decisions and give confidence for making strategic business moves. Processing massive data, also known as big data, can present several challenges, including: storage, processing speed, data quality, security, data integration, cost, and scalability.
• #3 Introduce massive data → system architecture.
• #16 Dimensionality reduction can be used for noise reduction, data visualization, cluster analysis, or as an intermediate step to facilitate other analyses.
• #17 The inverse of a nearly singular matrix may be highly sensitive to small changes in the matrix entries. Nearly non-invertible → iterative methods.
• #18 Distributed linear algebra: dividing the computational workload and data across multiple processing units.
• #19 Linear programming: determine the best outcome in a linear mathematical model, given a set of linear constraints; LA computations are a special case (2nd-order optimization). Quadratic programming: quadratic objective function and linear constraints. 2nd-order cone programming: linear objective, linear constraints including a 2nd-order cone. Semidefinite programming deals with the optimization of linear objective functions subject to linear matrix inequality constraints; it generalizes linear programming to handle optimization problems involving positive semidefinite matrices. Manifold learning: learning the structure of high-dimensional data and representing it with fewer dimensions.
• #20 Optimization problems are expressed as mathematical models; training an SVM requires solving a very large QP, which takes a lot of time. A stochastic program is an optimization problem in which some or all problem parameters are uncertain, but follow known probability distributions. This framework contrasts with deterministic optimization, in which all problem parameters are assumed to be known exactly.
• #21 SMO exploits the particular structure of the SVM's quadratic optimization problem by iteratively selecting two Lagrange multipliers and solving a sub-problem to update them. The objective function aims to maximize the margin between the decision boundary and the support vectors while minimizing the classification errors. The Lagrange multipliers (α values) are the variables to be optimized. The constraints ensure that the sum of the Lagrange multipliers weighted by the corresponding target variables is zero and that the Lagrange multipliers are within a specified range (0 ≤ α[i] ≤ C). In online learning, the GD algorithm in deep learning receives a sequence of data points one at a time and updates its model iteratively. Stochastic optimization is the use of randomness in the objective function or in the optimization algorithm.
• #38 To compare the similarity between two strings under changes such as mutation, insertion, or deletion; example string AGGCCTC. Interactive demo for Needleman–Wunsch algorithm (mostafa.io)
• #39 Criteria for evaluating an alignment.
• #41 To address these problems and achieve computational efficiency, one can look to the following directions: sampling, parallel/distributed computing, algorithms.
  • #44 Interactive demo for Needleman–Wunsch algorithm (mostafa.io)