AI-Bio Convergence Specialist Course
2022-08 to 2022-10
윤형기 (H K Yoon) (hky@openwith.net)
Day 4
Topic / Details
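Day 1: Welcome and course introduction
Introductions
Survey of participants' backgrounds and goals
Overview of medicine/bio (technology and industry): technology and industry trends in medicine/bio
Foundation technology (1-1) Python and analysis packages: Analysis tools (1) (Python, SciPy, NumPy/pandas)
Day 2: Foundation technology (1-2) R and statistical analysis: Analysis tools (2) (R and statistics)
Applied biostatistics (1): bioinformatics data and ANOVA, multivariate analysis, etc.
Genome analysis
Day 3: Applied biostatistics (2): meta-analysis
Omics analysis (1)
Genome analysis
Transcriptome analysis
Day 4: Omics analysis (2)
Epigenome analysis
Proteome analysis
Next-generation sequencing
GenBank and NCBI data
VCF data analysis, NGS data processing, etc.
Day 5: Foundation technology (3) Machine learning (1)
Modeling methodology (model concepts and cross-validation)
Supervised learning algorithms (linear models, classification)
Foundation technology (3) Machine learning (2): unsupervised learning algorithms (clustering, association analysis, etc.)
Day 6: Supervised learning and bioinformatics applications
Predictive models for medical data
Linear models and classification of healthcare data
Unsupervised learning and bioinformatics applications
Association analysis of clinical data
Comorbidity analysis
Understanding the medical/bio domain
Healthcare datasets and biostatistics
Bio data and machine learning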
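Schedule
Topic / Details
Day 7: Foundation technology (4) Deep learning (1): neural network training and deep learning models
Foundation technology (3) Deep learning (2)
TensorFlow
PyTorch
Day 8: Deep learning and bioinformatics applications
Healthcare simulation using Bi-LSTM
Skin disease identification using deep learning
Ontologies and bioinformatics applications
The Semantic Web and ontologies
Bioinformatics applications of ontologies
Day 9: Foundation technology (3) Image processing: overview of image processing and computer vision
Medical image analysis (1)
Segmentation
Image registration
Day 10: Medical image analysis (2)
Electrocardiogram (ECG)
Rendering and surface models
MRI
Day 11: Foundation technology (4) Bioinformatics and computational chemistry: overview of computational chemistry
Drug discovery (1)
Target identification
Reagent and assay development
ADME (absorption, distribution, metabolism, excretion)
Toxicology and machine learning applications
Day 12: Foundation technology (5) GAN: GANs (Generative Adversarial Networks) and VAEs
Drug discovery and GANs: recommending drug candidates with generative models
Wrap-up: overall summary
Medical image analysis
Drug analysis and drug design
Bio data and deep learning
Genome analysis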
Key topics in bioinformatics
• Sequence alignment
– Pairwise Sequence Alignment
– Database similarity searching
– Multiple Sequence Alignment
– Profiles and HMMs
– Protein Motif and Domain Prediction
• Gene and promoter prediction
– Gene prediction
– Promoter and Regulatory Element Prediction
• Molecular Phylogenetics
– Phylogenetics Basics
– Phylogenetic Tree Construction Methods and Programs
• Structural Bioinformatics
– Protein structure visualization, comparison, and classification
– Protein structure prediction (secondary, tertiary)
– RNA structure prediction
• Genomics and Proteomics
– Genome mapping, assembly, and comparison
– Functional genomics
– Proteomics
• Genome rearrangements
• Motif finding
• Gene expression analysis
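Sequence alignment
Supplement: the genetic code
• 1. Overview
– The set of rules that determines which amino acid each codon encodes
• 2. Codon
– A unit of genetic information in RNA that specifies an amino acid of a protein
– RNA bases: uracil (U), guanine (G), cytosine (C), adenine (A)
– One codon consists of 3 bases, so in theory 4×4×4 = 64 different codons can be specified.
• 3. Types
– 3.1. Start codon
• 5'-AUG-3' (some bacteria also use alternative start codons).
• Specifies methionine (Met) in eukaryotes and
N-formylmethionine (fMet) in prokaryotes.
• Also serves as the signal at which the mRNA, bound to the ribosome, begins to be translated into protein
– 3.2. Stop codon (nonsense codon)
• A codon that signals the end of translation; there are three: UAA, UAG, and UGA
• No tRNA corresponds to a stop codon. Instead, a protein called a release factor binds, and when translation
reaches a stop codon the two ribosomal subunits separate and translation terminates.
– 3.3. Anticodon
• The base sequence of a specific segment of the tRNA chain that pairs with the complementary codon on the mRNA.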
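The start and stop codon rules above can be sketched in a few lines of Python. This is a minimal illustration, not a full translation routine: the function name `first_orf_codons` and the example mRNA string are made up for this sketch, and the code only splits a reading frame into codons without mapping them to amino acids.

```python
# Stop codons and start codon as listed in the slide.
STOP_CODONS = {"UAA", "UAG", "UGA"}
START_CODON = "AUG"

def first_orf_codons(mrna):
    """Return the codons from the first AUG up to (not including) the
    first in-frame stop codon. Hypothetical helper for illustration."""
    mrna = mrna.upper()
    start = mrna.find(START_CODON)
    if start == -1:
        return []  # no start codon found
    codons = []
    # Read three bases at a time, in the frame set by the start codon.
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons

# 4 bases at each of 3 positions give the 64 possible codons.
print(4 ** 3)
# Made-up mRNA: GG | AUG GCU UUU UAA | GG
print(first_orf_codons("GGAUGGCUUUUUAAGG"))
```

If no in-frame stop codon is present, this sketch simply returns every complete codon up to the end of the string.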
Pairwise Sequence Alignment
• Background
• Sequence Homology vs. Sequence Similarity
• Sequence Similarity vs. Sequence Identity
• Methods
– Global Alignment and Local Alignment
– Alignment Algorithms
– Dot Matrix Method
– Dynamic Programming Method
• Gap Penalties
• Dynamic Programming for Global Alignment
• Dynamic Programming for Local Alignment
• Scoring Matrices
– Amino Acid Scoring Matrices
– PAM Matrices
– BLOSUM Matrices
– Comparison between PAM and BLOSUM
• Statistical Significance of Sequence Alignment
• (Goal)
• Sequence comparison
→ find “common character patterns” and residue–residue correspondences
• Background: evolution
• DNA and proteins are products of evolution
– The degree of sequence conservation in the alignment reveals
evolutionary relatedness of different sequences, whereas the
variation between sequences reflects the changes that have occurred
during evolution in the form of substitutions, insertions, and
deletions.
• sequence alignment
– can be used as basis for prediction of structure and function of
uncharacterized sequences.
– provides inference for the relatedness of two sequences under study.
Sequence Homology vs. Similarity
• (…)
– Distinguishing the terms
• Homologous relationship or share homology.
– an inference or a conclusion about a common ancestral relationship
drawn from sequence similarity comparison when the two sequences
share a high enough degree of similarity. (qualitative)
• Sequence similarity
– is a direct result of observation from the sequence alignment.
– % of aligned residues that are similar in physicochemical properties
such as size, charge, and hydrophobicity. (quantitative)
– The question is what level of sequence similarity indicates homology
• Nucleotide sequences consist of only 4 characters → any aligned
position in two unrelated sequences has about a 25% chance of being identical.
• Protein sequences have 20 possible amino acid residues → two
unrelated sequences can match at about 5% of the residues by random
chance.
– However, % identity values provide only tentative guidance for homology
identification
3 zones of protein sequence alignments. (Source: Modified from Rost 1999).
Sequence Similarity vs. Sequence Identity
• (…)
• For nucleotide sequences, the two terms mean essentially the same thing
• For protein sequences, they must be distinguished
– Sequence identity = % of matches of the same amino acid residues
between two aligned sequences.
– Similarity = % of aligned residues that have similar physicochemical
characteristics and can be more readily substituted for each other.
– Two ways to calculate sequence similarity and identity
– One uses the overall lengths of both sequences;
– the other normalizes by the length of the shorter sequence.
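The two normalizations can be sketched as follows. This is an illustrative sketch under stated assumptions: the slide does not give the formulas, so the code assumes the common conventions 100 × 2 × L_match / (L_a + L_b) for the overall-length version and 100 × L_match / L_shorter for the shorter-sequence version. The similarity groups in `SIMILAR_GROUPS` are a made-up simplification, not a standard classification.

```python
# Hypothetical groups of physicochemically similar residues (illustration only).
SIMILAR_GROUPS = [set("ILVM"), set("FWY"), set("KRH"), set("DE"), set("ST"), set("NQ")]

def are_similar(a, b):
    """Identical residues, or residues in the same assumed group."""
    return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)

def identity_and_similarity(aligned_a, aligned_b):
    """Percent identity/similarity of two gapped, equal-length alignment rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    len_a = len(aligned_a.replace("-", ""))   # ungapped sequence lengths
    len_b = len(aligned_b.replace("-", ""))
    # Only columns where both rows have a residue can match.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    n_identical = sum(x == y for x, y in pairs)
    n_similar = sum(are_similar(x, y) for x, y in pairs)
    shorter = min(len_a, len_b)
    return {
        # Method 1 (assumed form): normalize by the overall lengths of both sequences.
        "identity_overall": 100 * 2 * n_identical / (len_a + len_b),
        "similarity_overall": 100 * 2 * n_similar / (len_a + len_b),
        # Method 2 (assumed form): normalize by the shorter sequence.
        "identity_shorter": 100 * n_identical / shorter,
        "similarity_shorter": 100 * n_similar / shorter,
    }

result = identity_and_similarity("MKV-LE", "MRVAID")
print(result)
```

In the toy alignment, two of the five aligned residue pairs are identical (M/M, V/V) while all five are similar under the assumed groups, which shows why identity and similarity must be kept apart for proteins.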
Methods
• Global Alignment and Local Alignment
• Global Alignment
– Compares the two sequences from beginning to end
» is more applicable for aligning two closely related sequences of
roughly the same length.
» For divergent sequences and sequences of variable lengths, this
method may not be able to generate optimal results because it
fails to recognize highly similar local regions between the two
sequences.
• Local alignment
– only finds local regions with the highest level of similarity between
the two sequences and aligns these regions without regard for the
alignment of the rest of the sequence regions
– Two sequences to be aligned can be of different lengths
Example of pairwise sequence comparison
• Alignment algorithms
– Dot Matrix Method (= dot plot method)
– Dynamic Programming Method
• Gap Penalties
• Dynamic Programming for Global Alignment
• Dynamic Programming for Local Alignment
– Word method
– Dot Matrix Method
Example of sequence comparison by dot plot. Lines linking the dots in diagonals indicate
sequence alignment. Diagonal lines above or below the main diagonal
represent internal repeats of either sequence.
• Problem when comparing large sequences using dot matrix
method
– high noise level.
» In most dot plots, dots are plotted all over the graph, obscuring
identification of the true alignment. This is particularly acute for DNA
sequences because there are only 4 possible characters in DNA, so each
residue has a 1-in-4 chance of matching a residue in
another sequence.
» To reduce noise, instead of using a single residue to scan for
similarity, a filtering technique has to be applied, which uses a
“window” of fixed length covering a stretch of residue pairs.
• self comparison as a variation of using the dot plot method.
– The main diagonal shows the perfect match of each residue with itself → used to identify
internal repeat elements
– If repeats are present, short parallel lines are observed above and
below the main diagonal.
» Self complementarity of DNA sequences (also called inverted
repeats) can also be identified using a dot plot.
» In this case, a DNA sequence is compared with its reverse-
complemented sequence.
– Parallel diagonals represent the inverted repeats.
– Advantages
» easy identification of greatest similarities.
– Disadvantages
» it is often up to the user to construct a full alignment with
insertions and deletions by linking nearby diagonals.
» it lacks statistical rigor in assessing the quality of the alignment.
» is also restricted to pairwise alignment. It is difficult for the
method to scale up to multiple alignment.
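The dot matrix method and its window filter can be sketched in a few lines. This is a minimal illustration: the function names, the `min_matches` parameter, and the toy sequence are assumptions made for the example.

```python
def dot_plot(seq_a, seq_b, window=1, min_matches=None):
    """Return the set of (i, j) positions where the window starting at i in
    seq_a and at j in seq_b has at least min_matches identical residues.
    window=1 is the unfiltered dot plot."""
    if min_matches is None:
        min_matches = window  # require a perfect match over the window
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            if matches >= min_matches:
                dots.add((i, j))
    return dots

def render(seq_a, seq_b, dots):
    """Draw the plot as text: rows are seq_a, columns are seq_b."""
    lines = ["  " + seq_b]
    for i, ch in enumerate(seq_a):
        row = "".join("*" if (i, j) in dots else "." for j in range(len(seq_b)))
        lines.append(ch + " " + row)
    return "\n".join(lines)

# Self-comparison of a toy sequence containing the internal repeat ATG.
seq = "ATGCATGA"
noisy = dot_plot(seq, seq, window=1)
filtered = dot_plot(seq, seq, window=3)
print(render(seq, seq, noisy))
print()
print(render(seq, seq, filtered))
```

With a window of 3, the isolated single-residue dots disappear, leaving the main diagonal and the off-diagonal dots produced by the repeated ATG.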
– Dynamic Programming Method
• (…)
– convert a dot matrix into a scoring matrix to account for matches
and mismatches between sequences. By searching for the set of
highest scores in this matrix, the best alignment can be accurately
obtained.
– construct a 2-D matrix.
» The residue matching is according to a particular scoring matrix.
The scores are calculated one row at a time. This starts with the
first row of one sequence, which is used to scan through the
entire length of the other sequence, followed by scanning of
the second row. The matching scores are calculated.
• Gap Penalties
– Apply gaps that represent insertions and deletions.
– cost difference between opening a gap and extending an existing
gap.
» It is easier to extend a gap that has already been opened. Gap opening
therefore carries a much higher penalty than gap extension → when insertions and
deletions occur, several adjacent residues are likely to have
been inserted or deleted together.
» These differential gap penalties are called affine gap penalties.
» Strategy: use preset gap penalty values for introducing and
extending gaps.
» The total gap penalty (W) is a linear function of gap length:
W = γ + δ × (k − 1)
γ = gap opening penalty,
δ = gap extension penalty,
k = length of the gap.
» A constant gap penalty (the same value for every gap position, whether opening or extending) is less realistic.
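The difference between affine and constant gap penalties can be checked with a small calculation. This sketch assumes the convention W = γ + δ × (k − 1), in which the opening penalty covers the first gap position. The numeric values 11 and 1 are illustrative only.

```python
def affine_gap_penalty(k, gap_open=11.0, gap_extend=1.0):
    """Total penalty W for one gap of length k, assuming
    W = gap_open + gap_extend * (k - 1). Values are illustrative."""
    if k <= 0:
        return 0.0
    return gap_open + gap_extend * (k - 1)

def constant_gap_penalty(k, per_position=11.0):
    """Same penalty for every gap position, opening or extending."""
    return per_position * max(k, 0)

# One gap of length 5 versus five separate gaps of length 1.
one_long_gap = affine_gap_penalty(5)
five_short_gaps = 5 * affine_gap_penalty(1)
print(one_long_gap, five_short_gaps)
# Under a constant penalty the two cases cost the same.
print(constant_gap_penalty(5), 5 * constant_gap_penalty(1))
```

Under the affine scheme, a single five-residue gap is far cheaper than five scattered one-residue gaps, which matches the observation that adjacent residues tend to be inserted or deleted together.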
• DP for Global Alignment (Needleman–Wunsch algorithm)
– an optimal alignment is obtained over the entire lengths of the two
sequences.
– Drawback = risk of missing the best local similarity → only suitable
for aligning two closely related sequences that are of the same
length. (For divergent sequences or sequences with different domain
structures, the approach does not produce optimal alignment)
• DP for Local Alignment (Smith–Waterman algorithm)
– identification of regional sequence similarity
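Both dynamic programming variants can be sketched with one function. This is a simplified illustration, not a reference implementation: it assumes a plain match/mismatch score and a linear gap penalty (not the affine scheme or a substitution matrix), and the function name `align` and its defaults are chosen for this example.

```python
def align(seq_a, seq_b, match=1, mismatch=-1, gap=-2, local=False):
    """Needleman-Wunsch (local=False) or Smith-Waterman (local=True)
    with a linear gap penalty. Returns (score, aligned_a, aligned_b)."""
    n, m = len(seq_a), len(seq_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global alignment: leading gaps are penalized.
        for i in range(1, n + 1):
            score[i][0] = i * gap
        for j in range(1, m + 1):
            score[0][j] = j * gap

    def sub(i, j):
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cell = max(score[i - 1][j - 1] + sub(i, j),  # (mis)match
                       score[i - 1][j] + gap,            # gap in seq_b
                       score[i][j - 1] + gap)            # gap in seq_a
            if local:
                cell = max(cell, 0)  # local alignment never goes below zero
                if cell > best:
                    best, best_pos = cell, (i, j)
            score[i][j] = cell

    # Traceback: from the corner (global) or from the best cell (local).
    i, j = best_pos if local else (n, m)
    final_score = score[i][j]
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if local and score[i][j] == 0:
            break
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(seq_a[i - 1]); out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(seq_b[j - 1])
            j -= 1
    return final_score, "".join(reversed(out_a)), "".join(reversed(out_b))

print(align("ACGT", "AGT"))                           # global
print(align("TTTACGTTT", "GGACGGG", local=True))      # local
```

The global call aligns both sequences end to end and must pay for a gap, while the local call reports only the shared ACG region and ignores the dissimilar flanks.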
Scoring Matrices
• (…) = a substitution matrix
• is derived from statistical analysis of residue substitution data
from sets of reliable alignments of highly related sequences.
– For nucleotide sequences, a positive value or high score is given for a match and a negative
value or low score for a mismatch.
– Assumption: the frequencies of mutation are equal for all bases.
This assumption is unrealistic, however.
• Scoring matrices for amino acids are more complicated
– → they must reflect the physicochemical properties of amino acid residues, as well as
the likelihood of certain residues being substituted among true
homologous sequences.
– Certain amino acids with similar physicochemical properties can be
more easily substituted than those without similar characteristics.
Substitutions among similar residues are likely to preserve the
essential functional and structural features. However, substitutions
between residues of different physicochemical properties are more
likely to cause disruptions to the structure and function.
• Amino Acid Scoring Matrices
– 20 x 20 matrices to reflect the likelihood of residue substitutions
• 2 types of amino acid substitution matrices.
– (i) based on interchangeability of the genetic code or amino acid
properties,
» is based on genetic code or the physicochemical features of
amino acids → less accurate
– (ii) derived from empirical studies of amino acid substitutions.
» → based on surveys of actual amino acid substitutions among related
proteins.
» PAM and BLOSUM matrices derived from actual alignments of
highly similar sequences. By analyzing the probabilities of
amino acid substitutions in these alignments, a scoring system
can be developed by giving a high score for a more likely
substitution and a low score for a rare substitution.
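The idea of scoring likely substitutions high and rare ones low is usually implemented as a log-odds score. The sketch below assumes that form: the observed frequency of an aligned residue pair is divided by the frequency expected by chance, and the result is log-transformed, scaled, and rounded. The frequencies are made-up numbers, and the scale factor of 2 (half-bit units) is an assumption of this example.

```python
import math

def log_odds_score(observed_pair_freq, freq_a, freq_b, scale=2.0):
    """Rounded log-odds score for an ordered residue pair.
    expected = freq_a * freq_b is the chance of seeing the pair if the two
    residues were aligned independently. scale=2.0 gives half-bit units
    (an assumption for this sketch)."""
    expected = freq_a * freq_b
    return round(scale * math.log2(observed_pair_freq / expected))

# Toy numbers, not taken from any real alignment set.
print(log_odds_score(0.02, 0.05, 0.05))      # pair seen more often than chance
print(log_odds_score(0.0025, 0.05, 0.05))    # pair seen as often as chance
print(log_odds_score(0.000625, 0.05, 0.05))  # pair seen less often than chance
```

A pair observed more often than chance gets a positive score, a pair at chance level gets zero, and a rarely observed pair gets a negative score.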
• PAM matrices (Dayhoff PAM matrices)
• point accepted mutation
Correspondence of PAM Numbers with Observed
Amino Acid Mutational Rates
• BLOSUM matrices
• the series of blocks amino acid substitution matrices (BLOSUM)
– → (In PAM matrix construction, the only direct observation of
residue substitutions is in PAM1, based on a relatively small set of
extremely closely related sequences. Sequence alignment statistics
for more divergent sequences are not available. )
– all are derived based on direct observation for every possible amino
acid substitution in multiple sequence alignments.
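• Instead of using an extrapolation function, the number in a BLOSUM matrix name is the actual % identity
threshold of the sequences selected for construction of that matrix (e.g., BLOSUM62 is built from blocks of sequences clustered at 62% identity).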
PAM250 amino acid substitution matrix. Residues are
grouped according to physicochemical similarities.
BLOSUM62 amino acid substitution matrix.
• Comparison of PAM and BLOSUM
• Main differences
– PAM matrices, except PAM1, are derived from an evolutionary model
– BLOSUM matrices consist of entirely direct observations.
» BLOSUM matrices are entirely derived from local sequence
alignments of conserved sequence blocks,
» The PAM1 matrix is based on the global alignment of full-length
sequences composed of both conserved and variable regions. →
BLOSUM matrices are more advantageous in searching databases and
finding conserved domains in proteins.
• Results of several empirical comparisons
– BLOSUM matrices outperform the PAM matrices in terms of accuracy of
local alignment, largely because BLOSUM matrices are derived from a
much larger and more representative dataset than the one used to derive
the PAM matrices. → BLOSUM matrices more reliable.
– Revised matrices have also been devised, e.g., Gonnet matrices and Jones–Taylor–Thornton
matrices, which are particularly robust in phylogenetic tree construction.
Gumbel extreme value distribution of alignment scores.
Statistical Significance of Sequence Alignment
• Concept
• A statistical test for finding true evidence of homology
– Test procedure
• A P-value resulting from the test
– P-value < 10^-100 indicates an exact match between the two sequences.
– 10^-100 < P-value < 10^-50 → a nearly identical match.
– 10^-50 < P-value < 10^-5 → sequences having clear homology.
– 10^-5 < P-value < 10^-1 → possible distant homologs.
– P-value > 10^-1 → the two sequences may be randomly related.
– However, sometimes truly related protein sequences may lack the
statistical significance at the sequence level owing to fast divergence
rates. Their evolutionary relationships can nonetheless be revealed at
the three-dimensional structural level.
Database Similarity Searching
• Requirements for database searching
• Heuristic searching
• Basic Local Alignment Search Tool (BLAST)
– Variants
– Statistical Significance
– Low Complexity Regions
– BLAST Output Format
• FASTA
– Statistical significance
• Comparison of FASTA and BLAST
• Database searching with the Smith–Waterman method
General overview
• Database searching
• pairwise alignment to retrieve biological sequences in DBs based on
similarity.
– Query for a pairwise comparison with all individual sequences in a
database. - Database similarity searching is pairwise alignment on a large
scale.
– However, DP is slow and impractical to use in most cases. Special search
methods are needed to speed up the computational process.
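• Requirements for database searching
• Sensitivity → the ability to find as many correct hits (“true positives”) as possible
• Specificity → the ability to exclude incorrect hits (“false positives”)
• Speed
– Types of algorithms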
• Exhaustive type – examine all mathematical combinations (ex) DP
• Heuristic type – find empirical or near optimal solution using rules of
thumb
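The sensitivity and specificity requirements can be made concrete with a small calculation. This sketch uses the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), which the slide does not spell out. The sequence identifiers and the sets of hits are invented for the example.

```python
def search_quality(reported, true_homologs, database):
    """Sensitivity and specificity of one database search.
    reported: set of sequence IDs returned by the search
    true_homologs: set of IDs that are genuinely related to the query
    database: set of all IDs searched"""
    tp = len(reported & true_homologs)             # correct hits
    fp = len(reported - true_homologs)             # incorrect hits
    fn = len(true_homologs - reported)             # homologs that were missed
    tn = len(database - reported - true_homologs)  # unrelated and not reported
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical database of ten sequences, four of which are true homologs.
database = {f"s{i}" for i in range(1, 11)}
true_homologs = {"s1", "s2", "s3", "s4"}
reported = {"s1", "s2", "s3", "s9"}
print(search_quality(reported, true_homologs, database))
```

Here the search finds three of the four homologs and reports one unrelated sequence, so a heuristic method can be judged by how much of each measure it gives up in exchange for speed.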
Heuristic Searching
• (…)
– BLAST
– FASTA
– word method
• Both BLAST and FASTA use a heuristic “word method” for fast
pairwise sequence alignment.
Basic Local Alignment Search Tool (BLAST)
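• Purpose
– Find high-scoring ungapped segments. Segments scoring
above a given threshold indicate pairwise similarity beyond random
chance.
Example of alignment scoring with the BLOSUM62 matrix
• Variants
– BLASTN: nucleotide query against a nucleotide database
– BLASTP: protein query against a protein database
– BLASTX: nucleotide query translated in all six reading frames, against a protein database
– TBLASTX: translated nucleotide query against a translated nucleotide database
• Statistical significance
– The larger the DB, the more alignments with unrelated sequences occur by chance.
→ A new parameter is needed that takes into account the total number of sequence
alignments conducted, which is proportional to the size of the database.
• In BLAST searches, the E-value (expectation value)
– is the number of alignments with an equal or better score expected to arise
by random chance in a search of a database of that size; the lower the E-value, the more significant the match.
– The E-value is related to the P-value used to assess the significance of a single
pairwise alignment. BLAST compares a query sequence against all
database sequences, and so the E-value is determined by:
– (ex) …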
• A bit score
– Measures sequence similarity independent of query sequence length
and DB size and is normalized based on the raw pairwise alignment
score
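The relationship between raw score, bit score, and E-value can be sketched with the Karlin–Altschul formulas, E = K·m·n·e^(−λS) and S' = (λS − ln K) / ln 2, so that E = m·n·2^(−S'). These are standard forms and may not be the exact expression shown on the original slide. The values of `lam` and `k` below are illustrative placeholders, and the query and database lengths are invented.

```python
import math

def bit_score(raw_score, lam, k):
    """Normalize a raw alignment score S to bits: S' = (lam*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least raw_score:
    E = K * m * n * exp(-lam * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)

def e_value_from_bits(bits, query_len, db_len):
    """The same E-value computed from the bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2 ** (-bits)

def p_value_from_e(e):
    """Probability of at least one chance alignment: P = 1 - exp(-E)."""
    return 1 - math.exp(-e)

# Illustrative parameters only (assumptions, not taken from the slides).
lam, k = 0.318, 0.134
query_len, db_len = 250, 50_000_000
raw = 100

bits = bit_score(raw, lam, k)
e = e_value(raw, query_len, db_len, lam, k)
print(bits, e, p_value_from_e(e))
# The same raw score is less significant in a database twice as large.
print(e_value(raw, query_len, 2 * db_len, lam, k))
```

Because the bit score already absorbs λ and K, it can be compared across searches, while the E-value grows in proportion to the database size.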
• Low Complexity Regions (LCRs)
• For both protein and DNA sequences, there may be regions that
contain highly repetitive residues, such as short segments of
repeats, or segments that are overrepresented by a small number
of residues.
– LCRs are rather prevalent in DB sequences, accounting for about 15% of the total
protein sequences in public databases. → They cause spurious DB matches and
lead to artificially high alignment scores with unrelated sequences.
• To avoid high similarity scores that arise from matching
LCRs, filter out the problematic regions in both query and DB
sequences to improve the signal-to-noise ratio (= masking).
• 2 types of masking: hard and soft.
• SEG detects and masks repetitive elements before DB
searches are executed.
– SEG has been integrated into the web-based BLAST program.
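Hard and soft masking can be illustrated with a toy filter. This is not the SEG algorithm: it is a simplified stand-in that flags any window whose Shannon entropy falls below a threshold. The window size, the threshold, and the example sequence are all assumptions. Hard masking replaces flagged residues with an ambiguity character (X for protein, N for DNA), while soft masking converts them to lowercase.

```python
import math
from collections import Counter

def shannon_entropy(window):
    """Shannon entropy (bits) of the residue composition of a window."""
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in Counter(window).values())

def mask_low_complexity(seq, window=12, max_entropy=1.5, soft=False, mask_char="X"):
    """Flag every residue covered by a low-entropy window, then mask it.
    Simplified stand-in for a real filter such as SEG."""
    seq = seq.upper()
    flagged = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) <= max_entropy:
            for pos in range(start, start + window):
                flagged[pos] = True
    return "".join(
        (ch.lower() if soft else mask_char) if is_low else ch
        for ch, is_low in zip(seq, flagged)
    )

# Made-up protein with a poly-Q stretch between two ordinary segments.
protein = "MKTAYIAKQR" + "Q" * 12 + "LVDESGHNWC"
print(mask_low_complexity(protein))             # hard masking
print(mask_low_complexity(protein, soft=True))  # soft masking
```

Windows that overlap the repeat are also flagged, so the masked region extends a few residues into the flanks. Soft masking keeps the original residues recoverable, which is why the lowercase form is convenient when the masked region should still take part in later alignment steps.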
• BLAST Output Format
FASTA
• (…)
• The first database similarity search tool
• Finds matches for short stretches of identical residues with a
length of k (the “hashing” approach).
– These strings of residues (= ktuples or ktups) are equivalent to words in
BLAST, but are normally shorter than words. Typically, a ktup is
composed of two residues for protein sequences and six residues for
DNA sequences.
• Similar to BLAST, FASTA has a number of subprograms.
Procedure of ktup identification using the hashing strategy by FASTA. Identical
offset values between residues of the two sequences allow the formation of ktups.
Steps of the FASTA alignment procedure. In step 1 (left ), all possible ungapped
alignments are found between two sequences with the hashing method. In step 2
(middle), the alignments are scored according to a particular scoring matrix. Only
the ten best alignments are selected. In step 3 (right ), the alignments in the same
diagonal are selected and joined to form a single gapped alignment, which is
optimized using the dynamic programming approach.
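The hashing step can be sketched as follows. This is a minimal illustration of the idea in the captions above, not FASTA's actual code: ktups of one sequence are stored in a lookup table, the other sequence is scanned against it, and matches that share the same offset (position in query minus position in subject) lie on the same diagonal. The function names and the two short sequences are made up.

```python
from collections import Counter, defaultdict

def ktup_positions(seq, k):
    """Hash table mapping each ktup to the positions where it starts."""
    table = defaultdict(list)
    for i in range(len(seq) - k + 1):
        table[seq[i:i + k]].append(i)
    return table

def diagonal_hits(query, subject, k=2):
    """Count identical ktups per offset (query position - subject position).
    Matches with the same offset fall on the same diagonal."""
    table = ktup_positions(subject, k)
    offsets = Counter()
    for i in range(len(query) - k + 1):
        for j in table.get(query[i:i + k], []):
            offsets[i - j] += 1
    return offsets

# Toy protein fragments; ktup = 2 as is typical for proteins.
hits = diagonal_hits("HARFYAAQIVL", "VDMAAQIA", k=2)
print(hits.most_common(1))
```

The three shared ktups AA, AQ, and QI all have offset 2, so they line up on one diagonal, which is the kind of region FASTA then scores and joins in the later steps.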
• Statistical significance
• FASTA also uses E-values and bit scores.
– essentially the same as in BLAST, but the FASTA output provides one
more statistical parameter, the Z-score.
» Because most of the alignments with the query sequence are
with unrelated sequences, the higher the Z-score for a reported
match, the further away from the mean of the score distribution,
hence, the more significant the match.
» For a Z-score > 15, the match can be considered extremely
significant, with certainty of a homologous relationship.
» If Z is in the range of 5 to 15, the sequence pair can be
described as highly probable homologs.
» If Z < 5, the relationship is described as less certain.
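The Z-score thresholds above can be sketched as a small calculation. This is a simplified illustration that assumes Z = (score − mean) / standard deviation over a set of scores from unrelated sequences. The actual FASTA program derives its Z-scores with additional corrections, and the background scores below are invented.

```python
import statistics

def z_score(score, background_scores):
    """Distance of a score from the mean of the background distribution,
    in units of standard deviation (simplified form)."""
    mean = statistics.mean(background_scores)
    sd = statistics.stdev(background_scores)
    return (score - mean) / sd

def interpret(z):
    """Apply the thresholds quoted in the slide."""
    if z > 15:
        return "extremely significant"
    if z >= 5:
        return "highly probable homologs"
    return "less certain"

# Hypothetical alignment scores of the query against unrelated sequences.
background = [20, 22, 25, 23, 21, 24, 26, 19, 22, 23]
for s in (60, 40, 25):
    z = z_score(s, background)
    print(s, round(z, 2), interpret(z))
```

A score far in the tail of the background distribution gives a large Z, while a score close to the mean gives a small one.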
Comparison of FASTA and BLAST
• (…)
• BLAST and FASTA perform equally well in regular DB searching.
• differences (Notably seeding step)
– BLAST uses a substitution matrix to find matching words
» use of low-complexity masking in BLAST → higher specificity
than FASTA because potential FPs are reduced.
» BLAST sometimes gives multiple best-scoring alignments from
the same sequence;
– FASTA identifies identical matching words using a hashing procedure.
» By default, FASTA scans smaller window sizes. → more sensitive
results than BLAST, with a better coverage rate for homologs.
However, it is usually slower than BLAST.
» FASTA returns only one final alignment.
Multiple Sequence Alignment
• Scoring function
• Exhaustive Algorithms
• Heuristic Algorithms
– Progressive Alignment Method
– Drawbacks and Solutions
– Iterative Alignment
– Block-Based Alignment
• Practical issues
– Protein-Coding DNA Sequences
– Editing
– Format Conversion
• Concept
• generation of multiple matching sequence pairs → convert
numerous pairwise alignments into a single alignment → arrange
sequences in such a way that evolutionarily equivalent positions
across all sequences are matched.
• Advantages
– reveals more biological information than pairwise alignments can.
– applications in designing degenerate PCR primers based on multiple
related sequences.
• DP vs. Heuristic
– the amount of computing time and memory DP requires increases
exponentially as the number of sequences increases. In practice,
heuristic approaches are most often used.
Scoring function
• (…)
• MSA arranges sequences in such a way that a maximum number of
residues from each sequence are matched up according to a
particular scoring function.
» = sum of pairs (SP). (= sum of scores of all possible pairs of sequences in
a multiple alignment based on a particular scoring matrix).
– In calculating SP scores, each column is scored by summing the
scores for all possible pairwise matches, mismatches and gap costs.
The score of the entire alignment is the sum of all of column scores.
– The purpose of most multiple sequence alignment algorithms is to
achieve maximum SP scores.
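The sum-of-pairs calculation can be sketched as follows. This is a minimal illustration: it assumes a simple match/mismatch/gap scheme instead of a substitution matrix, scores a gap aligned with a gap as zero, and uses a made-up three-sequence alignment.

```python
from itertools import combinations

def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """SP score of a multiple alignment given as equal-length gapped rows.
    Each column is scored by summing all pairwise comparisons in it."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue            # gap vs gap: no score (assumption)
            if a == "-" or b == "-":
                total += gap        # residue vs gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total

msa = ["ACGT",
       "A-GT",
       "ACGA"]
# Column scores: 3, -3, 3, -1
print(sum_of_pairs(msa))
```

An alignment algorithm that optimizes this function compares candidate alignments by their SP totals and keeps the one with the highest value.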
Exhaustive Algorithms
Heuristic Algorithms
• (3 categories)
– Progressive Alignment Method
– Iterative Alignment
– Block-Based Alignment
• Progressive Alignment Method
– Drawbacks and Solutions
Schematic of a typical progressive alignment procedure (e.g., Clustal).
Angled wavy lines represent consensus sequences for sequence pairs A/B
and C/D. Curved wavy lines represent a consensus for A/B/C/D.
Conversion of a sequence alignment into a graphical profile in
the Poa algorithm. Identical residues in the alignment are
condensed as nodes in the partial order graph.
• Iterative Alignment
• Block-Based Alignment
Schematic of iterative alignment procedure for PRRN, which
involves two sets of iterations.
Lab (1): Python
• Source
Lab (2): R
• Source

More Related Content

What's hot

Essential Regulatory Documents in Clinical Trials
Essential Regulatory Documents in Clinical TrialsEssential Regulatory Documents in Clinical Trials
Essential Regulatory Documents in Clinical TrialsTrialJoin
 
Use of Rasmol and study of proteins
Use of Rasmol and study of proteins Use of Rasmol and study of proteins
Use of Rasmol and study of proteins kamalmodi481
 
Presentation on CDISC- SDTM guidelines.
Presentation on CDISC- SDTM guidelines.Presentation on CDISC- SDTM guidelines.
Presentation on CDISC- SDTM guidelines.Khushbu Shah
 
CLINICAL STUDY REPORT - IN-TEXT TABLES, TABLES FIGURES AND GRAPHS, PATIENT AN...
CLINICAL STUDY REPORT - IN-TEXT TABLES, TABLES FIGURES AND GRAPHS, PATIENT AN...CLINICAL STUDY REPORT - IN-TEXT TABLES, TABLES FIGURES AND GRAPHS, PATIENT AN...
CLINICAL STUDY REPORT - IN-TEXT TABLES, TABLES FIGURES AND GRAPHS, PATIENT AN...Angelo Tinazzi
 
Database Designing in Clinical Data Management
Database Designing in Clinical Data ManagementDatabase Designing in Clinical Data Management
Database Designing in Clinical Data ManagementClinosolIndia
 
Introduction to clinical sas programming
Introduction to clinical sas programmingIntroduction to clinical sas programming
Introduction to clinical sas programmingray4hz
 
Clinical trail team ( stake holders )
Clinical trail team ( stake holders )Clinical trail team ( stake holders )
Clinical trail team ( stake holders )Irene Vadakkan
 
Unit 02 chapter 05 documentation systems documents and record keeping
Unit 02 chapter 05   documentation systems documents and record keepingUnit 02 chapter 05   documentation systems documents and record keeping
Unit 02 chapter 05 documentation systems documents and record keepingDominic Parry
 
Clinical Data Management Plan_Katalyst HLS
Clinical Data Management Plan_Katalyst HLSClinical Data Management Plan_Katalyst HLS
Clinical Data Management Plan_Katalyst HLSKatalyst HLS
 
Clinical data management
Clinical data management Clinical data management
Clinical data management sopi_1234
 
Motif Finding.pdf
Motif Finding.pdfMotif Finding.pdf
Motif Finding.pdfShimoFcis
 
Clinical Trial Management System Implementation Guide
Clinical Trial Management System Implementation GuideClinical Trial Management System Implementation Guide
Clinical Trial Management System Implementation GuidePerficient, Inc.
 
Sequence analysis - Bioinformatics
Sequence analysis - BioinformaticsSequence analysis - Bioinformatics
Sequence analysis - BioinformaticsPratik Parikh
 
Medical Coding_Katalyst HLS
Medical Coding_Katalyst HLSMedical Coding_Katalyst HLS
Medical Coding_Katalyst HLSKatalyst HLS
 

What's hot (20)

Qc in clinical trials
Qc in clinical trialsQc in clinical trials
Qc in clinical trials
 
ADaM - Where Do I Start?
ADaM - Where Do I Start?ADaM - Where Do I Start?
ADaM - Where Do I Start?
 
Essential Regulatory Documents in Clinical Trials
Essential Regulatory Documents in Clinical TrialsEssential Regulatory Documents in Clinical Trials
Essential Regulatory Documents in Clinical Trials
 
Use of Rasmol and study of proteins
Use of Rasmol and study of proteins Use of Rasmol and study of proteins
Use of Rasmol and study of proteins
 
Presentation on CDISC- SDTM guidelines.
Presentation on CDISC- SDTM guidelines.Presentation on CDISC- SDTM guidelines.
Presentation on CDISC- SDTM guidelines.
 
CLINICAL STUDY REPORT - IN-TEXT TABLES, TABLES FIGURES AND GRAPHS, PATIENT AN...
CLINICAL STUDY REPORT - IN-TEXT TABLES, TABLES FIGURES AND GRAPHS, PATIENT AN...CLINICAL STUDY REPORT - IN-TEXT TABLES, TABLES FIGURES AND GRAPHS, PATIENT AN...
CLINICAL STUDY REPORT - IN-TEXT TABLES, TABLES FIGURES AND GRAPHS, PATIENT AN...
 
Clinical data analytics
Clinical data analyticsClinical data analytics
Clinical data analytics
 
Database Designing in Clinical Data Management
Database Designing in Clinical Data ManagementDatabase Designing in Clinical Data Management
Database Designing in Clinical Data Management
 
Introduction to clinical sas programming
Introduction to clinical sas programmingIntroduction to clinical sas programming
Introduction to clinical sas programming
 
Protein databases
Protein databasesProtein databases
Protein databases
 
Clinical trail team ( stake holders )
Clinical trail team ( stake holders )Clinical trail team ( stake holders )
Clinical trail team ( stake holders )
 
Unit 02 chapter 05 documentation systems documents and record keeping
Unit 02 chapter 05   documentation systems documents and record keepingUnit 02 chapter 05   documentation systems documents and record keeping
Unit 02 chapter 05 documentation systems documents and record keeping
 
Clinical Data Management Plan_Katalyst HLS
Clinical Data Management Plan_Katalyst HLSClinical Data Management Plan_Katalyst HLS
Clinical Data Management Plan_Katalyst HLS
 
Sequence file formats
Sequence file formatsSequence file formats
Sequence file formats
 
PPT ON ALGORITHM
PPT ON ALGORITHMPPT ON ALGORITHM
PPT ON ALGORITHM
 
Clinical data management
Clinical data management Clinical data management
Clinical data management
 
Motif Finding.pdf
Motif Finding.pdfMotif Finding.pdf
Motif Finding.pdf
 
Clinical Trial Management System Implementation Guide
Clinical Trial Management System Implementation GuideClinical Trial Management System Implementation Guide
Clinical Trial Management System Implementation Guide
 
Sequence analysis - Bioinformatics
Sequence analysis - BioinformaticsSequence analysis - Bioinformatics
Sequence analysis - Bioinformatics
 
Medical Coding_Katalyst HLS
Medical Coding_Katalyst HLSMedical Coding_Katalyst HLS
Medical Coding_Katalyst HLS
 

Similar to AI 바이오 (4일차).pdf

sequence alignment
sequence alignmentsequence alignment
sequence alignmentammar kareem
 
Sequence Alignment
Sequence AlignmentSequence Alignment
Sequence AlignmentRavi Gandham
 
Sequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdfSequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdfsriaisvariyasundar
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformaticsAbhishek Vatsa
 
Bioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptxBioinformaatics for M.Sc. Biotecchnology.pptx
Bioinformaatics for M.Sc. Biotecchnology.pptxRanjan Jyoti Sarma
 
Bioinformatics_Sequence Analysis
Bioinformatics_Sequence AnalysisBioinformatics_Sequence Analysis
Bioinformatics_Sequence AnalysisSangeeta Das
 
International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...IJCSEIT Journal
 
Sequence alignment global vs. local
Sequence alignment  global vs. localSequence alignment  global vs. local
Sequence alignment global vs. localbenazeer fathima
 
Bioinformatics t8-go-hmm wim-vancriekinge_v2013
Bioinformatics t8-go-hmm wim-vancriekinge_v2013Bioinformatics t8-go-hmm wim-vancriekinge_v2013
Bioinformatics t8-go-hmm wim-vancriekinge_v2013Prof. Wim Van Criekinge
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...journal ijrtem
 
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...
Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto...IJRTEMJOURNAL
 
B.sc biochem i bobi u 3.2 algorithm + blast
B.sc biochem i bobi u 3.2 algorithm + blastB.sc biochem i bobi u 3.2 algorithm + blast
B.sc biochem i bobi u 3.2 algorithm + blastRai University
 
B.sc biochem i bobi u 3.2 algorithm + blast
B.sc biochem i bobi u 3.2 algorithm + blastB.sc biochem i bobi u 3.2 algorithm + blast
B.sc biochem i bobi u 3.2 algorithm + blastRai University
 
Global and Local Sequence Alignment
Global and Local Sequence AlignmentGlobal and Local Sequence Alignment
Global and Local Sequence AlignmentAjayPatil210
 

Similar to AI 바이오 (4일차).pdf (20)

sequence alignment
sequence alignmentsequence alignment
sequence alignment
 
Sequence Alignment
Sequence AlignmentSequence Alignment
Sequence Alignment
 
Sequence Analysis
Sequence AnalysisSequence Analysis
Sequence Analysis
 
Sequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdfSequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdf
 
Bioinformatics t8-go-hmm v2014
Bioinformatics t8-go-hmm v2014Bioinformatics t8-go-hmm v2014
Bioinformatics t8-go-hmm v2014
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformatics
 
Seq alignment
Seq alignment Seq alignment
Seq alignment
 
AI 바이오 (4일차).pdf

  • 1. AI-Bio Convergence Professional Course, 2022-8~10, 윤형기 (hky@openwith.net), Day 4
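  • 2. Schedule (Topic: Details)
      Day 1 • Greetings and course introduction: greetings; survey of attendees and their goals • Medical/bio overview (technology/industry): medical/bio technology and industry trends • Foundational technology (1-1) Python and analysis packages: analysis tools (1) (Python, SciPy, NumPy/pandas)
      Day 2 • Foundational technology (1-2) R and statistical analysis: analysis tools (2) (R and statistics) • Biostatistics applications (1): bioinformatics and ANOVA, multivariate analysis, etc.
      Day 3 • Biostatistics applications (2): meta-analysis • Omics analysis (1): genome analysis; transcriptome analysis
      Day 4 • Omics analysis (2): epigenome analysis; proteome analysis • Next-generation sequencing: GenBank and NCBI data; VCF data analysis, NGS data processing, etc.
      Day 5 • Foundational technology (3) Machine learning (1): modeling methodology (model concepts and cross-validation); supervised learning algorithms (linear models, classification) • Foundational technology (3) Machine learning (2): unsupervised learning algorithms (clustering, association analysis, etc.)
      Day 6 • Supervised learning and bioinformatics applications: predictive models on medical data; linear models and classification of healthcare data • Unsupervised learning and bioinformatics applications: association analysis of clinical data; comorbidity analysis
      Track labels: Understanding the medical/bio domain • Healthcare datasets and biostatistics • Genome analysis • Bio data and machine learning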
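  • 3. Schedule, continued (Topic: Details)
      Day 7 • Foundational technology (4) Deep learning (1): neural network training and deep learning models • Foundational technology (3) Deep learning (2): TensorFlow; PyTorch
      Day 8 • Deep learning and bioinformatics applications: healthcare simulation using Bi-LSTM; skin disease identification using deep learning • Ontologies and bioinformatics applications: the Semantic Web and ontologies; bioinformatics applications of ontologies
      Day 9 • Foundational technology (3) Image processing: overview of image processing and computer vision • Medical image analysis (1): segmentation; image registration
      Day 10 • Medical image analysis (2): electrocardiogram (ECG); rendering and surface models; MRI
      Day 11 • Foundational technology (4) Bioinformatics and computational chemistry: overview of computational chemistry • Drug discovery (1): target identification; reagent and assay development; ADME (absorption, distribution, metabolism, excretion); toxicology and machine learning applications
      Day 12 • Foundational technology (5) GAN: GANs (Generative Adversarial Networks) and VAEs • Drug discovery and GANs: recommending drug candidates with generative models • Wrap-up: summary
      Track labels: Medical image analysis • Drug analysis and drug design • Bio data and deep learning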
  • 5. Main topics in bioinformatics • Sequence alignment – Pairwise Sequence Alignment – Database similarity searching – Multiple Sequence Alignment – Profiles and HMMs – Protein Motifs and Domain Prediction • Gene and promoter prediction – Gene prediction – Promoter and Regulatory Element Prediction • Molecular Phylogenetics – Phylogenetics Basics – Phylogenetic Tree Construction Methods and Programs • Structural Bioinformatics – Protein structure visualization, comparison, and classification – Protein structure prediction (secondary, tertiary) – RNA structure prediction • Genomics and Proteomics – Genome mapping, assembly, and comparison – Functional genomics – Proteomics • Genome rearrangements • Motif finding • Gene expression analysis
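  • 7. Supplement: the genetic code • 1. Overview – The set of rules that determines which amino acid each codon encodes • 2. Codon – A unit of genetic information in RNA that specifies an amino acid of a protein – RNA bases: uracil, guanine, cytosine, adenine – One codon consists of 3 bases, so in theory 4×4×4 = 64 distinct codons are possible. • 3. Types – 3.1. Start codon • 5'-AUG-3' (some bacteria use variant start codons). • Specifies methionine (Met) in eukaryotes and N-formylmethionine (fMet) in prokaryotes. • Also serves as the point at which the mRNA binds the ribosome and protein translation begins. – 3.2. Stop codon (nonsense codon) • A codon that signals the end of translation; there are three: UAA, UAG, and UGA. • No tRNA corresponds to a stop codon; instead a protein called a release factor binds, and when translation reaches a stop codon the two ribosomal subunits separate and translation terminates. – 3.3. Anticodon • The base sequence of a specific region of the tRNA chain.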
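The codon rules above can be sketched in a few lines of Python. This is a minimal illustration, not a full translation tool: it assumes the standard genetic code, scans only for the first AUG, and ignores variant start codons. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks the three stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# 4 x 4 x 4 = 64 codons, each mapped to one amino acid or to a stop.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the sequence."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon: no amino acid is added
            break
        protein.append(aa)
    return "".join(protein)


stops = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
print(len(CODON_TABLE), stops, translate("GGAUGGCCUAAUU"))
```

The start codon AUG maps to `M` (methionine) in this table; the fMet used by prokaryotes is not modeled.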
  • 8. Pairwise Sequence Alignment • Background • Sequence Homology vs. Sequence Similarity • Sequence Similarity vs. Sequence Identity • Methods – Global Alignment and Local Alignment – Alignment Algorithms – Dot Matrix Method – Dynamic Programming Method • Gap Penalties • Dynamic Programming for Global Alignment • Dynamic Programming for Local Alignment • Scoring matrices – Amino Acid Scoring Matrices – PAM Matrices – BLOSUM Matrices – Comparison between PAM and BLOSUM • Statistical significance of sequence alignment
  • 9. • Goal • Sequence comparison → find "common character patterns" and residue–residue correspondences • Background – Evolution • DNA and proteins are products of evolution – The degree of sequence conservation in the alignment reveals the evolutionary relatedness of different sequences, whereas the variation between sequences reflects the changes that have occurred during evolution in the form of substitutions, insertions, and deletions. • Sequence alignment – can be used as a basis for predicting the structure and function of uncharacterized sequences. – provides inference for the relatedness of the two sequences under study.
  • 10. Sequence Homology vs. Similarity • (…) – Distinguishing the terms • Homologous relationship (sharing homology) – an inference or conclusion about a common ancestral relationship, drawn from sequence similarity comparison when the two sequences share a high enough degree of similarity. (qualitative) • Sequence similarity – is a direct result of observation from the sequence alignment. – % of aligned residues that are similar in physicochemical properties such as size, charge, and hydrophobicity. (quantitative) – The difficulty is deciding what level of sequence similarity is enough • Nucleotide sequences consist of only 4 characters → two unrelated sequences are expected to be identical at 25% or more of their positions by chance. • Protein sequences have 20 possible amino acid residues → two unrelated sequences can match at about 5% of the residues by random chance.
  • 11. – However, % identity values provide only tentative guidance for homology identification. [Figure: the 3 zones of protein sequence alignments. Source: modified from Rost 1999.]
  • 12. Sequence Similarity vs. Sequence Identity • (…) • For nucleotide sequences the two terms mean effectively the same thing • For protein sequences they must be distinguished – Sequence identity = % of matches of the same amino acid residues between two aligned sequences. – Similarity = % of aligned residues that have similar physicochemical characteristics and can be more readily substituted for each other. – Two ways to calculate sequence similarity and identity: – one uses the overall lengths of both sequences; – the other normalizes by the length of the shorter sequence.
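The two normalizations can be compared on a small example. The slide names the normalizations but not the formulas, so the exact expressions below are an assumption: identity is taken as 2·L_s / (L_a + L_b) for the "both lengths" variant and L_s / L_shorter for the other, where L_s is the number of identical aligned residues. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, normalize: str = "both") -> float:
    """Percent identity of two aligned sequences of equal length ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Count columns where both sequences carry the same residue (gaps never count).
    identical = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    len_a = len(aligned_a.replace("-", ""))  # ungapped length of sequence a
    len_b = len(aligned_b.replace("-", ""))  # ungapped length of sequence b
    if normalize == "both":
        return 100.0 * 2 * identical / (len_a + len_b)
    if normalize == "shorter":
        return 100.0 * identical / min(len_a, len_b)
    raise ValueError("normalize must be 'both' or 'shorter'")


print(percent_identity("ACGT-A", "ACCTGA", "both"))     # 4 identical, lengths 5 and 6
print(percent_identity("ACGT-A", "ACCTGA", "shorter"))
```

A similarity percentage could be computed the same way by counting residue pairs from the same physicochemical group instead of exact matches; the grouping itself would be a further assumption.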
  • 13. Methods • Global Alignment and Local Alignment • Global alignment – compares the two sequences from beginning to end » is more applicable for aligning two closely related sequences of roughly the same length. » For divergent sequences and sequences of variable lengths, this method may not generate optimal results because it fails to recognize highly similar local regions between the two sequences. • Local alignment – finds only the local regions with the highest level of similarity between the two sequences and aligns these regions without regard for the alignment of the rest of the sequence regions – The two sequences to be aligned can be of different lengths
  • 15. • Alignment algorithms – Dot Matrix Method (= dot plot method) – Dynamic Programming Method • Gap Penalties • Dynamic Programming for Global Alignment • Dynamic Programming for Local Alignment – Word method
  • 16. – Dot Matrix Method [Figure: example of sequence comparison with a dot plot.] Lines linking the dots in diagonals indicate sequence alignment. Diagonal lines above or below the main diagonal represent internal repeats of either sequence.
  • 17. • Problem when comparing large sequences with the dot matrix method – high noise level. » In most dot plots, dots are plotted all over the graph, obscuring the true alignment. This is particularly acute for DNA sequences: DNA has only 4 possible characters, so each residue has a 1-in-4 chance of matching a residue in another sequence. » To reduce noise, instead of using a single residue to scan for similarity, a filtering technique is applied that uses a "window" of fixed length covering a stretch of residue pairs.
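The window filter can be sketched as follows. This is a minimal illustration under stated assumptions: a dot is placed only when at least `min_matches` of the `window` residue pairs starting at that position agree; the function names and parameter names are hypothetical.

```python
def dot_plot(seq1: str, seq2: str, window: int = 1, min_matches: int = 1):
    """Boolean dot matrix: cell (i, j) is True when the windows starting at
    seq1[i] and seq2[j] agree in at least `min_matches` positions."""
    rows = len(seq1) - window + 1
    cols = len(seq2) - window + 1
    return [
        [
            sum(seq1[i + k] == seq2[j + k] for k in range(window)) >= min_matches
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def count_dots(matrix) -> int:
    """Total number of dots in the matrix."""
    return sum(sum(row) for row in matrix)


seq = "GATTACA"
print(count_dots(dot_plot(seq, seq)))                            # single-residue scan
print(count_dots(dot_plot(seq, seq, window=3, min_matches=3)))   # windowed scan
```

In the self-comparison above, the single-residue scan also marks every chance match between repeated letters, while the 3-residue window keeps only the main diagonal.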
  • 18. • Self-comparison as a variation of the dot plot method – a main diagonal shows the perfect matching of each residue with itself → the plot can be used to identify internal repeat elements – If repeats are present, short parallel lines are observed above and below the main diagonal. » Self-complementarity of DNA sequences (also called inverted repeats) can also be identified using a dot plot. » In this case, a DNA sequence is compared with its reverse-complemented sequence. – Parallel diagonals represent the inverted repeats.
  • 19. – Advantages » easy identification of the regions of greatest similarity. – Disadvantages » it is often up to the user to construct a full alignment with insertions and deletions by linking nearby diagonals. » it lacks statistical rigor in assessing the quality of the alignment. » it is restricted to pairwise alignment; the method is difficult to scale up to multiple alignment.
  • 20. – Dynamic Programming Method • (…) – convert a dot matrix into a scoring matrix to account for matches and mismatches between sequences. By searching for the set of highest scores in this matrix, the best alignment can be accurately obtained. – construct a 2-D matrix. » The residue matching is according to a particular scoring matrix. The scores are calculated one row at a time. This starts with the first row of one sequence, which is used to scan through the entire length of the other sequence, followed by scanning of the second row. The matching scores are calculated.
  • 22. • Gap Penalties – Gaps are introduced to represent insertions and deletions. – Opening a gap and extending an existing gap are penalized differently. » It is easier to extend a gap that has already been started, so gap opening carries a much higher penalty than gap extension: when insertions and deletions occur, several adjacent residues are likely to have been inserted or deleted together. » These differential gap penalties are called affine gap penalties. » Strategy: use preset gap penalty values for introducing and extending gaps. » The total gap penalty (W) is a linear function of gap length: W = γ + δ × (k − 1), where γ = gap opening penalty, δ = gap extension penalty, k = length of the gap. » A constant gap penalty, which charges every gap position the same amount, is less realistic.
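A short numeric sketch of the two penalty schemes, assuming the common affine form W = γ + δ × (k − 1) and a constant scheme that charges every gap position equally. The default values 11 and 1 are illustrative only, and the function names are hypothetical.

```python
def affine_gap_penalty(k: int, gap_open: float = 11, gap_extend: float = 1) -> float:
    """Affine penalty W = gap_open + gap_extend * (k - 1) for a gap of length k."""
    if k <= 0:
        return 0
    return gap_open + gap_extend * (k - 1)


def constant_gap_penalty(k: int, per_position: float = 11) -> float:
    """Constant scheme: every gap position costs the same, opening or extending."""
    return per_position * max(k, 0)


for k in (1, 2, 5):
    print(k, affine_gap_penalty(k), constant_gap_penalty(k))
```

Under the affine scheme one gap of length 5 is far cheaper than five separate gaps of length 1, which matches the observation that adjacent residues tend to be inserted or deleted together.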
  • 23. • DP for Global Alignment (Needleman–Wunsch algorithm) – an optimal alignment is obtained over the entire lengths of the two sequences. – Drawback = risk of missing the best local similarity → suitable only for aligning two closely related sequences of roughly the same length. (For divergent sequences or sequences with different domain structures, the approach does not produce an optimal alignment.) • DP for Local Alignment (Smith–Waterman algorithm) – identifies regions of local sequence similarity
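Both dynamic programming variants fit in a short sketch. The version below is a simplification under stated assumptions: a fixed match/mismatch score instead of a substitution matrix, a linear gap penalty rather than an affine one, and a score-only local alignment. The scoring values and function names are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment: returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap          # leading gaps in b
    for j in range(1, m + 1):
        F[0][j] = j * gap          # leading gaps in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def smith_waterman_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Local alignment: best score of any pair of subsequences (cells never drop below 0)."""
    best = 0
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


print(needleman_wunsch("ACGT", "AGT"))
print(smith_waterman_score("TTTACGTTT", "GGGACGGGG"))
```

The only structural difference between the two is the floor at 0 in the local version, which lets an alignment restart anywhere and so picks out the shared "ACG" region in otherwise unrelated sequences.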
  • 24. Scoring matrices • (…) = substitution matrices • A scoring matrix is derived from statistical analysis of residue substitution data from sets of reliable alignments of highly related sequences. – For nucleotides, a positive value or high score is given for a match and a negative value or low score for a mismatch. – Assumption: the frequencies of mutation are equal for all bases. This assumption is, however, unrealistic. • Scoring matrices for amino acids are more complicated – they must reflect the physicochemical properties of amino acid residues, as well as the likelihood of certain residues being substituted among true homologous sequences. – Amino acids with similar physicochemical properties can be substituted more easily than those without similar characteristics. Substitutions among similar residues are likely to preserve the essential functional and structural features, whereas substitutions between residues of different physicochemical properties are more likely to disrupt structure and function.
  • 26. • Amino acid scoring matrices – 20 × 20 matrices that reflect the likelihood of residue substitutions • 2 types of amino acid substitution matrices – (i) based on the interchangeability of the genetic code or on amino acid properties » built from the genetic code or the physicochemical features of amino acids → less accurate – (ii) derived from empirical studies of amino acid substitutions » built from surveys of actual amino acid substitutions among related proteins. » PAM and BLOSUM matrices are derived from actual alignments of highly similar sequences. By analyzing the probabilities of amino acid substitutions in these alignments, a scoring system can be developed that gives a high score to a more likely substitution and a low score to a rare substitution.
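One widely used way to turn observed substitution probabilities into scores is a log-odds ratio: the observed frequency of a residue pair divided by the frequency expected if the two residues paired at random. The sketch below assumes that form with a base-2 logarithm and a scale factor of 2 (half-bit units); the frequencies are made-up illustrative numbers, not values from any published matrix, and the function name is hypothetical.

```python
import math


def log_odds_score(pair_freq: float, freq_a: float, freq_b: float, scale: float = 2.0) -> int:
    """Rounded log-odds score: scale * log2(observed pair frequency / expected by chance)."""
    expected = freq_a * freq_b
    return round(scale * math.log2(pair_freq / expected))


# Hypothetical frequencies: each residue has background frequency 0.1.
print(log_odds_score(0.02, 0.1, 0.1))    # pair seen twice as often as chance -> positive
print(log_odds_score(0.005, 0.1, 0.1))   # pair seen half as often as chance -> negative
```

A substitution observed more often than chance receives a positive score and a rare one a negative score, which is the behavior described for empirically derived matrices.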
  • 27. • PAM matrices (Dayhoff PAM matrices) • PAM = point accepted mutation [Table: correspondence of PAM numbers with observed amino acid mutational rates.]
  • 28. • BLOSUM matrices • the series of blocks amino acid substitution matrices (BLOSUM) – Motivation: in PAM matrix construction, the only direct observation of residue substitutions is in PAM1, which is based on a relatively small set of extremely closely related sequences. Sequence alignment statistics for more divergent sequences are not available. – All BLOSUM matrices are derived from direct observation of every possible amino acid substitution in multiple sequence alignments. • Instead of relying on an extrapolation function, the number in a BLOSUM matrix name is the actual % identity of the sequences selected to construct that matrix.
  • 29. PAM250 amino acid substitution matrix. Residues are grouped according to physicochemical similarities.
  • 30. BLOSUM62 amino acid substitution matrix.
  • 31. • Comparison between PAM and BLOSUM • Main differences – PAM matrices, except PAM1, are derived from an evolutionary model – BLOSUM matrices consist entirely of direct observations. » BLOSUM matrices are derived entirely from local sequence alignments of conserved sequence blocks. » The PAM1 matrix is based on the global alignment of full-length sequences composed of both conserved and variable regions. → BLOSUM matrices are more advantageous for searching databases and finding conserved domains in proteins. • Results of several empirical comparisons – BLOSUM matrices outperform PAM matrices in accuracy of local alignment, largely because BLOSUM matrices are derived from a much larger and more representative dataset than the one used to derive the PAM matrices. → BLOSUM matrices are more reliable. – Revised matrices have also been devised, e.g. the Gonnet matrices and the Jones–Taylor–Thornton matrices, which are particularly robust in phylogenetic tree construction.
  • 32. [Figure: Gumbel extreme value distribution of alignment scores.]
  • 33. Statistical significance of sequence alignment • Concept • A statistical test to find true evidence of homology – Test procedure • Interpretation of the P-value resulting from the test – P < 10^-100 → an exact match between the two sequences. – 10^-100 < P < 10^-50 → a nearly identical match. – 10^-50 < P < 10^-5 → sequences with clear homology. – 10^-5 < P < 10^-1 → possible distant homologs. – P > 10^-1 → the two sequences may be randomly related. – However, truly related protein sequences sometimes lack statistical significance at the sequence level owing to fast divergence rates. Their evolutionary relationships can nonetheless be revealed at the three-dimensional structural level.
  • 34. Database similarity searching • Requirements for database searching • Heuristic searching • Basic Local Alignment Search Tool (BLAST) – Variants – Statistical Significance – Low Complexity Regions – BLAST Output Format • FASTA – Statistical significance • Comparison of FASTA and BLAST • Searching with the Smith–Waterman method
  • 35. General remarks • Database searching • uses pairwise alignment to retrieve biological sequences from databases based on similarity. – The query is compared pairwise with every individual sequence in a database; database similarity searching is pairwise alignment on a large scale. – However, DP is slow and impractical in most cases, so special search methods are needed to speed up the computation. • Requirements for database searching • Sensitivity = the ability to find as many true positives as possible • Specificity = the ability to exclude false positives • Speed – Types of algorithms • Exhaustive type – examines all mathematical combinations (e.g. DP) • Heuristic type – finds an empirical or near-optimal solution using rules of thumb
  • 36. Heuristic searching • (…) – BLAST – FASTA – word method • Both BLAST and FASTA use a heuristic "word method" for fast pairwise sequence alignment.
  • 37. Basic Local Alignment Search Tool (BLAST) • Purpose – to find high-scoring ungapped segments; segments scoring above a given threshold indicate pairwise similarity beyond random chance. [Figure: example of alignment scoring with the BLOSUM62 matrix.]
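The word method behind this search can be sketched as a seed-and-extend loop. This is a simplified illustration, not BLAST itself: seeds are exact word matches only (no neighborhood words scored with a substitution matrix), scoring is plain match/mismatch, and extension stops once the running score falls `drop` below the best seen. All names and parameter values are hypothetical.

```python
from collections import defaultdict


def find_seeds(query: str, subject: str, w: int = 3):
    """Return (query_pos, subject_pos) pairs where a word of length w matches exactly."""
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    return [(i, j) for j in range(len(subject) - w + 1)
            for i in index.get(subject[j:j + w], [])]


def extend_ungapped(query, subject, i, j, w=3, match=1, mismatch=-1, drop=2):
    """Extend a seed in both directions without gaps.
    Returns (score, (query_start, subject_start, length))."""
    def walk(qi, sj, step):
        score = best = best_len = n = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            n += 1
            if score > best:
                best, best_len = score, n
            if best - score >= drop:   # score fell too far below the best: stop
                break
            qi += step
            sj += step
        return best, best_len

    right, right_len = walk(i + w, j + w, 1)
    left, left_len = walk(i - 1, j - 1, -1)
    return w * match + right + left, (i - left_len, j - left_len, w + right_len + left_len)


query, subject = "ACGTACGT", "TTACGTTT"
for qi, sj in find_seeds(query, subject):
    print((qi, sj), extend_ungapped(query, subject, qi, sj))
```

Each extended seed corresponds to a high-scoring ungapped segment; a real search would keep only segments whose score exceeds a significance threshold.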
  • 38. • Variants – BLASTN – BLASTP – BLASTX – TBLASTX
  • 39. • Statistical significance – The larger the database, the more alignments with unrelated sequences occur. → A new parameter is needed that takes into account the total number of sequence alignments conducted, which is proportional to the size of the database. • In BLAST searches this is the E-value (expectation value) – it gives the number of alignments with an equal or better score that are expected to arise by random chance in the database search; the closer to zero, the more significant the match. – The E-value is related to the P-value used to assess the significance of a single pairwise alignment. BLAST compares a query sequence against all database sequences, and so the E-value is determined by: – (ex) … • A bit score – measures sequence similarity independently of query sequence length and database size, and is normalized from the raw pairwise alignment score
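The relationship between raw score, bit score, and E-value can be sketched with the Karlin–Altschul formulas, E = K·m·n·e^(−λS) and S' = (λS − ln K) / ln 2, which give E = m·n·2^(−S'). The slide does not show these formulas, so their use here is an assumption, and the λ and K values below are illustrative placeholders rather than parameters taken from the course material. Function names are hypothetical.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score: float, query_len: int, db_len: int, lam: float, k: float) -> float:
    """Expected number of chance alignments scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


LAMBDA, K = 0.318, 0.134   # illustrative parameter values (assumption)
s = 80
print(bit_score(s, LAMBDA, K))
print(e_value(s, 250, 1_000_000, LAMBDA, K))
print(e_value(s, 250, 10_000_000, LAMBDA, K))  # 10x larger database -> 10x larger E-value
```

The last two lines show why the same raw score becomes less significant as the database grows, while the bit score stays the same.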
  • 40. • Low Complexity Regions (LCRs) • In both protein and DNA sequences there may be regions that contain highly repetitive residues, such as short segments of repeats, or segments that are overrepresented by a small number of residues. – LCRs are rather prevalent in database sequences, accounting for about 15% of the total protein sequences in public databases. → They cause spurious database matches and lead to artificially high alignment scores with unrelated sequences. • To avoid high similarity scores that arise from matching LCRs, the problematic regions in both query and database sequences are filtered out to improve the signal-to-noise ratio (= masking). • 2 types of masking: hard and soft. • SEG detects and masks repetitive elements before database searches are executed. – SEG has been integrated into the web-based BLAST program. • BLAST Output Format
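Hard and soft masking can be illustrated with a toy filter. This is not the SEG algorithm: it simply flags any window whose Shannon entropy is at or below a threshold, which is an assumption chosen for brevity. Hard masking here replaces flagged residues with a placeholder character, and soft masking lowercases them; the window size, threshold, and function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits) of the residue composition of a segment."""
    counts = Counter(segment)
    return -sum(c / len(segment) * math.log2(c / len(segment)) for c in counts.values())


def mask_low_complexity(seq, window=12, max_entropy=1.5, mode="hard", mask_char="X"):
    """Mask every residue covered by a window whose entropy is <= max_entropy."""
    seq = seq.upper()
    flagged = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) <= max_entropy:
            for pos in range(start, start + window):
                flagged[pos] = True
    if mode == "soft":   # soft masking: keep the residues but lowercase them
        return "".join(c.lower() if f else c for c, f in zip(seq, flagged))
    return "".join(mask_char if f else c for c, f in zip(seq, flagged))  # hard masking


protein = "MKVLAT" + "Q" * 12 + "GHWERD"
print(mask_low_complexity(protein, mode="hard"))
print(mask_low_complexity(protein, mode="soft"))
```

Soft masking keeps the original residues recoverable, so a search tool can ignore the region when seeding but still include it when extending an alignment.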
  • 42. FASTA • (…) • The first database similarity search tool • Finds matches for a short stretch of identical residues with a length of k (a "hashing" approach). – These strings of residues (= ktuples or ktups) are equivalent to words in BLAST, but are normally shorter than words. Typically, a ktup is composed of two residues for protein sequences and six residues for DNA sequences. • Like BLAST, FASTA has a number of subprograms.
  • 43. Procedure of ktup identification using the hashing strategy by FASTA. Identical offset values between residues of the two sequences allow the formation of ktups.
  • 44. Steps of the FASTA alignment procedure. In step 1 (left ), all possible ungapped alignments are found between two sequences with the hashing method. In step 2 (middle), the alignments are scored according to a particular scoring matrix. Only the ten best alignments are selected. In step 3 (right ), the alignments in the same diagonal are selected and joined to form a single gapped alignment, which is optimized using the dynamic programming approach.
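The hashing step can be sketched by recording where each ktup occurs in the query and then grouping hits in the subject by their offset (query position minus subject position). Hits that share an offset lie on the same diagonal. This is a minimal illustration of the first step only; rescoring with a scoring matrix, joining diagonals, and the final dynamic programming refinement are omitted. The function name is hypothetical.

```python
from collections import Counter, defaultdict


def ktup_diagonals(query: str, subject: str, ktup: int = 2) -> Counter:
    """Count identical ktup hits per diagonal, keyed by offset = query_pos - subject_pos."""
    positions = defaultdict(list)            # hash table: ktup -> positions in the query
    for i in range(len(query) - ktup + 1):
        positions[query[i:i + ktup]].append(i)
    diagonals = Counter()
    for j in range(len(subject) - ktup + 1):
        for i in positions.get(subject[j:j + ktup], ()):
            diagonals[i - j] += 1            # same offset means same diagonal
    return diagonals


hits = ktup_diagonals("ACGTACGT", "TTACGTTT", ktup=2)
print(hits.most_common())
```

The diagonal with the most hits marks the most promising ungapped region, which is what the later steps score and extend.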
  • 45. • Statistical significance • FASTA also uses E-values and bit scores. – They are essentially the same as in BLAST, but the FASTA output provides one more statistical parameter, the Z-score. » Because most alignments with the query sequence are with unrelated sequences, the higher the Z-score for a reported match, the further it lies from the mean of the score distribution and hence the more significant the match. » For a Z-score > 15, the match can be considered extremely significant, with certainty of a homologous relationship. » If Z is in the range of 5 to 15, the sequence pair can be described as highly probable homologs. » If Z < 5, the relationship is described as less certain.
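A Z-score expresses how many standard deviations a match score lies above the mean of the background score distribution. The sketch below assumes the plain definition z = (score − mean) / standard deviation over a list of background scores; how FASTA itself estimates the background distribution is not covered here. The background scores are made-up numbers, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def z_score(score: float, background_scores) -> float:
    """Number of standard deviations by which `score` exceeds the background mean."""
    return (score - mean(background_scores)) / pstdev(background_scores)


def interpret_z(z: float) -> str:
    """Apply the thresholds listed above."""
    if z > 15:
        return "extremely significant"
    if z >= 5:
        return "highly probable homologs"
    return "less certain"


background = [10, 12, 14, 16, 18]   # hypothetical scores against unrelated sequences
for s in (20, 40, 70):
    z = z_score(s, background)
    print(s, round(z, 2), interpret_z(z))
```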
  • 46. Comparison of FASTA and BLAST • (…) • BLAST and FASTA perform equally well in regular database searching. • Differences (notably in the seeding step) – BLAST uses a substitution matrix to find matching words. » The use of low-complexity masking in BLAST gives it higher specificity than FASTA because potential false positives are reduced. » BLAST sometimes gives multiple best-scoring alignments from the same sequence. – FASTA identifies identical matching words using the hashing procedure. » By default, FASTA scans smaller window sizes. → It gives more sensitive results than BLAST, with a better coverage rate for homologs, but it is usually slower than BLAST. » FASTA returns only one final alignment.
  • 47. Multiple Sequence Alignment • Scoring function • Exhaustive Algorithms • Heuristic Algorithms – Progressive Alignment Method – Drawbacks and Solutions – Iterative Alignment – Block-Based Alignment • Practical considerations – Protein-Coding DNA Sequences – Editing – Format Conversion
  • 48. • Concept • Generate multiple matching sequence pairs → convert the numerous pairwise alignments into a single alignment → arrange the sequences so that evolutionarily equivalent positions across all sequences are matched. • Advantages – Reveals more biological information than pairwise alignments can. – Has applications in designing degenerate PCR primers based on multiple related sequences. • DP vs. Heuristic – The computing time and memory that dynamic programming (DP) requires increase exponentially with the number of sequences. In practice, heuristic approaches are most often used.
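The exponential growth can be seen with simple arithmetic. Assuming the standard full dynamic programming lattice, which has one cell per combination of prefix lengths, the number of cells is the product of (length + 1) over all sequences. The function name `dp_cells` is hypothetical.

```python
from math import prod


def dp_cells(lengths):
    """Cells in the full DP lattice for sequences of the given lengths."""
    return prod(n + 1 for n in lengths)


# Each additional sequence of length 100 multiplies the lattice size by 101
for count in (2, 3, 4, 5):
    print(count, dp_cells([100] * count))
```

With sequences of length 100, two sequences need about ten thousand cells, while five need about ten billion, which is why exhaustive DP is limited to a handful of short sequences.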
  • 49. Scoring function • (…) • The goal of MSA is to arrange sequences so that the maximum number of residues from each sequence are matched up according to a particular scoring function. » The scoring function is the sum of pairs (SP): the sum of the scores of all possible pairs of sequences in a multiple alignment, based on a particular scoring matrix. – In calculating SP scores, each column is scored by summing the scores for all possible pairwise matches, mismatches and gap costs. The score of the entire alignment is the sum of all column scores. – The purpose of most multiple sequence alignment algorithms is to achieve the maximum SP score.
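A minimal sketch of the SP calculation follows. The scoring values (match +1, mismatch -1, gap -2, gap against gap 0) are assumptions chosen for illustration; a real analysis would use a substitution matrix such as BLOSUM or PAM. The names `pair_score` and `sp_score` are hypothetical.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of aligned characters; '-' denotes a gap."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sp_score(alignment):
    """Sum-of-pairs score: sum over columns of all pairwise scores in the column."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    total = 0
    for column in zip(*alignment):
        total += sum(pair_score(a, b) for a, b in combinations(column, 2))
    return total


# Toy alignment of three sequences (hypothetical data)
msa = ["ACGT", "ACGT", "AC-T"]
print(sp_score(msa))  # columns score 3, 3, -3, 3
```

The third column contributes one match (G/G) and two gap pairs, giving 1 - 2 - 2 = -3, so the alignment scores 6 in total.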
  • 51. Heuristic Algorithms • (3 categories) – Progressive Alignment Method – Iterative Alignment – Block-Based Alignment • Progressive Alignment Method – Drawbacks and Solutions Schematic of a typical progressive alignment procedure (e.g., Clustal). Angled wavy lines represent consensus sequences for sequence pairs A/B and C/D. Curved wavy lines represent a consensus for A/B/C/D.
  • 53. Conversion of a sequence alignment into a graphical profile in the POA (partial order alignment) algorithm. Identical residues in the alignment are condensed into nodes in the partial order graph.
  • 54. • Iterative Alignment • Block-Based Alignment Schematic of iterative alignment procedure for PRRN, which involves two sets of iterations.