2009 CSBB LAB New Student Training

CSBB LAB New Student Training: Basic Computer Science. Speaker: 黃智沂, 2nd-year student in the Ph.D. program

Why talk about basic computer science?

Quotations from 李家同, our advisor's advisor

Outline

What is an algorithm?

Introduction

O notation

O notation

Theorem

Lemma

Table
Running time   Time 1 (1,000 steps/s)   Time 2 (2,000 steps/s)   Time 3 (4,000 steps/s)   Time 4 (8,000 steps/s)
log2 n         0.010                    0.005                    0.003                    0.001
n              1                        0.5                      0.25                     0.125
n log2 n       10                       5                        2.5                      1.25
n^1.5          32                       16                       8                        4
n^2            1,000                    500                      250                      125
n^3            1,000,000                500,000                  250,000                  125,000
1.1^n          10^39                    10^39                    10^38                    10^38
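The rows of the table can be recomputed with a few lines of Python. This is only a sketch: the input size n = 1,000 and seconds as the time unit are assumptions (the slide states neither), chosen because they reproduce the table's values up to rounding. The function name `running_times` is made up for this example.

```python
import math

def running_times(n, steps_per_second):
    """Return the time needed to execute f(n) steps for several growth rates f."""
    steps = {
        "log2 n": math.log2(n),
        "n": n,
        "n log2 n": n * math.log2(n),
        "n^1.5": n ** 1.5,
        "n^2": n ** 2,
        "n^3": n ** 3,
        "1.1^n": 1.1 ** n,
    }
    return {name: count / steps_per_second for name, count in steps.items()}

if __name__ == "__main__":
    # Assumed input size; one column of the table per machine speed.
    for speed in (1000, 2000, 4000, 8000):
        print(speed, running_times(1000, speed))
```

Doubling the machine speed halves every entry, but it does nothing to bring the exponential row within reach.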

O notation

Upper bound & lower bound. [Figure: an axis of the shortest running time of algorithms, from fast to slow.] This is some algorithm, so solving the problem never needs to be slower than this (upper bound). No algorithm can possibly be faster than this (lower bound).

Ω notation

Θ notation

Time and space complexity

Space complexity

Space complexity
Input size n   O(n)    O(1)
10             10      c
100            100     c
1,000          1K      c
10,000         10K     c
100,000        100K    c

Trade-offs between complexities

Advanced complexity analysis

The analysis depends on the model of computation

What is a Turing Machine? [Figure: a control unit with a head on a tape holding ... b a b a ...; the remaining cells hold blank symbols.]

What is a TM? (2)

Extensions of Turing Machine

Variants of TM

Multi-tape Turing Machines: Informal Description. We add a finite number of tapes. [Figure: one control unit with head 1 on Tape 1 (a1 a2 ...) and head 2 on Tape 2 (a1 a2 ...).]

Multi-tape Turing Machines: Informal Description (II) ,[object Object],[object Object],[object Object],( (p,(x 1 , x 2 )), (q, (y 1 , y 2 )) ) Such that each x i is in  and each y i is in  or is  or  . and if x i =  then y i =  or y i = 

Multi-tape Turing Machines vs Turing Machines

Solve by 2-tape Turing Machine M2. [Figure: two snapshots of Tape 1 and Tape 2, the first with M2 in state s and the second with M2 in state s'.]

Using States to "Remember" Information. Equivalent configuration in a Turing Machine M. [Figure: a single tape holding the contents of both tapes, separated by #.]

Theorem

Oracle Turing Machine

Definition: A Non-Deterministic TM is a 7-tuple T = (Q, Σ, Γ, δ, q_0, q_accept, q_reject), where:
Q is a finite set of states
Σ is the input alphabet, which does not contain the blank symbol
Γ is the tape alphabet, which contains the blank symbol, and Σ ⊆ Γ
δ : Q × Γ → Pow(Q × Γ × {L, R}) is the transition function
q_0 ∈ Q is the start state
q_accept ∈ Q is the accept state
q_reject ∈ Q is the reject state, and q_reject ≠ q_accept

Acceptance for NTM

Non-Deterministic TM is a Parallel Universe

Definition: NTIME(t(n)) is the set of languages decided by an O(t(n))-time non-deterministic Turing machine. TIME(t(n)) ⊆ NTIME(t(n))

NTM vs. DTM

Deterministic Polynomial Time: P = ⋃_{k ∈ N} TIME(n^k)

Non-deterministic Polynomial Time: NP = ⋃_{k ∈ N} NTIME(n^k)

Is the NTM too abstract?

Theorem: L ∈ NP if and only if there exists a poly-time Turing machine V with L = { x | ∃y. |y| = poly(|x|) and V(x,y) accepts }.
Proof:
(1) If L = { x | ∃y. |y| = poly(|x|) and V(x,y) accepts } then L ∈ NP, because we can guess y and then run V.
(2) If L ∈ NP then L = { x | ∃y. |y| = poly(|x|) and V(x,y) accepts }. Let N be a non-deterministic poly-time TM that decides L. Define V(x,y) to accept if y is an accepting computation history of N on x.

A language is in NP if and only if there exist polynomial-length certificates for membership to the language. SAT is in NP because a satisfying assignment is a polynomial-length certificate that a formula is satisfiable.
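The certificate view of SAT can be made concrete with a small verifier. The sketch below assumes formulas in conjunctive normal form, encoded as lists of signed integers (a DIMACS-like convention picked for this example, not taken from the slides). The name `verify_sat` is hypothetical. It plays the role of V(x, y): x is the formula and y is the assignment.

```python
def verify_sat(clauses, assignment):
    """Check a certificate for CNF-SAT in time linear in the formula size.

    clauses:    list of clauses; each clause is a list of nonzero ints,
                where k means variable k and -k means its negation.
    assignment: dict mapping variable number -> bool (the certificate y).
    """
    def literal_is_true(literal):
        value = assignment.get(abs(literal), False)
        return value if literal > 0 else not value

    # Every clause must contain at least one true literal.
    return all(any(literal_is_true(lit) for lit in clause) for clause in clauses)

if __name__ == "__main__":
    # (x1 OR NOT x2) AND (x2 OR x3)
    formula = [[1, -2], [2, 3]]
    print(verify_sat(formula, {1: True, 2: False, 3: True}))
    print(verify_sat(formula, {1: False, 2: True, 3: False}))
```

Checking a given assignment is easy. The hard part, which the verifier does not attempt, is finding one.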

NP-Complete

The World by Karp. [Figure: diagram of problem classes. In P: 2-SAT, Shortest-Path, Minimum-Cut, Arc-Cover. In NP with unknown status (?): Linear-Inequalities, Graph-Isomorphism, Non-Primes. In NPC (NP-Complete): SAT, Clique, Hamiltonian-Circuit, Chromatic Number, ... NP-Hard: Equivalence of Regular Expression, Equivalence of ND Finite Automata, Context Sensitive Recognition.]

How to use NPC? NP = The set of all the problems for which you can verify an alleged solution in polynomial time.

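The best outcome, of course, is to prove that no good method exists. For example, the well-known lower bound for the (comparison-based) Sorting Problem is Ω(n lg n).

But this is usually harder than finding an algorithm.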

Reference

Levels of problem solving (1)

Levels of problem solving (2)

Levels of problem solving (3)

Levels of problem solving (4)

Levels of problem solving (5)

References

Outline

Biology-related problems and applications

String alignment

Global Alignment vs. Local Alignment

Analysis of two sequences

Homology Search Tools

Three commonly used sequence analysis methods

Three commonly used sequence analysis methods (continued)

Why use alignment?

Alignment

Scoring an alignment

Affine gap penalties

Local alignment

An algorithm for optimal local alignment

Why add a 0?
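The role of the 0 is easiest to see in code. The sketch below is a standard Smith-Waterman style dynamic program, offered as an illustration rather than as the recurrence shown on the original slide, whose content is not available. The scoring values (match +1, mismatch -1, linear gap -1) and the function name are assumptions; affine gap penalties are left out to keep it short.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(
                0,                       # start a fresh alignment here
                diagonal,                # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,   # gap in b
                score[i][j - 1] + gap,   # gap in a
            )
            best = max(best, score[i][j])
    return best

if __name__ == "__main__":
    print(local_alignment_score("ACGT", "TTACGTAA"))
```

Taking the maximum with 0 lets an alignment restart at any cell instead of dragging along a negative-scoring prefix. That is what makes the result local rather than global.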

Multiple optimal local alignments

BLAST

The maximal segment pair measure: the highest scoring pair

A matrix of similarity scores: PAM 120

BLOSUM62 versus PAM250 (For Protein)

BLAST

BLAST Step 1: Build the hash table for Sequence A. (3-tuple example)
For DNA sequences: Seq. A = AGATCGAT (positions 1 to 8)
3-tuple   Positions in A
AGA       1
ATC       3
CGA       5
GAT       2, 6
TCG       4
All other 3-tuples from AAA to TTT have empty entries.
For protein sequences: Seq. A = ELVIS
Add xyz to the hash table if Score(xyz, ELV) ≥ T;
Add xyz to the hash table if Score(xyz, LVI) ≥ T;
Add xyz to the hash table if Score(xyz, VIS) ≥ T;
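The DNA part of Step 1 can be sketched as a dictionary from each 3-tuple to its 1-based positions in sequence A. This is a simplified illustration, not BLAST's actual implementation. The function names and the query string used for scanning are made up for this example, and the protein case with the score threshold T is not covered.

```python
from collections import defaultdict

def build_kmer_index(seq, k=3):
    """Map every k-tuple of seq to the list of its 1-based start positions."""
    index = defaultdict(list)
    for start in range(len(seq) - k + 1):
        index[seq[start:start + k]].append(start + 1)
    return dict(index)

def scan_for_hits(index, seq_b, k=3):
    """Return (position in A, position in B) for every shared k-tuple."""
    hits = []
    for start in range(len(seq_b) - k + 1):
        for pos_a in index.get(seq_b[start:start + k], []):
            hits.append((pos_a, start + 1))
    return hits

if __name__ == "__main__":
    index = build_kmer_index("AGATCGAT")
    print(index)
    print(scan_for_hits(index, "CGATT"))  # "CGATT" is a hypothetical sequence B
```

The index is built once for A. After that, scanning B costs one lookup per position, which is what makes the hit-finding stage fast.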

BLAST Step 2: Scan sequence B for hits.

BLAST Step 3: Extend hits. Terminate if the score of the extension fades away. (That is, when we reach a segment pair whose score falls a certain distance below the best score found for shorter extensions.) BLAST 2.0 saves the time spent in extension, and considers gapped alignments.

Gapped BLAST (I) The two-hit method

Gapped BLAST (II) Confining the dynamic-programming

Multiple sequence alignment

Computing multiple sequence alignments

Scoring multiple sequence alignments

References

Biological networks and graph theory

Bio-Map. [Figure: GENOME (protein-gene interactions), PROTEOME (protein-protein interactions), METABOLISM (bio-chemical reactions, e.g. the Citrate Cycle).]

An Introduction to Graph Theory: Definitions and Examples
G = (V, E)
Terms illustrated: undirected graph, directed graph, isolated vertex, adjacent, loop, multiple edges
simple graph: an undirected graph without loops or multiple edges
degree of a vertex: number of edges connected to it (indegree and outdegree in a directed graph)

Walks from x to y [Figure: example graph on vertices a, b, c, d, e]
walk: no restriction (a-b-d-a-b-c)
trail: no edge can be repeated (a-b-c-d-e-b-d)
path: no vertex can be repeated (a-b-c-d-e)
closed: x = y
closed trail: circuit (a-b-c-d-b-e-d-a, drawn in one stroke without lifting the pen)
closed path: cycle (a-b-c-d-a)
length: number of edges in the path, trail, or walk
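These definitions depend only on which vertices and edges repeat, so the slide's examples can be checked mechanically. The sketch below assumes the input sequence really is a walk in some undirected graph. It does not verify that each consecutive pair is an edge, because the example graph's edge list is not given in the text. The function name is hypothetical.

```python
def classify_walk(walk):
    """Classify a walk written like 'a-b-c' as path, trail, walk, cycle, circuit or closed walk."""
    vertices = walk.split("-")
    # Undirected edges: {u, v} is the same edge as {v, u}.
    edges = [frozenset(pair) for pair in zip(vertices, vertices[1:])]
    closed = vertices[0] == vertices[-1]
    # For a closed walk the start/end vertex is allowed to appear twice.
    inner = vertices[:-1] if closed else vertices
    no_repeated_edge = len(set(edges)) == len(edges)
    no_repeated_vertex = len(set(inner)) == len(inner)
    if no_repeated_vertex and no_repeated_edge:
        return "cycle" if closed else "path"
    if no_repeated_edge:
        return "circuit" if closed else "trail"
    return "closed walk" if closed else "walk"

if __name__ == "__main__":
    for example in ("a-b-c-d-e", "a-b-c-d-e-b-d", "a-b-d-a-b-c",
                    "a-b-c-d-b-e-d-a", "a-b-c-d-a"):
        print(example, "->", classify_walk(example))
```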

[Figure: a walk from a to b passing through a repeated vertex x.] Remove any cycle on the repeated vertices. Def 11.4 Let G = (V, E) be an undirected graph. We call G connected if there is a path between any two distinct vertices of G. [Figure: a connected graph on a, b, c, d, e, and a disconnected graph with two components.]

Bipartite graphs. [Figure: a bipartite graph and the complete bipartite graph K_{2,3}.]

Def. 11.6 Multigraphs. [Figure: a multigraph of multiplicity 3.]

Subgraphs, Complements, and Graph Isomorphism. [Figure: a graph on a, b, c, d, e and several of its subgraphs.] Spanning subgraph: V_1 = V. Induced subgraph: includes all edges of E between vertices of V_1.

Subgraphs, Complements, and Graph Isomorphism. Def. 11.11 complete graph: K_n. [Figure: K_5 on a, b, c, d, e.] Def. 11.12 complement of a graph G. [Figure: G and its complement on a, b, c, d, e.]

Subgraphs, Complements, and Graph Isomorphism. Graph Isomorphism. [Figure: three graphs, on vertices 1 2 3 4, a b c d, and w x y z.]

Subgraphs, Complements, and Graph Isomorphism. Ex. 11.8 [Figure: two graphs, on vertices a to j and q to z.] The mapping a-q, c-u, e-r, g-x, i-z, b-v, d-y, f-w, h-t, j-s shows they are isomorphic. Ex. 11.9 One graph has 2 vertices of degree 2 and the other has 3, so they cannot be isomorphic. Can you think of an algorithm for testing isomorphism?

Graph Alignment: NetworkBLAST/PathBLAST

Centralities

Centralities

Centralities

Weighted Bipartite Matching Given a weighted bipartite graph, find a matching with maximum total weight. Not necessarily a maximum size matching. A B

History

Hungarian algorithm (Augmenting Path Algorithm): find a shortest M-augmenting path at each step

Example
         Job 1   Job 2   Job 3   Job 4   Job 5
Alice    $1      $2      $3      $4      $5
Bob      $6      $7      $8      $7      $2
Chris    $1      $3      $4      $4      $5
Dirk     $3      $6      $2      $8      $7
Emma     $4      $1      $3      $5      $4
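For a 5 × 5 instance like this one, the optimum can be found by trying all 120 assignments, which gives a reference answer to compare against a hand-run of the Hungarian algorithm. The sketch below is plain brute force, not the Hungarian algorithm. It assumes the goal is to minimize total cost (the following slides call this a cost matrix, but the objective is not stated in the surviving text). The names are hypothetical.

```python
from itertools import permutations

WORKERS = ["Alice", "Bob", "Chris", "Dirk", "Emma"]
COST = [
    [1, 2, 3, 4, 5],  # Alice
    [6, 7, 8, 7, 2],  # Bob
    [1, 3, 4, 4, 5],  # Chris
    [3, 6, 2, 8, 7],  # Dirk
    [4, 1, 3, 5, 4],  # Emma
]

def cheapest_assignment(cost):
    """Try every one-to-one assignment of workers to jobs; return (total, jobs)."""
    n = len(cost)

    def total(jobs):
        return sum(cost[worker][jobs[worker]] for worker in range(n))

    best = min(permutations(range(n)), key=total)
    return total(best), best

if __name__ == "__main__":
    total_cost, jobs = cheapest_assignment(COST)
    print("total cost:", total_cost)
    for worker, job in zip(WORKERS, jobs):
        print(worker, "-> Job", job + 1)
```

Brute force takes n! steps, so it only works for tiny inputs. The augmenting-path approach exists to avoid exactly that.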

Example: cost matrix and excess matrix

Example: excess matrix

Example

Example Excess matrix ,[object Object],[object Object],[object Object],is a edge of not covered by

Set Cover

Set Cover

Set Cover: optimal?

Set Cover

Karger's Min-Cut Algorithm
Example: [Figure: a graph on vertices A, B, C, D, E, F, G.]
Contract: F and G merge into a single vertex FG, leaving A, B, C, D, E, FG.
Contract: D merges into FG, leaving A, B, C, E, FGD.

Is output min-cut?
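A single run of random contraction is not guaranteed to return a minimum cut, which is why the algorithm is repeated and the smallest cut kept. The sketch below illustrates that on a made-up graph: two triangles joined by one edge, so the true minimum cut is 1. The slide's own graph cannot be used because its edges did not survive in the text. The code assumes a connected graph, and it picks a uniformly random original edge and skips those that have become self-loops, which is equivalent to picking a random remaining edge.

```python
import random

# Hypothetical example graph: triangle A-B-C, triangle D-E-F, bridge C-D.
EDGES = [("A", "B"), ("B", "C"), ("A", "C"),
         ("C", "D"),
         ("D", "E"), ("E", "F"), ("D", "F")]

def contract_once(edges, rng):
    """Run random contraction until two super-vertices remain; return the cut size."""
    parent = {}
    for u, v in edges:
        parent.setdefault(u, u)
        parent.setdefault(v, v)

    def find(x):
        # Follow parent links to the representative of x's super-vertex.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    remaining = len(parent)
    while remaining > 2:
        u, v = rng.choice(edges)
        root_u, root_v = find(u), find(v)
        if root_u != root_v:      # ignore edges that are now self-loops
            parent[root_u] = root_v
            remaining -= 1
    # Edges whose endpoints lie in different super-vertices form the cut.
    return sum(1 for u, v in edges if find(u) != find(v))

def min_cut(edges, trials=100, seed=0):
    """Repeat the contraction and keep the smallest cut seen."""
    rng = random.Random(seed)
    return min(contract_once(edges, rng) for _ in range(trials))

if __name__ == "__main__":
    print(min_cut(EDGES))
```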

Artificial intelligence and machine learning. Common tools: Decision Tree, SVM, Neural Networks, Random Forest

Origins of Data Mining. [Figure: Data Mining at the intersection of Machine Learning/Pattern Recognition, Statistics/AI, and Database systems.]

Illustrating Classification Task

Example of a Decision Tree
Training data attributes: Refund (categorical), MarSt (categorical), TaxInc (continuous), plus the class.
Model: decision tree, with splitting attributes Refund, MarSt, TaxInc
Refund = Yes: NO
Refund = No: test MarSt
  MarSt = Married: NO
  MarSt = Single, Divorced: test TaxInc
    TaxInc < 80K: NO
    TaxInc > 80K: YES

Another Example of Decision Tree (same attributes: categorical, categorical, continuous, class)
MarSt = Married: NO
MarSt = Single, Divorced: test Refund
  Refund = Yes: NO
  Refund = No: test TaxInc
    TaxInc < 80K: NO
    TaxInc > 80K: YES
There could be more than one tree that fits the same data!

Decision Tree Classification Task

Apply Model to Test Data. Start from the root of the tree and, at each node (Refund, then MarSt, then TaxInc), follow the branch that matches the test record until a leaf is reached. For the test record on the slide: assign Cheat to "No".
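Applying the first tree to a record amounts to a chain of conditionals. The sketch below hard-codes that tree. The record format, the function name, and the sample records are assumptions for illustration (the slide's actual test record is not in the text), and an income of exactly 80K, which the tree does not cover, is arbitrarily sent to the "No" leaf.

```python
def predict_cheat(record):
    """Walk the Refund -> MarSt -> TaxInc tree from the root to a leaf."""
    if record["Refund"] == "Yes":
        return "No"
    if record["MarSt"] == "Married":
        return "No"
    # Single or Divorced: decide on taxable income.
    # Assumption: exactly 80K falls on the "No" side.
    return "Yes" if record["TaxInc"] > 80_000 else "No"

if __name__ == "__main__":
    # Hypothetical records.
    print(predict_cheat({"Refund": "No", "MarSt": "Married", "TaxInc": 80_000}))
    print(predict_cheat({"Refund": "No", "MarSt": "Single", "TaxInc": 95_000}))
```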

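Support Vector Machines: which hyperplane? [Figure: two classes, y = 1 and y = -1, with several candidate separating hyperplanes.]

Support Vector Machines: margin. Margin = |d+| + |d-|. [Figure: classes y = 1 and y = -1 with distances d+ and d- to the hyperplane.]

Support Vector Machines: maximum margin. [Figure: classes y = 1 and y = -1, distances d+ and d-, and the support vectors.]

Support Vector Machines: margin. [Figure: margin d between the hyperplanes b_i1 and b_i2, with points x_1, x_2 and the vector x_1 - x_2; textbook page 261, equations 5.32, 5.33, 5.34.]

Support Vector Machines: objective function. [Figure: classes y = 1 and y = -1 with margin d; textbook page 262, Definition 5.1.]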

Perceptron (1). [Figure: inputs x_1, x_2, ..., x_n with weights w_1, w_2, ..., w_n, plus a constant input x_0 = 1 with weight w_0, feeding one unit.]

Perceptron (2)

Decision Surface of a Perceptron (1)

Decision Surface of a Perceptron (2)

Sigmoid Unit. [Figure: inputs x_1, x_2, ..., x_n with weights w_1, w_2, ..., w_n, plus a constant input x_0 = 1 with weight w_0, feeding one unit.]

Multilayer Networks (1). [Figure: input layer x_1, x_2, x_3; hidden layer h_1, h_2, h_3, h_4; output layer o_1, o_2; example weights w_11 and w_43.]

Advanced Data Structure

Indexing

Suffix Trees. S = MALAYALAM$ (positions 1 to 10). Paths from root to leaves represent all suffixes of S. [Figure: the suffix tree of S; its leaves, from left to right, are suffixes 6, 2, 8, 4, 7, 3, 1, 9, 5, 10.]

Suffix tree properties

Application: Finding a short Pattern in a long String

Finding a Pattern in a String. Find "ALA". [Figure: the suffix tree of MALAYALAM$.] Two matches, at positions 6 and 2.

Edge Encoding. S = MALAYALAM$ (positions 1 to 10). Each edge label is stored as a pair of positions in S, for example (3, 4) for LA, (9, 10) for M$, and (5, 10) for YALAM$.

Naïve Suffix Tree Construction. Before starting: why exactly do we need this $, which is not part of the alphabet?
Suffixes of S by starting position: 1 MALAYALAM$, 2 ALAYALAM$, 3 LAYALAM$, 4 AYALAM$, 5 YALAM$, 6 ALAM$, 7 LAM$, 8 AM$, 9 M$, 10 $

Naïve Suffix Tree Construction. [Figure: the partial tree after inserting suffixes 1 to 4 one at a time, and so on for the remaining suffixes.]

Is Suffix Tree good?

Something Wrong??

Something Wrong?? (2)

Suffix Array: Reducing Space
S = MALAYALAM$ (positions 1 to 10)
Suffix array: lexicographic ordering of suffixes. Derive the Longest Common Prefix (lcp) array; lcp is computed for successive pairs. Suffixes 6 and 2 share "ALA". Suffixes 2 and 8 share just "A".
Position   Suffix        lcp with next suffix
6          ALAM$         3
2          ALAYALAM$     1
8          AM$           1
4          AYALAM$       0
7          LAM$          2
3          LAYALAM$      0
1          MALAYALAM$    1
9          M$            0
5          YALAM$        0
10         $             -

Example
Text: MALAYALAM$
Suffix array: 6 2 8 4 7 3 1 9 5 10
lcp array: 3 1 1 0 2 0 1 0 0
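The two arrays can be reproduced with a deliberately naive construction. The sketch below sorts all suffixes directly, which is not one of the fast construction algorithms, and it follows the slide's ordering, in which $ sorts after every letter (plain ASCII order would put $ first). The pattern search is a simple linear scan for clarity; a real implementation would binary-search the suffix array. All function names are hypothetical.

```python
def suffix_key(text, start):
    """Sort key for the suffix at 0-based index start; '$' ranks after all letters."""
    return [(1, 0) if ch == "$" else (0, ord(ch)) for ch in text[start:]]

def suffix_array(text):
    """Return 1-based starting positions of the suffixes in sorted order."""
    order = sorted(range(len(text)), key=lambda start: suffix_key(text, start))
    return [start + 1 for start in order]

def lcp_array(text, sa):
    """Longest common prefix length of each pair of successive suffixes."""
    def common_prefix(x, y):
        length = 0
        while length < min(len(x), len(y)) and x[length] == y[length]:
            length += 1
        return length
    return [common_prefix(text[a - 1:], text[b - 1:]) for a, b in zip(sa, sa[1:])]

def find_pattern(text, sa, pattern):
    """Positions (in suffix-array order) whose suffix starts with pattern."""
    return [pos for pos in sa if text[pos - 1:].startswith(pattern)]

if __name__ == "__main__":
    s = "MALAYALAM$"
    sa = suffix_array(s)
    print(sa)
    print(lcp_array(s, sa))
    print(find_pattern(s, sa, "ALA"))
```

Matches for a pattern always occupy a contiguous block of the suffix array, which is what makes binary search possible.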

Pattern Search in Suffix Array

Known (amazing) Results

More Applications

Approximate set membership problem

Bloom filters

Bloom filter example
Start with an array of bits, all initialized to 0.
Each element of S (x_1, x_2, ...) is hashed k times; each hash location is set to 1.
To check if y is in S, check the k hash locations of y. If a 0 appears, y is not in S.
If only 1s appear, conclude that y is in S. This may yield a false positive.
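The same steps fit in a short class. This is a minimal sketch, not a tuned implementation: the k hash functions are simulated by salting SHA-256 with the hash index, which is one convenient choice among many, and the class and method names are made up for this example.

```python
import hashlib

class BloomFilter:
    """Bit array of size m with k hash functions; no deletions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [0] * m  # initially all 0

    def _locations(self, item):
        # Derive k hash locations by salting one hash function with its index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for location in self._locations(item):
            self.bits[location] = 1

    def __contains__(self, item):
        # A single 0 proves absence; all 1s may still be a false positive.
        return all(self.bits[location] for location in self._locations(item))

if __name__ == "__main__":
    bloom = BloomFilter(m=64, k=3)
    for element in ("x1", "x2"):
        bloom.add(element)
    print("x1" in bloom, "x2" in bloom)
```

A lookup never gives a false negative: every element that was added finds all of its k bits set.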

The probability of a false positive

Deterministic Tools

Range Searching: given a query Q, report or count the answers

Single-shot vs. Repetitive

Orthogonal Range Searching in 1D: which query points lie inside the interval [a, b]?

Orthogonal Range Searching in 2D

1D Range Query. [Figure: a balanced binary search tree with the points 2, 4, 5, 7, 8, 12, 15, 19 at its leaves, queried with the interval [6, 17].] Query: O(log n + k). Space: O(n).
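The same query and space bounds can be obtained with a sorted array and two binary searches, which is easier to show in a few lines than the tree on the slide. The sketch below swaps in that sorted-array technique and reuses the slide's points and query interval. The function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

def range_report(sorted_points, a, b):
    """Return all points p with a <= p <= b, in O(log n + k) time."""
    low = bisect_left(sorted_points, a)    # first index with point >= a
    high = bisect_right(sorted_points, b)  # first index with point > b
    return sorted_points[low:high]

def range_count(sorted_points, a, b):
    """Count the points in [a, b] in O(log n) time, without listing them."""
    return bisect_right(sorted_points, b) - bisect_left(sorted_points, a)

if __name__ == "__main__":
    points = [2, 4, 5, 7, 8, 12, 15, 19]
    print(range_report(points, 6, 17))
    print(range_count(points, 6, 17))
```

The sorted array works well for a static point set. A tree is the better fit when points are inserted and deleted, or when the structure has to generalize to higher dimensions.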

Querying Strategy. [Figure: the search paths for a and b split at some node.] Problem: linking leaves does not extend to higher dimensions. Idea: if parents knew all their descendants, we would not need to link leaves.

Efficiency

1D Range Counting

2D Range queries. [Figure: a query rectangle [x_1, x_2] × [y_1, y_2] in the x-y plane.]

Range trees. [Figure: a BST T on x-coordinates; each node v of T stores P(v) and an associated BST T_y(v) on the y-coordinates of P(v).]

Range trees. [Figure: the tree T_x over p1 to p7 with a node v, its point set P(v), and the associated tree T_y(v) holding p5, p6, p7.]

Range trees. [Figure: the same tree T_x with node v, P(v), and T_y(v).]
Point   x     y
p1      1     2.5
p2      2     1
p3      3     0
p4      4     4
p5      4.5   3
p6      5.5   3.5
p7      6.5   2

Range trees. The query time: querying a 1D tree requires O(log n + k) time. How many 1D trees (associated structures) do we need to query? At most 2 × height of T = 2 log n. Each 1D query requires O(log n + k') time. Therefore query time = O(log² n + k). Answer to query = union of answers to subqueries: k = ∑k'. Query: [x, x'].

Size of the range tree

Building the range tree

Generalizing to higher dimensions

References
