A T A C G T T C A G A T T G C C A G C A C G T T

Algorithms & Complexity Evaluation

Fast Search in Biological Sequences using Multiple Hash Functions
We are going to deal with a very tiny alphabet
representing nucleotides in a genetic sequence.

ADENINE
THYMINE

Searching a sequence for
multiple patterns.

GUANINE
CYTOSINE

After verifying matches,
advance window: pos++

search window

T G A G C
A G G C A
T G T C G

patterns to
search

T G A G C

shift window
by 1 position

A T G A C G A C T

A G G C A
T G T C G

A T G A C G A C T

Grasping the
problem
string matching???
what’s this?

DNA sequence

Presentation by Simone Tino - All rights reserved. Authored from November 2012 to December 2012 - University of Catania - Faculty of Computer Science - Algoritmi e Complessità
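The sliding-window idea on this slide can be sketched in a few lines. This is only an illustration, not code from the presentation: the function name `naive_multi_search` and the second sample text are hypothetical, and the window always advances by one position (pos++).

```python
def naive_multi_search(text, patterns):
    """Return (position, pattern) for every occurrence of every pattern."""
    matches = []
    for pos in range(len(text)):          # shift the window by 1 position
        for p in patterns:                # verify each pattern at this position
            if text.startswith(p, pos):
                matches.append((pos, p))
    return matches


patterns = ["TGAGC", "AGGCA", "TGTCG"]    # patterns shown on the slide
print(naive_multi_search("ATGACGACT", patterns))   # DNA sequence from the slide
print(naive_multi_search("ATGAGCA", patterns))     # hypothetical text with one hit
```

Every position is checked against every pattern, which is the cost the shift tables on the following slides try to avoid.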
a pattern
NOT A TEXT!!!

now... a text!

T G A G C A C T G
q-gram size q = 3

T G A G C A C T G
extracting the
first q-gram

T G A

First we have a
preprocessing
stage...

F[HASH(’CTG’)] =
patterns[cur]
feeding the hash function
with the extracted q-gram,
hash is returned:
0 <= hash <= MAX

HASH ( T G A ) = #@!*$%£&?
calculated hash is
used as index in
shift array

value used to
shift the window

sh[ #@!*$%£&? ] = shift
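A possible hash function for q-grams over the four-letter alphabet is sketched below. The slides do not say which hash is used, so the 2-bit encoding, the name `qgram_hash`, and the value MAX = 255 are all assumptions made for illustration. The only property taken from the slide is that the result satisfies 0 <= hash <= MAX, so it can index the shift array.

```python
# Hypothetical 2-bit code per nucleotide (not given in the slides).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def qgram_hash(qgram, max_hash=255):
    """Pack the q-gram into an integer and fold it into 0..max_hash."""
    h = 0
    for ch in qgram:
        h = (h << 2) | CODE[ch]       # append 2 bits per character
    return h % (max_hash + 1)         # guarantees 0 <= hash <= max_hash


h = qgram_hash("TGA")
print(h)

# The hash is then used as an index into the shift array.
sh = [0] * 256
sh[h] = 6   # illustrative shift value for this q-gram
```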

Let's talk
about
Wu &
Manber
don’t worry!
It’s not a
magic spell...
it’s just an
algorithm

A C A A C T G G C G A
G T A G C G T G A G C
T G A G T A G T A G A
C T G A C C G C T C C

Then we
can move
to real
search...

window size =
pattern size = m

A C A A C T G G C
extracting the
last q-gram only

G G C

hash function gets the
q-gram, hash returned:
0 <= hash <= MAX

HASH ( G G C ) = ^@!*%£$?#
shift index: shift = sh[ ^@!*%£$?# ]

shift == 0? true:

NAIVE CHECK
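The two stages on this slide (fill the shift array and the table F from the patterns, then slide a window of size m over the text) can be put together as follows. This is a sketch of a Wu-Manber style search under stated assumptions, not the presenter's code: the 2-bit hash, the table size of 64, the default shift m - q + 1 for q-grams that occur in no pattern, and all function names are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}   # hypothetical 2-bit encoding


def qgram_hash(s, start, q, table_size):
    h = 0
    for ch in s[start:start + q]:
        h = (h << 2) | CODE[ch]
    return h % table_size


def wm_preprocess(patterns, q, table_size):
    m = min(len(p) for p in patterns)          # window size = shortest pattern
    sh = [m - q + 1] * table_size              # default shift for unseen q-grams
    F = [[] for _ in range(table_size)]        # patterns grouped by last q-gram
    for p in patterns:
        for i in range(m - q + 1):
            h = qgram_hash(p, i, q, table_size)
            sh[h] = min(sh[h], m - q - i)      # distance to the window end
        F[qgram_hash(p, m - q, q, table_size)].append(p)
    return m, sh, F


def wm_search(text, patterns, q=3, table_size=64):
    m, sh, F = wm_preprocess(patterns, q, table_size)
    matches, pos = [], 0
    while pos + m <= len(text):
        h = qgram_hash(text, pos + m - q, q, table_size)   # last q-gram only
        if sh[h] == 0:                                     # candidate: naive check
            for p in F[h]:
                if text.startswith(p, pos):
                    matches.append((pos, p))
            pos += 1
        else:
            pos += sh[h]
    return matches


print(wm_search("ATGAGCAGGCAT", ["TGAGC", "AGGCA", "TGTCG"]))
```

With q = 3 and a table of 64 entries the 2-bit hash has no collisions; a smaller table would still be correct, because every candidate is verified, but it would trigger more naive checks.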

T G A

k = Math.floor(w/q);

[Figure: bit strings illustrating k bits per character within w bits]

Decrease number of false positives:
Increase q: more text to analyze
Increase k: more bits per char

W-M limit: cannot increase them both...
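The trade-off can be made concrete with a small calculation. The sketch assumes that w is a fixed number of bits available for one hash value, which the slide does not state explicitly; the value w = 16 and the helper name are hypothetical.

```python
def bits_per_char(w, q):
    """k = floor(w / q), as in the formula on the slide."""
    return w // q


W = 16  # hypothetical bit budget
for q in (2, 4, 8):
    print(f"q={q}: k={bits_per_char(W, q)} bits per character")
```

Under this reading, a longer q-gram leaves fewer bits for each character, so q and k cannot both grow within the same budget.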

Enhancing
W-M...

T G A G C A C T G

γ =1

γ =2

T G A G C A C T G

pre-processing

T G A G C A C T G

HASH(’CTG’) = h1

HASH(’GCA’) = h2

HASH ( T G A ) = #@!*$%

HASH ( T G A ) = #@!*$%

sh1[ #@!*$% ] = m-q-i

sh2[ #@!*$% ] = m-2q-i

to be
continued...

h = ( h1 << 1) + h2

F[h] = patterns[cur]
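For the pattern T G A G C A C T G with q = 3 and γ = 2, the two shift tables and the combined index h can be computed as in the sketch below. The 2-bit hash and the helper names are assumptions; the formulas m-q-i, m-2q-i and h = (h1 << 1) + h2 are the ones on the slide, with i taken as the 0-based start of the q-gram.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}   # hypothetical 2-bit encoding


def qgram_hash(qgram):
    h = 0
    for ch in qgram:
        h = (h << 2) | CODE[ch]
    return h


def shift_table(pattern, q, level):
    """Map each q-gram hash to m - level*q - i (level 1 -> sh1, level 2 -> sh2)."""
    m = len(pattern)
    last = m - level * q
    table = {}
    for i in range(last + 1):
        # Later (rightmost) occurrences overwrite earlier ones: smaller shift wins.
        table[qgram_hash(pattern[i:i + q])] = last - i
    return table


pattern, q = "TGAGCACTG", 3
sh1 = shift_table(pattern, q, 1)
sh2 = shift_table(pattern, q, 2)

h1 = qgram_hash("CTG")        # last q-gram of the pattern
h2 = qgram_hash("GCA")        # q-gram just before it
h = (h1 << 1) + h2
F = {h: [pattern]}            # F[h] = patterns[cur]

print(sh1[qgram_hash("TGA")], sh2[qgram_hash("TGA")], h)
```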

...Enhancing
W-M
search

window

A C A A C T G G C G A
G T A G C G T G A G C
T G A G T A G T A G A
C T G A C C G C T C C

shift1 = sh1[ §+!#*£$?% ]

HASH ( G G C ) = §+!#*£$?%

a text

h1

A C A A C T G G C

...now you
can’t go back

In the end...

h = ( h1 << 1) + h2
if (shift1 == 0 &&
shift2 == 0)
foreach (p in F[h])
checkOccurrInWin(p);

h2

HASH ( A C T ) = ^@!*%£$?#
shift2 = sh2[ ^@!*%£$?# ]
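A complete search with γ hash positions might look like the sketch below. Several points are assumptions, not statements from the slides: the 2-bit hash, the default shifts m - jq + 1 for q-grams that occur in no pattern, and the rule of advancing by the largest of the shifts when they are not all zero. The verification step follows the slide: only when every shift is 0 are the patterns in F[h] checked against the window.

```python
from collections import defaultdict

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}   # hypothetical 2-bit encoding


def qgram_hash(s, start, q):
    h = 0
    for ch in s[start:start + q]:
        h = (h << 2) | CODE[ch]
    return h


def combined_hash(s, end, q, gamma):
    """For gamma = 2 this is h = (h1 << 1) + h2, as on the slide."""
    h = 0
    for j in range(1, gamma + 1):
        h = (h << 1) + qgram_hash(s, end - j * q, q)
    return h


def preprocess(patterns, q, gamma):
    m = min(len(p) for p in patterns)
    if m < gamma * q:
        raise ValueError("shortest pattern must contain gamma q-grams")
    sh = [{} for _ in range(gamma)]            # sh[0] is sh1, sh[1] is sh2, ...
    F = defaultdict(list)
    for p in patterns:
        for j in range(gamma):
            last = m - (j + 1) * q
            for i in range(last + 1):
                h = qgram_hash(p, i, q)
                sh[j][h] = min(sh[j].get(h, last + 1), last - i)
        F[combined_hash(p, m, q, gamma)].append(p)
    return m, sh, F


def search(text, patterns, q=3, gamma=2):
    m, sh, F = preprocess(patterns, q, gamma)
    matches, pos = [], 0
    while pos + m <= len(text):
        end = pos + m
        shifts = [
            sh[j].get(qgram_hash(text, end - (j + 1) * q, q), m - (j + 1) * q + 1)
            for j in range(gamma)
        ]
        step = max(shifts)                     # assumed: take the largest safe shift
        if step == 0:                          # shift1 == 0 && shift2 == 0
            for p in F.get(combined_hash(text, end, q, gamma), []):
                if text.startswith(p, pos):    # checkOccurrInWin(p)
                    matches.append((pos, p))
            step = 1
        pos += step
    return matches


print(search("TGAGCACTGAGCACTG", ["TGAGCACTG"]))
```

Requiring all γ shifts to be zero before verifying is what reduces the number of naive checks compared with a single shift table.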

Complexities
Pre-processing
Space requirement: $O(\mathrm{MAX}\,(1 + [\ldots]) + r)$
Time requirement: $O(\mathrm{MAX} + r\,m\,q)$

Search phase
Time requirement: $O(m^{(1)}\,n)$, where $m^{(1)} = \sum_{i=1}^{r} \mathrm{len}(p_i)$

Experimental results
[Figure: three plots of time (y axis) against w (x axis: 8, 16, 32, 64, 128), one panel each for |P| = 100, |P| = 1000 and |P| = 10000. Each panel compares the best WM(q,γ) with MBNDM; point labels include WM(4,2), WM(6,1), WM(8,1), WM(8,2) and WM(8,3).]

Comparison of execution times between WM(q,γ)
and one of the current fastest algorithms in the literature
A T A C G T T C A G A T T G C C A G C A C G T T

The End
