bioinfolec_7th_20071005

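  1. [Figure: sets X and Y and their intersection X ∩ Y]

     Mutual information:

     $$I(X; Y) = \sum_{y \in Y} \sum_{x \in X} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}$$

     Simpson coefficient:

     $$\frac{|X \cap Y|}{\min(|X|, |Y|)}$$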
  2. PubMed search for CDK2:

         $ curl "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=CDK2"
         <?xml version="1.0"?>
         <!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd">
         <eSearchResult>
           <Count>3778</Count>
           <RetMax>20</RetMax>
           <RetStart>0</RetStart>
           <IdList>
             <Id>17904841</Id>
             <Id>17904366</Id>
             <Id>17893107</Id>
             (...)
         </eSearchResult>
  3. PubMed searches for CDK6 alone, then for CDK2 and CDK6 together:

         $ curl "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=CDK6"
         (...)
         <eSearchResult>
           <Count>740</Count>
           <RetMax>20</RetMax>
           <RetStart>0</RetStart>
           (...)
         </eSearchResult>

         $ curl "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=CDK2+CDK6"
         (...)
         <eSearchResult>
           <Count>321</Count>
           <RetMax>20</RetMax>
           <RetStart>0</RetStart>
           (...)
         </eSearchResult>
  4. $$\frac{|X \cap Y|}{\min(|X|, |Y|)} = \frac{321}{\min(3778, 740)} = \frac{321}{740} \approx 0.434$$
  5. Running the script:

         $ ruby simpson.rb CDK2 CDK6
         CDK2 CDK6 3778 742 321 0.432614555256065
  6. simpson.rb:

         #!/usr/bin/env ruby
         require 'rexml/document'
         require 'open-uri'

         # Return the number of PubMed records matching the search term.
         def count(gene)
           url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=" + gene
           # URI.open replaces the bare open(url), which stopped opening URLs in Ruby 3.0.
           source = URI.open(url) { |fp| fp.read }
           doc = REXML::Document.new(source)
           doc.elements['/eSearchResult/Count'].text.to_i
         end

         # Simpson coefficient: co-occurrence count divided by the smaller single count.
         # Returns nil when either single count is zero or negative.
         def simpson(gene1_count, gene2_count, gene12_count)
           return nil if gene1_count <= 0 || gene2_count <= 0
           gene12_count.to_f / [gene1_count, gene2_count].min
         end
  7. simpson.rb, continued:

         # Look up both genes and their co-occurrence, then print one result line.
         def main(gene1, gene2)
           gene1_count = count(gene1)
           gene2_count = count(gene2)
           gene12_count = count(gene1 + "+" + gene2)
           s = simpson(gene1_count, gene2_count, gene12_count)
           puts [gene1, gene2, gene1_count, gene2_count, gene12_count, s].join(" ")
         end

         main(ARGV[0], ARGV[1])
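
The count extraction and the coefficient from slides 2 to 6 can be checked without network access. The following is a minimal sketch, not part of the original slides: `sample_response`, `parse_count`, and `simpson_coefficient` are hypothetical helper names, and the XML is a hand-trimmed stand-in for a real eSearch reply that carries only the fields shown on the slides.

```ruby
require 'rexml/document'

# Build a trimmed eSearch-style reply (assumption: only the fields from the slides).
def sample_response(count)
  "<?xml version=\"1.0\"?><eSearchResult><Count>#{count}</Count>" \
  "<RetMax>20</RetMax><RetStart>0</RetStart></eSearchResult>"
end

# Pull the hit count out of an eSearch XML document.
def parse_count(xml)
  REXML::Document.new(xml).elements['/eSearchResult/Count'].text.to_i
end

# Simpson coefficient: |X ∩ Y| / min(|X|, |Y|); nil when either count is not positive.
def simpson_coefficient(count_x, count_y, count_xy)
  return nil if count_x <= 0 || count_y <= 0
  count_xy.to_f / [count_x, count_y].min
end

cdk2 = parse_count(sample_response(3778))
cdk6 = parse_count(sample_response(740))
both = parse_count(sample_response(321))
puts simpson_coefficient(cdk2, cdk6, both).round(3)  # prints 0.434
```

With the counts from slides 2 and 3 this reproduces the hand calculation on slide 4. The script run on slide 5 reports a CDK6 count of 742 rather than 740, which is why its coefficient is slightly lower.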

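
Slide 1 gives two measures, but the script only implements the Simpson coefficient. As a rough illustration of both, the sketch below computes them on small sets of document IDs. It assumes each gene is represented by the set of documents that mention it, that mutual information is taken between the two binary events "document mentions X" and "document mentions Y", and that the total number of documents `n` is known. The function names and the toy data are made up for this example.

```ruby
require 'set'

# Simpson coefficient of two sets: |X ∩ Y| / min(|X|, |Y|).
def simpson_coefficient(x, y)
  return nil if x.empty? || y.empty?
  (x & y).size.to_f / [x.size, y.size].min
end

# Mutual information in bits between "document is in x" and "document is in y",
# where x and y are sets of document IDs drawn from n documents in total.
def mutual_information(x, y, n)
  both = (x & y).size
  # Joint counts for the four (in x?, in y?) combinations.
  joint = {
    [true, true]   => both,
    [true, false]  => x.size - both,
    [false, true]  => y.size - both,
    [false, false] => n - x.size - y.size + both
  }
  px = { true => x.size.to_f / n, false => 1 - x.size.to_f / n }
  py = { true => y.size.to_f / n, false => 1 - y.size.to_f / n }
  joint.sum do |(in_x, in_y), c|
    next 0.0 if c.zero? # a cell with no documents contributes nothing
    pxy = c.to_f / n
    pxy * Math.log2(pxy / (px[in_x] * py[in_y]))
  end
end

x = Set[1, 2, 3, 4]
y = Set[3, 4, 5]
puts simpson_coefficient(x, y)     # 2 shared documents / min(4, 3)
puts mutual_information(x, y, 8)
```

Applying the mutual information measure to PubMed counts would also need the total number of PubMed records, which the slides do not provide.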