Working with Dictionaries and Lists/Sets
Modules you can use:
argparse, re, os, collections, sys
General Guidelines (Steps 1-6)
You have more flexibility to implement your own function names and logic in these programs.
The data files you need for this assignment can be obtained from:
HUGO_genes.txt, chr21_genes.txt
Create an output directory inside your assignment4 directory called "OUTPUT" for result files, so
that they will not mix with your programs. Output from your programs will be here!
Pay close attention to how you'll run the run_lints.sh script (see below)
Your program must implement command line options for the infiles it must open, but for
testing purposes it should run by default, so if no command line option is passed at the command
line, the program will still run. This will help in the grading of your program.
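One way to meet the "runs by default" requirement is to give each argparse option a default value. The sketch below is only an illustration: the function name get_cli_args, the long option --infile, and the default path are assumptions, not names required by this assignment.

```python
import argparse


def get_cli_args(argv=None):
    """Parse command line options, falling back to a default infile.

    The function name, the long option name and the default path are
    illustrative assumptions, not part of the assignment text.
    """
    parser = argparse.ArgumentParser(
        description="Open a gene file given on the command line")
    parser.add_argument("-i", "--infile",
                        dest="infile",
                        type=str,
                        default="chr21_genes.txt",  # assumed default location
                        help="Path to the file to open")
    # argv=None makes argparse read sys.argv[1:], the normal behaviour
    return parser.parse_args(argv)
```

Calling get_cli_args([]) simulates running the program with no options, so args.infile holds the default; passing ["-i", "other.txt"] overrides it.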
Create a Python Module called io_utils.py. Put this io_utils.py Module in a subdirectory
named assignment4 inside your assignment4 top-level directory (see the tree below). Anytime a
file needs to be opened (read or write) in your programs in this assignment, the program should
call on this module's function get_filehandle. You can then use io_utils.get_filehandle by doing
this at the top of your programs:
from assignment4 import io_utils
# I can then use the module's get_filehandle() function by:
fh_in = io_utils.get_filehandle(infile1, "r") # note the function call "io_utils.get_filehandle()"
You can also import this way (either way is acceptable, but I will assume in this assignment that
you did it the way below; I think this is better for PyCharm):
from assignment4.io_utils import get_filehandle
# I can then use the module's get_filehandle() function by:
fh_in = get_filehandle(infile1, "r") # note the function call "get_filehandle()"
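For orientation, here is a minimal sketch of what io_utils.py could contain. The assignment only fixes the function name get_filehandle and the (file, mode) call shown above; the parameter names, the error message, and the choice to re-raise the exception are assumptions made for this sketch.

```python
import sys


def get_filehandle(file=None, mode=None):
    """Open `file` with `mode` and return the filehandle.

    Assumed behaviour for this sketch: report the problem on STDERR and
    re-raise, so the calling program can decide how to exit.
    """
    try:
        return open(file, mode)
    except OSError as err:
        # Missing file, permission problem, path is a directory, ...
        print(f"Could not open the file: {file} for mode '{mode}': {err}",
              file=sys.stderr)
        raise
    except ValueError as err:
        # open() raises ValueError for an invalid mode string
        print(f"Wrong open mode '{mode}' for file: {file}: {err}",
              file=sys.stderr)
        raise
```

Keeping every open() call behind this one function means the file error handling lives in a single place and can be unit tested on its own.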
Your final submission must have the following files in bold and must use this directory structure.
Make sure you put a blank __init__.py where I've denoted below, e.g.: assignment4/assignment4
(see below)
Make sure you have __init__.py only in assignment4/assignment4 folder, the tests folder, the unit
folder, and not in the assignment4 main folder.
Information on Source files
The chr21_genes.txt file lists genes from human chromosome 21, in their order along the
chromosome, as described in Hattori et al. (Nature 405, 311-319). For
each gene, the file gives the gene symbol, description and category. The fields are separated by
tabs. You will need to get the meaning of each category. You can find these meanings in the
original paper, under the "Gene categories" section. Create a file named
chr21_genes_categories.txt that stores this information in tab-separated fields:
This will be used in program #2
The HUGO_genes.txt file lists all human genes having an official symbol approved by the HUGO
Gene Nomenclature Committee (some have probably changed by now).
For each gene, the file gives its symbol and description, separated by a TAB character.
Exercises
1. Write a program (call it gene_names_from_chr21.py) that asks the user to enter a gene
symbol and then prints the description for that gene based on data from the chr21_genes.txt file.
The program should give an error message if the entered symbol is not found in the table (the user
should not have to worry about case, i.e. it will be a case-insensitive search). The program
should continue to ask the user for genes until "quit" or "exit" is given (case-insensitive). Make
sure to prompt the user to enter "quit" or "exit" to end the program. Use Dictionaries to solve this
problem. HINT: Feel free to use a Dictionary of Dictionaries, but it is not required.
HINT: First read the entire text file into a Dictionary that maps the association between gene
symbol and description. Once again, make sure to use a Dictionary.
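The hint above can be sketched as follows. This is only one possible shape: the function names are made up for illustration, and the sketch assumes every data row has the symbol in the first tab-separated field and the description in the second (if your copy of the file starts with a header line, you would need to skip it).

```python
def build_gene_dict(lines):
    """Map upper-cased gene symbol -> description from tab-separated lines."""
    gene_to_description = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed rows
        # Store the key upper-cased so lookups can be case-insensitive
        gene_to_description[fields[0].upper()] = fields[1]
    return gene_to_description


def describe(gene_to_description, symbol):
    """Return the description for `symbol`, or None when it is not found."""
    return gene_to_description.get(symbol.strip().upper())


def is_quit(text):
    """True when the user typed quit or exit, in any letter case."""
    return text.strip().lower() in ("quit", "exit")
```

The main loop would then call input() repeatedly, stop when is_quit() is true, and print an error message whenever describe() returns None.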
Remember to have these command line options:
$ python3 gene_names_from_chr21.py -i chr21_genes.txt
Output from this program should just go to <STDOUT>:
2. Write a program (call it find_common_cats.py) that counts how many genes are in each
category (1.1, 1.2, 2.1 etc.) based on data from the chr21_genes.txt file. The program should print
the results to an output file, with categories arranged in ascending order (call the output file
OUTPUT/categories.txt). Read the paper to see what the categories represent and make
this part of your output (this will be input from chr21_genes_categories.txt). Use Dictionaries to
solve this problem. HINT: Feel free to use a Dictionary of Dictionaries, but it is not required.
Note: you will notice that one gene has no category information. That's due to missing data in the
file, JUST IGNORE THIS GENE!
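A rough sketch of the counting step, under the assumption that the category is the third tab-separated field of each data row. The function name count_categories is hypothetical, and a header line, if your file has one, would have to be skipped before calling it.

```python
def count_categories(lines):
    """Count genes per category, ignoring rows with no category value."""
    counts = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Rows with a missing or empty third field carry no category
        if len(fields) < 3 or not fields[2].strip():
            continue
        category = fields[2].strip()
        counts[category] = counts.get(category, 0) + 1
    return counts
```

Iterating over sorted(counts) then visits the categories in ascending order. Note that this is a string sort, which is adequate only while all category labels have the same "digit.digit" shape.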
Remember to have these command line options:
$ python3 find_common_cats.py -i1 chr21_genes.txt -i2 chr21_genes_categories.txt
Output to the file (OUTPUT/categories.txt) from this program:
Note <Occurrence Here> is a number
3. Write a program (call it intersection_of_gene_names.py) that finds all gene symbols that
appear both in the chr21_genes.txt file and in the HUGO_genes.txt file. These gene symbols
should be printed to a file in alphabetical order (you can hard code the output file
OUTPUT/intersection_output.txt). The program should also print on the terminal how many
common gene symbols were found. Use Lists or Sets to solve the problem. It is fine to use a
temporary Dictionary to find the intersection of two Lists, but this can be simplified with Sets. Note:
HUGO_genes.txt could have some duplicate entries.
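With Sets, duplicates disappear on insertion and the intersection is a single operator. A minimal sketch, assuming the gene symbol is the first tab-separated field of every row; the function names are illustrative only.

```python
def unique_symbols(lines):
    """Collect the first column of each row into a set (duplicates collapse)."""
    symbols = set()
    for line in lines:
        first = line.rstrip("\n").split("\t")[0].strip()
        if first:  # skip blank lines
            symbols.add(first)
    return symbols


def common_symbols(lines_a, lines_b):
    """Return the symbols present in both inputs, in alphabetical order."""
    return sorted(unique_symbols(lines_a) & unique_symbols(lines_b))
```

len(unique_symbols(...)) gives the "unique gene names" count for each file, and len(common_symbols(...)) gives the number of common symbols to report on the terminal.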
Remember to have these command line options:
$ python3 intersection_of_gene_names.py -i1 chr21_genes.txt -i2 HUGO_genes.txt # each N below is an integer (bolded for illustration only)
Number of unique gene names in chr21_genes.txt: N
Number of unique gene names in HUGO_genes.txt: N
Number of common gene symbols found: N
Output stored in OUTPUT/intersection_output.txt
STDOUT is shown above; the actual output of the intersection goes to the file
(OUTPUT/intersection_output.txt) from this program:
If you implemented intersection_of_gene_names.py correctly, this program can take any gene
file that has the gene symbol in the first column (even if it's the only column).
Additional examples: hgnc_complete_set_reduced.txt and gene_age.txt
$ python3 intersection_of_gene_names.py -i1 hgnc_complete_set_reduced.txt -i2 HUGO_genes.txt
Number of unique gene names in hgnc_complete_set_reduced.txt: 43547
Number of unique gene names in HUGO_genes.txt: 11815
Number of common gene symbols found: 8654
Output stored in OUTPUT/intersection_output.txt
$ python3 intersection_of_gene_names.py -i1 gene_age.txt -i2 chr21_genes.txt
Number of unique gene names in gene_age.txt: 307
Number of unique gene names in chr21_genes.txt: 285
Number of common gene symbols found: 4
Output stored in OUTPUT/intersection_output.txt
You must solve exercises 1 and 2 by using Dictionaries, and exercise 3 using Lists or Sets

More Related Content

Similar to Working with Dictionaries and ListsSets Modules you can use.pdf

Part 5 of "Introduction to Linux for Bioinformatics": Working the command lin...
Part 5 of "Introduction to Linux for Bioinformatics": Working the command lin...Part 5 of "Introduction to Linux for Bioinformatics": Working the command lin...
Part 5 of "Introduction to Linux for Bioinformatics": Working the command lin...
Joachim Jacob
 
Project report
Project reportProject report
Project report
meenalpandey
 
You must implement the following functions- Name the functions exactly.docx
You must implement the following functions- Name the functions exactly.docxYou must implement the following functions- Name the functions exactly.docx
You must implement the following functions- Name the functions exactly.docx
Sebastian6SWSlaterb
 
Hb 1486-001 1074970 qsg-gene_readdataanalysis_1112
Hb 1486-001 1074970 qsg-gene_readdataanalysis_1112Hb 1486-001 1074970 qsg-gene_readdataanalysis_1112
Hb 1486-001 1074970 qsg-gene_readdataanalysis_1112Elsa von Licy
 
Lab 1 Essay
Lab 1 EssayLab 1 Essay
Lab 1 Essay
Melissa Moore
 
Must be similar to screenshotsI must be able to run the projects.docx
Must be similar to screenshotsI must be able to run the projects.docxMust be similar to screenshotsI must be able to run the projects.docx
Must be similar to screenshotsI must be able to run the projects.docx
herthaweston
 
Digital Forensic Examination Summary Report(for ALL lab assignme.docx
Digital Forensic Examination Summary Report(for ALL lab assignme.docxDigital Forensic Examination Summary Report(for ALL lab assignme.docx
Digital Forensic Examination Summary Report(for ALL lab assignme.docx
lynettearnold46882
 
Description 1) Create a Lab2 folder for this project2.docx
Description       1)  Create a Lab2 folder for this project2.docxDescription       1)  Create a Lab2 folder for this project2.docx
Description 1) Create a Lab2 folder for this project2.docx
theodorelove43763
 
GE3151_PSPP_UNIT_5_Notes
GE3151_PSPP_UNIT_5_NotesGE3151_PSPP_UNIT_5_Notes
GE3151_PSPP_UNIT_5_Notes
Asst.prof M.Gokilavani
 
intro unix/linux 11
intro unix/linux 11intro unix/linux 11
intro unix/linux 11
duquoi
 
Instructions Write a program whose main function is merely a.pdf
Instructions Write a program whose main function is merely a.pdfInstructions Write a program whose main function is merely a.pdf
Instructions Write a program whose main function is merely a.pdf
adinathknit
 
C++ - UNIT_-_V.pptx which contains details about File Concepts
C++  - UNIT_-_V.pptx which contains details about File ConceptsC++  - UNIT_-_V.pptx which contains details about File Concepts
C++ - UNIT_-_V.pptx which contains details about File Concepts
ANUSUYA S
 
CDS Filtering Program - User Manual
CDS Filtering Program - User ManualCDS Filtering Program - User Manual
CDS Filtering Program - User Manual
Yoann Pageaud
 
1st KeyStone Summer School - Hackathon Challenge
1st KeyStone Summer School - Hackathon Challenge1st KeyStone Summer School - Hackathon Challenge
1st KeyStone Summer School - Hackathon Challenge
Joel Azzopardi
 
Python for Physical Science.pdf
Python for Physical Science.pdfPython for Physical Science.pdf
Python for Physical Science.pdf
MarilouANDERSON
 
CS101S. ThompsonUniversity of BridgeportLab 7 Files, File.docx
CS101S. ThompsonUniversity of BridgeportLab 7 Files, File.docxCS101S. ThompsonUniversity of BridgeportLab 7 Files, File.docx
CS101S. ThompsonUniversity of BridgeportLab 7 Files, File.docx
annettsparrow
 
CSO Laboratory Manual
CSO Laboratory ManualCSO Laboratory Manual
CSO Laboratory Manual
Dwight Sabio
 
2600 v08 n1 (spring 1991)
2600 v08 n1 (spring 1991)2600 v08 n1 (spring 1991)
2600 v08 n1 (spring 1991)
Felipe Prado
 
Sam python pro_points_slide
Sam python pro_points_slideSam python pro_points_slide
Sam python pro_points_slide
"Samprateek "Sam"" Sinha
 

Similar to Working with Dictionaries and ListsSets Modules you can use.pdf (20)

Part 5 of "Introduction to Linux for Bioinformatics": Working the command lin...
Part 5 of "Introduction to Linux for Bioinformatics": Working the command lin...Part 5 of "Introduction to Linux for Bioinformatics": Working the command lin...
Part 5 of "Introduction to Linux for Bioinformatics": Working the command lin...
 
Project report
Project reportProject report
Project report
 
You must implement the following functions- Name the functions exactly.docx
You must implement the following functions- Name the functions exactly.docxYou must implement the following functions- Name the functions exactly.docx
You must implement the following functions- Name the functions exactly.docx
 
Unit 5
Unit 5Unit 5
Unit 5
 
Hb 1486-001 1074970 qsg-gene_readdataanalysis_1112
Hb 1486-001 1074970 qsg-gene_readdataanalysis_1112Hb 1486-001 1074970 qsg-gene_readdataanalysis_1112
Hb 1486-001 1074970 qsg-gene_readdataanalysis_1112
 
Lab 1 Essay
Lab 1 EssayLab 1 Essay
Lab 1 Essay
 
Must be similar to screenshotsI must be able to run the projects.docx
Must be similar to screenshotsI must be able to run the projects.docxMust be similar to screenshotsI must be able to run the projects.docx
Must be similar to screenshotsI must be able to run the projects.docx
 
Digital Forensic Examination Summary Report(for ALL lab assignme.docx
Digital Forensic Examination Summary Report(for ALL lab assignme.docxDigital Forensic Examination Summary Report(for ALL lab assignme.docx
Digital Forensic Examination Summary Report(for ALL lab assignme.docx
 
Description 1) Create a Lab2 folder for this project2.docx
Description       1)  Create a Lab2 folder for this project2.docxDescription       1)  Create a Lab2 folder for this project2.docx
Description 1) Create a Lab2 folder for this project2.docx
 
GE3151_PSPP_UNIT_5_Notes
GE3151_PSPP_UNIT_5_NotesGE3151_PSPP_UNIT_5_Notes
GE3151_PSPP_UNIT_5_Notes
 
intro unix/linux 11
intro unix/linux 11intro unix/linux 11
intro unix/linux 11
 
Instructions Write a program whose main function is merely a.pdf
Instructions Write a program whose main function is merely a.pdfInstructions Write a program whose main function is merely a.pdf
Instructions Write a program whose main function is merely a.pdf
 
C++ - UNIT_-_V.pptx which contains details about File Concepts
C++  - UNIT_-_V.pptx which contains details about File ConceptsC++  - UNIT_-_V.pptx which contains details about File Concepts
C++ - UNIT_-_V.pptx which contains details about File Concepts
 
CDS Filtering Program - User Manual
CDS Filtering Program - User ManualCDS Filtering Program - User Manual
CDS Filtering Program - User Manual
 
1st KeyStone Summer School - Hackathon Challenge
1st KeyStone Summer School - Hackathon Challenge1st KeyStone Summer School - Hackathon Challenge
1st KeyStone Summer School - Hackathon Challenge
 
Python for Physical Science.pdf
Python for Physical Science.pdfPython for Physical Science.pdf
Python for Physical Science.pdf
 
CS101S. ThompsonUniversity of BridgeportLab 7 Files, File.docx
CS101S. ThompsonUniversity of BridgeportLab 7 Files, File.docxCS101S. ThompsonUniversity of BridgeportLab 7 Files, File.docx
CS101S. ThompsonUniversity of BridgeportLab 7 Files, File.docx
 
CSO Laboratory Manual
CSO Laboratory ManualCSO Laboratory Manual
CSO Laboratory Manual
 
2600 v08 n1 (spring 1991)
2600 v08 n1 (spring 1991)2600 v08 n1 (spring 1991)
2600 v08 n1 (spring 1991)
 
Sam python pro_points_slide
Sam python pro_points_slideSam python pro_points_slide
Sam python pro_points_slide
 

More from advancesystem

Write a Fortran program to SOS take the name and ID number.pdf
Write a Fortran program to  SOS  take the name and ID number.pdfWrite a Fortran program to  SOS  take the name and ID number.pdf
Write a Fortran program to SOS take the name and ID number.pdf
advancesystem
 
Write a function that will take four parameters omega x0.pdf
Write a function that will take four parameters omega  x0.pdfWrite a function that will take four parameters omega  x0.pdf
Write a function that will take four parameters omega x0.pdf
advancesystem
 
Write a function that will take four parameters omega p.pdf
Write a function that will take four parameters omega  p.pdfWrite a function that will take four parameters omega  p.pdf
Write a function that will take four parameters omega p.pdf
advancesystem
 
Write a generalpurpose program with loop and indexed addres.pdf
Write a generalpurpose program with loop and indexed addres.pdfWrite a generalpurpose program with loop and indexed addres.pdf
Write a generalpurpose program with loop and indexed addres.pdf
advancesystem
 
Write a computer program in JAVA that hides a secret message.pdf
Write a computer program in JAVA that hides a secret message.pdfWrite a computer program in JAVA that hides a secret message.pdf
Write a computer program in JAVA that hides a secret message.pdf
advancesystem
 
Write a Fortran program to 1 take the name and ID number of.pdf
Write a Fortran program to 1 take the name and ID number of.pdfWrite a Fortran program to 1 take the name and ID number of.pdf
Write a Fortran program to 1 take the name and ID number of.pdf
advancesystem
 
Write a function called ApologyLine that take an integer c a.pdf
Write a function called ApologyLine that take an integer c a.pdfWrite a function called ApologyLine that take an integer c a.pdf
Write a function called ApologyLine that take an integer c a.pdf
advancesystem
 
Write a C++ code that makes pyramid shape as long as user wa.pdf
Write a C++ code that makes pyramid shape as long as user wa.pdfWrite a C++ code that makes pyramid shape as long as user wa.pdf
Write a C++ code that makes pyramid shape as long as user wa.pdf
advancesystem
 
Write a class called Window that contains the following info.pdf
Write a class called Window that contains the following info.pdfWrite a class called Window that contains the following info.pdf
Write a class called Window that contains the following info.pdf
advancesystem
 
Write a C code that uses struct to create a userdefined typ.pdf
Write a C code that uses struct to create a userdefined typ.pdfWrite a C code that uses struct to create a userdefined typ.pdf
Write a C code that uses struct to create a userdefined typ.pdf
advancesystem
 
Write a 12 page report on igneous rock and how they are m.pdf
Write a 12 page report on igneous rock and  how they are m.pdfWrite a 12 page report on igneous rock and  how they are m.pdf
Write a 12 page report on igneous rock and how they are m.pdf
advancesystem
 
WQ4 Coevolution of Central American ants in the Pseudomyrme.pdf
WQ4 Coevolution of Central American ants in the Pseudomyrme.pdfWQ4 Coevolution of Central American ants in the Pseudomyrme.pdf
WQ4 Coevolution of Central American ants in the Pseudomyrme.pdf
advancesystem
 
With the following companies Apple Caterpillar Consolidat.pdf
With the following companies Apple Caterpillar Consolidat.pdfWith the following companies Apple Caterpillar Consolidat.pdf
With the following companies Apple Caterpillar Consolidat.pdf
advancesystem
 
Would the current answer be considered correct 2 Identify .pdf
Would the current answer be considered correct 2 Identify .pdfWould the current answer be considered correct 2 Identify .pdf
Would the current answer be considered correct 2 Identify .pdf
advancesystem
 
WQ3 Considering that you know about natural selection is o.pdf
WQ3 Considering that you know about natural selection is o.pdfWQ3 Considering that you know about natural selection is o.pdf
WQ3 Considering that you know about natural selection is o.pdf
advancesystem
 
would be earned How much of the total is simple interest an.pdf
would be earned How much of the total is simple interest an.pdfwould be earned How much of the total is simple interest an.pdf
would be earned How much of the total is simple interest an.pdf
advancesystem
 
write 350400 WORDS AND EXPLAIN BRIEFLY lapter 7 How Touris.pdf
write 350400 WORDS AND EXPLAIN BRIEFLY lapter 7 How Touris.pdfwrite 350400 WORDS AND EXPLAIN BRIEFLY lapter 7 How Touris.pdf
write 350400 WORDS AND EXPLAIN BRIEFLY lapter 7 How Touris.pdf
advancesystem
 
World vegetation maps and world climate maps are very simila.pdf
World vegetation maps and world climate maps are very simila.pdfWorld vegetation maps and world climate maps are very simila.pdf
World vegetation maps and world climate maps are very simila.pdf
advancesystem
 
Working individually or in pairs you will apply what you ha.pdf
Working individually or in pairs you will apply what you ha.pdfWorking individually or in pairs you will apply what you ha.pdf
Working individually or in pairs you will apply what you ha.pdf
advancesystem
 
Without using a function write the JavaScript code so that w.pdf
Without using a function write the JavaScript code so that w.pdfWithout using a function write the JavaScript code so that w.pdf
Without using a function write the JavaScript code so that w.pdf
advancesystem
 

More from advancesystem (20)

Write a Fortran program to SOS take the name and ID number.pdf
Write a Fortran program to  SOS  take the name and ID number.pdfWrite a Fortran program to  SOS  take the name and ID number.pdf
Write a Fortran program to SOS take the name and ID number.pdf
 
Write a function that will take four parameters omega x0.pdf
Write a function that will take four parameters omega  x0.pdfWrite a function that will take four parameters omega  x0.pdf
Write a function that will take four parameters omega x0.pdf
 
Write a function that will take four parameters omega p.pdf
Write a function that will take four parameters omega  p.pdfWrite a function that will take four parameters omega  p.pdf
Write a function that will take four parameters omega p.pdf
 
Write a generalpurpose program with loop and indexed addres.pdf
Write a generalpurpose program with loop and indexed addres.pdfWrite a generalpurpose program with loop and indexed addres.pdf
Write a generalpurpose program with loop and indexed addres.pdf
 
Write a computer program in JAVA that hides a secret message.pdf
Write a computer program in JAVA that hides a secret message.pdfWrite a computer program in JAVA that hides a secret message.pdf
Write a computer program in JAVA that hides a secret message.pdf
 
Write a Fortran program to 1 take the name and ID number of.pdf
Write a Fortran program to 1 take the name and ID number of.pdfWrite a Fortran program to 1 take the name and ID number of.pdf
Write a Fortran program to 1 take the name and ID number of.pdf
 
Write a function called ApologyLine that take an integer c a.pdf
Write a function called ApologyLine that take an integer c a.pdfWrite a function called ApologyLine that take an integer c a.pdf
Write a function called ApologyLine that take an integer c a.pdf
 
Write a C++ code that makes pyramid shape as long as user wa.pdf
Write a C++ code that makes pyramid shape as long as user wa.pdfWrite a C++ code that makes pyramid shape as long as user wa.pdf
Write a C++ code that makes pyramid shape as long as user wa.pdf
 
Write a class called Window that contains the following info.pdf
Write a class called Window that contains the following info.pdfWrite a class called Window that contains the following info.pdf
Write a class called Window that contains the following info.pdf
 
Write a C code that uses struct to create a userdefined typ.pdf
Write a C code that uses struct to create a userdefined typ.pdfWrite a C code that uses struct to create a userdefined typ.pdf
Write a C code that uses struct to create a userdefined typ.pdf
 
Write a 12 page report on igneous rock and how they are m.pdf
Write a 12 page report on igneous rock and  how they are m.pdfWrite a 12 page report on igneous rock and  how they are m.pdf
Write a 12 page report on igneous rock and how they are m.pdf
 
WQ4 Coevolution of Central American ants in the Pseudomyrme.pdf
WQ4 Coevolution of Central American ants in the Pseudomyrme.pdfWQ4 Coevolution of Central American ants in the Pseudomyrme.pdf
WQ4 Coevolution of Central American ants in the Pseudomyrme.pdf
 
With the following companies Apple Caterpillar Consolidat.pdf
With the following companies Apple Caterpillar Consolidat.pdfWith the following companies Apple Caterpillar Consolidat.pdf
With the following companies Apple Caterpillar Consolidat.pdf
 
Would the current answer be considered correct 2 Identify .pdf
Would the current answer be considered correct 2 Identify .pdfWould the current answer be considered correct 2 Identify .pdf
Would the current answer be considered correct 2 Identify .pdf
 
WQ3 Considering that you know about natural selection is o.pdf
WQ3 Considering that you know about natural selection is o.pdfWQ3 Considering that you know about natural selection is o.pdf
WQ3 Considering that you know about natural selection is o.pdf
 
would be earned How much of the total is simple interest an.pdf
would be earned How much of the total is simple interest an.pdfwould be earned How much of the total is simple interest an.pdf
would be earned How much of the total is simple interest an.pdf
 
write 350400 WORDS AND EXPLAIN BRIEFLY lapter 7 How Touris.pdf
write 350400 WORDS AND EXPLAIN BRIEFLY lapter 7 How Touris.pdfwrite 350400 WORDS AND EXPLAIN BRIEFLY lapter 7 How Touris.pdf
write 350400 WORDS AND EXPLAIN BRIEFLY lapter 7 How Touris.pdf
 
World vegetation maps and world climate maps are very simila.pdf
World vegetation maps and world climate maps are very simila.pdfWorld vegetation maps and world climate maps are very simila.pdf
World vegetation maps and world climate maps are very simila.pdf
 
Working individually or in pairs you will apply what you ha.pdf
Working individually or in pairs you will apply what you ha.pdfWorking individually or in pairs you will apply what you ha.pdf
Working individually or in pairs you will apply what you ha.pdf
 
Without using a function write the JavaScript code so that w.pdf
Without using a function write the JavaScript code so that w.pdfWithout using a function write the JavaScript code so that w.pdf
Without using a function write the JavaScript code so that w.pdf
 

Recently uploaded

Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
Jheel Barad
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
BhavyaRajput3
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
TechSoup
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
vaibhavrinwa19
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
Peter Windle
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
joachimlavalley1
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
MIRIAMSALINAS13
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
Atul Kumar Singh
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
The Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdfThe Roman Empire A Historical Colossus.pdf
The Roman Empire A Historical Colossus.pdf
kaushalkr1407
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
timhan337
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
Levi Shapiro
 

Recently uploaded (20)

Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
 
special B.ed 2nd year old paper_20240531.pdf
Working with Dictionaries and ListsSets Modules you can use.pdf

Working with Dictionaries and Lists/Sets

Modules you can use: argparse, re, os, collections, sys

General Guidelines (Steps 1-6)

You have more flexibility to implement your own function names and logic in these programs.

The data files you need for this assignment can be obtained from: HUGO_genes.txt, chr21_genes.txt

Create an output directory inside your assignment4 directory called "OUTPUT" for result files, so that they will not mix with your programs. Output from your programs will go here!

Pay close attention to how you'll run the run_lints.sh script (see below).

Your program must implement command line options for the infiles it must open, but for testing purposes it should also run by default: if no command line option is passed at the command line, the program will still run. This will help in the grading of your program.

Create a Python module called io_utils.py. Put this io_utils.py module in a subdirectory named assignment4 inside your assignment4 top-level directory (see the tree below). Anytime a file needs to be opened (read or write) in your programs in this assignment, the program should call this module's function get_filehandle. You can then use io_utils.get_filehandle by doing this at the top of your programs:

from assignment4 import io_utils
# I can then use the module's get_filehandle() function by:
fh_in = io_utils.get_filehandle(infile1, "r") # note the function call "io_utils.get_filehandle()"

You can also import this way (either way is acceptable, but I will assume in this assignment you did it the way below; I think this is better for PyCharm):

from assignment4.io_utils import get_filehandle
# I can then use the module's get_filehandle() function by:
fh_in = get_filehandle(infile1, "r") # note the function call "get_filehandle()"

Your final submission must have the following files in bold and must use this directory structure.
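As a rough illustration of the io_utils.py module described above, here is a minimal sketch. The assignment only fixes the function name get_filehandle and the (file, mode) call shape; the parameter names, the error messages, and the choice to re-raise exceptions are assumptions, not the required implementation.

```python
"""io_utils.py: sketch of a shared file-opening helper (details are assumptions)."""


def get_filehandle(file=None, mode=None):
    """Open `file` in `mode` and return the open filehandle.

    Failures are re-raised with a message naming the file, so the calling
    program can decide how to report them.
    """
    try:
        return open(file, mode)
    except OSError as err:
        # Missing file, permission problem, path is a directory, etc.
        raise OSError(f"Could not open the file: {file} for mode '{mode}'") from err
    except ValueError as err:
        # open() raises ValueError for an invalid mode string
        raise ValueError(f"Invalid mode '{mode}' for file: {file}") from err
```

A program would then open every file through this one function, for example `with get_filehandle(infile1, "r") as fh_in:`, which keeps the error handling in a single place.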
Make sure you put a blank __init__.py where I've denoted below, e.g.: assignment4/ assignment4 (see below). Make sure you have __init__.py only in the assignment4/assignment4 folder, the tests folder, and the unit folder, and not in the assignment4 main folder.

Information on Source files

The chr21_genes.txt file lists genes from human chromosome 21, in their order along the chromosome, as described in Hattori et al. (Nature 405, 311-319). For each gene, the file gives the gene symbol, description and category. The fields are separated by tabs. You will need to get the meaning of each category. You can find these meanings in the original paper, under the "Gene categories" section. Create a file named chr21_genes_categories.txt that stores this information in tab-separated fields. This will be used in program #2.

The HUGO_genes.txt file lists all human genes having an official symbol approved by the HUGO Gene Nomenclature Committee (some have probably changed by now). For each gene, the file gives its symbol and description, separated by a TAB character.

Exercises

1. Write a program (call it gene_names_from_chr21.py) that asks the user to enter a gene symbol and then prints the description for that gene based on data from the chr21_genes.txt file. The program should give an error message if the entered symbol is not found in the table (the user should not have to worry about case, i.e. it will be a case-insensitive search). The program should continue to ask the user for genes until "quit" or "exit" is given (case-insensitive). Make sure to prompt the user to enter "quit" to end the program. Use Dictionaries to solve this problem.

HINT: Feel free to use a Dictionary of Dictionaries, but it is not required.
HINT: First read the entire text file into a Dictionary that maps the association between gene symbol and description. Once again, make sure to use a Dictionary.

Remember to have these command line options:

$ python3 gene_names_from_chr21.py -i chr21_genes.txt

Output from this program should just go to <STDOUT>:

2. Write a program (call it find_common_cats.py) that counts how many genes are in each category (1.1, 1.2, 2.1 etc.) based on data from the chr21_genes.txt file. The program should print the results, with categories arranged in ascending order, to an output file (call the output file OUTPUT/categories.txt). Read the paper to see what the categories represent and include this as part of your output (this will be input from chr21_genes_categories.txt). Use Dictionaries to solve this problem.

HINT: Feel free to use a Dictionary of Dictionaries, but it is not required.
Note: you will notice that one gene has no category information. That's due to missing data in the file, JUST IGNORE THIS GENE!

Remember to have these command line options:

$ python3 find_common_cats.py -i1 chr21_genes.txt -i2 chr21_genes_categories.txt

Output to the file (OUTPUT/categories.txt) from this program:

Note: <Occurrence Here> is a number

3. Write a program (call it intersection_of_gene_names.py) that finds all gene symbols that appear both in the chr21_genes.txt file and in the HUGO_genes.txt file. These gene symbols should be printed to a file in alphabetical order (you can hard code the output file OUTPUT/intersection_output.txt). The program should also print on the terminal how many common gene symbols were found. Use Lists or Sets to solve the problem. It is fine to use a temporary Dictionary to find the intersection of two Lists, but this can be simplified with Sets.

Note: HUGO_genes.txt could have some duplicate entries.

Remember to have these command line options:

$ python3 intersection_of_gene_names.py -i1 chr21_genes.txt -i2 HUGO_genes.txt
# the N's below are integers, bolded for illustration only
Number of unique gene names in chr21_genes.txt: N
Number of unique gene names in HUGO_genes.txt: N
Number of common gene symbols found: N
Output stored in OUTPUT/intersection_output.txt

STDOUT is shown above, and the actual output of the intersection goes to the file (OUTPUT/intersection_output.txt) from this program:

If you implemented intersection_of_gene_names.py correctly, this program could take any gene file that has the gene in the first column, even if it's the only column (additional examples: hgnc_complete_set_reduced.txt and gene_age.txt).

$ python3 intersection_of_gene_names.py -i1 hgnc_complete_set_reduced.txt -i2 HUGO_genes.txt
Number of unique gene names in hgnc_complete_set_reduced.txt: 43547
Number of unique gene names in HUGO_genes.txt: 11815
Number of common gene symbols found: 8654
Output stored in OUTPUT/intersection_output.txt

$ python3 intersection_of_gene_names.py -i1 gene_age.txt -i2 chr21_genes.txt
Number of unique gene names in gene_age.txt: 307
Number of unique gene names in chr21_genes.txt: 285
Number of common gene symbols found: 4
Output stored in OUTPUT/intersection_output.txt

You must solve exercises 1 and 2 by using Dictionaries, and exercise 3 using Lists or Sets.
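The data-structure requirement above (Dictionaries for exercises 1 and 2, Sets for exercise 3) can be sketched on in-memory lines. This is only an illustration under assumptions: the function names are hypothetical, the input is assumed to be tab-separated text with symbol, description and category in that order, and the sketch does not skip a header line (a real input file may contain one). It is not the required program structure.

```python
from collections import defaultdict


def build_description_dict(lines):
    """Exercise 1 idea: map UPPER-CASED gene symbol -> description."""
    descriptions = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0]:
            # Upper-casing the key lets lookups be case-insensitive
            descriptions[fields[0].upper()] = fields[1]
    return descriptions


def count_categories(lines):
    """Exercise 2 idea: count genes per category, ignoring empty categories."""
    counts = defaultdict(int)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[2].strip():
            counts[fields[2].strip()] += 1
    return dict(counts)


def first_column_symbols(lines):
    """Collect the first tab-separated field of each non-blank line in a set."""
    return {line.split("\t")[0].strip() for line in lines if line.strip()}


def common_symbols(lines_a, lines_b):
    """Exercise 3 idea: set intersection, returned in alphabetical order."""
    return sorted(first_column_symbols(lines_a) & first_column_symbols(lines_b))
```

A lookup for user input would then be `descriptions.get(user_input.strip().upper())`, and `sorted(counts)` yields the categories in ascending order. Because sets discard repeats, duplicate entries in either input do not affect the unique counts or the intersection.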