This document provides an introduction to Python programming for computational genomics and bioinformatics. It discusses the Python environment, integrated development environments (IDEs) like IPython Notebook and PyCharm, and various Python programming concepts including printing and manipulating text, reading and writing files, lists and loops, and writing your own functions. The document is presented as a tutorial with examples and exercises provided to help attendees learn the basics of Python programming.
This document provides an overview and introduction to using the command line interface and submitting jobs to the NIAID High Performance Computing (HPC) Cluster. The objectives are to learn basic Unix commands, practice file manipulation from the command line, and submit a job to the HPC cluster. The document covers topics such as the anatomy of the terminal, navigating directories, common commands, tips for using the command line more efficiently, accessing and mounting drives on the HPC cluster, and an overview of the cluster queue system.
The document provides an overview of Unix basics and scripting. It defines what an operating system and Unix are, describes the Unix philosophy and directory structure, and covers shells, commands, writing and executing scripts, variables, loops, and file permissions. The key topics covered include the Unix philosophy of small, modular programs; the hierarchical directory structure with / as the root; common shells like bash and commands like ls, grep, sort; and how to write simple shell scripts using variables, conditionals, and loops.
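A minimal script in the spirit of the topics listed above — a variable, a conditional, and a loop — might look like the following sketch (the script name and its contents are illustrative, not taken from the original deck):

```shell
#!/bin/bash
# greet.sh -- illustrative only: one variable, one loop, one conditional.
greeting="Hello"

for name in alice bob carol; do
    if [ "$name" = "bob" ]; then
        echo "$greeting, $name (admin)"
    else
        echo "$greeting, $name"
    fi
done
```

Saved as greet.sh and made executable with chmod +x greet.sh, it runs as ./greet.sh — tying together the scripting, permissions, and execution topics the summary lists.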
1) xv6 is a reimplementation of the Unix Version 6 operating system (V6) in ANSI C. It is used at MIT for teaching operating systems concepts.
2) The document discusses installing xv6 on a system by cloning its source code from GitHub and compiling it. Key steps include installing dependencies, QEMU, and cloning the xv6 source code.
3) An overview of xv6's structure is provided, noting it is a monolithic kernel that provides services to user processes via system calls, allowing processes to alternate between user and kernel space.
This document provides an introduction to Linux and common Linux commands. It discusses key facts about Unix, how Linux is based on Unix, popular Linux distributions like Ubuntu, and common file system layout and commands for manipulating files and directories. The document concludes with an assignment to write a Bash script to analyze and compare British and American English dictionaries.
Linux Administrator - The Linux Course on Eduonix (Paddy Lock)
Daily tasks of a Linux administrator include package management, ensuring system security through regular backups and updating of software and patches, and monitoring system performance and anticipating potential issues. When issues do arise, Linux administrators must be able to effectively use documentation like man pages to troubleshoot problems. Choosing an appropriate Linux distribution depends on factors such as software compatibility, vendor support policies, and patch release schedules.
This presentation provides some technical details on the function of the Galaxy toolshed. It was prepared for a group (Biobix at UGent) during my previous job.
The document contains 9 questions about Linux commands and concepts:
1. The differences between various Linux distributions
2. The differences between the rm and rmdir commands
3. How to modify file timestamps to make it appear a change was made earlier
4. How to print a range of lines from a file
5. The behavior of the cp command when copying to an existing directory
6. The differences between file permissions and who has access
7. Identifying issues with a file copying command
8. The differences between the ps and top commands
9. How to create shortcuts to files and directories from the command line
The document provides an overview of basic and useful UNIX commands organized into categories including essential commands, valuable commands, fun commands, helpful commands, and useful commands. It describes commands for navigating directories, manipulating files, editing text, sending email, connecting to other systems, monitoring system usage, and more. The document is intended to help users get started with common tasks in UNIX.
This document provides an overview of various Unix/Linux commands and concepts. It discusses the introduction to Unix including defining an operating system and its functionalities. It describes the evolution and structure of Unix. It covers usage of simple commands like date, who, ls and file commands like cat, cp, mv etc. It explains the Unix file system hierarchy and concepts like input/output redirection and wildcards. It also discusses environmental variables, file permissions and commands related to pipes and filters like sort and grep. Finally, it talks about editors like vi and shell programming concepts.
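The redirection, wildcard, and pipe-and-filter ideas the summary mentions can be sketched in a few lines (all file and directory names below are invented for the demonstration):

```shell
# Work in a scratch directory so nothing real is touched.
mkdir -p /tmp/unix_demo && cd /tmp/unix_demo

# Output redirection: > creates/truncates a file, >> appends to it.
echo "banana" >  fruit.txt
echo "apple"  >> fruit.txt
echo "cherry" >> fruit.txt

# Wildcards: *.txt matches every .txt file in the directory.
cat *.txt

# Pipes and filters: sort the lines, then count matches with grep.
sort fruit.txt | grep -c "an"   # prints 1 (only "banana" contains "an")
```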
Part 5 of "Introduction to Linux for Bioinformatics": Working the command line (Joachim Jacob)
This is part 5 of the training "Introduction to Linux for Bioinformatics". Here we introduce more advanced use of the command line (piping, redirecting) and provide a selection of GNU text-mining and analysis tools that assist you tremendously in handling your bioinformatics data. Interested in following this training session? Contact me at http://www.jakonix.be/contact.html
Here are some sed commands that demonstrate its capabilities:
◦ sed 's/rain/snow/' easy_sed.txt; cat easy_sed.txt
◦ sed 's/plain/mountains/' easy_sed.txt; cat easy_sed.txt
◦ sed 's/Spain/France/' easy_sed.txt; cat easy_sed.txt
◦ sed 's/^The //' easy_sed.txt; cat easy_sed.txt
◦ sed '/Spain/d' easy_sed.txt; cat easy_sed.txt
These commands demonstrate sed's substitution (s/pattern/replacement/) and deletion (/pattern/d) capabilities, using regular expressions to match patterns in the file. Note that sed writes its result to standard output only; the cat after each command shows that the file itself is unchanged, because sed does not edit in place unless invoked with -i.
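To actually persist a change, GNU sed offers -i for in-place editing. A small sketch (the real contents of easy_sed.txt are not shown above, so a stand-in line suggested by the patterns is created first):

```shell
# Stand-in input; the real easy_sed.txt contents are not shown above.
echo "The rain in Spain stays mainly in the plain" > /tmp/easy_sed.txt

# Without -i: sed prints the edited text, the file keeps the original.
sed 's/rain/snow/' /tmp/easy_sed.txt
cat /tmp/easy_sed.txt

# With -i (GNU sed; BSD sed instead wants -i ''): the file is rewritten.
sed -i 's/rain/snow/' /tmp/easy_sed.txt
cat /tmp/easy_sed.txt   # now begins "The snow in Spain ..."
```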
This document provides a quick guide to the Linux command line. It introduces Linux and the shell, and explains why the command line is useful even with graphical user interfaces. It then covers basic commands for file management, processes, archives, and input/output redirection. Finally, it briefly mentions some simple text editors and hints at using more advanced shell scripting.
This document describes the functions of various Linux commands, including commands for listing files (ls), creating directories (mkdir) and files (touch, cat), copying files (cp), changing directories (cd), moving files (mv), finding file locations (whereis, which), displaying manual pages (man, info), checking disk usage (df, du), viewing running processes (ps), setting aliases (alias), changing user identity (su, sudo), viewing command history (history), setting the system date and time (date), displaying calendars (cal), and clearing the terminal screen (clear). It provides the syntax and examples for using each command.
This document provides summaries of basic, valuable, fun, helpful, and useful UNIX commands organized into categories. It introduces the UNIX operating system and notes that free versions like Linux are gaining popularity. The summaries describe 10 essential commands like ls and cd for navigating directories. Another 10 valuable commands help manage accounts, like grep to search files and chmod to change permissions. Additional categories summarize commands for tasks like printing, emailing, drawing, and monitoring system resources. The document aims to help users get started with common UNIX commands.
This document contains interview questions for a Linux administrator role. It includes questions about shell scripting, system administration tasks, networking, and more. Some example questions are how to take input in a shell script, write a script to convert file path slashes, and explain the differences between UDP and TCP. The document provides technical questions to assess a candidate's Linux knowledge and experience.
CompTIA Linux+ Powered by LPI certifies foundational skills and knowledge of Linux. With Linux being the central operating system for much of the world’s IT infrastructure, Linux+ is an essential credential for individuals working in IT, especially those on the path of a Web and software development career. With CompTIA’s Linux+ Powered by LPI certification, you’ll acquire the fundamental skills and knowledge you need to successfully configure, manage and troubleshoot Linux systems. Recommended experience for this certification includes CompTIA A+, CompTIA Network+ and 12 months of Linux admin experience. No prerequisites required.
Course 102: Lecture 16: Process Management (Part 2) (Ahmed El-Arabawy)
This lecture continues the introduction to processes in Linux, describing both automatic processes and daemon processes.
Check the other Lectures and courses in
http://Linux4EnbeddedSystems.com
or Follow our Facebook Group at
- Facebook: @LinuxforEmbeddedSystems
Lecturer Profile:
- https://www.linkedin.com/in/ahmedelarabawy
This document discusses Linux text stream filters and provides examples of common Unix commands used to process and modify text streams. These commands include cat, head, tail, cut, and split. Cat prints the contents of files, head prints the first few lines, tail prints the last few lines, cut extracts parts of each line, and split divides files into smaller parts. The document also covers input/output redirection and how it can be used with filters to modify command output and send it to files.
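A short sketch of those filters in action — the sample file is generated on the spot, and all paths are illustrative:

```shell
# Generate a ten-line sample file to run the filters on.
seq 1 10 | sed 's/^/line /' > /tmp/stream_demo.txt

head -n 3 /tmp/stream_demo.txt        # first three lines
tail -n 2 /tmp/stream_demo.txt        # last two lines
cut -d ' ' -f 2 /tmp/stream_demo.txt  # second field of each line: 1..10
split -l 5 /tmp/stream_demo.txt /tmp/stream_part_   # two five-line pieces

# Redirection captures a filter's output in a file instead of the screen.
head -n 3 /tmp/stream_demo.txt > /tmp/first_three.txt
```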
The document provides an overview of Linux commands and the command line interface. It discusses:
1. Why the command line interface is useful and how to open the terminal emulator.
2. The different types of shells available in Linux and how to check the current shell or change shells.
3. Common Linux directory structures like /bin and /usr/bin that contain executable programs and commands.
This document provides an introduction and overview of the Unix operating system. It covers topics such as getting help, the file system, the shell, network security, email clients, text editors, input/output redirection, printing, process management, and the X window system. The document is intended to help new Unix users understand basic Unix concepts and commands.
This document provides an overview of GNU/Linux and Bash basics, including their history, file system structure, users and permissions, processes, and Bash functionality. It covers topics such as files and directories, links, file types, locations in the file system, users and groups, process states and signals, the Bash command line interface versus scripts, variables, file streams and pipelines, text processing utilities, program execution and process management, file system management, permissions, and basic network tasks. The document is intended to introduce users to fundamental Linux and Bash concepts.
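The permissions topic mentioned here comes down to three bits (read/write/execute) for each of user, group, and others; a quick sketch with a made-up file name:

```shell
# Permission bits: read/write/execute for user, group, and others.
touch /tmp/perm_demo.sh
chmod 754 /tmp/perm_demo.sh   # 7=rwx (user), 5=r-x (group), 4=r-- (others)
ls -l /tmp/perm_demo.sh       # mode column shows -rwxr-xr--
```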
Course 102: Lecture 27: FileSystems in Linux (Part 2) (Ahmed El-Arabawy)
This lecture goes through the different types of filesystems and some commands that are used with them. It introduces the filesystems ext2/3/4, JFFS2, cramfs, ramfs, tmpfs, and NFS.
Video for this lecture on youtube:
http://www.youtube.com/watch?v=XPtPsc6uaKY
This document provides an introduction to Linux and summarizes key topics including:
1. The history and development of Linux including influences from Multics and Unix as well as contributions from developers like Ken Thompson, Dennis Ritchie, and others.
2. Important related operating systems and distributions like BSD, Debian, Ubuntu, and others that helped shape Linux.
3. Core Linux concepts like the Unix philosophy, shells, files/file systems, users/permissions, and commands.
This document discusses embedded Linux programming. It covers topics such as what Linux is, the layers in a Linux system including the kernel and user programs, how Linux differs from legacy real-time operating systems, and an agenda for a course on embedded Linux driver development that covers the Linux kernel, memory management, interrupts, and networking. It also provides information on basic Linux command line tools and file permissions.
Linux is an operating system similar to Unix. The document lists and describes 27 common Linux commands, including commands for listing files (ls), removing files and directories (rm, rmdir), viewing file contents (cat, more, less), navigating and creating directories (cd, mkdir), moving and copying files (mv, cp), searching files (grep), counting characters (wc), checking the current working directory (pwd), getting command help (man), finding files and programs (whereis, find, locate), editing files (vi, emacs), connecting remotely (telnet, ssh), checking network status (netstat, ifconfig), getting information about internet hosts (whois, nslookup, dig, finger), testing network connectivity
The document provides an overview of the Linux command line interface (CLI), including:
- The CLI does not require graphics and is generally faster for experienced users than a GUI
- The bash shell is the default shell and allows command line completion
- Programs are executed through the shell and can take arguments to alter behavior
- Built-in commands are included with the shell while other programs must be found in the PATH
- Output redirection and piping allows chaining of commands and redirection of streams
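The builtin-versus-PATH distinction and the piping idea in these bullets can be checked directly from the shell (output paths vary by system):

```shell
# Builtins live inside the shell; other programs are found via $PATH.
type cd        # reports that cd is a shell builtin
command -v ls  # prints the path the shell would execute, e.g. /bin/ls

# Piping chains commands: count the directories listed in $PATH.
echo "$PATH" | tr ':' '\n' | wc -l
```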
This lecture discusses the concept of environment variables, their usage, and how processes acquire them. It then goes through the most popular ones.
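The inheritance mechanism the lecture covers can be sketched in a few lines — a variable is local to the shell until exported, after which child processes inherit it (the variable names below are hypothetical):

```shell
# A shell variable is local until exported into the environment;
# only exported variables are inherited by child processes.
MY_LOCAL="not exported"
export MY_EXPORTED="visible to children"

sh -c 'echo "child sees: $MY_EXPORTED"'   # prints the exported value
sh -c 'echo "child sees: $MY_LOCAL"'      # the value is empty here

# Two of the usual suspects such lectures cover:
echo "$HOME"   # the user's home directory
echo "$PATH"   # directories searched for commands
```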
Biopython is a set of freely available Python tools for bioinformatics and molecular biology. It provides features like parsing bioinformatics files into Python structures, a sequence class to store sequences and features, and interfaces to popular bioinformatics programs. Biopython can be used to address common bioinformatics problems like sequence manipulation, searching for primers, and running BLAST searches. The current version is 1.53 from December 2009 and future plans include updating the multiple sequence alignment object and adding a Bio.Phylo module.
Python is a popular programming language created by Guido van Rossum in 1991. It is easy to use, powerful, and versatile, making it suitable for beginners and experts alike. Python code can be written and executed in the browser using Google Colab, which provides a Jupyter notebook environment and access to computing resources like GPUs. The document then discusses installing Python using Anaconda and covers basic Python concepts like indentation, variables, strings, conditionals, and loops.
This document provides an overview of using Python for bioinformatics. It discusses why Python is useful for bioinformatics due to its built-in libraries and wide scientific use. It also covers Python basics like strings, regular expressions, control structures, lists, dictionaries, reading/writing files, and using GitHub for code sharing. Examples are given for many of these topics. Finally, it poses questions about analyzing sequence data and a protein database using Python.
This document provides an introduction to the Python programming language over 30 slides. It covers key Python concepts like variables, data types, conditionals, loops, functions, imports, strings, lists, tuples, sets, dictionaries, classes and input/output. Examples are given for each concept to demonstrate how it works in Python. The document concludes by encouraging the reader to continue learning Python through online documentation and resources.
This document provides an introduction to the Python programming language over 30 minutes. It covers basic Python concepts like variables, data types, conditionals, loops, functions, imports, strings, lists, tuples, sets, dictionaries, and classes. Code examples are provided to demonstrate how to use these features. The document encourages learners to continue learning Python through online documentation and resources.
Python is an easy-to-learn, object-oriented programming language that is widely used for scientific computing. It allows interactive testing of code through its interpreter. A user's first Python program prints "hello, world!" by saving that single line of code in a file with the .py extension and running it with the Python interpreter. Python defines six basic object types: numbers, strings, lists, tuples, dictionaries, and files. Variables store references to objects, and objects can be written directly as literals or accessed via variables. Python packages must be imported before their functions can be used. Command-line arguments provide a way to pass information into programs.
This document discusses training on Python that was conducted over six weeks by Cetpa Infotech Pvt. Ltd. It covers topics like what Python is, the differences between programs and scripting languages, Python's history and uses. It also discusses installing Python IDEs and provides examples of Python code, variables, data types, strings, lists, tuples, and control flow statements. The conclusion is that Python is a good teaching language due to being free, easy to install, and flexible for both procedural and object-oriented programming.
This document provides an introduction to the Python programming language. It begins with an agenda that covers running Python, Python programming concepts like data types and control flows, and hands-on exercises. It then discusses running Python interactively and as programs, Python syntax and basic data types like numbers, strings, lists, dictionaries, and tuples. The document is intended to help users understand the basic structure of Python and write simple Python scripts.
This document provides an introduction to the Python programming language. It discusses why Python is easy to learn, relatively fast, object-oriented, strongly typed, widely used and portable. It then provides instructions on getting started with Python on Mac, including how to start the Python interpreter and run a simple "Hello World" program. It also demonstrates using the Python interpreter interactively to test code. The document explains the basic Python object types of numbers, strings, lists, tuples, dictionaries and files. It introduces the concepts of literals, variables and the import command. It provides examples of using command line arguments in Python programs.
Sphinx autodoc - automated api documentation - PyCon.MY 2015Takayuki Shimizukawa
Takayuki Shimizukawa presented an introduction to using Sphinx and docstrings to generate documentation from Python source code. Key points included setting up Sphinx with the autodoc, autosummary, and doctest extensions to automatically generate API documentation and test code examples from docstrings. Writing informative docstrings with parameter and return type information as well as code examples allows Sphinx to generate detailed, easy to understand documentation from Python modules, functions and methods.
The PyConTW (http://tw.pycon.org) organizer wishes to improve the quality and quantity of the programming cummunities in Taiwan. Though Python is their core tool and methodology, they know it's worth to learn and communicate with wide-ranging communities. Understanding cultures and ecosystem of a language takes me about three to six months. This six-hour course wraps up what I - an experienced Java developer - have learned from Python ecosystem and the agenda of the past PyConTW.
你可以在以下鏈結找到中文內容:
http://www.codedata.com.tw/python/python-tutorial-the-1st-class-1-preface
This document provides an introduction to the Python programming language. It outlines the objectives of learning Python, which are to introduce the language, write code to automate tasks, and retrieve and use hydrologic data. The schedule for the week is also presented. Python is described as easy to learn, freely available, cross-platform, and widely used. The document then demonstrates basic Python concepts like arithmetic, strings, variables, lists, dictionaries, conditionals, loops, functions, and modules. It concludes by assigning a coding challenge for students to download USGS streamflow data and extract information from it using Python.
The document provides a tutorial on Orange, an open source data mining package. It discusses Orange's features, how to install Orange Canvas on Windows and Ubuntu, and provides Python scripting code examples for using Orange, including calculating association rule support and confidence, naive Bayes classification, regression, and k-means clustering. The Python scripts demonstrate how to load and analyze data using Orange's Python API.
This document provides an overview of the Python programming language. It begins by explaining what Python is - a general purpose, interpreted programming language that can be used as both a programming and scripting language. It then discusses the differences between programs and scripting languages. The history and creator of Python, Guido van Rossum, are outlined. The document explores the scope of Python and what tasks it can be used for. Popular companies and industries that use Python today are listed. Reasons why people use Python, such as it being free, powerful, and portable, are provided. Instructions for installing Python and running Python code are included. The document covers Python code execution and introduces basic Python concepts like variables, strings, data types, lists
Sphinx autodoc - automated API documentation (EuroPython 2015 in Bilbao)Takayuki Shimizukawa
Takayuki Shimizukawa discusses how to generate documentation from Python source code using Sphinx. He introduces Sphinx and its extensions for automating documentation generation from docstrings. He demonstrates setting up a Sphinx project and configuring extensions like autodoc, autosummary, and doctest to generate API documentation and test code examples. The presentation emphasizes best practices for writing informative docstrings and code examples to fully document modules and functions.
This document provides an overview of the Python programming language. It begins by explaining what Python is, noting that it is a general purpose programming language that is often used for scripting. The key differences between program and scripting languages are then outlined. The history and creation of Python by Guido van Rossum are summarized, along with Python's scope in fields like science, system administration, and web development. Various uses of Python are listed, followed by who commonly uses Python today such as Google and YouTube. Reasons for Python's popularity include being free, powerful, and portable. The document concludes by covering installing Python, running and executing Python code, and some basic Python concepts like strings, variables, data types, and loops/
This document provides an introduction and overview of common methods for processing and analyzing next generation sequencing (NGS) data, including mapping NGS reads and de novo assembly of NGS reads. It discusses various NGS applications such as RNA-Seq, epigenetics, structural variation detection, and metagenomics. Key steps in read alignment such as choosing an alignment program and viewing alignments are outlined. Considerations for choosing an alignment program based on library type, read type, and platform are also reviewed. Popular alignment programs including Bowtie, BWA, TopHat, and Novoalign are mentioned.
slides1-introduction to python-programming.pptxAkhdanMumtaz
This document provides an introduction to the Python programming language. It discusses what Python is, how it works, how to get Python installed, and examples of simple Python programs. Key points covered include that Python is an interpreted, high-level programming language created in 1991, it has an easy to use syntax, and it can be used for a wide range of tasks. The document also demonstrates running Python programs and discusses common programming errors like syntax, runtime, and logic errors.
Similar to Intro to Python programming and iPython (20)
Here are the steps to visualize a potential indel region after realignment:
1. Run GATK IndelRealigner on the target list:
java -jar $EBROOTGATK/GenomeAnalysisTK.jar -T IndelRealigner -R ../human_g1k_v37.fasta -I sample.dedup.bam -targetIntervals sample.intervals -o sample.realigned.bam
2. Index the realigned BAM:
samtools index sample.realigned.bam
3. Load the realigned BAM into IGV and navigate to a region of interest from the target list (sample.intervals).
4. In I
This document discusses phylogenetic analysis and tree building. It introduces the Bioinformatics and Computational Biology Branch (BCBB) group and their work analyzing biological sequences and constructing phylogenetic trees. The document explains why biological sequences are important to analyze and compares sequences to understand relatedness and evolution. It also covers multiple sequence alignment, substitution models, and algorithms for building trees, including neighbor-joining.
The webinar covered new features and updates to the Nephele 2.0 bioinformatics analysis platform. Key updates included a new website interface, improved performance through a new infrastructure framework, the ability to resubmit jobs by ID, and interactive mapping file submission. New pipelines for 16S analysis using DADA2 and quality control preprocessing were introduced, and the existing 16S mothur pipeline was updated. The quality control pipeline provides tools to assess data quality before running microbiome analyses through FastQC, primer/adapter trimming with cutadapt, and additional quality filtering options. The webinar emphasized the importance of data quality checks and highlighted troubleshooting tips such as examining the log file for error messages when jobs fail.
1) METAGENOTE is a new web-based tool for annotating genomic samples and submitting metadata and sequencing files to the Sequence Read Archive (SRA) at the National Center for Biotechnology Information (NCBI).
2) It provides templates and controlled vocabularies to streamline sample metadata annotation using existing ontologies and standards. This allows for easier cross-study comparisons.
3) The demonstration showed how to use METAGENOTE's interface to annotate a mouse ear skin sample with terms from relevant ontologies, import additional annotations in batch, and submit the metadata and files to NCBI SRA through a 5-step wizard.
This document provides an introduction to homology modeling using computational tools like I-TASSER and Phyre2. It discusses how homology modeling can be used to generate 3D structural models of proteins when an experimental structure is not available. The document addresses common questions from users and outlines the I-TASSER modeling pipeline. Hands-on exercises are provided to allow users to run homology modeling tools and examine the resulting models.
This document summarizes different computational methods for protein structure prediction, including homology modeling, fold recognition, threading, and ab initio modeling. Homology modeling relies on identifying proteins with similar sequences and known structures. Fold recognition and threading can be used when there are no homologs, to identify proteins with the same overall fold but different sequences. Ab initio modeling uses physics-based modeling and protein fragments to predict structure from sequence alone, and has challenges due to the vast number of possible conformations.
Homology modeling is a computational technique for predicting the structure of a protein target based on its sequence similarity to proteins with known structures, and it involves finding a suitable template, aligning the target and template sequences, building a 3D model of the target, and evaluating the model quality. While experimental methods like X-ray crystallography and NMR can determine protein structures, they have limitations in terms of which proteins can be studied, so computational methods like homology modeling are needed to predict structures for the many proteins whose structures remain unknown.
The document discusses function prediction for unknown proteins. It begins with an overview of common methods for function prediction, including sequence and structure similarity, domains and motifs, gene expression, and interactions. It then uses a protein called Msa as a case study, analyzing it with various tools and finding evidence it may function as a signal transducer in bacterial response to environment. Finally, it briefly discusses another protein M46 and challenges in evaluating prediction accuracy.
This presentation discusses protein structure prediction using Rosetta. It begins with an overview of the Critical Assessment of Protein Structure Prediction (CASP) experiments and notes that Rosetta is one of the top performing free-modeling servers. The presentation then describes the basic ab initio protocol used by Rosetta, which involves fragment insertion, scoring, and refinement. It also discusses limitations and success rates. Key aspects of the Rosetta energy functions and sampling algorithms are presented. Examples of specific Rosetta applications including low-resolution modeling and refinement are provided.
This document provides an outline for a presentation on biological networks, including introducing biological networks, describing their basic components and types, methods for predicting and building networks, sources of interaction data, tools for network visualization and analysis, and a demonstration of building, visualizing and analyzing biological networks using Cytoscape. The presentation covers topics like nodes and edges in networks, features used to analyze networks, methods for predicting networks from sequences and omics data, integrated databases for interaction data, and popular tools for searching, visualizing and performing network analysis.
This document provides an overview of statistical analyses that can be performed in PRISM. It discusses how to perform common statistical tests like t-tests, ANOVA, linear regression, and summarizes the appropriate tests to address different research questions. Examples are given of how to analyze pre-post treatment data using paired t-tests and compare groups using independent t-tests or ANOVA. Guidance is also provided on interpreting results and checking assumptions.
1) JMP is statistical software that allows for easy import, organization, and analysis of data. It features spreadsheet-like data tables, powerful statistical modeling capabilities, and customizable graphics.
2) The document reviews various features of JMP including importing data, organizing data tables, performing statistical analyses through platforms like distribution and fit model, and creating graphs and reports.
3) Assistance is available for using JMP through free training, support contacts, and detailed help menus within the software. JMP allows for both simple and advanced statistical analysis of data.
This document discusses methods for analyzing categorical data and response variables, including contingency tables, chi-square tests, Fisher's exact test, odds ratios, logistic regression, and generalized linear models. Contingency tables are used to display relationships between categorical variables and tests of independence. Fisher's exact test and chi-square tests determine if a relationship is statistically significant. Odds ratios and relative risk indicate the magnitude of relationships. Logistic regression models relationships between continuous predictors and categorical responses. Generalized linear models extend these methods.
This document provides a training manual on better graphics in R. It begins with an overview of R and BioConductor and reviews basic R functions. It then covers creating simple and customized graphics, multi-step graphics with legends, and multi-panel layouts. The manual aims to help researchers learn visualization techniques to improve the communication of their data and results.
This document describes two web tools that were created using R to automate biostatistics workflows: HDX NAME and DRAP. HDX NAME analyzes hydrogen-deuterium exchange mass spectrometry data to estimate protein flexibility. It computes protection factors, compares groups, and maps results to protein structures. DRAP fits logistic dose-response curves to drug screening data from multiple plates. It automates curve fitting, compares results, and exports summaries. Both tools were created with R on the backend for analysis and web interfaces for usability. This allows researchers to perform complex analyses without programming expertise.
This document discusses several common problems with data handling and quality including building and testing models with the same data, confusion between biological and technical replicates, and identification and handling of outliers. It provides examples and explanations of key concepts such as experimental and sampling units, pseudo-replication, outliers versus high influence points, and leverage plots. The importance of proper data handling techniques like dividing data into training, test, and confirmation sets and using cross-validation is emphasized to avoid overfitting models and generating spurious findings.
The document provides an overview of statistical testing, including:
- When to use parametric vs. nonparametric tests
- When large sample tests or exact tests are needed
- When adjustments for multiple testing are required
It discusses key concepts like null and alternative hypotheses, test statistics, p-values, and type I and II errors. Examples of the Student's t-test and Wilcoxon rank sum test are provided.
This document summarizes a presentation on curve fitting using GraphPad Prism. It discusses nonlinear regression techniques for analyzing dose-response and binding curve data commonly used by biologists. Specific nonlinear regression models like sigmoidal dose-response curves are described. The document provides guidance on choosing and fitting appropriate models, evaluating model fit, and improving model fit if needed.
This document provides solutions to sample problems using various datasets. It demonstrates how to use R functions like bargraph.CI(), boxplot(), hist(), and table() to analyze and visualize data. For example, it shows how to create bar charts comparing mean BMI by gender and mean AFP difference by drug concentration using the bargraph.CI() function from the sciplot package. It also provides solutions for manipulating datasets, such as recoding a variable or sorting and subsetting data.
More from Bioinformatics and Computational Biosciences Branch (20)
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
Analysis insight about a Flyball dog competition team's performanceroli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
The Ipsos - AI - Monitor 2024 Report.pdfSocial Samosa
According to Ipsos AI Monitor's 2024 report, 65% Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
Global Situational Awareness of A.I. and where its headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be un-leashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
1. 3/30/14
1
R. Burke Squires
Computational Genomics Specialist
Bioinformatics and Computational Biosciences Branch (BCBB)
Bioinformatics & Computational Biosciences Branch (BCBB)

Why Python?
Source: http://xkcd.com/353/
Topics
§ iPython & Integrated Development Environments
§ Printing and manipulating text
§ Reading and writing files
§ Lists and Loops
§ Writing your own functions
§ Conditional tests
§ Regular Expressions
§ Dictionaries
§ Files, programs and user input
Resource:
Bioinformatics Programming Using Python
Goals
§ Introduce you to the basics of the Python programming language
§ Introduce you to the iPython environment and integrated development environments (IDEs)
§ Enable you to write or assemble scripts of your own, or modify existing scripts for your own purposes
§ Prepare you for the next session, "Introduction to Biopython Programming"
Programming… is it Magic?
§ No… BUT it can seem like it at times!
§ Working with text files
Python Scripts vs. Programs
§ Script: each function is interpreted and executed as it runs (slower)
§ Program: code is compiled once; executed as machine code (fastest)
In Your Toolbelt…
§ Python environment
§ Integrated Development Environment (IDE)
– Continuum Analytics Anaconda
§ http://continuum.io/downloads.html
– Enthought Canopy Express
§ https://www.enthought.com/products/epd/free/
– iPython Notebook
§ http://ipython.org
– PyCharm CE (Community Edition)
§ http://www.jetbrains.com/pycharm/
Python Environment
§ Open Terminal
§ Type "python" and hit return
• You should see ">>>"
§ Enter print('hello world') and hit return
§ Congratulations! You have just written your first Python script!
§ You could also put code in a text file and execute it:
• $ python script_name.py
iPython
iPython
§ IPython provides an architecture for interactive computing:
• Powerful interactive shells (terminal and Qt-based).
• A browser-based notebook with support for code,
text, mathematical expressions, inline plots and
other rich media.
• Support for interactive data visualization and use of
GUI toolkits.
• Easy to use, high performance tools for parallel
computing.
Source: ipython.org
iPython
§ Already installed
• Source: Continuum Analytics Anaconda
– http://continuum.io/downloads.html
§ Double-click on the icon on the desktop:
§ Launch the ipython-notebook
iPython – Home Screen
iPython – New Notebook
iPython
§ Add text using Cell -> Markdown
• Type "# Intro to Python"
• Type "This is my first iPython notebook."
• (To edit, change to raw text)
§ Add a code cell
§ Type print("Hello world")
§ Click the play or run button (or Cell -> Run)
Source: ipython.org
iPython Notebook
iPython Notebook Help
iPython Notebook Help
§ Add images to your notebook:
• !["DNA"](files/DNA_chain.jpg)
• (image file in the same folder as the notebook)
§ Add YouTube videos to your notebook:
• from IPython.display import YouTubeVideo
• YouTubeVideo('iwVvqwLDsJo')
Additional Tools
Canopy PyCharm
Advantages of IDEs
§ PyCharm features:
• Intelligent editor: code completion, on-the-fly error highlighting, auto-fixes, etc.
• Automated code refactorings and rich navigation capabilities
• Integrated debugger and unit testing support
• Native version control system (VCS) integrations
• Customizable UI and key bindings, with VIM emulation available
Source: http://www.jetbrains.com/pycharm/
Lastly: Python IDEs in the Cloud
§ Python Anywhere
• http://www.pythonanywhere.com
§ Python Fiddle: Python Cloud IDE
• http://pythonfiddle.com
§ Koding: Free Programming Virtual Machine
• http://koding.com
Printing and manipulating text:
"Hello World"
§ While in iPython:
• Type print("Hello world")
• "Run" the program
§ The whole thing is a statement; print is a function
§ Comments:
• # This is a comment!
Printing and manipulating text:
Storing Strings in Variables
# store a short DNA sequence in the variable my_dna
my_dna = "ATGCGTA"

# now print the DNA sequence
print(my_dna)
Source: Python for Biologists, Dr. Martin Jones
Printing and manipulating text:
Concatenation
my_dna = "AATT" + "GGCC"
print(my_dna)

upstream = "AAA"
my_dna = upstream + "ATGC"
# my_dna is now "AAAATGC"
Source: Python for Biologists, Dr. Martin Jones
Printing and manipulating text:
Finding the Length of a String
# store the DNA sequence in a variable
my_dna = "ATGCGAGT"

# calculate the length of the sequence and store it in a variable
dna_length = len(my_dna)

# print a message telling us the DNA sequence length
print("The length of the DNA sequence is " + dna_length)
# the line above raises a TypeError: dna_length is an int and must be
# converted with str() before concatenation, as below

my_dna = "ATGCGAGT"
dna_length = len(my_dna)
print("The length of the DNA sequence is " + str(dna_length))
Source: Python for Biologists, Dr. Martin Jones
Printing and manipulating text:
Replacement
protein = "vlspadktnv"

# replace valine with tyrosine
print(protein.replace("v", "y"))

# we can replace more than one character
print(protein.replace("vls", "ymt"))

# the original variable is not affected
print(protein)
Source: Python for Biologists, Dr. Martin Jones
Printing and manipulating text:
Slicing
protein = "vlspadktnv"

# print positions three to five
print(protein[3:5])

# positions start at zero, not one
print(protein[0:6])

# if we use a stop position beyond the end, it's the same as using the end
print(protein[0:60])
Source: Python for Biologists, Dr. Martin Jones
Printing and manipulating text:
Counting
protein = "vlspadktnv"

# count amino acid residues
valine_count = protein.count('v')
lsp_count = protein.count('lsp')
tryptophan_count = protein.count('w')

# now print the counts
print("valines: " + str(valine_count))
print("lsp: " + str(lsp_count))
print("tryptophans: " + str(tryptophan_count))
Source: Python for Biologists, Dr. Martin Jones
Printing and manipulating text:
Homework
Calculating AT content
Here's a short DNA sequence:

ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT

Write a program that will print out the AT content of this DNA sequence. Hint: you can use normal mathematical symbols like add (+), subtract (-), multiply (*), divide (/) and parentheses to carry out calculations on numbers in Python.

Reminder: if you're using Python 2 rather than Python 3, include this line at the top of your program:
from __future__ import division
Source: Python for Biologists, Dr. Martin Jones
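One way to approach this exercise (a sketch, not the official answer key) is to combine the count() method from the earlier slides with the arithmetic operators from the hint:

```python
# AT content = (number of A bases + number of T bases) / total length
dna = "ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT"

a_count = dna.count("A")
t_count = dna.count("T")
length = len(dna)

at_content = (a_count + t_count) / length
print("AT content is " + str(at_content))
```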
Reading and Writing Files:
Reading a File
my_file = open("dna.txt")
file_contents = my_file.read()
print(file_contents)

my_file = open("dna.txt")
my_file_contents = my_file.read()

# remove the newline from the end of the file contents
my_dna = my_file_contents.rstrip("\n")
dna_length = len(my_dna)
print("sequence is " + my_dna + " and length is " + str(dna_length))
Source: Python for Biologists, Dr. Martin Jones
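The examples above leave the file handle open; a common modern idiom not shown on the slide is the with statement, which closes the file automatically when the block ends. A small sketch, using a file name (example.txt) chosen here purely for illustration:

```python
# write a small file, then read it back; "with" closes each handle for us
with open("example.txt", "w") as out_file:
    out_file.write("ATGCGAGT\n")

with open("example.txt") as in_file:
    my_dna = in_file.read().rstrip("\n")

print("sequence is " + my_dna + " and length is " + str(len(my_dna)))
```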
Reading and Writing Files:
Writing to a File
my_file = open("out.txt", "w")
my_file.write("Hello world")

# remember to close the file
my_file.close()

# a full path can also be given when opening a file
my_file = open("/Users/martin/Desktop/myfolder/myfile.txt")
Source: Python for Biologists, Dr. Martin Jones
Reading and Writing Files:
Homework
Writing a FASTA file
FASTA is a commonly used DNA and protein sequence file format. A single sequence in FASTA format looks like this:
>sequence_name
ATCGACTGATCGATCGTACGAT
Write a program that will create a FASTA file for the following three sequences – make sure that all sequences are in upper case and only contain the bases A, T, G and C.

Sequence header   DNA sequence
ABC123            ATCGTACGATCGATCGATCGCTAGACGTATCG
DEF456            actgatcgacgatcgatcgatcacgact
HIJ789            ACTGAC-ACTGT--ACTGTA----CATGTG
Source: Python for Biologists, Dr. Martin Jones
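A possible sketch of a solution (one of several valid approaches; the file name out.fasta is arbitrary): upper-case each sequence and strip the gap dashes before writing. In these three sequences the only non-ATGC characters are dashes, so removing them satisfies the exercise:

```python
# write three cleaned sequences in FASTA format
sequences = {
    "ABC123": "ATCGTACGATCGATCGATCGCTAGACGTATCG",
    "DEF456": "actgatcgacgatcgatcgatcacgact",
    "HIJ789": "ACTGAC-ACTGT--ACTGTA----CATGTG",
}

fasta = open("out.fasta", "w")
for header, dna in sequences.items():
    cleaned = dna.upper().replace("-", "")  # upper case, drop gap characters
    fasta.write(">" + header + "\n")
    fasta.write(cleaned + "\n")
fasta.close()
```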
Lists and Loops:
Lists
apes = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"]
conserved_sites = [24, 56, 132]
print(apes[0])
first_site = conserved_sites[2]

chimp_index = apes.index("Pan troglodytes")
# chimp_index is now 1

nucleotides = ["T", "C", "A", "G"]
last_ape = apes[-1]
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[-1]
'A'
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[-5]
'D'
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[7 // 2]
'M'
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[50]
Traceback (most recent call last):
  File "<pyshell#14>", line 1, in <module>
    'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[50]
IndexError: string index out of range
Slicing: [m:n]
Source: Python for Biologists, Dr. Martin Jones & O'Reilly Bioinformatics Programming Using Python
Lists and Loops:
Slicing & Appending Lists
ranks = ["kingdom", "phylum", "class", "order", "family"]
lower_ranks = ranks[2:5]
# lower ranks are class, order and family

apes = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"]
print("There are " + str(len(apes)) + " apes")
apes.append("Pan paniscus")
print("Now there are " + str(len(apes)) + " apes")
Lists and Loops:
Concatenating, Reversing & Sorting Lists
apes = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"]
monkeys = ["Papio ursinus", "Macaca mulatta"]
primates = apes + monkeys
print(str(len(apes)) + " apes")
print(str(len(monkeys)) + " monkeys")
print(str(len(primates)) + " primates")

ranks = ["kingdom", "phylum", "class", "order", "family"]
print("at the start : " + str(ranks))
ranks.reverse()
print("after reversing : " + str(ranks))
ranks.sort()
print("after sorting : " + str(ranks))
Lists and Loops:
Looping through Lists
apes = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"]
for ape in apes:
    print(ape + " is an ape")

apes = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"]
for ape in apes:
    name_length = len(ape)
    first_letter = ape[0]
    print(ape + " is an ape. Its name starts with " + first_letter)
    print("Its name has " + str(name_length) + " letters")
Python:
Indentation
apes = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"]
for ape in apes:
    name_length = len(ape)
    first_letter = ape[0]
    print(ape + " is an ape. Its name starts with " + first_letter)
    print("Its name has " + str(name_length) + " letters")

Indentation errors

Use tabs or spaces but not both
Lists and Loops:
Using Strings as Lists, Splitting
name = "martin"
for character in name:
    print("one character is " + character)

names = "melanogaster,simulans,yakuba,ananassae"
species = names.split(",")
print(str(species))
Lists and Loops:
Looping through File, Line by Line
file = open("some_input.txt")
for line in file:
    # do something with the line
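A concrete version of this skeleton (the input file is created first here only so the example is self-contained):

```python
# create a small stand-in for some_input.txt
with open("some_input.txt", "w") as f:
    f.write("ATCG\nATCGATCG\n")

lengths = []
file = open("some_input.txt")
for line in file:
    # rstrip("\n") removes the trailing newline before measuring
    lengths.append(len(line.rstrip("\n")))
file.close()
print(lengths)
```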
Lists and Loops:
Looping with Ranges
protein = "vlspadktnv"
vls
vlsp
vlspa…

stop_positions = [3,4,5,6,7,8,9,10]
for stop in stop_positions:
    substring = protein[0:stop]
    print(substring)

for number in range(3, 8):
    print(number)

for number in range(6):
    print(number)
Lists and Loops:
Homework
§ Processing DNA in a file
• The file input.txt contains a number of DNA
sequences, one per line. Each sequence starts with
the same 14 base pair fragment – a sequencing
adapter that should have been removed. Write a
program that will (a) trim this adapter and write the
cleaned sequences to a new file and (b) print the
length of each sequence to the screen.
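A sketch of one possible solution (the two toy sequences and the output filename trimmed.txt are illustrative; the real input.txt comes with the exercise):

```python
# toy stand-in for input.txt: each line starts with the same
# 14 bp adapter (ATTCGATTATAAGC)
with open("input.txt", "w") as f:
    f.write("ATTCGATTATAAGCACTGATCGATCG\n")
    f.write("ATTCGATTATAAGCTTGCAT\n")

output = open("trimmed.txt", "w")
for line in open("input.txt"):
    sequence = line.rstrip("\n")
    trimmed = sequence[14:]        # (a) trim the 14 bp adapter
    output.write(trimmed + "\n")   # ... and write the cleaned sequence
    print(len(trimmed))            # (b) print the sequence length
output.close()
```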
Writing Your Own Functions:
Convert Code to Function
my_dna = "ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT"
length = len(my_dna)
a_count = my_dna.count('A')
t_count = my_dna.count('T')
at_content = (a_count + t_count) / length
print("AT content is " + str(at_content))
==============================================
from __future__ import division  # if using python 2

def get_at_content(dna):
    length = len(dna)
    a_count = dna.count('A')
    t_count = dna.count('T')
    at_content = (a_count + t_count) / length
    return at_content
==============================================
print("AT content is " + str(get_at_content("ATGACTGGACCA")))
Writing Your Own Functions:
Improving our Function
def get_at_content(dna, sig_figs):
    length = len(dna)
    a_count = dna.upper().count('A')
    t_count = dna.upper().count('T')
    at_content = (a_count + t_count) / length
    return round(at_content, sig_figs)

test_dna = "ATGCATGCAACTGTAGC"
print(get_at_content(test_dna, 1))
print(get_at_content(test_dna, 2))
print(get_at_content(test_dna, 3))
Writing Your Own Functions:
Improving our Function
§ Functions do not always have to take parameters
§ Functions do not always have to return a value

def get_at_content():
    test_dna = "ATGCATGCAACTGTAGC"
    length = len(test_dna)
    a_count = test_dna.upper().count('A')
    t_count = test_dna.upper().count('T')
    at_content = (a_count + t_count) / length
    print(round(at_content, 2))

§ What are the disadvantages of doing these things?
Writing Your Own Functions:
Defaults & Named Arguments
§ Function arguments can be named
§ Order then does not matter

get_at_content(dna="ATCGTGACTCG", sig_figs=2)
get_at_content(sig_figs=2, dna="ATCGTGACTCG")

§ Functions can have default values
§ Default values do not need to be provided unless a
different value is desired

def get_at_content(dna, sig_figs=2):
    (function code)
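Putting the pieces together, the earlier function body with a default value might look like this (a sketch based on the definition from the previous slides):

```python
from __future__ import division  # only needed on Python 2

def get_at_content(dna, sig_figs=2):
    length = len(dna)
    a_count = dna.upper().count('A')
    t_count = dna.upper().count('T')
    at_content = (a_count + t_count) / length
    return round(at_content, sig_figs)

print(get_at_content("ATCGTGACTCG"))              # default: 2 sig figs
print(get_at_content("ATCGTGACTCG", sig_figs=4))  # override the default
```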
Writing Your Own Functions:
Testing Functions
§ Functions should be tested with known good values
§ Functions should be tested with known bad values

assert get_at_content("ATGC") == 0.5
assert get_at_content("A") == 1
assert get_at_content("G") == 0
assert get_at_content("ATGC") == 0.5
assert get_at_content("AGG") == 0.33
assert get_at_content("AGG", 1) == 0.3
assert get_at_content("AGG", 5) == 0.33333
Conditional Tests:
True, False, If…else…elif…then
§ Python has built-in values True and False
§ Conditional statements evaluate to True or False
§ If statements use conditional statements
expression_level = 125
if expression_level > 100:
    print("gene is highly expressed")

expression_level = 125
if expression_level > 100:
    print("gene is highly expressed")
else:
    print("gene is lowly expressed")
Conditional Tests:
True, False, If…else…elif…then
file1 = open("one.txt", "w")
file2 = open("two.txt", "w")
file3 = open("three.txt", "w")
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72']
for accession in accs:
    if accession.startswith('a'):
        file1.write(accession + "\n")
    elif accession.startswith('b'):
        file2.write(accession + "\n")
    else:
        file3.write(accession + "\n")
Conditional Tests:
While loops
§ While loops repeat as long as their condition remains true
count = 0
while count < 10:
    print(count)
    count = count + 1
Conditional Tests:
Building Complex Conditions
§ Use "and", "or", and "not" to build complex
conditions
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72']
for accession in accs:
    if accession.startswith('a') and accession.endswith('3'):
        print(accession)
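The same list filtered with "or" and "not" as well (an illustrative extension of the example above):

```python
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72']

matching = []
for accession in accs:
    # starts with 'a' or 'b', but does not end with '6'
    if (accession.startswith('a') or accession.startswith('b')) \
            and not accession.endswith('6'):
        matching.append(accession)
print(matching)
```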
Regular Expressions:
Patterns in Biology
§ There are a lot of patterns in biology:
– protein domains
– DNA transcription factor binding motifs
– restriction enzyme cut sites
– runs of mononucleotides
§ Patterns in strings inside text:
– read mapping locations
– geographical sample coordinates
– taxonomic names
– gene names
– gene accession numbers
– BLAST searches
Regular Expressions:
Patterns in Biology
§ Many problems that we want to solve require more
flexible patterns:
– Given a DNA sequence, what's the length of the poly-A tail?
– Given a gene accession name, extract the part between the
third character and the underscore
– Given a protein sequence, determine if it contains this
highly-redundant domain motif
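For instance, the first problem can be sketched with the re module introduced on the next slide (the function name poly_a_length is an illustrative choice):

```python
import re

def poly_a_length(dna):
    # match one or more A's anchored to the very end of the sequence
    m = re.search(r"A+$", dna)
    if m:
        return len(m.group())
    return 0

print(poly_a_length("ATCGATCGAAAAAA"))
print(poly_a_length("ATCGATCG"))
```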
Regular Expressions:
Modules in Python
§ To search for these patterns, we use the regular expression
module “re”
import re

re.search(pattern, string)

dna = "ATCGCGAATTCAC"
if re.search(r"GAATTC", dna):
    print("restriction site found!")

if re.search(r"GC(A|T|G|C)GC", dna):
    print("restriction site found!")

if re.search(r"GC[ATGC]GC", dna):
    print("restriction site found!")
Regular Expressions:
Get String and Position of Match
§ Get the string that matched
dna = "ATGACGTACGTACGACTG"
# store the match object in the variable m
m = re.search(r"GA([ATGC]{3})AC([ATGC]{2})AC", dna)
print("entire match: " + m.group())
print("first bit: " + m.group(1))
print("second bit: " + m.group(2))

§ Get the positions of the match

print("start: " + str(m.start()))
print("end: " + str(m.end()))
Dictionaries:
Storing Paired Data
enzymes = {}
enzymes['EcoRI'] = r'GAATTC'
enzymes['AvaII'] = r'GG(A|T)CC'
enzymes['BisI'] = r'GC[ATGC]GC'

# remove the EcoRI enzyme from the dict
enzymes.pop('EcoRI')

dna = "AATGATCGATCGTACGCTGA"
counts = {}
for base1 in ['A', 'T', 'G', 'C']:
    for base2 in ['A', 'T', 'G', 'C']:
        for base3 in ['A', 'T', 'G', 'C']:
            trinucleotide = base1 + base2 + base3
            count = dna.count(trinucleotide)
            counts[trinucleotide] = count
print(counts)
Dictionaries:
Storing Paired Data
{'ACC': 0, 'ATG': 1, 'AAG': 0, 'AAA': 0, 'ATC': 2, 'AAC': 0,
'ATA': 0, 'AGG': 0, 'CCT': 0, 'CTC': 0, 'AGC': 0, 'ACA': 0,
'AGA': 0, 'CAT': 0, 'AAT': 1, 'ATT': 0, 'CTG': 1, 'CTA': 0,
'ACT': 0, 'CAC': 0, 'ACG': 1, 'CAA': 0, 'AGT': 0, 'CAG': 0,
'CCG': 0, 'CCC': 0, 'CTT': 0, 'TAT': 0, 'GGT': 0, 'TGT': 0,
'CGA': 1, 'CCA': 0, 'TCT': 0, 'GAT': 2, 'CGG': 0, 'TTT': 0,
'TGC': 0, 'GGG': 0, 'TAG': 0, 'GGA': 0, 'TAA': 0, 'GGC': 0,
'TAC': 1, 'TTC': 0, 'TCG': 2, 'TTA': 0, 'TTG': 0, 'TCC': 0,
'GAA': 0, 'TGG': 0, 'GCA': 0, 'GTA': 1, 'GCC': 0, 'GTC': 0,
'GCG': 0, 'GTG': 0, 'GAG': 0, 'GTT': 0, 'GCT': 1, 'TGA': 2,
'GAC': 0, 'CGT': 1, 'TCA': 0, 'CGC': 1}

print(counts['TGA'])
Dictionaries:
Storing Paired Data
if 'AAA' in counts:
    print(counts['AAA'])

for trinucleotide in counts.keys():
    if counts.get(trinucleotide) == 2:
        print(trinucleotide)

for trinucleotide in sorted(counts.keys()):
    if counts.get(trinucleotide) == 2:
        print(trinucleotide)

for trinucleotide, count in counts.items():
    if count == 2:
        print(trinucleotide)
Files, Programs, & User Input:
Basic File Manipulation
§ Rename a file

import os
os.rename("old.txt", "new.txt")

§ Rename a folder

os.rename("/home/martin/old_folder", "/home/martin/new_folder")

§ Check to see if a file exists

if os.path.exists("/home/martin/email.txt"):
    print("You have mail!")
Files, Programs, & User Input:
Basic File Manipulation
§ Remove a file
os.remove("/home/martin/unwanted_file.txt")
§ Remove empty folder
os.rmdir("/home/martin/empty")
§ To delete a folder and all the files in it, use
shutil.rmtree
import shutil
shutil.rmtree("/home/martin/full")
Files, Programs, & User Input:
Running External Programs
§ Run an external program
import subprocess
subprocess.call("/bin/date")

§ Run an external program with options

subprocess.call("/bin/date +%B", shell=True)

§ Saving program output

current_month = subprocess.check_output("/bin/date +%B", shell=True)
Files, Programs, & User Input:
User Input
§ Interactive user input
accession = input("Enter the accession name")
# do something with the accession variable
§ Capture command line arguments

import sys
print(sys.argv)
# python myprogram.py one two three
# sys.argv[0] is the script name; sys.argv[1:] are the arguments
Goals
§ Introduce you to the basics of the Python
programming language
§ Introduce you to the IPython environment
§ Prepare you for the next session "Introduction to
Biopython for Scientists"
§ Enable you to write or assemble scripts of your own or
modify existing scripts for your own purposes
Resources: Website
§ Websites
• http://pythonforbiologists.com
• http://www.pythonforbeginners.com
• http://www.pythontutor.com/visualize.html#mode=display
§ Free eBook in HTML / PDF
• http://pythonforbiologists.com
• http://greenteapress.com/thinkpython/
• http://openbookproject.net/books/bpp4awd/index.html
§ Python Regular Expressions (pattern matching)
• http://www.pythonregex.com
§ Python Style Guide
• http://www.python.org/dev/peps/pep-0008/
Additional Seminars
§ Introduction to BioPython for Scientists
§ Introduction to Data Analysis with Python
• Utilizing NumPy and pandas modules
Collaborations welcome
One-on-one training available for those on NIH campus and related
agencies
ScienceApps at niaid.nih.gov