This document summarizes research using machine learning algorithms to design artificial histone acetyltransferases (HATs) with improved stability and activity. The target protein is Tetrahymena GCN5 (tGCN5), a HAT. Experimental data showed increased stability but decreased activity when incorporating unnatural amino acids. The researchers identified 11 potential mutants and used machine learning approaches like active learning to select variants for further investigation, with the goal of optimizing tGCN5 function.
Machine Learning Designs for Artificial Histone Acetyltransferases
Man Xia Lee, Aye Sandar Moe1, Susheel Kumar Gunasekar, Kinjal Mehta,
Zhiqiang Liu, Natalya Voloshchuk, Jin K. Montclare, Phyllis Frankl
and Lisa Hellerstein
Polytechnic Institute of NYU
http://cis.poly.edu/~amoe/mlpd
Abstract:
Although in vivo incorporation of unnatural amino acids can be used to improve protein
stability, there is a trade-off: higher stability of the protein may lead to a loss in activity.
One way to improve function is to employ machine-learning algorithms to identify
proteins that have enhanced activity. Our target protein, Tetrahymena GCN5 (tGCN5), a
member of the family of histone acetyltransferases (HATs), acetylates histones at
specific lysine residues, enabling transcriptional regulation. Experimental data have
shown an increase in stability of the protein but a loss in activity with the incorporation of
ortho-fluorophenylalanine (oFF) into tGCN5. Using information from biochemical and
structural data, we identify 11 potential mutants that may lead to improved function. We
investigate the structure and function of the tGCN5 mutants in the conventional and
fluorinated contexts. Moreover, we seek to generate optimized variants bearing these
mutations with the help of machine learning algorithms.
Introduction:
Histone acetyltransferases (HATs) are proteins that acetylate lysine residues on
the N-terminal tails of histone proteins, enabling transcriptional regulation (Figure 1
A) [1]. When the positively charged lysine residue of the histone protein is acetylated, its
charge is neutralized and the negatively charged DNA becomes more accessible for
transcription to occur [2]. The HAT protein Tetrahymena GCN5 (tGCN5) is composed
of a mixture of alpha-helices and beta-sheets [3] and catalyzes the transfer of the
acetyl group from acetyl-coenzyme A [4].

1 Man Xia Lee and Aye Sandar Moe were supported by the CRAW Multidisciplinary Research
Opportunities for Women (M-ROW) program. Additional support was provided by the Othmer Institute,
Polytechnic University.
Figure 1. A) Crystal structure of tGCN5: nine phenylalanine residues are shown in purple.
B) Structures of phenylalanine (F), ortho-fluorophenylalanine (oFF),
meta-fluorophenylalanine (mFF), and para-fluorophenylalanine (pFF).
Previously, Montclare and coworkers incorporated the fluorinated phenylalanines
(oFF, mFF and pFF) into tGCN5 in a residue-specific fashion (Figure 1 A, B).
According to experimental data, in vivo incorporation of oFF increases the thermal
stability of tGCN5 but decreases its activity. Based on biochemical data from numerous
groups, we identified 15 residues that are important for the activity and stability of the
protein [3-8] (Table 1, Figure 2). With this set of mutations, we plan to create new
variants with combined mutations to improve protein function.
Table 1. Summary of mutations and their significance.
Wild type  Position  Mutant  Significance
V          86        T       Structurally similar
K          87        R       Alignment analysis: conserved
F          90        Y       Alignment analysis: conserved
V          98        A       Important role in protein stability [6, 8]
I          99        V       Important role in protein stability [6, 8]
L          100       I       Important role in protein stability [6, 8]
I          107       V       Important role in protein stability [6, 8]
F          112       R       Alignment analysis: conserved
Q          114       L       Important in raising the pKa for a more hydrophobic area [6, 7]
A          121       T       Alignment analysis: conserved
A          130       S       Alignment analysis: conserved
R          140       H       Alignment analysis: conserved
K          144       H       Important role in catalysis [6, 7]
F          145       L       Important role in catalysis [6, 7]
Y          192       A       Important role in catalysis [6, 7]
Figure 2. Structure of tGCN5 with the mutations highlighted: conserved residues are shown in green [6],
residues critical for catalysis in orange [6], residues important for protein stability in red [6, 7], and the
residue with an isosteric change in blue.
To reduce the time and cost of investigating all combinations of the 15 residues
listed in Table 1 in search of a more active tGCN5, we based our design on the theory of
Design of Experiments. The Plackett-Burman design is widely used to generate a
manageable set of experiments [9]. Because some of the mutations were adjacent to each other
(Table 2), we chose to combine those adjacent mutations and designate each group as a single
mutation. Using the Plackett-Burman design, we produced twelve variants bearing five to
eleven mutations to test (Table 2).
Table 2. Plackett-Burman design. The mutations are grouped as X1-X11. X2, X4-X9 and X11 each
consist of a single mutation, X1 and X10 each consist of two mutations, and X3 consists of three mutations.
Group  X1        X2   X3             X4   X5   X6   X7   X8   X9   X10       X11
Seq#   86   87   90   98   99   100  107  112  114  121  130  140  144  145  192
WT     V    K    F    V    I    L    I    F    Q    A    A    R    K    F    Y
1      -    -    Y    -    -    -    V    R    L    -    -    -    H    L    -
2      -    -    -    A    V    I    -    R    L    T    -    -    -    -    A
3      T    R    -    -    -    -    V    -    L    T    S    -    -    -    -
4      -    -    Y    -    -    -    -    R    -    T    S    H    -    -    -
5      T    R    -    -    -    -    -    R    -    -    S    -    H    L    A
6      -    -    Y    -    -    -    -    -    L    -    -    H    -    -    A
7      -    -    -    -    -    -    V    -    -    T    -    H    H    L    A
8      -    -    Y    A    V    I    -    -    -    T    -    -    H    L    -
9      -    -    -    A    V    I    -    -    L    -    S    H    H    L    -
10     -    -    Y    A    V    I    V    -    -    -    S    -    -    -    A
11     -    -    -    A    V    I    V    R    -    -    -    -    -    -    -
12     T    R    Y    A    V    I    V    R    L    T    S    H    H    L    A
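The construction behind a 12-run Plackett-Burman design can be sketched in a few lines of Python. This is a minimal sketch of the textbook cyclic construction, assuming a generator row with a + at the quadratic residues modulo 11 and a final all-+ run. The function names are hypothetical, and the row order and individual entries need not coincide with Table 2. The mapping from groups to mutations is read from the column grouping of Table 2.

```python
# Sketch of a 12-run, 11-factor Plackett-Burman design (+1 = mutated, -1 = wild type).
# Assumption: textbook cyclic construction, not necessarily the procedure used for Table 2.

# Mutation groups X1-X11, read from the column grouping of Table 2.
GROUPS = [
    ["V86T", "K87R"], ["F90Y"], ["V98A", "I99V", "L100I"], ["I107V"],
    ["F112R"], ["Q114L"], ["A121T"], ["A130S"], ["R140H"],
    ["K144H", "F145L"], ["Y192A"],
]

def plackett_burman_12():
    """Return 12 runs of 11 two-level factors as lists of +1/-1."""
    n = 11
    residues = {(k * k) % n for k in range(1, n)}           # quadratic residues mod 11
    first = [1 if j in residues else -1 for j in range(n)]  # generator row
    # Eleven cyclic shifts of the generator row, then one run with every factor at +1.
    runs = [[first[(j - shift) % n] for j in range(n)] for shift in range(n)]
    runs.append([1] * n)
    return runs

def is_orthogonal(design):
    """True if every pair of factor columns has a zero dot product."""
    columns = list(zip(*design))
    return all(
        sum(a * b for a, b in zip(columns[i], columns[j])) == 0
        for i in range(len(columns))
        for j in range(i + 1, len(columns))
    )

def mutations_in_run(run):
    """List the point mutations carried by one run of the design."""
    return [m for level, group in zip(run, GROUPS) if level == 1 for m in group]

design = plackett_burman_12()
for number, run in enumerate(design, start=1):
    print(number, " ".join(mutations_in_run(run)))
```

Because every pair of columns is balanced, the main effect of each mutation group can be estimated from twelve variants rather than from all 2^11 = 2048 possible combinations of the eleven groups.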
To identify which variants to test next, we intend to employ machine learning
algorithms in our protein design. Using genetic engineering techniques, we are
generating the protein variants and measuring their activities, with and without the
incorporation of oFF, relative to the starting wild-type tGCN5.
Machine-learning algorithms can be employed to predict the next set of variants
with an improved combination of substitutions [10]. By this approach, we hope to isolate
artificial tGCN5 variants with improved activity for the target histone peptide while
maintaining improved stability.
Active Learning
Active learning is a type of supervised learning technique in which the classifier is
built by iteratively choosing the most informative data from a pool of unlabelled data.
This type of learning method is useful for experiments where data are expensive to obtain.
A classifier is first built from the available data. New data points are then chosen based
on this classifier, labelled, and added to the training file to build another
classifier, which is expected to be better than the previous one. We explored three
active learning methods discussed by Danziger et al. [11]: minimal marginal hyperplane,
maximal marginal hyperplane, and maximum curiosity.
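The iterative procedure described above can be sketched as a generic pool-based loop. This is a minimal sketch and not the setup used in this work: the function names (`active_learning_loop`, `train_threshold`, `closest_to_threshold`) and the one-dimensional toy data are hypothetical, and the toy threshold classifier stands in for whatever classifier is actually trained.

```python
def active_learning_loop(labelled, pool, train, choose, oracle, rounds):
    """Repeatedly train, pick one pool point, label it, and add it to the training data."""
    labelled, pool = list(labelled), list(pool)
    chosen = []
    for _ in range(rounds):
        if not pool:
            break
        model = train(labelled)
        point = choose(model, pool)              # selection strategy decides what to test next
        pool.remove(point)
        labelled.append((point, oracle(point)))  # the "experiment" supplies the label
        chosen.append(point)
    return train(labelled), chosen

# Toy stand-ins (assumptions): a 1-D threshold classifier and a "closest to the
# decision boundary" selection rule.
def train_threshold(labelled):
    actives = [x for x, active in labelled if active]
    inactives = [x for x, active in labelled if not active]
    return (max(inactives) + min(actives)) / 2

def closest_to_threshold(threshold, pool):
    return min(pool, key=lambda x: abs(x - threshold))

model, chosen = active_learning_loop(
    labelled=[(0, False), (16, True)],
    pool=range(1, 16),
    train=train_threshold,
    choose=closest_to_threshold,
    oracle=lambda x: x >= 11,  # hidden ground truth, unknown to the learner
    rounds=4,
)
print(chosen, model)  # prints [8, 12, 10, 11] 10.5
```

In this toy run, four labelled points are enough to locate the boundary between 10 and 11, whereas labelling pool points in arbitrary order could take many more queries.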
Minimal Marginal Hyperplane [11]
Minimal Marginal Hyperplane chooses the next data point by its
proximity to the decision boundary. The assumption here is that the points
closest to the decision boundary are the most informative. Therefore, the
classifier is expected to reach the desired accuracy faster by making use of these
close, unclassified data.
Maximal Marginal Hyperplane [11]
Maximal Marginal Hyperplane is similar to Minimal Marginal Hyperplane,
except that the point furthest from the hyperplane is chosen as the next data point.
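Both hyperplane-based rules reduce to ranking unlabelled points by their distance to the decision boundary. The sketch below assumes a linear classifier with a weight vector `w` and bias `b`, which are hypothetical names, since the classifier is not specified here. It illustrates the selection rule only.

```python
import math

def distance_to_hyperplane(w, b, x):
    """Euclidean distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def pick_min_margin(w, b, pool):
    """Minimal marginal hyperplane: the unlabelled point closest to the boundary."""
    return min(pool, key=lambda x: distance_to_hyperplane(w, b, x))

def pick_max_margin(w, b, pool):
    """Maximal marginal hyperplane: the unlabelled point furthest from the boundary."""
    return max(pool, key=lambda x: distance_to_hyperplane(w, b, x))

# Hypothetical example: boundary at x0 = 0.5 in a two-feature space.
w, b = (1.0, 0.0), -0.5
pool = [(0.0, 0.0), (0.6, 1.0), (3.0, 0.0)]
print(pick_min_margin(w, b, pool))  # (0.6, 1.0)
print(pick_max_margin(w, b, pool))  # (3.0, 0.0)
```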
Maximum Curiosity [11]
Maximum Curiosity chooses the data point by giving each point a score and then
picking the point which has the highest score. The formula to calculate the score of each
data point is
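$$r_t = \frac{tp_t \cdot tn_t - fp_t \cdot fn_t}{\sqrt{(tp_t + fp_t)(tp_t + fn_t)(tn_t + fp_t)(tn_t + fn_t)}}$$
where tp_t, tn_t, fp_t, and fn_t are the numbers of true positives, true negatives, false
positives, and false negatives. For each data point, the method first assumes the point to be
active and calculates the score, then assumes the same point to be inactive and calculates
the score again. The higher of the two scores is taken as the score of that point.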
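The scoring step can be sketched as follows. This is a minimal sketch under stated assumptions: the counts are obtained by leave-one-out cross-validation, and a 1-nearest-neighbour classifier with Hamming distance on binary mutation vectors stands in for the actual classifier. Both choices, and all function names, are hypothetical.

```python
import math

def correlation(tp, tn, fp, fn):
    """Correlation score r from confusion counts; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def predict_1nn(train, x):
    """Label of the training example nearest to x (stand-in classifier)."""
    return min(train, key=lambda item: hamming(item[0], x))[1]

def loo_correlation(data):
    """Leave-one-out cross-validated correlation score of the stand-in classifier."""
    tp = tn = fp = fn = 0
    for i, (x, y) in enumerate(data):
        pred = predict_1nn(data[:i] + data[i + 1:], x)
        if pred and y:
            tp += 1
        elif not pred and not y:
            tn += 1
        elif pred and not y:
            fp += 1
        else:
            fn += 1
    return correlation(tp, tn, fp, fn)

def curiosity_score(labelled, x):
    """Score x under both hypothetical labels and keep the higher score."""
    return max(loo_correlation(labelled + [(x, label)]) for label in (True, False))

def pick_max_curiosity(labelled, pool):
    return max(pool, key=lambda x: curiosity_score(labelled, x))

# Hypothetical binary mutation vectors (1 = mutation present) with activity labels.
labelled = [((0, 0, 0), False), ((0, 0, 1), False), ((1, 1, 1), True), ((1, 1, 0), True)]
pool = [(0, 1, 0), (1, 0, 1)]
print(pick_max_curiosity(labelled, pool))
```

Each candidate requires retraining and cross-validating the classifier twice, so this rule is considerably more expensive per query than the two hyperplane-based rules.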
Results and Discussion
Comparison of Active Learning Techniques
In order to determine the best active learning technique to use for selecting tGCN5
variants, we compared the active learning techniques on a similar data set from a project
by Liao et al. [10]. Figure 3 shows the overall experiment design. Generally, the more
data are available, the more accurate the classifier will be; the active learning methods are
intended to reach the highest accuracy more quickly. We generated two different initial
training sets and recorded the accuracy of the classifier as more data points were added.
Figures 4 and 5 show the accuracy obtained by each method as more data
points are added.
Figure 3. Choosing the next data point using active learning. (Diagram elements: training data,
Weka [5] classifier, test accuracy, active learning, best next point, label the new point.)
Figure 4: Comparison of data points chosen using active learning methods and random selection on the
first run. (Plot titled "Comparison (Run 2)": accuracy, 0 to 100, against sizes, 0 to 100, for
Max Curiosity, Min HP, Max HP, and Random.)
To make sure that we did not have a biased initial training set, another training set was
chosen to be the starting set and the active learning methods were run again.
Figure 5: Comparison of data points chosen using active learning methods and random
selection on the second run. (Plot titled "Comparison": accuracy (%), 0 to 100, against sizes,
0 to 100, for Random, MaxHP, MinHP, and MaxCuriosity.)
The two different seed training files give different starting accuracy values. In
Figure 4, the classifier improves its accuracy quickly, and the active
learning methods perform better than random selection of data. For the second initial
training file, the difference between random data selection and the active learning methods is
not significant. Among the three methods tested, maximum curiosity
seems to improve the classifier faster than the other two. When the
experimental data on tGCN5 are available, we plan to use an active learning method to
select additional protein variants.
PCR amplification of each fragment
In order to generate the designed variants bearing multiple mutations, we had to
assemble fragments bearing the mutations. Using primers containing the
mutations, we generated the fragments with the mutation(s) by PCR
assembly [12]. In the first PCR, the primers anneal to the template DNA (tGCN5 gene)
and a fragment of the tGCN5 sequence is amplified. After amplifying all the fragments, we
ran another PCR to anneal the individual fragments to each other and generate a full-length
variant bearing the set of mutations; an example for sequence 10 is shown in Figure 5.
Sequences 8, 9, and 11 were also generated (Figure 6). The full-length variants
will be digested with the restriction enzymes Hind III and Bam HI and cloned into the vector
pQE30. Once we have the new constructs, we will proceed to protein expression and the
fluorescence assay.
Figure 5: PCR-amplified fragments for variant 10, shown as an example (L (ladder), mutant 1 (~150 bp),
mutant 2 (~54 bp), mutant 3 (~48 bp), mutant 4 (~90 bp), mutant 5a and 5b (~212 bp), mutant 6 (~100 bp))
on a 2% DNA gel (left). The fragments are annealed and amplified (right).
Figure 6: PCR alignment of sequence 8, sequence 9, and sequence 11. (Lanes: L, followed by three lanes
each for sequences 8, 9, and 11.)
Protein expression of tGCN5 and single mutants of tGCN5
The genes for wild-type tGCN5, F90Y, and A121T in the plasmid pQE30
were transformed into the phenylalanine auxotrophic strain AFIQ for protein expression.
Protein expression was visualized by 12% SDS PAGE (Figure 7 A). The expressed proteins were
purified on 1 mL of cobalt gel slurry (TALON® Metal Affinity Resin) with increasing
concentrations of imidazole, as shown by 12% SDS PAGE (Figure 7 B, C and D). From the
SDS PAGE, the largest fraction of pure protein appeared in elution 4 (E4) for wild-type
tGCN5 at 21 kDa (Figure 7 B). For F90Y and A121T, the largest fractions appeared in
elution 2 (E2) and elution 3 (E3) (Figure 7 C and D). In Figure 7 C, impurities were
visible in E1-E4 for F90Y, which indicates that we need to optimize the purification conditions.
The largest fractions were subjected to dialysis to remove imidazole before the
fluorescent assay.
Figure 7. A) SDS PAGE gel result of overexpressed protein at 21 kDa for WT, F90Y, and A121T:
L (ladder), pre-induction (-) and overexpressed protein (+). SDS PAGE gel results of protein
purification at 21 kDa: B) wild-type tGCN5, C) F90Y, D) A121T: L (ladder), E1 (elution 1),
E2 (elution 2), E3 (elution 3), E4 (elution 4), E5 (elution 5).
Fluorescent assay of tGCN5 and mutants
Kinetic data for tGCN5 were determined using a fluorometric assay that detects
the enzymatic production of coenzyme A (CoA) as tGCN5 transfers the acetyl group
from AcCoA to lysine on a peptide, H3p19. The fluorophore,
7-diethylamino-3-(4’-maleimidylphenyl)-4-methylcoumarin (CPM), reacts with the CoA generated in the
acetyltransferase reaction, giving a strong fluorescent emission at 465 nm (excitation
wavelength 365 nm) [1]. tGCN5 and the tGCN5 mutants, each at 5.9 µM, were tested with
different concentrations of H3p19 (1.2 mM, 0.6 mM, 0.3 mM, 0.15 mM, and 0.075 mM).
The Lineweaver-Burk equation generated from the fluorescent assay was used to
calculate Vmax, Kcat, and Km.
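The calculation can be sketched as follows. The Lineweaver-Burk equation is 1/v = (Km/Vmax)(1/[S]) + 1/Vmax, so a straight-line fit of 1/v against 1/[S] gives Vmax = 1/intercept and Km = slope/intercept, and Kcat = Vmax/[E]. The substrate and enzyme concentrations below are those stated in the text. The rates are synthetic, noise-free values generated from hypothetical parameters (Vmax = 0.02 mM/sec, Km = 1.0 mM), not measured data, and the function name is an assumption.

```python
def lineweaver_burk_fit(substrate, rates):
    """Least-squares line through (1/[S], 1/v); returns (Vmax, Km)."""
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in rates]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # Km / Vmax
    intercept = mean_y - slope * mean_x  # 1 / Vmax
    return 1.0 / intercept, slope / intercept

# H3p19 concentrations (mM) and enzyme concentration (5.9 uM = 0.0059 mM) from the text.
h3p19_mM = [1.2, 0.6, 0.3, 0.15, 0.075]
enzyme_mM = 0.0059

# Synthetic Michaelis-Menten rates from hypothetical parameters (not experimental data).
true_vmax, true_km = 0.02, 1.0
rates = [true_vmax * s / (true_km + s) for s in h3p19_mM]

vmax, km = lineweaver_burk_fit(h3p19_mM, rates)
kcat = vmax / enzyme_mM  # turnover number, 1/sec
print(f"Vmax = {vmax:.4f} mM/sec, Km = {km:.4f} mM, "
      f"Kcat = {kcat:.3f} 1/sec, Kcat/Km = {kcat / km:.3f} 1/(sec mM)")
```

Because the fit is done on reciprocals, errors in the rates measured at the lowest substrate concentrations are amplified, so replicate measurements at those concentrations are particularly valuable.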
Based on the data (Figure 8 A, B, C and Table 3), A121T appeared to have the
highest turnover and specificity towards H3p19 compared to wild-type tGCN5 and F90Y.
Wild-type tGCN5 was tested in triplicate, whereas the mutants were tested only once.
The Vmax, Kcat, and Km values for wild-type tGCN5 were within the standard deviation,
although the standard deviation was large. This might be because each trial for
wild-type tGCN5 was performed at room temperature. We will need to repeat the
experiment.
[Figure 8 plots: 1/V(o) against 1/[H3]. A) Wild-type tGCN5, fitted line y = 121.57x + 0.35.
B) F90Y, fitted line y = 874.58x + 197.01. C) A121T, fitted line y = 246.73x + 0.2883.]
Figure 8. Fluorescence assay results of: A) wild-type tGCN5, B) F90Y, C) A121T.
Table 3: Kinetics of wild-type tGCN5 and mutants, determined using the Lineweaver-Burk equation.
            Vmax (mM/sec)    Km (mM)          Kcat (sec-1)     Kcat/Km (sec-1 mM-1)
WT tGCN5    0.018 ± 0.025    2.388 ± 3.321    3.068 ± 4.226    1.420 ± 0.247
F90Y        1.14E-06         0.001            0.001163         1.162786
A121T       0.0035           0.8558           3.5274           4.1217
Conclusion
Wild-type tGCN5, F90Y and A121T were tested for activity using the fluorescent
assay. The experimental data showed that A121T exhibited higher activity and
turnover than wild-type tGCN5 and F90Y. Moreover, the Vmax, Kcat,
and Km of wild-type tGCN5 were successfully calculated. The samples will be tested further,
kept on ice for each trial, to confirm the experimental data.
Site-directed mutagenesis on tGCN5 was carried out to create the single mutations
shown in Table 2 (X1-X10). Eight mutations were confirmed by DNA sequencing. We
will repeat the site-directed mutagenesis procedure for the other three (X3: V98A, I99V,
L100I; X1: V86T, K87R; and X2: F112R) and send them for sequencing. The confirmed
single mutations will be analyzed for stability and/or activity and compared to wild-type
tGCN5.
Once we have activity results from the protein variants shown in Table 1 and
Table 2 with or without the incorporation of oFF, we will employ a machine-learning
algorithm to design a set of variants, which we hope will have improved activity. Our
machine learning experiments suggest that maximum curiosity will be the best active
learning technique to use. In future work, we plan to explore variants of the active
learning algorithms and different ways to model the feature space of the tGCN5 variants.
References:
1. Trievel, R.C., F.Y. Li, and R. Marmorstein, Application of a fluorescent histone
acetyltransferase assay to probe the substrate specificity of the human p300/CBP-
associated factor. Anal Biochem, 2000. 287(2): p. 319-28.
2. Tanner, K.G., et al., Catalytic mechanism and function of invariant glutamic acid
173 from the histone acetyltransferase GCN5 transcriptional coactivator. J Biol
Chem, 1999. 274(26): p. 18157-60.
3. Rojas, J.R., et al., Structure of Tetrahymena GCN5 bound to coenzyme A and a
histone H3 peptide. Nature, 1999. 401(6748): p. 93-8.
4. Langer, M.R., et al., Modulating acetyl-CoA binding in the GCN5 family of
histone acetyltransferases. J Biol Chem, 2002. 277(30): p. 27337-44.
5. Yan, Y., et al., Crystal structure of yeast Esa1 suggests a unified mechanism for
catalysis and substrate binding by histone acetyltransferases. Mol Cell, 2000.
6(5): p. 1195-205.
6. Lin, Y., et al., Solution structure of the catalytic domain of GCN5 histone
acetyltransferase bound to coenzyme A. Nature, 1999. 400(6739): p. 86-9.
7. Brownell, J.E., et al., Tetrahymena histone acetyltransferase A: a homolog to
yeast Gcn5p linking histone acetylation to gene activation. Cell, 1996. 84(6): p.
843-51.
8. Trievel, R.C., et al., Crystal structure and mechanism of histone acetylation of the
yeast GCN5 transcriptional coactivator. Proc Natl Acad Sci U S A, 1999. 96(16):
p. 8931-6.
9. Plackett, R.L. and J.P. Burman, The design of optimum multifactorial experiments.
Biometrika, 1946. 33(4): p. 305-325.
10. Liao, J., et al., Engineering proteinase K using machine learning and synthetic
genes. BMC Biotechnol, 2007. 7: p. 16.
11. Danziger, S., et al., Choosing where to look next in a mutation sequence space:
Active Learning of informative p53 cancer rescue mutants. Bioinformatics. 23: p.
104-114.
12. Stemmer, W.P.C., Crameri, A., Ha, K. D., Brennan, T. M., and Heyneker, H. L.,
Single-step assembly of a gene and entire plasmid from large numbers of
oligodeoxyribonucleotides. Elsevier Science 1995. 164: p. 49-53.