Clustering without limits


Published in: Technology

  1. 1. NewsBytes

     Aquaporin Simulations De-Bunk Gas Exchange Assumptions

     Biologists have long taken gas exchange for granted, assuming that gases simply seep through the cell’s lipid membrane. Since 1998, however, evidence has been building that gases might also be exchanged through pores created by specialized proteins.

     Now molecular dynamics simulations of aquaporins have weighed in on the question. The result: “It’s now well established that these proteins can conduct gas molecules,” says Emad Tajkhorshid, PhD, co-author of the work and assistant professor of biochemistry, pharmacology and biophysics at the University of Illinois at Urbana-Champaign. But, he says, some uncertainty remains: “Whether or not it’s important in the human body, that’s the controversial part.” The work was published in the March 2007 issue of the Journal of Structural Biology.

     Fifteen to twenty years ago, scientists believed that water permeation through lipid bilayers was enough for water transport into and out of cells. Gradually, though, researchers realized that some cells need to control water permeability, and other cells have lipid bilayers that aren’t very permeable to water. Aquaporins, it turned out, carry water in and out in a controllable fashion. “I think the same might be true for gas permeability,” says Tajkhorshid. “Gas permeability of a lipid bilayer is like an open free highway where everything can go through. With a protein, you can have a gating mechanism and some regulation.”

     One of Tajkhorshid’s collaborators, Walter Boron, MD, PhD, professor of cellular and molecular physiology at Yale University, has been working on gas exchange experimentally for about ten years. To him, aquaporins are a likely suspect for gas conduction because they exist in places where oxygen must go in and carbon dioxide must come out. For example, they are plentiful in cells that line the lung, in red blood cells, and in astrocytes (cells at the blood-brain barrier). But it’s very hard to measure small changes in oxygen concentration at the surface of a membrane experimentally.

     So Tajkhorshid’s team pitched in with molecular dynamics simulations. Aquaporins occur in groups of four (tetramers), with four pores that conduct water (one through each aquaporin molecule) and one central pore where the molecules meet. The latter, until now, had no known function. When simulated using two complementary methods (explicit sampling with full gas permeation and implicit ligand sampling), the team found both oxygen and carbon dioxide were exchanged through that central pore. Carbon dioxide was also transmitted through the four water pores, while oxygen passed through those pores only rarely. The research also found, however, that a plain lipid bilayer conducts two and a half times as much gas as one embedded with aquaporin tetramers. “The question is whether this pathway is significant and makes any difference in terms of total permeability of the membrane,” says Tajkhorshid.

     The researchers hypothesize that, as with water permeability, aquaporins may be physiologically relevant to gas exchange when cells have dense, rigid lipid bilayers or when aquaporins occupy a major fraction of the membrane. Tajkhorshid plans to introduce point mutations inside the central pore and manipulate the behavior of a gating loop to see how that changes the conducting properties of the central pore. Meanwhile, Boron’s group is looking for a system in which gas conduction through aquaporins is a major pathway. Says Tajkhorshid: “Even if it’s 30 percent of total gas permeability, it becomes physiologically relevant because then you can control it.”

     According to Nazih Nakhoul, PhD, research associate professor in biochemistry at Tulane University, “This idea of gas transport through membrane proteins is really gaining support. It’s interesting to see molecular dynamics simulations confirm some of the earliest findings.”

     By Katharine Miller

     Image caption: Simulations of the aquaporin tetramer found that carbon dioxide and oxygen are exchanged through the central pore, a site of previously unknown function. Image courtesy of Emad Tajkhorshid, a faculty associate of the NIH Resource for Macromolecular Modeling and Bioinformatics, and his UIUC colleagues Klaus Schulten, Yi Wang, and Jordi Cohen.

     BIOMEDICAL COMPUTATION REVIEW Summer 2007
  2. 2. NewsBytes

     Parkinson’s Culprit Modeled

     Under a microscope, the curious protein clumps that dot the brains of Parkinson’s patients stick out like the culprits they are. But no one has yet caught the protein, alpha-synuclein, in the act of causing disease. Now, investigators report in an April 2007 issue of FEBS Journal that they’re getting closer: they’ve modeled alpha-synuclein’s early aggregation and offered a detailed mechanism for its participation in neuron death.

     “This is not just the first computational model of alpha-synuclein,” says Igor Tsigelny, PhD, an author of the paper and a computational biologist at the San Diego Supercomputer Center. “Up to now, there was no molecular concept of the aggregation going on.”

     In the brain cells of Parkinson’s patients, alpha-synuclein first starts to cluster as a proto-fibril. It then forms fibril chains, and finally ends up in the dense clumps of fibrils called Lewy bodies. Some researchers have suggested in the past few years that alpha-synuclein knocks off neurons right at the beginning of aggregation, long before it can be detected as a Lewy body. Biochemical and structural evidence hints that when a few alpha-synuclein molecules first self-assemble into proto-fibrils, they can form pore-like ring structures. These may interact with the cell membrane and allow ions to enter the cell. The entrance of ions such as Ca²⁺ could lead to neuron death.

     The computer model created by Tsigelny and his colleagues at the University of California, San Diego, supports this theory, providing detailed dynamics of alpha-synuclein hexamers and pentamers and their interaction with the cell membrane. What’s more, the model shows that another synuclein in the cell, beta-synuclein, blocks alpha-synuclein’s ring-making, suggesting at least one avenue for future inhibitory drug development.

     Modeling such a complex aggregation wasn’t simple. Alpha-synuclein is a large protein (140 amino acids), and to model its hexamer interacting with the cell membrane required juggling around a million atoms, Tsigelny says.

     Yet more than the size of alpha-synuclein, what made it difficult to model was its lack of structure. Alpha-synuclein is an intrinsically unstructured protein, one without a distinct three-dimensional shape. Most proteins consistently fold into a favored shape to do their jobs, a form that can be crystallized, imaged, and pored over. But unstructured proteins flop this way and that, even while performing their specific tasks, making them very difficult to pin down and study.

     “We were not scared by an unstable protein,” Tsigelny states. And he and his coworkers developed an unusual “all-dynamic” approach to modeling the protein. None of the conformations are final: they are all considered intermediate and each may last only as long as half of a nanosecond. Nevertheless, Tsigelny says, even such fleeting intermediates may aggregate. The pore-like aggregates, they found, are far more stable than single molecules of alpha-synuclein.

     Having this model “is one step forward,” says Hilal Lashuel, PhD, professor at the Swiss Federal Institute of Technology in Lausanne, Switzerland. The UCSD model provides a structural basis for testing the hypothesis that alpha-synuclein forms toxic pores, he adds. But Lashuel also cautions that only biochemical and in vivo studies can prove whether alpha-synuclein pokes holes in neurons. “Isolating the toxic species is really the most difficult question we are dealing with. You have to catch it in the act.”

     By Louisa Dalton

     Image caption: Alpha-synuclein poses as a pentamer, pore-like, on the surface of a cell membrane. Courtesy of Igor Tsigelny.
  3. 3. NewsBytes

     Clustering Without Limits

     Starting in preschool we all learn how to get organized. Typically, we start with pre-determined categories (dolls, trains, blocks); pre-set ideas about what belongs in each category (Barbie: doll; Thomas the Tank Engine: train) and a fixed number of bins to put things in.

     But what if you started with none of those initial limitations? Could you still group the toys? It turns out that, in a computer, such sorting is not only possible, but extremely efficient. Using a novel algorithm called affinity propagation, researchers at the University of Toronto found that they can not only cluster lots of different kinds of data appropriately, but do it better and faster than other methods. The work was published in the February 16 issue of Science.

     “Almost all existing techniques work on a hypothesis refinement basis: they start off with a set of assumed groups and iteratively refine them,” says Brendan Frey, PhD, associate professor of electrical and computer engineering at the University of Toronto, co-author of the paper. “To our knowledge, ours is the first algorithm to consider all possible groupings at once.”

     The task sounds mind-boggling: There are a huge number of possible groupings. But affinity propagation handles that problem by sending messages between data points, pair-wise, so as to maximize the net similarity in each group. “Each message encapsulates or summarizes a whole distribution of possible groupings for one of the data points,” says Delbert Dueck, a PhD candidate in Frey’s lab. “No one has done that before.”

     Affinity propagation is based on an algorithm called belief propagation, which has been around in various incarnations for many years. But, say the authors, it’s an approach that has never been applied to clustering. “Certainly not to generic clustering of any type of data,” says Dueck. Indeed the algorithm is so generic that Frey and Dueck used it to analyze gene expression data, facial images, and airline routes, while other researchers have found applications in basketball statistics, the stock market and computer vision. And many tasks in computational biology require a computer to organize the data before using it to make predictions.

     “Part of the attraction of the algorithm is that, although it was complicated to derive, it’s quite simple to implement and to get an intuitive feel for it,” says Frey. There are basically only two equations to it. “Sometimes we’ll give a talk and get emails from people who’ve implemented it the day after,” he says.

     When the researchers looked at how well the algorithm performed compared to other clustering methods, they found it remarkably efficient. “A problem our algorithm could solve in about five minutes on one computer would take other methods up to one million years to solve on that same computer,” says Frey.

     Image caption: Frey and Dueck use affinity propagation to cluster data around “exemplars,” data points that best represent their compatriots. In this graphic, after starting with an equal chance of serving as an exemplar, candidates for that job have already emerged (red dots). Each data point sends messages to each candidate exemplar conveying how well it represents the blue point compared to other candidate exemplars. And candidate exemplars send messages conveying their availability to serve as an exemplar for particular data points.

     Image caption: If asked to cluster facial images, a standard clustering method (k-means clustering) would take up to a million years on a single computer to achieve the accuracy achieved by affinity propagation after five minutes.
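The message passing described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the article does not spell out the two equations, so the "responsibility" and "availability" updates below follow the form in which affinity propagation is commonly stated, and the `damping` factor, the iteration count, the toy data, and the diagonal "preference" of -10 are all assumptions chosen for the example.

```python
def affinity_propagation(S, damping=0.5, iterations=200):
    """Cluster n >= 2 points given an n x n similarity matrix S (list of lists).

    S[k][k] is the "preference" of point k for serving as an exemplar.
    Returns, for each point, the index of the exemplar it ends up choosing.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k): point -> candidate
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k): candidate -> point

    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)

        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k) for i' not in {i, k})
        # a(k,k) = sum of positive r(i',k) for i' != k
        for k in range(n):
            support = [max(0.0, R[i][k]) for i in range(n)]
            support[k] = R[k][k]  # self-responsibility enters unclipped
            total = sum(support)
            for i in range(n):
                if i == k:
                    new = total - support[k]
                else:
                    new = min(0.0, total - support[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    # Each point picks the candidate with the largest combined message.
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Toy data: two well-separated groups on a line (hypothetical example values).
points = [0, 1, 2, 10, 11, 12]
S = [[-float((a - b) ** 2) for b in points] for a in points]
for k in range(len(points)):
    S[k][k] = -10.0  # illustrative preference; lower values yield fewer exemplars
print(affinity_propagation(S))
```

Note that the number of groups is never passed in: on this toy input the middle point of each group (indices 1 and 4) should emerge as the exemplar, and the preference on the diagonal is the only knob that influences how many exemplars appear.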
  4. 4. NewsBytes

     Tim Hughes, PhD, of the Center for Cellular and Biomolecular Research at the University of Toronto, is considering using affinity propagation in his research. “It seems like it would do best when things really do form independent groups, and when the data are fairly sparse, so most of the correlation matrix can be dropped in early cycles,” he says. “I think it will work well with exon-profiling data or genome-tiling data, where there is also a constraint that the groups have to correspond to regions near each other on the chromosome.”

     By Katharine Miller

     Computer Vision that Mimics Human Vision

     Our brains can recognize most of the things we pass on an evening stroll: Cars, buildings, trees, and people all register even at a great distance or from an odd angle. Now, a new computer vision program can do the same thing. It successfully rivals the human ability to rapidly recognize objects in a complex picture because it mimics how information flows during the initial stages of visual perception.

     “We’ve built a model to be as close as possible to what is known about the human visual system,” explains Thomas Serre, PhD, a postdoctoral associate in the Center for Biological and Computational Learning at MIT and lead author of two papers recently published out of the lab run by Tomaso Poggio, PhD, at MIT’s McGovern Institute for Brain Research.

     For decades, scientists have struggled to create computer programs that can recognize visual objects as well as humans can. Some computer systems excel at recognizing one particular object, but none are anywhere close to recognizing the wide range of objects observed by the human brain. Visual recognition is complicated by two conflicting goals: a program must be specific enough to discriminate between different objects, such as a person or a car, yet flexible enough to recognize the same type of object in different sizes, poses, and lighting.

     To achieve these goals, Serre and colleagues used data recorded from real neurons in the visual system to program two fundamentally different kinds of virtual neurons called S (simple) and C (complex) units. S units recognize specific features of an image; C units monitor a range of S units in one area and allow for variation in position and size.

     The researchers were surprised to find that a simple system, consisting of four alternating layers of S and C units, was able to classify pictures of a busy street scene as well as other leading mathematics-based computer vision systems, as described in the March 2007 issue of IEEE Transactions on Pattern Analysis and Machine Intelligence.

     Serre’s team then built a more complex system, consisting of many S and C layers designed to closely match the flow of information in a human brain during the first 100-200 milliseconds of perception. This enhanced system performed as well as humans on a rapid object recognition task: distinguishing animals from non-animals when images were flashed in front of humans and computers. The work appeared in the April 2007 issue of the Proceedings of the National Academy of Sciences. The computer system even made errors similar to the errors made by humans, suggesting that the model recapitulates the early processes of the human visual system.

     Image caption: When presented with a real-world street scene (left), Serre’s computer vision system successfully recognized pedestrians, cars, buildings, trees, sky, and the street (right). Although not pictured, the model also successfully identified bicycles. Note the error in this example: the model mistakenly classified a street sign as a pedestrian. Graphic courtesy of Stanley Bileschi, PhD, McGovern Institute for Brain Research at MIT.

     The model will be used as a tool by neuroscientists to better understand the human visual system, and also has practical applications for surveillance, driving assistance, and autonomous robotics. According to Poggio, the team’s next
  5. 5. NewsBytes

     goal is to extend the model to include the “back projections” from other parts of the brain that allow feedback processing of visual information after 200 milliseconds.

     “This is the first demonstration that a purely bottom-up approach to visual object recognition, inspired by recordings from the neurons in the brain, is effective as a practical computer vision system,” says Terry Sejnowski, PhD, head of the Computational Neurobiology Lab at the Salk Institute. “There is much more work to do, both to improve its performance, and also to use it to better understand how our own visual system works.”

     By Matthew Busse, PhD

     Nature Versus Nurture In Silico

     Every generation, a few nonconformists crop up in tissue cultures of genetically identical cells. The question is: are the wayward simply born that way, or did something in the environment affect them? “You have these two possibilities: intrinsic or extrinsic, nature or nurture,” says Andras Paldi, PhD, a biologist at Genethon in France.

     Now, Paldi and his colleagues have modeled such cultured cells to determine whether extrinsic or intrinsic influences play a key role in the spontaneous emergence of phenotypic variation. It turns out that for spatial patterns beyond randomness to arise, there has to be some effect of sensing neighboring cells; i.e., extrinsic factors must play a role. And the extrinsic model resembles results seen in real cells. The work appears in April in PLoS One.

     Paldi’s work was motivated in part by the open question among stem cell biologists of what triggers a stem cell to differentiate. Why, in the same warm spot, getting the same rich media, do some cells differentiate and others stay stem cells? It is commonly assumed that this is because the decision to differentiate is intrinsic, that is, purely random.

     To test that assumption, Paldi’s group started by designing two simple, multi-agent-based models of a tissue culture plate. In each model, all cells act independently and can switch between two cell types: A or B. In the “extrinsic” model, A cells turn into B cells when it gets crowded, and back to A cells when they have more space. In the “intrinsic” model, each cell has fixed probabilities of switching from A to B and back again.

     When the scientists ran the models, they found each produces a stable, heterogeneous population, yet they differ in the cell patterns. The intrinsic model predicts lone A cells distributed evenly throughout a largely B population. Extrinsic predicts that the A cells will cluster. The result held even though the cells were allowed to migrate.

     This pattern difference allowed the researchers to compare their computational simulation with real cells. Using a muscle cell line that can switch between two distinct phenotypes, a stem-cell-like progenitor state and a differentiated state, they found that the cell pattern mostly resembles that of the extrinsic model. Many of the rare, stem-cell-like cells cluster; a few are solitary.

     What’s important here, Paldi says, is that they find environment playing a role, a significant one. In the case of stem (progenitor) cells, it means neighbor cells can affect the differentiation process. “The stem cell nature is not an intrinsic property of the cell,” he says. “It is a property of the whole cell population.” Paldi further believes the work supports the effort to find a way of converting adult, differentiated cells into stem cells (and avoid the need for harvesting embryonic stem cells), a possibility that has not just scientific, but social and political implications as well.

     Christa Muller-Sieburg, PhD, however, disputes that scientific conclusion. “The idea that mature cells can turn into stem cells is very attractive to many modelers but has little support through experimental data,” says the professor at the Sidney Kimmel Cancer Center.

     Sui Huang, MD, PhD, at Children’s Hospital Boston, would have liked to see Paldi’s group perturb the cell line or the culture to confirm their model. But both he and Muller-Sieburg believe the study addressed an important question, that of heterogeneity of a genetically identical population of cells. And, says Huang, it certainly “contributes to the discussion in the community.”

     By Louisa Dalton

     Image caption: Agent-based computer models predict the pattern (left) produced when genetically identical cells have an inherent probability of changing (from green to red and vice versa), and the pattern (right) produced when cells are triggered to change by an extrinsic factor, such as cell density. Top images represent exponential growth; bottom are at equilibrium. Courtesy of Andras Paldi.
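The two switching rules described in the article can be sketched as follows. This is a toy illustration under stated assumptions, not Paldi's model: the grid representation, the 8-neighbour crowding measure, the `crowd_threshold` value, and the omission of cell division and migration are all simplifications introduced here.

```python
import random

# Offsets of the 8 surrounding grid sites (an assumed neighbourhood definition).
NEIGHBOURS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def crowding(cells, pos):
    """Count the occupied sites among the 8 neighbours of pos."""
    x, y = pos
    return sum((x + dx, y + dy) in cells for dx, dy in NEIGHBOURS)


def step_intrinsic(cells, p_ab, p_ba, rng):
    """'Nature': every cell switches with a fixed probability, ignoring neighbours."""
    updated = {}
    for pos, kind in cells.items():
        p_switch = p_ab if kind == "A" else p_ba
        if rng.random() < p_switch:
            kind = "B" if kind == "A" else "A"
        updated[pos] = kind
    return updated


def step_extrinsic(cells, crowd_threshold):
    """'Nurture': a cell is B when its surroundings are crowded, A when it has room."""
    return {
        pos: "B" if crowding(cells, pos) >= crowd_threshold else "A"
        for pos in cells
    }


# A dense 3x3 colony plus one isolated cell (hypothetical starting layout).
cells = {(x, y): "A" for x in range(3) for y in range(3)}
cells[(10, 10)] = "B"

print(step_extrinsic(cells, crowd_threshold=4))
print(step_intrinsic(cells, p_ab=0.2, p_ba=0.05, rng=random.Random(0)))
```

`cells` maps an occupied grid position to its type, so empty sites are simply absent. In the extrinsic rule a cell's fate depends only on where it sits, which is why cells of one type end up spatially grouped; in the intrinsic rule position plays no part.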
  6. 6. Simulating Populations with Complex Diseases

     Diabetes, breast cancer, multiple sclerosis, Alzheimer’s disease. All are associated with several genes’ alleles interacting in complex ways with one another and the environment. Now, using a computationally intensive method known as forward-time simulation of human populations, researchers are hoping to gain a better understanding of how such complex diseases become established.

     “In a real population you just see people with the disease,” says Marek Kimmel, PhD, professor of statistics at Rice University and co-author of the work. “You don’t see who in the population has the disease genes because people carrying these genes do not necessarily become diseased.” But in the model population, he says, “you see both.” And the researchers’ approach allows them to simulate a very complicated scenario, including changes in types of selection pressure.

     “This lets us evaluate how well statistical genetics tests determine what genes are responsible for the symptoms of a disease and how frequently those genes appear in the population.” That’s a non-trivial exercise, he says, because it has been impossible, until now, to compare the many existing gene-mapping methods head-to-head. The work was published in PLoS Genetics in March 2007.

     Before now, the most commonly used approach to simulating diseases in human populations, called the “coalescent” method, worked by coalescing backward in time to a most-recent common ancestor. But it’s extremely difficult to take selection into account using the coalescent method, says co-author Bo Peng, PhD, a postdoctoral fellow at the University of Texas MD Anderson Cancer Center. Moreover, that approach gets too complicated if more than one disease gene is involved. So Peng and his colleagues turned to forward-time simulation, an approach that’s been around for about one hundred years.

     But that technique is not without its problems. When a population evolves forward in time, there are simply too many possible outcomes. Most notably, when you introduce a disease allele, it can rapidly be eliminated and replaced with new alleles. So Peng came up with a trick: He pre-sets desired disease allele frequencies in the current generation, extrapolates them backward, and starts the simulation from there. As Kimmel puts it, “We are restricting potential variability in one aspect of the present in order to produce a simulation that resembles something close to the actual variability that exists now.”

     The simulation uses a scripting language called simuPOP, a general-purpose forward-time simulation environment based on Python. The software is freely available at, under a GPL license.

     When Peng and his colleagues used their method to compare several gene mapping techniques, they found that certain methods worked better for loci that were located distantly from one another; and other methods were more effective when loci were close together. Overall, though, says Kimmel, “We’re mildly pessimistic” about current gene mapping approaches. “When the number of loci involved in complex disease is greater than two, the methods rapidly lose their power.”

     Until recently, gene mapping for complex diseases has been disappointing, he says. Loci identified in such efforts have later turned out to be statistical artifacts. “Our modeling could figure out if this is inevitable,” he says, and help guide people toward more effective approaches.

     David Balding, PhD, a professor of statistical genetics at Imperial College in London, does similar work using forward-time simulations of large genomic regions. He has become pessimistic about the method’s usefulness for understanding complex diseases because no one really knows what kind of selection is going on. Nevertheless, he says, this work can be useful for studying selection itself. “People tend to look at selection one allele at a time,” he says, “But forward-time simulation lets us do it with complex interactions.”

     By Katharine Miller

     Image labels: Cancer; Multiple Sclerosis; Diabetes.
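To make the idea of forward-time simulation concrete, here is a minimal sketch in plain Python. It does not use simuPOP and is not the researchers' method: it assumes a textbook Wright-Fisher model with a single locus, haploid individuals, a constant population size, and a constant positive relative `fitness` for the disease allele. All function names and parameter values are hypothetical.

```python
import random


def next_generation(count, pop_size, fitness, rng):
    """Advance one generation; return the new number of disease-allele carriers.

    fitness is the (positive) relative fitness of the disease allele; 1.0 is neutral.
    """
    freq = count / pop_size
    # Selection reweights the chance that an offspring inherits the disease allele.
    p = freq * fitness / (freq * fitness + (1.0 - freq))
    # Each of the pop_size offspring draws its allele independently.
    return sum(rng.random() < p for _ in range(pop_size))


def simulate(pop_size, start_count, fitness, generations, seed=None):
    """Return the allele-count trajectory, starting from start_count carriers."""
    rng = random.Random(seed)
    trajectory = [start_count]
    for _ in range(generations):
        trajectory.append(next_generation(trajectory[-1], pop_size, fitness, rng))
    return trajectory


# Introduce one copy of a mildly deleterious allele many times and count
# how often it has vanished after 50 generations.
runs = 200
lost = sum(simulate(500, 1, 0.98, 50, seed=s)[-1] == 0 for s in range(runs))
print(f"allele lost in {lost} of {runs} runs")
```

Running the loop shows the problem the article mentions: a newly introduced allele often drifts out of the population, so a simulation that simply starts in the past and runs forward rarely ends at a chosen present-day frequency. The sketch stops there; it does not implement the backward extrapolation of allele frequencies that Peng describes.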