The External Research team is a subdivision of Microsoft Research, and has the goal of demonstrating that Microsoft tools and research technologies can be usefully applied in many different areas of scientific research.

External Research focuses on a small number of global ‘themes’: areas of research where Microsoft tools can make a significant research impact. They are:
Computer Science
Earth, Energy and the Environment
Scholarly Communication
Health and Wellbeing (the examples shown in these slides relate to this area)

Each of these themes maintains a portfolio of two types of project:
Researcher collaborations: a Microsoft researcher is actively engaged with an academic partner, usually such that Microsoft provides the computing expertise and the academic partner provides the knowledge in their domain of research. Successful projects of this type generate scientific insights for the academic research and advance the computing research of Microsoft.
Software development: the External Research team also develops software applications and platforms specific to the needs of the scientific community. This software is often developed collaboratively with academic researchers to ensure it remains relevant to their needs.
Microsoft External Research’s goal with this project is to enable communities who maintain ontologies to experiment more easily, and to enhance the experience of authors who use Microsoft Word for content creation by incorporating semantic knowledge into the content. This add-in should simplify the development and validation of ontologies, by making ontologies more accessible to a wide audience of authors and by enabling semantic content to be integrated in the authoring experience, capturing the author’s intent and knowledge at the source, and facilitating downstream discoverability. The goal of the add-in is to assist scientists in writing a manuscript that is easily integrated with existing and pending electronic resources. The major aims of this project are to add semantic information as XML mark-up to the manuscript using ontologies and controlled vocabularies (from the National Center for Biomedical Ontology) and identifiers from major biological databases, and to integrate manuscript content with existing public data repositories.

As part of the publishing workflow and archiving process, the terms added by the add-in, which provide the semantic information, can be extracted from Word files, as they are stored as custom XML tags as part of the content. The semantic knowledge can then be preserved as the document is converted to other formats, such as HTML or the XML format from the National Library of Medicine, which is commonly used for archiving.

The full benefit of semantically rich content will result from an end-to-end approach to the preservation of semantics and metadata through the publishing pipeline, starting with capturing knowledge from the subject experts, the authors, and enabling this knowledge to be preserved when published, as well as made available to search engines and presented to people consuming the content.

This project resulted from an initial and ongoing collaboration between Microsoft External Research and Dr. Phil Bourne and Dr. Lynn Fink, at the University of California San Diego. Additional collaboration with the staff from Science Commons aims to make the add-in relevant to a wider audience and also to preserve semantic data along the publishing pipeline.
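As a rough illustration of how terms stored as custom XML tags might be pulled back out of a Word file, here is a minimal Python sketch. It assumes the tags appear as WordprocessingML `w:customXml` elements inside `word/document.xml`; the element name `term` and the ontology URI in the sample are hypothetical placeholders, not the names the add-in actually writes.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (Office Open XML).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_custom_xml_terms(document_xml):
    """Return one dict per w:customXml element found in document.xml."""
    root = ET.fromstring(document_xml)
    terms = []
    for node in root.iter(f"{{{W_NS}}}customXml"):
        # Concatenate the text runs wrapped by the custom XML tag.
        text = "".join(t.text or "" for t in node.iter(f"{{{W_NS}}}t"))
        terms.append({
            "element": node.get(f"{{{W_NS}}}element"),
            "uri": node.get(f"{{{W_NS}}}uri"),
            "text": text,
        })
    return terms


def extract_terms_from_docx(path_or_file):
    """Open a .docx (a zip archive) and scan its main document part."""
    with zipfile.ZipFile(path_or_file) as archive:
        return extract_custom_xml_terms(archive.read("word/document.xml"))


# Hypothetical sample: the element name and URI are illustrative only.
SAMPLE_DOCUMENT_XML = """<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body><w:p>
    <w:r><w:t xml:space="preserve">Mutations in </w:t></w:r>
    <w:customXml w:uri="http://example.org/hypothetical-ontology" w:element="term">
      <w:r><w:t>hemoglobin</w:t></w:r>
    </w:customXml>
  </w:p></w:body>
</w:document>"""

if __name__ == "__main__":
    for term in extract_custom_xml_terms(SAMPLE_DOCUMENT_XML):
        print(term["element"], term["uri"], term["text"])
```

A converter to HTML or to the National Library of Medicine XML format could use a list like this to carry each tagged term and its identifier into the target format.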
NodeXL is a template for Excel 2007 that lets you enter a network edge list, click a button, and see the network graph, all in the Excel window. You can easily customize the graph’s appearance; zoom, scale and pan the graph; dynamically filter vertices and edges; alter the graph’s layout; find clusters of related vertices; and calculate a set of graph metrics. Networks can be imported from and exported to a variety of data formats, and built-in connections for getting networks from Twitter, Flickr, YouTube, and your local email are provided.
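To give a feel for what "enter an edge list and calculate graph metrics" means, here is a small Python sketch that computes a few basic metrics from an edge list. It is not NodeXL's implementation; the function name, the choice of metrics, and the sample data are illustrative assumptions, and the graph is treated as undirected.

```python
from collections import defaultdict


def graph_metrics(edges):
    """Compute simple metrics for an undirected graph given as (a, b) pairs."""
    neighbors = defaultdict(set)
    for a, b in edges:
        # Touch both endpoints so every vertex is registered.
        neighbors[a]
        neighbors[b]
        if a != b:  # ignore self-loops when counting neighbors
            neighbors[a].add(b)
            neighbors[b].add(a)

    vertex_count = len(neighbors)
    # Each undirected edge is seen from both endpoints, so halve the sum.
    edge_count = sum(len(s) for s in neighbors.values()) // 2
    degree = {v: len(s) for v, s in neighbors.items()}
    # Density: fraction of possible vertex pairs that are connected.
    if vertex_count > 1:
        density = 2 * edge_count / (vertex_count * (vertex_count - 1))
    else:
        density = 0.0
    return {
        "vertices": vertex_count,
        "edges": edge_count,
        "degree": degree,
        "density": density,
    }


if __name__ == "__main__":
    # Hypothetical edge list, one row per edge as in a two-column worksheet.
    sample_edges = [("ann", "bob"), ("bob", "cat"), ("cat", "ann"), ("cat", "dan")]
    print(graph_metrics(sample_edges))
```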
3D Molecule Viewer is a stand-alone, demo version of the C-ME application that InterKnowlogy built for the Scripps Research Institute (TSRI). It is a WPF application built in C#. Affectionately called "The Cancer App", the full version of this application (a WPF front-end for SharePoint) is running in production and installed all over the world. As the brain-child of Dr. Peter Kuhn of TSRI, C-ME is just a step in realizing his dream/mission of "getting his arms around" cancer to turn it into a managed disease.

This stand-alone, source code version of the application does not have the SharePoint dependency and allows you to open sample 3D Protein Data Bank (PDB) format files directly: spin them in 3D, zoom in on them, display them from different views, etc. This means you can get the application running quickly and stare at the code. Just a heads up: although WPF makes 3D dramatically easier, it still is not for the faint of heart. There is a lot of trigonometry and calculus in the code. And it's really well written, which means it's object oriented and consequently abstracted.

What Dr. Peter Kuhn did not have, and what C-ME provides, is a way to view cancer and SARS molecules in 3D (and 2D) and attach research directly to the 3D (and 2D) surface of the molecules. Research takes many forms: Office documents such as Word files, PDFs, URLs to content all over the world, pictures, and even SharePoint discussions. Upon "pinning" research to the exact spot on the 3D (or 2D) surface of the molecule, the research is actually persisted into SharePoint with the 5 coordinates of 3D. This Rich Client WPF application consumes SharePoint Web Services to pull that off. This "new" application development paradigm solves an interesting problem: it combines a highly graphical and usable 3D client for the desktop with the broad reach of a browser-based application (SharePoint) to house the research and handle the collaboration and workflow.
With Project Trident, you can author workflows visually by using a catalog of existing activities and complete workflows. The workflow workbench provides a tiered library that hides the complexity of different workflow activities and services for ease of use.
In addition to the software projects conducted by the External Research team, a range of collaborative scientific projects have been conducted over the years, each partnering a Microsoft researcher with an academic researcher to advance both the computer science and the biological research.

While many of these projects have been successful in achieving their research goals, each has led to the independent development of software of value to only a small fraction of the scientific community. It would clearly be preferable if each additional software development could build on those that went before, resulting in a richer and more capable platform for research. This was one of the primary motivations behind the Microsoft Biology Foundation.
Typically, Microsoft researchers conduct computer science research, but some Microsoft researchers, such as David Heckerman and his team, work directly on problems in the life sciences. In this case, Dr. Heckerman applied his expertise in machine learning to the design of vaccines.

Again, this work results in a range of freely-available software tools that can be downloaded and used by the scientific community. These tools encapsulate unique approaches to scientific challenges such as the construction of phylogenetic trees.
The purpose of the Microsoft Biology Foundation is to create a platform for the construction of applications of value to the life science community. To do this, we are combining many of the projects already underway in the External Research team: collaborations with academia, internal life science research, product development activities within Microsoft Corporation, and existing products that can be applied to biological research. All of these, plus dedicated software development on core features and community involvement, have led to the development of the Microsoft Biology Foundation.
The Microsoft Biology Foundation and its Applications
Simon Mercer
Director for Health & Wellbeing
Microsoft External Research
Binary and source code: http://3dmoleculeviewer.codeplex.com/
The Trident Scientific Workflow Workbench
A visual workflow environment that allows researchers to better manage, evaluate and interact with even the most complex scientific datasets
Built on top of Windows Workflow Foundation
Write once, deploy and run anywhere…
Visually program workflows
Libraries of activities and workflows
Automatic provenance capture
Available at: http://research.microsoft.com/en-us/collaboration/tools/trident.aspx
Previous bioinformatics project outputs
Jaroslav Pillardy, Computational Biology Service Unit, Cornell University
BioHPC: Suite of 28 applications modified and adapted for efficient use in a Windows HPC environment with an ASP.NET interface
Currently supports the areas of DNA sequence analysis, protein structure prediction, population genetics and phylogenetics
Jim Hogan, SilverMap: Queensland University of Technology
MQUTer supports research into bioinformatics, sensor networks, visualization and parallelism on the Microsoft platform
Six new tools – the latest under development using MBF and Silverlight 3 which visualizes DNA sequence similarity and is integrated into MBF (and will shortly be available as an Excel plug-in)
Robin Gutell, Center for Computational Biology and Bioinformatics, UT Austin
Suite of tools to explore evolutionary relationships and predict function of RNA molecules
Available as a website – also a complementary open-source suite of Windows-based tools, under development using MBF (H1 FY11)
Cancer Bioinformatics in ER
Marty Humphrey, Department of Computer Science, University of Virginia
The caBIG platform connects consumers, the care delivery system, and the research community. Close to 60 NCI-designated Cancer Centers are deploying caBIG® infrastructure and tools, as are 16 Community Cancer Centers that in the aggregate touch 20 million lives.
This project pilots caBIG clients on Windows, leveraging and extending MBF, along with tutorials demonstrating the value of Microsoft technologies to the caBIG developer and user community.
Fighting HIV and AIDS
Four-year collaboration between Bruce Walker at Harvard and David Heckerman’s team (Microsoft Research)
Ability to leverage greater strength from existing use of other MS technologies
Provides transition from local to cloud-based computation and data storage
Objectives
Modular by design
Commonly used features
Exceptionally well-documented
Extensible
Interoperable
Initial Areas of Focus
Genomics
    Sequencing Analysis and Annotation
Advanced Research
    Phylogenetics
    Genome Wide Association
    Haplotype reconstruction
Next Targets
    Visualization
    Large data sets
mbf.codeplex.com
Open Source: Available free of charge for commercial and non-commercial use and modification under the MS-PL license (http://opensource.org/licenses/ms-pl.html)
Community-Developed: Moved to CodePlex, creating advisory board and building a community
Community-Curated: Modify code, find bugs, contribute new features
V1 Release: Late June 2010
Different Styles of Usage
Build executables: Visual Studio
Office add-in: BioExcel
Command-line scripting access: IronPython, PowerShell
Workflow activities: Trident, WF
Services on the cloud: Azure
Selecting Restriction Endonucleases: DNA PReDuST (Aditi Technologies)
Fragment Size Distribution Graph
Restriction Map [Circular DNA]
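The idea behind a fragment size distribution for circular DNA can be sketched in a few lines of Python. This is not the DNA PReDuST or MBF code; the function names and the sample sequence are illustrative assumptions. The sketch searches one strand only, which is enough for a palindromic recognition site such as EcoRI's GAATTC (cut after the first base), and it assumes the site is shorter than the sequence.

```python
def find_cut_positions(sequence, site, cut_offset):
    """Return sorted cut positions for a site on a circular sequence.

    cut_offset is the number of bases from the start of the site to the cut.
    """
    seq = sequence.upper()
    site = site.upper()
    n = len(seq)
    # Append the first len(site) - 1 bases so sites spanning the origin are found.
    search = seq + seq[:len(site) - 1]
    positions = set()
    start = search.find(site)
    while start != -1:
        positions.add((start + cut_offset) % n)
        start = search.find(site, start + 1)
    return sorted(positions)


def circular_fragment_sizes(sequence_length, cut_positions):
    """Return fragment sizes, largest first, for a cut circular molecule."""
    cuts = sorted(cut_positions)
    if not cuts:
        # No site: the circle stays intact.
        return [sequence_length]
    # Gaps between consecutive cuts, plus the fragment wrapping past the origin.
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    sizes.append(sequence_length - cuts[-1] + cuts[0])
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    # Hypothetical 30-base circular sequence with two EcoRI sites.
    plasmid = "GAATTC" + "A" * 14 + "GAATTC" + "C" * 4
    cuts = find_cut_positions(plasmid, "GAATTC", cut_offset=1)
    print(cuts, circular_fragment_sizes(len(plasmid), cuts))
```

A single cut linearizes the circle into one fragment of full length, which is why one site and no site both give a single size here. Plotting the returned sizes for each candidate enzyme gives the kind of distribution graph named on the slide.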
Computational Biology Applications Suite for High Performance Computing (BioHPC)
Computational Biology Service Unit
Acknowledgements
MBF Team: Mike Zyskowski, Chris Wu
Microsoft Research: David Heckerman, Bob Davidson, Carl Kadie, Yogesh Simmhan, Jennifer Listgarten, Jonathan Carlson
Cornell University: Jarek Pillardy
Queensland University of Technology: Jim Hogan
University of Texas at Austin: Robin Gutell
Aditi Technologies: Vivek Kumar
Illumina Corporation: Scott Kahn
Johnson & Johnson Pharmaceutical Research Division LLC.: Dimitris Agrafiotis, Victor Lobanov, Jeremy Kolpak
mbf.codeplex.com