Mercer bosc2010 microsoft_framework
Published in: Technology

0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,285
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
5
Comments
0
Likes
2
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide
  • The External Research team is a subdivision of Microsoft Research, and has the goal of demonstrating that Microsoft tools and research technologies can be usefully applied in many different areas of scientific research. External Research focuses on a small number of global ‘themes’ – areas of research where Microsoft tools can make a significant research impact. They are:
    • Computer Science
    • Earth, Energy and the Environment
    • Scholarly Communication
    • Health and Wellbeing – and the examples shown in these slides relate to this area
    Each of these themes maintains a portfolio of two types of project:
    • Researcher collaborations – where a Microsoft researcher is actively engaged with an academic partner, usually such that Microsoft provides the computing expertise and the academic partner provides the knowledge in their domain of research. Successful projects of this type generate scientific insights for the academic research and advance the computing research of Microsoft.
    • Software development – the External Research team also develops software applications and platforms specific to the needs of the scientific community. This software is often developed collaboratively with academic researchers to ensure it remains relevant to their needs.
  • Microsoft External Research’s goal with this project is to enable communities who maintain ontologies to more easily experiment and to enhance the experience of authors who use Microsoft Word for content creation, incorporating semantic knowledge into the content. This add-in should simplify the development and validation of ontologies, by making ontologies more accessible to a wide audience of authors and by enabling semantic content to be integrated in the authoring experience, capturing the author’s intent and knowledge at the source, and facilitating downstream discoverability. The goal of the add-in is to assist scientists in writing a manuscript that is easily integrated with existing and pending electronic resources. The major aims of this project are to add semantic information as XML mark-up to the manuscript using ontologies and controlled vocabularies (from the National Center for Biomedical Ontology) and identifiers from major biological databases, and to integrate manuscript content with existing public data repositories. As part of the publishing workflow and archiving process, the terms added by the add-in, providing the semantic information, can be extracted from Word files, as they are stored as custom XML tags as part of the content. The semantic knowledge can then be preserved as the document is converted to other formats, such as HTML or the XML format from the National Library of Medicine, which is commonly used for archiving. The full benefit of semantic-rich content will result from an end-to-end approach to the preservation of semantics and metadata through the publishing pipeline, starting with capturing knowledge from the subject experts, the authors, and enabling this knowledge to be preserved when published, as well as made available to search engines and presented to people consuming the content. This project resulted from an initial and ongoing collaboration between Microsoft External Research and Dr. Phil Bourne and Dr. Lynn Fink, at the University of California San Diego. Additional collaboration with the staff from Science Commons aims to make the add-in relevant to a wider audience and also to preserve semantic data along the publishing pipeline.
  • NodeXL is a template for Excel 2007 that lets you enter a network edge list, click a button, and see the network graph, all in the Excel window. You can easily customize the graph’s appearance; zoom, scale and pan the graph; dynamically filter vertices and edges; alter the graph’s layout; find clusters of related vertices; and calculate a set of graph metrics. Networks can be imported from and exported to a variety of data formats, and built-in connections for getting networks from Twitter, Flickr, YouTube, and your local email are provided.
  • 3D Molecule Viewer is a stand-alone, demo version of the C-ME application that InterKnowlogy built for the Scripps Research Institute (TSRI). It is a WPF application built in C#. Affectionately called "The Cancer App", the full version of this application (a WPF front-end for SharePoint) is running in production and installed all over the world. As the brain-child of Dr. Peter Kuhn of TSRI, C-ME is just a step in realizing his dream/mission of "getting his arms around" cancer to turn it into a managed disease. This stand-alone, source code version of the application does not have the SharePoint dependency and allows you to open sample 3D Protein Data Bank (PDB) format files directly, spin them in 3D, zoom in on them, display them from different views, etc. This means you can get the application running quickly and stare at the code. Just a heads up: although WPF makes 3D dramatically easier, it still is not for the faint of heart. There is a lot of trigonometry and calculus in the code. And it's really well written, which means it's object-oriented and consequently abstracted. The problem that C-ME solved (what Dr. Peter Kuhn did not have) was a way to view cancer and SARS molecules in 3D (and 2D) and attach research directly to the 3D (and 2D) surface of the molecules. Research takes many forms: Office documents like Word, PDFs, URLs to content all over the world, pictures, and even SharePoint discussions. Upon "pinning" research to the exact spot on the 3D (or 2D) surface of the molecule, the research is actually persisted into SharePoint with the 5 coordinates of 3D. This Rich Client WPF application consumes SharePoint Web Services to pull that off. This "new" application development paradigm solves an interesting problem: it pairs a highly graphical and usable 3D client for the desktop with the broad reach of a browser-based application (SharePoint) to house the research and handle the collaboration and workflow.
  • With Project Trident, you can author workflows visually by using a catalog of existing activities and complete workflows. The workflow workbench provides a tiered library that hides the complexity of different workflow activities and services for ease of use.
  • In addition to the software projects conducted by the External Research team, a range of collaborative scientific projects have been conducted over the years, each partnering a Microsoft researcher with an academic researcher to advance both the computer science and the biological research. While many of these projects have been successful in achieving their research goals, each has led to the independent development of software of value to only a small fraction of the scientific community. It would clearly be preferable if each additional software development could build on those that went before, resulting in a richer and more capable platform for research. This was one of the primary motivations behind the Microsoft Biology Foundation.
  • Typically, Microsoft researchers conduct computer science research – but some Microsoft researchers, such as David Heckerman and his team, work directly on problems in the life sciences. In this case, Dr. Heckerman applied his expertise in machine learning to the design of vaccines. Again, this work results in a range of freely-available software tools that can be downloaded and used by the scientific community. These tools encapsulate unique approaches to scientific challenges such as the construction of phylogenetic trees.
  • The purpose of the Microsoft Biology Foundation is to create a platform for the construction of applications of value to the life science community. To do this, we are combining many of the projects already underway in the External Research team – collaborations with academia, internal life science research, product development activities within Microsoft Corporation, and existing products that can be applied to biological research. All of these, plus dedicated software development on core features and community involvement, have led to the development of the Microsoft Biology Foundation.
  • Transcript

    • 1. The Microsoft Biology Foundation and its Applications
      Simon Mercer
      Director for Health & Wellbeing
      Microsoft External Research
    • 2. Microsoft External Research - Software
    • 3. Ontology Add-in for Word
      Services: Ontology download web service
      • John Wilbanks
      • Phil Bourne
      • Lynn Fink
      Intent: Term recognition & disambiguation
      Relationships: Ontology browser
      Source code and binary:
      http://research.microsoft.com/ontology/
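The speaker notes say the add-in stores recognized terms as custom XML tags inside the Word content so they can be extracted later in the publishing pipeline. A minimal Python sketch of that extraction step follows. It assumes the tags are WordprocessingML `w:customXml` elements; the slides do not describe the add-in's schema, so the `term` element name, the `urn:example:ontology` URI, the `id` attribute and the sample sentence are all hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (Office Open XML).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical fragment of word/document.xml; the element name, URI and
# attribute are made up for illustration, not taken from the add-in.
SAMPLE = f"""<w:document xmlns:w="{W}"><w:body><w:p>
  <w:r><w:t>Cells undergo </w:t></w:r>
  <w:customXml w:uri="urn:example:ontology" w:element="term">
    <w:customXmlPr><w:attr w:name="id" w:val="GO:0006915"/></w:customXmlPr>
    <w:r><w:t>apoptosis</w:t></w:r>
  </w:customXml>
</w:p></w:body></w:document>"""


def extract_terms(document_xml):
    """Collect the tag name, tagged text and attributes of each customXml element."""
    root = ET.fromstring(document_xml)
    terms = []
    for node in root.iter(f"{{{W}}}customXml"):
        # Concatenate every text run wrapped by the tag.
        text = "".join(t.text or "" for t in node.iter(f"{{{W}}}t"))
        attrs = {
            a.get(f"{{{W}}}name"): a.get(f"{{{W}}}val")
            for a in node.iter(f"{{{W}}}attr")
        }
        terms.append(
            {"element": node.get(f"{{{W}}}element"), "text": text, "attrs": attrs}
        )
    return terms


terms = extract_terms(SAMPLE)
```

For a real `.docx` file, the XML would first be read from `word/document.xml` inside the zip package, for example with Python's `zipfile` module.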
    • 6. NodeXL
      Binary and source code:
      http://nodexl.codeplex.com
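To illustrate the edge-list-to-metrics idea described in the NodeXL notes, here is a minimal Python sketch that computes one simple graph metric, vertex degree, from an undirected edge list. This is not NodeXL code; the function name and the sample network are invented for the example.

```python
from collections import defaultdict


def degree_by_vertex(edges):
    """Return {vertex: degree} for an undirected edge list of (a, b) pairs."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


# Hypothetical sample network, one tuple per row of the edge list.
edges = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("carol", "dave"),
]
degrees = degree_by_vertex(edges)
```

Here `carol` ends up with the highest degree because she appears in three edges.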
    • 7. 3D Molecule Viewer
      • PDB File Viewer
      • Written in C# using WPF
      Binary and source code:
      http://3dmoleculeviewer.codeplex.com/
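The viewer itself is written in C#, and its source is not reproduced in these slides. As a rough idea of what reading a PDB file involves, the Python sketch below parses the fixed-column `ATOM`/`HETATM` records of the PDB format and computes the centroid of the atoms, a value a viewer could use to center a model before rotating it. The function names and the two sample records are illustrative assumptions.

```python
def parse_atom_line(line):
    """Parse one fixed-column PDB ATOM/HETATM record into a dict."""
    return {
        "name": line[12:16].strip(),      # atom name, columns 13-16
        "res_name": line[17:20].strip(),  # residue name, columns 18-20
        "chain": line[21],                # chain identifier, column 22
        "x": float(line[30:38]),          # orthogonal coordinates in angstroms
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


def read_atoms(pdb_text):
    """Return the parsed atoms from the ATOM and HETATM records of a PDB file."""
    return [
        parse_atom_line(line)
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
    ]


def centroid(atoms):
    """Mean position of the atoms as an (x, y, z) tuple."""
    n = len(atoms)
    return tuple(sum(atom[axis] for atom in atoms) / n for axis in ("x", "y", "z"))


# Two illustrative records; column positions matter in this format.
SAMPLE = (
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N\n"
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C\n"
)
atoms = read_atoms(SAMPLE)
center = centroid(atoms)
```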
    • 9. The Trident Scientific Workflow Workbench
      A visual workflow environment that allows researchers to better manage, evaluate and interact with even the most complex scientific datasets
      Built on top of Windows Workflow Foundation
      Write once, deploy and run anywhere…
      Visually program workflows
      Libraries of activities and workflows
      Automatic provenance capture
      Available at: http://research.microsoft.com/en-us/collaboration/tools/trident.aspx
    • 10. Origins of a Platform
    • 11. Previous bioinformatics project outputs
      Jaroslav Pillardy, Computational Biology Service Unit, Cornell University
      BioHPC: Suite of 28 applications modified and adapted for efficient use in a Windows HPC environment with ASP.NET interface
      Currently supports the areas of DNA sequence analysis, protein structure prediction, population genetics and phylogenetics
      Jim Hogan, SilverMap: Queensland University of Technology
      • MQUTer supports research into bioinformatics, sensor networks, visualization and parallelism on the Microsoft platform
      • Six new tools – the latest under development using MBF and Silverlight 3 which visualizes DNA sequence similarity and is integrated into MBF (and will shortly be available as an Excel plug-in)
      Robin Gutell, Center for Computational Biology and Bioinf., UT Austin
      • Suite of tools to explore evolutionary relationships and predict function of RNA molecules
      • Available as a website – also a complementary open-source suite of Windows-based tools, under development using MBF (H1 FY11)
      + Cancer Bioinformatics in ER
      Marty Humphrey, Department of Computer Science, University of Virginia
      • The caBIG platform connects consumers, the care delivery system, and the research community. Close to 60 NCI-designated Cancer Centers are deploying caBIG® infrastructure and tools, as are 16 Community Cancer Centers that in the aggregate touch 20 million lives.
      • This project pilots caBIG clients on Windows, leveraging and extending MBF, and tutorials demonstrating the value of Microsoft technologies to the caBIG developer and user community.
    • Fighting HIV and AIDS
      • Four-year collaboration between Bruce Walker at Harvard and David Heckerman’s team (Microsoft Research)
      • Discovered three key insights to fight HIV:
      • Immune system is led astray by decoy epitopes (Nature Medicine, 2006)
      • Frameshift epitopes exist (JEM, 2010)
      • Natural killer cells directly attack HIV (Nature Medicine, in review)
      • 40+ publications, including Nature and Science
      • Walker has obtained $110M+ subsequent funding
      • PhyloD.Net, a tool for inferring HIV evolution in an individual, is used by 100+ HIV researchers and is now part of Microsoft Biology Foundation
      • Numerous press stories including Business Week and NPR
    • Convergence on a Strategic Platform for Bioinformatics
      Microsoft Biology Foundation
      • Beta 1: Nov 5, 2009 (MS Connect)
      • Beta 2: Feb 10, 2010 (CodePlex)
      • V1 release: July 2010
      • Early adopters from industry and academia
      • Bio-IT Alliance partner
      • Leveraging Microsoft assets: Pivot, NodeXL, TRIDENT, Iron Python, etc
      • Showcasing Microsoft products: Excel/Office, Visual Studio 2010, .NET 4.0, WPF, Silverlight
      • V1 launch June 2010
      • Keynote presentations planned
      • Training course in prep
      • Community ownership
      • Foundation of future MSR genomics projects
      • Foundation of all future ER genomics engagements with academia
      Azure engagement through XCG (Azure BLAST, PhyloD services)
      Product engagement and prototyping use by TC, HSG
    • 35. What is The Microsoft Biology Foundation?
      An open-source library of reusable bioinformatics algorithms, services and functions built on the .NET platform
      Benefits:
      • Easy to parallelize algorithms
      • Easy to distribute computations and workflows
      • Easy to visualize massive data sets
      • Ability to leverage greater strength from existing use of other MS technologies
      • Provides transition from local to cloud-based computation and data storage
    • Architecture: Namespaces
    • 40. Objectives
      Modular by design
      Commonly used features
      Exceptionally well-documented
      Extensible
      Interoperable
    • 41. Initial Areas of Focus
      Genomics
      Sequencing
      Analysis and Annotation
      Advanced Research
      Phylogenetics
      Genome Wide Association
      Haplotype reconstruction
      Next Targets
      Visualization
      Large data sets
    • 42. mbf.codeplex.com
      Open Source: Available free of charge for commercial and non-commercial use and modification under the MS-PL license (http://opensource.org/licenses/ms-pl.html)
      Community-Developed: Moved to CodePlex, creating advisory board and building a community
      Community-Curated: Modify code, find bugs, contribute new features
      V1 Release: Late June 2010
    • 43. Build executables
      Visual Studio
      Office add-in
      BioExcel
      Commandline scripting access
      Iron Python, PowerShell
      Workflow Activities
      Trident, WF
      Services on the Cloud
      Azure
      Different Styles of Usage
    • 44. mbf.codeplex.com
    • 45. Selecting Restriction Endonucleases: DNA PReDuST (Aditi Technologies)
      Fragment Size Distribution Graph
      Restriction Map [Circular DNA]
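The slide shows only the output of the tool (a fragment size distribution and a restriction map of circular DNA). As a sketch of the underlying calculation, and not the DNA PReDuST or MBF implementation, the Python below finds the cut positions of a recognition site on a circular sequence and derives the fragment sizes. The function names and the toy plasmid are invented; the EcoRI site `GAATTC`, cut after the first G, is standard.

```python
def cut_positions(seq, site, cut_offset):
    """Sorted cut positions on a circular sequence.

    Each position is the 0-based index of the base that follows the cut.
    """
    n = len(seq)
    # Append the start of the sequence so a site spanning the origin is found.
    extended = seq + seq[: len(site) - 1]
    positions = set()
    start = extended.find(site)
    while start != -1:
        positions.add((start + cut_offset) % n)
        start = extended.find(site, start + 1)
    return sorted(positions)


def circular_fragment_sizes(seq, site, cut_offset):
    """Sorted fragment lengths after digesting a circular sequence."""
    cuts = cut_positions(seq, site, cut_offset)
    n = len(seq)
    if not cuts:
        return []   # no site: the circle stays intact
    if len(cuts) == 1:
        return [n]  # one cut linearizes the circle into a single fragment
    return sorted(
        (cuts[(i + 1) % len(cuts)] - cuts[i]) % n for i in range(len(cuts))
    )


# Toy 30-base circular sequence with two EcoRI sites (G^AATTC, offset 1).
plasmid = "GAATTC" + "A" * 10 + "GAATTC" + "C" * 8
sizes = circular_fragment_sizes(plasmid, "GAATTC", 1)
```

On a circle, the number of fragments equals the number of cuts, and the fragment sizes always sum to the sequence length, which is a quick sanity check for a result like this.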
    • 46. Computational Biology Applications Suite for High Performance Computing (BioHPC)
      Computational Biology Service Unit
    • 47. MBF Team
      Mike Zyskowski, Chris Wu
      Microsoft Research
      David Heckerman, Bob Davidson, Carl Kadie, Yogesh Simmhan, Jennifer Listgarten, Jonathan Carlson
      Cornell University
      Jarek Pillardy
      Queensland University of Technology
      Jim Hogan
      University of Texas at Austin
      Robin Gutell
      Aditi Technologies
      Vivek Kumar
      Illumina Corporation
      Scott Kahn
      Johnson & Johnson Pharmaceutical Research Division LLC.
      Dimitris Agrafiotis, Victor Lobanov, Jeremy Kolpak
      Acknowledgements
      mbf.codeplex.com
    • 48. © 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
      The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.