Mercer bosc2010 microsoft_framework

  • The External Research team is a subdivision of Microsoft Research, and has the goal of demonstrating that Microsoft tools and research technologies can be usefully applied in many different areas of scientific research.
    External Research focuses on a small number of global ‘themes’, areas of research where Microsoft tools can make a significant research impact. They are:
    • Computer Science
    • Earth, Energy and the Environment
    • Scholarly Communication
    • Health and Wellbeing (the examples shown in these slides relate to this area)
    Each of these themes maintains a portfolio of two types of project:
    • Researcher collaborations: a Microsoft researcher is actively engaged with an academic partner, usually such that Microsoft provides the computing expertise and the academic partner provides the knowledge in their domain of research. Successful projects of this type generate scientific insights for the academic research and advance the computing research of Microsoft.
    • Software development: the External Research team also develops software applications and platforms specific to the needs of the scientific community. This software is often developed collaboratively with academic researchers to ensure it remains relevant to their needs.
  • Microsoft External Research’s goal with this project is to enable communities who maintain ontologies to experiment more easily, and to enhance the experience of authors who use Microsoft Word for content creation by incorporating semantic knowledge into the content. This add-in should simplify the development and validation of ontologies by making ontologies more accessible to a wide audience of authors and by enabling semantic content to be integrated in the authoring experience, capturing the author’s intent and knowledge at the source, and facilitating downstream discoverability.
    The goal of the add-in is to assist scientists in writing a manuscript that is easily integrated with existing and pending electronic resources. The major aims of this project are to add semantic information as XML mark-up to the manuscript using ontologies and controlled vocabularies (from the National Center for Biomedical Ontology) and identifiers from major biological databases, and to integrate manuscript content with existing public data repositories.
    As part of the publishing workflow and archiving process, the terms added by the add-in, which provide the semantic information, can be extracted from Word files, because they are stored as custom XML tags as part of the content. The semantic knowledge can then be preserved as the document is converted to other formats, such as HTML or the XML format from the National Library of Medicine, which is commonly used for archiving.
    The full benefit of semantic-rich content will result from an end-to-end approach to the preservation of semantics and metadata through the publishing pipeline, starting with capturing knowledge from the subject experts, the authors, and enabling this knowledge to be preserved when published, as well as made available to search engines and presented to people consuming the content.
    This project resulted from an initial and ongoing collaboration between Microsoft External Research and Dr. Phil Bourne and Dr. Lynn Fink at the University of California San Diego. Additional collaboration with the staff from Science Commons aims to make the add-in relevant to a wider audience and also to preserve semantic data along the publishing pipeline.
  • NodeXL is a template for Excel 2007 that lets you enter a network edge list, click a button, and see the network graph, all in the Excel window. You can easily customize the graph’s appearance; zoom, scale and pan the graph; dynamically filter vertices and edges; alter the graph’s layout; find clusters of related vertices; and calculate a set of graph metrics. Networks can be imported from and exported to a variety of data formats, and built-in connections for getting networks from Twitter, Flickr, YouTube, and your local email are provided.
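    The edge-list-to-metrics idea described in this note can be sketched in a few lines of Python. This is only an illustration, not NodeXL's implementation: the function names `degree_metrics` and `graph_density` and the sample edge list are hypothetical, and only two simple metrics (vertex degree and graph density) are computed for an undirected graph.

```python
from collections import defaultdict


def degree_metrics(edges):
    """Return {vertex: degree} for an undirected edge list of (source, target) pairs."""
    neighbors = defaultdict(set)
    for source, target in edges:
        if source == target:
            continue  # self-loops are ignored in this sketch
        neighbors[source].add(target)
        neighbors[target].add(source)
    # Using sets means duplicate edges are counted once.
    return {vertex: len(adjacent) for vertex, adjacent in neighbors.items()}


def graph_density(edges):
    """Fraction of possible undirected edges that are actually present."""
    degrees = degree_metrics(edges)
    vertex_count = len(degrees)
    if vertex_count < 2:
        return 0.0
    unique_edges = sum(degrees.values()) // 2  # each edge contributes to two degrees
    return unique_edges / (vertex_count * (vertex_count - 1) / 2)


# Hypothetical edge list, one (source, target) pair per row as in a spreadsheet.
sample_edges = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("carol", "dave")]
print(degree_metrics(sample_edges))
print(graph_density(sample_edges))
```

    Storing neighbours in sets keeps the sketch robust to repeated rows in the edge list, which is a common situation when edges are typed or pasted by hand.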
  • 3D Molecule Viewer is a stand-alone, demo version of the C-ME application that InterKnowlogy built for the Scripps Research Institute (TSRI). It is a WPF application built in C#. Affectionately called "The Cancer App", the full version of this application (a WPF front-end for SharePoint) is running in production and installed all over the world. As the brain-child of Dr. Peter Kuhn of TSRI, C-ME is just a step in realizing his dream/mission of "getting his arms around" cancer to turn it into a managed disease.
    This stand-alone, source code version of the application does not have the SharePoint dependency and allows you to open sample 3D Protein Data Bank (PDB) format files directly: spin them in 3D, zoom in on them, display them from different views, etc. This means you can get the application running quickly and stare at the code. Just a heads up: although WPF makes 3D dramatically easier, it still is not for the faint of heart. There is a lot of trigonometry and calculus in the code. And it's really well written, which means it's object-oriented and consequently abstracted.
    The problem that C-ME solved (what Dr. Peter Kuhn did not have) was a way to view cancer and SARS molecules in 3D (and 2D) and attach research directly to the 3D (and 2D) surface of the molecules. Research takes many forms: Office documents such as Word files, PDFs, URLs to content all over the world, pictures, and even SharePoint discussions. Upon "pinning" research to the exact spot on the 3D (or 2D) surface of the molecule, the research is actually persisted into SharePoint with the 5 coordinates of 3D. This rich-client WPF application consumes SharePoint Web Services to pull that off. This "new" application development paradigm solves an interesting problem: it combines a highly graphical and usable 3D client for the desktop with the broad reach of a browser-based application (SharePoint) to house the research and handle the collaboration and workflow.
  • With Project Trident, you can author workflows visually by using a catalog of existing activities and complete workflows. The workflow workbench provides a tiered library that hides the complexity of different workflow activities and services for ease of use.
  • In addition to the software projects conducted by the External Research team, a range of collaborative scientific projects have been conducted over the years, each partnering a Microsoft researcher with an academic researcher to advance both the computer science and the biological research.
    While many of these projects have been successful in achieving their research goals, each has led to the independent development of software of value to only a small fraction of the scientific community. It would clearly be preferable if each additional software development could build on those that went before, resulting in a richer and more capable platform for research. This was one of the primary motivations behind the Microsoft Biology Foundation.
  • Typically, Microsoft researchers conduct computer science research – but some Microsoft researchers, such as David Heckerman and his team, work directly on problems in the life sciences. In this case, Dr. Heckerman applied his expertise in machine learning to the design of vaccines.
    Again, this work results in a range of freely-available software tools that can be downloaded and used by the scientific community. These tools encapsulate unique approaches to scientific challenges such as the construction of phylogenetic trees.
  • The purpose of the Microsoft Biology Foundation is to create a platform for the construction of applications of value to the life science community. To do this, we are combining many of the projects already underway in the External Research team – collaborations with academia, internal life science research, product development activities within Microsoft Corporation, and existing products that can be applied to biological research. All of these, plus dedicated software development on core features and community involvement, have led to the development of the Microsoft Biology Foundation.

Transcript

  • 1. The Microsoft Biology Foundation and its Applications
    Simon Mercer
    Director for Health & Wellbeing
    Microsoft External Research
  • 2. Microsoft External Research - Software
  • 3. Ontology Add-in for Word
    Services: Ontology download web service
    • John Wilbanks
    • Phil Bourne
    • Lynn Fink
    Intent: Term recognition & disambiguation
    Relationships: Ontology browser
    Source code and binary:
    http://research.microsoft.com/ontology/
  • 6. NodeXL
    Binary and source code:
    http://nodexl.codeplex.com
  • 7. 3D Molecule Viewer
    • PDB File Viewer
    • Written in C# using WPF
    Binary and source code:
    http://3dmoleculeviewer.codeplex.com/
  • 9. The Trident Scientific Workflow Workbench
    A visual workflow environment that allows researchers to better manage, evaluate and interact with even the most complex scientific datasets
    Built on top of Windows Workflow Foundation
    Write once, deploy and run anywhere…
    Visually program workflows
    Libraries of activities and workflows
    Automatic provenance capture
    Available at: http://research.microsoft.com/en-us/collaboration/tools/trident.aspx
  • 10. Origins of a Platform
  • 11. Previous bioinformatics project outputs
    Jaroslav Pillardy, Computational Biology Service Unit, Cornell University
    BioHPC: Suite of 28 applications modified and adapted for efficient use in a Windows HPC environment with ASP.NET interface
    Currently supports the areas of DNA sequence analysis, protein structure prediction, population genetics and phylogenetics
    Jim Hogan, SilverMap: Queensland University of Technology
    • MQUTer supports research into bioinformatics, sensor networks, visualization and parallelism on the Microsoft platform
    • Six new tools – the latest under development using MBF and Silverlight 3 which visualizes DNA sequence similarity and is integrated into MBF (and will shortly be available as an Excel plug-in)
    Robin Gutell, Center for Computational Biology and Bioinf., UT Austin
    • Suite of tools to explore evolutionary relationships and predict function of RNA molecules
    • Available as a website – also a complementary open-source suite of Windows-based tools, under development using MBF (H1 FY11)
    + Cancer Bioinformatics in ER
    Marty Humphrey, Department of Computer Science, University of Virginia
    • The caBIG® platform connects consumers, the care delivery system, and the research community. Close to 60 NCI-designated Cancer Centers are deploying caBIG® infrastructure and tools, as are 16 Community Cancer Centers that in the aggregate touch 20 million lives.
    • This project pilots caBIG clients on Windows, leveraging and extending MBF, and tutorials demonstrating the value of Microsoft technologies to the caBIG developer and user community.
  • Fighting HIV and AIDS
    • Four-year collaboration between Bruce Walker at Harvard and David Heckerman’s team (Microsoft Research)
    • Discovered three key insights to fight HIV:
    • Immune system is led astray by decoy epitopes (Nature Medicine, 2006)
    • Frameshift epitopes exist (JEM, 2010)
    • Natural killer cells directly attack HIV (Nature Medicine, in review)
    • 40+ publications, including Nature and Science
    • Walker has obtained $110M+ subsequent funding
    • PhyloD.Net, a tool for inferring HIV evolution in an individual, is used by 100+ HIV researchers and is now part of Microsoft Biology Foundation
    • Numerous press stories including Business Week and NPR
  • Convergence on a Strategic Platform for Bioinformatics
    Microsoft Biology Foundation
    • Beta 1: Nov 5, 2009 (MS Connect)
    • Beta 2: Feb 10, 2010 (CodePlex)
    • V1 release: July 2010
    • Early adopters from industry and academia
    • Bio-IT Alliance partner
    • Leveraging Microsoft assets: Pivot, NodeXL, TRIDENT, IronPython, etc.
    • Showcasing Microsoft products: Excel/Office, Visual Studio 2010, .NET 4.0, WPF, Silverlight
    • V1 launch June 2010
    • Keynote presentations planned
    • Training course in prep
    • Community ownership
    • Foundation of future MSR genomics projects
    • Foundation of all future ER genomics engagements with academia
    Azure engagement through XCG (Azure BLAST, PhyloD services)
    Product engagement and prototyping use by TC, HSG
  • 35. What is The Microsoft Biology Foundation?
    An open-source library of reusable bioinformatics algorithms, services and functions built on the .NET platform
    Benefits:
    • Easy to parallelize algorithms
    • Easy to distribute computations and workflows
    • Easy to visualize massive data sets
    • Ability to leverage greater strength from existing use of other MS technologies
    • Provides transition from local to cloud-based computation and data storage
  • Architecture: Namespaces
  • 40. Objectives
    Modular by design
    Commonly used features
    Exceptionally well-documented
    Extensible
    Interoperable
  • 41. Initial Areas of Focus
    Genomics
    Sequencing
    Analysis and Annotation
    Advanced Research
    Phylogenetics
    Genome Wide Association
    Haplotype reconstruction
    Next Targets
    Visualization
    Large data sets
  • 42. mbf.codeplex.com
    Open Source: Available free of charge for commercial and non-commercial use and modification under the MS-PL license (http://opensource.org/licenses/ms-pl.html)
    Community-Developed: Moved to CodePlex, creating advisory board and building a community
    Community-Curated: Modify code, find bugs, contribute new features
    V1 Release: Late June 2010
  • 43. Build executables
    Visual Studio
    Office add-in
    BioExcel
    Command-line scripting access
    IronPython, PowerShell
    Workflow Activities
    Trident, WF
    Services on the Cloud
    Azure
    Different Styles of Usage
  • 44. mbf.codeplex.com
  • 45. Selecting Restriction Endonucleases: DNA PReDuST (Aditi Technologies)
    Fragment Size Distribution Graph
    Restriction Map [Circular DNA]
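    The slide does not say how the tool computes its restriction map or fragment size distribution, so the following is only a minimal Python sketch of the underlying idea for circular DNA, not the DNA PReDuST or MBF implementation. The function names and the sample plasmid are hypothetical; the one fact assumed from biology is that EcoRI recognizes GAATTC and cuts after the first G (offset 1). Only one strand is scanned, which is sufficient for palindromic sites such as this one.

```python
def find_cut_positions(sequence, site, cut_offset):
    """Return sorted cut positions (0-based) for `site` on a circular sequence."""
    length = len(sequence)
    # Append the first len(site) - 1 bases so sites spanning the origin are found.
    extended = sequence + sequence[: len(site) - 1]
    cuts = set()
    start = extended.find(site)
    while start != -1 and start < length:
        cuts.add((start + cut_offset) % length)
        start = extended.find(site, start + 1)
    return sorted(cuts)


def circular_fragment_sizes(sequence, site, cut_offset):
    """Return fragment lengths (largest first) after digesting a circular sequence."""
    length = len(sequence)
    cuts = find_cut_positions(sequence, site, cut_offset)
    if not cuts:
        return [length]  # no site: the circle stays intact
    sizes = []
    for index, cut in enumerate(cuts):
        next_cut = cuts[(index + 1) % len(cuts)]
        # A single cut linearizes the circle, giving one full-length fragment.
        sizes.append((next_cut - cut) % length or length)
    return sorted(sizes, reverse=True)


# Hypothetical 30-base "plasmid" with two EcoRI sites (GAATTC, cut after the G).
plasmid = "GAATTC" + "A" * 10 + "GAATTC" + "C" * 8
print(find_cut_positions(plasmid, "GAATTC", 1))
print(circular_fragment_sizes(plasmid, "GAATTC", 1))
```

    The list of cut positions corresponds to the marks on a circular restriction map, and the list of sizes is the raw data behind a fragment size distribution graph.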
  • 46. Computational Biology Applications Suite for High Performance Computing (BioHPC)
    Computational Biology Service Unit
  • 47. MBF Team
    Mike Zyskowski, Chris Wu
    Microsoft Research
    David Heckerman, Bob Davidson, Carl Kadie, Yogesh Simmhan, Jennifer Listgarten, Jonathan Carlson
    Cornell University
    Jarek Pillardy
    Queensland University of Technology
    Jim Hogan
    University of Texas at Austin
    Robin Gutell
    Aditi Technologies
    Vivek Kumar
    Illumina Corporation
    Scott Kahn
    Johnson & Johnson Pharmaceutical Research Division LLC.
    Dimitris Agrafiotis, Victor Lobanov, Jeremy Kolpak
    Acknowledgements
    mbf.codeplex.com
  • 48. © 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
    The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.