  1. SPSS/CHILDREN’S MEMORIAL RESEARCH CENTER: Virtualizing Text Mining to Accelerate Research & Analysis
     Univa UD Case Study
     Industry: Managed Services

     SOLUTION OVERVIEW

     customer challenge
     Children’s Memorial Research Center (CMRC) is one of only five research institutions in the country devoted exclusively to pediatrics. Its groundbreaking programs and leading investigators attract millions in federal research dollars. In the search to identify gene therapies for pediatric brain tumors, CMRC’s Dr. Eric Bremer needed an efficient way to scan hundreds of thousands of journal articles to uncover key gene relationships, gain new insight into underlying tumor pathology, and ultimately predict which genes might lead to successful treatments.

     solution
     After utilizing SPSS’ enterprise data mining workbench Clementine® to analyze and classify tumor data, Dr. Bremer turned to SPSS’ linguistics-based text mining application LexiQuest Mine™ to explore the sea of published research for specific concepts and relate the analysis to known diagnosis criteria. He then implemented Univa UD technology to accelerate processing without expanding the Center’s hardware infrastructure. Virtualizing LexiQuest Mine on a Univa UD solution has made it possible for Dr. Bremer’s team to effectively work with over five years of published research to build a knowledge base that will help improve the speed and accuracy of brain tumor treatment development.

     benefits
     CMRC found that by combining a robust text-mining application with a Univa UD application virtualization solution, they achieved:
     • 94% timeframe reduction, from 24+ hours to just over an hour
     • Reduced cost of analysis by avoiding hardware investments
     • Expanded research scope with the ability to run more jobs a day
     Dr. Bremer’s team has successfully defined workflows and automated nearly 80% of this process with the grid in place.

     technology
     • Grid MP platform from Univa UD
     • LexiQuest Mine™ from SPSS
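As a quick check on the headline benefit, the 94% reduction can be related to the quoted runtimes with a few lines of Python. This is only an illustrative calculation: the 24-hour baseline is an assumption taken from the low end of the "24+ hours" figure, and the function names are made up for this sketch.

```python
def runtime_after(before_hours, reduction_percent):
    """Runtime that remains after cutting before_hours by reduction_percent."""
    return before_hours * (1 - reduction_percent / 100.0)


def percent_reduction(before_hours, after_hours):
    """Percentage by which the runtime fell."""
    return 100.0 * (before_hours - after_hours) / before_hours


if __name__ == "__main__":
    # Assumed baseline: 24 hours (the case study says "24+ hours").
    print(round(runtime_after(24, 94), 2))   # 1.44, i.e. "just over an hour"
```

Under that assumption a 94% cut leaves roughly an hour and a half of processing, which is consistent with the "just over an hour" wording.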
  2. children’s memorial research center: applying innovative approaches to research & diagnostics
     Established in 1986 as a formalized basic science research program, Children’s Memorial Research Center is the research arm of Children’s Memorial Hospital, the pediatric teaching hospital for Northwestern University’s Feinberg School of Medicine and one of only five institutions in the country devoted solely to pediatric medicine.

     CMRC’s vision is to become the preeminent child health research organization dedicated to improving the health of children nationwide. The Center’s ability to attract leading investigators has considerably advanced its goal of ensuring scientific knowledge is translated into tangible and effective clinical uses for the benefit of all children.

     CMRC has organized its work around seven interdisciplinary research programs, including Cancer Biology and Epigenomics. As Director of Brain Tumor Research within this program, Dr. Eric Bremer is charged with finding new, innovative ways to identify genes that might lead to successful treatments.

     “There are plenty of scientists out there publishing results that could be critical for our work. But because these results are often buried in studies that don’t immediately seem relevant, key pieces of data remain untapped. We needed a way to take advantage of all this published work, to sift through and uncover only the results that are relevant to our work: no more, no less.”
     Dr. Eric Bremer, Director of Brain Tumor Research, Children’s Memorial Research Center

     know your foe: the pediatric brain tumor
     Pediatric brain tumors, the most common type of solid tumor in children, are also the most fatal childhood cancer. Half of those diagnosed, nearly 3,000 a year in the U.S. alone, will die within five years. And for survivors the prospects are sobering: neurological disabilities, retardation, and psychological problems are among the long-term problems these children can suffer.

     Once a child is diagnosed with one of the 12 pediatric brain tumor types, the options are few. Outside of surgery (which is risky), the treatments available are rare. That’s why Dr. Bremer and his team at CMRC are leading the search for new therapeutic targets: key genes associated with specific tumor types that should be examined for therapeutic potential.

     “LexiQuest Mine has been a state-of-the-art application for gigabytes processing. With Univa UD, the era of text mining terabytes has begun.”
     Olivier Jouve, Vice President of Product Marketing, Data and Text Mining, SPSS

     “If we can better understand which genes are related not only to the disease itself but also to fundamental biological processes (like growth and development), we can better target certain genes for further research,” explains Dr. Bremer. “If a gene or cluster of genes has been proven to affect certain disease pathways, for example, it might be possible to develop that into a treatment for that particular pathway.”

     Dr. Bremer and his CMRC team had already built a database of tumor-related gene information using the SPSS data mining workbench Clementine. Now, they needed to evaluate that information against what was known about how genes interact in other, similar situations in order to uncover relationships that might help in their search for tumor therapy drug candidates.

     a global collaboration
     Fortunately, Dr. Bremer and his team were not starting from scratch. Thousands of researchers worldwide have published studies in professional journals that reference genes and their relationships to diseases, treatments, and each other. Publications like the Journal of Medicinal Chemistry, Journal of Biological Chemistry, and Science are valuable tools enabling researchers worldwide to share results that may further each other’s studies.

     Dr. Bremer recognized this body of work as a potential source for valuable insights into new therapeutic approaches for pediatric brain tumors. By uncovering insights hidden within the pages and pages of research, Bremer and CMRC can better predict which genes to target for development. But first you have to know where to look.
  3. needles in haystacks
     Unfortunately, there are a limited number of free hours in a scientist’s day, and when picking articles to review in search of valuable insights, an article is likely to be overlooked if the primary topic doesn’t seem relevant. As Dr. Bremer explains, “I wouldn’t normally scan an article on fruit flies or bacteria, for example, to learn about genes associated with pediatric brain tumors. But by ignoring these studies we may be missing out on any number of insights about potentially therapeutic genes.”

     “The Univa UD solution has literally changed our definition of what is possible. Where before we were able to scan only a few thousand articles a day, now we can examine 100,000 articles in the same amount of time. We can also pull out more concepts and can rerun the data in different ways, rather than just once, to improve the precision and quality of what we learn. It’s a dramatic increase in scope that analysts and researchers in other fields are going to want to achieve as well.”
     Dr. Eric Bremer, Director of Brain Tumor Research, Children’s Memorial Research Center

     Dr. Bremer’s team needed a way to leverage over 125,000 specific articles from 21 journals and turn this data into a useful tool for research. But the challenge of wading through literally millions of pages of text required a manageable and cost-effective solution.

     Traditional keyword search solutions tend to produce too few, or too many, hits and do not help scientists zero in on the few results that can help provide critical information about a particular gene association and its biological relevance. So, without tools to identify and extract the right published data, much of the information that can help Dr. Bremer and scientists like him remains hidden in the overwhelming volume of published research.

     lexiquest mine: extracting insights, building knowledge
     To solve the problem of scanning hundreds of thousands of journal articles for relevant information, Dr. Bremer and his CMRC team turned again to SPSS and their text mining application LexiQuest Mine.

     LexiQuest Mine works by employing a combination of dictionary-based linguistic analytics and statistical proximity matching to identify key concepts, including multi-word concepts. Based on a linguistic analysis of the context and semantic nature of the words, LexiQuest Mine is able to identify the type (organization, product, genes, etc.) as well as the degree of the relationship between the words and other concepts.

     LexiQuest Mine has two operating modes that can be used separately or in sequence:
     • Extraction of concepts, based on Part of Speech (PoS) tagging
     • Text Link Analysis, to identify semantic relationships between known concepts

     Ultimately, the results can be displayed in a color-coded graphical map so that the analysts can clearly identify relationships.

     Using LexiQuest Mine, Dr. Bremer was able to detect protein or gene interactions that were identified across a large number of scientific publications and relate them to known brain tumor diagnosis criteria catalogued in the Clementine database.

     With this tool, Dr. Bremer’s team would now have the ability to scan the database of available journals and create a set of data that could advance their research immeasurably.

     challenge: speeding the search process
     While LexiQuest Mine made it possible to automate the process of intelligent data searching, the turnaround time for job processing was still prohibitively long. “I quickly realized it wasn’t realistic, even with this application, to mine the amount of data I needed to review,” said Bremer. If the text mining process was to be used as a valuable tool for research, then a solution for faster processing would be required.

     grid mp for optimized processing
     To help surmount the obstacle of extended processing times, SPSS introduced Bremer to Univa UD, a leading provider of enterprise grid computing software. SPSS wanted to maximize the benefits of its LexiQuest Mine solution for CMRC and offered to enable LexiQuest Mine on a Univa UD Grid MP platform.

     “The Grid MP solution was attractive as an alternative to investing in high-end servers,” explains Dr. Bremer. “With this solution we were able to use in-house machines for
  4. processing rather than investing in new hardware or, worse, reducing the scope of our project.”

     Grid MP consists of a server and agents distributed to designated machines (servers, clusters, or desktops) that can be in the same room or on multiple continents. The software schedules processing jobs to machines that “advertise” their availability (meaning they have idle processor time and are equipped to perform the necessary computations), ensuring work is scheduled to the best-suited resource. The machines return results to the server, which recompiles the data and delivers it to the user in an easily searchable database format.

     Robust provisioning policies allow administrators to define usage policies and eliminate the need for the researcher to assign the jobs to specific machines.

     “From the user’s standpoint,” says Bremer, “nothing has changed except that we get results in a fraction of the time.” Bremer’s team is now able to run multiple jobs a day and retrieve more results, improving both the timeframe and the scope of their work.

     up and running in a snap
     The CMRC deployment was typical, taking only a few hours to install and configure. Within days LexiQuest Mine was up and running thanks to Grid MP’s flexible application framework. Dr. Bremer was able to start using the grid-enabled system within weeks of initially contacting Univa UD, and benchmarking was complete in under a month.

     value and benefits
     Thanks to the SPSS / Univa UD joint solution, Dr. Bremer has established a growing knowledge base as a critical tool for advancing research on pediatric brain tumors.

     Timeframe Improvements: Project runtimes fell by 94%. Projects that used to take 24-26 hours can now be run in just over an hour, so CMRC can now execute numerous searches during the workday, a vast improvement over the previous 24-hour wait time for results from a single search.

     Reduced Costs: By enabling their application on a Univa UD virtualized environment, CMRC was able to avoid an investment in expensive hardware and instead increase the performance of resources already purchased.

     Expanded Scope of Research: With the Univa UD text mining solution in place, Dr. Bremer and his team can harness the full body of published results available to them. They are now able to search hundreds of thousands of articles a day and to refine the way in which they analyze their results.

     broad relevance
     The need to quickly search, process and analyze vast volumes of text and other data is not unique to the healthcare field. Dr. Bremer stresses that this solution would be useful in a wide range of areas, “from lawyers mining patent applications to law enforcement agencies searching for clues in databases of evidence.”

     The rapid increase in available information over the past decade necessitates approaches like the joint Univa UD / SPSS solution to harness and benefit from that information as fast as it becomes available.

     future possibilities
     CMRC is already investigating grid-enabling other processes related to this program. For example, Dr. Bremer’s team has enabled the automated data transfer tool, GETITRIGHT by CTH Technologies, used for accessing journal articles and preparing them to be mined. This solution automatically connects and downloads full-text journals from the web and does the necessary processing to deliver output that is readily available for text mining.

     Dr. Bremer is also looking for ways to make these resources available to others. His knowledge base will be valuable to any doctors seeking to classify tumors more quickly and accurately, and the applied grid procedure would be relevant to researchers in a variety of fields. So others may benefit from his efforts, Dr. Bremer has published and presented his results in a variety of forums.

     lessons learned
     Consider new approaches to manual tasks: The prospect of manually reviewing thousands of journal articles for clues to tumor therapy was daunting. But by implementing automated tools for text mining, Dr. Bremer’s team made it possible to perform work that was previously thought to be out of scope.

     Promote new technologies across the business: The success of CMRC’s initial SPSS implementation led to the search for (and discovery of) additional cost and time-saving uses for grid. Companies should seek ways to apply technologies for benefit in multiple areas of business.

     Challenge the boundaries of what you thought was possible: Examine the limits of what was possible in the past and consider what would be required to move beyond those limits. Then consult with trusted and respected experts about how to meet these requirements.

     Copyright © 2008 Univa UD Inc. All rights reserved.
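
The grid workflow described in this case study (agents advertise idle capacity, the server assigns each job to the best-suited machine, and results are returned and recompiled) can be illustrated with a small Python sketch. It is not Grid MP's API, which the case study does not show: `Agent`, `schedule`, `run_on_grid`, the "text-mining" capability label, and the "most idle CPU per queued job" selection rule are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical grid agent; not a Grid MP class."""
    name: str
    idle_cpu: float                      # fraction of processor time idle, 0.0-1.0
    capabilities: set = field(default_factory=set)

    def advertises(self, required: str) -> bool:
        # An agent "advertises" availability when it has idle processor
        # time and is equipped for the requested kind of computation.
        return self.idle_cpu > 0 and required in self.capabilities


def schedule(jobs, agents, required="text-mining"):
    """Map each job to an available agent (assumed best-fit rule:
    most idle CPU per job already queued on that agent)."""
    available = [a for a in agents if a.advertises(required)]
    if not available:
        raise RuntimeError("no agent is advertising availability")
    queued = {a.name: 0 for a in available}
    plan = {}
    for job in jobs:
        best = max(available, key=lambda a: a.idle_cpu / (1 + queued[a.name]))
        plan[job] = best.name
        queued[best.name] += 1
    return plan


def run_on_grid(batches, agents, work):
    """Run work(docs) for every batch and recompile the results centrally."""
    plan = schedule(list(batches), agents)
    # In a real grid, work(docs) would execute remotely on plan[name];
    # here it runs locally so the sketch stays self-contained.
    return {name: {"agent": plan[name], "result": work(docs)}
            for name, docs in batches.items()}


if __name__ == "__main__":
    agents = [Agent("desktop-1", 0.9, {"text-mining"}),
              Agent("desktop-2", 0.5, {"text-mining"}),
              Agent("server-1", 0.8)]    # idle, but lacks the application
    batches = {"batch-a": ["article 1", "article 2"], "batch-b": ["article 3"]}
    print(run_on_grid(batches, agents, work=len))
```

The point of the sketch is the division of labor the case study emphasizes: the researcher submits batches and never names a machine, while the scheduling rule decides placement.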