Big Data: Helping Scholarly Publishers Cut Through the Hype


Presentation on the relevance of big data analytics to scholarly publishers, given at the annual AAP/PSP conference. It focuses on the "product side" of big data, examines how advances in new models for evaluating medical evidence will affect medical publishers, and offers recommendations on how to prepare for new developments in data-driven, evidence-based medicine.

  • Need I say more? For this audience in particular, the NLP orientation of IBM Watson and its experiments in the healthcare field with WellPoint and Mt. Sinai Hospital in NY speak volumes.

    Watson mines scholarly articles as its primary base of evidence and assigns probabilities to query results.
  • Note my use of the word “likely”. Predictions are based on probabilities.

    Big Data systems are “learning systems”. See Hilary Mason’s (chief scientist at bitly) talk on machine learning:
    http://www.hilarymason.com/presentations-2/devs-love-bacon-everything-you-need-to-know-about-machine-learning-in-30-minutes-or-less/

    Let’s move on to some less dramatic material.
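The "learning system" idea noted above can be illustrated with a toy example: estimate the probability that a treatment works, and update that estimate as each new trial result arrives. This is a minimal sketch of probabilistic updating, not how Watson actually works; the function name is illustrative.

```python
def updated_probability(successes, trials, prior_alpha=1, prior_beta=1):
    """Laplace/Beta-smoothed estimate of a success probability.

    With no data the estimate starts at 0.5 and shifts toward the
    observed success rate as evidence accumulates -- the essence of
    a system that "learns" from new data.
    """
    return (successes + prior_alpha) / (trials + prior_alpha + prior_beta)

# No evidence yet: the model hedges at 50%.
p0 = updated_probability(0, 0)        # 0.5
# After 8 successes in 10 trials, the estimate moves toward 0.8.
p1 = updated_probability(8, 10)       # (8 + 1) / (10 + 2) = 0.75
```

This is why the word "likely" matters: the system never outputs certainty, only a probability that keeps moving as data arrives.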
  • 1) Growth in information, and availability of new sources of information from devices/sensors. Everything's being digitized; everything digital can be tracked; mobile devices and sensors are connecting the physical and digital worlds (and creating huge amounts of new data).

    Terabytes, petabytes, exabytes, zettabytes, yottabytes. You’ve heard the terms and get the gist.


    Couple of questions to the audience:
    How many of you consider yourselves data publishers?
    How many have any idea what Hadoop and MapReduce are?
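For anyone who couldn't answer that last question: Hadoop is an open-source framework for distributing computation across clusters of commodity machines, and MapReduce is its programming model — map a function over records to emit key/value pairs, shuffle the pairs by key, then reduce each group. The idea can be sketched in a few lines of plain Python (no Hadoop involved; the function names are illustrative):

```python
from collections import defaultdict

def map_phase(doc):
    """Map: emit a (word, 1) pair for every word in a document."""
    for word in doc.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle: group values by key, as Hadoop does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's values (here, by summing counts)."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big deal", "big data tools"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
# counts == {'big': 3, 'data': 2, 'deal': 1, 'tools': 1}
```

The point of the real framework is that the map and reduce steps run in parallel on thousands of machines, which is what makes mining web-scale corpora tractable.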
  • Picture of the Sequoia supercomputer at Lawrence Livermore National Laboratory (DOE).
    “We have gone through about 32 doublings in computer power” since the start of the computer age, Brynjolfsson points out. That is the first half of the chessboard. The second half—the next decade of exponential growth—“is going to be far more impactful. . . . The ability to take advantage of this explosion in computer power and storage capacity is going to be key to business success.”

    2) It’s not really about big datasets; it’s about the tools that have been developed to manage and mine very large datasets and to integrate different types of data, extracting information that wouldn’t have been possible when data were kept in silos.

    3) Think “Advanced Analytics for Complex Problem Solving” as a better term for “Big Data”. The really big computational tools may be the latest and greatest, but it’s the union of all of the data management and analytics tools we have that makes Big Data a Big Deal.
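Brynjolfsson's chessboard metaphor can be made concrete with quick arithmetic. After n doublings, capacity has grown by a factor of 2**n, so the second half of the chessboard multiplies the already-enormous first half by the same enormous factor again (the numbers below are illustrative of the metaphor, not measured computing capacity):

```python
# "First half of the chessboard": 32 doublings so far.
doublings_so_far = 32
growth_so_far = 2 ** doublings_so_far        # 4,294,967,296x growth

# The next 32 doublings multiply THAT by another factor of 2**32.
growth_after_64 = 2 ** 64
growth_next_32 = growth_after_64 // growth_so_far
# growth_next_32 == growth_so_far: each half of the board
# contributes the same multiplicative factor, which is why the
# second half dwarfs everything that came before it in absolute terms.
```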
  • Before proceeding, let’s step back and define the term “big data”. The most common definition, the one built on the 3 (or 4) Vs, has its origins in a report written by Gartner analyst Doug Laney (then at META Group).

    Doug Laney’s original article describing the 3 “Vs” from 2001: http://blogs.gartner.com/doug-laney/deja-vvvue-others-claiming-gartners-volume-velocity-variety-construct-for-big-data/

    It’s important to emphasize that size, or “volume”, isn’t the only defining characteristic of big data. In my opinion, a combination of volume and variety is a necessary condition to earn the “big data” moniker. The ability to handle high-velocity data can be important, depending on the use case.

    Veracity, the term I’ve adopted from others to represent good data hygiene and data management practices, is of fundamental importance. Without quality data—and a good understanding of the limitations of your data—you won’t get good predictive capabilities.
  • Tim Berners-Lee talks frequently about the importance of raw data and linked data. Data in silos isn’t going to produce the progress we need.

    Stephen Wolfram, the creator of Mathematica and well known for founding Wolfram|Alpha, the computational search engine, is now focusing on biomedicine and plans to introduce a medical version of Wolfram|Alpha.
  • Eric Topol, one of the leading voices for rapid change in medical research to facilitate much faster “bench to bedside” dissemination of new discoveries. Also, “personalized medicine” requires a big data model.
  • And, I haven’t even included payment data, which is used heavily now, because other sources, especially outcomes data, are not widely available. Nor did I include social media.
  • If you’re not going to become an analytics company, you are going to have to learn how to license your content (data) to analytics companies. What terms will be acceptable?

    How many in the room are working with IBM Watson on a pilot basis? How many have a clear plan for commercial licensing terms for IBM Watson?
  • Wrapping up, this diagram illustrates the data and analysis that underlie the “evidence base” in medicine—in a simplified manner.

    In the current era, the bottom layer continues to be the dominant means for disseminating new discoveries.

    In the future, more and more discoveries will enter the evidence base directly via machine learning. But, we have a ways to go before we have figured out the right data sources and algorithms so that we can safely apply Big Data to medicine.
  • Just some possibilities listed here. Let’s start at the bottom; I think you all understand the bottom layer’s role.

    In my view, there will always be a role for specialty publishers to disseminate information about recent scientific discoveries to interested audiences.
    However, I predict that the scientists involved in research will be sharing information in different forms in the future. For instance, new algorithms will be shared. More direct means of sharing new discoveries will be employed. New discoveries will go directly into the models and be reflected in the updated evidence base.

    Okay, enough prognosticating from me. I’ve covered a lot of material and hope that I’ve left you with some greater insight into “Big Data” and what it means for scholarly publishers.

    I’m happy to take a few questions now. You can find my contact information on the next page.
  • Transcript: Big Data: Helping Scholarly Publishers Cut Through the Hype

    1. BIG DATA: HELPING SCHOLARLY PUBLISHERS CUT THROUGH THE HYPE. Janice McCallum, Health Content Advisors. Association of American Publishers 2013 PSP Annual Conference, Washington, DC, February 6-8, 2013.
    2. Focus for this talk: Is Big Data overhyped? Let’s start with some definitions. Relevance to scholarly publishers. What should you be doing to profit from--and avoid being disrupted by--Big Data upstarts?
    3. IS BIG DATA OVERHYPED?
    4. Need I say more?
    5. Other Headline Grabbers from Retail and Financial Services. Analyzing customer behavior to predict purchasing or payment patterns: • Target recognizing the likely pregnancy of a shopper • Credit card companies knowing when you’re likely to be late with payments • Gathering real-time behavior data (EyeSee mannequins)
    6. There is More to Big Data than Watson…Is It All Hype? Comprehensive List of Big Data Statistics. “Comprehensive” should be in quotes; sources and magnitude are constantly changing.
    7. What’s the Big Deal with Big Data? Growth in computing power/reduction in cost: Moore’s Law. • Why Big Data is So Big. New sources of data: • Mobile devices and sensor data hugely expand the amount of data generated. • Social media: Facebook and Twitter. • But keep in mind: What Executives Don’t Understand About Big Data. • Schrage: the question shouldn’t be “how do we get more data?”; it should be what “marriage of data and algorithms” achieves our desired outcome. The value of Big Data lies in unlocking patterns and insights that would not have been possible without the combination of computing power, tools, and data. Alternative definition for Big Data: Advanced Analytics for Complex Problem Solving.
    8. DEFINING BIG DATA
    9. The 4 Vs of Big Data: “Controlling data volume, velocity, and variety…to improve internal and external collaboration.” (Doug Laney, 2001) • Volume: large datasets, too big for standard enterprise database applications. • Variety: combining structured and “unstructured” data sources. Think “union” of different sources, not big vs. small or structured vs. unstructured. • Velocity: big data systems can integrate and process near real-time data from mobile devices and sensors. • Veracity: data management best practices still matter. It’s not about trading off size vs. quality; it’s about combining the best of both worlds. Big Data is an umbrella term that can encompass infinite use cases. The ability to incorporate large, diverse data sources into an analytic model is paramount.
    10. RELEVANCE OF BIG DATA TO SCHOLARLY PUBLISHERS. With Examples from Medical Publishing.
    11. Expectations of Researchers Have Changed. Scientific and medical researchers need: easier, faster access to data sets; the ability to trace data provenance; central repositories or better discovery options for data sets; business models for accessing, sharing, and adding value to the base of knowledge. “Raw Data, Now!” (Tim Berners-Lee, 2009) “[Biomedicine is] going to have to become more dynamic, more computational.” (Stephen Wolfram, 2006)
    12. Expectations of Clinicians Have Changed. “…the problem is no longer getting access to data, whether it's a genome sequence or whether it's a glucose sensor, but how do you process that data in an efficient way…” (Eric Topol, 2012)
    13. Partial Set of Data Sources in Medical Research: clinical research; patient registries/outcomes data; Rx data; sensor data/exercise tracking; OTC & food purchases; disease registries; genomic data. Almost all medical research currently occurs on the data types displayed above the fold.
    14. Big Data Uses in Healthcare. Are you prepared to play in this fast-growing, fast-changing segment?
    15. Getting Started in Big Data: First, recognize that you are all data publishers; if the content is digital, it’s data. Create standard formats for data sets that are submitted with articles. Plan for collaborating with Big Data analytics companies. Develop expertise in new, more complex models of medical evidence.
    16. Accelerated Pace of Data Flows. Layers, top to bottom: evidence base; software, models; analysis, insights; data sets, registries, directories; curated news, textual content, summaries. Big Data isn’t about structured vs. unstructured data. It’s about building upon the existing base of knowledge with the ability to constantly update the evidence base with new data that either reinforce or replace currently accepted knowledge. A strong foundation remains essential and requires multi-directional data flows.
    17. What Role Will Your Organization Play in the Big Data Era? Some possibilities, by layer: Evidence base ← disseminate the latest evidence-based guidelines. Software, models ← provide a software platform that incorporates the latest algorithms and integrates data. Analysis, insights ← employ analysts, data scientists, and researchers to conduct studies and report results. Data sets, registries, directories ← provide master data management services; become a clearinghouse for data exchange. Curated news, textual content, summaries ← publish curated scientific research results.
    18. THANK YOU! Janice McCallum, Managing Director, Health Content Advisors. Janice@HealthContentAdvisors.com @janicemccallum
