Clinical Trials & Big Data-Final


  1. Manoj Vig, Twitter: #manojvig
  2. I am an employee of Shire Pharmaceuticals. The statements and opinions expressed in this session are my own and do not represent those of Shire. There are some references to technical design patterns being implemented within Shire, but the explanations of those implementations provided in this session are purely technical. This presentation outlines general technology direction and trend analysis. Shire has no obligation to pursue any approaches outlined in this document or to use any functionality documented or discussed in today's session.
  3. Volume (petabytes of data); Variety (structured, unstructured, images, sounds); Velocity (batch, sub-second response, stream, changes in data). Characteristics of a big data system: handles large volumes of data; designed for scalability and failover; supports multiple workloads; security, multi-tenancy and privacy; cost-effective.
  4. (1) Mobility; (2) Social; (3) Apache Hadoop: multiple workloads / distributed computing
  5. Participant Recruitment; Adherence & Engagement; User Interaction; Frequent Data Generation; Remote Data Exchange; Data Generation
  6. Participant Engagement; Patient & Site Identification; Social Listening; Distributed; Scale; Unstructured; Velocity; Security; Access; Big Data Processing Systems
  7. Twitter; Twitter API (multi-threaded data acquisition); Curation; Filter; Algorithms; Rank; Location; Profile; Distributed, Scalable, Fast & Economical; Key Decision Makers; Targeted Ads; Visualizations; Web/Mobile; Delivery Channels; Automated Process
  8. Security, Governance, Privacy and Audit; BI Reports & Dashboards; Data Analysts; Data Scientists; Apps (Web + Mobile); Devices; Data Feeds; Data Service: multiple data sources, multiple processing workloads and multiple delivery channels; Impala / Tez (Interactive); HDFS (Hadoop Distributed File System); MR (Batch); Spark (Stream, ETL, DS); Hive (DW); Robust Cloud Infrastructure (e.g. AWS EC2); Governance, Security & Audit; YARN (Cluster Resource Manager); HBase (NoSQL); Solr (Search); Spark (MLlib, Graph); Custom/Proprietary/Visualization Apps; CTMS; Common Data Ingestion; Clinical Metadata; Data Quality; Searchable Data Catalog; Streaming; CRO Data Feed; Genomic Data
  9. CTMS; Streaming; UK Clinical Trials Gateway; Other R&D Datasets; SAS Datasets; Genomic Datasets; Apache Solr Running on Hadoop Cluster; HDFS (Data Landing); Apache Solr Data Indexing; Information Extraction (Spark); Pattern Recognition (Spark); Machine Learning (Spark); Metadata Driven Ontology (HBase); Data Indexing; Solr APIs; Web UI; Mobile Apps; Desktop Widgets; Dashboards; Data Sources; Consumption; HBase APIs
  10. Technology is here to stay; data generation speed will accelerate; data access will get easier; device connectivity will increase; technological disruption is inevitable.
  11. Are Recommender Systems Now Mainstream? (mainstream/); The Impact of Real-time Computing Systems – Part 1 (systems-part-1/); The Impact of Real-time Computing Systems – Part 2 (systems-part-2/); ASCOT: a text mining-based web-service
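
The curation stage on the Twitter slide (acquire, filter, then rank by location and profile) might look roughly like the sketch below. It is a minimal illustration only, not the pipeline described in the deck: the Post record, the curate function, the keyword match and the follower-count ranking are all hypothetical assumptions, and no real Twitter API calls are made.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical, simplified record for one acquired post."""
    text: str
    location: str
    followers: int


def curate(posts, keywords, target_locations):
    """Filter posts by keyword, then rank them by location and profile reach."""
    lowered = [k.lower() for k in keywords]
    # Filter: keep only posts that mention at least one keyword.
    relevant = [p for p in posts if any(k in p.text.lower() for k in lowered)]

    def score(post):
        # Rank: posts from a target location sort first, then by follower count.
        return (post.location in target_locations, post.followers)

    return sorted(relevant, key=score, reverse=True)


# Made-up sample data, for illustration only.
posts = [
    Post("Looking for a clinical trial near me", "Boston", 120),
    Post("Great weather today", "Boston", 5000),
    Post("Joined a clinical trial last month", "London", 900),
]
ranked = curate(posts, ["clinical trial"], {"Boston"})
```

In a distributed setting the same filter-then-rank logic would typically run as a Spark or MapReduce job rather than over an in-memory list.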