The Janus program aims to improve FDA's management of structured scientific data by implementing data standards, improving access through interoperable data warehouses, and supporting analytic tools. Janus will initially focus on clinical, nonclinical, and pharmacogenomic data, as well as product and post-market data. It uses a flexible architecture based on the HL7 Reference Information Model to promote interoperability. Near-term plans include moving the clinical and nonclinical data-warehouse pilots to production and testing CDISC-HL7 message exchange.
1. Janus Update
Armando Oliva, M.D.
Deputy Director for Bioinformatics
Office of Critical Path Programs
armando.oliva@fda.hhs.gov
March 10, 2009
2. The views expressed in this presentation are
those of the speaker and not necessarily
those of the Food and Drug Administration
…
3. What is Janus?
• FDA’s Enterprise Program to improve
management of structured scientific data
– Clinical study data
– Nonclinical study data
– Pharmacogenomic (and other –omic) data
– Product quality and manufacturing data
– Post-market surveillance data
4. What will Janus do?
• Implementing data standards - Move
towards a single information model: the
HL7 Reference Information Model (RIM)
• Improving access: multiple interoperable
data warehouses
• Supporting and improving analytic tools
5. Janus Activity - Governance
• Sept. 2008:
– Janus approved by Bioinformatics Board as Agency-
wide initiative
– Previously Center-specific activities
• Uncoordinated
• Under-resourced
• January 2009:
– FDA Management Council approved Janus funding
from FY08 supplemental appropriations
7. Janus Architecture
[Architecture diagram, reconstructed as a layered list:]
• Data Exchange Layer: HL7 messages (XML)
• Database Layer (data warehouse): RIM databases for clinical data (NCI), nonclinical data (NCTR), and harmonized product information (SPL), plus EDR, FAERS, and inventory databases
• Data Mart / Special Purpose Layer: analysis views over CDISC SDTM, CDISC SEND, and CDISC ADaM data; FAERS and MedWatch+ data marts
• Analysis Software Layer: SAS, JReview, ArrayTrack, WebVDME, FAERS COTS package, and additional analysis software
• Analysis Results Layer
8. Janus Architecture
• Flexible, Modular, and Extensible
• Leverages the HL7 RIM for healthcare information to promote
interoperability both within and outside FDA
• Open architecture and standards allow multiple plug-and-play
solutions
• Initial focus:
– Clinical study data
– Nonclinical study data
– Pharmacogenomic data
– Product Labeling, Registration, Listing data
– Post-marketing AE data
• Additional interoperable databases, data marts, and tools can be
added to accommodate additional scientific data streams
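The slide above describes interoperable warehouses that ingest CDISC SDTM study data and expose analysis views over it. As a rough illustration of that idea only, the sketch below builds a tiny SDTM-style Demographics (DM) table and filters it by treatment arm; the study identifiers, subjects, and the `analysis_view` helper are all hypothetical, not FDA data or software.

```python
# Minimal sketch: an SDTM-style Demographics (DM) domain held as
# dictionaries, plus a toy "analysis view" over it. All records,
# identifiers, and function names here are illustrative only.
import csv
import io

DM_CSV = """STUDYID,USUBJID,ARM,SEX,AGE
STUDY01,STUDY01-001,PLACEBO,F,54
STUDY01,STUDY01-002,DRUG A,M,61
STUDY01,STUDY01-003,DRUG A,F,47
"""

def analysis_view(dm_rows, arm):
    """Return the demographics records for one treatment arm."""
    return [row for row in dm_rows if row["ARM"] == arm]

rows = list(csv.DictReader(io.StringIO(DM_CSV)))
drug_a = analysis_view(rows, "DRUG A")
print(len(drug_a))  # → 2
```

In a real warehouse the "view" would of course be a database construct over validated SDTM domains rather than an in-memory filter, but the shape of the data (one standardized row per subject, keyed by STUDYID/USUBJID) carries over.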
14. Janus Activities for 2009
• Long-term planning
– Assess/incorporate all Centers’ needs into long-term Agency
roadmap / strategic plan for Janus
• Support ongoing pilots
– Janus Phase 3 pilot for clinical data
• Move Janus 1.x at NCI to production for SDTM submissions
– Phase 2 SEND pilot for nonclinical data
• Test SEND and NCTR Janus 1.x for animal toxicology data
• Test CDISC-HL7 messages
– RIM database for study data
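Testing CDISC-HL7 messages, as planned above, means receiving study data as HL7 v3 XML and checking it structurally before it reaches the RIM database. The fragment below is a heavily simplified sketch of that kind of check using Python's standard library; the element names and the sample payload are invented for illustration, while real CDISC-HL7 study-data messages are much richer and validated against official schemas.

```python
# Minimal sketch of a structural check on an HL7-v3-style XML message.
# Only the namespace URI is real HL7 v3 convention; the message shape
# and element names below are hypothetical, for illustration only.
import xml.etree.ElementTree as ET

HL7_NS = "urn:hl7-org:v3"

SAMPLE = (
    f'<message xmlns="{HL7_NS}">'
    '<subject><id extension="STUDY01-001"/></subject>'
    "</message>"
)

def has_subject_id(xml_text):
    """Return True if the message carries at least one subject id."""
    root = ET.fromstring(xml_text)
    # find() takes a namespace-qualified path: {uri}tag/{uri}tag
    return root.find(f"{{{HL7_NS}}}subject/{{{HL7_NS}}}id") is not None

print(has_subject_id(SAMPLE))  # → True
```

Schema validation against the published message specifications would replace ad-hoc checks like this in any production exchange test.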