This R package provides tools for integrative analysis of DNA copy number and gene expression data. It contains two main functions: DR-Correlate, which identifies genes with significant correlations between DNA copy number and gene expression levels, and DR-SAM, which finds genes with alterations in both DNA and expression levels between two sample classes. The package computes correlations or test statistics for each gene and compares them to null distributions from permuted data to identify significant genes while controlling the false discovery rate.
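The permutation scheme described above can be sketched as follows. This is an illustrative NumPy version of the general idea (gene-wise Pearson correlation compared to a permuted null), not the package's actual implementation; the function names are invented for the example.

```python
import numpy as np

def permutation_correlations(cn, expr, n_perm=100, seed=0):
    """For each gene (row), correlate copy number with expression,
    then build a null distribution by permuting the sample order."""
    rng = np.random.default_rng(seed)

    def rowwise_corr(a, b):
        # Pearson correlation of matching rows of a and b
        a = a - a.mean(axis=1, keepdims=True)
        b = b - b.mean(axis=1, keepdims=True)
        return (a * b).sum(axis=1) / np.sqrt((a**2).sum(axis=1) * (b**2).sum(axis=1))

    observed = rowwise_corr(cn, expr)
    null = np.empty((n_perm, cn.shape[0]))
    for i in range(n_perm):
        perm = rng.permutation(cn.shape[1])     # break the CN/expression pairing
        null[i] = rowwise_corr(cn, expr[:, perm])
    # Empirical p-value per gene: fraction of permuted |r| at least as large
    pvals = (np.abs(null) >= np.abs(observed)).mean(axis=0)
    return observed, pvals
```

The FDR-controlling step would then threshold these empirical p-values (e.g. via Benjamini-Hochberg) rather than using them raw.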
This talk covers four points that I consider relevant when starting a business.
It includes a series of recommendations that may interest anyone thinking today about taking their first steps into the adventure of entrepreneurship.
This document describes different methods for automating file downloads from the Internet, including download managers, P2P networks such as BitTorrent, and IRC. It explains how these methods work, gives examples of popular software for each, and covers some controversies around sharing copyrighted files.
This document discusses how rural communities in Kenya can develop sustainable water supplies with assistance from the Water Services Trust Fund. It describes six water projects built by the Danish International Development Agency (DANIDA) in partnership with local communities between 2003 and 2006. The projects trained community members to construct and maintain infrastructure like intakes, dams, and kiosks to provide access to water. Communities contributed labor and materials to reduce costs. Ongoing support included training managers in financial planning, record keeping, and operating projects as businesses to ensure long-term sustainability.
The document summarizes a student's graduation project focusing on how animals are used as resources for human benefit and the animal cruelty often involved. The project explored how animals are used for experiments, food, clothing, and entertainment, and addressed the question of what types of animal cruelty are involved. The student cites challenges with time management and with revising their research paper, but learned about the various ways animals are mistreated as resources and about the benefits of helping them.
This document contains contact information for the Boston SharePoint User Group (BASPUG) including their website, email, and Twitter account. It also lists related SharePoint user groups in other locations and websites for SharePoint events and training opportunities. The majority of the document simply repeats the BASPUG contact details.
Hi, we are three entrepreneurs. We are designing a solution we have named NetUP. It works as follows:
Imagine:
"You're in a café, working on your startup project. You're modeling your business on the Canvas. When you and your partners finish, you look at the result; it looks good… but something is not so simple: your networks are missing your first customers, strategic suppliers, contacts with potential allies and, needless to say, potential investors."
What happens if we tell them the following?
"There is an application that you can activate and check in with when you arrive somewhere. When you do, the app geolocates you and shows who else is at the venue and nearby.
It's true, that already exists. And Foursquare does it spectacularly well…
But what if we told you that NetUP also shows you those contacts segmented by profile, telling you what these people do, so you can spot potential customers, allies, suppliers and, why not, investors?"
The application is freemium: part of the service is free, and additional services are paid.
This document describes the key factors and challenges of kiosks in Chile. It explains that kiosks face competition from newsstand vendors (suplementeros), pharmacies, supermarkets, and others. It also details some characteristics, such as each kiosk having its own electricity meter and selling newspapers, magazines, and sweets. Finally, the document puts forward a new proposal for kiosks involving the kiosk, the customer, the delivery person, and a cloud platform.
This document discusses how to form the comparative and superlative forms of adjectives in English. It explains that one-syllable adjectives typically take -er and -est, while two-syllable adjectives and longer take 'more' and 'most'. It also lists irregular adjectives and provides examples of using comparatives and superlatives in sentences. The document concludes by asking the reader to provide the comparative and superlative forms of sample adjectives.
The document defines the concepts of consumer and consumption, and lists consumers' rights and obligations. The rights include quality of goods and services, protection of health and safety, information and education for consumption, and compensation for damages. The obligations are not harming the environment, avoiding risks, and informing oneself responsibly. Finally, it explains that being a responsible consumer means considering one's own health, that of others, and the environment when consuming.
Edward has a strange day in his linguistics class when he falls asleep and wakes up in the empty classroom under a red sky. He tries to escape the school, but the doors are locked. When he turns on his hologram generator, he discovers that he is actually in the year 6666 BC and that four knights are out to kill everyone who does not believe in the number 666. One of the knights approaches Edward and stabs him with a dagger.
This package contains functions to fit three topic models - latent Dirichlet allocation (LDA), mixed-membership stochastic blockmodel (MMSB), and supervised LDA (sLDA) - using a collapsed Gibbs sampling algorithm. The functions take in text corpora and network data in sparse formats, perform inference via Gibbs sampling, and return point estimates of the fitted topic models. The package also includes utility functions for data preprocessing, model evaluation, and topic exploration.
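The collapsed Gibbs update at the heart of such samplers can be sketched compactly. This is a minimal, illustrative Python version of standard collapsed Gibbs sampling for LDA, not the package's code; `lda_gibbs` and its argument names are invented for the sketch.

```python
import numpy as np

def lda_gibbs(docs, n_topics, n_vocab, n_iter=50, alpha=0.1, beta=0.01, seed=0):
    """Minimal collapsed Gibbs sampler for LDA.
    docs: list of documents, each a list of integer word ids.
    Returns the topic-word count matrix after the final sweep."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # doc-topic counts
    n_kw = np.zeros((n_topics, n_vocab))     # topic-word counts
    n_k = np.zeros(n_topics)                 # tokens per topic
    z = [[0] * len(d) for d in docs]
    for d, doc in enumerate(docs):           # random initialization
        for i, w in enumerate(doc):
            k = int(rng.integers(n_topics))
            z[d][i] = k; n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                  # remove the current assignment
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # Collapsed conditional: p(z=k) ∝ (n_dk+α)(n_kw+β)/(n_k+Vβ)
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + n_vocab * beta)
                k = int(rng.choice(n_topics, p=p / p.sum()))
                z[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return n_kw
```

A production implementation like the one described would work from sparse corpus formats and return richer point estimates, but the update rule is the same.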
This document provides guidance on preparing genomic data for quality control and genome-wide association studies (GWAS). It discusses:
- Data formats used by the GWASTools package for storing SNP and sample annotation data and genotype/intensity data.
- Creating SNP and sample annotation data objects with the minimum required variables like SNP ID, chromosome, position, and sample ID, sex, etc.
- Performing quality control checks on the genotype data like calculating missing call rates, relatedness between samples, population structure, and preliminary association tests. It provides code examples to demonstrate these data cleaning steps in R.
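As a rough illustration of the first QC step listed above, a missing call rate can be computed directly from a genotype matrix. This NumPy sketch is an analogue of what the R package automates, not GWASTools' own API; the 0/1/2 allele-count coding with -1 for a missing call is an assumption for the example.

```python
import numpy as np

# Genotypes coded 0/1/2 (allele counts); -1 marks a missing call.
geno = np.array([
    [0, 1, 2, -1],
    [1, -1, -1, 2],
    [0, 0, 1, 1],
])  # rows = SNPs, columns = samples

snp_missing_rate = (geno == -1).mean(axis=1)     # per-SNP missing call rate
sample_missing_rate = (geno == -1).mean(axis=0)  # per-sample missing call rate

# Typical QC: drop SNPs (or samples) whose rate exceeds a chosen threshold
keep_snps = snp_missing_rate < 0.5
```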
This document provides an overview of importing and exporting data in R. It discusses importing spreadsheet-like data, data from other statistical systems, relational databases, binary files, and image files. It also covers exporting to text files, XML, and connections. A variety of packages are described that facilitate working with different data formats and databases.
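For comparison, the basic round trip the manual covers (write a table out, read it back) looks like this with Python's standard library; this is an illustrative analogue, not code from the R manual.

```python
import csv
import io

# Write a small table to CSV text, then read it back.
rows = [{"id": 1, "value": 3.5}, {"id": 2, "value": 4.1}]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "value"])
writer.writeheader()
writer.writerows(rows)

buf.seek(0)
back = list(csv.DictReader(buf))  # note: values come back as strings
```

The "values come back as strings" caveat is the same type-inference problem the R import functions (and their `colClasses`-style arguments) exist to solve.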
This package provides functions for Gaussian mixture models, k-means clustering, mini-batch k-means clustering, k-medoids clustering, and affinity propagation clustering. It takes advantage of RcppArmadillo to speed up computationally intensive parts of the algorithms. The package includes functions for plotting clusters, validating cluster assignments, predicting cluster labels for new data, and estimating the optimal number of clusters.
This TaqMan® Gene Expression Assays Protocol provides instructions for performing real-time reverse transcription-PCR (real-time RT-PCR) using TaqMan Gene Expression Assays and TaqMan Non-coding RNA Assays.
For more information visit:
http://www.invitrogen.com/site/us/en/home/Products-and-Services/Applications/PCR/real-time-pcr/real-time-pcr-assays/taqman-gene-expression/single-tube-taqman-gene-expression-analysis.html?ICID=search-product?CID=TaqManGeneProducts-SS-12312
The document benchmarks 20 machine learning models on two datasets to compare their accuracy and speed. On the smaller Car Evaluation dataset, bagged decision trees, random forests, and boosted decision trees achieved over 99% accuracy, while neural networks, decision stumps, and support vector machines exceeded 95% accuracy. On the larger Nursery dataset, similar models exceeded 99% accuracy, while other models like decision rules and k-nearest neighbors exceeded 95% accuracy. However, models varied significantly in speed depending on the hardware, with decision trees, mixture discriminant analysis, and gradient boosting the fastest on Car Evaluation, and mixture discriminant analysis, one rule, and boosted decision trees the fastest on Nursery. The findings underscore the importance of regularly benchmarking models for both accuracy and speed.
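A benchmarking harness of the kind the document implies is straightforward to sketch: fit each model, time it, and score its accuracy. The toy models and data below are invented for illustration and are far simpler than the 20 models the document compares.

```python
import time

def majority(train, labels):
    # Baseline: always predict the most common training label.
    pred = max(set(labels), key=labels.count)
    return lambda x: pred

def one_nn(train, labels):
    # 1-nearest-neighbor by squared Euclidean distance.
    def predict(x):
        i = min(range(len(train)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(train[j], x)))
        return labels[i]
    return predict

def benchmark(models, train, labels, test, truth):
    """Fit, predict, and time each model; return {name: (accuracy, seconds)}."""
    results = {}
    for name, fit in models.items():
        t0 = time.perf_counter()
        predict = fit(train, labels)
        preds = [predict(x) for x in test]
        elapsed = time.perf_counter() - t0
        acc = sum(p == t for p, t in zip(preds, truth)) / len(truth)
        results[name] = (acc, elapsed)
    return results
```

As the document's hardware caveat suggests, the timing column is only meaningful relative to other models run on the same machine.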
The document describes a project report submitted by R Ashwin for the award of a Bachelor of Technology degree. It discusses the distributed implementation of the graph database system DGraph. The key steps in the distributed version include sharding the data, assigning unique IDs, loading the data into different servers, and enabling communication between servers through network calls. Performance evaluation on the Freebase film dataset showed that the distributed version had higher throughput and lower latency than the centralized version, especially as the load and computational power increased.
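One common way to shard graph data, hashing an entity id to pick a server, can be sketched as below. This is a generic illustration of the sharding-plus-unique-IDs idea, not necessarily the partitioning scheme the report's DGraph implementation uses.

```python
import hashlib

def shard_for(uid, n_shards):
    """Assign an entity id to a shard by hashing: stable and roughly uniform."""
    h = hashlib.md5(str(uid).encode()).hexdigest()
    return int(h, 16) % n_shards

# Each server loads only the triples whose subject id hashes to its shard;
# edges that cross shards are resolved with a network call at query time.
triples = [(1, "director", 2), (2, "name", 3), (4, "genre", 5)]
shards = {s: [t for t in triples if shard_for(t[0], 3) == s] for s in range(3)}
```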
The document is a report on unstructured data and the enterprise from Second Quarter 2012. It contains:
1) Various text formats like documents, emails, and web pages make up the largest portion of unstructured data in enterprises. Many firms are implementing projects to extract useful information from large amounts of email data.
2) Content management systems help enterprises manage and obtain information from unstructured text documents using metadata to enhance search and reporting.
3) The report discusses how capturing and managing unstructured data through techniques like metadata, taxonomy creation, and text analytics can provide competitive advantages for firms.
This document describes a study on classifying HTML tables as genuine (containing relational data) or non-genuine (used for layout purposes) using machine learning algorithms. It discusses extracting features related to table layout, content, and words to create feature vectors for training classifiers like SVM, decision trees, random forests, AdaBoost and neural networks. Python code is provided for feature creation and applying the different machine learning models to perform table classification. The models' parameters are optimized and their performance on the dataset is reviewed to determine the best approach.
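Feature extraction of the kind described (layout and content statistics per table) might look like the following toy sketch; the specific features and their names are invented here, not taken from the study.

```python
def table_features(rows):
    """Toy layout/content features for a table given as a list of row lists."""
    n_rows = len(rows)
    n_cols = max(len(r) for r in rows)
    cells = [c for r in rows for c in r]
    # crude numeric test: digits with at most one decimal point
    numeric = sum(str(c).replace(".", "", 1).isdigit() for c in cells)
    return {
        "rows": n_rows,
        "cols": n_cols,
        "cell_count": len(cells),
        "numeric_ratio": numeric / len(cells),
        "avg_cell_len": sum(len(str(c)) for c in cells) / len(cells),
    }
```

Vectors like these, one per table, are what would then be fed to the SVM, tree-ensemble, and neural-network classifiers the study compares.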
This thesis investigates methods for integrative analysis of multiple data types. It extends the Joint and Individual Variation Explained (JIVE) method by incorporating a fused lasso penalty. A novel rank selection algorithm is also proposed. The methods are evaluated on simulated data and applied to analyze The Cancer Genome Atlas glioblastoma data to identify shared mutational processes between chromosomes.
This document provides an overview and specifications for several chemical structure file formats used by MDL, including Connection Table files (CTAB), Molfiles, Reaction files (Rxnfiles), and more. It describes the basic components and blocks within each file type, such as the header, atom and bond blocks in CTAB files, and molecule and reaction blocks in Rxnfiles. The document also covers enhancements to CTAB files like handling large molecules.
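The fixed-width counts line of a V2000 molfile (atom count in columns 1-3, bond count in columns 4-6) can be parsed with simple slicing. A minimal sketch, assuming the standard V2000 layout:

```python
def parse_counts_line(line):
    """V2000 counts line: the first two 3-character fields are the
    atom count and bond count."""
    return int(line[0:3]), int(line[3:6])

# Fourth line of a molfile (after the three header lines)
counts_line = "  2  1  0  0  0  0  0  0  0  0999 V2000"
atoms, bonds = parse_counts_line(counts_line)
```

A full reader would go on to consume `atoms` lines of the atom block and `bonds` lines of the bond block, each also fixed-width per the specification.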
This document discusses applying machine learning techniques including text retrieval, association rule mining, and decision tree learning using R. It introduces the movie review dataset and preprocessing steps like removing stopwords and stemming. Text retrieval is performed to create a document-term matrix from the reviews. Association rules are generated from a sample of negative reviews using the Apriori algorithm. Decision trees are built on the combined document-term matrix and sentiment labels to classify review sentiment.
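The document-term matrix step can be sketched with the standard library alone. This is an illustrative toy (tiny stopword list, whitespace tokenization, no stemming), not the R workflow the document uses.

```python
from collections import Counter

STOPWORDS = {"the", "a", "is", "was", "and"}

def doc_term_matrix(docs):
    """Tokenize, drop stopwords, and build one term-count row per document."""
    tokens = [[w for w in d.lower().split() if w not in STOPWORDS] for d in docs]
    vocab = sorted({w for t in tokens for w in t})
    rows = [[Counter(t)[w] for w in vocab] for t in tokens]
    return vocab, rows
```

The resulting rows, joined with sentiment labels, are exactly the kind of input the decision-tree step trains on.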
This document provides an overview of data modeling in MongoDB. It discusses two approaches to representing relationships between documents: embedding related data within documents versus linking documents with references. The key challenges in data modeling are balancing the needs of the application, database performance, and data usage patterns. The document recommends considering how data will be queried, updated, and processed when designing a data model. It also notes that write operations in MongoDB are atomic at the document level.
This document provides guidance on data modeling for MongoDB databases. It discusses two approaches to representing relationships between documents: embedding related data within documents, and linking documents through references. The document recommends considering both the application's usage patterns and data structure when designing a data model. It provides an overview of key data modeling concepts and factors to balance, and directs readers to additional sections that include data modeling examples and reference documentation.
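The two modeling approaches both MongoDB summaries describe can be contrasted with toy document shapes. These are illustrative Python dicts standing in for BSON documents; the field names are invented for the example.

```python
# Embedded: the post carries its comments; one read fetches everything,
# and a single-document update stays atomic.
post_embedded = {
    "_id": 1,
    "title": "Data modeling",
    "comments": [
        {"author": "ann", "text": "Nice overview"},
        {"author": "bo", "text": "Very helpful"},
    ],
}

# Referenced: comments live in their own collection and point back at the
# post; better when comments grow without bound or are queried on their own.
post_ref = {"_id": 1, "title": "Data modeling"}
comments = [
    {"_id": 10, "post_id": 1, "author": "ann", "text": "Nice overview"},
    {"_id": 11, "post_id": 1, "author": "bo", "text": "Very helpful"},
]
# The application-side "join" a reference model requires:
joined = [c for c in comments if c["post_id"] == post_ref["_id"]]
```

The trade-off mirrors the guidance above: embedding favors read patterns and document-level atomicity, referencing favors unbounded or independently accessed sub-data.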
This document provides an overview of the Perl Reference Manual for CDF (Common Data Format). It discusses compiling Perl scripts that interface with CDF, and the programming interface, including item referencing, passing arguments, status constants, CDF formats, data types, encodings, and more. The standard and internal interfaces for CDF functions in Perl are also described.
This package provides functions for identifying outliers in data sets. It contains functions for several statistical tests commonly used for outlier detection, including Dixon's test, Grubbs' test, and Cochran's test. The package is maintained by Lukasz Komsta and provides documentation for each function, describing their usage and theoretical background. It allows calculating critical values and p-values to identify potential outliers based on these statistical tests.
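Grubbs' test, one of the tests mentioned, is based on the statistic G = max|x_i - mean| / s. A minimal sketch of the statistic only; the critical value for a given n and significance level would come from tables or a t-distribution quantile, which the package computes for you in R:

```python
import math

def grubbs_statistic(x):
    """G = max |x_i - mean| / sd; a large G flags a single suspect outlier."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    i = max(range(n), key=lambda j: abs(x[j] - mean))
    return abs(x[i] - mean) / sd, x[i]

data = [9.8, 10.1, 10.0, 9.9, 10.2, 15.0]
g, suspect = grubbs_statistic(data)
# Compare g to the Grubbs critical value for n=6 at the chosen alpha;
# here the spike at 15.0 clearly dominates the deviations.
```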
This document provides an implementation guide for a Zone Routing Protocol (ZRP) agent in the NS-2 network simulator. It includes instructions on patching ZRP into NS-2, running simulations with ZRP, and using the documentation. The documentation is organized into sections on modules, classes, and references to help understand the ZRP code design and class definitions.
The document describes several C++ header and source code files that implement different types of integer lists: IntegerListArray (array-based), IntegerListVector (vector-based), IntegerListLinked (linked list-based), and IntegerListSorted (sorted list-based). Each file pair (header and source) defines a class to represent the list type, with member functions for common list operations like getting elements, size, adding/removing elements, and iterating. The files were generated by Doxygen from comments in the code.
Citing Data in Journal Articles using JATS, by Deborah A. Lapeyre (datascienceiqss)
This document discusses how to cite data sources in journal articles using the Journal Article Tag Suite (JATS). It recommends using the <mixed-citation> element to cite data in references and outlines elements like <data-title>, <version>, and <pub-id> that can be used to represent different components of data citations. It also proposes new attributes and values for existing JATS elements to better support citing datasets, databases, and other research materials.
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...), by Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
HCL Notes and Domino License Cost Reduction in the World of DLAU, by panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licensing under the CCB and CCX models have been a hot topic in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefits it brings you. Above all, you surely want to stay within budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can lead to unnecessary expense, for example using a person document instead of a mail-in database for shared mailboxes. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and the know-how to keep an overview. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
These topics are covered:
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes and functional/test users
- Real-world examples and best practices you can apply immediately
Building Production Ready Search Pipelines with Spark and Milvus, by Zilliz
Spark is a widely used ETL tool for processing, indexing, and ingesting data into a serving stack for search. Milvus is a production-ready open-source vector database. In this talk we show how to use Spark to process unstructured data, extract vector representations, and push the vectors to the Milvus vector database for search serving.
Taking AI to the Next Level in Manufacturing
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Trusted Execution Environment for Decentralized Process MiningLucaBarbaro3
Presentation of the paper "Trusted Execution Environment for Decentralized Process Mining" given during the CAiSE 2024 Conference in Cyprus on June 7, 2024.
DRI-package DR-Integrator: an analytic tool for integrating DNA copy number and
gene expression data
Description
DR-Integrator identifies genes with significant correlations between DNA copy number alterations
and gene expression data, and implements a supervised learning analysis that captures genes with
significant alterations in both DNA copy number and gene expression between two sample classes.
Details
Package: DRI
Type: Package
Version: 1.1
Date: 2009-11-16
License: GPL-2
This package contains two analytic tools: DR-Correlate and DR-SAM.
Author(s)
Keyan Salari, Robert Tibshirani, Jonathan R. Pollack
Maintainer: Keyan Salari <ksalari@stanford.edu>
References
Salari, K., Tibshirani, R., and Pollack, J.R. (2009) DR-Integrator: a new analytic tool for integrating
DNA copy number and gene expression data. http://pollacklab.stanford.edu/
See Also
drcorrelate, drcorrelate.null, drsam, drsam.null, dri.fdrCutoff, dri.sig_genes,
dri.heatmap, dri.merge.CNbyRNA, dri.smooth.cghdata, runFusedLasso
Examples
require(impute)
data(mySampleData)
attach(mySampleData)
# DNA data should contain no missing values - pre-smooth beforehand
# Impute missing values for gene expression data
RNA.data <- dri.impute(RNA.data)
# DR-Correlate analysis to find genes with correlated DNA/RNA measurements
obs <- drcorrelate(DNA.data, RNA.data, method="pearson")
# generate null distribution for FDR calculation (10 permutations)
null <- drcorrelate.null(DNA.data, RNA.data, method="pearson", perm=10)
# identify the correlation cutoff corresponding to your desired FDR
n.cutoff <- dri.fdrCutoff(obs, null, targetFDR=0.05, bt=TRUE)
cutoff <- n.cutoff[2]
# retrieve all genes that are significant at the determined cutoff, and
# calculate gene-specific FDRs
Results <- dri.sig_genes(cutoff, obs, null, GeneIDs, GeneNames, Chr, Nuc,
bt=TRUE, method="drcorrelate")
# Optional heatmap plot for significant DR-Correlation genes
sample.names <- colnames(DNA.data)
pdf(file="DRI-Heatmap.pdf", height=8, width=11)
dri.heatmap(Results, DNA.data, RNA.data, sample.names, GeneNames, Chr, Nuc,
statistic="pearson", color.scheme="RG")
dev.off()
# DR-SAM analysis to find genes with alterations in both DNA and RNA between
# different classes
labels <- c(rep(1,25), rep(2,25)) # 25 samples in class 1 and 25 in class 2
obs <- drsam(DNA.data, RNA.data, labels, transform.type="raw")
# generate null distribution for FDR calculation (10 permutations)
null <- drsam.null(DNA.data, RNA.data, labels, transform.type="raw", 10)
# identify the score cutoff corresponding to your desired FDR
n.cutoff <- dri.fdrCutoff(obs$test.summed, null, targetFDR=0.05, bt=TRUE)
cutoff <- n.cutoff[2]
# retrieve all genes that are significant at the determined cutoff, and
# calculate gene-specific FDRs
Results <- dri.sig_genes(cutoff, obs, null, GeneIDs, GeneNames, Chr, Nuc,
bt=TRUE, method="drsam")
drcorrelate Compute correlations between DNA copy number and gene expression
for each gene
Description
A correlation is computed between each gene’s DNA copy number and gene expression across all
the samples. Significant correlations are determined by comparison to a null distribution derived
from random permutations of the data.
Usage
drcorrelate(DNA, RNA, method = "pearson", tail_p = 10)
Arguments
DNA matrix of DNA copy number data
RNA matrix of gene expression data, samples (columns) in same order as DNA matrix
method correlation statistic, either "pearson", "spearman", or "ttest"
tail_p top/bottom percent of samples (with respect to the gene’s copy number) to use
for extremes t-test groups; used only when method = "ttest"
Details
DR-Correlate aims to identify genes with expression changes explained by underlying CNAs. This
tool performs an analysis to identify genes with statistically significant correlations between their
DNA copy number and gene expression levels. Three options for the statistic to measure correlation
are implemented: (1) Pearson’s correlation; (2) Spearman’s rank correlation; and (3) an "extremes"
t-test. For Pearson’s and Spearman’s correlations, the respective correlation coefficient is computed
for each gene. For the extremes t-test, a modified Student’s t-test (Tusher et al. 2001) is computed
for each gene, comparing gene expression levels of samples comprising the lowest and highest
deciles with respect to DNA copy number. That is, for each gene the samples are rank ordered by
DNA copy number and samples below the 10th percentile and above the 90th percentile form two
groups whose means of gene expression are compared with a modified t-test. The percentile cutoff
defining the two groups is user-adjustable.
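The grouping step above can be illustrated for a single simulated gene as follows. This is a sketch only, with made-up data and a plain t.test for clarity; the package itself computes a modified SAM-style t-statistic rather than the standard one.

```r
# Illustration only: extremes grouping for one simulated gene.
# The package uses a modified (SAM-style) t-statistic, not plain t.test.
set.seed(1)
dna <- rnorm(50)                        # copy number for one gene, 50 samples
rna <- 0.8 * dna + rnorm(50, sd = 0.5)  # expression tracking copy number
tail_p <- 10                            # user-adjustable percentile cutoff
lo <- rna[dna <= quantile(dna, tail_p / 100)]      # lowest-decile samples
hi <- rna[dna >= quantile(dna, 1 - tail_p / 100)]  # highest-decile samples
t.test(hi, lo)$statistic  # positive when expression rises with copy number
```

A large positive statistic here indicates that samples with high copy number also show high expression, which is exactly the dosage effect DR-Correlate is designed to detect.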
Value
observed a vector of observed correlations for each gene
Author(s)
Keyan Salari, Robert Tibshirani, and Jonathan R. Pollack
References
Salari, K., Tibshirani, R., and Pollack, J.R. (2009) DR-Integrator: a new analytic tool for integrating
DNA copy number and gene expression data. http://pollacklab.stanford.edu/
See Also
drcorrelate, drcorrelate.null, drsam, drsam.null, dri.fdrCutoff, dri.sig_genes,
dri.heatmap, dri.merge.CNbyRNA, dri.smooth.cghdata, runFusedLasso
Examples
require(impute)
data(mySampleData)
attach(mySampleData)
# DNA data should contain no missing values - pre-smooth beforehand
# Impute missing values for gene expression data
RNA.data <- dri.impute(RNA.data)
# DR-Correlate analysis to find genes with correlated DNA/RNA measurements
obs <- drcorrelate(DNA.data, RNA.data, method="pearson")
# generate null distribution for FDR calculation (10 permutations)
null <- drcorrelate.null(DNA.data, RNA.data, method="pearson", perm=10)
# identify the correlation cutoff corresponding to your desired FDR
n.cutoff <- dri.fdrCutoff(obs, null, targetFDR=0.05, bt=TRUE)
cutoff <- n.cutoff[2]
# retrieve all genes that are significant at the determined cutoff, and
# calculate gene-specific FDRs
Results <- dri.sig_genes(cutoff, obs, null, GeneIDs, GeneNames, Chr, Nuc,
bt=TRUE, method="drcorrelate")
drcorrelate.null Generate null distribution for DR-Correlate analysis
Description
A null distribution is generated by randomly permuting the sample labels of the gene expression
data matrix and recomputing the DNA/RNA correlations for each gene.
Usage
drcorrelate.null(DNA, RNA, method = "pearson", tail_p = 10, perm)
Arguments
DNA matrix of DNA copy number data
RNA matrix of gene expression data, samples (columns) in same order as DNA matrix
method correlation statistic, either "pearson", "spearman", or "ttest"
tail_p top/bottom percent of samples (with respect to the gene’s copy number) to use
for extremes t-test groups; used only when method = "ttest"
perm number of permutations to perform
Value
null n * k matrix of null data, where n = number of genes and k = number of permutations
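The permutation scheme can be sketched on hypothetical toy matrices as follows (illustration only; in practice use drcorrelate.null itself):

```r
# Illustration of the permutation null on hypothetical toy data: shuffle the
# sample columns of the expression matrix, then recompute per-gene correlations.
set.seed(1)
DNA <- matrix(rnorm(100 * 20), nrow = 100)  # 100 genes x 20 samples
RNA <- DNA + matrix(rnorm(100 * 20, sd = 2), nrow = 100)
perm_null <- function(DNA, RNA, perm) {
  sapply(seq_len(perm), function(k) {
    RNA.perm <- RNA[, sample(ncol(RNA))]    # permute sample labels
    sapply(seq_len(nrow(DNA)),
           function(g) cor(DNA[g, ], RNA.perm[g, ]))
  })
}
null <- perm_null(DNA, RNA, perm = 10)
dim(null)  # 100 x 10: one column of null correlations per permutation
```

Permuting the columns breaks the gene-wise pairing between DNA and RNA while preserving each gene's marginal distribution, so the resulting correlations approximate what would be seen under no true association.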
Author(s)
Keyan Salari, Robert Tibshirani, and Jonathan R. Pollack
References
Salari, K., Tibshirani, R., and Pollack, J.R. (2009) DR-Integrator: a new analytic tool for integrating
DNA copy number and gene expression data. http://pollacklab.stanford.edu/
See Also
drcorrelate, drcorrelate.null, drsam, drsam.null, dri.fdrCutoff, dri.sig_genes,
dri.heatmap, dri.merge.CNbyRNA, dri.smooth.cghdata, runFusedLasso
Examples
require(impute)
data(mySampleData)
attach(mySampleData)
# DNA data should contain no missing values - pre-smooth beforehand
# Impute missing values for gene expression data
RNA.data <- dri.impute(RNA.data)
# DR-Correlate analysis to find genes with correlated DNA/RNA measurements
obs <- drcorrelate(DNA.data, RNA.data, method="pearson")
# generate null distribution for FDR calculation (10 permutations)
null <- drcorrelate.null(DNA.data, RNA.data, method="pearson", perm=10)
# identify the correlation cutoff corresponding to your desired FDR
n.cutoff <- dri.fdrCutoff(obs, null, targetFDR=0.05, bt=TRUE)
cutoff <- n.cutoff[2]
# retrieve all genes that are significant at the determined cutoff, and
# calculate gene-specific FDRs
Results <- dri.sig_genes(cutoff, obs, null, GeneIDs, GeneNames, Chr, Nuc,
bt=TRUE, method="drcorrelate")
dri.correlation.plot
Plot significant correlations from DR-Correlate analysis
Description
A plot is generated of all the correlations computed by DR-Correlate, indicating significant correlations in red (positive) or blue (negative), ordered by chromosomal coordinates.
Usage
dri.correlation.plot(observed, Results.SigGenes, sig_cutoff, chr,
nuc_pos, bothtails)
Arguments
observed vector of observed correlations from drcorrelate
Results.SigGenes
list of significant genes from dri.sig_genes
sig_cutoff correlation significance cutoff returned from dri.fdrCutoff
chr vector of gene chromosome locations
nuc_pos vector of gene nucleotide positions
bothtails TRUE or FALSE indicating whether 2-tail test was performed
Value
plot a plot of the correlations is returned in chromosomal order, with significant correlations marked in red (positive) and blue (negative)
Author(s)
Keyan Salari, Robert Tibshirani, Jonathan R. Pollack
References
Salari, K., Tibshirani, R., and Pollack, J.R. (2009) DR-Integrator: a new analytic tool for integrating
DNA copy number and gene expression data. http://pollacklab.stanford.edu/
See Also
drcorrelate, drcorrelate.null, drsam, drsam.null, dri.fdrCutoff, dri.sig_genes,
dri.heatmap, dri.merge.CNbyRNA, dri.smooth.cghdata, runFusedLasso
Examples
require(impute)
data(mySampleData)
attach(mySampleData)
# DNA data should contain no missing values - pre-smooth beforehand
# Impute missing values for gene expression data
RNA.data <- dri.impute(RNA.data)
# DR-Correlate analysis to find genes with correlated DNA/RNA measurements
obs <- drcorrelate(DNA.data, RNA.data, method="pearson")
# generate null distribution for FDR calculation (10 permutations)
null <- drcorrelate.null(DNA.data, RNA.data, method="pearson", perm=10)
# identify the correlation cutoff corresponding to your desired FDR
n.cutoff <- dri.fdrCutoff(obs, null, targetFDR=0.05, bt=TRUE)
cutoff <- n.cutoff[2]
# retrieve all genes that are significant at the determined cutoff, and
# calculate gene-specific FDRs
Results <- dri.sig_genes(cutoff, obs, null, GeneIDs, GeneNames, Chr, Nuc,
bt=TRUE, method="drcorrelate")
# Optional correlation plot for significant DR-Correlation genes
dri.correlation.plot(obs, Results, cutoff, Chr, Nuc, bothtails=TRUE)
dri.fdrCutoff Determine cutoff score for a desired false discovery rate (FDR)
Description
A search is performed for the cutoff score (for either DR-Correlate or DR-SAM) that corresponds
to the user-defined FDR.
Usage
dri.fdrCutoff(observed, null, targetFDR, bt = TRUE)
Arguments
observed vector of scores from either drcorrelate or drsam
null matrix of null data from either drcorrelate.null or drsam.null
targetFDR desired false discovery rate
bt either TRUE or FALSE indicating whether a 2-tail test was performed
Details
A binary search is implemented to find the cutoff score that corresponds to the user-defined FDR.
Value
comp1 a two-element list containing the number of genes found significant at the chosen FDR, and the score cutoff corresponding to that FDR.
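The quantity the search operates on is the standard permutation plug-in FDR estimate. The helper below is a hypothetical sketch of that estimate; the package's binary search then finds the cutoff whose estimated FDR matches the target.

```r
# Hypothetical sketch of the plug-in FDR estimate behind the cutoff search:
# expected false calls (from the permutation null) over observed calls.
fdr_at <- function(cutoff, observed, null, bt = TRUE) {
  if (bt) { observed <- abs(observed); null <- abs(null) }  # 2-tailed test
  exp_false <- mean(colSums(null >= cutoff))  # mean false calls per permutation
  called <- sum(observed >= cutoff)           # genes called at this cutoff
  if (called == 0) 0 else exp_false / called
}
observed <- c(5, 4, 0.2, -0.1)
null <- cbind(c(0.3, 0.1, 0.2, 0.1), c(0.2, 0.4, 0.1, 0.3))
fdr_at(1, observed, null)  # 0: no null score exceeds this cutoff
```

Raising the cutoff can only lower this estimate, which is what makes a binary search over cutoff values valid.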
Author(s)
Keyan Salari, Robert Tibshirani, and Jonathan R. Pollack
References
Salari, K., Tibshirani, R., and Pollack, J.R. (2009) DR-Integrator: a new analytic tool for integrating
DNA copy number and gene expression data. http://pollacklab.stanford.edu/
See Also
drcorrelate, drcorrelate.null, drsam, drsam.null, dri.fdrCutoff, dri.sig_genes,
dri.heatmap, dri.merge.CNbyRNA, dri.smooth.cghdata, runFusedLasso
Examples
require(impute)
data(mySampleData)
attach(mySampleData)
# DNA data should contain no missing values - pre-smooth beforehand
# Impute missing values for gene expression data
RNA.data <- dri.impute(RNA.data)
# DR-Correlate analysis to find genes with correlated DNA/RNA measurements
obs <- drcorrelate(DNA.data, RNA.data, method="pearson")
# generate null distribution for FDR calculation (10 permutations)
null <- drcorrelate.null(DNA.data, RNA.data, method="pearson", perm=10)
# identify the correlation cutoff corresponding to your desired FDR
n.cutoff <- dri.fdrCutoff(obs, null, targetFDR=0.05, bt=TRUE)
cutoff <- n.cutoff[2]
# retrieve all genes that are significant at the determined cutoff, and
# calculate gene-specific FDRs
Results <- dri.sig_genes(cutoff, obs, null, GeneIDs, GeneNames, Chr, Nuc,
bt=TRUE, method="drcorrelate")
# DR-SAM analysis to find genes with alterations in both DNA and RNA between
# different classes
labels <- c(rep(1,25), rep(2,25)) # 25 samples in class 1 and 25 in class 2
obs <- drsam(DNA.data, RNA.data, labels, transform.type="raw")
# generate null distribution for FDR calculation (10 permutations)
null <- drsam.null(DNA.data, RNA.data, labels, transform.type="raw", 10)
# identify the score cutoff corresponding to your desired FDR
n.cutoff <- dri.fdrCutoff(obs$test.summed, null, targetFDR=0.05, bt=TRUE)
cutoff <- n.cutoff[2]
# retrieve all genes that are significant at the determined cutoff, and
# calculate gene-specific FDRs
Results <- dri.sig_genes(cutoff, obs, null, GeneIDs, GeneNames, Chr, Nuc,
bt=TRUE, method="drsam")
dri.heatmap Generate a heatmap of significant DR-Correlate genes
Description
A heatmap is generated showing the DNA copy number and gene expression data for the significant,
positively correlated DR-Correlate genes.
Usage
dri.heatmap(Results.SigGenes, DNA, RNA, SampleIDs, GeneNames, Chr,
Nuc, statistic, color.scheme)
Arguments
Results.SigGenes
list of significant genes from dri.sig_genes
DNA matrix of DNA copy number data
RNA matrix of gene expression data, samples (columns) in same order as DNA matrix
SampleIDs vector of sample names
GeneNames vector of gene names
Chr vector of gene chromosome locations
Nuc vector of gene nucleotide positions
statistic method used in drcorrelate, either "pearson", "spearman", or "ttest"
color.scheme desired heatmap color scheme, either "RG" (red-green), "RB" (red-blue), or
"YB" (yellow-blue)
Author(s)
Keyan Salari, Robert Tibshirani, Jonathan R. Pollack
References
Salari, K., Tibshirani, R., and Pollack, J.R. (2009) DR-Integrator: a new analytic tool for integrating
DNA copy number and gene expression data. http://pollacklab.stanford.edu/
See Also
drcorrelate, drcorrelate.null, drsam, drsam.null, dri.fdrCutoff, dri.sig_genes,
dri.heatmap, dri.merge.CNbyRNA, dri.smooth.cghdata, runFusedLasso
Examples
require(impute)
data(mySampleData)
attach(mySampleData)
# DNA data should contain no missing values - pre-smooth beforehand
# Impute missing values for gene expression data
RNA.data <- dri.impute(RNA.data)
# DR-Correlate analysis to find genes with correlated DNA/RNA measurements
obs <- drcorrelate(DNA.data, RNA.data, method="pearson")
# generate null distribution for FDR calculation (10 permutations)
null <- drcorrelate.null(DNA.data, RNA.data, method="pearson", perm=10)
# identify the correlation cutoff corresponding to your desired FDR
n.cutoff <- dri.fdrCutoff(obs, null, targetFDR=0.05, bt=TRUE)
cutoff <- n.cutoff[2]
# retrieve all genes that are significant at the determined cutoff, and
# calculate gene-specific FDRs
Results <- dri.sig_genes(cutoff, obs, null, GeneIDs, GeneNames, Chr, Nuc,
bt=TRUE, method="drcorrelate")
# Optional heatmap plot for significant DR-Correlation genes
sample.names <- colnames(DNA.data)
pdf(file="DRI-Heatmap.pdf", height=8, width=11)
dri.heatmap(Results, DNA.data, RNA.data, sample.names, GeneNames, Chr, Nuc,
statistic="pearson", color.scheme="RG")
dev.off()
dri.impute Impute missing values for gene expression data
Description
Missing gene expression measurements are imputed by the K-nearest neighbors method.
Usage
dri.impute(d)
Arguments
d data matrix whose missing values will be imputed
Details
The K-nearest neighbors method is employed (K=10) to impute missing gene expression values.
For any row with more than 50% of its values missing, values are imputed using the overall
mean per sample (the default behavior of impute.knn).
Uses the impute.knn function of the impute package.
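The nearest-neighbor idea can be sketched for a single missing cell as follows (illustration only, with made-up numbers; the package delegates the real work to impute.knn, which handles the general case):

```r
# Minimal K-nearest-neighbor imputation for one missing cell (illustration;
# the package uses impute::impute.knn with K=10 for the full matrix).
knn_impute_one <- function(mat, i, j, k = 2) {
  ok <- !is.na(mat[i, ])                     # columns observed in row i
  others <- mat[-i, , drop = FALSE]
  d <- apply(others, 1, function(r) sqrt(sum((r[ok] - mat[i, ok])^2)))
  nb <- order(d)[seq_len(k)]                 # k nearest neighbor rows (genes)
  mean(others[nb, j])                        # average their column-j values
}
m <- rbind(c(1, NA), c(1, 5), c(1, 7), c(9, 100))
knn_impute_one(m, i = 1, j = 2, k = 2)  # nearest rows share col 1: (5+7)/2 = 6
```

Neighbors are other genes with similar expression profiles across the observed samples, so the imputed value borrows strength from co-expressed genes.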
Value
data the new imputed data matrix
Author(s)
Keyan Salari, Robert Tibshirani, and Jonathan R. Pollack
References
Salari, K., Tibshirani, R., and Pollack, J.R. (2009) DR-Integrator: a new analytic tool for integrating
DNA copy number and gene expression data. http://pollacklab.stanford.edu/
Olga Troyanskaya, Michael Cantor, Gavin Sherlock, Pat Brown, Trevor Hastie, Robert Tibshirani,
David Botstein and Russ B. Altman, Missing value estimation methods for DNA microarrays.
Bioinformatics Vol. 17 no. 6, 2001, pages 520-525.
See Also
drcorrelate, drcorrelate.null, drsam, drsam.null, dri.fdrCutoff, dri.sig_genes,
dri.heatmap, dri.merge.CNbyRNA, dri.smooth.cghdata, runFusedLasso
Examples
require(impute)
data(mySampleData)
attach(mySampleData)
# Impute missing values for gene expression data
RNA.data <- dri.impute(RNA.data)
dri.merge.CNbyRNA Merge DNA copy number and gene expression data sets
Description
A matrix of DNA copy number values and a matrix of gene expression values are merged into one
matrix using the genome position coordinates of the probes on each microarray platform. Copy
number probes flanking each gene expression probe are averaged to derive the copy number at a
given expression probe.
Usage
dri.merge.CNbyRNA(dna.chr, dna.nuc, dna.data, rna.chr, rna.nuc)
Arguments
dna.chr vector of chromosomes on which copy number probes are located
dna.nuc numeric vector of nucleotide positions where copy number probes are located
dna.data matrix of DNA copy number data
rna.chr vector of chromosomes on which expression probes are located
rna.nuc numeric vector of nucleotide positions where expression probes are located
Details
This function performs a pre-processing step to obtain 1-to-1 mappings of DNA copy number and
gene expression for each measured gene. For each gene expression probe, the closest 5’ and 3’
copy number probes are used to calculate an average copy number value. If the same array platform
was used to measure copy number and gene expression, the copy number and gene expression
measurements will be matched by the probes’ genome position coordinates. This function returns a
DNA copy number matrix that has the same number of rows (genes) as the gene expression matrix.
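For a single chromosome, the flanking-probe averaging can be sketched like this (hypothetical helper with made-up positions; the real function also handles chromosome boundaries and multiple chromosomes):

```r
# Illustration: average the closest 5' and 3' copy number probes around
# each expression probe (single chromosome; positions are made up).
merge_flank <- function(dna.nuc, dna.vals, rna.nuc) {
  sapply(rna.nuc, function(pos) {
    lo <- max(which(dna.nuc <= pos))   # closest 5' copy number probe
    hi <- min(which(dna.nuc >= pos))   # closest 3' copy number probe
    mean(dna.vals[c(lo, hi)])
  })
}
merge_flank(dna.nuc = c(100, 300), dna.vals = c(1, 3), rna.nuc = 200)  # (1+3)/2 = 2
```

An expression probe at an exact copy number probe position picks that probe twice, so the average reduces to the probe's own value; expression probes outside the copy number probe range would need boundary handling, which this sketch omits.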
Value
average.dna.data
a matrix of average DNA copy number values corresponding to each gene expression value
Author(s)
Keyan Salari, Robert Tibshirani, and Jonathan R. Pollack
References
Salari, K., Tibshirani, R., and Pollack, J.R. (2009) DR-Integrator: a new analytic tool for integrating
DNA copy number and gene expression data. http://pollacklab.stanford.edu/
See Also
drcorrelate, drcorrelate.null, drsam, drsam.null, dri.fdrCutoff, dri.sig_genes,
dri.heatmap, dri.merge.CNbyRNA, dri.smooth.cghdata, runFusedLasso
Examples
require(impute)
data(mySampleData)
attach(mySampleData)
dri.merge.CNbyRNA(dna.chr=Chr, dna.nuc=Nuc, dna.data=DNA.data, rna.chr=Chr,
rna.nuc=Nuc)
dri.sig_genes Determine significant genes for DR-Correlate or DR-SAM analysis
Description
Given a cutoff score corresponding to the desired FDR, the list of significant genes is generated,
along with each gene’s FDR.
Usage
dri.sig_genes(cutoff, observed, null_dist, gene_id, gene_name, chr,
nuc, bt = TRUE, method = "drcorrelate")
Arguments
cutoff cutoff score for significance from dri.fdrCutoff
observed vector of observed scores from drcorrelate or drsam
null_dist matrix of null data from drcorrelate.null or drsam.null
gene_id vector of gene IDs
gene_name vector of gene names
chr vector of gene chromosome locations
nuc vector of gene nucleotide positions
bt either TRUE or FALSE indicating whether a 2-tail test was performed
method analysis method used, either "drcorrelate" or "drsam"
Value
Results.SigGenes
a list of significant genes, positive and negative, with gene-specific FDRs
Author(s)
Keyan Salari, Robert Tibshirani, and Jonathan R. Pollack
References
Salari, K., Tibshirani, R., and Pollack, J.R. (2009) DR-Integrator: a new analytic tool for integrating
DNA copy number and gene expression data. http://pollacklab.stanford.edu/
See Also
drcorrelate, drcorrelate.null, drsam, drsam.null, dri.fdrCutoff, dri.sig_genes,
dri.heatmap, dri.merge.CNbyRNA, dri.smooth.cghdata, runFusedLasso
Examples
require(impute)
data(mySampleData)
attach(mySampleData)
# DNA data should contain no missing values - pre-smooth beforehand
# Impute missing values for gene expression data
RNA.data <- dri.impute(RNA.data)
# DR-Correlate analysis to find genes with correlated DNA/RNA measurements
obs <- drcorrelate(DNA.data, RNA.data, method="pearson")
# generate null distribution for FDR calculation (10 permutations)
null <- drcorrelate.null(DNA.data, RNA.data, method="pearson", perm=10)
# identify the correlation cutoff corresponding to your desired FDR
n.cutoff <- dri.fdrCutoff(obs, null, targetFDR=0.05, bt=TRUE)
cutoff <- n.cutoff[2]
# retrieve all genes that are significant at the determined cutoff, and
# calculate gene-specific FDRs
Results <- dri.sig_genes(cutoff, obs, null, GeneIDs, GeneNames, Chr, Nuc,
bt=TRUE, method="drcorrelate")
dri.smooth.cghdata Smooth DNA copy number data over a specific window size
Description
A moving average is calculated by chromosomal location of copy number probes using a specified
window size.
Usage
dri.smooth.cghdata(DNA.data, Chr, mw = 5)
Arguments
DNA.data matrix of DNA copy number data (rows ordered by genome position)
Chr vector of chromosomes on which copy number probes are located (ordered by
probes’ genome position)
mw window size to take moving average over
Details
This function performs one pre-processing method for smoothing copy number data before
performing further DNA/RNA integrative analyses. To help eliminate missing values and reduce
noise, for each sample a moving average is taken over a user-specified window size along each
chromosome.
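The moving average can be sketched with stats::filter (a hypothetical equivalent for one sample on one chromosome; the package's own edge handling may differ):

```r
# Illustration: centered moving average over a window of mw probes.
smooth_ma <- function(x, mw = 5) {
  as.numeric(stats::filter(x, rep(1 / mw, mw), sides = 2))
}
dna <- c(0, 0, 3, 0, 0, 0)  # a lone noisy probe...
smooth_ma(dna, mw = 3)      # ...is averaged down to 1 by its neighbors
```

Window positions at the chromosome ends have no full neighborhood and come back as NA here; applied per chromosome, this also prevents averaging across chromosome boundaries.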
Value
DNA.smoothed matrix of DNA copy number data smoothed by a moving average window
Author(s)
Keyan Salari, Robert Tibshirani, and Jonathan R. Pollack
References
Salari, K., Tibshirani, R., and Pollack, J.R. (2009) DR-Integrator: a new analytic tool for integrating
DNA copy number and gene expression data. http://pollacklab.stanford.edu/
See Also
drcorrelate, drcorrelate.null, drsam, drsam.null, dri.fdrCutoff, dri.sig_genes,
dri.heatmap, dri.merge.CNbyRNA, dri.smooth.cghdata, runFusedLasso
Examples
data(mySampleData)
attach(mySampleData)
DRI.DNA.processed <- dri.smooth.cghdata(DNA.data, Chr, 5)
drsam Perform supervised learning analysis between 2 sample classes for
DNA/RNA differences
Description
A test is performed to identify genes with significant differences in both DNA copy number and
gene expression between two sample groups of interest.
Usage
drsam(DNA.data, RNA.data, labels, transform.type, for.null = FALSE)
Arguments
DNA.data matrix of DNA copy number data
RNA.data matrix of gene expression data, samples (columns) in same order as DNA matrix
labels class labels of the two comparison groups, either 1 or 2
transform.type
type of transformation to apply to data, either "standardize", "rank", or "raw"
for.null used internally by drsam.null, keep as default
Details
DR-SAM (DNA/RNA-Significance Analysis of Microarrays) performs a supervised analysis to
identify genes with statistically significant differences in both DNA copy number and gene
expression between different classes (e.g., tumor subtype-A vs. tumor subtype-B). The goal of this
analysis is to identify genetic differences (CNAs) that mediate gene expression differences between
two groups of interest. DR-SAM implements a modified Student’s t-test to generate for each gene
two t-scores assessing differences in DNA copy number and differences in gene expression. A final
score is computed by first summing the copy number t-score and gene expression t-score, and then
weighting the sum by the ratio of the two t-scores. The weight is applied to favor genes with strong
differences in both DNA copy number and gene expression between the two classes.
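Based on the description above, the combination step can be sketched as follows. This is a hypothetical reading of the weighting (a ratio that is at most 1); consult the package source for the exact formula.

```r
# Illustration of combining the two per-gene t-scores; the ratio weight
# (always <= 1) penalizes genes strong in only one data type.
combine_scores <- function(t.dna, t.rna) {
  w <- pmin(abs(t.dna), abs(t.rna)) / pmax(abs(t.dna), abs(t.rna))
  (t.dna + t.rna) * w
}
combine_scores(t.dna = 3, t.rna = 3)  # 6 * 1.0 = 6: strong in both
combine_scores(t.dna = 5, t.rna = 1)  # 6 * 0.2 = 1.2: strong in DNA only
```

Two genes with the same summed score can thus rank very differently: the weight rewards balanced evidence across DNA and RNA, which is the stated aim of DR-SAM.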
Value
observed vector of observed DR-SAM scores for each gene
Author(s)
Keyan Salari, Robert Tibshirani, Jonathan R. Pollack
References
Salari, K., Tibshirani, R., and Pollack, J.R. (2009) DR-Integrator: a new analytic tool for integrating
DNA copy number and gene expression data. http://pollacklab.stanford.edu/
See Also
drcorrelate, drcorrelate.null, drsam, drsam.null, dri.fdrCutoff, dri.sig_genes,
dri.heatmap, dri.merge.CNbyRNA, dri.smooth.cghdata, runFusedLasso
Examples
require(impute)
data(mySampleData)
attach(mySampleData)
# DNA data should contain no missing values - pre-smooth beforehand
# Impute missing values for gene expression data
RNA.data <- dri.impute(RNA.data)
# DR-SAM analysis to find genes with alterations in both DNA and RNA between
# different classes
labels <- c(rep(1,25), rep(2,25)) # 25 samples in class 1 and 25 in class 2
obs <- drsam(DNA.data, RNA.data, labels, transform.type="raw")
# generate null distribution for FDR calculation (10 permutations)
null <- drsam.null(DNA.data, RNA.data, labels, transform.type="raw", 10)
# identify the score cutoff corresponding to your desired FDR
n.cutoff <- dri.fdrCutoff(obs$test.summed, null, targetFDR=0.05, bt=TRUE)
cutoff <- n.cutoff[2]
# retrieve all genes that are significant at the determined cutoff, and
# calculate gene-specific FDRs
Results <- dri.sig_genes(cutoff, obs, null, GeneIDs, GeneNames, Chr, Nuc,
bt=TRUE, method="drsam")
drsam.null Generate null distribution for DR-SAM analysis
Description
A null distribution is generated by randomly permuting the class labels of the samples and recomputing the DR-SAM statistic for each gene.
Usage
drsam.null(DNA.data, RNA.data, labels, transform.type, k)
Arguments
DNA.data matrix of DNA copy number data
RNA.data matrix of gene expression data, samples (columns) in same order as DNA matrix
labels class labels of the two comparison groups, either 1 or 2
transform.type
type of transformation to apply to data, either "standardize", "rank", or "raw"
k number of permutations to perform
Value
null n * k matrix of null data, where n = number of genes and k = number of permutations
Author(s)
Keyan Salari, Robert Tibshirani, Jonathan R. Pollack
References
Salari, K., Tibshirani, R., and Pollack, J.R. (2009) DR-Integrator: a new analytic tool for integrating
DNA copy number and gene expression data. http://pollacklab.stanford.edu/
See Also
drcorrelate, drcorrelate.null, drsam, drsam.null, dri.fdrCutoff, dri.sig_genes,
dri.heatmap, dri.merge.CNbyRNA, dri.smooth.cghdata, runFusedLasso
Examples
require(impute)
data(mySampleData)
attach(mySampleData)
# DNA data should contain no missing values - pre-smooth beforehand
# Impute missing values for gene expression data
RNA.data <- dri.impute(RNA.data)
# DR-SAM analysis to find genes with alterations in both DNA and RNA between
# different classes
labels <- c(rep(1,25), rep(2,25)) # 25 samples in class 1 and 25 in class 2
obs <- drsam(DNA.data, RNA.data, labels, transform.type="raw")
# generate null distribution for FDR calculation (10 permutations)
null <- drsam.null(DNA.data, RNA.data, labels, transform.type="raw", 10)
# identify the score cutoff corresponding to your desired FDR
n.cutoff <- dri.fdrCutoff(obs$test.summed, null, targetFDR=0.05, bt=TRUE)
cutoff <- n.cutoff[2]
# retrieve all genes that are significant at the determined cutoff, and
# calculate gene-specific FDRs
Results <- dri.sig_genes(cutoff, obs, null, GeneIDs, GeneNames, Chr, Nuc,
bt=TRUE, method="drsam")
mySampleData DR-Integrator Sample Data
Description
Sample DNA copy number and gene expression data for 100 genes across 50 cell lines
Usage
data(mySampleData)
Format
The format is a list of 6 components:
$ GeneIDs  : Factor
$ GeneNames: Factor
$ Chr      : int
$ Nuc      : int
$ DNA.data : matrix
$ RNA.data : matrix
Source
Kao, J., Salari, K., Bocanegra, M.C. et al. (2009) Molecular profiling of breast cancer cell lines
defines relevant tumor models and provides a resource for cancer gene discovery. PLoS One.
References
Salari, K., Tibshirani, R., and Pollack, J.R. (2009) DR-Integrator: a new analytic tool for integrating
DNA copy number and gene expression data. http://pollacklab.stanford.edu/
Examples
data(mySampleData)
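To see the layout described under Format, one can build a toy list with the same component names (hypothetical values; the real object ships with the package) and inspect it with str():

```r
# Toy stand-in mirroring the mySampleData layout (hypothetical values)
toy <- list(
  GeneIDs   = factor(c("id1", "id2")),
  GeneNames = factor(c("GeneA", "GeneB")),
  Chr       = c(1L, 2L),
  Nuc       = c(1000L, 2000L),
  DNA.data  = matrix(rnorm(2 * 50), nrow = 2),   # genes in rows, samples in columns
  RNA.data  = matrix(rnorm(2 * 50), nrow = 2)
)
str(toy, max.level = 1)  # List of 6
```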
runFusedLasso Wrapper function to run cghFLasso
Description
A function to run the Fused Lasso method on DNA copy number data to call copy number alterations.
Usage
runFusedLasso(DNA.data, normal.data = NA, chr, nuc, FDR)
Arguments
DNA.data matrix of disease tissue DNA copy number data
normal.data matrix of normal tissue DNA copy number data (optional)
chr vector of chromosomes where copy number probes are located (ordered by
genome location)
nuc numeric vector of nucleotide positions where copy number probes are located
(ordered by genome location)
FDR false discovery rate to apply to threshold significant gains and losses
Details
This function performs a pre-processing step to smooth and segment DNA copy number data using the fused lasso method of Tibshirani & Wang, 2008. The DNA copy number data from the disease tissues are compared with those of normal tissues, if supplied, to help call significant copy number alterations. A false discovery rate is calculated for each copy number alteration, and the significance threshold is user-definable. More details of the Fused Lasso method can be found in the cghFLasso package and in the reference below.
Value
data.FL matrix of DNA copy number data with significant gains and losses called by
Fused Lasso, and non-significant changes set to zero
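The zeroing of non-significant changes described above can be pictured with a small conceptual sketch (a fixed cutoff stands in for the FDR-derived threshold; this is not the cghFLasso implementation):

```r
# Conceptual sketch: copy number changes below the threshold are set to zero
dna <- c(-0.8, 0.1, -0.05, 1.2)    # hypothetical log-ratio values for one sample
cutoff <- 0.5                      # stands in for the FDR-derived threshold
called <- ifelse(abs(dna) > cutoff, dna, 0)
called  # -0.8 0.0 0.0 1.2
```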
Author(s)
Keyan Salari, Robert Tibshirani, and Jonathan R. Pollack
References
Salari, K., Tibshirani, R., and Pollack, J.R. (2009) DR-Integrator: a new analytic tool for integrating
DNA copy number and gene expression data. http://pollacklab.stanford.edu/
Tibshirani, R., and Wang, P. (2008) Spatial smoothing and hot spot detection for CGH data using
the fused lasso, Biostatistics, 9, 18-29.
See Also
drcorrelate, drcorrelate.null, drsam, drsam.null, dri.fdrCutoff, dri.sig_genes,
dri.heatmap, dri.merge.CNbyRNA, dri.smooth.cghdata, runFusedLasso
Examples
library(cghFLasso)
data(CGH)
attach(CGH)
DRI.DNA.data.FL <- runFusedLasso(DNA.data=DiseaseArray,
normal.data=NormalArray, chr=chromosome, nuc=nucposition, FDR=0.01)