The latest version of my PostgreSQL introduction for IL-TechTalks, a free service to introduce the Israeli hi-tech community to new and interesting technologies. In this talk, I describe the history and licensing of PostgreSQL, its built-in capabilities, and some of the new things that were added in the 9.1 and 9.2 releases which make it an attractive option for many applications.
One of the strongest arguments for using a NoSQL database is the focus on distribution — both for replication and for sharding. This talk takes a short look at what replication is, why you should use it, and what is so difficult about it. We then look at MongoDB's implementation in general and finally focus on what can go wrong. In a practical demo you will see how to find the right balance between performance and data safety, and how to use it in your Java application.
PGDay.Amsterdam 2018 - Stefan Fercot - Save your data with pgBackRest
Ever heard of point-in-time recovery? pgBackRest is an awesome tool to handle backups and restores, and it even helps you build streaming replication! This talk will introduce the tool, its basic features, and how to use it.
The talk is called Deep Dive, so be prepared to hold your breath. We will take a look at the mechanisms of SQL Server and literally dive into its bowels, going through all the stages of request processing.
When starting a new project, we used to simply pick one of the SQL databases available at the time, but over the last five years the situation has changed dramatically. The choice has become much harder. SQL or NoSQL? Cloud or on-premises? If SQL or NoSQL, which one exactly? Or maybe use both?
In this talk we will try to give a general overview of the data storage solutions available today and work out the criteria for choosing between them.
Equinix Big Data Platform and Cassandra - A view into the journey - Praveen Kumar
The story of building the Big Data Platform at Equinix to cater for a number of use cases. It explains the journey and the selection of Cassandra as the NoSQL solution sitting at the heart of the platform. Storm, Flume, AMQ, Drools and Solr all play an important role in the platform, which processes large amounts of data in real time.
An overview of the different data models, mainly: flat file, hierarchical, network, relational, and object-oriented. The CAP theorem and the four major NoSQL models: document-oriented, column-oriented, key-value store, and graph. Followed by an overview of some well-known NoSQL products: Redis, Cassandra, MongoDB, and Neo4j.
HBase vs Cassandra vs MongoDB - Choosing the right NoSQL database - Edureka!
NoSQL covers a wide range of database technologies that were developed in response to the surging volume of stored data. Relational databases cannot cope with this huge volume and face agility challenges; this is where NoSQL databases come into play, and they are popular because of their features. The session covers the following topics to help you choose the right NoSQL database:
Traditional databases
Challenges with traditional databases
CAP Theorem
NoSQL to the rescue
A BASE system
Choose the right NoSQL database
Deep Dive into SharePoint Topologies and Server Architecture for SharePoint 2013 - K.Mohamed Faizal
Come and understand the different types of SharePoint topologies, and learn how to design a SharePoint architecture that serves intranets, websites, Office Web Apps Server, app management, wide-area networks, monitoring, newsfeeds, distributed cache, high availability, and disaster recovery.
In a dev world where everything is built automatically (with Jenkins, GitLab, Maven...), developers still struggle to integrate some operational tasks into their build pipelines.
Although tools like Ansible, Puppet and Rundeck are being used by more and more companies, some DBAs need (or want) to keep control over their legacy scripts. How can developers convince the DBAs to implement some useful REST endpoints without much development effort?
In this session I will introduce some ideas for integration between applications, ORDS and operational scripts that will help cooperation between developers and DBAs.
Overview of the Talk
Introduction to the Subject
Database
Relational Database
Object-Relational Database
Database Management System
History
Programming
SQL,
Connecting Java and MATLAB to a Database
Advanced DBMS
Data Grid
BigTable
Demo
Products
MySQL, SQLite, Oracle,
DB2, Microsoft Access,
Microsoft SQL Server
Product Comparison
Open Source SQL databases enter the millions-of-queries-per-second era - Alexander Korotkov
It's a widespread belief that SQL DBMSes are doomed to be hulking because of the burden of backward compatibility, and this belief is frequently exploited in the marketing of various NoSQL DBMSes. However, it is not necessarily true. Development in the open source community keeps these products flexible enough to meet the needs of the times. MySQL and PostgreSQL, the most popular open source DBMSes, have recently been optimized for big servers, making it possible to process more than a million SQL queries per second on a single instance. This talk covers the particular optimizations made in PostgreSQL to achieve this result, which would previously have seemed fantastic. One may say that open source DBMSes are opening a new era of millions of queries per second.
Bringing Oracle databases to the cloud involves major tectonic shifts: (1) hardware resources are no longer static, and (2) the expense model is “pay-per-use”. Previously, as long as your servers were surviving the workload, no one cared whether they were under-utilized. Now, this difference can be immediately monetized, because resource elasticity means that you can give unused capacity back. As a result, the total quality of the code base (plus performance tuning) has a direct impact on cost. This presentation will share some of the corresponding best practices: code instrumentation, profiling, code management, resource optimization, etc. Overall, you can make your system cloud-friendly, but doing so takes explicit effort and serious thinking!
MySQL Without the SQL - Oh My! August 2nd presentation at Mid Atlantic Develo... - Dave Stokes
MySQL Document Store allows you to use MySQL as a JSON document database without needing to set up relational tables, normalize data, or use Structured Query Language.
Dynamic SQL: How to Build Fast Multi-Parameter Stored Procedures - Brent Ozar
You're comfortable writing T-SQL, and you've built a lot of stored procedures that have a bunch of parameters. For example, you have that "product search" stored proc with parameters for product category, name, price range, sort order, etc., and you have to accept any of 'em.
So how do we make those go fast? And how can we get 'em to use indexes?
In one all-demo hour, performance tuner Brent Ozar will show you several ways that fail in comically bad ways. You'll learn how to write dynamic SQL that's easy to tune, manage, and troubleshoot.
Fewer developers each year are getting training in Structured Query Language (SQL), but their code is ever more dependent on relational data. The MySQL Document Store allows programmers to use the full power of MySQL without needing SQL! Built on the X DevAPI, the MySQL Document Store is a power tool for working with JSON document stores or relational tables. The best of both the NoSQL and SQL worlds!
All Things Open 2016 -- Database Programming for Newbies - Dave Stokes
This presentation covers much of what a new developer needs to know about working WITH a database instead of against it. Plus there is plenty on what goes on behind the scenes when you submit a query, and hints on how to avoid the big problems that can ruin your data.
Data Integration: What I Haven't Yet Achieved - Neil Saunders
Data integration is a hot topic in bioinformatics, but the term means different things to different people. What do we think it means? Talk given at the CSIRO Bioinformatics & Biostatistics group meeting, November 21, 2012.
What can science networking online do for you - Neil Saunders
A talk on networking and blogging for life scientists. Part of a science communication seminar series, Diamantina Institute, Princess Alexandra Hospital, Brisbane.
The Viking labelled release experiment: life on Mars? - Neil Saunders
This is a very old talk from around 1999 that I gave to my department at the Free University of Amsterdam. It's very out of date now, but still interesting.
SQL, noSQL or no database at all? Are databases still a core skill?
1. SQL, noSQL or no database at all?
Are databases still a core skill?
Neil Saunders
COMPUTATIONAL INFORMATICS
www.csiro.au
2. Databases: Slide 2 of 24
alternative title: should David Lovell learn databases?
3. Databases: Slide 3 of 24
actual recent email request
Hi Neil,
I was wondering if you could help me with something. I am trying to put
together a table but it is rather slow by hand. Do you know if you can
help me with this task with a script? If it is too much of your time,
don’t worry about it. Just thought I’d ask before I start.
The task is:
The targets listed in A tab need to be found in B tab then the entire row
copied into C tab. Then the details in column C of C tab then need to be
matched with the details in D tab so that the patients with the mutations
are listed in row AG and AH of C tab.
Again, if this isn’t an easy task for you then don’t worry about it.
5. Databases: Slide 5 of 24
database design is a profession in itself
-- KEGG_DB schema
CREATE TABLE ec2go (
ec_no VARCHAR(16) NOT NULL, -- EC number (with "EC:" prefix)
go_id CHAR(10) NOT NULL -- GO ID
);
CREATE TABLE pathway2gene (
pathway_id CHAR(8) NOT NULL, -- KEGG pathway long ID
gene_id VARCHAR(20) NOT NULL -- Entrez Gene or ORF ID
);
CREATE TABLE pathway2name (
path_id CHAR(5) NOT NULL UNIQUE, -- KEGG pathway short ID
path_name VARCHAR(80) NOT NULL UNIQUE -- KEGG pathway name
);
-- Indexes.
CREATE INDEX Ipathway2gene ON pathway2gene (gene_id);
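As an aside that is not on the original slide: the same kind of DDL can be generated from Ruby with Sequel's schema DSL, which ties in with the ORM tip on the next slide. A minimal sketch for the ec2go table, assuming an in-memory SQLite connection purely for illustration:
#!/usr/bin/ruby
require 'sequel'

DB = Sequel.sqlite  # in-memory SQLite, assumed only for illustration

# the ec2go table from the KEGG_DB schema above, via Sequel's schema DSL
DB.create_table(:ec2go) do
  String :ec_no, :size => 16, :null => false                   # EC number (with "EC:" prefix)
  String :go_id, :fixed => true, :size => 10, :null => false   # GO ID
end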
6. Databases: Slide 6 of 24
know your ORM from your MVC
(do you DSL?)
http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller
7. Databases: Slide 7 of 24
my one tip for today: use ORM
= object relational mapping
#!/usr/bin/ruby
require 'sequel'
# connect to UCSC Genomes MySQL server
DB = Sequel.connect(:adapter => "mysql", :host => "genome-mysql.cse.ucsc.edu",
:user => "genome", :database => "hg19")
# instead of "SELECT count(*) FROM knownGene"
DB.from(:knownGene).count
# => 82960
# instead of "SELECT name, chrom, txStart FROM knownGene LIMIT 1"
DB.from(:knownGene).select(:name, :chrom, :txStart).first
# => {:name=>"uc001aaa.3", :chrom=>"chr1", :txStart=>11873}
# instead of "SELECT name FROM knownGene WHERE chrom == ’chrM’"
DB.from(:knownGene).where(:chrom => "chrM").all
# => [{:name=>"uc004coq.4"}, {:name=>"uc022bqo.2"}, {:name=>"uc004cor.1"}, {:name=>"uc004cos.5"},
# {:name=>"uc022bqp.1"}, {:name=>"uc022bqq.1"}, {:name=>"uc022bqr.1"}, {:name=>"uc031tga.1"},
# {:name=>"uc022bqs.1"}, {:name=>"uc011mfi.2"}, {:name=>"uc022bqt.1"}, {:name=>"uc022bqu.2"},
# {:name=>"uc004cov.5"}, {:name=>"uc031tgb.1"}, {:name=>"uc004cow.2"}, {:name=>"uc004cox.4"},
# {:name=>"uc022bqv.1"}, {:name=>"uc022bqw.1"}, {:name=>"uc022bqx.1"}, {:name=>"uc004coz.1"}]
8. Databases: Slide 8 of 24
don’t want to CREATE? you still might want to SELECT
Question: How to map a SNP to a gene around +/- 60KB ?
I am looking at a bunch of SNPs. Some of them are part of genes,
but other are not. I am interested to look up +60KB or -60KB of
those SNPs to get details about some nearby genes. Please share
your experience in dealing with such a situation or thoughts on
any methods that can do this. Thanks in advance.
http://www.biostars.org/p/413/
9. Databases: Slide 9 of 24
example SELECT
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e '
select
K.proteinID, K.name, S.name,
S.avHet, S.chrom, S.chromStart,
K.txStart, K.txEnd
from snp130 as S
left join knownGene as K on
(S.chrom = K.chrom and not(K.txEnd + 60000 < S.chromStart or
S.chromEnd + 60000 < K.txStart))
where
S.name in ("rs25","rs100","rs75","rs9876","rs101")
'
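Not on the original slides, but worth noting: the same hand-written join runs unchanged through the Sequel connection from slide 7, with DB.fetch returning plain Ruby hashes. A sketch; S.name is aliased so the two name columns do not collide in the result hash:
#!/usr/bin/ruby
require 'sequel'

# same UCSC public MySQL connection as on slide 7
DB = Sequel.connect(:adapter => "mysql", :host => "genome-mysql.cse.ucsc.edu",
                    :user => "genome", :database => "hg19")

sql = <<SQL
select K.proteinID, K.name, S.name as snp, S.avHet,
       S.chrom, S.chromStart, K.txStart, K.txEnd
from snp130 as S
left join knownGene as K on
  (S.chrom = K.chrom and not(K.txEnd + 60000 < S.chromStart or
                             S.chromEnd + 60000 < K.txStart))
where S.name in ('rs25', 'rs100', 'rs75', 'rs9876', 'rs101')
SQL

# each row comes back as a Ruby hash keyed by column name
DB.fetch(sql).each { |row| p row }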
14. Databases: Slide 14 of 24
schema-free: save first, worry later
(= agile)
#!/usr/bin/ruby
require "mongo"
require "json/pure"
require "open-uri"
db = Mongo::Connection.new.db('kegg')
col = db.collection('genes')
j = JSON.parse(open("http://togows.dbcls.jp/entry/pathway/hsa00030/genes.json").read)
j.each do |g|
gene = Hash.new
g.each_pair do |key, val|
gene[:_id] = key
gene[:desc] = val
col.save(gene)
end
end
Ruby code to save JSON from the TogoWS REST service
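Not on the original slide: once saved, the documents come straight back through the same (pre-2.x) mongo driver API used above. This continues the script above, and the gene ID value is hypothetical:
puts col.count                       # number of gene documents saved
doc = col.find_one("_id" => "226")   # "226" is a hypothetical Entrez Gene ID key
puts doc["desc"] if doc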
15. Databases: Slide 15 of 24
example application - PMRetract
ask later if interested
http://pmretract.heroku.com/
https://github.com/neilfws/PubMed/tree/master/retractions
16. Databases: Slide 16 of 24
when rows + columns != database
- sometimes a database is overkill
20. Databases: Slide 20 of 24
when are databases good?
- when data are updated frequently
- when multiple users do the updating
- when queries are complex or ever-changing
- as backends to web applications
21. Databases: Slide 21 of 24
when are databases not/less good?
- for basic “set operations” (see the sketch below)
- for sequence data [1]
(?)
[1] no time to discuss BioSQL, GBrowse/Bio::DB::GFF, BioDAS etc.
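To illustrate the set-operations point above (a sketch, not from the original slides): two ID lists in flat files can be compared with Ruby's Set class and no database at all. The file names are hypothetical:
#!/usr/bin/ruby
require 'set'

# one identifier per line in each file (hypothetical file names)
genes_a = Set.new(File.readlines("genes_a.txt").map(&:chomp))
genes_b = Set.new(File.readlines("genes_b.txt").map(&:chomp))

p genes_a & genes_b   # intersection: IDs present in both lists
p genes_a - genes_b   # difference: IDs only in the first list
p genes_a | genes_b   # union of the two lists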
22. Databases: Slide 22 of 24
so how did I answer that email?
options(java.parameters = "-Xmx4g")
library(XLConnect)
wb <- loadWorkbook("~/Downloads/NGS Target list Tumour for Neil.xlsx")
s1 <- readWorksheet(wb, sheet = 1, startCol = 1, endCol = 1, header = FALSE)
s2 <- readWorksheet(wb, sheet = 2, startCol = 1, endCol = 32, header = TRUE)
s4 <- readWorksheet(wb, sheet = 4, startCol = 1, endCol = 3, header = TRUE)
# then use gsub, match, %in% etc. to clean and join the data
# ...
Read spreadsheet into R using the XLConnect package, then “munge”