Building a Web Application to Monitor PubMed Retraction Notices
Neil Saunders
CSIRO Mathematics, Informatics and Statistics
Building E6B, Macquarie University Campus
North Ryde

December 1, 2011
Retraction Watch
Project Aims

Monitor PubMed for retractions
Retrieve retraction data and store locally for analysis
Develop web application to display retraction data
PubMed - advanced search, RSS and send-to-file
Updates in Google Reader
PubMed - MeSH
PubMed - EUtils

http://www.ncbi.nlm.nih.gov/books/NBK25501/
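As a rough illustration of how an EUtils request is put together, the sketch below builds an ESearch URL that asks only for a record count. The base URL mirrors the one used in the scripts in this talk; esearch_count_url is a hypothetical helper name, not part of BioRuby or EUtils, and no network request is made.

```ruby
require "uri"

# Base URL as used elsewhere in this talk (assumption: the same host serves esearch.fcgi)
EUTILS_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Hypothetical helper: build an ESearch URL that returns only the hit count
def esearch_count_url(term, db = "pubmed")
  query = URI.encode_www_form("db" => db, "term" => term, "rettype" => "count")
  "#{EUTILS_BASE}/esearch.fcgi?#{query}"
end

url = esearch_count_url("Retraction of Publication[ptyp] 2011[dp]")
puts url
```

The search term combines a publication type tag ([ptyp]) with a publication date tag ([dp]), the same pattern used by the ecount script later on.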
EInfo example script

#!/usr/bin/ruby
require 'rubygems'
require 'bio'
require 'hpricot'
require 'open-uri'

Bio::NCBI.default_email = "me@me.com"
ncbi = Bio::NCBI::REST.new
url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db="

# write one file of searchable fields per Entrez database
ncbi.einfo.each do |db|
  puts "Processing #{db}..."
  File.open("#{db}.txt", "w") do |f|
    doc = Hpricot(open("#{url + db}"))
    (doc/'//fieldlist/field').each do |field|
      name = (field/'/name').inner_html
      fullname = (field/'/fullname').inner_html
      description = (field/'description').inner_html
      f.write("#{name},#{fullname},#{description}\n")
    end
  end
end
EInfo script - output

ALL,All Fields,All terms from all searchable fields
UID,UID,Unique number assigned to publication
FILT,Filter,Limits the records
TITL,Title,Words in title of publication
WORD,Text Word,Free text associated with publication
MESH,MeSH Terms,Medical Subject Headings assigned to publication
MAJR,MeSH Major Topic,MeSH terms of major importance to publication
AUTH,Author,Author(s) of publication
JOUR,Journal,Journal abbreviation of publication
AFFL,Affiliation,Author’s institutional affiliation and address
...
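The EInfo script joins fields with plain commas, so a description that itself contains a comma would add an extra column. One possible variation, not what the script above does, is to let Ruby's standard CSV library handle quoting. The second row below is a made-up example used only to show the quoting.

```ruby
require "csv"

# A row taken from the output above: name, full name, description
row = ["AUTH", "Author", "Author(s) of publication"]

# Hypothetical row whose description contains a comma
tricky = ["XMPL", "Example", "A description, with a comma"]

line = CSV.generate_line(tricky)
puts CSV.generate_line(row)
puts line

# Reading the line back recovers the original three fields
parsed = CSV.parse_line(line)
```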
MongoDB Overview

MongoDB is a so-called “NoSQL” database
Key features:
Document-oriented
Schema-free
Documents stored in collections
http://www.mongodb.org/
Saving to a database collection: ecount

#!/usr/bin/ruby
require "rubygems"
require "bio"
require "mongo"

db = Mongo::Connection.new.db('pubmed')
col = db.collection('ecount')
Bio::NCBI.default_email = "me@me.com"
ncbi = Bio::NCBI::REST.new

# one record per year: all PubMed records and retraction notices
1977.upto(Time.now.year) do |year|
  all  = ncbi.esearch_count("#{year}[dp]", {"db" => "pubmed"})
  term = ncbi.esearch_count("Retraction of Publication[ptyp] #{year}[dp]",
                            {"db" => "pubmed"})
  record = {"_id" => year, "year" => year, "total" => all,
            "retracted" => term, "updated_at" => Time.now}
  col.save(record)
  puts "#{year}..."
end
puts "Saved #{col.count} records."
ecount collection

> db.ecount.findOne()
{
"_id" : 1977,
"retracted" : 3,
"updated_at" : ISODate("2011-11-15T03:58:10.729Z"),
"total" : 260517,
"year" : 1977
}
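Storing both counts per year makes it easy to normalise retractions by publication volume. A minimal sketch, assuming a record shaped like the ecount document above; retraction_rate is a hypothetical helper and the per-100,000 scale is just one convenient choice.

```ruby
# Record shaped like the ecount document shown above
record = { "_id" => 1977, "year" => 1977, "total" => 260517, "retracted" => 3 }

# Hypothetical helper: retraction notices per `per` PubMed records
def retraction_rate(record, per = 100_000)
  total = record["total"].to_i
  return 0.0 if total.zero?          # avoid dividing by zero for empty years
  record["retracted"].to_f / total * per
end

rate = retraction_rate(record)
puts format("%d: %.2f retraction notices per 100,000 records", record["year"], rate)
```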
Saving to a database collection: entries

#!/usr/bin/ruby
require "rubygems"
require "mongo"
require "crack"

db = Mongo::Connection.new.db("pubmed")
col = db.collection('entries')
col.drop

# parse the PubMed XML export and save each article, keyed by PMID
xmlfile = "#{ENV['HOME']}/Dropbox/projects/pubmed/retractions/data/retract.xml"
xml = Crack::XML.parse(File.read(xmlfile))
xml['PubmedArticleSet']['PubmedArticle'].each do |article|
  article['_id'] = article['MedlineCitation']['PMID']
  col.save(article)
end
puts "Saved #{col.count} articles."
entries collection
{
"_id" : "22106469",
"PubmedData" : {
"PublicationStatus" : "ppublish",
"ArticleIdList" : {
"ArticleId" : "22106469"
},
"History" : {
"PubMedPubDate" : [
{
"Minute" : "0",
"Month" : "11",
"PubStatus" : "entrez",
"Day" : "23",
"Hour" : "6",
"Year" : "2011"
},
{
"Minute" : "0",
"Month" : "11",
"PubStatus" : "pubmed",
"Day" : "23",
"Hour" : "6",
"Year" : "2011"
},
...
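Because each article is stored as a nested document, fields are reached by walking the hash. The sketch below uses a plain Ruby hash containing only keys visible in the truncated document above; history_date is a hypothetical helper. The single-Hash check is an assumption about how an XML-to-hash parser may represent a list with one element.

```ruby
require "date"

# Plain hash mirroring part of the entries document shown above
article = {
  "_id" => "22106469",
  "PubmedData" => {
    "History" => {
      "PubMedPubDate" => [
        { "PubStatus" => "entrez", "Year" => "2011", "Month" => "11", "Day" => "23" },
        { "PubStatus" => "pubmed", "Year" => "2011", "Month" => "11", "Day" => "23" }
      ]
    }
  }
}

# Hypothetical helper: the Date recorded for a given PubStatus, or nil if absent
def history_date(article, status)
  entries = article.dig("PubmedData", "History", "PubMedPubDate")
  entries = [entries] if entries.is_a?(Hash)   # assumed single-element case
  entry = (entries || []).find { |e| e["PubStatus"] == status }
  entry && Date.new(entry["Year"].to_i, entry["Month"].to_i, entry["Day"].to_i)
end

pubmed_date = history_date(article, "pubmed")
puts pubmed_date
```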
Saving to a database collection: timeline

#!/usr/bin/ruby
require "rubygems"
require "mongo"
require "date"

db = Mongo::Connection.new.db('pubmed')
entries = db.collection('entries')
timeline = db.collection('timeline')

# creation date of every retraction notice, sorted
dates = entries.find.map { |entry| entry['MedlineCitation']['DateCreated'] }
dates.map! { |d| Date.parse("#{d['Year']}-#{d['Month']}-#{d['Day']}") }
dates.sort!

# start every day in the range at zero, then count notices per day
data = (dates.first..dates.last).inject(Hash.new(0)) { |h, date| h[date] = 0; h }
dates.each { |date| data[date] += 1 }
data = data.sort

# store dates as JavaScript Date.UTC() calls (months are zero-based)
data.map! { |e| ["Date.UTC(#{e[0].year},#{e[0].month - 1},#{e[0].day})", e[1]] }
data.each do |date|
  timeline.save({"_id" => date[0].gsub(".", "_"), "date" => date[0], "count" => date[1]})
end
puts "Saved #{timeline.count} dates in timeline."
timeline collection

> db.timeline.findOne()
{
"_id" : "Date_UTC(1977,7,12)",
"date" : "Date.UTC(1977,7,12)",
"count" : 1
}
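The counting step of the timeline script can be tried without a database. The sketch below uses three made-up dates; js_utc is a hypothetical helper name. It shows the two details that are easy to miss: days with no notices are filled with zero, and the month is reduced by one because JavaScript months start at 0.

```ruby
require "date"

# Made-up sample of creation dates (the real script reads them from MongoDB)
dates = [Date.new(1977, 8, 14), Date.new(1977, 8, 12), Date.new(1977, 8, 14)].sort

# Start every day in the range at zero, then count occurrences
counts = (dates.first..dates.last).each_with_object({}) { |day, h| h[day] = 0 }
dates.each { |day| counts[day] += 1 }

# Hypothetical helper: render a Date as a JavaScript Date.UTC() call
def js_utc(date)
  "Date.UTC(#{date.year},#{date.month - 1},#{date.day})"
end

series = counts.map { |day, n| [js_utc(day), n] }
series.each { |point| p point }
```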
Sinatra: minimal example

require "rubygems"
require "sinatra"
get "/" do
"Hello World"
end
# ruby myapp.rb
# http://localhost:4567
Highcharts: minimal example code

var chart = new Highcharts.Chart({
  chart: {
    renderTo: 'container',
    defaultSeriesType: 'line'
  },
  xAxis: {
    categories: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
  },
  series: [{
    data: [29.9, 71.5, 106.4, 129.2, 144.0, 176.0,
           135.6, 148.5, 216.4, 194.1, 95.6, 54.4]
  }]
});
// <div id="container" style="height: 400px"></div>
Highcharts: minimal example result
Web Application Overview
|---config.ru
|---Gemfile
|---main.rb
|---public
|   |---javascripts
|   |   |---dark-blue.js
|   |   |---dark-green.js
|   |   |---exporting.js
|   |   |---gray.js
|   |   |---grid.js
|   |   |---highcharts.js
|   |   |---jquery-1.4.2.min.js
|   |---stylesheets
|       |---main.css
|---Rakefile
|---statistics.rb
|---views
    |---about.haml
    |---byyear.haml
    |---date.haml
    |---error.haml
    |---index.haml
    |---journal.haml
    |---journals.haml
    |---layout.haml
    |---test.haml
    |---total.haml
Sinatra Application Code - main.rb

# main.rb
configure do
  # a bunch of config stuff goes here
  # DB = connection to MongoDB database
  # timeline
  timeline = DB.collection('timeline')
  set :data, timeline.find.to_a.map { |e| [e['date'], e['count']] }
end

# views
get "/" do
  haml :index
end
Sinatra Views - index.haml
%h3 PubMed Retraction Notices - Timeline
%p Last update: #{options.updated_at}
%div#container(style="margin-left: auto; margin-right: auto; width: 800px;")
:javascript
  $(function () {
    new Highcharts.Chart({
      chart: {
        renderTo: 'container',
        defaultSeriesType: 'area',
        width: 800,
        height: 600,
        zoomType: 'x',
        marginTop: 80
      },
      legend: { enabled: false },
      title: { text: 'Retractions by date' },
      xAxis: { type: 'datetime' },
      yAxis: { title: { text: 'Retractions' } },
      series: [{
        data: #{options.data.inspect.gsub(/"/,"")}
      }],
      // more stuff goes here...
    });
  });
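The data line in the view relies on a small trick: Ruby's inspect output for an array of [string, count] pairs is almost a JavaScript array literal, and stripping the double quotes turns each stored string into a live Date.UTC() call. The sketch below shows that step on two sample points. The second half is an alternative that is not used in the talk: sending epoch milliseconds as JSON, which a Highcharts datetime axis also accepts. epoch_ms is a hypothetical helper name.

```ruby
require "date"
require "json"

# Two points in the shape stored in the timeline collection
data = [["Date.UTC(1977,7,12)", 1], ["Date.UTC(1977,7,13)", 0]]

# The approach used in the view: inspect, then drop the double quotes
js_literal = data.inspect.gsub(/"/, "")
puts js_literal

# Alternative sketch (not from the talk): epoch milliseconds as plain JSON
def epoch_ms(date)
  Time.utc(date.year, date.month, date.day).to_i * 1000
end

json_points = [[epoch_ms(Date.new(1977, 8, 12)), 1]].to_json
puts json_points
```

The quote-stripping approach works because the stored strings contain no quotes of their own; the JSON variant avoids string manipulation at the cost of storing or computing numeric timestamps.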
Deployment: Heroku + MongoHQ
Heroku.com - free application hosting (for small apps)

Almost as simple as:
$ git remote add heroku git@heroku.com:appname.git
$ git push heroku master
MongoHQ.com - free MongoDB database hosting (up to 16 MB)
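With a hosted database, connection details usually arrive as a single MongoDB URI in an environment variable rather than as hard-coded settings. The sketch below only splits such a URI into its parts with Ruby's standard library. The variable name MONGOHQ_URL, the helper parse_mongo_uri and the fallback URI are all assumptions for illustration; how the parts are then passed to the Mongo driver depends on the driver version and is not shown.

```ruby
require "uri"

# Hypothetical helper: split a mongodb:// URI into connection settings
def parse_mongo_uri(uri_string)
  uri = URI.parse(uri_string)
  {
    host: uri.host,
    port: uri.port || 27017,              # 27017 is MongoDB's default port
    user: uri.user,
    password: uri.password,
    database: uri.path.sub(%r{\A/}, "")   # drop the leading slash
  }
end

# MONGOHQ_URL is an assumed variable name; the fallback is a made-up local URI
settings = parse_mongo_uri(ENV.fetch("MONGOHQ_URL", "mongodb://user:secret@localhost:27017/pubmed"))
puts settings.reject { |key, _| key == :password }
```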
“Final” product

Application - http://pmretract.heroku.com
Code - http://github.com/neilfws/PubMed

Ralf Eggert
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 

Recently uploaded (20)

Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 

Building A Web Application To Monitor PubMed Retraction Notices

  • 1. Building a Web Application to Monitor PubMed Retraction Notices
      Neil Saunders
      CSIRO Mathematics, Informatics and Statistics
      Building E6B, Macquarie University Campus, North Ryde
      December 1, 2011
  • 3. Project Aims
      Monitor PubMed for retractions
      Retrieve retraction data and store locally for analysis
      Develop web application to display retraction data
  • 4. PubMed - advanced search, RSS and send-to-file
  • 8. EInfo example script
      #!/usr/bin/ruby
      require 'rubygems'
      require 'bio'
      require 'hpricot'
      require 'open-uri'

      Bio::NCBI.default_email = "me@me.com"
      ncbi = Bio::NCBI::REST.new
      url = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db="
      # write one comma-separated file of searchable fields per Entrez database
      ncbi.einfo.each do |db|
        puts "Processing #{db}..."
        File.open("#{db}.txt", "w") do |f|
          doc = Hpricot(open("#{url + db}"))
          (doc/'//fieldlist/field').each do |field|
            name = (field/'/name').inner_html
            fullname = (field/'/fullname').inner_html
            description = (field/'description').inner_html
            f.write("#{name},#{fullname},#{description}\n")
          end
        end
      end
  • 9. EInfo script - output
      ALL,All Fields,All terms from all searchable fields
      UID,UID,Unique number assigned to publication
      FILT,Filter,Limits the records
      TITL,Title,Words in title of publication
      WORD,Text Word,Free text associated with publication
      MESH,MeSH Terms,Medical Subject Headings assigned to publication
      MAJR,MeSH Major Topic,MeSH terms of major importance to publication
      AUTH,Author,Author(s) of publication
      JOUR,Journal,Journal abbreviation of publication
      AFFL,Affiliation,Author’s institutional affiliation and address
      ...
  • 10. MongoDB Overview
      MongoDB is a so-called “NoSQL” database
      Key features:
        Document-oriented
        Schema-free
        Documents stored in collections
      http://www.mongodb.org/
  • 11. Saving to a database collection: ecount
      #!/usr/bin/ruby
      require "rubygems"
      require "bio"
      require "mongo"

      db = Mongo::Connection.new.db('pubmed')
      col = db.collection('ecount')
      Bio::NCBI.default_email = "me@me.com"
      ncbi = Bio::NCBI::REST.new
      1977.upto(Time.now.year) do |year|
        # all PubMed records published in this year
        all = ncbi.esearch_count("#{year}[dp]", {"db" => "pubmed"})
        # retraction notices published in this year
        term = ncbi.esearch_count("Retraction of Publication[ptyp] #{year}[dp]", {"db" => "pubmed"})
        record = {"_id" => year, "year" => year, "total" => all, "retracted" => term, "updated_at" => Time.now}
        col.save(record)
        puts "#{year}..."
      end
      puts "Saved #{col.count} records."
  • 12. ecount collection
      > db.ecount.findOne()
      {
        "_id" : 1977,
        "retracted" : 3,
        "updated_at" : ISODate("2011-11-15T03:58:10.729Z"),
        "total" : 260517,
        "year" : 1977
      }
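Each ecount document pairs a year's total PubMed records with its retraction notices, so a per-year rate can be derived from those two fields. The following is a minimal sketch, assuming records shaped like the findOne() output above; the `retraction_rate` helper and the per-100,000 scaling are illustrative choices, not part of the slides.

```ruby
# Hypothetical helper: retraction notices per 100,000 PubMed records for one year.
# Expects a hash shaped like a document from the ecount collection.
def retraction_rate(record, per = 100_000)
  total = record["total"]
  return 0.0 if total.nil? || total.zero?
  record["retracted"].to_f / total * per
end

# Sample record using the values from the findOne() output.
sample = { "_id" => 1977, "year" => 1977, "total" => 260517, "retracted" => 3 }
puts format("%d: %.2f retraction notices per 100,000 records",
            sample["year"], retraction_rate(sample))
```

Normalising by the yearly total matters because PubMed itself grows every year, so raw retraction counts alone would overstate any trend.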
  • 13. Saving to a database collection: entries
      #!/usr/bin/ruby
      require "rubygems"
      require "mongo"
      require "crack"

      db = Mongo::Connection.new.db("pubmed")
      col = db.collection('entries')
      # start from an empty collection on every run
      col.drop
      xmlfile = "#{ENV['HOME']}/Dropbox/projects/pubmed/retractions/data/retract.xml"
      xml = Crack::XML.parse(File.read(xmlfile))
      xml['PubmedArticleSet']['PubmedArticle'].each do |article|
        # use the PMID as the document key
        article['_id'] = article['MedlineCitation']['PMID']
        col.save(article)
      end
      puts "Saved #{col.count} articles."
  • 14. entries collection
      {
        "_id" : "22106469",
        "PubmedData" : {
          "PublicationStatus" : "ppublish",
          "ArticleIdList" : { "ArticleId" : "22106469" },
          "History" : {
            "PubMedPubDate" : [
              { "Minute" : "0", "Month" : "11", "PubStatus" : "entrez", "Day" : "23", "Hour" : "6", "Year" : "2011" },
              { "Minute" : "0", "Month" : "11", "PubStatus" : "pubmed", "Day" : "23", "Hour" : "6", "Year" : "2011" },
              ...
  • 15. Saving to a database collection: timeline
      #!/usr/bin/ruby
      require "rubygems"
      require "mongo"
      require "date"

      db = Mongo::Connection.new.db('pubmed')
      entries = db.collection('entries')
      timeline = db.collection('timeline')
      dates = entries.find.map { |entry| entry['MedlineCitation']['DateCreated'] }
      dates.map! { |d| Date.parse("#{d['Year']}-#{d['Month']}-#{d['Day']}") }
      dates.sort!
      # start every day in the range at zero, then count entries per day
      data = (dates.first..dates.last).inject(Hash.new(0)) { |h, date| h[date] = 0; h }
      dates.each { |date| data[date] += 1 }
      data = data.sort
      # JavaScript months are zero-based, hence month - 1
      data.map! { |e| ["Date.UTC(#{e[0].year},#{e[0].month - 1},#{e[0].day})", e[1]] }
      data.each do |date|
        timeline.save({"_id" => date[0].gsub(".", "_"), "date" => date[0], "count" => date[1]})
      end
      puts "Saved #{timeline.count} dates in timeline."
  • 16. timeline collection
      > db.timeline.findOne()
      {
        "_id" : "Date_UTC(1977,7,12)",
        "date" : "Date.UTC(1977,7,12)",
        "count" : 1
      }
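The timeline script does three things: it fills every day between the first and last entry with a zero, counts entries per day, and formats each day as a JavaScript `Date.UTC(...)` expression. The sketch below reproduces that logic without MongoDB so the steps can be tried in isolation; the `daily_counts` helper and the sample dates are assumptions made for illustration.

```ruby
require "date"

# Hypothetical helper: turn a list of dates into zero-filled daily counts,
# formatted like the "date" field of the timeline collection.
def daily_counts(dates)
  return [] if dates.empty?
  sorted = dates.sort
  # Start every day in the range at 0 so days without retractions still appear.
  counts = (sorted.first..sorted.last).each_with_object({}) { |d, h| h[d] = 0 }
  sorted.each { |d| counts[d] += 1 }
  # JavaScript months are zero-based, hence month - 1.
  counts.map { |d, n| ["Date.UTC(#{d.year},#{d.month - 1},#{d.day})", n] }
end

# Made-up sample dates, for illustration only.
sample = [Date.new(2011, 11, 23), Date.new(2011, 11, 21), Date.new(2011, 11, 23)]
daily_counts(sample).each { |key, n| puts "#{key} => #{n}" }
```

Because of the zero-based month, the `Date.UTC(1977,7,12)` entry shown in the timeline collection denotes 12 August 1977, not 12 July.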
  • 17. Sinatra: minimal example
      require "rubygems"
      require "sinatra"

      get "/" do
        "Hello World"
      end

      # ruby myapp.rb
      # http://localhost:4567
  • 18. Highcharts: minimal example code
      var chart = new Highcharts.Chart({
        chart: {
          renderTo: 'container',
          defaultSeriesType: 'line'
        },
        xAxis: {
          categories: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                       'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
        },
        series: [{
          data: [29.9, 71.5, 106.4, 129.2, 144.0, 176.0,
                 135.6, 148.5, 216.4, 194.1, 95.6, 54.4]
        }]
      });
      // <div id="container" style="height: 400px"></div>
  • 21. Sinatra Application Code - main.rb
      # main.rb
      configure do
        # a bunch of config stuff goes here
        # DB = connection to MongoDB database
        # timeline
        timeline = DB.collection('timeline')
        set :data, timeline.find.to_a.map { |e| [e['date'], e['count']] }
      end

      # views
      get "/" do
        haml :index
      end
  • 22. Sinatra Views - index.haml
      %h3 PubMed Retraction Notices - Timeline
      %p Last update: #{options.updated_at}
      %div#container(style="margin-left: auto; margin-right: auto; width: 800px;")
      :javascript
        $(function () {
          new Highcharts.Chart({
            chart: {
              renderTo: 'container',
              defaultSeriesType: 'area',
              width: 800,
              height: 600,
              zoomType: 'x',
              marginTop: 80
            },
            legend: { enabled: false },
            title: { text: 'Retractions by date' },
            xAxis: { type: 'datetime' },
            yAxis: { title: { text: 'Retractions' } },
            series: [{
              data: #{options.data.inspect.gsub(/"/,"")}
            }],
            // more stuff goes here...
          });
        });
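The `series` line in the view relies on a small trick: `inspect` prints the Ruby array of `[date, count]` pairs as text, and stripping the double quotes turns each stored `"Date.UTC(...)"` string into a bare JavaScript expression. A minimal sketch of that conversion, using made-up sample rows in place of the application's `:data` setting:

```ruby
# Rows shaped like the :data setting in main.rb: [date string, count].
# Sample values are made up for illustration.
data = [["Date.UTC(2011,10,21)", 1], ["Date.UTC(2011,10,22)", 0]]

# inspect renders the nested array as Ruby source text; removing the double
# quotes leaves Date.UTC(...) as a bare JavaScript expression.
js_literal = data.inspect.gsub(/"/, "")
puts js_literal
```

This works here because the stored values contain no quotes or other characters that need escaping; for arbitrary strings, a JSON serializer would be the safer route.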
  • 23. Deployment: Heroku + MongoHQ
      Heroku.com - free application hosting (for small apps)
      Almost as simple as:
        $ git remote add heroku git@heroku.com:appname.git
        $ git push heroku master
      MongoHQ.com - free MongoDB database hosting (up to 16 MB)
  • 24. “Final” product
      Application - http://pmretract.heroku.com
      Code - http://github.com/neilfws/PubMed