"From SALAMI to Social Machines: Music Information Retrieval as an Exemplar of Digital Research, or Fourth Quadrant Research". Keynote by David De Roure at Semantic Media Launch, Barbican, 3 October 2012
1. From SALAMI to Social Machines: Music Information Retrieval as an Exemplar of Digital Research, or Fourth Quadrant Research
Semantic Media
David De Roure
3. ...the imminent flood of scientific data expected from the next generation of experiments, simulations, sensors and satellites
Source: CERN, CERN-EX-0712023, http://cdsweb.cern.ch/record/1203203
7. A Big Picture
[Diagram: two axes, "More machines" and "More people", divide the space into four quadrants: Conventional Computation; Big Data / Big Compute (e-infrastructure); Social Networking (online R&D); and The Fourth Quadrant (The Future!).]
8. E. Science laboris
• Data Analysis Pipelines
• Workflows are the new rock and roll
• Machinery for coordinating the execution of services and linking together resources
• Repetitive and mundane boring stuff made easier
Carole Goble
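The idea of a workflow as "machinery for coordinating the execution of services" can be illustrated with a minimal sketch. The step functions below are hypothetical stand-ins for remote services and are not taken from the talk; the point is only that the coordination and the logging are done by the machinery, not by hand.

```python
from typing import Any, Callable, List, Tuple

Step = Callable[[Any], Any]


def run_workflow(steps: List[Step], data: Any) -> Tuple[Any, List[Tuple[str, Any]]]:
    """Run each step in order, feeding its output to the next step.

    Returns the final result and a log of (step name, output) pairs,
    which is a very simple form of provenance.
    """
    log = []
    for step in steps:
        data = step(data)
        log.append((step.__name__, data))
    return data, log


# Hypothetical steps standing in for services a real workflow would call.
def fetch_readings(source: str) -> list:
    return [3.0, 4.5, None, 5.5]


def drop_missing(values: list) -> list:
    return [v for v in values if v is not None]


def mean(values: list) -> float:
    return sum(values) / len(values)


result, log = run_workflow([fetch_readings, drop_missing, mean], "sensor-A")
print(result)
print([name for name, _ in log])
```

Because the steps are ordinary values in a list, someone else can reuse the same workflow unchanged with a different first step, which is the kind of reuse the next slide describes.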
10. Reuse, Recycling, Repurposing
• Paul writes workflows for identifying biological pathways implicated in resistance to Trypanosomiasis in cattle.
• Paul meets Jo. Jo is investigating Whipworm in mouse.
• Jo reuses one of Paul’s workflows without change.
• Jo identifies the biological pathways involved in sex dependence in the mouse model, believed to be involved in the ability of mice to expel the parasite.
• Previously a manual two year study by Jo had failed to do this.
Carole Goble
11. “A biologist would rather share their toothbrush than their gene name”
“Data mining: my data’s mine and your data’s mine”
14. Paul’s Pack
[Diagram: Paul’s Research Object ("Paul’s Pack"). Nodes: Workflow 16, QTL, Results, Logs, Metadata, Slides, Paper, Common pathways, Workflow 13, Results. Edge labels: produces, Included in, Published in, Feeds into.]
15. Research Objects
Reproducibility, Integrated Publishing
• Workflow
– Provenance
– Conservation & Preservation
– Executable Publication
• Human
– Credit Tracking
– Unit of Scholarship
– Crowd management
• Semantics
– Acquisition & Publishing
– Encoding, Encapsulation & Annotation: OAI-ORE, AO…
[Diagram labels: Technical Objects; Social Objects; Distributed Third Party Tenancy; Alien Store]
Carole Goble
16. Linked Data support rdf.myexperiment.org
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs so that they can discover more things
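The four principles can be sketched with a tiny in-memory set of triples. This is only an illustration: the URIs under http://example.org/ and the property names are hypothetical, nothing is fetched over the network, and a real deployment would serve RDF and answer SPARQL queries instead.

```python
# Principles 1 and 2: things are named with (hypothetical) HTTP URIs.
BASE = "http://example.org/"
PACK = BASE + "packs/1"
WORKFLOW = BASE + "workflows/16"
PAUL = BASE + "users/paul"

# Subject, predicate, object triples. Objects are either URIs or plain strings.
TRIPLES = [
    (PACK, BASE + "terms/includes", WORKFLOW),
    (WORKFLOW, BASE + "terms/title", "Example pathway workflow"),
    (WORKFLOW, BASE + "terms/creator", PAUL),
    (PAUL, BASE + "terms/name", "Paul"),
]


def look_up(uri):
    """Principle 3: looking up a URI returns useful information about it."""
    return [(p, o) for s, p, o in TRIPLES if s == uri]


def linked_uris(uri):
    """Principle 4: descriptions include links to other URIs."""
    return [o for _, o in look_up(uri) if o.startswith("http://")]


def crawl(start):
    """Follow links breadth-first and return every URI discovered."""
    seen, queue = [], [start]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        queue.extend(linked_uris(uri))
    return seen


print(crawl(PACK))
```

Starting from the pack alone, a client discovers the workflow and then its creator, without being told about either in advance.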
19. To Do List
A digital lab book replacement that chemists were able to use, and liked.
Ingredient List
• Fluorinated biphenyl 0.9 g
• Br11OCB 1.59 g
• Potassium Carbonate 2.07 g
• Butanone 40 ml
Plan
• Dissolve 4-fluorinated biphenyl in butanone
• Add K2CO3 powder
• Heat at reflux for 1.5 hours
• Cool and add Br11OCB
• Heat at reflux until completion
• Cool and add water (30ml)
• Extract with DCM (3x40ml)
• Combine organics, dry over MgSO4 & filter
• Remove solvent in vacuo
• Fuse compound to silica & column in ether/petrol
Process and Record
[Diagram: the plan drawn as a numbered chain of process steps (Add, Reflux, Cool, Liquid-liquid extraction, Dry, Filter (Buchner), Remove Solvent by Rotary Evaporation, Fuse, Column Chromatography), with inputs (Butanone, K2CO3 Powder, Br11OCB, Water, DCM, MgSO4, Silica, Ether/Petrol Ratio) and Weigh, Measure and Annotate observations (text, image) attached to the steps.]
Recorded observations:
• Weigh: sample of 4-fluorinated biphenyl, 0.9031 grammes. Other recorded weights: 2.0719 g, 1.5918 g.
• Measure: 40 ml, 30 ml.
• Annotate: Butanone dried via silica column and measured into 100ml RB flask. Used 1ml extra solvent to wash out biphenyl container.
• Annotate: Started reflux at 13.30. (Had to change heater stirrer) Only reflux for 45min, next step 14:15.
• Annotate: Inorganics dissolve 2 layers. Added brine ~20ml.
• Annotate: Washed MgSO4 with DCM ~ 50ml. Organics are yellow solution.
Key: Process, Input, Literal, Observation
Observation Types
• weight - grammes
• measure - ml, drops
• annotate - text
• temperature - K, °C
Future Questions
• Whether to have many subclasses of processes or fewer with annotations
• How to depict destructive processes
• How to depict taking lots of samples
• What is the observation/process boundary? e.g. MRI scan
Combechem, 30 January 2004. Jeremy Frey. gvh, hrm, gms
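The Process / Input / Observation vocabulary and the observation types listed on this slide suggest a simple data model. The sketch below is one hypothetical way to encode it in Python; the class names, the unit table and the validation rule are assumptions made for illustration and are not the Combechem implementation. It takes the "fewer process classes with annotations" side of the first future question.

```python
from dataclasses import dataclass, field
from typing import List

# Units permitted for each observation type, following the slide's list.
ALLOWED_UNITS = {
    "weight": {"grammes"},
    "measure": {"ml", "drops"},
    "annotate": {"text"},
    "temperature": {"K", "C"},
}


@dataclass
class Observation:
    kind: str      # one of the keys in ALLOWED_UNITS
    value: object  # a number, or a string for annotations
    unit: str

    def __post_init__(self):
        if self.unit not in ALLOWED_UNITS.get(self.kind, set()):
            raise ValueError(f"unit {self.unit!r} is not valid for {self.kind!r}")


@dataclass
class Process:
    name: str  # a single generic class; the step type is carried as data
    inputs: List[str] = field(default_factory=list)
    observations: List[Observation] = field(default_factory=list)


weigh = Process("Weigh", inputs=["Sample of 4-fluorinated biphenyl"])
weigh.observations.append(Observation("weight", 0.9031, "grammes"))

reflux = Process("Reflux")
reflux.observations.append(
    Observation("annotate", "Started reflux at 13.30.", "text")
)

record = [weigh, reflux]
print([(p.name, len(p.observations)) for p in record])
```

Rejecting a weight recorded in ml at capture time is a small instance of the "metadata capture at source" theme that recurs later in the talk.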
21. The Problem
Ichiro Fujinaga
[Figure: a recording annotated with its structure: INT., VERSE, VERSE, BRIDGE, VERSE, BRIDGE, VERSE, OUT.]
22. SALAMI
• Structural Analysis of Large Amounts of Music Information
• Musical analysis has traditionally been conducted by individuals and on a small scale
• Computational approach, combined with the huge volume of data now available, will
1. Deliver substantive corpus of musical analyses in common framework for music scholars and students
2. Establish a methodology and tooling so that community can sustain and enhance this resource
www.diggingintodata.org
23. Structural Analysis of Large Amounts of Music Information
[Diagram: 23,000 hours of recorded music; Digital Music Collections; Music Information Retrieval Community; Student-sourced ground truth; Community Software; Supercomputer; Linked Data Repositories.]
26. Segment Ontology
[Diagram: class structure]
Ontology models properties from musicological domain
• Independent of Music Information Retrieval research and signal processing foundations
• Maintains an accurate and complete description of relationships that link them
Kevin Page and Ben Fields
32. [Diagram: several overlapping copies of the same stack: Digital Music Collections; ground truth; Community Software; Expertise; Results; papers; Evaluations; Infrastructure (sociotechnical).]
38. The R dimensions
Reusable. The key tenet of Research Objects is to support the sharing and reuse of data, methods and processes.
Repurposeable. Reuse may also involve the reuse of constituent parts of the Research Object.
Repeatable. There should be sufficient information in a Research Object to be able to repeat the study, perhaps years later.
Reproducible. A third party can start with the same inputs and methods and see if a prior result can be confirmed.
Replayable. Studies might involve single investigations that happen in milliseconds or protracted processes that take years.
Referenceable. If research objects are to augment or replace traditional publication methods, then they must be referenceable or citeable.
Revealable. Third parties must be able to audit the steps performed in the research in order to be convinced of the validity of results.
Respectful. Explicit representations of the provenance, lineage and flow of intellectual property.
“Replacing the Paper: The Twelve Rs of the e-Research Record” on http://blogs.nature.com/eresearch/
39. Research Record
REPRODUCE OR REPEAT?
[Diagram: panels pairing a paper with the Machine, software and workflow (wf) behind it. Arrows labelled "repeat" stay on the same machine and software; arrows labelled "REPRODUCE" lead to a different machine and software.]
blogs.nature.com/eresearch/
41. Computational Research Objects
Research Objects that are
1. The research record for repeatable, reproducible, ... etc
2. Describe process (method) for enactment/execution
3. Usable by machines as well as humans
– Social Objects
– Semantically described
– Programmatically accessible
– Designed for assistance and automation
– Designed for scale and heterogeneity
4. Composable with a distributed computational model?
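Point 3 above (semantically described, programmatically accessible) can be made concrete with a small sketch of a Research Object as an aggregation of resources plus annotations. The class, field names and example URIs are hypothetical, chosen only for illustration; real Research Objects in the Wf4Ever work build on vocabularies such as OAI-ORE rather than this ad hoc JSON.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Resource:
    uri: str
    role: str  # e.g. "workflow", "data", "results", "paper"


@dataclass
class ResearchObject:
    identifier: str
    resources: List[Resource] = field(default_factory=list)
    annotations: Dict[str, str] = field(default_factory=dict)

    def add(self, uri: str, role: str) -> None:
        """Aggregate one more resource into the object."""
        self.resources.append(Resource(uri, role))

    def by_role(self, role: str) -> List[str]:
        """Programmatic access: URIs of all resources with a given role."""
        return [r.uri for r in self.resources if r.role == role]

    def to_json(self) -> str:
        """Serialise the whole object so a machine can consume it."""
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical example content.
ro = ResearchObject("http://example.org/ro/pauls-pack")
ro.add("http://example.org/workflows/16", "workflow")
ro.add("http://example.org/results/1", "results")
ro.add("http://example.org/papers/1", "paper")
ro.annotations["creator"] = "Paul"

print(ro.by_role("workflow"))
print(ro.to_json())
```

Keeping the method (the workflow) inside the same object as the results and the paper is what lets a third party attempt to repeat or reproduce the study instead of only reading about it.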
47. The Order of Social Machines
Real life is and must be full of all kinds of social constraint – the very processes from which society arises. Computers can help if we use them to create abstract social machines on the Web: processes in which the people do the creative work and the machine does the administration… The stage is set for an evolutionary growth of new social engines.
Berners-Lee, Weaving the Web, 1999
49. Dimensions
• Number of people
• Number of machines
• Scale of data
• Varieties of data
• Type of machine problem solving
• Type of human problem solving
• Empowering of individuals, groups, crowds
• Time criticality
• Extent of wide area communication
• Need for urgent mobilization
• Specification of goal state
SOCIAM – The Theory and Practice of Social Machines – commences October 2012, led by Nigel Shadbolt at University of Southampton.
50. Building a Social Machine
[Diagram: a Virtual World (network of social interactions) and a Physical World (people and devices), connected through a Model of social interaction. Arrows labelled "Design and Composition" and "Participation and Data supply".]
Dave Robertson
52. The users of a website, the website, and the interactions between them, together form our fundamental notion of a “machine”
53. That Big Picture
[Diagram: the same two axes as before, "More machines" and "More people", with quadrants Conventional Computation; Big Data / Big Compute (e-infrastructure); Social Networking (online R&D); and The Fourth Quadrant (The Future!).]
54. An Agenda
1. Science has much to learn from an industry / R&D that is already digital end-to-end
• Insights into ICT challenges
2. What can we learn from (e-)Science?
• Metadata capture at source, end-to-end semantics
• Social objects, semantic objects, audio objects?
• Reproducible/reconstructable/machine-assisted analysis/production
• Interactivity, intersection of digital and physical
3. Designing the Social Machines of music*
• Human generated metadata / music?
* And also the music of Social Machines!
56. Links
• Semantic Media
http://semanticmedia.org.uk/
• myExperiment project wiki
http://wiki.myexperiment.org/
• Workflow Forever project (Wf4Ever)
http://www.wf4ever-project.org/
• Future of Research Communication (FORCE11)
http://force11.org/
• Theory and Practice of Social Machines (SOCIAM)
http://sociam.org/
57. • D. De Roure, C. Goble and R. Stevens. The Design and Realisation of the myExperiment Virtual Research Environment for Social Sharing of Workflows. Future Generation Computer Systems 25, pp. 561-567.
• S. Bechhofer, I. Buchan, D. De Roure et al. Why linked data is not enough for scientists. Future Generation Computer Systems.
• D. De Roure and C. Goble. Anchors in Shifting Sand: the Primacy of Method in the Web of Data. WebSci10, April 26-27th, 2010, Raleigh, NC, US.
• D. De Roure, S. Bechhofer, C. Goble and D. Newman. Scientific Social Objects. 1st International Workshop on Social Object Networks (SocialObjects 2011).
• D. De Roure, K. Belhajjame, P. Missier et al. Towards the preservation of scientific workflows. 8th International Conference on Preservation of Digital Objects (iPRES 2011).
• C. A. Goble, D. De Roure and S. Bechhofer. Accelerating scientists’ knowledge turns. Will be available at www.springerlink.com
• K. Belhajjame, O. Corcho, D. Garijo et al. Workflow-Centric Research Objects: First Class Citizens in Scholarly Discourse. SePublica2012 at ESWC2012, Greece, May 2012.
• K. R. Page, B. Fields, D. De Roure et al. Reuse, Remix, Repeat: The Workflows of MIR. 13th International Society for Music Information Retrieval Conference (ISMIR 2012), Porto, Portugal, October 8th-12th, 2012.
• J. Zhao, J. M. Gomez-Perez, K. Belhajjame et al. Why Workflows Break - Understanding and Combating Decay in Taverna Workflows. eScience 2012, Chicago, October 2012.
Editor's Notes
CERN teams up with Leaders in Information Technology to build giant Data Grid. Data accumulation rate: 10 Petabytes per year (equivalent to about 20 million CD-ROMs). http://public.web.cern.ch/press/pressreleases/Releases2001/PR11.01ECERNopenlab.html
Big Data and Big Compute and Big Society! Look at astronomy, for example. Different rates of progress along axes: one futurological theory says we need a lot more machines to assist because machines scale further than people.
What we didn’t see much in phase 1 was sharing and reuse, but this is essential to harnessing of the new technology. The story on this slide involves sharing in a corridor and we will go on to see how we do it digitally! But it’s an important motivation. It led to new science.
The main diagonal (starting from lower left going to upper right) is the similarity of each chroma time-slice with itself. Naturally, this represents perfect similarity, and as such is the most prominent (darkest red) part of the image. In essence, this main diagonal represents the progression of the piece through time. The key parts of the self-similarity map to inspect are the off-diagonal elements. Prominent blocks and off-diagonals parallel to the main diagonal indicate the strong possibility of repeating sections. In this example, the most prominent case is the "C" section. Off the main diagonal, we notice prominent parallel diagonals (highlighted in white). When we project these off-diagonals vertically and horizontally to the main diagonal, we can infer there is a section of the piece that repeats. In contrast, the "A" block has no strong similarities to other portions of the piece (dominated by blue above and to the right of the main diagonal). The "B" block shows some similarities off the diagonal, but they are not complete. It is therefore possible that the B section is slightly varied, and therefore the second occurrence is labeled "B'". The final structure of this piece is then inferred to be ABCDB'C.
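The reading of the self-similarity map described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the frames are hypothetical three-bin vectors, not real 12-bin chroma features; cosine similarity is an assumed choice of measure; and averaging along each off-diagonal is a deliberately crude way to spot repetition, not the method used in SALAMI.

```python
import math


def cosine(a, b):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def self_similarity(frames):
    """Matrix whose entry [i][j] compares time-slice i with time-slice j."""
    return [[cosine(a, b) for b in frames] for a in frames]


def diagonal_mean(matrix, lag):
    """Mean similarity along the diagonal lying `lag` steps off the main one."""
    n = len(matrix)
    values = [matrix[i][i + lag] for i in range(n - lag)]
    return sum(values) / len(values)


# Hypothetical toy "chroma" frames: a three-frame pattern played twice.
X, Y, Z = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
frames = [X, Y, Z, X, Y, Z]

matrix = self_similarity(frames)

# The main diagonal compares each slice with itself: perfect similarity.
main_diagonal = [matrix[i][i] for i in range(len(frames))]

# A strong off-diagonal at some lag suggests material repeating after that lag.
lags = range(1, len(frames))
best_lag = max(lags, key=lambda lag: diagonal_mean(matrix, lag))

print(main_diagonal)
print(best_lag)
```

In this toy input the strongest off-diagonal sits three frames from the main diagonal, matching the three-frame pattern that repeats. Real recordings give noisier matrices, which is why partial matches such as the B and B' sections need human or more careful algorithmic judgement.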