19 November 2019
Research Automation
Pavel Loskot
About me
From telecommunications (since 1996)
– modulation, coding, protocols
– simulations, implementation, testing, …
to other areas (since 2012)
– statistical signal processing
– computational molecular biology
– tactical communications
– IoT for high-power monitoring on railways
– Internet in rural areas
– air-transport services
– renewable energy
Bottom line
– I noticed lots of commonalities across time & space
Evolution of research
Problems of simplicity (1600-1800)
– simple physical models of measurable quantities
Problems of disorganized complexity (1900-1950)
– more complex systems, but with well-defined average behavior
Problems of organized complexity (since 1950’s)
– embracing all factors influencing the whole system
Experimental science (until 1700’s)
– observations and experiments
Theoretical science (until 1940’s)
– mathematical analysis
Computational science (until 2000)
– computer simulations
Big data science (after 2000)
– data collection and knowledge mining
More recent changes
1990’s
– WWW, availability of PCs
– all about telecommunications
2000’s
– open access journals
– exponential growth in research
– availability of cloud computing
– focus on nanotechnology, living systems
2010’s
– digitalization, virtualization, attention economy
– availability of big data and computing (GPU, HPC), rush in ML/AI
– Computer Science everywhere
– global challenges: climate, energy, water
– inter/multi-disciplinary approaches
New reality
– technology has become a commodity
– anything can be designed
– abundant information, ultra-connectivity
– virtual worlds not bounded by limits of physical worlds
New trends
– "what" is far more important than "how"
– surviving oversupply of information
– machine-enhanced reasoning
– research to become a new commodity? → Research as a Service
Some observations
9 circles of scientific hell
– limbo
– overselling
– post-hoc storytelling
– p-value fishing
– creative outliers
– plagiarism
– non-publication
– partial-publication
– falsification
source: Perspectives on Psychological Science, 2012, 7:643
My experience
– plagiarism is tolerated unless blatantly obvious
– it matters not who is first, but who makes use of it first
– 2nd thermodynamic law: it is much easier to collect than to generate ideas
– digital tools (ICT) reduced entry requirements for research (and for many other areas) nearly to zero
– machine learning is used for nearly all (research) problems
– many research fields and methods have matured and can be standardized or are already optimized; this fuels mass production of research results (i.e. papers)
Meta research: research on research
Evolution of product manufacturing is akin to evolution of research manufacturing
Research methods
George Polya’s problem solving techniques (1945)
– If you are having difficulty understanding the problem, draw a picture.
– Assume some solution and see what you can derive from that (go backward).
– If the problem is abstract, try examining a concrete example.
– Try solving a more general problem first (inventor's paradox): more ambitious plans may have more chances of success.
Translations and generalizations
– imitate successful ideas/products/papers/objects by using them in a different or more general context
Generic strategies
– for network-like products and services, distribute and pool resources as required
– combinatorial innovations
it is straightforward to implement these as algorithms and create research robots
Combinatorial innovations
Characteristics
– endless “innovation” opportunities
– not all combinations sensible
– widely exploited in research papers
Aims
– systematic exploration of innovation space
– predict more important combinations
– identify key innovation pathways
– visualize relationships
– can be automated; lightweight machine learning may be sufficient
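A minimal sketch of what systematic exploration of a small innovation space could look like. The component names and the exclusion rule are hypothetical placeholders, not taken from the talk; the point is only that enumerating combinations and filtering out the ones that are not sensible takes a few lines.

```python
from itertools import combinations

# Hypothetical building blocks; any list of methods, models or
# application areas could be used instead.
components = ["machine learning", "IoT", "signal processing", "molecular biology"]

# Hypothetical exclusion rule: pairs judged "not sensible" are skipped.
not_sensible = {frozenset({"IoT", "molecular biology"})}


def candidate_pairs(items, excluded):
    """Enumerate all unordered pairs of items, skipping the excluded ones."""
    return [pair for pair in combinations(items, 2)
            if frozenset(pair) not in excluded]


pairs = candidate_pairs(components, not_sensible)
for first, second in pairs:
    print(f"{first} + {second}")
```

Ranking the remaining pairs (the "predict more important combinations" aim) would need a scoring function, which is left out here because the slides do not specify one.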
Rule-developing experimentation (RDE)
– systematic exploration of ideas and designs in product and service development
Block combinatorial designs (BIBDs, PBDs)
– combinatorial subsets to design laboratory experiments and explore degrees-of-freedom
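As an illustration of a balanced incomplete block design (BIBD), the sketch below checks the Fano plane, the classic design with 7 points, blocks of size 3, and every pair of points appearing together in exactly one block. Read as an experiment plan, the points are treatments and each block is one run of three treatments. The function names are illustrative only.

```python
from itertools import combinations
from collections import Counter

# The Fano plane: a (v=7, k=3, lambda=1) balanced incomplete block design.
points = range(1, 8)
blocks = [(1, 2, 3), (1, 4, 5), (1, 6, 7), (2, 4, 6),
          (2, 5, 7), (3, 4, 7), (3, 5, 6)]


def pair_counts(blocks):
    """Count how many blocks contain each unordered pair of points."""
    counts = Counter()
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] += 1
    return counts


def is_bibd(points, blocks, k, lam):
    """True if every block has k distinct points and every pair of
    points occurs together in exactly lam blocks."""
    counts = pair_counts(blocks)
    return (all(len(set(block)) == k for block in blocks)
            and all(counts[pair] == lam for pair in combinations(points, 2)))


print(is_bibd(points, blocks, k=3, lam=1))
```

With 7 runs instead of all 35 possible triples, every pair of treatments is still tested together once, which is the sense in which block designs explore degrees of freedom economically.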
Towards research automation
Scientific workflows
E. Deelman et al., The future of scientific workflows, IJHPCA, 2018.
Meta-analysis
These approaches are quantitative, i.e. they are tools to process numerical data more systematically and flexibly
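To make the quantitative nature of meta-analysis concrete, here is a minimal sketch of one standard pooling rule, the fixed-effect inverse-variance estimate. It is an assumed example, not necessarily the method the talk has in mind, and the input numbers are made up for illustration.

```python
import math


def pooled_effect(effects, std_errors):
    """Fixed-effect, inverse-variance pooled estimate and its standard error.

    Each study is weighted by 1 / (standard error squared).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total
    return estimate, math.sqrt(1.0 / total)


# Made-up effect sizes and standard errors, not taken from any study.
estimate, se = pooled_effect([0.2, 0.4], [0.1, 0.1])
print(estimate, se)
```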
Paper robots
Scientific papers
– enormous amount of qualitative and quantitative information
– exponentially growing repositories
– permanent information storage
Tasks
– summarize content, key ideas
– identify keywords, trends
– validate results and the consistency of statements across papers
– recommend papers
Revolutionary change
– consumers of scientific knowledge are other robots/machines
other documents and unpublished papers should be included too!
researchgate.net/publication/284157095_Academic_publishing_-_Everything_you_ever_wanted_to_ask
Automating literature search
Case study: comprehensive review of inference models and methods in biochemistry
– Linux scripts (awk & sed programmable filters) to process over 800 documents, mostly papers in PDF
– the scripts were used for semi-automated identification of keywords and for visualizing their appearances in papers
Challenges
– reliable conversion of PDF to text
– formulating proper queries represented by regular expressions and patterns
P. Loskot et al., Frontiers in Genetics, June 2019.
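The case study used awk and sed scripts; the sketch below is a Python analogue of the keyword-counting step, not the original scripts. It assumes the papers have already been converted from PDF to plain text (the first challenge above), and the keyword patterns and file names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical keyword queries expressed as regular expressions;
# real queries would be tuned to the review topic.
PATTERNS = {
    "bayesian": re.compile(r"\bbayes(?:ian)?\b", re.IGNORECASE),
    "maximum likelihood": re.compile(r"\bmaximum[\s-]+likelihood\b", re.IGNORECASE),
}


def keyword_counts(text):
    """Count occurrences of each keyword pattern in one paper's text."""
    return Counter({name: len(pattern.findall(text))
                    for name, pattern in PATTERNS.items()})


def count_across_papers(papers):
    """Map each paper name to its keyword counts.

    `papers` maps a paper name to its already extracted plain text.
    """
    return {name: keyword_counts(text) for name, text in papers.items()}


# Made-up snippets standing in for extracted paper text.
papers = {
    "paper_a.txt": "A Bayesian approach. Bayes rule and maximum-likelihood\nestimation.",
    "paper_b.txt": "Maximum likelihood only.",
}
table = count_across_papers(papers)
for name, counts in table.items():
    print(name, dict(counts))
```

The resulting paper-by-keyword table is the kind of data that can then be plotted to visualize where keywords appear.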
Take home ideas
Many/most research tasks
– repetitive, well-defined procedures
– this includes writing research papers and grant applications
– this can be automated even without ML/AI
Research methods
– ideas everywhere, much easier to collect than create; creativity no longer necessary to become a researcher
– combinatorial innovations most common, but other strategies available (translations, generalizations)
– this can be automated even without ML/AI
Future of research
– meta research (research on research)
– 1st generation large-scale tools: databases of research papers and software (even without ML/AI) can generate more results than even the largest institutions in the world
– 2nd generation tools: include ML/AI
fully automated tools, no driver (researcher)!
Consequences
Exponential research production
– the majority of reported scientific results are true only statistically, i.e. with some level of probability
– a few sources of genuine innovation can propel research in the rest of the world
Attention economy
– self-promoted science turns science into business
– research becomes primarily a problem of social networking, not of problem solving
– creates many security problems
Digitalization
– increases complexity
– creates many security vulnerabilities
– drives virtualization, i.e. detachment from physical reality
Most likely future research leaders
– whoever creates collections of scientific papers (IEEE Xplore, Elsevier and OA publishers, Google Scholar, Sci-Hub)
– whoever creates massive computing/storage infrastructure (Amazon, Google, Microsoft)
Research automation
– research training in how to operate tools, not in scientific reasoning (already happening)
– number of research positions to shrink in proportion to automation, but a boom in the research management sector
P. Loskot, Automation is Coming to Research, July 2018.
