20140429 EGU
1. Challenge to the Data-intensive Science in Upper Atmospheric Research in Japan
Y. Koyama*1, K. Kurakawa, Y. Sato, Y. Tanaka, S. Abe, D. Ikeda, M. Nose, A. Shinbori, N. Umemura, T. Iyemori, S. UeNo, M. Yagi, and A. Yatagai
*1 Graduate School of Science, Kyoto University & WDC for Geomag., Kyoto, Japan
This is the final year of this project, and my contract will expire.
2. My viewpoint
Here I position my talk in this session as shown in the figure at right.
Many presenters will talk about the bottom layer, so I would like to emphasize the necessity of connecting these three layers on the Internet.
Tony Hey, Stewart Tansley, & Kristin Tolle (Eds.). (2009). The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research. Retrieved from http://research.microsoft.com/en-us/collaboration/fourthparadigm/default.aspx
Redefined by Y. KOYAMA.
3. CAUTION
To avoid confusion, it is important to clarify the presenter's and listener's positions.
Candidate stakeholders for this topic are:
Researcher,
Data Publisher,
Journal Publisher,
Funding Agency,
President of an Institute,
Voter,
Taxpayer,
and so on.
Today, I speak from a Researcher's position.
4. Problems of Scholarly Communication
From the beginning of journal history:
It has been difficult to reach data.
Tables in the body text and in the appendix are not enough.
Repositories are not enough for Big Data, or for data already released on the Internet.
It has been difficult to reproduce results.
Reproducibility has been poor from the beginning. In addition, we are now facing Big Data.
The number of papers has been increasing dramatically since 2000.
May we ignore this situation?
(Figure: R. Boyle, doi:10.1098/rstl.1665.0007)
5. Linkage between paper and dataset in Japan
The Japan Link Center (JaLC) was established in 2012 as the 9th Digital Object Identifier (DOI) Registration Agency in the world.
JaLC is basically of the same rank as CrossRef and DataCite, but it is also under the umbrella of CrossRef and DataCite simultaneously. (This situation is complicated.)
JaLC will have a function to assign DOIs to datasets, and JaLC's metadata format is compatible with DataCite's.
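The claimed compatibility between JaLC's and DataCite's metadata formats can be illustrated with a minimal sketch. The field names below follow the mandatory properties of the DataCite Metadata Schema; the DOI, dataset title, and publisher are hypothetical placeholders, not a real record.

```python
# Minimal sketch of a DataCite-style metadata record for a dataset DOI.
# Field names follow the mandatory properties of the DataCite Metadata
# Schema; all concrete values here are hypothetical placeholders.

REQUIRED_FIELDS = {"identifier", "creators", "titles", "publisher",
                   "publicationYear", "resourceType"}

def missing_fields(record: dict) -> list:
    """Return the mandatory DataCite fields absent from a record."""
    return sorted(REQUIRED_FIELDS - record.keys())

dataset_record = {
    "identifier": "10.1234/example.dataset.2014",   # hypothetical DOI
    "creators": [{"creatorName": "Koyama, Y."}],
    "titles": ["Example upper-atmosphere observation dataset"],
    "publisher": "Example Data Center",              # hypothetical publisher
    "publicationYear": "2014",
    "resourceType": "Dataset",
}

# An empty list means every mandatory field is present.
print(missing_fields(dataset_record))
```

A registration agency compatible with this schema can accept such a record for a dataset just as CrossRef accepts records for papers; that is the sense of "compatible" assumed here.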
6. Linkage between paper and dataset in the near future
• Europe, the U.S., and Australia are progressing further than Japan.
• In the near future, Data Publication & Citation will spread all over the world.
(Figure labels: DOI, ORCID)
7. How to reproduce the contents of the paper?
Here, I would like to ask you a question again.
[Q] "If you reach the dataset through Data Publication and Citation, can you reproduce the result of the paper?"
My answer is NO.
Reaching the data is just the first step toward reproduction.
Metadata explaining the dataset is needed.
The general metadata from DataCite/JaLC is insufficient; domain-specific metadata, such as IUGONET metadata, is needed.
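The gap between general and domain-specific metadata can be sketched as follows. The generic fields resemble a DataCite/JaLC record; the domain-specific fields (instrument, station, time resolution) are hypothetical examples of what an IUGONET-style record might add, not the actual IUGONET schema.

```python
# Sketch: why general metadata alone is not enough for reproduction.
# Generic fields resemble a DataCite/JaLC record; the domain-specific
# fields are hypothetical illustrations, not the real IUGONET schema.

general = {
    "identifier": "10.1234/example.dataset.2014",  # hypothetical DOI
    "titles": ["Example magnetometer dataset"],
    "publisher": "Example Data Center",
}

domain_specific = {
    "instrument": "fluxgate magnetometer",    # hypothetical field
    "station": "KYOTO",                       # hypothetical field
    "timeResolution": "1 min",                # hypothetical field
    "coordinateSystem": "geomagnetic dipole", # hypothetical field
}

# Fields a reader would need before re-running an analysis.
NEEDED_FOR_REPRODUCTION = {"instrument", "station", "timeResolution"}

def reproducible(record: dict) -> bool:
    """True if the record carries the fields needed to redo the analysis."""
    return NEEDED_FOR_REPRODUCTION <= record.keys()

print(reproducible(general))                         # general metadata alone
print(reproducible({**general, **domain_specific}))  # general + domain metadata
```

The general record locates the dataset; only the merged record tells a reader what was measured, where, and at what cadence.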
8. Lack of Quick Look
Moreover, at least Quick Look and Data Analysis Software code should be shared, and it should be freely distributed.
Code Citation is needed.
Is that all?
9. Cliffs of the Intermediate Data Layer
• The data-analysis procedure written in natural language in the literature is insufficient because of a lack of information.
• The literature and published data need to be connected through intermediate data on the Internet.
• Code that outputs and understands the data-analysis procedure in a machine-readable format is needed.
This figure expresses the situation in which the Intermediate Data layer is not shared on the Internet. (Figure labels: 1st, 2nd, 3rd, 4th; DOI)
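A machine-readable data-analysis procedure of the kind the bullets above call for could be sketched as a simple provenance chain linking published data, intermediate data, and the literature. All DOIs, file names, and operation names below are hypothetical placeholders.

```python
# Sketch: a machine-readable analysis procedure connecting published data,
# intermediate data, and the literature. All DOIs, file names, and
# operation names are hypothetical placeholders.
import json

procedure = {
    "paper": "doi:10.1234/example.paper",        # hypothetical paper DOI
    "steps": [
        {"input": "doi:10.1234/raw.dataset",     # published data
         "operation": "despike",
         "output": "intermediate/despiked.cdf"}, # intermediate data
        {"input": "intermediate/despiked.cdf",
         "operation": "daily_average",
         "output": "intermediate/daily.cdf"},
        {"input": "intermediate/daily.cdf",
         "operation": "plot_figure_3",
         "output": "doi:10.1234/figure.dataset"},
    ],
}

def replay_order(proc: dict) -> list:
    """Return the operations in the order a machine would re-execute them."""
    return [step["operation"] for step in proc["steps"]]

# Serialized, the chain itself becomes shareable intermediate-layer metadata.
print(json.dumps(replay_order(procedure)))
```

Once such a record is shared alongside the dataset, the intermediate layer is no longer hidden: any reader's code can walk the chain from the cited DOI back to the figure in the paper.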
10. Which field is the best testbed to realize this?
Upper Atmospheric Research is the best.
It is far from ELSI (Ethical, Legal, and Social Issues).
It is far from the Big Data generated by mobile sensor devices with GPS.
Its Open Data culture is rooted in the International Geophysical Year (IGY: 1957-1958), and most data have already been shared on the Internet. This is a very good starting point.
IUGONET, ESPAS, and V*Os already exist. Their products and communities become the base.
12. Conclusion
I pointed out the importance of the cooperation of Literature, Intermediate Data, and Published Data on the Internet.
To realize this connection, the following items need to be shared:
Data,
Metadata,
Persistent Identifiers to specify Dataset, Subset, and Granule,
Code to understand metadata, to generate and understand the data-analysis procedure in a machine-readable format, and to visualize and analyze data,
Data Sharing Infrastructure,
and the human resources to realize this!