Towards Open Methods: Using Scientific Workflows in Linguistics
Usage Rights

CC Attribution-ShareAlike License

Presentation Transcript

  • Towards Open Methods: Using Scientific Workflows in Linguistics
    Richard Littauer
    1
  • Various tools, such as Kepler, Taverna, VisTrails, and many others, have been designed so that scientific workflows can be created, executed, and shared among scientists and laboratories.
    Introduction
    2
  • Scientific workflows are typically used to automate the processing, analysis, and management of scientific data.
    They provide a way of tracing provenance and methodologies, which helps foster reproducible science and the publication of executable papers.
    Introduction
    4
  • By providing front-end visualisations and adaptations of shell scripts and manual steps, they make it easier for scientists to do their work, especially when integrating grids, parallel processing, or external databases.
    Introduction
    5
  • How does this relate to Linguistics?
    Many of the workflow systems I've been looking at would work in the field of corpus linguistics if only we had open source databases online to mine.
    Most often, they provide a way of cleaning data and of processing repetitive tasks. This is directly applicable to linguistic work.
    Workflows in Linguistics
    8
  • How does this relate to Open Linguistics?
    Workflows in Linguistics
    9
  • Promote the idea and definition, as specified at opendefinition.org, of open data in linguistics and in relation to language data.
    Act as a central point of reference and support for people interested in open linguistic data.
    Provide guidance on legal issues surrounding linguistic data to the community.
    Build an index of indexes of open linguistic data sources and tools and link existing resources.
    Facilitate communication between existing groups.
    Serve as a mediator between providers and users of technical infrastructure.
    Assemble best-practice guidelines / use cases to create, use and distribute data.
    Open Linguistics
    10
  • Examples
    • Example workflow
    • This grabs the most recent XKCD comic off the web.
    • http://www.myexperiment.org/workflows/1370.html
    16
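
For readers who do not have a workflow system installed, the same task can be sketched in a few lines of Python. This is not the Taverna workflow linked above and does not claim to mirror its steps; it is a stand-in that assumes xkcd's JSON interface at https://xkcd.com/info.0.json and its `num`, `title`, and `img` fields. The function names are made up for illustration.

```python
import json
from urllib.request import urlopen

# Assumption: xkcd serves metadata for the latest comic at this address.
LATEST_COMIC_URL = "https://xkcd.com/info.0.json"


def parse_comic(raw_json: str) -> dict:
    """Pick the comic number, title, and image address out of the JSON text."""
    data = json.loads(raw_json)
    return {
        "number": data["num"],
        "title": data["title"],
        "image_url": data["img"],
    }


def fetch_latest_comic(url: str = LATEST_COMIC_URL) -> dict:
    """Download the metadata for the most recent comic and parse it."""
    with urlopen(url, timeout=10) as response:
        return parse_comic(response.read().decode("utf-8"))
```

Calling `fetch_latest_comic()` needs network access; `parse_comic` can be tried offline on any saved copy of the JSON.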
  • Examples
    • Another example workflow
    • This workflow retrieves relevant documents using an optimized query: a string is appended to the original query so that the search output is ranked by most recent year.
    • http://www.myexperiment.org/workflows/117.html
    18
  • Hypothetical Example
    24
    [Diagram, in build order:]
    Chinese character from a text
    Dictionary Database
    [zhi1], [zi2], [zhi2], [shi2], [ci1]
    Geographical data from researcher
    Character - Proper dialect reading - definition
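
The hypothetical example can be sketched as a small lookup pipeline. Everything in the data below is a placeholder: the `<char>` key, the region names, the pairing of regions with readings, and the definition are invented for illustration, and only the list of candidate readings is taken from the slide. A real workflow would query an online dictionary database instead of a local dict.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder dictionary database: one entry with the candidate readings
# from the slide. The key and the definition are NOT real lexical data.
DICTIONARY = {
    "<char>": {
        "readings": ["zhi1", "zi2", "zhi2", "shi2", "ci1"],
        "definition": "<definition>",
    }
}

# Placeholder geographical data supplied by the researcher:
# which reading is used in which (hypothetical) region.
REGION_READINGS = {
    ("<char>", "region_A"): "zhi2",
    ("<char>", "region_B"): "shi2",
}


@dataclass
class Entry:
    character: str
    reading: str
    definition: str


def lookup(character: str, region: str) -> Optional[Entry]:
    """Return character, dialect reading, and definition, or None if unknown."""
    record = DICTIONARY.get(character)
    if record is None:
        return None
    reading = REGION_READINGS.get((character, region))
    # Accept the regional reading only if the dictionary also lists it.
    if reading not in record["readings"]:
        return None
    return Entry(character, reading, record["definition"])
```

For instance, `lookup("<char>", "region_A")` returns an `Entry` with the reading `zhi2`, while an unlisted region returns `None`.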
  • Use in Linguistics
    • So, if we have a linked network online that is queryable
    • Hypothetically, it should be possible to use current workflow systems to access and download data
    • My hope is to see how feasible this is
    27
  • Use in Linguistics
    30
    Other use:
    Shims: data conversion workflows.
    As seen in the LexInfo slides, there are varying definitions for parts of speech (from 5 to 181 different types). Workflows could be used to standardise these after accessing the database…
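
A shim of this kind can be very small. The sketch below assumes, purely for illustration, that the incoming data uses a handful of Penn Treebank tags and that the target is a coarse set of category labels; neither tag set is taken from the LexInfo slides, and the mapping table and the `X` fallback label are illustrative choices rather than a standard.

```python
# Illustrative mapping from a few Penn Treebank tags to coarse categories.
# A real shim would cover the full source tag set.
PENN_TO_COARSE = {
    "NN": "NOUN",
    "NNS": "NOUN",
    "VB": "VERB",
    "VBD": "VERB",
    "JJ": "ADJ",
    "RB": "ADV",
    "DT": "DET",
    "IN": "ADP",
}

# Label used for any tag the mapping does not cover (an assumption).
UNKNOWN = "X"


def standardise(tagged_tokens, mapping=PENN_TO_COARSE):
    """Convert (word, tag) pairs to (word, coarse category) pairs."""
    return [(word, mapping.get(tag, UNKNOWN)) for word, tag in tagged_tokens]
```

Passing a different `mapping` lets the same function act as a shim between any two tag sets, which is the point of keeping the conversion separate from the analysis steps.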
  • Use in Linguistics
    33
    How does this help Open Methods?
    By keeping track of workflows and workflow systems before they become popular, we can make sure that users upload and share their workflows to a single repository (like myExperiment).
    These could then be used by other linguists, along with data supplements, to produce replications and to check methodology.
  • Use in Linguistics
    35
    How does this help Open Methods?
    Also, most workflow systems are now focusing more on providing provenance solutions.
    This would make linguistics research more sharable, understandable and repeatable.
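
To make the idea of provenance concrete, here is a minimal sketch of a runner that logs every step it applies. It is not how Kepler, Taverna, or any other workflow system records provenance; the function names, the log format, and the two toy cleaning steps are assumptions made for illustration.

```python
import hashlib
import json


def fingerprint(value) -> str:
    """Return a short SHA-256 digest of a JSON-serialisable value."""
    encoded = json.dumps(value, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


def run_with_provenance(data, steps):
    """Apply each (name, function) step in order and log what happened."""
    log = []
    for name, func in steps:
        result = func(data)
        log.append({
            "step": name,
            "input": fingerprint(data),
            "output": fingerprint(result),
        })
        data = result
    return data, log


def lowercase(tokens):
    return [token.lower() for token in tokens]


def drop_punctuation(tokens):
    return [token for token in tokens if token.isalnum()]


tokens, provenance = run_with_provenance(
    ["The", "cat", ",", "sat", "."],
    [("lowercase", lowercase), ("drop_punctuation", drop_punctuation)],
)
```

Each log entry names the step and fingerprints its input and output, so a reader can check that a replication went through the same steps on the same data.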
  • Use in Linguistics
    Current work on this:
    Steiner, Lydia, Peter F. Stadler, and Michael Cysouw. 2011. A Pipeline for Computational Historical Linguistics. Language Dynamics and Change, pp. 89-127.
    37
  • More Information
    Places to look for more information:
    http://notebooks.dataone.org/workflows
    https://kepler-project.org/
    http://www.taverna.org.uk/
    http://www.myexperiment.org
    http://www.mendeley.com/groups/1235381/workflows-in-linguistics/
    Thank you. Questions?
    43