Disambiguating Advanced Computing for Humanities Researchers

Talk at Computing Arts 2004 (July 2004, Newcastle)

Published in: Technology

  • 1. Disambiguating Advanced Computing for Humanities Researchers
    • Baden Hughes
    • Department of Computer Science and Software Engineering, University of Melbourne
    • [email_address]
  • 2. Agenda
    • Towards Data Intensive Research in the Humanities
    • A Thesis
    • Motivation for this Talk
    • The Disambiguation Task
    • Architectural Characteristics
    • Application Execution Models
    • Integration and Middleware
    • Interfaces
    • Extending the Computational Boundaries
    • Conclusion
  • 3. Towards Data Intensive Research in the Humanities
    • Humanities data is plentiful in fields such as history, linguistics, archaeology, musicology, art and literature
    • Large collections of data can be exploited systematically and exhaustively only with the efficiency that automated analysis provides
    • To date, a barrier to the deployment of computational techniques has been the acquisition of data in digital form
    • However, data is becoming available regularly (even freely), and there are strong indications that this tendency will continue into the future
    • Computational analysis of large volumes of data has innate challenges, even for domain experts
    • Renewed engagement with traditional questions in the humanities through this computationally-enabled data-centric approach may allow us to answer old questions, and discover new ones
  • 4. A Thesis
    • We are beginning to discover analytical needs within humanities computing disciplines which also exceed available computational resources, especially with the growing popularity of data-intensive research
    • Since other domains have already approached this point, and engineered solutions to the problem, it is possible for humanities researchers to find synergies, adopting existing methods to solve our own research problems
    • Conversely, we may offer new techniques to other research communities which may in turn enable them to attain their research objectives
    • This symbiosis, derived from the common locus of computational enablement of basic research, offers benefits to humanities researchers to enable them to engage with their research in a new expository fashion
  • 5. Motivation for this Talk
    • Computational tractability is increasingly embedded in humanities research methodologies
    • Despite widespread adoption, humanities computing is often characterised as being “less analytically complex” and on a “smaller scale” when compared to that of more “scientific” computing
    • The inherent scalability of humanities computing solutions has so far gone largely unaddressed, since commodity computing has been deemed sufficient for achieving analytical goals within tolerable timeframes
    • By contrast, in scientific domains, analytical complexity has far surpassed the capacity of commodity computing, and thus new solutions have been sought, and found
    • The adoption of such solutions has allowed scientific research to identify and pursue new avenues of investigation which were previously impossible owing to purely computational constraints
  • 6. The Disambiguation Task
    • Defining “Advanced Computing”:
      • Computational capability beyond that ordinarily available to researchers
      • “Advanced Computing” therefore includes services which allow resource sharing (data, services and computational cycles), selection and aggregation of resources (distributed by topology or geography) for solving large-scale research problems
    • Foundational to enabling humanities researchers to take advantage of these resources and services is the need to understand the typology of the advanced computing landscape, and a lowering of the barrier to entry at both descriptive and technical levels
    • Here we seek to provide an accessible overview of the foundational components of advanced computing, motivated by the desire to inform humanities researchers of the nature of these new paradigms
  • 7. Architectural Characteristics
    • Advanced computing services come in many forms, and have a somewhat interchangeable nomenclature
    • A simple typological approach is useful in understanding the characteristics of a number of common forms
      • Single CPU: what most of us have as workstations – commodity hardware
      • Shared Disk Systems: n-CPUs with a common disk bank – almost commoditised, and an order of magnitude more powerful than a single CPU (eg SMP)
      • Shared Memory Systems: n-CPUs with a common memory bank (increasingly rare)
      • Cluster: local, coordinated computational array, typically with shared storage but based on commodity hardware
      • Grid: distributed, coordinated computational array, typically commodity hardware, heavily dependent on software for integration
    • A range of complementary technologies exists within the advanced computing domain, including:
      • high capacity, low latency bandwidth
      • large online data storage
      • metadata and catalogues
      • instrumentation
  • 8. Application Execution Models
    • Decomposition of existing applications can reveal affinities between in situ processing models and the processing models prevalent in the advanced computing domain
    • Serial/pipeline application execution models are the default for many areas of computing (scientific as well as humanities)
    • Naturally, much greater throughput can be gained through parallel execution models, and this is the basic mode employed in advanced computing of all kinds
    • Parallelism can be derived from a number of areas:
      • data-centric parallelism
      • parametric parallelism
    • In using advanced computing services, finding opportunities to parallelise processing is an important first step
      • Not all tasks can be parallelised; some have greater natural affinity than others (e.g. segmented data, parameter space traversal)
      • There are also lower bounds on the efficiency of parallelisation (the I/O to computation ratio)
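The two sources of parallelism named on this slide can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the talk: the toy token-counting analyses, the function names, and the use of a thread pool from the standard library. For CPU-bound analyses a process pool or a cluster scheduler would be the more likely choice.

```python
from concurrent.futures import ThreadPoolExecutor


def count_tokens(segment: str) -> int:
    """Analyse one data segment: here, a whitespace token count (toy analysis)."""
    return len(segment.split())


def count_long_tokens(text: str, min_length: int) -> int:
    """Analyse the same text under one parameter setting (toy analysis)."""
    return sum(1 for token in text.split() if len(token) >= min_length)


def data_parallel(segments: list[str], workers: int = 4) -> list[int]:
    """Data-centric parallelism: the same analysis over different data segments."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the order of the input segments.
        return list(pool.map(count_tokens, segments))


def parametric_parallel(text: str, min_lengths: list[int], workers: int = 4) -> dict[int, int]:
    """Parametric parallelism: the same data under different parameter values."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda n: count_long_tokens(text, n), min_lengths)
        return dict(zip(min_lengths, counts))


if __name__ == "__main__":
    corpus = ["the quick brown fox", "jumps over", "the lazy dog"]
    print(data_parallel(corpus))
    print(parametric_parallel(" ".join(corpus), [1, 4, 5]))
```

In both functions the units of work are independent of one another, which is the property that makes segmented data and parameter space traversal natural candidates for parallel execution.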
  • 9. Integration and Execution
    • Advanced computing services like grids are typically aggregations of many smaller machines
    • Middleware provides the “glue” which allows advanced computing services to be treated as a single machine for interface purposes
      • There are many middleware vendors, with great diversity in approach and functionality
    • The lower middleware layer is concerned mainly with infrastructure management (and can be largely ignored)
      • Computational service discovery, aggregation and coordination
      • Authentication and Security
      • Instrumentation
    • The upper middleware layer is mainly concerned with execution management (and is one of the points of interface)
      • Batch queuing: application instances in a queue for processing
      • Execution brokering: dynamic determination of parameters, and/or generation of application instances, and monitoring of execution progress
    • Avoiding the middleware layer entirely …
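The upper middleware functions described on this slide, batch queuing and execution brokering, can be sketched in a few lines of Python. This is a single-machine toy under stated assumptions, not the interface of any real middleware product: the `Job` class, `broker`, `run_batch` and `toy_application` are hypothetical names introduced here for illustration.

```python
import itertools
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """One application instance: a job id plus its parameter settings."""
    job_id: int
    params: dict


def broker(parameter_space: dict[str, list]) -> list[Job]:
    """Execution brokering: generate one job per point in the parameter space."""
    names = list(parameter_space)
    combos = itertools.product(*(parameter_space[name] for name in names))
    return [Job(i, dict(zip(names, combo))) for i, combo in enumerate(combos)]


def run_batch(jobs: list[Job], application) -> dict[int, object]:
    """Batch queuing: hold jobs in a FIFO queue and process them in order."""
    pending: queue.Queue = queue.Queue()
    for job in jobs:
        pending.put(job)

    results = {}
    while not pending.empty():
        job = pending.get()
        results[job.job_id] = application(**job.params)
        # Monitoring: report progress after each job completes.
        print(f"completed {len(results)}/{len(jobs)} jobs")
    return results


def toy_application(window: int, threshold: int) -> int:
    """Hypothetical stand-in for a real analysis program."""
    return window * threshold


if __name__ == "__main__":
    jobs = broker({"window": [2, 3], "threshold": [10, 20]})
    print(run_batch(jobs, toy_application))
```

Real middleware would dispatch each queued job to a remote machine rather than run it in the local loop, but the division of labour is the same: the broker decides which instances exist, and the queue decides when each one runs.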
  • 10. Interfaces
    • Interfaces to advanced computational services are polymorphic
      • Some require fundamental changes to technical approach on behalf of the researcher
      • Others are easily integrated and offer simple but functional access
    • Simple batch queue interfaces allow submission, execution and collation of experiment output
      • Globus and derivatives
      • NorduGrid’s ARC
    • Many popular programming languages have native-like support for execution in advanced computational environments
      • C/C++: libraries
      • Java: classes, threads
      • Python: wrappers, modules, threads
      • Perl: wrappers, modules, threads
      • Web Services: services
    • Some specialised frameworks have native support for parallel, clustered and distributed execution
  • 11. Extending the Computational Boundaries
    • Adopting computational approaches can impact our research methodologies
      • Increasing size of raw data collections can be efficiently analysed computationally
      • Increasing complexity of analysis can be facilitated computationally
    • On the horizon is computational capability beyond the bounds imposed by any individual researcher’s computational environment
    • Not only do advanced computing services offer new capabilities, but they also spawn opportunities for new types of research collaboration
    • Motivated by basic scientific enquiry to
      • test the adequacy of answers to questions we thought were answered
      • find answers to older unanswered questions
      • discover new questions
  • 12. Conclusion
    • Advanced computing offers new opportunities in humanities research, both in terms of methodology and technology
    • Advanced computing takes many different forms, each of which could be more or less applicable to individual research programs
    • Humanities computing can offer insights into enablement questions within the scientific and computational community, and our contribution is welcome
    • Challenges remain in the area of increased accessibility of advanced computing services to humanities researchers, but utility-style computing is a key goal of the advanced computing research community
    • Renewed engagement with traditional questions in the humanities through this computationally-enabled data-centric approach may allow us to answer old questions, and hopefully discover new ones waiting to be answered