Doppl Development Introduction


Doppl is a new programming language that aims to provide a natural syntax for implementing parallel algorithms, designing data structures for shared memory applications, and automating message passing among multiple tasks. The name is an abbreviation of `data oriented parallel programming language`.




DOPPL
Data Oriented Parallel Programming Language

Development Diary
Introduction

Diego PERINI
Department of Computer Engineering
Istanbul Technical University, Turkey
2013-04-08

Abstract

This paper is the first development diary entry for Doppl, a new programming language that aims to provide a natural syntax for implementing parallel algorithms, designing data structures for shared memory applications, and automating message passing among multiple tasks. The development lifecycle of the language is planned as consecutive iterations that will be documented separately. Any proposition about language terminology or lexical assets declared in one iteration may change in subsequent diary entries.

1. Introduction

Compiling serial code, whether from a high level language or from virtual machine bytecode, is simply a translation of syntactically valid declarations and instructions into metaphorical Turing machine operations. That machine embodies the philosophy on which current programming languages are designed, and it has proved efficient and complete enough to describe what computation is. Current computers still operate this way, albeit in a slightly improved fashion in which multiple needles work on a single magnetic tape. Although what happens underneath remains the same, the design procedure followed by the programmer has become outdated: current multitasking approaches are workarounds that wrap the serial coding paradigm in easy to use function, class or subroutine libraries. Their efficiency and their contribution to current advances in information technology are undeniable, but multitasked programming requires a more abstract model to represent the simultaneous execution of machine instructions.

The remainder of this paper is set out as follows. Section 2 provides a literature review of programming languages and paradigms that benefit from a parallel programming style. Section 3 outlines the hypothesis behind Doppl. Section 4 describes the research method. Section 5 discusses limitations and future research and concludes the paper.

2. Literature Review

2.1 Declarative Programming Languages

By definition, declarative programming focuses on what computation should be performed rather than how it should be computed. This approach usually forces the programmer to abandon the imperative coding style, since declarations alone are expressive enough to mimic loops as well as consecutive instruction calls.

Functional programming is a subset of declarative programming built on lambda calculus. In its pure form it rules out function side effects and guarantees immutable data structures, since a binding can only be defined in terms of a function. Overwriting a binding is rejected automatically and usually produces a compiler or interpreter error. Immutable data structures cannot be changed in place, so objects or values can only be altered by copying them and creating new bindings, which is why these languages are described as free of side effects. Subroutines without side effects are easy to optimize and deforest because they return the same result for the same inputs. Parallel optimizations are likewise easy to integrate into this kind of language. Haskell, Clojure and Scheme are some of the leading functional programming languages, although only Haskell among them is purely functional.

Domain specific languages such as Make, SQL and regular expressions are also counted as declarative languages. Their declarations often consist of state transition rules and constraint definitions, which are used to define synchronized program executions or element filtering on block data. Provided that the datasets they operate on never change, these languages can also be regarded as free of side effects. Furthermore, as long as execution dependencies allow processes to operate without barriers, these languages can be compiled or interpreted into highly parallelized processor instructions.

2.2 State Machines

Communication, barriers and wait locks in parallel computation often correspond to state changes, or can be encapsulated in metaphorical states, which lets the programmer express algorithms in terms of finite state machines. Data decomposition, message exchange and reduced collections become states of the operated data and the operating processes, while state variables and outputs form the conditions for state changes.

A Moore machine is a type of finite state machine whose output is calculated solely from state variables. The state transition table of a Moore machine associates the outputs of a node with directed edges ending at another node (or at the same node). This property lets the programmer model imperative behavior with deterministic Moore tables, as these tables become roadmaps for imperative function calls.

A Mealy machine is a type of finite state machine whose output is calculated from both state and input variables. The state transition table of such a machine associates a tuple of output and state variables with directed edges ending at another node.
This property lets the programmer model functional behavior with Mealy tables, as these tables become roadmaps for nested and recursive function calls.

2.3 Data Oriented Programming Paradigm

The heavy demand for cache optimization and the multilevel cache mechanisms of multicore processors conflict with the object oriented approach when a computation is interested in a large array of only a few object properties. Assume that a block of memory is filled with allocated objects of type T, each with attributes x, y and z. Any computation on only the x values of all objects of T requires every x, y and z value to be brought into the memory cache, because object oriented languages keep the attributes of an object stored together. This kind of fetching results in a high number of cache misses, because the cache overflows with unnecessary fetches of the unneeded y and z attributes.

Data oriented programming suggests that object arrays should be expressed as structures of attribute arrays instead of arrays of attribute structures, with the side effect that these objects become natively singleton. Object instantiation in this approach results in one append operation on each of the x, y and z arrays instead of the allocation of a new (x, y, z) tuple. Giving up pointer based object references in favor of array indices lets the processor fetch only the required attribute arrays into cache, which yields a relatively high number of cache hits and a dramatic performance improvement for real time, clustered, parallel algorithms. 3D computer graphics shaders are often applied to pixels or vertices using data structures of this type to satisfy real time rendering constraints.

3. Hypothesis

Multitasked programming in currently preferred general purpose languages relies heavily on callback design, automated clustering via hardware accelerated computing (e.g. Microsoft HLSL and the computer graphics pipeline), asynchronous event polling, predefined signals among tasks, and stateful objects or protocols. The methodology targeted for Doppl use cases abstracts these topics away and encourages the programmer to use simple Doppl syntax to achieve the same effects as these patterns, on the assumption that any parallel behavior or computation can still be implemented without these burdens.

Doppl argues that the customary definitions of processes and threads no longer reflect the current purpose of these tools. Despite their differences in operating system implementation, both tools provide the same functionality through different software interfaces and system calls, which gives Doppl the opportunity to merge the two into a single unit known by a widely accepted term: a task.

A task is a specialized computation agent that can be cloned, forked and distributed over a number of processors or cores. Instances of the same task may work on shared, private or composite data, with the restriction that they must all apply the same logic. This limitation forces the programmer to design different tasks for different kinds of computation, which in turn lets the designer model program logic as pipelined MIMD (Multiple Instruction, Multiple Data) flow charts free of language constraints and additional utility concerns. Since different operating systems handle task concurrency differently at the hardware level, the first iteration of Doppl development does not adopt threads or processes as the baseline for tasks and leaves that discussion to later diary entries.

4. Research Method

Doppl is planned to be a compiled language that executes real machine instructions rather than virtual ones. The main reason for this decision is the data oriented design and the relatively high cache hit ratio it targets. Since a Doppl program is assumed to be highly parallel, it can benefit from a multicore environment only if the software is able to interact with that environment directly. Although JIT compilation of bytecode languages is no longer considered slow, abstracting each processor architecture behind a virtual machine hinders the design of a generic, data oriented memory organization template.

The first iteration of Doppl does not name any compilation tool for the language; however, a cross language compiler is likely to be implemented in future iterations. The target language of that compiler is planned to be C, because C can be compiled for almost any type of architecture. The GNU C Compiler (gcc) is the preferred C compiler for creating the final executables.

5. Limitations, Future Research and Conclusion

State machines can simulate loops via circular, recurring transition paths. Since the Doppl ecosystem is formed by states, synchronization points, stateless operations and transition rules (conditionals), the language standard will not cover a loop statement or a block structure for creating loops.
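
The idea of replacing a loop with a circular transition path can be sketched in a general purpose language. The example below is only an illustration written in Python, not Doppl syntax (this diary entry defines none); the state names, the `run` driver and the context dictionary are all hypothetical. It sums the integers 1..n using a transition table in which the `accumulate` state points back to itself until its conditional sends it to `done`.

```python
from typing import Callable, Dict

# Hypothetical state names; they are not part of any Doppl specification.
INIT, ACCUMULATE, DONE = "init", "accumulate", "done"

Context = Dict[str, int]


def on_init(ctx: Context) -> str:
    """Set up the state variables, then pick the first transition."""
    ctx["i"] = 1
    ctx["total"] = 0
    return ACCUMULATE if ctx["i"] <= ctx["n"] else DONE


def on_accumulate(ctx: Context) -> str:
    """Do one unit of work; the transition rule decides whether to recur."""
    ctx["total"] += ctx["i"]
    ctx["i"] += 1
    # Circular path: this state is its own successor while the rule holds.
    return ACCUMULATE if ctx["i"] <= ctx["n"] else DONE


# Transition table: each state maps to the handler that selects its successor.
TRANSITIONS: Dict[str, Callable[[Context], str]] = {
    INIT: on_init,
    ACCUMULATE: on_accumulate,
}


def run(n: int) -> int:
    """Fire transitions until DONE is reached and return the sum of 1..n."""
    ctx: Context = {"n": n}
    state = INIT
    # This 'while' only stands in for the runtime that fires transitions;
    # the algorithm itself is described entirely by the table above.
    while state != DONE:
        state = TRANSITIONS[state](ctx)
    return ctx["total"]
```

Calling `run(4)` walks `init`, then `accumulate` four times, then `done`. The repetition comes from the self-referencing transition rather than from a loop construct in the algorithm's own description.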

Task members will be statically typed and allocated in a way that guarantees data oriented cache formation. Their evaluation, however, can be lazy if the required operations are compute heavy. Lazy operations will never cause non-deterministic results.

The regular functions found in most common programming languages will be available as member traits of tasks, which are closures of functions encapsulated with the relevant member data and accessed via a language operator. The language standard is planned to provide per type traits for default, common operations. User defined traits will also be supported and can be implemented as distinct Doppl tasks by programmers themselves.

Data hiding will only be applied among tasks. Access modifiers will therefore only indicate whether tasks share their members or not. Shared members will always reside in shared memory for high parallelization.

Predefined types, custom types, immutable members, code imports, source code encoding and dynamic data allocations/bindings are left as subjects for future research.
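
The data oriented member layout that statically typed task members are meant to guarantee (described in Section 2.3) can be sketched as follows. This is a minimal illustration in Python rather than Doppl or the planned C target, and the names `PointAoS`, `PointsSoA`, `add`, `sum_x_aos` and `sum_x_soa` are hypothetical. Python is used only to show the shape of the two layouts; the cache behavior itself would depend on a compiled target. The standard `array` module is used because it stores its values contiguously.

```python
from array import array
from dataclasses import dataclass
from typing import List


@dataclass
class PointAoS:
    """Array-of-structures element: x, y and z travel together."""
    x: float
    y: float
    z: float


class PointsSoA:
    """Structure of arrays: one contiguous array per attribute."""

    def __init__(self) -> None:
        self.x = array("d")  # 'd' = C double
        self.y = array("d")
        self.z = array("d")

    def add(self, x: float, y: float, z: float) -> int:
        """Instantiate an 'object' as one append per attribute array.

        The returned index replaces a pointer as the object reference.
        """
        self.x.append(x)
        self.y.append(y)
        self.z.append(z)
        return len(self.x) - 1


def sum_x_aos(points: List[PointAoS]) -> float:
    # Visits every whole object even though only x is needed.
    return sum(p.x for p in points)


def sum_x_soa(points: PointsSoA) -> float:
    # Touches only the x array; y and z are never read.
    return sum(points.x)
```

A computation over x alone reads a single array in the second layout, which is the property the paper relies on for a high cache hit ratio.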

6. References

Haskell: http://www.haskell.org/
Clojure: http://www.clojure.org/
Scheme: http://www.r6rs.org/
Make: http://www.gnu.org/software/make/
Data Oriented Design: http://dice.se/publications/introduction-to-data-oriented-design/ and http://dice.se/publications/data-oriented-interactive-water/
Microsoft HLSL: http://msdn.microsoft.com/en-us/library/windows/desktop/bb509561%28v=vs.85%29.aspx
GNU C Compiler: http://gcc.gnu.org/