Master thesis Francesco Serafin
UNIVERSITY OF TRENTO
Department of Civil, Environmental and Mechanical Engineering

Master Thesis in Environmental and Land Engineering

Patterns for the application of modern informatics to the integration of PDEs: the case of the Boussinesq Equation

Author: Francesco Serafin
Advisor: PhD Prof Riccardo Rigon
Co-Advisors: PhD Emanuele Cordano, PhD Giuseppe Formetta

Academic Year 2013 - 2014

CONTENTS

Introduction
1 Setting up a modern environment for scientific computing
   1.1 Before starting
      1.1.1 Software engineering & UML
      1.1.2 Version Control System
      1.1.3 Integrated development environment
   1.2 The advantages of Object-Oriented Programming
      1.2.1 Object and Class
      1.2.2 OOP features
   1.3 Java as programming language for scientific computing
      1.3.1 Java performance
   1.4 The Object Modeling System
   1.5 Lessons to take home
2 The Boussinesq's groundwater equation
   2.1 The physical problem
      2.1.1 Some definitions
      2.1.2 The moisture distribution in a vertical profile
      2.1.3 Phreatic aquifer and its properties
   2.2 The Boussinesq equation
   2.3 Mass conservative scheme for wetting and drying
      2.3.1 Unstructured orthogonal grid
      2.3.2 Spatial discretization
      2.3.3 Time discretization
   2.4 Boundary conditions
      2.4.1 Flux-Based boundary conditions (Neumann)
      2.4.2 Head-Based boundary conditions (Dirichlet)
   2.5 Lesson to take home
3 Software implementation
   3.1 Object-oriented unstructured mesh
      3.1.1 First design step: find different ways of implementation
      3.1.2 Second design step: class hierarchy
      3.1.3 Third step: the implementation
   3.2 Object-oriented differential equations
      3.2.1 First design step: find different implementation ways
      3.2.2 Second design step: class hierarchy
      3.2.3 Third step: the implementation of the Boussinesq's equation (BEq)
   3.3 Input/output management
   3.4 Conclusions
4 Test case
   4.1 Analytical Non-linear Solution with planar topography
      4.1.1 Analytical non-linear one-dimensional Boussinesq Equation
      4.1.2 Comparison between analytical and numerical solution
      4.1.3 Nondimensionalization & maximum norm computation
   4.2 Conclusions
5 Summary
A The structure of the numerical solver utilized
B How to store sparse matrices
   B.1 Triplet form
   B.2 Compressed-row form
   B.3 Matrix-vector multiplication
C About open source Java matrices libraries - focusing on Parallel Colt
D Other comparisons between analytical and numerical simulations
   D.1 Simulations at 5 days
   D.2 Simulations at 15 days
   D.3 Simulations at 20 days
Bibliography

LIST OF SYMBOLS

Other Symbols

Symbol    Description    Dimension    Unit
θw    Volumetric soil water content    [−]    [−]
Tj    Transport coefficient along the j-th edge    [L³ T⁻¹]    [m³/s]
nj    Outward unit vector orthogonal to the j-th edge    [−]    [−]
H    Thickness of the aquifer    [L]    [m]
hw    Total water volume stored in a soil column per planimetric unit area    [L]    [m]
hw    Water volume per unit area between the bedrock and the free surface level    [L³/L²]    [m³/m²]
KS    Saturated hydraulic conductivity    [L T⁻¹]    [m/s]
pi    Planimetric area of the cell    [L²]    [m²]
Q    Source term per planimetric unit area    [L T⁻¹]    [m/s]
Q_i^n    Water-table recharge (or a sink)    [L² T⁻¹]    [m²/s]
qL    Outgoing water flux (water discharge per unit vertical area) normal to the boundary    [L T⁻¹]    [m/s]
s    Drainable porosity    [−]    [−]
t    Time    [T]    [s]
x, y    Planimetric Cartesian coordinates    [L]    [m]
zb    Bedrock elevation    [L]    [m]
zs    Elevation of the terrain surface    [L]    [m]

Greek Symbols

Symbol    Description    Dimension    Unit
δj    Distance between the centers of the i-th and rho(i, j)-th polygons    [L]    [m]
η    Piezometric head    [L]    [m]
λj    Length of the j-th edge    [L]    [m]
∇·    Divergence operator    [L⁻¹]    [1/m]
∇    Gradient operator    [L⁻¹]    [1/m]

ACRONYMS

BC      Boundary conditions
BEq     Boussinesq's equation
CG      Conjugate Gradient
DC      Dirichlet cells
DVCS    Distributed Version Control System
EJML    Efficient Java Matrix Library
EMF     Environmental Modeling Framework
FDM     Finite Difference Methods
FEM     Finite Element Method
FVM     Finite Volume Methods
GC      Garbage Collector
GPL     General Public License
ICC     Incomplete Cholesky Decomposition
IDE     Integrated Development Environment
ILU     Incomplete LU factorization
ILUT    Incomplete LU factorization with Threshold
ISO     International Organization for Standardization
JIT     Just-in-time
JVM     Java Virtual Machine
MMS     Modular Modeling System
ODE     Ordinary Differential Equation
OMS     Object Modeling System
OOP     Object Oriented Programming
PDE     Partial Differential Equations
SWEq    Shallow Water Equation
UJM     Universal Java Matrix Package
UML     Unified Modeling Language
VCS     Version Control System

INTRODUCTION

Mathematical models play a fundamental role in many scientific and engineering fields in today's world. They are used, for example, in geotechnics to evaluate hillslope stability, in weather science to predict weather trends and produce weather reports, in structural design to study the resistance to stress, and in fluid dynamics to compute fluid flows and air flows.

Consequently, mathematical models are evolving all the time: more and more new numerical methods are being invented to solve the Partial Differential Equations (PDEs) that describe physical problems with increasing precision, and more and more complex and efficient processor units are being created to reduce the computational time.

Therefore, the code into which the mathematical models are translated has to be "dynamic" in order to be easily updated on the basis of these continuous developments (Formetta et al. (2014) [16]).

On the other hand, completely different physical problems are often described using similar PDEs. For this reason, the numerical methods which provide solutions to different problems can be the same. This suggests the implementation of an IT infrastructure that hosts a standard structure for solving PDEs and that can serve various disciplines with a minimum of hassle.

This work is focused on the application of what is envisioned above, with the main purpose of creating an abstract code for implementing every type of mathematical model described by PDEs.

We work on hydrological topics but we hope to design a structure of general interest. Obviously the final goal of any work of this type is to find a proper numerical solver, and therefore part of the thesis is devoted to the analysis of the problem under scrutiny and to the description of the solution found.

The thesis is organized as follows. The first chapter introduces software engineering and the tools that improve team work in program development.
These aspects are usually neglected in the practice of environmental engineers. However, acquiring and applying these notions is deemed necessary for collaborative and incremental work, where many people can add their brick of specific knowledge and exploit their work. Positive and negative aspects of object-oriented languages are shown, because OO programming is an important element that helps to obtain short, readable, robust, manageable and portable code.

In the second chapter, the physical problem under analysis, the flow of a phreatic table, is introduced: the model implemented is the Boussinesq's groundwater equation. This equation is central to describing problems like runoff formation, soil moisture distribution, and the prevention of hillslope instability. There are many models that solve this equation, such as Harman and Sivapalan (2009) [22], Harbaugh et al. (2000) [21], Painter et al. (2008) [28]. The implemented model merges the achievements of Brugnano and Casulli (2008) [3] and Casulli (2009) [5] and is further developed by Cordano and Rigon (2013) [10] to include any type of boundary conditions.
It not only solves the BEq but also rigorously treats the "wetting-and-drying" problem with a clean, new numerical method. In fact, it allows a water table to dry up and form patchy spots of saturation without creating any numerical problems, a characteristic considered very important in hillslope hydrology [24, 25].

The third chapter is the original contribution of this thesis. It contains the entire software engineering work that implements the general structure for solving PDEs, and in particular non-linear parabolic ones, discretized on an unstructured grid. The same chapter discusses the application of the abstract structure to the solution of the BEq.

The fourth chapter shows a comparison between the numerical solution and an analytical one, to verify the quality of the model. This concludes a cycle of software engineering in which the software requirements were first delineated, then the system was designed, coded, and finally tested. As developed, the cycle shows the peculiarities and the specificities of the scientific problem under analysis, i.e. solving PDEs, and opens towards the final challenging phases of software engineering: code maintenance and debugging.

The thesis has four appendices. In Appendix A, the structure of the numerical solver utilized is analyzed in depth. In Appendix B, the compressed format used to store the matrices involved in the computation is described. In addition, the algorithm used to compute the matrix-vector multiplication in compressed-row format is shown. In Appendix C, the Parallel Colt library used to implement the parallelized Conjugate Gradient is introduced. Appendix D collects the graphs comparing the analytical and numerical solutions that are not used in Chapter 4.

1 SETTING UP A MODERN ENVIRONMENT FOR SCIENTIFIC COMPUTING

Contents
1.1 Before starting
   1.1.1 Software engineering & UML
   1.1.2 Version Control System
   1.1.3 Integrated development environment
1.2 The advantages of Object-Oriented Programming
   1.2.1 Object and Class
   1.2.2 OOP features
1.3 Java as programming language for scientific computing
   1.3.1 Java performance
1.4 The Object Modeling System
1.5 Lessons to take home

The main purpose of this thesis is to build a new, complete computing environment in which scientific models can be created. Programmers (and consequently scientific model developers) never work alone. The environment is therefore conceived to facilitate the interactions of a working group. Various tools are essential for achieving this:
• to maximize the efficiency of team work, Unified Modeling Language (UML) can be used to design a robust code which allows every programmer to implement a subpart of the code;
• to bring together the various subparts, it is necessary to work on an open remote repository via a Version Control System (VCS);
• to develop a project working with VCS and UML while writing the code, an Integrated Development Environment (IDE) allows you to do everything with only one program;
• to realize a robust, maintainable and portable code, an object-oriented language is necessary;
• the Object Modeling System (OMS) offers the possibility of integrating this simple model with more complex ones, in order to solve interdisciplinary problems without conflicting data, scales, methodologies and input/output formats.

1.1 Before starting

Before starting to analyze the features of the chosen programming language, it is necessary to address some practical aspects like software design, secure backup, and sharing of parts of the code of the project via VCS and a remote repository.
Moreover, there is a category of programs that make it possible to manage all these aspects while working on a project within a single development environment. These programs are the IDEs, and a brief introduction to them is given at the end of this section.

1.1.1 Software engineering & UML

Software engineering is probably the most important part of this work, because it is essential to the design of a complete code structure that will provide a base project as a starting point to implement the scientific model. However, at the same time, this is the least visible part, because it cannot be identified with a single block of work. Indeed, software engineering is the design, development, maintenance, testing and evaluation of the software [48]. There is also a part concerning the systems that make computers, or anything containing software, operative, but it is outside the scope of this work. Thus, software engineering is work done before, during and after the creation of the software.

The UML is a tool which can be used to apply these principles. It is a family of graphical notations¹, backed by a single meta-model², which is designed to provide a standard way of visualizing the design of a system built using the object-oriented style [50, 17]. It has been accepted by the International Organization for Standardization (ISO) as an approved ISO standard since 2000. It is a versatile tool; indeed, as reported in Fowler (2004) [17], there are three ways in which UML is used:
• Sketch: to assist in communicating some aspects of a system;
• Blueprint: forward-engineering (drawing a UML diagram before writing code) and reverse-engineering (building a UML diagram from existing code in order to help understand it);
• Programming language: as more and more work is done in UML and the programming gets increasingly mechanical, it becomes obvious that the programming should be automated. Thus, there are tools that allow developers to draw UML diagrams that are compiled directly to executable code, and the UML becomes the source code.

In this work, UML is used to improve the design of the code during its development.
The main reason for this choice is my total lack of experience in Object Oriented Programming (OOP) and related tools. Good design before structuring the code entails a faster and more orderly drafting of the code.

UML2 describes 13 official types of diagram (see Figure 1), divided into two subgroups: structure diagrams and behaviour diagrams. The kind of structure diagram used is the class diagram. It describes the types of objects in the system and the various types of static relationships between them, the properties and operations of a class, and the constraints that apply to the way objects are connected [17]. Classes are represented by boxes divided into three parts:
• The top part contains the name of the class and the package where the class is located.
• The middle part contains the attributes (fields) of the class.
• The bottom part contains the methods of the class.

¹ The notation is the graphical syntax of the modelling language. In a class diagram, the notation defines how items and concepts such as class, association, and multiplicity are represented.
² The notation does not define every item and concept exactly; it is rather informal. The meta-model improves the rigour of the method without sacrificing its usefulness in defining the concepts of the language.
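As a rough illustration of how the three compartments of a class box correspond to source code, the sketch below shows a small Java class. The class name MeshCell, its fields and its methods are hypothetical and chosen only for this example; they are not taken from the code developed in this work.

```java
// Hypothetical example: MeshCell and all its members are illustrative names,
// not classes from the thesis code.

// Top compartment of the class box: the class name (in a real project the
// package would be declared above the class with a "package" statement).
class MeshCell {

    // Middle compartment: attributes (fields).
    private final int id;
    private final double planimetricArea; // [m^2]
    private double piezometricHead;       // [m]

    MeshCell(int id, double planimetricArea, double piezometricHead) {
        this.id = id;
        this.planimetricArea = planimetricArea;
        this.piezometricHead = piezometricHead;
    }

    // Bottom compartment: operations (methods).
    public int getId() {
        return id;
    }

    public double getPlanimetricArea() {
        return planimetricArea;
    }

    public double getPiezometricHead() {
        return piezometricHead;
    }

    public void setPiezometricHead(double piezometricHead) {
        this.piezometricHead = piezometricHead;
    }

    // Water thickness above the bedrock; a dry cell gives 0 rather than a
    // negative value.
    public double waterThickness(double bedrockElevation) {
        return Math.max(0.0, piezometricHead - bedrockElevation);
    }

    public static void main(String[] args) {
        MeshCell cell = new MeshCell(1, 25.0, 101.5);
        System.out.println("cell " + cell.getId()
                + " water thickness: " + cell.waterThickness(100.0));
    }
}
```

In a class diagram, the private fields of this class would be listed in the middle compartment and the public methods in the bottom one.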

UML Diagram
   Structure Diagram
      Class Diagram
      Component Diagram
      Composite Structure Diagram
      Deployment Diagram
      Object Diagram
      Package Diagram
   Behavior Diagram
      Activity Diagram
      Interaction Diagram
         Sequence Diagram
         Communication Diagram
         Interaction Overview Diagram
         Timing Diagram
      State Machine Diagram
      Use Case Diagram
Figure 1: Classification of UML diagram types

A symbol precedes the methods and fields of the class to specify their visibility:
+ Public
- Private
# Protected
/ Derived
~ Package

There are also several symbols to represent the relationships between classes or objects. The main class-level relationship is the Generalization (inheritance or "is a" relationship): the subclass is considered to be a specialized form of the superclass, while the latter is considered a generalization of the former. The UML graphical representation of a Generalization is a hollow triangle on the superclass end of the line that connects it to one or more subclasses (see Figure 2). The main instance-level relationships are:
• Association: represents basic relationships between objects and is drawn as a line.
• Aggregation: is a variant of the "has a" association relationship, and normally occurs when a class is a collection or container of other classes. In UML, it is graphically represented as a hollow diamond on the containing-class end of the line that connects the contained class to the containing class (Figure 2). The aggregate is semantically an extended object that is treated as a unit in many operations, although physically it is made up of several lesser objects [40].

Figure 2: An example with a class-level and an instance-level relationship: Generalization is blue, while Aggregation is grey

The kind of behavior diagram used is the activity diagram, a technique to describe procedural logic, business processes and work flows. It is very similar to a flowchart; the main difference is that it supports parallel actions [17]. The shapes used to create this type of diagram are (an example is shown in Figure 3):
• Arrows: represent the order in which activities happen.
• Rounded rectangles: represent actions; each action has a single flow coming in and a single flow going out.
• Diamonds: represent decisions; a decision has a single incoming flow and several guarded outbound flows. The guard of each outbound flow is a Boolean condition placed inside square brackets. Diamonds can also represent a merge, i.e. the end of a conditional statement started by a decision. Each merge has multiple input flows and a single output.
• Bars: represent a fork or a join (juncture) of concurrent activities. A fork has one incoming flow and several outgoing concurrent flows, and is the starting point of parallel activities. A join is used to synchronize parallelism: it has several incoming flows and a single outgoing flow, which is taken only when all the incoming flows have reached the join.
• A black circle: represents the initial state of the workflow.
• An encircled black circle: represents the final state.

Figure 3: An example of Activity Diagram

1.1.2 Version Control System

If software engineering covers the design, development, maintenance, testing and evaluation of the code or of the entire project, then a Version Control System (VCS) is a tool that puts these principles into practice, and it is especially useful when working with other programmers. Version control is a system that records changes to a file or a set of files over time so that a specific version can be recalled at a later time. It makes it possible to revert files back to a previous state, compare changes over time, see who last modified something that might be causing a problem, who introduced an issue and when, and more [7].
This is possible because changes are uniquely identified by an alphanumeric code (normally a hexadecimal code), called the revision number.

Version control is very useful to developers who have to track and control changes to source code and possibly to documentation and configuration files. Debugging a code or a project, which is part of software engineering, becomes gradually more complex as the project grows, and fixing one bug can introduce new ones. For the purpose of locating and fixing bugs, it is very important to be able to retrieve and run different versions of the software to determine in which version(s) the problem occurs and, if necessary, to reconstruct the history of the bugs.

We can find three different types of version control:
• Local Version Control Systems. These are simple databases that keep all the changes to a file under revision control. These systems usually work by keeping patch sets, and not by keeping copies of the entire file. For example, this is also done to track changes to configuration files, such as those typically stored in /etc or /usr/local/etc on Unix systems [47]. Obviously, if the local machine breaks down, everything will be lost. This is the risk when the entire history of the project is saved in a single location.
• Centralized Version Control Systems. In software development it is necessary to work as a team. This can be done with a server that contains all the versioned files, while several clients check out files from this central location. This kind of VCS facilitates teamwork. Furthermore, every client can have a certain degree of access to what other clients are doing on the project.
  The biggest flaw of this system comes to light when the central point fails: if the server goes down, nobody can work, or worse, if the hard disks of the server become corrupted and proper backups have not been made, the entire history of the project is lost except for whatever single snapshots users happen to have on their local machines [7].
• Distributed Version Control Systems. With this kind of system, every client repository fully mirrors the server repository. Thus, if the server dies, any of the client repositories can be copied back up to the server to restore it. Compared to a Centralized Version Control System, this system offers a lower probability of losing the entire project. Furthermore, Distributed Version Control Systems (DVCSs) deal quite well with having several remote repositories that they can work with, so it is possible to collaborate with different groups of people in different ways simultaneously within the same project. This allows the setting up of several types of work-flows that are not possible in centralized systems, such as hierarchical models [7].

From this description it should be clear that DVCS is the state of the art in the revision control field. In fact, the code implemented in this work is uploaded through a DVCS to a remote repository. The web hosting service chosen is GitHub, which uses git as DVCS.

Beginning from the latter, git was initially designed and developed by Linus Torvalds for Linux kernel development in 2005, following the breakdown of relations with the commercial company that developed BitKeeper (a proprietary DVCS with which the Linux kernel had been maintained between 2002 and 2005).
While most other VCSs store information as a list of file-based changes, git treats its data more like a set of snapshots of a mini filesystem: every time you commit the state of your project, git takes a picture of what all your files look like at that moment and stores a reference to that snapshot [7]. For efficiency, if a file has not changed, git does not store it again: it only stores a link to the previous identical file it has already stored.

The most important feature of git, however, is that most operations need only local files and resources. Indeed, a git working directory is a fully fledged repository with a complete history and full version tracking capabilities [47]. For example, to browse the history of the project, git does not need to go out to the server to get the history and display it; it simply reads it directly from the local database. This means that the project history can be seen almost instantly, including changes introduced several revisions ago. Most operations can be performed offline, and the network connection is needed only to upload the changes.

The history of revisions is depicted as a graph, both in git and in most DVCSs.

Figure 4: Example history graph of a DVCS (taken from [47])

The graph is a tree with merges (an example is shown in Figure 4): the versions form a main line of development (called trunk or master, in green in Figure 4) with branches (in yellow in Figure 4) leaving the trunk. If there is only the master without branches, the versions form a single line in which each new version is based on its immediate predecessor alone, the HEAD version. Then, the new version is identified as the new HEAD.
In the presence of a branch, the relationship between a version and the version derived from it (the branch) is an arrow, pointing from older to newer in the same direction as time. After branching, if a new revision is based only on the older derived revision (the branch is growing), then it is the new HEAD. When a new revision is based on more than one previous revision, the resulting process is called merging. This is typical when changes occur in multiple branches (most often two, but more are possible), which are then merged into a single branch incorporating both sets of changes; a merge is drawn as a red arrow in Figure 4. In the presence of merges, the resulting graph is no longer a tree, but a rooted directed acyclic graph [47].

Observing Figure 4, there are two other features to mention. The first is represented by a blue rectangle and is called a tag. This functionality marks release points at a specific time in the progress of the work; it is stored as a full object. The second feature is represented by a cyan circle, which marks a Discontinued Development Branch: a branch does not necessarily
have to be merged with the master version; it can simply follow a different development path, as has been done for various Linux distributions.

git can be used via the command line. Assuming that the remote repository has already been created, to run any git command it is necessary to change directory to the git project folder. First, you have to initialize the repository via

$ git init

This creates a new subdirectory named .git that contains all the necessary repository files, the git repository skeleton. The local repository can then be pointed at the remote one; to configure the ssh protocol, the command is:

$ git config remote.origin.url git@github.com:yourUserName/project.git

At this point, nothing in the project is tracked yet. In order to start version controlling existing files, you have to specify the files you want to track with:

$ git add .

followed by a commit:

$ git commit -m "insert your commit message"

Then you can share your tracked files by pushing them upstream. The basic command is:

$ git push [remote-name] [branch-name]

This command only works if you have cloned from a server to which you have write access and nobody has pushed in the meantime. If you are working in a team on the same repository, and someone else pushes upstream before you do, your push will rightly be rejected. You will have to pull down the other work first and incorporate it into yours before you are allowed to push [7]:

$ git pull [remote-name]

In the most popular IDEs, git can be integrated as a plug-in or is installed by default. For Eclipse, the plug-in is called EGit and it allows complete control over your git project via a Graphical User Interface.

GitHub, in turn, is simply a web-based hosting service that uses git as its DVCS. It is mainly used for software development projects.

1.1.3 Integrated development environment

This is the final link of the chain that connects software engineering with code writing.
An IDE is a software application that provides computer programmers with comprehensive facilities for software development [42]. IDEs emerged in order to organize, accelerate, and standardize the development of applications, making them more accurate, faster, more robust and easier to maintain. In particular, an IDE reduces the configuration necessary to piece together multiple development utilities, providing the same set of capabilities as a cohesive unit. The main features of IDEs are:
• a Graphical User Interface that enables users to conveniently manage the task at hand;
• a complete environment for building an application (build automation tools);
• helpful views of the application and components under construction;
• cross-references to provide instant feedback about changes to the application;
• prebuilt frameworks, components, and tools that work together seamlessly;
• debugging facilities to step through the running of the application and identify errors.

The IDE used in application building doubles as a platform for maintaining and enhancing the application. The modular structure of an application built with an IDE enables developers other than the creators to understand, maintain, and enhance it [33]. In general, an IDE improves:

• the development time of applications;
• software engineering (many modern IDEs also have a class browser, an object browser, and a class hierarchy diagram to use in object-oriented software development) and, as a consequence, the modularity and organization of applications;
• the reusability of components and pieces of code;
• maintainability and code development, especially in team projects (sometimes a DVCS/VCS is integrated).

These features can be summed up in one sentence: an IDE reduces the resources needed to build and maintain applications.

The most famous multiple-language IDEs are Eclipse, IntelliJ IDEA, Microsoft Visual Studio, NetBeans, and Oracle JDeveloper.

This work has been developed with Eclipse, an IDE based on an extensible plug-in system. Thus users can customize it or extend its abilities by installing plug-ins written for the Eclipse Platform, such as development toolkits for other programming languages.

A matter worthy of attention is the Eclipse license. The Eclipse IDE is released under the Eclipse Public License (EPL), an open source software license used by the Eclipse Foundation for its software.
EPL-licensed programs can be used, modified, copied and distributed free of charge. The Eclipse IDE is only one of the Eclipse projects. An Open Source Community creates these products; the Eclipse Foundation, a non-profit, member-supported corporation, hosts the Eclipse Open Source projects and helps to cultivate both an Open Source Community and an ecosystem of products and services [36].

1.2 the advantages of object-oriented programming

The story that leads from monolithic programming to OOP can be seen as a sort of “Growing of Abstraction”. Eckel (2003) [14] introduces this concept: all programming languages provide abstractions. In this sense, you can view abstraction like a progress bar, in which the lower level corresponds to the
machine model, or rather the solution space where you are implementing the solution to a problem; on the other hand, there is the problem model that corresponds to the higher level of abstraction, or rather the problem space where the problem exists.

If your abstraction level is low, you are very close to the machine model; the classical example is ASSEMBLY language (and, below it, binary machine code). Many so-called imperative languages that followed (such as FORTRAN, BASIC and C) are abstractions of ASSEMBLY language, and thus come closer to the problem space. However, all these languages require you to think in terms of the structure of the computer, rather than the structure of the problem you are trying to solve. This leads to complex codes that are difficult to read and expensive to maintain.

If the abstraction level is high, you are closer to the problem. An example would be a language like LISP, for which “All problems are ultimately lists”. Quoting Eckel (2003) [14]:

The object-oriented approach goes a step further by providing tools for the programmer to represent elements in the problem space. This representation is general enough that the programmer is not constrained to any particular type of problem. We refer to the elements in the problem space and their representations in the solution space as “objects”. The idea is that the program is allowed to adapt itself to the lingo of the problem by adding new types of objects, so when you read the code describing the solution, you’re reading words that also express the problem. [. . . ] Thus, OOP allows you to describe the problem in terms of the problem, rather than in terms of the computer where the solution will run.
- Bruce Eckel -

Nowadays, OOP offers the highest level of abstraction and the most flexible and powerful languages. Indeed, it provides boundaries for breaking down a large program into smaller structures (the classes), offering features (analyzed in paragraph 1.2.2) that can improve the design, implementation, debugging, and maintenance of codes. This makes it easy to manage large and complex problems. Thus OOP is very useful in implementing elaborate situations, such as numerical problems in scientific computing, in particular given how easy it is to improve and replace entire sections of code.

However, one of the main issues of scientific computing is performance, and OOP is generally slower than imperative programming. The observed performance loss of OOP led to it only being used experimentally by the scientific community, until it was understood, at least in part, that the flexibility of OOP is necessary in order to easily develop ever more complex codes. The reason is quite simple: suppose you have to update a numerical model with a new linear system solver, and you have to untangle 1 000 or more lines of code, poorly commented by other people, perhaps only to replace the call to the old object LinearSystemSolver with the call to the new object LinearSystemSolverEvolution. Making this process easier with OOP could by itself pay for the computational time lost later. Moreover, an imperative code is fast and efficient only if it is well written; otherwise it has the opposite effect, and an imperative code is usually not so easy to write. So which is better: an accurate and efficient object-oriented code, or an approximate monolithic one?
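The solver replacement described above can be sketched in Java as follows. This is only a minimal illustration under stated assumptions: the class names LinearSystemSolver and LinearSystemSolverEvolution come from the text, while the Solver interface, the NumericalModel class and the choice of Jacobi and Gauss-Seidel iterations are hypothetical and are not taken from the code of this work.

```java
import java.util.Arrays;

/** Hypothetical common contract; the name Solver is an assumption made for this sketch. */
interface Solver {
    double[] solve(double[][] a, double[] b);
}

/** "Old" solver: plain Jacobi iteration (chosen only for illustration). */
class LinearSystemSolver implements Solver {
    @Override
    public double[] solve(double[][] a, double[] b) {
        int n = b.length;
        double[] x = new double[n];
        for (int it = 0; it < 200; it++) {
            double[] next = new double[n];
            for (int i = 0; i < n; i++) {
                double sum = b[i];
                for (int j = 0; j < n; j++) {
                    if (j != i) sum -= a[i][j] * x[j];
                }
                next[i] = sum / a[i][i];
            }
            x = next; // Jacobi uses only values from the previous sweep
        }
        return x;
    }
}

/** "New" solver: Gauss-Seidel iteration, which reuses updated values immediately. */
class LinearSystemSolverEvolution implements Solver {
    @Override
    public double[] solve(double[][] a, double[] b) {
        int n = b.length;
        double[] x = new double[n];
        for (int it = 0; it < 200; it++) {
            for (int i = 0; i < n; i++) {
                double sum = b[i];
                for (int j = 0; j < n; j++) {
                    if (j != i) sum -= a[i][j] * x[j];
                }
                x[i] = sum / a[i][i];
            }
        }
        return x;
    }
}

/** Hypothetical numerical model that depends only on the Solver contract. */
class NumericalModel {
    private final Solver solver;

    NumericalModel(Solver solver) {
        this.solver = solver;
    }

    double[] step(double[][] a, double[] b) {
        return solver.solve(a, b);
    }
}

class SolverSwapDemo {
    public static void main(String[] args) {
        // Small diagonally dominant system, so both iterations converge.
        double[][] a = {{4.0, 1.0}, {1.0, 3.0}};
        double[] b = {1.0, 2.0};
        // Swapping the solver is a one-line change; NumericalModel is untouched.
        NumericalModel oldModel = new NumericalModel(new LinearSystemSolver());
        NumericalModel newModel = new NumericalModel(new LinearSystemSolverEvolution());
        System.out.println(Arrays.toString(oldModel.step(a, b)));
        System.out.println(Arrays.toString(newModel.step(a, b)));
    }
}
```

The point of the design is that the model never names a concrete solver: the only line that changes when the solver is replaced is the one that constructs it.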
1.2.1 Object and Class

Object and class are the innovative features of this kind of language. The most important step is understanding the philosophy behind the use of object and class. The next step is to apply these tools within a scientific field. It is difficult to identify an object in a mathematical element, while identifying real objects is quite easy, e.g. in a game like bingo, the objects might be player, card, poster etc. For clarity, an example is shown in the following paragraph.

What is an Object?

An object in OOP can be identified with something similar in a real-world context. It has two characteristics: state and related behaviour. For example, a radio might have states like on, off, current volume and current station, and related behaviours like turn on, turn off, increase volume, decrease volume, seek, scan and tune. A good numerical example could be a matrix: its states could be the number of rows, the number of columns, the determinant, the eigenvectors and the eigenvalues. Its related behaviours could be filling the matrix, computing the determinant, computing eigenvectors and eigenvalues, checking whether the matrix is positive definite, etc.

As has just been stated, translating into the world of OOP, the states become the fields and the behaviours become the methods (variables and functions, respectively, in some programming languages).

What is a Class?

In the real world, there are many individual objects all of the same kind (how many radios are there in the world?). Keeping up the comparison between the real world and the programming world, objects that are identical except for their states during program execution (e.g.
the number of columns of two matrix objects could be different, but the two objects remain of the same kind) are grouped together into classes of objects, and this is where the keyword class originated. But a class is not only a kind of object, it is also a new abstract data type. This is a fundamental concept in OOP. Abstract data types work almost exactly like built-in types: you can create variables of a type (called objects or instances in OOP) and manipulate these variables (called sending messages or requests: you send a message and the object figures out what to do with it) [14], always at a high level of abstraction. A class is useful to a programmer because it can be fitted to a part of the analyzed problem, while an existing data type is designed to represent a unit of storage in a machine, at a low level of abstraction.

1.2.2 OOP features

The most important feature of OOP is data abstraction. However, there are also three other aspects that make this kind of programming very useful. These features are (not in order of importance): inheritance, encapsulation (also improperly called implementation hiding, because encapsulation is a combination of wrapping data and methods within classes and implementation hiding), and dynamic binding (also called polymorphism).
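The matrix example of paragraph 1.2.1 can be made concrete with a short Java sketch. It is an illustration only: the class below is hypothetical and is not the Matrix class of this work, and the determinant is computed by Gaussian elimination with partial pivoting simply because it is short. The fields hold the state, the methods implement the behaviours, and two objects of the same class differ only in their state.

```java
/** A class is a new data type; each Matrix object carries its own state. */
class Matrix {
    // Fields: the state of the object.
    private final int rows;
    private final int columns;
    private final double[][] values;

    Matrix(int rows, int columns) {
        this.rows = rows;
        this.columns = columns;
        this.values = new double[rows][columns];
    }

    int rows() {
        return rows;
    }

    int columns() {
        return columns;
    }

    /** Behaviour "fill the matrix": copies the given entries into the object. */
    void fill(double[][] source) {
        if (source.length != rows) {
            throw new IllegalArgumentException("wrong number of rows");
        }
        for (int i = 0; i < rows; i++) {
            if (source[i].length != columns) {
                throw new IllegalArgumentException("wrong number of columns");
            }
            System.arraycopy(source[i], 0, values[i], 0, columns);
        }
    }

    /** Behaviour "compute the determinant" (Gaussian elimination, partial pivoting). */
    double determinant() {
        if (rows != columns) {
            throw new IllegalStateException("determinant needs a square matrix");
        }
        // Work on a copy so that the state of the object is not modified.
        double[][] m = new double[rows][];
        for (int i = 0; i < rows; i++) {
            m[i] = values[i].clone();
        }
        double det = 1.0;
        for (int k = 0; k < rows; k++) {
            int pivot = k;
            for (int i = k + 1; i < rows; i++) {
                if (Math.abs(m[i][k]) > Math.abs(m[pivot][k])) pivot = i;
            }
            if (m[pivot][k] == 0.0) return 0.0; // singular matrix
            if (pivot != k) {
                double[] tmp = m[k];
                m[k] = m[pivot];
                m[pivot] = tmp;
                det = -det; // a row swap flips the sign
            }
            det *= m[k][k];
            for (int i = k + 1; i < rows; i++) {
                double factor = m[i][k] / m[k][k];
                for (int j = k; j < rows; j++) {
                    m[i][j] -= factor * m[k][j];
                }
            }
        }
        return det;
    }
}

class ObjectAndClassDemo {
    public static void main(String[] args) {
        // Two objects (instances) of the same class with different states.
        Matrix a = new Matrix(2, 2);
        a.fill(new double[][] {{2.0, 1.0}, {1.0, 3.0}});
        Matrix b = new Matrix(3, 3);
        b.fill(new double[][] {{1.0, 0.0, 0.0}, {0.0, 1.0, 0.0}, {0.0, 0.0, 1.0}});
        System.out.println(a.rows() + "x" + a.columns() + " det = " + a.determinant());
        System.out.println(b.rows() + "x" + b.columns() + " det = " + b.determinant());
    }
}
```

Calling a.determinant() is an instance of "sending a message": the caller asks for a result and the object decides how to produce it.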
Inheritance

As seen in paragraph 1.2.1, a class defines a new type of object. For example, in the scientific world, a class can be well represented by a matrix, but there are several types of matrices: sparse, dense, diagonal, etc. So, once you have created the class Matrix.java with all its methods, why should you have to create from scratch a class DenseMatrix.java that might have similar functionality? Inheritance allows the cloning of the original class Matrix.java (called the base class, superclass or parent class) and, where needed, the making of additions and modifications to the clone (called the derived class, inherited class, subclass or child class), such as DenseMatrix.java. The differences between base types and derived types are:

• base types contain characteristics (variables) and behaviours (methods) inherited by all the derived types;
• derived types are new types that not only contain all the members of the existing type but also duplicate the interface of the base class [14].

Indeed, if it is necessary to differentiate the derived class from the original base one, you can:

a. add new methods that are not part of the base class;
b. change the behaviour of an existing base class method; this is referred to as overriding that method.

Encapsulation

Encapsulation is a form of wrapping data and methods within classes in combination with implementation hiding [14]. There are two reasons to use this feature. The first is to establish what the client programmers can and cannot use, restricting the access to the object’s components. The second reason is to separate the interface³ from the implementation, in such a way that the client programmers can use the code by sending messages to the public interface without worrying about the code implementation. However, the programmer who wrote the class can access and change all the code.
There are three keywords to manage the accessibility of the members of an object (listed from most access to least access):

a. Public: everything that is declared public is available to everyone.
b. Protected: protected is closely linked to inheritance. Even when the derived class (the class which inherits from the base class) and the base class are in different packages, the derived class can access the protected members of the base class. In this way the programmer grants access to a particular member to derived classes only. protected also gives package access, that is, other classes in the same package may access protected elements [14].
c. Private: no one can access that member except the class that contains it [14]. In this way, the programmer insulates the member, as other classes in the same package cannot access it either.

³ The interface determines the requests that can be made to a particular object. However, there must be code somewhere to satisfy those requests; this is the implementation part.
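The three access levels listed above can be illustrated with a small Java sketch. The classes DiagonalMatrix and CheckedDiagonalMatrix are hypothetical names introduced only for this example; they are not part of the code of this work.

```java
/** Base class: the storage is hidden, the public methods form the interface. */
class DiagonalMatrix {
    // private: implementation detail, invisible to every other class.
    private final double[] diagonal;

    // public: part of the interface offered to client programmers.
    public DiagonalMatrix(double[] diagonal) {
        // A defensive copy keeps the internal state out of the caller's reach.
        this.diagonal = diagonal.clone();
    }

    public int size() {
        return diagonal.length;
    }

    public double get(int i, int j) {
        return i == j ? diagonal[i] : 0.0;
    }

    public double determinant() {
        double product = 1.0;
        for (double d : diagonal) {
            product *= d;
        }
        return product;
    }

    // protected: available to derived classes (and, in Java, to the same package).
    protected double diagonalEntry(int i) {
        return diagonal[i];
    }
}

/** Derived class: it can use the protected member, but not the private field. */
class CheckedDiagonalMatrix extends DiagonalMatrix {
    public CheckedDiagonalMatrix(double[] diagonal) {
        super(diagonal);
    }

    public boolean isIdentity() {
        for (int i = 0; i < size(); i++) {
            if (diagonalEntry(i) != 1.0) return false;
        }
        return true;
    }
}

class EncapsulationDemo {
    public static void main(String[] args) {
        double[] entries = {2.0, 3.0, 4.0};
        DiagonalMatrix d = new DiagonalMatrix(entries);
        entries[0] = 100.0; // does not affect d, which holds a private copy
        System.out.println(d.determinant());
        System.out.println(new CheckedDiagonalMatrix(new double[] {1.0, 1.0}).isIdentity());
        // d.diagonal[0] = 0.0; // would not compile: diagonal is private
    }
}
```

Because clients can reach the data only through the public methods, the private array could later be replaced by another storage scheme without changing any client code.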
For example, these features can be used in a library: the library creator decides what is available to the client programmer and what is not, while the consumer uses the library without rewriting or changing pieces of code. In general, the consumer wants results from libraries and is not interested in the design decisions which led to a particular implementation of the library.

Dynamic binding

In the example used in the inheritance paragraph, the base class Matrix.java is inherited by DenseMatrix.java, SparseMatrix.java, DiagonalMatrix.java, etc. Dynamic binding (also called polymorphism) is the feature by which the programmer declares an object to be of the base type, for example matrix, while writing the code, while the actual type of the object is decided only at run-time, when it is created as the most suitable derived type, e.g. sparse matrix rather than dense. This feature implies another characteristic: upcasting. It occurs when an object passed to a method is treated as its base type. This allows any derived type to be passed to that method without causing errors. Following this principle, it is possible to write the entire code referring only to the base class, and to decide on the most suitable type of object at run-time.

1.3 java as programming language for scientific computing

Java was not designed to be a scientific computing language, but an analysis of its main positive and negative characteristics shows that it has a great deal to offer and may play an important role in the scientific environment.

Object Oriented Programming

The most important feature is OOP, as the main need in scientific computing is to evolve code, either by improving it or by increasing its complexity, and OOP makes the code easy to maintain. A more in-depth analysis has already been made in section 1.2.
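As a minimal sketch of how inheritance, upcasting and dynamic binding from section 1.2 look in Java, consider the following example. The class names echo the Matrix/DenseMatrix/DiagonalMatrix example of the text, but the code itself, including the Matrices factory and the restriction to square matrices, is a hypothetical illustration and not the implementation used in this work.

```java
/** Base type: client code is written against this class only. */
abstract class Matrix {
    abstract int size();

    abstract double[] multiply(double[] x);

    abstract String kind();
}

/** Derived type storing every entry (square matrices assumed). */
class DenseMatrix extends Matrix {
    private final double[][] values;

    DenseMatrix(double[][] values) {
        this.values = values;
    }

    @Override
    int size() {
        return values.length;
    }

    @Override
    double[] multiply(double[] x) {
        double[] y = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            for (int j = 0; j < x.length; j++) {
                y[i] += values[i][j] * x[j];
            }
        }
        return y;
    }

    @Override
    String kind() {
        return "dense";
    }
}

/** Derived type storing only the diagonal; multiply is overridden accordingly. */
class DiagonalMatrix extends Matrix {
    private final double[] diagonal;

    DiagonalMatrix(double[] diagonal) {
        this.diagonal = diagonal;
    }

    @Override
    int size() {
        return diagonal.length;
    }

    @Override
    double[] multiply(double[] x) {
        double[] y = new double[diagonal.length];
        for (int i = 0; i < diagonal.length; i++) {
            y[i] = diagonal[i] * x[i];
        }
        return y;
    }

    @Override
    String kind() {
        return "diagonal";
    }
}

/** Hypothetical factory: the concrete type is decided at run-time from the data. */
class Matrices {
    static Matrix of(double[][] values) {
        int n = values.length;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j && values[i][j] != 0.0) return new DenseMatrix(values);
            }
        }
        double[] diagonal = new double[n];
        for (int i = 0; i < n; i++) {
            diagonal[i] = values[i][i];
        }
        return new DiagonalMatrix(diagonal);
    }
}

class DynamicBindingDemo {
    /** Accepts the base type: any derived type can be passed in (upcasting). */
    static double sumOfProduct(Matrix m, double[] x) {
        double sum = 0.0;
        for (double v : m.multiply(x)) { // the overridden method is chosen at run-time
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] ones = {1.0, 1.0};
        Matrix a = Matrices.of(new double[][] {{2.0, 0.0}, {0.0, 3.0}});
        Matrix b = Matrices.of(new double[][] {{1.0, 2.0}, {3.0, 4.0}});
        System.out.println(a.kind() + ": " + sumOfProduct(a, ones));
        System.out.println(b.kind() + ": " + sumOfProduct(b, ones));
    }
}
```

The method sumOfProduct never mentions DenseMatrix or DiagonalMatrix, so further matrix types can be added later without touching it.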
Java Virtual Machine

Java is portable at both the source and object format levels. This means that, considering the main formats of a Java file⁴, all types are expected to behave in the same way on any computer with the appropriate Java compiler and Java Virtual Machine (JVM) as the standard run-time system. The motto of the first Java developers was “write once, run anywhere” (WORA). This has become fundamental in the academic environment, as Java runs on virtually every University platform in the world, making any code easily distributable via any type of VCS. Certainly, the JVM increases the computational time, but this question is particularly thorny and is therefore treated in paragraph 1.3.1.

Memory management

Memory management is another important feature: each object requires resources (memory) in order to exist, and it is very difficult to know if and when it can be destroyed after use.

⁴ The source format is the code in a .java file, while the object format is the bytecode in a .class file.
In general, some programming languages like C++ allow full memory management during the writing of the code. Thus, you can decide in which type of memory to allocate a particular variable, when to allocate it and when to destroy it. For a computer specialist, these operations are daily routine with a low difficulty level. For a non-specialist, the risk of memory leaks is very high, especially when the focus is on solving mathematical/numerical problems rather than on computer tricks. For example, in C++ a pointer is a variable that contains the memory address of an object: if, in writing your code, you discard the pointer without deallocating the memory occupied by the pointed object, a memory leak is generated, which is especially harmful if you create such objects in a loop, increasing the memory usage without control.

The JVM, on the other hand, has the Garbage Collector (GC), which attempts to reclaim memory occupied by objects that are no longer in use by the program. In this way, the programmer is free from manually dealing with memory deallocation, and certain categories of bugs, like some kinds of memory leaks, are eliminated.

Moreover, the GC performs a sort of memory defragmentation. As reported in [30], this feature is very useful because when you create a new object in Java, the JVM automatically allocates a block of memory large enough to fit the new object on the heap. Repeated allocation and reclamation leads to memory fragmentation, which is similar to disk fragmentation. This phenomenon implies two problems:

1. Reduced allocation speed: the JVM tracks free memory in lists organized by block size. To create a new object, Java searches through the lists to select and allocate an optimally sized block. Fragmentation slows the allocation process, slowing the execution of the application.
2.
Allocation errors: allocation errors happen when fragmentation becomes so great that the JVM is unable to allocate a sufficiently large block of memory for a new object.

The defragmenting system compacts memory by moving all live objects to one end of the heap and removing any fragments of free space; therefore allocation becomes much faster.

Obviously, the GC itself has a cost in terms of efficiency and memory consumption. Firstly, whenever the GC is triggered, all application threads are stopped, and the application is suspended for the complete duration of a GC run. The impact of the GC depends on the number of live objects, and not on the number of dead ones, because the GC suspends the execution of the program to ensure the integrity of object trees. Thus, the more live objects there are, the longer the suspension; the more objects die, the faster the GC [30]. Secondly, all Java objects consume an extra 8 B of memory, because the JVM uses these bytes to keep track of the objects, so that it can automatically reclaim memory through the GC when the objects are no longer needed [20].

So, as reported in Eckel (2003) [14], there are two different approaches: the main objective of the C++ language is speed of memory allocation and release, sacrificing flexibility, because it is necessary to know the exact quantity, lifetime, and type of objects while writing the code; the main objective of the Java language, on the other hand, is flexibility, because you don’t know how many objects you need, what their lifetime is, or what exact type they are until run-time. In other words, the storage is managed dynamically. Obviously, the amount of time required to allocate storage on
the heap can be noticeably longer than the time to allocate storage on the stack, because the latter requires a single assembly instruction to move the stack pointer down, while the time to create heap storage depends on the design of the storage mechanism.

Open source project

Java is an open source project. This aspect may appear not to be very important; indeed, it is a philosophical matter: to develop open source software, such as in this work, you cannot use just any programming language, but only an open source one. Sun Microsystems completed the release of Java as free and open source software on May 8, 2007, under the terms of the GNU General Public License (GPL).

1.3.1 Java performance

Java performance requires a separate discussion. A common reaction of advanced numerical programmers to Java as a programming language for their codes is “But Java is too slow!”. While Java is slower than natively compiled languages, it is not as slow as it was some years ago.

In the early days of the JVM, the computational time was very long because the JVM worked by strictly interpreting the bytecode of the .class files. Some people reported that Java programs were up to 500 times slower than the equivalent C or Fortran codes.

Much has changed in the past few years. Today, almost all JVMs for traditional computing devices use Just-in-time (JIT) compiler technology (also known as dynamic translation). JITs operate as part of the JVM, compiling Java bytecode into native machine code at run-time. As reported in Wikipedia [44], in a bytecode-compiled system, source code is translated into an intermediate representation known as bytecode. The latter is not the machine code for any particular computer, and may be portable among computer architectures. The bytecode may then be interpreted by, or run on, a virtual machine.
The JIT compiler reads the bytecode in many sections (or, rarely, in full) and compiles them dynamically into machine language so the program can run faster. Java performs run-time checks on various sections of the code, and this is the reason why the entire code is not compiled at once. Compilation can be done per file, per function or even on any arbitrary code fragment; the code can be compiled when it is about to be executed (hence the name “just-in-time”), and then cached and reused later without needing to be recompiled.

A common motivation for using JIT techniques is to reach or surpass the performance of static compilation, while maintaining the advantages of bytecode interpretation. Nowadays, the ratio between C++ and Java execution times is about 1:1.7, but a C++ code can lose most of its performance when the executable file is transferred to a machine with a different architecture.

Thus, the statement “Java is slow” is not completely true; moreover, the positive features of this kind of language far outweigh the loss of computational time.
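The effect of JIT compilation can be observed informally with a small timing sketch such as the one below. It is only an illustration under stated assumptions: the class and method names are hypothetical, the timings depend on the JVM, the hardware and the system load, and a rigorous benchmark would need a dedicated harness. The sketch times the same method over several rounds; on a JVM with a JIT compiler the first rounds are typically the slowest, because the method is still being interpreted.

```java
class JitWarmupDemo {
    /** A small numerical kernel that becomes "hot" when called repeatedly. */
    static double sumOfSquares(int n) {
        double sum = 0.0;
        for (int i = 1; i <= n; i++) {
            sum += (double) i * i;
        }
        return sum;
    }

    /** Times the kernel for the given number of rounds; returns nanoseconds per round. */
    static long[] timeRounds(int rounds, int n) {
        long[] nanos = new long[rounds];
        double sink = 0.0; // accumulates results so the calls cannot be optimized away
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            sink += sumOfSquares(n);
            nanos[r] = System.nanoTime() - start;
        }
        if (sink < 0.0) {
            throw new IllegalStateException("unreachable for positive n");
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeRounds(10, 5_000_000);
        for (int r = 0; r < nanos.length; r++) {
            System.out.println("round " + r + ": " + nanos[r] / 1_000 + " microseconds");
        }
    }
}
```

As a rough cross-check, on a HotSpot JVM the same program can be started with the -Xint option, which runs it in interpreted-only mode; the later rounds then usually stop getting faster.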
1.4 the object modeling system

The code related to this work has been integrated into OMS, an Environmental Modeling Framework (EMF) for environmental model development [15].

Frameworks were developed during the ’90s because issues in the human management of the natural world required the solving of ever more complex interdisciplinary problems. In that period every model described only a single physical phenomenon, thus several of these models often had to be applied within the context of a single project application. Frameworks were built as a supporting structure (integrated modelling framework) around which to integrate several models.

Frameworks are essential to better enable software engineering processes in support of scientific modelling, because they improve not only developer productivity but also the quality and reliability of the software product itself. They allow developers to focus development efforts on supporting unique application requirements (as opposed to developing application infrastructure) [12].

Typically, environmental modellers are not software engineers, hence they are focused on the choice of the best set of equations, algorithms, and mathematical constructs to create environmental models. The EMF manages the software engineering part of the model through black-box programming interfaces that can hide information related to input/output management, code relationships, etc.

In summary, the main purposes of frameworks are to enhance the modularity, reusability, and interoperability of every science and auxiliary component, allowing modellers to focus on the model development.

Regarding OMS, its development started in the early 2000s as a vehicle to migrate the design principles of the Modular Modeling System (MMS), written in C/Motif, into a reusable EMF [12]. It has now reached its third version.
OMS is an interagency project between the USDA-ARS (Agricultural Research Service), the USGS (U.S. Geological Survey) and the USDA-NRCS (Natural Resources Conservation Service), and the aim is to make it an even more advanced modular modelling framework based on enhanced modularity, reusability and interoperability of all its components. OMS provides a consistent and efficient way to:

• create science simulation components;
• develop, parameterize, and evaluate environmental models and modify/adjust them as science advances;
• re-purpose environmental models for emerging customer requirements.

OMS3 comprises four primary architectural foundations: modelling resources, the system knowledge base, development tools, and modelling products (see Figure 5). The OMS3 core consists of an internal metadata knowledge base for model and simulation creation. A simulation in OMS is defined as a collection of resources (parameter sets, input data, modelling components, model execution methods, etc.) required to produce the desired modelling outputs. The system supports harnessing metadata from various sources, including natural resources databases (e.g. land use/cover, soil), web-based data provisioning services, version control systems
and/or other code repositories. These are incorporated into the framework knowledge base, which various OMS3 development tools employ to create modelling products. OMS3 modelling products include science components and complete models, simulations supporting parameter estimation and sensitivity/uncertainty analysis, output analysis tools (e.g., statistical evaluation and graphical visualization), modelling audit trails (i.e., reproducing model results for legal purposes), and miscellaneous technical/user documentation [12].

Figure 5: OMS3 principle framework architecture

1.5 lessons to take home

In summary, the most suitable base IT tools for the development of modern codes have been chosen in this chapter. These tools make it possible to:

• work in teams, splitting the work to be done (UML) and sharing the work already done (DVCS);
• write short, readable, robust, correct, manageable, and documented programs (OOP & Java);
• have total control over the previous tools using only one program (IDE & Eclipse).

Thus, a complete environment is now available to realize a scientific model. Environmental phenomena are usually described by Ordinary Differential Equations (ODEs) and PDEs. Therefore, to prove the advantages given by these tools, they are applied to an example of a parabolic PDE: the Boussinesq groundwater equation.

Before implementing the code structure, in the next chapter the physical problem is analyzed and the mathematical steps to convert the PDE into a linear system, through spatial and time discretizations, are described.
2 THE BOUSSINESQ'S GROUNDWATER EQUATION

Contents
2.1 The physical problem
2.1.1 Some definitions
2.1.2 The moisture distribution in a vertical profile
2.1.3 Phreatic aquifer and its properties
2.2 The Boussinesq equation
2.3 Mass conservative scheme for wetting and drying
2.3.1 Unstructured orthogonal grid
2.3.2 Spatial discretization
2.3.3 Time discretization
2.4 Boundary conditions
2.4.1 Flux-Based boundary conditions (Neumann)
2.4.2 Head-Based boundary conditions (Dirichlet)
2.5 Lesson to take home

As seen in the introduction, flow through porous media must be described both to estimate the water stored in the groundwater zone and to track the movements of the water table, in order to prevent hydro-geological instability. The Boussinesq groundwater equation describes this phenomenon in a phreatic aquifer.

2.1 The physical problem

This section briefly summarizes the physics of the problem. It is based on the work of Bear (1988) [2]. First, the most important definitions are presented.

2.1.1 Some definitions

An aquifer is a geological formation, or stratum, that contains water and permits significant amounts of water to move through it under ordinary field conditions. In contradistinction, an aquiclude is a formation that may contain water (even in appreciable amounts) but is incapable of transmitting significant quantities under ordinary field conditions, e.g. a clay layer. Thus, for all practical purposes, an aquiclude is considered an impervious formation. An aquitard is a semi-pervious geological formation that transmits water at a very slow rate compared to the aquifer. However, over a large (horizontal) area it may permit the passage of large amounts of water between the adjacent aquifers that it separates. An aquifuge is an impervious formation that neither contains nor transmits water.
Groundwater is a term used to denote all waters found beneath the ground surface. However, the groundwater hydrologist, who is primarily concerned with the water contained in the zone of saturation, uses the term groundwater to denote water in this zone, and this is the meaning used in
this work. That portion of rock not occupied by solid matter is the void space (also called pore space, pores, interstices or fissures). This space contains water and/or air. Only connected interstices can act as elementary conduits within the formation.

2.1.2 The moisture distribution in a vertical profile

In order to analyze the physics of the problem, it is essential to describe accurately the moisture distribution in a vertical profile. Subsurface water may be divided vertically into zones depending on the relative proportion of the pore space occupied by water: a zone of saturation, in which all pores are completely filled with water, and an overlying zone of aeration, in which the pores contain both gases (mainly air and water vapour) and water.

The physical problem may thus be described as follows: once water arrives on the ground surface, either by precipitation or irrigation, it slowly infiltrates and moves downwards, following the gravitational field. Having reached the impervious bedrock, the water begins to accumulate, filling all interconnected pores. A zone of saturation is thus formed, bounded above by a water table (or phreatic surface), which identifies the surface at atmospheric pressure.

The remaining part of the ground is the zone of aeration, which can be divided into three sub-zones. Moving from the bottom upwards, the first layer is the capillary zone (or capillary fringe). Its thickness depends on the soil and on the uniformity of pore size: in coarse materials this layer is practically non-existent; in fine materials (e.g. clay), the thickness may be 2 to 3 m or more. Within the capillary zone there is usually a gradual decrease in moisture content with height above the water table: the pressure is less than atmospheric and, moving higher, only the smaller connected pores contain water.
On a hillslope, the flow in the unsaturated zones may be of primary importance, while in the flat areas of a basin the saturated zone below the water table is much thicker than the capillary fringe, so the flow in the latter is often neglected.

Above the capillary zone lies the intermediate zone. This extends from the upper limit of the capillary fringe to the lower edge of the soil water zone. Normally, water moves downward through this zone as gravitational water; but when the water table is too high, this layer does not exist and the capillary fringe may extend into the soil water zone, or even to the ground surface.

The highest layer is the soil water zone. It is adjacent to the ground surface and is affected by conditions there, e.g. fluctuations of precipitation, irrigation, air temperature and air humidity, and by the presence of a shallow water table. In this layer, water normally moves downwards during infiltration, as in the lower layers, but it may also move upwards by evaporation and plant transpiration. When infiltration is excessive, usually during a short period, the soil in this zone may be almost completely saturated.

2.1.3 Phreatic aquifer and its properties

The difference between unconfined and confined aquifers depends only on the presence or absence, respectively, of a water table. Thus a phreatic aquifer (also called unconfined aquifer or water table aquifer) is one with a
water table (phreatic surface) serving as its upper boundary. Normally, it is bounded below by impervious formations, while a confined aquifer is bounded above and below by impervious formations. In a phreatic aquifer, unlike a confined one, the water level in an observation well (or piezometer) equals the piezometric head and marks the height of the water table. In a confined aquifer (also known as a pressure aquifer), an observation well identifies the piezometric head, which is normally above the base of the confining formation; if the confined aquifer is an artesian aquifer, the elevation of the piezometric surface is above the ground surface.

Lastly, it is useful to define some general properties of an aquifer, in order to link the physical phenomenon with mathematical formulae.

The hydraulic conductivity (KS) indicates the capacity of the aquifer material to conduct water through it under hydraulic gradients. It is, therefore, a coefficient that depends on solid matrix properties (pore-size distribution, shape of pores, tortuosity and porosity) and on fluid properties (density and viscosity).

The aquifer transmissivity (T) indicates the ability of the aquifer to transmit water through its entire thickness, when the flow is essentially horizontal. It is the product of the hydraulic conductivity and the thickness of the aquifer.

The storativity (S) of an aquifer indicates the relationship between the changes in the quantity of water stored in an aquifer and the corresponding changes in the elevation of the phreatic surface for an unconfined aquifer. The storativity can be defined as the volume of water released from (or added to) a vertical column of an aquifer with a unit horizontal cross-section, per unit of decline (or rise) of the phreatic surface.

In this work, the focus is on the groundwater zone of an unconfined aquifer, with the aim of identifying the water table and its movements.
2.2 The Boussinesq equation

The major part of this work is based on the BEq, the governing equation of groundwater flow. It is based on Darcy's Law and the continuity equation and, thus, it has been derived from the hydraulic theory of unconfined, saturated groundwater flow in a sloping aquifer. The BEq used in this work has the following two-dimensional form:

\[ \frac{\partial h_w(\eta,x,y)}{\partial t} = \nabla\cdot\left[K_S(x,y,z)\,h(\eta,x,y)\,\nabla\eta\right] + Q(x,y), \tag{2.1} \]

where η is the unknown piezometric head (it can be seen as the water-table elevation); x, y are the planimetric cartesian coordinates; t is time; hw(η, x, y) is the total water volume stored in a soil column of unit planimetric area; h(η, x, y) is the thickness of the aquifer, which is a function of η and space since it is defined as h := max(0, η − zb(x, y)), where zb(x, y) is the bedrock elevation; ∇· is the divergence operator; ∇ is the gradient operator; Q(x, y) is a source term per unit planimetric area, which is also used to impose Boundary conditions (BC); KS(x, y, z) is the saturated hydraulic conductivity.

Multiplying hw(η, x, y) by the domain area gives the total volume of water stored, so this quantity must be computed. As shown in
Cordano and Rigon (2008) [9], considering a vertical hydrostatic distribution of soil water pressure:

\[ h_w(\eta,x,y) = \int_{-\infty}^{z_s} \theta_w(x,y,z,\eta-z)\,dz, \tag{2.2} \]

where zs is the elevation of the terrain surface, and θw(x, y, z, ψ) is the volumetric soil water content. Then, the drainable porosity s(x, y) is obtained by differentiating hw(η, x, y) with respect to η:

\[ s(x,y) = \frac{\partial h_w(\eta,x,y)}{\partial \eta}. \tag{2.3} \]

Equation (2.1) is written in conservative form, i.e. the time derivative acts on the total stored water volume per unit area, which is the conserved quantity, and the divergence term on the right-hand side is the divergence of the mass flux. By applying definition (2.3) and the chain rule to the left-hand side of (2.1), the BEq can be rewritten in the form usually found in papers and manuals [2, 21, 28]:

\[ s(x,y)\,\frac{\partial \eta}{\partial t} = \nabla\cdot\left[K_S(x,y,z)\,h(\eta,x,y)\,\nabla\eta\right] + Q(x,y). \tag{2.4} \]

2.3 Mass conservative scheme for wetting and drying

Numerical approximation is typically used to solve the Boussinesq groundwater equation. Choosing the most appropriate numerical method is not simple, however, because the mass balance must not contain errors, especially in the presence of wetting and drying.

The first step is to choose the equation, because solving (2.4) is not equivalent to solving (2.1). Indeed, the water level η(x, y) is not a smooth function at the edge of the wet zones, so equation (2.1) is the best choice in order to conserve mass.

The next step is to choose the numerical method. To solve a generic free-surface hydrodynamic problem, i.e. the Shallow Water Equation (SWEq) or the BEq, Brugnano and Casulli [3] found a rigorous numerical method. This method is conservative and can simulate wetting and drying fronts. Thus, the Boussinesq equation (2.1) can be cast as a system of the form:

\[ \overset{\rightharpoonup}{V}(\overset{\rightharpoonup}{\xi}) + T\cdot\overset{\rightharpoonup}{\xi} = \overset{\rightharpoonup}{b}, \tag{2.5} \]

where
• $\overset{\rightharpoonup}{\xi} = [\xi_1, \ldots, \xi_i, \ldots, \xi_{N_p}]^T$ is the array of unknown quantities;
• $\overset{\rightharpoonup}{b} = [b_1, \ldots, b_i, \ldots, b_{N_p}]^T$ is the array of known parameters;
• $\overset{\rightharpoonup}{V}(\overset{\rightharpoonup}{\xi}) = [V_1(\xi_1), \ldots, V_i(\xi_i), \ldots, V_{N_p}(\xi_{N_p})]^T$ are the values of the conserved quantity, i.e. the water volume stored in the i-th cell, a non-linear function of η.
All these "vectors" are denoted by the harpoon symbol, $\overset{\rightharpoonup}{\cdot}$, to distinguish them from space vectors, denoted by $\vec{\cdot}$. The following paragraphs show that the discretization adopted endows the matrix T with properties, specified in Appendix A, that make the system numerically solvable with the Newton method [23] and the conjugate gradient algorithm [31], with convergence to the solution guaranteed a priori [3].

2.3.1 Unstructured orthogonal grid

Before discretizing equation (2.1), the horizontal (x, y) domain is covered by an unstructured orthogonal grid of non-overlapping convex polygons. As reported in Casulli and Walters (2000) [6], in this type of grid a point can be identified within each polygon such that the segment joining the centres of two adjacent polygons and the side shared by the two polygons have a non-empty intersection and are orthogonal to each other. The centre of a polygon does not necessarily coincide with its geometrical centre, but it does coincide with its circumcentre.

Thus the integration domain is divided into Np polygons (grid elements). Each side of a polygon is either a boundary line or the side of an adjacent polygon. When the spatial domain is approximated with the finite volume method, the solution is computed only at the centres of the cells, so the boundary lines are not needed by the method and only the Ns shared edges are numbered. In the example, the domain is divided into 8 polygons and there are 9 numbered shared edges.

Figure 6.: Unstructured grid - the polygons are numbered in red, the shared edges are numbered in blue

The topological connection between the i-th geometric element and the m(i,j)-th adjacent cell can be described through the use of an adjacency matrix.
According to its definition, an adjacency matrix is a square binary matrix whose rows and columns are labelled by the vertices of a graph and which records which vertices (or nodes) of the graph are adjacent to which other vertices. In this case, the graph vertices are the circumcentres of the cells of the integration domain, thus:

\[ A_{ij} = \begin{cases} 1 & \text{if the } j\text{-th polygon is adjacent to the } i\text{-th polygon} \\ 0 & \text{elsewhere} \end{cases} \tag{2.6} \]

Applying this definition to the example above returns the adjacency matrix Aij of dimensions Np × Np:

\[ A_{ij} = \begin{array}{c|cccccccc}
N_p \times N_p & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ \hline
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
2 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\
3 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
4 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
5 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
6 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 \\
7 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{array} \tag{2.7} \]

However, this matrix does not record which edge is shared by two adjacent polygons. After numbering the shared sides of the polygons, a second "adjacency" matrix of dimension Np × Ns can be defined, such that:

\[ B_{ir} = \begin{cases} 1 & \text{if the } r\text{-th edge is part of the } i\text{-th polygon} \\ 0 & \text{elsewhere} \end{cases} \tag{2.8} \]

\[ B_{ir} = \begin{array}{c|ccccccccc}
N_p \times N_s & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ \hline
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
2 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
3 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
4 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
5 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
6 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 \\
7 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1
\end{array} \tag{2.9} \]

This matrix is also useful to define the subset of the shared edges of each i-th polygon:

\[ \hat{F}_i = \{ r \;:\; B_{ir} = 1 \}. \tag{2.10} \]

The subset F̂i will be used later for the discretization of the divergence term in (2.1).

Now, the two pieces of information collected in the matrices A and B must be combined into a single matrix: the shared edges adjacency matrix Mij. Its computation can be summarized by the statement:

\[ M_{ij} = \begin{cases} r & \text{if } A_{ij} = 1 \text{ and } B_{ir} = B_{jr} = 1 \\ 0 & \text{otherwise} \end{cases} \tag{2.11} \]
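As a minimal sketch of the combination described by (2.11), the following Java fragment builds M from the matrices A and B of the eight-polygon example. The class and method names (SharedEdgeMatrix, build, exampleA, exampleB) are hypothetical and chosen only for illustration, and dense int arrays are an assumption made for readability. The edge shared by two adjacent polygons is taken to be the one that appears in both of their rows of B.

```java
import java.util.Arrays;

/** Sketch of (2.11): combine A and B into the shared-edges adjacency matrix M. */
final class SharedEdgeMatrix {

    /**
     * @param a polygon adjacency matrix A (Np x Np) with 0/1 entries
     * @param b polygon-edge matrix B (Np x Ns) with 0/1 entries; column r-1 is edge r
     * @return M (Np x Np): number of the edge shared by polygons i and j, or 0
     */
    static int[][] build(int[][] a, int[][] b) {
        int np = a.length;
        int[][] m = new int[np][np];
        for (int i = 0; i < np; i++) {
            for (int j = 0; j < np; j++) {
                if (a[i][j] != 1) {
                    continue; // polygons i and j are not adjacent
                }
                for (int r = 0; r < b[i].length; r++) {
                    if (b[i][r] == 1 && b[j][r] == 1) {
                        m[i][j] = r + 1; // edges are numbered from 1 in the text
                        break;
                    }
                }
            }
        }
        return m;
    }

    /** Matrix (2.7) of the eight-polygon example. */
    static int[][] exampleA() {
        return new int[][] {
            {0, 1, 0, 0, 0, 0, 0, 1},
            {1, 0, 1, 0, 0, 0, 0, 0},
            {0, 1, 0, 1, 0, 0, 1, 0},
            {0, 0, 1, 0, 1, 0, 0, 0},
            {0, 0, 0, 1, 0, 1, 0, 0},
            {0, 0, 0, 0, 1, 0, 1, 0},
            {0, 0, 1, 0, 0, 1, 0, 1},
            {1, 0, 0, 0, 0, 0, 1, 0}
        };
    }

    /** Matrix (2.9) of the eight-polygon example. */
    static int[][] exampleB() {
        return new int[][] {
            {1, 0, 0, 0, 0, 0, 0, 0, 1},
            {1, 1, 0, 0, 0, 0, 0, 0, 0},
            {0, 1, 1, 0, 0, 0, 0, 1, 0},
            {0, 0, 1, 1, 0, 0, 0, 0, 0},
            {0, 0, 0, 1, 1, 0, 0, 0, 0},
            {0, 0, 0, 0, 1, 1, 0, 0, 0},
            {0, 0, 0, 0, 0, 1, 1, 1, 0},
            {0, 0, 0, 0, 0, 0, 1, 0, 1}
        };
    }

    public static void main(String[] args) {
        for (int[] row : build(exampleA(), exampleB())) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

In a real mesh the matrices are sparse, so a dense double loop like this one would only be reasonable for small examples.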
Thus a new adjacency matrix Mij, of dimension Np × Np, is defined: each non-zero entry is the number of the side shared between the polygon of the row index and the polygon of the column index.

\[ M_{ij} = \begin{array}{c|cccccccc}
N_p \times N_p & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ \hline
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 9 \\
1 & 1 & 0 & 2 & 0 & 0 & 0 & 0 & 0 \\
2 & 0 & 2 & 0 & 3 & 0 & 0 & 8 & 0 \\
3 & 0 & 0 & 3 & 0 & 4 & 0 & 0 & 0 \\
4 & 0 & 0 & 0 & 4 & 0 & 5 & 0 & 0 \\
5 & 0 & 0 & 0 & 0 & 5 & 0 & 6 & 0 \\
6 & 0 & 0 & 8 & 0 & 0 & 6 & 0 & 7 \\
7 & 9 & 0 & 0 & 0 & 0 & 0 & 7 & 0
\end{array} \tag{2.12} \]

For computational purposes, the element −1 is added on the diagonal of the Mij matrix, and the result is:

\[ M_{ij} = \begin{array}{c|cccccccc}
N_p \times N_p & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ \hline
0 & -1 & 1 & 0 & 0 & 0 & 0 & 0 & 9 \\
1 & 1 & -1 & 2 & 0 & 0 & 0 & 0 & 0 \\
2 & 0 & 2 & -1 & 3 & 0 & 0 & 8 & 0 \\
3 & 0 & 0 & 3 & -1 & 4 & 0 & 0 & 0 \\
4 & 0 & 0 & 0 & 4 & -1 & 5 & 0 & 0 \\
5 & 0 & 0 & 0 & 0 & 5 & -1 & 6 & 0 \\
6 & 0 & 0 & 8 & 0 & 0 & 6 & -1 & 7 \\
7 & 9 & 0 & 0 & 0 & 0 & 0 & 7 & -1
\end{array} \tag{2.13} \]

2.3.2 Spatial discretization

By integrating (2.1) over the i-th polygon, one obtains:

\[ \int_{p_i} \frac{\partial h_w(\eta,x,y)}{\partial t}\,dx\,dy = \int_{p_i} \nabla\cdot\left[K_S(x,y,h)\,h(\eta,x,y)\,\nabla\eta\right]dx\,dy + \int_{p_i} Q(x,y)\,dx\,dy, \tag{2.14} \]

where pi is the planimetric area of the cell. Therefore, observing that the volume of stored water in the i-th cell is:

\[ V_i(\eta_i) = \int_{p_i} h_w(\eta_i,x,y)\,dx\,dy, \tag{2.15} \]

and given that hw is the water volume per unit area between the bedrock and the free surface level (η) as defined by (2.2), and then applying the divergence theorem, the following integrated form of the BEq is obtained:

\[ \frac{\partial V_i}{\partial t} = \sum_{j\in\hat{F}_i} \int_{\lambda_j} K_S(x,y,h)\,h(\eta(x,y))\,(\nabla\eta\cdot\vec{n}_j)\,d\lambda_j + \int_{p_i} Q(x,y)\,dx\,dy, \tag{2.16} \]

where λj is the length of the j-th edge, dλj is the differential along the j-th edge, $\vec{n}_j$ is the outward unit vector orthogonal to the j-th edge, and $\nabla\eta\cdot\vec{n}_j$ is
the derivative (gradient) of η, estimated at the j-th edge and orthogonal to it. Notably, (2.15) shows that the total stored volume depends on the spatial variability of hw within the cell and, therefore, on the sub-grid variability of the drainable porosity, the bedrock and the surface elevation. Due to the orthogonality of the grid, the spatial gradient component can be approximated with the finite difference:

\[ \nabla\eta\cdot\vec{n}_j \approx \frac{\eta_{\rho(i,j)} - \eta_i}{\delta_j}, \tag{2.17} \]

where δj is the distance between the centres of the i-th and the ρ(i, j)-th adjacent polygons. Thus (2.16) is transformed into:

\[ \frac{\partial V_i(\eta_i)}{\partial t} = \sum_{j\in\hat{F}_i} \hat{T}_j(\eta_i,\eta_{\rho(i,j)})\,\frac{\eta_{\rho(i,j)} - \eta_i}{\delta_j} + \int_{p_i} Q(x,y)\,dx\,dy, \tag{2.18} \]

where $\hat{T}_j(\eta_i,\eta_{\rho(i,j)})$ is a transport coefficient along the j-th edge, estimated with an upstream weighting scheme as follows [28]:

\[ \hat{T}_j(\eta_i,\eta_{\rho(i,j)}) := \max\left\{ \int_{\lambda_j} \left[K_S(x,y,h)\,h(\eta_i,x,y)\right]d\lambda_j,\; \int_{\lambda_j} \left[K_S(x,y,h)\,h(\eta_{\rho(i,j)},x,y)\right]d\lambda_j \right\}. \tag{2.19} \]

The main advantage of this upstream weighting estimator, $\hat{T}_j$, is that it prevents water from leaving a nearly dry cell and allows water to flow into initially dry cells during flooding [28]. However, it can also be easily modified on the basis of knowledge of the local variability of the hydraulic transmissivity (the integrand) in the volumes under consideration. Note that $\hat{T}_j$ is a property of the j-th edge and is symmetric with respect to the level of η in adjacent cells, i.e. $\hat{T}_j(\eta_i,\eta_{\rho(i,j)}) = \hat{T}_j(\eta_{\rho(i,j)},\eta_i)$. If the variations of hydraulic conductivity and porosity in space are known, for instance with a stochastic theory of the medium, e.g. [11], the above integral can be estimated with a Monte-Carlo method when the cells are big enough to ensure ergodicity.
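The upstream weighting of (2.19) and the edge flux of (2.18) can be illustrated with a short Java sketch. It rests on a simplifying assumption that is not made in the text: KS and the bedrock elevation zb are taken as constant along the edge, so each integral reduces to KS · max(0, η − zb) · λj. The class and method names (UpstreamTransport, thickness, coefficient, flux) are hypothetical.

```java
/** Sketch of the upstream-weighted transport coefficient (2.19) for one shared edge. */
final class UpstreamTransport {

    /** Aquifer thickness h := max(0, eta - zb), as defined after (2.1). */
    static double thickness(double eta, double zb) {
        return Math.max(0.0, eta - zb);
    }

    /**
     * Transport coefficient of one edge, assuming KS and zb are constant along it.
     *
     * @param ks         saturated hydraulic conductivity on the edge
     * @param edgeLength length lambda_j of the edge
     * @param zb         bedrock elevation on the edge
     * @param etaI       piezometric head in cell i
     * @param etaRho     piezometric head in the adjacent cell rho(i,j)
     */
    static double coefficient(double ks, double edgeLength, double zb,
                              double etaI, double etaRho) {
        double fromI = ks * thickness(etaI, zb) * edgeLength;
        double fromRho = ks * thickness(etaRho, zb) * edgeLength;
        return Math.max(fromI, fromRho); // the wetter side controls the exchange
    }

    /** Contribution of one edge to the sum in (2.18); positive values enter cell i. */
    static double flux(double tHat, double etaI, double etaRho, double delta) {
        return tHat * (etaRho - etaI) / delta;
    }

    public static void main(String[] args) {
        double tHat = coefficient(0.5, 2.0, 1.0, 3.0, 0.5);
        System.out.println("T_hat = " + tHat);
        System.out.println("flux  = " + flux(tHat, 3.0, 0.5, 5.0));
    }
}
```

Swapping the two heads leaves the coefficient unchanged, which is the symmetry property noted above, and the coefficient vanishes only when both cells are dry at the edge.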
2.3.3 Time discretization

Each term in Equation (2.18) depends on time (which has been left implicit so far), and it can be discretized in a semi-implicit way as follows:

\[ V_i(\eta_i^{n+1}) = V_i(\eta_i^{n}) + \Delta t \sum_{j\in\hat{F}_i} \hat{T}_j^{n}\,\frac{\eta_{\rho(i,j)}^{n+1} - \eta_i^{n+1}}{\delta_j} + \Delta t\,p_i\,\hat{Q}_i^{n}, \tag{2.20} \]

where Δt is the time step and all the superscripts indicate the time instant. The transport coefficient $\hat{T}_j$ is estimated at time n (and is therefore known), as is $\hat{Q}_i^{n}$, the water-table recharge (or a sink) averaged over the whole i-th
polygon. The gradient is treated implicitly and the solution of (2.20) must be sought by solving an algebraic non-linear system:

\[ V_i(\eta_i^{n+1}) - \Delta t \sum_{j\in\hat{F}_i} \hat{T}_j^{n}\,\frac{\eta_{\rho(i,j)}^{n+1} - \eta_i^{n+1}}{\delta_j} = V_i(\eta_i^{n}) + \Delta t\,p_i\,\hat{Q}_i^{n}, \tag{2.21} \]

for i = 1 to Np. The system in (2.21) is a particular case of (2.5) where ξi is replaced with $\eta_i^{n+1}$ and the following equalities are taken:

\[ \left(T\cdot\overset{\rightharpoonup}{\xi}^{\,n+1}\right)_i = -\Delta t \sum_{j\in\hat{F}_i} \hat{T}_j^{n}\,\frac{\eta_{\rho(i,j)}^{n+1} - \eta_i^{n+1}}{\delta_j} \tag{2.22} \]

and

\[ b_i = V_i(\eta_i^{n}) + \Delta t\,p_i\,\hat{Q}_i^{n}. \tag{2.23} \]

From (2.22), T turns out to be a symmetric and positive semidefinite matrix, which satisfies the T2 property defined in Appendix A, with diagonal entries:

\[ T_{ii} = \Delta t \sum_{j\in\hat{F}_i} \frac{\hat{T}_j^{n}}{\delta_j} \tag{2.24} \]

and off-diagonal entries:

\[ T_{ir} = -\Delta t\,\frac{\hat{T}_j^{n}}{\delta_j}\,\delta_{r,\rho(i,j)}, \qquad i \neq r, \tag{2.25} \]

where $\delta_{r,\rho(i,j)}$ is the Kronecker delta symbol. It can be shown that in each row of T the sum of the entries is zero:

\[ \sum_{r=1}^{N_p} T_{ir} = 0. \tag{2.26} \]

Finally, Equation (2.21) is rewritten with index notation as follows:

\[ V_i(\eta_i^{n+1}) + \sum_{j=1}^{N_p} T_{ij}\,\eta_j^{n+1} = b_i. \tag{2.27} \]

From (2.22) it follows that the unit vector, [1, . . . , 1]^T, is the eigenvector of T associated with the null eigenvalue. This happens when the hydraulic head is uniformly distributed and, according to Darcy's law, there is no flux. The stability and convergence to the exact solution are verified according to the conditions (A.9) or (A.10). However, it can happen that T is reducible (and then Tii = 0 for a certain value of i and Til = 0 for certain adjacent cells i and l). In this case, no water flux occurs through the sides of some cells and there is a disconnected "wet" domain. Consequently, the matrix T has as many eigenvectors
corresponding to the zero eigenvalue as there are wet domains, i.e. groups of connected cells. The condition for each group of connected cells (to be compared with (A.9)) becomes:

\[ \sum_{i\in\Omega_d} V_i(\eta_i^{n}) + \Delta t \sum_{i\in\Omega_d} p_i\,\hat{Q}_i^{n} > 0, \qquad 1 \le d \le N_{sd}, \tag{2.28} \]

where d is the index of the group of connected cells Ωd.

2.4 Boundary conditions

Equation (2.21), which represents the water balance of a generic cell, is solved by coupling it with BC, which can be either flux BC (Neumann type) or head BC (Dirichlet type). The following sections analyze how each choice of boundary condition modifies the integration properties of the system and, in particular, how it requires the modification of (2.21).

2.4.1 Flux-Based boundary conditions (Neumann)

Neumann BC assign a positive or negative water flux along a certain part of the boundary. This amounts to introducing a further source/sink term for the cells that are adjacent to the boundary of the integration domain, and can be accounted for with the following reformulation of the known term bi:

\[ b_i = V_i(\eta_i^{n}) + \Delta t\,p_i\,\hat{Q}_i^{n} - \Delta t \sum_{j\in F_i/\hat{F}_i} \int_{\lambda_j} q_L(\eta_i^{n},x,y)\,h(\eta_i^{n},x,y)\,d\lambda_j, \tag{2.29} \]

where Fi is like F̂i but includes the boundary edges, so Fi/F̂i is the subset of the edges of the i-th polygon belonging to the boundary, and $q_L(\eta_i^{n},x,y)$ is the outgoing water flux (water discharge per unit vertical area) normal to the boundary. The new source/sink term affects the numerical stability of the method; therefore, (2.28) needs to be modified to account for the outgoing water flux as follows:

\[ \sum_{i\in\Omega_d} V_i(\eta_i^{n}) + \Delta t \sum_{i\in\Omega_d} p_i\,\hat{Q}_i^{n} - \Delta t \sum_{i\in\Omega_d} \sum_{j\in F_i/\hat{F}_i} \int_{\lambda_j} q_L(\eta_i^{n},x,y)\,h(\eta_i^{n},x,y)\,d\lambda_j > 0, \qquad 1 \le d \le N_{sd}. \tag{2.30} \]

However, in this work, this type of BC is computed as a rating curve of outgoing water flux in every i-th cell. Thus equation (2.29) becomes:

\[ b_i = V_i(\eta_i^{n}) + \Delta t\,p_i\,\hat{Q}_i^{n} - \Delta t\,p_i\,c_i\,h(\eta_i^{n},x,y)^{m_i}. \tag{2.31} \]
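A minimal Java sketch of the known term with the rating-curve outflow is given below. It reads mi in (2.31) as the exponent of a power-law rating curve ci · h^mi, which is an interpretation of the flattened source rather than a confirmed definition, and it checks positivity by summing the known terms bi over one group of connected cells, which is how conditions (2.28) and (2.30) can be read. The names KnownTerm, rhs and groupIsAdmissible are hypothetical.

```java
/** Sketch of the known term (2.31) and of a positivity check in the spirit of (2.28)/(2.30). */
final class KnownTerm {

    /**
     * b_i = V_i + dt * p_i * Q_i - dt * p_i * c_i * h_i^(m_i),
     * with the rating-curve exponent m_i taken as an assumption.
     *
     * @param storedVolume V_i at time level n
     * @param dt           time step
     * @param area         planimetric area p_i of the cell
     * @param recharge     source term Q_i averaged over the cell
     * @param c            rating-curve coefficient c_i (0 for a cell with no outflow)
     * @param thickness    aquifer thickness h in the cell at time level n
     * @param m            rating-curve exponent m_i
     */
    static double rhs(double storedVolume, double dt, double area,
                      double recharge, double c, double thickness, double m) {
        double outflowPerUnitArea = c * Math.pow(thickness, m);
        return storedVolume + dt * area * recharge - dt * area * outflowPerUnitArea;
    }

    /** True when the known terms of one group of connected cells sum to a positive value. */
    static boolean groupIsAdmissible(double[] knownTermsOfGroup) {
        double sum = 0.0;
        for (double b : knownTermsOfGroup) {
            sum += b;
        }
        return sum > 0.0;
    }

    public static void main(String[] args) {
        double b = rhs(10.0, 2.0, 5.0, 0.1, 0.01, 2.0, 2.0);
        System.out.println("b_i = " + b);
        System.out.println("admissible: " + groupIsAdmissible(new double[] { b, -3.0 }));
    }
}
```

With c = 0 the method reduces to the interior-cell known term of (2.23).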
2.4.2 Head-Based boundary conditions (Dirichlet)

Dirichlet BC assign the time-variable value of η at some boundary or internal cells according to a known function. Therefore, for any cell i belonging to the boundary:

\[ \eta_i^{n+1} = \hat{\eta}_i(t^{n+1}), \tag{2.32} \]

where $\hat{\eta}_i(t)$ is an external forcing known a priori. Such cells are called Dirichlet cells (DC). In this case, the non-linear system to be solved is formed by (2.32) for the DC and by (2.27) for the other cells. Therefore, such a system has a non-linear part similar to (2.5), i.e. one resulting in a monotonic function of η, but also a linear part which is not symmetric. In fact, for a cell adjacent to a DC, the left-hand side of (2.27) depends on values from the DC. On the contrary, (2.32) does not contain any unknown values from neighbouring cells. Because this asymmetry makes the new algebraic equation system different from (2.5), the procedure described in Appendix A to solve it may not work correctly. It is possible to avoid this problem by splitting the matrix T, defined by (2.5), into two components:

\[ T = T^{ND} + T^{D}, \tag{2.33} \]

where T^ND and T^D are both symmetric matrices constructed as follows:

\[ T^{ND}_{ij} = \begin{cases} T_{ij} & \text{if neither the } i\text{-th nor the } j\text{-th cell is a DC} \\ 0 & \text{elsewhere} \end{cases} \tag{2.34} \]

and

\[ T^{D}_{ij} = \begin{cases} T_{ij} & \text{if either the } i\text{-th or the } j\text{-th cell is a DC} \\ 0 & \text{elsewhere} \end{cases} \tag{2.35} \]

The matrix T^ND contains information about the connection between cells that are not DC, whereas T^D assumes non-zero values only in correspondence of DC. Then, (2.27) becomes:

\[ V_i(\eta_i^{n+1}) + \sum_{j=1}^{N_p} T^{ND}_{ij}\,\eta_j^{n+1} + \sum_{j=1}^{N_p} T^{D}_{ij}\,\eta_j^{n+1} = b_i. \tag{2.36} \]

Equation (2.36) is applied to a non-Dirichlet cell i; in that case T^D_ij can assume non-null values only if j refers to a Dirichlet cell, therefore (2.36) can be re-arranged, using (2.32), as:

\[ V_i(\eta_i^{n+1}) + \sum_{j=1}^{N_p} T^{ND}_{ij}\,\eta_j^{n+1} = b_i - \sum_{j=1}^{N_p} T^{D}_{ij}\,\hat{\eta}_j(t^{n+1}), \tag{2.37} \]

where $\hat{\eta}_j(t^{n+1})$, related to the DC domain, moves to the right-hand side because it is known.
The final solver of the BEq is represented by the system (2.37) for all the non-DC. The matrix T^ND satisfies the appropriate T1 or T2 properties defined in Appendix A, and (2.37) is then solved with the procedure described in Appendix A.
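The assembly of T from (2.24) and (2.25), its splitting according to (2.33) to (2.35), and the right-hand side of (2.37) can be sketched in Java as follows. Dense arrays are used only to keep the example short, whereas a sparse storage is what a real solver would need. The class DirichletSplit, its methods and the edge-list input format are hypothetical and do not reproduce the thesis code.

```java
/** Sketch: assemble T from (2.24)-(2.25) and split it as in (2.33)-(2.35). */
final class DirichletSplit {

    /**
     * @param np    number of polygons
     * @param edges edges[k] = {i, j}: the two cells sharing edge k
     * @param coef  coef[k] = dt * T_hat_k / delta_k for edge k
     */
    static double[][] assemble(int np, int[][] edges, double[] coef) {
        double[][] t = new double[np][np];
        for (int k = 0; k < edges.length; k++) {
            int i = edges[k][0];
            int j = edges[k][1];
            t[i][i] += coef[k]; // diagonal entries, (2.24)
            t[j][j] += coef[k];
            t[i][j] -= coef[k]; // off-diagonal entries, (2.25)
            t[j][i] -= coef[k];
        }
        return t;
    }

    /** Returns {T^ND, T^D}: an entry goes to T^D when its row or column is a Dirichlet cell. */
    static double[][][] split(double[][] t, boolean[] dirichlet) {
        int n = t.length;
        double[][] tnd = new double[n][n];
        double[][] td = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (dirichlet[i] || dirichlet[j]) {
                    td[i][j] = t[i][j];
                } else {
                    tnd[i][j] = t[i][j];
                }
            }
        }
        return new double[][][] { tnd, td };
    }

    /** Right-hand side of (2.37) for a non-Dirichlet cell i; etaHat holds the imposed heads. */
    static double rhs(int i, double bi, double[][] td, double[] etaHat) {
        double known = 0.0;
        for (int j = 0; j < etaHat.length; j++) {
            known += td[i][j] * etaHat[j];
        }
        return bi - known;
    }

    public static void main(String[] args) {
        // Three cells in a row (0-1-2); cell 2 is a Dirichlet cell with imposed head 4.
        double[][] t = assemble(3, new int[][] { {0, 1}, {1, 2} }, new double[] { 1.0, 2.0 });
        double[][][] parts = split(t, new boolean[] { false, false, true });
        System.out.println(rhs(1, 5.0, parts[1], new double[] { 0.0, 0.0, 4.0 }));
    }
}
```

Note that the diagonal entry of a non-Dirichlet cell adjacent to a DC stays in T^ND, so the coupling with the imposed head appears only on the right-hand side.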
2.5 Lesson to take home

The physical problem has now been explained, and the PDE and the numerical method have been analyzed in depth. To solve the problem, the domain is divided into polygons organized in an orthogonal unstructured grid; BCs are evaluated as problem forcings.

The next step (and also the next chapter) is to implement all these features in an object-oriented code. However, this has to be done through a well-planned structure, so that it can be reused with several equations and eventually extended to other types of PDE such as the elliptic, the hyperbolic, etc.
3 SOFTWARE IMPLEMENTATION

Contents
3.1 Object-oriented unstructured mesh
3.1.1 First design step: find different ways of implementation
3.1.2 Second design step: class hierarchy
3.1.3 Third step: the implementation
3.2 Object-oriented differential equations
3.2.1 First design step: find different implementation ways
3.2.2 Second design step: class hierarchy
3.2.3 Third step: the implementation of the BEq
3.3 Input/output management
3.4 Conclusions

The parabolic PDE to be solved is given in Chapter 2, while the numerical scheme used is summarized in Appendix A. Thus, the purpose of this chapter is to discuss a sensible object-oriented design that will implement the numerical scheme. This means that a standard object-oriented structure will be created. For it to become a template for the implementation of every type of differential equation, the code must be easy to maintain, portable and reusable.

To this end, it is essential to design a hierarchy of interfaces and abstract classes that will guide the developer during the implementation of a numerical model. In this way, every developer only has to implement the functions of his or her numerical problem, helped by a prearranged structure of methods and classes. To achieve all of this, the big problem must be split into three smaller ones:
• the problem of the implementation of an unstructured mesh;
• the problem of creating a resolution wizard for any type of differential equation, in particular, one that can implement parabolic PDE solutions, with the BEq as an application example;
• the problem of managing the input/output of scientific software in a standard way.

Definitions of abstract class/abstract method and interface are given here, so as to better understand the sections that follow (the information in the definitions is taken from [43, 38, 14]).

An abstract class is a class that contains abstract methods.
It cannot be instantiated directly, and it may provide no implementation, or an incomplete implementation. An abstract class is created when you want to manipulate a set of classes through their common interface.
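A minimal, self-contained sketch of this idea is given below. The classes Cell, RectangularCell, TriangularCell and AbstractClassDemo are hypothetical and are not part of Parallel Colt or of the thesis code; they only show an abstract class with one abstract method, one implemented method, and a caller that handles every subclass through the common base type.

```java
/** Hypothetical base class: one abstract method, one method shared by all subclasses. */
abstract class Cell {

    /** Each concrete cell must say how its area is computed. */
    abstract double area();

    /** Implemented once here because the algorithm is the same for every cell. */
    String describe() {
        return getClass().getSimpleName() + " with area " + area();
    }
}

final class RectangularCell extends Cell {
    private final double width;
    private final double height;

    RectangularCell(double width, double height) {
        this.width = width;
        this.height = height;
    }

    @Override
    double area() {
        return width * height;
    }
}

final class TriangularCell extends Cell {
    private final double base;
    private final double height;

    TriangularCell(double base, double height) {
        this.base = base;
        this.height = height;
    }

    @Override
    double area() {
        return 0.5 * base * height;
    }
}

final class AbstractClassDemo {

    /** Works on any mix of subclasses because it only uses the common base type. */
    static double totalArea(Cell[] cells) {
        double sum = 0.0;
        for (Cell cell : cells) {
            sum += cell.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        Cell[] cells = { new RectangularCell(2.0, 3.0), new TriangularCell(4.0, 1.0) };
        for (Cell cell : cells) {
            System.out.println(cell.describe());
        }
        System.out.println("total area = " + totalArea(cells));
    }
}
```

Writing `new Cell()` would be rejected by the compiler, which is what "cannot be instantiated directly" means in practice.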
Using the example about matrices presented in Section 1.2, an abstract class may be DoubleMatrix2D.java, where it is possible to declare common variables and abstract methods, or implement common methods of classes such as DelegateDoubleMatrix2D.java, DenseDoubleMatrix2D.java, DiagonalDoubleMatrix2D.java, SelectedDenseDoubleMatrix2D.java, SparseDoubleMatrix2D.java, WrapperDoubleMatrix2D.java, etc.

As a concrete example, the source code of Parallel Colt (see Appendix C on page 75) is shown in Listing 1, while Listing 2 represents the implementation of abstract methods contained in the base class (the concepts of base class and derived classes have already been explained in paragraph Inheritance of Subsection 1.2.2 on page 13).

Listing 1: Java implementation of the abstract class DoubleMatrix2D, summarized from the source code of Parallel Colt

 1  /*
 2  Copyright (C) 1999 CERN - European Organization for Nuclear Research.
 3  Permission to use, copy, modify, distribute and sell this software and
 4  its documentation for any purpose is hereby granted without fee, provided
 5  that the above copyright notice appear in all copies and that both that
 6  copyright notice and this permission notice appear in supporting documentation.
 7  CERN makes no representations about the suitability of this software for any
 8  purpose. It is provided "as is" without expressed or implied warranty.
 9  */
10  package cern.colt.matrix.tdouble;
11
12  public abstract class DoubleMatrix2D {
13
14      /**
15       * Return the maximum value of this matrix together with its location
16       *
17       * @return maximum_value, row_location, column_location;
18       */
19      public double[] getMaxLocation() {
20          int rowLocation = 0;
21          int columnLocation = 0;
22          double maxValue = 0;
23
24          // implementation omitted
25
26          return new double[] { maxValue, rowLocation, columnLocation };
27      }
28
29      /**
30       * Construct and returns a new empty matrix of the same dynamic type
31       * as the receiver, having the specified number of rows and columns. For
32       * example, if the receiver is an instance of type
33       * DenseDoubleMatrix2D the new matrix must also be of type
34       * DenseDoubleMatrix2D, if the receiver is an instance of type
35       * SparseDoubleMatrix2D the new matrix must also be of type
36       * SparseDoubleMatrix2D, etc. In general, the new matrix should
37       * have internal parametrization as similar as possible.
38       *
39       * @param rows
40       *            the number of rows the matrix shall have.
41       * @param columns
42       *            the number of columns the matrix shall have.
43       * @return a new empty matrix of the same dynamic type.
44       */
45      public abstract DoubleMatrix2D like(int rows, int columns);
46
47  }

The keyword abstract at line 12 shows that Listing 1 is an abstract class. In this code there are two methods: the first (at line 19) is an implemented method because the algorithm to find the location
of the maximum value is the same for any type of matrix; the second (at line 45) is an abstract method because no algorithm is given and it must instead be implemented in the derived class. In this base class no variable has been declared.

Listing 2: Java implementation of the derived class DenseDoubleMatrix2D, summarized from the source code of Parallel Colt

 1  /*
 2  Copyright (C) 1999 CERN - European Organization for Nuclear Research.
 3  Permission to use, copy, modify, distribute and sell this software and
 4  its documentation for any purpose is hereby granted without fee, provided
 5  that the above copyright notice appear in all copies and that both that
 6  copyright notice and this permission notice appear in supporting documentation.
 7  CERN makes no representations about the suitability of this software for any
 8  purpose. It is provided "as is" without expressed or implied warranty.
 9  */
10  package cern.colt.matrix.tdouble.impl;
11
12  public class DenseDoubleMatrix2D extends DoubleMatrix2D {
13
14      protected double[] elements;
15
16      /**
17       * this method is a builder: it constructs and returns a new empty matrix
18       * of type DoubleMatrix2D, having the specified number of rows and columns.
19       *
20       * @param rows
21       *            the number of rows the matrix shall have.
22       * @param columns
23       *            the number of columns the matrix shall have.
24       * @return a new empty matrix of the same dynamic type
25       */
26      public DoubleMatrix2D like(int rows, int columns) {
27          return new DenseDoubleMatrix2D(rows, columns);
28      }
29
30  }

The keyword extends at line 12 shows that Listing 2 is a derived class. This means that the class DenseDoubleMatrix2D.java is an extension of DoubleMatrix2D.java: it can use all of the implemented methods of the super class, and it has to implement the methods that are declared abstract in the super class.
Indeed, the method like(int rows, int columns) is abstract in the superclass and must be implemented in Listing 2, where it returns a new DenseDoubleMatrix2D. In particular, the latter method is a builder [18]. Quoting Wikipedia (2014b) [39]: "The builder pattern is an object creation software design pattern. Unlike the abstract factory pattern and the factory method pattern, whose intention is to enable polymorphism, the intention of the builder pattern is to find a solution to the telescoping constructor anti-pattern. The telescoping constructor anti-pattern occurs when the increase of object constructor parameter combination leads to an exponential list of constructors. Instead of using numerous constructors, the builder pattern uses another object, a builder, that receives each initialization parameter step by step and then returns the resulting constructed object at once." A builder is usually a separate class, but here the designer of Parallel Colt preferred to provide it as a method of the matrix class itself. At the same time, every object of type DenseDoubleMatrix2D has the method getMaxLocation(), inherited from the superclass.

The keyword interface takes the concept of abstractness one step further. While in an abstract class it is possible to implement a method, it is not possible in an interface, which allows the developer to determine method
names, argument lists, and return types, but no method bodies. A class that implements an interface must implement all the methods described in the interface; only an abstract class is not forced to implement all of them. Interfaces are a way to achieve polymorphism.

3.1 object-oriented unstructured mesh

A mesh to discretize the integration domain can be structured or unstructured. The unstructured mesh is used in this work and is now analyzed in depth. However, an empty interface has been created, providing a hierarchy in which one can include a structured mesh if desired (see Figure 7).

3.1.1 First design step: find different ways of implementation

This step allows the creation of the most general and basic structure, before implementing a class hierarchy for an object-oriented unstructured mesh. Observing Figure 7, this elementary structure is created by finding all the possible ways to implement every element of the diagram.

MESH for NUMERICAL METHOD
    Structured mesh
    Unstructured mesh
        Adjacency Matrix Based
            Column compressed format
            Row compressed format
            Triplet format
        Neighbour Matrices Based

Figure 7: Basic structure of the class hierarchy to create a mesh for numerical methods

In other words, if the behaviour of an element of the diagram, e.g. the storage of the adjacency matrix, can be implemented in three different ways, then the same number of branches must start from the element. For example, still referring to Figure 7, an unstructured mesh can be implemented in at least two ways: by adjacency matrix or by neighbour matrices¹. This means that the “topological connection” behaviour of an unstructured mesh can be described in two different manners, thus two new branches have to start from the Unstructured Mesh class.

¹ The topological connection of the geometrical elements of an unstructured mesh can be described by neighbour matrices. In this manner, the elements of the grid are tagged with absolute numbering, while the sides are tagged with local counter-clockwise numbering for each polygon. Thus, every edge has a double label, one for each neighbouring cell. With this numbering, the topological connections are described by two matrices with polygon numbers as columns and side numbers as rows. The first matrix contains the number of the polygon which is adjacent to the i-th polygon through the j-th side of the i-th polygon. The second matrix contains the side number, in the local numbering of that adjacent polygon, of the j-th side of the i-th polygon.

The red arrows mark the choices made for this work: an unstructured mesh to discretize the integration domain, using an adjacency matrix to describe the topological connection of the geometric elements, and storing the matrix in row-compressed format. Just as the topological connections can be described in different ways, the same adjacency matrix can be stored in different compressed formats, e.g. column-compressed or triplet format.

3.1.2 Second design step: class hierarchy

The next step is to fill the basic diagram in Figure 7 with abstract and implemented methods to describe the behaviours of each element and then to assign the appropriate labels (interface or abstract class). The result is a class diagram that highlights the class hierarchy.

Starting from the highest level of the diagram, the methods to describe the features of an unstructured mesh are three:

• getGridProperties(): This method describes features strictly related to the partition of the domain, such as the number of polygons that cover the domain, the length of their edges, the Euclidean distance between the barycentres of adjacent cells, etc. However, this method cannot be implemented yet, because the properties of a grid can be computed by an algorithm or read from files produced by a mesh generator. This information is specific to the problem analyzed, so this method can be implemented only in derived classes, during the implementation of the mathematical model.

• getPolygonsProperties(): The physical problem to be solved has some features that are discretized as properties of each polygon of the mesh (e.g. piezometric head in a phreatic aquifer studied with the finite volume method). However, these depend on the problem being solved. Thus, this method can only be implemented in the code of the model under analysis.

• getSidesProperties(): This method is like the previous one, but involves features that are discretized as properties of the sides of each polygon of the mesh.

Thus, the behaviours of this class are known, but it is not possible to implement them. UnstructuredMesh has to be an interface and, once implemented, it can completely describe the physical properties of the integration domain.

Furthermore, it is necessary to describe which polygons are connected to each other, in order to simulate the evolution of the solution within the system. There are different ways to do that; here, adjacency matrix and neighbour matrices are considered.
Thus, it is not possible to add a method directly into the UnstructuredMesh interface to describe the topological connection, because the developer must be free to choose the best way to implement this feature within the model. It is better to create two other interfaces which extend UnstructuredMesh, inheriting all its methods. Then, in order to describe the topological connections:

• AdjacencyMatrixBased has the additional method getAdjacencyMatrix();

• NeighborMatricesBased has the additional methods getNeighborCell() and getNeighborSide().

The last and lowest level of the diagram in Figure 7 shows those elements (Column compressed format, Row compressed format, Triplet format) that can be implemented as abstract classes:

• AbstractCCAdjacencyMatrixBasedMesh;

• AbstractRCAdjacencyMatrixBasedMesh;

• AbstractTripletAdjacencyMatrixBasedMesh.

Indeed, these inherit all the methods of the interface AdjacencyMatrixBased and contain the declaration of the grid variables and of those related to the adjacency matrix, once the storage format has been decided. For example, the class AbstractTripletAdjacencyMatrixBasedMesh contains the integer Np as the number of polygons and the arrays lengthSides, euclideanDistance and planArea to describe respectively the length of the sides of each polygon, the Euclidean distance between the barycentres of adjacent cells, and the planimetric area of each polygon; it also contains the arrays Ti, Tj and Tl to describe the topological connections in the triplet format (see Appendix B, Section B.1 on page 71). However, the methods inherited from the AdjacencyMatrixBased interface are still abstract (without implementation). The reason for this is given in the next section. The resulting class diagram is shown in Figure 8.
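The hierarchy described in this section can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code of this work: the text does not give the return types of the methods, so void is assumed; the element types of the arrays and the meaning of the entries of Tl are assumed as well; TwoCellMesh and MeshHierarchyDemo are hypothetical names added to make the sketch runnable.

```java
/** Top level: the behaviours are known, but cannot be implemented yet. */
interface UnstructuredMesh {
    // Return types are not given in the text; void is an assumption of this sketch.
    void getGridProperties();
    void getPolygonsProperties();
    void getSidesProperties();
}

/** First branch: topology described by an adjacency matrix. */
interface AdjacencyMatrixBased extends UnstructuredMesh {
    void getAdjacencyMatrix();
}

/** Second branch: topology described by two neighbour matrices. */
interface NeighborMatricesBased extends UnstructuredMesh {
    void getNeighborCell();
    void getNeighborSide();
}

/** Declares the variables for the triplet format; all methods remain abstract. */
abstract class AbstractTripletAdjacencyMatrixBasedMesh implements AdjacencyMatrixBased {
    protected int Np;                      // number of polygons
    protected double[] lengthSides;        // length of the sides
    protected double[] euclideanDistance;  // distance between barycentres of adjacent cells
    protected double[] planArea;           // planimetric area of each polygon
    protected int[] Ti;                    // row indices of the stored entries (assumed)
    protected int[] Tj;                    // column indices of the stored entries (assumed)
    protected double[] Tl;                 // stored values (assumed type and meaning)
}

/** Hypothetical concrete mesh: two cells connected to each other. */
class TwoCellMesh extends AbstractTripletAdjacencyMatrixBasedMesh {
    @Override public void getGridProperties() {
        Np = 2;
        planArea = new double[] { 1.0, 1.0 };
    }

    @Override public void getPolygonsProperties() { /* model-specific, left empty here */ }

    @Override public void getSidesProperties() { /* model-specific, left empty here */ }

    @Override public void getAdjacencyMatrix() {
        // Cell 0 is adjacent to cell 1 and vice versa; 1.0 is a placeholder value.
        Ti = new int[] { 0, 1 };
        Tj = new int[] { 1, 0 };
        Tl = new double[] { 1.0, 1.0 };
    }
}

class MeshHierarchyDemo {
    public static void main(String[] args) {
        TwoCellMesh mesh = new TwoCellMesh();
        mesh.getGridProperties();
        mesh.getAdjacencyMatrix();
        UnstructuredMesh top = mesh;   // a TwoCellMesh is also an UnstructuredMesh
        System.out.println("Np = " + mesh.Np + ", stored entries = " + mesh.Ti.length);
        System.out.println(top instanceof AdjacencyMatrixBased);
    }
}
```

The compiler rejects TwoCellMesh if any of the four inherited methods is left without a body, whereas the abstract class above compiles although it implements none of them. This is the difference between a concrete class and an abstract class when both implement an interface.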
3.1.3 Third step: the implementation

Now, to implement an unstructured mesh, you have to create a derived class by extending AbstractRCAdjacencyMatrixBasedMesh and implement the inherited methods inside it, declaring any new variables that you need. In this work, the derived class is still an abstract class, due to the need to implement a one-dimensional mesh and a two-dimensional mesh within the same code. Thus, as shown in Figure 9, all the variables used to define the domain of a phreatic aquifer are declared in AbstractDomain. This gives the following advantages:

• writing as little code as possible, because the derived classes contain only the implementation of the methods;

• dynamic binding, because you can declare the object mesh as type AbstractDomain, deciding which type of mesh to allocate only at run time.

The two derived classes only contain the implementation of the methods:
Figure 8: Abstract class hierarchy for unstructured meshes
Figure 9: Class diagram for unstructured meshes in the BEq implementation
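Independently of the listings of this work, the two advantages named in Section 3.1.3 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the AbstractDomain shown here is a simplified stand-in that does not extend the mesh hierarchy, SketchDomain1D, SketchDomain2D and DynamicBindingDemo are hypothetical names, the grid values are invented, and totalArea() is a helper added only to show code shared by both derived classes.

```java
/** Simplified stand-in for AbstractDomain: it declares the shared variables. */
abstract class AbstractDomain {
    protected int Np;              // number of polygons
    protected double[] planArea;   // planimetric area of each polygon

    /** Implemented differently by each kind of mesh. */
    abstract void getGridProperties();

    /** Hypothetical helper, written once and shared by every derived class. */
    double totalArea() {
        double sum = 0.0;
        for (double area : planArea) {
            sum += area;
        }
        return sum;
    }
}

/** Hypothetical one-dimensional mesh with three cells of unit area. */
class SketchDomain1D extends AbstractDomain {
    @Override void getGridProperties() {
        Np = 3;
        planArea = new double[] { 1.0, 1.0, 1.0 };
    }
}

/** Hypothetical two-dimensional mesh with two cells of area 0.5. */
class SketchDomain2D extends AbstractDomain {
    @Override void getGridProperties() {
        Np = 2;
        planArea = new double[] { 0.5, 0.5 };
    }
}

class DynamicBindingDemo {
    /** The concrete type is chosen only at run time, from a user-supplied flag. */
    static AbstractDomain create(String kind) {
        return "1D".equals(kind) ? new SketchDomain1D() : new SketchDomain2D();
    }

    public static void main(String[] args) {
        String kind = args.length > 0 ? args[0] : "2D";
        AbstractDomain mesh = create(kind);   // declared as AbstractDomain
        mesh.getGridProperties();             // dispatched to the run-time type
        System.out.println(mesh.getClass().getSimpleName()
                + ": Np = " + mesh.Np + ", total area = " + mesh.totalArea());
    }
}
```

The derived classes contain nothing but the method implementations, and the code that uses mesh never needs to know which of the two was allocated.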
