Engineering software is widely employed for its powerful abstraction of scientific and technical knowledge. It enables productive applications, e.g., analysis, prototyping, and manufacturing. Making engineering software requires a profound understanding of the problem domain, as well as of the art of engineering the software itself.
Software engineering differs substantially from conventional engineering. To build software professionally, mathematicians, scientists, and engineers need skills such as system administration, build automation, automated testing, and version control, to name but a few. Computer science knowledge such as algorithms and data structures is also indispensable. It is a joyful, interdisciplinary, and world-changing enterprise worth sharing with all future engineering practitioners.
2. Computing
❖ Solve problems that can only be solved by computers.
❖ Save resources, shorten cycles, better this world.
❖ Software: scientific; technical; engineering.
❖ Reproducibility is the gold standard for trust.
❖ Engineers may accept an inaccurate result, but it must
be consistently inaccurate.
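A minimal sketch of what "consistently inaccurate" can look like in practice. The function name `estimate_pi` and the seed value are illustrative assumptions, not part of the original slides. The Monte Carlo estimate is only approximate, but fixing the seed makes every run return exactly the same number.

```python
import random


def estimate_pi(samples, seed):
    """Monte Carlo estimate of pi from a seeded, private random generator."""
    rng = random.Random(seed)  # fixed seed: the same sample sequence every run
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    return 4.0 * inside / samples


first = estimate_pi(10_000, seed=42)
second = estimate_pi(10_000, seed=42)
# The estimate is only approximate, but it is identical on every run.
assert first == second
```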
7. Engineering Software
❖ A software system incorporates and applies scientific
knowledge.
❖ Turn engineering know-how into software
constructs.
❖ “Software engineering” differs from the “engineering”
we are familiar with.
12. Write Readable Code
❖ Read a lot and then write some.
❖ Delegate anything else to computers.
“Code is read much more often than it is written, so plan
accordingly”
Raymond Chen
17. Basic Tests
❖ Make sure the software does what it did: regression.
❖ At least 2 levels are needed:
❖ Unit tests.
❖ Integration tests, interface tests, system tests, etc.
❖ Unit tests test the most fine-grained constructs in a
software system.
18. Test for Development
❖ Unit tests are a great tool for quality and productivity.
❖ Some testing is almost always needed while developing code.
Why not do it in an organized way?
❖ Unit-testing frameworks like Google Test (C++), the Python
standard unittest module, or JUnit (Java) are created for this.
❖ Testing should be taken into account in the code
implementation: design for testing.
❖ (Unit) tests may be developed before features: test-driven
development (TDD).
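As an illustration of unit testing with the standard unittest module, here is a small sketch. The function `trapezoid` and the test names are hypothetical examples chosen for this note. The function takes the integrand as a parameter, which is one simple form of designing for testing.

```python
import unittest


def trapezoid(f, a, b, n):
    """Integrate f over [a, b] with the composite trapezoidal rule."""
    if n < 1:
        raise ValueError("n must be at least 1")
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


class TrapezoidTest(unittest.TestCase):

    def test_linear_function_is_exact(self):
        # The trapezoidal rule is exact for linear integrands.
        self.assertAlmostEqual(trapezoid(lambda x: 2.0 * x, 0.0, 1.0, 4), 1.0)

    def test_rejects_empty_partition(self):
        with self.assertRaises(ValueError):
            trapezoid(lambda x: x, 0.0, 1.0, 0)
```

Saved in a file, say test_trapezoid.py (a hypothetical name), the tests can be run with `python -m unittest test_trapezoid`.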
20. What to Do with an Error
❖ Build system and testing: Reduce errors.
❖ Version control: Find how an error crept in or what it is.
❖ You can bisect only when version control is done
sanely.
[Figure: a sequence of changes between a commit where build / test passes and one where build / test fails. Which change introduces the error?]
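The question of which change introduces the error can be answered by binary search over the history. The sketch below is not how any particular tool is implemented. The function `first_bad`, the commit names, and the `broken` set are hypothetical, and it assumes that every commit after the first failing one also fails.

```python
def first_bad(commits, is_good):
    """Return the index of the first failing commit by binary search.

    Assumes commits[0] passes, commits[-1] fails, and that every commit
    after the first bad one fails too.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical history: the error crept in at commit "f6".
history = ["a1", "b2", "c3", "d4", "e5", "f6", "g7", "h8"]
broken = {"f6", "g7", "h8"}
culprit = history[first_bad(history, lambda c: c not in broken)]
print(culprit)  # prints "f6"
```

Only three of the eight commits are built and tested here. With git, the `git bisect` command drives the same kind of search over a real history.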
21. Branching
❖ Different “streaks” of development need to be traced separately.
❖ Temporal differences are tracked with the “streaks”.
❖ Nowadays the common practice is to use a DAG (directed
acyclic graph) of revisions: git, hg, etc.
“git flow”: http://nvie.com/posts/a-successful-git-branching-model/
22. Version Control 101
❖ Version control is for source code. Archive assets (“blobs”:
binary large objects) elsewhere.
❖ For small-to-medium-sized projects (almost all scientific /
research work), just use git. The decentralized system
works efficiently and securely and has a large community
for support.
❖ Each check-in should be organized logically and locally.
❖ Treat the version control history like code. It adds more
dimensions to the source code.
23. Platform-Centric
❖ High-performance computing (HPC).
❖ Time to results: engineers’ time is more valuable than
computers’.
❖ The problem at hand is complexity.
❖ Physics, HPC, house-keeping, analysis, visualization.
❖ A “platform” segregates everything in layers.
❖ Modular design for millions of lines of code.
25. Optimization
❖ Memory access is expensive.
❖ Branching may be expensive too.
❖ Think like a machine: see through high-level source
code all the way to assembly.
❖ If you don’t write your own compiler, learn C++.
“There are only two hard things in Computer Science:
cache invalidation and naming things.”
Phil Karlton
26. HPC Architecture
❖ Scientific computing takes tremendous computing
power. We are interested in big problems.
❖ Some may be divided into smaller, self-contained
sub-problems, e.g., data analytics.
❖ Some are unavoidably big. A reasonably big problem
may use thousands of CPU cores for days.
❖ Accelerators such as GPGPUs sometimes speed things up,
but at the cost of more complicated code.
27. Python for Building Platform
❖ It's impossible to get it right the first time.
❖ Architecture design takes many iterations.
❖ Python allows quick prototyping.
❖ There is almost always a package for you.
❖ Python is either the best or the second best language for
anything.
28. NumPy
❖ N-dimensional array (ndarray).
❖ ndarray is typed and very fast, oftentimes faster than
naive C code.
❖ Efficient data storage and flexible access to memory.
❖ Linear algebra (MKL is supported), FFT, etc.
❖ SciPy: application-specific toolbox.
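A short sketch of the ndarray features listed above: typed storage, vectorized arithmetic, views on memory, and linear algebra. The array contents and variable names are arbitrary illustrations.

```python
import numpy as np

# A typed, contiguous 2-D array: 3 rows by 4 columns of 64-bit floats.
a = np.arange(12, dtype=np.float64).reshape(3, 4)

# Vectorized arithmetic runs in compiled loops, not the Python interpreter.
row_sums = a.sum(axis=1)

# Basic slicing returns a view on the same memory rather than a copy.
first_column = a[:, 0]
first_column[0] = 100.0  # also changes a[0, 0]

# Linear algebra: solve the 2x2 system m @ x = b.
m = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(m, b)
```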
29. Python Is Designed for C/C++
❖ Everything may be replaced by C/C++.
❖ Python is a C library for a dynamically-typed runtime.
❖ Python is slow, but using Python makes the whole HPC system faster.
❖ Performance hotspots.
❖ High-level abstraction in low-level code.
❖ Plain C: Python C API or Cython.
❖ C++: pybind11 (C++11) or Boost.Python (pre-C++11).
❖ Fortran: f2py (part of numpy).
30. Two Types of Platform
❖ Top-down: lay out everything in Python and replace hotspots
using C/C++
❖ Pro: Fast development. Reach results early.
❖ Con: Python dynamicity weakens robustness.
❖ Bottom-up: lay out core parts in C++ and glue in Python
❖ Pro: Highly robust (if coded right).
❖ Con: Hard to get right and takes a long time to code.
❖ Equally high-performance. Python scripts work as input files.
31. Python Tools
❖ No one escapes from routine work, but Python crushes it.
❖ Data preparation and processing.
❖ Workflow automation.
❖ Distributed processing and parallel computing.
❖ Interactive analysis and visualization.
❖ With these capabilities and the computing kernel, you have a
fully-grown computing platform at your fingertips.
32. Data Manipulation
❖ “csv” standard module for comma-separated values.
❖ http://www.pytables.org: HDF5 hierarchical data
access.
❖ http://unidata.github.io/netcdf4-python/: netCDF, yet
another data storage format based on HDF5.
❖ http://pandas.pydata.org: de facto tool for data
analytics
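As a small illustration of the standard csv module, the sketch below reads a table, computes a mean, and writes a derived table. The column names and values are made up for the example, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# Hypothetical measurement table; a real run would open a file instead.
text = "time,pressure\n0.0,101.3\n0.5,101.9\n1.0,102.4\n"

reader = csv.DictReader(io.StringIO(text))
rows = [{"time": float(r["time"]), "pressure": float(r["pressure"])}
        for r in reader]

mean_pressure = sum(r["pressure"] for r in rows) / len(rows)

# Write a derived table back out as CSV.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["time", "pressure_delta"])
for r in rows:
    writer.writerow([r["time"], round(r["pressure"] - mean_pressure, 3)])
```

For larger or more heterogeneous tables, the other tools listed above are likely a better fit than hand-written loops.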
33. Workflow
❖ When you want more flexibility than make or shell
scripts. Advance to system administration and/or DevOps.
❖ https://docs.python.org/3/library/argparse.html:
standard command-line argument processing
❖ https://github.com/saltstack/salt: cloud-oriented
automation for management and configuration
❖ AWS, GCE, and Azure all offer SDKs for Python.
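A minimal sketch of command-line processing with argparse for a workflow script. The simulation driver, its `case` argument, and the `--steps` and `--verbose` options are hypothetical.

```python
import argparse


def build_parser():
    """Command line for a hypothetical simulation driver."""
    parser = argparse.ArgumentParser(description="Run a simulation case.")
    parser.add_argument("case", help="name of the case to run")
    parser.add_argument("--steps", type=int, default=100,
                        help="number of time steps (default: 100)")
    parser.add_argument("--verbose", action="store_true",
                        help="print progress while running")
    return parser


# Parse an explicit list here; parse_args() with no argument reads sys.argv.
args = build_parser().parse_args(["cavity", "--steps", "500"])
print(args.case, args.steps, args.verbose)  # prints "cavity 500 False"
```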
34. Concurrency
❖ https://docs.python.org/3/library/asyncio.html: native
support for asynchronous constructs
❖ https://docs.python.org/3/library/multiprocessing.html:
parallel computing and distributed processing using multiple
processes
❖ Threads can’t simultaneously use multiple CPU cores
because of the GIL (global interpreter lock).
❖ http://zeromq.org/bindings:python: socket communication
❖ http://pythonhosted.org/mpi4py/: use MPI in Python
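A small sketch of the asynchronous constructs in asyncio, which suit tasks that spend their time waiting on I/O. CPU-bound work would instead go to multiple processes. The coroutine `fetch` and the task names are hypothetical stand-ins for real I/O.

```python
import asyncio


async def fetch(name, delay):
    """Stand-in for an I/O-bound task such as a network request."""
    await asyncio.sleep(delay)  # yields control while "waiting"
    return name


async def main():
    # The three waits overlap, so the total time is close to the longest one.
    return await asyncio.gather(
        fetch("mesh", 0.03), fetch("solver", 0.01), fetch("plot", 0.02))


results = asyncio.run(main())
print(results)  # gather keeps argument order: ['mesh', 'solver', 'plot']
```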
35. Interactive Exploratory Computing
❖ http://jupyter.org: run Python everywhere and code it
through a browser.
❖ https://notebooks.azure.com: Azure sets it up for you
already
❖ https://matplotlib.org: de facto 2D plotting library
❖ https://www.vtk.org: versatile 3D visualization toolbox
❖ https://www.paraview.org: if you only want a
frontend
36. Everything on Python
❖ Python always gets the job done. When it can’t, you can easily
bridge to C/C++ through the paved road.
❖ Exception: web browser frontend; only JavaScript works on it.
[Figure: quick iteration, easy extension, and rich support (free as in beer) make an ideal foundation to build a platform.]
37. Work Smart
❖ Coding is the craftsmanship everyone needs to practice.
Everyone needs the skills to command computers.
❖ An understanding of computer science is indispensable:
computer architecture, data structures, algorithms,
programming languages, etc.
❖ Make friends.
38. Commercial Code Development
❖ Keep business in mind.
❖ Reorient the pursuit of knowledge to profitability.
❖ Teamwork.
❖ Be the best at what you are doing
❖ Help teammates to be the best at what they are doing
❖ Be honest. Seek help not too early and not too late
❖ If no one knows the right way to do it, work around.
39. Open-Source Code Development
❖ Technical excellence.
❖ Do whatever you want, but do it elegantly.
❖ Business or novelty may or may not matter.
❖ Beauty must be in the equation.
❖ The world is your team.
“Talk is Cheap. Show me the Code.”
Linus Torvalds
40. Developer Communities
❖ Advancing science requires critical discussions. So does
software.
❖ Find a team that allows you to challenge the status quo
and helps you lay out realistic plans.
❖ Go outside and meet other programmers face to face.
❖ Learn to make friends with patches.
❖ Don’t know where to start? Start with Python.
41. Do It Now
You have an idea.
You code it up.
You package and release it.
You get users and collaborators.