Calrg14 tm351
Usage Rights

CC Attribution License

  • TM351 – a new course, currently in production…
    Level 3
    30 points
    First presentation slated for October 2015 (15J)
  • It’s replacing a “traditional” databases course, but we’re planning quite a few twists… What those twists are in content terms, though, is the subject of another presentation…
  • What I am going to talk about are two new things we’re exploring in the context of the course, and which we hope might also prove attractive to other course teams.
  • The first thing is virtual machines.
    These have already been used on a couple of other OU courses – TM128 and M812 both use virtual machines – but we are taking a more fundamental view about how to use notebooks to deliver interactive teaching material as well as software application services.
    So what is a virtual machine?
  • We’re all familiar with the idea that a student can run OU supplied software on their own desktop: third party software, OU created software, or a combination of both in the case of open source applications where we take some open code and then modify it ourselves.
  • We may even require students to install more than one piece of software, perhaps further requiring that these applications can interoperate.
    With a move to be “open” and agnostic towards any particular operating system, there are considerable challenges to be faced:
      • software libraries should ideally be cross platform, rather than multiple native implementations of ostensibly the same application;
      • software versions across applications should update in synch with each other;
      • the UI, or look and feel, should be the same across platforms – or we have more writing to do;
      • support issues are likely to scale badly for us as we have to cope with more variations in the configuration of individual student machines (for example, different operating systems, or different versions of the same operating system).
  • One way of mitigating change is to settle on a single UI space – such as a browser.
    Applications can be built solely within the browser, and made available to the user while requiring little desktop (or server) application support beyond a web server.
    Application front ends written in HTML5 and JavaScript can provide an experience rich enough to rival that of a native application.
    Application front ends can also be created for applications running as services either on the student’s desktop or via a remote server.
    Applications can draw on files in a folder on the student’s desktop machine, and the browser can be used to save files (e.g. from the internet) into that folder.
  • To get round the problem of having to install software onto multiple different possible system configurations, how much easier would it be if we knew exactly what operating system each student was running, and they were all running exactly the same operating system?
    Virtualisation platforms such as VirtualBox and VMware are cross-platform applications that can be downloaded to a student’s own machine and that then allow an additional guest operating system to be installed in its own container, running on the student’s own computer (the host) via the virtualisation platform.
    The guest operating system and the software that runs on the guest operating system are said to define a virtual machine or “VM”.
    The virtual machine can be defined by a central service and then delivered to the students in such a way that each receives a copy of exactly the same virtual machine in terms of its operating system and the applications preinstalled onto it.
  • What this means is that we can define a VM, preinstall software onto it, and ship it to students so they can run it via a virtualisation platform installed onto their machine.
    The VM can run applications as services, exposing their UIs via a browser. Files can easily be shared between the host and guest machines.
    As far as students are concerned, all they need to do is install a virtualisation system onto their computer, and then load the same OU virtual machine into that system, irrespective of the operating system they happen to be running.
  • It is also possible to run the VM on a remote server, with the students accessing the services running in that VM via their browser.
    This means that students can access services using computers that themselves may not be capable of installing or running particular applications – such as some tablet computers.
  • Notebook computing is my great hope for the future. Notebook computing, like spreadsheet computing, democratises access to, and the process of, practically based, task oriented computing.
    Spreadsheets help you get stuff done, even if you don’t consider yourself to be a programmer. My hope is that the notebook metaphor – and it’s actually quite an old one – can similarly encourage people who don’t consider themselves programmers to do and to use programmy things.
  • Notebook computing buys us into two ways of thinking that I think are useful from a pedagogical perspective – that is, pedagogy not just as a way of teaching but also as a way of learning, in the sense of learning about something through investigating it.
    Here, I’m thinking of an investigation as a form of problem based learning – I’m not up enough on educational or learning theory to know whether there is a body of theory, or even just a school of thought, about “investigative learning”.
    These two ways of thinking are literate programming and reproducible research.
  • In case you haven’t already realised it, code is an expressive medium. Code has its poets, and artists, as well as its architects, engineers and technicians. One of the grand masters of code is Don – Donald – Knuth.
    Don Knuth said “A literate programmer is an essayist who writes programs for humans to understand” as part of a longer quote. Here’s that longer quote:
    “Literate programming is a programming methodology that combines a programming language with a documentation language, making programs more robust, more portable, and more easily maintained than programs written only in a high-level language.
    “Computer programmers already know both kinds of languages; they need only learn a few conventions about alternating between languages to create programs that are works of literature. A literate programmer is an essayist who writes programs for humans to understand, instead of primarily writing instructions for machines to follow. When programs are written in the recommended style they can be transformed into documents by a document compiler and into efficient code by an algebraic compiler.”
    Notebooks are environments that encourage the writing of literate code. Notebooks encourage you to write prose and illustrate it with code – and the outputs associated with executing that code.
    In many cases, the code may already exist. The programming is then more a case of applying an existing bit of code to a new bit of data.
    That is what you do in a spreadsheet. Oftentimes the code is hidden – or automatically generated – by a menu option selected via a graphical user interface. But there is no magic going on (at least, no more magic than is associated with the ability to take electronic representations of text and do something to them that makes them responsible for what appears on a screen, keeps planes flying, and seemingly creates and destroys money on the fly in the world’s financial systems).
    Code is an incantation – and when you select a menu option in your spreadsheet you are asking the computer to perform that incantation and execute some code. You can also copy and paste code and then run it, and it will have the same effect as selecting that operation from a menu. That’s how it works.
    In literate programming, you can see a human description of what you want to achieve by executing the code, then the code, then the result of executing the code, then an interpretation of the result. Introduction. Method. Results. Conclusion. You know this four part structure, particularly if you’ve ever taught – or been taught – how to write a formal practical report.
    But you can apply it at an atomic level too. At the level of a particular event. Like a particular scene in a narrative chart, or a particular geotemporal location in a time map.
  • The other idea that the notebooks buy us into is reproducible research. I love this idea and think you should too. It lets archiving make sense.
    Do I really have to say any more than just show that quote?
    Now you may say that that’s all very well for, I don’t know, physics or biology, or science, or economics. Or social science in general, where they do all sorts of inexplicable things with statistics and probably should try to keep track of what they are doing.
    But not the humanities.
    But that’s not quite right, because in the digital humanities there are computational tools that you can use. Particularly in the areas of text analysis and visualisation. Such as some of the visualisations we saw in the first part of this presentation.
    But you need a tool that democratises access to this technology. You need an environment like the one the social scientists found in the form of a spreadsheet.
    But better.
    One that helps you keep track of what you did and that produces a serialisation that can be read back in a linear way that makes sense.
    Even if you don’t create it in a linear way.
    Even if you did that bit before this bit, but the way you tell it is as this bit before that bit.
    Which is one reason why postgrads get the fear that their experiment is going wrong. (Don’t panic! Those published papers you read? The work as described never took place the way it was described. The write-up is a post hoc rationalisation of the bits that worked, retold in such a way that it makes it look as if it was planned that way all along.)
    And here’s another dirty secret – most of the published reports you read that write up one experiment or another are not replicable from that report.
  • (I also like to think of notebooks as a place where I can have a conversation with data.)
  • So how do notebooks help?
    The tool I want to describe is – are – called IPython Notebooks.
    IPython Notebooks let you execute code written in the Python programming language in an interactive way. But they also work with other languages – JavaScript, Ruby, R, and so on, as well as other applications. I use a notebook for drawing diagrams using Graphviz, for example.
    They also include words – of introduction, of analysis, of conclusion, of reflection.
    And they also include the things the code wants to tell us, or that the data wants to tell us via the code. The code outputs.
    (Or more correctly, the code+data outputs.)
  • The first thing notebooks let you do is write text for the non-coding reader. Words. In English. (Or Spanish. Or French. I would say Chinese, but I haven’t checked what character sets are supported, so I can’t say that for definite until I check!)
    “Literate programming is a programming methodology that combines a programming language with a documentation language”. That’s what Knuth said. But we can take it further. Past code. Past documentation. To write up. To story.
    The medium in which we can write our human words is a simple text markup language called Markdown.
    If you’ve ever written HTML, it’s not that hard.
    If you’ve ever written an email and wrapped asterisks around a word or phrase to emphasise it, or written a list of items down by putting each new item onto a new line and preceding it with a dash, it’s that easy.
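To make the email analogy concrete, here is a toy sketch in Python of how those two conventions map onto HTML. It is not a Markdown implementation (a notebook relies on a full Markdown renderer). The function name `toy_markdown_to_html` and the sample text are made up for illustration, and only `*emphasis*` and `- list item` lines are handled.

```python
import re


def toy_markdown_to_html(text):
    """Convert a tiny subset of Markdown to HTML (illustration only)."""
    html_lines = []
    in_list = False
    for line in text.splitlines():
        # *word* becomes <em>word</em>
        line = re.sub(r"\*(.+?)\*", r"<em>\1</em>", line)
        if line.startswith("- "):
            # A dash at the start of a line opens (or continues) a list.
            if not in_list:
                html_lines.append("<ul>")
                in_list = True
            html_lines.append("<li>" + line[2:] + "</li>")
        else:
            # Any other line closes an open list and becomes a paragraph.
            if in_list:
                html_lines.append("</ul>")
                in_list = False
            if line.strip():
                html_lines.append("<p>" + line + "</p>")
    if in_list:
        html_lines.append("</ul>")
    return "\n".join(html_lines)


sample = "This is *really* easy.\n- first item\n- second item"
print(toy_markdown_to_html(sample))
```

The point is only that the plain text a writer types already carries the structure; the renderer just makes it visible.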
  • Here’s a notebook, and here’s some text.
    There’s also some code.
    But note the text – we have a header, and then some “human text”.
    You might also notice some up and down arrows in the notebook toolbar. These allow us to rearrange the order of the cells in the notebook in a straightforward way.
    In a sense, we are encouraged to rearrange the sequence of cells into an order that makes more sense as a narrative for the reader of the document, or in the execution of an investigation.
    The upshot of this is that we can author a document in a ‘non-linear’ way and then linearise it for final distribution simply by reordering the cells.
    There are constraints though – if a cell computationally depends on the result of, or state change resulting from, the execution of a prior cell, their relative ordering cannot be changed.
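The ordering constraint can be sketched by treating each cell as a string of Python source that is run in one shared namespace, which is roughly how a notebook session behaves. The `run_cells` helper and the cell contents below are hypothetical, chosen only to show a dependent cell failing when it is moved ahead of the cell it depends on.

```python
def run_cells(cells):
    """Run each cell's source in order, sharing one namespace between cells."""
    namespace = {}
    for source in cells:
        exec(source, namespace)
    return namespace


load_cell = "distances = [3, 4, 5]"
total_cell = "total = sum(distances)"

# Original order: the second cell can see the state left by the first.
result = run_cells([load_cell, total_cell])
print("total:", result["total"])

# Swapped order: the dependent cell runs before its data exists.
reorder_failed = False
try:
    run_cells([total_cell, load_cell])
except NameError as err:
    reorder_failed = True
    print("Reordering broke the run:", err)
```

Cells that only contain text, or code that does not share state with its neighbours, can be moved freely; it is the shared state that pins the order.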
  • As well as human readable text cells – markdown cells or header cells at a variety of levels – there are also code cells.
    Code cells allow you to write (or copy and paste in) code and then run it.
    Applications give you menu options that in the background copy, paste and execute the code you want to run, or apply to some particular set of data, or text.
    Code cells work the same way, but they’re naked. They show you the code.
    At this point it’s important to remember that code can call code.
    Thousands of lines of code that do really clever and difficult things can be called from a single line of code. Often code with a sensible function name just like a sensible menu item label. A self-describing name that calls the masses of really clever code that someone else has written behind the scenes.
    But you know which code because you just called it. Explicitly.
    Let’s see an example – not a brilliant example, but an example nonetheless.
  • Here’s some code.
    It’s actually two code cells – in one, I define a function. In the second, I call it.
    (Already this is revisionist. I developed the function by not wrapping it in a function. It was just a series of lines of code that I wrote to perform a particular task.)
    But it was a useful task. So I wrapped the lines of code in a function, and now I can call those lines of code just by calling the function name.
    I can also hide the function in another file, outside of the notebook, then just include it in any notebook I want to…
    …or within a notebook, I could just copy a set of lines of code and repeatedly paste them into the notebook, applying them to a different set of data each time… but that just gets messy, and that’s what being able to call a bunch of lines of code wrapped up in a function call avoids.
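The revision described in that note, loose lines first and a function later, can be sketched as follows. The task (counting words) and the names `count_words` and `sample_text` are invented for illustration; the code actually shown on the slide is not reproduced in these notes.

```python
from collections import Counter


# "Cell 1": the loose lines, wrapped up in a function once they proved useful.
def count_words(text):
    """Return a Counter of the lower-cased words in text."""
    words = text.lower().split()
    return Counter(words)


# "Cell 2": calling the function on one dataset...
sample_text = "the cat sat on the mat"
print(count_words(sample_text).most_common(1))

# ...and reusing it on another without pasting the lines in again.
print(count_words("to be or not to be").most_common(2))
```

Moving `count_words` into a separate file and importing it would give the "hide the function outside the notebook" variant, with the calling cells left unchanged.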
  • As far as reproducible research goes, the ability of a notebook to execute a code element and display the output from executing that code means that there is a one-to-one binding between a code fragment and the data on which it operates and the output obtained from executing just that code on just that data.
  • The output of the code is not a human copied and pasted artefact.
    The output of the code – in this case, the result of executing a particular function – is only and exactly the output from executing that function on a specified dataset.
  • The output of a code cell is not limited to the arcane outputs of a computational function.
    We can display data table results as data tables.
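One way an object can offer a table view in IPython is by defining a `_repr_html_` method, which the notebook uses for rich display of a cell's result when it is available. The sketch below is minimal and hypothetical: the `SimpleTable` class is invented here, and in practice a data library would normally provide this kind of display for you.

```python
from html import escape


class SimpleTable:
    """A list of rows that renders as an HTML table in a notebook."""

    def __init__(self, headers, rows):
        self.headers = headers
        self.rows = rows

    def _repr_html_(self):
        # IPython looks for this method to obtain a rich HTML view.
        head = "".join("<th>" + escape(str(h)) + "</th>" for h in self.headers)
        body = "".join(
            "<tr>"
            + "".join("<td>" + escape(str(c)) + "</td>" for c in row)
            + "</tr>"
            for row in self.rows
        )
        return "<table><tr>" + head + "</tr>" + body + "</table>"


table = SimpleTable(["course", "points"], [["TM351", 30]])
# Outside a notebook we can still inspect the HTML that would be rendered.
print(table._repr_html_())
```

In a notebook, leaving `table` as the last expression in a code cell would show the rendered table rather than the raw HTML string.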
  • We can also generate rich HTML outputs – in this case an interactive map overlaid with markers corresponding to locations specified in a dataset, and with lines connecting markers as defined by connections described in the original dataset.
    We can also delete the outputs of all the code cells, and then rerun the code, one step – one cell – after the other. Reproducing results becomes simply a matter of rerunning the code in the notebook against the data loaded in by the notebook – and then comparing the code cell outputs to the code cell outputs of the original document.
    Tools are also under development that help spot differences between those outputs, at least in cases where the outputs are text based.
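For text based outputs, the comparison step can be sketched with the standard library's `difflib`. This is not one of the tools under development mentioned in the note; the `compare_outputs` helper and the two output lists are invented stand-ins for outputs stored in the original notebook and outputs from a fresh run.

```python
import difflib


def compare_outputs(original, rerun):
    """Return unified-diff lines between two lists of text outputs."""
    return list(
        difflib.unified_diff(
            original, rerun, fromfile="original", tofile="rerun", lineterm=""
        )
    )


# Made-up example outputs, one string per code cell.
original_outputs = ["rows loaded: 120", "mean distance: 4.2"]
rerun_outputs = ["rows loaded: 120", "mean distance: 4.3"]

diff = compare_outputs(original_outputs, rerun_outputs)
print("\n".join(diff) if diff else "Outputs match")
```

An empty diff means the rerun reproduced the stored text outputs exactly; rich outputs such as maps or images would need a different kind of comparison.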
  • So can we run virtual machines and IPython notebooks together?
  • The IPython notebooks are actually browser based front end applications being powered by an IPython server…
  • It’s easy enough to run the IPython server on a virtual machine, either running as a guest VM on a student’s host computer, or running as an online service accessed by the student via the web using their own web browser.
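Part of what makes this arrangement workable is that a notebook is just a file that the server reads and writes, so it can sit in a folder shared between host and guest. As a rough sketch, a notebook file is a JSON document. The layout below assumes the version 4 notebook format, which may differ from the format in use when these slides were written, and the `make_notebook` helper is hypothetical.

```python
import json


def make_notebook(markdown_text, code_text):
    """Build a minimal notebook as a plain dict (version 4 layout assumed)."""
    return {
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": markdown_text},
            {
                "cell_type": "code",
                "execution_count": None,  # not yet run
                "metadata": {},
                "outputs": [],  # filled in by the server when the cell runs
                "source": code_text,
            },
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


notebook = make_notebook("# Hello TM351", "print(1 + 1)")
serialised = json.dumps(notebook, indent=1)
print(serialised[:60])
```

Writing `serialised` to a file with an `.ipynb` extension in the shared folder would give the server something to open, whichever side of the host/guest boundary created it.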
  • There is a lot more that could be said – for example:
      • workflows around the building/provisioning of virtual machines,
      • how we might be able to host such machines either centrally or as a self-service option,
      • the corollary between notebook style computing and spreadsheets,
      • the notion of conversations with data,
      • etc. etc.

Calrg14 tm351 Presentation Transcript

  • Imagining TM351: From Virtual Machines to Notebooks. Tony Hirst, Computing and Communications
  • TM351 15J 30 L3
  • “The data course” TM351
  • Two new things
  • Virtual Machines
  • Student’s computer e.g. Windows | Course software I | Personal folder
  • Student’s computer e.g. Windows | Course software I | Course software II | Personal folder
  • Student’s computer e.g. Windows | Course software I | Course software II | Student’s own browser | Personal folder | Access as web/browser application | Download files from web
  • Student’s computer e.g. Windows | VirtualBox Application | Guest Operating System e.g. Linux | Student’s own browser | Personal folder | Download files from web | Access as web/browser application
  • Student’s computer e.g. Windows | VirtualBox Application | Guest Operating System e.g. Linux | Course software I | Course software II | Student’s own browser | Personal folder | Download files from web | Access as web/browser application
  • Virtual machine | Guest Operating System e.g. Linux | Course software I | Course software II | Student’s own browser | Personal folder | Download files from web | Student’s computer e.g. Windows | Cloud server | Access as web/browser application
  • Notebook computing
  • Literate programming | Reproducible research
  • Literate Programming: “A literate programmer is an essayist who writes programs for humans to understand.” Knuth, Donald E. "Literate programming." CSLI Lecture Notes, Stanford, CA: Center for the Study of Language and Information (CSLI), 1992.
  • Reproducible Research: “[R]esearch papers with accompanying software tools that allow the reader to directly reproduce the results and employ the methods that are presented in the research paper.” Gentleman, Robert and Temple Lang, Duncan, "Statistical Analyses and Reproducible Research" (May 2004). Bioconductor Project Working Papers. Working Paper 2. http://biostats.bepress.com/bioconductor/paper2
  • [Conversations with data]
  • IPython Notebook
  • [Corollary to spreadsheets]
  • Task oriented productivity software
  • Direct manipulation, immediate feedback
  • Markdown Cells
  • Markdown Cells
  • Code Cells
  • Code Cells
  • Code Output
  • Code Output
  • Code Output
  • Code Output
  • VM + .ipynb ?
  • Browser | IPython Notebook | IPython | Files
  • Virtual Machine | Browser | IPython Notebook | IPython | Files
  • Any questions?