Please get in touch if you’d like to know any more about what these projects might entail. The level of detail provided for various projects varies quite a lot; this does not necessarily mean that the detailed projects are more interesting than the less detailed ones. Also, this list of ideas is not meant to be exhaustive, so if you have an idea for a project of your own that seems to fit into my general areas of interest, please suggest it!
These ideas would be suitable bases for PhD projects, and might, if suitably stripped down, serve as starting points for Honours projects too.
It is now possible to describe real world languages, such as C, Java and Ada, using formal methods (typically by giving the languages an operational semantics). Real world chip-architectures (such as the ARM instruction set) have also been formally described. The big, remaining project is to tie two such descriptions together, producing a compiler that is proven to produce correct output.
Issues:
Modern compilers include many optimisations. Can a verified compiler also include verified optimisations?
The “obvious” way to get a verified compiler is to define the function that maps from source-code to target assembly in a logic, and then prove that this function preserves behaviour (for some value of “behaviour”). Another approach might also be possible: write code to prove that a piece of assembly implemented the same behaviour as the source. As long as the proof procedure had high coverage (it might be tailored to only verify the output of a particular compilation function, which function might also output hints to help the procedure do the proof), this would be “just as good”.
Tangents and Variations:
Compile a functional or logic language like SML or Prolog.
Give a semantics to the sort of code that appears in operating system kernels, which mixes C and assembler, potentially within the same function.
Verify sophisticated optimisations on intermediate languages.
Further reading:
Formal certification of a compiler back-end, or: programming a compiler with a proof assistant, by Xavier Leroy.
Formal verification of a C compiler front-end, by Xavier Leroy et al.
But: Xavier Leroy may well “do” this whole project in the near future; his existing verifications go from a largish subset of C to a subset of the PowerPC instruction set.
Design a language capable of expressing the intricacies of modern communications protocols (such as TCP), but which also admits a reasonably efficient checking function, able to determine if an observed trace could have been generated by a given protocol description. Such a tool and language would be extremely valable to the community of protocol designers and implementors.
Tangents and Variations:
Write a tool to animate a specification so that it behaves as a (possibly inefficient) implementation of the protocol.
Further reading:
More recent papers on the Network Semantics page at Cambridge.
Researchers in the area of programming language semantics have long been exercised by the problems thrown up the notions of binders and α-equivalence. Some progress has been made recently, but focussing on types that are quotients of algebraic or inductive types, where values are finite in size. Co-algebraic types are potentially infinite in size, which means that assumptions like the one that a value only has finitely many free variables will no longer hold.
Developing a theory of co-algebraic types with binders would probably involve proving co-induction and co-recursion principles for these types, and demonstrating their utility in proofs drawn from the literature. One possible example would be the Boehm tree, which is used in the theory of the λ-calculus.
Further reading:
Recursive function definition for types with binders (covers algebraic types).
A possible structure for a PhD project is to mechanise some large-ish piece of mathematics (X) and to then use this to solve some “real” computer science problem (Y).
For example, mechanise real analysis, and then use it to verify floating-point algorithms (John Harrison). Or mechanise probability theory, and then use it to verify probabilistic algorithms (Joe Hurd).
This area is basically that of HCI. How would one take an interactive, but powerful, tool such as ACL2, HOL, Isabelle, or PVS and make it more user-friendly, without compromsing its ability to handle large problems? A graphical user-interface might be appropriate, but would a GUI put off the power-user? Perhaps a more significant rethinking of the theorem-proving work-flow is necessary.
Further reading:
Eclipse Proof General: A Generic Interface for Interactive Proof, by Dave Aspinall et al.
Also see the programming project below.
This would be work on topics such as first-order proof, rewriting and decision procedures for subject domains like arithmetic. There is a wealth of expertise in CSL and the Logic & Computation programme on these areas, so I might not necessarily be the best person to act as supervisor, though work on these technologies in an LCF setting (which tends to be interesting research) would be something I would be involved with.
A proof over an LCF-style kernel is usually expressed (by the user) in terms of high-level steps or tactics. Ultimately however, the proof is still executed at the lowest level of primitive inference steps. If these steps are recorded, the resulting “proof trace” is very long, for all that it may be possible to execute it quickly.
A possible project would be to investigate these traces and optimise them so that they proved the same results but did so faster. Many of the tricks of compiler optimisation might well have proof analogues, and insights from proof theory may also be relevant.
Issue: this may not be quite enough work to warrant a PhD.
These projects don’t necessarily involve much original research, and might be more suitable for software engineering group projects. If given an additional research slant, replacing some coding effort, they might be suitable as Honours projects. They are unlikely to be suitable as PhD projects, unless a compelling computer science application was found and demonstrated in the course of the thesis. Most of these projects would involve work on the HOL theorem-proving system, mainly because it is a system I know well, making it easy for me to advise on its modification.
Upgrade the current HOL implementation (“HOL4”) to run under the MLton compiler. This would require a fairly substantial re-engineering of the basic HOL work-flow, and the way it represents theories. Once this much was done, there’d be scope to play with connections to Eclipse (see interfaces project above).
Further reading:
When using an interactive proof assistant, a powerful tactic for finishing goals is the use of first order proof procedures. Unfortunately, these procedures are all-or-nothing affairs: if they succeed, they succeed completely, but if they fail to prove the goal, the user knows only that the provided lemmas and assumptions were not sufficient to allow the procedure to find a proof.
The advent of model generation tools promises a way out of this painful situation. These tools can be used to generate what is effectively a counter-example to false goals. This is much more helpful for the user. Either the tool's output really is a counter-example and their proof will never succeed, or the counter-example will be impossible because the user has forgotten to tell the tools some relevant lemma from a database of known facts.
The student’s task in this project would be to implement a connection between two existing tools: the interactive system HOL4, and the model generation tool Darwin. This connection would provide a user-friendly implementation of the scheme described above: when an invocation of the first order prover in HOL failed, the connection implementation would translate the HOL goal into terms that Darwin could understand, call Darwin, and then finally translate Darwin's provided counter-example back into a human-readable form for the user’s benefit.
This project would suit a student familiar with functional programming (HOL is written in SML), and happy with the concepts of first order logic (terms, formulas, quantifiers). The student would be jointly supervised by Michael Norrish and Peter Baumgartner.
HOL’s built-in first-order prover does a search for a proof, and then replays this proof within the kernel. The time doing the search tends to dominate replay, so a nice optimisation for all HOL proofs might be to cache the search information. Then if the proof script gets re-run, the cache (on disk) can be referred to.
Similarly, proofs from the built-in arithmetic decision procedure might be cached.