Repliss (Bachelor-/ Master-theses)
Repliss (Replicated information systems with strong guarantees) is a tool for verifying and testing applications, which are built on databases that provide only weak consistency guarantees.
There are several opportunities for working on a Bachelor- or Master theses related to the Repliss tool. Some suggestions are listed below. If you are interested in working in this area, please contact Peter Zeller. We can then concretize the topic and adapt it to your skills and requirements (in particular Bachelor/Master).
Weak Consistency: In a replicated setting, where data is stored at different places, consistency is an essential aspect. Strong consistency guarantees like serializability or linearizability are easy to program with, but cannot operate when replicas are disconnected. Even when all replicas are available, strong consistency has some costs in terms of latency and often also throughput. Therefore many replicated databases provide only weak consistency guarantees like eventual or causal consistency to programmers.
Writing correct programs on top of weakly consistent databases is more difficult, because the behavior of the database is less predictable. With Repliss we are building tools and techniques which can be used to build correct applications even in this difficult setting, and thus get the availability and performance benefits without sacrificing desirable functional properties of the application.
Verification and Automatic Testing: Verification and testing are used to increase the quality of applications. In the context of concurrent systems, it is inefficient to write test cases for all possible interleavings of actions. Therefore property based testing is sometimes used. With this technique only the desired properties of an application are specified, but no concrete inputs or executions. The test system will then automatically generate and inputs, schedule executions, and check the properties. Repliss includes such an automatic testing tool, which is tailored to the scenario of weakly consistent databases.
Automatic testing can not cover all possible executions in the general case and gives little insights into why a property holds for an application. Verification usually requires more manual effort, but therefore a verified program is guaranteed to satisfy its specification. Repliss includes a program logic and automatic verifier for the domain of applications built on top of weakly consistent databases. The verification tool of Repliss is built on a program logic and implemented with the Why3 verification tool.
Topic 1: Generating Code from Repliss
Users of Repliss write Code in a small domain specific language (DSL) which is suitable for verification.
The main challenges in this thesis are:
- A Repliss program defines an API for clients. This API has to be mapped to the target language in a meaningful way. Features like algebraic data types have to be represented if the target language does not provide them (as in Java).
- An API call is handled sequentially in Repliss. However, in practice database calls often should be executed in parallel. For example fetching 100 Tweets to show a Twitter timeline should not take 100 round-trips to the database to complete. Instead all 100 Tweets should be fetched in parallel. Therefore, the generated code should use parallelism when possible without changing the semantics of the code.
Prerequisites: Good programming skills and knowledge about compilers (“Compiler and Language-Processing Tools” or FGDP and SE1)
Topic 2: Implement Concolic testing for Repliss
The automatic testing tool in Repliss generates random inputs (with some heuristics) to test programs. This relatively simple approach can already find many bugs, but for complex examples is can take a very long time or not find a bug at all. Moreover, when no bug is found, there is no guarantee made for the program.
Symbolic and concolic testing is a technique to explore executions more systematically. In some cases it can even be used to prove the absence of bugs for executions up to a given length.
The goal of this thesis is to extend the existing testing tool (written in Scala) with concolic testing.
Prerequisites: Good programming skills and knowledge about compilers and logic (lectures “Compiler and Language-Processing Tools”, Logik, FGDP and SE1)
- Peter Zeller:
Testing properties of weakly consistent programs with Repliss.
- Patrice Godefroid, Nils Klarlund, Koushik Sen:
DART: directed automated random testing.
- Cristian Cadar, Dawson R. Engler:
Execution Generated Test Cases: How to Make Systems Code Crash Itself.
- Koushik Sen, Darko Marinov, Gul Agha:
CUTE: a concolic unit testing engine for C.
ESEC/SIGSOFT FSE 2005
- Aggelos Giantsios, Nikolaos S. Papaspyrou, Konstantinos Sagonas:
Concolic testing for functional languages.
Topic 3: Efficient evaluation of first order formulas for testing
The execution of Repliss programs requires evaluating first order formulas on a finite domain. This is done for checking invariants and for calculating the result of database queries.
To make testing more efficient, this evaluation must be handled in an efficient way. A previous Bachelor Thesis has done significant steps in improving the evaluation and we released a Java library for the efficient evaluation of first order formulas.
The goal of this thesis is to generalize the techniques from the previous thesis, extend them with more optimizations, and evaluate them against existing constraint and SMT solvers.
Prerequisites: Algorithms and logic (lecture “Logik”)
Topic 4: Extending the Repliss verification tool
In this thesis, you will extend the Repliss language with several interesting features. You will add data types like lists, sets, maps to the Repliss data model, as well as support for comprehensions.
The main challenge here is to encode these features in a first-order SMT solver which allows for automatic verification.
As an example, the following paper describes how to handle some of the problems with comprehensions:
- Leino, K. Rustan M., and Rosemary Monahan. “Reasoning about comprehensions with first-order SMT solvers.” Proceedings of the 2009 ACM symposium on Applied Computing. ACM, 2009.
Prerequisites: Good skills in functional and object-oriented programming (implementation language is Scala); Logic (lecture “Logik”)
- Sebastian Burckhardt: Principles of Eventual Consistency
- Antidote Database
- Marc Shapiro, Nuno M. Preguiça, Carlos Baquero, Marek Zawirski: A comprehensive study of Convergent and Commutative Replicated Data Types. INRIA Technical Report RR-7506, January 2011. http://hal.inria.fr/inria-00555588