Skip to content

Latest commit

 

History

History
280 lines (176 loc) · 29.5 KB

GSoC-2019-Project-Ideas.md

File metadata and controls

280 lines (176 loc) · 29.5 KB

Project Ideas

Please note that this list is not exclusive. If you have other ideas and topics related to JPF, please let us know on the JPF Google group. A possible proposal template can be found at the bottom of our GSoC page: [[JPF Google Summer of Code 2019]].

JPF Infrastructure

JPF Application Domains

Automatic Program Repair

Symbolic Execution

Fuzzing

Smart Contract

Android

Concolic Execution

Environment and Test Case Generation

Symbolic Data-race Detection

Projects Descriptions

Support Java 11 for jpf-core

jpf-core is essentially a JVM that currently supports only Java 8. The goal of this project is to make it up-to-date with new features of Java 11. The JPF source itself has already been made compatible with Java 11. Now, JPF should support new features of Java 11 bytecode and archives. Among new features of Java 11 are multi-version archives (JAR files) and the ability to link JAR files before they are used by the JVM, and bootstrap methods that are generated at load time. This is a high-priority project, as support for Java 8 is limited to the near future.

Support Java 11 for jpf-symbc

jpf-symbc is essentially a (symbolic) JVM that currently supports only Java 8. The goal of this project is to make it up-to-date with new features of Java 11. This is a high-priority project, as support for Java 8 is limited to the near future.

Support for gradle for jpf-core and extensions

We have recently moved the build for jpf-core from ant to gradle. However, gradle support for Java 11 is broken, and we have not migrated extension to gradle yet (or even the extension template). The goal of this project is to (1) fix gradle support for Java 11 (branch "java-10-gradle"), (2) to update the extension template, including gradle support and updated documentation, and (3) update widely used extensions with gradle support.

Method Summaries, extended

A thesis project implemented Summaries of methods when executing in JPF. It includes a representation of the summaries that captures all side effects of the given procedure; and implements modifications of the program state.

The actual summary of a method will be computed during its first execution, and then reused within traversal of other state paths. Another possible feature is the support for externally defined summaries that would be useful for library methods.

Experiments have shown that without summarizing the effects of constructors, most methods cannot be summarized. This is because the construct new objects or throw an exception (which is also a new object). Summarizing the effect of constructors would therefore be a huge enhancement to this technique. Other enhancements may also be possible.

Model Checking Distributed Java Applications

jpf-nas is an extension of JPF that provides support for model checking distributed multithreaded Java applications. It relies on the multiprocess support included in the JPF core which provides basic functionality to verify the bytecode of distributed applications. jpf-nas supports interprocess communication via TCP sockets by modeling the Java networking package java.net. This tool can handle simple multi-client server applications. Some examples can be found in the jpf-nas distribution (at jpf-nas/src/examples/). The goal of this project is to extend the functionality of jpf-nas in various ways, such as extending the communication model supported by the tool towards an existing open source Java library/framework, called QuickServer, increasing the performance of the tool by improving the mechanism used to manage the state of communication objects, extending the tool with the cache-based approach used in net-iocache, etc.

Verification of Multi Agent Systems

The goal of this project is to develop techniques that analyze key properties in multi-agent systems. The jpf-mas extension will initially provide the ability to generate the reachable state space of Brahms models. The reachable state space can then be encoded into input for a variety of model checkers such as SPIN, NuSMV and PRISM, thereby enabling the verification of LTL, CTL and PCTL properties. The project will also need to investigate how to generate the set of reachable states for other kinds of models, such as Jason models, and how to compose reachable states of different modelling languages both at run-time and off-line.

Automatic program repair using annotations

Automated program repair (APR) techniques often generate overfitting patches due to the reliance on test cases for patch generation and validation. In this project, we propose to overcome the overffiting issue in APR by leveraging developer-provided partial annotations to aid semantic reasoning. Developer annotations can come in different forms, e.g., JPF annotation. The advantage of developer annotations is two-fold. First, in addition to test cases, it helps augment the specifications of the program under analysis and thus provides more complete specifications. These annotations, despite being simple, can help significantly in semantic reasoning, e.g., null pointer analysis. Second, these annotations are not required to be complex so that to reduce the burden of manual effort by developers. For example, to reason about null pointer exception errors, developers are only required to add a few Nullable or Non-Nullable annotations to class fields or method parameters, etc. We will use JPF and SPF for symbolically reasoning about the semantics of programs under analysis and generating repairs. We will also use JPF-Annotation as a way for developers to provide annotations.

Refactoring SPF constraint library

SPF constraints need to be refactored to allow different kinds of constraints to be combined during the construction of a path condition. An example of how it should be after the refactoring is the Abstract Syntax Tree constructed by GREEN.

Whitebox Fuzzer and Grammar Learner

Build a whitebox fuzzing tool on top of Symbolic PathFinder, that can learn the input grammar of a piece of code in an iterative fashion. The idea would be to first run the code on symbolic input of a fixed length and then learn a possible grammar for this length, at that point extend the length and generalise the grammar. The main research goal behind this project is to see if one can do whitebox fuzzing without a pre-determined seed file (which is the way most whitebox fuzzers work at the moment).

Fuzzing and Symbolic Execution

Develop a fuzzer for Java that can be integrated with SPF (or another Java based symbolic execution engine). The idea would be that when fuzzing gets stuck and makes no progress that the symbolic analysis can create a new seed file to allow analysis to progress.

Comparison between concolic and classical symbolic execution

Comparison between concolic execution, e.g. DEEPSEA and JDart, and classical symbolic execution, e.g. SPF.

Analysis of Android Applications

Various ideas are welcome here. Here are a couple of possible subprojects:

  1. The goal of this project is collecting interesting Android applications, and evaluating and applying JPF-Android to analyze them. JPF-Android is an extension to JPF used to model check Android Applications. Android applications have many dependencies which make them hard to test and verify. They also require events to drive the execution of the applications. The goal of this project is to identify interesting Android applications to run on ]JPF-Android and then evaluate the efficiency and effectiveness of the tool on these apps. You will be using/improving existing approaches to generate stubs and models for applications and then compare the coverage and runtime on JPF-Android to other dynamic analysis tools.

  2. This project includes using JPF-Android to generated test sequences for android applications, and implementing a tool to convert these sequences into tests that can be run on the emulator. JPF-Android verifies Android applications outside of the Android software stack on JPF using a model environment to improve coverage and efficiency. It generates event sequences to drive the execution of the application during exploration. Each sequence also includes the configuration of the environment (device) for which the sequence was executed. This project uses the AndroidViewClient API in Python to run the set of event sequences as detected by JPF-Android on an emulator to find the number of valid sequences and the code coverage they obtain compared to JPF-Android.

  3. An extension of jpf-mobile-devices to generate the right initialization sequence for applications running inside JPF on Android. This project is different from the ones above in the sense that JPF is run as an Android application that can use the underlying Android environment, not as a normal application that models the Android environment. Also see the paper on jpf-mobile-devices.

Smart Contract Analysis

Develop a mechanism to allow the analysis of Ethereum Virtual Machine (EVM) bytecode by replacing the JVM bytecodes with EVM bytecodes within JPF. The second part of the project would be to extend the bytecodes further to allow symbolic execution as well.

JDart for Dynamic Taint Analysis

JDart is a dynamic symbolic execution engine Java based on Java PathFinder (JPF). The tool executes Java programs with concrete and symbolic values at the same time and records symbolic constraints describing all the decisions along a particular path of the execution. These path constraints are then used to find new paths in the program. Concrete data values for exercising these paths are generated using a constraint solver.

Currently JDart has two mechanisms for marking data values in a program for symbolic analysis. First, JDart analyzes methods during starting symbolic analysis. All parameters of an analyzed method are treated symbolically. Second, special annotations can be used to mark class members that should be analyzed symbolically. This mode of operation is geared towards generating test cases as well as the symbolic summaries of methods.

The goal of this project is to add a new mode of operation to JDart, in which symbolic variables can be introduced dynamically during execution – and also be made purely concrete again. This can be done by introducing symbolic variables for (or removing symbolic annotations from) the return values of specific methods (e.g., Random.nextInt()) – using listeners provided by JPF. This new mode of operation would have two benefits:

  1. Existing programs can be analyzed without modifications. And analysis is not confined to individual methods. This will, e.g., enable JDart to analyze the Java programs in the SVCOMP benchmarks (Injection Flaws).
  2. It is one step towards enabling using JDart for dynamic taint analysis, e.g., for analyzing potential for injection attacks (Minepump). Web-applications tend to contain code of form:
u = Request.getValueOf(“user”); //Taint u (symbolically) 
query = “SELECT user FROM table WHERE uid=”+u; //query becomes tainted
query = santitize(query); //Un-taint query (symbolically.) 
db.select(query);

If the value of u is not properly analyzed and sanitized, an attacker can exploit this code to gain access to arbitrary information in the applications database. Making the return value of Request.getValueOf(“user”) symbolic, is the first step towards tracing taint of this value.

Please note: For a full dynamic taint analysis, JDart would also have to be extended to analyze strings and arrays symbolically --- this is out of the scope of this project.

Test Case Generation/Model-based Testing with Modbat for JPF

JPF requires test cases as a starting point to explore a system. It is therefore suitable to use test case generation to create test cases automatically. Modbat is an open-source tool for test case generation. For testing concurrent software, an obvious choice would be to combine Modbat (to generate tests) with JPF (to execute tests and find concurrency problems). This has been done once as a proof of concept but is not supported in the current version of Modbat. The main reason for this is that Modbat's reporting has to read and parse bytecode, which requires access to some native code that JPF does not support. The goal is to find all problems where Modbat requires native access, and to use jpf-nhandler to resolve as many of these cases as possible. Remaining cases can be handled with custom model/peer classes.

Symbolic data-race detection for Habanera Java

Habanero Java is a Java implementation of the Habanero Extreme-scale programming model for multithreaded applications. The model is based on X10 and supports fork/join semantics as well as futures, isolation, and phasers. The advantage of structured parallelism such as Habanero is that the language itself provides concurrency guarantees such as deadlock freedom and determinacy if and only the program is free of data-race. A data-race occurs when two or more threads of execution access the same memory location and at least one of those accesses is a write. An additional advantage of structured parallelism is that run-times and analysis can be optimized based on the language structure itself. Recent work adds to JPF the ability to model check Habanero Java programs using a verification specific runtime and an algorithm that constructs and analyzes a computation graph representing the happens-before relation of the program execution (1, 2. The analysis is predictive because it infers from the single observed schedule the presence or absence of data-race in other non-observed schedules and only needs to enumerate schedules around isolation. Enumeration schedules around isolation though is still expensive and leads to state explosion in JPF. The work in this project is to mitigate this state explosion in enumerating schedules around isolation by building a symbolic computation graph from the program execution that adds constraints on the graph edges indicating under what condition the edge is active, and then using an SMT solver to find a set of edges on which a data-race exists. A first step in the project is to add a dynamic partial order reduction to JPF that is able to inform the symbolic computation graph about dependencies.