Dr. Memory

Project Suggestions for Contributors

General Prerequisites for Contributors

Dr. Memory and its underlying engine, DynamoRIO, are written in C. Knowledge of C programming is required for all of the projects listed here. Some projects that involve building new tools may alternatively involve C++ programming. Knowledge of operating system fundamentals and experience with runtime systems are also helpful.

Some lower-level projects require knowledge of x86, AMD64, or ARM assembly code, or program analysis skills. For assembly, what matters is knowing the architecture, register set, and instruction set; experience writing in a particular assembler is not required. Each project lists the skill set best suited to it.

As a starting point to learn more about Dr. Memory, we suggest first becoming familiar with both the Dr. Memory end-user tool and its underlying engine, DynamoRIO. Use the links in the sidebar to download Dr. Memory and try it out. Take a look at our tutorial and talk slides to understand the basics of how DynamoRIO works and how tools are built on top of it.

Before starting work on a significant project, we suggest first contributing small patches to DynamoRIO or Dr. Memory, both to become familiar with the workflow and to let the developers gain confidence in your work. Try searching for the "GoodContrib" or "GoodFirstBug" labels in the project issue trackers. Another option is the "Component-Tests" label: tests tend to be more isolated and simpler for newcomers to tackle. Talk to us before you start coding so that we can coordinate work and agree on what is to be done.

Please also read our contributor policies and the code review and workflow information linked from there.


Project List:

  1. Build a Binary Logging Tool

    This project involves building a new "binary logging tool" that lets a user insert logging calls into a binary without recompiling it. For example, the user could ask for the argument and return value to be printed every time a target function is called. This speeds up debugging and program analysis. Once the initial features are complete, more advanced features could include access to local variables, specifying logging points and actions via a scripting language, and other extensions.
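    As a rough illustration of the logging behavior described above, the plain-C sketch below models what injected instrumentation would do around each call to a target function: a pre-call hook records the argument and a post-call hook records the return value. It does not use DynamoRIO; the names `wrapped_call`, `pre_call_hook`, `post_call_hook`, and `square` are hypothetical, and a real tool would insert the hooks into the running binary rather than calling a wrapper explicitly.

```c
#include <stdio.h>

/* Hypothetical sketch: a log record for one intercepted call. */
typedef struct {
    const char *func_name;
    long arg;
    long retval;
} log_entry_t;

#define MAX_LOG 64
static log_entry_t call_log[MAX_LOG];
static int log_count = 0;

typedef long (*target_fn_t)(long);

/* Pre-call hook: record the function name and argument; returns the log slot. */
static int pre_call_hook(const char *name, long arg)
{
    if (log_count >= MAX_LOG)
        return -1; /* log full: skip recording */
    call_log[log_count].func_name = name;
    call_log[log_count].arg = arg;
    call_log[log_count].retval = 0;
    printf("enter %s(arg=%ld)\n", name, arg);
    return log_count++;
}

/* Post-call hook: record and print the return value. */
static void post_call_hook(int slot, long retval)
{
    if (slot < 0)
        return;
    call_log[slot].retval = retval;
    printf("leave %s -> %ld\n", call_log[slot].func_name, retval);
}

/* Stand-in for what inserted instrumentation would do around each call. */
static long wrapped_call(const char *name, target_fn_t fn, long arg)
{
    int slot = pre_call_hook(name, arg);
    long ret = fn(arg);
    post_call_hook(slot, ret);
    return ret;
}

/* Example application function being logged. */
static long square(long x)
{
    return x * x;
}
```

    For example, `wrapped_call("square", square, 7)` prints `enter square(arg=7)` and `leave square -> 49` and returns 49. Within DynamoRIO, the drwrap extension provides pre- and post-call callbacks that could play the role of these hooks.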

    Prerequisites

    C or C++ programming: the tool could be written in either.

    Skill level: medium.

  2. Build a Library Tracing Tool

    Currently we have a prototype library tracing tool "drltrace". Its goal is to list all of the calls to library routines during the execution of a target application. It is in need of significant usability and feature improvements, including better separation of inter-library versus application-to-library calls, filtering of which library routines to trace, adding argument values and return values, and adding statistics modes as alternatives to full traces. This project involves owning the tool and making it a useful diagnostic tool for users.
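    The filtering and statistics modes mentioned above could look roughly like the following plain-C sketch, which counts calls per library routine instead of printing a full trace. The prefix filter, the fixed-size table, and all function names are assumptions made for illustration; they are not drltrace's actual design.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of a statistics mode for a library tracer. */
#define MAX_ROUTINES 32

typedef struct {
    char name[64];
    unsigned long count;
} routine_stat_t;

static routine_stat_t stats[MAX_ROUTINES];
static int num_routines = 0;

/* Assumed filter: trace only routines whose names start with this prefix.
   An empty prefix matches every routine. */
static const char *trace_filter_prefix = "";

static int passes_filter(const char *name)
{
    return strncmp(name, trace_filter_prefix, strlen(trace_filter_prefix)) == 0;
}

/* Statistics mode: bump a per-routine counter instead of printing each call. */
static void record_library_call(const char *name)
{
    if (!passes_filter(name))
        return;
    for (int i = 0; i < num_routines; i++) {
        if (strcmp(stats[i].name, name) == 0) {
            stats[i].count++;
            return;
        }
    }
    if (num_routines < MAX_ROUTINES) {
        snprintf(stats[num_routines].name, sizeof(stats[num_routines].name), "%s", name);
        stats[num_routines].count = 1;
        num_routines++;
    }
}

/* Look up how many times a routine was called (0 if never seen or filtered out). */
static unsigned long call_count(const char *name)
{
    for (int i = 0; i < num_routines; i++)
        if (strcmp(stats[i].name, name) == 0)
            return stats[i].count;
    return 0;
}

/* Print the summary a user would see at process exit. */
static void print_summary(void)
{
    for (int i = 0; i < num_routines; i++)
        printf("%-20s %lu\n", stats[i].name, stats[i].count);
}
```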

    Prerequisites

    C programming.

    Skill level: low to medium.

  3. Build a Shadow Memory Tool

    There are several possible shadow-memory-based tools that this project could focus on. If time permits, the project could encompass building multiple tools.

    One tool is a last-writer tool, similar to what is described in the paper "Data Provenance Tracking for Concurrent Programs" in CGO 2015 by Brandon Lucia and Luis Ceze. Knowing which thread and which instruction last wrote a memory value can be invaluable during debugging. This tool would use simple instrumentation to record that information.
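    A minimal sketch of last-writer tracking, assuming a single small shadowed region with one shadow record per application byte. A real tool would cover the whole address space with a scalable shadow memory mapping and would have instrumentation invoke something like the hypothetical `on_store` before each store; here it is called directly, and all names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one shadow record per byte of a small application region. */
#define REGION_SIZE 256

typedef struct {
    uint32_t thread_id; /* thread that last wrote this byte (0 = never written) */
    uintptr_t pc;       /* address of the instruction that wrote it */
} last_writer_t;

static uintptr_t region_base;
static last_writer_t shadow[REGION_SIZE];

static void shadow_init(uintptr_t base)
{
    region_base = base;
    for (size_t i = 0; i < REGION_SIZE; i++) {
        shadow[i].thread_id = 0;
        shadow[i].pc = 0;
    }
}

/* Conceptually invoked by inserted instrumentation for each store. */
static void on_store(uintptr_t addr, size_t size, uint32_t tid, uintptr_t pc)
{
    for (size_t i = 0; i < size; i++) {
        uintptr_t a = addr + i;
        if (a < region_base || a - region_base >= REGION_SIZE)
            continue; /* outside the shadowed region */
        shadow[a - region_base].thread_id = tid;
        shadow[a - region_base].pc = pc;
    }
}

/* Debugging query: who last wrote this byte? NULL if the address is not shadowed. */
static const last_writer_t *query_last_writer(uintptr_t addr)
{
    if (addr < region_base || addr - region_base >= REGION_SIZE)
        return NULL;
    return &shadow[addr - region_base];
}
```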

    Another possible tool is a data contention tool that uses shadow memory to detect cache-line contention within multi-threaded concurrent programs. The tool would be implemented along the lines of the paper "Dynamic Cache Contention Detection in Multi-threaded Applications" by Qin Zhao, David Koh, Syed Raza, Derek Bruening, Saman Amarasinghe, and Weng-Fai Wong in VEE 2011.
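    The following is a deliberately simplified model of cache-line contention detection, not the algorithm from the VEE 2011 paper: each cache line's shadow holds a bitmask of the threads that hold a copy, and a write by one thread while others hold copies is counted as an invalidation. The 64-byte line size, the limit of 32 threads, and all names are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model; not the published algorithm. */
#define LINE_SIZE 64 /* assumed cache-line size in bytes */
#define NUM_LINES 16 /* number of lines in the simulated region */

typedef struct {
    uint32_t owners;             /* bit t set: thread t holds a copy (t < 32 assumed) */
    unsigned long invalidations; /* writes that invalidated another thread's copy */
} line_shadow_t;

static uintptr_t cc_base;
static line_shadow_t lines[NUM_LINES];

static void cc_init(uintptr_t base)
{
    cc_base = base;
    for (size_t i = 0; i < NUM_LINES; i++) {
        lines[i].owners = 0;
        lines[i].invalidations = 0;
    }
}

/* Map an application address to its cache line's shadow, or NULL if untracked. */
static line_shadow_t *line_for(uintptr_t addr)
{
    if (addr < cc_base)
        return NULL;
    uintptr_t idx = (addr - cc_base) / LINE_SIZE;
    return idx < NUM_LINES ? &lines[idx] : NULL;
}

/* A read gives the reading thread a shared copy of the line. */
static void on_read(uintptr_t addr, unsigned tid)
{
    line_shadow_t *l = line_for(addr);
    if (l != NULL)
        l->owners |= (uint32_t)1 << tid;
}

/* A write invalidates every other thread's copy; count it if there were any. */
static void on_write(uintptr_t addr, unsigned tid)
{
    line_shadow_t *l = line_for(addr);
    if (l == NULL)
        return;
    uint32_t me = (uint32_t)1 << tid;
    if (l->owners & ~me)
        l->invalidations++;
    l->owners = me;
}
```

    Two threads writing different variables that happen to share a line (false sharing) drive the invalidation count up, while reads of a line by several threads do not.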

    A final possible tool would construct a dependence graph among the instructions of a target application based on the data read/write history observed during execution. Such a graph shows which instructions produce the values that other instructions consume, which can provide insight into a program's data flow. This is the most complex of the three proposed tools.
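    One way to derive dependence edges from the read/write history is to remember the last writing instruction for each location and, on every read, add an edge from that writer to the reader. The sketch below assumes a small simulated memory indexed by offset and records only read-after-write (true) dependences; write-after-read and write-after-write edges, as well as compact graph storage, are left out. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: read-after-write dependence edges between instructions. */
#define MEM_SIZE 128
#define MAX_EDGES 64

typedef struct {
    uintptr_t from_pc; /* instruction that wrote the value */
    uintptr_t to_pc;   /* instruction that later read it */
} dep_edge_t;

static uintptr_t last_writer_pc[MEM_SIZE]; /* 0 = no writer seen yet */
static dep_edge_t edges[MAX_EDGES];
static size_t num_edges = 0;

static int has_edge(uintptr_t from, uintptr_t to)
{
    for (size_t i = 0; i < num_edges; i++)
        if (edges[i].from_pc == from && edges[i].to_pc == to)
            return 1;
    return 0;
}

static void add_edge(uintptr_t from, uintptr_t to)
{
    if (has_edge(from, to) || num_edges >= MAX_EDGES)
        return; /* already recorded, or table full */
    edges[num_edges].from_pc = from;
    edges[num_edges].to_pc = to;
    num_edges++;
}

/* addr is an offset into the simulated MEM_SIZE-byte memory. */
static void dep_write(size_t addr, uintptr_t pc)
{
    if (addr < MEM_SIZE)
        last_writer_pc[addr] = pc;
}

/* A read depends on whichever instruction last wrote the same location. */
static void dep_read(size_t addr, uintptr_t pc)
{
    if (addr < MEM_SIZE && last_writer_pc[addr] != 0)
        add_edge(last_writer_pc[addr], pc);
}
```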

    Prerequisites

    C or C++ programming: the tool could be written in either.

    Skill level: medium to high.

  4. Integration with GDB

    The goal of this project is to bring the power of dynamic instrumentation into the debugger. Traditional debuggers can only pause execution and then inspect the state of the program. Dynamic instrumentation, by contrast, can collect dynamic execution information for program analysis, including profiling data and dynamic information flow tracking. Dynamic instrumentation tools such as Valgrind and Dr. Memory are widely used for finding memory bugs like uninitialized reads.

    We are exploring the possibility of creating a debugger augmented with our dynamic instrumentation platform DynamoRIO. The debugger front-end would remain the normal debugger that a user sees, while the program being debugged is actually running under a DynamoRIO-based tool that keeps track of execution information. This would make much more information available to help the user debug the target program.

    Our first target is GDB, and this project's first step is to begin integration with GDB by enabling DynamoRIO to talk to GDB using GDB's existing remote debugging protocol. Once that is completed, the project can move on to subsequent steps including enabling DynamoRIO to work within a debugging environment and perform debugging tasks issued by GDB. Time permitting, the project could include initial work on adding powerful new debugging tools to GDB.
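    GDB's remote protocol frames each packet as `$`, the payload, `#`, and a two-digit hexadecimal checksum equal to the sum of the payload bytes modulo 256. The sketch below shows only that framing and the checksum check, which a DynamoRIO-side stub would need first; the function names are hypothetical, and acknowledgments, escaping of special characters, and run-length encoding are omitted.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: frame payload as "$<payload>#<checksum>".
   Returns the number of characters written, or -1 if out is too small. */
static int rsp_frame_packet(const char *payload, char *out, size_t out_size)
{
    unsigned int sum = 0;
    for (const char *p = payload; *p != '\0'; p++)
        sum += (unsigned char)*p;
    int n = snprintf(out, out_size, "$%s#%02x", payload, sum & 0xff);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}

/* Hypothetical helper: check an incoming packet's framing and checksum.
   Returns 1 if valid, 0 otherwise. */
static int rsp_packet_valid(const char *pkt)
{
    size_t len = strlen(pkt);
    if (len < 4 || pkt[0] != '$' || pkt[len - 3] != '#')
        return 0;
    if (!isxdigit((unsigned char)pkt[len - 2]) || !isxdigit((unsigned char)pkt[len - 1]))
        return 0;
    unsigned int sum = 0;
    for (size_t i = 1; i < len - 3; i++)
        sum += (unsigned char)pkt[i];
    unsigned int expected;
    if (sscanf(pkt + len - 2, "%2x", &expected) != 1)
        return 0;
    return (sum & 0xff) == expected;
}
```

    For example, framing the register-read request `g` yields `$g#67`, and the reply `OK` is sent as `$OK#9a`.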

    Prerequisites

    C programming and low-level architectural knowledge. Familiarity with debugging in GDB will also help.

    Skill level: medium.

  5. System Call Tracing on Windows

    We have a tool called Dr. Strace which provides a system call trace on Windows. This project involves adding new features to the tool in order to make it more usable, including filtering options and filling in missing system call data in our Windows system call database.

    Prerequisites

    C programming and Windows low-level operating system knowledge.

    Skill level: medium.

  6. Port Code from x86 to ARM

    We have a large code base that we are in the process of porting to the ARM architecture. This project involves helping to port our tool suite, sample tools, extension libraries, and end-user tools to ARM. Some of the work involves porting x86 assembly code and dynamically generated machine code and will require knowledge of the ARM ISA.

    Prerequisites

    C programming and ARM assembly experience or ARM ISA knowledge. Access to an ARM-based device that can be used for development is also required.

    Skill level: low to medium.

  7. Add Post-Processing and Multi-Run-Aggregation Features to Dr. Memory

    Today, Dr. Memory produces a separate error report for each process of an application. In some cases, symbols are not available during execution. In other cases, an application consists of multiple processes, but the user would like to see a single error report rather than separate reports per process. This project involves adding the ability to re-symbolize, re-suppress, or combine results from prior runs under Dr. Memory.
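    Combining results from several processes could start from a merge step like the plain-C sketch below, which treats two records with the same error type and the same symbolized callstack as one error and sums their counts. The `error_record_t` layout and the exact-string matching rule are assumptions for illustration; a real implementation would parse Dr. Memory's results files, which this sketch does not attempt, and re-symbolization or re-suppression would operate on the records before this step.

```c
#include <string.h>

/* Hypothetical sketch: merge per-process error records into one report. */
#define MAX_ERRORS 32

typedef struct {
    char type[32];       /* error category, e.g. "UNINITIALIZED READ" */
    char callstack[128]; /* symbolized frames joined into one string */
    unsigned count;      /* number of occurrences */
} error_record_t;

static error_record_t merged[MAX_ERRORS];
static int num_merged = 0;

/* Same type and same callstack are treated as the same error. */
static void merge_record(const error_record_t *rec)
{
    for (int i = 0; i < num_merged; i++) {
        if (strcmp(merged[i].type, rec->type) == 0 &&
            strcmp(merged[i].callstack, rec->callstack) == 0) {
            merged[i].count += rec->count;
            return;
        }
    }
    if (num_merged < MAX_ERRORS)
        merged[num_merged++] = *rec; /* first time this error is seen */
}

/* Fold all records from one process's report into the combined report. */
static void merge_report(const error_record_t *recs, int n)
{
    for (int i = 0; i < n; i++)
        merge_record(&recs[i]);
}
```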

    Prerequisites

    C programming.

    Skill level: low to medium.

  8. Dr. Memory Annotations

    Application annotations are used in other tools to identify exceptional cases or to direct the tool's checks so that it handles unusual application behavior. Dr. Memory does not yet support them, although the underlying tool platform does have an annotation infrastructure. This project involves implementing annotations for Dr. Memory based on that infrastructure and using them to simplify testing and improve usability.

    Prerequisites

    C programming.

    Skill level: low to medium.

  9. Use LLVM as a Decoding and Encoding Library

    This project involves building an AArch64 decoder, encoder, and disassembler for DynamoRIO, the tool platform underlying Dr. Memory, using LLVM's existing code. The LLVM code is not designed to be used within the constraints of a dynamic binary translator, however, and modifications will be needed. The output of the decoder and input of the encoder will also need to be transformed to the DynamoRIO instruction representation format.

    Prerequisites

    This is the most advanced and ambitious of the projects listed here and requires the most time. It will require C++ and C programming and familiarity with what an ISA is, through either assembly experience or general architectural knowledge.

    Skill level: high.


Contact Information

To discuss contributing to any of these projects, join the dynamorio-users forum.