Scalable Remote Debugger Protocol

From Kodewerx
Jump to: navigation, search

This page is currently serving as a reference to kick-start development of the universal debugger protocol which will be used by the Universal Debugger Project and hopefully many, many other debuggers and debugger interfaces in the years to come.


These references are listed in order of relevance; most relevant first.

  1. RFC-909: Loader Debugger Protocol
  2. GDB Remote Serial Protocol
  3. RFC-643: Network Debugging Protocol
  4. IEN-158: XNET Debugging Protocol
  5. DBGp: A common debugger protocol for languages and debugger UI communication

The relevancy I've determined for this list is due to interest in these specs, as well as potential generic uses and protocol extension.

RFC-909 is so far the closest thing I have found which resembles the general idea I have for a "Universal Debugger Protocol". It's composed as a simple binary packet, it's extensible, and it's designed to be stacked on top of existing transport protocols such as TCP/IP. I doubt this exact spec will fit all of our needs, but it is certainly a good start.

GDB provides a fairly popular protocol. This one is designed for serial communications, so it will work well with small embedded devices. But it could be complicated to extend while retaining its GDB friendliness.

RFC-643 and IEN-158 are interesting only because they show that some experimentation on the ideas of remote debugging have been employed in the past. Unfortunately, these specs were designed for a specific architecture, and are of little practical use for our purposes.

DBGp shows what a modern remote debugging protocol can look like; including modern XML syntax. The downside to this is that low-level debuggers in small embedded devices are unlikely to parse XML at all.


This section represents my (Parasyte) own personal opinions and ideas, and should not be taken as advocacy for standardization.

One of the main goals of developing a "universal" protocol for debugging is that it must be usable everywhere; in small embedded devices, and some of the most powerful machines in the world. This kind of flexibility must be designed around multiple layers of abstraction. See OSI Model and Internet Protocol Suite for examples of abstraction layers used in communications technologies.

At the lowest layer, you find the wire; the physical means of transmitting information over distance. For our purposes, we should not limit ourselves to a single wire. Instead, we should allow the use of multiple wires, user-selectable, but never more than one at a time.

The following layers get more and more generic and abstract, until you reach the highest layer which represents what the application sees and interacts with. This would be the "protocol" itself.

So let's break these components down, hypothetically, and get into some details, ordered lowest layer first:

  1. Physical layer: Some examples of wires to support include LAN (Ethernet/WiFi), Wireless (Bluetooth), RS-232 (serial port, USB serial port), Inter-Process Communication (Domain Sockets? DBUS?)
  2. Transport layer: Some examples of transport protocols include TCP/IP, UDP/IP (LAN, Domain Sockets), UART (RS-232), IPC-specific (DBUS)
  3. Application layer: A library (or similar service, E.G. a daemon) to tie all transport layers into a single API that, to the application, looks like one simple interface to connect and send/receive data. The library/daemon will have to handle the transport-specific details behind-the-scenes.

Thinking about this led to a conundrum; If we support multiple wires, we have to support multiple transport protocols which are compatible with those wires. And if we support multiple transport protocols, we have to know which one our target implements. To make the API as simple as possible, we must not force clients to choose from configurable options (for a bad example) that requires a large degree of changes for each different type of connection made. How do we simplify the API so that a user can just plain connect without doing any pre-setup work?

Answer: The URI scheme. The unfortunate downside to this solution is that it is undesired to use URI schemes without registering them with IANA. However, an argument could be made that these schemes would not be used for general network/internet communication. A few popular examples of similarly non-networked schemes are the file: and about: URI schemes. (The exception here is that at least one physical layer (LAN) could be used for over-the-internet communication; but this has great benefits in its own right.)

Example URI Schemes

The following table represents some examples of how URI schemes could be used as debugger protocols:

srdp:// TCP/IP to remote host on a pre-defined default port
srdp+udp:// UDP/IP to remote host on port 9424
srdp+usb://localhost/ USB (SRDP-compatible devices) on localhost
srdp+uart://localhost:3/ UART COM port 3 on localhost
srdp+dbus://localhost/ DBUS IPC on localhost

The 'srdp' prefix on these examples is to specify the 'Scalable Remote Debugger Protocol.' The + and suffix defines an additional layer (or protocol) below SRDP.

The latter three examples look a bit odd with localhost being the destination, but this is necessary, since the localhost is the destination for hosting the UART RS-232 port, USB port, and IPC interface. Using non-loopback interfaces (IP addresses outside of the local machine) with these protocols should be undefined, unless there is evidence that connecting to RS-232/USB/IPC interfaces on other machines across a network is practical and plausible.

Simplified Configuration

These URI schemes give a very simple and elegant solution to the concerns they address. No longer will you be stuck with complicated configuration settings like the example below (upper left group box) ... and this is not an incredibly complex configuration dialog, as it is; instead, connecting to ANY low-level debugger in the world will be as simple as typing a URL.

Example of what not to do:

Gscc config.png

Operation Groups

The protocol is defined as a set of usable "requests" (AKA "operations" or "commands") requested by the client to the debugger, or vice-versa. Operations should be grouped according to a specific metric. The metric I've chosen is hardware (architecture) relationships. The table below shows an example of such groups (currently 6 in total) and example operations assigned to each group.

1) Diagnostics (Info, Ping/Pong, Reset, ...)
2) CPU handling (Register read/write, Arbitrary code execution, General CPU control, General process/thread control...)
3) Memory handling (Read, Write, Address conversion, Hardware I/O, Cache control, ...)
4) Breakpoint handling (Add, Delete, Edit, Get, ...)
5) Stream handling (stdin/stdout/stderr, Debugger-specific messages, ...)
6) Vendor-specific (Custom command sets; should be discouraged unless absolutely necessary)


This section defines a proposed specification which may be adopted as the "Scalable Remote Debugger Protocol". It is considered a work in progress and is currently open for peer-review, meaning we are interested in receiving comments, criticisms, and suggestions.

Protocol Goals

Goals of the protocol include:

  1. Client/server relationship: Target (debuggee) acts as a server, quietly listening for any SRDP requests; User Interface acts as a client, making explicit requests to a listening server.
  2. Asynchronous requests: A client must send requests without expecting an immediate response. A server accepting requests may not respond immediately to those requests.
  3. Scalable: The data structure (format) used in the protocol must be adaptable to the future; The structure must be as forgiving and dynamic as possible, avoiding fixed contents (except where absolutely necessary) and allowing for [non-mission-critical] non-standard contents.
  4. Easy to implement: Basic features of the protocol should be easy to implement from an API point-of-view, as well as having a small memory footprint; the protocol must be usable on small embedded machines with few resources.
  5. Robust: Ambiguity should be kept to a minimum in all aspects of the protocol; every bit transferred should have a useful meaning.
  6. Easy to debug: A debugger protocol that cannot itself be debugged (observed and verified to work as expected) is a failure in and of itself. For this reason, the protocol should be human-readable in its most basic form.

Underlying Protocols

There are no reservations on any underlying protocols (protocols used to move data from the client to the server, and back again -- SRDP is not one of these protocols). The only requirement is that they provide hand-shaking (transmission control), sequential ordering of packet data arrival, and data integrity checking. Some examples of suitable underlying protocols include TCP/UDP/IP, and UART.

The initial reference implementation will use TCP/IP for remote connections. For local-listening servers, the reference implementation will use UNIX Domain Sockets on UNIX-like operating systems, and Named Pipes on Windows.

Requests, Responses, Alerts

Packets are given different names depending on their transmission direction (client -> server, or server -> client) and intended recipient (server, specific client, or all clients).

Response, requests, and alerts must have a unique identifier associated with them. This will allow clients and servers to stay in sync, knowing which responses are for which requests, for example.


A packet is called a request if it is from a client to the server. The name "request" comes from the idea that the client is requesting information or a specific action to be performed by the server.


Responses are packets from a server to a specific client. Responses are always sent in response to a request (hence the name). However, not all requests are required to send responses (which is why "requests" are not called "commands"). Responses are only sent to the client which initiated the specific request being handled.


An alert is a special type of response (a packet from the server to clients); an alert is sent to all connected/listening clients. This is synonymous with network "broadcast" packets, and it useful for notifying all clients of information they might like to know.

A few examples of information that all clients might like to know are:

  • Breakpoint hits
  • Pausing/resuming execution
  • Resets
  • Debugging messages (log messages)

Not all alerts are initiated by requests from clients, but most will be. Log messages are typically spewed by programs without explicit requests; SRDP can allow listening for and capturing these messages.


The all-encompassing term for requests, responses and alerts is "commands". Any time a "command" is mentioned, it refers to any combination of requests, responses, or alerts.

Protocol Packet Data Structure

The "goals" section outlines the major features which formed the following data structure. Inspiration comes mainly from JSON, the JavaScript Object Notation. As JSON is a serialization [text format] of JavaScript objects, the SRDP data structure is a serialization of the data being transmitted.

The structure also shares some inspiration from RPC; An example is that your client may want to read work RAM from the target. The SRDP request for "read memory" is technically similar to remotely running a "read memory" function on the server, invoked by the client. For this reason, each SRDP packet contains one or more "arguments" which you could imagine are passed directly to a plain old function.

Each packet is sent as a series of 8-bit bytes. Packets are broken down into a "request/response/alert" name (called a command), encapsulating a series of "arguments". You can think of it like a C function call. The "info" command, for example, requests information about the target machine architecture; it requires no arguments. The info command looks like this, and is a complete and valid SRDP packet:


Each argument has a name (made of one or more characters: alpha-numeric, underscore (_), or hyphen (-). The argument name is followed by a colon (:) and then a single byte representing the data type of the argument, then an equals sign (=) and the argument's value. All arguments are separated with a comma (,).

The argument syntax is similar to that of CSS. In pseudo-form, it looks something like this:


Valid data types:

{type} Name Description
n Number Any positive integer, encoded as a VLI
s Signed Number Any negative integer, encoded as a one's complement VLI
f Floating Point Number Any non-integer number, Infinity, or NaN, encoded as a null-terminated UTF-8 string, or null-terminated UTF-16 or UTF-32 string with BOM; To be decoded by sscanf
a Array Byte-array (binary blob), preceded by a VLI to indicate the length of the array.
c Compressed Array Byte-array (binary blob) with RLE compression. See #Compressed Array Data Type
t Text Null-terminated UTF-8 string without BOM, or null-terminated UTF-16 or UTF-32 string with BOM

Some example arguments. (Please keep in mind that all of the argument names listed within this section are for demonstration purposes only, and are not recommended for reference purposes.)

msg:t=Hello, World!␀

What the symbols mean:

 ␀: Null-terminator
 □: VLI
 ■: Data byte

Compressed Array Data Type

One suggestion is making this data-type optional, and changing this spec to use a standard library, like zlib. In this way, slow machines with few resources can adhere to the SRDP spec without wasting precious footprint space and computational power implementing such "heavy" libraries.

Using a "heavy-weight" compression library will help achieve better compression ratios, but will add processing overhead. The added overhead may turn out to be an overall hindrance. For example, a memory editor might want to refresh a memory display as fast as possible (several times per second) within a small visible view-port window. This kind of editor will need to download the contents of memory within the visible range quickly. Using compression in a use-case like this is a good idea, but only if transferring the smaller packet can make up for the time required to compress and decompress the data.

Compressed arrays are similar to the standard "Array" data type. The compressed array starts with a single VLI to represent the total data size of the value (e.g. the size of the compressed data, x). The following data is a series of alternating raw data and RLE data.

  1. Raw Data: A VLI representing raw data size in bytes (n), followed by n bytes of actual data.
  2. RLE Data: A VLI representing RLE data size in bytes (n), followed by a single byte to be repeated (n + 4) times in the output.

This series is repeated until there are no more bytes to be read from input (x bytes of the argument value have been read).

For the RLE compression to be useful (efficient) it must not be used for any less than 4 bytes (therefore, the VLI is said to be a "4-based" number). The number 4 is derived from the minimum overhead introduced by the serial alternation and VLIs; 1 VLI for RLE output length, 1 byte of RLE data, 1 VLI for raw data length.

Thus, in order for the RLE to perform "compression", the RLE output must be larger than the smallest sequence required to switch from raw data and back again. Some examples to illustrate, bold-italic bytes are for VLI "control sequences" (the data lengths specified above):


Non-compressed data:
94 24 51 73 00 00 00 01

Incorrectly compressed data:
04 94 24 51 73 03 00 01 01

Correctly compressed data:
08 94 24 51 73 00 00 00 01

Non-compressed data:
94 24 51 73 00 00 00 00

Correctly compressed data:
04 94 24 51 73 00 00

Non-compressed data:
00 00 00 00 00 00 00 00

Correctly compressed data:
00 04 00

The reason the second line in example 1 above is "incorrectly" compressed is because the second VLI is expecting the length to be 0-based. If this was the case, you would be simply adding overhead to replace any bytes saved by the "compression". For this reason, the "correct" way to compress the example is to use a single length of raw data. This example is non-compressible, and should not be sent as a compressed array data type.

In the second example, the data can be compressed nicely, saving a byte overall (including compression overhead). Since this is the "correct" way to compress the data, it is using a 4-based VLI on the RLE Data: "00" means 4 bytes of output, "01" means 5 bytes, "02" means 6 bytes, etc.

The third example shows how to compress a series of bytes that starts with repeating data, instead of non-repeating "raw data". The first VLI of 00 means there is no raw data for output (this is required: compressed arrays always begin with the Raw Data, followed by RLE Data). The second VLI 04 is the length of the RLE Data; 8 bytes. Even if the non-compressed data was 4 bytes of "00", the compressed array would still be only 3 bytes of total data, saving one byte. This helps explain the reasoning behind the 4-based VLI for RLE Data.

Object Data Type

The basic structure and data types shown so far are very powerful; your server can tell the client vast amounts of information, such as CPU architecture, memory maps, I/O maps, etc. with just a handful of arguments. However, grouping these arguments in a meaningful way may be difficult. You might be inclined to do some "mock" namespacing, like prefixing each CPU-related argument with "cpu_". This is effective, but also slightly wasteful and error-prone.

The "object" data type is designed to handle such situations. This data type allows an argument to be a container for other arguments. Its format looks like this:


Where the set of ellipses denotes one or more "regular" arguments. Here is an example of what a "cpu" object might look like:



(Note the white-space is for readability only; it is not meant to be transferred as part of the protocol data.)

In this example, the "cpu" object contains two arguments: cpu.arch and; both strings.

But there are also times when you will want your server to send that same information for two [or more] CPU architectures on a single target. Some platforms may have multiple CPUs, each with its own individual set of resources (memory maps and the like), as well as shared resources between the CPUs. For this, the packet data structure needs a more advanced method of communicating these kinds of details.

For this case, you can optionally create arrays of objects by including a comma (,) after the closing square bracket, followed by another series of arguments enclosed in their own square brackets, ad infinitum. In other words, any leading object definitions without explicit names will be treated as additional array elements for the previously named object.





Now our "cpu" object defines two CPUs: an ARM9 and ARM7, ready for Nintendo DS hacking. These arguments can be referenced as cpu[0].arch, cpu[0].name, cpu[1].arch, and cpu[1].name respectively.

Objects can be arbitrarily complex, containing arrays of other objects. Here is an example of a simple memory map, containing Work RAM (read/write/execute) and ROM (read/execute) sections. The Work RAM section is broken into two distinct memory ranges. This is quite easily expressed:

name:t=Work RAM␀,




Command Reference




Request information about the target machine architecture. The info() request is implied during connection: A response will be sent after a connection is successfully established. This allows clients to gather necessary information about the target during the connection phase, without explicitly making the request.

Return values:

Must only return one of the following:

  • sys (string):
    A shorthand for defining known machines (mostly video game systems).
    See below for supported values
  • machine (object):
    A longhand for defining the complete machine architecture.
    TODO: Specify all object elements and their meanings.

The currently supported values for sys are:

  • NES: Nintendo Entertainment System
  • SNES: Super Nintendo Entertainment System
  • N64: Nintendo 64
  • NGC: Nintendo GameCube

  • PS1: Sony Playstation
  • PS2: Sony Playstation 2

Planned future support:

  • GB: Nintendo GameBoy
  • GBC: Nintendo GameBoy Color
  • GBA: Nintendo GameBoy Advance
  • NDS: Nintendo DS/DSi
  • VB: Nintendo Virtual Boy
  • WII: Nintendo Wii

  • SMS: Sega Master System
  • SMD: Sega Genesis/Mega Drive
  • SCD: Sega CD/Mega CD
  • 32X: Sega 32X
  • GG: Sega GameGear
  • SAT: Sega Saturn
  • DC: Sega Dreamcast

  • PSP: Sony Playstation Portable

  • TGC: Tiger

Considering support "some day":

  • 3DO: 3DO
  • A26: Atari 2600
  • LYNX: Atari Lynx/Lynx II
  • JAG: Atari Jaguar
  • WS: Bandai Wonder Swan
  • WSC: Bandai Wonder Swan Color
  • GP32: GamePark 32
  • GP2X: GamePark Holdings GP2X
  • WIZ: GamePark Holdings GP2X Wiz
  • XBOX: Microsoft XBox
  • X360: Microsoft XBox 360
  • TG16: NEC TurboGrafx-16/PC Engine
  • TGCD: NEC TurboGrafx-CD/PC Engine CD
  • SGFX: NEC SuperGrafx
  • NGEO: Neo*Geo AES/MVS
  • NGP: Neo*Geo Pocket
  • NGPC: Neo*Geo Pocket Color
  • PKMN: Nintendo Pokémon Mini
  • SGB: Nintendo Super GameBoy
  • CDI: Philips CD-i
  • VMU: Sega Dreamcast VMU
  • PS3: Sony Playstation 3
  • SV: SuperVision

Proposal Comments

The "Protocol Goals" have lead much of the motivation for developing this proposal. This presents what seems to be several very strange choices, at first glance. The choice of representing floating point numbers as a string of text seems extremely odd, until you consider that using the target-native floating point data format would make no sense when sending that data to a remote client (which is most likely running on a different architecture, and may have totally different native floating point formats). In order to express floating points with arbitrary precision in a data format-agnostic way, it is necessary to use a non-native format like text.

Another oddity in this spec is the use of VLIs (variable-length integers) and their affect on the rest of the format. The main purpose for using VLIs is address widths. Some architectures can express their full address range within a single byte. Others require up to 8 bytes for a full 64-bit address range. Future architectures are by no means limited to 64-bit address widths. For this very reason, it is necessary to scale down as well as up. A VLI can express an address in as little as a single byte, or scale upward to arbitrarily large numbers. This makes VLIs perfect for addressing requirements among any architecture.

VLIs present their own issues, however. For example, expressing a negative number as a VLI is nearly incomprehensible. Some might be inclined to reserve one bit within a VLI to indicate signedness, but that's another bit that cannot be used to minimize VLI overhead. The overhead is additional bytes required to represent a full number in a VLI system. For example, it is common for numbers 0 - 127 to be contained entirely within a single byte, including the overhead. But numbers between 128 - 255 require an additional byte to include more VLI "header" information (used to extend VLIs into arbitrarily long numbers). This is counter-intuitive, where a single byte itself can hold numbers between 0 - 255. Adding an additional sign bit reduces the range of VLIs by half: a single byte can only encode numbers between 0 - 63.

The solution is to use a different data type specifically for expressing negative numbers. The VLI is encoded just like a positive number, but when interpreting the VLI, it must be converted to a negative number by either subtracting from zero (0 - n) or multiplying by negative one (n * -1). This is referred to as a "one's complement".

In general, the efficiency of a VLI is very static. That means, a number using 0 - 7 bits of data (for example, the number "0" uses 0 bits of data, and the number "64" [binary: 100000] uses 6 bits) can be encoded into a single byte, a number using 8 - 14 bits can be encoded into 2 bytes, a number using 15 - 21 bits can be encoded into 3 bytes, etc. See for more information on the kind of VLI I am considering for this proposal.