“Line-oriented JSON”: Using JSON for quick-and-dirty protocols and REPLs

As a Python and JavaScript programmer, I love JSON. Both languages support JSON really well, and it generally is more terse than XML while also providing basic built-in types. It’s very easy to rely on JSON as a general-purpose serialization format which can cross between both languages with a minimum of effort.

As such, I once decided to use JSON to control a program via a read-eval-print loop (REPL). Rather than inventing my own parsing scheme, it made sense to use JSON. Indeed, it is relatively easy to do remote procedure calls with JSON, as the existence of the JSON-RPC specification shows. (However, this post is not about JSON-RPC, nor do I use that protocol here, although similarities undoubtedly exist.)

As an example, here’s what a JSON-encoded remote function call might look like:

{
    "fn": "hello_person",
    "args": ["Steve"]
}

This can trivially be converted to JS or Python on the receiving side:

hello_person("Steve")
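As a rough sketch of that conversion in Python: the FUNCTIONS table and the dispatch helper below are names I am making up for illustration, and looking the function up in an explicit table (rather than by arbitrary name) is an assumption about how you would want to restrict what can be called.

```python
import json

def hello_person(name):
    return f"Hello {name}!"

# Hypothetical table of functions the receiving side is willing to call.
FUNCTIONS = {"hello_person": hello_person}

def dispatch(message):
    """Decode a JSON-encoded call and invoke the named function."""
    call = json.loads(message)
    fn = FUNCTIONS[call["fn"]]
    return fn(*call["args"])

result = dispatch('{"fn": "hello_person", "args": ["Steve"]}')
print(result)
```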

Now, the one key weakness I see with protocols based upon JSON is: what happens when an invalid JSON object is sent?

This may seem unlikely. In most cases, perhaps it is: if you always use mature JSON libraries with zero bugs, and never write JSON by hand, perhaps there is not much to worry about. But let’s say you can’t always guarantee that the JSON is pristine. What happens if it gets corrupted? Or what happens if the message is not valid JSON at all, either because of a typo or because entirely the wrong type of serialized object was sent across the wire? Things like this are not quite as unlikely.

While I was working on my JSON-based REPL, invalid input caused by user error was a real concern, so I came up with a solution to address it. Frankly, I would be kind of surprised if no one else came up with this before me. But I did come up with this approach independently, and I think it’s a good approach to the problem.

Leverage JSON as a line-oriented protocol.

So, what is “line-oriented JSON”? It can be described like this:

  • Line-oriented JSON uses JSON to encode messages to be transmitted across some stream. (E.g. stdio, sockets.)
  • Each JSON message is terminated by a literal newline character.
  • JSON messages can be any valid JSON term (objects, arrays, strings, etc.). However, literal newline characters are not allowed anywhere in the term. This means that objects and arrays (including all nested terms) must be expressed on a single line. Newlines inside strings must be written as the “\n” escape sequence, which JSON requires inside strings anyway.
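On the sending side, these rules take very little code in Python. This is a minimal sketch, and encode_line is a hypothetical helper name. It relies on json.dumps producing single-line output when no indent argument is given, and on it escaping newlines inside strings.

```python
import json

def encode_line(term):
    """Serialize a JSON term as one newline-terminated line."""
    # Without indent=, json.dumps emits no literal newlines;
    # a newline inside a string is written as the two characters \n.
    return json.dumps(term) + "\n"

line = encode_line({"text": "first\nsecond", "items": [1, 2, 3]})
print(repr(line))
```

The only literal newline in the result is the terminator at the end of the line.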

Using line-oriented JSON, it becomes trivial to write parsers for JSON-based protocols: just read a line from the stream and load it via a JSON parser. By definition, the line should be a JSON term; if it is not, an error can be returned.
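In Python, the receiving side might look something like the sketch below. The decode_line name and the (term, error) return shape are my own choices for illustration, not part of any specification.

```python
import json

def decode_line(line):
    """Parse one line; return (term, None) or (None, error message)."""
    try:
        return json.loads(line), None
    except json.JSONDecodeError:
        # The line is not a complete JSON term, so report it and move on.
        return None, "Invalid JSON"

good = decode_line('{"fn": "hello_person", "args": ["Steve"]}\n')
bad = decode_line('{"fn": "hello_person", "args": ["Steve"]\n')
print(good)
print(bad)
```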

Additionally, this builds in error recovery in the case of invalid JSON input. Under normal circumstances, it may be difficult to recover from invalid JSON, especially on open-ended streams such as sockets or stdio where we have to detect the end of a given JSON term. By terminating each complete JSON term with a literal newline, the end of every message is unambiguous, so one bad message cannot derail the parsing of the messages that follow it.

Let’s use the REPL example to demonstrate. Let’s say I have a program that implements a line-oriented JSON REPL. I can send the program a message via something like this:

{"fn": "hello_person", "args": ["Steve"]}

The REPL will read a line from stdin, see that it is valid JSON, and see that it can interpret that data and do something. In this case, it calls hello_person("Steve").

The REPL may reply in the same way, e.g.:

{"return_value": "Hello Steve!"}

But what if I send a bad message?

// stdin
{"fn": "hello_person", "args": ["Steve"]
// stdout
{"error": "Invalid JSON"}

The REPL can immediately detect the invalid JSON, report it, and recover: the next line it reads is parsed as a fresh message.
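To make the recovery concrete, here is a small Python sketch of such a REPL. It is not the program I actually wrote: the FUNCTIONS table, the handle_line and repl names, and the "Invalid call" error are assumptions for illustration, and in-memory streams stand in for stdin and stdout so the example can run on its own.

```python
import io
import json

def hello_person(name):
    return f"Hello {name}!"

# Hypothetical table of functions the REPL exposes.
FUNCTIONS = {"hello_person": hello_person}

def handle_line(line):
    """Turn one input line into one reply term."""
    try:
        call = json.loads(line)
    except json.JSONDecodeError:
        return {"error": "Invalid JSON"}
    try:
        fn = FUNCTIONS[call["fn"]]
        return {"return_value": fn(*call["args"])}
    except (KeyError, TypeError):
        # Valid JSON, but not a call this REPL understands.
        return {"error": "Invalid call"}

def repl(instream, outstream):
    """Read one message per line and write one reply per line."""
    for line in instream:
        if not line.strip():
            continue  # ignore blank lines
        outstream.write(json.dumps(handle_line(line)) + "\n")

# A bad message followed by a good one.
fake_stdin = io.StringIO(
    '{"fn": "hello_person", "args": ["Steve"]\n'
    '{"fn": "hello_person", "args": ["Steve"]}\n'
)
fake_stdout = io.StringIO()
repl(fake_stdin, fake_stdout)
print(fake_stdout.getvalue(), end="")
```

The first reply is the "Invalid JSON" error and the second is the normal return value, so the bad line costs exactly one message.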

If we were using “full” JSON to do this, this wouldn’t be possible: the input so far contains no syntax error, it is merely incomplete. We’d have to assume that more bytes would be coming on the next line, and continue to parse until we either terminate the term in some way or detect a syntax error. Forcing JSON messages into single newline-terminated lines makes parsing easy and error recovery trivial.

Anyway, I hope this very minor insight may help others dealing with JSON as part of a protocol, and would encourage anyone who wants to write a robust JSON-oriented protocol to consider leveraging line-oriented JSON.
