There are a lot of "Write your own Redis" guides online. The general idea is neat: you get to write a simple KV storage and a network interface for it in a guided fashion. You will work with network and file I/O, do some bytes encoding to support Redis wire protocol, write a simple data structure to hold your data. It's a nice way to get acquainted with a new language.
In this spirit, I decided to write my own Redis. To be honest, I just wanted to implement a Log-Structured Merge Tree, but it occurred to me that it would be nicer with an interactive interface, so I could use it from other software. My plan was simple: Redis is an established product, it's protocol (also called RESP - Redis Serialization Protocol) is well-documented and has to be relatively simple. Considering amount of "Redis-like" software out there, I expected to find a dozen different parsers to choose from.
However, I soon found out that this is not the case. At least for Go, there are a couple projects on Github, but they all share some of these properties:
In the end, I shelved the whole LSM Tree idea and decided to write a RESP parser/encoder. Should be easy, after all.
RESP is a text-based protocol. It's very easy to understand what is
happening under the hood, and interacting with your server doesn't
require any tools more complicated then netcat
. It also
doesn't have any complicated semantics. All types start with a single
byte identifying a type, and end with a CRLF terminator
(\r\n
).
For instance, encoding of string MEOW
looks like this:
+MEOW\r\n
. +
means that all text until the
next \r\n
is a string. Simple enough. There also similar
representations for errors (-ERROR\r\n
) and integers
(+42\r\n
).
Some types have information about their length encoded. If you want
to send some binary data, you have to use a "Bulk string" - you
essentially tell your server that "Next n bytes of the message are
data". Looks like $8\r\nDEADBEEF\r\n
.
Finally, if you want to send an actual command to the server, you use an array. These are just a bunch of other RESP objects, along with their count. Simple Redis request can look like this: (in this and other examples, newlines, tabs and comments are added for readability)
*2\r\n // Array of size 2
+GET\r\n // Simple string
+MYKEY\r\n // Simple string
At first glance, all these objects are straightforward to implement. Using them you can easily build a nice KV interface. They are all part of RESP2 protocol, which has been supported by Redis and it's clients for a while. But there are some nice-to-haves missing: there are no defined ways to send a floating-point number, or a map type. Thankfully, people at Redis thought about that and gave us RESP3. It's an extension of RESP2 which should solve most of it's drawbacks.
For one, now you have proper maps. They are pretty much as a sequence of Key-Value pairs. Simple and effective, I like it:
%2\r\n // Map with 2 Key-Value pairs
+foo\r\n // Key 1, Simple string
:0451\r\n // Value 1, Integer
+bar\r\n // Key 2, Simple string
:4212\r\n // Value 2, Integer
You also get a double type, which you use to encode floating-point
numbers. Nice stuff: ,33.77\r\n
. You also have all your
usual shenanigans, like ,-inf\r\n
and
,nan\r\n
.
However, RESP3 also added a bunch of other types. Some of those are not very elegant, while others highlight all the weird crutches you have to implement when you are developing a backwards-compatible database for 16 years straight.
Remember Bulk Strings? You should use those in two cases: either
your string contains raw bytes, or has a length big enough to warrant
knowing it's size before parsing. RESP3 introduces Bulk
Errors: !3\r\nERR\r\n
. Official docs describe it
as "combining purpose of simple errors with the expressive power of
bulk strings". To me, it looks like a solution for a problem which
shouldn't exist: errors are here to inform user about something going
wrong, they shouldn't contain bytes or be excessive in size.
You also get a proper Bool type. Looks nice: #t\r\n
.
The only problem is, Redis needed to return True/False long before
RESP3, so there was already an obvious solution to a lack of Boolean:
integers. One is true, zero is false, simple enough. It's not even
some kind of optimisation: "True" is 4 bytes as RESP Boolean, and 4
bytes as RESP Integer (:1\r\n
). So now you have two ways
to express a boolean type, and for no good reason.
Speaking of integers: you now have "Big number" type. Same thing as
"Integer", but now it can contain numbers bigger then
int64
:
(3492890328409238509324850943850943825024385\r\n
.
Understandable solution to preserving backwards compatibility, but now
you have two ways to send an integer.
RESP3 also adds "Attributes". These are essentially a Map type, except that those contain additional properties of an object. Here is an official example (!):
|1\r\n // An Attribute map with 1 Key-Value Pair
+key-popularity\r\n // Key: Simple String
%2\r\n // Value: Map with 2 Key-Value pairs
$1\r\na\r\n // Key 1: Bulk string "a"
,0.1923\r\n // Value 1: Float
$1\r\nb\r\n // Key 2: Bulk string "b"
,0.0012\r\n // Value 2: Float
*2\r\n // An Array of size 2, to which Attributes are applied
:2039123\r\n // Integer
:9543892\r\n // Integer
You have an array of two numbers, and this array has an attribute "key-popularity", which describes popularity of it's contents (why are those attributes attached to the array instead of integers itself? I have no idea, but this is not related to RESP-itself, so we will ignore it).
Can you notice the issue? Normal RESP is parsed in a recursive fashion. You usually have an Array at the top, which contains some objects, which might contain more objects, and so on. Attributes suddenly add an exception to this rule: they don't "contain" the object they are describing, instead they are just attached before it. This is not that big of a deal, but it seems like a strange design choice.
Another addition are "Verbatim strings". They are described as "similar to the bulk string, with the addition of providing a hint about the data's encoding". This sounds very nice: I've worked with old projects, where UTF-8 is mixed with other weird encodings, and having an ability to mark the encoding of a certain string would be very useful.
Let's look at the official example:
=15\r\ntxt:Some string\r\n
. The txt
part
represents the text encoding. There are two problems with this
approach:
utf8
,
win1251
).So now you have another type of bulk string, which is supposed to have text encoding embedded. But instead of proper encoding, they are just putting an attribute describing how to best present this text. If only there were a special data type which would allow us to add additional attributes to objects! We could even call it attributes.
Another structures introduced are "Set" and "Push". Sets are, as expected, a collection of elements with no particular order which is guaranteed to only have unique elements. Pushes are a way to sidestep traditional request-response model and push some data to the client. Let's compare representations of these data types with some test data:
Data Type | Representation |
---|---|
Array | *2\r\n+FOO\r\n+BAR\r\n |
Set | ~2\r\n+FOO\r\n+BAR\r\n |
Push | >2\r\n+FOO\r\n+BAR\r\n |
Those are all arrays, just with different first byte! I can get the excuse for Pushes, but I can't fathom why would you use Sets. They express exactly the same information as Arrays. Protocol declares them to be unordered and unique, but this should be a property of the function which returns a set, and not one of the protocol itself. You can send non-ordered, unique data with arrays just fine. You can also try to mess the "set" ordering and see which applications break first.
To sum up, RESP has 2 ways to express a string, 2 ways to express a error (one of which is cursed), 3 ways to express an integer (Integer, Bug Number and Double) and two ways to express a Bool (Integer and Boolean). You also get two types which are maps, but for different purposes (Map and Attribute), one of which is also cursed, and three types of Arrays (Array, Set and Push), one of which is fully redundant. However, that's not all!
Redis needed a way to express a "Nothingness", i.e. Null. When you fetch a key which doesn't exist, you shouldn't get an empty string as response (how would you differentiate that from an existing empty value?), but they also didn't want to send a error. Problem is, original RESP didn't have a type for Nulls, so they decided to do the simplest thing possible and use an existing type.
That is how "Null bulk string" was created: a string of size -1
which represents null ($-1\r\n
). But one type of null was
not enough, so some commands might return "Null Array" instead, as you
probably guessed, it's an array of size -1: *-1\r\n
.
RESP3 aimed to fix this mistake, and added a separate "Null" type.
Simple enough: _\r\n
. Problem is, other two types are
still a part of the standard. Because one way to express nothingness
is not enough, RESP has 3.
RESP3 introduces the concept of "Streamed type". Let's say you are
doing some slow search, and want to stream the data to the user in
real time. You don't know the amount of items to be returned yet, to
you can't use normal arrays or strings. Instead, you use aggregate
types: these are essentially just arrays with ?
as size.
Their end is instead marked by a special object of type "END"
(.\r\n
). Looks simple enough, and can be useful in some
applications. Example:
*?\r\n // Array of undefined size
:1\r\n // Integer
:2\r\n // Integer
.\r\n // END Type
Problem is, it's not properly documented. They are mentioned in official Redis RESP description, but they are not described there. In fact, I only learned about those when writing this text, because I stumbled on some docs published on GitHub which describe it.
RESP is not hard to work with. However, it's not that well documented, has some weird data types and can represent the same information in a multitude of different ways. It works for Redis, but if you are building your own thing, whatever this thing may be, I suggest that you stay away from it.
If you still want to build something using RESP, and you are building it in Go, I did end up writing a parser/encoder for it: keddad/kresp. It is decently tested, and supports all of the shenanigans described above (except for streamed types, because they are not part of the spec at Redis website).