REDIS: 3 ways to clone a database (in Python)

It has recently come to my attention that REDIS is actually a very useful tool for managing your database. It’s very fast and supposedly reliable. At least Twitter and Pinterest seem to think so, and if Pinterest says it, who am I to disagree?

As intuitive as REDIS is on the whole, you might still run into trouble, as I did when I tried to clone a database and keep an ‘original’ version alongside an edited one. Apparently there are multiple ways to go about it but, as is often the case, some ways are better than others.

One REDIS instance with multiple databases

This might look like the most straightforward way to deal with multiple databases, as suggested here. REDIS can operate multiple databases in one instance, each identified by an index. If you don’t specify one, REDIS uses database 0 (an index, not a port), but you can choose a different one if you so desire. To do that, you’ll need to do the following.

Start a single REDIS server from your command line, or however you prefer to do that, and proceed to Python:

import redis

database1 = redis.StrictRedis('localhost', port = 6379, db=0)
database2 = redis.StrictRedis('localhost', port = 6379, db=1)

database1.set('key1', 'value1')
database2.set('key2', 'value2')
...

Now you have two database objects, each pointing to the same instance. You can access the two databases through these objects or through links, as described in the redis-py documentation.
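
Assuming the links meant here are redis-py connection URLs, they would presumably follow the redis://host:port/db pattern, where the trailing number is the database index. The sketch below only builds such URLs; the helper name database_url is made up for illustration and is not part of redis-py.

```python
def database_url(host="localhost", port=6379, db=0):
    """Build a connection URL; the trailing path segment is the database index."""
    return f"redis://{host}:{port}/{db}"

url_db0 = database_url(db=0)
url_db1 = database_url(db=1)

# With a server running, such a URL can be handed to redis-py:
#   import redis
#   database2 = redis.from_url(url_db1)
```

The commented lines need a running server and the redis package, which is why they are left out of the runnable part.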

This might look convenient, but it’s definitely not recommended. Let’s hear from Salvatore Sanfilippo himself as stated here:

I understand how this can be useful, but unfortunately I consider Redis multiple database errors my worst decision in Redis design at all… without any kind of real gain, it makes the internals a lot more complex.

(Salvatore Sanfilippo, 15.5.2010)

So what is the correct way to do this?

Mark your keys

Instead of separating two databases, you might come up with a system of key names that reflects which dataset each key belongs to. There should be nothing inherently wrong with that, and it might end up being the easiest way. One could imagine a system like this:

database1_key1
database1_key2
database2_key1
...
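
To keep such a scheme consistent, the prefixing can live in one small helper. This is only a sketch: the names dataset_key and clone_dataset are made up for illustration, and a plain dict stands in for the REDIS connection so that the example runs without a server.

```python
def dataset_key(dataset, key):
    """Return the key as stored, e.g. ('database1', 'key1') -> 'database1_key1'."""
    return f"{dataset}_{key}"

def clone_dataset(store, source, target):
    """Copy every entry under the source prefix to the target prefix."""
    prefix = f"{source}_"
    for full_key in list(store):
        if full_key.startswith(prefix):
            store[dataset_key(target, full_key[len(prefix):])] = store[full_key]

# A dict stands in for REDIS here (an assumption, to keep the example runnable).
store = {}
store[dataset_key("database1", "key1")] = "value1"
store[dataset_key("database1", "key2")] = "value2"

clone_dataset(store, "database1", "database2")
store[dataset_key("database2", "key1")] = "edited"  # the original stays untouched
```

With a real connection the same idea would go through set and get, and the loop could iterate over something like scan_iter(match='database1_*') instead of the dict.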

Use one REDIS instance per database

If you don’t like the idea of mixing up the keys from two databases, running two separate instances of REDIS should do the job. In this case you’ll probably need to specify a port for each one of them as follows:

redis-server --port 6379
redis-server --port 6380

This way you can specify the port value for each REDIS instance separately. This is the port you will use in your Python code.

import redis

database = redis.StrictRedis('localhost', port = 6379)

database.set('key1', 'value1')

Copying data between databases

Last but not least, here is a little tip on how to clone a database. Again there are a few ways to do that.

Dump database and reload it

This is suggested over here. It feels a little more like a hack than a proper procedure and I haven’t tested it, but I don’t see a reason why it wouldn’t work. You can dump the whole database into a file asynchronously using:


database.bgsave()

This will create a dump.rdb file, which contains a copy of your database. You’ll need to copy it to a different folder and start the second instance from there. As the second instance starts, it will look for a dump.rdb file to initialize from. This will effectively create a new clone.
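
The copy-and-restart step could be scripted roughly as follows. This is a minimal sketch under a few assumptions: the function name stage_clone, the target folder and port 6380 are all made up for illustration, and the snapshot is assumed to be completely written before it is copied (bgsave returns immediately, so that has to be checked separately).

```python
import shutil
from pathlib import Path

def stage_clone(dump_path, clone_dir, port=6380):
    """Copy a finished dump.rdb into clone_dir and return the command
    that would start a second redis-server from that folder."""
    clone_dir = Path(clone_dir)
    clone_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(dump_path, clone_dir / "dump.rdb")
    # --dir sets the working directory in which the server looks for its dump file.
    return ["redis-server", "--port", str(port), "--dir", str(clone_dir)]

# Usage (the paths are hypothetical):
#   import subprocess
#   command = stage_clone("/var/lib/redis/dump.rdb", "/tmp/redis-clone")
#   subprocess.Popen(command)
```

The function only prepares the folder and the command; actually launching the second server is left to the caller.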

Enslave and liberate method

This is not exactly an official name, but it’s quite descriptive of what’s going on. You will need to create two instances and make one a slave of the other:


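import redis

# Two connections, one per REDIS instance
database1 = redis.StrictRedis('localhost', port=6379)
database2 = redis.StrictRedis('localhost', port=6380)

# Make the instance on port 6380 a slave of the one on port 6379
database2.slaveof('localhost', port=6379)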

A slave database receives a copy of every entry in its master database. The transfer happens in the background, so you don’t have to worry about it.

Now, because we want to have a clone of the original database, we need to ‘un-slave’ the second database, or liberate it if you will. And a free database is a slave to whom? Nobody! And that’s precisely how it’s done.


database2.slaveof()

This breaks the bond between the databases, so future edits in database1 are no longer transferred to database2. And there you have it: a clone of the database is done. Just a side note though, you probably want to check that the data transfer has finished first, so it might be a good idea to compare, for instance, the database sizes.

import time

# Keep polling until both databases hold the same number of keys, then detach.
while database1.dbsize() != database2.dbsize():
    print('Waiting for the data transfer to finish')
    time.sleep(1)
database2.slaveof()

There is probably a better way to do this. I reckon there is a way to ask for a confirmation of the transfer. REDIS is very general with its confirmation messages. However, I didn’t have time to look into that. If you know of a better way, please leave me a comment. Thanks!
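
One candidate for such a confirmation is the replication section of the INFO command, which a slave answers with fields such as master_link_status and master_sync_in_progress. The sketch below assumes those two fields are enough to tell that the initial transfer is over; the function names and the FakeReplica stand-in are made up so that the example runs without a server.

```python
import time

def sync_finished(replica):
    """True once the replica reports a live link and no sync in progress."""
    info = replica.info('replication')
    return (info.get('master_link_status') == 'up'
            and info.get('master_sync_in_progress') == 0)

def wait_for_sync(replica, timeout=30.0, interval=1.0):
    """Poll until sync_finished(replica) is true; return False on timeout."""
    deadline = time.monotonic() + timeout
    while not sync_finished(replica):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True

class FakeReplica:
    """Stand-in for a redis-py client (an assumption): reports a sync in
    progress for the first few polls, then a finished one."""
    def __init__(self, busy_polls):
        self.busy_polls = busy_polls

    def info(self, section=None):
        busy = self.busy_polls > 0
        self.busy_polls -= 1
        return {'master_link_status': 'down' if busy else 'up',
                'master_sync_in_progress': 1 if busy else 0}

synced = wait_for_sync(FakeReplica(busy_polls=2), timeout=5.0, interval=0.01)
timed_out = wait_for_sync(FakeReplica(busy_polls=10**6), timeout=0.05, interval=0.01)
```

With a real connection, the call would look like wait_for_sync(database2), followed by database2.slaveof() once it returns True.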

 
