REDIS: Iterating through database with SCAN

It has recently come to my attention that iterating through stuff is quite important, be it keys, sets, hashes or sorted sets. In Redis this can generally be accomplished using the SCAN family of commands (SCAN, SSCAN, HSCAN and ZSCAN). However convenient, it is a little confusing at first, so let's analyze it a bit closer.

Another option you will notice is COUNT, which defaults to 10. Now you might come up with the same idea as I did: surely setting COUNT to 1 will make SCAN act like a normal iterator, returning one value at a time.

This is not the case, as I had to learn the hard way.

Strangely enough, COUNT is only a hint for roughly how many results SCAN will return per call. Even if it is set to 1, you will very often receive 2 or more entries at a time (small hashes and sets are usually returned whole in a single call, regardless of COUNT), so you will always need to loop over the returned values if you want to deal with entries one by one.

Python code example

So say you have a hash called ‘hash_database’, because you have no imagination and it's late, and you want to iterate through its members using redis-py. You would probably want to write something like this:


import redis

database = redis.StrictRedis('localhost')
cursor = 0
while True:
    # HSCAN returns the next cursor and a dict holding this chunk of entries
    cursor, entries = database.hscan('hash_database', cursor)

    # Just printing entries one by one
    for name, content in entries.items():
        print("Name: {} -- Content: {}".format(name, content))

    # A returned cursor of 0 means the iteration is complete
    if cursor == 0:
        break

This is a pretty simple way to handle this odd iteration method. If you’re concerned about the number of round trips to the server, you can set COUNT to some higher number and receive larger chunks of your data with each call. Once again, those chunks will vary in size with each iteration, but generally stay close to the defined number.
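If you want to see for yourself how uneven the chunks are, here is a minimal sketch that records how many entries each HSCAN call returned. The helper name `chunk_sizes` is my own invention, and I'm assuming `client` is a redis-py connection (or anything else with a compatible `hscan` method).

```python
def chunk_sizes(client, key, count=None):
    """Return a list with the number of entries each HSCAN call yielded.

    `client` is assumed to behave like a redis-py connection: its hscan()
    returns a (next_cursor, dict_of_entries) tuple.
    """
    sizes = []
    cursor = 0
    while True:
        # count is only a hint, so the chunk may be smaller or larger
        cursor, entries = client.hscan(key, cursor, count=count)
        sizes.append(len(entries))
        # A returned cursor of 0 means the whole hash has been visited
        if cursor == 0:
            break
    return sizes
```

Calling `chunk_sizes(database, 'hash_database', count=100)` on a large hash should give you a list of numbers hovering around 100 rather than exactly 100 each time.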

As much as this is not extremely complex, redis-py makes your life even simpler by offering the convenience wrapper hscan_iter (with siblings scan_iter, sscan_iter and zscan_iter for the other commands). This will simplify the above code thusly:

import redis

database = redis.StrictRedis('localhost')

# hscan_iter yields (name, content) tuples and tracks the cursor for you
for entry in database.hscan_iter('hash_database'):
    print("Name: {} -- Content: {}".format(entry[0], entry[1]))

How much simpler is this?! Note that in this case the entries will be returned one by one and you don’t have to worry about the cursor value. You can still specify the COUNT value, which redis-py passes along to each underlying HSCAN call, so it hints at how many entries are fetched per ‘call to the database’ rather than how many you get per loop iteration. If you know more about it, please leave me a comment.
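To make it less magical, here is roughly what such a wrapper has to do under the hood. This is my own sketch, not the actual redis-py source; the name `iter_hash_entries` is made up, and `client` is assumed to expose an `hscan` method like redis-py's.

```python
def iter_hash_entries(client, key, count=None):
    """Yield (name, content) pairs one by one, hiding the cursor.

    A sketch of the idea behind hscan_iter: keep calling HSCAN with the
    cursor from the previous reply until the server hands back 0.
    """
    cursor = 0
    while True:
        # Each call fetches one chunk; count is forwarded as a hint
        cursor, entries = client.hscan(key, cursor, count=count)
        for name, content in entries.items():
            yield name, content
        if cursor == 0:
            break
```

Because it is a generator, the next chunk is only requested once the current one has been consumed, so the whole hash never needs to sit in memory at once.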

So why am I even bothering to explain the original SCAN method if redis-py makes it so much simpler? Well, mostly because pure Redis doesn’t have any SCAN_ITER method, so when accessing it directly or writing Lua scripts you’ll still have to rely on the CURSOR. Partially also because I find it quite amusing. It reminds me of The Twelve Tasks of Asterix, where they needed to fetch permit A38 from ‘The Place That Sends You Mad’, which sends Asterix and Obelix from one cubicle to another. If you’ve seen the movie and/or ever dealt with bureaucracy, you will know what I’m referring to.
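When you talk to Redis without a helpful wrapper, the raw HSCAN reply is a two-element array: the next cursor (as a string) followed by one flat list alternating field, value, field, value. Here is a small sketch of turning that into something usable. The function name `parse_hscan_reply` is hypothetical, and I'm assuming your client hands you the reply as plain Python lists.

```python
def parse_hscan_reply(reply):
    """Split a raw HSCAN reply into (next_cursor, dict_of_entries).

    `reply` is assumed to look like [cursor, [field1, value1, field2, ...]],
    with the cursor given as a str or bytes value.
    """
    raw_cursor, flat = reply
    next_cursor = int(raw_cursor)  # int() accepts both str and bytes
    fields = flat[0::2]            # even positions hold the field names
    values = flat[1::2]            # odd positions hold the matching values
    return next_cursor, dict(zip(fields, values))
```

You would feed the returned cursor into the next HSCAN call and stop once it comes back as 0, exactly like in the loop earlier.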

Introduction

When it comes to programming, the abundance of documentation out there is not always directly proportional to the ease of understanding. I, for one, sometimes find it hard to really figure out just what I should be doing. Too many options confuse me: how can I know which one is the best to pick? And sometimes I feel like I’m downright expected to think in binary to understand some ‘documentation’.

I am personally so glad for the army of ‘noobs’ covering every possible (bad) idea on Stack Overflow. However, sometimes one gets to unexplored territory where only the bravest and most skilled programmers dwell. A jungle of undescriptive error messages and unhelpful manuals.

I am bravely trotting where no (sensible) man has gone before, and I’ll be sharing my woes and victories with you.