[Sharing] Using Riak for very, very tiny (less than 2k) objects?

Eliezer Croitoru eliezer at ngtech.co.il
Tue Dec 22 14:52:45 EST 2015


Hey List,

I wanted to share a specific scenario for which I tested Riak.

I was implementing a URL-filtering engine and was looking for the right DB.
I started by defining my use case, and for version 1 of the engine I 
decided to use a key/value store with a single-byte value as a starter: 
"1" or "0", where 1 = blacklisted and 0 = whitelisted.
The number of keys is in the millions; for now it is about 4 million.
I then set out to find the right DB for the task and ran tests on all 
sorts of DB engines, SQL, NoSQL, and a couple of others. 
The result was that every DB showed some slowdown, either for updates 
or for fetches.
The last DB I tested before implementing my own engine was Riak, which 
has lots of good features!
My tests showed that Riak does a great job, but my DB lookups and 
updates required a specific speed, and Riak was too slow for that.
The logic is that each URL lookup can statistically consist of up to 8 
DB lookups, and an update is a batch of about 2 million objects ("1" 
or "0" as a value).
I tested both the HTTP interface and the protobuf interface with 
several clients (Golang, Ruby, Python, and a couple of others), and the 
results were impractical for me. It took about an hour to 
update/set/create 2 million objects, and the fetches/lookups were also 
very slow compared to MySQL, PostgreSQL, and a couple of others.
At that point I was quite torn, since I do like the pros of Riak as a 
distributed DB, but because my objects are very small and my basic 
requirements were fast lookups and batch updates, I decided to try my 
own coding skills and see where that would lead me.
After a couple of weeks of work I found myself with Golang + LevelDB and 
a full-blown DB + broadcast proxy + a couple of other tools.
The DB implements the lookup logic, so I need to run only one query per 
URL; the DB runs a couple of lookups internally and then returns an answer.
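
The single-query idea above can be sketched roughly as follows. This is a
minimal illustration only: the post does not say which internal lookups
the engine performs, so the candidate-key scheme here (full URL, then host,
then parent domains) and the names candidates and lookup are assumptions,
and a plain Go map stands in for LevelDB.

```go
package main

import (
	"fmt"
	"strings"
)

// candidates returns the keys to try for a host/path pair, most specific
// first. ASSUMPTION: this scheme is hypothetical, not the engine's real one.
func candidates(host, path string) []string {
	var keys []string
	if path != "" && path != "/" {
		keys = append(keys, host+path)
	}
	keys = append(keys, host)
	labels := strings.Split(host, ".")
	// Add parent domains, stopping before the bare top-level label.
	for i := 1; i < len(labels)-1; i++ {
		keys = append(keys, strings.Join(labels[i:], "."))
	}
	return keys
}

// lookup runs the internal lookups and returns the first verdict found.
// '1' means blacklisted and '0' means whitelisted, as in the post.
func lookup(db map[string]byte, host, path string) (verdict byte, found bool) {
	for _, k := range candidates(host, path) {
		if v, ok := db[k]; ok {
			return v, true
		}
	}
	return 0, false
}

func main() {
	// In-memory stand-in for the on-disk store.
	db := map[string]byte{
		"example.com":      '1',
		"good.example.com": '0',
	}
	v, ok := lookup(db, "ads.example.com", "/banner.js")
	fmt.Printf("%c %v\n", v, ok)
}
```

The client sends one host/path pair and the store walks the candidates
itself, so the several internal lookups never cross the network.
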
The broadcast proxy has a list of DB servers and replays the HTTP 
message it was fed to each DB server in turn.
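
A rough sketch of such a broadcast step, using only the Go standard
library. The function names, the "/batch?value=1" path, and the error
handling are assumptions made for illustration; the two in-process test
servers merely stand in for real DB nodes.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// broadcast replays one request (method, path, body) to every server in
// the list and returns one error slot per server (nil on success).
func broadcast(client *http.Client, servers []string, method, path string, body []byte) []error {
	errs := make([]error, len(servers))
	for i, base := range servers {
		// Each server gets its own reader over the same bytes.
		req, err := http.NewRequest(method, base+path, bytes.NewReader(body))
		if err != nil {
			errs[i] = err
			continue
		}
		resp, err := client.Do(req)
		if err != nil {
			errs[i] = err
			continue
		}
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
		if resp.StatusCode >= 300 {
			errs[i] = fmt.Errorf("%s: %s", base, resp.Status)
		}
	}
	return errs
}

// demo starts two throwaway servers (hypothetical DB nodes) and reports
// how many requests arrived and how many replays failed.
func demo() (hits int, failed int) {
	var n int64
	h := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.Copy(io.Discard, r.Body)
		atomic.AddInt64(&n, 1)
		w.WriteHeader(http.StatusNoContent)
	})
	a, b := httptest.NewServer(h), httptest.NewServer(h)
	defer a.Close()
	defer b.Close()

	client := &http.Client{Timeout: 5 * time.Second}
	errs := broadcast(client, []string{a.URL, b.URL},
		http.MethodPut, "/batch?value=1", []byte("example.com\n"))
	for _, err := range errs {
		if err != nil {
			failed++
		}
	}
	return int(atomic.LoadInt64(&n)), failed
}

func main() {
	fmt.Println(demo())
}
```

This version replays sequentially and keeps going after a failure; a real
proxy would also have to decide how to retry or resync a server that
missed an update.
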
I have implemented a dedicated batch-update DB interface which works 
with a PUT message and sets/updates a list of keys with a value. It 
was the fastest solution of all! It beat every other DB I have tried, 
SQL or NoSQL, document or non-document. And it was not just a win, it 
was a knock-out.
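
A batch-update endpoint of this kind might look roughly like the sketch
below. The wire format is an assumption (one key per line in the PUT body,
with the value passed as a "value" query parameter), as are the names
Store, SetBatch, and Get; an in-memory map again stands in for LevelDB.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// Store is an in-memory stand-in for the on-disk key/value store.
type Store struct {
	mu   sync.RWMutex
	data map[string]byte
}

func NewStore() *Store { return &Store{data: make(map[string]byte)} }

// SetBatch sets every key to value under a single lock acquisition and
// returns the number of keys processed.
func (s *Store) SetBatch(keys []string, value byte) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, k := range keys {
		s.data[k] = value
	}
	return len(keys)
}

// Get returns the stored value for key, if any.
func (s *Store) Get(key string) (byte, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.data[key]
	return v, ok
}

// ServeHTTP accepts PUT with ?value=0|1 and one key per line in the body.
// ASSUMPTION: this request format is hypothetical.
func (s *Store) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPut {
		http.Error(w, "use PUT", http.StatusMethodNotAllowed)
		return
	}
	v := r.URL.Query().Get("value")
	if v != "0" && v != "1" {
		http.Error(w, "value must be 0 or 1", http.StatusBadRequest)
		return
	}
	var keys []string
	sc := bufio.NewScanner(r.Body)
	for sc.Scan() {
		if k := strings.TrimSpace(sc.Text()); k != "" {
			keys = append(keys, k)
		}
	}
	if err := sc.Err(); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	fmt.Fprintf(w, "%d\n", s.SetBatch(keys, v[0]))
}

func main() {
	s := NewStore()
	body := strings.NewReader("a.example\nb.example\n")
	req := httptest.NewRequest(http.MethodPut, "/batch?value=1", body)
	rec := httptest.NewRecorder()
	s.ServeHTTP(rec, req)
	fmt.Printf("%d %s", rec.Code, rec.Body.String())
}
```

Sending the whole list in one request and applying it in one pass avoids
a network round trip per key, which is a plausible reason for the gap
compared with per-object writes.
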
The DB ends up very small: a full blacklist uses about 12-50 MB of 
disk space.
The batch update interface handles about 1 million sets/updates in 
about 10 seconds, and the query speed of the DB is about 6k queries per 
second (6k * 8 internal lookups = up to about 48k internal lookups per 
second) on an 8-core Intel Atom server.
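
To put the quoted figures side by side, the arithmetic can be worked out
as below. The inputs are the rounded numbers from this post ("about an
hour", "about 10 seconds", "about 6k"), so the results are rough orders of
magnitude, not benchmark data; the helper name perSecond is made up for
this sketch.

```go
package main

import "fmt"

// perSecond converts "count operations in the given number of seconds"
// into an operations-per-second rate.
func perSecond(count, seconds float64) float64 {
	return count / seconds
}

func main() {
	riak := perSecond(2_000_000, 3600) // about an hour for 2 million objects
	custom := perSecond(1_000_000, 10) // about 10 seconds for 1 million

	fmt.Printf("Riak batch:   ~%.0f writes/s\n", riak)
	fmt.Printf("custom batch: ~%.0f writes/s\n", custom)
	fmt.Printf("ratio:        ~%.0fx\n", custom/riak)

	// Each query can fan out into up to 8 internal lookups.
	fmt.Printf("internal lookups: up to %d/s\n", 6000*8)
}
```
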

So it appears that, as someone suggested to me here on the list or in 
the Riak IRC channel, Riak might not fit my use case, and going my own 
way eventually turned out to be the right path.

All the best,
Eliezer Croitoru




More information about the riak-users mailing list