[Sharing] Using Riak for very tiny (less than 2k) objects?
eliezer at ngtech.co.il
Tue Dec 22 14:52:45 EST 2015
I wanted to share a specific scenario for which I tested Riak.
I was implementing a URL-filtering engine and was looking for the right DB.
I started by defining my use case, and for version 1 of the engine I
decided to use a key/value store with a single-byte value as a starter:
"1" = blacklisted, "0" = whitelisted.
The total number of keys runs into the millions; for now it's about 4 million.
I then set out to find the right DB for the task, running tests on all
sorts of DB engines, SQL and NoSQL and a couple of others.
The result was that every DB showed some slowdown, either on updates or
on fetches.
The last DB I tested before implementing my own engine was Riak, which
has lots of good features!
My tests showed that Riak does a great job, but my lookups and updates
required a specific speed, and Riak was running too slow.
The logic is that each URL lookup can statistically translate into up
to 8 DB lookups, and an update is a batch of about 2 million objects
("1" or "0" as the value).
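The post doesn't spell out why one URL fans out into up to 8 internal lookups. A common scheme for URL filters, sketched here purely as an assumption, is to probe each host suffix of the URL, capped at 8 keys:

```go
package main

import (
	"fmt"
	"strings"
)

// candidateKeys expands one URL host into the set of DB keys to probe.
// This is a hypothetical reconstruction: the post only says a lookup
// can fan out into up to 8 internal lookups, so here we assume the
// engine probes each host suffix, most specific first, capped at 8.
func candidateKeys(host string) []string {
	labels := strings.Split(host, ".")
	var keys []string
	for i := 0; i < len(labels)-1 && i < 8; i++ {
		keys = append(keys, strings.Join(labels[i:], "."))
	}
	return keys
}

func main() {
	fmt.Println(candidateKeys("ads.tracker.example.com"))
	// [ads.tracker.example.com tracker.example.com example.com]
}
```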
I tested both the HTTP interface and the protobuf interface with a
couple of clients (Golang, Ruby, Python, a couple of others), and the
results were unrealistic for me. It took about an hour to
update/set/create 2 million objects, and fetches/lookups were also very
slow compared to MySQL, PostgreSQL, and a couple of others.
That left me confused, since I liked the pros of Riak as a distributed
DB, but my objects are very small and my basic requirements were fast
lookups and batch updates, so I decided to try my own coding skills and
see where that would lead me.
After a couple of weeks of work I found myself with Golang + LevelDB
and a full-blown DB + broadcast proxy + a couple of other tools.
The DB implements the lookup logic, so I need to run only one query per
URL; the DB runs a couple of internal lookups and returns an answer.
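A minimal sketch of that one-query-in, one-answer-out shape, assuming the candidate keys for the URL have already been derived (the author used LevelDB; a plain map stands in for it here to keep the example self-contained):

```go
package main

import "fmt"

// store stands in for the author's LevelDB instance; a plain map keeps
// the sketch self-contained. Values are single bytes, as in the post:
// "1" = blacklisted, "0" = whitelisted.
var store = map[string]string{
	"bad.example.com": "1",
	"example.com":     "0",
}

// lookup answers one URL query by probing the candidate keys in order
// (e.g. host suffixes from most to least specific) and returning the
// first hit. The probing order is an assumption, not the author's
// exact logic.
func lookup(keys []string) (blacklisted, found bool) {
	for _, k := range keys {
		if v, ok := store[k]; ok {
			return v == "1", true
		}
	}
	return false, false
}

func main() {
	b, ok := lookup([]string{"ads.bad.example.com", "bad.example.com", "example.com"})
	fmt.Println(b, ok) // true true
}
```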
The broadcast proxy holds a list of DB servers and replays the HTTP
message it was fed to each DB server in turn.
I implemented a specific batch-update DB interface that works with a
PUT message and sets/updates a list of keys with a value, and it is the
fastest solution I have tried! It beats every other DB I tested, SQL or
NoSQL, document or non-document, and not by a small margin.
The DB ends up very small: a full blacklist uses about 12-50 MB of
disk space.
The batch-update interface does about 1 million sets/updates in about
10 seconds, and the query speed of the DB stands at about 6k queries
per second (6k * 8 internal lookups = about 48k internal lookups per
second) on an 8-core Intel Atom server.
So it appears that, as someone suggested to me here on the list or in
the Riak IRC channel, Riak might not fit my use case, and that turned
out to be the right call.
All the best,