Is Riak appropriate for website metrics?

Aphyr aphyr at aphyr.com
Mon Nov 28 17:24:52 EST 2011


For limited mapreduce (where you know the keys in advance), Riak would 
be a fine choice. 500 million keys at an n_val of 3 is readily 
achievable on commodity hardware; say, four nodes with 128GB SSDs.
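
To make the known-keys case concrete, here's a rough sketch of what an 
enumerated-input MapReduce job looks like against Riak's HTTP /mapred 
endpoint, in Python with just the standard library. The bucket name, 
keys, and endpoint are placeholders; Riak.mapValuesJson is one of the 
JavaScript map functions that ships with Riak.

import json
import urllib.request

RIAK_MAPRED_URL = "http://127.0.0.1:8098/mapred"  # default Riak HTTP port

def mapred_known_keys(bucket, keys):
    """Run a map phase over an explicit key list (no bucket listing)."""
    job = {
        # Inputs are enumerated up front, so Riak never walks the whole bucket.
        "inputs": [[bucket, k] for k in keys],
        "query": [
            {"map": {"language": "javascript",
                     "name": "Riak.mapValuesJson",  # built-in: parse each value as JSON
                     "keep": True}}
        ],
    }
    req = urllib.request.Request(
        RIAK_MAPRED_URL,
        data=json.dumps(job).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# e.g. mapred_known_keys("clicks", list_of_known_keys)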

If large-scale mapreduce (more than a few hundred thousand keys) is 
important, or listing keys is critical, you might consider HBase.

If you start hitting latency or write-throughput bottlenecks, it may be 
worth accumulating metrics in Redis before flushing them to disk.
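
If you go that route, the pattern is simple: the web tier appends raw 
events to a Redis list, and a worker periodically drains the list in 
batches. A rough sketch using the redis-py client; the buffer key, 
batch size, and store_in_riak() helper are placeholders, not anything 
we ship:

import json
import redis

r = redis.Redis()             # local Redis, default port
BUFFER_KEY = "metrics:buffer"
BATCH_SIZE = 1000

def record_event(event):
    """Fast path: append the raw event to a Redis list."""
    r.rpush(BUFFER_KEY, json.dumps(event))

def flush_batch(store_in_riak):
    """Worker path: atomically pop the oldest batch and write it out."""
    pipe = r.pipeline()                          # MULTI/EXEC by default
    pipe.lrange(BUFFER_KEY, 0, BATCH_SIZE - 1)   # read the oldest batch
    pipe.ltrim(BUFFER_KEY, BATCH_SIZE, -1)       # drop it from the buffer
    batch, _ = pipe.execute()
    for raw in batch:
        event = json.loads(raw)
        store_in_riak(event["id"], event)        # e.g. one PUT per record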

At Showyou, we're also building a custom backend called Mecha which 
integrates Riak and SOLR, specifically for this kind of analytics over 
billions of keys. We haven't packaged it for open-source release yet, 
but it might be worth talking about off-list.

--Kyle

On 11/28/2011 02:07 PM, Michael Dungan wrote:
> Hi,
>
> Sorry if this has been asked before - I couldn't find a searchable
> archive of this list.
>
> I was told to ask this list whether or not Riak would be appropriate for
> tracking our site's metrics. We are currently using Redis for this but
> are at the point where we need both clustering and m/r capability, and
> on the surface, Riak looks to fit the bill (we already use Erlang
> elsewhere in our app, so that's an additional plus).
>
> The records are pretty small and can be represented easily in JSON. An
> example:
>
> {
>   "id": "c4473dc5cfc5da53831d47c4c016d1c7de0a31e4fd94229e47ade569ef011a7b",
>   "type": "Photo::Click",
>   "user_id": 2640,
>   "photo_id": 255,
>   "ip": "100.101.102.103",
>   "created_at": "2011/04/08 17:09:40 -0700"
> }
>
> We currently have around 25 million records similar to this one, and are
> adding 4-5 million more each month.
>
> Is Riak appropriate for this use case? Are there any gotchas I need to
> be aware of?
>
> thank you,
>
> -mike



