Is Riak appropriate for website metrics?

Michael Dungan mpd at stippleit.com
Mon Nov 28 17:59:37 EST 2011


Thank you for getting back to me. It does look like we'll need to go
big: we're already at 5m new records/month, so even a single month of
data is well beyond the few hundred thousand keys you mentioned, unless
I'm thinking about this wrong.
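
For scale, here is a rough back-of-the-envelope sketch in Python using
only figures quoted in this thread (25 million records now, about 5
million new per month, the 500 million key figure, and n_val 3). The
function names are made up for illustration, and the size estimate
counts only the raw JSON value times n_val, ignoring keys, metadata,
and any storage-backend overhead, so treat it as a lower bound:

```python
import json

# Sample record from the thread (a comma is added after "id" so it parses).
record = {
    "id": "c4473dc5cfc5da53831d47c4c016d1c7de0a31e4fd94229e47ade569ef011a7b",
    "type": "Photo::Click",
    "user_id": 2640,
    "photo_id": 255,
    "ip": "100.101.102.103",
    "created_at": "2011/04/08 17:09:40 -0700",
}

CURRENT_KEYS = 25_000_000    # records today
NEW_PER_MONTH = 5_000_000    # upper end of the 4-5 million/month estimate
TARGET_KEYS = 500_000_000    # key count mentioned in the reply
N_VAL = 3                    # replicas per key


def months_until(target, current, per_month):
    """Whole months of growth needed to reach `target` keys (rounded up)."""
    remaining = max(target - current, 0)
    return -(-remaining // per_month)  # ceiling division


def replicated_bytes(keys, value_bytes, n_val):
    """Raw value bytes across all replicas; excludes all storage overhead."""
    return keys * value_bytes * n_val


value_bytes = len(json.dumps(record).encode("utf-8"))
months = months_until(TARGET_KEYS, CURRENT_KEYS, NEW_PER_MONTH)
total_gb = replicated_bytes(TARGET_KEYS, value_bytes, N_VAL) / 10**9

print(f"bytes per record (JSON): {value_bytes}")
print(f"months until {TARGET_KEYS:,} keys: {months}")
print(f"raw replicated data at that point: {total_gb:.0f} GB")
```
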

I would love to hear more about Mecha if you're willing to share. Feel 
free to contact me off-list.

thanks again,

-mike


On 11/28/11 2:24 PM, Aphyr wrote:
> For limited mapreduce (where you know the keys in advance) Riak would be
> a fine choice. 500 million keys, n_val 3 is readily achievable on
> commodity hardware, say four nodes with 128GB SSDs.
>
> If large-scale mapreduce (more than a few hundred thousand keys) is
> important, or listing keys is critical, you might consider HBase.
>
> If you start hitting latency/write bottlenecks, it may be worth
> accumulating metrics in Redis before flushing them to disk.
>
> At Showyou, we're also building a custom backend called Mecha which
> integrates Riak and SOLR, specifically for this kind of analytics over
> billions of keys. We haven't packaged it for open-source release yet,
> but it might be worth talking about off-list.
>
> --Kyle
>
> On 11/28/2011 02:07 PM, Michael Dungan wrote:
>> Hi,
>>
>> Sorry if this has been asked before - I couldn't find a searchable
>> archive of this list.
>>
>> I was told to ask this list whether or not Riak would be appropriate for
>> tracking our site's metrics. We are currently using Redis for this but
>> are at the point where we need both clustering and m/r capability, and
>> on the surface, Riak looks to fit this bill (we already use Erlang
>> elsewhere in our app, so that's an additional plus).
>>
>> The records are pretty small and can be represented easily in JSON. An
>> example:
>>
>> {
>> "id": "c4473dc5cfc5da53831d47c4c016d1c7de0a31e4fd94229e47ade569ef011a7b",
>> "type": "Photo::Click",
>> "user_id": 2640,
>> "photo_id": 255,
>> "ip": "100.101.102.103",
>> "created_at": "2011/04/08 17:09:40 -0700"
>> }
>>
>> We currently have around 25 million records similar to this one, and are
>> adding 4-5 million more each month.
>>
>> Is Riak appropriate for this use case? Are there any gotchas I need to
>> be aware of?
>>
>> thank you,
>>
>> -mike
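
The suggestion above to accumulate metrics before flushing them to disk
can be sketched roughly as follows. This is only an illustration of the
batching pattern, not code from anyone in the thread: the MetricBuffer
class and its parameters are hypothetical, the buffer is a plain
in-memory list standing in for Redis, and the flush callback stands in
for whatever would actually write the batch to Riak.

```python
import json
from typing import Callable, Dict, List


class MetricBuffer:
    """Accumulate records and hand them to a sink in batches.

    Hypothetical stand-in: in the setup discussed in the thread the
    pending records would live in Redis and the sink would write to Riak.
    """

    def __init__(self, flush: Callable[[List[str]], None], max_size: int = 1000):
        self._flush = flush          # called with each batch of JSON strings
        self._max_size = max_size    # flush automatically at this many records
        self._pending: List[str] = []

    def add(self, record: Dict) -> None:
        """Queue one record; flush if the buffer has reached max_size."""
        self._pending.append(json.dumps(record))
        if len(self._pending) >= self._max_size:
            self.flush()

    def flush(self) -> None:
        """Send all pending records to the sink as one batch."""
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        self._flush(batch)


# Usage: collect batches in a list instead of writing to a real store.
written: List[List[str]] = []
buf = MetricBuffer(flush=written.append, max_size=2)
for photo_id in (255, 256, 257):
    buf.add({"type": "Photo::Click", "user_id": 2640, "photo_id": photo_id})
buf.flush()  # push out the final partial batch
```

A real deployment would also want a time-based flush and some handling
for a failed write, both of which are left out here.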



