ListKeys or MapReduce

Jeremiah Peschka jeremiah.peschka at gmail.com
Tue Feb 12 14:52:01 EST 2013


Oh, and an example can be found at https://gist.github.com/peschkaj/4772825

---
Jeremiah Peschka - Founder, Brent Ozar Unlimited
MCITP: SQL Server 2008, MVP
Cloudera Certified Developer for Apache Hadoop


On Tue, Feb 12, 2013 at 11:44 AM, Jeremiah Peschka <
jeremiah.peschka at gmail.com> wrote:

> ...and fixed!
>
> You can get this right now if you're adventurous and want to build
> CorrugatedIron from source by grabbing the develop branch [1]. We have
> several other issues to clean up and verify before we release CI 1.1.1 in
> the next day or so. Or you can download it from [2] if you don't want to
> build it yourself and don't want to wait for NuGet. Once we push 1.1.1 to
> NuGet we'll respond to this thread or email you directly.
>
> I make no guarantees that the new DLL won't eat your hard drive or turn
> your computer into a killer robot.
>
> [1]: https://github.com/DistributedNonsense/CorrugatedIron/tree/develop
> [2]:
> http://clientresources.brentozar.com.s3.amazonaws.com/CorrugatedIron-111-alpha.zip
>
>
> On Tue, Feb 12, 2013 at 11:13 AM, Jeremiah Peschka <
> jeremiah.peschka at gmail.com> wrote:
>
>> Good news! You've found a bug in CorrugatedIron. To follow Riak's index
>> naming convention, we rewrite index names to carry a suffix of _bin or
>> _int, depending on the index type. That shouldn't happen to $key, but it
>> does. I'll create a GitHub issue and get that taken care of.
>>
>>
>> On Tue, Feb 12, 2013 at 7:56 AM, Kevin Burton <rkevinburton at charter.net> wrote:
>>
>>> I forgot to mention that when I execute this code I get the error:
>>>
>>>     {not_found,
>>>      {<<"products">>,
>>>       <<"$keys">>},
>>>      undefined}}}:[{mochijson2,
>>>                     json_encode,2,
>>>                     [{file,"src/mochijson2.erl"},
>>>                      {line,149}]},
>>>                    {mochijson2,
>>>                     '-json_encode_array/2-fun-0-',
>>>                     3,
>>>                     [{file,"src/mochijson2.erl"},
>>>                      {line,157}]},
>>>                    {lists,foldl,3,
>>>                     [{file,"lists.erl"},
>>>                      {line,1197}]},
>>>                    {mochijson2,
>>>                     json_encode_array,2,
>>>                     [{file,"src/mochijson2.erl"},
>>>                      {line,159}]},
>>>                    {riak_kv_pb_mapred,
>>>                     process_stream,3,
>>>                     [{file,"src/riak_kv_pb_mapred.erl"},
>>>                      {line,97}]},
>>>                    {riak_api_pb_server,
>>>                     process_stream,5,
>>>                     [{file,"src/riak_api_pb_server.erl"},
>>>                      {line,227}]},
>>>                    {riak_api_pb_server,
>>>                     handle_info,2,
>>>                     [{file,"src/riak_api_pb_server.erl"},
>>>                      {line,158}]},
>>>                    {gen_server,
>>>                     handle_msg,5,
>>>                     [{file,"gen_server.erl"},
>>>                      {line,607}]}] - CommunicationError
>>>
>>> From: riak-users [mailto:riak-users-bounces at lists.basho.com] On Behalf Of Kevin Burton
>>> Sent: Tuesday, February 12, 2013 9:48 AM
>>> To: 'Jeremiah Peschka'
>>> Cc: 'riak-users'
>>> Subject: RE: ListKeys or MapReduce
>>>
>>> The name is “$keys”? Something like:
>>>
>>>             using (IRiakEndPoint cluster = RiakCluster.FromConfig("riakConfig"))
>>>             {
>>>                 IRiakClient riakClient = cluster.CreateClient();
>>>                 RiakBucketKeyInput bucketKeyInput = new RiakBucketKeyInput();
>>>                 bucketKeyInput.AddBucketKey(productBucketName, "$keys");
>>>                 RiakMapReduceQuery query = new RiakMapReduceQuery()
>>>                    .Inputs(bucketKeyInput)
>>>                    .MapJs(m => m.Name("Riak.mapValuesJson").Keep(true));
>>>                 RiakResult<RiakMapReduceResult> result = riakClient.MapReduce(query);
>>>                 if (result.IsSuccess)
>>>                 {
>>>
>>> From: Jeremiah Peschka [mailto:jeremiah.peschka at gmail.com]
>>> Sent: Tuesday, February 12, 2013 9:18 AM
>>> To: Kevin Burton
>>> Cc: riak-users
>>> Subject: Re: ListKeys or MapReduce
>>>
>>> It would be queried like any other index as an MR input. I'll create an
>>> issue and will try to get this in some time in the next few days - no
>>> promises, though.
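For reference, an index used as a MapReduce input looks roughly like the job below when sent to Riak as JSON. This is a sketch under assumptions, not what CorrugatedIron generates: the class name, method name, and key bounds are hypothetical, the index is spelled $key (the name used in the bug report elsewhere in this thread), and the job shape should be checked against the Riak MapReduce documentation for your version.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class KeyIndexJob
{
    // Hypothetical helper: builds a MapReduce job whose input is a range
    // query on the $key index instead of a list of bucket/key pairs.
    public static string Build(string bucket, string startKey, string endKey)
    {
        var job = new Dictionary<string, object>
        {
            ["inputs"] = new Dictionary<string, object>
            {
                ["bucket"] = bucket,
                ["index"] = "$key",   // special index, no _bin/_int suffix
                ["start"] = startKey,
                ["end"] = endKey
            },
            ["query"] = new object[]
            {
                new Dictionary<string, object>
                {
                    ["map"] = new Dictionary<string, object>
                    {
                        ["language"] = "javascript",
                        ["name"] = "Riak.mapValuesJson",
                        ["keep"] = true
                    }
                }
            }
        };

        return JsonSerializer.Serialize(job);
    }
}
```

The difference from a bucket/key input is that "inputs" names an index and a range, so Riak resolves the matching keys itself rather than looking up an object stored under a literal key.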
>>>
>>> On Tue, Feb 12, 2013 at 7:09 AM, Kevin Burton <rkevinburton at charter.net> wrote:
>>>
>>> I will read the other URLs that you mentioned. Thank you.
>>>
>>> Would you mind giving a short example (preferably using CI) of the $keys
>>> index?
>>>
>>> From: Jeremiah Peschka [mailto:jeremiah.peschka at gmail.com]
>>> Sent: Tuesday, February 12, 2013 8:52 AM
>>> To: Kevin Burton
>>> Cc: riak-users
>>> Subject: Re: ListKeys or MapReduce
>>>
>>> They're both pretty crappy in terms of performance - they read all data
>>> off of disk. If you're using LevelDB you can use the $keys index to pull
>>> back just the keys that are in a single bucket.
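Outside of any client library, a key-index lookup can also be issued against Riak's HTTP interface, whose range form is /buckets/<bucket>/index/<index>/<start>/<end>. The sketch below only builds that URL. The class name, host, and key bounds are hypothetical, and the index is spelled $key as in the bug report elsewhere in this thread.

```csharp
using System;

public static class KeyIndexUrl
{
    // Hypothetical helper: builds the URL for a range query on the $key
    // index of one bucket. The caller picks bounds that cover the keys
    // it wants returned.
    public static string Build(string baseUrl, string bucket, string startKey, string endKey)
    {
        return string.Format(
            "{0}/buckets/{1}/index/$key/{2}/{3}",
            baseUrl.TrimEnd('/'),
            Uri.EscapeDataString(bucket),
            Uri.EscapeDataString(startKey),
            Uri.EscapeDataString(endKey));
    }
}
```

For example, Build("http://localhost:8098", "products", "a", "z") yields http://localhost:8098/buckets/products/index/$key/a/z. A GET on that URL returns only the matching keys, not the stored objects, which is why it is cheaper than reading every value.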
>>>
>>> A better approach is to maintain a separate bucket - e.g. DocumentCount -
>>> that is used for counting documents. Unfortunately, you can't guarantee
>>> transactional consistency around counts in Riak today, so you'll want to
>>> move maintaining the counts out of Riak and into something else. If you
>>> search the list archives [1], you'll find that Redis has been mentioned as
>>> a good way to solve this problem - counters are stored in Redis and flushed
>>> to Riak on a regular schedule. Because of the lack of consistency
>>> (especially around MapReduce operations), Riak isn't the best choice if you
>>> require counters/aggregations to be stored in the database.
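The buffer-and-flush pattern described above can be sketched in a few lines. This is an illustration only: an in-memory dictionary stands in for Redis, the class name is hypothetical, and the write callback is a placeholder for whatever code stores the totals in Riak (for example in a DocumentCount bucket).

```csharp
using System;
using System.Collections.Concurrent;

public sealed class BufferedCounters
{
    // Pending increments per counter name. In the setup described in the
    // thread this state would live in Redis; a dictionary stands in here.
    private readonly ConcurrentDictionary<string, long> _pending =
        new ConcurrentDictionary<string, long>();

    // Record an increment without touching Riak.
    public void Increment(string name, long delta = 1)
    {
        _pending.AddOrUpdate(name, delta, (key, current) => current + delta);
    }

    // Drain the pending increments and hand each one to the supplied
    // writer. Intended to be called on a regular schedule, e.g. by a timer.
    public void Flush(Action<string, long> write)
    {
        foreach (var name in _pending.Keys)
        {
            if (_pending.TryRemove(name, out var delta))
            {
                write(name, delta);
            }
        }
    }
}
```

Increments that arrive while a flush is running are simply picked up by the next flush, so the fast path never waits on Riak.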
>>>
>>> Once CRDTs [2] make it into mainstream Riak, you can make use of those
>>> data structures to implement distributed counters in Riak.
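To give a feel for why CRDTs suit distributed counters, here is a sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. It is not Riak's implementation and makes no claim about what Riak will ship; the class and the node identifiers are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class GCounter
{
    // One monotonically increasing count per node.
    private readonly Dictionary<string, long> _counts = new Dictionary<string, long>();

    // Each node only ever increments its own slot.
    public void Increment(string nodeId, long delta = 1)
    {
        if (delta < 0) throw new ArgumentOutOfRangeException(nameof(delta));
        _counts.TryGetValue(nodeId, out var current);
        _counts[nodeId] = current + delta;
    }

    // The counter's value is the sum over all nodes.
    public long Value => _counts.Values.Sum();

    // Merging takes the per-node maximum, so replicas converge to the same
    // value regardless of merge order or repeated merges.
    public void Merge(GCounter other)
    {
        foreach (var pair in other._counts)
        {
            _counts.TryGetValue(pair.Key, out var mine);
            _counts[pair.Key] = Math.Max(mine, pair.Value);
        }
    }
}
```

Because the merge is commutative, associative, and idempotent, two replicas that accepted increments independently can be reconciled without coordination, which is the property missing from plain read-modify-write counters.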
>>>
>>> [1]: http://riak.markmail.org
>>> [2]: http://vimeo.com/52414903
>>>
>>> On Mon, Feb 11, 2013 at 10:30 AM, <rkevinburton at charter.net> wrote:
>>>
>>> Say I need to determine how many documents there are in my database. For
>>> a CorrugatedIron application I can do ListKeys and get the warning that it
>>> is an expensive operation, or I can do a MapReduce query. Which is the
>>> least expensive? Is there an option that I am missing?
>>>
>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>>
>>
>