ListKeys or MapReduce

Jeremiah Peschka jeremiah.peschka at gmail.com
Tue Feb 12 14:13:41 EST 2013


Good news! You've found a bug in CorrugatedIron. Because of how indexes are
named, we mangle index names by adding a suffix of _bin or _int, depending
on the index type. This shouldn't be happening on $key, but it is. I'll
create a GitHub issue and get that taken care of.
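
To make the naming rule concrete, here is a rough sketch in Python rather than CorrugatedIron's actual code. The helper names (index_field_name, index_mapreduce_job) are made up for illustration, and it assumes $key and $bucket are the only special index names that must be passed through without a suffix. The job layout follows Riak's JSON MapReduce format with a secondary-index range as input.

```python
import json

# Assumption: these are the special index names that Riak maintains itself
# and that therefore must not receive a _bin or _int suffix.
SPECIAL_INDEXES = {"$key", "$bucket"}


def index_field_name(name, is_integer=False):
    """Return the index name as sent to Riak (hypothetical helper)."""
    if name in SPECIAL_INDEXES:
        return name  # leave special indexes untouched
    return name + ("_int" if is_integer else "_bin")


def index_mapreduce_job(bucket, index, start, end, is_integer=False):
    """Build a MapReduce job whose input is a secondary-index range query."""
    return {
        "inputs": {
            "bucket": bucket,
            "index": index_field_name(index, is_integer),
            "start": start,
            "end": end,
        },
        "query": [
            {"map": {"language": "javascript",
                     "name": "Riak.mapValuesJson",
                     "keep": True}}
        ],
    }


if __name__ == "__main__":
    # The bucket name and key range below are placeholders.
    print(json.dumps(index_mapreduce_job("products", "$key", "0", "zzzz"), indent=2))
```

With a rule like this, a user-defined index such as "email" becomes "email_bin", while "$key" reaches the server unchanged.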

---
Jeremiah Peschka - Founder, Brent Ozar Unlimited
MCITP: SQL Server 2008, MVP
Cloudera Certified Developer for Apache Hadoop


On Tue, Feb 12, 2013 at 7:56 AM, Kevin Burton <rkevinburton at charter.net> wrote:

> I forgot to mention that when I execute this code I get the error:
>
>     {not_found,
>      {<<"products">>,
>       <<"$keys">>},
>      undefined}}}:[{mochijson2,
>                     json_encode,2,
>                     [{file,"src/mochijson2.erl"},
>                      {line,149}]},
>                    {mochijson2,
>                     '-json_encode_array/2-fun-0-',
>                     3,
>                     [{file,"src/mochijson2.erl"},
>                      {line,157}]},
>                    {lists,foldl,3,
>                     [{file,"lists.erl"},
>                      {line,1197}]},
>                    {mochijson2,
>                     json_encode_array,2,
>                     [{file,"src/mochijson2.erl"},
>                      {line,159}]},
>                    {riak_kv_pb_mapred,
>                     process_stream,3,
>                     [{file,"src/riak_kv_pb_mapred.erl"},
>                      {line,97}]},
>                    {riak_api_pb_server,
>                     process_stream,5,
>                     [{file,"src/riak_api_pb_server.erl"},
>                      {line,227}]},
>                    {riak_api_pb_server,
>                     handle_info,2,
>                     [{file,"src/riak_api_pb_server.erl"},
>                      {line,158}]},
>                    {gen_server,
>                     handle_msg,5,
>                     [{file,"gen_server.erl"},
>                      {line,607}]}] - CommunicationError
>
> From: riak-users [mailto:riak-users-bounces at lists.basho.com] On Behalf Of Kevin Burton
> Sent: Tuesday, February 12, 2013 9:48 AM
> To: 'Jeremiah Peschka'
> Cc: 'riak-users'
> Subject: RE: ListKeys or MapReduce
>
> The name is “$keys”? Something like:
>
>     using (IRiakEndPoint cluster = RiakCluster.FromConfig("riakConfig"))
>     {
>         IRiakClient riakClient = cluster.CreateClient();
>         RiakBucketKeyInput bucketKeyInput = new RiakBucketKeyInput();
>         bucketKeyInput.AddBucketKey(productBucketName, "$keys");
>         RiakMapReduceQuery query = new RiakMapReduceQuery()
>            .Inputs(bucketKeyInput)
>            .MapJs(m => m.Name("Riak.mapValuesJson").Keep(true));
>         RiakResult<RiakMapReduceResult> result = riakClient.MapReduce(query);
>         if (result.IsSuccess)
>         {
>
> From: Jeremiah Peschka [mailto:jeremiah.peschka at gmail.com]
> Sent: Tuesday, February 12, 2013 9:18 AM
> To: Kevin Burton
> Cc: riak-users
> Subject: Re: ListKeys or MapReduce
>
> It would be queried like any other index as an MR input. I'll create an
> issue and will try to get this in some time in the next few days - no
> promises, though.
>
> On Tue, Feb 12, 2013 at 7:09 AM, Kevin Burton <rkevinburton at charter.net> wrote:
>
> I will read the other URLs that you mentioned. Thank you.
>
> Would you mind giving a short example (preferably using CI) of the $keys
> index?
>
> From: Jeremiah Peschka [mailto:jeremiah.peschka at gmail.com]
> Sent: Tuesday, February 12, 2013 8:52 AM
> To: Kevin Burton
> Cc: riak-users
> Subject: Re: ListKeys or MapReduce
>
> They're both pretty crappy in terms of performance - they read all data
> off of disk. If you're using LevelDB you can use the $keys index to pull
> back just the keys that are in a single bucket.
>
> A better approach is to maintain a separate bucket - e.g. DocumentCount -
> that is used for counting documents. Unfortunately, you can't guarantee
> transactional consistency around counts in Riak today, so you'll want to
> move maintaining the counts out of Riak and into something else. If you
> search the list archives [1], you'll find that Redis has been mentioned as
> a good way to solve this problem - counters are stored in Redis and flushed
> to Riak on a regular schedule. Because of the lack of consistency
> (especially around MapReduce operations), Riak isn't the best choice if you
> require counters/aggregations to be stored in the database.
>
> Once CRDTs [2] make it into mainstream Riak, you can make use of those
> data structures to implement distributed counters in Riak.
>
> [1]: http://riak.markmail.org
> [2]: http://vimeo.com/52414903
>
> On Mon, Feb 11, 2013 at 10:30 AM, <rkevinburton at charter.net> wrote:
>
> Say I need to determine how many documents there are in my database. For a
> CorrugatedIron application I can do ListKeys and get the warning that it is
> an expensive operation or I can do a MapReduce query. Which is the
> least expensive? Is there an option that I am missing?
>
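
The quoted reply mentions CRDTs as a future way to build distributed counters. For readers unfamiliar with the idea, below is a minimal sketch of a grow-only counter (G-Counter) in Python. It is an illustration of the general technique only, not Riak's implementation; the class and method names are assumptions chosen for this example.

```python
class GCounter:
    """Grow-only counter: one non-negative tally per node."""

    def __init__(self):
        self.counts = {}  # node id -> increments recorded by that node

    def increment(self, node_id, amount=1):
        """Record increments performed on the given node."""
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[node_id] = self.counts.get(node_id, 0) + amount

    def value(self):
        """The counter's value is the sum of all per-node tallies."""
        return sum(self.counts.values())

    def merge(self, other):
        """Fold another replica's state into this one by per-node maximum."""
        for node_id, count in other.counts.items():
            self.counts[node_id] = max(self.counts.get(node_id, 0), count)
```

Because merging takes the per-node maximum, replicas can exchange state in any order, and more than once, and still converge on the same total. That property is what makes this kind of counter workable without transactional consistency.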