Riak Search

Gordon Tillman gtillman at mezeo.com
Tue Oct 16 10:55:52 EDT 2012


Pavel, as an alternative to re-writing the objects to get them indexed, you can run what I call a map operation with side effects.

You define an Erlang map-phase function as follows:


%% Skip keys that no longer exist.
map_reindex({error, notfound}, _, _) ->
    [];
%% Re-index the object by calling the search precommit hook directly;
%% return an empty list so the map phase emits no results.
map_reindex(RiakObject, _, _) ->
    riak_search_kv_hook:precommit(RiakObject),
    [].


Then run it against all of the keys in the bucket by posting a MapReduce job like this:

{
    "inputs": "<your-bucket>",
    "query": [
        {
            "map": {
                "function": "map_reindex", 
                "language": "erlang",
                "module": "<your-module>"
            }
        }
    ],
    "timeout": <your-timeout>
}
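As a sketch of how you might submit that job, here is a minimal Python example posting it to Riak's HTTP /mapred endpoint. The host/port, bucket name, module name, and timeout are all assumptions; substitute your own values.

```python
# Hypothetical sketch: submitting the re-index MapReduce job over
# Riak's HTTP API. "your-bucket", "your_module", the timeout, and
# the localhost:8098 endpoint are assumptions -- adjust for your cluster.
import json
import urllib.request

job = {
    "inputs": "your-bucket",              # assumed bucket name
    "query": [
        {
            "map": {
                "language": "erlang",
                "module": "your_module",  # must be compiled onto every node
                "function": "map_reindex",
            }
        }
    ],
    "timeout": 600000,                    # ms; assumed value
}

payload = json.dumps(job).encode("utf-8")
req = urllib.request.Request(
    "http://localhost:8098/mapred",       # default Riak HTTP endpoint
    data=payload,
    headers={"Content-Type": "application/json"},
)
# Uncomment to actually run the job against a live cluster:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Because the map function emits nothing, the job returns an empty result; its only purpose is the side effect of re-indexing each object it visits.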


We have used this technique to re-index rather large clusters, and it runs quickly because the work is done in parallel across all of the nodes in the cluster.

-- gordon



On Oct 16, 2012, at 07:44 , Ryan Zezeski <rzezeski at basho.com> wrote:

> 
> 
> On Sun, Oct 14, 2012 at 12:33 AM, Pavel Kogan <pavel.kogan at cortica.com> wrote:
> 
> 1) Does enabling search have any impact on read latency/throughput?
> 
> If you are reading and searching at the same time, there is a good chance it will.  It will cause more disk seeks.
>  
> 2) Does enabling search have any impact on RAM usage?
> 
> Yes, the index engine behind Riak Search makes heavy use of Erlang ETS tables.  Each partition has an in-memory buffer as well as an in-memory offset table for every segment.  It also uses a temporary ETS table for every write to store posting data.  The ETS system limit can even become an issue in overload scenarios.
>  
> 3) In production we have no search enabled. What is the best way to 
>     enable search without stopping production? I thought about something like:
>     1) Enable search node by node.
> 
> You could change the app env dynamically, but that's only half the problem.  The other half is then starting the Riak Search application.  I think application:start(merge_index) followed by application:start(riak_search) should work, but I'm not 100% sure and this has not been tested.  You'll also want to make sure to edit all app.configs so that the change is persistent.
> 
>  
>     2) Run some nightly script over all keys and overwrite them back
>         with the proper MIME type.
> 
> Yes, you'll want to install the commit hook on the buckets you wish to index.  Then you'll want to do a streaming list-keys or bucket map-reduce and re-write the data.
> 
>  
> 4) If we see that the search overhead is something we can't handle, is there a simple
>     way to disable it without stopping production?
> 
> I think the best course of action in this case would be to disable the commit hook.  But you would have to keep track of anything written during this time and re-write it after re-installing the hook.  If you don't then you'll have to re-index everything because you don't know what you missed.
> 
> 5) In what cases would we need repair? It is said to be needed on replica loss, but if I understand 
>     correctly we have 3 replicas on different nodes, don't we? If it happens, how difficult and
>     time-consuming would it be for a large cluster (about 100 nodes)?
> 
> Repair is on a per-partition basis.  The number of nodes doesn't come into play.  Repair is very specific in that it requires the adjacent partitions to be in a good, convergent state.  If they aren't, then repair isn't much help. 
> 
> A lot of these entropy issues go away in Yokozuna.  Repairing indexes is done automatically, in the background, in an efficient manner.  There is no need to re-write data or run manual repair commands.
> 
> -Z
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

