Solr Indexing and required fields.

Les Mikesell lesmikesell at gmail.com
Tue May 22 14:56:33 EDT 2012


On Tue, May 22, 2012 at 1:25 PM, Ryan Zezeski <rzezeski at basho.com> wrote:
>
>>
>> Is there - or could there be - an efficient way to grow/expire indexes
>> for time-related items like news articles where you would generally
>> want the results listed newest-first?   Like being able to start new
>> indexes at a rate where the size makes sense, having a way to tell the
>> server to merge the results to cover some timespan, and being able to
>> expire by just dropping old indexes as they age without having to
>> rewrite anything.
>>
>
> Les,
>
> Currently Riak's backends don't really have any way to initiate
> communication with the vnode.  E.g. bitcask has key expiry but no way to
> fire an event back up the chain to the vnode to let it know "this key is
> expired and you should clear its indexes."

I meant a scheme where the index itself expires without rewriting.

> Another potential way to do it is put the notion of expiry in both the
> object and index backends.  In that case you'll have to deal with the fact
> that there is a race condition between the two.  Which may not matter
> depending on your application's needs.
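A minimal sketch of the race being described, under the assumption that each backend applies its own TTL independently (all names here are hypothetical, not Riak APIs): both sides record the same nominal expiry, but because each checks it on its own, an index entry can briefly outlive the object it points at (or vice versa), so search results must be re-checked against the object store.

```python
# Hypothetical in-memory stand-ins for the object and index backends.
object_store = {}   # key -> (value, expire_at)
index_store = {}    # term -> list of (key, expire_at)

def put(key, value, term, ttl, now):
    # Both backends record the same nominal expiry time...
    object_store[key] = (value, now + ttl)
    index_store.setdefault(term, []).append((key, now + ttl))

def get(key, now):
    entry = object_store.get(key)
    if entry and entry[1] > now:
        return entry[0]
    return None  # object already expired

def search(term, now):
    # ...but each side evaluates expiry independently, so index hits
    # may dangle; filter them against the object store before returning.
    hits = [k for k, exp in index_store.get(term, []) if exp > now]
    return [k for k in hits if get(k, now) is not None]
```

The filtering step in `search` is what papers over the race window; whether that cost matters depends, as noted, on the application.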

Can't you pick some reasonable interval to start writing new indexes
(hourly, daily, weekly, depending on the data volume and retention),
then merge some number of them when processing queries?  You must
already be dealing with multiple index chunks on different nodes - why
not also for time ranges?  It could be particularly efficient if
queries could pass the time range and the server would only examine
the indexes covering that span.  The writers would probably need to
be able to back-date items and update existing entries, so it wouldn't
be as simple as just using one small current index when storing data,
but on the expire side you'd just drop the whole chunk.
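The scheme above can be sketched in a few lines. This is an illustrative toy, not anything Riak or Solr provides: one index chunk per day, writers bucket by the item's own timestamp (so back-dating works), queries only touch chunks inside the requested span and return newest-first, and expiry just drops whole chunks.

```python
from datetime import datetime, timedelta

class TimePartitionedIndex:
    """Hypothetical sketch: one index 'chunk' per day. Queries merge
    only the chunks covering the requested span; expiry drops whole
    chunks without rewriting any entries."""

    def __init__(self):
        self.chunks = {}  # date -> {term: set of keys}

    def add(self, term, key, ts):
        # Bucket by the item's own timestamp, not the wall clock,
        # so writers can back-date items into older chunks.
        day = ts.date()
        self.chunks.setdefault(day, {}).setdefault(term, set()).add(key)

    def query(self, term, start, end):
        # Examine only the chunks covering [start, end], newest-first.
        results, day = [], end.date()
        while day >= start.date():
            chunk = self.chunks.get(day)
            if chunk:
                results.extend(sorted(chunk.get(term, ())))
            day -= timedelta(days=1)
        return results

    def expire_before(self, cutoff):
        # Expiry = drop entire chunks older than the cutoff; no rewrite.
        for day in [d for d in self.chunks if d < cutoff.date()]:
            del self.chunks[day]
```

Nothing here handles distribution, but it shows why the expire side is cheap: old data disappears by deleting a chunk, while only the write path pays for back-dating.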

-- 
   Les Mikesell
    lesmikesell at gmail.com

More information about the riak-users mailing list