Retrieving objects from Riak

Alexander Sicular siculars at gmail.com
Thu Jun 16 00:19:53 EDT 2016


Hi Gianluca, I'll answer inline. My question to you is: what is your
use case? In general, the methods you mention could all work, but
there are pros and cons with each. -Alexander

On Wed, Jun 15, 2016 at 12:46 PM, Gianluca Padovani <gpadovani at gmail.com> wrote:
> Hi all,
> I'm exploring RiakKV and RiakTS and I have some questions:
>
> If I need to find a group of objects based on some properties, are there any
> way to search it on RiakKV?
>
> For example suppose to have some user and create a key as <state>_<name> so
> I have something as:
> - TX_John
> - TX_May
> - CA_Susan
> - CA_Martin
>
> Can I find all the people of TX? Can I do something as
> users_bucket.get("TX_*") ? I think no.

No. Riak KV is a key/value store, and by default keys have no
relationship to one another, so there is no wildcard or prefix get.

>
> To manage this scenario one possible way is to use a Crdt::Set. I can create
> for every state a set and when I add a new user to users_bucket I add also
> in this list. So If I want all users of TX, I get the set of TX, retrieve
> all the keys and then get all the objects with a multi_get.

CRDTs may be used. However, performance degrades as you add more
elements to a set. Also, you should be aware of the total object size.
Generally, you don't want objects (key plus value) larger than 1MB in
Riak. Or more precisely, you don't want a large discrepancy in the
distribution of your object sizes. One 10MB object floating around the
cluster may cause thousands of 1KB objects to get "stuck" behind it in
internal message queues. This ends up introducing a non-deterministic
performance profile.
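
To get a rough feel for when a per-state set approaches that size,
here is a back-of-the-envelope sketch in Python. It counts only the
raw key bytes and ignores CRDT metadata, so treat the result as a
lower bound. The function names and the 16-byte average key length are
illustrative assumptions, not part of any Riak API.

```python
# Rough lower bound on the size of a set that stores one user key per member.
# Assumption: only raw key bytes are counted; a real CRDT set carries extra
# per-element metadata, so the stored object will be larger than this.

ONE_MB = 1024 * 1024  # the guideline mentioned above, not a hard Riak limit


def estimated_set_bytes(num_keys, avg_key_bytes):
    """Return the raw payload size of a set holding num_keys keys."""
    return num_keys * avg_key_bytes


def max_keys_under(limit_bytes, avg_key_bytes):
    """Return how many keys fit under limit_bytes, counting key bytes only."""
    return limit_bytes // avg_key_bytes


if __name__ == "__main__":
    # Keys such as "TX_John" are short; assume 16 bytes on average.
    print(estimated_set_bytes(50_000, 16))  # 800000
    print(max_keys_under(ONE_MB, 16))       # 65536
```

Under these assumptions a single state's set passes the 1MB guideline
somewhere past tens of thousands of users, and in practice sooner once
metadata is included.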

>
> Another solution is to use 2i, tag every user object with the state, get the
> list of the keys and then retrieve the objects always with a multi_get.
>
> Another options is to use riak search that is based on Solr.
>

2i and Solr may both be used here. However, Solr is a much better
approach in cases where you may have complex queries going forward.
When using Solr you need to make accommodations for its resource
consumption. Solr likes lots of memory.

> Are there any other options? What is the "best options". Some concerns about
> that:
>
> In first solution (set) I need to do 2 write for every write and to get the
> objects I need to do a get and a multi_get, that I think it's a lot of get,
> If I want list it in another way (regarding age for example) I don't have
> any options.
> Using the 2i I should remember to add the 2i to every objects, If I had
> already stored some Object and I want to use a 2i I need to do a map reduce
> to retag every object. Probably Riak Search is the best solution?
>

If you just use Solr to index and not to store data, Solr will return
a list of keys and you will need to fetch the objects yourself. The
output of a Solr query is a JSON object, which itself may be cached as
a k/v in Riak for later retrieval. Of course, this all depends on your
use case and whether or not caching is appropriate.
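
The two-step pattern (ask an index for keys, then fetch the objects,
optionally caching the key list) can be sketched as follows. This is
an in-memory model only: users_bucket, query_cache, index_query and
multi_get are hypothetical stand-ins, not calls from a Riak client
library or from Solr.

```python
# In-memory stand-ins; none of these names come from a Riak client library.
users_bucket = {
    "TX_John": {"name": "John", "state": "TX"},
    "TX_May": {"name": "May", "state": "TX"},
    "CA_Susan": {"name": "Susan", "state": "CA"},
    "CA_Martin": {"name": "Martin", "state": "CA"},
}
query_cache = {}  # stands in for a separate bucket holding cached results


def index_query(state):
    """Pretend index: return matching keys only, like an index-only query."""
    return sorted(k for k, v in users_bucket.items() if v["state"] == state)


def multi_get(keys):
    """Fetch each object by key, skipping keys that no longer exist."""
    return [users_bucket[k] for k in keys if k in users_bucket]


def users_in_state(state):
    """Step 1: get the key list (from cache if present). Step 2: fetch."""
    cache_key = "state:" + state
    if cache_key not in query_cache:
        query_cache[cache_key] = index_query(state)
    return multi_get(query_cache[cache_key])


if __name__ == "__main__":
    print([u["name"] for u in users_in_state("TX")])  # ['John', 'May']
```

Note that the cached key list goes stale as soon as a new TX user is
written, which is why caching only fits some use cases.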

> If I use RiakTS, is this search simpler?
>
> If I use state as Partition Key, I can write
>
> select * from users where state="TX"
>
> but,  when I want search something I always filter for partition key,
> correct?

SQL queries in Riak TS require that all columns in a primary key be
queried as equalities. The exception to this is the timestamp column,
if you have one, which may be queried as a range. The reason is that
the primary key, including the quantum if it is part of the PK, is
hashed in order to determine where that data has been distributed in
the cluster. You can think of queries against a PK as an index
lookup. Other columns in the schema may be queried as well, but those
columns are scanned and filtered. You can think of filters applied
against those additional columns as table scans.
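
As an illustration of that query shape, here is a small Python sketch
that assembles a SELECT with equalities on the key columns, a range on
the timestamp, and any extra filters appended last. The table name
"users" and the columns "time" and "age" are hypothetical, and the
helper does no escaping, so it is a sketch rather than something to
run against user input.

```python
def sql_literal(value):
    """Render ints as-is and strings in single quotes (no escaping)."""
    if isinstance(value, int):
        return str(value)
    if "'" in value:
        raise ValueError("quotes in values are not handled by this sketch")
    return "'" + value + "'"


def build_ts_query(table, pk_equalities, time_column, start_ms, end_ms,
                   filters=()):
    """Build a query: PK equalities, a timestamp range, then extra filters.

    pk_equalities maps key column names to the value each must equal.
    filters holds ready-made conditions on non-key columns.
    """
    clauses = [f"{col} = {sql_literal(val)}"
               for col, val in pk_equalities.items()]
    clauses.append(f"{time_column} >= {start_ms}")
    clauses.append(f"{time_column} < {end_ms}")
    clauses.extend(filters)
    return f"SELECT * FROM {table} WHERE " + " AND ".join(clauses)


if __name__ == "__main__":
    # One day of TX users (timestamps in milliseconds), filtered by age.
    print(build_ts_query("users", {"state": "TX"}, "time",
                         1465948800000, 1466035200000, ["age > 30"]))
```

The equality and range clauses correspond to the index lookup, and the
trailing "age > 30" is the part that would be scanned and filtered.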

If you look at the PK statement in the documentation, you'll notice
there are two lines. The first line is the partition key. The second
line is the local key. The partition key is what gets hashed to
determine distribution in the cluster. The local key is what is used
to sort data on disk and ensure uniqueness. So, you may simply use
"state" as your partition key and "state, time" as your local key.
Although you don't need to, in most cases you'd probably want to use a
quantum in your partition key.
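
To make the role of the quantum concrete, the Python sketch below
rounds a millisecond timestamp down to the start of its quantum. Rows
with the same state whose timestamps fall in the same quantum end up
with the same partition key. This is a simplified model for
illustration, not Riak TS's actual implementation, and the DDL string
describes a hypothetical "users" table.

```python
# Hypothetical table. Inside PRIMARY KEY, the first line is the partition
# key (with a one-day quantum) and the second line is the local key.
USERS_DDL = """
CREATE TABLE users (
  state VARCHAR   NOT NULL,
  time  TIMESTAMP NOT NULL,
  name  VARCHAR   NOT NULL,
  PRIMARY KEY (
    (state, QUANTUM(time, 1, 'd')),
    state, time
  )
)
"""

MS_PER_UNIT = {"s": 1000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def quantum_start(ts_ms, size, unit):
    """Round a millisecond timestamp down to the start of its quantum."""
    width = size * MS_PER_UNIT[unit]
    return ts_ms - (ts_ms % width)


def partition_key(state, ts_ms, size=1, unit="d"):
    """Simplified model of the value that gets hashed for placement."""
    return (state, quantum_start(ts_ms, size, unit))


if __name__ == "__main__":
    # 2016-06-16 04:19:53 UTC falls in the quantum starting at midnight UTC.
    print(partition_key("TX", 1466050793000))  # ('TX', 1466035200000)
```

A smaller quantum spreads one state's rows over more partitions, while
a larger one keeps more rows together for range queries.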

> I think that RiakTS doesn't support RiakSearch and 2i. Are there some plans
> for it?

Correct, Riak TS does not support 2i or Solr. Solr is not fast enough.
I believe the roadmap revolves around improving our SQL feature set at
this point.

>
> Is correct use RiakTS without a time series ? :-)

The timestamp data type in Riak TS is an int. You could just push
sequential ints into that column if you wanted.

>
> A long mail I know ...
>
> thanks and bye
> Gianluca



