Riak and SEC Filings

Andres Jaan Tack andres.jaan.tack at eesti.ee
Tue Nov 8 10:05:33 EST 2011


>
> * Given that the documents total ~1TB of storage (not including the
> generated indexes), does something like decreasing the n_val make sense?
>  Mostly the documents are bulk inserted on a daily or weekly basis – other
> than that all of the operations are read-only.


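The N replication factor determines how well your data survives node
failures. With N=1 each object is stored exactly once, so losing one node
of the cluster means you have DEFINITELY lost data. On the other hand, each
additional replica buys less reliability than the one before it while
costing another full copy of the data (roughly another 1TB in your case);
past a point, raising N just wastes resources. Riak's default is N=3. So it
depends: if you don't care about losing data (e.g. if you can just replace
the missing documents), lower N as needed.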
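To put rough numbers on that trade-off for ~1TB of documents, here is a back-of-the-envelope sketch in Python. The independent-failure model and the 1% per-node loss probability (`P_NODE_LOSS`) are illustrative assumptions, not figures from this thread. It also assumes every replica of an object sits on a distinct physical node, which cannot hold when the cluster has fewer nodes than N.

```python
"""Back-of-the-envelope n_val trade-off for a ~1TB dataset.

Illustrative assumptions (not from the thread):
  * every replica of an object lives on a distinct physical node;
  * nodes fail independently with probability P_NODE_LOSS;
  * index and per-object overhead are ignored.
"""

DATASET_TB = 1.0    # ~1TB of documents, from the original question
P_NODE_LOSS = 0.01  # hypothetical per-node loss probability


def raw_storage_tb(n_val: int, dataset_tb: float = DATASET_TB) -> float:
    """Each object is written n_val times, so raw disk grows linearly with N."""
    return dataset_tb * n_val


def tolerated_failures(n_val: int) -> int:
    """Node losses an object survives: it needs at least one replica left."""
    return n_val - 1


def p_object_lost(n_val: int, p: float = P_NODE_LOSS) -> float:
    """An object is lost only if all n_val nodes holding it fail."""
    return p ** n_val


if __name__ == "__main__":
    previous = None
    for n in range(1, 6):
        risk = p_object_lost(n)
        # Absolute reduction in loss probability bought by this extra replica.
        gain = "" if previous is None else f", risk cut by {previous - risk:.2e}"
        print(f"n_val={n}: {raw_storage_tb(n):.1f} TB raw, "
              f"survives {tolerated_failures(n)} node loss(es), "
              f"P(lost)={risk:.0e}{gain}")
        previous = risk
```

Under this model each step up in N costs another full ~1TB of raw disk, while the absolute drop in loss probability shrinks with every step. That is the diminishing return described above.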

--Tack

2011/11/8 Hector Castro <hectcastro at gmail.com>

> Hello,
>
> I'm currently in the process of evaluating solutions to index the contents
> of ~1TB of SEC (Securities and Exchange Commission) documents.  File sizes
> vary from a few KB to a couple hundred KB.  I started evaluating Riak
> first because ease of setting up and expanding a cluster are primary
> requirements (ElasticSearch is also probably going to get evaluated, along
> with Solr).
>
> Below I have a few specific questions that I was hoping people could help
> with:
>
>        * In going through the search querying documentation, I haven't
> found a way to extract a section of a result containing matches.  Something
> similar to Google's search results page where you see an excerpt of the
> webpage contents that match your query.  Is something like this built-in so
> that it doesn't have to be done by the application?
>        * Given that the documents total ~1TB of storage (not including the
> generated indexes), does something like decreasing the n_val make sense?
>  Mostly the documents are bulk inserted on a daily or weekly basis – other
> than that all of the operations are read-only.
>
> Other than these specific questions, if anyone can provide general insight
> on issues that would arise from a dataset like this within Riak, please
> feel free to mention them.
>
> Thanks,
>
> --
> Hector
>
>
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
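The first question in the quoted message, about showing an excerpt around the matching text like Google's results page, isn't answered in this reply. If the application does end up building excerpts itself, a minimal Python sketch could look like the following. The helper `excerpt` and its parameters are hypothetical and not part of Riak, and its whole-word, case-insensitive matching is much cruder than a search engine's analyzer (no stemming, only the first hit is marked).

```python
from __future__ import annotations

import re


def excerpt(text: str, terms: list[str], width: int = 60) -> str | None:
    """Return a window of `text` around the first match of any query term.

    Hypothetical helper: case-insensitive, whole-word matching only.
    Returns None when no term occurs in the text.
    """
    if not terms:
        return None
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    match = pattern.search(text)
    if match is None:
        return None
    # Take up to `width` characters on each side of the hit.
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    # Mark the first hit; later hits in the window are left unmarked.
    snippet = (text[start:match.start()] + "**" + match.group(0) + "**"
               + text[match.end():end])
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    # Collapse newlines and runs of spaces so the excerpt reads as one line.
    return prefix + " ".join(snippet.split()) + suffix
```

For example, `excerpt(doc, ["revenue"], width=10)` returns a short window with the hit wrapped in `**`. A production version would snap the window to word or sentence boundaries rather than cutting mid-word.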