Slow performance using linkwalk, help wanted

Karsten Thygesen karthy at netic.dk
Tue Nov 9 05:10:30 EST 2010


Hi Ryan

Thanks for helping out!!

The cluster consists of 4 identical nodes, all dedicated to Riak only; no other zones or tasks are running on them. We use Riak-EE 0.13. The servers are HP servers with 4 x 146GB 10K RPM SAS disks. There is a memory cache on the RAID controller, and it is used for both reads and writes, but the RAID itself is built using ZFS on Solaris 10u9, set up as follows:

  pool: pool01
 state: ONLINE
 scrub: scrub completed after 0h0m with 0 errors on Tue Oct 26 21:25:05 2010
config:

        NAME          STATE     READ WRITE CKSUM
        pool01        ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c0t0d0s7  ONLINE       0     0     0
            c0t1d0s7  ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            c0t2d0    ONLINE       0     0     0
            c0t3d0    ONLINE       0     0     0

errors: No known data errors

Metrics during load show 5% CPU load and about 10% IO load (iostat reports 30 iops, and the disks should be able to handle 300 iops each). So basically, the servers are unloaded.
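
To make the IO headroom estimate explicit, here is a small sketch of the arithmetic. The 30 iops and 300 iops figures are the ones quoted above; whether the 30 iops is per disk or the total for the node is not stated, so both readings are computed. The names are only illustrative.

```python
# Figures quoted in this mail.
DISKS_PER_NODE = 4
RATED_IOPS_PER_DISK = 300
OBSERVED_IOPS = 30


def utilisation(observed, rated):
    """Fraction of the rated IO capacity that is actually in use."""
    return observed / rated


# Reading 1 (assumption): iostat's 30 iops is per disk.
per_disk = utilisation(OBSERVED_IOPS, RATED_IOPS_PER_DISK)

# Reading 2 (assumption): 30 iops is the total across all disks in the node.
per_node = utilisation(OBSERVED_IOPS, DISKS_PER_NODE * RATED_IOPS_PER_DISK)

print(f"30 iops per disk:   {per_disk:.0%}")   # 10%
print(f"30 iops node total: {per_node:.1%}")   # 2.5%
```

Either way the disks are far from saturated, which matches the roughly 10% IO load figure.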

One question remains: we use ZFS with the default block size of 128 KB. What is the optimal block size for Bitcask?

But I believe we should look elsewhere for the problem: the hardware is not significantly loaded, so I suspect the fault lies in our data model or in how we use it...?

Best regards,

Karsten

On Nov 8, 2010, at 23:43 , Ryan Tilder wrote:

> Hi, Jan.  Your description of the behaviour you're seeing below is frequently the result of slow access times to data on disk due to a low spindle count for a given data set.  Can you tell me the hardware specifications of the disks in each of the machines in the Riak ring?  Primarily the number of disks, their capacity in GB or TB, their spindle speed, any on-disk buffer they might have, and whether or not they're all the same hardware model?[1]
> 
> One other question that might be related: is there any other software of note running on the hardware?  E.g. Riak running on multi-tenant hardware in a Solaris Zone while another DB is running in a separate Zone on the same hardware, another disk-seek-heavy daemon running on the same machine, etc.
> 
> --Ryan
> 
> 1. If you don't know the above, you can get a lot of information from the "prtconf -v | more" command; just search for the word "disk".  The model number(s) of the relevant disks are enough for us to hunt down the technical specs quickly.
> 
> On Mon, Nov 8, 2010 at 12:44 PM, Jan Buchholdt <jbu at trifork.com> wrote:
> Another piece of information: the first time I do a link walk (using curl) on a totally idle cluster, it takes 2.71 seconds for a person with 363 documents. If I repeat the request, it takes 319 milliseconds. I would expect the performance to be almost the same.
> 
> If I run my performance test with 20 threads that randomly pick a Person from 5 million, the minimum time is 2.8s, average 6.7s, 90% 8.9s and max 12.4s.
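
For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of how the cold versus warm comparison and the 20-thread test could be timed. It is not the test that produced the numbers above. The host, port, bucket name ("person") and link spec are assumptions, as is the nearest-rank method for the 90% figure; adjust them to the actual setup.

```python
import math
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical endpoint: host, port, bucket name and link spec are assumptions.
BASE_URL = "http://localhost:8098/riak/person"
LINK_SPEC = "_,_,1"


def fetch_link_walk(key):
    """Issue one link-walk GET and read the whole response body."""
    url = f"{BASE_URL}/{key}/{LINK_SPEC}"
    with urllib.request.urlopen(url) as resp:
        resp.read()


def timed(fetch, key):
    """Return the wall-clock seconds taken by fetch(key)."""
    start = time.perf_counter()
    fetch(key)
    return time.perf_counter() - start


def summarise(samples):
    """Return (min, mean, 90th percentile, max), nearest-rank percentile."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * 9 / 10))
    return ordered[0], statistics.mean(ordered), ordered[rank - 1], ordered[-1]


def run(keys, fetch=fetch_link_walk, threads=20):
    """Time fetch(key) for every key using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda k: timed(fetch, k), keys))
```

Calling run() twice with the same keys and comparing the two summarise() results separates the cold first request from the warm repeat; calling it with randomly chosen keys approximates the 20-thread test.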
