Riak Cluster Setup on EC2

Ryan Maclear ryan at lambdasphere.com
Wed Feb 2 15:10:05 EST 2011


Hi All,

In my experience with distributed Erlang on systems I've put together, the first thing I check when two nodes cannot ping each other is whether the Erlang cookie is the same on both nodes (sometimes an Erlang hosts file is needed as well).

I know that in the dev setup of Riak (3 nodes: dev1, dev2, dev3, all on 127.0.0.1) the cookie is set in the etc/vm.args file, but I'm not sure whether this is also the case in production deployments, or whether the cookie in the user's home directory is used instead.

Just thought that might be something to look at as well.
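
A minimal sketch of that cookie check, assuming each node's cookie is set with a `-setcookie` line in its vm.args file and that you have copied the contents of both files to one place. The function names and the sample node addresses are hypothetical, chosen only for illustration:

```python
import re
from typing import Optional


def read_cookie(vm_args_text: str) -> Optional[str]:
    """Return the first active -setcookie value in vm.args text, or None."""
    for line in vm_args_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and '#' comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        match = re.match(r"-setcookie\s+(\S+)", stripped)
        if match:
            return match.group(1)
    return None


def cookies_match(vm_args_a: str, vm_args_b: str) -> bool:
    """True only if both texts set a cookie and the two values are equal."""
    cookie_a = read_cookie(vm_args_a)
    cookie_b = read_cookie(vm_args_b)
    return cookie_a is not None and cookie_a == cookie_b


if __name__ == "__main__":
    # Hypothetical vm.args contents from two nodes.
    node_a = "-name riak@10.0.0.1\n-setcookie riak\n"
    node_b = "-name riak@10.0.0.2\n-setcookie something_else\n"
    print("cookies match:", cookies_match(node_a, node_b))
```

If no `-setcookie` line is present, the node may be falling back to the cookie file in the user's home directory, so a `None` result is worth following up by hand.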

Cheers,
Ryan
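
The thread quoted below suggests checking that the epmd port and the node's distributed Erlang port are reachable from the other machine. A rough sketch of that check, assuming you paste in the `epmd -names` output from the target node and run the script from the other node. The helper names and the host used in the example are hypothetical:

```python
import re
import socket
from typing import Dict

EPMD_PORT = 4369  # port reported by `epmd -names` in the quoted thread


def parse_epmd_names(output: str) -> Dict[str, int]:
    """Map node name -> distribution port from `epmd -names` output."""
    return {
        name: int(port)
        for name, port in re.findall(r"name\s+(\S+)\s+at\s+port\s+(\d+)", output)
    }


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Sample output copied from the quoted message; the host is hypothetical.
    sample = (
        "epmd: up and running on port 4369 with data:\n"
        "name dev3 at port 56668\n"
        "name dev2 at port 56657\n"
        "name dev1 at port 56647\n"
    )
    host = "127.0.0.1"
    print("epmd reachable:", port_open(host, EPMD_PORT))
    for node, port in parse_epmd_names(sample).items():
        print(node, port, "reachable:", port_open(host, port))
```

A port that reports unreachable here while the Riak ports (8087, 8098, 8099) connect fine would point at the security group or a software firewall rather than at Riak itself.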

On 02 Feb 2011, at 7:57 PM, Jon Meredith wrote:

> Hey,
> 
> So that shows distributed Erlang is not working; the next step is to work out why.  The Erlang emulator (beam) uses ports for distributed Erlang in addition to the ones Riak uses.
> 
> You can see which ports using the Erlang port mapper (epmd).  It sits on a well-known port and sets up all the other connections.  Run 'epmd -names' from the command line to see what is set up.  Here is my machine with 3 dev nodes all running at the same time.  You'll probably just have 'name riak' on yours.
> 
> $ epmd -names
> epmd: up and running on port 4369 with data:
> name dev3 at port 56668
> name dev2 at port 56657
> name dev1 at port 56647
> 
> Check that you can reach both the epmd port and the distributed Erlang port from each side (that's what Sean was talking about with the Chef recipe).
> 
> Jon
> 
> On Wed, Feb 2, 2011 at 10:49 AM, Abhishek Kona <abhishek.kona at gmail.com> wrote:
> Hi.
> 
> Sorry about that, 
> I am getting "pang"  as a reply. 
> 
> But is this a network issue?
> As I said before, I can telnet to the machines on the Riak ports.
> 
> What could the issue be?
> 
> Again, thanks for all the quick help.
> 
> -Abhishek Kona
> 
> 
> On 02/02/11 10:33 PM, Jon Meredith wrote:
>> Hi Abhishek,
>> 
>> It looks like distributed erlang isn't working between the nodes so the join fails.
>> 
>> You can test it by bringing the nodes up in console mode and executing
>> 
>> $ riak console
>> [some logging status messages]
>> Eshell V5.7.5  (abort with ^G)
>> (dev2@127.0.0.1)1> net_adm:ping('dev1@127.0.0.1').
>> pong
>> (dev2@127.0.0.1)3> q().
>> ok
>> 
>> Make sure you substitute your own node names and run the test in both directions. If the node is unreachable, it will return pang instead of pong.
>> 
>> --Jon
>> Senior Software Engineer
>> Basho Technologies
>> 
>> 
>> 
>> On Wed, Feb 2, 2011 at 9:04 AM, Abhishek Kona <abhishek.kona at gmail.com> wrote:
>> On 02/02/11 8:38 PM, Sean Cribbs wrote: 
>>> Abhishek, 
>>> 
>>> First, make sure all of your nodes are in the same security group. 
>> Yes, both machines are in the same security group (which opens only ports 8098, 8099, and 8087). 
>>> Second, check that your OS doesn't have an additional firewall installed (iptables, for example). 
>> I can telnet to the Riak ports from each of the machines, so the firewall does not seem to be the issue. 
>>> Third, you might consider doing what the Chef recipe for Riak does and limit the ports that Erlang uses for distributed communication.  Adding a section like the one below to app.config will limit the port range: 
>>> 
>>> {kernel, [ 
>>>    {inet_dist_listen_min, 6000}, 
>>>    {inet_dist_listen_max, 7999} 
>>> ]} 
>>> 
>>> You'll need to stop Riak, kill the "epmd" process, and then start Riak up again for this change to take effect.  Make sure those ports are also open in your security group and any software firewall you have. 
>>> 
>> I tried these changes as well, but I still get the same message. Is there anything else I can try? 
>> Thanks for the help. 
>>> Sean Cribbs <sean at basho.com> 
>>> Developer Advocate 
>>> Basho Technologies, Inc. 
>>> http://basho.com/ 
>>> 
>>> On Feb 2, 2011, at 8:47 AM, Abhishek Kona wrote: 
>>> 
>>>> Hi folks 
>>>> 
>>>> I am trying to set up a Riak cluster on EC2. 
>>>> Each time I issue the command: 
>>>> 
>>>> $ sudo riak-admin join riak@10.130.149.253 
>>>> 
>>>> it fails: 
>>>> 
>>>> Attempting to restart script through sudo -u riak 
>>>> 
>>>> Node riak@10.130.149.253 is not reachable! 
>>>> 
>>>> 
>>>> netstat on both machines shows that the ports are listening. 
>>>> 
>>>> netstat -na | egrep '(8087|8098|8099)' 
>>>> 
>>>> tcp        0      0 0.0.0.0:8098            0.0.0.0:*               LISTEN 
>>>> tcp        0      0 0.0.0.0:8099            0.0.0.0:*               LISTEN 
>>>> tcp        0      0 0.0.0.0:8087            0.0.0.0:*               LISTEN 
>>>> 
>>>> 
>>>> I can telnet to all the ports from each of the machines. 
>>>> I have been pulling my hair out for a long time, to no avail. 
>>>> Can anyone take a look and tell me what I am doing wrong? 
>>>> Are there any debug logs where I can see what is going wrong? 
>>>> Is there any EC2-specific trick (like using public hostnames)? 
>>>> 
>>>> I am attaching my app.cfg file for reference. 
>>>> 
>>>> Thanks 
>>>> -Abhishek Kona 
>>>> 
>>>> <app.cfg>
>>>> _______________________________________________ 
>>>> riak-users mailing list 
>>>> riak-users at lists.basho.com 
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com 
