Using snow (and snowfall) with AWS for parallel processing in R
Following up on my earlier, similar SO question, I tried using snow/snowfall on AWS for parallel computing.
What I did was:
- In the sfInit() function, I provided the public DNS names of the instances to the socketHosts parameter
- The error returned was
Permission denied (publickey)
- I then followed the instructions (I presume correctly!) on http://www.imbi.uni-freiburg.de/parallel/ in the 'Passwordless Secure Shell (SSH) login' section
- I simply appended (using cat) the contents of the .pem file that I created on AWS to ~/.ssh/authorized_keys, both on the AWS instance I want to connect to from my master AWS instance and on the master AWS instance itself
Is there anything I am missing? I would be very grateful if users could share their experiences using snow on AWS.
Thank you very much for your suggestions.
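The Permission denied (publickey) message comes from ssh itself, before any R worker is started, so the login can be tested from R without snowfall. One possible cause, stated as an assumption about this setup: the .pem file downloaded from AWS is a private key, whereas authorized_keys expects public keys. Below is a minimal sketch of such a test, assuming the OpenSSH client is on the master's PATH. The helper names ssh_check_args and can_ssh_without_password are hypothetical, and the host name is the placeholder used in this post.

```r
# Build the arguments for a non-interactive ssh probe (hypothetical helper).
# BatchMode=yes makes ssh fail instead of prompting for a password, which is
# roughly the situation when workers are launched over ssh unattended.
ssh_check_args <- function(host, timeout = 10) {
  c("-o", "BatchMode=yes",
    "-o", paste0("ConnectTimeout=", timeout),
    host, "true")
}

# TRUE if `ssh <host> true` exits with status 0, i.e. key-based login works.
can_ssh_without_password <- function(host) {
  status <- suppressWarnings(
    system2("ssh", ssh_check_args(host), stdout = FALSE, stderr = FALSE)
  )
  identical(as.integer(status), 0L)
}

# Placeholder host name; replace with the public DNS of a worker instance.
hosts <- c("ec2-xxx-xx-xxx-xxx.compute-1.amazonaws.com")

# Run on the master before calling sfInit():
# vapply(hosts, can_ssh_without_password, logical(1))
```

If the check returns FALSE for a host, sfInit() with type="SOCK" is unlikely to reach that host either, so the SSH setup is the place to look first.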
UPDATE: Here is the solution I found to my specific problem:
- I used StarCluster to set up my AWS cluster
- Installed the snowfall package on all the nodes of the cluster
- From the master node issued the following commands
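library(snowfall)
# Public DNS names of the cluster nodes
hostslist <- list("ec2-xxx-xx-xxx-xxx.compute-1.amazonaws.com", "ec2-xx-xx-xxx-xxx.compute-1.amazonaws.com")
sfInit(parallel = TRUE, cpus = 2, type = "SOCK", socketHosts = hostslist)
# Run ifconfig on the workers to see which machines answer
l <- sfLapply(1:2, function(x) system("ifconfig", intern = TRUE))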
- The IP information in the ifconfig output confirmed that the AWS nodes were being utilized
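As a variation on the ifconfig check, each worker can report its own host name and process ID, which is easier to read than raw interface output. This is only a sketch: parallel = FALSE is used here so it can be tried in a single local R session, and on the master node the sfInit() call from the update (type = "SOCK" with socketHosts) would take its place. The variable name worker_info is made up for illustration.

```r
library(snowfall)

# Sequential mode runs everything in the local R session, so no cluster is
# needed to try this out. On the master node, use the SOCK call with
# socketHosts instead.
sfInit(parallel = FALSE)

# One task per configured CPU; each task reports where it ran.
worker_info <- sfLapply(seq_len(sfCpus()), function(i) {
  c(host = Sys.info()[["nodename"]], pid = as.character(Sys.getpid()))
})
print(worker_info)

sfStop()
```

With a real cluster, distinct host names in the result would indicate that the tasks ran on different instances.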