Description
I followed the READMEs for cgcloud-core and cgcloud-toil to set up on my (firewalled) podcloud VM.
Because I already had a key registered (from my old VM, which crashed and took its id_rsa.pub with it), I used cgcloud register-key --force ~/.ssh/id_rsa.pub
cgcloud create-cluster --leader-instance-type m3.medium --instance-type c3.8xlarge --share shared/ --spot-bid 1.0 -s 1 toil
failed at the rsync step to copy from shared/, so I tried the same command without that option.
The cluster was created:
cgcloud list toil-leader
INFO: Using zone 'us-west-2a' and namespace '/jeltje.van.baren/'
i-abcb3770 jeltje.van.baren_toil-leader 0 172.31.31.92 52.40.118.17 i-abcb3770 2016-05-26T17:48:29.000Z running
However, cgcloud ssh toil-leader gets an ssh error (full error pasted below). I can't ping the machine either.
Ping and ssh to other machines work fine from the VM, so I'm assuming the authentication at EC2 is somehow messed up?
Full error:
INFO: Using zone 'us-west-2a' and namespace '/jeltje.van.baren/'
INFO: Binding to instance ...
INFO: ... waiting for instance i-abcb3770 ...
INFO: ... running, waiting for assignment of public IP ...
INFO: ... assigned, waiting for SSH port ...
INFO: ... open ...
INFO: ... instance ready.
Permission denied (publickey).
Traceback (most recent call last):
  File "/home/ubuntu/cgcloud/bin/cgcloud", line 9, in <module>
    load_entry_point('cgcloud-core==1.3.8', 'console_scripts', 'cgcloud')()
  File "/home/ubuntu/cgcloud/local/lib/python2.7/site-packages/cgcloud/core/cli.py", line 49, in main
    app.run( args )
  File "/home/ubuntu/cgcloud/local/lib/python2.7/site-packages/cgcloud/lib/util.py", line 300, in run
    command.run( options )
  File "/home/ubuntu/cgcloud/local/lib/python2.7/site-packages/cgcloud/core/commands.py", line 81, in run
    return self.run_in_ctx( options, ctx )
  File "/home/ubuntu/cgcloud/local/lib/python2.7/site-packages/cgcloud/core/commands.py", line 105, in run_in_ctx
    return self.run_on_role( options, ctx, role )
  File "/home/ubuntu/cgcloud/local/lib/python2.7/site-packages/cgcloud/core/commands.py", line 124, in run_on_role
    return self.run_on_box( options, box )
  File "/home/ubuntu/cgcloud/local/lib/python2.7/site-packages/cgcloud/core/commands.py", line 164, in run_on_box
    self.run_on_instance( options, box )
  File "/home/ubuntu/cgcloud/local/lib/python2.7/site-packages/cgcloud/core/commands.py", line 232, in run_on_instance
    self.ssh( options, box )
  File "/home/ubuntu/cgcloud/local/lib/python2.7/site-packages/cgcloud/core/commands.py", line 219, in ssh
    status = box.ssh( user=self._user( box, options ), command=options.command )
  File "/home/ubuntu/cgcloud/local/lib/python2.7/site-packages/cgcloud/core/box.py", line 1050, in ssh
    raise RuntimeError( 'ssh failed' )
RuntimeError: ssh failed
Activity
hannes-ucsc commented on May 26, 2016
Delete your instances. Delete your key pair in the EC2 console and try register-key again, but without --force.
Jeltje commented on May 26, 2016
I tried it. Same error:
hannes-ucsc commented on May 26, 2016
You didn't delete the key pair because I can still see the old one.
hannes-ucsc commented on May 26, 2016
You may also want to start from scratch with a new SSH key pair locally. Maybe the private key doesn't match the public key.
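One way to check the suggestion above (that the private key may not match the public key) is to derive the public key from the private key and compare it with the .pub file. The following is only a sketch: the function names are made up for illustration, and it assumes OpenSSH's ssh-keygen is on the PATH.

```python
import subprocess


def key_fields(public_key_line):
    """Return (key type, base64 blob) from an OpenSSH public key line.

    The trailing comment (often an email address) is ignored, since it
    does not affect whether two keys are the same.
    """
    parts = public_key_line.split()
    if len(parts) < 2:
        raise ValueError("not an OpenSSH public key line: %r" % public_key_line)
    return parts[0], parts[1]


def same_public_key(line_a, line_b):
    """True if both lines carry the same key type and key material."""
    return key_fields(line_a) == key_fields(line_b)


def derive_public_key(private_key_path):
    """Ask ssh-keygen to print the public key belonging to a private key.

    ssh-keygen will prompt for the passphrase if the key is protected.
    """
    output = subprocess.check_output(
        ["ssh-keygen", "-y", "-f", private_key_path], text=True
    )
    return output.strip()
```

With a key pair at the default location, same_public_key(derive_public_key(path_to_id_rsa), contents_of_id_rsa_pub) returning False would mean the two files do not belong together.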
Jeltje commented on May 27, 2016
I tried a few new key pairs, with and without password protection. I verified that the key pair fingerprint changed on EC2 after running register-key. Below is the error I get from trying to create a cluster using --shared
Jeltje commented on May 27, 2016
When I start the cluster without --shared, I can ssh ubuntu@52.40.39.137 just fine. But ssh mesosbox@52.40.39.137 gets Permission denied (publickey).
ssh -vvv mesosbox@52.40.39.137
full log output here
hannes-ucsc commented on May 27, 2016
What's CGCLOUD_KEYPAIRS set to?
Jeltje commented on May 27, 2016
On the toil-leader, cat /home/ubuntu/.ssh/authorized_keys shows two different ssh-rsa keys, both ending with my email. The second key matches my id_rsa.pub. /home/mesosbox/.ssh/authorized_keys shows only the first key, which explains why it won't let me log on.
Jeltje commented on May 27, 2016
CGCLOUD_KEYPAIRS on the master? Or on my VM? echo $CGCLOUD_KEYPAIRS gives nothing on either.
hannes-ucsc commented on May 27, 2016
Then you don't have it set.
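The authorized_keys comparison reported earlier in this thread (two keys for ubuntu, only one for mesosbox) can also be done mechanically. This is a rough sketch with hypothetical function names; it assumes plain "type blob comment" lines with no leading options field.

```python
def parse_authorized_keys(text):
    """Map (key type, base64 blob) to the comment for each key line.

    Blank lines and '#' comment lines are skipped. Lines that start with
    an options field are not handled by this sketch.
    """
    keys = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        keys[(parts[0], parts[1])] = " ".join(parts[2:])
    return keys


def missing_keys(reference_text, other_text):
    """List keys present in reference_text but absent from other_text."""
    reference = parse_authorized_keys(reference_text)
    other = parse_authorized_keys(other_text)
    return [
        (key_type, blob, comment)
        for (key_type, blob), comment in reference.items()
        if (key_type, blob) not in other
    ]
```

Passing the contents of /home/ubuntu/.ssh/authorized_keys as the reference and /home/mesosbox/.ssh/authorized_keys as the other file would list the keys that were never propagated to mesosbox.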
hannes-ucsc commented on May 27, 2016
Upon investigation on the actual box, it turns out that dots in the namespace prevented cgcloudagent from creating the SQS queue. We should tweak the __me__ derivation to strip dots. We should also tighten the regex that validates namespaces to disallow dots. The workaround for now is to set CGCLOUD_NAMESPACE=/foo/
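A sketch of the two fixes proposed above, stripping dots from the derived user name and rejecting dots in namespaces. This is not cgcloud's actual code: the regex and both function names are assumptions, chosen so that each path component only uses characters SQS accepts in queue names (letters, digits, hyphens, underscores).

```python
import re

# Hypothetical pattern: a leading slash followed by zero or more
# components, each made of letters, digits, '-' or '_' and ending in '/'.
NAMESPACE_RE = re.compile(r"/(?:[A-Za-z0-9_-]+/)*")


def is_valid_namespace(namespace):
    """True if the whole namespace matches the dot-free pattern."""
    return NAMESPACE_RE.fullmatch(namespace) is not None


def strip_dots(user_name):
    """Remove dots from a user name before it is used in a namespace."""
    return user_name.replace(".", "")


print(is_valid_namespace("/jeltje.van.baren/"))  # False, contains dots
print(is_valid_namespace("/foo/"))               # True
print("/" + strip_dots("jeltje.van.baren") + "/")
```

fullmatch is used rather than match with a trailing $ so that a namespace with a stray trailing newline is rejected as well.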
Jeltje commented on May 27, 2016
Changing the namespace hasn't fixed the problem.
export CGCLOUD_NAMESPACE=/jeltje/
cgcloud create -IT toil-box
cgcloud create-cluster --leader-instance-type m3.medium --instance-type c3.8xlarge --spot-bid 1.0 -s 1 toil
cgcloud list toil-leader
But I can't ssh to it. Yesterday I was at least able to ssh ubuntu@52.34.135.67 (but not ssh mesosbox@52.34.135.67) but that no longer works either. So I can't see what's going on with the ssh keys.
hannes-ucsc commented on May 27, 2016
Most recent failure was the result of misconfiguration on user's end (multiple SSH agent instances).
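For anyone hitting the same symptom, a quick way to spot multiple SSH agent instances is to count ssh-agent processes. The sketch below uses hypothetical function names and assumes a POSIX ps that accepts -A and -o pid,comm.

```python
import os
import subprocess


def ssh_agent_pids(ps_output):
    """Extract the PIDs of ssh-agent processes from 'ps -A -o pid,comm' output."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        # Skip the header row and anything that is not "<pid> <command>".
        if len(fields) != 2 or not fields[0].isdigit():
            continue
        # Some systems print the full path of the executable, so compare
        # only the last path component.
        if os.path.basename(fields[1].strip()) == "ssh-agent":
            pids.append(int(fields[0]))
    return pids


def running_ssh_agents():
    """Run ps and return the PIDs of all ssh-agent processes it reports."""
    output = subprocess.check_output(["ps", "-A", "-o", "pid,comm"], text=True)
    return ssh_agent_pids(output)
```

If running_ssh_agents() returns more than one PID, it is worth checking which agent SSH_AUTH_SOCK points at and whether that agent actually holds the key that was registered.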