
IRC log for #fuel, 2013-11-11


All times shown according to UTC.

Time Nick Message
00:01 e0ne joined #fuel
00:33 boldhawk joined #fuel
00:33 boldhawk hey all - was hoping to get past a problem that is stopping me moving forward with Fuel...
00:34 boldhawk How can I remove nodes that show up in the GUI as "pending deletion"? In the Fuel CLI they show as discovered.
01:20 TSCHAKMac joined #fuel
01:24 xarses boldhawk: you need to "deploy cluster" to remove them from cluster
01:25 boldhawk xarses - that's the problem they aren't part of a cluster.
01:25 boldhawk i selected add node, and all the nodes show as pending deletion?
02:09 Bomfunk joined #fuel
02:13 xarses boldhawk: were they in a cluster previously?
02:19 boldhawk xarses: they were, but one of the nodes failed to install due to a conflicting puppet cert. I deleted the environment thinking it would delete the nodes, but it left them in "pending deletion"
02:20 xarses so the cluster was deleted entirely?
02:20 boldhawk correct. "fuel environment --list" showed no clusters
02:20 boldhawk and in the fuel CLI - the nodes showed as discovered, not pending deletion
02:22 boldhawk is there a better way to recover from a failed installation?
02:23 boldhawk I've had a number of failures which I think was a conflicting puppet cert, but other times I get ubuntu installation stuck on like 5% and it never moves.
02:23 xarses you should just be able to continue/restart by running the "deploy cluster" again
02:23 xarses puppet cert issue is odd
02:23 xarses it almost shouldn't happen
02:24 boldhawk I had a node-16; when I ssh'd in from fuel and looked, it gave the puppetca error and asked me to run puppetca clean node-16
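[A stale certificate like this is usually cleared on the Puppet master (the Fuel node) and on the agent. A hedged sketch, assuming the Puppet 2.x-era puppetca command mentioned above and the default agent SSL directory; the node name is illustrative:]

```shell
# On the Fuel master: revoke and remove the old certificate for the node
# ("puppet cert clean node-16" is the equivalent on Puppet 3.x)
puppetca --clean node-16

# On the node itself: remove the stale agent-side SSL state
# so a fresh certificate is requested on the next run
rm -rf /var/lib/puppet/ssl

# Re-run the agent to request and use the new certificate
puppet agent --test
```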
02:28 boldhawk Can I ask what is the syntax for "deploy" to continue? "fuel deploy --env 4 /restart" gives an http 400 error
02:28 xarses no clue from the client
02:29 xarses should just be deploy --env 4
02:29 boldhawk [root@fuel init.d]# fuel deploy --env 4 HTTP Error 400: Bad Request
02:31 xarses ya, you can always put chrome into debug mode and click on the deploy button in the ui to see if it does anything special
02:31 xarses the ui should just restart the deployment from the last successful step
02:32 xarses the ui should be sending what ever command to the same api interface that the cli is sending
02:53 boldhawk thanks - I'll look into it a little more.
02:58 boldhawk just a quick question with regards to power control - is there any way to tell fuel to use IPMI instead of ssh? I'm finding I have a stuck node every once in a while, and when in a remote DC it's a bit of a pain if I can't restart it.
03:09 TSCHAKMac_ joined #fuel
03:32 ArminderS joined #fuel
03:39 boldhawk Has anyone experienced an issue with fuel when installing a Ceph cluster? If it fails for some reason, the next re-installation attempt also fails, because the node fails to boot correctly. We had a similar issue when using Ceph with puppet: we had to wipe the GPT partition table using parted and set the label to msdos so that the ubuntu installer could continue.
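[The GPT-wipe workaround described above can be scripted. A sketch, assuming the affected Ceph disk is /dev/sdb (this destroys all data on the disk):]

```shell
# Replace the GPT label with an msdos (MBR) label so the installer can repartition
parted -s /dev/sdb mklabel msdos

# GPT keeps a backup header at the end of the disk that some installers
# still detect; sgdisk (from the gdisk package) can zero both copies:
sgdisk --zap-all /dev/sdb
```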
05:10 xarses boldhawk: currently only the ssh control plane is supported
05:11 boldhawk ok - we'll make provision to setup out of band management
05:12 xarses boldhawk: ceph will fail to install correctly in HA mode using neutron vlan or gre. It's a known issue and requires https://review.openstack.org/#/c/52850/
05:15 xarses boldhawk: hmm, re-read that. It sounds like a separate issue during upstart
05:16 boldhawk that's what I think as well.
05:16 boldhawk However - on a related note - how can we patch our fuel deployment rather than wait for a re-released ISO?
05:18 xarses there are some patches into the 3.2.1 branches that update the disk partitioning for ceph
05:19 xarses manifests from github.com/Mirantis/fuel are easy enough to hot patch in if they don't require changes in other components
05:19 xarses manifests (as in puppet manifests)
05:20 xarses in the case of the aforementioned ceph patches, you'd need updates to nailgun and fuel (library)
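[Hot-patching puppet manifests on the Fuel master can look roughly like this. This is a sketch, not a supported procedure: the branch name and the /etc/puppet/modules path are assumptions based on 3.2-era layouts and may differ on a given install:]

```shell
# Fetch the patched manifests (branch name is illustrative)
git clone https://github.com/Mirantis/fuel.git
cd fuel && git checkout stable/3.2.1

# Back up the modules the master currently serves, then overlay the patched ones
cp -a /etc/puppet/modules /etc/puppet/modules.bak
cp -a deployment/puppet/. /etc/puppet/modules/
```

This only works when the change is self-contained in the manifests; as noted above, patches that also touch nailgun or astute need those components updated too.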
05:20 ArminderS ceph in HA mode with neutron gre worked fine for me under centos
05:20 ArminderS is that ubuntu only related issue?
05:21 ArminderS radosgw though needed the patch
05:21 xarses ArminderS: you might only have one ceph-monitor then
05:21 ArminderS nope, i got 3
05:21 ArminderS 1 each on all 3 controllers
05:21 xarses oh, you fixed the public_network and storage_network manually right?
05:21 ArminderS ah right
05:21 ArminderS i had to do that
05:22 ArminderS well very easy thing to do
05:22 ArminderS so i didn't consider it a patch
05:22 boldhawk how does the issue manifest ?
05:22 ArminderS so xarses, you are right in a way that it fails
05:22 ArminderS unless you do that thing
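[The manual fix discussed above amounts to pointing ceph at the right networks before deployment finishes. Whether this was done via nailgun attributes or directly in ceph.conf isn't stated; a hedged sketch of the ceph.conf form, with placeholder CIDRs (the Fuel "public" side carries client/monitor traffic, the "storage" network carries OSD replication):]

```shell
# Append the network settings to the [global] section on each ceph node
# (CIDRs are examples - substitute your actual public and storage subnets)
cat >> /etc/ceph/ceph.conf <<'EOF'
public network = 192.168.0.0/24
cluster network = 192.168.1.0/24
EOF
```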
05:31 xarses boldhawk: it's possible to hot patch each of the components - nailgun (fuel-web), astute, fuel (fuel-library), and ostf - but many of the changes are interrelated. Your best bet sometimes is to just build an iso from fuel-main
05:33 ArminderS xarses: on that note, do you think fuel-4.0 is at a stage where you can build the iso & put up a usable cloud, or does it still need lots of bug-fixing before its preview release a week away?
05:35 boldhawk xarses - pretty new to Fuel so apologies - but do you have any details on how to build an iso?
05:35 ArminderS http://docs.mirantis.com/fuel-dev/develop/env.html#building-the-fuel-iso
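[Per the linked docs, the build is driven by make from the fuel-main repo. A minimal sketch; it assumes the build dependencies from that page are already installed, and the build takes considerable time and disk space:]

```shell
# Clone the build repo (hosted on stackforge at the time of this log)
git clone https://github.com/stackforge/fuel-main.git
cd fuel-main

# Build the ISO; the path to the resulting image is reported at the end
make iso
```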
05:36 boldhawk ArminderS: Thank you!
05:41 xarses ArminderS: the 4.0 code was not working well last I checked, you can try if you want a havana cloud, but expect some issues
05:42 ArminderS right then, i'll probably wait, thanks
05:49 boldhawk joined #fuel
05:51 boldhawk xarses: my attempt to deploy a cluster has failed again with all nodes showing 'error'. Not sure of the best way to debug, but if I run something like 'puppet agent --test' it fails with an SSL verify error:
05:51 boldhawk err: /File[/var/lib/puppet/lib]: Failed to generate additional resources using 'eval_generate: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed.  This is often because the time is out of sync on the server or client
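[Since that error often points at clock skew, a quick first check is to compare clocks and force a one-shot NTP sync against the Fuel master. A sketch; 10.20.0.2 is an assumed admin-network address for the master, substitute your own:]

```shell
# Compare wall-clock time on this node and on the master
date -u
ssh 10.20.0.2 date -u

# Force a one-shot sync against the master, then retry the agent run
ntpdate -u 10.20.0.2
puppet agent --test
```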
05:57 boldhawk Also not sure what it means - but looking through the syslog on fuel for one of the failed nodes I see a couple of entries: 01:37:54.376778 #2194]  WARN -- : netio.rb:19:in `_receive' PLMC7: Exiting after signal: SignalException: SIGTERM
05:59 xarses boldhawk: hmm, have all deployments had this issue?
06:00 xarses boldhawk: also did you check the time?
06:00 xarses boldhawk: also check 'mco ping'
06:01 boldhawk xarses: all deployments get to the ubuntu installed stage, and then 'error' on all nodes.
06:02 boldhawk xarses: mco ping returns; time shows <90ms difference
06:03 xarses found all nodes ok?
06:03 xarses is the time between fuel and one of the failed node close?
06:04 boldhawk xarses: yes - within <1sec
06:17 xarses boldhawk: see http://docs.mirantis.com/fuel/fuel-3.2/frequently-asked-questions.html and search for ssl
06:17 xarses see if that helps any
06:18 boldhawk_ joined #fuel
06:42 dburmistrov joined #fuel
08:04 teran joined #fuel
08:10 mattymo joined #fuel
08:21 e0ne joined #fuel
08:49 aglarendil_ joined #fuel
08:56 mrasskazov joined #fuel
09:07 ogzy joined #fuel
09:31 ogzy is there any possibility to use the fuel web interface with chef?
09:31 akupko joined #fuel
09:33 teran1 joined #fuel
09:53 teran joined #fuel
09:55 e0ne joined #fuel
09:57 e0ne_ joined #fuel
10:10 tonyha joined #fuel
10:40 tonyha joined #fuel
11:56 MiroslavAnashkin joined #fuel
12:02 e0ne joined #fuel
12:06 aglarendil_ joined #fuel
12:17 e0ne joined #fuel
12:18 MiroslavAnashkin joined #fuel
12:28 mihgen joined #fuel
12:29 aglarendil_ joined #fuel
13:12 MiroslavAnashkin joined #fuel
14:03 MiroslavAnashkin joined #fuel
14:09 aglarendil_ joined #fuel
14:22 MiroslavAnashkin joined #fuel
14:25 mihgen joined #fuel
14:49 akupko joined #fuel
15:28 ArminderS joined #fuel
15:36 e0ne joined #fuel
15:38 MiroslavAnashkin joined #fuel
16:31 MiroslavAnashkin joined #fuel
16:46 akupko joined #fuel
16:59 ArminderS joined #fuel
17:16 e0ne_ joined #fuel
17:18 e0ne_ joined #fuel
17:26 MiroslavAnashkin joined #fuel
17:34 teran joined #fuel
18:27 xarses joined #fuel
18:43 xarses joined #fuel
18:54 albionandrew joined #fuel
18:55 albionandrew Hi xarses - http://pastebin.com/ey0zH6Qc failing health checks again
18:56 albionandrew xarses - I also see ERROR nova.openstack.common.rpc.common [-]  Failed to consume message from queue: Socket closed + similar messages on the compute node. How would I go about testing the rabbitmq?
20:03 dhblaz joined #fuel
20:20 angdraug joined #fuel
20:32 rmoe joined #fuel
20:35 xarses albionandrew: It sounds like you are still having (intermittent) network issues between the nodes. I'm not familiar with how to test the queues other than creating multiple tasks and watching the messages get added to / removed from the queues
21:40 dhblaz joined #fuel
21:54 dhblaz Do you have any suggestions for how to test for such failures (other than running ping forever)?
21:56 dhblaz Here is the WARN+ from the nova log on the compute node
21:56 dhblaz http://pastebin.com/kvugZihk
22:12 dhblaz I'm struggling quite a bit with the quantum l3 config as fuel sets it up
22:12 dhblaz It doesn't look like the init.d scripts should be used
22:12 dhblaz but instead perhaps pacemaker should manage it.
22:12 dhblaz Is this correct?
22:13 dhblaz Also can someone propose troubleshooting steps when Horizon reports that a router Port is Status DOWN/Admin State UP
22:13 dhblaz I'm specifically worried about the network:dhcp "Device Attached" ports
22:27 dhblaz after rebooting controller-2
22:27 dhblaz crm status shows it as offline
22:27 dhblaz I see this in the /var/log/daemon.log
22:27 dhblaz <29>Nov 11 22:26:11 node-92 crmd[2070]:   notice: crm_timer_popped: We appear to be in an election loop, something may be wrong
22:30 dhblaz Doing this fixed the crm status:
22:30 dhblaz [root@node-92 ~]# sudo /etc/init.d/corosync restart
22:30 dhblaz Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
22:30 dhblaz Waiting for corosync services to unload:................   [  OK  ]
22:30 dhblaz Starting Corosync Cluster Engine (corosync):               [  OK  ]
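[After a restart like this, recovery can be confirmed from any controller. A sketch:]

```shell
# One-shot cluster status: all controllers should now be listed as Online
crm_mon -1

# If a node is still offline, check whether the election loop messages continue
grep -i election /var/log/daemon.log | tail
```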
22:31 dhblaz But horizon still reports "Something went wrong!"
23:04 rmoe joined #fuel
