IRC log for #fuel, 2014-03-13

All times shown according to UTC.

Time Nick Message
00:11 xarses justif: http://paste.openstack.org/show/73324/
00:22 dhblaz When troubleshooting the problem I was having with floating IPs it would be prudent to check that the floating IP is listed in this output:
00:22 dhblaz ip netns exec `ip -o netns list | fgrep qrouter` ip -4 addr list
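A minimal sketch of the check above as two small shell helpers. The backtick form only works when exactly one `qrouter` namespace matches, so this version loops over every match. The function names `qrouter_namespaces` and `address_listed` are hypothetical (they are not part of Fuel or Neutron), and `FLOATING_IP` is a placeholder.

```shell
#!/bin/sh
# Print the name of every qrouter namespace found in `ip netns list` style
# input on stdin. Only the first field is kept, so a trailing "(id: N)"
# printed by newer iproute2 versions is dropped.
qrouter_namespaces() {
    awk '$1 ~ /^qrouter-/ { print $1 }'
}

# Succeed if the IPv4 address $1 appears in `ip -4 -o addr list` style
# input on stdin. Matching "inet ADDR/" keeps 10.0.0.1 from matching 10.0.0.10.
address_listed() {
    grep -F -q "inet $1/"
}

# Usage on the l3-agent node (needs root):
#   for ns in $(ip netns list | qrouter_namespaces); do
#       ip netns exec "$ns" ip -4 -o addr list | address_listed "$FLOATING_IP" \
#           && echo "$FLOATING_IP is configured in $ns"
#   done
```

If the loop prints nothing, the floating IP is not configured in any router namespace on that node.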
00:40 isAAAc joined #fuel
00:44 justif xarses testing now
00:45 xarses justif: about to head home, will be back online ~30 min
00:45 justif ok
01:13 IlyaE joined #fuel
01:28 designated I've got a frozen deployment that is stuck and won't delete.  is there a way to clear this out from cli or do i need to rebuild fuel?
01:37 ToTsiroll joined #fuel
01:39 justif I have had deployments that took forever to delete but they eventually deleted
01:39 justif did you try to cancel it first?
01:39 justif or stop it
01:44 xarses joined #fuel
01:55 ToTsiroll hi xarses..it did not work..i think that bug was committed to 4.1
01:55 ToTsiroll i've posted my bug here https://bugs.launchpad.net/fuel/+bug/1291140
01:56 designated it eventually cleared out just took a while
02:01 ToTsiroll what cleared out?
02:11 justif his stalled deployment
02:15 dhblaz Finally found where neutron sets up floating ips
02:15 dhblaz ip netns exec `ip -o netns list | fgrep qrouter `  iptables -t nat -S
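The NAT dump above can be long, so a small filter helps to see only the rules for one floating IP. This is a sketch under the assumption that the rules appear in the usual `iptables -S` form, with the address as its own field and an optional `/prefix` suffix. The function name `rules_for_address` is hypothetical.

```shell
#!/bin/sh
# Print every line of `iptables -t nat -S` style input on stdin in which the
# address $1 appears as a whole field, with or without a /prefix-length.
rules_for_address() {
    awk -v addr="$1" '{
        for (i = 1; i <= NF; i++) {
            field = $i
            sub(/\/[0-9]+$/, "", field)   # drop a trailing /prefix-length
            if (field == addr) { print; next }
        }
    }'
}

# Usage on the l3-agent node (needs root); NS and FLOATING_IP are placeholders:
#   ip netns exec "$NS" iptables -t nat -S | rules_for_address "$FLOATING_IP"
```

Comparing whole fields rather than substrings avoids false matches such as 74.63.153.15 against 74.63.153.153.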
02:21 designated xarses: got all of the packages added to the repo, don't know why 3 directories didn't get copied when i installed fuel but got it sorted out.  Trying a fresh deployment now.
02:24 ToTsiroll any ideas on the bug i posted
02:24 ToTsiroll cant seem to fix my deployment
02:25 designated now after hitting deploy, the nodes are repeating the same message over and over: Could not find kernel image: /images/ubuntu_1204_x86_64/linux
02:30 dhblaz joined #fuel
02:50 richardkiene joined #fuel
02:55 designated xarses: http://paste.openstack.org/show/73316/ shows multiple directories being copied but there is no /ubuntu/conf or /ubuntu/dists in the iso
02:56 designated i meant /ubuntu/db instead of /ubuntu/dists
02:56 designated i have /ubuntu/dists,indices,installer-amd64,pool
03:51 vkozhukalov joined #fuel
03:53 dhblaz joined #fuel
04:27 dburmistrov joined #fuel
05:05 Danny joined #fuel
05:08 Bomfunk joined #fuel
05:32 Ch00k joined #fuel
05:33 Ch00k joined #fuel
05:58 Ch00k joined #fuel
06:08 Ch00k joined #fuel
06:42 dburmistrov joined #fuel
06:42 VonDuke Do you know how you can connect to the Controller via SSH? I tried using the public key and it is denied
07:47 e0ne joined #fuel
08:02 IlyaE joined #fuel
08:12 saju_m joined #fuel
08:32 baboune joined #fuel
08:32 baboune hello
08:33 baboune new problem with Fuel 4.1 and CentOS installation: when the machine reboots and receives instruction from PXE, CentOS starts to install, and we see: "The following error was found while parsing the kickstart configuration file: The following error occurred on line 2: Specified non-existent partition 3 on partition command."
08:33 baboune this is a new error that did not exist before 4.1
08:33 baboune with 4.0, we could do the centOS install
08:34 baboune any ideas where to look?
08:45 getup- joined #fuel
08:52 topochan joined #fuel
09:16 dburmistrov_ joined #fuel
09:40 rvyalov joined #fuel
09:41 dburmistrov_ joined #fuel
09:50 tatyana joined #fuel
09:58 baboune I have a question about the nailgun agent and the disk detection. We think the problem is here https://github.com/stackforge/fuel-web/blob/master/bin/agent?source=c
09:58 baboune but it was somehow working when we got a fuel_debug patch to collect statistics
09:58 baboune so there is a delta there that seems to fix the problem.
09:59 dburmistrov joined #fuel
10:05 fweyns joined #fuel
10:06 getup- joined #fuel
10:31 anotchenko joined #fuel
11:10 fweyns left #fuel
11:14 vk joined #fuel
11:24 getup- joined #fuel
11:32 TVR___ joined #fuel
11:33 TVR___ MiroslavAnashkin let me know when you are on again
11:35 e0ne joined #fuel
11:41 Ch00k joined #fuel
11:43 obcecado joined #fuel
11:43 obcecado hi guys
11:47 anotchenko joined #fuel
11:56 TVR___ morning.. hope all is well
12:05 e0ne joined #fuel
12:19 justif joined #fuel
12:26 anotchenko joined #fuel
12:27 topochan joined #fuel
12:28 TVR___ joined #fuel
12:31 e0ne joined #fuel
12:58 Ch00k joined #fuel
13:01 Ch00k joined #fuel
13:20 e0ne_ joined #fuel
13:22 anotchenko joined #fuel
13:23 Ch00k joined #fuel
14:01 justif xarses patch worked for the install but it seems the bnx firmware is missing and I get no networking - http://i.imgur.com/Ps368Mi.png
14:02 richardkiene joined #fuel
14:05 justif nvm about the nic firmware, seems to be an issue with the 3.10 kernel option I picked - http://unix.stackexchange.com/questions/102102/bnx2-cant-load-firmware-file-bnx2-bnx2-mips-09-6-2-1b-fw
14:07 anotchenko joined #fuel
14:07 e0ne joined #fuel
14:07 richardkiene_ joined #fuel
14:18 designate joined #fuel
14:19 designate xarses: http://paste.openstack.org/show/73316/ shows multiple directories being copied but there is no ubuntu/conf or ubuntu/db directories in the iso.
14:25 jobewan joined #fuel
14:42 MiroslavAnashkin TVR___: I am on. You may simply ask something even when I am not here. There are several people checking this chat from time to time.
14:43 TVR___ I have another snapshot if you want to see if my cluster failed in the same way....https://drive.google.com/file/d/0B5xmhlRed6NqSk5wbUFhVnVLcWM/edit?usp=sharing
14:43 TVR___ I figured you have already gone down a thought path.. so this may be more info for you.
14:44 TVR___ like before, I added 2 more compute + ceph nodes to an existing cluster, and that has failed...
14:44 TVR___ this time, however, the instances stayed up and continued their writing and deleting without issues
14:45 TVR___ the dashboard also works this time, as last time it went into a bad state and became unusable
14:46 MiroslavAnashkin TVR___: Yes, it looks like Fuel has to run `crm resource cleanup p_mysql` after Pacemaker gets new configuration for MySQL on the first node.
14:52 Bomfunk_ joined #fuel
14:52 Bomfunk_ joined #fuel
14:52 designate if anyone is available to answer my question it would be greatly appreciated.  I don't know when xarses usually gets on.
14:55 TVR___ so it seems the ability to add compute + ceph nodes to the cluster has gone from working in 4.0 (I was able to do it reliably) to no longer working.
14:55 TVR___ so do I need to file a bug, or will you push this upstream?
14:55 TVR___ I can also test any manually added patch you want me to test...if you want
14:59 IlyaE joined #fuel
15:04 MiroslavAnashkin TVR___: I'll file a non-public bug. We file all the bugs with snapshots as private. Just in case.
15:07 TVR___ OK, cool... thanks... do you want me to edit some recipes or do you have suggested edits I should try to fix this as I have the hardware available to do testing ...
15:35 Ch00k joined #fuel
15:41 vkozhukalov joined #fuel
15:49 dhblaz joined #fuel
16:02 TVR___ I am still jonesing to see the HA fail-over work.... it's just cool when fail-over works...
16:15 angdraug joined #fuel
16:17 anotchenko joined #fuel
16:24 bogdando joined #fuel
16:26 vkozhukalov joined #fuel
16:33 xarses joined #fuel
16:33 designate xarses: http://paste.openstack.org/show/73316/ shows multiple directories being copied but there is no ubuntu/conf or ubuntu/db directories in the iso.
16:33 designate good morning btw
16:39 TVR___ wonder how long it will be before the directory is called x86_64 rather than amd_64 with installs.... amd was the first to adopt the 64 bit OS, but man, that was so long ago...
16:40 TVR___ s/amd_64/amd64
16:40 dburmistrov joined #fuel
16:47 anotchenko joined #fuel
16:52 brain461 joined #fuel
17:05 stasisyn joined #fuel
17:10 dhblaz xarses: you around?
17:11 dhblaz I figured out my floating-ip problem
17:11 dhblaz But not quite sure what to do about it.  I'm also pretty sure I'm not the only one with this problem
17:11 dhblaz The issue is with my switch
17:12 dhblaz If I run this command on the l3-agent node the problem goes away
17:12 dhblaz ip netns exec `ip -o netns list | fgrep qrouter ` arping -c 200 -I qg-58817b59-c6 -A 74.63.153.153
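As a stopgap, the gratuitous ARP announcement above could be repeated on a timer so the switch keeps relearning the MAC. This is only a workaround sketch, not something the l3-agent does itself. The function names `garp_once` and `garp_loop` and the `GARP_RUNNER` dry-run variable are hypothetical, and the interval is an assumption that should be shorter than the switch's MAC aging time.

```shell
#!/bin/sh
# Send one gratuitous ARP for address $3 on interface $2 inside namespace $1.
# -A, -c and -I are the same arping flags used in the command above.
# Set GARP_RUNNER=echo to print the command instead of running it.
garp_once() {
    ${GARP_RUNNER:-} ip netns exec "$1" arping -c 1 -I "$2" -A "$3"
}

# Re-announce the address every $4 seconds until interrupted (needs root).
garp_loop() {
    while :; do
        garp_once "$1" "$2" "$3"
        sleep "$4"
    done
}

# Usage on the l3-agent node; NS is a placeholder for the qrouter namespace
# and 60 seconds is an assumed interval:
#   garp_loop "$NS" qg-58817b59-c6 74.63.153.153 60
```

Running with `GARP_RUNNER=echo` first is a cheap way to confirm the namespace and interface names before sending anything onto the network.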
17:14 richardkiene joined #fuel
17:17 anotchenko joined #fuel
17:19 TVR___ so before running that command.. instances can't use their floating IP?
17:21 xarses designate: I've checked a couple of iso's and none have ubuntu/{conf,db} so that should be OK, the snippet i gave was what is in the code to copy them in the first place
17:47 justif2 joined #fuel
17:47 MiroslavAnashkin Just FYI. We already got 4 different change requests aimed at fixing issues with Fuel 4.1 OS boot on HP hardware.
17:48 xarses MiroslavAnashkin: http://paste.openstack.org/show/73324/
17:58 xarses MiroslavAnashkin: i updated https://bugs.launchpad.net/fuel/+bug/1291692 with the patch from that paste, it combines the two pmanager changes
18:01 justif2 xarses not sure if you saw but the patch worked
18:01 xarses justif2: yep, thanks
18:10 MiroslavAnashkin xarses: Yes, I just suspected one of these 4 patches is not needed.
18:18 dhblaz joined #fuel
18:20 xarses left #fuel
18:21 xarses joined #fuel
18:23 anotchenko joined #fuel
18:27 justif2 I just tried a Ubuntu + Ceph with 3 ceph/controller nodes and 2 compute nodes with neutron gre networking and it dies waiting for "The disk drive for /var/lib/glance is not ready yet or not present"
18:27 justif2 any ideas?
18:28 xarses justif2: is /var/lib/glance mounted?
18:29 justif2 I can't tell, it is still booting and I can't even use the option to Skip or manually mount; it seems hung
18:29 xarses justif2: ok, so this is during boot?
18:30 dudnik joined #fuel
18:30 justif2 yea right after the fsck
18:36 xarses I'm guessing these are the same nodes that you had centos on before?
18:36 justif2 correct
18:37 xarses yep, the erase node task probably didn't clear out the partitions since it has the same regression in it, so preseed probably didn't create all the partitions correctly
18:37 IlyaE joined #fuel
18:38 xarses let me get you a patch for that, then we should be able to delete the cluster and it should be able to create the partitions correctly
18:38 justif2 ok
18:41 Ch00k joined #fuel
18:42 xarses dhblaz: some useful reference material http://www.slideshare.net/mirantis/hk-openstack-namespaces1 (does not address your issue)
18:43 dhblaz xarses: thanks, I read it last night
18:43 dhblaz I really need the l3-agent to make gratuitous arps at an interval shorter than the time it takes my switch to unlearn the macs
18:44 xarses dhblaz: what switch vendor?
18:44 dhblaz When I looked in the /var/log/debug logs I only see calls to arping (l3-agent's method of making such arps) at startup
18:44 dhblaz HP
18:44 dhblaz But not their procurve line
18:45 xarses using spanning tree?
18:45 dhblaz It doesn't use spanning tree
18:47 dhblaz This product:
18:47 dhblaz http://h18004.www1.hp.com/products/quickspecs/13127_div/13127_div.pdf
18:47 xarses random thought, is it possible that the floating address is duplicated somewhere else on the network?
18:47 alexz joined #fuel
18:47 dhblaz We have 4 of them across two chassis
18:48 dhblaz No the firewall logs when an IP moves from one mac to another and there is no such movement recorded
18:48 dhblaz also I checked the ARP table on the firewall (default route on the lan segment) and it had the right mac
18:48 dhblaz node-18 is running the l3 agent
18:48 dhblaz and when I have the problem I see the packets on node-17 but not node-18
18:49 dhblaz I checked on the switch mac address table and it showed that it was only learned on the cross connects and the port associated with node-17
18:49 dhblaz But really this problem has to happen on other equipment too
18:51 xarses odd, is it possible to see what the switch has for an interface forwarding table? needing to advertise the mac implies that it's moving around forwarding ports.
18:51 dhblaz Well that is just it, it doesn't appear that the l3-agent is doing proxy arps
18:52 dhblaz which switch are you asking about the ovs or HP hardware?
18:52 xarses the hp hardware
18:53 dhblaz It looks like the implementation of the l3-agent requires that the neutron router's interface be used as the routing interface for the lan segment that has the floating IPs
18:54 dhblaz I'm guessing that I have a misunderstanding because what appears to be the case is contrary to the documentation
18:54 dhblaz regarding the hp hardware - it works like just about any other switch.  It learns macs that traverse the port
18:55 dhblaz and eventually ages out entries
18:55 xarses ya, but unless it's running some STP, it should learn them near instantly unless its already somewhere else
18:56 dhblaz that is just it, it is learning it from node-17
18:56 xarses hmm, does it have the namespace?
18:57 dhblaz All the controllers do
18:57 dhblaz So there are two things I don't understand.
18:58 dhblaz 1) Why does the switch forget the mac address that is associated with the neutron router (the same one advertised for the floating ips)?
18:58 dhblaz 2) Why doesn't the l3-agent respond to arps made for the floating ip?
18:58 rvyalov joined #fuel
19:08 mutex_ dhblaz: what version of fuel do you have ?
19:08 dhblaz 4.0
19:09 dhblaz {"build_id": "2013-12-27_00-24-14", "ostf_sha": "83ada35fec2664089e07fdc0d34861ae2a4d948a", "build_number": "214", "nailgun_sha": "af1598bcc9faf468d4d9265cc5c51fa8cea53136", "fuelmain_sha": "17eed776b30886851ae0042fa7a30184f5cd8eb6", "astute_sha": "6ce36837882399e0d3bb1ffdb2c3b2d8dcb84b54", "release": "4.0", "fuellib_sha": "eebe07913ee09311c8e7c9231f6785081327dc0e"}
19:12 mutex_ did you apply the ocf patches ?
19:12 mutex_ neutron was mostly unusable for me until I applied those
19:13 mutex_ https://bugs.launchpad.net/fuel/+bug/1269334
19:13 mutex_ I got very strange behavior from the daemons constantly restarting
19:14 dhblaz Yes
19:19 richardkiene joined #fuel
19:22 dhblaz I'm just getting more confused now.
19:23 dhblaz If I use arping to make arp requests from the br-ex interface on the l3-agent node (node-18 for me right now)
19:23 dhblaz I get arp replies and I see the arp requests/replies on the router interface in the router namespace
19:23 dhblaz if I make the arp requests in the router namespace they go unanswered
19:26 IlyaE joined #fuel
19:27 anotchenko joined #fuel
19:35 mutex_ oh, interesting
19:35 mutex_ I'm not entirely sure about how ovs works honestly
19:35 mutex_ my only exposure is a bit of debugging with the namespaces
19:45 e0ne joined #fuel
19:54 anotchenko joined #fuel
20:32 vkozhukalov joined #fuel
20:35 IlyaE joined #fuel
20:36 dhblaz How can I determine what these two addresses are?
20:36 dhblaz vip__management_old (ocf::heartbeat:IPaddr2): Started node-17.mumms.com
20:36 dhblaz vip__public_old (ocf::heartbeat:IPaddr2): Started node-16.mumms.com
20:39 xarses ip -4 a | grep :ka
20:57 mutex_ yeah on those nodes
21:33 dhblaz xarses: I think I found the problem but there is no fix: http://h20565.www2.hp.com/portal/site/hpsc/template.PAGE/public/kb/docDisplay/?sp4ts.oid=3794423&spf_p.tpst=kbDocDisplay&spf_p.prp_kbDocDisplay=wsrp-navigationalState%3DdocId%253Dmmr_kc-0113230-5%257CdocLocale%253Den%257CcalledBy%253DSearch_Result&javax.portlet.begCacheTok=com.vignette.cachetoken&javax.portlet.endCacheTok=com.vignette.cachetoken
21:35 dhblaz The config isn't like mine, but the problem is the same.  Node-17 (not the l3 agent) sends a packet with the router vip's mac address
21:35 dhblaz and the switch stops forwarding packets to node-18
21:49 dhblaz In the most recent example I have, node-17 forwarded a dns request that had been nat'd to the router's IP to the dns server.  I don't know why this packet didn't go out node-18 (that is running the l3-agent).  After that I lose connectivity to all my floating IPs until some other network event happens that causes node-18 to get into the forwarding database for the router's VIP's MAC.
22:10 rvyalov joined #fuel
23:19 dhblaz joined #fuel
23:24 richardkiene_ joined #fuel
23:26 justif joined #fuel
23:38 richardkiene__ joined #fuel
23:47 xarses joined #fuel
23:52 dhblaz joined #fuel
23:54 IlyaE joined #fuel