
IRC log for #fuel, 2013-11-06


All times shown according to UTC.

Time Nick Message
00:55 rmoe_ joined #fuel
01:30 mihgen joined #fuel
01:44 rmoe joined #fuel
02:07 xarses joined #fuel
02:29 mihgen joined #fuel
02:37 teran joined #fuel
02:44 teran joined #fuel
03:13 rmoe joined #fuel
03:28 ArminderS joined #fuel
03:42 rmoe joined #fuel
04:16 jkirnosova1 joined #fuel
05:57 mihgen joined #fuel
06:43 e0ne joined #fuel
06:50 mrasskazov joined #fuel
06:53 aglarendil_ joined #fuel
07:04 teran joined #fuel
07:05 dburmistrov joined #fuel
07:12 e0ne joined #fuel
07:53 tatyana joined #fuel
08:29 mihgen joined #fuel
08:45 dnikishov joined #fuel
08:46 teran joined #fuel
09:17 Mihalych joined #fuel
09:26 Mihalych left #fuel
09:34 xdeller joined #fuel
09:38 mrasskazov joined #fuel
09:39 dnikishov left #fuel
09:42 e0ne joined #fuel
09:52 teran joined #fuel
10:04 teran joined #fuel
10:29 xdeller joined #fuel
10:52 tatyana joined #fuel
11:03 r0mikiam joined #fuel
12:38 e0ne joined #fuel
12:39 teran1 joined #fuel
13:08 mihgen joined #fuel
13:12 e0ne joined #fuel
13:28 e0ne_ joined #fuel
15:05 dburmistrov joined #fuel
15:40 r0mikiam joined #fuel
16:02 albionandrew joined #fuel
16:03 MiroslavAnashkin joined #fuel
16:15 albionandrew Morning xarses - dhblaz redeployed yesterday. Some machines have the 3.x kernel on by design and some still have the 2.6. we can't do the splinters until after puppet has finished but we are at 95% and a ceph node is hung at installing centos, it hasn't progressed in 12 hours. How do we get past that?
16:27 dhblaz joined #fuel
16:31 dhblaz To provide a little more information deployment timed out because eth0 was named rename10 and didn't have an IP in /etc/sysconfig/network-scripts/ifcfg-eth0.  I resolved that but it didn't seem to help.  I then tried to click deploy again.  No luck and it didn't reinstall centos on that node.  So I enabled netboot in cobbler for the node and rebooted.  Anaconda completed but the web gui still shows "INSTALLING CENTOS"
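The failure described above (an interface left named `rename10` and an `ifcfg-eth0` without an address) can be spotted from a shell on the node. A minimal diagnostic sketch, assuming the CentOS `ifcfg` KEY=VALUE format; the helper names are hypothetical, and treating a missing `IPADDR` as "no address" only holds for statically configured interfaces.

```python
import os
import re
import shlex

# Interfaces left with a kernel-assigned "renameN" name (e.g. "rename10")
# are ones udev failed to rename to their intended ethN name.
RENAME_RE = re.compile(r"^rename\d+$")


def stuck_renames(names):
    """Return the interface names that look like a failed udev rename."""
    return sorted(n for n in names if RENAME_RE.match(n))


def parse_ifcfg(text):
    """Parse a CentOS ifcfg-* file (shell-style KEY=VALUE lines) into a dict."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex.split removes shell quoting such as IPADDR="10.0.0.5"
        parts = shlex.split(value)
        cfg[key.strip()] = parts[0] if parts else ""
    return cfg


def has_static_ip(text):
    """True if the ifcfg file sets IPADDR (assumes static addressing)."""
    return bool(parse_ifcfg(text).get("IPADDR"))


if __name__ == "__main__":
    # On a live node, /sys/class/net lists every interface the kernel knows about.
    if os.path.isdir("/sys/class/net"):
        print("stuck renames:", stuck_renames(os.listdir("/sys/class/net")))
```

Running it on each node right after provisioning would flag a `renameN` interface before the deployment times out on it.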
16:33 mihgen dhblaz: are you folks deploying on hardware? renaming of interfaces is a pretty well-known Linux issue which usually happens on VMs
16:33 dhblaz Yes, HP bl class blades
16:33 dhblaz This happened on bl280c g6
16:39 dhblaz https://bugs.launchpad.net/fuel/+bug/1248404
16:39 dhblaz I filed that bug last night
16:44 aglarendil_ Brad, could you please post /etc/udev/rules.d/* contents?
16:44 dhblaz I don't have the bad one but I can tell you there wasn't an entry for eth0
16:45 dhblaz I can give you a good one (but I have done an OS reinstall since that problem)
16:45 aglarendil The problem is not in /etc/udev/ it is in /lib/udev/rules.d/<bla-bla>persistent-net-generator.rules AFAIK
16:46 dhblaz http://paste.openstack.org/show/tkyT8Sm0ECQkj9H2AR5k/
16:46 aglarendil is this a good one ?
16:46 dhblaz I know that removing /etc/udev/rules.d/70-persistent-net.rules
16:46 dhblaz will resolve the problem
16:46 dhblaz after a reboot
16:46 aglarendil it will work until the next reboot
16:47 aglarendil then udev will regenerate this file
16:47 dhblaz but I will sometimes have to do this more than once
16:47 dhblaz right when it regenerates it is correct (sometimes)
16:47 dhblaz The paste is from the machine with working eth0
16:48 aglarendil looks like it depends on how the mainboard identifies adapters after initialization
16:51 dhblaz I did notice in my kernel command line includes biosdevname=0
16:52 aglarendil yes, this is related to bugs with BIOSes propagating a different ordering of interfaces
16:53 dhblaz I believe the feature enabled with biosdevname=1 addresses this issue
16:53 aglarendil and breaks other hardware :)
16:54 dhblaz Okay
16:55 dhblaz It might be worth building 70-persistent-net.rules at anaconda install because we know the mac addresses we want assigned to which interfaces
16:55 dhblaz rather we know the mac addresses of the hardware that the user assigned the various networks to
16:55 aglarendil you are completely right. we will try to address this issue in our hotfix release
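The fix proposed above, writing `70-persistent-net.rules` at install time from the MAC addresses the user assigned to each network, could look roughly like the sketch below. It emits rules in the format the CentOS 6 udev generator writes. The function name is hypothetical, the name-to-MAC mapping is assumed to come from Fuel's node data (not shown), and this is not necessarily how the hotfix implements it.

```python
import re

# udev compares ATTR{address} as a literal string, and sysfs reports
# MACs in lowercase, colon-separated form.
MAC_RE = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$")

RULE = ('SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", '
        'ATTR{{address}}=="{mac}", ATTR{{type}}=="1", KERNEL=="eth*", '
        'NAME="{name}"')


def persistent_net_rules(mapping):
    """Render udev rules pinning each interface name to a MAC address.

    `mapping` is {name: mac}, e.g. {"eth0": "00:11:22:33:44:55"}.
    Hypothetical helper; the mapping source is assumed, not shown.
    """
    macs = [m.lower() for m in mapping.values()]
    if len(set(macs)) != len(macs):
        raise ValueError("each MAC may be assigned to only one interface")
    lines = ["# Generated at install time from known MAC addresses"]
    for name in sorted(mapping):
        mac = mapping[name].lower()
        if not MAC_RE.match(mac):
            raise ValueError(f"bad MAC for {name}: {mapping[name]!r}")
        lines.append(RULE.format(mac=mac, name=name))
    return "\n".join(lines) + "\n"
```

Written to `/etc/udev/rules.d/70-persistent-net.rules` from a kickstart `%post` step, the result would pin names before the first boot. Because the target names are still in the kernel's own `ethN` namespace, two NICs swapping names can likely still collide mid-rename, which is one way an interface ends up as `renameN`.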
16:56 dhblaz Should I paste this transcript into the launchpad bug?
16:57 aglarendil Nope. I will update the bug accordingly
16:57 dhblaz Sorry, I already did it.
16:57 dhblaz I won't in the future
16:58 aglarendil Oh, do not worry, this is not a problem
16:59 Pierre-Luc joined #fuel
16:59 Pierre-Luc [11:57] Hi guys! [11:57] Admin from Ericsson Montreal here [11:58] wondering if somebody could help me out with something?
17:00 aglarendil it depends on what 'something' means
17:00 aglarendil :)
17:00 korn__ Hi, Pierre-Luc!
17:02 korn__ Do you have some troubles with Fuel?
17:03 xarses joined #fuel
17:04 dhblaz I'm still confused why the GUI is stuck in the state it is in.
17:04 dhblaz I have even manually run the nailgun agent
17:04 Pierre-Luc Yes we do
17:05 Pierre-Luc actually a lot with 3.2
17:05 dhblaz http://paste.openstack.org/show/50547/
17:05 Pierre-Luc we almost didn't have any problems with 3.1
17:05 dhblaz I see bugs in Launchpad logged against 3.2.1; is it publicly available?
17:05 Pierre-Luc almost every deployment fails, for different reasons
17:05 dhblaz Pierre-Luc: you are not alone; we have the same problem here.
17:06 Pierre-Luc 2013-11-06 11:47:58 ERR (/Stage[main]/Nova::Api/Exec[nova-db-sync]/returns) change from notrun to 0 failed: /usr/bin/nova-manage db sync returned 1 instead of one of [0] at /etc/puppet/modules/nova/manifests/api.pp:115
17:06 Pierre-Luc We are deploying a simple 1 controller/1 compute and we got this error each time
17:06 aglarendil is it HA installation ?
17:06 Pierre-Luc no non-ha
17:07 aglarendil is it compute node ?
17:07 Pierre-Luc yes compute node
17:08 aglarendil can you ping your controller node IP from compute node ?
17:08 korn__ Could you give us some info on your environment? Is it VirtualBox VMs or bare metal?
17:08 Pierre-Luc not virtual box
17:08 Pierre-Luc 2 Dell blades in a M1000e
17:09 aglarendil most likely you do not have L2 connectivity  between compute and controller node
17:10 dhblaz Pierre-Luc: If the verify networks pass and you have L2 issues it is quite possibly an IP address conflict.
17:11 Pierre-Luc I can ping both nodes from Fuel
17:11 aglarendil it may be related to interface renaming after you started the deployment and the L2 verify passed
17:11 Pierre-Luc but not between each Openstack node at this point
17:11 aglarendil and can you ping your controller node from compute node using internal network ?
17:11 Pierre-Luc no
17:13 aglarendil most likely it is due to interfaces being renamed after the provisioning of the nodes. that's why you have misconfigured interfaces on your blades. it is the udev-related issue that we have just discussed with dhblaz
17:13 Pierre-Luc I need to ping the br-mgmt interface?
17:13 aglarendil yep. ip assigned to br-mgmt interface
17:14 aglarendil and you also should ensure that you have vlans enabled on your blade switch
17:14 aglarendil because default config uses vlan tagging for internal traffic in quantum/neutron-enabled configuration
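The check suggested above comes down to two steps: find the address on `br-mgmt` on each node, then ping it from the other. The sketch below covers the first step plus a sanity check that both addresses sit in the same subnet; it parses the output of `ip -o -4 addr show` (one address per line). Helper names are hypothetical, and any addresses used to exercise it are illustrative, not taken from this deployment.

```python
import ipaddress


def inet_addrs(ip_o_output):
    """Map interface name -> IPv4Interface from `ip -o -4 addr show` output."""
    addrs = {}
    for line in ip_o_output.splitlines():
        fields = line.split()
        # Expected shape: "<idx>: <iface> inet <addr>/<prefix> ..."
        if len(fields) >= 4 and fields[2] == "inet":
            name = fields[1].split("@")[0]  # drop any "@parent" suffix
            addrs.setdefault(name, ipaddress.ip_interface(fields[3]))
    return addrs


def same_subnet(a, b):
    """True if two interface addresses share a subnet."""
    return a.network == b.network
```

Matching subnets only say the nodes expect to reach each other directly over L2; the ping is what proves it, and a VLAN misconfiguration on the blade switch breaks it regardless of addressing.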
17:14 Pierre-Luc yes the vlans are already correct
17:15 Pierre-Luc we did another deployment with the same Fuel on other blades
17:15 rmoe joined #fuel
17:15 Pierre-Luc the networking test form fuel went alright
17:16 Pierre-Luc from*
17:18 aglarendil so did another deployment succeed ?
17:19 Pierre-Luc yes
17:19 Pierre-Luc but of course we had to change the networking for the other one
17:19 aglarendil could you please post `ip a` and /etc/udev/rules.d/*net rules ?
17:19 Pierre-Luc I will send the snapshot logs to Ryan
17:23 e0ne joined #fuel
17:23 dburmistrov_ joined #fuel
17:24 e0ne_ joined #fuel
17:27 korn__ Well, Pierre-Luc, could you just run the "ip a" command on the compute node and post its output here.
17:28 Pierre-Luc br-eth0   Link encap:Ethernet  HWaddr F0:1F:AF:98:42:63           inet6 addr: fe80::5cce:63ff:fe9e:b441/64 Scope:Link           UP BROADCAST RUNNING  MTU:1500  Metric:1           RX packets:355 errors:0 dropped:0 overruns:0 frame:0           TX packets:6 errors:0 dropped:0 overruns:0 carrier:0           collisions:0 txqueuelen:0           RX bytes:37363 (36.4 KiB)  TX bytes:468 (468.0 b)  br-eth1   Link encap:Ethernet  HWaddr F0:1F
17:28 Pierre-Luc there are way too many lines...
17:29 korn__ yes, try to use http://paste.openstack.org/ for multi-line output
17:29 Pierre-Luc http://paste.openstack.org/show/50551/
17:29 Pierre-Luc here you go
17:30 xarses Pierre-Luc: what nic are these interfaces, and is this centos neutron with vlans?
17:31 xarses Pierre-Luc: ie, what kernel module
17:33 dhblaz joined #fuel
17:38 Pierre-Luc centos neutron gre, not vlans
17:39 xarses Pierre-Luc: and the nic module?
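The NIC driver asked about above can be read from sysfs: for a physical interface, `/sys/class/net/<iface>/device/driver` is a symlink into the driver's directory, and its last path component is the driver name (normally the same as the module name, and what `ethtool -i` reports as `driver`). A small sketch; the function names are hypothetical.

```python
import os


def driver_from_link(link_target):
    """The driver name is the last component of the device/driver link target."""
    return os.path.basename(os.path.normpath(link_target))


def nic_driver(iface, sysfs="/sys/class/net"):
    """Return the kernel driver bound to `iface`, or None if it has none."""
    link = os.path.join(sysfs, iface, "device", "driver")
    if not os.path.islink(link):
        return None  # bridges, bonds, lo and missing interfaces have no device
    return driver_from_link(os.readlink(link))
```

For example, `nic_driver("eth2")` on a blade returns the driver behind the admin interface, while `nic_driver("br-eth0")` returns `None` because a bridge has no backing device.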
17:41 xdeller Pierre-Luc: as I see you are using eth2 for admin network, right?
17:45 Pierre-Luc yes
17:46 dburmistrov joined #fuel
17:46 Pierre-Luc another thing, I cannot use the PXE nic for anything else, it seems like a bug to me
17:46 Pierre-Luc otherwise it's a big limitation
17:47 angdraug joined #fuel
17:47 xdeller right now it is a limitation for quantum deployment
17:48 Pierre-Luc ok
17:48 xdeller ok, so the counters seem quite suspicious in your paste above
17:48 xdeller looks like connectivity break
17:48 e0ne joined #fuel
17:52 Pierre-Luc so basically the deployment fails because the compute node can't reach the controller?
17:54 xdeller seems so
17:56 Pierre-Luc We are looking into it here
17:56 Pierre-Luc it seems that it's a VLAN problem
17:57 Pierre-Luc we are waiting for the network team to give us feedback
17:57 Pierre-Luc thank you xdeller for the help
17:57 Pierre-Luc and everybody else of course
18:16 Pierre-Luc Has anyone worked with EMC storage and the SMI-S server?
18:29 albionandrew hi xarses did you see this  Morning xarses - dhblaz redeployed yesterday. Some machines have the 3.x kernel on by design and some still have the 2.6. we can't do the splinters until after puppet has finished but we are at 95% and a ceph node is hung at installing centos, it hasn't progressed in 12 hours. How do we get past that?..... dhblaz: To provide a little more information deployment timed out because eth0 was named rename10 and didn't have an IP in
18:29 albionandrew /etc/sysconfig/network-scripts/ifcfg-eth0.  I resolved that but it didn't seem to help.  I then tried to click deploy again.  No luck and it didn't reinstall centos on that node.  So I enabled netboot in cobbler for the node and rebooted.  Anaconda completed but the web gui still shows "INSTALLING CENTOS"
18:29 albionandrew So we are stuck on installing centos
18:32 dhblaz joined #fuel
19:06 xarses joined #fuel
19:08 Pierre-Luc FYI, I confirm that the VLAN was the problem
19:08 Pierre-Luc thanks for your help, it works now
19:27 dhblaz joined #fuel
19:32 xarses albionandrew: If the deployment is stuck thinking that it's currently running a deployment (the progress bar at the top of the cluster), then you can likely reset it by removing any "in progress" jobs from the tasks table in the db. If the node itself is showing a progress bar but the cluster won't start the deploy again, then I'd be at a loss as to how to make it continue
19:32 xarses after kicking the db, you should whack the nailgun process on the fuel node
19:32 xarses and force reload your browser
19:33 albionandrew Sorry, dhblaz is not here; can you tell me how to do that?
19:33 albionandrew The db part
19:33 xarses sudo -u postgres psql
19:33 xarses \c nailgun
19:33 xarses \d
19:34 xarses \d tasks
19:34 xarses select * from tasks;
19:35 xarses paste the output
19:35 albionandrew Great I'll take a look in a bit thanks again!
19:35 xarses if it's beyond poking the tasks table, one of the UI devs would need to look at it.
19:49 SteAle joined #fuel
19:56 Pierre-Luc joined #fuel
20:01 Pierre-Luc It seems that the key pairing doesn't work on any deployment that I tried with Fuel 3.2
20:01 Pierre-Luc Is it a known issue?
20:02 xarses key pairing?
20:03 Pierre-Luc if you use a generated ssh key when deploying your instance
20:03 Pierre-Luc the key doesn't seem to be injected into the image
20:04 Pierre-Luc key injection I should say
20:04 Pierre-Luc openstack-nova-metadata-api stops as soon as you start it
20:05 Pierre-Luc openstack-nova-metadata-api is stopped
20:05 Pierre-Luc Unless I'm mistaken, this is the service that does the key injection, right?
20:14 xarses I can't recall if it's injected into the image (leaning towards) or added after the start with metadata
20:15 mrasskazov joined #fuel
20:15 xarses If you're running HA, openstack-nova-metadata-api is controlled by crm (corosync)
20:15 xarses and the init script won't run correctly
20:17 Pierre-Luc I deployed a second Openstack not in HA and I get the same problem
20:17 Pierre-Luc I got this problem with every deployment with Fuel 3.2, probably the 5th time now
20:18 xarses hmm
20:18 Pierre-Luc The point is, usually the cloud images only work with key injection; you are not even allowed to log in with username/password
20:18 Pierre-Luc if this feature is not deployed correctly, the images are useless
20:21 Pierre-Luc at first I thought that my keys were wrong but it never works
20:51 tatyana joined #fuel
20:58 albionandrew xarses here's the pastebin - http://pastebin.com/jQaHbRrY
20:58 albionandrew I think that's what you needed?
20:59 xarses hmm can you do a 'select id, cluster_id, name, status from tasks;'
21:00 xarses should make that more readable
21:00 xarses albionandrew: ^
21:01 xarses Pierre-Luc: I spoke with someone; it is in fact the responsibility of metadata and cloud-init to inject the ssh key
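Since the key reaches the instance through the metadata service, one way to tell whether `openstack-nova-metadata-api` is the culprit is to query the metadata from inside a booted instance. The sketch below reads the OpenStack-format document at `http://169.254.169.254/openstack/latest/meta_data.json`, whose `public_keys` map holds the key pair passed at boot; depending on its version and configured datasource, cloud-init may instead read the EC2-compatible `/latest/meta-data/` tree. Function names are hypothetical.

```python
import json
import urllib.request

# Link-local address where Nova's metadata service answers instances.
METADATA_URL = "http://169.254.169.254/openstack/latest/meta_data.json"


def public_keys(meta_data_json):
    """Extract the SSH public keys from an OpenStack meta_data.json document."""
    doc = json.loads(meta_data_json)
    return sorted(doc.get("public_keys", {}).values())


def fetch_public_keys(url=METADATA_URL, timeout=5):
    """Run inside an instance; raises URLError if the metadata API is unreachable."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return public_keys(resp.read().decode("utf-8"))
```

If `fetch_public_keys()` times out or raises `URLError`, the metadata path is broken and cloud-init cannot install the key either; an empty list instead means the instance was booted without a key pair.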
21:01 albionandrew xarses - http://pastebin.com/YsMy7f1E
21:03 Pierre-Luc thanks xarses
21:04 Pierre-Luc As I said, the service won't start in any environment I deployed with Fuel 3.2
21:04 xarses albionandrew: OK, let's do a DELETE FROM tasks WHERE status = 'running' OR status = 'error';
21:04 xarses Pierre-Luc: I'm setting up a cluster to test. Feel free to create a launchpad bug
21:05 Pierre-Luc xarses: thank you
21:06 xarses albionandrew: then close out and kill the nailgun process
21:06 xarses it will respawn
21:10 albionandrew xarses: the delete failed due to a constraint violation.  Is it safe to do a cascading delete?
21:14 xarses it's safe to truncate the whole table if no other tasks are running
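A constraint violation on that DELETE means other rows still reference the tasks being removed. In PostgreSQL the equality operator is `=`, and `status IN ('running', 'error')` is an equivalent, shorter filter. The toy model below uses SQLite in memory instead of the Fuel master's PostgreSQL, and a HYPOTHETICAL `parent_id` column (the real Nailgun schema is not shown in the log; `\d tasks` shows it), to illustrate why a plain delete fails and how deleting the referencing rows first avoids it. Task names in it are illustrative.

```python
import sqlite3

# Toy, in-memory stand-in for Nailgun's "tasks" table. id, cluster_id,
# name and status appear in the log; parent_id is a HYPOTHETICAL
# self-reference used only to reproduce a constraint violation.
SCHEMA = """
CREATE TABLE tasks (
    id         INTEGER PRIMARY KEY,
    cluster_id INTEGER,
    name       TEXT,
    status     TEXT,
    parent_id  INTEGER REFERENCES tasks(id)
);
"""

STALE = ("running", "error")


def make_db():
    """Build the toy table with one stuck task that still has a subtask."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
    conn.executescript(SCHEMA)
    with conn:
        conn.executemany(
            "INSERT INTO tasks VALUES (?, ?, ?, ?, ?)",
            [
                (1, 1, "deploy", "running", None),
                (2, 1, "provision", "ready", 1),  # subtask of the stuck task
                (3, 1, "check_networks", "ready", None),
            ],
        )
    return conn


def naive_delete(conn):
    """One-statement delete; fails while subtasks still reference the rows."""
    with conn:  # rolls back and re-raises on IntegrityError
        conn.execute("DELETE FROM tasks WHERE status IN (?, ?)", STALE)


def delete_stale(conn):
    """Delete referencing subtasks first, then the stale tasks, in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "DELETE FROM tasks WHERE parent_id IN "
            "(SELECT id FROM tasks WHERE status IN (?, ?))",
            STALE,
        )
        conn.execute("DELETE FROM tasks WHERE status IN (?, ?)", STALE)
```

On the real database, PostgreSQL's error message names the violated constraint and the referencing table, which tells you what to delete first. PostgreSQL has no `DELETE ... CASCADE`; `TRUNCATE tasks CASCADE` exists but also empties every table that references `tasks`, so it is only reasonable under the same condition given above: no other tasks running.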
21:23 dhblaz One of the compute nodes disappeared!
21:24 albionandrew xarses: killed the nailgun process and it respawned. The 95% has gone, but it's still sitting at installing centos with no progress indication; the option to deploy changes is back.
21:26 dhblaz I see this in notifications:
21:26 dhblaz 06-11-2013 21:21:34Node 'Untitled (38:48)' is back online
21:26 dhblaz 06-11-2013 21:19:46Node 'Untitled (38:48)' has gone away
21:27 dhblaz but 38:48 isn't in the environment and also isn't in the unallocated nodes
21:32 xarses albionandrew: iirc the progress bar for the node won't update from that hack; you should be able to deploy now and it should figure itself out
21:32 albionandrew OK will push deploy changes then
21:32 dhblaz xarses: I looked into vlan splinters
21:33 dhblaz As best I can tell it isn't required for kernels 3.3 or higher
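The reading above ("as best I can tell") is that VLAN splinters are only needed on kernels older than 3.3, which matters here because some nodes run 3.x and some still run 2.6. A sketch that applies that threshold to a `uname -r` string follows; the 3.3 cut-off is taken from the discussion, not verified, and the helper names are hypothetical. Comparing parsed integer tuples avoids the string-comparison trap where `'3.10' < '3.3'`.

```python
import re


def kernel_version(release):
    """Parse (major, minor) from a `uname -r` string such as 2.6.32-358.el6.x86_64."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def needs_vlan_splinters(release, threshold=(3, 3)):
    """True if the kernel is older than `threshold` (3.3 per the discussion, unverified)."""
    return kernel_version(release) < threshold
```

On each node, `needs_vlan_splinters(platform.release())` (with `import platform`) gives the answer for the running kernel.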
21:33 albionandrew It didn't like it - http://pastebin.com/iGxbewAd
21:35 dhblaz We made enough changes to the network interface hardware config and firmware that we should probably just start over on a new cluster
21:35 dhblaz albionandrew: are you okay with me deleting rc-32?
21:36 albionandrew go for it
21:36 dhblaz we should also delete rc-30 and power down that blade
21:36 dhblaz It isn't doing anyone any good
21:37 dhblaz xarses: do you want anything from this environment before I blow it away?
21:37 dhblaz albionandrew: I deleted rc-30 please power down that blade
21:41 xarses dhblaz, I don't think so
22:09 mrasskazov joined #fuel
22:13 mrasskazov1 joined #fuel
22:22 xdeller_ joined #fuel
22:45 rmoe joined #fuel
