
IRC log for #fuel, 2013-11-07


All times shown according to UTC.

Time Nick Message
00:03 rmoe joined #fuel
01:22 xarses joined #fuel
02:31 boldhawk joined #fuel
02:32 boldhawk I have 13 nodes: 1 controller, 6 compute and 6 ceph. However, only 11 of them have finished installing ubuntu; two of them seem to be stuck and don't seem to be continuing. Is there a timeout before fuel will fail?
02:33 boldhawk or a way to recover and continue?
03:32 r0mikiam joined #fuel
03:52 dhblaz joined #fuel
04:00 dhblaz joined #fuel
04:33 ArminderS joined #fuel
05:02 r0mikiam joined #fuel
05:57 r0mikiam joined #fuel
06:26 mrasskazov joined #fuel
07:00 xarses it will timeout eventually
07:01 xarses and will probably fail
07:53 tatyana joined #fuel
08:26 e0ne joined #fuel
08:37 teran joined #fuel
08:51 e0ne joined #fuel
08:54 e0ne_ joined #fuel
08:55 e0ne joined #fuel
08:58 bogdando joined #fuel
10:09 mrasskazov joined #fuel
10:21 e0ne joined #fuel
10:24 e0ne_ joined #fuel
10:38 mrasskazov joined #fuel
10:42 tatyana joined #fuel
11:07 MiroslavAnashkin joined #fuel
12:13 xarses joined #fuel
12:19 e0ne joined #fuel
12:24 vk joined #fuel
12:54 MiroslavAnashkin joined #fuel
13:25 xarses joined #fuel
13:29 ArminderS joined #fuel
13:33 e0ne joined #fuel
13:57 MiroslavAnashkin joined #fuel
15:46 MiroslavAnashkin joined #fuel
15:55 MiroslavAnashkin joined #fuel
15:59 xdeller joined #fuel
16:02 dhblaz joined #fuel
16:05 Grishkin joined #fuel
16:09 teran left #fuel
16:16 r0mikiam joined #fuel
16:19 dhblaz We had an environment fail last night while trying to make the first controller.  I would love some help getting this to the point where I can make a useful bug on launchpad
16:23 dhblaz joined #fuel
16:31 xdeller hi dhblaz
16:31 xdeller what`s the progress and what remains?
16:32 dhblaz The GUI shows ERRORS for the three controller nodes.
16:32 dhblaz I tried clicking the Deploy button for a second time per the instructions at the top
16:33 xdeller are puppet logs okay? did you check them?
16:33 dhblaz But that didn't resolve the problem
16:34 dhblaz This is what I would like help with
16:34 dhblaz When I filter by WARN+ there are lines that match
16:34 xdeller oh nice, pastie?
16:35 dhblaz Should I just upload the entire puppet log?
16:35 xdeller fine too
16:35 dhblaz Or the relevant segment?
16:36 xdeller entire log sounds more promising in means of solution idea
16:37 ArminderS need help with this error in the puppet log for controllers 2 & 3 in HA on fuel-3.2
16:37 ArminderS ERR
16:37 ArminderS (/Stage[main]/Ceph/Service[ceph]/ensure) change from stopped to running failed: Could not start Service[ceph]: Execution of '/sbin/service ceph start' returned 1:  at /etc/puppet/modules/ceph/manifests/init.pp:77
16:37 dhblaz Here is a segment:
16:37 dhblaz http://paste.openstack.org/show/50609/
16:37 dhblaz I'm looking to upload the entire log
16:38 xdeller dhblaz: was it first run?
16:39 dhblaz Does that file get overwritten?
16:39 xdeller ceph-deploy is definitely crappy tool but not that crappy...
16:39 aglarendil ArminderS: are you describing the same environment as dhblaz is ?
16:39 dhblaz No, he isn't
16:39 xdeller I`d suggest to remove ceph.conf from every node w/ ceph role and try again
16:40 xdeller and of course fill a bug
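
A minimal sketch of that cleanup, assuming the nodes are reachable over ssh as root; the hostnames are placeholders:

    # hypothetical cleanup pass before re-running the deploy:
    # drop the stale ceph.conf from every node that carries a ceph role
    for node in node-1 node-2 node-3; do   # placeholder hostnames
        ssh root@"$node" 'rm -f /etc/ceph/ceph.conf'
    done
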
16:40 aglarendil Oh, that's fine because you are both describing ceph bugs. Could you please create bug at http://launchpad.net/fuel and attach diagnostic snapshot?
16:41 ArminderS seems like its the same
16:41 xdeller no it does not
16:41 xdeller it`s different stages
16:42 xdeller yours is failing on cluster start and dhblaz`s is failing on daemon layout creation
16:42 ArminderS right, i just checked and see no warning in my run
16:42 ArminderS for puppet
16:42 ArminderS just ceph start fails
16:42 aglarendil diagnostic snapshot is an archive that you can download from 'Support/Help' tab in GUI
16:42 ArminderS and i can see 6 processes for key creation on controllers 2 & 3
16:42 xdeller can you try /etc/init.d/ceph -a start and post output?
16:42 xdeller wow
16:43 xdeller so if it hangs on key creation you have no need to do this
16:44 ArminderS plus this in logs
16:44 ArminderS 2013-11-06 22:54:40.783453 7fa2e499f7a0  0 ceph version 0.61.8 (a6fdcca3bddbc9f177e4e2bf0d9cdd85006b028b), process ceph-mon, pid 19381
16:44 ArminderS 2013-11-06 22:54:40.853454 7fa2e499f7a0  0 mon.node-8 does not exist in monmap, will attempt to join an existing cluster
16:44 ArminderS 2013-11-06 22:54:40.853660 7fa2e499f7a0 -1 no public_addr or public_network specified, and mon.node-8 not present in monmap or ceph.conf
16:44 dhblaz xdeller: The second run doesn't start until 2013-11-07T01:54:00.194744+00:00
16:44 dhblaz So that is from the first run
16:44 xdeller just post the logs; most probably the deploy hung at this point and was relaunched multiple times
16:45 xdeller eh... i have no idea where the initial ceph.conf came from, except the package.. just a sec
16:46 ArminderS seems like it's failing to join the existing cluster because the initial one just contains the primary controller
16:46 dhblaz http://paste.openstack.org/show/pVGYOwinhGkg8eyLL1RB/
16:47 ArminderS does it just require the public network statement in the ceph.conf of the node hosting the new monitor?
16:47 ArminderS something here -> http://tracker.ceph.com/issues/5195
16:47 xdeller a bit confusing, the ceph pkgs don't include a default config
16:47 dhblaz It shows this several lines up:
16:47 dhblaz 2013-11-06T22:36:01.749927+00:00 notice:  (/Stage[main]/Ceph::Conf/Exec[ceph-deploy new]/returns) [[1mceph_deploy.new[0m][[1;34mDEBUG[0m ] Writing initial config to ceph.conf...
16:48 xdeller seems that your deploy retried at least once too
16:48 dhblaz Look around line 2851
16:48 dhblaz xdeller: if you are referring to mine, I clicked the deploy button a second time but this error is from the first run.
16:49 xdeller our deploy tries three times before an error is thrown at the higher level
16:49 xdeller that`s primary suspect
16:50 xdeller ok, so both issues are related to the incorrect ceph-deploy behaviour for me
16:51 dhblaz This may help with the timeline:
16:51 dhblaz http://paste.openstack.org/show/kBTO6oeB4RTrLdUqtqwX/
16:51 dhblaz Clearly in the first run
16:51 xdeller if you are in a hurry we can try either ceph-deploy by hand or mkcephfs on the config that our manifests created
16:52 ArminderS xdeller: would that work for me too?
16:53 dhblaz I'm trying to get to the point that I can make reliable deployments
16:53 dhblaz I can trudge through getting the cluster up and running but I would rather find the root cause and fix it.
16:53 ArminderS i had good success with non-ha one
16:53 xdeller of course, but you should kill the ceph-* processes and clear all osd and mon directories
16:53 ArminderS but the ha one failed twice for me on this
16:54 xdeller I prefer mkcephfs just because it works straightforwardly, but it is almost non-customizable
16:54 xdeller and requires a lot of preparations
16:54 ArminderS i got 3 controllers with no osd roles & 2 ceph nodes with osd roles
16:54 xdeller so just for enhancing fuel deployment scenario we should stay with ceph-deploy
16:55 xdeller I`d recommend at least three osd hosts
16:55 xdeller it is a general production consideration and we have the same number hardcoded for the replication level
16:56 ArminderS i think its set to 2 for replication factor
16:56 ArminderS so asks for 2 osds minimum
16:57 rmoe joined #fuel
16:57 angdraug joined #fuel
17:02 xdeller ehm, in our deployment rf should be set to three
17:02 xdeller had you replaced this param?
17:05 dhblaz FYI we use 2 for replication factor also
17:06 ArminderS the default is 2 in fuel-web
17:06 ArminderS and i never bothered to change that
17:07 ArminderS i may in future want to change that to 3 for some specific pool, probably volumes
17:07 ArminderS factor of 2 for images is fine for me
17:08 xarses joined #fuel
17:16 dhblaz xdeller: If I read this correctly
17:16 dhblaz Puppet does this:
17:16 dhblaz /usr/bin/ceph-deploy new node-70:10.62.10.2
17:17 dhblaz Then tries: /usr/bin/ceph-deploy mon create node-70:10.62.10.2
17:17 dhblaz And that second one fails
17:17 dhblaz I would expect that this would happen every time
17:17 xdeller got distracted; 2 is not recommended due to the nature of recovery in disaster cases
17:17 dhblaz Because the second run will fail with: RuntimeError: config file /etc/ceph/ceph.conf exists with different content; use --overwrite-conf to overwrite
17:17 xdeller you simply will not have a "copy quorum" when the replicas differ
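
Reconstructed as shell, the sequence dhblaz describes above (host and IP are taken from the log; whether --overwrite-conf is safe to pass on a retry is exactly the open question):

    /usr/bin/ceph-deploy new node-70:10.62.10.2         # first run writes ceph.conf
    /usr/bin/ceph-deploy mon create node-70:10.62.10.2  # this step fails
    # any later run then trips over the config left by the first attempt:
    /usr/bin/ceph-deploy --overwrite-conf mon create node-70:10.62.10.2
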
17:19 xarses joined #fuel
17:27 ArminderS wow, it was very easy to fix
17:27 ArminderS just added public_network in ceph.conf of controllers 2 & 3 and that fixed the ceph start issue
17:28 ArminderS added 2nd controller (node-8) -> monmap e2: 2 mons at {node-7=192.168.0.2:6789/0,node-8=192.168.0.3:6789/0}, election epoch 2, quorum 0,1 node-7,node-8
17:30 ArminderS now added the 3rd controller (node-10) too
17:31 ArminderS monmap e3: 3 mons at {node-10=192.168.0.5:6789/0,node-7=192.168.0.2:6789/0,node-8=192.168.0.3:6789/0}, election epoch 6, quorum 0,1,2 node-10,node-7,node-8
17:31 ArminderS thats fixed
17:31 ArminderS probably puppet needs to add the public_network setting in ceph.conf and you are good
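
ArminderS's fix, restated as a sketch; the subnet comes from the monmap lines above, and the plain append assumes the option lands in (or is moved into) the [global] section of the file:

    # on controllers 2 & 3: tell the new mon which network it may bind to
    cat >> /etc/ceph/ceph.conf <<'EOF'
    public_network = 192.168.0.0/24
    EOF
    service ceph restart
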
17:39 xarses dhblaz: can you paste /root/ceph.log from node-70?
17:44 xarses Arminder: if public_network and cluster_network are missing its due to https://review.openstack.org/#/c/52850/
17:44 xarses ArminderS: even ^
17:47 ArminderS hrm
17:48 xarses ArminderS: that will be in 3.2.1
17:48 xarses ArminderS: I assume you are using neutron gre or vlan
17:49 dhblaz xarses: I got your request
17:49 dhblaz I have to run out for about 1.5 hours
17:49 dhblaz I'll get it to you then
17:50 ArminderS i'm using neutron with gre
17:50 xarses dhblaz: OK, thanks
17:50 xarses ArminderS: I don't see the OSD's in your ceph -s; did those come up OK?
17:50 ArminderS thanks xarses
17:50 ArminderS do i need to put in cluster_network too?
17:50 ArminderS because i just added public_network
17:51 xarses Arminder: If you want the osd's to be required to use the storage network for transport, yes
17:51 ArminderS $cluster_network           = $::fuel_settings['storage_network_range'],
17:51 ArminderS $public_network            = $::fuel_settings['management_network_range'],
17:51 xarses correct
17:51 ArminderS ^ is what i see in ceph/manifests/init.pp
17:51 ArminderS i'll do that
17:52 ArminderS and what services other than ceph do i need to restart for the changes i'm making
17:52 xarses just ceph
17:52 ArminderS i'm using rados-gw for objects
17:53 xarses hmm, since we are restarting the osd's, yeah, you should restart rados
17:55 ArminderS so ceph & ceph-radosgw restart on controllers
17:55 ArminderS and ceph only on osd's, right
17:56 xarses yes
17:57 ArminderS great, thanks
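
The agreed restart order, restated as shell (service names as used in the log, assuming stock init scripts):

    # on each controller:
    service ceph restart
    service ceph-radosgw restart
    # on each OSD-only node:
    service ceph restart
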
17:57 ArminderS i missed this -> [07-Nov-13 23:20] <@xarses> ArminderS: I don't see the OSD's in your ceph -s did those come up OK?
17:57 ArminderS those are showing up fine
17:57 ArminderS [root@node-7 ceph]# ceph -s
17:57 ArminderS health HEALTH_OK
17:57 ArminderS monmap e3: 3 mons at {node-10=192.168.0.5:6789/0,node-7=192.168.0.2:6789/0,node-8=192.168.0.3:6789/0}, election epoch 6, quorum 0,1,2 node-10,node-7,node-8
17:57 ArminderS osdmap e42: 14 osds: 14 up, 14 in
17:57 ArminderS pgmap v274: 692 pgs: 692 active+clean; 0 bytes data, 29178 MB used, 26026 GB / 26054 GB avail
17:57 ArminderS mdsmap e1: 0/0/1 up
17:58 ArminderS i had just pasted the monmap only earlier
17:59 ArminderS actually there's only 2 ceph nodes, but with 7 drives each, so 14 osd's
18:02 xarses sounds good
18:05 ArminderS except for the ceph start failure on controllers 2 & 3, the deployment was smooth
18:05 xarses Also something to note, your radosgw might not work correctly due to a couple of issues we found after the release. There are some test instructions here
18:06 xarses https://github.com/Mirantis/fuel/blob/master/deployment/puppet/ceph/README.md#swift-testing
18:06 xarses you will probably find that radosgw is not running on controllers 1 & 2 and that you can't talk to the endpoint correctly
18:06 ArminderS i had tested radosgw on non-ha and it worked fine for me
18:06 xarses its only an issue in ha
18:07 ArminderS i'll check on ha one too
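
A quick first check in the spirit of those test instructions, before running the full swift test (init-script name as used later in the log):

    # on each controller: is the gateway process actually up?
    service ceph-radosgw status
    ps aux | grep '[r]adosgw'
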
18:07 xarses I have some patched manifests that we can run that should clear it up
18:07 xarses afk
18:08 ArminderS damn github never works from my home dsl...weird
18:12 angdraug joined #fuel
18:13 e0ne joined #fuel
18:39 xarses back
18:47 ArminderS xarses: radosgw doesn't work, just like you said
18:48 ArminderS 2013-11-07 18:45:08.795551 7f2e5dd35820  0 ceph version 0.61.8 (a6fdcca3bddbc9f177e4e2bf0d9cdd85006b028b), process radosgw, pid 26112
18:48 ArminderS 2013-11-07 18:45:08.805728 7f2e5dd35820  0 librados: client.radosgw.gateway authentication error (1) Operation not permitted
18:48 ArminderS 2013-11-07 18:45:08.806224 7f2e5dd35820 -1 Couldn't init storage provider (RADOS)
18:48 ArminderS [07-Nov-13 23:37] <@xarses> I have some patched manifests that we can run that should clear it up
18:48 ArminderS can i have these?
18:49 ArminderS radosgw is not running on controllers 1 & 2, and on the 3rd there's an openssl error talking to the endpoint
18:50 ArminderS 500 internal server error
18:52 xarses yep, let me find the patches you need
18:53 xarses alternately, you could use the 3.2.1-rc1 tag, but I haven't tested everything in there yet
18:55 ArminderS would appreciate if you can share the patches
18:58 dhblaz joined #fuel
19:07 dhblaz xarses: node-70 isn't available by just sshing to node-70.  I had to go find a working IP for it on the console.
19:08 dhblaz http://paste.openstack.org/show/50642/
19:09 dhblaz Looks like they are the same: diff -u /root/ceph.conf /etc/ceph/ceph.conf
19:10 dhblaz Oh, I see now:
19:10 dhblaz [root@node-70 ~]# ls -l /root/ceph.conf /etc/ceph/ceph.conf
19:10 dhblaz -rw-r--r-- 1 root root 384 Nov  6 22:36 /etc/ceph/ceph.conf
19:10 dhblaz lrwxrwxrwx 1 root root  19 Nov  6 22:36 /root/ceph.conf -> /etc/ceph/ceph.conf
19:10 xarses dhblaz: can you paste /root/ceph.log?
19:10 dhblaz So as best I can tell this is never going to work
19:11 xarses controller-1 has /root/ceph.conf linked to /etc/ceph/ceph.conf
19:11 xarses to prevent a race issue that ceph-deploy introduced
19:11 Arminder joined #fuel
19:13 dhblaz http://paste.openstack.org/show/6MKGQe0UBQHJoUTcsOe7/
19:13 dhblaz but if you call ceph-deploy twice the second one will fail
19:13 dhblaz Unless --overwrite-conf is used
19:14 dhblaz I imagine that --overwrite-conf isn't used because you don't want to wipe out any existing config, possibly leading to data loss
19:14 xarses the actual reason for the failure is @32
19:15 xarses 2013-11-06 22:36:04,077 [node-70][ERROR ] 2013-11-06 22:36:04.008371 7f373a8167a0 -1 unable to find any IP address in networks: 10.62.4.0/24
19:18 dhblaz The only IP I would expect it to find would be the local node's, because this is the first run on the first controller
19:20 xarses the --overwrite-conf is due to how ceph-deploy is configured to execute, and it can actually be false in some cases. We should have it squared away in the manifests for when we actually expect it to be an issue, and use --overwrite-conf appropriately. They should be ignorable.
19:21 xarses dhblaz: it expects there to be an interface on 10.62.4.0/24 for mon to bind to (taken from public_network in ceph.conf)
19:21 xarses it was not up when it executed
19:22 xarses it could be up now, which is odd.
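
A quick way to test that theory on the node itself (subnet taken from the error message):

    # does any interface currently hold an address in the mon's public_network?
    ip -o addr | grep '10\.62\.4\.'
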
19:22 xarses Is this centos still or ubuntu?
19:22 dhblaz centos
19:22 dhblaz When I look at the function that error message comes from, it looks like it does something different from what you are explaining
19:23 xarses odd that it occurred in centos, we saw issues previously in ubuntu
19:23 xarses the message is returned from ceph-mon
19:24 xarses and since you had problems getting into the box, i would agree that the interface on 10.62.4.0/24 is probably down
19:25 xarses is it up now?
19:26 dhblaz check out @647 http://paste.openstack.org/show/pVGYOwinhGkg8eyLL1RB/
19:26 dhblaz (L3_if_downup[br-storage](provider=ruby)) Can't put interface 'br-storage' to UP state.
19:29 e0ne joined #fuel
19:29 dhblaz Later there is this @589 where it couldn't ping the gateway on the public network
19:30 dhblaz I did a network restart on node-70 to copy off the logs you asked about (about 20 minutes ago)
19:30 dhblaz I checked and I can ping 192.168.62.1 now.
19:33 xarses ArminderS: here are the patches you need for just rados https://github.com/xarses/fuel/tree/radosgw-fixes
19:33 dhblaz xarses: WRT your suggestion to enable vlan splinters.
19:33 dhblaz After reading this: http://git.openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob_plain;f=FAQ;hb=HEAD
19:33 ArminderS thanks xarses
19:33 dhblaz Specifically: Q: VLANs don't work.
19:34 dhblaz I decided that using Kernel 3.10 on the machines that could run it made more sense.
19:34 dhblaz And enabling vlan splinters on the machines that couldn't run it.
19:34 xarses to re-run, on the first controller modify /etc/astute.yaml change role = .. to role = primary_controller
19:34 xarses on controllers 2, and 3 role = controller
19:35 dhblaz xarses: is that for me?
19:35 xarses and then run puppet agent --{onetime,summarize,debug,test} | tee ~/agent.log
19:35 xarses ArminderS: ^^
19:36 ArminderS sure thing
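
Spelled out, the brace form xarses gives is just four flags; a sketch of the whole re-run on one controller:

    # in /etc/astute.yaml: role: primary_controller   (first controller)
    #                      role: controller           (controllers 2 and 3)
    puppet agent --onetime --summarize --debug --test | tee ~/agent.log
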
19:36 xarses dhblaz: no, your approach sounds good, I'm testing a patch today to try to enable vlan splinters in some cases
19:37 xarses ArminderS: you can either make a patch from that or replace /etc/puppet/modules with deployment/puppet/ from that url
19:37 xarses ArminderS: on the fuel node
19:38 ArminderS latter sounds good
19:39 ArminderS -> replace /etc/puppet/modules with deployment/puppet/
19:39 xarses yep
19:41 ArminderS does it contain the patch for missing cluster_network & public_network too?
19:42 xarses ArminderS: no it does not, that is a problem with astute, I'm not exactly sure how to hot patch that easily
19:42 xarses erm,
19:42 xarses not astute its in nailgun
19:42 xarses sorry
19:42 ArminderS no worries, its easier to fix it manually
19:43 xarses /opt/nailgun/lib/python2.6/site-packages/nailgun/
19:43 xarses and then kill the nailgun process and it will restart
19:45 ArminderS thats just 2 files to edit
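
The hot-patch loop, sketched; the path is from the log, and the automatic restart after a kill is xarses's statement above (the pkill pattern is a guess):

    cd /opt/nailgun/lib/python2.6/site-packages/nailgun/
    # edit the two affected files in place, then:
    pkill -f nailgun   # hypothetical match; the process is respawned on its own
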
19:45 xarses dhblaz: https://github.com/Mirantis/fuel/pull/806, not tested at all yet
19:49 dhblaz The problem I am having is not related to splinters
19:49 dhblaz I would like to try to find a solution to the problem I am having where the storage network isn't coming up.
19:49 dhblaz Perhaps there is a race condition and it needs to wait after the down/flush/up steps
19:50 xarses dhblaz: are all the interfaces up now?
19:50 dhblaz I don't think so
19:50 dhblaz because the IP address the fuel master expects it to have isn't up
19:52 xarses is this running the 2.6 kernel, or the 3.10?
19:52 ArminderS xarses: its 1:20 am in here, so I'll probably take a nap now and will check on those patches tomorrow and let you know how it went
19:52 ArminderS thanks again
19:52 xarses ArminderS: you're welcome
19:54 dhblaz [root@node-70 ~]# uname -a
19:54 dhblaz Linux node-70.mumms.com 2.6.32-358.118.1.openstack.el6.x86_64 #1 SMP Fri Sep 20 09:31:41 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
19:54 dhblaz xarses: ^^
19:55 xarses dhblaz: OK, i think we should create a bug for the network interfaces not loading on the node
19:55 dhblaz I am pretty sure I had modified the /etc/grub.conf so that 3.10 loaded
19:55 dhblaz but I either didn't or something reversed that change
19:56 dhblaz I don't think that this would explain not having its PXE IP correct
19:56 dhblaz because that is an untagged network
19:57 xarses are the eth interfaces present, i.e. did the kernel load them?
19:57 xdeller joined #fuel
19:59 dhblaz xarses: In this case the PXE network is on eth2 (according to the web gui)
20:00 dhblaz eth2 is up but has an IP address of 10.62.12.60
20:00 dhblaz But:
20:00 dhblaz [root@fuel1 ~]# host node-70.mumms.com
20:00 dhblaz node-70.mumms.com has address 10.62.12.38
20:00 dhblaz I'm not sure why but the fuel master node expects it to have IP of 12.38
20:02 dhblaz /etc/astute.yaml and /etc/sysconfig/network-scripts/ifcfg-eth2 list 10.62.12.60 instead of what the fuel master node expects
20:02 dhblaz (at least what the dns shows is expected)
20:04 ArminderS has udev renamed the nics?
20:06 ArminderS check for macs in fuel & on udev
20:06 dhblaz ArminderS: that doesn't appear to be the problem in this case.
20:06 dhblaz Although we have this problem often
20:07 ArminderS 2 of my nodes showed as eth0 on fuel-web, but post install came up as eth2
20:08 ArminderS had to fix it in /etc/udev/rules.d/70-persistent-net.rules to make it work
20:08 ArminderS that was for admin (pxe) nics
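
To spot such a rename, comparing the udev pins against the live NICs is usually enough (file path from the log):

    cat /etc/udev/rules.d/70-persistent-net.rules   # MAC-to-name pins from first boot
    ip -o link                                      # live interface names and MACs
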
20:12 tatyana joined #fuel
20:25 mrasskazov joined #fuel
20:34 dhblaz Does anyone know why force is used for debian but not centos?
20:34 dhblaz if Facter.value(:osfamily) == 'Debian'
20:34 dhblaz   # add force for debian-based OS ([PRD-2132])
20:34 dhblaz   ifup(['--force', @resource[:interface]])
20:34 dhblaz else
20:34 dhblaz   ifup(@resource[:interface])
20:34 dhblaz end
20:34 mrasskazov joined #fuel
20:35 mrasskazov1 joined #fuel
20:38 xdeller because debian will not reconfigure an interface in the up state without this option
20:38 dhblaz I had a situation last night where an interface didn't come up (on centos)
20:39 dhblaz check out @647 http://paste.openstack.org/show/pVGYOwinhGkg8eyLL1RB/
20:39 dhblaz I also see that there is an option to set a sleep_time for the resource.  This is something else that may help if there is a race condition
20:39 dhblaz But I am not sure which would help more.
20:41 dhblaz The sleeps look like this:
20:41 dhblaz sleep @resource[:sleep_time]
20:41 dhblaz I am not sure where I would define this sleep_time
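
To find where that parameter lives, grepping the puppet modules on the master is the quickest route (the module name is an assumption based on the L3_if_downup type):

    # locate the sleep_time parameter and its default
    grep -rn sleep_time /etc/puppet/modules/l23network/
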
20:44 xdeller what driver is in use and what`s on the other side?
20:44 dhblaz It looks like the default is 3 though
20:45 dhblaz What do you mean "what's on the other side"?
20:45 xdeller I mean switch model or so
20:46 xdeller it may take many seconds for switches to change link state
20:46 dhblaz Oh, the "switch" is an HP Flex-10
20:47 dhblaz I believe the driver is e1000
20:47 xdeller i`ll be glad if you will measure this delay on it
20:47 xdeller no, host driver does not matter
20:49 dhblaz Sorry, I thought that is what you wanted when you asked "what is driver in use"
20:50 dhblaz I'm about to try another deployment
20:50 dhblaz I added force for centos and changed the default sleep_time to 10
20:52 e0ne joined #fuel
20:53 xdeller something like ip link set if0 down; sleep 10 ; ip link set if0 up; p=0 ; while [ -z "$(ethtool if0 | grep "Link detected: yes")" ] ; do  echo $((p++)); sleep 1;  done
20:53 dhblaz This doesn't match what is happening very closely
20:54 dhblaz The L3_if_downup function does down/flush/up
20:54 xdeller just a characteristic of your switching hardware, e.g. the timeout for link detection
20:54 dhblaz It isn't clear how it determines whether 'up' worked
20:55 xdeller no, I just need it to verify if L3 may handle this
20:55 xdeller can you please try it on your hw?
20:55 dhblaz I will
20:56 dhblaz But I need to deploy another environment
20:56 xdeller it can be done even in the bootstrap env
20:56 dhblaz okay
20:57 xdeller thank you
21:01 dhblaz [root@bootstrap ~]# ip link set eth0 down; sleep 10 ; ip link set eth0 up; p=0 ; while [ -z "$(ethtool eth0 | grep "Link detected: yes")" ] ; do  echo $((p++)); sleep 1;  done
21:01 dhblaz 0
21:01 dhblaz 1
21:01 dhblaz 2
21:01 xdeller just three sec?
21:02 dhblaz Yes
21:02 dhblaz I have run it a few times now with the same results
21:02 xdeller thanks, will verify tomorrow
21:03 dhblaz But it looks like you only give it 3 seconds:
21:03 dhblaz 2013-11-06T22:29:25.672679+00:00 notice:  (L3_if_downup[br-storage](provider=ruby)) Interface 'br-storage' flush.
21:03 dhblaz 2013-11-06T22:29:28.801945+00:00 notice:  (L3_if_downup[br-storage](provider=ruby)) Can't put interface 'br-storage' to UP state.
21:04 xdeller I mean the same
21:04 xdeller thanks again
21:12 e0ne joined #fuel
21:14 dhblaz I removed force before puppet ran
21:14 dhblaz because I think it would call ifup --force
21:14 dhblaz and I don't believe this is supported on centos
21:23 dhblaz aglarendil: Do you have a fix for the udev rules problem that I could test
21:23 dhblaz I just had another deployment run into this problem
21:24 dhblaz I'd love to test the fix
21:30 tatyana_ joined #fuel
21:52 dhblaz xdeller: I have been thinking about this br-storage issue we just talked about
21:52 dhblaz you had me look at an L1 issue, but I don't think the br-storage interface would be affected by it.
22:04 angdraug dhblaz: I suspect xdeller and aglarendil went to sleep, it's 1am where they are...
22:05 dhblaz Good to know ;)
22:05 angdraug which udev fix were you looking for?
22:06 angdraug maybe I can dig it up for you
22:08 dhblaz https://bugs.launchpad.net/fuel/+bug/1248404
22:10 dhblaz Hmmm, puppet just encountered this:
22:10 dhblaz 2013-11-07 21:53:18 NOTICE
22:10 dhblaz (/Stage[main]/Quantum::Agents::L3/Corosync::Cleanup[p_quantum-l3-agent]/Exec[crm resource cleanup p_quantum-l3-agent]) Triggered 'refresh' from 1 events
22:10 dhblaz 2013-11-07 21:59:23 ERR
22:10 dhblaz (/Stage[main]/Quantum::Agents::L3/Service[quantum-l3]/ensure) change from stopped to running failed: execution expired
22:12 dhblaz On the second run it worked okay
22:19 angdraug we don't yet have a fix for that udev issue
22:20 angdraug the execution expired is a known bug, we're still wrestling with that one, too. there's a race between crm start and crm cleanup
22:25 dhblaz It seems like a retry would do the trick on that one.
22:26 dhblaz I'm about to file a bug against L3_if_downup
22:26 dhblaz it looks like it does this:
22:26 dhblaz ifdown br-storage; sleep 3; ip addr flush br-storage; sleep 3;ifup br-storage
22:26 dhblaz But this appears to always fail on subsequent runs for these bridge interfaces
22:27 dhblaz [root@node-76 ~]# ifdown br-storage; sleep 3; ip addr flush br-storage; sleep 3;ifup br-storage
22:27 dhblaz Error, some other host already uses address 10.62.4.2.
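
That message comes from the duplicate-address probe CentOS's ifup-eth runs before assigning an address; it can be reproduced in isolation (flags as in el6's ifup-eth):

    # probe the segment for another holder of the address
    arping -q -c 2 -w 3 -D -I br-storage 10.62.4.2
    echo $?   # non-zero means a reply was seen, i.e. ifup would refuse
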
23:25 e0ne joined #fuel
23:31 e0ne joined #fuel
23:35 dhblaz One of my osd nodes can't talk to the controller.  So this command fails:
23:35 dhblaz ceph-deploy --overwrite-conf config pull node-76
23:35 dhblaz Pings time out in both directions
23:35 dhblaz Seems like the troublesome osd (node-80) can't talk to any nodes on the management network
23:41 dhblaz I'm trying to figure out how ovs is configured to connect br-mgmt to a physical nic
23:41 dhblaz But I'm struggling to find the command
23:47 dhblaz This command lists the bridges: ovs-vsctl list-br
23:47 dhblaz This command lists the interfaces for a bridge: ovs-vsctl list-ports br-mgmt
23:47 dhblaz (listing for the br-mgmt in this case)
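
For the br-mgmt-to-NIC question, the full topology dump also shows the patch ports that (by assumption about Fuel's layout) stitch the bridges together:

    ovs-vsctl show                 # whole OVS topology, including patch ports
    ovs-vsctl list-ports br-mgmt   # ports attached to br-mgmt
    ovs-vsctl iface-to-br eth0     # which bridge an interface belongs to (hypothetical NIC name)
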
