
IRC log for #fuel, 2013-12-18


All times shown according to UTC.

Time Nick Message
23:11 e0ne joined #fuel
23:17 miroslav_ mutex: Please share your `crm status` output. You may use http://paste.openstack.org/
23:30 mutex sure!
23:30 mutex http://paste.openstack.org/show/55215/
23:32 mutex I'm just not sure how to get things right
23:32 mutex I keep rebooting cluster members but it doesn't seem to make things better
23:34 mutex these are the error logs when I issue a 'resource start' http://paste.openstack.org/show/55216/
23:34 mutex seems like cluster is out of sync somehow
23:37 mutex also I am running fuel 3.2.1
23:41 miroslav_ mutex: Is your Fuel master node up and running?
23:41 mutex yes
23:41 mutex I use it to login to all of my controllers
23:43 miroslav_ mutex: It looks like even HAProxy is out of sync, which is very weird. Please go to any controller and try to ping any other controller
23:44 miroslav_ mutex: Maybe the network connection between the interfaces is broken
23:46 miroslav_ mutex: What is your Openstack network segmentation type - VLAN or GRE?
23:46 mutex VLAN
23:47 mutex I can ping between controllers no problem
23:50 miroslav_ mutex: Please try the following
23:50 miroslav_ 1. Run `crm resource cleanup clone_p_quantum-openvswitch-agent` (twice)
23:51 mutex sure
23:51 mutex I performed cleanup on vip__public_old
23:51 miroslav_ 2. `crm resource cleanup clone_p_quantum-openvswitch-agent` one more time
23:52 miroslav_ 3. `crm resource start clone_p_quantum-openvswitch-agent`
23:52 mutex should I cleanup vip__public_old too ?
23:52 mutex (a second time)
23:53 mutex Call cib_replace failed (-1006): Application of an update diff failed
23:53 miroslav_ No, don't touch vip__public_old so far
23:53 mutex k
23:55 mutex the cib_replace error occurred when I issued resource start
23:55 miroslav_ 4. `crm status` and please share its output
23:56 miroslav_ Then, quite quickly run the following 3 commands:
23:56 miroslav_ crm resource cleanup vip__public_old
23:56 miroslav_ crm resource cleanup vip__public_old
23:56 miroslav_ crm resource start vip__public_old
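A minimal sketch of the cleanup-then-start pattern used above, assuming a standard Pacemaker crm shell and the resource names from this conversation (crm is cluster-wide, so it can be run from any controller):

    # Clear stale failcounts/errors for a resource, then ask Pacemaker to start it.
    RES=vip__public_old                 # or clone_p_quantum-openvswitch-agent
    crm resource cleanup "$RES"
    crm resource cleanup "$RES"         # repeated, as suggested in the conversation
    crm resource start "$RES"
    crm status                          # verify the result and check the Failed actions list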
23:57 mutex http://paste.openstack.org/show/55222/
23:58 mutex yeah they all seem to think node-6 is holding up the issuing of these commands
00:00 miroslav_ Please share full crm resource status after the last command as well
00:01 mutex sure
00:03 mutex http://paste.openstack.org/show/55223/ for after public_old
00:04 miroslav_ There is no Failed actions list in the last paste.
00:05 miroslav_ OK, please do the following:
00:05 mutex failed actions from the crmd logs ?
00:06 miroslav_ `crm resource stop vip__public_old` , wait 2 minutes and run `crm status` and share its results
00:06 miroslav_ yes, from crmd logs
00:06 mutex k
00:07 miroslav_ Next, run `crm resource unmanage vip__public_old` and wait 1 minute
00:08 mutex k
00:08 miroslav_ 3. `crm resource manage vip__public_old`
00:09 miroslav_ 4. `crm resource start  vip__public_old` and immediately after `crm status` and share full output from the last crm status as well
00:10 miroslav_ Then wait 2 minutes again and run `crm status` one more time and share its results
00:10 mutex k
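The stop / unmanage / manage / start cycle above, condensed into a hedged sketch with the suggested waits (timings and the resource name come from this conversation, not from any official procedure):

    RES=vip__public_old
    crm resource stop "$RES"      && sleep 120 && crm status
    crm resource unmanage "$RES"  && sleep 60
    crm resource manage "$RES"
    crm resource start "$RES"     && crm status
    sleep 120                     && crm status   # check again after Pacemaker settles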
00:11 mutex completed wait 2 min: http://paste.openstack.org/show/55224/
00:13 miroslav_ Great, vip__public_old disappeared, so we can manage it
00:15 mutex yay
00:16 rongze joined #fuel
00:16 miroslav_ ?
00:18 mutex http://paste.openstack.org/show/55225/
00:21 miroslav_ In 2 minutes we will know if it is a network failure or just the public VIP
00:24 miroslav_ For future reference - if you are going to restart a controller in HA mode, please wait about 5 minutes before the next action. Pacemaker may need up to 5 minutes to reconfigure the remaining part of the cluster
00:25 mutex k
00:27 mutex how can i restart in HA mode ?
00:27 miroslav_ what does `crm status` show now, after the 2 minutes of waiting?
00:27 mutex for node-6 ?
00:27 mutex status has not changed as far as I can tell
00:28 miroslav_ Simply take pauses between controller restarts
00:28 mutex ok
00:29 miroslav_ for node-6 - for any node; crm is cluster-wide
00:29 mutex takes a long time for the nodes to restart sadly
00:29 miroslav_ Are you restarting nodes now?
00:30 mutex restarting node-6 only
00:30 Vidalinux joined #fuel
00:30 miroslav_ What is the output from `crm resource list` ?
00:32 miroslav_ Oh, it is not a good idea to restart any controller before we have restored vip__public_old operation
00:32 mutex sad
00:32 mutex http://paste.openstack.org/show/55228/
00:35 miroslav_ Was it the whole output from crm resource list?
00:36 miroslav_ Please include the empty command prompt after the list, so we can see whether it is the complete output or only part of it
00:40 mutex sure
00:40 mutex what I pasted is the complete output
00:41 mutex ok node-6 rebooted
00:41 mutex so to restore vip__public_old, I need to manage and then start again ?
00:41 mutex hey wait
00:42 mutex vip__public_old is back!
00:42 mutex http://paste.openstack.org/show/55229/
00:43 mutex horizon dashboard doesn't seem to work yet, but at least I can ping the vip
00:43 * mutex investigates
00:44 miroslav_ Then, please wait about 2 minutes and paste full `crm resource list` again
00:44 mutex sure
00:44 mutex is 120sec the timeout for failovers to occur ?
00:44 miroslav_ It looks like the MySQL/Galera cluster startup and assembly is still in progress
00:45 mutex k
00:46 miroslav_ It depends on the failover, the total number of controllers, the Pacemaker timeout settings, and node speed.
00:54 mutex I see
00:54 mutex I have 6
00:55 miroslav_ mutex: What is the current output from `crm resource list` ?
00:55 mutex http://paste.openstack.org/show/55234/
00:55 mutex looks better, and I can login to horizon
00:56 mutex but my original problem of failed to get vnc console on VM is still present :-/
00:58 mutex thanks for helping with the crm issue though!
00:58 miroslav_ What web browser are you using?
00:58 miroslav_ for horizon
00:58 mutex oh dear, nova-manage service list is all XXX
00:59 mutex I use either Chrome or Firefox
00:59 miroslav_ Do you use SSL to connect to Horizon?
01:00 mutex I don't think so
01:00 mutex at least, I don't put in https
01:01 mutex I just get RPC timeouts in the logs from attempting to launch a vnc console
01:01 miroslav_ Please check whether there is a special `unsafe script` sign in the browser address bar when you are on the VNC console page
01:01 mutex sure
01:01 mutex worked last week though
01:01 miroslav_ Please simply restart openstack-nova services on all controllers
01:01 mutex yeah, ok
01:02 mutex wishing I had the time to read the mco manpages and do it that way ;-)
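One way to carry out the suggested restart of the OpenStack Nova services on a controller - a sketch only; the exact service names vary by distribution and release, so check what is actually installed rather than trusting the pattern below:

    # On each controller (CentOS-style init scripts assumed):
    ls /etc/init.d/ | grep nova                      # see which nova services exist
    for svc in /etc/init.d/openstack-nova-*; do
        service "$(basename "$svc")" restart
    done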
01:03 miroslav_ So far only 2 people of ~100 confirmed they had read the docs
01:05 mutex which docs ?
01:06 mutex I read the relnotes ;-)
01:06 mutex and deployment docs
01:06 miroslav_ Even this one? http://docs.mirantis.com/fuel/fuel-3.2/frequently-asked-questions.html
01:07 mutex actually no
01:07 mutex how I missed them, I don't know
01:10 mutex I claim blindness
01:10 mutex selective blindness
01:11 miroslav_ Well, about the non-working VNC console - please check whether the browser is blocking it and showing only a small sign - after the recent updates both Chrome and Firefox may do so
01:11 mutex yeah sure
01:12 miroslav_ And please check the nova services on the compute nodes, including all *vnc* ones. The console actually lives on the computes
01:13 mutex sure
01:13 mutex yeah I know
01:13 mutex I was debugging yesterday
01:13 mutex the VNC port is open
01:13 mutex I could even telnet to it
01:13 mutex that was before I rebooted some controllers and hosed the cluster
01:14 miroslav_ There is no console on the controllers at all; controllers are just transport for the console. There is a bug with noVNC - it may silently crash after running for a long time, about a month or two
01:14 mutex oh, this is totally not that
01:14 mutex VNC was working last wednesday for me, then I came back to it yesterday and it didn't work
01:14 mutex should consoleauth and cert be running on all nodes ?
01:15 mutex s/nodes/controller nodes/
01:15 mutex it only seems to work on some
01:15 mutex restarting has no effect
01:15 miroslav_ Usually a simple restart of all VNC related services on the compute node resolves console issues
01:15 mutex I see
01:20 miroslav_ Yes, nova-consoleauth should be running as well
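A sketch of how to see which controllers show 'XXX' and bounce the console services (service names assume a CentOS-based Fuel deployment; adjust to what is installed):

    # On any controller, with the admin credentials sourced:
    nova-manage service list                       # ':-)' means the service is heartbeating, 'XXX' means it is not
    # On a controller where the services show XXX:
    service openstack-nova-consoleauth restart
    service openstack-nova-cert restart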
01:21 mutex hrm
01:21 mutex XXX on 3/6 nodes
01:21 mutex restarting doesn't seem to help
01:21 mutex I guess I'll have to increase logging
01:28 miroslav_ And what is the nova-cert status?
01:29 mutex the same 3/6 nodes don't start
01:30 mutex actually 4/6
01:30 mutex nothing in the logs though
01:31 mutex also the logs don't appear to be split even though the args all say to log to a separate /var/log/nova/nova-console.log
01:31 mutex for example
01:32 mutex looks like there might be a problem with AMQP
01:38 miroslav_ Then try `service rabbitmq-server restart` on the controllers one by one, sequentially
01:38 miroslav_ And please wait until it finishes restarting before restarting the next one
01:39 mutex hmmm
01:39 mutex ok
01:39 miroslav_ Or run `rabbitmqctl cluster_status` first
01:40 mutex yeah
01:48 miroslav_ You may also check the remaining free space on the root partition on the controllers where RabbitMQ fails to start
01:48 miroslav_ It requires at least 1 GB free
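A sketch of the RabbitMQ checks suggested above (run on each controller, one at a time, and let each restart finish before touching the next node):

    rabbitmqctl cluster_status            # are all controllers listed as running nodes?
    df -h /                               # RabbitMQ wants roughly 1 GB free on the root partition
    service rabbitmq-server restart       # only if this node looks broken
    rabbitmqctl status                    # confirm the node came back before moving on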
01:49 rmoe joined #fuel
01:51 mutex heh
01:52 mutex things are slowly being repaired
01:52 mutex yay
01:52 miroslav_ What does the latest 'yay' mean?
01:53 mutex the things are slowly being repaired
01:53 mutex it is almost as though some of the rabbitmq processes were stuck or broken
02:01 miroslav_ Yes, they would be broken after a complete network outage.
02:02 mutex k
02:02 mutex so, vnc still seems broken
02:02 mutex I have restarted the vncproxy daemons
02:03 miroslav_ You have to restart nova services after rabbitmq failure
02:03 mutex yeah I did
02:03 mutex they are all smilies now
02:04 mutex if I check the logs, I get RPC timeouts
02:04 mutex for the amqp code
02:04 miroslav_ Which service gets RPC timeouts?
02:06 mutex http://paste.openstack.org/show/55243/
02:06 mutex get_spice_console
02:07 mutex that is from the controller currently hosting vip__public_old
02:11 miroslav_ Have you restarted the rabbitmq services on all controllers?
02:14 miroslav_ And what is status of openstack-nova-consoleauth service?
02:14 rongze joined #fuel
02:17 mutex consoleauth is all better now
02:17 mutex and yes
02:18 mutex well maybe there was one I didn't restart rabbit on
02:18 mutex so let me check
02:20 wputra joined #fuel
02:31 mutex http://paste.openstack.org/show/55246/
02:32 mutex spice-html5
02:32 mutex interesting
02:34 mutex shouldn't it be getting vnc console ?
02:48 miroslav_ Spice? Please log off from dashboard, clean up cookies in browser, log in, go to console page and try shift+reload page
02:52 mutex maybe my browser is too smart for its own good ;-)
02:52 mutex yeah I did all that but it is still stuck on the spinning console page
02:54 vkozhukalov joined #fuel
02:55 mutex yeah same problem actually
02:55 mutex I guess i'll debug tomorrow
02:56 * mutex &
02:56 miroslav_ Hmm, the spice console can only be turned on via nova.conf
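The console type is selected in nova.conf on the compute nodes. A hedged way to check which one is enabled (option names are from Nova of roughly that era; verify them against your release):

    # On a compute node:
    grep -E '^(vnc_enabled|novncproxy_base_url|vncserver_)' /etc/nova/nova.conf
    grep -A3 '^\[spice\]' /etc/nova/nova.conf    # a [spice] section with enabled=true would explain the get_spice_console calls
    # After changing nova.conf, restart nova-compute (and the proxy services on the controllers).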
03:10 rongze joined #fuel
03:28 ArminderS joined #fuel
03:43 h6w joined #fuel
03:44 h6w Hi.  Brand new fuel installation from the Mirantis image and I'm getting a 500 Error on the server when I attempt to install from PXE boot.
03:44 h6w Log says: Fault: <Fault 1: "<type 'exceptions.NameError'>:global name 'out_path' is not defined">
03:46 wputra hi all. i have these logs from when the l3-agent stopped http://paste.openstack.org/show/55254/
03:47 wputra is it normal? seems the l3 agent connects & disconnects repeatedly
03:48 wputra before the l3 agent stopped, the ovs agent on the same node was in a failed state
03:55 IlyaE joined #fuel
04:00 IlyaE joined #fuel
04:20 book` joined #fuel
05:24 SergeyLukjanov joined #fuel
05:30 rongze joined #fuel
05:41 xarses joined #fuel
05:47 mihgen joined #fuel
06:08 rongze joined #fuel
06:12 SergeyLukjanov joined #fuel
06:16 vkozhukalov joined #fuel
06:53 IlyaE joined #fuel
06:57 ArminderS joined #fuel
06:59 ArminderS- joined #fuel
07:08 Fecn joined #fuel
07:31 vk joined #fuel
07:39 mihgen joined #fuel
07:40 wputra can anybody tell me about the relationship between p_quantum-openvswitch-agent, p_quantum-dhcp-agent, and p_quantum-l3-agent?
07:41 wputra seems the dhcp & l3 agents can't start without starting the ovs agent first
07:42 wputra once we start the ovs agent, corosync & pacemaker automatically start the other agents too
07:43 vk joined #fuel
07:45 wputra unfortunately, when the ovs agent is seen as failed by corosync, the dhcp/l3 agents are killed and moved to the other node although no agent is shown as failed in quantum agent-list
07:46 e0ne joined #fuel
07:47 wputra as a result, the instances can't get normal networking, since the dhcp agent, the l3 agent, or both have failed
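The start dependency wputra describes usually comes from Pacemaker ordering/colocation constraints rather than from the agents themselves. A sketch of how to inspect them (resource names are taken from this conversation; the constraint names shown in the comment are purely illustrative):

    # On any controller:
    crm configure show | grep -E 'order|colocation'
    # Illustrative output might look like:
    #   order ovs_before_l3 inf: clone_p_quantum-openvswitch-agent p_quantum-l3-agent
    #   colocation l3_with_ovs inf: p_quantum-l3-agent clone_p_quantum-openvswitch-agent
    crm resource status clone_p_quantum-openvswitch-agent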
07:48 Arminder joined #fuel
07:51 Arminder joined #fuel
07:57 wputra i think i must examine the p_quantum-openvswitch-agent log first. but where is the log stored?
08:11 SteAle joined #fuel
08:12 e0ne joined #fuel
08:31 vkozhukalov joined #fuel
08:31 mrasskazov joined #fuel
08:32 SergeyLukjanov joined #fuel
08:36 mihgen joined #fuel
08:39 ArminderS joined #fuel
09:02 e0ne joined #fuel
09:03 mihgen wputra: logs should be put to master node by rsyslog, have you had a look there?
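A sketch of where to look, assuming the default Fuel rsyslog layout (the exact paths are an assumption and may differ between Fuel releases):

    # On the Fuel master node:
    ls /var/log/remote/                                  # typically one directory per deployed node
    grep -ri openvswitch-agent /var/log/remote/node-*/ | tail
    # On the controller itself, the agent usually also logs locally:
    less /var/log/quantum/openvswitch-agent.log          # path may be /var/log/neutron/ on later releases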
09:20 anotchenko joined #fuel
09:26 rvyalov joined #fuel
09:45 fandikurnia01 joined #fuel
09:46 ruhe joined #fuel
09:49 teran joined #fuel
09:50 anotchenko joined #fuel
10:37 teran joined #fuel
10:39 anotchenko joined #fuel
10:39 teran_ joined #fuel
10:42 vk joined #fuel
11:23 teran joined #fuel
11:24 rongze joined #fuel
11:53 anotchenko joined #fuel
12:01 Vidalinux joined #fuel
12:02 xdeller joined #fuel
12:08 ArminderS joined #fuel
12:23 e0ne joined #fuel
12:25 anotchenko joined #fuel
12:26 anotchenko joined #fuel
12:29 aglarendil_ joined #fuel
12:50 e0ne_ joined #fuel
13:04 fandikurnia01 joined #fuel
13:08 fandikurnia01 joined #fuel
13:16 vk joined #fuel
13:21 rongze joined #fuel
13:23 MiroslavAnashkin h6w: Sorry for the delay, we currently have no people in Australia/Oceania timezones to answer quickly.
13:24 MiroslavAnashkin h6w: Please check that no other DHCP server exists inside the network segment you use for PXE boot
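One hedged way to look for a rogue DHCP server on the PXE segment, run from the Fuel master (the interface name is an assumption - use whichever interface carries the admin/PXE network):

    # Watch DHCP traffic; any OFFER whose source is not the Fuel master's own IP
    # points to a second DHCP server on the segment.
    tcpdump -i eth0 -n -e port 67 or port 68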
13:35 aglarendil_ joined #fuel
13:51 SergeyLukjanov joined #fuel
13:56 mattymo joined #fuel
14:14 jkirnosova_ joined #fuel
14:18 evgeniyl joined #fuel
14:19 vk joined #fuel
14:19 jkirnosova joined #fuel
14:25 vk joined #fuel
15:14 IlyaE joined #fuel
15:16 ruhe joined #fuel
15:16 Vidalinux joined #fuel
15:37 SergeyLukjanov joined #fuel
15:40 rongze joined #fuel
15:44 teran joined #fuel
15:58 IlyaE joined #fuel
16:07 evgeniyl joined #fuel
16:19 ruhe joined #fuel
16:30 rvyalov joined #fuel
16:51 mutex Hi, so I am trying to debug why my horizon image console will not work
16:52 mutex looks like there is an AMQP RPC timeout
16:54 mutex I thought maybe it was related to HTML5 in my browser, so I disabled it
16:54 mutex http://paste.openstack.org/show/55342/
16:54 mutex it was all working last week and failed for no apparent reason
16:55 rongze joined #fuel
16:56 e0ne joined #fuel
17:14 teran joined #fuel
17:21 e0ne_ joined #fuel
17:22 e0ne joined #fuel
17:27 angdraug joined #fuel
17:32 ruhe joined #fuel
17:34 vkozhukalov joined #fuel
18:11 rmoe joined #fuel
18:11 xarses joined #fuel
18:14 teran joined #fuel
18:31 MiroslavAnashkin mutex: Please check your nova.conf on the compute nodes. The last log you shared yesterday shows OpenStack cannot launch a SPICE console, not a VNC one.
18:31 isd joined #fuel
18:34 isd Hey, we're trying to get openstack set up on some new hardware at the MOC, using fuel. We're using the 4.0 tech preview for its better neutron support, but having trouble getting it to work.
18:34 isd specifically, it's setting up a "public" network on 172.16.0.0/16, which seems not to route anywhere, and the control node isn't picking up the compute nodes.
18:34 isd We've only got one NIC on each machine, which I recall used to be something that you really couldn't do with fuel and neutron. Is that still the case?
18:34 isd Also, what else might I look at? Fuel is still a bit of a black box to me.
18:39 angdraug isd: yes, with 4.0 tech preview you still need to have at least 2 interfaces
18:39 angdraug there is a workaround documented here: https://review.openstack.org/#/c/62495/1/pages/reference-architecture/0100-bonding.rst
18:40 angdraug it is talking about NIC bonding, but might be applicable to your configuration, too
18:41 angdraug oh, and the public network doesn't have to be 172.16.0.0/16, it's just a default that you can change in the network settings tab in the UI
18:42 e0ne joined #fuel
18:43 isd angdraug: thanks.
18:45 mutex MiroslavAnashkin: yeah I got it to ask for VNC today as you can see in the logs above
18:45 e0ne joined #fuel
18:45 mutex I am wondering if my consoleauth is somehow broken
18:57 mutex MiroslavAnashkin: nova.conf also has vnc enabled
18:57 mutex on my compute nodes
19:03 MiroslavAnashkin mutex: Hmm, interesting. I'll try to deploy a similar config in an hour and check where the VNC failure may happen.
19:04 mutex I had to disable html5 in firefox to get it to ask for vnc
19:05 mutex next path I am proceeding down is debugging rabbitmq to see what is happening to the auth request the browser makes
19:05 mutex I found this very helpful: http://goo.gl/aJa2Ds
19:06 MiroslavAnashkin We never disable HTML5 and VNC works
19:06 mutex heh, yeah I think the spice vs vnc problem is only slightly related
19:07 mutex there are RPC timeouts no matter which console auth request goes out
19:07 mutex and like I said, it was working last week
19:17 rmoe_ joined #fuel
19:32 IlyaE joined #fuel
19:35 e0ne joined #fuel
19:39 evgeniyl` joined #fuel
20:08 e0ne joined #fuel
21:00 teran joined #fuel
22:19 e0ne joined #fuel
22:28 e0ne joined #fuel
22:53 rongze joined #fuel
22:59 e0ne joined #fuel
