
IRC log for #fuel, 2014-01-24


All times shown according to UTC.

Time Nick Message
23:09 rmoe joined #fuel
00:07 xarses joined #fuel
00:18 dhblaz joined #fuel
01:52 rmoe joined #fuel
02:03 xarses joined #fuel
02:38 kevein joined #fuel
02:45 Arminder joined #fuel
03:12 kevein joined #fuel
03:14 rongze joined #fuel
03:17 Arminder- joined #fuel
03:27 topshare joined #fuel
03:58 dhblaz joined #fuel
04:32 richardkiene_ joined #fuel
04:42 ArminderS joined #fuel
04:54 rongze joined #fuel
05:05 ArminderS joined #fuel
05:18 vkozhukalov joined #fuel
05:25 rongze joined #fuel
05:27 rongze_ joined #fuel
05:36 topshare joined #fuel
05:37 ogelbukh joined #fuel
05:38 kevein joined #fuel
06:25 rongze joined #fuel
06:37 ArminderS joined #fuel
06:39 ArminderS joined #fuel
06:40 ArminderS joined #fuel
06:43 ArminderS joined #fuel
06:47 ArminderS joined #fuel
07:09 rongze joined #fuel
07:27 AndreyDanin joined #fuel
08:02 topshare joined #fuel
08:03 topshare joined #fuel
08:11 e0ne joined #fuel
08:20 topshare joined #fuel
08:22 miguitas joined #fuel
08:27 vkozhukalov joined #fuel
08:36 richardkiene_ I'm having an issue where my neutron-l3-agent will not start
08:37 richardkiene_ I've issued "crm resource restart clone_p_neutron-plugin-openvswitch-agent"
08:37 richardkiene_ I've also tried "crm resource restart p_neutron-l3-agent" on the host assigned to it
08:40 richardkiene_ am I missing a magic command?
08:50 teran joined #fuel
09:00 vk joined #fuel
09:00 teran_ joined #fuel
09:20 teran joined #fuel
09:49 mrasskazov1 joined #fuel
10:16 mrasskazov1 joined #fuel
10:26 mrasskazov1 joined #fuel
11:25 rongze joined #fuel
11:27 akupko joined #fuel
11:33 teran joined #fuel
11:33 teran joined #fuel
12:09 e0ne joined #fuel
12:10 richardkiene_ I believe I've fixed the l3 agent not starting issue
12:10 richardkiene_ but I believe there is a pretty weird timing bug with corosync
12:18 rongze joined #fuel
12:47 MiroslavAnashkin joined #fuel
12:54 MiroslavAnashkin joined #fuel
12:55 ArminderS joined #fuel
13:11 ArminderS joined #fuel
13:24 richardkiene_ Now I'm dealing with multiple neutron agents for some reason. Has anyone encountered this before?
13:36 MiroslavAnashkin richardkiene_: What type of neutron agents do you mean? L3, DHCP, OVS, etc?
13:36 richardkiene_ L3 and DHCP
13:36 richardkiene_ The networking issue I was experiencing a few days ago came back
13:37 richardkiene_ and the l3 agent was stopped (which I'm assuming was the cause of the complete lack of connectivity)
13:37 richardkiene_ MiroslavAnashkin: http://paste.openstack.org/show/61816/
13:38 richardkiene_ I've run `crm resource restart clone_p_neutron-plugin-openvswitch-agent` multiple times
13:38 MiroslavAnashkin richardkiene_: We plan and hope to get multiple L3 agents in Fuel 4.1.
13:38 richardkiene_ I've done crm cleanup on each of the services multiple times
13:38 richardkiene_ and done rolling restarts
13:40 richardkiene_ You'll notice that `crm resource list` is in conflict with `neutron agent-list` http://paste.openstack.org/show/61817/
13:40 richardkiene_ and I can't get DHCP to start for the life of me
13:40 MiroslavAnashkin richardkiene_: But regarding the multiple DHCP agents - a zebra cannot walk on 5 legs; it begins to flounder.
13:42 richardkiene_ the only way I was able to fix this before was by rebooting the boxes, and that seems rather heavy handed
13:42 MiroslavAnashkin richardkiene_: So, only single DHCP agent instance is possible per L2 segment
13:42 richardkiene_ yep that makes sense
13:43 richardkiene_ not quite sure how I ended up with 3 :)
13:43 MiroslavAnashkin richardkiene_: I see your L3 agent is started in unmanaged mode.
13:44 MiroslavAnashkin richardkiene_: please stop it first or even kill.
13:45 MiroslavAnashkin richardkiene_: Then, please run `crm resource cleanup clone_p_neutron-plugin-openvswitch-agent`
13:45 richardkiene_ okay, doing that now
13:46 MiroslavAnashkin richardkiene_: and then `crm resource start clone_p_neutron-plugin-openvswitch-agent`
13:46 richardkiene_ definitely done this before, hopefully just in the wrong order or something :)
13:47 MiroslavAnashkin And please give CRM a 2 minute timeout after each command
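[Editor's note] The stop/cleanup/start sequence MiroslavAnashkin describes above, as a shell sketch. This is a dry run by default: CRM_CMD and PAUSE are assumptions introduced here, not commands from the log; set CRM_CMD=crm and PAUSE=120 to run it on a real controller.

```shell
# Dry-run sketch of the CRM recovery sequence described above.
CRM_CMD="${CRM_CMD:-echo crm}"   # set CRM_CMD=crm on a real controller
PAUSE="${PAUSE:-0}"              # set PAUSE=120 for the recommended 2-minute gaps

run_recovery() {
  $CRM_CMD resource stop p_neutron-l3-agent
  sleep "$PAUSE"
  $CRM_CMD resource cleanup clone_p_neutron-plugin-openvswitch-agent
  sleep "$PAUSE"
  $CRM_CMD resource start clone_p_neutron-plugin-openvswitch-agent
}

run_recovery
```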
13:47 richardkiene_ oh okay, I've run cleanup, I'll wait a few minutes before the start
13:49 richardkiene_ still says unmanaged and same states
13:50 MiroslavAnashkin richardkiene_: OK, please check the L3 agent log.
13:51 vk joined #fuel
13:51 richardkiene_ I was able to run cleanup on p_neutron-dhcp-agent and p_neutron-l3-agent successfully and stop both cleanly
13:52 richardkiene_ so it no longer says unmanaged
13:52 richardkiene_ I'm running `crm resource restart clone_p_neutron-plugin-openvswitch-agent` now
13:52 richardkiene_ now the dhcp agent is started (unmanaged) failed
13:52 richardkiene_ and the l3 agent is stopped :(
13:54 richardkiene_ I'll look for the log file
13:56 richardkiene_ nothing scary in the l3 log that I can see
13:58 bookwar joined #fuel
13:58 richardkiene_ so `neutron agent-list` says everything is good to go, but `crm resource list` doesn't think dhcp or l3 is running
13:58 richardkiene_ and I can't ping any of my instances, so I'd have to agree with `crm resource list`
14:00 richardkiene_ Finally something good in the logs: http://paste.openstack.org/show/61820/
14:00 richardkiene_ ERROR neutron.agent.l3_agent [-] Failed synchronizing routers
14:04 MiroslavAnashkin richardkiene_: From the log. Pidfile /var/lib/neutron/external/pids/207db4f2-916b-44da-b909-3d52c8e2487f.pid already exist. Daemon already running?
14:04 MiroslavAnashkin richardkiene_: Please stop clone_p_neutron-plugin-openvswitch-agent and then remove this PID
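[Editor's note] The stale-PID cleanup suggested above, sketched in shell. The pidfile path is the one from the log; CRM_CMD is a dry-run stub introduced here, and the helper only removes the pidfile when no live process owns it.

```shell
# Sketch: stop the OVS agent clone, then clear the stale router pidfile.
CRM_CMD="${CRM_CMD:-echo crm}"   # set CRM_CMD=crm on a real controller

clear_stale_pid() {
  pid_file="$1"
  [ -f "$pid_file" ] || return 0
  # remove the pidfile only if no process with that PID is alive
  if ! kill -0 "$(cat "$pid_file")" 2>/dev/null; then
    rm -f "$pid_file"
  fi
}

$CRM_CMD resource stop clone_p_neutron-plugin-openvswitch-agent
clear_stale_pid /var/lib/neutron/external/pids/207db4f2-916b-44da-b909-3d52c8e2487f.pid
```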
14:11 richardkiene_ hmm same rsult
14:11 richardkiene_ *result
14:12 richardkiene_ well the dhcp agent is still stopped, but the l3 agent did start that time
14:16 richardkiene_ I stopped everything and cleared out the pid files
14:16 richardkiene_ still no dice
14:19 MiroslavAnashkin richardkiene_: Please connect to the node, where L3 agent fails to start and run `crm resource migrate p_neutron-l3-agent`
14:23 MiroslavAnashkin richardkiene_: It looks like a new Neutron issue. They added PIDs to virtual entities, like routers, and this issue came from incorrect PID management.
14:24 MiroslavAnashkin richardkiene_: After the agent migration, please restart clone_p_neutron-plugin-openvswitch-agent one more time, maybe with cleanup.
14:25 richardkiene_ https://bugs.launchpad.net/fuel/+bug/1269334
14:25 richardkiene_ I've applied that patch on all controller nodes, if that is what you're referring to
14:35 xdeller joined #fuel
14:38 richardkiene_ Okay through cleaning up pid files and rebooting controllers I've managed to get everything in the correct state according to `crm resource list` and `neutron agent-list`
14:38 richardkiene_ however I still have no access in to the guest network
14:39 richardkiene_ and now I can ping it, I guess it just takes a minute
14:41 richardkiene_ does nova-compute take a bit to come back up after stopping neutron?
14:44 ArminderS joined #fuel
14:47 MiroslavAnashkin richardkiene_: No, nova-compute and the neutron-dependent nova services should be restarted
14:48 richardkiene_ ah gotcha, the only thing not running is nova-compute on each compute node
14:48 richardkiene_ however service nova-compute status says it is running
14:48 richardkiene_ I guess they still need a restart?
14:49 MiroslavAnashkin Yes, they lost connection by timeout
14:50 richardkiene_ gotcha, looks like everything is working again, thank you very much for your help
14:50 richardkiene_ I'm hoping that the randomly killed l3 agent is fixed by the patch I applied from https://bugs.launchpad.net/fuel/+bug/1269334
14:51 richardkiene_ heh so much for that, networking is dead again
14:53 richardkiene_ though the l3 agent is still running on one of the nodes
14:53 richardkiene_ the other 2 controller nodes show it as down
14:54 MiroslavAnashkin There should be a single instance of the L3 agent
14:54 richardkiene_ there are 3 instances of the l3 agent now
14:54 richardkiene_ 1 up and 2 down
14:54 MiroslavAnashkin Yes, 1 up and 2 down is correct
14:55 richardkiene_ k, yeah everything looks correct, but ICMP to floating IPs is not working
14:55 richardkiene_ I'll see if I can console to a host
14:55 richardkiene_ *guest
14:56 MiroslavAnashkin Are ICMP and necessary ports enabled in security group, applied to the instances?
14:57 richardkiene_ yes
14:57 richardkiene_ looks like DHCP is not handing out IPs to instances
14:58 richardkiene_ just FYI we were even serving traffic and remoting to guests in this deployment as of 7PM PT yesterday
14:59 MiroslavAnashkin Please check L3 agent log one more time, it may be virtual router issue as well.
15:00 richardkiene_ will do
15:03 richardkiene_ here is the latest from the l3 log: http://paste.openstack.org/show/61828/
15:04 richardkiene_ that is on the controller that has the UP l3 agent
15:04 richardkiene_ time is in UTC
15:10 MiroslavAnashkin If it was not you who sent SIGTERM, then it was pacemaker
15:11 MiroslavAnashkin Does it happen even with patch from https://bugs.launchpad.net/fuel/+bug/1269334 ?
15:13 richardkiene_ yeah I applied that patch to all the controllers already
15:14 richardkiene_ I'm assuming those SIGTERMs are me, since everything is still up according to both crm resource list and neutron agent-list
15:16 MiroslavAnashkin Do you have a security group named "firewall_driver"?
15:17 richardkiene_ I don't think so, but I'll check
15:18 richardkiene_ No, I cannot find a tenant/project with a security group named that
15:20 richardkiene_ that looks related to nova
15:20 richardkiene_ https://bugs.launchpad.net/nova/+bug/1040430
15:21 richardkiene_ Yeah line 40 of nova.conf has "firewall_driver=nova.virt.firewall.NoopFirewallDriver"
15:24 richardkiene_ I'm tempted to restart the ovs, but that has been the source of so much pain
15:25 MiroslavAnashkin No, I'm concerned about the underscore in the name. It was an OpenStack bug before Havana. Underscores were restricted characters in security group names, but were allowed by the UI
15:26 richardkiene_ No underscores in any of the security group names
15:33 richardkiene_ Should nova-consoleauth, nova-cert, nova-scheduler, be running on multiple controllers?
15:36 richardkiene_ Since it can't get worse, I'll restart the OVS
15:39 richardkiene_ after clone_p_neutron-plugin-openvswitch-agent was restarted the p_neutron-l3-agent failed to start cleanly
15:39 richardkiene_ it is Started (unmanaged) FAILED
15:39 richardkiene_ so yeah, something is up with the l3 agent
15:41 richardkiene_ but after a "service neutron-l3-agent restart" on the node that neutron believes is responsible for the l3 agent, it is working
15:42 richardkiene_ followed by a crm cleanup and now crm resource status shows everything successful
15:49 richardkiene_ https://bugs.launchpad.net/neutron/+bug/1210121
15:49 richardkiene_ apparently the firewall_driver ERROR log is a known thing and benign
15:55 book` joined #fuel
16:08 mrasskazov1 joined #fuel
16:27 MiroslavAnashkin richardkiene_: neutron-l3-agent should not be restarted directly. neutron-plugin-openvswitch-agent should start it with the necessary input parameter list.
16:31 richardkiene_ unfortunately I can't get it to reliably start with that command
16:31 richardkiene_ if I run that, it almost always says it is started (unmanaged) and failed
16:45 BillBen981 joined #fuel
17:00 e0ne joined #fuel
17:16 angdraug joined #fuel
17:50 MiroslavAnashkin richardkiene_: nova-consoleauth, nova-cert, nova-scheduler - all should run on each controller. All depend on neutron, so please restart them all after every neutron restart
17:53 MiroslavAnashkin And please run `crm resource manage p_neutron-l3-agent`, and then cleanup and restart clone_p_neutron-plugin-openvswitch-agent
17:53 richardkiene_ MiroslavAnashkin: I assume those all restart with `service nova-compute restart` ?
17:56 MiroslavAnashkin richardkiene_: No, every nova* service is independent. But they all depend on neutron. Please simply start any nova service in XXX state.
17:57 MiroslavAnashkin please start them with the `service <service name> start` command; some of these services work only under the "nova" account
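[Editor's note] A sketch of picking out the dead services from `nova-manage service list`, which marks dead services with XXX in the state column. The parsing helper is introduced here for illustration and assumes the usual column layout (binary name in the first column).

```shell
# Print the binary names of services `nova-manage service list` reports as dead.
dead_services() {
  # state column shows XXX for dead, :-) for alive; column 1 is the binary name
  awk '/XXX/ {print $1}' | sort -u
}

# Usage on a controller (commented out; requires a live OpenStack node):
# nova-manage service list | dead_services | while read -r svc; do
#   service "$svc" restart
# done
```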
17:58 vkozhukalov joined #fuel
17:58 richardkiene_ sweet I'll be very happy if it comes up correctly after issuing the commands in this order
18:00 richardkiene_ still starts up like this "p_neutron-l3-agent(ocf::mirantis:neutron-agent-l3): Started (unmanaged) FAILED"
18:01 Dr_Drache joined #fuel
18:02 Dr_Drache When Someone has a Min, I have an issue with a test deployment using fuel.
18:03 Dr_Drache deploying a controller using Ubuntu, the machine becomes unbootable.
18:03 Dr_Drache (doesn't seem like grub/kernel finished before the reboot)
18:10 Dr_Drache http://www.sendspace.com/file/8fz73y <-- my log snapshot
18:15 richardkiene_ MiroslavAnashkin: Now we're back to everything looking correct, but no traffic reaches the guest networks
18:15 richardkiene_ :(
18:36 MiroslavAnashkin richardkiene_: Where do you check for the traffic?
18:36 MiroslavAnashkin richardkiene_: or what command do you use?
18:37 richardkiene_ simple ping from different boxes in the datacenter, and my workstation over vpn
18:37 richardkiene_ just trying to hit the floating ip
18:38 Dr_Drache looks like I have a near repeat of bug #1264779, bummer
18:38 richardkiene_ however after all the restarting the guest machines don't even have IPs assigned via DHCP
18:38 richardkiene_ so the guests themselves do not have a route to anywhere
18:39 richardkiene_ I also did a tcpdump and watched the ICMP packets come in to the host machine, but never make it to the guest
18:47 xarses joined #fuel
18:56 e0ne joined #fuel
19:01 MiroslavAnashkin richardkiene_: Please do the following: 1. ssh to the machine where the L3 agent is currently running
19:02 MiroslavAnashkin richardkiene_: Run `ip netns list`
19:03 MiroslavAnashkin richardkiene_: run `source openrc`
19:03 MiroslavAnashkin richardkiene_: Run `neutron router-list`
19:04 MiroslavAnashkin richardkiene_: Find namespace, where qrouter-* is running
19:05 MiroslavAnashkin richardkiene_: Run `ip netns exec qrouter-* ping <internal IP of the instance>`
19:06 MiroslavAnashkin richardkiene_: You may also run `ip netns exec qrouter-* ip a` to see the list of available IP
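[Editor's note] The namespace-debugging steps above, collected into a small helper that only prints the per-namespace commands (so nothing here touches a live system). The first two steps - `ip netns list` and `source openrc && neutron router-list` on the node running the L3 agent - locate the qrouter namespace; the namespace and instance IP passed below are placeholders.

```shell
# Build the debug commands for a given qrouter namespace and instance IP.
netns_debug_cmds() {
  ns="$1"; target_ip="$2"
  printf 'ip netns exec %s ip a\n' "$ns"            # addresses held by the virtual router
  printf 'ip netns exec %s ping -c 3 %s\n' "$ns" "$target_ip"
}

# Placeholder namespace (UUID from the log) and a hypothetical instance IP:
netns_debug_cmds qrouter-207db4f2-916b-44da-b909-3d52c8e2487f 10.200.0.10
```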
19:06 richardkiene_ MiroslavAnashkin: part of it here: http://paste.openstack.org/show/61844/
19:07 MiroslavAnashkin richardkiene_: Please change qrouter-* to appropriate qrouter name
19:07 richardkiene_ unfortunately after all the restarting the agents are not back in entirely the right state
19:08 richardkiene_ so there are multiple l3 agents running at the moment
19:08 richardkiene_ node-62 is the agent presented by HAProxy though
19:10 richardkiene_ http://paste.openstack.org/show/61846/
19:10 richardkiene_ that is the remainder
19:10 MiroslavAnashkin richardkiene_: Oh, thank you for finding the bug. Looks like the pacemaker scripts monitor only resources run by pacemaker, but not the same resources run manually
19:11 MiroslavAnashkin richardkiene_: and it looks like you have chosen the incorrect qrouter
19:12 richardkiene_ which one should I pick?
19:12 MiroslavAnashkin Please run `ip netns exec qrouter-207db4f2-916b-44da-b909-3d52c8e2487f ip a` with each qrouter
19:14 MiroslavAnashkin richardkiene_: though, you already have 2 L3 agents, so one runs the correct qrouter copies, the other does not
19:14 richardkiene_ http://paste.openstack.org/show/61847/
19:15 richardkiene_ there is the result of that with each qrouter
19:15 MiroslavAnashkin Dr_Drache: Do you use HP SmartArray hardware?
19:15 Dr_Drache MiroslavAnashkin, yes sir. the controller is a HP.
19:16 Dr_Drache i400 IIRC.
19:20 Dr_Drache older unit for testing, and damn, P400i, I was close :P
19:20 Dr_Drache DL360 G5
19:20 MiroslavAnashkin Dr_Drache: Sorry about that. We discovered Fuel 4.0 works correctly only with selected SmartArray models - actually, only with the one we tested our SmartArray patch with.
19:21 Dr_Drache MiroslavAnashkin, that's unfortunate. our Dell controllers are backordered.
19:21 MiroslavAnashkin Currently we are creating a patch to make Fuel work correctly with all HP SmartArray models
19:22 MiroslavAnashkin Dr_Drache: As workaround, you may install GRUB manually after the OS is deployed by Fuel.
19:22 Dr_Drache MiroslavAnashkin, which procedure you recommend, chroot via a bootdisc?
19:23 MiroslavAnashkin Dr_Drache: Chroot with the same kernel version as you are installing.
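[Editor's note] A sketch of the manual-GRUB workaround suggested above, assuming /dev/sda is the SmartArray boot device as Ubuntu sees it and that you boot rescue media with the same kernel version as the installed system. Dry-run by default: RUN=echo prints the commands instead of executing them; set RUN= (empty) on a real rescue system.

```shell
# Dry-run sketch: reinstall GRUB from a chroot after Fuel's OS deployment.
grub_fix() {
  run="${RUN:-echo}"           # RUN= (empty) executes for real
  disk="${DISK:-/dev/sda}"     # assumption: boot device as seen by Ubuntu
  $run mount "${disk}1" /mnt   # installed root filesystem
  for fs in dev proc sys; do $run mount --bind "/$fs" "/mnt/$fs"; done
  $run chroot /mnt grub-install "$disk"   # reinstall GRUB on the correct disk
  $run chroot /mnt update-grub
  for fs in dev proc sys; do $run umount "/mnt/$fs"; done
  $run umount /mnt
}

grub_fix
```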
19:24 Dr_Drache that might be difficult, but not that bad. thanks for that
19:25 MiroslavAnashkin Dr_Drache: For HP SmartArray, the CentOS we use for PXE boot and discovery reports /dev/disk/by-path differently than Ubuntu does
19:26 Dr_Drache yea, CentOS gives me fits with ceph.
19:26 MiroslavAnashkin Dr_Drache: it leads GRUB to install to the wrong device
19:26 Dr_Drache ahhh
19:28 MiroslavAnashkin richardkiene_: Well, I see qrouter-207db4f2-916b-44da-b909-3d52c8e2487f holds the IP 10.200.0.1
19:29 richardkiene_ yeah router4
19:31 richardkiene_ is there any reasonable workaround for the state we're in?
19:31 richardkiene_ I cannot seem to consistently get networking to actually work for any length of time
19:32 richardkiene_ even after rolling restarts and making sure every service is restarted properly and reporting properly
19:37 MiroslavAnashkin richardkiene_: Doubtless, there is a workaround :) It is open source, after all. Please run the following:
19:38 MiroslavAnashkin richardkiene_: `crm unmanage p_neutron-l3-agent`
19:39 MiroslavAnashkin richardkiene_:  Wait a minute, then kill all L3 agent instances.
19:40 MiroslavAnashkin richardkiene_: Wait one more minute and check no L# agents are automatically launched
19:40 MiroslavAnashkin richardkiene_: L3
19:41 MiroslavAnashkin richardkiene_: Correction - Run `crm resource unmanage p_neutron-l3-agent`
19:43 MiroslavAnashkin After all L3 agents are down - run `crm resource manage p_neutron-l3-agent`
19:44 MiroslavAnashkin richardkiene_: Wait another minute - crm may start p_neutron-l3-agent automatically and needs 30 seconds to migrate the router to the node with the running L3 agent
19:46 MiroslavAnashkin richardkiene_: in case no L3 agent is started automatically - run `crm resource cleanup clone_p_neutron-plugin-openvswitch-agent`
19:46 MiroslavAnashkin richardkiene_: Then wait a minute and run `crm resource restart clone_p_neutron-plugin-openvswitch-agent`
19:47 MiroslavAnashkin richardkiene_: Then give crm 2 minutes and check all crm resources with `crm resource list`
19:48 MiroslavAnashkin If resources are OK - check nova services with `nova-manage service list` and restart all services in XXX state
19:50 MiroslavAnashkin richardkiene_: And after it all - find the node with L3 agent running and run `ip netns exec qrouter-* ip a` for each qrouter
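[Editor's note] The whole workaround from 19:38-19:50, collected into one dry-run shell sketch. CRM_CMD and PAUSE are assumptions introduced here (set CRM_CMD=crm and PAUSE=60 or more on a real controller); the manual kill of leftover L3 agent processes and the `ip netns` checks are left as comments because they depend on the node's state.

```shell
# Dry-run sketch of the full L3 agent recovery sequence described above.
CRM_CMD="${CRM_CMD:-echo crm}"   # set CRM_CMD=crm on a real controller
PAUSE="${PAUSE:-0}"              # set PAUSE=60 or more for real runs

l3_recover() {
  $CRM_CMD resource unmanage p_neutron-l3-agent
  sleep "$PAUSE"   # then kill any remaining neutron-l3-agent processes by hand
  $CRM_CMD resource manage p_neutron-l3-agent
  sleep "$PAUSE"
  $CRM_CMD resource cleanup clone_p_neutron-plugin-openvswitch-agent
  sleep "$PAUSE"
  $CRM_CMD resource restart clone_p_neutron-plugin-openvswitch-agent
  sleep "$PAUSE"
  $CRM_CMD resource list   # verify; then check `nova-manage service list` and the qrouters
}

l3_recover
```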
19:52 Dr_Drache MiroslavAnashkin, that worked, but getting /var/lib/glance drive error on boot.
19:58 Dr_Drache the lvm wasn't created
20:02 Dr_Drache wish i knew how to make it :P
20:20 Dr_Drache man
20:20 Dr_Drache I can't win
20:20 Dr_Drache conf.pp:44 error now
20:20 teran_ joined #fuel
20:27 MiroslavAnashkin Dr_Drache: Our dev team so far hasn't won either :(
20:28 Dr_Drache so much for fuel being simpler.
20:28 Dr_Drache :P
20:29 Dr_Drache MiroslavAnashkin, is there a way to change my storage settings without destroying the cluster?
20:30 Dr_Drache MiroslavAnashkin, would my logs help at all? I enabled debugging.
20:30 MiroslavAnashkin Dr_Drache: Only manually.
20:31 Dr_Drache hmm, I really wanted this option.
20:31 MiroslavAnashkin Dr_Drache: and your diagnostic snapshot would be helpful, who knows what else HP hides for us...
20:32 Dr_Drache ohh, these conf.pp:44 errors are on brand new dells
20:32 Dr_Drache https://bugs.launchpad.net/fuel/+bug/1266853 that bug
20:34 Dr_Drache MiroslavAnashkin, http://www.sendspace.com/file/gjtdbm
20:35 Dr_Drache anything else, let me know
20:36 MiroslavAnashkin Dr_Drache: yes, this one. You may also help us by running  "ls /target/dev/disk/by-path" on the problematic node twice - one time inside bootstrap and one time under Ubuntu during the installation. And share both outputs
20:37 Dr_Drache k, going to have to give me a few min.
20:38 Dr_Drache trying to think of how to share the output.
20:39 Dr_Drache see, ubuntu is installed perfectly.
20:40 Dr_Drache this is just the openstack deployment part
20:52 Dr_Drache MiroslavAnashkin, getting it now
21:03 Dr_Drache MiroslavAnashkin, paste.openstack.org/show/61853
21:07 Dr_Drache MiroslavAnashkin, anything else? i'm about to start wrapping up for the weekend
21:28 Dr_Drache MiroslavAnashkin, is there newer builds that may have fixed something of 4.0?
21:43 teran joined #fuel
22:45 IlyaE joined #fuel
