IRC log for #fuel, 2015-11-23

All times shown according to UTC.

Time Nick Message
00:11 zhangjn joined #fuel
00:39 poseidon1157 joined #fuel
00:56 zhangjn joined #fuel
01:02 rmoe joined #fuel
01:13 zhangjn joined #fuel
01:44 jerrygb joined #fuel
02:49 ilbot3 joined #fuel
02:49 Topic for #fuel is now Fuel 7.0 (Kilo) https://software.mirantis.com | Paste here http://paste.openstack.org/ | IRC logs http://irclog.perlgeek.de/fuel/
02:51 yantarou joined #fuel
03:06 jerrygb joined #fuel
03:10 poseidon1157 joined #fuel
04:20 zhangjn joined #fuel
05:14 zhangjn joined #fuel
06:20 zhangjn joined #fuel
06:49 zhangjn joined #fuel
07:00 angdraug joined #fuel
07:09 f13o joined #fuel
07:12 magicboiz joined #fuel
07:31 javeriak joined #fuel
07:32 zhangjn joined #fuel
07:33 Fraggler joined #fuel
07:33 ppetit joined #fuel
07:35 javeriak joined #fuel
07:37 magicboiz joined #fuel
07:40 javeriak_ joined #fuel
07:43 neouf joined #fuel
07:51 jerrygb joined #fuel
08:13 HeOS_ joined #fuel
08:17 Fraggler joined #fuel
08:22 samuelBartel joined #fuel
08:44 sergmelikyan joined #fuel
08:45 Fraggler joined #fuel
08:46 xek joined #fuel
08:52 ScabS joined #fuel
09:04 marfx000 joined #fuel
09:09 f13o joined #fuel
09:10 f13o joined #fuel
09:15 sergmelikyan joined #fuel
09:16 hyperbaba joined #fuel
09:20 dsutyagin joined #fuel
09:21 ppetit joined #fuel
09:26 marfx000_ joined #fuel
09:28 f13o joined #fuel
09:28 e0ne joined #fuel
09:28 sergmelikyan joined #fuel
09:29 marfx000__ joined #fuel
09:30 dsutyagin joined #fuel
09:31 dsutyagin joined #fuel
09:32 dsutyagin joined #fuel
09:34 dsutyagin joined #fuel
09:35 f3flight joined #fuel
09:39 asaprykin joined #fuel
09:46 marfx000_ joined #fuel
09:47 dsutyagin joined #fuel
09:48 Fraggler joined #fuel
09:52 ppetit joined #fuel
09:54 alex_didenko joined #fuel
09:58 marfx000__ joined #fuel
10:01 marfx000 joined #fuel
10:06 zhangjn joined #fuel
10:08 jerrygb joined #fuel
10:20 sergmelikyan joined #fuel
10:23 ppetit joined #fuel
10:44 ppetit joined #fuel
10:50 sergmelikyan joined #fuel
11:06 marfx000 joined #fuel
11:10 Liuqing joined #fuel
11:16 poseidon1157 joined #fuel
11:17 javeriak joined #fuel
11:24 ppetit joined #fuel
11:50 Fraggler joined #fuel
11:52 Fraggler joined #fuel
12:08 jaypipes joined #fuel
12:23 tobiash Hi
12:23 tobiash I have a fuel 7.0 server which was upgraded several releases and manages a 6.0 based cluster and a 7.0 based cluster
12:23 tobiash Adding compute nodes to the 6.0 based cluster fails with [679] Error running provisioning: <class 'cobbler.cexceptions.CX'>:'Error deleting /var/lib/tftpboot/pxelinux.cfg/default'
12:25 tobiash now deployment hangs and also stopping deployment hangs
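A minimal first check for that cobbler error on the Fuel master, assuming the stock 7.0 layout where cobbler runs in its own container (the dockerctl step is an assumption about this particular install):

    # open a shell inside the cobbler container on the Fuel master
    dockerctl shell cobbler
    # then inspect the file cobbler reported it could not delete
    ls -l /var/lib/tftpboot/pxelinux.cfg/default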
12:25 ppetit joined #fuel
12:32 sergmelikyan joined #fuel
12:34 TVR joined #fuel
12:43 sergmelikyan joined #fuel
12:49 ppetit joined #fuel
12:55 ppetit joined #fuel
12:59 jerrygb joined #fuel
12:59 angdraug joined #fuel
12:59 zhangjn joined #fuel
12:59 jerrygb joined #fuel
13:00 zhangjn joined #fuel
13:00 jerrygb_ joined #fuel
13:15 ppetit joined #fuel
13:32 jaypipes joined #fuel
13:34 ppetit joined #fuel
13:37 ppetit joined #fuel
13:49 ppetit joined #fuel
14:03 ppetit joined #fuel
14:14 Fraggler joined #fuel
14:23 rmoe joined #fuel
14:24 ppetit joined #fuel
14:28 pzhurba joined #fuel
14:32 alexz joined #fuel
14:40 angdraug joined #fuel
14:45 javeriak joined #fuel
14:46 javeriak_ joined #fuel
14:50 javeriak joined #fuel
14:58 subscope joined #fuel
15:01 claflico joined #fuel
15:02 sergmelikyan joined #fuel
15:17 tkhno joined #fuel
15:29 sergmelikyan joined #fuel
15:59 blahRus joined #fuel
16:00 ericjwolf_ joined #fuel
16:05 javeriak joined #fuel
16:12 xarses joined #fuel
16:13 ericjwolf_ Question, now that 7.0 is out will 6.1 receive patches?  There are a few updates to Juno that I require but not ready to upgrade to Kilo yet.
16:21 Verilium pasquier-s:  Ever run into an issue where hekad had too many open files?
16:21 pasquier-s Verilium, nope but you are, I guess...
16:22 Verilium Good guess. :P
16:22 Verilium root@node-13:/var/log# lsof -p 18863 | wc -l
16:22 Verilium 1038
16:22 Verilium Well, seems it's really over its 1024 limit.
16:23 pasquier-s oh interesting, can you paste the list of opened files?
16:23 Verilium It seems to be a load of sockets...
16:23 Verilium hekad   18863 heka   71u  sock      0,7        0t0 64456308 can't identify protocol
16:23 sergmelikyan joined #fuel
16:23 Verilium hekad   18863 heka   72u  sock      0,7        0t0 64454528 can't identify protocol
16:24 Verilium hekad   18863 heka   73u  sock      0,7        0t0 64469560 can't identify protocol
16:24 Verilium root@node-13:/proc/18863/fd# lsof -p 18863 | grep "can't identify protocol" | wc -l
16:24 Verilium 991
16:24 pasquier-s is it on a controller node?
16:25 Verilium pasquier-s:  http://paste.openstack.org/show/479746/
16:25 pasquier-s Verilium, thanks!
16:26 Verilium Yep, control node.  The node that has the lma_collector resource at the moment.
16:26 Verilium root@node-13:/proc/18863/fd# crm resource status lma_collector
16:26 Verilium resource lma_collector is running on: node-13.streamtheworld.net
16:27 Verilium And at the same time, /var/log/lma_collector.log is filling up with:
16:28 Verilium 2015/11/23 16:27:27 Input 'aggregator_tcpinput' error: TCP accept failed: accept tcp 10.11.201.8:5565: too many open files
16:28 Guest70_ joined #fuel
16:28 Verilium ...which fills up /var/log, heh.
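A minimal way to confirm the descriptor exhaustion pasquier-s and Verilium are looking at, using the PID and limit from the log above (the crm restart mirrors the workaround Verilium applies below; this is a sketch, not an LMA-specific procedure):

    # show the soft/hard open-file limits the running hekad inherited
    grep 'Max open files' /proc/18863/limits
    # count descriptors currently held by the process
    ls /proc/18863/fd | wc -l
    # count only the leaked sockets lsof cannot map to a protocol
    lsof -p 18863 | grep "can't identify protocol" | wc -l
    # if the collector is managed by Pacemaker, restart it to release them
    crm resource restart lma_collector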
16:28 pasquier-s which commit of the LMA collector are you running?
16:30 pasquier-s googling for "can't identify protocol" tells me that some TCP connections aren't closed properly
16:32 Verilium Strange.
16:34 Verilium Well, I've restarted the resource for now and things seem fine again.
16:34 Verilium Running on...  Last log seems to be:
16:34 Verilium commit 40a3c1b6d1024cd7c103c70effc12eb03f7a7e55
16:34 Verilium Author: Simon Pasquier <spasquier@mirantis.com>
16:34 Verilium Date:   Fri Nov 13 15:27:11 2015 +0100
16:35 javeriak_ joined #fuel
16:36 pasquier-s so AFAICT you're not missing any important fixes
16:37 javeriak joined #fuel
16:37 pasquier-s I would be surprised if the issue doesn't appear again but I've never seen it before
16:37 ppetit Would be interesting to know what those sockets were created for… talking to the backends Influx, Elasticsearch, …
16:38 ppetit Can this be monitored?
16:39 Verilium pasquier-s:  I'll advise if it happens again.  At least it's pretty obvious quickly. :)
16:40 pasquier-s Verilium, ok thanks!
16:52 Verilium Hmm, seems restarting the lma_collector resource fixed it for a few mins, statuses were coming back in nagios, but now it's back to unknown in nagios and grafana is just showing nothing, empty dashboards.
16:54 sergmelikyan joined #fuel
17:01 elemoine_ Verilium, same problem as before?
17:10 Verilium Well, yes and no.  Same end result, but seemingly not same cause.
17:10 Verilium hekad isn't having issues with too many open files.  56 right now on the node that has it.
17:20 elemoine_ ok
17:20 elemoine_ I'll mention this to pasquier-s tomorrow
17:20 elemoine_ but please tell us if you have more information about this issue
17:26 Fraggler joined #fuel
17:32 Guest70_ joined #fuel
17:33 jerrygb joined #fuel
17:35 Fraggler joined #fuel
17:40 jerrygb joined #fuel
17:47 sc68cal joined #fuel
17:50 jfluhmann joined #fuel
18:10 javeriak hey, is there a way to rerun only the post-deployment tasks again rather than the whole deployment ?
18:10 Guest70_ joined #fuel
18:20 Guest70_ joined #fuel
18:22 sergmelikyan joined #fuel
18:22 e0ne joined #fuel
18:25 xarses javeriak: you can manually re-run specific tasks on specified nodes
18:25 xarses the syntax is under the `fuel node` group of commands in the cli
18:26 xarses look for ones talking about tasks
18:26 javeriak xarses yep I've seen that, was just wondering if the UI provided that capability
18:26 xarses no
18:27 jerrygb joined #fuel
18:27 xarses the UI will only redeploy nodes that are in error, and in that case, all of their tasks
18:27 xarses regardless of which failed
18:28 javeriak oh alright
18:29 e0ne joined #fuel
18:29 javeriak wait, I must've seen something different, because there is no mention of tasks here: https://wiki.openstack.org/wiki/Fuel_CLI
18:31 xarses ya, I can't imagine that is ever actually up to date
18:32 xarses and appears to be centered around getting started anyway
18:36 javeriak where do I find the full details for the CLI commands?
18:36 xarses in the help output from the command
18:37 xarses it should also be in one of he autobuild docs, I'm looking for that
18:37 xarses s/he/the
18:38 javeriak okay thanks
18:46 srmaddox joined #fuel
18:46 jerrygb joined #fuel
18:49 xarses https://gist.github.com/xarses/d2c8c76b2c39d1a34628#file-gistfile1-txt-L62-L68
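The gist boils down to the task-related options of `fuel node`; a hedged sketch of what re-running only selected tasks can look like (node IDs and task names are placeholders, and the exact flags should be checked against `fuel node --help` for the installed release):

    # re-run specific granular tasks on specific nodes
    fuel node --node 1,2,3 --tasks netconfig
    # or run everything from a given task to the end of the deployment graph
    fuel node --node 1,2,3 --start post_deployment_start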
18:51 javeriak xarses thanks for the bother :)
18:52 xarses I don't see a link to the auto docs, So I'll file a bug for that
18:57 javeriak alright cool
18:58 bildz what is the best way to apply updates to each node?  for instance, I'm using Ubuntu, do I just ssh to all nodes and apt-get update/upgrade ?
19:03 jerrygb joined #fuel
19:04 Verilium pasquier-s:  A possible side-effect of my earlier issue with hekad running out of file descriptors, seems collectd on all the nodes was spitting out:  "[2015-11-23 19:01:30] plugin_dispatch_values: Low water mark reached. Dropping 100% of metrics."
19:19 Guest70_ joined #fuel
19:23 bpiotrowski bildz: every node is running mcollective, so it looks possible to utilize that
19:26 bpiotrowski bildz: you can use execute_shell_command for that
19:29 bpiotrowski bildz: mco rpc execute_shell_command cmd='uptime' for example
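Building on bpiotrowski's example, a hedged sketch of pushing the actual package upgrade through the same agent from the Fuel master (the command string is an assumption, and depending on the agent definition you may need to name the action explicitly, e.g. `mco rpc execute_shell_command execute cmd=...`; try a simulated run first):

    # simulate the upgrade on every node first
    mco rpc execute_shell_command cmd='apt-get update && apt-get -s upgrade'
    # then apply it non-interactively
    mco rpc execute_shell_command cmd='apt-get update && DEBIAN_FRONTEND=noninteractive apt-get -y upgrade'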
19:32 HeOS_ joined #fuel
19:33 krobzaur joined #fuel
19:33 javeriak guys, what's an appropriate way to put a wait/sleep between deployment tasks? or even retries for one that you think needs more time
19:33 bpiotrowski tobiash: could you please bring this problem to our bug tracker?
19:34 bildz bpiotrowski: thanks
19:34 asaprykin joined #fuel
19:35 bpiotrowski ericjwolf: I believe 6.1 still receives maintenance upgrades
19:35 krobzaur I'm experiencing a really strange problem with Fuel 6.1. I realize the topic for the channel is Fuel 7.0, so is this the wrong place to ask about 6.1?
19:35 fuel-slackbot joined #fuel
19:35 bpiotrowski krobzaur: no, it's fine, topic says 7.0 as it's latest release
19:36 fuel-slackbot joined #fuel
19:36 bpiotrowski javeriak: I can't think of anything more clever than a sleep and smart puppet manifest
19:36 bpiotrowski s/and/or/
19:36 fuel-slackbot joined #fuel
19:36 krobzaur bpiotrowski: Gotcha, thanks.
19:36 javeriak bpiotrowski ive got bash scripts unfortunately
19:36 fuel-slackbot joined #fuel
19:37 bpiotrowski javeriak: there are almost 100 occurrences of 'sleep' in fuel-library, we're experienced in fighting dragons…
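Since the tasks in question are bash scripts, a plain-bash sketch of the retry-with-sleep idea (the wrapped command, attempt count, and delay are placeholders):

    #!/bin/bash
    # retry a flaky deployment step a few times, sleeping between attempts
    attempts=5
    delay=30
    for i in $(seq 1 "$attempts"); do
        if /usr/local/bin/my_post_deploy_step.sh; then  # placeholder command
            exit 0
        fi
        echo "attempt $i/$attempts failed, sleeping ${delay}s" >&2
        sleep "$delay"
    done
    echo "all $attempts attempts failed" >&2
    exit 1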
19:38 krobzaur So, to the channel in general: Every time I run a network verification using Fuel 6.1, the process hangs and it takes one of my Controller/MongoDB nodes offline. The node becomes completely unresponsive and I have to reboot it to bring it back online
19:39 javeriak bpiotrowski haha right :)
19:39 krobzaur Once the node reboots and comes back online the network verification will finish
19:39 fuel-slackbot joined #fuel
19:40 bpiotrowski krobzaur: I've never seen such behavior so I can't really help you; could you share some details about the deployment you're trying to do? I'll try to reproduce it tomorrow
19:40 bpiotrowski but that certainly sounds like a bug worth reporting anyay
19:40 bpiotrowski anyway even
19:41 fuel-slackbot joined #fuel
19:42 krobzaur bpiotrowski: Certainly. So I'm deploying a test cloud with 3 Controller/MongoDB nodes, 4 compute nodes, 4 Ceph OSD nodes, and one node destined to be an ElasticSearch/Kibana/InfluxDB server as specified in the LMA Toolchain plugin guide. I'm deploying on Ubuntu with Neutron VLAN segmentation
19:42 tatyana joined #fuel
19:44 krobzaur Ah and the ceployment mode is ha_compact
19:44 krobzaur deployment*
19:44 fuel-slackbot joined #fuel
19:47 krobzaur The two other control/mongodb nodes are identical server models with identical drive/ram/cpu configurations and everything so I'm somewhat baffled. I'm afraid it might be a problem with this particular server which might make it difficult for anybody here to help me out
19:47 fuel-slackbot joined #fuel
19:52 fuel-slackbot joined #fuel
19:54 HeOS joined #fuel
19:58 fuel-slackbot joined #fuel
20:39 claflico joined #fuel
20:42 e0ne joined #fuel
20:54 TVR__ joined #fuel
21:04 claflico joined #fuel
21:17 e0ne joined #fuel
21:24 xarses krobzaur: network verification uses a packet generator to ensure that each network can pass the traffic as expected. If one of the nodes goes wonky, it sounds like the eth card doesn't like what's being done and is locking up
21:25 xarses as a node, mongo and controller should not be combined, especially with ceph
21:25 xarses s/as a node/as a note/
21:26 jerrygb joined #fuel
21:38 cvocooper joined #fuel
21:54 Fraggler joined #fuel
21:57 krobzaur xarses: Thanks for the info. I'll have to compare the network cards between the 3 controllers. If the troublesome one has a different network card that could point to the problem. And I must ask, why shouldn't those two nodes be combined? This is just a PoC cloud so I'm a bit strapped for hardware which is why I combined them but I'm completely open to advice for any future deployments
21:58 xarses the controller also runs the ceph monitor
21:58 xarses and mongo can run away quite easily
21:58 fuel-slackbot joined #fuel
21:59 xarses and your network routers
21:59 xarses so you could have any one of them run away and tear down your control plane, network plane, and IO plane
22:00 xarses it's very much asking for it at that point
22:00 xarses the lab / PoC reason is why we allow it to be placed
22:00 xarses together still
22:00 fuel-slackbot joined #fuel
22:00 xarses but under no conditions should you re-use that layout for production
22:06 fuel-slackbot joined #fuel
22:11 jerrygb joined #fuel
22:17 krobzaur xarses: What do you mean by "run away"?
22:18 xarses the processes are not given system resource limits; in some cases a process can eat all available resources, starving the other services
22:19 xarses for example mongodb can become quite IO intensive. This will start causing IO backlogs
22:19 Verilium pasquier-s:  Oh well, in the end, seems influxdb was my issue.  Or rather, the fact it used up 80% of memory and the node in question running it only had around 100MB left, out of 8GB.
22:20 Verilium Seems a bit strange that influxdb memory ramped up so much, but I'm not sure what's considered 'normal' for influxdb either.
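A quick, generic way to confirm which process is eating a node's memory in a case like this (nothing LMA-specific, just standard tools):

    # overall memory picture on the node
    free -m
    # top memory consumers; influxd should show up here if it is the culprit
    ps aux --sort=-%mem | head -n 10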
22:20 xarses the ceph-mon is sensitive to a clock drift of 10ms by default; twice that value will result in the cluster starting to fall out of quorum
22:20 xarses if there are no monitors in quorum, the cluster will stop accepting writes
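A minimal check for the clock-skew and quorum conditions xarses describes; the allowed drift is governed by ceph's mon_clock_drift_allowed setting together with NTP on the controllers:

    # ceph flags 'clock skew detected on mon.X' once the drift limit is crossed
    ceph health detail
    # confirm which monitors are currently in quorum
    ceph quorum_status
    # verify NTP synchronisation on each controller
    ntpq -p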
22:21 xarses just ^^ Verilium has influxdb running away with all the system memory, an equally possible contention issue
22:23 krobzaur I see. Well I will definitely take note of that in the future if we decide to take Openstack into production. I think things should be fine for now. The servers are pretty powerful (2x12 cores, 128 GB RAM) and I'm just standing it up for experimentation and PoC anyway
22:40 neouf joined #fuel
22:43 claflico joined #fuel
23:22 neouf joined #fuel
23:53 alex_didenko joined #fuel
23:59 DevStok joined #fuel
