
IRC log for #fuel, 2015-02-12


All times shown according to UTC.

Time Nick Message
00:05 youellet codybum: yup
00:06 youellet with kvm and neutron networking
00:08 emagana joined #fuel
00:12 xarses codybum: no, afaict this was not seen in 6.0, its either a regression in 6.0.1 or something else odd
00:14 codybum I had the same experience with the 6.0 release.
00:15 codybum I am going to go back and update any firmware I can find.. who knows
00:21 youellet fuel-community-6.0.1-19-2015-02-06_03-40-16.iso works well for me with ubuntu Full HA.
00:26 emagana joined #fuel
00:33 codybum youellet: strange.. how many compute nodes?
00:34 codybum What is your network configuration?
00:40 mattgriffin joined #fuel
00:41 codybum youellet: I am running fuel-community-6.0.1-18-2015-02-05_03-40-16.iso.  I don't see anything in the logs between versions.  Is it possible that something has changed?
01:07 ahg joined #fuel
01:13 claflico joined #fuel
01:17 ahg joined #fuel
01:26 rmoe joined #fuel
01:52 ahg joined #fuel
01:55 netalien_ joined #fuel
01:58 ahg joined #fuel
01:59 netalien_ hello all
02:15 Longgeek joined #fuel
02:20 ahg joined #fuel
02:24 xarses joined #fuel
02:31 ahg joined #fuel
02:48 ilbot3 joined #fuel
02:48 Topic for #fuel is now Fuel 5.1.1 (Icehouse) and Fuel 6.0 (Juno) https://software.mirantis.com | Fuel for Openstack: https://wiki.openstack.org/wiki/Fuel | Paste here http://paste.openstack.org/ | IRC logs http://irclog.perlgeek.de/fuel/
02:54 emagana joined #fuel
02:55 Longgeek joined #fuel
03:51 ahg joined #fuel
03:52 teran joined #fuel
03:52 okosse joined #fuel
04:25 Longgeek joined #fuel
06:07 Longgeek joined #fuel
06:58 daniel3_ joined #fuel
06:58 adanin joined #fuel
07:03 saibarspeis joined #fuel
07:16 dklepikov joined #fuel
07:20 sressot joined #fuel
07:35 Miouge joined #fuel
07:40 aliemieshko_ joined #fuel
07:40 e0ne joined #fuel
07:41 sambork joined #fuel
07:48 xek_ joined #fuel
07:50 stamak joined #fuel
07:53 Longgeek joined #fuel
08:03 ahg joined #fuel
08:05 Miouge_ joined #fuel
08:07 Miouge 574
08:39 alecv joined #fuel
08:39 avgoor joined #fuel
08:41 saibarspeis joined #fuel
08:50 tzn joined #fuel
08:54 Longgeek joined #fuel
08:55 sc-rm joined #fuel
08:56 sc-rm I have noticed that our instances have poor write speed. Where could the bottlenecks be?
08:56 sc-rm We use dedicated switches for the storage network
08:57 HeOS joined #fuel
08:57 sc-rm and all ceph nodes have SSD for the journals
08:57 stamak joined #fuel
09:04 izinovik_ joined #fuel
09:11 dklepikov sc-rm: 1) what network are you using? 2) what is the speed, and how are you measuring it?
09:14 adanin joined #fuel
09:16 sc-rm dklepikov: 1) Gigabit. 2) the network speed is good, but a mysql import of a database takes substantially longer in an instance than in a xen-based virtual machine on the same type of hardware
09:22 sc-rm dklepikov: Also if we do iostat on the instances, we see the http://paste.openstack.org/show/171909/
09:25 dklepikov sc-rm: Can you install spew on the instance and run "rm -rf /tmp/test && spew -i 50 -v -d --write -r -b 4096 10M /tmp/test"? Also, what version of ceph are you using?
09:26 dklepikov sc-rm : By network I meant: neutron+vlan, neutron+gre, or nova?
09:29 evgeniyl___ joined #fuel
09:31 e0ne joined #fuel
09:32 sc-rm dklepikov: ceph version 0.80.4
09:32 sc-rm dklepikov: Network is : Neutron with VLAN segmentation
09:36 ChrisNBlum joined #fuel
09:43 Miouge joined #fuel
09:52 sc-rm dklepikov: The command is still running on the instance, and it finished long ago on the xen-based instance, so something is wrong somewhere in the setup
09:52 sc-rm dklepikov: Do you have a similar command for testing read speed?
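A minimal read-speed counterpart to the spew write test above, sketched with dd since dklepikov does not answer in the log; it re-reads the file spew wrote, using direct I/O so the page cache does not inflate the number:

    # drop cached pages first so reads actually hit the rbd/disk layer
    sync && echo 3 > /proc/sys/vm/drop_caches
    # read the test file back; dd prints the achieved throughput
    dd if=/tmp/test of=/dev/null bs=4096 iflag=direct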
10:00 aarefiev joined #fuel
10:00 Miouge joined #fuel
10:00 sc-rm dklepikov: http://paste.openstack.org/show/171943/ It does not look good
10:02 dkaigarodsev_ joined #fuel
10:04 Longgeek joined #fuel
10:13 sc-rm dklepikov: did one more test on the node that is running the instance, and tested on the ceph nodes also… http://paste.openstack.org/show/171951/
10:14 dklepikov sc-rm : Your output 171943 looks like you are not using the RBD cache in ceph. ceph version 0.80.4 has an issue with it; to use the RBD cache you first need to update ceph to 0.80.7
10:15 dklepikov sc-rm : one more thing: can you show "ceph -s"?
10:16 dkaigarodsev joined #fuel
10:16 sc-rm dklepikov: Okay, that would make sense. Then I have the problem of upgrading ceph when deployed through mirantis fuel
10:16 sc-rm dklepikov: http://paste.openstack.org/show/171952/ ceph -s
10:16 dkaigarodsev__ joined #fuel
10:18 dklepikov sc-rm : what OS are you using on the cluster nodes?
10:19 dklepikov What is your fuel version?
10:20 sc-rm dklepikov: fuel 5.1 and the nodes are running ubuntu 12.04.4
10:22 dklepikov sc-rm : Let me check some data
10:23 teran joined #fuel
10:24 sambork joined #fuel
10:26 andriikolesnikov joined #fuel
10:26 hyperbaba joined #fuel
10:38 adanin joined #fuel
10:44 Longgeek joined #fuel
10:51 mattgriffin joined #fuel
11:01 teran joined #fuel
11:08 ahg joined #fuel
11:26 sambork joined #fuel
11:33 book` joined #fuel
11:59 ahg joined #fuel
12:06 dklepikov joined #fuel
12:10 hyperbaba Hi there, I was wondering: is it possible to create an additional ceph cluster, apart from the deployed one, in a 5.1 deployment, to be used for the cinder-backup service? It's a little bit careless to use the same ceph infrastructure for production and backup. If not, is it possible to create a crush map to reflect disaster-recovery nodes on which the backups pool will be stored?
12:18 sc-rm dklepikov: I've been looking at just getting the new ceph deb packages, but it seems there might be some dependencies that will cause some trouble right off the bat. But if we could make a patch to fix that, I could apply it. That would be a huge improvement for us ;-)
12:19 dklepikov sc-rm : Check packages installed on monitor nodes and on ceph-osd nodes
12:19 dklepikov sc-rm : #dpkg --get-selections | grep ceph
12:19 dklepikov sc-rm : #dpkg --get-selections | grep rados
12:19 dklepikov sc-rm : #dpkg --get-selections | grep librbd
12:20 dklepikov sc-rm : http://ceph.com/docs/master/install/upgrading-ceph/
12:21 dklepikov sc-rm : Check osd versions on all ceph osd nodes
12:21 dklepikov sc-rm : #for i in $(ceph osd ls); do ceph tell osd.${i} version; done
12:22 sc-rm dklepikov: all ceph nodes are running the same version { "version": "ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f)"}
12:22 dklepikov sc-rm : you can download packages from http://fuel-repository.mirantis.com/fwm/5.1.1/ubuntu/pool/main/
12:24 sc-rm dklepikov: So let's say I take one node and mark all the osds out, just to be sure there's no data loss
12:24 sc-rm dklepikov: Then I could try to upgrade the packages on that node; will ceph 0.80.7 be able to work with 0.80.4?
12:25 dklepikov sc-rm : http://ceph.com/docs/master/install/upgrading-ceph/ you must follow the documentation
12:27 dklepikov sc-rm : Per the ceph docs, you must update the monitor nodes first
12:28 dklepikov sc-rm : then update the ceph-osd nodes one by one, and restart the osds on each
12:28 dklepikov sc-rm : then do not forget to update your fuel nailgun repository with new ceph packages
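A rough sketch of the mon-first, osd-second order described above, assuming the 0.80.7 debs from the 5.1.1 mirror have been added to apt on each node; the service commands are for the sysvinit script on Ubuntu 12.04 and vary by init system:

    apt-get update && apt-get install ceph ceph-common librados2 librbd1
    service ceph restart mon   # on each monitor node, one at a time
    service ceph restart osd   # then on each ceph-osd node, one at a time
    # confirm every daemon now reports 0.80.7 (same loop as above):
    for i in $(ceph osd ls); do ceph tell osd.${i} version; done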
12:31 dklepikov sc-rm : then you can enable the rbd cache in ceph.conf, and add "network=writeback" to nova.conf
12:33 ddmitriev joined #fuel
12:34 sambork1 joined #fuel
12:37 sambork joined #fuel
12:38 sambork2 joined #fuel
13:05 e0ne joined #fuel
13:29 samuelBartel_ joined #fuel
14:03 andriikolesnikov joined #fuel
14:08 denis_makogon joined #fuel
14:10 championofcyrodi so it looks like ceph's default crush mapping doesn't work very well with heterogeneously sized osds; and requires a lot of reading when you need to fix it.
14:11 championofcyrodi but is very interesting
14:12 championofcyrodi i'd need a couple of weeks of 'ceph-deploy' work alone just to get comfortable with it.
14:14 championofcyrodi xarses: I found my fsid :)
14:21 monester_laptop joined #fuel
14:25 julien_ZTE joined #fuel
14:28 saibarspeis joined #fuel
14:37 codybum joined #fuel
14:38 sambork joined #fuel
14:42 claflico joined #fuel
14:44 sc-rm dklepikov: Now I have upgraded all the ceph-related stuff. I guess I have to restart already-running instances for it to take effect on those?
14:46 dklepikov sc-rm : for running instances to pick up the rbd cache you need to restart the instance or live-migrate it
14:47 youellet codybum, 3 Controller, Telemetry and MongoDB for Telemetry (all on the same node). I have 3 others with Compute, storage and Ceph OSD.
14:47 youellet For my network, I run Neutron with VLAN. My public network is tagged.
14:48 dklepikov sc-rm : did you add "network=writeback" to nova.conf?
14:48 codybum youellet: Ok. We are almost the same.  How much ram do you have on your controllers?
14:49 codybum I have 3 controllers w/ 64G ram each, 5 compute boxes, and 3 Ceph nodes.  Running Neutron VLAN and Ceph for storage.
14:49 youellet 2x24GB and 1x6GB
14:49 youellet I have built a working cluster with 1x2GB and 2x4GB.
14:50 DaveJ__ joined #fuel
14:50 sc-rm dklepikov: I added the "network=writeback" to nova.conf on all compute nodes and restarted nova-compute and libvirt-bin
14:50 DaveJ__ Hi guys, can anyone tell me - is it possible to change the Ceph replication count after deployment? i.e. I have two ceph nodes, I want to add a 3rd
14:50 DaveJ__ but change the replica count from 2 to 3
14:50 sc-rm dklepikov: If I remember correctly, live migration does not work in 5.1
14:50 dklepikov sc-rm : disk_cachemodes=........,"network=writeback" something like this
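Pulling the settings scattered through this exchange into one place, as a sketch; the elided part of disk_cachemodes above depends on what the deployment already sets:

    # /etc/ceph/ceph.conf on the compute nodes:
    [client]
    rbd cache = true

    # /etc/nova/nova.conf on each compute node (keep any existing modes):
    disk_cachemodes = "network=writeback"

    # then restart nova-compute and libvirt on each compute node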
14:51 codybum Interesting.. Do you not see "oslo.messaging._drivers.impl_rabbit" errors in the neutron-server logs?
14:52 codybum When interacting with OpenStack I see an oslo messaging error about every 5 seconds.  Things often continue to work, but there are often failures.
14:53 dklepikov DaveJ__ yes, it is possible. First, please add the new ceph-osd node
14:54 codybum Changing swappiness really helped, but issues still exist.
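The swappiness change codybum mentions is presumably the usual sysctl knob; a sketch:

    sysctl -w vm.swappiness=10                     # apply immediately
    echo 'vm.swappiness = 10' >> /etc/sysctl.conf  # persist across reboots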
14:54 DaveJ__ dklepikov: Thanks - adding the new node seems straightforward.  I couldn't figure out where to change the replication count, as my settings appear to be locked in the fuel UI
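The replica count lives on the Ceph side rather than in the Fuel UI; a sketch of raising it once the third node is in, assuming the usual Fuel pool names (volumes, images, compute):

    ceph osd pool set volumes size 3      # run on a controller/monitor node
    ceph osd pool set volumes min_size 2
    # repeat for images, compute, etc.; data re-replicates in the background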
14:55 youellet yes
14:55 youellet i have this error too
14:57 codybum Mmm.. Then if your system is stable and mine is not, these errors could be a red herring
14:58 codybum I especially have issues with cloudinit talking to the metadata-agent.  If it works, it takes a long time, but the failure rate is 1/10.
15:04 tzn joined #fuel
15:05 claflico joined #fuel
15:17 jobewan joined #fuel
15:22 devstok joined #fuel
15:23 blahRus joined #fuel
15:25 sc-rm dklepikov: It works, now we have way better io performance in the instances
15:26 sc-rm dklepikov: but I have to restart the instances, because live migration does not work in 5.1 due to the low-level CPU comparison. I can't find the blueprint, but if I remember correctly there is one for fixing it in a later release. Thanks again :-)
15:28 dklepikov sc-rm : what IO do you have now? the spew test, I mean
15:30 sc-rm dklepikov: http://paste.openstack.org/show/172139/ the bottom one, so still some way to go, but way better
15:33 dklepikov sc-rm : Did you add rbd cache = true to the [client] section in ceph.conf?
15:33 sc-rm dklepikov: If I repeat it after a while it almost doubles in performance…  where is the cache stored?
15:33 sc-rm dklepikov: I added it on all nodes that have a ceph.conf file
15:35 dklepikov http://ceph.com/docs/master/rbd/rbd-config-ref/
15:37 championofcyrodi does anyone know if it's okay to apply a crushmap to a running ceph instance?
15:37 sc-rm dklepikov: I’ll poke more around tomorrow, have to go for today. But I’ll let you know what I figure.
15:37 devstok hi
15:38 devstok I'm creating an image in raw format
15:38 devstok 2.5 GB
15:38 devstok looking ceph -w
15:38 championofcyrodi http://pastebin.com/gJRX7HWW <- my crushmap
15:38 devstok every 5 minutes I see [WRN] : 1 slow requests, 1 included below; oldest blocked for >
15:38 devstok osd.4 [WRN] 1 slow requests, 1 included below; oldest blocked for > 50.792429 secs
15:39 MiroslavAnashkin championofcyrodi: It is possible to import a crushmap, but you should know what you are doing. It may fully rebalance your OSDs - or you may hit one of the nasty Ceph bugs with cascading OSD out.
15:40 MiroslavAnashkin devstok: Is your OSD.4 HDD based and what is its journal size?
15:41 championofcyrodi thanks miroslavanashkin:  my issue is that we set our replication factor too high, and as you can see from my crush map, although the weight of node-40 is set to ~.250, its 256GB disk has filled up at an inconsistent ratio with the 1TB disks, to accommodate the replication factor.
15:41 andriikolesnikov joined #fuel
15:42 championofcyrodi (replication is set to 3, thus w/ 4 osds, the lower-weighted 256GB SSD hits capacity earlier than the others)
15:43 championofcyrodi maybe i can just change the placement group's replication factor from 3 to 2?
15:43 championofcyrodi (until i get more consistent osd nodes added to the cluster)
15:44 championofcyrodi that might be a safer alternative... at least on the pool with the most data... which are my volumes
15:50 devstok osd_journal_size = 2048
15:51 devstok in ceph.conf
15:52 MiroslavAnashkin championofcyrodi: You may set the weight even lower - all the way down to 0
15:53 MiroslavAnashkin Replication factor 2 is not recommended for anything except testing purposes
15:54 championofcyrodi I plan to set it back to 3 once i have moved all my icehouse environment volumes to juno, and move those nodes into the juno environment
15:54 devstok exact
15:54 devstok now I set replica 2
15:54 devstok for images
15:54 championofcyrodi which will give me 4 more osds
15:54 devstok trying to get image creation faster than 20 minutes
15:54 devstok timeout
15:54 claflico am attempting to migrate from deploying nova-network environments to neutron environments due to the notice that nova-network is being retired. I experienced my first fuel deployment failure yesterday, so I would like to get a better understanding
15:55 devstok good luck claflico
15:55 claflico should the number of fixed public IPs equal the number of servers being deployed?
15:55 claflico I saw that in a tutorial
15:56 devstok at least 4 for a HA env
15:56 devstok @MiroslavAnashkin -> osd_journal_size = 2048
15:56 devstok do I have to change it?
15:57 devstok I've got 3 object storage nodes, each with 2 disks of 2TB
15:57 devstok total about 12TB
15:57 devstok but the disks are SATA
15:57 claflico just after the cluster wizard, the public ip range is 172.16.0.2-172.16.0.126 and the floating ip range is 172.16.0.130-172.16.0.254… what happened to .127-.129?
15:58 devstok HA proxy one
16:00 claflico in nova, the private network ip is 10.0.0.0/X but in neutron the internal network is 192.168.111.0/24, why not stick with the 10.0.X.0/24 method?
16:01 blahRus Hey guys, I am getting:
16:01 blahRus Failed actions:
16:01 blahRus p_mysql_monitor_120000 (node=node-2, call=222, rc=1, status=complete, last-rc-change=Sun Jan 18 08:32:03 2015
16:01 blahRus , queued=0ms, exec=0ms
16:01 blahRus ): unknown error
16:01 blahRus but I am seeing: resource p_mysql is running on: node-1
16:02 devstok wait for it to sync
16:02 championofcyrodi MiroslavAnashkin: Thanks so much.  lowering the replicas from 3 to 2 on my images pool didn't initiate a rebalance/recovery of the used space on my osd, but reweighting it did.
16:02 championofcyrodi i set it to .3
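The two commands championofcyrodi appears to be describing, as a sketch with the pool and osd names from the surrounding discussion:

    ceph osd pool set images size 2     # lower the replica count on one pool
    ceph osd crush reweight osd.1 0.3   # shrink the small SSD's share of placements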
16:03 blahRus devstok: any way of seeing the status of the syncing?
16:03 championofcyrodi degradation is dropping now at 224MB/sec :)
16:03 andriikolesnikov joined #fuel
16:14 championofcyrodi hmmm okay... degradation has dropped from ~10% to ~3%
16:14 championofcyrodi which is WAY better, but still unhealthy
16:15 ChrisNBlum joined #fuel
16:15 championofcyrodi and the disk on the smaller osd is only ~30% full, which seems like an even distribution relative to the other osds.
16:15 devstok blahRus u can check if it's synced: /var/lib/mysql/grastate.dat -> seqno -1 and the same uuid on all controllers
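A quick way to make that comparison, assuming root ssh access from the Fuel master; node names are illustrative:

    for n in node-1 node-2 node-3; do
      ssh $n 'hostname; cat /var/lib/mysql/grastate.dat'
    done
    # per devstok: expect the same uuid, and seqno -1, on all controllers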
16:16 championofcyrodi perhaps i need to run a scrub on the other nodes
16:20 blahRus devstok: yes, all of those files are the same on the 3 controllers
16:21 blahRus devstok: still seeing the p_mysql_monitor is a failed action
16:21 devstok crm
16:21 devstok resource
16:22 devstok cleanup p_mysql
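The same cleanup can also be run non-interactively as a single command:

    crm resource cleanup p_mysql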
16:23 blahRus devstok: perfect, thanks
16:23 blahRus though now I am still getting "504 Gateway Time-out</h1> The server didn't respond in time." when using the API and/or horizon
16:29 claflico1 joined #fuel
16:31 blahRus fixed. rabbitmq had crashed, but crm showed it running?
16:36 devstok yes
16:36 devstok rabbit is a key component
16:36 devstok but sometimes it doesn't work properly
16:37 devstok Now I get ERROR oslo.messaging._drivers.impl_rabbit [-] Socket closed when creating an image in RAW format
16:38 devstok blahRus : how did u restart rabbit? crm or service ?
16:55 e0ne joined #fuel
16:56 daniel3_ joined #fuel
16:58 claflico1 crap, deployment failed again
16:58 claflico1 it fails first with my two dedicated ceph nodes
17:05 e0ne joined #fuel
17:06 ahg joined #fuel
17:14 rmoe joined #fuel
17:18 devstok claflico how did u set ur env?
17:18 xarses joined #fuel
17:19 claflico non-ha, neutron, 1 controller/mongo (pe1850), 2 ceph storage & 3 compute (R610s)
17:23 claflico it looks like the controller node became unpingable after its deployment
17:23 claflico i could ping the other nodes from the fuel host but not the controller
17:23 claflico i rebooted the controller via ctrl+alt+delete and can ping it
17:24 claflico am attempting redeploy
17:28 devstok It could be better to set up 2 controllers for monitoring of the ceph
17:29 claflico yeah, i plan on having multiple controllers
17:29 claflico just don't have the hardware yet
17:30 claflico have to P2V the host first
17:30 claflico deployment of the controller finished and it is still pingable, so hopefully the others continue
17:33 claflico ok, deployment complete
17:38 claflico healthcheck threw an error
17:38 claflico says that the TestVM is missing
17:42 emagana joined #fuel
17:43 claflico why are the neutron networks named "net04"?
17:47 claflico just renamed them to be more user-understandable (ext_net, admin_net) and the health test fails
17:54 claflico looks like the heat image is no longer provided: https://fedorapeople.org/groups/heat/prebuilt-jeos-images/README
18:14 championofcyrodi xarses: after spending all night reading and half the morning tweaking ceph...
18:15 championofcyrodi I now have a RAW image that boots an instance in seconds, and it created fewer objects in the ceph pool than the qcow2 instances
18:29 emagana joined #fuel
18:30 emagana joined #fuel
18:35 championofcyrodi only issue i currently have is that my compute pool seems to be getting degraded, slowly.
18:36 championofcyrodi seems like the 'homeless' pgs issue, because when i restart the ceph nodes the degraded objects go away... then slowly come back while it's up
18:36 championofcyrodi doesn't happen on volumes/images though
18:36 championofcyrodi maybe because of the RAW write on read? and it's still replicating
18:44 xarses championofcyrodi: it sounds like one of the osd's isn't keeping up correctly. You'd want to crank up the debug values and see if you can figure from the logs who's misbehaving
18:44 xarses its odd, because the write io should have forced it to be in sync
18:52 Longgeek joined #fuel
19:00 codybum joined #fuel
19:11 championofcyrodi xarses: the pg errors are showing with, # ceph pg dump_stuck unclean
19:12 championofcyrodi looking at the acting and up osds, it seems osd.0, osd.3, and osd.2 are up... and osd.1 is acting, but not up.
19:13 championofcyrodi so it seems there is an expectation of 3 replicas...
19:13 championofcyrodi but only 2 are supported
19:14 championofcyrodi maybe because i have min_size set to 3 and size set to 3 on the images pool?
19:14 tzn joined #fuel
19:14 championofcyrodi yup, in every degraded object, osd.1 is listed as acting, but not up.
19:15 championofcyrodi which is the osd i reweighted...
19:15 championofcyrodi maybe i should increase the weight a bit?
19:17 championofcyrodi from .3 to .5
19:20 championofcyrodi hmmm it seems like reducing the weight on node-40 from its default determination can cause it to remain under-replicated?
19:20 championofcyrodi just went from .5 to .6 and the degraded % dropped from .444 to .361
19:21 championofcyrodi there are 1955 objects on that osd and 320 marked degraded...
19:21 championofcyrodi so ~16%
19:22 ddmitriev joined #fuel
19:25 teran joined #fuel
19:26 emagana joined #fuel
19:29 xarses claflico: for the public addresses question, with neutron and fuel 5.1 or above you only need the number of controllers + 1 in the public range, but there is a bug in the math in 5.1 that makes it not calculate the required IPs correctly, even though it will only assign controllers + vip
19:34 codybum joined #fuel
19:34 codybum Hi @xarses: are you around?
19:35 xarses nope
19:35 * xarses hides
19:35 codybum lolz
19:35 codybum I have looked a bit further into my issues around rabbitmq
19:36 codybum The problems only seem to be with neutron components
19:36 xarses 0.o
19:37 xarses in 6.0.x they should be using the same messaging code
19:37 xarses it is with the api side, or do the neutron agents on the compute's also have this problem?
19:39 xarses try this, each controller has 127.0.0.1 in its rabbit_hosts value, remove it and they should all have <controller 1 ip>,<controller 2 ip>,<controller 3 ip> and then restart all of the neutron services / agents on the controllers
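A sketch of the edit xarses describes, with hypothetical controller IPs; on Fuel 6.0 (Juno) rabbit_hosts typically sits in the [DEFAULT] section of /etc/neutron/neutron.conf:

    # before: rabbit_hosts=127.0.0.1:5672,192.168.0.4:5672,192.168.0.5:5672
    # after, identical on every controller:
    rabbit_hosts = 192.168.0.3:5672,192.168.0.4:5672,192.168.0.5:5672
    # then restart neutron-server and the dhcp/l3/metadata/ovs agents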
19:41 HeOS joined #fuel
19:41 e0ne joined #fuel
19:44 emagana joined #fuel
19:49 emagana joined #fuel
19:50 stamak joined #fuel
19:53 Longgeek joined #fuel
20:06 ahg joined #fuel
20:09 emagana joined #fuel
20:39 bogdando joined #fuel
21:04 dmellado joined #fuel
21:05 MarkDude joined #fuel
21:09 championofcyrodi fyi... on openstack juno via fuel 6.0, converting qcow2 -> raw using qemu on the controller is a bad idea.  And that seems to be what happens when you launch a volume from an image that is compressed.
21:09 championofcyrodi qemu eats up a ton of memory doing the conversion.
21:09 championofcyrodi and the entire cluster's performance suffers
21:10 championofcyrodi which is why swappiness = 10 is even more important w/ the controllers running ceph monitor.
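One way to sidestep the on-controller conversion, as a sketch: convert the image off-cluster and upload it as raw, so Glance/Ceph never see qcow2 (file and image names are illustrative):

    qemu-img convert -f qcow2 -O raw myimage.qcow2 myimage.raw
    glance image-create --name myimage-raw --disk-format raw \
        --container-format bare --file myimage.raw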
21:22 championofcyrodi any ideas on how to download a RAW image from another rbd instance running in the same LAN?
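The question goes unanswered in the log; one plausible route, assuming admin access to the other cluster, is rbd export streamed over ssh ("-" writes the image to stdout):

    ssh root@other-ceph-node 'rbd export images/<image-uuid> -' > myimage.raw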
21:26 dmellado joined #fuel
21:31 emagana joined #fuel
21:33 codybum joined #fuel
21:35 emagana joined #fuel
21:40 tzn joined #fuel
21:42 dmellado joined #fuel
21:45 e0ne joined #fuel
21:47 Longgeek joined #fuel
21:48 kupo24z joined #fuel
21:49 jaypipes joined #fuel
21:50 rbowen joined #fuel
21:54 codybum @xarses: Sorry.. traveling so internet is a problem.  I was saying that I think my issues are related to rabbitmq and neutron.
21:54 xarses 0.o
21:54 xarses in 6.0.x they should be using the same messaging code
21:54 xarses it is with the api side, or do the neutron agents on the compute's also have this problem?
21:54 xarses try this, each controller has 127.0.0.1 in its rabbit_hosts value, remove it and they should all have <controller 1 ip>,<controller 2 ip>,<controller 3 ip> and then restart all of the neutron services / agents on the controllers
21:55 xarses it will help to identify if it's a problem with rabbit cluster partitioning
21:55 codybum I noticed that when I restart neutron-server on the controllers there are at least 30 "Connected to AMQP server" log entries.
21:55 codybum none of the other services do that.
21:55 xarses you could also just stop the agents and services on all but the first controller
21:56 xarses and test with only the one running
21:56 codybum ah ok. so on the other controllers just shutdown all neutron services?
21:56 xarses ya, it will stop anyone from talking to anything but the first controller's rabbit
21:57 xarses unless it's dead
21:58 codybum should I also stop rabbit on all but one controller?
22:01 dmellado joined #fuel
22:08 dmellado joined #fuel
22:11 xarses codybum: you would do that with corosync, but let's try one thing at a time
22:11 codybum ok. I have shut down all rabbitmq and neutron services on two of the controllers.
22:11 codybum Node1 is the only one with rabbit or neutron
22:12 emagana joined #fuel
22:12 xarses for rabbit you did that via corosync?
22:12 xarses otherwise it will likely try to start it again
22:13 xarses you can 'pcs resource p_rabbitmq-server ban node-2 node-3'
22:13 denis_makogon joined #fuel
22:15 codybum I didn't do it via corosync
22:15 codybum I will enable again
22:16 codybum ok they are enabled.
22:17 codybum Is it normal to see duplicate connections to the three rabbit servers when you restart neutron-server?
22:17 codybum There are 50+ entries in the log
22:18 codybum do I run "pcs" from the running controller?
22:19 xarses any of the controllers in ha mode
22:20 xarses pcs / crm are functionally the same, but have different syntax and crm is deprecated in the base OS
22:21 codybum right.. the command: pcs resource p_rabbitmq-server ban node-2 node-6 does not seem to be the correct format.  node-6 is my other controller.
22:21 xarses one sec
22:21 codybum resource show = Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server], Masters: [ node-1 ], Slaves: [ node-2 node-6 ]
22:22 xarses looks like it's 'pcs resource ban <service> [node]'
22:24 dmellado joined #fuel
22:25 codybum I ran the command and there was no output.  They still show as slaves.
22:25 codybum Let me see if I can ref the master.
22:26 codybum ok.. now they show as stopped
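So the working sequence, as a sketch against the master/slave set shown above (older pcs versions may lack 'clear' and need the cli-ban constraint removed via 'pcs constraint' instead):

    pcs resource ban master_p_rabbitmq-server node-2
    pcs resource ban master_p_rabbitmq-server node-6
    # later, to let the banned slaves start again:
    pcs resource clear master_p_rabbitmq-server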
22:26 xarses they take about a min to show stopped after being banned
22:27 xarses you can do the same thing to the neutron agent service
22:27 e0ne joined #fuel
22:28 codybum yep
22:29 codybum all neutron services?
22:29 Obi-Wan joined #fuel
22:32 jetole joined #fuel
22:32 codybum ok.. rabbit and neutron have been stopped using pcs on all nodes except node-1
22:34 jetole Hey guys. `pcs status` is showing failed actions for an osd/controller for items such as "p_neutron-metadata-agent_monitor_60000" with "status=Timed Out". How do I diagnose this?
22:35 codybum @xarses:  Do I need to restart the services on node-1?  They are still trying to contact the disabled services.
22:35 Obi-Wan joined #fuel
22:36 jetole the other two I am receiving this status on are "p_neutron-openvswitch-agent_monitor_20000" and "p_neutron-dhcp-agent_monitor_30000". All of them say 'unknown error' and status=Timed Out
22:36 xarses codybum: they should not be attempting to contact the disabled services; once the status (neutron agent-list) shows them as offline they won't be used
22:37 bogdando joined #fuel
22:37 codybum @xarses: they show as stopped
22:37 xarses then a new request should automatically avoid them
22:38 codybum I am going to restart the controllers to clear up any problems I might have caused by manual service restarts
22:40 jetole I forgot to mention I am on mirantis/fuel icehouse
22:50 codybum @xarses: reboot is complete.  lots of connection errors, which are to be expected for the downed services, but no failed-to-publish errors
22:51 codybum however the p_neutron-dhcp-agent is stopped for some reason
22:53 codybum I also don't see a "Clone Set" for p_neutron-dhcp-agent (ocf::mirantis:neutron-agent-dhcp): Stopped
22:54 xarses dhcp agent isn't a clone set, so that's correct
23:00 claflico on fuel 6, neutron, non-ha config… when the health test creates a network/router/volume/etc, it should be deleting them afterwards like in nova, right?
23:02 codybum @xarses: so far no failed-to-send errors.  I am trying to break it.
23:03 codybum @xarses: spoke too soon: oslo.messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'reply_279ebd20139646be950e4de6c6d75019': Socket closed
23:04 codybum However, I am only seeing this on neutron-server, not the other agents
23:06 dmellado joined #fuel
23:08 dmellado joined #fuel
23:47 xarses joined #fuel
23:56 codybum @xarses: Down to one controller and I am still experiencing "Failed to publish message to topic" errors.  Any other ideas?
