
IRC log for #fuel, 2014-03-17


All times shown according to UTC.

Time Nick Message
02:16 ToTsiroll joined #fuel
05:01 sunxin joined #fuel
06:00 IlyaE joined #fuel
06:08 Ch00k joined #fuel
06:27 xarses joined #fuel
06:38 dburmistrov joined #fuel
06:47 saju_m joined #fuel
07:15 alex_didenko joined #fuel
07:21 rongze joined #fuel
07:30 Ch00k joined #fuel
07:44 Ch00k joined #fuel
07:46 e0ne joined #fuel
07:48 bookwar joined #fuel
07:57 dburmistrov joined #fuel
08:22 dburmistrov_ joined #fuel
08:23 bookwar1 joined #fuel
08:32 e0ne joined #fuel
08:34 akasatkin joined #fuel
08:37 fandi joined #fuel
08:44 e0ne_ joined #fuel
08:51 Ch00k joined #fuel
08:55 vkozhukalov_ joined #fuel
09:16 tatyana joined #fuel
09:20 anotchenko joined #fuel
09:21 vk joined #fuel
09:27 Ch00k joined #fuel
09:48 fandi joined #fuel
09:54 topochan joined #fuel
10:13 e0ne joined #fuel
10:30 tatyana joined #fuel
10:31 dinosaurpt joined #fuel
10:32 dinosaurpt hi everybody
10:33 saju_m joined #fuel
10:34 dinosaurpt I've run into a problem deploying our PoC cloud using Fuel
10:34 dinosaurpt can anybody please lend me a hand?
10:44 anotchenko Sure. Describe it, and somebody will help you.
10:47 dinosaurpt I've added 4 nodes to my environment
10:47 dinosaurpt everything went smooth
10:47 dinosaurpt I'm using VlanManager
10:48 dinosaurpt now, when i perform a network verification
10:48 dinosaurpt I keep getting a "Verification Failed" error "Expected VLAN (not received)"
10:50 dinosaurpt my switch port configuration is as follows:
10:50 dinosaurpt interface FastEthernet0/1
10:50 dinosaurpt description maq_de_gestao
10:50 dinosaurpt switchport trunk encapsulation dot1q
10:50 dinosaurpt switchport trunk native vlan 701
10:51 dinosaurpt switchport trunk allowed vlan 17,701-705
10:51 dinosaurpt switchport mode trunk
10:51 dinosaurpt no ip address
10:51 dinosaurpt spanning-tree portfast
10:52 dinosaurpt VLAN 17 is public/floating
10:52 dinosaurpt 702 is management
10:52 dinosaurpt 703 storage
10:52 dinosaurpt 704-705 fixed
10:53 dinosaurpt 701 is untagged and I'm using it for PXE boot (which is working)
10:56 dinosaurpt any ideas of what might be causing the problem?
10:57 vkozhukalov_ joined #fuel
11:07 tatyana joined #fuel
11:08 warpig dinosaurpt: I assume you're using Neutron with VLAN tagging?
11:09 warpig If you are, you need to make sure you're also allowing the VLAN range defined under the "Neutron L2 Configuration" section
11:10 warpig typically this is 1000-1030 by default.
11:13 dinosaurpt Hi warpig
11:13 dinosaurpt no, I'm using VlanManager
11:36 saju_m joined #fuel
11:49 e0ne joined #fuel
11:53 anotchenko joined #fuel
12:04 tatyana joined #fuel
12:05 TVR___ joined #fuel
12:08 e0ne_ joined #fuel
12:14 TVR___ does Fuel use dmcrypt on the disk for its ceph deployments?
12:18 Dr_Drache joined #fuel
12:30 evg joined #fuel
12:32 e0ne joined #fuel
12:45 e0ne joined #fuel
13:04 MiroslavAnashkin joined #fuel
13:11 vk joined #fuel
13:15 MiroslavAnashkin joined #fuel
13:37 MiroslavAnashkin joined #fuel
13:40 acca joined #fuel
13:41 acca Hi maybe somebody can help me. I'm trying to deploy OS node via fuel on some old IBM servers
13:42 acca Does anybody know about a bug related to RAID? Fuel doesn't see the disks of the servers
13:43 MiroslavAnashkin What is your RAID controller model?
13:46 MiroslavAnashkin BTW, what is your Fuel version? Do you use the latest 4.1?
13:49 acca joined #fuel
13:50 rsFF hi guys, even after the patch to 4.1.1
13:50 rsFF i still have some troubles
13:51 rsFF http://pastebin.com/XFjqZzX1
13:51 rsFF kickstart fails
13:51 rsFF is the only error that i see in the logs
13:59 TVR___ does Fuel use dmcrypt on the disk for its ceph deployments?
14:01 TVR___ getting an interesting error when I try to expand the OSD count in my existing ceph cluster..ceph-disk: Error: Device /dev/sdf3 is in use by a device-mapper mapping (dm-crypt?): md0
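    (The question goes unanswered here; as a starting point, these standard tools - not Fuel-specific - show which device-mapper or md mapping holds a partition like /dev/sdf3:)
        lsblk                # device tree, including dm/md holders of each partition
        dmsetup ls --tree    # device-mapper mappings and their dependencies
        cat /proc/mdstat     # md arrays, such as the md0 named in the error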
14:05 Dr_Drache hmmm
14:26 MiroslavAnashkin rsFF: You probably encountered this bug: https://bugs.launchpad.net/fuel/+bug/1268961  It is a false error report.
14:31 orsetto joined #fuel
14:32 anotchenko joined #fuel
14:35 orsetto joined #fuel
14:36 jobewan joined #fuel
14:37 orsetto I am experiencing a deploy problem on fuel 4.1
14:37 orsetto 0 disks found on a node with RAID
14:38 orsetto is there a fix?
14:42 sbartel joined #fuel
14:43 rsFF hmmm, thing is, then I don't really know why the deployment failed using an HA setup
14:43 sbartel Hello
14:44 MiroslavAnashkin orsetto: What is your RAID controller model? What Fuel version do you use?
14:46 orsetto fuel 4.1 and controller adaptec
14:47 orsetto adaptec sas raid
14:47 orsetto IBM ServeRAID 8i
14:47 sbartel I am trying to  apply what is explain here http://docs.mirantis.com/fuel-dev/develop/module_structure.html#the-puppet-modules-structure and I have a question
14:48 Dr_Drache joined #fuel
14:48 sbartel don't know if it is the appropriate channel or if I should join fuel-dev
14:49 obcecado hi guys, is there a supported method of using fuel to deploy openstack in HA mode on hardware that only has 2 nics? (using vlan manager)
14:49 orsetto the node show 0 disk
14:49 obcecado the only references so far using two nics are with gre tunnels
14:51 sbartel my question is: is it possible to add custom packages to the repository located on the fuel master node, or should I add another repository to the sources.list of the nodes?
14:51 MiroslavAnashkin orsetto: Yes, last weekend we found out that Fuel 4.1 does not support the Adaptec ASR-7xxx line, and probably the whole Adaptec ASR line.
14:53 MiroslavAnashkin orsetto: Please check your RAID model, so we know which one we should include in the upcoming Fuel 4.1.1
14:53 orsetto MiroslavAnashkin: same problem with fuel 4.0? I have tried it but it gives me the same problem.
14:54 MiroslavAnashkin orsetto: Yes, 4.0 with all latest patches still does not support multiple Adaptec controller models out of the box.
14:54 sbartel join #fuel-dev
14:55 orsetto MiroslavAnashkin: thanks, I will double check the model. Is there a list of supported controllers?
14:56 Dr_Drache joined #fuel
14:58 anotchenko joined #fuel
14:59 IlyaE joined #fuel
15:05 rsFF MiroslavAnashkin - i got this in the fuel master
15:05 rsFF http://pastebin.com/LMphCZCK
15:05 rsFF wondering if it is a resource problem
15:09 MiroslavAnashkin orsetto: By default Fuel supports all controllers for which the Linux kernel has integrated drivers. But there is an issue - old Adaptecs are reported by the Linux kernel as removable devices, so we had to implement a workaround for all of them.
15:13 MiroslavAnashkin rsFF: Please create a diagnostic snapshot, upload it somewhere like Dropbox or a similar service, and provide a link. NIC order change on selected hardware has been our headache for the last half year.
15:13 orsetto MiroslavAnashkin: thanks, workaround like this can maybe solve the problem? https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=490f5e419e84fe34662c519772c9647f6cb091ee
15:14 rsFF MiroslavAnashkin - how do i create a diag snapshot?
15:14 MiroslavAnashkin orsetto: This workaround is already included in 4.1
15:16 orsetto MiroslavAnashkin: so I need another workaround specific for my controller
15:18 MiroslavAnashkin rsFF: Select Support in the Fuel UI, click the Generate diagnostic snapshot button, wait until the link to the snapshot appears (it may take about an hour for large installations with gigabytes of logs), then download the snapshot file via this link
15:18 orsetto MiroslavAnashkin: how can I check if my controller is seen as a removable device?
15:20 MiroslavAnashkin orsetto: No, the issue is not the Removable flag. Please generate and share a diagnostic snapshot too, so we may look at your dmesg log, see the device list, etc.
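    (For reference, the removable flag mentioned above can be read from sysfs; the device name is an example:)
        cat /sys/block/sda/removable    # 1 means the kernel flags the disk as removable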
15:20 saju_m joined #fuel
15:28 orsetto MiroslavAnashkin: ok thanks, last question: once I've generated the snapshot, where do I have to submit it?
15:31 MiroslavAnashkin orsetto: Simply upload it to Dropbox or Google drive and share a link
15:32 orsetto MiroslavAnashkin: ok thanks a lot I will do this
15:55 xarses joined #fuel
16:00 acca joined #fuel
16:02 acca left #fuel
16:02 acca joined #fuel
16:03 acca left #fuel
16:12 justif2 joined #fuel
16:15 Dr_Drache MiroslavAnashkin, hard to believe, but 4.0 is giving me the same errors as 4.1
16:16 MiroslavAnashkin Dr_Drache: Do you use patched 4.0?
16:17 Dr_Drache MiroslavAnashkin, with v4 patch, and patch pmanager, and l23 patch
16:18 Dr_Drache that bio slab error
16:19 Dr_Drache or whatever is past it
16:19 crandquist joined #fuel
16:19 dhblaz joined #fuel
16:20 Dr_Drache seems it's still an issue with gfx.
16:20 Dr_Drache somehow it got fixed, then a reinstall of fuel broke it again
16:21 dhblaz I haven't seen it documented anywhere that while Galera has a node in recovery, the mysql-dependent services on both that node and the node streaming the state transfer are not operational.
16:21 dhblaz This is in fuel 4.0
16:21 dhblaz Has anyone observed this behavior?
16:22 dhblaz I think this often causes dhcp leases on the nodes to expire.
16:23 xarses_ joined #fuel
16:23 dhblaz Recovery on my cluster takes over 10 minutes, closer to 30 minutes.
16:24 dburmistrov joined #fuel
16:25 xarses_ dhblaz, there is a lot of mess with failovers that we have improved in 4.1
16:25 dhblaz Unfortunately there isn't an upgrade path to 4.1
16:25 dhblaz Nor from 4.1 to whatever comes next
16:25 e0ne joined #fuel
16:26 xarses_ dhblaz: yes, if one node is out long enough, a second is selected as a donor to catch it up
16:26 dhblaz a reboot is apparently "long enough"
16:26 IlyaE joined #fuel
16:27 xarses_ dhblaz: hmm, thats odd
16:27 xarses_ dhblaz: please file a bug for that
16:27 dhblaz I would have thought that while these two Galera nodes are down, services that need mysql would use the remaining working one.
16:28 xarses_ dhblaz: they should, there are a number of connection timeouts that we improved in 4.1, which might be why it took so long for you to see recovery
16:29 dhblaz For most of the recovery the state is like this:
16:29 dhblaz wsrep_local_state_comment  | Joining: receiving State Transfer
16:30 dhblaz I don't think it is related to timeout
16:30 dhblaz While in recovery, keystone, horizon, nova... are largely down.
16:30 rmoe joined #fuel
16:31 dhblaz The down time is long enough I imagine the rebooted node is receiving the entire database.
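    (The state lines dhblaz quotes come from Galera's wsrep status counters; a quick way to watch them with the standard mysql client:)
        mysql -e "SHOW STATUS LIKE 'wsrep_local_state_comment';"
        mysql -e "SHOW STATUS LIKE 'wsrep_last_committed';"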
16:32 dhblaz I'll file a bug. I don't know how much help I can be about it though, because I can't really provide a diagnostic log bundle, nor am I willing to replicate the problem on our production cluster.
16:34 Dr_Drache no diagnostic log? that makes a sad panda.
16:36 dhblaz The diagnostic log is too large for me to provide
16:37 dhblaz As another data point, I accidentally ran this command: crm resource restart p_mysql
16:37 dhblaz and that caused the same problem as a reboot
16:37 dhblaz State is wsrep_local_state_comment  | Joining: receiving State Transfer
16:37 dhblaz and has been for a few minutes now
16:38 dhblaz I imagine it will take as long as a reboot to resolve
16:38 xarses_ dhblaz: I can still use the bug, we should be able to reproduce it handily.
16:39 xarses_ dhblaz: yes, if it goes into state transfer, and another node goes into donor, then it's actually doing a full sync of the db between nodes because it lost track of where it was when it was up last
16:39 MiroslavAnashkin dhblaz: You may try sync methods other than mysqldump for Galera. It supports rsync and xtrabackup. Have you enabled incremental state transfer for Galera?
16:40 dhblaz I don't think I have changed it from the way puppet set it up (fuel 4.0)
16:41 MiroslavAnashkin Both rsync and xtrabackup have a shorter blocking time for the donor
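    (A sketch of the my.cnf change implied here; the config file location varies by distro, and xtrabackup requires the percona tools on the donor:)
        [mysqld]
        wsrep_sst_method=rsync    # or xtrabackup; mysqldump blocks the donor the longest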
16:41 xarses_ dhblaz: as to the openstack services, if the management_vip, or public_vip  moved, then you will have 10-30 min of outage before the connections to the db, and rabbit recover themselves
16:42 xarses_ dhblaz: haproxy might additionally hold mysql on the wrong node if the first controller was selected as the donor node, and the mysql port wasn't closed (I'd need to check that)
16:42 xarses_ donor, or recipient node
16:46 Dr_Drache MiroslavAnashkin, xarses_
16:46 Dr_Drache something changed from the v2/v3 patch, to the v4
16:46 Dr_Drache that rebroke whatever we fixed.
16:47 xarses_ Dr_Drache: those 4.0 patches?
16:47 Dr_Drache xarses_, yes
16:47 Dr_Drache xarses_, went back to 4.0, with the v4 patches.
16:47 Dr_Drache and it's still as it is with 4.1
16:47 Dr_Drache unbootable ubuntu
16:47 Dr_Drache on all nodes.
16:48 xarses_ oh, it won't really work on 4.1 very well
16:50 Dr_Drache nope, so as of now, i'm back to where I was when I first attempted fuel.
16:51 xarses_ =(
16:51 Dr_Drache sucks for me, I just told the team we can start deployment testing this afternoon. (this is all me, assumed that 4.0 worked)
16:51 Dr_Drache lol...
16:52 xarses_ hp hardware right?
16:52 Dr_Drache nope
16:52 Dr_Drache dell
16:54 dhblaz Dr_Drache: I thought my deployment would take a few weeks to work out the details, especially using a tool like fuel, and it took months (still finding little and big problems, actually). It could be that I am a dolt. If time is of the essence, you may consider looking into hiring a team to help you stand up your cluster. I believe that Mirantis provides this kind of service.
16:56 Dr_Drache dhblaz, that's nice to hear; issue is, it WAS working, but the fixes were regressed.
16:56 Dr_Drache this isn't even standing up a cluster per se, this is it won't even install openstack.
17:00 Dr_Drache I just have to figure out all the changes that we had to do manually.
17:00 Dr_Drache and redo them
17:02 dhblaz xarses_: You helped me with this before.  will you tell me if this means I broke my db cluster again?
17:02 dhblaz http://paste.openstack.org/show/73676/
17:04 xarses_ dhblaz: line 43 shows that .2 and .3 appear to be working together, i guess this is 16, 17
17:05 xarses_ 18 is out by itself, you should be able to stop it and start it again, and it should re-join
17:05 xarses_ 16, still has mysql-wss start running so the socket might not be ready because of something galera is doing internally
17:06 dhblaz I'm worried my only good copy may be on node-18
17:06 dhblaz Because this number is so low on node-17:
17:06 dhblaz wsrep_last_committed       | 20917675
17:07 xarses_ dhblaz: 17, 18 have the same *state_uuid
17:07 xarses_ they almost have to have the same db for that to occur
17:07 xarses_ 99.9%
17:08 xarses_ that they are the same
17:08 xarses_ you can always use mysqldump to back it up if you're worried
17:08 dhblaz I'm trying
17:08 dhblaz [root@node-18 ~]# mysqldump --all-databases > /tmp/backup
17:08 dhblaz mysqldump: Error: 'Unknown command' when trying to dump tablespaces
17:08 dhblaz mysqldump: Got error: 1047: Unknown command when selecting the database
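    (Error 1047 "Unknown command" is what early Galera versions return when a node is not yet synced and ready to serve queries; a plausible check before retrying the dump:)
        mysql -e "SHOW STATUS LIKE 'wsrep_ready';"    # should be ON before the node accepts queries
        mysqldump --all-databases > /tmp/backup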
17:09 xarses_ Dr_Drache: I still have a copy of the ver2 of that patch
17:13 anotchenko joined #fuel
17:18 dhblaz As an update, I killed mysqld on node-18, that didn't work so I kill -9'd it
17:19 dhblaz node-16 is still reporting can't connect
17:19 dhblaz crossing my fingers
17:20 xarses_ dhblaz: make sure you tell crm to unmanage it or pacemaker will attempt to start it again
17:20 dhblaz Don't I want pacemaker to start it again?
17:20 xarses_ 18?
17:20 dhblaz Yes
17:21 MiroslavAnashkin Dr_Drache: For Dell you need both - disable serial console in GRUB options (solves issue with graphics) and add rootdelay=90 to GRUB options instead
17:21 xarses_ maybe
17:21 dhblaz When you wrote "18 is out by its self, you should be able to stop it and start it again, and it should re-join"
17:21 dhblaz I took that to mean that I needed the process to restart
17:22 xarses_ dhblaz: ok, sounds good, I was confused about which way you were going
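    (The unmanage/manage toggle xarses_ refers to above, in crmsh syntax, using the p_mysql resource name that appears earlier in this log:)
        crm resource unmanage p_mysql    # keep pacemaker from restarting mysql mid-recovery
        crm resource manage p_mysql      # hand control back once recovery is done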
17:23 Dr_Drache MiroslavAnashkin, both are there
17:23 Dr_Drache I used the pmanager.py.updated you provided
17:25 dhblaz I see a lot of this in the logs for node-16
17:25 dhblaz InnoDB: Unable to lock ./ibdata1, error: 11
17:26 xarses_ dhblaz: permissions, or another instance of mysql is still around in some capacity
17:29 dhblaz If I'm reading this correctly it isn't either of those
17:29 dhblaz http://paste.openstack.org/show/73677/
17:32 dhblaz I stand corrected:
17:32 dhblaz [root@node-16 ~]# lsof /var/lib/mysql/ibdata1
17:32 dhblaz COMMAND   PID  USER   FD   TYPE DEVICE   SIZE/OFF    NODE NAME
17:32 dhblaz mysqld  16215 mysql    3u   REG  253,0 8382316544 1049479 /var/lib/mysql/ibdata1
17:32 dhblaz mysqld  17511 mysql    3uW  REG  253,0 8382316544 1049479 /var/lib/mysql/ibdata1
17:34 angdraug joined #fuel
17:35 BillTheKat joined #fuel
17:35 Ch00k joined #fuel
17:39 dhblaz Hmmm, node-18 still seems "out on its own"
17:39 Gil_McGrath I am new to Fuel - I have a question about the neutron VLAN setup on the network page: in the Neutron L3 configuration, the internal network portion assigns its own private address space by default, but what network is it using? Is it grabbing a VLAN from the private VLAN range?
17:40 dhblaz The uuid's don't match the other two nodes
17:41 mutex Gil_McGrath: it takes the first two vlans from the range you provide
17:41 Gil_McGrath why 2
17:41 mutex one for 'private' and one for 'public'
17:42 mutex actually, I take that back - it only creates one
17:42 mutex the other one is used from the public IP section at the top of the network config
17:43 Gil_McGrath On the public section I specified the specific VLAN the public network is on - this is not in the L2 VLAN range
17:45 vkozhukalov_ joined #fuel
17:48 xarses_ dhblaz: https://gist.github.com/xarses/2703f00cf09805e918bc
17:49 Dr_Drache xarses_, when I replace pmanager.py
17:49 Dr_Drache cobbler sync is all that is required?
17:49 dhblaz Thanks
17:50 dhblaz Dr_Drache: I found I had to restart cobbler
17:50 Dr_Drache dhblaz, I did the sync, then a reboot
17:50 dhblaz Depending on what you were needing the sync to do you probably needed to restart (or reboot) then sync
17:51 xarses_ Dr_Drache: if you replace /usr/lib/python2.6/site-packages/cobbler/pmanager.py
17:51 xarses_ cobbler sync
17:52 xarses_ and you should be able to verify change in https://<fuel ip>/cblr/svc/op/ks/system/<node name> where node name is something like "node-1"
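    (A usage sketch of that check; the node name is an example and <fuel ip> must be filled in:)
        curl -sk https://<fuel ip>/cblr/svc/op/ks/system/node-1 | grep -E 'rootdelay|console='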
17:52 Dr_Drache dhblaz, I just didn't write this all down properly.
17:52 Dr_Drache assumed that once fixed, the fixes were going to work and be committed. lol
17:53 xarses_ Dr_Drache: we couldn
17:53 Dr_Drache xarses_, that's not the file i'm replacing.
17:53 Dr_Drache or, not the location.
17:53 xarses_ then it wont work
17:54 Dr_Drache /etc/puppet/modules/cobbler/templates/scripts/
17:54 xarses_ /etc/puppet/modules/cobbler/templates/scripts/pmanager.py is the template that is put into the above location
17:55 Dr_Drache well
17:55 Dr_Drache ....
17:55 xarses_ hint, anything in /etc/puppet/modules is only applied with puppet, any puppet relating to the fuel node is only run once during bootstrap
17:55 Dr_Drache that's where I was told to do my changes last time.
17:55 Dr_Drache then cobbler sync
17:55 Dr_Drache and attempt a redeploy
17:56 xarses_ ya, cobbler sync won't read anything from /etc/puppet/modules*
17:56 tatyana joined #fuel
17:57 Dr_Drache so
17:57 Dr_Drache cp my updated pmanager.py to  /usr/lib/python2.6/site-packages/cobbler
17:58 xarses_ Dr_Drache: I was trying to say, we couldn't drop console=ttyS0,9600 because a number of people need it to work. I'm pushing to make the kernel args field a variable that is passed from the cluster settings page so everyone can customize it to their needs
17:59 Dr_Drache xarses_, I saw that bug.
17:59 Dr_Drache I just want to get back to testing, LOL
17:59 xarses_ Dr_Drache: I'm not sure what the other fix you needed was, but everything in v2 patch is already in the 4.1, I still need to compare it to v4 to see what that changes
18:00 xarses_ what is the other thing that was causing you issues?
18:00 Dr_Drache rootdelay
18:00 xarses_ oh, that was just proposed last week, it never made it to 4.1 =(
18:01 xarses_ on the bright side, it should make the next version
18:01 MiroslavAnashkin Rootdelay is included in the 4.1 patch.
18:01 Dr_Drache also, nomodeset
18:02 Dr_Drache going to copy the file to the proper location now.
18:02 Dr_Drache that requires a cobbler sync then right?
18:02 xarses_ correct, and you can verify the change via the url pattern above
18:03 Gil_McGrath One more question: in the default Fuel set up: is there anyway to specify for CInder to use a Netapp plugin instead of having dedicated storage nodes.
18:03 Dr_Drache going to be pissed at myself
18:03 Dr_Drache if this works
18:04 xarses_ Gil_McGrath: just don't configure any cinder nodes, and then after deployment modify one or more controllers to use the netapp plugin and start cinder-volume on said controllers
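    (A hedged sketch of what that controller-side change might look like in cinder.conf; all NetApp values below are placeholders, and the exact driver options depend on the filer family and protocol - consult the cinder/NetApp docs:)
        [DEFAULT]
        volume_driver=cinder.volume.drivers.netapp.common.NetAppDriver
        netapp_server_hostname=filer.example.com
        netapp_login=admin
        netapp_password=secret
    (then start the service on those controllers, e.g. service openstack-cinder-volume start on CentOS)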
18:04 dhblaz xarses_: do I need to give the second node I start time to sync before starting the third?
18:04 Gil_McGrath ok
18:04 xarses_ Gil_McGrath: we currently can't configure extra plugins out of the box
18:05 MiroslavAnashkin dhblaz: Yes, if your sync process takes 10 minutes.
18:05 dhblaz Darn, It didn't work
18:05 Gil_McGrath xarses_: thanks for the info
18:05 dhblaz node-16 and 17 didn't join the right cluster
18:05 xarses_ dhblaz: if they aren't synced, yes you should wait
18:05 dhblaz They both say synced
18:06 dhblaz but They have different uuid
18:06 Dr_Drache xarses_, how far back in the deploy process do I need to be, so that the new puppet is used?
18:13 dhblaz xarses_: when I follow your instructions
18:14 dhblaz It appears the node that I wanted to be the master didn't start with the cluster address argument
18:26 mutex ;-(
18:26 mutex rabbitmq
18:26 mutex why do you die
18:29 xarses_ mutex: yeppers
18:29 xarses_ we ask the same question
18:30 xarses_ dhblaz: you should stop all of the nodes except the one you want as the master and then re-connect them one at a time by blanking out that gcomm:// config
18:30 dhblaz I did that
18:30 xarses_ erm
18:30 dhblaz and it didn't really work as documented
18:31 mutex xarses_: found any solutions ?
18:31 mutex it just seems to fail for no apparent reason
18:31 xarses_ mutex: 4.1 or 4.0?
18:32 xarses_ dhblaz: that doc workflow usually works very well
18:32 dhblaz Am I supposed to blank out gcomm:// on the master or not?
18:32 xarses_ master
18:32 xarses_ not the "slaves"
18:33 xarses_ but after everyone has been stopped
18:33 dhblaz As I read it the document implies on all nodes
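    (The workflow described above, sketched against the wsrep setting; file paths and node names are assumed:)
        # on the chosen master, start mysql with an empty cluster address:
        wsrep_cluster_address=gcomm://
        # on each remaining node, keep the full member list and start them one at a time:
        wsrep_cluster_address=gcomm://node-16,node-17,node-18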
18:33 mutex 4.0
18:34 xarses_ dhblaz: usually the problem i have is that crm is still attempting to start mysql while I'm attempting to bring it back online
18:35 dhblaz The problem I have now is it looks like it elected the wrong master
18:35 dhblaz and now my db is fubar
18:36 xarses_ mutex: if you move the management_vip the rabbit cluster will get funkey. If rabbitmqctl list_queues doesn't wok on all the nodes, then you likely need to restart the rabbit cluster
18:36 mutex xarses_: I haven't been moving it :-(
18:36 xarses_ if list_queues is fine, then you probably just need to restart the controller services
18:37 Dr_Drache xarses_, no change
18:37 mutex xarses_: what happens to me is that the cluster fragments
18:37 mutex several of the nodes are no longer part of the cluster, they think they are the only member
18:38 dhblaz mutex: I had the same problem until I identified a network problem and resolved it.
18:38 dhblaz :( now my master says: Initialized
18:39 mutex dhblaz: hmmm what kind of network problem ?
18:39 xarses_ mutex: ya, you need to stop them and then restart them and they should rejoin, you should also check 'rabbitmqctl status' looking for the file_descriptors limit
18:40 mutex yeah I did all that I know how to recover it
18:40 mutex I just wish it didn't die for no reason in the first place
18:40 mutex usually seems to happen over the weekend
18:40 dhblaz On my network the switch ports were getting disabled
18:40 mutex oh dear
18:40 dhblaz the cisco equivalent of err-disable
18:41 xarses_ mutex: rabbit in 4.0 has an fd leak, you can update it from the 4.1 repo and/or increase the ulimit to something much bigger than 1024
18:41 dhblaz but a vlan bug like the vlan splinters issue could also cause the same problem if your management network is on a vlan
18:42 xarses_ dhblaz: is there any mysqldump processes running?
18:42 dhblaz mutex: I had that problem that xarses_ mentioned too
18:42 xarses_ probably on the "master"
18:42 Dr_Drache xarses_, the new file is in the url, but no change to the deployment
18:42 mutex oh dear, fd leak is bad
18:42 mutex is there a fuel bug documenting this ?
18:42 dhblaz No
18:42 xarses_ mutex: I think so
18:42 dhblaz Sorry, xarses_  no
18:43 dhblaz mutex: if you look in the 4.1 release notes there is a link to the bug
18:43 mutex KKKKKK
18:43 richardkiene joined #fuel
18:45 xarses Dr_Drache: the url doesn't have your changes in it?
18:45 Dr_Drache the url does
18:45 Dr_Drache the deployment still fails
18:45 xarses did you reinstall the node?
18:46 dburmistrov joined #fuel
18:46 Dr_Drache xarses, all the nodes.
18:46 xarses you can go to https://<fuel node>/cobbler_web -> systems and then "NetBoot enable" the deployed nodes to force a single node to re-install
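    (The cobbler CLI can set the same flag; the node name is an example:)
        cobbler system edit --name=node-1 --netboot-enabled=true
        cobbler sync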
18:47 Dr_Drache 2 computes and controller
18:47 xarses so we can re-test until we find something successful
18:48 xarses Dr_Drache: can you paste the results of that ks url to me
18:48 e0ne joined #fuel
18:48 * xarses is away roaming for noms
18:49 Dr_Drache http://paste.openstack.org/show/73686/
18:49 Dr_Drache xarses,
19:11 Gil_McGrath joined #fuel
19:13 Gil_McGrath may be a stupid question: but as a newbie to Fuel, how can I find the IP addresses assigned to the servers so I can log into them?
19:13 TVR___ from fuel:
19:13 TVR___ cobbler list
19:13 TVR___ that gives you names
19:14 TVR___ then ssh to them
19:14 TVR___ use the fuel node as a jump host
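    (TVR___'s steps put together, with the default Fuel admin address assumed:)
        ssh root@10.20.0.2    # the Fuel master
        cobbler list          # node names look like node-1, node-2, ...
        ssh node-1            # the master resolves node names itself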
19:15 Gil_McGrath TVR___ : thanks I will give that a try - was not obvious.
19:16 TVR___ to be honest, if I had not worked with cobbler previously, I would have been in the same place.
19:29 xarses Gil_McGrath: from the UI you can click on the gear for a node and see the deployed hostname and ip addresses (under interfaces)
19:30 fandi joined #fuel
19:33 xarses Gil_McGrath: you can also use the fuel cli 'fuel node' to list the nodes; it doesn't show ip addresses (currently, but it can easily be modified), but the id column is the digit for the hostnames, which become node-<id>
19:36 dhblaz joined #fuel
19:38 Gil_McGrath xarses: I see what you are saying about the fuel cli - but in the UI, the interfaces expansion under the "gear" shows interfaces but not the IP addresses.
19:40 dhblaz joined #fuel
19:41 xarses Gil_McGrath: if they have an address on the physical interface, they should be shown after you expand the interfaces section
19:42 Dr_Drache xarses, you did get my paste right?
19:42 xarses Dr_Drache: yes
19:42 Dr_Drache kk
19:42 Dr_Drache just making sure
19:47 MiroslavAnashkin Fuel patches moved to http://download.mirantis.com/fuelweb/Fuel_updates/
19:54 TVR___ do we think a patch for adding a controller will be in the 4.1.1 release?
19:55 TVR___ or will this be in 4.2?
19:55 TVR___ adding a controller to an existing cluster + ceph as you remember
19:58 dhblaz hey mutex, what do you get when you do this on a controller?
19:58 dhblaz for PID in `pidof beam.smp`; do fgrep 'Max open files' /proc/$PID/limits; done
20:01 dhblaz If you don't have something like this:
20:01 dhblaz Max open files            102400               112640               files
20:02 dhblaz I suggest you modify /etc/security/limits.conf
20:02 dhblaz Mine looks like this:
20:02 dhblaz http://paste.openstack.org/show/73690/
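    (The paste isn't reproduced in the log; a plausible shape for such limits.conf entries, using the numbers quoted above and the literal "root" per the man page note further down:)
        root  soft  nofile  102400
        root  hard  nofile  112640
        *     soft  nofile  102400
        *     hard  nofile  112640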
20:24 xarses TVR___: you can't add additional controllers with the ceph role?
20:56 Ch00k joined #fuel
21:07 e0ne joined #fuel
21:10 mutex dhblaz: so looks like I get 102400
21:10 dhblaz Even on processes where you haven't restarted rabbitmq since boot?
21:11 mutex hmm
21:11 mutex I don't have any that were not restarted since boot
21:11 mutex I just restarted everything a couple of hours ago
21:12 dhblaz If I recall correctly, limits were set differently at boot than if you use service restart
21:12 mutex I see
21:12 mutex well in the config file I have the same top two lines you posted
21:13 mutex this is on centos
21:15 e0ne joined #fuel
21:15 mutex brb
21:16 dhblaz Per http://www.unix.com/man-page/Linux/5/limits.conf/: NOTE: group and wildcard limits are not applied to the root user. To set a limit for the root user, this field must contain the literal username root.
21:46 mutex oh dear
21:46 mutex well, good to know
21:46 dhblaz :) I didn't have it in there for my health
21:48 mutex heh
21:51 miroslav_ joined #fuel
21:52 mattymo joined #fuel
22:00 dhblaz Does 4.1 fix this?
22:00 dhblaz WARNING: libcurl doesn't support curl_multi_wait()
22:05 xarses dhblaz: cant remember, I've seen something about it recently
22:05 xarses there should be a bug with that
22:07 dhblaz I'll look, thanks
22:07 dhblaz Looks like it needs libcurl 7.28.0
22:09 dhblaz Is there any trick to restarting rabbitmq?
22:12 angdraug if you're on rabbitmq2, you need to stop the whole cluster, then start them in the reverse order
22:12 angdraug rabbitmq3 is more reasonable about restarts, but we're still testing it
22:14 dhblaz How do I figure out the order?
22:14 xarses whichever one you stopped last should be the first started
22:15 dhblaz Does rabbitmq use mysql for the backend?
22:15 xarses as to which one you stop last, it isn't really important
22:15 xarses no
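    (The restart order angdraug describes, as a sketch; node names and init scripts are assumed:)
        service rabbitmq-server stop     # run on node-16, then node-17, then node-18
        service rabbitmq-server start    # then start in reverse: node-18 first, node-16 last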
22:17 dhblaz The last one is taking a long time to stop
22:25 dhblaz Seems like the last one to start is crashing...  http://paste.openstack.org/show/73703/
22:30 e0ne joined #fuel
22:44 MadHatter87 joined #fuel
22:52 dhblaz joined #fuel
