
IRC log for #fuel, 2015-07-07


All times shown according to UTC.

Time Nick Message
00:09 xarses joined #fuel
01:18 Longgeek joined #fuel
02:55 hakimo joined #fuel
03:18 Longgeek joined #fuel
05:22 e0ne joined #fuel
05:26 Longgeek joined #fuel
05:56 Longgeek joined #fuel
05:56 stamak joined #fuel
06:21 dancn joined #fuel
06:46 DevStok joined #fuel
06:46 DevStok hi
06:49 DevStok need help to configure ceph radosgw with centralized keystone
06:57 e0ne joined #fuel
07:36 e0ne joined #fuel
07:45 neophy joined #fuel
07:45 stamak joined #fuel
07:45 dancn joined #fuel
08:36 DaveJ__ joined #fuel
08:45 Alcest joined #fuel
09:03 HeOS joined #fuel
09:13 dklepikov joined #fuel
09:47 HeOS_ joined #fuel
10:20 stamak joined #fuel
10:49 martineg_ joined #fuel
11:27 eliqiao joined #fuel
11:45 Longgeek joined #fuel
11:53 DevStok bye bots
12:02 jaypipes joined #fuel
12:22 Samos123 seeing following issue when using zabbix plugin in 6.1: http://paste.openstack.org/show/351779/
12:22 Samos123 dependency problem of a missing package when installing zabbix-server-mysql
12:22 Samos123 this is only on the 2nd and 3rd controller, the 1 st controller doesn't seem to have the issue
12:23 samuelBartel joined #fuel
13:07 julien_ZTE joined #fuel
13:11 evg Samos123: "broken packages", "is not installable"
13:13 evg Samos123: maybe you have to remove these broken files from the package cache
13:15 evg Samos123: They must be broken during downloading because of some (temporary?) network or repository problem
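evg's cache-cleanup suggestion can be sketched as follows. This is a hedged sketch, assuming Ubuntu controllers fetching via apt and the standard apt archive path; the package name is the one from Samos123's paste.

```shell
# Hedged sketch of evg's suggestion: drop the partially-downloaded .deb
# files and re-fetch the package so the dependency check sees an intact
# archive. Paths assume a stock apt setup on Ubuntu.
rm -f /var/cache/apt/archives/zabbix-server-mysql*.deb
apt-get clean                      # clear every cached .deb, to be safe
apt-get update                     # refresh the package indexes
apt-get install --reinstall zabbix-server-mysql
```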
13:42 Samos123 yea you're right, seems there was a temporary network problem of somebody using the same ip :P, I'm arp poisoning every 1 sec so that the ip stays mine now
13:43 Samos123 the ip belongs to me but guess some guy thought he could just randomly use it... network isn't managed well here ;)
13:51 hezhiqiang joined #fuel
13:56 hezhiqiang Hi, I'm in the "Fuel 6.1 setup" console to configure Fuel networks. There are 4 NICs: eth0, eth1, eth2, eth3. eth0 is a bridged adapter, all others are host-only. When I set the gateway of eth0 to 192.168.8.1, why do all the other NICs' gateways become 192.168.8.1?
14:09 claflico joined #fuel
14:57 rmoe joined #fuel
15:14 championofcyrodi joined #fuel
15:40 darkhuy joined #fuel
15:44 championofcyrodi well, i upgraded my fuel server, it is using a 1Gbps connection and i've created a local ubuntu repo as well.  our ISP connection is 100Mbps and provisioning the ubuntu image still seems too slow to complete in 30 minutes...
15:46 championofcyrodi at the moment i'm trying to diagram out my configuration to see if anyone can explain to me what the heck i'm doing wrong here.
15:48 mwhahaha when the image is building, have you checked the IO usage on the fuel master?
16:18 championofcyrodi mwhahaha: here is my layout http://i.imgur.com/93ZmGT2.png
16:18 championofcyrodi checking fuel master IO now.
16:19 championofcyrodi load average: 6.07, 6.30, 5.82
16:19 rodrigo_BR joined #fuel
16:20 rodrigo_BR hello there, can i install lbaas 1.0.0 on fuel 6.1 ?
16:22 championofcyrodi mwhahaha: %iowait has been 23.05, 33.23, 31.75, 30.54 over the last 40 minutes
16:22 mwhahaha championofcyrodi: that seems kinda high? what does iostat say for disk utilization? What I'm thinking is that the building of the image on disk is your bottleneck, and if that is the case we need to update that bug around the image building process to indicate that it does indeed happen
16:23 championofcyrodi http://paste.openstack.org/show/a9hyH0E20GsRH80mDtKA/
16:23 championofcyrodi iostat output ^
16:24 mwhahaha do an iostat -xk
16:25 championofcyrodi http://paste.openstack.org/show/pCwjxag19sPB3nq4PoDn/
16:25 mwhahaha is it building right now?
16:26 championofcyrodi i'm seeing debug statements saying, Data received by Provisioning Proxy Reporter to report it up
16:26 championofcyrodi progress=>70
16:27 championofcyrodi in /var/log/docker-logs/astute/astute.log
16:27 mwhahaha if you do a 'iostat -xkN 2' you can watch it over time and see which device is being used
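mwhahaha's watch-over-time check can be scripted. A minimal sketch, assuming sysstat's extended output where %util is the last column of each device line; the 90% threshold is an arbitrary choice for "busy":

```shell
# Watch extended device stats every 2 seconds and flag any device whose
# %util (last column in sysstat's -x output) is above 90.
iostat -xkN 2 | awk 'NF > 1 && $NF+0 > 90 { print "busy:", $1, $NF "%" }'
```

The `$NF+0` coercion skips header lines, where the last field ("%util") evaluates to zero.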
16:29 championofcyrodi looks like docker-253:3-3276801-a46c924ac6ac085841cb03bf6481a073aa3996cdaeb075905e ... which is ...\
16:29 championofcyrodi fuel/mcollective_6.1:latest
16:30 championofcyrodi has 99.55 %util
16:31 mwhahaha that seems odd, i'm going to do a deploy to see if i see the same on my laptop
16:31 mwhahaha gimme a few
16:34 championofcyrodi it seems to have settled down now
16:37 championofcyrodi at this point it looks like almost nothing is happening, zeros across the board
16:37 championofcyrodi the percentage is still going up on the Fuel Master UI though... 58% which i think is further than it got last time.
16:38 championofcyrodi and all of the progress bars for each node appear to be at 100%... if that matters.
16:38 championofcyrodi now it is doing reboot tasks and getting successfully rebooted
16:38 championofcyrodi with no nodes failing to reboot
16:41 championofcyrodi debug: still provisioning following nodes, 1,2,3,4,5, ProvisioningProxyReporter to report it up: all 5 nodes showing 100 progress.
16:41 championofcyrodi casting message to nailgun, provision_resp, args... status provisioned.
16:41 mwhahaha the image build worked and now it's moving on to the openstack deployment items
16:42 championofcyrodi seems like it.
16:42 championofcyrodi i've used docker a lot with some other applications, it seems as though my performance is MUCH better if I symlink /var/lib/docker to another disk, rather than the OS disk.
16:42 championofcyrodi also I've gotten better performance with btrfs
16:42 championofcyrodi well, for reads.
16:43 championofcyrodi but i'm usually reading large chunks of sequential blocks.
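The symlink trick described above can be sketched like this. `/mnt/fastdisk` is a hypothetical mount point for the faster disk; stop docker first so the storage tree isn't moved out from under running containers.

```shell
# Sketch, with /mnt/fastdisk as a hypothetical mount for the faster disk:
# relocate docker's storage and leave a symlink at the old path.
service docker stop
mv /var/lib/docker /mnt/fastdisk/docker
ln -s /mnt/fastdisk/docker /var/lib/docker
service docker start
```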
16:44 championofcyrodi all the OSes are installed, and puppet is running now.
16:44 championofcyrodi i guess a beefier PC got me through it this time before the timeout.
16:45 championofcyrodi i can zip all my logs up and pass them along, or update any kind of bug report if you think it will help.
16:46 mwhahaha might make a mention of specs of the hardware
16:50 championofcyrodi the fuel master is a core i7 vPro (1 socket quad core @ 2.9GHz), with a 1TB 7200RPM SATA 3Gb/s 32MB cache drive, 16GB RAM.
16:51 championofcyrodi maybe something fishy w/ the network config?
16:53 mwhahaha probably your sata drive
16:53 mwhahaha or was that the one that worked?
16:54 championofcyrodi that is the one that worked
16:54 mwhahaha what was the other one
16:54 championofcyrodi let me see...
16:56 championofcyrodi Samsung SpinPoint HE160HJ 160GB SATA/300 7200RPM 16MB Cache
16:56 championofcyrodi Core2Duo, 4GB RAM
16:57 championofcyrodi which we were using when deploying Centos on 5.0, and then upgrade to 6.0
16:57 championofcyrodi and redeployed icehouse w/ it.
16:57 championofcyrodi but again, that was centos which you mentioned was prebuilt, so not really relevant.
16:58 mwhahaha yea i think it was the hardware
16:58 championofcyrodi we lamely assumed it was just a 'pxe' server, and that the actual nodes would download everything and build/configure locally.
16:58 mwhahaha well you could use the classic provisioning
16:58 mwhahaha which that would do
16:58 mwhahaha but that's slotted to go away with 7
17:09 angdraug joined #fuel
17:09 e0ne joined #fuel
17:10 championofcyrodi i used to work w/ a developer who built on an SSD... had the same problem determining what was the minimum hardware requirement in prod.
17:10 championofcyrodi initial deployment was so slow it was unacceptable.
17:10 championofcyrodi didnt help the customer was using like, IE7
17:11 championofcyrodi "works for me" :)
17:13 mwhahaha indeed
17:13 mwhahaha but server class hardware is usually better than some random samsung 160G drive ;)
17:22 stamak joined #fuel
17:23 championofcyrodi **burn**
17:25 championofcyrodi 128GB SAS Disk is the suggested minimum.
17:26 championofcyrodi http://answers.splunk.com/answers/84340/how-many-iops-can-i-expect-from-a-disk.html
17:26 championofcyrodi i see.
17:30 mwhahaha yea sas > sata
17:53 Longgeek joined #fuel
17:56 ddmitriev joined #fuel
17:57 stamak joined #fuel
17:57 championofcyrodi one other question.  what is the use of the PUBLIC network on the compute interfaces?
17:59 mwhahaha i'd assume it's for providing public/floating ips for vms but someone else should probably confirm that
17:59 julien_ZTE joined #fuel
17:59 championofcyrodi so my compute nodes NEED to be connected to the public switch as well?
18:00 championofcyrodi I am interested in this feature: https://blueprints.launchpad.net/fuel/+spec/virtual-router-for-env-nodes
18:01 xarses joined #fuel
18:01 championofcyrodi so that i don't need to dedicate a physical NIC to public switch
18:06 mwhahaha what you describe would be nova networking where all traffic goes through a single point. i mean you don't have to have public interfaces if you don't plan on using the public network i guess
18:07 championofcyrodi well the fuel UI sort of forces them on  you
18:08 championofcyrodi i just jammed them over with the Management/Admin/Private networks...
18:08 mwhahaha yes because usually that's what people want for a cloud, where the internal networks are not accessible outside of the cloud and the public interface is how people would access services
18:08 championofcyrodi also my install just errored out while near the end :(
18:08 championofcyrodi (/Stage[main]/Ceph::Osds/Ceph::Osds::Osd[/dev/sdd3]/Exec[ceph-deploy osd prepare node-1:/dev/sdd3]/returns) change from notrun to 0 failed: ceph-deploy osd prepare node-1:/dev/sdd3 returned 1 instead of one of [0]
18:09 mwhahaha ceph'd :(
18:09 mwhahaha i've seen that error before, usually when the disk had partitions on it i think
18:09 championofcyrodi it did
18:09 championofcyrodi it was the old fuel 6.0 juno environment's disk
18:10 mwhahaha yea you'll need to clean the old partitions first
18:10 mwhahaha or just reset the environment now
18:10 mwhahaha it'll nuke the partitions when it resets the systems
18:11 championofcyrodi my other two nodes havent failed yet...
18:11 mwhahaha so if they don't you can manually clear the partitions and redeploy
18:12 mwhahaha or reset the environment which will reset (and clean the machines) and you can redeploy from scratch. fortunately it won't rebuild the image because it's already there if you just do a reset
18:12 championofcyrodi just run gdisk and blow'em all away?
18:12 championofcyrodi cool
18:12 mwhahaha yea that should work
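The "blow'em all away" step can be sketched as below. This is destructive and assumes `/dev/sdd` is the disk that held the old 6.0 environment's partitions; substitute the actual device.

```shell
# Destructive sketch: wipe the old partition tables so ceph-deploy sees
# a clean disk. /dev/sdd is an assumption -- use the disk from the error.
DISK=/dev/sdd
sgdisk --zap-all "$DISK"                              # wipe GPT and protective MBR
dd if=/dev/zero of="$DISK" bs=1M count=10 conv=fsync  # clear leftover metadata
partprobe "$DISK"                                     # make the kernel re-read the table
```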
18:24 championofcyrodi nice... looks like the other two nodes will be successful.  i'll let'em all finish up and do the reset
18:26 championofcyrodi did you get a chance to see the diagram i put together for my stack, and determine if it seems sensible?  The idea is to get better bandwidth for storage and hopefully have less chatter from multiple controllers.
18:26 championofcyrodi since i only have a 1Gbps switch
18:27 championofcyrodi as you can see, i just put the 'public' network on eth0 with the others.
18:27 championofcyrodi in hopes the vrouter feature will just work out of the box?
18:28 championofcyrodi except on the controller, 'public' is on the 10Gb fiber
18:28 championofcyrodi while everything else is on bond0
18:28 mwhahaha vrouter is already there, it's on the controllers for some stuff. you can see it in 'ip netns list'
18:28 mwhahaha your environment is probably ok for a smallish deployment, it really depends on your use case
18:29 championofcyrodi smallish is what it has come to...
18:30 junkao_ joined #fuel
18:30 championofcyrodi everyone wants baremetal, but they want someone else (me) to maintain all of it... which i was hoping to get out of a little bit by using openstack and letting each project be in its own tenant.
18:31 championofcyrodi but i've had a lot of trouble resolving neutron issues within 5 minutes... so it's lost some credibility/funding.
18:32 championofcyrodi of course the fact that services were up and responsive for 8 months w/o incident is overlooked.
18:32 championofcyrodi similar issue w/ icehouse... after 5-6 months, it just started to crawl and ultimately stop serving requests.
18:33 championofcyrodi my guess is that mysql tables and rabbitmq queues were growing unbounded and timeouts occurred throughout.
18:33 championofcyrodi based on the error messages reporting 'timeout' and 'unknown error' everywhere
18:35 mwhahaha indeed
18:36 championofcyrodi it's almost as if you need more than one person to maintain a production worthy stack...
18:38 mwhahaha the cloud still requires care and feeding, i don't think people quite understand that
18:39 championofcyrodi i keep telling myself to install devstack on a VM so I can practice things like cleaning up mysql.
18:40 championofcyrodi there is so much going on, i'm always nervous i'm going to blow away some schema dependency because of some foreign key constraint, and end up losing the ability to deploy instances or something.
18:43 rodrigo_BR hello there, the lbaas plugin 1.1.0 works with the fuel 6.1 ?
18:44 championofcyrodi rodrigo_BR: https://docs.mirantis.com/openstack/fuel/fuel-6.1/release-notes.html
18:44 championofcyrodi Ctrl+F, type 'LBaaS plugin'
18:45 championofcyrodi 'The 6.1 compatible LBaaS plugin has been modified so that it can be deployed on controllers in HA mode. Please note that this enables the new LBaaS plugin to work with 6.1, but does not make the plugin itself HA.'
18:47 championofcyrodi ***Cloning into 'devstack'...
18:50 mwhahaha rodrigo_BR: checking for you, i think 1.1.0 is the 6.1 version but i'm not sure
18:55 rodrigo_BR tks championofcyrodi
18:56 rodrigo_BR tks mwhahaha
18:56 championofcyrodi rodrigo_BR:  I would check w/ mwhahaha, because those release notes don't mention if the plugin is 1.1.0
18:59 championofcyrodi finally all nodes timed out and failed ceph deploy... kicking off reset...
18:59 championofcyrodi hopefully this sucka will be up and running so i can start glance imports tonight.
19:03 rodrigo_BR [root@fuel tmp]# fuel -v
19:03 rodrigo_BR DEPRECATION WARNING: /etc/fuel/client/config.yaml exists and will be used as the source for settings. This behavior is deprecated. Please specify the path to your custom settings file in the FUELCLIENT_CUSTOM_SETTINGS environment variable.
19:03 rodrigo_BR 6.1.0
19:03 rodrigo_BR [root@fuel tmp]# fuel plugins --install lbaas-1.0.0.fp
19:03 rodrigo_BR DEPRECATION WARNING: /etc/fuel/client/config.yaml exists and will be used as the source for settings. This behavior is deprecated. Please specify the path to your custom settings file in the FUELCLIENT_CUSTOM_SETTINGS environment variable.
19:03 rodrigo_BR DEPRECATION WARNING: The plugin has old 1.0 package format, this format does not support many features, such as plugins updates, find plugin in new format or migrate and rebuild this one.
19:03 rodrigo_BR Plugin lbaas-1.0.0.fp was successfully installed.
19:03 rodrigo_BR [root@fuel tmp]# fuel plugins list
19:03 rodrigo_BR DEPRECATION WARNING: /etc/fuel/client/config.yaml exists and will be used as the source for settings. This behavior is deprecated. Please specify the path to your custom settings file in the FUELCLIENT_CUSTOM_SETTINGS environment variable.
19:03 rodrigo_BR id | name  | version | package_version
19:03 rodrigo_BR ---|-------|---------|----------------
19:03 rodrigo_BR 2  | lbaas | 1.0.0   | 1.0.0
19:03 rodrigo_BR [root@fuel tmp]#
19:03 rodrigo_BR installs but doesn't work
19:04 stamak joined #fuel
19:05 mwhahaha yes 1.0.0 is not compatible with 6.1
19:06 rodrigo_BR is there another way to install this?
19:11 mwhahaha you could manually install the lbaas openstack agent
19:19 championofcyrodi oooo....https://github.com/error10/kexec-reboot
19:19 championofcyrodi my HP proliants take FOREVER to reboot.
19:20 championofcyrodi i like the idea of booting into a new kernel, since hardware post checks are not needed for a reboot.
19:25 championofcyrodi essentially, this **should** reboot the fuel master w/o BIOS post check...
19:25 championofcyrodi kexec -l /boot/vmlinuz-2.6.32-504.1.3.el6.mos61.x86_64 \
19:25 championofcyrodi --initrd=/boot/initramfs-2.6.32-504.1.3.el6.mos61.x86_64.img \
19:25 championofcyrodi --command-line="root=/dev/mapper/os-root rd_NO_LUKS  KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_LVM_LV=os/root rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto  biosdevname=0 crashkernel=none rd_LVM_LV=os/swap rd_NO_DM"; systemctl start kexec.target
19:26 championofcyrodi i'll be trying that out w/ a VM
19:26 mwhahaha i don't think we actually reboot for image based provisioning
19:26 mwhahaha unless you have the reboot plugin
19:26 championofcyrodi doesnt look like it did...
19:26 championofcyrodi but the reset did
19:27 mwhahaha yes the reset will because we wipe the disk then pxe boot
19:27 mwhahaha in the fuel docs there's a whole page about how the image based provisioning works
19:27 championofcyrodi i guess you have to do bios reboot for pxe
19:27 CTWill joined #fuel
19:28 championofcyrodi https://docs.mirantis.com/openstack/fuel/fuel-6.1/reference-architecture.html#image-based-provisioning
19:28 mwhahaha yea that
19:54 e0ne joined #fuel
19:54 championofcyrodi ahh astute uses cobbler!
19:54 mwhahaha rodrigo_BR: i just got word that the 6.1 compatible version of the lbaas plugin has not been released yet. I was told soon but not a date
19:55 championofcyrodi so basically fuel is using some orchestration scripting of cobbler+puppet+mcollective to deploy...
19:56 championofcyrodi which is what i've traditionally done w/ an OS and installed packages for java, hadoop, zookeeper, etc...
19:57 championofcyrodi which is nice because i don't have to learn what packages and configs openstack really needs.
19:58 championofcyrodi which is bad because i don't know what packages and configs openstack really needs.
19:58 CTWill just ran the 6.0 to 6.1 and I can not do a mco ping -i to any of the nodes I am trying to deploy a new cloud on. any ideas?
19:58 rodrigo_BR mwhahaha: Good New !
20:01 Longgeek joined #fuel
20:03 CTWill joined #fuel
20:15 CTWill anyone?
20:16 mwhahaha CTWill: did you reboot the nodes that were sitting in bootstrap?
20:18 CTWill nope, rebooting them now. This is my second instance of trying to deploy on 6.1
20:19 CTWill I tried dumping the entire cloud and starting a new deployment
20:22 CTWill no joy
20:22 CTWill rebooted all 4 systems
20:22 CTWill still can not do a mco ping -I
20:22 CTWill No responses received
20:23 mwhahaha what are you running? 'mco ping -I'? i get an error when i run that
20:24 CTWill started with network verification
20:24 CTWill its fails
20:25 CTWill Method verify_networks. Network verification not avaliable because nodes ["112", "113", "114", "116"] not avaliable via mcollective
20:25 mwhahaha does fuel report them online?
20:25 CTWill yes
20:25 CTWill online = true
20:25 mwhahaha can you run a 'fuel node' and  just an 'mco ping' and share the results?
20:26 CTWill for all 4 nodes
20:26 CTWill I can ping them from the fuel master
20:28 mwhahaha the other thing would be to log into those nodes and check that mcollective is running
20:30 CTWill it is running
20:31 CTWill i just restarted it
20:31 mwhahaha what's running? the deploy?
20:31 CTWill mco ping -i: no responses received
20:31 CTWill network verification
20:31 CTWill pre-deploy test
20:31 mwhahaha is the mcollective docker instance running?
20:31 CTWill yes
20:32 mwhahaha because that should have answered to an 'mco ping'
20:32 CTWill mcollective  fuel/mcollective_6.1     Running      fuel-core-6.1-mcollective
20:32 mwhahaha please show me the output to just 'mco ping' no -i or anything
20:32 CTWill one sec
20:33 CTWill master                                   time=34.81 ms
20:33 CTWill 1 replies max: 34.81 min: 34.81 avg: 34.81
20:34 mwhahaha ok so the master is ok, now i find it odd that the nodes are showing as online but are not connected to mcollective
20:34 CTWill each mco ping does about the same
20:34 mwhahaha because they should go offline
20:34 championofcyrodi mwhahaha: :(
20:34 championofcyrodi (/Stage[main]/Ceph::Osds/Ceph::Osds::Osd[/dev/sdd3]/Exec[ceph-deploy osd prepare node-3:/dev/sdd3]/returns) change from notrun to 0 failed: ceph-deploy osd prepare node-3:/dev/sdd3 returned 1 instead of one of [0]
20:34 championofcyrodi same error after reset... gonna have to dig deeper...
20:34 mwhahaha championofcyrodi: try and run that command manually
20:34 mwhahaha see what the error is
20:36 championofcyrodi bootstrap-osd keyring not found; run 'gatherkeys'
20:36 CTWill everything else on the network verification seems to be happy except the mcollect
20:36 championofcyrodi [ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
20:36 championofcyrodi [ceph_deploy.cli][INFO  ] Invoked (1.5.20): /usr/bin/ceph-deploy osd prepare node-2:/dev/sdd3
20:36 championofcyrodi [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks node-2:/dev/sdd3:
20:36 championofcyrodi [ceph_deploy][ERROR ] RuntimeError: bootstrap-osd keyring not found; run 'gatherkeys'
20:36 championofcyrodi ^more verbose
20:36 mwhahaha CTWill: if the nodes aren't checking in to mcollective none of the deploy functions will work
20:37 CTWill yup I noticed that on my first attempt with MOS 5.0
20:37 mwhahaha championofcyrodi: there's probably something in the puppet that is doing that, i thought it might be simple to try and reproduce :/
20:37 CTWill I double checked the network settings
20:37 mwhahaha championofcyrodi: you can manually run the task that failed on the box if you go dig out the puppet file from the logs on that node
20:38 mwhahaha CTWill: make sure you can log into those nodes and check if mcollective is running. You should also look in the remote log files for those nodes as well /var/log/docker-logs/remote (on the fuel master)
20:39 championofcyrodi but how do i ensure the rest of the openstack deploy tasks are executed?
20:39 championofcyrodi to get the fuel master out of a 'failed deployment state'?
20:39 mwhahaha championofcyrodi: we're just trying to figure out why that task is failing so we can fix it :)
20:39 mwhahaha step 1
20:40 championofcyrodi it seems the task failed because there is no ceph keyring?  is it that generating the ceph keyring failed earlier?
20:41 CTWill check ip or node number?
20:41 mwhahaha CTWill: ssh to the ip
20:41 CTWill TCP Connection to stomp://mcollective@10.43.0.10:61613 failed on attempt 1076
20:41 mwhahaha championofcyrodi: oh was that from your logs or the manual run?
20:41 CTWill the log
20:41 CTWill im guessing that is the problem
20:41 championofcyrodi think i found it
20:41 championofcyrodi (/Stage[main]/Ceph::Conf/Exec[ceph-deploy gatherkeys remote]/returns) change from notrun to 0 failed: ceph-deploy gatherkeys node-5 returned 1 instead of one of [0]
20:41 CTWill the rabbitmq
20:42 mwhahaha CTWill: yea that'd be a problem
20:42 mwhahaha CTWill: are all the docker containers running correctly?
20:42 championofcyrodi manual run
20:42 mwhahaha championofcyrodi: that would explain that manual error, was that the task that failed from the astute log?
20:43 championofcyrodi yes, but i found it also further down in the 'puppet' logs for that node via Fuel UI
20:43 CTWill i just ran dockerctl check
20:43 CTWill everything says ready
20:44 championofcyrodi actually... scratch that... i don't know if it was actually in the astute log
20:44 championofcyrodi let me check.
20:44 julien_ZTE joined #fuel
20:45 mwhahaha CTWill: is your fuel master 10.40.0.10?
20:45 CTWill 10.43.0.10
20:45 championofcyrodi hmmmmm... looks like that error message is NOT in the astute log
20:46 championofcyrodi so i guess it's happening on the node via puppet, but not getting collected back to astute?
20:46 mwhahaha championofcyrodi: you have to go to the puppet log on the failed node to find the puppet task that failed
20:46 mwhahaha championofcyrodi: it's kinda annoying
20:47 CTWill how can i see what ip address the rabbitmq docker container is using?
20:47 championofcyrodi i guess i can re-run gatherkeys and find out why that failed...
20:47 mwhahaha you start at astute then find the node that failed, then go to the puppet log and look for the err: line at the bottom
20:47 mwhahaha CTWill: it'll use that 10.43.0.10 address
20:48 championofcyrodi [ceph_deploy][ERROR ] RuntimeError: connecting to host: node-5 resulted in errors: HostNotFound node-5
20:48 CTWill im going to try and restart rabbitmq again
20:49 championofcyrodi Warning: Permanently added 'node-5,10.10.28.8' (ECDSA) to the list of known hosts.
20:49 championofcyrodi Write failed: Broken pipe
20:49 championofcyrodi if node-5 is my controller, it may be the bond0-rr failing
20:49 CTWill no joy
20:49 championofcyrodi i can ping node-5...
20:49 championofcyrodi but ssh is resulting in broken pipe
20:50 championofcyrodi root@node-2:~# ssh node-5
20:50 championofcyrodi ssh_exchange_identification: read: Connection reset by peer
20:51 championofcyrodi yup, node-5 is my controller
20:52 championofcyrodi do you think the ssh connection is being reset by peer in ubuntu because of the round-robin MAC address swapping on RECV'd session packets?
20:53 mwhahaha it's possible
20:53 mwhahaha don't use rr
20:54 mwhahaha CTWill: restart the mcollective container too
20:55 championofcyrodi heh, that's the default for fuel when choosing to bond!
20:56 CTWill restarting
20:56 CTWill done
20:56 mwhahaha if you 'ss -nlp | grep 61613' on the master is it listening?
20:57 CTWill tcp    LISTEN     0      128                   :::61613                :::*      users:(("beam.smp",5122,20))
20:58 mwhahaha ok so if you go to one of the bootstrap nodes, can it connect?
20:58 HeOS joined #fuel
20:58 mwhahaha championofcyrodi: my personal opinion is to use active/backup or LACP depending on what you need
20:59 mwhahaha championofcyrodi: which may vary from an official stance for fuel, but those i've used in the past in other places with much success
20:59 CTWill it still failing
20:59 CTWill is there a way to redeploy the docker containers
21:00 CTWill rabbitmq.rb:25:in `on_connectfail' TCP Connection to stomp://mcollective@10.43.0.10:61613 failed on attempt 11
21:00 mwhahaha CTWill: yea
21:00 championofcyrodi yea, LACP didn't seem to work well on this shitty old switch the last time i tried it manually,
21:00 championofcyrodi i'll try ALB and see what i get
21:00 CTWill re-run the upgrade.sh?
21:01 championofcyrodi heh, after i wait forever for these last two nodes to fail so i can reset.
21:01 mwhahaha CTWill: so is this a new deployment or an old one?
21:01 CTWill it is a 6.0 upgrade to 6.1
21:01 mwhahaha championofcyrodi: just stop the deployment and wait for it to end
21:01 mwhahaha CTWill: could you not just reinstall with 6.1? or are there other environments currently running
21:02 CTWill yes there are
21:02 mwhahaha then you probably don't want to destroy all the containers and rebuild
21:03 CTWill the upgrade kept all my data
21:03 championofcyrodi well the master is deployed okay... maybe i can tweak the ssh settings and get it to work
21:04 mwhahaha CTWill: i'm not sure if the upgrade can be run multiple times, but this seems more of an issue with connectivity
21:04 mwhahaha CTWill: can you connect to port 61613 from a bootstrapped node via nc or something?
21:06 CTWill Ncat: Connection refused.
21:06 mwhahaha i see
21:06 mwhahaha so lets go back to the fuel node, iptables perhaps?
21:08 CTWill fuel master?
21:08 mwhahaha yea
21:09 CTWill Paste #353079
21:10 mwhahaha do a iptables -vnL so i can see what interfaces the rules are on
21:10 championofcyrodi okay, weird... so after 'stopping' the deployment. i can now SSH to node-5
21:10 championofcyrodi i'm attempting to re-deploy the compute nodes again.
21:10 mwhahaha championofcyrodi: you and your weird environments
21:11 championofcyrodi well, it seems that bonding is the only way to achieve faster network IO between controller and nodes w/o using 10Gbps
21:11 CTWill Paste #353080
21:11 mwhahaha CTWill: what interface is your admin network on?
21:12 mwhahaha 13716  823K REJECT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           multiport ports 4369,5672,15672,61613 /* 042 rabbitmq_block_ext */ reject-with icmp-port-unreachable
21:12 mwhahaha that's your problem
21:12 CTWill so remove that rule
21:12 mwhahaha well no
21:12 mwhahaha we need to fix the rules for your environment
21:13 mwhahaha 0     0 ACCEPT     tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0           multiport ports 4369,5672,15672,61613 /* 040 rabbitmq_admin */
21:13 mwhahaha that's the rule that these things should be getting
21:13 mwhahaha or the one below it with src-type LOCAL
21:13 mwhahaha so if your admin interface isn't eth0, you should update your rule to that effect
21:13 mwhahaha and all should be well
21:14 mwhahaha if we're lucky
21:14 CTWill eth2
21:14 mwhahaha switch that rabitmq_admin rule to eth2 and reload the rules
21:15 mwhahaha probably in multiple places
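The fix mwhahaha describes can be sketched as below, assuming the admin network is on eth2 as CTWill reported and the rule text matches the iptables -vnL output quoted above: rebind the rabbitmq_admin ACCEPT rule from eth0 to eth2 so stomp/mcollective (61613) and rabbitmq (4369, 5672, 15672) traffic from the bootstrap nodes stops falling through to the rabbitmq_block_ext REJECT rule.

```shell
# Sketch, assuming eth2 carries the admin network: delete the eth0-bound
# ACCEPT rule and insert an identical one bound to eth2 ahead of the
# REJECT rule that is currently eating the traffic.
iptables -D INPUT -i eth0 -p tcp -m multiport --ports 4369,5672,15672,61613 \
  -m comment --comment "040 rabbitmq_admin" -j ACCEPT
iptables -I INPUT -i eth2 -p tcp -m multiport --ports 4369,5672,15672,61613 \
  -m comment --comment "040 rabbitmq_admin" -j ACCEPT
service iptables save    # persist across reboots on the CentOS-based master
```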
21:17 CTWill I've punted to mirantis support on this one, brb have a meeting to goto...
21:51 Longgeek joined #fuel
23:42 claflico joined #fuel
23:44 Longgeek joined #fuel
