
IRC log for #fuel, 2014-07-08


All times shown according to UTC.

Time Nick Message
00:29 IlyaE joined #fuel
01:06 mattgriffin joined #fuel
01:26 xarses joined #fuel
03:14 mattgriffin joined #fuel
03:15 IlyaE joined #fuel
03:38 angdraug Kupo24z: not out yet, see Expected field in https://launchpad.net/fuel/+milestone/5.0.1
03:48 mattgriffin joined #fuel
05:43 Longgeek joined #fuel
05:57 IlyaE joined #fuel
06:02 Arminder joined #fuel
06:05 Arminder guys, any known issue with quotas in 5.0?
06:05 Arminder it doesn't work anymore from the dashboard
06:35 al_ex8 joined #fuel
07:13 pasquier-s joined #fuel
07:15 ddmitriev joined #fuel
07:28 IlyaE joined #fuel
07:35 hyperbaba joined #fuel
08:00 pasquier-s joined #fuel
08:06 Longgeek joined #fuel
08:13 guillaume_ joined #fuel
08:14 pasquier-s joined #fuel
08:17 pasquier-s joined #fuel
08:21 Longgeek_ joined #fuel
08:21 brain461 joined #fuel
08:22 pasquier-s joined #fuel
08:25 pasquier-s joined #fuel
08:36 Longgeek joined #fuel
08:37 pasquier-s_ joined #fuel
08:41 pasquier-s__ joined #fuel
08:50 pasquier-s joined #fuel
08:58 artem_panchenko joined #fuel
08:58 Longgeek_ joined #fuel
09:19 evg_ Arminder: haven't seen any. What's wrong in the dashboard?
09:19 Longgeek joined #fuel
09:25 Longgeek_ joined #fuel
09:31 geekinut1h joined #fuel
09:36 pasquier-s_ joined #fuel
10:11 brain461 joined #fuel
10:12 pasquier-s joined #fuel
10:17 eshumakher joined #fuel
11:07 Arminder evg_: it doesn't allow the user/tenant quotas to be changed
11:16 evg_ Arminder: have you tried doing it in cli?
11:49 Arminder aye it errors too
11:50 Arminder though i could change neutron quotas from the CLI
11:50 Arminder but not nova ones like no. of cpus/instances/volumes/etc.
11:59 TVR_ hyperbaba ... you on?
12:00 TVR_ I might be able to help with the ceph issue
12:00 hyperbaba TVR_: yes
12:00 Longgeek joined #fuel
12:00 TVR_ what did you use for your networking when you built the env?
12:00 hyperbaba TVR_:  oh? you do? Thank you
12:01 hyperbaba TVR_ it's a neutron network with openvswitch bonding
12:01 hyperbaba on a couple of 1gig ethers
12:01 TVR_ are you using neutron, with gre or vlan?
12:01 hyperbaba gre
12:01 TVR_ cool.. I am most familiar with that setup
12:02 TVR_ so... what do you have vlan tagged?
12:02 hyperbaba only 2 tagged vlans... for management and storage
12:02 TVR_ so.. is public vlan'd, management, storage.. what has a tag?
12:02 TVR_ cool
12:02 TVR_ so.. what happens when you ifconfig and look for the storage ip.. all of them have it.. yes?
12:03 hyperbaba but here is the catch... i've ruled out the networking issue because other osd processes are working on that node
12:03 hyperbaba yes
12:03 TVR_ so.. you have osd's that drop.. yes?
12:03 TVR_ so.. ceph osd tree shows dropped osd's yes?
12:04 hyperbaba yes
12:04 hyperbaba just one
12:04 TVR_ pastebin me ceph -s and ceph osd tree please
12:04 hyperbaba the rest are up
12:04 TVR_ what hardware?
12:05 TVR_ dell, hp... what hardware?
12:05 hyperbaba Here is the catch. My bosses insisted that I'd lost too much time on this issue and I had to redeploy the complete environment.
12:05 hyperbaba it's on a supermicro twin servers
12:05 TVR_ ok.. cool
12:05 TVR_ so is it in-process of the rebuild now?
12:06 hyperbaba yes
12:06 hyperbaba and it's failing again
12:06 TVR_ here are the issues I have seen with ceph and the fuel deployment: (no fault of fuel, but here they are):
12:07 TVR_ 1. if the vlan tagging is wrong.... so in general, I will run pings against the management network from every server to every server and capture it in a file... verify no packet loss (see the sketch below)
12:08 TVR_ 2. the XFS leaves a ghost or the hardware 'needs to be initialized'
12:08 TVR_ sometimes, the ceph-deploy zap disk will NOT get rid of the ghost.... especially if the disk hasn't been initialized first....
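A minimal sketch of the ping check from point 1, run from every node against every other node; hosts.txt (a list of management IPs) and the log path are assumptions, not from the log:

    # verify no packet loss on the management network
    for ip in $(cat hosts.txt); do
        echo "--- $ip" >> /tmp/mgmt_ping.log
        ping -c 20 "$ip" | grep 'packet loss' >> /tmp/mgmt_ping.log
    done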
12:09 hyperbaba what do you mean by 'needs to be initialized' ?
12:09 TVR_ I tend to initialize problematic disks, and then when in bootstrap mode, dd the hell out of them
12:09 hyperbaba we did that already couple of times
12:09 hyperbaba sometimes with some results some time without any
12:10 hyperbaba but the easiest thing is to add -f to mkfs.xfs in the ceph deploy recipe
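For reference, a typical wipe sequence for a disk carrying a stale XFS signature; /dev/sdb and the node name are hypothetical, and the HOST:DISK form matches ceph-deploy versions of this era:

    # flatten the start of the disk so no filesystem ghost survives
    dd if=/dev/zero of=/dev/sdb bs=1M count=1024 oflag=direct
    # explicitly clear GPT/MBR structures
    sgdisk --zap-all /dev/sdb
    # or let ceph-deploy do it
    ceph-deploy disk zap node-1:sdb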
12:10 TVR_ so.. from the controller, you will have the choice to go into the controller's 'bios' to create raid or whatever... once you create a bunch of raid 0 (jbod) disks, you have the ability to "initialize" the new volume
12:11 hyperbaba aha. you mean the bios deep scrub?
12:11 hyperbaba (not the ceph scrub)
12:11 TVR_ this has bitten me on many a dell, a few supermicro, and HP less often
12:12 TVR_ yes.. from the controller you initialize this, yes
12:12 hyperbaba but sometimes even with a new disk ceph deploy fails in some cases... leaving the entire environment in an error state
12:13 TVR_ be sure the vlan is set correctly on the switch... this too, not setting untagged and such can be an overlooked issue
12:14 pasquier-s joined #fuel
12:14 TVR_ when you have a ceph issue with openstack fuel, unless you are choosing only 2 controllers, or adding a controller to a 1-controller system, it is always disk or network related.... the modules and the way ceph is deployed are rock solid
12:15 hyperbaba I have only 2 controllers
12:15 hyperbaba don't understand why that is a problem
12:15 TVR_ ok... so it may be split braining then
12:15 hyperbaba two is enough, 3 is overkill for a microcloud deployment
12:16 al_ex8 joined #fuel
12:16 TVR_ so.. ceph mon's form a quorum.... and when they reboot services, who is master?
12:16 hyperbaba the possibility of a split brain scenario is minimal, and i always add an additional monitor on one of the compute nodes
12:16 TVR_ I have not had good luck with 2 controllers.....
12:17 hyperbaba for ceph, using the ceph-deploy command
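A sketch of adding that third monitor with ceph-deploy; the hostname is hypothetical, and some ceph-deploy versions use "mon add" rather than "mon create" when extending an existing quorum:

    ceph-deploy mon create compute-1           # or: ceph-deploy mon add compute-1
    ceph quorum_status --format json-pretty    # confirm three members in quorum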
12:19 TVR_ so.. each controller has an OS disk AND a separate OSD?
12:20 hyperbaba yes
12:20 TVR_ OS disk can be raid or whatever.. just a separate OSD disk
12:20 hyperbaba two osd per node
12:20 TVR_ ok.. good
12:20 TVR_ that one bit me a while back... ceph does not like to use partitions for OSDs but likes physical disks
12:20 hyperbaba we put the os on ssd and also tried to use ceph journals on those ssd disks.
12:21 hyperbaba not partitions
12:21 hyperbaba physical disks
12:21 hyperbaba we have 3 disks per node. First is OS ,second and third are ceph
12:21 TVR_ the journal on the OS disk (an SSD for that) works well for performance, I have seen, but you HAVE to enable trim or the SSD will die early
12:22 hyperbaba it's just a mount option right?
12:22 hyperbaba partitions are properly aligned in the deployment, i hope?
12:23 TVR_ I would love to claim to be smart and tell you all about trim, but this guy knows more than I : http://blog.neutrino.es/2013/howto-properly-activate-trim-for-your-ssd-on-linux-fstrim-lvm-and-dmcrypt/
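In short, the two common approaches from that article, sketched with hypothetical mount points and assuming a util-linux recent enough to ship fstrim:

    # periodic trim, often preferred over the 'discard' mount option
    fstrim -v /                                    # repeat for each SSD-backed mount point
    echo '@weekly root fstrim /' >> /etc/crontab   # or schedule it
    # alternative: continuous trim via the discard option in /etc/fstab:
    # UUID=...  /  ext4  defaults,discard  0  1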
12:23 vogelc joined #fuel
12:24 TVR_ if you initiate the disks and have them presented to the system without partitions, fuel does the right thing for that
12:25 vogelc my environment requires a proxy.  where can I edit the master NTP configuration to point to my internal NTP servers?  I have tried editing it in the fuel setup but it never takes.
12:25 evg_ Arminder: Have you enabled quotas during deployment?
12:25 TVR_ so, if the disks are good, please look at your switch configs... you doing tagged or untagged VLAN's on the switch?
12:26 pasquier-s joined #fuel
12:26 hyperbaba ok. but why is fuel failing most of the time? For example, as we speak we deployed a micro environment. The first failure was ceph (in the logs) but all osds were up. I clicked deploy changes, then ran into some error from mysql. After 3 or 4 clicks on deploy changes after each error i got success.
12:26 hyperbaba switch was triple checked
12:27 hyperbaba it's ok because the platform works well once deployed (and rebooted node by node, because HA is not working out of the box)
12:28 guillaume_ joined #fuel
12:28 hyperbaba and after additional tweaking (for example we had to change the disk overcommit ratio because compute nodes report just the free space from the os disk, not ceph)
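The overcommit knob hyperbaba most likely means here is nova's disk_allocation_ratio, which the scheduler's DiskFilter reads in Icehouse-era releases; the value shown is illustrative only:

    # /etc/nova/nova.conf on the controllers (where nova-scheduler runs)
    [DEFAULT]
    disk_allocation_ratio = 10.0
    # pick up the change (Ubuntu name; openstack-nova-scheduler on CentOS)
    service nova-scheduler restart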
12:29 TVR_ can you.. for the sake of testing... do a 3 controller + OSD and the rest compute? Just for testing?
12:29 hyperbaba We have a four node server. We will try
12:29 TVR_ yes.. because if the disks are initiated, and the network is true... it must be a split brain
12:31 hyperbaba split brain in which component of the system?
12:34 pasquier-s_ joined #fuel
12:40 TVR_ the ceph mon's
12:41 vogelc I have been running into an issue where ceph will not activate all of my disks.  say out of 33 drives it will only activate 21.  the remaining drives show down.  even if i issue 'ceph osd in osd.24' it will go in but not change status to up.  never had this issue before.
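Worth noting for vogelc's case: 'ceph osd in' only clears the "out" flag, while "up" requires the osd daemon itself to be running and heartbeating. A sketch, with osd.24 as the example id:

    ceph osd tree               # see which osds are down
    ceph osd in osd.24          # marks it "in", but not "up"
    service ceph start osd.24   # sysvinit syntax of this ceph era, run on the storage node
    ceph -w                     # watch it rejoin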
12:41 hyperbaba i've monitored the nodes via ssh during deployment. Ceph was in good health the whole time
12:41 hyperbaba the successive failures were for mysql
12:42 hyperbaba and after the last 'deploy changes' click, fuel decided to deploy openstack again on both controllers
12:42 TVR_ ok.. so let's verify everything by doing the 3 controller + OSD and 1 compute + OSD.... if that works... this will tell us the infrastructure is good all around.
12:43 hyperbaba And what about adding an additional osd disk to an already deployed environment? can fuel be used for that? I've tried ceph-deploy and failed miserably. Spent 2 days on #ceph with the guys there trying to figure it out. no luck
12:45 TVR_ The reason I keep going back to networking is due to an issue I had / have.... I have a node that has a loose cable / bad 10G nic.... when I deployed the cluster, due to that node being up and down all the time, the cluster would fail.. (It wasn't even a controller.. it was a compute + OSD)
12:45 TVR_ I ended up detecting the issue from the switch, and once I did the deployment without that node, the cluster came up fine
12:46 TVR_ ok.. adding a disk:
12:47 TVR_ was the disk physically in the server / detected during the cluster build, or did you add it to a slot after the cluster was up?
12:47 hyperbaba it was in during deployment but left in down state
12:48 TVR_ here is the reason... even if the disk is NOT used during the cluster build... fuel tags it and puts disk info on it... you need to remove that / zap it before adding it
12:48 hyperbaba i've used ceph commands to get rid of it (scrub, rm and friends) and tried to add it using ceph-deploy commands. It is created and put into the ceph cluster in the down state
12:48 hyperbaba the same problem you had
12:48 hyperbaba i did zap
12:48 evg_ Arminder: quota is disabled by default during deployment because it degrades the cluster's performance. You can change the "quota_driver" parameter in nova.conf if you really need it.
12:48 hyperbaba did full dd
12:49 hyperbaba did some voodoo.
12:49 hyperbaba and no luck. We had to kill the entire environment. Now it's up mother .ucker !
12:49 TVR_ yea... unless it is a clean disk... i.e. not in the system during the cluster build... ceph-deploy will have issues due to the ghosts the cluster build touches it with
12:50 TVR_ cluster up?
12:50 hyperbaba yes
12:50 TVR_ it just needed to feel loved
12:50 hyperbaba but 5 days lost....
12:50 hyperbaba and no conclusion at all
12:50 hyperbaba that's the main problem
12:51 TVR_ yea.. so I suspect the previous was all split brain related... and this one time... the restart of services didn't happen on both mons at the same time
12:51 hyperbaba cluster was in ok state the first time
12:52 TVR_ you CANNOT restart the service at the same time on mon's without a very high probability of split brain
12:52 hyperbaba i am aware of that
12:52 hyperbaba i do a restart (reboot), wait for things to settle down and the cluster to go into HEALTH_OK, and then restart the other
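The rolling-restart pattern hyperbaba describes, sketched with hypothetical controller and mon names:

    # on controller-1
    service ceph restart mon.controller-1
    # wait for quorum and health to recover before touching the next mon
    until ceph health | grep -q HEALTH_OK; do sleep 10; done
    # only then, on controller-2
    service ceph restart mon.controller-2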
12:53 hyperbaba but whatever i did, that disk did not want to go in.
12:53 hyperbaba and now it's ok in the newly deployed environment, as if nothing had happened.
12:54 TVR_ if this is a production environment... I would grab a very cheap (even used) server as a 3rd mon to be safe... when I was hired over here, the first thing I did was to build for disaster.... because if you lose a mon, your cluster will stay up... but if the other one hiccups, or if you need to REPLACE the entire failed one, you will go through a nightmare...
12:54 hyperbaba Will 5.1 have a separate role for ceph-mon / ceph-osd, so we can put them freely on nodes?
12:54 TVR_ Ah, that would be a good advanced option, yes....
12:55 TVR_ put disclaimers, etc, but have it as an advanced option
12:55 hyperbaba I put a third mon on one of the compute nodes using ceph-deploy
12:55 TVR_ hope my info was helpful
12:55 TVR_ ah, ok.. cool
12:55 odyssey4me joined #fuel
12:56 vogelc hyperbaba: how long did you have to wait before the cluster showed all disks as 'up'?
12:57 hyperbaba vogelc: they were up immediately
12:57 vogelc all disks said up when running 'ceph osd tree'?
12:57 hyperbaba as soon as ceph started. If you see some of them down, you have the same problem. They go auto-out after 300 seconds
12:58 hyperbaba yes
12:58 hyperbaba TVR_: Thank you very much for your time
12:59 TVR_ are they going down now?
12:59 hyperbaba no
12:59 hyperbaba now is all ok
12:59 hyperbaba i had issues with previous deployment
13:00 hyperbaba ceph issues
13:00 TVR_ ok.. cool.. so that eliminated the issue with the switch... and I guess it likes the disks now...
13:00 hyperbaba yes
13:01 hyperbaba the previous issue was with ceph itself (some obscure bug). I was in contact with people from the ceph team and we were unable to root it out. All services were on full debug and we could not find the root cause of it rejecting the 1 osd
13:16 eshumakher joined #fuel
13:30 wayneeseguin joined #fuel
13:42 mattgriffin joined #fuel
13:56 IlyaE joined #fuel
14:31 pasquier-s_ joined #fuel
14:45 Kiall left #fuel
14:53 angdraug joined #fuel
15:07 xarses joined #fuel
15:12 xarses joined #fuel
15:15 accela-dev joined #fuel
16:43 metarx joined #fuel
16:45 metarx Hi, I have a quick question about fuel-cli   I've loaded all my nodes into the web interface, and I've read/seen that the "storage" network is not required on the controller nodes... so I was wondering what parts I need to edit in the yaml files to remove the storage network from the controllers
16:47 AndreyDanin joined #fuel
16:53 MiroslavAnashkin metarx: Do you use Fuel 5.0? 5.0 has a bug - it cannot import back the downloaded network settings.
16:54 metarx MiroslavAnashkin: that part seemed to work?  at least the change of the vip worked when I downloaded the default deployment
16:54 metarx i preconfigured it from the web interface first however...
16:57 metarx guess if I have to deploy it with the storage network and then just remove it after the fact... that's fine too...
16:58 MiroslavAnashkin It does not matter - it is a bug in the CLI, a kind of incorrect JSON formation. You have to apply the following patch to the master node to be able to import network interface settings from the command line: https://review.openstack.org/#/c/102261/
16:59 MiroslavAnashkin Simply download the patched file versions from the Gerrit commit, and replace the existing ones on the master node
17:03 jobewan joined #fuel
17:08 MiroslavAnashkin After the patch you'll be able to run `fuel --env <cluster ID> network download` or `fuel --env <cluster ID> deployment default`, edit the downloaded yaml file/files,
17:09 metarx k
17:09 metarx thanks much
17:09 Arminder thanks evg_
17:10 MiroslavAnashkin get rid of the storage network, import the changes back to the env with `fuel --env <cluster ID> deployment --upload` or `fuel --env <cluster ID> network upload`, and deploy the env without the storage network defined
17:10 Arminder just found out that in my current deployment i hadn't enabled quotas during deployment
17:12 MiroslavAnashkin And please do it after you have made all the configuration changes to the environment in the UI.  If you upload any changes in provisioning or deployment operations, you will freeze the entire environment configuration
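The full round trip MiroslavAnashkin describes, assuming cluster ID 1 and the default file name the CLI writes:

    fuel --env 1 network download      # writes network_1.yaml in the current directory
    # edit network_1.yaml and remove the storage network definition
    fuel --env 1 network upload
    # then deploy from the UI, or: fuel --env 1 deploy-changes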
17:12 metarx MiroslavAnashkin: I don't want to get rid of it completely... maybe my assumption is wrong... but I have 3 ceph nodes for storage, which I have set to do glance storage, object storage, and instance block devices... I had it previously configured (on only one controller, moving up to three controllers) and didn't see the controllers talking on the defined storage network
17:13 MiroslavAnashkin Then it is better to leave storage network as is.
17:14 metarx MiroslavAnashkin: What network do they use to communicate to ceph over?
17:15 MiroslavAnashkin Controllers may use it (or may not) only for image import/export operations, and maybe for migration.
17:15 MiroslavAnashkin Most operations run through Management network
17:16 metarx k, thanks!
17:17 MiroslavAnashkin Storage network is just a separate network for inter-storage device traffic
17:26 Kupo24z can someone confirm if https://bugs.launchpad.net/nova/+bug/1284709 is fixed in 5.0.1?
17:30 evg_ Arminder: you can try to enable it by editing /etc/nova/nova.conf. search "quota_driver=" there.
17:37 Arminder after setting quota_driver = nova.quota.DbQuotaDriver
17:37 Arminder what services need to be restarted?
17:41 ArminderS joined #fuel
17:48 Arminder restarted nova api only and that seemed to work
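For reference, the change Arminder made, with the Ubuntu upstart service name shown (openstack-nova-api on CentOS):

    # /etc/nova/nova.conf on the controllers
    [DEFAULT]
    quota_driver = nova.quota.DbQuotaDriver

    # pick up the change
    restart nova-api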
17:50 evg_ Arminder: hm, i'm not sure... i've tried it too on my env and it doesn't work for me
17:51 ArminderS just nova-api restart worked
17:51 evg_ Arminder: you're lucky
17:51 ArminderS and i'm able to change the quotas from horizon without any issues
17:52 ArminderS in my poc i had enabled those; someone else had set up the present environment, so i forgot to check there
17:52 ArminderS thanks for pointing me to the correct place
17:55 evg_ Arminder: np
17:56 IlyaE joined #fuel
17:57 vogelc #ceph
17:57 MorAle joined #fuel
17:58 mihgen joined #fuel
18:40 vogelc Why on the storage/compute nodes is /boot made up of a raid1 partition from all the drives in the host?
18:50 MiroslavAnashkin vogelc: To be able to boot the node if one of the drives fails.
18:53 vogelc OK - in my case that's 13 drives, all of different types.
18:53 MiroslavAnashkin RAID-1 of 13 drives?
18:55 TVR_ I do have to agree with vogelc on this one.... due to the RAID1 setting, it also makes it very difficult if you want to skip a drive during the fuel install phase and afterwards add it as an OSD later...
18:56 TVR_ it adds extra steps
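Those extra steps look roughly like this: pull the drive's partition out of the /boot array before zapping the disk for an OSD. Device and array names are hypothetical:

    cat /proc/mdstat                                     # find the /boot array, e.g. md0
    mdadm /dev/md0 --fail /dev/sdm1 --remove /dev/sdm1
    mdadm --zero-superblock /dev/sdm1                    # so it no longer looks like an md member
    # then wipe/zap the whole disk before giving it to ceph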
19:04 IlyaE joined #fuel
19:15 angdraug joined #fuel
19:39 vogelc joined #fuel
20:10 e0ne joined #fuel
20:19 IlyaE joined #fuel
20:48 tatyana joined #fuel
22:06 IlyaE joined #fuel
22:32 IlyaE joined #fuel
23:27 IlyaE joined #fuel
23:51 IlyaE joined #fuel
