
IRC log for #fuel, 2015-11-19


All times shown according to UTC.

Time Nick Message
00:08 zhangjn joined #fuel
00:09 zhangjn joined #fuel
00:10 zhangjn joined #fuel
01:07 zhangjn joined #fuel
01:08 zhangjn_ joined #fuel
02:24 pbrzozowski_ joined #fuel
02:24 yantarou joined #fuel
02:25 dilyin joined #fuel
02:47 ilbot3 joined #fuel
02:47 Topic for #fuel is now Fuel 7.0 (Kilo) https://software.mirantis.com | Paste here http://paste.openstack.org/ | IRC logs http://irclog.perlgeek.de/fuel/
03:05 rmoe joined #fuel
03:18 xarses_ joined #fuel
03:21 bapalm joined #fuel
03:41 Guest70 joined #fuel
04:14 jerrygb joined #fuel
04:40 fedexo joined #fuel
05:38 javeriak joined #fuel
05:41 javeriak_ joined #fuel
06:21 subscope joined #fuel
06:34 tzn joined #fuel
06:48 elemoine joined #fuel
06:48 Guest70 joined #fuel
06:57 elemoine__ joined #fuel
07:05 elemoine__ left #fuel
07:21 Guest70 joined #fuel
07:31 LinusLinne joined #fuel
07:40 jerrygb joined #fuel
07:43 elemoine joined #fuel
07:51 Sesso joined #fuel
07:54 e0ne joined #fuel
07:56 zhangjn joined #fuel
07:58 mkwiek07 joined #fuel
08:01 Wida joined #fuel
08:11 LinusLinne joined #fuel
08:15 neilus joined #fuel
08:19 hyperbaba joined #fuel
08:22 fzhadaev joined #fuel
08:27 alex_didenko joined #fuel
08:29 LinusLinne joined #fuel
08:35 wayneseguin joined #fuel
08:41 jerrygb joined #fuel
08:42 Wida joined #fuel
08:50 anddrew joined #fuel
08:56 sergmelikyan joined #fuel
09:06 anddrew joined #fuel
09:10 ppetit joined #fuel
09:19 eliqiao1 joined #fuel
09:19 eliqiao1 left #fuel
09:32 tkhno joined #fuel
09:39 javeriak joined #fuel
09:42 Wida joined #fuel
09:55 sergmelikyan joined #fuel
09:59 TiDjY35 joined #fuel
10:01 Chlorum joined #fuel
10:07 subscope joined #fuel
10:20 e0ne joined #fuel
10:25 subscope joined #fuel
10:26 magicboiz joined #fuel
10:37 javeriak joined #fuel
10:42 jerrygb joined #fuel
10:43 magicboiz joined #fuel
10:44 javeriak_ joined #fuel
10:50 tzn joined #fuel
10:56 javeriak joined #fuel
10:59 mkwiek07 joined #fuel
11:00 javeriak_ joined #fuel
11:01 preilly joined #fuel
11:23 vvalyavskiy joined #fuel
11:26 imilovanovic joined #fuel
11:33 zhangjn joined #fuel
11:33 imilovanovic joined #fuel
11:33 imilovanovic joined #fuel
11:34 f13o joined #fuel
11:35 ppetit joined #fuel
11:40 subscope joined #fuel
11:47 LinusLinne joined #fuel
11:51 sergmelikyan joined #fuel
11:58 poseidon1157 joined #fuel
12:04 igorbelikov joined #fuel
12:22 thegmanagain joined #fuel
12:37 javeriak joined #fuel
12:40 e0ne joined #fuel
12:43 jerrygb joined #fuel
12:53 hyperbaba joined #fuel
12:57 subscope joined #fuel
13:05 zimboboyd joined #fuel
13:19 ppetit joined #fuel
13:21 sergmelikyan joined #fuel
13:27 sergmelikyan joined #fuel
13:34 javeriak joined #fuel
13:39 Liuqing joined #fuel
13:39 neilus1 joined #fuel
13:44 LinusLinne joined #fuel
13:47 BobBall joined #fuel
13:48 BobBall Is it possible to rebuild a compute node from scratch but keep the same identifiers, without rebuilding the rest of the environment?
13:52 aglarendil BobBall: I guess you could reuse the node reinstallation feature in 7.0
13:53 BobBall Awesome! I didn't realise 7.0 had that feature.
13:53 BobBall Will go look it up, thanks
13:54 pma https://docs.mirantis.com/openstack/fuel/fuel-7.0/user-guide.html#rollback
13:55 BobBall Genius - thanks so much
14:00 jerrygb joined #fuel
14:02 neilus joined #fuel
14:14 jerrygb joined #fuel
14:27 poseidon1157 joined #fuel
14:28 neilus joined #fuel
14:34 javeriak joined #fuel
14:37 aglarendil u r welcome
14:54 evgenyl joined #fuel
14:55 az joined #fuel
14:56 akasatkin joined #fuel
14:56 kozhukalov joined #fuel
14:57 mkwiek joined #fuel
14:57 ashtokolov joined #fuel
14:58 mmalchuk joined #fuel
14:58 agordeev joined #fuel
14:58 bpiotrowski joined #fuel
14:59 DarthVigil joined #fuel
15:01 claflico joined #fuel
15:01 holser joined #fuel
15:03 teran joined #fuel
15:03 jaranovich joined #fuel
15:04 ogelbukh joined #fuel
15:04 akislitsky_ joined #fuel
15:04 aliemieshko_ joined #fuel
15:04 idvoretskyi joined #fuel
15:05 smakar joined #fuel
15:05 aglarendil joined #fuel
15:06 kgalanov joined #fuel
15:07 DarthVigil Has anyone run into issues where discovery of nodes didn't find the correct number of NICs?
15:09 DarthVigil It's a BL460c Gen9 connected to a Flex-10
15:10 mwhahaha DarthVigil: I'd assume that's related to the driver not being available for some of the nics
15:11 DarthVigil That's what I feared.
15:11 mwhahaha DarthVigil: do you know what chipset(s) they are?
15:11 mwhahaha additionally you could try the ubuntu based bootstrap and see if you have more luck with that one
15:12 mwhahaha DarthVigil: https://docs.mirantis.com/openstack/fuel/fuel-7.0/operations.html#enable-ubuntu-bootstrap-experimental
15:12 DarthVigil Good idea. I'll give that a go today.
15:27 blahRus joined #fuel
15:28 az__ joined #fuel
15:29 bildz How can I troubleshoot the Ceph cluster being inaccessible?
15:34 sergmelikyan joined #fuel
15:37 aglarendil @MiroslavAnashkin ^^
15:41 javeriak joined #fuel
15:41 xarses_ joined #fuel
15:43 jaypipes joined #fuel
15:44 javeriak_ joined #fuel
15:49 alexz joined #fuel
15:53 angdraug joined #fuel
15:59 alex_didenko joined #fuel
16:11 akurenyshev joined #fuel
16:14 sergmelikyan joined #fuel
16:17 javeriak joined #fuel
16:23 Verilium Everything seems to be working correctly, but I'm seeing a high and increasing number of messages accumulating in rabbitmq.  The messages all seem to be in "cinder-scheduler_fanout*" and "scheduler_fanout_*".  Any idea what this might imply?
16:23 Verilium ...and, where can I find the rabbitmq 'main' admin account?  I used the credentials I found for nova, but not sure if there's another account for administrative purposes?
16:24 subscope joined #fuel
16:24 mwhahaha usually if messages are increasing then there aren't enough consumers
16:25 mwhahaha also there should be a rabbitmq user/pass in the astute.yaml's rabbit section
16:26 mwhahaha or globals.yaml
16:26 mwhahaha i think you can query it via 'hiera rabbit_hash'
16:27 Verilium Hmm, it's giving me the nova account it seems.
16:27 Verilium Guess I used the right one. :P
16:27 mwhahaha yup
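[For reference — a minimal sketch of the lookup discussed above, run on a controller node; the key name and the shape of the returned hash are taken from this conversation, so treat them as an assumption for other releases:]
    # query the deployment's hiera data for the RabbitMQ credentials
    hiera rabbit_hash
    # => {"user"=>"nova", "password"=>"<redacted>"}   (hypothetical output shape)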
16:28 Verilium https://bugs.launchpad.net/mos/+bug/1497961
16:28 Verilium Hmm, well, seems this describes the issue.
16:29 mwhahaha only if you are constantly failing your rabbitmq
16:30 mwhahaha i think your case is that no one is consuming them
16:31 bearish joined #fuel
16:31 mwhahaha dmitryme was the one who looked into that issue, he might have some more info
16:35 subscope joined #fuel
16:39 gomarivera joined #fuel
16:42 bildz Does anyone have a couple minutes to discuss getting a ceph cluster sorted out?
16:45 aglarendil bildz: gimme a minute, I need to find our ceph SMEs
16:45 aglarendil while this is in progress, could you describe what your problem is?
16:46 Verilium mwhahaha:  Yeah.  I wonder why though.  The bug report seems to be exactly the issue I'm seeing though, so I suppose it's a bug and chances are I can ignore this.
16:49 DarthVigil joined #fuel
16:50 bildz aglarendil: thanks
16:50 bildz aglarendil: well I installed everything with Fuel
16:51 aglarendil bildz: go on :-)
16:51 bildz aglarendil: There was an issue building the ceph cluster (bug related), but I was able to manually deploy and get the process going.  Now it's in a warn state
16:51 bildz aglarendil: so I'm getting used to the ceph CLI tools for troubleshooting and just need someone to help point me in the right direction
16:51 aglarendil what does `ceph health` say
16:51 aglarendil ?
16:51 bildz HEALTH_WARN 136 pgs degraded; 55 pgs peering; 161 pgs stale; 56 pgs stuck inactive; 161 pgs stuck stale; 192 pgs stuck unclean
16:52 aglarendil do you have any valuable data in there?
16:52 bildz nope
16:52 bildz havent been able to upload any glance images
16:52 aglarendil just restart all the ceph monitors and osd daemons
16:52 bildz this is a fresh install
16:52 aglarendil do you know how to do that?
16:53 bildz /etc/init.d/ceph -a stop osd.0 ?
16:53 aglarendil it depends on your configuration
16:53 aglarendil but essentially yes
16:53 bildz this is a vanilla fuel install
16:53 aglarendil restart all osds
16:53 fuel-slackbot joined #fuel
16:53 bildz /etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines , /var/lib/ceph defines )
16:54 aglarendil which bug did you relate to, btw?
16:54 bildz one of the ceph partitions wouldnt deploy
16:54 bildz so I had to deploy it manually and then it was all good
16:54 aglarendil what do you mean by 'manually' ?
16:54 bildz ceph-deploy <etc>
16:55 aglarendil hmm. let me ask a couple more folks on this
16:55 aglarendil @xarses ^^ could you help here please ?
17:05 xarses whats the output of `ceph osd tree`
17:05 bildz sure
17:05 bildz http://pastebin.com/VnG9SAdF
17:07 xarses not that it should matter, but do you know where osd.1 is?
17:07 Guest70_ joined #fuel
17:07 bildz yes
17:07 bildz osd.1 had an issue and i removed it
17:08 bildz it was on a bad partition
17:08 xarses did you out it first?
17:08 bildz yes
17:09 xarses hmm
17:09 bildz the state of ceph has always been the same
17:16 xarses we will want to start with http://docs.ceph.com/docs/v0.80/rados/troubleshooting/troubleshooting-pg/#stuck-placement-groups
17:16 xarses and then go from there
17:17 xarses I'm also interested in understanding how you ended up with a broken-from-the-start cluster
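[For reference — the linked troubleshooting guide starts by listing the stuck placement groups; a short sketch, assuming the stock ceph CLI on a node with the admin key:]
    # list PGs stuck in the states reported by `ceph health`
    ceph pg dump_stuck stale
    ceph pg dump_stuck inactive
    ceph pg dump_stuck unclean
    # then inspect an individual PG with `ceph pg <pgid> query`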
17:17 manashkin_ joined #fuel
17:20 manashkin_ joined #fuel
17:22 xarses bildz: my guess is that the osd's on node-6 aren't updating and the stale PG are all there
17:22 Verilium I wonder if I can clear all the messages from these queues though, since they're not getting consumed.  To the point of even having something clear them out regularly.
17:23 bildz xarses: how do I fix this?
17:23 Verilium Else, well, with lma/nagios, getting alerts about the rabbitmq queue growing and passing threshold.
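[For reference — a hedged sketch of how the fanout queues could be inspected and, if they really have no consumers, emptied; the queue name pattern comes from the log above, and purge_queue assumes a rabbitmqctl recent enough to provide it:]
    # check message and consumer counts for the scheduler fanout queues
    rabbitmqctl list_queues name messages consumers | grep fanout
    # drop the backlog of a queue that shows zero consumers (name is a placeholder)
    rabbitmqctl purge_queue scheduler_fanout_<id>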
17:23 xarses bildz: we will need to go through dump_stuck and ensure that its the problem at some point
17:24 xarses if it's just those OSDs, check that the host can reach the monitor(s) and that the time is in sync (20ms max drift), and give the OSDs a reboot
17:25 bildz how do I reboot osds?
17:25 LinusLinne joined #fuel
17:28 fedexo joined #fuel
17:30 xarses upstart is annoying so, let me check
17:32 bildz thanks
17:35 manashkin_ bildz, all your active OSDs are located on the same node except one. The default Ceph rule is to place different copies of data on different hosts. And it looks like this imbalance between OSDs is the root cause of the PGs stuck in peering status.
17:36 manashkin_ bildz, Please try the following lifehack first: Run `ceph osd set noout` then check status with `ceph -s` then run `ceph osd unset noout` and check the status with `ceph -s` one more time
17:37 bildz thanks manashkin_
17:37 bildz yes, I have 1 storage node for the time being.  Im looking to stand up a 2nd
17:37 bildz but I wanted to get my arms around this for now
17:37 bildz manashkin_: are those commands run on all ceph nodes? (controller/storage) ?
17:37 bildz in my case it's node-1 / node-6
17:38 xarses bildz: its a cluster command
17:38 xarses so only once
17:38 xarses and with the admin key
17:39 bildz http://pastebin.com/XcxhQ5pY
17:40 xarses still 136 degraded now?
17:42 bildz yeah
17:43 manashkin_ bildz, OK, we verified it is not a Ceph issue with stalled monitors.
17:43 bildz ceph osd set noout ; ceph -s ; ceph osd unset noout ; ceph -s
17:43 bildz i put that together
17:43 bildz manashkin_: nice
17:43 manashkin_ bildz, You have 2 options with the current OSD distribution.
17:45 manashkin_ bildz, First - leave only a single OSD with a weight greater than 0 on each node. Second is to reweight all OSDs so that the total weight per node is equal
17:46 bildz the weight designates where information will be stored first?
17:46 bildz I would prefer that node-6 be where all the information is stored
17:46 manashkin_ bildz, I mean if each OSD on node-6 has weight=1, the only OSD on node-1 should have weight=3
17:46 bildz it's a 36 disk supermicro server
17:49 bildz yeah looks like all the nodes have a weight of 0 right now
17:50 manashkin_ bildz, Such a setup breaks the default Ceph CRUSH rule. You may try to add the following parameter to the [global] section of /etc/ceph/ceph.conf on every Ceph node and restart Ceph. The parameter is "osd crush chooseleaf type = 0"
17:50 bildz sorry nvm my last comment.  I just looked back at ceph osd tree
17:51 bildz is it just a "/etc/init.d/ceph restart" ?
17:51 bildz or a different process
17:51 manashkin_ bildz, Yes, and please restart only a single Ceph node at a time. Please restart the monitors first
17:52 bildz k
17:53 xarses service ceph restart appears to do nothing to the process timestamp
17:53 xarses so I'm not sure it does anything
17:53 bildz do I need to add underscores in that config entry?
17:53 bildz the rest seem to have them
17:54 bildz xarses: i noticed that too
17:54 xarses both allowed
17:54 xarses but _ are more common from the tools
17:54 xarses that configure ceph.conf
17:54 manashkin_ bildz, It does not matter to Ceph whether it is an underscore or a space. After this setting, please set equal weights on all OSDs, or set weights in accordance with the OSD sizes.
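[For reference — the setting being discussed, as it would look in /etc/ceph/ceph.conf (either spelling works, as noted above); a sketch only, since relaxing chooseleaf to 0 gives up host-level redundancy:]
    [global]
    # let CRUSH place replicas of a PG on the same host (the default type, 1, means host)
    osd_crush_chooseleaf_type = 0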
17:54 bildz restarting ceph appears to have done nothing to the pids
17:55 bildz they still show start date of nov 11
17:55 manashkin_ bildz, Then please stop Ceph first and kill any processes that did not stop. Ceph itself stops all its daemons with kill.
17:57 gongysh joined #fuel
17:58 bildz ok processed restarted
17:58 bildz had to kill them
17:58 bildz they respawned
17:58 bildz pkill `ps waux |grep ceph |grep osd |awk '{print $2}'`
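[For reference — pkill matches process-name patterns rather than PIDs, and the respawn above is consistent with upstart restarting the daemons; a cleaner way to bounce them on Ubuntu 14.04, assuming the stock ceph upstart jobs, would be roughly:]
    # restart the monitor on this node, then the OSDs (one node at a time, as advised above)
    restart ceph-mon id=$(hostname)
    restart ceph-osd id=2
    # or all OSDs on the node at once
    restart ceph-osd-all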
18:01 manashkin_ bildz, OK. And then, please set equal weights on all OSDs to begin with. You may do it with a single command like `ceph osd crush reweight osd.* 1`
18:02 manashkin_ bildz, Then,  wait a minute and check the status with `ceph -s` one more time
18:03 sergmelikyan joined #fuel
18:03 bildz reweighted now waiting a min
18:03 bildz thanks for the help, guys
18:03 xarses did you reweight or change the chose leaf?
18:03 bildz ceph is a bit complicated for never having exposure
18:04 bildz xarses: im doing as manashkin_ has asked
18:04 xarses he noted you could do one or the other
18:04 bildz hmm i did both
18:04 xarses chooseleaf type 0 allows replicas of the same PG to land on the same host
18:05 xarses so you won't have host redundancy
18:05 xarses whereas re-weighting lets you keep it by making the tree treat both nodes as equal weight
18:06 xarses so both nodes will receive one of the replicas
18:07 manashkin_ xarses, previous weights were not equal. http://pastebin.com/VnG9SAdF
18:07 xarses and you need to have each of the osd's on node-6 as 1/3 of the weight of node-1
18:07 xarses ya
18:08 bildz http://pastebin.com/KaqfWYqd
18:08 xarses manashkin_: do we need to try to account for this in the deployment or is this too rare?
18:09 manashkin_ bildz, yes, if you're still going to have a data replica on node-1, the weight pattern xarses mentioned is what you want.
18:09 xarses bildz: osd 2,3,4 should be 0.33 while osd.0 is 1
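[For reference — spelled out per OSD with the ids from the tree output above, that weight pattern would be roughly:]
    # the lone OSD on one node carries that node's full weight
    ceph osd crush reweight osd.0 1.0
    # the three co-located OSDs on the other node split the same total
    ceph osd crush reweight osd.2 0.33
    ceph osd crush reweight osd.3 0.33
    ceph osd crush reweight osd.4 0.33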
18:10 manashkin_ xarses, We have had such an issue only once. Usually production clusters have a number of Ceph OSD nodes greater than the replication factor
18:10 aglarendil joined #fuel
18:10 ashtokolov joined #fuel
18:10 manashkin_ xarses, and more or less equal OSD number/capacity per node
18:11 xarses right now you want it to place 3:1 node-6 to node-1, so it can't figure out where to place all of the PGs yet
18:11 xarses manashkin_: hmm, ok
18:11 rmoe joined #fuel
18:12 bildz http://pastebin.com/FTMiEfkr  better ?
18:12 manashkin_ bildz, yes, correct
18:12 bildz health HEALTH_WARN 66 pgs degraded; 126 pgs peering; 82 pgs stale; 126 pgs stuck inactive; 82 pgs stuck stale; 192 pgs stuck unclean
18:13 tzn joined #fuel
18:13 xarses that will work
18:14 xarses ceph -s?
18:14 manashkin_ bildz, please post the fresh output from `ceph -s` one more time
18:15 bildz http://pastebin.com/6jGpSYHd
18:17 agordeev joined #fuel
18:17 kozhukalov joined #fuel
18:17 akasatkin joined #fuel
18:17 akislitsky_ joined #fuel
18:17 evgenyl joined #fuel
18:18 jaranovich joined #fuel
18:21 holser joined #fuel
18:23 teran joined #fuel
18:24 xarses please post the output from `ceph pd dump_stuck`
18:25 xarses sorry
18:25 xarses `ceph pg dump_stuck`
18:26 smakar joined #fuel
18:26 bpiotrowski joined #fuel
18:26 kgalanov joined #fuel
18:26 bildz hmm
18:26 manashkin_ bildz, and may be `ceph health detail`...
18:26 bildz that command isnt recognized
18:28 xarses not pd, pg
18:29 bildz http://pastebin.com/fftqJqPi
18:29 mkwiek joined #fuel
18:33 javeriak_ joined #fuel
18:33 xarses there are a bunch of PGs that are supposed to be placed on osd.1
18:34 bildz is it worth just rebuilding the ceph cluster?
18:34 manashkin_ bildz, No, not in this case
18:35 xarses we just have to order them to have their placement re-calculated
18:36 ogelbukh_ joined #fuel
18:36 manashkin_ bildz, Please go to node-1, stop Ceph there and then start it back
18:36 bildz k
18:36 xarses `ceph osd lost 1`
18:37 manashkin_ No, `ceoh osd out 1`
18:37 manashkin_ No, `ceph osd out 1`
18:37 bildz osd.1 does not exist.
18:37 manashkin_ Ah, there is no OSD.1 - then `ceph osd lost 1` is correct
18:38 javeriak joined #fuel
18:38 _gryf hey guys. I'm just wondering. Are there any plans to include VM HA in Fuel?
18:38 bildz root@node-1:~# ceph osd lost 1 --yes-i-really-mean-it
18:38 bildz osd.1 is not down or doesn't exist
18:39 xarses _gryf: what do you mean VMs HA?
18:40 xarses you mean something like VMWare Fault-tolerance?
18:40 _gryf xarses, I meant rebuilding VMs from a faulty compute host
18:41 idvoretskyi joined #fuel
18:41 _gryf so the vm doesn't suffer much, just couple of seconds of downtime
18:47 mmalchuk joined #fuel
18:48 bildz i dont think there are any remnants of osd.1 left in the cluster
18:48 bildz the pgs were from nov 11th
18:49 Guest70_ joined #fuel
18:49 tzn joined #fuel
18:50 xarses _gryf: while I see many people calling it HA, it's really not HA.
18:51 xarses HA is about keeping the application / function alive regardless of a member being faulty. This is more about automated fault recovery
18:51 _gryf xarses, yes, I agree
18:51 _gryf nevertheless I didn't spot such a feature in Fuel, hence the question
18:52 xarses There are some things you can do to help minimize the impact of a host failure, but IIRC it isn't exactly supported in OpenStack anyway
18:52 xarses there are hooks left open for the operator to be able to make it work for them
18:53 _gryf xarses, are you aware of any plans to provide such auto recovery in fuel?
18:53 xarses for example for quick recovery you need ephemeral on shared storage
18:54 xarses and then you can use commands like nova evacuate to re-spin instances
18:55 xarses you can also react to some faults using heat to spin new instances
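[For reference — a hedged sketch of the Kilo-era client syntax for the evacuate path mentioned above; instance and host names are placeholders, and --on-shared-storage only applies if ephemeral disks really live on shared storage:]
    # list the instances that were running on the failed compute host
    nova list --all-tenants --host <failed-compute-host>
    # rebuild one of them on another compute node
    nova evacuate --on-shared-storage <instance-id> <target-compute-host>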
18:57 _gryf xarses, right. I'm familiar with the process, I can do that myself, just wondering if some ready-to-use mechanisms or solutions will be available in the next releases of Fuel
18:57 xarses Like I said there isn't really a solution to offer as part of Fuel. We expose the same parts that openstack does, and there are likely too many permutations to put in the reference for fuel. It might be achievable with a plugin or a specific deployment guide
18:58 _gryf xarses, ok. thanks for the info
18:58 xarses you can see some of the complication described here https://ask.openstack.org/en/question/59964/openstack-instance-high-availabilityhow-to-make-pets-vm-highly-available/
18:59 xarses IIRC there was also a talk during one of the summits (Vancouver, but I could be mistaken) that spoke to the same subject
19:02 _gryf xarses, I'm involved in providing such a feature together with other parties
19:04 e0ne joined #fuel
19:05 _gryf there was a discussion last Monday around that topic - you can find the logs there: http://eavesdrop.openstack.org/meetings/ha__automated_recovery_from_hypervisor_failure/2015/ha__automated_recovery_from_hypervisor_failure.2015-11-16-0
19:07 xarses the end of the URL is missing
19:07 _gryf sorry, here it is: http://eavesdrop.openstack.org/meetings/ha__automated_recovery_from_hypervisor_failure/2015/ha__automated_recovery_from_hypervisor_failure.2015-11-16-09.00.html
19:13 bildz ceph tell osd.2 bench
19:13 bildz 2015-11-19 19:13:35.747804 7f786c57d700  0 -- 192.168.0.6:0/1021273 >> 192.168.0.7:6802/8119 pipe(0x7f78640057f0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f7864005a80).fault
19:13 bildz Error ENXIO: osd down
19:17 xarses and ceph osd tree shows it down too?
19:20 xarses _gryf: ya, there are no plans for any of these to end up in fuel for 8.0. FF is on the 3rd
19:21 bildz xarses: no that's the weird part.  Its up
19:21 bildz IIRC that was the one ceph osd i had to deploy manually to get the install to complete
19:22 _gryf xarses, k, thx
19:22 xarses ok, so you want to go probe the logs to see if it's having problems staying connected to the monitor
19:22 xarses _gryf: this is something that we could probably work together on for 9
19:24 pauls132000 joined #fuel
19:25 _gryf xarses, I think what we can have for the Mitaka cycle would be beta quality (or maybe I'm wrong), so there is no rush
19:27 xarses _gryf: Fuel 9 will be Mitaka
19:28 _gryf xarses, oh, you're right.
19:29 _gryf xarses, the idea was to gather all existing solutions or ideas and make one solid solution out of them. That's why I was so persistent about the fuel plans :)
19:31 thegmanagain joined #fuel
19:41 xarses My first thought would be to have an openstack service responsible for this and have it talk to nova about questions like this
19:41 xarses rather than designing something around its limitations
19:41 _gryf xarses, there were many attempts to get it done in nova
19:42 _gryf xarses, all of them have failed
19:42 _gryf for various reasons
19:43 _gryf one of the most common was "this feature does not belong to this project"
19:44 bildz log [ERR] : OSD full dropping all updates 99% full
19:44 _gryf which I cannot say is without reason
19:45 xarses no the feature doesn't belong in nova
19:45 xarses but the hooks for a new service that performs this function do
19:45 bildz xarses: looks like osd.2 is full
19:46 bildz how can I pull it and rebalance?
19:46 xarses how? I thought you haven't been able to put any data in the cluster
19:46 bildz how is it full?
19:46 bildz /dev/sdd4            409M  406M  3.0M 100% /var/lib/ceph/osd/ceph-2
19:46 xarses oh, well why do we have a 400Mb osd then
19:46 bildz can i manage the partition in fuel?
19:47 bildz xarses: that's a VERY good question :)
19:47 xarses ya, before you deploy
19:47 xarses the minimum is 2GB, I'm not sure how it even let you deploy
19:48 xarses (for ceph-osd)
19:49 neouf joined #fuel
19:49 bildz how do I fix this?
19:50 xarses mark the osd out, wait for it to rebalance, and then remove it
19:50 xarses it's starting to look like redeploy is a good idea too
19:52 bildz i agree
19:52 bildz is there a how to on this anywhere?
19:52 bildz and Im thinking i may need to purchase a book on ceph
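[For reference — the out/rebalance/remove sequence suggested above, roughly as the upstream Ceph docs describe it; osd.2 is used because it is the full 400M OSD from this log, and the wait step matters:]
    # take the full OSD out of data placement and let the cluster rebalance
    ceph osd out 2
    ceph -w            # watch until placement settles
    # once rebalanced, stop the daemon and remove the OSD for good
    stop ceph-osd id=2
    ceph osd crush remove osd.2
    ceph auth del osd.2
    ceph osd rm 2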
19:53 _gryf xarses, i think I don't understand, what hooks are you talking about?
19:53 xarses whatever hooks are necessary to perform the monitoring that the response system needs
19:55 _gryf did you mean monitoring on openstack level, or monitoring on cluster level?
19:57 xarses from what I'm reading, extra monitoring is done to determine if the instance is running / stopped / isolated / etc... with tools outside of openstack
19:59 xarses that monitoring / notification could be functions from nova, that we can then subscribe to ... ie ceilometer style
19:59 xarses we subscribe to them and define our own actions
20:00 xarses might even be doable in heat already once we have the proper events being sent
20:03 _gryf so what is the role of pacemaker then?
20:04 xarses unknown, I'd prefer to avoid involving it
20:05 _gryf oh
20:05 xarses its only really necessary for components that require strong quorums
20:06 xarses relaunching a dead vm with it seems heavy handed
20:07 xarses if we are building our own service to react to monitors, then pacemaker technically could be a driver for some of the actions
20:08 xarses or alerts
20:08 _gryf how about the scenario where the services treat a faulty host as dead, which is not true?
20:10 xarses my expectation is that we would have the same set of data to decide that we would give to pacemaker
20:11 xarses in either case, if you dont have enough data to make the correct decision, how will either react to that condition
20:11 xarses correctly
20:12 _gryf well
20:12 _gryf not exactly
20:13 _gryf OS services don't have the power to declare other services alive or dead
20:13 _gryf a cluster manager (like pacemaker) has that power
20:14 _gryf if the manager cannot see some service responding, it tries to stop and start it again
20:14 _gryf however, if it fails to stop it
20:15 _gryf it has the ability to fence it, so it will not introduce any data corruption on the other cluster nodes
20:15 _gryf like storage, for example
20:21 manashkin_ _gryf, instance high availability inside OpenStack is provided by Heat AutoScaling, indirectly.
20:31 bildz http://docs.ceph.com/docs/master/start/quick-ceph-deploy/  Is the "Admin Node" the fuel server?
20:31 _gryf manashkin_, does it include pets in pets vs cattle case?
20:31 LinusLinne joined #fuel
20:32 _gryf or just is used for scaling?
20:34 manashkin_ _gryf, AutoScaling monitors the cluster and maintains a given number of instances or services of a given type on given hosts, depending on given conditions - either hard-coded rules or dynamic ones, like the current load on the service
20:35 manashkin_ _gryf, So, it is possible to create a simple rule like "always run N VMs".
20:36 LinusLinne Hi guys, having a problem deploying version 7 of Mirantis on VMware; version 6.1 works in the same environment, fails on vcenter_hooks.py --create_zones, http://paste.openstack.org/show/479481/ , any input appreciated!
20:37 _gryf manashkin_, ok, cool. how well does it handle a split-brain situation on the management network while the storage network is still up and running?
20:39 javeriak_ joined #fuel
20:43 manashkin_ LinusLinne, You need to update your VMware account permissions, since OpenStack Kilo has additional requirements
20:44 manashkin_ LinusLinne, http://docs.openstack.org/kilo/config-reference/content/vmware.html
20:45 javeriak joined #fuel
20:46 LinusLinne Thanks, will have a look at it. However I ran the install as the administrator, but I will go over the permissions again :)
20:47 manashkin_ LinusLinne, please pay attention to the "Register extension" permission; if I remember correctly, this one is new.
20:49 manashkin_ _gryf, Heat performs actions based on indications. If it is possible to create a network monitoring script, Heat may use it.
20:49 LinusLinne manashkin_ Will do!
20:49 jerrygb joined #fuel
20:50 jerrygb joined #fuel
20:51 _gryf manashkin_, oh. I see. thanks for suggestion
21:08 e0ne joined #fuel
21:35 jerrygb joined #fuel
21:37 javeriak_ joined #fuel
21:38 tzn joined #fuel
21:42 srmaddox joined #fuel
21:44 srmaddox help register
21:44 srmaddox register help
21:44 srmaddox mt
21:51 javeriak joined #fuel
21:58 ericjwolf Greetings.  what controls the creation of the public api address?  Is this turned off by default?
22:06 ericjwolf I am trying to use some external tools against the public API interface and I keep getting an error, but my userid is good.
22:31 neouf joined #fuel
22:32 DevStok joined #fuel
22:33 DevStok hi
22:33 DevStok i'm getting
22:33 DevStok ERROR: Could not bind to 192.168.0.6:9292 after trying for 30 seconds
22:36 jerrygb joined #fuel
22:52 Sesso joined #fuel
23:01 jerrygb joined #fuel
23:13 rmoe joined #fuel
23:33 javeriak_ joined #fuel
23:38 javeriak joined #fuel
23:38 zhangjn joined #fuel
23:39 zhangjn joined #fuel
23:40 zhangjn joined #fuel
