IRC log for #gluster, 2015-06-05


All times shown according to UTC.

Time Nick Message
00:05 badone_ joined #gluster
00:11 [7] http://pastie.org/pastes/10224252/text
00:13 [7] http://pastie.org/pastes/10224255/text
00:14 [7] (ignore the first one)
00:15 [7] still folders missing though
00:16 cholcombe hmm
00:16 cholcombe are all 3 replicas showing the same mount info?
00:16 [7] "mount info"?
00:17 [7] let me wait for the first heal to complete before switching to replica3
00:20 drakonstein joined #gluster
00:21 cholcombe sorry, i meant: if you mount each replica, does it show the same thing?
00:21 cholcombe if they do maybe investigate the underlying bricks and see what they're showing
00:21 [7] I don't think I can mount specific replicas
00:22 cholcombe nvm i'm not sure what i was thinking
00:22 drakonstein for the statistics heal-count command, does anyone know if a file can hold up the queue?  I have 10 files listed for a couple servers.  So if the first of the 10 files is frozen, would the other 9 just sit there waiting, or would it be able to  do more at once?
00:23 cholcombe i think it does go in some kind of order.  possibly the order the filesystem returns on ls
00:23 cholcombe i don't remember exactly though
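For reference, the heal queue drakonstein asks about can be inspected per brick; a minimal sketch, with rep-volume as a placeholder volume name:

    # how many entries each brick still has queued for self-heal
    gluster volume heal rep-volume statistics heal-count

    # the actual files/gfids pending heal, listed per brick
    gluster volume heal rep-volume info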
00:23 drakonstein [7] - If you dissolved the volume and created a new volume with one replica set, wouldn't you be able to mount it by itself for testing?
00:24 [7] well I can look into the underlying bricks
00:52 nangthang joined #gluster
00:58 [7] http://pastie.org/pastes/10224301/text
01:00 [7] so it looks like it doesn't heal everything
01:03 lpabon joined #gluster
01:03 [7] yes, ~30% of the files missing in the bricks
01:12 TheCthulhu1 joined #gluster
01:24 johnnytran joined #gluster
01:45 [7] and it doesn't seem to have any intention of fixing that: http://pastie.org/pastes/10224348/text
01:55 kdhananjay joined #gluster
02:03 al joined #gluster
02:39 Jandre joined #gluster
02:40 bharata-rao joined #gluster
02:50 harish_ joined #gluster
02:50 doubt_ joined #gluster
03:05 TheSeven joined #gluster
03:27 maveric_amitc_ joined #gluster
03:31 shubhendu__ joined #gluster
03:35 sripathi joined #gluster
03:39 nishanth joined #gluster
03:45 Peppard joined #gluster
03:46 atinmu joined #gluster
03:48 itisravi joined #gluster
03:49 sakshi joined #gluster
04:05 RameshN joined #gluster
04:26 pppp joined #gluster
04:29 gildub joined #gluster
04:35 nishanth joined #gluster
04:45 R0ok_ joined #gluster
04:50 spandit joined #gluster
04:50 yazhini joined #gluster
04:51 rafi1 joined #gluster
04:54 hagarth joined #gluster
04:54 Bhaskarakiran joined #gluster
04:54 jiffin joined #gluster
05:00 ppai joined #gluster
05:07 gem joined #gluster
05:12 karnan joined #gluster
05:13 hgowtham joined #gluster
05:16 soumya joined #gluster
05:26 spalai joined #gluster
05:29 ashiq joined #gluster
05:30 kdhananjay joined #gluster
05:31 ashiq- joined #gluster
05:33 kdhananjay1 joined #gluster
05:37 maveric_amitc_ joined #gluster
05:38 atalur joined #gluster
05:41 RajeshReddy joined #gluster
05:41 rjoseph joined #gluster
05:41 hagarth joined #gluster
05:44 gem joined #gluster
05:45 raghu` joined #gluster
05:46 vimal joined #gluster
05:47 Manikandan joined #gluster
05:50 overclk joined #gluster
05:54 kdhananjay joined #gluster
06:01 kotreshhr joined #gluster
06:03 spalai joined #gluster
06:09 spalai joined #gluster
06:13 atalur joined #gluster
06:17 c0m0 joined #gluster
06:18 poornimag joined #gluster
06:19 dcroonen joined #gluster
06:23 yazhini joined #gluster
06:24 jtux joined #gluster
06:26 hchiramm joined #gluster
06:27 Philambdo joined #gluster
06:29 SOLDIERz joined #gluster
06:30 spalai joined #gluster
06:34 schandra joined #gluster
06:38 raghu` joined #gluster
06:44 maveric_amitc_ joined #gluster
06:49 karnan joined #gluster
06:50 spalai joined #gluster
06:53 sripathi joined #gluster
07:06 karnan joined #gluster
07:09 glusterbot News from newglusterbugs: [Bug 1228535] Memory leak in marker xlator <https://bugzilla.redhat.com/show_bug.cgi?id=1228535>
07:10 nbalacha joined #gluster
07:11 jtux joined #gluster
07:12 [Enrico] joined #gluster
07:15 ghenry joined #gluster
07:15 ghenry joined #gluster
07:23 Trefex joined #gluster
07:31 anrao joined #gluster
07:46 Slashman joined #gluster
07:48 fsimonce joined #gluster
07:51 arcolife joined #gluster
07:58 arcolife joined #gluster
08:00 WildyLion joined #gluster
08:00 WildyLion Hi. I need help :(
08:01 WildyLion One node shows as Peer in cluster (disconnected) and it seems like only that one node has a failed rebalance task
08:01 WildyLion so, it's some type of split brain
08:02 WildyLion the other nodes don't see this node's self-heal daemon operating, how could I get everything back to normal?
08:02 schandra joined #gluster
08:02 WildyLion I'm running glusterfs 3.7.0 there
08:05 hagarth joined #gluster
08:17 WildyLion damn, it was due to botched iptables configuration on that node :\
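For anyone hitting the same "Peer in Cluster (Disconnected)" symptom, a rough sketch of what to check; the port ranges assume the defaults (glusterd on 24007-24008, one brick port each starting at 49152), so widen them to cover your brick count:

    # each node should list every other node as 'Peer in Cluster (Connected)'
    gluster peer status

    # open the management and brick ports between peers
    iptables -I INPUT -p tcp --dport 24007:24008 -j ACCEPT
    iptables -I INPUT -p tcp --dport 49152:49200 -j ACCEPT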
08:20 hchiramm WildyLion, does that mean issue is resolved ? :)
08:22 jtux joined #gluster
08:23 atinmu joined #gluster
08:55 dusmant joined #gluster
08:57 DV joined #gluster
08:57 karnan joined #gluster
09:08 Norky joined #gluster
09:36 semajnz joined #gluster
09:40 glusterbot News from newglusterbugs: [Bug 1228592] Glusterd fails to start after volume restore, tier attach and node reboot <https://bugzilla.redhat.com/show_bug.cgi?id=1228592>
09:49 kdhananjay joined #gluster
09:59 autoditac joined #gluster
10:19 LebedevRI joined #gluster
10:28 c0m0 joined #gluster
10:29 badone_ joined #gluster
10:32 kdhananjay joined #gluster
10:46 plarsen joined #gluster
10:55 karnan joined #gluster
10:59 meghanam joined #gluster
11:01 julim joined #gluster
11:06 RameshN joined #gluster
11:10 ctria joined #gluster
11:11 harish_ joined #gluster
11:13 ira joined #gluster
11:18 ira joined #gluster
11:23 bene2 joined #gluster
11:32 deniszh joined #gluster
11:39 elico joined #gluster
11:43 B21956 joined #gluster
11:44 c0m0 joined #gluster
11:45 Manikandan joined #gluster
11:45 meghanam_ joined #gluster
11:47 soumya_ joined #gluster
11:47 meghanam_ joined #gluster
11:47 jiffin1 joined #gluster
11:55 rjoseph joined #gluster
12:02 kotreshhr left #gluster
12:17 Trefex1 joined #gluster
12:20 glusterbot News from resolvedglusterbugs: [Bug 1219547] I/O failure on attaching tier <https://bugzilla.redhat.com/show_bug.cgi?id=1219547>
12:26 elico joined #gluster
12:26 rjoseph joined #gluster
12:27 gildub joined #gluster
12:28 gildub_ joined #gluster
12:29 gildub_ joined #gluster
12:30 zeittunnel joined #gluster
12:39 jcastill1 joined #gluster
12:43 danny__ joined #gluster
12:44 jcastillo joined #gluster
12:46 pppp joined #gluster
12:47 theron joined #gluster
12:49 jiffin joined #gluster
12:52 wkf joined #gluster
13:00 Dropje joined #gluster
13:01 T0aD joined #gluster
13:01 aaronott joined #gluster
13:03 soumya_ joined #gluster
13:04 Dropje Hi, we had some production issues yesterday and I'm trying to figure out what happened. We have a two-node replica setup, and the inactive node was rebooted. This started a failure where the application running on the primary node was unable to write to the glusterfs volume. During that time I'm seeing continuous errors in libgfrpc: http://pastebin.com/MHrwaGLr What do these errors mean, and could they be related to the write failures?
13:04 glusterbot Please use http://fpaste.org or http://paste.ubuntu.com/ . pb has too many ads. Say @paste in channel for info about paste utils.
13:05 Dropje http://paste.ubuntu.com/11587879/
13:07 Manikandan joined #gluster
13:09 rafi1 joined #gluster
13:10 glusterbot News from newglusterbugs: [Bug 1228696] geo-rep: gverify.sh throws error if slave_host entry is not added to know_hosts file <https://bugzilla.redhat.com/show_bug.cgi?id=1228696>
13:14 dgandhi joined #gluster
13:17 hagarth joined #gluster
13:19 kkeithley1 joined #gluster
13:22 Manikandan joined #gluster
13:35 meghanam joined #gluster
13:36 jcastill1 joined #gluster
13:37 meghanam joined #gluster
13:41 jcastillo joined #gluster
13:41 hamiller joined #gluster
13:45 elico left #gluster
13:47 coredump joined #gluster
13:50 spalai left #gluster
13:59 rwheeler joined #gluster
14:02 plarsen joined #gluster
14:11 jcastill1 joined #gluster
14:16 wushudoin joined #gluster
14:16 jcastillo joined #gluster
14:23 dusmantkp_ joined #gluster
14:32 ctria joined #gluster
14:34 bennyturns joined #gluster
14:35 rjoseph joined #gluster
14:49 Trefex joined #gluster
14:51 glusterbot News from resolvedglusterbugs: [Bug 1216039] nfs-ganesha: Discrepancies with lock states recovery during migration <https://bugzilla.redhat.com/show_bug.cgi?id=1216039>
15:30 jcastill1 joined #gluster
15:35 jcastillo joined #gluster
15:36 ctria joined #gluster
15:38 nbalacha joined #gluster
15:43 hagarth joined #gluster
15:54 edualbus joined #gluster
16:08 jcastill1 joined #gluster
16:09 cholcombe joined #gluster
16:13 jcastillo joined #gluster
16:14 bennyturns joined #gluster
16:16 TheSeven Dropje: I guess, based on what little experience I have (I'm fairly new to this stuff), that this is working as designed
16:16 TheSeven if 1 out of 2 nodes fails, the remaining node loses quorum
16:17 TheSeven while the first brick apparently counts twice on even replica counts, that doesn't seem to be true for the server node quorum
16:18 TheSeven apparently everyone suggests to go with 3 replicas and add new servers always in multiples of 3, which I also don't consider very convenient...
16:22 Leildin joined #gluster
16:30 JoeJulian Dropje: From that snippet, it lost connection with a server without receiving the TCP RST. This caused the client to wait 42 seconds for the tcp connection to time out (ping-timeout). After that, since it (afaict) only was connected to one of the replica and could no longer perform the iop, it failed the transaction.
16:32 JoeJulian Dropje: I would check to ensure the client can connect to all the replicas. I would also try to ensure that your shutdown process doesn't bring down the network connection before the glusterfsd processes are killed. Killing the glusterfsd process gracefully closes the TCP connections avoiding ping-timeout.
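A sketch of the two things JoeJulian suggests checking, assuming a volume called rep-volume; the 42 seconds he mentions is the network.ping-timeout default:

    # volume info lists reconfigured options, so this only prints something
    # if ping-timeout has been changed from the 42s default
    gluster volume info rep-volume | grep ping-timeout

    # on the node being rebooted, stop the brick processes before the network goes down,
    # so clients see the TCP connections close instead of waiting out ping-timeout
    pkill glusterfsd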
16:33 TheSeven JoeJulian: guess you saw that shitload of pastes I sent last night... did you have time to look at them? if so, do you consider that expected behavior or is that a bug?
16:34 shaunm_ joined #gluster
16:36 TheSeven (especially the last one)
16:37 magamo joined #gluster
16:37 magamo Hello, folks.
16:38 magamo I'm trying to set up distributed geo-rep, and immediately upon first starting, I'm getting back a 'faulty' status from one of my nodes in the master cluster.
16:38 magamo glusterfs 3.6.3
16:39 magamo Ah, there it goes, it just switched over to one brick on each subvol listed as 'Active', the others as 'Passive'.
16:40 Rapture joined #gluster
16:41 JoeJulian TheSeven: actually, I've only just begun my scrollback. :)
16:42 gem joined #gluster
16:49 JoeJulian TheSeven: Looks like a bug to me. What if you "gluster volume heal $vol full" after adding a replica?
16:51 JoeJulian I'm not entirely surprised about the heal info being empty. Adding a replica set should queue a heal...full (I haven't looked at the code to see if it does). The heal...info shows files that are known to need healing. Since there's no pre-cached list of files, until it walks the tree there's no way for it to know what needs to be healed onto the new replica.
16:52 JoeJulian Some indicator in heal info stating that a full heal was in process would be really nice though.
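The command being referred to, with testvol as a placeholder; a full heal crawls the whole tree rather than relying on the pre-cached list of dirty entries:

    # queue a full crawl so the newly added replica gets populated
    gluster volume heal testvol full

    # watch it drain; heal-count should fall as files land on the new brick
    watch -n 5 'gluster volume heal testvol statistics heal-count'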
16:55 TheSeven JoeJulian: I've tried doing a full heal manually in some of the experiments, never seemed to have any effect
16:56 TheSeven the fact that it only ever healed 2/3 of the files seems really fishy though
16:56 JoeJulian wait another 10 minutes and see if that changes
16:57 JoeJulian I suspect it fills up some queue and doesn't do anything more until the next schedule
16:57 TheSeven I've waited for the whole night and it didn't
16:57 JoeJulian but that's just a guess.
16:57 JoeJulian ah
16:57 magamo .part
16:57 magamo left #gluster
16:57 JoeJulian no!
16:57 JoeJulian don't go!
16:57 JoeJulian :)
16:57 TheSeven I still have the setup from my last paste running, can provide further info if needed
16:58 TheSeven brick file counts haven't moved for several hours
16:58 JoeJulian I think you have enough info to file a bug report. Can you include the glustershd log(s).
16:58 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
16:59 TheSeven I've captured the logs from /var/log/glusterfs, anything else that might be helpful?
17:00 JoeJulian That should be it.
17:00 TheSeven anyway, my question basically boils down to: is it expected that files appear to be missing to clients while a heal is in progress?
17:01 TheSeven IMO a brick shouldn't be considered part of the replicate volume (from a client's perspective) until it has fully healed
17:01 JoeJulian definitely not.
17:01 TheSeven however I see lots of files go invisible instantly as soon as I increase the replica count
17:01 JoeJulian Separate bug report. :)
17:02 JoeJulian So... was there any specific feature in 3.7 that you needed? Or can you use 3.6?
17:03 TheSeven I have no idea to be honest... it's oVirt 3.6 that pulled this in
17:03 JoeJulian What distro?
17:03 TheSeven centos 7.1
17:04 JoeJulian What oVirt repo?
17:04 TheSeven something that's installed by http://resources.ovirt.org/pub/yum-repo/ovirt-release36.rpm
17:04 TheSeven let me have a look
17:05 TheSeven seems to be installing http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/epel-$releasever/$basearch/
17:06 JoeJulian yeah, I'd call that a bug.
17:06 JoeJulian I'd change ovirt-3.6-glusterfs-epel from LATEST to 3.6 if it were me in pre-production, 3.5 if production.
17:07 TheSeven well this is ovirt alpha, so I guess they're testing against latest and then nailing that down to a specific version when they actually release it
17:10 JoeJulian Well that makes sense. I'm trying to get someone from oVirt to participate in the gluster board so these can be a bit more coordinated.
17:18 TheSeven the killer feature of ovirt 3.6 that's making me use that version is gluster-backed hosted engine btw.
17:21 * TheSeven re-tests with 3.6.2-ubuntu1~trusty3 on ubuntu 14.04 LTS
17:25 TheSeven same story there :/
17:25 TheSeven at least as far as missing files during self-heal are concerned
17:26 TheSeven let's see if it heals completely at least
17:27 JoeJulian So... create a volume, mount it, create 1000 files, add a brick and increase the replica count, ls the mount. Am I right?
17:28 TheSeven yes
17:29 TheSeven the very first ls after the increase still seems to work
17:29 TheSeven but a few seconds later files start to go missing
17:30 TheSeven ~10 seconds after increasing the replica count, about 60% of the directories in the volume's root are gone
17:31 TheSeven self heal also seemed to finish after copying just ~30% of the files
17:32 jiffin joined #gluster
17:32 JoeJulian which might make sense if glustershd is experiencing the same directory read problem.
17:32 JoeJulian What filesystem is on the bricks?
17:33 TheSeven xfs
17:33 TheSeven can test with e.g. ext4 if you want
17:33 TheSeven but let me try to further simplify the procedure first and completely take distribute out of the equation
17:34 JoeJulian nah, no need to test ext4
17:36 TheSeven bwahahahaa
17:36 TheSeven 1 of 100 folders visible this time
17:36 * JoeJulian needs to file a bug
17:36 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
17:39 TheSeven yup, also only healed one
17:39 TheSeven so distribute isn't causing this either
17:44 TheSeven I'm starting to have the impression that only directories that have been accessed between adding the replica and self-heal kicking in are actually self-healed
17:44 TheSeven the rest will go missing from the volume
17:44 JoeJulian ... and, of course, I can't repro the issue...
17:45 JoeJulian At least not with 3.6.3
17:46 TheSeven doing a find on the mount right after adding the replica results in way less missing files temporarily, and a complete heal after a while
17:48 JoeJulian I had watch 'ls -lR /mnt/foo | wc -l' running and the count did not change. Let me try it with 10k files instead of 1k.
17:51 TheSeven let me try to make a script that reproduces this
18:03 B21956 joined #gluster
18:03 TheSeven hm, this doesn't seem to happen always
18:03 TheSeven only happened the second time I ran that script
18:03 TheSeven the first time it worked for some reason
18:03 TheSeven so you might need multiple tries as well
18:03 TheSeven I guess there's timing involved...
18:03 TheSeven http://pastie.org/pastes/10225618/text
18:04 TheSeven output of pasting that into a terminal: http://pastie.org/pastes/10225619/text
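The paste itself isn't preserved in the log; as a rough sketch, the repro JoeJulian described at 17:27 comes down to something like this on a single test box (hostname, brick paths and counts are made up, not TheSeven's actual script):

    #!/bin/bash
    # start with a 1-brick volume, populate it, then raise the replica count to 2
    mkdir -p /bricks/b1 /bricks/b2 /mnt/testvol
    gluster volume create testvol server1:/bricks/b1 force
    gluster volume start testvol
    mount -t glusterfs server1:/testvol /mnt/testvol
    mkdir /mnt/testvol/dir{000..099}
    ls /mnt/testvol | wc -l            # expect 100

    gluster volume add-brick testvol replica 2 server1:/bricks/b2 force
    sleep 10
    ls /mnt/testvol | wc -l            # reportedly drops well below 100 while the heal runs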
18:11 glusterbot News from newglusterbugs: [Bug 1228785] Cannot add brick without manually setting op-version <https://bugzilla.redhat.com/show_bug.cgi?id=1228785>
18:17 deniszh joined #gluster
18:19 maveric_amitc_ joined #gluster
18:20 deniszh joined #gluster
18:31 JoeJulian Sorry, TheSeven, but I cannot reproduce with 3.6.3.
18:32 JoeJulian I might try 3.7.1 after work today, but I've already screwed around for too long this morning.
18:32 TheSeven JoeJulian: my log is from 3.6.2-ubuntu1~trusty3
18:34 JoeJulian @ppa
18:34 glusterbot JoeJulian: The official glusterfs packages for Ubuntu are available here: 3.4: http://goo.gl/M9CXF8 3.5: http://goo.gl/6HBwKh 3.6: http://goo.gl/XyYImN -- See more PPAs for QEMU with GlusterFS support, and GlusterFS QA releases at https://launchpad.net/~gluster -- contact semiosis with feedback
18:34 semiosis need to update those, dont i
18:34 JoeJulian Hey! You're back. :)
18:34 semiosis never left, just been really busy lately
18:34 JoeJulian I thought hagarth was going to do it.
18:35 JoeJulian Well, busy's good, I hope.
18:35 semiosis i have a little vagrant box i've been using to do builds i need to send to kkeithley et al
18:37 papamoose1 joined #gluster
18:43 deniszh joined #gluster
18:57 chirino joined #gluster
18:57 cholcombe did the self heal directory go away in gluster 3.6?
18:57 cholcombe i can't seem to find it now
19:00 JoeJulian $brick/.glusterfs/indices/xattrop is still there
19:00 JoeJulian assuming that's what you mean by self-heal directory.
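For anyone following along, that directory holds one gfid-named entry per file the brick still considers in need of healing; a quick way to eyeball it (the brick path is a placeholder):

    ls /bricks/b1/.glusterfs/indices/xattrop | head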
19:10 Pupeno joined #gluster
19:23 Pupeno joined #gluster
19:23 Pupeno joined #gluster
19:49 cholcombe yeah
19:50 cholcombe JoeJulian, do you know if you can still do a getfattr to show the current directory quota information?
19:51 JoeJulian If you could before that shouldn't have gone away.
19:52 JoeJulian I don't use quota so I'm not much of an expert on that.
20:07 B21956 joined #gluster
20:09 cholcombe ok
20:15 jmcantrell joined #gluster
20:16 B21956 joined #gluster
20:22 TheSeven am I correct that probing a second server into the gluster pool, without actually hosting any bricks on that one, will make the volumes be inaccessible if any of the servers is down?
20:22 TheSeven (due to lost server quorum)
20:23 JoeJulian Not if server quorum isn't enabled.
20:23 cholcombe JoeJulian, it's still there :)  you can getfattr -d -m . directory/ on the backend brick and get the quota usage
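A sketch of that getfattr call, run against the brick directory rather than the mount (the path is a placeholder):

    # dump all trusted xattrs; the quota counters appear as trusted.glusterfs.quota.* entries
    getfattr -d -m . -e hex /bricks/b1/some/dir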
20:23 TheSeven assuming default settings
20:23 JoeJulian Personally, I wouldn't enable server quorum until I had a valid quorum.
20:23 TheSeven so is that on or off by default? what's its purpose?
20:24 JoeJulian Did they change the default in 3.7? (me looks at the code)
20:24 TheSeven where can one enable/disable it?
20:24 TheSeven I mean I could totally understand e.g. blocking modifications of volume meta-information while there's no server quorum
20:24 TheSeven but bringing down even read access to all volumes... c'mon
20:27 DV joined #gluster
20:40 fyxim joined #gluster
20:44 maZtah joined #gluster
20:47 samsaffron___ joined #gluster
20:50 kkeithley1 joined #gluster
20:52 billputer joined #gluster
20:58 ghenry joined #gluster
20:58 ghenry joined #gluster
21:01 PaulCuzner joined #gluster
21:05 RicardoSSP joined #gluster
21:05 RicardoSSP joined #gluster
21:17 wkf joined #gluster
21:25 JoeJulian TheSeven: "gluster volume set { $vol | all } cluster.server-quorum-type server"
21:25 TheSeven ok, that seems to be default
21:26 TheSeven or wait...
21:26 TheSeven was probably set by "set $vol group virt"
21:26 JoeJulian default is off but yeah, I bet that does set that feature.
21:27 PaulCuzner joined #gluster
21:27 TheSeven what's the difference between "off" and "none"?
21:28 JoeJulian nothing
21:28 TheSeven interesting that it reports "off" on another volume, but doesn't accept that value for set... only "none", which curiously even gets reported back as "none" by get
21:28 TheSeven so you can't get back to off ;)
21:29 TheSeven also, what's the point of server quorum in the first place? are there any docs on that?
21:31 TheSeven given that there's a quorum on the bricks as well, it seems rather pointless to me to additionally have a server quorum, so what's the story behind that?
21:31 wushudoin| joined #gluster
21:31 JoeJulian server quorum is to avoid split-brain in the event of a network partition.
21:32 TheSeven wouldn't that also partition the bricks though to make only one side writable?
21:32 JoeJulian No, that's the other quorum feature that's disabled by default.
21:33 TheSeven ah, so these are basically alternatives and not meant to be used in combination? (or at least it doesn't make much sense to use them in combination?)
21:34 JoeJulian Right. brick quorum: 3 servers... The client loses connection to 2 of them, goes read-only. That's useful for maybe a web app where it's important to have access to the data, but not necessarily modify it.
21:34 TheSeven brick quorum seems to make a lot more sense to me, it has the advantage that it doesn't affect volumes which haven't even been partitioned (i.e. have all bricks on one side of the partition), and that it handles equal partitions more gracefully (by making the first brick count twice)
21:35 JoeJulian server quorum: 3 servers, one server loses connection to the other two. It shuts down its bricks so a client that might be on the same network partition cannot cause split-brain.
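Putting the two quorums side by side, roughly as they are set with the 3.6/3.7 option names (volume name is a placeholder):

    # client-side (AFR) quorum: the client refuses writes to a replica set
    # when it can't reach a majority of its bricks
    gluster volume set rep-volume cluster.quorum-type auto

    # server-side quorum: glusterd stops its local bricks when the share of
    # reachable peers in the pool drops below the configured ratio
    gluster volume set rep-volume cluster.server-quorum-type server
    gluster volume set all cluster.server-quorum-ratio 51%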
21:36 TheSeven oh, so these are operating at different layers... whatever implications that may have
21:36 JoeJulian Right
21:36 TheSeven can there be split brain situations with just brick quorum enabled?
21:36 JoeJulian In 3.7, for VM hosting, it might be worthwhile to have no quorum, but instead have automated split-brain resolution.
21:37 TheSeven how does that work? by just picking the file with newer modification date?
21:37 TheSeven does that work on file or block level?
21:37 JoeJulian Yes, you can probably make up a scenario to break just about anything. :)
21:38 JoeJulian There are several options. Modification date is one of them.
21:38 JoeJulian And it works at the file level.
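JoeJulian is describing the policy-driven resolution; the 3.7-era CLI also lets you inspect and resolve split-brain by hand, roughly like this (volume, brick and file names are placeholders):

    # list files currently in split-brain
    gluster volume heal rep-volume info split-brain

    # resolve a file by keeping the bigger copy, or by naming the brick to treat as the source
    gluster volume heal rep-volume split-brain bigger-file /vm/disk01.img
    gluster volume heal rep-volume split-brain source-brick server1:/bricks/b1 /vm/disk01.img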
21:38 TheSeven hm, that seems similar to what microsoft's DFSR thing does
21:38 TheSeven just hopefully more reliable ;)
21:39 TheSeven how quickly would split brain resolution happen after the split is removed?
21:39 TheSeven i.e. when does it actually realize that there's work to do?
21:40 TheSeven would, after the split was removed, opening a handle on a file guarantee that I get the version that would be picked by split brain resolution once it fixes the file?
21:41 TheSeven or would I just get a random copy of it, potentially outdated or even inconsistent?
21:41 JoeJulian < 3 seconds or when a lookup() happens.
21:42 TheSeven ...and that happens if I open the file, or even just see it in a directory listing?
21:43 JoeJulian open, stat... directory listing if the thing listing the directory checks attributes, like ls usually does for displaying color or decorations.
21:44 TheSeven ok... so opening a handle would make gluster realize that there's a problem, schedule fixing it, and give me the data from the surviving side of the split?
21:44 JoeJulian yes
21:44 TheSeven that sounds reasonable
21:44 TheSeven more reasonable than DFS actually, where you get a handle to a random (DNS round robin?) one
21:44 TheSeven and fixing happens independently
21:46 TheSeven now, for fileserver applications, assuming samba+ctdb running on a gluster volume (with CTDB metadata on gluster as well), not using a quorum is probably not a good idea?
21:46 JoeJulian I've been hanging out here for 5 years after finding gluster. There are bugs (show me a package with no bugs and I'll show you a package that nobody uses), but overall the design has been pretty robust.
21:46 TheSeven ...aside from a few serious WTFs such as the removal of replace-brick ;)
21:47 TheSeven I guess a split-brain on ctdb would cause all hell of trouble
21:47 JoeJulian Depends on how much you trust your network, what your SLA/OLA looks like, how much manpower you have to fix something in the event a split-brain happens, how large of a problem a split-brain is... etc.
21:47 JoeJulian If a file goes split-brain, the client will error trying to access it.
21:48 JoeJulian The data is left on the bricks intact and an admin can remove the "bad" copy.
21:48 TheSeven ...unless auto resolution is enabled
21:48 JoeJulian right
21:48 TheSeven if that is enabled, does the access still fail, or get redirected to the correct side?
21:48 JoeJulian on a ctdb, I would be afraid of that.
21:49 JoeJulian When A/R updates their spreadsheet and loses data, bad things happen.
21:49 JoeJulian if auto resolution is enabled, access succeeds to the correct side.
21:49 TheSeven that's great! :)
21:50 wushudoin| joined #gluster
21:50 TheSeven now what happens if open handles/locks exist on both sides at the time when the split is removed? :P
21:50 TheSeven ctdb or sanlock would surely run into that
21:52 JoeJulian Locks are versioned, but I know there are some improvements in the works on how well they interact with ctdb.
21:52 JoeJulian I always turned off oplocks
21:53 TheSeven I guess none of that will really help as you have to break semantics of at least one of the file handles to resolve the split brain
21:54 JoeJulian What I tend to do is encourage people to switch to linux or if they must use windows, spend an eternity in purgatory.
21:55 TheSeven won't work here
21:55 JoeJulian Often doesn't.
21:55 JoeJulian But for some reason the windows user problems always hit a lower priority queue.
21:56 TheSeven won't help either, as there won't be non-windows users (aside from admins) in this setup
21:57 jbautista- joined #gluster
21:59 * TheSeven is amazed by these VMs rebooting in like 3 seconds ;)
21:59 JoeJulian :)
21:59 JoeJulian part of that is systemd
21:59 TheSeven bad enough that you're wondering if they had actually rebooted or just ignored the command
22:02 lexi2 joined #gluster
22:02 jbautista- joined #gluster
22:02 TheSeven guess the quorum stuff wouldn't be as bad if ovirt-ha-agent didn't fail at resuming a VM that has been halted because of storage issues
22:02 TheSeven need to file a bug about that
22:02 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
22:02 TheSeven not there, I guess, though
22:43 Vortac Getting mount.nfs: requested NFS version or transport protocol is not supported when trying to mount a gluster volume using nfs - mount cmd is  mount -o mountproto=tcp -t nfs <ip-addr>:/rep-volume /mnt/glusterfs
22:43 glusterbot Vortac: make sure your volume is started. If you changed nfs.disable, restarting your volume is known to work.
22:45 Vortac glusterbot: volume is started and didn't change nfs.disable as far as I know. Is nfs.enable the default?
22:47 gildub joined #gluster
22:55 JoeJulian Vortac: it is.
22:55 JoeJulian ~nfs | Vortac
22:55 glusterbot Vortac: To mount via nfs, most distros require the options, tcp,vers=3 -- Also an rpc port mapper (like rpcbind in EL distributions) should be running on the server, and the kernel nfs server (nfsd) should be disabled
22:58 JoeJulian If you're running EL7 (RHEL or Centos) there's a bug in the start up script of rpcbind, bug 1181779
22:58 glusterbot Bug https://bugzilla.redhat.com:443/show_bug.cgi?id=1181779 unspecified, unspecified, rc, steved, ON_QA , rpcbind prevents Gluster/NFS from registering itself after a restart/reboot
22:58 Vortac JoeJulian: Running 6.4 Final
22:59 doubt_ joined #gluster
23:09 premera joined #gluster
23:11 JoeJulian Vortac: If none of that helps, check the log in /var/log/glusterfs/nfs.log
23:11 JoeJulian If you feel the need to share it, use fpaste.org
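A rough checklist for the Gluster/NFS case being discussed, with <server-ip> and rep-volume as placeholders:

    # Gluster's built-in NFS server should be registered with the portmapper on the server
    rpcinfo -p <server-ip> | grep nfs
    showmount -e <server-ip>

    # nfs.disable should not be set to on for the volume
    gluster volume info rep-volume | grep nfs.disable

    # then mount NFSv3 over TCP, as glusterbot suggests
    mount -t nfs -o vers=3,proto=tcp,mountproto=tcp <server-ip>:/rep-volume /mnt/glusterfs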
23:16 Rapture joined #gluster
23:45 Vortac JoeJulian: Thanks, Still not working with  mount -o mountproto=tcp,vers=3 -t nfs <ip-addr>:/rep-volume /mnt/glusterfs
23:56 badone_ joined #gluster
