
IRC log for #gluster, 2014-12-13


All times shown according to UTC.

Time Nick Message
00:04 mrlesmithjr joined #gluster
00:06 B21956 left #gluster
00:06 mrlesmithjr does anyone know why a server running GlusterFS Client+Server will not mount on boot with the following in /etc/fstab - localhost:/apache       /var/www        glusterfs       defaults,_netdev        0       0
00:07 mrlesmithjr if I put mount -a in /etc/rc.local it eventually mounts but the server hangs for a few on boot
00:08 partner i recall localhost was not an allowed host for glusterfs, use the real name instead?
00:09 partner hmm might have been related to the volume though..
00:09 partner mrlesmithjr: what is your OS?
00:10 mrlesmithjr Ubuntu 14.04
00:10 mrlesmithjr I originally had the servername itself and it was the same scenario
00:10 partner is the volume started before you try to mount it?
00:10 mrlesmithjr yeah on the 2nd gluster node
00:11 _pol_ joined #gluster
00:11 mrlesmithjr 2 web servers running client+server to replicate apache websites
00:11 partner hmm how does that help if you query your localhost which might not have anything running yet?
00:12 partner when the mount is attempted, that is
00:13 _pol_ joined #gluster
00:14 mrlesmithjr no one does this? :) Every article I have read on setting this up matches what I have, but it seems the Ubuntu PPA is showing this issue?
00:15 mrlesmithjr http://blog.gluster.org/category/high-availability/
00:16 partner sure, plenty of replicas around
00:17 partner but what i have won't answer your question
00:17 mrlesmithjr no worries
00:18 JoeJulian Ah, ubuntu... the _netdev is superfluous on non-el based distros.
00:19 partner i can only assume the glusterfs isn't ready at the time of the mount attempt and as it's using localhost it cannot even query the replica partner. but i'm no expert here
00:19 partner plus all the funkiness around _netdev and load ordering..
00:20 partner one cure that comes to mind is to use a round-robin dns entry for the two hosts, that would ensure glusterfs finds its volume info
00:21 partner or even the backupvolfile option for the mount
00:21 JoeJulian glusterd forks almost immediately on startup, satisfying upstart that it's running. That allows the glusterfs mounts to proceed. It's possible for those mounts to be happening before glusterd has spawned the glusterfsd (brick) processes.
00:23 JoeJulian The best solution is to replace upstart. Short of that, however, you could modify the glusterfs mount job, throwing in a sleep to give the bricks time to start.
00:23 mrlesmithjr exactly what I am thinking. Possibly changing update-rc.d
00:24 JoeJulian I've had to do that before. It seems to be hit-and-miss on which installations have that problem.
00:24 mrlesmithjr thanks for the info...Just figured I would bounce this off everyone here
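
A minimal sketch of the workaround discussed above, assuming Ubuntu 14.04 with upstart; the peer hostname (web2) and the retry loop in /etc/rc.local are illustrative placeholders, not taken from the log:

    # /etc/fstab - mount from localhost, but let the client fall back to the peer
    localhost:/apache  /var/www  glusterfs  defaults,_netdev,backupvolfile-server=web2  0  0

    # /etc/rc.local - retry the mount a few times until glusterd has its bricks up
    for i in 1 2 3 4 5 6; do
        mountpoint -q /var/www && break
        sleep 5
        mount /var/www
    done
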
00:28 partner still, given it's a replica it could get all the needed info from the neighbour and fire up without changes to certain system files (?)
00:29 partner it would work even if the local replica were totally screwed up
00:29 JoeJulian Unless both replicas are down, then you could never boot up.
00:34 partner umm? why wouldn't it boot up? just the service would be down at that point if both failed (say, the same virtualization platform happened to fail and both rebooted at the same time)
00:34 partner just asking, might learn something new here :)
00:35 JoeJulian Well, it would boot up, but the first to boot wouldn't be able to get the client mount from the other server that wasn't booted up.
00:36 partner is the backupvolfile option still supported? no mention of it in the manual but i find traces of it in 3.5?
00:36 plarsen joined #gluster
00:37 partner puuh, i would so much need to lab these things, it would cut down the assumptions, no idea how many times the mount is even attempted, maybe it tries to retry
00:38 partner default for nfs is 10000 retries i recall..
00:38 JoeJulian I hate retry counts. I'd much rather use a timer.
00:39 partner i hate to even think about these, why can't it just Work(tm) :)
00:42 partner then again, if things would Work(tm) we could as well be out there playing golf or sailing or having a beer..
00:47 bala joined #gluster
00:57 systemonkey joined #gluster
00:59 mrlesmithjr FYI for anyone else that may run into this. Just came across this and it definitely resolved my issue http://anhlqn.blogspot.be/2014/01/how-to-install-glusterfs-34x-server-and.html
01:02 partner yeah, good old issue, even Joe blogged about it long ago, possibly/probably due to my rant here :)
01:02 mrlesmithjr LOL
01:02 partner http://joejulian.name/blog/glusterfs-volumes-not-mounting-in-debian-squeeze-at-boot-time/
01:02 partner it was for squeeze but things ain't changed here on debian world :o
01:03 partner mrlesmithjr: glad you solved the issue though, that is the main goal anyways
01:04 partner that is kind of why i asked what your OS is as the second question, but unfortunately i'm not into the ubuntu world that much
01:10 mrlesmithjr no worries. Like I said I figured I would jump in here and ask before spending hours on google :)
01:12 mrlesmithjr I am labbing out a 2 node ZFS+GlusterFS (NFS) setup for vSphere and while doing that decided to test out apache in the meantime
01:12 mrlesmithjr about 50TB between the two nodes
01:16 TrDS left #gluster
01:50 m0zes joined #gluster
01:50 root__ joined #gluster
02:01 root__ Short version of Question: Where do linkfiles come from? Backstory: I have a volume that was running as a 2x2 dis+rep but has had 2 bricks and replication removed; now some of my linkfiles are messed up and are causing glusterfs to crash and thus mounts to fail. If I delete the linkfile I can interact with the associated file for a little while and then something (self heal?) regenerates the linkfile and the mount will again fail when trying to interact with the file
02:04 dgandhi joined #gluster
02:27 msmith_ joined #gluster
02:45 JustinClift joined #gluster
03:05 hagarth joined #gluster
03:07 Rogue-3 Anybody know how the value of trusted.glusterfs.dht.linkto is generated when sticky-pointers are created? I've got a bunch that are stale (pointing to subvols that no longer exist) but when I delete the pointers they come back with the same bogus reference
03:42 khelll joined #gluster
03:55 _Bryan_ joined #gluster
04:55 Arminder joined #gluster
04:57 JustinClift joined #gluster
04:58 CROS__ joined #gluster
04:58 CROS__ Hey, I have a 2 server replicated gluster cluster and one of the servers went down for a server move, and I thought the cluster would stay up, but it doesn't seem to mount with just one server in place.
04:59 CROS__ Is this to prevent split brain? Is there anything I can do? I can't seem to figure out how to get the filesystem running again and the whole site is now down.
05:10 bala joined #gluster
05:19 badone_ joined #gluster
05:33 badone_ joined #gluster
06:03 hagarth joined #gluster
06:04 kdhananjay joined #gluster
06:12 CROS__ Is glusterfsd needed anymore?
06:12 CROS__ It seems that the /etc/init.d/glusterfsd script doesn't work for me in gluster 3.6
06:20 glusterbot News from resolvedglusterbugs: [Bug 1138897] NetBSD port <https://bugzilla.redhat.com/show_bug.cgi?id=1138897>
06:26 bala joined #gluster
06:29 nishanth joined #gluster
06:40 sac_ joined #gluster
06:59 rjoseph joined #gluster
07:00 ctria joined #gluster
07:08 msvbhat joined #gluster
07:18 Pupeno_ joined #gluster
07:25 maveric_amitc_ joined #gluster
08:12 sac_ joined #gluster
08:45 maveric_amitc_ joined #gluster
08:55 maveric_amitc_ joined #gluster
09:02 ndevos CROS__: the glusterfsd script is only used to kill the glusterfsd processes on shutdown/reboot, the service should not be enabled
09:02 CROS__ yeah, I noticed that after a lot of crawling through docs
09:03 CROS__ After a few hours of messing around, I finally got things working by running gluster start [volume] force
09:03 CROS__ For some reason I needed the force in there even though the volume was already started.
09:05 ndevos hmm, if you need 'force', most likely something is wrong
09:05 ndevos or, was wrong
09:06 ndevos you could test by killing all glusterfs and glusterfsd processes, a 'service glusterd restart' should bring them all back
09:07 ndevos 'gluster volume status $VOLUME' should show processes for all bricks, and a self-heal and NFS one
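
A sketch of the test ndevos describes, assuming $VOLUME holds the volume name and the commands are run as root:

    # kill all client and brick processes (test scenario only)
    # note: pkill matches the process name as a substring, so "glusterfs" also hits glusterfsd
    pkill glusterfs
    # glusterd should respawn the brick, self-heal and NFS daemons
    service glusterd restart
    # every brick should list a port and PID, plus NFS and self-heal entries
    gluster volume status $VOLUME
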
09:18 kovshenin joined #gluster
09:18 LebedevRI joined #gluster
09:25 badone_ joined #gluster
09:26 TrDS joined #gluster
09:30 ghenry joined #gluster
09:30 ghenry joined #gluster
09:46 maveric_amitc_ joined #gluster
09:51 free_amitc_ joined #gluster
09:59 maveric_amitc_ joined #gluster
10:05 hagarth joined #gluster
10:07 free_amitc_ joined #gluster
10:10 amitc__ joined #gluster
10:14 fandi joined #gluster
10:18 gildub joined #gluster
10:18 maveric_amitc_ joined #gluster
10:22 kovshenin joined #gluster
11:17 Pupeno joined #gluster
11:45 dastar joined #gluster
12:18 fandi_ joined #gluster
13:06 khelll joined #gluster
13:28 _T3_ joined #gluster
13:29 _T3_ guys, what is the best way to (manually) ensure my firewall rules are fine?
13:30 _T3_ I'm asking because I intentionally introduced something that shouldn't work, but gluster peer status is fine :D
13:30 _T3_ and I can create files on brick1 and get it to brick2 also
13:32 hagarth _T3_: ensure that ports reported in gluster volume status for bricks are reachable
13:32 hagarth more info on ports available here: http://www.gluster.org/community/documentation/index.php/Basic_Gluster_Troubleshooting#1._What_ports_does_Gluster_need.3F
13:33 _T3_ hagarth, hmm awesome
13:34 _T3_ thank you
13:39 rotbeard joined #gluster
13:40 ndevos ~ports | _T3_
13:40 glusterbot _T3_: glusterd's management port is 24007/tcp (also 24008/tcp if you use rdma). Bricks (glusterfsd) use 49152 & up since 3.4.0 (24009 & up previously). (Deleted volumes do not reset this counter.) Additionally it will listen on 38465-38467/tcp for nfs, also 38468 for NLM since 3.3.0. NFS also depends on rpcbind/portmap on port 111 and 2049 since 3.4.
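
To illustrate glusterbot's port list, a possible iptables rule set for a 3.4+ server; the upper end of the brick range is an assumption and depends on how many bricks the host carries:

    # glusterd management (24008 only needed for rdma)
    iptables -A INPUT -p tcp --dport 24007:24008 -j ACCEPT
    # brick processes (glusterfsd), one port per brick starting at 49152
    iptables -A INPUT -p tcp --dport 49152:49160 -j ACCEPT
    # Gluster/NFS (v3) helper services and NLM
    iptables -A INPUT -p tcp --dport 38465:38468 -j ACCEPT
    # NFS itself
    iptables -A INPUT -p tcp --dport 2049 -j ACCEPT
    # rpcbind/portmapper
    iptables -A INPUT -p tcp --dport 111 -j ACCEPT
    iptables -A INPUT -p udp --dport 111 -j ACCEPT
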
13:43 _T3_ ndevos, thanks a lot
13:44 _T3_ ndevos, hagarth: there are some typos in the ports listed at http://www.gluster.org/community/documentation/index.php/Basic_Gluster_Troubleshooting#1._What_ports_does_Gluster_need.3F
13:45 _T3_ Gluster uses ports 34865, 34866 and 34867 for the inline Gluster NFS server.
13:45 _T3_ it should be 38* ports
13:45 _T3_ the 4 and 8 are swapped
13:47 ndevos _T3_: thanks, do you have an account there? it's a wiki so you can fix it :)
13:47 _T3_ ok, will do
13:48 ndevos _T3_: awesome, thanks!
13:53 _T3_ done. thank you guys for the help
13:53 ndevos nice & you're welcome :)
13:57 _T3_ ndevos, another thing is that port 2049 is only cited by the bot
13:57 _T3_ should it be added to the doc also?
13:58 ndevos _T3_: hmm, yes, by default the NFS server listens on 2049 now (for the NFSv3 protocol)
13:58 _T3_ right
13:59 ndevos _T3_: 2049 is since 3.4 as the bot mentions - 111 was always needed
13:59 _T3_ yeah
13:59 ndevos just that there is no confision on that part :)
13:59 ndevos *confusion even
14:01 _T3_ Additionally, port 111 (since always) and port 2049 (from GlusterFS 3.4 & later) are used for port mapper, and should have both TCP and UDP open.
14:02 ndevos yes, well, UDP maybe not
14:03 _T3_ http://en.wikipedia.org/wiki/Portmap has it
14:04 _T3_ "For example, it shows that NFS is running, both version 2 and 3, and can be reached at TCP port 2049 or UDP port 2049, depending on what transport protocol the client wants to use, and that the mount protoco (...)"
14:05 _T3_ https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/s2-nfs-nfs-firewall-config.html
14:05 ndevos yes, but by default Gluster/NFS does not provide services over UDP, it is TCP only
14:05 _T3_ Procedure 9.1. Configure a firewall to allow NFS
14:05 _T3_ Allow TCP and UDP port 2049 for NFS.
14:05 _T3_ hmm
14:05 _T3_ got it
14:05 ndevos you need to enable the nfs.mount-udp option, and that only adds UDP support for the MOUNT protocol
14:05 _T3_ right
14:06 _T3_ will remove 2049 udp then
14:06 ndevos you could include that in the wiki too, if you think that makes it clearer - but I'll leave that to you :)
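
For reference, the option ndevos mentions is set per volume; a sketch, with myvol as a placeholder volume name:

    # adds UDP support for the MOUNT protocol only; NFSv3 itself stays TCP
    gluster volume set myvol nfs.mount-udp on
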
14:13 fandi joined #gluster
14:14 _T3_ ndevos, please review, and feel free to edit if I messed up anything: http://www.gluster.org/community/documentation/index.php/Basic_Gluster_Troubleshooting#1._What_ports_does_Gluster_need.3F
14:14 _T3_ I gotta go now.. lunch time in Brazil :)
14:15 ndevos _T3_: thanks, and have a good lunch!
14:15 Rogue-3 Does anybody know how the value of trusted.glusterfs.dht.linkto is generated when sticky-pointers are created? I've got a bunch that are stale (pointing to subvols that no longer exist) but when I delete the pointers they come back with the same bogus reference
14:15 fandi joined #gluster
14:16 ndevos _T3_++ nicely phrased!
14:16 glusterbot ndevos: _T3_'s karma is now 1
14:17 ndevos Rogue-3: are those re-created when you stop the brick process, delete the link-file, start the brick? the links probably are cached and therefore re-created
14:18 ndevos Rogue-3: maybe a rebalance should correct those link-files, but I've never really looked at how those files are managed
14:20 Rogue-3 ndevos: I've not tried it with anything stopped (the volume is in production and there is currently no replication) but I have tried both regular rebalance and fix-layout, fix-layout completes but has no effect and regular rebalance fails as soon as it hits a file with a bad pointer
14:21 ndevos Rogue-3: fix-layout only affects directories (corrects the hash-ranges), but the failing rebalance sounds like a bug
14:21 fandi joined #gluster
14:22 Rogue-3 ndevos: the main issue is that you can't interact with any file with a stale pointer, glusterfs crashes immediately leaving the mount stale
14:23 Rogue-3 I've been "solving" this by locating a bad file, removing the pointer from the brick, moving the file off the volume and then back on to regenerate all the metadata
14:24 Rogue-3 but if I don't recreate the file right away the pointer will eventually get regenerated and interacting with the file will again cause a mount failure
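
For anyone following along: DHT link files are the zero-byte, mode 1000 entries on the bricks, and the subvolume they point to is stored in an extended attribute. A sketch of how to inspect one, with the brick and file paths as placeholders:

    # on the brick, a link file shows up as an empty file with only the sticky bit set
    ls -l /export/brick1/path/to/file
    # ---------T 1 root root 0 ... file

    # the target subvolume is recorded in trusted.glusterfs.dht.linkto
    getfattr -n trusted.glusterfs.dht.linkto -e text /export/brick1/path/to/file
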
14:25 ndevos Rogue-3: that sounds like bug 1159571
14:25 glusterbot Bug https://bugzilla.redhat.com:443/show_bug.cgi?id=1159571 high, unspecified, ---, bugs, POST , DHT: Rebalance- Rebalance process crash after remove-brick
14:26 ndevos bug 1162767 was used to get the fix in 3.5, 3.5.4 has not been released yet, but the fix has been merged already, so the ,,(nightly builds) should have the fix
14:26 glusterbot Bug https://bugzilla.redhat.com:443/show_bug.cgi?id=1162767 high, high, ---, bugs, MODIFIED , DHT: Rebalance- Rebalance process crash after remove-brick
14:26 glusterbot ndevos: Error: No factoid matches that key.
14:27 Rogue-3 ndevos: I'm running 3.6.1
14:27 ndevos http://download.gluster.org/pub/gluster/glusterfs/nightly/  - RPMs only
14:28 Rogue-3 here is what the mount log shows right before crashing on access of a file with a bad pointer: [dht-common.c:1884:dht_lookup_cbk] 0-uploads-dht: linkfile not having link subvol for <file path>
14:29 Rogue-3 which is accurate, I've checked attrs on the bad pointers and they all reference replicate subvols which no longer exist
14:30 Rogue-3 but there doesn't really seem to be any pattern to which files have been affected
14:31 ndevos well, bug 1142409 has been closed with 3.6.0 already... maybe you're hitting something else
14:31 glusterbot Bug https://bugzilla.redhat.com:443/show_bug.cgi?id=1142409 unspecified, unspecified, ---, srangana, CLOSED CURRENTRELEASE, DHT: Rebalance process crash after add-brick and `rebalance start' operation
14:31 ndevos ah, no, that's a different issue altogether
14:32 ndevos I dont know, maybe you can find the bug that you hit here: https://bugzilla.redhat.com/buglist.cgi?product=GlusterFS&amp;query_format=advanced&amp;short_desc=rebalance&amp;short_desc_type=allwordssubstr
14:32 ndevos there are quite a few rebalance issues, and I'm really not too familiar with them
14:33 Rogue-3 there aren't a lot of details about 1142409 but I suppose it could be related, the volumes were created under 3.5.2 and all nodes were upgraded to 3.6.1 later
14:38 ndevos I think 1159571 is more likely, the patch has been sent to the master branch, and 3.5, but for some reason not to 3.6...
14:40 ndevos you could ask in the bug if the patch also needs backporting to 3.6, if you do, set NEEDINFO to reporter and Nithya should get the request
14:41 * ndevos goes afk, and might be back later
15:09 plarsen joined #gluster
15:23 ekman joined #gluster
15:29 Pupeno joined #gluster
15:29 Pupeno joined #gluster
15:44 kshlm joined #gluster
15:47 elico joined #gluster
15:57 haomaiwa_ joined #gluster
16:01 diegows joined #gluster
16:04 hajoucha joined #gluster
16:05 hajoucha hi, I am trying to create a dispersed volume with the latest .rpm from the gluster.org repo. However, I get this weird result: http://pastebin.com/Xkn2ZwwR
16:05 glusterbot Please use http://fpaste.org or http://paste.ubuntu.com/ . pb has too many ads. Say @paste in channel for info about paste utils.
16:07 hajoucha ok, http://paste.ubuntu.com/9503892/
16:20 Rogue-3 hajoucha: It looks like /exports/gv0/brick1 already exists and was previously part of another volume, you will get that error under those conditions regardless of whether the previous volume still exists.
16:21 Rogue-3 hajoucha: easiest solution is to delete all the preexisting brick1 directories after confirming they no longer contain any needed data
16:21 hajoucha Rogue-3: hm, I am trying from scratch.. so maybe previous (unsuccessful) attempts to create the volume changed something...
16:22 hajoucha Rogue-3: I have unmounted all the /exports/gv0 and reformatted. Then mounted and created /exports/gv0/brick1 on each server. Now I get: http://paste.ubuntu.com/9504000/
16:24 Rogue-3 hajoucha: can you check `gluster pool list` and confirm all peers are connected?
16:24 hagarth joined #gluster
16:24 hajoucha Rogue-3: in the log last two lines are: http://paste.ubuntu.com/9504033/
16:26 hajoucha Rogue-3: peers are ok, see http://paste.ubuntu.com/9504047/
16:26 Rogue-3 hajoucha: ah, I believe the solution to that would be `gluster set all op.version 30600` but don't quote me on that (especially if you are working in production)
16:27 hajoucha Rogue-3: maybe there is a problem with upgrade - I have upgraded gluster rpm and restarted glusterd, but did _not_ restart the machines. I wonder if it is really necessary....
16:27 hajoucha Rogue-3: ok, I will try that. No problem, this is just testing setup, so I cannot screw any important data....
16:28 Rogue-3 I think that may be by design, I recently upgraded some debian systems to gluster 3.6.1 and have noticed that new volumes are still created with op-version=2 by default
16:29 hajoucha Rogue-3: hm, gluster set.... did not work - see http://paste.ubuntu.com/9504077/
16:30 Rogue-3 hajoucha: my mistake, it should have been `gluster vol set all op.version 30600`
16:30 hajoucha hmm, volume set: failed: option : op.version does not exist
16:30 hajoucha aha, a typo
16:31 Rogue-3 hmm, maybe op-version?
16:31 hajoucha yes, that was it. Unfortunately, I now get the volume create: gv0: failed: /exports/gv0/brick1 is already part of a volume  error
16:32 Rogue-3 you'll need to remove the brick1 directories again, or just use a different dir name
16:33 hajoucha Rogue-3: is it enough to remove the dirs and create them again (that is - no need to reformat and remount ...) ?
16:34 hajoucha Rogue-3: hurray! works now
16:34 Rogue-3 hajoucha: just removing them should be enough, gluster can create new ones itself
16:34 hajoucha Rogue-3: yes, simply removed the dirs, created them again, then successfully created the volume gv0. Super, thank you very much!
16:34 Rogue-3 hajoucha: glad to help :)
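
Deleting the brick directory works; the usual alternative when the data has to stay in place is to clear the volume markers gluster left on the old brick root. A sketch, with the brick path taken from the paste above:

    # remove the volume-id and gfid markers from the reused brick root
    setfattr -x trusted.glusterfs.volume-id /exports/gv0/brick1
    setfattr -x trusted.gfid /exports/gv0/brick1
    # drop the internal metadata directory as well
    rm -rf /exports/gv0/brick1/.glusterfs
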
16:50 hajoucha hm, I got excited too early... see http://paste.ubuntu.com/9504279/
16:51 hajoucha the volume was created, however it cannot be started...
16:52 hajoucha log says: http://paste.ubuntu.com/9504289/
16:52 hajoucha basically "unable to start brick1 on localhost"
16:57 hajoucha hm, perhaps transport "rdma" is a problem....
17:19 _T3_ joined #gluster
17:31 Pupeno joined #gluster
17:50 haomaiwa_ joined #gluster
18:06 calisto joined #gluster
18:07 mrlesmithjr joined #gluster
18:07 mrlesmithjr in Gluster 3.5 does network.compression default to on or off?
18:11 mrlesmithjr trying to figure out the high CPU usage from glusterfsd on 2 replicated bricks
18:12 mrlesmithjr default setup pretty much on Ubuntu 14.04
18:40 calisto joined #gluster
19:32 theron joined #gluster
19:53 dkorzhevin joined #gluster
19:55 dkorzhevin joined #gluster
20:16 Pupeno joined #gluster
20:40 T3 joined #gluster
22:16 daMaestro joined #gluster
22:16 Pupeno joined #gluster
22:16 badone_ joined #gluster
23:00 MacWinner joined #gluster
23:06 ry left #gluster
23:23 glusterbot News from newglusterbugs: [Bug 1173909] glusterd crash after upgrade from 3.5.2 <https://bugzilla.redhat.com/show_bug.cgi?id=1173909>
23:31 theron joined #gluster
23:41 T3 joined #gluster
