IRC log for #gluster, 2015-11-17

All times shown according to UTC.

Time Nick Message
00:00 mlncn joined #gluster
00:01 haomaiwa_ joined #gluster
00:03 suliba joined #gluster
00:14 shyam joined #gluster
00:24 sjohnsen joined #gluster
00:34 mlhamburg_ joined #gluster
00:38 jvandewege joined #gluster
00:49 amye joined #gluster
01:01 haomaiwa_ joined #gluster
01:02 pff joined #gluster
01:06 Merlin_ joined #gluster
01:25 jockek joined #gluster
01:25 RedW joined #gluster
01:29 Mr_Psmith joined #gluster
01:35 Lee1092 joined #gluster
02:14 nangthang joined #gluster
02:17 xMopxShell Is it possible to limit the IO speed of rebalancing?
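(Recent 3.7 releases expose a rebalance throttle as a volume option; the exact option name and accepted values can differ by version, so treat this as a sketch rather than a confirmed answer to the question above. VOLNAME is a placeholder for the actual volume name:
    gluster volume set VOLNAME cluster.rebal-throttle lazy
Accepted values are typically lazy, normal and aggressive.)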
02:20 thangnn_ joined #gluster
02:31 haomaiwa_ joined #gluster
02:35 Telsin joined #gluster
02:36 haomaiw__ joined #gluster
03:00 gildub_ joined #gluster
03:01 kshlm joined #gluster
03:06 Merlin_ joined #gluster
03:17 beeradb__ joined #gluster
03:17 cuqa_ joined #gluster
03:18 kblin joined #gluster
03:18 kblin joined #gluster
03:18 msvbhat joined #gluster
03:19 gem joined #gluster
03:23 shortdudey123 joined #gluster
03:24 necrogami joined #gluster
03:29 cuqa_ joined #gluster
03:37 mlhess joined #gluster
03:39 overclk joined #gluster
03:53 sakshi joined #gluster
03:58 bharata-rao joined #gluster
03:59 nishanth joined #gluster
04:00 shubhendu joined #gluster
04:01 nbalacha joined #gluster
04:03 m0zes joined #gluster
04:06 ctria joined #gluster
04:10 atinm joined #gluster
04:11 itisravi joined #gluster
04:11 nbalacha joined #gluster
04:14 dusmant joined #gluster
04:15 hchiramm_home joined #gluster
04:19 itisravi joined #gluster
04:26 TheSeven joined #gluster
04:26 ramteid joined #gluster
04:35 zhangjn joined #gluster
04:40 ppai joined #gluster
04:40 nbalacha joined #gluster
04:45 harish_ joined #gluster
04:48 hchiramm_home joined #gluster
04:51 bkunal|training joined #gluster
04:54 zerick_ joined #gluster
04:56 csim_ joined #gluster
04:56 cristian_ joined #gluster
04:56 lalatend1M joined #gluster
04:56 lbarfiel1 joined #gluster
04:56 xavih_ joined #gluster
04:56 siel_ joined #gluster
04:57 yawkat` joined #gluster
04:57 rideh- joined #gluster
04:58 n-st_ joined #gluster
04:58 rich0dify joined #gluster
04:58 Sadama joined #gluster
04:58 Humble joined #gluster
05:00 baoboa joined #gluster
05:00 mattmcc_ joined #gluster
05:00 klaxa joined #gluster
05:00 JPaul joined #gluster
05:00 atrius` joined #gluster
05:01 kbyrne joined #gluster
05:01 XpineX joined #gluster
05:02 virusuy joined #gluster
05:02 BrettM joined #gluster
05:03 ron-slc joined #gluster
05:03 sadbox joined #gluster
05:03 josh joined #gluster
05:04 anil joined #gluster
05:07 ndarshan joined #gluster
05:18 F2Knight_ joined #gluster
05:20 suliba_ joined #gluster
05:20 cfeller_ joined #gluster
05:22 jvandewege_ joined #gluster
05:23 AleksU joined #gluster
05:24 prg3_ joined #gluster
05:27 Jmainguy joined #gluster
05:28 atinm joined #gluster
05:28 m0zes joined #gluster
05:29 monotek joined #gluster
05:29 harish__ joined #gluster
05:30 capri joined #gluster
05:30 mmckeen joined #gluster
05:31 pppp joined #gluster
05:31 aravindavk joined #gluster
05:32 bennyturns joined #gluster
05:34 curratore joined #gluster
05:39 scubacuda joined #gluster
05:40 sghatty_ joined #gluster
05:40 Chr1st1an joined #gluster
05:42 atalur joined #gluster
05:47 arcolife joined #gluster
05:47 vimal joined #gluster
05:48 kevc joined #gluster
05:49 neha_ joined #gluster
05:51 jiffin joined #gluster
05:54 jtux joined #gluster
05:58 rafi joined #gluster
05:58 Bhaskarakiran joined #gluster
06:03 hgowtham joined #gluster
06:03 hgowtham_ joined #gluster
06:12 vmallika joined #gluster
06:13 bkunal|training joined #gluster
06:20 kdhananjay joined #gluster
06:21 mlhamburg1 joined #gluster
06:21 raghu joined #gluster
06:22 dan__ joined #gluster
06:26 kevc should I expect performance gains for large files / sequential reads if I run glusterfs striped over 4 RAID6 volumes? That is, faster than the individual RAID6 volumes themselves?
06:30 Saravana_ joined #gluster
06:31 F2Knight joined #gluster
06:35 jiffin1 joined #gluster
06:41 jiffin1 joined #gluster
06:48 kovshenin joined #gluster
06:57 aravindavk joined #gluster
07:10 mhulsman joined #gluster
07:15 deepakcs joined #gluster
07:19 doekia joined #gluster
07:39 doekia joined #gluster
07:43 mhulsman1 joined #gluster
07:44 frozengeek joined #gluster
07:47 jvandewege joined #gluster
07:48 mhulsman joined #gluster
07:48 R0ok_ joined #gluster
08:02 jvandewege_ joined #gluster
08:09 ivan_rossi joined #gluster
08:11 suliba joined #gluster
08:11 ramteid joined #gluster
08:17 suliba joined #gluster
08:24 auzty joined #gluster
08:28 ramteid joined #gluster
08:34 kanagaraj joined #gluster
08:36 poornimag joined #gluster
08:37 fsimonce joined #gluster
08:40 Merlin_ joined #gluster
08:43 suliba joined #gluster
08:44 deepakcs joined #gluster
08:45 hos7ein joined #gluster
08:51 F2Knight joined #gluster
08:53 ramteid joined #gluster
08:55 ctria joined #gluster
09:04 deepakcs joined #gluster
09:10 mhulsman joined #gluster
09:14 dan__ joined #gluster
09:15 deepakcs joined #gluster
09:21 atinm joined #gluster
09:21 kanagaraj joined #gluster
09:26 suliba joined #gluster
09:28 kshlm joined #gluster
09:28 arcolife joined #gluster
09:46 jwd joined #gluster
09:54 suliba joined #gluster
09:54 frozengeek joined #gluster
10:06 Trefex joined #gluster
10:17 ramky joined #gluster
10:18 mhulsman1 joined #gluster
10:23 pppp joined #gluster
10:24 kanagaraj_ joined #gluster
10:28 overclk joined #gluster
10:32 bluenemo joined #gluster
10:32 haomaiwa_ joined #gluster
10:37 bluenemo hi guys. My gluster, just put into production, with two nodes replicating, keeps dying. I have four webworkers for two gluster servers, each serving NFS. from time to time, one gluster server seems to die - gluster volume status shows that everything is ok, but both of its clients cant access /var/www via NFS. this is what the logs say at this point on the server: http://paste.debian.net/333279/
10:37 glusterbot Title: debian Pastezone (at paste.debian.net)
10:38 LebedevRI joined #gluster
10:42 jiffin bluenemo: can u provide nfs.log in the same location from the servers which client mounted
10:42 jiffin ??
10:42 suliba joined #gluster
10:43 bluenemo jiffin, i dont get your question completely. servers dont use any nfs. client nfs.log shows: (in dmesg) nfs: server omega not responding, still trying
10:44 autostatic JoeJulian: disabling geo-replication indexing didn't fully solve the logs getting flooded. I'm still getting a lot of these:
10:44 autostatic E [marker.c:2140:marker_removexattr_cbk] 0-datavolume-marker: Numerical result out of range occurred while creating symlinks
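(For reference, the geo-replication indexing that autostatic refers to is toggled with a volume option; a minimal sketch, with VOLNAME standing in for the affected volume:
    gluster volume set VOLNAME geo-replication.indexing off
Note that glusterd may refuse to disable indexing while a geo-replication session is still configured for the volume.)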
10:45 bluenemo also found this here [2015-11-17 10:25:21.400246] E [MSGID: 114031] [client-rpc-fops.c:1727:client3_3_entrylk_cbk] 0-gfs_fin_web-client-1: remote operation failed [Transport endpoint is not connected]
10:45 jiffin bluenemo: what I meant was "can u check the /var/log/glusterfs/nfs.log" from the server which client mounted?
10:45 bluenemo jup, the log line just pasted is the only error I can find there for that timeframe
10:47 jiffin bluenemo: do "gluster v status" shows nfs server is up and running ??
10:47 haomaiwa_ joined #gluster
10:48 bluenemo jiffin, yes
10:48 bluenemo also when the error is currently present
10:48 bluenemo what resolves the issue right away is service gluster-server restart (ubuntu 14.04)
10:49 jiffin bluenemo: can u try showmount -e <nfs_server> from client??
10:49 bluenemo using this ppa here http://ppa.launchpad.net/gluster/glusterfs-3.7/ubuntu
10:49 glusterbot Title: Index of /gluster/glusterfs-3.7/ubuntu (at ppa.launchpad.net)
10:49 suliba joined #gluster
10:49 TheSeven joined #gluster
10:49 haomaiwang joined #gluster
10:49 bluenemo Export list for alpha: /gfs_fin_web *
10:49 bluenemo (output of showmount)
10:49 bluenemo it currently is mounted
10:49 bluenemo i cant have the error for long as it is production now
10:49 bluenemo didnt have this behavior during staging phase
10:50 bluenemo compiled a few kernels on it, slow but not failing
10:50 dusmant joined #gluster
10:50 jiffin bluenemo: hmm everything looks fine for me
10:53 bluenemo what about the two errors in my first paste http://paste.debian.net/333279/ ?
10:53 glusterbot Title: debian Pastezone (at paste.debian.net)
10:53 bluenemo what does this mean? 0-management: creation of 1 listeners failed, continuing with succeeded transport
10:53 bluenemo is there any timeout I should up a bit?
10:54 bluenemo my mount options for nfs on the client: alpha:/gfs_fin_web on /var/www type nfs (rw,nosuid,noatime,nosharecache,fsc=fincallorca_web,tcp,bg,rsize=65536,wsize=65536,soft,proto=tcp,addr=172.16.0.132)
10:55 bluenemo could this be related to rpc somehow?
10:56 bluenemo found this which has the same error message https://gist.github.com/txomon/58841f2f28654953b512 but dont really get it
10:56 glusterbot Title: gluster problems · GitHub (at gist.github.com)
10:57 bluenemo I also get this error 0-management: Failed to set keep-alive: Invalid argument
10:58 bluenemo hm, all nodes and clients can resolve alpha and omega
11:00 atinm joined #gluster
11:03 jvandewege_ joined #gluster
11:04 kkeithley1 joined #gluster
11:06 steveeJ I think I've asked this before, but the answer wasn't very clear back then. is it possible to use glusterfs to synchronize access to a physically shared blockdevice?
11:09 mhulsman joined #gluster
11:12 rp_ joined #gluster
11:12 zhangjn joined #gluster
11:13 zhangjn joined #gluster
11:14 jvandewege joined #gluster
11:14 nbalacha joined #gluster
11:14 zhangjn joined #gluster
11:15 zhangjn joined #gluster
11:16 gildub_ joined #gluster
11:16 zhangjn joined #gluster
11:17 zhangjn joined #gluster
11:18 zhangjn joined #gluster
11:20 zhangjn joined #gluster
11:21 zhangjn joined #gluster
11:26 pppp joined #gluster
11:35 mhulsman1 joined #gluster
11:38 R0ok_ joined #gluster
11:42 rjoseph joined #gluster
11:44 frozengeek joined #gluster
11:49 ndevos steveeJ: if you need that, you probably should be looking at gfs2 (global file system)
11:49 ndevos steveeJ: Gluster does not share the block devices that get formatted as xfs, it combines multiple (different) block devices into volumes
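(A minimal sketch of what ndevos describes: each server formats and mounts its own local disk, and those per-server bricks are then combined into a replicated volume. Server names, brick paths and the volume name here are hypothetical:
    # on each server: a locally formatted and mounted filesystem holds the brick
    mount /dev/sdb1 /bricks/b1
    mkdir -p /bricks/b1/brick
    # on one server: combine the per-server bricks into a replica-2 volume
    gluster volume create demo replica 2 server1:/bricks/b1/brick server2:/bricks/b1/brick
    gluster volume start demo)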
11:50 jiffin bluenemo: in my assumption errors are related to rdma , not to rpc
11:51 jiffin bluenemo: not 100% sure on that
11:52 ndevos REMINDER: Gluster Bug Triage meeting starts in 10 minutes in #gluster-meeting
11:58 jiffin bluenemo: sorry my mistake , those errors spurious , even happens for me tto
11:58 jiffin s/tto/too
11:59 bluenemo jiffin, well my NFS failing isnt spurious :)
11:59 bluenemo i didnt find a way to trigger it though.
12:00 bluenemo did the usual stuff as compiling kernel on NFS, pretty slow but wont fail
12:01 neha_ joined #gluster
12:02 ndevos REMINDER: Gluster Bug Triage meeting starts *NOW* in #gluster-meeting
12:04 Philambdo joined #gluster
12:05 Ru57y joined #gluster
12:23 bluenemo is there any other data I can look into? I cant live with having NFS fail from time to time without knowing it :(
12:24 armyriad joined #gluster
12:26 armyriad joined #gluster
12:28 Philambdo joined #gluster
12:53 nr joined #gluster
12:54 jvandewege joined #gluster
12:56 ira joined #gluster
12:57 nr left #gluster
12:59 nehar joined #gluster
13:01 Philambdo joined #gluster
13:03 B21956 joined #gluster
13:06 harish__ joined #gluster
13:13 jeffrin joined #gluster
13:13 jeffrin hello all
13:14 jeffrin anyone here ?
13:14 rjoseph joined #gluster
13:14 seko joined #gluster
13:15 jeffrin left #gluster
13:15 sakshi joined #gluster
13:19 netbulae_ joined #gluster
13:22 Mr_Psmith joined #gluster
13:27 frozengeek joined #gluster
13:30 cked350 joined #gluster
13:39 bkunal|training joined #gluster
13:41 bluenemo joined #gluster
13:41 bluenemo had internet issues, back
13:42 bluenemo so far it happened three times this morning.. could this be caused by workers trying to write the same file via nfs while mounting from different gluster servers which replicate?
13:42 muneerse2 joined #gluster
13:42 bluenemo i'm getting kinda afraid to go to bed tonight :)
13:43 unclemarc joined #gluster
13:46 rjoseph joined #gluster
13:47 haomaiwa_ joined #gluster
13:49 diegows joined #gluster
13:52 Slashman joined #gluster
13:52 ndevos bluenemo: I'm not sure what you are trying to do, or what failures you get, but if you have multiple processes write to the same file, those processes should use locks
13:53 bluenemo how does gluster handle this?
13:54 ndevos bluenemo: gluster provides posix locks like a local filesystem, and sends those locks to the bricks
13:54 bluenemo ok, so if I write to one file on gluster native mount, I should not be able to write to it on the other node?
13:55 ndevos bluenemo: that depends on the application that does the writing, if the application uses locking, it will be prevented
13:56 bluenemo i see. now if I use the standard nfs that comes with gluster? My mount options with the clients are  alpha:/gfs_fin_web on /var/www type nfs (rw,nosuid,noatime,nosharecache,fsc=fincallorca_web,tcp,bg,rsize=65536,wsize=65536,soft,proto=tcp,addr=172.16.0.132)
13:57 ndevos using fuse or nfs does not really matter, it is mainly dependent on the locking used (or not) by the application
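(Whether locking protects the file therefore depends on the application actually taking locks. A quick way to see advisory locking in action from two clients of the same mount, using the flock(1) utility; the lock file path is hypothetical:
    # client A: hold an exclusive lock for 30 seconds
    flock /var/www/test.lock -c 'sleep 30'
    # client B: try to take the same lock without blocking
    flock -n /var/www/test.lock -c 'echo got lock' || echo 'lock held elsewhere')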
13:57 bluenemo i see. I'm trying to figure out if this is anyhow related to my NFS failing on one gluster node from time to time..
13:58 ndevos bluenemo: that is difficult to say, maybe you have some messages in /var/log/glusterfs/nfs.log that give a hint?
13:59 bluenemo my setup is with aws in two availability zones. the idea was to have one gluster in each and then two apache workers behind a load balancer. i mount the clients to the gluster in their respective availability zone
13:59 bluenemo no, nothing of the sort. posted some of it above
13:59 bluenemo dunno if you joined before or after, sry
14:01 ndevos well, if NFS fails, that should be noted in the nfs.log on the failing server, not the etc-glusterd....log or others
14:01 shyam joined #gluster
14:02 ndevos bluenemo: oh, also, it is not recommended to mount a volume over NFS on the Gluster servers, that will result in locking issues
14:03 ndevos bluenemo: if you need to mount the volume on Gluster servers, you should use fuse instead
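(A fuse mount of the same volume on a server would look something like this; the volume name is taken from the discussion above, the mount point is an example:
    mount -t glusterfs localhost:/gfs_fin_web /var/www)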
14:04 bluenemo its only mounted via gluster native on the gluster servers themselves
14:04 bluenemo i noticed that before :)
14:04 jwd joined #gluster
14:04 ndevos ah, ok :)
14:05 psymax joined #gluster
14:05 bluenemo i've got a (from nfs log)
14:05 bluenemo [2015-11-17 08:54:19.520332] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-gfs_fin_web-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
14:06 ndevos ah, well, the Gluster/NFS server needs to be able to connect to all the bricks, if that fails, wel...
14:06 bluenemo hm
14:06 bluenemo but then shouldnt both fail?
14:06 bluenemo i can still ls /var/www on the servers mounted via gluster then
14:07 bluenemo just the clients get the NFS hang - they reconnect right away after a service gluster-server restart on the server btw
14:07 bluenemo as in all this happens in availability zone A
14:07 bluenemo while B is unaffected
14:08 ndevos hmm, strange, and when you run "gluster volume status", is there a brick process missing?
14:08 bluenemo what does failed to get remote port number mean?
14:08 psymax Hello! I'm having an issue with a gluster mount on Debian Jessie. I added the mount in fstab with the _netdev option, but the volume remains unmounted at boot. From gluster log: 0-glusterfs: connection to failed (No route to host). Any easy trick to fix this? Thanks.
14:08 bluenemo no, even NFS says y
14:09 dgandhi joined #gluster
14:09 bluenemo I would like to fully understand the meaning of the error message
14:10 swebb joined #gluster
14:10 bluenemo do you know what remote port he is talking about?
14:10 bluenemo aeh port number for remote subvol
14:10 ndevos bluenemo: when connecting to the bricks, the client (nfs-server) asks the glusterd process (on port 24007) what port the brick is using
14:11 ndevos bluenemo: each brick listens on its own port, see @ports
14:11 ndevos @ports
14:11 glusterbot ndevos: glusterd's management port is 24007/tcp (also 24008/tcp if you use rdma). Bricks (glusterfsd) use 49152 & up since 3.4.0 (24009 & up previously). (Deleted volumes do not reset this counter.) Additionally it will listen on 38465-38467/tcp for nfs, also 38468 for NLM since 3.3.0. NFS also depends on rpcbind/portmap on port 111 and 2049 since 3.4.
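(Translated into firewall terms, a minimal iptables sketch based on the port list above might look like this; the brick range has to cover one port per brick on the host, so widen it as needed:
    iptables -I INPUT -p tcp -m multiport --dports 111,2049,24007,24008,38465:38468 -j ACCEPT
    iptables -I INPUT -p udp --dport 111 -j ACCEPT
    iptables -I INPUT -p tcp --dport 49152:49160 -j ACCEPT)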
14:13 ndevos psymax: I think _netdev is not used on Debian, they probably have a different option to set the correct order of gluster startup and mounting
14:16 bluenemo ndevos, those are the three error messages I found in the log for today. we started at 7am: http://paste.debian.net/333320/
14:16 glusterbot Title: debian Pastezone (at paste.debian.net)
14:16 julim joined #gluster
14:17 ndevos sorry bluenemo, need to get some stuff done, will be back later
14:17 bluenemo all errors today http://paste.debian.net/333322/
14:17 glusterbot Title: debian Pastezone (at paste.debian.net)
14:18 plarsen joined #gluster
14:18 bluenemo ah ok, sure, thanks for your help. As I am in production and the client might force me back to single server NFS, I hope I'm allowed to ping JoeJulian on this one
14:20 psymax ndevos: thank you
14:20 hamiller joined #gluster
14:22 ramky joined #gluster
14:26 Merlin_ joined #gluster
14:27 bfoster joined #gluster
14:29 gem joined #gluster
14:31 skylar joined #gluster
14:36 nishanth joined #gluster
14:36 mlncn joined #gluster
14:39 haomaiwa_ joined #gluster
14:40 jiffin joined #gluster
14:44 muneerse joined #gluster
14:45 nbalacha joined #gluster
14:48 psymax ndevos: i added a "sleep 30" in the head of /etc/network/if-up.d/mountnfs and it works now ;) (_netdev is needed)
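(On a systemd-based Debian or Ubuntu install, an alternative to sleeping in the if-up.d hook is to let systemd order the mount after the network with an automount entry in /etc/fstab; a sketch with hypothetical server and volume names:
    server1:/myvol  /mnt/gluster  glusterfs  defaults,_netdev,noauto,x-systemd.automount  0 0)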
14:53 mhulsman joined #gluster
14:54 mhulsman1 joined #gluster
14:58 Merlin_ joined #gluster
14:58 shubhendu joined #gluster
15:01 B21956 joined #gluster
15:01 B21956 left #gluster
15:02 Merlin_ joined #gluster
15:03 plarsen joined #gluster
15:03 uebera|| joined #gluster
15:04 bluenemo just happened again :(
15:04 bluenemo meh :(
15:08 bluenemo and again, ah wtf
15:08 bluenemo why now in production M(
15:12 bluenemo and again omfg...
15:12 bluenemo gret
15:12 bluenemo great..
15:24 uebera|| joined #gluster
15:24 kdhananjay joined #gluster
15:24 bluenemo it seems a developer was executing some script that is supposed to call log4php, which then logs to the dir mounted via nfs..
15:29 bluenemo ok, finally i'm able to replicate the problem..
15:31 bluenemo this is a strace from the php script executed on a client with mount via nfs that causes in this order 1) the local nfs client to hang 2) the nfs of all other clients mounting from this gluster to hang. only fix is service glusterfs-server restart
15:31 bluenemo http://paste.debian.net/333340/
15:31 glusterbot Title: debian Pastezone (at paste.debian.net)
15:38 ndevos bluenemo: locking over NFS requires the NLM service, that runs on a different port
15:38 bluenemo there is no firewall in between the instances
15:38 ndevos bluenemo: you can check with "rpcinfo -p $server" to see what port the "nlockmgr" uses, NFS-clients need to connect to that
15:39 bluenemo I can now confirm that flock triggers the behavior
15:39 bowhunter joined #gluster
15:40 bluenemo its 1019 and 1009 on the other
15:40 ndevos bluenemo: can you do "grep -i nlm nfs.log" ?
15:42 bluenemo ndevos, http://paste.debian.net/333346/
15:42 glusterbot Title: debian Pastezone (at paste.debian.net)
15:44 ndevos bluenemo: at least NLM is handling some things, and no real recent errors
15:46 bluenemo ndevos, hm its not a feature that i want to switch off.. but if it dies when only one node uses it..
15:47 ndevos bluenemo: that flock() is expected to block, until the lock can get obtained, maybe an other process hold a lock on that file?
15:47 bluenemo does nfs dying on flock sound familiar to anybody? or is this specific to gluster?
15:47 bluenemo i setup a development environment for it where only i run the  php test.php  script
15:48 bluenemo i'm sure it's the only process trying to access it, but it should be able to handle another one doing that as well later
15:48 ndevos bluenemo: when you mean "nfs dies", does that mean the nfs-server process exits?
15:48 bluenemo yes
15:48 bluenemo i checked this time, nfs is dead on gluster volume status
15:49 ndevos bluenemo: normally something like that would be listed in the nfs.log, but I dont think there was anything about that in there?
15:49 bluenemo i'll send you the whole log, one sec pls
15:49 ndevos bluenemo: when the nfs process died, what are the last 40 lines in the nfs.log?
15:50 bluenemo ndevos, pm
15:52 ndevos bluenemo: there is nothing in that log that suggests the nfs process got an issue :-/
16:00 sakshi joined #gluster
16:04 bluenemo so flock kills the system reproducibly :(
16:04 diegows joined #gluster
16:11 bluenemo as in kills the gluster internal nfs server
16:13 bluenemo if the native gluster wouldnt be so incredibly slow i'd just use that ;)
16:13 maserati joined #gluster
16:13 JoeJulian What's the workload?
16:14 ndevos one flock() :-/
16:14 JoeJulian lol
16:15 cholcombe joined #gluster
16:15 JoeJulian I didn't realize flock over fuse was all that slow
16:15 bluenemo its mounted via nfs
16:16 bluenemo amazon, two availability zones, one gluster server in each doing replication to the other, for each AZ two apache's mounting /var/www via nfs from their respecitve gluster
16:16 JoeJulian So far I've guessed the workload is something web related.
16:16 JoeJulian I'm going to go a step further and guess that it's ,,(php).
16:16 glusterbot (#1) php calls the stat() system call for every include. This triggers a self-heal check which makes most php software slow as they include hundreds of small files. See http://joejulian.name/blog/optimizing-web-performance-with-glusterfs/ for details., or (#2) It could also be worth mounting fuse with glusterfs --attribute-timeout=HIGH --entry-timeout=HIGH --negative-timeout=HIGH
16:16 bluenemo well, it is atm.
16:16 glusterbot --fopen-keep-cache
16:17 bluenemo so each time this happens nfs dies :/
16:17 bluenemo there is no pic uploads and lots of stuff writing atm..
16:17 bluenemo but it will get lots of traffic soon too
16:17 bluenemo as in the evening
16:17 bluenemo yeah php
16:18 JoeJulian So what's slow? The page render? Are you using apc or one of the other tools I mention in the article (I'm more familiar with apc).
16:18 bluenemo its not slow, if i use flock the nfs server dies on the respective gluster server
16:18 bluenemo gluster volume status: nfs n
16:19 JoeJulian " if the native gluster wouldnt be so incredibly slow i'd just use that ;)"
16:19 bluenemo would you recommend switching to the native gluster mount?
16:20 JoeJulian It's certainly an option.
16:21 bluenemo btw i read your blog once or twice and we talked quite a bit the other few days i was here :) we came to the conclusion that nfs + cachefilesd would be the idea
16:22 JoeJulian Ah yes, I remember you coming to that conclusion. I wasn't quite sure how you got that. ;)
16:22 Merlin_ joined #gluster
16:23 JoeJulian My preference has been fuse for the redundancy path.
16:23 bluenemo with my client, i wont really be able to setup the recommendations on the code. i need this stable by tonight
16:23 jbrooks joined #gluster
16:23 bluenemo ok then lets just try it. i have four nodes and can spare one to setup the native glustered /var/www
16:23 bluenemo can i just test mount it to /mnt while /var/www is via nfs, yes right?
16:24 JoeJulian yes
16:25 bluenemo ok I will try to do that now
16:26 bluenemo so for the mount options in my salt formula I noted so far: defaults,fetch-attempts=10,nobootwait,log-level=WARNING,log-file=/var/log/gluster-mount.log
16:27 bluenemo the options you posted above can be used like mount -o as well?
16:27 julim_ joined #gluster
16:27 ndevos bluenemo: this works fine for me: http://termbin.com/uk6w
16:27 bluenemo ah and high should be in seconds :) what do you recommend for high JoeJulian ?
16:28 ndevos bluenemo: can you try that out? copy it to a nfs mountpoint, cd there and run it with ./lock-it.sh (or whatever you call the script)
16:29 bluenemo ah yes, well that kills the system
16:29 bluenemo that works for you?
16:29 JoeJulian I'm spinning up a couple of VMs to see if I can repro.
16:29 bluenemo cool thank you!
16:29 bluenemo I have to credit this channel for its fine support!
16:30 JoeJulian Hmm... somehow I just detached this channel into its own window. :(
16:30 ndevos bluenemo: works for me, on CentOS-7 + glusterfs-3.7.6, but I'm also only using one brick...
16:30 bluenemo btw i'm using ubuntu + the ppa http://ppa.launchpad.net/gluster/glusterfs-3.7/ubuntu
16:30 glusterbot Title: Index of /gluster/glusterfs-3.7/ubuntu (at ppa.launchpad.net)
16:31 JoeJulian left #gluster
16:31 bluenemo my gluster volume config, maybe that helps http://paste.debian.net/333357/
16:31 JoeJulian joined #gluster
16:31 glusterbot Title: debian Pastezone (at paste.debian.net)
16:31 bluenemo hm maybe keepalive time 10 is just slow..
16:31 bluenemo hm
16:31 bluenemo but the network is ok between the boxes, i'm sure of that
16:32 JoeJulian A network issue shouldn't make nfs exit.
16:32 bluenemo true. also i can already reproduce it by flock
16:32 JoeJulian Let me see that log. ndevos is the expert on nfs, but maybe I'll get lucky.
16:34 ndevos bluenemo: even when I use 1 server and 4 bricks, it still works... might be something in the ubuntu package or such :-/
16:35 bluenemo glusterfs 3.7.4 built on Sep  1 2015 11:42:50
16:36 bluenemo there is an update to 3.7.6 due
16:36 bluenemo that I could install
16:36 RedW joined #gluster
16:36 bluenemo btw when I do that, do you recommend to just "do it" and let the maintainer handle restarts and stuff or would you remount the clients to the other machine (when using nfs)?
16:38 JoeJulian Well that was weird. I just had two servers simultaneously have their kernels crash.
16:39 bluenemo please dont scare me
16:39 JoeJulian Assuming floating ips, I would float the ip away, kill the nfs server, upgrade, float it back then do the other server.
16:39 bluenemo i might get hanged for sth like this happening :D
16:39 JoeJulian No reason to be scared. I hadn't even gotten to gluster.
16:40 bluenemo ah. i have some scripts that just mount the other gluster server
16:40 ndevos bluenemo: http://termbin.com/n412 are the changes related to nfs between 3.7.4 and 3.7.6, but I do not spot anything that look related
16:41 bluenemo me neither, thanks for the paste
16:41 bluenemo ah ok (scaring) :)
16:43 ndevos bluenemo: can you attach a debugged to the nfs process, 'gdb --pid=....', press 'c' for continue and then make it die, and get a backtrace with 't a a bt'
16:44 ndevos s/debugged/debugger/
16:44 glusterbot What ndevos meant to say was: bluenemo: can you attach a debugger to the nfs process, 'gdb --pid=....', press 'c' for continue and then make it die, and get a backtrace with 't a a bt'
16:44 ndevos hey, glusterbot got fixed!
16:44 JoeJulian Yeah, I finally found some time.
16:45 ndevos thanks :)
16:45 JoeJulian Also check dmesg
16:45 JoeJulian Maybe something else is killing it.
16:47 ndevos JoeJulian: can glusterbot post messages based on an RSS feed? like http://serverfault.com/questions/tagged/glusterfs ?
16:47 glusterbot Title: Newest 'glusterfs' Questions - Server Fault (at serverfault.com)
16:47 ndevos well, thats not an rss-feed, but there is one... if I only can find it
16:49 ndevos ah, on the bottom of the page: newest glusterfs questions feed
16:49 bluenemo sth like strace ndevos ?
16:49 ndevos uh, http://serverfault.com/feeds/tag?tagnames=glusterfs&sort=newest
16:49 glusterbot Title: Newest questions tagged glusterfs - Server FaultClustered cron with one server only overlap allowedWhy does rebalancing glusterfs bricks fail? (at serverfault.com)
16:49 bluenemo yes i can :)
16:49 ndevos bluenemo: strace might show something too, but it might not be very useable :)
16:51 bluenemo ah i have the strace output pasted above btw
16:51 bluenemo wait
16:51 JoeJulian Yes... I'm never going to answer one though. ;)
16:52 bluenemo dmesg - thought more would come, but nothing in syslog, all goes to /var/log/gluster i guess http://paste.debian.net/333368/
16:52 glusterbot Title: debian Pastezone (at paste.debian.net)
16:52 bluenemo well yeah all the strace basically tells us is "flock" ;)
16:53 bluenemo and from what I can tell gluster volume status shows nfs dead. + logfiles. btw JoeJulian here is the logfile from omega when alpha dies (webrick log) http://paste.debian.net/hidden/c3335bf9/
16:53 glusterbot Title: Debian Pastezone (at paste.debian.net)
16:53 ndevos bluenemo: I need the strace (or better gdb) output when tracing the gluster/nfs process, not the php or bash script :)
16:54 bluenemo i'm unfamiliar with gdb - can you give me the required options?
16:54 ndevos bluenemo: start it with "gdb --pid=..." and that ... comes from "gluster volume status"
16:54 bluenemo thats the right one, yes? /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/ee8ea47836750063d1ac6eba7d9537a3.socket
16:54 ndevos bluenemo: when connecting gdb. it will halt the process, you need to enter 'c' to have it continue
16:55 JoeJulian gdb --pid=$(pgrep -f nfs.log)
16:55 bluenemo ah ok. going to shoot it now
16:55 ndevos bluenemo: yes, that is the right process, you can find the pid also in /var/lib/glusterd/nfs/run/nfs.pid
16:55 mlncn joined #gluster
16:56 ndevos bluenemo: once it continues, gdb will not print anything, and you should not see a (gdb) prompt either
16:56 squizzi_ joined #gluster
16:56 bluenemo ok
16:56 ndevos bluenemo: once you made the process die, gdb should show a (gdb) prompt again
16:56 ndevos bluenemo: then, you can execute 't a a bt'
16:57 ndevos (which means 'thread apply all backtrace', but gdb knows the abbreviations :)
16:57 bluenemo do i need that? pthread_join.c: No such file or directory.
16:57 bluenemo :)
16:58 ndevos oh, not really, it would help, but I have no idea how to do "debugging the right way" on ubuntu
16:58 ndevos hopefully you would get some useful output never the less, but maybe you need some debuginfo packages or such
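(Putting the gdb steps from this exchange together, the debugging sequence sketched here would be roughly the following; the pid file path comes from the process command line quoted below, and debug symbol packages, if available for the Ubuntu build, make the backtrace far more useful:
    gdb --pid=$(cat /var/lib/glusterd/nfs/run/nfs.pid)
    (gdb) c                                 # let the nfs process continue running
    ... reproduce the failure (run the flock/php script) ...
    (gdb) t a a bt                          # thread apply all backtrace
    (gdb) gcore /var/tmp/gluster-nfs.core   # optionally save a core while gdb still holds the process)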
16:58 ndevos and also, I have to leave in 2 minutes...
16:59 bluenemo oh no :(
16:59 ndevos maybe you can file a bug with the details, exact ubuntu version, glusterfs packages and shell script?
16:59 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
17:00 bluenemo hm :/ need to get this stable today or i cant sleep :/
17:01 ndevos bluenemo: maybe kkeithley_ can help you further
17:01 JoeJulian What's being written that requires a flock?
17:02 JoeJulian Is it something that might be opened by multiple clients and thus actually requires locking?
17:02 JoeJulian s/opened/opened for writing/
17:02 glusterbot What JoeJulian meant to say was: Is it something that might be opened for writing by multiple clients and thus actually requires locking?
17:02 bluenemo i cant get around stuff using flock.
17:02 ndevos it seems to be done by log4php, one of the libraries that are used
17:02 bluenemo its not in my department :(
17:03 JoeJulian Yes you could, that's why I'm asking. Logs... yeah, logging requires locking.
17:03 JoeJulian damn.
17:03 ndevos bluenemo: maybe you can reduce the urgency when you tell your users to run the script only on one server at the same time? (and use "-o nolock")
17:03 JoeJulian Trying to think outside the box to get you stable for prime time.
17:04 JoeJulian That's what I was thinking.
17:04 JoeJulian At least until we get this figured out.
17:04 bluenemo hm. wont work.
17:04 bluenemo :/
17:04 bluenemo got to have this stable by tomorrow.
17:04 bluenemo my next idea would be to try native gluster mount
17:05 ndevos I dont expect an issue with the native mount, and it might work better than a dying nfs server
17:05 bluenemo then I will try that now
17:06 ndevos on the other hand, I also didnt expect an issue with nfs...
17:06 ndevos note that most of the testing is done on CentOS, and Red Hat tests on RHEL, compared to that, only few test on Ubuntu
17:06 ndevos if this is a common issue, nobody reported it :-/
17:07 bluenemo hm just died on random again
17:07 bluenemo random as in i guess some cron or other php stuff
17:07 bluenemo i had gdb on it, what now?
17:07 ndevos do you have the (gdb) prompt there?
17:07 rafi joined #gluster
17:07 bluenemo i had gdb --pid=123 but its not showing much
17:07 bluenemo yes
17:08 ndevos type: t a a bt
17:08 ndevos and press enter
17:08 ndevos I wonder of that shows anything at all?
17:08 bluenemo btw I get those on alpha:  [2015-11-17 17:06:18.744407] E [MSGID: 106243] [glusterd.c:1621:init] 0-management: creation of 1 listeners failed, continuing with succeeded transport [2015-11-17 17:06:20.243542] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
17:09 bluenemo Cannot find new threads: generic error
17:09 ndevos but that is in glusterd.c, it is not part of the nfs-server?
17:09 bluenemo :/
17:09 bluenemo also omega then says: [2015-11-17 17:06:29.267918] E [socket.c:2965:socket_connect] 0-management: Failed to set keep-alive: Invalid argument
17:09 bluenemo but omegas nfs stays online
17:09 ndevos can you try to save a core from within gdb? type: gcore /var/tmp/gluster-nfs.core
17:10 bluenemo what is it trying to do?
17:10 ndevos setting that option can fail, it is not really important, there should not be any side-effects
17:10 bluenemo the path isnt correct with my system i guess
17:10 bluenemo Illegal process-id: /var/tmp/gluster-nfs.core
17:11 ndevos did you type that at the (gdb) prompt?
17:11 overclk joined #gluster
17:11 chirino joined #gluster
17:11 bluenemo says You can't do that without a process to debug :/
17:12 bluenemo meh i think i did sth wrong with it
17:12 ndevos yeah, the process is gone already... gdb can not save the memory to a file anymore
17:12 ndevos hmm, I really would like to know why the process dies
17:13 ndevos kkeithley_: maybe you know how to debug gluster/nfs on ubuntu?
17:13 ndevos I really have to leave now, good luck bluenemo!
17:13 bluenemo JoeJulian, you recommended to use glusterfs --attribute-timeout=HIGH --entry-timeout=HIGH --negative-timeout=HIGH with gluster mount. how high would you set it?
17:13 bluenemo ndevos, thank you very much for your help!!
17:14 JoeJulian Depends on the use case.
17:14 ndevos bluenemo: I hope you figure out what causes it, please tell me tomorrow how things went (or by email)
17:14 bluenemo i'll try to switch to native gluster mount now
17:14 bluenemo this is not the best time and case for debugging either ;)
17:15 bluenemo as i also dont know what in the app causes it
17:15 bluenemo if its "buying stuff" i will die in 2 hours
17:15 bluenemo ;)
17:17 squizzi_ joined #gluster
17:20 bluenemo JoeJulian, so thats my fstab line, what do you think?  alpha:/gfs_fin_web   /mnt    glusterfs    backupvolfile-server=omega,fetch-attempts=10,nobootwait,log-level=WARNING,log-file=/var/log/gluster-mount.log,attribute-timeout=30,entry-timeout=30,negative-timeout=30
17:21 bluenemo ok, so file locking works ok with gluster mount :)
17:21 bluenemo well maybe i'll just switch to that then.
17:22 JoeJulian Yeah, see if that performs to your needs.
17:22 JoeJulian If so it's a more stable solution anyway.
17:24 B21956 joined #gluster
17:25 jbrooks joined #gluster
17:25 bluenemo i'm on it
17:34 primehaxor joined #gluster
17:35 bluenemo ok, switched node one of four to gluster. request load is about average atm. in top, wait stats are sometimes showing, mostly below 0.5
17:44 ivan_rossi left #gluster
17:45 bluenemo waitstat is the same with nfs basically
17:45 bluenemo as in looks similar from top
17:50 JoeJulian yeah, page creation time is the only place you would conceivably see a difference.
17:59 muneerse2 joined #gluster
17:59 kkeithley1 joined #gluster
18:00 m0zes joined #gluster
18:00 Merlin_ joined #gluster
18:02 bennyturns joined #gluster
18:08 plarsen joined #gluster
18:10 bluenemo JoeJulian, now mounted two of the four workers via glusterfs. they look ok so far.
18:11 bluenemo also no errors reported in both alpha and omega gluster log
18:12 Humble joined #gluster
18:15 Rapture joined #gluster
18:19 RJ_ joined #gluster
18:21 rafi1 joined #gluster
18:23 TheSeven joined #gluster
18:25 uebera|| joined #gluster
18:25 uebera|| joined #gluster
18:29 Chr1st1an joined #gluster
18:32 RJ_ Hi glusterexperts, I have a problem with my fusemounts after adding some bricks to my gluster
18:32 RJ_ When I want to create a file in some directories in this fusemount, I get a "Transport endpoint is not connected". But not in all directories
18:33 RJ_ Does anyone have any idea how to tackle this?
18:36 Mr_Psmith joined #gluster
18:44 trapier joined #gluster
18:46 semiosis RJ_: did you open up ports in iptables on the servers for the new bricks?
18:46 semiosis client can't reach the new brick daemons
18:49 RJ_ iptables is all flushed on all storage nodes. And also Selinux has been disabled.
18:50 Boemlauw joined #gluster
18:50 RJ_ So you say that it's probably a network problem?
18:51 glafouille joined #gluster
18:52 bluenemo JoeJulian, do you think I could pull backups via rsync from the brick?
18:53 Merlin_ joined #gluster
18:53 F2Knight joined #gluster
18:56 lpabon joined #gluster
19:02 scubacuda joined #gluster
19:20 gildub_ joined #gluster
19:25 PatNarciso doc is missing -n 8192 option during partition format: https://github.com/gluster/glusterdocs/blob/master/Install-Guide/Configure.md
19:25 glusterbot Title: glusterdocs/Configure.md at master · gluster/glusterdocs · GitHub (at github.com)
19:26 Chr1st1an joined #gluster
19:28 sghatty_ joined #gluster
19:39 curratore joined #gluster
19:40 RJ_ We fixed our problem with not being able to write to certain directories in the fusemount, by rebooting the servers that gave the "Transport endpoint not connected" errors in the logfile.
19:53 PatNarciso xfs related: if 512 inode provides better performance than the default 256, would 1024 be better than 512?
19:53 PatNarciso ref url: http://www.gluster.org/pipermail/gluster-users/2012-October/011468.html
19:53 glusterbot Title: [Gluster-users] mkfs.xfs inode size question (at www.gluster.org)
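(For comparison, a mkfs.xfs invocation combining the larger inode size discussed in that thread with the directory block size option noted above would look roughly like this; the device name is hypothetical, and whether 1024-byte inodes help beyond 512 is exactly the open question here:
    mkfs.xfs -i size=512 -n size=8192 /dev/sdXN)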
19:54 kkeithley1 joined #gluster
19:55 kmai007 joined #gluster
19:55 kmai007 hey guys  i recently upgraded my clients to glusterfs-3.6.0
19:55 kmai007 my bricks are still running 3.5.3
19:55 kmai007 now all my clients have this socket or ib related error in its logs
19:56 kmai007 rpc_clnt_ping_cbk
19:56 kmai007 i've restarted glusterd,fsd, on all my storage nodes, but the client logs continue to spew out those messages
19:56 kmai007 anybody know of a fix?
19:57 kmai007 glusterbot
19:57 kmai007 glusterbot ib
19:57 kmai007 glusterbot help
19:57 glusterbot kmai007: (help [<plugin>] [<command>]) -- This command gives a useful description of what <command> does. <plugin> is only necessary if the command is in more than one plugin. You may also want to use the 'list' command to list all available plugins and commands.
19:57 kmai007 glusterbot list
19:57 glusterbot kmai007: Admin, Alias, Anonymous, Bugzilla, Channel, ChannelStats, Conditional, Config, Dict, Factoids, Google, Herald, Karma, Later, MessageParser, Misc, Network, NickCapture, Note, Owner, Plugin, PluginDownloader, Reply, Seen, Services, String, Topic, Trigger, URL, User, Utilities, and Web
19:58 kmai007 glusterbot topic ib
19:58 glusterbot kmai007: (topic [<channel>]) -- Returns the topic for <channel>. <channel> is only necessary if the message isn't sent in the channel itself.
20:11 kmai007 i cannot find any resolved incidents with the error message of "socket or ib related error"
20:16 mlncn joined #gluster
20:25 kmai007 anybody actively on here?
20:26 mlhess joined #gluster
20:26 PatNarciso yes.
20:26 abyss^ kmai007: I didn't upgrade glusterfs yet, but I'm going to upgrade from 3.3 to 3.7.
20:27 kmai007 i haven't upgraded my storage nodes yet
20:27 abyss^ We will see if I encounter the same issue
20:27 kmai007 is that the problem?
20:27 abyss^ you upgrade only clients?
20:27 kmai007 is the procedure clients->storage
20:27 kmai007 yes i've only done the clients so far
20:27 abyss^ from which version have you upgraded?
20:27 kmai007 it said its backwards compatible
20:27 abyss^ and to what version
20:27 kmai007 3.5.3-1 -> 3.6.0
20:29 abyss^ yes, so it's  backwards compatible
20:30 kmai007 the client logs appear to fill when there is activity done on that filesystem
20:30 abyss^ but 3.6 has a new afrv2 implementation, so you have to turn off self-heal or upgrade the clients as well
20:31 abyss^ see here: http://www.gluster.org/community/documentation/index.php/Upgrade_to_3.6#Upgrade_Steps_For_Quota
20:31 abyss^ see here: http://www.gluster.org/community/documentation/index.php/Upgrade_to_3.6
20:31 kmai007 yes i read that
20:31 kmai007 i've upgraded the clients
20:31 kmai007 i haven't done it on my bricks yet
20:32 abyss^ oh ok I understood that you upgrade servers but not clients
20:32 kmai007 so my situation is the opposite of that doc.
20:35 abyss^ please turn off self heal and then check
20:35 kmai007 ok
20:35 kmai007 all 3 features
20:35 kmai007 entry, data, metadata ?
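(The three client-side self-heal toggles being referred to are volume options; a sketch with VOLNAME as a placeholder:
    gluster volume set VOLNAME cluster.data-self-heal off
    gluster volume set VOLNAME cluster.metadata-self-heal off
    gluster volume set VOLNAME cluster.entry-self-heal off)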
20:37 DV joined #gluster
20:37 kmai007 same issue
20:37 kmai007 no changes
20:38 mlncn joined #gluster
20:42 abyss^ hmmm but volumes work fine? You can access data?
20:42 kmai007 yes i can
20:42 kmai007 its logging so much in /var/log/glusterfs/ that it fills up my filesystem
20:43 abyss^ for me it looks like self-heal issue, but maybe someone more experienced in gluster will say more
20:45 bennyturns joined #gluster
20:55 kmai007 figured it out
20:56 kmai007 it doesn't like me setting the network.ping-timeout
20:56 kmai007 i removed that feature on my volume
20:56 kmai007 and the logs no longer spam my log file
20:56 kmai007 but it doesn't help me b/c i rely on that setting
20:56 kmai007 i have it set to 10 seconds
20:58 kmai007 i just turned it back on
20:59 kmai007 it appears that its working and there is no more logging of the socket or ib error
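(For reference, the two operations described above map to these commands; VOLNAME is a placeholder:
    # remove the custom ping timeout, falling back to the default
    gluster volume reset VOLNAME network.ping-timeout
    # set it back to 10 seconds
    gluster volume set VOLNAME network.ping-timeout 10)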
21:00 abyss^ nice, tommorow there should be someone who could help you and give more advice:)
21:01 mlncn joined #gluster
21:02 wushudoin joined #gluster
21:05 Merlin_ joined #gluster
21:06 natarej joined #gluster
21:07 mhulsman joined #gluster
21:09 dlambrig_ joined #gluster
21:15 dlambrig_ joined #gluster
21:16 dlambrig_ joined #gluster
21:23 DV joined #gluster
21:24 PatNarciso has the samba vfs crash been resolved in 3.7.6?  https://bugzilla.redhat.com/show_bug.cgi?id=1234877
21:24 glusterbot Bug 1234877: high, medium, ---, rhs-smb, NEW , Samba crashes with 3.7.4 and VFS module
21:25 JoeJulian Not according to that.
21:25 PatNarciso word.
21:27 mlhamburg1 joined #gluster
21:43 DV joined #gluster
21:54 DV joined #gluster
21:55 bennyturns joined #gluster
21:59 kovsheni_ joined #gluster
22:00 JoeJulian PatNarciso: I would recommend emailing the assignee of that ticket directly. I don't see anything anywhere I would normally look.
22:00 mlhamburg1 joined #gluster
22:06 dlambrig_ joined #gluster
22:13 ecoreply joined #gluster
22:17 bennyturns joined #gluster
22:18 PatNarciso JoeJulian, understood.  thanks.
22:45 Rapture joined #gluster
22:50 dblack joined #gluster
22:51 rwheeler joined #gluster
22:57 kovshenin joined #gluster
23:33 Mr_Psmith joined #gluster
23:37 DRoBeR joined #gluster
23:54 cjellick joined #gluster
23:55 cjellick hi all! does the glusterfs native client start sshd? and if so why?
23:57 Merlin__ joined #gluster
23:58 JoeJulian no
