
IRC log for #gluster, 2016-02-03


All times shown according to UTC.

Time Nick Message
00:10 raghu joined #gluster
00:42 zhangjn joined #gluster
00:45 aea joined #gluster
00:50 EinstCrazy joined #gluster
01:05 Wizek joined #gluster
01:05 mobaer joined #gluster
01:12 haomaiwa_ joined #gluster
01:32 Lee1092 joined #gluster
01:33 nangthang joined #gluster
01:47 mobaer joined #gluster
02:12 harish joined #gluster
02:17 EinstCrazy joined #gluster
02:21 theron joined #gluster
02:21 nishanth joined #gluster
02:23 haomaiwa_ joined #gluster
02:32 farhoriz_ joined #gluster
03:01 haomaiwa_ joined #gluster
03:25 ovaistariq joined #gluster
03:27 gem joined #gluster
03:27 ovaistariq joined #gluster
03:29 vmallika joined #gluster
03:35 bharata-rao joined #gluster
03:36 calavera joined #gluster
03:41 nangthang joined #gluster
03:46 JoeJulian hagarth: If you're still around, do you have any thoughts on this? These blasted 20TB images that are still healing, if the VM that mounts them does an fsync, they can take more than 5 minutes! I suspect this has to do with the heals and the sync cache and the fact that we have 256Gb of ram. On a hunch, I did a drop_caches on all my servers and booted the client again. It did not have a frame timeout on fsync. Does that make any sense to you, and i
03:46 JoeJulian f so, any thoughts on a workaround?
03:47 JoeJulian s/5 minutes/30 minutes/
03:47 glusterbot What JoeJulian meant to say was: hagarth: If you're still around, do you have any thoughts on this? These blasted 20TB images that are still healing, if the VM that mounts them does an fsync, they can take more than 30 minutes! I suspect this has to do with the heals and the sync cache and the fact that we have 256Gb of ram. On a hunch, I did a drop_caches on all my servers and booted the client again. It
03:47 glusterbot did not have a frame timeout on fsync. Does that make any sense to you, and i
03:49 devilspgd joined #gluster
03:50 nbalacha joined #gluster
03:54 atinm joined #gluster
03:55 hagarth JoeJulian: something seems odd there. have you done any profiling of gluster to see latencies as observed by gluster?
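    (For reference, a minimal sketch of the per-volume profiling hagarth is suggesting, with "gv0" as a hypothetical volume name; JoeJulian's answer follows below.)

        # start collecting per-brick, per-FOP latency statistics (adds a little overhead)
        gluster volume profile gv0 start
        # reproduce the slow fsync from the VM, then dump cumulative and interval numbers
        gluster volume profile gv0 info
        # turn profiling off again when finished
        gluster volume profile gv0 stop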
03:58 JoeJulian No, just noticed that the client application (qemu) was hanging and was uninterruptable. Eventually I was patient (actually, I had to go eat dinner) and when I returned I noticed the frame timeout on fsync.
03:59 JoeJulian My hunch worked out, though. If I drop_caches before I start it, it doesn't time out.
03:59 JoeJulian In fact, it boots faster than it ever has. :D
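    (A sketch of the cache drop JoeJulian describes, run as root on each brick server; this is the standard kernel interface, not anything gluster-specific.)

        # flush dirty pages to disk first, then drop the clean page cache plus dentries/inodes
        sync
        echo 3 > /proc/sys/vm/drop_caches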
04:01 haomaiwa_ joined #gluster
04:03 harish joined #gluster
04:09 itisravi joined #gluster
04:11 shubhendu joined #gluster
04:13 Manikandan joined #gluster
04:20 deepakcs joined #gluster
04:20 ramteid joined #gluster
04:23 David-Varghese joined #gluster
04:34 gem joined #gluster
04:37 calavera joined #gluster
04:42 Saravanakmr joined #gluster
04:42 harish joined #gluster
04:44 sakshi joined #gluster
04:48 toloughl joined #gluster
04:50 rcampbel3 joined #gluster
04:54 atinm joined #gluster
05:00 RameshN joined #gluster
05:01 haomaiwa_ joined #gluster
05:01 pppp joined #gluster
05:05 ashiq joined #gluster
05:07 poornimag joined #gluster
05:10 nehar joined #gluster
05:12 ndarshan joined #gluster
05:12 aravindavk joined #gluster
05:18 gowtham joined #gluster
05:24 jiffin joined #gluster
05:25 hgowtham joined #gluster
05:27 vmallika joined #gluster
05:28 Apeksha joined #gluster
05:30 nishanth joined #gluster
05:31 skoduri joined #gluster
05:32 skoduri joined #gluster
05:38 rcampbel3 joined #gluster
05:40 dusmantkp_ joined #gluster
05:44 Bhaskarakiran joined #gluster
05:47 kdhananjay joined #gluster
05:48 atalur joined #gluster
05:51 overclk joined #gluster
05:52 Saravanakmr joined #gluster
05:54 rafi joined #gluster
05:55 kanagaraj joined #gluster
06:01 haomaiwa_ joined #gluster
06:02 vimal joined #gluster
06:04 anil joined #gluster
06:16 Humble joined #gluster
06:17 karnan joined #gluster
06:19 jiffin joined #gluster
06:20 ramky joined #gluster
06:23 David_Varghese joined #gluster
06:35 kovshenin joined #gluster
06:43 rafi1 joined #gluster
06:53 nehar joined #gluster
06:59 poornimag joined #gluster
07:01 haomaiwa_ joined #gluster
07:03 harish joined #gluster
07:04 inodb joined #gluster
07:06 inodb joined #gluster
07:13 mhulsman joined #gluster
07:19 dusmantkp_ joined #gluster
07:19 post-factum joined #gluster
07:21 robb_nl joined #gluster
07:23 jtux joined #gluster
07:34 [Enrico] joined #gluster
07:44 pppp joined #gluster
07:45 bhuddah joined #gluster
07:47 bhuddah joined #gluster
07:48 doekia joined #gluster
07:55 dusmantkp_ joined #gluster
08:01 haomaiwang joined #gluster
08:04 arcolife joined #gluster
08:09 Wizek joined #gluster
08:09 zhangjn joined #gluster
08:10 zhangjn joined #gluster
08:11 zhangjn joined #gluster
08:12 zhangjn joined #gluster
08:12 poornimag joined #gluster
08:15 zhangjn joined #gluster
08:17 zhangjn joined #gluster
08:21 fedele Good morning #gluster. I have a doubt: can I use a gluster volume built with 32 bricks distributed across 32 nodes connected via an Infiniband network? Will it be reliable?
08:23 [diablo] joined #gluster
08:24 harish joined #gluster
08:29 om joined #gluster
08:30 fsimonce joined #gluster
08:39 post-factum would you like to evaluate the probability of system failure?
08:40 fedele No, my doubt regards another problem:
08:40 fedele When I add a volume example gluster volume add-brick scratch ib-wn030:/bricks/brick1/gscratch0
08:41 DV joined #gluster
08:41 fedele It takes at least 1 minute to return anything, and the answer is:
08:44 fedele sometimes: Request Timeout, sometimes: volume add-brick: failed: Another transaction is in progress Please try again after sometime.
08:44 fedele I have to add 32 bricks and the operation is very very slow.
08:44 fedele in the log /var/log/messages I can see a lot of
08:45 Slashman joined #gluster
08:45 fedele newton-fe etc-glusterfs-glusterd.vol[14814]: [2016-02-03 08:29:54.762855] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-management: server 172.16.2.19:24007 has not responded in the last 30 seconds, disconnecting.
08:46 fedele for each server I have in the pool
08:46 post-factum please check the following: 1) you have connectivity between servers; 2) the firewall does not block glusterfs ports; 3) the node has been probed and is part of the cluster (show us the output of the "gluster peer status" command)
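    (A sketch of how those checks might look from any node; "gluster pool list" needs 3.4 or newer, and glusterd itself listens on 24007 with brick ports allocated above it.)

        gluster peer status   # every peer should read "Peer in Cluster (Connected)"
        gluster pool list     # compact view of peer UUIDs, hostnames and state
        iptables -vnL         # confirm nothing drops 24007 (glusterd) or the brick ports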
08:47 fedele After googling I read that this means I have a flaky network, but I have IB....
08:47 fedele Can anyone help me?
08:47 fedele post-factum: I will answer you:
08:47 dusmantkp_ joined #gluster
08:48 fedele post-factum: 1) connectivity is ok, I also try with ping and with telnet 172.16.2.19 24007 (telnet hangs in this state:
08:49 fedele Trying 172.16.2.19...
08:49 fedele Connected to 172.16.2.19.
08:49 fedele Escape character is '^]'.
08:49 fedele
08:50 fedele post-factum: 2) firewall is down in the nodes of the cluster, so definitively no firewall blocking connections
08:52 rafi joined #gluster
08:54 skoduri joined #gluster
08:54 fedele post-factum: 3) this is the output of gluster peer status http://termbin.com/3zmo
08:55 fedele post-factum: This is all
08:57 sakshi joined #gluster
08:58 gowtham joined #gluster
09:01 EinstCrazy joined #gluster
09:01 haomaiwang joined #gluster
09:02 anti[Enrico] joined #gluster
09:06 kkeithley1 joined #gluster
09:07 kkeithley1 joined #gluster
09:07 post-factum fedele: you have rejected peers. may be, that is the cause
09:10 ppai joined #gluster
09:12 kshlm joined #gluster
09:13 overclk joined #gluster
09:14 kovshenin joined #gluster
09:15 ctria joined #gluster
09:16 fedele post-factum: how can I correct the problem?
09:17 post-factum usually, stopping glusterd and wiping /var/lib/glusterd/vols does the trick. the node will re-fetch volume metadata from the other nodes
09:18 post-factum anyway, before wiping, backup that folder
09:18 fedele thank you, I will try
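    (Spelled out, post-factum's suggestion looks roughly like this, run only on the rejected peer; the backup path and the service/systemctl choice are assumptions depending on the distro.)

        service glusterd stop                                   # or: systemctl stop glusterd
        cp -a /var/lib/glusterd/vols /root/glusterd-vols.bak    # keep a backup, as advised
        rm -rf /var/lib/glusterd/vols/*
        service glusterd start                                  # volume metadata is re-fetched from the peers
        gluster peer status                                     # the peer should no longer show Rejected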
09:22 zhangjn joined #gluster
09:23 fedele post-factum: I run gluster peer status but now no Rejected nodes: http://termbin.com/cmw0
09:25 RameshN joined #gluster
09:26 post-factum fedele: could you please show ib-wn030 firewall entries related to glusterfs?
09:26 post-factum fedele: 24007 is not enough to open
09:28 Humble joined #gluster
09:29 EinstCrazy joined #gluster
09:30 post-factum or there is no firewall *at all*?
09:31 post-factum sudo iptables -vnL
09:40 shaunm joined #gluster
09:41 DV joined #gluster
09:44 fedele [root@wn030 ~]# iptables -vnL
09:44 fedele Chain INPUT (policy ACCEPT 6760K packets, 80G bytes)
09:44 fedele pkts bytes target     prot opt in     out     source               destination
09:44 fedele Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
09:44 fedele pkts bytes target     prot opt in     out     source               destination
09:44 fedele Chain OUTPUT (policy ACCEPT 2655K packets, 169M bytes)
09:44 fedele pkts bytes target     prot opt in     out     source               destination
09:44 fedele [root@wn030 ~]#
09:45 fedele post-factum: firewall configuration  is the same on all nodes of the cluster
09:46 fedele by the way this is the output of gluster volume info
09:46 fedele [root@newton-fe /]# gluster volume info
09:46 fedele
09:46 fedele Volume Name: scratch
09:46 fedele Type: Distribute
09:46 fedele Volume ID: fc6f18b6-a06c-4fdf-ac08-23e9b4f8053e
09:46 fedele Status: Created
09:46 fedele Number of Bricks: 29
09:46 fedele Transport-type: tcp
09:46 fedele Bricks:
09:46 fedele Brick1: ib-wn001:/bricks/brick1/gscratch0
09:46 fedele Brick2: ib-wn002:/bricks/brick1/gscratch0
09:46 fedele Brick3: ib-wn003:/bricks/brick1/gscratch0
09:46 fedele Brick4: ib-wn004:/bricks/brick1/gscratch0
09:46 fedele Brick5: ib-wn005:/bricks/brick1/gscratch0
09:46 fedele Brick6: ib-wn006:/bricks/brick1/gscratch0
09:46 fedele Brick7: ib-wn007:/bricks/brick1/gscratch0
09:46 fedele Brick8: ib-wn008:/bricks/brick1/gscratch0
09:46 fedele Brick9: ib-wn009:/bricks/brick1/gscratch0
09:46 gem joined #gluster
09:46 fedele Brick10: ib-wn010:/bricks/brick1/gscratch0
09:46 fedele Brick11: ib-wn011:/bricks/brick1/gscratch0
09:46 fedele Brick12: ib-wn012:/bricks/brick1/gscratch0
09:47 fedele Brick13: ib-wn013:/bricks/brick1/gscratch0
09:47 fedele Brick14: ib-wn014:/bricks/brick1/gscratch0
09:47 fedele Brick15: ib-wn015:/bricks/brick1/gscratch0
09:47 fedele Brick16: ib-wn016:/bricks/brick1/gscratch0
09:47 fedele Brick17: ib-wn017:/bricks/brick1/gscratch0
09:47 fedele Brick18: ib-wn018:/bricks/brick1/gscratch0
09:47 fedele Brick19: ib-wn019:/bricks/brick1/gscratch0
09:47 fedele Brick20: ib-wn020:/bricks/brick1/gscratch0
09:47 fedele Brick21: ib-wn022:/bricks/brick1/gscratch0
09:47 fedele Brick22: ib-wn021:/bricks/brick1/gscratch0
09:47 misc can you use a paste website next time ?
09:47 fedele Brick23: ib-wn023:/bricks/brick1/gscratch0
09:47 fedele Brick24: ib-wn024:/bricks/brick1/gscratch0
09:47 fedele Brick25: ib-wn025:/bricks/brick1/gscratch0
09:47 fedele Brick26: ib-wn026:/bricks/brick1/gscratch0
09:47 fedele Brick27: ib-wn027:/bricks/brick1/gscratch0
09:47 fedele Brick28: ib-wn028:/bricks/brick1/gscratch0
09:47 fedele Brick29: ib-wn029:/bricks/brick1/gscratch0
09:47 fedele Options Reconfigured:
09:47 fedele performance.readdir-ahead: on
09:47 fedele [root@newton-fe /]#
09:47 fedele OK, excuse me
09:51 mhulsman joined #gluster
09:52 Bhaskarakiran joined #gluster
09:57 zhangjn_ joined #gluster
09:58 gildub joined #gluster
10:00 zhangjn_ joined #gluster
10:00 Bhaskarakiran joined #gluster
10:00 mobaer joined #gluster
10:01 haomaiwa_ joined #gluster
10:02 zhangjn_ joined #gluster
10:02 zhangjn_ joined #gluster
10:05 anoopcs fedele, fpaste.org
10:05 kdhananjay joined #gluster
10:05 anoopcs fedele, ^^
10:05 DV joined #gluster
10:06 itisravi joined #gluster
10:12 dusmantkp_ joined #gluster
10:12 Saravanakmr joined #gluster
10:15 zhangjn_ joined #gluster
10:15 gem_ joined #gluster
10:18 post-factum fedele: could you also please show "sudo ss -tunlp | grep gluster" on 30th node?
10:18 mhulsman joined #gluster
10:37 rcampbel3 joined #gluster
10:49 fedele post-factum: at the end I added 30th node. I don't know if the trick was the change suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1272436
10:49 glusterbot Bug 1272436: high, high, ---, amukherj, ASSIGNED , glusterd crashing
10:50 fedele In /etc/glusterfs/glusterd.vol (across the whole cluster) I changed the option ping-timeout value from 30 to 1 and I added the line
10:51 fedele excuse me, option ping-timeout 0, and added the line
10:51 fedele option event-threads 1
10:52 fedele now I have 32 bricks on 32 nodes
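    (Roughly what the edited /etc/glusterfs/glusterd.vol stanza would look like; the surrounding option lines are whatever the package shipped and may differ by version, only the last two options are fedele's edit, and glusterd has to be restarted on each node for the file to be re-read.)

        volume management
            type mgmt/glusterd
            option working-directory /var/lib/glusterd
            option transport-type socket,rdma
            option ping-timeout 0
            option event-threads 1
        end-volume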
10:53 harish joined #gluster
10:54 fedele Anyway, this is the output of ss -tunlp | grep gluster  ->  http://termbin.com/xgge
11:01 haomaiwa_ joined #gluster
11:02 amye joined #gluster
11:09 harish_ joined #gluster
11:10 harish joined #gluster
11:13 deepakcs joined #gluster
11:13 zhangjn joined #gluster
11:13 rafi1 joined #gluster
11:18 Wizek joined #gluster
11:24 post-factum fedele++
11:24 glusterbot post-factum: fedele's karma is now 1
11:27 rafi joined #gluster
11:30 mhulsman joined #gluster
11:35 kovshenin joined #gluster
11:36 bfoster joined #gluster
11:37 jvandewege Hello, Anyone available to explain why ovirt seems to have problems with the following volume?
11:38 jvandewege volume create gv_ovirt_test01 replica 3 arbiter 1 stor01:/gluster/br_test01/gl_test01 stor02:/gluster/br_test01/gl_test01 vhost01:/gluster/br_test01/gl_test01
11:39 jvandewege Creating a storage domain on that volume ends in errors about zero-length files or not being able to create the admin folders.
11:40 jvandewege Seems like the arbiter is used to read back data? (I read that the arbiter only stores metadata, not actual data) Any way to force a mount to use only the two data bricks?
11:42 karthikfff joined #gluster
11:42 [diablo] afternoon guys
11:43 [diablo] can anyone tell me how to get the gluster volume-id from getfattr in readable text please?
11:44 baojg joined #gluster
11:44 [diablo] trusted.glusterfs.volume-id=0sOd2MNDDHRGCv9O/KlqOF5g==
11:44 [diablo] I'd like to know what it's set to
11:45 [diablo] or is the UUID rather than the text name
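    (The "0s" prefix is getfattr's base64 encoding, and the 16 bytes behind it are the volume's UUID, i.e. the "Volume ID:" from gluster volume info, not the text name. A sketch, with /bricks/brick1 as a hypothetical brick path; the base64 string is the one pasted above.)

        # ask getfattr for hex instead of base64
        getfattr -n trusted.glusterfs.volume-id -e hex /bricks/brick1
        # or decode the pasted value directly (prints the UUID as 32 hex digits, no dashes)
        echo "Od2MNDDHRGCv9O/KlqOF5g==" | base64 -d | xxd -p
        # compare against the "Volume ID:" line of
        gluster volume info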
11:52 [Enrico] joined #gluster
11:52 Etzeitet joined #gluster
11:55 deepakcs joined #gluster
11:56 Etzeitet Hi
11:56 glusterbot Etzeitet: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
11:58 kshlm Gluster community meeting is starting in 2 minutes in #gluster-meeting
11:58 Bhaskarakiran joined #gluster
12:00 bluenemo joined #gluster
12:01 Etzeitet I am very new to GlusterFS and have been playing around with it with a view to using it in production. Unfortunately I am struggling to find any information on the behaviour of the Gluster native client. Is there any current documentation that describes how the client works? There is precious little information beyond how to install and mount using the native client. Most information I have has come via old blog posts and whatever messages Google has
12:01 Etzeitet dug up from the mailing lists.
12:01 haomaiwang joined #gluster
12:01 gem joined #gluster
12:03 theron joined #gluster
12:04 theron joined #gluster
12:06 RameshN joined #gluster
12:11 MACscr hmm, ive had the same gluster heal status for like 12 hours now
12:11 julim joined #gluster
12:12 kanagaraj joined #gluster
12:15 csaba joined #gluster
12:26 jdarcy joined #gluster
12:31 jiffin1 joined #gluster
12:31 bfoster1 joined #gluster
12:34 poornimag joined #gluster
12:34 poornimag joined #gluster
12:34 18WABY79K joined #gluster
12:34 Wizek joined #gluster
12:35 afics left #gluster
12:35 arcolife joined #gluster
12:35 Humble joined #gluster
12:36 tswartz joined #gluster
12:37 ashiq joined #gluster
12:41 tswartz joined #gluster
12:42 zhangjn joined #gluster
12:44 nisroc joined #gluster
12:48 zhangjn joined #gluster
12:49 zhangjn joined #gluster
12:57 julim joined #gluster
13:01 luizcpg joined #gluster
13:01 haomaiwang joined #gluster
13:04 Pupeno joined #gluster
13:04 theron joined #gluster
13:06 gildub joined #gluster
13:09 dusmantkp_ joined #gluster
13:12 shubhendu joined #gluster
13:13 tswartz joined #gluster
13:14 nishanth joined #gluster
13:16 ira_ joined #gluster
13:17 robb_nl joined #gluster
13:17 mobaer joined #gluster
13:18 RameshN joined #gluster
13:23 rafi1 joined #gluster
13:24 karnan joined #gluster
13:24 unclemarc joined #gluster
13:32 tswartz joined #gluster
13:34 kovshenin joined #gluster
13:36 Guest17375 joined #gluster
13:36 haomaiwa_ joined #gluster
13:40 mhulsman joined #gluster
13:44 rcampbel3 joined #gluster
13:51 baojg joined #gluster
13:53 RameshN joined #gluster
14:01 haomaiwang joined #gluster
14:03 nbalacha joined #gluster
14:06 theron joined #gluster
14:07 baojg joined #gluster
14:09 baojg joined #gluster
14:14 amye joined #gluster
14:18 B21956 joined #gluster
14:18 dlambrig joined #gluster
14:22 shubhendu joined #gluster
14:25 plarsen joined #gluster
14:25 nishanth joined #gluster
14:27 plarsen joined #gluster
14:29 plarsen joined #gluster
14:32 baojg joined #gluster
14:46 rafi joined #gluster
14:48 julim joined #gluster
14:49 baojg joined #gluster
14:50 hamiller joined #gluster
14:52 skylar joined #gluster
15:01 haomaiwa_ joined #gluster
15:02 theron joined #gluster
15:04 zhangjn joined #gluster
15:04 vmallika joined #gluster
15:05 mhulsman joined #gluster
15:08 jtux joined #gluster
15:12 bennyturns joined #gluster
15:20 jiffin joined #gluster
15:28 vmallika joined #gluster
15:35 farhoriz_ joined #gluster
15:35 baojg joined #gluster
15:45 dlambrig joined #gluster
15:50 The_Ball I'm playing with the shard translator and it seems to work very well. Is it still considered unstable in 3.7?
15:51 zhangjn joined #gluster
15:53 neofob joined #gluster
15:53 RameshN joined #gluster
15:56 sturcotte06 joined #gluster
15:56 baojg joined #gluster
15:56 rcampbel3 joined #gluster
15:56 sturcotte06 Hey all
15:56 sturcotte06 I have an issue with relatime and glusterFS
15:57 nickage__ joined #gluster
15:57 sturcotte06 stating a file updates the access time of the file
15:57 sturcotte06 and I wondered if anybody could help with that
15:57 kovshenin joined #gluster
16:01 haomaiwa_ joined #gluster
16:03 Guest72094 joined #gluster
16:13 arcolife joined #gluster
16:14 rcampbel3 joined #gluster
16:18 theron joined #gluster
16:21 skoduri joined #gluster
16:27 overclk sturcotte06, it shouldn't actually. what's the brick filesystem?
16:39 JoeJulian sturcotte06: Did you mount the *bricks* with relatime, or just tried to do that with the fuse mount?
16:40 kovshenin joined #gluster
16:40 JoeJulian Because it's the kernel that handles updating atime, so it's done at the bricks.
16:41 JoeJulian Personally, I always mount my bricks with noatime.
16:42 overclk JoeJulian, ndevos and sturcotte06 are discussing this in dev channel (checked just now)
16:42 sturcotte06 the bricks are mounted with relatime
16:42 JoeJulian Oh, sturcotte06 is a developer?
16:43 sturcotte06 nah I'm a user
16:43 JoeJulian Then ndevos should have been discussing that in here so we could all benefit... tsk, tsk... ;)
16:44 sturcotte06 haha, well I've sent a email to gluster-users@gluster.org explaining the issue
16:44 shortdudey123 joined #gluster
16:44 julim joined #gluster
16:45 JoeJulian Eww, yeah, I'm not sure I would want that fixed. At least not by default.
16:45 sturcotte06 performance wise, maybe
16:45 JoeJulian They would have to check the atime, get the xattrs, then set the atime back.
16:45 sturcotte06 but this cause a lot of issue on our side
16:46 sturcotte06 we have a lot of file rotation, and so we need a cleanup system to clear unused files
16:46 sturcotte06 and we needed to tail log files to assess accesses to files
16:47 sturcotte06 now we made a new version of the cleanup system, and we based it on relatime
16:47 sturcotte06 so we could know exactly when the file is accessed
16:47 sturcotte06 but asking the question changes the answer
16:48 JoeJulian I wonder if the new tiering system could be used for that.
16:48 shubhendu joined #gluster
16:50 robb_nl joined #gluster
16:53 JoeJulian I think the new tiering tools store their own version of atime in xattrs. If that could be accessed through a client mount, it might get you what you need.
17:01 16WAARLTH joined #gluster
17:02 David_Varghese joined #gluster
17:04 rcampbel3 joined #gluster
17:08 dlambrig joined #gluster
17:09 Pupeno joined #gluster
17:10 overclk sturcotte06, JoeJulian: atime is also updated (on relatime) when mtime is updated to a value newer to atime.
17:12 JoeJulian Right, but the issue is that just reading the stats on a replicated volume updates atime, so when he wants to see when a file was last accessed, he's updating the atime and skewing the results.
17:14 overclk JoeJulian, I think sturcotte06 updates mtime too (according to -dev channel). If atime gets updated after subsequent stats then it's probably a bug.
17:14 sturcotte06 my test includes an update to mtime
17:14 sturcotte06 however, this is the setup of the test, not the test itself
17:15 overclk sturcotte06, ok. so the next stat should give updated atime. subsequent stat() should not.
17:15 sturcotte06 the test being, with a file having an mtime of 2016-02-03 and an atime of 1970-01-01, stating the file updates the atime to 2016-02-03
17:16 calavera joined #gluster
17:16 sturcotte06 so observing the atime of a file updates it
17:17 sturcotte06 basically, my cleanup system simply walks the file system and finds all files whose access time is older than 30 days
17:17 sturcotte06 and delete them
17:17 rafi joined #gluster
17:17 sturcotte06 however, querying the access time updates it, which means no file will ever be cleaned up
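    (For context, that cleanup pass boils down to something like the sketch below, with /mnt/vol as a hypothetical fuse mount point; under relatime on a plain local filesystem the stat that find performs would not push atime forward, which is why the behaviour seen on the replicated gluster mount looks like a bug.)

        # delete regular files whose atime is older than 30 days
        find /mnt/vol -type f -atime +30 -print -delete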
17:18 mhulsman joined #gluster
17:19 _Bryan_ joined #gluster
17:30 kmai007 joined #gluster
17:31 kmai007 hey guys, have you guys made any changes to the network.ping-timeout?  has this method been verified? http://thornelabs.net/2015/02/24/change-gluster-volume-connection-timeout-for-glusterfs-native-client.html
17:31 glusterbot Title: Change Gluster Volume Connection Timeout for GlusterFS Native Client (at thornelabs.net)
17:32 kmai007 i guess i see in my client logs that the specified time is 10 sec, but i'm not sure if this is a valid solution or even required?
17:32 JoeJulian @ping-timeout
17:32 glusterbot JoeJulian: The reason for the long (42 second) ping-timeout is because re-establishing fd's and locks can be a very expensive operation. Allowing a longer time to reestablish connections is logical, unless you have servers that frequently die.
17:34 kmai007 understood, i need a faster recovery so I've changed it to 10sec.  But I wanted to find out from the rest of the community if they too have lowered the setting.
17:34 The_Ball kmai007, I use Gluster with oVirt which requires lowering the timeout to 10 seconds
17:34 kmai007 my web content hosts workload is primarily read-only
17:35 kmai007 The_Ball:you didn't have to make additional changes to the .vol file?  The cmdline was sufficient?
17:36 The_Ball No the gluster volume set command is sufficient
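    (The one-liner The_Ball means, with "gv0" as a hypothetical volume name; clients pick the new value up over the management connection, and the setting shows under "Options Reconfigured".)

        gluster volume set gv0 network.ping-timeout 10
        # verify
        gluster volume info gv0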
17:37 kmai007 last week I had 2 clients hit the rpc client ping timeout, disconnect, and immediately reconnect.  Now I have to figure out why, and the theory from the web is the "network"....yet my network guys claim there wasn't anything that would have triggered the client to disconnect
17:38 kmai007 is there anything I can read to further understand when the client-handshake.c:127:rpc_client_ping_timer_expired gets triggered
17:40 The_Ball kmai007, "cloud" hosted?
17:41 kmai007 no, physical servers
17:41 MACscr i still have the same 8 entries on my heal info output that i did 18 hours ago
17:41 kmai007 distri-rep, 4x2
17:42 kmai007 MACscr: on all your hosts, or just 1 ?
17:42 JoeJulian Shortening the ping-timeout is silly. Your server mtbf is so long that unless you have several thousand in your volume, the likelihood of hitting that timeout is exceptionally small.
17:42 MACscr kmai007: all. i only have 2
17:42 julim joined #gluster
17:43 JoeJulian 42 seconds per hundred years isn't really a big deal.
17:44 kmai007 it's not frequent, but it happened, and of course i have to find out why.  Of course for web hosting, if I leave the 42 as default, it will be deemed too long when it does trigger.
17:45 MACscr kmai007: http://paste.debian.net/plain/378376
17:45 kmai007 I'm not arguing, Just trying to defend myself from the network
17:45 JoeJulian What about when you set it too short and you end up beaconing because it takes longer than ping-timeout to recover?
17:45 kmai007 the way I have it configured, we have 36 web content servers, and only 2 of them disconnected from 1 specific storage node
17:45 MACscr JoeJulian: my favorite person. howdy =)
17:45 JoeJulian :)
17:46 kmai007 agreed. i don't want to start a domino effect with disconnects b/c of what my mngt deems as too long
17:46 kmai007 all within 10 secs. i mean i'm happy with 10 secs. out of 365+ days of uptime
17:47 JoeJulian My point is that far too often someone reads some stupid idiot's web page or howto.com and causes themselves more problems in the long run because they don't understand how something works.
17:47 kmai007 sorry MACscr my IA team has deemed pastebin/fedora/ paste sites as cloud, and blocked it from the proxy :-(
17:47 JoeJulian Once someone can argue with me why they need to do what they did, they're usually right.
17:48 kmai007 JoeJulian: well said, thats why I didn't jump....just wanted to read your words, b/c i can't hear your voice :-P~
17:48 JoeJulian :)
17:48 JoeJulian btw... that web page is someone who has no clue what they're doing.
17:49 JoeJulian I'm tempted to rant...
17:49 calavera_ joined #gluster
17:50 kmai007 MACscr: maybe send your output to khoi.mai2008@gmail.com
17:50 MACscr kmai007: is there a pastebin that is safe?
17:50 rcampbel3 fpaste ?
17:50 kmai007 JoeJulian: where can i read more about the rpc_client_ping_timer_expired and how it works.
17:50 JoeJulian gist?
17:51 kmai007 did fpaste start to offer dropbox? b/c I think thats blocked for me too
17:51 JoeJulian kmai007: Unfortunately, the same place I get my information about such things... in the source. :(
17:51 MACscr https://gist.github.com/MACscr/5028fa2f656a77708c1d
17:51 glusterbot Title: gist:5028fa2f656a77708c1d · GitHub (at gist.github.com)
17:51 kmai007 oh sh*t, fpaste.bin works
17:51 kmai007 perfect
17:51 kmai007 https://paste.fedoraproject.org/
17:51 glusterbot Title: New paste Fedora Project Pastebin (at paste.fedoraproject.org)
17:52 MACscr https://paste.fedoraproject.org/318090/54521914/raw/
17:52 kmai007 thanks JoeJulian, i'll try to read that, when i'm snowed in
17:52 kmai007 MACscr: what version of glusterfs are you running?  I don't recall the heal info output as dynamic,
17:53 MACscr glusterfs 3.7.6 built on Nov  9 2015 15:17:09
17:53 kmai007 is the file reachable?
17:53 kmai007 from client?
17:53 MACscr all those files? yes
17:53 MACscr they are all in luse
17:53 MACscr use
17:53 MACscr as they are raw images that are served as iscsi luns
17:53 kmai007 i always have heal info, but the split-brain and failed are the ones that concern me most
17:55 kmai007 i'm still on glusterfs3.4.2-1
17:55 kmai007 how has 3.7.6 treated ya?
17:55 kmai007 is that your production environment?
17:56 MACscr yes, I am a pretty small outfit
17:57 JoeJulian I commented on that post. And I refrained from calling anyone an idiot. I'm feeling pretty proud of myself.
17:57 kmai007 i'm not certain, but I think if you don't have any split-brain/failed output, you're okay.....
17:58 kmai007 JoeJulian: thanks for that, the www has so much info, that i can't decide what is BS vs. GOLD
17:58 JoeJulian I'll try to post more gold.
17:59 kmai007 thanks, golden nuggets
17:59 MACscr JoeJulian: too many of us have to wear so many hats that we can't learn the inner workings of everything and do rely a bit on howtos and other people's work
17:59 MACscr i dont have to be a mechanic to drive a car or to do some regular standard maintenance to it
17:59 JoeJulian I agree. And in such cases, I advise you trust the developers to default to sane settings.
18:00 MACscr i absolutely do
18:00 MACscr though sometimes ive found that the defaults arent always a good idea, but many projects are just that poorly managed
18:00 JoeJulian It's gotten better, but for a few years when I first started hanging out here, everyone came in the channel and asked how to tune gluster to make it faster for everything.
18:01 MACscr if you want fast, you dont go with gluster anyway =P
18:01 JoeJulian It was getting so bad I suggested to hagarth that they start changing the defaults to insane values so I could give them better advice.
18:01 haomaiwa_ joined #gluster
18:01 JoeJulian If you want fast, you don't go with clustered systems.
18:01 MACscr true
18:02 JoeJulian Clustering gives you reliability.
18:02 JoeJulian And/or throughput.
18:02 MACscr so back to my issues =)
18:02 JoeJulian You need a mechanic?
18:03 JoeJulian MACscr: Are all those images under heavy usage?
18:03 Rapture joined #gluster
18:04 MACscr the raw images? not really
18:04 MACscr they are just hypervisors, so all the io is on ceph disks
18:05 JoeJulian Pick one image and pull the ,,(extended attributes) from both servers and let's take a look.
18:05 glusterbot (#1) To read the extended attributes on the server: getfattr -m .  -d -e hex {filename}, or (#2) For more information on how GlusterFS uses extended attributes, see this article: http://pl.atyp.us/hekafs.org/index.php/2011/04/glusterfs-extended-attributes/
18:06 MACscr umm, was that command supposed to output something?
18:07 JoeJulian Are you root?
18:07 MACscr yes
18:07 JoeJulian Then yes.
18:07 MACscr well it did not
18:07 JoeJulian On the brick, not the client.
18:07 MACscr oh
18:07 JoeJulian Aha
18:09 MACscr https://paste.fedoraproject.org/318110/45229401/
18:09 glusterbot Title: #318110 Fedora Project Pastebin (at paste.fedoraproject.org)
18:10 JoeJulian Odd... can I see a stat of each of those too, please?
18:10 MACscr Are timestamps for logs not configurable? i think its moronic not to use the selected tz of the host
18:11 MACscr i dont care if people want UTC, if they want it, they can set their systems to use that, but its stupid to have it unique to gluster
18:12 JoeJulian file a bug report
18:12 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
18:12 MACscr so its really not possible?
18:13 MACscr wow that is arrogant of the devs
18:13 JoeJulian You could change the syslog log-level and the file log-level so your logs go to syslog instead.
18:14 MACscr great to know. thanks!
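    (A sketch of what JoeJulian means, using volume1 as the volume is named further down; the diagnostics option names are as they exist in 3.7, and syslog will typically stamp entries in the host's local timezone.)

        # route messages to syslog instead of the flat log files
        gluster volume set volume1 diagnostics.brick-sys-log-level INFO
        gluster volume set volume1 diagnostics.client-sys-log-level INFO
        # and quiet the flat log files down to critical messages only
        gluster volume set volume1 diagnostics.brick-log-level CRITICAL
        gluster volume set volume1 diagnostics.client-log-level CRITICAL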
18:14 theron joined #gluster
18:17 MACscr stat outut https://paste.fedoraproject.org/318112/23420145/
18:17 glusterbot Title: #318112 Fedora Project Pastebin (at paste.fedoraproject.org)
18:17 MACscr output
18:17 baojg joined #gluster
18:19 JoeJulian Dammit. gluster2 is still not hardlinked. If it was me, I'd kill glusterfsd on gluster2, wipe the gluster2 brick, force start the volume and let it re-replicate.
18:21 Pupeno joined #gluster
18:23 monotek joined #gluster
18:24 MACscr ok, i thought we did that before, but we never wiped the /var/gluster-storage on gluster2, just the hidden files
18:24 MACscr so rm -rf /var/gluster-storage/
18:24 MACscr ?
18:25 MACscr well kill the daemon first
18:25 JoeJulian right
18:25 lupine joined #gluster
18:25 MACscr what about the other gluster processes?
18:26 JoeJulian They're fine.
18:27 MACscr ok, did that, but when i look at 'gluster volume status volume1', it says the brick for gluster2 is offline
18:28 JoeJulian mkdir /var/gluster-storage and start...force again.
18:28 MACscr says it exists
18:29 JoeJulian Is status still offline?
18:30 MACscr did another force start and it appears to be online again
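    (Put together, the sequence JoeJulian walked MACscr through looks roughly like this; <brick-pid> is a placeholder, not a real value, and the final heal command just kicks off a full self-heal rather than waiting for the next crawl.)

        # on gluster2
        gluster volume status volume1        # note the PID of gluster2's brick process
        kill <brick-pid>                     # placeholder: the PID noted above
        rm -rf /var/gluster-storage
        mkdir /var/gluster-storage
        # from any peer
        gluster volume start volume1 force   # respawns the brick daemon
        gluster volume heal volume1 full     # start re-replicating immediately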
18:32 lupine afternoon. I'm just considering a glusterfs-for-vm-storage setup, and I'm wondering about the volfile-server / backup-volfile-servers  / volfile(?) mount options. Should I use a DNS round robin for volfile-server? Or can I just have a volfile I can deploy to all my VM hosts?
18:33 JoeJulian You don't want to hand-write volfiles. You lose all the dynamic management ability.
18:33 JoeJulian I prefer rrdns, but some prefer backup-volfile-servers in their mount options.
18:34 lupine well, whether I'm writing to /etc/fstab or /etc/my-volfile, it's going to be done by ansible which knows about all the storage servers
18:34 JoeJulian But, like most things that are based on opinion, my way is the right way. ;)
18:36 lupine ah, are you saying the volfile contains more than just a list of server names?
18:36 JoeJulian You *can* use volfiles, but you can no longer use glusterd to do anything with the volume. You cannot add nor remove servers, replace bricks, change volume settings, etc.
18:37 lupine yeah, that doesn't strike me as fun
18:37 JoeJulian Hehe, not unless you have a very warped sense of fun.
18:37 lupine hmm. I guess I could add a couple of servers to the cluster (gluster?), give them no bricks, and use them as my primary and backup volfile-servers ?
18:38 lupine I'm really trying to get away from having a looong list of servers in /etc/fstab on all the vm hosts
18:38 post-factum lupine: make DNS RR
18:39 lupine rrdns isn't a horrible option, but I'm generally skeptical of it being respected
18:39 lupine that might be unfounded in the fine glusterfs code, of course ;)
18:40 post-factum RR is server-side feature. DNS server returns A records in round-robin manner
18:41 JoeJulian rrdns has been supported and tested in gluster.
18:41 JoeJulian by me.
18:41 JoeJulian well, tested by me.
18:41 lupine :D
18:41 post-factum and by devs, as I saw RR mentioned somewhere in docs
18:42 JoeJulian That's been in there since 3.1, iirc.
18:42 post-factum but yeah, everybody lies, and one have to test it anyway
18:42 JoeJulian trust but verify
18:43 lupine mm, we're still using tinydns here so anything could happen
18:45 JoeJulian If you do want to use the backup.*servers setting, you only have to add enough servers that you think you won't miss in a fault situation. Normally, two extra servers is going to get you in the 6 nines probability range.
18:46 mhulsman joined #gluster
18:48 lupine mm, there's just a bit of a scare about having effectively two tiers of storage server where currently there's one. probably unjustified in practice
18:48 * lupine is trying very hard to move us from a mass NBD deployment
18:50 JoeJulian @mount server
18:50 glusterbot JoeJulian: (#1) The server specified is only used to retrieve the client volume definition. Once connected, the client connects to all the servers in the volume. See also @rrdns, or (#2) One caveat is that the clients never learn of any other management peers. If the client cannot communicate with the mount server, that client will not learn of any volume changes.
18:51 post-factum see?
18:51 post-factum what a clever bot
18:51 JoeJulian I can't remember if I ever got around to filing a bug on #2...
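    (The two approaches side by side, as hypothetical /etc/fstab lines; the rrdns name and server names are made up, and per the caveat above, backup-volfile-servers is only consulted while fetching the volfile at mount time.)

        # option 1: round-robin DNS record resolving to every storage server in turn
        gluster.example.com:/gv0  /mnt/gv0  glusterfs  defaults,_netdev  0 0

        # option 2: explicit fallback servers for the volfile fetch
        server1:/gv0  /mnt/gv0  glusterfs  defaults,_netdev,backup-volfile-servers=server2:server3  0 0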
18:55 fubada joined #gluster
18:56 fubada hi, How do I change existing cluster from "Number of Bricks: 2 x 2 = 4" to "Number of Bricks: 1 x 2 = 2"?
18:57 fubada I want to scale down from 4 gluster peers down to 2
18:58 JoeJulian gluster volume remove-brick $brick3 $brick4 start ; wait for "gluster volume remove-brick $brick3 $brick4 status" to show complete. gluster volume remove-brick $brick3 $brick4 commit
18:58 fubada thank you JoeJulian
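    (Spelled out with a hypothetical volume name and brick paths: the volume name goes before the brick list, and on a 2 x 2 volume the two bricks removed must together form one complete replica pair so the remaining 1 x 2 keeps both copies.)

        gluster volume remove-brick gv0 server3:/bricks/b1 server4:/bricks/b1 start
        # poll until the data has been migrated off the pair and status shows "completed"
        gluster volume remove-brick gv0 server3:/bricks/b1 server4:/bricks/b1 status
        # only then make it permanent
        gluster volume remove-brick gv0 server3:/bricks/b1 server4:/bricks/b1 commit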
19:01 haomaiwa_ joined #gluster
19:06 julim joined #gluster
19:17 lord4163 joined #gluster
19:20 theron joined #gluster
19:22 kovshenin joined #gluster
19:24 B21956 joined #gluster
19:29 rossdm joined #gluster
19:33 baojg joined #gluster
19:52 LDA2 joined #gluster
19:55 robb_nl joined #gluster
19:58 mobaer joined #gluster
20:01 18VAAAWKW joined #gluster
20:20 nage joined #gluster
20:30 klaxa joined #gluster
20:30 deniszh joined #gluster
20:32 raghu joined #gluster
20:37 baojg joined #gluster
20:39 fubada JoeJulian: are there known issues with starting 3.6 volumes after upgrading to 3.7.6?
20:39 JoeJulian None that I've heard about.
20:39 fubada https://gist.github.com/aamerik/ce16b72bb38b327eec19
20:39 glusterbot Title: gist:ce16b72bb38b327eec19 · GitHub (at gist.github.com)
20:39 fubada Can I ask you to take a look at this please
20:40 badone joined #gluster
20:40 JoeJulian which distro?
20:40 fubada rhel 6.6
20:41 JoeJulian Well, not that then...
20:41 JoeJulian "gf_string2boolean" interesting... Suggests that some string that should eval to a boolean is something other.
20:42 JoeJulian Paste up a "gluster volume info"
20:42 farhoriz_ joined #gluster
20:42 fubada 1s
20:42 Rapture left #gluster
20:42 fubada JoeJulian: https://gist.github.com/aamerik/a3ad0ff6027a1fa6b75c
20:42 glusterbot Title: gist:a3ad0ff6027a1fa6b75c · GitHub (at gist.github.com)
20:43 JoeJulian Well... it's not that.
20:44 JoeJulian It's days like this that make me wish I was an alcoholic.
20:44 fubada :)
20:44 fubada then youd just sleep all day and we wouldnt get any help
20:44 fubada not god
20:44 fubada good*
20:44 JoeJulian either way
20:46 Humble joined #gluster
20:51 fubada JoeJulian: looks like there were some stale glusterfs and glusterfsd procs running that I had to kill
20:52 JoeJulian Ah, ok. That could explain it.
20:52 fubada im upgrading gluster, resizing down from 2x2 to 2x1, and using purpleideas module
20:53 JoeJulian Ah, cool. That's a good one. He's worked hard on it.
20:53 fubada yeah very useful
20:57 Pupeno joined #gluster
20:58 calavera joined #gluster
21:01 haomaiwang joined #gluster
21:31 nickage_ joined #gluster
21:31 nottc joined #gluster
21:49 farhoriz_ joined #gluster
21:52 mhulsman joined #gluster
21:59 theron joined #gluster
22:00 nickage_ joined #gluster
22:01 haomaiwa_ joined #gluster
22:05 PsionTheory joined #gluster
22:25 ahino joined #gluster
22:28 gildub joined #gluster
22:39 natarej joined #gluster
22:48 ahino joined #gluster
22:57 ira joined #gluster
23:01 haomaiwang joined #gluster
23:02 baojg joined #gluster
23:08 Peppard joined #gluster
23:21 plarsen joined #gluster
23:27 ovaistariq joined #gluster
23:57 farhoriz_ joined #gluster
