IRC log for #gluster, 2016-02-18

All times shown according to UTC.

Time Nick Message
00:36 hagarth merp_: were you performing rsync when you noticed this problem?
00:47 pjrebollo joined #gluster
01:10 pjrebollo joined #gluster
01:12 rcampbel3 joined #gluster
01:17 EinstCrazy joined #gluster
01:18 merp_ hagarth, nope, the operation of some apps that use gluster for storage took a nosedive
01:21 rcampbel3 joined #gluster
01:38 pjrebollo joined #gluster
01:44 pjrebollo joined #gluster
01:45 baojg joined #gluster
01:47 cpetersen Logos01: When you say management connection, do you mean on ESX or ganesha side?  NFS-ganesha has 3 floating IPs so it shouldn't matter, right?
01:55 cpetersen Would plainly using DNS, not necessarily RRDNS, be a plausible option to resolve this?
01:56 Lee1092 joined #gluster
02:08 hagarth merp_: are you aware of application IO pattern that tanked?
02:08 hagarth merp_: am trying to construct a test case that simulates the problem
02:12 haomaiwang joined #gluster
02:16 harish joined #gluster
02:18 EinstCrazy joined #gluster
02:20 Wizek joined #gluster
02:23 nehar joined #gluster
02:24 chromatin joined #gluster
02:24 merp_ hagarth, is this the memory leaking or the slow write throughput?
02:25 Wizek joined #gluster
02:27 merp_ slow write throughput was easy to detect, with write-back enabled in a three node replicate configuration (1x3) a simple "dd if=/dev/zero of=bwtest bs=1M count=64" shows the slow performance
02:27 merp_ disabling write-back and running the dd command increases throughput by 10-20x
02:29 merp_ as for the leaks, we have a huge quantity of small sqlite files (1-50mb) that are constantly being updated with disk journals enabled (causing each update to carry out tons of writes on the gluster volume)
02:29 merp_ glusterfs would increase in memory slowly, issuing 'echo 2 > /proc/sys/vm/drop_caches' would not cause the memory usage of the glusterfs process in question to shrink
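
A rough reproduction sketch of what merp_ describes, with placeholder volume and mount names (gv0, /mnt/gv0): toggle write-behind and compare dd throughput, and watch the FUSE client's RSS for the leak.

    gluster volume set gv0 performance.write-behind off   # merp_ says "write-back"; the volume option is performance.write-behind
    dd if=/dev/zero of=/mnt/gv0/bwtest bs=1M count=64      # re-run the same test and compare MB/s
    gluster volume set gv0 performance.write-behind on     # restore the default
    # leak check: the FUSE client's RSS is process memory, not page cache, so drop_caches
    # is not expected to shrink it; watch it over time under the sqlite workload instead
    ps -o pid,rss,args -C glusterfs
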
02:40 nangthang joined #gluster
02:48 ilbot3 joined #gluster
02:48 Topic for #gluster is now Gluster Community - http://gluster.org | Patches - http://review.gluster.org/ | Developers go to #gluster-dev | Channel Logs - https://botbot.me/freenode/gluster/ & http://irclog.perlgeek.de/gluster/
02:57 Humble joined #gluster
02:57 ovaistariq joined #gluster
03:01 haomaiwang joined #gluster
03:08 primusinterpares joined #gluster
03:13 merp_ joined #gluster
03:14 skoduri joined #gluster
03:21 dthrvr joined #gluster
03:21 overclk joined #gluster
03:28 nangthang joined #gluster
03:32 theron joined #gluster
03:35 nehar joined #gluster
03:35 gildub joined #gluster
03:45 baojg joined #gluster
03:45 hackman joined #gluster
03:49 ashiq_ joined #gluster
03:49 EinstCra_ joined #gluster
03:50 kovshenin joined #gluster
03:56 gem joined #gluster
03:59 RameshN joined #gluster
04:01 EinstCrazy joined #gluster
04:01 haomaiwa_ joined #gluster
04:03 itisravi joined #gluster
04:04 atinm joined #gluster
04:07 JoeJulian cpetersen: yes and no to the dns question. The problem with dns would be the ttl which would have to be aggressively small. I suppose you could, perhaps, use pacemaker to update /etc/hosts instead of floating an ip. I don't know how nfs would handle a dns change to mounted filesystems (I'm guessing it wouldn't handle it at all though).
04:09 nbalacha joined #gluster
04:10 shubhendu joined #gluster
04:14 lanning you would have to remount
04:15 ramteid joined #gluster
04:15 lanning nfs is to the IP, only the mount command handles names.
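
A quick way to see what lanning means (hostname and paths are examples): the NFS client resolves the name once at mount time and the mount is then bound to that IP.

    mount -t nfs nas.example.com:/export /mnt/nas
    grep /mnt/nas /proc/mounts     # the options include addr=<ip>; later DNS changes are not noticed until a remount
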
04:18 pjrebollo joined #gluster
04:24 pjrebollo joined #gluster
04:24 chirino joined #gluster
04:25 baojg joined #gluster
04:26 nishanth joined #gluster
04:30 sebamontini joined #gluster
04:31 kanagaraj joined #gluster
04:31 pjrebollo joined #gluster
04:32 kshlm joined #gluster
04:34 jiffin joined #gluster
04:35 sebamontini hello everybody
04:36 sakshi joined #gluster
04:36 sebamontini i'm having issues with 2 glusters nodes (2 bricks each)
04:36 sebamontini i added the 2 new bricks
04:37 sebamontini and when i tried to rebalance, i've got one node in which glusterd crashed
04:37 sebamontini and it's not starting now
04:37 sebamontini any ideas?
04:42 hgowtham joined #gluster
04:44 hgowtham joined #gluster
04:44 jiffin sebamontini: check the logs , /var/log/glusterfs/etc-glusterfs-glusterd.vol.log for more details
04:44 ppai joined #gluster
04:44 sebamontini i did
04:44 kdhananjay joined #gluster
04:45 Saravanakmr joined #gluster
04:45 sebamontini [2016-02-18 04:43:15.758413] E [socket.c:823:__socket_server_bind] 0-socket.management: binding to  failed: Address already in use
04:45 sebamontini [2016-02-18 04:43:15.758454] E [socket.c:826:__socket_server_bind] 0-socket.management: Port is already in use
04:45 sebamontini which is the port that is "in use" according to that errmsg?
04:46 jiffin sebamontini: ports used by the previous glusterd may not have been freed properly
04:46 atinm sebamontini, are you sure that glusterd instance is not running?
04:47 sebamontini yep
04:47 atinm sebamontini, ps aux | grep glusterd output ?
04:47 sebamontini yes, sure
04:47 sebamontini [root@gluster01-secundario glusterfs]# ps aux |grep glusterd
04:47 sebamontini root     20084  0.0  0.0 100932   676 pts/1    S+   01:44   0:00 tail -100f etc-glusterfs-glusterd.vol.log
04:47 sebamontini root     20115  0.0  0.0 103212   824 pts/0    S+   01:47   0:00 grep glusterd
04:47 atinm sebamontini, then what jiffin said might be right
04:47 sebamontini i'm starting to think jiffin might be right
04:47 sebamontini which port does glusterd uses by default?
04:47 atinm sebamontini,24007
04:48 atinm sebamontini, just wait for 2 minutes and then try it out
04:48 sebamontini atinm i'm with this issue for over 3 hours now
04:48 sebamontini [root@gluster01-secundario glusterfs]# netstat -putan |grep 24007
04:48 sebamontini [root@gluster01-secundario glusterfs]#
04:48 sebamontini the port looks free
04:49 Humble glusterd itself not running ?
04:49 sebamontini nope
04:49 nehar joined #gluster
04:50 atinm sebamontini, can you share the complete log sequence?
04:50 sebamontini sure, let me just use a pastebin
04:50 atinm sebamontini, which gluster version?
04:50 sebamontini [root@gluster01-secundario glusterfs]# glusterd --version
04:50 sebamontini glusterfs 3.7.1 built on Jun  1 2015 17:54:04
04:50 Humble u have to paste the glusterd log
04:51 sebamontini http://paste.nubity.com/977c27bb062a7292.sm
04:51 glusterbot Title: Paste | Nubity (at paste.nubity.com)
04:52 sebamontini thats the log of the  /var/log/glusterfs/etc-glusterfs-glusterd.vol.log  from the moment i try to start glusterd and fails
04:52 atinm sebamontini, I don't see any bind failure related logs
04:52 kshlm atinm, GlusterD is crashing when trying to start rebalance
04:52 atinm sebamontini, as per the log glusterd crashes when it tries to restart rebalance
04:52 sebamontini atinm line 5 and 6
04:53 sebamontini but glusterd is not even starting
04:53 atinm sebamontini, I know the issue
04:53 atinm sebamontini, you'd need to upgrade it to latest
04:53 sebamontini upgrade gluster?
04:53 atinm sebamontini, we had a bug where restarting rebalance would fail
04:53 atinm sebamontini, yes
04:53 pjrebollo joined #gluster
04:54 * atinm is looking for the bug
04:54 kshlm atinm, A bug link?
04:54 Alghost joined #gluster
04:55 Manikandan joined #gluster
04:55 atinm kshlm, still searching
04:57 atinm sebamontini, kshlm : https://bugzilla.redhat.com/show_bug.cgi?id=1227677
04:57 glusterbot Bug 1227677: high, unspecified, ---, spalai, CLOSED CURRENTRELEASE, Glusterd crashes and cannot start after rebalance
05:00 sebamontini ok, thanks a lot atinm
05:00 sebamontini just to be sure
05:00 sebamontini i'll need to stop gluster on both sides, then upgrade through the epel repo
05:00 sebamontini then restart gluster, and start the rebalance
05:00 sebamontini right?
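
For reference, a rough sketch of that sequence on each node; package, service and volume names depend on the actual setup, and the fix atinm points to landed in a later 3.7.x release.

    service glusterd stop                    # glusterd is already down on the crashed node
    yum update glusterfs-server              # pull the newer 3.7.x build from the configured repo
    service glusterd start
    gluster peer status                      # confirm both nodes see each other again
    gluster volume rebalance <volname> start
    gluster volume rebalance <volname> status
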
05:01 haomaiwang joined #gluster
05:04 pjrebollo joined #gluster
05:10 rcampbel3 joined #gluster
05:14 pjrebollo joined #gluster
05:19 sakshi joined #gluster
05:20 pjrebollo joined #gluster
05:21 pppp joined #gluster
05:23 Apeksha joined #gluster
05:26 pjrebollo joined #gluster
05:26 ovaistariq joined #gluster
05:28 gowtham joined #gluster
05:29 pjrebollo joined #gluster
05:30 rafi joined #gluster
05:31 kotreshhr joined #gluster
05:35 pjrebollo joined #gluster
05:43 anil joined #gluster
05:44 poornimag joined #gluster
05:50 Wizek joined #gluster
05:55 kdhananjay joined #gluster
06:01 haomaiwa_ joined #gluster
06:02 merp_ joined #gluster
06:02 harish joined #gluster
06:03 Saravanakmr joined #gluster
06:04 sadbox joined #gluster
06:05 Bhaskarakiran joined #gluster
06:06 karthikfff joined #gluster
06:07 rafi joined #gluster
06:09 skoduri joined #gluster
06:14 pjrebollo joined #gluster
06:22 atalur joined #gluster
06:29 Wizek joined #gluster
06:33 skoduri joined #gluster
06:34 unlaudable joined #gluster
06:37 Humble joined #gluster
06:43 karnan joined #gluster
06:49 overclk joined #gluster
06:51 baojg joined #gluster
06:51 pjrebollo joined #gluster
07:01 haomaiwang joined #gluster
07:04 plarsen joined #gluster
07:16 baojg joined #gluster
07:16 kshlm post-factum, https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-3.7.9
07:16 glusterbot Bug glusterfs: could not be retrieved: InvalidBugId
07:17 kshlm @later tell post-factum The 3.7.9 tracker bug is live at https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-3.7.9
07:17 glusterbot kshlm: The operation succeeded.
07:17 glusterbot Bug glusterfs: could not be retrieved: InvalidBugId
07:23 robb_nl joined #gluster
07:27 post-factum kshlm: tracking bug for 3.7.9 updated
07:30 m0zes joined #gluster
07:30 pjrebollo joined #gluster
07:37 mhulsman joined #gluster
07:41 auzty joined #gluster
07:45 Philambdo joined #gluster
07:46 TonyBurn joined #gluster
07:55 EinstCra_ joined #gluster
08:01 haomaiwa_ joined #gluster
08:04 Saravanakmr joined #gluster
08:11 ivan_rossi joined #gluster
08:13 RameshN joined #gluster
08:13 [diablo] joined #gluster
08:14 Bardack grmf, we've some shares that we must mount as NFS. if we mount nas:/X to /nas/X , it's always good and stable. But if we mount nas:/X/Y/Z to /nas/X , it's really unstable. anybody already encountered something like that ?
08:14 Akee joined #gluster
08:15 [Enrico] joined #gluster
08:27 ndarshan joined #gluster
08:29 fsimonce joined #gluster
08:31 harish joined #gluster
08:34 Manikandan joined #gluster
08:37 mbukatov joined #gluster
08:37 jri joined #gluster
08:40 jiffin Bardack: did you meant subdir export for gluster NFS is not stable?
08:45 Bardack yep
08:45 Bardack at least it seems to
08:45 Bardack but we discovered a problem on our side
08:46 Bardack so i'll first see this :)
08:46 jiffin Bardack: K
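
For anyone hitting the same thing: with gluster's built-in NFS (v3), subdirectory mounts normally need the subdirectory exported explicitly. A minimal sketch, with placeholder volume name and paths:

    gluster volume set <volname> nfs.export-dirs on
    gluster volume set <volname> nfs.export-dir "/Y/Z"
    mount -t nfs -o vers=3 nas:/<volname>/Y/Z /nas/X
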
08:50 gowtham joined #gluster
08:50 johnmilton joined #gluster
09:01 haomaiwa_ joined #gluster
09:02 ira joined #gluster
09:08 skoduri joined #gluster
09:09 Saravanakmr joined #gluster
09:11 Apeksha joined #gluster
09:19 poornimag joined #gluster
09:19 EinstCrazy joined #gluster
09:26 ravage joined #gluster
09:26 ravage post-factum, just wanted to say thx again. cluster runs like a charm now with fewer threads
09:27 kovsheni_ joined #gluster
09:28 ppai joined #gluster
09:40 Slashman joined #gluster
09:43 post-factum ravage: np :)
09:44 kblin left #gluster
09:50 sakshi joined #gluster
09:51 R0ok_ joined #gluster
09:52 arcolife joined #gluster
09:54 skoduri_ joined #gluster
10:01 haomaiwa_ joined #gluster
10:03 kdhananjay joined #gluster
10:05 nishanth joined #gluster
10:05 ira joined #gluster
10:06 jwd joined #gluster
10:09 m0zes joined #gluster
10:12 karnan joined #gluster
10:12 post-factum testing 3.7.8 client with 3.7.6 server. write-behind is enabled
10:12 post-factum 67108864 bytes (67 MB, 64 MiB) copied, 37.8935 s, 1.8 MB/s
10:14 nbalacha joined #gluster
10:16 gem joined #gluster
10:16 RameshN joined #gluster
10:19 post-factum but another cluster with write-behind works ok
10:19 kotreshhr joined #gluster
10:23 post-factum the only difference is that the slow cluster has a distributed-replicated volume, and the fast one doesn't have replication
10:29 atalur joined #gluster
10:35 post-factum however, if i add replica to "fast" volume, it works ok with write-behind enabled as well
10:36 post-factum so, the issue is in 3.7.8 client interacting with existing 3.7.6 cluster
10:36 post-factum i mean, the key word *existing*
10:38 post-factum hmm, i make fast volume slow now. have to find out what changed
10:44 post-factum well, adding replica definitely slows things down
10:44 post-factum removing replica restores fast behavior
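
Roughly the experiment post-factum describes, as gluster CLI calls (host, brick and volume names are placeholders):

    gluster volume add-brick <volname> replica 2 <host2>:/bricks/<volname>            # turn the fast single-brick volume into a replica
    dd if=/dev/zero of=/mnt/<volname>/bwtest bs=1M count=64                           # throughput drops with write-behind enabled
    gluster volume remove-brick <volname> replica 1 <host2>:/bricks/<volname> force   # drop back to no replication
    dd if=/dev/zero of=/mnt/<volname>/bwtest bs=1M count=64                           # fast again
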
10:45 post-factum merp_: do you have replicated volume?
10:45 post-factum hagarth: ^^
10:45 post-factum JoeJulian: ^^
10:51 poornimag joined #gluster
10:57 jww left #gluster
11:01 haomaiwa_ joined #gluster
11:02 DV joined #gluster
11:05 ovaistariq joined #gluster
11:05 mobaer joined #gluster
11:10 m0zes joined #gluster
11:11 kkeithley1 joined #gluster
11:13 harish joined #gluster
11:16 atalur joined #gluster
11:16 msciciel_ joined #gluster
11:20 karnan joined #gluster
11:37 foster joined #gluster
11:44 Reiner031 joined #gluster
11:45 kdhananjay1 joined #gluster
11:48 Reiner032 joined #gluster
11:49 itisravi joined #gluster
11:50 post-factum merp_: sorry, I've re-read your bugreport carefully and found replica 3 there. obviously, smth is going wrong with replica and write-behind in 3.7.8
11:52 RameshN joined #gluster
11:58 chirino joined #gluster
11:59 EinstCrazy joined #gluster
12:00 atalur joined #gluster
12:00 ppai joined #gluster
12:00 nehar joined #gluster
12:01 haomaiwang joined #gluster
12:07 pjrebollo joined #gluster
12:20 pjrebollo joined #gluster
12:20 mobaer joined #gluster
12:24 hgichon joined #gluster
12:26 rGil joined #gluster
12:26 ahino joined #gluster
12:26 hgichon Hi all.  I have a replicated volume... composed of a full-SSD brick and a SATA brick.
12:28 hgichon Is it possible  read operation should be done at ssd brick?
12:30 Reiner032 joined #gluster
12:39 pjrebollo joined #gluster
12:39 m0zes joined #gluster
12:40 itisravi hgichon: yes, you can use the cluster.read-subvolume-index option.
12:41 itisravi hgichon: if your first brick is ssd, then the index value is '0'
12:43 johnmilton joined #gluster
12:43 hgichon Oh good!! and if an ssd brick failure occurred, would read operations then be served from the sata brick?
12:44 Plysdyret joined #gluster
12:44 itisravi hgichon:  yes. The option comes into play only if all bricks have good data. If one of the bricks needs heal for that file, the read will automatically be served from the healthy one.
12:46 hgichon Thanks a lot
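
A minimal sketch of the option itisravi mentions, assuming the SSD brick was listed first when the volume was created (volume name is a placeholder):

    gluster volume info <volname>                                  # the brick order determines the index
    gluster volume set <volname> cluster.read-subvolume-index 0    # 0 = first brick of the replica pair
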
12:48 Reiner030 Hello,
12:48 Reiner030 informational: on Monday evening I found the current glusterfs packages in Debian jessie-backports. Thx to the package maintainer Patrick. ;)
12:54 Reiner030 And a "bug or feature" question: We mount glusterfs in Saltstack by role and another administrator was testing a 2nd gluster cluster which "accidently" had same role and therefore came into the backup-volfile-servers=... mount parameter. Because these testserver haven't offered the wanted volume the mount failed (and the queck of further entries was not made)... If this is a bug I could/would create a ticket; if it's a feature then it's as
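
For context, the mount option being discussed (names are placeholders). The backup entries are consulted for fetching the volfile when the first server is unreachable, so listing servers from an unrelated cluster that does not serve the volume can make the fallback fail, which appears to be what Reiner030 hit:

    mount -t glusterfs -o backup-volfile-servers=server2:server3 server1:/<volname> /mnt/<volname>
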
12:56 m0zes joined #gluster
12:56 kanagaraj joined #gluster
12:57 julim joined #gluster
13:02 pjrebollo joined #gluster
13:03 robb_nl joined #gluster
13:07 ovaistariq joined #gluster
13:09 EinstCrazy joined #gluster
13:14 nishanth joined #gluster
13:15 shubhendu joined #gluster
13:16 Ravage hey. me again. im trying to mount an image file via loop device from a node thats currently resyncing the data. i can mount the image from the other node that has consistent data. does glusterfs somehow block access to files in the healing process?
13:17 Ravage i can also read the file without problems.
13:17 Ravage mount and fsck command just hangs forever
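
A couple of checks that may narrow this down (volume name and log file name are placeholders; the client log file is named after the mount point): is the image file still pending heal, and what does the FUSE client log say while mount/fsck hangs?

    gluster volume heal <volname> info                 # files still queued for self-heal
    gluster volume heal <volname> info split-brain
    tail -f /var/log/glusterfs/<mountpoint>.log        # client log on the node doing the loop mount
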
13:21 unclemarc joined #gluster
13:22 haomaiwa_ joined #gluster
13:22 poornimag joined #gluster
13:24 ira joined #gluster
13:42 hchiramm joined #gluster
13:44 amye joined #gluster
13:44 theron joined #gluster
13:46 rafi1 joined #gluster
13:51 amye joined #gluster
13:54 shubhendu joined #gluster
13:56 nishanth joined #gluster
13:57 kovshenin joined #gluster
14:01 haomaiwa_ joined #gluster
14:08 theron joined #gluster
14:11 chirino_m joined #gluster
14:13 nbalacha joined #gluster
14:17 theron joined #gluster
14:27 raghu joined #gluster
14:36 Bhaskarakiran joined #gluster
14:42 Reiner030 @Ravage: no direct idea but is there something informational logged in /var/log/glusterfs/data-storage.log?
14:43 hamiller joined #gluster
14:43 skylar joined #gluster
14:50 kdhananjay joined #gluster
14:52 coredump joined #gluster
15:01 haomaiwa_ joined #gluster
15:02 kdhananjay joined #gluster
15:18 ashiq_ joined #gluster
15:20 theron joined #gluster
15:21 plarsen joined #gluster
15:23 Wizek joined #gluster
15:24 matclayton joined #gluster
15:26 Wizek joined #gluster
15:27 chirino joined #gluster
15:27 squizzi_ joined #gluster
15:27 mhulsman joined #gluster
15:34 Wizek joined #gluster
15:34 theron_ joined #gluster
15:35 virusuy Hi guys , yesterday i asked for a procedure to replace a crashed node with a new one , same IP and same hostname. I just want to say that following this procedure everything went well (even though i'm using gluster 3.6) http://www.gluster.org/community/documentation/index.php/Gluster_3.4:_Brick_Restoration_-_Replace_Crashed_Server
15:36 virusuy So, if someone is looking for that on internet, probably will find out this log :)
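
Roughly the steps from that document, assuming the replacement keeps the same hostname and IP; UUID, node and volume names are placeholders, and the linked page covers the extra brick volume-id xattr step not shown here:

    # on a surviving node: note the UUID the dead server had
    gluster peer status                                          # or inspect /var/lib/glusterd/peers/
    # on the rebuilt node, after installing the same gluster version:
    service glusterd stop
    sed -i 's/^UUID=.*/UUID=<old-uuid>/' /var/lib/glusterd/glusterd.info
    service glusterd start
    gluster peer probe <surviving-node>
    service glusterd restart                                     # picks up the volume configuration from the peer
    gluster volume heal <volname> full                           # resync data onto the replaced bricks
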
15:38 Wizek joined #gluster
15:39 Bhaskarakiran joined #gluster
15:41 Bhaskarakiran joined #gluster
15:43 blubberdi joined #gluster
15:43 ccoffey @virusuy Thanks. it's good to have some practical feedback. For my own reference, how many nodes? how much data? how much downtime?
15:43 blubberdi Hello, has someone an idea how to force a split-brain? I want to test my monitoring.
15:44 ccoffey @blubberdi, change the files on the bricks directly? Would that do it?
15:44 virusuy blubberdi: you could bring one node down (disconnecting the node from the network) , and modify one file of that "disconnected" node
15:44 virusuy ccoffey: 2 nodes in this case
15:45 virusuy ccoffey: data was about 4Tb, they are connected to each other through 10G Fiber optic , and downtime was about 40 min i guess
15:45 blubberdi virusuy: ccoffeyy Thanks, I'll try that
15:45 virusuy blubberdi: you should try that on a test environment
15:45 virusuy blubberdi: just for safety :)
15:46 blubberdi Sure :D
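
A rough way to script virusuy's suggestion on a throwaway replica 2 test volume (names are placeholders; timing matters, the second brick kill has to happen before self-heal finishes):

    pkill -f 'glusterfsd.*testvol'                  # on node B: stop its brick process
    echo change-A >> /mnt/testvol/file              # from a client: only node A records the write
    gluster volume start testvol force              # on node B: bring the brick back
    pkill -f 'glusterfsd.*testvol'                  # on node A, before the heal runs
    echo change-B >> /mnt/testvol/file              # from a client: only node B records this one
    gluster volume start testvol force              # on node A
    gluster volume heal testvol info split-brain    # what the monitoring should now flag
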
15:46 ccoffey virusuy: ok, good to know. I've been through it twice in the last year, but with 8 nodes, 40 bricks. I have one node rebuilding; it's 99.7% done after almost 3 months :)
15:46 rafi joined #gluster
15:47 virusuy ccoffey: and how much data ?
15:48 ccoffey 27TBx5 from one datacentre to another. I was going for 0 downtime though. Our files don't change though, we just replace them with new ones so the method I used is not suitable in most cases
15:51 Wizek joined #gluster
15:52 nage joined #gluster
15:54 matclayton does anyone have a good example of setting up bricks with LVM to use snapshotting?
15:54 matclayton we initially set them up with thin provisioning and snapshotting didn't work, trying to figure out how we should have done it
15:55 farhoriz_ joined #gluster
16:02 mobaer joined #gluster
16:06 theron joined #gluster
16:09 ovaistariq joined #gluster
16:12 jiffin joined #gluster
16:13 mobaer joined #gluster
16:16 nishanth joined #gluster
16:17 bitpushr_ joined #gluster
16:20 jiffin joined #gluster
16:21 bitpushr joined #gluster
16:22 bitpushr joined #gluster
16:22 drankis joined #gluster
16:32 haomaiwa_ joined #gluster
16:33 skoduri joined #gluster
16:33 dc_tx joined #gluster
16:35 bitpushr joined #gluster
16:38 JoeJulian blubberdi: You could also manipulate the ,,(extended attributes) on the brick to simulate a split-brain.
16:38 glusterbot blubberdi: (#1) To read the extended attributes on the server: getfattr -m .  -d -e hex {filename}, or (#2) For more information on how GlusterFS uses extended attributes, see this article: http://pl.atyp.us/hekafs.org/index.php/2011/04/glusterfs-extended-attributes/
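
For example, to see the AFR bookkeeping JoeJulian refers to, read the xattrs of the same file on each brick (paths are examples); non-zero trusted.afr.<volname>-client-N counters on both replicas pointing at each other is what a split-brain looks like:

    getfattr -m . -d -e hex /bricks/brick1/path/to/file
    getfattr -m . -d -e hex /bricks/brick2/path/to/file   # run on the other replica's node
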
16:40 JoeJulian matclayton: Haven't done it, so I don't have the example you've asked for, but I do know that thin provisioning is supposed to be correct.
16:41 ivan_rossi left #gluster
16:41 DRC_RHT joined #gluster
16:41 matclayton JoeJulian: think I just got it working
16:42 matclayton some clear docs on how to set it up would be really nice/helpful as I suspect most people dont use thin provisioning
16:47 JoeJulian I agree. Would you please file a bug on that?
16:47 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
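
For anyone searching later, a minimal thin-provisioned brick layout of the kind matclayton describes (device, VG and volume names are placeholders):

    pvcreate /dev/sdb
    vgcreate vg_bricks /dev/sdb
    lvcreate -L 500G -T vg_bricks/thinpool              # thin pool
    lvcreate -V 500G -T vg_bricks/thinpool -n brick1    # thin LV carved out of the pool
    mkfs.xfs -i size=512 /dev/vg_bricks/brick1
    mkdir -p /bricks/brick1 && mount /dev/vg_bricks/brick1 /bricks/brick1
    # once the gluster volume is built on such bricks:
    gluster snapshot create snap1 <volname>
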
17:01 atinm joined #gluster
17:01 haomaiwang joined #gluster
17:06 jiffin joined #gluster
17:10 ovaistariq joined #gluster
17:14 kanagaraj joined #gluster
17:20 gem joined #gluster
17:21 sebamontini joined #gluster
17:25 bluenemo joined #gluster
17:28 bennyturns joined #gluster
17:36 kovshenin joined #gluster
17:38 cpetersen How do I override quorum to bring a brick up that I know is OK?
17:43 rcampbel3 joined #gluster
17:43 Manikandan joined #gluster
17:47 sebamontini left #gluster
17:47 cpetersen Nevermind, found a bad peer.  Rejoining peer.
18:01 92AAAAUQE joined #gluster
18:03 mhulsman joined #gluster
18:04 rafi joined #gluster
18:07 theron joined #gluster
18:09 luizcpg joined #gluster
18:11 Wizek joined #gluster
18:13 kanagaraj joined #gluster
18:20 jwaibel joined #gluster
18:23 jwd joined #gluster
18:26 jwaibel joined #gluster
18:27 cpetersen kkeithley: Odd.  When the brick serving the share fails and the IP floats to another brick, using NFS 4.1 ESXi will not bring the share back up after it was inaccessible, while on NFS 3 it will.
18:27 cpetersen Again, VMware you little bitch.
18:28 cpetersen I'm struggling with that because, correct me if I'm wrong, but the whole point of using Ganesha was to use NFS 4.1 and all of its benefits.  =)
18:31 ovaistariq joined #gluster
18:31 karnan joined #gluster
18:48 unclemarc joined #gluster
18:50 merp_ joined #gluster
18:54 cpetersen What's more is that when using NFS 4.1, I don't get split-brain when multiple hosts consume the share and a brick fails.  When I use NFS 3, the share comes up after failure but I do get split-brain.
18:54 cpetersen This is insanity.  :)
18:55 hamiller joined #gluster
18:55 cpetersen Is it logical that I would get split-brain from using NFS 3 to access the replicated volume?
18:58 cpetersen I have cache-invalidation on!  :'(
18:58 kkeithley_ cpetersen: yes, NFS-ganesha is our NFSv4 solution.
18:59 kkeithley_ Off hand I don't know why you'd get a split-brain with v3 and not v4.
19:03 cpetersen :(
19:04 cpetersen I would rather determine why my share doesn't come back with NFS 4.  That seems like the way to go.  Though it seems like more of a vmware issue.
19:04 kkeithley_ yes, I think that's wise
19:05 post-factum cpetersen: is that all about grace period?
19:09 alex___ joined #gluster
19:09 cpetersen It could be, I'm not sure.  The IP floats just fine, but I have this error on the grace-monitor.  I didn't think anything of it due to the status saying complete.  I was thinking it was an informational message coded in to an error block.
19:09 cpetersen Failed Actions:
19:09 cpetersen * nfs-grace_monitor_0 on file03 'unknown error' (1): call=16, status=complete, exitreason='none',
19:09 cpetersen last-rc-change='Thu Feb 18 10:51:32 2016', queued=0ms, exec=71ms
19:09 cpetersen Was that too big for the channel?  If so I will use FBIN next time.
19:09 post-factum >unknown error
19:09 post-factum who the hell wrota that debug
19:10 post-factum s/wrota/wrote/
19:10 glusterbot What post-factum meant to say was: who the hell wrote that debug
19:10 cpetersen I had that error before kkeithley gave me the patch for monitoring nodes via IP, but it never said "complete."
19:10 post-factum have you checked gratuitous ARP sent OK after IP floating?
19:11 cpetersen And after the patch the nodes fail just fine.
19:11 cpetersen I have not.
19:11 post-factum i mean, if the host is reachable via virtual IP after failover
19:11 cpetersen yes it is
19:12 cpetersen I've done a vmkping from the esxi console.
19:12 post-factum who manages ganesha ha? pacemaker?
19:13 cpetersen yes
19:13 cpetersen pacemaker
19:14 post-factum and ganesha log clearly says it is in grace state after IP failover happens?
19:15 post-factum what i did was keepalived+small dbus-related script for notifying ganesha, and it worked ok for me. i'd like just to find out how pacemaker does the same
19:15 janegil joined #gluster
19:15 alex___ Hi there, I need some help here :D .... We are using Gluster in our web server architecture, with a brick for assets. Each server write to the brick, but read from the local (to speed up things). For some reasons, we need to create a "known folder name" symlink to one of the folders in assets, but to keep the things fast we want to target the local directory (not the bring) so the server keep reading from the local. Does somebody s
19:16 alex___ (not the brick)*
19:16 post-factum alex___: split your help request into smaller pieces plz :)
19:19 alex___ sorry ^^ the help request in short could be: Can I create a symlink in a brick to target the local directory of another folder (that is the brick too)?
19:20 cpetersen post-factum: Yes, it goes in to grace and comes right back out.
19:20 cpetersen http://ur1.ca/ojf5k
19:20 glusterbot Title: #325023 Fedora Project Pastebin (at ur1.ca)
19:20 cpetersen And I can vmkping the float IP from ESXi.
19:21 cpetersen The share goes inaccessible.
19:21 post-factum cpetersen: what about telnetting to tcp/2049?
19:22 post-factum alex___: never write to bricks directly
19:23 post-factum cpetersen: and what about reported errors in config?
19:24 alex___ well when I say "brick" maybe I have to say "cluster"
19:25 alex___ we write to the cluster but read from the local
19:25 cpetersen netcat returns nothing
19:26 cpetersen trying to connect to port 2049 on the float ip
19:26 cpetersen from esxi
19:26 post-factum cpetersen: errno=2 btw is ENOENT. any EPERM there?
19:27 post-factum cpetersen: so, tcp connection is established?
19:29 theron joined #gluster
19:30 cliluw joined #gluster
19:32 haomaiwa_ joined #gluster
19:33 cpetersen "ss | less | grep 2049" returns nothing
19:33 cpetersen and "ss | less" returns nothing with the esxi host vmkernel ip in it
19:34 cpetersen whelp, nevermind, I lied on that last one
19:34 squizzi_ joined #gluster
19:34 cpetersen http://ur1.ca/ojf95
19:34 glusterbot Title: #325056 Fedora Project Pastebin (at ur1.ca)
19:34 cpetersen "ss | less | grep 172.16.16.11" looks like it has active connections
19:36 cpetersen "rpm -qa | grep eperm
19:36 cpetersen " and "yum list installed | grep eperm" return no results
19:38 post-factum cpetersen: EPERM is an error that I'd like to see in ganesha logs to explain ENOENT there
19:38 post-factum but lets go step by step
19:38 post-factum what about complaints on ganesha config in log?
19:39 post-factum like "0 export entries in /etc/ganesha/exports/export.gluster_shared_storage.conf added because (export create, block validation) errors"
19:39 post-factum is the config consitent?
19:39 kovshenin joined #gluster
19:40 cpetersen it is, yes, no complaints like that in there
19:40 cpetersen showmount -e on all nodes shows the export proper
19:42 post-factum so why you pasted those errors?..
19:42 cpetersen I'm learning lessons here, I lied again, second node (not the one that owns the float ip) has no export
19:42 cpetersen the node that took the float has the export
19:43 cpetersen and yes, I get what you're saying, the log I posted has export errors
19:43 post-factum so I guess export configs are inconsistent across nodes
19:44 das_j joined #gluster
19:44 cpetersen yes it seems so, I will have to tear down and re-run the ganesha-ha script
19:48 das_j Hello, I'm currently trying to create an SSL cluster. When using openssl s_client with the -CAfile option, everything works fine, but when using gluster, I always get a certificate verify failed error. Log with LOG_LEVEL=DEBUG: https://gist.github.com/dasJ/920585b4e5de154465ff
19:48 glusterbot Title: gist:920585b4e5de154465ff · GitHub (at gist.github.com)
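
For reference, the pieces gluster's TLS support expects on every node and client; a verify failure usually means the issuing certificate is missing from the CA bundle on one of the ends (volume name and CN list are placeholders):

    # /etc/ssl/glusterfs.pem  - this node's certificate
    # /etc/ssl/glusterfs.key  - its private key
    # /etc/ssl/glusterfs.ca   - concatenated certificates of the CAs/peers to trust
    touch /var/lib/glusterd/secure-access          # optionally encrypt the management path too
    gluster volume set <volname> client.ssl on
    gluster volume set <volname> server.ssl on
    gluster volume set <volname> auth.ssl-allow 'server1,server2,client1'   # optional CN whitelist
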
19:51 voobscout joined #gluster
19:51 ovaistariq joined #gluster
19:52 cpetersen post-factum: export.gluster_shared_storage.conf is the same across all nodes, the error is coming from the FSAL block as per the ganesha log.
19:52 cpetersen Should "hostname="localhost";" be set to the actual host name of each brick?
19:53 post-factum it may be set to localhost if gluster server and ganesha are within the same node
19:53 cpetersen ok, they are so that's fine
19:54 janegil joined #gluster
19:55 cpetersen I don't see any discrepancies in the exports between nodes
19:55 cpetersen http://ur1.ca/ojfdj
19:55 glusterbot Title: #325083 Fedora Project Pastebin (at ur1.ca)
19:55 cpetersen I don't see any issues with what is in them either
19:56 cpetersen and this was automatically created with the gluster scripts
20:03 post-factum looks well. dunno then :(
20:03 post-factum if each node is used as nfs server separately, everything works ok?
20:04 post-factum not via virtual IP, but via master ip
20:04 cpetersen it's only when the float ip moves that I have the problem
20:05 cpetersen to achieve HA of the share
20:05 voobscout joined #gluster
20:05 cpetersen I guess I could add multiple endpoints to the share config in ESX... but I'm not sure how that would work seeing as it's a replicated volume
20:06 cpetersen The esxi vmkernel logs are interesting.  They tell me that the connection was restored, then dropped.
20:06 cpetersen http://ur1.ca/ojffn
20:06 glusterbot Title: #325099 Fedora Project Pastebin (at ur1.ca)
20:07 cpetersen I even disabled storage apd monitoring and upped the timeout to 5 minutes too.
20:07 voobscout joined #gluster
20:08 squizzi__ joined #gluster
20:09 post-factum have you tried non-esx client?
20:09 post-factum like linux box or so
20:09 merp_ joined #gluster
20:09 cpetersen I haven't no.
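
A minimal check from a plain Linux box against the floating IP, along the lines post-factum suggests (IP and export path are placeholders):

    telnet <floating-ip> 2049                      # does a TCP connection establish after the failover?
    showmount -e <floating-ip>                     # export listing via the v3 MNT protocol
    mkdir -p /mnt/nfstest
    mount -t nfs -o vers=4.1 <floating-ip>:/<export> /mnt/nfstest
    ls /mnt/nfstest && touch /mnt/nfstest/probe    # does I/O hang the way the ESXi datastore does?
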
20:10 sloop- joined #gluster
20:11 codex_ joined #gluster
20:11 lalatend1M joined #gluster
20:12 nage joined #gluster
20:12 lanning joined #gluster
20:12 atrius joined #gluster
20:12 skylar joined #gluster
20:13 stopbyte joined #gluster
20:15 social joined #gluster
20:17 post-factum any chances to try? just to exclude proprietary chain
20:18 portante joined #gluster
20:18 Pintomatic joined #gluster
20:31 theron joined #gluster
20:31 mobaer joined #gluster
20:33 voobscout joined #gluster
20:49 mhulsman joined #gluster
20:52 cpetersen post-factum: I have 3 nodes so I am going to use the node that doesn't receive the float to mount and test
21:03 cpetersen post-factum: survey says, mount on a standard linux nfs41 client locked up as well
21:05 cpetersen restarted the nfs-ganesha and gluster services, no effect
21:07 cpetersen correction:  after restarting the gluster service the share became accessible again from linux but not from esxi
21:10 BuffaloCN joined #gluster
21:18 BuffaloCN joined #gluster
21:20 bennyturns joined #gluster
21:26 kkeithley1 joined #gluster
21:32 kkeithley1 joined #gluster
21:41 amye joined #gluster
21:48 gildub joined #gluster
21:48 merp_ joined #gluster
21:53 cyberbootje joined #gluster
22:25 m0zes joined #gluster
22:25 ovaistariq joined #gluster
22:25 bennyturns joined #gluster
22:33 theron joined #gluster
22:36 matclayton joined #gluster
22:59 post-factum cpetersen: sounds like mystery
23:03 cpetersen :'(
23:19 primusinterpares joined #gluster
23:22 Reiner030 left #gluster
23:24 mrEriksson joined #gluster
23:50 gildub joined #gluster
