
IRC log for #gluster, 2017-01-31


All times shown according to UTC.

Time Nick Message
00:41 nishanth joined #gluster
00:58 BuBU291 joined #gluster
01:15 Pupeno joined #gluster
01:20 arpu joined #gluster
01:54 arpu joined #gluster
02:01 daMaestro joined #gluster
02:01 Gambit15 joined #gluster
02:16 shyam joined #gluster
02:28 derjohn_mob joined #gluster
02:51 susant joined #gluster
02:51 bbooth joined #gluster
03:03 susant left #gluster
03:13 Karan joined #gluster
03:27 rastar joined #gluster
03:31 Wizek_ joined #gluster
03:33 msvbhat joined #gluster
03:37 magrawal joined #gluster
03:42 emerson joined #gluster
03:49 itisravi joined #gluster
03:57 emerson joined #gluster
03:59 emerson joined #gluster
04:00 susant joined #gluster
04:03 susant left #gluster
04:03 emerson joined #gluster
04:06 emerson joined #gluster
04:07 jdossey joined #gluster
04:09 prasanth joined #gluster
04:10 johnnyNumber5 joined #gluster
04:11 victori joined #gluster
04:15 atinm joined #gluster
04:15 evilemerson joined #gluster
04:16 skumar joined #gluster
04:24 buvanesh_kumar joined #gluster
04:26 kdhananjay joined #gluster
04:27 nbalacha joined #gluster
04:31 Saravanakmr joined #gluster
04:31 Wizek joined #gluster
04:41 sanoj joined #gluster
04:42 aravindavk joined #gluster
04:42 Prasad joined #gluster
04:44 jdossey joined #gluster
04:45 rjoseph joined #gluster
04:50 Shu6h3ndu joined #gluster
04:50 Shu6h3ndu joined #gluster
05:05 evilemerson joined #gluster
05:05 BlackoutWNCT joined #gluster
05:06 ppai joined #gluster
05:07 ankit_ joined #gluster
05:11 evilemerson joined #gluster
05:13 susant joined #gluster
05:21 ndarshan joined #gluster
05:30 BlackoutWNCT joined #gluster
05:33 Karan joined #gluster
05:38 apandey joined #gluster
05:39 Philambdo joined #gluster
05:42 Wizek joined #gluster
05:45 gyadav joined #gluster
05:48 BlackoutWNCT joined #gluster
05:49 BlackoutWNCT joined #gluster
05:49 BlackoutWNCT joined #gluster
05:52 jdossey joined #gluster
05:53 riyas joined #gluster
06:02 msvbhat joined #gluster
06:03 BlackoutWNCT joined #gluster
06:04 shruti` joined #gluster
06:04 BlackoutWNCT joined #gluster
06:04 BlackoutWNCT joined #gluster
06:10 rafi joined #gluster
06:12 pank_ joined #gluster
06:14 skoduri joined #gluster
06:15 poornima joined #gluster
06:15 Pupeno joined #gluster
06:15 pank_ when I run "gluster volume heal VOL info" it shows a number of entries, mostly gfids, but "gluster volume heal VOL04 info split-brain" shows zero entries. How can I clear all those entries that need to heal?
06:17 pank_ Here is the output of both above mentioned commands "gluster volume heal VOL04 info"
06:17 pank_ Brick 172.16.185.125:/storage/home/VOL02g Status: Connected Number of entries: 0
06:18 pank_ Brick 172.16.215.84:/storage/home/VOL03g Status: Connected Number of entries: 56
06:18 pank_ Brick 172.16.185.124:/storage/home/VOL01g Status: Connected Number of entries: 3439
06:19 pank_ Now for command "gluster volume heal VOL04 info split-brain"
06:19 pank_ Status: Connected Number of entries in split-brain: 0  Brick 172.16.185.125:/storage/home/VOL02g Status: Connected Number of entries in split-brain: 0  Brick 172.16.215.84:/storage/home/VOL03g Status: Connected Number of entries in split-brain: 0
06:19 pank_ can someone please help
06:19 pank_ ?
06:21 RameshN joined #gluster
06:32 sona joined #gluster
06:33 msvbhat joined #gluster
06:33 TBlaar joined #gluster
06:43 sanoj joined #gluster
06:46 apandey joined #gluster
06:46 nishanth joined #gluster
06:47 pank_ hi
06:47 glusterbot pank_: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answ
06:47 pank_ help please?
06:47 pank_ anybody?
06:48 sbulage joined #gluster
06:51 victori joined #gluster
06:51 itisravi pank_: zero entries for info split-brain means there is no split-brain. You just have pending heals.
06:51 itisravi pank_: try launching heal manually: gluster volume heal volname
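A minimal sketch of the sequence itisravi suggests, assuming pank_'s volume name VOL04; exact output varies per setup:

    # ask the self-heal daemon to crawl and heal the pending entries
    gluster volume heal VOL04
    # re-check how many entries are still pending on each brick
    gluster volume heal VOL04 info
    # per-brick counts without listing every gfid
    gluster volume heal VOL04 statistics heal-count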
06:53 jiffin joined #gluster
06:54 mb_ joined #gluster
06:56 pank_ itisravi: I tried, but it's not working
06:57 itisravi pank_: do you see errors/warnings in the glustershd.log on the nodes?
06:59 pank_ itisravi: There is no error/warning in any log file
06:59 BlackoutWNCT joined #gluster
06:59 moiz joined #gluster
07:00 BlackoutWNCT joined #gluster
07:00 msvbhat joined #gluster
07:00 itisravi pank_: can you check if the files in question exist on all bricks of the replica?
07:01 itisravi pank_: btw what version of gluster is this and what is the replica count?
07:02 pank_ yitisravi: yes, all files exist on all bricks. The GlusterFS version is 3.7.19 and it's a replica 3 setup on a 3-node cluster
07:04 pank_ itisravi: yes, all files exist on all bricks. The GlusterFS version is 3.7.19 and it's a replica 3 setup on a 3-node cluster
07:05 itisravi pank_: hmm, could you try restarting the self-heal daemon? (gluster volume start volname force)
07:05 itisravi and then launching heal again?
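The restart itisravi describes, sketched with the same assumed volume name; per his suggestion, "start ... force" restarts the self-heal daemon without disturbing bricks that are already running:

    # restart glustershd for the volume
    gluster volume start VOL04 force
    # the Self-heal Daemon should show as online for every node in the status output
    gluster volume status VOL04
    # then launch the heal again
    gluster volume heal VOL04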
07:08 jdossey joined #gluster
07:12 pank_ itisravi: still no change
07:12 Humble joined #gluster
07:13 itisravi pank_: nothing suspicious at all in the glustershd log?
07:14 pank_ itisravi: no, I am seeing glfsheal logs for my other volume, but not for this one
07:15 pank_ itisravi: If I run the command "gluster volume heal VOL04 split-brain source-brick 172.6.5.4:/storage/home/VOL01g" it throws an error like "Healing gfid:281f0e2b-e011-4047-a7c3-f761836146d7 failed:Transport endpoint is not connected."
07:16 pank_ itisravi: whereas "gluster volume status" shows all bricks as up and running
07:17 pank_ itisravi: also, I am now getting input/output errors on some files
07:17 itisravi pank_: looks like there is some issue in the network connection then. BTW,  'split-brain source-brick' is only for split-brain resolution.
07:18 itisravi pank_: I/O error where? on the mount?
07:18 pank_ itisravi: can you please tell me why this I/O error is happening and how I can avoid it
07:19 pank_ itisravi: yes, input/output errors on the client-side mount
07:19 evilemerson joined #gluster
07:20 itisravi pank_: hmm does the client side log show split-brain error?
07:20 itisravi pank_: for this file.
07:20 Guest89004 joined #gluster
07:20 jtux joined #gluster
07:21 swebb joined #gluster
07:21 pank_ itisravi: no
07:21 misc joined #gluster
07:22 itisravi pank_: what does it say?
07:24 pank_ itisravi: nothing is being printed to that log file
07:24 itisravi pank_: I'm not sure then :(
07:25 Wizek joined #gluster
07:26 victori joined #gluster
07:27 pank_ itisravi: can you tell me the recommended configuration for a 3-node setup, with all parameters if possible, so that I can cross-check it against ours? We have deployed our setup on AWS using Ubuntu.
07:28 itisravi pank_: default configurations that are set during a volume of replica-3 should work fine.
07:28 itisravi during a volume create*
07:30 pank_ ok
07:37 pank_ itisravi: can you please tell me the difference between the output of "gluster volume heal VOL04 info" vs "gluster volume heal VOL04 info split-brain"? What does split-brain mean here in terms of files? Also, what do the gfids in "gluster volume heal VOL info" mean?
07:42 [diablo] joined #gluster
07:45 ivan_rossi joined #gluster
07:50 DV joined #gluster
07:58 kxseven joined #gluster
08:00 nbalacha joined #gluster
08:03 shutupsquare joined #gluster
08:10 itisravi https://gluster.readthedocs.io/en/latest/Troubleshooting/heal-info-and-split-brain-resolution/ gives you the difference. http://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/ tells you what split-brains are.
08:10 glusterbot Title: Split Brain (Auto) - Gluster Docs (at gluster.readthedocs.io)
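To summarize the distinction those pages draw: "heal info" lists every entry the self-heal daemon still has to process, while "info split-brain" lists only the subset that cannot be healed automatically and needs manual resolution. A sketch, again assuming the volume name VOL04 and a hypothetical file path:

    # all pending heals (gfids and/or paths), listed per brick
    gluster volume heal VOL04 info
    # only entries in split-brain; zero here means no manual action is needed
    gluster volume heal VOL04 info split-brain
    # manual resolution applies only to split-brain entries, e.g. picking the bigger copy
    gluster volume heal VOL04 split-brain bigger-file /path/within/volume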
08:24 jdossey joined #gluster
08:30 fsimonce joined #gluster
08:34 Marbug joined #gluster
08:37 jri joined #gluster
08:39 musa22 joined #gluster
08:40 nbalacha joined #gluster
08:48 gem joined #gluster
09:07 hybrid512 joined #gluster
09:16 flying joined #gluster
09:17 pulli joined #gluster
09:17 itisravi joined #gluster
09:25 jdossey joined #gluster
09:28 kotreshhr joined #gluster
09:30 pulli joined #gluster
09:33 DV joined #gluster
09:37 gyadav joined #gluster
09:38 nbalacha joined #gluster
09:40 ahino joined #gluster
10:02 derjohn_mob joined #gluster
10:16 jkroon joined #gluster
10:19 Philambdo joined #gluster
10:22 nbalacha joined #gluster
10:22 ShwethaHP joined #gluster
10:25 jdossey joined #gluster
10:30 Ulrar So I'm reading the NFS-Ganesha docs, and I'm not sure I understand everything. Am I right in believing that it makes it possible to mount a gluster volume using NFS (with all the performance gains that implies) but still get HA as if the gluster FUSE mountpoint were used?
10:33 cloph Ulrar: you *could* use it that way (but you don't need to go through the hassle of setting it up in an HA way).
10:34 cloph ganesha's HA works by pointing a private IP to a reachable server
10:34 cloph and not sure about performance gains re nfs - guess that depends on the workload
10:34 Pupeno joined #gluster
10:35 Ulrar Well, it's a web app, so NFS is a huge performance gain
10:35 Ulrar And huge isn't big enough to describe it
10:36 Ulrar OPcache makes it bearable on FUSE, but not all clients are okay with that, unfortunately :/
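A sketch of the client side of what cloph describes, using a hypothetical floating IP 10.0.0.50 fronting the ganesha servers and a hypothetical volume named vol0; the HA piece just moves that IP to a surviving ganesha node on failure, so the client keeps a single mount:

    # mount over NFSv4 through ganesha's floating IP instead of a gluster FUSE mount
    mount -t nfs -o vers=4.0 10.0.0.50:/vol0 /mnt/vol0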
10:50 flyingX joined #gluster
10:50 cacasmacas joined #gluster
11:03 nbalacha joined #gluster
11:06 rwheeler joined #gluster
11:12 DV joined #gluster
11:22 nbalacha joined #gluster
11:22 flomko joined #gluster
11:24 ankit__ joined #gluster
11:26 ankit_ joined #gluster
11:26 gyadav_ joined #gluster
11:26 jdossey joined #gluster
11:28 shyam joined #gluster
11:34 kkeithley bug triage meeting in approx 30 minutes in #gluster-meeting
11:34 gem joined #gluster
11:36 social joined #gluster
11:47 flying joined #gluster
11:56 DV joined #gluster
11:58 unlaudable joined #gluster
11:59 kkeithley bug triage meeting now in #gluster-meeting
12:03 Wizek joined #gluster
12:03 shutupsquare joined #gluster
12:13 kotreshhr left #gluster
12:26 Seth_Karlo joined #gluster
12:33 ashiq joined #gluster
12:35 Pupeno joined #gluster
12:35 jiffin joined #gluster
12:41 musa22 joined #gluster
12:42 Seth_Karlo joined #gluster
12:42 jdossey joined #gluster
12:47 ndarshan joined #gluster
13:02 Seth_Karlo joined #gluster
13:03 buvanesh_kumar joined #gluster
13:05 kettlewell joined #gluster
13:06 ira joined #gluster
13:19 kotreshhr joined #gluster
13:35 shyam joined #gluster
13:38 Saravanakmr joined #gluster
13:43 jdossey joined #gluster
13:49 musa22_ joined #gluster
13:49 atinm joined #gluster
13:50 pasik joined #gluster
13:51 musa22 joined #gluster
13:53 Vaizki joined #gluster
13:54 unclemarc joined #gluster
13:54 sbulage joined #gluster
13:59 eryc joined #gluster
13:59 eryc joined #gluster
14:07 DV joined #gluster
14:12 kotreshhr left #gluster
14:18 Vaizki joined #gluster
14:26 msvbhat joined #gluster
14:30 shutupsq_ joined #gluster
14:34 squizzi joined #gluster
14:34 msvbhat joined #gluster
14:35 skylar joined #gluster
14:35 nbalacha joined #gluster
14:36 shutupsquare joined #gluster
14:44 jdossey joined #gluster
14:48 Wizek joined #gluster
14:49 jkroon joined #gluster
14:51 Gambit15 Hmmm...having problems with some of my bricks "flapping" & I think it's related to some rpc "saved_frames_unwind" errors I'm getting. Looks like a segfault...
14:52 Gambit15 Anyone able to give a hand?
14:53 ankit_ joined #gluster
15:07 musa22 joined #gluster
15:18 bbooth joined #gluster
15:18 atinm joined #gluster
15:21 susant joined #gluster
15:26 Gambit15 joined #gluster
15:26 kpease joined #gluster
15:31 gyadav_ joined #gluster
15:33 DV joined #gluster
15:35 kpease joined #gluster
15:41 farhorizon joined #gluster
15:44 jdossey joined #gluster
15:45 msvbhat joined #gluster
15:54 timotheus1 joined #gluster
15:56 Seth_Kar_ joined #gluster
16:02 joshin joined #gluster
16:02 joshin joined #gluster
16:04 wushudoin joined #gluster
16:07 wushudoin joined #gluster
16:10 gyadav joined #gluster
16:18 farhorizon joined #gluster
16:19 msvbhat joined #gluster
16:19 bbooth joined #gluster
16:20 jdossey joined #gluster
16:28 susant left #gluster
16:34 JoeJulian Gambit15: A segfault without a crash?
16:48 jkroon joined #gluster
16:50 ivan_rossi left #gluster
16:50 stomith joined #gluster
16:55 gyadav joined #gluster
16:58 Gambit15 JoeJulian, at 3am (6am UTC) this morning all of my bricks flapped, and looking through the logs it seems glusterd on all of the nodes stopped communicating for a period long enough to force all nodes to lose quorum
17:01 Gambit15 Near the very beginning of the incident, these two lines in the bricks' logs are interesting
17:01 Gambit15 [2017-01-31 06:08:24.792712] W [glusterfsd.c:1327:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7f2e07fd1dc5] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x7f2e09663cd5] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x7f2e09663b4b] ) 0-: received signum (15), shutting down
17:01 glusterbot Gambit15: ('s karma is now -169
17:01 Gambit15 [2017-01-31 06:08:37.297824] I [MSGID: 100030] [glusterfsd.c:2454:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.8.5 (args: /usr/sbin/glusterfsd -s s3 --volfile-id data.s3.gluster-data-brick -p /var/lib/glusterd/vols/data/run/s3-gluster-data-brick.pid -S /var/run/gluster/feffdd303529adadcbb78af828c3d550.socket --brick-name /gluster/data/brick -l /var/log/glusterfs/bricks/gluster-data-brick.log --xlator-option *-posix.glusterd
17:01 Gambit15 )++
17:01 glusterbot Gambit15: )'s karma is now 1
17:01 Gambit15 oO
17:02 cloph poor (
17:02 Gambit15 This is the most prominent error:
17:02 Gambit15 [2017-01-31 16:15:44.827252] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7fc3822dc002] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fc3820a384e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fc3820a395e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7fc3820a50b4] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7fc3820a5990] ))))) 0-glusterfs: forced unwinding frame
17:02 glusterbot Gambit15: ('s karma is now -170
17:02 glusterbot Gambit15: ('s karma is now -171
17:02 glusterbot Gambit15: ('s karma is now -172
17:02 glusterbot Gambit15: ('s karma is now -173
17:02 glusterbot Gambit15: ('s karma is now -174
17:02 cloph @paste Gambit15
17:02 Gambit15 Yeah, sorry. :|
17:03 emerson joined #gluster
17:03 Gambit15 Back in 2 minutes! Will check your comments when I come back
17:03 cloph received signum (15) → that would be a regular TERM signal... so something told gluster to shut down...
17:05 emerson joined #gluster
17:07 ankit_ joined #gluster
17:07 jdossey joined #gluster
17:09 Gambit15 back
17:09 Prasad joined #gluster
17:13 Gambit15 Right, well there's not really anything in the logs before that. The last event was the previous evening
17:14 Gambit15 Going through /var/log/messages however, I did come across a bunch of systemd messages starting around 5 minutes beforehand, referring to the mounted volumes
17:14 Gambit15 (last paste...)
17:14 Gambit15 Jan 31 06:01:01 v3 systemd-machined: Got message type=signal sender=:1.0 destination=n/a object=/org/freedesktop/systemd1/unit/rhev_2ddata_5cx2dcenter_2dmnt_2dglusterSD_2ds0_2edc0_2ecte_2eufmt_2ebr_3a_5fiso_2emount interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=73270 reply_cookie=0 error=n/a
17:15 saali joined #gluster
17:15 Gambit15 A couple of those for the locally mounted volumes on each host, but nothing else of interest
17:16 Gambit15 cloph
17:31 riyas joined #gluster
17:37 JoeJulian Meh, single-line pastes are fine. It's the braindead karma limnoria plugin that's the problem.
17:39 JoeJulian Gambit15: That "cleanup and exit" line... do you auto-update at 06:00 UTC?
17:41 Gambit15 JoeJulian, I *never* auto-update anything!
17:43 JoeJulian So the mystery you'll want to solve: why did the brick stop and start? That's not a crash; that's a normal shutdown process.
17:43 Gambit15 They're standard CentOS 7 installs running only Gluster & oVirt. The oVirt processes only manage the HA for the controller, nothing else
17:44 Gambit15 Some key logs: https://paste.fedoraproject.org/541703/58846721/
17:44 glusterbot Title: #541703 • Fedora Project Pastebin (at paste.fedoraproject.org)
17:46 JoeJulian Ah, right, ok.
17:46 JoeJulian brick was stopped because you lost server quorum.
17:46 Gambit15 It seems the first thing to be logged was the readv timeout on s1 (.101), however it looks much the same on s1
17:47 Gambit15 ...except it complains about timing out to s3 first, then the other peers
17:47 JoeJulian Are these split two servers per DC?
17:47 Gambit15 ?
17:47 zeio joined #gluster
17:47 JoeJulian Oh, nm... all dc0
17:48 JoeJulian Looks like a network loss.
17:48 jwd joined #gluster
17:49 Gambit15 ...except I'm not seeing any anomalies in the network :/
17:49 JoeJulian ... and it's not a timeout it's a "disconnected"...
17:50 Gambit15 Would the same logs from any of the other servers help?
17:50 JoeJulian I would have expected to find the word "timeout" if the tcp connection wasn't RST FIN.
17:50 Gambit15 Ah, "timeout" is "Tempo esgotado para conexão"
17:50 JoeJulian Might be worth comparing the glusterd.vol.log files to see if there's anything different in them, but I doubt it.
17:51 JoeJulian Mmm, could be then.
17:52 ppai joined #gluster
17:52 Gambit15 So at 06:08:27, it seems that everything timed out. And from what I can tell, gluster then started the process for lost quorum
17:55 JoeJulian What happened is that glusterd stopped its bricks when it lost quorum with the other glusterd daemons. According to s0, it first noticed s1 was missing at 06:08:32.080, then lost s3 at 06:08.095. That reduced the number of known peers to <= 50%, causing the quorum shutdown.
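For reference, the server-quorum behaviour JoeJulian describes is governed by these options; the volume name "data" is taken from the brick log above, and the values shown are illustrative, not a recommendation for this cluster:

    # see what the volume currently uses
    gluster volume get data cluster.server-quorum-type
    # "server" makes glusterd stop local bricks when it can reach <= 50% of peers
    gluster volume set data cluster.server-quorum-type server
    # the ratio is cluster-wide and can be raised above the 50% default
    gluster volume set all cluster.server-quorum-ratio 51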
17:56 Gambit15 It's odd though. Currently, the management network & storage network are on the same switch, separated by VLANs. I only monitor the management network directly, and monitor the storage ports via the switch; however, neither of those reported any issues at the time all of this happened
17:57 msvbhat joined #gluster
17:58 rwheeler joined #gluster
17:58 Gambit15 I can attach our monitoring to the storage network & monitor the 4 hosts via ping, but it doesn't make much sense that all of the storage ports, and only those ports, silently stopped passing traffic.
17:58 JoeJulian So what else might have caused that? Any other log entries at all near 0600? (journalctl -S 06:00 -U 06:10)
18:02 JoeJulian Assuming glusterd uses the same 42 second ping-timeout (I don't know if it does), you would have had a network interruption for about 55 seconds. Could it be iptables? Route tables? Arp contention? (off the top of my head)
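The two checks JoeJulian mentions, written out; the timestamps are the ones from this incident and "data" is again the volume name from the brick log:

    # the client-side ping-timeout for the volume (42 seconds by default)
    gluster volume get data network.ping-timeout
    # everything the system journal recorded in the window around the brick restarts
    journalctl -S "2017-01-31 06:00:00" -U "2017-01-31 06:10:00"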
18:02 Wizek joined #gluster
18:04 bbooth joined #gluster
18:05 Gambit15 CentOS' firewall (iptables) & SELinux are disabled
18:08 Gambit15 Interestingly, I've got a 2+1 cluster on another switch, with the arbiter as a VM on this cluster. I've never had quorum issues with those 2
18:09 Gambit15 Neither of the switches are seeing much load either, so that's not the issue
18:12 Gambit15 JoeJulian, the only unexpected journal log entries during that period are the following (3 lines coming...)
18:12 Gambit15 Jan 31 06:08:22 systemd[1]: Got notification message for unit systemd-journald.service
18:12 Gambit15 systemd-journald.service: Got notification message from PID 591 (WATCHDOG=1)
18:12 Gambit15 systemd[1]: systemd-journald.service: got WATCHDOG=1
18:13 ShwethaHP joined #gluster
18:13 JoeJulian That seems (to me) like a fairly strong indicator that it's an external problem.
18:15 Gambit15 Right, I'll add our monitoring on a storage interface then. Any suggestions better than simply pinging .100-.103?
18:16 farhoriz_ joined #gluster
18:16 Karan joined #gluster
18:17 JoeJulian Depends on the hardware, but that'll at least give you something.
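One crude way to catch a recurrence on the storage VLAN while waiting; the subnet below is a placeholder standing in for whatever network the .100-.103 hosts actually live on:

    # log every missed ping reply with a UTC timestamp
    while sleep 5; do
      for ip in 10.10.10.100 10.10.10.101 10.10.10.102 10.10.10.103; do
        ping -c1 -W1 "$ip" >/dev/null || echo "$(date -u +%FT%TZ) no reply from $ip" >> /var/log/storage-ping.log
      done
    done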
18:17 dgandhi joined #gluster
18:20 vbellur joined #gluster
18:22 msvbhat joined #gluster
18:25 jri joined #gluster
18:29 Gambit15 Going through the traffic graphs for these ports & I'm not seeing a single blip
18:29 Gambit15 :S
18:45 farhorizon joined #gluster
18:45 PaulCuzner joined #gluster
18:45 PaulCuzner left #gluster
18:47 jri joined #gluster
18:53 msvbhat joined #gluster
19:00 Pupeno joined #gluster
19:04 msvbhat joined #gluster
19:21 farhorizon joined #gluster
19:27 msvbhat joined #gluster
19:33 jri joined #gluster
19:34 shutupsquare joined #gluster
19:40 msvbhat joined #gluster
19:46 vbellur joined #gluster
19:51 musa22 joined #gluster
19:57 Vapez joined #gluster
20:06 ankit_ joined #gluster
20:09 unlaudable joined #gluster
20:09 vbellur joined #gluster
20:31 bbooth joined #gluster
20:31 farhoriz_ joined #gluster
20:40 ahino joined #gluster
20:45 vbellur joined #gluster
20:48 unlaudable joined #gluster
20:51 kettlewell joined #gluster
20:52 farhorizon joined #gluster
20:52 jiffin joined #gluster
21:01 farhorizon joined #gluster
21:02 ashiq joined #gluster
21:10 musa22 joined #gluster
21:12 Pupeno joined #gluster
21:13 bbooth joined #gluster
21:13 squizzi joined #gluster
21:17 stomith joined #gluster
21:46 DV joined #gluster
21:58 l2__ joined #gluster
22:05 farhorizon joined #gluster
22:09 farhorizon joined #gluster
22:09 vbellur joined #gluster
22:32 bbooth joined #gluster
22:40 kkeithley joined #gluster
22:57 stomith joined #gluster
23:16 Wizek joined #gluster
23:16 flo joined #gluster
23:22 farhorizon joined #gluster
23:36 bbooth joined #gluster
23:41 derjohn_mob joined #gluster
