
IRC log for #gluster, 2016-06-22


All times shown in UTC.

Time Nick Message
00:07 PaulCuzner joined #gluster
00:17 julim joined #gluster
00:33 hchiramm joined #gluster
00:39 rafi joined #gluster
00:40 arcolife joined #gluster
00:52 luizcpg joined #gluster
01:05 PaulCuzner joined #gluster
01:13 dnunez joined #gluster
01:18 shdeng joined #gluster
01:28 d0nn1e joined #gluster
01:32 Vaizki joined #gluster
01:33 Lee1092 joined #gluster
01:36 haomaiwang joined #gluster
01:48 ilbot3 joined #gluster
01:48 Topic for #gluster is now Gluster Community - http://gluster.org | Documentation - https://gluster.readthedocs.io/en/latest/ | Patches - http://review.gluster.org/ | Developers go to #gluster-dev | Channel Logs - https://botbot.me/freenode/gluster/ & http://irclog.perlgeek.de/gluster/
01:52 Alghost_ joined #gluster
01:56 F2Knight joined #gluster
02:06 jiffin joined #gluster
02:09 rafaels joined #gluster
02:09 daMaestro joined #gluster
02:14 haomaiwang joined #gluster
02:35 masuberu joined #gluster
02:38 rafaels joined #gluster
02:39 F2Knight joined #gluster
02:44 ramteid joined #gluster
02:57 kotreshhr joined #gluster
03:00 hchiramm joined #gluster
03:14 magrawal joined #gluster
03:31 nishanth joined #gluster
03:36 Apeksha joined #gluster
03:47 atinm joined #gluster
03:59 kramdoss_ joined #gluster
04:02 gem_ joined #gluster
04:05 itisravi joined #gluster
04:14 RameshN joined #gluster
04:14 shubhendu joined #gluster
04:16 shubhendu joined #gluster
04:25 aspandey joined #gluster
04:31 ppai joined #gluster
04:40 nehar joined #gluster
04:41 jiffin joined #gluster
04:43 aravindavk joined #gluster
04:45 poornimag joined #gluster
04:54 nbalacha joined #gluster
05:00 Bhaskarakiran joined #gluster
05:08 ndarshan joined #gluster
05:18 DV_ joined #gluster
05:27 karthik___ joined #gluster
05:27 raghug joined #gluster
05:27 kotreshhr joined #gluster
05:39 satya4ever joined #gluster
05:39 DV_ joined #gluster
05:43 hgowtham joined #gluster
05:44 gowtham joined #gluster
05:56 jwd joined #gluster
05:57 harish joined #gluster
06:01 ashiq joined #gluster
06:03 prasanth joined #gluster
06:04 kdhananjay joined #gluster
06:07 jiffin joined #gluster
06:11 rafi joined #gluster
06:11 aspandey joined #gluster
06:11 ashiq joined #gluster
06:16 hchiramm joined #gluster
06:18 gvandeweyer hi, I have replicated 1 x 3 setup, of which one brick is down. Problem is that the node serving this brick also servers other distributed bricks, so restarting gluster service is not really an option. Second, the replicated volume is heavily used, so restarting the volume is something I'd like to avoid as well. Can I somehow bring this single brick back online without restarting full nodes/volumes?
06:18 eryc joined #gluster
06:19 jiffin gvandeweyer: try gluster v start <volname> force
06:19 jiffin it restarts only the offline brick
06:19 gvandeweyer ok. that is on the node serving that brick?
06:19 Manikandan joined #gluster
06:20 jiffin gvandeweyer: any node will be fine
06:21 gvandeweyer ok, that did it, thanks!
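A minimal sketch of the sequence jiffin describes, using a hypothetical volume name "bigvol"; "start ... force" only spawns brick processes that are currently offline, so running bricks and other volumes are left untouched:

    gluster volume status bigvol        # find the brick showing "N" under Online
    gluster volume start bigvol force   # respawns only the offline brick process
    gluster volume status bigvol        # confirm the brick is back online
    gluster volume heal bigvol info     # optionally watch self-heal catch the brick up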
06:22 rastar joined #gluster
06:22 kovshenin joined #gluster
06:22 skoduri joined #gluster
06:23 jiffin1 joined #gluster
06:23 arcolife joined #gluster
06:26 atalur joined #gluster
06:30 gvandeweyer another issue we have: sometimes, simple file copies (or other commands that write to gluster) hang after the copy is finished. it seems that gluster/copy is unable to release the open file or something. when I read the target file (on gluster), e.g. by doing a checksum, the copy successfully ends.
06:31 om joined #gluster
06:47 jtux joined #gluster
06:47 jiffin1 joined #gluster
06:48 micke_ joined #gluster
06:48 snila_ joined #gluster
06:48 hgowtham_ joined #gluster
06:48 shyam1 joined #gluster
06:51 kevc joined #gluster
06:56 lord4163_ joined #gluster
06:58 post-factum joined #gluster
07:00 ic0n_ joined #gluster
07:01 ghenry_ joined #gluster
07:05 masuberu joined #gluster
07:05 cliluw joined #gluster
07:06 deniszh joined #gluster
07:06 JoeJulian atinm: bug 1347329 isn't about tuning quorum when there's only a replica 2 volume, it's about allowing prior behavior with newer versions.
07:06 glusterbot Bug https://bugzilla.redhat.com:443/show_bug.cgi?id=1347329 high, unspecified, ---, bugs, CLOSED NOTABUG, a two node glusterfs seems not possible anymore?!
07:09 JoeJulian Requiring manual intervention is not valid, imho. The user knows the risks of split brain and has already developed measures to prevent them in his application. MTTR is more important to them than safety.
07:09 atinm JoeJulian, yes, there are two different issues forked out of the comments exchange, I do understand that earlier this used to be a different behavior, but as I mentioned for ensuring that volumes are not brought up with stale configuration data, bringing up the daemons was prevented, we do have a workaround to explicitly start the volume in that case, I've updated the bug
07:10 JoeJulian Cool. It's just my opinion, of course. Now I'm off to bed. :)
07:11 msvbhat joined #gluster
07:11 atinm JoeJulian, I'll think through whether we can have an implementation (with some volume options) to make sure if we can bypass this
07:11 JoeJulian Sounds good to me.
07:11 atinm JoeJulian, good night :)
07:12 hchiramm joined #gluster
07:12 jri joined #gluster
07:13 msvbhat_ joined #gluster
07:21 Jules- atinm: so you will working on a fix for this?
07:22 atinm Jules-, we will give it a try
07:22 Jules- thanks
07:23 anil_ joined #gluster
07:25 hackman joined #gluster
07:29 [diablo] joined #gluster
07:30 [Enrico] joined #gluster
07:33 [fre] joined #gluster
07:35 [Enrico] joined #gluster
07:40 fsimonce joined #gluster
07:45 kshlm joined #gluster
07:45 hackman joined #gluster
07:47 msvbhat_ left #gluster
07:47 jiffin1 joined #gluster
08:00 masuberu joined #gluster
08:01 [fre] guys, could somebody explain to me how gluster does NFS? I mean, we are trying to roll back from Ganesha and do a bit more "native" nfs.
08:02 karthik___ joined #gluster
08:02 [fre] Is gluster doing its own NFS-implementation or does it simply use the common NFS-kernel-stack-space-thing?
08:03 skoduri joined #gluster
08:06 Apeksha joined #gluster
08:25 itisravi [fre]: gluster has its own nfs server implementation but we are moving towards using Ganesha. The latest release has gluster nfs disabled by default.
08:25 itisravi [fre]: See https://www.gluster.org/pipermail/gluster-users/2016-April/026202.html
08:25 glusterbot Title: [Gluster-users] RFC: beginning to phase out legacy Gluster NFS, to be eventually replaces with NFS-Ganesha (at www.gluster.org)
08:27 ashiq joined #gluster
08:30 [fre] itisravi, We've been trying to use ganesha for small files, ended up with weekly crashes. can't support nor sell it to the devs anymore.
08:32 gem joined #gluster
08:33 itisravi joined #gluster
08:33 itisravi [fre]: you can still enable and use the native nfs server.
08:34 [fre] ok, so I keep the "nfs.disable on" and do a local fuse-mount from the volume to an export-directory and do it the normal nfs-way?
08:36 Dave joined #gluster
08:36 itisravi You can set nfs.disable to 'off' and 'mount -t nfs hostname:volname /mount_point'
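A sketch of what itisravi suggests, with hypothetical names "bigvol" and "gluster1"; note the built-in gluster NFS server speaks NFSv3 only, so clients that default to NFSv4 usually need vers=3:

    gluster volume set bigvol nfs.disable off            # re-enable the built-in (gNFS) server
    showmount -e gluster1                                # the volume should be listed as an export
    mount -t nfs -o vers=3 gluster1:/bigvol /mnt/bigvol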
08:36 [fre] itisravi, do you have any idea about when a stable release will be introduced in RHEL 7?
08:37 itisravi [fre]: RHEL or Centos?
08:37 [fre] itisravi, if I do it that way, I'm using the gnfs-gluster-thing and not the native linux kernel-based nfs?
08:37 [fre] RHEL it is.
08:38 itisravi I'd say use the latest released version (which is 3.8) but I'm a dev so maybe I'm biased as to what stable is.
08:38 itisravi RHEL I'm not sure.
08:39 [fre] ow yeah, okay, you're talking about gluster, I was wondering about a stable ganesha-solution. :)
08:39 Gnomethrower joined #gluster
08:40 itisravi ah.
08:42 [fre] itisravi, I can't (nor want to) change the gluster-version as long as no update is available from RH. Just looking for the most performant option now: NFS from Gluster or native NFS (which supports v4)?
08:44 [fre] If gluster-nfs is evolving towards ganesha, I'd currently better stick to the old-school type, I presume.
08:46 kdhananjay joined #gluster
08:46 atalur joined #gluster
08:46 aspandey joined #gluster
08:46 Apeksha joined #gluster
08:49 ashiq joined #gluster
08:49 post-factum ganesha fails for us too while working with mailboxes…
08:51 kovshenin joined #gluster
08:52 om joined #gluster
09:01 Slashman joined #gluster
09:02 [Enrico] joined #gluster
09:05 aravindavk joined #gluster
09:07 atalur joined #gluster
09:07 aspandey joined #gluster
09:18 karnan joined #gluster
09:30 mennie left #gluster
09:32 Jules- hopefully you will not throw gnfs daemon out of glusterfs until ganesha is stable enough to replace it.
09:33 jiffin [fre]: can u mention your workload? we have a fix http://review.gluster.org/#/c/14532/ which fixes mem leak issues
09:33 glusterbot Title: Gerrit Code Review (at review.gluster.org)
09:36 raghug joined #gluster
09:37 kshlm joined #gluster
09:37 partner_ hmm, we went to 3.8.0 with openstack kilo and once we attempt to make a filesystem on a mounted volume the whole instance crashes
09:37 jiffin Jules-: post-factum we are working towards ganesha stability. If u had faced any serious issue with ganesha, please let us know
09:37 jiffin post-factum:
09:38 jiffin if I missed u in my previous message
09:38 post-factum jiffin: i'd be glad to do that as soon as i find the way not to break production setup with my experiments
09:38 partner_ this is what gets written to the libvirtd log: https://paste.fedoraproject.org/383118/88311146/
09:38 glusterbot Title: #383118 Fedora Project Pastebin (at paste.fedoraproject.org)
09:39 post-factum jiffin: that stalls (dovecot in D state with mailboxes on nfs3 provided by ganesha) occurs only on production setup
09:39 post-factum or nfs4, doesn't matter
09:39 plarsen joined #gluster
09:39 Jules- jiffin: yes, ganesha is not functional at all on Debian-based systems. But I told kkeithley this already.
09:39 jiffin Jules-: oh k
09:40 satya4ever joined #gluster
09:41 jiffin post-factum: did u mean ganesha went in hung state or the mailbox service went in hung state?
09:41 post-factum jiffin: mailbox service blocks on I/O in D state with no progress
09:41 partner_ we're kind of stuck since the 3.7 leaks memory with libgfapi but at least it somewhat works otherwise
09:45 Jules- post-factum: sounds for me like an file locking issue
09:45 post-factum aye
09:45 atinm joined #gluster
09:48 ppai joined #gluster
09:48 partner_ haven't created any bug about this yet since i thought to discuss the topic first and see if i'm doing something wrong here.. but the issue is repeatable on multiple separate environments
09:48 jiffin1 joined #gluster
09:49 kramdoss_ joined #gluster
09:59 [fre] jiffin, most errors occurred with file transfer (files with sizes from a couple of KB to 20 MB), about 70000 or so.
09:59 [fre] it's a dual node ganesha-setup, version 2.2.... from RHat.
10:00 kshlm joined #gluster
10:04 ira joined #gluster
10:05 skoduri joined #gluster
10:06 satya4ever joined #gluster
10:14 Gambit15 joined #gluster
10:18 om joined #gluster
10:19 ramky joined #gluster
10:21 raghug joined #gluster
10:27 msvbhat_ joined #gluster
10:28 kshlm joined #gluster
10:37 kdhananjay joined #gluster
10:39 itisravi_ joined #gluster
10:43 jiffin [fre]: I guess the patch will fix ur issue
10:44 itisravi joined #gluster
10:44 Gambit15 joined #gluster
10:45 atinm joined #gluster
10:46 luizcpg joined #gluster
10:46 ppai joined #gluster
10:49 jiffin1 joined #gluster
10:54 robb_nl joined #gluster
10:54 aspandey joined #gluster
10:56 msvbhat joined #gluster
11:03 skoduri joined #gluster
11:05 Guest49069 joined #gluster
11:16 guardianJ joined #gluster
11:16 [Enrico] joined #gluster
11:16 guardianJ left #gluster
11:21 jiffin [fre]: if u have the RH subscription, you can request an accelerated fix
11:25 klaxa joined #gluster
11:25 DV_ joined #gluster
11:28 owlbot joined #gluster
11:30 johnmilton joined #gluster
11:33 Wizek joined #gluster
11:38 Gnomethrower joined #gluster
11:41 shubhendu joined #gluster
11:47 shdeng joined #gluster
11:48 ppai joined #gluster
12:01 nottc joined #gluster
12:01 kshlm Weekly community meeting starting now in #gluster-meeting
12:08 msvbhat joined #gluster
12:09 atinm joined #gluster
12:10 jdarcy joined #gluster
12:13 rafaels joined #gluster
12:14 aspandey joined #gluster
12:15 ppai joined #gluster
12:19 B21956 joined #gluster
12:22 haomaiwang joined #gluster
12:24 haomaiwang joined #gluster
12:25 Debloper joined #gluster
12:26 kdhananjay joined #gluster
12:30 hackman joined #gluster
12:43 mchangir joined #gluster
12:43 arcolife joined #gluster
12:53 skoduri joined #gluster
12:59 unclemarc joined #gluster
13:00 hchiramm joined #gluster
13:06 julim joined #gluster
13:14 rafaels joined #gluster
13:19 luizcpg joined #gluster
13:20 owlbot joined #gluster
13:22 alvinstarr joined #gluster
13:26 jiffin joined #gluster
13:27 alvinstarr joined #gluster
13:28 rwheeler joined #gluster
13:29 skoduri joined #gluster
13:30 jiffin joined #gluster
13:33 partner so, memory leaks anybody? :)
13:34 shyam joined #gluster
13:36 Guest49069 joined #gluster
13:36 jiffin partner: are you referring to the previous question related to "3.8.0 with openstack kilo and once we attempt to make filesystem to mounted volume the whole instance crashes"?
13:38 dnunez joined #gluster
13:39 kkeithley partner: tell us what you've got
13:42 Apeksha joined #gluster
13:43 mchangir joined #gluster
13:49 Apeksha joined #gluster
13:55 msvbhat_ joined #gluster
13:59 aravindavk joined #gluster
13:59 Apeksha_ joined #gluster
14:06 hchiramm joined #gluster
14:09 partner yeah i earlier reported some serious memory leaks with the 3.6 version in openstack cinder use, tried 3.6.9 with no luck there and have since moved to the 3.7 series where the leak was quite a bit less but still there
14:10 partner then we tried 3.8 rc2 and eventually the release version, and while it too leaks we found out it's causing a buffer overflow and killing the qemu-kvm process, effectively shutting down an instance when attempting to do operations on the attached disk
14:11 partner sorry, a bit vague description but nevertheless we are trying to utilize glusterfs as the volume storage (cinder) for openstack (kilo, on top of centos 7)
14:11 ajneil joined #gluster
14:12 partner now we will need to revert back to the 3.7 version since 3.8.0 is unusable due to those timeouts and crashes (fdisk'ing a newly attached volume will time out and throw a trace)
14:14 partner created this one earlier https://bugzilla.redhat.com/show_bug.cgi?id=1348935
14:14 glusterbot Bug 1348935: high, unspecified, ---, bugs, NEW , Buffer overflow when attempting to create filesystem using libgfapi as driver on OpenStack
14:15 partner i think its due to gluster/libgfapi since it works fine with 3.7
14:16 jiffin partner: do u have the bt for core?
14:18 partner umm for core?
14:19 jiffin backtrace for core: gdb <path to glusterfs client> <core-path>
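As a sketch of what jiffin is asking for (paths here are hypothetical; in this report the crashing process is qemu-kvm using libgfapi, so that is the binary to point gdb at):

    ulimit -c unlimited                            # make sure cores get written at all
    gdb /usr/libexec/qemu-kvm /var/tmp/core.1234   # crashed binary plus its core file
    (gdb) bt                                       # backtrace of the crashing thread
    (gdb) thread apply all bt full                 # backtraces of every thread, with locals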
14:19 Wizek joined #gluster
14:21 partner umm no, i don't think i have core files around. sorry i'm quite lousy debugger..
14:21 shyam joined #gluster
14:24 jiffin can you check the log file /var/log/messages for clues?
14:25 partner i think i checked it earlier and found nothing useful. luckily the issue is easily reproducible so i can turn debug on here and there and crash it again
14:26 jiffin it should mention core or ccpp (abrt) in the log?
14:27 hackman joined #gluster
14:28 squizzi joined #gluster
14:28 rwheeler joined #gluster
14:28 partner these are the entries from messages for the event: https://paste.fedoraproject.org/383261/60572114/
14:28 glusterbot Title: #383261 Fedora Project Pastebin (at paste.fedoraproject.org)
14:36 mchangir joined #gluster
14:37 partner and as for the other issue which is the memory leak, we have some debugging done, graphs created showing the leak and differences between the versions
14:38 JoeJulian Looks like a pretty obvious cause to me, unless I'm missing something someplace else.
14:39 partner i don't yet know if it's just the write that causes issues or if read also triggers it, haven't got time to debug it further yet
14:40 JoeJulian Every place that glfs_io_async_cbk is called, the result is never checked. There are several memory cleanup actions that are supposed to happen in glfs_io_async_cbk, but since iovec is null, none of them do. Since the return result is never checked, no place else cleans up memory either.
14:40 kotreshhr joined #gluster
14:52 msvbhat_ joined #gluster
14:53 rafaels joined #gluster
14:56 shyam joined #gluster
14:57 kramdoss_ joined #gluster
15:07 rwheeler joined #gluster
15:10 Gugge_ joined #gluster
15:10 jbrooks joined #gluster
15:10 plarsen joined #gluster
15:11 eryc joined #gluster
15:11 eryc joined #gluster
15:11 lkoranda joined #gluster
15:11 wushudoin joined #gluster
15:11 partner i just downgraded to 3.7.11 on the compute node where i have an instance running and had no issues doing fdisk and mkfs; the rest of the infra runs 3.8.0
15:12 partner i doubt it makes sense to try the couple of RC releases but i can do that too if it helps?
15:12 partner for 3.8 that is
15:15 kotreshhr left #gluster
15:26 kramdoss_ joined #gluster
15:46 nishanth joined #gluster
15:50 kpease joined #gluster
16:02 amye joined #gluster
16:04 guhcampos joined #gluster
16:04 om joined #gluster
16:05 om2 joined #gluster
16:14 gnulnx Could use some help with performance:  I've got two gluster servers.  One is centos 7, the other freebsd 10.3.  Both have gluster 3.7.6, and they're directly connected to each other over 2x 10G Ethernet links, bonded together in LACP mode.
16:14 gnulnx I
16:15 gnulnx I've created a distributed volume across the two peers (over the 10gig lacp link).
16:15 gnulnx The centos server has 70TB of data that I need to move from its local storage, onto the distributed gluster volume.
16:16 gnulnx Mounting the volume with the glusterfs fuse client and then doing a 'mv' is how I am approaching it right now, but performance is terrible.
16:16 jiffin partner: the code for that function is the same in both 3.7.11 and 3.8
16:16 Gnomethrower joined #gluster
16:16 gnulnx Are there any best practices / recommendations / tips for this scenario (scenarios being:  1) performance for distributed volume on direct connected 10g lacp link and 2) moving a huge amount of small files to a volume)
16:17 jiffin partner: can you check whether the same error is logged in 3.7.11?
16:20 JoeJulian jiffin: My guess is the null iovec isn't happening in 3.7 so that leak isn't getting hit.
16:24 JoeJulian partner: you could confirm my theory if your 3.7 logs do not contain "invalid argument: iovec"
16:24 jiffin JoeJulian: https://paste.fedoraproject.org/383351/46661263/, the output of cscope
16:24 syadnom joined #gluster
16:24 syadnom hi all, setting up a new gluster install and I just connected a peer up but it's by IP instead of hostname....  how can I change this?
16:25 om2 left #gluster
16:25 jiffin iovec = NULL in the callers except in glfs_preadv_async_cbk()
16:25 mchangir joined #gluster
16:27 gnulnx Another weird issue.  It looks like all data written to this distributed volume is ending up on server 2.
16:28 gnulnx rebalancing should only be required for existing data?
16:28 Gambit15 joined #gluster
16:28 JoeJulian gnulnx: is this a replicated volume?
16:28 gnulnx JoeJulian: No, distributed
16:29 gnulnx https://gist.github.com/kylejohnson/7951b3d91ca68f7dfd39cf85d34f3795
16:29 glusterbot Title: gist:7951b3d91ca68f7dfd39cf85d34f3795 · GitHub (at gist.github.com)
16:30 JoeJulian Did you create a single-brick volume and then add a brick? Or do your filenames all start with the same 32 bytes?
16:31 gnulnx JoeJulian: Yes, the former.
16:31 gnulnx Single brick, mv'd 50G to test, added the other brick, and have moved about 50G more, and it is all residing on the first brick
16:32 JoeJulian That's why. You need to at least rebalance fix-layout to create the hash maps for the new brick.
16:32 shyam joined #gluster
16:33 JoeJulian (and the 32 byte thing is wrong. I seem to remember that from way back but I just reviewed the hashing code and it uses the whole filename)
16:33 gnulnx JoeJulian: You're right, the docs clearly say that.  Sorry for wasting your time.
16:34 JoeJulian No problem
16:34 JoeJulian It's nice to get an easy victory first thing in the morning.
16:34 gnulnx amen to that.
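A sketch of the fix-layout step JoeJulian refers to, with a hypothetical volume name "bigvol"; fix-layout only rewrites the directory hash ranges so new files can land on the new brick, while a full rebalance additionally migrates existing files:

    gluster volume rebalance bigvol fix-layout start   # recreate hash maps to include the new brick
    gluster volume rebalance bigvol status             # watch progress
    gluster volume rebalance bigvol start              # optional: also migrate existing data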
16:37 jiffin syadnom: you need to remove the entry and add it again I guess
16:37 syadnom well, I took a chance and did a peer probe on the second box to the first and it updated....
16:39 JoeJulian jiffin: That doesn't make any sense to me. If it's called specifically with NULL, "GF_VALIDATE_OR_GOTO ("gfapi", iovec, inval);" doesn't make sense.
16:40 JoeJulian But I'm no professional C coder so I could easily be missing something.
16:40 jiffin JoeJulian: correct, I am still wondering same thing
16:40 JoeJulian ok
16:41 F2Knight joined #gluster
16:42 mobaer joined #gluster
16:45 JoeJulian regression caused by the patch for bug 1333268
16:45 glusterbot Bug https://bugzilla.redhat.com:443/show_bug.cgi?id=1333268 high, unspecified, ---, pgurusid, ASSIGNED , SMB:while running I/O on cifs mount and doing graph switch causes cifs mount to hang.
16:47 Guest78488 joined #gluster
16:48 nbalacha joined #gluster
16:50 mchangir joined #gluster
16:51 JoeJulian So that's why it's not hit in 3.7.11. It's in 3.7.12rc2 though.
16:59 nishanth joined #gluster
17:00 luizcpg joined #gluster
17:01 Manikandan joined #gluster
17:05 luizcpg left #gluster
17:16 jiffin JoeJulian: great catch
17:16 jiffin JoeJulian++
17:16 glusterbot jiffin: JoeJulian's karma is now 29
17:20 verdurin Possibly dim question - is it possible to turn on sharding after a volume has been started?
17:24 nbalacha joined #gluster
17:24 JoeJulian verdurin: According to http://blog.gluster.org/2015/12/introducing-shard-translator/ that looks like a yes.
17:25 jiffin partner: beware you may hit the same issue with 3.7.12, JoeJulian : I will check with poornima about the issue
17:26 verdurin JoeJulian: excellent. Saw it referred to in a FOSDEM presentation, and feared I couldn't use it on a fairly new installation.
17:28 kotreshhr joined #gluster
17:31 montjoy joined #gluster
17:33 JoeJulian In his example, the volume is already started.
17:34 montjoy Has anyone here encountered a situation where gluster does need seem to 'heal' entries listed in the heal info command?  I'm not talking about 'heal failed', just entries that stay in the heal info output and don't go away.
17:34 montjoy ^does NOT seem to 'heal'
17:36 montjoy self-heal is running
17:37 mchangir joined #gluster
17:43 d0nn1e joined #gluster
17:44 post-factum JoeJulian: is there a proper fix for https://bugzilla.redhat.com:443/show_bug.cgi?id=1333268 or is reverting http://review.gluster.org/#/c/14223/ the only option for now?
17:44 glusterbot Bug 1333268: high, unspecified, ---, pgurusid, ASSIGNED , SMB:while running I/O on cifs mount and doing graph switch causes cifs mount to hang.
17:45 post-factum jiffin: ^^
17:46 jiffin post-factum: IMO reverting the change is not the right thing to do; maybe a small modification in the cbk solves the issue
17:47 JoeJulian post-factum: afaict (and again, I'm no code guru) you should be able to just move it after "} else if (gio->op == GF_FOP_READ) {"
17:47 JoeJulian That's the only place it would matter anyway.
17:48 JoeJulian In that spot it should never get triggered. The read fop should always have a valid iovec.
17:48 JoeJulian ... so if it does fail, it's a valid failure.
17:49 kovshenin joined #gluster
17:52 post-factum JoeJulian: jiffin: thx
18:03 mchangir joined #gluster
18:14 gnulnx JoeJulian: Still something wonky going on:  The peer that had the original brick: 198G.  The new brick, post-rebalance: 225M
18:15 JoeJulian If I were to guess, I would lean toward sparse file expansion.
18:15 JoeJulian I know that was fixed for self-heal, but perhaps it's still a problem for rebalance.
18:17 gnulnx Sorry, spare file expansion?
18:17 JoeJulian sparse
18:17 gnulnx Right, that's what I meant to say.
18:17 JoeJulian @lucky sparse files
18:17 glusterbot JoeJulian: Error: The Google Web Search API is no longer available. Please migrate to the Google Custom Search API (https://developers.google.com/custom-search/)
18:17 JoeJulian Oh thanks google.
18:18 gnulnx lovely
18:21 gnulnx JoeJulian: What I mean is, I'm not sure how sparse files are relevant in this context.
18:22 JoeJulian File is sparse on 'A'. It gets migrated to 'B' but all the 0s are copied. File is no longer sparse on 'B' and uses up more actual disk space.
18:23 JoeJulian du will show larger files but du --apparent should show them the same.
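A quick way to see the effect JoeJulian describes (filenames hypothetical); the exact flag is --apparent-size:

    truncate -s 10G sparse.img             # 10G apparent size, almost nothing allocated
    du -h sparse.img                       # reports allocated blocks (tiny)
    du -h --apparent-size sparse.img       # reports the full 10G logical size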
18:24 Manikandan joined #gluster
18:30 jhc76 joined #gluster
18:37 gnulnx JoeJulian: --apparent actually shows them as smaller on B.  Either way I'm not dealing with sparse files (unless I'm missing something obvious, which is totally possible).
18:37 glusterbot gnulnx: JoeJulian's karma is now 28
18:38 gnulnx is that so?
18:41 jri joined #gluster
18:42 F2Knight_ joined #gluster
18:54 guhcampos_ joined #gluster
19:00 jri joined #gluster
19:18 montjoy ideas on why the list from 'gluster volume heal volname info' would grow over time?  autohealing is on.  entries not showing up under heal-failed.  not split-brained.
19:23 F2Knight joined #gluster
19:25 JoeJulian montjoy: I would check your logs. There's usually some clue in one of them.
19:26 JoeJulian Perhaps a client isn't connected to all the bricks, or a brick is down or firewalled or just hung...
19:27 harish joined #gluster
19:28 montjoy gluster volume start reports everything is online
19:28 montjoy status not start
19:31 montjoy I see '/lib64/libglusterfs.so.0(dict_ref+0x79) [0x7f5bced3c2d9] ) 0-dict: dict is NULL [Invalid argument]' a couple of times
19:32 montjoy in the glfsheal.log
19:32 JoeJulian Insufficient data
19:32 JoeJulian Need the full log line.
19:33 montjoy sorry, was trying not to spam - will pastebin
19:34 JoeJulian One line's no biggie. I do appreciate the effort though.
19:35 montjoy http://pastebin.com/53AbMmGT
19:35 glusterbot Please use http://fpaste.org or http://paste.ubuntu.com/ . pb has too many ads. Say @paste in channel for info about paste utils.
19:35 rwheeler joined #gluster
19:37 montjoy A google search for that turned up https://bugzilla.redhat.com/show_bug.cgi?id=1289893 - which made it sound harmless
19:37 glusterbot Bug 1289893: unspecified, unspecified, ---, rhs-bugs, CLOSED ERRATA, Excessive "dict is NULL" logging
19:42 deniszh joined #gluster
19:42 JoeJulian Yeah, but this is in syncop_getxattr_cbk called from afr_get_heal_info. Which is still probably harmless but makes me wonder if the process that shd uses for determining if a file needs healed is getting interrupted at that point.
19:43 montjoy that sounds plausible
19:44 montjoy would running a gluster volume heal info repeatedly do that?  we run it every 10 minutes on every node for monitoring
19:45 JoeJulian Shouldn't, but I know they recently fixed a locking bug related to running heal info repeatedly.
19:45 JoeJulian Perhaps there's still something to be fixed.
19:46 JoeJulian I suggest you file a bug report.
19:46 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
19:46 montjoy ok.  this is 3.7.11
19:47 JoeJulian In the meantime, try "gluster volume start $volname force" which will reload all the shd daemons and may allow the heals if it's related.
19:48 montjoy i'll try both.  Thanks @joejulian
19:48 * JoeJulian tries to figure out how fio was able to read at 13 gigabits from a 6 gigabit sas hba...
19:50 montjoy dual-channel SAS?
19:51 JoeJulian Didn't think so, but maybe. I thought these knox trays didn't talk between them.
19:51 montjoy no - multipathing I bet
19:54 manous joined #gluster
19:54 manous hello
19:54 glusterbot manous: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
19:54 montjoy @joejulian another weird tidbit is that each replica node shows a different number of entries
19:54 manous how to add centos 7 repo?
19:55 manous can't use this curl http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/glusterfs-epel.repo -o /etc/yum.repos.d/glusterfs-epel.repo
19:55 JoeJulian It's in the storage sig.
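A sketch of what "the storage sig" means in practice on CentOS 7; the exact release-package name is an assumption and may be versioned (e.g. centos-release-gluster38 for the 3.8 series):

    yum install centos-release-gluster    # pulls in the CentOS Storage SIG gluster repo
    yum install glusterfs-server
    systemctl enable glusterd && systemctl start glusterd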
19:56 JoeJulian montjoy: Awe, man... Can you make your company better? ;)
19:56 morgan joined #gluster
19:56 JoeJulian I'm not a big fan of your employer.
19:59 montjoy https://paste.fedoraproject.org/383476/14666255/
19:59 glusterbot Title: #383476 Fedora Project Pastebin (at paste.fedoraproject.org)
19:59 JoeJulian Ok, yeah, that's weird.
20:05 partner JoeJulian: seems jiffin left.. nevertheless these issues have not been noticed with 3.7, and the memory leak is different from this total crash of an instance when accessing a volume from glusterfs
20:06 partner i will need to do a round of more tests
20:06 JoeJulian Yeah, the patch in the release-3.7 branch wasn't tagged until v3.7.12rc2
20:06 partner with 3.7 i was able to fdisk+mkfs volume just fine, with 3.8 i get timeouts for fdisk and if i try to mkfs the device it will crash immediately
20:06 JoeJulian Makes sense. fsync will fail.
20:07 harish joined #gluster
20:07 partner the reason we went for 3.8 was the memory leak when using libgfapi, which will eventually block libvirtd from functioning until restarted
20:07 partner but, restarts come with a price...
20:07 JoeJulian Right, you're attaching and detaching iirc, yes?
20:08 partner that's the fastest way of proving the leak but it will leak just with a mount, too..
20:09 partner and with heat stacks and stuff where multiple volumes get attached/detached it's pretty fast; we cannot go even ~12 hours without restarting libvirtd on the worst compute nodes..
20:10 post-factum oh, memory leak
20:10 post-factum wait
20:10 post-factum i have a link
20:11 post-factum https://bugzilla.redhat.com/show_bug.cgi?id=1348095
20:11 glusterbot Bug 1348095: medium, unspecified, ---, bugs, NEW , GlusterFS memory leak on bricks reconnection
20:11 post-factum JoeJulian: partner: ^^
20:11 DV joined #gluster
20:12 partner post-factum: well two issues actually but surely memory leaks are at our hand
20:12 post-factum partner: in fact memory leaks are the only reason i've been here for ~6 months
20:13 partner post-factum: please let me know if there is anything i can do to provide you with more information, this isn't my area of expertise i'm afraid
20:14 post-factum partner: haven't you made a similar ticket?
20:14 guhcampos joined #gluster
20:15 post-factum partner: if that is your case it would be nice if you gathered some info similar to mine and posted some snippets there as well
20:15 partner post-factum: umm no, i made it since touching the volume crashed an instance with buffer overflow
20:15 partner just like today or so
20:15 partner but the reason for going to 3.8.0 was really the memory leaks, which caused us lots of issues; especially when developing the stacks they get redeployed so often that our compute nodes / libvirtd just ran out of memory
20:16 partner somewhere at 24 gigs
20:16 post-factum RSS or VIRT?
20:19 montjoy left #gluster
20:21 F2Knight joined #gluster
20:23 partner RSS
20:23 gnulnx Is there a good way to rebalance a distributed volume when I want to remove a brick, without losing data?
20:25 partner post-factum: when we go over the ~22-24 gigs we'll start getting these: ceilometer-polling[79882]: libvirt: Storage Driver error : out of memory
20:25 partner the fix is to restart libvirtd but that might cause other things to fail and instances to shut down
20:26 partner i have some graphs somewhere i'm trying to find...
20:28 post-factum gnulnx: remove-brick start/commit
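A sketch of the sequence post-factum means, with hypothetical volume and brick names; "start" migrates data off the brick in the background, and "commit" should only be issued once status reports the migration completed:

    gluster volume remove-brick bigvol server2:/bricks/b1 start
    gluster volume remove-brick bigvol server2:/bricks/b1 status   # wait for "completed"
    gluster volume remove-brick bigvol server2:/bricks/b1 commit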
20:28 rafaels joined #gluster
20:28 post-factum partner: rss leaks for me too but not as much as virt does
20:29 post-factum partner: but i guess your leaks happen on reconnect as well. could you just gather all the info you have and put that info into my ticket?
20:30 partner post-factum: i never thought about it more deeply, for example that it could be related to brick reconnection
20:31 partner post-factum: i surely can
20:31 post-factum partner: you could check logs and monitor rss usage
20:31 post-factum partner: that is how i've figured it out
20:31 partner we graph the VmRSS and VmSize for libvirtd to see how it behaves
20:31 post-factum partner: surely, check it first. probably your issue is unrelated to mine, but it looks similar
20:31 partner i'll try to drop couple of graphs somewhere
20:32 montjoy joined #gluster
20:33 * montjoy has completely forgotten on how to use IRC
20:35 post-factum montjoy: just type and hit enter
20:36 partner post-factum: here you can see the difference between 3.6 and 3.7 when doing identical operations, with regard to memory: http://ajaton.net/glusterfs/
20:36 glusterbot Title: Index of /glusterfs (at ajaton.net)
20:36 partner the drop in the graphs is due to a cronjob restarting libvirtd before it dies..
20:37 partner but the difference in speed is visible there anyway, not sure if that is helpful :o
20:38 post-factum for sure, that in my case
20:38 post-factum *is
20:38 post-factum join the bugreport :)
20:40 post-factum partner: do you have 3.8 charts?
20:42 partner post-factum: no i don't, but as i recall the leak was less than on 3.7, though still there; i can probably craft some graphs to compare those, too, if it helps
20:42 deniszh joined #gluster
20:43 post-factum partner: those graphs are wonderful and will definitely help to illustrate the issue
20:44 partner we are just getting rid of 3.8.0 because of the buffer overflow issue that prevents using the volumes, but i could use some of our testing envs anyway
20:50 partner post-factum: added a comment. it's close to midnight here so i'll fade into the background. thanks for your help and let's get in touch again
20:51 montjoy post-factum - I was just trying to figure out how JoeJulian found my company.   not worried, just curious
20:51 partner seems that bug has the issue on the other side now that i read it more carefully..
20:52 post-factum partner: np thx
20:53 gvandeweyer joined #gluster
20:58 johnmilton joined #gluster
21:02 harish joined #gluster
21:03 rafaels joined #gluster
21:11 JoeJulian post-factum: Well, I'm glad there were memory leaks then. I like having you around. :D
21:14 JoeJulian My "stalker" tendencies... I google your name, if you have it set, and/or your handle. Most people use the same handle for multiple things. That usually leads me to something that's industry related and frequently to your linkedin profile. I like to know who I'm talking to so I can better frame my responses.
21:16 johnmilton joined #gluster
21:20 rossdm joined #gluster
21:27 partner i'm just a bot
21:28 partner JoeJulian: but i very much understand you, i mostly live on IRCNet on channels where full name is mandatory or otherwise its simply a kick..
21:28 partner for the very same reason
21:29 JoeJulian I can talk to someone with 30 years in the industry quite differently than I need to talk to a student.
21:29 JoeJulian Us old folks even use different terms sometimes.
21:30 partner indeed
21:30 partner like, floppy?-)
21:30 JoeJulian Or Megabytes (I doubt I'll ever make the switch to Mibibytes or whatever it is).
21:31 partner had to google for the famous "640 k is enough.." and most of the hits say "640 Kelvins.."
21:32 JoeJulian hehe
21:32 JoeJulian You do know that he never actually said that, right? It's one of those famous things that wasn't actually said.
21:33 JoeJulian And the original misquote was from Byte magazine and it was 64k.
21:33 partner i'm just reading something from 1997 that casts doubt on that one.. i wasn't present there..
21:33 JoeJulian I used to have that issue.
21:34 JoeJulian I think my wife "cleaned up" my magazine collection during one of our moves.
21:37 partner one just cannot trust anybody nowadays, be it gates or your own wife.. :/
21:37 eKKiM joined #gluster
21:37 JoeJulian Meh, she probably knows best anyway. She's a good one.
21:38 partner i'm sure she is :)
21:38 eKKiM joined #gluster
21:39 montjoy JoeJulian: aha.  That makes more sense.  I assumed IRC was publishing my IP somewhere and wanted to find out where
21:39 partner i'm sure the quote "can't live with them, can't live without them" is a fake too..
21:39 eKKiM joined #gluster
21:40 partner but i too am stuck with that one and happy with my choice/option
21:42 partner nevertheless, long time no see JoeJulian, nice talking to you again, we'll be in touch later since i'm again involved in this stuff, gn ->
21:44 JoeJulian Goodnight partner
21:45 johnmilton joined #gluster
22:05 F2Knight joined #gluster
22:17 luizcpg joined #gluster
22:20 Wizek_ joined #gluster
22:27 Wizek__ joined #gluster
22:46 caitnop joined #gluster
23:11 F2Knight joined #gluster
23:14 plarsen joined #gluster
23:19 nage joined #gluster
23:33 om joined #gluster
