IRC log for #gluster, 2016-08-25

All times shown according to UTC.

Time Nick Message
00:12 Klas joined #gluster
00:21 shyam joined #gluster
00:39 Javezim Running Gluster 3.8.3 and Samba-VFS-Modules, Can anyone think why from a Windows host I cannot access the share?
00:39 Javezim This is the smb.conf file
00:39 Javezim http://paste.ubuntu.com/23086871/
00:39 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
00:40 Javezim Logs showing - http://paste.ubuntu.com/23086875/
00:40 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
00:55 Alghost joined #gluster
01:05 shdeng joined #gluster
01:10 ldumont joined #gluster
01:10 ldumont Hey guys, quick question, how do permissions work with Gluster?
01:10 ldumont I've just setup a test setup with two hosts and files created on one host are not seen on the other
01:10 ldumont if I chmod 777 that file, it shows up.
01:14 MugginsM joined #gluster
01:15 nathwill joined #gluster
01:20 MugginsM joined #gluster
01:32 harish joined #gluster
01:35 kukulogy_ joined #gluster
01:38 kukulogy joined #gluster
01:39 Lee1092 joined #gluster
01:47 ilbot3 joined #gluster
01:47 Topic for #gluster is now Gluster Community - http://gluster.org | Documentation - https://gluster.readthedocs.io/en/latest/ | Patches - http://review.gluster.org/ | Developers go to #gluster-dev | Channel Logs - https://botbot.me/freenode/gluster/ & http://irclog.perlgeek.de/gluster/
01:47 kramdoss_ joined #gluster
01:50 wadeholler joined #gluster
01:51 derjohn_mobi joined #gluster
02:20 nathwill joined #gluster
02:30 muneerse joined #gluster
02:36 shdeng joined #gluster
02:51 hagarth joined #gluster
02:54 magrawal joined #gluster
03:02 Gambit15 joined #gluster
03:12 Javezim Is it normal for an Arbiter machine to not see itself as Arbiter when using 'gluster volume info'?
03:12 Javezim On other machines its bricks show "Arbiter" next to it
03:12 Javezim But this machine doesn't
03:25 Manikandan joined #gluster
03:36 atinm joined #gluster
03:38 shdeng joined #gluster
03:48 ramky joined #gluster
03:52 nathwill joined #gluster
04:02 itisravi joined #gluster
04:06 nishanth joined #gluster
04:06 shubhendu joined #gluster
04:14 RameshN joined #gluster
04:27 rafi joined #gluster
04:31 Gnomethrower joined #gluster
04:32 aspandey joined #gluster
04:33 itisravi joined #gluster
04:34 nbalacha joined #gluster
04:37 raghug joined #gluster
04:38 ankitraj joined #gluster
04:38 kshlm joined #gluster
04:40 ankitraj joined #gluster
04:44 skoduri joined #gluster
04:45 itisravi joined #gluster
04:48 jiffin joined #gluster
04:59 aravindavk_ joined #gluster
04:59 Alghost joined #gluster
05:03 Gnomethrower joined #gluster
05:05 rastar joined #gluster
05:07 kramdoss_ joined #gluster
05:10 ndarshan joined #gluster
05:13 ppai joined #gluster
05:18 Alghost joined #gluster
05:19 ankitraj joined #gluster
05:23 karthik_ joined #gluster
05:25 Alghost joined #gluster
05:27 johnmilton joined #gluster
05:34 Bhaskarakiran joined #gluster
05:37 Javezim When downloading 3.8, is there any way to make sure you get 3.8.2 and not 3.8.3?
05:37 Javezim On Ubuntu
05:40 Muthu_ joined #gluster
05:42 ankitraj Javezim, you can take help from https://gluster.readthedocs.io/en/latest/Upgrade-Guide/Upgrade%20to%203.7/
05:42 glusterbot Title: Upgrade to 3.7 - Gluster Docs (at gluster.readthedocs.io)
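A rough sketch of pinning the older point release with apt, assuming the configured repository still carries the 3.8.2 packages (the exact version string below is illustrative and should be checked with apt-cache first):

    # list the versions the configured repos offer
    apt-cache madison glusterfs-server
    # install a specific version (version string is a guess; copy the one madison prints)
    sudo apt-get install glusterfs-server=3.8.2-ubuntu1~xenial1 glusterfs-client=3.8.2-ubuntu1~xenial1 glusterfs-common=3.8.2-ubuntu1~xenial1
    # hold the packages so they are not upgraded to 3.8.3
    sudo apt-mark hold glusterfs-server glusterfs-client glusterfs-common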
05:43 msvbhat joined #gluster
05:44 ppai joined #gluster
05:48 jkroon JoeJulian, yea, insane is the name of the game :p.  nfs is the one that's of more interest.
05:52 kotreshhr joined #gluster
05:54 kramdoss_ joined #gluster
05:56 Gnomethrower joined #gluster
06:00 skoduri joined #gluster
06:07 atalur joined #gluster
06:07 ashiq joined #gluster
06:08 hgowtham joined #gluster
06:11 jkroon joined #gluster
06:12 Manikandan joined #gluster
06:12 karnan joined #gluster
06:13 mhulsman joined #gluster
06:17 mhulsman joined #gluster
06:17 ppai joined #gluster
06:19 Alghost joined #gluster
06:20 kramdoss_ joined #gluster
06:22 kdhananjay joined #gluster
06:26 johnmilton joined #gluster
06:28 d0nn1e joined #gluster
06:37 satya4ever joined #gluster
06:38 kramdoss_ joined #gluster
06:38 jtux joined #gluster
06:45 atinm joined #gluster
06:51 Manikandan joined #gluster
06:53 kotreshhr joined #gluster
06:54 raghug joined #gluster
06:56 PaulCuzner joined #gluster
07:00 aravindavk joined #gluster
07:01 kramdoss_ joined #gluster
07:05 devyani7 joined #gluster
07:20 johnmilton joined #gluster
07:20 kramdoss_ joined #gluster
07:26 ivan_rossi joined #gluster
07:30 [diablo] god moaning #gluster
07:31 creshal joined #gluster
07:31 Saravanakmr joined #gluster
07:37 skoduri joined #gluster
07:37 raghug joined #gluster
07:38 jri joined #gluster
07:40 atinm joined #gluster
07:40 kotreshhr joined #gluster
07:46 jri_ joined #gluster
07:48 fsimonce joined #gluster
08:06 Vaizki joined #gluster
08:06 Yingdi joined #gluster
08:06 [diablo] guys, if an LV is created to say /data/brick-data , and then a brick for a volume is at /data/brick-data/brick-data
08:06 [diablo] is it possible to create another volume with a new brick at say /data/brick-data/brick-test
08:07 [diablo] in other words, 1 x LV, holding two or more volumes
08:07 Yingdi left #gluster
08:08 post-factum [diablo]: yes
08:08 Javezim [diablo] Yeah, We've done this for Arbiters
08:08 [diablo] morning post-factum
08:08 [diablo] OK so perhaps you can help
08:09 [diablo] http://paste.ubuntu.com/23088032/
08:09 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
08:09 [diablo] we get that error
08:11 [diablo] http://paste.ubuntu.com/23088033/
08:11 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
08:11 [diablo] that's our volumes
08:11 pur joined #gluster
08:11 Manikandan joined #gluster
08:12 post-factum do you have /rhgs/brick-data/brick-fred folder created?
08:13 [diablo] post-factum, yes
08:13 post-factum have you used it for brick before?
08:13 [diablo] nope
08:13 [diablo] totally new
08:13 post-factum recreate that folder on both nodes and try again
08:13 [diablo] post-factum, OK one sec
08:14 [fre] joined #gluster
08:14 [diablo] OK removed
08:14 [diablo] now I've made
08:15 [diablo] on both nodes /rhgs/brick-data/brick-diablo
08:15 [diablo] [root@svgfscapl001 brick-data]# gluster volume create diablo replica 2 svgfscapl001.prd.srv.cirb.lan:/rhgs/brick-data/brick-diablo svgfscupl001.prd.srv.cirb.lan:/rhgs/brick-data/brick-diablo
08:15 [diablo] volume create: diablo: failed: Brick: svgfscapl001.prd.srv.cirb.lan:/rhgs/brick-data/brick-diablo not available. Brick may be containing or be contained by an existing brick
08:15 [diablo] [root@svgfscapl001 brick-data]#
08:15 [diablo] same problem
08:15 [diablo] very bizarre
08:16 post-factum [diablo]: why do you point to the same node twice?
08:16 [diablo] I don't
08:16 [diablo] one is called svgfscapl001 and the other svgfscupl001
08:16 [diablo] cap and cup
08:17 post-factum ah, missed that
08:17 [diablo] :) np
08:18 post-factum could you please resolve IP for svgfscapl001.prd.srv.cirb.lan and svgfscupl001.prd.srv.cirb.lan from the node you create the volume?
08:18 post-factum are those IPs different?
08:19 [diablo] [root@svgfscapl001 brick-data]# dig -t ANY svgfscapl001.prd.srv.cirb.lan +short
08:19 [diablo] 192.168.18.10
08:19 [diablo] [root@svgfscapl001 brick-data]# dig -t ANY svgfscupl001.prd.srv.cirb.lan +short
08:19 [diablo] 192.168.18.11
08:19 [diablo] [root@svgfscapl001 brick-data]#
08:20 post-factum [diablo]: did you use /rhgs/brick-data/ before for holding brick data?
08:21 [diablo] that's the root of the LV
08:21 [diablo] so no
08:21 [diablo] we have a sub-directory of an active volume "data" which is at /rhgs/brick-data/brick-data
08:21 magrawal ndevos: ping
08:22 glusterbot magrawal: Please don't naked ping. http://blogs.gnome.org/mark​mc/2014/02/20/naked-pings/
08:22 post-factum glusterbot: you are so slow
08:22 post-factum [diablo]: could you check gluster cli logs?
08:22 [diablo] sure
08:22 [diablo] one sec
08:22 ndevos magrawal: yes?
08:23 magrawal ndevos,when u are free please review this http://review.gluster.org/#/c/15086/
08:23 glusterbot Title: Gerrit Code Review (at review.gluster.org)
08:23 [diablo] post-factum, just
08:23 [diablo] [2016-08-25 08:22:38.086654] I [cli.c:721:main] 0-cli: Started running /usr/sbin/gluster with version 3.7.9
08:23 [diablo] [2016-08-25 08:22:38.147956] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
08:23 [diablo] [2016-08-25 08:22:38.210378] I [input.c:36:cli_batch] 0-: Exiting with: 0
08:23 [diablo] repeating
08:23 ndevos magrawal: ah, yes... 'free' :) but I'll try
08:23 post-factum [diablo]: and upgrade to 3.7.14 to make sure we do not hit some bug
08:24 magrawal ndevos,thanks
08:24 [diablo] post-factum, this is RHGS
08:24 post-factum [diablo]: sounds like a condemnation
08:24 [diablo] post-factum, there's no yum updates
08:29 aravindavk joined #gluster
08:30 post-factum [diablo]: getfattr -m- -d /rhgs/brick-data/
08:30 post-factum [diablo]: getfattr -m- -d /rhgs/brick-data/brick-diablo
08:30 post-factum [diablo]: on both nodes please
08:30 post-factum @paste
08:30 ankitraj joined #gluster
08:30 glusterbot post-factum: For a simple way to paste output, install netcat (if it's not already) and pipe your output like: | nc termbin.com 9999
08:30 [diablo] OK
08:32 Muthu_ joined #gluster
08:32 [diablo] http://paste.ubuntu.com/23088074/
08:32 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
08:32 post-factum [diablo]: looks pretty OK. probably, you really hit some bug there
08:33 [diablo] LOL
08:33 post-factum [diablo]: tried restarting glusterd on both nodes?
08:33 [diablo] hmm gonna have to clear it with a team, they're testing on it
08:33 [diablo] although we do need to reboot for a kernel update
08:34 [diablo] post-factum, I'll see if we can get A-OK for reboot
08:34 [diablo] post-factum, many thanks for helping though
08:34 post-factum [diablo]: restarting glusterd won't interrupt your workload
08:34 [diablo] post-factum, sadly it will
08:34 post-factum ?
08:34 [diablo] post-factum, we have an NFS share
08:34 [diablo] clients only come in via 1 x node
08:34 hackman joined #gluster
08:34 post-factum afaik, restarting glusterd does not touch nfs translator...
08:35 [diablo] other clients use SMB/CTDB, they're OK that fails over
08:35 post-factum [diablo]: anyway, nfs is pita
08:35 [diablo] :-)
08:35 [diablo] let me see what I can arrange...
08:35 [diablo] need to inform teams
08:36 [diablo] just gonna go smoke... brb
08:43 jkroon [diablo], permissions/ownership on the folders for the bricks?
08:43 jkroon not that it should matter I don't think ...
08:49 arcolife joined #gluster
08:49 Sebbo1 joined #gluster
08:52 [diablo] hi jkroon same as other bricks
08:53 jkroon gluster volume info - should list all the bricks - double check that it's not part of some other brick please - the error (as per post-factum) really does hint at that.
08:53 jkroon also check for .glusterfs folders on those paths and down.
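Putting those suggestions together, the checks look roughly like this (paths follow [diablo]'s example; only ever clear attributes on a directory that is not part of an active volume):

    # look for leftover gluster metadata anywhere under the mount point
    find /rhgs/brick-data -name .glusterfs -type d
    # check the new directory and its parent for a volume-id left by a previous brick
    getfattr -m . -d -e hex /rhgs/brick-data /rhgs/brick-data/brick-diablo
    # if stale markers show up, clear them before retrying the volume create
    setfattr -x trusted.glusterfs.volume-id /rhgs/brick-data/brick-diablo
    setfattr -x trusted.gfid /rhgs/brick-data/brick-diablo
    rm -rf /rhgs/brick-data/brick-diablo/.glusterfs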
08:54 [diablo] OK sorry just got something to do, brb
08:57 aspandey joined #gluster
08:58 rafi joined #gluster
08:59 kramdoss_ joined #gluster
09:00 [diablo] jkroon, http://paste.ubuntu.com/23088186/
09:00 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
09:00 cblum joined #gluster
09:01 jkroon yea, not seeing anything there that explains it either.
09:01 [diablo] hey cblum
09:01 cblum diablo hey there
09:02 [diablo] :)
09:02 [diablo] right mate, I'll keep it brief
09:02 [diablo] basically the LV is mounted at /rhgs/brick-data
09:02 [diablo] the volume "data" brick's are at /rhgs/brick-data/brick-data
09:03 [diablo] http://paste.ubuntu.com/23088186/ is all the volumes
09:03 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
09:03 [diablo] you created a test2 volume, but within the /rhgs/brick-data, at /rhgs/brick-data/brick-test2
09:04 [diablo] we even found your command, and ran it again, and get same error as we do when we try to create "fred" or "diablo"
09:04 cblum diablo - is there a reason why the volumes are listed twice?
09:04 [diablo] hmm let me check
09:04 derjohn_mobi joined #gluster
09:04 [diablo] cblum, jesus, yeah
09:04 [diablo] cblum, odd ... :O
09:05 cblum diablo - never touch a running system :P
09:05 cblum *joking*
09:05 [diablo] :P
09:05 [diablo] FU*
09:05 [diablo] :P
09:05 [diablo] same vol-id's
09:05 [diablo] how bizarre
09:06 cblum at least it looks like the output is identical (at first glance)
09:06 cblum so it's not tooo bad
09:06 [diablo] gluster volume list - only shows one of each
09:06 [diablo] but yeah the volume info they double
09:06 cblum hmm...can you check if the installed versions on the nodes are the same?
09:06 [diablo] RPM's
09:07 cblum yes
09:08 [diablo] http://paste.ubuntu.com/23088208/
09:08 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
09:08 [diablo] cblum, sorry I must have pasted twice the volume info
09:09 cblum so volumes appear only once?
09:10 [diablo] yes
09:10 cblum that's good
09:10 Manikandan_ joined #gluster
09:11 [diablo] cblum, this tmate...
09:18 bkunal joined #gluster
09:23 cblum ndevos: ever seen the Brick may be containing or be contained by an existing brick message?
09:25 kramdoss_ joined #gluster
09:25 jkroon [diablo], how large is that LV?  find /rhgs/brick-data -type d -name .glusterfs - just to confirm that there isn't any badly placed .glusterfs folders which may be what's throwing glusterfs?
09:25 riyas joined #gluster
09:28 cblum jkroon: looks like the glusterd restart solved the issue
09:29 jkroon cblum, freaky.  thanks for the update.
09:30 pvi joined #gluster
09:30 cblum going offline again *wave*
09:31 [fre] Tnx cblum ;)
09:31 cblum left #gluster
09:32 pvi hi all. Is it possible to create distributed,striped,replicated volume over two servers, one with 4x9tb and another one with 18x2tb drives?
09:41 jiffin pvi: why are you using striping? you can use a sharded volume instead
09:42 jiffin it is possible to create, but not recommended
09:48 aspandey joined #gluster
09:49 itisravi joined #gluster
09:53 [diablo] OK all fixed... seems a restart of glusterd helped
09:54 msvbhat joined #gluster
09:55 kramdoss_ joined #gluster
10:00 pvi jiffin: I can't find any info about shared/sharded volume. can you send me link please?
10:05 post-factum pvi: http://blog.gluster.org/2015/12/introducing-shard-translator/
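Sharding is a per-volume option rather than a separate volume type; a minimal sketch of enabling it (the volume name and shard size are placeholders, and changing the shard size later only affects newly written files):

    gluster volume set gv0 features.shard on
    gluster volume set gv0 features.shard-block-size 64MB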
10:08 newcomer25 joined #gluster
10:08 pvi post-factum: thanks for the link
10:09 newcomer25 The whole Law is fulfilled in one statement: ‘You’ll love your neighbour as much as yourself’ - Galatians 5:14
10:09 newcomer25 God bless you all and have fun using cluster!
10:09 LinkRage joined #gluster
10:09 kovshenin joined #gluster
10:09 post-factum wat
10:10 rastar joined #gluster
10:16 kramdoss_ joined #gluster
10:19 pvi post-factum: ok, so which type of volume do I have to create?
10:19 post-factum pvi: which one do you need?
10:20 pvi hmm, I would like to store snapshots from kvm replicated across two datacenters (2 servers for start)
10:20 post-factum pvi: then, you need replicated volume at least
10:20 post-factum pvi: or distributed-replicated in case you have multiple bricks per replica
10:20 pvi ok, but i still have different bricks count (9x4tb vs 18x2tb)
10:21 post-factum pvi: replica needs to have equal amount of bricks
10:22 pvi do i need to create volume groups from 2x2tb drives?
10:22 bfoster joined #gluster
10:23 post-factum pvi: you may do that. or create 2 bricks per 4tb drive
10:23 post-factum i'd stick with 2 bricks per 4tb drive
10:23 newcomer25 left #gluster
10:23 pvi how can I make 2 bricks per 4tb drive? lvm?
10:23 post-factum pvi: just create 2 subfolders :)
10:23 post-factum pvi: or yes, lvm
10:24 post-factum pvi: lvm is even better because bricks size will be reported correctly
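A rough sketch of carving a 4tb drive into two LVM-backed bricks so each one can pair with a 2tb drive on the other server (device, VG/LV names and mount points are placeholders):

    pvcreate /dev/sdb
    vgcreate vg_bricks /dev/sdb
    lvcreate -L 1.8T -n brick1 vg_bricks
    lvcreate -L 1.8T -n brick2 vg_bricks
    mkfs.xfs -i size=512 /dev/vg_bricks/brick1
    mkfs.xfs -i size=512 /dev/vg_bricks/brick2
    mkdir -p /bricks/brick1 /bricks/brick2
    mount /dev/vg_bricks/brick1 /bricks/brick1
    mount /dev/vg_bricks/brick2 /bricks/brick2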
10:25 pvi ok. should i create lvm on both sides (because of lvm overhead) or can i go with native bricks on one side and lvm on another?
10:26 post-factum pvi: doesn't matter. i'd use lvm where it is necessary, and not use it where you do not need it
10:27 pvi pocketprotector: ok, thank you, i will give it a try.
10:27 pvi post-factum: : ok, thank you, i will give it a try.
10:27 post-factum pvi: another option is to create some form of raid of 9 and 18 drives
10:27 post-factum pvi: raid6, probably
10:28 pvi post-factum: I have HBA controllers on one side. I would like to go without raid
10:28 post-factum pvi: okay
10:30 riyas joined #gluster
10:31 shyam joined #gluster
10:33 aravindavk_ joined #gluster
10:36 kramdoss_ joined #gluster
10:42 rafi joined #gluster
10:52 poornima joined #gluster
10:54 robb_nl joined #gluster
10:58 aravindavk joined #gluster
10:58 Smoka joined #gluster
11:00 Smoka Hi there!
11:01 Smoka need some help about very slow write on glusterfs with network compression enabled
11:01 Smoka the write speed is ~200 kB/s
11:02 Smoka do I need to set some special option?
11:02 Smoka tried lots of different performance.* options but no luck
11:03 Smoka thanks in advance for any help!
11:04 Smoka or probably if there's an option to disable the network compression on client side ...
11:07 atalur_ joined #gluster
11:09 lalatenduM joined #gluster
11:09 kukulogy joined #gluster
11:18 rastar joined #gluster
11:25 kramdoss_ joined #gluster
11:30 devyani7 joined #gluster
11:44 jvandewege joined #gluster
11:48 jbrooks joined #gluster
11:51 aspandey joined #gluster
11:55 cloph pvi: post-factum if it is just snapshots (i.e. not constantly accessed data), and across two datacenters (with probably high latency/low bandwidth connection), then I'd rather use geo-replication
11:57 harish joined #gluster
11:58 kovshenin joined #gluster
11:59 kdhananjay joined #gluster
12:06 kramdoss_ joined #gluster
12:11 ankitraj joined #gluster
12:20 natgeorg joined #gluster
12:20 [o__o] joined #gluster
12:21 sage_ joined #gluster
12:21 zerick joined #gluster
12:21 Kins joined #gluster
12:23 twisted` joined #gluster
12:23 telius joined #gluster
12:23 arif-ali joined #gluster
12:25 rossdm joined #gluster
12:27 shyam joined #gluster
12:27 pvi cloph: there is a low latency and high bandwidth connection
12:27 The_Pugilist joined #gluster
12:27 pvi 2x10gbit lacp 3+4
12:30 unclemarc joined #gluster
12:34 kramdoss_ joined #gluster
12:35 kukulogy joined #gluster
12:38 ramky joined #gluster
12:42 Gambit15_ joined #gluster
12:46 XpineX joined #gluster
12:46 jvandewege joined #gluster
12:48 frakt joined #gluster
12:58 shyam joined #gluster
12:59 plarsen joined #gluster
13:03 plarsen joined #gluster
13:04 plarsen joined #gluster
13:09 julim_ joined #gluster
13:13 kpease joined #gluster
13:15 johnmilton joined #gluster
13:17 julim_ joined #gluster
13:18 nbalacha joined #gluster
13:19 jww left #gluster
13:25 hagarth joined #gluster
13:27 arcolife joined #gluster
13:35 nathwill joined #gluster
13:37 Jules- joined #gluster
13:40 skylar joined #gluster
13:41 kramdoss_ joined #gluster
13:45 jobewan joined #gluster
13:46 Manikandan joined #gluster
13:51 Manikandan joined #gluster
13:54 rwheeler joined #gluster
13:55 dlambrig joined #gluster
14:02 hagarth joined #gluster
14:03 kramdoss_ joined #gluster
14:03 aravindavk joined #gluster
14:06 moss joined #gluster
14:07 shyam joined #gluster
14:12 squizzi joined #gluster
14:24 level7 joined #gluster
14:32 MessedUpHare joined #gluster
14:33 kukulogy joined #gluster
14:38 marbu joined #gluster
14:39 ebbex joined #gluster
14:56 jiffin joined #gluster
14:57 Gambit15 joined #gluster
14:58 Gambit15 joined #gluster
15:01 Gambit15 Morning all. Continuing with my pre-production tests, any suggestions on good things to do to simulate various failures? So far I've got: 1) randomly pulling bricks & 2) zeroing & resetting attributes directly on the bricks' contained files. Any other suggestions?
15:02 Gambit15 Not just testing for resilience, but documenting emergency & recovery processes.
15:03 cloph not sure what you mean by "pulling bricks" - if that includes cutting the network connection as well, then you should be more or less OK
15:03 skylar yeah, testing network partitions is good
15:04 rafi joined #gluster
15:04 Gambit15 As in stopping the gluster service on the servers to simulate failed peers
15:04 skylar disrupting network connectivity is more useful, I think, since it will test your quorum setup
15:05 guhcampos joined #gluster
15:07 Gambit15 > 3), ok
15:07 cloph for failure testing, stopping gluster service is "too nice", I'd rather kill -9 the process to not give it a chance to say goodbye to the other  peers
15:08 skylar yep, or just pull the network cord (or equivalent)
15:08 Gambit15 Once I've got the rest of the infra up by the end of the day, I'll be testing it by yanking power cables.
15:09 karnan joined #gluster
15:09 skylar I would also pull the network cord and reconnect it without cycling the power
15:09 skylar since that would trigger a network partition
15:12 skylar even better would be breaking networking such that some nodes can still talk to that brick, but not all of them
15:14 bkolden joined #gluster
15:15 kukulogy joined #gluster
15:15 Smoka iptables rules with -j DROP on the listening ports
15:20 Gambit15 Smoka, yup, that's how I'd simulate a network split. That, or drop just for particular hosts
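A sketch of that approach for gluster specifically (the peer address is a placeholder; 24007 is glusterd's management port and 49152 upwards is the usual brick port range on recent releases):

    # drop all gluster traffic from one peer to simulate a partition
    iptables -I INPUT -s 192.0.2.11 -p tcp --dport 24007 -j DROP
    iptables -I INPUT -s 192.0.2.11 -p tcp --dport 49152:49251 -j DROP
    # delete the same rules afterwards to "heal" the partition
    iptables -D INPUT -s 192.0.2.11 -p tcp --dport 24007 -j DROP
    iptables -D INPUT -s 192.0.2.11 -p tcp --dport 49152:49251 -j DROP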
15:23 Gambit15 Although not so much of a concern here. There's no redundancy for the networking gear at the rack level. Our PoP is redundant, but switch failures are infrequent enough that it doesn't cause too much of a problem just to replace the switch if it dies
15:24 Gambit15 For the most part, the rack switches are effectively unmanaged. Very infrequently, there might be a specific port or VLAN config that needs updating
15:29 robb_nl joined #gluster
15:50 JoeJulian "things to do to simulate various failures" - set fire to the rack!
15:53 harish joined #gluster
16:09 Gambit15 JoeJulian, you joke, but it wouldn't surprise me! The problem with the various active & backup circuits here is a never ending story. Been great training for setting up bomb proof infrastructure!
16:10 hagarth joined #gluster
16:11 [diablo] joined #gluster
16:12 nathwill_ joined #gluster
16:14 harish joined #gluster
16:17 JoeJulian Gambit15: We run datacenters so we have training for various terrorism scenarios. Next month it's "onsite active shooter awareness training" put on by the DHS and FBI.
16:21 jiffin joined #gluster
16:22 michelle_ joined #gluster
16:24 Gambit15 JoeJulian: Heh, I know it's something that'd need a bit of insider knowledge, but it's surprised me that we haven't seen any attacks targeted at datacenters yet. In London for example, 3 or 4 well placed explosives in a particular 500m2 area would knock out, or at least cause a lot of pain to, all of the communications that pass through Europe
16:27 Gambit15 Most of the core links are terminated in about 4-5 datacenters, all concentrated in a relatively small area
16:27 JoeJulian In the US, communication centers - like the telephone primary center my dad used to work in - are considered primary targets and are built to withstand some pretty heavy attacks. They even have post-nuclear explosion guidelines.
16:33 Gambit15 I imagine the old core comms network that was built before the 80s would be somewhat bunkered, but no idea of where or how. However I know that these datacenters still carry the majority of the backhaul for BT. The question is how good the redundancy is.
16:33 mhulsman joined #gluster
16:35 Gambit15 I expect the government has procedures to protect the area in the case of an emergency, but the buildings definitely aren't bomb-proof & besides the various levels of security for authorisation, the manned protection is just private unarmed guards (no guns here, remember)
16:37 hackman joined #gluster
16:37 Gambit15 Besides, other than perhaps targeting the power, I reckon the power feeds & underground fibre'd be the easiest target - just need to know where to look
16:39 * Gambit15 will probably now find himself stuck on various no-fly & wanted lists
16:41 David_Varghese joined #gluster
16:44 Manikandan joined #gluster
16:47 karnan joined #gluster
16:59 kotreshhr left #gluster
16:59 ivan_rossi left #gluster
17:12 JoeJulian Gambit15: We do periodic thought exercises in how to attack our own DCs. I can't say much, of course, but there's definitely a lot of thought being put in to how best to protect our customers and mitigate possible damage from attacks.
17:15 plarsen joined #gluster
17:16 Gambit15 JoeJulian, I'm trying to debug a problem in VDSM & I'm seeing lots of the following in my gluster logs...
17:16 Gambit15 E [socket.c:3045:socket_connect] 0-glusterfs: connection attempt on 10.123.123.103:24007 failed, (Invalid argument)
17:16 Gambit15 E [glusterfsd-mgmt.c:1902:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: s3 (Transport endpoint is not connected)
17:16 Gambit15 I [glusterfsd-mgmt.c:1919:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
17:17 JoeJulian And you checked to ensure that s3's glusterd is running?
17:17 Gambit15 gluster volume status shows that s3 is online
17:17 JoeJulian Any clues in glusterd's log on s3?
17:17 Gambit15 All looks normal...
17:20 Gambit15 Nothing new being logged on s1, s2 & s3. Just this on s0
17:21 Gambit15 Using replica 3 arbiter 1 s0 s1 s2 s2 s3 s0
17:23 JoeJulian I assume 103 is s3. 24007 is the management port, not a brick. See if you can telnet from that client to 10.123.123.103:24007
17:23 robb_nl joined #gluster
17:24 Gambit15 Conencted
17:24 JoeJulian Yeah, figured the "Transport endpoint is not connected" was from the prior error.
17:25 RameshN joined #gluster
17:26 jvandewege joined #gluster
17:28 msvbhat joined #gluster
17:28 Gambit15 Just touched 9 files into one of the bricks from s0/v0 & all files have been distributed correctly
17:28 Gambit15 s3/v3: I [MSGID: 115029] [server-handshake.c:692:server_setvolume] 0-data-server: accepted client from v0
17:30 Gambit15 Nothing WRT that logged on v0/s0 when I touched the files. This is the first line of the error BTW...
17:30 Gambit15 E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7fdd21dcef12] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7fdd21b967fe] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fdd21b9690e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7fdd21b98064] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x7fdd21b98940] ))))) 0-glusterfs: forced unwinding frame type(GlusterFS Handshake) op(
17:30 glusterbot Gambit15: ('s karma is now -156
17:30 glusterbot Gambit15: ('s karma is now -157
17:30 glusterbot Gambit15: ('s karma is now -158
17:30 glusterbot Gambit15: ('s karma is now -159
17:30 glusterbot Gambit15: ('s karma is now -160
17:31 Gambit15 oops! :S
17:31 Gambit15 Naughty ('
17:32 JoeJulian So some sort of failed handshake.
17:32 mhulsman joined #gluster
17:32 JoeJulian Line before that?
17:33 ankitraj joined #gluster
17:33 nathwill joined #gluster
17:35 Gambit15 E [socket.c:3045:socket_connect] 0-glusterfs: connection attempt on 10.123.123.103:24007 failed, (Invalid argument)
17:35 jvandewege joined #gluster
17:36 Gambit15 ^ first line, that naughty (' one 2nd, then those first 2 at the end
17:36 JoeJulian Ah, ok.
17:36 Gambit15 Repeated every 3 seconds
17:37 JoeJulian 3 seconds is the standard reconnect timeout.
17:40 Gambit15 I'll stop the service on s0 & run glusterd --debug
17:45 Gambit15 Nope. Removed & touched the files again, and redistributed as expected. The other clients log the accepted connection & s0 logs nothing WRT to the sent files
17:46 Gambit15 Oddly, it's not logging those s3 errors to the terminal now
17:47 Lee1092 joined #gluster
17:47 Gambit15 /var/log/glusterfs/rhev-data-center-mnt-glusterSD-localhost:_engine.log still logging the error though
17:48 Gambit15 Huh, so not included in the --debug output then...
17:50 mhulsman joined #gluster
17:59 jkroon joined #gluster
18:03 JoeJulian Odd.
18:05 JoeJulian rhev? So you're sure it has the upstream client, not the one that ships with RHEL, right?
18:12 nathwill joined #gluster
18:23 hagarth joined #gluster
18:34 kpease joined #gluster
18:36 mhulsman joined #gluster
18:37 kpease_ joined #gluster
18:40 Gambit15 JoeJulian, CentOS 7. Yum says glusterfs came from the centos-gluster38 repo
18:42 Gambit15 http://mirror.centos.org/centos/$releasever/storage/$basearch/gluster-3.8/
18:42 Gambit15 ...which is the repo pointed to via https://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/
18:42 glusterbot Title: Index of /pub/gluster/glusterfs/LATEST/EPEL.repo (at download.gluster.org)
18:48 bkolden joined #gluster
18:49 jkroon joined #gluster
18:56 Gambit15 ping JoeJulian
18:57 swebb joined #gluster
18:58 cloph oh, where is the  bot? naked pings shall be avoided - better state the question along with it...
19:01 Gambit15 Ah, should've left out the "ping" then? Not an IRC vet. Just wanted to leave a sign that I'd replied
19:01 cloph nah, just what should he answer when he looks at the screen?
19:01 cloph wait for you to read his "what's up", just when you're away from keyboard,....
19:02 JoeJulian Heh, thanks for policing cloph. ;) He already had content, just wasn't sure if I'd seen it.
19:03 JoeJulian So, Gambit15, it's not that it seems. Can you remount?
19:03 Gambit15 cloph++
19:03 glusterbot Gambit15: cloph's karma is now 2
19:03 Gambit15 ;)
19:03 cloph ah :-)
19:04 Gambit15 Remount...and look for what?
19:04 gnulnx_ Anyone aware of any open memory leak bugs with the 3.7.13 fuse client (while on freebsd)?  Performing sustained IO (e.g. copying 1.4Tb of files out of the mount) results in the process consuming all system memory.  umount fixes it.
19:06 hagarth joined #gluster
19:06 Gambit15 gnulnx_, I can't remember the details, but I did see a bug WRT this that was fixed in the past couple of days
19:07 derjohn_mobi joined #gluster
19:07 JoeJulian gnulnx_: arbiter volume?
19:07 Gambit15 ^that one
19:08 gnulnx_ JoeJulian: distributed volume
19:08 JoeJulian Then try .14
19:09 JonathanD joined #gluster
19:09 Gambit15 JoeJulian: Gambit15> Remount...and look for what?
19:10 JoeJulian Look for the problem to go away?
19:11 swebb joined #gluster
19:11 ashiq joined #gluster
19:11 gnulnx_ JoeJulian: Do you mean there's a bug that was fixed in .14 that might be causing this?
19:11 gnulnx_ I don't see anything obvious in https://github.com/gluster/glusterfs/blob/release-3.7/doc/release-notes/3.7.14.md
19:11 glusterbot Title: glusterfs/3.7.14.md at release-3.7 · gluster/glusterfs · GitHub (at github.com)
19:12 hchiramm joined #gluster
19:13 JoeJulian git log v3.7.13..v3.7.14 shows two patches for rpc memory leaks.
19:14 JoeJulian bug 1359363
19:14 glusterbot Bug https://bugzilla.redhat.com:443/show_bug.cgi?id=1359363 unspecified, unspecified, ---, khiremat, CLOSED CURRENTRELEASE, changelog/rpc: Memory leak- rpc_clnt_t object is never freed
19:14 JoeJulian and bug 1360553
19:14 glusterbot Bug https://bugzilla.redhat.com:443/show_bug.cgi?id=1360553 medium, medium, ---, nbalacha, CLOSED CURRENTRELEASE, Gluster fuse client crashed generating core dump
19:15 JoeJulian Do you need more proof?
19:15 JoeJulian :P
19:15 gnulnx_ Well, there are those...
19:16 JoeJulian iirc, those were the last of the leaks found by post-factum that were waiting to be merged into 3.7.
19:18 gnulnx_ I'll give .14 a shot, thanks JoeJulian
19:18 JoeJulian You're welcome. :)
19:19 post-factum no
19:19 post-factum there is at least another one
19:19 JoeJulian There's more... ugh.
19:19 post-factum https://bugzilla.redhat.com/show_bug.cgi?id=1369364
19:19 glusterbot Bug 1369364: medium, unspecified, ---, bugs, NEW , Huge memory usage of FUSE client
19:20 JoeJulian crap
19:20 post-factum fresh, shiny, unbelieveable, unstoppable, unknown
19:21 post-factum untraceable
19:21 post-factum unreproduceable. i fckngly tired of simulating the workload in order to make fuse client leak
19:21 gnulnx_ Yeah that bug sounds like what I'm dealing with.  Lots (1.5 million) of small files
19:22 post-factum real dovecot workload triggers it easily
19:22 post-factum gnulnx_: what uis your workload?
19:22 post-factum *is
19:22 gnulnx_ Audio files. They're nowhere near as small as a mail spool, but there are lots of them. (40ish TB)
19:23 gnulnx_ two clients, which are also the servers, in a distributed volume
19:23 post-factum read/write/remove?
19:23 gnulnx_ heavy reading
19:23 gnulnx_ some writing. minimal / no remove
19:24 post-factum gnulnx_: and how many ram gluster client eats?
19:25 gnulnx_ post-factum: 34ish GB (all system memory plus swap)
19:25 post-factum lel
19:25 gnulnx_ Going on site to upgrade to 256GB RAM in two weeks, so if this isn't fixed by then, we'll see if it eats all of that too.
19:26 post-factum probably, you'd like to subscribe to my BZ and try to profile glusterfs process with valgrind
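One way to do that is to start the fuse client by hand under valgrind instead of via mount -t glusterfs; a rough sketch (server, volume and mount point are placeholders, and the mount will run noticeably slower):

    valgrind --leak-check=full --show-leak-kinds=all --log-file=/tmp/glusterfs-valgrind.log \
        glusterfs --volfile-server=server1 --volfile-id=gv0 -N /mnt/gv0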
19:29 gnulnx_ BZ?
19:30 gnulnx_ Oh, you mean the bugzilla (BZ?) bug?
19:30 post-factum yep
19:31 post-factum gnulnx_: post there please your volume layout and reconfigured options
19:31 post-factum gnulnx_: i'd like to compare those with mine
19:31 gnulnx_ Sure thing
19:31 Gambit15 JoeJulian, huh, yup - rebooted the server just to be sure, and the errors have stopped. Oddness.
19:31 Gambit15 Cheers
19:32 JoeJulian I suspect the mount was started with an old version and after you upgraded it wasn't restarted.
19:32 JoeJulian That's the only thing I can think of that would have caused that.
19:33 gnulnx_ post-factum: I can't easily recreate the issue (takes about a day of processing) so what I post will be before the memleak manifests itself
19:33 JoeJulian Now that I think about it we could have checked 'lsof | grep deleted'
19:34 JoeJulian with valgrind running, it'll be about 6 days of processing. :/
19:34 Gambit15 JoeJulian, now that you mention it, there was a "deleted" mount. I ran an update on the server after the ovirt install bugged out, so that's exactly what happened
19:35 Gambit15 Should've restarted anyway TBH. Always a good move when hitting odd bugs
19:35 JoeJulian I hate rebooting. Feels like I'm being as lazy as microsoft.
19:39 jkroon joined #gluster
19:40 Gambit15 Yeah, although a good way to reset the system to hard configs.
19:40 gnulnx_ post-factum: done
19:43 post-factum gnulnx_: okay, will check in minutes
19:43 gnulnx_ coffee, brb
19:43 post-factum meanwhile, i'm closer to simulating the right workload
19:44 post-factum it seems, i need several clients to trigger leak, not just one
19:48 johnmilton joined #gluster
19:56 gnulnx_ post-factum: two clients in my case, but the leak only appears on one of them
19:58 skylar joined #gluster
19:58 post-factum gnulnx_: different workload?
20:01 gnulnx_ post-factum: very similar. 33T of files on the one that doesn't leak, 38T on the one that does leak. The OK server has a few other applications that it runs, and the directory structure for the files is slightly different.
20:02 gnulnx_ New files arrive via the client on the OK server. The leaky server is almost purely reads.
20:05 shyam joined #gluster
20:09 johnmilton joined #gluster
20:15 mhulsman joined #gluster
20:20 post-factum gnulnx_: trying to simulate that kind of split r/w workload
20:22 post-factum as of now reading client indeed consumes more ram
20:23 gnulnx_ yup
20:28 johnmilton joined #gluster
20:35 JoeJulian I don't think "unfortunatelly with no attraction from developers" is fair. Most of the leaks you've reported are fixed.
20:37 JoeJulian And with 132 contributors, clearly not all of them can possibly work on the bugs you've reported. So new features don't interfere with bug fixes.
20:37 JoeJulian s/interfere/necessarily interfere/
20:37 glusterbot What JoeJulian meant to say was: And with 132 contributors, clearly not all of them can possibly work on the bugs you've reported. So new features don't necessarily interfere with bug fixes.
20:38 post-factum JoeJulian: who said that?
20:40 post-factum JoeJulian: ah uh ph i see, some guy on ML
20:40 JoeJulian He's in here... I can't remember the handle.
20:40 post-factum JoeJulian: kick one by one ;)
20:41 mhulsman joined #gluster
20:41 JoeJulian lol
20:42 JoeJulian Oooh, his is an unfortunate name to have. I just googled it.
20:43 rwheeler joined #gluster
20:44 post-factum too many polish
20:45 post-factum okay, started from 64M, RSS now is 275M
20:45 post-factum and counting
20:46 post-factum indeed, write mount stuck at 199M
20:46 post-factum read mount grows
20:46 JoeJulian mmm, not here any more it seems. disregard my noise.
20:46 post-factum okay
20:46 post-factum now i have to wait to see how far it may go exhausting my hdds
20:47 post-factum and then split read workload into read itself and pure stat
20:47 JoeJulian Of course, who knows how much of that read memory use is *supposed* to be cache.
20:47 JoeJulian Always hard to tell where cache stops and leak starts.
20:48 post-factum i've noticed in BZ that drop_caches does not help
20:48 post-factum if it should, of course
20:48 JoeJulian That's just kernel caches.
20:48 JoeJulian Not application caches.
20:48 post-factum nevertheless, i expect it to be:
20:48 post-factum performance.cache-size: 33554432
20:49 post-factum performance.client-io-threads: off
20:49 post-factum not exceeding 32M i guess?
20:50 post-factum 308M (now) - 64M (at start) >> 32M
20:50 JoeJulian That was my expectation too, but it appeared (and I've never taken the time to assure myself of this) that that cache-size is used by multiple caches. Always seemed to be some multiple of it.
20:50 post-factum meh
20:50 post-factum as usual, it should be undocumented
20:50 JoeJulian I'm pretty sure it is already. ;)
20:51 post-factum Description: Size of the read cache.
20:52 post-factum https://www.gluster.org/pipermail/gluster-devel/2015-September/046611.html
20:52 glusterbot Title: [Gluster-devel] GlusterFS cache architecture (at www.gluster.org)
20:52 JoeJulian file a bug (for my own laziness)
20:52 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
20:52 post-factum yep, that was response to my email
20:52 post-factum okay, I see 5 read caches
20:53 post-factum 5*32==160
20:53 post-factum still less than current size diff
20:53 JoeJulian ++
20:53 post-factum what bug would you like to file?
20:54 JoeJulian I just wanted to be able to copy/paste the link.
20:54 * post-factum should complain about wrong description?
20:54 post-factum ah oh laziness
20:54 JoeJulian Oh, that, yeah...
20:54 JoeJulian I suppose documentation would be sufficient, though I'd prefer the application stick within the cache limits set.
20:55 post-factum at first. i prefer everything to be disabled by default
20:55 post-factum and then, yes, does not exceed what i set
20:55 JoeJulian No, optimal settings for generic usage should be the defaults.
20:56 post-factum "optimal" is always a trade-off
20:56 post-factum okay, then, i want cache monitoring
20:56 JoeJulian Ooooh! I like that one.
20:57 JoeJulian feature request
20:57 post-factum but that should be reflected in statedump
20:57 post-factum or not?
20:57 JoeJulian I would like it in top
20:57 post-factum glustertop :)
20:57 post-factum client-side, i must say
20:57 shyam joined #gluster
20:58 JoeJulian gluster volume top does exist.
20:58 JoeJulian Client-side would be good.
20:58 JoeJulian I wonder if you can get that through the special directory that I can never remember...
20:59 post-factum .meta?
20:59 JoeJulian yeah
20:59 post-factum unlikely
20:59 JoeJulian I thought it was most of the state dump info
20:59 post-factum hmm
21:00 post-factum there is mallinfo file with memory info
21:01 JoeJulian There's also meminfo for each of the xlators
21:01 JoeJulian under graphs
21:01 post-factum let me check
21:02 post-factum [performance/io-cache.test-io-cache - usage-type gf_ioc_mt_ioc_inode_t memusage]
21:02 post-factum size = 13778184
21:02 post-factum okay, not that big
21:03 post-factum how could i drop all the caches?
21:03 JoeJulian remount is the "only" way... unless...
21:04 JoeJulian Perhaps changing a setting that would change the client graph could force a reload and possibly release memory.
21:06 post-factum anyway, all that sizes should be in statedump, yeah?
21:06 post-factum and i can sum them
21:06 congpine joined #gluster
21:06 JoeJulian yep
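For reference, a client-side statedump can be triggered by sending SIGUSR1 to the fuse process (it lands under /var/run/gluster by default), and the per-xlator memusage sections can be summed with a little awk; a rough sketch, with the volume name as a placeholder:

    kill -USR1 $(pgrep -f 'glusterfs.*volfile-id.*gv0')
    # sum all size= entries from the newest dump (values are bytes and, as seen below, can wrap)
    awk -F= '/^size=/ {s+=$2} END {printf "%.1f MiB\n", s/1048576}' $(ls -t /var/run/gluster/*.dump.* | head -1)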
21:06 post-factum and get far less value than rss
21:07 JoeJulian Seems like a very good indicator, yeah.
21:07 post-factum okay, given RSS is 374M...
21:09 post-factum lol
21:09 post-factum there are integer ovrflows
21:09 post-factum *overflows
21:09 post-factum i cannot sum that things
21:10 post-factum [mount/fuse.fuse - usage-type gf_fuse_mt_iov_base memusage]
21:10 post-factum size=4293480632
21:10 post-factum YOU DONT SAY
21:10 post-factum :(
21:12 post-factum that's ridiculous
21:16 d0nn1e joined #gluster
21:16 post-factum okay, will leave test script for the night, then update bz
21:16 post-factum hard to say something without proper accounting
21:28 JoeJulian amye-away: where in Berlin is the event? Same venue as LinuxCon?
21:30 JoeJulian Oh, nevermind. It's actually somewhere easy to find: https://www.gluster.org/events/summit2016/
21:30 glusterbot Title: Gluster Developer Summit 2016 Gluster (at www.gluster.org)
21:33 bluenemo joined #gluster
21:42 ZachLanich joined #gluster
21:54 chirino joined #gluster
21:56 skylar joined #gluster
21:57 Gambit15 JoeJulian, I've reinstalled the OS for s0, however the original gluster brick is untouched on another disk. What would be the procedure for re-adding that brick in its current state?
21:58 JoeJulian You didn't save /var/lib/glusterd I presume?
21:58 Gambit15 No.
21:58 Gambit15 The volume is still up on the other nodes though
21:59 JoeJulian Get the uuid from another server (it's in /var/lib/glusterd/peers) and put it in /var/lib/glusterd/glusterd.info : UUID=$uuid_as_known_by_peer
22:00 JoeJulian start glusterd, probe an existing peer (it should be allowed because it has the right uuid) and the volume should populate.
22:00 JoeJulian Then just start...force the volume.
22:00 JoeJulian Or restart glusterd
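Put together, the recovery being described looks roughly like this, run on the reinstalled node (the volume name is a placeholder; s1 stands for any of the surviving peers, and the UUID must be the one those peers already have on record for this node):

    # on a healthy peer: find the UUID it knows the reinstalled node by
    grep -r . /var/lib/glusterd/peers/
    # on the reinstalled node: write that UUID into glusterd's identity file
    echo "UUID=<uuid_as_known_by_peer>" >> /var/lib/glusterd/glusterd.info
    systemctl start glusterd
    gluster peer probe s1          # volume definitions sync back from the existing peer
    gluster volume start myvol force
    # ...or simply restart glusterd once the volumes have populated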
22:01 Gambit15 Cool, cheers. Will give it a go once its finished updating
22:01 mattmcc I've been getting this line several times a second in glusterfs.log.. [2016-08-25 22:00:46.479398] W [dict.c:429:dict_set] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.14/xlator/cluster/replicate.so(afr_lookup_xattr_req_prepare+0xb0) [0x7fe9532b93d0] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_set_str+0x2c) [0x7fe9585d9d3c] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_set+0x96) [0x7fe9585d7cb6] ) 0-dict: !this || !value for key=link-coun
22:01 mattmcc t [Invalid argument]
22:01 glusterbot mattmcc: ('s karma is now -161
22:02 JoeJulian @karma (
22:02 glusterbot JoeJulian: Karma for "(" has been increased 4 times and decreased 165 times for a total karma of -161.
22:02 JoeJulian poor (
22:02 Gambit15 Heh
22:02 Gambit15 No smoke without fire!
22:04 JoeJulian mattmcc: clearly a warning from afr_lookup_xattr_req_prepare. I think dict_set_str is failing because either there's no dict, which seems unlikely, or there's no link-count.
22:05 JoeJulian what that actually means, I'm not so sure.
22:07 JoeJulian mattmcc: Looks like that should be followed with a warning from afr_lookup_xattr_req_prepare: "%s: Unable to set dict value for %s"
22:09 * JoeJulian feels like he's talking to himself...
22:09 mattmcc Hmm, no, I don't have that one.
22:09 JoeJulian mmkay
22:09 JoeJulian What version?
22:10 mattmcc 3.7.14 from Ubuntu 16.04
22:10 JoeJulian I was looking at 3.8... let's see if the code's any different in 3.7...
22:14 JoeJulian Looks like it should be coming from https://github.com/gluster/glusterfs/blob/v3.7.14/xlators/cluster/afr/src/afr-common.c#L1204-L1210 which, if dict_set_uint64 fails, should produce a warning.
22:14 glusterbot Title: glusterfs/afr-common.c at v3.7.14 · gluster/glusterfs · GitHub (at github.com)
22:15 JoeJulian Mmm, nope, I'm wrong.
22:15 JoeJulian That's not the right define...
22:16 JoeJulian Ah, it's the much more obvious and far less helpful https://github.com/gluster/glusterfs/blob/v3.7.14/xlators/cluster/afr/src/afr-common.c#L1220-L1224
22:16 glusterbot Title: glusterfs/afr-common.c at v3.7.14 · gluster/glusterfs · GitHub (at github.com)
22:19 JoeJulian No change from that in master, so it's certainly not some known bug.
22:20 shyam joined #gluster
22:20 JoeJulian I would file a bug against replicate for that. Include a bit more log above and below the error. Debug level if you can.
22:20 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
22:20 mattmcc This is a relatively recent volume, replacing a smaller, older volume that was running 3.7.4 but didn't have this issue.
22:21 JoeJulian it was added in 3.7.7
22:21 mattmcc The setup should've been the same, ext4..
22:21 pampan joined #gluster
22:22 JoeJulian Do you have any idea what process is happening that's triggering that? It should be some file op.
22:24 mattmcc Not yet, I need to track that down.  My next step, I guess.
22:25 JoeJulian I'd file the bug first. Hopefully atalur can see the problem right away. Add information as you get it though.
22:27 mattmcc Well, turning off uwsgi made it stop, so that's a clue.  This is a set of four uwsgi app nodes, two of which are replicating the volume (and being clients) and the other two are just clients.
22:28 pampan JoeJulian: Hi there! Dunno if you remember, on 08/18 I asked if it was safe to remove some files from indices/xattr, because they were causing a big memory leak on gluster 3.5.7. I've only removed the gfid-named files, not the xattr- ones. I've stopped the memory leak with that, but something worse came up. I'm seeing repeated files on the volume clients! Any hints on how to proceed from here? The
22:28 pampan only relevant log I saw on the brick was: E [index.c:271:check_delete_stale_index_file] 0-gv-bgp-split-index: Base index is not created under index/base_indices_holder
22:28 mattmcc It might be trying to write to an access log file on the volume, it'd certainly fit the pace of the warnings.
22:42 Slashterix joined #gluster
22:43 Slashterix left #gluster
22:43 Slashterix joined #gluster
22:44 Slashterix Hello, I'm testing out gluster. I find that when one replica node goes away IO on the fuse mount will hang for 60s then resume as normal. Is there a way to tune that timeout ?
22:51 JoeJulian pampan: You probably have different gfid on multiple bricks for the same file.
22:52 JoeJulian ~ping-timeout | Slashterix
22:52 glusterbot Slashterix: The reason for the long (42 second) ping-timeout is because re-establishing fd's and locks can be a very expensive operation. With an average MTBF of 45000 hours for a server, even just a replica 2 would result in a 42 second MTTR every 2.6 years, or 6 nines of uptime.
23:06 kukulogy joined #gluster
23:11 Slashterix thanks. if i know im taking down a brick (to patch the OS) can I do anything to warn gluster and avoid the timeout?
23:12 shyam joined #gluster
23:27 JoeJulian Slashterix: yep, just don't kill -9. A normal TERM will notify the brick service to shut down and allow it to close the TCP connections. When a client receives the RST, it closes the connection safely without triggering the ping-timeout.
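The two levers discussed here are the ping-timeout volume option and shutting the brick processes down cleanly so clients see the sockets close; a rough sketch for planned maintenance (the volume name is a placeholder, and lowering ping-timeout much below the default is generally discouraged for the reasons glusterbot gives above):

    # view / tune the timeout (default 42 seconds)
    gluster volume get gv0 network.ping-timeout
    gluster volume set gv0 network.ping-timeout 42
    # before patching a node: stop management first, then TERM (never -9) the brick processes
    systemctl stop glusterd
    pkill -TERM glusterfsd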
23:47 Klas joined #gluster
23:50 shyam joined #gluster
