
IRC log for #gluster, 2014-09-10


All times shown according to UTC.

Time Nick Message
00:01 dare333 Question: How can I see the settings for a cluster (the ones you set with the volume set command)
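
(For reference: on GlusterFS of this era, any non-default options set with 'gluster volume set' are listed under "Options Reconfigured" in the volume info output; the volume name below is a placeholder.)

    gluster volume info myvol    # non-default settings appear under "Options Reconfigured"
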
00:02 P0w3r3d joined #gluster
00:04 Otlichno joined #gluster
00:04 Otlichno anyone around?
00:19 Lee- joined #gluster
00:27 d4nku joined #gluster
00:36 dtrainor joined #gluster
00:42 PeterA left #gluster
00:57 lyang0 joined #gluster
01:13 jmarley joined #gluster
01:21 theron joined #gluster
01:24 theron_ joined #gluster
01:24 foster joined #gluster
01:27 necrogami joined #gluster
01:36 vimal joined #gluster
01:46 gildub joined #gluster
01:46 sputnik13 joined #gluster
01:56 theron joined #gluster
02:03 gmcwhistler joined #gluster
02:06 bala joined #gluster
02:06 bennyturns joined #gluster
02:13 harish joined #gluster
02:15 haomaiwa_ joined #gluster
02:18 gmcwhistler joined #gluster
02:27 kanagaraj joined #gluster
02:30 haomai___ joined #gluster
02:35 haomaiwang joined #gluster
02:36 haomaiwa_ joined #gluster
02:38 bharata-rao joined #gluster
02:42 haomaiw__ joined #gluster
02:49 hagarth joined #gluster
03:35 haomaiwa_ joined #gluster
03:45 glusterbot New news from newglusterbugs: [Bug 1136221] The memories are exhausted quickly when handle the message which has multi fragments in a single record <https://bugzilla.redhat.com/show_bug.cgi?id=1136221>
03:47 haomai___ joined #gluster
03:57 shubhendu_ joined #gluster
04:04 spandit joined #gluster
04:11 itisravi joined #gluster
04:12 atinmu joined #gluster
04:13 RameshN joined #gluster
04:15 sputnik13 joined #gluster
04:18 nbalachandran joined #gluster
04:19 rjoseph joined #gluster
04:19 eryc joined #gluster
04:19 eryc joined #gluster
04:19 rjoseph joined #gluster
04:27 ppai joined #gluster
04:37 soumya joined #gluster
04:47 sputnik13 joined #gluster
04:48 ramteid joined #gluster
04:52 atalur joined #gluster
04:54 anoopcs joined #gluster
04:54 Rafi_kc joined #gluster
04:54 rafi1 joined #gluster
04:59 ndarshan joined #gluster
05:04 deepakcs joined #gluster
05:20 saurabh joined #gluster
05:20 sputnik13 joined #gluster
05:26 kdhananjay joined #gluster
05:27 nishanth joined #gluster
05:42 meghanam joined #gluster
05:42 meghanam_ joined #gluster
05:44 sputnik13 joined #gluster
05:45 nshaikh joined #gluster
05:46 kumar joined #gluster
05:53 bala joined #gluster
06:01 anoopcs joined #gluster
06:07 karnan joined #gluster
06:13 soumya joined #gluster
06:14 andreask joined #gluster
06:14 sputnik13 joined #gluster
06:14 zerick joined #gluster
06:15 MickaTri joined #gluster
06:16 MickaTri Can we use the redundancy features of GlusterFS with Proxmox, or does Proxmox allow just one IP for glusterfs, meaning there is no H-A?
06:16 ndarshan joined #gluster
06:18 jtux joined #gluster
06:20 RaSTar joined #gluster
06:23 lalatenduM joined #gluster
06:23 sputnik13 joined #gluster
06:25 kanagaraj joined #gluster
06:27 nishanth joined #gluster
06:43 Guest87037 joined #gluster
06:44 elico joined #gluster
06:45 raghu joined #gluster
06:46 glusterbot New news from newglusterbugs: [Bug 1136702] Add a warning message to check the removed-bricks for any files left post "remove-brick commit" <https://bugzilla.redhat.com/show_bug.cgi?id=1136702>
06:50 ricky-ti1 joined #gluster
06:57 hchiramm joined #gluster
06:58 ekuric joined #gluster
07:01 ekuric joined #gluster
07:07 R0ok_ joined #gluster
07:07 rgustafs joined #gluster
07:08 sputnik13 joined #gluster
07:16 glusterbot New news from newglusterbugs: [Bug 1139986] DHT + Snapshot :- If snapshot is taken when Directory is created only on hashed sub-vol; On restoring that snapshot Directory is not listed on mount point and lookup on parent is not healing <https://bugzilla.redhat.com/show_bug.cgi?id=1139986> || [Bug 1139988] DHT :- data loss - file is missing on renaming same file from multiple client at same time <https://bugzilla.redhat.com/show_bug.cgi?id=1139988>
07:18 atinmu joined #gluster
07:20 ndarshan joined #gluster
07:21 nishanth joined #gluster
07:24 shubhendu joined #gluster
07:24 glusterbot New news from resolvedglusterbugs: [Bug 983431] DHT: NFS process crashed on a node in a cluster when another storage node in the cluster went offline <https://bugzilla.redhat.com/show_bug.cgi?id=983431>
07:27 deepakcs joined #gluster
07:32 fsimonce joined #gluster
07:32 R0ok_ joined #gluster
07:37 shubhendu joined #gluster
07:46 glusterbot New news from newglusterbugs: [Bug 1122586] Read/write speed on a dispersed volume is poor <https://bugzilla.redhat.com/show_bug.cgi?id=1122586>
07:48 soumya joined #gluster
07:49 hagarth joined #gluster
07:56 mariusp joined #gluster
07:56 mariusp joined #gluster
08:02 liquidat joined #gluster
08:13 atinmu joined #gluster
08:32 bala1 joined #gluster
08:33 recidive joined #gluster
08:39 Norky joined #gluster
08:45 lyang0 joined #gluster
08:45 jiffin joined #gluster
08:49 ndarshan joined #gluster
08:56 atalur joined #gluster
09:05 Slashman joined #gluster
09:09 shubhendu joined #gluster
09:12 hagarth joined #gluster
09:12 hagarth left #gluster
09:17 soumya joined #gluster
09:19 kanagaraj joined #gluster
09:19 ndarshan joined #gluster
09:23 bala joined #gluster
09:26 kanagaraj_ joined #gluster
09:30 nublaii joined #gluster
09:32 nublaii good morning... can I create a new share using a san or nas partition mounted on the machine that is going to act as the server??
09:35 d-fence joined #gluster
09:47 glusterbot New news from newglusterbugs: [Bug 1140084] quota: bricks coredump while creating data inside a subdir and lookup going on in parallel <https://bugzilla.redhat.com/show_bug.cgi?id=1140084>
09:48 soumya joined #gluster
09:49 meghanam_ joined #gluster
09:49 meghanam joined #gluster
09:49 necrogami joined #gluster
09:51 R0ok_ joined #gluster
09:51 hagarth joined #gluster
10:07 rgustafs joined #gluster
10:09 mariusp joined #gluster
10:18 richvdh joined #gluster
10:19 qdk joined #gluster
10:23 mariusp joined #gluster
10:30 mariusp joined #gluster
10:34 7GHAA0NDZ joined #gluster
10:38 ndevos nublaii: the filesystem that you use as brick should support extended attributes, a NAS is suboptimal, a SAN providing disks to a server should work
10:39 edward1 joined #gluster
10:39 nublaii so anything like CIFS, NFS is out of the question
10:39 nublaii but iSCSI should work?
10:40 nublaii and tnx for the answer ;)
10:40 ndevos nublaii: NFS doesnt work, no xattrs, I dont know about CIFS, but would recommend against it
10:40 getup- joined #gluster
10:41 ndevos iSCSI should not be a problem, but you're advised to keep iSCSI traffic and Gluster networking on different NICs
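
A quick way to sanity-check that a prospective brick filesystem supports extended attributes is to set one and read it back. A minimal sketch, assuming the attr tools (setfattr/getfattr) are installed and /data/brick1 is a placeholder path; Gluster itself uses trusted.* attributes, but a user.* attribute is enough to demonstrate xattr support:

    touch /data/brick1/xattr-test
    setfattr -n user.test -v working /data/brick1/xattr-test
    getfattr -n user.test /data/brick1/xattr-test    # should print user.test="working"
    rm /data/brick1/xattr-test
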
10:41 getup- hi, is there a recommended maximum number of volumes one can have in a cluster?
10:42 ndevos getup-: not really, but the more volumes you have, the longer a 'gluster volume status' and the like will take
10:43 getup- ndevos: i wasn't planning on adding a lot, but we have around 10 volumes now and i just wondered if it could do any harm
10:43 getup- but that's good to know, thanks
10:46 ndevos getup-: 10 should be fine I guess, but most users I have seen use fewer
10:47 diegows joined #gluster
10:47 ndevos getup-: a high number of bricks would be similar in many ways. and I know of at least some configurations with upto 100 bricks in a gluster environment or so
10:50 harish joined #gluster
10:51 hagarth joined #gluster
10:53 atinmu joined #gluster
10:57 haomaiwa_ joined #gluster
11:02 kdhananjay joined #gluster
11:02 meghanam_ joined #gluster
11:02 meghanam joined #gluster
11:08 soumya joined #gluster
11:12 getup- joined #gluster
11:14 getup- ndevos: we have never used this many volumes before either, but we'll keep an eye on it
11:19 ppai joined #gluster
11:22 ndarshan joined #gluster
11:25 glusterbot New news from resolvedglusterbugs: [Bug 1065617] gluster 3.5 hostname does not resolving <https://bugzilla.redhat.com/show_bug.cgi?id=1065617>
11:36 LebedevRI joined #gluster
11:39 meghanam_ joined #gluster
11:40 meghanam joined #gluster
11:44 andreask joined #gluster
11:47 glusterbot New news from newglusterbugs: [Bug 1065620] in 3.5 hostname resuluation issues <https://bugzilla.redhat.com/show_bug.cgi?id=1065620>
11:49 RaSTar joined #gluster
11:52 atinmu joined #gluster
11:55 JustinClift *** Weekly GlusterFS Community Meeting is in 5 mins in #gluster-meeting on irc.freenode.net ***
11:58 jdarcy joined #gluster
12:00 ndarshan joined #gluster
12:02 meghanam_ joined #gluster
12:03 MickaTri Can we use the redundancy features of GlusterFS with Proxmox, or does Proxmox allow just one IP for glusterfs, meaning there is no H-A?
12:03 meghanam joined #gluster
12:04 Slashman joined #gluster
12:05 soumya joined #gluster
12:13 mariusp joined #gluster
12:16 JustinClift MickaTri: Probably better to ask that on the gluster-users mailing list. :)  More in depth answers that way. ;)
12:17 MickaTri2 joined #gluster
12:19 itisravi_ joined #gluster
12:23 hagarth joined #gluster
12:27 LHinson joined #gluster
12:29 Pupeno joined #gluster
12:30 anoopcs joined #gluster
12:33 mojibake joined #gluster
12:56 bene2 joined #gluster
12:59 shubhendu joined #gluster
13:00 chirino joined #gluster
13:08 necrogami joined #gluster
13:08 gothos great, my glusterfs just died with no error message at all, but systems simply stopped doing anything, restarting the services fixed that oO
13:09 tdasilva joined #gluster
13:10 RameshN joined #gluster
13:11 gothos is there maybe a known problem if you copy a few hundred thousand small files at once or something?
13:12 lalatenduM gothos, lots of small files have been difficult for Gluster, but that said, it shouldn't just die; most likely it is a bug
13:13 lalatenduM gothos, what is the version gluster u r using?
13:13 gothos lalatenduM: 3.5.2 on CentOS 7
13:14 asku left #gluster
13:15 lalatenduM gothos, cool, el7 user :), ndevos have you heard about this issue
13:16 ndevos lalatenduM: not really, no, but it could be related to a memory issue like bug 1126831
13:16 glusterbot Bug https://bugzilla.redhat.com:443/show_bug.cgi?id=1126831 medium, high, ---, gluster-bugs, NEW , Memory leak in GlusterFs client
13:17 chirino joined #gluster
13:17 ndevos gothos: how are you mounting, over NFS or FUSE?
13:17 lalatenduM ndevos, yup, might be
13:17 glusterbot New news from newglusterbugs: [Bug 1127140] memory leak <https://bugzilla.redhat.com/show_bug.cgi?id=1127140> || [Bug 1126831] Memory leak in GlusterFs client <https://bugzilla.redhat.com/show_bug.cgi?id=1126831>
13:18 lalatenduM gothos, check this out http://wiki.centos.org/SpecialInterestGroup/Storage/Proposal, we're building gluster pkgs in CentOS for easy consumption
13:18 ndevos gothos: oh, and if there is an issue in glusterfs-fuse (like a crash or so), the server side will not be able to log it
13:19 ekuric joined #gluster
13:19 ndevos hmm, but if restarting the services worked, and nothing was changed client-side, it's probably not a glusterfs-fuse issue
13:19 lalatenduM ndevos, yup, sounds like a server issue
13:20 ndevos still can be glusterfs-fuse if access is done over CIFS ;)
13:22 gothos ndevos: I'm using FUSE
13:23 ndevos gothos: and only the service on the storage servers were restarted, the client mount did not change?
13:24 gothos ndevos: correct, both active clients just worked again
13:25 gothos well, it just died again. will look for output of any kind *sigh*
13:28 julim joined #gluster
13:28 schrodinger joined #gluster
13:30 ndevos gothos: are the glusterfsd processes still running? and do you have anything in the logs on the client side?
13:30 ndevos gothos: also check /var/log/messages on the storage servers
13:31 gothos ndevos: http://www.l3s.de/~zab/data-brick0.log that is the server side in the brick dir in /var/log/glusterfs
13:31 gothos ndevos: messages is empty
13:32 gothos ndevos: okay, let me rephrase: it just started working again after like 10 minutes without any data being sent.
13:32 gothos so I guess it was hanging for some reason, but the backend storage was responsive
13:33 gothos what I have at the client side is the following: I [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-data-replicate-0:  metadata self heal  is successfully completed,   metadata self heal from source data-client-0 to data-client-1,  metadata - Pending matrix:  [ [ 0 0 ] [ 0 0 ] ], on /
13:34 Rafi_kc left #gluster
13:35 schrodinger I have a split-brain reported on a gluster cluster of 4 nodes. http://pastie.org/private/lqjtq2yt2imfykooqoioq   I've used getattr, stat and splitmount to try and fix the issue. Removing certain copies plus hard links from .glusterfs/ and Now I see that there is no longer a trusted.afr.vol-client-X http://pastie.org/private/mxys4waxx8ibeixud2ukw
13:35 glusterbot Title: Private Paste - Pastie (at pastie.org)
13:36 schrodinger Could someone help point me in the direction of what has happened here? Thanks.
13:37 ndevos gothos: hmm, there are quite some errors in that log, most of them related to network disconnects of some sort
13:37 ndevos gothos: so you happen to have ,,(ping timeout) set to a low value?
13:37 glusterbot gothos: I do not know about 'ping timeout', but I do know about these similar topics: 'ping-timeout'
13:37 ndevos @ping-timeout
13:37 glusterbot ndevos: The reason for the long (42 second) ping-timeout is because re-establishing fd's and locks can be a very expensive operation. Allowing a longer time to reestablish connections is logical, unless you have servers that frequently die.
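
The timeout glusterbot describes is the per-volume network.ping-timeout option. An illustrative check/set, with 'myvol' as a placeholder volume name (42 seconds is the default):

    gluster volume info myvol                           # a changed value shows up under "Options Reconfigured"
    gluster volume set myvol network.ping-timeout 42    # keep/restore the 42-second default
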
13:39 gothos ndevos: it's at the default value and the system are right next to each other *shrug*
13:40 ndevos gothos: hmm
13:41 gothos ndevos: I had the idea that the switch might be broken, but sending IP traffic over that network works just fine
13:41 ndevos gothos: and there are no hints in /var/log/messages when this happens?
13:42 ndevos or, in the systemd journal?
13:43 gothos let me check the journal, but nothing on the clients in messages or the servers
13:43 gothos the clients are C6 tho
13:43 jobewan joined #gluster
13:44 gothos ndevos: nothing in the journal either
13:44 RaSTar joined #gluster
13:46 ndevos gothos: you might want to increase debugging then, I think the option is called diagnostics.brick-log-level and you can set it to DEBUG
13:46 ndevos gothos: for the client, you could add the mount option "log-level=DEBUG", that should get quite some more details
13:47 gothos ndevos: wonderful, I'll try those out
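
A sketch of the two knobs ndevos mentions, with 'myvol', 'server1' and the mount point as placeholders:

    # server side: more verbose brick logs
    gluster volume set myvol diagnostics.brick-log-level DEBUG
    # client side: remount the FUSE client with a higher log level
    mount -t glusterfs -o log-level=DEBUG server1:/myvol /mnt/myvol
    # drop back to INFO once the problem has been captured
    gluster volume set myvol diagnostics.brick-log-level INFO
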
13:49 theron joined #gluster
13:49 failshell joined #gluster
13:50 gothos ndevos: okay, just set the server part to DEBUG and I'm getting _a lot_ of: Lock is grantable, but blocking to prevent starvation
13:51 kdhananjay joined #gluster
13:51 theron joined #gluster
13:51 lmickh joined #gluster
13:52 xleo joined #gluster
13:56 gothos well, I'm getting the following on the client with debugging: entry self-heal triggered. path: /, reason: checksums of directory differ
13:56 gothos so I guess the backend is hanging due to doing auto self heal
13:57 bala joined #gluster
14:02 ekuric1 joined #gluster
14:08 shubhendu joined #gluster
14:11 wushudoin| joined #gluster
14:15 longshot902 joined #gluster
14:34 sputnik13 joined #gluster
14:38 daMaestro joined #gluster
14:43 gmcwhistler joined #gluster
14:45 MickaTri2 joined #gluster
14:46 MickaTri2 left #gluster
14:48 MickaTri3 joined #gluster
15:00 LHinson1 joined #gluster
15:02 gmcwhistler joined #gluster
15:04 gmcwhistler joined #gluster
15:15 RameshN joined #gluster
15:18 _Bryan_ joined #gluster
15:19 khanku joined #gluster
15:23 theron_ joined #gluster
15:23 sputnik13 joined #gluster
15:25 nbalachandran joined #gluster
15:28 theron joined #gluster
15:33 notxarb joined #gluster
15:41 theron joined #gluster
15:49 dtrainor joined #gluster
15:50 txbowhunter joined #gluster
15:50 theron joined #gluster
15:50 txbowhunter Hi all
15:51 txbowhunter I was wondering if anyone was around who could let me know if WORM functionality is operational/stable in gluster yet?
15:52 txbowhunter I'm designing a multi-tiered storage solution for my firm and want to see if I can leverage this to not have to purchase a separate EMC/Netapp solution that has commercial WORM features
15:53 txbowhunter The site currently states that the functionality exists, but it looks like there's some caveats there, but I don't know how up to date that info is
15:54 meghanam_ joined #gluster
15:54 meghanam joined #gluster
15:55 ira joined #gluster
15:55 balacafalata joined #gluster
16:00 JoeJulian txbowhunter: last I heard it was working.
16:01 dtrainor joined #gluster
16:01 txbowhunter JoeJulian: do you happen to know if it satisfies SEC 17a-4 with regard to data being acceptably immutable?
16:02 txbowhunter aka, if you set a retention policy for a WORM volume and put data in that volume, can it verifiably not be deleted or modified for the duration of the configured retention period?
16:02 txbowhunter Sorry, I know that's a total left field question, hahah
16:03 JoeJulian Heh, yeah, I have no idea on that.
16:04 semiosis i think it's a safe bet that if the glusterfs open source project was certified by some independent auditor we'd have heard about it
16:05 soumya joined #gluster
16:06 elico joined #gluster
16:06 Otlichno Anyone familiar with the proper procedure for cleaning up mnt-replicated.log?
16:06 txbowhunter semiosis: I agree, but I wonder if any legal expert has taken the WORM feature set of gluster and determined if it does or does not satisfy the aforementioned SEC statute
16:06 JoeJulian Otlichno: We prefer to use logrotate with copytruncate
16:06 txbowhunter I mean, if it does indeed render the data immutable for the configured retention period, I think it would be deemed accepteble
16:07 Otlichno JoeJulian do you mean the gluster command for log rotate, or the linux log rotate?
16:07 JoeJulian Otlichno: logrotate's copytruncate method.
16:08 JoeJulian http://linux.die.net/man/8/logrotate
16:08 semiosis txbowhunter: going out on a limb here, but seems to me that would be a property of the whole deployed system, of which glusterfs is just one component
16:08 glusterbot Title: logrotate(8) - Linux man page (at linux.die.net)
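
An illustrative logrotate stanza along the lines JoeJulian suggests (paths, schedule and retention counts are placeholders, not the stock packaging defaults):

    /var/log/glusterfs/*.log /var/log/glusterfs/bricks/*.log {
        weekly
        rotate 8
        compress
        missingok
        notifempty
        copytruncate
    }
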
16:08 kkeithley txbowhunter: community glusterfs has not been audited or certified for SEC 17a-4. (Or anything else for that matter.)
16:09 Otlichno thats funny JoeJulian I just searched and found that too :)
16:09 Otlichno Thank you!
16:09 semiosis txbowhunter: might be worth asking a redhat storage sales rep if the commercial product is compliant
16:10 semiosis kkeithley: hey, it's certified AWESOME by me!
16:10 kkeithley yes, and me too.
16:10 kkeithley ;-)
16:10 txbowhunter semiosis: I agree re: the whole solution
16:11 txbowhunter and I will ask a RH rep about it also
16:11 txbowhunter kkeithley: gotcha
16:11 Otlichno ahh thats why its so big, my predecessor never put a logrotate.d config file in
16:12 kkeithley FWIW, there's no retention duration in our WORM implementation.  It's all or nothing, on or off.
16:12 semiosis kkeithley: know if there's any docs on WORM besides whatever the interactive command help says?
16:12 kkeithley We do have a Compliance feature in the works that will have features like file-level retention, litigation hold, and those sorts of things.
16:12 semiosis ooh!
16:13 kkeithley txbowhunter: just what's on www.gluster.org. Documentation has been one of our perennial sore points
16:15 skippy txbowhunter: https://github.com/gluster/glusterfs/blob/master/doc/features/worm.md
16:15 glusterbot Title: glusterfs/worm.md at master · gluster/glusterfs · GitHub (at github.com)
16:15 skippy but in my tests on Gluster 3.5.2, WORM was effectively read-only :(
16:15 skippy https://gist.github.com/skpy/29cf2a4fe334cd2a142b
16:15 glusterbot Title: gluster-worm.md (at gist.github.com)
16:19 kryl joined #gluster
16:19 kryl hi
16:19 glusterbot kryl: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
16:20 kryl I use the defaults,_netdev options in /etc/fstab but I have to run the mount -a command to mount the volumes manually... is there a way to debug or delay that?
16:20 txbowhunter kkeithley: the only point in the doc I found concerning was "new gluster 'volume set' command to enable/disable 'WORM' feature on a volume"
16:20 semiosis kryl: what distro?
16:20 semiosis kryl: what version of glusterfs?
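
For context, an fstab entry like the one kryl describes might look as follows (hostname, volume and mount point are placeholders). _netdev only tells the init scripts to wait for networking, so on some setups the boot-time mount can still race glusterd starting up, which would explain why a later 'mount -a' succeeds:

    # /etc/fstab -- illustrative entry
    server1:/myvol  /mnt/myvol  glusterfs  defaults,_netdev  0 0
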
16:20 txbowhunter if one can disable the feature and make the data no longer immutable, it would surely fail the litmus test of the SEC
16:21 MacWinner joined #gluster
16:21 semiosis txbowhunter: any admin with access to the storage servers could go around glusterfs & modify the bricks directly.  of course, they're not supposed to, but they could
16:21 txbowhunter they're looking for a "once you write the data, nobody can mess with it in any way, shape or form, short of taking a huge magnet to the drives" architecture
16:24 kkeithley Yes, I spend seven years working on EMC Centera and am familiar with compliance. The GlusterFS WORM feature is not meant to address things like SarBox, HIIPA, or any of the European regulatory requirements. That will be in the Compliance feature I mentioned.
16:24 kkeithley SarBox, heh, that's good
16:24 kkeithley SarbOx
16:24 semiosis i suppose a system could be set up where the data is encrypted & replicated to servers with different admins, so no admin could modify the data on both servers
16:24 txbowhunter kkeithley: ahh ok, gotcha
16:25 txbowhunter is there any sort of ETA on this Compliance feature, kkeithley ?
16:25 kkeithley 3.7 might be optimistic.
16:26 kkeithley Very optimistic
16:27 JoeJulian If you have developers you can donate...
16:28 Otlichno left #gluster
16:28 JoeJulian Wouldn't you need a storage medium that's not erasable then?
16:29 semiosis well he said huge magnet
16:29 Lee- I've rebooted one of my 2 gluster nodes in a replication set up. The rebooted machine when I run "gluster volume status" shows that the brick held by this 2nd node is not online. When I check the log file I get a message that states "Using Program GlusterFS 3.3, Num (1298437), Version (330)" and another that states "Server and Client lk-version numbers are not same, reopening the fds", however gluster --version says "glusterfs 3.5.2"
16:29 glusterbot Lee-: This is normal behavior and can safely be ignored.
16:30 kkeithley Why? One of EMC Centera's biggest use cases is compliance, and it doesn't need uneraseable storage media to achieve that
16:30 txbowhunter kkeithley: yep
16:30 * semiosis ^5 glusterbot
16:30 txbowhunter the regulators just want to see that there exists no functionality to delete or modify the data once on the volume
16:31 txbowhunter they realize and accept that inherently, the data on a magnetic storage device is subject to the laws of magnetism, hehe
16:31 semiosis Lee-: check the log files for an indication why the brick did not start.  /var/log/glusterfs/bricks/whatever
16:31 semiosis Lee-: put the log on pastie.org if you want us to take a look
16:33 kkeithley Well, if someone can get root access to their Centera (e.g. L1 support) there's nothing that would prevent someone with that level of access from deleting or modifying files that are in retention. The difference is that Centera is a closed box appliance. Gluster could be that too, but generally it's not.
16:34 toordog-work semiosis I like your script gfid-resolver.sh,  It should be included with Gluster FS as part of a troubleshoot toolkit
16:35 semiosis thanks! :)
16:40 Lee- log of the brick in question: http://pastebin.com/rwvC0rGv
16:40 glusterbot Please use http://fpaste.org or http://paste.ubuntu.com/ . pb has too many ads. Say @paste in channel for info about paste utils.
16:41 Lee- http://fpaste.org/132499/67308141/
16:41 glusterbot Title: #132499 Fedora Project Pastebin (at fpaste.org)
16:50 semiosis Lee-: that brick is running, accepting connections from clinets.
16:50 semiosis clients even
16:50 dtrainor joined #gluster
16:51 Lee- semiosis, it appears that way from the log, but I don't really know what is problematic from that log and what isn't. Also if I look at network traffic, it appears to be handling similar traffic to the other node. However, gluster volume status bixcode says "N" in the Online column and N/A in the Port column
16:51 semiosis try 'gluster volume start $vol force' that should re-try starting the brick
16:51 Lee- For example, the log file makes references to inodes not being found. that seems strange to me
16:51 semiosis maybe restart glusterd on that server as well
16:53 Lee- neither "gluster volume start bixcode force", nor restarting glusterd worked
16:54 semiosis could you please pastie the output of the volume status & also include output of 'ps ax | grep gluster'
16:54 semiosis ...on the server where the brick is supposedly down
16:55 Lee- ps output -- http://fpaste.org/132504/36815314/
16:55 glusterbot Title: #132504 Fedora Project Pastebin (at fpaste.org)
16:57 Lee- gluster volume status output -- http://fpaste.org/132505/10368258/
16:57 glusterbot Title: #132505 Fedora Project Pastebin (at fpaste.org)
16:58 Lee- to note, the only volume I'm actually using is bixcode. the ones with the city names have no data and are not mounted by any clients.
16:59 semiosis yep, that's pretty strange
16:59 semiosis brick is clearly running... root      2316 11.7  0.6 241868 24896 ?        Ssl  16:21   3:50 /usr/sbin/glusterfsd -s glusterfs2.bix.local --volfile-id bixcode.glusterfs2.bix.local.gluster-xvdb1-bixcode -p /var/lib/glusterd/vols/bixcode/run/glusterfs2.bix.local-gluster-xvdb1-bixcode.pid -S /var/run/e48ea89ddc3aa7b715b82089d20264c7.socket --brick-name /gluster/xvdb1/bixcode -l /var/log/glusterfs/bricks/gluster-xvdb1-bixcode.log --xlator-option *-
16:59 semiosis posix.glusterd-uuid=8b654911-7f0d-4384-988a-768fa733851c --brick-port 49158 --xlator-option bixcode-server.listen-port=49158
16:59 semiosis and log shows connections
16:59 semiosis weird
16:59 semiosis iptables???
17:00 Lee- what's strange is if I use "ifstat" to watch network traffic on both gluster nodes, they're very similar, which makes me think that it's functioning, but I need to restart the other gluster node to add more CPU power to it and I'm afraid that it will result in split brain :\
17:00 Lee- I have no iptables rules. This is running in AWS and I'm just using their security policies
17:01 Lee- granted I probably should be using iptables as well, but I want to eliminate any possibilities of issues from that, so there are none
17:01 semiosis ok, next, pastie the glusterd log, that's the etc-glusterfs-glusterd one
17:03 Lee- the etc-glusterfs-glusterd.log -- http://fpaste.org/132508/68551141/
17:03 glusterbot Title: #132508 Fedora Project Pastebin (at fpaste.org)
17:03 Lee- I'm trimming these logs starting a couple minutes before I rebooted the machine, which was at 16:21
17:03 DV_ joined #gluster
17:04 semiosis nothing interesting there, except maybe the op_ctx modification failed, but idk what that means
17:06 Lee- Now I'm not sure what to do to avoid a bigger mess. I really need to increase the CPU power on that other gluster node, but I can't do that if it's going to result in split brain or some other mess.
17:08 Lee- hell it might even be in a split brain situation now if the 2 servers aren't communicating
17:08 Lee- maybe I should just shut down the "working" machine just to avoid that situation and not bring it back up, switch back to NFS and try glusterfs again at some later point.
17:12 rturk joined #gluster
17:12 semiosis stick around, maybe JoeJulian or someone else will have an idea
17:13 JoeJulian I'm in training all day. I don't have enough concentration to spare until my lunch break.
17:14 LHinson joined #gluster
17:18 LHinson1 joined #gluster
17:19 DV_ joined #gluster
17:22 _dist joined #gluster
17:23 _dist I was wondering, the gluster heal seems to take up an intense amount of cpu and a substantial amount of time. This is probably because I'm running it on VMs, perhaps diff heal is the wrong way to go about it
17:24 _dist bandwidth isn't a concern for us, would a full heal for our VM volume be better? I'm considering changing to that instead, but I'm concerned it'll still take days to heal after just a few minutes of a brick being down
17:25 JoeJulian Some people have had better success with full
17:25 semiosis _dist: depending on your workload you may get better performance from the full alg.  i do.  i also reduce the number of background heals to 2 or 4
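
A sketch of the two settings semiosis refers to, with 'myvol' as a placeholder (confirm the exact option names for your release with 'gluster volume set help'):

    gluster volume set myvol cluster.data-self-heal-algorithm full
    gluster volume set myvol cluster.background-self-heal-count 2
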
17:28 RameshN joined #gluster
17:29 _dist semiosis: Also, the info healed report gives me thousands of "healed" entries; seems like every time it "heals" a piece of data it reports it in that list. Is there a way to get a list of only when files are fully healed?
17:30 semiosis idk
17:30 semiosis tbh i never look at my heal status :/
17:30 PeterA joined #gluster
17:30 _dist My workload is almost nothing, my 10gbe is using like 50mbit average but it takes 20 hours to heal a brick (of only 800G) after an outage of 30 minutes
17:30 _dist and, the heal process takes like 50% of 24 cores
17:32 _dist it's been about 12 hours since I brought the brick back up, and 10 of 24 VMs have finished healing as reported by the decrease in heal info, info healed is kinda useless
17:34 _dist also, new in 3.5.2 heal info takes 10-15 seconds to report, but statistics is quick (but has the same issue heal info had in 3.4.2 of false reports during heavy IO)
17:36 gmcwhistler joined #gluster
17:39 lalatenduM joined #gluster
17:41 todakure left #gluster
17:43 _dist I'm looking into the processes, does gluster use rsync to heal?
17:48 JoeJulian _dist: No. But it works the same way.
17:49 glusterbot New news from newglusterbugs: [Bug 1140338] rebalance is not resulting in the hash layout changes being available to nfs client <https://bugzilla.redhat.com/show_bug.cgi?id=1140338>
17:52 Gabou is glusterfs stable enough? (i see a lot of bugzilla pages)
17:52 semiosis Gabou: stable enough for who?  i've been using it in prod for over three years, it's been a huge success for my company
17:55 _dist JoeJulian: thanks, if I wanted to monitor the heal traffic, does it run on a specific destination port on the brick being healed? is it just the shd port for that brick?
17:55 JoeJulian Gabou: The only software that does not have a lot of bug reports either does only one tiny thing, or nobody uses it.
17:55 JoeJulian shd port
17:57 _dist which shd bears the burden of the comparison? I have 3 replicas, two are running at high cpu, I assume they are both working to compare? Is it just by chance whoever gets to their 10 min interval first?
17:58 JoeJulian A client /may/ do that instead if it hits that file first.
17:58 longshot902 joined #gluster
17:59 _dist got it, the fact that I can take a server down and brick it back up and do nothing, and everything just works is awesome. But 20 hours for < 1TB of data is insane by my standards
17:59 _dist brick=bring :)
18:00 semiosis _dist: remind me again, what kind of data are you storing in gluster?
18:00 _dist to be fair though, the files are technically much larger, but they are thin vms (actualy data <1TB)
18:00 semiosis oh ok
18:00 semiosis so you probably do want diff then
18:00 _dist semiosis: this volume specifically is vm images, mostly qcow2 for kvm
18:00 _dist diff is my current setting
18:01 semiosis you could try full on a test volume
18:01 semiosis but i would be hesitant to try that on live vms, idk how the full heal would affect a running vm
18:01 semiosis s/live/production/
18:01 glusterbot What semiosis meant to say was: but i would be hesitant to try that on production vms, idk how the full heal would affect a running vm
18:02 _dist haha
18:02 theron joined #gluster
18:03 _dist I guess it must be cpu bound, it's just frustrating because my native read speed is like 900 Megabytes 4k random on each brick, my network bandwidth can double that (and did yesterday during the storage migration to the gluster replica)
18:03 Gabou JoeJulian: shd port? I don't know that i'm sorry, am I bad?
18:05 _dist Gabou: type "gluster volume status"
18:05 _dist wait, not sorry that's not the correct answer
18:06 Gabou i'm currently testing it, and i removed all xfs partitions for now, because I hadn't chosen the right amount of disk space..
18:07 * _dist is finding his shd port cause he needs to for other reasons anyway
18:08 Gabou semiosis: have you ever had a disk crash or something? it was fine with glusterfs?
18:18 jmarley joined #gluster
18:19 glusterbot New news from newglusterbugs: [Bug 1140348] Renaming file while rebalance is in progress causes data loss <https://bugzilla.redhat.com/show_bug.cgi?id=1140348>
18:23 LHinson joined #gluster
18:23 semiosis Gabou: nope, never had a disk crash.  i'm running in EC2 so everything is perfect lol
18:24 semiosis but seriously, i've had some issues along the way, but nothing that couldn't be fixed
18:31 DV__ joined #gluster
18:32 _dist Until I get the all clear from JoeJulian that remove/add/replace/rebalance all work fine with open files I wouldn't run gluster directly on disks, I'd use some kind of software or hardware raid for each brick
18:38 jmarley joined #gluster
18:38 _dist personally I'd rather use raid of some kind under each brick anyway because if a disk _is_ a brick, and it dies then it needs to "heal" and I'd rather something like raid rebuild seamless to gluster.
18:41 _dist JoeJulian: it looks like after the upgrade my shd is still writing to glustershd.log.1 and the non rotated is empty, is that a bug ?
18:42 dberry joined #gluster
18:42 dberry joined #gluster
18:45 ekuric joined #gluster
18:47 kanagaraj joined #gluster
18:49 semiosis _dist: sounds like you're not using copytruncate
18:52 _dist semiosis: in the past (3.4.2-1) it rolled them, seems like since upgrade to 3.5.2 it doesn't. I didn't change anything, do I need to?
18:53 julim joined #gluster
18:53 semiosis maybe a packaging issue?
18:53 jmarley_ joined #gluster
18:54 _dist could be, I got it from the debian at gluster.org
18:56 mhoungbo joined #gluster
18:57 semiosis i'll look into that :)
18:57 LHinson1 joined #gluster
18:57 semiosis thx for letting me know
18:57 _dist someone else reported the same problem (someone also using proxmox, on the proxmox forum) so maybe it's not just me
18:58 _dist if you want me to cat anything let me know :)
19:01 _dist I'm wondering if cluster.eager-lock has anything to do with the poor healing speed on my VMs
19:02 semiosis i think i need to add copytruncate to glusterfs-common.logrotate.  it's already in glusterfs-server.logrotate.  https://github.com/semiosis/glusterfs-debian/tree/wheezy-glusterfs-3.5/debian
19:02 glusterbot Title: glusterfs-debian/debian at wheezy-glusterfs-3.5 · semiosis/glusterfs-debian · GitHub (at github.com)
19:03 rotbeard joined #gluster
19:05 _dist it's not going to hurt us in the short term, the file only grows about 1MB/day if that
19:06 richvdh joined #gluster
19:13 _dist actually now that I'm digging into the processes I think I'm being unfair to gluster. The heal of live VMs requires it keep the healing brick up to date with new incoming writes as well as catch up with everything that was missed
19:16 semiosis _dist: try adding the copytruncate line to your glusterfs-common config & see if that helps.  if it solves it for you i'll add it to the package right away
19:16 semiosis (this week)
19:16 _dist ok, will do now, I assume it will require a restart of the service ?
19:17 semiosis no
19:17 semiosis has nothing to do with gluster
19:17 semiosis it's a logrotate config file
19:17 semiosis it will be read when logrotate runs
19:17 _dist yeap just realized that
19:19 VeggieMeat joined #gluster
19:21 _dist changed it and ran a logrotate --force
19:22 kumar joined #gluster
19:22 _dist looks like it fixed the etc-glusterfs-glusterd I'll wait for the next 10 min interval on the shd
19:23 _dist (in two mins) :)
19:24 theron joined #gluster
19:30 jmarley_ joined #gluster
19:31 _dist shd log still not rotating
19:33 gmcwhistler joined #gluster
19:41 Lee- joined #gluster
19:47 failshel_ joined #gluster
19:57 failshell joined #gluster
20:01 _dist semiosis: here's a debug run of logrotate, my glustershd.log.1 is 104MB right now, .log is empty https://dpaste.de/QW7T#L
20:01 glusterbot Title: dpaste.de: Snippet #282490 (at dpaste.de)
20:01 semiosis uh huh
20:01 semiosis think about that
20:01 _dist yeap, I know :)
20:01 semiosis do you see why?  what to do about it?
20:02 _dist I guess I could echo some stuff to it
20:02 semiosis how about you rename the .1 back to the .log it used to be, then try logrotate again
20:02 _dist better idea yes :)
20:07 * _dist never fussed with logrotate before :)
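
Roughly the steps semiosis suggests; the logrotate config file name is a guess based on the packaging discussion above:

    mv /var/log/glusterfs/glustershd.log.1 /var/log/glusterfs/glustershd.log
    logrotate --force /etc/logrotate.d/glusterfs-common
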
20:07 _dist semiosis: the debug run now complains about missing gzips and doesn't rotate the shd
20:08 semiosis is the .log file still growing?
20:08 semiosis how about putting up a paste of ls -l /var/log/glusterfs
20:08 _dist the correct log file is growing now
20:09 _dist semiosis: https://dpaste.de/bexJ
20:09 glusterbot Title: dpaste.de: Snippet #282492 (at dpaste.de)
20:09 semiosis yep, that should get rotated
20:12 _dist semiosis: https://dpaste.de/V314
20:12 glusterbot Title: dpaste.de: Snippet #282495 (at dpaste.de)
20:13 semiosis https://dpaste.de/V314#L102
20:13 glusterbot Title: dpaste.de: Snippet #282495 (at dpaste.de)
20:13 semiosis success \o/
20:14 _dist yeah but keep reading :)
20:14 _dist line 109
20:15 _dist and you can see via the ls right after that it didn't actually copy it
20:15 semiosis well idk what to say
20:15 semiosis good luck :)
20:15 _dist haha
20:15 semiosis let me know when you figure it out
20:16 semiosis gotta get back to work
20:16 _dist yeap, np, thanks for your help
20:16 semiosis yw
20:17 bala joined #gluster
20:31 gmcwhistler joined #gluster
20:36 dtrainor joined #gluster
20:37 if-kenn joined #gluster
20:45 if-kenn Hi, I am trying to tune a Gluster 3.5 replication setup for reading small files. I have altered the trusted vol file under /var/lib/glusterd/vols/.  I am not seeing any performance change before/after I stop/start and umount/mount the volume.  Is there any succinct list of steps to get this to work that anyone can point me to?
20:46 if-kenn Thanks
20:47 sputnik13 joined #gluster
20:48 theron joined #gluster
20:48 failshell joined #gluster
20:49 glusterbot New news from newglusterbugs: [Bug 1140396] ec tests fail on NetBSD <https://bugzilla.redhat.com/show_bug.cgi?id=1140396>
20:51 LHinson joined #gluster
20:51 _dist if-kenn: how are you measuring the performance right now ?
20:53 if-kenn _dist: using siege to query a webserver with static files in the following scenarios: 1) local files, 2) via native NFS, 3) gluster via NFS, 4) gluster via gluster client 5) “quick-read tuned” gluster via gluster client
20:55 _dist if-kenn: in scenarios like that I've found a read cache in your underlying FS large enough to handle the files is great. But I've honestly not seen performance issues with small files (running 3.5.2 but tested in 3.4.2). Though we only have about 400k files
21:00 if-kenn _dist: in querying 63 URLs of static content, i am seeing the average results for the scenarios as the following:
21:00 if-kenn 1) Local: 8.20 seconds
21:00 if-kenn 2) NFS Native: 26.45 seconds
21:00 if-kenn 3) Gluster/NFS client: 8.39 seconds
21:00 if-kenn 4) Gluster/Gluster client: 121.37 seconds
21:00 if-kenn 5) Gluster "tuned"/Gluster client: 121.38 seconds
21:03 if-kenn The interesting things to note are: Gluster/NFS is near local speed, so some caching has to be taking place; Gluster/Gluster is taking 5x as long as Gluster/NFS; the tuning is either not making any noticeable difference or is not correctly done.
21:03 _dist any idea which part of the gluster fuse mount is taking that long? I never expected something like that, so I probably can't help
21:04 _dist JoeJulian: ^^
21:06 _dist if-kenn: in the mean time I'd recommend looking at the options in "gluster volume set help" many of them aren't documented
21:07 _dist you could grep for things like read, and cache but these are just guesses on my part
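
For example, to list the documented read- and cache-related options on a given release (which options turn up will vary by version, and any values picked from them should be tested first):

    gluster volume set help | grep -iE 'read|cache'
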
21:09 if-kenn my feeling is that my “tuning” is not even touching the right things as there is practically no change in the benchmark
21:09 if-kenn i guess i should anti-tune something to just know that i am having some kind of effect
21:10 if-kenn i have to run an errand now but will stay on AFK so that if anyone thinks of something i will see it.
21:35 sputnik13 joined #gluster
21:43 sputnik13 joined #gluster
22:05 calum_ joined #gluster
22:22 zerick joined #gluster
22:35 dtrainor joined #gluster
22:44 tom[] joined #gluster
23:07 gmcwhistler joined #gluster
23:18 notxarb left #gluster
23:26 dtrainor joined #gluster
23:46 gildub joined #gluster
23:53 if-kenn_afk Can anyone please help me, no matter how I change my replicated volume's trusted .vol file it does not seem to have any effect.  Can someone please help me figure out a way to test if the trusted…vol file is being loaded properly?
