IRC log for #gluster, 2013-12-19


All times shown according to UTC.

Time Nick Message
00:03 glusterbot New news from resolvedglusterbugs: [Bug 762233] Kernel compile fails on Stripe + Switch setup <https://bugzilla.redhat.com/show_bug.cgi?id=762233> || [Bug 762031] booster segfault while reading volume file <https://bugzilla.redhat.com/show_bug.cgi?id=762031> || [Bug 761860] cleanup unwanted ctx dictionary in 'inode' and 'fd' structures. <https://bugzilla.redhat.com/show_bug.cgi?id=761860> || [Bug 761861] need for enhance
00:05 theron joined #gluster
00:17 glusterbot New news from newglusterbugs: [Bug 882127] The python binary should be able to be overridden in gsyncd <https://bugzilla.redhat.com/show_bug.cgi?id=882127> || [Bug 908518] [FEAT] There is no ability to check peer status from the fuse client (or maybe I don't know how to do it) <https://bugzilla.redhat.com/show_bug.cgi?id=908518> || [Bug 915996] [FEAT] Cascading Geo-Replication Weighted Routes <https://bugzilla.redhat.com/s
00:34 glusterbot New news from resolvedglusterbugs: [Bug 768324] memory corruption in client process.[Release:3.3.0qa15] <https://bugzilla.redhat.com/show_bug.cgi?id=768324> || [Bug 768901] du command throws "Stale NFS file handle" messages <https://bugzilla.redhat.com/show_bug.cgi?id=768901> || [Bug 782257] glusterfs.file. get/set feature using xattr() calls is not working <https://bugzilla.redhat.com/show_bug.cgi?id=782257> || [Bug 782
00:54 bala joined #gluster
01:08 yinyin joined #gluster
01:43 nueces joined #gluster
02:00 jag3773 joined #gluster
02:09 nueces joined #gluster
02:22 lyang0 joined #gluster
02:24 bharata-rao joined #gluster
02:37 edong23 joined #gluster
02:38 nueces joined #gluster
02:38 shubhendu joined #gluster
02:38 FilipeCifali joined #gluster
02:56 raghug joined #gluster
03:03 kshlm joined #gluster
03:06 yinyin joined #gluster
03:12 lyang0 joined #gluster
03:35 saurabh joined #gluster
03:45 micu joined #gluster
03:45 micu2 joined #gluster
03:46 itisravi joined #gluster
04:01 shylesh joined #gluster
04:05 hagarth joined #gluster
04:10 satheesh1 joined #gluster
04:23 hagarth joined #gluster
04:24 kanagaraj joined #gluster
04:34 ababu joined #gluster
04:36 harish joined #gluster
04:44 ppai joined #gluster
04:48 RameshN joined #gluster
04:52 CheRi joined #gluster
04:52 MiteshShah joined #gluster
05:03 psyl0n joined #gluster
05:04 lalatenduM joined #gluster
05:05 primechuck joined #gluster
05:11 kdhananjay joined #gluster
05:14 RameshN joined #gluster
05:15 psharma joined #gluster
05:20 ndarshan joined #gluster
05:22 prasanth joined #gluster
05:25 atrius joined #gluster
05:25 mohankumar joined #gluster
05:30 vpshastry joined #gluster
05:31 anands joined #gluster
05:31 bala joined #gluster
05:49 bala joined #gluster
05:53 pureflex joined #gluster
06:03 aravindavk joined #gluster
06:11 overclk joined #gluster
06:15 nshaikh joined #gluster
06:19 raghu joined #gluster
06:29 zeittunnel joined #gluster
06:41 mohankumar joined #gluster
06:43 JonnyNomad joined #gluster
07:05 FarbrorLeon joined #gluster
07:10 pureflex joined #gluster
07:12 JonnyNomad joined #gluster
07:23 jtux joined #gluster
07:39 ngoswami joined #gluster
07:46 atrius joined #gluster
07:47 dneary joined #gluster
07:48 CheRi joined #gluster
07:52 ricky-ti1 joined #gluster
07:57 satheesh joined #gluster
08:01 JonnyNomad joined #gluster
08:02 ctria joined #gluster
08:06 ekuric joined #gluster
08:14 eseyman joined #gluster
08:18 ngoswami joined #gluster
08:19 ricky-ticky joined #gluster
08:20 itisravi joined #gluster
08:22 blook joined #gluster
08:26 franc joined #gluster
08:28 satheesh joined #gluster
08:35 pk joined #gluster
08:35 mgebbe_ joined #gluster
08:36 mgebbe joined #gluster
08:37 hagarth joined #gluster
08:39 shri joined #gluster
08:43 ricky-ticky joined #gluster
08:52 overclk joined #gluster
09:01 andreask joined #gluster
09:03 hagarth joined #gluster
09:12 blook hi there
09:13 blook i'm struggling with a weird problem in my gluster 3.4.1 distributed-replicated setup
09:14 blook i have a lot of files in the xattrop (volume heal info) directory, most of them zero-sized with the sticky bit set. in my opinion these are the dht links to other bricks, created after a file was renamed/moved
09:15 blook the problem is that their afr attributes are not set to zero
09:16 blook those files accuse the other replica and also themselves of not being in sync
09:17 blook but every one of those files has exactly the same xattrs: dht-linkto, afr, trusted.gfid and so on
09:17 blook so for me they are in sync
09:18 kaushal_ joined #gluster
09:18 blook the number of those files keeps growing and growing, and now volume heal info times out after 2 minutes :(
09:18 blook so the question is, is that a known bug? or am i missing something?
09:19 itisravi joined #gluster
09:20 blook ah, one more thing: the only differing xattr is the xtime flag of the geo-replication
09:20 blook perhaps this makes the difference
09:21 eseyman joined #gluster
09:22 rastar joined #gluster
09:27 ricky-ticky1 joined #gluster
09:39 pk blook: hi
09:40 blook pk: hi :)
09:40 pk blook: Sorry was not at my desk. Files in xattrop directory are the gfids of files which need self-heal or Files/Dirs which are undergoing some changes
09:41 pk blook: the volume heal info command is not very good at printing a large number of entries
09:42 pk blook: I just posted the new implementation yesterday for this: http://review.gluster.com/6529
09:42 glusterbot Title: Gerrit Code Review (at review.gluster.com)
09:42 pk blook: It is going to be backported to 3.5 as well
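Since the entries in the xattrop index directory are named by gfid, one way to see which files they point to is to follow the hardlink kept under .glusterfs back to a path on the brick. A minimal bash sketch, assuming a brick at /data/tb (the path from the paste further down) and regular files only; the loop and path layout follow general GlusterFS conventions, not something pk prescribes here:

    #!/bin/bash
    BRICK=/data/tb                              # hypothetical brick path
    cd "$BRICK/.glusterfs/indices/xattrop" || exit 1
    for gfid in *; do
        case "$gfid" in xattrop-*) continue ;; esac   # skip the base index file
        # for regular files the gfid entry under .glusterfs is a hardlink,
        # so -samefile recovers the real path on the brick
        find "$BRICK" -path "$BRICK/.glusterfs" -prune -o \
             -samefile "$BRICK/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid" -print
    done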
09:43 pk blook: Now back to your issue. You were saying that those files are accusing the other brick.... Is the other brick down or something?
09:44 blook pk: no, everything is up and running. those files are accusing the other replica brick and also 'fooling' themselves
09:45 blook pk: i wrote a quick ruby script to search for those empty, sticky-bit files and compare them on both sides of the replica, and they are the same. i would set the afr xattrs to 0 on those, but perhaps i misunderstood something
09:46 blook pk: it's also strange that those sorts of files under the xattrop directory don't stop accumulating
09:49 atrius joined #gluster
09:55 hagarth joined #gluster
09:55 MiteshShah joined #gluster
09:58 pk blook: Seems like those files are undergoing natural changes. i.e. I/O from mount
09:58 calum_ joined #gluster
09:59 pk blook: Don't touch them if that is the case. There was a limitation in the functionality that it couldn't distinguish between files that need self-heal vs files that are undergoing I/O. The patch I posted above also handles these cases for files undergoing writes etc
10:01 blook pk: ok i see, but those files stay in the directory
10:03 blook pk: if these were natural changes they should disappear, i guess
10:05 pk blook: It depends on the I/O pattern
10:05 blook pk: my script exports all the files from one brick with the sticky bit and zero size to a json document with all their xattrs; this json document is re-read on the opposite replica and checked against the xattrs there
10:05 blook and those files stay :(
10:05 pk blook: If the same file keeps getting written it will be there forever. For example. VM file hosted on gluster volume
10:06 pk blook: As long as the VM is in use, the file would be there in the xattrop directory
10:07 blook pk: ok......i have to check if they stay for more than a couple of hours......it's just a bunch of small files (xml documents etc.) so they shouldn't stay open as long as a vm usually does
10:08 pk blook: ok, I will be back in a couple of hours. Have to attend Christmas celebrations in office. cya
10:08 blook pk: thank you for your help. i'd be glad to come back to you if it's not natural I/O behavior
10:08 blook pk: appreciate it! :)
10:08 blook pk: thanks, cu
10:37 inv joined #gluster
10:38 inv who is testing zfs and glusterfs on linux?
10:41 Shdwdrgn joined #gluster
10:56 bolazzles joined #gluster
10:58 hybrid5121 joined #gluster
11:00 ira joined #gluster
11:07 nshaikh joined #gluster
11:08 ekuric joined #gluster
11:14 mbukatov joined #gluster
11:23 diegows joined #gluster
11:25 pk blook: what happened?
11:33 nullck_ joined #gluster
11:37 edward2 joined #gluster
11:46 prasanth joined #gluster
11:50 glusterbot New news from newglusterbugs: [Bug 1037501] All the existing bricks are not marked source when new brick is added to volume to increase the replica count from 2 to 3 <https://bugzilla.redhat.com/show_bug.cgi?id=1037501>
11:51 RameshN joined #gluster
12:05 rastar joined #gluster
12:16 blook pk: the files are still there; just one of 13000 disappeared
12:17 blook pk: the other bricks also have 20-30k files and counting
12:17 CheRi joined #gluster
12:18 lalatenduM joined #gluster
12:22 dneary joined #gluster
12:24 kaushal_ joined #gluster
12:24 pk blook: hi
12:24 pk blook: Could you tell me what kind of xattrs are present on one of the files on the bricks of replica
12:25 kaushal_ joined #gluster
12:28 blook pk: xattrs are: afr.client-0/1, gfid, glusterfs.xtime, glusterfs.dht.linkto
12:30 pk blook: I need the output
12:31 blook getfattr -m . -d -e hex /data/tb/.glusterfs/b4/5d/b45d11cf-d630-4470-9de4-6f2c8ab803f3
12:31 blook getfattr: Removing leading '/' from absolute path names
12:31 blook # file: data/tradebyte/.glusterfs/b4/5d/b45d11cf-d630-4470-9de4-6f2c8ab803f3
12:31 blook trusted.afr.tb-client-0=0x000000000000000100000000
12:31 blook trusted.afr.tb-client-1=0x000000000000000100000000
12:31 blook trusted.gfid=0xb45d11cfd63044709de46f2c8ab803f3
12:31 blook trusted.glusterfs.96a695be-286c-474a-977c-100d3881c95f.xtime=0x52aafb1700086e46
12:31 blook trusted.glusterfs.dht.linkto=0x7472616465627974652d7265706c69636174652d3100
12:35 pk what about the other brick?
12:35 blook pk: sorry
12:36 pk blook: There must be another brick in that replica pair. Could you please give the output on that brick as well....
12:36 blook pk: this is the output of the replica pair:
12:36 blook getfattr -m . -d -e hex /data/tb/.glusterfs/b4/5d/b45d11cf-d630-4470-9de4-6f2c8ab803f3
12:36 blook getfattr: Removing leading '/' from absolute path names
12:36 blook # file: data/tradebyte/.glusterfs/b4/5d/b45d11cf-d630-4470-9de4-6f2c8ab803f3
12:36 blook trusted.afr.tb-client-0=0x000000000000000100000000
12:36 blook trusted.afr.tb-client-1=0x000000000000000100000000
12:36 blook trusted.gfid=0xb45d11cfd63044709de46f2c8ab803f3
12:36 blook trusted.glusterfs.96a695be-286c-474a-977c-100d3881c95f.xtime=0x52aafb17000870b0
12:36 blook trusted.glusterfs.dht.linkto=0x7472616465627974652d7265706c69636174652d3100
12:36 blook pk: the only difference is the xtime for the georeplication......
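For reference, a trusted.afr.<volume>-client-N value is three 32-bit big-endian counters packed together: data, metadata and entry pending operations, in that order. Unpacking the value from the paste above (a small bash sketch, constant copied verbatim) shows that only the metadata counter is set, which matches what pk concludes a bit further down:

    val=000000000000000100000000   # trusted.afr.tb-client-0/1 from the paste, minus the 0x
    printf 'data=%d metadata=%d entry=%d\n' \
        "$((16#${val:0:8}))" "$((16#${val:8:8}))" "$((16#${val:16:8}))"
    # prints: data=0 metadata=1 entry=0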
12:38 ababu joined #gluster
12:44 pk blook: Interesting
12:45 pk blook: xtime indicates something about geo-replication
12:45 pk blook: Are you using geo-replication?
12:45 blook pk: yes
12:46 pk blook: hmm...
12:46 blook pk: any ideas? :)
12:47 pk blook: I know why the file is staying in xattrop
12:48 pk blook: It is because the files are either undergoing metadata operations continuously (this is the very first time I have seen that happen), or the bricks in the replica were taken down while some metadata operation was going on (high probability).... What do you think it is?
12:49 blook pk: i don't think its because of the continuous metadata operations, they stay for too long in there
12:50 blook pk: again, those files are all empty with sticky bit set (dht links)
12:51 pk blook: Did the bricks go down at any point?
12:51 blook pk: yes they were
12:51 pk blook: ah!, I am a bit paranoid, so I will ask you what exactly happened. I need to know that both the bricks in that replica went down
12:53 satheesh joined #gluster
12:53 blook pk: i'll try to summarize: one complete pair went down, one brick after the other, so i got a couple of split-brain files during that time......they were resolved manually
12:54 blook pk: now all the bricks have been up since the outage, for a couple of days (9), and those files have been popping up since then
12:54 blook pk: the other pairs didn't have any outages or problems
12:55 blook pk: but the problem exists on all bricks
12:56 blook pk: to be honest - i don't know if its related to the outage a couple of days ago or not
12:57 pk blook: ok
12:57 blook pk: i also don't have much experience with the setup, because it's pretty new and the problems came when we put the system into production
12:57 blook pk: in testing everything went fine - all the outages and self-heal syncs etc.
12:57 pk blook: Unfortunately geo-rep dev is not here. I just wanted to know if we can erase the pending metadata flags without any problem for geo-rep
12:58 pk blook: Bringing down both nodes in the replica was what caused the problem.
12:58 pk blook: So my theory is while xtime was being updated, the bricks were brought down
12:59 blook pk: ok, but why is it still counting?
13:00 blook pk: i mean, why are more and more files coming into this 'heal' status, and why would geo-replication touch the afr xattrs
13:00 blook pk: the afr xattrs shouldn't be set to != 0
13:00 pk blook: basically when the xtime is updated 3 things go on. 1 - mark the dirty xattrs for client-0/1, 2 - do the xtime update, then, 3 - erase the dirty-flags - 0/1
13:00 pk I think the bricks were brought down between steps 1 and 3
13:01 blook pk: ok then it makes sense to me
13:01 pk blook: if only one of the bricks had been taken down, it would have self-healed properly.
13:01 blook pk: and why do more and more keep coming? perhaps because of the geo-replication crawler?
13:01 pk blook: But both the bricks were taken down.
13:01 pk blook: like I said, I am not very well versed in geo-rep.
13:02 pk blook: :-(
13:02 blook pk: im very happy about your help already :)
13:02 pk blook: We didn't solve any problem :-P.
13:03 blook pk: but i feel a little bit better :D
13:03 bala joined #gluster
13:03 pk blook: I would have suggested that you reset the xattrs on the bricks, but I am a bit concerned about the functionality of geo-rep, considering the xtimes are different on the bricks.
13:04 blook pk: i'm thinking about another scenario: all those files are dht link files - files that are created when a lookup on a brick fails because the file was renamed - and this dht link creation is somehow faulty
13:04 blook pk: what do you think about that theory?
13:05 zeittunnel joined #gluster
13:06 pk blook: Yes link files may get created when renames happen.
13:06 pk blook: I didn't understand what was faulty there.
13:07 blook pk: no, i mean: most of the files that are 'out of sync' are just those empty link files
13:07 pk blook: hmm...
13:07 blook pk: i can't find any 'normal' files, i.e. non-empty ones without the sticky bit set
13:08 pk blook: interesting
13:08 pk blook: I think geo-rep dev just stepped out for a while. Why don't we wait for a bit?
13:08 pk blook: his laptop is here, so I am guessing he is still in office.
13:09 blook pk: would be great - thank you for your participation so far :)
13:09 CheRi joined #gluster
13:09 pk blook: I hope I am correct, that the dev is still here :-)
13:11 pk blook: May I know which country you are from and what kind of setup you guys put in production?
13:11 blook pk: im from germany and we are starting right now with the glusterfs project - we want to get rid of our netapp
13:12 pk blook: What is the kind of capacity? TBs or PBs?
13:12 blook pk: its tb
13:12 pk blook: cool
13:12 blook pk: we have about 120TB at the moment
13:13 pk blook: cool
13:13 pk blook: what kind of work load? I mean VMs/small files/audio/pictures/ etc etc?
13:14 blook pk: right now it is a small-files nfs share (xml documents, pictures, php scripts)
13:14 blook pk: i'm looking forward to using it with libgfapi and our cloud setup as a datastore
13:15 blook pk: but for now it has to work as a nfs datastore, before we put more servers into it
13:17 blook pk: one major problem for us is the 'hidden' documentation :) - i should have come here earlier to get some great information about how it works and how it's supposed to work :)
13:20 hagarth joined #gluster
13:24 pk blook: I spoke to geo-rep dev. Changing those xattrs should not be a problem
13:24 blook pk: deleting them on both sides?
13:24 pk blook: No No
13:25 pk blook: for each file with the symptoms you told me about set the xattrs to all zeros
13:25 pk blook: shall we work on one of the files together? Maybe you can do the same for the rest?
13:25 blook pk: which xattrs exactly, the afr ones, or the xtime?
13:26 pk blook: afr ones
13:26 blook pk: no it is ok, just the conversation was misleading right now :)
13:26 pk blook: Just to be sure, check that the stat output of the file on both bricks is the same
13:26 blook pk: ok i will edit this in my script
13:27 pk blook: I am not sure I understand what you are going to edit in the script
13:29 blook pk: im going through xattrop directory on one brick, then i crawl those files and create a hash with all corresponding xattrs to each file. this is exported as a json document and transferred to the other host (the other brick of the replica pair). on this brick i reread the json document and compare the xattrs on the files there
13:30 pk blook: ok I understand
13:30 blook pk: so far every xattr is the same, just the xtime xattr is different
13:30 pk blook: that is fine.
13:31 blook pk: now i have to do a stat additionally like you said and perhaps a md5sum
13:31 blook pk: if everything is the same i set the afr to 0
13:31 blook pk: :)
13:31 blook pk: if everything breaks i'll run out of the office
13:32 blook pk: but for now i have to deal with something different. i will let you know when i'm done - how long are you here today?
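A shell version of the check blook describes (plus the stat pk asked for) might look roughly like this; the hostnames and brick path are taken from the pastes in this conversation and may not match the real setup, and the xtime key is filtered out because it is expected to differ:

    gfid=b45d11cf-d630-4470-9de4-6f2c8ab803f3
    p="/data/tb/.glusterfs/${gfid:0:2}/${gfid:2:2}/$gfid"
    for host in net-gluster-2-1 net-gluster-1-1; do
        echo "== $host =="
        ssh "$host" "getfattr -d -m . -e hex '$p' 2>/dev/null | grep -v xtime;
                     stat -c '%A %u:%g %s bytes' '$p'"
    done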
13:34 lalatenduM joined #gluster
13:34 pk Only metadata changed; data does not have any problems, according to the output you showed. That means permissions/owner/group/xattrs etc
13:34 pk blook: root@pranithk-laptop - /mnt/r2
13:34 pk 19:03:15 :) ⚡ getfattr -d -m. -e hex /home/gfs/r2_?/a
13:34 pk getfattr: Removing leading '/' from absolute path names
13:34 pk # file: home/gfs/r2_0/a
13:34 pk security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
13:34 pk trusted.afr.r2-client-0=0x000000000000000100000000
13:34 pk trusted.afr.r2-client-1=0x000000000000000100000000
13:34 pk trusted.gfid=0x5fe0bb03672f40ed80e83be4c8ef0119
13:34 pk # file: home/gfs/r2_1/a
13:34 pk security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
13:34 pk trusted.afr.r2-client-0=0x000000000000000100000000
13:34 pk trusted.afr.r2-client-1=0x000000000000000100000000
13:34 pk trusted.gfid=0x5fe0bb03672f40ed80e83be4c8ef0119
13:35 pk blook: this is the kind of data you have. If stat output is kinda similar, you can just go ahead and set any of the client-x attribute to all zeros
13:36 pk I chose to clear client-1 xattr on brick-0's 'a'
13:37 pk blook: so this is the command: setfattr -h -n trusted.afr.r2-client-1 -v 0x000000000000000000000000 /home/gfs/r2_0/a
13:37 pk blook: after that I did a stat: root@pranithk-laptop - /mnt/r2
13:37 pk 19:06:04 :) ⚡ stat a
13:37 pk File: ‘a’
13:37 pk Size: 0         Blocks: 0          IO Block: 131072 regular empty file
13:37 pk Device: 24h/36d  Inode: 9288740085261336857  Links: 1
13:37 pk Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
13:37 pk Context: system_u:object_r:fusefs_t:s0
13:37 pk Access: 2013-12-19 19:01:04.719376000 +0530
13:37 pk Modify: 2013-12-19 19:01:04.719376000 +0530
13:37 pk Change: 2013-12-19 19:05:49.861554327 +0530
13:37 pk Birth: -
13:37 pk root@pranithk-laptop - /mnt/r2
13:37 pk 19:06:12 :) ⚡ getfattr -d -m. -e hex /home/gfs/r2_?/a
13:37 pk getfattr: Removing leading '/' from absolute path names
13:37 pk blook: got it?
13:38 pk blook: Oops
13:38 pk blook: wrong explanation
13:38 pk blook: Let me start over.
13:39 blook pk: yes i got it anyway thank you
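Translated to the names from blook's earlier paste, pk's reset step would look roughly like this; it is a sketch, not a verified procedure, and should only follow the per-file check above (pk's demo clears a single client key on one brick of the pair):

    f=/data/tb/.glusterfs/b4/5d/b45d11cf-d630-4470-9de4-6f2c8ab803f3
    # clear the pending counters recorded against the other brick, as in pk's example
    setfattr -h -n trusted.afr.tb-client-1 -v 0x000000000000000000000000 "$f"
    # confirm the value actually changed
    getfattr -d -m trusted.afr -e hex "$f"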
13:39 blook pk: the only question left is why i ran into this problem
13:40 pk blook: oh you are aware of xattrs clearing for afr?
13:41 pk blook: Even I want to know how you ended up in this state. If you find something please raise a bug
13:41 blook pk: yes i know how it works - i'm wondering why this happened, and i also think clearing those afr records to 0 does not fix the initial problem; the number of files is still growing
13:42 blook pk: ok
13:42 primechuck joined #gluster
13:42 zorun joined #gluster
13:42 pk blook: I did not understand what you mean by 'files still counting'
13:43 jseeley_vit joined #gluster
13:45 blook pk: sorry, i mean: a few days ago i had perhaps about 10k files like this, now i have 30k, and the count keeps rising - every time i do a quick ls -1 | wc -l in xattrop the number has gone up
13:46 blook pk: the comparison with my script also shows that the files (those dht link files) are increasing
13:46 pk blook: Oh interesting. So you are saying new files are coming with same symptoms?
13:46 blook pk: yes they are
13:46 pk blook: interesting.
13:46 rastar joined #gluster
13:46 sroy_ joined #gluster
13:46 pk blook: This is the first time I am hearing about such an issue. Would love to know what steps can re-create such an issue.
13:47 blook pk: i don't know them either
13:47 blook pk: thats the main problem
13:48 blook pk: i don't know if it's the geo-replication crawler, which is stat-ing more and more files, or just renames from my client
13:49 zorun hi
13:49 glusterbot zorun: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
13:49 pk blook: Renames won't create files with '1' in the client-0/1 xattr. At least I have not seen such behavior
13:49 zorun would glusterfs be suitable for replicating data over the 32 nodes of an HPC cluster?
13:49 zorun I am worried about write performance
13:50 blook pk: are you pretty sure?
13:50 zorun also, would glusterfs be happy if only a few nodes are up at any given time?
13:50 pk blook: Could you give me volume status output on your cluster
13:50 pk blook: for that volume I mean
13:53 lalatenduM joined #gluster
13:54 blook Status of volume: client
13:54 blook Gluster process                                         Port    Online  Pid
13:54 blook ------------------------------------------------------------------------------
13:54 blook Brick net-gluster-2-1.adm.client.de:/data/client        N/A     Y       5690
13:54 blook Brick net-gluster-1-1.adm.client.de:/data/client        N/A     Y       5553
13:54 blook Brick net-gluster-2-2.adm.client.de:/data/client        49171   Y       5199
13:54 blook Brick net-gluster-1-2.adm.client.de:/data/client        49176   Y       5088
13:54 blook Brick net-gluster-2-3.adm.client.de:/data/client        49176   Y       5152
13:54 blook Brick net-gluster-1-3.adm.client.de:/data/client        49164   Y       10720
13:54 blook NFS Server on localhost                                 2049    Y       18434
13:54 blook Self-heal Daemon on localhost                           N/A     Y       18631
13:54 blook NFS Server on net-gluster-2-1.adm.client.de             2049    Y       24416
13:54 blook Self-heal Daemon on net-gluster-2-1.adm.client.de       N/A     Y       24552
13:54 blook NFS Server on net-gluster-2-3.adm.client.de             2049    Y       30903
13:54 blook Self-heal Daemon on net-gluster-2-3.adm.client.de       N/A     Y       543
13:54 blook NFS Server on net-gluster-1-3.adm.client.de             2049    Y       12183
13:54 blook Self-heal Daemon on net-gluster-1-3.adm.client.de       N/A     Y       12319
13:54 blook NFS Server on net-gluster-1-2.adm.client.de             2049    Y       31573
13:54 blook Self-heal Daemon on net-gluster-1-2.adm.client.de       N/A     Y       31761
13:54 blook NFS Server on net-gluster-2-2.adm.client.de             2049    Y       5613
13:54 blook Self-heal Daemon on net-gluster-2-2.adm.client.de       N/A     Y       7590
13:57 pk blook: Output seems fine. I don't see any obvious problem here.
13:58 pk blook: gluster volume info output?
13:59 pk blook: Just to check if there is anything abnormal
14:00 blook pk: Volume Name: client
14:00 blook Type: Distributed-Replicate
14:00 blook Volume ID: 96a695be-286c-474a-977c-100d3881c95f
14:00 blook Status: Started
14:00 blook Number of Bricks: 3 x 2 = 6
14:00 blook Transport-type: tcp
14:00 blook Bricks:
14:00 blook Brick1: net-gluster-2-1.adm.netways.de:/data/client
14:00 blook Brick2: net-gluster-1-1.adm.netways.de:/data/client
14:00 blook Brick3: net-gluster-2-2.adm.netways.de:/data/client
14:00 blook Brick4: net-gluster-1-2.adm.netways.de:/data/client
14:00 blook Brick5: net-gluster-2-3.adm.netways.de:/data/client
14:00 blook Brick6: net-gluster-1-3.adm.netways.de:/data/client
14:00 blook Options Reconfigured:
14:00 blook network.ping-timeout: 5
14:00 blook nfs.rpc-auth-allow: 10.10.7.*
14:01 blook auth.allow: 10.10.7.*
14:01 blook features.quota: off
14:01 blook geo-replication.indexing: on
14:01 pk blook: Any reason why ping-timeout is changed?
14:02 blook pk: 42 seconds, the default, is too long for us to wait before operations can continue
14:03 pk blook: hmm... Could you do the following grep on the mount log file? grep -i disconnected <mount-log-file>
14:04 pk blook: check if there are any unexpected disconnects, i.e. cases where you are sure there are no network problems but you still see disconnects today or yesterday.
14:06 blook pk: net-gluster-2-1:/var/log/glusterfs# grep -ir disconnected * - nothing there
14:07 bala joined #gluster
14:07 pk blook: How many mounts do you have?
14:07 blook pk: we have just mounts via nfs
14:07 pk blook: got it
14:08 pk blook: so only the nfs-server logs
14:08 blook pk: what do you mean?
14:10 pk blook: meaning no fuse mounts. nfs.log is the only log that will have all the info.
14:10 blook pk: yes, i don't have fuse mounts
14:10 blook pk: the only one is the one for the georeplication itself
14:11 pk blook: But you have to do the grep on all of the nfs.logs i.e. on net-gluster-2-1 net-gluster-1-1 net-gluster-2-2 net-gluster-1-2  net-gluster-2-3 net-gluster-1-3
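A quick way to run that grep on every server in one go (a sketch: it assumes ssh access and the stock log location; the gluster NFS server normally logs to /var/log/glusterfs/nfs.log, and the hostnames are the ones from the volume status paste):

    for h in net-gluster-{1,2}-{1,2,3}.adm.client.de; do
        echo "== $h =="
        ssh "$h" 'grep -i disconnected /var/log/glusterfs/nfs.log'
    done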
14:11 psyl0n joined #gluster
14:15 bennyturns joined #gluster
14:20 pk blook: Dude, I need to start for home now. It's 8:00 PM here in India. Please mail me the tarball of logs to pkarampu@redhat.com so that I can take a look and see if there are any problems. I will be coming online after 15 hours.
14:20 pk blook: cya tomorrow
14:20 blook pk: thank you!
14:20 blook pk: ciao
14:20 pk blook: so you will send the logs?
14:21 blook pk: yes i have to ;)
14:22 pk blook: thats cool. bbye
14:22 pk left #gluster
14:29 dbruhn joined #gluster
14:46 japuzzo joined #gluster
14:48 hybrid5121 joined #gluster
14:53 lbalog joined #gluster
14:55 B21956 joined #gluster
15:02 bennyturns joined #gluster
15:08 ababu joined #gluster
15:14 kaptk2 joined #gluster
15:16 rwheeler joined #gluster
15:19 muhh joined #gluster
15:21 wushudoin joined #gluster
15:22 zerick joined #gluster
15:26 lpabon joined #gluster
15:35 vpshastry joined #gluster
15:35 vpshastry left #gluster
15:39 CheRi joined #gluster
15:40 failshell joined #gluster
15:56 hagarth joined #gluster
15:57 dbruhn joined #gluster
15:58 bala joined #gluster
16:02 theron joined #gluster
16:04 lpabon joined #gluster
16:06 dewey joined #gluster
16:16 LoudNoises joined #gluster
16:19 bugs_ joined #gluster
16:20 lbalog_ joined #gluster
16:26 FarbrorLeon joined #gluster
16:29 GabrieleV joined #gluster
16:30 jag3773 joined #gluster
16:30 lbalog_ joined #gluster
16:36 psyl0n joined #gluster
16:37 jseeley_vit joined #gluster
16:42 thogue joined #gluster
16:45 edong23__ joined #gluster
16:48 wushudoin| joined #gluster
16:49 Onoz joined #gluster
16:53 glusterbot New news from newglusterbugs: [Bug 1045123] Latest Fedora 3.4.1 RPM does not install python gluster directory <https://bugzilla.redhat.com/show_bug.cgi?id=1045123>
16:55 rwheeler joined #gluster
16:55 lbalog_ joined #gluster
16:55 sac`away` joined #gluster
17:04 lbalog_ joined #gluster
17:04 sac`away` joined #gluster
17:15 lbalog_ joined #gluster
17:15 sac`away` joined #gluster
17:16 plarsen joined #gluster
17:18 dbruhn joined #gluster
17:19 cfeller joined #gluster
17:26 theron joined #gluster
17:29 bolazzles joined #gluster
17:35 mattappe_ joined #gluster
17:35 Mo__ joined #gluster
17:36 andreask joined #gluster
17:41 Gilbs1 joined #gluster
17:46 Gilbs1 I had a massive hardware failure on my gluster boxes last week and I'm noticing I'm missing random files since then. Luckily I have geo-replication going; is there a way to do a consistency check on the geo-rep volume to compare it to my prod volume? (too many folders to do a manual check)
17:52 psyl0n joined #gluster
17:56 mattappe_ joined #gluster
17:56 mattapperson joined #gluster
18:04 brimstone Gilbs1: find | md5sum and diff?
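One way to act on that suggestion, assuming both the production volume and the geo-rep slave can be mounted somewhere (the mount points below are made up): checksum each tree from its mount, strip the mount prefix, and diff the two lists; any line present on only one side is a missing or differing file.

    find /mnt/prod -type f -print0 | sort -z \
        | xargs -0 md5sum | sed 's#/mnt/prod/##'   > /tmp/prod.md5
    find /mnt/georep -type f -print0 | sort -z \
        | xargs -0 md5sum | sed 's#/mnt/georep/##' > /tmp/georep.md5
    diff /tmp/prod.md5 /tmp/georep.md5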
18:04 vimal joined #gluster
18:05 vimal joined #gluster
18:10 zapotah joined #gluster
18:10 zapotah hi
18:10 glusterbot zapotah: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
18:10 mattappe_ joined #gluster
18:11 zapotah i updated glusterfs from 3.4.0 to 3.4.1 and after that glusterfsd won't start
18:11 mattapperson joined #gluster
18:11 zapotah it won't write any log output at all
18:12 zapotah glusterd starts fine but the brick stays offline
18:13 y4m4 joined #gluster
18:22 semiosis zapotah: check the glusterd log file, something like /var/log/glusterfs/etc-glusterd-glusterfs.log
18:22 semiosis zapotah: then check the log for the brick that isn't starting, it may show an attempt to start & reason why it failed
18:24 blook joined #gluster
18:33 theron joined #gluster
18:37 zapotah i wonder what on earth happened
18:38 zapotah glusterfsd disappeared after i restarted glusterd
18:38 zapotah and now the brick is online
18:39 zapotah no wait
18:39 zapotah what
18:39 zapotah ill get back
18:42 zapotah okay. apparently restarting glusterd on the second server, which has not yet been updated, allowed glusterfsd to start
18:43 zapotah it never occurred to me that the fault could be with the second server
18:43 zapotah which hasn't even been touched yet
18:44 zapotah well, now need to wait for the self-heal to complete
18:44 zorun left #gluster
18:51 zaitcev joined #gluster
18:56 hybrid5121 joined #gluster
18:57 mattappe_ joined #gluster
18:58 mattapperson joined #gluster
19:02 rwheeler joined #gluster
19:04 nage joined #gluster
19:10 Gilbs1 left #gluster
19:19 badone joined #gluster
19:19 dbruhn joined #gluster
19:20 dbruhn It's done, almost 12 months later. I am fully migrated off of NetApp and onto my Gluster clusters.
19:21 aixsyd joined #gluster
19:21 aixsyd dbruhn: !!!
19:21 aixsyd Infiniband troubles, bro
19:22 dbruhn Uh oh, what's the problem?
19:22 aixsyd initial set up - i must be missing something
19:22 dbruhn distro?
19:22 aixsyd so i have two dual port cards. im not using a switch - directly connecting them
19:22 aixsyd proxmox (debian)
19:22 aixsyd does a go to B, and b to a? or a to a, b to b?
19:23 aliguori joined #gluster
19:23 aixsyd i'd like to use 802.3ad - but im not sure thats possible with IB
19:23 aixsyd for bonding
19:24 aixsyd gotta make sure my wiring is right first, though
19:24 dbruhn No idea on bonding IB
19:24 dbruhn Doesn't matter a ports a ports on IB
19:24 aixsyd so they dont have to be crossed like an ethernet bond would?
19:24 mattappe_ joined #gluster
19:25 mattapperson joined #gluster
19:25 dbruhn I've never really used bonding, so I am not sure on the answer to that. I'm only using a single cable to each of my servers.
19:25 aixsyd hm.
19:26 dbruhn You're still testing all of this right?
19:27 aixsyd of course
19:27 aixsyd months from deployment :P
19:27 dbruhn Can't hurt to try it out. I am assuming you can set up your bonds in the same way you do an ethernet connection
19:27 aixsyd http://fpaste.org/63316/48123313/
19:27 glusterbot Title: #63316 Fedora Project Pastebin (at fpaste.org)
19:27 aixsyd what the hell.
19:28 aixsyd oh god, kernel modules
19:28 dbruhn lol sec
19:28 aixsyd got em
19:28 aixsyd do you need a subnet manager for only two systems directly connected?
19:33 dbruhn Yes you do
19:34 dbruhn before you worry about the TCP/IP portion of the IB, you need to be able to run ib_hosts and ib_ping
19:34 dbruhn before you worry about the TCP/IP portion of the IB, you need to be able to run ib_hosts and ib_ping
19:34 dbruhn gah
19:34 aixsyd lol
19:34 dbruhn the IP portion runs on top of the functioning network
19:35 dbruhn damn irc client deleted my second message
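For reference, the rough order of checks dbruhn is describing for a back-to-back link, using the tools shipped in the infiniband-diags and opensm packages (the actual binary names are ibhosts/ibping rather than ib_hosts/ib_ping); a sketch, since package and service names vary by distro:

    opensm &                       # a subnet manager has to run on one of the two hosts
    ibstat                         # port State should go to Active once the SM has swept
    ibhosts                        # both HCAs should now be listed on the fabric
    ibping -S                      # on one node: start the ibping responder
    ibping -G <remote-port-GUID>   # on the other: ping it by port GUID (placeholder)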
19:35 aixsyd i can seemingly ping the other card now...
19:35 aixsyd as in normal ping
19:35 aixsyd i dont have an ib_ping command
19:35 dbruhn Then it's probably up and functioning
19:35 aixsyd or ib_hosts
19:35 dbruhn you'll probably need to install them
19:35 dbruhn I have a list of all of the packages I install on my stuff, but it's for RHEL
19:36 dbruhn so you'll have to dissect it a bit
19:36 aixsyd shite
19:37 dbruhn http://fpaste.org/63320/87481817/
19:37 glusterbot Title: #63320 Fedora Project Pastebin (at fpaste.org)
19:37 dbruhn Those are all of the packages I install for IB on RHEL
19:37 dbruhn I can't imagine they are very different on debian
19:38 ctria joined #gluster
19:38 dbruhn Also, you should make sure you are on the latest firmware on those cards. If they are Mellanox, the Mellanox people will help you and can update everything on the subnet in one update
19:39 dbruhn I have a problem in RHEL, where the cards won't connect right away so I have to manually bring the interfaces up for TCP after boot
19:39 aixsyd cisco cards
19:39 aixsyd [  3]  0.0-10.0 sec  8.63 GBytes  7.41 Gbits/sec
19:39 aixsyd fap
19:39 dbruhn hmm, I think the Ciscos are rebranded Mellanox
19:39 dbruhn 10gb cards?
19:39 aixsyd yeah
19:40 aixsyd dual port
19:40 sticky_afk joined #gluster
19:40 stickyboy joined #gluster
19:49 mattappe_ joined #gluster
19:51 mattapp__ joined #gluster
19:51 aixsyd dbruhn: i'm assuming the max rate from a single port with these cards is 10gig, ya?
20:12 lbalog_ joined #gluster
20:14 dbruhn Not sure on the model of card, but if they are 10GB cards, then I would assume yes.
20:19 aixsyd strange - i cannot seem to get a volume heal to commence via the IB cards
20:20 aixsyd i have a folder on server 2 that's not on server 1 - heal all doesn't do anything
20:23 blook joined #gluster
20:27 dbruhn does running plume status, or peer status show anything not connected?
20:28 dbruhn volume
20:48 aixsyd sec
20:48 badone joined #gluster
20:49 calum_ joined #gluster
20:50 gmcwhistler joined #gluster
20:54 glusterbot New news from newglusterbugs: [Bug 1041109] structure needs cleaning <https://bugzilla.redhat.com/show_bug.cgi?id=1041109>
20:56 gmcwhistler joined #gluster
20:56 daMaestro joined #gluster
21:16 badone joined #gluster
21:16 plarsen joined #gluster
21:21 aliguori joined #gluster
21:21 primechuck joined #gluster
21:48 primechuck joined #gluster
22:11 andreask joined #gluster
22:19 pk joined #gluster
22:23 mattappe_ joined #gluster
22:24 rwheeler joined #gluster
22:29 sprachgenerator joined #gluster
22:39 mattappe_ joined #gluster
22:41 mattappe_ joined #gluster
23:13 yinyin joined #gluster
23:13 pk left #gluster
23:27 jobewan joined #gluster
23:29 mattappe_ joined #gluster
23:31 mattapperson joined #gluster
23:34 gdubreui joined #gluster
23:37 nhm joined #gluster
