
IRC log for #gluster, 2014-12-04


All times shown according to UTC.

Time Nick Message
00:00 JoeJulian I thought they shared the data.
00:00 PeterA hmm… doesn't look like it
00:00 PeterA only showing heal-failed on one node
00:02 PeterA it's all gfid entries
00:02 JoeJulian Back in 3.3, I tested it and had to restart all the glusterd at the same time.
00:03 JoeJulian I haven't tried since. I don't really care if there's entries in there unless they're past a point in time that I consider them possibly relevant.
00:06 PeterA i thought heal-failed means files only got one copy
00:10 nishanth joined #gluster
00:12 JoeJulian It could mean anything, including things that are transient.
00:12 tessier_ Woohoo! Finally have a xen VM installing directly onto glusterfs with no iscsi hacks in the middle. Sweet.
00:13 JoeJulian The only way to tell for sure is to interpret the logs from which it originated. If you stat the file in question through a client and there isn't a new heal-failed entry, it's fine.
00:13 JoeJulian nice
00:13 tessier_ The first machine I tried it on had all sorts of issues because it was running a rather old Xen. I suspect they have made changes in the last couple of years which make this sort of thing easier.
00:13 JoeJulian Yeah, probably.
00:17 PeterA i piped the gfid list of heal-failed into gfid-resolver.sh and seems like the gfid only exist on one node....
00:17 PeterA it's a replica 2 volume
00:25 PeterA when i look into the gfids in heal-failed, it seems they only exist on the node that's reporting heal-failed
00:25 PeterA and that file has no link
00:26 PeterA seems like it's a hard link on that file system
00:26 PeterA -rw-r--r--   2 schedule  dba          12292672 Dec  3 12:20 e13f07eb-f8c6-4b59-8fbf-3bfb7686fffa
00:26 PeterA 12/03/14 16:25:19 [ /brick03/gfs/.glusterfs/e1/3f ]
00:26 glusterbot PeterA: -rw-r--r's karma is now -13
00:28 JoeJulian Yes, gfid files are hardlinks.
00:28 JoeJulian @gfid resolver
00:28 glusterbot JoeJulian: https://gist.github.com/4392640
00:30 PeterA cuz the heal-failed list is now at 1024, which seems maxed out
00:30 PeterA is it safe to delete the heal-failed file in .glusterfs?
00:30 JoeJulian usually. As long as the link count exceeds 1.
00:32 JoeJulian Won't cure the heal-failed, and will actually just exacerbate it though.
00:33 PeterA like this?
00:33 PeterA root@glusterprod005:/brick03/gfs/.glusterfs/f9/1c# ls -l f91c4e58-fdd9-4dc7-aae0-c35eac005550
00:33 PeterA -rw-rw-r-- 2 sitebuild sitebuild 3890842 Dec  3 12:18 f91c4e58-fdd9-4dc7-aae0-c35eac005550
00:33 glusterbot PeterA: -rw-rw-r's karma is now -1
00:34 n-st joined #gluster
00:35 JoeJulian Use the gfid resolver I pointed you at earlier, find out what file needs healed, try accessing that file through a client and see what the client log says.
00:35 PeterA i'm looping it over all the gfids and it's taking forever
00:36 JoeJulian I would start with one. Odds are usually pretty good that if you can figure out one, the rest have the same problem.
00:36 JoeJulian And yes, it's going to take forever.
00:36 JoeJulian It has to walk the brick tree looking for the same inode number.
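A minimal sketch of what that walk amounts to, using the brick path and gfid from PeterA's earlier paste as placeholders (the gist JoeJulian linked is the complete script): the .glusterfs entry is a hardlink, so the real path is found by matching inode numbers.

    BRICK=/brick03/gfs
    GFID=e13f07eb-f8c6-4b59-8fbf-3bfb7686fffa
    # inode of the gfid hardlink under .glusterfs/<first two>/<next two>/
    INODE=$(stat -c %i "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID")
    # walk the rest of the brick for the same inode, skipping .glusterfs itself
    find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -inum "$INODE" -print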
00:39 PeterA 0037e1a8-2f4d-4991-8453-1b0b18d345f2==File:/brick03/gfs/NetappFs2/marketing/replication-job/1417641321649.txt
00:39 PeterA root@glusterprod005:~# ls -l /brick03/gfs/NetappFs2/marketing/replication-job/1417641321649.txt
00:39 PeterA -rw-r--r-- 2 schedule dba 10737817 Dec  3 13:16 /brick03/gfs/NetappFs2/marketing/replication-job/1417641321649.txt
00:39 glusterbot PeterA: -rw-r--r's karma is now -14
00:39 PeterA found the file
00:39 PeterA 1st one :)
00:40 PeterA and over the gfs fuse mount
00:40 PeterA root@glustermgrprod001:/gfs/sas03/NetappFs2# ls -l marketing/replication-job/1417641321649.txt
00:40 PeterA -rw-r--r-- 1 schedule dba 10737817 Dec  3 13:16 marketing/replication-job/1417641321649.txt
00:40 glusterbot PeterA: -rw-r--r's karma is now -15
00:41 JoeJulian then you look in the log and see what its problem is.
00:42 JoeJulian And if you can read it, you really don't need to paste it here. I won't be second guessing your ability.
00:42 PeterA glustershd.log:[2014-12-04 00:34:41.955452] W [client-rpc-fops.c:2774:client3_3_lookup_cbk] 0-sas03-client-5: remote operation failed: No such file or directory. Path: <gfid:0037e1a8-2f4d-4991-8453-1b0b18d345f2> (0037e1a8-2f4d-4991-8453-1b0b18d345f2)
00:43 PeterA so it seems all the files went heal-failed because the other brick was down
00:43 PeterA and in that case, how to speed up the heal?
00:43 JoeJulian bend time and space... ;)
00:44 PeterA lol
00:44 PeterA would it heal itself or do i need to do something with it?
00:44 JoeJulian So you have self-heal daemons that walk the list of changed files, compare the source with the destination looking for changes, copy the data over when needed, then move on to the next. There's really no way to make that faster.
00:45 JoeJulian Though everyone asks for it to be instantaneous. :D
00:45 PeterA lol
00:45 PeterA so it will really "self-heal" ?
00:45 JoeJulian really, really.
00:45 PeterA i dun mind waiting, just want to make sure it is working on its way :P
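To confirm the self-heal daemon really is chewing through the backlog, the heal info commands give a live view (volume name sas03 is taken from PeterA's logs above; substitute your own):

    gluster volume heal sas03 info              # entries still queued for self-heal
    gluster volume heal sas03 info heal-failed  # the heal-failed list discussed above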
00:50 PeterA the reason why i'm trying to fix these is that i'm getting IO errors on files on gluster
00:50 JoeJulian That sounds like split-brain
00:50 PeterA ---> root@jobserverprod006.shopzilla.laxhq (0.02)# ls -l /netapp/rawLogDumps/logdump004.sl2.sea/search_log_2014_12_03_14
00:50 PeterA ls: /netapp/rawLogDumps/logdump004.sl2.sea/search_log_2014_12_03_14: Input/output error
00:50 glusterbot PeterA: -'s karma is now -340
00:50 PeterA but i am not getting it from the gluster volume command
00:51 JoeJulian Is that through nfs?
00:52 PeterA YES
00:52 PeterA and actually both NFS and fuse
00:52 PeterA i tried both
00:52 JoeJulian Check the nfs.log on whichever server they're mounting from for the errors (not warnings) associated with those files.
00:53 PeterA [2014-12-04 00:51:23.405114] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-sas03-replicate-2:  gfid or missing entry self heal  failed,   on /NetappFs2/rawLogDumps/logdump004.sl2.sea/search_log_2014_12_03_14
00:54 PeterA can we delete the file?
00:55 PeterA i tried to delete it over nfs and fuse but no luck
00:55 PeterA ---> root@jobserverprod006.shopzilla.laxhq (0.01)# rm /netapp/rawLogDumps/logdump004.sl2.sea/search_log_2014_12_03_14
00:55 PeterA rm: cannot lstat `/netapp/rawLogDumps/logdump004.sl2.sea/search_log_2014_12_03_14': Input/output error
00:55 glusterbot PeterA: -'s karma is now -341
00:56 JoeJulian Paste the log +- 10 lines around that error up there.
00:57 harish joined #gluster
00:59 PeterA http://pastie.org/9759411
01:06 PeterA for some reason, i see two copies of that file
01:07 PeterA but one of them is zero bytes
01:07 PeterA is it ok to remove that copy?
01:08 PeterA ok i removed that copy and it seems it healed itself....
01:09 topshare joined #gluster
01:10 PeterA so it seems like those heal-failed entries will require a manual removal of the 0 byte copy??
01:12 ceol joined #gluster
01:18 ceol is there a recommended structure for creating the bricks, as far as their location goes?
01:19 JoeJulian Some people like /data, some like /srv, some /export. I disagree with the last as it is sometimes confused with nfs.
01:20 ceol and i want to know, when i mount files from the client, how to check which bricks the file is on
01:21 JoeJulian I've been using "/data/$volume/$lv_name/brick" where the LV that's formatted xfs is mounted at "/data/$volume/$lv_name"
01:23 JoeJulian ceol:  To check ,,(which brick) your file is on from a mounted fuse client, use the following:
01:23 glusterbot ceol: To determine on which brick(s) a file resides, run getfattr -n trusted.glusterfs.pathinfo $file through the client mount.
01:24 ceol umm, for example?
01:26 JoeJulian I'm not sure how that's unclear. Offer an example as you understand it and I'll give you feedback.
01:32 ceol mkdir /mnt/datastore ; mount -t glusterfs hostname:/volume /mnt/datastore
01:33 ceol cd /mnt/datastore
01:34 ceol dd if=/dev/zero of=/mnt/datastore/a.out bs=1MB count=100;
01:34 ceol ls a.out
01:35 ceol then i can see the 'a.out' file is in that directory,
01:36 ceol from here, you mean i try that commands, getfattr -n trusted.glusterfs.pathinfo $file ?
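Roughly what that looks like against ceol's own a.out example (the exact output format varies by version, and the brick path shown is illustrative):

    getfattr -n trusted.glusterfs.pathinfo /mnt/datastore/a.out
    # trusted.glusterfs.pathinfo="(<DISTRIBUTE:volume-dht> <POSIX(/data/brick):hostname:/data/brick/a.out>)"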
01:37 PeterA seems like all we need is to run ls on those heal-failed files
01:39 ceol just run ls
01:40 PeterA hmm…think we are talking about diff topics ?:)
01:41 PeterA thanks for help JoeJulian
01:41 ceol yeah...
01:41 ceol thank you very much!!
01:41 PeterA and thanks semiosis for working on the 3.5.3 build :)
01:42 phak joined #gluster
01:51 phak hi, quick question, can anyone show me the diff btw fuse n nfs?
01:53 PeterA i can try to give u a quick answer :)
01:53 PeterA fuse: Filesystem in Userspace, e.g. glusterfs
01:53 PeterA nfs: Network File System
01:54 PeterA you need the gluster client to mount glusterfs/fuse
01:54 PeterA u need an nfs client, which comes standard in most unix
01:55 JoeJulian The fuse client connects to all the servers and manages HA for you. NFS can only connect to one IP address, though you can use tools to float that address around to give a lesser degree of HA.
01:56 phak i have 4 servers and i made distributed volume, then
01:56 phak and one client server
01:56 JoeJulian The fuse client has much higher throughput. NFS has in-kernel caching, which can be beneficial for some use cases, detrimental for others.
01:57 phak ah......
02:01 phak okay, i got it, thanks to all!
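For reference, the two mount styles being compared look roughly like this (hostname, volume and mount points are placeholders; Gluster's built-in NFS server speaks NFSv3):

    # native fuse client: talks to all servers and handles failover itself
    mount -t glusterfs hostname:/volume /mnt/gluster
    # NFS: standard kernel client, pinned to the single address you give it
    mount -t nfs -o vers=3 hostname:/volume /mnt/gluster-nfs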
02:02 HOO_ joined #gluster
02:10 glusterbot News from newglusterbugs: [Bug 1170407] self-heal: heal issue, E [afr-self-heal-data.c:1613:afr_sh_data_open_cbk] 0-vol0-replicate-2: open of <gfid:ee96b8a3-de43-48e0-8920-325a3890bb3e> failed on child vol0-client-4 (Permission denied) <https://bugzilla.redhat.com/show_bug.cgi?id=1170407>
02:11 haomaiwang joined #gluster
02:14 Guest72876 hmm,,,     gluster volume create repli-vol replica 2 [bricks..]
02:15 Guest72876 volume create: repli-vol: failed: Brick: [hostname:/bricks] not available. Brick may be containing or be contained by an existing brick
02:16 JoeJulian Guest72876: What's the command you're trying?
02:18 Guest72876 i tried, gluster volume create repli-vol replica 2 host1:/dist1 host2:/dist2 host3:/dist4 host3:/dist5
02:18 Guest72876 Multiple bricks of a replicate volume are present on the same server. This setup is not optimal.
02:18 Guest72876 Do you still want to continue creating the volume?  (y/n) y
02:19 Guest72876 volume create: repli-vol: failed: Brick: host1:/dist1 not available. Brick may be containing or be contained by an existing brick
02:19 Guest72876 here is :)
02:19 calisto joined #gluster
02:19 JoeJulian The first warning suggests that two of the hostnames resolve to the same IP address.
02:19 JoeJulian That probably messes up the rest of the command.
02:21 Guest72876 is it impossible to use the file-contained brick?
02:22 Guest72876 i see that host1:/dist1 brick has a file by running ls.
02:22 JoeJulian What's a "file-contained brick"?
02:22 Guest72876 containing some files
02:23 JoeJulian A filesystem with files on it?
02:23 Guest72876 yep, that's right
02:23 JoeJulian That's basically ok.
02:24 JoeJulian Did you resolve your hostname problem yet?
02:26 Guest72876 yeah i solved it but there's still a problem
02:26 Guest72876 volume create: repli-vol: failed: Brick: host1:/dist1 not available. Brick may be containing or be contained by an existing brick
02:45 bharata-rao joined #gluster
02:49 bala joined #gluster
03:18 calisto joined #gluster
03:25 RameshN joined #gluster
03:26 meghanam joined #gluster
03:27 meghanam_ joined #gluster
03:30 kanagaraj joined #gluster
03:41 lmickh joined #gluster
03:44 th6 joined #gluster
03:45 th6 Hiya. Anybody still awake?
03:50 DV joined #gluster
03:51 ninkotech joined #gluster
03:52 bala joined #gluster
03:59 Telsin joined #gluster
04:00 RameshN joined #gluster
04:01 itisravi joined #gluster
04:20 DV joined #gluster
04:22 free_amitc_ joined #gluster
04:26 nbalacha joined #gluster
04:37 nishanth joined #gluster
04:43 soumya joined #gluster
04:44 rjoseph joined #gluster
04:47 rafi1 joined #gluster
04:47 SOLDIERz__ joined #gluster
04:48 anoopcs joined #gluster
04:59 jiffin joined #gluster
04:59 meghanam joined #gluster
04:59 meghanam_ joined #gluster
05:02 lyang01 joined #gluster
05:05 ninkotech joined #gluster
05:05 sharknardo joined #gluster
05:06 al joined #gluster
05:06 dusmant joined #gluster
05:07 hagarth joined #gluster
05:10 hflai_ joined #gluster
05:10 johndescs joined #gluster
05:11 XpineX joined #gluster
05:12 wgao joined #gluster
05:12 badone joined #gluster
05:13 huleboer joined #gluster
05:16 aravindavk joined #gluster
05:18 nishanth joined #gluster
05:33 poornimag joined #gluster
05:49 kumar joined #gluster
05:49 hagarth joined #gluster
05:54 purpleidea joined #gluster
05:54 purpleidea joined #gluster
05:56 soumya joined #gluster
05:57 saurabh joined #gluster
05:58 RameshN joined #gluster
06:02 raghu` joined #gluster
06:03 aravindavk joined #gluster
06:04 nishanth joined #gluster
06:09 R0ok_ joined #gluster
06:11 nshaikh joined #gluster
06:11 overclk joined #gluster
06:29 R0ok_ joined #gluster
06:34 rjoseph joined #gluster
06:50 nishanth joined #gluster
06:51 dusmant joined #gluster
06:52 aravindavk joined #gluster
06:58 ctria joined #gluster
06:59 bala joined #gluster
06:59 hagarth joined #gluster
07:03 andreask joined #gluster
07:04 haomaiwang joined #gluster
07:07 R0ok_ joined #gluster
07:18 rjoseph joined #gluster
07:19 topshare joined #gluster
07:23 SOLDIERz__ joined #gluster
07:25 ramteid joined #gluster
07:27 harish joined #gluster
07:59 ws2k3 joined #gluster
08:00 harish joined #gluster
08:02 stickyboy joined #gluster
08:03 LebedevRI joined #gluster
08:10 ricky-ticky1 joined #gluster
08:11 hagarth joined #gluster
08:18 bala joined #gluster
08:20 R0ok_ joined #gluster
08:25 topshare joined #gluster
08:28 bennyturns joined #gluster
08:28 nishanth joined #gluster
08:30 dusmant joined #gluster
08:37 atalur joined #gluster
08:37 rjoseph joined #gluster
08:40 poornimag joined #gluster
08:41 glusterbot News from newglusterbugs: [Bug 1170515] Change licensing of disperse to dual LGPLv3/GPLv2 <https://bugzilla.redhat.com/show_bug.cgi?id=1170515>
08:47 vimal joined #gluster
08:49 poornimag joined #gluster
08:53 dusmant joined #gluster
08:56 kovshenin joined #gluster
08:57 bjornar joined #gluster
09:08 ghenry joined #gluster
09:08 ghenry joined #gluster
09:11 glusterbot News from newglusterbugs: [Bug 1153610] libgfapi crashes in glfs_fini for RDMA type volumes <https://bugzilla.redhat.com/show_bug.cgi?id=1153610>
09:12 poornimag joined #gluster
09:14 partner hmm was the mountpoint warning added to 3.4.2? volume create: foo: failed. The brick xxx is a mount point. i guess i haven't created volumes since the upgrade, was just wondering why our old documentation wasn't working :o
09:15 liquidat joined #gluster
09:15 JoeJulian Sounds like about the right timeframe... you can override with "force" if you wish.
09:16 partner yeah, just wanted to know "why" and random posts here and there suggest that's the reason
09:16 JoeJulian I think the idea behind it is: if the brick is a subdirectory of the mounted filesystem rather than the mount point itself, then when that filesystem is not mounted your brick won't start and you won't fill up your root filesystem.
09:16 partner yeah, that's how i read it too. its just, i have 25 bricks already in the volume and i don't want to change the schema there
09:17 JoeJulian Just add "force" to the end.
09:17 JoeJulian Learn, you must. Use the force, you will.
09:17 partner heh
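What "add force to the end" looks like in practice, reusing partner's volume name from the error above (host and brick paths are placeholders):

    gluster volume create foo replica 2 host1:/bricks/b1 host2:/bricks/b1 force
    # or, when extending an existing volume:
    gluster volume add-brick foo host3:/bricks/b1 host4:/bricks/b1 force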
09:20 [Enrico] joined #gluster
09:23 Kins joined #gluster
09:23 atalur joined #gluster
09:24 partner any known issues if i want to mix 3.4.5 and latest of 3.6 series in the same peer, possibly/probably in volumes, too?
09:24 JoeJulian I haven't heard anything.
09:25 partner i am migrating from one datacenter to another and we've been wondering whether the new boxes should have the latest version..
09:28 partner lets see, definitely don't have time for anything extra, guess we could just mass-upgrade everything once the migration is done
09:32 rjoseph joined #gluster
09:39 hagarth joined #gluster
09:40 dusmant joined #gluster
09:52 nshaikh joined #gluster
09:53 SOLDIERz__ joined #gluster
10:11 glusterbot News from newglusterbugs: [Bug 1170548] [USS] : don't display the snapshots which are not activated <https://bugzilla.redhat.com/show_bug.cgi?id=1170548>
10:13 topshare joined #gluster
10:16 nbalacha joined #gluster
10:17 bala joined #gluster
10:20 atalur joined #gluster
10:23 Slashman joined #gluster
10:36 dusmant joined #gluster
10:44 nbalacha joined #gluster
10:57 bala joined #gluster
11:11 calum_ joined #gluster
11:11 glusterbot News from newglusterbugs: [Bug 1170575] Cannot mount gluster share with mount.glusterfs <https://bugzilla.redhat.com/show_bug.cgi?id=1170575>
11:21 Hive joined #gluster
11:28 calisto joined #gluster
11:31 kkeithley1 joined #gluster
11:32 gothos joined #gluster
11:33 gothos Hey, guys. I am getting invalid inode messages in my gluster logs, is there any way to find out which brick they are originating from?
11:37 bala joined #gluster
11:40 calisto gothos: Ouch!! sorry I'm newest on this technology but  I'll walk in feeling!!
11:41 soumya_ joined #gluster
11:44 * gothos hopes this is not another bug, already found two small ones :D
11:45 gothos btw. was is the best way to submit patches?
11:45 gothos s/was/what/
11:45 glusterbot What gothos meant to say was: An error has occurred and has been logged. Please contact this bot's administrator for more information.
11:45 gothos sigh.
11:46 R0ok_ joined #gluster
11:48 hagarth gothos: gerrit instance on review.gluster.org is the best interface for submitting patches
11:48 stickyboy joined #gluster
11:48 hagarth gothos: http://www.gluster.org/documentation/developers/Simplified_dev_workflow/
11:49 gothos hagarth: thank you!
11:50 hagarth gothos: for your problem, you can send a mail on gluster-users. The chances of getting a response there are higher.
11:50 gothos hagarth: yeah, I just need to actually get to work first and accumulate all the data
11:51 gothos hm, to correct a spelling fix I would have to open a bugzilla ticket, that is overkill oO
11:52 hagarth gothos: you can also use an existing bugzilla for the same
11:52 ricky-ticky joined #gluster
11:52 hagarth gothos: for instance, https://bugzilla.redhat.com/show_bug.cgi?id=1075417
11:52 glusterbot Bug 1075417: unspecified, unspecified, ---, bugs, NEW , Spelling mistakes and typos in the glusterfs source
11:53 gothos hagarth: awesome, very helpful :)
12:04 calisto joined #gluster
12:05 ricky-ticky joined #gluster
12:09 R0ok_ joined #gluster
12:16 andreask left #gluster
12:18 poornimag joined #gluster
12:23 hagarth joined #gluster
12:25 itisravi_ joined #gluster
12:30 abyss_ I have a question about volumes:) What is better: I have 100 web-apps and their static files live on gluster... Is it better to make a volume for every app (so 100 volumes) or keep them on one big volume?
12:32 topshare joined #gluster
12:34 itisravi joined #gluster
12:39 dusmant joined #gluster
12:42 jcsp_ joined #gluster
12:44 ninkotech joined #gluster
12:45 ninkotech_ joined #gluster
12:52 R0ok_ joined #gluster
12:55 abyss_ JoeJulian: are you here?;)
12:56 chirino joined #gluster
13:02 anoopcs joined #gluster
13:04 Slashman joined #gluster
13:11 Philambdo joined #gluster
13:13 jiku joined #gluster
13:15 DV joined #gluster
13:16 itisravi joined #gluster
13:22 R0ok_ joined #gluster
13:23 calisto1 joined #gluster
13:45 RameshN joined #gluster
13:48 topshare joined #gluster
13:49 ppai joined #gluster
13:51 bene2 joined #gluster
13:53 mbukatov joined #gluster
13:53 hagarth joined #gluster
13:55 rgustafs joined #gluster
13:58 itisravi joined #gluster
14:00 plarsen joined #gluster
14:01 sniperCZE joined #gluster
14:04 sniperCZE Hello channel, I have a problem with a distributed volume in GlusterFS 3.5.2 on Debian. I have 4 servers and a distributed volume with one brick. When I try to add another brick into this volume, the server says "volume add-brick: failed: Commit failed on localhost. Please check the log file for more details." and volume info is out of sync - the server where I ran volume add-brick sees 2 bricks but the other servers see only one brick. Can anyone help with this please?
14:04 nbalacha joined #gluster
14:05 julim joined #gluster
14:05 edward1 joined #gluster
14:08 misko_ Did you check the logfile?
14:08 misko_ It should reside in /var/log/gluster/volumename.log
14:20 bennyturns joined #gluster
14:32 chirino joined #gluster
14:34 B21956 joined #gluster
14:35 sniperCZE misko_: There is copy of log of volume http://pastebin.com/jcDXDSbg
14:35 glusterbot Please use http://fpaste.org or http://paste.ubuntu.com/ . pb has too many ads. Say @paste in channel for info about paste utils.
14:36 B21956 joined #gluster
14:36 B21956 left #gluster
14:37 abyss_ sniperCZE: You have everything in logs as I can see;)
14:37 abyss_ even what to check and where to look...
14:37 tdasilva joined #gluster
14:44 anil joined #gluster
14:46 itisravi joined #gluster
14:47 anil joined #gluster
14:52 abyss_ left #gluster
14:52 abyss^ joined #gluster
14:53 virusuy joined #gluster
14:53 virusuy joined #gluster
14:57 deepakcs joined #gluster
14:58 lpabon joined #gluster
15:17 ndevos NuxRo: still using CloudStack? do you know if the (unpatched) webui offers an option for primary storage on Gluster?
15:17 * ndevos has to respond on https://reviews.apache.org/r/15933/
15:21 si289 joined #gluster
15:22 si289 left #gluster
15:23 jobewan joined #gluster
15:28 hagarth joined #gluster
15:28 SOLDIERz joined #gluster
15:33 jmarley joined #gluster
15:33 raghu_ joined #gluster
15:38 soumya_ joined #gluster
15:45 kumar joined #gluster
15:45 ctria joined #gluster
15:46 Hive1 joined #gluster
15:46 bala joined #gluster
15:46 NuxRo ndevos: yes, it is an available option out of the box, at least on 4.4.2 that I am running right now
15:49 ndevos NuxRo++ thanks for testing
15:49 glusterbot ndevos: NuxRo's karma is now 1
15:50 soumya_ joined #gluster
15:55 rjoseph joined #gluster
16:00 calisto joined #gluster
16:01 RameshN joined #gluster
16:04 bala joined #gluster
16:09 portante joined #gluster
16:14 edong23_ joined #gluster
16:14 marbu joined #gluster
16:15 Andreas-IPO joined #gluster
16:15 bene joined #gluster
16:16 stigchri1tian joined #gluster
16:16 bennyturns joined #gluster
16:17 russoisraeli joined #gluster
16:18 ildefonso joined #gluster
16:19 hagarth joined #gluster
16:19 _weykent joined #gluster
16:19 DV joined #gluster
16:19 crashmag joined #gluster
16:20 sadbox joined #gluster
16:20 kalzz joined #gluster
16:21 verboese joined #gluster
16:21 russoisraeli hello folks. I've set up gluster on 3 replicas. Each of these replicas is a dual-core system, with 3G ram, and with a dedicated gigabit interface for gluster communication. Connecting those is a gigabit switch. iperf tests show that the communication between the 3 hosts is in fact gigabit-like. The replicas run Gentoo Linux, with Gluster 3.5.2. I've set up qemu/kvm with libgfapi access to run a VM on two of the 3 replicas.
16:22 russoisraeli unfortunately, the VM's are extremely slow with hard drive access. looking at graph stats and immediate performance stats, it shows that network usage is minimal, with some 40Mbps max. CPU however skyrockets for gluster processes
16:23 russoisraeli Also, it shows frequent netstat send/receive queue piling up
16:24 russoisraeli Could someone please share some ideas and give some recommendations?
16:29 nshaikh joined #gluster
16:30 ildefonso russoisraeli, I am a bit new to this too, but, what is your network interface brand/model? also, for the VMs, what disk model did you use?
16:32 russoisraeli ildefonso - for network interfaces, it's Intel for 2 of them, and Broadcom for another one. For the VMs - it's qcow. The underlying filesystem supporting Gluster is XFS
16:32 russoisraeli *qcow2
16:33 ildefonso russoisraeli, is the VM using virtio? (it should)
16:33 russoisraeli yes, it does
16:34 gothos Apropos xfs, I cried a little today when I found out that glusterd is actually calling xfs_info to find the inode size of the FS, which by the way is failing on one of my systems.
16:35 russoisraeli I am not too familiar with XFS, but I decided to use it, since it was recommended for Gluster
16:36 ildefonso russoisraeli, is this a test system? (ie, can you break it with no serious consequences)
16:36 russoisraeli yep, it is
16:37 ildefonso ok, I guess you could unplug the network cable, and test with it disconnected and see if it improves.
16:37 _dist joined #gluster
16:37 russoisraeli i.e, with one gluster node?
16:37 ildefonso yes, one of the gluster nodes that is running a VM
16:37 russoisraeli not bad of an idea
16:38 russoisraeli let's try it
16:38 _dist JoeJulian: I heard rumours that the heal will no longer have to check entire files in some future version, is that true?
16:38 ildefonso this worked for me while troubleshooting DRBD the other day, it was networking problem (which revealed when I disconnected, with an improved performance)
16:46 russoisraeli yeah, gluster processes now barely use any CPU at all
16:46 russoisraeli and i think it is faster
16:47 russoisraeli any adjustments you did to the replica system - tweaks, etc?
16:47 russoisraeli uh oh
16:47 russoisraeli spoke too early
16:47 russoisraeli gluster is celebrating again
16:48 russoisraeli glusterfsd specifically
16:48 russoisraeli what version are you guys running?
16:48 russoisraeli by celebrating I mean using around 200% cpu :)
16:50 gothos isn't that prett normal in quite a few cases?
16:50 gothos at least here it is with m setup
16:50 gothos my
16:51 jackdpeterson joined #gluster
16:51 russoisraeli the operation I am doing - is an svn update
16:52 calisto1 joined #gluster
16:52 russoisraeli I'd expect gluster to do a lot of network i/o and not be cpu bound
16:52 edong23 joined #gluster
16:52 gothos well, I can tell you that small read/write operations are a heavy hitter CPU wise
16:53 gothos we have a few situations where transfer speeed is bound by CPU
16:53 gothos copying data in big chunks is a no brainer tho and we easily get around 300MB/s
16:53 russoisraeli I guess you are right.... let me try copying some large file into a VM
16:54 gothos beware that copying a large file is not the same as copying with big blocks
16:54 gothos just saing
16:54 gothos this fucked up keyboard
16:55 russoisraeli what would you suggest as a good test?
16:57 xbow joined #gluster
17:01 PeterA joined #gluster
17:05 tdasilva_ joined #gluster
17:08 xbow left #gluster
17:12 ildefonso russoisraeli, dd if=/dev/urandom of=test.img bs=1048576 count=100 , then: cat test.img test.img test.img > test3.img.  Now test3 is ~300MB in size of pseudorandom data.  Now, use that file to do some tests, for example:
17:12 ildefonso dd if=test3.img of=/dev/null bs=1048576 iflag=direct  (test read speed)
17:13 glusterbot News from newglusterbugs: [Bug 1075417] Spelling mistakes and typos in the glusterfs source <https://bugzilla.redhat.com/show_bug.cgi?id=1075417>
17:13 ildefonso dd if=test3.img of=test4.img bs=1048576 oflag=direct (test write speed, assuming you have enough RAM to have 300MB of data in cache)
17:14 ildefonso russoisraeli, you could also find iozone useful.
17:14 russoisraeli oldefonso - thanks so much! will try
17:14 russoisraeli ildefonso
17:16 gothos btw. `cp' only uses a bufsize of 128KB, which is still a lot better than like 8KB stuff via glusterfs
17:18 calisto joined #gluster
17:20 daMaestro joined #gluster
17:28 chirino joined #gluster
17:39 jackdpeterson Hey all, yesterday I attempted to get cachefilesd working w/ my Ubuntu 12.04 (Gluster 3.5.2) NFSv3 mounted clients. Lots of kernel complaints, overlong errors and so forth. Ultimately cachefilesd  appears to not work for whatever reason. As such I'm now looking for some recommendations as far as increasing the performance of things for read-heavy workload with very small files.
17:39 jackdpeterson -- web-server clusters hosting php sites
17:41 JoeJulian @php
17:41 glusterbot JoeJulian: (#1) php calls the stat() system call for every include. This triggers a self-heal check which makes most php software slow as they include hundreds of small files. See http://joejulian.name/blog/optimizing-web-performance-with-glusterfs/ for details., or (#2) It could also be worth mounting fuse with glusterfs --attribute-timeout=HIGH --entry-timeout=HIGH --negative-timeout=HIGH --fopen-keep-cache
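A sketch of suggestion #2 as an actual mount invocation, calling the fuse client directly; the server name, volume name and timeout values are placeholders to tune, not recommendations:

    glusterfs --volfile-server=server1 --volfile-id=webvol \
        --attribute-timeout=600 --entry-timeout=600 --negative-timeout=600 \
        --fopen-keep-cache /var/www/domains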
17:53 plarsen joined #gluster
17:54 calisto joined #gluster
17:56 harish joined #gluster
18:08 jackdpeterson @JoeJulian -- regarding glusterbot's recommendation, this does seem to make a slight improvement -- ls'ing the /var/www/domains folder is quicker. page load is a bit quicker; however, I'm curious how I would implement the io-cache translator client-side or if that's irrelevant given those attribute-timeout options ,etc
18:09 JoeJulian If you're optimizing storage performance without caching things closer to the end user, you're doing it wrong.
18:10 jackdpeterson -- my glusterd.vol file doesn't exist client-side. I'm telling it to use the server's config file but that doesn't create a local copy -- I'd like to implement some kind of test there on one client rather than apply that across the board.  Caching closer to the client would be nice if we were serving static content; however, each page is dynamic for each user.
18:12 jackdpeterson We have a number of caches in place at various levels -- object caches, etc; however, the application logic must remain dynamic.
18:13 glusterbot News from newglusterbugs: [Bug 1165938] Fix regression test spurious failures <https://bugzilla.redhat.com/show_bug.cgi?id=1165938>
18:15 lmickh joined #gluster
18:17 JoeJulian If your source code is not dynamic, it should only need to be loaded once. That's usually the biggest problem with php. Using a source cache, like apc, prevents all the most poorly written php code from doing all those massive lookups.
18:18 JoeJulian In /var/lib/glusterd/vols you can find the vol files that gluster passes the clients. You can play with those on your client, but be aware that if you do so, that client will no longer be managed with glusterd.
18:18 nshaikh joined #gluster
18:20 coredump joined #gluster
18:21 chirino joined #gluster
18:38 calisto joined #gluster
18:41 calisto joined #gluster
18:49 adamaN joined #gluster
18:51 * gothos wonders about the possible problems when glusterfs doesn't know the inode size
18:52 JoeJulian Shouldn't matter. The inode size is irrelevant with posix calls.
18:54 gothos Okay, since the call to xfs_info fails here, which apparently is fixed in master already
18:54 gothos but in that case I wont backport that
18:56 gothos I was actually wondering if my "invalid inode" errors originated there, oh well :D
18:57 diegows joined #gluster
19:02 andreask1 joined #gluster
19:03 andreask joined #gluster
19:03 JoeJulian Are they errors or warnings? I've seen that as a warning.
19:07 gothos JoeJulian: they are warnings, but the GF_ASSERT(0) right before isn't
19:08 rafi1 joined #gluster
19:10 ricky-ticky joined #gluster
19:12 JoeJulian What's not working?
19:15 gothos JoeJulian: well, I have a difference of 1TB between my bricks on two different servers. The heal doesn't seem to work, and I get a lot of messages in the logs
19:16 gothos like http://pastebin.com/raw.php?i=G4T5LJeX
19:16 JoeJulian Check with du --apparent-size . Self-heal does leave sparse files.
19:16 glusterbot Please use http://fpaste.org or http://paste.ubuntu.com/ . pb has too many ads. Say @paste in channel for info about paste utils.
19:16 gothos JoeJulian: ahh good idea! will do right away
19:18 JoeJulian That looks like the result of a bug I once saw where self-heal was creating directories in .glusterfs/XX/YY/XXYY.* instead of symlinks. If it's a directory, or possibly even if it's a file, that could possibly cause that.
19:19 JoeJulian find $brick_root/.glusterfs/*/* -type d
19:19 JoeJulian If there are any, there shouldn't be.
19:21 gothos JoeJulian: There are quite a few
19:22 JoeJulian fpaste
19:23 JoeJulian or pastebinit if you prefer...
19:24 JoeJulian I like fpaste since I use rpms and it's less typing...
19:25 julim joined #gluster
19:26 gothos don't really care either way, find is still running. I'll paste it when it's done
19:27 JoeJulian Just a small sample.. I just want to make sure I'm not steering you wrong.
19:28 gothos ah, actually the .glusterfs/XX/YY dirs are also found, those seem to be directories, but there are a lot of XXYY files
19:29 gothos http://fpaste.org/156727/72134614/
19:29 JoeJulian Meh...
19:29 JoeJulian find $brick_root/.glusterfs/*/*/* -type d
19:29 gothos Yeah, it's already running :)
19:30 JoeJulian Wow... that's a lot of directories... That's definitely wrong.
19:30 rotbeard joined #gluster
19:30 JoeJulian What version did that?
19:31 gothos No, that's the files
19:31 JoeJulian "type d" should only show directories.
19:31 gothos I first thought that the Y should be symlinks as well, but since you mentioned that files shouldn't be there either
19:32 gothos Yes, I also looked for files in between
19:32 JoeJulian No, that's only true if it's the gfid of a directory.
19:32 gothos we started with 3.5.x, it's 3.6.1 now
19:33 gothos the new find hasn't found anything yet, but it's quite slow :/
19:34 JoeJulian Read the ,,(extended attributes) of the $brick_root/home/ directory. The .glusterfs tree uses that gfid, as you can see from your example. It should be a symlink that points to its parent directory.
19:34 glusterbot (#1) To read the extended attributes on the server: getfattr -m .  -d -e hex {filename}, or (#2) For more information on how GlusterFS uses extended attributes, see this article: http://hekafs.org/index.php/2011/​04/glusterfs-extended-attributes/
19:34 JoeJulian Need to fix that hekafs link if I can find where Jeff said how to get to it...
19:38 Debloper joined #gluster
19:39 gothos JoeJulian: yeah seems that way http://fpaste.org/156728/72188114/
19:39 gothos and 00000000-0000-0000-0000-000000000001 again is a symlink to the brick root directory
19:43 sniperCZE joined #gluster
19:50 daMaestro joined #gluster
19:51 russoisraeli ildefonso - finally got to it. Any idea what the UNIX/FreeBSD variant would be for the read/write test? It doesn't appear to support iflag/oflag
20:00 verdurin joined #gluster
20:00 kmai007 joined #gluster
20:01 kmai007 hey guys what is the stable glusterfs release you'd use for production? is it 3.5.1?
20:01 JoeJulian Pretty sure it's 3.5.3 now
20:01 JoeJulian @latest
20:02 glusterbot JoeJulian: The latest version is available at http://download.gluster.org/pub/gluster/glusterfs/LATEST/ . There is a .repo file for yum or see @ppa for ubuntu.
20:02 JoeJulian No, not latest...
20:02 JoeJulian Yes, 3.5.3
20:03 kmai007 latest, but isn't there a "stable"
20:03 JoeJulian Though I've been hearing of a lot of people jumping to 3.6.1 and have had no reports of catastrophe there.
20:03 kmai007 i guess its been so long i've forgotten
20:03 JoeJulian It's actually, now that I'm actually thinking about it, a surprisingly boring release in that regard.
20:04 kmai007 3.6.1 is boring? Joe, really?
20:04 kmai007 disasters are no bueno
20:05 JoeJulian Yeah, but when they're not happening to me...
20:05 JoeJulian I like puzzles.
20:08 sniperCZE joined #gluster
20:08 kmai007 should i get the glusterfs-extra-xlators-3.5.3-1 ?
20:08 kmai007 i'm not familiar of what all has been added
20:09 zerick joined #gluster
20:09 JoeJulian no. Those aren't used.
20:09 ildefonso russoisraeli, uh... no, sorry, mostly Linux user here.
20:09 gothos *sigh* find is still running
20:10 JoeJulian Lots of files, eh?
20:10 _dist russoisraeli: I did port gluster to solaris once, but then I just switched to linux
20:11 gothos JoeJulian: yes, several millions at least, I just hope it's still around 10^6
20:11 partner one must do a couple of find/du/similar operations on a gluster volume to remember "doh, this wasn't the thing to do.."
20:12 gothos partner: the thing? like running find? it's running directly on the brick
20:13 gothos it's just a lot of small files and io with normal hdds
20:13 gothos can't help it :/
20:14 d4nku joined #gluster
20:14 partner same result :)
20:15 gothos doing it on the client would be even slower ;)
20:15 partner i don't know how many files i have but seeing the used inodes its 621 million on the volume, i wouldn't try rsync over that :o
20:17 gothos oh my, how big is that volume? I only have 29 million used
20:18 partner 280T currently out of which 256T is used
20:20 russoisraeli _dist - my VM is FreeBSD :) the underlying OS is Gentoo Linux
20:20 russoisraeli I'd use FreeBSD if i could though....
20:20 adamaN left #gluster
20:21 russoisraeli ildefonso - ok, sure. Could you please explain to me what this "iflags=direct" is?
20:21 gothos it uses direct io should be the O_DIRECT flag
20:22 gothos there are some ports for xBSD I believe
20:22 ildefonso what gothos said :)
20:22 tetreis joined #gluster
20:22 ildefonso russoisraeli, it basically bypasses cache.
20:22 sickness I recently did a little search on google, and I found no working ports of glusterfs for any bsd (unfortunately) :/
20:23 sickness I even searched for info on running a brick under cygwin, but it seems that no one has tried that
20:23 russoisraeli I'm sure that a FreeBSD port will come around
20:23 russoisraeli thanks guys
20:24 tetreis hey folks. I'm analyzing glusterfs to a setup we'll need to do in my company. We are more interested on replicating data than distributing it (in the sense of having a distributed filesystem). Is GeoRep the correct thing to read about?
20:26 partner tetreis: by replicating do you look for fault tolerance (as in data in two or more separate servers) or rather spreading the data around to multiple locations (offices or such) ?
20:26 tetreis or is GeoRep more focused on "geographic replication"? I mean, is there any other approach for replication on servers close to each other? (I know that technically it doesn't matter if servers are 10m or 10k km distant of each other)
20:26 partner it does matter when thinking latency
20:27 tetreis good point
20:27 elico joined #gluster
20:27 tetreis What we want is to have 2 separate filesystems that have the same content. We want to write on both and get the changes replicated. And we deal with lots of small files.
20:28 partner geo-replication is/used to be a "smart rsync" to a remote location, one-way
20:28 partner though that changed in 3.5
20:29 tetreis right, doesn't seem my case
20:29 partner i like to refer to these nice redhat pictures such as this one: https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/images/Replicated_Volume.png
20:29 partner sounds like the thing you're looking for
20:30 partner that's basic functionality of glusterfs
20:30 d4nku Hello all, I'm not able to find any documentation on this, but I'm hoping someone can shed some light. Would having multiple 802.1Q tags on a bond0 affect gluster in any way? The replication/peer network would be on its own single port/vlan
20:32 partner tetreis: just today created one replicated volume as an effort to separate some data from another volume. further the idea is to migrate bricks to another datacenter, either with the hardware or over the network. the idea here is that as it's replicated i can take down one half of the hardware for the move and then let self-heal catch up and sync all the data
20:33 kmai007 does anybody know what the new syntax would be for fetch-attempts=5 in the /etc/fstab ?
20:33 JoeJulian d4nku: It /shouldn't/ but in practice, we semi-frequently have people come here that have problems that have to do with bonded interfaces. Whether that's their hardware, or a kernel problem, or just configuration is beyond our scope.
20:33 tetreis nice, partner. So if I do the basic setup and one server goes down, I can just bring in some new hardware and continue from there?
20:34 JoeJulian kmai007: mount.glusterfs is a bash script. You can easily read it to see.
20:34 kmai007 thanks sir
20:34 JoeJulian I normally would, but I'm a bit tied up at the moment.
20:35 russoisraeli ildefonso - well, without the direct flag, write is a mere 4.4MBps
20:35 russoisraeli read is extremely fast though
20:35 russoisraeli 314572800 bytes transferred in 0.119821 secs (2625354587 bytes/sec)
20:35 kmai007 too bad its not in the man pages
20:35 russoisraeli 314572800 bytes transferred in 71.859886 secs (4377586 bytes/sec)
20:36 russoisraeli so, is this to assume that with the direct flag it will be even worse?
20:36 JoeJulian kmai007: file a bug report. :D
20:36 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
20:37 JoeJulian russoisraeli: not necessarily. Depends where your bottleneck is.
20:37 d4nku JoeJulian: Understood thank you
20:38 russoisraeli JoeJulian - any tuning I can do for the network queue... I am 90% sure that my bottleneck is that the network queues don't empty fast enough
20:38 partner tetreis: pretty much, see for example: http://gluster.org/community/documentation/index.php/Gluster_3.4:_Brick_Restoration_-_Replace_Crashed_Server
20:39 JoeJulian Faster nic, faster switch, faster ram, faster cpu?... that's all I can think of that should be involved in that, but I'm not a kernel hacker.
20:40 tetreis partner, awesome. thank you so much, sir
20:40 tetreis now I have work to do
20:40 B21956 joined #gluster
20:41 partner tetreis: np. and what i often do and would recommend others to do as well is to try it out with a couple of VMs for example, to get the confidence
20:42 kmai007 BAM:  https://bugzilla.redhat.com/show_bug.cgi?id=1170786
20:42 glusterbot Bug 1170786: unspecified, unspecified, ---, bugs, NEW , man mount.glusterfs
20:42 partner shut down one, see the files being online and read-writable, bring up the "broken" one or introduce an empty box and rehearse the operation, does not take too long really
20:42 tetreis cool, that's exactly what I need to do
20:43 partner that kind of already produces procedures for emergency situations, for you and your colleagues; it's easy to forget all the things after peaceful years of production use :)
20:43 glusterbot News from newglusterbugs: [Bug 1170786] man mount.glusterfs <https://bugzilla.redhat.com/show_bug.cgi?id=1170786>
20:45 ildefonso russoisraeli, try directly accessing gluster, instead of doing it from a VM
20:45 ildefonso that'll remove one layer.
20:45 russoisraeli yeah, but it will add the fuse layer
20:45 russoisraeli but you are right
20:46 partner oh, i have man pages, last time i checked IMO they were removed as outdated :o
20:55 kmai007 are most folks on here running glusterfs-3.5.3-1 ?
20:57 partner 3.4.5 still here
20:59 _dist joined #gluster
20:59 russoisraeli with fuse/gluster, write comes out to 51.5MBps and read to 482MBps (with direct flags)
21:01 partner trying with the commands ildefonso gave earlier?
21:01 russoisraeli so qemu with libgfapi either really sucks, needs to be tuned, or requires better hardware
21:01 russoisraeli yep
21:01 russoisraeli write to fuse/gluster with oflag
21:01 russoisraeli read from fuse/gluster to /dev/null with iflag
21:02 partner can you copypaste commands and their results somewhere, just for comparison-fun?
21:02 russoisraeli sure
21:03 partner i'm setting up a new storage in a day or two, why not give it a bit of benchmark while my usecase doesn't require any high performance
21:05 partner i'm still not at all sure of all the possible raid and whatnot combinations, probably have to put effort into fault tolerance and capacity rather than best possible iops
21:06 partner can we glue gluster into this? http://www.rnt.info/en/open-bigfoot-storage-object.html
21:06 russoisraeli http://pastebin.com/UFNmw1xv
21:06 glusterbot Please use http://fpaste.org or http://paste.ubuntu.com/ . pb has too many ads. Say @paste in channel for info about paste utils.
21:06 russoisraeli My second trial run in a VM yielded better write results
21:06 russoisraeli 31.7MBps
21:06 partner thanks
21:06 russoisraeli so maybe need to do several trials
21:09 shaunm joined #gluster
21:09 russoisraeli another run (regenerated all files from urandom) yielded a write speed of 32.5MBps
21:09 russoisraeli I'll do another couple
21:10 russoisraeli maybe that 4MBps run was accidental
21:11 russoisraeli (all 3 replicas in the test are idle/doing nothing but gluster. One of the replicas is running the VM. The VM doesn't have anything setup yet, so also idle)
21:14 russoisraeli 34.6MBps, and 22.7MBps write
21:14 russoisraeli at least not 4 :)
21:14 JoeJulian Heh
21:14 JoeJulian Looks like you're saturating your network now.
21:15 russoisraeli well, it's a dedicated switch...so nothing there.... maybe I should create a larger file to test.... I have Cacti running, but it polls every 5 minutes, so something needs to run for some 20 minutes to see how everything's doing
21:18 n-st joined #gluster
21:21 kmai007 Can anyone tell me if these messages in the gluster-fuse client logs are normal?  I guess this is something new i've not seen while being on glustefs3.4.2, and now its updated to glusterfs3.5.3
21:21 kmai007 http://fpaste.org/156748/77279961/
21:21 russoisraeli I need to speak with someone who knows linux a bit better than me though... when I connected the replicas back after today's test, and they started healing, Cacti shows only a peak of around 35MBps on the network. Meanwhile, Cacti also shows high # of tcpAttemptFails and tcpRetransSegs... so I am assuming that things could be better
21:22 JoeJulian kmai007: No, that looks wierd. Doesn't look like it's harming anything, but odd never the less.
21:23 partner there's a bit too much of useless logging around, we even had to tune the mount not to log pretty much anything as it kept filling up the /var :/
21:23 julim joined #gluster
21:24 partner then again what is useless..
21:24 partner i couldn't figure it out and fix :/
21:25 partner add in the broken log rotate and the disaster is waiting to happen
21:25 kmai007 totally JoeJulian; i'll delete the volume and see if i'll see that again, no wait...i'll create a new volume and try it again
21:26 kmai007 i hate regression testing, so
21:26 kmai007 i haven't updated all the servers in my R&D gluster cluster to 3.5.3 yet
21:26 kmai007 damnit
21:26 kmai007 forget the question i posed until i finish the cluster upgrade
21:26 JoeJulian partner: that's what logstash is for. :D
21:27 kmai007 its been 1 year since we've discussed logstash, the only thing i got now is a dirty mustache
21:27 partner that's what rsyslog+ES is for ;)
21:27 kmai007 yes elastic search
21:27 kmai007 wait, is partner the new gluster bot name?
21:27 partner just add elasticsearch native module to it and logstash turns useless, one java less :)
21:28 partner hehe
21:28 kmai007 partner. high-five
21:29 partner anyways, looking at the logs, maybe you log experts can once again give some advice as i'm about to run out of disk space again on those couple of servers where i kept it going, just a sec for paste
21:29 kmai007 sorry partner i didn't realize you're a real person, i thought JoeJulian was messing with me
21:30 partner kmai007: please instead of being polite just ask your question
21:30 partner :)
21:31 JoeJulian hehe
21:31 bala joined #gluster
21:31 JoeJulian copy-truncate and compress, keeping 7 days. That's what I do for logs. I also always make /var/log its own partition.
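A sketch of that scheme as a logrotate stanza; the log globs are an assumption about where the client and brick logs live on a given box:

    /var/log/glusterfs/*.log /var/log/glusterfs/bricks/*.log {
        daily
        rotate 7
        copytruncate
        compress
        missingok
        notifempty
    }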
21:32 kmai007 logrotat.d
21:33 feeshon joined #gluster
21:33 partner http://fpaste.org/156752/17728793/ here are some good old entries
21:34 partner yeah, it's just that glusterfs ships with broken / incomplete log rotation, done some fixing here and there but haven't touched the actual package so the error repeats.. my fault, i know
21:35 partner some/many versions at least, haven't checked the most recent ones so the topic can be bypassed for now :o
21:35 kmai007 partner: what version of glusterfs is that on?
21:36 JoeJulian Yeah, there's a *really* old bug filed about the broken logrotation. <grumble>
21:36 partner btw copy-truncate is poison to for example rsyslog imfile as it tracks its position within the file ("fyi")
21:36 kmai007 ghetto setup; /var is like 15GB; crontab to copy the file elsewhere, then truncate it to /dev/null
21:37 kmai007 thats what i have
21:37 partner kmai007: 3.4.5 mostly. but as the logrotate file is considered a config file it's never updated by the package..
21:37 JoeJulian ... and all that would tell me is taht imfile sucks, imho.
21:37 kmai007 true, i've messed with logrotate.d/
21:37 kmai007 i follwed this person's blog on it
21:38 partner kmai007: the issue is on the client side, i don't dare to put 15 GB to every single box
21:38 JoeJulian I always manage my logrotate files for packages in the puppet manifest/salt state that installs them.
21:38 kmai007 http://www.jamescoyle.net/how-to/679-simple-glusterfs-log-rotation
21:38 partner well, on the server side aswell but there i have space
21:38 partner JoeJulian: i know that's the best way, i've just for whatever reason neglected that part.. we have conf management in place..
21:38 JoeJulian ... iptables rules as well.
21:38 kmai007 client side i have 4gb, and its quiet
21:39 partner our logs went crazy on 3.3.2 -> 3.4.5 upgrade, had no other choice but to silence them in fstab
21:39 kmai007 yikes, flying blind
21:39 partner and once you silence something you pretty much soon forget there even was any issue
21:40 partner that's better than having production down..
21:40 kmai007 agreed.
21:40 JoeJulian Meh, you only care if it breaks. ;)
21:41 partner wasn't me doing that "fix" but as it went silent i completely forgot it
21:41 kmai007 partner: so your method of upgrading to the later version is?
21:41 kmai007 i'm still on 3.4.2
21:41 partner anyways, there's the paste above that is still flooding
21:42 kmai007 just debating whether to even make such a jump
21:44 partner hmm, i just downloaded glusterfs_3.5.3-1.debian.tar.xz and it has a broken logrotate
21:45 partner only attempts to kill one single process, that being glusterd.. darn, i'll download them all to be sure
21:45 JoeJulian pkill -HUP -f gluster
21:49 partner point being, there is no log rotation for client side at all, imo that counts as broken, there is no glusterd process either so the file is useless there too
21:49 partner but i'll check it
21:49 JoeJulian partner: Are you rpm or deb?
21:50 partner deb
21:50 JoeJulian Then that's upstream. file a bug and submit a fix. :D
21:50 glusterbot https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
21:51 JoeJulian Last I looked the rpms were being overridden in a still broken manner.
21:51 JoeJulian Let me know when you push your patch and I'll +1 it.
21:52 partner IMO there has been a working logrotate config in the repo for ages.. i'll try to find it, somewhere under contrib or something..
21:53 partner https://github.com/gluster/glusterfs/blob/master/extras/glusterfs-logrotate that one
21:54 partner since 2011, only recently updated as there was some bug i was also watching at the bugzilla (the global option thingy)
21:54 partner not sure where it ends up in the end, at least not to debs
21:54 JoeJulian semiosis: Do you know how/why that isn't being used? ^^
21:56 * semiosis remiss
21:56 semiosis because i have not sent a patch with it to debian
21:57 semiosis nor, for that matter, has anyone else
21:57 semiosis i presume you're talking about the packages in debian, from the .xz extension above
21:57 partner well there's only me as a debian user, or that's how i often feel :)
21:57 partner nope, that's from download.gluster.org
21:58 semiosis oh really?  we made .xz files?
21:58 partner i always use your packages if available
21:58 semiosis hmm i'll have to look into that, i thought we had the logrotate in our packages
21:58 partner http://download.gluster.org/pub/gluster/glusterfs/3.5/3.5.3/Debian/jessie/apt/pool/main/g/glusterfs/ this one
21:58 partner there is, its just a bit old/not the one pasted above
21:59 semiosis heh
21:59 semiosis ok
21:59 gothos JoeJulian: isn't rpm for rhel/centos also upstream?
21:59 partner so fails on client side and also i recall partially fails on the server side (not taking all processes and logs into account)
22:00 partner i can make bugzilla entry for this, nothing urgent really, not sure exactly how to make a patch for this as it exists there
22:00 semiosis you could send me a pull request :D https://github.com/semiosis/glusterfs-debian
22:00 JoeJulian gothos: yes, but the tarball is uploaded to http://pkgs.fedoraproject.org/cgit/glusterfs.git/ where the spec file is built by koji.
22:00 partner just accidentally bumped into the topic (feel free to comment on my paste still, it's flooding and i have no idea what to do)
22:01 JoeJulian So if the spec file is using the logrotate files in that git tree (which it is) then that's what we get.
22:01 partner i wonder how i properly link that existing file, i suck at this, thus unfortunately i haven't been submitting patches nor much reports either :(
22:02 gothos JoeJulian: ah okay
22:03 JoeJulian bug 1159970
22:03 glusterbot Bug https://bugzilla.redhat.com:443/show_bug.cgi?id=1159970 medium, high, future, bugs, NEW , glusterfs.spec.in: deprecate *.logrotate files in dist-git in favor of the upstream logrotate files
22:05 MacWinner joined #gluster
22:07 badone joined #gluster
22:08 gothos JoeJulian: and that is already in 3.6
22:08 gothos that reminds me that I get error messages for georep from logrotate
22:09 gothos might look into that tomorrow
22:15 partner i guess enough bugs reported so no need for yet another, perhaps reminder at the next bug triage? or something, not really familiar with the project way of working :o
22:15 JoeJulian Add your CC to a bug you're interested in. Comment if you feel the priority is wrong.
22:16 partner was just doing that, plenty of people got email
22:16 JoeJulian :D
22:17 partner i was evaluating that but given the flood we got last time the rotation would have failed anyways on that particular case. its old so its not that urgent but should get fixed imo
22:17 JoeJulian And yes, the logs are usually too chatty.
22:18 firemanxbr joined #gluster
22:18 partner any quick comments on actions i could do on this case, or there's rather two different kind of cases: http://fpaste.org/156752/17728793/
22:18 JoeJulian Unfortunately, when there's an error, its usefulness usually depends on some warning that preceded it.
22:18 partner i agree
22:19 partner but having the kind of stuff in that paste flooding in every second will just hide everything useful
22:19 JoeJulian Ooh, the mismatching layouts can usually be fixed with a rebalance...fix-layout.
22:20 partner plain rebalance won't help? it's difficult to understand all those messages and the proper actions for them.. i've only been running rebalance occasionally in an attempt to free up some space on full bricks
22:20 partner though i've been told it's pretty much useless anyway, the volume is badly fragmented for sure already
22:20 JoeJulian Plain rebalance would help, it's just overkill.
22:21 partner i can't let it run all the way ever, i would run out of memory..
22:21 partner not sure if that will happen with fix-layout as well, need to try it out
22:21 JoeJulian The dht-linkfile.c:213:dht_linkfile_setattr_cbk probably has something to do with the gfid being null. I'm not sure how that happened. I'd file that one as a bug.
22:21 badone joined #gluster
22:22 JoeJulian I've had success with fix-layout when I've never had success with the full rebalance.
22:22 partner there are some memory leaks still out there which prevent full rebalance
22:23 badone joined #gluster
22:23 gothos hm, the upstream logrotate isn't used in 3.6.1 for centos after all
22:23 partner i'll see how the fix-layout works. i recall hearing long ago that it shouldn't be needed anymore when adding bricks. i just add a brick and all the structure gets there automatically
22:24 gothos just checked the logrotate config in upstream which isn't broken like the centos version
22:24 JoeJulian Something broke the layout, otherwise you wouldn't get that error.
22:24 partner can't say if it was there before the upgrade.. but got loud after that..
22:25 partner thanks, i'll give it a shot
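For reference, the two variants being compared (volume name is a placeholder):

    gluster volume rebalance myvol fix-layout start   # recalculate directory layouts only
    gluster volume rebalance myvol start              # fix layout and migrate file data
    gluster volume rebalance myvol status             # watch progress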
22:25 gothos I'll take that back: the georep logrotate is used, and it's broken. does anyone know if there is a reason that the postrotate is defined near the beginning of the block? otherwise I'll write a patch tomorrow
22:26 gothos moving the missingok before the postrotate
22:26 JoeJulian Shouldn't matter. postrotate happens post.
22:27 JoeJulian configuration is read as a clump, then executed as necessary.
22:27 gothos JoeJulian: yeah, everything between postrotate and endscript is executed post
22:27 gothos and that means missingok is executed after rotate
22:27 JoeJulian Oh, I see what you're saying.
22:27 gothos for some reason
22:27 JoeJulian I wasn't actually looking at it.
22:27 gothos so I get daily errors from my server ;)
22:30 partner hmph, log empty. oh, log rotation, filehandle on .1 :)
22:31 partner 355623 "deleting stale linkfile" entries from the rebalance that ran a couple of days, no idea if that's good or bad. or "why"
22:32 partner was there some wiki, would be nice to collect some random explanations for those many sort of log entries
22:35 partner uh, takes on average 360 seconds to "defrag_migrate_data" per directory.. so, knowing the directory structure it would take 273 days to complete the rebalance
22:44 glusterbot News from newglusterbugs: [Bug 1170814] GlusterFS logrotate config complains about missing files <https://bugzilla.redhat.com/show_bug.cgi?id=1170814>
22:44 glusterbot News from newglusterbugs: [Bug 1159970] glusterfs.spec.in: deprecate *.logrotate files in dist-git in favor of the upstream logrotate files <https://bugzilla.redhat.com/show_bug.cgi?id=1159970>
22:44 partner make it 298 days with a proper average from the log entries :o
22:46 gothos that rfc.sh script is quite helpful :)
22:50 partner fix-layout 51-75 secs per directory, that should finish this year
22:51 partner umm no, that will take 45 days still
22:51 partner crap, i wish i could target it somehow, similarly to old self-heal examples..
23:07 partner if volume has min-free-disk defined and the limit is met, are the sticky-pointers still created?
23:10 semiosis yes
23:10 partner great, so no dht_lookup_everywhere is involved
23:14 glusterbot News from newglusterbugs: [Bug 1170825] GlusterFS logrotate config complains about missing files <https://bugzilla.redhat.com/show_bug.cgi?id=1170825>
23:35 gildub joined #gluster
23:53 edwardm61 joined #gluster
