IRC log for #gluster, 2013-06-25


All times shown according to UTC.

Time Nick Message
08:44 _ilbot joined #gluster
08:44 Topic for #gluster is now  Gluster Community - http://gluster.org | Q&A - http://community.gluster.org/ | Patches - http://review.gluster.org/ | Developers go to #gluster-dev | Channel Logs - http://irclog.perlgeek.de/gluster/
09:12 _ilbot joined #gluster
09:12 Topic for #gluster is now  Gluster Community - http://gluster.org | Q&A - http://community.gluster.org/ | Patches - http://review.gluster.org/ | Developers go to #gluster-dev | Channel Logs - http://irclog.perlgeek.de/gluster/
09:26 foster joined #gluster
09:29 vpshastry joined #gluster
09:29 45PAATHRD Hi
09:29 glusterbot 45PAATHRD: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
09:30 45PAATHRD just to let people know about my latest manipulation for converting volume distribution on a 4-node cluster:
09:30 45PAATHRD first: I had a 3-node cluster in replica 3 mode
09:31 45PAATHRD then, I wanted to add a 4th node and transform the whole cluster into a distributed-replicated volume (2x2)
09:32 45PAATHRD at first, I added the 4th node with replica 4, but I got it wrong since I then had a replica 4 volume ... not what I wanted.
09:33 45PAATHRD I just removed 2 nodes from the volume and re-added them to change the layout, but it didn't work: gluster didn't want me to add my nodes.
09:33 45PAATHRD so, I discovered this was due to the metadata from the old layout on those removed bricks.
09:34 duerF joined #gluster
09:34 45PAATHRD I re-formatted the underlying filesystem and then re-added those 2 nodes in a replica 2 volume, and voilà: a 4-node distributed-replicated 2x2 volume.
09:35 45PAATHRD At the end, I just ran a rebalance to re-distribute data across the bricks.
09:36 45PAATHRD All of this was done live; I never had to stop the volume, so my clients never saw anything (except some occasional slowdowns) ... be sure to do a full backup of the data first ... just to be sure
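
A rough sketch of the conversion 45PAATHRD describes, with hypothetical volume and brick names (myvol, nodeN:/export/brick1); the exact commands were not given in the log:

    # the mistake: adding the 4th brick with "replica 4" only grows the replica count
    gluster volume add-brick myvol replica 4 node4:/export/brick1
    # drop back to replica 2 by removing two bricks
    gluster volume remove-brick myvol replica 2 node3:/export/brick1 node4:/export/brick1 force
    # wipe the old layout metadata before reusing the bricks (here: reformat, on node3 and node4)
    mkfs.xfs -f /dev/vg0/brick1
    # re-adding the cleaned bricks turns the volume into a 2x2 distributed-replicate
    gluster volume add-brick myvol node3:/export/brick1 node4:/export/brick1
    gluster volume rebalance myvol start
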
09:36 45PAATHRD Now I have a question: can somebody explain to me clearly what direct-io-mode is and what it is used for?
09:38 partner uuh, 15 hours of disk utilization at 100% already, i probably need to start kicking something, i'll just wait a few more hours to see if for some reason those filehandle counts will rise to the target level (458k)
09:40 social__ is there any reason why gluster log times are in UTC instead of system time?
09:41 rgustafs joined #gluster
09:42 Chr1st1an joined #gluster
09:44 ndevos social__: I think people too often gave logs without details - lots of people have setups that span multiple timezones, and comparing logs without knowing the timezone is a major pain
09:46 social__ ndevos: thanks, sounds fair
09:49 partner hmm my timestamps are on local timezone
09:55 partner or not. or mixed. :)
09:55 vpshastry joined #gluster
09:55 foster joined #gluster
10:03 partner anyways, i'm still looking for hints about 100% disk utilization after stopping the rebalance; open files keep growing on the two source boxes still, and i'd rather not stop the volume as this is supposed to be an online operation..
10:20 hagarth joined #gluster
10:27 mooperd joined #gluster
10:32 CheRi joined #gluster
10:35 zetheroo joined #gluster
10:36 zetheroo after updating gluster with apt-get upgrade, gluster seems to be running very slow ... is there anything that needs doing after updating gluster ... like restarting services etc?
10:39 andreask joined #gluster
10:42 rastar joined #gluster
10:44 kkeithley1 joined #gluster
10:53 CheRi joined #gluster
11:01 zetheroo something I noticed is that glusterfsd is using 800% CPU on our second KVM server ...
11:13 Norky joined #gluster
11:20 CheRi joined #gluster
11:31 vpshastry joined #gluster
11:35 manik joined #gluster
11:37 tjikkun_work joined #gluster
11:39 aliguori joined #gluster
11:42 partner i don't see min-free-disk in the volume info but is it there by default anyway or not? the docs on the website say the default is 10%, but i'm not sure how i can check whether writes are going only to the brick with more free disk space (>10%)
11:43 partner on distributed volume that is
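
min-free-disk only appears in 'gluster volume info' once it has been set explicitly; a hedged sketch of checking and setting it (the volume name dfs is taken from later in the log, and y4m4 later recommends a byte value over a percentage because of a known bug):

    # explicitly set options show up under "Options Reconfigured:"
    gluster volume info dfs
    # make the limit explicit rather than relying on the built-in default
    gluster volume set dfs cluster.min-free-disk 10%
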
11:43 tjikkun_work joined #gluster
11:45 CheRi joined #gluster
11:58 mooperd_ joined #gluster
12:01 venkatesh joined #gluster
12:05 ctria joined #gluster
12:29 juhaj joined #gluster
12:32 plarsen joined #gluster
12:37 manik joined #gluster
12:50 andreask joined #gluster
12:52 jthorne joined #gluster
12:55 Elektordi left #gluster
13:20 vpshastry joined #gluster
13:21 rwheeler joined #gluster
13:30 nightwalk joined #gluster
13:33 lpabon joined #gluster
13:44 bennyturns joined #gluster
13:56 foxban_ joined #gluster
13:57 failshell joined #gluster
14:00 bugs_ joined #gluster
14:00 snarkyboojum joined #gluster
14:01 mriv_ joined #gluster
14:01 joelwallis joined #gluster
14:02 ccha I have 2 gluster servers and 1 gluster client on the same LAN. I want to have a direct link between the 2 servers. I want to create a VOL with these servers through this direct link
14:02 ccha from my client I can't use glusterfs client ? I can only use nfs ?
14:03 aliguori joined #gluster
14:04 bulde joined #gluster
14:06 failshell hello. i have this 3.2 cluster that contains our data and this new 3.3 cluster that needs to receive this data. im trying to setup georepl between both, but that doesnt seem to work. im guessing its because the 3.3 cluster is trying to mount the 3.2 volumes and that's failing. what are my options then to move the data to the new cluster?
14:09 MrNaviPacho joined #gluster
14:14 vpshastry left #gluster
14:14 krishnan_p joined #gluster
14:16 nightwalk joined #gluster
14:16 stigchristian joined #gluster
14:16 Cenbe joined #gluster
14:28 kaptk2 joined #gluster
14:35 nightwalk joined #gluster
14:35 stigchristian joined #gluster
14:35 Cenbe joined #gluster
14:36 foxban joined #gluster
14:40 RangerRick6 joined #gluster
14:42 jiffe2 joined #gluster
14:42 shanks` joined #gluster
14:43 DEac-_ joined #gluster
14:43 kkeithley1 joined #gluster
14:43 abyss^__ joined #gluster
14:43 edward2 joined #gluster
14:43 rgustafs_ joined #gluster
14:44 ccha2 joined #gluster
14:48 failshel_ joined #gluster
14:49 joelwallis_ joined #gluster
14:50 fxmulder joined #gluster
14:51 ninkotech joined #gluster
14:51 zetheroo joined #gluster
14:52 eryc joined #gluster
14:52 eryc joined #gluster
14:52 jtriley joined #gluster
14:53 badone joined #gluster
14:54 GabrieleV joined #gluster
14:59 codex joined #gluster
15:00 hagarth joined #gluster
15:00 zetheroo left #gluster
15:03 dowillia joined #gluster
15:03 dewey joined #gluster
15:04 plarsen joined #gluster
15:05 kaptk2 joined #gluster
15:07 dowillia1 joined #gluster
15:07 dberry joined #gluster
15:07 clutchk failshell: rsnapshot might do the trick.
15:07 bala1 joined #gluster
15:08 dberry joined #gluster
15:08 bulde joined #gluster
15:09 dberry joined #gluster
15:09 manik joined #gluster
15:14 JonnyNomad joined #gluster
15:20 neofob joined #gluster
15:22 vpshastry joined #gluster
15:24 portante joined #gluster
15:25 plarsen joined #gluster
15:27 ToMilesS joined #gluster
15:28 ToMilesS hi, having ghost directories on my gluster share, anyone have debugging tips?
15:29 ToMilesS directories show up directly on my bricks on both replicas
15:29 ToMilesS but don't show in a directory listing of the mounted volume
15:30 ToMilesS so "ls" doesn't show them, but I can 'cd' to the folder
15:30 ToMilesS no split brain reported in gluster
15:31 dberry_ joined #gluster
15:37 ofu_ joined #gluster
15:40 nixpanic_ joined #gluster
15:40 nixpanic_ joined #gluster
15:40 social_ joined #gluster
15:41 vpshastry1 joined #gluster
15:42 fcami joined #gluster
15:43 hagarth1 joined #gluster
15:46 ndevos ToMilesS: how are you mounting the volume? nfs or glusterfs
15:46 ToMilesS glusterfs
15:47 ndevos ToMilesS: can you do 'stat <nonexistant_directory>'?
15:48 ToMilesS ndevos: yeah I can
15:48 ndevos ToMilesS: and still 'ls' does not show it?
15:49 ToMilesS ndevos: indeed, stays hidden
15:50 ndevos ToMilesS: maybe 'stat .' and check again?
15:50 ndevos ToMilesS: is that on a 32-bit system maybe?
15:51 ToMilesS ndevos: its 64bit , and stat doesn't make em appear, even did a find/stat to trigger self heal on the whole volume
15:52 ndevos ToMilesS: I was just wondering if the 64-bit inode would be a potential issue - it can be on 32-bit systems
15:52 shawns|work joined #gluster
15:52 matiz joined #gluster
15:53 ndevos ToMilesS: maybe some caches are at play, but doing all those stat's should have cleared/updated them
15:53 ToMilesS ndevos: its a 96GB RAM vserver so 32bit wouldn't work for us :-)
15:54 kaptk2 joined #gluster
15:54 ndevos ToMilesS: hehe - could you try 'ls -d <nonexistant_directory>'? - that would show that readdir()/getdents() is acting weird
15:55 ToMilesS ndevos: yeah ls -d returns the directory name
15:56 ToMilesS ndevos: identical to if i do an 'ls -d' for a directory that does show up in 'ls'
15:56 Debolaz joined #gluster
15:57 ndevos ToMilesS: okay, that, and the 'cd' and 'stat', really suggest a readdir() issue - now it's only a question of where that issue lies
15:58 ToMilesS ndevos: at least we know that now
15:58 Debolaz A question... I want to have 2 servers that has access to a common filesystem. They don't have databases, they're just plain webservers. I want to do this so I can do load balancing (Meaning they're both live) and failover (Meaning they don't need each other to be online all the time) with them. Is glusterfs the best choice for me?
16:01 sprachgenerator joined #gluster
16:01 sprachgenerator joined #gluster
16:02 ndevos ToMilesS: you could check with strace what ls is doing, maybe it aborts early or something
16:03 ndevos ToMilesS: the other option would be to capture a tcpdump and check with wireshark what directory listing is returned
16:03 ToMilesS ok, not really experienced with strafe but I'll see what i can find out about what's going on and report back
16:04 ToMilesS *strace
16:05 ndevos ToMilesS: if it is not in the strace, but in the wireshark output, the issue is client side
16:06 ndevos ToMilesS: if the readdir reply from the server does not show the directory either, the issue is server side
16:06 ToMilesS ndevos: ok thanks for the pointers, I'll look into it
16:07 ndevos ToMilesS: you're welcome, and good luck!
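
A hedged sketch of the two checks ndevos suggests, with hypothetical mount point and capture file names; brick ports in the 3.3 era start at 24009 (glusterd itself listens on 24007), and 'gluster volume status' shows the actual ports:

    # client side: does readdir()/getdents() ever return the missing entry?
    strace -f -e trace=getdents,getdents64 ls /mnt/gluster > /dev/null
    # wire side: capture the traffic towards glusterd/bricks for later inspection in wireshark
    tcpdump -i any -s 0 -w /tmp/gluster-readdir.pcap 'portrange 24007-24050'
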
16:08 Mo__ joined #gluster
16:12 chirino joined #gluster
16:12 sonne joined #gluster
16:12 al joined #gluster
16:12 bivak joined #gluster
16:13 bsaggy joined #gluster
16:14 Gugge_ joined #gluster
16:15 glusterbot` joined #gluster
16:16 plarsen joined #gluster
16:16 sysconfi- joined #gluster
16:17 foxban_ joined #gluster
16:18 dowillia joined #gluster
16:19 mjrosenb_ joined #gluster
16:20 aliguori joined #gluster
16:20 tjikkun joined #gluster
16:20 tjikkun joined #gluster
16:21 kkeithley1 joined #gluster
16:24 ctria joined #gluster
16:27 ToMilesS ndevos: is there an easy way to filter the packets to look for readdir related communications from gluster?
16:27 anands joined #gluster
16:28 vpshastry1 left #gluster
16:28 ToMilesS ndevos: I see some TFTP protocol packets that mention gluster
16:36 zaitcev joined #gluster
16:37 arusso joined #gluster
16:39 ToMilesS ndevos: gluster now reports warnings: remote operation failed: No such file or directory. Path: /instance-00000266
16:39 ToMilesS that path is the one that doesn't show in the dir listing
16:49 ToMilesS ndevos: but no errors or warnings in the logs of the underlying bricks
16:54 Debolaz Hmm... I've tried doing a mount -t glusterfs localhost:/volume /mnt which seems to succeed, but trying to access /mnt now just results in whatever process tries to access it hanging indefinitely. Where should I begin debugging? I'm not seeing any obvious errors anywhere.
16:56 chirino joined #gluster
16:57 failshel_ hello. i have this 3.2 cluster that contains our data and this new 3.3 cluster that needs to receive this data. im trying to setup georepl between both, but that doesnt seem to work. im guessing its because the 3.3 cluster is trying to mount the 3.2 volumes and that's failing. what are my options then to move the data to the new cluster?
17:06 bstr joined #gluster
17:13 jiffe2 joined #gluster
17:13 foxban joined #gluster
17:13 dewey_ joined #gluster
17:14 semiosis so I took a look at hawtjni & swig yesterday night /cc chirino hagarth
17:14 chirino semiosis: thoughts?
17:14 semiosis i like hawtjni's modern style, annotations & all
17:15 semiosis swig looked archaic
17:15 semiosis i'm new to the whole thing though
17:15 chirino swig is handy if you want to target more languages.
17:15 semiosis i see
17:16 lpabon_ joined #gluster
17:17 lpabon_ joined #gluster
17:17 semiosis wondering if I could write a script to generate the hawtjni-annotated class from the .h files
17:17 semiosis or if that's just crazy
17:17 manik1 joined #gluster
17:18 chirino You planning on parsing the .h files?
17:18 semiosis if i understand correctly i would need to build static classes in java that correspond to the C interface
17:18 chirino that can get tricky.
17:18 chirino yeah
17:18 semiosis so better to do it by hand?
17:19 chirino well depends on how big the job is.
17:19 semiosis right
17:19 chirino for that small header file, I'd say building a C/C++ parser is a bit overkill.
17:20 semiosis i was thinking just a few regex replacements could take care of it
17:20 chirino yeah.. might do.
17:20 semiosis if by parser you mean with a stack -- no way
17:20 semiosis that's definitely overkill
17:20 failshell i recall there's something we need to do during  an upgrade from 3.2 to 3.3
17:20 failshell anyone can refresh my memory?
17:21 semiosis ~3.3 upgrade notes | failshell
17:21 glusterbot failshell: http://goo.gl/qOiO7
17:22 failshell semiosis: in the working directory, that's where the volume config files are right?
17:22 foster joined #gluster
17:22 semiosis failshell: yes
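
A sketch from memory of the 3.x upgrade guides, not a substitute for the notes glusterbot linked above; service names and paths can differ per distro:

    # on each server: stop gluster, then install the 3.3 packages
    service glusterd stop
    # the working directory (volume config files) moves from /etc/glusterd to /var/lib/glusterd in 3.3;
    # regenerate the volfiles once, then start the daemon normally
    glusterd --xlator-option '*.upgrade=on' -N
    service glusterd start
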
17:23 tg2 joined #gluster
17:23 semiosis chirino: i especially like all the maven integration.  i'm pretty comfortable using maven, so I like that maven can drive a lot of the C compiling
17:23 chirino yeah
17:23 failshell what's the popular fuse server again
17:24 failshell the one that can run several versions of the same bundle
17:24 failshell fuseserver?
17:24 failshell servicemix
17:24 failshell ah yeah
17:25 semiosis failshell: ?!
17:25 failshell nevermind me : )
17:25 semiosis ok
17:26 failshell you made me think of that by mentioning maven
17:26 failshell http://fusesource.com/docs/esb/4.4.1/esb_runtime/ESBRuntimeStart.html
17:26 glusterbot <http://goo.gl/hg4c9> (at fusesource.com)
17:26 failshell that's what im referring to
17:26 failshell OSGi model instead of J2EE
17:27 semiosis yeah, i got that, just odd coincidence that it has little to do with gluster except that i happen to be chatting with one of the servicemix devs (I think) at the moment
17:27 failshell was managing a few of those years ago
17:27 failshell didnt perform well and kept crashing, but it was very new at the time
17:27 semiosis btw chirino, thanks for all the awesome apache stuff. we use camel & activemq and they're great
17:28 harish joined #gluster
17:31 zaitcev joined #gluster
17:37 lalatenduM joined #gluster
17:40 dseira joined #gluster
17:40 dseira
17:41 dseira quit
17:44 chirino joined #gluster
17:44 nexus joined #gluster
17:49 piotrektt joined #gluster
17:49 piotrektt joined #gluster
17:52 ndevos ToMilesS: you need wireshark 1.8+ to be able to view the packet details
17:55 ToMilesS ndevos: ok i'll update it, "remote operation failed:" on the client but no log entries on the brick - does that indicate the issue is client side, or is it normal that only the client complains?
17:55 JoeJulian ndevos: Did you ever happen to take all the work you did on figuring out the rpc and turn it into a technical reference document?
17:55 ndevos ToMilesS: and a filter would look like  'glusterfs.proc == READDIR || glusterfs.proc == READDIRP'
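
The same filter works from the command line with tshark (1.8+ ships the glusterfs dissector); the capture file name is hypothetical, and older tshark releases use -R where newer ones use -Y:

    tshark -r /tmp/gluster-readdir.pcap -R 'glusterfs.proc == READDIR || glusterfs.proc == READDIRP'
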
17:56 ndevos JoeJulian: nope, I've only looked at the structures, not at the behaviour
17:56 rwheeler joined #gluster
17:56 JoeJulian Well, that would at least make a good outline to start one...
17:58 ndevos JoeJulian: well, I've written something -> http://people.redhat.com/ndevos/talks/gluster-wireshark-201211.pdf
17:58 glusterbot <http://goo.gl/QAzbm> (at people.redhat.com)
17:58 ndevos and http://people.redhat.com/ndevos/talks/debugging-glusterfs-with-wireshark.d/
17:58 glusterbot <http://goo.gl/3nM9n> (at people.redhat.com)
17:59 ndevos but that is more a presentation format... I'll think about documenting some of the work, but it will take a *lot* of time
18:00 ndevos ToMilesS: "remote operation failed" is not clear enough, sorry
18:00 ndevos anyway, I'll be back tomorrow
18:01 JoeJulian ndevos: What's the source document for that pdf? Is that up somewhere?
18:01 samppah has anyone done testing with different values for background-qlen mount option?
18:01 samppah i've been trying different values but i'm kind of lost what to expect from it
18:02 ndevos JoeJulian: no, its <something> openoffice, I'm happy to email it if you are interested
18:02 JoeJulian Please
18:02 JoeJulian me@joejulian.name
18:02 ndevos JoeJulian: which presentation?
18:02 JoeJulian samppah: have never even noticed that option. :/
18:03 andreask joined #gluster
18:03 JoeJulian ndevos: That one you linked to should get me started.
18:03 JoeJulian gluster-wireshark-201211
18:04 ndevos done, you should have it in a minute or so
18:04 JoeJulian Thanks.
18:04 ndevos you're welcome
18:04 * ndevos disconnects
18:06 ToMilesS ndevos: when i filter on those procs and the path of the disappeared directory, i found some reply packets from one of the brick servers
18:06 chirino joined #gluster
18:16 joelwallis joined #gluster
18:18 dberry joined #gluster
18:18 dberry joined #gluster
18:20 plarsen joined #gluster
18:21 chirino semiosis: hey, I pushed an initial project spike at: https://github.com/chirino/glfsjni
18:21 glusterbot Title: chirino/glfsjni · GitHub (at github.com)
18:22 semiosis you are too efficient!
18:22 semiosis thanks :D
18:22 chirino not too much in terms of mapping yet:
18:22 chirino https://github.com/chirino/glfsjni/blob/master/glfsjni/src/main/java/org/fusesource/glfsjni/internal/GLFS.java
18:22 glusterbot <http://goo.gl/SE1Iu> (at github.com)
18:23 semiosis ah i see
18:23 rastar joined #gluster
18:23 ujjain joined #gluster
18:23 chirino hopefully this is the only header file we need to pull in:
18:23 chirino https://github.com/chirino/glfsjni/blob/master/glfsjni/src/main/native-package/src/glfsjni.h
18:23 glusterbot <http://goo.gl/qyMQq> (at github.com)
18:24 chirino and the m4 file needs to be updated to detect where the gluster headers / libs are.
18:24 chirino https://github.com/chirino/glfsjni/blob/master/glfsjni/src/main/native-package/m4/custom.m4#L39
18:24 glusterbot <http://goo.gl/Jtq47> (at github.com)
18:25 chirino semiosis: what's your github id?
18:25 semiosis semiosis
18:26 chirino ok. you got commit access.
18:26 joelwallis joined #gluster
18:26 semiosis cool! my first commit access to someone else's github project
18:27 chirino Guess we should change all the project headers to match the glfs license.
18:27 chirino LGPL … barf. barf..
18:27 chirino :)
18:31 kkeithley_ You like GPL better?
18:31 semiosis mit/bsd/asl seemed to be the consensus yesterday
18:31 semiosis for an api
18:32 semiosis kkeithley_: well i'd love to get you a link to the logs from that chat yesterday but seems _ilbot was out to lunch for the last couple days :(
18:34 kkeithley_ license discussions tend to be too religious for my taste. ;)
18:36 chirino kkeithley_: it's all about what you goal is.
18:39 kkeithley_ Yeah. I worked for the {,MIT}XConsortium: MIT/BSD license, and I've worked for proprietary. Now I work for Red Hat, where the default, first thought is for GPL. Getting to LGPL or in the case of dual GPLv2/LGPLv3+ was a struggle.
18:39 kkeithley_ s/or in the case of/or in the case of gluster/
18:39 glusterbot What kkeithley_ meant to say was: Yeah. I worked for the {,MIT}XConsortium: MIT/BSD license, and I've worked for proprietary. Now I work for Red Hat, where the default, first thought is for GPL. Getting to LGPL or in the case of gluster dual GPLv2/LGPLv3+ was a struggle.
18:41 kkeithley_ Bottom line is, GPL is in Red Hat's DNA
18:42 kkeithley_ Some might say too much...
18:44 Debolaz I have a setup of two servers replicating each other with glusterfs now. These servers will also act as clients. Should the client connect only to its local server, or to all servers? (What I'm asking is actually: Does this make any difference?)
18:46 Debolaz I'm a little bit confused by the various examples I've been able to find, as it seems some specify only one server (Like mount -t glusterfs node01:/volume01) while others specify several servers in a .vol file.
18:47 semiosis Debolaz: generally speaking, any document that tells you about .vol files is extremely outdated
18:47 krishnan_p joined #gluster
18:47 semiosis users shouldn't encounter .vol files in normal operation, since version 3.1
18:47 semiosis Debolaz: also see ,,(mount server)
18:47 glusterbot Debolaz: The server specified is only used to retrieve the client volume definition. Once connected, the client connects to all the servers in the volume. See also @rrnds
18:48 Debolaz Ah. That clears things up a bit. :)
18:48 semiosis ah yes, there used to be a nice article linked from that factoid, but that site is gone now
18:48 semiosis :(
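
A hedged illustration of the factoid: any one server is enough to fetch the volume definition, and the 3.3/3.4-era mount script accepts a fallback volfile server; node01/node02 and the paths reuse Debolaz's earlier example:

    # the named server only hands out the volfile; the client then talks to all bricks
    mount -t glusterfs node01:/volume01 /mnt/volume01
    # /etc/fstab equivalent with a fallback volfile server
    # node01:/volume01  /mnt/volume01  glusterfs  defaults,_netdev,backupvolfile-server=node02  0 0
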
18:51 semiosis johnmark: any chance of getting a static export of C.G.O?
18:51 Debolaz Also, I'm currently using Ubuntu 12.04's 3.2 version of GlusterFS. I see there's a PPA for version 3.3. Although I generally get the "newer is better" idea, is there a strong reason to use 3.3 (with the slight administrative overhead that brings) over 3.2?
18:51 semiosis Debolaz: use the ,,(ppa)
18:51 glusterbot Debolaz: The official glusterfs 3.3 packages for Ubuntu are available here: 3.3 stable: http://goo.gl/7ZTNY -- 3.3 QA: http://goo.gl/5fnXN -- and 3.4 QA: http://goo.gl/u33hy
18:52 semiosis if you're just starting out, maybe even start with 3.4
18:58 Debolaz Starting out, but intending to use it for some very important production stuff.
18:59 Debolaz semiosis: Would 3.4 be suitable for that?
19:00 semiosis i think so, not sure when tho
19:01 Debolaz Probably better to stick with 3.3 then.
19:02 JoeJulian 3.4 does seem to be the most widely tested beta I've seen yet, though. Hopefully that's a good sign.
19:07 kkeithley_ As solid as the 3.4.0 betas seem, I still wouldn't suggest that anyone use them for production.
19:09 failshell since i upgraded to 3.3.1
19:10 failshell i can't mount in read-only
19:10 failshell it fails
19:11 semiosis @read-only
19:11 semiosis @learn read-only as read-only is broken in 3.3 -- https://bugzilla.redhat.com/show_bug.cgi?id=853895
19:11 glusterbot semiosis: The operation succeeded.
19:11 semiosis @read-only
19:11 glusterbot Bug 853895: medium, medium, ---, csaba, ON_QA , CLI: read only glusterfs mount fails
19:11 glusterbot semiosis: read-only is broken in 3.3 -- http://goo.gl/xCkfr
19:12 failshell thanks semiosis
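
For reference, bug 853895 is about a plain read-only client mount, i.e. something like the following (hypothetical names) failing on 3.3:

    mount -t glusterfs -o ro node01:/volume01 /mnt/volume01-ro
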
19:13 y4m4 joined #gluster
19:19 y4m4_ joined #gluster
19:23 semiosis yw
19:47 y4m4 joined #gluster
19:59 failshell that seems to be a fail on the QA part though
19:59 failshell that's something pretty major to miss
19:59 semiosis agreed
20:01 failshell that's gonna force us to migrate to 3.4
20:01 failshell hopefully, that wont be as painful as 3.2 => 3.3
20:06 jag3773 joined #gluster
20:06 failshell i wonder if that works in RHS
20:07 bsaggy joined #gluster
20:08 dberry joined #gluster
20:08 dberry joined #gluster
20:15 chirino joined #gluster
20:35 ToMilesS joined #gluster
20:38 rcoup joined #gluster
20:41 harish joined #gluster
20:41 ricky-ticky joined #gluster
20:51 failshell upgrading to 3.3.1 made the whole thing really unstable
20:52 glusterbot New news from newglusterbugs: [Bug 978030] Qemu libgfapi support broken for GlusterBD integration <http://goo.gl/8VZjy>
20:59 kaptk2 joined #gluster
21:04 JoeJulian failshell: which bugs did you file about said instability?
21:04 failshell i didnt file anything
21:05 failshell im trying to get this thing stable
21:05 * JoeJulian grumbles
21:05 failshell im getting lots of DNS errors
21:05 failshell gluster peer status doesnt work
21:06 JoeJulian I haven't scrolled back yet. Have you pasted a client log yet?
21:06 failshell nope
21:06 JoeJulian (you can tell I'm distracted when I'm repetitive.... <sigh>)
21:06 JoeJulian Well paste one up and let's see what you're seeing...
21:07 failshell Connection failed. Please check if gluster daemon is operational.
21:07 failshell i keep getting that
21:07 failshell cant even manage the cluster
21:07 m0zes joined #gluster
21:07 JoeJulian is glusterd running?
21:08 failshell yup
21:08 failshell the volumes appear to be functional
21:08 JoeJulian What was the last successful cli thing you did?
21:10 failshell i peer probe the machines for example
21:10 failshell it fails, it fails, it works, then fails
21:10 failshell same with volume info
21:11 JoeJulian Probably all the privileged ports are in use, which probably goes back to that dns thing. paste the glusterd.vol.log somewhere.
21:11 failshell was fine prior to upgrade to 3.3.1
21:12 failshell https://gist.github.com/failshell/5f881a97606519799093
21:12 glusterbot <http://goo.gl/sT79j> (at gist.github.com)
21:13 JoeJulian Well, that says I was right. Now what's keeping all the privileged ports open?
21:14 failshell i dunno
21:15 JoeJulian Do you know how to check what connections are open?
21:15 failshell netstat?
21:15 JoeJulian yep
21:16 failshell there's so many i cant even scroll up
21:16 failshell my buffer is not large enough :)
21:16 JoeJulian Most of them are probably CLOSE_WAIT?
21:16 failshell yeah time_wait
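
A quick way to answer both questions without scrolling - tally the connection states and the most common remote addresses (a generic sketch, nothing gluster-specific):

    netstat -tn | awk 'NR>2 {print $6}' | sort | uniq -c | sort -rn
    netstat -tn | awk 'NR>2 {print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head
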
21:17 JoeJulian Probably means that something is hammering glusterd. What's the source address of the most common entry?
21:18 failshell seems to be them talking to each other
21:18 failshell the 18 nodes
21:18 JoeJulian Are all your servers clients?
21:18 failshell no there's 18 servers and roughly 50-60 clients
21:19 failshell i upgraded the cluster to migrate to a 2 node RHS
21:19 JoeJulian Hmm, that's not what I was expecting. :)
21:19 failshell kinda shot myself in the foot
21:20 failshell so what can i do? how come it got in this state? should i shut it down and start it fresh?
21:20 failshell or is that just going to happen again?
21:20 JoeJulian Check the log directories and try to find out which log file is growing the fastest (usually the most recent timestamp, ie. ls -ltr)
21:20 JoeJulian Or that.
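
One way to do what JoeJulian describes - spot the fastest-growing log (paths assume the default log directory):

    ls -ltr /var/log/glusterfs/ /var/log/glusterfs/bricks/
    # or watch the sizes change and let the differences be highlighted
    watch -d -n 30 'du -sk /var/log/glusterfs/*.log | sort -n | tail -5'
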
21:20 JoeJulian Have you used the replace-brick command?
21:20 failshell no
21:21 failshell just upgraded the RPMs
21:21 JoeJulian Oh. Did you restart the bricks after you did that?
21:21 failshell yeah
21:21 JoeJulian remount the clients?
21:21 failshell yup
21:26 failshell [2013-06-25 17:25:27.064936] I [client-handshake.c:1445:client_setvolume_cbk] 0-gesca-shared-client-14: Server and Client lk-version numbers are not same, reopening the fds
21:26 glusterbot failshell: This is normal behavior and can safely be ignored.
21:26 failshell i get a lot of those too
21:27 mooperd joined #gluster
21:28 failshell JoeJulian: restarting the whole shebang seems to have stabilized it all
21:29 JoeJulian There seems to be some way, and I've seen this myself but haven't been able to repro it, where it keeps trying to connect to a brick that's not supposed to be there. In my case it was using replace-brick.
21:30 JoeJulian The "fastest growing log" technique is what told me where the problem was.
21:31 failshell well, the new setup is much  better
21:31 failshell only 2 nodes, but with much larger bricks
21:31 JoeJulian cool
21:32 failshell thanks for your help
21:32 JoeJulian Out of curiosity, why isn't your red hat support engineer taking care of these questions?
21:32 failshell this is not a RHS cluster
21:32 JoeJulian Ah, ok.
21:32 failshell the new one will be
21:32 JoeJulian Got it.
21:32 failshell well, is
21:33 failshell pricing is steep though
21:34 JoeJulian I know I can't afford it. :D
21:35 purpleidea joined #gluster
21:35 purpleidea joined #gluster
21:38 jclift_ joined #gluster
21:38 failshell starting again
21:38 failshell damn
21:39 JoeJulian What version did you upgrade from?
21:39 failshell 3.2.7
21:40 JoeJulian I wonder if it's the self-heal daemon building the .glusterfs tree. Should be reconnecting to glusterd all the time though.
21:40 JoeJulian s/Should/Shouldn't/
21:40 glusterbot What JoeJulian meant to say was: I wonder if it's the self-heal daemon building the .glusterfs tree. Shouldn't be reconnecting to glusterd all the time though.
21:41 failshell is that going to stop eventually?
21:41 JoeJulian If you can find me that runaway log, I can probably help. Going to need to see the tail end of whatever that is though.
21:41 JoeJulian If it's the self-heal daemon, yes.
21:42 failshell well, my day is over and the volumes seems functional
21:42 failshell i disabled the notifications from the monitoring system
21:42 failshell going to look at that tomorrow
21:43 JoeJulian Sounds like a day. Enjoy your evening.
21:52 glusterbot New news from newglusterbugs: [Bug 951661] Avoid calling fallocate(), even when supported, since it turns off XFS's delayed-allocation optimization <http://goo.gl/MVdqz>
21:53 Eco_ joined #gluster
21:58 fidevo joined #gluster
22:45 partner i wonder when my holiday begins..
22:45 partner 2AM and still babysitting when the stuff breaks
22:54 JoeJulian :/
22:56 partner well, its going to break anyways so i feel free to do whatever at this point
22:58 partner reaching maximum open files so writes will fail anyways
22:58 JoeJulian There's a bug open on this already, isn't there?
22:59 partner yeah
22:59 partner but i don't understand why it sometimes keeps growing and sometimes doesn't, i did a smaller rebalance earlier to this which i stopped for the night, it was all fine
23:00 partner now, after 30 hours since stopping rebalance open files keep going up on the old bricks..
23:01 JoeJulian Schrodinger's gluster volume.
23:01 partner i now went and stop/start the volume, not touching clients at all
23:01 partner it seems nginx noticed a small break but mounts are up and stuff kept going after the operation
23:02 partner giving 400 instead of 200 for store file..
23:02 JoeJulian And this is during a rebalance, right?
23:02 partner after stopping it 30 hours ago.. disk utilization still 100%
23:02 stigchristian joined #gluster
23:02 Cenbe joined #gluster
23:02 partner ..which started on stopping the rebalance due to bug..
23:03 partner ie. i stopped it because of the open files increasing..
23:03 nightwalk joined #gluster
23:04 partner not sure what i am going to do after i figure this out, its going to take a whole holiday to rebalance this thingy.. :/
23:05 partner three 8TB bricks on dist setup, two first ones holding 7.5TB currently
23:05 JoeJulian Are the files growing, or just adding new files?
23:05 rcoup joined #gluster
23:05 partner its write once, read few.. so not growing, not renamed, just write and read few times
23:06 partner not sure about the file count, dir structure is roughly 128k
23:08 partner 3.3.1 on debian wheezy still here, nothing special on the setup, official packages from gluster.org and so forth, doing a standard operations (like fix-layout etc.)
23:08 JoeJulian Then in answer to your question yesterday, after a fix-layout if the calculated dht subvolume is full (at or exceeding the cluster.min-free-disk if set) it will place the new file elsewhere and create a sticky pointer.
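
The "sticky pointers" JoeJulian mentions are DHT link files on the brick: zero-length, mode ---------T, carrying a trusted.glusterfs.dht.linkto xattr that names the subvolume really holding the file. A hedged way to spot them, with a hypothetical brick path:

    find /export/brick1 -type f -perm -1000 -size 0 | head
    getfattr -n trusted.glusterfs.dht.linkto -e text /export/brick1/path/to/some-file
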
23:09 partner i presented the question earlier today
23:09 partner 14:42 < partner> i don't see min-free-disk in the volume info but is it there by default anyway or not? the docs on the website say the default is 10%, but i'm not sure how i can check whether writes are going only to the brick with more free disk space (>10%)
23:09 partner 14:43 < partner> on distributed volume that is
23:10 partner oh, is that the yesterday question :)
23:10 partner oh, it is even on my timezone :)
23:10 JoeJulian hehe
23:10 partner but if the default there really or what?
23:11 partner "if set" you say, manual says default is 10% so...??
23:11 JoeJulian http://gluster.org/community/documentation/index.php/Gluster_3.2:_Setting_Volume_Options says the default is 0%. Checking the source....
23:11 glusterbot <http://goo.gl/dPFAf> (at gluster.org)
23:11 partner i got my value from the translators page.. http://www.gluster.org/community/documentation/index.php/Translators/cluster
23:11 glusterbot <http://goo.gl/gKAkc> (at www.gluster.org)
23:13 partner so i am not sure at all.. i think i did get the brick full on testing env long ago when doing my initial testings with gluster
23:14 JoeJulian yes, 10%.
23:15 partner 3.3.1 does have a hidden default of min-free-disk 10% if not set to anything else?
23:15 JoeJulian correct
23:15 JoeJulian According to the source code... :D
23:16 partner i trust the binaries are not patched :D
23:16 aliguori joined #gluster
23:16 JoeJulian xlators/cluster/dht/src/switch.c line 868
23:17 partner pardon my stupidity but switch is all different translator? or scheduler rather?
23:17 partner nevermind.. i think i mixed those..
23:19 partner i have no idea what scheduler i am using, the "default" i guess but.. ALU?
23:19 partner "This option tells the 'cluster/distribute' volume to stop creating files in the volume where the file gets hashed to, if the available disk space is lesser than the given option. Default option '10%'
23:20 JoeJulian default looks like it should be switch. nufa is the other one and that hasn't been used in a while.
23:20 JoeJulian But either one has a 10% default.
23:21 partner my stop/start volume had no effect on 100% disk utilization..
23:22 partner i just might need to kick harder..
23:22 sprachgenerator joined #gluster
23:27 social_ I'm seeing a lot of "Unable to self-heal permissions/ownership of '/table' (possible split-brain). Please fix the file on all backend volumes" < any idea how to fix these?
23:28 y4m4 social_: "setfattr -x trusted.afr.<client> <brick>/table" on all the nodes
23:29 social_ y4m4: yes I noticed they quite much differ :/
23:29 partner now we wait...
23:30 y4m4 partner: better to use 'bytes' instead of a percentage, and not MB/GB notations either
23:31 y4m4 partner: there was a known bug in 'Percentage'
23:31 social_ y4m4: now question is what does attrs say to me
23:31 y4m4 social_: attrs say a lot of things in general, but at this point they are of no use for 'Gluster' or for you
23:32 y4m4 social_: so better remove them and 'stat or ls' the directory
23:32 y4m4 from the client side
23:33 social_ y4m4: remove on all peers reporting split brain?
23:33 y4m4 partner: https://bugzilla.redhat.com/show_bug.cgi?id=874554
23:33 glusterbot <http://goo.gl/xbQQC> (at bugzilla.redhat.com)
23:33 glusterbot Bug 874554: unspecified, medium, ---, rtalur, ON_QA , cluster.min-free-disk not having an effect on new files
23:34 y4m4 social_: actually on all the bricks where the absolute path is "/<brick>/table"
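
Spelled out, y4m4's suggestion looks roughly like this on each brick that holds /table; the volume name, client indices and paths are hypothetical, so check the actual xattr names with getfattr first:

    # inspect the AFR changelog xattrs on the directory
    getfattr -m . -d -e hex /export/brick1/table
    # remove the trusted.afr.<volume>-client-N attributes reported there
    setfattr -x trusted.afr.myvol-client-0 /export/brick1/table
    setfattr -x trusted.afr.myvol-client-1 /export/brick1/table
    # then stat it from a client mount to let self-heal settle things
    stat /mnt/myvol/table
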
23:34 partner y4m4: oh, for a moment i was wondering what you mean by bytes and not percentage..
23:35 social_ y4m4: I don't think I'll manage to do that before some client stats it in between
23:35 partner i haven't set anything anywhere but if the 10% does not kick in its in practice 0% then.....
23:37 partner my nodes (servers if you want) are at 92% both, added new one and did the rebalance to make them all in equal level.. i'm still 92% on two, 10% on one on used disk space..
23:37 y4m4 partner: it was supposed to work, apparently bug
23:37 y4m4 partner: where a min-free-disk had no affect
23:38 y4m4 social_: if you have password less ssh then we can do that in one script
23:38 partner y4m4: ok, good to know, i am not exactly trying to use that option here but just to rebalance the damn bricks
23:39 partner i should have just expanded the logical volumes and head to holiday, instead its day three now keeping the stuff alive
23:39 partner i just thought it would have made the things worse..
23:40 y4m4 partner: it would be better to go ahead and right away calculate 7% and set that value
23:40 y4m4 partner: then perform a rebalance operation
23:40 partner ie. concentrate the new writes to new brick?
23:41 y4m4 partner: yes
23:41 partner i am getting slow in here, 3AM already and its been almost 30degC all day, had few beers over the day so pardon me
23:41 y4m4 partner: it would be better to calculate in bytes
23:42 partner ok, i can try to do that :)
23:42 y4m4 partner: it would be for example 'gluster volume set <volname> cluster.min-free-disk <so_many_bytes>'
23:43 partner thank you, i was about to google for that..
23:45 partner oh, lots of numbers.. :)
23:46 partner ok, that one went wrong, 8 terabytes turned into 58..
23:46 y4m4 social_: check the script you might get an idea
23:46 y4m4 social_: http://pastebin.com/KvF1tM73
23:46 glusterbot Please use http://fpaste.org or http://dpaste.org . pb has too many ads. Say @paste in channel for info about paste utils.
23:47 social_ y4m4: thanks, I'm just testing on another volume that I can break =]
23:47 sprachgenerator joined #gluster
23:49 y4m4 joined #gluster
23:50 partner oh, stupid calculator removed comma..
23:52 partner gluster volume set dfs cluster.min-free-disk 615726511554
23:52 partner Set volume successful
23:52 partner i'd guess i got it done, i am slowly fading away from this world.. a good time to tune production storage..
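
For the record, the value partner set works out to 7% of an 8 TiB brick:

    # 7% of 8 TiB, in bytes
    echo $(( 8 * 1024**4 * 7 / 100 ))    # prints 615726511554
    gluster volume set dfs cluster.min-free-disk 615726511554
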
23:54 partner according to graphs none of my earlier actions had any effect on open files or disk utilization :(
23:54 partner there are other volumes running so i can just reboot the boxes..
23:54 partner they have no issues
23:56 partner i guess i just give up and let it break, its easier to work under emergency hood when its all broken already :/
23:58 partner should have roughly 10-12 hours before running out of filehandles
23:58 partner @paste
23:58 glusterbot partner: For RPM based distros you can yum install fpaste, for debian and ubuntu it's dpaste. Then you can easily pipe command output to [fd] paste and it'll give you an url.
23:58 partner pah
23:59 partner @imagepaste
23:59 partner imgur it is..
