IRC log for #gluster, 2012-11-21

All times shown according to UTC.

Time Nick Message
00:04 kminooie so I am still having the issue with the nfs-mounted volume going stale, and it seems that I am hitting this issue: http://lists.nongnu.org/archive/html/gluster-devel/2011-07/msg00062.html as it only happens when I am getting the contents of a directory, not the files.
00:04 glusterbot <http://goo.gl/a0TDG> (at lists.nongnu.org)
00:05 circut joined #gluster
00:05 kminooie but i still haven't been able to find a solution. anybody?
00:07 JoeJulian "The bad ones have gfid out of sync with their linkto counterpart on the other replica." - Do you see that as well?
00:08 kminooie I don't know. right now I am trying to increase the log level and find a way to log nfs activity on the client as well. any suggestion?
00:08 kminooie how can i increase the log level of gluster?
00:09 semiosis see ,,(options)
00:09 glusterbot http://goo.gl/dPFAf
00:09 semiosis diagnostics.*-log-level
00:10 JoeJulian But the nfs client is the kernel, so  echo 1 > /proc/sys/sunrpc/nfs_debug
00:11 kminooie yup. question: does diagnostics.client refer to the glusterfs client, or does nfs count as a client as well?
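
For reference, the log levels mentioned above are set per volume; VOLNAME is a placeholder and DEBUG is one of the accepted levels (a sketch against 3.3):

    gluster volume set VOLNAME diagnostics.client-log-level DEBUG   # native (FUSE) client logs
    gluster volume set VOLNAME diagnostics.brick-log-level DEBUG    # brick (glusterfsd) logs
    echo 1 > /proc/sys/sunrpc/nfs_debug                             # kernel NFS client debugging, as JoeJulian notes
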
00:35 robo joined #gluster
00:39 kevein joined #gluster
00:44 inodb_ joined #gluster
00:46 kminooie could you have a look at this and see if something looks out of the ordinary? I did a write operation followed by a directory lookup that resulted in a stale handle during this time period. this is the brick log and the diagnostics.brick-log-level is DEBUG http://fpaste.org/MDA2/
00:46 glusterbot Title: Viewing Paste #253856 (at fpaste.org)
01:12 yeming joined #gluster
01:28 dalekurt joined #gluster
01:35 red_solar joined #gluster
01:44 kminooie we will continue this discussion tomorrow :)  have a good evening every one.
01:44 kminooie left #gluster
01:53 hackez joined #gluster
02:11 hagarth joined #gluster
02:17 bharata joined #gluster
02:20 mohankumar joined #gluster
02:24 nightwalk joined #gluster
02:28 sunus joined #gluster
02:33 ika2810 joined #gluster
02:35 chirino_m joined #gluster
03:06 lng joined #gluster
03:07 lng Hi! That's damn weird - http://paste.ubuntu.com/1373980/
03:07 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
03:19 sgowda joined #gluster
03:26 GLHMarmot I am new to Gluster and have been doing some testing with a two node replicated cluster using 3.3.
03:26 GLHMarmot As part of that testing I am trying the automatic self-heal
03:27 GLHMarmot I can definitely tell that healing is happening but I am trying to understand the results of volume heal <volume> info
03:27 GLHMarmot and volume heal <volume> info healed
03:27 GLHMarmot my suspicion is that the first is the files that need to be healed
03:28 GLHMarmot and the second is the list of those that have been healed.
03:28 GLHMarmot Long winded, but can anyone confirm?
03:39 JoeJulian yep
03:40 GLHMarmot Ok, then a second related question.
03:41 GLHMarmot I get multiple entries for the same file in the "healed" list with different timestamps
03:41 GLHMarmot Is that because the source file is being changed during the heal?
03:41 GLHMarmot and has to be "healed" more than once?
03:45 GLHMarmot An example of the output is at: http://dpaste.com/834087/
03:45 glusterbot Title: dpaste: #834087: healing query example (at dpaste.com)
04:30 sripathi joined #gluster
04:38 JoeJulian GLHMarmot: Hrm... that's every 10 minutes which corresponds to the frequency that the self-heal daemon triggers. Check for split-brain and heal-failed. If it's in heal-failed, I'd check the self-heal log to see why it's failing.
04:38 JoeJulian GLHMarmot: Which version is this?
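
For reference, the heal queries being discussed, with VOLNAME as a placeholder:

    gluster volume heal VOLNAME info              # entries that still need healing
    gluster volume heal VOLNAME info healed       # entries the self-heal daemon has healed
    gluster volume heal VOLNAME info heal-failed  # entries it could not heal
    gluster volume heal VOLNAME info split-brain  # entries detected as split-brain
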
04:38 deepakcs joined #gluster
04:54 lng In Gluster, if a file is in split-brain, you cannot overwrite it from the mount point. This is very bad.
04:55 Technicool lng, you can just delete the errant file from the backend, correct?
04:56 Technicool then self-heal would copy over the good file
04:56 lng Technicool: it is very inconvenient when replication is used
04:56 lng and there're a lot of such files
04:58 Technicool understood, although I have not seen any clustered system that doesn't have some possibility of split brain
04:59 Technicool since the file paths are logged, some quick work with grep and awk can be helpful
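
A rough sketch of the grep/awk approach Technicool mentions; the client log name and the exact message wording are assumptions to adapt, not exact strings:

    # messages about split-brained entries usually quote the file path
    grep -i 'split-brain' /var/log/glusterfs/mnt-VOLNAME.log \
        | grep -o "'[^']*'" | tr -d "'" | sort -u > /tmp/split-brain-paths.txt
    wc -l /tmp/split-brain-paths.txt
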
05:03 nightwalk joined #gluster
05:11 vpshastry joined #gluster
05:44 rudimeyer_ joined #gluster
05:48 hagarth joined #gluster
05:50 ramkrsna joined #gluster
05:50 ramkrsna joined #gluster
05:53 GLHMarmot Back
05:54 GLHMarmot Nothing was reported for split-brain or heal-failed
05:55 GLHMarmot Version is 3.3, just downloaded from web site two days ago. (Debs)
05:55 GLHMarmot I am running this on Proxmox 2.x which is based on Debian Squeeze
05:59 bala2 joined #gluster
06:19 rudimeyer_ joined #gluster
06:24 overclk joined #gluster
06:40 raghu joined #gluster
06:54 glusterbot New news from newglusterbugs: [Bug 874498] execstack shows that the stack is executable for some of the libraries <http://goo.gl/NfsDK>
07:13 quillo joined #gluster
07:24 GLHMarmot joined #gluster
07:25 inodb^ joined #gluster
07:26 puebele1 joined #gluster
07:29 quillo joined #gluster
07:30 ally1 joined #gluster
07:32 ngoswami joined #gluster
07:36 lkoranda joined #gluster
07:43 inodb_ joined #gluster
07:45 puebele joined #gluster
07:45 ekuric joined #gluster
07:46 dobber joined #gluster
07:47 puebele left #gluster
07:53 GLHMarmot joined #gluster
07:58 vpshastry joined #gluster
08:01 H__ joined #gluster
08:04 ctria joined #gluster
08:06 quillo joined #gluster
08:07 nightwalk joined #gluster
08:10 duerF joined #gluster
08:18 vpshastry joined #gluster
08:41 andreask joined #gluster
08:43 quillo joined #gluster
08:44 duerF joined #gluster
08:53 Humble joined #gluster
08:54 lkoranda joined #gluster
08:58 gbrand_ joined #gluster
09:02 vpshastry joined #gluster
09:06 redsolar_office joined #gluster
09:06 lng how can I get number of reads / writes a minute?
09:17 ika2810 left #gluster
09:20 zhashuyu joined #gluster
09:22 Norky left #gluster
09:22 Norky joined #gluster
09:22 Norky hello
09:22 glusterbot Norky: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
09:23 Norky insolent bot
09:23 Norky my question is not ready yet
09:23 * Norky spanks glusterbot
09:24 andreask lng: you can enable profiling for your volumes
09:24 lng andreask: yes
09:25 lng but where can I find this info?
09:26 ramkrsna joined #gluster
09:26 andreask lng: you mean how to enable it?
09:26 lng andreask: no, no
09:27 lng I mean I can't see the number of requests
09:28 hchiramm_ joined #gluster
09:29 andreask lng: that "gluster volume profile _volum_ info" is not enough?
09:30 lng andreask: I can see Latency, Duration, Data Read, Data Written... but no number of requests
09:31 andreask lng: should be able to calculate them
09:31 lng andreask: now I get this info from access logs
09:31 webwurst joined #gluster
09:32 lng %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
09:32 lng 84.35    1022.77 us       8.00 us  613753.00 us         648699      LOOKUP
09:32 lng what is us?
09:32 andreask lng: you can see reads/writes for each blocksize and you know how long profiling is running
09:33 andreask us ... micro-seconds
09:33 lng ah, ok
09:34 lng 613753 is 0.6sec
09:34 lng that's a lot
09:34 lng I have ~400 write requests a minute
09:35 lng and ~160 read ones
09:35 lng on 4 nodes with 2 bricks per each
09:36 lng 4 clients
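
For reference, the profiling workflow discussed above, with VOLNAME as a placeholder; dividing the per-FOP "No. of calls" column by the length of the measurement interval gives a rough requests-per-minute figure:

    gluster volume profile VOLNAME start    # enable the io-stats counters
    sleep 60                                # collect over a known interval
    gluster volume profile VOLNAME info     # per-brick FOP counts, latencies, bytes read/written
    gluster volume profile VOLNAME stop
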
09:36 DaveS joined #gluster
09:37 lng load average is high sometimes
09:38 lng I can see swap is used
09:39 lng http://pastie.org/private/kva2kelnbcwqbxjjltyiva
09:39 glusterbot <http://goo.gl/S4glK> (at pastie.org)
09:39 lng maybe not enough RAM
09:40 Azrael808 joined #gluster
09:41 andreask less than 2GB?
09:41 glusterbot New news from resolvedglusterbugs: [Bug 765585] cyrus-imapd unable to reliably start/function on top of GlusterFS volume <http://goo.gl/ww2uc> || [Bug 801996] XDR decoding failure <http://goo.gl/fJtlW>
09:41 lng yes
09:41 lng but
09:41 andreask my laptop has more ;-)
09:41 lng -/+ buffers/cache
09:41 lng my laptop - 8G
09:42 lng this is EC2 medium cpu instance
09:42 lng how to figure out what is causing high load average?
09:43 lng but, actually, there's 140048 free
09:43 lng so RAM is not issue I guess
09:44 lng but I might be wrong
09:44 lng it's ~140M
09:44 lng maybe not enough
09:50 lng /var/log/glusterfs/glustershd.log.1:[2012-11-19 01:47:58.456943] E [afr-self-heal-common.c:1409:afr_sh_common_lookup_cbk] 0-storage-replicate-2: Conflicting entries for <gfid:924fbef0-7cae-416e-a8bf-4ab57c512ae1>/game.dat2012111610
09:50 lng is there anything I can do with it?
09:50 sshaaf joined #gluster
09:54 lng is it possible to find out filename by gfid?
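
One common way to map a gfid back to a filename, assuming direct access to a brick: for regular files the entry under .glusterfs/ is a hard link to the real file, so matching on inode finds it (the brick path below is a placeholder):

    BRICK=/export/brick1
    GFID=924fbef0-7cae-416e-a8bf-4ab57c512ae1
    # the gfid file lives at .glusterfs/<first two hex chars>/<next two>/<full gfid>
    find "$BRICK" -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" -not -path '*/.glusterfs/*'

For directories the .glusterfs entry is a symlink rather than a hard link, so this trick only applies to regular files.
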
09:54 andreask you can always do iostat, vmstat, dstat your machines to find out what is causing the load
09:55 lng andreask: there's nothing wrong in iostat output
09:56 lng dstat!
09:56 lng yes
09:56 lng it's nice prog
09:57 lng http://pastie.org/private/svifaw2yxakqm07pvz1a3q
09:57 glusterbot <http://goo.gl/NYUr8> (at pastie.org)
09:57 lng load average is 6 now
09:57 lng 2 cores
09:58 lng a lot of reads I think
09:58 lng according to access.log, I have 3 times fewer read requests
09:59 Norky how do I make a (native FUSE) client use RDMA/InfiniBand?
09:59 Norky I've seen reference to mounting glusterserver:/volname.rdma, and also passing the option transport=rdma
10:00 lng andreask: http://pastie.org/private/3pa84uqpvsdfm5efy4msq
10:00 lng andreask: can you see something?
10:00 glusterbot Title: Private Paste - Pastie (at pastie.org)
10:00 Norky however, no matter what I did, when accessing a volume which was set up with tcp,rdma, it always seemed to be using tcp
10:01 Norky this is using Red Hat Storage server, where I understand that rdma is an unsupported "technical preview"
10:03 andreask lng: yeah, lot of reads
10:11 lng andreask: strange
10:11 glusterbot New news from resolvedglusterbugs: [Bug 800300] locktests fail in "READ LOCK THE WHOLE FILE BYTE BY BYTE" test case. <http://goo.gl/0hCWv> || [Bug 765471] [glusterfs-3.2.5qa1]: glusterd hung <http://goo.gl/RDGrp>
10:11 lng what might cause them?
10:11 lng as I don't have that many reads from clients
10:12 lng maybe Gluster itself causing this?
10:12 ndevos lng: "Conflicting entries for ..." is mostly caused by a ,,(split-brain)
10:12 glusterbot lng: (#1) learn how to cause split-brain here: http://goo.gl/nywzC, or (#2) To heal split-brain in 3.3, see http://goo.gl/FPFUX .
10:13 lng thanks!
10:14 lng ndevos: split-brain is causing a lot of reads?
10:14 andreask self-healing
10:14 ndevos lng: self-healing is
10:15 lng ndevos: does self-healing produce more reads when there're split-brain files?
10:16 lng http://paste.ubuntu.com/1373980/
10:16 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
10:16 lng I have a  lot of these ^
10:16 Humble joined #gluster
10:17 ndevos lng: the ??? for the attributes of the last directoty look strange
10:17 ndevos *directory even
10:17 manik joined #gluster
10:18 lng ndevos: yes - this is split-brain
10:18 lng but why do I have duplicates?
10:20 ndevos lng: hmm, do the inodes have the same value as well? check with 'ls -li'
10:21 quillo joined #gluster
10:21 lng ndevos: moment
10:22 Norky am I better off asking about Red Hat Storage server in a Red Hat-specific channel, rather than here?
10:23 lng ndevos: no, inodes are different
10:23 lng ndevos: I have a lot of split-brain files
10:25 ndevos lng: okay, so, you will need to resolve the split-brains first, if the inodes are different, it suggests that the gfid of the files are different too
10:26 lng ndevos: since today I started to do the following:
10:26 ndevos Norky: you should probably open a support case for RHS related questions+issues: https://access.redhat.com/support/cases/new
10:26 glusterbot Title: redhat.com (at access.redhat.com)
10:26 lng 1. I extract all the files from the logs which have Input/output error
10:27 lng 2. I delete 1 copy of these files
10:28 lng 3. I use getfattr to delete corresponding file in .glusterfs/
10:28 Norky well, my questions are about the gluster component of RHS, I just wasn't sure how (if at all) different gluster is in RHS from the version available from http://www.gluster.org/download/
10:28 glusterbot Title: Download | Gluster Community Website (at www.gluster.org)
10:29 Norky lng, you're conflating files and extended attributes
10:30 lng Norky: ?
10:30 lng I delete 1 copy because self-heal restores the files
10:31 Norky getfattr does not delete files, it queries attributes of files
10:31 lng Norky: I use it to delete files
10:31 lng Norky: it gives me gfid
10:31 lng then I use tm
10:31 lng rm*
10:31 Norky ahh, I see
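
A sketch of the per-file cleanup lng describes, run on the brick that holds the copy you have decided to discard; the paths are placeholders, and picking the wrong copy loses data:

    BRICK=/export/brick1
    F=path/to/bad/file                      # path of the bad copy, relative to the brick root
    GFID=$(getfattr -n trusted.gfid -e hex "$BRICK/$F" | awk -F0x '/trusted.gfid/{print $2}')
    # re-insert the dashes so the 32 hex chars match the name used under .glusterfs/
    G="${GFID:0:8}-${GFID:8:4}-${GFID:12:4}-${GFID:16:4}-${GFID:20:12}"
    rm -f "$BRICK/$F" "$BRICK/.glusterfs/${G:0:2}/${G:2:2}/$G"
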
10:32 quillo joined #gluster
10:32 ndevos Norky: there is a difference, the RHS version only contains selected patches/backports from the upstream community version, not all changes
10:33 lng the most annoying thing is that when a file is in split-brain, you cannot just overwrite it from the mountpoint
10:33 Norky ndevos, righto, thanks :)
10:34 ndevos lng: you also dont want gluster to make the decision on which file has the correct contents and have it delete the wrong one
10:35 lng ndevos: yes, but when a new file is coming, it should overwrite it and resolve the problem
10:36 ndevos Norky: and in case of a tech preview, Red Hat will want to know who is using it and what level of success is reached, or what issues are found
10:37 ndevos lng: how would you write that new file? open() the existing, or unlink() it first?
10:38 lng ndevos: I tried unlinking
10:38 lng but the file could not be deleted in this case
10:38 ndevos lng: yeah, that is the current behaviour
10:39 lng unlink first, then write
10:39 lng ndevos: is there any other way to overwrite it from mountpoint?
10:40 ndevos lng: I guess one could file a bug that deleting a split-brained file through the glusterfs mountpoint should delete all of them from the bricks
10:40 glusterbot http://goo.gl/UUuCq
10:40 ndevos lng: the only way to do that is on the bricks, for all I know
10:41 lng ndevos: which is very inconvenient
10:41 ndevos yes, I agree
10:41 lng I will file a bug
10:41 glusterbot http://goo.gl/UUuCq
10:42 ndevos but it is also very secure, you will not lose data that you can not read/check through the mountpoint
10:42 lng because in case of overwriting, it would work as healing
10:43 ndevos lng: overwriting is not the same as delete+create, but I guess open(O_WRONLY|O_TRUNC) might work too
10:44 ndevos maybe even without O_WRONLY
10:44 lng ndevos: I don't know how to do it over PHP
10:45 lng 'w' Open for writing only; place the file pointer at the beginning of the file and truncate the file to zero length. If the file does not exist, attempt to create it.
10:46 lng fopen()
10:46 ndevos lng: sounds like what I mean :)
10:46 lng yea
10:46 lng ndevos: do you think it might work?
10:46 lng I can try
10:46 lng :-)
10:47 ndevos lng: I doubt it currently works, but requesting that feature makes sense IMHO
10:48 lng ah
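
If someone wants to test the truncate-on-open idea from a client mount (ndevos doubts it currently works), the shell equivalent of open() with O_TRUNC is simply a redirection; the path is a placeholder:

    echo "new contents" > /mnt/VOLNAME/path/to/split-brained-file   # open + truncate + write
    : > /mnt/VOLNAME/path/to/split-brained-file                     # truncate only
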
10:57 Eimann left #gluster
11:02 Humble joined #gluster
11:04 manik1 joined #gluster
11:19 rgustafs joined #gluster
11:33 Humble joined #gluster
11:36 inodb^ joined #gluster
11:37 hchiramm_ joined #gluster
11:42 sshaaf joined #gluster
11:47 lkoranda joined #gluster
11:49 rudimeyer_ joined #gluster
11:49 Humble joined #gluster
11:52 sgowda joined #gluster
12:17 chirino joined #gluster
12:20 hagarth joined #gluster
12:21 toruonu joined #gluster
12:22 toruonu quick question, what do I need for the client side, which only wants to mount?
12:22 toruonu does it need to be in the trusted pool?
12:22 johnmark toruonu: I don't think so
12:23 toruonu ok, I installed glusterfs and glusterfs-fuse on a client and tried mount, but the command returned with no message and no mount
12:23 johnmark huh. that's weird
12:24 toruonu [root@wn-d-117 ~]# mount -t glusterfs 192.168.1.241:/home0 /home
12:24 toruonu [root@wn-d-117 ~]# mount|grep home
12:24 toruonu nothing
12:24 kkeithley toruonu: is iptables (firewall) running on the gluster servers?
12:24 toruonu not that I know of
12:25 toruonu ugh, the client one does have but everything is ACCEPT from what I see
12:25 kkeithley shouldn't affect the client
12:25 samppah toruonu: is fuse loaded on client side?
12:25 kkeithley can you telnet to port 38465 on one of the servers?
12:25 toruonu has to be, hdfs is mounted through fuse
12:25 kkeithley yes, you need fuse on the client
12:26 kkeithley what linux dist?
12:26 toruonu CentOS 6.3 on client, Scientific Linux 5.7 on server
12:27 toruonu client has OpenVZ kernel though
12:27 kkeithley what version of glusterfs?
12:27 kkeithley client's kernel shouldn't matter
12:27 toruonu the one that came from repo 1 minute ago :)
12:27 toruonu the server has from yesterday
12:27 toruonu so I'd assume 3.3.1
12:27 toruonu ah wait
12:27 kkeithley my fedorapeople.org repo? or the EPEL repo?
12:27 toruonu client has 3.2.7
12:28 toruonu server has 3.3.1
12:28 toruonu odd
12:28 toruonu is glusterfs also in centos / epel repos
12:28 kkeithley yes
12:28 toruonu might be a yum priority thing
12:29 toruonu yes, disablerepo=epel seems to have picked the right one now
12:29 kkeithley should be the case that if you have both EPEL and my fedorapeople.org repo, the higher version in my fedorapeople.org repo will take precedence
12:29 toruonu well I have the one from http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/glusterfs-epel.repo
12:29 glusterbot <http://goo.gl/5beCt> (at download.gluster.org)
12:29 kkeithley IIRC it works that way on my RHEL and CentOS boxes
12:30 kkeithley same thing, same bits anyway
12:30 toruonu ah, better … now worked
12:30 toruonu but i just discovered that rsync has been an idiot (or I was when I executed it)
12:30 toruonu is there a way for me to mount a folder from a volume :)
12:31 toruonu i.e. srv:/home0/home
12:31 toruonu :)
12:31 inodb_ joined #gluster
12:31 kkeithley try it, I'm pretty sure that works
12:31 toruonu would rather not do a mv command of the whole contents down 1 level
12:31 toruonu Mount failed. Please check the log file for more details.
12:31 toruonu :)
12:31 toruonu *sigh* /home/home it is then :( how does glusterfs handle mv command?
12:32 toruonu i.e. will it actually move the files between servers as well or is it like doing a move on local filesystem where it takes a mere second
12:32 toruonu because it's just remapping question
12:33 kkeithley It shouldn't mv the files to other servers.
12:33 toruonu ok I guess I can try with a smaller folder
12:33 kkeithley I presume you're using distribute? The DHT hash is on the file name component only, so the DHT hash shouldn't result in mv-ing files whose names don't change
12:34 inodb_ joined #gluster
12:34 vpshastry joined #gluster
12:34 kkeithley shouldn't move files to other servers
12:34 toruonu yeah, just using a volume of 12 bricks with replication factor 3
12:35 toruonu no stripe
12:35 * kkeithley wonders whether you really need stripe
12:35 toruonu well I had the debate yesterday and decided to not have stripe :)
12:36 kkeithley oops, I read that wrong. never mind (about stripe)
12:36 toruonu am awfully lucky I did the glusterfs volume yesterday and rsynced user home directories in preparation for a future move to shared homes as this morning the user server root filesystem got irrevocably corrupted
12:36 toruonu now I have a backup of the home directories and can set it up on another server :)
12:36 kkeithley anyway, without distribute (dht) there wouldn't be any reason to move files between servers.
12:37 edward1 joined #gluster
12:37 toruonu ok, I can confirm that it takes under a second even for N GB files
12:46 Humble joined #gluster
12:48 chirino joined #gluster
12:49 inodb_ joined #gluster
12:51 chirino Anyone know How can I cause split-brain in glusterfs when cluster.quorum-type is set to auto? http://goo.gl/5RvKA
12:51 glusterbot Title: Question: How can I cause split-brain in glusterfs when cluster.quorum-type is set to auto (at goo.gl)
12:57 glusterbot New news from newglusterbugs: [Bug 877563] Metadata timestamps ignored potentially causing loss of new metadata changes <http://goo.gl/UH1ZB> || [Bug 878004] glusterd segfaults in remove brick <http://goo.gl/KCswd>
13:50 _ilbot joined #gluster
13:50 Topic for #gluster is now  Gluster Community - http://gluster.org | Q&A - http://community.gluster.org/ | Patches - http://review.gluster.org/ | Developers go to #gluster-dev | Channel Logs - http://irclog.perlgeek.de/gluster/
13:58 robo joined #gluster
13:58 bulde1 joined #gluster
14:00 morse joined #gluster
14:01 bennyturns joined #gluster
14:02 puebele2 joined #gluster
14:02 puebele2 left #gluster
14:07 manik joined #gluster
14:08 gbrand__ joined #gluster
14:22 ctria joined #gluster
14:43 Humble joined #gluster
14:43 ika2810 joined #gluster
14:51 inodb_ joined #gluster
14:54 puebele1 joined #gluster
15:05 stopbit joined #gluster
15:06 wushudoin joined #gluster
15:08 gmi1456 joined #gluster
15:09 Humble joined #gluster
15:14 puebele1 joined #gluster
15:15 tru_tru joined #gluster
15:19 ctria joined #gluster
15:22 shireesh joined #gluster
15:28 gmi1456 is anybody here using GlusterFs to hold Openstack Essex instances? I have a few related questions
15:28 overclk joined #gluster
15:29 Humble joined #gluster
15:34 arusso joined #gluster
15:34 arusso joined #gluster
15:40 morse joined #gluster
15:49 Teknix joined #gluster
15:50 joeto joined #gluster
15:51 gmi1456 Ok, because I assume nobody here is using Gluster and Openstack, I will try to change my question to a more general format. I have four servers (A, B, C and D) that have each RAID10 containers with around 2.5 TB available for the Gluster volume, and I created a distributed replicated volume with replica=2. I have mounted the volume on each of the servers and I want to know if each file created on the volume by server A will be r
15:51 gmi1456 eplicated twice. My understanding is that the files in a GlusterFs volume do not get split; they are replicated as they were created and the first accessed copy is always the local one. The second copy will be on one of the other three servers (B, C or D) based on initial free space. If the files grow at different rates, then the volume will potentially be imbalanced because the distribution of the replicas happens only when the files are first cr
15:51 gmi1456 Am I correct in my statements?
15:52 Mo__ joined #gluster
15:53 kkeithley distributed + replicated w/ four bricks means that files will written to (only) two of the four bricks.
16:00 kkeithley Depending on how the file name hashes, the two copies will always be written to the same brick (server) in each replica set, e.g. A & C, or B & D.
16:02 kkeithley The two copies of the file would never be written to A & B, or C & D because that's not how the replica sets work.
16:05 gmi1456 kkeithley: thanks for your answer; so if the first file is created from ServerA, the first replica will be on ServerA and second one on any of the other three servers, based on free space? the second file created from ServerA will always land on A with the second replica on any of the other three servers, not necessarily same servers as the first file, correct?
16:05 kkeithley IOW, first replication takes effect and the two copies are written to both A:B and C:D. Then with distribution the file name is hashed to determine which brick. E.g. if the hash returns either 0 or 1, and it returns 1, then the file is written on B & D.
16:06 kkeithley no. It's usually not based on free space.
16:08 kkeithley things change a bit when the bricks get full, but let's ignore that for a minute.
16:10 gmi1456 kkeithley: I'm not sure I understand. When you said that  "the two copies are written to both A:B and C:D", did you mean A or B and C or D? you didn't mean to say that the file will be created on all four servers?
16:10 kkeithley The way you have your volume defined, you get two replica sets: A+B and C+D. Pretend for a minute that those are some kind of logical volume.
16:11 kkeithley First replication applies, and the files are written to both A+B and C+D.
16:12 kkeithley Then distribution applies. The hash returns 0 or 1. If the hash returns 0, the file will be written to A (of A+B) and C (of C+D).
16:12 kkeithley If the hash returns 1, the file will be written to B (of A+B) and D (of C+D)
16:14 gmi1456 these decisions happen before the file is actually written anywhere, so the write op is happening only twice? (e.g. only on A and D)
16:14 kkeithley So yes: (A or B) and (C or D). And not to all four servers/bricks.
16:14 kkeithley correct, the write op only happens twice.
16:15 kkeithley Well, it'd be A and C, but yes.
16:16 gmi1456 and if I add another four servers/bricks later, I assume the same logic applies but the local brick has priority?
16:21 kkeithley there's no priority for any particular brick in the current release
16:21 kkeithley the hash of the file name determines which brick the file is distributed to.
16:22 bulde joined #gluster
16:22 gmi1456 kkeithley:hmm, so if server A accesses a file that it created, it might actually access it over the network because there is no replica stored locally?
16:22 kkeithley correct.
16:23 semiosis :O
16:23 masterzen joined #gluster
16:23 semiosis locally, eh?  ,,(split brain)
16:23 glusterbot I do not know about 'split brain', but I do know about these similar topics: 'split-brain'
16:23 semiosis ,,(split-brain)
16:23 glusterbot (#1) learn how to cause split-brain here: http://goo.gl/nywzC, or (#2) To heal split-brain in 3.3, see http://goo.gl/FPFUX .
16:24 semiosis having servers also be clients imho increases chances for split brain
16:24 kkeithley When you mount a gluster volume, you're really connected to all the bricks, not just the one that's local. You might specify the local hostname-or-ip-address, but that doesn't mean anything, per se.
16:25 semiosis ,,(mount server)
16:25 glusterbot (#1) The server specified is only used to retrieve the client volume definition. Once connected, the client connects to all the servers in the volume. See also @rrnds, or (#2) Learn more about the role played by the server specified on the mount command here: http://goo.gl/0EB1u
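
For reference, a client can ask which bricks actually hold a given file through the pathinfo virtual xattr; the mount point and file below are placeholders:

    getfattr -n trusted.glusterfs.pathinfo /mnt/VOLNAME/some/file
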
16:26 semiosis re: split brain on servers mounting locally, two ways to avoid split brain are 1) have the servers mount the volume read-only, or 2) guarantee that the application using the mounts on the server will only ever write to a file from one server (such as by never writing to the same file twice)
16:27 semiosis quorum may be a 3rd option, but i've not really thought about htat
16:27 semiosis that
16:28 kkeithley so, the gluster nfs server is, in almost every sense, a local native client for the local brick and a remote client of all the other bricks in the volume, right? But I don't have the volume of experience that semiosis has with running gluster in production, so I defer to his wisdom in this regard.
16:29 masterzen joined #gluster
16:29 inodb_ joined #gluster
16:30 masterzen joined #gluster
16:31 gmi1456 semiosis: the servers/bricks are Openstack compute nodes and the files are VM disks accessed by a single server at a time; the idea of using these servers as both clients and servers is to save money, space, etc. as this has to happen at scale (tens/hundreds of servers)
16:32 shireesh joined #gluster
16:34 semiosis kkeithley: well i dont use gluster-nfs in production (or any nfs) -- fuse clients all the way for me
16:35 semiosis gmi1456: possible for a vm to migrate from one host/gluster-server to another?
16:35 gmi1456 semiosis: yes, that's one of the goals of using DFS
16:36 semiosis gmi1456: it may be possible to cause split brain "in time" by alternating server failures
16:37 semiosis having VM disks only accessed by one server "at a time" may not be good enough to prevent split brain, depending on how failover works... idk enough to predict how that will play out though
16:40 semiosis also, imho, designing for scale requires separating compute & storage into their own clusters
16:41 semiosis to manage cost, and complexity, and reliability, etc...
16:41 DaveS_ joined #gluster
16:42 madphoenix Is there a log file somewhere for swift?  I'm able to authenticate and get a token, but any subsequent actions result in an Internal Server Error.  I'm not sure how to begin troubleshooting without a log file
16:43 kkeithley UFO/Swift logs are all in /var/log/messages
16:43 gmi1456 semiosis: well, in a perfect world...but in my case you have to use 40 servers as compute nodes with plenty of CPU and RAM and maybe another 40 servers with only plenty of disks for GlusterFS bricks and this is pretty wasteful; the compute nodes have at least 6-8 drives locally that could be used for storing the VMs
16:44 gbrand_ joined #gluster
16:45 madphoenix kkeithley:  thanks
16:46 madphoenix In the logs I'm seeing an account-server error: 'Getting volume info failed %s, make sure to have passwordless ssh on %s'.
16:47 gmi1456 semiosis:I read your post about split-brain in glusterfs and I think my case would be "alternating server failures" "with only one writer process on a single client" and my work-around would be to disable GlusterFS from starting automatically after boot, what do you think?
16:47 kkeithley gmi1456: IIRC, there are some undocumented tricks (hacks) to force files to be stored locally. They're basically a variation of "full disk" handling. I almost hesitate to even mention it. It also kinda makes me wonder what the value of a distributed file system is if you're going to force writes to the local server.
16:48 kkeithley madphoenix: yes, if ufo/swift isn't running on one of your gluster servers then you'll need to set up passwordless ssh for root between the ufo server and the primary gluster server. All these machines are behind a firewall, right?
16:49 kkeithley And yes, we're going to fix that.
16:49 madphoenix yes, these machines are behind a firewall
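
A sketch of the passwordless-ssh setup that error message asks for, run on the UFO/Swift host; the hostname is a placeholder and this assumes root-to-root ssh is acceptable behind the firewall:

    ssh-keygen -t rsa                               # accept the defaults, empty passphrase
    ssh-copy-id root@gluster-primary
    ssh root@gluster-primary 'gluster volume info'  # confirm it works without a password prompt
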
16:50 gmi1456 kkeithley: the main value would be performance related; when writing locally, the disk speed is around 350 MB/s and when writing on a remotely stored replica you reach the network speed limitation at 110 MB/s
16:50 bala joined #gluster
16:52 kkeithley gmi1456: yes, that's certainly true. I just chalk that up as one of the trade-offs for the utility of the distributed file system.
16:54 semiosis gmi1456: i'm generally a fan of automatic failover with manual failback/recovery
16:54 semiosis at least then if you get split brain and glusterfs freezes access to the image you're there already to deal with it
16:55 raghu joined #gluster
16:55 semiosis but you should of course test all the failure scenarios you can think of before going live
16:55 semiosis and come here to let us know what works & what doesn't :D
16:57 gmi1456 semiosis and kkeithley: thank you very much for your insight; any advice on the best documentation on the internals of GlusterFS?
17:08 xymox joined #gluster
17:16 inodb^ joined #gluster
17:52 hagarth joined #gluster
17:53 kkeithley gmi1456: uh, the source?
17:54 inodb_ joined #gluster
17:55 overclk joined #gluster
17:55 gmi1456 kkeithley: something less raw, like an advanced technical doc :)
17:57 kkeithley dunno. johnmark ^^^
17:59 semiosis jdarcy's blog posts & articles on hekafs.org
17:59 semiosis such as the one on ,,(extended attributes)
17:59 glusterbot (#1) To read the extended attributes on the server: getfattr -m .  -d -e hex {filename}, or (#2) For more information on how GlusterFS uses extended attributes, see this article: http://goo.gl/Bf9Er
18:00 gmi1456 semiosis: thanks, I'll read those
18:00 JoeJulian His translator 101 series as well
18:01 kkeithley doh, yes, of course
18:08 inodb_ joined #gluster
18:09 inodb^ joined #gluster
18:25 johnmark kkeithley: but yes, we need something more
18:25 johnmark the trick is, how to get to "something more"
18:34 * jdarcy o_O
18:35 jdarcy How about if I stop generating patches that will never be merged and write a book on GlusterFS internals instead?
18:39 semiosis rofl
18:43 FredSki joined #gluster
18:56 JoeJulian lol
18:57 JoeJulian lol and :( at the same time...
18:57 blendedbychris joined #gluster
18:57 blendedbychris joined #gluster
18:58 NuxRo lol @ http://qph.cf.quoracdn.net/main-qimg-62d9c6ba870554801feb758c27f98c7d
18:58 glusterbot <http://goo.gl/Pa8sP> (at qph.cf.quoracdn.net)
18:59 NuxRo btw jdarcy http://cloudfs.org/blog is 404
18:59 JoeJulian worked for me
19:01 NuxRo hm, must have been some glitch somewhere, now it loads here too odd
19:02 johnmark jdarcy: *sigh*
19:03 johnmark jdarcy: I mean, hey, don't let me stop you, but I'm groaning nonetheless
19:03 bauruine joined #gluster
19:04 JoeJulian Speaking of developers... Still no feedback on that life cycle proposal from the folks that might actually be able to make it happen.
19:04 jdarcy NuxRo: You're right.  Somehow everything got moved under index.php at some point.
19:04 * jdarcy tries to figure out if it's a Wordpress issue or an nginx issue.
19:05 Technicool joined #gluster
19:05 NuxRo nginx is trouble :)
19:06 jdarcy JoeJulian: Who are the people who might make it happen?
19:06 JoeJulian You, for one! :P
19:06 jdarcy NuxRo: Tell me more.  I actually used lighttpd for a long time, but switched to nginx because of memory leaks.  Curious what you've had trouble with.
19:07 JoeJulian I've found nginx invaluable for running web servers on instances that are too small to make apache viable.
19:07 jdarcy JoeJulian: Most assuredly *not* me.  I don't have my fingers on the "commit" button, nor do I have much sway with the (non-engineering) folks who've created the blockage.
19:08 * JoeJulian raises an eyebrow...
19:09 JoeJulian Well, you, avati, and vikas were the folks that /I/ thought managed that.
19:09 JoeJulian s/vikas/vijay/
19:09 glusterbot What JoeJulian meant to say was: Well, you, avati, and vijay were the folks that /I/ thought managed that.
19:09 semiosis JoeJulian: link please?  so we're all on the same page
19:10 jdarcy The other two, yes, but they don't exactly just do what they want either.
19:11 johnmark JoeJulian: I will make sure that gets raised, and we'll have a board meeting soon to discuss it
19:11 JoeJulian http://www.gluster.org/community/documentation/index.php/Life_Cycle
19:11 glusterbot <http://goo.gl/zkCmY> (at www.gluster.org)
19:12 johnmark JoeJulian: it's a problem, and we need to solve it.
19:12 johnmark JoeJulian: realistically what we can do is come up with a proposal, circulate to the Gluster advisory board, and vote
19:12 NuxRo jdarcy: i was half-kidding, nginx is ok, but sometimes it's a pain to adjust apache rewrite rules to its engine
19:13 jdarcy NuxRo: Yeah, I'm looking at the rewrite rules right now.
19:19 jdarcy OK, I think I fixed the /index.php thing.  Can somebody else try e.g. http://cloudfs.org/blog for me?
19:20 semiosis looks good from here
19:20 jdarcy try_files to the rescue
19:20 jdarcy semiosis: Thanks.
19:20 semiosis yw
19:21 jdarcy Probably still a WP issue, but nginx fixed it. :)
19:23 JoeJulian btw, johnmark, I did send that to the advisory board but there's been no activity there either.
19:24 neofob left #gluster
19:24 * semiosis bites his cynical tongue
19:24 semiosis ...grumbles something unintelligible
19:24 semiosis @lucky best effort
19:24 glusterbot semiosis: http://goo.gl/0Jikb
19:27 semiosis kwim?
19:29 semiosis JoeJulian: sorry i missed your email to advisors, found it now
19:29 semiosis i guess i should voice my opinion on the discussion tab like you asked
19:30 zaitcev joined #gluster
19:31 johnmark semiosis: feel free to comment in both
19:31 johnmark or update the wiki and send the update to advisors@
19:34 semiosis thx, yeah i like wiki talk pages, will shoot a note to the list too tho
19:35 jdarcy I just added my suggestions there.
19:35 zaitcev I forgot to mention that it's possible to disable the extras at Github.
19:36 zaitcev Er ECHAN
19:38 semiosis http://www.mediawiki.org/wiki/Help:Signatures
19:38 glusterbot Title: Help:Signatures - MediaWiki (at www.mediawiki.org)
19:38 semiosis we should use the + button to add to the talk page, and end each addition with ~~~~
19:39 jdarcy Just signed mine.
19:40 semiosis thx
19:45 johnmark semiosis: good point. I didn't know any of that :)
19:46 rudimeyer_ joined #gluster
19:47 sshaaf joined #gluster
19:47 semiosis :)
19:48 semiosis bbiab, lunch
20:11 gbrand_ joined #gluster
20:26 JoeJulian jdarcy: Do you have any ideas for reducing the memory footprint on a single client? I have one client that mounts 15 volumes for backup. With each one taking up 252m initially at mount time, (data+stack) this eats up memory quickly.
20:26 FredSki joined #gluster
20:27 a2 JoeJulian, have you tried disabling all the perf xlators?
20:27 a2 btw is it 252M virt or rss?
20:28 lanning joined #gluster
20:32 JoeJulian I've never had that completely figured out. virt is 300m. top shows "DATA" as 252m.
20:33 badone joined #gluster
20:37 JoeJulian RSS is 1156, apparently.
20:37 lh joined #gluster
20:37 lh joined #gluster
20:37 a2 just "1156" is about 1.1MB
20:38 JoeJulian So that would mean that if I add up all the RSS, I'm only using about 41m, which then makes me wonder why I'm always in swap.
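
For reference, a2's suggestion of disabling the performance translators maps to volume options like these; VOLNAME is a placeholder and each one trades client performance for memory:

    gluster volume set VOLNAME performance.quick-read off
    gluster volume set VOLNAME performance.io-cache off
    gluster volume set VOLNAME performance.read-ahead off
    gluster volume set VOLNAME performance.write-behind off
    gluster volume set VOLNAME performance.stat-prefetch off
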
20:44 nueces joined #gluster
20:48 circut joined #gluster
20:51 blendedbychris joined #gluster
20:51 blendedbychris joined #gluster
20:55 * JoeJulian learns about memory in linux...
20:58 tjikkun joined #gluster
20:58 tjikkun joined #gluster
20:58 JoeJulian a2: You are, of course, right. Unmounting all my volumes produced almost no change in the memory footprint.
21:00 JoeJulian And mounting them again caused the oom killer to kill my shell.... something's definitely wrong here.
21:02 GLHMarmot I am in the process of testing gluster. I created a second volume and the cluster froze when trying to mount the nfs locally. Rebooting one of the nodes brought it back. I deleted the volume, renamed the private ip dns names (they were non-standard before) and am now trying to recreate the volume.
21:03 GLHMarmot Unfortunately I get an error: /gfsroot/vm1/brick or a prefix of it is already part of a volume
21:03 glusterbot GLHMarmot: To clear that error, follow the instructions at http://goo.gl/YUzrh or see this bug http://goo.gl/YZi8Y
21:04 GLHMarmot Ooooohhhh, glusterbot, coooool
21:07 semiosis glusterbot: awesome
21:07 glusterbot semiosis: ohhh yeeaah
21:10 GLHMarmot Now to figure out how to get a dev build, or at least, that patch
21:10 JoeJulian Oooh, lvm snapshots may not be such a good idea after all.
21:10 Humble joined #gluster
21:11 gbrand__ joined #gluster
21:13 jaxx_ joined #gluster
21:16 jaxx_ Does anyone know if the cluster.min-free-disk works with Distributed systems with variable brick sizes. The last info I found about it on Google was from over a year ago. Also, is it just used by setting "gluster volume set xxxxx cluster.min-free-disk 60GB"?
21:17 semiosis jaxx_: what are you expecting that option to do for you?
21:19 jaxx_ I have several 3TB bricks and a few 1TB bricks. I added 5 more 3TB bricks to my volume, but the original ones keep filling up and get write errors before the new set is above 30%.. I'm using it for Zoneminder Video Capture, and millions of small files, the redistribute has significant issues with that many files.
21:20 semiosis brb, rebooting
21:20 jaxx_ If I could have it bypass the full ones while they were still available to read from, that would be my ultimate goal
21:20 semiosis imho, forget you ever saw min-free-disk
21:21 semiosis idk for sure what that does but i've heard it just writes warnings to the log
21:21 jaxx_ are there other options that would fix my problem?
21:21 semiosis if you add-brick, you need to rebalance
21:21 semiosis doing a rebal fix-layout will enable new files to be placed on the new bricks
21:21 nick5 joined #gluster
21:21 semiosis doing a full rebal will reshuffle files so all bricks are used evenly
21:21 semiosis these are not cheap operations
21:22 semiosis but you don't really have much choice if you want to add-bricks
21:22 semiosis another strategy is to expand your existing bricks, such as with lvm/lvextend.....
21:22 semiosis ok brb now
21:22 jaxx_ I did the fix-layout, but the rebalance has been going for about a week with 15 million files processed so far... at that point I get up to 500% cpu usage with my 6-core hyperthreading xeon
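
For reference, the two rebalance modes semiosis describes, with VOLNAME as a placeholder:

    gluster volume rebalance VOLNAME fix-layout start   # new files may land on the new bricks
    gluster volume rebalance VOLNAME start              # full rebalance: also migrates existing files
    gluster volume rebalance VOLNAME status             # per-node progress (files scanned/moved)
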
21:22 nick5 joined #gluster
21:24 JoeJulian yikes
21:25 tjikkun joined #gluster
21:25 tjikkun joined #gluster
21:27 GLHMarmot Hah! After a little more digging I found that there were extended attributes, just not where I was expecting them. I traversed the path and used "getfattr -m - ." to show all attributes and eventually found them.
21:27 GLHMarmot Error gone!
21:28 GLHMarmot No need to download and build source.
21:29 GLHMarmot JoeJulian: thanks for the writeup
21:29 GLHMarmot glusterbot: Wooot!
21:30 JoeJulian You're welcome. :D
21:30 jaxx_ @GLHMarmot if you use the "getfattr -m - . brickname -d" it will show you what attributes are assigned to them as well
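
For reference, the usual fix for the "already part of a volume" error that the linked writeup describes: clear the volume xattrs from the brick path (and, as GLHMarmot found, possibly from its parent directories) before reusing it. The brick path below comes from the error above; only do this on a brick you really mean to strip of gluster metadata:

    B=/gfsroot/vm1/brick
    setfattr -x trusted.glusterfs.volume-id "$B"
    setfattr -x trusted.gfid "$B"
    rm -rf "$B/.glusterfs"
    # the same trusted.* attributes can also be present on parent directories ("or a prefix of it")
    getfattr -m . -d -e hex /gfsroot /gfsroot/vm1 "$B"
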
21:30 semiosis back
21:31 semiosis jaxx_: yeah, not cheap :(
21:32 jaxx_ Here's the posting that mentioned that setting - http://community.gluster.org/q/does-bricks-may-have-different-sizes-in-a-distributed-architecture/
21:32 glusterbot <http://goo.gl/9nhXs> (at community.gluster.org)
21:33 GLHMarmot ll
21:34 GLHMarmot doh
21:34 semiosis anyone know if that min-free-disk option actually affects file placement?
21:35 semiosis jaxx_: in any case, if you have existing files that continue to grow then placing new files on other bricks won't help :(
21:35 semiosis i think growing your bricks is the only option
21:36 johnmark woah, this is cool - https://twitter.com/jeffbarr/status/271335736505147392
21:36 glusterbot <http://goo.gl/2c0hl> (at twitter.com)
21:36 johnmark gotta love it when AWS' top evangelist recommends GlusterFS
21:37 jaxx_ OK, well, at current I have 30-days of images and just delete older files over that age until they get fully rebalanced.. nothing is working perfectly though
21:37 semiosis johnmark: let the retweeting begin!
21:39 johnmark heh :)
21:41 JoeJulian Hrm... I guess I need to figure out whether btrfs will work for my intermediate backup reliably.
21:42 johnmark woah... jdarcy calls 'em as he sees 'em - http://www.gluster.org/2012/11/how-to-build-a-distributed-filesystem/?utm_source=dlvr.it&utm_medium=twitter
21:42 glusterbot <http://goo.gl/TMBbh> (at www.gluster.org)
21:43 jdarcy I liked "planetary waste of brain power"
21:46 semiosis 4.  ... Don’t worry about repair for now.  5. Worry about repair....
21:46 semiosis :D
21:47 jaxx_ Can I re-mount the bricks that are nearly full as read-only temporarily so they are readable?
21:49 semiosis sure you *can* but who knows what will happen when clients try writing to them
21:49 jaxx_ That's kind of the answer I was guessing it would be
21:50 jaxx_ bummer... I'll just let my rebalance go on, and hopefully someday it will finish
21:50 jaxx_ Thanks for your help
21:53 semiosis yw, sorry about the bad news
21:53 theron joined #gluster
22:10 Technicool joined #gluster
22:15 JoeJulian jaxx_: What version are you running?
22:18 jaxx_ 3.3.1.1 on CentOS 6.3
22:18 jaxx_ 64bit
22:29 GLHMarmot Damn! That issue with deadlocking when mounting a volume with nfs on localhost.  Forgot to add the sync option to the mount. <grumble>
23:03 glusterbot mmmmmm pumpkin pie
23:03 dedis0 joined #gluster
23:28 TSM2 joined #gluster
