IRC log for #gluster, 2014-02-18

All times shown according to UTC.

Time Nick Message
00:18 daMaestro joined #gluster
00:20 cjanbanan joined #gluster
00:48 _pol joined #gluster
01:13 tokik joined #gluster
01:21 jurrien joined #gluster
01:23 ThatGraemeGuy joined #gluster
01:51 harish joined #gluster
01:53 cjanbanan joined #gluster
01:59 aquagreen joined #gluster
02:01 theguidry joined #gluster
02:07 bala joined #gluster
02:08 harish joined #gluster
02:11 aquagreen joined #gluster
02:11 psyl0n joined #gluster
02:20 psyl0n joined #gluster
02:20 B21956 joined #gluster
02:25 DV joined #gluster
03:01 nightwalk joined #gluster
03:12 jporterfield joined #gluster
03:13 khushildep joined #gluster
03:14 aquagreen joined #gluster
03:17 aquagreen joined #gluster
03:18 ppai joined #gluster
03:26 shubhendu joined #gluster
03:33 aquagreen joined #gluster
03:37 RameshN joined #gluster
03:39 jporterfield joined #gluster
03:41 sahina joined #gluster
03:47 Matthaeus joined #gluster
03:50 kanagaraj joined #gluster
03:52 eshy hi, i'm reading about the replace-brick command on the gluster documentation page. if i perform a replace-brick onto a new, remote brick, will my data be removed from the source brick after commit, or is it a copy?
03:52 itisravi joined #gluster
03:54 eshy http://joejulian.name/blog/how-to-expand-glusterfs-replicated-clusters-by-one-server/ i've read this article a couple of times but some wording has me a little worried.. the writer says 'so first we move', and shows a replace-brick command, so im not sure if it's a move, or a copy
03:54 glusterbot Title: How to expand GlusterFS replicated clusters by one server (at joejulian.name)
03:54 eshy thanks glusterbot
03:55 rpowell joined #gluster
03:57 eshy am i right in thinking that after committing a replace-brick command (after status reports complete), the volume itself will simply switch to operating from the brick i've 'replaced' to?
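(For reference: in the 3.3/3.4-era CLI the replace-brick workflow looks roughly like the sketch below, with hypothetical volume and brick names. As far as I understand it, the migration copies data onto the new brick, and after the commit the volume serves from the new brick; whatever was on the old brick stays on that disk but is no longer part of the volume.)
    gluster volume replace-brick myvol oldhost:/bricks/b1 newhost:/bricks/b1 start
    gluster volume replace-brick myvol oldhost:/bricks/b1 newhost:/bricks/b1 status   # repeat until it reports the migration is complete
    gluster volume replace-brick myvol oldhost:/bricks/b1 newhost:/bricks/b1 commit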
03:58 mohankumar__ joined #gluster
04:05 satheesh1 joined #gluster
04:06 davinder joined #gluster
04:06 mohankumar__ joined #gluster
04:13 mohankumar joined #gluster
04:17 ajha joined #gluster
04:21 mohankumar joined #gluster
04:24 kdhananjay joined #gluster
04:26 mohankumar joined #gluster
04:29 ndarshan joined #gluster
04:31 shylesh joined #gluster
04:31 mohankumar joined #gluster
04:32 raghug joined #gluster
04:33 hagarth joined #gluster
04:37 mohankumar joined #gluster
04:42 DV joined #gluster
04:44 mohankumar joined #gluster
04:45 aquagreen joined #gluster
04:45 tokik joined #gluster
04:46 dusmant joined #gluster
04:48 jiqiren joined #gluster
04:50 cjanbanan joined #gluster
04:56 jobewan joined #gluster
05:09 prasanth joined #gluster
05:09 lalatenduM joined #gluster
05:09 jag3773 joined #gluster
05:14 mohankumar joined #gluster
05:14 _pol joined #gluster
05:15 bala joined #gluster
05:15 saurabh joined #gluster
05:17 Frankl joined #gluster
05:17 Frankl hi, we have two client, one client updated one file, the other copied it but it found the copied one was out-of-date which it is not the latest one.
05:18 Frankl the client's version is glusterfs-3.3.0-1
05:18 Frankl the server is 3.3.0.5rhs
05:18 Frankl When we run ll then copy, we could copy the latest one
05:20 cp0k joined #gluster
05:21 mohankumar joined #gluster
05:22 meghanam joined #gluster
05:22 meghanam_ joined #gluster
05:31 mohankumar joined #gluster
05:38 satheesh1 joined #gluster
05:38 kanagaraj joined #gluster
05:39 vpshastry joined #gluster
05:44 ndarshan joined #gluster
05:44 jikz joined #gluster
05:45 rastar joined #gluster
05:48 CheRi joined #gluster
05:52 raghu joined #gluster
05:55 NeatBasis joined #gluster
05:57 nshaikh joined #gluster
06:02 ndarshan joined #gluster
06:04 kanagaraj joined #gluster
06:06 benjamin_____ joined #gluster
06:16 _pol joined #gluster
06:26 coolsvap joined #gluster
06:28 rjoseph joined #gluster
06:28 aravindavk joined #gluster
06:29 lalatenduM Frankl, sorry I didn't get your question
06:30 satheesh joined #gluster
06:32 raghu` joined #gluster
06:34 Frankl lalatenduM: I mean there are two clients, one client updated the file, then the other client (in other machine) read the same file,it only see the old one.
06:35 Frankl lalatenduM: I have sent a mail to the maillist with title "self-heal is not trigger and data incosistency?"
06:35 lalatenduM Frankl, got it, which protocol you are using for mounting the volume on the client
06:35 Frankl lalatenduM: fuse
06:36 Frankl I pasted the client's log files in my 2nd mail on the topic
06:37 lalatenduM Frankl, the same glusternode is used to mount the volume on both the clients?
06:37 lalatenduM Frankl, will check the mail
06:39 lalatenduM Frankl, it seems like a client side caching issue,  I mean I am not sure client2 has actually sent for a lookup, or just returned from the local cache
06:40 bharata-rao joined #gluster
06:40 Frankl lalatenduM: client side cache? you mean os's page cache?
06:40 lalatenduM Frankl, have you enabled any performance translators, send the output of gluster volume info <volumename> in a pastebin
06:40 lalatenduM @pastebin
06:40 glusterbot lalatenduM: I do not know about 'pastebin', but I do know about these similar topics: 'paste', 'pasteinfo'
06:40 lalatenduM @paste
06:40 glusterbot lalatenduM: For RPM based distros you can yum install fpaste, for debian and ubuntu it's pastebinit. Then you can easily pipe command output to [f] paste [binit] and it'll give you a URL.
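(A minimal sketch of that paste workflow for the requested volume info, assuming fpaste or pastebinit is installed and using a hypothetical volume name:)
    gluster volume info myvol | fpaste        # RPM-based distros
    gluster volume info myvol | pastebinit    # Debian/Ubuntu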
06:42 Frankl lalatenduM: use gist https://gist.github.com/mflu/72ee760df862968e30ef
06:42 glusterbot Title: volume infomation (at gist.github.com)
06:42 Frankl you could get the volume infomation here
06:42 Frankl lalatenduM: I don't think I have enabled any performance options
06:43 lalatenduM Frankl, nope you haven't
06:43 Frankl both two clients use the same node to mount
06:44 failshel_ joined #gluster
06:44 lalatenduM Frankl, looks like a bug, but wondering why it didn't come with others what is different in your setup
06:45 Frankl have you received my mail? If not, I could also paste it in gist
06:45 lalatenduM Frankl, if possible try NFS mount and see it you are getting the latest file in client2
06:45 vimal joined #gluster
06:45 lalatenduM also you can check "gluster v heal <volumename> info" or "gluster v heal <volumename> info heal-failed"
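(Spelled out, those heal-status commands look like this for a hypothetical volume name; "v" is just shorthand for "volume":)
    gluster volume heal myvol info              # entries currently queued for self-heal
    gluster volume heal myvol info heal-failed  # entries the self-heal daemon could not heal
    gluster volume heal myvol info split-brain  # entries detected as split-brain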
06:47 Frankl lalatenduM: https://gist.github.com/mflu/318d894b39695af27c2f this is the client's log
06:47 glusterbot Title: clients log (at gist.github.com)
06:48 Frankl It is production env, we could not touch the mount currently.
06:48 pk joined #gluster
06:49 Frankl this is the heal-failed output: https://gist.github.com/mflu/c279662bfa6fc17af7cc
06:49 glusterbot Title: gluster volume heal-failed (at gist.github.com)
06:49 lalatenduM Frankl, I dont see anything wrong in client logs
06:50 Frankl yes, nothing is suspicious
06:50 Frankl this is the heal info's output: https://gist.github.com/mflu/5f2dd7aede2f5a8458e4
06:50 glusterbot Title: gluster volume heal info (at gist.github.com)
06:53 lalatenduM Frankl, in the heal-failed command output do you see your file listed there
06:53 lalatenduM ?
06:53 meghanam joined #gluster
06:53 meghanam_ joined #gluster
06:53 Frankl no
06:54 cjanbanan joined #gluster
06:55 mohankumar joined #gluster
06:57 Frankl lalatenduM: do you need any server's log?
06:59 lalatenduM Frankl, I am kind of wondering why the file is not listed in the heal-failed result
07:00 lalatenduM Frankl, there is a way do full self-heal , it will check all files and will heal it if necessary , do u wantto try that
07:00 lalatenduM Frankl, also you should move to 3.4.2, it has lots of bug fix for self-heal issues
07:00 Frankl lalatenduM: what is the side effect?
07:00 lalatenduM Frankl, nothing I believe
07:02 Frankl lalatenduM: then I could try that. for 3.4.2, as you know, it is hard to make the change in a production enviroment if we didn't find a block issue :)
07:02 lalatenduM Frankl, I will suggest you two things, first try NFS and see if you see the same issue
07:03 Frankl OK
07:04 lalatenduM Frankl, 2nd do a full heal "volume heal <VOLNAME> full"
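(A sketch of both suggestions with hypothetical volume and host names; the NFS mount uses gluster's built-in NFSv3 server and so bypasses the FUSE client and its caching. Exact mount options can vary by distro.)
    gluster volume heal myvol full                             # crawl every file and heal where needed
    mount -t nfs -o vers=3 glusternode1:/myvol /mnt/nfstest    # check whether client2 sees the latest data over NFS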
07:04 Frankl lalatenduM: thanks for your nice help. I will try both, if I find more, I will let you know
07:05 rfortier1 joined #gluster
07:05 lalatenduM Frankl, after that you can try to move to 3.4.2
07:05 lalatenduM @upgrading
07:05 lalatenduM @update
07:06 lalatenduM Frankl, here is the steps to upgrade to 3.4 http://vbellur.wordpress.com/2013/07/15/upgrading-to-glusterfs-3-4/
07:08 lalatenduM @gluster
07:12 ppai joined #gluster
07:13 raghu joined #gluster
07:13 meghanam_ joined #gluster
07:15 jtux joined #gluster
07:15 meghanam joined #gluster
07:21 cfeller joined #gluster
07:28 aquagreen joined #gluster
07:29 badone_ joined #gluster
07:35 rfortier joined #gluster
07:36 rfortier1 joined #gluster
07:37 Philambdo joined #gluster
07:57 junaid joined #gluster
07:58 ctria joined #gluster
08:00 eseyman joined #gluster
08:06 ProT-0-TypE joined #gluster
08:06 rossi_ joined #gluster
08:07 ^rcaskey joined #gluster
08:13 ktosiek joined #gluster
08:16 prasanth joined #gluster
08:17 ekuric joined #gluster
08:20 harish joined #gluster
08:20 keytab joined #gluster
08:22 franc joined #gluster
08:22 franc joined #gluster
08:23 andreask joined #gluster
08:28 RameshN joined #gluster
08:30 lalatenduM joined #gluster
08:31 raghu` joined #gluster
08:51 cjanbanan joined #gluster
09:02 Norky joined #gluster
09:04 REdOG how do I obliterate a brick so that I can add it to a new volume?
09:05 cjanbanan In what order will the data I write to a file be sent to the servers which contains bricks in the replicated volume?
09:06 X3NQ joined #gluster
09:06 REdOG 0-management: Staging of operation 'Volume Create' failed on localhost : /awz0/brick4 or a prefix of it is already part of a volume
09:06 glusterbot REdOG: To clear that error, follow the instructions at http://joejulian.name/blog/glusterfs-path-or-a-prefix-of-it-is-already-part-of-a-volume/ or see this bug https://bugzilla.redhat.com/show_bug.cgi?id=877522
09:08 satheesh1 joined #gluster
09:08 liquidat joined #gluster
09:09 rfortier joined #gluster
09:15 REdOG yea ive tried that...do i have to restart the entire system?
09:16 xavih REdOG: no, it's not necessary
09:17 REdOG I even deleted the entire thing and recreated it
09:17 xavih REdOG: check if the attributes are really removed with 'getfattr -m. -e hex -d /awz0/brick4'
09:17 xavih REdOG: also check if /awz0 has any attributes set
09:18 xavih REdOG: or even in '/', but this is very unlikely
09:19 REdOG the brick does
09:19 REdOG trusted.glusterfs.volume-id=
09:20 xavih cjanbanan: every brick sees the writes in the same order, however if you make writes in parallel (not waiting the answer before issuing another write), there is no rule to know in which order they will be processed
09:20 xavih REdOG: you need to remove that attribute
09:20 xavih REdOG: execute setfattr -x trusted.glusterfs.volume-id /awz0/brick4
09:21 xavih REdOG: this should remove it
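(A sketch of the cleanup the linked article describes, using the brick path from this conversation; run it on every server that exported the old brick, and only if nothing on the brick needs to be kept:)
    setfattr -x trusted.glusterfs.volume-id /awz0/brick4
    setfattr -x trusted.gfid /awz0/brick4        # may not be set; "No such attribute" is fine
    rm -rf /awz0/brick4/.glusterfs
    service glusterd restart                     # sometimes needed before retrying the volume create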
09:21 REdOG why didn't it show up with the typical getfattr?
09:21 hybrid512 joined #gluster
09:22 xavih REdOG: which is the typical getfattr ? I always use the that one
09:22 REdOG I removed it and still same error
09:22 xavih REdOG: it doesn't appear now in getfattr ?
09:22 REdOG it does after it fails again to create the volume
09:23 REdOG but not before
09:23 xavih REdOG: ok, then remove it, and before trying again, check also the /awz0 and / directories
09:24 xavih REdOG: none of them should have any attribute
09:24 REdOG no such atttribute
09:24 REdOG they each have selinux
09:24 REdOG only
09:24 xavih REdOG: do you have selinux enabled ?
09:24 sputnik13 joined #gluster
09:25 REdOG unfortunatly no
09:25 Philambdo1 joined #gluster
09:25 xavih what kind of volume are you creating ? a distributed-replicated ?
09:26 REdOG just replica 2
09:26 xavih REdOG: can you show me the command you are using to create it ?
09:26 xavih REdOG: also check the atributes of the 3 direcories on both bricks
09:29 xavih REdOG: before creating a new volume with a brick that already belonged to another one it should be better to remove all its contents, including the .glusterfs directory on both bricks
09:29 REdOG gluster vol create asC replica 2 transport tcp kvm0:/awz0/brick4 kvm1:/dev/zvol/awz0/brick4
09:29 REdOG I definilty removed the .glusterfs dir
09:29 REdOG these are empty directories
09:30 xavih REdOG: if you have checked that none of the directories of both bricks have any attribute set, I don't know what is failing...
09:32 xavih REdOG: you could take a look at the brick logs to see if there is any additional information
09:32 badone__ joined #gluster
09:34 REdOG their logs haven't been touched in hours
09:34 * REdOG is lost
09:35 xavih REdOG: I assume that kvm0:/awz0 and kvm1:/dev/zvol/awz0 do not point to the same storage, right ?
09:35 xavih REdOG: look at the glusterd logs
09:35 REdOG 2 seperate hosts
09:35 REdOG 2 seperate storage systems
09:40 xavih REdOG: which version of gluster are you using ?
09:41 REdOG 3.4.2
09:41 REdOG if I try to create it with the host it's not complaining about then it works
09:41 REdOG w/o the host
09:41 REdOG seems to be an unlogged issue on kvm1
09:41 xavih which host ?
09:42 REdOG all I see in that ones log is 0-management: Stage failed on operation 'Volume Create', Status : -1
09:43 abyss^ when I create ditributed replicated volume how I can make sure that which bricks are just distributed and which are replica of others? When I do server:/brick1 server2:/brick1 server3:/brick2 server4:/brick2 then it server1 is replica of server2 etc?
09:44 cjanbanan xavih: But I guess a single write will be sent to the bricks one at a time, right? There's no broadcast (or multicast) message containing the data sent to all servers at the same time, right? If so, is there any predefined order in which these messages are sent? Is it in the same order as the servers are specified when you create the replicated filesystem?
09:45 an_ joined #gluster
09:45 sputnik13 joined #gluster
09:47 xavih cjanbanan: in a replicated volume, each write is sent to all replicated bricks but no order can be assumed. They are processed asynchronously
09:47 xavih cjanbanan: what is guaranteed is that all bricks will process all writes in the same order
09:50 ndarshan joined #gluster
09:50 xavih REdOG: have you also checked the attributes of all directories of kvm1 ? (/, /dev, /dev/zvol, /dev/zvol/aws0, /dev/zvol/aws0/brick4)
09:52 REdOG xavih: I have found that my error is the blockdevice if I try to use just it I am told it isn't a directory
09:54 xavih REdOG: the error message is misleading, but definitely this seems to be a problem
09:54 REdOG xavih: should gluster be capable of using a block device such as my command? or should I be mounting it first?
09:55 xavih abyss^: This is from an older version but it still applies: http://gluster.org/community/documentation/index.php/Gluster_3.2:_Configuring_Distributed_Replicated_Volumes
09:55 glusterbot Title: Gluster 3.2: Configuring Distributed Replicated Volumes - GlusterDocumentation (at gluster.org)
09:55 REdOG I seem to have discovered another error if I try to mount it directly
09:55 xavih REdOG: it needs a file system
09:55 xavih REdOG: gluster cannot work directly on a block device
09:56 xavih REdOG: it's also recommended to not use the root directory of a filesystem. It's better to create a subdirectory to store brick data
09:56 cjanbanan xavih: OK, so I guess I can't control which brick that will receive my data first by specifying the 'gluster volume create test-volume replica 2 mpa:/storage1 mpb:/storage2 force' command in a clever way? The reason for this rather strange question is that I need to investigate if I can avoid a split-brain situation in my redundant embedded system.
09:57 Oneiroi joined #gluster
09:57 xavih cjanbanan: you cannot rely on that
09:58 cjanbanan xavih: OK, thanks. Back to the drawing board then... ;-)
09:58 abyss^ xavih: thank you, I read this but I didn't get it;) Now I see is excatly that I wrote above. Thank you;)
10:00 xavih cjanbanan: internally gluster will send the data in a loop, so it really sends the data in some order, however operating system tcp queues, process preemption, priority tasks and network issues can make that the requests be processed in any order in the bricks
10:00 xavih cjanbanan: gluster does not enforce any order between bricks. This would add an important performance penalty
10:01 xavih abyss^: yes, in a replica 2 each server is paired as you say
10:01 abyss^ thank you. I still working on my english;)
10:02 SteveCooling Hi guys. I'm having trouble getting georeplication working on one volume. I have one volume "geotest" that works just fine, but the one with the actual data on seems to just sync up changes once and then replication "hangs", eventually spewing "failed to sync file." into the log. All I can find in the docs about this is rsync issues, ...which cannot be the case since it works on the other volume??
10:07 khushildep joined #gluster
10:08 REdOG I cant do it wiht a mount point
10:13 xavih REdOG: once mounted the filesystem, you should create a subdirectory and use it as the brick in the create volume command
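(A sketch of that layout for the bricks in this conversation, assuming the zvol is given a filesystem (XFS shown here; any xattr-capable filesystem works) and mounted before the brick subdirectory is created on both hosts:)
    mkfs.xfs /dev/zvol/awz0/brick4               # on kvm1, one time
    mkdir -p /awz0/brick4
    mount /dev/zvol/awz0/brick4 /awz0/brick4
    mkdir /awz0/brick4/data                      # the brick lives in a subdirectory; do the same on kvm0
    gluster volume create asC replica 2 transport tcp kvm0:/awz0/brick4/data kvm1:/awz0/brick4/data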
10:13 cjanbanan xavih: I was suspicious of the stuff you mention (tcp queues, process preemption, etc). As long as the message is sent from a client in some order, I'd be alright. I don't care that much about when the brick actually receives the data. I would like to control which brick is the first destination of my data in case the host which contains the writing process restarts. In such a case, I guess that I may end up in a situation where only some of my bricks
10:14 borreman_dk joined #gluster
10:15 REdOG that's a little peculliar why not just the / of the mount?
10:15 REdOG I got it to create now though so tks
10:16 xavih cjanbanan: if the process that is writing restarts, it cannot assume anything. The last write can have reached brick 1, brick 2, both or none
10:17 xavih cjanbanan: there is no way to be sure of that
10:17 xavih cjanbanan: what are you trying to achieve ?
10:18 xavih REdOG: It's better to use a subdirectory because if the filesystem mount fails on a server reboot or any other thing, gluster brick will not find the subdirectory and won't start
10:18 REdOG o i c
10:18 cjanbanan xavih:  Thanks for making it clear to me, even though I would have hoped for another answer. ;-)
10:19 xavih REdOG: otherwise the bricks would start and use invalid information
10:19 REdOG not if it used the device directly :P lol
10:19 cjanbanan xavih: I'm investigating if this file system can be used for my redundant embedded system.
10:20 REdOG that makes sense though. I get it now
10:20 REdOG ive been awake way too long
10:21 cjanbanan xavih: I understand that I need to avoid split-brains, as I'm not allowed to disturb the clients. The files have to be accessible.
10:21 xavih cjanbanan: do you want to create a redundany filesystem over glusterfs ?
10:23 cjanbanan xavih: The applications on the embedded system are also redundant. Whenever the host restart for some reason a new one takes over and the clients resumes where they left off.
10:23 DV joined #gluster
10:24 xavih cjanbanan: how do you control at which point the last host died ?
10:25 ira joined #gluster
10:26 xavih cjanbanan: I cannot assure it will work, however you should test it: if a host containing a brick dies, on restart an automatic self-heal will begin to heal the updated files. Gluster will correctly identify them and everything will go well.
10:26 xavih cjanbanan: the only problem is if two hosts are alive but the interconnection between them is lost
10:27 xavih cjanbanan: in this case both clients can write to the same file
10:27 xavih cjanbanan: can this happen ?
10:27 cjanbanan xavih: That's why the clients are disturbed in case of split-brain files. If I were to implement a glusterfs for only my purpose, I'd actually be able to make it simpler by just choosing one brick as the latest file and then overwrite the others. I guess the reason for split-brain is that you really put an effort into determining which file contains the most recent data?
10:27 xavih cjanbanan: there should be only one process running, right ?
10:28 xavih cjanbanan: but when do you get an split-brain ?
10:29 cjanbanan xavih: It's an advanced embedded system. Even I don't have the full picture. I just know that in some sense the clients restarts and continue where they left off.
10:29 xavih cjanbanan: in normal circumstances, all hosts will get the latest version of the file, independently of which brick has the most updated version
10:30 xavih cjanbanan: gluster knows which bricks are less updated than the others, and uses the lastes information always
10:30 xavih cjanbanan: only if two processes write to the same file while the connection is lost, an split-brain could happen
10:30 xavih cjanbanan: even in this case, you could use quorum to avoid split-brains
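(The quorum options being referred to, for a hypothetical volume; note that with a plain replica-2 set, 'auto' client quorum effectively requires the first brick of each pair to be up:)
    gluster volume set myvol cluster.quorum-type auto           # client-side quorum: writes fail without a majority of the replica set
    gluster volume set myvol cluster.server-quorum-type server  # server-side quorum: bricks are stopped when peer quorum is lost
    gluster volume set all cluster.server-quorum-ratio 51%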
10:32 kanagaraj joined #gluster
10:33 xavih I've a meeteing now. I'll come later if you want to discuss it further
10:34 REdOG anyway to force gluster to replicate added files in the brick directories that weren't added through the fuse mount?
10:35 REdOG that seems easier than copying around the data again
10:39 cjanbanan In my case there's only one process on host A writing to the file, but it's corresponding process on the redundant host (B) will continue to write when host A restarts. As the bricks resides on both host A and B, this means that there will be different updates to the same file on the bricks. I've verified this in a controlled manner using Virtualbox.
10:45 glusterbot New news from newglusterbugs: [Bug 1066410] xsync creates 0byte files and skips replication <https://bugzilla.redhat.com/show_bug.cgi?id=1066410>
10:50 satheesh3 joined #gluster
10:55 calum_ joined #gluster
10:58 rgustafs joined #gluster
10:59 mohankumar joined #gluster
11:04 hchiramm_ joined #gluster
11:08 neurodrone__ joined #gluster
11:10 DV joined #gluster
11:14 psyl0n joined #gluster
11:14 davinder joined #gluster
11:16 Norky joined #gluster
11:16 keytab joined #gluster
11:17 pkoro joined #gluster
11:21 glusterbot New news from resolvedglusterbugs: [Bug 963153] E [socket.c:2790:socket_connect] 0-management: connection attempt on /var/run/08421e3828f08de5bb681291123cdf10.socket failed, (Connection refused) <https://bugzilla.redhat.com/show_bug.cgi?id=963153>
11:26 khushildep joined #gluster
11:27 rgustafs joined #gluster
11:27 rgustafs_ joined #gluster
11:30 lalatenduM joined #gluster
11:34 ihre left #gluster
11:36 psyl0n joined #gluster
11:40 burn420 joined #gluster
11:40 Norky joined #gluster
11:41 calum_ joined #gluster
11:41 neurodrone__ joined #gluster
11:42 rgustafs joined #gluster
11:42 rgustafs_ joined #gluster
11:44 pk joined #gluster
11:49 kkeithley1 joined #gluster
11:49 rgustafs joined #gluster
11:50 meghanam joined #gluster
11:50 meghanam_ joined #gluster
12:05 andreask joined #gluster
12:05 an_ joined #gluster
12:11 failshell joined #gluster
12:13 Copez joined #gluster
12:13 Copez Hello all
12:13 Copez A little question about the disk configuration
12:13 Copez Someone who could help me out?
12:14 kkeithley_ ,,(hello)
12:14 glusterbot kkeithley_: Error: No factoid matches that key.
12:14 kkeithley_ ,,(hi)
12:14 glusterbot Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
12:14 itisravi joined #gluster
12:15 Copez I've 9 empty disk slots which have to be filled with disks for storage
12:15 Copez should i choose RAID or just ZFS as the underlying storage?
12:15 vpshastry1 joined #gluster
12:16 pk left #gluster
12:17 ira Copez: It depends....
12:17 ira But you should ask on #zfs :)
12:20 Copez No not realy
12:21 Copez Is it wise to choose R0, R5, R6 or R10
12:21 Copez The GlusterFS will be used for KVM storage...
12:21 ira "It depends on what you are doing."  Also "What raid card are you using?"
12:21 Copez It will be software raid
12:22 ira Then use ZFS, if you are going to use ZFS.
12:22 Copez each brick will have 3 replica's
12:22 ira At least that's the typical advice.
12:23 Copez Allright, thanks for that. But which RAID-level would you recommend?
12:23 ira As far as that goes, what are the VMs doing?
12:23 Copez Because RAID and ZFS have the same charastics
12:23 Copez that depends, we have some Windows Terminal Server
12:24 Copez Most servers will be running JAVA-projects
12:24 ira All small IO?
12:24 Copez Most servers are Linux based...
12:24 Copez I think so
12:24 Copez the current IOPS on the SAN are not that many
12:24 ira (small blocks.) How big are the disks you are using?
12:25 Copez Current or in the Gluster?
12:25 ira in Gluster.
12:25 Copez Well we need 10 TB in the Gluster..
12:25 Copez I have 9 empty slots
12:25 ira Rebuild time is a real concern with larger drives...
12:26 Copez Correct, but with 3 replica's we should manage that I think
12:26 Copez The CPU won't be the bottleneck
12:26 ira Are you using gluster within the same box, or cross boxes?
12:26 Copez GlusterFS network is based on 10GBe
12:27 Copez No, 3x GlusterFS and 4x KVM HyperVisor (That talks GlusterFS (No NFS!)to the GlusterFS array)
12:27 ira Use what you are comfortable with the rebuild time of.
12:28 ira That's the big thing on the raid choice.  IF you are going to count on local rebuilds.  And I wouldn't force total machine rebuilds.
12:28 ira on a disk failure. That's a bit harsh.
12:29 Copez How could me manage that problem in your opinion
12:29 Copez I need a storage array of 12 TB
12:29 Copez Money is no problem
12:29 Copez (In certain way then ;))
12:30 Copez The storage must be HA
12:30 Copez Rebuild's should have much impact on performance
12:30 Copez * should'nt
12:30 ira Ok.  I'd start off defining "HA" for your world, and while it sounds pedantic, it is critical, especially with ZFS.
12:31 ira thought ZFS holding gluster, it may not be as badd.
12:31 Copez Well, if one node chrashes the storage must be available with all the VM's / data
12:32 Copez But we can't allow that the storage will be unavailable
12:32 Copez With a disk failure, the Gluster should rebuild in a manner time
12:36 ira Copez: It depends, you might see me do something with RAID 1, and then replica 2 over the top.
12:37 ira That way any individual disk can be rebuilt fast...
12:37 Copez Basicly the same as R10...
12:38 ira Well, it's really a 4 way mirror...
12:38 ira But yes... similar concepts.
12:38 Copez Okay, but I must have the 3x replica's
12:38 Copez so each node would contain the data which is available in the Gluster
12:39 Copez So R10 would be the best?
12:39 DV joined #gluster
12:39 cjanbanan xavih: I think you described my situation pretty good. I just discovered that you wrote about both hosts being alive but the interconnection is lost. Even though only one of the hosts contains an active client the contents of the bricks will lead to split-brain. Before host A reboots, the client on that host writes to the file. When it reboots, the interconnection is lost until the system is up and running again. In the meantime the corresponding redun
12:39 Copez But then the BIG question... would you use SATA or SAS as data-drives?
12:40 ^rcaskey They make SAS drives still?
12:40 Copez hhahaha
12:40 Copez ;)
12:40 Copez You're not scared to use 4 TB SATA's in your array? (humble)
12:41 ^rcaskey Honestly it depends on my workload
12:41 ^rcaskey I've go 2x 2TB drives here on my production VM host and nobody notices becase there is a gig of NVRAM and 64 gigs of ram and for the work we do everything ends up sitting in ram.
12:44 hagarth joined #gluster
12:49 khushildep joined #gluster
12:49 vpshastry1 joined #gluster
12:51 an__ joined #gluster
13:00 Copez Guys, what do you thin of this disk "ST3000DM001"
13:11 haomaiwa_ joined #gluster
13:13 NuxRo Copez: cheap?
13:15 Copez I meant, do they fit as a reliable disk in a Gluster?
13:15 Copez Or do you guys recommend some other drive?
13:15 Copez based on experience
13:16 psyl0n joined #gluster
13:19 Slash joined #gluster
13:21 RameshN joined #gluster
13:22 NuxRo Copez: it doesnt really matter, everybody uses what they have, that's the advantage of gluster
13:23 Copez Hmmm, okay
13:23 NuxRo with cheap drives i prefer to use raid
13:23 Copez well thank you all for the infomation given
13:23 NuxRo and use gluster on top of raid
13:23 Copez (Y)
13:25 plarsen joined #gluster
13:30 chirino joined #gluster
13:33 ^rcaskey the larger my storage operation the less inclined i would be to use raid and the more i'd depend on gluster
13:35 Copez But you need the RAID to get more disk capacity  (?!)
13:35 psyl0n joined #gluster
13:36 cjanbanan Another question: How does glusterfs detect differences in the replicas stored on the bricks of a replicated file system? How does it decide the file to be considered the most recently updated?
13:39 sroy_ joined #gluster
13:39 wica joined #gluster
13:46 smithyuk1 joined #gluster
13:51 B21956 joined #gluster
13:53 an_ joined #gluster
13:56 bala joined #gluster
13:57 primechuck joined #gluster
13:59 spandit joined #gluster
14:00 bala joined #gluster
14:01 khushildep joined #gluster
14:02 bennyturns joined #gluster
14:06 nightwalk joined #gluster
14:07 tdasilva joined #gluster
14:07 benjamin_____ joined #gluster
14:10 Philambdo joined #gluster
14:11 bala1 joined #gluster
14:12 neurodrone__ joined #gluster
14:12 plarsen joined #gluster
14:14 gmcwhistler joined #gluster
14:18 liquidat_ joined #gluster
14:23 xavih cjanbanan: when one host reboots, nothing bad should happen. Gluster can take care of this situation and all other bricks will continue to store volume data
14:23 theron joined #gluster
14:24 xavih cjanbanan: when this host brings online again, gluster will update its brick files with more up to date data. I don't see where is the problem...
14:25 xavih cjanbanan: Gluster uses a set of extended attributes to track how many updates are pending in each brick. If one brick is offline, when it comes online again, the other bricks will have pending write operations for it
14:26 calum_ joined #gluster
14:27 cjanbanan xavih: So you rely on some kind of journal to find the pending write operations?
14:28 xavih cjanbanan: to identify if some brick is outdated, only the extended attributes are used
14:29 xavih cjanbanan: there is no journaling of specific write operations, but it knows that the file needs to be healed and it heals it
14:29 dbruhn joined #gluster
14:30 cjanbanan xavih: Does heal mean that it copies the file from the most recent or does it rely on the journal of the underlying file system to restore sync?
14:33 xavih cjanbanan: it copies the file from the brick that is most up to date in the sense that other bricks does not have pending changes for it. It does not depend on the timestamp of the file
14:34 xavih cjanbanan: if more than one brick has changes for the other, then is where a split-brain happens
14:35 xavih cjanbanan: an split brain can only occur if two hosts are writing simultaneously to the same file but the network has been lost
14:35 xavih cjanbanan: and no quotum is configured
14:35 xavih s/quotum/quorum/
14:35 glusterbot What xavih meant to say was: cjanbanan: and no quorum is configured
14:36 cjanbanan xavih: Can clients open and write to the file while it is being copied or do they have to wait until the file is in sync?
14:38 johnmilton joined #gluster
14:39 xavih cjanbanan: all this is handled internally by gluster. User side applications do not need to worry
14:40 xavih cjanbanan: you can be accessing the file while gluster heals it, there shouldn't be any problem
14:40 jmarley joined #gluster
14:40 jmarley joined #gluster
14:42 edward2 joined #gluster
14:43 cjanbanan xavih: But I can't access it when the connection between the bricks is lost (to avoid split-brain). I'm trying to understand if there is a certain amount of time after the connection is restored that is needed for glusterfs to heal, until my clients can access the files?
14:46 benjamin_____ joined #gluster
14:47 xavih cjanbanan: gluster won't start (or shouldn't start) until the network is up
14:48 xavih cjanbanan: when the network is up and gluster reconnects, the volume is brought online, and then the application can access it
14:48 JoeJulian The self-heal daemon will run automatically. Additionally, if a client tries to access a file that is in need of healing, the self-heal will be run in the background from that client. All that is transparent to the application layer.
14:49 xavih cjanbanan: if the network connection is lost due to a host failure, no split-brain is possible
14:50 xavih cjanbanan: also you have said that only one host will run the application at any single moment, only if one host dies the application will start on another, so I don't see where a split-brain can happen
14:52 xavih cjanbanan: you need two running applications on different hosts with gluster active and the network connection broken to get split-brains
14:52 burn420 @JoeJulian you are the man!
14:56 cjanbanan xavih: It will happen if data reach the local storage on host A but not the replica brick before the restart. At the restart, the connection is lost and the redundant host (B) which contains the replica brick will take over. In this case a client might want to write to the same file before host A is up and running again. This means that it will write to the file before the connection is restored.
14:57 davinder joined #gluster
14:58 vpshastry joined #gluster
14:58 JoeJulian There is no "take over". The client connects to all bricks and manages the replication from there. True, there are still opportunities for split brain under the default configuration. If consistency is more important than availability in your use case, use quorum.
14:58 cjanbanan xavih: The redundant clients will write in sequence to the file, but on different brick as the connection is broken.
15:00 tokik joined #gluster
15:00 cjanbanan JoeJulian: 'take over' refers to my redundant embedded system.
15:00 xavih cjanbanan: the only way this can happen is if host A dies (not restarted in a controlled way). In this scenario there isn't any filesystem that can guarantee that the write has been made
15:01 cjanbanan JoeJulian: I'll investigate this quorum concept to find out if it will solve my problem. Thanks!
15:02 cjanbanan xavih: Yes it dies.
15:03 bugs_ joined #gluster
15:03 tokik joined #gluster
15:04 cjanbanan xavih: That's the reason for the need of a redundant host.
15:05 xavih cjanbanan: I'm not sure excatly how gluster handles this situation, but any write won't start until all bricks are locked
15:05 xavih cjanbanan: this means that any partial write must have happened with locks acquired. In this case the surviving brick knows that there is a pending write
15:06 xavih cjanbanan: what I'm not sure is what it does exactly when this happens. Probably it aborts the write as if it never happened
15:06 xavih cjanbanan: when the other brick comes online again, self-heal will copy the file from host B to host A
15:07 FooBar joined #gluster
15:07 xavih cjanbanan: and the file will be consistent
15:08 spandit joined #gluster
15:08 FooBar I'm seeing a lot of these messages... every 3 seconds) ... any idea what's causing it... how to get rid of them?
15:08 FooBar E [socket.c:2788:socket_connect] 0-management: connection attempt failed (Connection refused)
15:08 xavih cjanbanan: what must be guaranteed is that host B doesn't die before host A has come online and healed the file
15:09 xavih cjanbanan: otherwise a split-brain will surely happen
15:09 cjanbanan xavih: When I simulate this in Virtualbox, running two hosts, I end up with split-brain.
15:09 xavih cjanbanan: are you sure that you don't kill the hosts before they have been healed ?
15:09 wica Is it necessary to stop/start a volume,  when change allow-insecure to on?
15:10 cjanbanan xavih: I simulate the restart by just removing the network connection between the hosts.
15:10 xavih cjanbanan: this is not a valid simulation of a restart since the host is still alive and accessing the volume
15:10 xavih cjanbanan: obviously this will create split-brains
15:10 japuzzo joined #gluster
15:11 xavih cjanbanan: you need to kill the virtual machine to simulate a server crash
15:12 cjanbanan xavih: Is there anything else affecting the file than the client, in this case me issuing an echo command to add a line to the text file?
15:12 xavih cjanbanan: anything writing data to the volume can generate an split-brain on the file modified while the connection is lost
15:13 xavih cjanbanan: if you cut the connection between the hosts and from both of them you write data to the same file, when the connection is restablished, gluster will detect an split-brain
15:14 FooBar And is there some way to limit the speed / load of the self-heal-deamon ... my system completely stalled when a gluster-node came back into the cluster.... self-heal causes it to be 100% loaded for hours
15:14 cjanbanan xavih: But in this simulation no-one else is aware of this text file, so I'm the only client.
15:15 xavih cjanbanan: if you write to this file only from one client, say host A, and host B does not touch this file, then there won't be any problem
15:16 xavih cjanbanan: anyway I think you are not testing the scenario you want to test...
15:16 lmickh joined #gluster
15:17 Derek_ joined #gluster
15:18 glusterbot New news from newglusterbugs: [Bug 1066511] Enhancement - glusterd should be chkconfig-ed on <https://bugzilla.redhat.com/show_bug.cgi?id=1066511> || [Bug 1066529] 'op_ctx modification failed' in glusterd log after gluster volume status <https://bugzilla.redhat.com/show_bug.cgi?id=1066529>
15:18 cjanbanan xavih: The crucial point in this simulation is that the file on host A is changed but the restart occurs before those changes are sent to host B. That's easier to accomplish by disconnection the network than to kill the virtal machine in the right time.
15:19 xavih cjanbanan: yes, but you are trying to test an internal gluster behavior using external tools. This doesn't lead to the same point
15:20 xavih cjanbanan: you are not really testing the case in which gluster has initiated a write but has only reached the local host, not the remote one
15:22 xavih cjanbanan: I don't know how you can test this scenario without debugging or modifying source code
15:22 rgustafs joined #gluster
15:23 xavih cjanbanan: or you can repeatedly kill one host in the middle of a write and see what happens. Statistically some of the writes will be caught in the middle.
15:23 cjanbanan xavih: Yes, but I thought this would be close enough and in a more controlled manner than relying on luck. :-)
15:25 xavih cjanbanan: I'm sorry, but it's not the same. Gluster does a lot of things to ensure that data is consistent. If you cut the connection and write on purpose to the file, all these checks are bypassed
15:26 dusmant joined #gluster
15:26 cjanbanan xavih: The echo command seems like a more predictable client than a process running on its own.
15:26 xavih cjanbanan: any test you can do on the user side won't reproduce the same conditions that will happen if a server dies in the middle of a write
15:28 cjanbanan xavih: OK, thanks. You've been very helpful. I need to sit down and take a second look at this. :-)
15:29 xavih cjanbanan: if you have debugging skills, you could place a breakpoint somewhere in the write path and kill the process after the write has reached the local host but not the remote one
15:29 xavih cjanbanan: I don't see any other way to test this particular scenario in a reproducible way
15:31 cjanbanan xavih: I downloaded the source code, but I realize there's a lot going on because it's a bit hard to know where to investigate this.
15:32 dusmant joined #gluster
15:32 davinder2 joined #gluster
15:32 sroy joined #gluster
15:32 cjanbanan xavih: I'll be glad to debug it though, if I could just get a grip of the source code.
15:32 ProT-0-TypE joined #gluster
15:32 nightwalk joined #gluster
15:33 jikz joined #gluster
15:33 xavih cjanbanan: the loop that sends the write to the bricks is at xlators/afr/src/afr-inode-write.c at line ~230
15:35 xavih it's a bit complicated, because you should block one of the writes but let the other continue, and kill the process before all write processing is done (otherwise it will happen the same than your tests)
15:35 cjanbanan xavih: As I mentioned previously, the easy solution for my purpose would be to always copy the file stored on the local host without figuring out which one is the most recent. That would eliminate the risk of a split-brain. I realize that this would lead to more data loss than you solution, but in my system it would actually be acceptable (as long as the disks are kept in sync).
15:35 dbruhn does anyone know what the force on rebalance on 3.3 does
15:36 JoeJulian dbruhn: rebalance won't move a file from a less-full to a more-full brick. Force overrides that.
15:36 xavih dbruhn: I think that it forces the rebalance of files even if the target brick has less space
15:36 _pol joined #gluster
15:36 dbruhn Thanks, that exactly what I need
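(For reference, the rebalance variants being discussed, with a hypothetical volume name:)
    gluster volume rebalance myvol start force   # also migrates files onto bricks with less free space than the source
    gluster volume rebalance myvol status
    gluster volume rebalance myvol stop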
15:36 cjanbanan xavih: Thanks for the directions. I'll take a second look at the src. :-)
15:37 xavih cjanbanan: you're welcome :)
15:38 rotbeard joined #gluster
15:39 vpshastry joined #gluster
15:39 vpshastry left #gluster
15:40 dbruhn anyone know how to kill an already running rebalance?
15:41 dbruhn stop isn't working
15:43 xavih dbruhn: If stop doesn't work, I don't know any other way that restarting the volume...
15:43 rossi_ joined #gluster
15:48 rpowell joined #gluster
15:50 liquidat joined #gluster
15:53 Derek_ morning
15:53 Derek_ have a geo-replication problem with Gluster 3.2.6
15:54 Derek_ we're seeing "returning as transport is already disconnected"
15:55 Derek_ My question is would you spend time troubleshooting this or start planning for an upgrade?
15:55 DV joined #gluster
15:56 daMaestro joined #gluster
15:57 an_ joined #gluster
16:02 acalvo joined #gluster
16:03 acalvo In a multi-node cluster performing geo-replication against a path, what happens if the node doing the geo-replication dies and the same jobs is fired up from another node in the cluster?
16:06 mtanner_ joined #gluster
16:09 SteveCooling Hi guys. I'm having trouble getting georeplication working on one volume. I have one volume "geotest" that works just fine, but the one with the actual data on seems to just sync up changes once and then replication "hangs", eventually spewing "failed to sync file." into the log. All I can find in the docs about this is rsync issues, ...which cannot be the case since it works on the other volume??
16:10 SteveCooling to be precise, i have two volumes with identical config. one is almost empty, and the other has lots of data in it. the "empty" one geosyncs just fine, and the other can't keep up at all. everything reports "OK" and no clues in the logs. how do I debug?
16:11 robothands joined #gluster
16:12 jag3773 joined #gluster
16:13 Derek_ I'm having similar issues with my geo-replication
16:15 psyl0n joined #gluster
16:18 nixpanic joined #gluster
16:18 nixpanic joined #gluster
16:20 Derek__ joined #gluster
16:22 acalvo SteveCooling, check rsync versions
16:22 acalvo do not assume that a test volume would be the same as an actual working volume
16:22 acalvo happen to me yesterday
16:23 vpshastry joined #gluster
16:23 acalvo is it geo volume to volume o volume to path?
16:25 smithyuk1 Hi all, upgraded to 3.4.2 today from 3.3.0; in the process of rebalancing but if we check status it just says localhost for all the bricks bar one. Same for all servers with a seemingly random exception where it shows IP on each. Any ideas why this might be?
16:29 JoeJulian smithyuk1: please paste what you're seeing to fpaste.org and post the link here.
16:29 theron joined #gluster
16:30 wushudoin joined #gluster
16:32 kaptk2 joined #gluster
16:38 smithyuk1 JoeJulian: http://ur1.ca/gndd9 thanks
16:38 glusterbot Title: #78278 Fedora Project Pastebin (at ur1.ca)
16:38 jobewan joined #gluster
16:39 vpshastry joined #gluster
16:40 JoeJulian Oh... a rebalance status... that makes more sense. I was thinking you were saying volume status which would have been really odd.
16:40 kl4m joined #gluster
16:40 smithyuk1 Oh yeah, sorry probably wasn't clear
16:40 JoeJulian I believe that says that the host you ran the rebalance status on is "localhost". Do you get different output on a different server?
16:41 rossi_ joined #gluster
16:41 smithyuk1 Nope, same on all servers except for one IP at the bottom
16:42 smithyuk1 Which is seemingly random?
16:43 JoeJulian rebalance is supposed to share the load now so it makes a tiny bit of sense. I also suspect that a "peer status" from multiple servers will show one with an ip address instead of a hostname.
16:44 JoeJulian Could it be resolving the local ip address to localhost in /etc/hosts?
16:45 smithyuk1 127.0.0.1 resolves to that, none of the other bricks do though
16:45 smithyuk1 They are all at remote IPs
16:46 smithyuk1 All looks correct in peer status
16:49 bennyturns joined #gluster
16:52 cp0k joined #gluster
16:53 cp0k Hey gang, so I mentioned yesterday that I am getting an exit code 146 in response to commands such as 'gluster volume status'...someone asked if I had glusterfsd running, it seems I do not. Is this a problem
16:55 cp0k I take that back, the init script tells me that glusterfsd is not running...however ps -auxww|grep gluster shows that it is infact running
16:55 Mo__ joined #gluster
16:55 JoeJulian smithyuk1: Just tested it on my end and the server I run the rebalance status command on shows as localhost for me and the other hosts show their hostnames. If rebalance wasn't working, I'd consider digging in to the source to see what's up, but I don't immediately see anything that warrants that.
16:55 cp0k I am about to add new bricks to a production env and want to make sure I will be able to issue the rebalance command succesfully
16:56 semiosis cp0k: ,,(processes)
16:56 glusterbot cp0k: The GlusterFS core uses three process names: glusterd (management daemon, one per server); glusterfsd (brick export daemon, one per brick); glusterfs (FUSE client, one per client mount point; also NFS daemon, one per server). There are also two auxiliary processes: gsyncd (for geo-replication) and glustershd (for automatic self-heal). See http://goo.gl/F6jqx for more information.
16:56 semiosis fyi
16:57 smithyuk1 JoeJulian: it's probably worth nothing that it happened both in our QA environment and production. seems to rebalance okay anyhow so i will look into it when i get some time. thanks for your help
16:59 cp0k I see, so adding the new bricks should prompt Gluster to start the rebalance / heal automatically? or I must do this manually with 'gluster volume rebalance vol_name start' ?
16:59 cp0k ( I am running 3.4.2
16:59 KyleG joined #gluster
16:59 KyleG joined #gluster
17:01 saurabh joined #gluster
17:01 JoeJulian manually
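(A sketch of the expansion, with hypothetical node and brick names; on a replicated volume new bricks have to be added in multiples of the replica count:)
    gluster volume add-brick myvol newnode1:/bricks/b1 newnode2:/bricks/b1
    gluster volume rebalance myvol start
    gluster volume rebalance myvol status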
17:02 JoeJulian hmm, there is no error 146... Was that the right exit code, or a typo?
17:03 cp0k JoeJulian: I believe that is the correct error code exit, and not a typo...I am confirming
17:03 cp0k JoeJulian: I just executed 'gluster volume status' on the two new storage nodes, and it came back with a result right away.
17:03 JoeJulian mips?
17:03 cp0k JoeJulian: On the existing storage nodes, that is not the case
17:04 cp0k mips?
17:04 sputnik13 joined #gluster
17:04 cp0k # gluster volume status
17:04 cp0k # echo $?
17:04 cp0k 146
17:05 cp0k I was thinking of restarting all gluster* procs on the storage nodes...do you think that would help?
17:05 JoeJulian I see an asm-mips/errno.h has 146 as ECONNREFUSED. Assuming that's right, can you telnet to port 24007?
17:06 jobewan joined #gluster
17:06 cp0k after upgrading to 3.4.2, I saw alot of disk activity, as if Gluster was doing a 'sanity' check on its files
17:06 cp0k JoeJulian: let me check
17:06 vpshastry joined #gluster
17:07 lpabon joined #gluster
17:08 JoeJulian cp0k: also, satisfy my curiosity with a "uname -a"
17:09 acalvo JoeJulian, quick question: can I point different cluster nodes to replicate via geo-replication to the same destination?
17:09 cp0k JoeJulian: I am able on port 24007 to all storage nodes
17:10 JoeJulian acalvo: Odd, but I don't see any reason why not off the top of my head.
17:10 JoeJulian cp0k: It was building the .glusterfs tree.
17:10 JoeJulian @lucky what is this new .glusterfs tree
17:10 glusterbot JoeJulian: http://joejulian.name/blog/what-is-this-new-glusterfs-directory-in-33/
17:11 zerick joined #gluster
17:11 cp0k JoeJulian: the telnet prompted to gluster building the .glusterfs tree? I don't follow
17:12 JoeJulian cp0k: "after upgrading to 3.4.2, I saw alot of disk activity, as if Gluster was doing a 'sanity' check on its files"
17:13 cp0k JoeJulian: ahh, gotcha!
17:13 robothands left #gluster
17:15 cp0k JoeJulian: It appears that Gluster is still rebuilding the .glusterfs tree, as I am still seeing high disk activity on 2 of the 4 existing storage nodes. With this being a 189TB system that is currently 90% full
17:15 JoeJulian cp0k: truncate, cause the error, and paste /var/log/glusterfs/cli.log . Also paste the part of /var/log/etc-glusterfs-glusterd.vol.log surrounding that test.
17:15 cp0k JoeJulian: do you recommend I wait for the tree rebuild to complete first prior to adding in the new storage nodes?
17:16 JoeJulian cp0k: I wouldn't wait.
17:16 JoeJulian brb...
17:17 jag3773 joined #gluster
17:18 rwheeler joined #gluster
17:20 cp0k JoeJulian: okay
17:23 vpshastry joined #gluster
17:35 sarkis joined #gluster
17:39 sarkis y
17:43 Matthaeus joined #gluster
17:44 rossi_ joined #gluster
17:47 davinder joined #gluster
17:50 rossi__ joined #gluster
17:50 primechu_ joined #gluster
17:53 an__ joined #gluster
17:53 semiosis @seen _dist
17:53 glusterbot semiosis: _dist was last seen in #gluster 3 days, 20 hours, 59 minutes, and 16 seconds ago: <_dist> (x+1 I should say, where x is current bricks)
17:53 semiosis where'd he go?
17:54 wrale JoeJulian: looking further into the split-horizon DNS thing.  It looks like a single instance of BIND is capable of split-horizon DNS, by supplying records based on the source network of a given request.  In setting up split-horizon for GlusterFS, have you deployed multiple layers of BIND or did you go with a single (internal) layer using the 'view clause' (reference: http://www.zytrax.com/books/dns/ch6/index.html#split-view )?
17:54 glusterbot Title: Chapter 6 DNS Sample Configurations (at www.zytrax.com)
17:55 JoeJulian wrale: single layer, but I also only have 1 network for my production configuration.
17:55 wrale eh.. rather.. which do you think would be better, given multi-homed DNS server hardware
17:55 JoeJulian If I were doing it, I would implement views.
17:55 wrale JoeJulian: Glad to hear it.  I am not a big fan of sprawl . :)
17:55 JoeJulian amen
17:56 wrale Thanks again
17:56 an__ joined #gluster
17:57 an__ joined #gluster
18:00 sputnik13 joined #gluster
18:07 spiekey joined #gluster
18:07 an___ joined #gluster
18:07 spiekey Hello!
18:07 glusterbot spiekey: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
18:08 KyleG left #gluster
18:12 an__ joined #gluster
18:16 spiekey i am benchmarking glusterfs with replication over 3GBit (bonded). with dd i am getting write speed of about 230MB/sec, but with bonnie i only get 67MB/sec
18:16 spiekey any ideas on this?
18:17 JoeJulian My usual idea is that you're trying to compare apples to orchards.
18:17 spiekey why?
18:18 burn420 joined #gluster
18:18 JoeJulian What are you trying to accomplish? Does one client running dd or bonnie represent a good approximation of your end goal?
18:21 spiekey they should be equal
18:21 spiekey why should it differ?
18:21 psyl0n joined #gluster
18:21 spiekey maybe in random writes, but not in simple block writes
18:21 JoeJulian tcp encapsulation for one...
18:22 spiekey oh, i am writing onto my mounted glusterfs share
18:22 spiekey so its always replicating. with the dd and bonnie test
18:23 rpowell1 joined #gluster
18:24 JoeJulian Yep, replication is synchronous*.
18:26 KyleG1 joined #gluster
18:26 spiekey yes
18:28 cp0k JoeJulian: Do you see any harm in me adding the new storage bricks while Gluster 3.4.2 is still rebuilding the .glusterfs tree? can the rebuild also be the reason I am getting that 146 exit code?
18:28 cp0k JoeJulian: My fear is that I'll be able to add the bricks but it wont allow me to start the rebalance, instead just exit with that code 146
18:29 spiekey JoeJulian: how should i messure the write performance of my replication gluster?
18:29 JoeJulian cp0k: I don't think so, but I wouldn't do changes (even if it would let you) with errors happening. How about those logs I asked for?
18:29 JoeJulian spiekey: I'm a big fan of testing actual use cases.
18:30 cp0k JoeJulian: you said 'truncate, cause the error, and paste /var/log/glusterfs/cli.log . Also paste the part of /var/log/etc-glusterfs-glusterd.vol.log surrounding that test.
18:30 rossi_ joined #gluster
18:30 cp0k '
18:30 cp0k JoeJulian: I am not 100% sure what you are asking me to truncate
18:30 JoeJulian /var/log/glusterfs/cli.log
18:30 XpineX joined #gluster
18:30 cp0k JoeJulian: ah, okay, one moment
18:33 cp0k JoeJulian: Here is the result of a fresh cli.log file:
18:33 cp0k [2014-02-18 18:33:26.377884] W [rpc-transport.c:175:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"
18:33 cp0k [2014-02-18 18:33:26.378691] I [socket.c:3480:socket_init] 0-glusterfs: SSL support is NOT enabled
18:34 cp0k [2014-02-18 18:33:26.378725] I [socket.c:3495:socket_init] 0-glusterfs: using system polling thread
18:34 cp0k [2014-02-18 18:33:26.712096] E [cli-rpc-ops.c:6026:gf_cli_status_volume_all] 0-cli: status all failed
18:34 cp0k [2014-02-18 18:33:26.712189] I [input.c:36:cli_batch] 0-: Exiting with: -2
18:42 aixsyd joined #gluster
18:42 XpineX joined #gluster
18:42 RedShift joined #gluster
18:43 cp0k JoeJulian: Is it possible that Gluster may be busy with its background operation of rebuilding the .glusterfs tree and therefor not blocking somehow the information I am requesting?
18:43 cp0k JoeJulian: I do not see this behavior in the staging environment where I tested the upgrade prior to doing it in production....granted the staging env has a total data set of 17GB
18:44 cfeller joined #gluster
18:44 JoeJulian Sure, it's possible. Seems unlikely though.
18:44 JoeJulian Especially since the error code you mentioned does not exist in the source.
18:45 cp0k JoeJulian: ok, any suggestions? maybe I should try restarting glusterd on the storage nodes one at a time? or is that not recommended during the rebuild?
18:47 JoeJulian paste (into a pastebin like fpaste.org) the part of /var/log/etc-glusterfs-glusterd.vol.log around 2014-02-18 18:33:26
18:47 JoeJulian Unless it's just one or two lines...
18:49 andreask joined #gluster
18:49 cp0k sec
18:50 cp0k http://fpaste.org/78337/74945313/
18:50 glusterbot Title: #78337 Fedora Project Pastebin (at fpaste.org)
18:51 XpineX joined #gluster
18:51 cp0k JoeJulian: Seems one of my peers is holding onto a lock
18:53 cp0k JoeJulian: uuid of the peer asking for the lock is the same client where I am executing the 'gluster volume status' command
18:56 P0w3r3d joined #gluster
18:57 diegows joined #gluster
18:57 JoeJulian Ok, yes. I would stop all glusterd and start them again.
18:58 cp0k JoeJulian: okay, is it okay if I leave glusterd running on all the clients during this time?
18:59 an___ joined #gluster
19:01 cp0k JoeJulian: seems someone else had this same issue: http://www.gluster.org/pipermail/g​luster-users/2013-June/036186.html
19:01 glusterbot Title: [Gluster-users] held cluster lock blocking volume operations (at www.gluster.org)
19:06 rwheeler joined #gluster
19:07 cp0k JoeJulian: so seems like I have nodes which are itw own peer
19:08 JoeJulian Wierd. Should be able to cure that, theoretically, by stopping glusterd, deleting the peer with each server's own uuid from /var/lib/glusterd/peers and starting glusterd again.
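(A sketch of that cleanup on the affected server; the file name under peers/ is hypothetical and must match the server's own uuid as shown in glusterd.info:)
    service glusterd stop
    cat /var/lib/glusterd/glusterd.info      # this server's own UUID
    ls /var/lib/glusterd/peers/              # one file per known peer, named by uuid
    rm /var/lib/glusterd/peers/<own-uuid>    # remove only the entry matching the local uuid
    service glusterd start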
19:09 cp0k JoeJulian: yep, doing exactly that now
19:09 cp0k JoeJulian: the weird part is this is happening on nodes that are not their own peer as well
19:09 cp0k JoeJulian: it's all across the board for me, but I will remedy the issue and see if it helps any
19:12 psyl0n joined #gluster
19:13 cp0k odd: the uuid of the machine I am looking for does not exist in /var/lib/glusterd/peers
19:15 JoeJulian Oops, missed this question: "[10:58] <cp0k> JoeJulian: okay, is it okay if I leave glusterd running on all the clients during this time?" Yes. Restarting the management daemon has no effect on the running bricks or clients.
19:15 cp0k okay, I will try restarting glusterd on the storage nodes, since otherwise the behavior is very odd
19:22 cp0k JoeJulian: I've restarted glusterd on the storage nodes, after doing so I tailed /var/log/gluster/* and noticed messages like this across the nodes
19:22 cp0k http://fpaste.org/78353/92751223/
19:22 glusterbot Title: #78353 Fedora Project Pastebin (at fpaste.org)
19:22 KyleG1 left #gluster
19:23 JoeJulian @split brain
19:23 glusterbot JoeJulian: To heal split-brain in 3.3+, see http://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/ .
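(The short version of the procedure the bot links to, hedged: the volume name gv0 is a placeholder, and the brick-side cleanup of the bad copy is described in the linked post rather than shown here.)

    gluster volume heal gv0 info split-brain   # list files the self-heal daemon gave up on
    # on the brick holding the bad copy, remove that file and its .glusterfs
    # gfid hard link (see the linked post), then trigger a heal:
    gluster volume heal gv0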
19:23 LoudNoises joined #gluster
19:29 cp0k JoeJulian: before I can fix the split brain, it seems I first need to get to the bottom of removing the peer from the localhost... otherwise my 'gluster volume heal' command will not work
19:29 wrale JoeJulian: thought you might be interested in knowing that dnsmasq has a seemingly simple way to handle the split-horizon problem.  it seems that you can start the daemon using '--localise-queries'... this purportedly does the following: 'Return answers to DNS queries from /etc/hosts which depend on the interface over which the query was received. If a name in /etc/hosts has more than one address associated with it, and at least one of those addre
19:29 wrale sses is on the same subnet as the interface to which the query was sent, then return only the address(es) on that subnet. This allows for a server to have multiple addresses in /etc/hosts corresponding to each of its interfaces, and hosts will get the correct address based on which network they are attached to. Currently this facility is limited to IPv4. '
19:29 JoeJulian Interesting
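(A tiny illustration of what wrale is describing, with made-up addresses: the same hostname listed once per interface in /etc/hosts, and dnsmasq started with the option so each client gets the address on its own subnet.)

    # /etc/hosts on the dual-homed server
    10.0.144.10    storage1
    192.168.1.10   storage1

    # dnsmasq answers from /etc/hosts, preferring the address on the querying subnet
    dnsmasq --localise-queries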
19:30 nikk JoeJulian and cp0k - did you ever have a chance to look at https://bugzilla.redhat.com/show_bug.cgi?id=1065551
19:30 glusterbot Bug 1065551: medium, unspecified, ---, kparthas, NEW , Unable to add bricks to replicated volume
19:30 nikk we were talking last week about it
19:35 rpowell joined #gluster
19:37 REdOG damn it, again, Invalid argument
19:37 REdOG except this time There's write failed also
19:38 REdOG I cannot seem to create a sound volume
19:40 REdOG these aren't nearly as loud
19:40 REdOG I guess that's something positive
19:41 REdOG instead of 500 or so a second it's like 2 every 2 or 3 seconds
19:41 REdOG [posix.c:2135:posix_writev] 0-aspartameC-posix: write failed: offset 519163904, Invalid argument
19:42 REdOG [server-rpc-fops.c:1439:server_writev_cbk] 0-aspartameC-server: 124235: WRITEV 0 (1be30eca-cb3e-499d-9e9b-ab5950e977ec) ==> (Invalid argument)
19:43 JoeJulian EINVAL
19:43 JoeJulian fd is attached to an object which is unsuitable for writing; or the file was opened with the O_DIRECT flag, and either the address specified in buf, the value specified in count, or the current file offset is not suitably aligned.
19:44 JoeJulian Not sure what that actually means wrt your error though.
19:46 spiekey JoeJulian: the bonnie test in my virtual machine (on top of glusterfs) looks very good.
19:46 REdOG the second host of the replica doesn't have this issue so i suspect it's something to do with the zpool
19:46 spiekey maybe bonnie is a bad tool to benchmark glusterfs directly
19:46 JoeJulian spiekey: Are you using libgfapi or fuse?
19:47 REdOG I get the same kind of errors whether I use a dataset or a zvol
19:48 spiekey JoeJulian: i don't know. how can i tell?
19:49 JoeJulian If you're using qemu, are you accessing your image using gluster://volume/path/file.img or through a client mount?
19:50 spiekey i am using qemu then
19:50 REdOG protocol='gluster'
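(For context, the two access paths being contrasted: a FUSE client mount versus qemu opening the volume directly over libgfapi. The hostname, volume, and image path below are placeholders.)

    # FUSE: the guest image sits on a mounted GlusterFS client
    mount -t glusterfs server1:/gv0 /mnt/gv0
    qemu-system-x86_64 -drive file=/mnt/gv0/vms/guest.img,format=raw,cache=none ...

    # libgfapi: qemu speaks the Gluster protocol directly, no FUSE in the path
    qemu-system-x86_64 -drive file=gluster://server1/gv0/vms/guest.img,format=raw,cache=none ...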
19:53 khushildep joined #gluster
19:53 REdOG i didn't try zvol with gluster mount. dataset & gluster mount, dataset & libgfapi, zvol & libgfapi all give invalid argument errors
19:57 spiekey JoeJulian: any idea why the bonnie benchmark is bad when i use fuse? (mounted by fstab)
19:59 ctria joined #gluster
20:00 jag3773 joined #gluster
20:01 JoeJulian Because bonnie's inefficient over a network connection.
20:04 gdubreui joined #gluster
20:05 gdubreui joined #gluster
20:07 spiekey because of the small write operations?
20:07 _pol_ joined #gluster
20:08 JoeJulian probably. Tbh, I've run bonnie once about 10 years ago and realized it was pretty useless for real life benchmarking back then (imho).
20:10 JoeJulian What really matters is whether or not your system design is sufficient to meet your goals. In most cases where your goals involve an entire clustered system as opposed to a single server, there are no adequate benchmarks besides good testing.
20:17 rossi_ joined #gluster
20:24 ndk joined #gluster
20:27 REdOG JoeJulian: I think this maybe my problem: https://bugzilla.redhat.com/show_bug.cgi?id=748902
20:27 glusterbot Bug 748902: unspecified, unspecified, ---, virt-maint, CLOSED DEFERRED, qemu fails on disk with 4k sectors and cache=off
20:27 JoeJulian Oooh
20:28 JoeJulian I should have guessed at that one. :(
20:28 REdOG well your pointer got me there
20:28 REdOG so tks
20:28 JoeJulian Oh, good.
20:28 REdOG EINVAL
20:28 REdOG back to the lab!
20:29 JoeJulian Yes master...
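(If that bug is indeed the culprit, the usual check and workaround look roughly like this: confirm the backing device really reports 4K sectors, then avoid O_DIRECT by not using cache=none on that disk. The zvol path is a placeholder.)

    blockdev --getpbsz /dev/zvol/tank/vmdisk    # physical sector size (4096 on 4K disks)
    blockdev --getss   /dev/zvol/tank/vmdisk    # logical sector size
    # in the libvirt disk definition, switch the cache mode away from 'none':
    #   <driver name='qemu' type='raw' cache='writeback'/>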
20:32 _pol joined #gluster
20:33 nikk JoeJulian: that problem i had the other day isn't limited to rhel7 btw, i'm reproducing it on centos 6.5
20:36 JoeJulian oh good
20:37 nikk i did find something out though
20:38 nikk so the issue was: if i have a replica volume with two servers and two bricks (one brick per server) and i try to add two more bricks (one per server, two new servers), it fails
20:39 nikk i *can* however add an additional brick to that volume from the original two servers
20:39 nikk just not two new servers
20:41 nikk very weird
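(For the record, the failing sequence nikk is describing, with placeholder names; on a replica 2 volume the new bricks go in as a pair without changing the replica count.)

    # from one of the existing servers
    gluster peer probe server3
    gluster peer probe server4
    gluster volume add-brick gv0 server3:/bricks/gv0/brick1 server4:/bricks/gv0/brick1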
20:47 psyl0n joined #gluster
20:48 JoeJulian I still think it has to do with the brick being on the root mount. Make a big image file with a filesystem and mount it through loop and see if it will allow that to be added.
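(JoeJulian's loop-mount experiment spelled out as a sketch; sizes and paths are arbitrary, and the same steps would be repeated on each new server so the bricks can be added as a pair.)

    truncate -s 10G /var/tmp/brick.img
    mkfs.xfs /var/tmp/brick.img
    mkdir -p /mnt/loopbrick
    mount -o loop /var/tmp/brick.img /mnt/loopbrick
    mkdir /mnt/loopbrick/brick1
    gluster volume add-brick gv0 server3:/mnt/loopbrick/brick1 server4:/mnt/loopbrick/brick1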
20:49 nikk i'll add another disk to these VMs
20:49 JoeJulian or that
20:49 nikk :)
20:59 nikk nope
20:59 nikk not it
20:59 JoeJulian Damn... I was sure....
20:59 nikk new disk entirely, mounted on /mnt/disk2, made a directory /mnt/disk2/gluster
21:00 nikk same error :(
21:01 nikk erm.. i left something off, it worked.. hang on
21:05 rossi_ joined #gluster
21:06 _pol joined #gluster
21:09 nikk ok not the issue with the other one it seems
21:09 nikk what i was doing was having a directory /gluster/vol1 as a brick instead of /gluster/vol1/brick1
21:10 nikk lemme try adding another physical disk to the rhel hosts.. see if that fixes it
21:11 B21956 joined #gluster
21:13 jag3773 joined #gluster
21:15 nikk yep that was it
21:15 nikk so the file system was actually the cause
21:15 _pol joined #gluster
21:15 nikk i think that the lack of an error message is a bug though
21:16 Matthaeus joined #gluster
21:16 JoeJulian Definitely a bug.
21:17 JoeJulian As far as I'm concerned, any failure via the cli that requires us to search through logs to find the problem (or even worse through source code) is a bug. Errors should be reported when encountered by a cli command.
21:18 nikk alright, i'll update bugzilla with my findings
21:18 nikk do you want it left open so you can explore the cli output bug?
21:18 nikk leave*
21:19 JoeJulian yes. It's a valid bug and they need to fix it.
21:19 nikk ok
21:20 nikk i appreciate your help, i don't even know if you're part of the dev team or just a useful citizen :]
21:20 nikk :%s/useful/helpful/g
21:20 * JoeJulian is a ,,(volunteer)
21:20 glusterbot A person who voluntarily undertakes or expresses a willingness to undertake a service: as one who renders a service or takes part in a transaction while having no legal concern or interest or receiving valuable consideration.
21:20 nikk ah ha
21:22 JoeJulian It's nice. My $dayjob recognizes the value of the time I spend helping out in here and gives me lots of leeway.
21:23 nikk i know what you mean
21:29 srsc joined #gluster
21:30 srsc hi all, looking for advice on getting gluster mounts over ipoib to mount on boot on debian wheezy
21:31 JoeJulian semiosis: ^
21:31 JoeJulian I won't give my advice on that...
21:31 Matthaeus Is your advice of the "Doctor, it hurts when I do that" variety?
21:32 JoeJulian switch to centos. ;)
21:32 * srsc hopefully awaits a second opinion
21:34 srsc client log here: http://fpaste.org/78394/92759235/
21:34 glusterbot Title: #78394 Fedora Project Pastebin (at fpaste.org)
21:35 srsc looks like ipoib just isn't up and running by the time gluster attempts to mount.
21:35 Matthaeus srsc, do you have the _netdev option specified in /etc/fstab?
21:36 JoeJulian They don't use that.
21:36 srsc Matthaeus: i do.
21:36 Matthaeus Then add a mount -a -o _netdev (* verify that this works, I'm shooting from the hip) to /etc/rc.local
21:37 JoeJulian _netdev is ignored anyway. It can safely be omitted on .deb distros. ... or any distro for that matter when it's used in rc.local.
21:38 srsc fstab line: http://fpaste.org/78397/59413139/
21:38 glusterbot Title: #78397 Fedora Project Pastebin (at fpaste.org)
21:38 Matthaeus JoeJulian, I wasn't intending it to be used by the system, but rather as a way for his mount command to target only the network filesystems.
21:38 srsc proc tree calling /sbin/mount.gluster: http://fpaste.org/78398/75943413/
21:38 glusterbot Title: #78398 Fedora Project Pastebin (at fpaste.org)
21:38 XpineX joined #gluster
21:39 JoeJulian Oh, right, sorry. Missed that.
21:41 Matthaeus Of course, this assumes that your ib is up and functional by the time rc.local is run.  I've never had any such hardware to play with, so YMMV.
21:42 srsc right, i was looking for a way to make ib setup a prerequisite for network mounting, haven't found it yet though
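(The combination being discussed, as a hypothetical: a _netdev-tagged fstab entry plus a late retry from rc.local. Note that mount's fstab-option filter is the capital -O form; lowercase -o would just pass _netdev through as a mount option.)

    # /etc/fstab
    server1:/gv0  /mnt/gv0  glusterfs  defaults,_netdev  0 0

    # /etc/rc.local -- retry only the fstab entries tagged _netdev
    mount -a -O _netdev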
21:43 khushildep joined #gluster
21:45 JoeJulian I would have expected that the network upstart job would require ib be up in order to start the network, in which case gluster should wait for both.
21:45 Matthaeus Is ib configured in /etc/network/interfaces?
21:47 srsc Matthaeus: yup, it is: http://fpaste.org/78400/92760059/
21:47 glusterbot Title: #78400 Fedora Project Pastebin (at fpaste.org)
21:48 srsc and the ib0 interface comes up and is properly configured...eventually
21:53 cp0k JoeJulian: I found out that my /etc/glusterfs/ files were not consistent across all my storage nodes... is there any harm in modifying /etc/glusterfs/glusterfsd.vol directly?
21:53 JonnyNomad joined #gluster
21:53 Matthaeus srsc: You can also add mount -a -o _netdev as an up option in /etc/network/interfaces, then.
21:54 Matthaeus srsc: http://fpaste.org/78404/27603851/
21:54 glusterbot Title: #78404 Fedora Project Pastebin (at fpaste.org)
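(Roughly what that looks like in /etc/network/interfaces on wheezy, with a made-up address; the post-up hook only fires once ib0 is actually configured, and the modprobe line is only needed if the IPoIB module isn't loaded elsewhere.)

    auto ib0
    iface ib0 inet static
        address 10.44.0.10
        netmask 255.255.255.0
        pre-up modprobe ib_ipoib
        post-up mount -a -O _netdev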
21:55 dbruhn joined #gluster
21:57 an__ joined #gluster
22:01 JoeJulian cp0k: Sure, that's fine. You'll want them all to match. You'll have to restart glusterd to take the changes, of course.
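(A quick way to spot the mismatch cp0k found and pick up the change, assuming a sysvinit-style service; the checksum comparison is just a convenience for eyeballing which nodes differ.)

    # on each storage node: compare, then restart the management daemon
    md5sum /etc/glusterfs/glusterd.vol /etc/glusterfs/glusterfsd.vol
    service glusterd restart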
22:01 dbruhn JoeJulian, does a force on a rebalance correct any layout issues too? I am assuming it does, just looking for peace of mind
22:03 JoeJulian dbruhn: correction layout issues shouldn't /require/ a force, but a forced rebalance won't not fix them if that's what you're asking.
22:03 JoeJulian s/correction/correcting/
22:03 glusterbot What JoeJulian meant to say was: dbruhn: correcting layout issues shouldn't /require/ a force, but a forced rebalance won't not fix them if that's what you're asking.
22:03 Oneiroi joined #gluster
22:04 dbruhn You got it. I have some bricks that are 100% full and some that are nearly empty, and I had a rebalance go wonky, so I restarted with a force. Kind of surprised at how little data moved around, so I just wanted to make sure it wasn't due to the layout being wonky.
22:06 cp0k JoeJulian: Thanks, I also see them mismatching on my clients...hopefully that is the reason for the exit 196
22:07 cp0k JoeJulian: I say this because my glusterfsd.vol has the following line in it
22:07 cp0k option auth.addr.brick.allow 10.0.144.*,10.0.145.*
22:07 JoeJulian orly
22:08 nightwalk joined #gluster
22:09 cp0k JoeJulian: just a guess :)
22:09 JoeJulian cp0k: would make sense. I wonder if it's trying to connect over 127.0.0.1...
22:12 cp0k JoeJulian: it is, I saw that in the logs
22:12 JoeJulian Well there you have it.
22:13 cp0k connect(5, {sa_family=AF_INET, sin_port=htons(24007), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
22:13 cp0k JoeJulian: ^^ that is from strace gluster volume status
22:14 JoeJulian followed shortly by that 146....
22:14 cp0k yes
22:15 cp0k JoeJulian: I will be sure to let you know my results once I sync up the files in /etc/glusterfs
22:15 mmmarek joined #gluster
22:15 JoeJulian My only curiosity at this point is where ECONNREFUSED is defined. /shrug
22:17 cp0k JoeJulian: ditto
22:19 mmmarek I've played with glusterfs like that: http://md.osuv.de/I7oFa but now I want to simulate the case where my master/server1 has failed and I need to create a new one. how can I add it to the gv0 volume?
22:19 glusterbot Title: Markdown Paste Editor (at md.osuv.de)
22:35 rwheeler joined #gluster
22:36 markuman_ joined #gluster
22:40 qdk joined #gluster
22:49 glusterbot New news from newglusterbugs: [Bug 1060259] 3.4.3 tracker <https://bugzilla.redhat.com/show_bug.cgi?id=1060259>
22:55 zaitcev joined #gluster
23:10 lmickh_ joined #gluster
23:18 cp0k joined #gluster
23:18 smithyuk1 joined #gluster
23:18 ^rcaskey joined #gluster
23:18 Nev___ joined #gluster
23:18 portante joined #gluster
23:24 _NiC joined #gluster
23:28 gdubreui joined #gluster
23:35 theron joined #gluster
23:36 daMaestro joined #gluster
23:39 zerick joined #gluster
23:52 eastz0r joined #gluster
23:57 shapemaker joined #gluster
23:57 pdrakeweb joined #gluster
