
IRC log for #gluster, 2013-05-06


All times shown according to UTC.

Time Nick Message
08:18 glusterbot New news from newglusterbugs: [Bug 959887] clang static src analysis of glusterfs <http://goo.gl/gf6Vy>
08:48 glusterbot New news from newglusterbugs: [Bug 918917] 3.4 Alpha3 Tracker <http://goo.gl/xL9yF>
10:41 karoshi is there something special to be aware of if taking an existing filesystem with data and turning it into a brick for a replicated volume?
13:21 cicero you're probably better off creating a separate namespace to hold the gluster data
13:21 cicero and then rsyncing locally, basically
13:55 ndevos @channelstats
13:55 glusterbot ndevos: On #gluster there have been 121618 messages, containing 5231167 characters, 878070 words, 3639 smileys, and 447 frowns; 778 of those messages were ACTIONs. There have been 44811 joins, 1431 parts, 43461 quits, 19 kicks, 120 mode changes, and 5 topic changes. There are currently 141 users and the channel has peaked at 217 users.
15:12 semiosis glusterbot: whoami
15:12 glusterbot semiosis: semiosis
15:12 semiosis glusterbot: mode -i
15:12 fubada joined #gluster
15:12 fubada hi
15:12 glusterbot fubada: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
15:12 semiosis fubada: thanks for alerting me to the invite only mode.  idk why glusterbot enabled that, but i turned it off
15:13 fubada i introduced gluster as shared storage for two kvm servers, they are in replica 2 mode
15:13 gluslog joined #gluster
15:13 fubada i need to undo this :(
15:13 fubada as everyone is complaining about how slow the io is
15:13 fubada semiosis: no problem
15:13 nickw joined #gluster
15:13 fubada id like to undo this by first pausing replication
15:13 fubada maybe with  volume remove-brick dev_kvm_images 10.77.77.37:/dev_kvm_images
15:13 jclift_ joined #gluster
15:13 fubada is that safe?
15:15 fubada the bricks are mounted on each kvm host using the native client and localhost for gluster
15:15 fubada localhost:/dev_kvm_images  /opt/dev_kvm_images glusterfs defaults,_netdev,backupvolfile-server=foo 0 0
15:16 fubada semiosis: can you pls help :)
15:19 semiosis fubada: sorry i have no experience running vms over glusterfs
15:19 fubada this is just a question regarding what would happen if I ran  volume remove-brick dev_kvm_images 10.77.77.37:/dev_kvm_images
15:19 fubada on one of the 2 glusters
15:20 fubada its warning about data loss etc
15:20 glusterbot New news from newglusterbugs: [Bug 960153] rpc: XID is incorrected printed in several logs <http://goo.gl/Zosyv> || [Bug 960141] NFS no longer responds, get "Reply submission failed" errors <http://goo.gl/RpzTG>
15:20 fubada i just want to make sure the operation is safe, and is effective at terminating replication
15:20 semiosis hmmm
15:21 semiosis there should be a remove-brick command that modifies replica count, maybe if you added 'replica 1' after remove-brick, though that seems a bit odd
15:21 fubada Removing brick(s) can result in data loss. Do you want to Continue? (y/n)
15:21 fubada ;/
15:21 semiosis right, you're not modifying replica count with that command
15:21 fubada as long as all that does is decouple my glusters im fine
15:22 semiosis and i doubt it would really let you remove one brick from a replica set
15:22 semiosis with that command
15:22 semiosis try adding 'replica 1' like i mentioned
15:22 semiosis does it still give you that warning?
15:22 fubada yes, i did
15:22 semiosis oh
15:22 fubada volume remove-brick dev_kvm_images replica 1 10.77.77.37:/dev_kvm_images
15:22 fubada warning
15:24 semiosis well another option is just to block off the replica brick from clients.  kill the glusterfsd process (with a TERM/15) for the replica brick you want to disable, then cut off its port with iptables
15:24 semiosis that will effectively disable replication
15:24 fubada will that leave my local brick mounted>
15:24 fubada ?
15:24 daMaestro joined #gluster
15:24 semiosis see ,,(mount server)
15:24 glusterbot (#1) The server specified is only used to retrieve the client volume definition. Once connected, the client connects to all the servers in the volume. See also @rrnds, or (#2) Learn more about the role played by the server specified on the mount command here: http://goo.gl/0EB1u
15:25 fubada i wish there was a way to like
15:25 fubada pause replication
15:25 fubada and turn it on again later
15:25 fubada that way id syc nightly
15:25 fubada sync
15:25 semiosis thats what i just gave you
15:25 fubada killing glusterfsd?
15:25 semiosis when you restart that brick & unblock its port, replication will resume and should automatically catch up
15:26 H__ A "gluster volume rebalance vol01 status" shows one of the nodes as status failed. Where can I look for the details as to why it failed ?
15:26 semiosis you can restart the brick's glusterfsd process with 'gluster volume start $vol force'
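A rough sketch of that pause/resume sequence, using the dev_kvm_images volume from this conversation and an assumed brick port of 24009 (the real port and the process-match pattern are placeholders; check 'gluster volume status' on your own setup first):

    # stop the replica brick's glusterfsd with SIGTERM (pattern is illustrative)
    pkill -TERM -f 'glusterfsd.*dev_kvm_images'
    # cut the brick's port off so clients cannot keep replicating to it
    iptables -I INPUT -p tcp --dport 24009 -j DROP
    # later, to resume: unblock the port and restart the brick; self-heal should catch up
    iptables -D INPUT -p tcp --dport 24009 -j DROP
    gluster volume start dev_kvm_images force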
15:26 semiosis H__: logs
15:27 H__ which of the dozen logfiles ? ;-)
15:27 semiosis H__: probably the glusterd.log, but when in doubt, grep it out
15:33 H__ semiosis: There is a vol01-rebalance.log , but the errors in there are on all nodes, none point me to the node that says 'failed'
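If the per-volume rebalance log does not single out the failing node, one fallback (paths are the usual defaults and may differ per install) is to grep each node's glusterd log for error-level lines around the time of the failure, since glusterfs marks those with ' E [':

    grep ' E \[' /var/log/glusterfs/etc-glusterfs-glusterd.vol.log /var/log/glusterfs/*rebalance*.log | tail -n 50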
15:39 ThatGraemeGuy joined #gluster
15:40 karoshi if I use a directory with data as a brick to start a replicated volume, then add another empty brick to have gluster replicate the data to the empty one, I see lots of noise in the logs about self-heal failed because of missing gfid
15:41 karoshi however data seems to be replicating correctly
15:41 karoshi once everything is replicated, will the log noise go away?
15:47 vpshastry joined #gluster
15:55 DEac- joined #gluster
16:04 andrewjs1edge joined #gluster
16:05 pjameson joined #gluster
16:08 vpshastry joined #gluster
16:09 pjameson Has anyone here had any issues with the self-heal daemon not coming back up when gluster is restarted on the 3.4 alpha? I've got a replica pair where both NFS and SHD start up when I start the volume, but if I restart glusterd on one of the nodes, SHD and NFS never start back up on that node
16:10 lalatenduM joined #gluster
16:11 harold[MTV] joined #gluster
16:20 andreask joined #gluster
16:21 cfeller joined #gluster
16:24 satheesh joined #gluster
16:32 premera joined #gluster
16:40 chrisgh joined #gluster
16:40 chrisgh what speed should one use for GlusterFS storage network?
16:41 chrisgh Gigabit, 10-gig or run on Infiniband?
16:42 semiosis chrisgh: one should use whatever gives acceptable performance for an affordable price
16:43 semiosis those three are all valid options
16:43 chrisgh but should the clients and the servers have the same speed?
16:43 chrisgh or is it only important with server speed? I'm thinking distributed replicated setup
16:45 harold[MTV] If you can split the brick -> brick (server to server) traffic off from the client -> server traffic, that could help
16:45 semiosis harold[MTV]: how do you suggest one does that?
16:45 harold[MTV] and might be easier to implement
16:46 harold[MTV] dual nics on the bricks. for example IB on the bricks for that traffic, then 10G out to the clients
16:46 satheesh joined #gluster
16:46 harold[MTV] then brick to brick traffic will be faster, but you don't have to buy IB for all your clients
16:48 semiosis there is relatively little brick-to-brick traffic when using glusterfs native FUSE clients.  replication is done client-side.
16:48 semiosis NFS clients otoh have replication done for them server-side
16:48 chrisgh so that means one needs symmetrical network speed?
16:48 chrisgh so that means one needs symmetrical network speed with FUSE client?
16:48 semiosis chrisgh: what one needs is determined more by one's requirements than glusterfs's requirements
16:48 semiosis chrisgh: not everyone needs the same thing
16:49 semiosis some people may be well served by asym speeds, if for ex. there are many many clients with relatively low individual IO
16:49 semiosis chrisgh: the only way to even have a chance of getting a reasonable recommendation in here is to tell us more about your use case
16:49 chrisgh GlusterFS will be used to store web server data
16:49 semiosis otherwise we're just guessing
16:50 chrisgh Should one go for lower latency of infiniband since its many small files?
16:50 glusterbot New news from newglusterbugs: [Bug 960190] Gluster SHD and NFS do not start <http://goo.gl/NLh5B>
16:50 semiosis keep talking
16:50 semiosis web server data could be lots of stuff... streaming video != joomla
16:52 semiosis chrisgh: best advice is to deploy a scaled down model of your real environment + application
16:52 semiosis chrisgh: put some real world load on it, see how it performs
16:52 chrisgh Thanks semiosis
16:53 chrisgh I will benchmark the load
16:53 semiosis if you're going to host ,,(php) apps see this article...
16:53 glusterbot php calls the stat() system call for every include. This triggers a self-heal check which makes most php software slow as they include hundreds of small files. See http://goo.gl/uDFgg for details.
16:53 semiosis there are optimizations you can do
16:53 semiosis also caching on the front end helps a huge amount with small static assets
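One illustrative way to see that stat() pressure for yourself, assuming a CLI php binary and an app entry point at a placeholder path (a CLI run is only an approximation of a real web request):

    # strace -c prints a per-syscall count summary; look at the stat/lstat/open rows
    strace -cf php /var/www/app/index.php 2>&1 | grep -E 'stat|open'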
16:54 chrisgh I will enable caching on the front end then
16:55 chrisgh Does GlusterFS support local file caching with Fuse?
16:55 semiosis i dont think so
16:55 chrisgh ie what happens when a client reads the file the second time? served from cache or reread?
16:56 semiosis not sure
16:56 harold[MTV] joined #gluster
16:56 semiosis I assume re-read
16:57 semiosis but the real answer is probably more complicated than either choice you offered
16:58 aliguori joined #gluster
17:00 cfeller I'm running GlusterFS 3.3.1 in a distributed replicate (8 bricks, 4x2). I decided to mount gluster on a local Debian webserver with glusterfs-fuse, as this server is an internal mirror for Debian and other repos
17:00 cfeller I noticed that, via Apache, directory listings (ones mounted over glusterfs), are taking a _ridiculously_ long time to list.  (This, if say, I'm just browsing the contents of the repo via my web browser.) However, SSH'ing into the webserver, and running 'ls' in the same directory, happens lightning quick.  So this seems to be Apache specific.
17:00 cfeller Interestingly enough, grabbing a several-hundred-MB file from the repo is no slower (MB/s) than grabbing a file off of the local disk. (Some repos are on the local disk, some are on gluster while I'm testing.)
17:00 cfeller So this latency seems to be Apache specific, but only when listing directories?  is this a known issue, and if so is there something I can set in Apache (or gluster)?
17:02 thomasl__ joined #gluster
17:03 semiosis i would expect a huge directory listing to be slow, on the command line or in apache.
17:05 cfeller this directory (http://debian.osuosl.org/debian/pool/main/o/) for instance, listing it over Apache takes about a full minute, and less than a second to list via 'ls'.
17:05 glusterbot Title: ftp.osuosl.org :: Oregon State University Open Source Lab (at debian.osuosl.org)
17:05 cfeller (our mirror of it, sitting on gluster, that is.)
17:07 semiosis cfeller: are you doing that ls on a brick dir or on a fuse client mount point?
17:07 cfeller a fuse point.
17:07 cfeller I'm SSH'ing into the webserver itself, so I'm reading the same directory that Apache is.
17:07 semiosis how about ls -l?  same or slower?
17:08 cfeller about the same.  I'm getting about:
17:09 cfeller real    0m0.240s
17:09 cfeller user    0m0.000s
17:09 cfeller sys     0m0.012s
17:09 cfeller using 'time' before the 'ls' command.
17:12 __Bryan__ joined #gluster
17:15 cfeller I just used 'time curl' to list the directory contents via apache:
17:15 cfeller real    0m48.274s
17:15 cfeller user    0m0.009s
17:15 cfeller sys     0m0.015s
17:17 semiosis how many files are in that dir?
17:19 semiosis 1/4 second seems unreasonably fast
17:19 cfeller 429.
17:19 semiosis then maybe it is
17:19 semiosis idk
17:19 cfeller my 'test' setup is only mirroring wheezy right now.
17:20 semiosis directory listings are slow on glusterfs
17:20 semiosis due to the way it distributes metadata
17:20 semiosis idk why it would be faster on cmd line though, that seems unlikely based on my experience
17:20 cfeller but what is apache doing that ls isn't doing?
17:21 semiosis strace will give you that answer
17:21 semiosis http://edoceo.com/exemplar/strace-multiple-processes
17:21 glusterbot <http://goo.gl/GnNWo> (at edoceo.com)
17:21 semiosis wow edoceo really went crazy with the ads on that page since last time i looked at it
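In the same spirit as that article, one way to catch what the Apache workers are doing during a slow listing; the process name (apache2 vs httpd) and the output path are assumptions for a Debian box:

    # attach to all running workers, follow forks, timestamp calls, one trace file per PID
    strace -f -ff -tt -e trace=file,desc -o /tmp/apache-listing -p "$(pidof apache2)"
    # then request the slow directory and compare the file-related calls against a plain 'ls'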
17:22 cfeller ok. time to go spelunking I guess.
17:22 semiosis @seen edoceo
17:22 glusterbot semiosis: edoceo was last seen in #gluster 27 weeks, 2 days, 22 hours, 15 minutes, and 18 seconds ago: <edoceo> It seems like when I have distribute on top of replicate that some stuff is written to replica1 while other stuff is on replica3
17:33 rotbeard joined #gluster
17:37 fubada man so my gluster storage for kvm machines ended up in disaster :P
17:37 fubada had to basically undo all gluster and remove
17:37 Supermathie fubada: How so?
17:38 fubada i was likely doing it wrong, but performance was a nightmare
17:39 fubada i have 30-40 ruby on rails oriented vms across 2 physical kvm servers, and I decided to create a replica volume for /var/lib/libvirtd/images
17:39 fubada performance tanked big time
17:40 fubada of course i wasn't bonding interfaces, dedicating switches, etc
17:40 fubada just 2 node replica using gluster native client
17:42 __Bryan__ joined #gluster
17:49 andreask joined #gluster
18:02 andrewjsledge joined #gluster
18:12 kkeithley fubada: Up to now we have been advising people not to host their vm images on gluster. We've done considerable work in 3.4.x to make that better.. Once 3.4.0 is released we hope you'll give it another try.
18:17 Supermathie fubada: I've since learned that the FUSE client essentially gives you a filesystem with a queue depth of 1. NOT IDEAL.
18:18 Supermathie fubada: I'd recommend using the NFS client and 'volume set vol0 performance.nfs.write-behind on'
18:18 Supermathie OTOH, 3.3.1 and 3.4 both have a pretty crippling bug at the moment with NFS...
18:20 semiosis fubada: another strategy is to use a bare bones boot image for the vm, then have native glusterfs mounts within the vm for all application+data needed
18:20 Supermathie fubada: ^ that, also lots of ways to do it. One glusterfs mount per disk? (ew)
18:25 kkeithley er, what's the bug with NFS?
18:26 Supermathie Well, it may be RPC subsystem, but NFS is triggering it for me
18:26 Supermathie https://bugzilla.redhat.com/show_bug.cgi?id=960141
18:26 glusterbot <http://goo.gl/RpzTG> (at bugzilla.redhat.com)
18:26 glusterbot Bug 960141: urgent, unspecified, ---, vraman, NEW , NFS no longer responds, get  "Reply submission failed" errors
18:41 y4m4 joined #gluster
18:59 hagarth joined #gluster
19:13 jkroon_ joined #gluster
19:15 jkroon_ hi guys, i'm seeing a very interesting, albeit annoying, phenomenon. Four machines, standard 2x2 distribute-replicate cluster, some of the nodes (usually 3,4) have a significant number of SYN_SENTs open to 1.
19:16 jkroon_ what's interesting is that according to tcpdump (filtering for tcp packets on port 24009 that has the syn or fin bit set) the returning syn/ack is received on 3,4, but for some reason the connection doesn't go into ESTABLISHED.
19:16 JoeJulian I wonder if that's related to the bug I'm digging into myself right now.
19:16 jkroon_ JoeJulian, that's cryptic ...
19:17 JoeJulian So's the error messages...
19:17 jkroon_ i'm not seeing the same thing on replicate only ... and the connections (the ones I've seen at least) are usually from 3,4 => 1 (and sometimes 2)
19:17 jkroon_ rofl
19:17 * jkroon_ not even getting error messages ...
19:19 jkroon_ ok, perhaps you should pastebin your errors somewhere, i might be able to deduce if it's related or not ?
19:19 JoeJulian Basically I'm getting a constantly recurring "I [client.c:2090:client_rpc_notify] 1-wpkg-client-8: disconnected" in my client log. Not sure what triggers that starting, but once it does it continues indefinitely.
19:20 * jkroon_ need to figure out the log mechanism
19:21 jdarcy joined #gluster
19:23 JoeJulian What do you need to know?
19:25 jkroon_ well, basically I'd like to figure out whether we're looking at the same issue or not
19:26 JoeJulian That's in my client log. The client log is based on the mount point, for instance a volume mounted at /mnt/gluster/myvol will be /var/log/glusterfs/mnt-gluster-myvol.log
19:27 jkroon_ ok, so i found the gluster logs .... now I need to figure out where to start looking for error messages (it worries me that glusterd is consuming around 25% of available CPU)
19:27 JoeJulian That's where I'm getting that info message. The counterpart to my message is on the servers (not sure if it's one or more yet) in the glusterd log, /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
19:29 JoeJulian In that log I'm getting a barrage of "[2013-05-06 12:28:12.445208] I [socket.c:1798:socket_event_handler] 0-transport: disconnecting now"
19:29 jkroon_ ok
19:35 jkroon_ ok, i'm seeing a great many number of log files ...
19:36 jkroon_ not seeing any errors in my case in the etc-glusterfs-glusterd.vol.log file ...
19:39 Supermathie jkroon_: SYN_SENT means that machine is trying to open a connection somewhere and the other side hasn't responded (99% of the time as it's firewalled off or down). Can you paste output of 'ss | grep SYN_SENT' and 'volume vol0 status'?
19:39 Supermathie to pastie.org
19:39 Supermathie or whatever
19:39 Supermathie lol "glusterd is consuming around 25% of available CPU". My problem is: "It's only consuming 100% of a single CPU - can I make it thread more?"
19:43 jkroon_ Supermathie, do you specifically want ss or will netstat -ntap | grep SYN do too?
19:43 jkroon_ rofl @ threading ...
19:43 Supermathie that'd do too
19:43 jkroon_ kk, that i can get, ss for some reason fails with operation not supported
19:44 Supermathie and dmesg for oddness?
19:44 Supermathie jkroon_: on that note, have you checked /var/log/messages
19:46 krishna_ joined #gluster
19:46 jkroon_ Supermathie, yes i have - not seeing anything odd in there, but trying to scan /var/log/glusterfs
19:47 jkroon_ http://pastie.org/7809956
19:47 glusterbot Title: #7809956 - Pastie (at pastie.org)
19:47 jkroon_ ok, that shows the lowlevel stuff, including the tcpdump from earlier today.
19:48 Supermathie need volume status not volume info
19:48 aliguori joined #gluster
19:48 jkroon_ grr, my mistake
19:49 jkroon_ Unable to obtain volume status information. <-- on 1,3 and operation failed on 2,4
19:50 Supermathie de.
19:50 Rorik joined #gluster
19:50 Supermathie don't run it in parallel - only need the output from one node
19:51 jkroon_ http://pastie.org/7809964
19:51 glusterbot Title: #7809964 - Pastie (at pastie.org)
19:51 jkroon_ i really don't need nfs either ...
19:52 Supermathie jkroon_: You need to look at your firewall config - this looks like a networking problem
19:53 jkroon_ Supermathie, i completely agree, but firewall config is iptables -A INPUT -i eth1 -j ACCEPT ...
19:53 Supermathie god, I hate how pastie doesn't word wrap and constricts the output to a small part of the window
19:53 jkroon_ and tracking <1000 connections at any given point in time ... so tell me what to look at and I'll *gladly* look at it
19:53 jkroon_ alternate location I can paste for you?
19:55 Supermathie jkroon_: deerrrr... fpastie or pastebin I guess. Screw @glusterbot. :p
19:55 Supermathie s'ok
19:55 jkroon_ hmm, i am seeing iptables drops on cstate invalid, going to drop iptables and see if that helps.
19:55 Supermathie hahah connection state tracking isn't working
19:55 Supermathie yeah rmmod iptables
19:56 Supermathie If you don't need it.
19:56 Supermathie Oh btw you want "iptables -I INPUT -i eth1 -j ACCEPT" to accept everything on eth1
19:56 Supermathie so it gets in first ahead of all other rules
19:56 jkroon_ yes, it's a private VLAN.
19:56 jkroon_ yea, thinking the same thing
19:56 jkroon_ yea, now it's looking cleaner
19:57 jkroon_ omg, that's just sad and pathetic ... how can conntrack get that wrong?
19:58 Supermathie jkroon_: Is nf_conntrack_ipv4 actually loaded?
19:58 jkroon_ you joking about rmmod iptables?
19:58 jkroon_ yes, it's loaded.
19:58 Supermathie well if you're not using it for real rules you can just rmmod it.
19:58 jkroon_ conntrack -L also lists a lot of connections
19:58 jkroon_ using it for stuff on eth0
19:59 jkroon_ but honestly, why would it flag returning syn/ack as invalid ?
19:59 Supermathie jkroon_: OK, make sure the eth1 rule is right at the top of the rules and you'll be fine.
19:59 jkroon_ moved it.
19:59 jkroon_ i don't understand why it didn't work though ...
19:59 jkroon_ glusterd messing with the tcp header flags somehow?
20:00 Supermathie jkroon_: who knows, would have to see your rules.
20:00 Supermathie jkroon_: Nope, I don't think gluster is even aware of the connection attempt until the handshake is complete (not sure on that one)
20:00 jkroon_ well, it would need to be aware that it's issuing a connect() to the peer :p
20:01 jkroon_ which begs the question - why doesn't it pool the connections, why does it keep establishing new ones?
20:01 Supermathie Right, but the OS is totally responsible for the actual handshake
20:01 Supermathie It's connecting to the brick
20:01 jkroon_ yea, that was my thought too!
20:01 jkroon_ which is why i was so utterly and totally stumped!
20:01 jkroon_ http://pastebin.com/vs4bFEuy
20:01 glusterbot Please use http://fpaste.org or http://dpaste.org . pb has too many ads. Say @paste in channel for info about paste utils.
20:02 jkroon_ already moved the eth1 rule up - was just prior to the lo rule
20:02 NcA^_ joined #gluster
20:06 semiosis jkroon_: glusterfs version?
20:06 jkroon_ 3.3.1
20:06 semiosis distro?
20:06 jkroon_ gentoo
20:07 semiosis jkroon_: kernel?
20:07 semiosis uname -a would be nice
20:07 jkroon_ 3.7.3
20:07 semiosis wow
20:07 semiosis you're like, from the future
20:08 jkroon_ Linux yomo-prod-web2 3.7.3-uls #6 SMP Sun Mar 3 14:26:48 SAST 2013 x86_64 Intel Xeon E312xx (Sandy Bridge) GenuineIntel GNU/Linux
20:08 Supermathie oooooooohhhhhh you need to move RELATED,ESTABLISHED above INVALID
20:08 semiosis +1
20:08 jkroon_ Supermathie, ok, always had it invalid first, why related,established first?
20:09 semiosis seems like a good test
20:09 jkroon_ surely a packet can't be both invalid and related/established?
20:09 semiosis one would hope
20:09 jkroon_ granted.
20:09 jkroon_ well, it's a production environment, so i don't think the client is going to appreciate me messing more than required.
20:10 jkroon_ i'll rather go rtfs on the conntrack code to confirm that theory.
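For reference, the ordering being suggested, sketched with illustrative rules only (the real rule set is in the pastebin above): accept the storage interface and already-tracked traffic before the INVALID drop, so a conntrack misclassification cannot eat the handshake.

    iptables -I INPUT 1 -i eth1 -j ACCEPT
    iptables -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
    iptables -A INPUT -m conntrack --ctstate INVALID -j DROP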
20:13 * JoeJulian grumbles about someone breaking fpaste for el5
20:14 jkroon_ rofl, i've been doing a lot of grumbling recently @ quite a number of projects.
20:14 jkroon_ must say that so far gluster is one of the shining stars in my array
20:22 nueces joined #gluster
20:22 jkroon_ ok, i still see sporadic SYN_RECV on web1, but all on port 80 now (and considering it does about 200-odd incoming connections/second, I guess seeing a SYN_RECV every 6-10 seconds when snapping every 2 is just fine)
20:22 jkroon_ so conntrack it was
20:39 jkroon_ thanks for the help guys, i'm off.
20:44 JoeJulian You're welcome. See you later jkroon_
20:46 JoeJulian @yum repo
20:46 glusterbot JoeJulian: kkeithley's fedorapeople.org yum repository has 32- and 64-bit glusterfs 3.3 packages for RHEL/Fedora/Centos distributions: http://goo.gl/EyoCw
20:49 brunoleon joined #gluster
20:49 luckybambu joined #gluster
21:05 chrisgh joined #gluster
21:11 jkroon_ joined #gluster
21:27 JoeJulian file a bug
21:27 glusterbot http://goo.gl/UUuCq
21:47 bchilds joined #gluster
21:47 bchilds any idea how i can retrieve the trusted.glusterfs.pathinfo extended attribute as non root user?
21:49 JoeJulian sudo
21:50 JoeJulian trusted.* is restricted by the operating system.
21:50 JoeJulian I think..
21:51 glusterbot New news from newglusterbugs: [Bug 960285] Client PORTBYBRICK request for replaced-brick will retry forever because brick port result is 0 <http://goo.gl/8sKao>
21:52 JoeJulian "Trusted extended attributes are visible and accessible only to processes that. have the CAP_SYS_ADMIN capability (the super user usually has this. capability)." - man attr
21:53 jkroon_ joined #gluster
21:54 bchilds hmmm
21:54 bchilds "CAP_SYS_ADMIN" can i assign that?
21:55 JoeJulian looks like it's assigned per-binary.
21:55 JoeJulian man capabilities
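Two hedged examples of what that looks like in practice; the mount path, file name and binary are placeholders:

    # read the xattr with root privileges (root normally carries CAP_SYS_ADMIN)
    sudo getfattr -n trusted.glusterfs.pathinfo /mnt/gluster/myvol/somefile
    # or grant the capability to one binary for non-root use (broad privilege, use with care)
    sudo setcap cap_sys_admin+ep /usr/bin/getfattr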
21:57 bchilds i wonder why trusted.glusterfs.pathinfo is trusted and not user
21:59 JoeJulian Good question. looks like it could easily be changed in the source. File a bug report. :D
22:00 JoeJulian Er, I mean file a bug report. Glusterbot's not case insensitive about that one. :/
22:00 glusterbot http://goo.gl/UUuCq
22:01 Supermathie Any way in 3.3.1 to bump up the parallelism of glusterfs/nfs so it uses >1 CPU?
22:05 Supermathie a2: If you want COMPLETE logs or access to the system to look at this further, that can be arranged.
22:09 Scottix joined #gluster
22:10 Scottix hi everyone, i have a question and wondering if you can answer it for me
22:11 Supermathie hi
22:11 glusterbot Supermathie: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
22:11 Supermathie ^
22:13 Scottix I am doing a write test, and seeing what happens for a power loss scenario for the cluster
22:14 Scottix I start writing the file then, basically unplug the machine, but what I am seeing is the file keeps returning that it wrote data and even closed the file
22:15 Scottix the cluster* i unplug not the machine writing to the cluster
22:16 Supermathie Scottix: sounds like the metadata was written and the data wasn't.
22:17 Scottix i would think that if it lost connection to the cluster it would put the mount in read-only or return EIO somehow; it's like if a raid went bad and immediately goes ro
22:20 andreask your configuration Scottix?
22:21 Scottix Replicate - Number of Bricks: 1 x 2 = 2
22:21 Scottix 2 machines in cluster 1 machine mount using gluster-fuse
22:22 andreask and you unplug both gluster servers?
22:23 Scottix simple test of 1. open file 2. write to file 3. hard shutdown cluster machines 4. write to file 5. close file
22:23 Scottix yes i unplug both
22:24 semiosis Scottix: see ,,(mount server)
22:24 glusterbot Scottix: (#1) The server specified is only used to retrieve the client volume definition. Once connected, the client connects to all the servers in the volume. See also @rrnds, or (#2) Learn more about the role played by the server specified on the mount command here: http://goo.gl/0EB1u
22:25 semiosis when a client mount loses connection to all bricks (replicas of some file) then it returns a "Transport endpoint not connected" error
22:25 semiosis check your client log file to see whats going on
22:28 Scottix ya i even get 0-datastore2-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
22:29 Scottix now if i try to open a file it does fail, what i am saying is the file is still open and writing data
22:30 semiosis Scottix: glusterfs version?
22:31 Scottix ok so i was using 3.3.1, but that had heal issues even when 1 server was down and brought back up. I saw that was fixed, so I tried 3.3.2qa2 since I wanted to make sure it worked. I can go back to 3.3.1
22:36 Scottix basically 3.3.1 is unusable for us in a replica reliability standpoint
22:36 21WAAJAJ4 joined #gluster
22:45 JoeJulian Supermathie: I wonder if you set performance.client-io-threads on if it does anything for nfs.
22:49 JoeJulian Scottix: Wait, what? You have a replica volume, you pull the plug on one of the replicas and it keeps working. For that it's unusable???
22:49 Scottix no i pull plug on both
22:49 JoeJulian Ah, ok.
22:50 JoeJulian "basically unplug the machine(s)" - what's "basically" mean in this context?
22:51 Scottix well flip the power supply off essentially unplugging
22:51 JoeJulian Ok, just making sure it was an actual hard down, as opposed to a simulated one.
22:52 Scottix sure
22:52 JoeJulian What's performing the write?
22:54 Scottix i have a php script that opens a file 1. fopen  2. fwrite 3. then waits till i push a key 4. fwrite 5. fclose
22:54 JoeJulian And it's mounted via the fuse client?
22:54 JoeJulian I presume step 3 is where you cut the power?
22:55 Scottix mount -t glusterfs -o backupvolfile-server=gfs03 gfs02:/datastore2 /mnt/datastore/
22:55 Scottix yes
22:56 Scottix it is a basic test just to see what it does
23:00 JoeJulian your fopen: "rw", "a", "w"?
23:01 Scottix 'w'
23:01 JoeJulian If I can duplicate this, I'll see if I can write a test case for it. I haven't worked with the test system yet.
23:04 Scottix alright great. imo if i get the cluster-offline message, it should go ro or do some sort of blocking io for open files; that is how raid works
23:04 JoeJulian It should report ECONN
23:05 Scottix ya if i try to open a file i get EIO which is expected
23:09 JoeJulian yep, I duplicated it.
23:10 Scottix well that is hopeful, if gluster can fix that it might fix a lot of issues haha pretty basic haha
23:12 JoeJulian Hmm... Scottix according to an strace, if you check the return value of fclose you should see that it's in error: close(3)                                = -1 ENOTCONN (Transport endpoint is not connected)
23:13 chrisgh joined #gluster
23:13 JoeJulian Unless you set the fd synchronous, that's the correct behavior.
23:14 Scottix ok one sec
23:17 JoeJulian php's misbehaving. The glibc call clearly returns ENOTCONN but php does not pass that error to the script.
23:19 Scottix ya i see it too
23:19 Scottix fwrite is still working though...write(3, "Some Text 2", 11) = 11
23:21 JoeJulian Of course. If you wrote enough to exceed the cache (and I don't know what that limit is) then it would error. You should also be able to trigger the error with an fsync.
23:21 Scottix yep you are right ok thanks
23:22 JoeJulian er, fflush
23:23 Scottix we do check fclose so that could be a trigger but php is not helping us haha thanks for the support
23:25 JoeJulian You're welcome. Glad it's not broken. :D
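A shell analogue of that check, for anyone repeating the test without PHP; the mount point is the one from the test above, and conv=fsync forces the flush that surfaces the error:

    # write through the FUSE mount, cut power to both bricks mid-run; the write or the
    # final fsync should then fail with ENOTCONN/EIO instead of appearing to succeed
    dd if=/dev/zero of=/mnt/datastore/powerloss-test bs=1M count=512 conv=fsync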
23:27 cfeller semiosis: from our conversation from earlier: Turning off MultiViews in the Apache config speeds up directory listings dramatically on gluster mounted volumes. It is still much slower than 'ls', but down from 48 seconds to only 12 seconds for the same directory.
23:28 JoeJulian wow
23:30 abelur joined #gluster
23:34 yinyin joined #gluster
23:34 yinyin_ joined #gluster
23:35 yinyin joined #gluster
23:36 yinyin- joined #gluster
23:40 JoeJulian @later tell Scottix Looks like it might be https://bugs.php.net/bug.php?id=60110
23:40 glusterbot JoeJulian: The operation succeeded.
23:44 yinyin joined #gluster
