
IRC log for #gluster, 2014-11-25


All times shown according to UTC.

Time Nick Message
00:00 bit4man joined #gluster
00:00 jaank joined #gluster
00:02 stomith I'm missing something. I figured it out. Thanks :)
00:06 elico joined #gluster
00:07 jaank i am new to gluster and have what I hope is an easy question.  If I have a small pool of servers, say 3 to make it easy, each server having 2 bricks, and my gluster volume requires 2 replicas, is there logic to prevent both replicas from being stored on the same server?
00:14 JoeJulian @brick order
00:14 glusterbot JoeJulian: Replicas are defined in the order bricks are listed in the volume create command. So gluster volume create myvol replica 2 server1:/data/brick1 server2:/data/brick1 server3:/data/brick1 server4:/data/brick1 will replicate between server1 and server2 and replicate between server3 and server4.
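For jaank's layout above (three servers, two bricks each, replica 2), the same ordering rule keeps each replica pair on two different servers; a sketch with hypothetical hostnames and brick paths:

    gluster volume create myvol replica 2 \
        serverA:/data/brick1 serverB:/data/brick1 \
        serverB:/data/brick2 serverC:/data/brick1 \
        serverC:/data/brick2 serverA:/data/brick2

Each consecutive pair of bricks in the list forms one replica set, so this "chained" ordering never places both copies on the same server.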
00:15 JoeJulian ~mount server | stomith
00:15 glusterbot stomith: (#1) The server specified is only used to retrieve the client volume definition. Once connected, the client connects to all the servers in the volume. See also @rrdns, or (#2) One caveat is that the clients never learn of any other management peers. If the client cannot communicate with the mount server, that client will not learn of any volume changes.
00:23 mikedep333 joined #gluster
00:40 _jmp_ joined #gluster
01:01 cleo_ joined #gluster
01:11 topshare joined #gluster
01:21 msmith joined #gluster
01:31 baojg joined #gluster
01:32 bala joined #gluster
01:50 haomaiwa_ joined #gluster
01:53 portante left #gluster
01:54 msmith joined #gluster
01:57 gildub joined #gluster
02:16 meghanam_ joined #gluster
02:17 meghanam__ joined #gluster
02:17 mojibake joined #gluster
02:18 sputnik13 joined #gluster
02:23 msmith joined #gluster
02:24 mator joined #gluster
02:35 churnd joined #gluster
02:53 gildub joined #gluster
02:54 hflai_ joined #gluster
02:59 kdhananjay joined #gluster
03:20 m0zes joined #gluster
03:38 kshlm joined #gluster
03:45 aravindavk joined #gluster
03:45 aravindavk joined #gluster
03:56 soumya joined #gluster
03:58 bharata-rao joined #gluster
04:03 hagarth joined #gluster
04:07 kanagaraj joined #gluster
04:09 itisravi joined #gluster
04:19 baojg joined #gluster
04:19 rjoseph joined #gluster
04:24 nbalachandran joined #gluster
04:26 RameshN joined #gluster
04:27 sputnik13 joined #gluster
04:32 shubhendu joined #gluster
04:33 deepakcs joined #gluster
04:34 sahina joined #gluster
04:35 anoopcs joined #gluster
04:35 nbalachandran joined #gluster
04:36 rafi1 joined #gluster
04:37 Rafi_kc joined #gluster
04:39 kanagaraj joined #gluster
04:40 meghanam joined #gluster
04:40 meghanam_ joined #gluster
04:40 sputnik13 joined #gluster
04:41 baojg joined #gluster
04:43 harish_ joined #gluster
04:49 atinmu joined #gluster
04:50 spandit joined #gluster
04:53 badone joined #gluster
04:54 _Bryan_ joined #gluster
05:08 deepakcs JustinClift, hi, did u see my mail regarding a new mailing list ID (cinder.glusterfs.ci@gluster.org), any thoughts ?
05:12 baojg joined #gluster
05:12 lalatenduM joined #gluster
05:12 ndarshan joined #gluster
05:15 msmith joined #gluster
05:18 pp joined #gluster
05:22 baojg joined #gluster
05:24 jiffin joined #gluster
05:26 kumar joined #gluster
05:29 ppai joined #gluster
05:34 Humble joined #gluster
05:43 vimal joined #gluster
05:45 raghu` joined #gluster
05:47 kdhananjay joined #gluster
05:48 ramteid joined #gluster
05:52 soumya joined #gluster
05:53 gildub joined #gluster
05:55 hagarth joined #gluster
06:01 kovshenin joined #gluster
06:04 ppai joined #gluster
06:05 baojg joined #gluster
06:16 glusterbot News from newglusterbugs: [Bug 1167580] [USS]: Non root user who has no access to a directory, from NFS mount, is able to access the files under .snaps under that directory <https://bugzilla.redhat.com/show_bug.cgi?id=1167580>
06:19 SOLDIERz joined #gluster
06:20 dusmant joined #gluster
06:24 nishanth joined #gluster
06:28 overclk joined #gluster
06:30 baojg joined #gluster
06:30 anil joined #gluster
06:33 shubhendu joined #gluster
06:43 bala joined #gluster
06:44 atalur joined #gluster
06:49 sputnik13 joined #gluster
06:55 atalur joined #gluster
07:07 ctria joined #gluster
07:09 saurabh joined #gluster
07:11 Fen2 joined #gluster
07:17 atalur joined #gluster
07:17 RameshN joined #gluster
07:20 dusmant joined #gluster
07:22 anil joined #gluster
07:23 kovshenin joined #gluster
07:31 Debloper joined #gluster
07:40 calum_ joined #gluster
07:44 RameshN joined #gluster
07:45 aravindavk joined #gluster
07:47 dusmant joined #gluster
07:49 Fen2 joined #gluster
07:53 LebedevRI joined #gluster
07:59 harish_ joined #gluster
08:22 fsimonce joined #gluster
08:26 warcisan joined #gluster
08:26 maveric_amitc_ joined #gluster
08:27 baojg joined #gluster
08:47 glusterbot News from newglusterbugs: [Bug 1023134] Used disk size reported by quota and du mismatch <https://bugzilla.redhat.com/show_bug.cgi?id=1023134>
08:52 Fen2 joined #gluster
08:53 baojg joined #gluster
09:04 nshaikh joined #gluster
09:08 DV joined #gluster
09:12 deniszh joined #gluster
09:12 rjoseph joined #gluster
09:15 DV joined #gluster
09:16 [Enrico] joined #gluster
09:16 Slashman joined #gluster
09:21 baojg joined #gluster
09:23 ppai joined #gluster
09:39 ghenry joined #gluster
09:51 nocturn00 Can I interrupt a rebalance command if volume rebalance stop fails?  I want to delete the volume anyway?
09:52 ekuric joined #gluster
09:53 deepakcs joined #gluster
10:02 shubhendu joined #gluster
10:05 topshare joined #gluster
10:05 topshare joined #gluster
10:06 DV joined #gluster
10:14 hagarth joined #gluster
10:20 deniszh1 joined #gluster
10:38 shubhendu joined #gluster
10:39 PatNarciso Good morning all.
10:44 calum_ joined #gluster
10:46 Fen2 joined #gluster
10:47 Fen2 hi ! Can Red Hat Storage Console be installed on CentOS 7 ? :)
10:47 feeshon joined #gluster
10:47 glusterbot News from newglusterbugs: [Bug 1167734] Enhancement: command that failed with cli-timeout should be logged in CMD_LOG_HISTORY file <https://bugzilla.redhat.com/show_bug.cgi?id=1167734>
10:51 kovshenin joined #gluster
10:51 kovshenin joined #gluster
10:52 diegows joined #gluster
10:56 Rafi_kc joined #gluster
10:56 rafi1 joined #gluster
10:59 gildub joined #gluster
11:00 rafi1 joined #gluster
11:07 elico joined #gluster
11:07 jmarley joined #gluster
11:20 ArminderS- joined #gluster
11:21 ArminderS- can anyone point me to a good guide to tweak gluster performance
11:22 ArminderS- and also, if setting up a 4-node cluster on VMware (please don't ask why) with all 4 on different VMware nodes
11:22 ArminderS- what would be the recommended specs (ram/vCPU) for each gluster node?
11:29 Arminder- joined #gluster
11:31 kkeithley1 joined #gluster
11:40 calum_ joined #gluster
11:41 dusmant joined #gluster
11:55 lalatenduM Gluster Community Bug Triage meeting in #gluster-meeting in another 5 mins , Agenda: https://public.pad.fsfe.org/p/gluster-bug-triage
12:01 meghanam joined #gluster
12:02 soumya joined #gluster
12:04 meghanam_ joined #gluster
12:04 NigeyS joined #gluster
12:08 ArminderS joined #gluster
12:13 elico joined #gluster
12:16 mojibake joined #gluster
12:24 meghanam joined #gluster
12:24 meghanam_ joined #gluster
12:25 warcisan ArminderS: i don't think there are recommended specs, it depends on the number of volumes you export (and if you plan to use georeplication, which needs a LOT of memory)
12:25 warcisan i'm running my gluster servers on vmware too
12:28 mrEriksson warcisan: How much is a lot?
12:29 ArminderS joined #gluster
12:29 warcisan again, all is relative, but as an example: my server has 16GB of ram and now i don't have issues, 8GB was too few
12:29 warcisan i've got 18TB of small files of a few KB being geo-synced
12:30 warcisan which is a kind of worst-case scenario for gluster
12:30 hagarth joined #gluster
12:30 ArminderS joined #gluster
12:30 warcisan but i'm still running 3.4, i believe now geo-sync is more efficient
12:33 elico joined #gluster
12:34 itisravi_ joined #gluster
12:34 ninkotech joined #gluster
12:34 ninkotech_ joined #gluster
12:34 kovshenin joined #gluster
12:35 mator how do i remove brick and move all data out of it to volume (remaining bricks) ?
12:36 mrEriksson gluster volume remove-brick ...
12:42 anoopcs joined #gluster
12:47 edward1 joined #gluster
12:47 warcisan i've asked this question before, but is it possible to move a volume + brick from one server to another? (when i physically move the disk containing the brick)
12:48 glusterbot News from newglusterbugs: [Bug 1167793] fsync on write-back doesn't wait for pending writes when an error is encountered <https://bugzilla.redhat.com/show_bug.cgi?id=1167793>
12:50 glusterbot News from resolvedglusterbugs: [Bug 1165429] Gluster Fuse high memory consumption <https://bugzilla.redhat.com/show_bug.cgi?id=1165429>
12:52 mator mrEriksson, and will the data be left on the original volume (the remaining bricks), not on the removed brick?
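For the data-migration question, remove-brick is a multi-step workflow; a minimal sketch with hypothetical volume and brick names ("start" rebalances the brick's files onto the remaining bricks, and "commit" should only be issued once "status" shows the migration has completed):

    gluster volume remove-brick myvol server3:/data/brick1 start
    gluster volume remove-brick myvol server3:/data/brick1 status
    gluster volume remove-brick myvol server3:/data/brick1 commit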
12:58 raghu` joined #gluster
13:01 Slashman_ joined #gluster
13:03 bennyturns joined #gluster
13:09 lalatenduM joined #gluster
13:14 kumar joined #gluster
13:15 overclk joined #gluster
13:16 overclk davemc, ping
13:16 glusterbot overclk: Please don't naked ping. http://blogs.gnome.org/markmc/2014/02/20/naked-pings/
13:16 rjoseph joined #gluster
13:16 dusmant joined #gluster
13:17 overclk davemc, are you on the hangout session?
13:18 kaushal_ joined #gluster
13:19 kshlm joined #gluster
13:21 haomaiwa_ joined #gluster
13:22 nbalachandran joined #gluster
13:25 raghu` joined #gluster
13:26 ildefonso joined #gluster
13:26 lalatenduM joined #gluster
13:28 nbalachandran joined #gluster
13:29 nbalachandran joined #gluster
13:34 jmarley joined #gluster
13:38 topshare joined #gluster
13:47 overclk joined #gluster
13:49 partner damn, had an overlapping meeting, is there going to be a recording available perhaps?
13:50 hagarth partner: the hangout is going to be rescheduled
13:52 baojg joined #gluster
13:53 B21956 joined #gluster
14:00 partner ah, great (for me)
14:02 aravindavk joined #gluster
14:02 coredumb ndevos: i have interesting numbers for Git over glusterfs :)
14:03 davemc coredumb, sounds really intriguing
14:04 coredumb i feared there would be a really big difference between repos on FS and on glusterfs
14:04 hagarth coredumb: is this a git repo hosted on a glusterfs volume?
14:04 coredumb but from what i'm testing not so much
14:04 coredumb hagarth: yes
14:05 hagarth coredumb: what is your volume configuration?
14:05 coredumb hagarth: default + cache-size 2GB
14:05 diegows joined #gluster
14:06 hagarth coredumb: spread over how many bricks?
14:06 coredumb 2 bricks in replica
14:06 coredumb so basically worst case scenario
14:06 hagarth coredumb: ah ok, latency starts kicking in when we spread the volume and make it distributed replicated
14:06 meghanam joined #gluster
14:07 hagarth I have seen readdir operations taking a lot of time with a 6*2 distributed-replicated volume
14:07 meghanam_ joined #gluster
14:07 hagarth without optimize-readdir option set, the git clone operation would crawl
14:07 coredumb a 4.2MB repo with 471 very small to small files = FS 1.5s clone, Gluster 3s clone
14:08 coredump joined #gluster
14:08 hagarth coredumb: wow, that looks pretty good!
14:08 coredumb 1GB repo with 609 files, from very small to big (>10MB) files
14:08 coredumb barely any difference
14:08 coredumb advantage glusterfs
14:08 partner nice
14:09 partner hardware or some shared storage underneath?
14:09 coredumb it's all VMs for servers
14:10 coredumb client is an openvz container
14:10 partner rgr
14:11 coredumb this is only for cloning though i'll now do some testing for push
14:11 partner i was thinking a bit of putting vmware vms on top of a gluster.. "just" a bit of must case, going through the options
14:12 coredumb VMs should be OK
14:12 partner yeah, the famous "should" :)
14:12 coredumb i've had good performances with KVM on glusterfs
14:12 coredumb with libgfapi though
14:13 coredumb we have an openstack instance running 150 instances on a 2-node, 2-brick replicated glusterfs over a fuse mount :)
14:13 partner i'm thinking of sharing with iscsi as well, and of some multipath / fault-tolerance (NFS)
14:13 partner we run some openstack with glusterfs as well but currently still at a smaller scale, and mostly share the images via it
14:14 partner the problems come with esxi, not that easy anymore to play in the host as it would be in some random linux host
14:15 partner i've just never done this before like this. in addition i have fusionio and traditional spinning storage on couple of boxes
14:16 virusuy joined #gluster
14:16 virusuy joined #gluster
14:17 coredumb lol push times are VERY GOOD
14:18 overclk joined #gluster
14:18 coredumb in the 1GB repo
14:18 coredumb git rm Scripts
14:18 coredumb 49 files
14:18 coredumb push
14:18 coredumb 1.2s
14:19 coredumb adding 449 files
14:19 coredumb push = 2.3s
14:19 coredumb looks like fassssst enough for me :)
14:25 coredumb have to push a linux tree on there
14:25 coredumb ^^
14:25 partner :)
14:35 theron joined #gluster
14:43 SOLDIERz joined #gluster
14:56 jmarley joined #gluster
14:57 rwheeler joined #gluster
14:58 msmith joined #gluster
14:58 shubhendu joined #gluster
14:58 msmith joined #gluster
15:00 jobewan joined #gluster
15:00 UnwashedMeme joined #gluster
15:08 kshlm joined #gluster
15:13 bit4man joined #gluster
15:14 plarsen joined #gluster
15:15 plarsen joined #gluster
15:18 wushudoin joined #gluster
15:18 skippy in the event that anyone here is not also subscribed to gluster-users, I would appreciate any help on http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019637.html
15:27 zerick joined #gluster
15:28 soumya joined #gluster
15:36 sputnik13 joined #gluster
15:38 elico left #gluster
15:43 elico joined #gluster
16:24 Fetch joined #gluster
16:24 Fetch does anyone have docs/a link explaining if running a RAID-5 HBA on each gluster node is better, or doing 4 disks in pass-through
16:25 RameshN joined #gluster
16:30 anoopcs joined #gluster
16:34 ildefonso Fetch, this is gluster, not ZFS.  Gluster just adds replication/striping between the nodes; it is agnostic about the underlying hardware store (however, XFS seems to be the preferred filesystem).
16:40 sputnik13 joined #gluster
16:43 calisto joined #gluster
16:45 Fetch ildefonso: right, but if running 4 different drives in pass-through on gluster is considered more/less performant/reliable than presenting one device from a RAID controller, that's what I'd like to know
16:45 Fetch Ceph, for instance, prefers to have an osd per spinning disk, with no raid
16:46 Fetch but maybe gluster stream overhead is high enough that you should minimize drive daemons per node
16:47 ildefonso Fetch, gluster works over another filesystem, it doesn't directly touch the drives.
16:48 ildefonso that being said, I do not have numbers for the overhead added by gluster.
16:48 ildefonso I do know RAID5 is slow, especially when it comes to writes.
16:48 Fetch I know that :) (sorry, I have an oVirt and gluster cluster already. Trying to determine whether I should do one big raid device or pass-through on a new cluster)
16:48 ildefonso but that's just the way RAID5 works.
16:52 ildefonso Fetch, in general, I am no friend of RAID5, it is just too slow.  However, I am no friend of no redundancy either.  I allow ZFS to directly use the disks because it does its own "raid", gluster will not be doing that, you would rely on inter-node replication for redundant copies, and depending on your use case, that could be ok or not.
16:53 Fetch gluster's redundant copies would be fine
16:53 Fetch thanks
16:55 daMaestro joined #gluster
16:56 manjrem joined #gluster
16:58 RameshN joined #gluster
16:59 JoeJulian I think the confusion, there, was the term "pass-through" since that implies bypassing everything and going directly to the drive.
17:01 hagarth joined #gluster
17:02 elico joined #gluster
17:03 Fetch JoeJulian: I apologize. I'm referring to ext4 or xfs getting laid on a raid pass-through drive, and it being added as a brick to gluster
17:03 virusuy joined #gluster
17:03 virusuy joined #gluster
17:05 plarsen joined #gluster
17:06 plarsen joined #gluster
17:07 julim joined #gluster
17:13 Slash__ joined #gluster
17:21 baojg_ joined #gluster
17:22 georgeh-LT2 joined #gluster
17:35 baojg joined #gluster
17:43 hagarth joined #gluster
17:48 lalatenduM joined #gluster
18:05 JoeJulian Fetch: I think the term you're looking for is JBOD.
18:06 ildefonso JoeJulian, yeah, that's what I thought he was talking about, I only found the question really uncommon for gluster (I would expect such a question about ZFS).
18:24 diegows joined #gluster
18:32 corretico joined #gluster
18:36 sputnik13 joined #gluster
18:40 ghenry joined #gluster
18:47 sputnik13 joined #gluster
18:54 NigeyS so here's a q about adding another brick, say i have a 200gig XFS volume mounted and want to add another 200gig to all the replicas, how do i then add that to the current volume, is it as simple as it looks in the docs ?
18:55 ildefonso NigeyS, well, if you can "grow" existing bricks, just do that and gluster will pick up the new disk space.
18:57 NigeyS when you say grow.. how do you mean exactly? the current bricks are an EBS volume, so my idea was to add another volume, also 200gig, mount it as /dev/bla and create a new brick that way, is this on the right lines ?
19:01 ildefonso NigeyS, well, my particular bricks are over LVM, so, I just grew the LVM volume, and then extended the filesystem (xfs_growfs).
19:01 ildefonso after doing that on all servers, gluster volume grew as well, automagically.
19:02 ildefonso NigeyS, I know you can add more bricks, however, you need to be careful so that replication ends up in different servers.
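A sketch of the grow-in-place approach ildefonso describes, assuming each brick sits on an LVM logical volume with an XFS filesystem (device, volume group, and mount point names are hypothetical); the same steps would be repeated on every server holding a replica:

    # if the new space comes from a freshly attached disk, add it to the VG first
    pvcreate /dev/xvdg
    vgextend vg_bricks /dev/xvdg
    # grow the logical volume, then the filesystem (xfs_growfs works on a mounted fs)
    lvextend -L +200G /dev/vg_bricks/brick1
    xfs_growfs /bricks/brick1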
19:09 nshaikh joined #gluster
19:10 msmith Hey all, I realize this question may be a bit dated, but does anyone know of a reason why glusterfs 3.0 would not be affected by the 32 group limit described here http://gluster.org/pipermail/gluster-users.old/2014-May/017337.html
19:11 NigeyS ildefonso i got ya, thanks i'll do some more reading i think!
19:21 semiosis NigeyS: i replace the ebs vols with larger ones from snapshots & xfs_growfs
19:25 NigeyS semiosis i was just reading about doing that
19:26 NigeyS also your recommendation of using EBS was nice, seems to have increased the read / write speed by quite a bit compared with using the root fs
19:27 semiosis you should be using ebs-root instances!
19:28 semiosis that way you can call CreateImage to snapshot your servers
19:29 hybrid512 joined #gluster
19:29 NigeyS ebs-root? .. ill have to google that :p
19:29 semiosis and later restore them from snapshots
19:31 semiosis the AMI you're launching is either ebs-root or instance-store-root.  ubuntu for example gives you a choice of which to launch (cloud.ubuntu.com/ami)
19:32 NigeyS ah yes, ok, something ill need to look into then, i never normally deal with AWS as you can tell !
19:32 semiosis right
19:33 NigeyS Root device type
19:33 NigeyS ebs
19:33 NigeyS seems they are ebs :)
19:33 NigeyS root*
19:34 semiosis great... are they magnetic ebs or ssd ebs? ;)
19:34 NigeyS ssd
19:34 semiosis nice
19:35 NigeyS i got something right lol
19:35 NigeyS so you have your EBS bricks as root devices or block device?
19:36 semiosis both
19:36 semiosis ebs root + many extra ebs vols attached for glusterfs bricks
19:36 NigeyS i see, that's where ive gone wrong then, i only have /dev/sda as root and my ebs bricks are just blocks
19:36 NigeyS Block devices
19:37 NigeyS .. /dev/sda1
19:37 NigeyS .. /dev/sdf
19:37 semiosis that's fine
19:38 semiosis you attach the ebs vol to /dev/sdf, then format it XFS, then mount it, then use it as a glusterfs brick
19:38 NigeyS yup done that, so far so good..
19:39 NigeyS /dev/xvdf        50G   53M   50G   1% /bricks/websites
19:39 NigeyS fs01:/websites   50G   53M   50G   1% /sites
19:40 NigeyS no idea why it changed from sdf to xvdf mind, think thats an aws thing
19:40 semiosis it's a linux kernel thing
19:41 semiosis Xen Virtual Disk
19:41 semiosis instead of SCSI Disk
19:41 NigeyS ah that would explain why it didnt happen locally on a non virtual setup
19:41 semiosis yep
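Condensing the EBS-brick steps semiosis describes into one hedged sketch (device, mount point, hostnames, and volume name are hypothetical; the volume attached as /dev/sdf shows up as /dev/xvdf on these Xen-based instances, as discussed above):

    mkfs.xfs -i size=512 /dev/xvdf      # 512-byte inodes were commonly suggested for gluster xattrs
    mkdir -p /bricks/websites
    mount /dev/xvdf /bricks/websites
    gluster volume create websites replica 2 fs01:/bricks/websites fs02:/bricks/websites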
19:42 baojg joined #gluster
19:42 semiosis s/SCSI/SATA/
19:42 glusterbot What semiosis meant to say was: An error has occurred and has been logged. Please contact this bot's administrator for more information.
19:42 semiosis glusterbot: you suck
19:42 NigeyS so looks like my setup is pretty much ok, just have to figure out the permissions for that /site mount so the webservers and ftp users can get to it.
19:42 NigeyS lol
19:43 semiosis oh how about this...
19:43 semiosis An error has occurred and has been flogged. Please contact this bot's administrator for more information.
19:43 semiosis s/flogged/logged/
19:43 glusterbot What semiosis meant to say was: An error has occurred and has been logged. Please contact this bot's administrator for more information.
19:43 semiosis \o/
19:43 JoeJulian I thought flogged was better.
19:43 NigeyS lol poor bot
19:47 NigeyS JoeJulian your blog post that i read yesterday, mentions disabling stat() on php to get a performance increase .. would you recommend doing that even if i have varnish running? and afaik varnish will not do much for php right?
19:48 semiosis NigeyS: you ought to try making a snapshot of one of your glusterfs servers with Create Image (can do it from the AWS web console, right click the server) and replacing it with a new one restored from the snapshot
19:48 semiosis see ,,(replace) -- same hostname
19:48 glusterbot Useful links for replacing a failed server... if replacement server has different hostname: http://web.archive.org/web/20120508153302/http://community.gluster.org/q/a-replica-node-has-failed-completely-and-must-be-replaced-with-new-empty-hardware-how-do-i-add-the-new-hardware-and-bricks-back-into-the-replica-pair-and-begin-the-healing-process/ ... or if replacement server has same
19:48 glusterbot hostname: http://goo.gl/rem8L
19:48 semiosis it's an important exercise & one of the huge wins of running glusterfs on EC2
19:48 JoeJulian I would, yes.
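The stat() reduction in question is usually done in the PHP opcode cache settings rather than in glusterfs itself; a hedged php.ini sketch, assuming APC (the usual opcode cache on PHP 5 stacks of that era):

    apc.stat=0                  ; stop APC from stat()ing scripts on every request
    realpath_cache_size=4M      ; a larger realpath cache also cuts stat()/lstat() on the FUSE mount
    realpath_cache_ttl=600

Varnish only helps with cacheable responses; every uncached PHP request still hits the gluster mount, which is why the two optimizations are complementary.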
19:49 semiosis you'll want to tune up varnish, for example to cache static objects & pass requests for dynamic content
19:49 NigeyS semiosis i will do just that, thanks for the tip!
19:49 semiosis yw
19:50 NigeyS semiosis luckily we have someone else who is pretty good with varnish, i just have to do the initial install
19:54 Boober joined #gluster
19:54 deniszh joined #gluster
19:57 skippy I continue to have client-side ping timeouts with FUSE mounts.  If I were to switch to NFS mounts, would I be avoiding any underlying error, or just masking the symptoms?
19:58 atticus210 joined #gluster
20:04 deniszh joined #gluster
20:11 PeterA joined #gluster
20:15 semiosis skippy: most likely you'd have the same problem, since ping timeouts are almost always due to network problems
20:16 semiosis ip addr conflict, maybe?
20:16 skippy semiosis: http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019637.html
20:16 skippy 3 different Gluster clusters.  Mix of versions.  Mix of virtual and physical.
20:16 skippy If it's network, I'm having a devil of a time identifying it. :(
20:19 skippy not all clients experience problems at the same time; and clients with multiple bricks don't report consistent failures across bricks.
20:19 skippy If it were network, I would expect to see all client bricks reporting problems, I should think.
20:19 deniszh joined #gluster
20:20 semiosis i've seen things
20:20 semiosis and stuff
20:20 semiosis on networks
20:21 skippy i'm struggling with how to identify any root cause.
20:22 JoeJulian wireshark
20:23 skippy can you help me understand what I should be expecting to see from wireshark output?
20:23 JoeJulian @lucky tcp protocol
20:23 glusterbot JoeJulian: http://en.wikipedia.org/wiki/​Transmission_Control_Protocol
20:23 JoeJulian yes!
20:24 semiosis atm machine
20:24 JoeJulian I'd start there. If there's TCP problems, they'll show up pretty quickly.
20:26 semiosis skippy: are you doing any fancy networking?  bonded ethernet perhaps?
20:27 skippy nothing fancy.  vmxnet3 NIC on VM guests.  Broadcom, I think, on the physical servers.
20:28 semiosis whats the physical network topology like?
20:29 atticus210 Yo… skippy’s network guy here
20:29 semiosis atticus210: https://www.youtube.com/watch?v=DtRNg5uSKQ0
20:30 skippy should I be capturing tcpdump on the client, or on the servers?
20:30 JoeJulian When I'm diagnosing this stuff, I do both. It's nice to see if a transmit isn't received, etc.
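A hedged capture sketch for both ends (interface name and output path are hypothetical; 24007 is glusterd's management port, and bricks listen on ports from 49152 upward on 3.4+, 24009 upward on older releases):

    tcpdump -i eth0 -s 0 -w /tmp/gluster-$(hostname).pcap \
        port 24007 or portrange 49152-49200

Opening the capture in wireshark and filtering on tcp.analysis.retransmission or tcp.analysis.zero_window is a quick way to surface the kind of TCP trouble being described.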
20:30 atticus210 So we are running a converged 10G network. Blades running VMware with dual NICs, nics are uplinks in a DVSwitch. These go to a chassis switch that LAGs to a 40G core. Then, the physical Gluster boxes have a 10G uplink
20:31 atticus210 We have looked for dropped packets, re transmits, whatever we can think of and all the ports are clean
20:31 atticus210 Skippy you are only running a single NIC from each physical host as I recall, yeah?
20:32 skippy correct
20:33 atticus210 So seeing that this is converged (aka, iSCSI running next to other stuff in other VLANs) I would expect all sorts of issues on the iSCSI side if we had issues… not to mention we don’t see bad counters anywhere.
20:33 JoeJulian By the same token, if an RPC is sent from a client and the packet is received but the server doesn't respond, or vice versa, that's indicative too.
20:34 semiosis i've seen plenty of people come through here with this kind of problem and it usually boils down to some ethernet bonding issue
20:34 skippy https://gist.github.com/skpy/804007ffb631217a8e74   timestamps from failures for one volume for the last month.
20:34 semiosis the fact that you are doing link aggregation supports my theory
20:34 * semiosis called it
20:35 atticus210 What’s the base issue then? LAGs *should* be transparent
20:35 semiosis [15:27] <skippy> nothing fancy.
20:35 semiosis [15:30] <atticus210> So we are running a converged 10G network. Blades running VMware with dual NICs, nics are uplinks in a DVSwitch. These go to a chassis switch that LAGs to a 40G core. Then, the physical Gluster boxes have a 10G uplink
20:35 semiosis lol, not fancy at all, no
20:35 semiosis ;)
20:36 skippy nothing fancy from the VM guest side.
20:36 skippy no LInux-level link aggregation.
20:36 skippy anyting above that is all black magic that atticus210 handles. :)
20:36 semiosis so yeah, it should be transparent, in theory, but in practice it often isnt
20:37 atticus210 Just odd that seemingly Gluster would be sensitive to it
20:37 atticus210 At least the lcient protocol
20:37 atticus210 s/client
20:38 semiosis skippy: might be interesting to see if you turn up your ping-timeout from 5s to 10s, 15s, 20s, ... and see if there's a large enough timeout that these hiccups dont kill the client
20:39 skippy semiosis: it happens with the default 42 second timeout, too
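For reference, the timeout being discussed is a per-volume option; a sketch with a hypothetical volume name:

    gluster volume set myvol network.ping-timeout 20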
20:39 semiosis i'd also look to see if there's some pattern to when they happen.  is it periodic?  does it happen at the same times as anything else?
20:39 JoeJulian The symptoms are consistent with a network problem. I cannot remember a time when someone had those symptoms where it was not a network problem. That said, I won't dismiss the possibility that there's a bug but with the history we've seen here, we would like evidence to follow before we spend a lot of effort tracking down other possibilities.
20:39 semiosis JoeJulian++
20:39 glusterbot semiosis: JoeJulian's karma is now 16
20:40 semiosis skippy: interesting
20:40 skippy I've yet to discern a pattern between any of the clients, any of the servers, or anything else going on at the same time.
20:41 semiosis i'd look at the times of the timouts
20:42 semiosis how long between them?  how long after first mount does a timeout happen
20:43 skippy https://gist.github.com/skpy/74b046701abc81489de0  different client mounting two volumes.
20:44 skippy note that the two volumes fail at different times.
20:44 nage joined #gluster
20:44 semiosis atticus210: the feeling i'm getting is there's some kind of timeout, like in a conntrack/nat table, that isn't getting reset by ongoing glusterfs traffic
20:45 semiosis but i'm guessing it's some kind of layer 2 thing
20:45 failshell joined #gluster
20:45 semiosis idk much about LAG/bonding though
20:47 atticus210 it could, possibly, maybe, be a hash recalc on the LAG, but I would expect all sorts of things to fail all over the place were that the case. We have 100s of VMs and nary a hiccup… so i would expect something unique to the Gluster protocol that doesn’t play nice .
20:48 atticus210 Is there any sort of ARP component? (Feel free to point me to a doc if there is one)
20:48 semiosis nope
20:48 semiosis it's all pure TCP/IP
20:49 atticus210 Ok
20:50 atticus210 So a LAG shouldn’t matter unless it just absolutely fails. Layer 3 knows nothing of the LAG’s goings-on short of loss of traffic
20:51 skippy tcpdump capture initiated.  Now I just wait until a failure occurs.
20:51 atticus210 PCAP fun in the AM. I’ll bring the coffee
20:51 social joined #gluster
20:54 semiosis check this out http://pastie.org/9743312
20:54 semiosis looks like you're most likely to catch the problem between 0200 & 0300
20:54 semiosis that's hours from the log for client-0
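A rough sketch of how such an hour histogram can be pulled from a client log; the log path and grep pattern are guesses based on the default FUSE mount log naming and the usual ping-timeout message:

    grep 'has not responded' /var/log/glusterfs/sites.log \
        | awk '{print substr($2,1,2)}' | sort | uniq -c | sort -k2n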
20:55 atticus210 BTW, super appreciate your time.
20:55 skippy and Gluster helpfully logs at UTC, so that's actually ... 9 PM?
20:55 atticus210 We’ve been beating our heads on this for a while
20:55 semiosis yw
20:56 atticus210 We have world domination planned based on GLuster… sooooo, you know.
20:56 semiosis there's a great slope down both sides from 2000
20:56 semiosis idk if that means anything
20:57 skippy it's helpful, semiosis, but not conclusive, since errors occur at other times, too
20:57 atticus210 My first suspicion was snapshots… but the times never lined up.
20:57 semiosis what happens a lot at 0200 that doesnt happen at all between 0700 & 1200
20:58 semiosis if this turns out to be peak/off-peak traffic times, then that might suggest it's a load problem
20:58 semiosis have you tried saturating your network with gluster traffic?
20:58 semiosis to trigger the problem?
20:58 skippy we have not.
20:59 atticus210 That’s pretty difficult at 40G
20:59 semiosis you mean fun
20:59 atticus210 Tomato
20:59 NigeyS 40G .. mm.. yeah that's definitely fun ;)
21:00 SOLDIERz joined #gluster
21:03 gildub joined #gluster
21:03 atticus210 We’ll check the PCAP. Really really interested to see how this pans out.
21:03 semiosis keep us posted
21:04 atticus210 Will do. Thanks again
21:04 skippy thanks for the help.  Much appreciated.
21:04 semiosis yw
21:04 atticus210 And really, if something is fiddly in my network, i want to find it.
21:12 elico joined #gluster
21:26 badone joined #gluster
21:27 NigeyS semiosis should i be concerned that these have appeared? http://pastie.org/9743372 nothing seems out of the ordinary, the web page loaded, and it's on both servers...
21:29 badone_ joined #gluster
21:30 baojg joined #gluster
21:31 semiosis NigeyS: any time you see "remote operation failed" in a client log you should look for a corresponding entry in the brick logs, the bricks are the other end of the remote operation
21:31 semiosis @remote operation
21:31 glusterbot semiosis: I do not know about 'remote operation', but I do know about these similar topics: 'remote operation failed'
21:31 semiosis @remote operation failed
21:31 glusterbot semiosis: any time you see 'remote operation failed' in a client (or shd, or nfs) log file, you should look in the brick log files for a corresponding entry, that would be the other end of the remote operation
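A quick way to do that correlation, assuming default log locations (both paths and the timestamp fragment are hypothetical; brick logs live under /var/log/glusterfs/bricks/ named after the brick path):

    # client side: note the timestamp of the failed operation
    grep 'remote operation failed' /var/log/glusterfs/sites.log
    # brick side: look for entries at the same second
    grep '2014-11-25 21:2' /var/log/glusterfs/bricks/bricks-websites.log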
21:31 semiosis i thought that sounded familiar after i said it
21:31 semiosis lol
21:31 NigeyS lol, ill check the other logs then, i did find this though..
21:32 NigeyS https://bugzilla.redhat.com/show_bug.cgi?id=1144527
21:32 glusterbot Bug 1144527: unspecified, unspecified, ---, vbellur, POST , log files get flooded when removexattr() can't find a specified key or value
21:32 semiosis remote operation failed: No data available
21:33 semiosis hey JoeJulian can we get that on a trigger whenever someone pastes a log entry with 'remote operation failed'
21:33 NigeyS its weird though, everything seems to be working fine, page is being served etc
21:35 andreask joined #gluster
21:36 NigeyS http://pastie.org/9743396  that's all there is in the bricks log for those operations
21:37 semiosis weird
21:37 semiosis what distro is this?
21:37 semiosis glusterfs version?
21:39 NigeyS 3.4.5 ubuntu 14.04
21:41 semiosis just a wild guess but maybe that xattr is for a feature ubuntu lacks (http://sourceforge.net/p/linux-ima/wiki/Home/)
21:42 JoeJulian What would be really cool is if I had the time to make glusterbot go to the pastebin page and scan for known issues...
21:42 JoeJulian Or at least offer hints based on common problems.
21:42 NigeyS semiosis oki if thats the case i can relax a little i guess..
21:42 semiosis http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/10000/8000/200/18291/18291.strip.zoom.gif
21:43 semiosis JoeJulian: ^^
21:43 JoeJulian :)
21:53 plarsen joined #gluster
21:54 tdasilva joined #gluster
22:05 plarsen joined #gluster
22:08 firemanxbr joined #gluster
22:08 baoboa joined #gluster
22:14 meghanam joined #gluster
22:15 meghanam_ joined #gluster
22:24 avati joined #gluster
22:33 devilspgd joined #gluster
22:33 PeterA joined #gluster
22:50 msmith_ joined #gluster
22:52 uebera|| joined #gluster
22:58 coredump guys
22:59 devilspgd Okay, total n00b question, but how much performance difference would one expect using nfs vs gluster's native client?
22:59 devilspgd Workload is a smallish web server, tons of PHP.
23:04 marcoceppi_ joined #gluster
23:09 jmarley joined #gluster
23:09 semiosis devilspgd: impossible to answer that question
23:09 devilspgd Fair enough :)
23:10 devilspgd I realize it's a total "how long is a string"
23:20 sputnik13 joined #gluster
23:32 elico joined #gluster
23:49 n-st joined #gluster
23:57 theron joined #gluster
