
IRC log for #gluster, 2017-11-20


All times shown according to UTC.

Time Nick Message
00:53 protoporpoise Picking anyone's brains here: if I'm trying to achieve 'decent' write IOP/s on a 3-node gluster cluster where 1 is just an arbiter and each of the nodes has fast SSD storage, is there a good up-to-date tuning guide for this? Most of the ones I've found are either for people still using spinning rust or for deployments across a great number of hosts.
01:51 protoporpoise The reason I ask: each gluster host has storage that performs at around 60,000 IOP/s for random 4k reads or writes, but once in a gluster replica, clients are maxing out at around 600 IOP/s! (10GbE between them, very low latency, 16 IO threads etc...)
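[Note: the "16 IO threads" above corresponds to ordinary volume options; a minimal sketch of the threading knobs that usually come up in this kind of SSD tuning discussion, assuming a hypothetical volume named "myvol" and purely illustrative values:]
    # values are examples only, not a recommendation from the channel
    gluster volume set myvol performance.io-thread-count 16
    gluster volume set myvol server.event-threads 4
    gluster volume set myvol client.event-threads 4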
02:01 ilbot3 joined #gluster
02:01 Topic for #gluster is now Gluster Community - https://www.gluster.org | Documentation - https://gluster.readthedocs.io/en/latest/ | Patches - https://review.gluster.org/ | Developers go to #gluster-dev | Channel Logs - https://botbot.me/freenode/gluster/ & http://irclog.perlgeek.de/gluster/
02:11 prasanth joined #gluster
02:17 boutcheee520 joined #gluster
02:55 ilbot3 joined #gluster
02:55 Topic for #gluster is now Gluster Community - https://www.gluster.org | Documentation - https://gluster.readthedocs.io/en/latest/ | Patches - https://review.gluster.org/ | Developers go to #gluster-dev | Channel Logs - https://botbot.me/freenode/gluster/ & http://irclog.perlgeek.de/gluster/
03:26 nbalacha joined #gluster
03:30 boutcheee520 joined #gluster
03:44 gyadav joined #gluster
03:45 psony joined #gluster
04:08 buvanesh_kumar joined #gluster
04:15 vbellur joined #gluster
04:26 ppai joined #gluster
04:35 sanoj joined #gluster
04:41 itisravi joined #gluster
04:58 ndarshan joined #gluster
05:01 karthik_us joined #gluster
05:02 skumar joined #gluster
05:13 kramdoss_ joined #gluster
05:15 vishnu_kunda joined #gluster
05:16 Prasad_ joined #gluster
05:17 int-0x21 protoporpoise, I also have had a lot of issues with that, what filesystem do you have on the bricks?
05:20 prasanth joined #gluster
05:21 protoporpoise xfs
05:21 protoporpoise w/ -i 512
05:21 protoporpoise etc..
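[Note: "-i 512" is shorthand for the XFS inode size often recommended for gluster bricks, so extended attributes fit inline; a minimal sketch of formatting a brick that way, with the device and mount point as hypothetical placeholders:]
    # 512-byte inodes on the brick device (placeholder /dev/sdb1)
    mkfs.xfs -f -i size=512 /dev/sdb1
    # mount it where the brick directory will live (placeholder path)
    mount -o noatime /dev/sdb1 /data/brick1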
05:23 zerick joined #gluster
05:29 gbox protoporpoise, int-0x21: Did you check out this tuning guide?  https://people.redhat.com/dblack/summit2017/rh-summit-2017-dblack-bturner-gluster-architecture-and-performance-20170502-final-with-101.pdf
05:36 apandey joined #gluster
05:37 mlg9000_1 joined #gluster
05:40 protoporpoise thanks gbox, yeah that's very high level / pretty generic tuning though, it's still not /really/ talking about iops.
05:41 gbox protoporpoise:  Yeah I get your point.  You two have put a lot of effort into this recently.  I'd hoped to hear better results.
05:41 gbox What performance hit does gluster have at its most basic?  Like a 1 brick distributed volume?
05:42 sanoj joined #gluster
05:42 rastar joined #gluster
05:42 gbox So it's just the DHT on top of the brick.  The hash will always send the data to that brick.  Mounted locally, what IOPS do you get?
05:42 protoporpoise e.g. if your files are between 64K and 4MB, with a lot of files around 2MB and a lot of files around 512K, your two gluster server nodes can easily achieve 60,000 random 4k write IOP/s each and over 400MB/s write or read, but on the most simple replicated volume with standard tuning recommendations, we realistically get between 400-900 IOP/s on 4k random writes at most.
05:43 protoporpoise the problem with that is, if some app goes and does something silly like a recursive chown/chmod of hundreds of thousands of files, all that small IO is a lot slower
05:43 protoporpoise The same recursive chown (which they shouldn't be doing, but that's another thing) under NFS takes about 20-45 seconds; on gluster it takes around 30-40 minutes
05:44 gbox Yeah those standard POSIX procedures seem to kill gluster
05:44 protoporpoise yep, even with stat prefetch etc...
05:44 gbox Wow yeah that's the kernel NFS yes?  I guess 20+ years of tuning has made that fairly fast
05:45 protoporpoise yeah
05:45 protoporpoise https://paste.fedoraproject.org/paste/JC1m9d0LGmhMvAahtD67cA
05:45 glusterbot Title: gluster - Modern Paste (at paste.fedoraproject.org)
05:45 protoporpoise there's an example of a simple volume config
05:45 protoporpoise gluster 3.12.2
05:45 gbox I switched from kernel NFS because parallel writes would take down the OS
05:45 protoporpoise the 3rd node is just an arbiter
05:45 gbox But there might have been a fix
05:46 gbox So the replication (AFR) causes the performance hit?  Distribution (DHT) works fine?
05:47 protoporpoise AFR as in gluster server <-> gluster server?
05:47 protoporpoise I think to get a real answer on that I need to learn how to profile the volumes
05:48 gbox Yeah AFR is the replication layer
05:48 protoporpoise I'll have another read (it's been a year or so) of https://docs.gluster.org/en/latest/Administrator%20Guide/Monitoring%20Workload/ and give it a try tomorrow with some FIO tests
05:48 glusterbot Title: Monitoring Workload - Gluster Docs (at docs.gluster.org)
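[Note: the profiling referred to here is the built-in volume profiler from the linked Monitoring Workload guide; a minimal sketch, assuming a hypothetical volume named "myvol":]
    # start collecting per-brick FOP latency statistics
    gluster volume profile myvol start
    # run the workload (e.g. the fio job), then dump the numbers
    gluster volume profile myvol info
    # stop profiling when finished
    gluster volume profile myvol stop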
05:49 gbox What's weird is I can see replicated writes going out to all 3 servers simultaneously.  You're using the fuse client?
05:49 protoporpoise yeah the native kubernetes connector
05:49 protoporpoise is it not meant to?
05:49 protoporpoise maybe I'm misunderstanding something and have it configured wrong...
05:51 gbox Well I can believe the IOPS is that low.  I've never seen IOPS as high as you see for your raw NVMe.
05:51 protoporpoise really??
05:51 protoporpoise we easily get over 60K write IOP/s on SATA not even nvme
05:51 protoporpoise lol
05:51 gbox Ha, yeah I have crap hardware
05:51 protoporpoise :(
05:52 gbox I just use fio though.  How are you measuring IOPS?
05:53 protoporpoise I just did a quick FIO run on an old NFS client we have running old debian, old kernel etc.... back to an old NFS server and it easily gets 8000 random 4k write IOP/s via NFS
05:53 protoporpoise fio --time_based --name=benchmark --size=5G --runtime=30s --filename=fio_write_iops --ioengine=libaio --randrepeat=0 --iodepth=64 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4 --rw=randwrite --blocksize=4k --group_reporting
05:53 protoporpoise https://smcleod.net/tech/benchmarking-io/
05:53 glusterbot Title: smcleod.net | Benchmarking IO with FIO (at smcleod.net)
05:53 protoporpoise ^^ my craptastic blog
05:54 gbox Maybe I misremember but I always feel let down by these tests
05:54 protoporpoise here's a really old post on a lot older storage hardware we had: https://smcleod.net/tech/scsi-benchmarking/
05:55 glusterbot Title: smcleod.net | iSCSI Benchmarking (at smcleod.net)
05:55 gbox Thanks, I'm going to try some stuff like this in the next couple of weeks.
05:56 protoporpoise note that the storage units provide around 4,000,000 4K random write IOP/s per two-node cluster, and each of the gluster VMs is running off separate storage etc... so it's 100% not a storage issue ;)
05:56 protoporpoise https://vimeo.com/154701062
05:56 protoporpoise ^ Rebuilding RAID arrays at 8000MB/s
05:57 gbox Your cache sizes seem low
05:57 protoporpoise and the 2-3 year old design at 2M IOP/s - https://vimeo.com/137813890
05:57 gbox Ha, 8000MB/s!
05:57 protoporpoise thats read cache though right?
05:57 gbox Your pastebin config
05:58 protoporpoise performance.cache-size: 256MB ?
05:58 gbox Well like the write-behind-window-size
05:58 protoporpoise oh right, I thought I was being cheeky setting it that high
05:58 protoporpoise you say push it further ?
05:59 gbox Ha, I have one volume at 1GB write-behind-window-size
05:59 gbox No idea if that actually helps or not!
05:59 protoporpoise OoO.... will try
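[Note: the knob being discussed is a normal volume option; a minimal sketch of trying the 512MB value mentioned a few lines below, assuming a hypothetical volume named "myvol":]
    gluster volume set myvol performance.write-behind-window-size 512MB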
06:00 gbox I think the bigger issue is why does an application chown data?
06:00 protoporpoise btw, just tested that old-as-anything NFS client/server: it gets 18K random 4K read IOP/s back to a single VM's disk; gluster under the same test, but on a lot newer hardware, OS, kernel etc... gets 4000 :(
06:00 protoporpoise hahaha I know
06:01 protoporpoise but sometimes you can't control what someone might do by mistake
06:01 protoporpoise devs fixed that app
06:01 protoporpoise but let's say a virus scanner had to be run
06:01 protoporpoise and it couldn't always use inotifyd
06:01 protoporpoise or let's say, restoring a lot of small files from backups
06:01 gbox Yeah that would take forever
06:01 protoporpoise yeah
06:02 protoporpoise going to quickly try with the write-behind window size set to 512MB
06:03 protoporpoise then i should go home
06:03 gbox I think it's worth messing with those gluster config parameters, as well as the OS.  OTOH maybe replication only is just gonna be slow
06:03 gbox It seems like a lot of solutions get their speed by splitting up the data using distribution
06:04 protoporpoise wonder if I could make it write to only one node at a time and let replication handle the cross node traffic
06:04 gbox the fuse client (mount -t glusterfs) should make all writes go to all nodes simultaneously
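[Note: a minimal sketch of the fuse mount gbox refers to, with server, volume, and mount-point names as hypothetical placeholders; backup-volfile-servers only affects volfile fetching at mount time, not the data path:]
    mount -t glusterfs server1:/myvol /mnt/myvol \
        -o backup-volfile-servers=server2:server3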
06:05 protoporpoise no control over mount options with the gluster kubes connector :(
06:06 protoporpoise no difference to random write IOPS from setting the write-behind window size up
06:06 protoporpoise wasn't overly scientific so I didn't have my hopes up
06:06 protoporpoise but hey
06:06 gbox Yeah the kubes element seems confusing
06:06 gbox That's just for the containers?
06:06 protoporpoise container orchestration
06:06 protoporpoise https://github.com/kubernetes/examples/blob/master/staging/volumes/glusterfs/README.md
06:07 glusterbot Title: examples/README.md at master · kubernetes/examples · GitHub (at github.com)
06:07 protoporpoise actually: we get the same performance using the mount command anyway
06:08 gbox yeah it's just doing a fuse mount
06:10 gbox I have a similar test to do soon & I'll let you know.  See ya later!
06:10 protoporpoise just reading https://events.linuxfoundation.org/sites/events/files/slides/Gluster_DirPerf_Vault2017_0.pdf
06:12 protoporpoise I wish parallel-readdir didn't completely stuff everything up (see https://bugzilla.redhat.com/show_bug.cgi?id=1512371)
06:13 glusterbot Bug 1512371: high, unspecified, ---, pgurusid, POST , parallel-readdir = TRUE prevents directories listing
06:14 susant joined #gluster
06:16 ilbot3 joined #gluster
06:16 Topic for #gluster is now Gluster Community - https://www.gluster.org | Documentation - https://gluster.readthedocs.io/en/latest/ | Patches - https://review.gluster.org/ | Developers go to #gluster-dev | Channel Logs - https://botbot.me/freenode/gluster/ & http://irclog.perlgeek.de/gluster/
06:27 xavih joined #gluster
06:33 protoporpoise write: io=187176KB, bw=6238.6KB/s, iops=1559, runt= 30003msec
06:33 protoporpoise @gbox, enabling cache invalidation and setting cache timeouts to 600 got us up to 1559 4k random write IOPS
06:33 protoporpoise obv more cpu and ram on the gluster hosts
06:33 protoporpoise but it's a start
06:34 gbox Ha, cool.  I didn't think cache invalidation would do much for you
06:35 protoporpoise neither did i
06:35 protoporpoise might be a side effect
06:35 protoporpoise https://paste.fedoraproject.org/paste/2G5MUT1BQs7Au2bQUw39ew
06:35 gbox That's a short-term bug though with parallel-readdir, yes?  You submitted a bug report?
06:35 glusterbot Title: cache invalidation - Modern Paste (at paste.fedoraproject.org)
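[Note: the "cache invalidation ... timeouts to 600" settings are typically the md-cache/upcall options below; a minimal sketch assuming a hypothetical volume named "myvol", with the values taken from the discussion:]
    gluster volume set myvol features.cache-invalidation on
    gluster volume set myvol features.cache-invalidation-timeout 600
    gluster volume set myvol performance.cache-invalidation on
    gluster volume set myvol performance.md-cache-timeout 600
    gluster volume set myvol performance.stat-prefetch on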
06:35 protoporpoise yep that was a link to the bugzilla repo
06:35 protoporpoise rt
06:35 protoporpoise can't enable it if you have an arbiter node
06:35 protoporpoise lol
06:36 gbox I think brick multiplexing would actually slow you down with so few nodes.  The RHGS documentation has an interesting write-up on that
06:36 protoporpoise orly
06:37 protoporpoise I have to go home now
06:37 protoporpoise been working too long
06:37 int-0x21 Hi sorry was in another window
06:38 int-0x21 I managed to get iops up to a decent level with rdma
06:39 int-0x21 But I do have quite a few issues with gluster block and ganesha; vmware is getting highly confused all the time
06:45 jiffin joined #gluster
06:47 ppai joined #gluster
06:50 int-0x21 That cache invalidation solved one issue for me! (the nfs-ganesha export can now read directories .. :) )
06:50 int-0x21 Now I just have to figure out why vmware thinks it's not the same cluster
06:50 int-0x21 Multiple clusters detected for the multipath - Possible misconfiguration. Current cluster: 0x430cb6a9fe10, New cluster: 0x430cb6aa1050
07:01 vishnu_sampath joined #gluster
07:06 int-0x21 Or not... now the error is back. I'm starting to think vmware hates everything that isn't VSA
07:07 jtux joined #gluster
07:15 jkroon joined #gluster
07:18 ompragash joined #gluster
07:18 mbukatov joined #gluster
07:50 ppai joined #gluster
07:56 sahina joined #gluster
08:00 Humble joined #gluster
08:02 ivan_rossi joined #gluster
08:06 Humble joined #gluster
08:08 zcourts joined #gluster
08:09 Humble joined #gluster
08:11 Humble joined #gluster
08:12 apandey joined #gluster
08:19 fsimonce joined #gluster
08:27 ompragash joined #gluster
08:39 poornima_ joined #gluster
08:47 nbalacha joined #gluster
09:11 [diablo] joined #gluster
09:17 kramdoss_ joined #gluster
09:24 ompragash joined #gluster
09:34 nbalacha joined #gluster
09:39 prasanth joined #gluster
09:55 Saravanakmr joined #gluster
10:01 MrAbaddon joined #gluster
10:05 kramdoss_ joined #gluster
10:06 creshal joined #gluster
10:13 ndevos protoporpoise++ many thanks for doing the presentation last week!
10:13 glusterbot ndevos: protoporpoise's karma is now 1
10:15 apandey joined #gluster
10:33 itisravi joined #gluster
10:39 major joined #gluster
10:42 apandey joined #gluster
11:28 ivan_rossi left #gluster
11:29 jtux joined #gluster
11:34 shyam joined #gluster
11:34 Wizek_ joined #gluster
12:09 shortdudey123 joined #gluster
12:12 MrAbaddon joined #gluster
12:27 nbalacha joined #gluster
12:41 psony joined #gluster
12:47 ompragash joined #gluster
12:53 Wizek_ joined #gluster
12:56 phlogistonjohn joined #gluster
13:02 kramdoss_ joined #gluster
13:06 Wizek_ joined #gluster
13:13 DV__ joined #gluster
13:14 Wizek_ joined #gluster
13:23 nbalacha joined #gluster
13:46 shyam joined #gluster
13:58 buvanesh_kumar joined #gluster
14:04 msvbhat joined #gluster
14:17 plarsen joined #gluster
14:18 shyam joined #gluster
14:19 dominicpg joined #gluster
14:21 DV joined #gluster
14:21 phlogistonjohn joined #gluster
14:23 marlinc joined #gluster
14:30 shyam joined #gluster
14:43 jstrunk joined #gluster
14:44 ndarshan joined #gluster
14:50 farhorizon joined #gluster
14:53 farhorizon joined #gluster
15:01 skylar1 joined #gluster
15:12 gyadav joined #gluster
15:13 rwheeler joined #gluster
15:15 deniszh joined #gluster
15:21 hmamtora joined #gluster
15:22 hmamtora_ joined #gluster
15:30 timotheus1_ joined #gluster
15:39 jiffin joined #gluster
15:40 buvanesh_kumar joined #gluster
15:53 kpease joined #gluster
15:55 zcourts joined #gluster
16:10 zcourts joined #gluster
16:11 ndarshan joined #gluster
16:13 vbellur joined #gluster
16:14 vbellur joined #gluster
16:14 vbellur joined #gluster
16:19 zcourts joined #gluster
16:19 vbellur joined #gluster
16:20 vbellur joined #gluster
16:20 zcourts_ joined #gluster
16:21 zcourts_ joined #gluster
16:22 vbellur joined #gluster
16:23 vbellur joined #gluster
16:23 farhoriz_ joined #gluster
16:25 vbellur joined #gluster
16:25 kjackal_bot joined #gluster
16:26 vbellur joined #gluster
16:31 gyadav joined #gluster
16:36 MrAbaddon joined #gluster
16:58 saali joined #gluster
17:17 jiffin joined #gluster
17:24 msvbhat joined #gluster
17:32 pladd joined #gluster
17:33 zcourts joined #gluster
17:37 zcourts joined #gluster
17:39 vbellur joined #gluster
17:40 vbellur joined #gluster
17:40 vbellur joined #gluster
17:42 vbellur joined #gluster
17:42 vbellur joined #gluster
17:43 vbellur joined #gluster
17:47 vbellur joined #gluster
17:48 vbellur joined #gluster
17:48 vbellur1 joined #gluster
17:49 vbellur joined #gluster
17:50 vbellur joined #gluster
17:50 wushudoin joined #gluster
17:53 buvanesh_kumar joined #gluster
17:55 arpu joined #gluster
18:07 jkroon joined #gluster
18:37 map1541 joined #gluster
18:41 pladd joined #gluster
18:47 deniszh left #gluster
19:41 bueschi joined #gluster
19:42 msvbhat joined #gluster
19:43 devyani7 joined #gluster
19:52 plarsen joined #gluster
19:58 plarsen joined #gluster
20:00 deniszh joined #gluster
20:03 MrAbaddon joined #gluster
20:16 baber joined #gluster
20:45 protoporpoise ndevos: np - gluster has such a large set of implementation options that it was hard to cover just enough to get people interested in the time lol
20:51 mallorn1 We rebooted one of our nodes in a distributed-disperse set, and now the bricks aren't showing up as available.  'gluster volume list' returns nothing, and 'gluster pool list' only shows localhost.  Any ideas?
20:56 mallorn1 State is "Peer Rejected (Connected)"
21:09 msvbhat joined #gluster
21:13 farhorizon joined #gluster
21:15 mallorn1 Got it figured out.
21:22 Chewi joined #gluster
21:22 Chewi left #gluster
21:48 zcourts joined #gluster
22:11 protoporpoise ndevos: any chance we could get the 3.12.3 packages out?
22:11 farhorizon joined #gluster
22:50 protoporpoise emailed him as I'm in Australia and he's far far away (and busy)
22:51 Wizek_ joined #gluster
22:58 nirokato joined #gluster
23:04 glisignoli Is it possible to replace a failed brick in a distributed-replica volume with a new brick that will be mounted in the same place as the old brick? E.g., replace failed brick /mnt/lv6/brick6 with new brick /mnt/lv6/brick6
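[Note: one commonly documented way to reuse the same brick path after a disk failure in this 3.x era is the reset-brick command; a minimal sketch assuming a hypothetical volume "myvol" and host "host1", with the brick path from the question:]
    # take the dead brick offline
    gluster volume reset-brick myvol host1:/mnt/lv6/brick6 start
    # after recreating the filesystem/directory on the replacement disk,
    # bring the same path back and let self-heal repopulate it from the replica
    gluster volume reset-brick myvol host1:/mnt/lv6/brick6 host1:/mnt/lv6/brick6 commit force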
23:07 msvbhat joined #gluster
23:09 Wizek_ joined #gluster
23:24 Wizek__ joined #gluster
23:39 zcourts_ joined #gluster
23:46 vbellur joined #gluster
23:47 vbellur joined #gluster
23:48 vbellur joined #gluster
23:49 vbellur1 joined #gluster
