IRC log for #gluster, 2013-05-10

All times shown according to UTC.

Time Nick Message
00:03 vex anyone know if you can rename a gluster peer once it's been connected/created ?
00:04 yinyin joined #gluster
00:26 a2_ Supermathie, ping?
00:26 JoeJulian vex: you should be able to just probe it by the new name, unless it has bricks assigned to it.
00:27 vex JoeJulian: it does have bricks assigned.
00:27 vex i'll find another way to do it.
00:28 JoeJulian With down time, you can edit files under /var/lib/glusterd. Just make sure the changes are synced to all the peers.
00:30 nueces joined #gluster
00:35 vex JoeJulian: ah. neat :)
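
A rough sketch of the two approaches JoeJulian describes above, assuming hypothetical names oldname/newname and a SysV-style init script; treat it as an outline rather than a tested procedure:

    # peer has no bricks yet: probe it by its new name from another node in the pool
    gluster peer probe newname.example.com
    gluster peer status

    # peer already has bricks: schedule downtime, stop glusterd on every node,
    # rewrite the old name under /var/lib/glusterd on every peer, then restart
    service glusterd stop
    grep -rl oldname.example.com /var/lib/glusterd
    # edit each listed file to use the new name, and repeat on all peers
    service glusterd start
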
00:49 bala joined #gluster
00:59 hagarth joined #gluster
01:38 lkthomas does english linux AIO will improve performance ?
01:38 lkthomas enable*
01:44 hagarth lkthomas: largely no. It is better to turn it off as it can cause other problems for regular workloads.
01:45 lkthomas but why is it included on latest release
01:48 hagarth lkthomas: it is an option that is maturing, not recommended to be used right now.
01:48 lkthomas ok
02:02 lkthomas I am doing local file copy from normal linux file system to gluster mount point
02:03 lkthomas CPU usage is 100% all the time
02:03 lkthomas any way to lower the CPU usage ?
02:20 vex missing 'option transport-type'. defaulting to "socket" - Any clue to what that means?
02:22 bharata joined #gluster
02:31 lkthomas I think socket means localhost
02:31 lkthomas does socket supported ?
02:48 harish joined #gluster
02:52 lkthomas I think it doesn't make sense to have TCP transport on local disk to disk replica
02:53 sgowda joined #gluster
03:25 JosephWHK joined #gluster
03:52 shylesh joined #gluster
03:52 sgowda joined #gluster
03:54 robert7811 joined #gluster
04:05 lkthomas guys, any protection being done on striped pool ?
04:09 lkthomas actually it's kind of stupid with distributed vol without adding reliability feature
04:14 saurabh joined #gluster
04:31 deepakcs joined #gluster
04:41 satheesh joined #gluster
04:44 bala joined #gluster
04:44 shireesh joined #gluster
04:46 vpshastry joined #gluster
04:57 piotrektt_ joined #gluster
04:59 shylesh joined #gluster
05:13 bala joined #gluster
05:20 rotbeard joined #gluster
05:24 koubas joined #gluster
05:35 rastar joined #gluster
05:37 moli joined #gluster
05:45 raghu joined #gluster
05:57 codex joined #gluster
06:00 aravindavk joined #gluster
06:02 rgustafs joined #gluster
06:02 msmith__ joined #gluster
06:03 bala joined #gluster
06:04 rastar joined #gluster
06:07 JosephWHK what about using replica with striped ?
06:08 jiku joined #gluster
06:11 satheesh joined #gluster
06:12 JoeJulian lkthomas: you can stripe+replicate, but be sure to read about ,,(stripe) before you consider using it
06:12 glusterbot lkthomas: Please see http://goo.gl/5ohqd about stripe volumes.
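
For context, a minimal sketch of the striped-replicated layout JoeJulian mentions, with hypothetical hosts server1-4; as the linked note warns, read up on stripe volumes before using them:

    gluster volume create testvol stripe 2 replica 2 transport tcp \
        server1:/export/brick server2:/export/brick \
        server3:/export/brick server4:/export/brick
    gluster volume start testvol
    gluster volume info testvol
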
06:16 lalatenduM joined #gluster
06:17 lkthomas JoeJulian: basically it's RAID10 concept
06:18 JoeJulian Except it's not.
06:18 lkthomas yet you are getting 50% of the capacity
06:18 JoeJulian raid doesn't translate to clustered filesystems. Best not to try.
06:19 lkthomas the HDFS convert data into objects and distribute across cluster
06:19 vex joined #gluster
06:19 lkthomas looks like gluster have long way to go
06:19 jtux joined #gluster
06:19 JoeJulian Doesn't everything?
06:20 lkthomas 3.3 have object storage concept
06:20 JoeJulian If that's a question, yes.
06:21 lkthomas actually I am little bit confuse of what it could do
06:21 JoeJulian Are you familiar with swift or S3?
06:21 lkthomas I use S3
06:21 lkthomas so what are you trying to say ?
06:21 JoeJulian That's what it is, essentially.
06:21 lkthomas no, it doesn't explain
06:22 lkthomas OH
06:22 lkthomas I start to understand now
06:22 lkthomas container
06:22 lkthomas and use curl to put file into storage
06:23 JoeJulian I could tell that bulb was about to light up... :D
06:23 lkthomas I don't use such method to transfer file, so no
06:24 JoeJulian But it's called Unified File and Object store (UFO) because you can use it as both an object store (a la S3) and also as a file store (via fuse or nfs mount) and have access to the same files. It's a unified system.
06:24 lkthomas understandable but not useful or meaningful for me
06:25 ricky-ticky joined #gluster
06:28 lkthomas object storage is good for long term storage
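
To make the UFO point concrete, a hedged sketch of the usual Swift-style flow (tempauth v1.0, hypothetical host, account and credentials; the port and auth setup depend entirely on how UFO was configured):

    # get a token and storage URL for the volume-backed account
    curl -s -i -H 'X-Storage-User: myvolume:admin' -H 'X-Storage-Pass: secret' \
        http://gluster-host:8080/auth/v1.0
    # create a container, then PUT a file into it using the returned token and URL
    curl -X PUT -H "X-Auth-Token: $TOKEN" "$STORAGE_URL/mycontainer"
    curl -X PUT -H "X-Auth-Token: $TOKEN" -T ./localfile "$STORAGE_URL/mycontainer/localfile"
    # the same object then shows up as a regular file through a fuse or nfs mount of the volume
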
06:31 lkthomas JoeJulian: are you an experienced gluster user ?
06:31 satheesh joined #gluster
06:31 JoeJulian You could say that.
06:31 lkthomas JoeJulian: I am using gluster server and client in the same server for local disk to disk replication
06:32 lkthomas JoeJulian: the CPU usage going so high that server response keep lagging
06:32 lkthomas I am wondering what I should do to improve that, backend storage using XFS
06:33 JoeJulian Honestly, if it were me doing that, I would use raid. There's no need for a clustered filesystem on one machine.
06:33 lkthomas I want to keep it simple
06:33 JoeJulian And raid would mirror drives much more efficiently.
06:34 vex joined #gluster
06:34 vex joined #gluster
06:34 JoeJulian Precisely. Keep it simple.
06:34 lkthomas well
06:34 lkthomas any way to improve this ?
06:36 JoeJulian Get faster processors. Get storage controllers that don't use cpu. Get more/faster ram? Maybe even faster hard drives. It all depends on where the bottleneck is.
06:36 lkthomas so nothing I could tune on gluster level ?
06:36 JoeJulian Maybe, but not usually.
06:37 lkthomas ok
06:37 JoeJulian Are you getting in to swap? Maybe reduce the cache size. Maybe try disabling various performance translators and see if that makes any difference.
06:37 JoeJulian But usually the stock glusterfs settings are optimal.
06:40 JoeJulian Now, that said, I set performance.cache-size to 8MB because I run 60 bricks per server and that uses up almost all the ram on that machine but keeps it out of swap.
06:40 lkthomas what is the default size ?
06:40 lkthomas and no it's not swapping
06:41 JoeJulian 32Mb I think...
06:41 lkthomas do you think I should down tune IO threads? because I am using 2 x SATA disk
06:42 JoeJulian But I'm pretty sure that number's used for multiple caches, so it's like 32Mb for 4 different caches.
06:43 JoeJulian you might try switching to the deadline i/o scheduler
06:43 lkthomas OH ?
06:45 lkthomas disk queue is running deadline now
06:46 JoeJulian and yeah... try playing with thread count. it would be interesting to compare various settings.
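
A hedged summary of the knobs discussed above, assuming a volume named myvol and a brick disk sda; the defaults are usually sensible, so measure before and after each change:

    # shrink or grow the io-cache (JoeJulian runs 8MB with 60 bricks per server)
    gluster volume set myvol performance.cache-size 32MB
    # experiment with the io-threads count
    gluster volume set myvol performance.io-thread-count 8
    # try disabling individual performance translators to see if they matter
    gluster volume set myvol performance.read-ahead off
    # switch the brick disk to the deadline i/o scheduler
    echo deadline > /sys/block/sda/queue/scheduler
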
06:48 vimal joined #gluster
06:48 vex joined #gluster
06:48 vex joined #gluster
06:56 dobber_ joined #gluster
07:03 bulde joined #gluster
07:03 ctria joined #gluster
07:03 premera_q joined #gluster
07:05 ekuric joined #gluster
07:12 shireesh joined #gluster
07:23 ngoswami joined #gluster
07:23 kore_ joined #gluster
07:24 kore_ left #gluster
07:33 andreask joined #gluster
07:43 rgustafs joined #gluster
07:48 brunoleon_ joined #gluster
08:04 manik joined #gluster
08:07 glusterbot New news from resolvedglusterbugs: [Bug 847843] [FEAT] Improving visibility of geo-replication session <http://goo.gl/frPzM>
08:09 glusterbot New news from newglusterbugs: [Bug 947774] [FEAT] Display additional information when geo-replication status command is executed <http://goo.gl/Bpg3O>
08:13 mgebbe joined #gluster
08:15 hagarth joined #gluster
08:16 rgustafs joined #gluster
08:28 alex88 left #gluster
08:40 frakt_ joined #gluster
08:41 mgebbe__ joined #gluster
08:48 mjrosenb joined #gluster
08:49 moli joined #gluster
08:53 chlunde_ joined #gluster
08:53 mgebbe joined #gluster
08:53 ingard__ joined #gluster
08:54 avati joined #gluster
08:54 Ramereth|home joined #gluster
09:00 xavih joined #gluster
09:00 mgebbe__ joined #gluster
09:00 shylesh joined #gluster
09:00 fidevo joined #gluster
09:00 kevein joined #gluster
09:00 hchiramm_ joined #gluster
09:00 ingard_ joined #gluster
09:00 avati_ joined #gluster
09:06 jiku joined #gluster
09:09 glusterbot New news from newglusterbugs: [Bug 961668] gfid links inside .glusterfs are not recreated when missing, even after a heal <http://goo.gl/4vuYc>
09:14 logstashbot` joined #gluster
09:14 bala1 joined #gluster
09:14 neofob1 joined #gluster
09:15 lala_ joined #gluster
09:15 Ramereth joined #gluster
09:16 MinhP_ joined #gluster
09:17 bala1 joined #gluster
09:17 ricky-ticky1 joined #gluster
09:19 yosafbridge` joined #gluster
09:20 a1 joined #gluster
09:20 tdb- joined #gluster
09:21 JoeJulian_ joined #gluster
09:21 DWSR2 joined #gluster
09:21 DWSR2 joined #gluster
09:22 kkeithley joined #gluster
09:22 mkonecny joined #gluster
09:23 bharata joined #gluster
09:29 zwu joined #gluster
09:34 manik joined #gluster
09:47 rgustafs joined #gluster
09:48 vshankar joined #gluster
09:52 dxd828 joined #gluster
09:55 lh joined #gluster
09:55 lh joined #gluster
09:59 manik joined #gluster
10:14 moli left #gluster
10:20 lkthomas hey guys
10:20 lkthomas I accidentally removed the whole brick directory, but my replica has it, does rebalance work in this case ?
10:24 aravindavk joined #gluster
10:32 andreask you mean auto-heal?
10:34 lkthomas andreask: yep
10:35 lkthomas andreask: I need brick2 auto copy file back to brick1 :)
10:35 lkthomas gluster volume heal $volname full
10:35 lkthomas like that ?
10:36 andreask yes
10:38 andreask ... self-heal of course ... but you know what I mean ;-)
10:42 lkthomas I am on 3.3.1, so I think once I execute that command, healing should start
10:43 andreask yes
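
In other words, on 3.3.x the sequence is roughly the following (volume name is hypothetical):

    # trigger a full self-heal so the surviving brick repopulates the emptied one
    gluster volume heal myvol full
    # watch progress and see what still needs healing
    gluster volume heal myvol info
    gluster volume heal myvol info heal-failed
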
10:43 lkthomas that's why I like this better than RAID
10:44 lkthomas I could fuck up one brick while other brick still up and running
10:44 andreask well, its the same in RAID1
10:45 aravindavk joined #gluster
10:45 andreask though not on different hosts, true
10:50 duerF joined #gluster
10:53 rastar joined #gluster
10:54 ninkotech joined #gluster
10:55 ninkotech_ joined #gluster
10:55 ninkotech__ joined #gluster
10:55 jtux joined #gluster
11:03 karoshi when a client uses gluster's NFS server, does it have the same failover capabilities as the native gluster protocol?
11:03 piotrektt joined #gluster
11:04 NuxRo karoshi: no, you'll have to take care of that yourself
11:04 samcooke joined #gluster
11:05 karoshi ok thanks
11:06 manik joined #gluster
11:10 samcooke Hi - we're currently running a rebalance and gluster seems to have changed the permissions of a significant proportion of our files to 000 (sometimes with the sticky bit, sometimes not). The permissions are shown like that in both a mount point (gluster client) and on disk in the brick - the files do not have a zero length and changing the permissions via the mount using chmod seems to fix them. Has anyone else seen anything like this? I
11:10 samcooke could only find a brief mention of something like this on Google from a previous IRC chat.
11:14 matclayton joined #gluster
11:27 piotrektt is it normal that volume reports that NFS server, Self-heal Daemon are offline? but the replication is working and self-healing too
11:27 al joined #gluster
11:27 piotrektt strange
11:28 andrei_ hello guys
11:29 andrei_ what is the recommended amount of performance cache size that I should use?
11:29 andrei_ relative to the total amount of ram that the storage server has?
11:30 ricky-ticky joined #gluster
11:30 logstashbot New news from newjiraissues: Anjo Krank created LOGSTASH-1077 - Command line help shouldn't show exception+stacktrace <https://logstash.jira.com/browse/LOGSTASH-1077>
11:30 glusterbot Title: [LOGSTASH-1077] Command line help shouldnt show exception+stacktrace - logstash.jira.com (at logstash.jira.com)
11:43 abelur joined #gluster
11:45 chirino joined #gluster
11:50 jtux joined #gluster
11:54 wgao hi all, I have installed gluster on a Fedora system, started the glusterd service and probed another server with 'gluster peer probe $ip', but I got an issue; /var/log/glusterfs/cli.log shows "E [cli-rpc-ops.c:173:gf_cli3_1_probe_cbk] 0-glusterd: Probe failed with op_ret -1 and op_errno 107"
11:54 wgao Anyone can help me to resolve it ?
11:55 mohankumar joined #gluster
12:01 Susant joined #gluster
12:01 Susant left #gluster
12:02 kkeithley You can ping $ip? And there is no firewall running on $ip, or the right ports are open in the firewall on $ip?
12:02 kkeithley @ports
12:02 glusterbot kkeithley: glusterd's management port is 24007/tcp and 24008/tcp if you use rdma. Bricks (glusterfsd) use 24009 & up. (Deleted volumes do not reset this counter.) Additionally it will listen on 38465-38467/tcp for nfs, also 38468 for NLM since 3.3.0. NFS also depends on rpcbind/portmap on port 111.
12:02 andrei_ is it safe to use this option: nfs.trusted-sync ?
12:02 kkeithley no selinux either, right?
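
A quick checklist for kkeithley's questions, assuming the same $ip used in wgao's probe command:

    ping -c 3 $ip            # basic reachability
    nc -zv $ip 24007         # is the glusterd management port reachable?
    iptables -L -n           # run on $ip: any firewall rules in the way?
    getenforce               # run on $ip: is selinux enforcing?
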
12:04 wgao Hi kkeithley, ping works well, using this server I can probe the other target, it works well,
12:05 wgao The fedora install on a virtual system which created with virtualbox.
12:12 hchiramm_ joined #gluster
12:13 hchiramm_ joined #gluster
12:17 wgao Hi kkeithley, it works well, thanks for your support.
12:18 kkeithley yw
12:19 dustint joined #gluster
12:21 rastar joined #gluster
12:36 stefanha_ joined #gluster
12:39 manik joined #gluster
12:39 bennyturns joined #gluster
12:47 matclayton joined #gluster
12:55 eightyeight how can i check that my data is replicated using rdma, rather than tcp?
12:56 eightyeight i guess i could try mounting host:vol.rdma?
12:57 plarsen joined #gluster
12:58 kkeithley wireshark or tcpdump?
12:58 kkeithley looking for _no_ packets to the rdma host?
13:00 eightyeight looks like adding the "-o transport=rdma" option is the trick. unfortunately, looks like it's not mounting. grr
13:00 eightyeight i don't think tcpdump will capture rdma packets
13:00 eightyeight could be wrong there though
13:04 eightyeight hmm. looks like passing '-o transport=rdma' doesn't seem to be working
13:06 kkeithley that's right, tcpdump won't capture rdma packets. If the writes are landing on the replica and you don't see any tcp traffic in wireshark or tcpdump then the it must be using rdma, yes?
13:07 kkeithley s/the it/the writes/
13:07 glusterbot What kkeithley meant to say was: that's right, tcpdump won't capture rdma packets. If the writes are landing on the replica and you don't see any tcp traffic in wireshark or tcpdump then the writes must be using rdma, yes?
13:08 eightyeight i just built a test pool, with transport=rdma only, started, mounted, and built some files, and all seems to be working
13:09 eightyeight i've just heard that when creating the pool, if using 'transport tcp,rdma', and rdma is present, it will favor that over tcp
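
A sketch of forcing rdma end to end, with hypothetical volume and host names; on 3.3.x-era releases the common trick is mounting the volume name with an .rdma suffix:

    # create a test volume that only offers the rdma transport
    gluster volume create rdmavol transport rdma server1:/export/rdmabrick
    gluster volume start rdmavol
    # mount it over rdma; either form is commonly seen
    mount -t glusterfs -o transport=rdma server1:/rdmavol /mnt/rdmatest
    mount -t glusterfs server1:/rdmavol.rdma /mnt/rdmatest
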
13:10 eightyeight looks like there might be an 'ibdump' utility i can use
13:10 glusterbot New news from newglusterbugs: [Bug 960190] Gluster SHD and NFS do not start <http://goo.gl/NLh5B>
13:10 logstashbot Title: Bug 960190 Gluster SHD and NFS do not start (at goo.gl)
13:10 glusterbot Bug http://goo.gl/NLh5B high, unspecified, ---, kparthas, NEW , Gluster SHD and NFS do not start
13:10 kkeithley I'm looking at my rdma setup and I don't see anything that looks like a tool for dumping rdma packet. Fedora's ib/rdma tools don't seem to have ibdump. Maybe on RHEL they do.
13:11 eightyeight i'm on debian, and i don't have it installed either
13:11 Supermathie kkeithley: no ibdump
13:11 Supermathie kkeithley: http://fpaste.org/11474/13681915/
13:11 logstashbot Title: #11474 Fedora Project Pastebin (at fpaste.org)
13:11 glusterbot Title: #11474 Fedora Project Pastebin (at fpaste.org)
13:12 kkeithley okay, good to know
13:13 kkeithley I've been told that RHEL has more than what's available in Fedora
13:15 Supermathie http://www.mellanox.com/page/products_dyn?product_family=110&mtag=monitoring_debug
13:15 logstashbot <http://goo.gl/8TNzg> (at www.mellanox.com)
13:15 glusterbot <http://goo.gl/8TNzg> (at www.mellanox.com)
13:16 eightyeight Supermathie: +1
13:18 aliguori joined #gluster
13:22 eightyeight ah. it only does connectx HCAs. i only have infinihost IIIs
13:25 Supermathie eightyeight: yeah, you can probably find a version for your cards
13:33 eightyeight i would think there would be some internal file, or log, or something that tells me it's favoring rdma over tcp
13:37 Supermathie eightyeight: pkill -USR1 glusterfs glusterfsd, then check the dumps in /tmp and look at the state. Not sure if it's in there.
13:39 rastar joined #gluster
13:40 eightyeight dos not appear so
13:40 eightyeight s/dos/does/
13:40 glusterbot What eightyeight meant to say was: does not appear so
13:40 eightyeight meh
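
Supermathie's statedump suggestion, spelled out a little; the dump location and file names vary by version, /tmp being the 3.3-era default:

    # ask the gluster client and brick processes to write state dumps
    pkill -USR1 glusterfs
    pkill -USR1 glusterfsd
    # then inspect the dumps for transport details
    ls -l /tmp/glusterdump.*
    grep -i rdma /tmp/glusterdump.*
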
13:41 robos joined #gluster
13:44 vpshastry1 joined #gluster
13:44 eightyeight http://ae7.st/p/5xj
13:44 logstashbot Title: Pastebin on ae7.st » 5xj (at ae7.st)
13:44 glusterbot Title: Pastebin on ae7.st » 5xj (at ae7.st)
13:44 eightyeight lots of duplication between bots, it seems
13:45 manik joined #gluster
13:48 shanks joined #gluster
13:49 eightyeight http://webcache.googleusercontent.com/search?q=cache:U4lrAc-ccx0J:community.gluster.org/q/how-i-can-troubleshoot-rdma-performance-issues-in-3-3-0/+&cd=2&hl=en&ct=clnk&gl=us
13:49 logstashbot <http://goo.gl/MwaEB> (at webcache.googleusercontent.com)
13:49 glusterbot <http://goo.gl/MwaEB> (at webcache.googleusercontent.com)
13:49 eightyeight hmm
13:49 eightyeight http://thr3ads.net/gluster-users/2012/07/1969136-RDMA-not-fully-supported-by-GlusterFS-3.3.0 also
13:49 logstashbot <http://goo.gl/jXxCT> (at thr3ads.net)
13:49 glusterbot <http://goo.gl/jXxCT> (at thr3ads.net)
13:52 eightyeight actually, i'm on 3.2.7
13:53 eightyeight maybe i should avoid the upgrade to 3.3 :)
13:54 krishna_ joined #gluster
13:56 * eightyeight aft
14:06 bugs_ joined #gluster
14:18 kkeithley translated that means that QA did not have as much time as they wanted to test rdma. In RHS that means if you use it and have problems, Red Hat won't bend over backwards to make it work. In GlusterFS, which is community supported for the most part, you can expect to get the same support you have always had. ;-)
14:18 kkeithley That said, it does work and many people are using it successfully.
14:21 nickw joined #gluster
14:23 ngoswami joined #gluster
14:31 krishna__ joined #gluster
14:34 jclift joined #gluster
14:44 krishna_ joined #gluster
14:58 bala joined #gluster
15:04 JoeJulian_ Actually, in 3.3.1 rdma is broken. The port offered by glusterd to the client in order to connect to the brick is always 0. There's a fix so 3.3.2 /should/ work. I haven't heard any reports either way.
15:06 kkeithley That's fixed in the RPM's in my repo.
15:08 JoeJulian_ nice
15:10 mohankumar joined #gluster
15:10 glusterbot New news from newglusterbugs: [Bug 961856] [FEAT] Add Glupy, a python bindings meta xlator, to GlusterFS project <http://goo.gl/yCNTu>
15:11 logstashbot Title: Bug 961856 [FEAT] Add Glupy, a python bindings meta xlator, to GlusterFS project (at goo.gl)
15:11 glusterbot Bug http://goo.gl/yCNTu unspecified, unspecified, ---, vbellur, NEW , [FEAT] Add Glupy, a python bindings meta xlator, to GlusterFS project
15:18 harold[MTV] joined #gluster
15:19 jthorne joined #gluster
15:20 edward1 joined #gluster
15:43 premera_q volume replace-brick <VOLNAME> <BRICK> <NEW-BRICK> {start|pause|abort|status|commit [force]}   What does commit do ? I start migration with "start", when "commit" comes into play (in replica setup) ?
15:44 andrewjsledge joined #gluster
15:44 samcooke Hi, apologies, I'm reposting the question in the hope that someone else is around - we're currently running a rebalance and gluster seems to have changed the permissions of a significant proportion of our files to 000 (sometimes with the sticky bit, sometimes not). The permissions are shown like that in both a mount point (gluster client) and on disk in the brick - the files do not have a zero length and changing the permissions via the mount
15:44 samcooke using chmod seems to fix them. Has anyone else seen anything like this? I could only find a brief mention of something like this on Google from a previous IRC chat.
15:45 harold[MTV] premera_q, I believe commit "commits the change", removes the old brick, and inserts the new one into the volume
15:48 premera_q hmm, when would I commit, can I commit right after "start" ? I was thinking that I should check migration status to complete and then commit, however on a busy system that does not make sense
15:52 premera_q Thanks harold, that makes sense. It looks like once I "start" and the migration completes, all 3 bricks are in play, although it is a 2 brick replica. At this moment it looks like all writes go to all 3 bricks. Volume info still reports the original configuration. After I commit, the configuration changes.
15:57 harold[MTV] premera_q, You got it. :)
15:59 lalatenduM joined #gluster
16:12 krishna_ joined #gluster
16:12 awheeler_ joined #gluster
16:18 awheeler joined #gluster
16:31 andreask joined #gluster
16:31 Mo__ joined #gluster
16:35 balunasj joined #gluster
16:41 glusterbot New news from newglusterbugs: [Bug 961892] Compilation chain isn't honouring CFLAGS environment variable <http://goo.gl/xy5LX>
16:41 logstashbot Title: Bug 961892 Compilation chain isn't honouring CFLAGS environment variable (at goo.gl)
16:41 glusterbot Bug http://goo.gl/xy5LX low, unspecified, ---, amarts, NEW , Compilation chain isn't honouring CFLAGS environment variable
16:43 dewey joined #gluster
16:44 awheeler_ joined #gluster
16:47 fabien joined #gluster
16:56 thomaslee joined #gluster
17:04 bulde joined #gluster
17:11 glusterbot New news from newglusterbugs: [Bug 961902] Remove UFO code from glusterfs repo now that gluster-swift repo exists <http://goo.gl/rNvKa>
17:11 logstashbot Title: Bug 961902 Remove UFO code from glusterfs repo now that gluster-swift repo exists (at goo.gl)
17:11 glusterbot Bug http://goo.gl/rNvKa unspecified, unspecified, ---, junaid, NEW , Remove UFO code from glusterfs repo now that gluster-swift repo exists
17:20 nueces joined #gluster
17:24 vpshastry joined #gluster
17:25 JoeJulian @op
17:28 nueces_ joined #gluster
17:29 JoeJulian premera_q: Yes, you commit after the migration has completed. With replicated volumes, it's "safe" to "commit force" and let replication handle the rest, but that does leave you with a period of time where you're not at acceptable redundancy levels.
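
Putting the replace-brick thread together, a rough outline for a 2-brick replica (brick paths and hosts are hypothetical):

    gluster volume replace-brick gv0 oldhost:/export/brick newhost:/export/brick start
    gluster volume replace-brick gv0 oldhost:/export/brick newhost:/export/brick status
    # once status reports the migration as complete:
    gluster volume replace-brick gv0 oldhost:/export/brick newhost:/export/brick commit
    # or, on a replica, skip the migration and let self-heal repopulate the new brick
    gluster volume replace-brick gv0 oldhost:/export/brick newhost:/export/brick commit force
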
17:32 krishna_ joined #gluster
17:37 vpshastry left #gluster
17:41 lpabon joined #gluster
17:45 deepakcs joined #gluster
17:47 premera_q thanks JoeJulian
17:50 premera_q is there a way to clear "top" stats in 3.3.1, I know its coming in 3.4 via CLI, but I am wondering if there is a way to do it now ?
17:59 JoeJulian doesn't "volume profile info" do that?
18:00 krishna_ joined #gluster
18:02 premera_q it complains "Profile on Volume gv0 is not started", let me see if starting profiling helps
18:06 premera_q "volume profile info" does not seem to have any effect on "top" stats, even when profiling is started
18:08 tomsve joined #gluster
18:11 awheeler joined #gluster
18:13 krishna_ joined #gluster
18:31 andrei_ joined #gluster
18:38 JoeJulian premera_q: looks like restarting glusterd clears it.
18:40 krishna__ joined #gluster
18:49 premera_q thank you
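
So on 3.3.1 the workaround appears to be a glusterd restart; a sketch, assuming volume gv0 and a SysV init script:

    gluster volume profile gv0 start
    gluster volume top gv0 read        # per-brick top stats
    gluster volume profile gv0 info    # per-fop latency counters
    service glusterd restart           # restarting glusterd clears the top counters
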
19:05 rotbeard joined #gluster
19:17 GLHMarmot joined #gluster
19:25 sjoeboo_ joined #gluster
19:33 krishna_ joined #gluster
19:48 nueces joined #gluster
19:49 awheeler joined #gluster
19:57 nueces_ joined #gluster
19:58 chirino joined #gluster
20:04 rwheeler joined #gluster
20:08 edward1 joined #gluster
20:23 daMaestro joined #gluster
20:31 zaitcev joined #gluster
20:59 sjoeboo_ joined #gluster
21:03 krishna_ joined #gluster
21:07 premera joined #gluster
21:22 jdarcy joined #gluster
21:29 nueces joined #gluster
21:29 andrei_ hello guys
21:29 andrei_ anyone here?
21:30 andrei_ could someone help me determine why glusterfs is not utilising full performance of the server's fs?
21:30 nueces_ joined #gluster
21:31 jdarcy andrei_: What workload are you testing, using which tools?
21:32 andrei_ jdarcy: at the moment I am testing glusterfs with a single server
21:32 andrei_ client is the same machine as the server
21:32 andrei_ so networking is not involved
21:33 andrei_ using dd with bs=8M
21:33 bdperkin_gone joined #gluster
21:33 jdarcy andrei_: Single thread?
21:34 andrei_ single as well as multiple
21:34 andrei_ what I am seeing is something very strange
21:35 andrei_ i am testing with 2 different files
21:35 andrei_ one file is from /dev/zero
21:35 andrei_ so, highly compressible
21:35 andrei_ the other file came from /dev/urandom
21:35 andrei_ both files are 100gb in size
21:36 andrei_ when I am reading the zeros file I am seeing around 600mb/s single thread performance
21:36 jdarcy Is that mega*bit*s or mega*bytes*?
21:36 andrei_ bytes
21:36 andrei_ megabytes
21:36 andrei_ the reason for high speed is that the fs is compressing the data
21:36 andrei_ and there is not much io on the disks
21:36 jdarcy ZFS?
21:37 andrei_ so gluster is giving me around 600MB/s
21:37 andrei_ yeah, zfs
21:37 andrei_ however, when I am reading the file with urandom data
21:37 andrei_ i am only getting around 200-250MB/s
21:37 krishna_ joined #gluster
21:37 jdarcy So you
21:38 andrei_ the performance of zfs on the same server - urandom file is around 800-900MB/s
21:38 jdarcy So you're reading from /dev/whatever and writing into GlusterFS backed by ZFS?
21:38 andrei_ the zeros file is over 2GB/S
21:38 andrei_ i've pre-created two files
21:38 andrei_ these two files are already on zfs
21:39 andrei_ and the zfs folder is a brick in glusterfs
21:39 andrei_ using a single server setup
21:39 andrei_ when I am reading from zfs directly I am seeing about 6% iowait
21:39 jdarcy What kind of disks/controllers are behind this?
21:39 andrei_ when I am reading the same files via glusterfs I am seeing around 30% iowait
21:40 andrei_ lsi with jbods
21:41 jdarcy So, first, the compression part is a red herring.  GlusterFS isn't even aware of that, it just reads/writes the bits it's told to, so if there's a difference that's entirely a local-filesystem issue.
21:41 jdarcy As for the rest, when you go through FUSE there are several kinds of additional overhead that you don't have writing to a local filesystem.
21:42 jdarcy Context switches for one, and smaller I/O requests for another.
21:42 jdarcy What would be the ZFS result for the urandom file local-to-local using a block size of 128KB?
21:43 jdarcy Because that's what you're really getting through FUSE, even if at user level you said 8MB.
21:43 andrei_ like dd if=urandom-file of=/dev/null bs=128K?
21:43 jdarcy Yes, like that.
21:43 andrei_ let me try that
21:45 andrei_ i am still getting around 900MB/s with 128K block size
21:46 jdarcy Interesting.  How about iowait?
21:46 jdarcy Also, are you sure you're transferring more than will end up in cache?
21:46 andrei_ iowait is around 4% when I am doing 128K dd
21:47 andrei_ i was looking at the zpool iostat -v 1, which shows disk activity
21:47 andrei_ and total pool activity is consistently reading around 800-900mb/s
21:47 andrei_ so it is not getting data from cache
21:48 jdarcy What kind of context-switch rates are you getting?
21:48 andrei_ where do I check for that?
21:49 jdarcy vmstat should show you
21:50 andrei_ this is the vmstat 1 output during dd
21:50 andrei_ http://fpaste.org/11581/13682226/
21:50 glusterbot Title: #11581 Fedora Project Pastebin (at fpaste.org)
21:53 jdarcy Hm.  I wonder if nearly 200K context switches per second might have an effect on performance.
21:55 andrei_ this is what I see when running the same command on the glusterfs mounted volume:
21:55 andrei_ http://fpaste.org/11584/36822293/
21:56 glusterbot Title: #11584 Fedora Project Pastebin (at fpaste.org)
21:57 andrei_ here is the gluster volume info output: http://fpaste.org/11585/23059136/
21:57 glusterbot Title: #11585 Fedora Project Pastebin (at fpaste.org)
21:57 andrei_ what am I doing wrong?
21:58 andrei_ i don't understand why I am seeing only about 1/4 of the performance of the underlying fs?
21:58 jdarcy So where are all the blocks out?  I see a whole bunch of blocks *in* but then they apparently go nowhere.
21:59 andrei_ of=/dev/null
21:59 jdarcy Wait, what, you said you were copying *into* GlusterFS.
22:00 andrei_ nope
22:00 andrei_ i was doing 2 tests
22:00 andrei_ 1 - read from zfs, output to /dev/null
22:00 andrei_ 2. read from glusterfs, output to dev/null
22:01 jdarcy At 128KB both times?
22:01 andrei_ gluster brick is created on top of the zfs
22:01 andrei_ yeah
22:01 andrei_ echo 3 > /proc/sys/vm/drop_caches; dd if=/glusterfs-test/100G-urandom-1 of=/dev/null bs=128K count=100000 skip=100000 iflag=nocache
22:02 jdarcy OK, so what's the CPU usage for glusterfs or glusterfsd during this?
22:02 andrei_ both zfs and glusterfs tests show disk activity, so the data is not coming from ram
22:02 andrei_ glusterfs is doing around 65%
22:02 andrei_ glusterfsd is around 55%
22:03 andrei_ the server is around 65% idle
22:03 jdarcy OK, so you've got two reasonably-busy processes communicating via RPC over a socket, even if that socket's local.
22:03 andrei_ iowait is around 25-30% when reading from glusterfs mount
22:03 andrei_ and around 5% when reading from zfs
22:04 jdarcy I'm not sure why iowait would be higher unless glusterfsd is chopping up the I/O into even smaller pieces.
22:04 andrei_ jdarcy: but why am I seeing different speeds when reading from a compressible file made from zeroes
22:05 jdarcy It might be worth getting a bit of strace output to see what size disk requests are actually being issued.
22:05 andrei_ should I not be getting similar speeds as from the disk? assuming the disks can handle 800-900MB/s
22:05 andrei_ do you know the command that I should use?
22:05 andrei_ i haven't used strace much
22:05 jdarcy I'm just not even going to get into the compression issue yet.  We need to figure out what's going on in the simple case with real data before we start adding variables.
22:06 andrei_ ah, okay
22:06 andrei_ good move
22:06 andrei_ what do I need to do with strace?
22:07 jdarcy Something like "strace -o outputfile -f -t -T -p $GLUSTERFSD_PID" should start.  Probably only want to run it for a couple of seconds or else you'll get too much output to be useful.
22:08 jdarcy Might want to add "-e trace=file" or even "-e trace=writev" to filter the output even more.
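
Combining jdarcy's suggestions into one short run (the pid lookup is just one hypothetical way to pick a brick process; keep the capture to a few seconds):

    GLUSTERFSD_PID=$(pgrep -o glusterfsd)
    timeout 5 strace -o /tmp/brick.strace -f -t -T \
        -e trace=read,write,readv,writev -p $GLUSTERFSD_PID
    # eyeball the request sizes glusterfsd actually issues against the brick
    less /tmp/brick.strace
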
22:09 fidevo joined #gluster
22:09 andrei_ will do it in a minute
22:09 andrei_ by the way
22:09 andrei_ this is using gluster 3.4 beta1
22:09 andrei_ on ubuntu 12.04
22:09 andrei_ from ppa
22:10 jdarcy The other questions you'll want to answer are (1) what kind of data rate can you get between two local sockets, and (2) what happens to the numbers if you run more than one I/O thread concurrently.
22:10 jdarcy GlusterFS isn't optimized for low single-stream latency.  It's designed for high throughput across many simultaneous I/O streams.
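
Two quick ways to answer jdarcy's questions, sketched with the paths already used in this session (iperf against loopback is only a rough proxy for local-socket throughput):

    # 1. raw loopback socket throughput
    iperf -s &
    iperf -c 127.0.0.1 -t 10
    # 2. several concurrent readers through the gluster mount
    for i in $(seq 1 12); do
        dd if=/glusterfs-test/100G-urandom-1 of=/dev/null bs=128K \
           count=100000 skip=$((i * 100000)) &
    done
    wait
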
22:14 andrei_ shall we start with strace first?
22:15 fabien andrei_: performance.cache-min-file-size: 1GB ? performance.cache-max-file-size: 4TB ??
22:17 andrei_ fabien: I am using gluster for vm images, so there is no point to cache files less than 1GB in size
22:18 andrei_ and I am not using volumes over 4TB in size
22:18 andrei_ hense these values
22:18 andrei_ or did I misunderstand these values?
22:18 fabien ok, but you mean you put in cache like 1Tb file ? should it lower your bench ??
22:19 fabien if no cache with zfs
22:20 andrei_ fabien, I do not intend to put the entire 4TB into cache. However, If a 4TB vm image is having a bunch of reads, I would like to cache them, right?
22:20 andrei_ afterall I am limiting the total read cache size to 5GB I think
22:21 andrei_ fabien: i am trying to determine why gluster would only give me about 1/4 of the zfs disk performance.
22:21 andrei_ I am not even talking about the zfs cache speeds
22:22 andrei_ which are in excess of 3GB/s on a single thread
22:22 fabien is gluster also 1/4 zfs with default cache option ?
22:24 andrei_ yeah
22:24 andrei_ i only tried these options after trying the default glusterfs settings
22:24 andrei_ without any additions
22:24 andrei_ cache improves things a little
22:24 andrei_ with default settings I was getting around 200-220MB/s
22:25 andrei_ with additional options it has increased to around 250-280MB/s mark
22:25 andrei_ zfs gives me around 900MB/s
22:27 andrei_ http://fpaste.org/11591/13682248/
22:27 glusterbot Title: #11591 Fedora Project Pastebin (at fpaste.org)
22:27 andrei_ this is what strace gives me
22:36 bdperkin joined #gluster
22:36 andrei_ looking at strace output I am seeing pretty similar distribution and percentages of system calls between the zfs and glusterfs reads
22:36 andrei_ around 97% read for zfs and around 95% for glusterfs
22:37 andrei_ that doesn't explain  the performance difference
23:08 fabien andrei_: is it the same when mounting with nfs ? with xfs instead of zfs ?
23:08 fabien I'm really not expert, only curious ;)
23:08 andrei_ i've not tried using xfs
23:09 andrei_ but i've tried to use nfs over gluster yesterday and I saw a little better performance compared to the native glusterfs
23:09 andrei_ but not by much
23:09 andrei_ around 260MB/s with unmodified gluster settings
23:10 andrei_ compared with around 220MB/s with native gluster
23:10 andrei_ that's using a single thread
23:11 andrei_ i've just run 12 concurrent dd processes and I am seeing improved performance compared with a single thread
23:11 andrei_ but with 12 threads I am still seeing only 1/2 of the zfs performance
23:11 andrei_ using the same data files
23:12 andrei_ so, I see around 540mb/s cumulative throughput
23:12 fabien I've not try zfs yet, but had good results with xfs
23:12 fabien http://zfsonlinux.org/ ?
23:12 glusterbot Title: ZFS on Linux (at zfsonlinux.org)
23:13 andrei_ fabien: what speeds were you getting with xfs?
23:13 andrei_ if you compare native xfs with gluster over xfs?
23:15 andrei_ in comparison using the same files with 12 threads I am seeing glusterfs' throughput at 540MB/s compared with the zfs throughput of 1020MB/s
23:15 andrei_ roughly 1/2 of the native speed
23:15 andrei_ unless I am missing something I would call it pretty poor performance ((((
23:16 andrei_ taking into account that gluster should thrive with multiple threads
23:17 andrei_ perhaps it's a 3.4beta1 regression, not sure
23:17 andrei_ will try 3.3.1 tomorrow and see how it performs
23:20 fabien seems mounting with nfs uses the cache (1st 400MB/s, 2nd 840MB/s), not the gluster client by default
23:30 fabien on a little test machine, with : echo 3 > /proc/sys/vm/drop_caches
23:32 fabien before each run : about the same perf as the bare disk : around 140MB/sec
23:34 fabien without dropping the cache, 2nd test : 830MB/s (xfs brick), 870MB/s (nfs mount), 414MB/s (gluster mount)
23:34 fabien all default
23:37 fabien boosting performance.cache-size as you do too : 818MB/s
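
fabien's comparison, written out as a repeatable loop (paths are hypothetical; dropping the cache before each run keeps the numbers honest):

    for path in /bricks/xfs/testfile /mnt/nfs/testfile /mnt/gluster/testfile; do
        echo 3 > /proc/sys/vm/drop_caches
        dd if=$path of=/dev/null bs=128K
    done
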
23:39 badone joined #gluster
