
IRC log for #gluster, 2013-04-10


All times shown according to UTC.

Time Nick Message
00:26 a2 joined #gluster
00:30 premera joined #gluster
00:30 badone joined #gluster
00:31 vex JoeJulian: yeah.. I've got these in EC2 so was thinking about doing a snapshot or something of the bricks themselves
00:45 RicardoSSP joined #gluster
01:04 yinyin joined #gluster
01:20 bala1 joined #gluster
01:23 neofob joined #gluster
01:29 d3O joined #gluster
01:30 bharata joined #gluster
01:51 russm joined #gluster
02:14 kevein joined #gluster
02:19 dustint joined #gluster
02:42 hagarth joined #gluster
03:10 d3O_ joined #gluster
03:19 vshankar joined #gluster
03:29 rastar joined #gluster
03:31 rastar joined #gluster
03:48 arusso_ joined #gluster
03:56 jkroon joined #gluster
03:57 sgowda joined #gluster
04:02 vpshastry joined #gluster
04:04 saurabh joined #gluster
04:11 hagarth joined #gluster
04:15 d3O joined #gluster
04:18 yinyin joined #gluster
04:22 jkroon joined #gluster
04:27 bala joined #gluster
04:32 d3O_ joined #gluster
04:33 d3O_ joined #gluster
04:39 jkroon joined #gluster
04:48 sgowda joined #gluster
04:57 lalatenduM joined #gluster
05:01 vpshastry joined #gluster
05:02 ramkrsna joined #gluster
05:02 ramkrsna joined #gluster
05:04 dustint joined #gluster
05:04 jkroon joined #gluster
05:06 jkroon joined #gluster
05:09 jag3773 joined #gluster
05:10 deepakcs joined #gluster
05:10 rastar joined #gluster
05:11 jkroon joined #gluster
05:12 shylesh joined #gluster
05:14 bala joined #gluster
05:17 jkroon joined #gluster
05:17 d3O joined #gluster
05:19 aravindavk joined #gluster
05:21 bulde joined #gluster
05:21 vpshastr1 joined #gluster
05:23 jkroon joined #gluster
05:28 satheesh joined #gluster
05:29 kevein joined #gluster
05:41 bala joined #gluster
05:46 sgowda joined #gluster
05:52 raghu joined #gluster
05:57 mohankumar joined #gluster
06:06 satheesh joined #gluster
06:08 sgowda joined #gluster
06:11 puebele joined #gluster
06:11 Nevan joined #gluster
06:17 guigui1 joined #gluster
06:21 cw joined #gluster
06:28 ngoswami joined #gluster
06:35 ollivera joined #gluster
06:38 vimal joined #gluster
06:43 ndevos joined #gluster
06:51 sgowda joined #gluster
06:52 saurabh joined #gluster
06:53 ekuric joined #gluster
06:57 ricky-ticky joined #gluster
06:57 vpshastry1 joined #gluster
06:59 ctria joined #gluster
06:59 vpshastry2 joined #gluster
07:00 ujjain joined #gluster
07:02 glusterbot New news from newglusterbugs: [Bug 916390] NLM acquires lock before confirming callback path to client <http://goo.gl/5eoJv> || [Bug 916406] NLM failure against Solaris NFS client <http://goo.gl/uGTJA>
07:04 puebele joined #gluster
07:05 shireesh joined #gluster
07:07 gbrand_ joined #gluster
07:22 sgowda joined #gluster
07:27 66MAAIZXN joined #gluster
07:28 andreask joined #gluster
07:35 18WAC93PS joined #gluster
07:39 helloworld joined #gluster
07:45 rotbeard joined #gluster
07:46 puebele joined #gluster
07:54 joehoyle- joined #gluster
08:11 Norky joined #gluster
08:32 vrturbo joined #gluster
08:32 vrturbo when setting transport to RDMA, is both server-to-server and server-to-client traffic using RDMA over InfiniBand?
08:51 satheesh joined #gluster
08:53 ngoswami joined #gluster
09:06 dobber_ joined #gluster
09:18 rotbeard joined #gluster
09:22 manik joined #gluster
09:23 joeto joined #gluster
09:31 64MACZC2I joined #gluster
09:31 45PAAECY6 joined #gluster
10:13 ProT-0-TypE joined #gluster
10:18 spai joined #gluster
10:31 errstr joined #gluster
10:33 rastar joined #gluster
10:41 rgustafs joined #gluster
11:04 lge joined #gluster
11:05 edward1 joined #gluster
11:13 vpshastry joined #gluster
11:14 vpshastry1 joined #gluster
11:15 hagarth joined #gluster
11:41 vpshastry joined #gluster
11:41 rastar joined #gluster
11:51 dustint joined #gluster
11:53 jian joined #gluster
11:55 vpshastry1 joined #gluster
12:01 yinyin_ joined #gluster
12:05 ThatGraemeGuy joined #gluster
12:06 bennyturns joined #gluster
12:06 piotrektt joined #gluster
12:13 spider_fingers joined #gluster
12:14 spider_fingers hello
12:14 glusterbot spider_fingers: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an
12:14 glusterbot answer.
12:16 spider_fingers i'm not in the habit of bothering people with stupid questions, but i can't google anything similar to my bug: i set up a simple gluster mirror, both servers see each other and have established connections, but i only see half of the space from df -h, and only from one particular node
12:16 spider_fingers is it a problem with df?
12:17 piotrektt but didnt you set replica mode?
12:17 spider_fingers 1 sec
12:17 piotrektt so if you have 2 servers 2TB each
12:17 piotrektt you wont have 4TB but 2TB :P
12:18 piotrektt i see it this way, but i can be wrong
12:18 spider_fingers wait a minute...
12:20 spider_fingers damn it, it IS a mirror!
12:20 Norky so... there's no problem?
12:20 spider_fingers you're right, so i assume it is all okay then. thanks
12:21 piotrektt np
12:21 spider_fingers i'm sorry)
12:21 Norky no worries, we all gotta learn
12:23 spider_fingers maybe i'll hang around on the chan with such nice people around)
12:24 piotrektt :)
12:24 piotrektt it's always good idea
12:25 piotrektt to see what other problems people struggle with.. then when you hit such - you already know answer :D
12:26 spider_fingers нгз) ["yup" typed on a Cyrillic keyboard layout]
12:26 spider_fingers yup)
12:27 dustint joined #gluster
12:32 manik joined #gluster
12:33 jian hi, all. I'm using glusterfs 3.2.7 , but the iptables have problem. Is there anyone can help me ?
12:33 ndevos ~ports | jian
12:33 glusterbot jian: glusterd's management port is 24007/tcp and 24008/tcp if you use rdma. Bricks (glusterfsd) use 24009 & up. (Deleted volumes do not reset this counter.) Additionally it will listen on 38465-38467/tcp for nfs, also 38468 for NLM since 3.3.0. NFS also depends on rpcbind/portmap on port 111.
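[Example, not from the log: a minimal iptables sketch matching the ports glusterbot lists above, for a pre-3.4 GlusterFS server; the brick range (24009 and up, one port per brick ever defined — here assumed to cover 16 bricks) and the NFS/NLM ports should be adjusted to your own deployment, and the rules must sit above any catch-all REJECT/DROP in the INPUT chain:
-A INPUT -p tcp -m state --state NEW -m tcp --dport 24007:24008 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 24009:24024 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 38465:38468 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 111 -j ACCEPT
-A INPUT -p udp -m state --state NEW -m udp --dport 111 -j ACCEPT]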
12:33 jian http://gluster.org/community/documentation/index.php/Gluster_3.1:_Installing_Red_Hat_Package_Manager_%28RPM%29_Distributions
12:33 glusterbot <http://goo.gl/FbK4O> (at gluster.org)
12:34 jian that url i'm saw already.
12:34 jian ndevos, thanks
12:35 jian but i have configure iptables by following rules:
12:35 jian # cat /etc/sysconfig/iptables
12:35 jian # Generated by iptables-save v1.4.7 on Thu Apr 11 00:09:23 2013
12:35 jian *filter
12:35 jian :INPUT ACCEPT [0:0]
12:35 jian :FORWARD ACCEPT [0:0]
12:35 jian :OUTPUT ACCEPT [21:1996]
12:35 jian -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
12:35 jian -A INPUT -p icmp -j ACCEPT
12:35 jian -A INPUT -i lo -j ACCEPT
12:35 jian -A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
12:35 jian was kicked by glusterbot: message flood detected
12:36 d3O joined #gluster
12:36 jian joined #gluster
12:36 ndevos ~pastebin | jian
12:36 glusterbot jian: I do not know about 'pastebin', but I do know about these similar topics: 'paste', 'pasteinfo'
12:36 ndevos @paste
12:36 glusterbot ndevos: For RPM based distros you can yum install fpaste, for debian and ubuntu it's dpaste. Then you can easily pipe command output to [fd] paste and it'll give you an url.
12:37 spider_fingers wow
12:37 spider_fingers didn't know about fpaste, nice...
12:37 robos joined #gluster
12:38 jian :-)
12:40 Norky then put the URL in here, so we can see it without flooding the channel
12:40 hagarth joined #gluster
12:41 Norky it's not a bad idea to chain multiple commands into one d/fpaste with (command1 ; command2 ; command3) | fpaste
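[Example, not from the log: Norky's tip in practice, assuming the fpaste tool glusterbot mentions above is installed:
(gluster volume info; gluster peer status; df -h) | fpaste
This returns a single URL to share in the channel instead of flooding it with output.]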
12:44 spider_fingers where gluster daemon keeps its port counter, i wonder. if you create/delete volumes a lot, you will eventually come to use lots of ports
12:48 jian problem solved, http://paste.ubuntu.com/5695252/ , thanks David Coulson , in maillist
12:48 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
12:57 chirino joined #gluster
13:05 balunasj joined #gluster
13:07 lalatenduM joined #gluster
13:11 Norky jian, if you've only done as David says, you now effectively have no firewall :)
13:11 Norky you *could* reinstate that line, just *after* the other INPUT directives
13:13 spider_fingers just do it like this, jian
13:13 spider_fingers http://paste.fedoraproject.org/7065/59961013/
13:13 glusterbot Title: #7065 Fedora Project Pastebin (at paste.fedoraproject.org)
13:15 jian Norky, why?
13:15 jian I've the rules at bottom,
13:15 jian -A FORWARD -j REJECT --reject-with icmp-host-prohibited
13:17 spider_fingers this is FORWARD, which means traffic passing through the host
13:17 spider_fingers you must have it in INPUT as well
13:17 jian yes, you are right.
13:17 jian I see now.
13:17 jian :-)
13:18 Norky you could also set your default policy ( :INPUT ) to REJECT
13:20 jian Now, my rules is http://paste.fedoraproject.org/7067/56000031/ .
13:20 glusterbot Title: #7067 Fedora Project Pastebin (at paste.fedoraproject.org)
13:20 jian Have INPUT , FORWARD both. thank you all , :-)
13:20 Norky no worries
13:23 jian a simple Chinese howto about installing and configuring glusterfs is complete: http://www.ylinux.org/forum/t/424 , for everyone.
13:23 glusterbot Title: View Topic: Glusterfs 使用总结 (GlusterFS usage notes) - YLinux (at www.ylinux.org)
13:23 spider_fingers better DROP than REJECT imho
13:23 hagarth joined #gluster
13:23 deepakcs joined #gluster
13:24 joehoyle- joined #gluster
13:24 spider_fingers bad guys should suffer with hanging connections, rather than receive polite rejects))))
13:24 jian yes, DROP gives no reply; REJECT replies with "host-prohibited", which leaks information to attackers. :-)
13:25 jian wow
13:26 jian I've a topic about this: http://www.ylinux.org/forum/t/412
13:26 glusterbot Title: View Topic: 如何使用 Linux 服务器搭建一个网关/路由器 (How to build a gateway/router with a Linux server) - YLinux (at www.ylinux.org)
13:30 mohankumar joined #gluster
13:32 Goatbert joined #gluster
13:33 manik joined #gluster
13:41 rwheeler joined #gluster
13:43 portante joined #gluster
13:46 jskinner_ joined #gluster
13:48 jskinner_ I am testing a Gluster setup with 10 1TB drives and 4 600GB drives, all formatted XFS. I am wondering what best practice would be for creating my bricks. I was thinking about creating each individual drive as its own brick, and then creating 2 volumes; 1 for the 1TB drives, and another for the 600GB drives.
13:48 jskinner_ the purpose of the system will be for running virtual machines
13:48 jskinner_ would this be an ideal setup? or is there a best practice for this?
13:50 spider_fingers it doesn't matter what size the bricks are, it works above the block level anyway
13:50 spider_fingers and that's the thing i don't like...
13:51 spider_fingers i would use iscsi for vm storage
13:52 jskinner_ well I am looking to run these VMs in a distributed file system
13:52 semiosis it's a good idea to have your bricks the same size
13:53 semiosis there's some new stuff coming out in glusterfs 3.4 for VM hosting... qemu/kvm integration and some kind of block device emulation
13:53 semiosis iscsi is a protocol not a storage solution
13:54 jskinner_ correct
13:54 jskinner_ ok
13:54 jskinner_ so my original idea of having 2 volumes to separate the different size bricks would be good
13:55 andrewjsledge joined #gluster
13:56 semiosis well it's reasonable :)
13:58 jskinner_ lol ok
13:59 Norky bear in mind that in a single volume the larger drives will have more files on them, so in a busy system, they will be more busy than the smaller drives...
14:01 Norky if they won't be that busy, or that potential performance imbalance is not a problem for you, then carry on :)
14:01 andrewjsledge joined #gluster
14:02 Norky you could of course have two separate volumes, one made of the 600GiB drives, one of the 1TiB drives
14:02 jskinner_ @Norky, yeah that was my plan
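[Example, not from the log: a sketch of how jskinner_'s two-volume plan might look, with hypothetical hostnames (srv1, srv2), brick paths, and a replica count of 2 as assumptions to adjust:
gluster volume create vm-fast replica 2 srv1:/bricks/600g-1 srv2:/bricks/600g-1
gluster volume create vm-bulk replica 2 srv1:/bricks/1t-1 srv2:/bricks/1t-1 srv1:/bricks/1t-2 srv2:/bricks/1t-2
Keeping the faster 600GB drives and the slower 1TB drives in separate volumes keeps bricks within a volume the same size and speed, which matches Supermathie's advice below.]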
14:02 spider_fingers the stability of gluster is still questionable
14:03 Norky stability in what sense?
14:03 Norky it's been pretty reliable for us
14:04 spider_fingers in a general way, what if i have a couple of nodes with striping under heavy load and half of them suddenly go down?
14:04 semiosis don't use ,,(stripe)
14:04 glusterbot Please see http://goo.gl/5ohqd about stripe volumes.
14:04 Norky well, you lose access to your data
14:05 Norky that's expected
14:05 semiosis imho, just dont use stripe
14:06 jskinner_ have any of you ever done testing with running VMs from Gluster?
14:06 Norky I have not.
14:06 spider_fingers well, where can i get whole list of options of storage?
14:07 Norky I will do at some point
14:07 semiosis spider_fingers: options of storage?
14:07 spider_fingers like DHT or striping or mirroring
14:08 Norky https://access.redhat.com/site/documentation/en-US/Red_Hat_Storage/2.0/html/Administration_Guide/chap-User_Guide-Setting_Volumes.html
14:08 glusterbot <http://goo.gl/koW26> (at access.redhat.com)
14:08 Norky is one place
14:08 spider_fingers so distribution is like a raid 5 array?
14:08 semiosis no
14:08 semiosis there's no good raid analogy
14:09 spider_fingers why? the concept is quite similar
14:09 Norky it's *kind*of* like striping, but at the file level, rather than block level
14:09 semiosis it's a stretch
14:09 semiosis and just leads to confusion
14:10 andrewjsledge joined #gluster
14:10 spider_fingers now i think i've got to read more docs
14:10 semiosis distribution places files on bricks (pure distribute) or replica sets (distributed-replicated)
14:10 Norky so a complete file of arbitrary length will be placed on one brick or another, rather than fixed size blocks being place on one disk, then the next, then the next
14:11 Norky yeah, trying to equate GlusterFS  to block RAID will hinder your understanding  more likely than help it
14:11 spider_fingers i was stuck with help from <gluster> utility, when you create volume, how do you list all available options
14:11 semiosis gluster volume set help
14:11 Norky gluster help
14:12 Norky then gluster <specific command> help
14:13 spider_fingers nah there are just stripe and replica
14:13 spider_fingers which version are you talking about?
14:13 spider_fingers maybe 3.2.7 is too old for fun?
14:14 semiosis spider_fingers: when you create a volume you provide a list of bricks, files are distributed over those bricks.  if you say replica N, then the list of bricks is treated as a list of replica sets, with files distributed over the replicas
14:15 semiosis for example 'gluster volume create replica 2 a:/1 b:/1 a:/2 b:/2' makes a distributed-replicated volume, with two replica pairs, which will each hold half of the files
14:15 spider_fingers hmmm
14:15 spider_fingers that's getting interesting
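[Example, not from the log: the command semiosis describes, spelled out with a hypothetical volume name (the real CLI requires one), where a and b stand for two servers:
gluster volume create myvol replica 2 a:/1 b:/1 a:/2 b:/2
This makes one distributed-replicated volume out of two replica pairs, (a:/1, b:/1) and (a:/2, b:/2); each pair holds roughly half of the files, and every file in a pair exists on both of its bricks.]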
14:15 lh joined #gluster
14:15 lh joined #gluster
14:16 spider_fingers thatnks for help, i'd better quit bothering people and read more)
14:16 semiosis :)
14:16 Supermathie jskinner_: From a storage admin side, NEVER combine two different types of storage into a single pool. Guessing those are 1TB nearline and 600GB SAS?
14:16 semiosis feel free to come back with more questions
14:16 spider_fingers yeah, never mind if i do)
14:17 jskinner_ they are both SAS, the 1TB are 7.5k rpm and the 600GB are 15k rpm
14:18 Supermathie jskinner_: Yep, as I suspected. nearline and 15k will perform vastly differently - you'll definitely want to separate them.
14:18 jskinner_ I planned on just putting them in different volumes, would that not be good?
14:18 bugs_ joined #gluster
14:18 jskinner_ ok
14:18 jskinner_ is the 1 brick per physical disk configuration optimal?
14:19 Supermathie mmmmmmmmmm that's what I'm testing now :) 11 x 3.2TB ioScale cards and 8 x 400GB SSDs on each node.
14:19 Supermathie At the moment yes, one brick per disk
14:20 jskinner_ ok
14:20 jskinner_ and distributed-replicated
14:20 jskinner_ was what I planned on doing
14:21 Supermathie Sounds sensible, depends on workload.
14:21 jskinner_ yeah
14:21 jskinner_ now I just need to figure out how to integrate this into CloudStack lol
14:22 Supermathie Shove it at the cloud and say "Here's some storage!"? :)
14:22 jskinner_ lol that's my plan
14:22 semiosis jskinner_: things like "good" and "optimal" need to be evaluated in your specific context.  sometimes one disk per brick is good, other times building bricks from arrays of disks is good.  it depends.
14:22 jskinner_ but I will have to use fuse
14:23 Supermathie Aaaaand other times nobody knows, because nobody's tested it.
14:24 jskinner_ so like, for example, create a raidz array out of say, 10 disks, and then present that to gluster as a single brick
14:24 semiosis imho a good rule of thumb is to use one physical disk per brick (and possibly many bricks per server) unless you need either... 1) to store extremely large files (that are near to the size of the disk itself) or 2) single thread performance in excess of what a disk can provide on its own
14:25 semiosis see ,,(canned ebs rant) for more of my opinion on that :)
14:25 glusterbot http://goo.gl/GJzYu
14:25 Norky jskinner_, across how many servers will these disks be spread?
14:25 Supermathie So that falls under ZFS tuning... as I understand it, you won't get as much performance out of a single raidz
14:25 Supermathie But I'm not sure about that - read up about it.
14:26 Supermathie sudo add-apt-repository ppa:semiosis/ubuntu-glusterfs-3.3
14:26 Supermathie woot woot
14:26 jskinner_ up to 10
14:26 jskinner_ but I want to start with 1
14:26 jskinner_ maybe more
14:26 jskinner_ maybe across 5 racks
14:28 jskinner_ with a 10Gb switching network
14:29 jskinner_ and you are right about zfs, each raidz pool is equal to 1 drive in performance; so the more pools the better.
14:34 daMaestro joined #gluster
14:35 Supermathie So another place I want to use gluster is on a web backend tier - I admin an application where users upload images and I want all servers to "immediately" have a copy of the file. Seems like gluster is the perfect fit for this - replicate to all bricks (1/server) and I'm done.
14:35 guigui3 joined #gluster
14:35 Supermathie ^ sensible? Seems almost like a textbook use case.
14:38 semiosis +1
14:39 Supermathie Ruh-roh... back in a bit. This machine is suddenly unhappy
14:39 Supermathie md1 : active raid6 sdb2[4](F) sdc2[5](F) sda2[6](F) sdd2[7](F) 143090816 blocks level 6, 64k chunk, algorithm 2 [4/0] [____]
14:39 spider_fingers lol
14:44 andreask joined #gluster
14:52 sefz joined #gluster
14:52 sefz hello
14:52 glusterbot sefz: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
14:54 sefz my question is fairly simple, i have a striped replicated volume across 3 machines (4 bricks each machine), is there a way, when I create a large file for testing (say: 200 GB), to see when it's been fully synced to all the glusters?
14:55 jdarcy joined #gluster
14:55 H__ afaik that's done right at the time it says it got it
14:55 Norky it's done when the write() returns
14:55 d3O joined #gluster
14:56 H__ worded far better, thanks :)
14:56 sefz ok, thanks :D
14:56 puebele2 joined #gluster
14:57 semiosis well there is write behind
14:58 Norky true
14:58 semiosis writes are sent to all servers at the same time, but that time may be delayed slightly
14:58 ekuric left #gluster
15:04 Supermathie joined #gluster
15:05 andrewbogott joined #gluster
15:05 spider_fingers left #gluster
15:10 sefz hmm, i just mounted a volume using -t glusterfs, mount was successful ( df -h gives me: glustera:/cloudxvol 4.0T 798M 3.8T 1% /mnt/gluster ) but i'm not able to write on it, even as root, what's going on?
15:11 puebele joined #gluster
15:11 semiosis sefz: generally, when you want to know whats going on, look at log files.  in this case, client log file, which is probably /var/log/glusterfs/cloudxvol.log
15:11 semiosis s/cloudxvol/mnt-gluster/
15:11 glusterbot semiosis: Error: I couldn't find a message matching that criteria in my history of 1000 messages.
15:11 semiosis glusterbot: meh
15:11 glusterbot semiosis: I'm not happy about it either
15:11 semiosis s/cloudxvol/mnt-gluster/
15:11 glusterbot semiosis: You've given me 5 invalid commands within the last minute; I'm now ignoring you for 10 minutes.
15:11 glusterbot What semiosis meant to say was: sefz: generally, when you want to know whats going on, look at log files.  in this case, client log file, which is probably /var/log/glusterfs/mnt-gluster.log
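[Example, not from the log: assuming the default client log path glusterbot gives above, you can watch it while reproducing the failure:
tail -f /var/log/glusterfs/mnt-gluster.log]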
15:14 andrewbogott left #gluster
15:15 sefz I see this, i guess that's the problem? [client-handshake.c:1445:client_setvolume_cbk] 0-cloudxvol-client-11: Server and Client lk-version numbers are not same, reopening the fds
15:15 glusterbot sefz: This is normal behavior and can safely be ignored.
15:19 nueces joined #gluster
15:29 puebele joined #gluster
15:33 portante|lt joined #gluster
15:40 sefz [2013-04-10 15:39:33.096104] E [stripe-helpers.c:268:stripe_ctx_handle] 0-cloudxvol-stripe-0: Failed to get stripe-size
15:40 sefz [2013-04-10 15:39:33.136247] W [fuse-bridge.c:2025:fuse_writev_cbk] 0-glusterfs-fuse: 214: WRITE => -1 (Invalid argument)
15:41 sefz i get this when copying big files in gluster fs... cp: writing `bigfile2': Invalid argument and cp: failed to extend `bigfile2': Invalid argument
15:43 jskinn___ joined #gluster
15:46 semiosis stripe?  really?
15:47 sefz why not?
15:47 semiosis you probably shouldn't use that, unless you really know what you're doing & have eliminated all the alternatives
15:48 semiosis why?
15:48 sefz you mean, striped replicated volumes?
15:48 Supermathie semiosis: Can you point us at a doc that talks about when and when not to do so?
15:48 semiosis it's why stripe... not why not stripe
15:48 semiosis doc: don't use stripe
15:48 semiosis Supermathie: ^
15:48 chirino What's the problem /w stripe?
15:49 sefz i thought was fully supported and fully functional... actually is a requirement for our project
15:49 Supermathie Unless you have, say, huge files? Movie files, etc?
15:49 ramkrsna joined #gluster
15:49 ramkrsna joined #gluster
15:49 semiosis JoeJulian wrote a bit about ,,(stripe) on his blog
15:49 glusterbot Please see http://goo.gl/5ohqd about stripe volumes.
15:51 Supermathie 'This is due to the fact that offset 0 of any file is always on the first subvolume.'
15:51 Supermathie That is a HUGE point.
15:51 semiosis stripe was designed for a very specific hpc use case, it has not seen widespread use, is not well supported by the community, and imo its complexity is usually unjustified.  people often see stripe and think "oh yeah, i want that" without understanding what it is
15:51 jdarcy It's on the first subvolume *of the stripe set*, not the first subvolume of the volume as a whole.
15:51 semiosis so i recommend against blindly going with it
15:51 Supermathie "OOHHH RAID! RAID IS GOOD!"
15:51 semiosis Supermathie: yeah, that
15:52 jdarcy Except for some very specific workloads, the overhead of splitting and recombining I/Os across multiple connections is greater than the parallelism benefit.
15:52 Supermathie stripe set -> subset of bricks across which a particular file will be striped?
15:52 jdarcy Plus it makes you vulnerable to issues such as XFS's crazy preallocation behavior.
15:53 semiosis sefz: if it's a requirement for your project, could you please enlighten us as to why? :)
15:53 Supermathie Here's a potential one:
15:53 jdarcy Supermathie: Correct.  We combine bricks into replica sets (if R>1), then combine those into stripe sets (if S>1), then combine those into volumes (always even if N=1).
15:53 Supermathie [michael@fleming1 ~]$ ls -lh /db/oradata/ALTUS/users01.dbf
15:53 Supermathie -rw-r----- 1 oracle oinstall 22G Apr 10 11:37 /db/oradata/ALTUS/users01.dbf
15:54 sefz i don't exactly know why (i think they don't know either), but they want to manage huge medical data, and I guess they want big files to be accessible really fast
15:54 jdarcy Running Oracle on top of *any* distributed filesystem might be sub-optimal for reasons beyond striping.
15:55 Supermathie jdarcy: I'm investigating those reasons :)
15:55 semiosis sefz: maybe making larger bricks and using distribute-replicate (without stripe) will satisfy your requirements.  i'd recommend exploring that route to the fullest before considering stripe.
15:56 JoeJulian If you can shard the oracle's data store, it might work. It "works" for innodb, though I don't have any huge performance requirements from mysql.
15:56 sefz I guess they want to run hadoop later on, on this filesystem... maybe that's why they want to stripe
15:56 Supermathie sefz: accessible fast? make the underlying bricks faster.
15:57 Supermathie JoeJulian: I'm currently trying to figure out how to dump its data into multiple files...
15:57 jdarcy Supermathie: It's just *really* hard to satisfy both latency and consistency requirements for databases when there's a network in the middle.
15:58 semiosis someone wrote a paper about that ;)
15:58 jdarcy semiosis: Abadi
15:59 Supermathie jdarcy: Yeah, I get that :) Not being shoddy about this setup. Getting about 56k TPS at the moment, still getting this rig set up for a proper test.
15:59 sefz ok, stripe is bad and dangerous... but what if I have files bigger than 1TB of size? (1 TB is maximum space for an amazon EBS drive)
15:59 jdarcy I used to hate his PACELC formulation, but it's kind of grown on me since.
16:00 semiosis sefz: ,,(canned ebs rant)
16:00 glusterbot sefz: http://goo.gl/GJzYu
16:00 JoeJulian lol
16:00 semiosis use lvm/mdadm to combine ebs volumes
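[Example, not from the log: a sketch of semiosis' suggestion, combining four hypothetical EBS devices into a single striped md array and formatting it as one large brick; device names, stripe level, and mountpoint are assumptions:
mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi
mkfs.xfs -i size=512 /dev/md0
mkdir -p /bricks/ebs0 && mount /dev/md0 /bricks/ebs0
LVM (pvcreate/vgcreate/lvcreate over the same devices) is an equivalent route if you want to grow the brick later.]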
16:00 jdarcy sefz: Ah.  If you need it to overcome brick-size limitations, that's a whole different kettle of fish.
16:01 jdarcy Actually I'd be inclined to try replication+striping across multiple instances using ephemeral storage (Netflix style) and avoid EBS altogether.
16:02 jdarcy I talked recently to a *large* customer who had been burned many times by EBS, switched to all-ephemeral, and hasn't had a problem since.
16:04 jdarcy I'd also recommend using another cloud service (rather fond of Rackspace, Storm on Demand, and RamNode - thanks for asking) but that's a separate discussion.
16:05 portante` joined #gluster
16:06 jdarcy My website is at Rackspace, my general-purpose server is at Linode, and I have three RamNode instances running right now.  Some people collect physical computers, I collect cloud machines.
16:06 semiosis going all ephemeral is probably more expensive.  for me ebs is very cost effective, and glusterfs replication between availability zones has allowed us to survive several ebs failures :)
16:08 jdarcy semiosis: Depends on your performance vs. capacity requirements.  Ephemeral is way better performance/$ while EBS is better capacity/$.
16:09 semiosis correct
16:09 jdarcy semiosis: Only one among many reasons I want to get that "DHT over DHT" tiering stuff going.
16:09 Supermathie jdarcy: putting clouds in your clouds?
16:10 jdarcy Hey dawg, I heard you like clouds, so I . . . you get the idea.
16:10 m0zes TIER THE TIERS
16:10 jdarcy Tiers For Beers
16:11 jdarcy Everybody wants to rule the data center.
16:11 jdarcy And you're welcome for putting that awful 80s song in your head.
16:13 bulde joined #gluster
16:17 chirino semiosis, JoeJulian: so from an outsider's perspective, using mdadm to stripe EBS disks, and using gluster to stripe bricks seems conceptually similar.
16:18 semiosis superficially
16:18 jdarcy chirino: Conceptually, yes.  There's a big difference in who handles scheduling across them, though.
16:20 * JoeJulian likes that 80s song... :/
16:20 chirino so are there any OS projects that provide EBS style network access to disks?
16:21 jdarcy chirino: Besides us, there's RBD (part of Ceph) and Sheepdog.
16:21 JoeJulian openstack cinder
16:21 jdarcy And anything iSCSI.
16:22 jdarcy JoeJulian: I can't tell if Cinder is an actual implementation or more of an API for others (such as us) to use.
16:22 chirino Are any of those part of RHEL?
16:22 semiosis theres probably an ebs service component in eucalyptus as well, but i havent looked into that in a while
16:22 JoeJulian More of an api
16:22 jdarcy chirino: I don't think so.
16:23 chirino :(
16:24 chirino So from my point of view, considering 10Gb networks, or even RDMA, you can easily build a network that's faster than a single disk.
16:24 jdarcy Even faster than a single SSD.
16:24 chirino Would be nice if gluster has some way of maxing those out.
16:25 Supermathie chirino: working on it...
16:25 Supermathie :)
16:26 JoeJulian chirino: Only one single client?
16:26 chirino Not saying gluster has to do it, perhaps it's an iSCSI thingy, but would be nice if gluster recommended how best to stripe multiple disks.
16:26 jdarcy chirino: And we do, but it turns out that the bottleneck as far as the application is concerned usually resides elsewhere.
16:27 chirino JoeJulian: in my case, yes.
16:27 chirino it's similar to a DB workload.
16:27 chirino the DB will have muliple clients tho.
16:27 jdarcy You might be able to build a network with higher throughput than a disk, but good luck building a network where the *latency* for network+disk is less than the latency for disk alone.
16:28 chirino some workloads are not latency sensitive.
16:29 jdarcy chirino: Indeed, and for those workloads we aggregate the bandwidth quite well.
16:29 * JoeJulian grumbles about SPOF database engines...
16:31 chirino 'we' as in gluster?  Using the gluster striping?
16:32 JoeJulian Your best bet is to require the database engine to shard its data store so it can be easily distributed and replicated without exceeding brick sizes.
16:32 jdarcy chirino: Even without.  Throughput-oriented applications also tend to be multi-client, so they distribute among servers even without striping.
16:32 Supermathie jdarcy: This even surprised me: http://imgur.com/2klpWkb (LGWR Avg Tm(ms) is log write latency)
16:32 glusterbot Title: Testing: oracle on gluster (kNFS) - Imgur (at imgur.com)
16:33 chirino well in the DB style apps, the DB tends to aggregate and write transactions to the write ahead log.  the more load you add the more throughput you need out of the writes to that single log file.
16:33 dxd828 joined #gluster
16:34 dxd828 What is the recommended file system for blocks on production systems?
16:34 Supermathie chirino: a rare case of RAID5 trumping RAID10
16:34 semiosis dxd828: xfs with inode size 512 or 1024
16:35 jdarcy chirino: Any DB claiming to have the D in ACID had better be doing at least one write (despite batching) per transaction, so it will rapidly become IOPS-limited.
16:35 jdarcy That's why all of the TPC numbers etc. are with thousands of spindles, short-stroking etc., until SSDs came along.
16:36 dxd828 semiosis, thanks.
16:36 semiosis yw
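[Example, not from the log: semiosis' xfs recommendation as a command, with a hypothetical block device:
mkfs.xfs -i size=512 /dev/sdb1
The larger inode size leaves room for GlusterFS's extended attributes inside the inode, avoiding extra reads during lookups.]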
16:36 jclift joined #gluster
16:38 JoeJulian With only one database engine accessing the files, that would be an instance for eager locking, right jdarcy?
16:38 JoeJulian I still haven't taken the time to look at what that really is.
16:38 jdarcy JoeJulian: Yep.  Eager locking, changelog piggybacking, post-op delay, all that stuff.  Quite similar for DBs and VMs.
16:40 jdarcy BTW, this is all good fodder for the doc I'm writing on New Style Replication.
16:40 aravindavk joined #gluster
16:45 chirino JoeJulian, jdarcy, /w if I stripe on replicas I don't actually need to wait for the sync to hit disk.  I'm willing to gamble that the write will survive on at least one of the replica's memory banks.
16:45 andrewjsledge joined #gluster
16:46 theron joined #gluster
16:49 * sefz idle for 20 minutes: starting autoaway (m1rk Xp)
16:50 zaitcev joined #gluster
16:51 jdarcy chirino: Right, then you'll just have the network and processing latencies to worry about.
16:51 chirino which is better than waiting for spindles to turn. ;)
16:52 jdarcy chirino: Indeed.  True regardless of whether or not you're striping.
16:52 chirino yep.
16:53 chirino but point being you don't need lots of IOPS to get good throughput in a DB workload.
16:53 jdarcy chirino: I'm not sure that you'll actually be able to avoid all of the disk hits in the current codebase, though.  We have to make sure that the actual data precedes the xattrs that mark operation state, and to do that we have to add fsyncs that the user didn't ask for.
16:54 chirino oh.
16:54 jdarcy chirino: Yes, you do need lots of IOPS.  Not disk IOPS, perhaps, but IOPS nonetheless.
16:54 chirino yeah.. I was refering to disk IOPS
16:54 chirino and bummer on the xattrs.
16:55 jdarcy chirino: Hence New Style Replication.  ;)
16:55 chirino 'New Style Replication' ?
16:55 jdarcy I have become completely disenchanted with the idea of relying on local filesystems to take care of ordering, consistency, etc. for us.  They've failed us too many times.
16:55 andrewjs1edge joined #gluster
16:56 jdarcy Kind of funny, since I was a local-filesystem guy once, and I used to hate when DB folks and others reinvented features we already had because they didn't trust us.
16:56 jdarcy chirino: Still just an idea in my head, but based on the last few years' worth of enduring complaints about our current replication.
16:58 jdarcy chirino: I'll make sure you're on the review list when I finish writing up a proposal.
16:58 chirino woot!
17:01 piotrektt joined #gluster
17:05 bulde joined #gluster
17:14 ash13 joined #gluster
17:15 acalvo joined #gluster
17:15 acalvo Hello
17:15 glusterbot acalvo: Despite the fact that friendly greetings are nice, please ask your question. Carefully identify your problem in such a way that when a volunteer has a few minutes, they can offer you a potential solution. These are volunteers, so be patient. Answers may come in a few minutes, or may take hours. If you're still in the channel, someone will eventually offer an answer.
17:16 acalvo when uploading large files I'm getting a ChunkWriteTimeout (10s)
17:16 acalvo it does work with smaller files
17:16 acalvo testing one node with just one REST server
17:18 hagarth joined #gluster
17:19 andrewjs1edge joined #gluster
17:19 rotbeard joined #gluster
17:20 brunoleon joined #gluster
17:28 arusso joined #gluster
17:31 andrewjs1edge joined #gluster
17:35 jskinner_ joined #gluster
17:38 jskinner joined #gluster
17:41 ash13 According to the gluster 3.2 docs, fuse-opt"
17:42 ash13 could be used to pass fuse options to the mount client.  is that gone in 3.3?
17:42 ash13 i get: unknown option fuse-opt (ignored)
17:42 ash13 if I try to use it.
17:43 ash13 (and if so, is there another mechanism to accomplish the same thing?)
17:46 Mo_ joined #gluster
17:47 joehoyle joined #gluster
17:52 semiosis ash13: full command please?
17:56 ash13 from the bottom of here:
17:56 ash13 http://gluster.org/community/documentation/index.php/Gluster_3.2:_Manually_Mounting_Volumes
17:56 glusterbot <http://goo.gl/qjbyR> (at gluster.org)
17:56 ash13 mount -t glusterfs -o log-level=WARNING,log-file=/var/log/gluster.log,fuse-opt=allow_other server1:/test-volume /mnt/glusterfs
17:57 bugs_ joined #gluster
17:59 NuxRo http://www.redhat.com/about/news/archive/2013/4/new-red-hat-storage-server-capabilities-deliver-the-storage-foundation-for-enterprise-open-hybrid-clouds
17:59 glusterbot <http://goo.gl/6KrXM> (at www.redhat.com)
17:59 NuxRo so it's based on 3.4alpha? :D
18:01 robos joined #gluster
18:06 ash13 semiosis: forgot to preface those comments w/ your handle, but to be clear I'm using 3.3.1, and those docs are from 3.2. So I'm not sure if the option got removed, or moved somewhere else.
18:07 semiosis i saw, just busy and dont have a quick answer for you :(
18:07 semiosis wish i could help more
18:10 ndevos ash13: available options are listed with "glusterfs --help", the /sbin/mount.glusterfs script parses the mount options and executes glusterfs with converted options
18:11 ndevos so, if a similar option is not in "glusterfs --help", there is little chance its available, and if it is, read /sbin/mount.glusterfs and see if a mount option sets it
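[Example, not from the log: a rough illustration of what ndevos describes — /sbin/mount.glusterfs turning mount options into glusterfs flags; the exact mapping depends on the version:
mount -t glusterfs -o log-level=WARNING,log-file=/var/log/gluster.log server1:/test-volume /mnt/glusterfs
roughly becomes
glusterfs --log-level=WARNING --log-file=/var/log/gluster.log --volfile-server=server1 --volfile-id=test-volume /mnt/glusterfs]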
18:12 ndevos NuxRo: its based on glusterfs-3.3, with loads of backports
18:18 NuxRo oh nice
18:18 H__ is 3.3.1 known to be slower with walking trees with lots of files compared to 3.2.5 ?
18:21 sefz weird things... i have a replica 3 stripe 4 setup, i create a 1 GB file using dd, and, while in the mounted fs, it is 1000000000 bytes; on the Gluster machines it is 999817216 ... is that normal?
18:22 H__ is that the stripe cutting it in pieces ?
18:26 Supermathie sefz: Can you paste exactly what you're doing and how you're checking into pastie?
18:31 jskinne__ joined #gluster
18:31 t35t0r joined #gluster
18:31 t35t0r joined #gluster
18:41 sefz Here is the pastie... http://pastie.org/private/dm0pnspn3osoe8kfstieg
18:41 glusterbot Title: Private Paste - Pastie (at pastie.org)
18:41 sefz weird things happening on glusterfs.
18:43 sefz actually, minor update here: http://pastie.org/private/ddpoaifr2etwifb5rsegq
18:43 glusterbot Title: Private Paste - Pastie (at pastie.org)
18:43 semiosis sefz: sparse files
18:46 sefz hmm... i used fallocate to create the file
18:47 NuxRo I'm surprised it worked
18:47 sefz NuxRo: why?
18:47 semiosis did it work?
18:48 NuxRo because I just tried it earlier and got: Operation not supported :)
18:48 semiosis there were some issues in that pastie
18:48 NuxRo truncate works though, at least the file creation
18:48 sefz if you do falloc inside the gluster it doesn't
18:48 sefz i did the falloc on /tmp
18:48 sefz then copied
18:48 NuxRo oh right
18:48 sefz which issues are you noticing?
18:56 joehoyle joined #gluster
18:56 sefz semiosis: which issues are you talking about?
18:58 semiosis your pastie... "I have two problems:" ... 1, ... 2...
19:05 samppah_ it looks like glusterfs starts one glusterfsd process per brick.. are there any benefits to splitting one big brick into three smaller ones, for example?
19:06 glusterbot New news from newglusterbugs: [Bug 895528] 3.4 Alpha Tracker <http://goo.gl/hZmy9> || [Bug 918917] 3.4 Beta1 Tracker <http://goo.gl/xL9yF>
19:08 semiosis samppah_: time required to sync a brick when you replace it
19:08 jskinner joined #gluster
19:09 semiosis samppah_: maybe better performance all around, more opportunity for parallelism
19:09 semiosis depending on the hardware
19:09 johnmorr joined #gluster
19:10 robos joined #gluster
19:10 samppah semiosis: yeah, parallel processing is something i had in mind
19:19 sefz is there a way when mounting a gluster fs, to include all the gluster machines? when I mount the filesystem i just mount one of the 3 servers.. and if it crashes, the mountpoint is gone
19:20 semiosis ~mount server | sefz
19:20 glusterbot sefz: (#1) The server specified is only used to retrieve the client volume definition. Once connected, the client connects to all the servers in the volume. See also @rrnds, or (#2) Learn more about the role played by the server specified on the mount command here: http://goo.gl/0EB1u
19:20 semiosis sefz: nfs client has a spof on the mount server, fuse client does not (as explained above)
19:21 sefz Nice, was always concerned about this
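[Example, not from the log: the fuse client only uses the named server to fetch the volume definition, but if that one host is down at mount time the mount itself fails; depending on the version, mount.glusterfs accepts a backup volfile server option, e.g.:
mount -t glusterfs -o backupvolfile-server=glusterb glustera:/cloudxvol /mnt/gluster
Round-robin DNS for the mount name is another common workaround.]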
19:29 avati joined #gluster
19:29 * avati is backkk
19:33 brunoleon_ joined #gluster
19:48 plarsen joined #gluster
19:50 JoeJulian samppah: not necessarily any benefits to separating the brick. In fact, it's been proposed (by Ab? or was it avati?) to go back to serving multiple bricks from a single glusterfsd process, like it was prior to the cli.
20:01 jbrooks joined #gluster
20:03 jclift_ joined #gluster
20:05 jskinne__ joined #gluster
20:06 glusterbot New news from newglusterbugs: [Bug 950761] forge.gluster.org - No MetaData Pages <http://goo.gl/JEHxt>
20:41 joehoyle joined #gluster
20:48 lh joined #gluster
20:48 lh joined #gluster
21:13 gbrand_ joined #gluster
21:21 tjstansell joined #gluster
21:36 hagarth joined #gluster
21:45 tjstansell there was talk on the users list of possibly having a new release this week.  anyone know if that's actually true?  and if that would only be a new 3.4 or also a new 3.3 release?
21:46 johnmark tjstansell: there are potentially 2 new builds coming out - one for 3.3 and one for 3.4
21:47 JoeJulian Are we going to have a 3.3.2 qa release first?
21:47 johnmark JoeJulian: exactly
21:47 johnmark that's the imminent one for 3.3.x
21:47 tjstansell yay!
21:48 JoeJulian well, a qa isn't a release... :P
21:48 tjstansell it's close enough to unstall my existing project... ;)
21:48 johnmark haha... ok ok, fine - not a "release" per se
21:48 johnmark heh
21:48 tjstansell i've been waiting for *something* that includes that fix so when i rebuild a primary brick all my timestamps don't get hosed....
21:49 tjstansell and i figure a qa release is probably no worse than running one of the existing kkeithley builds for 3.3 that include all of the additional backports over 3.3.1
21:49 JoeJulian It doesn't. Just a couple.
21:50 JoeJulian Most of the -3 through -11 are packaging things for ufo.
21:50 tjstansell ah. so not much there to be worried about anyway...
21:52 johnmark JoeJulian: correct
21:56 tjstansell is there a set of criteria you use for QA releases before calling it GA?  I see that 3.3.1 had qa1 on Aug 18, qa2 on Aug 30, qa3 on Sep 18, then 3.3.1 released on Oct 11 (according to directory timestamps on QA releases list).  Curious if there's a way to get a ballpark of when 3.3.2 GA might be available.
21:57 tjstansell i'm sure a lot depends on what issues are found...
21:57 JoeJulian I've always assumed it was based on bug reports. No news is good news....
21:57 tjstansell but does it require no new bugs for X days?  where X might be 20? 30?
21:58 JoeJulian I don't know. Maybe johnmark does. I can't even get a commitment to how long a version will remain supported.
21:59 JoeJulian distro packagers have asked about production cycles... no clue.
21:59 tjstansell fun.
22:19 johnmark JoeJulian: yeah, we need a policy around EOL
22:19 johnmark we don't have one :(
22:25 JoeJulian johnmark: There was a proposal, but it didn't seem to get anywhere: http://gluster.org/community/documentation/index.php/Life_Cycle
22:25 glusterbot <http://goo.gl/K6iU0> (at gluster.org)
22:26 JoeJulian johnmark: And, of course, regardless of what I propose, or what a board of users adopt, if the devs aren't on board it's just a bunch of words.
22:29 johnmark JoeJulian: *SO* true
22:30 johnmark but it's a real problem - something we absolutely need to solve
22:31 bronaugh_ joined #gluster
22:32 bronaugh_ hey; so. I see contradictory documentation on the 'net about this
22:32 bronaugh_ can you, or can you not, rename a volume?
22:32 bronaugh_ and can you, or can you not, change the path to the brick that a volume contains?
22:32 JoeJulian You cannot rename a volume. You can delete it and recreate it with a different name.
22:32 Bullardo joined #gluster
22:33 bronaugh_ ok. are there any nasty implications for that?
22:33 JoeJulian You can change the path using replace-brick.
22:33 bronaugh_ cute re replace-brick.
22:33 bronaugh_ works with volume where bricks=1?
22:33 JoeJulian Not really, no. It will complain (as you're adding the bricks) about a "path or prefix" already being part of a volume.
22:33 JoeJulian ,,(prefix)
22:33 glusterbot JoeJulian: Error: No factoid matches that key.
22:33 bronaugh_ right, ok.
22:34 JoeJulian @prefix
22:34 JoeJulian @path or prefix
22:34 bronaugh_ so basically it's delete and recreate if paths change.
22:34 JoeJulian @search prefix
22:34 glusterbot JoeJulian: supybot.reply.withNickPrefix, supybot.plugins.Dunno.prefixNick, supybot.plugins.RSS.announcementPrefix, and supybot.plugins.Success.prefixNick
22:34 JoeJulian @search brick
22:34 glusterbot JoeJulian: There were no matching configuration variables.
22:34 JoeJulian dammit, what the hell was that factoid...
22:34 JoeJulian @former brick
22:34 glusterbot JoeJulian: You'll have to clear the extended attribute that flags that directory as a former brick. See http://goo.gl/YUzrh
22:34 JoeJulian Aha!
22:35 bronaugh_ right.
22:35 JoeJulian If paths change? You should be able to do replace-brick. Even a replace-brick ... commit force if there's nothing that needs to be migrated to the new path.
22:37 JoeJulian For instance.... I used to mount bricks under /var/lib/glusterfs/myvol/brick_a. I changed my mind and moved them to /data/glusterfs/myvol/brick/a. To do that I did: gluster volume replace-brick myvol server1:/var/lib/glusterfs/myvol/brick_a server1:/data/glusterfs/myvol/brick/a commit force
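[Example, not from the log: the 3.3-era replace-brick flow when data does need to migrate to the new path, using JoeJulian's hypothetical paths; "commit force" as shown above skips the migration:
gluster volume replace-brick myvol server1:/var/lib/glusterfs/myvol/brick_a server1:/data/glusterfs/myvol/brick/a start
gluster volume replace-brick myvol server1:/var/lib/glusterfs/myvol/brick_a server1:/data/glusterfs/myvol/brick/a status
gluster volume replace-brick myvol server1:/var/lib/glusterfs/myvol/brick_a server1:/data/glusterfs/myvol/brick/a commit]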
22:45 hagarth joined #gluster
22:53 joehoyle joined #gluster
22:55 ash13 Mount options: uid=,gid=   (http://supercolony.gluster.org/pipermail/gluster-devel/2007-May/000831.html)  That is something that I have seen on a number of message threads over the years, and something I'm currently looking for.  Is that possible?
22:55 glusterbot <http://goo.gl/eKprU> (at supercolony.gluster.org)
23:03 JoeJulian No.
23:05 ash13 a number of FUSE based mount clients (I presume via FUSE, although I'm not familiar with the FUSE code) , support it.  Would it not be possible for mount.glusterfs to do the same?
23:09 JoeJulian It's possible and I'm not sure why (other than perhaps nobody using it) the translator that provided that functionality was dropped a while back.
23:12 joehoyle joined #gluster
23:18 hagarth joined #gluster
23:20 RobertLaptop joined #gluster
23:30 jag3773 joined #gluster
23:34 ash13 That's unfortunate.  Thanks for the response.
23:37 ChikuLinu__ joined #gluster
23:38 flin_ joined #gluster
23:38 torbjorn1_ left #gluster
23:39 mtanner_ joined #gluster
23:39 johnmark joined #gluster
23:48 phix joined #gluster
23:49 phix hey
23:50 yinyin joined #gluster
23:54 hagarth joined #gluster
