
IRC log for #gluster, 2013-11-21


All times shown according to UTC.

Time Nick Message
00:18 msolo joined #gluster
00:20 gdubreui joined #gluster
00:27 JoeJulian I, personally, use /data/$volume_name/{a,b,c,d}/brick for each of my 4 bricks where a is associated with the first hard drive (through lvm), b is associated with the second, etc.
00:28 dbruhn makes sense
00:31 dbruhn Well I think I have hit the holy grail of cheapest possible servers for systems. HP DL180 G5's, 32GB of ram, 2 Quad Procs ( I know overkill but I run my application stack on the same server),12x 3TB drives, plus QDR IB cards for $4240.67 each
00:31 Comnenus joined #gluster
00:32 JoeJulian That's a lot of bang for the buck.
00:32 Comnenus I'm trying to set up NFS on CentOS.  I did see GNFS in the documentation, but not about how to set it up...  can someone point me to that by any chance?  Google is failing me.
00:33 dbruhn Totally a good deal, I got 6 of them total. I can hook anyone up with the guy if they need.
00:33 dbruhn That was with 3m QDR cables as well
00:33 dbruhn plus he upgraded the RAID controllers to p410's so the 3TB drives worked in them
00:35 JoeJulian ~nfs | Comnenus
00:35 glusterbot Comnenus: To mount via nfs, most distros require the options, tcp,vers=3 -- Also an rpc port mapper (like rpcbind in EL distributions) should be running on the server, and the kernel nfs server (nfsd) should be disabled
00:36 Comnenus is it automatically already an nfs export, then?
00:36 dbruhn yep
00:36 Comnenus oh, cool
00:36 dbruhn comnenus, gluster has an NFS server built in to serve clients where the FUSE client won't work
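
Aside: a minimal sketch of the mount line glusterbot's options translate to; "server1" and "gv0" are placeholder names here:

    # rpcbind must be running on the server and the kernel nfsd must be disabled
    mount -t nfs -o tcp,vers=3 server1:/gv0 /mnt/gv0
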
00:38 Comnenus is there a trick to add it as a datastore in ESXi ?
00:38 JoeJulian @esx
00:38 glusterbot JoeJulian: I do not know about 'esx', but I do know about these similar topics: 'ebs'
00:38 JoeJulian hmm, nope...
00:38 Comnenus hmm
00:38 dbruhn I have seen some write-ups around google for using it as a datastore, but never done it myself
00:38 JoeJulian I remember something about esxi requiring a specific port...
00:39 Comnenus vmware is complaining that "The NFS server does not support MOUNT version 3 over TCP"
00:39 criticalhammer may I ask why you're using nfs over the gluster client?
00:39 dbruhn He is trying to use it as a datastore for Vmware
00:39 dbruhn esxi doesn't have a gluster client built in
00:39 criticalhammer Oh, gotchya
00:40 Comnenus ...and I didn't know it existed.  I've been playing with this for about an hour.  not even.
00:40 criticalhammer Is this a production environment?
00:40 Comnenus nope.
00:40 Comnenus not yet anyway, that's why I'm playing around with it.  It will be.  So I'm trying a few different things.
00:41 criticalhammer One of these days i'd like to do the same
00:43 JoeJulian By "the same" I wouldn't suggest vmware. Qemu/kvm is easier to manage and supports libgfapi?
00:43 JoeJulian s/\?$/./
00:43 Comnenus I need to use what people know how to use, though.  Because I have no intentions of owning this cluster forever :)
00:43 glusterbot JoeJulian: Error: I couldn't find a message matching that criteria in my history of 1000 messages.
00:43 criticalhammer "The same" as in some type of vm environment, does not have to be vmware
00:43 Comnenus oh - I'm not running this IN vmware
00:44 JoeJulian I understand.
00:44 Comnenus I have one machine running ESXi and 2 storage boxen running CentOS with gluster installed
00:44 criticalhammer I think he was commenting on my comment
00:44 Comnenus fair enough.
00:45 JoeJulian But as hypervisors go, my personal preference is kvm for support and, of course, it's open-source.
00:46 asias joined #gluster
00:46 JoeJulian Rats... I thought I had found something useful at http://www.2stacks.net/?page_id=120 but it's a "to be continued..." :/
00:46 glusterbot Title: Glusterfs, UCARP and ESXi 5.1 | 2stacks.net (at www.2stacks.net)
00:46 criticalhammer email the author and ask him/her to speed it up :P
00:47 dbruhn comnenus, did you disable NFS on the server and make sure the gluster NFS server is enabled
00:47 dbruhn the two can interact with each other
00:47 JoeJulian Hmm, yeah, everything I see says it just works.
00:47 Comnenus dbruhn: how do I see if the gluster NFS is enabled?  I did disable nfsd
00:47 dbruhn make sure you can mount the volume with another client before trying to access it via vmware
00:48 dbruhn There is a volume option to enable NFS, you will have to stop the existing NFS daemon and make sure it doesn't start on boot
00:49 Comnenus what would it call the export?  I followed the quick start.  Apparently it's not exported as /data/gv0/brick1
00:50 dbruhn showmount -e xxx.xxx.xxx.xxx
00:50 dbruhn will show you your available mounts
00:51 Comnenus well then... that would explain quite a bit :D
00:51 Comnenus can mount between linux <-> linux but not vmware <-> linux ... but at least I know gnfs is working now.
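
Aside: dbruhn's steps pulled together as a rough checklist (EL6-style service names; "server1" and "gv0" are placeholders):

    service nfs stop && chkconfig nfs off      # stop the kernel NFS server and keep it off
    gluster volume set gv0 nfs.disable off     # make sure gluster's built-in NFS server is enabled
    showmount -e server1                       # the volume should now appear as an export
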
00:52 dbruhn You can watch your logs and see if there any errors
00:52 dbruhn on the gluster servers
00:56 Comnenus hey - vmware was able to add server2 but not server1.  So I screwed something up.
00:56 Comnenus progress!
00:56 JoeJulian iptables?
00:56 Comnenus no I have them shut off for the time being.
00:57 dbruhn I am assuming you are set up with a replicant volume?
00:57 Comnenus yes
00:57 dbruhn make sure your volume is healthy
00:57 Comnenus oh, boom.  stopping nfsd also stopped rpc,
00:58 dbruhn gluster volume status
00:58 dbruhn that will show you if everything is connected and happy
00:58 dbruhn and gluster peer status
00:58 dbruhn will show you if the peers are all connected
00:58 Comnenus says Online Y even though the machine is rebooting.
00:58 dbruhn lol
00:59 dbruhn it's magic
00:59 JoeJulian hasn't timed out yet probably
00:59 dbruhn Some days I hate documentation... till I need it
00:59 Comnenus ah, I read that incorrectly.  it did take the other server out.  Now it's back.
01:00 Comnenus It's 8pm.  I'm tired.  mistakes are being made.
01:00 dbruhn No worries comnenus, it's post normal work hours for all of us, except maybe californians
01:00 Comnenus those damn californians!
01:00 Comnenus actually I like californians.  They're the only other people that make us look sane.
01:01 dbruhn I am on day 3 of 7am to ? fighting with stuff so I feel you
01:01 dbruhn hahah
01:01 Comnenus yeah -  long days this week.
01:01 Comnenus I got the critical stuff out of the way, at least.
01:02 JoeJulian <-- Seattle
01:02 dbruhn I love Seattle!
01:02 JoeJulian But I took a 2 hour lunch, so I'm still working.
01:03 dbruhn I had an apple on my way to the data center this morning, and just finished my fourth 12oz red bull today...
01:03 Comnenus <-- Boston
01:03 criticalhammer that kind of life style isnt healthy
01:03 JoeJulian Heh, I was trying to picture you rack-mounting an apple
01:04 dbruhn <---- Minneapolis/Saint Paul
01:04 dbruhn I have a xraid and an xserve at home ;)
01:04 Comnenus fourth red bull?  good god
01:04 Comnenus ...I just realized I didnt have a single cup of coffee today.  There's my problem.
01:04 criticalhammer i'd vibrate out of my chair if I drank that much
01:05 Comnenus one red bull makes me vibrate out of my chair
01:05 dbruhn I grabbed two this morning, and then one of my employees surprised me with two over lunch when they realized I was sitting here with a hardware tech for a couple hours
01:05 Comnenus I'm only 5'7 110lbs or so...  a kid portion of red bull is probably all I need.
01:05 dbruhn hahah fair
01:06 Comnenus so the entire can hits me like a brick
01:06 criticalhammer I try not to caffeinate anymore, it isnt good for me.
01:06 dbruhn I actually drink caffeine free tea mostly until weeks like this
01:07 Comnenus 10+ years ago I was drinking an entire pot of coffee per shift
01:07 Comnenus that couldn't have been good.
01:07 Comnenus huh.  maybe that's why I"m short..
01:07 criticalhammer lool
01:07 dbruhn haha
01:07 dbruhn I don't even want to think about my drinking habits 10 years ago
01:08 dbruhn but i'm only 5'9" so... yeah
01:08 Comnenus I don't want to think about my drinking habits anytime since 2003 :X
01:08 dbruhn lol
01:08 Comnenus "Do you have a drinking problem?"  "Well, I wouldn't call it a problem...  I'm pretty good at it."
01:08 dbruhn Joe, is it still suggested to adjust the inode size for ifs?
01:08 dbruhn xfs
01:09 dbruhn I've never had a drink of alcohol... but I used to drink 5-10 red bulls a night
01:10 criticalhammer Is there a brick count limit per cluster?
01:10 criticalhammer in other words, when putting in more bricks hurts
01:11 cyberbootje joined #gluster
01:11 criticalhammer I've been rolling a gluster deployment scenario around in my head which involves just adding more bricks with drives to increase volume size
01:11 JoeJulian criticalhammer: yes and no. Looks like there's been some concern when exceeding 1000 servers.
01:11 Amanda joined #gluster
01:11 criticalhammer JoeJulian: Is there a practical volume limit?
01:11 criticalhammer volume limit per cluster
01:11 JoeJulian tcp sockets
01:11 criticalhammer mmm
01:12 criticalhammer so 50 volumes each representing a ext4 mount is not a good idea
01:12 JoeJulian That should be fine.
01:12 msolo what is the tcp socket issue that limits?
01:12 JoeJulian 65530ish bricks may be a problem though.
01:12 criticalhammer Doesnt each volume take up a bit of memory on each brick?
01:13 JoeJulian Each brick does, yes, but that can be managed by setting the cache limit. I have 60 bricks per server.
01:13 criticalhammer also if a brick has to be recovered, each of those volumes get recovered all at the same time. So wouldnt all the bricks just starve?
01:14 JoeJulian That can be a relatively minor issue. Self-heal runs on a lower priority queue than clients so although it's a high load, the clients aren't starved.
01:15 Comnenus It's working, muahahahaha
01:15 criticalhammer JoeJulian: how many servers do you have in a cluster?
01:15 JoeJulian criticalhammer: 3.
01:16 Comnenus so if I've got server1 and server2, and I add server2:/gv0 how would failover work if I shutdown server2?  I'm assuming it wouldn't?
01:16 criticalhammer it seems i was using the brick terminology the wrong way xD
01:16 Comnenus or at least, not without setting something more up
01:16 JoeJulian 3 servers, 4 drives, 15 volumes. I do replica 3 across the servers and distribute across the 4 drives. I partition those drives with lvm.
01:16 JoeJulian @glossary
01:16 glusterbot JoeJulian: A "server" hosts "bricks" (ie. server1:/foo) which belong to a "volume"  which is accessed from a "client"  . The "master" geosynchronizes a "volume" to a "slave" (ie. remote1:/data/foo).
01:16 Comnenus I feel like Dr. Evil just explained that.
01:17 JoeJulian lol
01:17 JoeJulian @mount server
01:17 glusterbot JoeJulian: The server specified is only used to retrieve the client volume definition. Once connected, the client connects to all the servers in the volume. See also @rrnds
01:17 Comnenus The "master" geosychronizes a "volume" to a "slave" which I call the "death star"
01:17 JoeJulian Comnenus: Does that answer your question? ^
01:17 criticalhammer so a brick is the folder containing a file system on a server
01:17 Comnenus actually yes.  thanks
01:18 JoeJulian criticalhammer: yes
01:18 JoeJulian if you believe in folders
01:18 criticalhammer that folder is part of a total volume spanning multiple servers
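
Aside: a hypothetical volume-create matching JoeJulian's layout described above (replica 3 across three servers, distributed across the per-server brick directories a-d); server and volume names are placeholders, and the brace expansion assumes a bash shell:

    gluster volume create myvol replica 3 \
        server{1,2,3}:/data/myvol/a/brick \
        server{1,2,3}:/data/myvol/b/brick \
        server{1,2,3}:/data/myvol/c/brick \
        server{1,2,3}:/data/myvol/d/brick
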
01:18 Comnenus @rrnds
01:18 glusterbot Comnenus: I do not know about 'rrnds', but I do know about these similar topics: 'rrdns'
01:18 criticalhammer yeah, folder is just somthing that poped into my mind
01:18 JoeJulian I grew up using DOS, so they'll always be directories to me.
01:18 Comnenus huh?  says see also @rrnds right there
01:19 JoeJulian lol
01:19 Comnenus JoeJulian: I'm assuming though, since vmware is just treating it as an NFS share, it wouldn't know anything about the volumes
01:19 criticalhammer im starting to think 8gb of ecc memory may not be enough for 50 volumes spanned across 4 servers
01:19 JoeJulian @change "mount server" s/rrnds/rrdns/
01:19 glusterbot JoeJulian: Error: The command "change" is available in the Factoids, Herald, and Topic plugins. Please specify the plugin whose command you wish to call by using its name as a command before "change".
01:19 JoeJulian @factoids change "mount server" s/rrnds/rrdns/
01:19 glusterbot JoeJulian: Error: 's/rrnds/rrdns/' is not a valid key id.
01:19 JoeJulian @factoids change "mount server" 1 s/rrnds/rrdns/
01:19 glusterbot JoeJulian: The operation succeeded.
01:19 Comnenus @rrdns
01:19 glusterbot Comnenus: You can use rrdns to allow failover for mounting your volume. See Joe's tutorial: http://goo.gl/ktI6p
01:20 Comnenus JoeJulian: are you Joe by any chance?
01:20 JoeJulian Comnenus: You're right. for nfs you'll need ucarp or something like it.
01:20 JoeJulian Last time I checked.
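
Aside: the rrdns approach in the factoid is just multiple A records behind one name for the native client's mount server; a hypothetical zone snippet and mount, with placeholder names and documentation addresses:

    gluster  IN  A  192.0.2.11
    gluster  IN  A  192.0.2.12

    mount -t glusterfs gluster.example.com:/gv0 /mnt/gv0
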
01:20 criticalhammer hey JoeJulian if you dont mind me asking, whats your hardware specs on your servers?
01:21 criticalhammer Unfortunately, because I don't have good hardware to test with i've been trying to extrapolate what I need from other setups. Kind of nerve wracking.
01:21 criticalhammer and annoying
01:22 dbruhn critical, what are your goals?
01:22 criticalhammer mass storage for users
01:23 JoeJulian These are ASUS RS700-E7-RS4 with 16Gb and dual Intel Xeon E5-2609 and 1 TB enterprise drives.
01:23 criticalhammer taking large outputs from off site supper computers and dump them onto storage for a small group of people to use
01:23 criticalhammer and doing it cost effectively
01:23 dbruhn size/ throughput requirements?
01:23 criticalhammer oh, it varies
01:23 JoeJulian off site supper computers.... sounds delicious.
01:23 criticalhammer it is joe
01:24 JoeJulian And you get a good workout before you eat.
01:24 criticalhammer and it wouldnt be from just supper computer outputs, it would be used for daily storage
01:24 dbruhn hard to spec something without some requirements
01:24 criticalhammer yeah =|
01:25 JoeJulian I'm always amazed how often that happens.
01:25 dbruhn you will find the whole spread of guys running two servers with a drive or two in each server all the way to 100's of tb's and 10GB/56GB networks here
01:25 Comnenus alright I'm going home.  thanks for the help. I'm sure I'll be around.
01:25 dbruhn have a good night comnenus
01:26 criticalhammer So we have a 2 36tb SAN array using 8gb fiber
01:26 JoeJulian even 1Gb...
01:26 criticalhammer and people are saying that over a 1gb connection it takes way too long to transfer terabyte-size data sets from off site onto local storage
01:26 dbruhn I am actually thinking about standing my 1GB cluster back up for some limited use
01:26 JoeJulian Of course.
01:27 JoeJulian Define your needs, then engineer to meet them.
01:27 criticalhammer we currently are working on installing a 10gb connection from a scientific network
01:27 JoeJulian The other way around is doomed.
01:27 dbruhn I have to agree with Joe here, gluster is fairly graceful about scaling straight with spindles and network
01:28 criticalhammer So for starters here are my requirements. 96tb of storage using a 10gbe backbone, which will serve out storage to 6 servers and 10+ desktops
01:28 JoeJulian And I don't mean that just for GlusterFS. If you want to meet and exceed your principals expectations, they have to know what they are before you get started.
01:28 JoeJulian Otherwise it'll never be good enough.
01:29 dbruhn ciriticalhammer, what are your throughput requirements? or IOPS requirements
01:29 criticalhammer I agree, unfortunately the end users really dont know what they want. The only concrete thing I got from them was "we need more storage"
01:29 dbruhn and known max file size?
01:29 criticalhammer max file size is 1.2 TB tar files
01:30 JoeJulian Pfft... "users"... I'm talking principals. The guys that write your check. They're the ones that need to know the expectations and know that you satisfied them.
01:30 criticalhammer but span from 5kb files to 1.2TB files
01:30 criticalhammer there scientists
01:30 criticalhammer they look to us and say "make it work"
01:30 criticalhammer their8
01:30 criticalhammer *
01:30 dbruhn they're ;)
01:30 criticalhammer yeah, lool, english
01:31 dbruhn more space is an easy requirement to work against!
01:31 criticalhammer yeah it is
01:31 dbruhn attach some usb hard drives to things and call it good
01:31 dbruhn lol
01:31 criticalhammer the bonnie++ test ive done gave me some benchmark speeds to go for
01:32 dbruhn what were those speeds?
01:32 criticalhammer unfortunately i didnt setup the SAN thats currently being used
01:32 criticalhammer 1 sec
01:32 JoeJulian They can be told their expectations. That's fine. You just want to show that you met them so they can perceive your value.
01:32 criticalhammer wheres a good spot to dump large text?
01:32 JoeJulian fpaste
01:32 dbruhn fpaste
01:33 criticalhammer i did three tests
01:33 dbruhn Joe is right, I know I need to hit 700ish IOPS for every 1TB of data I store, makes it super easy to build a system that scales against that
01:33 criticalhammer 1 from desk top right into the san, with a server acting as a NFS server
01:33 criticalhammer 1 from NFS server right into the SAN
01:34 criticalhammer and for comparison, one on the NFS server that is using software RAID 10
01:34 criticalhammer and at the time the NFS server was not being used at all
01:35 davidbierce joined #gluster
01:36 dbruhn So? 66MB/s? or
01:36 dbruhn lol
01:36 criticalhammer 1 sec
01:36 criticalhammer im at work xD
01:36 criticalhammer got side tracked
01:36 dbruhn no worries, I am watching progress bars
01:36 hybrid512 joined #gluster
01:37 criticalhammer http://ur1.ca/g2k1a
01:37 glusterbot Title: #55593 Fedora Project Pastebin (at ur1.ca)
01:37 criticalhammer and ive seemed to have misplaced my local RAID 10 test
01:38 dbruhn So is your goal to emulate the NFS to SAN speeds, or to emulate the NFS speeds
01:39 criticalhammer emulate current NFS speeds
01:39 criticalhammer or exceed them
01:39 dbruhn so 85MB/s or better
01:40 dbruhn if I am reading that correctly
01:40 criticalhammer sorry, that wasnt completely true. For desktops I want to emulate NFS speeds, for servers I want to emulate NFS to SAN speeds.
01:41 criticalhammer the desktops are on 1GB ethernet so I cant improve that without more money in the network infrastructure which isnt going to happen.
01:41 dbruhn That's going to be tough to emulate going from SAN access speeds and no protocol overhead to a NAS with protocol overhead
01:41 criticalhammer yeah
01:42 criticalhammer 10gbe should mitigate that
01:42 kiwikrisp joined #gluster
01:43 dbruhn Yeah but you end up with 20ish % loss right off the top with TCP/IP
01:43 criticalhammer my plan was to throw all the servers that need direct gluster access on a 10gbe switch
01:43 criticalhammer exactly
01:43 dbruhn not to mention adding a software layer to the mix
01:43 criticalhammer yeah
01:44 dbruhn Do the servers and the desktops access the same data?
01:45 criticalhammer we try not to do that, because mounting the same partition on multiple machines can cause issues
01:45 dbruhn So in theory you could tailor different hardware specs to the workloads
01:45 criticalhammer yes
01:46 criticalhammer I was hoping to keep the specs the same for the initial buy in, so that its easier to calculate costs.
01:46 dbruhn Are you more concerned about read speeds or write speeds?
01:46 criticalhammer write speeds
01:47 dbruhn Planning on using replication for fault tolerance?
01:47 criticalhammer nope, it turns out the users have used the SAN with an understanding that it could catch on fire and they could lose all their data. They are fine with keeping with that same mentality with this gluster setup
01:48 dbruhn ok, well you are looking at adding the complexity of a multiple node NAS system, so if one node goes down, the whole thing is in theory dead
01:48 criticalhammer i understand
01:49 dbruhn what kind of SAN are you replacing anyway?
01:49 criticalhammer a 5 year old fujitsu SAN
01:49 criticalhammer soon to be 6 years old
01:50 dbruhn lol, I have 5 net app systems that are in the same state, I am building my last gluster cluster to replace them as we speak
01:50 criticalhammer thats the intention here
01:50 criticalhammer getting new hardware in so that we have contingencies
01:52 dbruhn so from your servers you need to do about 500MB/s
01:52 dbruhn to maintain the status quo
01:53 criticalhammer My thought for the setup was 4 bricks each with 1 2.4 Ghz quad core processor, 8 Gigs of ecc ram, with a lsi 9260-8i with battery backup setup in a 24TB raid 6 array, write through enabled, all on a 10gbe network.
01:53 criticalhammer sorry 4 servers
01:53 criticalhammer not bricks
01:53 daMaestro joined #gluster
01:54 kiwikrisp Folks I'm having issues with my gluster setup (2 replicant nodes 6.3TB). I'm using NFS and gluster native clients to access the pool and am having issues where the disks are seeing constant activity. When I run iotop both nodes are just doing a bunch of reading which I assumed meant that they were just synchronizing, but it's been going on for several hours and nothing has been changing because nobody is attached.
01:54 kiwikrisp I thought it might be underlying file system issues so I did xfs_check, which garnered no errors. the gluster volume status command says there are no active volume tasks, yet glusterfs is the only thing using the disks in iotop. I'm at a loss what to look for next. Any suggestions? (glusterfs 3.4 on CentOS 6.4 servers)
01:54 JoeJulian self-heal?
01:55 JoeJulian volume heal $vol info
01:55 JoeJulian Do you do geo-replication?
01:55 dbruhn criticalhammer, 2TB SATA drives?
01:55 criticalhammer 4TB
01:55 kiwikrisp Gathering Heal info on volume gvolnfs has been successful
01:56 kiwikrisp Brick glstr03:/export/brick00/nfs
01:56 kiwikrisp Number of entries: 1
01:56 kiwikrisp <gfid:f4611b56-5b34-4c80-a553-6e837dad5bd0>
01:56 kiwikrisp Brick glstr04:/export/brick00/nfs
01:56 kiwikrisp Number of entries: 1
01:56 kiwikrisp <gfid:f4611b56-5b34-4c80-a553-6e837dad5bd0>
01:56 criticalhammer 4TB sata drives
01:56 kiwikrisp No geo-replication.
01:56 criticalhammer 4 TB in a 8 disk RAID 6 setup
01:58 dbruhn So you are looking at 150-175MB/s per spindle for writes. x8 per brick for direct access to the drives = 1200MB/s of throughput
01:58 dbruhn I guess with the right network I could see you getting the kind of throughput you are looking for
01:58 dbruhn the more file ops you are looking at the worse its going to be
01:58 criticalhammer yeah i know
01:58 dbruhn maybe suggest renting some equipment for a short term test?
01:58 criticalhammer as for the network, hopefully jumbo frames work properly
01:59 criticalhammer i was thinking it
01:59 criticalhammer its a matter of time though
01:59 dbruhn I gather that
02:00 criticalhammer If I calculated things correctly, with the distributed IO across the servers it should be about the same
02:00 criticalhammer maybe a bit faster because of well distributed IO
02:00 dbruhn keep in mind gluster doesn't aggregate on sequentially written data like a raid
02:00 criticalhammer Unfortunatly im sharing the switch so there may be switch backplane saturation
02:01 criticalhammer i understand
02:01 dbruhn so the slowest component in your system is going to be your bottle neck, where that data lies
02:01 criticalhammer and thats one of the reasons why I want to keep the hardware the same
02:01 criticalhammer to reduce choke points
02:02 dbruhn I get that, my only thought was like a 15k SAS setup for your server access, and then SATA for the desktop access
02:02 criticalhammer is there a way to calculate disk speeds which include write through?
02:02 criticalhammer yeah we dont have the money for 15k SAS setups
02:02 dbruhn also, I would suggest upping the ram and the CPUs; I run duals in my stuff, the healing processes can soak your CPU
02:03 dbruhn if you can
02:03 criticalhammer i was thinking that as well
02:03 criticalhammer there is room to grow, so if we do run into problems I can add more hardware
02:03 criticalhammer ie each server has room to grow
02:04 criticalhammer also dbruhn how did you calculate the disk throughput?
02:05 dbruhn I took the rated speed of seagate 7200 rpm 4tb drives, and multiplied it by the number you are putting in a raid set
02:05 dbruhn not that that is accurate
02:05 dbruhn I usually try and half those speeds
02:06 dbruhn and just assumed the network throughput was available
02:06 dbruhn granted direct access to those resources is always going to be faster
02:06 dbruhn The san is going to be hard to beat for the speed of access
02:06 criticalhammer yeah
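
Aside: dbruhn's back-of-envelope arithmetic, spelled out; these are the rough figures quoted in the conversation, not measurements:

    # 8 spindles x ~150 MB/s rated sequential write = ~1200 MB/s per RAID set
    # halve the vendor ratings (dbruhn's rule of thumb) = ~600 MB/s realistic
    # 10GbE is ~1250 MB/s raw, minus ~20% TCP/IP overhead = ~1000 MB/s usable
    # the status-quo target quoted earlier was ~500 MB/s, so disks and network leave some headroom
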
02:07 criticalhammer But we have the ability to expand with less cost
02:07 criticalhammer if we go this route
02:08 dbruhn But I always try to build up from the lowest level, up. So I know a HD can give me x, make sure the next choke point can provide what I need, etc.
02:08 criticalhammer yeah
02:08 dbruhn What's your budget? Seems to be a main driver here.
02:08 criticalhammer Oh im under budget
02:09 criticalhammer but im working with a 70 year old scientist who is frugal as hell and hard to budge when it comes to money
02:09 criticalhammer so theres always room to cut costs
02:11 dbruhn Well of course, I was just going to suggest maybe doing something like grey market hardware with a warranty, so you can overshoot your hardware a bit, and still keep your costs down
02:12 criticalhammer yeah i've placed in front of them quotes from vendors that offer less per $ and they ended up choosing in house builds
02:13 criticalhammer welcome to the world of science
02:13 kiwikrisp joined #gluster
02:14 dbruhn haha
02:14 dbruhn Don't worry, I work in the world of cloud services... I know how stupid stuff can be sometimes
02:15 criticalhammer the saving grace is they know that if one of these nodes goes down, everything is potentially lost.
02:15 dbruhn a place where people think everything runs on the slowest crap hardware ever, yet produces the fastest everything
02:15 criticalhammer yeah
02:16 dbruhn I literally was working on a cloud stack system where the guy who wanted it said he watched a youtube video that showed him he could run his entire cloud on a laptop, he thought virtualization gave him limitless resources
02:16 criticalhammer lool
02:16 criticalhammer kind of like infinite energy
02:16 dbruhn right!
02:16 criticalhammer MAGNETS MAN, THEY WORK!
02:17 dbruhn The uphill battle was won, and now the guy is sitting with 60 servers in two racks, and I passed it off to his people to run.
02:18 dbruhn nice thing about working with scientists though.. you can show them numbers and explain the measurements and they will get it
02:18 criticalhammer sometimes
02:18 dbruhn they might make a shitty choice, but they understand it to a point
02:18 criticalhammer there are some scientists that get how things work in the world, and some that dont
02:19 criticalhammer the one thing though is, they love to pick things apart to get to the truth. THAT is very beneficial
02:19 dbruhn yep
02:20 dbruhn Well my suggestion, build from the spindles you choose to use, and make sure every component from there up supports your needs. Also hard drive manufacturers are liars, so cut their numbers in half
02:20 criticalhammer Thanks dbruhn
02:20 harish_ joined #gluster
02:20 dbruhn if you had your understanding down to IOPS I am way more help
02:21 criticalhammer yeah
02:21 dbruhn my environment is more accesses than throughput
02:21 criticalhammer fortunately there arnt that many users hitting this thing, even though they do hammer the SAN from time to time.
02:21 criticalhammer so this is not like a amazon SAN setup
02:22 criticalhammer unfortunately, its more unpredictable.
02:22 dbruhn Well if you know your peak usage that might be an easier measurement
02:23 criticalhammer Where would you do that kind of analysis?
02:23 dbruhn predictable is nice
02:24 dbruhn nice thing is you are used to working against a SAN though, so you haven't been sharing resources
02:24 dbruhn you can assume your worst offender is the max output a single brick needs to be able to serve
02:24 criticalhammer yeah
02:25 criticalhammer exactly
02:25 dbruhn large file ops shouldn't be bad with 10GB and enough spindles
02:26 dbruhn small file ops are costly
02:26 criticalhammer with hardware raid it should be able to handle it a lot better
02:26 criticalhammer yeah small files can be an issue
02:26 criticalhammer but thats almost universal with any storage setup
02:27 dbruhn yeah, that's where my hell is
02:27 dbruhn millions and millions of files
02:27 dbruhn endlessly piled up
02:27 criticalhammer you do web?
02:27 dbruhn nah
02:27 dbruhn online backup
02:27 criticalhammer eek
02:27 criticalhammer i used to be in that business, when i did contract work
02:27 dbruhn I am sitting about 40 million files to every 20tb right now
02:28 criticalhammer yeah
02:28 dbruhn I do consistency checks on every single file, and have to read the header and hash check them on interval
02:28 criticalhammer i know that all to well
02:28 dbruhn lol
02:28 JoeJulian dbruhn: Where do you store the hash?
02:28 chirino joined #gluster
02:28 dbruhn In a database, and in the header of the file
02:30 criticalhammer well thanks dbruhn for the input
02:30 dbruhn No problem, hope I was any help at all
02:30 criticalhammer you were
02:31 dbruhn i gotta run, apparently my date got off early
02:31 criticalhammer and I have to leave as well
02:31 criticalhammer take care
02:31 criticalhammer left #gluster
02:41 vshankar joined #gluster
02:47 bharata-rao joined #gluster
02:52 vshankar joined #gluster
02:53 saurabh joined #gluster
02:54 shubhendu joined #gluster
03:17 kshlm joined #gluster
03:19 PM1976 joined #gluster
03:19 PM1976 Hi Gluster community :)
03:20 PM1976 As you helped me well a few days ago with my geo-replication setup, I decided to come back to you for additional help ;)
03:20 PM1976 I have 2 questions: one regarding geo-replication and one regarding log rotation
03:21 PM1976 so first thing: is it possible to tell gluster to exclude a specific file extension (*.filepart) from geo-replication?
03:22 PM1976 second thing: I try to automate the log rotation using /etc/logrotate.d/glusterfsd
03:23 PM1976 but when the log rotates, the new log file is not growing anymore... it stays at 0 byte. I need to make a manual gluster volume log-rotate command to make it work. any suggestion?
03:23 satheesh joined #gluster
03:26 _BryanHm_ joined #gluster
03:29 dbruhn joined #gluster
03:29 dbruhn left #gluster
03:39 sgowda joined #gluster
03:40 vpshastry joined #gluster
03:40 vpshastry left #gluster
03:47 harish_ joined #gluster
03:54 glusterbot New news from newglusterbugs: [Bug 1032859] readlink returns EINVAL <http://goo.gl/ojGr7c>
04:03 msolo joined #gluster
04:18 dkorzhevin joined #gluster
04:22 vshankar joined #gluster
04:26 Guest62660 joined #gluster
04:27 asias joined #gluster
04:27 mattapp__ joined #gluster
04:30 Guest62660 many users signed in, but the channel is very quiet
04:32 JoeJulian Guest62660: That usually happens outside of US business hours. There just doesn't seem to be as many eastern hemisphere folks that stick around and help after getting their systems up and running.
04:33 Guest62660 I am in western europe, it is quite early here
04:33 JoeJulian PM1976: As for log rotation, most of us prefer to use copytruncate. It seems to work more reliably and is simpler. Otherwise, a SIGHUP will start writing to your new log file.
04:37 17SADM24Y joined #gluster
04:37 23LAAKM4D joined #gluster
04:37 hagarth1 Guest62660: are you looking for pk?
04:38 Guest62660 I am lookup for pk
04:39 Guest62660 er I meant looking
04:39 hagarth Guest62660: lookup of pk is yielding ENOENT :)
04:39 hagarth Guest62660: pk should be online in an hour or so
04:40 MiteshShah joined #gluster
04:40 Guest62660 I will be afk at that time, but I will be back in 3 hours. Hope we can find each other
04:41 hagarth Guest62660: yeah, pk will be around then.
04:41 JoeJulian PM1976: I bet it would work to add your exclude on the ssh_command configuration in /var/lib/glusterd/geo-replication/gsyncd.conf
04:41 raghug joined #gluster
04:41 Guest62660 thanks, see you later then
04:41 JoeJulian I thought pk was ENOTCONN
04:42 psharma joined #gluster
04:42 hagarth JoeJulian: unfortunately, pk is not HA.
04:42 JoeJulian hehe
04:42 hagarth Guest62660: see you later
04:49 lalatenduM joined #gluster
04:52 ndarshan joined #gluster
04:59 CheRi joined #gluster
05:00 dusmant joined #gluster
05:00 kanagaraj joined #gluster
05:01 vpshastry joined #gluster
05:02 bala joined #gluster
05:05 PM1976 Hi JoeJulian. thanks for your feedback. for the logs, how do I write the copytruncate? can you provide me an example? for the exclusion of the file extension, I bet also it is in that file, but when I search in gsyncd.py, I didn't find anything yet. If you know what I need to add in the gsyncd.conf file, I will be glad to know it ;)
05:08 JoeJulian http://paste.fedoraproject.org/55610/01049313
05:08 glusterbot Title: #55610 Fedora Project Pastebin (at paste.fedoraproject.org)
05:10 JoeJulian PM1976: gluster geo-sync uses rsync so I would try adding a standard rsync exclusion option on to that variable. Theoretically you could even use the option to reference an exclusion file which would be easier to manage.
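
Aside: the plain rsync syntax behind JoeJulian's suggestion; exactly where it gets wired into gsyncd.conf is his guess above, and the exclusion-file path below is made up for illustration:

    rsync -a --exclude='*.filepart' /source/ user@slave:/dest/
    # or keep the patterns in a file:
    rsync -a --exclude-from=/etc/glusterfs/geo-rep-excludes /source/ user@slave:/dest/
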
05:13 Guest33767 left #gluster
05:18 PM1976 JoeJulian: thank you, I will try that and let you know :)
05:20 ppai joined #gluster
05:20 pk joined #gluster
05:22 aravindavk joined #gluster
05:22 hateya_ joined #gluster
05:25 bala joined #gluster
05:30 ababu joined #gluster
05:35 raghug joined #gluster
05:35 mohankumar joined #gluster
05:43 shylesh joined #gluster
05:47 shruti joined #gluster
05:51 bulde joined #gluster
05:54 stickyboy I brought down a node in a replica yesterday, then after bringing it back up it started healing, but I'm not quite sure about the output.
05:54 stickyboy I'm getting a few hundred lines like this in my `heal info`: <gfid:602d99cc-65b0-47eb-bffd-9ed982177c2e>
05:55 vshankar joined #gluster
06:01 meghanam joined #gluster
06:17 asias joined #gluster
06:25 satheesh1 joined #gluster
06:32 rastar joined #gluster
06:34 satheesh2 joined #gluster
06:34 lalatenduM_ joined #gluster
06:44 hagarth joined #gluster
06:46 jtux joined #gluster
06:47 raghu joined #gluster
06:48 satheesh1 joined #gluster
06:52 ricky-ticky joined #gluster
06:53 harish_ joined #gluster
06:59 elyograg is the entire pathname considered when choosing which replica set receives the file, or just the end filename?
07:08 DV__ joined #gluster
07:19 hagarth elyograg: it is based on the end filename
07:23 PM1976 JoeJulian: I tried the truncate and it didn't work. When the logs rotate, they stay with a size of 0 KB. also, for the "rsync --exclude", you think it should be placed at the end of the ssh command in the gsync.conf file?
07:26 DV__ joined #gluster
07:26 JoeJulian PM1976: The copytruncate does work. Check a process whose log you're working on with lsof and see which log it's writing to. My guess is that you've previously rotated the log so it wasn't writing to the one you truncated in the first place.
07:27 JoeJulian PM1976: I bet it would work to add your exclude on the ssh_command configuration in /var/lib/glusterd/geo-replication/gsyncd.conf
07:27 cyberbootje joined #gluster
07:27 JoeJulian PM1976: Understand, that's only a guess.
07:28 JoeJulian The logrotate setting is a certainty though. Everyone I know does it that way.
07:28 PM1976 JoeJulian: so for the log, is something like this ok:/var/log/glusterfs/*glusterfsd.log /var/log/glusterfs/bricks/*.log {
07:28 PM1976 daily
07:28 PM1976 copytruncate
07:28 PM1976 rotate 7
07:28 PM1976 delaycompress
07:28 PM1976 compress
07:28 PM1976 missingok
07:28 PM1976 endscript
07:28 PM1976 }
07:28 JoeJulian use a pastebin
07:30 JoeJulian No need for endscript
07:31 JoeJulian I'm not really sure if you actually want missingok. I think I'd be worried if there were no brick logs and would want to know about it.
07:32 PM1976 ok. I will remove that and retry it
07:32 JoeJulian Also no need for delaycompress
07:32 JoeJulian killall -HUP glusterfsd
07:32 JoeJulian I bet you're still writing to some old logfile
07:33 PM1976 I set this with a postrotate?
07:33 JoeJulian No, just do it by hand now
07:33 JoeJulian Once you've done it, there's no need for a postrotate if you use copytruncate
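
Aside: PM1976's stanza with JoeJulian's suggestions applied (endscript, missingok and delaycompress dropped, no postrotate needed) -- a sketch, not an official config:

    /var/log/glusterfs/*glusterfsd.log /var/log/glusterfs/bricks/*.log {
        daily
        rotate 7
        copytruncate
        compress
    }
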
07:42 pk left #gluster
07:48 franklu joined #gluster
07:52 franklu I saw something strange when using glusterfs: when I run ls to list the files, I see the size of some files is zero.
07:53 franklu [root@10 f7]# ls -la 5868e4351f03af4d21833ef01ab64d3c.f4v
07:53 franklu -rwxrwxrwx 1 root root 0 Nov 21 11:17 5868e4351f03af4d21833ef01ab64d3c.f4v
07:54 ctria joined #gluster
07:54 ngoswami joined #gluster
07:54 franklu while in the brick servers, I saw:  for i in `cat node_list`;do echo $i; ssh -lroot $i  /tmp/fix_splitbrain.sh  /mnt/xfsd/upload  /bj/operation/video/2013/11/21/f7/5868e4351f03af4d21833ef01ab64d3c.f4v;done
07:54 franklu 10.10.135.23
07:54 franklu 10.10.135.24
07:54 franklu trusted.glusterfs.dht.linkto="upload-replicate-1"
07:54 franklu ---------T 2 root root 0 Nov 21 11:17 /mnt/xfsd/upload/bj/operation/video/2013/11/21/f7/5868e4351f03af4d21833ef01ab64d3c.f4v
07:54 franklu ---------T 2 root root 0 Nov 21 11:17 /mnt/xfsd/upload/.glusterfs/21/4b/214ba416-10da-4870-98b9-31fcec47344c
07:54 franklu 10.10.135.25
07:54 franklu -rw-r--r-- 2 root root 47972489 Nov 21 11:17 /mnt/xfsd/upload/bj/operation/video/2013/11/21/f7/5868e4351f03af4d21833ef01ab64d3c.f4v
07:54 franklu -rw-r--r-- 2 root root 47972489 Nov 21 11:17 /mnt/xfsd/upload/.glusterfs/21/4b/214ba416-10da-4870-98b9-31fcec47344c
07:54 franklu 10.10.135.26
07:54 franklu 214ba41610da487098b931fcec47344c
07:54 franklu 5868e4351f03af4d21833ef01ab64d3c  /mnt/xfsd/upload/bj/operation/video/2013/11/21/f7/5868e4351f03af4d21833ef01ab64d3c.f4v
07:54 franklu -rw-r--r-- 2 root root 47972489 Nov 21 11:17 /mnt/xfsd/upload/bj/operation/video/2013/11/21/f7/5868e4351f03af4d21833ef01ab64d3c.f4v
07:54 franklu -rw-r--r-- 2 root root 47972489 Nov 21 11:17 /mnt/xfsd/upload/.glusterfs/21/4b/214ba416-10da-4870-98b9-31fcec47344c
07:54 franklu 10.10.135.27
07:54 franklu 10.10.135.28
07:54 JoeJulian @kick franklu use a pastebin
07:54 franklu was kicked by glusterbot: use a pastebin
07:54 glusterbot New news from newglusterbugs: [Bug 1032894] spurious ENOENTs when using libgfapi <http://goo.gl/x7C8qJ>
07:55 franklu joined #gluster
07:55 JoeJulian @sticky
07:55 JoeJulian darn...
07:56 JoeJulian franklu: Those are linkfiles. They're used when a file isn't where the filename hash would predict it should be. This can be caused by a number of things, but is most often caused by files being renamed.
07:57 JoeJulian franklu: This article will tell you a little about how that works: http://joejulian.name/blog/dht-misses-are-expensive/
07:57 glusterbot <http://goo.gl/A3mCk> (at joejulian.name)
07:58 JoeJulian franklu: And please use a pastebin like fpaste.org rather than pasting things into an IRC channel.
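
Aside: the linkfiles JoeJulian describes are the mode ---------T entries in franklu's paste; their trusted.glusterfs.dht.linkto xattr names the subvolume that actually holds the data, and it can be read directly on a brick, e.g.:

    getfattr -n trusted.glusterfs.dht.linkto -e text /mnt/xfsd/upload/bj/operation/video/2013/11/21/f7/5868e4351f03af4d21833ef01ab64d3c.f4v
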
07:59 manuatwork joined #gluster
08:00 manuatwork looking for pk
08:00 stickyboy JoeJulian: Did you mean to talk to me? (@sticky?)
08:01 d-fence joined #gluster
08:01 JoeJulian hehe, no... I thought I had a factoid on sticky pointers.
08:01 DV__ joined #gluster
08:02 stickyboy JoeJulian: Ah. :)
08:03 stickyboy JoeJulian: What does it mean when you see gfid's in `heal info`?  ie, a few hundred of these:  <gfid:602d99cc-65b0-47eb-bffd-9ed982177c2e>
08:05 JoeJulian It means that the self-heal hash doesn't (yet) know which filenames those gfids are hardlinked to. In theory, that shouldn't matter. If the self-heal completes on the hardlinked gfid files, the self heal will be done on the by-name files as well.
08:06 JoeJulian You can stat the gfid file (in your example .glusterfs/60/2d/602d99cc-65b0-47eb-bffd-9ed982177c2e) and look at the links. If there's only one, you can probably just delete that gfid file. There should be 2 or more.
08:07 JoeJulian @gfid
08:07 glusterbot JoeJulian: The gfid is a uuid that's assigned to represent a unique inode that can be identical across replicas. It's stored in extended attributes and used in the .glusterfs tree. See http://goo.gl/Bf9Er and http://goo.gl/j981n
08:07 stickyboy JoeJulian: Ok, 2 (or more) depending on the replica count.
08:07 JoeJulian 2 or more depending on whether you've hardlinked other names to that inode.
08:08 JoeJulian I thought semiosis had a factoid for that...
08:09 JoeJulian anyway, you can find the filename for that gfid by taking the inode number you got in that stat and doing a "find $brick -inum {inode number}"
08:09 stickyboy Ah, ok.
08:09 stickyboy JoeJulian: Yah, semiosis has a `resolve-gfid` script I saw yesterday
08:09 JoeJulian @resolve
08:09 glusterbot JoeJulian: I do not know about 'resolve', but I do know about these similar topics: 'gfid resolver'
08:09 JoeJulian @gfid resolver
08:09 glusterbot JoeJulian: https://gist.github.com/4392640
08:09 JoeJulian Yeah, that's the ticket...
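
Aside: JoeJulian's two steps as a single sketch; the brick path is a placeholder and the gfid is the example from above:

    GFID=602d99cc-65b0-47eb-bffd-9ed982177c2e
    BRICK=/path/to/brick
    GFILE=$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID
    stat "$GFILE"                                    # check the "Links:" count
    find "$BRICK" -inum "$(stat -c %i "$GFILE")" -not -path '*/.glusterfs/*'
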
08:10 stickyboy So I'm in some situation where I'm healing but not split brain.  And healing seems more like "healing".
08:11 JoeJulian I've found that doing a "heal...full" seems to resolve those to filenames pretty quickly in my volumes.
08:12 stickyboy JoeJulian: Nice
08:13 stickyboy I figured I'd let it sort it out overnight, but it hasn't done it yet.  So I'm digging a bit deeper
08:14 eseyman joined #gluster
08:14 ngoswami joined #gluster
08:15 shri joined #gluster
08:16 shri hagarth: ping.. you there
08:16 shri hagarth: did you got any success on devstack ?
08:19 hagarth shri: still have not got to do that, did  you try packstack?
08:20 stickyboy JoeJulian: Ah, I see in the stat for that particular file:   Links: 1
08:20 stickyboy :)
08:21 stickyboy And a control which wasn't listed in `heal info` shows Links: 2.
08:22 JoeJulian If I were in a good bug hunting mode, I would ask you to file a bug with the stats from that gfid file on any brick that it's on, along with the "getfattr -m . -d -e hex $gfidfile" as well.
08:22 glusterbot http://goo.gl/UUuCq
08:22 JoeJulian also from any brick that it's on.
08:23 JoeJulian Those aren't getting healed and it appears that's because the file was deleted.
08:23 shri hagarth: on packstack I'm getting that mysql error while.. do you know any work around on Fedora19
08:23 shri hagarth: error
08:23 shri ERROR: Error during puppet run: Error: mysqladmin -u root password '234234dgf234' returned 1 instead of one of [0]
08:23 stickyboy JoeJulian: Yah, I was thinking maybe the file was deleted from the replica while the one node was down.
08:23 keytab joined #gluster
08:24 shri mysql+openstack won't work on F19 .. same issue with devstack there I use postgresql ..
08:24 shri hagarth: do you know any work around for mysql ?
08:24 JoeJulian stickyboy: Which is something that the .glusterfs tree was supposed to be able to track and heal.
08:24 shri on packstack
08:25 JoeJulian shri: I'm using MariaDB on gluster on F19 with no problems, fwiw.
08:26 satheesh1 joined #gluster
08:26 pk joined #gluster
08:26 manuatwork hi pk
08:26 pk manuatwork: hi
08:26 shri JoeJulian: yeah I also did changes in those 2/3 files replacing mysql-server with mariadb-server but at the end when puppet verifies it, it throws an error
08:27 shri JoeJulian: can you help me .. which files I need to change for mariadb-server
08:27 manuatwork pk: where do we start?
08:28 JoeJulian shri: If this is using the puppetlabs-mysql modules, I override the package using hiera.
08:28 stickyboy JoeJulian: It seems that particular gfid I pasted earlier is a directory.  Doesn't seem likely that it was deleted.
08:28 pk manuatwork: I was messaging you in other window...
08:29 shri JoeJulian: I replaced mysql-server string with mariadb-server in below 3 files
08:29 shri /usr/lib/python2.7/site-packages/packstack/puppet/modules/mysql/manifests/server.pp  ,
08:29 shri /usr/lib/python2.7/site-packages/packstack/puppet/modules/mysql/manifests/server/config.pp  ,
08:29 shri /usr/lib/python2.7/site-packages/packstack/puppet/modules/mysql/manifests/params.pp
08:29 shri and change all entries of mysql-server to mariadb-server
08:29 JoeJulian http://fpaste.org/55631/38502255/
08:29 glusterbot Title: #55631 Fedora Project Pastebin (at fpaste.org)
08:29 shri JoeJulian: at the end that mysql.pp file throw error while verifying
08:30 shri JoeJulian: got it I only changed mysql-server and NOT client .. I will do that
08:30 shri JoeJulian: let me try that
08:30 JoeJulian shri: that's all I had to do for overriding it in my hiera yaml data. I had to do it that way because I use puppetlabs-openstack which didn't give me a way to configure those.
08:31 hagarth shri: this thread also might help - http://openstack.redhat.com/forum/discussion/comment/921
08:31 glusterbot <http://goo.gl/nQinFG> (at openstack.redhat.com)
08:31 JoeJulian I need to look at the puppet modules in packstack. I'm not sure whose they are.
08:31 klaxa|work joined #gluster
08:31 JoeJulian who's... whose? man... I must be getting tired...
08:32 hagarth JoeJulian: time to crash probably :)
08:33 JoeJulian Ah for the good old days when I could code for a week without sleep...
08:33 shri JoeJulian: Many Thanks ! let me try this !!
08:36 mohankumar joined #gluster
08:38 geewiz joined #gluster
08:47 andreask joined #gluster
08:48 Guest5627 joined #gluster
08:49 stickyboy Seeing a lot of these in the brick log: E [posix.c:2668:posix_getxattr] 0-homes-posix: getxattr failed on /mnt/gfs/wingu0/sda1/homes/.glusterfs/a4/47/a447ace7-d56a-4d34-852e-a6c38ed837f1: system.posix_acl_access (No data available)
08:49 stickyboy http://review.gluster.org/#/c/6084/  <-- this error appears to be not serious, according to this commit
08:49 glusterbot Title: Gerrit Code Review (at review.gluster.org)
08:50 stickyboy "storage/posix: Lower log severity for ENODATA errors in getxattr()"
08:50 stickyboy So I won't worry about those errors
08:51 dusmant joined #gluster
08:54 hagarth stickyboy: yes, that can be ignored
08:55 stickyboy hagarth: Thanks. :P
08:56 franc joined #gluster
08:56 stickyboy I thought I had maybe set the diagnostics level to ERROR, but it's not.
08:57 stickyboy Or, I set it, then reset it to the default.
09:03 rastar joined #gluster
09:09 RameshN joined #gluster
09:10 getup- joined #gluster
09:11 shri joined #gluster
09:13 _pol joined #gluster
09:14 bgpepi joined #gluster
09:16 shri_ joined #gluster
09:26 dusmant joined #gluster
09:27 stickyboy Ok, so I've looked at a handful of the gfid's in `heal info` and they all seem to be directories.  stat'ing them on each brick (x2) reveals that they have different access/modify times.
09:30 mohankumar joined #gluster
09:36 dneary joined #gluster
09:36 kanagaraj joined #gluster
09:37 franklu who could help troubleshoot the issue where the client sees a file with size 0 and mode 777?
09:38 franklu we have a file in glusterfs that is ~400MB, but when we rename it, the file shows as zero!
09:40 pk franklu: wow! what is the output of "gluster volume info"?
09:44 bma joined #gluster
09:44 stickyboy pk: I guess you're Pranith, who just responded to me about healing on the mailing list?
09:44 bma good morning everyone...
09:45 pk stickyboy: you are Alan?
09:45 vpshastry1 joined #gluster
09:46 bma i have a question about gluster-swift... in particular, the call to list the container. I have a pseudo-folder like this: container/dir1/dir2/object
09:46 stickyboy pk: Yes, I'm Alan. :P
09:46 bma when i list, i get: [dir1, dir1/dir2, dir1/dir2/object]
09:47 bma is there a way to only retrieve [dir1/dir2/object] ??
09:47 stickyboy pk: I'm whipping up a one-liner to do the `getfattr` and `stat` from the output of `gluster volume heal .. info`... unless you have a better way?
09:47 cyberbootje joined #gluster
09:49 pk stickyboy: no better way sir
09:49 pk stickyboy: hey, but if the files listed are gfids the script will not work easily, will it?
09:50 stickyboy pk: Yeah, that's what I was thinking.  Some string manipulation to get the paths... hehe.
09:51 satheesh joined #gluster
09:51 pk stickyboy: ok cool.
09:51 franklu gluster volume info upload
09:51 franklu Volume Name: upload
09:51 franklu Type: Distributed-Replicate
09:51 franklu Volume ID: 6220fd5f-635c-44fb-a627-55dc796d5d1f
09:51 franklu Status: Started
09:51 franklu Number of Bricks: 3 x 2 = 6
09:51 franklu Transport-type: tcp
09:51 franklu Bricks:
09:51 franklu Brick1: 10.10.135.23:/mnt/xfsd/upload
09:51 franklu Brick2: 10.10.135.24:/mnt/xfsd/upload
09:51 franklu Brick3: 10.10.135.25:/mnt/xfsd/upload
09:51 franklu Brick4: 10.10.135.26:/mnt/xfsd/upload
09:51 franklu Brick5: 10.10.135.27:/mnt/xfsd/upload
09:51 franklu Brick6: 10.10.135.28:/mnt/xfsd/upload
09:52 franklu I found that brick1's system disk is very slow, so I took it offline, but the problem still appears
09:53 PM1976 JoeJulian: now the logs rotation seems to work fine. Thank you. I will keep monitoring it a few days to ensure it works fine. for the rsync command to exclude fiels from the geo-replication, I will check more to add it properly
09:55 glusterbot New news from newglusterbugs: [Bug 1028281] Create Volume: No systems root partition error when script mode <http://goo.gl/iZbUIP>
09:56 franklu another common case is: the file is renamed, but the link files should be one on 10.10.135.23 while another should be on 10.10.135.24
10:05 social Hi, I have a crashed glusterfs node and I need to replace it with a node having the same DNS name; it's a replicated setup so it should not be that hard, but what's the procedure for this?
10:09 stickyboy pk: Hope you didn't go to sleep yet. :P
10:13 pk it is just 3:42 PM long time for that.
10:13 rastar joined #gluster
10:14 stickyboy pk: GMT+3 here :)
10:15 dneary joined #gluster
10:18 pk stickyboy: quick question can you check the output of "ls <brick-dir>/.glusterfs/indices/xattrop | wc -l" output on both the bricks?
10:18 pk stickyboy: If they keep reducing over time we are good.
10:18 pk stickyboy: That directory contains potential files which may need self-heal
10:18 stickyboy pk: Ok, never knew it was there... lemme look.
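
Aside: one way to watch the count pk is asking about; the brick path is a placeholder and should be checked on each replica:

    watch -n 60 "ls /bricks/brick1/.glusterfs/indices/xattrop | wc -l"
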
10:20 stickyboy pk: One replica says
10:20 stickyboy 127 and the other says 128
10:21 pk stickyboy: what was the number of entries that needed healing?
10:22 stickyboy 126
10:22 stickyboy Though it keeps changing every few minutes.
10:23 social JoeJulian: ping
10:23 pk stickyboy: are we getting any Input/Output errors on the mount?
10:24 stickyboy pk: In dmesg?
10:25 pk stickyboy: no, on the mount
10:25 stickyboy pk: brick log?
10:25 pk stickyboy: mount, log
10:26 stickyboy I don't see any IO errors, no.
10:26 stickyboy I'll have the getfattr / stat in a sec
10:30 pk stickyboy: If there are any split-brains we would see I/O errors on the mount when those files are accessed. so either there are no split-brains or no one tries to access the split-brain files.
10:30 stickyboy pk: Well the `info split-brain` shows nothing :\
10:30 stickyboy I guess that's a good thing.
10:31 stickyboy But looking at stat on a few files on each replica I see the times are slightly different for those directories.
10:31 pk stickyboy: Lets see the extended attributes, until then what ever we discuss will just be guesswork :-)
10:32 stickyboy Ok
10:32 stickyboy I'm still writing my script to parse / loop / getfattr
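
Aside: one possible shape for such a loop, assuming a brick path and volume name (both placeholders here); gfids are pulled out of the heal info output and resolved under .glusterfs:

    BRICK=/path/to/brick
    gluster volume heal myvol info | grep -oP '(?<=<gfid:)[0-9a-f-]+(?=>)' | sort -u |
    while read g; do
        f="$BRICK/.glusterfs/${g:0:2}/${g:2:2}/$g"
        stat "$f"
        getfattr -m . -d -e hex "$f"
    done
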
10:35 ricky-ticky joined #gluster
10:37 pk stickyboy: It is more than 10 minutes, can we see the outputs of "ls <brick-dir>/.glusterfs/indices/xattrop | wc -l" again?, every 10 minutes self-heal-daemon attempts self-heal
10:38 stickyboy pk: Ok
10:39 stickyboy pk: Still 128 and 127.  `heal info` still lists 126.
10:39 stickyboy And this is replica 2, btw.
10:42 pk stickyboy: just select one file and give me getxattr ouputs? so that I can see what happened at least to one file?
10:42 stickyboy pk: I have the output for all of them now if you reallllluy want it ;)
10:42 stickyboy But lemme get you one. :)
10:43 stickyboy https://gist.github.com/alanorth/7d915638a2c26b656978    <---- one file, on both hosts.
10:43 glusterbot <http://goo.gl/pQjwxc> (at gist.github.com)
10:44 pk getfattr output?
10:44 stickyboy Oh, oops.
10:44 pk the link contains only 'stat' output
10:48 stickyboy pk: Server0  https://gist.github.com/alanorth/b0c2b67e0c674ebcf11b/raw/cbb20bd041be33b53b8b3c1157bf5f8cd33d7463/gistfile1.txt
10:48 stickyboy pk: Server1 https://gist.github.com/alanorth/7452b94c0d73cc9b4018/raw/e4bc31cbec745daedad6a5e852f2b5fa02d4dd65/gistfile1.txt
10:48 glusterbot <http://goo.gl/iEyBTP> (at gist.github.com)
10:48 glusterbot <http://goo.gl/sbzuJ1> (at gist.github.com)
10:48 stickylag joined #gluster
10:52 pk I don't see any split-brains according to the outputs you gave me
10:53 pk stickyboy: Are the files that you see in the heal info always same set of files? or do they keep changing?
10:58 PM1976 JoeJulian: I modified the gsync.py file by adding the rsync --exclude option with the extension I wanted to exclude and it seems to work properly. Thanks for the help
10:59 pk stickyboy: u there?
11:02 bma is there any python client to glusterfs?
11:02 bma for gluster-swift
11:05 stickyboy pk: Yeah, I'm here.  Sorry.
11:05 hybrid5121 joined #gluster
11:05 stickyboy pk: `heal info` lists mostly gfid's.  Every once in awhile I see a file or two.
11:09 stickyboy pk: Right now I see 10 file paths in the `heal info`.  30 minutes ago there were only gfid's. :)
11:16 TvL2386 joined #gluster
11:18 TvL2386 hi guys, been messing with gluster on ubuntu12.04. I've got the standard 3.2.5 version. I've mounted the volume under clients and been overwriting a file from both clients (while true ; do date > file ; done)
11:18 pk stickyboy: Seems like everything is fine then
11:20 stickyboy pk: Oh really?  I think there are a certain number of gfid's which are always present.
11:21 pk stickyboy: probably they are always being written to .... ? Do you think you may have such files?
11:21 TvL2386 in the mean while I'm catting the file in a while loop, which goes great... Btw, I have a replicated volume between 2 servers... While overwriting the file, I've done 'ifconfig eth0 down' on one server and gave it a reboot 1 minute later... I now have a split-brain situation :)
11:21 stickyboy I kinda believe you that they could be fine actually.  But it's weird to me that `heal info` never ends.
11:22 stickyboy pk: I can check a few of the files to see if they're in use.
11:22 stickyboy pk: btw, when it first started healing yesterday there were a few hundred files.  Then gradually it was only left with gfid's.  Now it's been 24 hours and it's still mostly gfid's, but some files every once in awhile.
11:23 stickyboy TvL2386: Now that sounds like fun. :)
11:25 glusterbot New news from newglusterbugs: [Bug 998967] gluster 3.4.0 ACL returning different results with entity-timeout=0 and without <http://goo.gl/B2gFno>
11:29 TvL2386 yeah it is!
11:31 pk stickyboy: Yeah, self-heal info causing panic to users is something we are trying to fix. We are thinking that once a file is identified to be in need of self-heal probably we should put it in a different bucket from the one which says file is going through normal I/O
11:37 wica joined #gluster
11:41 stickylag joined #gluster
11:41 stickylag The connection between me and my BNC keeps dropping...
11:42 stickylag pk: Ah, good you guys are noticing that; it's probably one of the most common questions in here and on the mailing list. :)
11:46 stickylag pk: Question: should the modification times for gfid's listed in `heal info` be *basically* the same or *exactly* the same?
11:46 stickylag Modify: 2013-11-20 15:44:41.467850524 +0300
11:46 stickylag Modify: 2013-11-20 15:44:41.486508909 +0300
11:46 stickylag It's pretty damn close... but... not exact.
11:48 shylesh joined #gluster
11:48 sticky_afk joined #gluster
11:48 stickyboy joined #gluster
11:50 raghug joined #gluster
11:51 ipvelez joined #gluster
11:51 stickylag Hmm, I picked a random file and it seems the times can differ slightly (by microseconds).  Which seems reasonable.
11:53 gdubreui joined #gluster
11:53 pk stickylag: it depends on whether your cluster machines have same time.
11:55 jiphex_ joined #gluster
11:55 jiphex joined #gluster
11:57 kkeithley1 joined #gluster
11:58 stickylag pk: They have the same time.  But this looks like fractions of microseconds. Even on files which aren't listed in the `heal info` :)
11:59 jiphex If under 3.2 a replace-brick is causing too much I/O load to be viable, would it be possible to rsync the data first from a replica to speed things up a bit?
12:05 stickylag pk: Looking at `smbstatus` I see someone has a share open with one of these directories that I've been seeing the gfid of in `heal info`.
12:05 stickylag pk: So I guess `heal info` will clear up once that file is released.
12:06 stickylag Kinda makes sense actually...
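A side note for readers trying the same gfid-to-path mapping: on a brick, each gfid from heal info has an entry under .glusterfs, pathed by its first two characters, then the next two, then the full gfid. For regular files that entry is a hardlink and for directories it is a symlink, so a sketch like the following (brick path and gfid are placeholders, not from this conversation) can resolve a gfid back to a real path:
    BRICK=/data/myvol/a/brick                      # placeholder brick path
    GFID=01234567-89ab-cdef-0123-456789abcdef      # placeholder gfid from heal info
    # regular file: the .glusterfs entry is a hardlink, so -samefile finds the real path
    find "$BRICK" -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" -not -path "*/.glusterfs/*"
    # directory: the .glusterfs entry is a symlink back into the brick tree
    readlink "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"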
12:08 TvL2386 how do you actually create a split-brain situation on glusterfs 3.2.5?
12:11 stickylag TvL2386: `poweroff` in the middle of a write didn't do it? :P
12:14 bma is there a way to delete a container with objects inside?
12:14 bma i tried but it returns HTTP status code 409
12:16 getup- joined #gluster
12:18 CheRi joined #gluster
12:18 TvL2386 stickylag, I think it was my 'ifconfig eth0 down' in the middle of a write
12:18 ppai joined #gluster
12:20 harish_ joined #gluster
12:20 TvL2386 currently I have a single nfs server with ~1TB of data on it. About 10 clients have the share mounted and do things on it. Now I want to also build this same setup in a DR location and the nfs server should be an active/active setup. That's why I'm looking at glusterfs
12:20 pk stickylag: Leaving for today, nice working with ya. bbye
12:20 pk left #gluster
12:20 TvL2386 so I want to see how it works, what can go wrong, when things go wrong and how to fix it
12:22 sticky_afk joined #gluster
12:25 abyss^ Hi, can I migrate the first gluster server? I know I can migrate with: gluster volume replace-brick server2:/data server3:/data but can I migrate server1 (the first server of the cluster) in the same way?
12:28 stwange joined #gluster
12:29 stwange is there any way to change a 2-node distributed to be replicated?
12:35 rcheleguini joined #gluster
12:46 hagarth joined #gluster
13:00 getup- joined #gluster
13:06 haritsu joined #gluster
13:10 calum_ joined #gluster
13:11 B21956 joined #gluster
13:14 dusmant joined #gluster
13:19 bala joined #gluster
13:22 bennyturns joined #gluster
13:23 haritsu joined #gluster
13:27 lpabon joined #gluster
13:29 ira joined #gluster
13:30 vpshastry joined #gluster
13:31 vpshastry left #gluster
13:39 CheRi joined #gluster
13:43 TvL2386 joined #gluster
13:46 ppai joined #gluster
13:46 mattappe_ joined #gluster
13:53 badone joined #gluster
13:54 hybrid512 joined #gluster
13:56 sroy_ joined #gluster
13:56 davidbierce joined #gluster
13:57 andreask joined #gluster
13:59 johnmark hey semiosis - check this out - http://www.phpbuilder.com/articles/application-architecture/optimization/clustered-file-systems-and-php.html
13:59 glusterbot <http://goo.gl/Iz7L68> (at www.phpbuilder.com)
13:59 * johnmark is curious if we need to contact the author to discuss other "gotchas" to watch out for wrt PHP
14:01 stwange I have a two-node distributed setup. I want to move it to a two-node replicated setup. There's enough space on the servers to store all the data twice, so if I create a new two-node replicated volume, is there a way to have gluster replicate the old volume onto it?
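For readers wondering the same thing: one conservative approach is to build a second, replicated volume on the same pair of servers and copy data through the mounts rather than touching bricks directly. Everything below is a sketch with placeholder names, not stwange's actual layout:
    gluster volume create newrepl replica 2 server1:/data/newrepl/brick server2:/data/newrepl/brick
    gluster volume start newrepl
    mkdir -p /mnt/old /mnt/new
    mount -t glusterfs server1:/olddist /mnt/old
    mount -t glusterfs server1:/newrepl /mnt/new
    rsync -a /mnt/old/ /mnt/new/    # repeat the rsync, then cut clients over during a quiet window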
14:03 bugs_ joined #gluster
14:12 getup- joined #gluster
14:13 bma is there a way to delete a container with objects inside? i tried but it returns HTTP status code 409...
14:15 haritsu joined #gluster
14:20 raghug joined #gluster
14:23 andreask joined #gluster
14:24 haritsu_ joined #gluster
14:28 tqrst is there any potential harm in triggering a rebalance while a full heal is running?
14:28 tqrst other than cpu/memory load, that is
14:29 haritsu joined #gluster
14:30 badone joined #gluster
14:33 haritsu joined #gluster
14:41 raghug joined #gluster
14:43 failshell joined #gluster
14:56 glusterbot New news from newglusterbugs: [Bug 1033093] listFiles implementation doesn't return files in alphabetical order. <http://goo.gl/iIw5ci>
15:00 kaptk2 joined #gluster
15:04 neofob joined #gluster
15:04 hagarth joined #gluster
15:05 calum__ joined #gluster
15:09 dbruhn joined #gluster
15:13 raghug joined #gluster
15:13 jbrooks joined #gluster
15:13 zerick joined #gluster
15:19 wushudoin joined #gluster
15:24 LoudNoises joined #gluster
15:26 glusterbot New news from newglusterbugs: [Bug 1033115] Build only passes if run as root, dues to a unit test which changes permissions for files to root group. <http://goo.gl/qDet4v>
15:29 geewiz joined #gluster
15:35 badone joined #gluster
15:42 coredump_ joined #gluster
15:43 coredump_ Hello guys. In Openstack + Gluster environments, do people usually run the gluster nodes on the same machines as the openstack nodes? Like, the same machines serve gluster and compute/network?
15:43 coredump_ or people usually have 2 different clusters.
15:44 _BryanHm_ joined #gluster
15:45 ndk joined #gluster
15:46 stwange sitting in here for a day is like listening to people praying
15:46 hagarth coredump_: there are both models that I have seen
15:47 coredump_ any major drawback on running them together?
15:49 hagarth coredump_: operational issues like upgrades will get more complex
15:49 Broken|Arrow joined #gluster
15:50 Broken|Arrow Hi everyone..
15:50 Broken|Arrow I have a problem with gluster 3.2.6-1 on ubuntu 12.04. 64 bit.
15:51 kkeithley1 I suggest you start by updating to a more recent release from ,,(ppa)
15:51 glusterbot The official glusterfs packages for Ubuntu are available here: 3.3 stable: http://goo.gl/7ZTNY -- 3.4 stable: http://goo.gl/u33hy
15:52 johnmark stwange: ??
15:52 Broken|Arrow I started a replace-brick and got a message that it was started successfully. However, status says unknown, and abort fails... now any operation on that volume says that I have to wait for the replace to complete. I found similar cases in the mailing list, but no solutions..
15:53 Broken|Arrow kkeithley_: thanks, is it safe to do that on a production cluster ?
15:58 stwange kkeithley, isn't the newer version not backwards-compatible, refusing to connect servers+clients running different versions? Quite dangerous advice
15:59 stwange Broken|Arrow, is the old brick online?
16:00 Broken|Arrow I just checked, 3.3.x is not backward compatible with anything older..
16:00 Broken|Arrow stwange: yes, but I can't stop the volume or remove the brick from it..
16:00 stwange can you pastie the output of peer status, volume info all, and the commands to abort the replace-brick, start it, and show status?
16:00 Broken|Arrow still says that I have to wait for the replace to complete..
16:01 stwange I probably won't be able to help but I'll look at the output
16:01 Broken|Arrow ok, a sec
16:01 ircleuser joined #gluster
16:04 coredump_ Is 16GB enough for a dedicated gluster server? I don't see it using a lot of memory
16:05 stwange should be more than enough, it's mostly disk I/O, network bandwidth and CPU that's used
16:06 frostyfrog left #gluster
16:07 Broken|Arrow stwange: http://pastebin.com/hHmTBMed
16:07 glusterbot Please use http://fpaste.org or http://paste.ubuntu.com/ . pb has too many ads. Say @paste in channel for info about paste utils.
16:07 Broken|Arrow glusterbot: sorry, mr bot..
16:08 stwange seems like a daft thing to be concerned about...
16:08 T0aD joined #gluster
16:08 stwange does gluster volume replace-brick vol_1 Server2:/bricks/vol_1 Server3:/bricks/vol_1 start  do anything?
16:09 dneary joined #gluster
16:09 stwange is du -sh Server3:/bricks/vol_3 changing at all?
16:09 stwange sorry, vol_1
16:10 Broken|Arrow stwange: "start" said that it started successfully the first time.. but says failed now..
16:10 stwange is there anything in the logs?
16:11 Broken|Arrow Server3:/bricks/vol_1 now contains the files on the brick actually.. but I am not sure that's a good thing..
16:11 Broken|Arrow stwange: nothing that would bring any usable results on google..
16:12 ron-slc joined #gluster
16:13 stwange it contains all of the files Broken|Arrow?
16:14 stwange you could try  gluster volume replace-brick vol_1 Server2:/bricks/vol_1 Server3:/bricks/vol_1 commit  (but I'd be surprised if it worked)
16:14 social kkeithley_: ping, I'm hitting interesting issue, our rest tests fail on 3.4.1 with [client-rpc-fops.c:327:client3_3_mkdir_cbk] 0-Staging-client-1: remote operation failed: File exists. Path: /fb
16:15 social kkeithley_: the tests, in short, mkdir some hierarchy, for example /mnt/test/fb/cd, and rm -rf /mnt/test/* ; umount /mnt/test ;  mount and cycle through this
16:15 Broken|Arrow stwange: http://paste.ubuntu.com/6453976/
16:15 social kkeithley_: does it ring any bell?
16:15 glusterbot Title: Ubuntu Pastebin (at paste.ubuntu.com)
16:15 Broken|Arrow stwange: yes, all the files..
16:15 Broken|Arrow stwange: tried commit, same thing..
16:16 stwange just double check that you haven't got the hostname for server3 going to the wrong place in /etc/hosts on any of the servers
16:16 stwange if not, I'd stop gluster on server3 and see if you can stop the replace-brick then
16:18 Broken|Arrow just confirmed hosts are ok..
16:19 Broken|Arrow stopped it, still the same :(
16:21 andreask joined #gluster
16:23 stwange hmm
16:23 stwange what do you mount?
16:24 stwange if they all go down except server1 would that cause you problems?
16:25 Broken|Arrow no, I can stop server2 and server3..but that didn't help
16:25 kkeithley_ social: no, not ringing any bells for me
16:26 stwange urgh. I don't know I'm sorry
16:26 stwange all glustered out today :-/
16:26 stwange ENOMOREMAGIC
16:27 kkeithley_ stwange: that's correct, an upgrade from 3.2 to 3.3 or 3.4 is an incompatible upgrade. Downtime is required to update everything.
16:29 Broken|Arrow stwange: thanks a million for trying.. I really appreciate it.. I guess I will request downtime and upgrade.. it's the right thing to do anyway..
16:33 social kkeithley_: sounds like this but I'm unable to reproduce it unless I umount/mount https://bugzilla.redhat.com/show_bug.cgi?id=951195
16:33 glusterbot <http://goo.gl/7QR9p> (at bugzilla.redhat.com)
16:33 glusterbot Bug 951195: high, high, ---, sgowda, ASSIGNED , mkdir/rmdir loop causes gfid-mismatch on a 6 brick distribute volume
16:33 neofob left #gluster
16:33 bennyturns joined #gluster
16:34 Alpinist joined #gluster
16:36 social kkeithley_: ah no, it works fine: while true; do mkdir -p tra/la tra/lala tra/li ; rm -rf tra ; done reproduces the issue
16:39 neofob joined #gluster
16:44 semiosis :O
16:45 social funny thing is that it throws an error but continues after a few milliseconds, so it manages to delete /brick/tra/li in the end
16:47 semiosis dbruhn: i'd go with a variation on JoeJulian's scheme, using numbers instead of letters:  /data/$volume_name/{0...N}/brick
16:48 semiosis johnmark: thx for the php article link, will look into it
16:49 semiosis johnmark: at first glance, we need to get a note about the PPA in there, so people don't end up with 3.2.7 from universe :/
16:53 dbruhn Thanks semiosis, I feel like every install I do I overthink this stuff more and more
16:54 semiosis same here
16:54 dbruhn Granted a little overthinking here and there would have saved me some of the pains on my existing systems...
16:54 dbruhn lol
16:54 semiosis my prod cluster (still 3.1.7) has bricks named /bricks/$volname{0...N}
16:55 semiosis where the brick path is the disk fs mountpoint.  since then i've learned it's better to have the brick path in a subdir of the disk fs mountpoint
16:55 semiosis so i advise that
16:55 dbruhn What's the gotcha there?
16:55 dbruhn My two prod systems today are at the mount point
16:56 semiosis if the disk fs is not mounted, and your brick is the mountpoint, then glusterfs will use the (empty) dir on your root fs, which is bad for two reasons... files are missing, and heal will fill your root fs
16:57 semiosis however, if the disk fs is not mounted and your brick is a subdir of the mountpoint, then the brick dir will not exist, and glusterfsd (brick export daemon) will fail to start
16:57 semiosis no downside there
16:57 semiosis beyond a brick being down
16:58 jord-eye joined #gluster
16:58 JoeJulian btw, that's fixed in 3.4
16:58 semiosis oooh
16:59 dbruhn Good to know
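To make the layout semiosis describes concrete, a minimal sketch (device, paths and names are placeholders): the brick directory sits one level below the disk filesystem's mountpoint, so a missing mount means a missing brick directory rather than an accidental brick on the root fs:
    mkdir -p /bricks/myvol0
    mount /dev/sdb1 /bricks/myvol0     # the disk filesystem
    mkdir /bricks/myvol0/brick         # the actual brick path, inside the mountpoint
    gluster volume create myvol replica 2 server1:/bricks/myvol0/brick server2:/bricks/myvol0/brick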
17:00 cyberbootje joined #gluster
17:08 bulde joined #gluster
17:16 cyberbootje joined #gluster
17:30 Mo__ joined #gluster
17:39 johnmark dbruhn: oh hey. were you at the CCC?
17:39 johnmark ndevos: ping
17:39 social btw anyone going on 30C3 ?
17:42 jbd1 joined #gluster
17:46 cfeller joined #gluster
17:46 mattapp__ joined #gluster
17:47 jbd1 What is the meaning of the output of gluster volume heal VOL info heal-failed ?  I get a lot of listings there (1050) but a lot of those files don't even exist on the bricks
17:49 calum_ joined #gluster
17:50 JoeJulian I'd check the glustershd logs around the timestamp shown to see what the error was.
17:55 mattappe_ joined #gluster
17:56 jbd1 JoeJulian: logs rotated away, so I can't see the glustershd logs for the timestamps in question (the most recent was on 10/31/2013).  The brick log doesn't have anything for that kind of stuff either (though it does go that far back).
17:57 jbd1 oh, well I did find something in the brick log relating to the directory listed in the heal-failed: [2013-10-31 04:14:38.902655] W [posix-handle.c:470:posix_handle_hard] 0-UDS8-posix: link /export/brick1/vol1/5b/61/84/fverdugho/mail/inbox.dump -> /export/brick1/vol1/.glusterfs/bf/02/bf0296d2-9760-442c-b52d-5f6658260235 failed (File exists)
17:58 jbd1 which corresponds to the entry in the heal-failed output: 2013-10-31 04:27:16 /5b/61/84/fverdugho/mail
18:01 JoeJulian I would ls -li /export/brick1/vol1/5b/61/84/fverdugho/mail/inbox.dump /export/brick1/vol1/.glusterfs/bf/02/bf0296d2-9760-442c-b52d-5f6658260235 ; if they have the same inode number, you should be fine.
18:02 mattappe_ joined #gluster
18:03 jbd1 JoeJulian: they do, so I guess I should just ignore it :(
18:03 rcheleguini joined #gluster
18:04 JoeJulian You might file a bug report. That's very confusing that it would show the heal failed but it appears to be okay.
18:04 glusterbot http://goo.gl/UUuCq
18:05 jbd1 JoeJulian: it's interesting that the heal-failed entries are all directories, and that the majority of them were created with cp from a local fs to a fuse mount
18:05 jbd1 I have a hunch that there's a bug with rapidly-created directory trees on 3.3.2
18:05 jbd1 (e.g. mkdir -p /tmp/foo/{a,b,c,d,e,f,g}/{h,i,j,k,l,m,n}/{o,p,q,r,s,t,u}/{v,w,x,y,z} ; cp -r /tmp/foo/ /mnt/glusterfs-volume )
18:17 dneary joined #gluster
18:18 rotbeard joined #gluster
18:20 aliguori joined #gluster
18:29 jbrooks left #gluster
18:33 jbrooks joined #gluster
18:47 dbruhn johmark, I didn't get a chance to go, been crushingly busy on the backup side of business.
18:48 dbruhn did you get my email about wanting to do a joint press release?
18:48 mattapp__ joined #gluster
18:48 dbruhn be back in a bit, need to relocate to the datacenter
18:50 Guest19728 joined #gluster
18:51 JoeJulian We get a lot of those here in Washington State.... Seems like there's some sort of "joint" related press release daily.
19:10 sticky_afk joined #gluster
19:10 stickyboy joined #gluster
19:12 raghug joined #gluster
19:26 _pol joined #gluster
19:27 dmueller joined #gluster
19:42 geewiz joined #gluster
19:57 dbruhn joined #gluster
19:59 glusterbot New news from newglusterbugs: [Bug 1033275] The glusterfs-geo-replication RPM missing dependency on python-ctypes <http://goo.gl/fw8PV6>
20:02 samppah @latest
20:02 glusterbot samppah: The latest version is available at http://goo.gl/zO0Fa . There is a .repo file for yum or see @ppa for ubuntu.
20:04 _dist joined #gluster
20:05 hateya_ joined #gluster
20:06 _dist Hey there everyone, I'm in the middle of some throughput testing and I've found myself in a strange situation with a volume heal. Anyone have experience with it?
20:06 johnmark dbruhn: oh hey, joint PR? I missed that somehow
20:06 johnmark dbruhn: hrm... let's reconnect on that - would love to do something
20:09 _dist it's probably something simple, basically I had a replication brick down, I brought it back up and it healed (file sizes are the same now etc). But, every 2-3 seconds it keeps re-healing that same file (rather it shows up in heal volume info for 2-3 secs over and over again)
20:11 dbruhn johmark, dean@offsitebackups.com lets have a conversation later this week
20:14 dbruhn john mark, dean@offsitebackups.com lets have a conversation later this week
20:15 dbruhn johnmark
20:15 dbruhn damn auto correct
20:21 uebera|| joined #gluster
20:21 wica Hi, on ubuntu 13.10 I have qemu with glusterfs support (I think), but does libvirt also need glusterfs support? In the code I can only find a ref to fedora 11
20:22 MacWinner i have 3 nodes with /brick as the bricks on each one. I had a volume called master, and all the peers were working on the public IP of my servers.  I now have a second NIC for private IP connection between all the servers, so I wanted to move my gluster configuration to use the internal IP..
20:22 badone joined #gluster
20:22 MacWinner I have deleted the existing volume and removed all the peers..
20:22 MacWinner I have re-added the peers using their internal IPs now
20:23 MacWinner but now when I try to create the volume again with this command: gluster volume create master replica 3 transport tcp internal3:/brick internal4:/brick internal5:/brick
20:23 MacWinner i get this error: volume create: master: failed: /brick or a prefix of it is already part of a volume
20:23 glusterbot MacWinner: To clear that error, follow the instructions at http://goo.gl/YUzrh or see this bug http://goo.gl/YZi8Y
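The page glusterbot links to amounts to clearing the gluster metadata left on the reused brick path. Roughly, run on each server (assuming /brick really is the path being reused; removing .glusterfs drops gluster's internal links, not the user files):
    setfattr -x trusted.glusterfs.volume-id /brick
    setfattr -x trusted.gfid /brick
    rm -rf /brick/.glusterfs
    service glusterd restart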
20:25 samppah wica: yes, if you are going to use libvirt then libvirt has to support glusterfs too
20:25 wica samppah: Oke, clear. do you know a url with some more info?
20:26 wica I don't see any --enable-glusterfs flags in ./configure
20:28 kkeithley_ I suspect the libvirt configure will DTRT if you have the glusterfs-api-devel RPM installed
20:28 wica I'm using ubuntu
20:29 wica And I think the glusterfs-api-devel rpm only has glusterfs/api/glfs.h
20:30 wica Which I have already, because I had to rebuild qemu
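A quick way to confirm a rebuilt qemu really has the gluster block driver, independent of libvirt, is to point qemu-img at a gluster:// URL; server and volume names here are placeholders:
    qemu-img create -f qcow2 gluster://server1/vmvol/test.qcow2 1G
    qemu-img info gluster://server1/vmvol/test.qcow2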
20:33 elyograg i have a question about some behavior I'm seeing on 3.3.1.  Anytime I do anything with quotas, the volume gets mounted on that machine as /tmp/mntXXXXXX.  One of them got unmounted automatically, but the others didn't seem to.  Is this expected, a known bug that's maybe been fixed in a newer release, or something I should report?
20:33 JoeJulian elyograg: I found a bug in 3.4.0 that might have accounted for something like that. It probably existed back in 3.3.1.
20:34 samppah wica: sorry, can't remember if there is any docs available :(
20:35 wica samppah: Ok, anyway thnx
20:35 elyograg JoeJulian: got a BZ id?  I did do some searching, but i don't seem to be able to hit on the right combination. :)
20:38 _dist so I stopped and restarted the volume (bit of a pain) but it made the file that "kept healing" without errors in shd.log stop doing whatever it was doing :)
20:39 _dist not a great solution though, I'd hate to have to do that on a prod server.
20:44 _dist left #gluster
20:54 JoeJulian wica: afaict, to configure libvirtd to use glusterfs you need netcf
20:56 wica ?
20:56 wica why does it need a config tool for NICs to enable glusterfs support?
20:56 JoeJulian I didn't write it.
20:56 wica hehe
20:57 wica great. So there is a vendor lock
20:57 wica :)
20:58 JoeJulian I could be reading this wrong...
20:59 wica i see it, libvirt is linking netcf
20:59 wica where are you reading this?
21:03 wica So a non-Red Hat-like OS will not support glusterfs natively with libvirt.
21:04 dmueller joined #gluster
21:04 JoeJulian I was probably misreading the changelog
21:06 dmueller left #gluster
21:11 semiosis wica: re: glusterfs-api-devel rpm only has glusterfs/api/glfs.h... making a glusterfs-dev deb package is on my (exceedingly long) TODO list
21:12 semiosis as for the netcf thing, this is the first i've heard of it
21:13 semiosis you may be the first person ever to try gluster/qemu/libvirt on ubuntu
21:13 wica semiosis: That would be nice. I have taken the glfs.h from the source
21:13 semiosis definitely the first person i've ever heard from about it
21:13 wica and rebuilt qemu with glusterfs enabled
21:14 kiwikrisp joined #gluster
21:14 wica btw thanks :)
21:15 semiosis you're welcome
21:15 wica semiosis: do you know a way to enable glusterfs in libvirt?
21:16 daMaestro joined #gluster
21:16 JoeJulian Here's the commit that added glusterfs to libvirt: http://ur1.ca/g2q2x
21:16 glusterbot Title: #55896 Fedora Project Pastebin (at ur1.ca)
21:16 JoeJulian Not much to it. Just added the strings it looks like.
21:17 semiosis wica: i've never tried
21:17 neofob left #gluster
21:17 wica JoeJulian: Yep, I found that one also in the code.
21:19 wica semiosis: oke, thanks anyway also for the packages.
21:19 andreask joined #gluster
21:21 semiosis JoeJulian: where'd you find this netcf thing in glusterfs/libvirt?
21:21 JoeJulian I wish I'd spent more time before saying anything... I'm sure that's just a red herring.
21:22 JoeJulian The rpm changelog, but I think those were unrelated changes.
21:25 JoeJulian wica: is glusterfs support not working in it?
21:25 JoeJulian Looks like it should without any special configuration.
21:25 wica JoeJulian: I did not try yet. who knows
21:27 tqrst unrelated: can someone explain to me what the 'force' option in rebalance is supposed to do? The only doc I've found so far says it forcefully rebalances, which isn't exactly a useful description.
21:30 tqrst sigh, segfaults again
21:31 kiwikrisp I need some help with really high disk activity on my 2-brick gluster storage system. I'm running 3.4 on CentOS 6.4 and the volume is replicated. The machines sit right next to each other, so geo-replication is not used. It started a couple of weeks ago when mdadm was doing a scan of the array. The mdadm scan is done now but the activity is still way up there. I've run the # gluster volume heal VOLUME info and
21:31 kiwikrisp it has 4 entries, but I don't know if that's a problem, what they mean, or what I might need to do to resolve them.
21:40 Liquid-- joined #gluster
21:41 JoeJulian tqrst: When it wouldn't otherwise move a file, such as when the destination brick has less free space than the source, it moves it anyway.
21:41 dbruhn joined #gluster
21:42 tqrst JoeJulian: interesting. When would that come in handy?
21:42 tqrst also, look under your seats! You get a segfault! You get a segfault! And you get a segfault! http://pastie.org/private/f6xj9utrgdqpygyv9fxo4w
21:42 glusterbot <http://goo.gl/XWrXrp> (at pastie.org)
21:44 JoeJulian I do forceful rebalance because my bricks are on lvm. I want the sticky pointers gone and if the rebalance causes uneven distribution, I'll handle ensuring there's enough space by adding more extents via lvm.
21:44 tqrst right
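"Adding more extents via lvm" is the usual grow-the-brick-in-place sequence; VG/LV names and the brick mountpoint below are placeholders, and this assumes XFS bricks on LVM as JoeJulian describes:
    lvextend -L +100G /dev/vg_bricks/lv_myvol_a
    xfs_growfs /data/myvol/a       # resize2fs for ext4 bricks instead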
21:44 ndk joined #gluster
21:45 JoeJulian bus error?
21:45 dbruhn Does anyone know if you still need to adjust the in ode size to 512 for xfs sub storage?
21:45 dbruhn inode
21:45 tqrst yeah I have no idea
21:45 tqrst 3 of my gluster servers just went berserk
21:45 JoeJulian According to redhat performance testers, no.
21:45 dbruhn Joe was that for me?
21:46 JoeJulian dbruhn: yes
21:46 dbruhn Thought that's what I heard, thanks for the confirmation
21:46 JoeJulian I still do it anyway...
21:47 dbruhn habit, or just don't trust it?
21:48 JoeJulian If they're going to roll over into a second inode when I don't, I might as well. Just seems more sane.
21:49 dbruhn Makes sense
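The inode-size tweak under discussion is set at mkfs time; /dev/sdb1 is a placeholder device. The idea is that 512-byte inodes leave room for gluster's extended attributes inside the inode itself:
    mkfs.xfs -i size=512 /dev/sdb1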
21:51 tqrst JoeJulian: looks an awful lot like the weird segfaults from a few months back
21:51 JoeJulian tqrst: just to make sure, I checked to see if gluster was borrowing that signal for something else. It's not.
21:51 JoeJulian You have a memory bus fault on three machines?
21:51 tqrst JoeJulian: even better
21:52 tqrst stopping glusterd on any of the remaining servers triggers a segfault there
21:52 tqrst there's a handful of them left if you want to try anything on the ones that are still up and running
21:53 JoeJulian That's a whole dragon's nest, that is.
21:53 JoeJulian dmesg?
21:54 * tqrst mutters something about dmesg not having timestamps?
21:54 tqrst google time
21:54 JoeJulian http://en.wikipedia.org/wiki/Segmentation_fault#Paging_errors
21:54 glusterbot <http://goo.gl/IUSOUo> (at en.wikipedia.org)
21:54 JoeJulian That seems like the most likely possibility.
21:55 tqrst (sigh, dmesg timestamps are disabled by default - no timestamps for me except on the new ones)
21:56 tqrst paging errors on all servers at the same time?
21:56 pravka joined #gluster
21:56 JoeJulian Did you upgrade anything?
21:56 tqrst (I suppose that could be possible if it's triggered by a weird memory setting)
21:56 tqrst well
21:57 tqrst I upgraded to 3.4.1 a few days back
21:59 semiosis tqrst: from what?
21:59 JoeJulian If you upgraded something on all your servers that was memory mapped by the client and the new file is smaller than the original, that could explain it.
21:59 Liquid-- joined #gluster
21:59 tqrst semiosis: 3.4.0
22:00 tqrst JoeJulian: I restarted all servers and clients
22:00 JoeJulian rwheeler: Man... that ESTALE bug really got some legs...
22:01 haritsu joined #gluster
22:01 JoeJulian tqrst: Are you sure?
22:01 JoeJulian Did you ps ax | grep gluster
22:01 JoeJulian (or pgrep -f gluster if you prefer)
22:03 tqrst nov 13, which is when I upgraded
22:04 rwheeler JoeJulian, right - the important bits are fixed, what is left is pretty trivial
22:06 JoeJulian semiosis: That means that we'll no longer have to hear, "It says stale NFS filehandle, but I'm not using NFS..."
22:06 tqrst nothing in dmesg btw (just enabled timestamps and killed one of the servers to see)
22:12 JoeJulian Well... "lsof -p $(pidof glusterfs) | grep mem" then get in your time machine and see if any of those were a different size when you mounted the volume... :/
22:19 cicero whoa
22:19 cicero pidof is a thing
22:19 JoeJulian hehe
22:19 cicero learn something everyday
22:37 ron-slc left #gluster
22:40 failshel_ joined #gluster
22:49 tqrst in other news, rebalance status still outputs nonsensical hostnames (all localhost)
22:59 gdubreui joined #gluster
23:05 qubit left #gluster
23:08 msolo joined #gluster
23:09 msolo when starting a client, you specify only the volume name, how does the client locate the brick servers?
23:11 elyograg msolo: a fuse mount is server:volume ... it talks to the server, downloads the volume info from it.  the volume info contains all the host:/path information for the bricks.
23:11 JoeJulian @mount servers
23:11 glusterbot JoeJulian: I do not know about 'mount servers', but I do know about these similar topics: 'mount server'
23:11 JoeJulian @mount server
23:11 glusterbot JoeJulian: The server specified is only used to retrieve the client volume definition. Once connected, the client connects to all the servers in the volume. See also @rrdns
23:12 msolo Ah, sorry, I see it now, thanks
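In practice the server named in the mount command is only a bootstrap for fetching the volume definition; a second volfile server can be named in case the first is down at mount time. A sketch with placeholder names (option spelling as in the 3.3/3.4-era mount.glusterfs script):
    mount -t glusterfs server1:/myvol /mnt/myvol -o backupvolfile-server=server2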
23:14 social JoeJulian: grab a gluster mount and do: while true; do mkdir -p a/a a/b a/c a/d b/a/a b/c/c a/d && rm -rf * || exit ; done :)
23:15 social eh forgot to say it has to be distribute
23:16 andreask joined #gluster
23:21 tyl0r joined #gluster
23:24 _pol joined #gluster
23:36 tyl0r Is there a way to get an NFS client to communicate across multiple bricks on the same gluster server? I'm guessing no.
23:39 JoeJulian Eh?
23:40 JoeJulian The client, nfs or fuse, communicates with a volume. Those volumes are made up of bricks which could be on the same server.
23:44 tyl0r Yeah, that was my hypothesis, but when I tested it out I had different results. I have a 2-node distributed-replicate set. So I have 2 nodes each with 2 bricks (nodeA has brick1/brick3, nodeB has brick2/brick4). When I connected (via NFS) to the volume on one of my mirrored nodes (nodeB), it was only seeing data on bricks 2 and 4.
23:46 sprachgenerator joined #gluster
23:47 tyl0r Crap, I wrote that last sentence wrong....
23:47 tyl0r it was only seeing data on brick 4.
23:48 elyograg were you mounting the brick path, or the volume?
23:50 elyograg use a site like fpaste.org to share your gluster volume info, and also give us the exact mount command you used.
23:55 tyl0r thank you, one sec
23:57 tyl0r http://fpaste.org/55935/38507822/
23:57 glusterbot Title: #55935 Fedora Project Pastebin (at fpaste.org)
23:58 elyograg and the mount command?
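For reference, elyograg's question matters because mounting a brick path instead of the volume would show only that brick's files; mounting the volume through gluster's built-in NFS server looks roughly like this (VOLNAME and paths are placeholders):
    mount -t nfs -o vers=3,tcp nodeB:/VOLNAME /mnt/gluster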
