IRC log for #crimsonfu, 2016-12-15

crimsonfu - sysadmins who code

All times shown according to UTC.

Time Nick Message
00:17 pdurbin wow, several dozens of software engineers
02:47 ilbot3 joined #crimsonfu
02:47 Topic for #crimsonfu is now http://crimsonfu.github.com - ConfiguRatIon Management of Systems Or Network kung FU | logs at http://irclog.perlgeek.de/crimsonfu/today
14:35 dotplus bear: yes, most of our Jenkins jobs are generated by Ansible running Jenkins Job Builder based on ansible inventory, so adding a small wrapper function to emit timing data to the collector would be easy, as would adding an ansible callback plugin to emit timing data. the parts that would take some effort would be to ensure that (enough) metadata is sent in a usable/conventional way so as to ease the difficulty of visualization and the ...
14:35 dotplus ... programming/CM of visualization itself. pretty sure that a metrics project like this is feasible for a week's effort if we scope it small enough and would be a good foundation for further effort.
14:37 bear if you're working with ansible then you can take advantage of the metadata that its facter program can generate
14:37 dotplus JoeJulian: yes, each project would be the foundation for further work and not (merely) an end in itself.
14:39 dotplus bear: perhaps. I'm not sure we care too much about the actual hosts that are running the jobs. jenkins slaves and the ephemeral vagrant/(vbox|openstack)/container instances they create are not even cattle.
14:39 bear ah
14:40 bear my first thought was that you could take advantage of ansible facter + jinja2 template and just write a python wrapper that passes the facter data to the template
14:41 bear but even without that - the jenkins job data is exposed, IIRC, as environment vars - so a wrapper script should be able to generate good data
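(As a sketch of that wrapper-script idea: Jenkins does expose JOB_NAME, BUILD_NUMBER, and BUILD_URL as environment variables; the JSON record format here is just an assumption.)

    #!/usr/bin/env python
    # build_wrapper.py - wraps the real build command, times it, and tags the
    # measurement with metadata Jenkins exposes as environment variables.
    import json
    import os
    import subprocess
    import sys
    import time

    def main():
        start = time.time()
        rc = subprocess.call(sys.argv[1:])        # run the wrapped build step
        record = {
            'job': os.environ.get('JOB_NAME', 'unknown'),
            'build': os.environ.get('BUILD_NUMBER', '0'),
            'url': os.environ.get('BUILD_URL', ''),
            'duration_s': round(time.time() - start, 3),
            'exit_code': rc,
        }
        print(json.dumps(record))                 # ship or log this line
        sys.exit(rc)

    if __name__ == '__main__':
        main()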
14:42 dotplus yeah, I can envisage quite a few useful and fairly easy ways to do that, which makes me more inclined towards the metrics project. But I don't want to shortchange the ELK project just because I don't yet understand it so well.
14:44 bear a lot of the dashboard tools will work with pretty generic elastic search results as long as you get a single json blob stored
14:44 bear so you can add a lot of extra data
14:47 dotplus ok good. I'm certainly going to need to get some experience with dashboarding. the 2 main (FOSS) options would be grafana or kibana, right?
14:47 bear actually they are kinda merging now - the owner of both now works for elasticsearch :)
14:47 bear but yes
14:48 dotplus iiiiinteresting.
14:48 bear I would stick with kibana
14:48 bear it supports multiple input db types
14:50 bear this article is part of a series that goes over a lot of it -- http://engineering.laterooms.com/elasticsearch-as-a-time-series-database-part-2-getting-stats-in/
14:51 bear oops - I got that backwards I think grafana is replacing kibana
14:51 bear (it's been a year since I last did any of this so hand waving in progress ;)
14:52 dotplus np, that's cool. nobody is up to date with everything all the time. and in a general IRC channel... handwaving expected.
14:52 bear :)
14:52 bear the reason I like the ELK approach is that you can use logs
14:52 bear generate the output to a log, run logstash on the logs to get current data and then compress and store the logs for archive
14:53 bear logstash or its new variant filebeat
14:53 bear using that method also allows you to add alerting by siphoning off a stream to your alert tool of choice
14:55 dotplus yeah, I think that the elk approach is *super-duper* powerful as my 3yrold would say. Basically I'm just concerned that even an MVP would be too much for a week
14:55 bear the work they have done with the elk stack means that your first day is elasticsearch setup - it's really really easy now
14:56 dotplus I suppose that means I just need to work out how to chunk it sufficiently small so as to have *something* minimally useful to demo at the end of the week
14:57 dotplus nice, that's encouraging. reducing the barriers to entry for that kind of powerful tech has to be a huge win
14:57 bear for an MVP I would suggest a command line statsd emitter wired up to a stock elk endpoint (it has statsd as a receiver)
14:57 bear and focus on a single metric - like total build time
14:57 bear then all the work is done making it possible in your job environment
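(A minimal sketch of that MVP emitter, assuming a statsd-compatible UDP receiver on the ELK host; the host, port, and bucket name are illustrative:)

    #!/usr/bin/env python
    # emit_build_time.py - command-line statsd emitter for one metric:
    # total build time in milliseconds.
    import socket
    import sys

    def emit(bucket, millis, host='elk.example.com', port=8125):
        payload = '%s:%d|ms' % (bucket, millis)
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.sendto(payload.encode('utf-8'), (host, port))

    if __name__ == '__main__':
        # usage: emit_build_time.py <milliseconds>
        emit('jenkins.build.total_time', int(sys.argv[1]))

(Called from a job wrapper like the one sketched earlier, that single metric is enough to exercise the whole collection path end to end.)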
15:05 dotplus well, wait a sec. you're conflating the 2 possible projects. If we do the metrics project that would be influxdb/grafana. If we do ELK, it would not be metrics and timing data, at least not at first (and I'm inclined to think that *numeric* TS data is probably better in a real tsdb like influx anyway and that using ELK for such metrics would be possible but sort of using a hammer for a screw). If we do ELK, the (initial) purpose would be to ...
15:05 dotplus ... help the dev teams avoid the Jenkins interface and easily understand exactly why their build failed, get to the exact log message such as the stacktrace from whatever openstack python daemon choked or whatever.
15:13 bear I wasn't conflating them at all - it's something I had to learn the hard way myself
15:13 bear "real tsdb" is a red herring
15:14 dotplus oh, my bad. clearly I misunderstood
15:14 dotplus how so?
15:14 bear not your bad, just me being up way too early in the morning
15:14 bear the modern dashboard can take any query result that is a list and convert it to a graph
15:15 bear so you can use sql, elasticsearch, graphite, influxdb...
15:15 bear the point is to not worry about the store but rather the collection method
15:16 bear and in my experience, the benefits of a "real" time series db is marketing hype now - the elasticsearch folks have done some seriously good work in getting the E part of the ELK stack very easy to use
15:16 bear just google for how many people lost their influxdb datastores during upgrades or scale-up events
15:16 bear (I know I did)
15:17 bear the other advantage of elk is that you can do both the graph for timing and the log search on the same backend
15:20 bear when you're graphing your metric data you can include in the graph labels that are the job IDs
15:20 bear then you can search the log output for that job id and get all the surrounding output for the specific build
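(Illustrative only: two Elasticsearch documents sharing a build_id, one in a metrics index and one in a logs index, so a label on a graph point can double as a search term against the log index; the index fields and values are assumptions:)

    timing_doc = {
        '@timestamp': '2016-12-15T14:57:00Z',
        'job': 'openstack-deploy',
        'build_id': '1234',
        'total_build_ms': 845000,
    }

    log_doc = {
        '@timestamp': '2016-12-15T14:55:12Z',
        'job': 'openstack-deploy',
        'build_id': '1234',
        'message': 'Traceback (most recent call last): ...',
    }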
15:20 dotplus clearly the data/metadata collection is critical; if you don't have sufficient metadata, then obviously the questions you can safely ask are more limited. re: tsdb, my (admittedly naive) understanding is that if I want to focus on ts analysis, there's a reason why there are ts tools. (yes, I've heard some horror stories about influxdb wrt immaturity). It sounds like you're saying that if I choose ELK (or ELG(rafana)), I can achieve both ...
15:20 dotplus ... projects with a single stack
15:20 bear yep
15:21 bear now, to be clear, i'm not saying what you want to do won't work - it will work just fine to be honest
15:21 bear I just feel that you will be redoing a good chunk of it when you decide to go to phase 2
15:21 dotplus right. I know. especially on the limited scale we have for this.
15:22 bear because inserting non time series info into (and pulling it out of) influxdb is painful
15:22 dotplus yeah, definitely not going to go there.
15:22 dotplus that would be using oven mitts for embroidery
15:23 bear :)
15:23 bear either way I would suggest going the route where everything you do is also mirrored to a log file
15:23 bear so if you do decide to go another path, you can replay the log files to back fill data
15:23 bear or replace data when you find some mistake
15:24 dotplus "everything you do" == the input stream?
15:24 bear yes
15:25 bear simple log output of anything you would be sending to statsd
15:26 dotplus good idea. that will make for easier testing for the CM as well. just replay a section of the input stream into the storage and then I can test whether the role creates what it should.
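(A sketch of the mirror-then-send idea being discussed; the file path and collector endpoint are assumptions:)

    # mirror_and_replay.py - append every statsd line to a local log before
    # sending it, so the same file can later be replayed to back-fill or
    # rebuild a store, or to exercise the CM roles in a test.
    import socket

    LOG_PATH = '/var/log/build-metrics/statsd.log'   # assumed path
    COLLECTOR = ('elk.example.com', 8125)            # assumed endpoint

    _sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def emit(line):
        """Mirror the statsd line to the log file, then send it."""
        with open(LOG_PATH, 'a') as fh:
            fh.write(line + '\n')
        _sock.sendto(line.encode('utf-8'), COLLECTOR)

    def replay(path=LOG_PATH):
        """Re-send every mirrored line, e.g. to back-fill a fresh store."""
        with open(path) as fh:
            for line in fh:
                _sock.sendto(line.strip().encode('utf-8'), COLLECTOR)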
15:26 bear https://www.elastic.co/blog/elasticsearch-as-a-time-series-data-store  as another data point to why I like elasticsearch as a tsdb
15:26 bear yes, I love having log files to use as unit tests for my gather/scatter code
15:32 bear I used to do a *ton* of data conversion and manipulation as my dayjob - millions of rows of data per customer
15:32 bear and I quickly learned that having detailed logs of each pass allowed me to replay only the sections that broke
15:32 bear instead of having to rerun an entire 6 hr job
15:35 dotplus slicing that seems like it would be a big win
15:38 bear what is really going to make you chuckle is that I've led you down a path where I can point out https://www.elastic.co/guide/en/logstash/current/plugins-filters-metrics.html
15:38 bear and you realize that from the log you generate you can auto-populate the tsdb
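(The logstash metrics filter linked above is one route to that; as a rough stand-in for the same idea, this is what back-filling a store straight from the mirrored log could look like against the Elasticsearch HTTP API. The URL, index name, and line format are assumptions:)

    # backfill_es.py - parse previously mirrored statsd lines and index each
    # one as a document in Elasticsearch.
    import requests

    ES_URL = 'http://elk.example.com:9200/build-metrics/_doc'  # assumed

    def backfill(path):
        with open(path) as fh:
            for line in fh:
                # assumed mirrored line format: "<bucket>:<value>|ms"
                bucket, rest = line.strip().split(':', 1)
                value = int(rest.split('|', 1)[0])
                requests.post(ES_URL, json={'bucket': bucket, 'value_ms': value})

    if __name__ == '__main__':
        backfill('/var/log/build-metrics/statsd.log')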
15:39 bear but that can still be a version 2 adjustment :)
15:53 dotplus bwahaha. nice
15:57 dotplus emit to graphite? not if I can avoid it. I've dealt with whisper enough to ensure that I won't be choosing that again. and I'm not a fan of the line protocol either. but presumably that filter plugin supports destinations other than graphite.
16:01 bear it does, that was just an easy-to-grab post to show what is possible
16:17 dotplus right. thanks for your comments this morning. I'm going to look more into using ELK for our project, see if I can get my head around the details, smackdown the devils therein and ensure that I can keep it small enough to be achievable but usable in a week.
17:47 bear you're very welcome
21:07 pdurbin dotplus: will something come out of this that you can open source? :)
21:52 dotplus no idea yet. Don't even really know what we'll have for internal use :)
22:02 dotplus depends what's already out there. presumably the initial stuff I described would be very particular to our jenkins jobs and so on, not much use to anyone else in the specific
22:55 pdurbin ok, just thought I'd ask :)
23:08 dotplus it's always on my mind
23:57 pdurbin good :)
