
IRC log for #opentreeoflife, 2013-11-24


All times shown according to UTC.

Time Nick Message
00:00 dukeleto jimallman: the curation app will send a GH username/name/email to the API and then it will do local Git commits
00:01 dukeleto jimallman: but without an authentication token, how do we know who should have write access to the API?
00:02 dukeleto jimallman: i.e. if somebody is using the API via curl from the command-line, they could lie and say "i am jimallman", when in fact they are not
00:04 dukeleto jimallman: provenance vs. authentication
00:04 dukeleto jimallman: something to think about
00:22 jimallman sorry, got pulled into a phone call.. back in a minute
00:27 dukeleto jimallman: no worries
00:37 travis-ci joined #otol
00:37 travis-ci [travis-ci] OpenTreeOfLife/api.opentreeoflife.org#193 (local - 6f4faf1 : Jonathan "Duke" Leto): The build passed.
00:37 travis-ci [travis-ci] Change view : https://github.com/OpenTreeOfLife/api.opentreeoflife.org/compare/e3cb328ae99c...6f4faf1d026e
00:37 travis-ci [travis-ci] Build details : http://travis-ci.org/OpenTreeOfLife/api.opentreeoflife.org/builds/14432393
00:37 travis-ci left #otol
00:39 jimallman re: who should have access, that's an interesting question.
00:40 jimallman dukeleto: this is why i was so surprised that you could simply assert --author on a git commit, i guess i thought it would be more... protected than that
00:41 jimallman since we're still using GitHub auth and getting the token, maybe that's the answer right there. the OTOL API could check to make sure the token is good and belongs to the specified user before doing anything important.
00:42 jimallman that should handle impersonation, anyway. if the auth token checks out, we trust that the curator (or other API consumer) is who they say they are and attribute the activity to them in the git commit --author option.
00:42 dukeleto jimallman: yes, that was the conclusion I came to.
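A minimal sketch of the check agreed on above: before attributing a commit to a user, ask GitHub who the token belongs to and compare that with the claimed username. The function names (`fetch_github_login`, `token_belongs_to`) and the `User-Agent` string are hypothetical; the only external piece relied on is GitHub's "get the authenticated user" endpoint, and the lookup is injectable so the comparison can be exercised without network access.

```python
import json
import urllib.error
import urllib.request

# GitHub's endpoint that returns the user a token authenticates as.
GITHUB_USER_URL = "https://api.github.com/user"


def fetch_github_login(token):
    """Return the login GitHub associates with `token`, or None if rejected."""
    req = urllib.request.Request(
        GITHUB_USER_URL,
        headers={
            "Authorization": "token " + token,
            "User-Agent": "otol-api-sketch",  # hypothetical client name
        },
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp).get("login")
    except urllib.error.HTTPError:
        # e.g. 401 for a bad or revoked token
        return None


def token_belongs_to(claimed_login, token, lookup=fetch_github_login):
    """True only if the token is valid and maps to the claimed username."""
    actual = lookup(token)
    # GitHub usernames are case-insensitive, so compare in lower case.
    return actual is not None and actual.lower() == claimed_login.lower()
```

With a check like this in front of every write, a curl user claiming to be "jimallman" would also have to present jimallman's token.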
00:43 jimallman re: provenance vs. authentication, that's a bigger question that people are starting to focus on. so far, we've talked about handling it via structured git notes, but i suspect we'll ultimately want it "on board" in the Nexson.
00:43 dukeleto jimallman: specifying --author on commit is different than having access to push to a remote. Subtle but different
00:44 jimallman ah, i was thinking our app (local API user) would have GitHub access of its own and would do the actual commits. are you thinking about tools working outside our API?
00:45 jimallman maybe i'm missing a step here...
00:46 jimallman this comment pretty much represents my current understanding here:
00:46 jimallman https://github.com/OpenTreeOfLife/api.opentreeoflife.org/issues/28#issuecomment-28865551
00:47 jimallman i thought we'd end up with commits like the most recent two here (note the distinction between committer and author):
00:47 jimallman https://github.com/OpenTreeOfLife/opentree/commits/argus-zoom
00:48 dukeleto jimallman: you are right, our API will have its own SSH push access to GitHub
00:48 dukeleto jimallman: and yes, we will get commits which have different committer/author data
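As a rough illustration of how committer and author can differ, the sketch below builds a `git commit` invocation where the curator is passed via `--author` and the API's own identity is supplied through the `GIT_COMMITTER_NAME` and `GIT_COMMITTER_EMAIL` environment variables. The committer identity shown ("OpenTree API", `api@example.org`) and the function names are placeholders, not values from the project.

```python
import os
import subprocess

# Placeholder identity for the API process that actually runs git.
COMMITTER_NAME = "OpenTree API"
COMMITTER_EMAIL = "api@example.org"


def build_commit(author_name, author_email, message):
    """Return (argv, env) for a commit authored by the curator, committed by the API."""
    argv = [
        "git", "commit",
        "--author={} <{}>".format(author_name, author_email),
        "-m", message,
    ]
    env = dict(
        os.environ,
        GIT_COMMITTER_NAME=COMMITTER_NAME,
        GIT_COMMITTER_EMAIL=COMMITTER_EMAIL,
    )
    return argv, env


def commit_as(repo_dir, author_name, author_email, message):
    """Run the commit inside repo_dir; raises CalledProcessError if git fails."""
    argv, env = build_commit(author_name, author_email, message)
    subprocess.run(argv, cwd=repo_dir, env=env, check=True)
```

Nothing in git verifies the `--author` value, which is why the token check has to happen before this step.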
00:50 jimallman dukeleto: i suppose if someone's working outside of our API, assigning provenance is kind of on the honor system...
00:50 dukeleto jimallman: yes, not worrying about that right now. Too much other stuff to worry about :)
00:51 dukeleto jimallman: i am also thinking about concurrent writes
00:51 jimallman i think that's just the way of the world.. if someone decides to vandalize the data repo with junk, it will be a "wet cleanup". unless we restrict push access to the master branch (we can do that, right)... but then someone's responsible for being the gatekeeper.
00:52 dukeleto jimallman: if two write requests come in at essentially the same time (a few ms different), we are going to need to make one wait for the other to finish
00:52 jimallman yeah, we don't have a great story for concurrent editing of a study. i've thought about using the API to implement a simple lock, based on the presence/absence of a WIP branch
00:53 jimallman oh, you mean right-now concurrency.. hm. that seems like a more general problem. i would have guessed we'd queue them somehow, or the receiving repo would force the second one to wait, no?
00:53 dukeleto jimallman: that is a similar issue. But if Bob attempts to write to bob_study_10 and Mary attempts to write to mary_study_11 at exactly the same time, they could step on each other's toes
00:54 dukeleto jimallman: because with local git operations, we are operating on a single directory/repo
00:54 mtholder joined #otol
00:54 jimallman different authors, different studies.. what's the conflict?
00:54 jimallman oh, i see
00:55 jimallman yeah, not something a normal git repo has to worry about. it would be interesting to try a test with a shell script, kick off a few simultaneous long-running operations on a local repo and see what happens.
00:55 dukeleto jimallman: one request changes the git branch, the other switches the branch too, then the first writes data...
00:55 jimallman doh! i gotcha
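The interleaving described above can be replayed deterministically with a toy model. `FakeRepo` is a hypothetical stand-in for a single shared working directory (it does not call git); the point is only that the "current branch" is one piece of shared state, so a write lands wherever the last checkout left it.

```python
class FakeRepo:
    """Toy model of one shared working directory with a single current branch."""

    def __init__(self):
        self.current_branch = "master"
        self.files = {}  # branch name -> list of payloads written there

    def checkout(self, branch):
        self.current_branch = branch

    def write(self, payload):
        # Writes go to whatever branch happens to be checked out right now.
        self.files.setdefault(self.current_branch, []).append(payload)


repo = FakeRepo()
repo.checkout("bob_study_10")    # request 1 switches branch
repo.checkout("mary_study_11")   # request 2 switches branch before request 1 writes
repo.write("bob's edit")         # request 1 writes, but onto Mary's branch
repo.write("mary's edit")        # request 2 writes where it intended
```

After this sequence both edits sit on `mary_study_11` and nothing was written to `bob_study_10`.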
00:56 jimallman wow, that had not occurred to me. maybe we keep a pool of separate working directories? (or, god forbid, set one up for each request? i assume that's crazy talk)
00:56 jimallman but yeah, i totally get the problem now... hm.
00:57 jimallman there's gonna have to be a queue of some kind, i would think.
00:57 dukeleto jimallman: yep
00:57 jimallman so i should plan for someone having to wait for things that "ought to" be quick, like a simple Save in the curation UI.
00:59 jimallman i like the pooling idea, but since it's the Nexson repo, each working directory would be pretty hefty even if it's shallow.
00:59 dukeleto jimallman: yes. If 10 people click Save at the same time, the API will have to process them serially or use a pool of local git repos
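A minimal sketch of the "process them serially" option, assuming all requests are handled by threads in one process: a lock makes checkout-then-write a single atomic step. `SharedRepo` is a hypothetical in-memory stand-in for the working directory, and the `sleep` only simulates git taking some time.

```python
import threading
import time


class SharedRepo:
    """In-memory stand-in for one working directory, guarded by a lock."""

    def __init__(self):
        self.current_branch = "master"
        self.files = {}  # branch name -> list of payloads written there
        self.lock = threading.Lock()

    def save(self, branch, payload):
        # Hold the lock across checkout AND write so no other request
        # can switch branches in between.
        with self.lock:
            self.current_branch = branch
            time.sleep(0.01)  # pretend git is doing work
            self.files.setdefault(self.current_branch, []).append(payload)


repo = SharedRepo()
threads = [
    threading.Thread(target=repo.save, args=("study_%d" % i, "edit %d" % i))
    for i in range(10)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Ten simultaneous saves each end up on their own branch, at the cost of the last one waiting for the other nine. A `threading.Lock` only protects threads within one process; if the server runs several processes, something like a lock file would be needed instead.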
01:00 jimallman any chance this will happen without a queue? is web2py capable of handling multiple simultaneous requests, or will they be forced to wait anyway?
01:01 * jimallman hasn't looked at the local-repo branch, so i don't know if you're handing this off to a separate process or what...
01:05 dukeleto https://github.com/OpenTreeOfLife/api.opentreeoflife.org/compare/local
01:05 dukeleto not handing off to a separate process, yet
01:06 dukeleto jimallman: i assume web2py can process multiple requests at the same time, but that is just an assumption
01:06 jimallman yeah, i'm not sure (but will check in a minute)
01:06 jimallman "The web server handles each request in its own thread, in parallel."
01:07 jimallman ... though it looks like we could limit this in config, that's probably a terrible idea.
01:08 jimallman better to have a proper queue for long-running writes, and handle the safe read-only requests concurrently. though i suppose aggressive caching (as Jonathan has described) will cover that end.
01:10 jimallman recommendations from web2py book:
01:10 jimallman http://web2py.com/book/default/chapter/04#Running-tasks-in-the-background
01:11 jimallman dukeleto: the web2py scheduler sounds like a possible solution...
01:13 jimallman or the "homemade task queue" option, i suppose
01:14 jimallman ... since we're just talking one commit/branch at a time.
01:15 jimallman scheduler (with n worker nodes) would be a good match to a "pool" of working directories, would allow limited concurrent operations.
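The pool idea can be sketched with plain threads and a queue: each worker owns one working directory, so at most n writes run at once and no two writes ever share a directory. This is not the web2py scheduler API, just an illustration of the shape; `run_pool`, the directory names, and the `handle` callback are all hypothetical.

```python
import queue
import threading


def run_pool(workdirs, jobs, handle):
    """Run each job via handle(workdir, job), one worker thread per workdir."""
    tasks = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker(workdir):
        while True:
            job = tasks.get()
            if job is None:  # sentinel: no more work for this worker
                break
            out = handle(workdir, job)
            with results_lock:
                results.append(out)

    threads = [threading.Thread(target=worker, args=(d,)) for d in workdirs]
    for t in threads:
        t.start()
    for job in jobs:
        tasks.put(job)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

For example, `run_pool(["wd0", "wd1", "wd2"], save_requests, do_git_write)` would let three saves proceed concurrently while the rest wait in the queue, where `do_git_write` is whatever function performs the checkout, write, and commit inside the given directory.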
03:05 dukeleto joined #otol
04:51 jimallman joined #otol
05:43 dukeleto joined #otol
07:40 mtholder joined #otol
13:49 towodo joined #otol
15:55 jimallman joined #otol
19:17 jimallman joined #otol
