IRC log for #darcs, 2017-05-03

All times shown according to UTC.

Time Nick Message
01:48 ilbot3 joined #darcs
01:48 Topic for #darcs is now http://darcs.net/ | logs: http://irclog.perlgeek.de/darcs/ | darcs 2.12.5 is out http://darcs.net/Releases/2.12
06:11 leg joined #darcs
07:12 bf_ joined #darcs
07:36 bf_ Yesterday I talked about encoding in darcs and how I mean to tackle it. Here is a revised version of the plan I proposed.
07:41 bf_ (1) All Strings (including FilePath and the internal FileName newtype) represent Unicode, as they are meant to.
07:44 bf_ (1a) However, we allow some additional code points to represent bytes that cannot be decoded using the user's current locale. This is the encoding you get via GHC.IO.Encoding.getFileSystemEncoding.
07:46 bf_ It has the property that it allows us to roundtrip arbitrary byte sequences to String and back, i.e. forall s :: ByteString. encode (decode s) == s
07:52 bf_ (2) All ByteStrings are assumed to be arbitrary sequences of bytes with no fixed encoding. We can convert them to Strings and back using the above encoding scheme.
07:59 bf_ (2a) ...but this should only be done if necessary.
08:02 bf_ Implementing this requires lots of invasive changes and it will be hard to guarantee that no breakage ensues. So we must very carefully test, especially with old repositories. Perhaps we can use the repos on hub.darcs.net as test cases.
08:05 Heffalump if a ByteString has no fixed encoding, how do we convert it to a String that is unicode?
08:59 bf_ Using the user's locale with the extra //ROUNDTRIP feature. This is the default encoding for command line arguments and environment variables. We make it the default for (String-based) file IO, too, via setLocaleEncoding.
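The roundtrip scheme bf_ describes can be sketched with GHC's own machinery (an illustrative sketch, not darcs code; decodeLocale and encodeLocale are hypothetical names, built only from base and bytestring):

```haskell
import qualified Data.ByteString as B
import GHC.IO.Encoding (getFileSystemEncoding)
import GHC.Foreign (peekCStringLen, withCStringLen)

-- Decode arbitrary bytes to String via the locale encoding as returned
-- by getFileSystemEncoding: bytes that cannot be decoded become special
-- code points instead of raising an error.
decodeLocale :: B.ByteString -> IO String
decodeLocale bs = do
  enc <- getFileSystemEncoding
  B.useAsCStringLen bs (peekCStringLen enc)

-- Encode back; the special code points turn into the original bytes
-- again, so encodeLocale after decodeLocale is the identity on bytes.
encodeLocale :: String -> IO B.ByteString
encodeLocale s = do
  enc <- getFileSystemEncoding
  withCStringLen enc s B.packCStringLen
```

Even a byte like 0xFF, which is invalid in UTF-8 locales, survives this roundtrip unchanged.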
08:59 bf_ As I said, the result may not be strictly conforming to unicode but I think this shouldn't bother us.
09:01 bf_ Of course: IO from and to ByteStrings should be done using the IO functions from bytestring. So no encoding or decoding here.
09:02 bf_ Ask yourself: where do raw bytes (not strings) come from? Well: either from user interaction (command line, mostly) or from file IO.
09:04 bf_ The file IO we either control ourselves (internally used file to represent some state) or it is user data. The latter we already handle in a completely opaque manner.
09:05 bf_ (That's why Darcs.Util.Tree calls them Blob)
09:10 bf_ Let's take an example. A user adds and records a file. Another user with another locale edits it and also records. This should just work.
09:12 bf_ A question is, perhaps, how the second user sees the file name. It may look strange, but that can't be helped, I think.
09:12 bf_ It will look strange when he uses the ls command, too.
09:14 bf_ In all files that are used internally we convert the file name back to ByteString (modulo the special white space encoding).
09:30 lambdabot joined #darcs
09:47 Heffalump what about things like patch metadata (commit messages, author, etc). Don't those also come from user interaction?
10:48 bf_ They do. And I think they should be put into the patch file without tampering with them. (Except perhaps wrt newlines: it might make sense to use unix standard for that.)
10:49 bf_ What I mean with no tampering: put the bytes into the file as we get them from the system.
10:54 bf_ Another option for metadata would be to try UTF8 and use the raw bytes only if that fails. This would be a bit more user friendly if a repo is shared between people who use different locales.
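That fallback could look like this (a hypothetical helper, not existing darcs code; decodeUtf8' is from the text package):

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8')

-- Try to interpret metadata bytes as UTF-8; if that fails, fall back to
-- the raw bytes, mapping each byte to the Char with that code point.
decodeMetadata :: B.ByteString -> String
decodeMetadata bs = case decodeUtf8' bs of
  Right t -> T.unpack t
  Left _  -> BC.unpack bs
```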
10:55 bf_ (With 'use' I meant 'store'.)
10:59 bf_ I think this is more or less what darcs tries to do now with the lenient version of UTF8 encoding.
11:03 Heffalump I think the current behaviour is to use UTF8 in the patch files
11:03 Heffalump (I think that's also the right thing to do, as it's more of a "normal form")
11:06 bf_ Storing patch meta data as UTF8 is a bit more complicated to implement. We would need at least another data constructor for Printables in Darcs.Util.Printer.
11:06 Heffalump but we already do it
11:06 Heffalump if I'm right, that is
11:09 bf_ I think you are (probably) right. I was thinking about how that would change my current plan.
11:10 bf_ The problem is that we mix all sorts of data in a Doc. Your RenderMode(Standard,Encode) addition is at the wrong end: we need to decide when we convert something to a Doc.
11:10 bf_ Because that is the place where we know what data we have.
11:14 bf_ BTW, all my current plans assume that ASCII is a subset of the user's encoding. I think nowadays we can just assume that.
11:18 bf_ ...and I can't promise the idea will work. I am currently experimenting with it and it looks as if it might work. But it needs a lot more testing with various locale settings to be sure.
11:21 bf_ At the moment, I wonder what to do with things like prefs. These are text files, in principle, but we treat them as binary (raw bytes).
11:25 bf_ I think I will byte the bullet and convert the code to use ByteString for such data. Or else use the roundtrip locale encoding but without newline translation.
11:25 bf_ s/byte/bite/ :)
11:35 bf_ Thinking about this a bit more, I am now leaning toward treating these as normal text files and using the native encoding. These files are meant to be edited by the user and (AFAIK) not shared between repositories on different machines.
11:42 bf_ (except prefs/email, this is a special case where we can re-create the file when we clone)
12:34 bf_ joined #darcs
12:57 Heffalump prefs can be moved under version control, but I think then the user edits them manually so they're like any other file
12:58 Heffalump the general confusion over what things are where is why I wanted to do this with types: annotate things on input with what encoding we know they are, and annotate on output with what encoding we expect. Then fix all the type errors.
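A minimal version of that type-driven approach might look like this (illustrative newtypes only, not darcs's actual types):

```haskell
import qualified Data.ByteString as B
import Data.Text.Encoding (decodeUtf8')

-- Tag data at the input boundary with what we know about its encoding;
-- the type checker then flags every place where the two kinds are mixed.
newtype RawBytes = RawBytes B.ByteString  -- encoding unknown
newtype Utf8     = Utf8 B.ByteString      -- known-valid UTF-8

-- Crossing from one to the other must go through an explicit check.
toUtf8 :: RawBytes -> Maybe Utf8
toUtf8 (RawBytes bs) = case decodeUtf8' bs of
  Right _ -> Just (Utf8 bs)
  Left _  -> Nothing
```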
13:11 bf_ Heffalump: I agree with the type idea. I am actively using it.
13:12 bf_ I just wanted to avoid layering yet another level of types over an inconsistent code base.
13:16 bf_ The first distinction to be made is between ByteString and String. I have also tried to concentrate handling of white space encoding to the Name newtype, and have made Name into an abstract type.
13:19 bf_ I have not yet changed anything wrt metadata encoding. Since we use ByteString when reading and writing patch data, it does not collide with what I have done so far.
14:22 jeltsch joined #darcs
14:22 jeltsch What are these “packs” on darcs hub?
14:23 jeltsch There is a button “build packs”.
14:35 sm hi jeltsch. A little known darcs feature which is supposed to speed up darcs get
14:36 jeltsch sm: Are they described in the user manual?
14:37 alqatari joined #darcs
14:45 sm jeltsch: don't know, I'd guess no
14:45 sm and what does the user manual mean these days exactly
14:46 jeltsch sm: What do you mean? I always relied on the user manual being authoritative.
14:46 sm where do you find it ?
14:46 sm this one: http://darcs.net/manual/bigpage.html ?
14:46 jeltsch http://darcs.net/manual/
14:47 jeltsch Oh, it says 2014-02-06 at the bottom. :-O
14:47 sm yes, note how right at the top it says 2.8.4 (+ 1 patch), which raises questions :)
14:47 jeltsch What the f***.
14:47 jeltsch This is a serious issue.
14:47 sm I believe the most up to date manual(s) is built in to the program now
14:47 jeltsch What do you mean?
14:48 sm darcs help, darcs help markdown etc.
14:48 jeltsch Hmm.
14:48 sm there was some discussion of this situation here a few weeks back I think
14:48 jeltsch And where are the packs documented there?
14:48 sm I don't say they are, you could search
14:49 jeltsch Where should I search? Is there a subcommand “pack”?
14:49 sm sorry to be unhelpful, I don't know the answer
14:49 jeltsch You probably know how to make use of these “packs”, that is, what subcommands to use. ;-)
14:49 sm I'm thinking search the output of darcs help markdown, which I believe is supposed to be the whole manual
14:49 jeltsch Very good.
14:50 sm packs are a concept used in other vcs's, it's a repo optimization which should make a cold darcs get more efficient - transferring things in fewer larger chunks
14:50 sm they are generated by darcs optimize maybe ? which should be in the manual, actually
14:52 sm darcs optimize http --help
16:16 pointfree sm: If the front page of darcshub becomes the user feed when logged in, perhaps the darcshub faq + darcs manual could be organized into a help section of darcshub.
16:23 alqatari joined #darcs
16:36 alqatari joined #darcs
16:52 drostie joined #darcs
17:13 sm maybe pointfree
17:14 sm integrating darcs help nicely into darcs hub is a nice thought
17:15 sm oh oh my old english teacher would send bolts of lightning if she saw how much I'm using "nice"
17:19 pointfree hahas :)
17:56 bf_ Heffalump: I am down to 5 test failures. 4 of them are rebase tests, I'll care about them later. The one that bugs me is issue1763-pull-fails-on-non-ascii-filenames.sh. It seems I have re-introduced this issue.
18:10 Heffalump I'm surprised rebase tests really care about encoding
18:16 amgarchIn9 joined #darcs
18:32 leg joined #darcs
18:38 bf_ Heffalump: can you help me with this:
18:38 bf_ formatFileName OldFormat = packedString . fn2ps
18:38 bf_ formatFileName NewFormat = text . encodeWhite . fn2fp
18:38 bf_ This is the original code from Darcs.Patch.Show
18:39 bf_ fn2ps (FN fp) = packStringToUTF8 $ encodeWhite fp
18:40 bf_ This was from Darcs.Util.Path
18:41 bf_ I am struggling to understand the difference and what old and new format refer to
18:41 Heffalump I'll take a look
19:06 Heffalump I don't remember anything about this. OldFormat seems to be V1 patches, NewFormat V2 patches (but it's not completely clear from the code without a lot of tracing). It's not immediately obvious how they differ in behaviour either. Perhaps QuickCheck could tell us.
19:09 gh_ joined #darcs
19:11 gh_ left #darcs
19:40 bf_ Sigh. I was hoping you could shed some light on this.
19:44 bf_ The (only) difference between the formats is that for OldFormat we do packStringToUTF8 (inside fn2ps) while for NewFormat we don't (fn2fp just unwraps the newtype)
19:46 bf_ This can be seen by some manual inlining.
19:50 Riastradh joined #darcs
20:57 drostie joined #darcs
21:24 bf_ I have fixed the problem I had with the non-ascii-filename test: I had forgotten to adapt readFileName so it conforms to the new rules.
21:26 bf_ But it only works with darcs-2. With darcs-1 I get encoding errors. Looking into what an unmodified darcs does here when using the darcs-1 format, I am baffled.
21:30 bf_ The test sets LC_ALL=C, then records a (utf8 encoded, non-ascii) file. The patch then contains the file name with four bytes per non-ascii char. Excerpt from zcat ...|hexdump -C:
21:30 bf_ 00000050  66 69 6c 65 20 2e 2f 6b  69 74 c3 83 c2 b6 6c 74  |file ./kit....lt|
21:32 bf_ Here, the "...." is the byte sequence c3 83 c2 b6 and stands for "ö"
21:32 bf_ wtf?
21:32 drostie joined #darcs
21:34 bf_ With darcs-2 it is 00000050  66 69 6c 65 20 2e 2f 6b  69 74 c3 b6 6c 74 c3 a9  |file ./kit..lt..|
22:08 jeltsch joined #darcs
22:10 bf_ The way darcs stores filenames in darcs-1 format (OldFormat) is pretty, ahem, interesting. The encoding is (assuming no spaces) done by UTF8 encoding the raw byte sequence interpreted as unicode code points.
22:13 bf_ This is why 'ö' in utf8 encoding gets 4 bytes in darcs-1 patches: the utf8 encoding has 2 bytes, which we utf8 encode to 2 bytes each.
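The double encoding is easy to reproduce (an illustrative snippet, not darcs code, using only bytestring and text): treat each raw byte as a code point and UTF-8 encode the result.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)

-- BC.unpack maps each byte to the Char with that code point; encoding
-- that String as UTF-8 then expands every byte >= 0x80 to two bytes.
doubleEncode :: B.ByteString -> B.ByteString
doubleEncode = encodeUtf8 . T.pack . BC.unpack
```

Applied to the UTF-8 bytes of 'ö' (c3 b6), this yields c3 83 c2 b6, exactly the four bytes seen in the darcs-1 patch above.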
22:15 bf_ I can emulate this in my scheme by doing this backwards using Data.ByteString.Char8.pack/unpack plus en/decodeLocale//ROUNDTRIP.
22:15 bf_ It is still a pretty weird format IMHO.
22:25 dleverton joined #darcs
22:26 dleverton joined #darcs
22:32 lambdabot joined #darcs
22:32 bf_ I just found out that issue2382-mv-dir-to-file-confuses-darcs.sh fails for darcs-1 format.
22:33 bf_ (in the HEAD)
23:45 bfrk joined #darcs