
IRC log for #darcs, 2013-04-11


All times shown according to UTC.

Time Nick Message
00:16 mizu_no_oto joined #darcs
01:05 alexei joined #darcs
01:46 javier_rooster joined #darcs
02:50 preflex_ joined #darcs
02:54 mizu_no_oto joined #darcs
03:09 mizu_no_oto joined #darcs
03:11 carter joined #darcs
05:32 edwardk joined #darcs
06:12 alexei joined #darcs
06:18 lelit joined #darcs
07:31 alexei joined #darcs
07:46 raichoo joined #darcs
07:55 edwardk joined #darcs
08:05 gal_bolle joined #darcs
09:23 donri joined #darcs
09:35 owst joined #darcs
10:18 owst joined #darcs
10:44 Thaalos joined #darcs
12:32 edwardk joined #darcs
12:43 mizu_no_oto joined #darcs
12:45 delamonpansie joined #darcs
12:57 iago joined #darcs
13:21 mizu_no_oto joined #darcs
13:32 nomeata joined #darcs
13:54 benmachine joined #darcs
13:55 benmachine I'm backing up my hard drive and rsync took serious issue with my .darcs/cache/patches directory
13:55 benmachine even ls struggles with it
13:55 benmachine would it possibly be a good idea to break it up arbitrarily into subdirectories?
13:55 benmachine or indeed does this already happen and I'm just doing it wrong somehow
13:59 vikraman joined #darcs
14:01 javier_rooster joined #darcs
14:05 owst benmachine: it doesn't happen, you're not doing it wrong :-)
14:05 benmachine it contains about 31 thousand files
14:07 mndrix joined #darcs
14:07 owst Yeah, I have 38 thousand files and ls baulks at it
14:08 benmachine http://bugs.darcs.net/issue1624 oh hello
14:14 * benmachine pushes it to the back of his todo list
14:14 benmachine it may or may not ever see the light of day
14:19 gal_bolle joined #darcs
14:19 MasseR joined #darcs
14:24 mornfall A patch for bucketing of darcs cache was floating around somewhere.
14:24 mornfall It's been years though.
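The bucketing idea raised above can be sketched roughly as follows. This is a minimal illustration only: it is not the patch mornfall mentions and not darcs's actual cache layout. The function name `bucketed_path` and the two-character prefix are assumptions, and the sketch further assumes that cache file names start with evenly distributed hex characters.

```python
from pathlib import PurePosixPath


def bucketed_path(cache_dir: str, patch_name: str, width: int = 2) -> PurePosixPath:
    """Map a flat cache entry to cache_dir/<prefix>/<patch_name>.

    Hypothetical helper: the prefix is simply the first `width`
    characters of the file name, so entries spread over several
    subdirectories instead of one huge directory.
    """
    if len(patch_name) < width:
        raise ValueError("patch name shorter than bucket prefix")
    return PurePosixPath(cache_dir) / patch_name[:width] / patch_name


# A made-up file name, used only for illustration.
print(bucketed_path(".darcs/cache/patches", "ab12cd"))
```

Under those assumptions, a two-character hex prefix gives 256 buckets, so about 31 thousand files would come to roughly 120 per subdirectory.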
14:26 mornfall owst: We need to talk, btw. :-)
14:27 mornfall There's some pressure to update darcs-fastconvert.
14:27 mornfall Including patches. :-P
14:27 mizu_no_oto joined #darcs
14:28 mornfall owst: Did you talk to Niklas Hambüchen?
14:30 owst Hey mornfall! Yeah, I know.
14:31 owst (re: pressure)
14:31 owst I didn't talk to whoever that is.
14:31 mornfall He sent me patches against http://darcsden.com/mornfall/darcs-fastconvert to make things compile with current GHC
14:31 mornfall Which is a fork of your repo that no longer exists. :-)
14:32 owst hah, oh good :-)
14:32 mornfall Also, he published the result as https://github.com/nh2/darcs-fastconvert
14:33 owst Gah. It won't work like it is.
14:33 owst The git-merge handling is just wrong
14:33 owst in fact, most of that code that I wrote is utter crap :-)
14:34 owst I'm going to have lots more free time very soon, so I really will finally get it all sorted out.
14:37 gbeshers joined #darcs
14:39 mornfall owst: What's your mail?
14:41 mornfall Nvm, found.
14:43 owst mornfall: cool, thanks
14:44 mornfall One of those annoying things about Haskell's success avoidance strategy is that things bitrot incredibly fast. :|
14:45 benmachine the price of innovation :P
14:45 mornfall I guess. :P
14:45 mornfall Although I'm still not happy about the innovative way haskell programs now crash when fed invalid unicode input.
14:46 mornfall I have always thought that was unique to python.
14:48 benmachine yeah, that kind of sucks, but I take it as a lesson about how haskell's standard IO library is kind of naff altogether
14:48 benmachine there are lots of better replacements, but no-one will settle on a single one
14:51 mornfall Yeah, but it still means that small haskell programs suck by default.
14:52 benmachine mm
14:52 mornfall It makes me unhappy and it makes me write perl programs instead of haskell programs whenever IO is involved.
14:52 benmachine well, there's always Data.ByteString.Char8 :)
14:53 benmachine for when you want your unicode behaviour to not be "crash on invalid" but "just plain wrong" instead
14:54 mornfall Interestingly, all 8-bit clean programs in all sane languages happen to be doing the right thing with UTF8 since like 80s... :P
14:55 mornfall They also happen to work with all those fancy ISO and Windows encodings of CE characters.
14:55 mornfall Only Haskell and Python programs crash instead.
14:56 benmachine what do you consider to be the "right thing"?
14:56 benmachine I mean, I've recently started learning OCaml and aiui the unicode situation there is somewhat close to "put fingers in ears and shout la la la everything is ascii"
14:56 dolio joined #darcs
14:59 mornfall Try writing ls in haskell. :) Or grep.
15:00 benmachine try writing grep in *anything* and actually getting it right, I suspect this will not be easy :P
15:00 benmachine (I'm thinking that grep has patterns for uppercase and lowercase, but I might be wrong about that)
15:00 benmachine (if it does then you need to do unicode properly and also locale and oh my)
15:08 owst Err gal_bolle Heffalump sm, where do I push/send patches to the wiki?
15:26 lelit does perl really handle unicode better than python?
15:27 * lelit is happily coding his first Py3 app
15:54 donri unicode is broken in python2, fixed in python3, but broken in python3 libraries
15:55 donri stdlibs that is. basically they use unicode in places where no encoding is known or specified
16:13 benmachine donri: I thought encodings were things you needed when you wanted to turn stuff into bytes
16:13 benmachine or back
16:19 mornfall benmachine: Yes, but treating input as locale-encoded, or as unicode, or anything like that is simply broken.
16:20 mornfall You should be allowed to work with bytes.
16:20 mornfall And it should arguably be very easy by default.
16:20 donri yes, it makes an assumption like "locale-encoding" or "utf-8" or something like that in cases where no encoding is known or specified
16:21 donri haskell has the same problem in some cases if you use String
16:21 mornfall Encodings are always tricky, and unless you code for them explicitly, your program is probably broken. If you treat everything as bytes, it'll most likely work.
16:21 lelit doing it right is not trivial, in any env I met
16:21 mornfall lelit: It's tricky because there simply is no "right".
16:21 lelit but I would say "broken"
16:21 lelit yes
16:22 lelit I would *not* say "broken" I meant
16:22 lelit with some love and experience, you can deal with it with proper results
16:23 mornfall Well, if you use the python/haskell approach, it's extremely likely that you will never notice a problem, but the program will mangle data for your users.
16:23 mornfall It will also sometimes crash.
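The two failure modes described here, crashing and silent mangling, can be reproduced in a few lines of Python 3. This is a small sketch with made-up sample data; `decode_crashes` is a hypothetical helper written for this illustration.

```python
# The same word encoded two ways.
latin1_bytes = "café".encode("latin-1")  # b'caf\xe9'
utf8_bytes = "café".encode("utf-8")      # b'caf\xc3\xa9'


def decode_crashes(data: bytes) -> bool:
    """Return True if strict UTF-8 decoding of `data` raises."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# Crash: legacy-encoded input fed to a strict UTF-8 decoder.
print(decode_crashes(latin1_bytes))

# Mangle: UTF-8 input decoded with the wrong legacy codec succeeds
# without any error, but the text is now wrong.
mangled = utf8_bytes.decode("latin-1")
print(mangled)

# Staying in bytes: a byte-level search works on either input
# without knowing the encoding.
print(b"caf" in latin1_bytes, b"caf" in utf8_bytes)
```

The byte-level search at the end is the "8-bit clean" behaviour argued for earlier: no decoding step, so nothing to crash on and nothing to mangle.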
16:23 lelit with py2 yes, that happens
16:23 lelit with py3, well, it's much harder :)
16:27 mornfall Well, it's a sad state of affairs when sendmail, a monster from the 80s, can deliver arbitrarily encoded emails without screwing them up, while "modern" programs in modern languages will screw you over semi-randomly.
16:28 javier_rooster joined #darcs
16:29 lelit well, but sendmail doesn't perform much "text processing" after all
16:29 donri lelit: well. if the library decodes the bytes with the wrong encoding, your only hope is to encode it back to bytes with the same encoding, which you then must know, and now you're doing lots of redundant encoding operations. also the library might treat the data as text when it really isn't text, in which case you can't even work around it.
16:30 lelit yes. but in that case, the library is broken
16:30 donri yes, which was what i said originally :)
16:31 donri python3 as a language does unicode well enough. the stdlib is broken in several places.
16:31 lelit either you know the encoding, or you don't. in the latter case, there's little a programming language can do to fix your problem :)
16:31 donri (the situation was reversed from python2)
16:32 donri lelit: the library can have a bytestring based api.
16:32 mornfall lelit: When you don't know the encoding, the programming language can at least keep from re-encoding things at random behind your back...
16:32 lelit can you suggest an example?
16:32 donri if it has a text api when no encoding is known or specified, it will have to decode the bytes with a potentially wrong encoding
16:33 lelit uhm
16:33 donri lelit: this isn't really my own experience, but something i heard from armin ronacher. in particular i think he mentioned the python3 urllib
16:33 mornfall I guess in given circumstances, the best we can hope for is that we never encounter another legacy-encoded file, or a filename, again.
16:33 lelit I still think that is a problem with the *user* of the library
16:34 mornfall Sadly, they keep coming back. Over and over again. :(
16:34 lelit oh yes
16:34 lelit I keep saying that the real y2k problem was not that of centuries in the dates, but rather with the codecs :)
16:35 donri lelit: no. if i give the url library a unicode str, it will have to encode it to bytes to send it as a HTTP request for example. but it has no way of knowing what encoding to use, so it will guess. better would be to take a bytestring url and just send it as-is.
16:35 lelit they distracted (and invested!) a silly amount of $ on the wrong problem
16:35 donri lelit: this is not the user's problem, it's a broken library
16:36 donri the same is true for String in haskell
16:39 lelit I think this is true for *any* language, you cannot use unicode URLs
16:40 Heffalump owst: wiki-author@darcs.net:wikidata
16:40 lelit http://www.w3schools.com/tags/ref_urlencode.asp
16:40 lelit http://stackoverflow.com/questions/11818362/how-to-deal-with-unicode-string-in-url-in-python3
16:40 lelit but going OT :)
16:42 donri lelit: yes, but the python api only accepts unicode. that's the problem, that's why it's broken.
16:42 whaletechno joined #darcs
16:50 benmachine donri: I think there's a general problem whereby whether or not some piece of data is bytes or a string is subtle, and lots of people have screwed it up in the past
16:51 benmachine aiui filepaths are bytes on many Linux systems and strings on Windows
16:51 donri yep
16:52 mornfall Yes. They are also different types of strings depending on the filesystem, on windows.
16:52 mornfall Basically, windows is just broken.
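For the "filepaths are bytes" case, Python 3 offers the `surrogateescape` error handler, which lets undecodable bytes survive a trip through `str` and back. The sketch below applies it explicitly to a made-up file name; on POSIX systems `os.fsdecode` and `os.fsencode` use the same handler with the filesystem encoding.

```python
# A file name that is valid Latin-1 but not valid UTF-8 (sample data).
raw_name = b"caf\xe9.txt"

# Undecodable bytes are smuggled through as lone surrogate code points.
as_text = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = as_text.encode("utf-8", errors="surrogateescape")
print(round_tripped == raw_name)

# A strict encode of the same string fails, so the escaped text cannot
# be written out as real UTF-8 without a deliberate decision.
try:
    as_text.encode("utf-8")
except UnicodeEncodeError:
    print("strict encode rejected the escaped name")
```

This only preserves the bytes; it does not tell the program what the name was meant to say.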
16:52 benmachine oh man, excellent
16:52 lelit hehe
16:52 benmachine anyway, I guess what py3 does is forces you to choose, and opens the possibility of you choosing wrongly
16:52 benmachine but at least it also opens the possibility of you choosing correctly
16:52 benmachine which is not always present :P
16:53 donri no, the opposite. python3 makes the choice for you
16:53 alexei joined #darcs
16:53 lelit well, I guess you could have the same problem on unixes, especially on distributed filesystems like nfs and the like
16:53 benmachine I thought we agreed that the language was good but the library was bad :P
16:53 donri yes, the library
16:53 donri that's where it matters
16:54 lelit it does not make a choice
16:54 benmachine how about this: python exposes the fact that its libraries are broken
16:54 benmachine whereas most languages successfully cover it up :P
16:54 donri well in python2 the language was broken but the libraries ok
16:55 donri so not sure what you mean
16:55 benmachine how can libraries of a broken language be ok?
16:55 lelit donri: I doubt it, at least for the referenced urllib
16:56 donri benmachine: because they take bytestrings
16:56 benmachine I suppose it was technically possible to write correct py2 programs, because the unicode type existed
16:56 donri the language was broken because if you gave something that wanted bytes unicode, it would auto-encode as ascii. but you could still pass proper bytes to the functions
16:57 donri in python3 the library expects unicode, so the only way to get the right bytes to pass through is to decode the bytes into a unicode string, globally change the locale and then call the function with the unicode string
16:58 donri that's all assuming there even is a defined encoding for those bytes
17:04 lelit isn't it ascii?
17:05 mornfall ascii is just 7 bits
17:05 lelit yes
17:05 lelit that's all you can put in a URL, isn't it?
17:06 mornfall Depends on who does the urlencoding.
17:06 donri lelit: ok, so what to do if you give it unicode characters?
17:06 mornfall Also, it depends on what you are talking to.
17:06 lelit quote it
17:07 lelit that is, an ancient and particular codec :)
17:07 mornfall lelit: As long as the urllib doesn't urlencode itself, it should be OK, but if it does, you are screwed.
17:07 mornfall (Since you'll just end up with a double-quoted string then.)
17:07 lelit it does not, that's why I said "it does not make a choice"
17:07 donri i'm not sure urlencode supports unicode
17:08 mornfall donri: urlencode supports arbitrary bytes
17:08 donri i think all you can do is encode the unicode to bytes and then urlencode the bytes
17:08 donri then we're back to the original problem
17:08 mornfall But I don't see what's the point of feeding unicode into urllib if it only accepts ascii anyway.
17:08 donri exactly
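The points about urlencoding can be checked with `urllib.parse.quote` from the Python 3 standard library, which accepts both `bytes` and `str`. A short sketch with a single sample character:

```python
from urllib.parse import quote

text = "é"

# Percent-encoding bytes is unambiguous: the caller chose the encoding.
as_utf8 = quote(text.encode("utf-8"))      # '%C3%A9'
as_latin1 = quote(text.encode("latin-1"))  # '%E9'

# Given a str, quote() has to pick an encoding; it defaults to UTF-8.
guessed = quote(text)                      # '%C3%A9'

# Quoting an already-quoted string escapes the '%' signs again,
# which is the double-quoting problem mentioned above.
double = quote(as_utf8)                    # '%25C3%25A9'

print(as_utf8, as_latin1, guessed, double)
```

The same character therefore yields different URLs depending on which encoding is applied first, and only the bytes-based calls make that choice visible at the call site.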
17:09 lelit yes, I see the point
17:10 lelit but again, the problem manifests itself on every "boundary", be it the net, or a remote filesystem...
17:10 lelit you simply must know what you are doing
17:11 donri well the solution is trivial: accept bytes (when no encoding is known or specified) and force the user to encode and thus choose an encoding
17:11 donri of course dynamic typing makes that harder to specify, and even in haskell people insist on adding Char8 apis
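The "accept bytes and force the user to encode" design can be approximated in a dynamically typed language with a runtime check. The function `build_request_line` below is purely hypothetical and is not part of urllib or any other real library; it only illustrates the shape of such an API.

```python
def build_request_line(path: bytes) -> bytes:
    """Build an HTTP/1.0 request line from an already-encoded path.

    Hypothetical API: text is rejected outright, so the caller has to
    pick an encoding (and do any percent-encoding) explicitly.
    """
    if not isinstance(path, (bytes, bytearray)):
        raise TypeError("path must be bytes; encode it explicitly first")
    return b"GET " + bytes(path) + b" HTTP/1.0\r\n"


# The caller decides the encoding; the library sends the bytes as-is.
print(build_request_line("/café".encode("utf-8")))
```

Passing a `str` raises `TypeError` instead of guessing, which is the behaviour donri is asking for.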
17:14 raichoo joined #darcs
17:17 alexei joined #darcs
17:57 edwardk joined #darcs
18:20 lelit joined #darcs
18:23 edwardk joined #darcs
19:13 owst joined #darcs
19:26 mizu_no_oto joined #darcs
20:06 sm I've revived the darcs hub board that's in the Darcs organisation on trello, and made it public
20:06 sm https://trello.com/board/5065e0710bca2ef358849d00 , https://trello.com/darcs
20:07 sm both the boards are public, but the org overview page is not, we should probably change that
20:07 sm I think only kowey can
20:08 sm @tell kowey hey can you make the http://trello.com/darcs org page public ?
20:08 lambdabot Consider it noted.
20:28 javier_rooster joined #darcs
21:32 edwardk joined #darcs
21:43 javier_rooster joined #darcs
22:15 owst joined #darcs
22:43 owst Heffalump: cool, thanks.
22:59 saep joined #darcs
23:00 dolio joined #darcs
23:50 mizu_no_oto joined #darcs
