Perl 6 - the future is here, just unevenly distributed

IRC log for #parrotsketch, 2008-07-15


All times shown according to UTC.

Time Nick Message
01:36 daxelrod joined #parrotsketch
12:43 wknight8111 joined #parrotsketch
13:05 wknight8111 joined #parrotsketch
13:36 wknight8111_ joined #parrotsketch
14:29 cognominal joined #parrotsketch
15:39 cognominal joined #parrotsketch
15:54 pmichaud joined #parrotsketch
16:38 tewk_ I'm not going to make the meeting, I'll backlog like I always do.
16:38 tewk_ REPORT
16:39 tewk_ *  Talked with japhb about ManagedStruct definitions generation.
16:39 tewk_ That wasn't in my original plan, but it seems very useful.
16:39 tewk_ I'm going to try to fit it in instead of STDCALL and porting to OSX.
16:39 tewk_ STDCALL support is a lower priority item than ManagedStruct support.
16:39 tewk_ *  The Parrot_jit_build_call_func is completely out of date. It's pre PCC.
16:39 tewk_ It still used hardcoded register numbers for argument and object passing.
16:39 tewk_ *  I've rewritten most of the argument passing portion of the build_call function and
16:39 tewk_ now am starting on handling return values from c functions.
16:39 tewk_ *  The old generated jit called string_to_cstring to convert STRINGS to char *, but never freed the cstrings.
16:39 tewk_ So I'm working on generating the necessary free calls in the jit instruction stream.
16:39 tewk_ EOR
16:43 tewk_ Instead of saying pre PCC, it's probably better to say pre variable register counts, when certain registers had particular calling convention purposes.
16:58 jhorwitz joined #parrotsketch
17:43 barney joined #parrotsketch
17:45 cotto_work joined #parrotsketch
18:06 Auzon joined #parrotsketch
18:12 wknight8111 joined #parrotsketch
18:24 allison joined #parrotsketch
18:28 NotFound joined #parrotsketch
18:29 DietCoke joined #parrotsketch
18:30 DietCoke Hello, folks.
18:30 allison hello, coke
18:30 wknight8111 hello
18:30 chromatic joined #parrotsketch
18:31 NotFound Hola
18:31 cotto_work hi
18:31 chromatic morning
18:32 pmichaud good afternoon
18:32 barney hi
18:32 davidfetter óla
18:32 DietCoke My report basically consists of fighting with svn merge for some time this week, and learning enough to drop my from-scratch attempt at tcl on PCT. Why don't we go in "hello" order. Allison?
18:33 allison - I've got the pdd25 branch down to 3 failing test files, and I've nearly finished debugging another.
18:33 allison - Spent time talking to potential Parrot sponsors.
18:33 allison - (Spent lots of time on OSCON.)
18:33 allison EOR
18:34 wknight8111 * my $real_job = I_got_a_new_job($me++)
18:34 wknight8111 * Lots of GC debugging work, some nasty segfaults to track down
18:34 wknight8111 * Progress slow but steady
18:34 wknight8111 EOR
18:34 pmichaud (steady progress)++
18:34 NotFound Fixing bugs, applying patches, and working on pdb
18:35 NotFound I have two questions
18:35 NotFound EOR
18:36 chromatic Traveling; not a lot of time.
18:36 chromatic Helped Allison debug one of the remaining pdd25cx problems.
18:36 chromatic Giving Andrew as much help as possible.
18:37 wknight8111 chromatic++
18:37 chromatic Working on some things that weren't appropriate to land before the release.
18:37 chromatic Will branch for the strings PDD shortly; going to pull in NotFound for that.
18:37 chromatic EOR
18:37 jonathan joined #parrotsketch
18:37 cotto_work mostly PMC-related bugfixes and closed tickets
18:37 cotto_work queue 1 question
18:38 cotto_work eor
18:38 pmichaud I worked mostly on lexical issues this week, trying to understand them and coming up with a way that Parrot can handle them properly
18:38 pmichaud I also tracked down the PGE bugs in the pdd25cx branch -- that appears to be a register alligator bug
18:38 pmichaud did a little more work on getting HLL to work with PCT -- several of chromatic++'s fixes this past week help there
18:39 pmichaud other than that, no formal report this week -- busy with $otherjob stuff and preparing for lots of trips
18:39 wknight8111 (register alligator)++
18:39 spinclad (alligator bug)--
18:39 barney Simplified and extended the Pipp grammar.
18:39 barney Started support for $this in Pipp.
18:39 barney Better support for quoted strings in Pipp.
18:39 barney Fixed bug with constant table.
18:39 barney Added some languages to Perl::Critic testing.
18:39 barney Released Parrot 0.6.4.
18:40 barney Registered for YAPC::EU
18:40 barney .eor
18:42 DietCoke davidfetter? If not, particle.
18:42 davidfetter particle
18:42 * davidfetter slacker
18:42 particle ~ tewk pasted his report earlier, see the logs.
18:42 particle ~ meetings to discuss smoking parrot with ms osl keep getting canceled. hope third time is a charm (post-oscon)
18:42 particle ~ parrot foundation setup continues (banking issues, website, donation software, charitable org listing, etc)
18:42 particle ~ talking to some potential parrot foundation donors tonight
18:42 particle ~ need to finalize plans for oscon travel, and quickly!
18:42 particle .end
18:43 DietCoke Anyone else?
18:43 jonathan Mee!
18:43 jonathan This week...
18:43 jonathan * Nearly forgot Parrot Sketch! :-O
18:43 jonathan * Spent my time on my Rakudo day on Thursday mostly on implementing Perl 6 enums; the main bits are done now.
18:43 jonathan * Figured out along the way that anonymous classes should be relatively easy-ish
18:43 jonathan * Tried to contribute to the lexicals discussion a bit, though was struggling for brain cycles
18:43 jonathan * Other random odds and ends too; think I fixed a segfault due to an off-by-one...for some reason my brain is all hazy at the moment.
18:43 jonathan * Will do Rakudo day on Friday this week, since it fits best
18:43 jonathan EOR
18:43 * jonathan needs to be afk for a bit now - sorry
18:44 DietCoke heh. Anyone else?
18:45 DietCoke ok. I think there were 2 folks with questions.
18:45 DietCoke NotFound?
18:45 NotFound First question is simple: new name for pdb. Last proposal is pbc_debug
18:45 chromatic +1
18:45 allison bytecode_debug
18:45 chromatic Whose bytecode?
18:45 pmichaud pbc_debug +1
18:46 allison parrot_bytecode_debug
18:46 allison pbc is cryptic
18:46 pmichaud parrot_bdb  :-)
18:46 allison parrot_debug
18:46 pmichaud pbcdb
18:46 NotFound parrot_debug++
18:47 particle parrot_debugger++ or parrot_debug++
18:47 barney also: pdump -> pbc_dump    ??
18:47 allison in theory, it can also debug pasm or pir, so parrot_ makes sense
18:47 NotFound (I thought it was simple)
18:47 allison I like parrot_debugger
18:48 pmichaud parrot_debugger +1
18:48 NotFound parrot_debugger +1
18:48 allison parrot_debugger +2
18:48 NotFound parrot_debugger wins!
18:49 DietCoke in that case, please update the other executable we just renamed to pbc_foo to parrot_foo. =-)
18:49 allison barney: pdump -> parrot_dump?
18:49 pmichaud depends on the other executable.  If it's only for bytecode files, then it should probably remain 'pbc'
18:49 DietCoke pmichaud: the debugger is only for pbc. =-)
18:49 NotFound Second question is about literal strings in pir. The spec says they can only contain ascii chars, but there are tests with iso-8859-1 and utf8.
18:49 allison or parrot_
18:49 * barney agrees with pmichaud
18:50 allison parrot_bytecode_, I meant to type
18:50 DietCoke NotFound: are those prefixed with unicode:'' or something similar?
18:50 allison NotFound: that depends on the string metadata
18:50 barney parrot_bytecode_  ++
18:50 NotFound And also, it is not clear if the charset and encoding prefix are intended for the string generated or the contents of the literal.
18:51 pmichaud the literal strings in PIR are always specified using ASCII
18:51 pmichaud what they encode may be iso8859-1 or utf8
18:51 pmichaud unicode:"\xbb"   legal
18:51 NotFound pmichaud: that's the way I understand the spec, but current test fails to meet it.
18:52 pmichaud "«"   not legal
18:52 allison NotFound: which test?
18:52 NotFound Forgot my notes, one second...
18:52 DietCoke pmichaud: not according to the PDD.
18:52 DietCoke "docs/pdds/draft/pdd19_pir.pod" 1295 lines --14%--            194,16        13%
18:52 allison The PDD is more advanced than the current implementation
18:53 pmichaud I'm speaking only of PDD, myself
18:53 allison so there are really two levels of answer here: what does it do now, and what should it do?
18:53 DietCoke ok. the PDD shows a non-ascii example.
18:53 pmichaud DietCoke: you're correct, I had not seen pdd19_pir.pod:194
18:53 wknight8111_ joined #parrotsketch
18:54 allison (and we may be dealing with an inconsistently updated spec, here)
18:55 NotFound t/op/stringu/t
18:55 NotFound t/op/stringu.t
18:55 allison NotFound: where does the spec say that literal strings can only be in ASCII?
18:57 pmichaud line 129
18:57 NotFound "Only 7-bit ASCII is accepted in string constants; to use characters outside that range, specify an encoding in the way below."
18:57 DietCoke ... and then it goes on to show you below how to use something else. =-)
18:57 pmichaud NotFound: which test in t/op/stringu.t contains a non-ascii char?
18:58 DietCoke I'd add some verbiage like "unless you specify otherwise, as described below"
18:58 allison line 129 of PDD 28?
18:58 DietCoke pdd 19
18:58 pmichaud line 129 of pdd19
18:58 allison oh, that's easy, PDD 19 is wrong
18:58 NotFound DietCoke: the description below is how to escape it, not how to write it directly.
18:58 allison it's still in draft, you can't take it as authoritative yet
18:59 DietCoke NotFound: encoding != escaping.
18:59 pmichaud NotFound:  which test in t/op/stringu.t contains a non-ascii char inside the "..." ?
18:59 allison pmichaud: test named "UTF8 as malformed ascii"
18:59 pmichaud well, yes, that's testing that it's in fact an error
18:59 barney line 197 of pdd19 ?
19:00 allison and "UTF8 literals"
19:00 NotFound "UTF8 literals" also
19:00 pmichaud UTF8 literals I agree with, although that one matches the example given on pdd19:194
19:00 allison pmichaud: it's testing that it's an error when you specify ASCII encoding, but allowed as a string literal
19:00 spinclad t/op/stringu.t:187ff
19:01 pmichaud so, what's the question again?  (I think the answer is simply that pdd19 needs clarification.)
19:01 allison remember, PDD 19 was pulled together from a pile of old documentation
19:01 NotFound So whay is the intention of the spec? They must always be escaped or not?
19:01 NotFound what
19:01 DietCoke no.
19:01 spinclad :200ff disagrees: 'no escapes'
19:02 spinclad (disagrees with pdd)
19:02 pmichaud if an encoding is given, no escaping.
19:02 barney Tests and spec seem to be in line. Question is whether the spec is correct.
19:02 pmichaud if no encoding is given, then the contents of the "..." must be 7-bit ascii
19:03 allison hang on... editing the text now
19:03 pmichaud note that unicode:  is not an encoding, so    unicode:"«"  is not valid, although   utf8:unicode:"«"   is.
19:04 DietCoke Folks, I have to run. I do hope we can agree to name our executables in some sort of consistent fashion. cotto_work still has a question when this question is resolved.
19:04 DietCoke See folks next week.
19:04 cotto_work bye
19:04 NotFound And the second part is: when unicode: and no encoding is specified, is the default utf8 applicable to the generated string only, or to the escapes in the content also?
19:04 allison edited result "The default encoding for a double-quoted string constant is 7-bit
19:04 allison ASCII, other character sets and encodings must be marked explicitly using a
19:05 allison charset or encoding flag."
19:05 pmichaud I think we need to also make it clear that 7-bit ascii is required when the encoding is not
19:06 allison well, it's not exactly "required"
19:06 NotFound I think 7-bit ascii is redundant.
19:06 NotFound "Ascii extended" is not ascii.
19:06 pmichaud allison:  how will the compiler know how to process the bytes in the "..." if the encoding isn't known?
19:07 allison it throws an exception
19:07 allison that was what the second test was checking
19:07 pmichaud argggh.  isn't "throw an exception" equivalent to "didn't meet the requirement?"
19:07 allison but, you can enter escaped characters
19:08 pmichaud ...because an encoding wasn't given?
19:08 allison if you enter characters that aren't ascii, it'll treat them as ascii
19:08 allison if an encoding isn't given, it's the same as if you specified an encoding of "ascii:"
19:08 allison exactly the same
19:08 pmichaud ascii or fixed8?
19:08 allison ascii
19:09 allison (at least, that's what it was)
19:09 pmichaud so, we treat "ascii" as specifying both an encoding and a charset?
19:09 NotFound I think that not allowing any non ascii char will be a cleaner way.
19:09 NotFound Compilers can explicitly say what they intend when generating pir.
19:09 pmichaud ascii:"\xab"   throws an exception?
19:10 pmichaud or is it   "backslash, x, a, b" ?
19:10 allison throws an exception
19:11 allison '\xab' (single quotes) is backslash, x, a, b
19:11 pmichaud ascii:"\x0d"   is a newline?
19:11 NotFound pmichaud: no, is cr
19:12 pmichaud sorry, cr
19:12 allison pmichaud: should be
19:12 spinclad utf8:unicode:"\x0d"   is a newline?
19:12 pmichaud ucs2:"\x0d"  is a newline?
19:12 pmichaud (or do we decide not to support ucs2?)
19:13 allison it's only characters outside the ASCII range that throw an exception when the string is ASCII
19:13 spinclad s/newline/cr/
19:13 chromatic By ASCII, you mean 7 bits?
19:13 pmichaud my point is that for some encodings we can't always decide if a backslash is an escape or part of the character being encoded
19:13 allison NotFound: (I'm leaving the 7-bit in the PDD, because people always have that question)
19:14 pmichaud that's why lines 196-197 say that escapes are not honored when the encoding is specified
19:14 allison pmichaud: a backslash is always an escape in a double-quoted string
19:14 NotFound allison: agree, but mentioning it one time in a note will be enough.
19:15 pmichaud allison:  fair enough; we then need to remove the mention that escape sequences are not honored when an encoding is specified.
19:15 pmichaud and we don't support encodings where backslash may be a valid byte
19:15 allison in the current implementation, backslashes are honored
19:16 allison even when another encoding is specified
19:16 pmichaud fair enough -- again, I've been restricting myself to spec.
19:16 allison which is more useful?
19:16 pmichaud good question.
19:16 allison consistency is valuable
19:16 pmichaud I think it's more useful to always restrict the "..." to ascii chars, personally -- with everything else escaped.
19:16 NotFound I agree.
19:16 allison that's excessive
19:17 allison that means you can never directly type a UTF 8 string
19:17 pmichaud in PIR code?
19:17 NotFound allison: if not, we must take into account a lot of things, and we complicate the parsing.
19:17 allison can never pass a UTF 8 string in from an HLL parser
19:17 pmichaud does it matter for PIR?
19:17 pmichaud we pass UTF-8 strings in all the time -- they get encoded by PCT
19:17 allison it's also an unnecessary restriction
19:17 allison the strings are just a series of bytes
19:18 NotFound allison: yes, I think the HLL must be clear about his intention when generating pir.
19:18 allison there's no reason to restrict which bytes
19:18 pmichaud besides, aren't we moving _away_ from UTF8?
19:18 pmichaud (I know we'll always support it, but internally the strings will be something else...?)
19:18 allison the restriction enters in from how you specify the encoding
19:18 NotFound allison: that is not what the docs say about complete unicode support.
19:19 allison pmichaud: internally strings will always be stored in whatever their natural encoding is
19:19 allison strings are just a blob of data
19:19 pmichaud so then an HLL will have to specify that encoding as part of the string constant anyway, yes?
19:19 pmichaud sorry, string literal
19:19 allison how you read that data depends on the encoding and character set
19:19 wknight8111 ...except in my GC, where strings are apparently always stored as a segfault
19:19 allison wknight8111: heh :)
19:20 allison string literals are fundamentally the same as regular strings, but not modifiable
19:20 pmichaud if my HLL has a utf-8 string, it needs to either (1) indicate in the PIR that the string is encoded at utf8, or (2) escape the non-ASCII chars
19:20 pmichaud s/at/as/
19:21 pmichaud it can't just stick the UTF-8 string inside of a pair of double quotes and expect PIR to know what to do with it
19:21 pmichaud (unless PIR is specified as defaulting to utf8)
19:21 allison pmichaud: no, you have to specify an encoding
19:21 wknight8111 pir should understand utf8:"a utf8 string here"
19:21 pmichaud what if my encoding has another meaning for backslash?
19:21 allison if you specify no encoding or character set, parrot treats it as an ASCII string
19:22 pmichaud anyway, I'll stop here -- no matter what Parrot does the HLL tools will be able to work with it.
19:22 allison pmichaud: that's a good point
19:23 pmichaud it just seems inconsistent that we allow ambiguous bytes in the string
19:23 NotFound And if utf16 or ucs2 is specified the string can contain 16 or 32 bit encoded unicode chars inside a 8-bit encoded file? That can be a nightmare for text editors.
19:23 allison for now, I'm not modifying PDD 19 where it says that specifying an encoding stops parrot from processing backslashes in strings
19:23 pmichaud so   unicode:"«" is an error?
19:23 allison no
19:23 pmichaud sorry, I said that wrong
19:23 pmichaud fixed8:"\x0a"   is a 4-character string?
19:24 allison but utf8:unicode:"\uwhatever" is an error
19:24 pmichaud no, not an error -- it should be  backslash+u+whatever
19:24 allison (not an error, but the backslash isn't treated as special)
19:24 allison yes
19:25 NotFound So the encoding part defines both the literal interpretation and the generated string?
19:26 Auzon left #parrotsketch
19:26 pmichaud NotFound: that's the way I interpret it.
19:26 pmichaud it does mean that we can't specify, say, ucs2 literals
19:26 allison NotFound: to be specific, the encoding and charset flags on a literal string specify the metadata on the literal string
19:27 allison anything that the literal string is assigned to, adopts that metadata from the literal string
19:28 NotFound allison: that looks inconsistent to me. Utf8 must be literal but ucs2 must be always escaped.
19:28 pmichaud (not that it matters to me that we can't specify ucs2 literals :-)
19:28 allison ?
19:29 allison ah, we just need to add a ucs2 encoding flag
19:29 allison if we intend it to be used on any regular basis
19:29 pmichaud there's no way to encode ucs2 literals containing double quotes
19:30 NotFound There is no sane way to encode 16 or 32 bit chars in an 8 bit text file, except escaping.
19:30 pmichaud (even with a ucs2 encoding flag.)
19:30 spinclad pmichaud: ucs2:'"'
19:30 pmichaud spinclad, okay, a string with both single and double quotes, then :-)
19:30 spinclad ok
19:31 pmichaud also I can't see your null byte in there.
19:31 * pmichaud looks carefully.
19:31 NotFound And don't even talk about allowing ucs2 pir source files.
19:31 pmichaud I'll assume it's there.  :-)
19:31 spinclad no null byte. counted string
19:31 allison a) we would have to introduce a new quoting syntax, and b) presumably, if you're working with 16 or 32 bit chars, you aren't doing it in an 8 bit text file
19:31 pmichaud spinclad:   in ucs2, a double quote is \x00\x22
19:31 allison but, really, ucs2 is not a high priority
19:32 particle in win32, all files are stored as ucs2 by the os
19:32 allison as long as there is some way to create ucs2 strings, we can call it good
19:32 pmichaud I'm speaking particularly of ucs2 literals
19:33 allison pmichaud: if we have a demand for ucs2 literals that can't use escapes, we can do the work to add them
19:33 spinclad ok, ucs2:'<box>'
19:34 pmichaud allison: I'm saying that the spec should allow escapes
19:34 allison should allow escapes in literal strings that specify an encoding?
19:34 pmichaud and not allow oddly-encoded strings in PIR source
19:34 NotFound Then we must allow them in utf8, for consistency.
19:34 pmichaud or we can choose to be explicitly inconsistent
19:35 pmichaud I have no trouble with that, fwiw
19:35 allison more accurately, when an encoding is specified, it should have the metadata to declare whether its strings process escapes
19:35 pmichaud I have no problem with saying that utf8:"..."  allows utf8 encoded stuff inside the quotes, *and* processes escapes.
19:35 NotFound I think the clean way is to always escape any non ascii character.
19:36 pmichaud the only reliable way to represent any generic ucs2 literal is if we allow escapes.
19:36 pmichaud or if we separate the PIR encoding from the resulting literal
19:37 allison okay, the answer for now
19:37 pmichaud (which is effectively what escapes do, but escaping every character is a bit much, I agree.)
19:37 allison we allow escapes and non-ascii characters in double-quoted strings
19:37 allison double-quoted strings are just blobs of data
19:37 NotFound pmichaud: allowing mixing complicates the parsing for no real gain, IMO.
19:38 pmichaud NotFound: I'm not worried about the parsing as much as I am the result
19:38 pmichaud I'd much rather be able to produce my constant string in the .pbc output directly than to have to have transcode operations at runtime because there wasn't a way to do it in the PIR originally.
19:38 pmichaud the transcode operations produce extra GC-able elements, which is bad.
19:38 allison the encoding and character set determine how the resulting data is treated
19:39 NotFound pmichaud: but generating any encoding wanted is not a problem, if the specs clearly states what is.
19:40 allison I think every one in the conversation has switched between all three of the positions during this conversation, so we'll have to call that good
19:40 allison done
19:40 pmichaud NotFound: right now the encoding specifies both the interpretation of the double-quoted string and the encoding of the resulting string.  But there are some encodings that we cannot represent in a double-quoted string without having an escaping mechanism.
19:40 NotFound allison: there is a remaining problem: if unicode: is specified, how are the escapes interpreted? As 8 bit chars that form utf8, or as unicode points?
19:40 pmichaud unicode is not an encoding
19:41 pmichaud so it's a normal double-quoted string, where escapes are honored.
19:41 NotFound Not, but the spec says that default is utf8.
19:41 pmichaud if there are any non-ASCII characters in the double quotes, they would need to be utf8
19:41 barney An easy question: Should   \"   be added to line 186 of PDD19
19:41 barney ?
19:42 NotFound But the doubt is how to interpret the escaped ones.
19:42 pmichaud I have no doubt about how to interpret the escaped ones
19:42 pmichaud (for utf8)
19:42 pmichaud unicode:"\xaa"
19:42 pmichaud unicode:"«"
19:42 pmichaud are the same.
19:43 pmichaud barney: Is \" processed as an escape?
19:43 allison barney: yes, \"  works in double quoted strings
19:43 pmichaud it doesn't in the current implementation
19:43 allison (it's absolutely critical, otherwise you can't enter a quote in a double-quoted string)
19:43 pmichaud yes, you can enter a quote in the double quoted string, it's \x22
19:44 allison fair enough
19:44 NotFound pmichaud: that is reasonable, but the spec is not clear enough about that, IMO.
19:44 allison NotFound: pdd 19 or pdd 28?
19:44 pmichaud NotFound: I don't disagree that the spec is unclear.  I'm just saying that it's possible for us to have utf-8 encoding and escapes in a single string w/o it being ambiguous
19:45 NotFound allison: 19
19:45 allison pdd 19 is certainly not clear yet
19:45 pmichaud allison: we're only talking about pdd19 here.  I don't think pdd28 specifies anything about PIR representation of literals
19:45 barney \"   Is speced in line 129 of PDD19
19:45 pmichaud barney: aha.  okay.
19:45 pmichaud the current implementation doesn't allow \"
19:46 allison barney: added to the spec
19:46 pmichaud sorry, I'm wrong, I typoed
19:46 pmichaud ignore me.
19:46 pmichaud pmichaud--
19:46 barney Is there a way to have a single quote in a single quoted string?
19:46 pmichaud \" works now.  Yes, it should be added to 186 of pdd19.
19:46 NotFound The problem I see with this approach is that a generated pir that contains both utf8 and iso-8859-1 unescaped characters is not good for the sanity of anyone using a text editor not written specifically to handle pir source.
19:47 pmichaud (1) if someone is editing generated pir, they need to be able to handle it
19:47 pmichaud (2) all of PCT's string generation in PIR converts non-ASCII to escapes.  But just because PCT does it that way doesn't mean that we always want it to do so that way
19:48 pmichaud (3) If someone has string literals in a non-western language, I don't know that I want the generated PIR to always be a bunch of escape characters.  It would make sense to allow the utf8 directly in the string literals.
19:48 pmichaud (e.g., chinese)
19:48 allison okay, the escapes are not persistent in the string
19:48 allison they're only a way of representing a character that can't otherwise be typed
19:49 pmichaud ...or parsed by PIR.
19:49 chromatic Don't we need some sort of BOM or encoding marker at the start of the PIR file then?
19:49 allison as soon as that literal is read into anything, there is no difference between the escape and the utf8 character
19:49 pmichaud ...isn't it "as soon as the literal is compiled, there is no difference..."?
19:49 NotFound pmichaud: I also find it nice to be able to write my own name 'Julián' in pir, but I'm not sure it pays the price of supporting all that.
19:49 spinclad .oO { do we need a BOM at the start of a ucs: string? }
19:49 particle .pragma encoding utf8 ??
19:50 pmichaud surely PIR doesn't store the escape sequences in the literals it produces.
19:50 allison pmichaud: basically, there's only a difference in the source file
19:50 pmichaud right.
19:50 chromatic How do we expect a random text editor to parse .pragma encoding utf8 ?
19:51 pmichaud so   unicode:"«"   and unicode:"\xab"  would produce exactly the same result.
19:51 pmichaud even down to being the same .pbc output.
19:51 allison pmichaud: exactly
19:51 particle bom is also ball of mud
19:52 NotFound So unicode:"\xab" and utf8::unicode:"\xab" is also the same result?
19:52 NotFound So unicode:"\xab" and utf8:unicode:"\xab" is also the same result?
19:52 pmichaud I don't see a problem with that for utf8
19:52 NotFound No problem, just wants to be clear about that.
19:52 allison NotFound: yes
19:53 allison consistency++
19:53 wknight8111 consistency++
19:53 pmichaud we'll have to figure out something to do for ucs2
19:53 pmichaud and personally
19:53 NotFound consistency++
19:54 pmichaud I'd prefer it if unicode:"..." accepted utf8 strings in the PIR text but produced Parrot's default internal representation for the constant
19:54 pmichaud (i.e., the one in pdd28)
19:54 wknight8111 couldn't parrot just parse ucs2: as utf16:?
19:54 allison parrot doesn't have a default internal representation
19:54 NotFound I think ucs2 or utf816 literals must be forbidden, at least in 8 bit encoded source files.
19:55 chromatic Agreed.
19:55 allison (the default internal representation was an idea from an earlier draft that didn't make it in the final cut)
19:55 NotFound I mean utf16
19:55 barney What is the specific problem of ucs2 ?
19:56 barney s/of/with/
19:56 pmichaud we're not doing NFG?
19:56 NotFound barney: the problem I see is that many people confuse it with utf16.
19:56 allison not as a universal standard, no. NFG is just another additional encoding/charset
19:56 cotto_work left #parrotsketch
19:57 cotto_work joined #parrotsketch
19:57 pmichaud since (for speed reasons) I'm going to be converting a lot of things into NFG, there's no way for me to specify a NFG literal without escaping everything?
19:57 allison the thing about string data, is you want to avoid transforming it whenever you can
19:57 coke joined #parrotsketch
19:57 allison escaping everything won't specify an NFG literal
19:58 allison NFG is just a storage format
19:58 pmichaud okay, how do I specify an NFG literal?
19:58 DietCoke ... wow. haven't even gotten to cotto's question, have ya. =-)
19:58 cotto_work no
19:58 pmichaud or do my literals always get transcoded at runtime?
19:58 pmichaud or...?
19:58 spinclad 'ball of mud'
19:58 DietCoke Ok. Don't forget cotto. heading back out. =-)
19:59 allison (trying to decide if it's an encoding or charset flag)
19:59 NotFound DietCoke: sorry, I imagined this had to be a long discussion, but I think it is important to clarify these issues.
19:59 allison ... it's an encoding flag
19:59 pmichaud pdd28 says that nfg is always unicode codepoints
19:59 allison nfg:
20:00 allison yes, but they're stored differently (encoded differently)
20:00 particle can we interrupt this endless discussion to give cotto his time, so he can get on with life?
20:00 allison yes
20:00 NotFound particle: no problem
20:00 pmichaud cotto:  still around?
20:00 cotto_work yes
20:00 pmichaud still have a question?
20:00 cotto_work yes.  It should be a quick one.
20:01 cotto_work The Array PMC's freeze/thaw/visit functions are broken.  Are they worth fixing or should that rt be rejected?
20:01 pmichaud (suggestion for string encoding:  allison is undoubtedly busy with oscon, and I don't think string parsing is a pressing issue.  Can we save it until the post-oscon hackathon?)
20:01 allison cotto_work: they are worth fixing
20:01 cotto_work thanks.
20:01 particle the one thing worth saving in Array pmc as far as i'm concerned is the sparse storage
20:01 NotFound The urgent questions have been answered, the others can be delayed.
20:02 allison pmichaud: also, a good bit will be worked out as we implement the strings PDD
20:02 particle if that can be rolled into fixed/resizable pmc variants, maybe Array can go away
20:02 pmichaud okay.  allison and I can review string literals and encodings wrt nfg at the oscon hackathon.  and yes, string pdd implementation will add more useful information
20:02 cotto_work particle, you mean sparseness?
20:02 particle yes
20:02 pmichaud I just want to put a hook in that it would be good to have a way to specify literals in PIR that go directly to NFG without requiring an explicit transcode step at runtime.
20:03 NotFound That is the reason why I asked, we can't sanely work on strings without some clarity on these points.
20:03 pmichaud I will shut up now until cotto's question is finished.
20:03 NotFound But as I said, the urgent ones had been cleared.
20:03 allison I need to review the PIR PDD and launch it out of draft. That'll likely be my hackathon task (including some string conversation with pmichaud).
20:05 allison cotto: is your question answered?
20:05 NotFound I win the price for the longer first question? ;)
20:06 cotto_work if sparseness is the only thing worth preserving about the Array, would it be better to make the other Array types sparse?
20:06 spinclad NotFound: i give you an hour of my life as a prize.
20:06 allison NotFound: you win the prize :)
20:06 NotFound (No matter it really was the second)
20:06 allison cotto_work: potentially, yes
20:07 allison cotto_work: though, it's still worth making freeze/thaw/visit work
20:07 cotto_work meaning "if someone can find the tuits"?
20:07 barney SparseResizablePMCArray
20:08 cotto_work ok.  I can see how freeze/thaw/visit would be a step in the right direction
20:08 cotto_work eoq
20:08 allison cotto_work: yes, if someone has time. it's not wasted, because they'll have to work for whatever sparse Array results
20:09 allison okay, any other questions before we go?
20:09 chromatic Where shall we have lunch?
20:09 pmichaud I will miss parrotsketch next week.
20:09 chromatic Technically, that wasn't a question.
20:10 pmichaud (are we having parrotsketch next week?)
20:10 allison should we skip parrotsketch next week for OSCON?
20:10 chromatic Probably.
20:10 allison then yes, no parrotsketch next week
20:10 allison we'll resume on July 29th
20:11 allison thanks everybody!
20:11 allison EOPS
20:11 pmichaud left #parrotsketch
20:11 cotto_work left #parrotsketch
20:11 NotFound left #parrotsketch
20:12 allison left #parrotsketch
20:13 jonathan left #parrotsketch
20:13 chromatic left #parrotsketch
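
Note appended to the log: the escaping approach pmichaud describes at 19:47 (generate PIR with every non-ASCII character converted to an escape, so the source file stays 7-bit ASCII) can be sketched roughly as below. This is only an illustration, not PCT's actual code. The function name pir_string_literal is hypothetical, the two-digit \xhh form is taken from the examples in the discussion (assumed to consume exactly two hex digits), and the four-digit \uhhhh form for code points above 0xFF is an assumption about PIR's escape syntax.

```python
def pir_string_literal(text: str) -> str:
    """Build a double-quoted PIR literal whose source text is pure ASCII.

    Hypothetical helper for illustration only; it is not part of Parrot or PCT.
    """
    escaped = []
    non_ascii = False
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp < 0x7F and ch not in '"\\':
            # Printable ASCII other than quote and backslash goes in as-is.
            escaped.append(ch)
        elif cp <= 0xFF:
            # \xhh form as used in the discussion, e.g. \x22 for a double quote.
            escaped.append("\\x%02x" % cp)
        elif cp <= 0xFFFF:
            # Assumption: a four-digit \uhhhh escape is available.
            escaped.append("\\u%04x" % cp)
        else:
            # Larger code points are left out rather than guessing at syntax.
            raise ValueError("code point U+%X not handled by this sketch" % cp)
        non_ascii = non_ascii or cp > 0x7F
    # With no flag the literal is treated as ASCII, so only add the
    # unicode: charset flag when something outside ASCII was escaped.
    prefix = "unicode:" if non_ascii else ""
    return '%s"%s"' % (prefix, "".join(escaped))


if __name__ == "__main__":
    print(pir_string_literal("«"))
    print(pir_string_literal("Julián"))
    print(pir_string_literal('say "hi"'))
```

Per the 19:51 exchange, a literal written this way and the same literal with the raw utf8 character between the quotes are meant to compile to the same constant; the escaped form just keeps the PIR source readable in any 8-bit editor.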
