
IRC log for #metacpan, 2016-12-05


All times shown according to UTC.

Time Nick Message
01:07 Tempesta joined #metacpan
06:01 [Tux] joined #metacpan
06:25 melezhik joined #metacpan
06:37 melezhik joined #metacpan
06:44 Tempesta joined #metacpan
07:29 nakiro joined #metacpan
07:30 melezhik joined #metacpan
07:38 nakiro joined #metacpan
08:25 Tempesta joined #metacpan
08:31 pombreda joined #metacpan
08:35 osfabibisi joined #metacpan
08:56 edward joined #metacpan
09:05 vagrant joined #metacpan
09:23 Relequestual joined #metacpan
09:37 melezhik joined #metacpan
09:39 melezhik joined #metacpan
10:17 Tux joined #metacpan
10:37 melezhik joined #metacpan
13:10 metacpan joined #metacpan
13:10 metacpan [metacpan-api] mickeyn force-pushed mickey/wip_mapping_deploy_simplified from d383be2 to d6fd7a7: https://git.io/v1lwq
13:10 metacpan metacpan-api/mickey/wip_mapping_deploy_simplified d6fd7a7 Mickey Nasriachi: added suggester to fix autocomplete
13:10 metacpan left #metacpan
13:17 neilb joined #metacpan
13:18 osfabibisi joined #metacpan
15:12 metacpan joined #metacpan
15:12 metacpan [metacpan-api] mickeyn force-pushed mickey/wip_mapping_deploy_simplified from d6fd7a7 to f4c7e13: https://git.io/v1lwq
15:12 metacpan metacpan-api/mickey/wip_mapping_deploy_simplified f4c7e13 Mickey Nasriachi: added suggester to fix autocomplete
15:12 metacpan left #metacpan
16:38 moltar joined #metacpan
17:32 melezhik joined #metacpan
17:34 melezhik2 joined #metacpan
17:35 karjala Hi all. What's an efficient way to retrieve the complete list of distributions that have an authorized latest release? Maybe download some huge file from CPAN? Or is doing the search on ES the only way to go?
17:36 karjala I'm asking because I'd like once per month to mark distributions that have been deleted as un-searchable.
17:37 karjala (in perlmodules.net)
17:39 mst karjala: er, pull 02packages and do a uniqueness thing on them
17:39 karjala thanks
17:39 mst sec
17:39 melezhik joined #metacpan
17:40 karjala I think this file (02packages) doesn't show distribution names, but module names & download filenames
17:40 mst karjala: https://metacpan.org/source/MSTROUT/App-opan-0.002000/lib/App/opan.pm#L50
17:41 mst that'll parse it, then you can regexp out whatever concept of dist name you want to use and uniq() that
17:42 mst 'distribution name' is an invented concept that doesn't really exist historically, although PAUSE does now require you to have Foo::Bar to release a tarball called Foo-Bar-1.23
17:42 karjala oh
17:42 trs CPAN::DistnameInfo will parse a file path from 02packages and give you a "dist" string
17:42 karjala it's all in the filename then?
17:42 karjala excellent!
17:42 karjala is CPAN::DistnameInfo accurate?
17:43 mst right, so basically, take my opan code, pass $_->[2] to CPAN::DistnameInfo, uniq() on the results
17:43 trs It's the most canonical source of "distribution name" and used by MetaCPAN, rt.cpan.org, etc.
17:43 karjala excellent, thanks everyone!
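The approach suggested above (parse 02packages, derive a dist name from each download path, then take the unique set) can be sketched roughly as follows. This is only an illustration in Python: the function names are hypothetical, and the regular expressions are a crude stand-in for CPAN::DistnameInfo, which handles many more filename variants. It assumes the usual layout of 02packages.details.txt: "Key: value" header lines, a blank line, then whitespace-separated rows of module, version and path.

```python
import re

# Assumption: only these archive suffixes count as distributions.
ARCHIVE_RE = re.compile(r"\.(?:tar\.gz|tar\.bz2|tgz|zip)$", re.IGNORECASE)
# Assumption: a dist file stem looks like "<name>-<version>", version starting with a digit.
NAME_VERSION_RE = re.compile(r"^(?P<dist>.+?)-v?\d[\w.]*$")


def parse_02packages(text):
    """Split the index into a header dict and a list of (module, version, path) rows."""
    header_block, _, body = text.partition("\n\n")
    headers = {}
    for line in header_block.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            headers[key.strip()] = value.strip()
    rows = []
    for line in body.splitlines():
        fields = line.split()
        if len(fields) == 3:
            rows.append(tuple(fields))
    return headers, rows


def dist_name(path):
    """Return a rough dist name for a download path, or None if it is not an archive."""
    basename = path.rsplit("/", 1)[-1]
    if not ARCHIVE_RE.search(basename):
        return None  # e.g. a lone gzipped .pm file
    stem = ARCHIVE_RE.sub("", basename)
    match = NAME_VERSION_RE.match(stem)
    return match.group("dist") if match else stem


def unique_dists(rows):
    """Collect the set of dist names seen in the index rows, skipping non-dists."""
    names = set()
    for _module, _version, path in rows:
        name = dist_name(path)
        if name is not None:
            names.add(name)
    return names
```

With this sketch, a path such as M/MS/MSTROUT/App-opan-0.002000.tar.gz maps to "App-opan", while a path that is not a distribution archive maps to None and is skipped.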
17:44 karjala Is it safe to assume that 02packages has been updated sometime in the past 24 hours?
17:44 mst that's a bizarre question
17:45 mst step back and explain your motivation please
17:46 karjala ok - I will not assume that what I got is the most recent information, because then I might mistakenly mark as "deleted" a distribution that has had a release 5 minutes ago. I intend to mark as "deleted" only those distributions that (a) do not appear in 02packages and (b) have not had a release in the past 24 hours.
17:47 karjala Will I be safe then?
17:47 karjala Will the service's users be able to find all distributions that are not deleted right now?
17:49 karjala I'm referring to a distribution that has been deleted 10 days ago, but also had a recent release 5 minutes ago. If I get a 02packages file that was created in between, I don't want to mistakenly mark that distribution as deleted.
17:49 karjala That's what I mean.
17:50 Grinnz i would assume you would have marked that distribution as deleted at some point in those 10 days anyway
17:50 Grinnz you just need to undelete it once there's a new release
17:51 karjala But if right now I get 02packages and that does not contain the distro in question, I might RE-mark it as deleted
17:52 karjala So I need to assume that 02packages refers to some point in the past
17:53 karjala there are race conditions between the process that receives new releases, and the process that retrieves and processes 02packages - especially if 02packages is not 100% up to date.
17:55 tmetro1 joined #metacpan
17:56 karjala But if I know that 02packages has not missed more than 24 hours worth of information, then I can exclude from marking-as-deleted those distros that have had releases in the past 24 hours.
17:57 karjala Because I don't care if I don't mark as deleted a deleted distro, but I care if I mark an un-deleted distro as deleted.
17:58 Grinnz keep in mind a new release might not get indexed anyway, you won't know that until it shows up in 02packages
18:01 karjala Whether it will be indexed depends on whether certain errors will be found in the release?
18:01 karjala like permissions, etc?
18:01 Grinnz permissions, versions
18:02 karjala if it *does* get indexed, will it get indexed within an hour of release? Within 24hours? 2 days? Is there a guarantee?
18:02 karjala By indexed, I mean placed in 02packages
18:03 karjala (I think you mean the same, right?)
18:06 pombreda joined #metacpan
18:06 Grinnz i would say it's generally within an hour, but i wouldn't use the word "guarantee"
18:11 trs karjala: why does the timing matter?  Can't you treat the current 02packages as the authoritative source and just track it?
18:20 metacpan joined #metacpan
18:20 metacpan [metacpan-web] jberger created jberger/blog_entry (+1 new commit): https://git.io/v1438
18:20 metacpan metacpan-web/jberger/blog_entry b154931 Joel Berger: add blog entry for Joel Berger
18:20 metacpan left #metacpan
18:23 karjala trs: the problem is that I might then mark as deleted a distro that has been revived. And it will remain deleted in my database (and thus unsearchable by the user) until the next update, which might be... 24 hours away
18:23 karjala I want to avoid that scenario. That's why I was asking.
18:23 tmetro1 Can the metacpan API be used to answer a question like: "which CPAN module was uploaded for the first time in 2016?" and can it return some metric that acts as a proxy for popularity? (I'd like to say downloads, but as downloads aren't generally served by metacpan, it wouldn't be able to track that.)
18:26 Grinnz karjala: you can't know it's undeleted until the next update, is my point
18:26 Grinnz karjala: so you're dependent on when 02packages updates, there's no avoiding that
18:27 karjala Grinnz: I agree that I can't know for certain that it has been undeleted, but if I don't care erring on the side of undeleting a distro, I better take into account also the releases of the past day.
18:27 karjala and un-delete if I see a release in the past 24 hours
18:28 karjala not sure i'm perfectly clear
18:28 karjala but anyway
18:29 karjala I've been helped here. I'll implement now.
18:30 karjala there's even a date field in the headers of 02packages
18:30 karjala that should help even more
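The rule karjala describes (mark a distro as deleted only if it is absent from 02packages and has had no release within a grace period) might look like the sketch below. The function and parameter names are hypothetical, and measuring the 24-hour window back from the index's own Last-Updated header rather than from the current time is an assumption made here; it errs on the side of not deleting when the index is stale. A distro with no known release time is also left alone.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumption: a 24-hour grace period, as discussed in the channel.
GRACE = timedelta(hours=24)


def index_timestamp(headers):
    """Parse the Last-Updated header (an HTTP-style date) into an aware datetime."""
    return parsedate_to_datetime(headers["Last-Updated"])


def dists_to_mark_deleted(known_dists, indexed_dists, last_release_at,
                          index_time, grace=GRACE):
    """Return dists that are missing from the index and have no recent release.

    known_dists:     dist names currently stored locally
    indexed_dists:   dist names found in the current 02packages
    last_release_at: dict of dist name -> aware datetime of its latest known release
    index_time:      aware datetime taken from the index header
    """
    cutoff = index_time - grace
    to_delete = set()
    for dist in known_dists:
        if dist in indexed_dists:
            continue  # still indexed, so not deleted
        released = last_release_at.get(dist)
        if released is not None and released <= cutoff:
            to_delete.add(dist)
    return to_delete
```

A distro released five minutes before the index timestamp falls inside the grace window and is kept, whereas one last released ten days earlier and missing from the index is marked.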
18:33 karjala Question: Is it possible that a release is marked as latest, but won't be indexed?
18:34 karjala And another one: If a release is indexed, and the next release isn't (because of permissions or whatever error), will the previous release get removed from the index?
18:34 Grinnz no, the previous release will remain
18:34 trs karjala: you are making this way more complicated than it should be.  As Grinnz suggests, just track 02packages and you'll get your updates as fast as you can.
18:34 trs karjala: if you pick a mirror using rrr-client to do the sync, then you can check much more frequently than 24 hours.
18:35 Grinnz consider, anyone can upload anything to CPAN, so anyone could upload an unauthorized release of a distribution, this would fail to index but the previous release would stay in the index
18:35 karjala trs: By "using rrr-client" do you mean that the mirror should use rrr-client, or that I use rrr-client on the mirror?
18:36 trs the mirror should be using rrr-client
18:36 trs i.e. so it's a real-time mirror
18:36 trs like, cpan.metacpan.org :)
18:36 karjala so it's better to not download from http://www.cpan.org/modules/02packages.details.txt ?
18:36 karjala ok
18:38 karjala i'll do http://cpan.metacpan.org/modules/02packages.details.txt.gz
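Fetching and unpacking the gzipped index from that mirror could be done along these lines. This is a minimal sketch using only the Python standard library; the function names and the timeout value are illustrative assumptions, and no retry or caching logic is shown.

```python
import gzip
import urllib.request

# URL as given in the channel.
INDEX_URL = "http://cpan.metacpan.org/modules/02packages.details.txt.gz"


def decode_index(raw_bytes):
    """Decompress the gzipped index and decode it to text."""
    return gzip.decompress(raw_bytes).decode("utf-8", errors="replace")


def fetch_index(url=INDEX_URL, timeout=30):
    """Download the gzipped index and return its contents as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return decode_index(response.read())
```

Keeping the decoding step separate from the download makes it possible to run the same parsing code against a locally saved copy of the file.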
18:39 karjala Still, I would like to clear this in my head (if possible): Is it possible for a release to get "latest" status without getting indexed?
18:40 karjala assuming no other release of the same distro comes right after that one
18:41 mst I believe metacpan will give a release latest status if it believes it *will* be indexed?
18:46 trs It is possible for MetaCPAN to give a release/file "latest" status without that release being authorized.
18:54 pombreda joined #metacpan
19:14 metacpan joined #metacpan
19:14 metacpan [metacpan-web] ranguard pushed 1 new commit to master: https://git.io/v140K
19:14 metacpan metacpan-web/master 938be20 Leo Lapworth: Merge pull request #1830 from metacpan/jberger/blog_entry...
19:14 metacpan left #metacpan
19:14 metacpan joined #metacpan
19:14 metacpan [metacpan-web] ranguard deleted jberger/blog_entry at b154931: https://git.io/v1406
19:14 metacpan left #metacpan
19:24 pombreda joined #metacpan
19:30 ranguard karjala: the 'authorized' is based on https://github.com/metacpan/metacpan-api/blob/master/docs/indexing.md (at least currently) :)
19:32 ranguard also http://cpan.metacpan.org/... is a (very current) mirror of http://www.cpan.org/... + we don't delete, so we are a `backpan`
20:06 ribasushi joined #metacpan
20:12 karjala CPAN::DistnameInfo can't process this path from 02packages: M/MI/MICB/AuthenIMAP.pm.gz
20:12 karjala it returns undef
20:12 karjala in dist
20:12 karjala from its "dist" method, i mean
20:13 karjala also many other .pm.gz paths, it can't process
20:13 karjala Simple.pm.gz, etc
20:13 karjala around 30 i think. Should I worry?
20:14 karjala is this a bug that needs fixing?
20:16 karjala it seems that metacpan also won't bring these modules in the search results, so I'll ignore those as well.
20:17 mst karjala: AuthenIMAP.pm.gz isn't a dist!
20:17 mst karjala: it's ... a perl module file. gzipped. that's utter bollocks.
20:18 mst karjala: you can't expect it to process a path that isn't even a distribution
20:21 ilmari that's just a random file in someone's cpan directory: http://www.cpan.org/authors/id/M/MI/MICB/
20:21 ilmari you can't expect everything there to be a dist
20:23 ilmari now, why it's in 02packages is a good question
20:38 mst yes, actually, I'd not noticed that part
20:44 ilmari pause seems to be looking a bit too hard for modules
20:48 mst I think it might've been because cpan distributed scripts were a thing?
20:49 mst most of them are tchrist, it seems
20:49 ilmari and they were just distributed on their own?
20:50 Grinnz you get a script, you get a script, everyone gets a script!
20:51 ilmari http://www.cpan.org/authors/id/T/TO/TOMC/scripts/
20:53 Grinnz /o\
20:56 mst remember cpan existed before CPAN.pm
21:00 trs fun thing that cropped up recently: compare this on CPAN currently http://www.cpan.org/authors/id/H/HA/HALLORAN/Bio-SeqFeature-Generic-Ext-PrimerMap (a file) to what's on any backpan http://cpan.metacpan.org/authors/id/H/HA/HALLORAN/Bio-SeqFeature-Generic-Ext-PrimerMap/ (a directory)
21:00 trs because presumably it was a directory first, then deleted, then uploaded as a file.
21:01 trs basically it doesn't get replaced during the rsync because it's a non-empty directory and backpans don't --delete
21:04 Grinnz oh man
21:13 trs (we won't even talk about the content of that file)
23:13 neilb joined #metacpan
23:22 karjala Is there a maxlength limit in the name of modules?
23:26 karjala On 02packages there's a module with 128 chars
23:27 karjala Universe::ObservableUniverse::Filament::SuperCluster::Cluster::Group::Galaxy::Arm::Bubble::InterstellarCloud::SolarSystem::Earth
23:31 Grinnz I believe someone made a list of the longest module names... haarg or neilb
23:37 haarg https://github.com/andk/pause/blob/master/doc/mod.schema.txt#L221
