IRC log for #askriba, 2017-10-28

All times shown according to UTC.

Time Nick Message
04:04 dboehmer_ joined #askriba
11:45 karjala_ joined #askriba
11:46 karjala_ Hi, a question:
11:49 karjala_ I think I would like to partition a huge table (2 billion rows) into smaller tables (of 30 million rows each) to make searches faster
11:49 karjala_ Is that right?
11:49 karjala_ Would it become faster?
11:49 karjala_ I mean, the searches I make would only need one table at a time
11:50 karjala_ (because the data is very partitioned)
11:50 karjala_ The reason I think I would like to partition into smaller tables, is that someone told me MySQL doesn't work well with tables more than 100m rows
11:51 karjala_ I don't know if they were right
11:52 karjala_ The next question is: Would it be necessary for me to create a new perl module for each of these 70 tables?
11:52 karjala_ Is there a "dynamic" way of doing it?
11:55 karjala_ would something like this work? $schema->resultsource('Foo')->table('foo_25')
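One way to avoid a separate Result class per partition is to keep a single template class and retarget its result source at the physical table just before querying. A minimal sketch of that idea, assuming a hypothetical My::Schema with a 'Foo' source and partition tables named foo_1 .. foo_70; it relies on DBIx::Class::Schema->clone and on the ResultSource name accessor being writable, so verify both against the DBIC version in use:

    use strict;
    use warnings;
    use My::Schema;    # hypothetical schema class

    my $schema = My::Schema->connect(
        'dbi:mysql:dbname=bigdata', 'dbuser', 'dbpass',    # placeholder DSN/credentials
    );

    # Return a resultset that reads from one specific partition table.
    # clone() gives a private copy of the schema, so changing the source
    # name here does not leak into other code sharing $schema.
    sub rs_for_partition {
        my ( $schema, $partition_no ) = @_;
        my $clone = $schema->clone;
        $clone->source('Foo')->name("foo_$partition_no");
        return $clone->resultset('Foo');
    }

    my $rs = rs_for_partition( $schema, 25 );    # queries foo_25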
12:51 karjala_ But then again, dbicdump will stop working correctly if I introduce 100 tables
12:51 karjala_ (or won't it?)
12:55 karjala_ It won't, there's the "exclude" option in dbicdump
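The loader behind dbicdump (DBIx::Class::Schema::Loader) has an exclude option that takes a regex of table names to skip, so the partition tables can be left out of the dump while the rest of the schema is still generated. A hedged sketch using make_schema_at; the schema name, DSN and the foo_ prefix are placeholders:

    use strict;
    use warnings;
    use DBIx::Class::Schema::Loader qw/ make_schema_at /;

    make_schema_at(
        'My::Schema',                          # hypothetical schema class
        {
            dump_directory => './lib',
            exclude        => qr/^foo_\d+$/,   # skip foo_1 .. foo_70
        },
        [ 'dbi:mysql:dbname=bigdata', 'dbuser', 'dbpass' ],
    );

The dbicdump command line should accept the same thing via -o exclude='^foo_\d+$', though that is worth double-checking against the installed version.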
13:25 karjala_ I found this on the web: resultset('TableA')->search({}, { from=>'TableB'});
13:28 karjala_ but it's not documented
13:29 karjala_ Maybe I should use Mojo::mysql for this?
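The snippet found on the web maps to the resultset's from attribute, which overrides the FROM clause while still using the existing Result class to inflate rows. A sketch of how that might look, assuming a 'Foo' source and a partition table foo_25 (both placeholders); the literal-SQL form with the default 'me' alias is used here, and since from is a sharp edge in DBIx::Class, this is something to test rather than rely on blindly:

    # $schema is assumed to be a connected My::Schema instance.
    # Query partition table foo_25 through the existing 'Foo' class by
    # overriding the FROM clause; 'me' is DBIx::Class's default alias.
    my $rs = $schema->resultset('Foo')->search(
        { status => 'active' },          # placeholder condition
        { from   => \'foo_25 me' },      # literal SQL: "FROM foo_25 me"
    );

    while ( my $row = $rs->next ) {
        print $row->id, "\n";
    }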
14:45 ribasushi karjala_: that's... many questions ;)
14:45 karjala_ hehe
14:46 ribasushi some of them even philosophical
14:46 ribasushi karjala_: let's start over: what can I help you with?
14:46 karjala_ 1st of all, since I don't have much experience with huge tables, I'd like to know whether I should split the 2bn-row table into many smaller ones.
14:47 ribasushi there is no one correct answer to this question
14:47 ribasushi what I can tell you for sure is that relying on:
14:47 karjala_ I'll be searching for data inside a single partition each time
14:47 ribasushi "...  someone told me MySQL doesn't work well with tables more than 100m rows"
14:47 ribasushi is dead wrong
14:47 karjala_ oh
14:47 ribasushi in general when approaching IT problems always go from what you yourself observe
14:48 ribasushi you already have the data - is it slow? what exactly is slow? does it matter for day to day tasks? is it worth the tradeoff of redesigning your business logic?
14:48 ribasushi these are questions that "someone told me" can not answer
14:48 karjala_ I don't have the data - I'll be given it in a few months
14:48 karjala_ But I get the point. I'll check those things
14:49 karjala_ ok, so I shouldn't do the split just yet, but wait for the data to arrive?
14:49 ribasushi sec phone
14:49 karjala_ I'll test locally
14:49 karjala_ ok
14:50 ribasushi so yeah - always test everything
14:50 ribasushi the fact that you do not have the data doesn't mean much
14:50 ribasushi you can generate bogus data with the simplest loop
14:50 karjala_ ok
14:50 ribasushi use UUIDs, read out of /dev/urandom, whatever
14:51 ribasushi in fact - do not take it the wrong way, but taking into consideration your lack of experience, I would *very* strongly urge you to play around with some mocks that *seem* to be the same size
14:51 ribasushi this way you can be better prepared for what is coming
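A minimal sketch of the "simplest loop" approach: bulk-insert random rows until a mock table is roughly the size of one real partition, then benchmark the intended queries against it. The 'Foo' source, its columns and the row counts are all placeholders:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Data::UUID;
    use My::Schema;    # hypothetical schema class

    my $schema = My::Schema->connect( 'dbi:mysql:dbname=mocktest', 'dbuser', 'dbpass' );

    my $ug    = Data::UUID->new;
    my $total = 30_000_000;    # roughly one partition's worth of rows
    my $batch = 10_000;        # insert in chunks to keep memory flat

    for ( my $done = 0; $done < $total; $done += $batch ) {
        my @rows = map {
            {
                uuid      => $ug->create_str,
                partition => int( rand(70) ) + 1,
                payload   => sprintf( '%08x', int( rand( 2**32 ) ) ),
            }
        } 1 .. $batch;

        # populate() in void context does fast multi-row INSERTs
        $schema->resultset('Foo')->populate( \@rows );
    }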
14:52 ribasushi *maybe* sharding is the way to go
14:52 ribasushi *maybe* it isn't
14:52 karjala_ ok
14:52 ribasushi ( it also is *super* dependent on the version of mysql/maria that you will be using )
14:52 ribasushi perhaps even a traditional rdbms is not what you need for this
14:53 ribasushi maybe you will be better served by some sort of document storage, or something like neo4j
14:53 karjala_ neo4j is graph db, right?
14:53 ribasushi don't focus on the tools you are familiar with, focus on what problem are you trying to solve
14:53 ribasushi yes
14:53 karjala_ ah
14:54 ribasushi karjala_: what kind of project is this? ( understandable if you are unable to share this info of course )
14:55 karjala_ I'll write in private
14:56 karjala_ there
15:00 ribasushi so yeah - then everything I said applies - test your assumptions early, and avoid unverified "someone told me" at all costs ;)
15:03 karjala_ I don't have a big enough SSD on my laptop to test 2bn rows
15:04 karjala_ I would need ~ 200GB
15:06 karjala_ Maybe I should rent a big enough server
15:06 karjala_ k
15:06 karjala_ thx
15:07 ribasushi karjala_: https://www.hetzner.com/sb/?country=ot <--- recycled servers starting at eur25/mo with no contract with decent size drives
15:07 ribasushi ( recycled means someone rented them new, and then stopped using them after a while )
15:07 karjala_ thanks!
15:13 karjala_ I'll start by testing out MariaDB & PostgreSQL
20:08 karjala_ joined #askriba
20:35 karjala_ is this module recommended to use if I need it? https://metacpan.org/pod/DBIx::Class::InflateColumn::Serializer::JSON
20:35 karjala_ I need a field that serializes/deserializes to JSON
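That module is the JSON backend for DBIx::Class::InflateColumn::Serializer: the component is loaded in the Result class and the backend chosen per column through serializer_class, after which hashrefs are serialized on the way in and deserialized on the way out. A sketch following that distribution's documented pattern, with a made-up Result class and column names:

    package My::Schema::Result::Event;    # hypothetical Result class
    use strict;
    use warnings;
    use base 'DBIx::Class::Core';

    __PACKAGE__->load_components('InflateColumn::Serializer');
    __PACKAGE__->table('event');
    __PACKAGE__->add_columns(
        id   => { data_type => 'integer', is_auto_increment => 1 },
        meta => {
            data_type        => 'text',
            serializer_class => 'JSON',    # handled by ...::Serializer::JSON
        },
    );
    __PACKAGE__->set_primary_key('id');

    1;

    # usage elsewhere: plain data structures round-trip transparently
    # $schema->resultset('Event')->create( { meta => { tags => [ 'a', 'b' ] } } );
    # my $tags = $event->meta->{tags};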
21:02 karjala_ joined #askriba
22:44 karjala_ joined #askriba
