ruaok: I created LB-182, but I was thinking, what if a person first does a last.fm import (of the entire history) and then an alpha import, this could still lead to duplicates, I guess we're just gonna tell people not to do this?
if people do stupid things twice, they deserve it.
2017-06-23 17417, 2017
alastairp
3:15 PM <ruaok> then if we do a last.fm, never import anything older than that timestamp.
2017-06-23 17423, 2017
alastairp
I was thinking a few ways of doing this
2017-06-23 17447, 2017
alastairp
1) this is basically incremental last.fm dumps, is it much more work to make this task that instead?
2017-06-23 17423, 2017
alastairp
2) if we don't want to do this yet, we could have the "has imported from alpha?" flag, and just disable last.fm imports for now
2017-06-23 17445, 2017
alastairp
when we have time, and implement 1), we can turn this into "do incremental import"
2017-06-23 17423, 2017
ruaok
oh, oh, oh.
2017-06-23 17429, 2017
ruaok
1 is clever.
2017-06-23 17442, 2017
ruaok
iliekcomputers: following this?
2017-06-23 17446, 2017
iliekcomputers
what does "incremental import" mean exactly?
2017-06-23 17402, 2017
ruaok
only import up until the last time you imported into last.fm.
2017-06-23 17411, 2017
ruaok
*from
2017-06-23 17439, 2017
iliekcomputers
ah
2017-06-23 17410, 2017
iliekcomputers
I guess we'd have to track which was the last listen from last.fm then?
2017-06-23 17420, 2017
ruaok
which as alastairp pointed out, is practically the same as what you're doing now.
2017-06-23 17436, 2017
ruaok
yes, but you're already going to add that.
2017-06-23 17453, 2017
ruaok
the import from alpha would then basically do the following:
2017-06-23 17459, 2017
ruaok
1. import everything from alpha.
2017-06-23 17420, 2017
ruaok
2. Set the last last.fm import date to the most recent listen from alpha.
2017-06-23 17455, 2017
iliekcomputers
I get it, nice
2017-06-23 17408, 2017
ruaok
yeah, kills two birds with one stone.
2017-06-23 17457, 2017
iliekcomputers
okay, how exactly would we track which listens are from last.fm, send an extra field along with the data?
2017-06-23 17420, 2017
iliekcomputers
and how do we update the last lfm import date
2017-06-23 17432, 2017
alastairp
ah, the nice thing about that is I don't think you need a field for "imported from alpha"
2017-06-23 17438, 2017
alastairp
just pretend that it's last.fm
2017-06-23 17439, 2017
iliekcomputers
every rabbitmq batch
2017-06-23 17445, 2017
iliekcomputers
?
2017-06-23 17402, 2017
alastairp
because the final result is the same, right?
2017-06-23 17437, 2017
alastairp
ah, if the process fails half way through and you want to continue it, you will need two fields
2017-06-23 17438, 2017
ruaok
alastairp: should be, yes.
2017-06-23 17412, 2017
ruaok
iliekcomputers: after writing a batch of listens, update the latest timestamp in PG.
2017-06-23 17443, 2017
ruaok
latest_import_timestamp TIMESTAMP WITH TIME ZONE
2017-06-23 17405, 2017
ruaok
and if the import from alpha does the same, then we're clear and out of the woods
2017-06-23 17420, 2017
alastairp
make sure that alpha and last.fm imports go from oldest-newst
2017-06-23 17423, 2017
alastairp
newest
2017-06-23 17433, 2017
iliekcomputers
that would be extra work considering they both go newest-oldest right now, could we not just check the field each batch and update if it is smaller than our current ts (or would that be too bad for performance?)
2017-06-23 17416, 2017
Gore|woerk has quit
2017-06-23 17421, 2017
ruaok
alastairp: why is that order needed?
2017-06-23 17436, 2017
ruaok
can we just store max(stored ts, most recent import ts)?
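A minimal sketch of the max() idea, assuming timestamps are integer Unix seconds and that the stored value may not exist yet. The function name and signature are hypothetical, not anything in the ListenBrainz code:

```python
from typing import Iterable, Optional


def updated_import_timestamp(stored_ts: Optional[int],
                             batch_ts: Iterable[int]) -> Optional[int]:
    """Return the value to store after writing one batch of imported listens.

    Taking the max means the batch order (newest-to-oldest or
    oldest-to-newest) does not matter: the stored value never moves backwards.
    """
    newest_in_batch = max(batch_ts, default=None)
    if newest_in_batch is None:
        # Empty batch: nothing to update.
        return stored_ts
    if stored_ts is None:
        # First import for this user.
        return newest_in_batch
    return max(stored_ts, newest_in_batch)
```

With this rule an older batch arriving after a newer one leaves the stored timestamp alone, which is the property the question above is asking for.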
2017-06-23 17413, 2017
alastairp
hmm
2017-06-23 17429, 2017
alastairp
OK, if influx can handle deduplicating OK
2017-06-23 17410, 2017
ruaok
in theory. :)
2017-06-23 17446, 2017
alastairp
I guess this goes back to our storing stuff in cassandra
2017-06-23 17411, 2017
alastairp
so, my idea for incremental imports was always that the importer would first ask "what was the last thing you have?" to the server
2017-06-23 17415, 2017
alastairp
then only send data from that point
2017-06-23 17441, 2017
alastairp
because if you import some stuff from day 1-14
2017-06-23 17446, 2017
alastairp
then on day 30 you do another import
2017-06-23 17404, 2017
alastairp
the first page of data that gets submitted to the database is day 28-30
2017-06-23 17430, 2017
alastairp
if you want to import all 60,000 pages of your listens each time you do an import; OK
2017-06-23 17451, 2017
alastairp
if that's the case, max() would work fine
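A rough sketch of the "ask the server what it has, then only send newer data" variant, assuming pages arrive newest first and each page is a list of listen timestamps (integer Unix seconds), also newest first. The function name and the page shape are made up for illustration:

```python
from typing import Iterable, List


def listens_to_submit(pages: Iterable[List[int]],
                      latest_import_ts: int) -> List[int]:
    """Collect listens newer than the server's latest imported timestamp.

    Walks pages newest-to-oldest and stops at the first listen that is
    not newer than latest_import_ts, so older pages are never fetched.
    """
    to_send: List[int] = []
    for page in pages:
        for listened_at in page:
            if listened_at <= latest_import_ts:
                # Everything from here on was imported previously.
                return to_send
            to_send.append(listened_at)
    return to_send
```

In the day 1-14 / day 30 example, a second import with `latest_import_ts` at day 14 would send only the day 28-30 page and stop at the first listen from day 14.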
2017-06-23 17403, 2017
ruaok
> the first page of data that gets submitted to the database is day 28-30
2017-06-23 17407, 2017
ruaok
where did 28 come from?
2017-06-23 17415, 2017
alastairp
because a page is what, 50 items?
2017-06-23 17425, 2017
alastairp
containing all the music that you've listened to in the last 2 days
2017-06-23 17400, 2017
alastairp
I think you can do it backwards. let me write some pseudocode
now there's a 'newest to oldest' and 'oldest to newest'
2017-06-23 17406, 2017
alastairp
I always thought that oldest to newest would be the nicer way to do it, but now I see it's possible to do it both ways
2017-06-23 17457, 2017
ruaok
> do some sort of binary search to find the page which contains the last imported item
2017-06-23 17416, 2017
ruaok
this sounds troublesome, for an unclear gain.
2017-06-23 17426, 2017
ruaok
I'm ok with newest to oldest.
2017-06-23 17429, 2017
iliekcomputers
the newest to oldest one is almost exactly what I was thinking, except for the last request announcing the latest_timestamp, we get it ourselves from influx-writer
2017-06-23 17459, 2017
iliekcomputers
but now think alastairp's version is better
2017-06-23 17411, 2017
ruaok
iliekcomputers: the problem is the influx writer cannot discern a listen from an import, can it?
2017-06-23 17422, 2017
alastairp
the gain is that last import date gets set every time a submission gets stored
2017-06-23 17425, 2017
alastairp
in my view
2017-06-23 17434, 2017
alastairp
so that the client doesn't have to send this value
2017-06-23 17455, 2017
iliekcomputers
ruaok: I was thinking we could pass an extra field to additional_info for that
2017-06-23 17457, 2017
ruaok
alastairp: ah, I see.
2017-06-23 17404, 2017
alastairp
I think it's more elegant for cases where the browser window closes, or import crashes
2017-06-23 17420, 2017
alastairp
it's true, finding the page to start from will be difficult
2017-06-23 17425, 2017
ruaok
iliekcomputers: additional info isn't ours to store data in.
2017-06-23 17432, 2017
alastairp
difficult/complex/finicky
2017-06-23 17420, 2017
ruaok
agreed, but the point about browser crash (or whatever) is a good thing to consider.
2017-06-23 17423, 2017
alastairp
iliekcomputers: if you go new->old you have to announce the last_timestamp, because otherwise a crash in the importer will lose listens
2017-06-23 17404, 2017
ruaok
I suppose the recovery isn't terrible -- you do the import again, dups get tossed, until the import completes.
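A toy sketch of that recovery path, assuming writes are idempotent on (user, listened_at). The dict is only a stand-in for the real listen store, and `fail_after` is a hypothetical knob to simulate a crash:

```python
from typing import Dict, List, Optional, Tuple

Listen = Tuple[str, int, str]  # (user, listened_at, track): assumed shape


def write_batch(store: Dict[Tuple[str, int], str], batch: List[Listen]) -> None:
    """Write a batch; a listen with an existing key overwrites itself."""
    for user, listened_at, track in batch:
        store[(user, listened_at)] = track


def run_import(store: Dict[Tuple[str, int], str],
               pages: List[List[Listen]],
               fail_after: Optional[int] = None) -> bool:
    """Import pages in order; return False if the simulated crash hits."""
    for index, page in enumerate(pages):
        if fail_after is not None and index == fail_after:
            return False  # simulated browser close or importer crash
        write_batch(store, page)
    return True
```

Running the import a second time after a crash rewrites the pages that already made it in without creating extra entries, and then fills in the pages that were missed.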
2017-06-23 17411, 2017
alastairp
yup
2017-06-23 17426, 2017
ruaok
I find that is sufficient.
2017-06-23 17433, 2017
alastairp
it sounds like we're happier about importing dups than with cassandra
2017-06-23 17440, 2017
alastairp
in which case, this is an OK approach to me
2017-06-23 17452, 2017
ruaok
yes, influx handles dups
2017-06-23 17450, 2017
iliekcomputers
so oldest to newest is the way to go :)
2017-06-23 17403, 2017
ruaok
phew. :)
2017-06-23 17446, 2017
alastairp
wait. you all just convinced me that newest to oldest is OK
2017-06-23 17413, 2017
ruaok
get our the red pen?
2017-06-23 17415, 2017
ruaok
out
2017-06-23 17431, 2017
iliekcomputers
welp, i meant the opposite
2017-06-23 17436, 2017
iliekcomputers
sorry
2017-06-23 17438, 2017
alastairp
OK :)
2017-06-23 17400, 2017
ruaok
oh.
2017-06-23 17401, 2017
iliekcomputers
brainfade
2017-06-23 17409, 2017
ruaok
meh. lysdexia blows.
2017-06-23 17417, 2017
alastairp
I think we broke ruaok
2017-06-23 17433, 2017
ruaok
been broken for a while.
2017-06-23 17438, 2017
ruaok checks calendar
2017-06-23 17451, 2017
ruaok
ah, yes almost 47 years.
2017-06-23 17455, 2017
Quesito
alastairp: home till sant Juan BBQ tonight...let me know if you will pass by!
2017-06-23 17406, 2017
ruaok
I'm sadly without plans for Sant Juan as of yet. This is my first evening without any social plans and I would totally stay home and have a quiet evening in.
2017-06-23 17411, 2017
ruaok
"quiet"
2017-06-23 17419, 2017
ruaok
wrong evening for that.
2017-06-23 17401, 2017
alastairp
Quesito: 7:30?
2017-06-23 17445, 2017
Quesito
That'll work alastairp!
2017-06-23 17423, 2017
Rotab has quit
2017-06-23 17457, 2017
Rotab joined the channel
2017-06-23 17433, 2017
alastairp
Quesito: in terms of beer, I have a stout, and a few types of hoppy ales
2017-06-23 17439, 2017
alastairp
do you want anything in particular
2017-06-23 17401, 2017
Quesito
Love hoppy ales in the summertime! I'm excited!!!
2017-06-23 17408, 2017
alastairp
ok! :)
2017-06-23 17438, 2017
CatQuest
guys: can you make the "import from alpha" button more conspicuous though, it's easy to just think "oh yes, import" and then click it, and there is no "confirm you want to import from alpha" thing after :/
2017-06-23 17458, 2017
CatQuest
anyway I don't mind not having the option to import from alpha
2017-06-23 17400, 2017
ruaok
yes, on my list of things to do!
2017-06-23 17444, 2017
lazka has quit
2017-06-23 17405, 2017
Quesito
ruaok: find plans!!! our plans will last an hour max before we turn into little beast pumpkins.....someone has to celebrate on my behalf till sunrise walking backward and all!