do you know what's happening / what the best way to fix this problem is?
ZaphodBeeblebrox
I've tried newer itunes. but they're all shit. and this one at least just plays whatever I want and stuff
hmmm 🤔 one idea is to also get QuodLibet into this
a plugin for QuodLibet would be acebeams
(then I just need to figure out how to shuffle a playlist on album not song and i'm ready to switch :D)
BrainzGit
[listenbrainz-server] paramsingh merged pull request #736 (master…param/mail-on-dump-cron-job): Send a notification email when new data dumps are created https://github.com/metabrainz/listenbrainz-serv...
ruaok
iliekcomputers: timescale is doing 20k inserts/s loading from the dump
iliekcomputers
how much did influx do?
how are you importing?
ruaok
I forget what it was doing.
I just wrote a script that loads the data into a PG table -- I just added one extra call to create_hypertable(), otherwise it is bog-standard stuff.
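(A minimal sketch of what such a loader might look like, assuming psycopg2 and a simplified listen table; the column names and dump format below are guesses for illustration, not the actual ListenBrainz schema.)

```python
# Hypothetical dump loader: plain Postgres plus one create_hypertable() call.
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=listenbrainz_ts")

with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS listen (
            listened_at TIMESTAMPTZ NOT NULL,
            user_name   TEXT        NOT NULL,
            data        JSONB       NOT NULL
        )
    """)
    # The only Timescale-specific step: make the table a hypertable,
    # partitioned on the time column.
    cur.execute("SELECT create_hypertable('listen', 'listened_at', if_not_exists => TRUE)")

def insert_batch(rows):
    """rows: an iterable of (listened_at, user_name, data) tuples parsed from a dump file."""
    with conn, conn.cursor() as cur:
        # Bog-standard bulk insert, exactly as with vanilla Postgres.
        execute_values(cur, "INSERT INTO listen VALUES %s", rows)
```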
iliekcomputers
woah
that's pretty nice
amCap1712
ZaphodBeeblebrox: it seems to be written in objective-c which apple abandoned a few years ago. it may work but needs a major overhaul
ruaok
yeah, and all of PG's functionality is available to us.
iliekcomputers
can i look at the code?
k, we should get emails when dumps are created now.
i've roped in ferbncode to look into using protobuf instead of json in our queues
ferbncode: 👋🏽
ruaok
nice!
iliekcomputers
>"INSERT INTO listen VALUES %s"
👏🏽 👏🏽
ruaok
using timescale is ridiculously easy if you are used to PG.
I think I am going to re-write this to load from influx.
our dumps don't seem to include inserted_at timestamps.
which I really want to migrate
iliekcomputers
>our dumps don't seem to include inserted_at timestamps
add to trello?
sounds like a good first bug
ruaok
jira if anything.
though I am not 100% sure on that. did we backfill those?
I looked at very early data.
iliekcomputers
no, there is no way to do that.
plus influx is a pita anyways
ruaok
yeah.
iliekcomputers
we started tracking it a long time ago tho
ruaok
if this approach looks promising, I will write the migration tool directly from influx.
will give us a better chance at being consistent.
because I can start writing new listens to it and THEN do the import of everything in influx.
iliekcomputers
the idea is to start shadowing the queue at timestamp x, and retroactively load everything before timestamp x, right?
ruaok
when that process is done, we have everything.
iliekcomputers
yeah makes sense.
ruaok
I wasn't even going to be so picky about X.
the system is designed to kick out dups, so let it kick out the dups at the tail end.
ZaphodBeeblebrox
y?
iliekcomputers
yeah i guess that makes sense.
wait
but the dups are in the queue
dup logic*
are you gonna take from influx and put it in the queue?
ruaok
first, and this is what I am going to write right now, is to write a timescale_writer, much like the influx_writer.
pristine__
iliekcomputers: what is odd about it? I have no idea tbh
ruaok
then add a new queue off the incoming exchange and have it start writing the new listens.
iliekcomputers
pristine__: the test is very flaky, it fails randomly and then passes on rebuild
and it's only that test
ruaok
at some point we will need to swap over and have the timescale writer write to the unique rmq, but that would be one of the last things to do.
amCap1712
ZaphodBeeblebrox: completed the testing or still on?
ruaok
iliekcomputers: does that make sense wrt rmq? add a new queue off the incoming exchange, and then two consumers can consume the data at their own pace, effectively duplicating the data.
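(A minimal pika sketch of that fanout setup; the exchange and queue names here are made up for illustration.)

```python
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()

# The incoming exchange is a fanout: every published listen is copied
# to all queues bound to it.
channel.exchange_declare(exchange="incoming", exchange_type="fanout", durable=True)

# One queue per writer. Each consumer (influx_writer, timescale_writer)
# drains its own queue at its own pace, so both see every listen.
for queue in ("incoming_influx", "incoming_timescale"):
    channel.queue_declare(queue=queue, durable=True)
    channel.queue_bind(queue=queue, exchange="incoming")
```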
iliekcomputers
yes. i've done that with the follow feature
ZaphodBeeblebrox
amCap1712: ag i'm playing something, i'll test some more
(just need food)
iliekcomputers
it follows the unique queue, iirc
ruaok
great.
amCap1712
yeah sure let me know how it goes ZaphodBeeblebrox
iliekcomputers
ruaok: i'm with you until here.
how are you gonna backfill from influx to timescale
ruaok
1. setup timescale
2. write code to fetch EVERY listen
pristine__
iliekcomputers: yeah, I get that but I am not sure why it happens. Will have a look
ruaok
3. insert into timescale.
ZaphodBeeblebrox
lol this. same Arrepentimientos song goes to two different recordings??
ruaok
1.a setup live writing from the incoming queue
iliekcomputers
ah
ZaphodBeeblebrox
ah one is the borked itunes track-thing
s/itunes/last.fm/
oh dear. it
does apparently scrobble to last.fm even if i turned that damn thing off
and now they're duplicated :D
iliekcomputers
ruaok: 1. full import directly from influx 2. start shadowing the queue
this is what you mean basically right
ruaok
no, the other way around.
start shadowing, then cross-load
iliekcomputers
1. start shadowing the queue.
2. full import directly from influx
that would lead to duplicates
unless you write the dedup logic into the import process
ruaok
postgres won't allow duplicates
iliekcomputers
👏🏽
ZaphodBeeblebrox
hmm removed everything I can of audioshattler, let's see if now it doesn't
ruaok
now, that might make the migration quite slow, not sure.
iliekcomputers
yeah
ruaok
oh!
iliekcomputers
migrations without constraints is what i learned from the AB migration
ruaok
right. heh. rabbitmq.
iliekcomputers
why not just do the timestamp logic?
1. start shadowing the queue to insert all listens (starting timestamp x)
ruaok
timestamp logic for de-duping? yes, the primary key will be (timestamp, username)
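(A small sketch of how that primary key would kick out dups on insert; using ON CONFLICT DO NOTHING is an assumption about the approach, and the column names follow the simplified schema sketched earlier.)

```python
from psycopg2.extras import execute_values

INSERT_SQL = """
    INSERT INTO listen (listened_at, user_name, data)
         VALUES %s
    ON CONFLICT (listened_at, user_name) DO NOTHING
"""

def write_listens(conn, rows):
    with conn, conn.cursor() as cur:
        # Rows whose (listened_at, user_name) already exist are silently skipped,
        # so replaying overlapping data is harmless.
        execute_values(cur, INSERT_SQL, rows)
```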
ZaphodBeeblebrox
also testing restarting the same song
ruaok
oic
iliekcomputers
2. directly import from influx all listens inserted to influx before timestamp x
ruaok
yeah, given the no constraints bit, that might actually be easier.
I *really* hope this lives up to its promises.
if it does, it is the perfect solution for us.
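(Putting the pieces together, a rough sketch of the backfill step: pull everything older than the cutoff out of influx and push it through the same dedup-ing insert, so any overlap with what the new writer has already shadowed gets dropped. The influx measurement/field names and the single unpaginated query are simplifications for illustration.)

```python
from influxdb import InfluxDBClient
from psycopg2.extras import execute_values

INSERT_SQL = """
    INSERT INTO listen (listened_at, user_name, data)
         VALUES %s
    ON CONFLICT (listened_at, user_name) DO NOTHING
"""

def backfill(pg_conn, cutoff_iso):
    """Copy every listen written to influx before the cutoff into timescale."""
    influx = InfluxDBClient(host="localhost", database="listenbrainz")
    result = influx.query("SELECT * FROM listen WHERE time < '%s'" % cutoff_iso)
    rows = [(p["time"], p["user_name"], p["data"]) for p in result.get_points()]
    with pg_conn, pg_conn.cursor() as cur:
        # Duplicates from the shadowing overlap are rejected by the primary key.
        execute_values(cur, INSERT_SQL, rows)
```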
iliekcomputers
can't wait to migrate over to the new thing 3 years later
(kidding)
ruaok
well, I've only ever migrated to PG once.
best decision evar.
ZaphodBeeblebrox
amCap1712: I fixed audioscrobbler bork!
amCap1712
great
iliekcomputers
ruaok: so about shadowing a queue: it's more like shadowing an exchange which pushes to two queues, iirc