ishaanshah[m]: you can add the build option back in, copying how it is in acousticbrainz-server's test.sh
ishaanshah[m]
alastairp: Sorry my client got disconnected
The test.sh and integration-test.sh work without having to be built if volume has been mounted
So it would be better to not have build step right?
As it slows down the process considerably
Chinmay3199 joined the channel
alastairp
no - it's better to build. because at the moment, if I add a new dependency to requirements.txt there is no way to rebuild the image
my initial idea when I wrote the script was to have 2 modes: 1 which would build+setup+bring up db+test+take down everything, so that you could just run one command have have it do everything
and a second set of commands that let you build and bring up manually, so that as you say you can specify a specific test to run and not wait for the other steps
but this requires you to remember what steps you need to run based on what changes you make
I'm open to suggestions about a better way of structuring this command to make it better
ishaanshah[m]
alastairp: We can maybe add a -b which builds the containers only, Similar to frontend-tester.sh
ah right - this is cases where there are 2 sources, where the only thing that differs is the timestamp is off by a second, so you choose only one?
s/2 sources/2 listens/
ruaok
if could use a second set of eyes (or six) to sanity check this duplicate handling code.
alastairp: that is the fuzzy case, yes.
alastairp
that's neat
ruaok
but there are also other duplicates that we bodged into influx that we should clean up.
I am waiting for a new dump to take care of the latest causes for duplicate data.
but in the meantime, I'd appreciate looking at this to see if it makes sense.
there are some miniscule stats at the bottom of the HTML file.
alastairp
and those bodges were the ones that someone was talking about a few days ago - exactly the same, but some additional flag in influx to be able to add them to stop a conflict?
shivam-kapila
Yeah Zastai was talking about it
ruaok
alastairp: yes.
shivam-kapila: no, those are different.
those are the ones I am still waiting data on
shivam-kapila
Oh sorry then
ruaok
np.
zastai's data had no duplicates as of feb 2020.
Freso: you also had a ton of dups in your stream, yes
?
anyone who has had dups and would like them cleaned up, should provide me and example ASAP.
alastairp
ok, I'll have a look through the code. to confirm, the lookahead is you going forward in the listens a certain number of seconds in order to see if there are any dups?
did ollie have some?
Freso
ruaok: I think I might, but those also exist in Last.fm. Not sure if I have new/unique ones in LB.
alastairp
or was his case that he was sending prod listens to alpha and wanted to keep them?
Freso
Not sure when or how old they are though.
ruaok
alastairp: correct.
correct to the first question, re lookahead.
alastairp
ruaok: cool, will look through it later this afternoon. goti t
ruaok
alastairp: great thanks.
Freso: I'll include you in the next test run and then we'll see.
ollie as in acid2?
Freso
Alright. 👍
alastairp
yes, acid2
ruaok
k
alastairp
I remember us doing something special for him, but I think it was just that we copied listens from alpha to beta, even when we said that we wouldn't
shivam-kapila
This might be reason for influx having mixed shards. For some users that doesnt allow to delete listens
iliekcomputers
the dump died for some reason
there's an error in sentry
:/
ruaok
doh
alastairp
btw, sentry 10 is out too. I have to upgrade it at the uni too
ruaok
alastairp: shall we chat about the recommendation stuff for a minute?
alastairp
sure, let me open it
what kind of chat do you want to have?
ruaok
overall approach.
I read and disgested large chunks of the music recommendation chapter. a really nice summary, I must say.
I didn't pay a whole lot of attention to the context focused recommendations, since we have very little of that data.
I have two goals with all of this:
alastairp
they were talking about rewriting it for a new edition of the book, covering some new tools, that will be good
ruaok
1. Create a project that will open up people to come in an play with recommendations
2. Draw people in to improve the data sets that feed into the tools that are being created in order to create the recommendations. General outreach.
from the high level perspective if our goal was to get the best recommendations, we should be focusing on the collaborative filtering stuff.
alastairp
so initial key deliverables are going to be 1) get data for us and others to use, 2) make some demos using that data, 3) enhance the data as needed as we see that we need more stuff
ruaok
however, that is in a lot of ways, not low hanging fruit -- there is a lot of stuff that still needs to be done and pristine__'s moment has stalled and her contract is up soon.
alastairp
right, I think getting the initial use-cases clear is an important first step
ruaok
yes, more or less those deliverables are about right.
yes, and that might be a good goal for this convo:
alastairp
that is, a general "you might want to listen to this" is quite different imo from "here's a playlist tailored to you based on some criteria that you specified"