ishaanshah[m]: you can add the build option back in, copying how it is in acousticbrainz-server's test.sh
2020-05-05 12617, 2020
ishaanshah[m]
alastairp: Sorry my client got disconnected
2020-05-05 12610, 2020
ishaanshah[m]
The test.sh and integration-test.sh work without having to be built if volume has been mounted
2020-05-05 12643, 2020
ishaanshah[m]
So it would be better to not have a build step, right?
2020-05-05 12656, 2020
ishaanshah[m]
As it slows down the process considerably
2020-05-05 12634, 2020
Chinmay3199 joined the channel
2020-05-05 12625, 2020
alastairp
no - it's better to build. because at the moment, if I add a new dependency to requirements.txt there is no way to rebuild the image
2020-05-05 12603, 2020
alastairp
my initial idea when I wrote the script was to have 2 modes: 1 which would build+setup+bring up db+test+take down everything, so that you could just run one command and have it do everything
2020-05-05 12625, 2020
alastairp
and a second set of commands that let you build and bring up manually, so that as you say you can specify a specific test to run and not wait for the other steps
2020-05-05 12641, 2020
alastairp
but this requires you to remember what steps you need to run based on what changes you make
2020-05-05 12606, 2020
alastairp
I'm open to suggestions about a better way of structuring this command to make it better
2020-05-05 12650, 2020
ishaanshah[m]
alastairp: We can maybe add a -b which builds the containers only, Similar to frontend-tester.sh
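[Editor's sketch] The modes discussed above (one command that does everything, a `-b` flag that only builds, and running a specific test against containers that are already up) could be laid out roughly as follows. This is only an illustration in Python, not the actual test.sh from acousticbrainz-server; the step names, `plan_steps`, and `parse_args` are hypothetical.

```python
import argparse
from typing import List, Optional


def plan_steps(build_only: bool = False, test_path: Optional[str] = None) -> List[str]:
    """Return the (hypothetical) ordered steps for one invocation of the test script."""
    if build_only:
        # -b: rebuild the images, e.g. after changing requirements.txt
        return ["build"]
    if test_path:
        # A specific test was requested: assume containers are already built and running
        return [f"test {test_path}"]
    # Default: the one-command mode that does everything
    return ["build", "setup", "db-up", "test", "teardown"]


def parse_args(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Sketch of a test runner front end")
    parser.add_argument("-b", "--build-only", action="store_true",
                        help="only build the containers")
    parser.add_argument("test_path", nargs="?", default=None,
                        help="optional single test to run")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["-b"])
    print(plan_steps(args.build_only, args.test_path))
```

The trade-off raised in the chat still applies: with separate modes, the person running the script has to remember to rebuild after dependency changes.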
ah right - this is cases where there are 2 sources, where the only thing that differs is the timestamp is off by a second, so you choose only one?
2020-05-05 12607, 2020
alastairp
s/2 sources/2 listens/
2020-05-05 12611, 2020
ruaok
I could use a second set of eyes (or six) to sanity check this duplicate handling code.
2020-05-05 12638, 2020
ruaok
alastairp: that is the fuzzy case, yes.
2020-05-05 12645, 2020
alastairp
that's neat
2020-05-05 12656, 2020
ruaok
but there are also other duplicates that we bodged into influx that we should clean up.
2020-05-05 12613, 2020
ruaok
I am waiting for a new dump to take care of the latest causes for duplicate data.
2020-05-05 12623, 2020
ruaok
but in the meantime, I'd appreciate looking at this to see if it makes sense.
2020-05-05 12637, 2020
ruaok
there are some minuscule stats at the bottom of the HTML file.
2020-05-05 12644, 2020
alastairp
and those bodges were the ones that someone was talking about a few days ago - exactly the same, but some additional flag in influx to be able to add them to stop a conflict?
2020-05-05 12606, 2020
shivam-kapila
Yeah Zastai was talking about it
2020-05-05 12613, 2020
ruaok
alastairp: yes.
2020-05-05 12617, 2020
ruaok
shivam-kapila: no, those are different.
2020-05-05 12631, 2020
ruaok
those are the ones I am still waiting data on
2020-05-05 12638, 2020
shivam-kapila
Oh sorry then
2020-05-05 12651, 2020
ruaok
np.
2020-05-05 12602, 2020
ruaok
zastai's data had no duplicates as of feb 2020.
2020-05-05 12612, 2020
ruaok
Freso: you also had a ton of dups in your stream, yes
2020-05-05 12612, 2020
ruaok
?
2020-05-05 12634, 2020
ruaok
anyone who has had dups and would like them cleaned up should provide me an example ASAP.
2020-05-05 12648, 2020
alastairp
ok, I'll have a look through the code. to confirm, the lookahead is you going forward in the listens a certain number of seconds in order to see if there are any dups?
2020-05-05 12651, 2020
alastairp
did ollie have some?
2020-05-05 12658, 2020
Freso
ruaok: I think I might, but those also exist in Last.fm. Not sure if I have new/unique ones in LB.
2020-05-05 12607, 2020
alastairp
or was his case that he was sending prod listens to alpha and wanted to keep them?
2020-05-05 12609, 2020
Freso
Not sure when or how old they are though.
2020-05-05 12611, 2020
ruaok
alastairp: correct.
2020-05-05 12626, 2020
ruaok
correct to the first question, re lookahead.
2020-05-05 12631, 2020
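[Editor's sketch] The "fuzzy" duplicate case and the lookahead confirmed above (scan forward a certain number of seconds from each listen and drop later listens that differ only in timestamp) might look something like this. It is a minimal sketch, not ruaok's actual duplicate handling code; the `(timestamp, track)` tuple shape, the function name, and the default one-second window are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical listen shape: (unix timestamp in seconds, track identifier)
Listen = Tuple[int, str]


def drop_fuzzy_duplicates(listens: List[Listen], window: int = 1) -> List[Listen]:
    """Keep the earliest listen of each group of near-identical listens.

    For every kept listen, look ahead up to `window` seconds and mark later
    listens of the same track as duplicates.
    """
    ordered = sorted(listens)
    dropped = set()
    kept: List[Listen] = []
    for i, (ts, track) in enumerate(ordered):
        if i in dropped:
            continue
        kept.append((ts, track))
        j = i + 1
        # Lookahead: only listens within `window` seconds can be fuzzy duplicates
        while j < len(ordered) and ordered[j][0] - ts <= window:
            if ordered[j][1] == track:
                dropped.add(j)
            j += 1
    return kept


if __name__ == "__main__":
    sample = [(100, "a"), (101, "a"), (101, "b"), (105, "a")]
    print(drop_fuzzy_duplicates(sample))
```

Real listens carry more metadata than a single track identifier, so the equality check would need to compare whatever fields define "the same listen".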
alastairp
ruaok: cool, will look through it later this afternoon. got it
2020-05-05 12637, 2020
ruaok
alastairp: great thanks.
2020-05-05 12649, 2020
ruaok
Freso: I'll include you in the next test run and then we'll see.
2020-05-05 12600, 2020
ruaok
ollie as in acid2?
2020-05-05 12613, 2020
Freso
Alright. 👍
2020-05-05 12643, 2020
alastairp
yes, acid2
2020-05-05 12651, 2020
ruaok
k
2020-05-05 12629, 2020
alastairp
I remember us doing something special for him, but I think it was just that we copied listens from alpha to beta, even when we said that we wouldn't
2020-05-05 12619, 2020
shivam-kapila
This might be the reason for influx having mixed shards. For some users that doesn't allow deleting listens
2020-05-05 12646, 2020
iliekcomputers
the dump died for some reason
2020-05-05 12650, 2020
iliekcomputers
there's an error in sentry
2020-05-05 12655, 2020
iliekcomputers
:/
2020-05-05 12610, 2020
ruaok
doh
2020-05-05 12613, 2020
alastairp
btw, sentry 10 is out too. I have to upgrade it at the uni too
2020-05-05 12632, 2020
ruaok
alastairp: shall we chat about the recommendation stuff for a minute?
2020-05-05 12641, 2020
alastairp
sure, let me open it
2020-05-05 12645, 2020
alastairp
what kind of chat do you want to have?
2020-05-05 12630, 2020
ruaok
overall approach.
2020-05-05 12656, 2020
ruaok
I read and digested large chunks of the music recommendation chapter. a really nice summary, I must say.
2020-05-05 12623, 2020
ruaok
I didn't pay a whole lot of attention to the context focused recommendations, since we have very little of that data.
2020-05-05 12646, 2020
ruaok
I have two goals with all of this:
2020-05-05 12651, 2020
alastairp
they were talking about rewriting it for a new edition of the book, covering some new tools, that will be good
2020-05-05 12630, 2020
ruaok
1. Create a project that will open up people to come in and play with recommendations
2020-05-05 12658, 2020
ruaok
2. Draw people in to improve the data sets that feed into the tools that are being created in order to create the recommendations. General outreach.
2020-05-05 12640, 2020
ruaok
from the high level perspective if our goal was to get the best recommendations, we should be focusing on the collaborative filtering stuff.
2020-05-05 12609, 2020
alastairp
so initial key deliverables are going to be 1) get data for us and others to use, 2) make some demos using that data, 3) enhance the data as needed as we see that we need more stuff
2020-05-05 12618, 2020
ruaok
however, that is in a lot of ways, not low hanging fruit -- there is a lot of stuff that still needs to be done and pristine__'s momentum has stalled and her contract is up soon.
2020-05-05 12640, 2020
alastairp
right, I think getting the initial use-cases clear is an important first step
2020-05-05 12645, 2020
ruaok
yes, more or less those deliverables are about right.
2020-05-05 12606, 2020
ruaok
yes, and that might be a good goal for this convo:
2020-05-05 12615, 2020
alastairp
that is, a general "you might want to listen to this" is quite different imo from "here's a playlist tailored to you based on some criteria that you specified"