in #metabrainz

13:39 PM
alastairp

ishaanshah[m]: you can add the build option back in, copying how it is in acousticbrainz-server's test.sh
13:45 PM
ishaanshah[m]

alastairp: Sorry my client got disconnected
13:48 PM
The test.sh and integration-test.sh work without having to be rebuilt if the volume has been mounted
13:48 PM
So it would be better to not have build step right?
13:48 PM
As it slows down the process considerably
13:49 PM
Chinmay3199 joined the channel
13:50 PM
alastairp

no - it's better to build. because at the moment, if I add a new dependency to requirements.txt there is no way to rebuild the image
13:51 PM
my initial idea when I wrote the script was to have 2 modes: 1 which would build+setup+bring up db+test+take down everything, so that you could just run one command and have it do everything
13:51 PM
and a second set of commands that let you build and bring up manually, so that as you say you can specify a specific test to run and not wait for the other steps
13:51 PM
but this requires you to remember what steps you need to run based on what changes you make
13:52 PM
I'm open to suggestions about a better way of structuring this command to make it better
13:58 PM
ishaanshah[m]

alastairp: We could maybe add a -b which only builds the containers, similar to frontend-tester.sh
13:58 PM
https://github.com/metabrainz/listenbrainz-serv...
13:59 PM
Sorry for the late reply, matrix is acting weird
14:00 PM
alastairp

ishaanshah[m]: yes, exactly. however, it should also build when running with no arguments
14:01 PM
if there are 3 test scripts in listenbrainz I strongly suggest that you think about unifying them
14:01 PM
shivam-kapila

ishaanshah[m]: I am doing the -b one. I will push in a while
14:02 PM
alastairp: You suggest that test containers should be rebuilt every time we run test.sh?
14:02 PM
ishaanshah[m]

Ok, maybe we could do -nb which will skip building
14:03 PM
alastairp

shivam-kapila: yes
14:03 PM
ishaanshah[m]

So that we will be conscious about what we are doing
14:03 PM
shivam-kapila

Yeah -nb makes sense
14:03 PM
alastairp

I don't think -nb is useful
14:03 PM
do you understand what I said about the two modes of running tests?
14:04 PM
ishaanshah[m]

Yes, I understood but was thinking about how to skip that part
14:05 PM
shivam-kapila

Yeah the build and up and run for all tests
14:05 PM
And the 2nd one for specific test run
14:05 PM
ishaanshah[m]

Because waiting each time after a small change is a bit frustrating
14:05 PM
shivam-kapila

Yes. For that I agree with -nb.
14:05 PM
Mostly there is no change in requirements.txt
14:06 PM
alastairp

but if you follow the behaviour currently in test.sh you can already do this
14:06 PM
$ ./test.sh -u
14:06 PM
$ ./test.sh path/to/testfile.py # as many times as you need to
14:06 PM
this will not build before running tests
14:07 PM
shivam-kapila

Oh yes -u makes sense. By default if the db is up it means we can skip build
14:07 PM
alastairp

👍 exactly
14:08 PM
the idea is to make a standalone call to test.sh do the right thing. and I believe that the right thing here is to build the image
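For readers following the flag discussion above, here is a rough model of the behaviour being converged on: no arguments rebuilds the image and runs everything (build+setup+bring up db+test+take down), `-u` only starts the database and does not build, a test path runs just that file against an already-running database, and the proposed `-b` only builds. test.sh itself is a shell script; this Python sketch is hypothetical and only models the mode selection. The step names (`build`, `setup`, `up_db`, `run_all_tests`, `down`) are made up for illustration.

```python
from typing import List

# Hypothetical step names, used only to illustrate the order of operations.
ALL_STEPS = ["build", "setup", "up_db", "run_all_tests", "down"]


def plan_test_run(args: List[str]) -> List[str]:
    """Return the ordered steps a test.sh invocation would perform (model only)."""
    if not args:
        # Standalone run: always rebuild so that a new dependency in
        # requirements.txt is picked up, then set up, test and tear down.
        return list(ALL_STEPS)
    if args == ["-b"]:
        # Proposed -b flag: build the test images and do nothing else.
        return ["build"]
    if args == ["-u"]:
        # -u only starts the database server; it does not build.
        return ["up_db"]
    unknown = [a for a in args if a.startswith("-")]
    if unknown:
        raise ValueError(f"unknown option(s): {unknown}")
    # Anything else is treated as test paths, run against the already-running
    # database with no build step, so they can be repeated quickly.
    return [f"run_tests:{path}" for path in args]


# Fast iteration loop: start the db once, then rerun a single test file.
print(plan_test_run(["-u"]))
print(plan_test_run(["path/to/testfile.py"]))
```

The design choice argued for is visible in the first branch: a bare call always rebuilds, so it is always correct, while speed comes from the explicit `-u` then path workflow rather than from a skip-build flag such as the proposed `-nb`.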
14:09 PM
ishaanshah[m]

Ok, makes sense
14:10 PM
shivam-kapila

> $ ./test.sh path/to/testfile.py
14:10 PM
Does AB have this currently?
14:10 PM
alastairp

yes
14:10 PM
shivam-kapila

It will be good to be in sync with other projects IMO
14:11 PM
alastairp

I wondered how we could share a single test.sh file over all projects, but I don't know the best way to do this
14:11 PM
ishaanshah[m]

I think that functionality has not been implemented in integration-test.sh yet
14:11 PM
shivam-kapila

The build one?
14:12 PM
ishaanshah[m]: ^
14:12 PM
ishaanshah[m]

the -u one
14:12 PM
shivam-kapila

Hmm... integration-test.sh doesn't have any options AFAIK
14:12 PM
It's a fixed set of commands
14:13 PM
ishaanshah[m]

yes, are you adding it?
14:13 PM
shivam-kapila

Not currently. I am not sure about its need for integration tests.
14:14 PM
alastairp: Can you offer an opinion for the same?
14:14 PM
ishaanshah[m]

To skip the build part?
14:14 PM
alastairp

it depends on how quickly you need feedback on those test runs
14:14 PM
if integration tests are _only_ run in CI, it probably doesn't matter
14:15 PM
if you need to run a single integration test yourself many times, perhaps it's a good idea
14:15 PM
ishaanshah[m]

Oh, I was adding an integration test
14:15 PM
shivam-kapila

ishaanshah[m]: It's not a skip, actually. You will always need to run -u so that the build is done once
14:15 PM
ishaanshah[m]

Yes, but doing it once is fine
14:15 PM
alastairp

keep in mind that -u in acousticbrainz and listenbrainz does _not_ build
14:16 PM
it only starts the database server
14:16 PM
but perhaps that's a good idea
14:16 PM
shivam-kapila

But it's good to rebuild, to be consistent with the build-every-time approach
14:16 PM
ishaanshah[m]

I didn't know that integration tests are run in CI only
14:16 PM
I guess it's not needed then
14:17 PM
alastairp

shivam-kapila: yeah, perhaps that's a good idea
14:17 PM
shivam-kapila

ishaanshah[m]: Actually, you should run them on your local setup too
14:17 PM
alastairp

ishaanshah[m]: I didn't say that
14:17 PM
I don't know how integration tests are used
14:17 PM
shivam-kapila

ishaanshah[m]: To run tests in a file you can add the path in integration-test.sh
14:18 PM
ishaanshah[m]

Yes, I did that, the only thing that was bugging me was the time to get feedback
14:18 PM
shivam-kapila

Yes, it builds every time
14:19 PM
We can structure integration-test.sh to be similar to test.sh. I think we should wait for iliekcomputers' opinion too
14:20 PM
ishaanshah[m]

Yes, I will just comment out the build part in my local setup for now
14:20 PM
shivam-kapila

Lol. Workarounds
14:20 PM
ishaanshah[m]

Thanks for the -u tip alastairp. I didn't know that :D
14:21 PM
I will add back the build command in test.sh then
14:22 PM
shivam-kapila

ishaanshah[m]: Can I continue with that? I was already adding a build part to it.
14:22 PM
ishaanshah[m]

ya sure
14:22 PM
alastairp

ishaanshah[m]: perhaps this should be added to some documentation, to make it clearer that this functionality exists
14:22 PM
ishaanshah[m]

thanks
14:23 PM
shivam-kapila

-u is in docs alastairp
14:23 PM
https://github.com/metabrainz/listenbrainz-serv...
14:24 PM
Here the flags are listed
14:26 PM
ruaok

iliekcomputers, alastairp: https://github.com/mayhem/timescale-testing/blo...
14:26 PM
this is output from my timescale import program here: https://github.com/mayhem/timescale-testing/blo...
14:27 PM
this is a subset of all listens from a feb dump that includes iliekcomputers, rob and zastai.
14:28 PM
zastai's duplicates are not in there yet, but the last.fm fuzzy-type dupes (off by ~1 sec) and straight-up duplicates are handled.
14:28 PM
the output shows which listens were identified as duplicates, which one was chosen/rejected and the diff between the two listens.
14:28 PM
the duplicate detection logic is here:
14:28 PM
https://github.com/mayhem/timescale-testing/blo...
14:28 PM
alastairp

ah right - these are cases where there are 2 listens, where the only thing that differs is the timestamp being off by a second, so you choose only one?
14:29 PM
ruaok

I could use a second set of eyes (or six) to sanity check this duplicate handling code.
14:29 PM
alastairp: that is the fuzzy case, yes.
14:29 PM
alastairp

that's neat
14:29 PM
ruaok

but there are also other duplicates that we bodged into influx that we should clean up.
14:30 PM
I am waiting for a new dump to take care of the latest causes for duplicate data.
14:30 PM
but in the meantime, I'd appreciate looking at this to see if it makes sense.
14:30 PM
there are some minuscule stats at the bottom of the HTML file.
14:30 PM
alastairp

and those bodges were the ones that someone was talking about a few days ago - exactly the same, but with some additional flag in influx so they could be added without a conflict?
14:31 PM
shivam-kapila

Yeah Zastai was talking about it
14:31 PM
ruaok

alastairp: yes.
14:31 PM
shivam-kapila: no, those are different.
14:31 PM
those are the ones I am still waiting for data on
14:31 PM
shivam-kapila

Oh sorry then
14:31 PM
ruaok

np.
14:32 PM
zastai's data had no duplicates as of feb 2020.
14:32 PM
Freso: you also had a ton of dups in your stream, yes?
14:32 PM
anyone who has had dups and would like them cleaned up should provide me an example ASAP.
14:32 PM
alastairp

ok, I'll have a look through the code. to confirm, the lookahead is you going forward in the listens a certain number of seconds in order to see if there are any dups?
14:32 PM
did ollie have some?
14:32 PM
Freso

ruaok: I think I might, but those also exist in Last.fm. Not sure if I have new/unique ones in LB.
14:33 PM
alastairp

or was his case that he was sending prod listens to alpha and wanted to keep them?
14:33 PM
Freso

Not sure when or how old they are though.
14:33 PM
ruaok

alastairp: correct.
14:33 PM
correct to the first question, re lookahead.
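The duplicate handling described above (straight-up duplicates, plus last.fm-style fuzzy duplicates whose timestamps are off by about a second, found by looking ahead a few seconds through the listens in timestamp order) can be sketched as follows. This is not the code in the linked timescale-testing repository; the `Listen` fields, the rule that two listens match when user, artist and track name are equal, the 1-second default window and the keep-the-earliest choice are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Listen:
    # Hypothetical minimal listen record; real listens carry more fields.
    user_name: str
    listened_at: int  # Unix timestamp in seconds
    artist_name: str
    track_name: str


def same_track(a: Listen, b: Listen) -> bool:
    """Assumed match rule: same user, artist and track name."""
    return (a.user_name, a.artist_name, a.track_name) == \
           (b.user_name, b.artist_name, b.track_name)


def deduplicate(listens: List[Listen], lookahead: int = 1
                ) -> Tuple[List[Listen], List[Tuple[Listen, Listen]]]:
    """Return (kept, rejected_pairs), each pair being (kept_listen, duplicate).

    Listens are scanned in timestamp order. For each kept listen we look ahead
    at later listens at most `lookahead` seconds away; matching ones are
    duplicates (0 s apart for exact dupes, ~1 s for fuzzy ones) and rejected.
    """
    ordered = sorted(listens, key=lambda listen: listen.listened_at)
    rejected_idx = set()
    kept: List[Listen] = []
    rejected: List[Tuple[Listen, Listen]] = []
    for i, listen in enumerate(ordered):
        if i in rejected_idx:
            continue
        kept.append(listen)
        j = i + 1
        # Lookahead window: stop once the time gap exceeds `lookahead`.
        while j < len(ordered) and ordered[j].listened_at - listen.listened_at <= lookahead:
            if j not in rejected_idx and same_track(listen, ordered[j]):
                rejected_idx.add(j)
                rejected.append((listen, ordered[j]))
            j += 1
    return kept, rejected


example = [
    Listen("user_a", 100, "Artist A", "Track X"),
    Listen("user_a", 101, "Artist A", "Track X"),  # fuzzy duplicate, 1 s later
    Listen("user_a", 100, "Artist A", "Track X"),  # exact duplicate
    Listen("user_a", 400, "Artist A", "Track Y"),
]
kept, rejected = deduplicate(example)
for chosen, dup in rejected:
    # Report the chosen listen and the timestamp diff of the rejected one.
    print(chosen.track_name, dup.listened_at - chosen.listened_at)
```

Because the window is measured from the kept listen, a chain of listens each 1 s apart is not collapsed into one; whether the real code handles that case the same way is not stated in the discussion.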
14:33 PM
alastairp

ruaok: cool, will look through it later this afternoon. got it
14:33 PM
ruaok

alastairp: great thanks.
14:33 PM
Freso: I'll include you in the next test run and then we'll see.
14:34 PM
ollie as in acid2?
14:34 PM
Freso

Alright. 👍
14:34 PM
alastairp

yes, acid2
14:34 PM
ruaok

k
14:35 PM
alastairp

I remember us doing something special for him, but I think it was just that we copied listens from alpha to beta, even when we said that we wouldn't
14:36 PM
shivam-kapila

This might be the reason for influx having mixed shards, which for some users doesn't allow deleting listens
14:40 PM
iliekcomputers

the dump died for some reason
14:40 PM
there's an error in sentry
14:40 PM
:/
14:42 PM
ruaok

doh
14:44 PM
alastairp

btw, sentry 10 is out too. I have to upgrade it at the uni too
14:44 PM
ruaok

alastairp: shall we chat about the recommendation stuff for a minute?
14:45 PM
alastairp

sure, let me open it
14:45 PM
what kind of chat do you want to have?
14:46 PM
ruaok

overall approach.
14:46 PM
I read and digested large chunks of the music recommendation chapter. a really nice summary, I must say.
14:47 PM
I didn't pay a whole lot of attention to the context focused recommendations, since we have very little of that data.
14:47 PM
I have two goals with all of this:
14:47 PM
alastairp

they were talking about rewriting it for a new edition of the book, covering some new tools, that will be good
14:48 PM
ruaok

1. Create a project that will open the door for people to come in and play with recommendations
14:48 PM
2. Draw people in to improve the data sets that feed into the tools that are being created in order to create the recommendations. General outreach.
14:49 PM
from a high-level perspective, if our goal was to get the best recommendations, we should be focusing on the collaborative filtering stuff.
14:50 PM
alastairp

so initial key deliverables are going to be 1) get data for us and others to use, 2) make some demos using that data, 3) enhance the data as needed as we see that we need more stuff
14:50 PM
ruaok

however, that is in a lot of ways not low-hanging fruit -- there is a lot of stuff that still needs to be done, pristine__'s momentum has stalled and her contract is up soon.
14:50 PM
alastairp

right, I think getting the initial use-cases clear is an important first step
14:50 PM
ruaok

yes, more or less those deliverables are about right.
14:51 PM
yes, and that might be a good goal for this convo:
14:51 PM
alastairp

that is, a general "you might want to listen to this" is quite different imo from "here's a playlist tailored to you based on some criteria that you specified"