in #metabrainz

18:00 PM
reosarevok

So you did manage to break MB after all!
18:00 PM
Good job
18:00 PM
alfie

:D
18:01 PM
CallerNo6

IIUC that means that you get to keep both pieces.
18:02 PM
reosarevok

I mean, some of us break stuff all the time, but breaking stuff in your first day is a strong start
18:02 PM
As long as you keep reporting it, go and break as much as you manage! ;)
18:02 PM
alfie

:D that's my job!
18:11 PM
alastairp

kartikgupta0909: right. passing from the server to the client is an interesting task
18:12 PM
kartikgupta0909

are we using json for the low level data or switching to something else?
18:12 PM
alastairp

currently we get items from the database for the model processing step one at a time: https://github.com/metabrainz/acousticbrainz-se...
18:12 PM
I'm interested in changing this so that we load 100 at a time or so, which should be much faster
18:12 PM
but we still need to make a decision on how to transmit the data
18:13 PM
ideally we should use the result of your investigation on data types
18:13 PM
did you finish writing a report of each of the types and their sizes/conversion times?
18:13 PM
kartikgupta0909

I think for now we can continue with json, in future if we change our minds we can add an externial deserialisation script.
18:13 PM
I didnt write the report was waiting for your comment on the ticket.
18:14 PM
could you have a look at the data I have posted on the ticket and let me know if its fine
18:14 PM
and I ll make the report
18:15 PM
alastairp

did you see the comment that Ulrich made?
18:16 PM
other than that, what you've done looks good. You should start to write that up. It can just be a markdown file in the same repository that you were using for the code
18:16 PM
yes, we can continue using json for now anyway
18:17 PM
for me, the next question is how we move the data. We could have the client make a single request for all of the data, and the server loads it from the database, compresses it, and sends it back
18:17 PM
MBJenkins

Yippee, build fixed!
18:17 PM
Project musicbrainz-server_master build #508: FIXED in 19 min: https://ci.metabrainz.org/job/musicbrainz-serve...
18:17 PM
alastairp

but I think for a large dataset (say 1000 items) this will leave the http connection open for a long time
18:17 PM
kartikgupta0909

yes, i would suggest that too
18:18 PM
yes it would
18:18 PM
alastairp

then we need to think about what to do if the connection is broken, etc. how do we continue?
18:18 PM
My preference is actually to have a background task
18:18 PM
so the client could tell the server "I want this data" and the server can say "OK, come back soon and I'll have it ready"
18:18 PM
kartikgupta0909

If the connection breaks then we will have to resend the entire data in case of compression
18:19 PM
alastairp

then we can start a background task on the server to extract the data and create an archive
18:19 PM
kartikgupta0909

but if we send in batches we could log and restart from where it stopped
18:19 PM
ah thats fine too
18:19 PM
alastairp

the client can continue polling to see if the archive is ready, and when it is, download it then tell the server that it has it
18:19 PM
how do you think batches could work?
18:19 PM
something would need to keep state
18:20 PM
kartikgupta0909

maybe 10 songs a a time
18:20 PM
alastairp

that could work - if the client asks the server for a list of mbids
18:20 PM
and then the client sends a request to download 10 items (in your example)
18:20 PM
and we keep the state on the client
18:20 PM
kartikgupta0909

yes I think it should. The first step should be to get the dataset info including the recording ids
18:20 PM
then get those recordings one by one or in batches
18:21 PM
Gentlecat

consider that you might need to evaluate same or very similar dataset multiple times
18:21 PM
alastairp

cool. you can make that start then
18:21 PM
Gentlecat

is there any point in actually loading the same data multiple times?
18:21 PM
alastairp

hmm
18:21 PM
kartikgupta0909

I get the point
18:22 PM
but in any case we will have to pass the data again and again until we store it in user's local machine
18:22 PM
alastairp

good point, but I'm not sure if it's optimising for the right thing
18:22 PM
kartikgupta0909

but that would consume a lot of memory on user's machine which is not desireable
18:22 PM
alastairp

so the client could keep a cache of items
18:22 PM
kartikgupta0909: do you mean disk space?
18:22 PM
kartikgupta0909

yes
18:22 PM
if there are 1000 files
18:22 PM
alastairp

I don't think that's a problem. especially if, as Gentlecat says, they are evaluating lots of things
18:22 PM
1000 files is small
18:22 PM
kartikgupta0909

i guess it would take around 60 mb
18:23 PM
alastairp

it's only going to be a problem once they have 100,000
18:23 PM
it's an interesting proposal
18:23 PM
it does make the client more comples, though
18:23 PM
kartikgupta0909

they might have that too although in music IR thats rarely the case
18:23 PM
alastairp

complex
18:24 PM
Gentlecat

just noting that by sending all data in one archive you are constraining this thing
18:25 PM
Freso

People up for reviews tonight: Freso, ruaok, reosarevok, bitmap, Gentlecat, zas, LordSputnik, Leftmost, Leo_Verto, alastairp, CatQuest, rahulr, QuoraUK, armalcolite, hellska, kartikgupta0909 - let me know if you want on/off.
18:25 PM
(Meeting in ~35 minutes.)
18:25 PM
Gentlecat

monday again!
18:25 PM
I need some kind of time slowdown device here
18:25 PM
alastairp

kartikgupta0909: I already have a plan this week to look at loading this data in bulk during the dataset evaluation stage
18:26 PM
reosarevok

same
18:26 PM
mihaitish joined the channel
18:26 PM
alastairp

so I will try and do that tomorrow and on Wednesday. I'm interested in seeing how long it takes to load 1000 items 1 at a time, 10 at a time, or 100 at a time
18:26 PM
kartikgupta0909

ah okay.
18:26 PM
alastairp

perhaps this ties into http://tickets.musicbrainz.org/browse/AB-21
18:26 PM
kartikgupta0909

but wont it depend on the user's machine?
18:26 PM
Gentlecat

alastairp: how is dataset evaluation going? did you restart the script?
18:27 PM
alastairp

ah, no. I wanted to merge that PR
18:27 PM
Gentlecat

or you want to merge the changes to it first?
18:27 PM
right
18:27 PM
alastairp

since we can keep the history/results
18:27 PM
but I need to move the data files into /tmp like you suggested
18:27 PM
Freso

armalcolite: Kodi is also able to scrobble. Spotify too... :p
18:27 PM
alastairp

Freso: spotify is difficult though
18:27 PM
Gentlecat

I also need to find some time to make a validation dataset and implement accuracy measurement script
18:27 PM
alastairp

because as far as we understand, they scrobble from their end
18:28 PM
not from the client
18:28 PM
Gentlecat

dmitry was talking that he already had something for cross-dataset validation, is that the same thing?
18:28 PM
alastairp

yes, that's my stuff
18:28 PM
alfie earperks
18:28 PM
Gentlecat

ok
18:28 PM
alfie

scrobbling?
18:28 PM
alastairp

alfie: http://listenbrainz.org/
18:28 PM
Gentlecat

is it available somewhere?
18:28 PM
alastairp

yes, it should be
18:28 PM
let me find it
18:28 PM
alfie

alastairp: loving y'all more and more.
18:29 PM
alastairp

Gentlecat: https://github.com/MTG/acousticbrainz-research/...
18:29 PM
just inviting you now
18:29 PM
Gentlecat

maybe you can make a pull request with it into some directory in AB?
18:29 PM
and I'll base my stuff on it
18:29 PM
should take a look first though
18:29 PM
alastairp

that's next week's job (I want a rough version of it ready for ismir too)
18:30 PM
the code is currently written for research, rather than integration
18:30 PM
I need to work out how to extract the relevant stuff
18:30 PM
Freso

alastairp: Ah, that might be so.
18:31 PM
alfie: :)
18:31 PM
armalcolite

alastairp: regarding this: https://github.com/metabrainz/listenbrainz-serv...
18:31 PM
the API key should be used for making the web-requests
18:32 PM
Leftmost

Freso, another week with nothing from me. What a slacker.
18:32 PM
Freso fires Leftmost
18:32 PM
Ha, joke's on you. No one ever hired me!
18:32 PM
Freso

Ha, joke's on *you* - I don't actually have the ability to fire people!
18:33 PM
armalcolite

alastairp: currently, i check the API key in the GET request and match it with the user who approves it to give him a session key.
18:33 PM
CatQuest joined the channel
18:33 PM
CatQuest has quit
18:33 PM
CatQuest joined the channel
18:33 PM
alastairp

alfie: we try
18:33 PM
armalcolite

alastairp: but i realised today (when testing lastfm windows client) that they have API hardcoded in the app.
18:33 PM
alfie

once i've got some free time i'll see if i can get my cmus script talking to listenbrainz
18:34 PM
Freso

cmus?
18:34 PM
alfie

cmus, the best music player :D
18:34 PM
armalcolite

alastairp: i was using the lastfm
18:34 PM
alastairp: i was using the lastfm's official windows client.
18:34 PM
alastairp

armalcolite: but if a user loads this auth page (acousticbrainz.org/api/auth/?api_key=x&token=y) we know that user armalcolite is going to use token y
18:34 PM
Freso turns to DDG
18:35 PM
so we can create that link in the auth table
18:35 PM
armalcolite

alastairp: that was just double auth i implemented
18:35 PM
alastairp

in fact, we don't even need api_key
18:35 PM
see how I mentioned in the ticket that this is used to identify the *app*, not the user
18:35 PM
this is how last.fm knows to say "Audacious wants access to your account"
18:36 PM
note that we can't say what the app is in this case, because that mapping is private to last.fm
18:36 PM
so we can just say "A legacy last.fm app wants access to your account"
18:37 PM
Freso

alfie: Hosted on SourceForge? Really? :/
18:37 PM
Ah, no. Redirected to GitHub. Nvm. ❤️
18:37 PM
armalcolite

tokens are fetched for apps then?
18:37 PM
alfie

Freso: https://cmus.github.io/ scuse me. :P
18:37 PM
armalcolite

bcoz token require API_KEY
18:37 PM
*requires
18:38 PM
alastairp

armalcolite: from what I understand, the workflow is like this:
18:39 PM
application generates a random token and directs the user to open a url containing api_key and token
18:39 PM
kahu joined the channel
18:39 PM
user goes to the page and approves access
18:39 PM
client makes another query using this token (plus api key and signature) to retrieve a real access token
18:40 PM
client uses this real access token in all queries to the scrobble api. this token identifies the user (because the user was logged in when they gave access to the original token)
18:40 PM
Lotheric_ has quit
18:40 PM
Freso

alfie: Note that LB has its own API in addition to Last.FM compatible one currently being developed: https://listenbrainz.readthedocs.io/
18:40 PM
armalcolite

yes.
18:40 PM
but the token should not be random, http://www.last.fm/api/show/auth.getToken
18:40 PM
alastairp

ah, I forgot the step that the app actually generates the token
18:40 PM
right :)