#metabrainz


      • reosarevok
        So you did manage to break MB after all!
      • Good job
      • alfie
        :D
      • CallerNo6
        IIUC that means that you get to keep both pieces.
      • reosarevok
        I mean, some of us break stuff all the time, but breaking stuff in your first day is a strong start
      • As long as you keep reporting it, go and break as much as you manage! ;)
      • alfie
        :D that's my job!
      • alastairp
        kartikgupta0909: right. passing from the server to the client is an interesting task
      • kartikgupta0909
        are we using json for the low level data or switching to something else?
      • alastairp
        currently we get items from the database for the model processing step one at a time: https://github.com/metabrainz/acousticbrainz-se...
      • I'm interested in changing this so that we load 100 at a time or so, which should be much faster
      • but we still need to make a decision on how to transmit the data
      • ideally we should use the result of your investigation on data types
      • did you finish writing a report of each of the types and their sizes/conversion times?
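The change alastairp describes — loading items from the database in batches of ~100 instead of one at a time — could be sketched roughly as below. `fetch_low_level_many` is a hypothetical stand-in for a single database query that returns the documents for a list of MBIDs; it is not an existing AcousticBrainz function.

```python
# Sketch: load low-level documents in chunks instead of one at a time.
# `fetch_low_level_many` is assumed to take a list of MBIDs and return a
# dict of {mbid: document} in one database round-trip.

def chunks(items, size):
    """Yield successive `size`-item slices of `items`."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def load_in_batches(mbids, fetch_low_level_many, batch_size=100):
    """Load documents for `mbids`, `batch_size` ids per round-trip."""
    documents = {}
    for batch in chunks(mbids, batch_size):
        documents.update(fetch_low_level_many(batch))
    return documents
```

The win here is amortising per-query overhead: 1000 items become 10 queries instead of 1000.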
      • kartikgupta0909
        I think for now we can continue with json, in future if we change our minds we can add an external deserialisation script.
      • I didn't write the report; I was waiting for your comment on the ticket.
      • could you have a look at the data I have posted on the ticket and let me know if it's fine
      • and I'll make the report
      • alastairp
        did you see the comment that Ulrich made?
      • other than that, what you've done looks good. You should start to write that up. It can just be a markdown file in the same repository that you were using for the code
      • yes, we can continue using json for now anyway
      • for me, the next question is how we move the data. We could have the client make a single request for all of the data, and the server loads it from the database, compresses it, and sends it back
      • MBJenkins
        Yippee, build fixed!
      • Project musicbrainz-server_master build #508: FIXED in 19 min: https://ci.metabrainz.org/job/musicbrainz-serve...
      • alastairp
        but I think for a large dataset (say 1000 items) this will leave the http connection open for a long time
      • kartikgupta0909
        yes, I would suggest that too
      • yes it would
      • alastairp
        then we need to think about what to do if the connection is broken, etc. how do we continue?
      • My preference is actually to have a background task
      • so the client could tell the server "I want this data" and the server can say "OK, come back soon and I'll have it ready"
      • kartikgupta0909
        If the connection breaks then we will have to resend the entire data in case of compression
      • alastairp
        then we can start a background task on the server to extract the data and create an archive
      • kartikgupta0909
        but if we send in batches we could log and restart from where it stopped
      • ah, that's fine too
      • alastairp
        the client can continue polling to see if the archive is ready, and when it is, download it then tell the server that it has it
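The request/poll/download/acknowledge loop alastairp outlines could look something like the toy sketch below. The server is simulated in memory, and every endpoint name (`request_archive`, `archive_status`, `download_archive`, `acknowledge`) is hypothetical; in reality `_finish` would be done by the background task.

```python
import time

class FakeArchiveServer:
    """In-memory stand-in for the server side of the proposed workflow."""

    def __init__(self):
        self.jobs = {}

    def request_archive(self, dataset_id):
        # Client: "I want this data" -> Server: "OK, come back soon"
        self.jobs[dataset_id] = {"state": "pending", "archive": None}
        return dataset_id

    def _finish(self, job_id):
        # In reality a background task extracts the data and builds the archive.
        self.jobs[job_id] = {"state": "ready", "archive": b"...tarball..."}

    def archive_status(self, job_id):
        return self.jobs[job_id]["state"]

    def download_archive(self, job_id):
        return self.jobs[job_id]["archive"]

    def acknowledge(self, job_id):
        # Client confirms it has the data; the server can clean up.
        del self.jobs[job_id]

def poll_for_archive(server, job_id, interval=0.01, attempts=100):
    """Poll until the archive is ready, download it, then acknowledge."""
    for _ in range(attempts):
        if server.archive_status(job_id) == "ready":
            data = server.download_archive(job_id)
            server.acknowledge(job_id)
            return data
        time.sleep(interval)
    raise TimeoutError("archive was never ready")
```

Because the client only ever holds a job id between requests, a dropped HTTP connection just means polling again rather than re-sending data.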
      • how do you think batches could work?
      • something would need to keep state
      • kartikgupta0909
        maybe 10 songs at a time
      • alastairp
        that could work - if the client asks the server for a list of mbids
      • and then the client sends a request to download 10 items (in your example)
      • and we keep the state on the client
      • kartikgupta0909
        yes I think it should. The first step should be to get the dataset info including the recording ids
      • then get those recordings one by one or in batches
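The batched alternative, with the state kept on the client as alastairp suggests, might look like this sketch: first fetch the list of recording ids, then download them in batches, logging progress so a broken connection can resume from the last completed batch. `fetch_batch` is a hypothetical callable standing in for the HTTP request.

```python
import json
import os

def download_dataset(mbids, fetch_batch, state_path, batch_size=10):
    """Download `mbids` in batches, persisting progress to `state_path`.

    Returns (resume_point, newly_downloaded_items); on a fresh run the
    resume point is 0, after an interruption it is the index of the first
    batch that still needs downloading.
    """
    done = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = json.load(f)["completed"]
    results = []
    for start in range(done, len(mbids), batch_size):
        results.extend(fetch_batch(mbids[start:start + batch_size]))
        # Log progress after each batch so we can restart where we stopped.
        with open(state_path, "w") as f:
            json.dump({"completed": start + batch_size}, f)
    return done, results
```

This is the "something needs to keep state" answer: the state is a tiny JSON file on the client, and the server stays stateless.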
      • Gentlecat
        consider that you might need to evaluate same or very similar dataset multiple times
      • alastairp
        cool. you can make that start then
      • Gentlecat
        is there any point in actually loading the same data multiple times?
      • alastairp
        hmm
      • kartikgupta0909
        I get the point
      • but in any case we will have to pass the data again and again unless we store it on the user's local machine
      • alastairp
        good point, but I'm not sure if it's optimising for the right thing
      • kartikgupta0909
        but that would consume a lot of memory on the user's machine, which is not desirable
      • alastairp
        so the client could keep a cache of items
      • kartikgupta0909: do you mean disk space?
      • kartikgupta0909
        yes
      • if there are 1000 files
      • alastairp
        I don't think that's a problem. especially if, as Gentlecat says, they are evaluating lots of things
      • 1000 files is small
      • kartikgupta0909
        I guess it would take around 60 MB
      • alastairp
        it's only going to be a problem once they have 100,000
      • it's an interesting proposal
      • it does make the client more comples, though
      • kartikgupta0909
        they might have that too, although in music IR that's rarely the case
      • alastairp
        complex
      • Gentlecat
        just noting that by sending all data in one archive you are constraining this thing
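The client-side cache alastairp proposes — so that evaluating the same or a similar dataset twice doesn't re-download anything — could be as simple as one JSON file per MBID on disk. This is a minimal sketch; `fetch_one` is a hypothetical network call, not a real client function.

```python
import json
import os

def cached_get(cache_dir, mbid, fetch_one):
    """Return the document for `mbid`, downloading only on a cache miss."""
    path = os.path.join(cache_dir, mbid + ".json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    document = fetch_one(mbid)          # hypothetical network request
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "w") as f:
        json.dump(document, f)
    return document
```

At roughly 60 KB per document, 1000 cached files is about 60 MB of disk, which matches the estimate in the discussion; it only becomes awkward at the 100,000-file scale.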
      • Freso
        People up for reviews tonight: Freso, ruaok, reosarevok, bitmap, Gentlecat, zas, LordSputnik, Leftmost, Leo_Verto, alastairp, CatQuest, rahulr, QuoraUK, armalcolite, hellska, kartikgupta0909 - let me know if you want on/off.
      • (Meeting in ~35 minutes.)
      • Gentlecat
        monday again!
      • I need some kind of time slowdown device here
      • alastairp
        kartikgupta0909: I already have a plan this week to look at loading this data in bulk during the dataset evaluation stage
      • reosarevok
        same
      • mihaitish joined the channel
      • alastairp
        so I will try and do that tomorrow and on Wednesday. I'm interested in seeing how long it takes to load 1000 items 1 at a time, 10 at a time, or 100 at a time
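The timing experiment alastairp describes (1000 items loaded 1, 10, or 100 at a time) could be harnessed like the sketch below. The per-request overhead here is simulated with a sleep, which is exactly the cost batching is expected to amortise; against a real database the `load_batch` callable would issue the actual query.

```python
import time

def timed_load(n_items, batch_size, load_batch):
    """Wall-clock time to load `n_items` in batches of `batch_size`."""
    start = time.perf_counter()
    for i in range(0, n_items, batch_size):
        load_batch(range(i, min(i + batch_size, n_items)))
    return time.perf_counter() - start

def compare(n_items=1000, sizes=(1, 10, 100), per_request_overhead=1e-4):
    """Compare batch sizes using a simulated fixed per-request cost."""
    def load_batch(batch):
        time.sleep(per_request_overhead)   # stand-in for one round-trip
        return list(batch)
    return {size: timed_load(n_items, size, load_batch) for size in sizes}
```

With a fixed cost per request, 1000 single-item requests pay that cost 1000 times while batches of 100 pay it 10 times, so the ratio of the timings directly exposes the overhead.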
      • kartikgupta0909
        ah okay.
      • kartikgupta0909
        but won't it depend on the user's machine?
      • Gentlecat
        alastairp: how is dataset evaluation going? did you restart the script?
      • alastairp
        ah, no. I wanted to merge that PR
      • Gentlecat
        or you want to merge the changes to it first?
      • right
      • alastairp
        since we can keep the history/results
      • but I need to move the data files into /tmp like you suggested
      • Freso
        armalcolite: Kodi is also able to scrobble. Spotify too... :p
      • alastairp
        Freso: spotify is difficult though
      • Gentlecat
        I also need to find some time to make a validation dataset and implement accuracy measurement script
      • alastairp
        because as far as we understand, they scrobble from their end
      • not from the client
      • Gentlecat
        dmitry was talking that he already had something for cross-dataset validation, is that the same thing?
      • alastairp
        yes, that's my stuff
      • alfie earperks
      • Gentlecat
        ok
      • alfie
        scrobbling?
      • Gentlecat
        is it available somewhere?
      • alastairp
        yes, it should be
      • let me find it
      • alfie
        alastairp: loving y'all more and more.
      • alastairp
      • just inviting you now
      • Gentlecat
        maybe you can make a pull request with it into some directory in AB?
      • and I'll base my stuff on it
      • should take a look first though
      • alastairp
        that's next week's job (I want a rough version of it ready for ismir too)
      • the code is currently written for research, rather than integration
      • I need to work out how to extract the relevant stuff
      • Freso
        alastairp: Ah, that might be so.
      • alfie: :)
      • armalcolite
      • the API key should be used for making the web-requests
      • Leftmost
        Freso, another week with nothing from me. What a slacker.
      • Freso fires Leftmost
      • Ha, joke's on you. No one ever hired me!
      • Freso
        Ha, joke's on *you* - I don't actually have the ability to fire people!
      • armalcolite
        alastairp: currently, I check the API key in the GET request and match it with the user who approves it to give them a session key.
      • CatQuest joined the channel
      • CatQuest has quit
      • CatQuest joined the channel
      • alastairp
        alfie: we try
      • armalcolite
        alastairp: but I realised today (when testing the last.fm Windows client) that they have the API key hardcoded in the app.
      • alfie
        once I've got some free time I'll see if I can get my cmus script talking to ListenBrainz
      • Freso
        cmus?
      • alfie
        cmus, the best music player :D
      • armalcolite
        alastairp: I was using last.fm's official Windows client.
      • alastairp
        armalcolite: but if a user loads this auth page (acousticbrainz.org/api/auth/?api_key=x&token=y) we know that user armalcolite is going to use token y
      • Freso turns to DDG
      • so we can create that link in the auth table
      • armalcolite
        alastairp: that was just the double auth I implemented
      • alastairp
        in fact, we don't even need api_key
      • see how I mentioned in the ticket that this is used to identify the *app*, not the user
      • this is how last.fm knows to say "Audacious wants access to your account"
      • note that we can't say what the app is in this case, because that mapping is private to last.fm
      • so we can just say "A legacy last.fm app wants access to your account"
      • Freso
        alfie: Hosted on SourceForge? Really? :/
      • Ah, no. Redirected to GitHub. Nvm. ❤️
      • armalcolite
        tokens are fetched for apps then?
      • alfie
        Freso: https://cmus.github.io/ scuse me. :P
      • armalcolite
        because the token requires API_KEY
      • alastairp
        armalcolite: from what I understand, the workflow is like this:
      • application generates a random token and directs the user to open a url containing api_key and token
      • kahu joined the channel
      • user goes to the page and approves access
      • client makes another query using this token (plus api key and signature) to retrieve a real access token
      • client uses this real access token in all queries to the scrobble api. this token identifies the user (because the user was logged in when they gave access to the original token)
      • Lotheric_ has quit
      • Freso
        alfie: Note that LB has its own API in addition to Last.FM compatible one currently being developed: https://listenbrainz.readthedocs.io/
      • armalcolite
        yes.
      • but the token should not be random, http://www.last.fm/api/show/auth.getToken
      • alastairp
        ah, I forgot the step that the app actually generates the token
      • right :)
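Putting the corrected flow together (the app fetches the token from the server via auth.getToken rather than generating it itself), the whole exchange can be sketched as a toy in-memory simulation. All class and endpoint names here are illustrative; only the `api_sig` construction — md5 of the sorted key/value pairs concatenated with the shared secret — follows the real Last.fm scheme.

```python
import hashlib

def sign(params, secret):
    """api_sig: md5 of sorted key/value pairs concatenated with the secret."""
    payload = "".join(k + str(params[k]) for k in sorted(params)) + secret
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

class FakeAuthServer:
    """Toy stand-in for the Last.fm-style auth endpoints."""

    def __init__(self, secret):
        self.secret = secret
        self.approved = {}       # token -> user who approved it
        self.counter = 0

    def get_token(self, api_key):
        # auth.getToken: the server, not the app, issues the request token.
        self.counter += 1
        return "token-%d" % self.counter

    def approve(self, token, user):
        # The user opens .../auth/?api_key=X&token=Y while logged in
        # and grants access, linking the token to their account.
        self.approved[token] = user

    def get_session(self, api_key, token, api_sig):
        # auth.getSession: exchange the approved token for a session key.
        expected = sign({"api_key": api_key, "method": "auth.getSession",
                         "token": token}, self.secret)
        if api_sig != expected or token not in self.approved:
            raise PermissionError("bad signature or unapproved token")
        return {"user": self.approved[token], "key": "session-for-" + token}
```

The session key returned at the end identifies the user in all later scrobble requests, which is why the api_key only ever identifies the *app*.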