I'm interested in changing this so that we load 100 at a time or so, which should be much faster
but we still need to make a decision on how to transmit the data
ideally we should use the result of your investigation on data types
did you finish writing a report of each of the types and their sizes/conversion times?
kartikgupta0909
I think for now we can continue with JSON; in future, if we change our minds, we can add an external deserialisation script.
I didn't write the report; I was waiting for your comment on the ticket.
could you have a look at the data I have posted on the ticket and let me know if it's fine
and I'll make the report
alastairp
did you see the comment that Ulrich made?
other than that, what you've done looks good. You should start to write that up. It can just be a markdown file in the same repository that you were using for the code
yes, we can continue using json for now anyway
for me, the next question is how we move the data. We could have the client make a single request for all of the data, and the server loads it from the database, compresses it, and sends it back
but I think for a large dataset (say 1000 items) this will leave the http connection open for a long time
kartikgupta0909
yes, I would suggest that too
yes it would
alastairp
then we need to think about what to do if the connection is broken, etc. how do we continue?
My preference is actually to have a background task
so the client could tell the server "I want this data" and the server can say "OK, come back soon and I'll have it ready"
kartikgupta0909
If the connection breaks then, in the compressed-archive case, we would have to resend all of the data
alastairp
then we can start a background task on the server to extract the data and create an archive
kartikgupta0909
but if we send in batches we could log and restart from where it stopped
ah, that's fine too
alastairp
the client can continue polling to see if the archive is ready, and when it is, download it then tell the server that it has it
how do you think batches could work?
something would need to keep state
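The "background task + polling" idea described above can be sketched roughly as follows. The endpoint names and the in-memory `JobStore` here are hypothetical stand-ins, not the real server API:

```python
# Minimal sketch of the background-task + polling approach, assuming a
# hypothetical job API: request an archive, poll until ready, then download.
import time


class JobStore:
    """Simulates the server side: a job becomes ready after a few polls."""

    def __init__(self, polls_until_ready=3):
        self.remaining = polls_until_ready

    def request_archive(self, dataset_id):
        # server starts a background task and returns a job handle
        return {"job": "job-%s" % dataset_id}

    def poll(self, job):
        # each poll, the background task is closer to finishing
        self.remaining -= 1
        if self.remaining <= 0:
            return {"status": "ready", "url": "/download/%s" % job["job"]}
        return {"status": "pending"}


def fetch_archive(server, dataset_id, interval=0.0):
    """Client: ask the server to prepare the archive, then poll until ready."""
    job = server.request_archive(dataset_id)
    while True:
        result = server.poll(job)
        if result["status"] == "ready":
            return result["url"]
        time.sleep(interval)


print(fetch_archive(JobStore(), "1000-items"))  # -> /download/job-1000-items
```

A real client would sleep a sensible interval between polls and report the download back to the server, as discussed above.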
kartikgupta0909
maybe 10 songs at a time
alastairp
that could work - if the client asks the server for a list of mbids
and then the client sends a request to download 10 items (in your example)
and we keep the state on the client
kartikgupta0909
yes I think it should. The first step should be to get the dataset info including the recording ids
then get those recordings one by one or in batches
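The batched approach with client-held state might look something like this sketch; `fetch_batch` and the `done` progress log are hypothetical names, but the point is that an interrupted transfer can resume from the last completed batch:

```python
# Sketch of batched downloads where the client keeps the state, so a broken
# connection only costs the current batch (names are illustrative).
def batched(ids, size):
    """Yield successive chunks of `size` ids."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]


def download_dataset(recording_ids, fetch_batch, done=None, batch_size=10):
    """fetch_batch(ids) -> {id: data}; `done` logs completed items for resume."""
    done = done if done is not None else {}
    for batch in batched(recording_ids, batch_size):
        todo = [rid for rid in batch if rid not in done]
        if todo:
            done.update(fetch_batch(todo))
    return done


mbids = ["mbid-%d" % i for i in range(25)]
data = download_dataset(mbids, lambda ids: {i: "json-doc" for i in ids})
print(len(data))  # 25
```

Passing a partially filled `done` dict back in skips batches that already completed, which is the "log and restart from where it stopped" idea.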
Gentlecat
consider that you might need to evaluate same or very similar dataset multiple times
alastairp
cool. you can make that start then
Gentlecat
is there any point in actually loading the same data multiple times?
alastairp
hmm
kartikgupta0909
I get the point
but in any case we will have to transfer the data again and again unless we store it on the user's local machine
alastairp
good point, but I'm not sure if it's optimising for the right thing
kartikgupta0909
but that would consume a lot of memory on the user's machine, which is not desirable
alastairp
so the client could keep a cache of items
kartikgupta0909: do you mean disk space?
kartikgupta0909
yes
if there are 1000 files
alastairp
I don't think that's a problem. especially if, as Gentlecat says, they are evaluating lots of things
1000 files is small
kartikgupta0909
I guess it would take around 60 MB
alastairp
it's only going to be a problem once they have 100,000
it's an interesting proposal
it does make the client more complex, though
kartikgupta0909
they might have that too, although in music IR that's rarely the case
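The client-side cache suggested above could be as simple as a directory of JSON files keyed by MBID. This is a hypothetical sketch (the paths and `fetch` callback are stand-ins for the real client):

```python
# Sketch of a client-side disk cache of downloaded items, keyed by MBID.
# `fetch` is a stand-in for the real API call; cache_dir holds one JSON
# file per recording so repeated evaluations don't re-download anything.
import json
import os


def get_recording(mbid, fetch, cache_dir):
    """Return the data for `mbid`, downloading only on a cache miss."""
    path = os.path.join(cache_dir, mbid + ".json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    data = fetch(mbid)
    with open(path, "w") as f:
        json.dump(data, f)
    return data
```

At ~60 KB per item, 1000 cached files is tens of megabytes of disk, which matches the estimate above; the extra client complexity is mostly this cache-miss check.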
Gentlecat
just noting that by sending all data in one archive you are constraining this thing
Freso
People up for reviews tonight: Freso, ruaok, reosarevok, bitmap, Gentlecat, zas, LordSputnik, Leftmost, Leo_Verto, alastairp, CatQuest, rahulr, QuoraUK, armalcolite, hellska, kartikgupta0909 - let me know if you want on/off.
(Meeting in ~35 minutes.)
Gentlecat
monday again!
I need some kind of time slowdown device here
alastairp
kartikgupta0909: I already have a plan this week to look at loading this data in bulk during the dataset evaluation stage
reosarevok
same
mihaitish joined the channel
alastairp
so I will try and do that tomorrow and on Wednesday. I'm interested in seeing how long it takes to load 1000 items 1 at a time, 10 at a time, or 100 at a time
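The timing comparison mentioned above could be run with a small harness like this; `load_batch` is a stand-in for the real bulk loader:

```python
# Small harness for comparing load times at different batch sizes
# (load_batch is a hypothetical stand-in for the real bulk loader).
import time


def time_bulk_load(ids, load_batch, batch_size):
    """Load `ids` in chunks of `batch_size` and return elapsed seconds."""
    start = time.perf_counter()
    for i in range(0, len(ids), batch_size):
        load_batch(ids[i:i + batch_size])
    return time.perf_counter() - start


ids = list(range(1000))
for size in (1, 10, 100):
    elapsed = time_bulk_load(ids, lambda batch: None, size)
    print("batch size %3d: %.4fs" % (size, elapsed))
```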
armalcolite: from what I understand, the workflow is like this:
application generates a random token and directs the user to open a url containing api_key and token
kahu joined the channel
user goes to the page and approves access
client makes another query using this token (plus api key and signature) to retrieve a real access token
client uses this real access token in all queries to the scrobble api. this token identifies the user (because the user was logged in when they gave access to the original token)
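The token flow described in those four steps can be simulated end to end. The `AuthServer` class and method names below are hypothetical stand-ins for the real scrobble API, kept in memory so the sequence is easy to follow:

```python
# Illustrative simulation of the auth flow above: app generates a request
# token, the user approves it while logged in, then the client exchanges it
# for a real access token that identifies the user in later API calls.
import secrets


class AuthServer:
    """Hypothetical stand-in for the scrobble API's auth endpoints."""

    def __init__(self):
        self.approved = {}  # request token -> user who approved it
        self.sessions = {}  # access token -> user

    def approve(self, token, user):
        # user opens the url containing api_key and token, and grants access
        self.approved[token] = user

    def get_session(self, api_key, token):
        # client exchanges the approved request token (plus api key and
        # signature in the real API) for a long-lived access token
        user = self.approved.pop(token)
        access_token = secrets.token_hex(8)
        self.sessions[access_token] = user
        return access_token


server = AuthServer()
request_token = secrets.token_hex(8)    # application generates a random token
server.approve(request_token, "alice")  # user approves at the auth page
access = server.get_session("api-key", request_token)
print(server.sessions[access])  # alice
```

The key property is the last line: the access token maps back to the user who was logged in when they approved the original token.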