I'm interested in changing this so that we load 100 at a time or so, which should be much faster
2016-07-18 20058
alastairp
but we still need to make a decision on how to transmit the data
2016-07-18 20015
alastairp
ideally we should use the result of your investigation on data types
2016-07-18 20034
alastairp
did you finish writing a report of each of the types and their sizes/conversion times?
2016-07-18 20039
kartikgupta0909
I think for now we can continue with json; in future, if we change our minds, we can add an external deserialisation script.
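A minimal sketch of the pluggable deserialisation idea being floated here, with hypothetical names throughout: default to JSON, and let an external script register an alternative decoder later if the format ever changes.

```python
import json

# Registry of decoders; only JSON for now, as agreed in the discussion.
DESERIALIZERS = {
    "json": json.loads,
}

def deserialize(payload, fmt="json"):
    """Decode a payload fetched from the server using the named format."""
    try:
        decode = DESERIALIZERS[fmt]
    except KeyError:
        raise ValueError("no deserializer registered for %r" % fmt)
    return decode(payload)

# If the format changes later, an external deserialisation script only needs
# to register itself here, e.g. DESERIALIZERS["msgpack"] = msgpack.unpackb
```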
2016-07-18 20057
kartikgupta0909
I didn't write the report, I was waiting for your comment on the ticket.
2016-07-18 20041
kartikgupta0909
could you have a look at the data I have posted on the ticket and let me know if it's fine
2016-07-18 20048
kartikgupta0909
and I'll make the report
2016-07-18 20036
alastairp
did you see the comment that Ulrich made?
2016-07-18 20019
alastairp
other than that, what you've done looks good. You should start to write that up. It can just be a markdown file in the same repository that you were using for the code
2016-07-18 20027
alastairp
yes, we can continue using json for now anyway
2016-07-18 20020
alastairp
for me, the next question is how we move the data. We could have the client make a single request for all of the data, and the server loads it from the database, compresses it, and sends it back
but I think for a large dataset (say 1000 items) this will leave the http connection open for a long time
2016-07-18 20048
kartikgupta0909
yes, I would suggest that too
2016-07-18 20007
kartikgupta0909
yes it would
2016-07-18 20027
alastairp
then we need to think about what to do if the connection is broken, etc. how do we continue?
2016-07-18 20034
alastairp
My preference is actually to have a background task
2016-07-18 20053
alastairp
so the client could tell the server "I want this data" and the server can say "OK, come back soon and I'll have it ready"
2016-07-18 20054
kartikgupta0909
If the connection breaks then we will have to resend the entire data, if we use a single compressed archive
2016-07-18 20007
alastairp
then we can start a background task on the server to extract the data and create an archive
2016-07-18 20011
kartikgupta0909
but if we send in batches we could log and restart from where it stopped
2016-07-18 20025
kartikgupta0909
ah that's fine too
2016-07-18 20030
alastairp
the client can continue polling to see if the archive is ready, and when it is, download it then tell the server that it has it
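A sketch of the client side of this background-task flow, under stated assumptions: the endpoint paths, JSON fields, and polling interval are all hypothetical, not a real API.

```python
import time
import requests

BASE = "https://example.org/api"  # hypothetical server

def fetch_dataset_archive(dataset_id, out_path):
    # "I want this data": ask the server to start building an archive
    job = requests.post("%s/datasets/%s/archive" % (BASE, dataset_id)).json()

    # "come back soon": poll until the background task has finished
    while True:
        status = requests.get("%s/jobs/%s" % (BASE, job["job_id"])).json()
        if status["state"] == "done":
            break
        time.sleep(5)

    # download the archive, then tell the server we have it
    with requests.get(status["download_url"], stream=True) as r:
        r.raise_for_status()
        with open(out_path, "wb") as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    requests.post("%s/jobs/%s/ack" % (BASE, job["job_id"]))
```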
2016-07-18 20038
alastairp
how do you think batches could work?
2016-07-18 20052
alastairp
something would need to keep state
2016-07-18 20006
kartikgupta0909
maybe 10 songs at a time
2016-07-18 20007
alastairp
that could work - if the client asks the server for a list of mbids
2016-07-18 20022
alastairp
and then the client sends a request to download 10 items (in your example)
2016-07-18 20033
alastairp
and we keep the state on the client
2016-07-18 20041
kartikgupta0909
yes I think it should. The first step should be to get the dataset info including the recording ids
2016-07-18 20056
kartikgupta0909
then get those recordings one by one or in batches
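A sketch of the batched alternative with the state kept on the client, as described above: fetch the dataset's recording MBIDs first, then download in small batches, recording progress so a broken connection can restart from where it stopped. The endpoints and the store_locally helper are hypothetical.

```python
import json
import os
import requests

BASE = "https://example.org/api"  # hypothetical server
BATCH_SIZE = 10                   # "maybe 10 songs at a time"

def store_locally(items):
    # hypothetical helper: persist each item, assuming a dict of mbid -> data
    for mbid, data in items.items():
        with open("%s.json" % mbid, "w") as f:
            json.dump(data, f)

def download_in_batches(dataset_id, state_file="progress.json"):
    # step 1: get the dataset info, including the recording ids
    info = requests.get("%s/datasets/%s" % (BASE, dataset_id)).json()
    mbids = info["recordings"]

    # the client keeps the state: which mbids have already been fetched
    done = set()
    if os.path.exists(state_file):
        with open(state_file) as f:
            done = set(json.load(f))

    todo = [m for m in mbids if m not in done]
    for i in range(0, len(todo), BATCH_SIZE):
        batch = todo[i:i + BATCH_SIZE]
        # step 2: download this batch of recordings
        resp = requests.get("%s/recordings" % BASE, params={"ids": ";".join(batch)})
        resp.raise_for_status()
        store_locally(resp.json())
        done.update(batch)
        with open(state_file, "w") as f:
            json.dump(sorted(done), f)  # resume point if the connection breaks
```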
2016-07-18 20009
Gentlecat
consider that you might need to evaluate the same or a very similar dataset multiple times
2016-07-18 20012
alastairp
cool. you can make a start on that then
2016-07-18 20026
Gentlecat
is there any point in actually loading the same data multiple times?
2016-07-18 20028
alastairp
hmm
2016-07-18 20040
kartikgupta0909
I get the point
2016-07-18 20002
kartikgupta0909
but in any case we will have to pass the data again and again until we store it on the user's local machine
2016-07-18 20005
alastairp
good point, but I'm not sure if it's optimising for the right thing
2016-07-18 20018
kartikgupta0909
but that would consume a lot of memory on the user's machine, which is not desirable
2016-07-18 20025
alastairp
so the client could keep a cache of items
2016-07-18 20031
alastairp
kartikgupta0909: do you mean disk space?
2016-07-18 20037
kartikgupta0909
yes
2016-07-18 20043
kartikgupta0909
if there are 1000 files
2016-07-18 20048
alastairp
I don't think that's a problem. especially if, as Gentlecat says, they are evaluating lots of things
2016-07-18 20051
alastairp
1000 files is small
2016-07-18 20053
kartikgupta0909
I guess it would take around 60 MB
2016-07-18 20002
alastairp
it's only going to be a problem once they have 100,000
2016-07-18 20013
alastairp
it's an interesting proposal
2016-07-18 20022
alastairp
it does make the client more complex, though
2016-07-18 20023
kartikgupta0909
they might have that too, although in music IR that's rarely the case
2016-07-18 20007
Gentlecat
just noting that by sending all data in one archive you are constraining this thing
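A sketch of the client-side cache idea from above: keep each downloaded item on disk, keyed by MBID, so evaluating the same or a similar dataset again does not re-transfer data the client already has. The cache location and file layout are assumptions; at roughly 60 KB per item, 1000 cached files would match the ~60 MB estimated above.

```python
import json
import os

CACHE_DIR = os.path.expanduser("~/.cache/ab-client")  # hypothetical location

def cache_path(mbid):
    return os.path.join(CACHE_DIR, "%s.json" % mbid)

def get_recording(mbid, fetch):
    """Return data for mbid from the disk cache, calling fetch(mbid) on a miss."""
    path = cache_path(mbid)
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    data = fetch(mbid)  # e.g. one item of a batched download
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(path, "w") as f:
        json.dump(data, f)
    return data
```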
2016-07-18 20000
Freso
People up for reviews tonight: Freso, ruaok, reosarevok, bitmap, Gentlecat, zas, LordSputnik, Leftmost, Leo_Verto, alastairp, CatQuest, rahulr, QuoraUK, armalcolite, hellska, kartikgupta0909 - let me know if you want on/off.
2016-07-18 20009
Freso
(Meeting in ~35 minutes.)
2016-07-18 20012
Gentlecat
Monday again!
2016-07-18 20029
Gentlecat
I need some kind of time slowdown device here
2016-07-18 20057
alastairp
kartikgupta0909: I already have a plan this week to look at loading this data in bulk during the dataset evaluation stage
2016-07-18 20007
reosarevok
same
2016-07-18 20021
mihaitish joined the channel
2016-07-18 20027
alastairp
so I will try and do that tomorrow and on Wednesday. I'm interested in seeing how long it takes to load 1000 items 1 at a time, 10 at a time, or 100 at a time
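A minimal sketch of that timing comparison, assuming a hypothetical fetch_batch function that retrieves one batch of items from the server:

```python
import time

def benchmark(mbids, fetch_batch, batch_sizes=(1, 10, 100)):
    # compare wall-clock time to load the same items in different batch sizes
    for size in batch_sizes:
        start = time.time()
        for i in range(0, len(mbids), size):
            fetch_batch(mbids[i:i + size])
        print("batch size %3d: %.1fs" % (size, time.time() - start))
```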
armalcolite: from what I understand, the workflow is like this:
2016-07-18 20012
alastairp
application generates a random token and directs the user to open a url containing api_key and token
2016-07-18 20021
kahu joined the channel
2016-07-18 20025
alastairp
user goes to the page and approves access
2016-07-18 20043
alastairp
client makes another query using this token (plus api key and signature) to retrieve a real access token
2016-07-18 20017
alastairp
client uses this real access token in all queries to the scrobble api. this token identifies the user (because the user was logged in when they gave access to the original token)
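A sketch of the token flow described above. The endpoints, parameter names, and signature scheme are assumptions in the style of Last.fm-like web auth, not a confirmed API; real services differ in details such as who issues the initial token.

```python
import hashlib
import uuid
import requests

API_ROOT = "https://example.org"  # hypothetical service
API_KEY = "..."                   # issued to the application
API_SECRET = "..."                # shared secret used for signing

def sign(params):
    # signature over the sorted parameters plus the shared secret
    base = "".join("%s%s" % (k, params[k]) for k in sorted(params)) + API_SECRET
    return hashlib.md5(base.encode("utf-8")).hexdigest()

def get_access_token():
    # 1. application generates a random token and directs the user to a URL
    #    containing api_key and token
    token = uuid.uuid4().hex
    print("Open and approve: %s/auth?api_key=%s&token=%s" % (API_ROOT, API_KEY, token))
    input("Press enter once access has been approved... ")

    # 2. exchange the approved token (plus api key and signature) for a real
    #    access token that identifies the user
    params = {"api_key": API_KEY, "token": token}
    params["signature"] = sign(params)
    resp = requests.get("%s/auth/session" % API_ROOT, params=params)
    resp.raise_for_status()
    return resp.json()["access_token"]  # used in all scrobble API queries
```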