In the tunepimp library I am creating a nice OO model for background processing of TRM generation, TRM lookup, and (optionally) the file lookup if the TRM lookup yields nothing.
I've got a pipeline setup that moves files through the process nicely.
However, if a user inserts 1000 files into a TP-based application and lets it churn through all of them, it will do a file lookup for each file that was not found.
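A minimal sketch of the pipeline ruaok describes, assuming each stage is a plain function. The names `Track`, `run_pipeline`, `generate_trm`, `lookup_trm`, and `lookup_file` are hypothetical stand-ins, not tunepimp's API. It makes the concern concrete: every file whose TRM lookup misses triggers a file lookup, whether or not the user ever looks at it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Track:
    path: str
    trm: Optional[str] = None
    match: Optional[str] = None
    candidates: List[str] = field(default_factory=list)

def run_pipeline(tracks: List[Track],
                 generate_trm: Callable[[str], str],
                 lookup_trm: Callable[[str], Optional[str]],
                 lookup_file: Callable[[Track], List[str]]) -> int:
    """Move each track through the stages; return how many file lookups ran."""
    file_lookups = 0
    for t in tracks:
        t.trm = generate_trm(t.path)   # stage 1: TRM generation
        t.match = lookup_trm(t.trm)    # stage 2: TRM lookup
        if t.match is None:            # stage 3: only when the TRM lookup misses
            t.candidates = lookup_file(t)
            file_lookups += 1
    return file_lookups
```

If 400 of 1000 files miss on the TRM lookup, this eager design issues 400 file lookups up front.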
djce
so far so good...
ruaok
In the current tagger, the file lookups only happen when a user requests them.
djce
ah. ok.
ruaok
In this new model, they would ALWAYS be done when the TRM doesn't return a match.
Not optimal, but I don't want the user to wait for these things when they can be done in the background.
It would be nice if, when a file is not found, the application has already done one lookup so the user can take the next step right away.
That puts more traffic on the server.
Which I don't like.
djce
Because of load on the server, or because of network speed for the client?
ruaok
The alternative is to give the application writer some guidelines on using heuristics to reduce the number of needless lookups, but that makes working with TP more complicated.
Both, really.
More the server, though.
Scalability always nags at the back of my mind, and while this is a clean architectural decision, it has potentially severe ramifications for the server.
djce
part of that could be saved by combining it into one request-response.
ruaok
And I don't like wasting our precious bandwidth.
I had thought about that as well.
djce
you could even combine it with the TRM generation, if the MB_SERVER is set to zim.
ruaok
Maybe the TRM lookup should be skipped and we should just do a filelookup for each file.
djce
If you're serious about saving network traffic, you should consider a single request-response that handles multiple files.
so maybe one sigserver lookup per file,
but then a combined TRM-lookup-plus-file-lookup for /n/ files.
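djce's batching idea, sketched under assumptions: TRM generation stays one sigserver request per file, and the combined TRM-lookup-plus-file-lookup is sent once per group of `n` TRMs. `batches`, `combined_lookup_all`, and the `send_combined` transport callback are hypothetical names for illustration only.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")

def batches(items: Sequence[T], n: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most n items."""
    if n < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), n):
        yield items[start:start + n]

def combined_lookup_all(trms: Sequence[str], n: int,
                        send_combined: Callable[[Sequence[str]], List[str]]) -> List[str]:
    """Issue one combined TRM-plus-file-lookup request per batch of n TRMs."""
    results: List[str] = []
    for batch in batches(trms, n):
        results.extend(send_combined(batch))  # one round trip per batch
    return results
```

For 1000 files and a batch size of 25, this is 1000 sigserver requests plus 40 combined requests. It cuts round trips, but the server still does the same lookup work per file.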
ruaok
And what do we need to conserve more?
network bandwidth or server capacity?
djce
mmmm.
ruaok
Your approach saves bandwidth, but not drastically in this case, IMHO.
djce
ideally the library should accept some sort of "hint" from the calling app
ruaok
But the server capacity is the same in the end.
djce
to suggest which files to look up, and which not to look up yet.
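One possible shape for the "hint" djce suggests, assuming a per-file enum that the calling app sets. `LookupHint` and `files_to_look_up` are hypothetical; nothing in the discussion fixes the form of the hint.

```python
from enum import Enum
from typing import Dict, List

class LookupHint(Enum):
    NOW = "now"      # the user is likely to act on this file soon
    LATER = "later"  # keep it queued, but do not hit the server yet
    NEVER = "never"  # the user has discarded the file

def files_to_look_up(unmatched: List[str], hints: Dict[str, LookupHint],
                     default: LookupHint = LookupHint.LATER) -> List[str]:
    """Return the unmatched files that should get a file lookup right away."""
    return [f for f in unmatched if hints.get(f, default) is LookupHint.NOW]
```

The `default` parameter is where a lazy app's behaviour lands: files it never hints about fall back to whatever the library chooses.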
ruaok
I'm just trying to avoid making calls to the server that the user may never need.
djce
example: often when I use the tagger, I load in many files
ruaok
hints: yes, that is what I was hinting at. :-)
djce
but then never follow it through and throw half the results away,
ruaok groans
ruaok
Ding.
djce
well, from the TP lib point of view, that's easy.
It just moves the hard work to the calling app.
Unless the app is lazy and either
ruaok
Ding, and the whole idea behind TP is to make the calling app a snap.
djce
a) always does the file lookup immediately or
b) always defers the file lookup until the last minute.
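The two lazy-app behaviours, (a) and (b), can be sketched as one cache with an eager/deferred switch. `FileLookupCache` and its methods are hypothetical; the lookup function stands in for whatever actually talks to the server.

```python
from typing import Callable, Dict, List

class FileLookupCache:
    """Run file lookups eagerly (policy a) or on first request (policy b)."""

    def __init__(self, lookup: Callable[[str], List[str]], eager: bool):
        self._lookup = lookup
        self._eager = eager
        self._results: Dict[str, List[str]] = {}
        self.server_calls = 0

    def _fetch(self, path: str) -> List[str]:
        # Each path hits the server at most once; later calls use the cache.
        if path not in self._results:
            self.server_calls += 1
            self._results[path] = self._lookup(path)
        return self._results[path]

    def track_unmatched(self, path: str) -> None:
        """Called when a TRM lookup finds nothing for path."""
        if self._eager:
            self._fetch(path)

    def candidates(self, path: str) -> List[str]:
        """Called when the user actually asks for the file lookup results."""
        return self._fetch(path)
```

With 100 unmatched files of which the user opens 10, the eager cache makes 100 server calls and the deferred one makes 10, at the cost of the user waiting on each of those 10.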
The calling app can still be easy,
but to be /slick/ requires some effort.
I think that's the best you can do with any lib.
Using a lib is never just a matter of "plug it in"; you always need to understand the best way to use the tool.
ruaok
You're right on the money.
At the same time, lots of people never learn the proper way to do things.
djce
Right on.
ruaok
And providing an easy, almost foolproof way of doing things is the best way to avoid that.
So, I'm debating back and forth on this issue, and came to no good conclusions.
djce
Easy, accurate, it will work. But suboptimal.
Optimising the app is the work of the app writer, not you.
unless you write the app too of course :-)
$accounts{'djce'}->debit("0.02") :-)
ruaok
And I do plan to do that, but I won't be the only one doing this.
:-)
djce has to stop quoting in Perl
My other thought is to simply say that the user experience is important and that such lookup queries can be replicated to mirror servers.
djce
also true
ruaok
Perhaps there is some way the TP lib could delay the lookups if the user does not appear to be using them.
But that is fraught with peril -- it seems a sketchy proposition at best.
So, if you had to make this call, what call would you make?
djce
given that the logic for "intelligently" pre-looking-up some files is both tricky and ill-defined,
I currently would have to opt for no pre-lookups at all.
But also think about single-RDF calls to handle several files at once.
ruaok
Hmmmm. Lemme go read some server code real quick.
Go check out QS.pm, and look for TrackInfoFromTRMId
I think I have the answer for this one already. :-)
If the TRM lookup fails it does a filelookup already.
djce
I see it, but I don't pretend to understand it :-(
Oh. Now I see :-)
Well, that answers one question.
ruaok
It passes all the known info to the FileLookup, and if the best match is 90% or better it returns that match.
If not the results are discarded.
So, we should create a new function that combines both and returns meaningful results if there was no TRM match.
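The fallback ruaok finds in TrackInfoFromTRMId, and the proposed combined function, sketched in Python rather than the Perl of QS.pm. Only the 90% threshold and the "discard below threshold" behaviour come from the discussion; `Candidate`, `trm_index`, `file_lookup`, and the `status` field are hypothetical. The `status` field is one way to keep a response that "could return vastly different info" from being muddled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Candidate:
    track_id: str
    score: float  # similarity in [0, 1]

FileLookup = Callable[[dict], List[Candidate]]

def current_lookup(trm: str, info: dict, trm_index: Dict[str, str],
                   file_lookup: FileLookup) -> Optional[str]:
    """Existing behaviour: fall back to a file lookup, keep only a >= 90% match."""
    if trm in trm_index:
        return trm_index[trm]
    best = max(file_lookup(info), key=lambda c: c.score, default=None)
    return best.track_id if best is not None and best.score >= 0.9 else None

def combined_lookup(trm: str, info: dict, trm_index: Dict[str, str],
                    file_lookup: FileLookup) -> dict:
    """Proposed: same fallback, but return the candidates instead of dropping them."""
    if trm in trm_index:
        return {"status": "trm_match", "track_id": trm_index[trm], "candidates": []}
    cands = sorted(file_lookup(info), key=lambda c: c.score, reverse=True)
    if cands and cands[0].score >= 0.9:
        return {"status": "file_match", "track_id": cands[0].track_id, "candidates": cands}
    return {"status": "no_match", "track_id": None, "candidates": cands}
```

The server does the file lookup in both versions; the combined one just stops throwing the results away, which is why pre-lookups would not add server work.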
djce
Right.
So close already. Should be a simple change (to the server, anyway)
ruaok
Thus we would actually be more efficient than we are now if we do pre-lookups on all the tracks.
The RDF will be muddled, since the outcome of the query could return vastly different info.
I think I will create a completely new lookup function and slowly phase out the old one.
Ick.
But it's the best way to go, and it will be faster.
djce
indeed so.
Make the new query handle multiple files too?
ruaok
Ok, I have a plan of attack then.
hmmm.
djce
You don't have to use it, yet. Just allow for it.
ruaok
That is an idea....
OK, I will consider it when I design the new RDF query/response.
djce
Specify as "... a list of track info (where currently the list must contain exactly one track)"
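djce's "allow for it, don't use it yet" idea as a sketch: the request always carries a list of track-info records, and only a server-side limit restricts it to one. `MAX_TRACKS_PER_REQUEST`, `validate_request`, and the `trm` key are hypothetical.

```python
from typing import List, Mapping

# Today's limit. Raising it later does not change the request format.
MAX_TRACKS_PER_REQUEST = 1

def validate_request(tracks: List[Mapping[str, str]]) -> None:
    """Accept a list of track-info records, currently exactly one."""
    if not isinstance(tracks, list):
        raise TypeError("request must carry a list of track info")
    if not 1 <= len(tracks) <= MAX_TRACKS_PER_REQUEST:
        raise ValueError(
            f"list must contain between 1 and {MAX_TRACKS_PER_REQUEST} track(s)")
```

Clients written against this format keep working when the limit is raised to allow multi-file requests.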