#musicbrainz

      • djce
        It's never as simple as you want it to be :-(
      • Bear in mind the other things on, say, an Artist page, all of which affect the caching:
      • log in/out status, "mod pending" status, expanded/collapsed albums,...
      • ugh
      • ruaok
        big ugh
      • intrep
        are the majority of the pages you serve to people who are logged in or out?
      • i would think they would be to anonymous users (logged out) who dont get those stats
      • salisan
        Some form of ETag calculation would work to just lower the bandwidth usage.. but cpu usage would remain
      • intrep
        you could also cache artist pages that are requested anonymously and cache them for like 5 minutes
      • and return lastmod for the last time it was generated
      • djce
        Are you volunteering to make it happen? :-)
      • intrep
        maybe
      • :)
      • djce is pleased to hear it!
      • ruaok
        anyone here speak french?
      • intrep
        do you have any rough idea of what percentage of your requests are from people who arent logged in?
      • ruaok
        Votre message pour la liste escape_l a été transmis au(x) modérateur(s)
      • intrep
        also, do you have any rough idea of which pages are the majority of your traffic? artist/album/track searches? moderation?
      • ruaok
        Your message about the list ....
      • djce
        "Your message for the "escape_l" list has been sent to the moderators"
      • I think
      • ruaok
        Makes sense now. Thanks!
      • djce
        user logging: that reminds me, I was going to log the MB moderator name to the access logs. On the TODO list with that one...
      • salisan
        intrep: You can get some of that info from http://www.musicbrainz.org/usage/usage_200302.html it seems
      • intrep
        salisan: k, thanks
      • salisan
        showartist and showalbum seem to be leading among the ordinary pages ..
      • djce
        I doubt it. You mean the "unique usernames" bit?
      • At the moment that would be taken directly from the access logs; so user "-" for 99.99% of all requests, and other names for some other HTTP requests (but not for the main web site)
      • But if I make the change I mentioned a minute ago, that stat would become useful...
      • salisan
        Right, but you can get what pages have most requests now anyway, just not if they are from logged in users or not ;)
      • djce
        yes
      • ruaok|snug is off to spend time with the lead MB donor.
      • intrep
        ok, im not familiar with mason so i have a quick question
      • at the bottom of showartist.html there is a tag <& /comp/footer &>, i am going to take a wild guess that this is the end of the generated html, is there any way to get mason to dump the html for the whole page into a variable at this point?
      • djce
        Hmmm...
      • You can call $m->scomp($page) to "run" a page and capture its output
      • i.e. $output = $m->scomp("/path/to/component", @args)
      • If you haven't been called as $m->scomp, I don't know if it's possible.
      • intrep
        mason has its own object cache
      • keeps expire times and everything
      • interesting...
      • djce
        yes, that's useful, if you know what your "expire" conditions are.
      • For small parts of pages, e.g. the "top moderators" panel in the sidebar, the cache is useful then since it only depends on time, not any other inputs.
      • intrep
        well, i would think the entire page for anonymous lookups of showartist and showalbum could be cached for say, five minutes or so
      • djce
        Except for the "expand" settings on the album page.
      • Album pages are easier.
      • intrep
        the expand pages could be cached too really
      • but i would bet that the showartist main page for each artist is accessed about 3 times as often as the expansion pages, making it very valuable to cache
      • djce
        Only for anon visitors, right?
      • intrep
        right
      • djce
        Interesting...
      • intrep
        logged in users get fresh content all the time
      • djce
        And you're talking about cached as in returning a "304" response, not just getting the "200" response but faster?
      • intrep
        you really could cache a lot of the info for logged in users as well
      • you could do it either way
      • djce
        The latter is technically easier; the former is more beneficial to the visitor.
      • intrep
        yeah, and better for your bandwidth
      • djce
        Yes!
      • That's definitely going on the TODO list then. Let me know if you make any progress yourself...
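
The caching idea discussed above (cache pages for anonymous visitors for about five minutes, report when each page was generated, and answer repeat requests with a "304") might be sketched roughly as follows. This is an illustrative Python sketch, not the site's Mason/Perl code: the `AnonymousPageCache` class, the `render` callable, and the string page keys are all hypothetical names introduced here, and expand/collapse settings would have to be part of the key.

```python
import hashlib
import time
from email.utils import formatdate, parsedate_to_datetime

CACHE_TTL_SECONDS = 300  # "like 5 minutes", as suggested in the discussion


class AnonymousPageCache:
    """Caches rendered pages for anonymous visitors only (hypothetical helper)."""

    def __init__(self, render, ttl=CACHE_TTL_SECONDS, clock=time.time):
        self._render = render  # assumed callable: page key -> HTML string
        self._ttl = ttl
        self._clock = clock
        self._entries = {}     # page key -> (generated_at, etag, html)

    def _fresh_entry(self, key):
        """Return the cached entry, regenerating it once the TTL has passed."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is None or now - entry[0] >= self._ttl:
            html = self._render(key)
            etag = '"' + hashlib.sha1(html.encode("utf-8")).hexdigest() + '"'
            entry = (int(now), etag, html)
            self._entries[key] = entry
        return entry

    def respond(self, key, logged_in=False, if_none_match=None,
                if_modified_since=None):
        """Return (status, headers, body) for one request."""
        if logged_in:
            # Logged-in users always get freshly generated content.
            return 200, {}, self._render(key)
        generated_at, etag, html = self._fresh_entry(key)
        headers = {
            "ETag": etag,
            "Last-Modified": formatdate(generated_at, usegmt=True),
        }
        if if_none_match is not None:
            not_modified = if_none_match == etag
        elif if_modified_since is not None:
            try:
                since = parsedate_to_datetime(if_modified_since).timestamp()
            except (TypeError, ValueError):
                since = None  # unparseable date: fall back to a full response
            not_modified = since is not None and generated_at <= since
        else:
            not_modified = False
        if not_modified:
            return 304, headers, ""  # saves bandwidth as well as CPU
        return 200, headers, html
```

A request without validators gets a "200" served from the cache (faster, same bandwidth); a request that echoes back the `ETag` or `Last-Modified` value gets the "304" variant, which is the one that also saves bandwidth.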
      • intrep
        hmm, you could also, if you wanted to get really tricky, stick some squid caches out on people's machines that will redirect to your main site and cache some stuff.
      • taking a lot of load off of your main site
      • Mutiny
        180980 mp3list.txt
      • 133589 trmdb.txt
      • intrep
        a sort of squid mirror
      • Mutiny raps his fingers.
      • djce
        squid: The yet-to-be-created mirror system would handle that too.
      • 180,000 MP3 files? Wow.
      • intrep
        i need food
      • intrep is afk for grub
      • Mutiny
        mmm
      • 637gb, all downloadable and streamable courtesy of unixpunx.org
      • rjmunro
        rjmunro (~rjmunro@62.53.126.173) has joined #musicbrainz
      • djce
        That would take a while :-)
      • Are you volunteering to add all that data to MB?
      • Mutiny
        if he wants it, he's welcome to it
      • rjmunro
        Anyone heard anything from MusicMoz?
      • djce
        not I
      • Mutiny
        the database will have filenames, id3 tag, encoding info, and duration
      • if it'll be useful to anybody
      • djce
        The database as in those two text files, yes? Not all of the audio files themselves?
      • djce is away: I'm busy
      • Mutiny
        no, just the metadata on the archive
      • djce
        Possibly. How accurate is the data, I wonder?
      • Mutiny
        the id3 tags are probably not too accurate, but the file info is pretty accurate
      • the filename + path will give the artist/album/title, that's how we try to organize the archive
      • MBChatLogger
        MBChatLogger (~musicbra-@client3.fre.communitycolo.net) has joined #musicbrainz
      • Mutiny
        hehe
      • intrep
        hmm
      • intrep is back
      • i have a url that causes your website to give a '500 internal error'
      • djce
        hi
      • oh :-(
      • go on
      • intrep
        you want me to paste it in here?
      • djce
        ok
      • intrep
      • checking you guys out for xss :)
      • djce
        Yes, there will be lots like that. Most input isn't validated; specifically, things which should be numeric aren't checked for being so.
      • djce slaps it on the TODO list...
      • intrep
        you might want to do something like 'if ($ARGS{artistid} =~ /[^0-9]/) { delete ($ARGS{artistid}); }'
      • i c
      • djce
        I usually use: $id = ($foo =~ /\A(\d+)\z/ ? $1 : 0) actually
      • intrep
        your string parsing in the search panel appears to do the right thing
      • that works too
      • djce
        or (\d{1,10}) if I'm feeling thorough
      • for approximate values of 10 :-)
      • intrep
        heh
      • djce
        Just try not to do that sort of stuff until I've added all the validation, ok? ;-)
      • intrep
        sure
      • <turning off xss script checker...>
      • djce
        Maybe I'll call on you again to test it when I claim to have fixed it?
      • intrep
        no prob
      • djce
        Cool. Thanks!
      • intrep
        anyway, a 500 error is far preferable to the script making it through to the browser
      • djce
        True. But that depends on what it managed to do before barfing.
      • If it's actually feeding it unchecked into SQL, that's probably even worse.
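
The validation approach discussed above (accept an ID only if the whole string is one to ten digits, otherwise fall back to 0) and the worry about unchecked input reaching SQL could be illustrated roughly like this. It is a Python sketch rather than the site's Perl: `parse_id`, `artist_name`, and the `artist` table with `id` and `name` columns are hypothetical names made up for the example.

```python
import re
import sqlite3

# Whole-string match of 1 to 10 ASCII digits, in the spirit of /\A(\d{1,10})\z/.
_ID_PATTERN = re.compile(r"[0-9]{1,10}")


def parse_id(raw):
    """Return raw as an int, or 0 if it is not a plain run of 1-10 digits."""
    if isinstance(raw, str) and _ID_PATTERN.fullmatch(raw):
        return int(raw)
    return 0


def artist_name(conn, raw_id):
    """Look up an artist name, passing the ID as a bound parameter."""
    artist_id = parse_id(raw_id)
    # The placeholder keeps the value out of the SQL text even if
    # validation were ever skipped or loosened.
    row = conn.execute(
        "SELECT name FROM artist WHERE id = ?", (artist_id,)
    ).fetchone()
    return row[0] if row else None
```

`fullmatch` is used instead of a `$` anchor because `$` in Python would also accept a trailing newline. Validating the ID and binding it as a parameter are two separate safeguards; either one alone would stop markup or SQL fragments in the ID from reaching the query.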
      • intrep
        true
      • where can i find the todo list? cvs?
      • djce
        There's three "TODO" pages on the web. Plus there's my TODO list, which is only here on my dev machine.
      • intrep
        gotcha
      • djce
        Plus anybody else's TODO lists, if they also keep their own lists.
      • I can post it for you if you like.
      • intrep
        sure
      • djce
        ok, hold on...
      • intrep
        how complete are the currently existing web ones?
      • if they are roughly similar then they are fine
      • djce
        http://musicbrainz.org/~dave/TODO.txt - all sorts of scribblings
      • intrep
        cool, danke
      • djce
        the web TODOs? Pretty out of date.
      • It was mentioned a month or so ago that we need a more responsive way of dealing with those, so we would be more likely to keep them up-to-date.
      • My TODO list tends mainly to be small bug-fix-like things. Rob's TODO list tends to be more oriented towards the big picture (e.g. "implement mirroring")
      • intrep
        perhaps bugzilla, file bugs for things that need to be done
      • djce
        We've got the sourceforge bug list too
      • intrep
        yeah, maybe just file your todo items as bugs
      • djce
        That's a mixture of bugs reported by users, potential bugs spotted by us, and RFEs
      • Yes, but how much easier is just appending two lines to a local text file than posting a web form? :-)
      • intrep
        very true
      • you could write a script to autopost your text file as bugs :)
      • you just have to format your text file more carefully
      • djce
        Interesting idea. It would be a start just to copy my TODO file to the web, like I did for you just now, and link to it from the main site.
      • (and do so regularly)
      • intrep
        write a script to copy the file to the web server and run it from cron
      • :)
      • can you tell i hate doing things manually?
      • djce
        crontab -e ; 0 * * * * scp TODO server:./public_html/TODO.txt :-)
      • intrep
        hehe, perfect
      • djce
        You know, I might actually do that :-)
      • intrep
        make it djce-TODO.txt and then add a link from a main todo.html page and call it done
      • then other developers can follow in your footsteps