#metabrainz


      • shivam-kapila
      • ruaok
        nope. when I click on the track that was skipped it is fully playable.
      • shivam-kapila
        oh
      • adhawkins_ has quit
      • Mr_Monkey: how do I import the lobes file you linked?
      • Mr_Monkey
        It's already imported, you should just be able to use the variable in your less file
      • shivam-kapila
        `NameError: variable @listenbrainz is undefined`
      • Mr_Monkey
        Hm.
      • Maybe something like `@import "./path/to/lobes/less/theme.less";`
      • Not sure where lobes is in LB
      • I think `@import "./theme/theme.less";`
      • But… I don't see the @listenbrainz variable anywhere, oddly.
      • Not sure what the deal is
      • shivam-kapila
        maybe lobes isn't in lb
      • nada
      • Mr_Monkey
        Hm. OK. Then ignore my remarks :)
      • We'll probably want to refactor that at some point to avoid having colors defined in multiple places.
      • shivam-kapila
        colors.less?
      • Mr_Monkey
        Something like that yes
      • Imported at the very top of main.less
      • shivam-kapila
        hm
      • I will make it in next PR
      • yvanzo
        _lucifer: I split the search bug report since not all issues can be addressed at once, can you please make your PRs/commits match the new tickets?
      • _lucifer
        will do yvanzo
      • yvanzo: saw your comment about gender-id. what would be the process to add that to indexing and are there any deployment concerns around that ?
      • alastairp: ping
      • alastairp
        hey
      • 5 minutes?
      • _lucifer
        sure!
      • yvanzo
        _lucifer: there is no deployment concern afaik but I just cannot test integration without changes to the indexer.
      • _lucifer
        ah ok! i saw you already assigned that to yourself. thanks!
      • alastairp
        _lucifer: I'm here
      • _lucifer
        hi!
      • alastairp
        what were we talking about? stats and graphs on AB?
      • _lucifer
        yes!
      • alastairp
        cool
      • let me pull up some stuff
      • I think I showed you this, right? https://github.com/MTG/acousticbrainz-labs/tree...
      • _lucifer
        yes, right
      • alastairp
        so one thing that we're trying to do is make the site look interesting
      • personally, I'd love to see this data update in real-time
      • so the question to answer is: which graphs describe the data in the most interesting way?
      • _lucifer
        interesting question
      • alastairp
        you'll see things like https://github.com/MTG/acousticbrainz-labs/blob... are pretty terrible
      • _lucifer
      • alastairp
        we don't want to show most of these, because they don't make any sense
      • yeah, I like the features/year and features/genre one
      • it's a lot better to do more objective graphs - features and years are pretty good
      • whereas if we start showing genre graphs and say "all music falls into one of these 8 categories", I'm sure that people will start complaining :)
      • _lucifer
        right, makes sense
      • we currently do not have a pipeline to create these graphs right?
      • alastairp
        no
      • well, we have the code used in these notebooks to generate the graphs
      • _lucifer
        right, we need to integrate these with the AB database
      • alastairp
      • _lucifer
        by real time you mean like updating whenever a recording is submitted ?
      • alastairp
        the problem with integration into the database is that it's too slow to query all of the data, even if we did it periodically
      • right, perhaps not that often, but say once a week
      • _lucifer
        that's doable
      • alastairp
        we have much of this information in the `similarity.similarity` table
      • and it's much smaller than the lowlevel table
      • _lucifer
        that's nice!
      • alastairp
        so perhaps we could have a periodic task that we run that summarises this table
      • if not, we could definitely also create another statistics table, although there is a question about what data we should add there
      • we could make some initial tables, and load data, and then if we need more data for more graphs, we add those at a later stage
      • for example - the similarity table doesn't have years, so we'd have to get that separately
      • then say for example we wanted to compare year to loudness, we'd need some kind of table that allowed us to join this info together
      • the genre or mood tables are much easier, because we just need categories and counts
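(Aside: the category-and-count tables described above can be produced with a single GROUP BY. A minimal sketch in Python, using an in-memory SQLite database as a stand-in for the AB PostgreSQL instance; the `highlevel_genre` and `statistics_genre` table names are illustrative assumptions, not the real AB schema:)

```python
import sqlite3

# Stand-in for the AB database; in production this would be PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE highlevel_genre (mbid TEXT, genre TEXT)")  # hypothetical source table
conn.executemany(
    "INSERT INTO highlevel_genre VALUES (?, ?)",
    [("a", "rock"), ("b", "rock"), ("c", "jazz"), ("d", "pop"), ("e", "jazz")],
)

# A per-graph statistics table only needs categories and counts.
conn.execute("CREATE TABLE statistics_genre (genre TEXT PRIMARY KEY, count INTEGER)")
conn.execute(
    "INSERT INTO statistics_genre SELECT genre, COUNT(*) FROM highlevel_genre GROUP BY genre"
)

# counts per genre, e.g. {'jazz': 2, 'pop': 1, 'rock': 2}
print(dict(conn.execute("SELECT genre, count FROM statistics_genre")))
```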
      • _lucifer
        +1
      • alastairp
        OK, so
      • let's focus on the following charts:
      • genre rosamerica, feature/genre, feature/year, key estimation, genre mood (at the end of mood)
      • _lucifer
        awesome!
      • alastairp
        on bono, you can `psql -U acousticbrainz acousticbrainz_big` (not inside docker)
      • that has a full lowlevel table, and full `similarity.similarity` table
      • I think we should create a new postgresql schema (call it `statistics`), and for each graph, make a new table in this schema that stores just the information that we need for that graph
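(Aside: the proposed `statistics` schema with one table per graph could look something like the DDL below, held in a Python string so it can be run with e.g. psycopg2. Only the schema name comes from the discussion; the table and column layouts are illustrative assumptions:)

```python
# Sketch of the proposed per-graph statistics tables (PostgreSQL DDL).
DDL = """
CREATE SCHEMA IF NOT EXISTS statistics;

-- one table per graph, storing only what that graph needs
CREATE TABLE statistics.genre_rosamerica (
    genre text PRIMARY KEY,
    count integer NOT NULL
);

CREATE TABLE statistics.key_estimation (
    key text PRIMARY KEY,
    count integer NOT NULL
);
"""
print(DDL)
```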
      • _lucifer
        yeah that's a great place to start
      • alastairp
        then we can see how easy it is to 1) get the data from similarity.similarity, or 2) get the data from lowlevel as the data comes in
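(Aside: option 2 above, updating the stats as each submission arrives, can be a single upsert per new row instead of a periodic full scan. A sketch using SQLite's `ON CONFLICT` upsert (available since SQLite 3.24; PostgreSQL has the same syntax) as a stand-in; the `statistics_genre` table name is an illustrative assumption:)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statistics_genre (genre TEXT PRIMARY KEY, count INTEGER)")

def record_submission(conn, genre):
    # Increment the per-genre counter as each new lowlevel submission
    # arrives, instead of re-scanning the whole table periodically.
    conn.execute(
        "INSERT INTO statistics_genre (genre, count) VALUES (?, 1) "
        "ON CONFLICT(genre) DO UPDATE SET count = count + 1",
        (genre,),
    )

for g in ["rock", "rock", "jazz"]:
    record_submission(conn, g)

print(conn.execute("SELECT genre, count FROM statistics_genre ORDER BY genre").fetchall())
# → [('jazz', 1), ('rock', 2)]
```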
      • _lucifer
        okay, will we be needing to use spark?
      • alastairp
        I don't think so
      • this isn't really analysis, it's just loading and transforming data
      • if we wanted to use it, we'd have to load all of the necessary data into hdfs, which I suspect would be really annoying
      • _lucifer
        okay, yeah right. the similarity table is smaller and we can probably process it directly
      • alastairp
        that's what I'm hoping
      • _lucifer
        we can use spark without hdfs but that's a thing to consider for afterwards
      • alastairp
        oh? how would that work?
      • in some cases it might make sense to use spark for machine learning in AB, we should look into it as future option
      • _lucifer
        > Access data in HDFS, Alluxio, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of other data sources.
      • Spark home page says this
      • i had also read an article on the same but cannot find it right now
      • this one sums it up
      • PostgreSQL provides a JDBC driver that allows spark to connect to it directly
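(Aside: reading a Postgres table into Spark over JDBC, with no HDFS involved, looks roughly like this. The host/database/table names reuse the ones mentioned in the channel but are illustrative; the driver version is an assumption:)

```python
# Hypothetical helper for building the JDBC connection URL.
def jdbc_url(host: str, database: str, port: int = 5432) -> str:
    return f"jdbc:postgresql://{host}:{port}/{database}"

try:
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("ab-stats")
        # the PostgreSQL JDBC driver jar must be on the classpath
        .config("spark.jars.packages", "org.postgresql:postgresql:42.7.3")
        .getOrCreate()
    )
    df = (
        spark.read.format("jdbc")
        .option("url", jdbc_url("bono", "acousticbrainz_big"))
        .option("dbtable", "similarity.similarity")
        .option("user", "acousticbrainz")
        .load()
    )
    df.printSchema()
except ImportError:
    # pyspark not installed; the URL helper still works on its own
    print(jdbc_url("bono", "acousticbrainz_big"))
```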
      • BrainzGit
        [mb-solr] yvanzo merged pull request #39 (master…SEARCH-611): SEARCH-628: 'primary-type-id' field is missing from JSON release group search results https://github.com/metabrainz/mb-solr/pull/39
      • BrainzBot
        SEARCH-611: Incorrect content in JSON version of release group search result https://tickets.metabrainz.org/browse/SEARCH-611
      • SEARCH-628: 'primary-type-id' field is missing from JSON release group search results https://tickets.metabrainz.org/browse/SEARCH-628
      • _lucifer
        alastairp: by the way, why do we use HDFS ?
      • yvanzo
        _lucifer: can you please update #43 too?
      • _lucifer
        yes, yvanzo i am just testing it locally and will push the changes soon
      • yvanzo
        btw, status-id change requires indexer changes too. I will update ticket accordingly and work on sir.
      • _lucifer
        oh ok! thanks
      • yvanzo
        I’m reviewing oxml cleanup and Java 11 preps next :)
      • _lucifer
        Great! :D
      • yvanzo
        _lucifer: did you use specific commands for #42?
      • _lucifer
        yvanzo: no, why?
      • yvanzo
        It could have helped with rebasing.
      • "Auto format" sounds like something automated though :)
      • _lucifer
        oh! that, i just had my ide format that file to 4 space indent
      • rest is poor choice of words 😅
      • can you advise how I could have made rebasing easier?
      • yvanzo
        If that was a command in the commit message, one would just have to run it again.
      • _lucifer
        oh! makes sense. that i can do with the ide again. i'll rebase and drop the existing commit.
      • the clustering is the one that will have to be done manually and take some time
      • yvanzo
        Java 11 PR looks good overall, but there are a few deployment concerns, will not merge for the upcoming release.
      • _lucifer
        👍
      • yvanzo
        Can you also remove unrelated commits from #37? (since they have been copied to separate PRs)
      • _lucifer
        yeah sure
      • ruaok
        it was bound to happen. my own code (and pristine___s) rickrolled me.
      • _lucifer
        lol 😂😂
      • shivam-kapila
        ruaok: welcome to kapilaland
      • JoshDi joined the channel
      • ruaok
        I actually think this is an evil plot by pristine___ to get back at me
      • JoshDi
        Hey quick question. I currently run a musicbrainz slave server via the docker image. Is there a way to turn off indexing completely so all local queries go directly to the database?
      • shivam-kapila
        ruaok: save yourself
      • ruaok
        K says: "Never gonna give listenbrainz up, never gonna let listenbrainz down, never gonna turn around and hurt listenbrainz!"
      • yvanzo: ^^ see JoshDi's query
      • ruaok waves at JoshDi
      • JoshDi
        Hey
        I find that even with SIR tweaks, live indexing the slave's daily updates takes like 12 hours to finish, whereas a full reindex takes 3 hrs
      • shivam-kapila
        ruaok: I introduced a lil of troi to ppl
      • And they were like damn. Dynamic playlists
      • JoshDi
        I only use this server for some local processes so its not like my musicbrainz server is very busy.
      • shivam-kapila
        They felt really excited
      • JoshDi
        Any ideas?
      • ruaok
        I don't know, but yvanzo will. hang tight for him to return and he'll sort you out. (he is around)
      • JoshDi
      • [solr]
        uri = http://search:8983/solr
        batch_size = 5000
        [sir]
        import_threads = 10
        index_limit = 2000000
        live_index_batch_size = 5000
        process_delay = 15
        query_batch_size = 20000
        wscompat = on
        prefetch_count = 2000
      • I'm now running this on a machine with 128 GB of RAM and 40 threads at 3.1 GHz... so it should be much faster
      • shivam-kapila
        ruaok: can I ask an off topic doubt
      • ruaok
        are you giving the JVMs RAM to use? by default they may not be using enough, making things slower
      • shivam-kapila: why do you keep asking if you can ask a question?
      • JoshDi13 joined the channel
      • shivam-kapila
        Anyways do we have ryzen processors in prod?
      • JoshDi13
        my memory settings are: shm_size: 4g and SOLR_HEAP=4g
      • ruaok
        shivam-kapila: yes
      • JoshDi13: why not try 8g and see what happens?
      • JoshDi has left the channel
      • JoshDi13 is now known as JoshDi
      • JoshDi
        postgres -c "shared_buffers=4GB" -c "work_mem=128MB" -c "shared_preload_libraries=pg_amqp.so"