in #metabrainz

14:51 PM
shivam-kapila

https://github.com/metabrainz/listenbrainz-serv...
14:52 PM
ruaok

nope. when I click on the track that was skipped it is fully playable.
14:52 PM
shivam-kapila

oh
14:54 PM
adhawkins_ has quit
15:20 PM
Mr_Monkey: how do I import the lobes file you linked?
15:21 PM
Mr_Monkey

It's already imported, you should just be able to use the variable in your less file
15:21 PM
shivam-kapila

`NameError: variable @listenbrainz is undefined`
15:21 PM
Mr_Monkey

Hm.
15:24 PM
Maybe something like `@import "./path/to/lobes/less/theme.less";`
15:24 PM
Not sure where lobes is in LB
15:26 PM
I think `@import "./theme/theme.less";`
15:26 PM
But… I don't see the @listenbrainz variable anywhere, oddly.
15:26 PM
Not sure what the deal is
15:26 PM
shivam-kapila

maybe lobes isn't in lb
15:28 PM
http://livegrep.metabrainz.org/search/livegrep?...
15:28 PM
nada
15:28 PM
Mr_Monkey

Hm. OK. Then ignore my remarks :)
15:29 PM
We'll probably want to refactor that at some point to avoid having colors defined in multiple places.
15:29 PM
shivam-kapila

colors.less?
15:31 PM
Mr_Monkey

Something like that yes
15:31 PM
Imported at the very top of main.less
15:31 PM
shivam-kapila

hm
15:31 PM
I will make it in next PR
15:39 PM
yvanzo

_lucifer: I split the search bug report since not all issues can be addressed at once, can you please make your PRs/commits match the new tickets?
15:39 PM
_lucifer

will do yvanzo
16:12 PM
yvanzo: saw your comment about gender-id. what would be the process to add that to indexing and are there any deployment concerns around that ?
16:13 PM
alastairp: ping
16:13 PM
alastairp

hey
16:14 PM
5 minutes?
16:14 PM
_lucifer

sure!
16:15 PM
yvanzo

_lucifer: there is no deployment concern afaik but I just cannot test integration without changes to the indexer.
16:19 PM
_lucifer

ah ok! i saw you already assigned that to yourself. thanks!
16:22 PM
alastairp

_lucifer: I'm here
16:22 PM
_lucifer

hi!
16:23 PM
alastairp

what were we talking about? stats and graphs on AB?
16:23 PM
_lucifer

yes!
16:23 PM
alastairp

cool
16:23 PM
let me pull up some stuff
16:24 PM
I think I showed you this, right? https://github.com/MTG/acousticbrainz-labs/tree...
16:24 PM
_lucifer

yes, right
16:25 PM
alastairp

so one thing that we're trying to do is make the site look interesting
16:26 PM
personally, I'd love to see this data update in real-time
16:27 PM
so the question to answer is which graphs describe the data in the most interesting way
16:27 PM
_lucifer

interesting question
16:27 PM
alastairp

you'll see things like https://github.com/MTG/acousticbrainz-labs/blob... are pretty terrible
16:28 PM
_lucifer

I think this one is interesting https://github.com/MTG/acousticbrainz-labs/blob...
16:28 PM
alastairp

we don't want to show most of these, because they don't make any sense
16:28 PM
yeah, I like the features/year and features/genre one
16:29 PM
it's a lot better to do more objective graphs - features and years are pretty good
16:29 PM
whereas if we start showing genre graphs and say "all music falls into one of these 8 categories", I'm sure that people will start complaining :)
16:29 PM
_lucifer

right, makes sense
16:30 PM
we currently do not have a pipeline to create these graphs right?
16:30 PM
alastairp

no
16:30 PM
well, we have the code used in these notebooks to generate the graphs
16:31 PM
_lucifer

right, we need to integrate these with ab database
16:31 PM
alastairp

however, we also have this: https://github.com/metabrainz/acousticbrainz-se...
16:31 PM
_lucifer

by real time you mean like updating whenever a recording is submitted ?
16:32 PM
alastairp

the problem with integration into the database is that it's too slow to query all of the data, even if we did it periodically
16:32 PM
right, perhaps not that often, but say once a week
16:32 PM
_lucifer

that's doable
16:32 PM
alastairp

we have much of this information in the `similarity.similarity` table
16:33 PM
and it's much smaller than the lowlevel table
16:33 PM
_lucifer

that's nice!
16:33 PM
alastairp

so perhaps we could have a periodic task that we run that summarises this table
16:34 PM
if not, we could definitely also create another statistics table, although there is a question about what data we should add there
16:34 PM
we could make some initial tables, and load data, and then if we need more data for more graphs, we add those at a later stage
16:34 PM
for example - the similarity table doesn't have years, so we'd have to get that separately
16:35 PM
then say for example we wanted to compare year to loudness, we'd need some kind of table that allowed us to join this info together
16:35 PM
the genre or mood tables are much easier, because we just need categories and counts
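[editor's note] A minimal sketch of the two cases alastairp describes, assuming toy in-memory data: category counts for a genre chart, and a year/loudness join for a feature/year chart. The row layout, the `features` and `years` names, and the values are all hypothetical and are not taken from the AcousticBrainz schema.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical rows: (recording id, genre label, loudness value).
features = [
    ("rec-1", "rock", 0.82),
    ("rec-2", "jazz", 0.55),
    ("rec-3", "rock", 0.90),
]

# Hypothetical year lookup, fetched separately because the
# similarity table does not carry years.
years = {"rec-1": 1994, "rec-2": 1961, "rec-3": 1994}


def category_counts(rows):
    """Count recordings per category (enough for a genre or mood chart)."""
    return Counter(genre for _, genre, _ in rows)


def mean_loudness_by_year(rows, year_by_id):
    """Join features to years on recording id, then average per year."""
    grouped = defaultdict(list)
    for rec_id, _, loudness in rows:
        year = year_by_id.get(rec_id)
        if year is not None:  # skip recordings with no known year
            grouped[year].append(loudness)
    return {year: mean(values) for year, values in grouped.items()}


genre_counts = category_counts(features)
loudness_by_year = mean_loudness_by_year(features, years)
```

The counts case needs only one pass over one column, while the feature/year case needs a second data source and a join key, which is why it is the harder of the two.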
16:36 PM
_lucifer

+1
16:36 PM
alastairp

OK, so
16:36 PM
let's focus on the following charts:
16:37 PM
genre rosamerica, feature/genre, feature/year, key estimation, genre mood (at the end of mood)
16:38 PM
_lucifer

awesome!
16:38 PM
alastairp

on bono, you can `psql -U acousticbrainz acousticbrainz_big` (not inside docker)
16:39 PM
that has a full lowlevel table, and full `similarity.similarity` table
16:39 PM
I think we should create a new postgresql schema (call it `statistics`), and for each graph, make a new table in this schema that stores just the information that we need for that graph
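[editor's note] A rough sketch of the "one table per graph in a `statistics` schema" idea, using SQLite from the Python standard library as a stand-in so it runs anywhere. In PostgreSQL the schema would come from `CREATE SCHEMA statistics`; here an attached in-memory database plays that role. The `similarity` toy table, its `genre` column, and the `genre_rosamerica` table name are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for PostgreSQL's CREATE SCHEMA statistics: an attached
# database lets us write statistics.<table> in SQLite.
conn.execute("ATTACH DATABASE ':memory:' AS statistics")

# Toy source table; the real similarity.similarity columns may differ.
conn.execute("CREATE TABLE similarity (id INTEGER PRIMARY KEY, genre TEXT)")
conn.executemany(
    "INSERT INTO similarity (genre) VALUES (?)",
    [("rock",), ("jazz",), ("rock",)],
)
conn.commit()


def summarise_genres(connection):
    """Build the per-graph table holding only what the genre chart needs."""
    connection.execute(
        "CREATE TABLE statistics.genre_rosamerica ("
        " genre TEXT PRIMARY KEY,"
        " recording_count INTEGER NOT NULL)"
    )
    connection.execute(
        "INSERT INTO statistics.genre_rosamerica (genre, recording_count) "
        "SELECT genre, COUNT(*) FROM similarity GROUP BY genre"
    )
    connection.commit()
    return connection.execute(
        "SELECT genre, recording_count "
        "FROM statistics.genre_rosamerica ORDER BY genre"
    ).fetchall()


genre_rows = summarise_genres(conn)
```

A periodic job could drop and rebuild such a table, so the site only ever reads the small summary rather than the full source table.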
16:40 PM
_lucifer

yeah that's a great place to start
16:40 PM
alastairp

then we can see how easy it is to 1) get the data from similarity.similarity, or 2) get the data from lowlevel as the data comes in
16:41 PM
_lucifer

okay, will we need to use spark?
16:41 PM
alastairp

I don't think so
16:41 PM
this isn't really analysis, it's just loading and transforming data
16:42 PM
if we wanted to use it, we'd have to load all of the necessary data into hdfs, which I suspect would be really annoying
16:42 PM
_lucifer

okay, yeah right. the similarity table is smaller and we can probably process it directly
16:42 PM
alastairp

that's what I'm hoping
16:43 PM
_lucifer

we can use spark without hdfs but that's a thing to consider for afterwards
16:44 PM
alastairp

oh? how would that work?
16:45 PM
in some cases it might make sense to use spark for machine learning in AB, we should look into it as a future option
16:45 PM
_lucifer

> Access data in HDFS, Alluxio, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of other data sources.
16:45 PM
Spark home page says this
16:46 PM
i had also read an article on the same but cannot find it right now
16:46 PM
https://spark.apache.org/docs/latest/sql-data-s...
16:46 PM
this one sums it up
16:47 PM
PostgreSQL provides a JDBC driver that allows spark to connect to it directly
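[editor's note] A hedged sketch of what reading a PostgreSQL table from Spark over JDBC might look like in PySpark, without HDFS. The host, port, and password below are placeholders, and `postgres_jdbc_options` and `load_table` are hypothetical helper names. Calling `load_table` would also need a running Spark session with the PostgreSQL JDBC driver jar on its classpath, which this sketch does not set up.

```python
def postgres_jdbc_options(host, port, database, table, user, password):
    """Build the option dict for Spark's generic JDBC data source."""
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }


def load_table(spark, options):
    """Read the table into a DataFrame.

    `spark` is expected to be a pyspark.sql.SparkSession; it is passed in
    so that this module can be imported without PySpark installed.
    """
    return spark.read.format("jdbc").options(**options).load()


# Placeholder connection details, for illustration only.
opts = postgres_jdbc_options(
    host="localhost",
    port=5432,
    database="acousticbrainz_big",
    table="similarity.similarity",
    user="acousticbrainz",
    password="secret",
)
```

Only the option builder is exercised here; whether reading the table this way is fast enough in practice would have to be measured.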
16:49 PM
BrainzGit

[mb-solr] yvanzo merged pull request #39 (master…SEARCH-611): SEARCH-628: 'primary-type-id' field is missing from JSON release group search results https://github.com/metabrainz/mb-solr/pull/39
16:49 PM
BrainzBot

SEARCH-611: Incorrect content in JSON version of release group search result https://tickets.metabrainz.org/browse/SEARCH-611
16:49 PM
SEARCH-628: 'primary-type-id' field is missing from JSON release group search results https://tickets.metabrainz.org/browse/SEARCH-628
16:50 PM
_lucifer

alastairp: by the way, why do we use HDFS ?
16:50 PM
yvanzo

_lucifer: can you please update #43 too?
16:50 PM
_lucifer

yes, yvanzo i am just testing it locally and will push the changes soon
16:50 PM
yvanzo

btw, status-id change requires indexer changes too. I will update ticket accordingly and work on sir.
16:51 PM
_lucifer

oh ok! thanks
16:52 PM
yvanzo

I’m reviewing oxml cleanup and Java 11 preps next :)
16:52 PM
_lucifer

Great! :D
16:56 PM
yvanzo

_lucifer: did you use specific commands for #42?
16:56 PM
_lucifer

yvanzo: no, why?
16:57 PM
yvanzo

It could have helped with rebasing.
16:57 PM
"Auto format" sounds like something automated though :)
16:58 PM
_lucifer

oh! that, i just had my ide format that file to 4 space indent
16:59 PM
rest is poor choice of words 😅
17:00 PM
can you advise how I could have made rebasing easier?
17:00 PM
yvanzo

If that was a command in the commit message, one would just have to run it again.
17:03 PM
_lucifer

oh! makes sense. that i can do with the ide again. i'll rebase and drop the existing commit.
17:04 PM
the clustering is the one that will have to be done manually and take some time
17:11 PM
yvanzo

Java 11 PR looks good overall, but there are a few deployment concerns, will not merge for the upcoming release.
17:11 PM
_lucifer

👍
17:13 PM
yvanzo

Can you also remove unrelated commits from #37? (since they have been copied to separate PRs)
17:14 PM
_lucifer

yeah sure
17:16 PM
ruaok

it was bound to happen. my own code (and pristine___'s) rickrolled me.
17:16 PM
https://usercontent.irccloud-cdn.com/file/7YL4h...
17:18 PM
_lucifer

lol 😂😂
17:19 PM
shivam-kapila

ruaok: welcome to kapilaland
17:19 PM
JoshDi joined the channel
17:19 PM
ruaok

I actually think this is an evil plot by pristine___ to get back at me
17:20 PM
JoshDi

Hey quick question. I currently run a musicbrainz slave server via the docker image. Is there a way to turn off indexing completely so all local queries go directly to the database?
17:20 PM
shivam-kapila

ruaok: save yourself
17:21 PM
ruaok

K says: "Never gonna give listenbrainz up, never gonna let listenbrainz down, never gonna turn around and hurt listenbrainz!"
17:21 PM
yvanzo: ^^ see JoshDi's query
17:22 PM
ruaok waves at JoshDi
17:22 PM
JoshDi

Hey
17:23 PM
I find that even with SIR tweaks, live indexing of the slave's daily updates takes like 12 hours to finish, when a full reindex takes 3 hrs
17:23 PM
shivam-kapila

ruaok: I introduced a lil of troi to ppl
17:23 PM
And they were like damn. Dynamic playlists
17:23 PM
JoshDi

I only use this server for some local processes so it's not like my musicbrainz server is very busy.
17:23 PM
shivam-kapila

They felt really excited
17:24 PM
JoshDi

Any ideas?
17:25 PM
ruaok

I don't know, but yvanzo will. hang tight for him to return and he'll sort you out. (he is around)
17:26 PM
JoshDi

[solr]
uri = http://search:8983/solr
batch_size = 5000
[sir]
import_threads = 10
index_limit = 2000000
live_index_batch_size = 5000
process_delay = 15
query_batch_size = 20000
wscompat = on
prefetch_count = 2000
17:26 PM
I'm now running this on a machine with 128 GB of RAM and 40 threads at 3.1 GHz.... so it should be much faster
17:28 PM
shivam-kapila

ruaok: can I ask an off topic doubt
17:28 PM
ruaok

are you giving the JVMs RAM to use? by default they may not be using enough, making things slower
17:28 PM
shivam-kapila: why do you keep asking if you can ask a question?
17:29 PM
JoshDi13 joined the channel
17:30 PM
shivam-kapila

Anyways do we have ryzen processors in prod?
17:30 PM
JoshDi13

my memory settings are: shm_size: 4g and SOLR_HEAP=4g
17:30 PM
ruaok

shivam-kapila: yes
17:30 PM
JoshDi13: why not try 8g and see what happens?
17:30 PM
JoshDi has left the channel
17:31 PM
JoshDi13 is now known as JoshDi
17:31 PM
JoshDi

postgres -c "shared_buffers=4GB" -c "work_mem=128MB" -c "shared_preload_libraries=pg_amqp.so"