by real time, do you mean updating whenever a recording is submitted?
2020-10-16 29010, 2020
alastairp
the problem with integration into the database is that it's too slow to query all of the data, even if we did it periodically
2020-10-16 29022, 2020
alastairp
right, perhaps not that often, but say once a week
2020-10-16 29036, 2020
_lucifer
that's doable
2020-10-16 29057, 2020
alastairp
we have much of this information in the `similarity.similarity` table
2020-10-16 29004, 2020
alastairp
and it's much smaller than the lowlevel table
2020-10-16 29034, 2020
_lucifer
that's nice!
2020-10-16 29037, 2020
alastairp
so perhaps we could have a periodic task that we run that summarises this table
2020-10-16 29005, 2020
alastairp
if not, we could definitely also create another statistics table, although there is a question about what data we should add there
2020-10-16 29026, 2020
alastairp
we could make some initial tables, and load data, and then if we need more data for more graphs, we add those at a later stage
2020-10-16 29041, 2020
alastairp
for example - the similarity table doesn't have years, so we'd have to get that separately
2020-10-16 29014, 2020
alastairp
then say for example we wanted to compare year to loudness, we'd need some kind of table that allowed us to join this info together
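A minimal Python sketch of the year-vs-loudness join described above. It assumes loudness values (e.g. read from `similarity.similarity`) and release years (fetched separately) have each been loaded into a dict keyed by recording MBID. The function names and the dict-based inputs are hypothetical and are not taken from the AcousticBrainz codebase. Recordings that are missing from either source are dropped, and the remaining values are averaged per year, which is roughly the shape a year/loudness chart table would store.

```python
from typing import Dict, List, Tuple


def join_year_loudness(
    loudness_by_mbid: Dict[str, float],
    year_by_mbid: Dict[str, int],
) -> List[Tuple[int, float]]:
    """Pair each recording's year with its loudness (hypothetical inputs keyed by MBID).

    Only MBIDs present in both sources are kept; the result is sorted by (year, loudness).
    """
    pairs = []
    for mbid, loudness in loudness_by_mbid.items():
        year = year_by_mbid.get(mbid)
        if year is not None:
            pairs.append((year, loudness))
    return sorted(pairs)


def mean_loudness_per_year(pairs: List[Tuple[int, float]]) -> Dict[int, float]:
    """Average the loudness values for each year, returning years in ascending order."""
    totals: Dict[int, Tuple[float, int]] = {}
    for year, loudness in pairs:
        total, n = totals.get(year, (0.0, 0))
        totals[year] = (total + loudness, n + 1)
    return {year: total / n for year, (total, n) in sorted(totals.items())}
```

Storing one aggregated row per year keeps the chart table tiny, at the cost of recomputing it whenever the source data changes.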
2020-10-16 29029, 2020
alastairp
the genre or mood tables are much easier, because we just need categories and counts
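To illustrate why the genre and mood tables are simpler: they reduce to counting how many recordings fall into each category. The following is a minimal Python sketch, assuming the per-recording labels (e.g. the top genre from a classifier) have already been read into a list. The function name and the example labels are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def category_counts(labels: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (category, count) rows, most common first; ties are ordered by name."""
    counts = Counter(labels)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical labels, one per recording
rows = category_counts(["rock", "pop", "rock", "jazz"])
```

Each returned row maps directly onto a two-column (category, count) table.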
2020-10-16 29024, 2020
_lucifer
+1
2020-10-16 29045, 2020
alastairp
OK, so
2020-10-16 29050, 2020
alastairp
let's focus on the following charts:
2020-10-16 29043, 2020
alastairp
genre rosamerica, feature/genre, feature/year, key estimation, genre mood (at the end of mood)
2020-10-16 29037, 2020
_lucifer
awesome!
2020-10-16 29038, 2020
alastairp
on bono, you can `psql -U acousticbrainz acousticbrainz_big` (not inside docker)
2020-10-16 29004, 2020
alastairp
that has a full lowlevel table, and full `similarity.similarity` table
2020-10-16 29051, 2020
alastairp
I think we should create a new postgresql schema (call it `statistics`), and for each graph, make a new table in this schema that stores just the information that we need for that graph
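A minimal sketch of the "one `statistics` schema, one table per graph" idea, written as a Python helper that generates the PostgreSQL DDL. The table and column names below are hypothetical placeholders, not an agreed design. Because identifiers are interpolated directly into the SQL string, this should only be used with fixed, trusted names.

```python
from typing import Dict


def statistics_ddl(tables: Dict[str, Dict[str, str]]) -> str:
    """Build CREATE statements for a `statistics` schema with one table per chart.

    `tables` maps a (hypothetical) table name to an ordered {column: SQL type} dict.
    """
    statements = ["CREATE SCHEMA IF NOT EXISTS statistics;"]
    for table, columns in tables.items():
        cols = ",\n    ".join(f"{name} {sqltype}" for name, sqltype in columns.items())
        statements.append(f"CREATE TABLE IF NOT EXISTS statistics.{table} (\n    {cols}\n);")
    return "\n\n".join(statements)


# Hypothetical table for the genre rosamerica chart
ddl = statistics_ddl({
    "genre_rosamerica": {"genre": "TEXT NOT NULL", "recording_count": "INTEGER NOT NULL"},
})
print(ddl)
```

Adding a new graph later then means adding one more entry to the dict, which matches the plan to start with a few tables and extend them as more charts need more data.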
2020-10-16 29018, 2020
_lucifer
yeah that's a great place to start
2020-10-16 29027, 2020
alastairp
then we can see how easy it is to 1) get the data from similarity.similarity, or 2) get the data from lowlevel as the data comes in
2020-10-16 29011, 2020
_lucifer
okay, will we need to use Spark?
2020-10-16 29021, 2020
alastairp
I don't think so
2020-10-16 29038, 2020
alastairp
this isn't really analysis, it's just loading and transforming data
2020-10-16 29003, 2020
alastairp
if we wanted to use it, we'd have to load all of the necessary data into hdfs, which I suspect would be really annoying
2020-10-16 29016, 2020
_lucifer
okay, yeah right. the similarity table is smaller and we can probably process it directly
2020-10-16 29023, 2020
alastairp
that's what I'm hoping
2020-10-16 29046, 2020
_lucifer
we can use spark without hdfs but that's a thing to consider for afterwards
2020-10-16 29026, 2020
alastairp
oh? how would that work?
2020-10-16 29001, 2020
alastairp
in some cases it might make sense to use Spark for machine learning in AB, we should look into it as a future option
2020-10-16 29023, 2020
_lucifer
> Access data in HDFS, Alluxio, Apache Cassandra, Apache HBase, Apache Hive, and hundreds of other data sources.
2020-10-16 29032, 2020
_lucifer
Spark home page says this
2020-10-16 29005, 2020
_lucifer
I had also read an article about this but can't find it right now
PostgreSQL provides a JDBC driver that allows Spark to connect to it directly
2020-10-16 29032, 2020
BrainzGit
[mb-solr] yvanzo merged pull request #39 (master…SEARCH-611): SEARCH-628: 'primary-type-id' field is missing from JSON release group search results https://github.com/metabrainz/mb-solr/pull/39
I actually think this is an evil plot by pristine___ to get back at me
2020-10-16 29029, 2020
JoshDi
Hey quick question. I currently run a musicbrainz slave server via the docker image. Is there a way to turn off indexing completely so all local queries go directly to the database?
2020-10-16 29043, 2020
shivam-kapila
ruaok: save yourself
2020-10-16 29035, 2020
ruaok
K says: "Never gonna give listenbrainz up, never gonna let listenbrainz down, never gonna turn around and hurt listenbrainz!"
2020-10-16 29057, 2020
ruaok
yvanzo: ^^ see JoshDi's query
2020-10-16 29003, 2020
ruaok waves at JoshDi
2020-10-16 29007, 2020
JoshDi
Hey
2020-10-16 29002, 2020
JoshDi
I find that even with SIR tweaks, live indexing of the slave's daily updates takes about 12 hours to finish, while a full reindex takes 3 hours
2020-10-16 29024, 2020
shivam-kapila
ruaok: I introduced a lil of troi to ppl
2020-10-16 29036, 2020
shivam-kapila
And they were like damn. Dynamic playlists
2020-10-16 29037, 2020
JoshDi
I only use this server for some local processes, so it's not like my MusicBrainz server is very busy.
2020-10-16 29054, 2020
shivam-kapila
They felt really excited
2020-10-16 29045, 2020
JoshDi
Any ideas?
2020-10-16 29023, 2020
ruaok
I don't know, but yvanzo will. Hang tight for him to return and he'll sort you out. (he is around)