ferbncode: I accidentally closed my brainzutils pull request, can you reopen it for me? Still going back and making changes to the code as I work on critiquebrainz
D4RK-PH0ENiX joined the channel
D4RK-PH0ENiX has quit
D4RK-PH0ENiX joined the channel
amCap1712
CatQuest, KassOtsimine: the update is live
i found a bug just now. to test properly you need to log in first and then open collections, otherwise it doesn't work properly
i'll fix it in next release
disruptek has quit
disruptek joined the channel
Jay__ joined the channel
Jay__
hey all i have a problem using acousticbrainz can someone help me, i have an image that perfectly describes it https://imgur.com/kxBfaIE
Jay__ has quit
pristine__
ruaok: hey
ruaok
Hey! Greetings from Florence.
pristine__
Got few min?
reosarevok
zarcade_droid: done!
ruaok
I have no laptop on me. Just mobile. I should be available after 14h. But, try me.
Very interesting. Much faster, which is great. I recognize many more artists, which also seems good.
But Green Day, for instance, strays quite far from my tastes.
But, I need to get moving now. I can look again from the tram.
reosarevok
"Do you have the time to listen to me whine / About nothing and everything all at once"
I dunno, sounds like ruaok to me!
kori joined the channel
pristine__
ruaok: the first one is on a months data
And the second on a year's data
Ping me when you're here. We can discuss.
ruaok
Ok, that sounds good.
How did you calculate the candidate set?
pristine__
First of all, I fetched top 50 artists of a user in the given timeframe. Then made a list of these 50 artists plus artists similar to them using the json you provided. Then I fetched tracks of these similar artists which was the final candidate set.
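(For illustration only, the candidate-set construction described above could be sketched roughly like this; the input structures for top artists, the similar-artists JSON, and per-artist track lists are hypothetical:)

```python
def build_candidate_set(top_artists, similar, tracks_by_artist):
    """Sketch: top artists plus their similar artists, then all of
    those artists' tracks form the candidate set."""
    # Combine the user's top artists with artists similar to them.
    artists = set(top_artists)
    for artist in top_artists:
        artists.update(similar.get(artist, []))
    # The tracks of all those artists are the final candidate set.
    candidates = set()
    for artist in artists:
        candidates.update(tracks_by_artist.get(artist, []))
    return candidates
```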
ruaok
Ok, totally makes sense. It would be nice to see the candidate set as well. I think that is something we need to review independently of the recommendations, what do you think?
yvanzo
mo’’in’
pristine__
ruaok: Yes. Totally makes sense. An HTML?
I just had this thought while working: we will find the top x artists for users from their past week's history and recommend songs per day. For the next day, we will subtract the already recommended songs from the candidate set and then recommend again. If our set is exhausted in the middle of the week, we will find the next top y artists starting from x+1 and then repeat the procedure.
But the next top y artists can have similar artists from top x, so we need to keep track of that and avoid recommending same songs.
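(A minimal sketch of that daily-recommendation loop, assuming a hypothetical `refill` callback that fetches tracks for the next top y artists; the dedupe against everything already in play covers the overlap between the top-x and top-y similar artists:)

```python
def recommend_for_day(candidate_set, recommended, per_day, refill):
    """Pick per_day songs not yet recommended; if the candidate set
    is exhausted, pull tracks for the next top artists via refill(),
    skipping anything already recommended to avoid repeats."""
    remaining = [s for s in candidate_set if s not in recommended]
    if len(remaining) < per_day:
        # the next top-y artists may share similar artists with the
        # top-x ones, so filter out songs already seen
        extra = [s for s in refill()
                 if s not in recommended and s not in remaining]
        remaining.extend(extra)
    picks = remaining[:per_day]
    recommended.update(picks)
    return picks
```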
Also, is there a way that we can group artists according to genres. If we have such a table in MB db.
Also, I was thinking about three playlists: 1. Songs from favorite artists (songs only of the top x artists) 2. Songs from similar artists (songs only of the artists similar to top x) 3. New artists (songs from the whole set minus the candidate set, in order to promote artists)
I am spamming the channel with thoughts I had in past two days 😆
CatQuest
sorry amCap1712, I fell asleep. I'll check the application once I've had a shower/eaten breakfast etc
pristine__
Also, we can group artists according to nationality, in addition to artist credit.
Nyanko-sensei joined the channel
amCap1712
ok thanks CatQuest
D4RK-PH0ENiX has quit
ruaok
pristine__: yeah, HTML should work fine.
For grouping artists, we have genres, but the data is not well populated.
And all those thoughts about recommendations and keeping track of what has been recommended, that's great thinking. This is why I want a new schema inside the LB data.
To keep track of all that.
And yes, those three ideas are exactly what we can start working on when we have our underlying data sets ready.
I'm going to be working on a rudimentary msid <=> mbid mapping this week.
Mr_Monkey: what should the output of `<entity>/<bbid>/relationships` be?
I mean, which information should it return?
Mr_Monkey
akhilesh: An array of relationships, each containing: relationship type ('label' in the DB), direction, link phrase, other entity's type
I think that's the minimal information you need to reconstruct the relationship
Ah, and target entity bbid of course
The direction is whether the current entity is the source or target of the relationship
akhilesh
ok
Mr_Monkey
akhilesh: There are cases where the direction doesn't make sense (for example, Author A is married to Author B). Not sure what to do with those, possibly simply default to a 'forward' relationship
You won't need source and target, considering one or the other is the current entity's bbid. So you'll only have 'target', and depending on the position of the current entity (in source_bbid or target_bbid), the direction is 'forward' or 'backward'.
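(Putting Mr_Monkey's description together, one relationship entry could be shaped roughly like this; the row and payload field names are illustrative, not the actual BookBrainz schema:)

```python
def relationship_payload(rel, current_bbid):
    """Shape one relationship for <entity>/<bbid>/relationships.
    Direction depends on whether the current entity sits in
    source_bbid or target_bbid; the other side becomes 'target'."""
    if rel["source_bbid"] == current_bbid:
        direction, target = "forward", rel["target_bbid"]
    else:
        direction, target = "backward", rel["source_bbid"]
    return {
        "relationshipType": {"label": rel["label"], "id": rel["type_id"]},
        "direction": direction,
        "linkPhrase": rel["link_phrase"],
        "targetBbid": target,
    }
```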
akhilesh
ok
Mr_Monkey
I would opt for `relationshipType: {label: X, id: Y}`
ruaok: i opened a pr for exception catching in stats and the mlhd pr is ready for review
ruaok
great!
I can start looking at those later today. if I can get used to being in a city again. :)
iliekcomputers
what do you want to do with the spark-writer PR?
ruaok
maybe just close it for now?
I'm still stuck on what to do there. the whole big data cluster is frustrating to me.
it's a chicken/egg problem. we won't know how many resources we need until we run stuff, but we need to plan before we write code.
iliekcomputers
the spark-writer thing really seems like a problem incremental dumps could solve.
ruaok
and we have two usage cases: recommendations and user stats.
iliekcomputers
wake up the cluster, download the dumps needed, import and run stats
ruaok
YES!
that is a great insight!
let's do that.
iliekcomputers
so how exactly would incremental dumps work? should we just start a series independent of the current full dumps? 1 (one big dump), then 2, 3, 4 and so on as smaller ones
ruaok
ideally they would be similar/identical in structure to the full dumps.
if you start with a full dump and apply all the partial dumps between full dumps, you should end up with exactly the same data as the next full dump.
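(An illustrative sketch of that invariant, modeling dump rows as a dict keyed by id; this is not the actual dump format, just the "full dump plus all partials equals the next full dump" property:)

```python
def apply_partials(full_dump, partial_dumps):
    """Apply partial dumps in order on top of a full dump.
    Partials carry rows 'as we receive them'; a later partial
    overwrites an earlier row with the same id (last write wins)."""
    state = dict(full_dump)
    for partial in partial_dumps:
        state.update(partial)
    return state
```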
which means that we are dumping data "as we receive it", not in time sequence.
not sure that answers your question.
iliekcomputers
we started storing influx insert timestamps a long time ago, so that hopefully won't be a problem.
ruaok
indeed.
iliekcomputers
so i guess the series would be 1 (full), 2, 3, 4, 5, 6 (full again maybe), 7, 8, 9 and so on?
not sure what i'm saying.
ruaok
if you consider the partial dumps as marking the progress of time, then at periodic points, we also emit a full dump.