ferbncode: I accidentally closed my brainzutils pull request, can you reopen it for me? Still going back and making changes to the code as I work on critiquebrainz
D4RK-PH0ENiX joined the channel
D4RK-PH0ENiX has quit
D4RK-PH0ENiX joined the channel
amCap1712
CatQuest, KassOtsimine: the update is live
i found a bug just now. to test properly you need to log in first and then open collections, otherwise it doesn't work properly
i'll fix it in next release
disruptek has quit
disruptek joined the channel
Jay__ joined the channel
Jay__
hey all i have a problem using acousticbrainz can someone help me, i have an image that perfectly describes it https://imgur.com/kxBfaIE
Jay__ has quit
pristine__
ruaok: hey
ruaok
Hey! Greetings from Florence.
pristine__
Got few min?
reosarevok
zarcade_droid: done!
ruaok
I have no laptop on me. Just mobile. I should be available after 14h. But, try me.
Very interesting. Much faster, which is great. I recognize many more artists, which also seems good.
But Green Day, for instance, strays quite far from my tastes.
But, I need to get moving now. I can look again from the tram.
reosarevok
"Do you have the time to listen to me whine / About nothing and everything all at once"
I dunno, sounds like ruaok to me!
kori joined the channel
pristine__
ruaok: the first one is on a months data
And the second on a year's data
Ping me when you're here. We can discuss.
ruaok
Ok, that sounds good.
How did you calculate the candidate set?
pristine__
First of all, I fetched top 50 artists of a user in the given timeframe. Then made a list of these 50 artists plus artists similar to them using the json you provided. Then I fetched tracks of these similar artists which was the final candidate set.
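(For illustration only, the candidate-set construction described above could be sketched roughly like this; the input structures for top artists, the similar-artists JSON, and per-artist track lists are hypothetical:)

```python
def build_candidate_set(top_artists, similar, tracks_by_artist):
    """Sketch: top artists plus their similar artists, then all of
    those artists' tracks form the candidate set."""
    # Combine the user's top artists with artists similar to them.
    artists = set(top_artists)
    for artist in top_artists:
        artists.update(similar.get(artist, []))
    # The tracks of all those artists are the final candidate set.
    candidates = set()
    for artist in artists:
        candidates.update(tracks_by_artist.get(artist, []))
    return candidates
```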
ruaok
Ok, totally makes sense. It would be nice to see the candidate set as well. I think that is something we need to review independently of the recommendations, what do you think?
yvanzo
mo’’in’
pristine__
ruaok: Yes. Totally makes sense. An HTML?
I just had this thought while working: we will find the top x artists for users from their past week's history and recommend songs per day. For the next day, we will subtract the already recommended songs from the candidate set and then recommend again. If our set is exhausted in the middle of the week, we will find the next top y artists starting from x+1 and then repeat the procedure.
But the next top y artists can have similar artists from top x, so we need to keep track of that and avoid recommending same songs.
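(A minimal sketch of that daily-recommendation loop, assuming a hypothetical `refill` callback that fetches tracks for the next top y artists; the dedupe against everything already in play covers the overlap between the top-x and top-y similar artists:)

```python
def recommend_for_day(candidate_set, recommended, per_day, refill):
    """Pick per_day songs not yet recommended; if the candidate set
    is exhausted, pull tracks for the next top artists via refill(),
    skipping anything already recommended to avoid repeats."""
    remaining = [s for s in candidate_set if s not in recommended]
    if len(remaining) < per_day:
        # the next top-y artists may share similar artists with the
        # top-x ones, so filter out songs already seen
        extra = [s for s in refill()
                 if s not in recommended and s not in remaining]
        remaining.extend(extra)
    picks = remaining[:per_day]
    recommended.update(picks)
    return picks
```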
Also, is there a way that we can group artists according to genres. If we have such a table in MB db.
Also, I was thinking about three playlists: 1. Songs from favorite artists (songs only of the top x artists) 2. Songs from similar artists (songs only of the artists similar to top x) 3. New artists (songs from the whole set minus the candidate set, in order to promote artists)
I am spamming the channel with thoughts I had in past two days 😆
CatQuest
sorry amCap1712, I fell asleep. I'll check the application once I've had a shower/eaten breakfast etc
pristine__
Also, we can group artists according to nationality, in addition to artist credit.
Nyanko-sensei joined the channel
amCap1712
ok thanks CatQuest
D4RK-PH0ENiX has quit
ruaok
pristine__: yeah, HTML should work fine.
For grouping artists, we have genres, but the data is not well populated.
And all those thoughts about recommendations and keeping track of what has been recommended, that's great thinking. This is why I want a new schema inside the LB data.
To keep track of all that.
And yes, those three ideas are exactly what we can start working on when we have our underlying data sets ready.
I'm going to be working on a rudimentary msid <=> mbid mapping this week.
Mr_Monkey: what should the output of `<entity>/<bbid>/relationships` be?
I mean, which information should it return?
Mr_Monkey
akhilesh: An array of relationships, each containing: relationship type ('label' in the DB), direction, link phrase, other entity's type
I think that's the minimal information you need to reconstruct the relationship
Ah, and target entity bbid of course
The direction is whether the current entity is the source or target of the relationship
akhilesh
ok
Mr_Monkey
akhilesh: There are cases where the direction doesn't make sense (for example, Author A is married to Author B). Not sure what to do with those, possibly simply default to a 'forward' relationship
You won't need source and target, considering one or the other is the current entity's bbid. So you'll only have 'target', and depending on the position of the current entity (in source_bbid or target_bbid), the direction is 'forward' or 'backward'.
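(Putting Mr_Monkey's description together, one relationship entry could be shaped roughly like this; the row and payload field names are illustrative, not the actual BookBrainz schema:)

```python
def relationship_payload(rel, current_bbid):
    """Shape one relationship for <entity>/<bbid>/relationships.
    Direction depends on whether the current entity sits in
    source_bbid or target_bbid; the other side becomes 'target'."""
    if rel["source_bbid"] == current_bbid:
        direction, target = "forward", rel["target_bbid"]
    else:
        direction, target = "backward", rel["source_bbid"]
    return {
        "relationshipType": {"label": rel["label"], "id": rel["type_id"]},
        "direction": direction,
        "linkPhrase": rel["link_phrase"],
        "targetBbid": target,
    }
```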
akhilesh
ok
Mr_Monkey
I would opt for `relationshipType: {label: X, id: Y}`
ruaok: i opened a pr for exception catching in stats and the mlhd pr is ready for review
ruaok
great!
I can start looking at those later today. if I can get used to being in a city again. :)
iliekcomputers
what do you want to do with the spark-writer PR?
ruaok
maybe just close it for now?
I'm still stuck on what to do there. the whole big data cluster is frustrating to me.
it's a chicken/egg problem. we won't know how many resources we need until we run stuff, but we need to plan before we write code.
iliekcomputers
the spark-writer thing really seems like a problem incremental dumps could solve.
ruaok
and we have two usage cases: recommendations and user stats.
iliekcomputers
wake up the cluster, download the dumps needed, import and run stats
ruaok
YES!
that is a great insight!
let's do that.
iliekcomputers
so how exactly would incremental dumps work? should we just start a series independent of the current full dumps? 1 (one big dump), then 2, 3, 4 and so on as smaller ones
ruaok
ideally they would be similar/identical in structure to the full dumps.
if you start with a full dump and apply all the partial dumps between full dumps, you should end up with exactly the same data as the next full dump.
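(An illustrative sketch of that invariant, modeling dump rows as a dict keyed by id; this is not the actual dump format, just the "full dump plus all partials equals the next full dump" property:)

```python
def apply_partials(full_dump, partial_dumps):
    """Apply partial dumps in order on top of a full dump.
    Partials carry rows 'as we receive them'; a later partial
    overwrites an earlier row with the same id (last write wins)."""
    state = dict(full_dump)
    for partial in partial_dumps:
        state.update(partial)
    return state
```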
which means that we are dumping data "as we receive it", not in time sequence.
not sure that answers your question.
iliekcomputers
we started storing influx insert timestamps a long time ago, so that hopefully won't be a problem.
ruaok
indeed.
iliekcomputers
so i guess the series would be 1 (full), 2, 3, 4, 5, 6 (full again maybe), 7, 8, 9 and so on?
not sure what i'm saying.
ruaok
if you consider the partial dumps as marking the progress of time, then at periodic points, we also emit a full dump.