Update schemas to latest format #803

valeriocos · 2020-03-12T18:14:53Z

ELK keeps a description of the enriched data used to build the Kibiter dashboards. These descriptions are stored in the schema folder as CSV files. Over time, they have evolved, and the current format is a list of attributes that includes the name, the type, whether the field can be aggregated, and a description (e.g., https://github.com/chaoss/grimoirelab-elk/blob/master/schema/git.csv). Nevertheless, some schemas are still not aligned with the latest format. For instance, this is the case for:

The goal of this issue is to update the schemas to the latest format. In order to do so, given a data source (e.g., meetup, stackoverflow), micro-mordred[*] should be executed to collect and enrich the data. Then, the enriched documents should be inspected using the dev tools or the discover of Kibiter. For each attribute found in the enriched index, the corresponding schema should contain the name of the attribute, the type, whether the field can be aggregated and a description.

Note that some fields like the grimoire_creation_date, project, project_1, origin, etc. are shared across all enriched indexes and their descriptions can be taken from existing schemas.

[*] Details to execute micro-mordred for a given data source are available at: https://github.com/chaoss/grimoirelab-sirmordred#supported-data-sources


vchrombie · 2020-03-13T12:57:08Z

Hi @valeriocos
I was trying to work on this issue.

I started with the askbot. In the process, I faced a few issues. I think there is a mistake in the askbot configurations.

I think there is a typo with askbot_enrcihed in the setup.cfg
Also, it seems that https://ask.puppet.com/ is no longer active. I searched for some more askbot sites, and I got this https://ask.sagemath.org/questions/. Is it required to change it?

EDIT 1: https://ask.sagemath.org/questions/ doesn't seem to be the right endpoint, but https://ask.sagemath.org/ works fine.

I checked it manually because I was receiving a 404 error:
https://ask.sagemath.org/api/v1/questions/?page=1&sort=activity-asc

EDIT 2: Here is a list of askbot sites. You can choose which would be fine for the example.

vchrombie · 2020-03-15T13:45:13Z

Hi @valeriocos

I think there is a typo with askbot_enrcihed in the setup.cfg
Also, it seems that https://ask.puppet.com/ is no longer active. I searched for some more askbot sites, and I got this https://ask.sagemath.org/questions/. Is it required to change it?

I changed it and was able to run the script, but it is taking an unusually long time. I will try to find out what the issue could be and update you.

valeriocos · 2020-03-15T14:16:57Z

Sorry for the late reply @vchrombie, I thought I had answered this message

> I started with the askbot. In the process, I faced a few issues. I think there is a mistake in the askbot configurations.
>
> I think there is a typo with askbot_enrcihed in the setup.cfg
> Also, it seems that https://ask.puppet.com/ is no longer active. I searched for some more askbot sites, and I got this https://ask.sagemath.org/questions/. Is it required to change it?

Please fix the mistake. WRT the askbot server, there is no specific site to target. You can try with https://askbot.org (in the past we were mining it, I have just tried with perceval* and it seems to work fine)

[*] perceval askbot https://askbot.org --no-archive

> EDIT 1: https://ask.sagemath.org/questions/ doesn't seem to be a right endpoint but https://ask.sagemath.org/ works fine.

Yes, sorry, the URL should be the main one (questions is added automatically here: https://github.com/chaoss/grimoirelab-perceval/blob/master/perceval/backends/core/askbot.py#L268)
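To illustrate why the base URL matters, here is a minimal sketch. It assumes the client simply appends the API path to whatever base URL it receives; `build_questions_url` is a hypothetical helper, not Perceval's actual code, and the path and query string are copied from the URL quoted earlier in this thread.

```python
# Hypothetical helper, not Perceval's implementation: it only mimics the idea
# that the questions path is appended to the base URL automatically.
API_QUESTIONS_PATH = "api/v1/questions/"


def build_questions_url(base_url, page=1):
    """Append the API path and query string to a base Askbot URL."""
    base = base_url.rstrip("/")
    return f"{base}/{API_QUESTIONS_PATH}?page={page}&sort=activity-asc"


# Passing the main site URL gives the endpoint that was checked manually.
good = build_questions_url("https://ask.sagemath.org/")

# Passing a URL that already ends in /questions/ duplicates the path segment,
# which would explain a 404 under the assumption above.
bad = build_questions_url("https://ask.sagemath.org/questions/")

print(good)
print(bad)
```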

vchrombie · 2020-03-15T15:13:00Z

> Sorry for the late reply @vchrombie, I thought I had answered this message

No problem. 🙂

> Please fix the mistake.

Sure, I will do it tonight.

> WRT the askbot server, there is no specific site to target. You can try with https://askbot.org (in the past we were mining it, I have just tried with perceval* and it seems to work fine)
>
> [*] perceval askbot https://askbot.org --no-archive

Oh okay, I will try and get back to you.

> Yes, sorry the URL should be the main one (questions is added automatically here: https://github.com/chaoss/grimoirelab-perceval/blob/master/perceval/backends/core/askbot.py#L268)

Thanks for the reply @valeriocos.

vchrombie · 2020-03-20T19:55:42Z

Hi @valeriocos
Thanks for your earlier reply. It solved a few issues.

Just a quick update. I have executed the micro-mordred for the askbot backend.

It took a long time, but that is fine. After a while, the index was created and I could inspect it using Kibiter.

I ran GET /askbot/_mapping in the dev tools and got the fields along with their mappings; there were 51 in total.
I also checked the index in Management >> Kibana >> Index Patterns, and it has 56 fields. I assume we need to ignore the first 5 fields. Please correct me if I am wrong.

EDIT: I have opened the PR for the same. It seems that the fields are updated. I have pushed a commit regarding it, 2904067

I will complete the PR soon. 🙂
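One likely explanation for the 51 versus 56 field counts mentioned above, offered as an assumption rather than something confirmed in this thread: Kibana index patterns list five meta fields by default (_id, _index, _score, _source, _type) in addition to the mapped fields. The sketch below uses a small made-up mapping, not the real askbot one, to show the arithmetic.

```python
# Default Kibana meta fields; assumed here to apply to Kibiter as well.
META_FIELDS = ["_id", "_index", "_score", "_source", "_type"]


def flatten_properties(properties, prefix=""):
    """Return dotted field names from the 'properties' part of a mapping."""
    names = []
    for name, spec in properties.items():
        full_name = prefix + name
        if "properties" in spec:
            # Object field: descend into its sub-fields.
            names.extend(flatten_properties(spec["properties"], full_name + "."))
        else:
            names.append(full_name)
    return names


# Made-up sample standing in for the 'properties' section of a mapping response.
sample_properties = {
    "author_name": {"type": "keyword"},
    "title": {"type": "text"},
    "data": {"properties": {"score": {"type": "long"}}},
}

mapped = flatten_properties(sample_properties)
pattern_fields = sorted(mapped + META_FIELDS)

# The index pattern shows the mapped fields plus the five meta fields.
print(len(mapped), len(pattern_fields))
```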

vchrombie · 2020-03-25T18:37:39Z

Hi @valeriocos

When I was working on the askbot schema, I faced a small issue during the enrichment phase. Here is the log, askbot-log.

  2020-03-21 00:48:59,531 Error enriching raw from askbot (https://askbot.org/): 'username'
Traceback (most recent call last):
  File "/home/p0tt3r/chaoss/sources/grimoirelab-elk/grimoire_elk/elk.py", line 533, in enrich_backend
    enrich_count = enrich_items(ocean_backend, enrich_backend)
  File "/home/p0tt3r/chaoss/sources/grimoirelab-elk/grimoire_elk/elk.py", line 321, in enrich_items
    total = enrich_backend.enrich_items(ocean_backend)
  File "/home/p0tt3r/chaoss/sources/grimoirelab-elk/grimoire_elk/enriched/askbot.py", line 329, in enrich_items
    (answers, comments) = self.get_rich_item_answers_comments(item)
  File "/home/p0tt3r/chaoss/sources/grimoirelab-elk/grimoire_elk/enriched/askbot.py", line 307, in get_rich_item_answers_comments
    eanswer = self.get_rich_answer(item, answer)
  File "/home/p0tt3r/chaoss/sources/grimoirelab-elk/grimoire_elk/enriched/askbot.py", line 267, in get_rich_answer
    eanswer['author_askbot_user_name'] = answer['answered_by']['username']
KeyError: 'username'

There was no trouble with the enrichment. I didn't understand what the problem could be, so I thought of asking here.

valeriocos · 2020-03-25T19:08:54Z

Hi @vchrombie,

This kind of issue is generally related to a user who removed their account. In this case, the enricher assumes that the username is always there. A possible solution is to use the get method as follows: answer['answered_by'].get('username'). However, this may require patching other parts of the code.

Waiting for a patch to fix this bug :)
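A minimal sketch of the suggested get-based access, run on made-up answer dictionaries. The helper name `author_user_name` is hypothetical and this is not the actual patch to askbot.py; it only shows how the KeyError from the traceback can be avoided.

```python
def author_user_name(answer):
    """Return the answer author's username, or None when it is missing."""
    # 'answered_by' may lack 'username' (e.g. a removed account), so use
    # .get() instead of indexing to avoid the KeyError from the traceback.
    answered_by = answer.get("answered_by") or {}
    return answered_by.get("username")


# Made-up items: one with a username, one without.
with_user = {"answered_by": {"id": 1, "username": "alice"}}
without_user = {"answered_by": {"id": 2}}

print(author_user_name(with_user))
print(author_user_name(without_user))
```

With this approach the enriched field ends up as None for such answers, so any code that later reads it has to tolerate a missing value.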

vchrombie · 2020-03-25T20:10:56Z

Hi @valeriocos.

> This kind of issue is generally related to a user who removed their account. In this case, the enricher assumes that the username is always there. A possible solution is to use the get method as follows: answer['answered_by'].get('username').

Thanks for the clarification.

> However, this may require patching other parts of the code.

By other parts, do you mean in elk.py or just in askbot.py?

> Waiting for a patch to fix this bug :)

Can I work on this, if that is fine with you?

valeriocos · 2020-03-26T07:30:27Z

> Thanks for the clarification.

You're welcome!

> Other parts you mean, in elk.py or just askbot.py?

Just askbot.py

> Can I work on this, if you don't have any problem?

Sure, please start when you have time

Thanks!

vchrombie · 2020-04-08T16:50:41Z

#803 (comment)

The updated list is

vchrombie · 2021-03-22T18:09:32Z

Hi @rohanreddych

> I tried to run grimoirelab locally using docker

The docker image is quite outdated and hasn't been updated for a long time, so it might not have the latest changes to that enricher. It would be great if you could try the docker-compose method. It is almost the same as the docker method, except that it uses the latest releases. It would be even better to use the developer setup for GrimoireLab.

> But stackoverflow data is not being collected and shown. Only git and github data is being shown.

One reason could be the time. It looks like there are many sources, so it might take 10-15 minutes for the data to appear on the dashboards. Otherwise, it could be an issue with the outdated image or a typo in the configuration.

> there is no field called answer_status which is the first field in https://github.com/chaoss/grimoirelab-elk/blob/master/schema/stackoverflow.csv

The fields might be deprecated now, so the schema should be updated as well.

vchrombie · 2021-03-22T18:17:25Z

For people who are interested in working on this issue:

You can execute micro-mordred to collect and enrich the data of a particular data source. You can inspect the enriched documents using the dev tools or the discover of Kibiter. For each attribute found in the enriched index, the corresponding schema should contain the name of the attribute, the type, whether the field can be aggregated, and a description.
More Information.

You can use this script for automating the process and creating the schema file from the index.
https://gist.github.com/vchrombie/bf6a682edcf47624126317897e58679c
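The gist itself is not reproduced here. As an independent, minimal sketch of the idea, the code below turns the "properties" part of an index mapping into schema rows. The column names follow the attributes listed in this issue (name, type, aggregatable, description), the aggregatable rule is an assumed heuristic (text fields are treated as not aggregatable), and descriptions are left empty to be filled in by hand. The sample mapping is made up.

```python
import csv
import io

# Assumed heuristic: these Elasticsearch field types can be aggregated,
# while analyzed 'text' fields cannot (unless fielddata is enabled).
AGGREGATABLE_TYPES = {
    "keyword", "long", "integer", "short", "byte", "float", "double",
    "half_float", "scaled_float", "date", "boolean", "ip",
}


def schema_csv(properties):
    """Build schema CSV text from the 'properties' section of a mapping."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "type", "aggregatable", "description"])
    for name in sorted(properties):
        # Object fields carry no 'type' key, so fall back to 'object'.
        field_type = properties[name].get("type", "object")
        aggregatable = str(field_type in AGGREGATABLE_TYPES).lower()
        # The description is intentionally blank; it has to be written manually.
        writer.writerow([name, field_type, aggregatable, ""])
    return buffer.getvalue()


# Made-up sample standing in for a real enriched index mapping.
sample_properties = {
    "title": {"type": "text"},
    "author_name": {"type": "keyword"},
}

schema_text = schema_csv(sample_properties)
print(schema_text)
```

Shared fields such as grimoire_creation_date or origin could then have their descriptions copied from an existing schema, as noted at the top of this issue.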

vchrombie · 2021-10-05T17:49:01Z

Closing this issue in favour of #1010

valeriocos added the good first issue label Mar 12, 2020

This was referenced Mar 20, 2020

[schema] Update askbot.csv #812

Merged

[schema] Update dockerhub.csv #814

Merged

valeriocos mentioned this issue Apr 8, 2020

Update bugzilla csv to latest format #836

Closed

vchrombie mentioned this issue Apr 11, 2020

Perceval git backend error "item['data']['Author']" #841

Closed

rohanreddych mentioned this issue Mar 23, 2021

[schema] Update stackoverflow.csv #966

Open

vchrombie mentioned this issue Oct 5, 2021

Update schemas to the latest format #1010

Open

vchrombie closed this as completed Oct 5, 2021
