Update schemas to latest format #803

Closed
valeriocos opened this issue Mar 12, 2020 · 13 comments
Labels
good first issue Good issue for first-time contributors

Comments

@valeriocos
Member

valeriocos commented Mar 12, 2020

ELK keeps a description of the enriched data used to build the Kibiter dashboards. These descriptions are stored in the schema folder as CSV files. Over time, the descriptions have evolved, and the current format is a list of attributes, each with a name, a type, whether the field can be aggregated, and a description (e.g., https://github.com/chaoss/grimoirelab-elk/blob/master/schema/git.csv). Nevertheless, some schemas are still not aligned with the latest format. For instance, this is the case for:

The goal of this issue is to update the schemas to the latest format. To do so, given a data source (e.g., meetup, stackoverflow), micro-mordred[*] should be executed to collect and enrich the data. Then, the enriched documents should be inspected using the Dev Tools or the Discover view of Kibiter. For each attribute found in the enriched index, the corresponding schema should contain the name of the attribute, its type, whether the field can be aggregated, and a description.

Note that some fields like the grimoire_creation_date, project, project_1, origin, etc. are shared across all enriched indexes and their descriptions can be taken from existing schemas.

[*] Details to execute micro-mordred for a given data source are available at: https://github.com/chaoss/grimoirelab-sirmordred#supported-data-sources
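As a rough illustration of the target format, here is a minimal Python sketch that writes one schema row per field and reuses descriptions for shared fields. The column header, the `build_schema_csv` helper, and the sample descriptions are assumptions made for this example; real descriptions should be copied from an existing schema such as git.csv.

```python
import csv
import io

# Assumed column order, based on the four attributes listed above.
COLUMNS = ["name", "type", "aggregatable", "description"]

# Hypothetical descriptions for fields shared across enriched indexes;
# in practice these would be copied from an existing schema file.
SHARED_DESCRIPTIONS = {
    "grimoire_creation_date": "Creation date of the item.",
    "origin": "Original URL where the item was retrieved from.",
}


def build_schema_csv(fields, descriptions):
    """Return CSV text with one row per field of an enriched index.

    fields: dict mapping field name -> (type, aggregatable)
    descriptions: dict mapping field name -> description
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(COLUMNS)
    for name in sorted(fields):
        field_type, aggregatable = fields[name]
        # Leave the description empty when unknown so gaps are easy to spot.
        description = descriptions.get(name, "")
        writer.writerow([name, field_type, str(aggregatable).lower(), description])
    return out.getvalue()


# Made-up fields, only to exercise the helper.
example_fields = {
    "origin": ("keyword", True),
    "grimoire_creation_date": ("date", True),
    "title_analyzed": ("text", False),
}
schema_csv = build_schema_csv(example_fields, SHARED_DESCRIPTIONS)
print(schema_csv)
```

Rows with an empty last column are the ones that still need a hand-written description.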

@valeriocos valeriocos added the good first issue Good issue for first-time contributors label Mar 12, 2020
@vchrombie
Member

vchrombie commented Mar 13, 2020

Hi @valeriocos
I was trying to work on this issue.

I started with the askbot. In the process, I faced a few issues. I think there is a mistake in the askbot configurations.

I think there is a typo with askbot_enrcihed in the setup.cfg
Also, it seems that https://ask.puppet.com/ is no longer active. I searched for some more askbot sites, and I got this https://ask.sagemath.org/questions/. Is it required to change it?

EDIT 1: https://ask.sagemath.org/questions/ doesn't seem to be the right endpoint, but https://ask.sagemath.org/ works fine.

I checked manually because I was receiving a 404 error.
https://ask.sagemath.org/api/v1/questions/?page=1&sort=activity-asc

EDIT 2: Here is a list of askbot sites. You can choose which would be fine for the example.

@vchrombie
Member

Hi @valeriocos

I think there is a typo with askbot_enrcihed in the setup.cfg
Also, it seems that https://ask.puppet.com/ is no longer active. I searched for some more askbot sites, and I got this https://ask.sagemath.org/questions/. Is it required to change it?

I changed it and was able to run the script, but it is taking an unusually long time. I will try to find out what the issue could be and update you.

@valeriocos
Member Author

valeriocos commented Mar 15, 2020

Sorry for the late reply @vchrombie , I thought I had answered this message

I started with the askbot. In the process, I faced a few issues. I think there is a mistake in the askbot configurations.

I think there is a typo with askbot_enrcihed in the setup.cfg
Also, it seems that https://ask.puppet.com/ is no longer active. I searched for some more askbot sites, and I got this https://ask.sagemath.org/questions/. Is it required to change it?

Please fix the mistake. WRT the askbot server, there is no specific site to target. You can try with https://askbot.org (in the past we were mining it, I have just tried with perceval* and it seems to work fine)

[*] perceval askbot https://askbot.org --no-archive

EDIT 1: https://ask.sagemath.org/questions/ doesn't seem to be the right endpoint, but https://ask.sagemath.org/ works fine.

Yes, sorry the URL should be the main one (questions is added automatically here: https://github.com/chaoss/grimoirelab-perceval/blob/master/perceval/backends/core/askbot.py#L268)

@vchrombie
Member

Sorry for the late reply @vchrombie , I thought I had answered this message

No problem. 🙂

Please fix the mistake.

Sure, I will do it by night.

WRT the askbot server, there is no specific site to target. You can try with https://askbot.org (in the past we were mining it, I have just tried with perceval* and it seems to work fine)

[*] perceval askbot https://askbot.org --no-archive

Oh okay, I will try and get back to you.

Yes, sorry the URL should be the main one (questions is added automatically here: https://github.com/chaoss/grimoirelab-perceval/blob/master/perceval/backends/core/askbot.py#L268)

Thanks for the reply @valeriocos.

@vchrombie
Member

vchrombie commented Mar 20, 2020

Hi @valeriocos
Thanks for your earlier reply. It solved a few issues.

Just a quick update. I have executed the micro-mordred for the askbot backend.

image

It takes a long time, but I am fine with that. After some time, the index was created and I could inspect it using Kibiter.

I tried this GET /askbot/_mapping in the dev tools and I got the fields along with the mappings. The total number was 51.
I checked the index in the Management >> Kibana >> Index Patterns and it has 56 fields. I assume we need to ignore the first 5 fields. Correct me if I am wrong?

image
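One plausible explanation for the difference of five, not confirmed in this thread, is that Kibana index patterns also list meta fields such as `_id`, `_index`, `_score`, `_source` and `_type`, which are not part of the mapping. The Python sketch below flattens the `properties` section of a mapping into field names and shows how the two counts relate. The `flatten_properties` helper and the sample mapping are made up for illustration; they are not taken from the askbot index.

```python
# Meta fields that Kibana lists in index patterns by default (assumed here
# to be the five extra fields; the thread does not confirm this).
KIBANA_META_FIELDS = {"_id", "_index", "_score", "_source", "_type"}


def flatten_properties(properties, prefix=""):
    """Flatten the 'properties' part of a mapping into {field_name: type}."""
    fields = {}
    for name, spec in properties.items():
        full_name = prefix + name
        if "properties" in spec:
            # Object field: recurse and join nested names with dots.
            fields.update(flatten_properties(spec["properties"], full_name + "."))
        else:
            fields[full_name] = spec.get("type", "object")
    return fields


# Made-up fragment shaped like the 'properties' section of a mapping response.
sample_properties = {
    "author_askbot_user_name": {"type": "keyword"},
    "grimoire_creation_date": {"type": "date"},
    "title_analyzed": {"type": "text"},
    "geolocation": {
        "properties": {"lat": {"type": "float"}, "lon": {"type": "float"}}
    },
}

mapped = flatten_properties(sample_properties)
pattern_fields = set(mapped) | KIBANA_META_FIELDS
print(len(mapped), len(pattern_fields))
```

If that explanation holds, only the fields that come from the mapping need an entry in the schema.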

EDIT: I have opened the PR for the same. It seems that the fields are updated. I have pushed a commit regarding it, 2904067

I will complete the PR soon. 🙂

This was referenced Mar 20, 2020
@vchrombie
Member

Hi @valeriocos

When I was working on the askbot schema, I faced a small issue during the enrichment phase. Here is the log, askbot-log.

2020-03-21 00:48:59,531 Error enriching raw from askbot (https://askbot.org/): 'username'
Traceback (most recent call last):
  File "/home/p0tt3r/chaoss/sources/grimoirelab-elk/grimoire_elk/elk.py", line 533, in enrich_backend
    enrich_count = enrich_items(ocean_backend, enrich_backend)
  File "/home/p0tt3r/chaoss/sources/grimoirelab-elk/grimoire_elk/elk.py", line 321, in enrich_items
    total = enrich_backend.enrich_items(ocean_backend)
  File "/home/p0tt3r/chaoss/sources/grimoirelab-elk/grimoire_elk/enriched/askbot.py", line 329, in enrich_items
    (answers, comments) = self.get_rich_item_answers_comments(item)
  File "/home/p0tt3r/chaoss/sources/grimoirelab-elk/grimoire_elk/enriched/askbot.py", line 307, in get_rich_item_answers_comments
    eanswer = self.get_rich_answer(item, answer)
  File "/home/p0tt3r/chaoss/sources/grimoirelab-elk/grimoire_elk/enriched/askbot.py", line 267, in get_rich_answer
    eanswer['author_askbot_user_name'] = answer['answered_by']['username']
KeyError: 'username'

There was no trouble with the enrichment. I didn't understand what the problem could be, so I thought of asking here.

@valeriocos
Member Author

Hi @vchrombie,

This kind of issue is generally related to a user who removed their account. In this case, the enricher assumes that the username is always there. A possible solution is to use the get method as follows: answer['answered_by'].get('username'). However, this may require patching other parts of the code.

Waiting for a patch to fix this bug :)
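A minimal Python sketch of the difference, using hypothetical answer dictionaries shaped like the one in the traceback. The `author_user_name` helper is illustrative only and is not the actual patch to askbot.py.

```python
# Hypothetical answers, shaped like the structure seen in the traceback.
answer_with_user = {"answered_by": {"username": "alice"}}
answer_deleted_user = {"answered_by": {}}


def author_user_name(answer):
    """Return the answerer's username, or None when it is missing."""
    # 'or {}' also covers the case where 'answered_by' is present but None.
    answered_by = answer.get("answered_by") or {}
    return answered_by.get("username")


print(author_user_name(answer_with_user))
print(author_user_name(answer_deleted_user))

# Direct indexing, as in the current enricher, raises for the second answer.
try:
    answer_deleted_user["answered_by"]["username"]
except KeyError as error:
    print("KeyError:", error)
```

Any code that later reads the enriched username would then need to accept None, which is why other parts of the enricher may need changes too.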

@vchrombie
Member

Hi @valeriocos.

This kind of issue is generally related to a user who removed their account. In this case, the enricher assumes that the username is always there. A possible solution is to use the get method as follows: answer['answered_by'].get('username').

Thanks for the clarification.

However, this may require patching other parts of the code.

Other parts you mean, in elk.py or just askbot.py?

Waiting for a patch to fix this bug :)

Can I work on this, if you don't have any problem?

@valeriocos
Member Author

Thanks for the clarification.

You're welcome!

Other parts you mean, in elk.py or just askbot.py?

Just askbot.py

Can I work on this, if you don't have any problem?

Sure, please start when you have time

Thanks!

@vchrombie
Member

Hi @rohanreddych

I tried to run grimoirelab locally using docker

The Docker image is quite outdated and hasn't been updated in a long time. It might not have the latest changes to that enricher. It would be great if you could try the docker-compose method. It is almost the same as the Docker method, except that it uses the latest releases. It would be even better if you used the developer setup for GrimoireLab.

But stackoverflow data is not being collected and shown. Only git and github data is being shown.

One reason could be the time. It looks like there are many sources, so it might take 10-15 minutes for the data to appear on the dashboards. Otherwise, it could be an issue with the outdated image or a typo in the configuration.

there is no field called answer_status which is the first field in https://github.com/chaoss/grimoirelab-elk/blob/master/schema/stackoverflow.csv

The fields might be deprecated now, so the schema should be updated as well.
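One way to spot such fields is to compare the names in a schema file with the names found in the enriched index. The Python sketch below does this with a set difference. It assumes the schema CSV has a header row with a `name` column; the `compare_schema` helper, the sample schema text, and the sample index fields (other than `answer_status`) are hypothetical.

```python
import csv
import io


def schema_field_names(schema_text):
    """Return the set of field names listed in a schema CSV."""
    reader = csv.DictReader(io.StringIO(schema_text))
    return {row["name"] for row in reader}


def compare_schema(schema_text, index_fields):
    """Return (stale, missing) field names as sorted lists.

    stale: listed in the schema but absent from the index
    missing: present in the index but not listed in the schema
    """
    schema_fields = schema_field_names(schema_text)
    index_fields = set(index_fields)
    stale = sorted(schema_fields - index_fields)
    missing = sorted(index_fields - schema_fields)
    return stale, missing


# Made-up schema text and index fields, only to exercise the helper.
sample_schema = (
    "name,type,aggregatable,description\n"
    "answer_status,keyword,true,Hypothetical description.\n"
    "author,keyword,true,Hypothetical description.\n"
)
sample_index_fields = ["author", "question_tags"]

stale, missing = compare_schema(sample_schema, sample_index_fields)
print("stale:", stale)
print("missing:", missing)
```

Stale entries are candidates for removal from the schema, and missing ones need a new row.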

@vchrombie
Member

For people who are interested in working on this issue:

You can execute micro-mordred to collect and enrich the data of a particular data source. You can inspect the enriched documents using the Dev Tools or the Discover view of Kibiter. For each attribute found in the enriched index, the corresponding schema should contain the name of the attribute, its type, whether the field can be aggregated, and a description.
More Information.

You can use this script for automating the process and creating the schema file from the index.
https://gist.github.com/vchrombie/bf6a682edcf47624126317897e58679c

@vchrombie
Member

Closing this issue in favour of #1010
