Skip to content

Crawlers & Metrics

These pods retrieve the necessary information about repositories and calculate our defined metrics for assessing the health of OSS projects.

Secrets

  • arango-worker-pwd
  • redis-auth
  • ghtoken

Interacting Components

  • Redis
  • ArangoDB

c-drone

A Python Celery worker that queries GitHub mostly via GraphQL, but also via REST API and crawls the github.com repository pages.

Stores the results in the repositories collection in ArangoDB and calls the subsequent task for calculating the metrics processed by m-drone.

m-drone

A Python Celery worker that receives the results from c-drone and calculates metrics.

Stores the results in the metrics collection in ArangoDB.

bak-rest-drone

A Python Celery worker that queries GitHub mostly REST API, but also crawls the github.com repository pages.

Stores the data about the repositories in the bak_repos and the calculated metrics in the bak_metrics collection in ArangoDB.

We mostly use the source code developed by Jacqueline Schmatz for her master thesis (with some modifications). The metrics Jacqueline Schmatz chose are listed here.