The MI project collects data from GitHub repositories. You can use it to collect data and store it either locally or in Amazon's S3 cloud. For personal usage, check out the Usage section.
Together with mi-scheduler, we provide an automated data extraction pipeline for data mining of requested repositories and organizations. This pipeline can be scheduled as needed, e.g. to run daily or weekly.
To request data extraction for a repository or organization, create a Data Extraction Issue in the MI-Scheduler repository. Use this link TODO
The MI pipeline is simple to understand; see the diagram below.
                      +-----------+
                      | ConfigMap |
                      +-----+-----+
                            |
                   +--------v--------+
                   |  mi-scheduler   |
                   +--------+--------+
                            |  Argo Workflows
+---------------------------v---------------------------+
|                  GitHub repositories                  |
|    +---------------+    +---------+    +----------+   |
|    | thoth-station |    | AICoE   |    | your org |   |
|    +---------------+    +---------+    +----------+   |
|    | solver        |    | ...     |    | your     |   |
|    | amun          |    | ...     |    | repos    |   |
|    | adviser       |    | ...     |    |          |   |
|    | ...           |    | ...     |    |          |   |
|    +---------------+    +---------+    +----------+   |
+---------------------------+---------------------------+
                            |
+---------------------------v---------------------------+
|                   Entities Analysis                   |
+----------+----------------+-------------+-------------+
| Issues   | Pull Requests  | Readmes     | etc.        |
+----------+----------------+-------------+-------------+
                            |  Knowledge
              +-------------v-------------+
              |   Knowledge Processing    |
              +-------------+-------------+
                            |  thoth-station/mi
                            |  (Meta-information Indicators)
              +-------------+-------------+
              |                           |
    +---------v---------+       +---------v---------+
    |   Visualization   |       |  Recommendation   |
    +-------------------+       +-------------------+
    |  Project Health   |       |       thoth       |
    |    (dashboard)    |       |                   |
    +-------------------+       +-------------------+
MI analyses the entities specified on the srcopsmetrics/entities page. An entity is essentially a piece of repository metadata that is being inspected (e.g. an Issue or a Pull Request), from which specified features are extracted and stored in a dataframe.
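As a rough illustration of what extracting features from an entity means, the sketch below flattens issue-like records into rows. The field names (`number`, `created_at`, `closed_at`, `labels`) and the `extract_issue_features` helper are assumptions made for this example, not MI's actual schema or API; rows shaped like this could then be loaded into a dataframe.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional


def extract_issue_features(issue: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one issue-like record into a row of features (hypothetical schema)."""
    created = datetime.fromisoformat(issue["created_at"])
    closed_raw: Optional[str] = issue.get("closed_at")
    closed = datetime.fromisoformat(closed_raw) if closed_raw else None
    return {
        "id": issue["number"],
        "labels": sorted(issue.get("labels", [])),
        # Time to close in seconds; None while the issue is still open.
        "time_to_close": (closed - created).total_seconds() if closed else None,
    }


# Two made-up records standing in for data fetched from GitHub.
issues = [
    {"number": 1, "created_at": "2021-01-01T00:00:00",
     "closed_at": "2021-01-02T00:00:00", "labels": ["bug"]},
    {"number": 2, "created_at": "2021-01-03T12:00:00",
     "closed_at": None, "labels": []},
]
rows: List[Dict[str, Any]] = [extract_issue_features(i) for i in issues]
```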
MI is essentially a wrapper around the PyGithub module that provides hassle-free data extraction with API rate limit handling and data updating.
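The general idea behind rate limit handling can be sketched as follows. This is not MI's implementation: `RateLimitExceeded` is a hypothetical stand-in for whatever error the underlying client raises, and `call_with_rate_limit` is an illustrative helper that waits until the quota resets and then retries.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitExceeded(Exception):
    """Hypothetical stand-in for the error raised when the API quota is used up."""

    def __init__(self, reset_in_seconds: float) -> None:
        super().__init__(f"rate limit exceeded, resets in {reset_in_seconds}s")
        self.reset_in_seconds = reset_in_seconds


def call_with_rate_limit(
    request: Callable[[], T],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run request(); on a rate-limit error, wait for the reset and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except RateLimitExceeded as exc:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the error to the caller.
            sleep(exc.reset_in_seconds)
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper easy to exercise without actually waiting.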
MI is available through PyPI, so you can install it with
pip install srcopsmetrics
Alternatively, you can install srcopsmetrics by cloning the repository
git clone https://github.com/thoth-station/mi.git
cd mi
pipenv install --dev
To be able to extract data from GitHub, an access token must be configured. To generate one, read this
To use the token with mi, set the GITHUB_ACESS_TOKEN environment variable to the token value, for example:
export GITHUB_ACESS_TOKEN=<token_string>
or
GITHUB_ACESS_TOKEN=<token_string> python -m srcopsmetrics.cli ...
and so on.
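When driving MI from a script, it can help to fail early if the token is missing. The helper below is a minimal sketch, not part of MI: `read_github_token` is a hypothetical name, and the variable name is taken as written in this section.

```python
import os
from typing import Mapping, Optional


def read_github_token(
    env: Optional[Mapping[str, str]] = None,
    variable: str = "GITHUB_ACESS_TOKEN",  # name as spelled in this README
) -> str:
    """Return the token from the environment or raise a descriptive error."""
    source = os.environ if env is None else env
    token = source.get(variable, "").strip()
    if not token:
        raise RuntimeError(f"{variable} is not set; export it before running MI")
    return token
```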
To store data locally, use -l when calling the CLI or set is_local=True when using MI as a module.
By default, MI will try to store the data on Ceph. To store data on Ceph, you need to provide the following environment variables:
S3_ENDPOINT_URL: Ceph Host name
CEPH_BUCKET: Ceph Bucket name
CEPH_BUCKET_PREFIX: Ceph Prefix
CEPH_KEY_ID: Ceph Key ID
CEPH_SECRET_KEY: Ceph Secret Key
For more information about storing data on Ceph, look here
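Before starting a run that stores data on Ceph, the required variables can be checked up front. This is a small sketch under the assumption that all five variables listed above must be non-empty; `missing_ceph_variables` is a hypothetical helper, not an MI function.

```python
import os
from typing import List, Mapping, Optional

# Variable names as listed in this section.
CEPH_VARIABLES = (
    "S3_ENDPOINT_URL",
    "CEPH_BUCKET",
    "CEPH_BUCKET_PREFIX",
    "CEPH_KEY_ID",
    "CEPH_SECRET_KEY",
)


def missing_ceph_variables(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of Ceph variables that are unset or empty."""
    source = os.environ if env is None else env
    return [name for name in CEPH_VARIABLES if not source.get(name)]
```

An empty result means every variable is present; otherwise the returned names can be reported to the user, or the run can fall back to local storage with -l.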
To view all of the available commands and their descriptions, use
python -m srcopsmetrics.cli --help
See some of the general usage examples below
python -m srcopsmetrics.cli --create --is-local --repository foo_repo --entities PullRequest
which is equivalent to
python -m srcopsmetrics.cli -clr foo_repo -e PullRequest
python -m srcopsmetrics.cli -clo foo_org -e PullRequest
python -m srcopsmetrics.cli -clr foo_repo,bar_repo -e PullRequest
python -m srcopsmetrics.cli -clr foo_repo -e PullRequest,Issue,Commit
To know more about the indicators that are extracted from the data, check out Meta-Information Indicators.
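The command-line invocations above can also be assembled from a script. The sketch below only builds the argument list and does not run anything; `build_mi_command` is a hypothetical helper. It uses the flags that appear in the examples (`--create`, `--is-local`, `--repository`, `--entities`, and the short `-o` for an organization) and assumes the short flags may be passed separately rather than combined.

```python
import sys
from typing import List, Sequence


def build_mi_command(
    entities: Sequence[str],
    repositories: Sequence[str] = (),
    organization: str = "",
    is_local: bool = True,
) -> List[str]:
    """Assemble an argument list mirroring the CLI examples (not executed here)."""
    if bool(repositories) == bool(organization):
        raise ValueError("pass either repositories or an organization")
    command = [sys.executable, "-m", "srcopsmetrics.cli", "--create"]
    if is_local:
        command.append("--is-local")
    if repositories:
        # Several repositories are passed as one comma-separated value.
        command += ["--repository", ",".join(repositories)]
    else:
        command += ["-o", organization]
    command += ["--entities", ",".join(entities)]
    return command
```

The resulting list can be handed to `subprocess.run(command, check=True)` once srcopsmetrics is installed and the access token is configured.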
Always feel free to open new Issues or engage in already existing ones!
If you want to contribute by adding a new entity or metric to be analysed from GitHub repositories,
feel free to open an Issue and describe why you think this new entity should be analysed and what
the benefits of doing so are with respect to the goal of the thoth-station/mi project.
After creating the Issue, you can wait for a response from the thoth-station devs.
Do not forget to reference the Issue in your Pull Request.
Look at the Template entity to get an idea of the requirements that need to be satisfied for a custom entity implementation.