Open
Description
Attempting to run scraper on a GitHub org with private repos results in an error.
Output:
% scraper --config config.json
2019-04-23 17:29:12,536 - INFO: Connected to: https://github.com
2019-04-23 17:29:12,773 - INFO: Processing: GSA/private-test
Traceback (most recent call last):
  File "/home/jf/.pyenv/versions/3.7.0/bin/scraper", line 11, in <module>
    load_entry_point('llnl-scraper', 'console_scripts', 'scraper')()
  File "/home/jf/gsa/scraper/scraper/gen_code_gov_json.py", line 76, in main
    code_json = code_gov.process_config(config_json)
  File "/home/jf/gsa/scraper/scraper/code_gov/__init__.py", line 58, in process_config
    code_gov_project = Project.from_github3(repo, labor_hours=compute_labor_hours)
  File "/home/jf/gsa/scraper/scraper/code_gov/models.py", line 217, in from_github3
    elif date_parse(repository.created_at) < POLICY_START_DATE:
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 1356, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 645, in parse
    res, skipped_tokens = self._parse(timestr, **kwargs)
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 721, in _parse
    l = _timelex.split(timestr) # Splits the timestr into tokens
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 207, in split
    return list(cls(s))
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 76, in __init__
    '{itype}'.format(itype=instream.__class__.__name__))
TypeError: Parser must be a string or character stream, not datetime
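Going only by the traceback, `repository.created_at` appears to already be a `datetime` by the time it reaches `date_parse`, which accepts only strings or character streams. A possible guard is sketched below. The helper name `as_datetime` is hypothetical and not part of scraper, and the sketch uses the standard library's ISO 8601 parsing as a stand-in so that it runs on its own; in `models.py` the string branch would presumably keep calling the existing `date_parse`.

```python
from datetime import datetime, timezone


def as_datetime(value):
    """Return a datetime whether `value` is a datetime or an ISO 8601 string.

    Hypothetical helper, not part of scraper. The string branch stands in
    for the project's own date parsing call.
    """
    if isinstance(value, datetime):
        # Already parsed upstream: return it untouched instead of re-parsing.
        return value
    # fromisoformat() on Python 3.7 does not accept a trailing "Z",
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# Both input types yield the same value, so a later comparison against a
# cutoff date would behave the same either way.
already_parsed = datetime(2019, 4, 23, 17, 29, 12, tzinfo=timezone.utc)
from_string = as_datetime("2019-04-23T17:29:12Z")
print(as_datetime(already_parsed) == from_string)
```

With a guard like this, the comparison on line 217 of `models.py` would no longer depend on which type `created_at` happens to have. Whether the fix belongs there or further upstream is for the maintainers to decide.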
Here is a simplified config.json as a test case. The GSA/private-test repo is private and contains a README.md file.
{
  "agency": "GSA",
  "contact_email": "[email protected]",
  "GitHub": [
    {
      "public_only": false,
      "repos": [
        "GSA/private-test"
      ]
    }
  ]
}
Example of a real config.json where we encountered the issue. It scans properly until it arrives at a private repo, at which point it crashes.
{
  "agency": "GSA",
  "contact_email": "[email protected]",
  "GitHub": [
    {
      "public_only": false,
      "orgs": [
        "GSA",
        "18F",
        "presidential-innovation-fellows",
        "USWDS"
      ]
    }
  ]
}
I verified that my GitHub access token is valid and can view private repos by using the same token in a different script.