Error when attempting to access private repo #30

@jfredrickson5

Attempting to run scraper on a GitHub org with private repos results in an error.

Output:

% scraper --config config.json                     
2019-04-23 17:29:12,536 - INFO: Connected to: https://github.com                                     
2019-04-23 17:29:12,773 - INFO: Processing: GSA/private-test                                         
Traceback (most recent call last):
  File "/home/jf/.pyenv/versions/3.7.0/bin/scraper", line 11, in <module>                            
    load_entry_point('llnl-scraper', 'console_scripts', 'scraper')()                                 
  File "/home/jf/gsa/scraper/scraper/gen_code_gov_json.py", line 76, in main                         
    code_json = code_gov.process_config(config_json)                                                 
  File "/home/jf/gsa/scraper/scraper/code_gov/__init__.py", line 58, in process_config               
    code_gov_project = Project.from_github3(repo, labor_hours=compute_labor_hours)                   
  File "/home/jf/gsa/scraper/scraper/code_gov/models.py", line 217, in from_github3                  
    elif date_parse(repository.created_at) < POLICY_START_DATE:                                      
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 1356, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 645, in parse
    res, skipped_tokens = self._parse(timestr, **kwargs)                                             
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 721, in _parse
    l = _timelex.split(timestr)         # Splits the timestr into tokens                             
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 207, in split
    return list(cls(s))
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 76, in __init__
    '{itype}'.format(itype=instream.__class__.__name__))                                             
TypeError: Parser must be a string or character stream, not datetime
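
Reading the last frames of the traceback, it looks like `repository.created_at` is already a `datetime` for this repo, so handing it to `date_parse` fails. One possible guard is sketched below. This is only a sketch, not the project's actual fix: `to_datetime` and `created_before_policy` are hypothetical helper names, the `POLICY_START_DATE` value is a placeholder because the real constant is not shown in the traceback, and the string branch uses the standard library only so that the snippet is self-contained (the real code could keep calling `date_parse` there).

```python
from datetime import datetime, timezone

# Placeholder value (assumption): the real POLICY_START_DATE in models.py is not shown above.
POLICY_START_DATE = datetime(2016, 8, 8, tzinfo=timezone.utc)


def to_datetime(value):
    """Return a datetime whether `value` is already one or is an ISO 8601 string."""
    if isinstance(value, datetime):
        # Already parsed; passing it to a string parser is what raises the TypeError.
        return value
    # Handles strings such as "2019-04-23T17:29:12Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def created_before_policy(created_at):
    """Hypothetical equivalent of the comparison on line 217 of models.py."""
    return to_datetime(created_at) < POLICY_START_DATE
```

With a guard like this, the comparison would work for both input types, provided both sides are timezone-aware (or both naive).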

Here is a simplified config.json as a test case. The GSA/private-test repo is private and contains a README.md file.

{
  "agency": "GSA",
  "contact_email": "[email protected]",
  "GitHub": [
    {
      "public_only": false,
      "repos": [
        "GSA/private-test"
      ]
    }
  ]
}

Here is an example of a real config.json where we encountered the issue. The scan runs properly until it reaches a private repo, at which point it crashes.

{
  "agency": "GSA",
  "contact_email": "[email protected]",
  "GitHub": [
    {
      "public_only": false,
      "orgs": [
        "GSA",
        "18F",
        "presidential-innovation-fellows",
        "USWDS"
      ]
    }
  ]
}

I verified that my GitHub access token is valid and can view private repos by using the same token in a different script.
