Commit a0a711b
More options for pull-requests: --state, --org, and --search (#80)
* always ask for 100 items when paginating (helps #79)
* fix typos in README.md
* ignore test and build artifacts
* --org and --state options for pull-requests
* --search for pull-requests, but it can only get 1000
1 parent 56f2aee commit a0a711b

4 files changed: 90 additions & 24 deletions

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -8,4 +8,5 @@ venv
 .eggs
 .pytest_cache
 *.egg-info
-.coverage
+.coverage
+build/

README.md

Lines changed: 17 additions & 5 deletions
@@ -82,13 +82,25 @@ You can use the `--pull-request` option one or more times to load specific pull
 
 Note that the `merged_by` column on the `pull_requests` table will only be populated for pull requests that are loaded using the `--pull-request` option - the GitHub API does not return this field for pull requests that are loaded in bulk.
 
+You can load only pull requests in a certain state with the `--state` option:
+
+    $ github-to-sqlite pull-requests --state=open github.db simonw/datasette
+
+Pull requests across an entire organization (or more than one) can be loaded with `--org`:
+
+    $ github-to-sqlite pull-requests --state=open --org=psf --org=python github.db
+
+You can use a search query to find pull requests. Note that no more than 1000 will be loaded (this is a GitHub API limitation), and some data will be missing (base and head SHAs). When using searches, other filters are ignored; put all criteria into the search itself:
+
+    $ github-to-sqlite pull-requests --search='org:python defaultdict state:closed created:<2023-09-01' github.db
+
 Example: [pull_requests table](https://github-to-sqlite.dogsheep.net/github/pull_requests)
 
 ## Fetching issue comments for a repository
 
 The `issue-comments` command retrieves all of the comments on all of the issues in a repository.
 
-It is recommended you run `issues` first, so that each imported comment can have a foreign key poining to its issue.
+It is recommended you run `issues` first, so that each imported comment can have a foreign key pointing to its issue.
 
     $ github-to-sqlite issues github.db simonw/datasette
     $ github-to-sqlite issue-comments github.db simonw/datasette
@@ -101,7 +113,7 @@ Example: [issue_comments table](https://github-to-sqlite.dogsheep.net/github/iss
 
 ## Fetching commits for a repository
 
-The `commits` command retrieves details of all of the commits for one or more repositories. It currently fetches the sha, commit message and author and committer details - it does no retrieve the full commit body.
+The `commits` command retrieves details of all of the commits for one or more repositories. It currently fetches the SHA, commit message and author and committer details; it does not retrieve the full commit body.
 
     $ github-to-sqlite commits github.db simonw/datasette simonw/sqlite-utils
 
@@ -156,7 +168,7 @@ You can pass more than one username to fetch for multiple users or organizations
 
     $ github-to-sqlite repos github.db simonw dogsheep
 
-Add the `--readme` option to save the README for the repo in a column called `readme`. Add `--readme-html` to save the HTML rendered version of the README into a collumn called `readme_html`.
+Add the `--readme` option to save the README for the repo in a column called `readme`. Add `--readme-html` to save the HTML rendered version of the README into a column called `readme_html`.
 
 Example: [repos table](https://github-to-sqlite.dogsheep.net/github/repos)
 
@@ -216,7 +228,7 @@ You can fetch a list of every emoji supported by GitHub using the `emojis` comma
 
     $ github-to-sqlite emojis github.db
 
-This will create a table callad `emojis` with a primary key `name` and a `url` column.
+This will create a table called `emojis` with a primary key `name` and a `url` column.
 
 If you add the `--fetch` option the command will also fetch the binary content of the images and place them in an `image` column:
 
@@ -235,7 +247,7 @@ The `github-to-sqlite get` command provides a convenient shortcut for making aut
 
 This will make an authenticated call to the URL you provide and pretty-print the resulting JSON to the console.
 
-You can ommit the `https://api.github.com/` prefix, for example:
+You can omit the `https://api.github.com/` prefix, for example:
 
     $ github-to-sqlite get /gists
 

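The `--search` option documented above relies on GitHub's issue/PR search endpoint (`GET /search/issues`), which caps results at 1,000 items no matter how you paginate - hence the README's warning. A minimal sketch of building such a search URL and what the cap implies (the `build_search_url` helper is illustrative, not part of the tool):

```python
import urllib.parse

def build_search_url(query, per_page=100):
    # GitHub's issue/PR search endpoint; results are capped at 1000 items
    # regardless of pagination.
    base = "https://api.github.com/search/issues?"
    return base + urllib.parse.urlencode({"q": query, "per_page": per_page})

url = build_search_url("org:python defaultdict state:closed is:pr")
# At 100 items per page, the 1000-item cap means at most 10 useful pages.
max_pages = 1000 // 100
```

Note how `urlencode` takes care of escaping the `:` qualifiers and spaces in the query string.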
github_to_sqlite/cli.py

Lines changed: 42 additions & 7 deletions
@@ -1,5 +1,6 @@
 import click
 import datetime
+import itertools
 import pathlib
 import textwrap
 import os
@@ -104,19 +105,53 @@ def issues(db_path, repo, issue_ids, auth, load):
     type=click.Path(file_okay=True, dir_okay=False, allow_dash=True, exists=True),
     help="Load pull-requests JSON from this file instead of the API",
 )
-def pull_requests(db_path, repo, pull_request_ids, auth, load):
+@click.option(
+    "--org",
+    "orgs",
+    help="Fetch all pull requests from this GitHub organization",
+    multiple=True,
+)
+@click.option(
+    "--state",
+    help="Only fetch pull requests in this state",
+)
+@click.option(
+    "--search",
+    help="Find pull requests with a search query",
+)
+def pull_requests(db_path, repo, pull_request_ids, auth, load, orgs, state, search):
     "Save pull_requests for a specified repository, e.g. simonw/datasette"
     db = sqlite_utils.Database(db_path)
     token = load_token(auth)
-    repo_full = utils.fetch_repo(repo, token)
-    utils.save_repo(db, repo_full)
     if load:
+        repo_full = utils.fetch_repo(repo, token)
+        utils.save_repo(db, repo_full)
         pull_requests = json.load(open(load))
+        utils.save_pull_requests(db, pull_requests, repo_full)
+    elif search:
+        repos_seen = set()
+        search += " is:pr"
+        pull_requests = utils.fetch_searched_pulls_or_issues(search, token)
+        for pull_request in pull_requests:
+            pr_repo_url = pull_request["repository_url"]
+            if pr_repo_url not in repos_seen:
+                pr_repo = utils.fetch_repo(url=pr_repo_url)
+                utils.save_repo(db, pr_repo)
+                repos_seen.add(pr_repo_url)
+            utils.save_pull_requests(db, [pull_request], pr_repo)
     else:
-        pull_requests = utils.fetch_pull_requests(repo, token, pull_request_ids)
-
-    pull_requests = list(pull_requests)
-    utils.save_pull_requests(db, pull_requests, repo_full)
+        if orgs:
+            repos = itertools.chain.from_iterable(
+                utils.fetch_all_repos(token=token, org=org)
+                for org in orgs
+            )
+        else:
+            repos = [utils.fetch_repo(repo, token)]
+        for repo_full in repos:
+            utils.save_repo(db, repo_full)
+            repo = repo_full["full_name"]
+            pull_requests = utils.fetch_pull_requests(repo, state, token, pull_request_ids)
+            utils.save_pull_requests(db, pull_requests, repo_full)
     utils.ensure_db_shape(db)
 
 
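The `--org` branch above flattens the per-organization repo listings into a single stream with `itertools.chain.from_iterable`, so one loop can handle any number of organizations. A self-contained sketch of that pattern, with a fake fetcher standing in for `utils.fetch_all_repos`:

```python
import itertools

def fake_fetch_all_repos(org):
    # Stand-in for utils.fetch_all_repos(token=..., org=org),
    # which yields one dict per repository.
    return [{"full_name": f"{org}/repo-{n}"} for n in (1, 2)]

orgs = ["psf", "python"]
# Chain the per-org iterables into one lazy stream of repo dicts.
repos = itertools.chain.from_iterable(fake_fetch_all_repos(org) for org in orgs)
names = [repo["full_name"] for repo in repos]
# names == ["psf/repo-1", "psf/repo-2", "python/repo-1", "python/repo-2"]
```

Because the chain is lazy, repos from the second organization are not fetched until the loop reaches them.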
github_to_sqlite/utils.py

Lines changed: 29 additions & 11 deletions
@@ -2,6 +2,7 @@
 import requests
 import re
 import time
+import urllib.parse
 import yaml
 
 FTS_CONFIG = {
@@ -170,17 +171,21 @@ def save_pull_requests(db, pull_requests, repo):
         # Add repo key
         pull_request["repo"] = repo["id"]
         # Pull request _links can be flattened to just their URL
-        pull_request["url"] = pull_request["_links"]["html"]["href"]
-        pull_request.pop("_links")
+        if "_links" in pull_request:
+            pull_request["url"] = pull_request["_links"]["html"]["href"]
+            pull_request.pop("_links")
+        else:
+            pull_request["url"] = pull_request["pull_request"]["html_url"]
         # Extract user
         pull_request["user"] = save_user(db, pull_request["user"])
         labels = pull_request.pop("labels")
         # Extract merged_by, if it exists
         if pull_request.get("merged_by"):
             pull_request["merged_by"] = save_user(db, pull_request["merged_by"])
         # Head sha
-        pull_request["head"] = pull_request["head"]["sha"]
-        pull_request["base"] = pull_request["base"]["sha"]
+        if "head" in pull_request:
+            pull_request["head"] = pull_request["head"]["sha"]
+            pull_request["base"] = pull_request["base"]["sha"]
         # Extract milestone
         if pull_request["milestone"]:
             pull_request["milestone"] = save_milestone(
@@ -292,12 +297,13 @@ def save_issue_comment(db, comment):
     return last_pk
 
 
-def fetch_repo(full_name, token=None):
+def fetch_repo(full_name=None, token=None, url=None):
     headers = make_headers(token)
     # Get topics:
     headers["Accept"] = "application/vnd.github.mercy-preview+json"
-    owner, slug = full_name.split("/")
-    url = "https://api.github.com/repos/{}/{}".format(owner, slug)
+    if url is None:
+        owner, slug = full_name.split("/")
+        url = "https://api.github.com/repos/{}/{}".format(owner, slug)
     response = requests.get(url, headers=headers)
     response.raise_for_status()
     return response.json()
@@ -358,7 +364,7 @@ def fetch_issues(repo, token=None, issue_ids=None):
         yield from issues
 
 
-def fetch_pull_requests(repo, token=None, pull_request_ids=None):
+def fetch_pull_requests(repo, state=None, token=None, pull_request_ids=None):
     headers = make_headers(token)
     headers["accept"] = "application/vnd.github.v3+json"
     if pull_request_ids:
@@ -370,11 +376,20 @@ def fetch_pull_requests(repo, token=None, pull_request_ids=None):
             response.raise_for_status()
             yield response.json()
     else:
-        url = "https://api.github.com/repos/{}/pulls?state=all&filter=all".format(repo)
+        state = state or "all"
+        url = f"https://api.github.com/repos/{repo}/pulls?state={state}"
         for pull_requests in paginate(url, headers):
             yield from pull_requests
 
 
+def fetch_searched_pulls_or_issues(query, token=None):
+    headers = make_headers(token)
+    url = "https://api.github.com/search/issues?"
+    url += urllib.parse.urlencode({"q": query})
+    for pulls_or_issues in paginate(url, headers):
+        yield from pulls_or_issues["items"]
+
+
 def fetch_issue_comments(repo, token=None, issue=None):
     assert "/" in repo
     headers = make_headers(token)
@@ -445,13 +460,15 @@ def fetch_stargazers(repo, token=None):
         yield from stargazers
 
 
-def fetch_all_repos(username=None, token=None):
-    assert username or token, "Must provide username= or token= or both"
+def fetch_all_repos(username=None, token=None, org=None):
+    assert username or token or org, "Must provide username= or token= or org= or a combination"
     headers = make_headers(token)
     # Get topics for each repo:
     headers["Accept"] = "application/vnd.github.mercy-preview+json"
     if username:
         url = "https://api.github.com/users/{}/repos".format(username)
+    elif org:
+        url = "https://api.github.com/orgs/{}/repos".format(org)
     else:
         url = "https://api.github.com/user/repos"
     for repos in paginate(url, headers):
@@ -469,6 +486,7 @@ def fetch_user(username=None, token=None):
 
 
 def paginate(url, headers=None):
+    url += ("&" if "?" in url else "?") + "per_page=100"
     while url:
         response = requests.get(url, headers=headers)
         # For HTTP 204 no-content this yields an empty list
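The one-line change to `paginate` asks GitHub for 100 items per page (the API maximum) while leaving any query string already on the URL intact. The appending logic in isolation, wrapped in an illustrative helper:

```python
def add_per_page(url, per_page=100):
    # Append per_page with "&" if the URL already has a query string,
    # otherwise start the query string with "?".
    return url + ("&" if "?" in url else "?") + f"per_page={per_page}"

with_query = add_per_page("https://api.github.com/repos/simonw/datasette/pulls?state=open")
bare = add_per_page("https://api.github.com/users/simonw/repos")
# with_query ends in "?state=open&per_page=100"; bare ends in "?per_page=100"
```

Fetching the maximum page size cuts the number of round trips (and rate-limit hits) roughly in three compared with GitHub's default of 30 items per page.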
