
Commit b883d4c (parent f61ddc0)

feature/analyze-extra-tags -> [FEATURE - DISABLING FOLLOWING INNER LINKS] Added the feature to disable/enable analyzing all the inner links present on a page by adding a 'follow_links' option to the 'analyze' function. Also updated the README.md docs.

3 files changed: 18 additions & 4 deletions

README.md

Lines changed: 11 additions & 1 deletion

````diff
@@ -56,7 +56,7 @@ output = analyze(site, sitemap)
 print(output)
 ```
 
-If you would like to analyze heading tags (h1-h6) and other extra additional tags as well, then pass the following options to the `analyze` function
+In order to analyze heading tags (h1-h6) and other extra additional tags as well, the following options can be passed to the `analyze` function
 ```python
 from seoanalyzer import analyze
 
@@ -65,6 +65,16 @@ output = analyze(site, sitemap, analyze_headings=True, analyze_extra_tags=True)
 print(output)
 ```
 
+By default, the `analyze` function analyzes all the existing inner links as well, which might be time-consuming.
+This default behaviour can be changed to analyze only the provided URL by passing the following option to the `analyze` function:
+```python
+from seoanalyzer import analyze
+
+output = analyze(site, sitemap, follow_links=False)
+
+print(output)
+```
+
 Alternatively, you can run the analysis as a script from the seoanalyzer folder.
 
 ```sh
````

seoanalyzer/analyzer.py

Lines changed: 2 additions & 2 deletions

````diff
@@ -4,15 +4,15 @@
 from operator import itemgetter
 from seoanalyzer.website import Website
 
-def analyze(url, sitemap_url=None, analyze_headings=False, analyze_extra_tags=False):
+def analyze(url, sitemap_url=None, analyze_headings=False, analyze_extra_tags=False, follow_links=True):
     start_time = time.time()
 
     def calc_total_time():
         return time.time() - start_time
 
     output = {'pages': [], 'keywords': [], 'errors': [], 'total_time': calc_total_time()}
 
-    site = Website(url, sitemap_url, analyze_headings, analyze_extra_tags)
+    site = Website(url, sitemap_url, analyze_headings, analyze_extra_tags, follow_links)
 
     site.crawl()
 
````
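Because the new `follow_links` parameter is added at the end of the signature with a default of `True`, existing call sites keep their old behaviour unchanged. A minimal sketch of why (a hypothetical stand-in, not seoanalyzer's actual code):

```python
# Hypothetical stand-in for analyze(): returns its effective options instead
# of crawling, purely to illustrate the backwards-compatible default.
def analyze(url, sitemap_url=None, analyze_headings=False,
            analyze_extra_tags=False, follow_links=True):
    return {'url': url, 'follow_links': follow_links}

print(analyze('http://example.com'))                      # old-style call: follow_links stays True
print(analyze('http://example.com', follow_links=False))  # new opt-in behaviour
```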

seoanalyzer/website.py

Lines changed: 5 additions & 1 deletion

````diff
@@ -9,11 +9,12 @@
 from seoanalyzer.page import Page
 
 class Website():
-    def __init__(self, base_url, sitemap, analyze_headings, analyze_extra_tags):
+    def __init__(self, base_url, sitemap, analyze_headings, analyze_extra_tags, follow_links):
         self.base_url = base_url
         self.sitemap = sitemap
         self.analyze_headings = analyze_headings
         self.analyze_extra_tags = analyze_extra_tags
+        self.follow_links = follow_links
         self.crawled_pages = []
         self.crawled_urls = set([])
         self.page_queue = []
@@ -87,3 +88,6 @@ def crawl(self):
 
             self.crawled_pages.append(page)
             self.crawled_urls.add(page.url)
+
+            if not self.follow_links:
+                break
````
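The `break` added at the end of `crawl` exits the loop after the first page is processed, so discovered inner links are queued but never visited. A self-contained sketch of the idea (toy code, not the library's actual crawl loop):

```python
def crawl(start_url, get_links, follow_links=True):
    """Toy breadth-first crawl; get_links(url) stands in for page parsing."""
    queue, crawled, seen = [start_url], [], set()
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        crawled.append(url)
        queue.extend(get_links(url))  # inner links discovered on the page
        if not follow_links:
            break                     # mirrors the commit: stop after page one
    return crawled

# Toy link graph standing in for real pages.
links = {'/': ['/a', '/b'], '/a': ['/b'], '/b': []}
print(crawl('/', links.get))                      # ['/', '/a', '/b']
print(crawl('/', links.get, follow_links=False))  # ['/']
```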
