
Commit b883d4c (parent f61ddc0)

feature/analyze-extra-tags -> [FEATURE - DISABLING FOLLOWING INNER LINKS] Added the feature to disable/enable analyzing all the inner links present on a page by adding a 'follow_links' option to the 'analyze' function. Also updated the README.md docs.

3 files changed: 18 additions & 4 deletions

README.md

Lines changed: 11 additions & 1 deletion

````diff
@@ -56,7 +56,7 @@ output = analyze(site, sitemap)
 print(output)
 ```
 
-If you would like to analyze heading tags (h1-h6) and other extra additional tags as well, then pass the following options to the `analyze` function
+In order to analyze heading tags (h1-h6) and other extra additional tags as well, the following options can be passed to the `analyze` function
 ```python
 from seoanalyzer import analyze
 
@@ -65,6 +65,16 @@ output = analyze(site, sitemap, analyze_headings=True, analyze_extra_tags=True)
 print(output)
 ```
 
+By default, the `analyze` function analyzes all the existing inner links as well, which might be time-consuming.
+This default behaviour can be changed to analyze only the provided URL by passing the following option to the `analyze` function:
+```python
+from seoanalyzer import analyze
+
+output = analyze(site, sitemap, follow_links=False)
+
+print(output)
+```
+
 Alternatively, you can run the analysis as a script from the seoanalyzer folder.
 
 ```sh
````

seoanalyzer/analyzer.py

Lines changed: 2 additions & 2 deletions

````diff
@@ -4,15 +4,15 @@
 from operator import itemgetter
 from seoanalyzer.website import Website
 
-def analyze(url, sitemap_url=None, analyze_headings=False, analyze_extra_tags=False):
+def analyze(url, sitemap_url=None, analyze_headings=False, analyze_extra_tags=False, follow_links=True):
     start_time = time.time()
 
     def calc_total_time():
         return time.time() - start_time
 
     output = {'pages': [], 'keywords': [], 'errors': [], 'total_time': calc_total_time()}
 
-    site = Website(url, sitemap_url, analyze_headings, analyze_extra_tags)
+    site = Website(url, sitemap_url, analyze_headings, analyze_extra_tags, follow_links)
 
     site.crawl()
 
````
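Because the new `follow_links` parameter is added at the end of the signature with a default of `True`, existing call sites keep their old behaviour unchanged. A minimal sketch of why (a hypothetical stand-in, not seoanalyzer's actual code):

```python
# Hypothetical stand-in for analyze(): returns its effective options instead
# of crawling, purely to illustrate the backwards-compatible default.
def analyze(url, sitemap_url=None, analyze_headings=False,
            analyze_extra_tags=False, follow_links=True):
    return {'url': url, 'follow_links': follow_links}

print(analyze('http://example.com'))                      # old-style call: follow_links stays True
print(analyze('http://example.com', follow_links=False))  # new opt-in behaviour
```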

seoanalyzer/website.py

Lines changed: 5 additions & 1 deletion

````diff
@@ -9,11 +9,12 @@
 from seoanalyzer.page import Page
 
 class Website():
-    def __init__(self, base_url, sitemap, analyze_headings, analyze_extra_tags):
+    def __init__(self, base_url, sitemap, analyze_headings, analyze_extra_tags, follow_links):
         self.base_url = base_url
         self.sitemap = sitemap
         self.analyze_headings = analyze_headings
         self.analyze_extra_tags = analyze_extra_tags
+        self.follow_links = follow_links
         self.crawled_pages = []
         self.crawled_urls = set([])
         self.page_queue = []
@@ -87,3 +88,6 @@ def crawl(self):
 
             self.crawled_pages.append(page)
             self.crawled_urls.add(page.url)
+
+            if not self.follow_links:
+                break
````
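The `break` added at the end of `crawl` exits the loop after the first page is processed, so discovered inner links are queued but never visited. A self-contained sketch of the idea (toy code, not the library's actual crawl loop):

```python
def crawl(start_url, get_links, follow_links=True):
    """Toy breadth-first crawl; get_links(url) stands in for page parsing."""
    queue, crawled, seen = [start_url], [], set()
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        crawled.append(url)
        queue.extend(get_links(url))  # inner links discovered on the page
        if not follow_links:
            break                     # mirrors the commit: stop after page one
    return crawled

# Toy link graph standing in for real pages.
links = {'/': ['/a', '/b'], '/a': ['/b'], '/b': []}
print(crawl('/', links.get))                      # ['/', '/a', '/b']
print(crawl('/', links.get, follow_links=False))  # ['/']
```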
