'''
Scraping Numbers from HTML using BeautifulSoup

In this assignment you will write a Python program similar to
http://www.pythonlearn.com/code/urllink2.py. The program will use urllib
to read the HTML from the data files below, parse the data, extract the
numbers, and compute the sum of the numbers in the file.
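
The extraction step can be sketched offline, without network access or
third-party packages. This sketch deliberately swaps BeautifulSoup for the
standard library's html.parser.HTMLParser. It assumes, as the script below
does, that every number sits alone as the text of a <span> tag. The class
name SpanNumberParser and the sample markup are hypothetical, for
illustration only.

```python
from html.parser import HTMLParser

class SpanNumberParser(HTMLParser):
    """Collect integers that appear as the text of <span> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_span = False  # True while the parser is between <span> and </span>
        self.numbers = []

    def handle_starttag(self, tag, attrs):
        if tag == 'span':
            self.in_span = True

    def handle_endtag(self, tag):
        if tag == 'span':
            self.in_span = False

    def handle_data(self, data):
        # Only text inside a <span> counts; skip whitespace-only chunks
        if self.in_span and data.strip():
            self.numbers.append(int(data))

# Hypothetical markup shaped like a table of names and comment counts
sample = ('<table>'
          '<tr><td>Alice</td><td><span class="comments">97</span></td></tr>'
          '<tr><td>Bob</td><td><span class="comments">90</span></td></tr>'
          '<tr><td>Carol</td><td><span class="comments">3</span></td></tr>'
          '</table>')

parser = SpanNumberParser()
parser.feed(sample)
parser.close()
print(len(parser.numbers), sum(parser.numbers))  # 3 190
```

BeautifulSoup hides this start-tag/data/end-tag bookkeeping behind a single
soup('span') call, which is why the assignment script below is shorter.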
We provide two files for this assignment. One is a sample file for which
we give you the sum, so you can test your program. The other is the
actual data you need to process for the assignment.

Sample data: http://python-data.dr-chuck.net/comments_42.html (Sum=2553)
Actual data: http://python-data.dr-chuck.net/comments_353539.html (Sum ends with 63)
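
The two hints above can be checked mechanically. The sample file gives the
exact sum, and the actual file gives only its last digits. Below is a
minimal sketch of such a check. The function name matches_expected is
hypothetical, and the totals 2463 and 2464 are made-up values that only
exercise the suffix test.

```python
def matches_expected(total, expected_sum=None, expected_suffix=None):
    """Check a computed total against the hint given for a data file.

    expected_sum: the exact sum (given for the sample file).
    expected_suffix: the trailing digits as a string (given for the actual file).
    """
    if expected_sum is not None and total != expected_sum:
        return False
    if expected_suffix is not None and not str(total).endswith(expected_suffix):
        return False
    return True

# Sample file: the assignment states Sum=2553
print(matches_expected(2553, expected_sum=2553))     # True
# Actual file: only the trailing digits "63" are given (hypothetical totals)
print(matches_expected(2463, expected_suffix='63'))  # True
print(matches_expected(2464, expected_suffix='63'))  # False
```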
You do not need to save these files to your folder, since your program
will read the data directly from the URL. Note: each student has a
distinct data URL for the assignment, so only use your own data URL for analysis.
'''
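from urllib.request import urlopen
from bs4 import BeautifulSoup

# Python 3: input() replaces Python 2's raw_input()
url = input('Enter - ')
html = urlopen(url).read()
# Name the parser explicitly so BeautifulSoup does not warn about guessing one
soup = BeautifulSoup(html, 'html.parser')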
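# Each number in the data files is the text of a <span> tag;
# calling soup('span') is shorthand for soup.find_all('span')
tags = soup('span')
count = 0
total = 0  # named total to avoid shadowing the built-in sum()
for tag in tags:
    total += int(tag.text)
    count += 1
print(count)
print(total)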