Source From Here
Question
Given a test HTML source, I want to get a relative link address first:
- /test/abc.com
and then find its absolute path.
How-To
Use urllib.parse.urljoin() to join the base URL and the src value. Below is an example using requests and BeautifulSoup:
- test.py
    from urllib.parse import urljoin

    import requests
    from bs4 import BeautifulSoup

    base_url = 'http://www.ragalahari.com'
    url = 'http://www.ragalahari.com/actress/14035/kajal-aggarwal-at-memu-saitham-dinner-with-stars.aspx'

    # Fetch the page and parse it.
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')

    # Print every image source as an absolute URL.
    for img in soup.find_all('img', src=True):
        src = img.get('src')
        if not src.startswith('http'):
            # Relative address: resolve it against the site root.
            src = urljoin(base_url, src)
        print(src)
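To see how urljoin() treats different kinds of src values without touching the network, here is a minimal sketch. The page URL and most of the src values are made-up examples (only /test/abc.com comes from the question above), and to_absolute is a hypothetical helper name, not part of any library.

```python
from urllib.parse import urljoin

# Hypothetical page URL, used only for illustration.
page_url = "http://www.example.com/gallery/page.aspx"


def to_absolute(base, src):
    """Resolve src against base; absolute URLs are returned unchanged."""
    return urljoin(base, src)


# Made-up src values covering the common cases.
examples = [
    "/test/abc.com",                   # root-relative path
    "img/photo.jpg",                   # relative to the page's directory
    "../thumbs/a.png",                 # parent-directory reference
    "//cdn.example.com/a.png",         # scheme-relative URL
    "http://other.example.org/b.png",  # already absolute
]

for src in examples:
    print(src, "->", to_absolute(page_url, src))
```

Because urljoin() returns an already absolute URL unchanged, the startswith('http') check is optional. Passing the page URL as the base, rather than the site root, also resolves paths such as img/photo.jpg relative to the page's own directory.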