# Web Scraping > Tips

**Published by:** [hangytong](https://paragraph.com/@hangytonggmail.com/)
**Published on:** 2024-07-18
**URL:** https://paragraph.com/@hangytonggmail.com/web-scraping

## Content

When scraping a site that has many sub-pages, for example https://docs.flock.io/, you can use its sitemap to discover all of the content. Request the path "/sitemap.xml" on the domain; the response lists every sub-page URL belonging to that domain.

*Figure: sitemap of Flock.io*

You can use BeautifulSoup in Python to collect those URLs into a list:

```python
import requests
from bs4 import BeautifulSoup

def scrape_flock_docs():
    """Return every page URL listed in the Flock docs sitemap."""
    url = 'https://docs.flock.io/sitemap.xml'
    response = requests.get(url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, 'html.parser')
        # Each <loc> element in a sitemap holds one page URL.
        return [loc.text.strip() for loc in soup.find_all('loc')]
    return []
```

## Publication Information

- [hangytong](https://paragraph.com/@hangytonggmail.com/): Publication homepage
- [All Posts](https://paragraph.com/@hangytonggmail.com/): More posts from this publication
- [RSS Feed](https://api.paragraph.com/blogs/rss/@hangytonggmail.com): Subscribe to updates
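To see what `find_all('loc')` is actually matching, here is a minimal sketch that parses an inline sample sitemap instead of making a live request; the sample XML and its URLs are made up for illustration, but the structure follows the standard sitemap format:

```python
from bs4 import BeautifulSoup

# A hypothetical, minimal sitemap document standing in for a live
# /sitemap.xml response (these URLs are invented for the example).
sample_sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://docs.flock.io/</loc></url>
  <url><loc>https://docs.flock.io/getting-started</loc></url>
</urlset>"""

# 'html.parser' lowercases tag names, which is fine here since
# sitemap tags like <loc> are already lowercase.
soup = BeautifulSoup(sample_sitemap, 'html.parser')
urls = [loc.text.strip() for loc in soup.find_all('loc')]
print(urls)
```

Once you have this list, the same `requests` + BeautifulSoup pattern can be looped over each URL to fetch and parse the individual pages.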