# Fetching Web Page Content with Python

**Published by:** [KeepLearning](https://paragraph.com/@keeplearning-2/)

**Published on:** 2022-12-25

**URL:** https://paragraph.com/@keeplearning-2/python-2

## Content

20221225

There are three steps:

1. Send the request: `response = requests.get(url, headers=headers)`
2. Parse the JSON body: `res = json.loads(response.text)`, or equivalently `res = response.json()`
3. Convert the result to a DataFrame: `df = pd.DataFrame(res)`

I have not yet worked out when step 3 succeeds and when it fails, but the data I need is always available after step 2. A dict can be turned into a DataFrame as in step 3. If pandas raises the error `If using all scalar values, you must pass an index`, add the `index=[0]` argument:

    df = pd.DataFrame(response.json()['data']['list'][0], index=[0])

To find the values for `headers`, press F12 to open the browser's developer tools, refresh the page with F5, and inspect the request; the `url` can be found in the same place. Some pages can be fetched without headers, but others cannot. According to the post below, this is an anti-scraping measure:

https://zhuanlan.zhihu.com/p/147175546

The full script:

    import json

    import pandas as pd
    import requests

    # Request headers copied from the browser's developer tools.
    headers = {
        "authority": "web3alerts.app",
        "accept": "application/json",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36",
        "sec-fetch-site": "same-origin",
        "sec-fetch-mode": "cors",
        "sec-fetch-dest": "empty",
        "referer": "https://web3alerts.app/",
        "accept-language": "zh-CN,zh;q=0.9",
    }

    url = "https://web3alerts.app/api/new_projects"

    # Step 1: send the request with the browser-like headers.
    response = requests.get(url, headers=headers)

    # Step 2: parse the JSON body (response.json() does the same thing).
    res = json.loads(response.text)

    # Step 3: convert the parsed data to a DataFrame.
    df = pd.DataFrame(res)
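
The post leaves open when `pd.DataFrame(...)` succeeds on parsed JSON and when it fails. One likely explanation is the shape of the data: a list of dicts has one row per dict, while a single dict whose values are all scalars gives pandas no row index, which triggers the `If using all scalar values, you must pass an index` error. The sketch below illustrates this without any network access. The helper name `to_dataframe` and the sample records are hypothetical and are not taken from the actual API response.

```python
import pandas as pd


def to_dataframe(payload):
    """Convert already-parsed JSON into a DataFrame (hypothetical helper)."""
    if isinstance(payload, list):
        # A list of records (dicts) maps to one row per record.
        return pd.DataFrame(payload)
    if isinstance(payload, dict):
        # A dict of scalars has no row index, so supply one explicitly
        # to get a single-row DataFrame.
        if payload and all(pd.api.types.is_scalar(v) for v in payload.values()):
            return pd.DataFrame(payload, index=[0])
        return pd.DataFrame(payload)
    raise TypeError(f"unsupported JSON payload type: {type(payload).__name__}")


# Made-up sample data standing in for a parsed API response.
record = {"name": "example-project", "followers": 123}
records = [record, {"name": "another-project", "followers": 45}]

single_row = to_dataframe(record)   # works because index=[0] is supplied
many_rows = to_dataframe(records)   # works without an explicit index

try:
    pd.DataFrame(record)            # all-scalar dict without an index
    scalar_dict_failed = False
except ValueError:
    scalar_dict_failed = True
```

Under these assumptions, `pd.DataFrame(res)` works directly when `res` is a list of records, and `index=[0]` is only needed when a single flat record is passed.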