1. The code is as follows:
import time  # standard library: used to pause between downloads
from urllib.request import urlretrieve  # downloads a URL straight to a local file

import requests  # third-party HTTP library
from lxml import etree  # HTML parser with XPath support

BASE_URL = "http://www.gov.cn"  # site root prepended to each relative link
SAVE_DIR = "E://IT/PYthon/PYTHON试验/gov/"  # local target folder; it must already exist


def gethtml():  # download every file linked from the listing page
    url = "http://www.gov.cn/zhengce/pdfFile/downloadFile.htm"  # listing page to request
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36"
    }  # browser-like request headers
    response = requests.get(url, headers=headers)  # GET the page using the headers above
    response.encoding = response.apparent_encoding  # use the encoding detected from the page content
    html = response.text  # decoded text of the page
    tree = etree.HTML(html)  # parse (and auto-repair) the HTML into a tree that supports XPath
    result = tree.xpath('//tbody/tr')  # select every table row
    urllist = []  # collected download URLs
    for info in result:  # one table row at a time
        try:
            # take the last href in the row's second cell and prepend the site root
            urllist.append(BASE_URL + info.xpath('./td[2]/a/@href')[-1])
        except IndexError:  # the row has no link in its second cell
            continue  # skip it
    # print(urllist)
    for downurl in urllist:  # download each collected URL
        filename = SAVE_DIR + downurl.split("/")[-1]  # name the file after the last URL segment
        urlretrieve(downurl, filename)  # save the file into SAVE_DIR
        print(filename + "下载成功")  # "下载成功" means "downloaded successfully"
        time.sleep(0.1)  # pause 0.1 s after each download


gethtml()  # run the download
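The script builds each download URL by prepending "http://www.gov.cn" to the href, which only works when every href is a root-relative path. A minimal sketch of a more tolerant approach, assuming the hrefs might also be page-relative or already absolute, is to resolve them with `urljoin` from the standard library. The helper names `resolve_links` and `local_name` and the sample hrefs are hypothetical and do not come from the original script.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urljoin, urlsplit

# Listing page used by the script above
PAGE_URL = "http://www.gov.cn/zhengce/pdfFile/downloadFile.htm"


def resolve_links(hrefs, page_url=PAGE_URL):
    """Turn raw href values into absolute URLs, dropping blanks and duplicates."""
    seen = set()
    urls = []
    for href in hrefs:
        href = href.strip()
        if not href:
            continue  # ignore empty href attributes
        absolute = urljoin(page_url, href)  # handles relative and absolute hrefs
        if absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
    return urls


def local_name(url):
    """File name taken from the URL path, ignoring any query string."""
    return PurePosixPath(unquote(urlsplit(url).path)).name


# Hypothetical href values, for illustration only
for link in resolve_links(["/files/2020_PDF.pdf", "2019_PDF.pdf", ""]):
    print(link, "->", local_name(link))
```

The list returned by `resolve_links` could replace `urllist` in the script, and `local_name` could replace `downurl.split("/")[-1]`, which would otherwise keep a query string in the file name.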
2. The output of the run is as follows (下载成功 means "downloaded successfully"):
E://IT/PYthon/PYTHON试验/gov/PDF_ALL.zip下载成功
E://IT/PYthon/PYTHON试验/gov/2020_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2019_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2018_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2017_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2016_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2015_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2014_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2013_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2012_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2011_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2010_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2009_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2008_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2007_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2006_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2005_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2004_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2003_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2002_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2001_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/2000_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1999_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1998_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1997_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1996_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1995_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1994muLu.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1994_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1993_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1992muLu.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1992_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1991_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1990_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1989_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1988_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1987_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1986_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1985_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1984_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1983_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1982_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1981_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1980_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1979_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1978_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1973_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1971_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1970_PDF.pdf下载成功
E://IT/PYthon/PYTHON试验/gov/1969_PDF.pdf下载成功
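The log above jumps from 1978_PDF.pdf to 1973_PDF.pdf and has no 1972 entry. The output alone does not show whether those years are missing from the listing page or were skipped while parsing. A minimal sketch for spotting such gaps, assuming the yearly files always follow the `<year>_PDF.pdf` pattern seen in the log (the function name `missing_years` is hypothetical):

```python
import re

# Matches yearly file names such as 2020_PDF.pdf, also inside a longer log line
YEAR_FILE = re.compile(r"(\d{4})_PDF\.pdf")


def missing_years(lines):
    """Return the years absent between the earliest and latest year found."""
    years = set()
    for line in lines:
        match = YEAR_FILE.search(line)
        if match:
            years.add(int(match.group(1)))
    if not years:
        return []
    return [y for y in range(min(years), max(years) + 1) if y not in years]


# The last five file names from the log above
tail = ["1978_PDF.pdf", "1973_PDF.pdf", "1971_PDF.pdf", "1970_PDF.pdf", "1969_PDF.pdf"]
print(missing_years(tail))  # [1972, 1974, 1975, 1976, 1977]
```

Names that do not match the pattern, such as PDF_ALL.zip or 1994muLu.pdf, are ignored by this check.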
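3. The code and its output are shown in the figure below:
[Figure: screenshot of the code and its output]
The files finally saved to the local machine are shown in the figure below:
[Figure: the downloaded files in the local folder]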