Why won't images downloaded by a crawler open? (Crawler image download code)

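A crawler, short for web crawler, is a program or script that retrieves data from the web automatically.

Life is short, I use Python. This time, Python will be used to download images from a wallpaper site.

This article is a hands-on crawler walkthrough. It covers the basic usage of the requests library and ends with a working scraper for the wallpaper site's images.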

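requests is a very handy HTTP library for Python. It is nicely wrapped, and we use it all the time when crawling.

The official tagline of Requests is "HTTP for Humans". It is a very easy library to use, and our crawler uses it this time as well.

For an introduction to the requests library, see the official documentation.

Requests: HTTP for Humans – Requests 2.18.1 documentation

Remember to install requests before using it.

pip install requests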

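Open the target website.

Click any image and check its URL: http://***.netbian.com/desk/23744.htm

Make a note of this URL; it will be needed later.

Back in the browser, press F12 and use the element picker to click the image we just opened. Its element shows that the href attribute of the enclosing a tag is exactly the detail-page URL we visited above.

On the full-size image page, press F12 again and click the image to find the URL of the image itself.

Visiting that image URL shows the full-size picture we want. That completes the analysis of the site.

In summary, the target is a wallpaper image site, and the program has three steps:

1. Fetch the home page and find the detail-page link for each image.
2. Visit each detail link and find the URL of the corresponding full-size image.
3. Download and save the image.

Looks easy, doesn't it? Let's go.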

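4.1 Fetching the home page

    import requests

    url = 'http://***.netbian.com/meinv/'
    resp = requests.get(url)
    # Keep a copy of the raw response for later XPath analysis
    with open('index.html', 'wb') as f:
        f.write(resp.content)

We send a GET request to the wallpaper site's home page with the requests library and save the result to index.html.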
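A bare GET like this does not fail when the server answers with an error page, and it waits indefinitely if the server never responds. The sketch below is one way to tighten that up, not something the original code does. The helper name fetch_bytes and the User-Agent string are made up for illustration; session is assumed to be a requests.Session or any object with a compatible get method.

```python
# Hypothetical helper: name, header value and default timeout are assumptions.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; wallpaper-tutorial)"}


def fetch_bytes(session, url, timeout=10.0):
    """GET `url` through `session` and return the raw body.

    `session` is expected to be a requests.Session (or an object with a
    compatible `get` method). HTTP error statuses raise instead of being
    silently saved to disk.
    """
    resp = session.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    resp.raise_for_status()  # with requests, 4xx/5xx become requests.HTTPError
    return resp.content
```

With the real library this would be called as fetch_bytes(requests.Session(), url), and the returned bytes written to index.html exactly as before.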

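Opening the saved file confirms that the home page was downloaded.

4.2 Locating elements

We use XPath here, through the lxml library. If lxml is new to you, see the following article.

[Python] XPath, a powerful tool for crawler parsing: from the basics to quick mastery (with source code examples)

PS: The XPath of the page as rendered by Chrome can differ from the XPath of the HTML that requests actually receives. Sometimes you need to save the response to a file and work out the XPath from that.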
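One well-known instance of this mismatch is tables: browsers insert a tbody element into the rendered DOM, while the raw HTML parsed by lxml has none, so an XPath copied from the developer tools finds nothing. The toy document below is invented to show the effect; it is not taken from the wallpaper site, whose pages may differ from the browser view for other reasons.

```python
from lxml import etree

# Invented sample: raw HTML as a server might send it, with no <tbody>.
RAW_HTML = b"""
<html><body>
  <table><tr><td>cell</td></tr></table>
</body></html>
"""

tree = etree.HTML(RAW_HTML)

# Path as the browser's developer tools would report it (includes tbody).
browser_style = tree.xpath('/html/body/table/tbody/tr/td/text()')

# Path that matches the raw response that lxml actually parsed.
raw_style = tree.xpath('/html/body/table/tr/td/text()')

print(browser_style, raw_style)
```

This is why saving index.html and checking paths against that file is a useful habit.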

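Locate the elements: extract the href value of every a tag, along with the corresponding title.

    from lxml import etree

    tree = etree.HTML(resp.content)
    node_list = tree.xpath('/html/body/div[2]/div[2]/div[3]/ul/li')
    sub_url_list = []
    for node in node_list:
        # Only keep list items that have both a link and a title
        if len(node.xpath('./a/@href')) > 0 and len(node.xpath('./a/b/text()')) > 0:
            sub_url = node.xpath('./a/@href')[0]
            title = node.xpath('./a/b/text()')[0]
            sub_url_list.append((sub_url, title))

4.3 Visiting the detail pages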
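The href values collected in 4.2 appear to be paths rather than full URLs, so each one has to be combined with the site's address. Plain string concatenation works only when the slashes happen to line up; urljoin from the standard library resolves a link the way a browser would. The following is a minimal sketch, with example.com standing in for the real host and build_detail_urls being a name invented here.

```python
from urllib.parse import urljoin


def build_detail_urls(page_url, hrefs):
    """Resolve each href against the URL of the page it was found on."""
    return [urljoin(page_url, href) for href in hrefs]


# example.com is a placeholder host, not the site from the article.
urls = build_detail_urls(
    "http://example.com/meinv/",
    ["/desk/23744.htm", "index_2.htm", "http://other.example/x.jpg"],
)
for u in urls:
    print(u)
```

Root-relative, page-relative and absolute links are all handled by the same call, so the loop does not need to care which form the site uses.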

    base_url = 'http://***.netbian.com/'
    for sub_url, title in sub_url_list:
        s_page = base_url + sub_url
        s_resp = requests.get(s_page)
        # Save one detail page so its XPath can be checked offline
        with open('s.html', 'wb') as f:
            f.write(s_resp.content)

4.4 Locating the image link and downloading
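In this step the saved file is named after the page title, with an extension taken from the image URL. Both inputs come from the website, so a title containing a slash or an image URL carrying a query string can produce a path that cannot be written. The sketch below shows one defensive way to derive the two parts; image_suffix and safe_filename are hypothetical helpers, and the default extension of jpg is an assumption.

```python
import os
import re
from urllib.parse import urlparse


def image_suffix(img_url, default="jpg"):
    """Take the extension from the URL path only, ignoring any ?query part."""
    path = urlparse(img_url).path
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return ext or default


def safe_filename(title, suffix):
    """Replace characters that are illegal in file names on common systems."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("_")
    return f"{cleaned or 'untitled'}.{suffix}"


print(safe_filename("a/b: c?", image_suffix("http://example.com/p/pic.JPG?v=2")))
```

Non-ASCII titles pass through unchanged, since only the listed punctuation and whitespace are replaced.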

        # Still inside the for loop from 4.3
        s_tree = etree.HTML(s_resp.content)
        img = s_tree.xpath('/html/body/div[2]/div[2]/div[3]/div/p/a/img/@src')[0]
        suffix = img.split('.')[-1]
        img_content = requests.get(img).content
        # The ./image directory must already exist
        with open(f'./image/{title}.{suffix}', 'wb') as f:
            f.write(img_content)

[Figure: the result after the download finishes]
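If a saved file refuses to open, one common cause is that the server returned something other than an image (an error page or an anti-hotlinking placeholder, for example) and those bytes were written under a .jpg name. This is a general possibility, not a diagnosis of this particular site. A quick check is to look at the first bytes of the download before saving; sniff_image_type below is a hypothetical helper that recognises only a few common formats.

```python
def sniff_image_type(data):
    """Guess the image format from its leading bytes, or return None."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


# An HTML error page saved as "picture.jpg" would be caught here.
print(sniff_image_type(b"<!DOCTYPE html><html>403 Forbidden</html>"))
```

In the download loop, img_content could be passed through this check and skipped or logged when the result is None.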

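4.5 Complete source code

    import os

    import requests
    from lxml import etree

    if __name__ == '__main__':
        # Step 1: fetch the home page and keep a copy for XPath analysis
        url = 'http://***.netbian.com/meinv/'
        resp = requests.get(url)
        with open('index.html', 'wb') as f:
            f.write(resp.content)

        # Step 2: collect (detail link, title) pairs from the image list
        tree = etree.HTML(resp.content)
        node_list = tree.xpath('/html/body/div[2]/div[2]/div[3]/ul/li')
        sub_url_list = []
        for node in node_list:
            if len(node.xpath('./a/@href')) > 0 and len(node.xpath('./a/b/text()')) > 0:
                sub_url = node.xpath('./a/@href')[0]
                title = node.xpath('./a/b/text()')[0]
                sub_url_list.append((sub_url, title))

        # Step 3: visit each detail page, find the full-size image, save it
        base_url = 'http://***.netbian.com/'
        os.makedirs('./image', exist_ok=True)  # open() fails if the folder is missing
        for sub_url, title in sub_url_list:
            s_page = base_url + sub_url
            s_resp = requests.get(s_page)
            s_tree = etree.HTML(s_resp.content)
            img = s_tree.xpath('/html/body/div[2]/div[2]/div[3]/div/p/a/img/@src')[0]
            suffix = img.split('.')[-1]
            img_content = requests.get(img).content
            with open(f'./image/{title}.{suffix}', 'wb') as f:
                f.write(img_content)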


Mom never has to worry about my studies again.

This article was contributed by 暗夜殘星 and does not represent the views of 舒华文档. If you repost it, please credit the source: https://www.chinashuhua.cn/24/635318.html
