Web Scraping with BeautifulSoup: Basic Usage and Worked Examples (with Source Code) - 伙伴云

Reader submission 814 2025-04-01

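1. Overview of parsing libraries for web scraping

2. Basic usage of BeautifulSoup

3. Tag selectors

3.1 Selecting elements

3.2 Getting the tag name

3.3 Getting attributes

3.4 Getting content

3.5 Nested selection

4. Child and descendant nodes

5. Parent and ancestor nodes

6. Sibling nodes

7. Standard selectors

7.1 The text argument

7.2 find(name, attrs, recursive, text, **kwargs)

8. CSS selectors

8.1 Getting attributes

8.2 Getting content

9. Summary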

1. Overview of parsing libraries for web scraping

2. Basic usage of BeautifulSoup
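
As a minimal sketch of the basic workflow (assuming the `bs4` package is installed): build a `BeautifulSoup` object from a markup string and a parser name, then read values off the resulting tree. The markup below is a made-up fragment with several closing tags deliberately left off, and `html.parser` from the standard library stands in for lxml so that the sketch has no extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical, deliberately incomplete markup: </p>, </body> and </html> are missing.
broken_html = "<html><head><title>Demo page</title></head><body><p class='intro'>Hello, <b>world</b>"

soup = BeautifulSoup(broken_html, "html.parser")

# The unclosed tags are closed when the tree is built, so navigation still works.
title_text = soup.title.string
intro_text = soup.p.get_text()

print(title_text)
print(intro_text)
print(soup.prettify())  # the tree re-serialised with indentation and closing tags
```

If lxml is installed, passing `'lxml'` as the second argument should work the same way for this fragment.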

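3. Tag selectors

3.1 Selecting elements

    from bs4 import BeautifulSoup

    # Sample markup adapted from the Beautiful Soup documentation; attribute values are illustrative
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title" name="dromouse"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    """

    soup = BeautifulSoup(html, 'lxml')
    print(soup.title)        # the <title> tag
    print(type(soup.title))  # the result is a bs4.element.Tag
    print(soup.head)         # the <head> tag with its contents
    print(soup.p)            # only the first <p> tag is returned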

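One detail is worth seeing in isolation: attribute-style access such as `soup.p` returns only the first matching tag, and a tag that does not occur in the document yields `None` rather than an error. A small sketch with made-up markup, using `html.parser` so that it runs without lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two <p> tags and no <table>.
soup = BeautifulSoup("<p>first</p><p>second</p>", "html.parser")

first_p = soup.p        # only the first <p> is returned
no_table = soup.table   # absent tag: None, no exception

print(first_p)
print(no_table)
```
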
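3.2 Getting the tag name

    from bs4 import BeautifulSoup

    # Sample markup adapted from the Beautiful Soup documentation; attribute values are illustrative
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title" name="dromouse"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    """

    soup = BeautifulSoup(html, 'lxml')
    print(soup.title.name)  # the name of the tag itself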

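3.3 Getting attributes

    from bs4 import BeautifulSoup

    # Sample markup adapted from the Beautiful Soup documentation; attribute values are illustrative
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title" name="dromouse"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    """

    soup = BeautifulSoup(html, 'lxml')
    print(soup.p.attrs['name'])  # look the attribute up in the attrs dict
    print(soup.p['name'])        # shorthand for the same lookup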

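Two behaviours of attribute access are easy to trip over: `class` is a multi-valued attribute and comes back as a list, and indexing with a missing key raises `KeyError` while `.get()` returns `None`. The sketch below shows both on a made-up one-tag document, with `html.parser` standing in for lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical one-tag document for illustration.
soup = BeautifulSoup('<p class="title main" name="demo">text</p>', "html.parser")
p = soup.p

name_value = p["name"]    # value of the name attribute (p.name would be the tag name)
class_value = p["class"]  # class is multi-valued, so a list is returned
missing = p.get("id")     # .get() returns None instead of raising KeyError
all_attrs = p.attrs       # dict of every attribute on the tag

print(name_value, class_value, missing, all_attrs)
```
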
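3.4 Getting content

    from bs4 import BeautifulSoup

    # Sample markup adapted from the Beautiful Soup documentation; attribute values are illustrative
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title" name="dromouse"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    """

    soup = BeautifulSoup(html, 'lxml')
    print(soup.p.string)  # the text inside the first <p>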

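`.string` only yields text when a tag has a single child (or a single chain of only children); with mixed content it is `None`, and `get_text()` is the usual alternative. A short sketch on made-up markup, using `html.parser` so that it runs without lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <p> with a single child, one with mixed content.
soup = BeautifulSoup(
    "<p><b>Bold only</b></p><p>Plain <i>and</i> italic</p>", "html.parser"
)
first, second = soup.find_all("p")

single = first.string       # single chain p > b > text, so the text is returned
mixed = second.string       # several children, so .string is None
joined = second.get_text()  # concatenates every text node under the tag

print(single, mixed, joined)
```
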
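3.5 Nested selection

    from bs4 import BeautifulSoup

    # Sample markup adapted from the Beautiful Soup documentation; attribute values are illustrative
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title" name="dromouse"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    """

    soup = BeautifulSoup(html, 'lxml')
    print(soup.head.title.string)  # selectors can be chained: head, then title, then its text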


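4. Child and descendant nodes

    from bs4 import BeautifulSoup

    # Sample markup adapted from the Beautiful Soup documentation; attribute values are illustrative
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="story">
    Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1"><span>Elsie</span></a>
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
    and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
    and they lived at the bottom of a well.
    </p>
    <p class="story">...</p>
    """

    soup = BeautifulSoup(html, 'lxml')
    print(soup.p.contents)  # list of the direct children of the first <p>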

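    from bs4 import BeautifulSoup

    # Sample markup adapted from the Beautiful Soup documentation; attribute values are illustrative
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="story">
    Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1"><span>Elsie</span></a>
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
    and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
    and they lived at the bottom of a well.
    </p>
    <p class="story">...</p>
    """

    soup = BeautifulSoup(html, 'lxml')
    print(soup.p.descendants)  # a generator, not a list
    for i, child in enumerate(soup.p.descendants):
        print(i, child)        # every nested node, in document order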

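The difference between direct children and descendants is easier to see on a tiny tree. The sketch below uses made-up markup and `html.parser` (so that it runs without lxml); `.children` is the iterator counterpart of `.contents`, and `.descendants` also walks into nested tags.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup: the second <li> contains a nested <b>.
soup = BeautifulSoup("<ul><li>one</li><li><b>two</b></li></ul>", "html.parser")
ul = soup.ul

# Direct children only: the two <li> tags.
direct = [child.name for child in ul.children]

# Descendants include nested tags; text nodes are filtered out here.
nested = [node.name for node in ul.descendants if isinstance(node, Tag)]

print(direct)
print(nested)
```
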
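5. Parent and ancestor nodes

    from bs4 import BeautifulSoup

    # Sample markup adapted from the Beautiful Soup documentation; attribute values are illustrative
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="story">
    Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1"><span>Elsie</span></a>
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
    and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
    and they lived at the bottom of a well.
    </p>
    <p class="story">...</p>
    """

    soup = BeautifulSoup(html, 'lxml')
    print(soup.a.parent)  # the <p> that directly contains the first <a>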

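    from bs4 import BeautifulSoup

    # Sample markup adapted from the Beautiful Soup documentation; attribute values are illustrative
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="story">
    Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1"><span>Elsie</span></a>
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
    and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
    and they lived at the bottom of a well.
    </p>
    <p class="story">...</p>
    """

    soup = BeautifulSoup(html, 'lxml')
    print(list(enumerate(soup.a.parents)))  # every ancestor, from the nearest outwards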

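Printing whole ancestors produces a lot of output, so a sketch that prints only their tag names may make the order clearer. The markup is made up and `html.parser` is used so that the sketch runs without lxml; the last ancestor is the `BeautifulSoup` object itself, whose name is `[document]`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with a single link nested three levels deep.
soup = BeautifulSoup(
    "<html><body><p><a href='#'>link</a></p></body></html>", "html.parser"
)

# .parents walks outwards from the nearest ancestor to the document root.
ancestor_names = [parent.name for parent in soup.a.parents]

print(ancestor_names)
```
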
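6. Sibling nodes

    from bs4 import BeautifulSoup

    # Sample markup adapted from the Beautiful Soup documentation; attribute values are illustrative
    html = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="story">
    Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1"><span>Elsie</span></a>
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
    and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
    and they lived at the bottom of a well.
    </p>
    <p class="story">...</p>
    """

    soup = BeautifulSoup(html, 'lxml')
    print(list(enumerate(soup.a.next_siblings)))      # siblings after the first <a>
    print(list(enumerate(soup.a.previous_siblings)))  # siblings before the first <a>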

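Sibling iterators return text nodes as well as tags, which is often surprising. The following sketch (made-up markup, `html.parser` so that it runs without lxml) contrasts the raw iterator with `find_next_siblings()`, which returns only matching tags.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three links separated by plain text.
soup = BeautifulSoup(
    "<p><a id='a1'>one</a>, <a id='a2'>two</a> and <a id='a3'>three</a></p>",
    "html.parser",
)
first = soup.a

raw = list(first.next_siblings)  # text nodes and tags alike
tag_ids = [a["id"] for a in first.find_next_siblings("a")]  # <a> tags only

print(raw)
print(tag_ids)
```
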
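7. Standard selectors

7.1 The text argument

    from bs4 import BeautifulSoup

    # Sample markup; the list items are illustrative
    html = '''
    <div class="panel">
    <div class="panel-heading"><h4>Hello</h4></div>
    <div class="panel-body">
    <ul class="list" id="list-1"><li class="element">Foo</li><li class="element">Bar</li></ul>
    <ul class="list" id="list-2"><li class="element">Foo</li><li class="element">Bar</li></ul>
    </div>
    </div>
    '''

    soup = BeautifulSoup(html, 'lxml')
    print(soup.find_all(text='Foo'))  # matches text nodes, not tags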
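
Since Beautiful Soup 4.4.0 the same filter is also available under the name `string`, which newer releases prefer over `text`. The sketch below assumes such a release; the markup is made up and `html.parser` is used so that it runs without lxml. It also shows that combining a tag name with `string` returns tags rather than text nodes.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical list for illustration.
soup = BeautifulSoup(
    "<ul><li>Foo</li><li>Bar</li><li>Foo bar</li></ul>", "html.parser"
)

exact = soup.find_all(string="Foo")                # text nodes equal to 'Foo'
partial = soup.find_all(string=re.compile("Foo"))  # text nodes containing 'Foo'
tags = soup.find_all("li", string="Foo")           # <li> tags whose .string is 'Foo'

print(exact, partial, tags)
```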

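7.2 find(name, attrs, recursive, text, **kwargs)

find() returns a single element (the first match); find_all() returns all matching elements.

    from bs4 import BeautifulSoup

    # Sample markup; the list items are illustrative
    html = '''
    <div class="panel">
    <div class="panel-heading"><h4>Hello</h4></div>
    <div class="panel-body">
    <ul class="list" id="list-1"><li class="element">Foo</li><li class="element">Bar</li></ul>
    <ul class="list" id="list-2"><li class="element">Foo</li><li class="element">Bar</li></ul>
    </div>
    </div>
    '''

    soup = BeautifulSoup(html, 'lxml')
    print(soup.find('ul'))        # the first <ul> only
    print(type(soup.find('ul')))  # a bs4.element.Tag
    print(soup.find('page'))      # None: no such tag in the document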
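
To make the contrast between the two methods concrete, here is a small sketch on made-up markup (with `html.parser`, so that it runs without lxml). It filters by tag name, by `class_`, by an `attrs` dict and with `limit`, and shows that `find()` returns `None` when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical list for illustration.
soup = BeautifulSoup(
    '<ul id="list-1"><li class="element">Foo</li><li class="element">Bar</li></ul>',
    "html.parser",
)

first_li = soup.find("li")                        # first match only
all_li = soup.find_all("li", class_="element")    # every match, as a list
by_attrs = soup.find_all(attrs={"id": "list-1"})  # filter on arbitrary attributes
limited = soup.find_all("li", limit=1)            # stop after one result
missing = soup.find("table")                      # no match: None

print(first_li, all_li, by_attrs, limited, missing)
```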

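8. CSS selectors

Pass a CSS selector directly to select() to make a selection.

    from bs4 import BeautifulSoup

    # Sample markup; class names and ids follow the selectors used below, the list items are illustrative
    html = '''
    <div class="panel">
    <div class="panel-heading"><h4>Hello</h4></div>
    <div class="panel-body">
    <ul class="list" id="list-1"><li class="element">Foo</li><li class="element">Bar</li></ul>
    <ul class="list" id="list-2"><li class="element">Foo</li><li class="element">Bar</li></ul>
    </div>
    </div>
    '''

    soup = BeautifulSoup(html, 'lxml')
    print(soup.select('.panel .panel-heading'))  # descendant selector on class names
    print(soup.select('ul li'))                  # every <li> inside a <ul>
    print(soup.select('#list-2 .element'))       # by id, then by class
    print(type(soup.select('ul')[0]))            # each result is a bs4.element.Tag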

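    from bs4 import BeautifulSoup

    # Sample markup; the list items are illustrative
    html = '''
    <div class="panel">
    <div class="panel-heading"><h4>Hello</h4></div>
    <div class="panel-body">
    <ul class="list" id="list-1"><li class="element">Foo</li><li class="element">Bar</li></ul>
    <ul class="list" id="list-2"><li class="element">Foo</li><li class="element">Bar</li></ul>
    </div>
    </div>
    '''

    soup = BeautifulSoup(html, 'lxml')
    for ul in soup.select('ul'):
        print(ul.select('li'))  # select() can be called again on each result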
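
When only the first match is needed, `select_one()` plays the same role for CSS selectors that `find()` plays for the standard ones. The sketch below uses made-up markup and `html.parser` (so that it runs without lxml), and also shows the child combinator `>`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two lists.
soup = BeautifulSoup(
    '<ul id="list-1"><li>Foo</li><li>Bar</li></ul><ul id="list-2"><li>Baz</li></ul>',
    "html.parser",
)

first_li = soup.select_one("li")      # first match only, or None
direct = soup.select("#list-2 > li")  # <li> tags that are direct children of #list-2
missing = soup.select_one("table")    # no match: None

print(first_li, direct, missing)
```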

8.1 Getting attributes
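
No listing survives for this subsection, so the following is only a sketch of one plausible way to do it: tags returned by `select()` support the same attribute access as any other tag. The markup is made up and `html.parser` is used so that the sketch runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two lists that differ only in their id.
soup = BeautifulSoup(
    '<ul id="list-1"><li>Foo</li></ul><ul id="list-2"><li>Bar</li></ul>',
    "html.parser",
)

ids = [ul["id"] for ul in soup.select("ul")]                  # index the tag directly
ids_via_attrs = [ul.attrs["id"] for ul in soup.select("ul")]  # or go through the attrs dict

print(ids)
print(ids_via_attrs)
```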

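8.2 Getting content

    from bs4 import BeautifulSoup

    # Sample markup; the list items are illustrative
    html = '''
    <div class="panel">
    <div class="panel-heading"><h4>Hello</h4></div>
    <div class="panel-body">
    <ul class="list" id="list-1"><li class="element">Foo</li><li class="element">Bar</li></ul>
    <ul class="list" id="list-2"><li class="element">Foo</li><li class="element">Bar</li></ul>
    </div>
    </div>
    '''

    soup = BeautifulSoup(html, 'lxml')
    for li in soup.select('li'):
        print(li.get_text())  # the text content of each <li>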

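9. Summary

Prefer the lxml parser; fall back to html.parser when necessary.

Tag selectors (attribute-style access such as soup.title) offer weak filtering but are fast.

Use find() to match a single result and find_all() to match multiple results.

If you are comfortable with CSS selectors, use select().

Memorize the common ways of getting attribute values and text.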
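
The first recommendation can be sketched as a small helper. `make_soup` is a hypothetical name introduced here for illustration; the sketch relies on `BeautifulSoup` raising `FeatureNotFound` when the requested parser is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Parse with lxml when it is installed, otherwise with html.parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # lxml is not available, so use the parser from the standard library.
        return BeautifulSoup(markup, "html.parser")


soup = make_soup("<p class='intro'>hi</p>")
print(soup.p.string)
```

Note that the two parsers can build slightly different trees for fragments (lxml wraps them in `<html>` and `<body>`), so code that depends on the exact structure should not assume they are interchangeable.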
