How to adjust animation timing
2022-05-30
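1. Overview of parsing libraries for web scraping
2. Basic usage of BeautifulSoup
3. Tag selectors
3.1 Selecting elements
3.2 Getting the name
3.3 Getting attributes
3.4 Getting content
3.5 Nested selection
4. Child and descendant nodes
5. Parent and ancestor nodes
6. Sibling nodes
7. Standard selectors
7.1 The text attribute
7.2 find(name, attrs, recursive, text, **kwargs)
8. CSS selectors
8.1 Getting attributes
8.2 Getting content
9. Summary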
1. Overview of parsing libraries for web scraping
2. Basic usage of BeautifulSoup
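A minimal sketch of basic usage: build a BeautifulSoup object from a string, re-indent it, and read the title. The sample markup here is hypothetical, and the built-in html.parser is used only so the snippet runs without extra packages; 'lxml' can be passed instead if it is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; <body> and <html> are deliberately left unclosed.
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">...</p>
"""

# "html.parser" ships with Python; swap in "lxml" if it is installed.
soup = BeautifulSoup(html, "html.parser")

pretty = soup.prettify()        # re-indented markup as a string
title_text = soup.title.string  # text inside <title>

print(pretty)
print(title_text)
```

BeautifulSoup builds a complete tree even when closing tags are missing, so prettify() is a quick way to inspect what was actually parsed.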
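3. Tag selectors
3.1 Selecting elements
# Sample markup; tag and attribute details are assumed (only the text content is given).
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister"><!-- Elsie --></a>,
<a class="sister">Lacie</a> and
<a class="sister">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
print(soup.title)        # the <title> tag
print(type(soup.title))  # <class 'bs4.element.Tag'>
print(soup.head)         # the <head> tag with its children
print(soup.p)            # only the first <p> is returned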
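3.2 Getting the name
# Sample markup; tag and attribute details are assumed (only the text content is given).
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister"><!-- Elsie --></a>,
<a class="sister">Lacie</a> and
<a class="sister">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
print(soup.title.name)  # the tag's name: title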
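3.3 Getting attributes
# Sample markup; tag and attribute details are assumed (only the text content is given).
# The value of the name attribute in particular is a placeholder.
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister"><!-- Elsie --></a>,
<a class="sister">Lacie</a> and
<a class="sister">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
print(soup.p.attrs['name'])  # look up the attribute in the attrs dict
print(soup.p['name'])        # shorthand for the same lookup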
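3.4 Getting content
# Sample markup; tag and attribute details are assumed (only the text content is given).
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister"><!-- Elsie --></a>,
<a class="sister">Lacie</a> and
<a class="sister">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
print(soup.p.string)  # text of the first <p>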
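3.5 Nested selection
# Sample markup; tag and attribute details are assumed (only the text content is given).
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister"><!-- Elsie --></a>,
<a class="sister">Lacie</a> and
<a class="sister">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
print(soup.head.title.string)  # chain tag names to walk down the tree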
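4. Child and descendant nodes
# Sample markup; tag and attribute details are assumed (only the text content is given).
html = """
<html>
<body>
<p class="story">
Once upon a time there were three little sisters; and their names were
<a class="sister"><span>Elsie</span></a>
<a class="sister">Lacie</a>
and
<a class="sister">Tillie</a>
and they lived at the bottom of a well.
</p>
<p class="story">...</p>
"""
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
print(soup.p.contents)  # list of the direct children of the first <p>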
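# Sample markup; tag and attribute details are assumed (only the text content is given).
html = """
<html>
<body>
<p class="story">
Once upon a time there were three little sisters; and their names were
<a class="sister"><span>Elsie</span></a>
<a class="sister">Lacie</a>
and
<a class="sister">Tillie</a>
and they lived at the bottom of a well.
</p>
<p class="story">...</p>
"""
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
print(soup.p.descendants)  # a generator, not a list
for i, child in enumerate(soup.p.descendants):
    print(i, child)        # every node below <p>, at any depth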
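To make the difference between direct children and descendants concrete, here is a small sketch on a one-line hypothetical snippet. It uses contents, children and descendants; html.parser is chosen only so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line sample: <p> has three direct children, one of them nested.
soup = BeautifulSoup("<p>one <a><span>two</span></a> three</p>", "html.parser")
p = soup.p

direct = p.contents           # list of direct children only
same = list(p.children)       # iterator over the same direct children
nested = list(p.descendants)  # generator walking every level, depth-first

print(len(direct))  # 3: 'one ', <a>, ' three'
print(len(nested))  # 5: 'one ', <a>, <span>, 'two', ' three'
```

contents is convenient for indexing, while children and descendants are lazy and suit loops.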
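5. Parent and ancestor nodes
# Sample markup; tag and attribute details are assumed (only the text content is given).
html = """
<html>
<body>
<p class="story">
Once upon a time there were three little sisters; and their names were
<a class="sister"><span>Elsie</span></a>
<a class="sister">Lacie</a>
and
<a class="sister">Tillie</a>
and they lived at the bottom of a well.
</p>
<p class="story">...</p>
"""
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
print(soup.a.parent)  # the <p> that directly encloses the first <a>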
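# Sample markup; tag and attribute details are assumed (only the text content is given).
html = """
<html>
<body>
<p class="story">
Once upon a time there were three little sisters; and their names were
<a class="sister"><span>Elsie</span></a>
<a class="sister">Lacie</a>
and
<a class="sister">Tillie</a>
and they lived at the bottom of a well.
</p>
<p class="story">...</p>
"""
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
# parents is a generator over all ancestors, from <p> up to the document itself
print(list(enumerate(soup.a.parents)))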
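6. Sibling nodes
# Sample markup; tag and attribute details are assumed (only the text content is given).
html = """
<html>
<body>
<p class="story">
Once upon a time there were three little sisters; and their names were
<a class="sister"><span>Elsie</span></a>
<a class="sister">Lacie</a>
and
<a class="sister">Tillie</a>
and they lived at the bottom of a well.
</p>
<p class="story">...</p>
"""
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')
print(list(enumerate(soup.a.next_siblings)))      # nodes after the first <a> under the same parent
print(list(enumerate(soup.a.previous_siblings)))  # nodes before it under the same parent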
7. Standard selectors
7.1 The text attribute
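A minimal sketch of matching by text, assuming this heading refers to the text argument of find_all(). Beautiful Soup 4.4.0 and later call this argument string, with text as the older name, so string is used here. The sample markup is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration.
html = """
<div class="panel">
  <ul class="list" id="list-1">
    <li class="element">Foo</li>
    <li class="element">Bar</li>
  </ul>
  <ul class="list list-small" id="list-2">
    <li class="element">Foo</li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Matching on string returns the text nodes themselves, not the enclosing tags.
matches = soup.find_all(string="Foo")
print(matches)  # ['Foo', 'Foo']

# Step up to .parent to reach the tag that holds each matched string.
parent_names = [s.parent.name for s in matches]
print(parent_names)  # ['li', 'li']
```

This kind of match is mostly useful for locating content; to work with the element, go through .parent as shown.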
7.2 find(name, attrs, recursive, text, **kwargs)
find() returns a single element (the first match); find_all() returns all matching elements.
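The difference between the two calls can be sketched as follows. The sample markup is hypothetical, and html.parser is used so the snippet runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration.
html = """
<div class="panel">
  <ul class="list" id="list-1">
    <li class="element">Foo</li>
    <li class="element">Bar</li>
  </ul>
  <ul class="list list-small" id="list-2">
    <li class="element">Foo</li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

first_ul = soup.find("ul")                           # first match only, a single Tag
all_ul = soup.find_all("ul")                         # every match, a list-like ResultSet
missing = soup.find("table")                         # no match: find() returns None
by_attr = soup.find_all(attrs={"id": "list-2"})      # filter by attribute dict
by_class = soup.find_all(class_="element")           # class_ avoids the Python keyword

print(first_ul["id"])  # list-1
print(len(all_ul))     # 2
print(missing)         # None
print(len(by_class))   # 3
```

Because find() can return None, check the result before chaining further attribute access on it.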
8. CSS selectors
Pass a CSS selector directly to select() to make the selection.
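A short sketch of select() with class, tag and id selectors, on hypothetical sample markup. select_one() is also shown; it returns only the first match.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration.
html = """
<div class="panel">
  <ul class="list" id="list-1">
    <li class="element">Foo</li>
    <li class="element">Bar</li>
  </ul>
  <ul class="list list-small" id="list-2">
    <li class="element">Foo</li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

lists = soup.select(".panel .list")          # descendants by class
items = soup.select("ul li")                 # descendants by tag name
small = soup.select("#list-2 .element")      # by id, then class
first_item = soup.select_one("li")           # first match only, or None

print(len(lists))  # 2
print(len(items))  # 3
print(len(small))  # 1

# Results are Tag objects, so select() can be called again on each one.
for ul in soup.select("ul"):
    print(ul.select("li"))
```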
8.1 Getting attributes
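Elements returned by select() are ordinary Tag objects, so attributes are read the same way as with tag selectors. A minimal sketch on hypothetical sample markup:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration.
html = """
<div class="panel">
  <ul class="list" id="list-1">
    <li class="element">Foo</li>
  </ul>
  <ul class="list list-small" id="list-2">
    <li class="element">Bar</li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

for ul in soup.select("ul"):
    print(ul["id"])        # subscript access
    print(ul.attrs["id"])  # same value through the attrs dict

ids = [ul["id"] for ul in soup.select("ul")]
classes = soup.select_one("#list-2")["class"]   # class is multi-valued, so a list
missing = soup.select_one("ul").get("href")     # get() returns None instead of raising

print(ids)      # ['list-1', 'list-2']
print(classes)  # ['list', 'list-small']
print(missing)  # None
```

Subscript access raises KeyError for an absent attribute, so get() is the safer choice when the attribute is optional.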
8.2 Getting content
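A minimal sketch of reading text from selected elements with get_text(), compared with the string attribute used earlier. The sample markup is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration.
html = """
<div class="panel">
  <ul class="list" id="list-1">
    <li class="element">Foo</li>
    <li class="element">Bar</li>
  </ul>
  <ul class="list list-small" id="list-2">
    <li class="element">Foo</li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# get_text() concatenates all text below the element.
texts = [li.get_text() for li in soup.select("li")]
print(texts)  # ['Foo', 'Bar', 'Foo']

first_list = soup.select_one("#list-1")
joined = first_list.get_text(" ", strip=True)  # join stripped pieces with a space
only_string = first_list.string                # None: the <ul> has several children

print(joined)       # Foo Bar
print(only_string)  # None
```

string works only when an element has a single child, so get_text() is the more general choice for elements with nested content.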
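9. Summary
Prefer the lxml parser; fall back to html.parser when necessary.
Tag selectors offer weak filtering but are fast.
Use find() to match a single result and find_all() to match multiple results.
If you are familiar with CSS selectors, use select().
Remember the common ways to get attribute values (tag['name'], tag.attrs) and text values (tag.string, tag.get_text()).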
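The first recommendation can be sketched as a small helper that tries lxml and falls back to html.parser. The helper name make_soup is hypothetical; FeatureNotFound is the exception bs4 raises when the requested parser is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Parse with lxml when available, otherwise with the built-in parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # lxml is not installed; html.parser ships with Python.
        return BeautifulSoup(markup, "html.parser")


soup = make_soup("<p class='title'>Hello</p>")
print(soup.p.get_text())  # Hello
print(soup.p["class"])    # ['title']
```

The two parsers can build slightly different trees for broken markup, so results on malformed pages may differ depending on which one is used.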