Using spaCy

Reader submission, 2025-04-01

Official documentation

https://spacy.io/usage

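spaCy is a natural language processing toolkit for Python that appeared in mid-2014. It bills itself as "Industrial-Strength Natural Language Processing in Python". spaCy makes heavy use of Cython to speed up its core modules, which sets it apart from the more academically oriented NLTK and makes it practical for production use.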

Loading the model

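    # Download the English model first (on Windows, open CMD as administrator):
    #   python -m spacy download en_core_web_sm
    # The short name 'en' only works in spaCy 2.x; spaCy 3 requires the full package name.
    import spacy

    nlp = spacy.load('en_core_web_sm')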
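spaCy models are installed as ordinary Python packages, so one way to fail early with a clear message is to check that the package can be found before calling `spacy.load`. The sketch below uses only the standard library. The helper name `model_installed` is hypothetical, and `en_core_web_sm` (the small English model in current spaCy releases) is an assumed package name.

```python
import importlib.util


def model_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found."""
    return importlib.util.find_spec(package_name) is not None


name = "en_core_web_sm"  # assumed model package name
if model_installed(name):
    print(f"{name} is installed")
else:
    print(f"{name} is missing; run: python -m spacy download {name}")
```

The check only confirms that the package exists; it does not verify that the model version matches the installed spaCy version.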

Text processing

    doc = nlp('Weather is good, very windy and sunny. We have no classes in the afternoon.')

    # Tokenization: iterating over the Doc yields one token at a time
    for token in doc:
        print(token)

Output:

    Weather
    is
    good
    ,
    very
    windy
    and
    sunny
    .
    We
    have
    no
    classes
    in
    the
    afternoon
    .

    # Sentence segmentation
    for sent in doc.sents:
        print(sent)

Output:

    Weather is good, very windy and sunny.
    We have no classes in the afternoon.

Part-of-speech tagging. Tag reference: https://www.winwaed.com/blog/2011/11/08/part-of-speech-tags/

    # Print each token together with its coarse part-of-speech tag
    for token in doc:
        print('{}-{}'.format(token, token.pos_))

Output:

    Weather-PROPN
    is-VERB
    good-ADJ
    ,-PUNCT
    very-ADV
    windy-ADJ
    and-CCONJ
    sunny-ADJ
    .-PUNCT
    We-PRON
    have-VERB
    no-DET
    classes-NOUN
    in-ADP
    the-DET
    afternoon-NOUN
    .-PUNCT
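Once every token carries a `pos_` attribute, a frequency table of tags is a short step with `collections.Counter`. The following is a minimal sketch: `pos_counts` is a hypothetical helper, and the stand-in tokens built with `SimpleNamespace` only mimic the `pos_` attribute so that the snippet runs without a downloaded model. Their tags are copied from the second sentence of the output above.

```python
from collections import Counter
from types import SimpleNamespace


def pos_counts(tokens):
    """Count part-of-speech tags for any iterable of objects with a .pos_ attribute."""
    return Counter(token.pos_ for token in tokens)


# Stand-in tokens (not real spaCy objects); tags copied from the output above.
sample = [
    SimpleNamespace(text=t, pos_=p)
    for t, p in [("We", "PRON"), ("have", "VERB"), ("no", "DET"),
                 ("classes", "NOUN"), ("in", "ADP"), ("the", "DET"),
                 ("afternoon", "NOUN"), (".", "PUNCT")]
]

for tag, count in pos_counts(sample).most_common():
    print(tag, count)
```

With a model loaded, `pos_counts(doc)` should work the same way, since iterating over a spaCy `Doc` yields tokens that expose `pos_`.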


Named entity recognition

    doc_2 = nlp("I went to Paris where I met my old friend Jack from uni.")

    # Print each recognized entity with its label
    for ent in doc_2.ents:
        print('{}-{}'.format(ent, ent.label_))

Output:

    Paris-GPE
    Jack-PERSON

The entities can also be highlighted inline with displaCy in a Jupyter notebook:

    from spacy import displacy

    doc = nlp('I went to Paris where I met my old friend Jack from uni.')
    displacy.render(doc, style='ent', jupyter=True)
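When a text contains many entities, it is often handier to group them by label than to print them one by one. The sketch below shows one way to do that. `entities_by_label` is a hypothetical helper, and the two stand-in objects only mimic the `text` and `label_` attributes of spaCy entity spans so that the snippet runs without a model; their values are the two entities reported above.

```python
from collections import defaultdict
from types import SimpleNamespace


def entities_by_label(ents):
    """Map each entity label to the list of entity texts, in document order."""
    grouped = defaultdict(list)
    for ent in ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Stand-ins for the two entities reported above (not real spaCy spans).
sample_ents = [
    SimpleNamespace(text="Paris", label_="GPE"),
    SimpleNamespace(text="Jack", label_="PERSON"),
]
print(entities_by_label(sample_ents))
```

With a model loaded, the same function could be called as `entities_by_label(doc_2.ents)`.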

Exercise: find the names of all the characters in a book

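    def read_file(file_name):
        # Read the whole file into one string (assumes the file is UTF-8 encoded)
        with open(file_name, 'r', encoding='utf-8') as file:
            return file.read()

    # Load the text and run it through the pipeline
    text = read_file('./data/pride_and_prejudice.txt')
    processed_text = nlp(text)
    sentences = list(processed_text.sents)
    print(len(sentences))

Output:

    6469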

The book contains 6,469 sentences in total.

    from collections import Counter

    def find_person(doc):
        # Count how often each PERSON entity occurs, keyed by its lemma,
        # and return the ten most frequent names
        c = Counter()
        for ent in doc.ents:
            if ent.label_ == 'PERSON':
                c[ent.lemma_] += 1
        return c.most_common(10)

    print(find_person(processed_text))

Output:

    [('elizabeth', 604), ('darcy', 276), ('jane', 274), ('bennet', 233), ('bingley', 189), ('collins', 179), ('wickham', 170), ('gardiner', 95), ('lizzy', 94), ('lady catherine', 77)]
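The list above counts "elizabeth" and "lizzy" separately, although both refer to the same character. One possible refinement is to fold such variants together with a hand-written alias table. This is only a sketch: `ALIASES` and `merge_aliases` are hypothetical names, and the sample counts are copied from the output above.

```python
from collections import Counter

# Hypothetical alias table: variant name -> canonical name.
ALIASES = {"lizzy": "elizabeth"}


def merge_aliases(counts, aliases):
    """Return a Counter in which each alias's count is added to its canonical name."""
    merged = Counter()
    for name, n in counts:
        merged[aliases.get(name, name)] += n
    return merged


# (name, count) pairs taken from the output above.
top = [("elizabeth", 604), ("darcy", 276), ("jane", 274), ("lizzy", 94)]
print(merge_aliases(top, ALIASES).most_common(3))
```

A surname such as "bennet" is harder to resolve this way, because it can refer to several members of the same family.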

Done.
