Python 3 word frequency counting example
Category: Python basics tutorial   Published: July 2, 2019 09:05:13

Count the word frequencies in a passage of English text (you may choose any English text you like).

Requirement: output the 3 most frequent words and their counts, in descending order of frequency.
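One way to read this requirement is "sort by count, highest first, and keep the first three". The sketch below assumes a dictionary of word counts already exists. The function name `top_n`, the toy counts, and the alphabetical tie-break are all illustrative choices. The exercise itself does not say how ties should be handled.

```python
def top_n(counts, n=3):
    """Return the n most frequent (word, count) pairs, highest count first.

    Words with equal counts are ordered alphabetically (an assumed rule;
    the exercise does not specify a tie-break).
    """
    # Negating the count makes one ascending sort put high counts first
    # while still ordering tied words from A to Z.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


# Toy counts for illustration only.
print(top_n({"pear": 5, "fig": 3, "apple": 3, "kiwi": 1}))
# [('pear', 5), ('apple', 3), ('fig', 3)]
```

Making the tie-break explicit matters because several words can share the cut-off count. Without a rule, which of them lands in the top three is decided by an accident of the sort.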

Hint: this exercise combines basic knowledge of strings, lists, and dictionaries.

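Analysis:
1. Once we have the multi-line string, we extract every word from it into a list. Along the way we remove punctuation, spaces, and newlines. We also normalize letter case so that, for example, "I" and "i" count as the same word.
2. With the string reduced to a list of words, we need each word paired with its number of occurrences. A dictionary models this directly, with the word as the key and the count as the value.
3. With the word-to-count dictionary in hand, we still need the three most frequent words. A dictionary does not keep its entries sorted by value; since Python 3.7 it only preserves insertion order. So we convert it into a list, putting each key-value pair into a tuple with the count first, and sort that list of tuples by count.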
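Step 1 can also be done in a single call with a regular expression instead of replace/split/strip. This is a swapped-in technique, not the article's method. The sketch assumes a word is a run of ASCII letters, optionally followed by an apostrophe and more letters. The function name `extract_words` is hypothetical.

```python
import re


def extract_words(text):
    """Lowercase text and return its words as a list.

    Assumed word definition: a run of ASCII letters, optionally followed by
    an apostrophe and more letters (so "don't" stays one word). Punctuation,
    spaces, newlines and hyphens all act as separators, so "self-evident"
    becomes "self" and "evident".
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


sample = 'We hold these truths to be self-evident,\nthat all men are created equal.'
print(extract_words(sample))
# ['we', 'hold', 'these', 'truths', 'to', 'be', 'self', 'evident',
#  'that', 'all', 'men', 'are', 'created', 'equal']
```

Because only letters are matched, there are no empty strings or stray punctuation to filter out afterwards. The trade-off is that non-ASCII letters (for example accented characters) would split a word under this assumed pattern.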


The Python word-frequency code is as follows:

Chapter = """
I have a dream that one day this nation will rise up and live out the true meaning of its creed: "We hold these truths to be self-evident, that all men are created equal.

I have a dream that one day on the red hills of Georgia, the sons of former slaves and the sons of former slave owners will be able to sit down together at the table of brotherhood.

I have a dream that one day even the state of Mississippi, a state sweltering with the heat of injustice, sweltering with the heat of oppression, will be transformed into an oasis of freedom and justice.
"""
word_list = []
word_dict = {}
tuple_list = []
# Replace hyphens and newlines in the multi-line string with spaces
Chapter = Chapter.replace("-", " ")
Chapter = Chapter.replace("\n", " ")

# Split the string on spaces into a list of words
Chapter_list = Chapter.split(" ")

# Walk the list: lowercase each token and strip surrounding punctuation and spaces
for i in Chapter_list:
    word = i.lower().strip(' ,.!:"')
    if word != "":  # skip empty tokens left by consecutive spaces
        word_list.append(word)

# Count occurrences: get() returns 0 for a word not yet in the dict
for word in word_list:
    word_dict[word] = word_dict.get(word, 0) + 1

# Build (count, word) tuples so that sorting compares counts first
for word, num in word_dict.items():
    tuple_list.append((num, word))

# Sort in descending order; equal counts are then ordered by word, in reverse alphabetical order
tuple_list.sort(reverse=True)
print(tuple_list[:3])   # [(9, 'of'), (8, 'the'), (4, 'that')]


Output:

[(9, 'of'), (8, 'the'), (4, 'that')]
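For comparison, the standard library's `collections.Counter` can do the counting and the ranking in one step. The sketch below swaps the article's replace/split/strip steps for a regular expression that treats any run of letters as a word. The function name `top_words` is hypothetical.

```python
import re
from collections import Counter


def top_words(text, n=3):
    """Return the n most frequent words in text as (word, count) pairs.

    Tokenization is an assumed regex (any run of ASCII letters is a word),
    not the article's replace/split/strip approach.
    """
    words = re.findall(r"[a-z]+", text.lower())
    # most_common(n) returns pairs sorted by count, highest first;
    # words with equal counts keep the order in which they were first seen.
    return Counter(words).most_common(n)


print(top_words("the cat and the hat and the bat"))
# [('the', 3), ('and', 2), ('cat', 1)]
```

On the article's `Chapter` text this returns `[('of', 9), ('the', 8), ('a', 4)]`. The third entry differs from the output above because "a" and "that" both occur 4 times. `most_common` keeps tied words in first-seen order, and "a" appears first. The article's `reverse=True` sort on `(count, word)` tuples instead puts "that" ahead of "a".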
