A word cloud is a visual presentation of the “keywords” that appear most frequently in a body of text. The keywords are rendered as a colorful, cloud-like picture, so that you can grasp the main themes of the text at a glance.
You can see many interesting word clouds on the Internet, as follows:
The principles of generating a word cloud are not complicated, and can be roughly divided into several steps:
First, segment the text data into words. This is also the first step in most NLP text processing. In wordcloud this is done by the process_text() method, which also filters out stop words.
Second, calculate the frequency of each word in the text and store the results in a hash table. Word-frequency calculation is the classic word count task, the customary first example on distributed computing platforms, with the same status that hello world programs have in programming languages.
Third, generate a picture layout in which the space given to each word is proportional to its frequency. The IntegralOccupancyMap class implements the layout algorithm and is the core of the word cloud visualization method.
Next, draw the words on the layout according to their frequencies. The core method is generate_from_frequencies(): whether you call generate() or generate_from_text(), execution eventually reaches generate_from_frequencies().
Finally, color each word in the word cloud. By default the colors are chosen at random.
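The steps above can be sketched in a few lines. This is a minimal illustration, not wordcloud's own implementation: the helper names count_words and render_cloud, the tiny stop-word set, and the regular expression are assumptions made for this example. Only render_cloud needs the wordcloud package, and it relies on the generate_from_frequencies() and to_file() methods described in this article.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set (an assumption; wordcloud ships a much
# larger STOPWORDS set of its own).
STOP_WORDS = {"the", "a", "and", "it", "to", "of"}


def count_words(text, stop_words=STOP_WORDS):
    """Steps 1 and 2: split the text into words, drop stop words, count."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)


def render_cloud(frequencies, filename):
    """Steps 3 to 5: let wordcloud lay out, draw and color the words."""
    from wordcloud import WordCloud  # imported here so counting works alone

    cloud = WordCloud(background_color="white")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(filename)


freqs = count_words("Let it go, let it go, turn away and slam the door")
print(freqs.most_common(2))
```

Calling render_cloud(freqs, 'sketch.png') would then hand the counted frequencies to wordcloud for layout, drawing and coloring; the file name is only a placeholder.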
Let’s make a word cloud with Python.
pip install wordcloud
Save the text to a file.
When generating a word cloud, wordcloud will use spaces or punctuation as delimiters to segment the target text by default.
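To see what splitting on spaces and punctuation looks like in practice, the sketch below tokenizes one line with a regular expression. The pattern is written in the spirit of wordcloud's default behaviour; treat the exact default as an assumption and check the documentation of your installed version. The sample sentence is made up for illustration.

```python
import re

# A word character followed by one or more word characters or apostrophes.
# Spaces and punctuation act as delimiters; one-letter words are skipped.
WORD_PATTERN = re.compile(r"\w[\w']+")

line = "Let it go, let it go! I don't care what they're going to say."
tokens = WORD_PATTERN.findall(line)
print(tokens)
```

Note that contractions such as "don't" stay in one piece, while the single letter "I" is dropped because the pattern requires at least two characters.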
The core of the wordcloud library is the WordCloud class, which encapsulates all of its functionality. To use it, instantiate a WordCloud object and call its generate(text) method to convert the text into a word cloud.
generate(text): generate a word cloud from text
to_file(filename): save the word cloud image to a file named filename
Read the text from an external file and use it to generate a word cloud
from wordcloud import WordCloud

# Read the text saved earlier; the file is closed automatically.
with open('let it go.txt', 'r') as f:
    text = f.read()

# Create a 1920x1080 canvas with a white background.
wc = WordCloud(background_color='white', width=1920, height=1080)
wc.generate_from_text(text)
wc.to_file('let it go.png')
A sample output is:
Most refinements to the word cloud can be made through the WordCloud constructor, which accepts more than twenty parameters.
Common parameters
• width: width of the word cloud image, default 400 pixels
• height: height of the word cloud image, default 200 pixels
• background_color: background color of the word cloud image, default black; for example, background_color='white'
• font_step: step by which the font size changes, default 1
• font_path: path to the font file to use, default None
• min_font_size: minimum font size, default 4
• max_font_size: maximum font size, default None, in which case it is derived from the image height
• max_words: maximum number of words, default 200
• stopwords: set of words that are not displayed, for example stopwords={"python", "java"}
• scale: scaling between computation and drawing, default 1; larger values give a higher-resolution, sharper image
• prefer_horizontal: floating point, default 0.90; the ratio of attempts to place a word horizontally rather than vertically. When the value is below 1, a word that does not fit horizontally is rotated to vertical
• relative_scaling: floating point, effective default 0.5; how strongly a word's font size depends on its frequency relative to the previous, more frequent word. With 0 only the rank of a word matters; with 1 a word that is twice as frequent is drawn twice as large
• mask: image array that specifies the shape of the word cloud, default None (a rectangle)
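To make relative_scaling less abstract, the sketch below shows one way a font size could shrink from one word to the next as the frequency drops. It is a simplified illustration modelled on the behaviour the wordcloud documentation describes, not the library's code; the function name next_font_size and the sample numbers are assumptions.

```python
def next_font_size(font_size, freq, last_freq, relative_scaling=0.5):
    """Return the font size for a word given the previous word's size.

    relative_scaling=0 ignores the frequency ratio altogether;
    relative_scaling=1 makes the size proportional to the frequency ratio.
    """
    ratio = freq / last_freq
    factor = relative_scaling * ratio + (1 - relative_scaling)
    return int(round(factor * font_size))


# A word half as frequent as the previous one, which was drawn at size 100.
for rs in (0, 0.5, 1):
    print(rs, next_font_size(100, freq=5, last_freq=10, relative_scaling=rs))
```

With these numbers the word keeps size 100 when relative_scaling is 0, drops to 75 at 0.5, and drops to 50 at 1, which matches the "twice as frequent, twice as large" reading of the parameter.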
Add a picture background to the word cloud
import numpy as np
from PIL import Image
from wordcloud import WordCloud, ImageColorGenerator

with open('let it go.txt', 'r') as f:
    text = f.read()

# Load the picture as an array; it serves as both the mask and the color source.
background_image = np.array(Image.open('C://Users/mcc/Desktop/Elsa.jpg'))
img_colors = ImageColorGenerator(background_image)

# With a mask, the cloud takes the mask's shape and size, so width and height are not needed.
wc = WordCloud(background_color='white', mask=background_image)
wc.generate_from_text(text)
# Recolor each word with the colors of the picture underneath it.
wc.recolor(color_func=img_colors)
wc.to_file('let it go.png')
A sample output is:
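The mask does not have to come from a photo. As a hedged sketch, the snippet below builds a circular mask directly with NumPy, assuming the convention that white (255) pixels are masked out and all other pixels may be drawn on. The variable names, the 300-pixel size and the radius are arbitrary choices for this example.

```python
import numpy as np

# Build a 300x300 mask: 255 (white) outside a circle, 0 inside it.
size, radius = 300, 130
y, x = np.ogrid[:size, :size]
outside = (x - size // 2) ** 2 + (y - size // 2) ** 2 > radius ** 2
circle_mask = (255 * outside).astype(np.uint8)

print(circle_mask.shape, circle_mask[150, 150], circle_mask[0, 0])
```

Passing mask=circle_mask to the WordCloud constructor in place of the picture array should then confine the words to the circle.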