Installing and Using the NLP Tool NLTK

Posted on 2021-10-21. Edited on 2024-04-20. In Tips.

This is an automatically translated post by LLM. The original post is in Chinese. If you find any translation errors, please leave a comment to help me improve the translation. Thanks!

Introduction to NLTK

NLTK is a leading platform for building Python programs that work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources (such as WordNet), along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning. It is one of the best-known natural language processing libraries for Python and offers its own corpora, part-of-speech taggers, tokenizers, and other tools.

Package Installation

First, install the NLTK package using pip:

pip install nltk

In China, you can use the Tsinghua PyPI mirror to speed up the installation:

pip install nltk -i https://pypi.tuna.tsinghua.edu.cn/simple
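
To confirm that the installation worked, a quick check along the following lines can help. This is only a minimal sketch: the helper name installed_version is made up for illustration, and it relies on the standard library's importlib.metadata (Python 3.8 or later) rather than on anything specific to NLTK.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a pip package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("nltk")
    if version is None:
        print("nltk is not installed in this environment")
    else:
        print(f"nltk {version} is installed")
```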

Downloading NLTK Data

After installing the package, you also need to download the NLTK data (corpora and models) before you can use most of its features. Open a Python prompt and run the following commands (or put them in a new Python file and run it):

import nltk
nltk.download()

The NLTK downloader window will appear. At first, its package list is blank. Click "refresh" in the lower right corner to display the nltk_data list.

Click "Download" in the lower left corner to start downloading the data. After the download is complete, you can use it normally.
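
On a machine without a graphical desktop, the same download can be done without the GUI by passing a package id to nltk.download(). The sketch below uses the stopwords corpus purely as an example package, and the helper name ensure_nltk_resource is invented for illustration. It first asks nltk.data.find() whether the data is already present and downloads only when it is missing.

```python
def ensure_nltk_resource(resource_path, package_id):
    """Download an NLTK data package only if it is not already available.

    resource_path: path as understood by nltk.data.find, e.g. "corpora/stopwords"
    package_id:    id passed to nltk.download, e.g. "stopwords"
    Returns True if the resource is available afterwards.
    """
    import nltk  # imported here so the helper can be defined before NLTK is loaded

    try:
        nltk.data.find(resource_path)  # raises LookupError when the data is missing
        return True
    except LookupError:
        return nltk.download(package_id, quiet=True)


if __name__ == "__main__":
    # Example package only; replace with whatever data your code needs.
    print(ensure_nltk_resource("corpora/stopwords", "stopwords"))
```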

Accelerated Download in China

When downloading from within China, you may run into DNS resolution failures or other errors during the download. The most convenient workaround is as follows:

Run one of the following commands to clone the nltk_data repository to your local machine (about 700 MB):

git clone https://github.com/nltk/nltk_data.git
# If you cannot connect to GitHub, you can also use one of the following links to clone
git clone http://gitclone.com/github.com/nltk/nltk_data.git
git clone https://hub.fastgit.org/nltk/nltk_data.git

Go into the cloned nltk_data directory and edit the index.xml file in it, replacing every occurrence of

https://raw.githubusercontent.com/nltk/nltk_data/gh-pages

with:

http://localhost:8000
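
Doing this replacement by hand is error-prone, so it may be easier to script it. The following is a minimal sketch using only the standard library. The function name point_index_at_local_server is hypothetical, and the two prefixes are the ones from the step above, written out with their assumed https and http schemes.

```python
from pathlib import Path

# Prefixes from the step above (schemes assumed to be https and http).
OLD_PREFIX = "https://raw.githubusercontent.com/nltk/nltk_data/gh-pages"
NEW_PREFIX = "http://localhost:8000"


def point_index_at_local_server(index_path, old=OLD_PREFIX, new=NEW_PREFIX):
    """Replace every occurrence of `old` with `new` in the index file.

    Returns the number of replacements made; the file is rewritten only
    when at least one occurrence was found.
    """
    path = Path(index_path)
    text = path.read_text(encoding="utf-8")
    count = text.count(old)
    if count:
        path.write_text(text.replace(old, new), encoding="utf-8")
    return count
```

Calling point_index_at_local_server("index.xml") from inside the cloned directory would rewrite the file in place. A return value of 0 suggests the file was already modified or uses a different prefix.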

Run the following command in this directory:

python -m http.server 8000

This starts an HTTP server on your local machine that serves the nltk_data files. The NLTK downloader can then fetch the files it needs from the local address.
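
If you would rather keep everything in a single Python session, a local server of this kind can also be started from code. This is a rough sketch, not the approach used in this post: serve_directory is a made-up helper built on the standard library's http.server, and it binds to 127.0.0.1 only, so the files are not exposed to other machines.

```python
import threading
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_directory(directory, port=8000):
    """Serve `directory` over HTTP on 127.0.0.1 in a background thread.

    Returns the server object; call .shutdown() on it to stop serving.
    """
    # `directory` tells the handler which folder to serve (Python 3.7+).
    handler = partial(SimpleHTTPRequestHandler, directory=str(directory))
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    # A daemon thread stops automatically when the Python session exits.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling serve_directory on the path of the cloned nltk_data directory should make the index available at http://localhost:8000/index.xml for as long as the session stays open.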

Run the following statements in Python again:

import nltk
nltk.download()

In the downloader window, change the Server Index address to http://localhost:8000/index.xml.

Click "refresh" and "Download" in turn to start the installation.
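
The server index can also be switched without the GUI. The sketch below assumes the local server from the previous step is still running on port 8000. The function name and the stopwords package id are illustrative only, while the Downloader class and its server_index_url argument come from nltk.downloader.

```python
LOCAL_INDEX_URL = "http://localhost:8000/index.xml"


def download_from_local_index(package_id, index_url=LOCAL_INDEX_URL):
    """Download one NLTK data package using a custom server index.

    Returns True on success, as reported by the NLTK downloader.
    """
    # Imported lazily so the helper can be defined before NLTK is loaded.
    from nltk.downloader import Downloader

    downloader = Downloader(server_index_url=index_url)
    return downloader.download(package_id)


if __name__ == "__main__":
    # Example package only; requires the local server to be running.
    print(download_from_local_index("stopwords"))
```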