As natural language processing (NLP) continues to develop, more and more people want to try it with Python.
Languages such as Python, Java, and C++ are widely used in the field of natural language processing.
In this article, we will introduce what you can do with natural language processing in Python, the libraries commonly used for it, and recommended ways to study.
Table of contents
- What you can do with natural language processing using Python
- Four Basic Analyses of Natural Language Processing
- Examples of natural language processing using Python
- How to do natural language processing with Python
- BERT for natural language processing
- Flow of natural language processing
- Libraries used for natural language processing
- Morphological analysis tool “MeCab”
- Recommended study methods for natural language processing with Python
- Learn on free learning sites & commentary sites
- Learning Natural Language Processing with Books – 3 Recommended Books for Beginners
What you can do with natural language processing using Python
Here, we will introduce some use cases of natural language processing.
Four Basic Analyses of Natural Language Processing
There are various types of basic analysis for natural language processing. Due to the nature of natural language processing, the processing differs greatly depending on the language to be handled.
Here, we will introduce basic analysis for processing Japanese.
- Morphological analysis
- Syntax analysis
- Semantic analysis
- Contextual analysis
I will explain each one.
(1) Morphological analysis
Morphological analysis refers to the process of dividing words and sentences into morphemes and classifying the divided words into parts of speech.
A morpheme is the smallest meaningful unit of language.
Simply put, if you morphologically analyze the sentence 「私は散歩に出かけた。」 (“I went for a walk.”), it is divided into 私 / は / 散歩 / に / 出かけ / た.
Then, assign a part of speech to each of the divided parts.
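Real morphological analyzers typically rely on a dictionary and look for the longest matching entry at each position. As a minimal illustration only (not a real analyzer), here is a toy longest-match tokenizer in Python; the tiny dictionary and part-of-speech tags are invented for this example.

```python
# Toy longest-match tokenizer: a greatly simplified sketch of how a
# morphological analyzer splits text via dictionary lookup.
# The dictionary below is invented for this example.
DICTIONARY = {
    "私": "pronoun",
    "は": "particle",
    "散歩": "noun",
    "に": "particle",
    "出かけ": "verb",
    "た": "auxiliary verb",
}

def tokenize(text: str) -> list[tuple[str, str]]:
    """Greedily match the longest dictionary entry at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible substring first.
        for j in range(len(text), i, -1):
            word = text[i:j]
            if word in DICTIONARY:
                tokens.append((word, DICTIONARY[word]))
                i = j
                break
        else:
            # Unknown character: emit it as-is.
            tokens.append((text[i], "unknown"))
            i += 1
    return tokens

print(tokenize("私は散歩に出かけた"))
# → [('私', 'pronoun'), ('は', 'particle'), ('散歩', 'noun'),
#    ('に', 'particle'), ('出かけ', 'verb'), ('た', 'auxiliary verb')]
```

Production analyzers such as MeCab use much larger dictionaries plus statistical costs to choose among competing segmentations, but the dictionary-lookup idea is the same.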
(2) Syntax analysis
Syntax analysis refers to the task of clarifying the relationships between words divided into morphemes.
It divides a sentence into parts, assigns parts of speech to each part, and clarifies the relationships between the parts of speech to understand the sentence.
Specifically, it identifies relationships between words, such as which word modifies which and which word complements which.
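These modifier relationships can be stored as head–dependent pairs. The sketch below uses a hand-built parse for a toy English sentence (a real parser such as spaCy would produce the structure automatically):

```python
# A dependency parse stored as dependent-index -> head-index pairs.
# The parse below is hand-built for illustration; a real parser
# would produce it automatically.
sentence = ["The", "black", "cat", "ate", "fish"]

# Each word's head; the root ("ate") points to itself.
heads = {0: 2, 1: 2, 2: 3, 3: 3, 4: 3}

def dependents_of(head_index: int) -> list[str]:
    """Return the words that directly modify the given head word."""
    return [sentence[i] for i, h in heads.items()
            if h == head_index and i != head_index]

# "cat" is modified by "The" and "black"; "ate" governs "cat" and "fish".
print(dependents_of(2))  # → ['The', 'black']
print(dependents_of(3))  # → ['cat', 'fish']
```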
(3) Semantic analysis
Semantic analysis is the process of correctly determining the meaning of words.
When one word has multiple meanings, the intended meaning changes depending on the sentence. Therefore, when there are multiple candidates, it is necessary to decide which meaning makes the sentence read naturally.
That process is semantic analysis.
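One classic approach to choosing among multiple senses is the Lesk algorithm: pick the sense whose dictionary gloss overlaps most with the surrounding context. A heavily simplified sketch follows; the two-sense inventory for “bank” is invented for this example.

```python
# Simplified Lesk-style word sense disambiguation: choose the sense
# whose gloss shares the most words with the sentence context.
# This toy sense inventory is invented for illustration.
SENSES = {
    "financial institution": "an institution that accepts deposits of money",
    "river bank": "sloping land beside a body of water such as a river",
}

def disambiguate(context: str) -> str:
    """Return the sense of 'bank' whose gloss overlaps the context most."""
    context_words = set(context.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES.items():
        overlap = len(context_words & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(disambiguate("I deposited money at the bank"))
# → 'financial institution'
print(disambiguate("we sat on the bank of the river"))
# → 'river bank'
```

Modern systems use contextual embeddings rather than raw word overlap, but the goal of resolving ambiguity from context is the same.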
(4) Context analysis
Context analysis, as the name suggests, analyzes the context.
After determining the meaning of words and the relationships between words, the next step is to analyze the relationships between sentences. However, since context analysis is far more complicated than word-level analysis, no system can yet analyze context fully, and it remains a topic for future research.
Examples of natural language processing using Python
Here are four examples of natural language processing using Python.
- Translation – DeepL
- Voice interaction system – Siri, Google Assistant
- Sentiment Analysis – Twitter
- Chatbot – ChatPlus
I will introduce each.
Translation – DeepL
DeepL is a translation tool that utilizes “deep learning”, an artificial intelligence technology.
DeepL can be used for free from a browser or app, and it translates with high accuracy because it uses natural language processing. There are many cases where simply translating word by word fails to convey the original meaning. By utilizing natural language processing, the original meaning can be conveyed even in complicated sentences.
Voice interaction system – Siri, Google Assistant
Siri and Google Assistant are AI-powered virtual assistants developed by Apple and Google, respectively.
Natural language processing is also used in virtual assistants that are now part of our lives. It is mainly used for the mechanism of “understanding human words and responding to them”.
Sentiment Analysis – Twitter
Twitter is a social networking service where you can post short messages.
Natural language processing can not only understand human words and respond to them, but also analyze human emotions from sentences. On Twitter and similar services, sentiment analysis is increasingly used in various situations, for example to judge whether posts are positive or negative.
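At its simplest, sentiment can be estimated by counting positive and negative words from a polarity lexicon. Real systems (e.g. VADER in NLTK, or machine-learned classifiers) are far more sophisticated; the tiny word lists below are invented purely for illustration.

```python
# Minimal lexicon-based sentiment scorer: count positive vs. negative
# words. The word lists are invented for this example.
POSITIVE = {"love", "great", "happy", "excellent", "good"}
NEGATIVE = {"hate", "terrible", "sad", "awful", "bad"}

def sentiment(text: str) -> str:
    """Classify text as positive, negative, or neutral by word counts."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this great new feature"))  # → 'positive'
print(sentiment("what a terrible, awful day"))     # → 'negative'
```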
Chatbot – ChatPlus
ChatPlus is an AI chatbot tool. It can automatically answer questions while linking with various tools. Since chatbots are programs that converse automatically, natural language processing is used to ensure that the dialogue feels natural.
How to do natural language processing with Python
In order to do natural language processing with Python, it is good to know about “flow of natural language processing” and “major libraries” mainly used.
I will also introduce recommended study methods for those who want to try natural language processing with Python.
BERT for natural language processing
“BERT” (Bidirectional Encoder Representations from Transformers) became a hot topic in 2018 for achieving the highest scores on natural language processing tasks such as translation and sentence classification.
Since then, BERT-based models have appeared one after another.
One of the features behind this attention is that BERT made it possible to take context into account when reading text.
Flow of natural language processing
This section introduces the flow of six processes performed in natural language processing.
- Machine-readable catalog [preprocessing]
- Corpus [preprocessing]
- Morphological analysis
- Syntax analysis
- Semantic analysis
- Contextual analysis
I will explain each.
(1) Machine-readable catalog
A machine-readable catalog is a standard used in book search systems in libraries and elsewhere.
A machine-readable catalog is used as preprocessing in natural language processing. It recognizes, reads, and converts characters, playing a dictionary-like role.
(2) Corpus
A corpus is a collection of linguistic data that accumulates and records how natural language is actually used.
As with the machine-readable catalog, the corpus is also preprocessed, and the corpus is analyzed to capture linguistic features for use in natural language processing.
(3) Morphological analysis
After preprocessing, proceed to the next stage. Morphological analysis is the task of breaking down a sentence into the smallest units and distinguishing between them.
Japanese mixes complicated words and grammar, so sentences need to be broken down and classified. As mentioned above, morphological analysis of the sentence 「私は散歩に出かけた。」 (“I went for a walk.”) divides it into 私 / は / 散歩 / に / 出かけ / た.
(4) Syntax analysis
In syntactic analysis, the words divided by morphological analysis are connected like a tree and diagrammed.
By schematizing the divided words like a tree, it is possible to clarify the relationship between the divided words and grasp the sentence.
(5) Semantic analysis
Semantic analysis refers to the process of assigning the appropriate meaning to each word during analysis.
The syntactic analysis stage clarifies the relationships between words, but if a word has multiple meanings, the connected result may not make sense. Semantic analysis therefore analyzes and selects the meanings of the connected words so that the sentence reads naturally.
(6) Context analysis
Finally, there is contextual analysis. Contextual analysis is the final stage of natural language processing in which sentences made up of various words are analyzed and processed, taking into account the connections between sentences and the information in the background.
However, contextual analysis requires extremely complex and difficult technology, and as mentioned above, it has not yet been put to practical use. However, if it is put to practical use, it will enable natural language processing with higher accuracy than now, so it is one of the technologies expected in the future.
Libraries used for natural language processing
Python has many AI-specific libraries.
Here we introduce the libraries mainly used in the field of language processing.
- Transformers
- NumPy
- pandas
- GiNZA
- NLTK
I will introduce each of them.
Transformers is based on the Transformer, a deep learning model announced in 2017. Compared to conventional models such as CNNs and RNNs, the Transformer does not process time-series data sequentially, which greatly reduces training time. It is not only fast to train but also highly accurate. The Transformers library (developed by Hugging Face) provides pretrained Transformer-based models such as BERT.
NumPy is a numerical calculation library for easily performing calculations in Python.
It is implemented in C, can handle the multi-dimensional arrays used in vector and matrix operations as well as calculations such as means, variances, and standard deviations, and performs these at high speed.
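For instance, basic statistics and a vectorized matrix operation with NumPy look like this (the data values are made up):

```python
import numpy as np

# Basic statistics on a one-dimensional array.
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(data.mean())  # → 3.0
print(data.var())   # → 2.0
print(data.std())   # ≈ 1.414

# Vectorized matrix-vector product: no explicit Python loops needed.
matrix = np.array([[1, 2], [3, 4]])
vector = np.array([1, 1])
print(matrix @ vector)  # → [3 7]
```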
Pandas is a library for efficient data analysis in Python.
It can read data in various formats, display statistics, analyze data, and graph it, and is very commonly used in data analysis.
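A small example of building a table and computing grouped statistics with pandas (the dataset is invented for this sketch):

```python
import pandas as pd

# A small invented dataset of product ratings.
df = pd.DataFrame({
    "product": ["A", "A", "B", "B"],
    "rating": [4, 5, 2, 3],
})

# Grouping and aggregation: mean rating per product.
means = df.groupby("product")["rating"].mean()
print(means)
# product
# A    4.5
# B    2.5
# Name: rating, dtype: float64
```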
GiNZA is a natural language processing library for Japanese.
It is characterized by providing Japanese analysis such as tokenization, dependency structure (dependency) analysis, and named entity extraction on top of the spaCy framework.
NLTK is a natural language library for English.
NLTK provides various functions such as text classification, semantic reasoning, and stemming. It can also parse English, tag words with parts of speech, and perform semantic and morphological analysis.
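For example, stemming with NLTK's Porter stemmer works out of the box once the `nltk` package is installed (most other NLTK features additionally require downloading data files):

```python
from nltk.stem import PorterStemmer

# Stemming reduces inflected English words to a common stem.
stemmer = PorterStemmer()
for word in ["running", "flies", "studies"]:
    print(word, "->", stemmer.stem(word))
# running -> run
# flies -> fli
# studies -> studi
```

Note that stems need not be dictionary words (“fli”, “studi”); they are identifiers for grouping related word forms.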
Morphological analysis tool “MeCab”
“MeCab” is an indispensable part of natural language processing that handles Japanese.
MeCab is an open-source morphological analysis engine.
Its basic policy is a general-purpose design that does not depend on languages, dictionaries, or corpora, so it can be used in many languages other than Python, such as C, C++, Ruby , and Java.
It runs faster than other morphological analysis engines such as ChaSen and Juman, and is the most popular Japanese morphological analysis engine.
Recommended study methods for natural language processing with Python
Python has attracted a great deal of attention due to its high readability and the abundance of AI-related libraries.
You can learn Python on your own from books and websites, or by attending a school. Of the many ways to learn, here we will introduce sites and books you can use to study by yourself.
Learn on free learning sites & commentary sites
- Progate
- paiza learning
- 100 Language Processing Knock 2020
- Qiita
Progate is a very popular learning site for learning the basics of Python.
Explanations are given in a slide format, and the feature is that there are many illustrations and diagrams. In addition to Python, you can learn various languages, and each has practice questions, so you can use it to check your proficiency.
paiza learning is a learning site for programming beginners, like Progate.
The big difference from Progate is that it gives very detailed instructions through videos. You can code along at the same pace as the narrator (a professional voice actor) while watching, so it is highly recommended for those who could not keep up with other learning sites such as Progate. In addition, you can efficiently study the kinds of questions asked in recruitment exams.
100 Language Processing Knock 2020
100 Language Processing Knock 2020 is a collection of 100 natural language processing exercises published by Tohoku University.
Overall, there are many good questions, so it is often used in companies, training, and study sessions, and you can acquire natural language processing skills while having fun.
Qiita is a service that allows engineers to record and share knowledge and skills.
Qiita is a very useful commentary site for program creators and those who are looking into ICT technology.
It contains not only explanations but also various information such as concrete code, execution methods, and how to build the necessary environment. Qiita also lets you post your own articles, so if you have the expertise, consider writing explanatory articles for others who are stuck.
Learning Natural Language Processing with Books – 3 Recommended Books for Beginners
Natural Language Processing [Revised Edition] (The Open University of Japan Teaching Materials)
Natural Language Processing (Revised Edition) is a teaching material used at the Open University of Japan.
Although there are no coding examples, it is recommended for those who want to get an overview of the field of natural language processing and understand the basics.
Introduction to natural language processing using machine learning and deep learning
Introduction to Natural Language Processing Using Machine Learning and Deep Learning is a teaching material that lets you learn everything from the basics of natural language processing and machine learning to their implementation. It also covers BERT, which is famous in natural language processing, so it is recommended for those who want to study that as well.
Learn by moving with Python Introduction to natural language processing
Learn by Moving with Python: Introduction to Natural Language Processing is a teaching material that covers a wide range of natural language processing, from preparing text data to analyzing and utilizing it. It also covers topics such as databases and web applications, so it is recommended for those who are already somewhat familiar with programming.
In this article, I introduced examples of using natural language processing and how to study with Python.
Natural language processing is a hot field in high demand, with large companies such as Apple and Google continuing to develop automatic translation and voice dialogue systems.
It is also an area where further development is expected.