# Character-Level Fully Convolutional Neural Network for Text Classification

Traditional convolutional neural networks have two weaknesses: the fully connected layer has too many parameters, and computation is inefficient. To address them, this paper applies the fully convolutional network and the global average pooling layer, both originally used in image processing, to text classification. The fully connected layer is removed, and convolutional layers combined with a global average pooling layer take its place. In addition, multi-scale convolution kernels modeled on the Inception structure reduce the number of parameters, speed up convergence, and improve classification accuracy. To avoid the curse of dimensionality and the slow training of word-level vectors, character-level vector representations are used, and a batch normalization layer replaces the Dropout layer to reduce overfitting. The model is evaluated on the test set with multiple metrics, which verifies its effectiveness. Compared with traditional models, the proposed model achieves better classification performance.

1. Introduction

2. Related Work

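In 2006, Hinton et al. [1] used a greedy layer-wise algorithm to initialize deep belief networks (DBNs) and introduced the concept of deep learning. Many deep learning models have been proposed since then; for example, He et al. [2] and Abdel-Hamid [3] used deep neural networks to achieve notable results in computer vision and speech recognition, respectively.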

3. Model Design and Algorithm Flow

3.1. Overall Model

Figure 1. Overall structure of the model

3.2. Full Convolution and Global Average Pooling

$\text{Time} \sim O\left(M^{2} \cdot K^{2} \cdot C_{\text{in}} \cdot C_{\text{out}}\right)$ (1)

$\text{Space} \sim O\left(K^{2} \cdot C_{\text{in}} \cdot C_{\text{out}}\right)$ (2)

$\text{Time} \sim O\left(1^{2} \cdot X^{2} \cdot C_{\text{in}} \cdot C_{\text{out}}\right)$ (3)

$\text{Space} \sim O\left(X^{2} \cdot C_{\text{in}} \cdot C_{\text{out}}\right)$ (4)

$\text{Time} \sim O\left(C_{\text{in}} \cdot C_{\text{out}}\right)$ (5)

$\text{Space} \sim O\left(C_{\text{in}} \cdot C_{\text{out}}\right)$ (6)
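To make the comparison in Eqs. (1) to (6) concrete, the sketch below evaluates each cost expression for example layer sizes. This section does not define the symbols, so we assume the usual meanings: M is the side length of a convolution's output feature map, K is the kernel side length, X is the side length of the feature map fed into the fully connected layer, and C_in and C_out are the input and output channel counts. For Eqs. (5) and (6) we assume each channel is pooled to one scalar and then mapped to C_out outputs, ignoring the cost of the averaging itself. The function names and the 8 × 8 × 256 input with 14 outputs are hypothetical illustration values, not the paper's configuration.

```python
def conv_cost(m, k, c_in, c_out):
    """Eqs. (1)-(2): multiply-accumulate count and weight count of a k x k
    convolution mapping c_in channels to c_out channels on an m x m output."""
    time = m ** 2 * k ** 2 * c_in * c_out
    space = k ** 2 * c_in * c_out
    return time, space


def fc_cost(x, c_in, c_out):
    """Eqs. (3)-(4): a fully connected layer over an x x x x c_in feature map
    acts like a convolution whose kernel covers the whole x x x input and
    whose output is a single 1 x 1 position."""
    return conv_cost(1, x, c_in, c_out)


def gap_cost(c_in, c_out):
    """Eqs. (5)-(6), under the assumption stated above: after global average
    pooling each channel is one scalar, so mapping c_in pooled values to
    c_out outputs costs c_in * c_out. The averaging itself is not counted."""
    cost = c_in * c_out
    return cost, cost


if __name__ == "__main__":
    # Hypothetical sizes: an 8 x 8 x 256 feature map and 14 output classes.
    _, fc_space = fc_cost(8, 256, 14)
    _, gap_space = gap_cost(256, 14)
    print(fc_space, gap_space, fc_space // gap_space)  # 229376 3584 64
```

With these sizes the fully connected layer needs X² = 64 times more weights than the pooled alternative. This ratio is the quantity by which Eq. (4) exceeds Eq. (6).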

3.3. Batch Normalization

$\mu_{c} \leftarrow \frac{1}{m}\sum_{i=1}^{m} x_{i}$ (7)

$\sigma_{c}^{2} \leftarrow \frac{1}{m}\sum_{i=1}^{m}\left(x_{i} - \mu_{c}\right)^{2}$ (8)

$\hat{x}_{i} \leftarrow \frac{x_{i} - \mu_{c}}{\sqrt{\sigma_{c}^{2} + \epsilon}}$ (9)

$y_{i} \leftarrow \gamma \hat{x}_{i} + \beta \equiv \mathrm{BN}_{\gamma,\beta}\left(x_{i}\right)$ (10)
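Eqs. (7) to (10) can be traced step by step with a small pure-Python sketch for the m values x_i of a single channel c in one mini-batch. It covers only the training-time computation. At inference, standard batch normalization replaces μ_c and σ_c² with running averages, and this sketch omits that step. The function name and the default ε = 1e-5 are our own choices, not values taken from the paper.

```python
import math


def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-time batch normalization of the m values xs of one channel c,
    following Eqs. (7)-(10)."""
    m = len(xs)
    mu = sum(xs) / m                                       # Eq. (7): mini-batch mean
    var = sum((x - mu) ** 2 for x in xs) / m               # Eq. (8): (biased) variance
    x_hat = [(x - mu) / math.sqrt(var + eps) for x in xs]  # Eq. (9): normalize
    return [gamma * xh + beta for xh in x_hat]             # Eq. (10): scale and shift


if __name__ == "__main__":
    ys = batch_norm([1.0, 2.0, 3.0, 4.0], gamma=2.0, beta=3.0)
    print([round(y, 3) for y in ys])
```

With γ = 1 and β = 0 the output has (approximately) zero mean and unit variance. The learnable γ and β then let the network rescale and shift that normalized distribution.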

3.4. Character Representation

3.5. Algorithm Implementation

$\text{loss}\left(p, f\right) = -\frac{1}{n}\sum_{i=1}^{n} p\left(x_{i}\right)\log f\left(x_{i}\right)$ (11)
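A direct transcription of Eq. (11) is sketched below. We read p(x_i) as the target probability and f(x_i) as the model's predicted probability for the i-th term. This section does not specify whether i ranges over samples or over classes, so the sketch applies the formula literally to two equal-length lists. The function name is hypothetical. The sketch takes probabilities directly rather than raw logits and treats 0 · log 0 as 0.

```python
import math


def cross_entropy_loss(p, f):
    """Eq. (11): loss(p, f) = -(1/n) * sum_i p(x_i) * log f(x_i).

    p[i] is the target probability p(x_i) and f[i] the predicted probability
    f(x_i). Terms with p[i] == 0 contribute nothing (0 * log 0 is taken as 0).
    """
    if len(p) != len(f):
        raise ValueError("p and f must have the same length")
    n = len(p)
    total = sum(pi * math.log(fi) for pi, fi in zip(p, f) if pi > 0)
    return -total / n


if __name__ == "__main__":
    # One-hot target on the first entry; the model assigns it probability 0.5.
    print(round(cross_entropy_loss([1, 0, 0], [0.5, 0.25, 0.25]), 3))  # 0.231
```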

4. Experiments

4.1. Hyperparameter Settings

Table 1. Experimental parameters and specifications

Table 2. Convolution kernel structure

4.2. Evaluation Criteria

4.3. Experimental Data

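THUCNews news dataset: this dataset was collected and organized by the Natural Language Processing Lab of Tsinghua University. It was generated by filtering Sina News articles published between 2005 and 2011 and contains 740,000 news documents, all in UTF-8 plain text. The documents were reorganized into 14 categories, such as lottery, education, entertainment, and technology, and each document is stored in the folder for its category (14 folders in total).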
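Because the corpus is stored as one folder per category, labels can be read directly from folder names. The sketch below shows one way to do this. It is not the authors' loader. It assumes a hypothetical layout of root/<category>/<document>.txt with UTF-8 text. The function names are illustrative, and the small corpus built in the demo stands in for the real dataset. Counting documents per folder like this is also one way to obtain a per-category sample distribution.

```python
import tempfile
from collections import Counter
from pathlib import Path


def load_category_folders(root):
    """Yield (category, text) pairs from a corpus assumed to be stored as
    root/<category>/<document>.txt, with the folder name used as the label."""
    root = Path(root)
    for category_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for doc in sorted(category_dir.glob("*.txt")):
            yield category_dir.name, doc.read_text(encoding="utf-8")


def count_per_category(pairs):
    """Count documents per category label."""
    return dict(Counter(category for category, _ in pairs))


if __name__ == "__main__":
    # Build a tiny throwaway corpus so the example runs without the real dataset.
    with tempfile.TemporaryDirectory() as tmp:
        for category, n_docs in [("lottery", 2), ("education", 1)]:
            folder = Path(tmp) / category
            folder.mkdir()
            for i in range(n_docs):
                (folder / f"{i}.txt").write_text("示例新闻", encoding="utf-8")
        print(count_per_category(load_category_folders(tmp)))
```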

Table 3. Data sample distribution

4.4. Experiments and Result Analysis

4.4.1. Validating the Effect of the Inception Structure and Full Convolution

Table 4. Comparison of test results of the model

Figure 2. Accuracy over time during training

Figure 3. Loss over time during training

4.4.2. Comparative Experiments

Table 5. Comparison of accuracy between Inception-CharFcn model and other models

5. Conclusion

