Open Access

Fig. A.3

Architecture of the capsule network for the S2S translation. The encoder consists of a convolutional layer, four capsule modules, and flattening, concatenation, and routing operations. The convolutional layer has five input channels, 128 output channels, 3 × 3 kernels, and 1 × 1 strides, and is followed by ReLU activation and batch normalization. Each of the first three capsule modules contains three convolutional capsule layers (Conv-Caps) with skip connections. The numbers next to each convolutional capsule layer give the number of capsules, the number of dimensions per capsule, the convolutional kernel size, and the stride. The last capsule module has a three-dimensional convolutional capsule layer (3D-Conv-Caps), in which convolution-based routing is applied three times. Each capsule module reduces the spatial dimensions through its 2 × 2 strides. The outputs of the third and fourth capsule modules, which have dimensions 8 × 8 × 32 × 8 and 4 × 4 × 32 × 8, respectively, are flattened and concatenated into a tensor of shape 2560 × 8. This tensor is fed into the final layer, which applies dynamic routing three times and outputs two vectors of length 16 (i.e. υa and υb). The decoder consists of a fully connected layer, a reshaping operation, and five transposed convolutional layers. It takes one of the vectors (i.e. υx) as input. The fully connected layer has 16 input dimensions and 20480 output dimensions; its output is then reshaped to 64 × 64 × 5. The numbers next to each transposed convolutional layer give the number of input channels, the number of output channels, the kernel size, and the stride. The fully connected layer and each of the first four transposed convolutional layers are followed by Parametric ReLU (PReLU) activation. "Same" padding is applied in all convolutional and transposed convolutional layers.
In the capsule network implementation that does not use the morphological classifications, the encoder outputs only one vector, which is passed directly to the decoder.
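
The tensor shapes quoted in the caption can be cross-checked with a few lines of arithmetic. The following is a minimal bookkeeping sketch, not the authors' implementation: the 64 × 64 input size is an assumption inferred from the 8 × 8 grid after the third capsule module, and all function and variable names are hypothetical.

```python
from math import prod

# Assumption (not stated in the caption): the input stamp is 64 x 64 pixels,
# inferred from the 8 x 8 grid after three stride-2 capsule modules.
INPUT_SIZE = 64
N_MODULES = 4      # capsule modules in the encoder
N_CAPS = 32        # capsules per spatial position in modules 3 and 4
CAPS_DIM = 8       # dimensions per capsule


def same_padding_output(size: int, stride: int) -> int:
    """Spatial output size of a "same"-padded convolution: ceil(size / stride)."""
    return -(-size // stride)


def encoder_grid_sizes(input_size: int = INPUT_SIZE,
                       n_modules: int = N_MODULES,
                       stride: int = 2) -> list:
    """Spatial grid size after each capsule module (each one strides by 2)."""
    sizes = []
    size = input_size
    for _ in range(n_modules):
        size = same_padding_output(size, stride)
        sizes.append(size)
    return sizes


def flattened_capsules(grid: int, n_caps: int = N_CAPS) -> int:
    """Number of capsules once a grid x grid x n_caps block is flattened."""
    return grid * grid * n_caps


grids = encoder_grid_sizes()

# Modules 3 and 4 are flattened and concatenated along the capsule axis.
routing_input = (
    flattened_capsules(grids[2]) + flattened_capsules(grids[3]),
    CAPS_DIM,
)

# The decoder's fully connected layer must emit enough values for the reshape.
decoder_fc_out = prod((64, 64, 5))

print("grid sizes after each module:", grids)
print("routing input shape:", routing_input)
print("fully connected output size:", decoder_fc_out)
```

Under the assumed input size, the grid shrinks from 64 to 32, 16, 8, and 4, which reproduces the 8 × 8 and 4 × 4 grids of the third and fourth modules. The two flattened blocks contribute 2048 and 512 capsules, giving the 2560 × 8 routing input, and 64 × 64 × 5 equals the 20480 outputs of the fully connected layer.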
