Each input was a 299 × 299 BODIPY505/515 image with 3 dimensions corresponding to red (R), green (G), and blue (B) colors, digitized to a 299 × 299 × 3 tensor. The input was gradually transformed by ResNet50 through 5 stages, with a total of 50 rectified linear unit activations providing the nonlinearity that is essential for CNNs. Between stages 2 and 5, a residual block structure was introduced to overcome vanishing and exploding gradients, which are notorious problems for deep CNN architectures. After 5 stages, the input was transformed to a 10 × 10 × 2048 tensor. Both height and width were greatly reduced, or summarized, as the number of dimensions increased from 3 to 2048, indicating that much more information was extracted beyond the original RGB pixel values. The tensor was then flattened to a 2048-element vector based on the 2048 features extracted by ResNet50.
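As an illustration (ours, not the authors' code), the stage-by-stage shrinking of the spatial dimensions can be traced with the standard convolution output-size formula; the kernel, stride, and padding values below are the canonical ResNet50 settings, assumed here rather than taken from the paper.

```python
# Sketch: trace how ResNet50 shrinks a 299 x 299 input to 10 x 10 using
# the standard convolution output-size formula. The kernel/stride/padding
# values are the canonical ResNet50 settings (an assumption; the paper
# does not list them explicitly).

def conv_out(size, kernel, stride, pad):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

h = 299
h = conv_out(h, kernel=7, stride=2, pad=3)    # stage 1: 7x7 stride-2 conv -> 150
stage1 = h                                    # matches the 150 x 150 x 64 tensor
h = conv_out(h, kernel=3, stride=2, pad=1)    # stage 2 opens with 3x3 max-pool -> 75
for _ in range(3):                            # stages 3-5 each halve the grid
    h = conv_out(h, kernel=1, stride=2, pad=0)  # via a stride-2 shortcut conv
print(stage1, h)  # 150 10 -> final feature map is 10 x 10 x 2048
```

Collapsing the 10 × 10 spatial grid (e.g., by global average pooling, a common choice) then leaves the 2048-element feature vector described above.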
The features extracted by ResNet50 in each stage are shown in Figure 3. In stage 1 the image was converted to a 150 × 150 × 64 tensor, and each dimension was visualized as a gray-scale block, with darker areas indicating higher values as the image was processed in deeper layers.
As the stages went deeper, the number of dimensions became larger. In stage 5 there were 2048 blocks of 10 × 10 size. Each dimension placed specific attention on the 2-dimensional input, allowing unique visualization.
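A minimal sketch (ours, not the authors' code) of how one such feature-map channel can be rendered as a gray-scale block, assuming the channel is a 2-D NumPy array; higher activations map to darker pixels, as in Figure 3.

```python
import numpy as np

def channel_to_gray(channel):
    """Map one feature-map channel to 8-bit gray, darker = higher value.

    `channel` is a 2-D NumPy array (e.g., 10 x 10 in stage 5). This
    rescaling for display is our assumption, not the authors' exact code.
    """
    lo, hi = channel.min(), channel.max()
    scaled = (channel - lo) / (hi - lo) if hi > lo else np.zeros_like(channel)
    return (255 * (1.0 - scaled)).astype(np.uint8)  # invert: high activation -> dark

gray = channel_to_gray(np.random.rand(10, 10))
```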
During training we randomly divided the training dataset into a training dataset (n = 632 images) and a validation dataset (n = 158 images) in a proportion of 8:2 to refine the model during training. To increase the size of the development dataset, we performed data augmentation (Fig. 4), in which each image was rotated and flipped to expand the amount of data 8-fold. Data augmentation is guided by expert knowledge,13 has been shown to be an effective method of improving the performance of image classification,14 and has been used in visual recognition studies for human diseases.15 Data augmentation was strictly performed only on the training dataset to improve the system's classification performance; testing data were not augmented. After augmentation, the development and validation datasets increased to 5056 and 1264 images, respectively.
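The 8-fold rotate-and-flip augmentation described above can be sketched as follows (our NumPy illustration; the authors' exact implementation is not given): the 4 right-angle rotations of an image, each with and without a horizontal flip, yield 8 variants.

```python
import numpy as np

def augment_8fold(image):
    """Return the 8 rotate/flip variants of a 2-D (or H x W x C) image."""
    variants = []
    for k in range(4):                       # 0, 90, 180, 270 degree rotations
        rotated = np.rot90(image, k)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))  # each rotation, also flipped
    return variants

aug = augment_8fold(np.arange(9).reshape(3, 3))
print(len(aug))  # 8
```

Applied to the 632 training images, this yields the 5056 augmented images reported above (632 × 8 = 5056).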
Before feeding the data into the CNN, the data were preprocessed to reduce noise and allow a better fit to the CNN through mean subtraction and normalization, which projects each value in the tensor onto a standard normal distribution. Preprocessing statistics were strictly computed only on the training data and then applied to the validation and test datasets to avoid potential bias.
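The mean-subtraction-and-normalization step can be sketched as below (our NumPy illustration with toy data): the statistics come from the training set only and are reused unchanged on validation and test data, as the text specifies.

```python
import numpy as np

# Toy stand-ins for the real image tensors (assumption: float arrays,
# shape N x H x W x C).
train = np.random.rand(20, 8, 8, 3)
test = np.random.rand(5, 8, 8, 3)

# Compute statistics on training data ONLY, per channel.
mean = train.mean(axis=(0, 1, 2))
std = train.std(axis=(0, 1, 2))

# Apply the SAME statistics to every split to avoid leaking test information.
train_norm = (train - mean) / std
test_norm = (test - mean) / std

# Training data now follow a standard normal distribution per channel.
print(np.allclose(train_norm.mean(axis=(0, 1, 2)), 0))  # True
```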
Zhu et al Applying a CNN-CAD system to determine invasion depth for endoscopic resection
Figure 2. Convolutional neural network computer-aided detection system architecture. A, Model architecture; B, residual block type A; C, residual block type B. CONV, convolution; BN, batch normalization; RELU, rectified linear units.
Figure 3. Feature extraction by ResNet50.
After constructing the CNN-CAD system, we used a test dataset consisting of 203 images to evaluate the classification accuracy of the system. Receiver operating characteristic curves were plotted by varying the classification threshold. The classification given by the CNN-CAD system (P0 or P1) was compared with the pathologic results based on
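A minimal sketch (ours) of how an ROC curve is traced by sweeping the decision threshold over the system's output probabilities; the scores and labels below are invented toy values, not data from the study.

```python
import numpy as np

def roc_points(scores, labels):
    """Compute (FPR, TPR) pairs by sweeping the decision threshold."""
    points = []
    for t in sorted(set(scores), reverse=True):
        pred = scores >= t  # classify as positive (e.g., P1) at this threshold
        tpr = (pred & (labels == 1)).sum() / (labels == 1).sum()
        fpr = (pred & (labels == 0)).sum() / (labels == 0).sum()
        points.append((fpr, tpr))
    return points

scores = np.array([0.9, 0.8, 0.4, 0.3, 0.1])  # toy model outputs
labels = np.array([1, 1, 0, 1, 0])            # toy ground truth
pts = roc_points(scores, labels)
```

Plotting these (FPR, TPR) pairs, one per threshold, gives the ROC curve described above.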