How to design a ‘Novel Contribution’ to existing Deep Learning research for academic publishing

Ph.D. scholars and Deep Learning enthusiasts very frequently ask me the following question:
“How do I invent a novel Deep Learning architecture for my research problem?”
Well, it’s a tricky question to answer.
Let’s first see which things are not considered a ‘significant’ contribution:
- Taking an already published DL architecture and just changing the dataset. Example: taking an architecture like VGG16 or ResNet and applying it to a new image classification challenge called “Human vs Aliens”. Even if the dataset is new and no one has applied VGG16 to it before, merely reusing a successful architecture on a different dataset is, in short, not considered a novel contribution.
- Increasing the size of the dataset with more samples (either by adding more real samples or through data augmentation) and showing a performance improvement. Again, adding more data merely helps the model deal better with future data. This is a good strategy in a DL product life cycle, but not a contribution for academic publishing.
- Making minor changes to a successful architecture (like VGG16 or ResNet) or to the training strategy. Such changes include: a) adding/removing nodes in a layer, b) adding/removing layers in the network, c) changing the optimizer (Adam, RMSProp, Momentum etc.) and showing an improvement in accuracy, d) changing the activation and loss functions, e) applying regularization strategies (such as dropout, augmentation etc.) and showing an accuracy improvement, f) training for more epochs, g) changing other hyper-parameters such as the learning rate, batch size etc. All of these are standard practices for arriving at a good model, but one of them, or even a combination, won’t be considered a big contribution by most reviewers (the sketch after this list illustrates these routine tweaks).
- Merely incorporating standard components such as residual layers, attention layers, or layer/batch normalization layers, without justifying why such an inclusion is required for solving the problem.
- Taking an existing state-of-the-art (SOTA) Deep Learning architecture and optimizing it for better speed using compression techniques such as pruning, low-rank factorization, knowledge distillation etc. Simply applying neural network compression techniques and claiming that the model is now fast enough for real-time use is not considered a big contribution for academic publishing (the sketch below ends with an example of such off-the-shelf pruning). However, it is good practice for model deployment in AI product development.
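To make these bullets concrete, here is a minimal sketch (PyTorch assumed; the network is a toy stand-in, not any real SOTA model) of the routine tweaks from the third bullet and the off-the-shelf pruning from the last one. All of this is sound engineering; none of it is a research contribution on its own.

```python
# A minimal sketch (PyTorch assumed) of routine, non-novel tweaks.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(            # toy stand-in for VGG16/ResNet etc.
    nn.Linear(784, 256),
    nn.ReLU(),                    # tweak (d): swap for nn.GELU(), etc.
    nn.Dropout(p=0.5),            # tweak (e): a standard regularizer
    nn.Linear(256, 10),
)

# Tweaks (c) and (g): changing the optimizer, learning rate, batch size...
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4)

loss_fn = nn.CrossEntropyLoss()   # tweak (d): loss function choice

for epoch in range(50):           # tweak (f): simply training longer
    ...                           # standard training loop goes here

# Last bullet: off-the-shelf compression. Prune 30% of the first layer's
# weights by L1 magnitude; smaller/faster, but not novel by itself.
prune.l1_unstructured(model[0], name="weight", amount=0.3)
```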
Now let's see what is considered a 'significant' contribution. A novel Deep Learning architecture means one of two things:
- Landmark DL architectures such as CNN, RNN, LSTM, GAN, AlexNet, ResNet, and BERT (such papers are very few in number).
- Modified versions of the landmark DL architectures, improved for speed, accuracy, or both, such as Faster R-CNN, YOLOv2, V-Net, ALBERT etc. (these constitute the majority of published papers).
To achieve the above, we just need a simple philosophy that we have been hearing since childhood:
“Necessity is the mother of invention”
It’s a simple proverb that does not require much explanation from me. But let’s decode what it means for Deep Learning research. Here is what we have to do to arrive at a novel solution in Deep Learning.
Step 1: Take the dataset of your research problem and run a recent state-of-the-art DL model (preferably one that is widely recognized by researchers) on all the test samples. On that dataset, the accuracy of SOTA models should not already be saturated, say at 95% or above. If the accuracy of existing models has saturated, then there is not much scope for improvement with ‘that’ dataset.
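As a minimal sketch of this evaluation step (PyTorch assumed; `model` and `test_loader` below are hypothetical stand-ins you would replace with the actual SOTA model and your test set):

```python
# Sketch of Step 1: measure where the SOTA baseline stands on YOUR test set.
import torch

@torch.no_grad()
def evaluate(model, test_loader, device="cpu"):
    model.eval()
    correct, total = 0, 0
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        preds = model(inputs).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return correct / total

# Hypothetical stand-ins; replace with your pretrained SOTA model and data.
model = torch.nn.Linear(32, 10)
test_loader = [(torch.randn(8, 32), torch.randint(0, 10, (8,)))]

print(f"SOTA baseline accuracy: {evaluate(model, test_loader):.2%}")
# If this is already ~95%+ (saturated), pick a harder dataset or problem.
```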
Step 2: Based on the evaluation, list the samples where the SOTA model fails. Among all the failed samples, find a common pattern. For example, if an image classification model fails on images taken in dark lighting or at night, then our focus is to solve this issue, i.e., classifying images of low illumination (the “Necessity” part of the proverb).
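Continuing the sketch from Step 1 (same hypothetical `model` and `test_loader`), one simple way to hunt for a common pattern is to collect the failed samples and compare a candidate statistic, here mean pixel intensity, to test the low-illumination hypothesis:

```python
# Sketch of Step 2: collect failure cases and probe one candidate pattern.
import torch

@torch.no_grad()
def collect_failures(model, test_loader, device="cpu"):
    model.eval()
    failed, passed = [], []
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        preds = model(inputs).argmax(dim=1)
        for x, ok in zip(inputs, preds == labels):
            (passed if ok else failed).append(x.cpu())
    return failed, passed

def mean_intensity(samples):   # crude brightness proxy for image data
    return torch.stack(samples).mean().item() if samples else float("nan")

failed, passed = collect_failures(model, test_loader)
print("failed:", mean_intensity(failed), "passed:", mean_intensity(passed))
# If the failed images are consistently darker, low illumination is the
# "necessity" that your architectural change should address.
```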
Step 3: Try out different strategies to solve the failed cases of the SOTA model. Here, ‘strategy’ includes modifying the architecture with different types of layers (such as convolution, dense, attention, residual, and batch normalization layers). Sometimes a successful strategy can be ‘inspired’ by a SOTA model from a different problem domain (but please cite their work 😊). Trying out every possible change to the original model that leads to better performance will finally result in a novel and better DL architecture (the “Invention” part of the proverb). This part needs a lot of experimental validation, detailed evaluation, and good decision making. Other strategies, including data augmentation, a better training setup, hyper-parameter tuning, regularization, and model compression, can be mentioned as minor contributions of the work in the paper.
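As one purely illustrative example of such an architectural change (a hypothetical design, not a recipe from the literature): if the failure pattern is low illumination, one might prepend a small learnable correction block to an existing backbone and then re-run the Step 1/Step 2 evaluation to see whether it actually recovers the failed cases:

```python
# Hypothetical Step 3 sketch: prepend a learnable "illumination correction"
# block to an existing backbone, then re-evaluate on the failure cases.
import torch.nn as nn
import torchvision

class IlluminationBlock(nn.Module):
    """Tiny residual block meant to brighten/normalize dark inputs."""
    def __init__(self, channels=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return x + self.conv(x)  # residual: learn a correction, keep the image

backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")  # or weights=None offline
model = nn.Sequential(IlluminationBlock(), backbone)
# The change only counts as a contribution if it measurably fixes the
# identified failure mode, backed by ablations against the plain backbone.
```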
Since saying is easy but doing is difficult, Step 3 above is challenging and has to be personally experienced.
One of my colleagues once asked me, “Hey, I learned SVM. Can I apply it to solve the face recognition problem and prove it is the best model for recognition?” I replied, “Apply (and learn) every possible model to solve the face recognition problem and figure out which model works best. Please do not search for research problems just to apply the model/tool that you learned. You learn the tools (ML/DL) to solve a problem; you don’t learn a tool and then search for a problem.” The same philosophy applies here: we apply strategies and build a DL architecture to solve an issue. We do not apply model-building strategies just to show that a proposed DL architecture is novel.
Eminent researchers do not arrive at a novel solution randomly. If we pay attention to their DL architectures, we can see that their models were proposed to resolve a major issue faced by previous solutions.
Arriving at a landmark architecture requires good knowledge of both the problem domain (the intelligent task) and the solution domain (Machine Learning or Deep Learning). But it is possible. And I wish you the best in proposing your novel and landmark DL architecture in the near future.
Thanks for your time!
Bye from Dr. Sridhar Swaminathan
Published on 01 September 2021