In this article, we compare two methods for acquiring explanatory text for deep learning-based API detection: Document Retrieval (DR) and Internet Search (IS). The DR method retrieves API documentation from a Windows API reference manual, while the IS method searches for API documentation on the internet. We evaluate both methods on the Aliyun Dataset and analyze how the length of the explanatory text relates to model performance.
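The two acquisition strategies can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the manual dictionary stands in for a locally parsed Windows API reference (DR), and `internet_search` is a stub where a real system would query a search engine and extract a documentation snippet (IS). All function names and sample texts here are illustrative assumptions.

```python
# Hypothetical sketch of the two explanatory-text acquisition strategies.
# DR looks an API name up in a locally parsed reference manual;
# IS would query the web (stubbed out here, no network access).

WINDOWS_API_MANUAL = {
    # name -> explanatory text, as if parsed from a local reference manual
    "CreateFileW": "Creates or opens a file or I/O device.",
    "VirtualAlloc": "Reserves, commits, or changes the state of pages.",
}

def document_retrieval(api_name):
    """DR: return the manual's text for the API, or None if unlisted."""
    return WINDOWS_API_MANUAL.get(api_name)

def internet_search(api_name):
    """IS: placeholder for a web query; a real system would call a
    search-engine API and extract the top documentation snippet."""
    return f"[web snippet for {api_name}]"

def explanatory_text(api_name):
    """Prefer the manual (DR); fall back to search (IS) for unseen APIs."""
    return document_retrieval(api_name) or internet_search(api_name)
```

The fallback order shown (manual first, then search) is one plausible way to combine the two sources; the article evaluates them as separate acquisition methods.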
Our findings show that the proposed method outperforms both TextCNN and BiLSTM under fine-tuning, with a missing rate of 94.53%. We also find that the representation our method yields is of high quality and generalizes well, so only the parameters of the subsequent modules need adjustment to adapt to a new dataset. TextCNN and BiLSTM, in contrast, must adjust parameters within the representation layer to fit an updated dataset distribution, and the limited sample size leaves their fine-tuning performance lower.
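The fine-tuning regime described above, where a general-purpose representation is kept fixed and only the subsequent modules are updated, can be sketched in PyTorch. This is a minimal illustration under assumed module names and sizes (`representation`, `classifier`), not the authors' architecture:

```python
import torch
import torch.nn as nn

class APIDetector(nn.Module):
    """Illustrative detector: a frozen representation layer followed by
    a small classifier head that is updated during fine-tuning."""

    def __init__(self, vocab_size=1000, embed_dim=128, num_classes=2):
        super().__init__()
        # Representation layer: frozen when adapting to a new dataset
        self.representation = nn.Sequential(
            nn.Embedding(vocab_size, embed_dim),
            nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4,
                                       batch_first=True),
        )
        # Subsequent module: the only parameters updated in fine-tuning
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):
        h = self.representation(token_ids)      # (batch, seq, embed_dim)
        return self.classifier(h.mean(dim=1))   # mean-pool, then classify

model = APIDetector()
# Freeze the representation; only the classifier remains trainable.
for p in model.representation.parameters():
    p.requires_grad = False
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)
```

A TextCNN or BiLSTM baseline would instead leave `requires_grad = True` on its representation parameters, which is the costlier adaptation the article contrasts against.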
To better understand these complex concepts, imagine a detective trying to solve a mystery. The Document Retrieval method is like retrieving clues from a manual, while the Internet Search method is like searching for additional information on the internet. Both methods can provide valuable insights, but the proposed method may have an advantage in terms of quality and generalization.
In conclusion, our study demonstrates that the proposed method outperforms existing methods in acquiring explanatory text for deep learning-based API detection, and its representation has both high quality and excellent generalization. This makes it a promising approach for improving the accuracy of API detection models.
Computer Science, Cryptography and Security