In this article, the authors propose ArthModel, a novel approach to enhancing the arithmetic skills of large language models (LLMs). The primary goal is to improve an LLM's ability to perform arithmetic calculations, making it more practical for real-world applications.
To achieve this, the authors decompose the model into three essential parts: dense-number and math-operator conversion, arithmetic calculation, and number-to-text conversion. Each part addresses a specific challenge and contributes to the overall performance of the model.
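To make the decomposition concrete, the following is a minimal sketch of how three such components might be chained; the class and parameter names are illustrative assumptions, not taken from the paper.

```python
class ArithmeticPipeline:
    """Illustrative three-stage decomposition: parse -> compute -> verbalize."""

    def __init__(self, converter, calculator, verbalizer):
        self.converter = converter    # dense-number / math-operator conversion submodel
        self.calculator = calculator  # arithmetic calculation submodel
        self.verbalizer = verbalizer  # number-to-text conversion submodel

    def answer(self, text: str) -> str:
        operands, ops = self.converter(text)     # stage 1: extract numbers and operators
        result = self.calculator(operands, ops)  # stage 2: perform the calculation
        return self.verbalizer(result)           # stage 3: render the result as text
```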
The dense-number and math-operator conversion submodel is inspired by RNNs and processes input sequences of dense numbers or mathematical operators. It has far fewer parameters than the LLM and converts each dense number into a set of output vectors indicating its validity and value.
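The paper does not spell out the converter's architecture, so the sketch below is only one plausible reading of "RNN-inspired": a small recurrent module (here a GRU, with assumed dimensions) that emits a validity score and a dense value vector per position.

```python
import torch
import torch.nn as nn

class DenseNumberConverter(nn.Module):
    """Hypothetical recurrent converter: scans token embeddings and emits,
    per position, a validity score (is this a number/operator?) and a dense
    value vector encoding the parsed quantity."""

    def __init__(self, embed_dim: int = 256, hidden_dim: int = 128, value_dim: int = 16):
        super().__init__()
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.validity_head = nn.Linear(hidden_dim, 1)        # scalar validity logit
        self.value_head = nn.Linear(hidden_dim, value_dim)   # dense value representation

    def forward(self, token_embeddings: torch.Tensor):
        # token_embeddings: (batch, seq_len, embed_dim)
        hidden, _ = self.rnn(token_embeddings)
        validity = torch.sigmoid(self.validity_head(hidden))  # (batch, seq_len, 1)
        values = self.value_head(hidden)                       # (batch, seq_len, value_dim)
        return validity, values
```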
The arithmetic calculation submodel performs the arithmetic on the converted dense numbers. It consists of a series of loops over the input tokens that apply the mathematical operations and update the output vectors accordingly.
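As a simplified stand-in for this loop (operating on plain floats rather than the paper's dense representations, and assuming only the four basic operators), the calculation stage can be pictured as:

```python
import operator

# Hypothetical operator table; the paper only names the four basic operations.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def evaluate(operands: list[float], ops: list[str]) -> float:
    """Looped evaluation over parsed operands and operators.
    Two passes give '*' and '/' precedence over '+' and '-'."""
    nums, pending = [operands[0]], []
    for op, rhs in zip(ops, operands[1:]):
        if op in ("*", "/"):
            nums[-1] = OPS[op](nums[-1], rhs)  # fold high-precedence ops immediately
        else:
            pending.append(op)
            nums.append(rhs)
    result = nums[0]
    for op, rhs in zip(pending, nums[1:]):
        result = OPS[op](result, rhs)          # fold remaining +/- left to right
    return result

print(evaluate([2.0, 3.0, 4.0], ["+", "*"]))   # 2 + 3 * 4 -> 14.0
```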
Finally, the number-to-text conversion submodel converts the calculated result back into text, allowing the model to generate written responses or descriptions of mathematical concepts.
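In its simplest form, this last stage amounts to rendering the numeric result as a token sequence the language model can emit; the formatter below is an assumed illustration, not the paper's method.

```python
def number_to_text(value: float, max_decimals: int = 6) -> str:
    """Render a computed value as text, e.g. 14.0 -> "14", 1/3 -> "0.333333"."""
    if value == int(value):
        return str(int(value))                    # drop a spurious trailing ".0"
    return f"{value:.{max_decimals}f}".rstrip("0")

print(number_to_text(14.0))     # "14"
print(number_to_text(1 / 3))    # "0.333333"
```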
Through extensive experiments, the authors demonstrate the effectiveness of ArthModel in enhancing the arithmetic skills of LLMs. The results show that ArthModel can perform various arithmetic operations, including addition, subtraction, multiplication, and division, with high accuracy.
In conclusion, ArthModel is a valuable contribution to the field of natural language processing and machine learning. By improving the arithmetic skills of LLMs, it paves the way for more sophisticated applications in areas such as mathematics education, financial analysis, and scientific computing.