Concept hierarchies are like a family tree for the words of a language: they describe how concepts relate to one another and how they fit into a larger structure. Previous studies have shown that language models capture some of these hierarchical relationships, but they often struggle with more complex ones. In this article, we explore different methods for generating concept representations from a definitional template, i.e., a sequence of tokens of the form "[CLS] [concept name] is defined as [definition]". We experiment with vectorial-based and prompt-based methods and show that both can capture hierarchical relationships. However, we also find that existing task-dependent evaluations have limitations, such as relying too heavily on a single relation type (hypernymy) while ignoring related relations like ancestors and siblings. To address this, we propose a new evaluation methodology that considers multiple relation types and provides a more comprehensive view of a model's ability to capture hierarchical knowledge. This approach gives a better picture of how language models represent concepts hierarchically and where they fall short.
Vectorial-Based Methods
Vectorial-based methods use the last-layer output of a language model to build concept representations. We consider two approaches: taking the special [CLS] token and averaging over all subtokens. The CLS method represents each concept with the final-layer vector of the [CLS] token, while the avg method represents it with the mean of the final-layer vectors of all subtokens in the sequence. We find that both methods can capture hierarchical relationships, but the avg method performs slightly better in our experiments; a minimal sketch of both is given below.
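The following sketch illustrates the two vectorial-based representations, assuming a BERT-style encoder from Hugging Face transformers; the model name and the exact wording of the definitional template are illustrative assumptions, not the original setup.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed encoder; the original experiments may use a different checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def concept_representations(name: str, definition: str):
    # Verbalize the concept; the tokenizer prepends [CLS] automatically.
    text = f"{name} is defined as {definition}"
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)

    # CLS method: the final-layer vector of the [CLS] token (position 0).
    cls_vec = hidden[0, 0]

    # avg method: mean over all subtoken vectors, weighted by the attention mask.
    mask = inputs["attention_mask"][0].unsqueeze(-1)  # (seq_len, 1)
    avg_vec = (hidden[0] * mask).sum(dim=0) / mask.sum()

    return cls_vec, avg_vec

cls_vec, avg_vec = concept_representations("dog", "a domesticated carnivorous mammal")
print(cls_vec.shape, avg_vec.shape)  # both torch.Size([768])
```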
Prompt-Based Methods
Prompt-based methods query the model with a verbalized sequence of tokens rather than extracting a vector. For these methods, we consider the LMScorer method [27], which scores sentences for factual accuracy. LMScorer computes a pseudo-likelihood for a sequence by masking each token in turn and scoring it given the rest of the sequence, both past and future tokens. We find that LMScorer can capture hierarchical relationships, but it is computationally expensive, requiring one forward pass per token, and it does not always outperform the vectorial-based methods.
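Below is a rough sketch of pseudo-log-likelihood scoring in the spirit of LMScorer [27]: each token is masked in turn and scored by a masked language model. The model name and the choice to average over token scores are assumptions for illustration.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Assumed masked language model; the original work may use a different one.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
mlm.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    # Mask one position at a time, skipping the boundary [CLS] and [SEP] tokens.
    for i in range(1, ids.size(0) - 1):
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = mlm(masked.unsqueeze(0)).logits[0, i]
        log_probs = torch.log_softmax(logits, dim=-1)
        total += log_probs[ids[i]].item()
    # Average over scored tokens so longer sentences are not penalized.
    return total / (ids.size(0) - 2)

# A plausible statement should score higher than an implausible one.
print(pseudo_log_likelihood("a dog is a kind of animal"))
print(pseudo_log_likelihood("a dog is a kind of fruit"))
```

Note that the loop performs one forward pass per token, which is exactly why this approach is computationally expensive.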
Evaluation Methodology
As noted above, existing task-dependent evaluations rely heavily on a single relation type (hypernymy) and ignore related relations such as ancestors and siblings. Our proposed evaluation methodology instead considers multiple relation types at once, giving a more comprehensive view of a model's ability to capture hierarchical knowledge. It uses a combination of cosine and Euclidean distances to measure the similarity between concept representations and evaluates performance across the relation types simultaneously, as sketched below.
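The sketch below shows the two distance measures applied to a target concept and candidates standing in different relations to it; the example vectors and relation labels are hypothetical, and a model that encodes the hierarchy should rank closer relatives as nearer.

```python
import torch
import torch.nn.functional as F

def cosine_distance(u: torch.Tensor, v: torch.Tensor) -> float:
    # 1 - cosine similarity, so smaller means more similar.
    return 1.0 - F.cosine_similarity(u, v, dim=0).item()

def euclidean_distance(u: torch.Tensor, v: torch.Tensor) -> float:
    return torch.dist(u, v, p=2).item()

# Hypothetical representations; in practice these come from the methods above.
target = torch.randn(768)
candidates = {
    "parent": torch.randn(768),
    "ancestor": torch.randn(768),
    "sibling": torch.randn(768),
    "unrelated": torch.randn(768),
}

for relation, vec in candidates.items():
    print(relation,
          round(cosine_distance(target, vec), 4),
          round(euclidean_distance(target, vec), 4))
```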
In conclusion, our study shows that both vectorial-based and prompt-based methods can capture hierarchical relationships in concept representations, but that existing task-dependent evaluations have limitations. An evaluation methodology that considers multiple relation types gives a better understanding of how language models represent concepts hierarchically and identifies areas for improvement. This work has important implications for improving the performance of language models on downstream tasks that require an understanding of hierarchical linguistic knowledge.