In this section, we describe how the selected search queries are sampled and annotated to build a taxonomy for portrait-mode video analysis. Similar queries are merged into a single final leaf-node category, and queries that overlap with existing categories are split or removed, yielding about 500 candidate categories organized in a three-layer hierarchy. Each candidate category has roughly 2 to 50 associated search queries, and the corresponding videos are obtained from their original sources without being stored or redistributed.
To address ethical and legal considerations, the annotation task is organized to give annotators reasonable workloads and fair compensation while adhering to human rights principles.
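As a rough illustration of how such a taxonomy might be represented, the sketch below models a three-layer hierarchy in which each leaf category holds the search queries merged into it. This is a minimal, hypothetical example: the category names, queries, and helper functions are our own illustrations and are not taken from the actual dataset or its tooling.

```python
from dataclasses import dataclass, field


@dataclass
class LeafCategory:
    """A leaf node of the taxonomy, holding the queries merged into it."""
    name: str
    queries: set = field(default_factory=set)

    def merge(self, other: "LeafCategory") -> None:
        # Fold a near-duplicate leaf's queries into this category.
        self.queries |= other.queries


# taxonomy[top_level][sub_level] -> list of leaf categories (three layers total).
# "food" / "cooking" / "baking bread" are hypothetical placeholder names.
taxonomy = {
    "food": {
        "cooking": [
            LeafCategory("baking bread", {"how to bake bread", "sourdough recipe"}),
        ],
    },
}


def add_query(top: str, sub: str, leaf_name: str, query: str) -> None:
    """Attach a search query to a leaf, creating intermediate nodes as needed."""
    leaves = taxonomy.setdefault(top, {}).setdefault(sub, [])
    for leaf in leaves:
        if leaf.name == leaf_name:
            # An existing leaf already covers this topic: merge instead of duplicating.
            leaf.queries.add(query)
            return
    leaves.append(LeafCategory(leaf_name, {query}))


# A new query that matches an existing leaf is simply merged into it,
# rather than spawning a separate, near-duplicate category.
add_query("food", "cooking", "baking bread", "easy bread recipe at home")
print(taxonomy["food"]["cooking"][0].queries)
```

In this sketch, deciding whether two queries are "similar" is left to the human annotators; the code only shows how merged queries would accumulate under a shared leaf node of the three-layer hierarchy.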
To make these steps easier to follow, we use everyday language and simple analogies. Building the taxonomy is a bit like planning a meal: each category is a dish, its search queries are the ingredients, and merging similar queries is like combining flavors so the final recipe stays balanced.
Throughout, we aim to balance simplicity with thoroughness, capturing the essence of the sampling and annotation process without oversimplifying it.