[2] Gemmeke, J. F., et al. (2017). AudioSet: An ontology and human-labeled dataset for audio events. ICASSP.
[5] Devlin, J., et al. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. NAACL.
[4] Zhou, B., et al. (2018). Movie genre classification via scene categorization. ACM MM.
[3] Rao, A., et al. (2020). SceneFormer: Inductive bias for video scene segmentation. ECCV.