[1] L. Ouyang et al., “Training language models to follow instructions with human feedback,” Advances in Neural Information Processing Systems (NeurIPS), vol. 35, pp. 27730–27744, 2022.
[2] Y. Wang, Y. Kordi, S. Mishra, A. Liu, N. A. Smith, D. Khashabi, and H. Hajishirzi, “Self-Instruct: Aligning language models with self-generated instructions,” arXiv preprint, arXiv:2212.10560, 2022.
[3] R. Taori et al., “Stanford Alpaca: An instruction-following LLaMA model,” Stanford Center for Research on Foundation Models, Tech. Rep., 2023.
[4] T. Wolf et al., “Transformers: State-of-the-art natural language processing,” in Proc. Conf. Empirical Methods in Natural Language Processing (EMNLP): System Demonstrations, 2020, pp. 38–45.
[5] P. Lewis et al., “Retrieval-augmented generation for knowledge-intensive NLP tasks,” Advances in Neural Information Processing Systems (NeurIPS), vol. 33, pp. 9459–9474, 2020.
[6] E. Nijkamp et al., “CodeGen: An open large language model for code with multi-turn program synthesis,” arXiv preprint, arXiv:2203.13474, 2022.
[7] C. Raffel et al., “Exploring the limits of transfer learning with a unified text-to-text transformer,” Journal of Machine Learning Research, vol. 21, no. 140, pp. 1–67, 2020.
[8] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proc. North American Chapter of the Association for Computational Linguistics (NAACL), 2019, pp. 4171–4186.
[9] J. Austin et al., “Program synthesis with large language models,” arXiv preprint, arXiv:2108.07732, 2021.
[10] J. Johnson, M. Douze, and H. Jégou, “Billion-scale similarity search with GPUs,” IEEE Transactions on Big Data, vol. 7, no. 3, pp. 535–547, 2021.
[11] M. Chen et al., “Evaluating large language models trained on code,” arXiv preprint, arXiv:2107.03374, 2021.
[12] J. Dodge, S. Gururangan, D. Card, R. Schwartz, and N. A. Smith, “Show your work: Improved reporting of experimental results,” in Proc. Conf. Empirical Methods in Natural Language Processing (EMNLP), 2019.
[13] A. Radford et al., “Language models are unsupervised multitask learners,” OpenAI, Tech. Rep., 2019.
[14] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” in Proc. Int. Conf. on Learning Representations (ICLR), 2019.
[15] Y. You et al., “Large batch optimization for deep learning: Training BERT in 76 minutes,” in Proc. Int. Conf. on Learning Representations (ICLR), 2020.