[1] Y. Li et al., “On the performance of deep reinforcement learning-based anti-jamming method confronting intelligent jammer,” Appl. Sci., vol. 9, no. 7, pp. 1–15, 2019, doi: 10.3390/app9071361.
[2] O. Naparstek and K. Cohen, “Deep multi-user reinforcement learning for distributed dynamic spectrum access,” IEEE Trans. Wirel. Commun., vol. 18, no. 1, pp. 310–323, 2018.
[3] L. Jia, F. Yao, Y. Sun, Y. Niu, and Y. Zhu, “Bayesian Stackelberg Game for Antijamming Transmission With Incomplete Information,” IEEE Commun. Lett., vol. 20, no. 10, pp. 1991–1994, 2016, doi: 10.1109/LCOMM.2016.2598808.
[4] M. K. Hanawal, M. J. Abdel-Rahman, and M. Krunz, “Joint Adaptation of Frequency Hopping and Transmission Rate for Anti-Jamming Wireless Systems,” IEEE Trans. Mob. Comput., vol. 15, no. 9, pp. 2247–2259, 2016, doi: 10.1109/TMC.2015.2492556.
[5] L. Xiao, T. Chen, J. Liu, and H. Dai, “Anti-jamming transmission stackelberg game with observation errors,” IEEE Commun. Lett., vol. 19, no. 6, pp. 949–952, 2015, doi: 10.1109/LCOMM.2015.2418776.
[6] H. Zhu, C. Fang, Y. Liu, C. Chen, M. Li, and X. S. Shen, “You Can Jam but You Cannot Hide: Defending Against Jamming Attacks for Geo-Location Database Driven Spectrum Sharing,” IEEE J. Sel. Areas Commun., vol. 34, no. 10, pp. 2723–2737, 2016, doi: 10.1109/JSAC.2016.2605799.
[7] X. Liu, Y. Xu, L. Jia, Q. Wu, and A. Anpalagan, “Anti-jamming communications using spectrum waterfall: A deep reinforcement learning approach,” IEEE Commun. Lett., vol. 22, no. 5, pp. 998–1001, 2018, doi: 10.1109/LCOMM.2018.2815018.
[8] B. Wang, Y. Wu, K. J. R. Liu, and T. C. Clancy, “An anti-jamming stochastic game for cognitive radio networks,” IEEE J. Sel. Areas Commun., vol. 29, no. 4, pp. 877–889, 2011, doi: 10.1109/JSAC.2011.110418.
[9] Y. Wu, B. Wang, K. J. R. Liu, and T. C. Clancy, “Anti-jamming games in multi-channel cognitive radio networks,” IEEE J. Sel. Areas Commun., vol. 30, no. 1, pp. 4–15, 2012, doi: 10.1109/JSAC.2012.120102.
[10] S. Machuzak and S. K. Jayaweera, “Reinforcement learning based anti-jamming with wideband autonomous cognitive radios,” 2016 IEEE/CIC Int. Conf. Commun. China, ICCC 2016, pp. 1–5, 2016, doi: 10.1109/ICCChina.2016.7636793.
[11] A. P. Badia et al., “Never Give Up: Learning Directed Exploration Strategies,” pp. 1–28, 2020.
[12] Y. Xiao, J. Hoffman, T. Xia, and C. Amato, “Learning Multi-Robot Decentralized Macro-Action-Based Policies via a Centralized Q-Net,” Proc. - IEEE Int. Conf. Robot. Autom., pp. 10695–10701, 2020, doi: 10.1109/ICRA40945.2020.9196684.
[13] K. K. Nguyen, T. Q. Duong, N. A. Vien, N.-A. Le-Khac, and M.-N. Nguyen, “Non-Cooperative Energy Efficient Power Allocation Game in D2D Communication: A Multi-Agent Deep Reinforcement Learning Approach,” IEEE Access, vol. 7, pp. 100480–100490, 2019, doi: 10.1109/access.2019.2930115.
[14] H. Li, “Multiagent Q-learning for aloha-like spectrum access in cognitive radio systems,” Eurasip J. Wirel. Commun. Netw., vol. 2010, 2010, doi: 10.1155/2010/876216.
[15] H. H. Chang, H. Song, Y. Yi, J. Zhang, H. He, and L. Liu, “Distributive dynamic spectrum access through deep reinforcement learning: A reservoir computing-based approach,” IEEE Internet Things J., vol. 6, no. 2, pp. 1938–1948, 2019, doi: 10.1109/JIOT.2018.2872441.
[16] X. Chen, Z. Zhao, and H. Zhang, “Stochastic power adaptation with multiagent reinforcement learning for cognitive wireless mesh networks,” IEEE Trans. Mob. Comput., vol. 12, no. 11, pp. 2155–2166, 2013, doi: 10.1109/TMC.2012.178.
[17] S. B. Janiar and V. Pourahmadi, “Deep-reinforcement learning for fair distributed dynamic spectrum access in wireless networks,” 2021 IEEE 18th Annu. Consum. Commun. Netw. Conf. CCNC 2021, 2021, doi: 10.1109/CCNC49032.2021.9369536.
[18] V. François-Lavet, P. Henderson, R. Islam, M. G. Bellemare, and J. Pineau, An introduction to deep reinforcement learning, vol. 11, no. 3–4. 2018.
[19] X. Zhu et al., “Dynamic Spectrum Anti-Jamming with Reinforcement Learning Based on Value Function Approximation,” IEEE Wirel. Commun. Lett., pp. 1–5, 2022, doi: 10.1109/LWC.2022.3228045.
[20] X. Liu, Y. Xu, L. Jia, Q. Wu, and A. Anpalagan, “Anti-jamming communications using spectrum waterfall: A deep reinforcement learning approach,” IEEE Commun. Lett., vol. 22, no. 5, pp. 998–1001, 2018, doi: 10.1109/LCOMM.2018.2815018.
[21] X. Chen, C. Wang, Z. Zhou, and K. Ross, “Randomized Ensembled Double Q-Learning: Learning Fast Without a Model,” pp. 1–25, 2021.
[22] H. van Seijen, M. Fatemi, and A. Tavakoli, “Using a logarithmic mapping to enable lower discount factors in reinforcement learning,” Adv. Neural Inf. Process. Syst., vol. 32, pp. 1–11, 2019.
[23] X. Song, P. Willett, S. Zhou, and P. B. Luh, “The MIMO radar and Jammer games,” IEEE Trans. Signal Process., vol. 60, no. 2, pp. 687–699, 2012, doi: 10.1109/TSP.2011.2169251.
[24] P. E. Black, “Greedy algorithm,” in Dictionary of Algorithms and Data Structures, U.S. Nat. Inst. Stand. Technol. Rep. 88, p. 95, 2012.
[25] 3rd Generation Partnership Project, “Technical Specification Group Radio Access Network; Evolved Universal Terrestrial Radio Access (E-UTRA); User Equipment (UE) conformance specification; Radio transmission and reception,” 2011.