Keywords: Q-learning, Mach-Zehnder interferometer, fidelity.

Questions:

  1. What is the structure of the neural net used for Q-learning?
  2. Section III: What differentiates the neural network used to calculate future return from the target network? It seems like the target network focuses on learning the immediate reward, whereas the original neural network calculates the expected return from the full trajectory.
  3. Section IV. A: How do we measure that the third excited Bloch state provides the best trade-off between large momentum splitting and high-frequency components?
  4. Section IV. A: What is the physical intuition behind the emission/absorption of photons happening only in multiples of . (I am assuming that they use .)