A Proximal Policy Optimization (PPO) agent was trained to control a quadrotor, selecting motor commands that stabilize the vehicle and track complex trajectories. The agent was trained to track a minimum-snap reference trajectory. The UAV closely followed the path, maintaining desired speeds of 3.56 body lengths/second and remaining within 0.5 m of the path in wind conditions up to 20 mph. The agent was also validated on other complex trajectories, tracking them closely despite being trained on a single reference path. Compared to PID controllers, the RL controller had a faster response time, converging to the desired path more quickly. PID tuning requires extensive manual effort and relies on linearization around the hover state, which leads to instabilities and overshoots not observed with the RL controller, since the learned policy captures the non-linear dynamics directly. However, the RL controller produced noisier motor commands, resulting in undesirable oscillatory behaviour not observed with PID.
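
As a rough illustration of this kind of training setup, the sketch below trains a PPO policy on a simplified trajectory-tracking environment using Gymnasium and Stable-Baselines3. The point-mass dynamics, reward weights, wind model, and training length are assumptions for illustration only; they stand in for the full quadrotor dynamics, minimum-snap reference, and tuned hyperparameters described above.

```python
# Minimal sketch (not the trained system above): PPO trajectory tracking on a
# simplified 2-D point-mass stand-in for the quadrotor.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO


class PointMassTrackingEnv(gym.Env):
    """Point mass driven by a thrust action, tracking a circular reference
    path under a constant wind disturbance (placeholder for 6-DOF dynamics)."""

    def __init__(self, dt=0.02, wind=(2.0, 0.0)):
        self.dt = dt
        self.wind = np.asarray(wind, dtype=np.float64)
        # Observation: position error (2), velocity (2), reference velocity (2)
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(6,), dtype=np.float32)
        # Action: normalized thrust in x and y
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)

    def _reference(self, t):
        # Circular reference trajectory and its velocity
        pos = np.array([np.cos(t), np.sin(t)])
        vel = np.array([-np.sin(t), np.cos(t)])
        return pos, vel

    def _obs(self):
        ref_pos, ref_vel = self._reference(self.t)
        return np.concatenate([ref_pos - self.pos, self.vel, ref_vel]).astype(np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0.0
        self.pos = np.array([1.0, 0.0])
        self.vel = np.zeros(2)
        return self._obs(), {}

    def step(self, action):
        # Double-integrator dynamics with wind modelled as an acceleration bias
        accel = 5.0 * np.asarray(action, dtype=np.float64) + 0.1 * self.wind
        self.vel = self.vel + accel * self.dt
        self.pos = self.pos + self.vel * self.dt
        self.t += self.dt
        ref_pos, _ = self._reference(self.t)
        err = np.linalg.norm(ref_pos - self.pos)
        # Reward: penalize tracking error and control effort
        reward = -err - 0.01 * float(np.sum(np.square(action)))
        truncated = self.t >= 20.0
        return self._obs(), reward, False, truncated, {}


if __name__ == "__main__":
    env = PointMassTrackingEnv()
    model = PPO("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=50_000)          # short run, for illustration only
    obs, _ = env.reset()
    action, _ = model.predict(obs, deterministic=True)
```

In the actual controller the observation would contain the full quadrotor state and the action the four motor commands; the point-mass environment here is only a placeholder to show how a PPO policy can be trained against a reference trajectory with a wind disturbance.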