Cool art style. I play a lot of rhythm games (and have made some simple ones as well).
The game lacks basic feedback for understanding the result of your input. There is a delay on the sound feedback and no visual feedback, which makes a rhythm game super hard to play: with no correction cycle during gameplay, you end up playing 'blindly'. The sound delay is a classic problem, because game engines usually add around 100 to 150 ms of delay to on-demand sound effects no matter what. For this style of gameplay, I would suggest scheduling the success sound to play exactly on the beat for every expected input, and then showing the visual feedback (perfect, good, bad) dynamically. This is the way I found the easiest. The hard (and correct) way is to write low-level code that sends the sound to the sound chip directly, bypassing the full sound pipeline, to reduce the response delay to a minimum. That usually requires C++ or some native plugin.
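To make the scheduled approach concrete, here is a minimal sketch in Python, not tied to any engine. The names `beat_times` and `judge` and the two window values are made up for illustration. It assumes you already know the song's BPM and that the list of expected beats is non-empty. In a real game the success sounds would be queued at these times with whatever scheduled-playback call your engine offers, and only the visual rating would depend on the input.

```python
from bisect import bisect_left

# Hypothetical judgment windows in seconds; tune these by playtesting.
PERFECT_WINDOW = 0.05
GOOD_WINDOW = 0.12


def beat_times(bpm, beats, start=0.0):
    """Times (in seconds) at which the success sound would be pre-scheduled."""
    seconds_per_beat = 60.0 / bpm
    return [start + i * seconds_per_beat for i in range(beats)]


def judge(input_time, expected):
    """Rate an input against the nearest expected beat in a sorted, non-empty list."""
    i = bisect_left(expected, input_time)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = expected[max(i - 1, 0):i + 1]
    offset = min(abs(input_time - t) for t in candidates)
    if offset <= PERFECT_WINDOW:
        return "perfect"
    if offset <= GOOD_WINDOW:
        return "good"
    return "bad"
```

For example, at 120 BPM the beats fall every 0.5 s, so `judge(0.52, beat_times(120, 4))` rates a press 20 ms after the second beat. Because the sound is already playing on the beat, any engine latency only affects the visual rating, which is much less noticeable.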
I made a Lockstep clone (from Rhythm Heaven) using the scheduled approach described above. You can also take a look at the last rhythm game I made, for the GMTK jam, on my itch page (Karaoke SQUAD).
The second point is the difficulty curve. The second level already adds a lot of off-beats and syncopations. I thought it was too early to go crazy like that.
Hope I can help! Good work on the project, congrats, and good luck if you keep working on it!