A powerful enough FPGA in the cartridge could emulate the V9938 and drive some ready-to-use LED matrix cube via SPI or whatever (I guess such cubes should be available) instead of rendering RGB signals.
But I guess you would have to patch the game.
The cube is only moving when the character is moving.
If you are emulating the V9938, you would know where the sprites are, so you can move the cube based on the character's position.
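To make the idea concrete, here is a rough Python sketch of how an emulated VDP could turn sprite positions into cube movement. It assumes the usual 4-bytes-per-sprite attribute table layout with Y in byte 0 and X in byte 1; the function names and the choice of sprite 0 as the player are made up for illustration, and a real game may well use a different sprite (or several) for the character.

```python
# Hypothetical helper names; only the byte layout (Y, X, pattern, extra)
# is taken from the VDP's sprite attribute table format.
PLAYER_SPRITE = 0      # assumption: the character is sprite 0
BYTES_PER_SPRITE = 4   # Y, X, pattern number, colour/reserved


def sprite_position(attribute_table, sprite_index=PLAYER_SPRITE):
    """Return (x, y) of one sprite from a raw copy of the attribute table."""
    base = sprite_index * BYTES_PER_SPRITE
    y = attribute_table[base]
    x = attribute_table[base + 1]
    return x, y


def movement(previous, current):
    """Return (dx, dy) between two (x, y) positions from consecutive frames."""
    return current[0] - previous[0], current[1] - previous[1]


# Two snapshots of the table, one frame apart: sprite 0 moves 2 pixels right.
frame_a = bytes([100, 50, 0, 15])
frame_b = bytes([100, 52, 0, 15])
delta = movement(sprite_position(frame_a), sprite_position(frame_b))
```

A non-zero `delta` would then be the signal to rotate or shift the cube; a zero one means the character stood still, so the cube stays put.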
Yes, and it's not very difficult.
Set up a Raspberry Pi (or almost any other computer you like) with Linux, and use a GPIO or other similar input that, when it receives a trigger signal, sends a command to a web browser to play that video from YouTube. Then write a small program on the MSX that sends a trigger output to the joystick port, which you connect to that input, and when you run that program the video will be played.
If that's not at all what you were thinking of, perhaps you need to get quite a lot more specific about exactly what you're wanting to do.
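For what it's worth, the Raspberry Pi side of that suggestion could look roughly like the sketch below. The edge detection is plain Python; the hardware part assumes the `gpiozero` library is installed and that the trigger is wired to GPIO 17, and the URL is only a placeholder. All of those are my assumptions, not something from this thread. Also keep in mind that the joystick port works at 5 V while the Pi's GPIO pins are 3.3 V, so you would most likely need a level shifter or a resistor divider in between.

```python
import webbrowser

VIDEO_URL = "https://www.youtube.com/watch?v=VIDEO_ID"  # placeholder URL
TRIGGER_PIN = 17                                        # assumed GPIO pin


def rising_edges(samples):
    """Yield the index of every low-to-high transition in 0/1 samples."""
    previous = 0
    for index, level in enumerate(samples):
        if level and not previous:
            yield index
        previous = level


def play_on_trigger(samples, open_url=webbrowser.open, url=VIDEO_URL):
    """Open the URL on the first rising edge; return True if it fired."""
    for _ in rising_edges(samples):
        open_url(url)
        return True
    return False


def main():
    """Hardware version: needs a Raspberry Pi with gpiozero installed."""
    from signal import pause
    from gpiozero import DigitalInputDevice

    trigger = DigitalInputDevice(TRIGGER_PIN)
    # Called by gpiozero whenever the input goes from inactive to active.
    trigger.when_activated = lambda: webbrowser.open(VIDEO_URL)
    pause()  # sleep until a signal arrives; callbacks run in the background
```

Call `main()` on the Pi to start waiting for the trigger. `play_on_trigger` is only there so the logic can be tried out on any machine, for example by feeding it a list of sampled levels and a dummy `open_url`.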
For an MSX1 vertical scrolling game, if the screen itself moved downwards (background only, with sprites staying at their previous height), you would get smooth scrolling. (So not cube-shaped, but more like a toilet roll.)