Video recognition and video object detection are also possible. The simplest approach for video is to run the network on each frame independently. There are also networks that detect "actions" in a video by taking a short sequence of frames as a single input. For continuous live video recognition, it is normally best to run the network inside an app or on a dedicated server.
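The two approaches differ mainly in what gets fed to the network. Below is a minimal plain-Python sketch of both. It assumes the video has already been decoded into a sequence of frames and that `model` is any callable that takes one frame (or one clip) and returns a result. The names `per_frame` and `sliding_clips` and the `clip_len`/`stride` parameters are hypothetical and illustrative only. They are not part of any particular library.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Tuple, TypeVar

Frame = TypeVar("Frame")
Result = TypeVar("Result")


def per_frame(frames: Iterable[Frame],
              model: Callable[[Frame], Result]) -> Iterator[Tuple[int, Result]]:
    """Run a single-image model (e.g. an object detector) on every frame independently."""
    for index, frame in enumerate(frames):
        yield index, model(frame)


def sliding_clips(frames: Iterable[Frame], clip_len: int,
                  stride: int = 1) -> Iterator[Tuple[int, List[Frame]]]:
    """Group consecutive frames into overlapping clips for an action model.

    Yields (start_index, clip) as soon as `clip_len` frames are buffered, then
    every `stride` frames after that. Only the last `clip_len` frames are kept
    in memory, so this also works on a live, unbounded stream.
    """
    if clip_len < 1 or stride < 1:
        raise ValueError("clip_len and stride must be positive")
    window: Deque[Frame] = deque(maxlen=clip_len)  # oldest frame drops out automatically
    for index, frame in enumerate(frames):
        window.append(frame)
        start = index - clip_len + 1  # index of the first frame in the current window
        if start >= 0 and start % stride == 0:
            yield start, list(window)


# Usage with toy stand-ins (hypothetical): integers play the role of frames,
# and simple functions play the role of the networks.
frames = range(6)
detections = list(per_frame(frames, lambda f: f * 10))
actions = [(start, sum(clip)) for start, clip in sliding_clips(frames, clip_len=3, stride=2)]
```

Both functions are generators, so they process frames lazily as they arrive. That is the property a live-video app or server needs, since the full video is never available up front. A larger `stride` trades temporal resolution for fewer (expensive) action-model calls.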