In the previous article, the RTSP protocol itself was briefly introduced, and this time we will talk about how to implement a simple RTSP server.

Live555

Live555 is a C++ media library that supports a wide range of streaming protocols and can send, receive, and process audio and video data in a variety of encoding formats.

It is also one of the few such libraries available. Its code is rather cumbersome, but it is still simple enough for most people to get started with, which makes it well suited for practice and for rewriting.

A simple RTSP/H264 implementation

Let’s take testProgs/testOnDemandRTSPServer.cpp as an example.

First, create the RTSP server itself.

TaskScheduler* scheduler = BasicTaskScheduler::createNew();
UsageEnvironment* env = BasicUsageEnvironment::createNew(*scheduler);

UserAuthenticationDatabase* authDB = NULL;

// Authentication can be removed here if you don't need it
authDB = new UserAuthenticationDatabase;
authDB->addUserRecord("username1", "password1");

// Listen on port 554, the standard RTSP port
RTSPServer* rtspServer = RTSPServer::createNew(*env, 554, authDB);
if (rtspServer == NULL) {
  *env << "Failed to create RTSP server: " << env->getResultMsg() << "\n";
  exit(1);
}

Then add the H264 file as the video source. If this isn't quite clear, you can think of the code above as creating an HTTP server, and the code below as registering a routed resource inside that server.

Boolean reuseFirstSource = False;
char const* descriptionString = "Session streamed by \"testOnDemandRTSPServer\"";

char const* streamName = "h264ESVideoTest";
char const* inputFileName = "test.264";
ServerMediaSession* sms
  = ServerMediaSession::createNew(*env, streamName, streamName,
                  descriptionString);
sms->addSubsession(H264VideoFileServerMediaSubsession
              ::createNew(*env, inputFileName, reuseFirstSource));
rtspServer->addServerMediaSession(sms);
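
After the session has been registered, all that remains (as in testOnDemandRTSPServer.cpp) is to print the stream's URL and enter the Live555 event loop:

char* url = rtspServer->rtspURL(sms);
*env << "Play this stream using the URL \"" << url << "\"\n";
delete[] url;

// Hand control over to Live555's event loop; this call does not return
env->taskScheduler().doEventLoop();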

Camera-based video streaming

Note that Live555's RTSP server examples are file-based; if your video source is not a static file, you need to implement a ServerMediaSubsession yourself.

So far I have found two main options. The first is to use a Linux named pipe (FIFO) to transfer the data, which lets you use the file-based subsession directly without implementing ServerMediaSubsession yourself.

The pipe is created with Linux's mkfifo command, or with the corresponding system call, and is then read and written through the ordinary file API (I have to admire the ingenuity of Linux: everything is a file).

mkfifo [OPTION]... NAME...

I haven't tested this myself, but from reading questions and experience summaries on various forums, this solution is simple, yet its performance is worrisome and it can introduce a lot of latency.
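
As a rough, untested sketch of this approach: the pipe path "/tmp/camera.264" below is made up for illustration; an encoder process keeps writing raw Annex-B H.264 data into the pipe, and Live555 reads it exactly as it would read a regular file.

#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <stdio.h>

// Create the named pipe once (EEXIST just means it already exists)
if (mkfifo("/tmp/camera.264", 0644) < 0 && errno != EEXIST) {
  perror("mkfifo");
}

// Encoder side: open("/tmp/camera.264", O_WRONLY) and keep write()-ing the H.264 stream into it

// Live555 side: use the pipe path exactly like "test.264" above
sms->addSubsession(H264VideoFileServerMediaSubsession
              ::createNew(*env, "/tmp/camera.264", False));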

That calls for the second solution. In fact, newer versions of Live555 already provide a template for it, in liveMedia/DeviceSource.cpp.

DeviceSource*
DeviceSource::createNew(UsageEnvironment& env,
			DeviceParameters params) {
  return new DeviceSource(env, params);
}

EventTriggerId DeviceSource::eventTriggerId = 0;

unsigned DeviceSource::referenceCount = 0;

DeviceSource::DeviceSource(UsageEnvironment& env,
			   DeviceParameters params)
  : FramedSource(env), fParams(params) {
  if (referenceCount == 0) {
    // Any global initialization, such as the initialization of a device
    //%%% TO BE WRITTEN %%%
  }
  ++referenceCount;

  // Instance-level initialization
  //%%% TO BE WRITTEN %%%

  // Next, arrange for frames to be read from the device. There are two ways to do this.
  // If the device can be read from directly (e.g., as a readable socket), background read
  // handling can be used; search the Live555 sources for examples of
  //     envir().taskScheduler().turnOnBackgroundReadHandling(...)

  // Otherwise the data arrives asynchronously, in which case an 'event trigger' is used instead:
  if (eventTriggerId == 0) {
    eventTriggerId = envir().taskScheduler().createEventTrigger(deliverFrame0);
  }
}

DeviceSource::~DeviceSource() {
  // Release instance resources
  //%%% TO BE WRITTEN %%%

  --referenceCount;
  if (referenceCount == 0) {
    // Release global resources
    //%%% TO BE WRITTEN %%%

    // Reclaim our 'event trigger'
    envir().taskScheduler().deleteEventTrigger(eventTriggerId);
    eventTriggerId = 0;
  }
}

void DeviceSource::doGetNextFrame() {

  // If the device can no longer be read from (e.g., it has been shut down), handle the closure here
  if (0 /*%%% TO BE WRITTEN %%%*/) {
    handleClosure();
    return;
  }

  // If video frame data is available
  if (0 /*%%% TO BE WRITTEN %%%*/) {
    deliverFrame();
  }

  // Otherwise no data is available right now; our event trigger must be called (e.g., from a separate thread) when new data becomes available.
}

void DeviceSource::deliverFrame0(void* clientData) {
  ((DeviceSource*)clientData)->deliverFrame();
}

void DeviceSource::deliverFrame() {
  // This method is called when frame data from the device becomes available.
  // The following member variables are used to deliver the data downstream (to the client, etc.):
  // fTo: the address to copy the frame data to (the pointer itself must not be modified)
  // fMaxSize: the maximum number of bytes that may be copied (must not be modified); if the actual frame is larger, it has to be truncated and "fNumTruncatedBytes" set accordingly
  // fFrameSize: the actual frame size (<= fMaxSize)
  // fNumTruncatedBytes: see above
  // fPresentationTime: the presentation time of the frame; it can be set to the system time with "gettimeofday()", but a timestamp from the encoder is better if available
  // fDurationInMicroseconds: the duration of the frame; for a live source this is usually unnecessary, because the data can never arrive at the client 'early' anyway

  if (!isCurrentlyAwaitingData()) return; 

  u_int8_t* newFrameDataStart = (u_int8_t*)0xDEADBEEF; //%%% TO BE WRITTEN %%%
  unsigned newFrameSize = 0; //%%% TO BE WRITTEN %%%

  if (newFrameSize > fMaxSize) {
    fFrameSize = fMaxSize;
    fNumTruncatedBytes = newFrameSize - fMaxSize;
  } else {
    fFrameSize = newFrameSize;
  }
  gettimeofday(&fPresentationTime, NULL); // If there is no timestamp for the live video source, get the current system time
  // If the device is not a live video source, such as a file, then set "fDurationInMicroseconds" here
  memmove(fTo, newFrameDataStart, fFrameSize);

  // Notify the reader that data is available when the transmission is complete
  FramedSource::afterGetting(this);
}


// The following function notifies the DeviceSource (asynchronously) that new frame data is available.
// It may be called from a different thread, but not from multiple threads sharing the same 'event trigger id'
// (that would collapse into a single trigger). Also, if there are multiple video sources,
// "eventTriggerId" needs to be changed into a non-static member variable.
void signalNewFrameData() {
  TaskScheduler* ourScheduler = NULL; //%%% TO BE WRITTEN %%%
  DeviceSource* ourDevice  = NULL; //%%% TO BE WRITTEN %%%

  if (ourScheduler != NULL) {
    ourScheduler->triggerEvent(DeviceSource::eventTriggerId, ourDevice);
  }
}
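
The DeviceSource above only covers the FramedSource side; to actually serve it over RTSP you still have to wrap it in your own ServerMediaSubsession, as mentioned earlier. The following is only a minimal sketch of that wiring (the class name CameraH264Subsession and the bitrate estimate are made up for illustration), assuming the DeviceSource delivers one H.264 NAL unit per frame:

#include "liveMedia.hh"
#include "DeviceSource.hh"

class CameraH264Subsession : public OnDemandServerMediaSubsession {
public:
  static CameraH264Subsession* createNew(UsageEnvironment& env, Boolean reuseFirstSource) {
    return new CameraH264Subsession(env, reuseFirstSource);
  }

protected:
  CameraH264Subsession(UsageEnvironment& env, Boolean reuseFirstSource)
    : OnDemandServerMediaSubsession(env, reuseFirstSource) {}

  // Build the chain DeviceSource -> H264VideoStreamDiscreteFramer
  // (the discrete framer expects single NAL units without start codes)
  virtual FramedSource* createNewStreamSource(unsigned /*clientSessionId*/,
                                              unsigned& estBitrate) {
    estBitrate = 4000; // kbps, a rough estimate for 1080p30
    DeviceSource* source = DeviceSource::createNew(envir(), DeviceParameters());
    return H264VideoStreamDiscreteFramer::createNew(envir(), source);
  }

  // Use the standard H.264 RTP sink for packetization
  virtual RTPSink* createNewRTPSink(Groupsock* rtpGroupsock,
                                    unsigned char rtpPayloadTypeIfDynamic,
                                    FramedSource* /*inputSource*/) {
    return H264VideoRTPSink::createNew(envir(), rtpGroupsock, rtpPayloadTypeIfDynamic);
  }
};

// Registered just like the file-based subsession:
//   sms->addSubsession(CameraH264Subsession::createNew(*env, True));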

This is the approach I currently use: with the chip's hardware encoder, I can stream 1080p 30fps H.264 over the network with latency below 1000 ms. Admittedly the experimental code is still rather rough and needs further optimization, but I believe the overall approach is sound.