Using Android to stream to Twitch. Part 2. RTMP handshake
Table of contents
- Warning
- Goal of this series
- Steps in this series
- What is RTMP?
- The RTMP handshake
- Creating the handshake code (THE ACTUAL CODE)
- The lengthy explanation
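My app on the Google Play Store
- The app
My app's GitHub code
- The app's GitHub code
Resources
- RTMP specification documentation
- Twitch stream video documentation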
Warning
- THIS IS NOT A BEGINNER'S TUTORIAL. This blog series falls under the intermediate/advanced category. I say this not to discourage people from reading, but simply to let people know that they may encounter some topics that seem complicated.
The Goal of this series
- As the title states, this entire series will be about how to get the video from our Android device to stream on Twitch.
The steps you should take
1) Get a preview working on your application
2) Allow your application to capture video
3) Create a secure socket to connect to the Twitch ingest servers
4) Perform the RTMP handshake (what this blog post is talking about)
5) Encode the video from the device (very hard)
6) Send the encoded data to the Twitch ingest server via the socket
What is RTMP and why are we using it?
According to the RTMP specification documentation,
Real Time Messaging Protocol (RTMP) provides a bidirectional message multiplex service over a reliable stream transport, such as TCP [RFC0793], intended to carry parallel streams of video, audio, and data messages, with associated timing information, between a pair of communicating peers. Implementations typically assign different priorities to different classes of messages, which can affect the order in which messages are enqueued to the underlying stream transport when transport capacity is constrained.
Which is really just nerd speak for: RTMP lets us send audio and video over the internet.
We are using RTMP because, if we look at the Twitch documentation, we can see that the ingest URL, rtmp:///app/[?bandwidthtest=true], uses the RTMP protocol. So once we have a secure socket (see the previous post on how to create a secure socket), we can initialize the RTMP connection.
The RTMP handshake
- RTMP documentation
- Full warning, we are about to get into literal bits and bytes here. So buckle in and let's create an RTMP handshake.
- The RTMP connection begins with a handshake, which is just an exchange of data between the client (our Android app) and the server to make sure they both understand what they are doing.
The actual code
- First I will show you the code, and then I will try to explain it:
private suspend fun performRtmpHandshake() {
    withContext(Dispatchers.IO) {
        try {
            val timestamp = System.currentTimeMillis().toInt()
            val randomData = ByteArray(1528).apply { Random().nextBytes(this) }
            // Build C0 + C1
            val handshake = ByteArray(1537).apply {
                //C0
                this[0] = 3 // RTMP version
                //C1
                // Copy timestamp (4 bytes) directly
                val timestampBytes = ByteBuffer.allocate(4).putInt(timestamp).array()
                this[1] = timestampBytes[0]
                this[2] = timestampBytes[1]
                this[3] = timestampBytes[2]
                this[4] = timestampBytes[3]
                // Copy 4 zero bytes directly
                this[5] = 0
                this[6] = 0
                this[7] = 0
                this[8] = 0
                // Copy randomData (1528 bytes) directly
                for (i in randomData.indices) {
                    this[9 + i] = randomData[i]
                }
            }
            // Send C0 + C1
            val outputStream = sslSocket.getOutputStream()
            outputStream.write(handshake)
            outputStream.flush()
            // Read S0 + S1
            val inputStream = sslSocket.getInputStream()
            val response = ByteArray(1537)
            inputStream.read(response)
            if (response[0] != 3.toByte()) {
                throw IllegalStateException("Invalid RTMP handshake version from server")
            }
            val s1 = response.copyOfRange(1, 1537)
            // Build C2
            val c2 = ByteArray(1536).apply {
                // Copy the first 4 bytes of S1 (timestamp)
                this[0] = s1[0]
                this[1] = s1[1]
                this[2] = s1[2]
                this[3] = s1[3]
                // Copy the current timestamp (4 bytes) into the next 4 bytes
                val currentTimestamp = ByteBuffer.allocate(4).putInt(System.currentTimeMillis().toInt()).array()
                this[4] = currentTimestamp[0]
                this[5] = currentTimestamp[1]
                this[6] = currentTimestamp[2]
                this[7] = currentTimestamp[3]
                // Copy the random data from S1 (starting from the 8th byte)
                for (i in 8 until 1536) {
                    this[i] = s1[i]
                }
            }
            // Send C2
            outputStream.write(c2)
            outputStream.flush()
            // Read S2
            val s2 = ByteArray(1536)
            inputStream.read(s2)
            Log.i(TAG, "RTMP handshake successful")
        } catch (e: Exception) {
            Log.e(TAG, "Handshake failed: ${e.message}", e)
        }
    }
}
The lengthy explanation
The logic of the handshake goes like this: we send a chunk of data (C0 + C1), wait for a chunk of data (S0 + S1), send a chunk of data (C2), and then wait for a chunk of data (S2). Once we have received that final chunk, the handshake is complete.
The ByteArray(1537) is how we transport the data over the socket. It consists of bytes (octets, for you hardcore nerds), where each byte contains 8 bits. The size of this byte array is very specific: the documentation states that we need 1 byte for the version (C0) and 1536 bytes for all the other data (C1).
As you can see from the first section of the handshake:
val randomData = ByteArray(1528).apply { Random().nextBytes(this) }
// Build C0 + C1
val handshake = ByteArray(1537).apply {
    //C0
    this[0] = 3 // RTMP version
    //C1
    // Copy timestamp (4 bytes) directly
    val timestampBytes = ByteBuffer.allocate(4).putInt(timestamp).array()
    this[1] = timestampBytes[0]
    this[2] = timestampBytes[1]
    this[3] = timestampBytes[2]
    this[4] = timestampBytes[3]
    // Copy 4 zero bytes directly
    this[5] = 0
    this[6] = 0
    this[7] = 0
    this[8] = 0
    // Copy randomData (1528 bytes) directly
    for (i in randomData.indices) {
        this[9 + i] = randomData[i]
    }
}
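The same layout can also be written more compactly with a ByteBuffer. This is only a minimal sketch and not the code from my app: the helper name buildC0C1 and the constants are made up for illustration, and the sizes (1 + 4 + 4 + 1528 = 1537 bytes) come straight from the layout described above.

```kotlin
import java.nio.ByteBuffer
import java.util.Random

const val RTMP_VERSION: Byte = 3
const val HANDSHAKE_SIZE = 1536             // size of C1 (and of S1, C2, S2)
const val RANDOM_SIZE = HANDSHAKE_SIZE - 8  // 1528 bytes left after time + zero fields

// Hypothetical helper: builds C0 + C1 as one 1537-byte array.
fun buildC0C1(timestamp: Int, random: Random = Random()): ByteArray {
    val randomData = ByteArray(RANDOM_SIZE).also { random.nextBytes(it) }
    return ByteBuffer.allocate(1 + HANDSHAKE_SIZE) // big-endian by default
        .put(RTMP_VERSION)  // C0: version (1 byte)
        .putInt(timestamp)  // C1: time (4 bytes)
        .putInt(0)          // C1: zero (4 bytes)
        .put(randomData)    // C1: random data (1528 bytes)
        .array()
}
```

Calling buildC0C1(System.currentTimeMillis().toInt()) should give the same bytes as the index-by-index version, apart from the random section.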
Now the code //C0 this[0] = 3 might seem a little strange, but the C and S just represent client and server. The this[0] = 3 is us setting the first byte to 3. Again, this might sound a little off, but remember a byte is 8 bits, and a single 8-bit number can represent 0 to 255 for unsigned and -128 to 127 for signed. Here, 3 is used to tell the server which version of RTMP we want to use. You can read more about that in the RTMP specification documentation.
Now we can talk about the timestamp:
// Copy timestamp (4 bytes) directly
val timestamp = System.currentTimeMillis().toInt()
val timestampBytes = ByteBuffer.allocate(4).putInt(timestamp).array()
this[1] = timestampBytes[0]
this[2] = timestampBytes[1]
this[3] = timestampBytes[2]
this[4] = timestampBytes[3]
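The timestamp lines above rely on ByteBuffer writing an Int in big-endian order by default, meaning the most significant byte comes first. To make that concrete, here is a small sketch with two illustrative helpers (both names are my own, not part of any library) that should produce identical bytes:

```kotlin
import java.nio.ByteBuffer

// Splits a 32-bit value into 4 bytes, most significant byte first.
// ByteBuffer's default byte order is big-endian (network byte order).
fun toBigEndianBytes(value: Int): ByteArray =
    ByteBuffer.allocate(4).putInt(value).array()

// The same result written out with shifts, to show what each index holds.
fun toBigEndianBytesManual(value: Int): ByteArray = byteArrayOf(
    (value ushr 24).toByte(), // bits 31..24 -> index 0
    (value ushr 16).toByte(), // bits 23..16 -> index 1
    (value ushr 8).toByte(),  // bits 15..8  -> index 2
    value.toByte()            // bits 7..0   -> index 3
)
```

For example, toBigEndianBytes(0x0A0B0C0D) gives the bytes 0A, 0B, 0C, 0D in that order.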
According to the documentation, we are given 4 bytes (32 bits) to represent our timestamp. The specification says this value should be used as the epoch for all future chunks sent from this endpoint, which is what lets message timing be synchronized between different streams or endpoints. Technically speaking, it can be 0 or any arbitrary number. The ByteBuffer.allocate(4).putInt(timestamp).array() call allocates 4 bytes and places our timestamp into those bytes. That might seem like a weirdly specific number, but 4 bytes is simply the size the specification gives this field. Also, each timestampBytes[n] holds one byte of the timestamp, most significant byte first.
The weird zeros:
// Copy 4 zero bytes directly
this[5] = 0
this[6] = 0
this[7] = 0
this[8] = 0
These are called padding bytes. The specification requires these 4 bytes to be all zeros; they keep the structure consistent and mark a boundary between the timestamp and the random data.
The next value is the strange one, it's the randomness:
// Copy randomData (1528 bytes) directly
for (i in randomData.indices) {
    this[9 + i] = randomData[i]
}
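Once again, the documentation tells us that we need to fill the remaining 1528 bytes with random data. The values themselves carry no meaning. They exist so that each endpoint can tell the response to the handshake it initiated apart from a handshake initiated by its peer, so they only need to be reasonably random, not cryptographically secure.
Now we have to send the data to the server and wait for a reply: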
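Since the specification says S2 must echo back the random data we sent in C1, the random bytes also give us something to verify at the end of the handshake. My code above does not do this check; the following is just a sketch of what it could look like, with a made-up helper name, and I am assuming both arrays are the 1536-byte packets without the version byte.

```kotlin
// Hypothetical helper: true if S2's random-echo field matches the random data sent in C1.
// c1 and s2 are the 1536-byte packets (without the C0/S0 version byte).
fun s2EchoesC1(c1: ByteArray, s2: ByteArray): Boolean {
    require(c1.size == 1536 && s2.size == 1536) { "C1 and S2 must be 1536 bytes" }
    // Bytes 0..7 hold the two time fields; the random data starts at offset 8.
    return c1.copyOfRange(8, 1536).contentEquals(s2.copyOfRange(8, 1536))
}
```

With the handshake array from earlier, C1 would be handshake.copyOfRange(1, 1537). Whether to fail the connection or only log a warning on a mismatch is a design choice I am leaving open.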
// Read S0 + S1
val inputStream = sslSocket.getInputStream()
val response = ByteArray(1537)
inputStream.read(response)
if (response[0] != 3.toByte()) {
    throw IllegalStateException("Invalid RTMP handshake version from server")
}
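One thing to watch out for: InputStream.read(buffer) is allowed to return before the buffer is full, so on a slow network a single read might give us only part of S0 + S1. A safer approach, sketched below with a hypothetical helper name, is to keep reading until all the bytes have arrived. DataInputStream.readFully from the Java standard library does exactly that.

```kotlin
import java.io.DataInputStream
import java.io.InputStream

// Hypothetical helper: blocks until exactly `size` bytes have arrived,
// or throws EOFException if the stream ends first.
fun readExactly(input: InputStream, size: Int): ByteArray {
    val buffer = ByteArray(size)
    DataInputStream(input).readFully(buffer)
    return buffer
}
```

In the handshake this could replace the two plain reads, for example val response = readExactly(inputStream, 1537) for S0 + S1 and readExactly(inputStream, 1536) for S2.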
Rinse and repeat
- Then we just follow the documentation and do almost the same thing over again: we build C2 from the S1 we just received, send it, and wait for S2. Once S2 is returned, we know that the RTMP handshake is complete!
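Because C2 is mostly an echo of S1, building it can be boiled down to "copy S1, then overwrite bytes 4 to 7 with the time we read S1". Here is a minimal sketch of that idea. The function name buildC2 and its parameters are illustrative and not what the full listing uses, but the result should match the C2 built there.

```kotlin
import java.nio.ByteBuffer

// Hypothetical helper: builds C2 from the 1536-byte S1 packet.
fun buildC2(s1: ByteArray, readTimestamp: Int): ByteArray {
    require(s1.size == 1536) { "S1 must be 1536 bytes" }
    val c2 = s1.copyOf() // bytes 0..3 (S1 time) and 8..1535 (S1 random data) are echoed as-is
    // Overwrite bytes 4..7 with the time at which we read S1 (big-endian).
    ByteBuffer.wrap(c2).putInt(4, readTimestamp)
    return c2
}
```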
Conclusion
- Thank you for taking the time out of your day to read this blog post of mine. If you have any questions or concerns please comment below or reach out to me on Twitter.