Post by Chris Iverson on May 21, 2019 22:04:43 GMT -5
Actually, you do still need to worry about it, simply because you're working at the sockets level, not at the protocol level. While it's good to use an existing protocol if you can, you still have to manage the datastreams coming from the sockets yourself.
In the example you give (preallocating a large buffer once you have the Content-Length), it's still very possible that you won't receive the whole file in one function call (especially if it's a GIGANTIC file).
While preallocating can admittedly help, you still have to deal with the logic of reassembly: receiving the data in chunks, and piecing those chunks back together.
As I said above, the network is not always reliable, and the full data might not have actually been received by your computer by the time you make the call to Receive().
Something like this:
1. Make a GET request to the server for the file.
2. Check the headers of the response; Content-Length indicates 4096 (4 KB).
3. Make a buffer 5 KB in size (the Content-Length plus extra space for headers).
4. Make a second GET request for the same file.
5. Receive() into the buffer, and only receive 2 KB of the file.
What happened? The network is still sending the other 2KB. Your computer hasn't received it yet. Or maybe the OS's buffers don't let you move that much data at once. Or maybe the DLL doesn't. (I forget what the max transfer size at one time is in the DLL).
So what do you do?
The rest of the data is still waiting for you in the stream. You need to call Receive() again to get the rest of it, and add it to the data you already received to get the complete package.
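To make the "call Receive() again" idea concrete, here's a rough sketch of the loop. It's in Python rather than LB, since it uses plain sockets and not the DLL; recv_exactly is a made-up helper name, and the 4096-byte chunk size is just an assumption for the demo. The shape of the loop is the part that carries over.

```python
import socket

def recv_exactly(sock, total):
    """Keep calling recv() until `total` bytes have arrived or the peer closes."""
    chunks = []
    received = 0
    while received < total:
        # Ask only for what is still missing; recv() may return less than that.
        chunk = sock.recv(min(4096, total - received))
        if not chunk:
            # An empty result means the other side closed the connection early.
            break
        chunks.append(chunk)
        received += len(chunk)
    # Piece the chunks back together in the order they arrived.
    return b"".join(chunks)

# Demo on a local socket pair: the sender delivers 4 KB as two 2 KB pieces.
sender, receiver = socket.socketpair()
sender.sendall(b"x" * 2048)
sender.sendall(b"y" * 2048)
sender.close()

data = recv_exactly(receiver, 4096)
receiver.close()
```

The caller never assumes that one call returns everything; it tracks how much has arrived and keeps asking until the count matches what it expects.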
Both Receive() and DecryptReceive() return the number of bytes that were given to you. Note that this will NOT match the Content-Length header; the header lists exactly that: the length of the content being sent. Receive() and DecryptReceive() count the total number of bytes transferred to your buffer, including the bytes that make up the headers. The difference is a calculation you'll have to make yourself.
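That calculation might look something like the sketch below. Again, this is Python for illustration, not the DLL's API; body_bytes_remaining is a hypothetical helper, and it assumes the response uses a Content-Length header (no chunked encoding) and that the headers end with the usual blank line.

```python
def body_bytes_remaining(buffer):
    """Return how many body bytes are still missing.

    Returns None if the headers are incomplete or carry no Content-Length.
    """
    header_end = buffer.find(b"\r\n\r\n")
    if header_end == -1:
        return None  # the blank line ending the headers has not arrived yet
    header_len = header_end + 4  # count the blank-line terminator as header bytes

    content_length = None
    # Skip the status line, then look for Content-Length (case-insensitive).
    for line in buffer[:header_end].split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            content_length = int(value.strip())
    if content_length is None:
        return None

    # Everything received past the headers counts toward the body.
    body_received = len(buffer) - header_len
    return content_length - body_received

# 4096 bytes promised, but only 2048 body bytes have arrived so far.
partial = b"HTTP/1.1 200 OK\r\nContent-Length: 4096\r\n\r\n" + b"x" * 2048
remaining = body_bytes_remaining(partial)
```

Once the result reaches zero, the whole body is in the buffer.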
Also, note that out of necessity, Receive() and DecryptReceive() have slightly different semantics when it comes to the number of bytes returned.
For any return value greater than zero, the two behave identically.
For Receive(), if you get less than zero back, a socket error occurred. You can call GetError() to get the Winsock error code. This is also true of DecryptReceive(), although the codes returned in that case won't always be Winsock error codes.
If you get a zero back, however, the meaning is different for the two functions.
For Receive(), it means the other side of the connection gracefully closed the connection. No more data will be coming; you can close the socket on your end, as well.
For DecryptReceive(), getting a zero back is NOT an indication of a closed connection. It is a perfectly valid return value, because now it's not just your application data that's being sent over your sockets; it's also the TLS session data. Sometimes you will receive a packet or chunk that ONLY has TLS session data, such as a renegotiation or an alert, and has no application data at all. In such a case, when you pass that into DecryptReceive(), the TLS session data will be processed by SChannel after calling DecryptMessage(), and you'll get a return value of zero, because there was no data to give you.
DecryptReceive() indicates a closed connection with a return value of -1 (SOCKET_ERROR), along with an extended error (from GetError()) equal to WSAEDISCON (numeric value 10101).
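To put the two sets of rules side by side, here's a small sketch that only models the return-value rules described above. It's Python and doesn't call the DLL; interpret_receive and interpret_decrypt_receive are made-up names, and last_error stands in for whatever GetError() would have returned.

```python
WSAEDISCON = 10101  # the Winsock code the post names for a closed TLS connection

def interpret_receive(ret):
    """Classify a Receive() return value."""
    if ret > 0:
        return "data"      # ret bytes were placed in the buffer
    if ret == 0:
        return "closed"    # the peer closed the connection gracefully
    return "error"         # socket error; check GetError()

def interpret_decrypt_receive(ret, last_error):
    """Classify a DecryptReceive() return value plus its extended error."""
    if ret > 0:
        return "data"         # ret bytes of decrypted application data
    if ret == 0:
        return "no-app-data"  # only TLS session data was processed; keep reading
    if last_error == WSAEDISCON:
        return "closed"       # -1 with WSAEDISCON signals the closed connection
    return "error"            # some other failure, not necessarily a Winsock code
```

The practical upshot is that a read loop around DecryptReceive() should treat zero as "call again", not as "done".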
EDIT: And I've added a new file to the repo, test-http-https-dual.bas. It's a server that sets up listeners on both port 80 and port 443, and will respond to "GET /" with the test.html file. If the request comes in on the HTTPS port, it will be upgraded to TLS before processing.
The code as written works, but it can only handle one total connection at a time. Until it's finished responding to an open connection, whether that's on port 80 or 443, it will not accept any more connections.
It's also not a one-shot response anymore; it will loop back to accepting connections once it's finished processing one connection. To get it to stop, navigate to a page with ?command=END added to the path (case sensitive), like http://localhost/?command=END or https://localhost/?command=END.
With the changes, it no longer responds with the test HTML to everything. If you try to GET anything except the root path ("/"), like "http://localhost/file.html", it will respond with HTTP Error 404.
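The routing rules described above boil down to something like this. It's a Python sketch of the decision only, not the code in test-http-https-dual.bas; route is a hypothetical name, and treating ?command=END as a stop signal on any path is my assumption here.

```python
def route(target):
    """Decide what to do with a request target such as "/" or "/?command=END"."""
    path, _, query = target.partition("?")
    if query == "command=END":
        return "stop"  # exact, case-sensitive match shuts the server down
    if path == "/":
        return "200"   # root path: serve the test HTML
    return "404"       # anything else: not found

decisions = [route(t) for t in ("/", "/file.html", "/?command=END", "/?command=end")]
```

Note that the lowercase "command=end" is not recognized as the stop command, so that request falls through to the normal path check.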
Also, once a connection is accepted and opened, it will hold that connection open for ~10 seconds. If no data is received in that time, it will close that connection and go back to listening for new ones.
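The idle timeout can be sketched like this with plain Python sockets. The sample itself doesn't necessarily do it this way; read_with_idle_timeout is a made-up helper, and the demo uses 0.2 seconds in place of ~10 so it finishes quickly.

```python
import socket

def read_with_idle_timeout(conn, idle_seconds):
    """Wait up to idle_seconds for data; return None if the peer stays silent."""
    conn.settimeout(idle_seconds)
    try:
        return conn.recv(4096)
    except socket.timeout:
        # Nothing arrived in time: the caller should close this connection
        # and go back to accepting new ones.
        return None

# Demo on a local socket pair standing in for a browser and the server.
client, conn = socket.socketpair()

idle_result = read_with_idle_timeout(conn, 0.2)    # client sends nothing

client.sendall(b"GET / HTTP/1.1\r\n\r\n")
active_result = read_with_idle_timeout(conn, 0.2)  # data is already waiting

client.close()
conn.close()
```

With only one connection slot, bounding the wait this way is what keeps a silent client from blocking everyone else.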
That last behavior was added due to a helpful feature of Chrome that turns out to be REALLY annoying for one-shot, one-connection-slot servers: when Chrome finishes processing a web request, it will immediately open another connection to the server, ready to send the next web request. It will then refuse to close that connection.
Now, in most cases, this isn't an issue; it's a bonus for the user, because it helps speed up subsequent connections on the same page (especially TLS connections, as it will perform the whole TLS handshake before waiting), and it's not a huge burden on the server. An inactive connection doesn't take up very many resources, and most servers will wind up closing the connection if it remains inactive.
For this sample, though, since I didn't write it in a way that lets multiple separate connections be managed simultaneously (although this is possible in LB), having one connection get stuck open forever was a bit of an issue.