Playing with Proxies
Building a HTTP forward proxy in C
March 12, 2025
It is near impossible to use the internet without encountering a proxy of some kind. For example, most large-scale applications use load balancers, where incoming requests are intercepted by a reverse proxy and routed to servers based on availability. When you sign into a captive portal on a public WiFi network, your traffic is typically routed through some kind of proxy or gateway server that blocks access to the wider network until you complete a login process. And then there are VPNs, or Virtual Private Networks, which are becoming more and more popular amid growing concerns about digital privacy and tracking.1
But let’s step back for a second. What is a proxy? On a broad level, we can define a proxy as an intermediary service that performs actions on behalf of two or more connected hosts. For example, a forward proxy (as the name suggests) forwards traffic from a client to an end destination.
Let’s try to build a very simple forward proxy with the following specs:
- Intercept HTTP traffic and simply forward it to the intended resource.
- Our simple proxy will exist on the application layer, meaning that the logic will be based around the specifications of HTTP as opposed to a lower-level protocol like TCP or UDP.
- We will build this in C so we can use the POSIX socket API.
Managing sockets
Sockets are kernel-managed abstractions that make it easy for programs to communicate across networks. For a full description of sockets, you can check out my previous post. Our program will need to open three sockets: one for the server listening for incoming connections, one for the accepted client connection, and one for the proxy service, which sends the outgoing request to the end destination and receives its response.
Let’s start with the server socket. Our program needs to create a socket, bind it to a local address, listen for connections, and define logic on how to accept a new connection.
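#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define PORT 5050

int main(void)
{
    // Create an IPv4 TCP (stream) socket
    int server_sock = socket(AF_INET, SOCK_STREAM, 0);
    if (server_sock < 0)
    {
        // perror appends the reason for the failure (from errno)
        perror("Error creating server socket");
        exit(EXIT_FAILURE);
    }

    // Bind to 127.0.0.1:PORT so only local clients can connect
    struct sockaddr_in server_address;
    memset(&server_address, 0, sizeof(server_address));
    server_address.sin_family = AF_INET;
    server_address.sin_port = htons(PORT);                   // network byte order
    server_address.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(server_sock, (struct sockaddr *)&server_address, sizeof(server_address)) < 0)
    {
        perror("Error binding server socket");
        close(server_sock);
        exit(EXIT_FAILURE);
    }
    ...
}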
In the above code, we’ve defined a TCP socket and bound it to a local IPv4 loopback address and port. Now we can listen for connections:
if (listen(server_sock, 1) < 0)
{
    perror("Failed to listen on server socket");
    close(server_sock);
    exit(EXIT_FAILURE);
}
The second argument of the listen function is the backlog: the maximum number of pending connections the kernel will queue while they wait to be accepted. Since we aren’t really worried about our proxy being flooded by requests, a backlog of one is enough for now.
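For reference, the socket, bind and listen steps can be folded into a single function. This is a sketch under a couple of assumptions: the name create_listener is hypothetical, and the SO_REUSEADDR option is an addition not in the code above. It lets the proxy be restarted right away instead of having bind fail with "Address already in use" while old connections linger in TIME_WAIT.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket listening on 127.0.0.1:port.
 * Returns the listening descriptor, or -1 on error with errno set. */
int create_listener(uint16_t port, int backlog)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int yes = 1; /* assumption: enable SO_REUSEADDR for quick restarts */
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0)
    {
        int saved_errno = errno; /* close() may overwrite errno */
        close(fd);
        errno = saved_errno;
        return -1;
    }
    return fd;
}
```

In main this would become `int server_sock = create_listener(PORT, 1);` followed by a single `< 0` check. Passing port 0 asks the kernel to pick any free port, which is handy for tests.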
When a request hits the proxy, we’ll create a new client socket and then do something with it.
int client_sock;
struct sockaddr_in client_address;
socklen_t client_len = sizeof(client_address); // in: buffer size, out: actual size
client_sock = accept(server_sock, (struct sockaddr *)&client_address, &client_len);
if (client_sock < 0)
{
    perror("Failed to accept incoming client connection");
    close(server_sock);
    exit(EXIT_FAILURE);
}
printf("Client connected: %s\n", inet_ntoa(client_address.sin_addr));
handle_client(client_sock); // TODO: implement handle_client
Our program creates the new client socket with the accept call. If nothing goes wrong, we can now read from and write to the socket. Let’s get parsing!
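One note on logging the client address: POSIX does not require inet_ntoa to be thread-safe (it may return a pointer to a static buffer), which matters if the proxy later goes multi-threaded. A sketch of the alternative, inet_ntop, wrapped in a hypothetical format_peer helper that also prints the client's port:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>

/* Hypothetical helper: format an IPv4 peer as "a.b.c.d:port" into out.
 * Returns 0 on success, -1 if conversion fails or out is too small. */
int format_peer(const struct sockaddr_in *peer, char *out, size_t out_size)
{
    char ip[INET_ADDRSTRLEN]; /* room for "255.255.255.255" plus NUL */
    if (inet_ntop(AF_INET, &peer->sin_addr, ip, sizeof(ip)) == NULL)
        return -1;

    /* sin_port is stored in network byte order, so convert it first */
    int n = snprintf(out, out_size, "%s:%u", ip, (unsigned)ntohs(peer->sin_port));
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

The caller supplies the buffer, so there is no shared state between threads.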
Parsing a HTTP request
All our proxy needs to know for now is the hostname and port number of the intended destination. For example, consider the following request:
GET http://example.com:8080
The hostname of the request is example.com and the port is 8080. If no port is specified, we’ll use the HTTP default port of 80. The above request translates into the following plaintext:
GET / HTTP/1.1
Host: example.com:8080
User-Agent: curl/8.6.0
Though we just need the host and port number, we can build a basic parser that we can extend later if we need to. Manipulating and searching text is a little trickier in C than in higher-level languages, but we won’t let that stop us.
To begin, let’s define a struct where we can store our variables for later:
typedef struct {
char method[16];
char host[256];
int port;
char path[1024];
} HttpRequest;
Next, let’s make a new function with an argument for the struct we will manipulate, and another for the raw request string:
int parse_http_request(const char *raw_request, HttpRequest *http_request)
{
    memset(http_request, 0, sizeof(HttpRequest));
    char method[16] = {0}; // same size as HttpRequest.method
    char path[1024] = {0};
    char host[256] = {0};
    int port = 80;         // default HTTP port
    ...
}
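The tokenizer we’ll use below writes NUL characters into the string it splits, and raw_request is const, so we work on a heap-allocated copy of the raw request instead (to be released with free once parsing is done):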
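char *request_copy = strdup(raw_request); // heap copy; free() it when done
if (request_copy == NULL) return -1;       // strdup returns NULL if out of memory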
Now for the parsing. Our objective is to split the request into individual lines, then pluck the value we want out of each line. For this we can use the strtok_r function, a reentrant version of strtok that is safe to use from multiple threads:
char *strtok_r(char *restrict str, const char *restrict sep, char **restrict lasts);
This function splits a string into tokens based on a set of delimiter characters. The first argument is the string to split, the second is a string listing the delimiter characters (any one of them ends a token), and the last is a pointer to a char * that strtok_r uses to remember where it left off between calls. Each call returns a pointer to the next token, after overwriting the delimiter that follows it with a NUL character, or NULL once no tokens remain. To keep tokenizing the same string, we pass NULL as the first argument on subsequent calls. Kind of confusing and weird, but a simple example will help:
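#include <stdio.h>
#include <string.h>

int main(void)
{
    char str[] = "foo,bar,baz"; // must be writable: strtok_r modifies it
    char *token;
    char *saveptr;                        // strtok_r's position between calls
    token = strtok_r(str, ",", &saveptr); // first call: pass the string
    while (token != NULL) {
        printf("%s\n", token);
        token = strtok_r(NULL, ",", &saveptr); // later calls: pass NULL
    }
    return 0;
}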
In the snippet above, we split our string on the comma (,) character. The first call to strtok_r passes the original string and returns a pointer to the first token, "foo". We then keep calling strtok_r with NULL in a loop until it returns NULL, printing out the following:
$ gcc strtok.c
$ ./a.out
foo
bar
baz
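One subtlety matters before we apply this to HTTP: the delimiter argument is a set of characters, not a string to match, and strtok_r skips over runs of delimiters instead of returning empty tokens. A quick check, using a hypothetical count_lines helper that splits on "\r\n":

```c
#define _POSIX_C_SOURCE 200809L /* strtok_r is POSIX, not ISO C */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split buf into lines with strtok_r, print each one
 * in brackets and return how many lines were found. Modifies buf. */
int count_lines(char *buf)
{
    char *saveptr;
    int count = 0;
    for (char *line = strtok_r(buf, "\r\n", &saveptr); line != NULL;
         line = strtok_r(NULL, "\r\n", &saveptr))
    {
        printf("[%s]\n", line);
        count++;
    }
    return count;
}
```

Called on "GET / HTTP/1.1\r\nHost: example.com\r\n\r\nhello", it prints three lines: [GET / HTTP/1.1], [Host: example.com] and [hello]. The blank line that separates the headers from the body never appears as a token, so code that relies on it to detect the end of the headers won't see it.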
Now on to our request. HTTP specifies that each line ends with the \r\n characters.2 So we’ll split up the request using \r\n as our delimiter (strictly speaking, strtok_r treats it as the set of characters \r and \n, which works just as well for splitting lines). Then, on each line, we can search for whatever values we want.
// parse first line
char *line_ptr;  // strtok_r state for splitting the request into lines
char *token_ptr; // strtok_r state for splitting the request line
char *line = strtok_r(request_copy, "\r\n", &line_ptr);
if (line) {
    char *token = strtok_r(line, " ", &token_ptr);
    if (token) {
        // method
        strncpy(method, token, sizeof(method) - 1);
        // path (request target)
        token = strtok_r(NULL, " ", &token_ptr);
        if (token) {
            strncpy(path, token, sizeof(path) - 1);
        }
    }
    // parse headers
    while ((line = strtok_r(NULL, "\r\n", &line_ptr)) != NULL) {
        // look for Host header (header names are case-insensitive)
        if (strncasecmp(line, "Host:", 5) == 0) {
            char *host_value = line + 5;
            // skip whitespace
            while (isspace((unsigned char)*host_value)) host_value++;
            // parse host and port if present
            char *colon = strchr(host_value, ':');
            if (colon) {
                // host with port; clamp the length to fit the buffer
                size_t host_len = colon - host_value;
                if (host_len >= sizeof(host)) host_len = sizeof(host) - 1;
                strncpy(host, host_value, host_len);
                host[host_len] = '\0';
                port = atoi(colon + 1);
            } else {
                // host without port
                strncpy(host, host_value, sizeof(host) - 1);
            }
            break;
        }
    }
}
We split the request into lines on the \r\n delimiter. The first line gives us the method and path. On the remaining lines we look for the Host header with strncasecmp, which compares case-insensitively, since HTTP header names are case-insensitive. If a colon follows the host name, the port number comes right after it. These are the only values we need for now, but in the future we might add more logic to pluck out more header values, cookies, etc.
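Putting the pieces together, here is one way the whole parse_http_request function might look once the results are copied into the caller's struct and the copy is freed. This is a sketch, not necessarily the code from the full source: the copy_bounded helper is hypothetical, and returning -1 when no method or Host header is found is an assumption chosen to match the `< 0` check used later in handle_client.

```c
#define _POSIX_C_SOURCE 200809L /* strdup, strtok_r, strncasecmp are POSIX */
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

typedef struct {
    char method[16];
    char host[256];
    int port;
    char path[1024];
} HttpRequest;

/* Hypothetical helper: copy len bytes of src, truncated to fit, NUL-terminated. */
static void copy_bounded(char *dst, size_t dst_size, const char *src, size_t len)
{
    if (len >= dst_size)
        len = dst_size - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Returns 0 if a request line and a Host header were found, -1 otherwise. */
int parse_http_request(const char *raw_request, HttpRequest *http_request)
{
    memset(http_request, 0, sizeof(*http_request));
    http_request->port = 80; /* default HTTP port */

    char *request_copy = strdup(raw_request); /* strtok_r writes into its input */
    if (request_copy == NULL)
        return -1;

    char *line_ptr, *token_ptr;
    char *line = strtok_r(request_copy, "\r\n", &line_ptr);
    if (line != NULL) {
        /* Request line: METHOD SP target SP version */
        char *token = strtok_r(line, " ", &token_ptr);
        if (token != NULL) {
            copy_bounded(http_request->method, sizeof(http_request->method), token, strlen(token));
            token = strtok_r(NULL, " ", &token_ptr);
            if (token != NULL)
                copy_bounded(http_request->path, sizeof(http_request->path), token, strlen(token));
        }
        /* Header lines: stop at the first Host header */
        while ((line = strtok_r(NULL, "\r\n", &line_ptr)) != NULL) {
            if (strncasecmp(line, "Host:", 5) != 0)
                continue;
            char *value = line + 5;
            while (isspace((unsigned char)*value))
                value++;
            char *colon = strchr(value, ':');
            size_t host_len = colon ? (size_t)(colon - value) : strlen(value);
            copy_bounded(http_request->host, sizeof(http_request->host), value, host_len);
            if (colon != NULL)
                http_request->port = atoi(colon + 1);
            break;
        }
    }

    free(request_copy);
    return (http_request->method[0] != '\0' && http_request->host[0] != '\0') ? 0 : -1;
}
```

Fed "GET / HTTP/1.1\r\nHost: example.com:8080\r\n\r\n", this fills in method "GET", path "/", host "example.com" and port 8080. If the Host header has no port, port stays at 80.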
Forwarding requests
Now that we can determine the final intended destination of the HTTP request, our proxy service can make that request on behalf of the client. This is fairly simple and involves the following steps:
1. Receive the request from the client and parse the host and port.
2. If the host is a domain name, resolve its IP address.
3. Create a new socket connection to the final endpoint, forward the entire request from the client, then return the response to the client.
In our handle_client function we can first parse what we need:
void handle_client(int client_socket)
{
    char buffer[1024];
    ssize_t read_size;

    // Leave room for a terminating NUL so the buffer can be parsed as a string
    read_size = recv(client_socket, buffer, sizeof(buffer) - 1, 0);
    if (read_size <= 0)
    {
        perror("Error receiving client message");
        close(client_socket);
        return;
    }
    buffer[read_size] = '\0';

    HttpRequest request;
    if (parse_http_request(buffer, &request) < 0)
    {
        printf("parse unsuccessful\n");
        close(client_socket);
        return;
    }
    ...
}
We use the recv function to read the initial HTTP request into our buffer, then parse the contents of the buffer to extract the host and port. Keep in mind that a single recv may return only part of a request; that’s tolerable for the small requests we’re testing with.
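If partial reads become a problem, the fix is to keep reading until the blank line that ends the HTTP headers (\r\n\r\n) arrives. A sketch with a hypothetical recv_headers helper; note that it may also pick up the first bytes of a request body, which a fuller proxy would need to forward too:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: call recv until "\r\n\r\n" has arrived, the peer
 * closes, or buf is full. NUL-terminates buf and returns the number of
 * bytes read, or -1 on error. size must be at least 1. */
ssize_t recv_headers(int fd, char *buf, size_t size)
{
    size_t total = 0;
    while (total < size - 1) {
        ssize_t n = recv(fd, buf + total, size - 1 - total, 0);
        if (n < 0)
            return -1;
        if (n == 0)
            break; /* peer closed the connection */
        total += (size_t)n;
        buf[total] = '\0';
        if (strstr(buf, "\r\n\r\n") != NULL)
            break; /* complete header block received */
    }
    buf[total] = '\0';
    return (ssize_t)total;
}
```

In handle_client, `read_size = recv_headers(client_socket, buffer, sizeof(buffer));` would replace the single recv call.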
To translate a host name to an IP address we can use getaddrinfo, which is part of the POSIX standard library on Unix-like operating systems. From the man pages:
The getaddrinfo() function is used to get a list of IP addresses and port numbers for host hostname and service servname.
We provide the function with a hostname, a hints struct describing the kind of address we want (here, an IPv4 address for a TCP socket), and a pointer that getaddrinfo fills in with a linked list of results.
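struct addrinfo hints = {0}, *res; // {0} zeroes the fields we don't set
...
hints.ai_socktype = SOCK_STREAM;   // TCP socket
hints.ai_family = AF_INET;         // Use AF_INET for IPv4
int status = getaddrinfo(request.host, NULL, &hints, &res);
if (status != 0) {                 // non-zero means resolution failed
    fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(status));
    close(client_socket);
    return;
}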
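If the call succeeds (returns 0), the hostname has been resolved and we can access the address through res->ai_addr. It is a generic struct sockaddr *, which for an IPv4 result points to a struct sockaddr_in.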
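The resolution steps, including the cast from the generic struct sockaddr and the freeaddrinfo call that releases the result list, can be wrapped in one small function. A sketch with a hypothetical resolve_ipv4 helper; on failure it passes back getaddrinfo's error code, which gai_strerror turns into a readable message:

```c
#define _POSIX_C_SOURCE 200809L /* getaddrinfo is POSIX */
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: resolve host to its first IPv4 address.
 * Returns 0 on success, or getaddrinfo's non-zero error code. */
int resolve_ipv4(const char *host, struct in_addr *out)
{
    struct addrinfo hints = {0}, *res;
    hints.ai_family = AF_INET;       /* IPv4 only */
    hints.ai_socktype = SOCK_STREAM; /* TCP */

    int status = getaddrinfo(host, NULL, &hints, &res);
    if (status != 0)
        return status;

    /* With AF_INET, ai_addr points at a struct sockaddr_in */
    *out = ((struct sockaddr_in *)res->ai_addr)->sin_addr;
    freeaddrinfo(res); /* the result list is heap-allocated */
    return 0;
}
```

Numeric strings such as "127.0.0.1" are parsed directly without a DNS lookup, which makes the helper easy to test offline.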
Finally, using the address, we create the proxy socket, configured with the IP and port number:
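int proxy_socket = socket(AF_INET, SOCK_STREAM, 0);
if (proxy_socket < 0)
{
    perror("Error creating proxy socket");
    freeaddrinfo(res);
    close(client_socket);
    return;
}
struct sockaddr_in proxy_address;
memset(&proxy_address, 0, sizeof(proxy_address));
proxy_address.sin_family = AF_INET;
proxy_address.sin_port = htons(request.port);
// ai_addr is a generic struct sockaddr *; for AF_INET it is a sockaddr_in
proxy_address.sin_addr = ((struct sockaddr_in *)res->ai_addr)->sin_addr;
freeaddrinfo(res); // release the list allocated by getaddrinfo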
Next we connect to the destination, forward the original request over the proxy socket, and relay the end server’s reply back to the client:
if (connect(proxy_socket, (struct sockaddr *)&proxy_address, sizeof(proxy_address)) < 0)
{
    perror("Proxy connection failed");
    close(proxy_socket);
    close(client_socket);
    return;
}
send(proxy_socket, buffer, read_size, 0);
// Note: a single recv returns at most sizeof(buffer) bytes of the response
read_size = recv(proxy_socket, buffer, sizeof(buffer), 0);
if (read_size > 0)
{
    send(client_socket, buffer, read_size, 0);
}
close(proxy_socket);
close(client_socket);
If all goes well, the client should see the response from the server. We can test this out using curl, passing our proxy with the -x option:
# proxy running at localhost 5100
$ curl -x 'localhost:5100' http://google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
This looks good! The response is identical to the one we’d get if we hit http://google.com directly. Our proxy works.
HTTPS and beyond
Unfortunately, our proxy is missing something quite important. Unencrypted HTTP traffic is quite rare these days, and most services enforce HTTPS. If we were to change our above curl request to:
$ curl -x 'localhost:5100' https://google.com
Our proxy would not work. This is because for HTTPS, a client connecting to a secured server via a proxy sends the following request:
CONNECT google.com:443 HTTP/1.1
As opposed to a simple GET request:
GET http://google.com/ HTTP/1.1
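The CONNECT method asks the proxy to open a TCP tunnel between the client and the end server; HTTPS is by far its most common use. Once the tunnel is up, the client performs the TLS handshake with the end server through it, so the proxy only relays encrypted bytes.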
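Unlike the GET request, a CONNECT request line carries no scheme or path: its target is just a host and a port, and the HTTP specification requires the port. A sketch of pulling those two values out, using a hypothetical parse_connect_target helper that rejects targets without a port (IPv6 literals such as [::1]:443 are not handled):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extract host and port from a request line like
 * "CONNECT google.com:443 HTTP/1.1". Returns 0 on success, -1 if the line
 * is not a CONNECT request, lacks a port, or the host does not fit. */
int parse_connect_target(const char *line, char *host, size_t host_size, int *port)
{
    const char *prefix = "CONNECT ";
    size_t prefix_len = strlen(prefix);
    if (strncmp(line, prefix, prefix_len) != 0)
        return -1;

    const char *target = line + prefix_len;
    size_t target_len = strcspn(target, " "); /* target ends at the next space */
    const char *colon = memchr(target, ':', target_len);
    if (colon == NULL)
        return -1; /* the port is mandatory in a CONNECT target */

    size_t host_len = (size_t)(colon - target);
    if (host_len == 0 || host_len >= host_size)
        return -1;

    memcpy(host, target, host_len);
    host[host_len] = '\0';
    *port = atoi(colon + 1); /* atoi stops at the space before the version */
    return 0;
}
```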
We will explore building an HTTPS forward proxy later, but for now we can return a 501 Not Implemented response and let the tech debt accrue:
if (strcmp(request.method, "CONNECT") == 0)
{
    printf("Client requesting secure tunnel via https\n");
    // todo: implement https tunnel
    const char *response = "HTTP/1.1 501 Not Implemented\r\n\r\n";
    write(client_socket, response, strlen(response));
    close(client_socket);
    return;
}
Lastly, our server is fairly rudimentary. Introducing multi-threaded request handling would be a nice optimization as we build out more features.
Find the full code for this post here.