HTTPS partially protects your privacy, but it still leaves a trail
Unlike the fine folks at The SSL Store, the majority of the world doesn’t spend its time thinking about encryption. In fact, it’s a concept most people have a rudimentary knowledge of, at best.
Take HTTPS, for instance, which despite being practically mandatory in 2019 still isn’t widely understood. We talk a lot about how the mere presence of HTTPS does not an honest site make, and how bad guys can activate the green padlock, too. Well, today we’re going to address another point of confusion. We’ll answer the question: what does HTTPS protect? Then we’ll talk about what it doesn’t, and how you can protect that, too.
Let’s hash it out.
HTTPS 101: Covering the Basics
Let’s start with SSL/TLS, HTTPS and what it all means. We’ll try to keep it brief, but we’re going to get into TCP/IP layers – territory we haven’t covered – so bear with me. It will be important later.
Let’s start with HTTP or the Hypertext Transfer Protocol. Hypertext is text that includes references (aka hyperlinks) to other text that the reader can navigate to. That’s the technology websites are largely built on. If you look at a website’s code it’s a combination of text, formatting and references to other resources.
HTTP was the protocol designed to transmit hypertext (hence the name) and, following its creation in 1989, it quickly became the backbone of the internet.
At the time HTTP was adequate given the nature of the early internet, used primarily by the government and academia for the free transfer of information. At that point commercial activity was banned on the internet and concepts like digital privacy and data rights hadn’t even been considered yet.
Obviously, today, that is no longer the case. Owing to the ubiquity of the modern internet, and the range of sensitive functions it now handles, privacy and security are now seen as profoundly important.
Enter SSL/TLS and HTTPS. HTTPS is literally HTTP over SSL/TLS. It uses public key cryptography to establish a shared symmetric session key, which the client and server then use to communicate securely for the duration of the connection.
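To make the "shared session key" idea a little more concrete, here is a minimal sketch of two parties arriving at the same symmetric key without ever sending it over the wire. It uses a toy Diffie-Hellman exchange, which is one of the ways a TLS handshake can agree on a key; the prime, the generator and the function names are assumptions chosen for illustration, and nothing here is fit for real-world use.

```python
import hashlib
import secrets

# Toy parameters, assumed for illustration only. Real TLS uses vetted
# groups or elliptic curves, plus certificates to authenticate the server.
P = 2**127 - 1  # a Mersenne prime
G = 3


def make_keypair():
    """Return a (private, public) pair for the toy exchange."""
    private = secrets.randbelow(P - 3) + 2  # random value in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def derive_session_key(own_private, peer_public):
    """Combine our private value with the peer's public value."""
    shared = pow(peer_public, own_private, P)
    # Hash the shared secret down to a 32-byte symmetric key.
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


# Only the public values cross the network.
client_private, client_public = make_keypair()
server_private, server_public = make_keypair()

client_key = derive_session_key(client_private, server_public)
server_key = derive_session_key(server_private, client_public)
```

Both sides end up holding the same 32-byte key, which is what lets them switch to faster symmetric encryption for the rest of the connection.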
TCP/IP layers and HTTPS
Let’s cover some new territory and talk about TCP/IP layers, which will help clarify the discussion we’re going to have in a minute. There are two different models that describe how the various communication protocols work together to make the internet function. They’re broken down into layers. There is the 7-layer OSI model and the 4-layer TCP/IP model.
We’re going to focus on the latter for simplicity’s sake.
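The TCP/IP model has four layers. From top to bottom they are the application layer, the transport layer, the network (or internet) layer and the network access layer. Before we get any further, let’s not get too hung up on the nomenclature; it varies slightly from organization to organization. Focus on the concepts.
Again, I’m simplifying a bit here. Each layer has more protocols than I’m going to name (FTP, DHCP, etc.). Let’s keep it as simple as we can.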
Now, why are we talking about layers and what does it all mean? Well, the way that this model categorizes internet connections at four different levels is helpful in understanding how the internet works on a technical level and that will help explain what HTTPS does and doesn’t make private.
Let’s go bottom to top. On the lowest, network access layer you have physical connections to the internet and MAC addresses, which are unique identifiers for network interface controllers. This level handles how data is physically exchanged within a network. It’s really not too important to dwell on this level as it is completely unaffected by this discussion.
One level up you have the network layer, sometimes called the internet layer. This is one layer where the TCP/IP and OSI models mostly agree, though they differ on which characteristics a protocol placed in this layer is allowed to have. In the TCP/IP model, the internet layer is really just a subset of the OSI network layer’s functionality as it specifically pertains to internetworking. Don’t worry about what that means, it’s literally only included because I can foresee it coming up in the comments.
What you really need to know about the Network layer is that it’s the layer where packets are passed from source to destination (network to network) via routers. This is the portion of your connection that can be traced by running a tracert command (traceroute on macOS and Linux) in your console.
The Transport layer handles end-to-end communication. This is where the data being sent is split into packets, given the correct header and handed down to the network layer for transmission. There are two primary protocols used in this layer, TCP and UDP.
TCP, or the Transmission Control Protocol, provides reliable communication between end points. Data is segmented and sequenced, then directed to the correct ports. It uses a checksum to detect data that was corrupted in transit, and lost or damaged segments are retransmitted. While TCP is reliable, that reliability comes with overhead, meaning it’s actually a double-edged sword that can add latency to a connection.
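If you’re curious what that checksum looks like, here is a minimal sketch of the 16-bit ones’ complement arithmetic used by the Internet checksum. The function names are my own, and this only covers the arithmetic: a real TCP checksum is also computed over a pseudo-header containing the IP addresses, which is left out here.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over the given bytes."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


def arrived_intact(data: bytes, expected: int) -> bool:
    """Receiver side: recompute the checksum and compare."""
    return internet_checksum(data) == expected


payload = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(payload)  # 0x220D for this input
```

Flip a single byte of the payload and `arrived_intact` returns False, which is the cue for TCP to have that segment sent again.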
UDP, or User Datagram Protocol, on the other hand, makes no delivery guarantees. It’s actually connectionless, too. Let’s not stray too far into the weeds, but a connection requires an arrangement established between both parties: before any data can be transmitted, the receiving party needs to be ready to receive it. Not so with connectionless protocols.
Here’s a practical example: say I want to tell our resident IT sage, Ross Thomas, that I inadvertently wrecked my Bluetooth dongle (don’t ask). Were I to dial his extension, wait for him to pick up his desk phone and then tell him about my dongle it would be akin to TCP.
If I just stood up and started screaming across the office at Ross about my dongle, regardless of whether he was paying attention, that would be UDP (and potentially an uncomfortable chat with HR).
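The phone call versus shouting difference shows up directly in code. Below is a rough sketch using Python’s standard socket module on the loopback address; the helper names are hypothetical, and the exact behavior when nobody is listening can vary a little between operating systems.

```python
import socket

LOOPBACK = "127.0.0.1"


def unused_port(kind):
    """Ask the OS for a free port, then release it so nothing is listening."""
    with socket.socket(socket.AF_INET, kind) as s:
        s.bind((LOOPBACK, 0))
        return s.getsockname()[1]


def udp_shout(port, message):
    """UDP: just send. No handshake, no check that anyone is there."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        return s.sendto(message, (LOOPBACK, port))  # bytes handed to the OS


def tcp_call(port, message):
    """TCP: the other side has to pick up before any data is sent."""
    try:
        with socket.create_connection((LOOPBACK, port), timeout=2) as s:
            s.sendall(message)
            return True
    except OSError:
        return False  # nobody answered, so nothing was transmitted
```

With no listener on either port, `udp_shout` still reports the message as sent, while `tcp_call` fails before a single byte of the message leaves.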
Back to the topic at hand. Finally, at the top level we have the application layer, which is the level that most internet users are used to engaging with. That’s because the application level is the point at which user requests are initiated. There are a ton of different protocols that operate on the application level, including:
- SSH
- FTP
- SMTP
- DNS
- HTTP/HTTPS
- X Window
You don’t need to know what each one of those means, but you should be able to at least recognize several. SSH is Secure Shell; it provides secure remote access. FTP is the File Transfer Protocol. SMTP handles email. DNS handles domain name lookups. And of course, HTTP/HTTPS.
Now let’s tie this all together…
TCP/IP: Understanding internet connections
This actually makes a whole lot more sense when you see it in motion, so let’s start at the application level. And maybe a better way to think of the application layer is as the surface layer. Because it’s where most of these requests are going to start. That’s why it has so many available protocols, because it covers such a wide range of functions – everything from mail to file transfers to making sure the time is correct.
So starting from the application layer, there’s a request made. For the sake of this exercise we’ll say it has something to do with email, but it could be just about anything. The application layer is going to take that request, which is really just data, and it’s going to transmit it to the transport layer via the correct port.
A port is basically a specialized connection point; it’s how a client specifies a particular service on a device in a network. Put simply, designating a specific port tells the device what service is being requested. So, for an HTTP request to a website, port 80 would be used. Port 443 is for HTTPS. Ports 20 and 21 are for FTP (data and control, respectively), while port 22 is for SSH. Back to our example, email (SMTP) requests use port 25.
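These default ports are baked into most networking libraries. As a quick illustration, the sketch below reads them from constants in Python’s standard library; the `port_for` helper and the example.com URLs are placeholders of my own.

```python
import ftplib
import http.client
import smtplib
from urllib.parse import urlsplit

# Default ports as defined by Python's standard library modules.
DEFAULT_PORTS = {
    "http": http.client.HTTP_PORT,    # 80
    "https": http.client.HTTPS_PORT,  # 443
    "ftp": ftplib.FTP_PORT,           # 21 (FTP control connection)
    "smtp": smtplib.SMTP_PORT,        # 25
}


def port_for(url):
    """Return the explicit port in a URL, or the default for its scheme."""
    parts = urlsplit(url)
    return parts.port or DEFAULT_PORTS[parts.scheme]
```

So `port_for("https://example.com/")` gives 443, while an explicit port such as `http://example.com:8080/` overrides the default.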
So, the application designates the correct port for the data and sends it along to the transport layer, where it’s going to be converted into data packets and sent to the destination device. Before it’s transmitted, a header is affixed corresponding to the protocol that will be used for the end-to-end connection, be that TCP or UDP.
We mentioned earlier that TCP is for reliable communication whereas UDP is used when that’s not a priority. The TCP header includes information about the sequencing of the packets, a checksum to ensure data integrity, etc. When the data arrives via TCP, the device on the other end uses the information contained in the header to reconstruct the data and ensure that it arrived intact.
Now that the data has been assigned a port, segmented into packets and given the proper headers, it drops down to the network layer where it’s routed to its intended destination, typically via the Internet Protocol (IP). This is done by affixing an IP header that contains the source and destination IP addresses. I’m not going to go too deep into routing because the aforementioned Ross Thomas went in-depth about it last year and that’s worth checking out on its own.
The TCP/IP Network Access layer comprises the physical and data link layers of the OSI model, and this is actually one place where crossing over might help clear some things up. Whereas the top three layers dictate how the message should be segmented and transmitted from network to network, the data link layer deals with things like the MAC addresses of the devices themselves, within the network. The physical layer, meanwhile, converts the data into a transmittable format like electrical signals for wired connections and electromagnetic waves for wireless ones.
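Here is the whole trip down the stack as a small sketch: each layer simply wraps what it was handed in its own header. The header layouts are heavily simplified and are not the real TCP, IP or Ethernet formats, and every address, port and function name is made up for illustration.

```python
import socket
import struct


def add_transport_header(payload, src_port, dst_port, sequence):
    """Transport layer: ports plus a sequence number (8 bytes here)."""
    return struct.pack("!HHI", src_port, dst_port, sequence) + payload


def add_network_header(segment, src_ip, dst_ip):
    """Network layer: source and destination IPv4 addresses (8 bytes)."""
    return socket.inet_aton(src_ip) + socket.inet_aton(dst_ip) + segment


def add_link_header(packet, src_mac, dst_mac):
    """Network access layer: destination and source MAC addresses (12 bytes)."""
    to_bytes = lambda mac: bytes.fromhex(mac.replace(":", ""))
    return to_bytes(dst_mac) + to_bytes(src_mac) + packet


# Application data for our email example, headed to port 25.
message = b"MAIL FROM:<me@example.com>"
segment = add_transport_header(message, src_port=49152, dst_port=25, sequence=1)
packet = add_network_header(segment, "192.0.2.10", "198.51.100.7")
frame = add_link_header(packet, "02:00:00:00:00:aa", "02:00:00:00:00:bb")
```

Notice that the application data sits at the very end of the frame, with every lower layer’s header in front of it. That ordering matters for the next section.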
So, what does HTTPS protect?
Despite what its name would imply, TLS or Transport Layer Security doesn’t always occur at the transport layer. That’s largely due to the fact that HTTPS is an application layer protocol. Now, there’s a difference between TLS and HTTPS. TLS has other applications; it can be used in a lot more ways than most people realize. But we’re focusing on HTTPS.
Remember at the top of the article we talked about the misperceptions about HTTPS? Well, beyond the biggest – that if a website displays the green padlock it must be safe – another one of the more pervasive ones is that HTTPS keeps your internet browsing completely private.
This is true, to an extent, but not nearly as true as many people would like to believe. The reason we just trudged through the TCP/IP model is to give you an idea of what is – and more important what isn’t – made private by the use of HTTPS. I apologize if any of that felt tedious.
Because when it comes to encryption and the security it provides, everything points up. Or, put another way, the security benefits don’t move down the stack. Security applied at the application layer doesn’t do anything for what’s occurring at the transport or network layers.
When you make an HTTPS connection, the device you’re using opens a TCP connection to port 443 on the server. Once that connection begins, it’s only encrypting application-layer data, meaning the information exchanged between the client and server – messages, cookies, the content on a web page, etc.
HTTPS means your ISP, or anyone else for that matter, can’t see the specific pages you’re accessing or the communication you’re having during the connection.
But, as we just covered, there’s also a lot of data that isn’t being protected, much of it capable of identifying the user even if an attacker can’t directly see what they’re doing. Inferring who someone is or what they’re up to from this kind of metadata is sometimes called fingerprinting.
HTTPS doesn’t hide:
- The website or server with which the user connected
- The frequency with which connections occurred
- The sizes of the messages being exchanged
- The destination server port
- The user’s IP address
- The user’s location
The last three are particularly troublesome for internet users living in countries with repressive oversight of their online activities. The destination port, in particular, is one reason why privacy advocates have come out in support of DNS over HTTPS instead of DNS over TLS. DNS over TLS uses a dedicated port (853), which makes it easy to single out, whereas DNS over HTTPS uses port 443, like all other HTTPS traffic.
That’s one more thing HTTPS doesn’t protect against unless you’re specifically configured for DNS over HTTPS: DNS requests.
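To put the split in concrete terms, here is a small sketch that takes an HTTPS URL and sorts its parts into what someone watching the network can still learn and what travels inside the encrypted connection. The function name and the example URL are hypothetical, and it only covers the URL itself, not the IP addresses, timing or message sizes listed above.

```python
from urllib.parse import urlsplit


def split_visibility(url):
    """Separate the observable parts of an HTTPS URL from the encrypted ones."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("this sketch only models HTTPS URLs")
    # The server you connect to and the destination port are not hidden.
    visible = {"host": parts.hostname, "port": parts.port or 443}
    # The page you asked for is application-layer data, so it is encrypted.
    encrypted = {"path": parts.path or "/", "query": parts.query}
    return visible, encrypted


visible, encrypted = split_visibility("https://example.com/account/orders?id=42")
# visible   -> {'host': 'example.com', 'port': 443}
# encrypted -> {'path': '/account/orders', 'query': 'id=42'}
```

An onlooker learns that you talked to example.com on port 443, but not which page you requested or what came back.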
This is why you need a VPN
HTTPS is an excellent first step towards a more private internet, but it’s not enough. Especially for marginalized people living in countries without full internet freedom. This is where a VPN can help to fill in some of those privacy gaps.
Now, VPNs aren’t foolproof either. There are some considerations to be made, but that’s another article. Let’s keep this at a conceptual level. A VPN can be set up at a couple of different levels of the TCP/IP stack. You can set one up at the network level, typically on a router, and encrypt everything from that layer on up. You can set one up on the transport layer, or you can keep things on the application layer.
The main drawback to VPNs is that they typically take some self-configuration. But that small obstacle is outweighed by the benefit of being able to protect more of your privacy. Before you do anything online, you sign into your VPN client, which connects you to its servers. Everything exchanged between your network or device and the VPN server, including your DNS requests, is encrypted.
Most good VPNs have servers located around the world, which you can log into remotely. This is called tunneling, and it essentially allows you to spoof your IP address. Rather than use your own, you use one assigned by the VPN’s server. You could be in Florida logged into a Japanese server watching Japanese Netflix as if you were actually in the country.
You can probably see how being able to surf the internet using an IP from another location would help add an additional level of anonymity. It also helps cover up a lot of the information left exposed by HTTPS. That’s because all of the data that would typically point back to you now points back to your VPN instead. Someone attempting to spy on you might know that you’re using a VPN, but beyond that everything is hidden.
Of course, if you’re using an untrustworthy VPN it could all be for naught, because the VPN still has that information about you. But again, another discussion for another day.
Some additional steps you can take to avoid being fingerprinted:
- Use your browser’s incognito/private mode
- Configure your browser to send no-cache HTTP headers
- Turn off telemetry and targeted advertising
- Disable all cookies
The most important thing you could take away from today’s article is that while HTTPS is a critical component of any security strategy, it doesn’t protect everything. It doesn’t keep you completely private.
You need to go a little bit above and beyond for that.
As always leave any comments or questions below…