We are continuing along with the web request cycle. Last week we took a look at the HTTP protocol. There I already mentioned that HTTP requests and responses travel over a TCP/IP connection. Today we will dive a bit deeper into TCP/IP. This is technically not really necessary for understanding the request cycle because these lower levels of the network are completely abstracted away when you develop for the web (which is a fancy way of saying you get to use it without worrying about how it works). Yet, peeling the onion a bit further will turn out to be very useful to the overall understanding of how things work on the web.
In the Tech Tuesday on networking, I introduced the idea that the Internet is a packet-switched network. As a refresher, this means that data gets cut up into packets. The IP layer is responsible for how these packets move across the network. What follows is quite a bit of a simplification but good enough for our purposes here. Each packet (sometimes also referred to as a datagram) has its own header, which contains, among other things, the source and destination IP addresses. These packets travel between machines along flexible paths known as routes. There is a tool called traceroute for examining what these routes are, and it is worth trying out.
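To make the idea of a packet header a bit more concrete, here is a minimal Python sketch that packs and then unpacks a bare 20-byte IPv4 header. It is only an illustration: the function names are my own, the source address 192.168.1.20 is made up, the destination is simply the Google address from the traceroute output in this post, and the checksum is left at zero rather than computed.

```python
import socket
import struct

# Layout of a 20-byte IPv4 header without options, in network byte order.
IPV4_FORMAT = "!BBHHHBBH4s4s"

def build_ipv4_header(src: str, dst: str, payload_len: int) -> bytes:
    """Pack a minimal IPv4 header (hypothetical helper, checksum not computed)."""
    version_ihl = (4 << 4) | 5            # version 4, header length of 5 32-bit words
    return struct.pack(
        IPV4_FORMAT,
        version_ihl,
        0,                                # type of service
        20 + payload_len,                 # total length: header plus payload
        0,                                # identification
        0,                                # flags and fragment offset
        64,                               # time to live
        socket.IPPROTO_TCP,               # the payload is a TCP segment
        0,                                # header checksum, left at 0 in this sketch
        socket.inet_aton(src),            # source IP address
        socket.inet_aton(dst),            # destination IP address
    )

def parse_ipv4_header(header: bytes) -> dict:
    """Unpack the fields that matter for this discussion."""
    fields = struct.unpack(IPV4_FORMAT, header[:20])
    return {
        "version": fields[0] >> 4,
        "total_length": fields[2],
        "ttl": fields[5],
        "protocol": fields[6],
        "source": socket.inet_ntoa(fields[8]),
        "destination": socket.inet_ntoa(fields[9]),
    }

header = build_ipv4_header("192.168.1.20", "74.125.115.103", payload_len=100)
info = parse_ipv4_header(header)
print(info)
```

Every router along the way only needs to look at the destination field to decide where to forward the packet next.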
On a Mac, use Spotlight to find and start the “Terminal” application. You will get a new window with a prompt which lets you type commands (this is known as the command line and we will learn a lot more about it in a future Tech Tuesday). Type “traceroute google.com” and you will see output that looks something like the following:
 1 192.168.1.1 (192.168.1.1) 1.987 ms 0.864 ms 0.794 ms
 2 10.32.128.1 (10.32.128.1) 9.576 ms 8.510 ms 7.638 ms
 3 gig-0-3-0-7-nycmnya-rtr2.nyc.rr.com (24.29.97.130) 7.983 ms 8.371 ms 8.123 ms
 4 tenge-0-5-0-0-nycmnytg-rtr001.nyc.rr.com (24.29.150.90) 12.007 ms 12.481 ms nycmnytg-10g-0-0-0.nyc.rr.com (24.29.148.29) 14.716 ms
 5 bun6-nycmnytg-rtr002.nyc.rr.com (24.29.148.250) 18.132 ms 11.899 ms 12.706 ms
 6 ae-4-0.cr0.nyc30.tbone.rr.com (66.109.6.78) 7.120 ms 8.395 ms 8.113 ms
 7 ae-4-0.cr0.dca20.tbone.rr.com (66.109.6.28) 13.161 ms 66.109.9.30 (66.109.9.30) 14.679 ms ae-4-0.cr0.dca20.tbone.rr.com (66.109.6.28) 13.992 ms
 8 107.14.19.135 (107.14.19.135) 14.153 ms 12.694 ms ae-1-0.pr0.dca10.tbone.rr.com (66.109.6.165) 14.154 ms
 9 66.109.9.66 (66.109.9.66) 15.230 ms 74.125.49.181 (74.125.49.181) 13.553 ms 66.109.9.66 (66.109.9.66) 13.315 ms
10 209.85.252.46 (209.85.252.46) 17.017 ms 14.467 ms 209.85.252.80 (209.85.252.80) 15.536 ms
11 209.85.243.114 (209.85.243.114) 26.926 ms 209.85.241.222 (209.85.241.222) 25.348 ms 25.406 ms
12 216.239.48.103 (216.239.48.103) 25.799 ms 64.233.174.87 (64.233.174.87) 25.046 ms 216.239.48.103 (216.239.48.103) 32.101 ms
13 * 209.85.242.177 (209.85.242.177) 40.436 ms *
14 vx-in-f103.1e100.net (74.125.115.103) 25.568 ms 26.283 ms 26.659 ms
Each one of these numbered entries represents a so-called “hop”, meaning packets traveling between two internet devices. The three times shown for a hop are the round-trip times of three separate probes, and a “*” means that no reply came back for that probe. The first hop is from my computer to my home switch. The second hop is from there to my home VPN device, which is connected to a cable modem from Time Warner. From there the packets travel over a whole bunch more intermediate switches and routers until they get to a server operated by Google. You can try this with other servers as well, such as “traceroute www.dailylit.com”. If the output gets stuck with lines containing just “* * *” instead of information on hops, you can terminate the process by pressing Ctrl-C. For those of you on Windows, the equivalent command is “tracert”, run from the Command Prompt.
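If you want to do something with traceroute output beyond reading it, a few lines of Python are enough to pull out the hop numbers and timings. The following is a rough sketch, not a robust parser: the function name parse_hops is my own, the sample is four lines copied from the output above, and it assumes the output format shown in this post.

```python
import re

# Four hops copied from the traceroute output shown in this post.
SAMPLE = """\
 1 192.168.1.1 (192.168.1.1) 1.987 ms 0.864 ms 0.794 ms
 2 10.32.128.1 (10.32.128.1) 9.576 ms 8.510 ms 7.638 ms
13 * 209.85.242.177 (209.85.242.177) 40.436 ms *
14 vx-in-f103.1e100.net (74.125.115.103) 25.568 ms 26.283 ms 26.659 ms
"""

def parse_hops(text: str) -> list:
    """Extract hop number, round-trip times and unanswered probes per line."""
    hops = []
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(.*)", line)
        if not match:
            continue                      # skip anything that is not a hop line
        rest = match.group(2)
        # A timing is a decimal number directly followed by " ms".
        times = [float(t) for t in re.findall(r"(\d+\.\d+) ms", rest)]
        # Each "*" stands for a probe that got no reply.
        lost = rest.split().count("*")
        hops.append({"hop": int(match.group(1)), "times_ms": times, "lost": lost})
    return hops

hops = parse_hops(SAMPLE)
for hop in hops:
    if hop["times_ms"]:
        average = sum(hop["times_ms"]) / len(hop["times_ms"])
        print(f"hop {hop['hop']:>2}: average {average:.3f} ms, {hop['lost']} unanswered")
```

Note how the round-trip time does not grow evenly from hop to hop: most of the jump happens at a few hops where the packets cover real distance.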
Now the really important part to keep in mind about the IP level of the protocol is that it is strictly best effort. This means that packets can travel different routes, can get dropped and can arrive out of order at the destination. So how in the world do we get an HTTP request and response across such a fundamentally unreliable network? Well, that’s where the TCP portion comes in. TCP, the Transmission Control Protocol, sits on top of IP and provides reliable, in-order delivery of the data. How does it do that? Well, the details are complicated, but for our purposes it is sufficient to understand that it starts with some initial “handshaking” (back and forth) where the two endpoints (sender and receiver) agree on what they will do. Once that “connection” has been established, each piece of data carries a sequence number and the receiver acknowledges what it has received. That makes it possible to keep track of which packets have arrived and which have not, to resend packets that were dropped, and to put everything back in the right order.
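The basic idea of building reliability on top of an unreliable network can be shown with a toy simulation. To be clear, this is not how TCP is actually implemented: real TCP numbers bytes rather than whole segments and uses timers, windows and congestion control. All names below are my own, and the 30% loss rate is deliberately exaggerated.

```python
import random

def split_into_segments(data: bytes, size: int) -> dict:
    """Cut the data into chunks and give each one a sequence number."""
    return {seq: data[i:i + size]
            for seq, i in enumerate(range(0, len(data), size))}

def unreliable_network(segments: list, rng: random.Random, loss_rate: float = 0.3) -> list:
    """Stand-in for IP: drops some segments and delivers the rest out of order."""
    delivered = [s for s in segments if rng.random() >= loss_rate]
    rng.shuffle(delivered)
    return delivered

def transfer(data: bytes, size: int = 8, seed: int = 1):
    """Keep resending unacknowledged segments until everything has arrived."""
    rng = random.Random(seed)
    segments = split_into_segments(data, size)
    received = {}                         # receiver's buffer, keyed by sequence number
    rounds = 0
    while len(received) < len(segments):
        rounds += 1
        # The sender only resends what has not been acknowledged yet.
        unacked = [(seq, chunk) for seq, chunk in segments.items()
                   if seq not in received]
        for seq, chunk in unreliable_network(unacked, rng):
            received[seq] = chunk         # storing it doubles as the acknowledgement
    # Reassemble in sequence order, regardless of arrival order.
    result = b"".join(received[seq] for seq in range(len(segments)))
    return result, rounds

message = b"GET / HTTP/1.1 travels safely over an unreliable network"
result, rounds = transfer(message)
print(result == message, "after", rounds, "round(s) of sending")
```

The receiver ends up with exactly the bytes that were sent, even though the simulated network dropped and reordered segments along the way. The price is the extra rounds of resending, which is the overhead discussed next.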
What are some of the takeaways here? First, having fewer hops will generally make things faster. If you try different servers with traceroute, you will see that a lot of servers are more hops away than Google’s. Google has invested heavily in shortening the paths to their servers. This is also what so-called CDNs or Content Delivery Networks do. They bring content (e.g., images) closer to the “edge” of the network so that requests have fewer hops. Second, setting up a TCP connection involves a fair bit of overhead. In the first version of HTTP each request required a new connection, which was very inefficient. With HTTP 1.1 a single connection is kept alive for a sequence of requests and responses (a session). But there is still a separate connection required for each different server, and so a web page that pulls in resources from many different servers incurs more overhead. Third, if you really want a lot of speed it helps to reduce the number of packets that need to be sent. In the early days, the entire home page of Google was optimized to fit into a single packet.
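The second and third takeaways lend themselves to some back-of-the-envelope arithmetic. The sketch below rests on simplifying assumptions of my own: a 1500-byte Ethernet MTU, 20-byte IP and TCP headers without options, one round trip to set up a connection, and one round trip per request and response. Real networks are messier, so treat the numbers as illustrations only.

```python
import math

MTU = 1500                                # assumed Ethernet packet size limit, in bytes
IP_HEADER = 20                            # IPv4 header without options
TCP_HEADER = 20                           # TCP header without options
MSS = MTU - IP_HEADER - TCP_HEADER        # payload bytes that fit into one packet

def packets_needed(payload_bytes: int) -> int:
    """How many packets it takes to carry a response of the given size."""
    return max(1, math.ceil(payload_bytes / MSS))

def round_trips(requests: int, keep_alive: bool) -> int:
    """Round trips for a number of requests to one server (simplified model)."""
    handshakes = 1 if keep_alive else requests   # one connection setup each, or just one
    return handshakes + requests                 # plus one round trip per request/response

print("payload per packet:", MSS, "bytes")
print("packets for a 100,000 byte page:", packets_needed(100_000))
print("10 requests, new connection each time:", round_trips(10, keep_alive=False))
print("10 requests, one kept-alive connection:", round_trips(10, keep_alive=True))
```

Under these assumptions, a page that fits into a single packet has to stay below roughly 1,460 bytes, which gives a sense of how tightly that early Google home page was optimized.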