CMPSC 311, Fall 2018
Proxy Lab: Writing a Caching Web Proxy
Assigned: Wed, Nov 14, 2018
Due: Wed, Dec 5, 11:59 PM
Last Possible Time to Turn In: Fri, Dec 07, 11:59 PM
1 Introduction
A Web proxy is a program that acts as a middleman between a Web browser and an end server. Instead of
contacting the end server directly to get a Web page, the browser contacts the proxy, which forwards the
request on to the end server. When the end server replies to the proxy, the proxy sends the reply on to the
browser.
Proxies are useful for many purposes. Sometimes proxies are used in firewalls, so that browsers behind a
firewall can only contact a server beyond the firewall via the proxy. Proxies can also act as anonymizers:
by stripping requests of all identifying information, a proxy can make the browser anonymous to Web
servers. Proxies can even be used to cache web objects by storing local copies of objects from servers and
then responding to future requests by reading them out of the cache rather than by communicating again with
remote servers.
In this lab, you will write a simple HTTP proxy that caches web objects. For the first part of the lab, you will
set up the proxy to accept incoming connections, read and parse requests, forward requests to web servers,
read the servers’ responses, and forward those responses to the corresponding clients. This first part will
involve learning about basic HTTP operation and how to use sockets to write programs that communicate
over network connections. In the second part, you will add caching to your proxy using a simple main
memory cache of recently accessed web content.
2 Logistics
This is an individual project.
3 Handout instructions
Download the proxylab-handout.tar file from Canvas. Copy the handout file to a protected directory
on the Linux machine where you plan to do your work, and then issue the following command:
linux> tar xvf proxylab-handout.tar
This will generate a handout directory called proxylab-handout. The README file describes the
various files.
4 Part I: Implementing a sequential web proxy
The first step is implementing a basic sequential proxy that handles HTTP/1.0 GET requests. Other request
types, such as POST, are strictly optional.
When started, your proxy should listen for incoming connections on a port whose number will be specified
on the command line. Once a connection is established, your proxy should read the entirety of the request
from the client and parse the request. It should determine whether the client has sent a valid HTTP request;
if so, it can then establish its own connection to the appropriate web server and request the object the client
specified. Finally, your proxy should read the server’s response and forward it to the client.
4.1 HTTP/1.0 GET requests
When an end user enters a URL such as http://web.mit.edu/index.html into the address bar
of a web browser, the browser will send an HTTP request to the proxy that begins with a line that might
resemble the following:
GET http://web.mit.edu/index.html HTTP/1.1
In that case, the proxy should parse the request into at least the following fields: the hostname, web.mit.edu;
and the path or query and everything following it, /index.html. That way, the proxy can determine that
it should open a connection to web.mit.edu and send an HTTP request of its own starting with a line of
the following form:
GET /index.html HTTP/1.0
Note that all lines in an HTTP request end with a carriage return, ‘\r’, followed by a newline, ‘\n’. Also
important is that every HTTP request is terminated by an empty line: "\r\n".
You should notice in the above example that the web browser’s request line ends with HTTP/1.1, while
the proxy’s request line ends with HTTP/1.0. Modern web browsers will generate HTTP/1.1 requests, but
your proxy should handle them and forward them as HTTP/1.0 requests.
It is important to consider that HTTP requests, even just the subset of HTTP/1.0 GET requests, can be
incredibly complicated. The textbook describes certain details of HTTP transactions, but you should refer
to RFC 1945 for the complete HTTP/1.0 specification. Ideally your HTTP request parser will be fully
robust according to the relevant sections of RFC 1945, except for one detail: while the specification allows
for multiline request fields, your proxy is not required to properly handle them. Of course, your proxy
should never prematurely abort due to a malformed request.
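As a minimal sketch of the first parsing step described above, the following splits a request line into its three fields and rejects anything that is not a well-formed GET. The names parse_request_line and MAXFIELD are hypothetical and are not part of the handout; a full parser would also follow RFC 1945 more closely.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXFIELD 2048   /* hypothetical bound on each field, not from the handout */

/*
 * Split a request line such as
 *   "GET http://web.mit.edu/index.html HTTP/1.1\r\n"
 * into method, URI, and version. Each output buffer must hold MAXFIELD bytes.
 * Returns 0 on success, -1 if the line is malformed or is not a GET request.
 */
int parse_request_line(const char *line, char *method, char *uri, char *version)
{
    assert(line && method && uri && version);

    /* %2047s bounds each field so an overlong line cannot overflow a buffer. */
    if (sscanf(line, "%2047s %2047s %2047s", method, uri, version) != 3)
        return -1;
    if (strcmp(method, "GET") != 0)
        return -1;                      /* only GET is required in this lab */
    if (strncmp(version, "HTTP/", 5) != 0)
        return -1;
    return 0;
}
```

Returning an error code instead of exiting lets the caller close only the offending connection, which matches the requirement that a malformed request must not abort the proxy.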
4.2 Request headers
The important request headers for this lab are the Host, User-Agent, Connection, and Proxy-Connection
headers:
Always send a Host header. While this behavior is technically not sanctioned by the HTTP/1.0
specification, it is necessary to coax sensible responses out of certain Web servers, especially those
that use virtual hosting.
The Host header describes the hostname of the end server. For example, to access
http://web.mit.edu/index.html, your proxy would send the following header:
Host: web.mit.edu
It is possible that web browsers will attach their own Host headers to their HTTP requests. If that is
the case, your proxy should use the same Host header as the browser.
You may choose to always send the following User-Agent header:
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.3)
Gecko/20120305 Firefox/10.0.3
The header is provided on two separate lines because it does not fit as a single line in the writeup, but
your proxy should send the header as a single line.
The User-Agent header identifies the client (in terms of parameters such as the operating system
and browser), and web servers often use the identifying information to manipulate the content they
serve. Sending this particular User-Agent: string may improve, in content and diversity, the material
that you get back during simple telnet-style testing.
Always send the following Connection header:
Connection: close
Always send the following Proxy-Connection header:
Proxy-Connection: close
The Connection and Proxy-Connection headers are used to specify whether a connection
will be kept alive after the first request/response exchange is completed. It is perfectly acceptable
(and suggested) to have your proxy open a new connection for each request. Specifying close as
the value of these headers alerts web servers that your proxy intends to close connections after the
first request/response exchange.
For your convenience, the value of the described User-Agent header is provided to you as a string
constant in proxy.c.
Finally, if a browser sends any additional request headers as part of an HTTP request, your proxy should
forward them unchanged.
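The header rules above can be sketched as two small helpers: one that filters each header line received from the browser, and one that appends the fixed headers and the terminating empty line. This is only an illustration under stated assumptions: the function names, the user_agent_hdr constant name, and the choice to always replace the browser's User-Agent are assumptions, and the caller is assumed to pass the host exactly as it should appear in the Host header.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Assumed name; the handout provides its own constant in proxy.c. */
static const char *user_agent_hdr =
    "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.3) "
    "Gecko/20120305 Firefox/10.0.3\r\n";

/* Return 1 if line starts with the header name (case-insensitive). */
static int is_header(const char *line, const char *name)
{
    while (*name) {
        if (tolower((unsigned char)*line++) != tolower((unsigned char)*name++))
            return 0;
    }
    return 1;
}

/* Append src to the NUL-terminated string in out; -1 if it does not fit. */
static int append(char *out, size_t cap, const char *src)
{
    size_t used = strlen(out), n = strlen(src);
    if (used + n + 1 > cap)
        return -1;
    memcpy(out + used, src, n + 1);
    return 0;
}

/* Process one header line from the browser (including its "\r\n"). */
int filter_header(char *out, size_t cap, const char *line, int *saw_host)
{
    assert(out && line && saw_host);
    if (is_header(line, "Connection:") ||
        is_header(line, "Proxy-Connection:") ||
        is_header(line, "User-Agent:"))
        return 0;                   /* dropped: replaced by the fixed values */
    if (is_header(line, "Host:"))
        *saw_host = 1;              /* reuse the browser's Host header */
    return append(out, cap, line);  /* everything else is forwarded unchanged */
}

/* Add Host (if the browser sent none), the fixed headers, and the empty line. */
int finish_headers(char *out, size_t cap, const char *host, int saw_host)
{
    char hostline[512];
    assert(out && host);
    if (!saw_host) {
        int n = snprintf(hostline, sizeof hostline, "Host: %s\r\n", host);
        if (n < 0 || (size_t)n >= sizeof hostline || append(out, cap, hostline) < 0)
            return -1;
    }
    if (append(out, cap, user_agent_hdr) < 0 ||
        append(out, cap, "Connection: close\r\n") < 0 ||
        append(out, cap, "Proxy-Connection: close\r\n") < 0)
        return -1;
    return append(out, cap, "\r\n"); /* empty line terminates the request */
}
```

The output buffer must start as an empty string; the request line itself would be written before these headers.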
4.3 Port numbers
There are two significant classes of port numbers for this lab: HTTP request ports and your proxy’s listening
port.
The HTTP request port is an optional field in the URL of an HTTP request. That is, the URL may be of
the form, http://cse-cmpsc311.cse.psu.edu:8080, in which case your proxy should connect
to the host cse-cmpsc311.cse.psu.edu on port 8080 instead of the default HTTP port, which is port
80. Your proxy must properly function whether or not the port number is included in the URL.
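One possible way to handle the optional port is sketched below. The function parse_uri and its signature are hypothetical; it keeps the port as a string (convenient for passing to getaddrinfo), defaults it to "80", and defaults the path to "/" when the URL has none. It does not range-check the port or handle every URL form.

```c
#include <assert.h>
#include <string.h>

/*
 * Split "http://host[:port][/path]" into host, port, and path.
 * Returns 0 on success, -1 on a malformed URL or a too-small buffer.
 */
int parse_uri(const char *uri, char *host, size_t hostcap,
              char *port, size_t portcap, char *path, size_t pathcap)
{
    const char *p = uri;
    size_t len;

    assert(uri && host && port && path);
    if (strncmp(p, "http://", 7) == 0)  /* scheme compared case-sensitively */
        p += 7;

    len = strcspn(p, ":/");             /* host ends at ':' or '/' or end */
    if (len == 0 || len >= hostcap)
        return -1;
    memcpy(host, p, len);
    host[len] = '\0';
    p += len;

    if (*p == ':') {                    /* explicit port */
        p++;
        len = strspn(p, "0123456789");
        if (len == 0 || len >= portcap)
            return -1;
        memcpy(port, p, len);
        port[len] = '\0';
        p += len;
        if (*p != '/' && *p != '\0')
            return -1;                  /* junk after the digits */
    } else {                            /* default HTTP port */
        if (portcap < 3)
            return -1;
        strcpy(port, "80");
    }

    if (*p == '\0') {                   /* no path given: request the root */
        if (pathcap < 2)
            return -1;
        strcpy(path, "/");
    } else {
        if (strlen(p) >= pathcap)
            return -1;
        strcpy(path, p);
    }
    return 0;
}
```

For example, http://cse-cmpsc311.cse.psu.edu:8080 yields the host cse-cmpsc311.cse.psu.edu, the port 8080, and the path /.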
The listening port is the port on which your proxy should listen for incoming connections. Your proxy
should accept a command line argument specifying the listening port number for your proxy. For example,
with the following command, your proxy should listen for connections on port 8081:
linux> ./proxy 8081
You may select any non-privileged listening port (greater than 1,024 and less than 65,536) as long as it
is not used by other processes. Since each proxy must use a unique listening port and many people will
simultaneously be working on each machine, the script port-for-user.pl is provided to help you
pick your own personal port number. Use it to generate a port number based on your user ID:
linux> ./port-for-user.pl droh
droh: 45806
The port, p, returned by port-for-user.pl is always an even number. So if you need an additional
port number, say for the Tiny server, you can safely use ports p and p + 1.
Please don’t pick your own random port. If you do, you run the risk of interfering with another user.
5 Part II: Caching your Requests
For the second part of the lab, you will add a cache to your proxy that stores recently-used Web objects in
memory. HTTP actually defines a fairly complex model by which web servers can give instructions as to
how the objects they serve should be cached and clients can specify how caches should be used on their
behalf. However, your proxy will adopt a simplified approach.
When your proxy receives a web object from a server, it should cache it in memory as it transmits the object
to the client. If another client requests the same object from the same server, your proxy need not reconnect
to the server; it can simply resend the cached object.
Obviously, if your proxy were to cache every object that is ever requested, it would require an unlimited
amount of memory. Moreover, because some web objects are larger than others, it might be the case that
one giant object will consume the entire cache, preventing other objects from being cached at all. To avoid
those problems, your proxy should have both a maximum cache size and a maximum cache object size.
5.1 Maximum cache size
The entirety of your proxy’s cache should have the following maximum size:
MAX_CACHE_SIZE = 16 MB (16777216 Bytes)
When calculating the size of its cache, your proxy must only count bytes used to store the actual web objects;
any extraneous bytes, including metadata, should be ignored.
5.2 Maximum object size
Your proxy should only cache web objects that do not exceed the following maximum size:
MAX_OBJECT_SIZE = 8 MB (8388608 Bytes)
For your convenience, both size limits are provided as macros in proxy.c.
The easiest way to implement a correct cache is to allocate a buffer for the active connection and accumulate
data as it is received from the server. If the size of the buffer ever exceeds the maximum object size, the
buffer can be discarded. If the entirety of the web server’s response is read before the maximum object size
is exceeded, then the object can be cached. Using this scheme, the maximum amount of data your proxy
will ever use for web objects is the following:
MAX_CACHE_SIZE + MAX_OBJECT_SIZE
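The accumulate-and-discard scheme described above might look like the following sketch. The objbuf_t type and its functions are hypothetical names; the caller is assumed to supply storage of MAX_OBJECT_SIZE bytes and to call objbuf_append for every chunk it forwards to the client.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_OBJECT_SIZE 8388608   /* value from the writeup */

typedef struct {
    char  *buf;       /* caller-provided storage */
    size_t cap;       /* capacity of buf, normally MAX_OBJECT_SIZE */
    size_t len;       /* bytes accumulated so far */
    int    overflow;  /* set once the object is too large to cache */
} objbuf_t;

void objbuf_init(objbuf_t *o, char *storage, size_t cap)
{
    assert(o && storage);
    o->buf = storage;
    o->cap = cap;
    o->len = 0;
    o->overflow = 0;
}

/* Record a chunk that was just read from the server. */
void objbuf_append(objbuf_t *o, const char *chunk, size_t n)
{
    assert(o && chunk);
    if (o->overflow)
        return;                 /* already discarded; keep forwarding only */
    if (n > o->cap - o->len) {  /* would exceed the maximum object size */
        o->overflow = 1;
        o->len = 0;
        return;
    }
    memcpy(o->buf + o->len, chunk, n);  /* memcpy keeps binary data intact */
    o->len += n;
}

/* After the whole response is read: may this object be cached? */
int objbuf_cacheable(const objbuf_t *o)
{
    return !o->overflow;
}
```

Forwarding to the client continues regardless of the overflow flag; only the decision to cache depends on it.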
5.3 Eviction policy
For your sequential proxy, the cache should employ a least-recently-used (LRU) eviction policy. Note that
both reading an object from the cache and writing it into the cache count as using the object.
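A minimal sketch of an LRU cache that follows these rules is shown below, using a doubly linked list whose head is the most recently used entry. All type and function names are hypothetical. The limits are passed to cache_init so that small values can be used in tests; a proxy would pass MAX_CACHE_SIZE and MAX_OBJECT_SIZE. The sketch assumes cache_put is called only after a miss, so it does not check for duplicate keys.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct entry {
    char *key;                  /* for example, the requested URL */
    char *data;                 /* the web object's bytes */
    size_t size;                /* only these bytes count toward the limit */
    struct entry *prev, *next;
} entry_t;

typedef struct {
    entry_t *head, *tail;       /* head = most recent, tail = least recent */
    size_t used, max_total, max_object;
} cache_t;

void cache_init(cache_t *c, size_t max_total, size_t max_object)
{
    assert(c);
    c->head = c->tail = NULL;
    c->used = 0;
    c->max_total = max_total;
    c->max_object = max_object;
}

static void unlink_entry(cache_t *c, entry_t *e)
{
    if (e->prev) e->prev->next = e->next; else c->head = e->next;
    if (e->next) e->next->prev = e->prev; else c->tail = e->prev;
    e->prev = e->next = NULL;
}

static void push_front(cache_t *c, entry_t *e)
{
    e->prev = NULL;
    e->next = c->head;
    if (c->head) c->head->prev = e; else c->tail = e;
    c->head = e;
}

/* Look up key; a hit counts as a use and moves the entry to the front. */
const entry_t *cache_get(cache_t *c, const char *key)
{
    entry_t *e;
    assert(c && key);
    for (e = c->head; e; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            unlink_entry(c, e);
            push_front(c, e);
            return e;
        }
    }
    return NULL;
}

/* Copy an object into the cache, evicting LRU entries as needed.
 * Returns 0 if cached, -1 if the object is too large or memory ran out. */
int cache_put(cache_t *c, const char *key, const char *data, size_t size)
{
    entry_t *e;
    assert(c && key && data);
    if (size > c->max_object || size > c->max_total)
        return -1;
    if ((e = malloc(sizeof *e)) == NULL)
        return -1;
    e->key = malloc(strlen(key) + 1);
    e->data = malloc(size ? size : 1);
    if (!e->key || !e->data) {
        free(e->key); free(e->data); free(e);
        return -1;
    }
    strcpy(e->key, key);
    memcpy(e->data, data, size);
    e->size = size;
    while (c->used + size > c->max_total) {   /* evict from the LRU end */
        entry_t *old = c->tail;
        unlink_entry(c, old);
        c->used -= old->size;
        free(old->key); free(old->data); free(old);
    }
    push_front(c, e);                         /* writing counts as a use */
    c->used += size;
    return 0;
}
```

The linear search keeps the sketch short; a hash table keyed by URL would make lookups faster but is not required by the writeup.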
6 Evaluation
This assignment will be graded out of a total of 55 points:
BasicCorrectness: 30 points for basic proxy operation
Cache: 25 points for a working cache
6.1 Autograding
Your handout materials include an autograder, called driver.sh, that your instructor will use to get
preliminary scores for BasicCorrectness and Cache. From the proxylab-handout directory:
linux> ./driver.sh
You must run the driver on a Linux machine.
The autograder does only simple checks to confirm that your code is acting like a caching proxy. For the
final grade, we will do additional manual testing to see how your proxy deals with real pages. Here is a list
of some pages that still use the HTTP protocol (as of Nov. 14, 2018) that you can use for testing.
http://web.mit.edu
http://www.espn.com
http://www.bbc.com
http://cse-cmpsc311.cse.psu.edu:8080
6.2 Robustness
As always, you must deliver a program that is robust to errors and even malformed or malicious input.
Servers are typically long-running processes, and web proxies are no exception. Think carefully about how
long-running processes should react to different types of errors. For many kinds of errors, it is certainly
inappropriate for your proxy to immediately exit.
Robustness implies other requirements as well, including invulnerability to error cases like segmentation
faults and a lack of memory leaks and file descriptor leaks.
7 Testing and debugging
Besides the simple autograder, you will not have any sample inputs or a test program to test your implementation.
You will have to come up with your own tests and perhaps even your own testing harness to help
you debug your code and decide when you have a correct implementation. This is a valuable skill in the real
world, where exact operating conditions are rarely known and reference solutions are often unavailable.
Fortunately there are many tools you can use to debug and test your proxy. Be sure to exercise all code paths
and test a representative set of inputs, including base cases, typical cases, and edge cases.
7.1 Tiny web server
Your handout directory contains the source code for the CS:APP Tiny web server. While not as powerful as thttpd,
the CS:APP Tiny web server will be easy for you to modify as you see fit. It’s also a reasonable starting
point for your proxy code. And it’s the server that the driver code uses to fetch pages.
7.2 telnet
As described in your textbook (11.5.3), you can use telnet to open a connection to your proxy and send
it HTTP requests.
7.3 curl
You can use curl to generate HTTP requests to any server, including your own proxy. It is an extremely
useful debugging tool. For example, if your proxy and Tiny are both running on the local machine, Tiny is
listening on port 8080, and proxy is listening on port 8081, then you can request a page from Tiny via your
proxy using the following curl command:
$ curl -v --proxy localhost:8081 http://localhost:8080
* About to connect() to proxy localhost port 8081 (#0)
* Trying ::1... Connection refused
* Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 8081 (#0)
> GET http://localhost:8080/ HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.27.1 zlib/1.2.3 libidn/1.18 libssh2/1.4.2
> Host: localhost:8080
> Accept: */*
> Proxy-Connection: Keep-Alive
>
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Server: Tiny Web Server
< Connection: close
< Content-length: 121
< Content-type: text/html
<
test
Dave O’Hallaron
* Closing connection #0
7.4 netcat
netcat, also known as nc, is a versatile network utility. You can use netcat just like telnet, to open
connections to servers. Hence, imagining that your proxy were running on localhost using port 8081
you can do something like the following to manually test your proxy:
$ nc localhost 8081
GET http://cse-cmpsc311.cse.psu.edu:8080 HTTP/1.1
HTTP/1.0 200 OK
MIME-Version: 1.0
Content-Type: text/html
Content-Length: 40922
....
In addition to being able to connect to Web servers, netcat can also operate as a server itself. With the
following command, you can run netcat as a server listening on port 12345:
sh> nc -l 12345
Once you have set up a netcat server, you can generate a request to a phony object on it through your
proxy, and you will be able to inspect the exact request that your proxy sent to netcat.
7.5 Web browsers
Eventually you should test your proxy using the most recent version of Mozilla Firefox. Visiting About Firefox
will automatically update your browser to the most recent version.
To configure Firefox to work with a proxy, visit
Preferences>Advanced>Network>Settings
It will be very exciting to see your proxy working through a real Web browser. Although the functionality of
your proxy will be limited, you will notice that you are able to browse the vast majority of websites through
your proxy.
An important caveat is that you must be very careful when testing caching using a Web browser. All modern
Web browsers have caches of their own, which you should disable before attempting to test your proxy’s
cache.
8 Handin instructions
The provided Makefile includes functionality to build your final handin file. Issue the following command
from your working directory:
linux> make handin
The output is the file ../proxylab-handin.tar, which you can then hand in.
Please make sure that the proxylab-handin.tar file you submit really works. You should download your submitted
version, unpack it in a fresh directory, run make, and test the generated proxy program. This is the last project
of the semester, and you will not have a chance to resubmit if you give us the wrong copy.
Submit the proxylab-handin.tar file to Canvas.
Chapters 10-11 of the textbook contain useful information on system-level I/O, network programming,
and HTTP protocols.
RFC 1945 (http://www.ietf.org/rfc/rfc1945.txt) is the complete specification for the
HTTP/1.0 protocol.
9 Hints
As discussed in Section 10.11 of your textbook, using standard I/O functions for socket input and
output is a problem. Instead, we recommend that you use the Robust I/O (RIO) package, which is
provided in the csapp.c file in the handout directory.
The error-handling functions provided in csapp.c are not appropriate for your proxy because once a
server begins accepting connections, it is not supposed to terminate. You’ll need to modify them or
write your own.
You are free to modify the files in the handout directory any way you like. For example, for the sake
of good modularity, you might implement your cache functions as a library in files called cache.c
and cache.h. Of course, adding new files will require you to update the provided Makefile.
As discussed in the Aside on page 964 of the CS:APP3e text, your proxy must ignore SIGPIPE signals
and should deal gracefully with write operations that return EPIPE errors.
Sometimes, calling read to receive bytes from a socket that has been prematurely closed will cause
read to return -1 with errno set to ECONNRESET. Your proxy should not terminate due to this
error either.
Remember that not all content on the web is ASCII text. Much of the content on the web is binary
data, such as images and video. Ensure that you account for binary data when selecting and using
functions for network I/O.
Forward all requests as HTTP/1.0 even if the original request was HTTP/1.1.
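The SIGPIPE and EPIPE hints above can be illustrated with the sketch below. It is a hand-written stand-in, not the RIO package from csapp.c, and the names ignore_sigpipe and write_all are hypothetical. The idea is to ignore SIGPIPE once at startup and to have the write helper report errors to its caller rather than exit.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Call once at startup: a write to a closed socket then fails with
 * EPIPE instead of terminating the whole proxy. */
void ignore_sigpipe(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/*
 * Write all n bytes of buf to fd. Works on raw bytes, so binary
 * content such as images is handled the same way as text.
 * Returns 0 on success, -1 on error with errno set.
 */
int write_all(int fd, const void *buf, size_t n)
{
    const char *p = buf;

    assert(buf != NULL || n == 0);
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* EPIPE, ECONNRESET, ...: caller decides */
        }
        p += w;
        n -= (size_t)w;
    }
    return 0;
}
```

On a -1 return the caller would typically close the descriptors for that one connection and go back to accepting new ones.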
Good luck!