HTTP in brief

This document gives a (rough and incomplete) overview of the HTTP protocol, focusing mainly on usage of web services.

Table of Contents

Introduction
URIs
Content negotiation
Request methods
Status code
APIs
curl: Test using the command line
References
Appendices
- Various
- URIs

Introduction

HTTP, Hypertext Transfer Protocol, is a W3C / IETF standard that describes a way of exchanging information between a client and a server. It is ubiquitous on the internet.

In an HTTP exchange:

a client sends an HTTP request to a server
the server sends an HTTP response back to the client

An HTTP message is either an HTTP request or an HTTP response. The client is typically a browser (such as the one you use when you browse the internet).

Example 1. An HTTP exchange

A client sends an HTTP request, targeted to http://example.com/mycat, specifying that the chosen HTTP method is HTTP GET.
The server replies with an HTTP response with a header “content-type = image/jpeg” and a message body containing an image of a cat.

An HTTP message contains headers and possibly a message body (this document considers pseudo-headers or start-line data as headers, for simplicity).

The headers for an HTTP request specify the URI that the request targets, the request method to be used for the exchange, and possibly other information such as the encodings used in the message body, …
The headers for an HTTP response specify the status code (three digits that indicate whether something went wrong), and possibly other information such as the content-type of the message body, …

HTTP requests target resources, identified by URIs (Uniform Resource Identifiers). For example, the URI http://example.com/mycat might identify the resource describing someone’s cat.

URIs

Some example URIs:

https://www.google.com/maps?saddr=Tour+Eiffel&daddr=Paris-Dauphine
https://github.com/oliviercailloux/java-course/search?q=hello
https://www.airbnb.fr/s/Paris/all?checkin=2018-09-03&checkout=2018-09-07
https://en.wikipedia.org/wiki/John_Rawls#Biography

URIs are composed as follows (parenthesis refer to some of the examples above):

scheme (http, https)
authority (www.google.com, github.com)
path (oliviercailloux/java-course/search)
possibly a question mark followed by a query string (saddr=Tour+Eiffel&daddr=Paris-Dauphine)
possibly a dash followed by a fragment (#Biography)

Furthermore, HTTP conventions dictate that the query string be composed of pairs of parameter and values separated by equal signs, each pair being separated by an ampersand.

Content negotiation

Resources may be representable in different formats. For example, http://example.com/mycat may be available as a JPEG image, a BMP image, an HTML description, or a plain text description. Standards define “MIME types” (also called media types) to represent these formats. Examples: image/jpeg, image/bmp, text/html;charset=utf-8, text/plain;charset=utf-8 (official list).

An HTTP request may include an Accept header that indicates the client’s preference regarding content types to be received. Similarly, the HTTP response may include a Content-Type header that indicates the format of the message body it is serving.

Request methods

Request methods indicate the request semantics: what the client wishes to do in relation with the targeted resource.

A GET request method asks the server for a current representation of the target resource. This is the method your browser uses when you type a URI in your browser navigation bar and hit enter, and by far the most common request method. A GET request typically has no message body.

A POST request method, on the contrary, typically has a message body. It asks the server to perform resource-specific processing on the message body. It is typically used to post new data to a server, for example, to post a new image of your cat for it to become accessible at some URI.

RFC 7231 lists the request methods available in HTTP, and defines properties of request methods. Of special importance are safe methods (essentially read-only) and idempotent methods (repeated identical requests have the same effect as a single request). Safe implies idempotent. GET is both, POST is neither. (That’s important for caching, for example.)

Status code

RFC 7231 defines status codes: each HTTP response must specify exactly one, which can be e.g. 200 OK, 404 Not Found, or may indicate that the request has not been understood, that the resource has moved, …

APIs

“Manual” surfing mostly involve HTML pages and images being displayed in your browser. In supplement, many major websites nowadays permit automated usage of their content. This is made possible through an API of theirs. For example, https://www.mediawiki.org/wiki/API:Main_page documents how to query data stored on a media wiki powered web site (such as Wikipedia). ProgrammableWeb lists many such APIs.

Such APIs generally rely on content negotiation and use appropriate request methods and status codes, as explained above.

curl: Test using the command line

GET requests may be sent by simply directing your browser to the right adress. Other request methods (and messages requiring specific header values) can’t be sent so easily with a browser. I recommend to use curl: a command line tool that sends HTTP requests and displays the response sent by the server.

Quick help:

curl http://example.com/page curl will send a GET request to the given URI and print out the response received in return from the server
curl --include http://example.com/page The --include option tells curl to include the received HTTP headers in the output
curl --data "name=daniel&skill=lousy" http://example.com/page curl will send a POST request to the given URI, passing the data to the server using the content-type application/x-www-form-urlencoded (in the same way that a browser does when a user has filled in an HTML form and presses the submit button)

References

Official doc for curl. curl is available in your favorite linux distribution. Other OSes: try here (untested by this author), write to me if you know more.
Wget is an alternative to curl. It is available in your favorite linux distribution. Other OSes: try here (untested by this author).
You might also want to try HTTPie
Cookies
Postman

Appendices

Various

HTTP/2 is standardized by W3C as RFC 7540 (HTTP/1.1 was previously defined under RFC 2616, now obsolete).
A presentation (in French) about Open Data: L’Open Data à la loupe.
Some web sites voluntarily do not make their data automatically extractable: example. Check legal conditions before collecting data.
HTTP conventions for the representation in query strings as &-separated pairs relate to the HTML form element when used as a GET method.

URIs

RFC 3986: Uniform Resource Identifier (URI) Generic Syntax, 2005 (obsoletes RFC 2396).
A URI with authority has the form scheme://authority path [?query][#fragment] (URIs also exist with different forms such as mailto:John.Doe@example.com, tel:+1-816-555-1212, urn:oasis:names:specification:docbook:dtd:xml:4.1.2…)
Characters in [letters of the basic latin alphabet, digits, and “unreserved characters” -._~] must not be percent-encoded
“Reserved characters” :$&'()*,;=` that are explicitly allowed for in the specification of the chosen scheme when used accordingly (thus including `&` and ` in a query string in the http scheme) must not be percent-encoded
Other characters must be percent-encoded

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HTTP.adoc

HTTP.adoc

HTTP in brief

Introduction

URIs

Content negotiation

Request methods

Status code

APIs

curl: Test using the command line

References

Appendices

Various

URIs

Files

HTTP.adoc

Latest commit

History

HTTP.adoc

File metadata and controls

HTTP in brief

Introduction

URIs

Content negotiation

Request methods

Status code

APIs

curl: Test using the command line

References

Appendices

Various

URIs