Down

Down is a utility tool for streaming, flexible and safe downloading of remote files.

Full documentation can be found at https://hexdocs.pm/down

Installation

Add down to your list of dependencies inmix.exs:

def deps do
  [
    {:down, "~> 0.0.1"}
  ]
end

Install via mix deps.get.

Why another HTTP library?

The library's main functionality is to create streams from external remote HTTP files. It can also be useful for the following reasons:

  • It offers safer downloads with size limitation.
  • It offers a simple API with good defaults to "just download" content.
  • It offers a thin wrapper--with a normalized API--that allows you to use different HTTP backend libraries.
  • Low memory consumption.
  • Not extra dependencies.

Maximum size

If you're accepting URLs from outside or untrusted source, it's a good policy to limit the size of the download, because attackers are always looking to over work your servers. Down allows you to pass a :max_size option:

iex> Down.download("http://example.com/image.jpg", max_size: 5 * 1024 * 1024) # 5 MB
{:error, :too_large}

What is the advantage of using Down instead of simply checking the size after downloading? Down terminates the download very early, as soon as it gets to the Content-Length header. When the Content-Length header is missing, Down terminates the download as soon as the downloaded content surpasses the :max_size option.

:max_size can be used in any Down operation.

Redirections

Down handles redirects using the :max_redirects option. In the case a request produces more redirects than the given option it returns an error:

iex> Down.download("http://example.com/100_redirects.html", max_redirects: 1)
{:error, :too_many_redirects}

Downloading

The primary function is Down.download/2 which stores the remote file as a temporary file. The returned Down.Download struct contains more information about the request including file_path with the location of the temporary file:

iex> {:ok, download} = Down.download("http://localhost:6080/robots.txt")
{:ok, %Down.Download{
  backend: Down.HackneyBackend,
  file_path: "/tmp/f-1552846757-26630-1bzq5eq.txt",
  original_filename: "robots.txt",
  request: %{
    backend_opts: nil,
    body: nil,
    headers: %{"User-Agent" => "Down/0.0.1"},
    method: :get,
    timeout: 5000,
    url: "http://localhost:6080/robots.txt"
  },
  response: %{
    encoding: nil,
    headers: %{
      "access-control-allow-credentials" => "true",
      "access-control-allow-origin" => "*",
      "connection" => "keep-alive",
      "content-length" => "30",
      "content-type" => "text/plain",
      "date" => "Sun, 17 Mar 2019 18:19:17 GMT",
      "server" => "gunicorn/19.9.0"
    },
    size: 30,
    status_code: 200
  },
  size: 30
}}
iex> download.file_path
"/tmp/f-1552846757-26630-1bzq5eq.txt"

Destination

By default the remote file will be downloaded into a temporary location. If you would like the file to be downloaded to a specific location on disk, you can specify the :destination option:

iex> Down.download("http://example.com/image.jpg", destination: "/path/to/destination")
{:ok, %Down.download{
  file_path: "/path/to/destination"
}}

Read

If the remote content is the only thing you need, Down.read/2 function is a good option. It returns a string with the full content:

iex> Down.read("https://google.com")
{:ok,
 "<!doctype html><html itemscope=\"\" itemtype=\"http://schema.org/WebPage" <> ...}

Streaming

Down has the ability to retrieve remote file content as it is being downloaded. The Down.stream function returns a stream which can be used with functions in the Stream and Enum module functions.

The content of the remote file is streamed in binary chunks. Because streams are lazy, the chunks are only downloaded on demand. This means that if only the first chunk is requested, the rest of the file isn't downloaded and the operation won't consume more resources. This is very convenient when working with large files, or when we're only interested in some parts of the file, or when keeping low memory consumption is important.

In the following example we are downloading the Elixir webpage content, but only until the title is reached:

# This example doesn't cover all the possibilities
# and it doesn't close the connection
iex> Down.stream("https://elixir-lang.org") |> Enum.find_value(&Regex.run(~r/\<title\>(.*)\<\/title\>/i, &1, capture: :all_but_first))
["Elixir"]

Architecture

Every Down operation creates a new process under a custom DynamicSupervisor. However, Down doesn't have any kind of pool connection or a maximum connection number limitation. :hackney and :ibrowser backends use their own connection pool, so if you're using one of these backends you're automatically using its connection pool.

Backends

There are four optional backends in Down:

The default is :httpc which comes by default in Erlang, so no additional dependencies are needed to use Down. It can be changed globally using the following config:

config :down, :backend, :hackney

The backend can also be changed in runtime and per request, simply by using the :backend argument:

iex> Down.read("https://example.com/api.json", backend: :ibrowser)

Down and :hackney have proper SSL (https) support.

Testing

To run the test correctly, you need to have an httpbin server running on port 6080. The best way to do it is using the kennethreitz/httpbin docker image:

docker run -p 6080:80 -d kennethreitz/httpbin

TODO

Before publishing a stable version the following tasks should be done:

  • Check the minimum compatible version for the backends dependencies.
  • Improve tests.
  • Improve docs.
  • Fix timeouts
  • API to close streams
  • API to fetch metadata information from a stream
  • Handle form data
  • Handle cookies

Acknowledgment

This library is heavily inspired on the awesome Ruby gem down