Crawly.Pipelines.WriteToFile (Crawly v0.17.0)

Stores a given item on the filesystem.

Pipeline Lifecycle:

  1. When run (by Crawly.Utils.pipe), creates a file descriptor if not already created.
  2. Performs the write operation
  3. The file descriptor is reused by passing it through the pipeline state under the :write_to_file_fd key.
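The lifecycle above can be illustrated with a minimal sketch. This is not Crawly's actual source: the module name WriteToFileSketch, the fixed file name sketch_output, and the use of File.cwd!() as the default folder are assumptions made for illustration. It assumes the item has already been encoded to a string by an earlier pipeline.

```elixir
defmodule WriteToFileSketch do
  # Hypothetical stand-in for the pipeline, illustrating the lifecycle only.

  # A descriptor is already in the state: reuse it and write the item.
  def run(item, %{write_to_file_fd: fd} = state, _opts) do
    :ok = IO.write(fd, [item, "\n"])
    {item, state}
  end

  # No descriptor yet: open the file once, store it in the state, then write.
  def run(item, state, opts) do
    folder = Keyword.get(opts, :folder, File.cwd!())
    extension = Keyword.get(opts, :extension, "jl")

    # Create the folder if it does not exist.
    File.mkdir_p!(folder)

    # "sketch_output" is an assumed file name, not the one Crawly generates.
    path = Path.join(folder, "sketch_output." <> extension)
    fd = File.open!(path, [:append, :utf8])

    run(item, Map.put(state, :write_to_file_fd, fd), opts)
  end
end
```

The first call opens the file and returns a state containing :write_to_file_fd; later calls that receive this state match the first clause and skip the open step. The descriptor is never closed explicitly, which mirrors the note below.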

Note: Calling File.close is not necessary, because the file descriptor is closed automatically when the parent process terminates.

Refer to https://github.com/oltarasenko/crawly/pull/19#discussion_r350599526 for relevant discussion.

Options

If no tuple-based options are passed, the pipeline falls back to the config of :crawly, Crawly.Pipelines.WriteToFile for the :folder and :extension keys.

  • :folder, optional. The folder in which the file will be created. Defaults to the current project's folder. If the provided folder does not exist, it is created.
  • :extension, optional. The extension of the file that will be created. Defaults to jl.
  • :include_timestamp, boolean, optional, true by default. Adds a timestamp to the filename.
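As a sketch of the config fallback described above, the :folder and :extension keys could be set under the module's own config entry. The file location config/config.exs and the values shown are assumptions for illustration.

```elixir
# config/config.exs (assumed location)
import Config

# Used only when the pipeline is declared without tuple-based options.
config :crawly, Crawly.Pipelines.WriteToFile,
  folder: "/tmp",
  extension: "jl"
```

With this in place, declaring the pipeline as plain Crawly.Pipelines.WriteToFile (without a tuple) would be expected to pick up these two values.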

Example Declaration

    pipelines: [
      Crawly.Pipelines.JSONEncoder,
      {Crawly.Pipelines.WriteToFile, folder: "/tmp", extension: "csv"}
    ]

Example Output

    iex> item = %{my: "item"}
    iex> WriteToFile.run(item, %{}, folder: "/tmp", extension: "csv")
    {%{my: "item"}, %{write_to_file_fd: #PID<0.123.0>}}