EvaluateReview (EvaluateReview v0.1.0)

Documentation for EvaluateReview.

Summary

Functions

Cache review results.

Load cached review results.

Uses Floki to recursively match a given list of selectors. CSS selectors are often shared by many different tags on a page, so combining several of them helps narrow the selection down to a single tag.

Read JSON from a file.

Scrape DealerRater reviews.

Scrape a list of URLs.

Classify overly positive reviews.

Functions

cache(review_list, filename)

Specs

cache([tuple()], String.t()) :: :ok

Cache review results.

TODO: encode this as JSON rather than binary.

credit to https://elixirforum.com/u/benwilson512

Caches review lists as binary data to avoid unnecessary repeat scraping and to minimize suspicion from the scraped site.
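The binary caching described above can be sketched with Erlang's external term format. This is a minimal sketch assuming the cache is produced with `:erlang.term_to_binary/1` and written with `File.write!/2`; the module name `ReviewCache` is hypothetical and this is not necessarily how `cache/2` is implemented.

```elixir
defmodule ReviewCache do
  # Hypothetical sketch: serialize the review list into Erlang's external term
  # format and write the resulting binary to `filename`.
  @spec cache([{String.t(), String.t()}], String.t()) :: :ok
  def cache(review_list, filename) do
    binary = :erlang.term_to_binary(review_list)
    # File.write!/2 returns :ok on success and raises on failure.
    File.write!(filename, binary)
  end
end
```

Usage: `ReviewCache.cache(reviews, "/tmp/reviews.bin")` after a scrape; later runs can read the file back instead of hitting the site again.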

load_from_cache(filename)

Specs

load_from_cache(String.t()) :: [tuple()]

Load cached review results.

credit to https://elixirforum.com/u/benwilson512
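A matching load step, as a minimal sketch assuming the file was written with `:erlang.term_to_binary/1`. `ReviewCacheLoader` is a hypothetical name, and the `:safe` option is an extra precaution added here, not something the original documents.

```elixir
defmodule ReviewCacheLoader do
  # Hypothetical sketch: read the cached binary and decode it back into terms.
  # The :safe option makes decoding fail rather than create new atoms or
  # external function references, a sensible guard if the cache file could
  # have been modified by something other than cache/2.
  @spec load_from_cache(String.t()) :: [tuple()]
  def load_from_cache(filename) do
    filename
    |> File.read!()
    |> :erlang.binary_to_term([:safe])
  end
end
```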

match_selectors(list, document)

Specs

match_selectors(list(), list()) :: [{tuple(), tuple()}]

Uses Floki to recursively match a given list of selectors. CSS selectors are often shared by many different tags on a page, so combining several of them helps narrow the selection down to a single tag.
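The narrowing idea can be illustrated without Floki. This stdlib-only sketch works on Floki-style node tuples (`{tag, attributes, children}`, with text nodes as binaries) and treats each selector as a bare class name. `SelectorSketch` and `find_by_class/2` are hypothetical; in the real module a Floki call such as `Floki.find/2` would do the per-selector search.

```elixir
defmodule SelectorSketch do
  # Returns every element in `nodes` (including the nodes themselves and all of
  # their descendants) whose class attribute contains `class`. Text nodes,
  # comments and other non-element nodes are skipped.
  def find_by_class(nodes, class) do
    Enum.flat_map(nodes, fn
      {tag, attrs, children} = node
      when is_binary(tag) and is_list(attrs) and is_list(children) ->
        matched = if has_class?(attrs, class), do: [node], else: []
        matched ++ find_by_class(children, class)

      _other ->
        []
    end)
  end

  defp has_class?(attrs, class) do
    case List.keyfind(attrs, "class", 0) do
      {"class", value} -> class in String.split(value)
      nil -> false
    end
  end

  # Narrow the document one selector at a time: each pass searches only within
  # (and including) the nodes kept by the previous pass. Enum.uniq/1 drops
  # duplicates that arise when a kept node is nested inside another kept node.
  def match_selectors(classes, document) do
    Enum.reduce(classes, document, fn class, acc ->
      acc |> find_by_class(class) |> Enum.uniq()
    end)
  end
end
```

With `["italic", "font-18"]`, the first pass keeps every `.italic` tag and the second keeps only those that also carry `.font-18`, which is how the combination singles out the reviewer name.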

Specs

read_json(String.t()) :: map()

Read JSON from a file.

credit to https://elixirforum.com/u/idi527

Examples

iex> filename = "/tmp/test.json"
iex> EvaluateReview.read_json(filename)

Specs

scrape(String.t(), map()) :: [tuple()]

Scrape DealerRater reviews.

This function attempts to scrape reviews from the passed-in URL and returns a list of tuples: the first element is the review itself and the second is the username of the reviewer.

The simple CSS selectors used are .review-content for the content of the review itself, and the combination of .italic and .font-18 for the username of the reviewer. This is an intentional shortcut. A slightly more robust approach might use the .review-container selector instead, since it seems less likely to change. I found it relatively bloated, however, and so opted for a quicker approach that felt more elegant.

A future version might accept user-defined selectors rather than hard-coded ones, but since the use case is currently very narrowly defined (solely scraping reviews from dealerrater.com), that approach seemed unnecessarily complicated.
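The shape of the return value can be illustrated with a stdlib-only sketch. It assumes the texts matched by .review-content and by the .italic/.font-18 combination have already been extracted in document order (in the real module that extraction is Floki's job); `ScrapeSketch` and `pair_reviews/2` are hypothetical names.

```elixir
defmodule ScrapeSketch do
  # Hypothetical helper: pairs review texts with reviewer names, in document
  # order, producing the {review, reviewer} tuples described above.
  # Enum.zip/2 stops at the shorter list, so if the two selectors match a
  # different number of tags, the unmatched entries are silently dropped.
  @spec pair_reviews([String.t()], [String.t()]) :: [{String.t(), String.t()}]
  def pair_reviews(review_texts, usernames) do
    Enum.zip(
      Enum.map(review_texts, &String.trim/1),
      Enum.map(usernames, &String.trim/1)
    )
  end
end
```

That silent truncation is the cost of the shortcut: two independent selectors only line up while every review has exactly one matching username tag, whereas a .review-container-based approach would keep each review and its reviewer together.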

Examples

iex> url = "https://web.archive.org/web/20201127110830/https://www.dealerrater.com/dealer/McKaig-Chevrolet-Buick-A-Dealer-For-The-People-dealer-reviews-23685/"
iex> reviews = EvaluateReview.scrape(url, [])
iex> Enum.each(reviews, fn {review, reviewer} -> IO.puts("review: #{review}, reviewer: #{reviewer}") end)

scrape_n(urls, selectors)

Specs

scrape_n([String.t()], map()) :: [tuple()]

Scrape a list of URLs.
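One plausible shape for scraping several pages, as a minimal sketch assuming the results of each page are simply concatenated. The extra `scrape_fun` parameter is hypothetical: it stands in for `EvaluateReview.scrape/2` so the sketch runs offline, and is not part of the documented API.

```elixir
defmodule ScrapeNSketch do
  # Scrapes each URL sequentially, in order, and flattens the per-page review
  # lists into one list of {review, reviewer} tuples.
  # `scrape_fun` is a hypothetical stand-in for EvaluateReview.scrape/2.
  @spec scrape_n([String.t()], term(), (String.t(), term() -> [tuple()])) :: [tuple()]
  def scrape_n(urls, selectors, scrape_fun) do
    Enum.flat_map(urls, fn url -> scrape_fun.(url, selectors) end)
  end
end
```

Sequential requests are a deliberate choice here: concurrent fetching (for example with `Task.async_stream/3`) would be faster but sends a burst of traffic to one site, which runs against the goal of minimizing suspicion.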

suspect_reviews(reviews, suspector \\ nil)

Specs

suspect_reviews([tuple()], function()) :: [tuple()]

Classify overly positive reviews

Takes a list of reviews in the format produced by EvaluateReview.scrape(url, []).

Produces a list of the top three offenders, ordered by severity.

The current criterion for a suspicious review is simply the number of exclamation points the review contains.

I tried passing the defaultSuspector function into suspect_reviews. Unfortunately, I could not find a way to define the function in this module AND make it available in that manner:

https://elixirforum.com/t/proposal-private-modules-general-discussion/19374/154

As such, a user can define their own suspector functions and pass them to suspect_reviews, but I can't seem to define them within this module.
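A minimal sketch of the exclamation-point criterion combined with the `suspector \\ nil` default from the documented signature. It assumes a suspector takes a `{review, reviewer}` tuple and returns a number, higher meaning more suspicious; `SuspectSketch` and `exclamation_count/1` are hypothetical names, not the module's actual code.

```elixir
defmodule SuspectSketch do
  # Default scoring: the number of "!" characters in the review text.
  def exclamation_count({review, _reviewer}) do
    review
    |> String.graphemes()
    |> Enum.count(&(&1 == "!"))
  end

  # Mirrors the documented signature: `suspector` defaults to nil, and a nil
  # value falls back to exclamation_count/1 inside the function body.
  def suspect_reviews(reviews, suspector \\ nil) do
    score = suspector || (&exclamation_count/1)

    reviews
    # Highest score first; Enum.sort_by/3 is stable, so ties keep input order.
    |> Enum.sort_by(score, :desc)
    |> Enum.take(3)
  end
end
```

Usage: `SuspectSketch.suspect_reviews(reviews, fn {text, _} -> String.length(text) end)` swaps in a user-defined suspector, here one that flags the longest reviews.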

Examples

iex> url = "https://web.archive.org/web/20201127110830/https://www.dealerrater.com/dealer/McKaig-Chevrolet-Buick-A-Dealer-For-The-People-dealer-reviews-23685/"
iex> reviews = EvaluateReview.scrape(url, [])
iex> top3 = EvaluateReview.suspect_reviews(reviews)
iex> IO.inspect(top3)