chrobot
Welcome to Chrobot! 🤖 This module exposes high-level functions for browser automation.
Some basic concepts:
- You’ll first want to launch an instance of the browser and receive a Subject which allows you to send messages to the browser (actor)
- You can open a Page, which makes the browser browse to a website, hold on to the returned Page, and pass it to functions in this module
- If you want to make raw protocol calls, you can use page_caller to create a callback to pass to protocol commands from your Page
- When you are done with the browser, you should call quit to shut it down gracefully
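The steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the module is imported as chrobot, that a Chrome installation can be found by launch, and that the URL and the h1 selector (both chosen only for illustration) match your target page.

```gleam
import chrobot
import gleam/io

pub fn main() {
  // Launch a browser and receive the Subject used to send it messages
  let assert Ok(browser) = chrobot.launch()
  // Shut the browser down once the rest of this function has run
  use <- chrobot.defer_quit(browser)

  // Open a page; the timeout (in milliseconds) is stored on the returned Page.
  // The URL is a placeholder chosen for this example.
  let assert Ok(page) =
    chrobot.open(with: browser, to: "https://example.com", time_out: 10_000)

  // Wait until the selector matches, then read the text of the element
  let assert Ok(heading) = chrobot.await_selector(on: page, select: "h1")
  let assert Ok(text) = chrobot.get_text(on: page, from: heading)
  io.println(text)
}
```

let assert is used here only to keep the sketch short; a real program would likely handle the error cases explicitly.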
The functions in this module just make calls to protocol/ modules. If you
would like to customize the behaviour, take a look at them to see how to make
direct protocol calls and pass different defaults.
Something to consider:
A lot of the functions in this module interpolate their parameters into
JavaScript expressions that are evaluated in the page context.
No attempt is made to escape the passed parameters or to prevent script injection through them.
You should not use the functions in this module with arbitrary strings if you want
to treat the pages you are operating on as a secure context.
Types
pub type CallArgument {
  StringArg(value: String)
  IntArg(value: Int)
  FloatArg(value: Float)
  BoolArg(value: Bool)
  ArrayArg(value: List(CallArgument))
}
Constructors
- StringArg(value: String)
- IntArg(value: Int)
- FloatArg(value: Float)
- BoolArg(value: Bool)
- ArrayArg(value: List(CallArgument))
pub type EncodedFile {
  EncodedFile(data: String, extension: String)
}
Constructors
- EncodedFile(data: String, extension: String)
Holds information about the current page, as well as the desired timeout in milliseconds to use when waiting for browser responses.
pub type Page {
  Page(
    browser: Subject(chrome.Message),
    time_out: Int,
    target_id: target.TargetID,
    session_id: target.SessionID,
  )
}
Constructors
- Page(browser: Subject(chrome.Message), time_out: Int, target_id: target.TargetID, session_id: target.SessionID)
Functions
pub fn as_value(
  result: Result(RemoteObject, RequestError),
  decoder: fn(Dynamic) -> Result(a, b),
) -> Result(a, RequestError)
Cast a RemoteObject into a value by passing a dynamic decoder.
This is a convenience for when you know a RemoteObject is returned by value and not ID,
and you want to extract the value from it.
You can chain this to eval or eval_async like so:
eval(page, "window.location.href")
|> as_value(dynamic.string)
pub fn await_load_event(
  browser: Subject(Message),
  page: Page,
) -> Result(Dynamic, RequestError)
Block until the page load event has fired.
Note that with local pages, the load event can often fire
before the handler is attached.
It’s best to use await_selector instead of this function.
pub fn await_selector(
  on page: Page,
  select selector: String,
) -> Result(RemoteObjectId, RequestError)
Continuously attempt to run a selector until it succeeds.
You can use this after opening a page, to wait for the moment it has initialized
sufficiently for you to run your automation on it.
The final result will be a single remote object id.
pub fn call_custom_function_on(
  callback: fn(String, Option(Json)) ->
    Result(Dynamic, RequestError),
  function_declaration function_declaration: String,
  object_id object_id: RemoteObjectId,
  args arguments: List(CallArgument),
  value_decoder value_decoder: fn(Dynamic) -> Result(a, b),
) -> Result(a, RequestError)
This is a version of runtime.call_function_on which allows
passing in arguments, and always returns the result as a value,
which will be decoded by the decoder you pass in.
You would use it with a JavaScript function declaration like this:
function my_function(my_arg) {
  // You can access the passed RemoteObject with `this`
  const wibble = this.getAttribute('href')
  // You have access to the arguments you passed in
  const wobble = 'hello ' + my_arg
  // You receive this return value, you should pass in a string decoder
  // in this case
  return wibble + wobble;
}
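On the Gleam side, a call with such a declaration might look like the sketch below. It relies only on the signatures documented on this page; the helper name greet_href and its parameters page and link are hypothetical, with link assumed to be a RemoteObjectId obtained earlier (for example from select). It also assumes a gleam_stdlib version that still provides dynamic.string, as the as_value example does.

```gleam
import chrobot
import gleam/dynamic

// Hypothetical helper: `page` is an opened Page and `link` is the
// RemoteObjectId of an element, both obtained elsewhere.
pub fn greet_href(page, link) {
  // The element is available as `this`, the argument as `my_arg`
  let declaration =
    "function my_function(my_arg) {
      return this.getAttribute('href') + ' hello ' + my_arg
    }"

  chrobot.call_custom_function_on(
    // page_caller turns the Page into the callback expected here
    chrobot.page_caller(page),
    function_declaration: declaration,
    object_id: link,
    args: [chrobot.StringArg("world")],
    // The JavaScript function returns a string, so decode a string
    value_decoder: dynamic.string,
  )
}
```

Passing the value through args rather than interpolating it into the declaration keeps it out of the JavaScript source text.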
pub fn create_page(
  with browser: Subject(Message),
  from html: String,
  time_out time_out: Int,
) -> Result(Page, RequestError)
Similar to open, but creates a new page from HTML that you pass to it.
The page will be created under the about:blank URL.
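A minimal sketch of create_page in use, for instance to exercise automation code against a fixed document. The HTML string, the #greeting selector and the 5000 ms timeout are made up for this example, and the module is assumed to be imported as chrobot.

```gleam
import chrobot
import gleam/io

pub fn main() {
  let assert Ok(browser) = chrobot.launch()
  use <- chrobot.defer_quit(browser)

  // Illustrative document; any HTML string can be passed here
  let html = "<html><body><p id=\"greeting\">Hello</p></body></html>"
  let assert Ok(page) =
    chrobot.create_page(with: browser, from: html, time_out: 5000)

  // Wait for the paragraph to exist before reading its text
  let assert Ok(paragraph) =
    chrobot.await_selector(on: page, select: "#greeting")
  let assert Ok(text) = chrobot.get_text(on: page, from: paragraph)
  io.println(text)
}
```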
pub fn defer_quit(
  browser: Subject(Message),
  body: fn() -> a,
) -> Result(Nil, CallError(Nil))
Convenience function that lets you defer quitting the browser until you are done with it.
It’s meant for a use expression like this:
let assert Ok(browser_subject) = browser.launch()
use <- browser.defer_quit(browser_subject)
// do stuff with the browser
pub fn eval(
  on page: Page,
  js expression: String,
) -> Result(RemoteObject, RequestError)
Evaluate some JavaScript on the page and return the result,
which will be a RemoteObject reference.
Check the protocol/runtime module for more info.
pub fn eval_async(
  on page: Page,
  js expression: String,
) -> Result(RemoteObject, RequestError)
Like eval, but awaits the result of the evaluation
and returns once the promise has been resolved.
pub fn eval_to_value(
  on page: Page,
  js expression: String,
) -> Result(RemoteObject, RequestError)
pub fn get_all_html(
  on page: Page,
) -> Result(String, RequestError)
pub fn get_attribute(
  on page: Page,
  from item: RemoteObjectId,
  name attribute_name: String,
) -> Result(String, RequestError)
Assuming the passed remote object reference is an Element,
return an attribute of that element.
Attributes are always returned as a string.
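As an illustration, select_all and get_attribute can be combined to collect an attribute from every matching element. The sketch below assumes an already opened Page; the helper name link_targets and the a[href] selector are hypothetical choices for this example.

```gleam
import chrobot
import gleam/list
import gleam/result

// Hypothetical helper: collect the href attribute of every link on `page`.
pub fn link_targets(page) {
  // Stop early with the RequestError if the selector query fails
  use links <- result.try(chrobot.select_all(on: page, matching: "a[href]"))

  // Read the attribute of each element, stopping at the first error
  list.try_map(links, fn(link) {
    chrobot.get_attribute(on: page, from: link, name: "href")
  })
}
```

Because both calls share the same error type, the helper yields either the full list of attribute strings or the first RequestError encountered.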
pub fn get_inner_html(
  on page: Page,
  from item: RemoteObjectId,
) -> Result(String, RequestError)
pub fn get_outer_html(
  on page: Page,
  from item: RemoteObjectId,
) -> Result(String, RequestError)
pub fn get_property(
  on page: Page,
  from item: RemoteObjectId,
  name property_name: String,
  property_decoder property_decoder: fn(Dynamic) -> Result(a, b),
) -> Result(a, RequestError)
Get a property of a remote object and decode it with the provided decoder.
pub fn get_text(
  on page: Page,
  from item: RemoteObjectId,
) -> Result(String, RequestError)
Note: Accesses the innerText property, not textContent.
pub fn launch() -> Result(Subject(Message), LaunchError)
Cleverly try to find a chrome installation and launch it with reasonable defaults.
- If CHROBOT_BROWSER_PATH is set, use that
- If a local chrome installation is found, use that
- If a system chrome installation is found, use that
- If none of the above, return an error
If you want to always use a specific chrome installation, take a look at launch_with_config or launch_with_env to set the path explicitly.
This function will validate that the browser launched successfully, and that the protocol version matches the one supported by this library.
pub fn launch_with_config(
  config: BrowserConfig,
) -> Result(Subject(Message), LaunchError)
Launch a browser with the given configuration.
To populate the arguments, use browser.get_default_chrome_args.
This function will validate that the browser launched successfully, and that the
protocol version matches the one supported by this library.
Example
let config =
  browser.BrowserConfig(
    path: "chrome/linux-116.0.5793.0/chrome-linux64/chrome",
    args: chrome.get_default_chrome_args(),
    start_timeout: 5000,
  )
let assert Ok(browser_subject) = launch_with_config(config)
pub fn launch_with_env() -> Result(Subject(Message), LaunchError)
Launch a browser, and read the configuration from environment variables. The browser path variable must be set; all others will fall back to a default.
This function will validate that the browser launched successfully, and that the protocol version matches the one supported by this library.
Configuration variables:
- CHROBOT_BROWSER_PATH - The path to the browser executable
- CHROBOT_BROWSER_ARGS - The arguments to pass to the browser, separated by spaces
- CHROBOT_BROWSER_TIMEOUT - The timeout in milliseconds to wait for the browser to start, must be an integer
- CHROBOT_LOG_LEVEL - The log level to use, one of silent, warnings, info, debug
pub fn open(
  with browser_subject: Subject(Message),
  to url: String,
  time_out time_out: Int,
) -> Result(Page, RequestError)
Open a new page in the browser.
Returns a response when the protocol call succeeds. Please use
await_selector to determine when the page is ready.
The timeout passed to this function will be attached to the returned
Page type to be reused by other functions in this module.
You can always adjust it using with_timeout.
pub fn page_caller(
  page: Page,
) -> fn(String, Option(Json)) -> Result(Dynamic, RequestError)
Create a callback to pass to protocol commands from a Page.
pub fn pdf(page: Page) -> Result(EncodedFile, RequestError)
Export the current page as PDF and return it as a base64 encoded string.
Transferring the encoded file from the browser to the chrome agent can take a long time,
depending on the document size.
Consider setting a larger timeout. You can use with_timeout on your existing Page to do this.
The Ok(result) of this function can be passed to to_file.
If you want to customize the settings of the output document, use print_to_pdf from protocol/page directly.
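The advice above might be applied as in the following sketch. The helper name save_pdf, the 30 second timeout and the output path "report" are assumptions made for illustration. This page does not say how to_file treats the file extension, so check its behaviour before relying on a particular file name.

```gleam
import chrobot

// Hypothetical helper: `page` is assumed to be an already opened Page.
pub fn save_pdf(page) {
  // Allow up to 30 seconds for the encoded document to be transferred
  let patient_page = chrobot.with_timeout(page, 30_000)
  let assert Ok(document) = chrobot.pdf(patient_page)

  // "report" is a placeholder output path chosen for this example
  chrobot.to_file(input: document, path: "report")
}
```

with_timeout returns a new Page value, so the original page keeps its shorter timeout for other calls.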
pub fn quit(
  browser: Subject(Message),
) -> Result(Nil, CallError(Nil))
Quit the browser (alias for chrome.quit).
pub fn screenshot(
  page: Page,
) -> Result(EncodedFile, RequestError)
Capture a screenshot of the current page and return it as a base64 encoded string.
The Ok(result) of this function can be passed to to_file.
If you want to customize the settings of the output image, use capture_screenshot from protocol/page directly.
pub fn select(
  on page: Page,
  matching selector: String,
) -> Result(RemoteObjectId, RequestError)
pub fn select_all(
  on page: Page,
  matching selector: String,
) -> Result(List(RemoteObjectId), RequestError)
Run querySelectorAll on the page and return a list of remote object ids.
pub fn to_file(
  input input: EncodedFile,
  path path: String,
) -> Result(Nil, FileError)
pub fn to_value(
  on page: Page,
  from remote_object_id: RemoteObjectId,
  to decoder: fn(Dynamic) -> Result(a, b),
) -> Result(a, RequestError)
Evaluate a remote object to a value, passing in the appropriate decoder function.
pub fn with_timeout(page: Page, time_out: Int) -> Page
Return an updated Page with the desired timeout to apply, in milliseconds.