chrobot
Welcome to Chrobot! 🤖 This module exposes high level functions for browser automation.
Some basic concepts:
- You’ll first want to launch an instance of the browser and receive a Subject which allows you to send messages to the browser (actor)
- You can open a Page, which makes the browser browse to a website, hold on to the returned Page, and pass it to functions in this module
- If you want to make raw protocol calls, you can use page_caller to create a callback to pass to protocol commands from your Page
- When you are done with the browser, you should call quit to shut it down gracefully
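The concepts above can be sketched as one small program. This is a minimal sketch and not a definitive implementation: the URL, the `h1` selector, and the 10 second timeout are placeholder assumptions, and `let assert` is used only to keep the example short.

```gleam
import chrobot
import gleam/io

pub fn main() {
  // Launch a browser; this fails if no chrome installation can be found
  let assert Ok(browser) = chrobot.launch()
  // Open a page with a timeout of 10 seconds (the URL is a placeholder)
  let assert Ok(page) = chrobot.open(browser, "https://example.com", 10_000)
  // Wait until the element we care about exists on the page
  let assert Ok(heading) = chrobot.await_selector(page, "h1")
  // Read the text of the element and print it
  let assert Ok(text) = chrobot.get_text(page, heading)
  io.println(text)
  // Shut the browser down gracefully
  let assert Ok(_) = chrobot.quit(browser)
}
```

In real code you would probably handle the error cases instead of asserting on them.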
The functions in this module just make calls to protocol/ modules. If you
would like to customize the behaviour, take a look at them to see how to make
direct protocol calls and pass different defaults.
Something to consider:
A lot of the functions in this module interpolate their parameters into
JavaScript expressions that are evaluated in the page context.
No attempt is made to escape the passed parameters or prevent script injection through them.
You should not use the functions in this module with arbitrary strings if you want
to treat the pages you are operating on as a secure context.
Types
pub type CallArgument {
StringArg(value: String)
IntArg(value: Int)
FloatArg(value: Float)
BoolArg(value: Bool)
ArrayArg(value: List(CallArgument))
}
Constructors
- StringArg(value: String)
- IntArg(value: Int)
- FloatArg(value: Float)
- BoolArg(value: Bool)
- ArrayArg(value: List(CallArgument))
pub type EncodedFile {
EncodedFile(data: String, extension: String)
}
Constructors
- EncodedFile(data: String, extension: String)
Holds information about the current page, as well as the desired timeout in milliseconds to use when waiting for browser responses.
pub type Page {
Page(
browser: Subject(chrome.Message),
time_out: Int,
target_id: target.TargetID,
session_id: target.SessionID,
)
}
Constructors
- Page(browser: Subject(chrome.Message), time_out: Int, target_id: target.TargetID, session_id: target.SessionID)
Functions
pub fn as_value(
result: Result(RemoteObject, RequestError),
decoder: fn(Dynamic) -> Result(a, b),
) -> Result(a, RequestError)
Cast a RemoteObject into a value by passing a dynamic decoder.
This is a convenience for when you know a RemoteObject is returned by value and not ID,
and you want to extract the value from it.
You can chain this to eval or eval_async like so:
eval(page, "window.location.href")
|> as_value(dynamic.string)
pub fn await_load_event(
browser: Subject(Message),
page: Page,
) -> Result(Dynamic, RequestError)
Block until the page load event has fired.
Note that with local pages, the load event can often fire
before the handler is attached.
It’s best to use await_selector instead of this.
pub fn await_selector(
on page: Page,
select selector: String,
) -> Result(RemoteObjectId, RequestError)
Continuously attempt to run a selector until it succeeds.
You can use this after opening a page, to wait for the moment it has initialized
sufficiently for you to run your automation on it.
The final result will be a single remote object id.
pub fn call_custom_function_on(
callback: fn(String, Option(Json)) ->
Result(Dynamic, RequestError),
function_declaration function_declaration: String,
object_id object_id: RemoteObjectId,
args arguments: List(CallArgument),
value_decoder value_decoder: fn(Dynamic) -> Result(a, b),
) -> Result(a, RequestError)
This is a version of runtime.call_function_on which allows
passing in arguments, and always returns the result as a value,
which will be decoded by the decoder you pass in.
You would use it with a JavaScript function declaration like this:
function my_function(my_arg) {
  // You can access the passed RemoteObject with `this`
  const wibble = this.getAttribute('href');
  // You have access to the arguments you passed in
  const wobble = 'hello ' + my_arg;
  // You receive this return value, you should pass in a string decoder
  // in this case
  return wibble + wobble;
}
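On the Gleam side, a call with such a declaration might look like the sketch below. It only uses functions documented in this module. The `a` selector, the `"world"` argument, and the helper name `greet_link` are illustrative assumptions, and `dynamic.string` assumes a gleam_stdlib version that still ships that decoder (the same one used in the as_value example).

```gleam
import chrobot
import gleam/dynamic

// Hypothetical helper: calls a custom function on the first link of the page
pub fn greet_link(page: chrobot.Page) {
  // The JavaScript function declaration; `this` is the remote object
  let declaration =
    "function my_function(my_arg) {
      return this.getAttribute('href') + ' hello ' + my_arg
    }"
  // Look up the element that will be bound to `this`
  let assert Ok(link) = chrobot.select(page, "a")
  chrobot.call_custom_function_on(
    // Callback used to send the protocol command from this page
    chrobot.page_caller(page),
    declaration,
    link,
    // Arguments are passed as typed CallArgument values
    [chrobot.StringArg("world")],
    // The function returns a string, so decode it as one
    dynamic.string,
  )
}
```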
pub fn create_page(
with browser: Subject(Message),
from html: String,
time_out time_out: Int,
) -> Result(Page, RequestError)
Similar to open, but creates a new page from HTML that you pass to it.
The page will be created under the about:blank URL.
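A minimal sketch of how create_page might be used, for example to test automation code against a fixed document. The HTML string, the 5 second timeout, and the helper name `heading_text` are assumptions made for illustration.

```gleam
import chrobot

// Hypothetical helper: builds a page from an HTML string and reads its heading.
// The browser subject is taken as an argument, so it has to be launched first.
pub fn heading_text(browser) {
  // Create the page from raw HTML with a timeout of 5 seconds
  let assert Ok(page) =
    chrobot.create_page(browser, "<h1>Hello from Gleam</h1>", 5000)
  // Wait for the heading to exist before reading it
  let assert Ok(heading) = chrobot.await_selector(page, "h1")
  chrobot.get_text(page, heading)
}
```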
pub fn defer_quit(
browser: Subject(Message),
body: fn() -> a,
) -> Result(Nil, CallError(Nil))
Convenience function that lets you defer quitting the browser after you are done with it.
It’s meant for a use expression like this:
let assert Ok(browser_subject) = chrobot.launch()
use <- chrobot.defer_quit(browser_subject)
// do stuff with the browser
pub fn eval(
on page: Page,
js expression: String,
) -> Result(RemoteObject, RequestError)
Evaluate some JavaScript on the page and return the result,
which will be a RemoteObject reference.
Check the protocol/runtime module for more info.
pub fn eval_async(
on page: Page,
js expression: String,
) -> Result(RemoteObject, RequestError)
Like eval, but awaits the result of the evaluation
and returns once the promise has been resolved.
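As a rough illustration, eval_async can be combined with as_value in the same way as eval. The JavaScript expression and the helper name `delayed_answer` are assumptions. The sketch also assumes that the resolved number is returned by value and that `dynamic.int` is available in your gleam_stdlib version.

```gleam
import chrobot
import gleam/dynamic

// Hypothetical helper: evaluates a promise and decodes the resolved value
pub fn delayed_answer(page: chrobot.Page) {
  // The promise resolves to 42 after 100 ms; eval_async waits for it
  chrobot.eval_async(
    page,
    "new Promise((resolve) => setTimeout(() => resolve(42), 100))",
  )
  // Extract the resolved value from the RemoteObject as an Int
  |> chrobot.as_value(dynamic.int)
}
```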
pub fn eval_to_value(
on page: Page,
js expression: String,
) -> Result(RemoteObject, RequestError)
pub fn get_all_html(
on page: Page,
) -> Result(String, RequestError)
pub fn get_attribute(
on page: Page,
from item: RemoteObjectId,
name attribute_name: String,
) -> Result(String, RequestError)
Assuming the passed remote object reference is an Element,
return an attribute of that element.
Attributes are always returned as a string.
pub fn get_inner_html(
on page: Page,
from item: RemoteObjectId,
) -> Result(String, RequestError)
pub fn get_outer_html(
on page: Page,
from item: RemoteObjectId,
) -> Result(String, RequestError)
pub fn get_property(
on page: Page,
from item: RemoteObjectId,
name property_name: String,
property_decoder property_decoder: fn(Dynamic) -> Result(a, b),
) -> Result(a, RequestError)
Get a property of a remote object and decode it with the provided decoder
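A short sketch of reading a property, here the current `value` of an input element. The `input` selector and the helper name `input_value` are illustrative assumptions, and `dynamic.string` assumes a gleam_stdlib version that provides it.

```gleam
import chrobot
import gleam/dynamic
import gleam/result

// Hypothetical helper: returns the current value of the first input element
pub fn input_value(page: chrobot.Page) {
  // Both calls fail with a RequestError, so they can be chained with result.try
  use input <- result.try(chrobot.select(page, "input"))
  chrobot.get_property(page, input, "value", dynamic.string)
}
```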
pub fn get_text(
on page: Page,
from item: RemoteObjectId,
) -> Result(String, RequestError)
Note: Accesses the innerText property, not textContent.
pub fn launch() -> Result(Subject(Message), LaunchError)
Try to find a chrome installation and launch it with default arguments.
First, it will try to find a local chrome installation, like that created by npx @puppeteer/browsers install chrome.
If that fails, it will try to find a system chrome installation in some common places.
Consider using launch_with_config with a BrowserConfig instead and specifying
an explicit path to the chrome executable if consistency is a requirement.
This function will validate that the browser launched successfully, and the protocol version matches the one supported by this library.
pub fn launch_with_config(
config: BrowserConfig,
) -> Result(Subject(Message), LaunchError)
Launch a browser with the given configuration.
To populate the arguments, use chrome.get_default_chrome_args.
This function will validate that the browser launched successfully, and the
protocol version matches the one supported by this library.
Example
let config =
  chrome.BrowserConfig(
    path: "chrome/linux-116.0.5793.0/chrome-linux64/chrome",
    args: chrome.get_default_chrome_args(),
    start_timeout: 5000,
  )
let assert Ok(browser_subject) = launch_with_config(config)
pub fn open(
with browser_subject: Subject(Message),
to url: String,
time_out time_out: Int,
) -> Result(Page, RequestError)
Open a new page in the browser.
Returns a response when the protocol call succeeds; use
await_selector to determine when the page is ready.
The timeout passed to this function will be attached to the returned
Page type to be reused by other functions in this module.
You can always adjust it using with_timeout.
pub fn page_caller(
page: Page,
) -> fn(String, Option(Json)) -> Result(Dynamic, RequestError)
Create a callback to pass to protocol commands from a Page.
pub fn pdf(page: Page) -> Result(EncodedFile, RequestError)
Export the current page as PDF and return it as a base64 encoded string.
Transferring the encoded file from the browser to the chrome agent can take a pretty long time,
depending on the document size.
Consider setting a larger timeout; you can use with_timeout on your existing Page to do this.
The Ok(result) of this function can be passed to to_file.
If you want to customize the settings of the output document, use print_to_pdf
from protocol/page directly.
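The advice above could be followed roughly like this. The 30 second timeout, the `"report"` path, and the helper name `save_pdf` are placeholder assumptions. How to_file treats the file extension is not described in this section, so check its documentation before choosing a path.

```gleam
import chrobot

// Hypothetical helper: exports the page as PDF and writes it to disk
pub fn save_pdf(page: chrobot.Page) {
  // PDF transfer can be slow, so use a copy of the page with a larger timeout
  let slow_page = chrobot.with_timeout(page, 30_000)
  let assert Ok(file) = chrobot.pdf(slow_page)
  // "report" is a placeholder path; the result is a Result(Nil, FileError)
  chrobot.to_file(file, "report")
}
```

The same pattern should apply to screenshot, since it also returns an EncodedFile.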
pub fn quit(
browser: Subject(Message),
) -> Result(Nil, CallError(Nil))
Quit the browser (alias for chrome.quit).
pub fn screenshot(
page: Page,
) -> Result(EncodedFile, RequestError)
Capture a screenshot of the current page and return it as a base64 encoded string.
The Ok(result) of this function can be passed to to_file.
If you want to customize the settings of the output image, use capture_screenshot
from protocol/page directly.
pub fn select(
on page: Page,
matching selector: String,
) -> Result(RemoteObjectId, RequestError)
pub fn select_all(
on page: Page,
matching selector: String,
) -> Result(List(RemoteObjectId), RequestError)
Run querySelectorAll on the page and return a list of remote object ids.
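A small sketch of working with the returned list, here collecting the `href` attribute of every link on a page. The `a` selector and the helper name `link_targets` are illustrative assumptions.

```gleam
import chrobot
import gleam/list
import gleam/result

// Hypothetical helper: returns the href attribute of every link on the page
pub fn link_targets(page: chrobot.Page) {
  // Stop early if the selector query itself fails
  use links <- result.try(chrobot.select_all(page, "a"))
  // Read the attribute of each element, failing on the first error
  list.try_map(links, fn(link) { chrobot.get_attribute(page, link, "href") })
}
```

Because select_all and get_attribute both fail with a RequestError, the two steps compose without any error mapping.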
pub fn to_file(
input input: EncodedFile,
path path: String,
) -> Result(Nil, FileError)
pub fn to_value(
on page: Page,
from remote_object_id: RemoteObjectId,
to decoder: fn(Dynamic) -> Result(a, b),
) -> Result(a, RequestError)
Evaluate a remote object to a value, passing in the appropriate decoder function.
pub fn with_timeout(page: Page, time_out: Int) -> Page
Return an updated Page with the desired timeout to apply, in milliseconds.