The resourcer package is meant to access resources identified by a URL in a uniform way, whether the URL references a dataset (stored in a file, a SQL table, a MongoDB collection etc.) or a computation unit (system commands, web services etc.). Usually some credentials will be defined, and additional data format information can be provided to help coerce the dataset to a data.frame object.
The main concepts are:
File resources are resources describing a file. If the file is in a remote location, it must be downloaded before being read. The data format specification of the resource helps to find the appropriate file reader.
The file locations supported by default are: file (local file system), http(s) (web address, basic authentication), gridfs (MongoDB file store), scp (file copy through SSH) and opal (Opal file store).

This can be easily applied to other file locations by extending the FileResourceGetter class. An instance of the new file resource getter is to be registered so that the FileResourceResolver can operate as expected.
The data format specified within the Resource object helps to find the appropriate file reader. Currently supported data formats are those with a reader in readr (csv, csv2, tsv, ssv, delim), haven (spss, sav, por, dta, stata, sas, xpt) and readxl (excel, xls, xlsx). This can be easily applied to other data file formats by extending the FileResourceClient class.

Usage example that reads a local SPSS file:
# make an SPSS file resource
res <- resourcer::newResource(
  name = "CNSIM1",
  url = "file:///data/CNSIM1.sav",
  format = "spss"
)
# coerce the local SPSS file to a data.frame
df <- as.data.frame(res)
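The same pattern should apply to the other supported formats. As a minimal sketch, assuming the readr package is installed and a Unix-like file system, a CSV file resource could be created and coerced as follows. The temporary file, its contents and the resource name are made up for illustration.

```r
# write a small, made-up CSV file to a temporary location
path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id = 1:3, value = c(2.5, 3.1, 4.8)),
  path,
  row.names = FALSE
)

# make a CSV file resource pointing at that file
# (tempfile() returns an absolute path, so on Unix-like systems
# this yields a file:///... URL)
res <- resourcer::newResource(
  name = "demo",
  url = paste0("file://", path),
  format = "csv"
)

# coerce the local CSV file to a data.frame
df <- as.data.frame(res)
```

Only the format value and the URL change compared to the SPSS example; the coercion call stays the same.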
To support another file data format, extend the FileResourceClient class with the new data format reader implementation. The associated factory class, an extension of the ResourceResolver class, must also be implemented and registered.
DBI is a set of virtual classes that are used to abstract the SQL database connections and operations within R. Any DBI implementation can therefore be used to access a SQL table. Which DBI connector to use is determined from the scheme part of the resource’s URL. For instance, a resource URL starting with postgres:// will require the RPostgres driver.
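To illustrate how a scheme can select a driver, here is a minimal base R sketch. It is not the package's own code: the helper `url_scheme()`, the lookup vector `driver_for` and the example URL are hypothetical, and only the postgres entries are taken from the text above.

```r
# extract the scheme (the part before "://") from a resource URL;
# if the URL has no scheme, the input is returned unchanged
url_scheme <- function(url) {
  sub("^([A-Za-z][A-Za-z0-9+.-]*)://.*$", "\\1", url)
}

# illustrative lookup from URL scheme to DBI driver package
driver_for <- c(postgres = "RPostgres", postgresql = "RPostgres")

# made-up resource URL
scheme <- url_scheme("postgres://localhost:5432/mydb/mytable")
scheme                # "postgres"
driver_for[[scheme]]  # "RPostgres"
```

Composite schemes such as presto+https are kept whole by the regular expression, so each variant can be given its own entry.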
To separate the DBI connector instantiation from the DBI interface interactions in the SQLResourceClient, a DBIResourceConnector registry is to be populated. The currently supported SQL database connectors are: mariadb (MariaDB connector); mysql (MySQL connector); postgres or postgresql (Postgres connector); presto, presto+http or presto+https (Presto connector); spark, spark+http or spark+https (Spark connector).

To support another SQL database having a DBI driver, extend the DBIResourceConnector class and register an instance of it with registerDBIResourceConnector().
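A connector for an additional database might look like the following sketch, which uses SQLite as a stand-in and assumes the R6, DBI and RSQLite packages are installed. Several points are assumptions rather than documented behaviour: the sqlite URL scheme and its path convention are invented for this example, and `createDBIConnection()` is assumed to be the method the SQL resource client calls to open the connection. Check both against the package reference before relying on them.

```r
# hypothetical connector for URLs such as sqlite:///path/to/db.sqlite/mytable
SQLiteResourceConnector <- R6::R6Class(
  "SQLiteResourceConnector",
  inherit = resourcer::DBIResourceConnector,
  public = list(
    # claim only resources whose URL uses the (made-up) sqlite scheme
    isFor = function(resource) {
      grepl("^sqlite://", resource$url)
    },
    # open the DBI connection (assumed method name, see the note above)
    createDBIConnection = function(resource) {
      if (!self$isFor(resource)) {
        stop("Resource is not located in a SQLite database")
      }
      # convention of this sketch: the last path segment is the table,
      # everything before it is the database file
      path <- sub("^sqlite://", "", resource$url)
      DBI::dbConnect(RSQLite::SQLite(), dirname(path))
    }
  )
)

# register an instance so that it can be found through isFor()
resourcer::registerDBIResourceConnector(SQLiteResourceConnector$new())
```

Choosing a scheme that no existing connector claims keeps the new connector from shadowing, or being shadowed by, another one.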
Having the data stored in the database makes it possible to handle large (common SQL databases) to big (PrestoDB, Spark) datasets using dplyr, which will delegate as many operations as possible to the database.
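The delegation performed by dplyr can be sketched independently of resourcer. The example below talks to an in-memory SQLite database directly through DBI and assumes the dplyr, dbplyr and RSQLite packages are installed. The table name and its columns are made up, and SQLite only stands in for a real database server.

```r
library(dplyr)

# in-memory SQLite database standing in for a real SQL server
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "measures", data.frame(
  id = 1:6,
  grp = c("a", "a", "b", "b", "c", "c"),
  value = c(1, 3, 5, 7, 9, 11)
))

# build a lazy query: nothing is computed in R at this point
query <- tbl(con, "measures") %>%
  filter(value > 2) %>%
  group_by(grp) %>%
  summarise(mean_value = mean(value, na.rm = TRUE))

# inspect the SQL that dplyr will send to the database
show_query(query)

# run the query in the database and fetch only the aggregated rows
result <- collect(query)

DBI::dbDisconnect(con)
```

Filtering and aggregation happen inside the database; only the three summary rows are transferred to the R session.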
NoSQL databases can be described by a resource. The nodbi package can be used here. Currently only connections to MongoDB databases are supported, using the URL scheme mongodb or mongodb+srv.
Computation resources are resources on which tasks/commands can be triggered and from which resulting data can be retrieved.
Example of a computation resource that connects to a server through SSH:
# make an application resource on an SSH server
res <- resourcer::newResource(
  name = "supercomp1",
  url = "ssh://server1.example.org/work/dir?exec=plink,ls",
  identity = "sshaccountid",
  secret = "sshaccountpwd"
)
# get ssh client from resource object
client <- resourcer::newResourceClient(res) # does a ssh::ssh_connect()
# execute commands
files <- client$exec("ls") # exec 'cd /work/dir && ls'
# release connection
client$close() # does ssh::ssh_disconnect(session)
There are several ways to extend the Resources handling. These are based on different R6 classes having an isFor(resource) function. For a new file location, extend the FileResourceGetter class and register an instance of it with registerFileResourceGetter(). For a SQL database having a DBI driver, extend the DBIResourceConnector class and register an instance of it with registerDBIResourceConnector(). For any other kind of resource, extend the ResourceResolver class and register an instance of it with registerResourceResolver(). This ResourceResolver object will create the appropriate ResourceClient object that matches your needs.

The design of the URL that will describe your new resource should not overlap an existing one, otherwise the different registries will return the first instance for which isFor(resource) is TRUE. In order to distinguish resource locations, the URL’s scheme can be extended. For instance, the scheme for accessing a file in an Opal server is opal+https so that the credentials are applied as needed by Opal.
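The first-match behaviour can be illustrated with a toy registry. This is a sketch of the general pattern only, not the package's implementation: the `Handler` class, the `register()` and `find_handler()` helpers and the example URLs are all hypothetical, and only the R6 package is assumed.

```r
library(R6)

# hypothetical handler: says whether it applies to a resource,
# based on the URL scheme
Handler <- R6Class("Handler", public = list(
  label = NULL,
  scheme = NULL,
  initialize = function(label, scheme) {
    self$label <- label
    self$scheme <- scheme
  },
  isFor = function(resource) {
    startsWith(resource$url, paste0(self$scheme, "://"))
  }
))

# toy registry: returns the first registered handler whose isFor() is TRUE
registry <- list()
register <- function(handler) {
  registry[[length(registry) + 1]] <<- handler
}
find_handler <- function(resource) {
  for (handler in registry) {
    if (handler$isFor(resource)) return(handler)
  }
  NULL
}

register(Handler$new("plain https", "https"))
register(Handler$new("shadowed https", "https"))  # never reached: same scheme
register(Handler$new("opal over https", "opal+https"))

find_handler(list(url = "https://example.org/data.csv"))$label
# "plain https"
find_handler(list(url = "opal+https://opal.example.org/data.csv"))$label
# "opal over https"
```

The second https handler is never selected because an earlier one already matches, whereas the extended opal+https scheme gives the third handler a URL space of its own.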