---
title: "Opal Projects"
author: "Yannick Marcon"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Opal Projects}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---

[Opal](https://www.obiba.org/pages/products/opal/) stores data and meta-data (dictionaries) in projects that are accessible through web services. See the [Variables and Data](http://opaldoc.obiba.org/en/latest/variables-data.html) documentation page, which explains the data model of Opal. See also the [Resources](https://opaldoc.obiba.org/en/latest/resources.html) documentation for an alternative way of accessing data in a project.

The Opal R package exposes project-related functions:

* list projects,
* list tables,
* list variables,
* list taxonomies,
* and functions for getting all associated details (of a project, a table, a variable, etc.).

Note that these functions do not create an R session on the server side: they only access the content of the Opal server (permission checks apply).

## Setup

Set up the connection with Opal:

```{r eval=FALSE}
library(opalr)
o <- opal.login("administrator", "password", url = "https://opal-demo.obiba.org")
```

## Project

List the projects:

```{r eval=FALSE}
opal.projects(o)
```

Create a project, linked to a database (default or first one):

```{r eval=FALSE}
if (opal.project_exists(o, "dummy"))
  opal.project_delete(o, "dummy")
opal.project_create(o, "dummy", database = TRUE)
opal.project(o, "dummy")
```

### Backup and Restore

Back up a project and download the backup archive (encrypted):

```{r eval=FALSE}
opal.project_backup(o, 'CNSIM', '/home/administrator/backup/CNSIM')
opal.file_download(o, '/home/administrator/backup/CNSIM', '/tmp/CNSIM.zip', key = "12345abcdef")
```

Restore a project from an uploaded (and encrypted) archive:

```{r eval=FALSE}
opal.file_upload(o, '/tmp/CNSIM.zip', '/home/administrator')
opal.project_restore(o, 'dummy', '/home/administrator/CNSIM.zip', key = "12345abcdef")
# verify the tables of the restored project
opal.tables(o, "dummy")
```

## Tables

In Opal there are two kinds of tables:

* raw tables, whose data are stored in a database,
* views, which are logical tables, using per-variable transformation algorithms.

List the tables in a project, with their count of variables and entities:

```{r eval=FALSE}
opal.tables(o, "CNSIM", counts = TRUE)
```

The table object can be retrieved as follows:

```{r eval=FALSE}
opal.table(o, "CNSIM", "CNSIM1", counts = TRUE)
```

The existence of a table can be checked:

```{r eval=FALSE}
opal.table_exists(o, "CNSIM", "CNSIM1")
```

And more specifically, verify whether a table is a view or not:

```{r eval=FALSE}
opal.table_exists(o, "CNSIM", "CNSIM1", view = TRUE)
```

A table can be created, either as a raw table or a view.
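
For the raw table case, a minimal sketch follows. It assumes that `opal.table_create()` creates a raw table when no `tables` argument is given, and that `"Participant"` is the desired entity type; the table name `CNSIM_RAW` is hypothetical and only used for illustration:

```{r eval=FALSE}
# "CNSIM_RAW" is a hypothetical table name, not part of the demo data
if (opal.table_exists(o, "CNSIM", "CNSIM_RAW"))
  opal.table_delete(o, "CNSIM", "CNSIM_RAW")
# no 'tables' argument: assumed to create an empty raw table (no variables)
opal.table_create(o, "CNSIM", "CNSIM_RAW", type = "Participant")
# inspect the newly created table
opal.table(o, "CNSIM", "CNSIM_RAW", counts = TRUE)
```
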
To create a view, specify which tables it refers to:

```{r eval=FALSE}
# drop table if it exists
opal.table_delete(o, "CNSIM", "CNSIM123")
# then create a view, no variables
opal.table_create(o, "CNSIM", "CNSIM123", tables = c("CNSIM.CNSIM1", "CNSIM.CNSIM2", "CNSIM.CNSIM3"))
```

### Dictionaries

List the variables of a table and get the details of the variable annotations (one column per variable attribute with namespace). This is a summary dictionary, as it includes the concatenated category properties:

```{r eval=FALSE}
opal.variables(o, "CNSIM", "CNSIM1")
```

It is also possible to get the full data dictionary of a table, as separate data frames of variables and categories. This is the recommended format for working with a data dictionary:

```{r eval=FALSE}
dico <- opal.table_dictionary_get(o, "CNSIM", "CNSIM1")
dico$variables
dico$categories
```

Here we modify the data dictionary by adding a derivation script to each of the variables:

```{r eval=FALSE}
dico$variables$script <- paste0("$('", dico$variables$name, "')")
dico$variables
```

Then we apply this dictionary of derived variables to the view we previously created, and verify the counts of columns (variables) and rows (entities) in this table:

```{r eval=FALSE}
opal.table_dictionary_update(o, "CNSIM", "CNSIM123", variables = dico$variables, categories = dico$categories)
opal.table(o, "CNSIM", "CNSIM123", counts = TRUE)
```

Assign this view to a symbol on the R server, and get the summary statistics:

```{r eval=FALSE}
opal.assign(o, "D", "CNSIM.CNSIM123")
opal.execute(o, "summary(D)")
```

### Values

Get the values in a table for a specific Participant entity:

```{r eval=FALSE}
opal.valueset(o, "CNSIM", "CNSIM123", identifier = "1454")
```

Get all the values of a table in our local R session as a data.frame (tibble) object:

```{r eval=FALSE}
cnsim1 <- opal.table_get(o, "CNSIM", "CNSIM1")
cnsim2 <- opal.table_get(o, "CNSIM", "CNSIM2")
cnsim3 <- opal.table_get(o, "CNSIM", "CNSIM3")
```

Then do some
alterations on this data.frame and save it back as a raw table:

```{r eval=FALSE}
# make sure IDs are unique
cnsim1$id <- paste0(cnsim1$id, "-1")
cnsim2$id <- paste0(cnsim2$id, "-2")
cnsim3$id <- paste0(cnsim3$id, "-3")
# bind tables
cnsim123 <- rbind(cnsim1, cnsim2, cnsim3)
# remove some columns
cnsim123$DIS_AMI <- NULL
cnsim123$DIS_CVA <- NULL
cnsim123$DIS_DIAB <- NULL
# save as a raw table
opal.table_save(o, cnsim123, "CNSIM", "CNSIM", overwrite = TRUE, force = TRUE)
opal.table(o, "CNSIM", "CNSIM", counts = TRUE)
```

Verify that this raw table, resulting from the merge of the other tables, has the same values for a given Participant:

```{r eval=FALSE}
opal.valueset(o, "CNSIM", "CNSIM", identifier = "1454-1")
```

It is possible to truncate a table, i.e. delete ALL the values of a table (which must not be a view), without modifying the dictionary:

```{r eval=FALSE}
opal.table_truncate(o, "CNSIM", "CNSIM")
opal.table(o, "CNSIM", "CNSIM", counts = TRUE)
```

### Annotations

Variables can be described by [taxonomy terms](https://opaldoc.obiba.org/en/latest/web-user-guide/administration/taxonomies.html).

List the taxonomies:

```{r eval=FALSE}
opal.taxonomies(o)
```

List the vocabularies of a taxonomy:

```{r eval=FALSE}
opal.vocabularies(o, taxonomy = "Mlstr_area")
```

List the terms of a vocabulary:

```{r eval=FALSE}
opal.terms(o, taxonomy = "Mlstr_area", vocabulary = "Lifestyle_behaviours")
```

To apply taxonomy terms to a table dictionary in batch, use the following:

```{r eval=FALSE}
annotations <- tibble::tribble(
  ~variable, ~taxonomy, ~vocabulary, ~term,
  "LAB_TSC", "Mlstr_area", "Physical_measures", "Physical_characteristics",
  "LAB_TRIG", "Mlstr_area", "Physical_measures", "Physical_characteristics",
  "LAB_HDL", "Mlstr_area", "Physical_measures", "Physical_characteristics",
  "LAB_GLUC_ADJUSTED", "Mlstr_area", "Physical_measures", "Physical_characteristics"
)
opal.annotate(o, "CNSIM", "CNSIM123", annotations = annotations)
```

To list the variable annotations:

```{r eval=FALSE}
opal.annotations(o, "CNSIM", "CNSIM123")
```

## Resources

Resources are an alternative way of accessing data or computation systems. A project stores references to resources, i.e. how to build a resource object in R and the permissions to use this resource.

To list the resource references:

```{r eval=FALSE}
opal.resources(o, "RSRC")
```

To create a reference to a resource (a compressed CSV file, stored in the Opal file system, authorized by a personal access token):

```{r eval=FALSE}
if (opal.resource_exists(o, "RSRC", "CNSIM4"))
  opal.resource_delete(o, "RSRC", "CNSIM4")
opal.resource_create(o, "RSRC", "CNSIM4",
  url = "opal+https://opal-demo.obiba.org/ws/files/projects/RSRC/CNSIM3.zip",
  format = "csv",
  secret = "EeTtQGIob6haio5bx6FUfVvIGkeZJfGq")
# verify the resource reference object
opal.resource(o, "RSRC", "CNSIM4")
```

From a resource reference, it is possible to build and get the resource object in the local R session:

```{r eval=FALSE}
opal.resource_get(o, "RSRC", "CNSIM4")
```

Depending on the nature of the resource, it may be possible to coerce it to a data.frame on the client side:

```{r eval=FALSE}
library(resourcer)
as.data.frame(opal.resource_get(o, "RSRC", "CNSIM4"))
```

The same operation can be done on the R server side:

```{r eval=FALSE}
# assign the resource object
opal.assign.resource(o, "rsrc", "RSRC.CNSIM4")
# coerce it to a data.frame
opal.assign.script(o, "D", quote(as.data.frame(rsrc)))
# get some summary statistics
opal.execute(o, "summary(as.factor(D$GENDER))")
```

## Permissions

Permissions can be managed (list, add, delete) at different levels:

* project: `opal.project_perm()`,
* tables: `opal.tables_perm()`,
* table: `opal.table_perm()`,
* resources: `opal.resources_perm()`,
* resource: `opal.resource_perm()`.

## Teardown

It is good practice to free server resources by sending a logout request:

```{r eval=FALSE}
opal.logout(o)
```
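
When a script may fail partway through, the logout can be guaranteed with base R's `tryCatch()` and its `finally` argument. This is only a sketch, assuming the same demo server and credentials as in the Setup section; the call inside the braces stands in for any sequence of Opal operations:

```{r eval=FALSE}
library(opalr)
o <- opal.login("administrator", "password", url = "https://opal-demo.obiba.org")
tryCatch({
  # any Opal operations go here; an error in this block still triggers the logout
  opal.projects(o)
}, finally = opal.logout(o))
```
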