A tool to help find R packages, by matching packages either to a text description, or to any given package. Can find matching packages either from rOpenSci’s suite of packages, or from all packages currently on CRAN.
This package relies on a locally-running instance of ollama. Procedures for setting that up are described in a separate vignette (vignette("ollama", package = "pkgmatch")). ollama needs to be installed before this package can be used.
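Before installing, it can help to confirm that ollama is reachable. The following is a minimal sketch only, not part of pkgmatch: the helper name `ollama_is_running()` is hypothetical, and it assumes ollama is listening on its default local address, `http://127.0.0.1:11434`. The vignette remains the authoritative reference for setup.

```r
# Hypothetical helper (not part of pkgmatch): returns TRUE if something
# answers on the assumed default ollama address, FALSE otherwise.
ollama_is_running <- function (base_url = "http://127.0.0.1:11434") {
    con <- url (base_url)
    on.exit (close (con), add = TRUE)
    res <- tryCatch (
        suppressWarnings (readLines (con, n = 1L, warn = FALSE)),
        error = function (e) NULL
    )
    !is.null (res)
}

ollama_is_running ()
```

A `FALSE` result only indicates that nothing responded at that address; ollama may still be installed but stopped, or configured to use a different host or port.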
Once ollama is running, the easiest way to install this package is via the associated r-universe.
As shown there, simply enable the universe with
options (repos = c (
ropenscireviewtools = "https://ropensci-review-tools.r-universe.dev",
CRAN = "https://cloud.r-project.org"
))
Then install in the usual way with:
install.packages ("pkgmatch")
Alternatively, the package can be installed by first installing either the remotes or pak packages and running one of the following lines:
remotes::install_github ("ropensci-review-tools/pkgmatch")
pak::pkg_install ("ropensci-review-tools/pkgmatch")
The package can then be loaded for use with
library (pkgmatch)
The package takes input either from a text description or local path to an R package, and finds similar packages based on both Language Model (LM) embeddings, and more traditional text and code matching algorithms. The LM embeddings require a locally-running instance of ollama, as described in a separate vignette.
The package has two main functions:
- pkgmatch_similar_pkgs() to find similar rOpenSci or CRAN packages based on input given as either a local path to an entire package, or a single descriptive text string; and
- pkgmatch_similar_fns() to find similar functions from rOpenSci packages based on descriptive text input. (Not available for functions from CRAN packages.)
The following code demonstrates how these functions work, first matching a general text string to packages from rOpenSci:
input <- "
Packages for analysing evolutionary trees, with a particular focus
on visualising inter-relationships among distinct trees.
"
pkgmatch_similar_pkgs (input, corpus = "ropensci")
## [1] "treestartr" "treedata.table" "canaper" "phylogram"
## [5] "rotl"
The corpus parameter must be specified as one of “ropensci” or “cran”. The CRAN corpus is much larger than the rOpenSci corpus, and matching for corpus = "cran" will generally take notably longer.
Websites of packages returned by the pkgmatch_similar_pkgs() function can be automatically opened, either by passing browse = TRUE, or by storing the return value of the function as an object and passing that to the pkgmatch_browse() function.
The input parameter can also be a local path to an entire package. To demonstrate that, the following code downloads a .tar.gz file of the httr2 package from CRAN:
pkg <- "httr2"
p <- available.packages () |>
    data.frame () |>
    dplyr::filter (Package == pkg)
url_base <- "https://cran.r-project.org/src/contrib/"
url <- paste0 (url_base, p$Package, "_", p$Version, ".tar.gz")
path <- fs::path (fs::path_temp (), basename (url))
download.file (url, destfile = path, quiet = TRUE)
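The download step above depends on the dplyr and fs packages. As a rough base-R sketch of the same URL construction, the following uses a hypothetical helper, `cran_tarball_url()`, which is not part of pkgmatch. The version string is a placeholder for illustration only; in practice it would be looked up with `available.packages ()`.

```r
# Hypothetical helper: build the CRAN source tarball URL for a package,
# following the "<name>_<version>.tar.gz" pattern used above.
cran_tarball_url <- function (pkg, version,
                              url_base = "https://cran.r-project.org/src/contrib/") {
    paste0 (url_base, pkg, "_", version, ".tar.gz")
}

pkg <- "httr2"
# Placeholder version; the current one can be obtained (with network
# access) via: available.packages () [pkg, "Version"]
version <- "1.0.0"

url <- cran_tarball_url (pkg, version)
path <- file.path (tempdir (), basename (url))
# download.file (url, destfile = path, quiet = TRUE) # requires network access
```

Only the current version of a package is served from that directory, so a hard-coded version string will stop working once the package is updated on CRAN.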
The path to that package (in this case as a compressed tarball) can then be passed to the pkgmatch_similar_pkgs() function:
pkgmatch_similar_pkgs (path, corpus = "ropensci")
## $text
## [1] "elastic" "vcr" "cyphr" "ruODK" "webmockr"
##
## $code
## [1] "taxize" "webmockr" "rdhs" "crul" "babeldown"
Packages from CRAN can also be matched:
pkgmatch_similar_pkgs (path, corpus = "cran")
## $text
## [1] "httr2" "httr" "googleAuthR" "httptest" "request"
##
## $code
## [1] "httr2" "httr" "pkgcache" "ellmer" "webfakes"
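When a package path is given as input, the printed results contain separate text and code matches. One simple way to combine them is to look for packages that appear in both. The sketch below copies the values printed above into plain lists; treating the actual return value as a list with text and code elements is an assumption based on that printed output, and `in_both()` is a hypothetical helper.

```r
# Values copied from the printed output above. The plain-list structure
# is an assumption; the object returned by pkgmatch may differ.
res_ropensci <- list (
    text = c ("elastic", "vcr", "cyphr", "ruODK", "webmockr"),
    code = c ("taxize", "webmockr", "rdhs", "crul", "babeldown")
)
res_cran <- list (
    text = c ("httr2", "httr", "googleAuthR", "httptest", "request"),
    code = c ("httr2", "httr", "pkgcache", "ellmer", "webfakes")
)

# Hypothetical helper: packages matched on both text and code
in_both <- function (res) intersect (res$text, res$code)

in_both (res_ropensci) # "webmockr"
in_both (res_cran) # "httr2" "httr"
```

For the CRAN corpus the input package itself, httr2, is among the matches, so it may be worth dropping it with `setdiff ()` before comparing.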
The input parameter can also be a local path to a full source code repository.
There is an additional function to find functions within packages which best match a text description.
input <- "A function to label a set of geographic coordinates"
pkgmatch_similar_fns (input)
## [1] "GSODR::nearest_stations" "refsplitr::plot_addresses_points"
## [3] "slopes::elevation_extract" "quadkeyr::grid_to_polygon"
## [5] "rnoaa::meteo_nearby_stations"
input <- "Identify genetic sequences matching a given input fragment"
pkgmatch_similar_fns (input)
## [1] "charlatan::SequenceProvider" "beastier::is_alignment"
## [3] "charlatan::ch_gene_sequence" "beautier::is_phylo"
## [5] "textreuse::align_local"
Setting browse = TRUE will then open the documentation pages corresponding to those best-matching functions.
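Function matches are printed in package::function form, which is easy to split for further processing. The sketch below copies the first set of matches shown above into a character vector; it assumes the return value can be treated as a plain character vector, which the printed output suggests but which is not confirmed here.

```r
# Matches copied from the first example output above
fns <- c (
    "GSODR::nearest_stations", "refsplitr::plot_addresses_points",
    "slopes::elevation_extract", "quadkeyr::grid_to_polygon",
    "rnoaa::meteo_nearby_stations"
)

# Split each "package::function" string into its two components
parts <- strsplit (fns, "::", fixed = TRUE)
matches <- data.frame (
    package = vapply (parts, function (p) p [1], character (1)),
    fn = vapply (parts, function (p) p [2], character (1)),
    stringsAsFactors = FALSE
)
matches
```

The resulting data frame has one row per match, which makes it straightforward to count or filter matches by package.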
- The utils::RSiteSearch() function.
- The sos package that queries the “RSiteSearch” database.
All contributions to this project are gratefully acknowledged using the allcontributors package following the allcontributors specification. Contributions of any kind are welcome!
mpadge | Bisaloo | MargaretSiple-NOAA