| Title: | Compute a Cyclist's Eddington Number |
|---|---|
| Description: | Compute a cyclist's Eddington number, including efficiently computing cumulative E over a vector. A cyclist's Eddington number <https://en.wikipedia.org/wiki/Arthur_Eddington#Eddington_number_for_cycling> is the largest number E such that the cyclist has ridden E miles or more on E distinct days. The algorithm in this package improves on the conventional approach because it does not require initial sorting of the data, so both summary statistics and cumulative statistics can be computed in linear time. These functions may also be used for computing h-indices for authors, a metric described by Hirsch (2005) <doi:10.1073/pnas.0507655102>. Both are specific applications of computing the side length of a Durfee square <https://en.wikipedia.org/wiki/Durfee_square>. Some additional author-level metrics, such as the g-index and i10-index, are also included in the package. |
| Authors: | Paul Egeler [aut, cre], Tashi Reigle [ctb] |
| Maintainer: | Paul Egeler <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 4.3.0 |
| Built: | 2026-05-13 09:40:06 UTC |
| Source: | https://github.com/cran/eddington |
Simulated dates and distances of rides occurring in 2009. This is an
aggregation of the rides dataset by day.
daily_totals
A data frame with 178 rows and 2 variables:
date the ride occurred
the total length in miles for each day
The dataset contains a total of 3,419 miles spread across 178 unique days. The Eddington number for the year was 29.
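The per-day aggregation described above can be sketched in base R. This is only an illustration: the column names ride_date and ride_length and the toy values are assumptions, not taken from the packaged datasets.

```r
# Toy per-ride data; the column names and values are hypothetical.
toy_rides <- data.frame(
  ride_date = as.Date(c("2009-01-01", "2009-01-01", "2009-01-03")),
  ride_length = c(12.5, 7.5, 30)
)

# Sum the ride lengths within each calendar day.
toy_daily <- aggregate(ride_length ~ ride_date, data = toy_rides, FUN = sum)
names(toy_daily)[2] <- "total_length"
toy_daily
```

Two rides on the same day collapse into one row, so the aggregated frame has one row per distinct date.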
Compute the side length of a Durfee square
durfee(is)
| is | An integer vector representing an integer partition. |
The side length of the Durfee square for that partition.
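To make the definition concrete, here is a minimal base R sketch that builds the Ferrers diagram of a partition as a logical matrix and grows the largest all-TRUE square from the top-left corner. The function name durfee_side is hypothetical, and this is not how the package computes the result.

```r
# Hypothetical helper: side length of the Durfee square via the Ferrers diagram.
durfee_side <- function(partition) {
  parts <- sort(as.integer(partition), decreasing = TRUE)
  if (length(parts) == 0L) return(0L)
  # Row i of the diagram has parts[i] filled cells.
  ferrers <- outer(parts, seq_len(max(parts)), ">=")
  side <- 0L
  # Grow the square while the next larger top-left block is fully filled.
  while (side < min(dim(ferrers)) &&
         all(ferrers[seq_len(side + 1L), seq_len(side + 1L)])) {
    side <- side + 1L
  }
  side
}

durfee_side(c(4, 3, 3, 2, 1, 1))
```

For the partition 4 + 3 + 3 + 2 + 1 + 1, the three largest parts are all at least 3, but the fourth part is only 2, so the square has side 3.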
This function is much like E_num except it provides
a cumulative Eddington number over the vector rather than a single summary
number.
E_cum(rides)
| rides | A vector of mileage, where each element represents a single day. |
An integer vector the same length as rides.
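One way to obtain a cumulative Eddington number without sorting is to keep a tally of ride lengths above the current E. The base R sketch below illustrates that idea under stated assumptions: the function name eddington_cumulative is hypothetical, missing values are not handled, and the package's own implementation may differ.

```r
# Hypothetical single-pass sketch of a cumulative Eddington number.
eddington_cumulative <- function(rides) {
  n <- length(rides)
  tally <- integer(n)    # tally[k]: rides so far whose whole-mile length is k (capped at n)
  running <- integer(n)  # running[i]: Eddington number after the first i rides
  e <- 0L                # current Eddington number
  above <- 0L            # rides so far that are strictly longer than e
  for (i in seq_len(n)) {
    len <- min(floor(rides[i]), n)  # E can never exceed n, so capping is safe
    if (len > e) {
      above <- above + 1L
      tally[len] <- tally[len] + 1L
      if (above > e) {
        e <- e + 1L
        # Rides of exactly e miles no longer count as "above" the new e.
        above <- above - tally[e]
      }
    }
    running[i] <- e
  }
  running
}

eddington_cumulative(c(3, 3, 2, 3, 3, 5))
```

Each ride is examined once and E grows by at most one per ride, so the work is proportional to the number of rides.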
Get the number of rides required to increment to the next Eddington number.
E_next(rides)
| rides | A vector of mileage, where each element represents a single day. |
A named list with the current Eddington number (E) and the
number of rides required to increment by one (req).
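The relationship between E and req can be illustrated with a short sort-based sketch in base R. The function name next_eddington is hypothetical, and the sort is used for clarity rather than speed.

```r
# Hypothetical sketch: current Eddington number and rides needed for the next one.
next_eddington <- function(rides) {
  ranked <- sort(rides, decreasing = TRUE)
  # E is the number of ranks whose ride length is at least the rank.
  e <- sum(ranked >= seq_along(ranked))
  # Reaching E + 1 needs E + 1 rides of at least E + 1 miles.
  list(E = e, req = (e + 1L) - sum(rides >= e + 1L))
}

next_eddington(c(3, 3, 2))
```

With rides of 3, 3, and 2 miles, E is 2 and one more ride of at least 3 miles is needed to reach 3.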
Gets the Eddington number for cycling. The Eddington Number for cycling, E, is the maximum number where a cyclist has ridden at least E miles on E distinct days.
E_num(rides)
| rides | A vector of mileage, where each element represents a single day. |
The Eddington Number for cycling is related to computing the rank of an integer partition, which is the same as computing the side length of its Durfee square. Another relevant application of this metric is computing the Hirsch index (doi:10.1073/pnas.0507655102) for publications.
This is not to be confused with the Eddington Number in astrophysics, which
represents the number of protons in the observable universe.
An integer which is the Eddington cycling number for the data provided.
# Randomly generate a set of 15 rides
rides <- rgamma(15, shape = 2, scale = 10)
# View the rides sorted in decreasing order
stats::setNames(sort(rides, decreasing = TRUE), seq_along(rides))
# Get the Eddington number
E_num(rides)
Determine the number of additional rides required to achieve a specified Eddington number.
E_req(rides, candidate)
| rides | A vector of mileage, where each element represents a single day. |
| candidate | The Eddington number to test for. |
An integer vector of length 1. Returns 0L if E is
already achieved.
Indicates whether a certain Eddington number is satisfied, given the data.
E_sat(rides, candidate)
| rides | A vector of mileage, where each element represents a single day. |
| candidate | The Eddington number to test for. |
A logical vector of length 1.
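Both the satisfaction test and the required-rides count follow directly from the definition: a candidate E is satisfied when at least E rides are E miles or longer. A minimal base R sketch, using the hypothetical names is_satisfied and rides_required:

```r
# Hypothetical sketch: is a candidate Eddington number satisfied?
is_satisfied <- function(rides, candidate) {
  sum(rides >= candidate) >= candidate
}

# Hypothetical sketch: additional rides of at least `candidate` miles needed.
rides_required <- function(rides, candidate) {
  as.integer(max(0, candidate - sum(rides >= candidate)))
}

toy <- c(3, 3, 2)
is_satisfied(toy, 2)
rides_required(toy, 3)
```

For the toy data, a candidate of 2 is satisfied, while a candidate of 3 needs one more ride of at least 3 miles.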
The class will maintain the state of the algorithm, allowing for efficient updates as new rides come in.
The implementation uses an experimental base R feature utils::hashtab.
Cloning of Eddington objects is disabled. Additionally, Eddington objects
cannot be serialized; they cannot be carried between sessions using
base::saveRDS or base::save and then loaded later using base::readRDS
or base::load.
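The idea of keeping algorithm state between updates can be sketched with a plain closure and an environment used as a hash map. This is an illustration only: the name new_tracker and its interface are hypothetical, and it does not use utils::hashtab or reproduce the class itself.

```r
# Hypothetical stateful tracker built from a closure (not the package's class).
new_tracker <- function() {
  e <- 0L                         # current Eddington number
  above <- 0L                     # rides strictly longer than e
  tally <- new.env(hash = TRUE)   # counts of whole-mile lengths above e
  update <- function(rides) {
    for (r in rides) {
      len <- as.integer(floor(r))
      if (len > e) {
        key <- as.character(len)
        count <- get0(key, envir = tally, inherits = FALSE, ifnotfound = 0L)
        assign(key, count + 1L, envir = tally)
        above <<- above + 1L
        if (above > e) {
          e <<- e + 1L
          # Rides of exactly e miles stop counting toward the next number.
          at_e <- get0(as.character(e), envir = tally,
                       inherits = FALSE, ifnotfound = 0L)
          above <<- above - at_e
        }
      }
    }
    invisible(e)
  }
  list(
    update = update,
    current = function() e,
    number_to_next = function() e + 1L - above
  )
}

tracker <- new_tracker()
tracker$update(c(3, 3, 2))
tracker$current()
tracker$update(c(3, 3, 5))
tracker$number_to_next()
```

Because the state lives in the closure, each call to update() only processes the new rides rather than recomputing from the full history.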
current: The current Eddington number.
cumulative: A vector of cumulative Eddington numbers.
number_to_next: The number of rides needed to get to the next Eddington number.
n: The number of rides in the data.
hashmap: The hash map of rides above the current Eddington number.
new()
Create a new Eddington object.
Eddington$new(rides, store.cumulative = FALSE)
rides: A vector of rides.
store.cumulative: Logical, indicating whether to keep a vector of cumulative Eddington numbers.
A new Eddington object
print()
Print the current Eddington number.
Eddington$print()
update()
Add new rides to the existing Eddington object.
Eddington$update(rides)
rides: A vector of rides.
getNumberToTarget()
Get the number of rides of a specified length to get to a target Eddington number.
Eddington$getNumberToTarget(target)
target: Target Eddington number.
An integer representing the number of rides of target length needed to achieve the target number.
isSatisfied()
Test if an Eddington number is satisfied.
Eddington$isSatisfied(target)
target: Target Eddington number.
Logical
# Randomly generate a set of 15 rides
rides <- rgamma(15, shape = 2, scale = 10)
# View the rides sorted in decreasing order
stats::setNames(sort(rides, decreasing = TRUE), seq_along(rides))
# Create the Eddington object
e <- Eddington$new(rides, store.cumulative = TRUE)
# Get the Eddington number
e$current
# Update with new data
e$update(rep(25, 10))
# See the new data
e$cumulative
A stateful C++ object for computing Eddington numbers.
rides |
An optional vector of values used to initialize the class. |
store_cumulative |
Whether to store a vector of the cumulative Eddington
number, as accessed from the |
new: Constructor. The parameter list may be empty, store_cumulative alone,
or rides and store_cumulative.
current: The current Eddington number.
cumulative: A vector of Eddington numbers, or NULL if store_cumulative
is FALSE.
hashmap: A data.frame containing the distances and counts above the
current Eddington number.
update: Update the class state with new data.
getNumberToNext: Get the number of additional distances required to reach the next Eddington number.
getNumberToTarget: Get the number of additional distances required to reach a target Eddington number.
EddingtonModule objects cannot be serialized at this time; they cannot be
carried between sessions using base::saveRDS or base::save and then
loaded later using base::readRDS or base::load.
# Create a class instance with some initial data
e <- EddingtonModule$new(c(3, 3, 2), store_cumulative = TRUE)
e$current
# Update with new data and look at the vector of cumulative Eddington numbers.
e$update(c(3, 3, 5))
e$cumulative
# Get the number of rides required to reach the next Eddington number and
# an Eddington number of 4.
e$getNumberToNext()
e$getNumberToTarget(4)
Uses the Haversine great-circle distance formula to compute the distance between two latitude/longitude points.
get_haversine_distance(lat_1, lon_1, lat_2, lon_2, units = c("miles", "kilometers"))
| lat_1, lon_1, lat_2, lon_2 | The coordinates used to compute the distance. |
| units | The units of the output distance. |
The distance between two points in the requested units.
https://en.wikipedia.org/wiki/Haversine_formula
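For reference, the Haversine formula itself can be written in a few lines of base R. In this sketch the function name haversine_sketch and the Earth radius of 3958.8 miles are assumptions; the package may use a different constant, so results can differ slightly.

```r
# Hypothetical base R sketch of the Haversine great-circle distance.
# `radius` is an assumed mean Earth radius in miles.
haversine_sketch <- function(lat_1, lon_1, lat_2, lon_2, radius = 3958.8) {
  to_rad <- pi / 180
  d_lat <- (lat_2 - lat_1) * to_rad
  d_lon <- (lon_2 - lon_1) * to_rad
  a <- sin(d_lat / 2)^2 +
    cos(lat_1 * to_rad) * cos(lat_2 * to_rad) * sin(d_lon / 2)^2
  # pmin() guards against rounding pushing `a` slightly above 1.
  2 * radius * asin(sqrt(pmin(1, a)))
}

# Roughly 20 blocks along 7th Ave in NYC, which should be about one mile.
haversine_sketch(40.75406905512651, -73.98830604245481,
                 40.76684156255418, -73.97908243833855)
```

Passing a radius in kilometers instead (about 6371) would return the distance in kilometers.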
# In NYC, 20 blocks == 1 mile. Thus, computing the distance between two
# points along 7th Ave from W 39 St to W 59 St should return ~1 mile.
w39_coords <- list(lat=40.75406905512651, lon=-73.98830604245481)
w59_coords <- list(lat=40.76684156255418, lon=-73.97908243833855)
get_haversine_distance(
  w39_coords$lat,
  w39_coords$lon,
  w59_coords$lat,
  w59_coords$lon,
  "miles"
)
# The total distance along a sequence of points can be computed. Consider the
# following sequence of points along Park Ave in the form of a list of points
# where each point is a list containing a `lat` and `lon` tag.
park_ave_coords <- list(
  list(lat=40.735337983655434, lon=-73.98973648773142), # E 15 St
  list(lat=40.74772623378332, lon=-73.98066078090876), # E 35 St
  list(lat=40.76026319186414, lon=-73.97149360922498), # E 55 St
  list(lat=40.77301604875587, lon=-73.96217737679450) # E 75 St
)
# We can create a function to compute the total distance as follows:
compute_total_distance <- function(coords) {
  sum(
    sapply(
      seq_along(coords)[-1],
      \(i) get_haversine_distance(
        coords[[i]]$lat,
        coords[[i]]$lon,
        coords[[i - 1]]$lat,
        coords[[i - 1]]$lon,
        "miles"
      )
    )
  )
}
# Then applying the function to our sequence results in a total distance.
compute_total_distance(park_ave_coords)
Compute bibliometric indices such as the h-index, g-index, and i10-index.
h_index(citations, na.rm = FALSE)
i10_index(citations, na.rm = FALSE)
g_index(citations, na.rm = FALSE, is_sorted = FALSE)
citations |
A vector of citation counts. |
na.rm |
If |
is_sorted |
Whether the data is pre-sorted in descending order. This may speed up computations for some algorithms. The pre-sorted assumption is tested and a warning is emitted if unsorted data is detected. |
The summary number.
The h_index() function implicitly coerces inputs into integer vectors,
which truncates any floating point inputs. This usually produces the
expected output, because fractional inputs are not typical in the intended
domain and these indices are defined explicitly on integral thresholds.
However, to maximize the versatility of g-index computation, the g_index()
function does not perform this integer coercion. Floating point input can
therefore push the g-index higher in edge cases. For example,
g_index(as.integer(daily_totals$total_length)) != g_index(daily_totals$total_length).
Thus, to ensure accurate g-index results on data that may have a fractional
component, first convert the vector to integer before passing it to
g_index(), or otherwise validate the inputs.
This integer conversion will also cause h_index() to fail when inputs
contain extremely large values. The Eddington number family of functions
and durfee() do not have this check, and may return inaccurate outputs for
such inputs.
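The effect of fractional inputs on the g-index can be seen with a small base R sketch. The function g_index_sketch is a hypothetical stand-in that follows the usual definition (the largest g such that the top g items total at least g squared); it is not the package's g_index().

```r
# Hypothetical sketch of the g-index, without integer coercion.
g_index_sketch <- function(citations) {
  ranked <- sort(citations, decreasing = TRUE)
  # Count the ranks g whose cumulative total reaches g^2.
  sum(cumsum(ranked) >= seq_along(ranked)^2)
}

fractional <- c(4.5, 4.5, 0)
g_index_sketch(fractional)              # 3: cumulative total 9 reaches 3^2
g_index_sketch(as.integer(fractional))  # 2: truncated total 8 falls short of 9
```

The two fractional halves add up to a whole unit that truncation discards, which is enough to change the result.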
https://en.wikipedia.org/wiki/Author-level_metrics, https://en.wikipedia.org/wiki/G-index
Define a custom bibliometric index function
index(f, cumulative = FALSE)
| f | A function to be applied to the index before comparison. |
| cumulative | A logical on whether to apply a cumulative sum to the counts. |
A function that will compute the specified index.
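One plausible reading of these arguments is a function factory: sort the counts in decreasing order, optionally take their cumulative sum, and count the positions where the value is at least f applied to the position. The sketch below follows that reading; make_index is a hypothetical name and the package's index() may be implemented differently.

```r
# Hypothetical function factory mirroring the described arguments.
make_index <- function(f, cumulative = FALSE) {
  function(counts) {
    ranked <- sort(counts, decreasing = TRUE)
    if (cumulative) ranked <- cumsum(ranked)
    # Compare each (possibly cumulative) count with the transformed rank.
    sum(ranked >= f(seq_along(ranked)))
  }
}

toy_citations <- c(25, 12, 10, 4, 3, 1)
make_index(identity)(toy_citations)                              # h-index style: 4
make_index(function(i) i * i, cumulative = TRUE)(toy_citations)  # g-index style: 6
make_index(function(i) 10)(toy_citations)                        # i10-index style: 3
```

Returning a closure keeps the threshold rule and the cumulative flag fixed, so the resulting function only needs the data.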
# NOTE: These will all be less performant than their counterparts exported
# in this package, i.e., `h_index()`, `g_index()`, `i10_index()`.
set.seed(2018)
citations <- rgamma(30, shape = 2, scale = 10)
# Create an h-index
my_h_index <- index(force)
my_h_index(citations)
# Create a g-index function
my_g_index <- index(\(i) i * i, cumulative = TRUE)
my_g_index(citations)
# Create an i10-index
my_i10_index <- index(\(i) 10L)
my_i10_index(citations)
Reads in a GPS Exchange Format XML document and outputs a data.frame
containing distances. The corresponding dates for each track segment
(trkseg) will be included if present in the source file, else the date
column will be populated with NAs.
read_gpx(file, units = c("miles", "kilometers"))
| file | The input file to be parsed. |
| units | The units desired for the distance metric. |
Distances are computed using the Haversine formula and do not account for elevation changes.
This function treats the first timestamp of each trkseg as the date of
record. Thus overnight track segments will all count toward the day in which
the journey began.
A data frame containing up to two columns:
The date of the ride. See description and details.
The distance of the track segment in the requested units.
## Not run:
# Get a list of all GPX export files in a directory tree
gpx_export_files <- list.files(
  "/path/to/gpx/exports/",
  pattern = "\\.gpx$",
  full.names = TRUE,
  recursive = TRUE
)
# Read in all files and combine them into a single data frame
rides <- do.call(rbind, lapply(gpx_export_files, read_gpx))
## End(Not run)
Simulated dates and distances of rides occurring in 2009.
rides
A data frame with 250 rows and 2 variables:
date the ride occurred
the length in miles
The dataset contains a total of 3,419 miles spread across 178 unique days. The Eddington number for the year was 29.