
These functions ingest data from a file using a DuckDB table function. The results are transparently converted to a data frame, but the data is only read when the resulting data frame is actually accessed.

df_from_csv() reads a CSV file using the read_csv_auto() table function.

duckplyr_df_from_csv() is a thin wrapper around df_from_csv() that calls as_duckplyr_df() on the output.

df_from_parquet() reads a Parquet file using the read_parquet() table function.

duckplyr_df_from_parquet() is a thin wrapper around df_from_parquet() that calls as_duckplyr_df() on the output.

df_to_parquet() writes a data frame to a Parquet file via DuckDB. If the data frame is a duckplyr_df, the materialization occurs outside of R. An existing file will be overwritten. This function requires duckdb >= 0.10.0.

df_from_file() uses arbitrary table functions to read data. See https://duckdb.org/docs/data/overview for documentation of the available functions and their options.

duckplyr_df_from_file() is a thin wrapper around df_from_file() that calls as_duckplyr_df() on the output.
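As a rough illustration of the Parquet functions described above, the sketch below writes a small data frame to the same file twice and reads it back. The temporary file and the column name x are made up for this example, and it assumes duckplyr and a sufficiently recent duckdb are installed.

```r
library(duckplyr)

# Hypothetical temporary file, used only for this illustration
path_parquet <- tempfile(fileext = ".parquet")

# Write a data frame, then write different data to the same path;
# per the description above, the existing file is overwritten
df_to_parquet(data.frame(x = 1:3), path_parquet)
df_to_parquet(data.frame(x = 4:6), path_parquet)

# The read is lazy: the file is only scanned when a column is accessed
out <- df_from_parquet(path_parquet)
x_back <- out$x

# Clean up only after the data has been materialized
unlink(path_parquet)
```

Accessing out$x before calling unlink() matters here: because reading is deferred, removing the file first would leave nothing to materialize.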

Usage

df_from_csv(path, ..., options = list(), class = NULL)

duckplyr_df_from_csv(path, ..., options = list(), class = NULL)

df_from_parquet(path, ..., options = list(), class = NULL)

duckplyr_df_from_parquet(path, ..., options = list(), class = NULL)

df_to_parquet(data, path)

df_from_file(path, table_function, ..., options = list(), class = NULL)

duckplyr_df_from_file(
  path,
  table_function,
  ...,
  options = list(),
  class = NULL
)

Arguments

path

Path to file or directory.

...

These dots are for future extensions and must be empty.

options

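A named list of arguments passed to the DuckDB table function: the one indicated by table_function, or the fixed table function used by the CSV and Parquet readers.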

class

An optional class to add to the data frame. The returned object will always be a data frame. Pass class(tibble()) to create a tibble.

data

A data frame to be written to disk.

table_function

The name of a table-valued DuckDB function such as "read_parquet", "read_csv", "read_csv_auto" or "read_json".
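To show how the arguments above fit together, here is a minimal sketch that reads a semicolon-separated file through df_from_file(). The file contents and the choice of delim, header and auto_detect are assumptions made for this example; the option names are those of DuckDB's CSV reader, and tibble is assumed to be installed.

```r
library(duckplyr)

# Hypothetical semicolon-separated file, created only for this illustration
path <- tempfile(fileext = ".csv")
write.table(
  data.frame(a = 1:3, b = letters[4:6]),
  path,
  sep = ";",
  row.names = FALSE,
  quote = FALSE
)

# table_function names the DuckDB function, options are forwarded to it,
# and class adds the tibble classes to the returned data frame
out <- df_from_file(
  path,
  "read_csv",
  options = list(delim = ";", header = TRUE, auto_detect = TRUE),
  class = class(tibble::tibble())
)

# Column names are available without reading the data;
# nrow() triggers the actual read
col_names <- names(out)
n_rows <- nrow(out)

unlink(path)
```

The dots are left empty, as required, so every reader setting goes through options rather than through additional arguments.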

Value

A data frame for df_from_file(), or a duckplyr_df for duckplyr_df_from_file(), extended by the provided class.

Examples

# Create simple CSV file
path <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3, b = letters[4:6]), path, row.names = FALSE)

# df_from_csv() returns immediately, without materializing the data
df <- df_from_csv(path)

# Materialization only upon access
names(df)
#> [1] "a" "b"
df$a
#> materializing:
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> read_csv_auto(/tmp/RtmpqecsHz/filefdf49e99932.csv)
#> 
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - a (BIGINT)
#> - b (VARCHAR)
#> 
#> [1] 1 2 3

# Return as tibble:
df_from_file(
  path,
  "read_csv",
  options = list(delim = ",", auto_detect = TRUE),
  class = class(tibble())
)
#> materializing:
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> read_csv(/tmp/RtmpqecsHz/filefdf49e99932.csv)
#> 
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - a (BIGINT)
#> - b (VARCHAR)
#> 
#> # A tibble: 3 × 2
#>       a b    
#>   <dbl> <chr>
#> 1     1 d    
#> 2     2 e    
#> 3     3 f    

unlink(path)

# Write a Parquet file:
path_parquet <- tempfile(fileext = ".parquet")
df_to_parquet(df, path_parquet)

# With a duckplyr_df, the materialization occurs outside of R:
df %>%
  as_duckplyr_df() %>%
  mutate(b = a + 1) %>%
  df_to_parquet(path_parquet)

duckplyr_df_from_parquet(path_parquet)
#> materializing:
#> ---------------------
#> --- Relation Tree ---
#> ---------------------
#> read_parquet(/tmp/RtmpqecsHz/filefdfa0eb638.parquet)
#> 
#> ---------------------
#> -- Result Columns  --
#> ---------------------
#> - a (DOUBLE)
#> - b (DOUBLE)
#> 
#>   a b
#> 1 1 2
#> 2 2 3
#> 3 3 4

unlink(path_parquet)
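
The duckplyr_df_* wrappers return an object that can be passed straight to dplyr verbs. A minimal sketch, assuming dplyr is installed alongside duckplyr and using a throwaway CSV file created just for this example:

```r
library(duckplyr)

# Hypothetical CSV file, created only for this illustration
path_csv <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3, b = letters[4:6]), path_csv, row.names = FALSE)

# The wrapper calls as_duckplyr_df() on the result of df_from_csv()
lazy <- duckplyr_df_from_csv(path_csv)
is_duckplyr <- inherits(lazy, "duckplyr_df")

# Filter the rows, then access a column to materialize the result
filtered <- dplyr::filter(lazy, a > 1)
a_values <- filtered$a

# Remove the file only after the data has been read
unlink(path_csv)
```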