The get_df function retrieves data from a configured source and returns it as a pandas DataFrame. For database sources (PostgreSQL, ClickHouse), a table name must also be specified.
Parameters
- source_name (str): Name of the data source as configured in preswald.toml, or a path to a file (CSV, Parquet, and JSON are supported)
- table_name (Optional[str]): Required for database sources; specifies which table to retrieve
Returns
pd.DataFrame: Data from the specified source as a pandas DataFrame
Usage Examples
Note: connect must be called before get_df can be used.
CSV Source
For CSV sources, table_name is not required, since the entire CSV file is treated as a single table.
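A minimal sketch of the CSV case follows. It assumes that connect and get_df can be imported from the top-level preswald package and that connect takes no arguments. The source name sample_csv and the file path data/sample.csv are hypothetical placeholders, not names taken from this page.

```python
def load_csv_frames():
    # Assumption: connect and get_df are importable from the preswald package.
    # The import sits inside the function so the module loads without Preswald.
    from preswald import connect, get_df

    # connect must be called before get_df can be used.
    connect()

    # "sample_csv" is a hypothetical source name defined in preswald.toml.
    by_name = get_df("sample_csv")

    # A file path may be passed in place of a configured source name.
    by_path = get_df("data/sample.csv")

    return by_name, by_path
```

No table_name is passed in either call, because a CSV file is treated as a single table.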
PostgreSQL Source
For PostgreSQL sources, table_name is required.
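As a sketch of the PostgreSQL case, the table name is passed as the second argument. This assumes the positional order matches the parameter list above and that connect and get_df are importable from the preswald package. The names my_postgres and users are hypothetical.

```python
def load_postgres_table():
    # Assumption: connect and get_df are importable from the preswald package.
    from preswald import connect, get_df

    # connect must be called before get_df can be used.
    connect()

    # "my_postgres" is a hypothetical source name from preswald.toml;
    # "users" is a hypothetical table in that database.
    return get_df("my_postgres", "users")
```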
ClickHouse Source
Similarly, for ClickHouse sources, table_name is required.
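Because each call returns one table, loading several ClickHouse tables means one get_df call per table. The sketch below does this with table_name passed as a keyword argument. It assumes connect and get_df are importable from the preswald package, and my_clickhouse is a hypothetical source name.

```python
def load_clickhouse_tables(table_names):
    # Assumption: connect and get_df are importable from the preswald package.
    from preswald import connect, get_df

    # connect must be called before get_df can be used.
    connect()

    # "my_clickhouse" is a hypothetical source name from preswald.toml.
    # One get_df call is made per table; results are keyed by table name.
    return {
        name: get_df("my_clickhouse", table_name=name)
        for name in table_names
    }
```

A call such as `load_clickhouse_tables(["events", "sessions"])` would return a dictionary with one DataFrame per table, where the table names are again placeholders.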
Error Handling
The function includes comprehensive error handling:
- Validates source existence
- Checks for required table_name parameter for database sources
- Handles connection and query errors
- Provides detailed error messages through logging
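This page does not say whether get_df raises an exception or returns an empty result after logging a failure, so calling code may want to guard against both. The helper below is a hypothetical wrapper, not part of Preswald. It takes the get_df callable as an argument and returns None when the call fails.

```python
import logging

logger = logging.getLogger(__name__)


def safe_get_df(get_df, source_name, table_name=None):
    """Call get_df and return None instead of propagating a failure."""
    try:
        df = get_df(source_name, table_name)
    except Exception:
        # logger.exception records the message together with the traceback.
        logger.exception(
            "get_df failed for source %r (table %r)", source_name, table_name
        )
        return None

    if df is None:
        # Covers the case where the failure was logged but not raised.
        logger.warning("get_df returned no data for source %r", source_name)
    return df
```

Callers can then test the result against None before using it, for example `df = safe_get_df(get_df, "my_postgres", "users")`, where both names are placeholders.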
Best Practices
- Always check that the source exists in preswald.toml before calling get_df
- For database sources, always provide table_name
- Use error handling when calling the function
- Consider memory limitations when retrieving large datasets
Related Functions
query(): For custom SQL queries against data sources