File-based Foreign Table
File-based foreign tables enable direct access to data stored in cloud object storage systems like AWS S3 and other compatible object stores. This guide covers implementing and optimizing file-based external data integration.
Overview and Benefits
File-based foreign tables provide seamless integration with cloud object storage, enabling:
- Direct Data Access: Query files directly without importing
- Cost Efficiency: Access data on-demand without storage duplication
- Scalability: Handle petabyte-scale datasets
- Flexibility: Support multiple file formats and storage providers
- Real-time Analytics: Query latest data without ETL delays
Supported File Formats
Format | Use Case | Compression | Schema Evolution |
---|---|---|---|
Parquet | Analytics, OLAP | Built-in (Snappy, GZIP) | Schema evolution support |
ORC | Big data processing | Built-in (ZLIB, Snappy) | Advanced schema support |
CSV | Data exchange, imports | External (GZIP) | No schema support |
JSON | Semi-structured data | External (GZIP) | Schema-on-read |
Installation and Setup
Install Required Extension
Install the file foreign data wrapper extension:
Extension Requirements
The file_fdw extension must be installed with superuser privileges and is available by default in Tacnode clusters.
Verify Available Foreign Data Wrappers
AWS S3 Integration
Create S3 Foreign Server
Configure User Mapping for S3
Create S3 Foreign Tables
Parquet Files
ORC Files
CSV Files
Query S3 Data
File Formats
Parquet Optimization
ORC Configuration
CSV with Advanced Options
Batch Foreign Table Creation (IMPORT FOREIGN SCHEMA)
The file_fdw
extension supports batch creation of foreign tables using the IMPORT FOREIGN SCHEMA
command, which simplifies management of external file-based data sources.
Syntax Overview
Parameter Specifications
Parameter | Description | Values |
---|---|---|
remote_schema | Extended semantic support for file formats | parquet , orc |
LIMIT TO | Import only specified tables | List of table names |
EXCEPT | Exclude specified tables from import | List of table names |
server_name | Previously created foreign server | Server identifier |
local_schema | Target schema for foreign table metadata | Schema name |
OPTIONS | Additional configuration parameters | Key-value pairs |
Batch Creation Examples
Batch Import Parquet Files from S3
Selective Import with Table Filtering
Import with Exclusions
Advanced Configuration Options
Operational Considerations
File Discovery Process:
- Automatically scans specified directory for compatible files
- Generates table schemas based on file metadata
- Creates foreign table definitions with optimized settings
Schema Inference:
- Parquet: Leverages embedded schema information
- ORC: Uses file footer schema metadata
- Automatic data type mapping to TacNode equivalents
Performance Optimization:
Limitations and Best Practices
Current Limitations:
- Supports only Parquet and ORC file formats
- No partitioned table support
- CSV and JSON formats require manual table creation
Best Practices:
Directory Organization:
This comprehensive approach to file-based foreign tables enables efficient access to cloud object storage while maintaining optimal performance for your specific use cases.