Foreign Data Wrappers: S3, Iceberg & Delta Lake
Foreign data wrappers let you query Parquet on S3, Iceberg tables, and Delta Lake catalogs with standard SQL and zero data movement. Complete setup guide.
Your analytics team needs to query Parquet files sitting in S3. Your ML engineers want to read from an Apache Iceberg catalog. A downstream reporting tool needs access to Delta Lake tables managed through Databricks Unity Catalog. The traditional answer is to build ETL pipelines — extract data from each source, transform it, load it into your database, and maintain those pipelines forever.
Foreign data wrappers offer a different path: query every one of those sources directly from your database using plain SQL, without moving a single byte.
A foreign data wrapper (FDW) is an extension that lets you define foreign tables — virtual table definitions that point to external data sources. Once defined, you query them exactly like local tables: SELECT, JOIN, WHERE, GROUP BY. The wrapper handles translation, pushes filters to the source where possible, and returns results as if the data lived locally.
This guide covers Tacnode's four foreign data wrapper extensions — file_fdw for S3 object storage, glue_fdw for AWS Glue catalogs, iceberg_fdw for Apache Iceberg, and delta_fdw for Delta Lake — including setup, type mapping, performance optimization, and the constraints you need to know.
How Foreign Data Wrappers Work
Foreign data wrappers follow the SQL/MED standard (Management of External Data). The architecture has four components that layer on top of each other.
The foreign data wrapper extension is a library that knows how to communicate with a specific type of external source. Tacnode ships four: file_fdw for S3-compatible object storage, glue_fdw for AWS Glue, iceberg_fdw for Apache Iceberg catalogs, and delta_fdw for Delta Lake and Unity Catalog. Each must be installed once per database before use.
A foreign server object stores the connection details for a specific external source — the AWS region, catalog endpoint, or S3 bucket configuration. One server definition can back multiple foreign tables pointing at different remote objects.
A user mapping ties a local database user to authentication credentials for the external source. This separates connection configuration from secrets, letting different users authenticate with different credentials against the same server.
A foreign table is a table definition in your local database that describes the schema of external data. It has columns and types like any table, but no local storage. Queries against a foreign table are translated by the wrapper into whatever the external source understands — S3 object reads, catalog API calls, or Iceberg metadata lookups.
The setup sequence is always the same regardless of data source: install the extension, create a server, create a user mapping, then create foreign tables or import them in bulk.
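As a sketch, the full sequence looks like this (server, schema, and credential values here are placeholders, not real endpoints):

```sql
-- 1. Install the wrapper extension (once per database)
CREATE EXTENSION IF NOT EXISTS file_fdw;

-- 2. Define where the external source lives
CREATE SERVER example_server
FOREIGN DATA WRAPPER file_fdw
OPTIONS (AWS_REGION 'us-west-2');

-- 3. Attach credentials for the current database user
CREATE USER MAPPING FOR CURRENT_USER
SERVER example_server
OPTIONS (AWS_ACCESS_ID 'AKIA...', AWS_ACCESS_KEY 'your-secret-key');

-- 4. Define foreign tables, individually or in bulk
IMPORT FOREIGN SCHEMA "example_schema"
FROM SERVER example_server
INTO public;
```

The sections below walk through each of these steps with the real options for each wrapper.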
file_fdw: Query Parquet and ORC Files on S3
file_fdw connects to S3-compatible object storage, letting you query Parquet and ORC files directly as SQL tables. This is the most common starting point — if your data lives as files in S3, this is the wrapper you want.
Supported file formats are Parquet and ORC. Both are columnar formats with built-in compression and embedded schema metadata, which means the wrapper can infer column types automatically. CSV is not supported.
Install the extension (it comes pre-installed on Tacnode clusters, but the CREATE EXTENSION statement is still required once per database):
CREATE EXTENSION IF NOT EXISTS file_fdw;
Create a foreign server pointing to your AWS region:
CREATE SERVER s3_production
FOREIGN DATA WRAPPER file_fdw
OPTIONS (AWS_REGION 'us-west-2');
Set up authentication. For private buckets, create a user mapping with your AWS credentials:
CREATE USER MAPPING FOR CURRENT_USER
SERVER s3_production
OPTIONS (
AWS_ACCESS_ID 'AKIA...',
AWS_ACCESS_KEY 'your-secret-key'
);
For publicly accessible buckets, the user mapping credentials are optional.
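For example, for a public bucket the mapping can simply omit the credential options entirely (a sketch; depending on your setup you may be able to skip the user mapping altogether):

```sql
-- Public bucket: no AWS_ACCESS_ID / AWS_ACCESS_KEY options needed
CREATE USER MAPPING FOR CURRENT_USER
SERVER s3_production;
```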
Now create a foreign table. The DIR option points to the S3 prefix and FORMAT specifies the file type:
CREATE FOREIGN TABLE sales_data (
transaction_id BIGINT,
customer_id INTEGER,
product_id INTEGER,
sale_date DATE,
amount NUMERIC(10,2),
region VARCHAR(50)
)
SERVER s3_production
OPTIONS (
DIR 's3://analytics-bucket/sales/',
FORMAT 'parquet'
);
That's it. You can now query this table with standard SQL:
SELECT region, SUM(amount) AS total_sales
FROM sales_data
WHERE sale_date >= '2026-01-01'
GROUP BY region
ORDER BY total_sales DESC;
The wrapper reads only the Parquet files under the specified S3 prefix, applies column and predicate pushdown where possible, and returns the results. You never download the files — the data stays in S3.
Working with Partitioned Data on S3
Most production S3 datasets are partitioned — files organized into prefix hierarchies like s3://bucket/sales/region=us-east/year=2026/. Tacnode's file_fdw supports partition pruning, meaning queries with filters on partition columns only scan the relevant partitions instead of reading every file.
When your Parquet or ORC data is partitioned by one or more columns, declare the partition keys in the foreign table options:
CREATE FOREIGN TABLE partitioned_sales (
transaction_id BIGINT,
customer_id INTEGER,
amount NUMERIC(10,2),
sale_date DATE,
region VARCHAR(50)
)
SERVER s3_production
OPTIONS (
DIR 's3://analytics-bucket/sales/',
FORMAT 'parquet',
PARTITION_BY 'region,sale_date'
);
Now a query filtering on region only reads files under that region's prefix:
-- Only scans s3://analytics-bucket/sales/region=us-east/
SELECT SUM(amount)
FROM partitioned_sales
WHERE region = 'us-east'
AND sale_date >= '2026-01-01';
Without partition pruning, every query would scan every file in the directory. On large datasets this is the difference between seconds and minutes.
glue_fdw: Connect to AWS Glue Catalogs
If your organization uses AWS Glue as a metadata catalog, glue_fdw lets you query Glue-cataloged tables directly. Instead of pointing at raw S3 paths, you import table definitions from the Glue catalog — the wrapper reads the catalog metadata to know where files are, what format they use, and how they're partitioned.
CREATE EXTENSION IF NOT EXISTS glue_fdw;
Create a server pointing to the Glue catalog's region:
CREATE SERVER glue_production
FOREIGN DATA WRAPPER glue_fdw
OPTIONS (AWS_REGION 'us-east-1');
Authenticate with IAM credentials:
CREATE USER MAPPING FOR CURRENT_USER
SERVER glue_production
OPTIONS (
AWS_ACCESS_ID 'AKIA...',
AWS_ACCESS_KEY 'your-secret-key'
);
The power of Glue integration is bulk import. Instead of manually writing CREATE FOREIGN TABLE for each table, import an entire Glue database at once:
IMPORT FOREIGN SCHEMA "sales_database"
FROM SERVER glue_production
INTO public;
This inspects the Glue catalog and creates matching foreign tables in your public schema for every table in the sales_database. Column names, types, storage locations, and format details are all pulled from the catalog automatically.
You can be selective about what you import:
-- Import only specific tables
IMPORT FOREIGN SCHEMA "analytics_db"
LIMIT TO (orders, customers, products)
FROM SERVER glue_production
INTO public;
-- Import everything except staging tables
IMPORT FOREIGN SCHEMA "analytics_db"
EXCEPT (staging_events, temp_loads)
FROM SERVER glue_production
INTO public;
For production environments, prefer IAM roles over hardcoded access keys. Monitor schema changes in Glue — if the upstream catalog changes, you'll need to drop and re-import the affected foreign tables to pick up the new definitions.
iceberg_fdw: Query Apache Iceberg Tables
Apache Iceberg is an open table format for large-scale analytics workloads, offering ACID transactions, schema evolution, time travel, and hidden partitioning. Tacnode's iceberg_fdw connects to Iceberg catalogs and exposes Iceberg tables as foreign tables.
Iceberg supports multiple catalog backends. Tacnode integrates with REST Catalog (HTTP-based), Hive Metastore, AWS Glue (as an Iceberg catalog), and Hadoop filesystem catalogs.
CREATE EXTENSION IF NOT EXISTS iceberg_fdw;
Create a server pointing to your Iceberg REST catalog:
CREATE SERVER iceberg_production
FOREIGN DATA WRAPPER iceberg_fdw
OPTIONS (
ICEBERG_REST_CATALOG_ENDPOINT 'https://iceberg-catalog.company.com/api/catalog',
CATALOG 'production_catalog'
);
Authenticate with a catalog token:
CREATE USER MAPPING FOR CURRENT_USER
SERVER iceberg_production
OPTIONS (
TOKEN 'principal:data-reader;realm:company-realm'
);
Import tables from an Iceberg namespace:
IMPORT FOREIGN SCHEMA "sales_analytics"
FROM SERVER iceberg_production
INTO public;
Once imported, you query Iceberg tables the same way as any other foreign table — the wrapper handles metadata lookups, snapshot management, and file reads transparently.
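For instance, assuming the sales_analytics namespace contains an orders table (a hypothetical name), the query reads exactly like one against a local table:

```sql
-- Standard SQL over an Iceberg table; snapshot and file handling
-- happen inside the wrapper
SELECT customer_id, COUNT(*) AS order_count
FROM orders
WHERE order_date >= '2026-01-01'
GROUP BY customer_id
ORDER BY order_count DESC
LIMIT 10;
```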
A few constraints to keep in mind: read performance varies by catalog and storage backend. Schema evolution capabilities depend on your specific catalog implementation. Cross-catalog operations are not supported — each server connects to exactly one catalog.
delta_fdw: Query Delta Lake via Unity Catalog
Delta Lake is an open storage layer that adds ACID transactions and schema enforcement on top of Parquet files. Tacnode's delta_fdw integrates with Databricks Unity Catalog, letting you query Delta Lake tables managed in your Databricks workspace.
CREATE EXTENSION IF NOT EXISTS delta_fdw;
Create a server pointing to your Unity Catalog endpoint:
CREATE SERVER unity_production
FOREIGN DATA WRAPPER delta_fdw
OPTIONS (
UNITY_CATALOG_ENDPOINT 'https://dbc-12345678-9abc.cloud.databricks.com',
AWS_REGION 'us-west-2',
CATALOG 'production_catalog'
);
Authenticate with both a Unity Catalog token and AWS credentials (since Delta Lake files live on S3):
CREATE USER MAPPING FOR CURRENT_USER
SERVER unity_production
OPTIONS (
UNITY_CATALOG_TOKEN 'dapi...',
AWS_ACCESS_ID 'AKIA...',
AWS_ACCESS_KEY 'your-secret-key'
);
Import tables from a Unity Catalog schema:
-- Import entire schema
IMPORT FOREIGN SCHEMA "sales_data"
FROM SERVER unity_production
INTO public;
-- Import selected tables
IMPORT FOREIGN SCHEMA "analytics"
LIMIT TO (revenue_daily, customer_segments)
FROM SERVER unity_production
INTO public;
Delta Lake foreign tables are read-only. Schema changes in Unity Catalog require dropping and re-importing the foreign table definitions. Large unfiltered scans will read significant amounts of data from S3 — always use WHERE clauses to limit what the wrapper fetches.
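As an illustration of a scan-friendly query (revenue_daily is the hypothetical table imported above; column names are illustrative):

```sql
-- The date filter is pushed down, so only matching Delta data
-- is fetched from S3 rather than the whole table
SELECT sale_date, SUM(revenue) AS total_revenue
FROM revenue_daily
WHERE sale_date >= '2026-01-01'
GROUP BY sale_date
ORDER BY sale_date;
```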
Importing Schemas in Bulk
Manually writing CREATE FOREIGN TABLE for every external table is tedious when you have dozens of tables in a catalog. IMPORT FOREIGN SCHEMA automates the process — it inspects the remote source and creates matching foreign table definitions locally.
The syntax is the same across all four wrappers:
IMPORT FOREIGN SCHEMA remote_schema_name
FROM SERVER server_name
INTO public;
Foreign tables are imported into the public schema. Use LIMIT TO or EXCEPT to control which tables get imported.
For catalog-based wrappers (glue_fdw, iceberg_fdw, delta_fdw), the schema import reads the catalog metadata to build table definitions. For file_fdw, schema import leverages embedded metadata in Parquet and ORC files to infer column names and types. Note that partitioned tables are not supported with batch import for file-based sources — you'll need to create those foreign tables manually with PARTITION_BY.
When the remote source changes — new columns added, tables created or dropped — re-import to pick up the changes. Drop the stale foreign table first, then re-import:
DROP FOREIGN TABLE IF EXISTS orders;
IMPORT FOREIGN SCHEMA "production_db"
LIMIT TO (orders)
FROM SERVER glue_production
INTO public;
Data Type Mapping Across Formats
When the wrapper reads external data, it converts source types to PostgreSQL types. The mapping is straightforward for basic types but has important edge cases.
Parquet and ORC share the same type mapping:

BOOLEAN → BOOLEAN
INT, INTEGER → INTEGER
BIGINT → BIGINT
FLOAT → REAL
DOUBLE → DOUBLE PRECISION
STRING → TEXT
VARCHAR, CHAR → VARCHAR, CHAR (declared lengths preserved)
DECIMAL → NUMERIC (precision and scale preserved)
BINARY → BYTEA
TINYINT → SMALLINT (widened for compatibility)
ARRAY<T> → T[] (mapped recursively — ARRAY<INTEGER> becomes INTEGER[])
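To make the mapping concrete, here is a sketch of how a Parquet file with a few of these types would be declared on the database side (table, column, and path names are illustrative):

```sql
-- Parquet schema: id BIGINT, score DOUBLE, tags ARRAY<STRING>, payload BINARY
CREATE FOREIGN TABLE events (
    id      BIGINT,            -- Parquet BIGINT
    score   DOUBLE PRECISION,  -- Parquet DOUBLE
    tags    TEXT[],            -- Parquet ARRAY<STRING>, mapped recursively
    payload BYTEA              -- Parquet BINARY
)
SERVER s3_production
OPTIONS (
    DIR 's3://analytics-bucket/events/',
    FORMAT 'parquet'
);
```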
Delta Lake follows the same basic mappings with one important distinction: Delta's timestamp type (with timezone) maps to TIMESTAMPTZ, while timestamp_ntz (without timezone) maps to TIMESTAMP. Decimal precision is preserved.
Iceberg offers the most granular type support, including direct UUID to UUID mapping and TIME with microsecond precision. Fixed and variable-length binary types both map to BYTEA.
MAP and STRUCT types are not supported across any format. If your external data contains nested structures like MAP<STRING, INTEGER> or STRUCT<name STRING, age INT>, those columns cannot be exposed through foreign tables. The workaround is to flatten nested structures before they reach the foreign table — either by preprocessing the source data or by restructuring the upstream pipeline to produce flat schemas.
Only simple arrays qualify for type mapping. Complex nested arrays containing MAPs or STRUCTs will fail.
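The flattening workaround could look like this upstream of the files — shown here as a Spark SQL sketch, since flattening happens in whatever engine writes the data, and syntax varies by engine (names are illustrative):

```sql
-- Upstream flattening: expand STRUCT<name STRING, age INT> into
-- top-level columns before the Parquet files are written
SELECT
    event_id,
    person.name AS person_name,
    person.age  AS person_age
FROM raw_events;
```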
Query Performance: Pushdown and Partition Pruning
Foreign data wrappers don't naively fetch all data and filter locally. Tacnode applies three optimization strategies to minimize data transfer and maximize query speed.
Predicate pushdown sends your WHERE clause conditions to the data source. When you filter on sale_date or region, the wrapper applies those filters at the source level rather than fetching everything and discarding rows locally. This can reduce data scanned by orders of magnitude on large datasets.
Projection pushdown sends only the columns you SELECT. If your query needs three columns out of fifty, the wrapper reads only those three from the Parquet or ORC files. Since both formats are columnar, this is especially effective — the wrapper skips entire column chunks without reading them.
Partition pruning limits which files or partitions are scanned based on your query predicates. If data is partitioned by date and your query filters to January 2026, the wrapper only reads files in the January partition directory. Combined with predicate pushdown, this means a well-structured query over partitioned data touches a tiny fraction of the total dataset.
To get the most from these optimizations: always include WHERE clauses that align with your partition columns. Select only the columns you need rather than using SELECT *. Structure your S3 data with clear partition hierarchies. For large datasets accessed frequently, consider materialized views to avoid repeated scans entirely.
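Putting those guidelines together, a query shaped for pushdown and pruning against the partitioned_sales table from earlier might look like:

```sql
-- Filters on the partition columns (region, sale_date) enable pruning;
-- selecting two columns instead of * enables projection pushdown
SELECT customer_id, SUM(amount) AS spend
FROM partitioned_sales
WHERE region = 'us-east'
  AND sale_date BETWEEN '2026-01-01' AND '2026-01-31'
GROUP BY customer_id;
```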
Caching Foreign Data with Materialized Views
Foreign tables are read-only views of external data — every query re-reads from the source. For frequently accessed datasets or complex joins against foreign tables, this means repeated network overhead. Materialized views solve this by caching query results locally.
Create a materialized view over a foreign table to snapshot the data:
CREATE MATERIALIZED VIEW sales_summary AS
SELECT
region,
DATE_TRUNC('month', sale_date) AS month,
COUNT(*) AS order_count,
SUM(amount) AS total_revenue
FROM sales_data
WHERE sale_date >= '2025-01-01'
GROUP BY region, DATE_TRUNC('month', sale_date);
Queries against sales_summary read from local storage — no S3 round trips, no network latency. You refresh the materialized view when you need updated data:
REFRESH MATERIALIZED VIEW sales_summary;
This pattern is especially valuable for dashboards and reporting queries that run the same aggregations repeatedly. The foreign table gives you live access to the source. The materialized view gives you fast, repeated reads. Use both together: foreign tables for ad-hoc exploration, materialized views for production workloads.
You can also join materialized views (cached foreign data) with local tables, creating a unified query layer where some data comes from S3, some from Iceberg, and some lives natively in Tacnode — all accessed through standard SQL.
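A sketch of that unified layer, assuming a local regions dimension table (hypothetical):

```sql
-- Local dimension table joined against cached foreign data
SELECT r.region_name, s.month, s.total_revenue
FROM sales_summary s          -- materialized view over S3 data
JOIN regions r                -- ordinary local table
  ON r.region_code = s.region
ORDER BY s.month, s.total_revenue DESC;
```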
Constraints and Limitations
Foreign data wrappers are powerful but not without boundaries. Understanding the constraints upfront prevents surprises in production.
All foreign tables are read-only. You can SELECT from them — including JOINs, aggregations, subqueries, and CTEs — but INSERT, UPDATE, and DELETE are not supported. If you need to write transformed data somewhere, query the foreign table and INSERT the results into a local table or materialized view.
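For example, to persist filtered results locally (a sketch using the sales_data foreign table from earlier):

```sql
-- Read from the foreign table, write into ordinary local storage
CREATE TABLE sales_2026 AS
SELECT transaction_id, customer_id, amount, sale_date
FROM sales_data
WHERE sale_date >= '2026-01-01';
```

From there, sales_2026 is a normal local table that supports INSERT, UPDATE, and DELETE.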
Foreign tables are created in the public schema. When using IMPORT FOREIGN SCHEMA, the target is always INTO public.
CSV files are not a supported format. file_fdw works with Parquet and ORC — both columnar formats with embedded schema metadata. If your data is in CSV, convert it to Parquet before querying through a foreign table.
MAP and STRUCT types are not supported. External data containing nested complex types like MAP<STRING, INT> or STRUCT must be flattened at the source before being queried through a foreign table.
Schema changes in external sources require foreign table recreation. If an upstream catalog adds columns, renames fields, or changes types, the local foreign table definition becomes stale. Drop and re-create (or re-import) the foreign table to pick up changes. There is no automatic sync.
IMPORT FOREIGN SCHEMA does not support partitioned tables for file-based sources. If your S3 data is partitioned, create those foreign tables manually with the PARTITION_BY option.
Foreign table operations are excluded from local transaction boundaries. A query against a foreign table reads whatever state the external source has at query time — it doesn't participate in your local BEGIN/COMMIT transaction.
Choosing the Right Wrapper
Each wrapper targets a specific type of data source. Here is how to decide which one to use.
Use file_fdw when your data lives as Parquet or ORC files in S3 or S3-compatible object storage. This is the most common scenario for data engineering teams that produce files from Spark, Flink, or other batch processing systems. You control the S3 paths, file formats, and partition layout directly.
Use glue_fdw when your organization uses AWS Glue as its metadata catalog. Glue already knows where your files are, what format they use, and how they are partitioned. Importing from Glue is faster and less error-prone than manually defining file_fdw tables, because the catalog metadata does the work for you.
Use iceberg_fdw when your data is managed as Apache Iceberg tables. Iceberg adds ACID transactions, schema evolution, and time travel on top of file storage. If your data engineering team writes Iceberg tables from Spark or Flink, this wrapper gives you SQL access to those tables without duplicating the data.
Use delta_fdw when your data lives in Delta Lake, particularly when managed through Databricks Unity Catalog. If your analytics or ML teams use Databricks as their primary workspace, delta_fdw bridges the gap between Databricks-managed data and SQL access in Tacnode.
In all cases, the same principle applies: foreign data wrappers give you SQL access to external data without moving it. You define where the data lives once, and every subsequent query works with standard SQL — joins with local tables, aggregations, window functions, CTEs, all of it. The data stays at the source. Only the results move.
Written by Boyd Stowe