Core: Close FileIO on cache eviction to prevent thread leaks#15910
utafrali wants to merge 3 commits into apache:main
Conversation
When tables get evicted from the cache, the FileIO wasn't being closed. This leaves S3FileIO with open thread pools that never get cleaned up, which is a problem in long-running applications. Added a try-catch around the close call since FileIO implementations can throw exceptions, and added a test to verify it gets called. Fixes apache#15898
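The close-with-try-catch pattern described above can be sketched in isolation. This is a minimal, hedged illustration: `QuietCloser` and `lastWarning` are hypothetical names for this sketch, not Iceberg code; the actual change wraps `table.io().close()` inside `CachingCatalog`'s cache removal listener.

```java
import java.io.Closeable;
import java.io.IOException;

public class QuietCloser {
  // Stand-in for LOG.warn so the sketch stays self-contained.
  static String lastWarning;

  // Close a FileIO-like resource without letting exceptions escape the
  // cache removal listener; FileIO implementations are allowed to throw.
  static void closeQuietly(Closeable io, String tableName) {
    try {
      if (io != null) {
        io.close();
      }
    } catch (Exception e) {
      lastWarning = "Failed to close FileIO for evicted table " + tableName;
    }
  }

  public static void main(String[] args) {
    // A resource whose close() throws, like a misbehaving FileIO.
    Closeable failing = () -> {
      throw new IOException("boom");
    };
    closeQuietly(failing, "db.tbl");
    System.out.println(lastWarning);
    closeQuietly(null, "db.other"); // null-safe: evicted entry may be null
    System.out.println("no exception escaped");
  }
}
```

The key property is that a throwing `close()` is logged and swallowed, so one bad table cannot break eviction handling for the rest of the cache.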
Pull request overview
This PR addresses resource/thread leaks in CachingCatalog by ensuring a table’s FileIO is closed when the table is evicted from the cache, preventing long-lived background threads (notably from S3FileIO) from accumulating in long-running applications.
Changes:
- Close `table.io()` when a cached table entry is removed due to expiration.
- Add a unit test verifying `FileIO.close()` is invoked on cache eviction.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| core/src/main/java/org/apache/iceberg/CachingCatalog.java | Adds FileIO.close() in the cache removal listener for expired base tables. |
| core/src/test/java/org/apache/iceberg/hadoop/TestCachingCatalog.java | Adds a Mockito-based test asserting FileIO.close() is called after TTL eviction. |
if (RemovalCause.EXPIRED.equals(cause)) {
  if (!MetadataTableUtils.hasMetadataTableName(tableIdentifier)) {
    tableCache.invalidateAll(metadataTableIdentifiers(tableIdentifier));
    if (table != null) {
      try {
        table.io().close();
      } catch (Exception e) {
        LOG.warn("Failed to close FileIO for evicted table {}", tableIdentifier, e);
      }
The close logic only runs when the removal cause is EXPIRED. Caffeine can also remove entries due to COLLECTED (softValues GC) and potentially SIZE (if a max-size/weight policy is added); in those cases the FileIO would still not be closed and the thread leak described in #15898 can persist. Consider running the same invalidation/close path for all eviction causes (e.g., cause.wasEvicted()), while still skipping metadata table identifiers.
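The reviewer's suggestion can be sketched with a plain-Java stand-in. This is an illustration only: the enum below mimics Caffeine's `RemovalCause` (whose real `wasEvicted()` returns true for `COLLECTED`, `EXPIRED`, and `SIZE`); production code would use `com.github.benmanes.caffeine.cache.RemovalCause` directly, and `shouldClose` is a hypothetical helper name.

```java
// Minimal stand-in for Caffeine's RemovalCause, used so this sketch runs
// without the Caffeine dependency. Evicted causes: COLLECTED, EXPIRED, SIZE.
enum RemovalCause {
  EXPLICIT(false),
  REPLACED(false),
  COLLECTED(true),
  EXPIRED(true),
  SIZE(true);

  private final boolean evicted;

  RemovalCause(boolean evicted) {
    this.evicted = evicted;
  }

  boolean wasEvicted() {
    return evicted;
  }
}

public class EvictionCloseSketch {
  // Decide whether the removal listener should run the invalidate/close
  // path: any eviction cause qualifies, not just EXPIRED, so COLLECTED
  // (softValues GC) and SIZE removals also release the FileIO.
  static boolean shouldClose(RemovalCause cause) {
    return cause.wasEvicted();
  }

  public static void main(String[] args) {
    System.out.println("EXPIRED   -> close? " + shouldClose(RemovalCause.EXPIRED));
    System.out.println("COLLECTED -> close? " + shouldClose(RemovalCause.COLLECTED));
    System.out.println("EXPLICIT  -> close? " + shouldClose(RemovalCause.EXPLICIT));
  }
}
```

Keying on `wasEvicted()` rather than a single cause means the listener stays correct even if a max-size or weight policy is added to the cache later.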
Done, pushed the fix.
tableCache.invalidateAll(metadataTableIdentifiers(tableIdentifier));
if (table != null) {
  try {
    table.io().close();
This may not be safe for all catalogs. For example, HadoopCatalog creates a common FileIO at the catalog level rather than per table, and that same FileIO reference is shared with the table object, so calling close() here would break other tables in the catalog that are still actively used.
Can this be just handled as part of the catalog's close() instead?
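The ownership model the reviewer describes can be sketched as follows. These are illustrative names (`SharedIO`, `DemoCatalog`), not Iceberg classes: the point is that when the catalog owns one shared FileIO, the single safe shutdown hook is the catalog's own `close()`, not per-table eviction.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicBoolean;

// Stand-in for a catalog-level FileIO shared by every table the catalog
// loads (as with HadoopCatalog). Closing it tears down shared resources,
// e.g. the thread pools S3FileIO would own.
class SharedIO implements Closeable {
  final AtomicBoolean closed = new AtomicBoolean(false);

  @Override
  public void close() {
    closed.set(true);
  }
}

class DemoCatalog implements Closeable {
  private final SharedIO io = new SharedIO(); // one instance for the whole catalog

  SharedIO io() {
    return io; // every table handed out references this same instance
  }

  @Override
  public void close() {
    io.close(); // the only place the shared FileIO is shut down
  }
}

public class CatalogCloseSketch {
  public static void main(String[] args) {
    DemoCatalog catalog = new DemoCatalog();
    SharedIO tableIo = catalog.io();
    // Evicting one table from a cache must NOT close the shared IO,
    // since other live tables still hold the same reference.
    System.out.println("open after eviction: " + !tableIo.closed.get());
    catalog.close();
    System.out.println("closed after catalog.close(): " + tableIo.closed.get());
  }
}
```

Under this model, a per-table `close()` in the removal listener would shut down resources still in use by every other cached table.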
Done, pushed the fix.