Skip to content

GH-49614: [C++] Fix silent truncation in base64_decode on invalid input#49660

Open
Reranko05 wants to merge 1 commit intoapache:mainfrom
Reranko05:fix-base64-invalid-input
Open

GH-49614: [C++] Fix silent truncation in base64_decode on invalid input#49660
Reranko05 wants to merge 1 commit intoapache:mainfrom
Reranko05:fix-base64-invalid-input

Conversation

@Reranko05
Copy link
Copy Markdown

@Reranko05 Reranko05 commented Apr 4, 2026

Rationale for this change

arrow::util::base64_decode silently truncates output when encountering invalid base64 characters, returning partial results without signaling an error. This can lead to unintended data corruption.

What changes are included in this PR?

  • Add upfront validation of input characters in base64_decode
  • Return an empty string if invalid base64 characters are detected
  • Prevent silent truncation of decoded output

Are these changes tested?

Yes. A unit test has been added to verify that invalid input returns an empty string.

Are there any user-facing changes?

Yes. Previously, invalid base64 input could result in partial decoded output. Now, such inputs return an empty string.

This PR contains a "Critical Fix".

This change fixes a correctness issue where invalid base64 input could result in silently truncated output, leading to incorrect data being produced. The fix ensures such inputs are detected and handled safely.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 4, 2026

⚠️ GitHub issue #49614 has been automatically assigned in GitHub to PR creator.


for (char c : encoded_string) {
if (!(is_base64(c) || c == '=')) {
return "";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not comfortable with "" as the error path here. It’s indistinguishable from a valid decode of empty input, so malformed input still fails silently. I’d prefer this API to fail explicitly (Result<std::string> / checked variant) and have Gandiva propagate that as an error.

Returning null would be slightly better than returning "", because at least it doesn’t collide with a valid decoded empty string. But I still don’t think it’s the right default behavior here as null still turns malformed input into a regular value rather than an explicit failure.

std::string ret;

for (char c : encoded_string) {
if (!(is_base64(c) || c == '=')) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is absolutely insufficient and will not trip on input like abcd=AAA. Please do some research on best practices for sufficient and efficient base64 input validation.

std::string input = "hello world!"; // invalid base64
std::string output = arrow::util::base64_decode(input);

EXPECT_EQ(output, "");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More tests! In our day and age with tools that we have this is not even bare minimum. Null input? Valid input? Non-ascii input?

Did you locate other tests? I'm not seeing any other tests for base64_decode in this file so where are they?

@Reranko05 Reranko05 force-pushed the fix-base64-invalid-input branch from 4670ec5 to 5c7db64 Compare April 4, 2026 20:38
@Reranko05
Copy link
Copy Markdown
Author

Thanks for the feedback. I’ve updated the implementation and tests.

  • Added stricter validation (length, padding placement/count, allowed characters)
  • Removed early termination in the decode loop to avoid silent truncation
  • Expanded test coverage to include invalid inputs and edge cases

All tests pass locally. Please let me know if any further adjustments are needed.

@Reranko05 Reranko05 force-pushed the fix-base64-invalid-input branch from 5c7db64 to 8f053b7 Compare April 4, 2026 21:07
Copy link
Copy Markdown
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use arrow::Result<std::string> return type instead of using ARROW_LOG()?

@kou
Copy link
Copy Markdown
Member

kou commented Apr 5, 2026

FYI: You can run CI on your fork by enabling GitHub Actions on your fork.

@Reranko05
Copy link
Copy Markdown
Author

Could you use arrow::Result<std::string> return type instead of using ARROW_LOG()?

Thanks for the suggestion @kou !

Just to clarify, would you prefer changing the existing base64_decode API to return arrow::Resultstd::string, or introducing a separate checked variant while keeping the current API unchanged?

I want to make sure the approach aligns with existing usage and expectations.

@kou
Copy link
Copy Markdown
Member

kou commented Apr 5, 2026

"changing the existing base64_decode API to return arrow::Resultstd::string".
But I want to know how many changes are required for existing code that use base64_decode().

@Reranko05
Copy link
Copy Markdown
Author

@kou I checked the current usages of base64_decode(), and it appears to be used in a very limited number of places (primarily in tests and one internal call site in flight_test.cc).

Updating to arrow::Result<std::string> would require adjusting those call sites to use ARROW_ASSIGN_OR_RAISE, but the impact seems quite localized and manageable.

I can proceed with the API change and update the affected call sites accordingly.

@Reranko05
Copy link
Copy Markdown
Author

Hi @kou, just following up on this.

I can proceed with updating base64_decode() to return arrow::Result<std::string> and adjust the affected call sites accordingly. Please let me know if this approach looks good, or if you'd prefer any alternative.

Happy to proceed based on your guidance.

@kou
Copy link
Copy Markdown
Member

kou commented Apr 8, 2026

Oh, sorry. I forgot to reply this...

Yes. Let's proceed with arrow::Result<std::string>.

@Reranko05 Reranko05 force-pushed the fix-base64-invalid-input branch from 8f053b7 to 34a388c Compare April 8, 2026 09:53
@Reranko05
Copy link
Copy Markdown
Author

Hi @kou, thanks for confirming!

I’ve updated base64_decode() to return arrow::Result<std::string> and added validation for invalid inputs (length, padding, and non-base64 characters). I also updated the tests accordingly.

All tests are passing locally. Please let me know if you’d like any changes or adjustments.

"0123456789+/";

auto is_base64 = [](unsigned char c) -> bool {
return (std::isalnum(c) || (c == '+') || (c == '/'));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing this logic right below base64_chars definition is a bit strange

return arrow::Status::Invalid("Invalid base64 input: length is not a multiple of 4");
}

size_t padding_start = encoded_string.find('=');
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a fan of separate validation loop as it is a performance overhead.
Please validate and decode in a single pass.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing it out. I’ll integrate the validation into the decoding loop to avoid the extra pass.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the current implementation still uses 2 passes.


ARROW_EXPORT
std::string base64_decode(std::string_view s);
arrow::Result<std::string> base64_decode(std::string_view s);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to change the callers?
Perhaps a single integration test that would call a caller function with an invalid input to validate.

Copy link
Copy Markdown
Author

@Reranko05 Reranko05 Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on @kou's earlier suggestion, I proceeded with updating base64_decode() to return arrow::Result<std::string> and adjusted the existing usages accordingly.


#include "arrow/util/visibility.h"
#include "arrow/result.h"
#include "arrow/status.h"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this? I think that arrow/result.h includes arrow/status.h.

Comment on lines +235 to +238
ASSERT_TRUE(result.starts_with("0")) << result;
ASSERT_TRUE(result.rfind("0", 0) == 0) << result;
result = ToChars(0.25);
ASSERT_TRUE(result.starts_with("0.25")) << result;
ASSERT_TRUE(result.rfind("0.25", 0) == 0) << result;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need these changes?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this change is not directly related to the base64 fix. I updated it earlier due to compatibility issues with starts_with, but I’ll revert it to keep the scope of this PR focused.

Comment on lines +243 to +244
auto r1 = arrow::util::base64_decode("Zg==");
ASSERT_TRUE(r1.ok());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use ASSERT_OK_AND_ASSIGN()?

Suggested change
auto r1 = arrow::util::base64_decode("Zg==");
ASSERT_TRUE(r1.ok());
ASSERT_OK_AND_ASSIGN(auto string, arrow::util::base64_decode("Zg=="));

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, thanks! I’ll update the tests to use ASSERT_OK_AND_ASSIGN.

auto r1 = arrow::util::base64_decode("Zg==");
ASSERT_TRUE(r1.ok());
EXPECT_EQ(r1.ValueOrDie(), "f");
auto r2 = arrow::util::base64_decode("Zm8=");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse one variable instead of declaring multiple variables (r1, r2, ...)?

Or could you use more meaningful variable name for each case such as two_paddings, one_padding and no_padding?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll update the tests to use more meaningful variable names while also aligning with ASSERT_OK_AND_ASSIGN.

Comment on lines +252 to +254
auto r4 = arrow::util::base64_decode("aGVsbG8gd29ybGQ=");
ASSERT_TRUE(r4.ok());
EXPECT_EQ(r4.ValueOrDie(), "hello world");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this case? What is the difference with other cases?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, this case doesn’t add new coverage beyond the existing ones. I’ll remove it to keep the tests focused.

Comment on lines +285 to +286
auto r1 = arrow::util::base64_decode("====");
ASSERT_FALSE(r1.ok());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this is an invalid padding case.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right.. this case should fall under invalid padding I’ll move it accordingly.

}

std::string base64_decode(std::string_view encoded_string) {
arrow::Result<std::string> base64_decode(std::string_view encoded_string) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use Result here because this is in arrow::util namespace:

Suggested change
arrow::Result<std::string> base64_decode(std::string_view encoded_string) {
Result<std::string> base64_decode(std::string_view encoded_string) {

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I’ll update this to use Result directly within the namespace.

if (padding_start != std::string::npos) {
for (size_t k = padding_start; k < encoded_string.size(); ++k) {
if (encoded_string[k] != '=') {
return arrow::Status::Invalid("Invalid base64 input: padding character '=' found at invalid position");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return arrow::Status::Invalid("Invalid base64 input: padding character '=' found at invalid position");
return Status::Invalid("Invalid base64 input: padding character '=' found at invalid position");

Comment on lines +113 to +115
auto is_base64 = [](unsigned char c) -> bool {
return (std::isalnum(c) || (c == '+') || (c == '/'));
};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to redefine this?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The redefinition isn’t necessary. I’ll remove this and reuse the existing base64_chars for validation instead.

Comment on lines +121 to +139
size_t padding_start = encoded_string.find('=');
if (padding_start != std::string::npos) {
for (size_t k = padding_start; k < encoded_string.size(); ++k) {
if (encoded_string[k] != '=') {
return arrow::Status::Invalid("Invalid base64 input: padding character '=' found at invalid position");
}
}

size_t padding_count = encoded_string.size() - padding_start;
if (padding_count > 2) {
return arrow::Status::Invalid("Invalid base64 input: too many padding characters");
}
}

for (char c : encoded_string) {
if (c != '=' && !is_base64(c)) {
return arrow::Status::Invalid("Invalid base64 input: contains non-base64 character '" + std::string(1, c) + "'");
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These validations traverses the encoded_string multiple times, right? Does this have performance penalty for large input?
Can we optimize them?

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a correctness issue in Arrow’s C++ base64 decoder by ensuring malformed base64 input is detected instead of producing silently truncated/partial output.

Changes:

  • Adds pre-validation for base64 input (length, padding placement, invalid characters) in base64_decode.
  • Changes base64_decode API to return arrow::Result<std::string> with Status::Invalid on malformed input.
  • Adds unit tests covering valid/invalid decoding cases (and adjusts a couple of ToChars assertions).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

File Description
cpp/src/arrow/vendored/base64.cpp Adds base64 input validation and switches decode to return errors instead of partial output.
cpp/src/arrow/util/base64.h Updates public API signature of base64_decode to return Result<std::string>.
cpp/src/arrow/util/string_test.cc Adds tests for base64 decode validity/error cases and tweaks ToChars expectations.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


ARROW_EXPORT
std::string base64_decode(std::string_view s);
arrow::Result<std::string> base64_decode(std::string_view s);
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

base64_decode was changed from returning std::string to arrow::Result<std::string>, but there are existing call sites in the repo that still treat it as a std::string (e.g., constructing streams, assigning to std::string). As-is, this is an API/ABI breaking change and will not compile unless all callers are updated or a backwards-compatible overload/wrapper is kept.

Suggested change
arrow::Result<std::string> base64_decode(std::string_view s);
std::string base64_decode(std::string_view s);

Copilot uses AI. Check for mistakes.
Comment on lines 31 to 35
std::string base64_encode(std::string_view s);

ARROW_EXPORT
std::string base64_decode(std::string_view s);
arrow::Result<std::string> base64_decode(std::string_view s);

Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change conflicts with the PR description/user-facing behavior: the description says invalid input should return an empty string, but the new signature returns Result<std::string> and the implementation returns Status::Invalid(...) on malformed inputs. Please align the implementation/API and the stated behavior (either update the description and downstream expectations, or preserve the old std::string API that returns "" on invalid input).

Copilot uses AI. Check for mistakes.
Comment on lines 32 to +37
#include "arrow/util/base64.h"
#include "arrow/util/logging.h"
#include "arrow/result.h"
#include "arrow/status.h"
#include <iostream>
#include <cctype>
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arrow/util/logging.h is newly included here but isn't used anywhere in this file. Please remove the unused include to avoid unnecessary dependencies and potential lint/Werror failures.

Copilot uses AI. Check for mistakes.
Comment on lines +108 to +116
static const std::string base64_chars =
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789+/";

auto is_base64 = [](unsigned char c) -> bool {
return (std::isalnum(c) || (c == '+') || (c == '/'));
};

Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inside base64_decode, base64_chars and is_base64 are redefined even though the same base64_chars and is_base64 already exist at file scope above. This introduces name shadowing (potentially tripping -Wshadow) and also leaves the file-scope is_base64 unused. Prefer reusing the existing helpers (or refactor to a single definition) to avoid warnings and duplication.

Suggested change
static const std::string base64_chars =
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789+/";
auto is_base64 = [](unsigned char c) -> bool {
return (std::isalnum(c) || (c == '+') || (c == '/'));
};

Copilot uses AI. Check for mistakes.
}

size_t padding_start = encoded_string.find('=');
if (padding_start != std::string::npos) {
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

padding_start is computed via encoded_string.find('=') on a std::string_view, but compared against std::string::npos. This works but is inconsistent and easy to misread; prefer comparing against std::string_view::npos when operating on a string_view.

Suggested change
if (padding_start != std::string::npos) {
if (padding_start != std::string_view::npos) {

Copilot uses AI. Check for mistakes.
Comment on lines +135 to +139
for (char c : encoded_string) {
if (c != '=' && !is_base64(c)) {
return arrow::Status::Invalid("Invalid base64 input: contains non-base64 character '" + std::string(1, c) + "'");
}
}
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The invalid-character error message embeds the raw byte into the Status string (std::string(1, c)). For non-printable / non-UTF8 bytes (your tests include 0xFF), this can produce unreadable or problematic error messages/logging. Consider reporting the offending byte as an escaped/hex value (and/or the index) instead of inserting it verbatim.

Suggested change
for (char c : encoded_string) {
if (c != '=' && !is_base64(c)) {
return arrow::Status::Invalid("Invalid base64 input: contains non-base64 character '" + std::string(1, c) + "'");
}
}
auto format_hex_byte = [](unsigned char c) -> std::string {
static const char hex_digits[] = "0123456789ABCDEF";
std::string hex = "0x00";
hex[2] = hex_digits[(c >> 4) & 0x0F];
hex[3] = hex_digits[c & 0x0F];
return hex;
};
for (size_t k = 0; k < encoded_string.size(); ++k) {
unsigned char c = static_cast<unsigned char>(encoded_string[k]);
if (c != '=' && !is_base64(c)) {
return arrow::Status::Invalid(
"Invalid base64 input: contains non-base64 byte " + format_hex_byte(c) +
" at position " + std::to_string(k));
}
}

Copilot uses AI. Check for mistakes.
}

TEST(Base64DecodeTest, NonAsciiInput) {
std::string input = std::string("abcd") + char(0xFF) + "==";
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NonAsciiInput is currently guaranteed to fail the new size % 4 == 0 validation (the constructed string length is 7), so it doesn't actually exercise the non-ASCII character validation path. Adjust the test input to a length that is a multiple of 4 so it fails specifically due to the non-base64 byte (e.g., keep padding rules valid and include the 0xFF byte).

Suggested change
std::string input = std::string("abcd") + char(0xFF) + "==";
std::string input = std::string("abc") + static_cast<char>(0xFF);

Copilot uses AI. Check for mistakes.
@Reranko05 Reranko05 force-pushed the fix-base64-invalid-input branch from 34a388c to ed84348 Compare April 8, 2026 17:09
@Reranko05
Copy link
Copy Markdown
Author

Reranko05 commented Apr 8, 2026

Hi @kou, I have addressed all review comments:

  • Removed redundant helpers and reused existing base64_chars
  • Merged validation into decoding loop (single-pass)
  • Fixed padding validation to ensure '=' only appears at the end
  • Cleaned up includes and namespace usage
  • Updated tests to improve coverage and follow Arrow conventions

All tests pass locally.

@kou kou requested a review from Copilot April 8, 2026 21:14
@kou
Copy link
Copy Markdown
Member

kou commented Apr 8, 2026

Could you enable GitHub Actions on your fork to run CI on your fork too?

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +119 to +124
while (in_len-- && encoded_string[in_] != '=') {
unsigned char c = encoded_string[in_];

if (base64_chars.find(c) == std::string::npos) {
return Status::Invalid("Invalid base64 input: contains non-base64 byte at position " + std::to_string(in_));
}
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation loop can exit with i == 2 for inputs with == padding (e.g. "Zg=="). In the trailing partial-quantum handling later in this function, char_array_3[1] is computed using char_array_4[2] even when i == 2, which reads an uninitialized stack value (undefined behavior). Consider zero-initializing the remaining char_array_4 slots before computing char_array_3, or only computing the bytes that will actually be appended based on i.

Copilot uses AI. Check for mistakes.
Comment on lines +101 to +103
if (encoded_string.size() % 4 != 0) {
return Status::Invalid("Invalid base64 input: length is not a multiple of 4");
}
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description says invalid input should "return an empty string", but the implementation now returns Status::Invalid(...) (and the header signature is Result<std::string>). Either update the PR description/user-facing notes to reflect the new error-reporting API, or adjust the implementation to match the documented behavior (e.g. preserve the std::string API and return "" on invalid input).

Copilot uses AI. Check for mistakes.
Comment on lines +256 to +263
TEST(Base64DecodeTest, InvalidLength) {
ASSERT_RAISES(Invalid, arrow::util::base64_decode("abc"));
ASSERT_RAISES(Invalid, arrow::util::base64_decode("abcde"));
}

TEST(Base64DecodeTest, InvalidCharacters) {
ASSERT_RAISES(Invalid, arrow::util::base64_decode("ab$="));
}
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests assert that invalid input raises Invalid, but the PR description/user-facing notes say invalid base64 should return an empty string. Please align the tests with the intended public behavior (either update the implementation/headers to return "" on invalid input, or update the PR description to reflect the new error-returning API).

Copilot uses AI. Check for mistakes.
Comment on lines +243 to +253
ASSERT_OK_AND_ASSIGN(auto two_paddings, arrow::util::base64_decode("Zg=="));
EXPECT_EQ(two_paddings, "f");

ASSERT_OK_AND_ASSIGN(auto one_padding, arrow::util::base64_decode("Zm8="));
EXPECT_EQ(one_padding, "fo");

ASSERT_OK_AND_ASSIGN(auto no_padding, arrow::util::base64_decode("Zm9v"));
EXPECT_EQ(no_padding, "foo");

ASSERT_OK_AND_ASSIGN(auto single_char, arrow::util::base64_decode("TQ=="));
EXPECT_EQ(single_char, "M");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that f and M cases check the same pattern.

Comment on lines +257 to +258
ASSERT_RAISES(Invalid, arrow::util::base64_decode("abc"));
ASSERT_RAISES(Invalid, arrow::util::base64_decode("abcde"));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry. Could you use ASSERT_RAISES_WITH_MESSAGE() to check message too?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you create base64_test.cc instead of reusing existing string_test.cc?
In general, we create XXX_test.cc for XXX.{cc,h}.

We can build base64_test.cc with the following CMakeLists.txt change:

diff --git a/cpp/src/arrow/util/CMakeLists.txt b/cpp/src/arrow/util/CMakeLists.txt
index 4352716ebd..deb3e9e3fb 100644
--- a/cpp/src/arrow/util/CMakeLists.txt
+++ b/cpp/src/arrow/util/CMakeLists.txt
@@ -49,6 +49,7 @@ add_arrow_test(utility-test
                SOURCES
                align_util_test.cc
                atfork_test.cc
+               base64_test.cc
                byte_size_test.cc
                byte_stream_split_test.cc
                cache_test.cc

return arrow::Status::Invalid("Invalid base64 input: length is not a multiple of 4");
}

size_t padding_start = encoded_string.find('=');
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the current implementation still uses 2 passes.

@Reranko05
Copy link
Copy Markdown
Author

Could you enable GitHub Actions on your fork to run CI on your fork too?

Sure, I'll do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants