
Convert Arrow Column objects to Python lists to prevent ValueError in… #1217

Open
anmol52490 wants to merge 1 commit into huggingface:main from anmol52490:main

@anmol52490

… tokenizer. Matches the fix implemented in the zh-TW version.

This PR fixes a ValueError that occurs when passing dataset columns directly to the tokenizer.

In current versions of the datasets library, indexing a column returns an Arrow Column object instead of a standard Python list. The tokenizer requires a list to function correctly.

History/Context:

  • This issue was previously identified and addressed in the zh-TW translation of the course.
  • This PR applies the same logic to the English version to maintain consistency and fix the broken example.

Changes:

  • Wrapped raw_datasets["train"]["sentence1"] and ["sentence2"] with list() to ensure compatibility.
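
The failure mode and the fix can be sketched without downloading a dataset or a tokenizer. In the snippet below, `FakeColumn` and `strict_tokenize` are hypothetical stand-ins invented for illustration (they are not part of `datasets` or `transformers`): the first is a list-like object that is not a `list`, the second rejects anything other than a string or a list of strings. This is only a rough model of the behaviour described above, not the libraries' actual code.

```python
from collections.abc import Sequence


class FakeColumn(Sequence):
    """Hypothetical stand-in for a dataset column: list-like, but not a list."""

    def __init__(self, values):
        self._values = list(values)

    def __getitem__(self, index):
        return self._values[index]

    def __len__(self):
        return len(self._values)


def strict_tokenize(text):
    """Hypothetical stand-in for a tokenizer that accepts only str or list[str]."""
    if isinstance(text, str):
        return text.split()
    if isinstance(text, list) and all(isinstance(t, str) for t in text):
        return [t.split() for t in text]
    raise ValueError("input must be a str or a list of str")


column = FakeColumn(["The cat sat .", "A dog barked ."])

# Passing the column object directly is rejected by the type check.
try:
    strict_tokenize(column)
    rejected = False
except ValueError:
    rejected = True

# Wrapping the column in list() materializes a plain Python list first.
tokens = strict_tokenize(list(column))
```

In the course example, the corresponding change is to wrap `raw_datasets["train"]["sentence1"]` and `raw_datasets["train"]["sentence2"]` in `list()` before handing them to the tokenizer.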

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


2 participants