From c789fa5dd0e14a01e66d9ca9f237f703d4540c07 Mon Sep 17 00:00:00 2001 From: Dhruvil Date: Sun, 22 Feb 2026 23:17:24 -0800 Subject: [PATCH 1/3] fix: replace deprecated grouped_entities with aggregation_strategy The `grouped_entities=True` parameter was removed from `TokenClassificationPipeline`. Replace all occurrences across all language chapters and subtitles with the current equivalent: `aggregation_strategy="simple"`. Co-Authored-By: Claude Sonnet 4.6 --- chapters/de/chapter1/10.mdx | 4 ++-- chapters/de/chapter1/3.mdx | 4 ++-- chapters/en/chapter1/3.mdx | 4 ++-- chapters/en/chapter1/7.mdx | 4 ++-- chapters/es/chapter1/10.mdx | 4 ++-- chapters/es/chapter1/3.mdx | 4 ++-- chapters/fr/chapter1/10.mdx | 4 ++-- chapters/fr/chapter1/3.mdx | 4 ++-- chapters/hi/chapter1/10.mdx | 4 ++-- chapters/hi/chapter1/3.mdx | 4 ++-- chapters/it/chapter1/10.mdx | 4 ++-- chapters/it/chapter1/3.mdx | 4 ++-- chapters/ja/chapter1/10.mdx | 4 ++-- chapters/ja/chapter1/3.mdx | 4 ++-- chapters/ko/chapter1/10.mdx | 4 ++-- chapters/ko/chapter1/3.mdx | 4 ++-- chapters/my/chapter1/3.mdx | 6 +++--- chapters/my/chapter1/7.mdx | 6 +++--- chapters/pt/chapter1/10.mdx | 4 ++-- chapters/pt/chapter1/3.mdx | 4 ++-- chapters/ro/chapter1/10.mdx | 4 ++-- chapters/ro/chapter1/3.mdx | 4 ++-- chapters/ru/chapter1/10.mdx | 4 ++-- chapters/ru/chapter1/3.mdx | 4 ++-- chapters/te/chapter1/3.mdx | 4 ++-- chapters/te/chapter1/7.mdx | 4 ++-- chapters/th/chapter1/10.mdx | 4 ++-- chapters/th/chapter1/3.mdx | 4 ++-- chapters/vi/chapter1/10.mdx | 4 ++-- chapters/vi/chapter1/3.mdx | 4 ++-- chapters/zh-CN/chapter1/10.mdx | 4 ++-- chapters/zh-CN/chapter1/3.mdx | 4 ++-- chapters/zh-TW/chapter1/10.mdx | 4 ++-- chapters/zh-TW/chapter1/3.mdx | 4 ++-- subtitles/en/raw/chapter1/03_the-pipeline-function.md | 2 +- 35 files changed, 71 insertions(+), 71 deletions(-) diff --git a/chapters/de/chapter1/10.mdx b/chapters/de/chapter1/10.mdx index 0d6054011..9841da200 100644 --- a/chapters/de/chapter1/10.mdx +++ 
b/chapters/de/chapter1/10.mdx @@ -39,7 +39,7 @@ Doch zuerst wollen wir noch testen, was du in diesem Kapitel gelernt hast! ```py from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -55,7 +55,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "Er gibt Begriffe zurück, die für Personen, Organisationen oder Orte stehen.", - explain: "Außerdem werden mit grouped_entities=True die Wörter, die zur selben Entität gehören, gruppiert, wie z. B. \"Hugging Face\".", + explain: "Außerdem werden mit aggregation_strategy=\"simple\" die Wörter, die zur selben Entität gehören, gruppiert, wie z. B. \"Hugging Face\".", correct: true } ]} diff --git a/chapters/de/chapter1/3.mdx b/chapters/de/chapter1/3.mdx index 9e15b54d2..1e2c509d5 100644 --- a/chapters/de/chapter1/3.mdx +++ b/chapters/de/chapter1/3.mdx @@ -203,7 +203,7 @@ Bei der Eigennamenerkennung (engl. Named Entity Recognition, NER) handelt es sic ```python from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -216,7 +216,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") Hier hat das Modell richtig erkannt, dass Sylvain eine Person (PER), Hugging Face eine Organisation (ORG) und Brooklyn ein Ort (LOC) ist. -In der Funktion zur Erstellung der Pipeline übergeben wir die Option `grouped_entities=True`, um die Pipeline anzuweisen, die Teile des Satzes, die der gleichen Entität entsprechen, zu gruppieren: Hier hat das Modell "Hugging" und "Face" richtigerweise als eine einzelne Organisation gruppiert, auch wenn der Name aus mehreren Wörtern besteht. Wie wir im nächsten Kapitel sehen werden, werden bei der Vorverarbeitung (engl.
Preprocessing) sogar einige Wörter in kleinere Teile zerlegt. Zum Beispiel wird `Sylvain` in vier Teile zerlegt: `S`, `##yl`, `##va` und `##in`. Im Nachverarbeitungsschritt (engl. Post-Processing) hat die Pipeline diese Teile erfolgreich neu gruppiert. +In der Funktion zur Erstellung der Pipeline übergeben wir die Option `aggregation_strategy="simple"`, um die Pipeline anzuweisen, die Teile des Satzes, die der gleichen Entität entsprechen, zu gruppieren: Hier hat das Modell "Hugging" und "Face" richtigerweise als eine einzelne Organisation gruppiert, auch wenn der Name aus mehreren Wörtern besteht. Wie wir im nächsten Kapitel sehen werden, werden bei der Vorverarbeitung (engl. Preprocessing) sogar einige Wörter in kleinere Teile zerlegt. Zum Beispiel wird `Sylvain` in vier Teile zerlegt: `S`, `##yl`, `##va` und `##in`. Im Nachverarbeitungsschritt (engl. Post-Processing) hat die Pipeline diese Teile erfolgreich neu gruppiert. > [!TIP] > ✏️ **Probiere es aus!** Suche im Model Hub nach einem Modell, das in der Lage ist, Part-of-Speech-Tagging (in der Regel als POS abgekürzt) im Englischen durchzuführen (Anm.: d. h. Wortarten zuzuordnen). Was sagt dieses Modell für den Satz im obigen Beispiel vorher? diff --git a/chapters/en/chapter1/3.mdx b/chapters/en/chapter1/3.mdx index 2865476fb..7efc4ef8e 100644 --- a/chapters/en/chapter1/3.mdx +++ b/chapters/en/chapter1/3.mdx @@ -223,7 +223,7 @@ Named entity recognition (NER) is a task where the model has to find which parts ```python from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -236,7 +236,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") Here the model correctly identified that Sylvain is a person (PER), Hugging Face an organization (ORG), and Brooklyn a location (LOC). 
-We pass the option `grouped_entities=True` in the pipeline creation function to tell the pipeline to regroup together the parts of the sentence that correspond to the same entity: here the model correctly grouped "Hugging" and "Face" as a single organization, even though the name consists of multiple words. In fact, as we will see in the next chapter, the preprocessing even splits some words into smaller parts. For instance, `Sylvain` is split into four pieces: `S`, `##yl`, `##va`, and `##in`. In the post-processing step, the pipeline successfully regrouped those pieces. +We pass the option `aggregation_strategy="simple"` in the pipeline creation function to tell the pipeline to regroup together the parts of the sentence that correspond to the same entity: here the model correctly grouped "Hugging" and "Face" as a single organization, even though the name consists of multiple words. In fact, as we will see in the next chapter, the preprocessing even splits some words into smaller parts. For instance, `Sylvain` is split into four pieces: `S`, `##yl`, `##va`, and `##in`. In the post-processing step, the pipeline successfully regrouped those pieces. > [!TIP] > ✏️ **Try it out!** Search the Model Hub for a model able to do part-of-speech tagging (usually abbreviated as POS) in English. What does this model predict for the sentence in the example above? diff --git a/chapters/en/chapter1/7.mdx b/chapters/en/chapter1/7.mdx index a01a020fb..4091472f3 100644 --- a/chapters/en/chapter1/7.mdx +++ b/chapters/en/chapter1/7.mdx @@ -37,7 +37,7 @@ This quiz is ungraded, so you can try it as many times as you want. 
If you strug ```py from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -53,7 +53,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "It will return the words representing persons, organizations or locations.", - explain: "Furthermore, with grouped_entities=True, it will group together the words belonging to the same entity, like \"Hugging Face\".", + explain: "Furthermore, with aggregation_strategy=\"simple\", it will group together the words belonging to the same entity, like \"Hugging Face\".", correct: true } ]} diff --git a/chapters/es/chapter1/10.mdx b/chapters/es/chapter1/10.mdx index 6bad89a0d..7eab034d3 100644 --- a/chapters/es/chapter1/10.mdx +++ b/chapters/es/chapter1/10.mdx @@ -36,7 +36,7 @@ Por ahora, ¡revisemos lo que aprendiste en este capítulo! ```py from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -52,7 +52,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "Devuelve las palabras que representan personas, organizaciones o ubicaciones.", - explain: "Adicionalmente, con grouped_entities=True, agrupará las palabras que pertenecen a la misma entidad, como \"Hugging Face\".", + explain: "Adicionalmente, con aggregation_strategy=\"simple\", agrupará las palabras que pertenecen a la misma entidad, como \"Hugging Face\".", correct: true } ]} diff --git a/chapters/es/chapter1/3.mdx b/chapters/es/chapter1/3.mdx index 7b59905a0..de242183f 100644 --- a/chapters/es/chapter1/3.mdx +++ b/chapters/es/chapter1/3.mdx @@ -203,7 +203,7 @@ El reconocimiento de entidades nombradas (REN) es una tarea en la que el modelo ```python from transformers import pipeline -ner = pipeline("ner", grouped_entities=True)
+ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -216,7 +216,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") En este caso el modelo identificó correctamente que Sylvain es una persona (PER), Hugging Face una organización (ORG) y Brooklyn una ubicación (LOC). -Pasamos la opción `grouped_entities=True` en la función de creación del pipeline para decirle que agrupe las partes de la oración que corresponden a la misma entidad: Aquí el modelo agrupó correctamente "Hugging" y "Face" como una sola organización, a pesar de que su nombre está compuesto de varias palabras. De hecho, como veremos en el siguiente capítulo, el preprocesamiento puede incluso dividir palabras en partes más pequeñas. Por ejemplo, 'Sylvain' se separa en cuatro piezas: `S`, `##yl`, `##va` y`##in`. En el paso de prosprocesamiento, el pipeline reagrupa de manera exitosa dichas piezas. +Pasamos la opción `aggregation_strategy="simple"` en la función de creación del pipeline para decirle que agrupe las partes de la oración que corresponden a la misma entidad: Aquí el modelo agrupó correctamente "Hugging" y "Face" como una sola organización, a pesar de que su nombre está compuesto de varias palabras. De hecho, como veremos en el siguiente capítulo, el preprocesamiento puede incluso dividir palabras en partes más pequeñas. Por ejemplo, 'Sylvain' se separa en cuatro piezas: `S`, `##yl`, `##va` y`##in`. En el paso de prosprocesamiento, el pipeline reagrupa de manera exitosa dichas piezas. > [!TIP] > ✏️ **¡Pruébalo!** Busca en el Model Hub un modelo capaz de hacer etiquetado *part-of-speech* (que se abrevia usualmente como POS) en Inglés. ¿Qué predice este modelo para la oración en el ejemplo de arriba? 
diff --git a/chapters/fr/chapter1/10.mdx b/chapters/fr/chapter1/10.mdx index 889612715..c486b1c64 100644 --- a/chapters/fr/chapter1/10.mdx +++ b/chapters/fr/chapter1/10.mdx @@ -38,7 +38,7 @@ Mais avant d'aller plus loin, prenons un instant pour voir ce que vous avez appr ```py from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner( "My name is Sylvain and I work at Hugging Face in Brooklyn." ) # Je m'appelle Sylvain et je travaille à Hugging Face à Brooklyn. @@ -56,7 +56,7 @@ ner( }, { text: "Il renvoie les entités nommées dans cette phrase, telles que les personnes, les organisations ou lieux.", - explain: "De plus, avec grouped_entities=True, cela regroupe les mots appartenant à la même entité, comme par exemple \"Hugging Face\".", + explain: "De plus, avec aggregation_strategy=\"simple\", cela regroupe les mots appartenant à la même entité, comme par exemple \"Hugging Face\".", correct: true } ]} diff --git a/chapters/fr/chapter1/3.mdx b/chapters/fr/chapter1/3.mdx index 0d528063c..c9f94cca2 100644 --- a/chapters/fr/chapter1/3.mdx +++ b/chapters/fr/chapter1/3.mdx @@ -227,7 +227,7 @@ La reconnaissance d'entités nommées ou NER (pour *Named Entity Recognition*) e ```python from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner( "My name is Sylvain and I work at Hugging Face in Brooklyn." ) # Je m'appelle Sylvain et je travaille à Hugging Face à Brooklyn. @@ -242,7 +242,7 @@ ner( Nous pouvons voir que le modèle a correctement identifié Sylvain comme une personne (PER), Hugging Face comme une organisation (ORG) et Brooklyn comme un lieu (LOC).
-Il est possible d'utiliser l'option `grouped_entities=True` lors de la création du pipeline pour regrouper les parties du texte qui correspondent à la même entité : ici le modèle à correctement regroupé `Hugging` et `Face` comme une seule organisation, même si le nom comporte plusieurs mots. En effet, comme nous allons voir dans le prochain chapitre, la prétraitement du texte sépare parfois certains mots en plus petites parties. Par exemple, `Sylvain` est séparé en quatre morceaux : `S`, `##yl`, `##va`, et `##in`. Dans l'étape de post-traitement, le pipeline a réussi à regrouper ces morceaux. +Il est possible d'utiliser l'option `aggregation_strategy="simple"` lors de la création du pipeline pour regrouper les parties du texte qui correspondent à la même entité : ici le modèle à correctement regroupé `Hugging` et `Face` comme une seule organisation, même si le nom comporte plusieurs mots. En effet, comme nous allons voir dans le prochain chapitre, la prétraitement du texte sépare parfois certains mots en plus petites parties. Par exemple, `Sylvain` est séparé en quatre morceaux : `S`, `##yl`, `##va`, et `##in`. Dans l'étape de post-traitement, le pipeline a réussi à regrouper ces morceaux. > [!TIP] > ✏️ **Essayez !** Recherchez sur le *Hub* un modèle capable de reconnaître les différentes parties du langage (généralement abrégé en POS pour *Part-of-speech*) en anglais. Que prédit le modèle pour la phrase dans notre exemple du pipeline au-dessus ? 
diff --git a/chapters/hi/chapter1/10.mdx b/chapters/hi/chapter1/10.mdx index 6ef1bbed8..f52c6df42 100644 --- a/chapters/hi/chapter1/10.mdx +++ b/chapters/hi/chapter1/10.mdx @@ -34,7 +34,7 @@ ```py from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -50,7 +50,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "यह व्यक्तियों, संगठनों या स्थानों का प्रतिनिधित्व करने वाले शब्दों को वापस कर देगा।", - explain: "इसके अलावा, grouped_entities=True के साथ, यह एक ही इकाई से संबंधित शब्दों को एक साथ समूहित करेगा, जैसे \"हगिंग फेस\"।", + explain: "इसके अलावा, aggregation_strategy=\"simple\" के साथ, यह एक ही इकाई से संबंधित शब्दों को एक साथ समूहित करेगा, जैसे \"हगिंग फेस\"।", correct: true } ]} diff --git a/chapters/hi/chapter1/3.mdx b/chapters/hi/chapter1/3.mdx index 555bb79a2..d6face45e 100644 --- a/chapters/hi/chapter1/3.mdx +++ b/chapters/hi/chapter1/3.mdx @@ -217,7 +217,7 @@ unmasker("This course will teach you all about models.", top_k=2) ```python from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -230,7 +230,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") यहां मॉडल ने सही ढंग से पहचाना कि सिल्वेन एक व्यक्ति (पीईआर), हगिंग फेस एक संगठन (ओआरजी), और ब्रुकलिन एक स्थान (एलओसी) है। -हम पाइपलाइन निर्माण फ़ंक्शन में विकल्प `grouped_entities=True` पास करते हैं ताकि पाइपलाइन को एक ही इकाई के अनुरूप वाक्य के हिस्सों को एक साथ फिर से समूहित करने के लिए कहा जा सके: यहां मॉडल ने एक ही संगठन के रूप में "हगिंग" और "फेस" को सही ढंग से समूहीकृत किया है, भले ही नाम में कई शब्द हों। वास्तव में, जैसा कि हम अगले अध्याय में देखेंगे, प्रीप्रोसेसिंग कुछ शब्दों को छोटे भागों में भी विभाजित करता है। उदाहरण के लिए, `सिल्वेन` को चार भागों में
बांटा गया है: `S`, `##yl`, `##va`, और `##in`। प्रसंस्करण के बाद के चरण में, पाइपलाइन ने उन टुकड़ों को सफलतापूर्वक पुन: समूहित किया। +हम पाइपलाइन निर्माण फ़ंक्शन में विकल्प `aggregation_strategy="simple"` पास करते हैं ताकि पाइपलाइन को एक ही इकाई के अनुरूप वाक्य के हिस्सों को एक साथ फिर से समूहित करने के लिए कहा जा सके: यहां मॉडल ने एक ही संगठन के रूप में "हगिंग" और "फेस" को सही ढंग से समूहीकृत किया है, भले ही नाम में कई शब्द हों। वास्तव में, जैसा कि हम अगले अध्याय में देखेंगे, प्रीप्रोसेसिंग कुछ शब्दों को छोटे भागों में भी विभाजित करता है। उदाहरण के लिए, `सिल्वेन` को चार भागों में बांटा गया है: `S`, `##yl`, `##va`, और `##in`। प्रसंस्करण के बाद के चरण में, पाइपलाइन ने उन टुकड़ों को सफलतापूर्वक पुन: समूहित किया। > [!TIP] > ✏️ **कोशिश करके देखो!** अंग्रेजी में पार्ट-ऑफ-स्पीच टैगिंग (आमतौर पर पीओएस के रूप में संक्षिप्त) करने में सक्षम मॉडल के लिए मॉडल हब खोजें। यह मॉडल उपरोक्त उदाहरण में वाक्य के लिए क्या भविष्यवाणी करता है? diff --git a/chapters/it/chapter1/10.mdx b/chapters/it/chapter1/10.mdx index 886706092..731385273 100644 --- a/chapters/it/chapter1/10.mdx +++ b/chapters/it/chapter1/10.mdx @@ -39,7 +39,7 @@ Prima di procedere, però, verifichiamo cos'hai imparato in questo capitolo! 
```py from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -55,7 +55,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "Restituisce i termini che rappresentano persone, organizzazioni o luoghi.", - explain: "Inoltre, grazie a grouped_entities=True, la pipeline è in grado di raggruppare le parole che appartengono alla stessa entità, come \"Hugging Face\".", + explain: "Inoltre, grazie a aggregation_strategy=\"simple\", la pipeline è in grado di raggruppare le parole che appartengono alla stessa entità, come \"Hugging Face\".", correct: true } ]} diff --git a/chapters/it/chapter1/3.mdx b/chapters/it/chapter1/3.mdx index 140e51df2..1d8383613 100644 --- a/chapters/it/chapter1/3.mdx +++ b/chapters/it/chapter1/3.mdx @@ -203,7 +203,7 @@ Il riconoscimento delle entità nominate (*Named entity recognition*, NER) è un ```python from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -216,7 +216,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") Qui il modello ha correttamente identificato che Sylvain è una persona (PER), Hugging Face un'organizzazione (ORG), e Brooklyn una località (LOC). -Passiamo l'opzione `grouped_entities=True` nella funzione di creazione della pipeline per raggruppare le parti frasali che corrispondono alla stessa entità: qui il modello raggruppa correttamente "Hugging" e "Face" come singola organizzazione, nonostante il nome sia formato da più parole. A dire il vero, come vedremo nel prossimo capitolo, il preprocessing divide perfino alcune parole in parti più piccole. Ad esempio, `Sylvain` viene suddiviso in quattro parti: `S`, `##yl`, `##va`, and `##in`.
Al momento del post-processing, la pipeline raggruppa le parti con successo. +Passiamo l'opzione `aggregation_strategy="simple"` nella funzione di creazione della pipeline per raggruppare le parti frasali che corrispondono alla stessa entità: qui il modello raggruppa correttamente "Hugging" e "Face" come singola organizzazione, nonostante il nome sia formato da più parole. A dire il vero, come vedremo nel prossimo capitolo, il preprocessing divide perfino alcune parole in parti più piccole. Ad esempio, `Sylvain` viene suddiviso in quattro parti: `S`, `##yl`, `##va`, and `##in`. Al momento del post-processing, la pipeline raggruppa le parti con successo. > [!TIP] > ✏️ **Provaci anche tu!** Nel Model Hub, cerca un modello capace di effettuare part-of-speech tagging (comunemente abbreviato come POS) in inglese. Cosa predice il modello per la frase nell'esempio qui sopra? diff --git a/chapters/ja/chapter1/10.mdx b/chapters/ja/chapter1/10.mdx index 11b70518a..40f4bca7d 100644 --- a/chapters/ja/chapter1/10.mdx +++ b/chapters/ja/chapter1/10.mdx @@ -37,7 +37,7 @@ ```py from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -53,7 +53,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "この文中の人物、団体、場所を表す単語を返します。", - explain: "さらに、grouped_entities=Trueを用いると、同じエンティティに属する単語をグループ化します。", + explain: "さらに、aggregation_strategy=\"simple\"を用いると、同じエンティティに属する単語をグループ化します。", correct: true } ]} diff --git a/chapters/ja/chapter1/3.mdx b/chapters/ja/chapter1/3.mdx index 92be0cddf..573a55d37 100644 --- a/chapters/ja/chapter1/3.mdx +++ b/chapters/ja/chapter1/3.mdx @@ -206,7 +206,7 @@ unmasker("This course will teach you all about models.", top_k=2) ```python from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and
I work at Hugging Face in Brooklyn.") ``` @@ -219,7 +219,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ここでは、モデルはSylvainが人(PER)、Hugging Faceが組織(ORG)、Brooklynが場所(LOC)であることを正しく識別しています。 -pipelineの作成機能でオプション `grouped_entities=True` を渡すと、同じエンティティに対応する文の部分を再グループ化するようpipelineに指示します。ここでは、名前が複数の単語で構成されていても、モデルは "Hugging" と "Face" を一つの組織として正しくグループ化しています。実際、次の章で説明するように、前処理ではいくつかの単語をより小さなパーツに分割することさえあります。例えば、`Sylvain`は4つの部分に分割されます。`S`, `##yl`, `##va`, and `##in`.です。後処理の段階で、pipelineはこれらの断片をうまく再グループ化しました。 +pipelineの作成機能でオプション `aggregation_strategy="simple"` を渡すと、同じエンティティに対応する文の部分を再グループ化するようpipelineに指示します。ここでは、名前が複数の単語で構成されていても、モデルは "Hugging" と "Face" を一つの組織として正しくグループ化しています。実際、次の章で説明するように、前処理ではいくつかの単語をより小さなパーツに分割することさえあります。例えば、`Sylvain`は4つの部分に分割されます。`S`, `##yl`, `##va`, and `##in`.です。後処理の段階で、pipelineはこれらの断片をうまく再グループ化しました。 > [!TIP] > ✏️ **試してみよう!** Model Hubで英語の品詞タグ付け(通常POSと略される)を行えるモデルを検索してください。このモデルは、上の例の文に対して何を予測するでしょうか? diff --git a/chapters/ko/chapter1/10.mdx b/chapters/ko/chapter1/10.mdx index e6fbd2a0e..cf8ea77f9 100644 --- a/chapters/ko/chapter1/10.mdx +++ b/chapters/ko/chapter1/10.mdx @@ -38,7 +38,7 @@ ```py from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -54,7 +54,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "사람, 기관, 장소 등을 나타내는 단어들을 반환합니다.", - explain: "이 뿐만 아니라, grouped_entities=True를 사용해 \"Hugging Face\"와 같이 같은 개체에 해당하는 단어들을 그룹화해줍니다.", + explain: "이 뿐만 아니라, aggregation_strategy=\"simple\"를 사용해 \"Hugging Face\"와 같이 같은 개체에 해당하는 단어들을 그룹화해줍니다.", correct: true } ]} diff --git a/chapters/ko/chapter1/3.mdx b/chapters/ko/chapter1/3.mdx index 2021d2850..26544932c 100644 --- a/chapters/ko/chapter1/3.mdx +++ b/chapters/ko/chapter1/3.mdx @@ -203,7 +203,7 @@ unmasker("This course will teach you all about models.", top_k=2) ```python from transformers import pipeline
-ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -216,7 +216,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") 모델이 정확하게 Sylvain을 사람(PER)으로, Hugging Face를 기관(ORG)으로, Brooklyn을 장소(LOC)으로 예측했네요! -파이프라인을 생성하는 함수에 `grouped_entities=True` 옵션을 전달하면 파이프라인이 같은 개체에 해당하는 문장 부분을 다시 그룹화합니다. 이 옵션을 설정하면 모델은 여러 단어로 구성된 단어임에도 “Hugging”과 “Face”를 하나의 기관으로 정확히 분류하게 됩니다. 다음 챕터에서도 확인하겠지만, 놀랍게도 전처리 과정에서 각 단어들은 더 작은 부분으로 쪼개집니다. 예를 들어 `Sylvain` 이라는 단어는 `S`, `##yl`, `##va`, `##in` 이렇게 네 조각으로 쪼개집니다. 후처리 단계에서 파이프라인은 이 조각들을 멋지게 재그룹화합니다. +파이프라인을 생성하는 함수에 `aggregation_strategy="simple"` 옵션을 전달하면 파이프라인이 같은 개체에 해당하는 문장 부분을 다시 그룹화합니다. 이 옵션을 설정하면 모델은 여러 단어로 구성된 단어임에도 “Hugging”과 “Face”를 하나의 기관으로 정확히 분류하게 됩니다. 다음 챕터에서도 확인하겠지만, 놀랍게도 전처리 과정에서 각 단어들은 더 작은 부분으로 쪼개집니다. 예를 들어 `Sylvain` 이라는 단어는 `S`, `##yl`, `##va`, `##in` 이렇게 네 조각으로 쪼개집니다. 후처리 단계에서 파이프라인은 이 조각들을 멋지게 재그룹화합니다. > [!TIP] > ✏️ **직접 해보기!** Model Hub에서 영어 품사 태깅(part-of-speech tagging, 줄여서 POS)이 가능한 모델을 찾아보세요. 이 모델이 위의 예시 문장으로 무엇을 예측하나요? 
diff --git a/chapters/my/chapter1/3.mdx b/chapters/my/chapter1/3.mdx index 401da476b..cc3c2af5d 100644 --- a/chapters/my/chapter1/3.mdx +++ b/chapters/my/chapter1/3.mdx @@ -222,7 +222,7 @@ Named Entity Recognition (NER) ဆိုတာက input text ထဲက ဘယ် ```python from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -235,7 +235,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ဒီနေရာမှာ မော်ဒယ်က Sylvain ဟာ လူ (PER) ဖြစ်ကြောင်း၊ Hugging Face က အဖွဲ့အစည်း (ORG) ဖြစ်ကြောင်း၊ Brooklyn က နေရာ (LOC) ဖြစ်ကြောင်း မှန်ကန်စွာ ဖော်ထုတ်ခဲ့ပါတယ်။ -ကျွန်တော်တို့ `grouped_entities=True option` ကို pipeline ဖန်တီးတဲ့ function မှာ ပေးလိုက်တာက စာကြောင်းရဲ့ အစိတ်အပိုင်းတွေကို တူညီတဲ့ entity နဲ့ ကိုက်ညီတဲ့ အစိတ်အပိုင်းတွေကို အတူတကွ ပြန်လည်စုစည်းဖို့ pipeline ကို ပြောတာပါ။ ဒီနေရာမှာ မော်ဒယ်က "Hugging" နဲ့ "Face" ကို စကားလုံးများစွာနဲ့ ဖွဲ့စည်းထားတဲ့ နာမည်ဖြစ်ပေမယ့် တစ်ခုတည်းသော အဖွဲ့အစည်းအဖြစ် မှန်ကန်စွာ စုစည်းခဲ့ပါတယ်။ တကယ်တော့ နောက်အခန်းမှာ ကျွန်တော်တို့ မြင်ရမှာဖြစ်သလို preprocessing က စကားလုံးအချို့ကို ပိုမိုသေးငယ်တဲ့ အစိတ်အပိုင်းတွေအဖြစ် ခွဲထုတ်တာတောင် လုပ်ပါတယ်။ ဥပမာ၊ `Sylvain` ကို `S`၊ `##yl`၊ `##va`၊ `##in` ဆိုပြီး လေးပိုင်းခွဲပါတယ်။ post-processing အဆင့်မှာတော့ pipeline က အဲဒီအပိုင်းတွေကို အောင်မြင်စွာ ပြန်လည်စုစည်းပေးပါတယ်။ +ကျွန်တော်တို့ `aggregation_strategy="simple"` option ကို pipeline ဖန်တီးတဲ့ function မှာ ပေးလိုက်တာက စာကြောင်းရဲ့ အစိတ်အပိုင်းတွေကို တူညီတဲ့ entity နဲ့ ကိုက်ညီတဲ့ အစိတ်အပိုင်းတွေကို အတူတကွ ပြန်လည်စုစည်းဖို့ pipeline ကို ပြောတာပါ။ ဒီနေရာမှာ မော်ဒယ်က "Hugging" နဲ့ "Face" ကို စကားလုံးများစွာနဲ့ ဖွဲ့စည်းထားတဲ့ နာမည်ဖြစ်ပေမယ့် တစ်ခုတည်းသော အဖွဲ့အစည်းအဖြစ် မှန်ကန်စွာ စုစည်းခဲ့ပါတယ်။ တကယ်တော့ နောက်အခန်းမှာ ကျွန်တော်တို့ မြင်ရမှာဖြစ်သလို preprocessing က စကားလုံးအချို့ကို ပိုမိုသေးငယ်တဲ့ အစိတ်အပိုင်းတွေအဖြစ် ခွဲထုတ်တာတောင် လုပ်ပါတယ်။ ဥပမာ၊ `Sylvain` ကို `S`၊ `##yl`၊ `##va`၊ `##in` ဆိုပြီး 
လေးပိုင်းခွဲပါတယ်။ post-processing အဆင့်မှာတော့ pipeline က အဲဒီအပိုင်းတွေကို အောင်မြင်စွာ ပြန်လည်စုစည်းပေးပါတယ်။ > [!TIP] > ✏️ **ကိုယ်တိုင် စမ်းကြည့်ပါဦး။** Model Hub မှာ အင်္ဂလိပ်ဘာသာစကားမှာ part-of-speech tagging (အတိုကောက်အားဖြင့် POS) လုပ်ဆောင်နိုင်တဲ့ မော်ဒယ်တစ်ခုကို ရှာဖွေပါ။ အထက်ပါ ဥပမာမှာပါတဲ့ စာကြောင်းအတွက် ဒီမော်ဒယ်က ဘာကို ခန့်မှန်းပေးမလဲ။ @@ -427,7 +427,7 @@ Transformer မော်ဒယ်တွေရဲ့ အစွမ်းထက် * **Mask Token**: `fill-mask` လုပ်ငန်းတာဝန်များတွင် စာသားထဲက ကွက်လပ်တစ်ခုကို ကိုယ်စားပြုသော အထူး token။ * **Named Entity Recognition (NER)**: စာသားတစ်ခုထဲက လူအမည်၊ နေရာအမည်၊ အဖွဲ့အစည်းအမည် စတဲ့ သီးခြားအမည်တွေကို ရှာဖွေဖော်ထုတ်ခြင်း။ * **Entities**: အချက်အလက်များ သို့မဟုတ် အရာဝတ္ထုများ (ဥပမာ- လူ၊ နေရာ၊ အဖွဲ့အစည်း)။ -* **`grouped_entities`**: NER pipeline တွင် အတူတူရှိသော entity အပိုင်းအစများကို တစ်ခုတည်းအဖြစ် စုစည်းရန် အသုံးပြုသော option။ +* **`aggregation_strategy`**: NER pipeline တွင် အတူတူရှိသော entity အပိုင်းအစများကို တစ်ခုတည်းအဖြစ် စုစည်းရန် အသုံးပြုသော option။ * **Part-of-speech Tagging (POS)**: စာကြောင်းတစ်ခုရှိ စကားလုံးတစ်လုံးစီ၏ သဒ္ဒါဆိုင်ရာ အစိတ်အပိုင်း (ဥပမာ- နာမ်၊ ကြိယာ၊ နာမဝိသေသန) ကို ခွဲခြားသတ်မှတ်ခြင်း။ * **Question Answering**: မေးခွန်းတစ်ခုနဲ့ ပေးထားတဲ့ အကြောင်းအရာတစ်ခုကနေ အချက်အလက်တွေကို ထုတ်ယူပြီး အဖြေရှာတဲ့ လုပ်ငန်းတာဝန်။ * **`min_length`**: text generation သို့မဟုတ် summarization pipeline တွင် ထွက်ပေါ်လာမည့် output text ၏ အတိုဆုံး ဖြစ်နိုင်သော အရှည်ကို သတ်မှတ်ရန် အသုံးပြုသော argument။ diff --git a/chapters/my/chapter1/7.mdx b/chapters/my/chapter1/7.mdx index c51de1dd1..bb4be8c49 100644 --- a/chapters/my/chapter1/7.mdx +++ b/chapters/my/chapter1/7.mdx @@ -36,7 +36,7 @@ ```py from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -52,7 +52,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "၎င်းသည် လူပုဂ္ဂိုလ်များ၊ အဖွဲ့အစည်းများ သို့မဟုတ် နေရာများကို ကိုယ်စားပြုသည့် 
စကားလုံးများကို ပြန်ပေးပါလိမ့်မည်။", - explain: "ထို့အပြင် `grouped_entities=True` ကို အသုံးပြုထားသောကြောင့် ၎င်းသည် 'Hugging Face' ကဲ့သို့သော တူညီသည့် entity နှင့် သက်ဆိုင်သည့် စကားလုံးများကို အုပ်စုဖွဲ့ပေးပါလိမ့်မည်။", + explain: "ထို့အပြင် `aggregation_strategy=\"simple\"` ကို အသုံးပြုထားသောကြောင့် ၎င်းသည် 'Hugging Face' ကဲ့သို့သော တူညီသည့် entity နှင့် သက်ဆိုင်သည့် စကားလုံးများကို အုပ်စုဖွဲ့ပေးပါလိမ့်မည်။", correct: true } ]} @@ -267,7 +267,7 @@ result = classifier("This is a course about the Transformers library") * **Text Generation**: AI မော်ဒယ်များကို အသုံးပြု၍ လူသားကဲ့သို့သော စာသားအသစ်များ ဖန်တီးခြင်း။ * **`pipeline()` function**: Hugging Face Transformers library မှာ ပါဝင်တဲ့ လုပ်ဆောင်ချက်တစ်ခုဖြစ်ပြီး မော်ဒယ်တွေကို သီးခြားလုပ်ငန်းတာဝန်များ (ဥပမာ- စာသားခွဲခြားသတ်မှတ်ခြင်း၊ စာသားထုတ်လုပ်ခြင်း) အတွက် အသုံးပြုရလွယ်ကူအောင် ပြုလုပ်ပေးပါတယ်။ * **`ner` (Named Entity Recognition)**: စာသားထဲက လူအမည်၊ နေရာအမည်၊ အဖွဲ့အစည်းအမည် စတဲ့ သီးခြားအမည်တွေကို ရှာဖွေဖော်ထုတ်ခြင်း။ -* **`grouped_entities=True`**: `ner` pipeline တွင် အသုံးပြုသည့် parameter တစ်ခုဖြစ်ပြီး တူညီသော entity နှင့် သက်ဆိုင်သည့် စကားလုံးများကို အုပ်စုဖွဲ့ပေးသည်။ +* **`aggregation_strategy="simple"`**: `ner` pipeline တွင် အသုံးပြုသည့် parameter တစ်ခုဖြစ်ပြီး တူညီသော entity နှင့် သက်ဆိုင်သည့် စကားလုံးများကို အုပ်စုဖွဲ့ပေးသည်။ * **`sentiment-analysis` pipeline**: စာသားတစ်ခု၏ စိတ်ခံစားမှု (အပြုသဘော၊ အနုတ်သဘော) ကို ခွဲခြမ်းစိတ်ဖြာရန် အသုံးပြုသော pipeline။ * **`text-generation` pipeline**: input prompt အပေါ် အခြေခံ၍ စာသားအသစ်များကို ဖန်တီးရန် အသုံးပြုသော pipeline။ * **`fill-mask` pipeline**: စာသားတစ်ခုရှိ ဝှက်ထားသော စကားလုံးများ (mask tokens) ကို ဖြည့်ဆည်းပေးရန် အသုံးပြုသော pipeline။ diff --git a/chapters/pt/chapter1/10.mdx b/chapters/pt/chapter1/10.mdx index 7c488f9e9..9eaa97ad4 100644 --- a/chapters/pt/chapter1/10.mdx +++ b/chapters/pt/chapter1/10.mdx @@ -36,7 +36,7 @@ Primeiro, porém, vamos testar o que você aprendeu neste capítulo!
```py from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -52,7 +52,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "Ele retornará as palavras que representam pessoas, organizações ou locais.", - explain: "Além disso, com `grouped_entities=True`, ele agrupará as palavras pertencentes à mesma entidade, como 'Hugging Face'.", + explain: "Além disso, com `aggregation_strategy=\"simple\"`, ele agrupará as palavras pertencentes à mesma entidade, como 'Hugging Face'.", correct: true } ]} diff --git a/chapters/pt/chapter1/3.mdx b/chapters/pt/chapter1/3.mdx index f1f4ec49f..7d7c92666 100644 --- a/chapters/pt/chapter1/3.mdx +++ b/chapters/pt/chapter1/3.mdx @@ -205,7 +205,7 @@ Reconhecimento de Entidades Nomeadas (NER) é uma tarefa onde o modelo tem de ac ```python from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -218,7 +218,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") Aqui o modelo corretamente identificou que Sylvain é uma pessoa (PER), Hugging Face é uma organização (ORG), e Brooklyn é um local (LOC). -Nós passamos a opção `grouped_entities=True` na criação da função do pipelina para dize-lo para reagrupar juntos as partes da sentença que correspondem à mesma entidade: aqui o modelo agrupou corretamente "Hugging" e "Face" como única organização, ainda que o mesmo nome consista em múltiplas palavras. Na verdade, como veremos no próximo capítulo, o pré-processamento até mesmo divide algumas palavras em partes menores. Por exemplo, `Sylvain` é dividido em 4 pedaços: `S`, `##yl`, `##va`, e `##in`. No passo de pós-processamento, o pipeline satisfatoriamente reagrupa esses pedaços.
+Nós passamos a opção `aggregation_strategy="simple"` na criação da função do pipelina para dize-lo para reagrupar juntos as partes da sentença que correspondem à mesma entidade: aqui o modelo agrupou corretamente "Hugging" e "Face" como única organização, ainda que o mesmo nome consista em múltiplas palavras. Na verdade, como veremos no próximo capítulo, o pré-processamento até mesmo divide algumas palavras em partes menores. Por exemplo, `Sylvain` é dividido em 4 pedaços: `S`, `##yl`, `##va`, e `##in`. No passo de pós-processamento, o pipeline satisfatoriamente reagrupa esses pedaços. > [!TIP] > ✏️ **Experimente!** Procure no Model Hub por um modelo capaz de fazer o tageamento de partes do discurso (usualmente abreviado como POS) em inglês. O que o modelo prediz para a sentença no exemplo acima? diff --git a/chapters/ro/chapter1/10.mdx b/chapters/ro/chapter1/10.mdx index 4a5ff081f..f71b73941 100644 --- a/chapters/ro/chapter1/10.mdx +++ b/chapters/ro/chapter1/10.mdx @@ -38,7 +38,7 @@ Mai întâi, însă, să testăm ceea ce ați învățat în acest capitol! 
```py from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -54,7 +54,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "Va returna cuvintele care reprezintă persoane, organizații sau locații.", - explain: "În plus, cu grouped_entities=True, va grupa împreună cuvintele care aparțin aceleiași entități, precum \"Hugging Face\".", + explain: "În plus, cu aggregation_strategy="simple", va grupa împreună cuvintele care aparțin aceleiași entități, precum \"Hugging Face\".", correct: true } ]} diff --git a/chapters/ro/chapter1/3.mdx b/chapters/ro/chapter1/3.mdx index 9a5d63d3d..84b7bf4a5 100644 --- a/chapters/ro/chapter1/3.mdx +++ b/chapters/ro/chapter1/3.mdx @@ -202,7 +202,7 @@ Named Entity Recognition (NER) este o sarcină în care modelul trebuie să găs ```python from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -215,7 +215,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") Aici, modelul a identificat corect că Sylvain este o persoană (PER), Hugging Face o organizație (ORG), iar Brooklyn o locație (LOC). -Trecem opțiunea `grouped_entities=True` în funcția de creare a pipeline-ului pentru a-i spune pipeline-ului să regrupeze părțile propoziției care corespund aceleiași entități: aici, modelul a grupat corect „Hugging” și „Face” ca o singură organizație, chiar dacă numele este format din mai multe cuvinte. De fapt, după cum vom vedea în capitolul următor, preprocesarea chiar împarte unele cuvinte în părți mai mici. De exemplu, `Sylvain` este împărțit în patru părți: `S`, `##yl`, `##va`, și `##in`. În etapa de postprocesare, pipeline-ul a reușit să regrupeze aceste părți. 
+Trecem opțiunea `aggregation_strategy="simple"` în funcția de creare a pipeline-ului pentru a-i spune pipeline-ului să regrupeze părțile propoziției care corespund aceleiași entități: aici, modelul a grupat corect „Hugging” și „Face” ca o singură organizație, chiar dacă numele este format din mai multe cuvinte. De fapt, după cum vom vedea în capitolul următor, preprocesarea chiar împarte unele cuvinte în părți mai mici. De exemplu, `Sylvain` este împărțit în patru părți: `S`, `##yl`, `##va`, și `##in`. În etapa de postprocesare, pipeline-ul a reușit să regrupeze aceste părți. > [!TIP] > ✏️ **Încercați!** Căutați în Hub-ul de modele un model capabil să facă etichetarea părții de vorbire (de obicei abreviată ca POS) în limba engleză. Ce prezice acest model pentru propoziția din exemplul de mai sus? diff --git a/chapters/ru/chapter1/10.mdx b/chapters/ru/chapter1/10.mdx index 1487786bd..f8dfbc828 100644 --- a/chapters/ru/chapter1/10.mdx +++ b/chapters/ru/chapter1/10.mdx @@ -38,7 +38,7 @@ ```py from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -54,7 +54,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "Пайплайн вернет слова, обозначающие персон, организаций или географических локаций.", - explain: "Кроме того, с аргументом grouped_entities=True, пайплайн сгруппирует слова, принадлежащие одной и той же сущности, например, \"Hugging Face\".", + explain: "Кроме того, с аргументом aggregation_strategy="simple", пайплайн сгруппирует слова, принадлежащие одной и той же сущности, например, \"Hugging Face\".", correct: true } ]} diff --git a/chapters/ru/chapter1/3.mdx b/chapters/ru/chapter1/3.mdx index 8b58d7993..535a4a3af 100644 --- a/chapters/ru/chapter1/3.mdx +++ b/chapters/ru/chapter1/3.mdx @@ -209,7 +209,7 @@ unmasker("This course will teach you all about models.", top_k=2) ```python 
from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -222,7 +222,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") В этом примере модель корректно обозначила Sylvain как персону (PER), Hugging Face как организацию (ORG) и Brooklyn как локацию (LOC). -Мы передали в пайплайн аргумент `grouped_entities=True` для того, чтобы модель сгруппировала части предложения, соответствующие одной сущности: в данном случае модель объединила "Hugging" и "Face" несмотря на то, что название организации состоит из двух слов. На самом деле, как мы увидим в следующей главе, препроцессинг делит даже отдельные слова на несколько частей. Например, `Sylvain` будет разделено на 4 части: `S`, `##yl`, `##va`, and `##in`. На этапе постпроцессинга пайплайн успешно объединит эти части. +Мы передали в пайплайн аргумент `aggregation_strategy="simple"` для того, чтобы модель сгруппировала части предложения, соответствующие одной сущности: в данном случае модель объединила "Hugging" и "Face" несмотря на то, что название организации состоит из двух слов. На самом деле, как мы увидим в следующей главе, препроцессинг делит даже отдельные слова на несколько частей. Например, `Sylvain` будет разделено на 4 части: `S`, `##yl`, `##va`, and `##in`. На этапе постпроцессинга пайплайн успешно объединит эти части. > [!TIP] > ✏️ **Попробуйте!** Найдите на Model Hub модель, позволяющую решать задачу определения частей речи в предложении (part of speech tagging, POS). Что модель предскажет для предложения из примера выше? 
diff --git a/chapters/te/chapter1/3.mdx b/chapters/te/chapter1/3.mdx index 3dc3f04a9..7815bb800 100644 --- a/chapters/te/chapter1/3.mdx +++ b/chapters/te/chapter1/3.mdx @@ -232,7 +232,7 @@ unmasker("This course will teach you all about models.", top_k=2) ```python from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -245,7 +245,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ఇక్కడ మోడల్ సిల్వైన్ ఒక వ్యక్తి (PER), Hugging Face ఒక సంస్థ (ORG), మరియు బ్రూక్లిన్ ఒక ప్రదేశం (LOC) అని సరిగ్గా గుర్తించింది. -వాక్యంలోని ఒకే ఎంటిటీకి సంబంధించిన భాగాలను తిరిగి సమూహపరచమని పైప్‌లైన్‌కు చెప్పడానికి మేము పైప్‌లైన్ సృష్టి ఫంక్షన్‌లో `grouped_entities=True` ఎంపికను పాస్ చేస్తాము: ఇక్కడ మోడల్ "Hugging" మరియు "Face" ను ఒకే సంస్థగా సరిగ్గా సమూహపరిచింది, పేరు అనేక పదాలతో ఉన్నప్పటికీ. నిజానికి, మనం తదుపరి అధ్యాయంలో చూస్తాము, ప్రిప్రాసెసింగ్ కొన్ని పదాలను చిన్న భాగాలుగా కూడా విభజిస్తుంది. ఉదాహరణకు, `Sylvain` ను నాలుగు ముక్కలుగా విభజించారు: `S`, `##yl`, `##va`, మరియు `##in`. పోస్ట్-ప్రాసెసింగ్ దశలో, పైప్‌లైన్ ఆ ముక్కలను విజయవంతంగా తిరిగి సమూహపరిచింది. +వాక్యంలోని ఒకే ఎంటిటీకి సంబంధించిన భాగాలను తిరిగి సమూహపరచమని పైప్‌లైన్‌కు చెప్పడానికి మేము పైప్‌లైన్ సృష్టి ఫంక్షన్‌లో `aggregation_strategy="simple"` ఎంపికను పాస్ చేస్తాము: ఇక్కడ మోడల్ "Hugging" మరియు "Face" ను ఒకే సంస్థగా సరిగ్గా సమూహపరిచింది, పేరు అనేక పదాలతో ఉన్నప్పటికీ. నిజానికి, మనం తదుపరి అధ్యాయంలో చూస్తాము, ప్రిప్రాసెసింగ్ కొన్ని పదాలను చిన్న భాగాలుగా కూడా విభజిస్తుంది. ఉదాహరణకు, `Sylvain` ను నాలుగు ముక్కలుగా విభజించారు: `S`, `##yl`, `##va`, మరియు `##in`. పోస్ట్-ప్రాసెసింగ్ దశలో, పైప్‌లైన్ ఆ ముక్కలను విజయవంతంగా తిరిగి సమూహపరిచింది. > [!TIP] > ✏️ **ప్రయత్నించి చూడండి!** ఇంగ్లీషులో పార్ట్-ఆఫ్-స్పీచ్ ట్యాగింగ్ (సాధారణంగా POS అని సంక్షిప్తం) చేయగల మోడల్ కోసం మోడల్ హబ్‌ను శోధించండి. పై ఉదాహరణలోని వాక్యానికి ఈ మోడల్ ఏమి అంచనా వేస్తుంది? 
diff --git a/chapters/te/chapter1/7.mdx b/chapters/te/chapter1/7.mdx index 087f18d7b..8279c5ad7 100644 --- a/chapters/te/chapter1/7.mdx +++ b/chapters/te/chapter1/7.mdx @@ -42,7 +42,7 @@ ```py from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -58,7 +58,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "ఇది వ్యక్తులు, సంస్థలు లేదా ప్రదేశాలను సూచించే పదాలను తిరిగి ఇస్తుంది.", - explain: "అంతేకాకుండా, grouped_entities=True తో, ఇది \"Hugging Face\" వంటి ఒకే ఎంటిటీకి చెందిన పదాలను సమూహపరుస్తుంది.", + explain: "అంతేకాకుండా, aggregation_strategy="simple" తో, ఇది \"Hugging Face\" వంటి ఒకే ఎంటిటీకి చెందిన పదాలను సమూహపరుస్తుంది.", correct: true } ]} diff --git a/chapters/th/chapter1/10.mdx b/chapters/th/chapter1/10.mdx index c66822858..5ad9c8d21 100644 --- a/chapters/th/chapter1/10.mdx +++ b/chapters/th/chapter1/10.mdx @@ -37,7 +37,7 @@ ```py from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -53,7 +53,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "ได้ผลออกมาระบุว่าคำใดเป็นบุคคล, องค์กร, หรือสถานที่", - explain: "หากตั้งค่าว่า grouped_entities=True จะสามารถรวมคำหลายคำที่ระบุสิ่งเดียวกันไว้ได้ เช่น \"Hugging Face\" ประกอบด้วยคำสองคำ แต่ระบุถึงสิ่งสิ่งเดียว", + explain: "หากตั้งค่าว่า aggregation_strategy="simple" จะสามารถรวมคำหลายคำที่ระบุสิ่งเดียวกันไว้ได้ เช่น \"Hugging Face\" ประกอบด้วยคำสองคำ แต่ระบุถึงสิ่งสิ่งเดียว", correct: true } ]} diff --git a/chapters/th/chapter1/3.mdx b/chapters/th/chapter1/3.mdx index 5a11949de..94521e9b0 100644 --- a/chapters/th/chapter1/3.mdx +++ b/chapters/th/chapter1/3.mdx @@ -204,7 +204,7 @@ argument `top_k` ควบคุมจำนวนข้อความที่ ```python from transformers import 
pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -217,7 +217,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ในส่วนนี้ โมเดลสามารถระบุได้ว่า Sylvian เป็นชื่อคน (PER) Hugging Face เป็นชื่อหน่วยงาน (ORG), และ Brooklyn เป็นชื่อสถานที่ (LOC) -เราเพิ่มตัวเลือก `grouped_entities=True` ตอนสร้างฟังก์ชัน pipeline เพื่อระบุให้ pipeline จับกลุ่มคำที่เป็นการระบุชื่อเฉพาะของสิ่ง ๆ เดียว ในที่นี้ โมเดลจับกลุ่มคำว่า "Hugging" และ "Face" เข้าไปเป็นชื่อองค์กรองค์กรเดียว แม้ว่าจะเป็นการรวมคำหลายคำเข้าด้วยกันก็ตาม ซึ่งจริง ๆ แล้ว ในบทต่อไปเราจะเห็นว่าการประมวลผลนั้นจะแบ่งคำบางคำออกมาเป็นส่วนที่แยกย่อยลงไปอีก ตัวอย่างเช่น คำว่า `Sylvian` ถูกแบ่งออกเป็น 4 ส่วน ได้แก่ `S`, `##yl`, `##va`, และ `##in` และระหว่างการ post-processing ตัว pipeline ก็จะนำแต่ละส่วนนี้มาประกอบเข้าด้วยกัน +เราเพิ่มตัวเลือก `aggregation_strategy="simple"` ตอนสร้างฟังก์ชัน pipeline เพื่อระบุให้ pipeline จับกลุ่มคำที่เป็นการระบุชื่อเฉพาะของสิ่ง ๆ เดียว ในที่นี้ โมเดลจับกลุ่มคำว่า "Hugging" และ "Face" เข้าไปเป็นชื่อองค์กรองค์กรเดียว แม้ว่าจะเป็นการรวมคำหลายคำเข้าด้วยกันก็ตาม ซึ่งจริง ๆ แล้ว ในบทต่อไปเราจะเห็นว่าการประมวลผลนั้นจะแบ่งคำบางคำออกมาเป็นส่วนที่แยกย่อยลงไปอีก ตัวอย่างเช่น คำว่า `Sylvian` ถูกแบ่งออกเป็น 4 ส่วน ได้แก่ `S`, `##yl`, `##va`, และ `##in` และระหว่างการ post-processing ตัว pipeline ก็จะนำแต่ละส่วนนี้มาประกอบเข้าด้วยกัน > [!TIP] > ✏️ **ลองเลย!** หาโมเดลใน Model Hub ที่ทำงานเกี่ยวกับการระบุชื่อเฉพาะ(หรือเรียกว่า part-of-speech tagging ย่อว่า POS)ในภาษาอังกฤษ รู้มั้ยว่าโมเดลนี้ทำนายอะไรในตัวอย่างประโยคข้างต้น? 
diff --git a/chapters/vi/chapter1/10.mdx b/chapters/vi/chapter1/10.mdx index cb5f871a8..7dccf694d 100644 --- a/chapters/vi/chapter1/10.mdx +++ b/chapters/vi/chapter1/10.mdx @@ -39,7 +39,7 @@ Tuy nhiên, trước tiên, hãy kiểm tra những gì bạn đã học đượ ```py from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -58,7 +58,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") { text: "Nó sẽ trả về các từ đại diện cho người, tổ chức hoặc địa điểm.", explain: - 'Hơn nữa, với grouped_entities=True, nó sẽ nhóm các từ thuộc cùng một thực thể lại với nhau, ví dụ như "Hugging Face".', + 'Hơn nữa, với aggregation_strategy="simple", nó sẽ nhóm các từ thuộc cùng một thực thể lại với nhau, ví dụ như "Hugging Face".', correct: true, }, ]} diff --git a/chapters/vi/chapter1/3.mdx b/chapters/vi/chapter1/3.mdx index eb0106c69..4bcebbfa0 100644 --- a/chapters/vi/chapter1/3.mdx +++ b/chapters/vi/chapter1/3.mdx @@ -202,7 +202,7 @@ Nhận dạng thực thể được đặt tên (NER) là một tác vụ trong ```python from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -215,7 +215,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") Ở đây, mô hình đã xác định chính xác rằng Sylvain là một người (PER), Hugging Face là một tổ chức (ORG) và Brooklyn là một địa điểm (LOC). -Chúng ta truyền `grouped_entities = True` vào trong hàm pipeline để yêu cầu pipeline nhóm lại các phần thuộc cùng một thực thể trong câu với nhau: ở đây mô hình đã nhóm chính xác "Hugging" và "Face" thành một tổ chức duy nhất, mặc dù tên bao gồm nhiều từ. Trên thực tế, như chúng ta sẽ thấy trong chương tiếp theo, quá trình tiền xử lý thậm chí còn chia một số từ thành các phần nhỏ hơn. 
Ví dụ: `Sylvain` được chia thành bốn phần: `S`, `##yl`, `##va`, và `##in`. Trong bước hậu xử lý, pipeline đã tập hợp lại thành công các phần đó. +Chúng ta truyền `aggregation_strategy="simple"` vào trong hàm pipeline để yêu cầu pipeline nhóm lại các phần thuộc cùng một thực thể trong câu với nhau: ở đây mô hình đã nhóm chính xác "Hugging" và "Face" thành một tổ chức duy nhất, mặc dù tên bao gồm nhiều từ. Trên thực tế, như chúng ta sẽ thấy trong chương tiếp theo, quá trình tiền xử lý thậm chí còn chia một số từ thành các phần nhỏ hơn. Ví dụ: `Sylvain` được chia thành bốn phần: `S`, `##yl`, `##va`, và `##in`. Trong bước hậu xử lý, pipeline đã tập hợp lại thành công các phần đó. > [!TIP] > ✏️ **Thử nghiệm thôi!** Tìm kiếm trên Model Hub để tìm một mô hình có thể thực hiện gán nhãn từ loại (thường được viết tắt là POS) bằng tiếng Anh. Mô hình này dự đoán điều gì cho câu trong ví dụ trên? diff --git a/chapters/zh-CN/chapter1/10.mdx b/chapters/zh-CN/chapter1/10.mdx index 2900c46d2..4bff82cdf 100644 --- a/chapters/zh-CN/chapter1/10.mdx +++ b/chapters/zh-CN/chapter1/10.mdx @@ -36,7 +36,7 @@ ```py from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -52,7 +52,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "它找出代表人员、组织或位置的单词。", - explain: "正解! 此外,使用 grouped_entities=True,可以将属于同一实体的单词组合在一起,例如“Hugging Face”。", + explain: "正解! 
此外,使用 aggregation_strategy="simple",可以将属于同一实体的单词组合在一起,例如“Hugging Face”。", correct: true } ]} diff --git a/chapters/zh-CN/chapter1/3.mdx b/chapters/zh-CN/chapter1/3.mdx index 032430bfa..028191c31 100644 --- a/chapters/zh-CN/chapter1/3.mdx +++ b/chapters/zh-CN/chapter1/3.mdx @@ -193,7 +193,7 @@ unmasker("This course will teach you all about models.", top_k=2) ```python from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` ```python out @@ -205,7 +205,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") 在这里,模型正确地识别出 Sylvain 是一个人 (PER),Hugging Face 是一个组织 (ORG),而布鲁克林是一个位置 (LOC)。 -我们在创建 pipeline 的函数中传递的 `grouped_entities=True` 参数告诉 pipeline 将与同一实体对应的句子部分重新分组:这里模型正确地将“Hugging”和“Face”分组为一个组织,即使名称由多个词组成。事实上,正如我们即将在下一章看到的,预处理甚至会将一些单词分成更小的部分。例如, `Sylvain` 分割为了四部分: `S、##yl、##va` 和 `##in` 。在后处理步骤中,pipeline 成功地重新组合了这些部分。 +我们在创建 pipeline 的函数中传递的 `aggregation_strategy="simple"` 参数告诉 pipeline 将与同一实体对应的句子部分重新分组:这里模型正确地将“Hugging”和“Face”分组为一个组织,即使名称由多个词组成。事实上,正如我们即将在下一章看到的,预处理甚至会将一些单词分成更小的部分。例如, `Sylvain` 分割为了四部分: `S、##yl、##va` 和 `##in` 。在后处理步骤中,pipeline 成功地重新组合了这些部分。 > [!TIP] > ✏️**快来试试吧!**在模型中心(hub)搜索能够用英语进行词性标注(通常缩写为 POS)的模型。对于上面示例中的句子,这个词性标注的模型预测了什么? 
diff --git a/chapters/zh-TW/chapter1/10.mdx b/chapters/zh-TW/chapter1/10.mdx index cea2082cd..68f7abfec 100644 --- a/chapters/zh-TW/chapter1/10.mdx +++ b/chapters/zh-TW/chapter1/10.mdx @@ -37,7 +37,7 @@ ```py from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` @@ -53,7 +53,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "它將返回代表人員、組織或位置的單詞。", - explain: "此外,使用 grouped_entities=True,它會將屬於同一實體的單詞組合在一起,例如“Hugging Face”。", + explain: "此外,使用 aggregation_strategy="simple",它會將屬於同一實體的單詞組合在一起,例如“Hugging Face”。", correct: true } ]} diff --git a/chapters/zh-TW/chapter1/3.mdx b/chapters/zh-TW/chapter1/3.mdx index 513f7dfd5..0a72a9902 100644 --- a/chapters/zh-TW/chapter1/3.mdx +++ b/chapters/zh-TW/chapter1/3.mdx @@ -180,7 +180,7 @@ unmasker("This course will teach you all about models.", top_k=2) ```python from transformers import pipeline -ner = pipeline("ner", grouped_entities=True) +ner = pipeline("ner", aggregation_strategy="simple") ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` ```python out @@ -191,7 +191,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") ``` 在這裡,模型正確地識別出 Sylvain 是一個人 (PER),Hugging Face 是一個組織 (ORG),而布魯克林是一個位置 (LOC)。 -我們在pipeline創建函數中傳遞選項 **grouped_entities=True** 以告訴pipeline將對應於同一實體的句子部分重新組合在一起:這裡模型正確地將「Hugging」和「Face」分組為一個組織,即使名稱由多個詞組成。事實上,正如我們即將在下一章看到的,預處理甚至會將一些單詞分成更小的部分。例如,**Sylvain** 分割為了四部分:**S、##yl、##va** 和 **##in**。在後處理步驟中,pipeline成功地重新組合了這些部分。 +我們在pipeline創建函數中傳遞選項 **aggregation_strategy="simple"** 以告訴pipeline將對應於同一實體的句子部分重新組合在一起:這裡模型正確地將「Hugging」和「Face」分組為一個組織,即使名稱由多個詞組成。事實上,正如我們即將在下一章看到的,預處理甚至會將一些單詞分成更小的部分。例如,**Sylvain** 分割為了四部分:**S、##yl、##va** 和 **##in**。在後處理步驟中,pipeline成功地重新組合了這些部分。 > [!TIP] > ✏️**快來試試吧!** 在模型中心(hub)搜索能夠用英語進行詞性標注(通常縮寫為 POS)的模型。這個模型對上面例子中的句子預測了什麼? 
diff --git a/subtitles/en/raw/chapter1/03_the-pipeline-function.md b/subtitles/en/raw/chapter1/03_the-pipeline-function.md index 85602a47b..ac4bc4995 100644 --- a/subtitles/en/raw/chapter1/03_the-pipeline-function.md +++ b/subtitles/en/raw/chapter1/03_the-pipeline-function.md @@ -1 +1 @@ -The pipeline function. The pipeline function is the most high-level API of the Transformers library. It regroups together all the steps to go from raw texts to usable predictions. The model used is at the core of a pipeline, but the pipeline also include all the necessary pre-processing (since the model does not expect texts, but numbers) as well as some post-processing to make the output of the model human-readable. Let's look at a first example with the sentiment analysis pipeline. This pipeline performs text classification on a given input, and determines if it's positive or negative. Here, it attributed the positive label on the given text, with a confidence of 95%. You can pass multiple texts to the same pipeline, which will be processed and passed through the model together, as a batch. The output is a list of individual results, in the same order as the input texts. Here we find the same label and score for the first text, and the second text is judged positive with a confidence of 99.99%. The zero-shot classification pipeline is a more general text-classification pipeline: it allows you to provide the labels you want. Here we want to classify our input text along the labels "education", "politics" and "business". The pipeline successfully recognizes it's more about education than the other labels, with a confidence of 84%. Moving on to other tasks, the text generation pipeline will auto-complete a given prompt. The output is generated with a bit of randomness, so it changes each time you call the generator object on a given prompt. 
Up until now, we have used the pipeline API with the default model associated to each task, but you can use it with any model that has been pretrained or fine-tuned on this task. Going on the model hub (huggingface.co/models), you can filter the available models by task. The default model used in our previous example was gpt2, but there are many more models available, and not just in English! Let's go back to the text generation pipeline and load it with another model, distilgpt2. This is a lighter version of gpt2 created by the Hugging Face team. When applying the pipeline to a given prompt, we can specify several arguments, such as the maximum length of the generated texts, or the number of sentences we want to return (since there is some randomness in the generation). Generating text by guessing the next word in a sentence was the pretraining objective of GPT-2, the fill mask pipeline is the pretraining objective of BERT, which is to guess the value of masked word. In this case, we ask the two most likely values for the missing words (according to the model) and get mathematical or computational as possible answers. Another task Transformers model can perform is to classify each word in the sentence instead of the sentence as a whole. One example of this is Named Entity Recognition, which is the task of identifying entities, such as persons, organizations or locations in a sentence. Here, the model correctly finds the person (Sylvain), the organization (Hugging Face) as well as the location (Brooklyn) inside the input text. The grouped_entities=True argument used is to make the pipeline group together the different words linked to the same entity (such as Hugging and Face here). Another task available with the pipeline API is extractive question answering. Providing a context and a question, the model will identify the span of text in the context containing the answer to the question. 
Getting short summaries of very long articles is also something the Transformers library can help with, with the summarization pipeline. Finally, the last task supported by the pipeline API is translation. Here we use a French/English model found on the model hub to get the English version of our input text. Here is a brief summary of all the tasks we looked into in this video. Try then out through the inference widgets in the model hub! \ No newline at end of file +The pipeline function. The pipeline function is the most high-level API of the Transformers library. It regroups together all the steps to go from raw texts to usable predictions. The model used is at the core of a pipeline, but the pipeline also include all the necessary pre-processing (since the model does not expect texts, but numbers) as well as some post-processing to make the output of the model human-readable. Let's look at a first example with the sentiment analysis pipeline. This pipeline performs text classification on a given input, and determines if it's positive or negative. Here, it attributed the positive label on the given text, with a confidence of 95%. You can pass multiple texts to the same pipeline, which will be processed and passed through the model together, as a batch. The output is a list of individual results, in the same order as the input texts. Here we find the same label and score for the first text, and the second text is judged positive with a confidence of 99.99%. The zero-shot classification pipeline is a more general text-classification pipeline: it allows you to provide the labels you want. Here we want to classify our input text along the labels "education", "politics" and "business". The pipeline successfully recognizes it's more about education than the other labels, with a confidence of 84%. Moving on to other tasks, the text generation pipeline will auto-complete a given prompt. 
The output is generated with a bit of randomness, so it changes each time you call the generator object on a given prompt. Up until now, we have used the pipeline API with the default model associated to each task, but you can use it with any model that has been pretrained or fine-tuned on this task. Going on the model hub (huggingface.co/models), you can filter the available models by task. The default model used in our previous example was gpt2, but there are many more models available, and not just in English! Let's go back to the text generation pipeline and load it with another model, distilgpt2. This is a lighter version of gpt2 created by the Hugging Face team. When applying the pipeline to a given prompt, we can specify several arguments, such as the maximum length of the generated texts, or the number of sentences we want to return (since there is some randomness in the generation). Generating text by guessing the next word in a sentence was the pretraining objective of GPT-2, the fill mask pipeline is the pretraining objective of BERT, which is to guess the value of masked word. In this case, we ask the two most likely values for the missing words (according to the model) and get mathematical or computational as possible answers. Another task Transformers model can perform is to classify each word in the sentence instead of the sentence as a whole. One example of this is Named Entity Recognition, which is the task of identifying entities, such as persons, organizations or locations in a sentence. Here, the model correctly finds the person (Sylvain), the organization (Hugging Face) as well as the location (Brooklyn) inside the input text. The aggregation_strategy="simple" argument used is to make the pipeline group together the different words linked to the same entity (such as Hugging and Face here). Another task available with the pipeline API is extractive question answering. 
Providing a context and a question, the model will identify the span of text in the context containing the answer to the question. Getting short summaries of very long articles is also something the Transformers library can help with, with the summarization pipeline. Finally, the last task supported by the pipeline API is translation. Here we use a French/English model found on the model hub to get the English version of our input text. Here is a brief summary of all the tasks we looked into in this video. Try then out through the inference widgets in the model hub! \ No newline at end of file From 40b05a8d5e2a7abbf03dfeb3f02da3c5e4f8b2e1 Mon Sep 17 00:00:00 2001 From: Dhruvil Darji Date: Mon, 23 Feb 2026 15:12:59 -0800 Subject: [PATCH 2/3] fix: escape double quotes in aggregation_strategy in quiz explain strings --- chapters/de/chapter1/10.mdx | 2 +- chapters/es/chapter1/10.mdx | 2 +- chapters/fr/chapter1/10.mdx | 2 +- chapters/hi/chapter1/10.mdx | 2 +- chapters/it/chapter1/10.mdx | 2 +- chapters/ja/chapter1/10.mdx | 2 +- chapters/ko/chapter1/10.mdx | 2 +- chapters/ro/chapter1/10.mdx | 2 +- chapters/ru/chapter1/10.mdx | 2 +- chapters/th/chapter1/10.mdx | 2 +- chapters/zh-CN/chapter1/10.mdx | 2 +- chapters/zh-TW/chapter1/10.mdx | 2 +- 12 files changed, 12 insertions(+), 12 deletions(-) diff --git a/chapters/de/chapter1/10.mdx b/chapters/de/chapter1/10.mdx index 9841da200..c6102e63e 100644 --- a/chapters/de/chapter1/10.mdx +++ b/chapters/de/chapter1/10.mdx @@ -55,7 +55,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "Er gibt Begriffe zurück, die für Personen, Organisationen oder Orte stehen.", - explain: "Außerdem werden mit aggregation_strategy="simple" die Wörter, die zur selben Entität gehören, gruppiert, wie z. B. \"Hugging Face\".", + explain: "Außerdem werden mit aggregation_strategy=\"simple\" die Wörter, die zur selben Entität gehören, gruppiert, wie z. B.
\"Hugging Face\".", correct: true } ]} diff --git a/chapters/es/chapter1/10.mdx b/chapters/es/chapter1/10.mdx index 7eab034d3..df28f4454 100644 --- a/chapters/es/chapter1/10.mdx +++ b/chapters/es/chapter1/10.mdx @@ -52,7 +52,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "Devuelve las palabras que representan personas, organizaciones o ubicaciones.", - explain: "Adicionalmente, con aggregation_strategy="simple", agrupará las palabras que pertenecen a la misma entidad, como \"Hugging Face\".", + explain: "Adicionalmente, con aggregation_strategy=\"simple\", agrupará las palabras que pertenecen a la misma entidad, como \"Hugging Face\".", correct: true } ]} diff --git a/chapters/fr/chapter1/10.mdx b/chapters/fr/chapter1/10.mdx index c486b1c64..92a9f1402 100644 --- a/chapters/fr/chapter1/10.mdx +++ b/chapters/fr/chapter1/10.mdx @@ -56,7 +56,7 @@ ner( }, { text: "Il renvoie les entités nommées dans cette phrase, telles que les personnes, les organisations ou lieux.", - explain: "De plus, avec aggregation_strategy="simple", cela regroupe les mots appartenant à la même entité, comme par exemple \"Hugging Face\".", + explain: "De plus, avec aggregation_strategy=\"simple\", cela regroupe les mots appartenant à la même entité, comme par exemple \"Hugging Face\".", correct: true } ]} diff --git a/chapters/hi/chapter1/10.mdx b/chapters/hi/chapter1/10.mdx index f52c6df42..30e11d872 100644 --- a/chapters/hi/chapter1/10.mdx +++ b/chapters/hi/chapter1/10.mdx @@ -50,7 +50,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "यह व्यक्तियों, संगठनों या स्थानों का प्रतिनिधित्व करने वाले शब्दों को वापस कर देगा।", - explain: "इसके अलावा, aggregation_strategy="simple" के साथ, यह एक ही इकाई से संबंधित शब्दों को एक साथ समूहित करेगा, जैसे \"हगिंग फेस\"।", + explain: "इसके अलावा, aggregation_strategy=\"simple\" के साथ, यह एक ही इकाई से संबंधित शब्दों को एक साथ समूहित करेगा, जैसे \"हगिंग फेस\"।", correct: true } ]} diff --git 
a/chapters/it/chapter1/10.mdx b/chapters/it/chapter1/10.mdx index 731385273..ad1c8114f 100644 --- a/chapters/it/chapter1/10.mdx +++ b/chapters/it/chapter1/10.mdx @@ -55,7 +55,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "Restituisce i termini che rappresentano persone, organizzazioni o luoghi.", - explain: "Inoltre, grazie a aggregation_strategy="simple", la pipeline è in grado di raggruppare le parole che appartengono alla stessa entità, come \"Hugging Face\".", + explain: "Inoltre, grazie a aggregation_strategy=\"simple\", la pipeline è in grado di raggruppare le parole che appartengono alla stessa entità, come \"Hugging Face\".", correct: true } ]} diff --git a/chapters/ja/chapter1/10.mdx b/chapters/ja/chapter1/10.mdx index 40f4bca7d..5e80bd4e5 100644 --- a/chapters/ja/chapter1/10.mdx +++ b/chapters/ja/chapter1/10.mdx @@ -53,7 +53,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "この文中の人物、団体、場所を表す単語を返します。", - explain: "さらに、aggregation_strategy="simple"を用いると、同じエンティティに属する単語をグループ化します。", + explain: "さらに、aggregation_strategy=\"simple\"を用いると、同じエンティティに属する単語をグループ化します。", correct: true } ]} diff --git a/chapters/ko/chapter1/10.mdx b/chapters/ko/chapter1/10.mdx index cf8ea77f9..ac1d251ca 100644 --- a/chapters/ko/chapter1/10.mdx +++ b/chapters/ko/chapter1/10.mdx @@ -54,7 +54,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "사람, 기관, 장소 등을 나타내는 단어들을 반환합니다.", - explain: "이 뿐만 아니라, aggregation_strategy="simple"를 사용해 \"Hugging Face\"와 같이 같은 개체에 해당하는 단어들을 그룹화해줍니다.", + explain: "이 뿐만 아니라, aggregation_strategy=\"simple\"를 사용해 \"Hugging Face\"와 같이 같은 개체에 해당하는 단어들을 그룹화해줍니다.", correct: true } ]} diff --git a/chapters/ro/chapter1/10.mdx b/chapters/ro/chapter1/10.mdx index f71b73941..be6aa629f 100644 --- a/chapters/ro/chapter1/10.mdx +++ b/chapters/ro/chapter1/10.mdx @@ -54,7 +54,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "Va returna cuvintele care 
reprezintă persoane, organizații sau locații.", - explain: "În plus, cu aggregation_strategy="simple", va grupa împreună cuvintele care aparțin aceleiași entități, precum \"Hugging Face\".", + explain: "În plus, cu aggregation_strategy=\"simple\", va grupa împreună cuvintele care aparțin aceleiași entități, precum \"Hugging Face\".", correct: true } ]} diff --git a/chapters/ru/chapter1/10.mdx b/chapters/ru/chapter1/10.mdx index f8dfbc828..61a1cb5a0 100644 --- a/chapters/ru/chapter1/10.mdx +++ b/chapters/ru/chapter1/10.mdx @@ -54,7 +54,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "Пайплайн вернет слова, обозначающие персон, организаций или географических локаций.", - explain: "Кроме того, с аргументом aggregation_strategy="simple", пайплайн сгруппирует слова, принадлежащие одной и той же сущности, например, \"Hugging Face\".", + explain: "Кроме того, с аргументом aggregation_strategy=\"simple\", пайплайн сгруппирует слова, принадлежащие одной и той же сущности, например, \"Hugging Face\".", correct: true } ]} diff --git a/chapters/th/chapter1/10.mdx b/chapters/th/chapter1/10.mdx index 5ad9c8d21..1ce769def 100644 --- a/chapters/th/chapter1/10.mdx +++ b/chapters/th/chapter1/10.mdx @@ -53,7 +53,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "ได้ผลออกมาระบุว่าคำใดเป็นบุคคล, องค์กร, หรือสถานที่", - explain: "หากตั้งค่าว่า aggregation_strategy="simple" จะสามารถรวมคำหลายคำที่ระบุสิ่งเดียวกันไว้ได้ เช่น \"Hugging Face\" ประกอบด้วยคำสองคำ แต่ระบุถึงสิ่งสิ่งเดียว", + explain: "หากตั้งค่าว่า aggregation_strategy=\"simple\" จะสามารถรวมคำหลายคำที่ระบุสิ่งเดียวกันไว้ได้ เช่น \"Hugging Face\" ประกอบด้วยคำสองคำ แต่ระบุถึงสิ่งสิ่งเดียว", correct: true } ]} diff --git a/chapters/zh-CN/chapter1/10.mdx b/chapters/zh-CN/chapter1/10.mdx index 4bff82cdf..d8b385ead 100644 --- a/chapters/zh-CN/chapter1/10.mdx +++ b/chapters/zh-CN/chapter1/10.mdx @@ -52,7 +52,7 @@ ner("My name is Sylvain and I work at Hugging Face in 
Brooklyn.") }, { text: "它找出代表人员、组织或位置的单词。", - explain: "正解! 此外,使用 aggregation_strategy="simple",可以将属于同一实体的单词组合在一起,例如“Hugging Face”。", + explain: "正解! 此外,使用 aggregation_strategy=\"simple\",可以将属于同一实体的单词组合在一起,例如“Hugging Face”。", correct: true } ]} diff --git a/chapters/zh-TW/chapter1/10.mdx b/chapters/zh-TW/chapter1/10.mdx index 68f7abfec..996e3ed89 100644 --- a/chapters/zh-TW/chapter1/10.mdx +++ b/chapters/zh-TW/chapter1/10.mdx @@ -53,7 +53,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "它將返回代表人員、組織或位置的單詞。", - explain: "此外,使用 aggregation_strategy="simple",它會將屬於同一實體的單詞組合在一起,例如“Hugging Face”。", + explain: "此外,使用 aggregation_strategy=\"simple\",它會將屬於同一實體的單詞組合在一起,例如“Hugging Face”。", correct: true } ]} From 83172ac33286ff8b490f0b8b6db98cd9c9c76d1a Mon Sep 17 00:00:00 2001 From: Dhruvil Darji Date: Mon, 23 Feb 2026 19:24:40 -0800 Subject: [PATCH 3/3] Fix unescaped quotes in aggregation_strategy causing Svelte build failure Escape double quotes inside tags using \" to prevent Svelte parser errors during documentation build. Fixes en, te, pt, and my translations. 
--- chapters/en/chapter1/7.mdx | 2 +- chapters/my/chapter1/7.mdx | 2 +- chapters/pt/chapter1/10.mdx | 2 +- chapters/te/chapter1/7.mdx | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/chapters/en/chapter1/7.mdx b/chapters/en/chapter1/7.mdx index 4091472f3..e771b88eb 100644 --- a/chapters/en/chapter1/7.mdx +++ b/chapters/en/chapter1/7.mdx @@ -53,7 +53,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "It will return the words representing persons, organizations or locations.", - explain: "Furthermore, with aggregation_strategy="simple", it will group together the words belonging to the same entity, like \"Hugging Face\".", + explain: "Furthermore, with aggregation_strategy=\"simple\", it will group together the words belonging to the same entity, like \"Hugging Face\".", correct: true } ]} diff --git a/chapters/my/chapter1/7.mdx b/chapters/my/chapter1/7.mdx index bb4be8c49..6a6b7346b 100644 --- a/chapters/my/chapter1/7.mdx +++ b/chapters/my/chapter1/7.mdx @@ -52,7 +52,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "၎င်းသည် လူပုဂ္ဂိုလ်များ၊ အဖွဲ့အစည်းများ သို့မဟုတ် နေရာများကို ကိုယ်စားပြုသည့် စကားလုံးများကို ပြန်ပေးပါလိမ့်မည်။", - explain: "ထို့အပြင် `aggregation_strategy="simple"` ကို အသုံးပြုထားသောကြောင့် ၎င်းသည် 'Hugging Face' ကဲ့သို့သော တူညီသည့် entity နှင့် သက်ဆိုင်သည့် စကားလုံးများကို အုပ်စုဖွဲ့ပေးပါလိမ့်မည်။", + explain: "ထို့အပြင် aggregation_strategy=\"simple\" ကို အသုံးပြုထားသောကြောင့် ၎င်းသည် 'Hugging Face' ကဲ့သို့သော တူညီသည့် entity နှင့် သက်ဆိုင်သည့် စကားလုံးများကို အုပ်စုဖွဲ့ပေးပါလိမ့်မည်။", correct: true } ]} diff --git a/chapters/pt/chapter1/10.mdx b/chapters/pt/chapter1/10.mdx index 9eaa97ad4..cfec34349 100644 --- a/chapters/pt/chapter1/10.mdx +++ b/chapters/pt/chapter1/10.mdx @@ -52,7 +52,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "Ele retornará as palavras que representam pessoas, organizações ou locais.", - explain: "Além 
disso, com `aggregation_strategy="simple"`, ele agrupará as palavras pertencentes à mesma entidade, como 'Hugging Face'.", + explain: "Além disso, com aggregation_strategy=\"simple\", ele agrupará as palavras pertencentes à mesma entidade, como 'Hugging Face'.", correct: true } ]} diff --git a/chapters/te/chapter1/7.mdx b/chapters/te/chapter1/7.mdx index 8279c5ad7..f6d2f435e 100644 --- a/chapters/te/chapter1/7.mdx +++ b/chapters/te/chapter1/7.mdx @@ -58,7 +58,7 @@ ner("My name is Sylvain and I work at Hugging Face in Brooklyn.") }, { text: "ఇది వ్యక్తులు, సంస్థలు లేదా ప్రదేశాలను సూచించే పదాలను తిరిగి ఇస్తుంది.", - explain: "అంతేకాకుండా, aggregation_strategy="simple" తో, ఇది \"Hugging Face\" వంటి ఒకే ఎంటిటీకి చెందిన పదాలను సమూహపరుస్తుంది.", + explain: "అంతేకాకుండా, aggregation_strategy=\"simple\" తో, ఇది \"Hugging Face\" వంటి ఒకే ఎంటిటీకి చెందిన పదాలను సమూహపరుస్తుంది.", correct: true } ]}
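For reviewers unfamiliar with the renamed parameter: `aggregation_strategy="simple"` makes the NER pipeline merge consecutive tokens that carry the same entity label into one span (so "Hugging" + "Face" becomes a single ORG entity), which is what the old `grouped_entities=True` did. A minimal pure-Python sketch of that grouping idea, using hypothetical token data rather than the actual `transformers` implementation:

```python
# Conceptual sketch of "simple" aggregation: merge consecutive tokens
# that share the same entity label into one grouped entity.
# (Hypothetical data and helper; NOT the transformers source code.)

def aggregate_simple(tokens):
    """tokens: list of (word, label) pairs; label is None for non-entities."""
    groups = []
    prev_label = None
    for word, label in tokens:
        if label is not None and label == prev_label:
            # Same label as the previous token: extend the current group.
            groups[-1]["word"] += " " + word
        elif label is not None:
            # New entity starts here.
            groups.append({"entity_group": label, "word": word})
        prev_label = label
    return groups

tokens = [
    ("My", None), ("name", None), ("is", None), ("Sylvain", "PER"),
    ("and", None), ("I", None), ("work", None), ("at", None),
    ("Hugging", "ORG"), ("Face", "ORG"), ("in", None), ("Brooklyn", "LOC"),
]
print(aggregate_simple(tokens))
# [{'entity_group': 'PER', 'word': 'Sylvain'},
#  {'entity_group': 'ORG', 'word': 'Hugging Face'},
#  {'entity_group': 'LOC', 'word': 'Brooklyn'}]
```

This matches the grouped output shape the quiz explanations describe (one entry per entity span rather than one per token), which is why the replacement parameter is the correct modern equivalent.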