feat(nlp): add basic nlp transformations #40

AlisherAmirbek · 2024-10-05T12:53:33Z

No description provided.

cowana-ai · 2024-10-07T12:08:25Z

.idea/vcs.xml

what are these files?

cowana-ai · 2024-10-07T12:08:34Z

feature_fabrica/transform/NLP.py

+
+
+class NGrams(Transformation):
+    _name_ = "NGrams"


cowana-ai · 2024-10-07T12:10:00Z

feature_fabrica/transform/NLP.py

+class NGrams(Transformation):
+    _name_ = "NGrams"
+    @beartype
+    def __init__(self, n: int):


Allow other parameters to be set.

cowana-ai · 2024-10-07T12:10:08Z

feature_fabrica/transform/NLP.py

+
+
+class Stemming(Transformation):
+    _name_ = "Stemming"


cowana-ai · 2024-10-07T12:10:19Z

feature_fabrica/transform/NLP.py

+    @beartype
+    def __init__(self):
+        super().__init__()
+        self.stemmer = PorterStemmer()


Allow choosing other stemmers.
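
One way this request could be met is a name-to-stemmer lookup, sketched below. The `STEMMERS` registry, the `ConfigurableStemming` class, and the `stemmer` argument are hypothetical names for illustration only. The sketch is standalone rather than a subclass of `Transformation`, and its whitespace tokenization is an assumption, not what the PR does.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

# Hypothetical registry: maps a short name to a factory for an NLTK stemmer.
STEMMERS = {
    "porter": PorterStemmer,
    "lancaster": LancasterStemmer,
    "snowball": lambda: SnowballStemmer("english"),
}


class ConfigurableStemming:
    """Standalone sketch; the real class would extend Transformation."""

    def __init__(self, stemmer: str = "porter"):
        if stemmer not in STEMMERS:
            raise ValueError(
                f"unknown stemmer {stemmer!r}; choose from {sorted(STEMMERS)}"
            )
        self.stemmer = STEMMERS[stemmer]()

    def execute(self, data):
        # Assumption: split on whitespace, stem each token, and rejoin.
        return [
            " ".join(self.stemmer.stem(token) for token in doc.split())
            for doc in data
        ]


porter_result = ConfigurableStemming("porter").execute(["running dogs"])
snowball_result = ConfigurableStemming("snowball").execute(["running dogs"])
```

Keeping `"porter"` as the default would preserve the current behaviour for existing configs.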

cowana-ai · 2024-10-07T12:12:45Z

feature_fabrica/transform/NLP.py

+    @beartype
+    def __init__(self, max_features: int, ngram_range: tuple[int, int], stop_words: list[str] | None = None):
+        super().__init__()
+        self.vectorizer = TfidfVectorizer(max_features=max_features, ngram_range=ngram_range, stop_words=stop_words)


Allow all parameters to be set.
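
A minimal sketch of what exposing all parameters might look like: forward arbitrary keyword arguments to `TfidfVectorizer` instead of naming three of them. `TfidfSketch` and `vectorizer_kwargs` are hypothetical names, and the class is standalone rather than a `Transformation` subclass. The list-to-tuple conversion assumes `ngram_range` may arrive as a list from a config file, as the `tuple(ngram_range)` call elsewhere in this PR suggests.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


class TfidfSketch:
    """Standalone sketch; the real class would extend Transformation."""

    def __init__(self, **vectorizer_kwargs):
        # Assumption: configs may pass ngram_range as a list, so normalise it.
        if "ngram_range" in vectorizer_kwargs:
            vectorizer_kwargs["ngram_range"] = tuple(vectorizer_kwargs["ngram_range"])
        # Forward everything else unchanged so any TfidfVectorizer option works.
        self.vectorizer = TfidfVectorizer(**vectorizer_kwargs)


tfidf = TfidfSketch(max_features=1000, ngram_range=[1, 2], sublinear_tf=True)
params = tfidf.vectorizer.get_params()
```

The trade-off is that beartype can no longer check individual argument types; scikit-learn validates them later, when the vectorizer is fitted.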

cowana-ai · 2024-10-07T12:13:14Z

feature_fabrica/transform/NLP.py

+
+    @beartype
+    def execute(self, data: StrArray) -> NumericArray:
+        return self.vectorizer.fit_transform(data).toarray()


We shouldn't apply fit_transform to test data; fit and transform should be separate steps, so the vocabulary and IDF weights learned from the training data are reused unchanged.
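
A sketch of the separation under discussion, assuming a hypothetical `fit` method is added next to `execute`. The class name and the `_fitted` flag are illustrative, and how `fit` would be wired into the `Transformation` lifecycle is left open.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


class TfidfFitTransformSketch:
    """Standalone sketch; the real class would extend Transformation."""

    def __init__(self, **vectorizer_kwargs):
        self.vectorizer = TfidfVectorizer(**vectorizer_kwargs)
        self._fitted = False

    def fit(self, data):
        # Learn the vocabulary and IDF weights from training data only.
        self.vectorizer.fit(data)
        self._fitted = True
        return self

    def execute(self, data):
        # Reuse the learned vocabulary; never refit on the incoming data.
        if not self._fitted:
            raise RuntimeError("call fit() on training data before execute()")
        return self.vectorizer.transform(data).toarray()


train = ["the cat sat", "the dog barked"]
test = ["the cat barked loudly"]

tfidf = TfidfFitTransformSketch().fit(train)
train_features = tfidf.execute(train)
test_features = tfidf.execute(test)
```

Both outputs share the same columns because the vocabulary comes from `train` alone; the unseen word "loudly" in `test` is ignored rather than becoming a new feature.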

cowana-ai · 2024-10-07T12:14:07Z

feature_fabrica/transform/NLP.py

+            ngram_range = tuple(ngram_range)
+
+        self.ngram_range = ngram_range
+        self.vectorizer = CountVectorizer(


Allow all parameters to be set.

cowana-ai · 2024-10-07T12:14:12Z

feature_fabrica/transform/NLP.py

+
+
+class BagOfWords(Transformation):
+    _name_ = 'BagOfWords'


cowana-ai · 2024-10-07T12:14:24Z

feature_fabrica/transform/NLP.py

+
+    @beartype
+    def execute(self, data: StrArray) -> np.ndarray:
+        return self.vectorizer.fit_transform(data).toarray()


We shouldn't apply fit_transform to test data; fit and transform should be separate steps.

AlisherAmirbek and others added 5 commits October 1, 2024 02:34

feat(nlp): add Bag-of-Words, TF-IDF, and N-grams, lemmatization, stemming transformations

576be7a

Merge branch 'cowana-ai:main' into main

1a32259

chore: apply pre-commit fixes (EOF, pyupgrade, docformatter, isort)

2cb75ba

fix: add nltk package

735c172

Merge branch 'cowana-ai:main' into NLP_transform

6ca0b1d

cowana-ai reviewed Oct 7, 2024
