
Why is it better to provide raw data to Khiops instead of preprocessed or encoded data? #510

Answered by lucaurelien
lucaurelien asked this question in Q&A
Khiops is designed to work directly on raw data, so little or no manual preprocessing is needed. In fact, it is strongly recommended not to prepare your data beforehand, because preparation can lose information, whether it concerns variable encoding, handling of missing values, or data flattening (propositionalization).
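To make the information-loss point concrete, here is a minimal sketch in plain Python. It does not use Khiops and does not reproduce how Khiops handles missing values; the data, the column meaning (`income`, `churned`), and the helper `churn_rate` are all hypothetical. It only shows that mean imputation can blend a "value is missing" signal with genuine values.

```python
from statistics import mean

# Hypothetical toy data: (income, churned). None marks a missing income.
rows = [
    (30.0, 0), (50.0, 0), (40.0, 0), (40.0, 0),
    (None, 1), (None, 1), (None, 1), (None, 0),
]

def churn_rate(selected):
    """Share of churned customers among the selected rows."""
    return sum(label for _, label in selected) / len(selected)

# On the raw data, "income is missing" is itself informative.
missing = [r for r in rows if r[0] is None]
present = [r for r in rows if r[0] is not None]
raw_missing_rate = churn_rate(missing)
raw_present_rate = churn_rate(present)

# Mean imputation replaces every missing income with the observed mean.
fill = mean(value for value, _ in present)
imputed = [(fill if value is None else value, label) for value, label in rows]

# The formerly missing rows are now indistinguishable from genuine rows
# that happen to equal the mean, so the two populations are mixed.
at_fill = [r for r in imputed if r[0] == fill]
imputed_rate = churn_rate(at_fill)

print(raw_missing_rate, raw_present_rate, imputed_rate)
```

In this toy set the rows with a missing income churn far more often than the others, but after imputation they share a single value with customers whose income really is the mean, and the contrast is diluted.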

Khiops employs the MODL formalism, based on the MDL (Minimum Description Length) principle, to encode variables optimally:

  • Categorical variables: Khiops automatically groups values into clusters based on their correlation with the target variable.
  • Numerical variables: Khiops uses discretization to partition values into intervals, ensuring that each interval represents a meaningful range o…
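As a rough illustration of grouping categorical values by their relationship with the target, the sketch below uses a naive fixed threshold on the per-value target rate. This is not the MODL criterion and not Khiops code; the data, the function `group_by_target_rate`, and the `threshold` parameter are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical toy data: (city, target).
samples = [
    ("Paris", 1), ("Paris", 1), ("Paris", 0), ("Paris", 1),
    ("Lyon", 1), ("Lyon", 1), ("Lyon", 1), ("Lyon", 0),
    ("Nice", 0), ("Nice", 0), ("Nice", 0), ("Nice", 1),
    ("Lille", 0), ("Lille", 0), ("Lille", 1), ("Lille", 0),
]

def group_by_target_rate(pairs, threshold=0.5):
    """Naive stand-in for supervised grouping: split values by target rate."""
    counts = defaultdict(lambda: [0, 0])  # value -> [row count, positive count]
    for value, target in pairs:
        counts[value][0] += 1
        counts[value][1] += target
    groups = {"high": [], "low": []}
    for value, (n, positives) in sorted(counts.items()):
        key = "high" if positives / n >= threshold else "low"
        groups[key].append(value)
    return groups

groups = group_by_target_rate(samples)
print(groups)
```

The fixed threshold is only a crude placeholder: as described above, Khiops selects the grouping through the MODL formalism rather than through a hand-set cutoff.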

Answer selected by marcboulle