OtasAI

Categorical data - Ordinal Encoding

Introduction

Ordinal encoding is a common preprocessing step for machine learning models that require categorical data to be represented in an ordered numerical format. OtasML, a visual machine learning tool, offers an Ordinal Encoding feature within its data preparation model. The feature converts categorical data into ordered numerical labels so that ordinal relationships among categories (for example, low < medium < high) are preserved. This article explains how to configure the Ordinal Encoding feature for your data preprocessing workflow.
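To make the idea concrete, here is a minimal sketch of ordinal encoding in plain Python. It is not OtasML's implementation: the function names and the `order` parameter are hypothetical and used only for illustration. The assumption is that each category is mapped to its position in a known ranking.

```python
def fit_ordinal(values, order=None):
    """Build a category -> integer mapping.

    `order` is a hypothetical parameter listing categories from lowest to
    highest. Without it, this sketch falls back to alphabetical order, which
    may not match the real ranking of the categories.
    """
    categories = list(order) if order is not None else sorted(set(values))
    return {category: index for index, category in enumerate(categories)}


def transform_ordinal(values, mapping):
    """Replace each category with its integer label."""
    return [mapping[value] for value in values]


sizes = ["small", "large", "medium", "small"]
mapping = fit_ordinal(sizes, order=["small", "medium", "large"])
encoded = transform_ordinal(sizes, mapping)

print(mapping)  # {'small': 0, 'medium': 1, 'large': 2}
print(encoded)  # [0, 2, 1, 0]
```

The integers carry meaning only when the category order reflects a real ranking; with the alphabetical fallback, "large" would receive a lower label than "small".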

Configurations

The Ordinal Encoding tool in OtasML provides several options for encoding categorical data into ordered numerical labels. The key configurations are described below:

Subset

  • Default Value: None
  • Description: This option allows users to select specific columns for ordinal encoding. By specifying the subset of columns, users can ensure that only the desired categorical columns are transformed, providing more control over the preprocessing step.
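The effect of Subset can be sketched as follows. This is an illustrative plain-Python example, not OtasML code: `encode_subset` is a hypothetical helper, rows are modeled as dictionaries, and treating a subset of None as "encode every column" is an assumption about the default.

```python
def encode_subset(rows, subset=None):
    """Ordinal-encode only the columns named in `subset`.

    Assumption: subset=None means every column is encoded. Categories are
    numbered in sorted order for this sketch.
    """
    columns = list(subset) if subset is not None else list(rows[0].keys())
    mappings = {
        col: {cat: i for i, cat in enumerate(sorted({row[col] for row in rows}))}
        for col in columns
    }
    encoded = [
        {col: (mappings[col][val] if col in mappings else val)
         for col, val in row.items()}
        for row in rows
    ]
    return encoded, mappings


rows = [
    {"size": "small", "city": "Oslo", "price": 10.0},
    {"size": "large", "city": "Rome", "price": 25.0},
    {"size": "medium", "city": "Oslo", "price": 17.5},
]

encoded_rows, mappings = encode_subset(rows, subset=["size"])
print(encoded_rows[0])  # {'size': 2, 'city': 'Oslo', 'price': 10.0}
```

Only the "size" column is replaced by integer labels; "city" and "price" pass through unchanged.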

Handle Unknown

  • Default Value: Error
  • Description: Specifies the way unknown categories are handled during transformation. The available options include:
    • Error: Raise an error if a category that was not seen during fitting appears during transformation. This ensures that all categories are accounted for during preprocessing.
    • Use Encoded Value: Encode unknown categories with the value set in the Unknown Value parameter. This option allows unseen categories to be handled without interrupting the transformation.

Unknown Value

  • Default Value: None
  • Description: Required when Handle Unknown is set to "Use Encoded Value". It sets the encoded value assigned to unknown categories, which provides a consistent way to handle categories that were not seen during the fitting process.
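The interaction between Handle Unknown and Unknown Value can be sketched like this. The example is a plain-Python illustration under stated assumptions, not OtasML's code: the function name, the lowercase option strings, and the choice of -1 as the unknown value are all hypothetical.

```python
def transform(values, mapping, handle_unknown="error", unknown_value=None):
    """Encode values, applying the chosen policy to unseen categories."""
    if handle_unknown not in ("error", "use_encoded_value"):
        raise ValueError(f"unsupported handle_unknown: {handle_unknown!r}")
    if handle_unknown == "use_encoded_value" and unknown_value is None:
        # Mirrors the rule that Unknown Value is required in this mode.
        raise ValueError("unknown_value is required with 'use_encoded_value'")

    encoded = []
    for value in values:
        if value in mapping:
            encoded.append(mapping[value])
        elif handle_unknown == "error":
            raise ValueError(f"unknown category: {value!r}")
        else:
            encoded.append(unknown_value)
    return encoded


# Mapping learned during fitting (hypothetical data).
mapping = {"low": 0, "medium": 1, "high": 2}

safe = transform(["low", "extreme"], mapping,
                 handle_unknown="use_encoded_value", unknown_value=-1)
print(safe)  # [0, -1]
```

With the default "error" policy, the same call would raise a ValueError for "extreme". Choosing an unknown value outside the range of fitted labels, such as -1 here, keeps unseen categories distinguishable from known ones.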

Encoded Missing Value

  • Default Value: None
  • Description: Specifies the encoded value assigned to missing values, so that missing data is encoded consistently across the dataset.
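A small sketch of how an encoded missing value might behave is shown below. It is illustrative only: missing entries are modeled as Python None, the function names are hypothetical, and leaving missing entries as None when no encoded value is set is an assumption about the default.

```python
def fit(values):
    """Build the mapping from the non-missing categories only."""
    categories = sorted({value for value in values if value is not None})
    return {category: index for index, category in enumerate(categories)}


def transform(values, mapping, encoded_missing_value=None):
    """Encode values; missing entries (None) get `encoded_missing_value`."""
    return [
        encoded_missing_value if value is None else mapping[value]
        for value in values
    ]


grades = ["B", None, "A", "C"]
mapping = fit(grades)

with_default = transform(grades, mapping)
with_marker = transform(grades, mapping, encoded_missing_value=-1)

print(with_default)  # [1, None, 0, 2]
print(with_marker)   # [1, -1, 0, 2]
```

Missing entries are excluded when the mapping is built, so they never receive a regular category label.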

Interactive Buttons: Preview and Save

To enhance user experience and provide greater control over the ordinal encoding process, the tool includes two essential buttons:

  • Preview: This button shows the effect of the current ordinal encoding configuration in real time without permanently applying the changes. By clicking Preview, users can check how the dataset will be transformed and confirm that the encoding is appropriate before committing to it.
  • Save: Once users are satisfied with their configurations and the preview results, they can click the Save button to permanently apply their chosen settings. This action saves the configuration, which will then be applied to the data during the training process, ensuring that the ordinal encoding aligns with the user's expectations and requirements.

Conclusion

The Ordinal Encoding tool in OtasML provides a robust solution for converting categorical data into ordered numerical labels, preserving the ordinal relationships among categories. By offering flexible options for handling unknown and missing categories and the ability to selectively apply ordinal encoding to specific columns, the tool provides greater control over the data preprocessing step. The inclusion of interactive Preview and Save buttons further enhances the user experience, ensuring confidence in the ordinal encoding process. OtasML continues to empower users with intuitive and effective tools, making data preparation a seamless and integral part of the machine learning workflow.
