Instructions
Step 1: Data Preparation: Please upload your input labelled training data in csv format and click to clean and prepare your training data. Please make sure that you have only one column of unique identifier (ID) within your data (avoid multiple ID columns, only first column as unique ID). Please see the following file as an acceptable example of the input csv data file format. Please make sure that your csv filename does not include spaces for example MilkQuality.csv is acceptable and Milk Quality.csv is not. IF our use case deploys class and target labels which represent quantities in real-life for example output of a generator (Class 1 below 10KW and Class 2 above 10KW), it is important to always avoid sorting/rearranging those class members/labels and their corresponding csv rows based on the sorted value of that quantity. For example, data rows of the csv file as rows No. 1, 2, 3, 4,... corresponding the Class 1 labels/target values of 1, 1, 1, 1,... representing the values 8.4, 4.3, 1.8, 9.4,... are fine and should not be rearranged based on the sorted value of the quantities they represent otherwise poor prediction or classification results may be generated (so the representing data rows should not be rearranged to correspond to 1.8, 4.3, 8,4, 9,4). Please also avoid NULL within your data. Instead of NULL use blank csv cell:
When preparing your input training csv data file for Butterfly AI, please also avoid sorting all the data columns based on one data column. This may result in poor predictions and low accuracy. As an example, the training csv file (above) has been rearranged/sorted based on the sorted Param5 column. The resulting csv file is shown below which is an unacceptable format. First column (ID) is an exception to this instruction. Please note that the more irregular and randomized the data features of your training csv file are arranged across columns and rows of csv, the better is the chance that Butterfly AI will generate more accurate predictions. To maximize the irregularity of your training data, you may temporarily add a column of random non-binary numbers (which is output of a random number generator as a sequence for example -1,4, 19.3, 0.7, -10.2,...) to your training csv file and sort that column of random numbers from smallest to largest while expanding the sorting process to all the columns and then remove that temporary column.
Step 2: Training: Please choose one of your datasets that has already been prepared and cleaned and click to start training.
Step 3: Batch Predictions/Classification: Butterfly AI always keeps the trained model with the best performance of all the training rounds, pair it with its corresponding training dataset while overriding the past weaker models. Please choose the dataset originally used for training, upload the blind batch prediction dataset in csv format (example below), and click to start cleaning and preparation of the prediction data and to run the prediction process. Your prediction results will be delivered in an output csv file. Please click "DOWNLOAD" button to see your final prediction results. Within the prediction result file (the output csv) always the column labelled "Class" represents your final prediction or classification results. The probabilities represent the certainty of predictions or classifications.
Your blind unseen batch prediction dataset should have similar format to the original dataset used for training, except that the last column is not included (the unknown labels/targets to be predicted). For example for the training dataset example (above) your batch prediction dataset may look like the following csv: