Hi! Welcome to “Custom Data for TF Object Detection API – Part 2”. This is a continuation of “Installing TensorFlow with Object Detection API – Part 1”. With the setup from that post complete, we will continue the process of training a TensorFlow Object Detection API model.
To train a model with our custom data, we need to collect the data, filter it, label it, and finally build it into a format TensorFlow can consume.
So, let’s begin by collecting data. We can get images from anywhere we want; there are several dataset repositories where images can be downloaded:
Keep in mind, however, that there are many more repositories, and you can always build your own dataset by collecting images yourself.
Once we have the images we need, we have to filter the dataset. If it was not curated beforehand, check it and remove images that don’t make sense, duplicates, or any image that could cause errors during training.
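One way to automate part of this cleanup is a small script that drops unreadable files and byte-identical duplicates. This is just a sketch: the Pillow dependency and the flat-folder layout are assumptions, not something the repo requires.

```python
import hashlib
from pathlib import Path

from PIL import Image  # pip install Pillow (assumed dependency)

def filter_images(folder):
    """Return (kept, rejected) paths: rejects files Pillow cannot
    decode and exact byte-for-byte duplicates of an earlier file."""
    kept, rejected, seen_hashes = [], [], set()
    for path in sorted(Path(folder).iterdir()):
        try:
            with Image.open(path) as img:
                img.verify()  # raises if the file is corrupt or not an image
        except Exception:
            rejected.append(path)
            continue
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        if digest in seen_hashes:
            rejected.append(path)  # exact duplicate of an earlier image
        else:
            seen_hashes.add(digest)
            kept.append(path)
    return kept, rejected
```

Note that this only catches exact duplicates; near-duplicates (resized or re-encoded copies) still need a manual pass or a perceptual-hash tool.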
Labeling Images
There are many tools available for this. Some are better for labeling with high accuracy, others are optimized for speed, and others have a faster setup and require very little effort.
In this case, we will use LabelImg. For each image we want to use for training, we draw a rectangle around each object we want to detect and assign it a class name.
When the labeling is complete, we will have our images plus an XML file per image containing the coordinates of each label in Pascal VOC format.
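For reference, a single LabelImg annotation looks roughly like this (the filename, image size, class name, and box coordinates are example values):

```xml
<annotation>
  <folder>images</folder>
  <filename>dog_001.jpg</filename>
  <size>
    <width>640</width>
    <height>480</height>
    <depth>3</depth>
  </size>
  <object>
    <name>dog</name>
    <bndbox>
      <xmin>48</xmin>
      <ymin>240</ymin>
      <xmax>195</xmax>
      <ymax>371</ymax>
    </bndbox>
  </object>
</annotation>
```

Images with several labeled objects simply contain one `<object>` element per box.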
Compiling labeled images
Now we have to combine all those annotations into a single CSV file.
You can do that with the Python script in our repo called xml_to_csv.py; run it from an Anaconda Prompt. If you don’t have Anaconda installed, check Part 1 to learn how to get it.
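Conceptually, the conversion flattens every bounding box in every XML file into one CSV row. The sketch below shows the idea with the standard library; the column names and folder layout are assumptions, so defer to the repo’s xml_to_csv.py for the exact output format.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

def xml_dir_to_csv(xml_dir, csv_path):
    """Flatten a folder of Pascal VOC XML files into one CSV,
    writing one row per bounding box."""
    columns = ["filename", "width", "height", "class",
               "xmin", "ymin", "xmax", "ymax"]
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(columns)
        for xml_file in sorted(Path(xml_dir).glob("*.xml")):
            root = ET.parse(xml_file).getroot()
            filename = root.findtext("filename")
            width = root.findtext("size/width")
            height = root.findtext("size/height")
            # One row per labeled object in the image
            for obj in root.iter("object"):
                box = obj.find("bndbox")
                writer.writerow([filename, width, height,
                                 obj.findtext("name"),
                                 box.findtext("xmin"), box.findtext("ymin"),
                                 box.findtext("xmax"), box.findtext("ymax")])
```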
With the CSV ready, we need to split it into train and test sets; for that, we provide a Jupyter Notebook in our repo with helpful code.
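The split can be sketched in a few lines of pandas. One detail worth getting right: split by image rather than by row, so that all boxes belonging to one image land in the same set. The `filename` column name and the 80/20 ratio are assumptions; adapt them to the notebook in the repo.

```python
import pandas as pd

def split_labels(df, train_frac=0.8, seed=1):
    """Split annotation rows into (train, test) by image, so every
    bounding box of a given image stays in the same split."""
    names = df["filename"].drop_duplicates().sample(frac=1.0, random_state=seed)
    cut = int(round(len(names) * train_frac))
    train_names = set(names.iloc[:cut])
    mask = df["filename"].isin(train_names)
    return df[mask], df[~mask]
```

Afterwards, each part can be saved with e.g. `train.to_csv("train_labels.csv", index=False)`.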
Once the train and evaluation data are ready, we have to convert them into TFRecord files.
You will also find a generate_tfrecord.py file in our repo; you just have to run these lines:
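In the widely shared versions of generate_tfrecord.py, the invocation looks roughly like this; the flag names and paths below are assumptions, so check the script in the repo (or its `--help` output) for the exact ones.

```shell
# Flag names and paths are assumptions based on common versions of
# generate_tfrecord.py; adjust them to match the script in the repo.
python generate_tfrecord.py --csv_input=data/train_labels.csv \
    --image_dir=images --output_path=data/train.record
python generate_tfrecord.py --csv_input=data/test_labels.csv \
    --image_dir=images --output_path=data/test.record
```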
And that is all: our custom dataset is now ready for training our model.
Don’t miss the next post, where we will finally learn how to train the model.