Object Detection Using Mask R-CNN with TensorFlow 1.14 and Keras

Nov 12, 2024 02:21 PM

Mask R-CNN is an object detection model based on deep convolutional neural networks (CNNs), developed by a group of Facebook AI researchers in 2017. The model can return both the bounding box and a mask for each detected object in an image.

The model was originally developed in Python using the Caffe2 deep learning library. The original source code is available on GitHub. To support the Mask R-CNN model with more popular libraries, such as TensorFlow, there is a popular open-source project called Mask_RCNN that offers an implementation based on Keras and TensorFlow 1.14.

Google officially released TensorFlow 2.0 in September 2019. TensorFlow 2.0 is better organized and much easier to learn than TensorFlow 1.x. Unfortunately, the Mask_RCNN project does not yet support TensorFlow 2.0.

This tutorial uses the TensorFlow 1.14 version of the Mask_RCNN project to both make predictions and train the Mask R-CNN model using a custom dataset. In another tutorial, the project will be modified to make Mask R-CNN compatible with TensorFlow 2.0.

This tutorial covers the following:

  • Overview of the Mask_RCNN Project
  • Object Detection with TensorFlow 1.14
  • Preparing the Model Configuration Parameters
  • Building the Mask R-CNN Model Architecture
  • Loading the Model Weights
  • Reading an Input Image
  • Detecting Objects
  • Visualizing the Results
  • Complete Code for Prediction
  • Downloading the Training Dataset
  • Preparing the Training Dataset
  • Preparing Model Configuration
  • Training Mask R-CNN in TensorFlow 1.14
  • Conclusion

Introduction

The Mask_RCNN project is open-source and available on GitHub under the MIT license, which allows anyone to use, modify, or distribute the code for free.

The contribution of this project is support for the Mask R-CNN object detection model in TensorFlow 1.x: it builds all the layers of the Mask R-CNN model and offers a simple API to train and test it.

The Mask R-CNN model predicts the class label, bounding box, and mask for the objects in an image. Here is an example of what the model could detect.

image

The first release of the project (Mask_RCNN 1.0) was published on November 3rd, 2017. The latest release (Mask_RCNN 2.1) was published on March 20th, 2019. Since that date, no new releases have been published.

Prerequisites

  • Libraries: Install TensorFlow 1.14 and Keras. Additional dependencies include numpy, scipy, opencv-python, and imgaug.
  • Hardware Requirements: A compatible GPU is highly recommended (NVIDIA GPUs work best) with CUDA 10.0 and cuDNN 7.6.
  • Dataset: Have a labeled dataset in COCO or Pascal VOC format, or prepare your own labeled images.
  • Basic Knowledge: Familiarity with deep learning, object detection concepts, and the Mask R-CNN architecture.

To get the project on your PC, just clone it using the following command:

git clone https://github.com/matterport/Mask_RCNN.git

It is also possible to download the project as a ZIP file from this link. Let’s have a quick look at the project content once it’s available locally.

At the time of writing this tutorial, the project has 4 directories:

  1. mrcnn: This is the core directory that holds the project’s Python code.
  2. samples: Jupyter notebooks providing some examples of using the project.
  3. images: A collection of test images.
  4. assets: Some annotated images.

The most important directory is mrcnn, as it holds the source code for the project. It has the following Python files:

  • __init__.py: Marks the mrcnn folder as a Python package.
  • model.py: Has the functions and classes for building the layers and the model.
  • config.py: Defines a class named Config that holds some configuration parameters for the model.
  • utils.py: Includes some helper functions and classes.
  • visualize.py: Visualizes the results of the model.
  • parallel_model.py: For supporting multiple GPUs.

Some of the files at the root of the project are:

  • setup.py: Used to install the project using pip.
  • README.md: A Markdown file documenting the project.
  • LICENSE: The MIT license.
  • requirements.txt: Required libraries to use the project.

Based on the requirements.txt file, the TensorFlow version must be at least 1.3.0. For Keras, it must be 2.0.8 or higher.

There are 2 ways to use the project:

  1. Install it using pip.
  2. Copy the mrcnn folder to wherever you will be using the project. In this case, make sure that all the required libraries in the requirements.txt file are installed.

To install the project, just issue the following command from the command prompt or terminal. For platforms other than Windows, replace “python” with “python3”.

python setup.py install

An alternative way to use the project is to copy the mrcnn folder to wherever the project will be used. Assume there is a directory called “Object Detection” inside which there is a Python file named object_detection.py that uses the code in the mrcnn folder. Then, simply copy the mrcnn folder inside the “Object Detection” directory.

Here is the directory structure:

Object Detection
    mrcnn
    object_detection.py

Now we are ready to use the Mask_RCNN project. The next section discusses how to use the project with TensorFlow 1.x.

Object Detection in TensorFlow 1

Before starting this section, make sure TensorFlow 1 (≥ 1.3.0) is installed. You can check the version using the following code:

import tensorflow
print(tensorflow.__version__)
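The printed version string can also be compared against the minimum from requirements.txt in code. The sketch below is one way to do that with only the standard library. The helper names parse_version and meets_minimum are made up for this illustration, and it assumes plain dotted numeric versions such as 1.14.0.

```python
def parse_version(version):
    """Turn a dotted version string such as '1.14.0' into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as '-rc1'
            digits += ch
        if digits == "":
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '1.3' equals '1.3.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print(meets_minimum("1.14.0", "1.3.0"))  # True
print(meets_minimum("2.0.7", "2.0.8"))   # False
```

Comparing the raw strings would give the wrong answer here, because "1.14.0" sorts before "1.3.0" alphabetically. Converting to tuples of integers avoids that.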

The steps to use the Mask_RCNN project to detect objects in an image are:

  1. Prepare the model configuration parameters.
  2. Build the Mask R-CNN model architecture.
  3. Load the model weights.
  4. Read an input image.
  5. Detect objects in the image.
  6. Visualize the results.

This section builds an example that uses a pre-trained Mask R-CNN to detect the objects in the COCO dataset. The next subsections discuss each of the steps listed above.

Prepare the Model Configuration Parameters

To build the Mask R-CNN model, respective parameters must beryllium specified. These parameters power non-maximum suppression (NMS), intersection complete national (IoU), image size, number of ROIs per image, ROI pooling layer, and more.

The mrcnn files has a book named config.py which has a azygous people named Config. This people has immoderate default values for the parameters. You tin widen this people and override immoderate of the default parameters. The pursuing codification creates a caller people named SimpleConfig that extends the mrcnn.config.Config class.

import mrcnn.config class SimpleConfig(mrcnn.config.Config): ...

One of the captious parameters that must beryllium overridden is the number of classes, which defaults to 1.

NUM_CLASSES = 1

In this example the model detects the objects in an image from the COCO dataset. This dataset has 80 classes. Remember that the background must be regarded as an additional class. As a result, the total number of classes is 81.

NUM_CLASSES = 81

Two other parameters should be carefully assigned: GPU_COUNT and IMAGES_PER_GPU. They default to 1 and 2, respectively.

These 2 variables are used to calculate the batch size:

BATCH_SIZE = IMAGES_PER_GPU * GPU_COUNT

Assuming that the default values are used, the batch size is 2*1=2. This means 2 images are fed to the model at once, so the user has to pass exactly 2 images per call.

In some cases the user is only interested in detecting the objects in a single image. Thus, the IMAGES_PER_GPU property should be set to 1.

GPU_COUNT = 1
IMAGES_PER_GPU = 1

Here is the complete code for the configuration class. The NAME property is a unique name for the configuration.

import mrcnn.config

class SimpleConfig(mrcnn.config.Config):
    NAME = "coco_inference"
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    NUM_CLASSES = 81
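To see how overriding IMAGES_PER_GPU changes the batch size without installing the project, the relationship can be reproduced with a small stand-in class. StandInConfig below is hypothetical and is not the real mrcnn.config.Config; it only mirrors the BATCH_SIZE formula given above.

```python
class StandInConfig:
    """Hypothetical stand-in for mrcnn.config.Config (not the real class)."""
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2  # same defaults as stated in the text

    def __init__(self):
        # BATCH_SIZE = IMAGES_PER_GPU * GPU_COUNT, as in the formula above.
        self.BATCH_SIZE = self.IMAGES_PER_GPU * self.GPU_COUNT


class SingleImageConfig(StandInConfig):
    # Overriding one class attribute is enough to change the derived value.
    IMAGES_PER_GPU = 1


default_batch = StandInConfig().BATCH_SIZE
single_batch = SingleImageConfig().BATCH_SIZE
print(default_batch, single_batch)  # 2 1
```

Because the batch size is derived from the two attributes when the configuration is instantiated, a subclass only needs to override the attributes themselves.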

Build the Mask R-CNN Model Architecture

To build the Mask R-CNN model architecture, the mrcnn.model script has a class named MaskRCNN. The constructor of this class accepts 3 parameters:

  1. mode: Either "training" or "inference".
  2. config: An instance of the configuration class.
  3. model_dir: Directory to save training logs and trained weights.

The next example creates an instance of the mrcnn.model.MaskRCNN class. The created instance is saved in the model variable.

import os
import mrcnn.model

model = mrcnn.model.MaskRCNN(mode="inference",
                             config=SimpleConfig(),
                             model_dir=os.getcwd())

The Keras model is saved in the keras_model attribute of the instance. Using this attribute, the summary of the model can be printed.

model.keras_model.summary()

The model architecture is large; just 4 layers from the top and bottom are listed below. The last layer, named mrcnn_mask, only returns the masks for the top 100 ROIs.

___________________________________________________________________________
Layer (type)                    Output Shape         Param #  Connected to
===========================================================================
input_image (InputLayer)        (None, None, None, 3 0
___________________________________________________________________________
zero_padding2d_1 (ZeroPadding2D (None, None, None, 3 0        input_image[0][0]
___________________________________________________________________________
conv1 (Conv2D)                  (None, None, None, 6 9472     zero_padding2d_1[0][0]
___________________________________________________________________________
bn_conv1 (BatchNorm)            (None, None, None, 6 256      conv1[0][0]
___________________________________________________________________________
...
___________________________________________________________________________
mrcnn_mask_bn4 (TimeDistributed (None, 100, 14, 14,  1024     mrcnn_mask_conv4[0][0]
___________________________________________________________________________
activation_74 (Activation)      (None, 100, 14, 14,  0        mrcnn_mask_bn4[0][0]
___________________________________________________________________________
mrcnn_mask_deconv (TimeDistribu (None, 100, 28, 28,  262400   activation_74[0][0]
___________________________________________________________________________
mrcnn_mask (TimeDistributed)    (None, 100, 28, 28,  20817    mrcnn_mask_deconv[0][0]
===========================================================================
Total params: 64,158,584
Trainable params: 64,047,096
Non-trainable params: 111,488

Load the Model Weights

The previous subsection created the model architecture. This subsection loads the weights into the created model using the load_weights() method. It is a modified version of the Keras load_weights() method that supports multi-GPU usage, in addition to the ability to exclude some layers.

The 2 parameters used are:

  1. filepath: Accepts the path of the weights file.
  2. by_name: If True, then each layer is assigned the weights according to its name.

The next code calls the load_weights() method while passing the path of the weights file mask_rcnn_coco.h5. This file can be downloaded from this link.

model.load_weights(filepath="mask_rcnn_coco.h5", by_name=True)

Read an Input Image

Once the model is created and its weights are loaded, we next need to read an image and feed it to the model.

The next code snippet uses OpenCV to read an image and reorder its color channels to be RGB, rather than BGR.

import cv2

image = cv2.imread("3627527276_6fe8cd9bfe_z.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
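For a 3-channel image, the BGR to RGB conversion amounts to reversing the channel order of every pixel. The sketch below illustrates that idea on a tiny nested-list image so it runs without OpenCV; bgr_to_rgb is a made-up helper for this illustration, not an OpenCV function.

```python
def bgr_to_rgb(image):
    """Return a copy of a nested-list image with each pixel's channels reversed."""
    return [[pixel[::-1] for pixel in row] for row in image]


# A 2x2 image stored in BGR order.
bgr_image = [
    [[255, 0, 0], [0, 255, 0]],   # pure blue, pure green
    [[0, 0, 255], [10, 20, 30]],  # pure red, an arbitrary pixel
]

rgb_image = bgr_to_rgb(bgr_image)
print(rgb_image[0][0])  # [0, 0, 255]
```

Applying the reversal twice returns the original image. When using OpenCV itself, note that cv2.imread() returns None if the file cannot be read, so it is worth checking the result before converting.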

The next figure shows the image we just read. The image is available at this link.

image

Detect Objects

Given the model and the input image, the objects in the image can be detected using the detect() method. It accepts 2 parameters:

  1. images: A list of images.
  2. verbose: When set to 1, some log messages are printed.

The following code calls the detect() method. Note that the length of the list assigned to the images argument must be equal to the batch size. Based on the GPU_COUNT and IMAGES_PER_GPU configuration properties we set, the batch size is 1. Thus, the list must have a single image. The result of the detection is returned in the r variable.

r = model.detect(images=[image], verbose=0)

If more than 1 image is passed (e.g. images=[image, image, image]), the following exception is raised indicating that the length of the images argument must be equal to the BATCH_SIZE configuration property.

...
File "D:\Object Detection\Pure Keras\mrcnn\model.py", in detect
    assert len(images) == self.config.BATCH_SIZE, "len(images) must be equal to BATCH_SIZE"
AssertionError: len(images) must be equal to BATCH_SIZE
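When there are more images than the batch size, one possible workaround is to split them into lists of exactly BATCH_SIZE items and call detect() once per list. The helper below is a sketch of that idea and is not part of the Mask_RCNN project; make_batches is a hypothetical name, and padding the last batch by repeating its final image is just one choice among several.

```python
def make_batches(images, batch_size):
    """Split images into lists of exactly batch_size items.

    The last batch is padded by repeating its final image. The second
    return value records how many real images each batch contains, so
    results for the padding can be discarded afterwards.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches, real_counts = [], []
    for start in range(0, len(images), batch_size):
        chunk = list(images[start:start + batch_size])
        real_counts.append(len(chunk))
        while len(chunk) < batch_size:
            chunk.append(chunk[-1])  # pad with a repeat of the last image
        batches.append(chunk)
    return batches, real_counts


# Strings stand in for image arrays in this illustration.
batches, real_counts = make_batches(["img_a", "img_b", "img_c"], batch_size=2)
print(batches)      # [['img_a', 'img_b'], ['img_c', 'img_c']]
print(real_counts)  # [2, 1]
```

Each inner list could then be passed to model.detect(), keeping only the first real_counts[i] results of batch i.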

For each input image, the detect() method returns a dictionary that holds information about the detected objects. To get the information about the first image fed to the model, index 0 is used with the variable r.

r = r[0]

There are 4 elements in the dictionary, with the following keys:

  1. rois: The boxes around each detected object.
  2. class_ids: The class IDs for the objects.
  3. scores: The class scores for each object.
  4. masks: The masks.

Printing the keys of the dictionary confirms this:

print(r.keys())
dict_keys(['rois', 'class_ids', 'scores', 'masks'])
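The entries of the dictionary line up by position: the i-th box, class ID, and score all describe the same object. The sketch below walks over a hand-made stand-in dictionary to show how they can be combined into readable text. The data is invented, summarize_detections is a hypothetical helper, and the box order (y1, x1, y2, x2) is an assumption that should be checked against the project's detect() docstring.

```python
def summarize_detections(result, class_names, min_score=0.0):
    """Return one text line per detection whose score is at least min_score."""
    lines = []
    for box, class_id, score in zip(result["rois"],
                                    result["class_ids"],
                                    result["scores"]):
        if score < min_score:
            continue
        y1, x1, y2, x2 = box  # assumed box order
        lines.append("{}: {:.2f} at (y1={}, x1={}, y2={}, x2={})".format(
            class_names[class_id], score, y1, x1, y2, x2))
    return lines


# Invented values shaped like the dictionary described above (masks omitted).
fake_result = {
    "rois": [[10, 20, 110, 220], [5, 5, 50, 60]],
    "class_ids": [1, 3],
    "scores": [0.98, 0.42],
}
names = ["BG", "person", "bicycle", "car"]

summary = summarize_detections(fake_result, names, min_score=0.5)
print(summary)  # ['person: 0.98 at (y1=10, x1=20, y2=110, x2=220)']
```

The class ID is used as an index into the list of class names, which is why the background label must sit at position 0 of that list.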

Visualize the Results

Once the detect() method completes, it’s time to visualize the detected objects. The mrcnn.visualize script is used for this purpose. The mrcnn.visualize.display_instances() function is used for displaying the detection boxes, masks, class names, and scores.

Among the parameters accepted by this function, the following are used:

  • image: The image on which the detection boxes and masks are drawn.
  • boxes: The detection boxes.
  • masks: The detected masks.
  • class_ids: Detected class IDs.
  • class_names: A list of class names in the dataset.
  • scores: Prediction scores for each object.

In the following code, the class names are prepared in the CLASS_NAMES list. Note that the class label of the first class is BG for the background. The mrcnn.visualize.display_instances() function is called to show the annotated image.

import mrcnn.visualize

CLASS_NAMES = ['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
               'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
               'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
               'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
               'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
               'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
               'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon',
               'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
               'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant',
               'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
               'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
               'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
               'hair drier', 'toothbrush']

# r already holds the dictionary for the first image (r = r[0] was applied
# earlier), so it must not be indexed with 0 a second time.
mrcnn.visualize.display_instances(image=image,
                                  boxes=r['rois'],
                                  masks=r['masks'],
                                  class_ids=r['class_ids'],
                                  class_names=CLASS_NAMES,
                                  scores=r['scores'])

After the function is executed, a figure is shown (as seen below) on which the boxes, masks, class scores, and labels are drawn.

image

Up to this point, all the steps required to use the Mask_RCNN project to detect objects have been discussed.

Complete Code for Prediction

The complete code to use the Mask_RCNN project to detect objects in an image is listed below.

import mrcnn
import mrcnn.config
import mrcnn.model
import mrcnn.visualize
import cv2
import os

CLASS_NAMES = ['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
               'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign',
               'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
               'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
               'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite',
               'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
               'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon',
               'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
               'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant',
               'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
               'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink',
               'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
               'hair drier', 'toothbrush']

class SimpleConfig(mrcnn.config.Config):
    NAME = "coco_inference"
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    NUM_CLASSES = len(CLASS_NAMES)

# Build the model in inference mode and load the COCO weights.
model = mrcnn.model.MaskRCNN(mode="inference",
                             config=SimpleConfig(),
                             model_dir=os.getcwd())
model.load_weights(filepath="mask_rcnn_coco.h5", by_name=True)

# Read the image and convert it from BGR to RGB.
image = cv2.imread("sample2.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Detect objects and keep the result for the first (and only) image.
r = model.detect([image], verbose=0)
r = r[0]

mrcnn.visualize.display_instances(image=image,
                                  boxes=r['rois'],
                                  masks=r['masks'],
                                  class_ids=r['class_ids'],
                                  class_names=CLASS_NAMES,
                                  scores=r['scores'])

Up to this point, the code for making predictions is complete. The remainder of the tutorial discusses how to train the Mask R-CNN model using a custom training dataset. The next section downloads the dataset.

Download the Training Dataset

You must have a dataset to train a machine learning or deep learning model. For each sample in the training data, there might be ground-truth data. This data might be simple, like a class label, or complex, like what is used for an object detection model.

Generally, the ground-truth data for object detection models includes a bounding box and a class label for each object within the image. Specifically for the Mask R-CNN model, there is an additional mask that marks the pixels belonging to the object.

Each image might have more than 1 object, and thus preparing the ground-truth data for an entire dataset is tiresome.

In this section, an existing dataset of Kangaroo images is used to train Mask R-CNN using the Mask_RCNN project. The Kangaroo dataset can be downloaded here. It comes with annotation data (i.e. ground-truth data) and thus it is ready to use.

The next figure shows an image from the dataset where the predicted bounding box, mask, and class are drawn. Note that the mask is not accurate, as the model was trained for just a single epoch.

image

The dataset comes with 2 folders:

  1. images: The images in the dataset.
  2. annots: The annotations for each image as a separate XML file.

The next section prepares the dataset for later use to train and validate the Mask R-CNN model.

Prepare the Training Dataset

The Mask_RCNN project has a class named Dataset within the mrcnn.utils module. This class simply stores information about all training images within lists. When the details of all the images are stored in a single data structure, it is easier to manage the dataset.

For example, there is a list named class_info which holds information about each class within the dataset. Similarly, a list named image_info holds information about each image. To train the Mask R-CNN model, the image_info list is used to retrieve the training images and their annotations. The annotations include the bounding boxes and class labels for all objects within each image.

The mrcnn.utils.Dataset class has a number of useful methods, which include:

  • add_class(): Adds a new class.
  • add_image(): Adds a new image to the dataset.
  • image_reference(): The reference (e.g. path or link) by which the image is retrieved.
  • prepare(): After adding all the classes and images to the dataset, this method prepares the dataset for use.
  • source_image_link(): Returns the path or link of the image.
  • load_image(): Reads and returns an image.
  • load_mask(): Loads the masks for the objects in an image.

The next code block creates an empty subclass of mrcnn.utils.Dataset named KangarooDataset.

import mrcnn.utils

class KangarooDataset(mrcnn.utils.Dataset):
    pass

Within the new class, feel free to override any of the previously mentioned methods if it needs customization. Also, add any new method that might help.

Out of all the previously listed methods, the load_mask() method must be overridden. The reason is that retrieving the objects’ masks differs based on the annotation file format, and thus there is no single way to load the masks. As a result, loading the masks is a task that the developer must do.

In the next code block, we’ll build 3 methods:

  1. load_dataset(): It accepts the directory in which the images and annots folders exist, in addition to a Boolean parameter representing whether the directory refers to the training or validation data.
  2. load_mask(): This method loads the masks of the Kangaroo dataset. It accepts the image ID in the image_id parameter. The image ID is just a unique value for each image. Feel free to assign the IDs of your choice. The method returns the masks and the class IDs of each object. Note that this dataset has a single class representing Kangaroos.
  3. extract_boxes(): The load_mask() method calls the extract_boxes() method which is responsible for returning the coordinates of each bounding box, in addition to the width and height of each image.

The implementation of each of these 3 methods is listed in the next block of code.

The first line in the load_dataset() method calls the add_class() method to create a class named kangaroo with ID 1. There is another class with ID 0, which is the background with the label BG. We do not have to add it explicitly because it exists by default. The last line calls the add_image() method to add an image to the dataset. The image ID, path, and the path of the annotation file are passed to this method.

The load_dataset() method splits the dataset so that images whose numeric ID is below 150 are used for training, while the rest are used for testing.

import os
import xml.etree.ElementTree
from numpy import zeros, asarray
import mrcnn.utils

class KangarooDataset(mrcnn.utils.Dataset):

    def load_dataset(self, dataset_dir, is_train=True):
        # Register the kangaroo class with ID 1 (ID 0 is the background).
        self.add_class("dataset", 1, "kangaroo")

        images_dir = dataset_dir + '/images/'
        annotations_dir = dataset_dir + '/annots/'

        for filename in os.listdir(images_dir):
            # Drop the 4-character extension to get the image ID.
            image_id = filename[:-4]

            # This image ID is skipped.
            if image_id in ['00090']:
                continue

            # IDs below 150 go to training, the rest to validation.
            if is_train and int(image_id) >= 150:
                continue
            if not is_train and int(image_id) < 150:
                continue

            img_path = images_dir + filename
            ann_path = annotations_dir + image_id + '.xml'

            self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

    def extract_boxes(self, filename):
        # Parse the XML annotation file.
        tree = xml.etree.ElementTree.parse(filename)
        root = tree.getroot()

        boxes = list()
        for box in root.findall('.//bndbox'):
            xmin = int(box.find('xmin').text)
            ymin = int(box.find('ymin').text)
            xmax = int(box.find('xmax').text)
            ymax = int(box.find('ymax').text)
            coors = [xmin, ymin, xmax, ymax]
            boxes.append(coors)

        width = int(root.find('.//size/width').text)
        height = int(root.find('.//size/height').text)
        return boxes, width, height

    def load_mask(self, image_id):
        info = self.image_info[image_id]
        path = info['annotation']
        boxes, w, h = self.extract_boxes(path)

        # One mask channel per bounding box; the box region is filled with 1.
        masks = zeros([h, w, len(boxes)], dtype='uint8')

        class_ids = list()
        for i in range(len(boxes)):
            box = boxes[i]
            row_s, row_e = box[1], box[3]
            col_s, col_e = box[0], box[2]
            masks[row_s:row_e, col_s:col_e, i] = 1
            class_ids.append(self.class_names.index('kangaroo'))

        return masks, asarray(class_ids, dtype='int32')
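The box extraction and mask filling above can be tried without the dataset or NumPy. The sketch below parses a small hand-written annotation, which is invented for this illustration and only mimics the tags that extract_boxes() reads, and then fills one rectangular mask per box using nested lists. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented annotation containing only the tags read by extract_boxes().
SAMPLE_ANNOTATION = """
<annotation>
  <size><width>8</width><height>6</height><depth>3</depth></size>
  <object>
    <name>kangaroo</name>
    <bndbox><xmin>1</xmin><ymin>2</ymin><xmax>5</xmax><ymax>4</ymax></bndbox>
  </object>
  <object>
    <name>kangaroo</name>
    <bndbox><xmin>0</xmin><ymin>0</ymin><xmax>2</xmax><ymax>6</ymax></bndbox>
  </object>
</annotation>
"""


def extract_boxes_from_string(xml_text):
    """Same logic as extract_boxes(), but reading from a string."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for box in root.findall(".//bndbox"):
        boxes.append([int(box.find(tag).text)
                      for tag in ("xmin", "ymin", "xmax", "ymax")])
    width = int(root.find(".//size/width").text)
    height = int(root.find(".//size/height").text)
    return boxes, width, height


def box_to_mask(box, width, height):
    """Build a height x width grid with 1 inside the box and 0 elsewhere."""
    xmin, ymin, xmax, ymax = box
    return [[1 if (ymin <= row < ymax and xmin <= col < xmax) else 0
             for col in range(width)]
            for row in range(height)]


boxes, width, height = extract_boxes_from_string(SAMPLE_ANNOTATION)
masks = [box_to_mask(b, width, height) for b in boxes]
print(boxes)  # [[1, 2, 5, 4], [0, 0, 2, 6]]
```

Since the masks are derived from the bounding boxes, each one is a filled rectangle rather than a pixel-accurate outline of the object.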

Based on the KangarooDataset class, the following code prepares the training dataset. A new instance of the class is simply created. To load the images, the load_dataset() method is called, which accepts the path of the dataset images and its annotations in the dataset_dir parameter, in addition to the is_train flag. If this flag is True, then the data is regarded as training data. Otherwise, the data is used for validation or testing.

The prepare() method is called to prepare the dataset for use. It just creates more attributes about the data, like the number of classes, number of images, and more.

train_set = KangarooDataset()
train_set.load_dataset(dataset_dir=r'D:\kangaroo', is_train=True)
train_set.prepare()

Similarly, the validation dataset is prepared according to the following code. The only difference is that the is_train flag is set to False.

valid_dataset = KangarooDataset()
valid_dataset.load_dataset(dataset_dir=r'D:\kangaroo', is_train=False)
valid_dataset.prepare()
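The ID-based split performed by load_dataset() can be checked in isolation. The sketch below applies the same rules to a handful of invented file names; split_filenames is a hypothetical helper, and the threshold of 150 and the skipped ID 00090 are taken from the code above.

```python
def split_filenames(filenames, is_train, threshold=150, skip_ids=("00090",)):
    """Select file names for the training or validation subset by numeric ID."""
    selected = []
    for filename in sorted(filenames):
        image_id = filename[:-4]  # drop the 4-character extension
        if image_id in skip_ids:
            continue
        is_train_image = int(image_id) < threshold
        if is_train_image == is_train:
            selected.append(filename)
    return selected


# Invented file names following the zero-padded pattern used in the code.
names = ["00001.jpg", "00090.jpg", "00149.jpg", "00150.jpg", "00183.jpg"]

train_files = split_filenames(names, is_train=True)
valid_files = split_filenames(names, is_train=False)
print(train_files)  # ['00001.jpg', '00149.jpg']
print(valid_files)  # ['00150.jpg', '00183.jpg']
```

The two subsets never overlap, and the skipped ID appears in neither of them.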

The next section prepares some configuration parameters for the model.

Prepare Model Configuration

A subclass of the mrcnn.config.Config class must be created to hold the model configuration parameters. The next code creates a new class named KangarooConfig, which extends the mrcnn.config.Config class.

Note that the number of classes (NUM_CLASSES) is set to 2 because the dataset has 2 classes only, which are BG (for the background) and kangaroo.

import mrcnn.config

class KangarooConfig(mrcnn.config.Config):
    NAME = "kangaroo_cfg"
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    NUM_CLASSES = 2
    STEPS_PER_EPOCH = 131

After the dataset and the model configuration are prepared, the next section discusses training the Mask R-CNN model using TensorFlow 1.14.

Train Mask R-CNN in TensorFlow 1.14

This section assumes that a TensorFlow 1.x version (such as 1.14) is installed and used to run the code mentioned. It is also possible to create a virtual environment in which TensorFlow 1.x is installed.

The next code creates an instance of the mrcnn.model.MaskRCNN class, which builds the architecture of the Mask R-CNN model. The mode parameter is set to 'training' to indicate that the model will be trained. When the model is loaded for training, there are extra input layers compared to loading the model just for inference (i.e. prediction). The extra layers hold the input images and their annotations (e.g. bounding boxes).

import mrcnn.model

model = mrcnn.model.MaskRCNN(mode='training',
                             model_dir='./',
                             config=KangarooConfig())

Once the model architecture is created, the weights are loaded using the load_weights() method according to the next code. This method accepts the following 3 parameters:

  1. filepath: The path of the weights file. The mask_rcnn_coco.h5 file can be downloaded from this link.
  2. by_name: Whether to assign the layers’ weights according to their names.
  3. exclude: The names of the layers whose weights are not loaded. These are the layers at the head of the architecture, which change based on the problem type (e.g. number of classes).

The excluded layers are those responsible for producing the class probabilities, bounding boxes, and masks.

model.load_weights(filepath='mask_rcnn_coco.h5',
                   by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])

After loading the weights into the model layers, the model is trained using the train() method. Note that the layers argument is set to heads to indicate that only the layers at the head of the architecture are trained. You can also specify the layer names to be trained.

Notice that the exclude parameter in the load_weights() method accepts the names of the layers that will not have their weights loaded, but the layers parameter in the train() method accepts the names of the layers to be trained.

model.train(train_dataset=train_set,
            val_dataset=valid_dataset,
            learning_rate=KangarooConfig().LEARNING_RATE,
            epochs=10,
            layers='heads')

model_path = 'Kangaroo_mask_rcnn.h5'
model.keras_model.save_weights(model_path)
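The contrast between exclude and layers can be illustrated with plain lists of names. In the sketch below, exclude removes names from the set whose weights are loaded, while a name pattern selects the layers that are trained. The layer names are illustrative, the helper functions are hypothetical, and the pattern standing in for 'heads' is an assumption about how head layers might be matched; the project's actual selection logic lives in model.py.

```python
import re

# Illustrative layer names, not a listing of the real model.
LAYER_NAMES = ["conv1", "res2a_branch2a", "fpn_c5p5", "rpn_model",
               "mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_mask"]

EXCLUDE = ["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"]

# Assumed pattern for head layers; treat it as a placeholder.
HEADS_PATTERN = r"(mrcnn_.*)|(rpn_.*)|(fpn_.*)"


def layers_to_load(names, exclude):
    """Names whose weights would be loaded: everything not excluded."""
    return [name for name in names if name not in exclude]


def layers_to_train(names, pattern):
    """Names that would be trained: those fully matching the pattern."""
    return [name for name in names if re.fullmatch(pattern, name)]


loaded = layers_to_load(LAYER_NAMES, EXCLUDE)
trained = layers_to_train(LAYER_NAMES, HEADS_PATTERN)
print(loaded)   # ['conv1', 'res2a_branch2a', 'fpn_c5p5', 'rpn_model']
print(trained)  # ['fpn_c5p5', 'rpn_model', 'mrcnn_class_logits', 'mrcnn_bbox_fc', 'mrcnn_mask']
```

In this toy listing, the excluded layers start without pre-trained weights and are among the layers that get trained, which matches the intent described above.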

This is the complete codification for training a model.

import os
import xml.etree.ElementTree
from numpy import zeros, asarray

import mrcnn.utils
import mrcnn.config
import mrcnn.model

class KangarooDataset(mrcnn.utils.Dataset):

    def load_dataset(self, dataset_dir, is_train=True):
        self.add_class("dataset", 1, "kangaroo")

        images_dir = dataset_dir + '/images/'
        annotations_dir = dataset_dir + '/annots/'

        for filename in os.listdir(images_dir):
            image_id = filename[:-4]

            if image_id in ['00090']:
                continue

            if is_train and int(image_id) >= 150:
                continue
            if not is_train and int(image_id) < 150:
                continue

            img_path = images_dir + filename
            ann_path = annotations_dir + image_id + '.xml'

            self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

    def extract_boxes(self, filename):
        tree = xml.etree.ElementTree.parse(filename)
        root = tree.getroot()

        boxes = list()
        for box in root.findall('.//bndbox'):
            xmin = int(box.find('xmin').text)
            ymin = int(box.find('ymin').text)
            xmax = int(box.find('xmax').text)
            ymax = int(box.find('ymax').text)
            coors = [xmin, ymin, xmax, ymax]
            boxes.append(coors)

        width = int(root.find('.//size/width').text)
        height = int(root.find('.//size/height').text)
        return boxes, width, height

    def load_mask(self, image_id):
        info = self.image_info[image_id]
        path = info['annotation']
        boxes, w, h = self.extract_boxes(path)
        masks = zeros([h, w, len(boxes)], dtype='uint8')

        class_ids = list()
        for i in range(len(boxes)):
            box = boxes[i]
            row_s, row_e = box[1], box[3]
            col_s, col_e = box[0], box[2]
            masks[row_s:row_e, col_s:col_e, i] = 1
            class_ids.append(self.class_names.index('kangaroo'))

        return masks, asarray(class_ids, dtype='int32')

class KangarooConfig(mrcnn.config.Config):
    NAME = "kangaroo_cfg"
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    NUM_CLASSES = 2
    STEPS_PER_EPOCH = 131

# Training subset.
train_set = KangarooDataset()
train_set.load_dataset(dataset_dir='kangaroo', is_train=True)
train_set.prepare()

# Validation subset.
valid_dataset = KangarooDataset()
valid_dataset.load_dataset(dataset_dir='kangaroo', is_train=False)
valid_dataset.prepare()

kangaroo_config = KangarooConfig()

# Build the model in training mode and load the COCO weights,
# skipping the head layers that depend on the number of classes.
model = mrcnn.model.MaskRCNN(mode='training',
                             model_dir='./',
                             config=kangaroo_config)

model.load_weights(filepath='mask_rcnn_coco.h5',
                   by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])

model.train(train_dataset=train_set,
            val_dataset=valid_dataset,
            learning_rate=kangaroo_config.LEARNING_RATE,
            epochs=1,
            layers='heads')

model_path = 'Kangaroo_mask_rcnn.h5'
model.keras_model.save_weights(model_path)

Conclusion

This tutorial introduced the open-source Python project Mask_RCNN, which builds the Mask R-CNN model for object instance segmentation. The project only supports TensorFlow 1.x. This tutorial covered the steps for making predictions, and for training the model on a custom dataset. Before the model is trained, the train and validation datasets are prepared using a child class of the mrcnn.utils.Dataset class. After preparing the model configuration parameters, the model can be trained.
