A Novel Method for Object Detection using Deep Learning and CAD Models
Igor Garcia Ballhausen Sampaio¹, Luigy Machaca¹, José Viterbo¹ᵃ and Joris Guérin²ᵇ
¹Computing Institute, Universidade Federal Fluminense, Brazil
²LAAS-CNRS, ONERA, Université de Toulouse, France
Keywords: Object Detection, CAD Models, Synthetic Image Generation, Deep Learning, Convolutional Neural Network.
Abstract: Object Detection (OD) is an important computer vision problem for industry, which can be used for quality control in production lines, among other applications. Recently, Deep Learning (DL) methods have enabled practitioners to train OD models that perform well on complex real-world images. However, the adoption of these models in industry is still limited by the difficulty and the significant cost of collecting high-quality training datasets. On the other hand, when applying OD to production lines, CAD models of the objects to be detected are often available. In this paper, we introduce a fully automated method that takes a CAD model of an object as input and returns a fully trained OD model for detecting this object. To do this, we created a Blender script that generates realistic labeled datasets of images containing the object, which are then used to train the OD model. The method is validated experimentally on two practical examples, showing that this approach can generate OD models that perform well on real images while being trained only on synthetic images. The proposed method has the potential to facilitate the adoption of object detection models in industry, as it is easy to adapt to new objects and highly flexible. Hence, it can result in significant cost reductions, gains in productivity and improved product quality.
ᵃ https://orcid.org/0000-0002-0339-6624
ᵇ https://orcid.org/0000-0002-8048-8960

Sampaio, I., Machaca, L., Viterbo, J. and Guérin, J.
A Novel Method for Object Detection using Deep Learning and CAD Models.
DOI: 10.5220/0010451100750082
In Proceedings of the 23rd International Conference on Enterprise Information Systems (ICEIS 2021) - Volume 1, pages 75-82
ISBN: 978-989-758-509-8
Copyright © 2021 by SCITEPRESS - Science and Technology Publications, Lda. All rights reserved

1 INTRODUCTION

Recently, Deep Learning (DL) has produced excellent results for Object Detection (OD) (Liu et al., 2020). On the one hand, a typical limitation of DL is the requirement of large labeled datasets for training. Indeed, although various large OD databases are available online, specific industrial applications always require custom datasets containing the objects of interest. While the scenarios present in public datasets are useful from both a research and an application standpoint, it was found that industrial applications, such as bin picking or defect inspection, have quite different characteristics that are not modeled by existing datasets (Drost et al., 2017). As a result, methods that perform well on existing datasets sometimes perform differently when applied to industrial scenarios without retraining. The process of generating a specific dataset for retraining is tedious, and can be error-prone when conducted by non-professional technicians. Moreover, generating a new dataset and labeling it manually can be very time-consuming and expensive (Jabbar et al., 2017).

On the other hand, OD for industrial production lines has the specificity that manufacturers often have access to the CAD models of the objects to detect. Thanks to advances in computer graphics techniques, such as ray tracing (Shirley and Morley, 2003), generating photo-realistic images is now possible. In such artificially generated images, the computer can be used to obtain bounding box labels for free. The use of synthetic images rendered from CAD models to train OD models has already been proposed in (Peng et al., 2015), (Rajpura et al., 2017) and (Hinterstoisser et al., 2018). However, these approaches are not automated, as they require manual scene creation by Blender artists. In addition, the objects used in these works are usually generic, such as buses, airplanes, cars or animals.

The main contribution of this paper is a new method for training OD models on synthetic images generated from CAD models, which is fully automatic and thus well suited for industrial use. The proposed method consists of the automatic generation of realistic labeled images containing the objects to be
detected, followed by the fine-tuning of a pretrained OD model on the artificial dataset. An extensive study is conducted to select the user-defined parameters so that they maximize the performance on real-world images. Our method is evaluated using the CAD models of two industrial objects for training, as well as real labeled images containing the objects for evaluation. The results obtained are very promising, as we manage to obtain F1-scores above 90% on real images while training only on synthetic images.

This paper is organized as follows. Section 2 presents related work in the fields of object detection and deep learning for industry. Section 3 explains the proposed method in detail. Section 4 describes our experiments, presents the results obtained and discusses them. Finally, conclusions and directions for future work are presented in Section 5.

2 RELATED WORK

This section presents related work about OD, industrial applications of DL-based computer vision, as well as computer vision methods using CAD models.

2.1 Object Detection

OD is a challenging computer vision problem that consists in locating instances of objects from predefined categories in natural images (Prasad, 2012). It has many applications in various domains such as autonomous driving, security and medical diagnosis (Xiao et al., 2020). Deep learning techniques have emerged as a powerful strategy for learning characteristic representations directly from data, and have led to significant advances in the field of generic object detection (Liu et al., 2020). In the last decade, many object detection competitions have been held to provide large annotated datasets to the community, and to unify benchmarks and metrics for fair comparison between proposed methods (Everingham et al., 2010), (Lin et al., 2014), (Zhou et al., 2017), (Kuznetsova et al., 2018).

Some examples of OD methods proposed within the last few years include (He et al., 2015), where the authors propose a new network structure, called SPP-net, which can generate a fixed-length representation regardless of the size or scale of the image. Other works, such as (Jana et al., 2018), aim to improve processing speed while still identifying objects in the image efficiently. Finally, deeper CNNs have led to record-breaking improvements in the detection of more general object categories, a shift which came about when DCNNs began to be successfully applied to image classification (Liu et al., 2020).

2.2 OD for Industrial Applications

Although general-purpose OD methods have greatly improved thanks to the availability of large public datasets, the detection of instances in the industrial context must be approached differently, since annotated images are generally unavailable or rare. Indeed, to train a deep learning model, hundreds of annotated images are needed for each object category. Specific datasets need to be collected and annotated for different target applications. This process is time-consuming and laborious, and increases the burden on operators, which goes against the goal of industrial automation (Cohen et al., 2020), (Ge et al., 2020).

A public dataset adapted to the industrial context was developed in (Drost et al., 2017). Unlike other 3D object detection datasets, this work models industrial bin picking and object inspection tasks, which often face different challenges. In addition, the evaluation criteria focus on practical aspects, such as execution times, memory consumption, and useful measures of correctness and precision. Other examples of datasets adapted to the industrial context include (Guérin et al., 2018b) and (Guérin et al., 2018a).

Finally, in (Yang et al., 2019), a method to detect defects of tiny parts in real time was developed, based on object detection and deep learning. To improve their results, the authors take into account specificities of the industrial application, such as the properties of the parts, the environmental parameters and the speed of the conveyor. This is a good example of adapting OD training methods to the specific constraints of the industrial context.

2.3 CAD Models and OD

The first commercial CAD programs appeared in the 1970s, providing functions for 2D drawing and data archival, and evolved into the main engineering design tool (Lindsay et al., 2018), (Hirz et al., 2017). CAD models can provide a scalable solution for intelligent and automatic object recognition, tracking and augmentation based on generic object models (Ben-Himane et al., 2010). For example, CAD models have been used to support multi-view detection (Zhang et al., 2013). In (Peng et al., 2015), 3D models were used as the primary source of information to build object models. In other works, 3D CAD models were used as the only source of labeled data (Lin et al., 2014), (Everingham et al., 2010), but they are limited to generic categories, such as cars and
motorcycles.

3 PROPOSED METHOD

An overview of the proposed method can be seen in Figure 1. First, a custom Blender script is used to generate labeled training images containing the rendered CAD model in context. Then, a pretrained object detection model is fine-tuned on the generated dataset. Finally, the model can be used for inference on real images (Figure 1b).

3.1 Image Generation

For the automatic generation of the training images, the software Blender (Blender Online Community, 2018) is used. Blender is a powerful 3D design software, which includes features such as modeling, rigging, simulation and rendering. Blender has a good Python API, is open-source and has good GPU support.

In order to generate a synthetic training image, our code requires several elements. First, a CAD model of the object of interest, as well as several other industrial CAD models, need to be available. In the experiments of this paper, we use the two objects shown in Figure 2, for which we also have real-world test images. The other objects serve as distractors, to help the model focus on the right object. The CAD models for the distractors are gathered from the Grabcad website¹. Different textures for the distractors and for the background are gathered from the Poliigon website². Finally, the color and texture of the object of interest are reproduced manually.

Once all the elements above are available, the generation code proceeds as follows. A floor and a table are created and some distractors are sampled. Using physics simulation, the distractors are dropped onto the table from a random height. The position of the object of interest is also randomly sampled. Once the 3D scene is created, textures and colors are sampled for the background and the distractors, and the entire scene is textured. Light sources and cameras are also sampled and placed randomly. Constraints on the camera pose are applied to ensure that the object appears in the camera view. Once the scene has been created, rendering generates an image. By removing the light sources and making the object of interest a light source itself, we can generate a second image, which is used for bounding box labeling. This procedure is necessary because, even if we know the location of the object, it can be partly hidden by distractors, which would distort the labeling. Example images generated using our Blender code can be seen in Figure 3.

3.2 Model Training

In this work, we did not train a new CNN architecture from scratch. Instead, we used one of the pretrained models provided by the TensorFlow Object Detection API (Huang et al., 2017). This approach is called transfer learning: training starts from a model that already has basic feature extraction skills and is therefore less likely to overfit the synthetic datasets. Indeed, the diversity we can create with Blender is limited, as we cannot obtain an infinite number of textures and distractors, and the diversity already encountered by the network during pre-training can help reduce overfitting. In addition, using a network pretrained on real images can prevent it from learning detection features that depend too much on the generation procedure.

There exist several models in the TensorFlow OD model zoo. More information on the detection performance, as well as the reference execution times, of each available pretrained model can be found on the GitHub page of the API³. In practice, the model used in this paper is faster_rcnn_inception_v2_coco, which provides a good trade-off between performance and speed.

Faster R-CNN, the model used in this work, takes as input an entire image and a set of object proposals, which in Faster R-CNN are produced by a built-in Region Proposal Network (Ren et al., 2015). The network first processes the whole image with several convolutional and max pooling layers to produce a convolutional feature map. Then, for each object proposal, a region of interest (RoI) pooling layer extracts a fixed-length feature vector from the feature map. Each feature vector is fed into a sequence of fully connected layers that finally branch into two sibling output layers: one that produces softmax probability estimates over K object classes plus a catch-all "background" class, and another that outputs four real-valued numbers for each of the K object classes. Each set of 4 values encodes refined bounding-box positions for one of the K classes. For a more detailed view of Faster R-CNN, we refer the reader to the original paper (Ren et al., 2015) or to the following tutorial (Ananth, 2019).

To train the final OD model, the TensorFlow OD API requires a specific file structure for the training images and labels. This step is carried out automatically by our script.
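Section 3.1 obtains the label from a second render in which the object of interest is the only light source. The sketch below shows one way such a render could be turned into a bounding box. It assumes the label render has already been loaded as a NumPy array (Blender export and file loading are omitted), and the function name `bbox_from_emission_render` and the 0.5 brightness threshold are hypothetical choices, not the authors' script. Because pixels hidden by distractors stay dark, the box only covers the visible part of the object.

```python
import numpy as np


def bbox_from_emission_render(mask_image, threshold=0.5):
    """Return (xmin, ymin, xmax, ymax) of the bright pixels of a label render.

    Assumption: `mask_image` was rendered with all scene lights removed and the
    object of interest as the only (emissive) light source, with values in [0, 1],
    so only its visible pixels are bright. Returns None if nothing is visible.
    """
    img = np.asarray(mask_image, dtype=float)
    if img.ndim == 3:  # RGB(A) render: keep the brightest color channel
        img = img[..., :3].max(axis=2)
    visible = img > threshold
    if not visible.any():  # object fully occluded or outside the view
        return None
    rows = np.flatnonzero(visible.any(axis=1))  # image rows containing the object
    cols = np.flatnonzero(visible.any(axis=0))  # image columns containing the object
    # Inclusive, 0-based pixel indices.
    return int(cols[0]), int(rows[0]), int(cols[-1]), int(rows[-1])


# Toy example: the object covers rows 1-3 and columns 2-5 of a 6x8 render,
# but a distractor in front hides column 5, so the label shrinks accordingly.
render = np.zeros((6, 8))
render[1:4, 2:6] = 1.0
render[:, 5] = 0.0
print(bbox_from_emission_render(render))  # (2, 1, 4, 3)
```

Returning `None` gives a generation script a simple way to discard or resample scenes in which the object ended up completely hidden.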
¹ https://grabcad.com/
² https://www.poliigon.com/
³ https://github.com/tensorflow/models
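Section 3.1 states that constraints on the camera pose ensure that the object appears in the view, without detailing them. One possible constraint, sketched here as an assumption rather than the authors' implementation, is rejection sampling: draw a camera position on a spherical shell around the object, aim it at the object's center, and keep it only if all eight corners of the object's bounding box project inside the image of a pinhole camera. The intrinsics (`focal_px`), the radius and elevation ranges, and all function names are hypothetical; the 960x540 default simply mirrors the resolution selected in Section 4.

```python
import numpy as np


def look_at_rotation(cam_pos, target, up=(0.0, 0.0, 1.0)):
    """World-to-camera rotation for a camera at `cam_pos` looking at `target`.

    Camera frame (an assumption of this sketch): x right, y down, z forward,
    so points in front of the camera have z > 0.
    """
    forward = np.asarray(target, float) - np.asarray(cam_pos, float)
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, up)
    right /= np.linalg.norm(right)
    down = np.cross(forward, right)
    return np.stack([right, down, forward])  # rows = camera axes in world coordinates


def object_fully_visible(corners_world, cam_pos, target, focal_px, width, height):
    """True if every 3D corner projects inside a width x height pinhole image."""
    rotation = look_at_rotation(cam_pos, target)
    pts_cam = (np.asarray(corners_world, float) - np.asarray(cam_pos, float)) @ rotation.T
    if np.any(pts_cam[:, 2] <= 0):  # a corner is behind the camera
        return False
    u = focal_px * pts_cam[:, 0] / pts_cam[:, 2] + width / 2
    v = focal_px * pts_cam[:, 1] / pts_cam[:, 2] + height / 2
    return bool(np.all((u >= 0) & (u < width) & (v >= 0) & (v < height)))


def sample_camera(corners_world, rng, focal_px=800.0, width=960, height=540,
                  radius_range=(0.5, 2.0), max_tries=100):
    """Rejection-sample a camera position that keeps the whole object in view."""
    corners_world = np.asarray(corners_world, float)
    target = corners_world.mean(axis=0)  # aim at the object's center
    for _ in range(max_tries):
        radius = rng.uniform(*radius_range)
        azimuth = rng.uniform(0.0, 2.0 * np.pi)
        # Stay above the table and avoid looking straight down (degenerate look-at).
        elevation = rng.uniform(np.radians(10.0), np.radians(80.0))
        offset = radius * np.array([np.cos(elevation) * np.cos(azimuth),
                                    np.cos(elevation) * np.sin(azimuth),
                                    np.sin(elevation)])
        cam_pos = target + offset
        if object_fully_visible(corners_world, cam_pos, target, focal_px, width, height):
            return cam_pos
    return None  # caller may resample the scene instead
```

Requiring every corner to be inside the frame is conservative; a looser variant could accept partially visible objects, which the emission-based labeling of Section 3.1 would still box correctly.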
Figure 1: Overview of the proposed method for training an object detection network using a CAD model: (a) training; (b) inference.
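As described in Section 3.2, the box-regression branch of Faster R-CNN outputs four values per class that refine each proposal. A minimal NumPy sketch of the center/log-size parameterization of Ren et al. (2015) is given below. Per-coordinate scale factors that some implementations apply to these offsets are omitted, treating column 0 of the score matrix as the background class is an assumption, and both function names are hypothetical.

```python
import numpy as np


def decode_boxes(proposals, deltas):
    """Apply Faster R-CNN-style regression offsets to proposal boxes.

    proposals: (N, 4) array of [xmin, ymin, xmax, ymax] proposals.
    deltas:    (N, 4) array of [tx, ty, tw, th] offsets predicted for one class.
    Returns an (N, 4) array of refined [xmin, ymin, xmax, ymax] boxes.
    """
    proposals = np.asarray(proposals, float)
    deltas = np.asarray(deltas, float)
    # Proposal in center/size form.
    wa = proposals[:, 2] - proposals[:, 0]
    ha = proposals[:, 3] - proposals[:, 1]
    xa = proposals[:, 0] + 0.5 * wa
    ya = proposals[:, 1] + 0.5 * ha
    tx, ty, tw, th = deltas.T
    # Center shifts are relative to the proposal size; sizes are scaled in log space.
    x = xa + tx * wa
    y = ya + ty * ha
    w = wa * np.exp(tw)
    h = ha * np.exp(th)
    return np.stack([x - 0.5 * w, y - 0.5 * h, x + 0.5 * w, y + 0.5 * h], axis=1)


def select_class_boxes(proposals, class_scores, class_deltas):
    """For each proposal, keep the best non-background class and its refined box.

    class_scores: (N, K + 1) softmax output, column 0 = background (assumed).
    class_deltas: (N, K, 4) regression output, one set of 4 values per class.
    Returns (class index in 0..K-1, its score, refined box) per proposal.
    """
    class_scores = np.asarray(class_scores, float)
    class_deltas = np.asarray(class_deltas, float)
    idx = np.arange(class_scores.shape[0])
    best = np.argmax(class_scores[:, 1:], axis=1)  # skip the background column
    deltas = class_deltas[idx, best]               # (N, 4) offsets of the chosen class
    return best, class_scores[idx, best + 1], decode_boxes(proposals, deltas)
```

With zero offsets the proposal is returned unchanged, and `tw = log(2)` doubles its width around the shifted center.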
Figure 2: CAD models used in our experiments: (a) Adblue; (b) Yamaha logo.

Figure 3: Examples of images generated using our custom Blender script: (a) Adblue; (b) Yamaha logo.

3.3 Parameter Selection Procedure

The Blender script used for image generation has many hyperparameters that must be chosen before using it, such as the number of distractors, the number of scenes generated or the resolution of the synthetic images. Hence, we conduct a set of experiments to select these parameters so as to optimize the OD results for inference on real images. In this section, we explain the parameter selection procedure; in other words, we present the dataset on which the different sets of parameters were evaluated, as well as the metrics used to assess the quality of the results obtained with a given set of parameters. The results obtained for this parameter selection procedure are presented in Section 4.

3.3.1 Test Dataset

The objective of this work is to validate that an object detector trained on synthetic images can generalize to real-world industrial cases. Hence, we use a test dataset composed of 380 real images containing the objects corresponding to the CAD models used for training. The bounding box annotation files for the test images are generated manually using a software called LabelImg⁴. This application allows us to draw and save the annotations of each image as XML files in the PASCAL VOC format (Everingham et al., 2010).

⁴ https://github.com/tzutalin/labelImg

3.3.2 OD Metrics

In order to evaluate the quality of the trained model on real images, and thus to be able to select the best hyperparameters for image generation and training, we used the standard OD metrics presented here.

Intersection over Union (IoU). IoU is an evaluation metric used to measure how well a predicted bounding box matches a ground-truth bounding box. For a pair of bounding boxes, IoU is defined as the area of the intersection divided by the area of the union (Figure 4). If A corresponds to the ground-truth box and B to the predicted box, IoU is computed as:

$$\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|}, \qquad (1)$$

where |·| denotes the area of a given shape. The numerator is called the overlap area and the denominator the combined area. IoU ranges between 0 and 1, where 1 means that the bounding boxes are identical and 0 that there is no overlap.

Figure 4: Intersection over Union (IoU) computation.

Precision, Recall, F1-Measure. We call confidence score the probability that an anchor box contains an object of a certain class. It is usually predicted by the classifier part of the object detector. The confidence score and IoU are used as the criteria to determine whether a detection is a true positive or a false positive. Given a minimal threshold on the confidence score for bounding box acceptance, and another threshold on IoU to identify matching boxes, a detection is considered a true positive (TP) if there exists a ground truth such that: $\text{confidence score} > \text{threshold}_{\text{conf}}$; the predicted class matches the class of the ground truth; and $\mathrm{IoU} > \text{threshold}_{\mathrm{IoU}}$. Detections below the confidence threshold are discarded, and an accepted detection that violates either of the last two conditions is a false positive (FP). In case multiple predictions correspond to the same ground truth, only the one with the highest confidence score counts as a true positive, while the others are considered false positives. When a ground-truth bounding box is left without any matching predicted detection, it counts as a false negative (FN).

If we denote by TP, FP and FN respectively the numbers of true positives, false positives and false negatives in a dataset, we can define the following metrics:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad (2)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad (3)$$

$$F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}. \qquad (4)$$

A high precision means that most of the predicted boxes have a corresponding ground truth, i.e., the object detector is not producing bad predictions. A high recall means that most of the ground-truth boxes have a corresponding prediction, i.e., the object detector finds most objects in the images. The F1-score is the harmonic mean of precision and recall; it is needed when a balance between precision and recall is sought.

In the case of object detection on production lines, a low precision means that the model may detect a part that is actually absent (a false positive), so the missing part goes unnoticed, whereas a low recall means that the model may miss a part that is actually present (a false negative) and raise an alert anyway. For this reason, both good recall and good precision are required, and the choice of the F1-score metric seems appropriate.

Average Precision. After an OD model has been trained, the computation of precision, recall and F1-score depends on the values of the two thresholds defined above (for the confidence score and IoU). In order to choose the values of these thresholds properly, it is useful to analyze Precision x Recall curves. For each class, and for a given value of the IoU threshold, the confidence threshold is treated as a variable and swept between 0 and 1 to plot a parametric curve with recall on the x-axis and precision on the y-axis.

A class-specific object detector is considered good if its precision remains high as its recall increases, meaning that precision and recall both stay high when the confidence threshold varies. Hence, to compare curves, we generally rely on a numerical metric called Average Precision (AP). Since 2010, the
standard computation method for AP consists in calculating the area under the curve (AUC) of the Precision x Recall curve (Everingham et al., 2010).

4 EXPERIMENTS AND RESULTS

The results obtained for the parameter selection procedure, as well as our final evaluations, are presented here. These experiments were conducted on an Nvidia Quadro P5000 GPU and a 2.90 GHz Intel Xeon E3-154M v5 processor (16 GB of RAM).

4.1 Hyperparameter Tuning

The parameter selection procedure is conducted exclusively on the Yamaha logo object; the best set of parameters is then tested on the Adblue object to ensure that it also performs well. We study here the influence of four tunable parameters on the final results. For each parameter, three values were selected for the tests. These parameters and their studied values are:

• Resolution: 640x480, 960x540, 1080x720
• Camera poses: 2, 5, 20
• Number of scenes: 20, 50, 200
• Number of distractors: 0, 5, 20

From simple preliminary experiments that are not presented here, we concluded that the number of textures used for the floor, the distractors and the support should be set to the maximum number of textures available (in our case, 7 for the floor and 6 for each of the other two). The parameter values used in this work were chosen empirically, that is, after several test scenarios, these values were the ones that generated the best performance with respect to the metrics.

In total, from the values selected for the four parameters, we sampled more than 30 combinations and compared the OD results on the test set of real images. For each combination tested, we trained the Faster R-CNN model on the synthetic images that were generated. For each hyperparameter combination, the experiments were repeated 10 times in order to attenuate the influence of the random components in the generation and training process. For reasons of space, it is not possible to present all results in this article. However, in order to demonstrate the importance of this parameter selection step, Table 1 shows the best and the worst configurations that were tested.

Table 1: Best and worst hyperparameter configurations obtained and their corresponding results.

Parameters                        Best Case        Worst Case
Resolution                        960x540          960x540
Camera poses                      5                5
Number of scenes                  20               20
Number of images                  100              100
Number of distractors             20               0
Generation time (s)               1257.30          749.92
Number of samples (runs)          10               10
Precision %: Avg (Std. Dev.)      78.06 (15.41)    57.34 (16.27)
Recall %: Avg (Std. Dev.)         96.22 (4.07)     90.71 (6.45)
F1-Score %: Avg (Std. Dev.)       85.19 (10.57)    66.80 (11.49)

In Table 1, we can see that distractors are an essential element of our image generation pipeline. Indeed, when removing them, we can see a drop of around 23% in F1-score, on average across the 10 experiment samples. We also tried combinations with few distractors, but the F1-scores dropped significantly. This makes sense, as the real images evaluated contained several distractors as well.

Another important point is that the resolution of the generated images should be greater than that of the inference images. In all of our tests, this scenario always produced the best results. This also makes sense, as it is easier to learn from more detailed and complex images and then evaluate in a less detailed and complex scenario.

Finally, we also tried to increase the number of generated training images to see if this would lead to an increase in performance. Surprisingly, we observed that the performance dropped for the case with 20 distractors, 20 camera poses and 50 scenes (1000 images). This might mean that, when presented with too many synthetic images, the model starts overfitting to the biases introduced by our generation process; it also indicates that we do not need a large number of images to train our model. In addition to this performance drop, generating ten times more images also makes the proposed pipeline almost 25 times slower (31037.16 seconds).

4.2 Results

In this section, we evaluate the results of the best combination of parameters (Best Case in Table 1) in more detail. These results are presented in Table 2 and correspond to a confidence threshold of 0.9 and an IoU threshold of 0.5. From Table 2, we can see that the best parameters identified using only the Yamaha logo produce similar results when applied to another object (Adblue). This suggests that the parameters proposed for our method are well suited to different objects and thus could generalize well to various industrial use cases without additional parameter tuning.
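Table 2 is computed with a confidence threshold of 0.9 and an IoU threshold of 0.5. The plain-Python sketch below shows how those two thresholds enter the TP/FP/FN counting and Eqs. (1) to (4) of Section 3.3.2 for a single image. Boxes are assumed to be `(xmin, ymin, xmax, ymax)` tuples and all function names are hypothetical. Following the PASCAL VOC convention, each accepted detection is compared with its best-overlapping ground truth of the same class, and a lower-confidence duplicate of an already matched object counts as a false positive.

```python
def iou(a, b):
    """Eq. (1): IoU of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy                                   # overlap area
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)  # combined area
    return inter / union if union > 0 else 0.0


def count_matches(detections, ground_truths, conf_thr=0.9, iou_thr=0.5):
    """Count (TP, FP, FN) for one image.

    detections:    list of (box, class_id, confidence)
    ground_truths: list of (box, class_id)
    """
    # Discard low-confidence boxes, then process the most confident first.
    kept = sorted((d for d in detections if d[2] > conf_thr), key=lambda d: -d[2])
    matched = set()
    tp = fp = 0
    for box, cls, _ in kept:
        candidates = [(iou(box, g_box), j)
                      for j, (g_box, g_cls) in enumerate(ground_truths) if g_cls == cls]
        best_iou, best_j = max(candidates, default=(0.0, None))
        if best_iou > iou_thr and best_j not in matched:
            matched.add(best_j)
            tp += 1
        else:  # no overlapping ground truth, or a duplicate of a matched one
            fp += 1
    fn = len(ground_truths) - len(matched)  # ground truths left unmatched
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Eqs. (2)-(4); a metric is reported as 0.0 when its denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For a whole test set, the per-image counts would be summed before applying `precision_recall_f1`. For example, one correct box, one duplicate and one spurious box against two ground truths give (TP, FP, FN) = (1, 2, 1), hence a precision of 1/3, a recall of 1/2 and an F1-score of 0.4.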
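Section 3.3.2 defines AP as the area under the Precision x Recall curve obtained by sweeping the confidence threshold. Sorting detections by decreasing confidence and walking down the list is equivalent to lowering that threshold one detection at a time. The sketch below computes the all-point interpolated area used by PASCAL VOC since 2010. It assumes the TP/FP decision for each detection has already been made at a fixed IoU threshold and that the class has at least one ground truth; `average_precision` is a hypothetical name.

```python
import numpy as np


def average_precision(confidences, is_tp, num_ground_truths):
    """All-point interpolated AP for one class over a whole test set.

    confidences:       confidence score of every detection of the class.
    is_tp:             whether each detection was a TP at the chosen IoU threshold.
    num_ground_truths: number of ground-truth boxes of the class (assumed > 0).
    """
    # Lowering the confidence threshold = adding detections in this order.
    order = np.argsort(-np.asarray(confidences, float), kind="stable")
    tp = np.asarray(is_tp, bool)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(~tp)
    recall = cum_tp / num_ground_truths
    precision = cum_tp / (cum_tp + cum_fp)
    # Sentinels at recall 0 and 1, then make precision non-increasing:
    # each point takes the best precision reachable at an equal or higher recall.
    r = np.concatenate([[0.0], recall, [1.0]])
    p = np.concatenate([[0.0], precision, [0.0]])
    p = np.maximum.accumulate(p[::-1])[::-1]
    # Sum rectangle areas wherever recall increases.
    steps = np.flatnonzero(r[1:] != r[:-1])
    return float(np.sum((r[steps + 1] - r[steps]) * p[steps + 1]))
```

For three detections ranked TP, FP, TP against two ground truths, the interpolated curve has precision 1 up to recall 0.5 and 2/3 up to recall 1, so the AP is 0.5 + 0.5 x 2/3 = 5/6.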
Table 2: Results obtained with the best set of hyperparameters.

Object         Precision %   Recall %   F1-Score %
Adblue         85.11         80.00      81.93
Yamaha logo    78.06         96.22      85.19

4.3 Discussion

It is difficult to compare our results with other works in the literature. Indeed, as far as we know, the approach presented in this work is the first proposal to build a fully automated pipeline that takes as input the CAD model of an object and outputs a trained object detection model for this object, without using any real image. A fair comparison would require other end-to-end systematic approaches for building OD models from CAD models, which is not possible since, to our knowledge, none exists. Instead, we hope that the results presented in this work can serve as a good baseline for the comparison of future works in this research direction.

However, we give a rough comparison with other relevant works to give an idea of how well our approach performs. In (Mazzetto et al., 2019), the detection of objects in an automobile production line was implemented using only real images of the objects. In that work, the estimated detection accuracy was around 90%, which is only about 5 to 10% better than the results obtained in our work using only synthetic images. In (Jabbar et al., 2017), the authors also train an OD model using synthetic images generated with Blender and evaluate the results on real images. However, this approach is not entirely automated, since the scenes are created manually by Blender artists to ensure photo-realism. The object used in that work for evaluation is a wine glass, and the maximum AP obtained is 71.14%. Our systematic approach seems to work better than this approach; however, we cannot reproduce their method on our objects, as we cannot create the scenes manually in the same way they would. The potentially better performance of our approach can be explained by the fact that the loss of photo-realism can be compensated by the higher number of images in our synthetic datasets. Indeed, with our fully automated approach, generating more data is fast and requires no effort, unlike in (Jabbar et al., 2017).

5 CONCLUSIONS

This work presents a systematic approach to train object detection models for industrial scenarios, using only a CAD model of the object of interest as input. The method first generates realistic synthetic images using a custom Blender script, and then trains a Faster R-CNN OD model using the TensorFlow OD API. To understand and optimize the different parameters of the proposed pipeline, a systematic parameter selection study is conducted using a Yamaha logo CAD model for training and real images containing the same object in context for evaluation. The selected hyperparameters are then tested on another object, showing that they can generalize to different scenarios.

Over the last decade, successful deep learning methods have been developed to tackle the challenging problem of generic object detection. However, when it comes to OD in an industrial environment, the availability of good-quality data becomes a bottleneck. To address this issue, we proposed to use synthetic images for training, which is challenging as they might not reflect the high variability found in real industrial environments (objects, pieces, scenery, etc.). In addition, it is also difficult to find CAD models of specific industrial objects with which to train models and to test and compare other approaches. Thus, as a consequence of this work, a dataset was produced and made publicly available for future research⁵.

⁵ https://github.com/igorgbs/systematic_approach_cad_models

Therefore, the main conclusion of this work is that it is possible to train an object detection model with excellent performance on a set of synthetic images generated from CAD models. In addition, it was shown that a large set of images is not needed to obtain significant results. Our experiments indicate that the proposed rendering process is sufficient to obtain good performance, and that the way the scenes are built and rendered is crucial for the final result.

ACKNOWLEDGEMENTS

Our work has benefited from the AI Interdisciplinary Institute ANITI. ANITI is funded by the French "Investing for the Future – PIA3" program under the Grant agreement n° ANR-19-PI3A-0004.

REFERENCES

Ananth, S. (2019). Faster R-CNN for object detection, a technical paper summary.
Ben-Himane, S., Hinterstoisser, S., and Navab, N. (2010). Computer vision CAD models. US Patent App. 12/682,199.
Blender Online Community (2018). Blender - a 3D modelling and rendering package. Blender Foundation.
Cohen, J., Crispim-Junior, C., Grange-Faivre, C., and Tougne, L. (2020). CAD-based learning for egocentric object detection in industrial context. In 15th International Conference on Computer Vision Theory and Applications, volume 5, pages 644–651. SCITEPRESS.
Drost, B., Ulrich, M., Bergmann, P., Hartinger, P., and Steger, C. (2017). Introducing MVTec ITODD - a dataset for 3D object recognition in industry. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 2200–2208.
Everingham, M., Van Gool, L., Williams, C. K., Winn, J., and Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2):303–338.
Ge, C., Wang, J., Wang, J., Qi, Q., Sun, H., and Liao, J. (2020). Towards automatic visual inspection: A weakly supervised learning method for industrial applicable object detection. Computers in Industry, 121:103232.
Guérin, J., Gibaru, O., Nyiri, E., Thiery, S., and Palos, J. (2018a). Automatic construction of real-world datasets for 3D object localization using two cameras. In IECON 2018 - 44th Annual Conference of the IEEE Industrial Electronics Society, pages 3655–3658. IEEE.
Guérin, J., Gibaru, O., Nyiri, E., Thieryl, S., and Boots, B. (2018b). Semantically meaningful view selection. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1061–1066. IEEE.
He, K., Zhang, X., Ren, S., and Sun, J. (2015). Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9):1904–1916.
Hinterstoisser, S., Lepetit, V., Wohlhart, P., and Konolige, K. (2018). On pre-trained image features and synthetic images for deep learning. In Proceedings of the European Conference on Computer Vision (ECCV).
Hirz, M., Rossbacher, P., and Gulanová, J. (2017). Future trends in CAD – from the perspective of automotive industry. Computer-Aided Design and Applications, 14(6):734–741.
Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., Fischer, I., Wojna, Z., Song, Y., Guadarrama, S., et al. (2017). Speed/accuracy trade-offs for modern convolutional object detectors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7310–7311.
Jabbar, A., Farrawell, L., Fountain, J., and Chalup, S. K. (2017). Training deep neural networks for detecting drinking glasses using synthetic images. In International Conference on Neural Information Processing, pages 354–363. Springer.
Jana, A. P., Biswas, A., et al. (2018). YOLO based detection and classification of objects in video records. In 2018 3rd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), pages 2448–2452. IEEE.
Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Duerig, T., et al. (2018). The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale. arXiv preprint arXiv:1811.00982.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer.
Lindsay, A., Paterson, A., and Graham, I. (2018). Identifying and quantifying inefficiencies within industrial parametric CAD models. In Advances in Manufacturing Technology XXXII: Proceedings of the 16th International Conference on Manufacturing Research, volume 8, page 227. IOS Press.
Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., and Pietikäinen, M. (2020). Deep learning for generic object detection: A survey. International Journal of Computer Vision, 128(2):261–318.
Mazzetto, M., Southier, L. F., Teixeira, M., and Casanova, D. (2019). Automatic classification of multiple objects in automotive assembly line. In 2019 24th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), pages 363–369. IEEE.
Peng, X., Sun, B., Ali, K., and Saenko, K. (2015). Learning deep object detectors from 3D models. In Proceedings of the IEEE International Conference on Computer Vision, pages 1278–1286.
Prasad, D. K. (2012). Survey of the problem of object detection in real images. International Journal of Image Processing (IJIP), 6(6):441.
Rajpura, P. S., Bojinov, H., and Hegde, R. S. (2017). Object detection using deep CNNs trained on synthetic images. arXiv preprint arXiv:1706.06782.
Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91–99.
Shirley, P. and Morley, R. K. (2003). Realistic ray tracing. AK Peters/CRC Press.
Xiao, Y., Tian, Z., Yu, J., Zhang, Y., Liu, S., Du, S., and Lan, X. (2020). A review of object detection based on deep learning. Multimedia Tools and Applications, pages 1–63.
Yang, J., Li, S., Wang, Z., and Yang, G. (2019). Real-time tiny part defect detection system in manufacturing using deep learning. IEEE Access, 7:89278–89291.
Zhang, X., Yang, Y.-H., Han, Z., Wang, H., and Gao, C. (2013). Object class detection: A survey. ACM Computing Surveys (CSUR), 46(1):1–53.
Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., and Torralba, A. (2017). Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1452–1464.