
A Novel Method for Object Detection using Deep Learning and CAD Models

Igor Garcia Ballhausen Sampaio1, Luigy Machaca1, José Viterbo1,a and Joris Guérin2,b

1 Computing Institute, Universidade Federal Fluminense, Brazil

2 LAAS-CNRS, ONERA, Université de Toulouse, France

Keywords: Object Detection, CAD Models, Synthetic Image Generation, Deep Learning, Convolutional Neural Network.

Abstract: Object Detection (OD) is an important computer vision problem for industry, which can be used for quality control in production lines, among other applications. Recently, Deep Learning (DL) methods have enabled practitioners to train OD models that perform well on complex real-world images. However, the adoption of these models in industry is still limited by the difficulty and the significant cost of collecting high-quality training datasets. On the other hand, when applying OD in the context of production lines, CAD models of the objects to be detected are often available. In this paper, we introduce a fully automated method that takes a CAD model of an object and returns a fully trained OD model for detecting this object. To do this, we created a Blender script that generates realistic labeled datasets of images containing the object, which are then used for training the OD model. The method is validated experimentally on two practical examples, showing that this approach can generate OD models that perform well on real images while being trained only on synthetic images. The proposed method has the potential to facilitate the adoption of object detection models in industry, as it is highly flexible and easy to adapt to new objects. Hence, it can result in significant cost reduction, gains in productivity and improved product quality.

a https://orcid.org/0000-0002-0339-6624

b https://orcid.org/0000-0002-8048-8960

Sampaio, I., Machaca, L., Viterbo, J. and Guérin, J.
A Novel Method for Object Detection using Deep Learning and CAD Models.
DOI: 10.5220/0010451100750082
In Proceedings of the 23rd International Conference on Enterprise Information Systems (ICEIS 2021) - Volume 1, pages 75-82
ISBN: 978-989-758-509-8

1 INTRODUCTION

Recently, Deep Learning (DL) has produced excellent results for Object Detection (OD) (Liu et al., 2020). On the one hand, a typical limitation of DL is the requirement of large labeled datasets for training. Indeed, although there are various large databases available online for OD, for specific industrial applications it is always necessary to create custom datasets containing the objects of interest. While the scenarios present in public datasets are useful from both a research and an application standpoint, it was found that industrial applications, such as bin picking or defect inspection, have quite different characteristics that are not modeled by the existing datasets (Drost et al., 2017). As a result, methods that perform well on existing datasets sometimes show different results when applied to industrial scenarios without retraining. The process of generating a specific dataset for retraining is tedious, and can be error-prone when conducted by non-professional technicians. Moreover, generating a new dataset and labeling it manually can be very time-consuming and expensive (Jabbar et al., 2017).

On the other hand, OD for industrial production lines has the particularity that manufacturers often have access to the CAD models of the objects to detect. Thanks to advances in computer graphics techniques, such as ray tracing (Shirley and Morley, 2003), the generation of photo-realistic images is now possible. In such artificially generated images, the computer can be employed to obtain bounding box labels for free. The use of synthetic images rendered from CAD models to train OD models has already been proposed in (Peng et al., 2015), (Rajpura et al., 2017) and (Hinterstoisser et al., 2018). However, these approaches are not automated, as they require manual scene creation by Blender artists. In addition, the objects used in these works are usually generic, such as buses, airplanes, cars or animals.

The main contribution of this paper is to present a new method for training OD models on synthetic images generated from CAD models that is fully automatic and thus well suited for industrial use. The proposed method consists of the automatic generation of realistic labeled images containing the objects to be

detected, followed by the fine-tuning of a pretrained OD model on the artificial dataset. An extensive study is conducted to properly select the user-defined parameters so as to maximize the performance on real-world images. Our method is evaluated using the CAD models of two industrial objects for training, as well as real labeled images containing the objects for evaluation. The results obtained are very promising as we manage to get F1-scores above 90% on real images while training only on synthetic images.

This paper is organized as follows. Section 2 presents the related work in the field of object detection and deep learning for industry. Section 3 provides detailed explanations of the proposed method. Section 4 describes our experiments, presents the results obtained and discusses them. Finally, conclusions and directions for future work are presented in Section 5.

2 RELATED WORK

This section presents related work about OD, industrial applications of DL-based computer vision, as well as computer vision methods using CAD models.

2.1 Object Detection

OD is a challenging computer vision problem that consists in locating instances of objects from predefined categories in natural images (Prasad, 2012). It has many applications in various domains such as autonomous driving, security and medical diagnosis (Xiao et al., 2020). Deep learning techniques have emerged as a powerful strategy for learning characteristic representations directly from data and have led to significant advances in the field of generic object detection (Liu et al., 2020). In the last decade, many competitions for object detection have been held to provide large annotated datasets to the community, and to unify the benchmarks and metrics for fair comparison between proposed methods (Everingham et al., 2010), (Lin et al., 2014), (Zhou et al., 2017), (Kuznetsova et al., 2018).

Some examples of OD methods proposed within the last few years include (He et al., 2015), where the authors propose a new network structure, called SPP-net, which can generate a fixed-length representation regardless of the size/scale of the image. Other works such as (Jana et al., 2018) aim to improve processing speed while still identifying objects in the image efficiently. Finally, deeper CNNs have led to record-breaking improvements in the detection of more general object categories, a shift which came about when DCNNs began to be successfully applied to image classification (Liu et al., 2020).

2.2 OD for Industrial Applications

Although general purpose OD methods have greatly improved thanks to the availability of large public datasets, the detection of instances in the industrial context must be approached differently, since annotated images are generally rare or not available. Indeed, to train a deep learning model, hundreds of annotated images for each object category are needed. Specific datasets need to be collected and annotated for different target applications. This process is time-consuming and laborious, and increases the burden on operators, which goes against the goal of industrial automation (Cohen et al., 2020), (Ge et al., 2020).

A public dataset adapted to the industrial context was developed in (Drost et al., 2017). Unlike other 3D object detection datasets, this work models industrial bin picking and object inspection tasks, which often face different challenges. In addition, the evaluation criteria are focused on practical aspects, such as execution times, memory consumption, and useful measures of correctness and precision. Other examples of datasets adapted to the industrial context include (Guérin et al., 2018b) and (Guérin et al., 2018a).

Finally, in (Yang et al., 2019), a method to detect defects of tiny parts in real time was developed, based on object detection and deep learning. To improve their results, the authors take into account the specificities of the industrial application, such as the properties of the parts, the environmental parameters and the speed of movement of the conveyor. This is a good example of adapting OD training methods to the specific constraints of the industrial context.

2.3 CAD Models and OD

The first commercial CAD programs came up in the 1970s, providing functions for 2D-drawing and data archival, and evolved into the main engineering design tool (Lindsay et al., 2018), (Hirz et al., 2017). These models can provide a scalable solution for intelligent and automatic object recognition, tracking and augmentation based on generic object models (Ben-Himane et al., 2010). For example, CAD models have been used to support multi-view detection (Zhang et al., 2013). In (Peng et al., 2015), 3D models were used as the primary source of information to build object models. In other works, 3D CAD models were used as the only source of labeled data (Lin et al., 2014), (Everingham et al., 2010), but they are limited to generic categories, such as cars and


motorcycles.

3 PROPOSED METHOD

An overview of the proposed method can be seen in Figure 1. First, a custom Blender script is used to generate labeled training images containing the rendered CAD model in context. Then, a pretrained object detection model is fine-tuned on the generated dataset. Finally, the model can be used for inference on real images (Figure 1b).

3.1 Image Generation

For the automatic generation of the training images, the software Blender (Blender Online Community, 2018) is used. Blender is a powerful software for 3D design, which includes features such as modeling, rigging, simulation and rendering. Blender has a good Python API, is open-source and has good GPU support.

In order to generate a synthetic training image sample, our code requires several elements. First, a CAD model of the object of interest as well as several other industrial CAD models need to be available. In the experiments of this paper, we use the two objects shown in Figure 2, for which we also have real-world test images. The other objects serve as distractors to help the model focus on the right object. The CAD models for the distractors are gathered from the GrabCAD website (https://grabcad.com/). Different textures for the different distractors as well as for the background are gathered from the Poliigon website (https://www.poliigon.com/). Finally, the color and texture of the object of interest are reproduced manually.

Once we have access to all the elements above, the generation code goes as follows. A floor and a table are created and some distractors are sampled. Using physics simulation, the distractors are dropped from a random height onto the table. The position of the object of interest is also randomly sampled. Once the 3D scene is created, textures and colors are sampled for the backgrounds and the distractors and the entire scene is textured. Light sources and cameras are also sampled and placed randomly. Constraints on the camera pose are applied in order to ensure that the object appears in the camera view. Once the scene has been created, it is rendered to generate an image. By removing the light sources and making the object of interest a light source itself, we can generate another image which can be used for bounding box labeling. This procedure is necessary because, even if we know the location of the object, it can be partly hidden by distractors, which would distort the labeling. Example images generated using our Blender code can be seen in Figure 3.

3.2 Model Training

In this work, we did not train a new CNN architecture from scratch. Instead, we used one of the pretrained models provided by the TensorFlow Object Detection API (Huang et al., 2017). This approach is called transfer learning and consists in starting training from a model that already knows basic feature extraction skills and is less likely to overfit the synthetic datasets. Indeed, the diversity that we can create with Blender is limited, as we cannot get an infinite amount of textures and distractors, and the diversity already encountered by the network during pre-training can help reduce overfitting. In addition, using a network pre-trained on real images can prevent the network from learning detection features that depend too much on the generation procedure.

There exist several models in the TensorFlow OD model zoo. More information on the detection performance, as well as the reference execution times, for each of the available pre-trained models can be found on the GitHub page of the API (https://github.com/tensorflow/models). In practice, the model used in this paper is the faster_rcnn_inception_v2_coco model, which provides a good trade-off between performance and speed.

Faster R-CNN, the model used in this work, takes as input an entire image and a set of object proposals. The network first processes the whole image with several convolutional and max pooling layers to produce a convolutional feature map. Then, for each object proposal, a region of interest (RoI) pooling layer extracts a fixed-length feature vector from the feature map. Each feature vector is fed into a sequence of fully connected layers that finally branch into two sibling output layers: one that produces softmax probability estimates over K object classes plus a catch-all "background" class, and another layer that outputs four real-valued numbers for each of the K object classes. Each set of 4 values encodes refined bounding-box positions for one of the K classes. For a more detailed view of Faster R-CNN, we refer the reader to the original paper (Ren et al., 2015), or to the following tutorial (Ananth, 2019).

To train the final OD model, the TensorFlow OD API requires a specific file structure for the training images and labels. This step is carried out automatically by our script.
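The labeling trick of Section 3.1 (re-rendering the scene with the object of interest as the only light source) reduces bounding-box extraction to finding the extent of the bright pixels. The following is a minimal sketch of that idea, not the authors' script: it assumes the second render has already been loaded as a 2D NumPy array of intensities in [0, 1], and the function name `bbox_from_mask` and the 0.5 threshold are illustrative choices.

```python
import numpy as np

def bbox_from_mask(mask, threshold=0.5):
    """Return (xmin, ymin, xmax, ymax) of pixels brighter than threshold, or None."""
    rows, cols = np.nonzero(mask > threshold)
    if rows.size == 0:
        return None  # object fully hidden or outside the camera view
    return int(cols.min()), int(rows.min()), int(cols.max()), int(rows.max())

# Toy 6x8 "emissive" render: the object would cover columns 2..6,
# but a distractor in front of it hides columns 5..6.
mask = np.zeros((6, 8))
mask[1:5, 2:7] = 1.0   # pixels where the glowing object would appear
mask[1:5, 5:7] = 0.0   # pixels occluded by a distractor stay dark

print(bbox_from_mask(mask))  # (2, 1, 4, 4)
```

Because only visible pixels are bright, the resulting box covers the visible part of the object rather than its full projected extent, which is the reason given in the text for preferring this render over the known object location.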


Figure 1: Overview of the proposed method for training an object detection network using a CAD model. (a) Training; (b) Inference.

Figure 2: CAD models used in our experiments. (a) Adblue; (b) Yamaha logo.

Figure 3: Example of images generated using our custom Blender script. (a) Adblue; (b) Yamaha logo.

3.3 Parameter Selection Procedure

The Blender script used for image generation has many hyperparameters that must be chosen before using it, such as the number of distractors, the number of scenes generated or the resolution of the synthetic images. Hence, we conduct a set of experiments to properly select these parameters in order to optimize the OD results for inference on real images. In this section, we explain the parameter selection procedure. In other words, we present the dataset on which the different sets of parameters were evaluated, as well as the metrics used to assess the quality of the results obtained with a given set of parameters. The results obtained for this parameter selection procedure are presented in Section 4.

3.3.1 Test Dataset

The objective of this work is to validate that an object detector trained on synthetic images can generalize to real-world industrial cases. Hence, we use a test dataset composed of 380 real images containing the objects corresponding to the CAD models used for training. The bounding box annotation files for the test images are generated manually using a software called LabelImg (https://github.com/tzutalin/labelImg). This application allows us to draw and save the annotations of each image as XML files in the PASCAL VOC format (Everingham et al., 2010).

3.3.2 OD Metrics

In order to evaluate the quality of the trained model on real images, and thus to be able to select the best hyperparameters for image generation and training, we used the standard OD metrics presented here.

Intersection over Union (IoU). IoU is an evaluation metric used to measure how well a predicted bounding box matches a ground truth bounding box. For a pair of bounding boxes, IoU is defined as the area of the intersection divided by the area of the union (Figure 4). If A corresponds to the ground-truth box and B to the predicted box, then IoU is computed as:

\[ \mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|}, \tag{1} \]

where |.| denotes the area of a given shape. The numerator is called the overlap area and the denominator is called the combined area. IoU ranges between 0 and 1, where 1 means that the bounding boxes are the same and 0 that there is no overlap.

Figure 4: Intersection over Union (IoU) computation.

Precision, Recall, F1-Measure. We call confidence score the probability that an anchor box contains an object from a certain class. It is usually predicted by the classifier part of the object detector. The confidence score and IoU are used as the criteria to determine whether a detection is a true positive or a false positive. Given a minimal threshold on the confidence score for bounding box acceptance, and another threshold on IoU to identify matching boxes, a detection is considered a true positive (TP) if there exists a ground truth such that: the confidence score is above the confidence threshold; the predicted class matches the class of the ground truth; and the IoU is above the IoU threshold. The violation of any of the last two conditions generates a false positive (FP). In case multiple predictions correspond to the same ground truth, only the one with the highest confidence score counts as a true positive, while the others are considered false positives. When a ground truth bounding box is left without any matching predicted detection, it counts as a false negative (FN).

If we denote by TP, FP and FN respectively the number of True Positives, False Positives and False Negatives in a dataset, we can define the following metrics:

\[ \mathrm{Precision} = \frac{TP}{TP + FP}, \tag{2} \]

\[ \mathrm{Recall} = \frac{TP}{TP + FN}, \tag{3} \]

\[ F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}. \tag{4} \]

A high precision means that most of the predicted boxes had a corresponding ground truth, i.e., the object detector is not producing bad predictions. A high recall means that most of the ground truth boxes had a corresponding prediction, i.e., the object detector finds most objects in the images. The F1-Score is the harmonic mean of the precision and recall; it is needed when a balance between precision and recall is sought.

In the case of object detection on production lines, a low precision means that a part might sometimes be absent without the model noticing (it reports a detection anyway), whereas a low recall means that the part is sometimes present but missed, so the model raises an alert anyway. For this reason, both a good recall and a good precision are required, and the choice of the F1-Score metric seems appropriate.

Average Precision. After an OD model has been trained, the computation of Precision, Recall and F1-score depends on the value of the two thresholds defined above (for the confidence score and IoU). In order to properly choose the values of these thresholds, it is interesting to analyze the Precision x Recall curves. For each class, and for a given value of the IoU threshold, the confidence threshold is set as a variable and sampled between 0 and 1 to plot a parametric curve with precision and recall as the x- and y-axes.

A class-specific object detector is considered good if the precision remains high as the recall increases, meaning that if you vary the confidence threshold, the precision and recall will still be high. Hence, to compare curves we generally rely on a numerical metric called Average Precision (AP). Since 2010, the


standard computation method for AP consists in calculating the area under the curve (AUC) of the Precision x Recall curve (Everingham et al., 2010).

4 EXPERIMENTS AND RESULTS

The results obtained for the parameter selection procedure as well as our final evaluations are presented here. These experiments were conducted on an Nvidia Quadro P5000 GPU and a 2.90GHz Intel Xeon E3-154M v5 processor (16GB of RAM).

4.1 Hyperparameter Tuning

The parameter selection procedure is conducted exclusively on the Yamaha logo object; the best set of parameters is then tested on the Adblue object to ensure that it also performs well. The influence of four tunable parameters on the final results is studied here. For each parameter, three values were selected for the tests. These parameters and their studied values are:

• Resolution: 640x480, 960x540, 1080x720

• Camera poses: 2, 5, 20

• Number of scenes: 20, 50, 200

• Number of distractors: 0, 5, 20

From simple preliminary experiments that are not presented here, we concluded that the number of textures used for the floor, the distractors and the support should be set to the maximum number of textures available (in our case 7 for the floor and 6 for the two others). The parameter values used in this work were chosen empirically, that is, after several test scenarios, these values were the ones that generated the best performance regarding the metrics.

In total, from the values selected for the four parameters, we sampled more than 30 combinations and compared the OD results on the testing set of real images. For each combination tested, we trained the Faster R-CNN model on the synthetic images that were generated. We note that, for each hyperparameter combination, the experiments were repeated 10 times in order to attenuate the influence of the random components in the generation and training process. For reasons of space, it was not possible to present all results in this article. However, in order to demonstrate the importance of this parameter selection step, Table 1 shows the best and the worst configurations that were tested.

Table 1: Best and worst hyperparameter configurations obtained and their corresponding results.

| Parameters | Best Case | Worst Case |
| Resolution | 960x540 | 960x540 |
| Camera poses | 5 | 5 |
| Number of scenes | 20 | 20 |
| Number of images | 100 | 100 |
| Number of distractors | 20 | 0 |
| Generation time (s) | 1257.30 | 749.92 |
| Number of samples | 10 | 10 |
| Precision %: Avg (Std. Dev) | 78.06 (15.41) | 57.34 (16.27) |
| Recall %: Avg (Std. Dev) | 96.22 (4.07) | 90.71 (6.45) |
| F1-Score %: Avg (Std. Dev) | 85.19 (10.57) | 66.80 (11.49) |

In Table 1, we can see that the distractors are an essential element in our proposed pipeline for image generation. Indeed, when removing them, we can see a drop of around 23% in F1-score, on average across the 10 experiment samples. We also tried combinations with few distractors, but the F1-score results dropped significantly. This makes sense as the real images evaluated had several distractors as well.

Another important point is that the resolution of the generated images should be greater than that of the inference images. In all of our tests, this scenario always produced the best results. It also makes sense, as it is easier to learn from a more detailed/complex model and then evaluate in a less detailed/complex scenario.

Finally, we also tried to increase the number of generated training images to see if this would lead to an increase in performance. Surprisingly, we observed that the performance dropped for the case with 20 distractors, 20 camera poses and 50 scenes (1000 images). This might mean that when presented with too many synthetic images, the model starts overfitting to the biases introduced by our generation process, and it also indicates that we do not need a large number of images to train our model. In addition to this performance drop, generating ten times more images also makes the proposed pipeline almost 25 times slower (31037.16 seconds).

4.2 Results

In this section, we evaluate the results of the best combination of parameters (Best Case from Table 1) in more detail. These results are presented in Table 2; they correspond to using a confidence threshold of 0.9 and an IoU threshold of 0.5. From Table 2, we can see that the best parameters identified using only the Yamaha logo produce similar results when applied to another object (Adblue). This suggests that the proposed parameters for our method are well suited for different objects and thus could generalize well to various industrial use cases without additional parameter tuning.
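The evaluation protocol of Section 3.3.2, applied with the thresholds quoted in Section 4.2 (confidence 0.9, IoU 0.5), can be sketched as follows. This is an illustrative reimplementation rather than the authors' evaluation code: the box format `(xmin, ymin, xmax, ymax)`, the tuple layout of predictions and ground truths, and the greedy highest-confidence-first matching are assumptions chosen to follow the rules stated in the text.

```python
def iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def evaluate(predictions, ground_truths, conf_thr=0.9, iou_thr=0.5):
    """predictions: [(box, label, score)]; ground_truths: [(box, label)]."""
    # Keep confident detections, highest score first, so that the best
    # prediction claims a ground truth before lower-scored duplicates.
    kept = sorted((p for p in predictions if p[2] > conf_thr),
                  key=lambda p: p[2], reverse=True)
    matched, tp, fp = set(), 0, 0
    for box, label, _score in kept:
        best_j, best_iou = None, iou_thr
        for j, (gt_box, gt_label) in enumerate(ground_truths):
            if j in matched or gt_label != label:
                continue
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j is None:
            fp += 1            # wrong class, low IoU, or duplicate detection
        else:
            matched.add(best_j)
            tp += 1
    fn = len(ground_truths) - len(matched)   # ground truths never detected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return tp, fp, fn, precision, recall, f1

gts = [((10, 10, 50, 50), "logo"), ((60, 60, 100, 100), "logo")]
preds = [((12, 12, 50, 50), "logo", 0.98),      # matches the first ground truth
         ((11, 10, 49, 52), "logo", 0.95),      # duplicate of the same object
         ((200, 200, 240, 240), "logo", 0.92),  # no ground truth nearby
         ((60, 60, 100, 100), "logo", 0.50)]    # below the confidence threshold
tp, fp, fn, precision, recall, f1 = evaluate(preds, gts)
print(tp, fp, fn)  # 1 2 1
```

In this toy case the second ground truth is only covered by a low-confidence detection, so it counts as a false negative; lowering `conf_thr` would trade precision for recall, which is the trade-off the Precision x Recall curve makes visible.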

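The search space of Section 4.1 is small enough to enumerate explicitly. The sketch below lists the full grid of the four studied parameters and derives the dataset size of a configuration. The dictionary keys are hypothetical names, and the rule "one image per scene and camera pose" is inferred from the figures reported in the text (5 poses x 20 scenes = 100 images; 20 poses x 50 scenes = 1000 images), not stated by the authors.

```python
from itertools import product

# Values studied in Section 4.1; key names are illustrative.
grid = {
    "resolution": ["640x480", "960x540", "1080x720"],
    "camera_poses": [2, 5, 20],
    "n_scenes": [20, 50, 200],
    "n_distractors": [0, 5, 20],
}

def all_configurations(grid):
    """Enumerate every combination of the studied parameter values."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*grid.values())]

def n_images(config):
    """Dataset size, assuming one render per (scene, camera pose) pair."""
    return config["n_scenes"] * config["camera_poses"]

configs = all_configurations(grid)
best = {"resolution": "960x540", "camera_poses": 5,
        "n_scenes": 20, "n_distractors": 20}

print(len(configs))    # 81
print(n_images(best))  # 100
```

The full grid has 81 combinations, each repeated 10 times in the paper's protocol, which is consistent with the authors sampling only a subset (more than 30 combinations) rather than running all of them.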

Table 2: Results obtained with the best set of hyperparameters.

| Object | Precision % | Recall % | F1-Score % |
| Adblue | 85.11 | 80.00 | 81.93 |
| Yamaha logo | 78.06 | 96.22 | 85.19 |

4.3 Discussion

It is difficult to compare our results with other works in the literature. Indeed, as far as we know, the approach presented in this work is the first proposal to build a fully automated pipeline that takes as input the CAD model of an object and outputs a trained object detection model for this object without any real image. For a fair comparison, we would need to compare our work with other end-to-end systematic approaches to build OD models from CAD models, which is impossible as none exists. Nonetheless, we hope that the results presented in this work can serve as a good baseline for comparison with future works in this research direction.

However, we give a rough comparison with other relevant works to give an idea of how well our approach is performing. In (Mazzetto et al., 2019), the detection of objects in an automobile production line was implemented using only real images of the objects. In that work, the estimated detection accuracy was around 90%, which is only about 5 to 10% better than the results obtained in our work using only synthetic images. In (Jabbar et al., 2017), the authors also train an OD model using synthetic images generated with Blender and evaluate the results on real images. However, this approach is not entirely automated since the scenes are created manually by Blender artists to ensure photo-realism. The object used in that work for evaluation is a wine glass and the maximum AP obtained is 71.14%. Our systematic approach seems to work better than this approach; however, we cannot reproduce the method on our objects, as we cannot create the scenes manually in the same way that they would. The potentially better performance of our approach can be explained by the fact that the loss of photo-realism can be compensated by the higher number of images in our synthetic datasets. Indeed, with our fully automated approach, generating more data is faster and requires no effort, unlike in (Jabbar et al., 2017).

5 CONCLUSIONS

This work presents a systematic approach to train object detection models to address industrial scenarios, using only a CAD model of the object of interest as input. The method first generates realistic synthetic images using a custom Blender script, and then trains a Faster R-CNN OD model using the TensorFlow OD API. To understand and optimize the different parameters in the proposed pipeline, a systematic parameter selection study is conducted using a Yamaha logo CAD model for training and real images containing the same object in context for evaluation. The selected hyperparameters are then tested on another object, showing that they can generalize to different scenarios.

Over the last decade, successful deep learning methods have been developed to tackle the challenging problem of generic object detection. However, when it comes to the problem of OD in an industrial environment, the availability of good quality data becomes a bottleneck. To address this issue we proposed to use synthetic images for training, which is challenging as they might not reflect the high variability found in real industrial environments (objects, pieces, scenery, etc.). In addition, it is difficult to find CAD models of specific industrial objects on which models can be trained and other approaches can be tested and compared. Thus, as a consequence of this work, a set of data was produced and made publicly available for future research (https://github.com/igorgbs/systematic_approach_cad_models).

Therefore, the main conclusion from this work is that it is possible to train an object detection model with excellent performance on a set of synthetic images generated from CAD models. In addition, it was shown that a large set of images is not needed to obtain a significant result. Our experiments indicate that the proposed rendering process is sufficient to obtain good performance and that the way of building and rendering the scenes is crucial for the final result.

ACKNOWLEDGEMENTS

Our work has benefited from the AI Interdisciplinary Institute ANITI. ANITI is funded by the French "Investing for the Future – PIA3" program under the Grant agreement n° ANR-19-PI3A-0004.

REFERENCES

Ananth, S. (2019). Faster R-CNN for object detection, a technical paper summary.

Ben-Himane, S., Hinterstoisser, S., and Navab, N. (2010). Computer vision CAD models. US Patent App. 12/682,199.


Blender Online Community (2018). Blender - a 3D modelling and rendering package. Blender Foundation.

Cohen, J., Crispim-Junior, C., Grange-Faivre, C., and Tougne, L. (2020). CAD-based learning for egocentric object detection in industrial context. In 15th International Conference on Computer Vision Theory and Applications, volume 5, pages 644–651. SCITEPRESS.

Drost, B., Ulrich, M., Bergmann, P., Hartinger, P., and Steger, C. (2017). Introducing MVTec ITODD - a dataset for 3D object recognition in industry. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 2200–2208.

Everingham, M., Van Gool, L., Williams, C. K., Winn, J., and Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2):303–338.

Ge, C., Wang, J., Wang, J., Qi, Q., Sun, H., and Liao, J. (2020). Towards automatic visual inspection: A weakly supervised learning method for industrial applicable object detection. Computers in Industry, 121:103232.

Guérin, J., Gibaru, O., Nyiri, E., Thiery, S., and Palos, J. (2018a). Automatic construction of real-world datasets for 3D object localization using two cameras. In IECON 2018 - 44th Annual Conference of the IEEE Industrial Electronics Society, pages 3655–3658. IEEE.

Guérin, J., Gibaru, O., Nyiri, E., Thiery, S., and Boots, B. (2018b). Semantically meaningful view selection. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1061–1066. IEEE.

He, K., Zhang, X., Ren, S., and Sun, J. (2015). Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9):1904–1916.

Hinterstoisser, S., Lepetit, V., Wohlhart, P., and Konolige, K. (2018). On pre-trained image features and synthetic images for deep learning. In Proceedings of the European Conference on Computer Vision (ECCV).

Hirz, M., Rossbacher, P., and Gulanová, J. (2017). Future trends in CAD - from the perspective of automotive industry. Computer-Aided Design and Applications, 14(6):734–741.

Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., Fischer, I., Wojna, Z., Song, Y., Guadarrama, S., et al. (2017). Speed/accuracy trade-offs for modern convolutional object detectors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7310–7311.

Jabbar, A., Farrawell, L., Fountain, J., and Chalup, S. K. (2017). Training deep neural networks for detecting drinking glasses using synthetic images. In International Conference on Neural Information Processing, pages 354–363. Springer.

Jana, A. P., Biswas, A., et al. (2018). YOLO based detection and classification of objects in video records. In 2018 3rd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), pages 2448–2452. IEEE.

Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Duerig, T., et al. (2018). The Open Images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. arXiv preprint arXiv:1811.00982.

Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer.

Lindsay, A., Paterson, A., and Graham, I. (2018). Identifying and quantifying inefficiencies within industrial parametric CAD models. In Advances in Manufacturing Technology XXXII: Proceedings of the 16th International Conference on Manufacturing Research, volume 8, page 227. IOS Press.

Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., and Pietikäinen, M. (2020). Deep learning for generic object detection: A survey. International Journal of Computer Vision, 128(2):261–318.

Mazzetto, M., Southier, L. F., Teixeira, M., and Casanova, D. (2019). Automatic classification of multiple objects in automotive assembly line. In 2019 24th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), pages 363–369. IEEE.

Peng, X., Sun, B., Ali, K., and Saenko, K. (2015). Learning deep object detectors from 3D models. In Proceedings of the IEEE International Conference on Computer Vision, pages 1278–1286.

Prasad, D. K. (2012). Survey of the problem of object detection in real images. International Journal of Image Processing (IJIP), 6(6):441.

Rajpura, P. S., Bojinov, H., and Hegde, R. S. (2017). Object detection using deep CNNs trained on synthetic images. arXiv preprint arXiv:1706.06782.

Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91–99.

Shirley, P. and Morley, R. K. (2003). Realistic ray tracing. AK Peters/CRC Press.

Xiao, Y., Tian, Z., Yu, J., Zhang, Y., Liu, S., Du, S., and Lan, X. (2020). A review of object detection based on deep learning. Multimedia Tools and Applications, pages 1–63.

Yang, J., Li, S., Wang, Z., and Yang, G. (2019). Real-time tiny part defect detection system in manufacturing using deep learning. IEEE Access, 7:89278–89291.

Zhang, X., Yang, Y.-H., Han, Z., Wang, H., and Gao, C. (2013). Object class detection: A survey. ACM Computing Surveys (CSUR), 46(1):1–53.

Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., and Torralba, A. (2017). Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1452–1464.