将KITTI的数据格式转换为VOC Pascal的xml格式
深度学习,如果GPU是引擎,那数据一定就是燃料。可想而知数据的重要性,这篇博文主要介绍如何将KITTI的数据格式转换为VOC Pascal的xml格式。虽然在我的github里给出了能适应KITTI数据格式的接口。但看着总是不如xml格式的优雅。
随便选取一张KITTI的标签数据如下:
Pedestrian 0.00 1 0.30 883.68 144.15 937.35 259.01 1.90 0.42 1.04 5.06 1.43 12.42 0.68
Pedestrian 0.00 2 0.29 873.70 152.10 933.44 256.07 1.87 0.50 0.90 5.42 1.50 13.43 0.67
Car 0.00 0 1.74 444.29 171.04 504.95 225.82 1.86 1.57 3.83 -4.95 1.83 26.64 1.55
Pedestrian 0.00 0 -0.39 649.28 168.10 664.61 206.40 1.78 0.53 0.95 2.20 1.57 34.08 -0.33
Car 0.98 0 2.42 0.00 217.12 85.92 374.00 1.50 1.46 3.70 -5.12 1.85 4.13 1.56
Pedestrian 0.00 0 2.02 240.35 190.31 268.02 261.61 1.54 0.57 0.41 -7.92 1.94 15.95 1.57
DontCare -1 -1 -10 0.00 226.06 88.58 373.00 -1 -1 -1 -1000 -1000 -1000 -10
DontCare -1 -1 -10 567.39 173.95 574.86 190.60 -1 -1 -1 -1000 -1000 -1000 -10
DontCare -1 -1 -10 727.58 165.75 737.08 192.75 -1 -1 -1 -1000 -1000 -1000 -10
其中每行表示一个类别和对应的Boundingbox,每行第一个字符为类别名称,第5-8为$x_{left},y_{left},x_{right},y_{right}$坐标。
再看一下VOC Pascal的xml格式
<annotation>
<folder>VOC2007</folder>
<filename>000006.jpg</filename>
<source>
<database>The VOC2007 Database</database>
<annotation>PASCAL VOC2007</annotation>
<image>flickr</image>
<flickrid>330391518</flickrid>
</source>
<owner>
<flickrid>supershoppertoo</flickrid>
<name>?</name>
</owner>
<size>
<width>500</width>
<height>375</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>pottedplant</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>187</xmin>
<ymin>135</ymin>
<xmax>282</xmax>
<ymax>242</ymax>
</bndbox>
</object>
<object>
<name>diningtable</name>
<pose>Unspecified</pose>
<truncated>1</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>154</xmin>
<ymin>209</ymin>
<xmax>369</xmax>
<ymax>375</ymax>
</bndbox>
</object>
<object>
<name>chair</name>
<pose>Left</pose>
<truncated>1</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>255</xmin>
<ymin>207</ymin>
<xmax>366</xmax>
<ymax>375</ymax>
</bndbox>
</object>
<object>
<name>chair</name>
<pose>Unspecified</pose>
<truncated>1</truncated>
<difficult>1</difficult>
<bndbox>
<xmin>298</xmin>
<ymin>195</ymin>
<xmax>332</xmax>
<ymax>247</ymax>
</bndbox>
</object>
<object>
<name>chair</name>
<pose>Unspecified</pose>
<truncated>1</truncated>
<difficult>1</difficult>
<bndbox>
<xmin>279</xmin>
<ymin>190</ymin>
<xmax>308</xmax>
<ymax>231</ymax>
</bndbox>
</object>
<object>
<name>chair</name>
<pose>Unspecified</pose>
<truncated>1</truncated>
<difficult>1</difficult>
<bndbox>
<xmin>137</xmin>
<ymin>192</ymin>
<xmax>151</xmax>
<ymax>199</ymax>
</bndbox>
</object>
<object>
<name>chair</name>
<pose>Unspecified</pose>
<truncated>1</truncated>
<difficult>1</difficult>
<bndbox>
<xmin>137</xmin>
<ymin>198</ymin>
<xmax>156</xmax>
<ymax>212</ymax>
</bndbox>
</object>
<object>
<name>chair</name>
<pose>Right</pose>
<truncated>1</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>138</xmin>
<ymin>211</ymin>
<xmax>249</xmax>
<ymax>375</ymax>
</bndbox>
</object>
</annotation>
瞬间高大上了不少,主要信息为Object的信息需要记录,有些不需要的信息可以不写。
在python中解析XML文件也有Dom和Sax两种方式,这一篇文章是Dom生成XML文件。
主要方法
1、生成XML节点(node)
createElement("node_name")
2、给节点添加属性值(Attribute)
node.setAttribute("att_name", "arr_value")
3、节点的标签值(data)
createTextNode("node_value")
其中第1、3点在创建完节点(节点值)之后,还需使用下面的方法添加到指点的节点的位置下面:
prev_node.appendChild(cur_node)
这里的prev_node要添加节点的上一层节点,而cur_node即为当前要添加的节点了。
下面贴出KITTI格式转VOC PASCAL的代码
#encoding:utf-8
'''
根据一个给定的XML Schema,使用DOM树的形式从空白文件生成一个XML。
'''
from xml.dom.minidom import Document
import cv2, os
def generate_xml(name,split_lines,img_size,class_ind):
doc = Document() #创建DOM文档对象
annotation = doc.createElement('annotation')
doc.appendChild(annotation)
title = doc.createElement('folder')
title_text = doc.createTextNode('KITTI')
title.appendChild(title_text)
annotation.appendChild(title)
img_name=name+'.png'
title = doc.createElement('filename')
title_text = doc.createTextNode(img_name)
title.appendChild(title_text)
annotation.appendChild(title)
source = doc.createElement('source')
annotation.appendChild(source)
title = doc.createElement('database')
title_text = doc.createTextNode('The KITTI Database')
title.appendChild(title_text)
source.appendChild(title)
title = doc.createElement('annotation')
title_text = doc.createTextNode('KITTI')
title.appendChild(title_text)
source.appendChild(title)
size = doc.createElement('size')
annotation.appendChild(size)
title = doc.createElement('width')
title_text = doc.createTextNode(str(img_size[1]))
title.appendChild(title_text)
size.appendChild(title)
title = doc.createElement('height')
title_text = doc.createTextNode(str(img_size[0]))
title.appendChild(title_text)
size.appendChild(title)
title = doc.createElement('depth')
title_text = doc.createTextNode(str(img_size[2]))
title.appendChild(title_text)
size.appendChild(title)
for split_line in split_lines:
line=split_line.strip().split()
if line[0] in class_ind:
object = doc.createElement('object')
annotation.appendChild(object)
title = doc.createElement('name')
title_text = doc.createTextNode(line[0])
title.appendChild(title_text)
object.appendChild(title)
bndbox = doc.createElement('bndbox')
object.appendChild(bndbox)
title = doc.createElement('xmin')
title_text = doc.createTextNode(line[4])
title.appendChild(title_text)
bndbox.appendChild(title)
title = doc.createElement('ymin')
title_text = doc.createTextNode(line[5])
title.appendChild(title_text)
bndbox.appendChild(title)
title = doc.createElement('xmax')
title_text = doc.createTextNode(line[6])
title.appendChild(title_text)
bndbox.appendChild(title)
title = doc.createElement('ymax')
title_text = doc.createTextNode(line[7])
title.appendChild(title_text)
bndbox.appendChild(title)
########### 将DOM对象doc写入文件
f = open('label/'+name+'.xml','w')
f.write(doc.toprettyxml(indent = ''))
f.close()
if __name__ == '__main__':
class_ind=('Pedestrian', 'Car', 'Cyclist')
cur_dir=os.getcwd()
dir=os.path.join(cur_dir,'training')
train_txt= open('train.txt','w')
for parent, dirnames, filenames in os.walk(dir):
for file_name in filenames:
full_path=os.path.join(parent, file_name)
f=open(full_path)
split_lines = f.readlines()
name= file_name[:-4]
img_name=name+'.png'
img_path=os.path.join('/home/zou/KITTI-detection/data/training/image_2',img_name)
img_size=cv2.imread(img_path).shape
generate_xml(name,split_lines,img_size,class_ind)
train_txt.write('image_2/'+img_name+' '+'label/'+name+'.xml'+'\n')
最后转换好以后的格式如下:
<annotation>
<folder>KITTI</folder>
<filename>000011.png</filename>
<source>
<database>The KITTI Database</database>
<annotation>KITTI</annotation>
</source>
<size>
<width>1242</width>
<height>375</height>
<depth>3</depth>
</size>
<object>
<name>Pedestrian</name>
<bndbox>
<xmin>883.68</xmin>
<ymin>144.15</ymin>
<xmax>937.35</xmax>
<ymax>259.01</ymax>
</bndbox>
</object>
<object>
<name>Pedestrian</name>
<bndbox>
<xmin>873.70</xmin>
<ymin>152.10</ymin>
<xmax>933.44</xmax>
<ymax>256.07</ymax>
</bndbox>
</object>
<object>
<name>Car</name>
<bndbox>
<xmin>444.29</xmin>
<ymin>171.04</ymin>
<xmax>504.95</xmax>
<ymax>225.82</ymax>
</bndbox>
</object>
<object>
<name>Pedestrian</name>
<bndbox>
<xmin>649.28</xmin>
<ymin>168.10</ymin>
<xmax>664.61</xmax>
<ymax>206.40</ymax>
</bndbox>
</object>
<object>
<name>Car</name>
<bndbox>
<xmin>0.00</xmin>
<ymin>217.12</ymin>
<xmax>85.92</xmax>
<ymax>374.00</ymax>
</bndbox>
</object>
<object>
<name>Pedestrian</name>
<bndbox>
<xmin>240.35</xmin>
<ymin>190.31</ymin>
<xmax>268.02</xmax>
<ymax>261.61</ymax>
</bndbox>
</object>
</annotation>
看我写的辛苦求打赏啊!!!有学术讨论和指点请加微信manutdzou,注明