AlexeyAB (preferred fork)
Alexey should now be considered the lead developer.
- https://github.com/AlexeyAB/Yolo_mark GUI for marking bounding boxes of objects in images for training YOLOv2.
- Multi-GPU training instructions.
- https://timebutt.github.io/static/how-to-train-yolov2-to-detect-custom-objects/ How to train YOLOv2 to detect custom objects, by Nils Tijtgat; it refers to the training data set from https://pjreddie.com/darknet/yolo/. YOLOv2 is known to struggle with detecting small objects. The Darknet Google Group has many topics on how you could improve performance, so have a look there for inspiration. A frequently repeated suggestion is to train YOLOv2 at a higher input resolution than 416x416; see the Google Groups threads "yolo small 1" and "yolo small 2" for examples.
- https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data Fork of YOLO. You can download an Android webcam app and use an Android phone as a network camera input stream.
- The Sept 2017 version runs at 15 fps on a TX1; use AlexeyAB's fork instead, as it gives 200 fps. Also, pjreddie reorganized his code substantially, changing the folder layout.
- Counting the number of objects in an image, people tracking: https://www.youtube.com/watch?v=QeWl0h3kQ24
- Fix for 4K video: fps improved from 7 to 20.
- Run the demo without a screen (headless), e.g. when using Amazon GPU instances.
- Small object detection.
Notable forks
https://github.com/oarriaga/face_classification Gender and face detection.
https://github.com/RiccardoGrin/darknet Though there are many image datasets/databases online, I could not find the images I wanted, or they were part of a very large set, or the download was simply too large. Therefore, I just used my phone to take photos. However, the smallest photos I could take were 3264x1836, and their names were not as desired. From research, apparently at least 250 different images are needed for each class. Taking 250 photos can take some time and creativity, so I took only half and did some image augmentation (flipping, rotating, etc.) to get all 250 images. NOTE: Much better results will be achieved by getting 250 or more images without applying any augmentation, as there will be more variation between the images. Image augmentation should really only be used to enlarge the set further and improve the classification accuracy, though the gain will not be as large as from using original images.
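Flipping an image horizontally also means flipping its labels. A minimal sketch, assuming darknet's normalized label format (one "class x_center y_center width height" line per object, values in [0, 1]); the function names are hypothetical. Only the x center changes, to 1 - x.

```python
def flip_label_line(line):
    """Mirror one label line for a horizontally flipped image (hypothetical helper).

    Assumed format: "class x_center y_center width height", normalized to [0, 1].
    A horizontal flip mirrors the x center; y, width and height are unchanged.
    """
    cls, x, y, w, h = line.split()
    x_flipped = 1.0 - float(x)
    return f"{cls} {x_flipped:.6f} {float(y):.6f} {float(w):.6f} {float(h):.6f}"


def flip_label_file(text):
    """Flip every non-empty line of a label file's contents."""
    lines = [flip_label_line(l) for l in text.splitlines() if l.strip()]
    return "\n".join(lines)


if __name__ == "__main__":
    print(flip_label_line("0 0.25 0.5 0.1 0.2"))
    # prints "0 0.750000 0.500000 0.100000 0.200000"
```

Rotations are harder: a rotated box generally needs a new enclosing box, so flips are the simplest augmentation to keep labels exact.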
Training command (from the Darknet Google Group): ./darknet detector train data/voc.data yolo.cfg darknet19_448.conv.23
I'm assuming you've successfully created a train.txt file? (This is the file listing the paths of all the images in your dataset; its creation is detailed on the YOLO homepage.) If you've created it, it's probably not in your /data/voc/ directory; it's most likely in the directory one level up from where your images and labels are stored. In yolo.c you need to specify where that file is located (you can use an absolute path here): go to where train.txt is, run the pwd command (print working directory), copy that absolute path (plus /train.txt) into line 18 of yolo.c (replacing what is there), and then run "make clean" and "make" in your darknet directory. (From Paul McElroy, on training; see also https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects)
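A train.txt can also be generated instead of written by hand. This is a sketch, not the YOLO homepage's own script: it assumes your images sit under one directory with .jpg/.jpeg/.png extensions, and write_train_list is a hypothetical name.

```python
import os
import sys


def write_train_list(image_dir, out_path, exts=(".jpg", ".jpeg", ".png")):
    """Write the absolute path of every image under image_dir to out_path.

    One path per line, sorted so the file is reproducible. Returns the paths.
    write_train_list is a hypothetical helper, not part of darknet.
    """
    paths = []
    for root, _dirs, files in os.walk(image_dir):
        for name in files:
            if name.lower().endswith(exts):
                paths.append(os.path.abspath(os.path.join(root, name)))
    paths.sort()
    with open(out_path, "w") as f:
        for p in paths:
            f.write(p + "\n")
    return paths


if __name__ == "__main__" and len(sys.argv) == 3:
    # python make_train_list.py <image_dir> <train.txt>  (script name is hypothetical)
    written = write_train_list(sys.argv[1], sys.argv[2])
    print(f"wrote {len(written)} paths to {os.path.abspath(sys.argv[2])}")
```

Because every line is already absolute, it does not matter which directory darknet is launched from; only the path to train.txt itself has to be right.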
Can I reduce the number of convolutional layers and fully connected layers in the yolo.cfg file? As long as the downsampling factor stays 32, you can do anything you want. As you can see, the network takes a 416x416 image and downsamples it to 13x13, so the downsampling factor is 32 (416/13). Changing the number of convolutional filters does not affect the downsampling factor, because downsampling concerns the spatial size of the tensor, whereas the number of conv filters sets its depth. However, if you remove one of the layers that downsample (stride-2 layers such as the maxpool layers), the downsampling factor will change from 32. If you have a single class, I would recommend decreasing the number of filters in the "second-to-last" (https://github.com/Jumabek/darknet/blob/master/cfg/yolo-voc.cfg#L217) and "third-to-last" (https://github.com/Jumabek/darknet/blob/master/cfg/yolo-voc.cfg#L200) convolutional layers from 1024,1024 to 256,512. Also, make sure you use anchors computed specifically for your (people) images. These scripts might be helpful for computing anchors: https://github.com/Jumabek/darknet_scripts
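The downsampling arithmetic above can be checked with a few lines of Python. A sketch assuming five stride-2 layers (2^5 = 32) and square inputs; it also shows why a higher input resolution, as suggested for small objects, must stay a multiple of 32.

```python
# Assumption: five stride-2 layers, each halving width and height.
NUM_STRIDE2_LAYERS = 5


def downsample_factor(num_stride2_layers):
    """Each stride-2 layer halves the spatial size, so the factor is 2**n."""
    return 2 ** num_stride2_layers


def output_grid(input_size, factor=downsample_factor(NUM_STRIDE2_LAYERS)):
    """Side length of the output grid for a square input of input_size pixels."""
    if input_size % factor != 0:
        raise ValueError(f"{input_size} is not a multiple of {factor}")
    return input_size // factor


if __name__ == "__main__":
    print(downsample_factor(5))   # 32
    print(output_grid(416))       # 13
    print(output_grid(608))       # 19, a higher-resolution input
```

Removing a stride-1 convolution leaves all of these numbers unchanged; removing one stride-2 layer turns the factor into 16 and the 416 grid into 26x26.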
Calculating anchors: the region-layer anchors come from k-means clustering on the widths and heights of the training-data boxes. They are used like anchor boxes: YOLOv2 predicts offsets to these widths and heights (however, it predicts the x/y coordinates in the same way as YOLOv1). Please note, the anchors are generated by the k-means algorithm, where the author clustered all the VOC box sizes and ratios into 5 groups. So 16,10 is one of the clusters from those 5. I will probably make a tutorial about anchors this weekend, stay tuned. (Jumabek Alikhanov)
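A minimal sketch of anchor computation with k-means, using the distance d = 1 - IoU between a box and a centroid, where both are compared by width and height only (as if they shared a center). This is an illustration under those assumptions, not the code from Jumabek's darknet_scripts; the iteration cap, seeding and units are choices made here.

```python
import random


def iou_wh(a, b):
    """IoU of two boxes given only (width, height), as if both shared a center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs into k anchors with distance d = 1 - IoU.

    boxes: list of (width, height) tuples, all in the same units.
    Returns the k centroids sorted by area.
    """
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # initial centroids: k random boxes
    for _ in range(iters):
        # Assign each box to the centroid with the highest IoU (smallest 1 - IoU).
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        # Move each centroid to the mean width/height of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                w = sum(b[0] for b in members) / len(members)
                h = sum(b[1] for b in members) / len(members)
                new_centroids.append((w, h))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    return sorted(centroids, key=lambda c: c[0] * c[1])
```

Feed it the (w, h) of every labeled box in your training set; with k=5 it returns five anchors. For YOLOv2's region layer the .cfg anchors are, to our understanding, in grid-cell units, so if you start from darknet's normalized label widths and heights, multiply by the grid size (13 for a 416x416 input) first.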
Makefile
- GPU settings: ARCH in the Makefile
- http://www.pradeepadiga.me/blog/2017/03/22/installing-cuda-toolkit-8-0-on-ubuntu-16-04/ Installing CUDA Toolkit 8.0 on Ubuntu 16.04
Node.js
https://github.com/moovel/node-yolo , https://lab.moovel.com/blog/what-you-get-is-what-you-see-nodejs-yolo Teaching your computer how to see just got easier with node-yolo. Created as a collaboration between the moovel lab and Alex (@OrKoN of moovel engineering), node-yolo builds upon Joseph Redmon’s neural network framework and wraps up the You Only Look Once (YOLO) real-time object detection library into a convenient and web-ready node.js module. The best thing about it: it’s open source!
YOLO Swift
Bounding box
https://groups.google.com/forum/#!topic/darknet/1_HhQwr2BkA Urban object detection with the Cityscapes dataset (https://www.cityscapes-dataset.com/).
Python wrapper
- https://github.com/thomaspark-pkj/pyyolo Outputs bounding boxes to a text file. Use OpenCV 2.4 rather than 3.3.0 because of a waitKey issue.
- https://github.com/lucaswamser/darknet , https://github.com/pjreddie/darknet/pull/111
TensorFlow port
- https://github.com/pjreddie/TopDeepLearning Various projects on deep learning neural nets.
- https://groups.google.com/forum/#!forum/darknet forum
Training YOLO on COCO data: The first time I made a custom dataset to run with the 'demo' argument, I changed line 13 of yolo.c ("char *voc_names=...") to reflect my custom classes. The second time, I added an argument "-override_vocnames" to darknet.c that loaded the appropriate "names=" file from the data file, i.e. coco.data.
- Maybe not the best way to do it, but it was easy to implement.
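The "names= from the .data file" idea can be sketched in Python. This is not the darknet.c patch described above; it only assumes the .data file uses "key = value" lines, as coco.data does, and the function names are hypothetical. (Darknet's own C code has helpers such as read_data_cfg and get_labels for this in recent versions.)

```python
def parse_data_file(text):
    """Parse a darknet .data file ("key = value" per line) into a dict."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def load_names(path):
    """Read one class name per line from a .names file, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def class_names_from_data(data_path):
    """Follow the names= entry of a .data file to its list of class names."""
    with open(data_path) as f:
        options = parse_data_file(f.read())
    return load_names(options["names"])
```

Reading the names this way means a new dataset only needs a new .data and .names file, rather than an edit to voc_names and a rebuild.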
https://github.com/thtrieu/darkflow JSON output can be generated describing the pixel location of each bounding box. By default, each prediction is stored in the sample_img/out folder.
- https://github.com/saiprabhakar/darknet-modified/tree/v0 Outputs image labels and bounding boxes to a text file. Example use: when a person walking down the street veers onto the driveway, the change in position triggers an alert. https://groups.google.com/forum/#!topic/darknet/ylEWe3JUKrE
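A sketch for reading that JSON output. The element shape (label, confidence, topleft, bottomright) is assumed from darkflow's README and should be checked against the files your version writes; parse_darkflow_json is a hypothetical name and the sample values are illustrative.

```python
import json


def parse_darkflow_json(json_text, min_confidence=0.0):
    """Turn a darkflow-style JSON array into (label, confidence, x, y, w, h) tuples.

    x, y is the top-left corner in pixels; w, h are width and height in pixels.
    Predictions below min_confidence are dropped.
    """
    results = []
    for pred in json.loads(json_text):
        if pred["confidence"] < min_confidence:
            continue
        tl, br = pred["topleft"], pred["bottomright"]
        results.append((pred["label"], pred["confidence"],
                        tl["x"], tl["y"], br["x"] - tl["x"], br["y"] - tl["y"]))
    return results


if __name__ == "__main__":
    sample = ('[{"label": "person", "confidence": 0.56, '
              '"topleft": {"x": 184, "y": 101}, "bottomright": {"x": 274, "y": 382}}]')
    print(parse_darkflow_json(sample))
    # prints [('person', 0.56, 184, 101, 90, 281)]
```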
- https://github.com/saiprabhakar/Scene-recognition subscene analysis
- https://github.com/saiprabhakar/DeepDriving Deep driving. See Jabelone
- https://github.com/Guanghan/darknet This fork adds some niche features to pjreddie's current darknet, e.g. (1) read a video file, process it, and output a video with bounding boxes.
- http://guanghan.info/blog/en/my-works/train-yolo/ and his SSD detector
- https://groups.google.com/forum/#!topic/darknet/cxTAbP-um7Y
- https://github.com/puzzledqs/BBox-Label-Tool
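The driveway alert described above boils down to checking whether a tracked person's box moves into a region. A minimal sketch, assuming boxes as (left, top, right, bot) pixels, a hypothetical rectangular driveway region, and that the same person's box has already been matched between frames (the tracking itself is not shown).

```python
def box_anchor_point(left, top, right, bot):
    """Bottom-center of a box: roughly where a walking person's feet are."""
    return ((left + right) / 2.0, bot)


def inside(point, region):
    """region = (x_min, y_min, x_max, y_max) in pixels (hypothetical rectangle)."""
    x, y = point
    x_min, y_min, x_max, y_max = region
    return x_min <= x <= x_max and y_min <= y <= y_max


def entered_region(prev_box, curr_box, region):
    """True only when the person moves from outside the region to inside it."""
    was_in = inside(box_anchor_point(*prev_box), region)
    is_in = inside(box_anchor_point(*curr_box), region)
    return (not was_in) and is_in
```

Using the bottom-center rather than the box center is a design choice: it approximates where the person stands, so a tall box that merely overlaps the driveway visually does not trigger the alert.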
I am wondering about the answer to the original question. Can we get the coordinates and count of detected objects as text output in darknet?
Yes, you can. In src/image.c, find the draw_detections function: left, right, top, bot are the bounding box coordinates in the image, and names[class] is the object name. You can save the bounding box and object name to a txt file and count the objects.
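The change itself goes into the C drawing code in src/image.c; what follows is only a Python sketch of the text-file side, assuming a hypothetical format of one "name left right top bot" line per detection. Splitting from the right keeps multi-word class names such as "traffic light" intact.

```python
from collections import Counter


def format_detection(name, left, right, top, bot):
    """One line per detection; this field order is a hypothetical choice."""
    return f"{name} {left} {right} {top} {bot}"


def count_objects(lines):
    """Count detections per class name from lines written by format_detection."""
    counts = Counter()
    for line in lines:
        parts = line.rsplit(None, 4)  # name may contain spaces; the 4 coords may not
        if len(parts) == 5:
            counts[parts[0].strip()] += 1
    return counts
```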
http://guanghan.info/projects/ROLO/ ROLO, a fork of YOLO, does real-time tracking and identification of human body parts such as the face, allowing accurate engagement by the tracked vehicle robot's PepperBall gun. https://github.com/Guanghan/ROLO
YOLO Python wrapper
Face tracking
https://www.youtube.com/watch?v=UsOi1BfunnU https://github.com/xhuvom/darknetFaceID [i] To detect faces from a live camera feed and annotate them automatically, use the .cfg and .weights files from QuanHua (https://mega.nz/#F!GRV1XKbJ!v8BCsFO8iJVNppiGXY4qMw). [ii] Only add these lines to the src/image.c file of this fork, as described below:
(line #223) to save .jpg images and (line #227) to save annotations in separate folders for each class (also change the class number on line #229).
[iii] After the modifications, run the detector on a live webcam feed or video file that shows only one particular person's face. [iv] Repeat the process for every person you want to recognize, modifying the training data location and class number accordingly. About 2k face images per person are enough to recognize individual faces, but more data can be added to improve accuracy.
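The annotation step in [ii] writes darknet-style labels. A sketch of the conversion, assuming pixel coordinates named as in darknet's drawing code (left, right, top, bot) and darknet's normalized "class x_center y_center width height" format; the per-class folder layout in label_path is hypothetical, not the darknetFaceID fork's.

```python
import os


def to_yolo_label(class_id, left, right, top, bot, img_w, img_h):
    """Convert a pixel box to a normalized 'class x_center y_center width height' line."""
    x = (left + right) / 2.0 / img_w
    y = (top + bot) / 2.0 / img_h
    w = (right - left) / float(img_w)
    h = (bot - top) / float(img_h)
    return f"{class_id} {x:.6f} {y:.6f} {w:.6f} {h:.6f}"


def label_path(base_dir, class_id, frame_index):
    """Hypothetical layout: one folder per class (person), one .txt per saved frame."""
    return os.path.join(base_dir, f"class_{class_id}", f"frame_{frame_index:06d}.txt")
```

Changing class_id per recording session is the Python analogue of changing the class number in step [iv].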
- https://www.youtube.com/watch?v=DeCFxPQlOVk Indian traffic data; https://github.com/ctmackay/darknet: Track 1 utilized the Darknet framework with YOLO object detection. We achieved 2nd place in mean average precision for the AI City Challenge using this network and training parameters. You will need to build darknet in order to train and run inference on the models. On releasing the models: I need to contact an NVIDIA representative; they own the rights to the dataset, so I may not have permission to release the models. I am meeting with them on the 6th and will get back to you.
C++ wrapper
https://groups.google.com/forum/#!topic/darknet/oxAi9DjxTcM Check src/yolo.c for the various input arguments and how each of them is handled. You could extend the test_yolo function to run detection on multiple images: void test_yolo(char *cfgfile, char *weightfile, char *filename, float thresh)
A Titan X GPU ($600) runs YOLO to identify objects, draw bounding boxes, and pass the coordinates to, say, thirty separate tracked vehicle bots, each with a cost-effective CPU running OpenTLD. The ideal solution would be to implement YOLO on an FPGA.
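Extending test_yolo happens in C, but the looping logic is simple. A Python sketch, assuming a list file with one image path per line (like train.txt) and a caller-supplied detect(path, thresh) function standing in for whatever wrapper you use; both are hypothetical here.

```python
def run_on_image_list(list_path, detect, thresh):
    """Call detect(image_path, thresh) for every path listed in list_path.

    detect is a caller-supplied (hypothetical) function returning the
    detections for one image; results are keyed by image path.
    """
    results = {}
    with open(list_path) as f:
        for line in f:
            path = line.strip()
            if path:  # skip blank lines
                results[path] = detect(path, thresh)
    return results
```

In the C version, the equivalent is to load the network once and loop over the file names, rather than reloading the weights for each image.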