
andyzeng

Professional software vendor delivering innovative solutions on the Softono platform. Specialized in both open-source and proprietary software development.

Total Products
2

Software by andyzeng

visual-pushing-grasping
Open Source


# Visual Pushing and Grasping Toolbox

Visual Pushing and Grasping (VPG) is a method for training robotic agents to learn how to plan complementary pushing and grasping actions for manipulation (*e.g.* for unstructured pick-and-place applications). VPG operates directly on visual observations (RGB-D images), learns from trial and error, trains quickly, and generalizes to new objects and scenarios.

<img src="images/teaser.jpg" height=223px align="left"/>
<img src="images/self-supervision.gif" height=223px align="left"/>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>

This repository provides PyTorch code for training and testing VPG policies with deep reinforcement learning in both simulation and real-world settings on a UR5 robot arm. This is the reference implementation for the paper:

### Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning

[PDF](https://arxiv.org/pdf/1803.09956.pdf) | [Webpage & Video Results](http://vpg.cs.princeton.edu/)

[Andy Zeng](http://andyzeng.github.io/), [Shuran Song](http://vision.princeton.edu/people/shurans/), [Stefan Welker](https://www.linkedin.com/in/stefan-welker), [Johnny Lee](http://johnnylee.net/), [Alberto Rodriguez](http://meche.mit.edu/people/faculty/[email protected]), [Thomas Funkhouser](https://www.cs.princeton.edu/~funk/)

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2018

Skilled robotic manipulation benefits from complex synergies between non-prehensile (*e.g.* pushing) and prehensile (*e.g.* grasping) actions: pushing can help rearrange cluttered objects to make space for arms and fingers; likewise, grasping can help displace objects to make pushing movements more precise and collision-free. In this work, we demonstrate that it is possible to discover and learn these synergies from scratch through model-free deep reinforcement learning.
Our method involves training two fully convolutional networks that map from visual observations to actions: one infers the utility of pushes for a dense pixel-wise sampling of end effector orientations and locations, while the other does the same for grasping. Both networks are trained jointly in a Q-learning framework and are entirely self-supervised by trial and error, where rewards are provided from successful grasps. In this way, our policy learns pushing motions that enable future grasps, while learning grasps that can leverage past pushes. During picking experiments in both simulation and real-world scenarios, we find that our system quickly learns complex behaviors amid challenging cases of clutter, and achieves better grasping success rates and picking efficiencies than baseline alternatives after only a few hours of training. We further demonstrate that our method is capable of generalizing to novel objects.

<!-- ![Method Overview](method.jpg?raw=true) -->
<img src="images/method.jpg" width=100%/>

#### Citing

If you find this code useful in your work, please consider citing:

```
@inproceedings{zeng2018learning,
  title={Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning},
  author={Zeng, Andy and Song, Shuran and Welker, Stefan and Lee, Johnny and Rodriguez, Alberto and Funkhouser, Thomas},
  booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2018}
}
```

#### Demo Videos

Demo videos of a real robot in action can be found [here](http://vpg.cs.princeton.edu/).
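
The abstract above describes two networks that each score a dense, pixel-wise sampling of end effector orientations and locations. As a rough illustration of how such outputs could be turned into a single action, here is a minimal NumPy sketch. It is not taken from this repository: the function name `select_action`, the array names `push_q` and `grasp_q`, the `(rotations, height, width)` layout, and the choice of 16 rotations are all assumptions made for the example.

```python
import numpy as np

def select_action(push_q, grasp_q):
    """Pick the primitive, rotation index, and pixel with the highest Q-value.

    push_q and grasp_q are hypothetical arrays of shape
    (num_rotations, height, width), one Q-value per orientation and pixel.
    """
    best = {}
    for name, q in (("push", push_q), ("grasp", grasp_q)):
        # unravel_index converts the flat argmax into (rotation, row, col)
        idx = np.unravel_index(np.argmax(q), q.shape)
        best[name] = (float(q[idx]), idx)
    # Greedy choice between the two primitives
    primitive = max(best, key=lambda name: best[name][0])
    value, (rot, row, col) = best[primitive]
    return primitive, int(rot), int(row), int(col), value

# Hypothetical Q maps: 16 rotations over a tiny 4x4 heightmap
push_q = np.zeros((16, 4, 4))
grasp_q = np.zeros((16, 4, 4))
push_q[3, 1, 2] = 0.4
grasp_q[7, 0, 3] = 0.9
action = select_action(push_q, grasp_q)
```

In this toy input the grasp map holds the largest value, so the sketch returns a grasp at rotation index 7 and pixel (0, 3). Exploration (see `--explore_rate_decay` further down) is deliberately left out.
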
#### Contact

If you have any questions or find any bugs, please let me know: [Andy Zeng](http://www.cs.princeton.edu/~andyz/) andyz[at]princeton[dot]edu

## Installation

This implementation requires the following dependencies (tested on Ubuntu 16.04.4 LTS):

* Python 2.7 or Python 3
* [NumPy](http://www.numpy.org/), [SciPy](https://www.scipy.org/scipylib/index.html), [OpenCV-Python](https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_tutorials.html), [Matplotlib](https://matplotlib.org/). You can quickly install/update these dependencies by running the following (replace `pip` with `pip3` for Python 3):

  ```shell
  pip install numpy scipy opencv-python matplotlib
  ```

* ~~[PyTorch](http://pytorch.org/) version 0.3. Since 0.3 is no longer the latest version, see installation instructions [here](http://pytorch.org/previous-versions/) or run the following:~~ ~~`pip install torch==0.3.1 torchvision==0.2.0`~~
* [PyTorch](http://pytorch.org/) version 1.0+ (thanks [Andrew](https://github.com/ahundt) for the support!):

  ```shell
  pip install torch torchvision
  ```

  <!-- Support for PyTorch version 0.4+ is work-in-progress and lives in [this branch](https://github.com/andyzeng/visual-pushing-grasping/tree/support-pytorch-v0.4), but currently remains unstable. -->

* [V-REP](http://www.coppeliarobotics.com/) (now known as [CoppeliaSim](http://www.coppeliarobotics.com/)) simulation environment

### (Optional) GPU Acceleration

Accelerating training/inference with an NVIDIA GPU requires installing [CUDA](https://developer.nvidia.com/cuda-downloads) and [cuDNN](https://developer.nvidia.com/cudnn). You may need to register with NVIDIA for the CUDA Developer Program (it's free) before downloading. This code has been tested with CUDA 8.0 and cuDNN 6.0 on a single NVIDIA Titan X (12GB). Running out-of-the-box with our pre-trained models using GPU acceleration requires 8GB of GPU memory.
Running with GPU acceleration is **highly recommended**, otherwise each training iteration will take several minutes to run (as opposed to several seconds). This code automatically detects the GPU(s) on your system and tries to use them. If you have a GPU, but would instead like to run in CPU mode, add the flag `--cpu` when running `main.py` below.

## A Quick-Start: Demo in Simulation

<img src="images/simulation.gif" height=200px align="right" />
<img src="images/simulation.jpg" height=200px align="right" />

This demo runs our pre-trained model with a UR5 robot arm in simulation on challenging picking scenarios with adversarial clutter, where grasping an object is generally not feasible without first pushing to break up tight clusters of objects.

### Instructions

1. Check out this repository and download our pre-trained models.

   ```shell
   git clone https://github.com/andyzeng/visual-pushing-grasping.git visual-pushing-grasping
   cd visual-pushing-grasping/downloads
   ./download-weights.sh
   cd ..
   ```

1. Run V-REP (navigate to your V-REP/CoppeliaSim directory and run `./vrep.sh` or `./coppeliaSim.sh`). From the main menu, select `File` > `Open scene...`, and open the file `visual-pushing-grasping/simulation/simulation.ttt` from this repository.

1. In another terminal window, run the following (simulation will start in the V-REP window). **Please note:** our pre-trained models were trained with PyTorch version 0.3, so this will only run with PyTorch 0.3. Training from scratch (next section) should still work with PyTorch 1.0+.

   ```shell
   python main.py --is_sim --obj_mesh_dir 'objects/blocks' --num_obj 10 \
       --push_rewards --experience_replay --explore_rate_decay \
       --is_testing --test_preset_cases --test_preset_file 'simulation/test-cases/test-10-obj-07.txt' \
       --load_snapshot --snapshot_file 'downloads/vpg-original-sim-pretrained-10-obj.pth' \
       --save_visualizations
   ```

Note: you may get a popup window titled "Dynamics content" in your V-REP window. Select the checkbox and press OK.
You will have to do this a total of 3 times before it stops annoying you.

## Training

To train a regular VPG policy from scratch in simulation, first start the simulation environment by running V-REP (navigate to your V-REP directory and run `./vrep.sh`). From the main menu, select `File` > `Open scene...`, and open the file `visual-pushing-grasping/simulation/simulation.ttt`. Then navigate to this repository in another terminal window and run the following:

```shell
python main.py --is_sim --push_rewards --experience_replay --explore_rate_decay --save_visualizations
```

Data collected from each training session (including RGB-D images, camera parameters, heightmaps, actions, rewards, model snapshots, visualizations, etc.) is saved into a directory in the `logs` folder. A training session can be resumed by adding the flags `--load_snapshot` and `--continue_logging`, which load the model snapshot specified by `--snapshot_file` and the transition history from the session directory specified by `--logging_directory`:

```shell
python main.py --is_sim --push_rewards --experience_replay --explore_rate_decay --save_visualizations \
    --load_snapshot --snapshot_file 'logs/YOUR-SESSION-DIRECTORY-NAME-HERE/models/snapshot-backup.reinforcement.pth' \
    --continue_logging --logging_directory 'logs/YOUR-SESSION-DIRECTORY-NAME-HERE'
```

Various training options can be modified or toggled on/off with different flags (run `python main.py -h` to see all options):

```shell
usage: main.py [-h] [--is_sim] [--obj_mesh_dir OBJ_MESH_DIR]
               [--num_obj NUM_OBJ] [--tcp_host_ip TCP_HOST_IP]
               [--tcp_port TCP_PORT] [--rtc_host_ip RTC_HOST_IP]
               [--rtc_port RTC_PORT]
               [--heightmap_resolution HEIGHTMAP_RESOLUTION]
               [--random_seed RANDOM_SEED] [--method METHOD] [--push_rewards]
               [--future_reward_discount FUTURE_REWARD_DISCOUNT]
               [--experience_replay] [--heuristic_bootstrap]
               [--explore_rate_decay] [--grasp_only] [--is_testing]
               [--max_test_trials MAX_TEST_TRIALS] [--test_preset_cases]
               [--test_preset_file TEST_PRESET_FILE] [--load_snapshot]
               [--snapshot_file SNAPSHOT_FILE] [--continue_logging]
               [--logging_directory LOGGING_DIRECTORY] [--save_visualizations]
```

Results from our baseline comparisons and ablation studies in our [paper](https://arxiv.org/pdf/1803.09956.pdf) can be reproduced using these flags. For example:

* Train reactive policies with pushing and grasping (P+G Reactive); specify `--method` to be `'reactive'`, remove `--push_rewards`, remove `--explore_rate_decay`:

  ```shell
  python main.py --is_sim --method 'reactive' --experience_replay --save_visualizations
  ```

* Train reactive policies with grasping-only (Grasping-only); similar arguments as P+G Reactive above, but add `--grasp_only`:

  ```shell
  python main.py --is_sim --method 'reactive' --experience_replay --grasp_only --save_visualizations
  ```

* Train VPG policies without any rewards for pushing (VPG-noreward); similar arguments as regular VPG, but remove `--push_rewards`:

  ```shell
  python main.py --is_sim --experience_replay --explore_rate_decay --save_visualizations
  ```

* Train shortsighted VPG policies with lower discount factors on future rewards (VPG-myopic); similar arguments as regular VPG, but set `--future_reward_discount` to `0.2`:

  ```shell
  python main.py --is_sim --push_rewards --future_reward_discount 0.2 --experience_replay --explore_rate_decay --save_visualizations
  ```

To plot the performance of a session over training time, run the following:

```shell
python plot.py 'logs/YOUR-SESSION-DIRECTORY-NAME-HERE'
```

Solid lines indicate % grasp success rates (primary metric of performance) and dotted lines indicate % push-then-grasp success rates (secondary metric to measure quality of pushes) over training steps. By default, each point in the plot measures the average performance over the last 200 training steps. The range of the x-axis is from 0 to 2500 training steps. You can easily change these parameters at the top of `plot.py`.
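
To make the "average over the last 200 training steps" idea concrete, here is a small pure-Python sketch of a trailing-window success rate. It illustrates the general technique only and is not copied from `plot.py`; the function name `windowed_success_rate` and the 0/1 encoding of grasp outcomes are assumptions for this example.

```python
def windowed_success_rate(outcomes, window=200):
    """Percent of successes over the trailing `window` steps, one value per step.

    `outcomes` is a hypothetical list with 1 for a successful grasp and
    0 for a failed one. Early steps average over the steps seen so far.
    """
    rates = []
    running = 0
    for step, outcome in enumerate(outcomes):
        running += outcome
        if step >= window:
            # Drop the outcome that just fell out of the window
            running -= outcomes[step - window]
        count = min(step + 1, window)
        rates.append(100.0 * running / count)
    return rates

# Toy log with a window of 4 steps instead of 200
grasp_outcomes = [0, 0, 1, 1, 1, 1]
rates = windowed_success_rate(grasp_outcomes, window=4)
```

A larger window gives a smoother but more delayed curve, which is the trade-off to keep in mind when changing the window size.
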
To compare performance between different sessions, you can draw multiple plots at a time:

```shell
python plot.py 'logs/YOUR-SESSION-DIRECTORY-NAME-HERE' 'logs/ANOTHER-SESSION-DIRECTORY-NAME-HERE'
```

## Evaluation

We provide a collection of 11 test cases in simulation with adversarial clutter. Each test case consists of a configuration of 3 to 6 objects placed in the workspace in front of the robot. These configurations are manually engineered to reflect challenging picking scenarios, and remain exclusive from the training procedure. Across many of these test cases, objects are laid closely side by side, in positions and orientations from which even an optimal grasping policy would have trouble picking up any of the objects without de-cluttering first. As a sanity check, a single isolated object is additionally placed in the workspace separate from the configuration. This is just to ensure that all policies have been sufficiently trained prior to the benchmark (*i.e.* a policy is not ready if it fails to grasp the isolated object).

<img src="images/test-cases.jpg" width=100% align="middle" />

The [demo](#a-quick-start-demo-in-simulation) above runs our pre-trained model multiple times (x30) on a single test case. To test your own pre-trained model, simply change the location of `--snapshot_file`:

```shell
python main.py --is_sim --obj_mesh_dir 'objects/blocks' --num_obj 10 \
    --push_rewards --experience_replay --explore_rate_decay \
    --is_testing --test_preset_cases --test_preset_file 'simulation/test-cases/test-10-obj-07.txt' \
    --load_snapshot --snapshot_file 'YOUR-SNAPSHOT-FILE-HERE' \
    --save_visualizations
```

Data from each test case will be saved into a session directory in the `logs` folder.
To report the average testing performance over a session, run the following:

```shell
python evaluate.py --session_directory 'logs/YOUR-SESSION-DIRECTORY-NAME-HERE' --method SPECIFY-METHOD --num_obj_complete N
```

where `SPECIFY-METHOD` can be `reactive` or `reinforcement`, depending on the architecture of your model.

`--num_obj_complete N` defines the number of objects that need to be picked in order to consider the task completed. For example, when evaluating our pre-trained model in the demo test case, `N` should be set to 6:

```shell
python evaluate.py --session_directory 'logs/YOUR-SESSION-DIRECTORY-NAME-HERE' --method 'reinforcement' --num_obj_complete 6
```

Average performance is measured with three metrics (for all metrics, higher is better):

1. Average % completion rate over all test runs: measures the ability of the policy to finish the task by picking up at least `N` objects without failing consecutively for more than 10 attempts.
1. Average % grasp success rate per completion.
1. Average % action efficiency: describes how succinctly the policy is capable of finishing the task. See our [paper](https://arxiv.org/pdf/1803.09956.pdf) for more details on how this is computed.

### Creating Your Own Test Cases in Simulation

To design your own challenging test case:

1. Open the simulation environment in V-REP (navigate to your V-REP directory and run `./vrep.sh`). From the main menu, select `File` > `Open scene...`, and open the file `visual-pushing-grasping/simulation/simulation.ttt`.
1. In another terminal window, navigate to this repository and run the following:

   ```shell
   python create.py
   ```

1. In the V-REP window, use the V-REP toolbar (object shift/rotate) to move around objects to desired positions and orientations.
1. In the terminal window, type in the name of the text file in which to save the test case, then press enter.
1. Try it out: run a trained model on the test case by running `main.py` just as in the demo, but with the flag `--test_preset_file` pointing to the location of your test case text file.

## Running on a Real Robot (UR5)

The same code in this repository can be used to train on a real UR5 robot arm (tested with UR Software version 1.8). To communicate with later versions of UR software, several minor changes may be necessary in `robot.py` (*e.g.* functions like `parse_tcp_state_data`). Tested with Python 2.7 (not fully tested with Python 3).

### Setting Up Camera System

The latest version of our system uses RGB-D data captured from an [Intel® RealSense™ D415 Camera](https://click.intel.com/intelr-realsensetm-depth-camera-d415.html). We provide a lightweight C++ executable that streams data in real-time using [librealsense SDK 2.0](https://github.com/IntelRealSense/librealsense) via TCP. This enables you to connect the camera to an external computer and fetch RGB-D data remotely over the network while training. This can come in handy for many real robot setups. Of course, doing so is not required -- the entire system can also be run on the same computer.

#### Installation Instructions:

1. Download and install [librealsense SDK 2.0](https://github.com/IntelRealSense/librealsense)
1. Navigate to `visual-pushing-grasping/realsense` and compile `realsense.cpp`:

   ```shell
   cd visual-pushing-grasping/realsense
   cmake .
   make
   ```

1. Connect your RealSense camera with a USB 3.0 compliant cable (important: RealSense D400 series cameras use a USB-C cable, but the cable must still be USB 3.0 compliant to stream RGB-D data).
1. To start the TCP server and RGB-D streaming, run the following:

   ```shell
   ./realsense
   ```

Keep the executable running while calibrating or training with the real robot (instructions below).
To test a Python TCP client that fetches RGB-D data from the active TCP server, run the following:

```shell
cd visual-pushing-grasping/real
python capture.py
```

### Calibrating Camera Extrinsics

<img src="images/calibration.gif" height=200px align="right" />
<img src="images/checkerboard.jpg" height=200px align="right" />

We provide a simple calibration script to estimate camera extrinsics with respect to robot base coordinates. To do so, the script moves the robot gripper over a set of predefined 3D locations as the camera detects the center of a moving 4x4 checkerboard pattern taped onto the gripper. The checkerboard can be of any size (the larger, the better).

#### Instructions:

1. Predefined 3D locations are sampled from a 3D grid of points in the robot's workspace. To modify these locations, change the variables `workspace_limits` and `calib_grid_step` at the top of `calibrate.py`.
1. Measure the offset between the midpoint of the checkerboard pattern and the tool center point in robot coordinates (variable `checkerboard_offset_from_tool`). This offset can change depending on the orientation of the tool (variable `tool_orientation`) as it moves across the predefined locations. Change both of these variables at the top of `calibrate.py`.
1. The code directly communicates with the robot via TCP. At the top of `calibrate.py`, change variable `tcp_host_ip` to point to the network IP address of your UR5 robot controller.
1. With caution, run the following to move the robot and calibrate:

   ```shell
   python calibrate.py
   ```

The script also optimizes for a z-scale factor and saves it into `real/camera_depth_scale.txt`. This scale factor should be multiplied with each depth pixel captured from the camera. This step is more relevant for the RealSense SR300 cameras, which commonly suffer from a severe scaling problem where the 3D data is often 15-20% smaller than real world coordinates. The D400 series are less likely to have such a severe scaling problem.
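
As an illustration of "multiply the scale factor with each depth pixel", the following NumPy sketch loads a scalar from a text file and applies it to a depth image. It assumes the file holds a single number, which is a guess about the format of `real/camera_depth_scale.txt`; the helper name `apply_depth_scale` and the toy values are likewise made up for the example. To stay self-contained, it writes a stand-in file to a temporary directory instead of reading the real one.

```python
import os
import tempfile

import numpy as np

def apply_depth_scale(depth, scale_path):
    """Multiply every depth pixel by the scalar stored in `scale_path`."""
    # Assumption: the file contains exactly one floating-point number
    scale = float(np.loadtxt(scale_path))
    return depth.astype(np.float32) * scale

# Stand-in for real/camera_depth_scale.txt, written to a temp directory
scale_file = os.path.join(tempfile.mkdtemp(), "camera_depth_scale.txt")
with open(scale_file, "w") as f:
    f.write("1.25\n")

# Toy 2x2 depth image in meters (0.0 marks a missing measurement)
raw_depth = np.array([[0.8, 1.0], [1.2, 0.0]])
scaled_depth = apply_depth_scale(raw_depth, scale_file)
```

Because the correction is a pure multiplication, pixels with missing depth (zero) stay at zero.
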
### Training

To train on the real robot, simply run:

```shell
python main.py --tcp_host_ip 'XXX.XXX.X.XXX' --tcp_port 30002 --push_rewards --experience_replay --explore_rate_decay --save_visualizations
```

where `XXX.XXX.X.XXX` is the network IP address of your UR5 robot controller.

### Additional Tools

* Use `touch.py` to test calibrated camera extrinsics -- provides a UI where the user can click a point on the RGB-D image, and the robot moves its end-effector to the 3D location of that point
* Use `debug.py` to test robot communication and primitive actions
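
The click-to-move check described for `touch.py` relies on turning a pixel plus its depth into a 3D point in robot coordinates. The sketch below shows the standard pinhole back-projection followed by a camera-to-robot transform. It is a generic illustration, not the code in `touch.py`: the function name `pixel_to_robot`, the intrinsics values, and the 4x4 pose matrix are all hypothetical.

```python
import numpy as np

def pixel_to_robot(u, v, depth_m, intrinsics, cam_pose):
    """Back-project pixel (u, v) at depth `depth_m` into robot base coordinates.

    `intrinsics` is a 3x3 pinhole camera matrix and `cam_pose` a 4x4
    homogeneous camera-to-robot transform (both hypothetical here).
    """
    fx, fy = intrinsics[0, 0], intrinsics[1, 1]
    cx, cy = intrinsics[0, 2], intrinsics[1, 2]
    # Pinhole model: camera-frame point from pixel coordinates and depth
    point_cam = np.array([(u - cx) * depth_m / fx,
                          (v - cy) * depth_m / fy,
                          depth_m,
                          1.0])
    # Apply the extrinsics and drop the homogeneous coordinate
    return np.dot(cam_pose, point_cam)[:3]

# Made-up intrinsics and a camera pose that is a pure translation
intrinsics = np.array([[600.0, 0.0, 320.0],
                       [0.0, 600.0, 240.0],
                       [0.0, 0.0, 1.0]])
cam_pose = np.eye(4)
cam_pose[:3, 3] = [0.5, 0.0, 0.7]
target = pixel_to_robot(380, 240, 0.6, intrinsics, cam_pose)
```

If the extrinsics are well calibrated, moving the end-effector to a point computed this way should land on the clicked object; a consistent offset usually points back at the calibration.
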

AI & Machine Learning IoT & Embedded
1.1K Github Stars
3dmatch-toolbox
Open Source


# 3DMatch Toolbox

3DMatch is a ConvNet-based local geometric feature descriptor that operates on 3D data (i.e. point clouds, depth maps, meshes, etc.). This toolbox provides code to use 3DMatch for geometric registration and keypoint matching, as well as code to train 3DMatch from existing RGB-D reconstructions. This is the reference implementation of our paper:

### 3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions

[PDF](https://arxiv.org/pdf/1603.08182.pdf) | [Webpage & Benchmarks & Datasets](http://3dmatch.cs.princeton.edu/) | [Video](https://www.youtube.com/watch?v=gZrsJJtDvvA)

*[Andy Zeng](http://andyzeng.com/), [Shuran Song](http://3dvision.princeton.edu/people/shurans/), [Matthias Nießner](http://www.niessnerlab.org/members/matthias_niessner/profile.html), [Matthew Fisher](https://research.adobe.com/person/matt-fisher/), [Jianxiong Xiao](http://3dvision.princeton.edu/people/xj/), and [Thomas Funkhouser](http://www.cs.princeton.edu/~funk/)*

IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017 **[Oral Presentation](https://www.youtube.com/watch?v=qNVZl7bCjsU&list=PL_bDvITUYucADb15njRd7geem8vxOyo6N&index=3)**

Matching local geometric features on real-world depth images is a challenging task due to the noisy, low-resolution, and incomplete nature of 3D scan data. These difficulties limit the performance of current state-of-the-art methods, which are typically based on histograms over geometric properties. In this paper, we present 3DMatch, a data-driven model that learns a local volumetric patch descriptor for establishing correspondences between partial 3D data. To amass training data for our model, we propose an unsupervised feature learning method that leverages the millions of correspondence labels found in existing RGB-D reconstructions. Experiments show that our descriptor is not only able to match local geometry in new scenes for reconstruction, but also generalizes to different tasks and spatial scales (e.g.
instance-level object model alignment for the Amazon Picking Challenge, and mesh surface correspondence). Results show that 3DMatch consistently outperforms other state-of-the-art approaches by a significant margin.

![Overview](overview.jpg?raw=true)

#### Citing

If you find this code useful in your work, please consider citing:

```shell
@inproceedings{zeng20163dmatch,
    title={3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions},
    author={Zeng, Andy and Song, Shuran and Nie{\ss}ner, Matthias and Fisher, Matthew and Xiao, Jianxiong and Funkhouser, Thomas},
    booktitle={CVPR},
    year={2017}
}
```

#### License

This code is released under the Simplified BSD License (refer to the LICENSE file for details).

#### Benchmarks and Datasets

All relevant information and downloads can be found [here](http://3dmatch.cs.princeton.edu/).

#### Contact

If you have any questions or find any bugs, please let me know: [Andy Zeng](http://www.cs.princeton.edu/~andyz/) andyz[at]princeton[dot]edu

## Change Log

* **Mar. 20, 2018.** Update: added labels for test-set of keypoint matching benchmark (for convenience).
* **Nov. 02, 2017.** Bug fix: added `#include <random>` to utils.hpp in demo code.
* **Oct. 30, 2017.** Bug fix: included Quoc-Huy's fix for NaN errors that occasionally occur during training.
* **Oct. 28, 2017.** Notice: demo code only reads 3D point clouds saved in a simple binary format. If you would like to run the 3DMatch demo code on your own point cloud format, please modify demo.cu accordingly.
* **Apr. 06, 2017.** Notice: 3DMatch uses cuDNN 5.1. Revised install instructions.

## Dependencies

Our reference implementation of 3DMatch, as well as other components in this toolbox, require the following dependencies. Tested on Ubuntu 14.04.

0. [CUDA 7.5](https://developer.nvidia.com/cuda-toolkit-archive) and [cuDNN 5.1](https://developer.nvidia.com/cudnn). You may need to register with NVIDIA. Below are some additional steps to set up cuDNN 5.1.
   **NOTE** We highly recommend that you install different versions of cuDNN to different directories (e.g., `/usr/local/cudnn/vXX`) because different software packages may require different versions.

   ```shell
   # Library directory is lib64 on Linux and lib elsewhere
   LIB_DIR=lib$([[ $(uname) == "Linux" ]] && echo 64)
   CUDNN_LIB_DIR=/usr/local/cudnn/v5.1/$LIB_DIR
   # Persist the library path for future shells, then load it into this one
   echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:$CUDNN_LIB_DIR" >> ~/.profile && source ~/.profile

   # Unpack cuDNN and copy it into the versioned directories
   tar zxvf cudnn*.tgz
   sudo mkdir -p $CUDNN_LIB_DIR /usr/local/cudnn/v5.1/include
   sudo cp cuda/$LIB_DIR/* $CUDNN_LIB_DIR/
   sudo cp cuda/include/* /usr/local/cudnn/v5.1/include/
   ```

0. OpenCV (tested with OpenCV 2.4.11)
   * Used for reading image files

0. Matlab 2015b or higher (tested with Matlab 2016a)

## Table of Contents

* [Demo: Align Two Point Clouds with 3DMatch](#demo-align-two-point-clouds-with-3dmatch)
* [Converting 3D Data to TDF Voxel Grids](#converting-3d-data-to-tdf-voxel-grids)
* [Training 3DMatch from RGB-D Reconstructions](#training-3dmatch-from-rgb-d-reconstructions)
* [Multi-Frame Depth TSDF Fusion](#multi-frame-depth-tsdf-fusion)
* [Evaluation Code](#evaluation-code)

## Demo: Align Two Point Clouds with 3DMatch

![Demo-Teaser](demo-teaser.jpg?raw=true)

This demo aligns two 3D point clouds (projected from single-view depth maps) using our pre-trained 3DMatch descriptor (with Marvin) and standard RANSAC.

### Instructions

0. Check out the 3DMatch toolbox, then compile the C++/CUDA demo code and Marvin

   ```shell
   git clone https://github.com/andyzeng/3dmatch-toolbox.git 3dmatch-toolbox
   cd 3dmatch-toolbox/core
   ./compile.sh
   ```

0. Download our 3DMatch pre-trained weights

   ```shell
   ./download-weights.sh # 3dmatch-weights-snapshot-137000.marvin
   ```

0. Load the two example 3D point clouds, compute their TDF voxel grid volumes, and compute random surface keypoints and their 3DMatch descriptors (saved to binary files on disk). Warning: this demo only reads 3D point clouds saved in a simple binary format. If you would like to run the 3DMatch demo code on your own point cloud format, please modify demo.cu accordingly.
   ```shell
   # Generate fragment-1.desc.3dmatch.bin and fragment-1.keypts.bin
   ./demo ../data/sample/3dmatch-demo/single-depth-1.ply fragment-1

   # Generate fragment-2.desc.3dmatch.bin and fragment-2.keypts.bin
   ./demo ../data/sample/3dmatch-demo/single-depth-2.ply fragment-2
   ```

0. Run the following script in Matlab:

   ```matlab
   % Load keypoints and 3DMatch descriptors and use RANSAC to register the two
   % point clouds. A visualization of the aligned point clouds is saved into
   % the file `result.ply` which can be viewed with Meshlab or any other 3D
   % viewer. Note: there is a chance that alignment may fail on the first try
   % of this demo due to bad keypoints, which are selected randomly by default.
   demo;
   ```

## Converting 3D Data to TDF Voxel Grids

Instructions on how to convert from various 3D data representations into a voxel grid of Truncated Distance Function (TDF) values.

0. Point cloud to TDF voxel grid (using nearest neighbor point distances)
   * See [C++/CUDA demo code](https://github.com/andyzeng/3dmatch-toolbox/blob/master/core/demo.cu) (ComputeTDF) which approximates TDF values (fast) using an occupancy voxel grid.
   * Alternative: See [Matlab/CUDA code](https://github.com/andyzeng/3dmatch-toolbox/blob/master/deprecated/pointCloud2AccTDF.m) which computes accurate TDF values but is very slow.
   * Alternative: See [Matlab code](https://github.com/andyzeng/3dmatch-toolbox/blob/master/evaluation/model-fitting-apc/pointCloud2TDF.m) which also computes accurate TDF values, but works standalone on Matlab. Usually runs without memory problems if your point cloud is small.

0. Mesh to TDF voxel grid (using distance transform of mesh surface with [GAPS](https://github.com/tomfunkhouser/gaps)). Note that a version of GAPS is already included in this repository.
* Instructions on installing GAPS and converting a sample mesh (.off file) into a voxel grid (binary .raw file of floats): ```shell cd 3dmatch-toolbox/gaps # Install GAPS make # Run msh2df on example mesh file (see comments in msh2df.cpp for more instructions) cd bin/x86_64 wget http://3dvision.princeton.edu/projects/2016/3DMatch/downloads/gaps/bicycle000002.off ./msh2df bicycle000002.off bicycle000002.raw -v # see comments in msh2df.cpp for more arguments # Download visualization script wget http://3dvision.princeton.edu/projects/2016/3DMatch/downloads/gaps/showTDF.m ``` * Run the visualization script in Matlab ```matlab % Visualize TDF voxel grid of mesh showTDF; ``` 0. Depth map to TDF voxel grid * Project depth map into a point cloud in 3D camera space and convert from point cloud to TDF voxel grid (see above) * Alternative: Convert from depth map(s) into a TSDF volume (see instructions [here](#multi-frame-depth-tsdf-fusion)) and compute the absolute value of each voxel (aka. projective TDF values, which behave differently near the view boundaries and regions of missing depth) ## Training 3DMatch from RGB-D Reconstructions See folder `3dmatch-toolbox/training` Code for training 3DMatch with [Marvin](http://marvin.is/), a lightweight GPU-only neural network framework. Includes Siamese network architecture .json file `training/net.json` and a CUDA/C++ Marvin data layer in `training/match.hpp` that randomly samples correspondences from RGB-D reconstruction datasets (which can be downloaded from our [project webpage](http://3dmatch.cs.princeton.edu/#rgbd-reconstruction-datasets)). ### Quick Start 0. Compile Marvin ```shell cd 3dmatch-toolbox/training ./compile.sh ``` 0. 
Download several training and testing scenes from RGB-D reconstruction datasets (download more scenes [here](http://3dmatch.cs.princeton.edu/#rgbd-reconstruction-datasets)) ```shell cd ../data mkdir train && mkdir test && mkdir backup cd train wget http://3dvision.princeton.edu/projects/2016/3DMatch/downloads/rgbd-datasets/sun3d-brown_cogsci_1-brown_cogsci_1.zip wget http://3dvision.princeton.edu/projects/2016/3DMatch/downloads/rgbd-datasets/7-scenes-heads.zip wget http://3dvision.princeton.edu/projects/2016/3DMatch/downloads/rgbd-datasets/sun3d-harvard_c11-hv_c11_2.zip unzip sun3d-brown_cogsci_1-brown_cogsci_1.zip unzip 7-scenes-heads.zip unzip sun3d-harvard_c11-hv_c11_2.zip mv *.zip ../backup cd ../test wget http://3dvision.princeton.edu/projects/2016/3DMatch/downloads/rgbd-datasets/sun3d-hotel_umd-maryland_hotel3.zip unzip sun3d-hotel_umd-maryland_hotel3.zip mv *.zip ../backup cd ../../training ``` 0. Train a 3DMatch model from scratch over correspondences from the RGB-D scenes saved in `data/train` ```shell ./marvin train net.json ``` 0. (Optional) Train 3DMatch using pre-trained weights from a Marvin tensor file ```shell ./marvin train net.json your-pre-trained-weights.marvin ``` ### Additional Setup Instructions You can download more scenes from RGB-D reconstruction datasets on our [project webpage](http://3dmatch.cs.princeton.edu/#rgbd-reconstruction-datasets). These datasets have been converted into a unified format, which is compatible with our Marvin data layer used to train 3DMatch. Save at least one scene into `data/train` and another scene into `data/test` such that the folder hierarchy looks something like this: ```shell |——— training |——— core |——— marvin.hpp |——— ... |——— data |——— train |——— rgbd-dataset-scene-1 |——— seq-01 |——— seq-02 |——— camera-intrinsics.txt |——— ... |——— ... |——— test |——— rgbd-dataset-scene-2 |——— seq-01 |——— camera-intrinsics.txt |——— ... 
```

## Multi-Frame Depth TSDF Fusion

See folder `3dmatch-toolbox/depth-fusion`

CUDA/C++ code to fuse multiple registered depth maps into a TSDF voxel volume ([Curless and Levoy 1996](http://graphics.stanford.edu/papers/volrange/volrange.pdf)), which can then be used to create surface meshes and point clouds.

### Demo

This demo fuses 50 registered depth maps from directory `data/sample/depth-fusion-demo/rgbd-frames` into a TSDF voxel volume, and creates a surface point cloud `tsdf.ply`

```shell
cd 3dmatch-toolbox/depth-fusion
./compile.sh
./demo # output saved to tsdf.ply
```

## Evaluation Code

See folder `3dmatch-toolbox/evaluation`

Evaluation code for the [Keypoint Matching Benchmark](http://3dmatch.cs.princeton.edu/#keypoint-matching-benchmark) and [Geometric Registration Benchmark](http://3dmatch.cs.princeton.edu/#geometric-registration-benchmark), as well as a reference implementation for the experiments in our [paper](https://arxiv.org/pdf/1603.08182.pdf).

### Keypoint Matching Benchmark

See folder `3dmatch-toolbox/evaluation/keypoint-matching`

Benchmark description and leaderboard can be found [here](http://3dmatch.cs.princeton.edu/#keypoint-matching-benchmark).

#### Evaluation Example

0. Navigate to `3dmatch-toolbox/evaluation/keypoint-matching` and run the following in Matlab:
   ```matlab
   % Evaluate 3DMatch (3dmatch.log) on the validation set (validation-set-gt.log)
   getError;
   ```

#### Run 3DMatch on the validation set to generate a .log file (3dmatch.log)

0. Compile C++/CUDA code to compute 3DMatch descriptors with Marvin
   ```shell
   cd 3dmatch-toolbox/evaluation/keypoint-matching
   ./compile.sh
   ```

0. Download our 3DMatch pre-trained weights
   ```shell
   ./download-weights.sh # 3dmatch-weights-snapshot-137000.marvin
   ```

0. Download the validation set and test set
   ```shell
   ./download-validation.sh # validation-set.mat
   ./download-test.sh # test-set.mat
   ```

0. Modify and run the following script in Matlab:
   ```matlab
   % Runs 3DMatch on the validation set and generates 3dmatch.log
   test3DMatch;
   ```

#### Generate your own correspondence dataset from [RGB-D reconstructions](http://3dmatch.cs.princeton.edu/#rgbd-reconstruction-datasets)

0. Download one or more scenes from RGB-D reconstruction datasets on our [project webpage](http://3dmatch.cs.princeton.edu/#rgbd-reconstruction-datasets). Organize the [folder hierarchy as above](#additional-setup-instructions).

0. Modify and run the following script in Matlab:
   ```matlab
   makeCorresDataset;
   ```

### Geometric Registration Benchmark

See folder `3dmatch-toolbox/evaluation/geometric-registration`

Includes Matlab code to run evaluation on the geometric registration benchmarks described [here](http://3dmatch.cs.princeton.edu/#geometric-registration-benchmark).

Overview:
* `getKeyptsAndDesc.m` - generates intermediate data (TDF voxel volumes, keypoints, and 3DMatch descriptors) for the scene fragments. You can also download our pre-computed data [here](http://3dmatch.cs.princeton.edu/#geometric-registration-synthetic-data).
* `runFragmentRegistration.m` - reads intermediate data and runs RANSAC-based registration for every pair of fragments.
* `writeLog` - reads registration results from every pair of fragments and creates a .log file
* `evaluate.m` - computes precision and recall from .log files for evaluation

#### Evaluation Example

Run the following in Matlab:

```matlab
% Evaluate 3DMatch on the geometric registration benchmark
evaluate;
```

Note: the TDF voxel grids of the scene fragments from the synthetic benchmark were computed using the deprecated code for accurate TDF (see `deprecated/pointCloud2AccTDF.m`). 3DMatch pre-trained weights fine-tuned on training fragments can be downloaded [here](http://3dvision.princeton.edu/projects/2016/3DMatch/downloads/weights/3dmatch-weights-snapshot-127000-fragments-6000.marvin).
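
The precision and recall figures reported for this benchmark can be illustrated with a small sketch. This is not the toolbox's Matlab code: it is written in Python, the function name `precision_recall` and the example pair lists are hypothetical, and it assumes a predicted fragment pair counts as correct simply when the same pair appears in the ground truth. The actual benchmark also checks the estimated transformation, which is not reproduced here.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[int, int]


def _normalize(pairs: Iterable[Pair]) -> Set[Pair]:
    """Treat (i, j) and (j, i) as the same fragment pair."""
    return {(min(i, j), max(i, j)) for i, j in pairs}


def precision_recall(predicted: Iterable[Pair],
                     ground_truth: Iterable[Pair]) -> Tuple[float, float]:
    """Precision and recall of predicted fragment pairs.

    Assumption for illustration only: a prediction is correct when the
    same pair is listed in the ground truth (no transformation check).
    """
    pred = _normalize(predicted)
    truth = _normalize(ground_truth)
    true_positives = len(pred & truth)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical registration output: four predicted overlapping pairs.
    predicted_pairs = [(0, 1), (1, 2), (2, 3), (0, 3)]
    # Hypothetical ground truth: three truly overlapping pairs.
    true_pairs = [(0, 1), (2, 1), (3, 4)]
    p, r = precision_recall(predicted_pairs, true_pairs)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Normalizing each pair to `(min, max)` keeps the result independent of the order in which fragments are listed.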
### Model Fitting for 6D Object Pose Estimation in the Amazon Picking Challenge

See folder `3dmatch-toolbox/evaluation/model-fitting-apc`

Includes code and pre-trained models to evaluate 3DMatch for model fitting on the [Shelf & Tote dataset](http://www.cs.princeton.edu/~andyz/apc2016). You can download our pre-computed data (TDF voxel grid volumes for objects and scans, surface keypoints, descriptors, and pose predictions) [here](http://3dvision.princeton.edu/projects/2016/3DMatch/downloads/apc-intermediate-data.zip). For an evaluation example, run the Matlab script `getError.m`.

### Mesh Correspondence in Shape2Pose

See folder `3dmatch-toolbox/evaluation/mesh-correspondence-shape2pose`

Includes code to generate mesh correspondence visualizations on the meshes from the [Shape2Pose dataset](http://gfx.cs.princeton.edu/gfx/pubs/Kim_2014_SHS/index.php) using 3DMatch. You can also download our pre-computed data (TDF voxel grid volumes of the meshes, surface keypoints, 3DMatch descriptors) [here](http://3dvision.princeton.edu/projects/2016/3DMatch/downloads/shape2pose.zip). For a quick visualization, run the Matlab script `keypointRetrieval.m`.
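
One common way to turn local descriptors into correspondences is to match each keypoint's descriptor to its nearest neighbor in descriptor space. The sketch below is a hedged Python illustration of that idea, not the contents of `keypointRetrieval.m`: the function name `match_descriptors`, the choice of Euclidean distance, and the toy 2-D descriptors are all assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple


def match_descriptors(
    source: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """For each source descriptor, find the nearest target descriptor.

    Returns one (target_index, distance) tuple per source descriptor,
    using brute-force search with Euclidean (L2) distance.
    """
    if not target:
        raise ValueError("target must contain at least one descriptor")
    matches = []
    for query in source:
        best_index, best_dist = -1, math.inf
        for index, candidate in enumerate(target):
            # L2 distance between the two descriptor vectors.
            dist = math.sqrt(sum((q - c) ** 2 for q, c in zip(query, candidate)))
            if dist < best_dist:
                best_index, best_dist = index, dist
        matches.append((best_index, best_dist))
    return matches


if __name__ == "__main__":
    # Toy descriptors (hypothetical values) for keypoints on two meshes.
    mesh_a = [[0.0, 0.0], [1.0, 1.0]]
    mesh_b = [[0.9, 1.1], [0.1, -0.1], [5.0, 5.0]]
    for src_index, (dst_index, dist) in enumerate(match_descriptors(mesh_a, mesh_b)):
        print(f"keypoint {src_index} on mesh A -> keypoint {dst_index} on mesh B")
```

Brute-force search is quadratic in the number of keypoints; for large descriptor sets a spatial index such as a k-d tree would be the usual substitute.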
