Data Preprocessing

Now that you've collected your data, you can preprocess it to make it compatible with the policy training code. The following steps assume you have a .zip file with your task data from a specific environment — see the "Data Saving and Uploading" section for more details on obtaining this file.

Note: the gripper processing models are quite large, so GPU resources are required.

Preprocessing from a .zip File

  1. Ensure you've pulled the latest version of the min-stretch codebase.
  2. Run the setup script in min-stretch to set the appropriate config paths.
    ./setup.sh
  3. Enter the data-collection folder.
    cd data-collection
  4. If you're using Greene, request GPU resources.
    • Request a GPU node, e.g.
      srun --nodes=1 --cpus-per-task=8 --mem=64GB --time=2:00:00 --gres=gpu:1 --pty /bin/bash
    • The command above requests 1 GPU with 8 CPUs and 64 GB of CPU memory for 2 hours.
    • Enter the Singularity container (this assumes you have an overlay filesystem; see here for more details). Be sure to include --nv, as Nvidia drivers are required for GPU use, e.g.
      singularity exec --nv --overlay $SCRATCH/overlay-home-robot-env.ext3:rw /scratch/work/public/singularity/cuda11.8.86-cudnn8.7-devel-ubuntu22.04.2.sif /bin/bash
  5. Create environment from config file
    mamba env create -f data_collection_env.yaml
  6. Activate your environment
    mamba activate data_collection
  7. Install ffmpeg within the environment
    conda install conda-forge::ffmpeg
  8. Set up co-tracker
    cd gripper/utils/co-tracker && pip install -e .
    mkdir -p checkpoints
    cd checkpoints
    wget --no-check-certificate https://huggingface.co/facebook/cotracker3/resolve/main/scaled_online.pth
    cd ../../../../
  9. Upload your .zip file to your server
    • If you're using VSCode, you can typically drag the file from your local machine to the desired location to upload it. Otherwise, use a tool like scp or rsync, as in the example below.
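    • For example, with scp (the user, host, and destination path below are placeholders — replace them with your own):
      scp /path/to/your/zip/file.zip <user>@<host>:/path/to/destination/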
  10. Run the following command to process the contents of the .zip file.
    python process_all_trajs.py -z /path/to/your/zip/file.zip
  11. The data should process at roughly 10 frames per second. Once processing finishes, you should see the following files in each demonstration folder:
    • compressed_np_depth_float32.bin
      • This contains the raw metric depth map from every frame compressed into a binary file.
    • compressed_video_h264.mp4
      • This is the compressed (256x256) video that will be used during training.
    • labels.json
      • This contains the iPhone/gripper state at every frame — translation, rotation, and gripper aperture.
    • rgb_rel_videos_exported.txt
      • This is an empty file indicating that everything was exported; it serves as a sanity check during data loading.
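
If you'd like to confirm that every demonstration finished processing, a quick check like the sketch below works. It assumes the extracted data sits under a single root directory (DATA_ROOT is a placeholder path) and locates demonstration folders via their labels.json, so a demo missing labels.json entirely will not be flagged.

    # Report any demonstration folder missing one of the expected outputs.
    DATA_ROOT=/path/to/extracted/data
    find "$DATA_ROOT" -name labels.json | while read -r labels; do
      demo_dir=$(dirname "$labels")
      for f in compressed_np_depth_float32.bin compressed_video_h264.mp4 rgb_rel_videos_exported.txt; do
        [ -f "$demo_dir/$f" ] || echo "Missing $f in $demo_dir"
      done
    done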

Format for Policy Learning

  1. Organize your data in the following format, as this is the structure expected by our data loading code.
    Stick_Data/
    |--- Task1_Name/
    |------ Home1/
    |-------- Env1/
    |----------- 2025-05-25-19_29_32/
    |----------- 2025-05-25-19_30_02/
    |----------- ...
    |-------- Env2/
    |----------- 2025-05-25-19_35_01/
    |----------- 2025-05-25-19_35_09/
    |----------- ...
    |------ Home.../
    |-------- Env1/
    |----------- 2025-05-25-19_33_32/
    |----------- 2025-05-25-19_34_02/
    |----------- ...
    |--- r3d_files.txt
    • An example of Task1_Name would be Door_Opening, and an example of Home1 would be CDS.
    • You can either create the above structure manually, or use the organize_data.sh script to organize your data into this format — just specify the SRC, DEST, TASK, HOME, and ENV variables within the script.
      ./organize_data.sh
  2. To generate the r3d_files.txt file shown above (used by the dataloader to determine which demos to load), run
    ./get_txt.sh YOUR_DATA_PATH
    In the example file structure above, YOUR_DATA_PATH would be the path to the Stick_Data folder.
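
As a final check, you can confirm that the number of demonstration folders under Stick_Data matches the number of entries in r3d_files.txt. The sketch below assumes the Task/Home/Env/timestamp nesting shown above and that get_txt.sh writes one demonstration per line; adjust it if either assumption doesn't hold for your setup.

    # Count demo folders (Task/Home/Env/<timestamp> = 4 levels below Stick_Data)
    # and compare against the number of entries in r3d_files.txt.
    STICK_DATA=/path/to/Stick_Data
    n_demos=$(find "$STICK_DATA" -mindepth 4 -maxdepth 4 -type d | wc -l)
    n_listed=$(wc -l < "$STICK_DATA/r3d_files.txt")
    echo "Demonstration folders found: $n_demos"
    echo "Entries in r3d_files.txt:    $n_listed"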