Data Preprocessing

Now that you've collected your data, you can preprocess it to make it compatible with the policy training code. The following steps assume you have a .zip file with your task data from a specific environment — see the "Data Saving and Uploading" section for more details on obtaining this file.

Note: the gripper processing models are quite large, so GPU resources are required.

Preprocessing from a .zip File

  1. Ensure you've pulled the latest version of the min-stretch codebase.
  2. Run the setup script in min-stretch to set the appropriate config paths.
    ./setup.sh
  3. Enter the data-collection folder.
    cd data-collection
  4. If you're using Greene, request GPU resources.
    • Request a GPU node, e.g.
      srun --nodes=1 --cpus-per-task=8 --mem=64GB --time=2:00:00 --gres=gpu:1 --pty /bin/bash
    • The command above requests 1 GPU with 8 CPUs and 64 GB of CPU memory for 2 hours.
    • Enter the Singularity container (this assumes you have an overlay filesystem; see here for more details). Be sure to include --nv, as Nvidia drivers are required for GPU use, e.g.
      singularity exec --nv --overlay $SCRATCH/overlay-home-robot-env.ext3:rw /scratch/work/public/singularity/cuda11.8.86-cudnn8.7-devel-ubuntu22.04.2.sif /bin/bash
  5. Create environment from config file
    mamba env create -f data_collection_env.yaml
  6. Activate your environment
    mamba activate data_collection
  7. Install ffmpeg within the environment
    conda install conda-forge::ffmpeg
  8. Set up co-tracker
    cd gripper/utils/co-tracker && pip install -e .
    mkdir -p checkpoints
    cd checkpoints
    wget --no-check-certificate https://huggingface.co/facebook/cotracker3/resolve/main/scaled_online.pth
    cd ../../../../
  9. Upload your .zip file to your server
    • If you're using VSCode, you can typically drag the file from your local machine to the desired location to upload it. Otherwise, use a tool like scp or rsync, as in the example below.
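    • For example, with scp (the user, host, and destination path below are placeholders — replace them with your own):
      scp /path/to/your/zip/file.zip <user>@<host>:/path/to/destination/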
  10. Run the following command to process the contents of the .zip file.
    python process_all_trajs.py -z /path/to/your/zip/file.zip
  11. The data should process at roughly 10 frames per second. Once processing finishes, you should see the following files in each demonstration folder:
    • compressed_np_depth_float32.bin
      • This contains the raw metric depth map from every frame compressed into a binary file.
    • compressed_video_h264.mp4
      • This is the compressed (256x256) video that will be used during training.
    • labels.json
      • This contains the iPhone/gripper state at every frame — translation, rotation, and gripper aperture.
    • rgb_rel_videos_exported.txt
      • This is an empty file indicating that everything was exported; it serves as a sanity check during data loading.
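
If you'd like to confirm that every demonstration finished processing, a quick check like the sketch below works. It assumes the extracted data sits under a single root directory (DATA_ROOT is a placeholder path) and locates demonstration folders via their labels.json, so a demo missing labels.json entirely will not be flagged.

    # Report any demonstration folder missing one of the expected outputs.
    DATA_ROOT=/path/to/extracted/data
    find "$DATA_ROOT" -name labels.json | while read -r labels; do
      demo_dir=$(dirname "$labels")
      for f in compressed_np_depth_float32.bin compressed_video_h264.mp4 rgb_rel_videos_exported.txt; do
        [ -f "$demo_dir/$f" ] || echo "Missing $f in $demo_dir"
      done
    done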

Format for Policy Learning

  1. Organize your data in the following format, as this is the structure expected by our data loading code.
    Stick_Data/
    |--- Task1_Name/
    |------ Home1/
    |-------- Env1/
    |----------- 2025-05-25-19_29_32/
    |----------- 2025-05-25-19_30_02/
    |----------- ...
    |-------- Env2/
    |----------- 2025-05-25-19_35_01/
    |----------- 2025-05-25-19_35_09/
    |----------- ...
    |------ Home.../
    |-------- Env1/
    |----------- 2025-05-25-19_33_32/
    |----------- 2025-05-25-19_34_02/
    |----------- ...
    |--- r3d_files.txt
    • An example of Task1_Name would be Door_Opening, and an example of Home1 would be CDS.
    • You can either create the above structure manually, or use the organize_data.sh script to organize your data into this format — just specify the SRC, DEST, TASK, HOME, and ENV variables within the script.
      ./organize_data.sh
  2. To generate the r3d_files.txt file shown above (used by the dataloader to determine which demos to load), run
    ./get_txt.sh YOUR_DATA_PATH
    In the example file structure above, YOUR_DATA_PATH would be the path to the Stick_Data folder.
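
As a final check, you can confirm that the number of demonstration folders under Stick_Data matches the number of entries in r3d_files.txt. The sketch below assumes the Task/Home/Env/timestamp nesting shown above and that get_txt.sh writes one demonstration per line; adjust it if either assumption doesn't hold for your setup.

    # Count demo folders (Task/Home/Env/<timestamp> = 4 levels below Stick_Data)
    # and compare against the number of entries in r3d_files.txt.
    STICK_DATA=/path/to/Stick_Data
    n_demos=$(find "$STICK_DATA" -mindepth 4 -maxdepth 4 -type d | wc -l)
    n_listed=$(wc -l < "$STICK_DATA/r3d_files.txt")
    echo "Demonstration folders found: $n_demos"
    echo "Entries in r3d_files.txt:    $n_listed"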