Cavemen

🗿 A prehistoric approach for Mapless Navigation 🗺️

New York University


(a) Overview of the data structure used for mapping during the exploration phase. (b) The top 12 resulting targets with their corresponding (x, y, w) values, where (x, y) is the displacement along the x and y axes and w is the rotation. The bottom four images show the target's front, right, back, and left views, respectively.


Abstract

Given a maze without a predefined map, our objective is to navigate to a target location using only visual information. The maze is presented to us in the form of a game with two modes: Exploration and Navigation. During the Exploration phase we use a dead-reckoning approach to estimate the robot's global position. We then recognize the target location with a visual bag-of-words approach, using a vocabulary trained with FAISS, and run the A* algorithm to generate a path to the target. The resulting map is presented to the user, who uses it to navigate to the target.


Video

Exploration


Overview of the system architecture of our solution. The current frame, pose, and graph data are published by player.py via Redis and consumed by processes running in parallel: plotter.py updates the robot's in-game position in real time, and arrow.py visualizes its orientation.
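The player/plotter split maps naturally onto Redis publish/subscribe. Below is a minimal sketch of that wiring, assuming redis-py; the channel name and message fields are illustrative choices, not the project's actual protocol.

import json
import redis

r = redis.Redis(host="localhost", port=6379)

def publish_state(frame_id, x, y, theta):
    """player.py side: broadcast the latest pose so plotter.py / arrow.py
    (subscribed in parallel processes) can redraw in real time."""
    r.publish("pose", json.dumps({"frame": frame_id, "x": x, "y": y, "theta": theta}))

def listen_for_poses():
    """plotter.py side: block on the channel and update the plot per message."""
    sub = r.pubsub()
    sub.subscribe("pose")
    for msg in sub.listen():
        if msg["type"] == "message":
            pose = json.loads(msg["data"])
            # ... update the live map with pose["x"], pose["y"], pose["theta"]
            print(pose)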


We manually explore the maze using visual inspection and manual control. We use the distance travelled and the amount rotated (rotation limited to 90-degree turns) in a dead-reckoning approach to compute the robot's global position. By index-matching each image with its pose, we create a geotagged list of pictures that we can later interpret for navigation, as sketched below.
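Because turns are restricted to multiples of 90 degrees, the heading can be stored as one of four unit vectors, which keeps the pose update trivial. The class and method names below are our own illustrative choices, not the project's code.

HEADINGS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # +x, +y, -x, -y unit vectors

class DeadReckoner:
    def __init__(self):
        self.x, self.y = 0.0, 0.0
        self.heading = 0            # index into HEADINGS
        self.geotagged = []         # (image, (x, y, rotation)) pairs

    def turn(self, quarter_turns):
        """Rotate in multiples of 90 degrees (positive = counter-clockwise)."""
        self.heading = (self.heading + quarter_turns) % 4

    def move(self, distance):
        """Advance along the current heading by the odometry distance."""
        dx, dy = HEADINGS[self.heading]
        self.x += dx * distance
        self.y += dy * distance

    def tag(self, image):
        """Index-match the current frame with the current pose estimate."""
        self.geotagged.append((image, (self.x, self.y, self.heading * 90)))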

Navigation


Composite view of all windows, allowing for the fastest navigation to the target.


We use a single recognition method: visual bag of words. To ensure accuracy, we train a vocabulary for each run using SIFT features. After training the vocabulary with FAISS, we create a histogram for every image seen during exploration. All of this heavy-duty processing happens during the mapping phase to save time in the navigation phase.
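As a rough illustration of this pipeline, the sketch below clusters SIFT descriptors into a visual vocabulary with FAISS's k-means and quantizes each image into a normalized histogram. It assumes OpenCV's SIFT, grayscale uint8 images, and a vocabulary of 256 visual words; these are our guesses, not the project's actual settings.

import cv2
import faiss
import numpy as np

def train_vocabulary(images, k=256):
    """Cluster SIFT descriptors from all exploration images into k visual words."""
    sift = cv2.SIFT_create()
    descs = []
    for img in images:                      # images: grayscale uint8 arrays
        _, d = sift.detectAndCompute(img, None)
        if d is not None:
            descs.append(d)
    descs = np.vstack(descs).astype("float32")
    kmeans = faiss.Kmeans(d=descs.shape[1], k=k, niter=20, verbose=False)
    kmeans.train(descs)
    return kmeans

def bow_histogram(image, kmeans, k=256):
    """Quantize one image's SIFT descriptors against the trained vocabulary."""
    sift = cv2.SIFT_create()
    _, d = sift.detectAndCompute(image, None)
    hist = np.zeros(k, dtype="float32")
    if d is not None:
        _, ids = kmeans.index.search(d.astype("float32"), 1)  # nearest centroid
        for i in ids.ravel():
            hist[i] += 1
    return hist / max(hist.sum(), 1.0)      # normalize for comparability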

Now, using the index-matched data structure described above, we can reliably compare the target images' histograms against the full list of histograms created earlier. We shortlist the top 3 candidates for each perspective and then compare them to the target by visual inspection.
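The comparison itself can also be served by FAISS: index the exploration histograms once, then query each target view for its nearest neighbors. The flat L2 index below is an assumed implementation detail on our part.

import faiss
import numpy as np

def build_histogram_index(exploration_hists):
    """exploration_hists: (N, k) float32 array, index-matched with poses."""
    index = faiss.IndexFlatL2(exploration_hists.shape[1])
    index.add(exploration_hists)
    return index

def shortlist(index, target_hist, top=3):
    """Return indices of the top-3 candidate frames for one target view."""
    _, ids = index.search(target_hist.reshape(1, -1).astype("float32"), top)
    return ids.ravel()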

We manually input the target we believe is the most accurate, and the A* algorithm then generates a path to it.
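For completeness, here is a minimal grid-based A* sketch with a Manhattan-distance heuristic. The occupancy-grid encoding (0 = free, 1 = wall) and 4-connectivity are assumptions; the project's actual graph representation may differ.

import heapq

def astar(grid, start, goal):
    """Return a list of (row, col) cells from start to goal, or None."""
    def h(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])  # Manhattan heuristic

    open_heap = [(h(start, goal), 0, start, None)]  # (f, g, node, parent)
    came_from, g_score = {}, {start: 0}
    while open_heap:
        _, g, cur, parent = heapq.heappop(open_heap)
        if cur in came_from:                # already expanded with a better g
            continue
        came_from[cur] = parent
        if cur == goal:                     # walk parents back to the start
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0 and nxt not in came_from):
                ng = g + 1
                if ng < g_score.get(nxt, float("inf")):
                    g_score[nxt] = ng
                    heapq.heappush(open_heap, (ng + h(nxt, goal), ng, nxt, cur))
    return None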

Citation

If you use this work or find it helpful, please consider citing (BibTeX):

@software{Bronfman_Cavemen_A_2023,
    author = {Bronfman, Gabriel and Gupta, Shubham},
    month = dec,
    title = {{🗿 Cavemen: A prehistoric approach for Mapless Navigation}},
    url = {https://github.com/gabriel-bronfman/cavemen},
    version = {1.0.0},
    year = {2023}
}

Acknowledgements

The website template is inspired by SO-NeRF, LERF and MERF. The project is built from the starter code released by Professor Chen Feng and the TAs for the course Robot Perception.