This paper presents a hybrid parallel breadth-first search (BFS) algorithm for distributed-memory systems that uses two stacks. BFS is a building block for many graph algorithms, and parallelizing it matters most when graphs are large. The paper's contributions include a 1D partitioning approach to graph representation. The hybrid algorithm assigns vertices to processors and uses local stacks to parallelize edge visits, which balances load. Experiments on large systems show that the hybrid 1D approach scales better than a 2D approach and is faster than a flat 1D implementation at higher processor counts. The paper concludes that BFS can be implemented without errors using relaxed queues, and it poses open questions about how to bound errors when level synchronization is removed.
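
To make the 1D partitioning idea concrete, the following is a minimal single-process sketch and not the paper's implementation. It assumes a cyclic ownership rule (`v % num_procs`), simulates the processors sequentially, and uses plain lists as stand-ins for the all-to-all exchange; the names `owner`, `bfs_1d`, and `inbox` are hypothetical. Each simulated processor keeps a current-level stack and a next-level stack, which is one plausible reading of the "two stacks" mentioned above.

```python
def owner(v, num_procs):
    # Assumed 1D (cyclic) partitioning rule: vertex v belongs to processor v mod P.
    return v % num_procs


def bfs_1d(adj, source, num_procs):
    """Level-synchronous BFS over a 1D vertex partition, simulated in one process.

    adj maps each vertex to a list of its neighbours. Returns a dict that maps
    every reachable vertex to its BFS level (distance from source).
    """
    # In a real distributed run each owner would store levels only for its own
    # vertices; a single shared dict keeps this sketch short.
    dist = {source: 0}

    # Stack 1 of 2: the current frontier, one stack per simulated processor.
    current = [[] for _ in range(num_procs)]
    current[owner(source, num_procs)].append(source)

    level = 0
    while any(current):
        # Stack 2 of 2: the next frontier, filled by each vertex's owner.
        next_stacks = [[] for _ in range(num_procs)]
        # inbox[q] collects vertices sent to processor q during this level.
        inbox = [[] for _ in range(num_procs)]

        # Expansion: each processor pops its local stack and visits the edges.
        for p in range(num_procs):
            stack = current[p]
            while stack:
                u = stack.pop()
                for v in adj.get(u, ()):
                    inbox[owner(v, num_procs)].append(v)

        # Exchange (stand-in for communication): owners filter out vertices
        # that were already visited and push new ones onto the next stack.
        for q in range(num_procs):
            for v in inbox[q]:
                if v not in dist:
                    dist[v] = level + 1
                    next_stacks[q].append(v)

        # Level barrier: the next frontier becomes the current frontier.
        current = next_stacks
        level += 1

    return dist


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3], 5: []}
    print(bfs_1d(graph, source=0, num_procs=2))
```

The explicit swap of stacks at the end of each iteration is the level synchronization; removing that barrier is the setting in which the closing question about bounding errors arises.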