MeNDA is a near-memory architecture that places processing units in DIMM buffer chips to perform sparse matrix transposition and sparse matrix-vector multiplication (SpMV) using a multi-way merge algorithm. It scales by exploiting rank-level and DIMM-level parallelism. In the evaluation, MeNDA achieves speedups of 19x, 12x, and 8x over CPU, GPU, and state-of-the-art SpMV accelerator implementations, respectively. By enabling in-situ processing, it also reduces the transposition overhead in graph analytics from 126% to 5%.
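To illustrate why a multi-way merge can perform sparse transposition, the following is a minimal software sketch, not MeNDA's hardware design. It assumes the input is stored in CSR format with column indices sorted within each row; the function name `transpose_csr_by_merge` and its parameters are hypothetical and chosen only for this example. Each row is treated as a stream already sorted by column index, so merging all rows by (column, row) emits the nonzeros in column-major order, which is exactly the CSR layout of the transposed matrix.

```python
import heapq


def transpose_csr_by_merge(n_rows, n_cols, indptr, indices, data):
    """Transpose a CSR matrix by k-way merging its rows (illustrative only).

    Assumes column indices are sorted within each row.
    Returns (indptr, indices, data) of the transposed matrix in CSR form.
    """

    def row_stream(r):
        # One sorted stream per row, keyed by (column, row).
        for k in range(indptr[r], indptr[r + 1]):
            yield (indices[k], r, data[k])

    # Merge all row streams into a single stream ordered by (column, row).
    merged = heapq.merge(*(row_stream(r) for r in range(n_rows)))

    t_indptr = [0] * (n_cols + 1)
    t_indices = []
    t_data = []
    for col, row, val in merged:
        t_indptr[col + 1] += 1   # count nonzeros per output row
        t_indices.append(row)    # the old row becomes the new column
        t_data.append(val)

    # Prefix sum turns per-row counts into row pointers.
    for c in range(n_cols):
        t_indptr[c + 1] += t_indptr[c]
    return t_indptr, t_indices, t_data


# Usage: the 2x3 matrix [[0, 7, 0], [8, 0, 9]] in CSR form.
result = transpose_csr_by_merge(2, 3, [0, 1, 3], [1, 0, 2], [7, 8, 9])
```

In this sketch a single heap merges every row at once. A design with a bounded number of merge ways would presumably need several merge passes over partially merged streams; that detail is an assumption here and is not taken from the text above.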