Some of this will repeat what has already been said. My advice would be to go with a why-what-how approach.
Why do you want to parallelize? It is clear that you want to do 3D simulations and that this is cost-prohibitive on a single core. Other questions to think about: What runtimes do you expect for the 3D simulation? Would a single powerful workstation be good enough (in which case OpenMP may suffice; within a single node, MPI and OpenMP should give you similar performance if implemented correctly, since there is no inherent advantage of one over the other within a node)? Is this an academic code that you don't see being used in the future (i.e., parallelization is just a means to get a fast solution and you don't care much about scalability)? Or do you want to set the standards now so that the code is still being used on a cluster 10+ years from now? Think about these questions and talk to your Prof. and others in your group.
What are you going to use for the methodology? Pure MPI, pure OpenMP, or hybrid MPI+X? From your question it looks like you may have an OpenMP version of your 2D solver, but it is not efficient. Do you know where the inefficiencies come from? Lack of sufficient parallel work? Synchronization overheads? Load imbalance? False sharing? Do you use a first-touch policy (see the sketch below)? Have you used a profiling tool to narrow it down and fix it? If you can fix that, then you could go with hybrid MPI-OpenMP for the 3D solver if you think you need to take advantage of the compute power of a cluster. Pure MPI or pure OpenMP alone may not scale well, but a hybrid can, by negating each other's disadvantages.
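On the first-touch point: on Linux, memory pages are usually placed on the NUMA node of the thread that first writes them, so the initialization loop should use the same threads and static schedule as the compute loop. A minimal sketch (the loop is purely illustrative, not your solver):

```c
/* Sketch: first-touch initialization for an OpenMP stencil-style loop.
 * Compile with -fopenmp. Array sizes and the update are illustrative. */
#include <stdlib.h>

int main(void)
{
    const long n = 100000000;
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);

    /* First touch: initialize with the same static schedule as the compute
     * loop, so each page is faulted in by the thread that will later use it. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        a[i] = 0.0;
        b[i] = 1.0;
    }

    /* Compute loop: the matching schedule keeps each thread's data local
     * to its own NUMA node. */
    #pragma omp parallel for schedule(static)
    for (long i = 1; i < n - 1; ++i)
        a[i] = 0.5 * (b[i - 1] + b[i + 1]);

    free(a);
    free(b);
    return 0;
}
```

Initializing the arrays serially (or with a different schedule) would put all pages on one NUMA node and can easily cost you a large fraction of the memory bandwidth.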
How are you going to implement it? If you are going to spend 90+% of the time inverting (i.e., solving) a matrix, then it is best to go with PETSc or other scalable tools out there. For domain decomposition, use external tools like METIS. A rough sketch of the PETSc workflow is below.
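A hedged sketch of the usual PETSc KSP workflow (this assumes a recent PETSc with the `PetscCall` macro; the 1D Laplacian, sizes, and run-time options are only illustrative, not your discretization):

```c
static char help[] = "Solve a 1D Laplacian with KSP (illustrative).\n";
#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat      A;
  Vec      x, b;
  KSP      ksp;
  PetscInt n = 1000, Istart, Iend, i;

  PetscCall(PetscInitialize(&argc, &argv, NULL, help));

  /* Distributed sparse matrix; PETSc decides the row partitioning. */
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
  PetscCall(MatSetFromOptions(A));
  PetscCall(MatSetUp(A));

  /* Each MPI rank fills only the rows it owns. */
  PetscCall(MatGetOwnershipRange(A, &Istart, &Iend));
  for (i = Istart; i < Iend; ++i) {
    if (i > 0)     PetscCall(MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES));
    if (i < n - 1) PetscCall(MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES));
    PetscCall(MatSetValue(A, i, i, 2.0, INSERT_VALUES));
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

  /* Vectors with a layout matching the matrix. */
  PetscCall(MatCreateVecs(A, &x, &b));
  PetscCall(VecSet(b, 1.0));

  /* Krylov solver; method and preconditioner can be picked at run time,
     e.g. -ksp_type cg -pc_type jacobi. */
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetFromOptions(ksp));
  PetscCall(KSPSolve(ksp, b, x));

  PetscCall(KSPDestroy(&ksp));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}
```

The point is that the solver, preconditioner, and parallel layout are all handled by the library and switchable from the command line, so your code does not have to reinvent them.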
If you want to use accelerators, it is important that your code be vectorized. Compilers can give you reports on this, and sometimes small changes make a big impact.
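As a rough illustration of the kind of small change that matters (the function and flags below are illustrative; check your compiler's documentation for its exact report options):

```c
/* Sketch of a vectorization-friendly loop. */
#include <stddef.h>

/* 'restrict' tells the compiler the arrays do not alias, which is often the
 * small change that lets the loop vectorize at all. */
void axpy(size_t n, double a,
          const double *restrict x, double *restrict y)
{
    #pragma omp simd            /* explicit SIMD request (needs OpenMP SIMD support) */
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* Typical vectorization-report flags (verify against your compiler's docs):
 *   GCC:   gcc  -O3 -fopenmp-simd -fopt-info-vec-missed       -c axpy.c
 *   Clang: clang -O3 -fopenmp-simd -Rpass-missed=loop-vectorize -c axpy.c
 *   Intel: icx  -O3 -qopenmp-simd -qopt-report=3               -c axpy.c
 */
```

The "missed" variants of the reports are the useful ones: they tell you which loops did not vectorize and why (aliasing, non-unit strides, function calls in the loop body, etc.).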
Intel has good resources (probably all free for academia): Inspector, VTune Amplifier, Advisor, and Trace Analyzer & Collector.
There is a course as well - https://www.coursera.org/learn/parallelism-ia