Fundamentals of Parallelism on Intel Architecture


Below are the top discussions from Reddit that mention this online Coursera course from Intel.

Offered by Intel. This course will introduce you to the multiple forms of parallelism found in modern Intel architecture processors and ...


Taught by
Andrey Vladimirov
Head of High-Performance Computing Research
and 13 more instructors

Offered by
Intel

Reddit Posts and Comments

0 posts • 2 mentions • top 2 shown below

r/cpp_questions • comment
3 points • 0mega0

I found this course on coursera to be a great primer: Fundamentals of Parallelism on Intel Architecture

It teaches basic vectorization and parallel computing using OpenMP and MPI.

I was quite disappointed that the next course wasn't released, as the instructor is great.
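As a rough illustration of the two topics this comment names, the C sketch below combines an OpenMP `parallel for` (spreading iterations across threads) with `simd` (asking the compiler to vectorize each thread's chunk). The function names `saxpy` and `dot` are illustrative and not taken from the course; the code assumes a compiler with OpenMP 4.0 support, e.g. GCC with `-fopenmp`. Without that flag the pragmas are ignored and the loops run serially with the same results.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i]. "parallel for" splits the iterations among
 * threads; "simd" asks for vector instructions within each thread's
 * chunk. restrict promises x and y do not overlap. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
#pragma omp parallel for simd
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Dot product. reduction(+ : sum) gives every thread (and SIMD lane) a
 * private partial sum, and OpenMP adds the partial sums together at the
 * end, so no locking is needed. Note that the summation order, and thus
 * rounding, can differ from a serial loop. */
double dot(size_t n, const double *restrict x, const double *restrict y)
{
    double sum = 0.0;
#pragma omp parallel for simd reduction(+ : sum)
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```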

r/CFD • comment
1 point • agoki

Some of this will repeat what has already been said. My advice would be to take a why-what-how approach.

Why do you want to parallelize? It is clear that you want to do 3D simulations and that they are cost-prohibitive on a single core. Other questions to think about: What runtimes do you expect for the 3D simulation? Would a single powerful workstation be good enough? If so, OpenMP may suffice: within a single node, MPI and OpenMP should give you similar performance if implemented correctly, and neither has an inherent advantage over the other there. Is this an academic code that you don't see being used in the future (i.e., is parallelization just a means to get a fast solution, and you don't care much about scalability)? Or do you want to set the standards now so that the code is still being used on a cluster 10+ years from now? Think about these questions, and talk to your Prof. and others in your group.

What are you going to use for the methodology: pure MPI, pure OpenMP, or MPI+X? From your question, it looks like you may already have an OpenMP version of your 2D solver, but it is not efficient. Do you know where the inefficiencies come from? Lack of sufficient parallel work? Synchronization overheads? Load imbalance? False sharing? Do you use a first-touch policy? Have you used a profiling tool to narrow it down and fix it? If you can fix that, then you could go with hybrid MPI+OpenMP for the 3D solver if you think you need the compute power of a cluster. Pure MPI or pure OpenMP may not scale well where a hybrid may, because each approach offsets the other's disadvantages.
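Two of the inefficiencies listed above, first-touch placement and false sharing, can be sketched in a few lines of C with OpenMP. Under a first-touch policy (the Linux default), a page is placed on the NUMA node of the thread that first writes it, so the data is initialized in parallel with the same `schedule(static)` as the compute loops. The OpenMP specification only guarantees an identical iteration-to-thread mapping within a single parallel region; this sketch assumes it also holds across the separate loops, which is typical when the thread count is unchanged and threads are pinned (e.g. `OMP_PROC_BIND=true`). The function names are hypothetical, chosen for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocates n doubles and sets a[i] = i in parallel, so each page is
 * first written (and therefore placed) by the thread that will later
 * work on it. malloc alone does not place pages; the first write does. */
double *alloc_first_touch(size_t n)
{
    double *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;
#pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++)
        a[i] = (double)i;
    return a;
}

/* Compute loop with the same schedule(static) over the same n, so each
 * thread tends to touch the pages it initialized above. */
void scale(size_t n, double *a, double s)
{
#pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++)
        a[i] *= s;
}

/* Counts entries above a threshold. A tempting alternative is a shared
 * array counts[nthreads] where each thread increments its own slot;
 * adjacent slots sit on the same cache line, so cores keep invalidating
 * each other's copies (false sharing). The reduction clause instead
 * gives each thread a private counter and combines them once at the end. */
size_t count_above(size_t n, const double *a, double threshold)
{
    size_t count = 0;
#pragma omp parallel for schedule(static) reduction(+ : count)
    for (size_t i = 0; i < n; i++)
        if (a[i] > threshold)
            count++;
    return count;
}
```

A profiler is still the way to confirm which of these actually matters for a given solver; the sketch only shows what the fixes look like.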

How are you going to implement it? If you are going to spend 90+% of the time inverting a matrix (solving linear systems), it is best to go with PETSc or other scalable tools out there. For domain decomposition, use external tools like METIS.

If you want to use accelerators, it is important that your code be vectorized. Compilers can give you reports on vectorization, and sometimes small changes can make a big impact.
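One example of a "small change" of the kind this comment alludes to is switching from an array-of-structures to a structure-of-arrays layout and marking non-overlapping pointers with `restrict`. The sketch below shows both layouts for the same computation (squared distance from the origin); the names `point`, `norm2_aos` and `norm2_soa` are hypothetical. How much each version vectorizes depends on the compiler and target; with GCC, compiling with `-O3 -fopt-info-vec-missed` reports loops it did not vectorize, and Intel's compilers offer a similar optimization report.

```c
#include <stddef.h>

/* Array of structures (AoS): x, y, z of one point are adjacent, so
 * reading all x values is a stride-3 access. Compilers often need
 * shuffles or gathers for this, or skip vectorizing it. */
struct point { double x, y, z; };

void norm2_aos(size_t n, const struct point *p, double *r2)
{
    for (size_t i = 0; i < n; i++)
        r2[i] = p[i].x * p[i].x + p[i].y * p[i].y + p[i].z * p[i].z;
}

/* Structure of arrays (SoA): each coordinate is its own contiguous
 * array, so every load is unit-stride and maps directly onto vector
 * registers. restrict promises the output does not overlap the inputs,
 * so the compiler does not have to guard against aliasing. */
void norm2_soa(size_t n, const double *restrict x, const double *restrict y,
               const double *restrict z, double *restrict r2)
{
    for (size_t i = 0; i < n; i++)
        r2[i] = x[i] * x[i] + y[i] * y[i] + z[i] * z[i];
}
```

Both functions compute the same values; only the memory layout, and therefore how easily the loop vectorizes, differs.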

Intel has good (and probably all free for academia) resources: Inspector, VTune Amplifier, Advisor, and Trace Analyzer & Collector.

There is a course as well - https://www.coursera.org/learn/parallelism-ia