Reproducible Research


Below are the top discussions from Reddit that mention this online Coursera course from Johns Hopkins University.

This course focuses on the concepts and tools behind reporting modern data analyses in a reproducible manner.

Knitr Data Analysis R Programming Markup Language


Taught by
Roger D. Peng, PhD
Associate Professor, Biostatistics
and 2 more instructors

Offered by
Johns Hopkins University

Reddit Posts and Comments

0 posts • 9 mentions • top 3 shown below

r/gis • comment
2 points • GeospatialDaryl

To paraphrase Jerry from 'Parks & Rec': it's not GIS if we don't do it twice.


In the first 5-6 years of my GIS career I evolved a number of techniques to try to protect myself from this. As others have noted, it's all about communication with the client.


For example: anytime I'm making tables that summarize over geometries (which is a lot of the time), I make simple vector maps with appropriate elements, zoomed in, and then schedule time to go over them carefully. That's just one example, but including that step in my workflow considerably reduced the 'oh, this boundary is wrong... can you just update this table?' No, no I can't. But I can RE-RUN THE ENTIRE ANALYSIS!


So yeah - the more you script stuff (look up the Reproducible Research folks - mostly in R) - the less you redo. It also helps to reduce the workflow to 'chunks' that are more common, and then script those. Based on what you need, you can pull up that script from your quiver, adjust a few lines at the top, and be good to go.
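The "adjust a few lines at the top, pull the script from your quiver" pattern can be sketched like this. This is a minimal illustration, not from any particular GIS project; the column names and data are hypothetical:

```python
# Sketch of a reusable "chunk" script: the only lines you touch between
# projects are the parameters at the top. All names here are hypothetical.

# --- Parameters: adjust these few lines per project ---------------------
GROUP_COLUMN = "district"   # hypothetical grouping key
VALUE_COLUMN = "area_ha"    # hypothetical numeric column to summarize
# ------------------------------------------------------------------------

def summarize(rows, group_col, value_col):
    """Sum a numeric column per group; deterministic for a given input."""
    totals = {}
    for row in rows:
        key = row[group_col]
        totals[key] = totals.get(key, 0.0) + float(row[value_col])
    return totals

# Rerunning the whole analysis is just calling the script again on the
# (possibly corrected) input data, rather than hand-editing output tables.
sample = [
    {"district": "A", "area_ha": "1.5"},
    {"district": "A", "area_ha": "2.5"},
    {"district": "B", "area_ha": "3.0"},
]
print(summarize(sample, GROUP_COLUMN, VALUE_COLUMN))
```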


In R or Python Jupyter notebooks we can embed the analysis within the document itself. All intermediate products (steps in the analysis) are scratch products, deterministic derivatives of the input data. That makes the write-up easy: go through, hide unneeded code cells, add text cells in Markdown, and render to PDF or HTML.
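One way to read "deterministic derivatives of the input data" is that rerunning the pipeline from the raw input must regenerate identical intermediates. A minimal sketch, with made-up pipeline steps, that checks this by fingerprinting the products:

```python
# Sketch: intermediates as throwaway, deterministic derivatives of the
# input, verified by hashing. The two pipeline steps are hypothetical.
import hashlib
import json

def step_clean(data):
    """Intermediate 1 (scratch product): drop missing values, sort."""
    return sorted(x for x in data if x is not None)

def step_summary(cleaned):
    """Intermediate 2 (scratch product): summary statistics."""
    return {"n": len(cleaned), "total": sum(cleaned)}

def fingerprint(obj):
    """Stable hash of any JSON-serializable product."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

raw = [3, None, 1, 2]
run1 = fingerprint(step_summary(step_clean(raw)))
run2 = fingerprint(step_summary(step_clean(raw)))  # rerun from scratch
assert run1 == run2  # same input data, same scratch products
```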

r/dataengineering • comment
1 point • Thaufas

> Glad u agree. The DE industry needs more software engineers and fewer monkeys that click icons.

Definitely! I require all of my DS staff to adhere to the principles of Reproducible Research for all of their analyses. They must make the original data available along with source code and specifications for all software used so that anyone else reading their summary can reproduce all figures and calculated results to a level of precision that is within the floating point error limits of their CPU. With virtualization, the results can often be reproduced exactly.
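A minimal sketch of that acceptance check, with made-up numbers: compare a re-computed result to the reported one using a relative tolerance rather than exact equality, since bit-for-bit agreement is realistic only under an identical (e.g. virtualized) environment:

```python
# Sketch of "reproduce to within floating-point error". The reported
# value and the data are hypothetical.
import math

published_mean = 2.3333333333     # value as reported in a write-up

def recompute_mean(values):
    return sum(values) / len(values)

reproduced = recompute_mean([1.0, 2.0, 4.0])  # rerun from the raw data

# math.isclose with a small rel_tol captures "within floating-point
# error limits"; exact equality would often fail across CPUs/compilers.
assert math.isclose(reproduced, published_mean, rel_tol=1e-9)
```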

Personally, I don't like the term reproducible research because it has a related, but different connotation within the broader scientific fields.

Specifically, metrological experts (i.e., measurement science experts) have defined terms for measurement reproducibility and repeatability over the years. Even so, there is often little agreement among the different standards that have been developed, but, fortunately, the BIPM is slowly harmonizing these definitions, and they will become ISO standards. This effort will literally take decades, but the result will be valuable.

r/LaTeX • comment
1 point • magtk

TL;DR It is suitable for all STEM fields, data science included. You can use Git when writing in LaTeX (or Markdown).

It is PERFECT in connection with R. I use it mostly (hm.. about 8 years in computer/computational/data science), and it saves me a lot of time! For relatively simple documents AND for publishing directly to the web I use Markdown (md) - try it online. For reports, books with equations, and other specific needs (generally more demanding and more complicated: (cross-)references, citations, etc.) I use LaTeX for typesetting.

You can use R or Python with Markdown when using Jupyter. R code can be embedded directly into LaTeX using knitr. You save time because you do not have to copy and paste code snippets, charts, tables, or calculation results: they are inserted into the text on the fly :) Very useful when you want to do the analysis the same way, reproducibly, generating identical reports when the data change... just by replacing the data source.
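A minimal .Rnw sketch of R embedded in LaTeX via knitr: the chunk and the inline \Sexpr value are recomputed every time the document is compiled, so the numbers in the text always match the data. The iris example here is only illustrative:

```latex
\documentclass{article}
\begin{document}

% A knitr code chunk: the R code between <<...>>= and @ is executed at
% compile time and its results are spliced into the PDF.
<<sepal-mean, echo=FALSE>>=
m <- mean(iris$Sepal.Length)
@

% \Sexpr{} inlines a computed value directly into the prose, so there is
% no copy-and-paste step between analysis and write-up.
The mean sepal length is \Sexpr{round(m, 2)} cm.

\end{document}
```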

Using TikZ you can also make animated pictures, charts, diagrams, etc. (with the help of TikzEdt/GeoGebra). With Beamer you can make presentations with different themes (or with powerdot). You can even write a book/script and a presentation at the same time in LaTeX (the only difference is in compiling)!

You can try LaTeX online with Overleaf or Papeeria (for example). Overleaf has good support and templates for LaTeX documents.

Take a look at this Coursera course. Everything shown there in md can also be written in LaTeX.

Personally, I use TeXstudio for local work with LaTeX, synchronized with a repo on GitLab and connected with projects on Overleaf. This also lets me collaborate with others while writing documents.

For computations and data science (prototyping) I use Jupyter (md + R/Py) or RStudio (R + LaTeX, .Rnw or .Snw files, or R + md, .Rmd files) with the knitr library. To make final reports and books, when necessary, I convert md to tex with pandoc and do the fine-tuning in LaTeX.

It took some time to learn, but later it saves you a lot of time. And the final documents are... beautiful.

If you need more info - drop me a line.