Python has become one of the main programming languages for data analytics, thanks to popular libraries such as NumPy, pandas, and scikit-learn. However, these tools were not designed to scale beyond a single machine, so they can struggle to analyze larger datasets of 100 GB or more. Dask offers an open-source solution to this problem: a framework that scales computations across many cores or machines, from a single laptop to the nodes of a supercomputer. Furthermore, Dask provides a familiar programming interface because it is co-developed with many popular scientific computing libraries.
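As a minimal sketch of that familiar interface, the same mean can be computed with NumPy and with `dask.array`. This assumes Dask is installed (for example via `pip install "dask[array]"`); the array length and chunk size are arbitrary values chosen only for illustration.

```python
import numpy as np
import dask.array as da

# NumPy: the whole array is materialized in memory and reduced immediately.
x_np = np.arange(1_000_000, dtype="float64")
mean_np = x_np.mean()

# Dask: the same API, but the array is split into 10 chunks of 100,000 elements.
# Calling .mean() only builds a lazy task graph; nothing is computed yet.
x_da = da.arange(1_000_000, dtype="float64", chunks=100_000)
lazy_mean = x_da.mean()

# .compute() executes the task graph (for arrays, Dask uses a local
# thread pool by default) and returns a concrete NumPy scalar.
mean_da = lazy_mean.compute()

print(mean_np, mean_da, x_da.numblocks)
```

The Dask version computes partial results on each chunk independently and then combines them, which is what allows the same code to scale to data that does not fit in memory or to run across a cluster.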
In this tutorial, we provide an interactive introduction to Dask. We will run code through Binder, so you do not need to install anything beforehand.