Dask-ML dask-ml 2021.11.31 documentation

Dask-ML provides scalable machine learning in Python using Dask alongside popular machine learning libraries like Scikit-Learn, XGBoost, and others.

People may run into scaling challenges along a couple of dimensions, and Dask-ML offers tools for addressing each.

The first kind of scaling challenge comes when your models grow so large or complex that they affect your workflow (shown along the vertical axis above). Under this scaling challenge, tasks like model training, prediction, or evaluation will (eventually) complete; they just take too long. You've become compute bound.

To address these challenges you'd continue to use the collections you know and love (like the NumPy ndarray, pandas DataFrame, or XGBoost DMatrix) and use a Dask cluster to parallelize the workload across many machines. The parallelization can occur through one of our integrations (like Dask's joblib backend, which parallelizes Scikit-Learn directly) or one of Dask-ML's estimators (like our hyper-parameter optimizers).
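As a rough illustration of the joblib integration mentioned above, the sketch below runs an ordinary Scikit-Learn grid search on Dask workers. The dataset, the parameter grid, and the helper name `fit_with_dask` are made up for the example, and the in-process client stands in for a real multi-machine cluster.

```python
import joblib
from dask.distributed import Client
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC


def fit_with_dask(n_samples=200):
    """Hypothetical helper: fit a small grid search using Dask's joblib backend."""
    # Small in-memory dataset: the data fits in RAM, only the compute is spread out.
    X, y = make_classification(n_samples=n_samples, n_features=10, random_state=0)

    # Illustrative grid; any Scikit-Learn estimator and grid would work the same way.
    param_grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}
    search = GridSearchCV(SVC(), param_grid, cv=3, n_jobs=-1)

    # Assumption: a threads-only local client with the dashboard disabled.
    # In practice you would pass the address of a running scheduler instead.
    with Client(processes=False, dashboard_address=None):
        # Inside this block, joblib-based parallelism in Scikit-Learn
        # is dispatched to the Dask workers.
        with joblib.parallel_backend("dask"):
            search.fit(X, y)
    return search


search = fit_with_dask()
print(search.best_params_)
```

The estimator and the data are plain Scikit-Learn and NumPy objects; only the execution backend changes.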

The second type of scaling challenge people face is when their datasets grow larger than RAM (shown along the horizontal axis above). Under this scaling challenge, even loading the data into NumPy or pandas becomes impossible.

To address these challenges, you'd use one of Dask's high-level collections (like Dask Array, Dask DataFrame, or Dask Bag) combined with one of Dask-ML's estimators that are designed to work with Dask collections. For example, you might use Dask Array and one of our preprocessing estimators in dask_ml.preprocessing, or one of our ensemble methods in dask_ml.ensemble.

It's worth emphasizing that not everyone needs scalable machine learning. Tools like sampling can be effective. Always plot your learning curve.
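One way to check whether more data would help at all is to compute a learning curve with plain Scikit-Learn before reaching for a cluster. The sketch below uses a synthetic dataset and a logistic regression purely as stand-ins; the resulting arrays can be passed to any plotting library.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic stand-in for a sample of your real data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Score the model on growing fractions of the training folds.
train_sizes, train_scores, valid_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
)

# Average over the cross-validation folds; these are the values to plot.
mean_valid = valid_scores.mean(axis=1)
for n, score in zip(train_sizes, mean_valid):
    print(f"{int(n):5d} samples -> mean validation accuracy {score:.3f}")
```

If the validation score has flattened out well before the full sample size, training on a sample may be enough.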

In all cases Dask-ML endeavors to provide a single unified interface around the familiar NumPy, pandas, and Scikit-Learn APIs. Users familiar with Scikit-Learn should feel at home with Dask-ML.

Other machine learning libraries like XGBoost already have distributed solutions that work quite well. Dask-ML makes no attempt to re-implement these systems. Instead, Dask-ML makes it easy to use normal Dask workflows to prepare and set up data, then it deploys XGBoost alongside Dask, and hands the data over.

See Dask-ML + XGBoost for more information.
