Widely adopted by both developers and organisations, Kubeflow is an MLOps platform that runs on Kubernetes and automates machine learning (ML) workloads. It covers the entire ML lifecycle, enabling data scientists and machine learning engineers to develop and deploy ML models. Kubeflow is designed as a suite of leading open source projects that enable different capabilities, such as model serving, training and hyperparameter tuning.
At Canonical, we deliver Charmed Kubeflow – an official distribution of the upstream solution with additional security maintenance, tool integrations, and enterprise support and managed services – so we know a thing or two about the project. In our experience, one of the most important concepts to understand with respect to both Kubeflow itself and the broader ML lifecycle is machine learning pipelines. Taking advantage of pipelines is the best way to effectively deploy models at scale in production, so let’s break down this critical component in the MLOps landscape.
A machine learning pipeline is an important component of ML systems, simplifying experimentation and providing a repeatable path to production. It is a series of steps that automates how ML models are created, streamlining the workflow across development and deployment. ML pipelines tame the complexity of the end-to-end ML lifecycle, helping professionals to develop and deploy models. Amongst their benefits, ML pipelines ensure scalability thanks to their ability to handle large volumes of data, while also supporting collaboration and reproducibility.
A core value of MLOps platforms such as Kubeflow is that they enable professionals to build and maintain ML pipelines.
Kubeflow Pipelines (KFP) is the heart of Kubeflow. It is the Kubeflow component that enables the creation of ML pipelines, helping you build and deploy container-based ML workflows that are portable and scalable. The main goals of Kubeflow Pipelines are to simplify the following processes:

- End-to-end orchestration of machine learning workflows
- Experimentation, making it easy to try out ideas and techniques and manage trials
- Re-use of components and pipelines to quickly assemble end-to-end solutions without rebuilding each time
Kubeflow Pipelines is part of the Kubeflow project, but it can also be used as a standalone tool. It is made up of three main components:

- A Python SDK with a domain-specific language (DSL) for defining components and pipelines
- An orchestration engine that schedules and runs multi-step ML workflows on Kubernetes
- A web user interface for managing and tracking experiments, runs and artefacts

A minimal example of the SDK in use is sketched below.
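To make the SDK concrete, here is a minimal sketch of a two-step pipeline, assuming the KFP v2 Python SDK (`pip install kfp`); the component and pipeline names are illustrative, not from the original post:

```python
from kfp import dsl


@dsl.component(base_image="python:3.11")
def preprocess(message: str) -> str:
    # KFP packages this function as a containerised pipeline step.
    return message.upper()


@dsl.component(base_image="python:3.11")
def train(features: str) -> str:
    # A placeholder "training" step that consumes the previous step's output.
    return f"model trained on: {features}"


@dsl.pipeline(name="hello-pipeline")
def hello_pipeline(message: str = "raw data"):
    # Wiring one task's output into the next defines the execution graph.
    pre = preprocess(message=message)
    train(features=pre.output)
```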
Kubeflow Pipelines is typically most useful for advanced users of Kubeflow or professionals who already have experience with machine learning. You don’t necessarily need KFP in the experimentation phase of the ML journey, but it becomes useful when you want to take your models to production. The main use cases for KFP include:

- Taking trained models to production at scale
- Automating end-to-end ML workflows to cut down on manual tasks
- Re-running experiments reproducibly and checking that results match
- Scheduling pipeline runs, for example for periodic retraining
Among machine learning specialists, Kubeflow Pipelines is widely adopted for a number of reasons. The most important benefits of KFP include:

- Portability: workflows are container-based, so they can run on any Kubernetes cluster
- Scalability: pipelines can handle large volumes of data and scale with the underlying cluster
- Reproducibility: the inputs and outputs of each step are recorded, so runs can be replicated and compared
- Automation: pipelines reduce the time professionals spend on repetitive manual work
While Kubeflow Pipelines is a feature-rich tool, it still poses some challenges for beginners. It comes with a steep learning curve, and the documentation available is limited. Since it is a fully open source tool, there is a large community that can help newcomers, but getting unstuck can be frustrating at times. You can alleviate these challenges by taking advantage of enterprise support or managed services from organisations that distribute Kubeflow.
Kubeflow Pipelines is a complex component whose capabilities unblock users, enabling them to automate their workflows and reduce the time they spend on manual tasks. The following architecture diagram depicts these capabilities:
As the diagram illustrates, users can interact with KFP either through the user interface or through development tools such as notebooks. Initially, users create components or specify a pipeline using the Kubeflow Pipelines domain-specific language (DSL). Once defined, the compiler transforms the Python code into a static YAML configuration. The Pipeline Service then creates a pipeline run from that static configuration, calling the Kubernetes API server to create the Kubernetes resources (CRDs) needed to run the pipeline. If you have a resource scheduler integrated, you can use it to run the pipeline when resources are available or at a desired time. To complete the pipeline, the containers are executed within Kubernetes pods, using orchestration controllers.
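As a hedged illustration of the compile step, the KFP compiler can turn the pipeline defined in the earlier sketch into a static YAML specification; the output file name here is arbitrary:

```python
from kfp import compiler

# Turn the Python pipeline definition into a static YAML specification.
compiler.Compiler().compile(
    pipeline_func=hello_pipeline,        # defined in the earlier sketch
    package_path="hello_pipeline.yaml",  # arbitrary output file name
)
```

The resulting YAML file is what the Pipeline Service consumes when it creates a run.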
Kubeflow Pipelines stores two types of data. The first is metadata, which includes experiments, jobs, pipeline runs and single scalar metrics. The second is artefacts, which include pipeline packages, views and large-scale metrics (time series). Metadata is stored in a MySQL database, whereas artefacts are stored in MinIO. Storing them in external components also enables portability, so artefacts can be migrated to different clusters or environments.
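As a sketch of how artefacts come about in practice (assuming the KFP v2 SDK; the component below is illustrative), a component can declare a typed output, and KFP uploads the file it writes to the configured artefact store (MinIO in a default install) while recording the run’s metadata in the metadata database:

```python
from kfp import dsl
from kfp.dsl import Dataset, Output


@dsl.component(base_image="python:3.11")
def make_dataset(rows: int, dataset: Output[Dataset]):
    # dataset.path is a local file path inside the container; KFP uploads
    # the written file to the artefact store when the step finishes.
    with open(dataset.path, "w") as f:
        for i in range(rows):
            f.write(f"row-{i}\n")
    # Key/value metadata attached here is tracked alongside the artefact.
    dataset.metadata["rows"] = rows
```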
Kubernetes resources created by the Pipeline Service are monitored by the Persistence Agent. To enable reproducibility, the inputs and outputs of the containers are recorded; these consist of parameters or data artefact URIs and are treated as metadata. This lets professionals re-run tasks with the same configuration and check whether the results match.
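As a hedged sketch of putting that recorded metadata to use, the KFP SDK client can fetch a past run and its details; the endpoint URL and run ID below are placeholders, and the exact fields returned depend on your KFP version:

```python
import kfp

# Assumed endpoint; point it at your deployment's KFP API.
client = kfp.Client(host="http://localhost:8080")

# Placeholder run ID; fetch the recorded details of a past run.
run = client.get_run("<run-id>")
print(run.display_name, run.state)
```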
The Pipeline web server gives users a visual understanding of the steps in their Kubeflow pipelines. It presents information such as the list of pipelines currently running, the history of pipeline executions, data artefacts and logs for debugging.
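To see a pipeline in that UI, a run first has to be submitted. Here is a minimal sketch; the host URL is an assumption, and the YAML file is the one produced by the compiler sketch above:

```python
import kfp

# Assumed endpoint; from a notebook inside the cluster, kfp.Client() with
# no arguments often picks up in-cluster defaults instead.
client = kfp.Client(host="http://localhost:8080")

run = client.create_run_from_pipeline_package(
    "hello_pipeline.yaml",             # produced by the compiler sketch
    arguments={"message": "raw data"},
)
print(run.run_id)  # the run now shows up in the Pipelines web UI
```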
In order to access Kubeflow Pipelines, users can either deploy it independently or as part of the Kubeflow project. For simplified deployment, we recommend using Charmed Kubeflow.