What is a CI/CD pipeline? CI/CD for machine learning DevOps
This article was originally published on Algorithmia’s website. The company was acquired by DataRobot in 2021. This article may not be entirely up to date and may refer to products and offerings that no longer exist. Find out more about DataRobot MLOps here.
Learn about CI/CD pipelines, how they improve the software development lifecycle, and DataRobot’s CI/CD solution for machine learning.
CI/CD, and more specifically a CI/CD pipeline, helps engineering teams ship working code quickly and keep doing so as the codebase and the team grow. Implementing this streamlined process is crucial to scaling development efficiently.
What is CI/CD?
CI stands for Continuous Integration, and CD is short for Continuous Delivery. Continuous integration is the practice of developers merging their code changes into a central repository multiple times a day. Continuous delivery means the software release process is entirely automated.
These automated processes eliminate manual, multi-stage tasks such as provisioning and deployment. Every run is also fully logged, and the logs are visible to the entire team.
Is CI/CD the same as DevOps?
CI/CD is not the same as DevOps, although it is a DevOps tactic.
DevOps is the practice of applying agile principles to the building, testing, and release of software. It includes streamlined communication, collaboration, and the use of agile tools.
CI/CD falls under the category of DevOps, but the two terms are not synonymous. CI/CD uses automated testing and deployment tools to put agile development into practice.
Why is CI/CD important?
Software development can be a slow process, with many manual steps that take time and leave room for error. That is a problem, because accuracy and efficiency matter so much in developing software, and it is why CI/CD and DevOps in general are so important. CI/CD allows the testing and deployment of software to be streamlined and run automatically.
CI/CD streamlines testing, changes, and deployment of new software and new versions, saving a lot of time while delivering a better product. Its main benefits are efficiency, accuracy, quality, sustainability, and scalability. Without CI/CD, a software development life cycle is hard to keep efficient, accurate, high-quality, sustainable, and scalable, and those attributes can make or break the success of software.
What is a CI/CD pipeline?
A CI/CD pipeline is an automated system that streamlines the software delivery process. CI/CD pipelines build code, run tests, and deploy new versions of the software when updates are made. Building and testing make up the CI portion of the pipeline: code changes are integrated and tested in the automated system. Deployment is the CD portion, since the pipeline continuously deploys, or delivers, the software automatically at scale.
Automated CI/CD pipelines eliminate errors from manual processes, give developers standardized feedback loops, and allow for quick product updates. This pipeline process is a huge improvement upon traditional manual software development life cycles.
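As a rough illustration of the build, test, and deploy flow described above, here is a minimal Python sketch of a pipeline runner that executes stages in order and stops at the first failure. The names `run_pipeline`, `PipelineResult`, and the three placeholder stage functions are hypothetical and do not come from any particular CI/CD product; real systems typically describe stages in configuration files instead.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class PipelineResult:
    succeeded: bool
    completed: List[str]          # stages that finished without error
    failed_stage: Optional[str]   # first stage that raised, if any
    cause: Optional[str]          # error message from the failing stage


def run_pipeline(stages: List[Tuple[str, Callable[[], None]]]) -> PipelineResult:
    """Run stages in order and stop at the first one that raises."""
    completed: List[str] = []
    for name, step in stages:
        try:
            step()
        except Exception as exc:
            # Later stages are skipped; the cause is reported back.
            return PipelineResult(False, completed, name, str(exc))
        completed.append(name)
    return PipelineResult(True, completed, None, None)


# Hypothetical placeholder stages, for illustration only.
def build_code() -> None:
    pass  # e.g. compile the code or build an image


def run_tests() -> None:
    raise RuntimeError("2 unit tests failed")  # simulated failure


def deploy_release() -> None:
    pass  # e.g. release the new version


result = run_pipeline(
    [("build", build_code), ("test", run_tests), ("deploy", deploy_release)]
)
print(result)
```

Because the simulated test stage raises, the deploy stage never runs, and the result records which stage failed and why. That record is the kind of standardized feedback a team could receive as a notification.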
The elements of CI/CD pipelines
A CI/CD pipeline consists of a handful of stages. These steps exist in software development with or without CI/CD, but a pipeline automates and streamlines them. Without a pipeline, they are much less efficient.
The stages of the CI/CD pipeline are:
- Source
- Build
- Test
- Deploy
This automation makes each step as simple as a notification sent to the team after each successful software deployment. Or, if a step fails, the team would be notified of the failure and the cause. This simplifies the process into only the necessary steps of correcting issues rather than spending time on every step of every software deployment.
A typical pipeline begins with the source code repository. A code change triggers the CI/CD system, notifying it to run the pipeline process. The pipeline can also be triggered by automatically scheduled workflows, on user command, or other pipelines that connect to the CI/CD pipeline.
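The trigger types listed above can be sketched as a small decision function. This is only an illustration: the `Trigger`, `Event`, and `should_run` names are hypothetical, and the rule that code changes only count on watched branches is an assumption added for the example, not something the article specifies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Trigger(Enum):
    CODE_CHANGE = "code_change"  # a change lands in the source repository
    SCHEDULE = "schedule"        # an automatically scheduled workflow
    MANUAL = "manual"            # a user starts the run on command
    UPSTREAM = "upstream"        # another, connected pipeline starts it


@dataclass(frozen=True)
class Event:
    trigger: Trigger
    branch: str = "main"


def should_run(
    event: Event, watched_branches: FrozenSet[str] = frozenset({"main"})
) -> bool:
    """Decide whether an event starts a pipeline run.

    Assumption for this sketch: code changes start a run only on
    watched branches, while every other trigger always starts one.
    """
    if event.trigger is Trigger.CODE_CHANGE:
        return event.branch in watched_branches
    return True


print(should_run(Event(Trigger.CODE_CHANGE, "main")))
print(should_run(Event(Trigger.CODE_CHANGE, "feature/x")))
print(should_run(Event(Trigger.SCHEDULE)))
```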
Since cloud-native software is usually deployed with Docker, the Docker containers are built during the build stage of the pipeline.
If the build stage fails, it indicates that there is a fundamental problem in the configuration of the project which needs to be addressed.
The testing phase is where automated tests are run to validate the correctness of the code and the behavior and performance of the product. This creates a safety net to prevent bugs from reaching the end user. These tests are written by developers, and in test-driven or behavior-driven development they are created as new code is written.
The testing stage can last anywhere from seconds to hours, depending on the project’s complexity and size. Tests often run in multiple stages for larger projects, beginning with smoke tests and quick sanity checks and ending with tests of end-to-end integration over the entire system from the end user’s perspective. These tests can be parallelized to reduce runtime.
If the test phase fails, there are problems in the code that weren’t prevented during the code writing phase. The test phase quickly provides feedback to developers, allowing them to solve problems in a timely manner while they are still top of mind.
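The idea of staged, parallelized testing can be sketched in Python as follows. This is a simplified model under stated assumptions: each check is a hypothetical zero-argument function returning `True` on success, `run_test_stages` is an invented name, and a real project would delegate this work to its test runner and CI system.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Check = Callable[[], bool]  # returns True when the check passes


def run_test_stages(
    stages: List[Tuple[str, Dict[str, Check]]]
) -> Tuple[bool, List[str]]:
    """Run stages in order; checks inside one stage run in parallel.

    Later (usually slower) stages are skipped once a stage has failures.
    Returns (all_passed, names_of_failed_checks).
    """
    for stage_name, checks in stages:
        with ThreadPoolExecutor() as pool:
            futures = {name: pool.submit(check) for name, check in checks.items()}
            failed = sorted(
                f"{stage_name}/{name}"
                for name, future in futures.items()
                if not future.result()
            )
        if failed:
            return False, failed
    return True, []


# Hypothetical checks: quick smoke tests first, end-to-end tests last.
stages = [
    ("smoke", {"imports": lambda: True, "config_loads": lambda: True}),
    ("end_to_end", {"checkout_flow": lambda: False, "login_flow": lambda: True}),
]
passed, failed = run_test_stages(stages)
print(passed, failed)
```

Running the cheap stages first means a broken build is reported in seconds, and the expensive end-to-end stage only runs once the quick checks have passed.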
A product is ready to deploy when there is a runnable instance of the code that has successfully gone through all the tests in the testing stage. There is often more than one deploy environment, such as a staging or beta environment and a production environment. The staging environment is used internally by the developers and product team to ensure the product is deploying correctly. The production environment is the final end-user environment.
In agile development, which is driven by testing and real-time monitoring, works-in-progress are deployed manually to the staging environment where they go through additional manual review and testing, then automatically deploy to the production environment after approval.
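The promotion from staging to production after approval can be modeled as a simple gate. The sketch below is an assumption-laden illustration: the `Release` class and the functions `deploy_to_staging` and `promote_to_production` are hypothetical, and the actual deployment work is reduced to recording the environment name.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Release:
    version: str
    tests_passed: bool = False
    approved: bool = False  # set after review and testing on staging
    environments: List[str] = field(default_factory=list)


def deploy_to_staging(release: Release) -> None:
    """Only releases that passed the testing stage may reach staging."""
    if not release.tests_passed:
        raise ValueError("cannot deploy: tests have not passed")
    release.environments.append("staging")


def promote_to_production(release: Release) -> None:
    """Production requires a prior staging deploy and an approval."""
    if "staging" not in release.environments:
        raise ValueError("cannot promote: release was never on staging")
    if not release.approved:
        raise ValueError("cannot promote: staging review not approved")
    release.environments.append("production")


release = Release(version="1.4.0", tests_passed=True)
deploy_to_staging(release)
release.approved = True  # the team signs off after reviewing staging
promote_to_production(release)
print(release.environments)
```

Keeping the approval as explicit state on the release makes it easy to audit why a given version did or did not reach the end-user environment.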
Benefits of a good CI/CD pipeline
The main characteristics and benefits of a great CI/CD pipeline are speed, reliability, and accuracy. Here’s what each of those looks like in a workflow:
- Get feedback on the correctness of work faster. Real-time alerts about problems detected in the work allow developers to solve issues while the work is still top of mind rather than going back to fix problems later on.
- Build, test, and deploy code changes quickly. This allows for each small update to be deployed immediately rather than waiting for multiple changes to create larger deploys less often. Large deploys are riskier, so CI/CD allows for less risk in each deployment.
- Scale to meet demands in real time. Traditional pipelines have a limited capacity, but serverless pipelines scale their capacity up and down to meet development demands. This creates pay-as-you-go capacity usage: a small capacity runs for smaller projects, and high capacity is available when needed.
- Set up new pipelines quickly. CI/CD pipelines with microservices architectures allow pieces of pipelines to be reused and put together into new pipelines quickly rather than having to rewrite the same code for each new pipeline.
- Always produce clean, identical outputs for each input. Much frustration in development is caused by intermittent failures. A reliable pipeline works well whether it’s used by a small team or by a large team on a new technology stack.
- Accurately automate the software delivery process. Run and visualize the entire process, modeling both simple and complex workflows, and take human error out of repetitive tasks.
CI/CD for machine learning
CI/CD can be applied to any software development life cycle, including machine learning. Just like other software, ML models only provide value after they reach production. So, CI/CD can reduce the time it takes to get a return on your ML investment.