Technical paper
The artificial intelligence and machine learning journey to cloud
The ability to manage artificial intelligence (AI) and machine learning (ML) solutions on the cloud is an essential skill for every enterprise. It underpins data-driven decision-making and the adoption of technologies like generative AI.
However, managing AI and ML on the cloud comes with challenges – a need for robust lineage, governance, and risk mitigation tactics. Here, we share some recommendations on how to make the journey a little smoother.
The evolution of AI and ML life cycle management
Though the typical stages of a data science life cycle have not changed much since the early 2000s, the ecosystem around it has transformed considerably.
Today, the expectations of the AI and ML life cycle are different (figure 1). As enterprises grapple with a complicated network of systems that generate more data than ever before, they need a real-time continuous integration, continuous delivery, continuous testing (CI, CD, CT) pipeline.
Figure 1: The stages of AI and ML life cycle management
Each stage needs people with specialized skills to effectively translate business requirements into technical specifications (figure 2) for successful implementation and monitoring.
In the past, a traditional data pipeline was developed in a limited number of systems. Now, real-time live systems need integrated data sources and simultaneous data processing and analysis to feed the business intelligence (BI) reports, dashboards, and applications required for decision-making at speed.
Figure 2: Requirements of AI and ML life cycle management
Leading your data science and ML projects
The evolution of the data ecosystem from on-premises storage to cloud-native applications poses several challenges to enterprises. On the one hand, ML development is an experimental, exploratory process. On the other, deployment requires consistent results that are secure and fail-proof in production systems. The typical activities of modern AI and ML life cycle management are shown below (figure 3).
Figure 3: Cloud AI and ML life cycle management activities
Our three-phase approach (figure 4) enables a sustained, results-driven shift toward AI and ML on the cloud.
Figure 4: The three phases of the AI and ML cloud journey
Plan for cloud
Robust planning and strategy underpin successful cloud journeys for every organization. But remember, not all workloads require cloud migration, and organizations must clarify their reasons for migrating to avoid cost, schedule, and performance overruns.
The six stages of planning are:
- Define business objectives, performance metrics, and key performance indicators (KPIs) to monitor ML models effectively
- Develop a strategy and change management road map that aligns people, processes, and technology for healthy adoption rates
- Clearly delineate roles and responsibilities required for a successful project, such as program and project leaders, industry experts, ML and technical architects, data engineers, algorithm developers, and ML and DevOps engineers
- Assess the current and future-state cloud platform to design solution architecture and data pipelines in line with policies and regulations
- Choose an appropriate cloud architecture, such as hybrid or multicloud, suited to the specific business needs of the project
- Plan timelines and phases for IT provisioning and strengthen relevant cloud skills in parallel
Bringing it to life: Transforming invoicing
A global healthcare organization wanted to transform its invoice process. A detailed assessment of customer pain points revealed the need for a low/no-touch invoice processing strategy. By gathering raw data from 3.6 million+ invoice lines and building a data and ML pipeline on the cloud, Genpact used ML algorithms to predict the probability that an invoice would be paid late. The predictions reached 87% accuracy using customer segmentation variables that influence customer payment behavior. Acting on these insights, the organization reduced past-due invoices from 20%–25% to less than 12%.
Migrate to the cloud
Moving data to the cloud allows leaders to rapidly democratize access to data, empower employees to make data-driven decisions more easily, and reduce operational costs.
As you migrate to a cloud platform, consider the following pieces of advice:
- Deploy the complete data pipeline and data preprocessing with cloud services, as outlined in the plan stage
- Collect data sources and configure data ingestion, data transformation, data storage, and existing ML projects (training and testing pipelines) on the cloud
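As an illustration of the ingestion, transformation, and storage steps named above, here is a minimal Python sketch. It is not the pipeline described in this paper: the column names, the sample rows, and the function names are hypothetical, and an in-memory SQLite database stands in for whatever cloud storage service a project actually uses.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract; in practice this would arrive from a cloud data source.
RAW_CSV = """invoice_id,amount,days_to_pay
A-1,100.50,30
A-2,,45
A-3,250.00,12
"""

def ingest(raw_text):
    """Ingestion: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform(rows):
    """Transformation: drop rows with a missing amount and cast column types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records rather than guessing a value
        cleaned.append(
            (row["invoice_id"], float(row["amount"]), int(row["days_to_pay"]))
        )
    return cleaned

def store(records, conn):
    """Storage: write cleaned records to a table (SQLite is a stand-in here)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices ("
        "invoice_id TEXT PRIMARY KEY, amount REAL, days_to_pay INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO invoices VALUES (?, ?, ?)", records)
    conn.commit()

def run_pipeline(raw_text, conn):
    """Run the three stages in order and return the number of rows stored."""
    records = transform(ingest(raw_text))
    store(records, conn)
    return len(records)

conn = sqlite3.connect(":memory:")
loaded = run_pipeline(RAW_CSV, conn)
```

Keeping each stage as a separate function makes it easier to swap one stage for a managed cloud service later without touching the others.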
Live on the cloud
Living on the cloud goes beyond cloud migration. This is where the cloud underpins innovation across your enterprise. Teams need to adapt to the development, testing, and training of models on cloud services and resources, for scalability, optimal use, and cost control.
The following steps support a successful strategy for living on the cloud:
- Develop the ML model and choose the best model based on quantitative and qualitative measures, ensuring reproducibility by version-controlling the data, models, and parameters in the ML system
- Deploy the chosen model to production
- Integrate with the required output, such as business intelligence (BI) dashboards, custom applications, or third-party APIs
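One way to make the reproducibility point above concrete is to derive a deterministic identifier for each training run from its data, parameters, and code version. The Python sketch below is an assumption-laden illustration, not a description of any specific tool: the function names, the in-memory dictionary used as a registry, and the sample values are all hypothetical.

```python
import hashlib
import json

def fingerprint(data_bytes, params, code_version):
    """Return a deterministic ID for a run from its data, parameters, and code version."""
    digest = hashlib.sha256()
    digest.update(hashlib.sha256(data_bytes).digest())  # content hash of the training data
    # sort_keys makes the ID independent of the order in which parameters were set
    digest.update(json.dumps(params, sort_keys=True).encode("utf-8"))
    digest.update(code_version.encode("utf-8"))
    return digest.hexdigest()

def register_run(registry, data_bytes, params, code_version, metrics):
    """Record a run under its fingerprint so it can be traced and reproduced later."""
    run_id = fingerprint(data_bytes, params, code_version)
    registry[run_id] = {
        "params": params,
        "code_version": code_version,
        "metrics": metrics,
    }
    return run_id

# Hypothetical usage: a plain dict stands in for a real model registry.
registry = {}
data = b"invoice_id,amount\nA-1,100.5\n"
run_a = register_run(registry, data, {"depth": 4, "lr": 0.1}, "v1.0", {"accuracy": 0.9})
run_b = fingerprint(data, {"lr": 0.1, "depth": 4}, "v1.0")
```

Because the identifier changes whenever the data, a parameter, or the code version changes, two runs with the same identifier can be expected to start from the same inputs.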
Optimization and model monitoring are essential for maintaining a feedback loop from the deployed model back to model development. The ML and DevOps engineers must set up a model monitoring metrics stack and automate monitoring in real time to ensure that models remain relevant in the context of the most recent data in production.
Three broad categories of metrics must be monitored:
- Stability metrics to capture data distribution shifts in production
- Performance metrics to identify concept shifts in data and track the change in the relationship between independent and dependent variables
- Operational metrics to identify ML system health issues such as IO/memory/CPU usage, disk utilization, ML endpoint calls, and latency
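To illustrate the first two categories, the Python sketch below computes a population stability index (PSI) as one possible stability metric and a simple accuracy-drop check as one possible performance metric. PSI is a commonly used drift measure chosen here for illustration; the paper does not name a specific metric. The function names, the bin count, and the tolerance value are assumptions.

```python
import math

def psi(expected, actual, bins=10, eps=1e-6):
    """Population stability index between a baseline sample and a production sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clip values outside the baseline range
            counts[idx] += 1
        # eps avoids log(0) and division by zero for empty bins
        return [max(c / len(values), eps) for c in counts]

    e = fractions(expected)
    a = fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

def accuracy(labels, predictions):
    """Share of predictions that match the observed labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)

def performance_alert(baseline_accuracy, labels, predictions, tolerance=0.05):
    """Return True when accuracy has dropped by more than the tolerance (an assumed value)."""
    return baseline_accuracy - accuracy(labels, predictions) > tolerance

# Hypothetical feature values: training-time baseline versus shifted production data.
baseline = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]
drift_score = psi(baseline, shifted)
```

A widely quoted rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift, but those cutoffs are conventions and are best tuned per feature and use case.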
Bringing it to life: Transforming customer service
A global provider of scientific products and services lacked integrated and automated visibility into order and customer information. Genpact built an intelligent solution on a cloud pipeline to categorize incoming customer emails using ML and natural language processing. This allowed the company to route them to appropriate systems or workstreams. This was the start of an intuitive workflow for the customer service team that expedited the resolution of customer issues and accelerated revenue growth.
Making the next move
Creating an ML model that works well is only one part of delivering integrated ML solutions. The challenges of operationalizing ML models require a prudent approach – one that helps data scientists, supports a robust data pipeline, and ensures secure, reproducible, monitored, and trustworthy ML models. The approaches outlined here will ensure cloud, ML, and AI can work together in harmony to build a data-driven enterprise.
This paper is authored by Sreekanth Menon, AI/ML leader, and Megha Sinha, augmented intelligence leader, Genpact.