<div align="center" id="top">
<img src="./.github/app.gif" alt="Mlops_aws" />
&#xa0;
<!-- <a href="https://mlops_aws.netlify.app">Demo</a> -->
</div>
<h1 align="center">Mlops_aws</h1>
<!-- Status -->
<!-- <h4 align="center">
🚧 Mlops_aws 🚀 Under construction... 🚧
</h4>
<hr> -->
<br>
## AWS MLOps
Repository with the scripts needed to implement an MLOps architecture.
This implementation adds training and deployment capabilities to your ML code. It lets you change the code and the data while running tests to validate them.
## Prerequisites
### Services
Basic experience with the following is required:
- Training/testing an ML model
- Python ([scikit-learn](https://scikit-learn.org/stable/#))
- [Jupyter Notebook](https://jupyter.org/)
- [AWS CodePipeline](https://aws.amazon.com/codepipeline/)
- [AWS CodeCommit](https://aws.amazon.com/codecommit/)
- [AWS CodeBuild](https://aws.amazon.com/codebuild/)
- [Amazon ECR](https://aws.amazon.com/ecr/)
- [Amazon SageMaker](https://aws.amazon.com/sagemaker/)
- [AWS CloudFormation](https://aws.amazon.com/cloudformation/)
- [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html)
### AWS Account
An AWS account is required; check whether the services are available in the free tier.
## Part 1: Building the Docker Image
### Architecture
![Build Docker Image](resources/imgs/MLOps_BuildImage.jpg)
* An ML developer creates the assets for the Docker image (a RandomForest model) and pushes them to CodeCommit
* CodePipeline listens for the CodeCommit push event, fetches the source code and launches CodeBuild
* CodeBuild authenticates against ECR, builds the Docker image and pushes it to the ECR repository (a hypothetical buildspec sketch follows this list)
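CodeBuild runs these steps from a buildspec.yml at the repository root (the build-image template below points at it). That file is not shown in this commit, so the following is only a plausible sketch, assuming the environment variables the CodeBuild project defines (AWS_ACCOUNT_ID, AWS_DEFAULT_REGION, IMAGE_REPO_NAME, IMAGE_TAG) and awscli v1 syntax for the ECR login, which matches the aws/codebuild/docker:17.09.0 build image:

```yaml
# Hypothetical buildspec.yml -- not part of this commit; adjust to your image layout.
version: 0.2
phases:
  pre_build:
    commands:
      # Authenticate the Docker client against ECR (awscli v1 syntax)
      - $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION)
  build:
    commands:
      - docker build -t $IMAGE_REPO_NAME:$IMAGE_TAG .
      - docker tag $IMAGE_REPO_NAME:$IMAGE_TAG $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG
  post_build:
    commands:
      # Push the freshly built image to the ECR repository
      - docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG
```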
## Part 2: Training and Deployment
### Architecture
![Train, Deploy and Test Model](resources/imgs/MLOps_Train_Deploy_TestModel.png)
* The flow starts when a zip with the assets (deployment configuration) plus the datasets is uploaded (an upload sketch follows this list)
* CodePipeline listens for this event and runs the request Lambda to verify the file in the bucket and start the training
* The Lambda submits training jobs to SageMaker
* When training finishes, CodePipeline checks its status
* CodePipeline calls CloudFormation to deploy to a DEV environment
* It then waits for a manual approval
* CodePipeline calls CloudFormation to deploy to a PROD environment
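Concretely, the pipeline created for each model (see mlops_pipeline.yml below) watches the S3 key training_jobs/<ModelNamePrefix>/trainingjob.zip in the versioned MLOps bucket, so uploading a new version of that object is what starts the flow. A minimal sketch for the iris-model pipeline, assuming default credentials and a locally packaged trainingjob.zip:

```python
import boto3

# Hypothetical kick-off: upload the packaged job file to the key the
# pipeline's S3 source action watches (see mlops_pipeline.yml below).
region = boto3.session.Session().region_name
account_id = boto3.client("sts").get_caller_identity()["Account"]
bucket = f"mlops-{region}-{account_id}"           # bucket created by m.yml
key = "training_jobs/iris-model/trainingjob.zip"  # ModelNamePrefix = iris-model

s3 = boto3.client("s3")
s3.upload_file("trainingjob.zip", bucket, key)    # a new version triggers CodePipeline
```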
## Instructions
Option 1: Run CloudFormation using the template located at code/yml/m.yml (a minimal boto3 sketch follows the launch table below)
Option 2: Run CloudFormation using the same template hosted in a public S3 bucket
Region| Launch
------|-----
US East (N. Virginia) | [![Launch MLOps solution in us-east-1](resources/imgs/cloudformation-launch-stack.png)](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/new?stackName=AIWorkshop&templateURL=https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/m.yml)
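For Option 1, a minimal boto3 sketch of the same launch. The stack name mirrors the Option 2 button, and the two parameters are the ones m.yml declares; the subnet and security-group values are placeholders you must replace:

```python
import boto3

# Hypothetical launch of the main template; equivalent to Option 2's button.
cfn = boto3.client("cloudformation")
with open("code/yml/m.yml") as f:
    template_body = f.read()

cfn.create_stack(
    StackName="AIWorkshop",
    TemplateBody=template_body,
    Parameters=[
        # Both parameters are declared in m.yml; values here are placeholders.
        {"ParameterKey": "NotebookInstanceSubNetId", "ParameterValue": "subnet-xxxxxxxx"},
        {"ParameterKey": "NotebookInstanceSecGroupId", "ParameterValue": "sg-xxxxxxxx"},
    ],
    Capabilities=["CAPABILITY_NAMED_IAM"],  # the stack creates the named MLOps role
)
```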
### Contents - Code
The code/lambdas folder contains the Python files shown in the diagram.
- mlops-env-setup: Gets environment variables from CodeCommit to set up the environment with the credentials.
- mlops-op-deployment: Contains the code that automatically deploys a model trained by SageMaker.
- mlops-op-process-request: Verifies that the trainingjob.zip file stored in the bucket contains the files required for training and deployment.
- mlops-op-training: Extracts the data from the training file and sends it to SageMaker to run training jobs (a sketch of the expected zip layout follows this list).
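As the mlops-op-training code below shows, trainingjob.zip must contain a trainingjob.json, whose contents are passed almost verbatim to SageMaker's create_training_job (the Lambda injects TrainingJobName itself), and a deployment.json. A hedged packaging sketch; every value is a placeholder, and the deployment.json schema is defined in mlops_op_deploy.zip, which this commit does not inline:

```python
import json
import zipfile

# Hypothetical contents; field names follow sagemaker.create_training_job.
training_params = {
    "AlgorithmSpecification": {
        "TrainingImage": "<account>.dkr.ecr.<region>.amazonaws.com/iris-model:latest",
        "TrainingInputMode": "File",
    },
    "RoleArn": "arn:aws:iam::<account>:role/MLOps",
    "InputDataConfig": [{
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://<bucket>/iris-model/input/train",
        }},
    }],
    "OutputDataConfig": {"S3OutputPath": "s3://<bucket>/iris-model/output"},
    "ResourceConfig": {"InstanceType": "ml.m4.xlarge", "InstanceCount": 1,
                       "VolumeSizeInGB": 30},
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}
deployment_params = {"EndpointInstanceType": "ml.m4.xlarge"}  # illustrative only

# Package both descriptors; the datasets referenced by InputDataConfig are
# uploaded to S3 separately.
with zipfile.ZipFile("trainingjob.zip", "w") as z:
    z.writestr("trainingjob.json", json.dumps(training_params))
    z.writestr("deployment.json", json.dumps(deployment_params))
```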
### Contents - Notebooks
1- Crear Imagen Docker/
- 01_crear_imagen_docker: builds the image, using custom models
- 02_Probar_local_mode: the previous notebook has a cell that deploys a local service, and with this notebook we can test it.
- 03_Probando_sagemaker_estimator: tests the Docker image that was built and uses it to run training jobs and make predictions.
2-Entrenamiento y Despliegue/
- 01_Training: shows the full deployment configuration and how it is attached to the dataset in a zip file that starts the Part 2 flow.
- 02_Check_Progress: optional; shows a more detailed view than the one in the CodePipeline interface.
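# ==== build-image.yml: stack that builds and pushes the model Docker image (filename inferred from the commands below) ====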
# aws cloudformation delete-stack --stack-name scikit-image
# aws cloudformation create-stack --stack-name scikit-image --template-body file://build-image.yml
Description: Create a CodePipeline for creating a Docker base image for training/serving models
Parameters:
RepoBranchName:
Type: String
Description: Name of the branch the code is located
ImageRepoName:
Type: String
Description: Name of the ECR repo without the image name
ImageTagName:
Type: String
Description: Name of the ECR image tag
Default: latest
Resources:
BuildImageProject:
Type: AWS::CodeBuild::Project
Properties:
Name: !Sub mlops-buildimage-${ImageRepoName}
Description: Build a Model Image
ServiceRole: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
Artifacts:
Type: CODEPIPELINE
Source:
Type: CODEPIPELINE
BuildSpec: buildspec.yml
Environment:
Type: LINUX_CONTAINER
ComputeType: BUILD_GENERAL1_SMALL
Image: aws/codebuild/docker:17.09.0
EnvironmentVariables:
- Name: IMAGE_REPO_NAME
Value:
Ref: ImageRepoName
- Name: IMAGE_TAG
Value:
Ref: ImageTagName
- Name: AWS_ACCOUNT_ID
Value: !Sub ${AWS::AccountId}
- Name: AWS_DEFAULT_REGION
Value: !Sub ${AWS::Region}
- Name: TEMPLATE_BUCKET
Value: !Sub mlops-${AWS::Region}-${AWS::AccountId}
- Name: TEMPLATE_PREFIX
Value: codebuild
Tags:
- Key: Name
Value: !Sub mlops-buildimage-${ImageRepoName}
DeployPipeline:
Type: "AWS::CodePipeline::Pipeline"
Properties:
Name: !Sub mlops-${ImageRepoName}
RoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
ArtifactStore:
Type: S3
Location: !Sub mlops-${AWS::Region}-${AWS::AccountId}
Stages:
-
Name: Source
Actions:
-
Name: GetSource
ActionTypeId:
Category: Source
Owner: AWS
Version: 1
Provider: CodeCommit
OutputArtifacts:
-
Name: ModelSourceOutput
Configuration:
BranchName:
Ref: RepoBranchName
RepositoryName: mlops
RunOrder: 1
-
Name: Build
Actions:
-
Name: BuildImage
InputArtifacts:
- Name: ModelSourceOutput
ActionTypeId:
Category: Build
Owner: AWS
Version: 1
Provider: CodeBuild
Configuration:
ProjectName:
Ref: BuildImageProject
RunOrder: 1
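# ==== m.yml: main workshop template from the Instructions section (creates the environment, repos, pipelines and notebook) ====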
# These are the parameters we'll ask the user before creating the environment
Parameters:
NotebookInstanceSubNetId:
Type: AWS::EC2::Subnet::Id
Description: "Select any subnet id"
AllowedPattern: ^subnet\-[a-zA-Z0-9]+$
ConstraintDescription: "You need to inform any subnetid"
NotebookInstanceSecGroupId:
Type: List<AWS::EC2::SecurityGroup::Id>
Description: "Select the default security group"
AllowedPattern: ^sg\-[a-zA-Z0-9]+$
ConstraintDescription: "Select the default security group"
Resources:
####################
## PERMISSIONS
####################
MLOpsSecurity:
Type: AWS::CloudFormation::Stack
DeletionPolicy: Delete
Properties:
TemplateURL: https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/assets/mlops_security.yml
MLOpsLambdaLayers:
Type: AWS::CloudFormation::Stack
DeletionPolicy: Delete
Properties:
TemplateURL: https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/assets/mlops_crhelper.yml
MLOpsProcessRequest:
Type: AWS::CloudFormation::Stack
DeletionPolicy: Delete
Properties:
TemplateURL: https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/assets/mlops_op_process_request.yml
DependsOn:
- MLOpsLambdaLayers
- MLOpsSecurity
MLOpsDeploymentOperator:
Type: AWS::CloudFormation::Stack
DeletionPolicy: Delete
Properties:
TemplateURL: https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/assets/mlops_op_deploy.yml
DependsOn:
- MLOpsLambdaLayers
- MLOpsSecurity
MLOpsTrainingOperator:
Type: AWS::CloudFormation::Stack
DeletionPolicy: Delete
Properties:
TemplateURL: https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/assets/mlops_op_training.yml
DependsOn:
- MLOpsLambdaLayers
- MLOpsSecurity
## OK. Then we'll create some repos for the source code
## We need to create two branches in the default repo, so
## let's use a custom resource with a Lambda Function to do that
## Also, when the stack is deleted we need to remove all the versioned
## files from the S3 bucket, otherwise it will fail
MLOpsEnvSetup:
Type: AWS::Lambda::Function
Properties:
Code:
ZipFile: !Sub |
import cfnresponse
import boto3
codeCommit = boto3.client('codecommit')
s3 = boto3.resource('s3')
ecr = boto3.client('ecr')
def lambda_handler(event, context):
responseData = {'status': 'NONE'}
if event['RequestType'] == 'Create':
repoName = event['ResourceProperties'].get('RepoName')
branch_names = event['ResourceProperties'].get('BranchNames')
branches = codeCommit.list_branches(repositoryName=repoName)['branches']
responseData['default_branch'] = branch_names[0]
if len(branches) == 0:
putFiles = {'filePath': 'buildspec.yml', 'fileContent': "version: 0.2\nphases:\n build:\n commands:\n - echo 'dummy'\n".encode()}
resp = codeCommit.create_commit(repositoryName=repoName, branchName='master', commitMessage=' - repo init', putFiles=[putFiles])
for i in branch_names:
codeCommit.create_branch(repositoryName=repoName, branchName=i, commitId=resp['commitId'])
responseData['status'] = 'CREATED'
elif event['RequestType'] == 'Delete':
s3.Bucket( event['ResourceProperties'].get('BucketName') ).object_versions.all().delete()
try:
for i in event['ResourceProperties'].get('ImageRepoNames'):
imgs = ecr.list_images(registryId='${AWS::AccountId}', repositoryName=i)
ecr.batch_delete_image(registryId='${AWS::AccountId}', repositoryName=i, imageIds=imgs['imageIds'])
except Exception as e:
pass
responseData['status'] = 'DELETED'
cfnresponse.send(event, context, cfnresponse.SUCCESS, responseData)
FunctionName: mlops-env-setup
Handler: "index.lambda_handler"
Timeout: 60
MemorySize: 512
Role: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
Runtime: python3.7
DependsOn:
- MLOpsSecurity
MLOpsEnvSetupCaller:
Type: Custom::EnvSetupCaller
Properties:
ServiceToken: !GetAtt MLOpsEnvSetup.Arn
RepoName: mlops
BranchNames:
- iris_model
ImageRepoNames:
- iris-model
BucketName: !Ref MLOpsBucket
DependsOn:
- MLOpsRepo
- MLOpsIrisModelRepo
####################
## REPOSITORIES
####################
## We have the custome resource that can invoke a lambda function to create the branches,
## So, let's create the CodeCommit repo and also the SageMaker Code repos
MLOpsRepo:
Type: AWS::CodeCommit::Repository
Properties:
RepositoryDescription: Repository for the ML models/images code
RepositoryName: mlops
MLOpsIrisModelRepo:
Type: AWS::ECR::Repository
Properties:
RepositoryName: iris-model
MLOpsBucket:
Type: AWS::S3::Bucket
Properties:
BucketName: !Sub mlops-${AWS::Region}-${AWS::AccountId}
Tags:
- Key: Name
Value: !Sub mlops-${AWS::Region}-${AWS::AccountId}
AccessControl: Private
VersioningConfiguration:
Status: Enabled
####################
## PIPELINES
####################
BuildPipelineIrisModel:
Type: AWS::CloudFormation::Stack
DeletionPolicy: Delete
Properties:
TemplateURL: https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/assets/build_image.yml
Parameters:
RepoBranchName: iris_model
ImageRepoName: iris-model
ImageTagName: latest
DependsOn:
- MLOpsBucket
- MLOpsCodeRepo
MLPipelineIrisModel:
Type: AWS::CloudFormation::Stack
DeletionPolicy: Delete
Properties:
TemplateURL: https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/assets/mlops_pipeline.yml
Parameters:
SourceBucketPath: !Ref MLOpsBucket
ModelNamePrefix: iris-model
DependsOn:
- MLOpsBucket
- MLOpsCodeRepo
####################
## Notebook Instance
####################
MLOpsExercisesRepo:
Type: AWS::SageMaker::CodeRepository
Properties:
CodeRepositoryName: MLOpsExercisesRepo
GitConfig:
RepositoryUrl: https://github.com/awslabs/amazon-sagemaker-mlops-workshop.git
MLOpsCodeRepo:
Type: AWS::SageMaker::CodeRepository
Properties:
CodeRepositoryName: MLOpsCodeRepo
GitConfig:
RepositoryUrl: !GetAtt MLOpsRepo.CloneUrlHttp
Branch: !Sub "${MLOpsEnvSetupCaller.default_branch}"
DependsOn: MLOpsEnvSetupCaller
IAWorkshopNotebookInstanceLifecycleConfig:
Type: "AWS::SageMaker::NotebookInstanceLifecycleConfig"
Properties:
NotebookInstanceLifecycleConfigName: !Sub ${AWS::StackName}-lifecycle-config
OnStart:
- Content: !Base64 |
#!/bin/bash
sudo -u ec2-user -i <<'EOF'
echo "Finally, let's clone and build an image for testing codebuild locally"
git clone https://github.com/aws/aws-codebuild-docker-images.git /tmp/aws-codebuild
chmod +x /tmp/aws-codebuild/local_builds/codebuild_build.sh
docker pull amazon/aws-codebuild-local:latest --disable-content-trust=false
# This will affect only the Jupyter kernel called "conda_python3".
source activate python3
# Update the sagemaker to the latest version
pip install -U sagemaker
source deactivate
EOF
MLOpsNotebookInstance:
Type: "AWS::SageMaker::NotebookInstance"
Properties:
NotebookInstanceName: MLOpsWorkshop
InstanceType: "ml.m4.xlarge"
SubnetId: !Ref NotebookInstanceSubNetId
SecurityGroupIds: !Ref NotebookInstanceSecGroupId
RoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
DefaultCodeRepository: MLOpsCodeRepo
AdditionalCodeRepositories:
- MLOpsExercisesRepo
VolumeSizeInGB: 15
LifecycleConfigName: !GetAtt IAWorkshopNotebookInstanceLifecycleConfig.NotebookInstanceLifecycleConfigName
DependsOn:
- IAWorkshopNotebookInstanceLifecycleConfig
- MLOpsCodeRepo
- MLOpsExercisesRepo
- MLOpsSecurity
Outputs:
MLOpsNotebookInstanceId:
Value: !Ref MLOpsNotebookInstance
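# ==== mlops_crhelper.yml: publishes the crhelper Lambda layer (referenced by MLOpsLambdaLayers above) ====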
Resources:
CloudFormationHelperLayer:
Type: AWS::Lambda::LayerVersion
Properties:
CompatibleRuntimes:
- python3.6
- python3.7
LayerName: crhelper
Description: https://github.com/aws-cloudformation/custom-resource-helper
LicenseInfo: Apache 2.0 License
Content:
S3Bucket: aws-ai-ml-aod-latam
S3Key: mlops-workshop/assets/crhelper.zip
Outputs:
LayerArn:
Description: Arn of the layer's latest version
Value: !Ref CloudFormationHelperLayer
Export:
Name: mlops-crhelper-LayerArn
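# ==== mlops_op_deploy.yml: deployment operator Lambda (referenced by MLOpsDeploymentOperator above) ====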
Resources:
MLOpsDeployment:
Type: "AWS::Lambda::Function"
Properties:
FunctionName: mlops-op-deployment
Handler: mlops_op_deploy.lambda_handler
MemorySize: 512
Role: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
Runtime: python3.7
Timeout: 60
Layers:
- Fn::ImportValue: mlops-crhelper-LayerArn
Code:
S3Bucket: aws-ai-ml-aod-latam
S3Key: mlops-workshop/assets/src/mlops_op_deploy.zip
Description: "Function that will start a new Sagemaker Deployment"
Tags:
- Key: Description
Value: Lambda function that process the request and prepares the cfn template for deployment
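# ==== mlops_op_process_request.yml: request-processing Lambda (referenced by MLOpsProcessRequest above) ====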
Resources:
MLOpsProcessRequest:
Type: "AWS::Lambda::Function"
Properties:
FunctionName: mlops-op-process-request
Handler: index.lambda_handler
MemorySize: 512
Role: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
Runtime: python3.7
Timeout: 60
Code:
ZipFile: !Sub |
import boto3
import io
import zipfile
import json
from datetime import datetime
s3 = boto3.client('s3')
codepipeline = boto3.client('codepipeline')
def lambda_handler(event, context):
trainingJob = None
deployment = None
try:
now = datetime.now()
jobId = event["CodePipeline.job"]["id"]
user_params = json.loads(event["CodePipeline.job"]["data"]["actionConfiguration"]["configuration"]["UserParameters"])
model_prefix = user_params['model_prefix']
mlops_operation_template = s3.get_object(Bucket=user_params['bucket'], Key=user_params['prefix'] )['Body'].read()
job_name = 'mlops-%s-%s' % (model_prefix, now.strftime("%Y-%m-%d-%H-%M-%S"))
s3Location = None
for inputArtifacts in event["CodePipeline.job"]["data"]["inputArtifacts"]:
if inputArtifacts['name'] == 'ModelSourceOutput':
s3Location = inputArtifacts['location']['s3Location']
params = {
"Parameters": {
"AssetsBucket": s3Location['bucketName'],
"AssetsKey": s3Location['objectKey'],
"Operation": "training",
"Environment": "none",
"JobName": job_name
}
}
for outputArtifacts in event["CodePipeline.job"]["data"]["outputArtifacts"]:
if outputArtifacts['name'] == 'RequestOutput':
s3Location = outputArtifacts['location']['s3Location']
zip_bytes = io.BytesIO()
with zipfile.ZipFile(zip_bytes, "w") as z:
z.writestr('assets/params_train.json', json.dumps(params))
params['Parameters']['Operation'] = 'deployment'
params['Parameters']['Environment'] = 'development'
z.writestr('assets/params_deploy_dev.json', json.dumps(params))
params['Parameters']['Environment'] = 'production'
z.writestr('assets/params_deploy_prd.json', json.dumps(params))
z.writestr('assets/mlops_operation_handler.yml', mlops_operation_template)
zip_bytes.seek(0)
s3.put_object(Bucket=s3Location['bucketName'], Key=s3Location['objectKey'], Body=zip_bytes.read())
# and update codepipeline
codepipeline.put_job_success_result(jobId=jobId)
except Exception as e:
resp = codepipeline.put_job_failure_result(
jobId=jobId,
failureDetails={
'type': 'ConfigurationError',
'message': str(e),
'externalExecutionId': context.aws_request_id
}
)
Description: "Function that will start a new Sagemaker Training Job"
Tags:
- Key: Description
Value: Lambda function that process the request and prepares the cfn template for training
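# ==== mlops_op_training.yml: training operator Lambda (referenced by MLOpsTrainingOperator above) ====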
Resources:
MLOpsTraining:
Type: "AWS::Lambda::Function"
Properties:
FunctionName: mlops-op-training
Handler: index.lambda_handler
MemorySize: 512
Role: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
Runtime: python3.7
Timeout: 60
Layers:
- Fn::ImportValue: mlops-crhelper-LayerArn
Code:
ZipFile: !Sub |
import boto3
import io
import zipfile
import json
import logging
from crhelper import CfnResource
logger = logging.getLogger(__name__)
# Initialise the helper, all inputs are optional, this example shows the defaults
helper = CfnResource(json_logging=False, log_level='DEBUG', boto_level='CRITICAL')
s3 = boto3.client('s3')
sm = boto3.client('sagemaker')
def lambda_handler(event, context):
helper(event, context)
@helper.create
@helper.update
def start_training_job(event, context):
try:
# Get the training job and deployment descriptors
training_params = None
deployment_params = None
job_name = event['ResourceProperties']['JobName']
helper.Data.update({'job_name': job_name})
try:
# We need to check if there is another training job with the same name
sm.describe_training_job(TrainingJobName=job_name)
## there is, let's let the poll to address this
except Exception as a:
# Ok. there isn't. so, let's start a new training job
resp = s3.get_object(Bucket=event['ResourceProperties']['AssetsBucket'], Key=event['ResourceProperties']['AssetsKey'])
with zipfile.ZipFile(io.BytesIO(resp['Body'].read()), "r") as z:
training_params = json.loads(z.read('trainingjob.json').decode('ascii'))
deployment_params = json.loads(z.read('deployment.json').decode('ascii'))
training_params['TrainingJobName'] = job_name
resp = sm.create_training_job(**training_params)
except Exception as e:
logger.error("start_training_job - Ops! Something went wrong: %s" % e)
raise e
@helper.delete
def stop_training_job(event, context):
try:
job_name = event['ResourceProperties']['JobName']
status = sm.describe_training_job(TrainingJobName=job_name)['TrainingJobStatus']
if status == 'InProgress':
logger.info('Stopping InProgress training job: %s', job_name)
sm.stop_training_job(TrainingJobName=job_name)
return False
else:
logger.info('Training job status: %s, nothing to stop', status)
except Exception as e:
logger.error("stop_training_job - Ops! Something went wrong: %s" % e)
return True
@helper.poll_create
@helper.poll_update
def check_training_job_progress(event, context):
failed = False
try:
job_name = helper.Data.get('job_name')
resp = sm.describe_training_job(TrainingJobName=job_name)
status = resp['TrainingJobStatus']
if status == 'Completed':
logger.info('Training Job (%s) is Completed', job_name)
return True
elif status in ['InProgress', 'Stopping' ]:
logger.info('Training job (%s) still in progress (%s), waiting and polling again...',
job_name, resp['SecondaryStatus'])
elif status == 'Failed':
failed = True
                      raise Exception('Training job has failed: {}'.format(resp['FailureReason']))
else:
raise Exception('Training job ({}) has unexpected status: {}'.format(job_name, status))
except Exception as e:
logger.error("check_training_job_progress - Ops! Something went wrong: %s" % e)
if failed:
raise e
return False
@helper.poll_delete
def check_stopping_training_job_progress(event, context):
logger.info("check_stopping_training_job_progress")
return stop_training_job(event, context)
Description: "Function that will start a new Sagemaker Training Job"
Tags:
- Key: Description
Value: Lambda function that process the request and prepares the cfn template for training
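# ==== mlops_pipeline.yml: per-model training/deployment pipeline (referenced by MLPipelineIrisModel above) ====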
Description: Create a CodePipeline for a Machine Learning Pipeline
Parameters:
SourceBucketPath:
Type: String
Description: Path of the S3 bucket that CodePipeline should find a sagemaker jobfile
ModelNamePrefix:
Type: String
Description: The name prefix of the model that will be supported by this pipeline
Resources:
DeployPipeline:
Type: "AWS::CodePipeline::Pipeline"
Properties:
Name: !Sub ${ModelNamePrefix}-pipeline
RoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
ArtifactStore:
Type: S3
Location: !Sub mlops-${AWS::Region}-${AWS::AccountId}
Stages:
-
Name: Source
Actions:
-
Name: SourceAction
ActionTypeId:
Category: Source
Owner: AWS
Version: 1
Provider: S3
OutputArtifacts:
-
Name: ModelSourceOutput
Configuration:
S3Bucket:
!Sub ${SourceBucketPath}
S3ObjectKey:
!Sub training_jobs/${ModelNamePrefix}/trainingjob.zip
RunOrder: 1
-
Name: ProcessRequest
Actions:
-
Name: ProcessRequest
InputArtifacts:
- Name: ModelSourceOutput
OutputArtifacts:
-
Name: RequestOutput
ActionTypeId:
Category: Invoke
Owner: AWS
Version: 1
Provider: Lambda
Configuration:
FunctionName: mlops-op-process-request
UserParameters: !Sub '{"model_prefix": "${ModelNamePrefix}", "bucket":"aws-ai-ml-aod-latam","prefix":"mlops-workshop/assets/mlops_operation_handler.yml" }'
RunOrder: 1
-
Name: Train
Actions:
-
Name: TrainModel
InputArtifacts:
- Name: ModelSourceOutput
- Name: RequestOutput
OutputArtifacts:
- Name: ModelTrainOutput
ActionTypeId:
Category: Deploy
Owner: AWS
Version: 1
Provider: CloudFormation
Configuration:
ActionMode: CREATE_UPDATE
RoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
StackName: !Sub mlops-training-${ModelNamePrefix}-job
TemplateConfiguration: RequestOutput::assets/params_train.json
TemplatePath: RequestOutput::assets/mlops_operation_handler.yml
RunOrder: 1
-
Name: DeployDev
Actions:
-
Name: DeployDevModel
InputArtifacts:
- Name: ModelSourceOutput
- Name: RequestOutput
OutputArtifacts:
- Name: ModelDeployDevOutput
ActionTypeId:
Category: Deploy
Owner: AWS
Version: 1
Provider: CloudFormation
Configuration:
ActionMode: CREATE_UPDATE
RoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
StackName: !Sub mlops-deploy-${ModelNamePrefix}-dev
TemplateConfiguration: RequestOutput::assets/params_deploy_dev.json
TemplatePath: RequestOutput::assets/mlops_operation_handler.yml
RunOrder: 1
-
Name: DeployApproval
Actions:
-
Name: ApproveDeploy
ActionTypeId:
Category: Approval
Owner: AWS
Version: 1
Provider: Manual
Configuration:
CustomData: 'Shall this model be put into production?'
RunOrder: 1
-
Name: DeployPrd
Actions:
-
Name: DeployModelPrd
InputArtifacts:
- Name: ModelSourceOutput
- Name: RequestOutput
OutputArtifacts:
- Name: ModelDeployPrdOutput
ActionTypeId:
Category: Deploy
Owner: AWS
Version: 1
Provider: CloudFormation
Configuration:
ActionMode: CREATE_UPDATE
RoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
StackName: !Sub mlops-deploy-${ModelNamePrefix}-prd
TemplateConfiguration: RequestOutput::assets/params_deploy_prd.json
TemplatePath: RequestOutput::assets/mlops_operation_handler.yml
RunOrder: 1
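# ==== mlops_security.yml: shared MLOps IAM role (referenced by MLOpsSecurity above) ====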
Resources:
MLOpsRole:
Type: "AWS::IAM::Role"
Properties:
RoleName: MLOps
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
-
Effect: "Allow"
Principal:
Service:
- "sagemaker.amazonaws.com"
Action:
- "sts:AssumeRole"
-
Effect: "Allow"
Principal:
Service:
- "cloudformation.amazonaws.com"
Action:
- "sts:AssumeRole"
-
Effect: "Allow"
Principal:
Service:
- "codepipeline.amazonaws.com"
Action:
- "sts:AssumeRole"
-
Effect: "Allow"
Principal:
Service:
- "codebuild.amazonaws.com"
Action:
- "sts:AssumeRole"
-
Effect: "Allow"
Principal:
Service:
- "lambda.amazonaws.com"
Action:
- "sts:AssumeRole"
-
Effect: "Allow"
Principal:
Service:
- "events.amazonaws.com"
Action:
- "sts:AssumeRole"
-
Effect: "Allow"
Principal:
Service:
- "states.amazonaws.com"
Action:
- "sts:AssumeRole"
-
Effect: "Allow"
Principal:
Service:
- "glue.amazonaws.com"
Action:
- "sts:AssumeRole"
Path: "/"
Policies:
-
PolicyName: "Admin"
PolicyDocument:
Version: "2012-10-17"
Statement:
-
Effect: "Allow"
Action: "*"
Resource: "*"
Outputs:
LayerArn:
Description: Arn of the role
Value: !Ref MLOpsRole
Export:
Name: mlops-RoleArn
(One large file in this commit could not be displayed in the diff view and is omitted here.)
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Primero probemos haciendo ping (GET /ping)\n",
"Sagemaker utilizará este meétodo para comprobar el estado de nuestro modelo.\n",
"Debe devolver un código **200**."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"from urllib import request\n",
"\n",
"base_url='http://localhost:8080'"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Response code: 200\n"
]
}
],
"source": [
"resp = request.urlopen(\"%s/ping\" % base_url)\n",
"print(\"Response code: %d\" % resp.getcode() )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Ahora podemos hacer predicciones (POST /invocations)\n",
"SAgemaker utilizará este meétodo para las predicciones. Aquí estamos simulando el parámetro de encabezado relacionado con CustomAttributes"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"================================================\n",
"Response code: 200, Prediction: b'0.0\\n'\n",
"\n",
"content-type text/csv\n",
"x-request-id a1d96868-7349-4cab-b308-9e83f0c0329c\n",
"Pragma no-cache\n",
"Cache-Control no-cache; no-store, must-revalidate, private\n",
"Expires Thu, 01 Jan 1970 00:00:00 UTC\n",
"content-length 4\n",
"connection keep-alive\n",
"================================================\n",
"Response code: 200, Prediction: b'2.0\\n'\n",
"\n",
"content-type text/csv\n",
"x-request-id 250bf8e7-a439-437a-a894-b38fe19aa625\n",
"Pragma no-cache\n",
"Cache-Control no-cache; no-store, must-revalidate, private\n",
"Expires Thu, 01 Jan 1970 00:00:00 UTC\n",
"content-length 4\n",
"connection keep-alive\n",
"================================================\n",
"Response code: 200, Prediction: b'1.0\\n'\n",
"\n",
"content-type text/csv\n",
"x-request-id 77c1b1c0-19da-4715-8adf-5d918e404fea\n",
"Pragma no-cache\n",
"Cache-Control no-cache; no-store, must-revalidate, private\n",
"Expires Thu, 01 Jan 1970 00:00:00 UTC\n",
"content-length 4\n",
"connection keep-alive\n",
"CPU times: user 8.59 ms, sys: 8.27 ms, total: 16.9 ms\n",
"Wall time: 342 ms\n"
]
}
],
"source": [
"%%time\n",
"from sagemaker.serializers import CSVSerializer\n",
"csv_serializer = CSVSerializer()\n",
"payloads = [\n",
" [4.6, 3.1, 1.5, 0.2], # 0\n",
" [7.7, 2.6, 6.9, 2.3], # 2\n",
" [6.1, 2.8, 4.7, 1.2] # 1\n",
"]\n",
"\n",
"def predict(payload):\n",
" headers = {\n",
" 'Content-type': 'text/csv',\n",
" 'Accept': 'text/csv'\n",
" }\n",
" \n",
" req = request.Request(\"%s/invocations\" % base_url, data=csv_serializer.serialize(payload).encode('utf-8'), headers=headers)\n",
" resp = request.urlopen(req)\n",
" print('================================================')\n",
" print(\"Response code: %d, Prediction: %s\\n\" % (resp.getcode(), resp.read()))\n",
" for i in resp.headers:\n",
" print(i, resp.headers[i])\n",
"\n",
"for p in payloads:\n",
" predict(p)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Todo Ok"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "conda_python3",
"language": "python",
"name": "conda_python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}