<div align="center" id="top">
<img src="./.github/app.gif" alt="Mlops_aws" />
&#xa0;
<!-- <a href="https://mlops_aws.netlify.app">Demo</a> -->
</div>
<h1 align="center">Mlops_aws</h1>
<!-- Status -->
<!-- <h4 align="center">
🚧 Mlops_aws 🚀 Under construction... 🚧
</h4>
<hr> -->
<br>
## AWS MLOPS ##
Repository with the scripts needed to implement an MLOps architecture.
This implementation gives your ML code the ability to train and deploy. It lets you make changes to the code and the data while running tests that validate them.
## Prerequisites
### Services
Basic experience with the following is required:
- Training/testing an ML model
- Python ([scikit-learn](https://scikit-learn.org/stable/#))
- [Jupyter Notebook](https://jupyter.org/)
- [AWS CodePipeline](https://aws.amazon.com/codepipeline/)
- [AWS CodeCommit](https://aws.amazon.com/codecommit/)
- [AWS CodeBuild](https://aws.amazon.com/codebuild/)
- [Amazon ECR](https://aws.amazon.com/ecr/)
- [Amazon SageMaker](https://aws.amazon.com/sagemaker/)
- [AWS CloudFormation](https://aws.amazon.com/cloudformation/)
- [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html)
### AWS Account
An AWS account is required; check whether the services are available in the free tier.
## Part 1: Building the Docker Image
### Architecture
![Build Docker Image](resources/imgs/MLOps_BuildImage.jpg)
* An ML developer creates the assets (a RandomForest model) for the Docker image and pushes them to CodeCommit
* CodePipeline listens for the CodeCommit push event, fetches the source code, and launches CodeBuild
* CodeBuild authenticates against ECR, builds the Docker image, and pushes it to the ECR repository (the sketch below illustrates the authentication step)
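For illustration only, a minimal boto3 sketch of that ECR authentication handshake. The pipeline itself does this inside `buildspec.yml` with the Docker CLI; this snippet is not the repository's code:

```python
# Illustrative sketch of the ECR login step CodeBuild performs before
# `docker build` / `docker push` (normally done via the AWS/Docker CLIs).
import base64

import boto3

ecr = boto3.client('ecr')
auth = ecr.get_authorization_token()['authorizationData'][0]

# The token decodes to "AWS:<password>"; proxyEndpoint is the registry URL.
user, password = base64.b64decode(auth['authorizationToken']).decode().split(':', 1)
registry = auth['proxyEndpoint']

# These values feed `docker login` before building and pushing iris-model.
print('docker login -u %s -p <redacted> %s' % (user, registry))
```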
## Part 2: Training and Deployment
### Architecture
![Train, Deploy and Test Model](resources/imgs/MLOps_Train_Deploy_TestModel.png)
* The flow starts when a zip with the assets (deployment configuration) plus the datasets is uploaded (see the upload sketch below)
* CodePipeline listens for this event and runs the request Lambda, which verifies the file in the bucket and starts the training
* The Lambda submits training jobs to Sagemaker
* When training finishes, CodePipeline checks its status
* CodePipeline calls CloudFormation to deploy to a DEV environment
* It then waits for a manual approval
* CodePipeline calls CloudFormation to deploy to a PROD environment
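A minimal sketch of how this flow can be kicked off, assuming the bucket (`mlops-<region>-<account>`) and key (`training_jobs/iris-model/trainingjob.zip`) created and watched by the templates later in this repo; the descriptor contents below are placeholders, not working job definitions:

```python
# Hedged sketch: package the job descriptors and upload them where the
# pipeline's S3 source action listens (bucket/key taken from the templates).
import io
import json
import zipfile

import boto3

region = boto3.session.Session().region_name
account_id = boto3.client('sts').get_caller_identity()['Account']
bucket = 'mlops-%s-%s' % (region, account_id)  # created by the setup stack

# Placeholder descriptors: trainingjob.json holds a full SageMaker
# CreateTrainingJob request; deployment.json is read by the deployment Lambda.
training_params = {'TrainingJobName': 'iris-model-example'}
deployment_params = {'EndpointPrefix': 'iris-model'}

buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as z:
    z.writestr('trainingjob.json', json.dumps(training_params))
    z.writestr('deployment.json', json.dumps(deployment_params))
buf.seek(0)

# The S3 source action watches training_jobs/<model prefix>/trainingjob.zip
boto3.client('s3').put_object(
    Bucket=bucket,
    Key='training_jobs/iris-model/trainingjob.zip',
    Body=buf.read())
```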
## Instructions
Option 1: Run CloudFormation using the template located at code/yml/m.yml
Option 2: Run CloudFormation using the template hosted in a public S3 bucket (a programmatic equivalent is sketched after the table)
Region| Launch
------|-----
US East (N. Virginia) | [![Launch MLOps solution in us-east-1](resources/imgs/cloudformation-launch-stack.png)](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/new?stackName=AIWorkshop&templateURL=https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/m.yml)
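For reference, a hypothetical boto3 equivalent of the launch button above; the subnet and security-group IDs are placeholders for values from your own VPC:

```python
# Hypothetical programmatic version of Option 2.
import boto3

cfn = boto3.client('cloudformation', region_name='us-east-1')
cfn.create_stack(
    StackName='AIWorkshop',
    TemplateURL='https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/m.yml',
    Parameters=[
        {'ParameterKey': 'NotebookInstanceSubNetId',
         'ParameterValue': 'subnet-0123456789abcdef0'},   # placeholder
        {'ParameterKey': 'NotebookInstanceSecGroupId',
         'ParameterValue': 'sg-0123456789abcdef0'},       # placeholder
    ],
    Capabilities=['CAPABILITY_NAMED_IAM'],  # the template creates the MLOps role
)
```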
### Contents - Code
The code/lambdas folder contains the Python files shown in the diagram.
- mlops-env-setup: Gets environment variables from CodeCommit to set up the environment with the credentials.
- mlops-op-deployment: Contains the code that automatically deploys a model trained by Sagemaker.
- mlops-op-process-request: Verifies that the trainingjob.zip file stored in the bucket contains the files required for training and deployment.
- mlops-op-training: Extracts the data from the training file and sends it to Sagemaker to run training jobs.
### Contents - Notebooks
1- Crear Imagen Docker/
01_crear_imagen_docker: builds the image, using custom models
02_Probar_local_mode: the previous notebook has a cell that deploys a local service, and this notebook lets us test it.
03_Probando_sagemaker_estimator: tests the Docker image we built and uses it to run training jobs and make predictions.
2-Entrenamiento y Despliegue/
01_Training: shows the full deployment configuration and how it is attached to the dataset in a zip file to start the Part 2 flow.
02_Check_Progress: optional; shows a more detailed view than the one in the CodePipeline console.
# aws cloudformation delete-stack --stack-name scikit-image
# aws cloudformation create-stack --stack-name scikit-image --template-body file://build-image.yml
Description: Create a CodePipeline for creating a Docker base image for training/serving models
Parameters:
RepoBranchName:
Type: String
Description: Name of the branch the code is located
ImageRepoName:
Type: String
Description: Name of the ECR repo without the image name
ImageTagName:
Type: String
Description: Name of the ECR image tag
Default: latest
Resources:
BuildImageProject:
Type: AWS::CodeBuild::Project
Properties:
Name: !Sub mlops-buildimage-${ImageRepoName}
Description: Build a Model Image
ServiceRole: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
Artifacts:
Type: CODEPIPELINE
Source:
Type: CODEPIPELINE
BuildSpec: buildspec.yml
Environment:
Type: LINUX_CONTAINER
ComputeType: BUILD_GENERAL1_SMALL
Image: aws/codebuild/docker:17.09.0
EnvironmentVariables:
- Name: IMAGE_REPO_NAME
Value:
Ref: ImageRepoName
- Name: IMAGE_TAG
Value:
Ref: ImageTagName
- Name: AWS_ACCOUNT_ID
Value: !Sub ${AWS::AccountId}
- Name: AWS_DEFAULT_REGION
Value: !Sub ${AWS::Region}
- Name: TEMPLATE_BUCKET
Value: !Sub mlops-${AWS::Region}-${AWS::AccountId}
- Name: TEMPLATE_PREFIX
Value: codebuild
Tags:
- Key: Name
Value: !Sub mlops-buildimage-${ImageRepoName}
DeployPipeline:
Type: "AWS::CodePipeline::Pipeline"
Properties:
Name: !Sub mlops-${ImageRepoName}
RoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
ArtifactStore:
Type: S3
Location: !Sub mlops-${AWS::Region}-${AWS::AccountId}
Stages:
-
Name: Source
Actions:
-
Name: GetSource
ActionTypeId:
Category: Source
Owner: AWS
Version: 1
Provider: CodeCommit
OutputArtifacts:
-
Name: ModelSourceOutput
Configuration:
BranchName:
Ref: RepoBranchName
RepositoryName: mlops
RunOrder: 1
-
Name: Build
Actions:
-
Name: BuildImage
InputArtifacts:
- Name: ModelSourceOutput
ActionTypeId:
Category: Build
Owner: AWS
Version: 1
Provider: CodeBuild
Configuration:
ProjectName:
Ref: BuildImageProject
RunOrder: 1
# These are the parameters we'll ask the user before creating the environment
Parameters:
NotebookInstanceSubNetId:
Type: AWS::EC2::Subnet::Id
Description: "Select any subnet id"
AllowedPattern: ^subnet\-[a-zA-Z0-9]+$
ConstraintDescription: "You need to inform any subnetid"
NotebookInstanceSecGroupId:
Type: List<AWS::EC2::SecurityGroup::Id>
Description: "Select the default security group"
AllowedPattern: ^sg\-[a-zA-Z0-9]+$
ConstraintDescription: "Select the default security group"
Resources:
####################
## PERMISSIONS
####################
MLOpsSecurity:
Type: AWS::CloudFormation::Stack
DeletionPolicy: Delete
Properties:
TemplateURL: https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/assets/mlops_security.yml
MLOpsLambdaLayers:
Type: AWS::CloudFormation::Stack
DeletionPolicy: Delete
Properties:
TemplateURL: https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/assets/mlops_crhelper.yml
MLOpsProcessRequest:
Type: AWS::CloudFormation::Stack
DeletionPolicy: Delete
Properties:
TemplateURL: https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/assets/mlops_op_process_request.yml
DependsOn:
- MLOpsLambdaLayers
- MLOpsSecurity
MLOpsDeploymentOperator:
Type: AWS::CloudFormation::Stack
DeletionPolicy: Delete
Properties:
TemplateURL: https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/assets/mlops_op_deploy.yml
DependsOn:
- MLOpsLambdaLayers
- MLOpsSecurity
MLOpsTrainingOperator:
Type: AWS::CloudFormation::Stack
DeletionPolicy: Delete
Properties:
TemplateURL: https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/assets/mlops_op_training.yml
DependsOn:
- MLOpsLambdaLayers
- MLOpsSecurity
## OK. Then we'll create some repos for the source code
## We need to create two branches in the default repo, so
## let's use a custom resource with a Lambda Function to do that
## Also, when the stack is deleted we need to remove all the versioned
## files from the S3 bucket, otherwise it will fail
MLOpsEnvSetup:
Type: AWS::Lambda::Function
Properties:
Code:
ZipFile: !Sub |
import cfnresponse
import boto3
codeCommit = boto3.client('codecommit')
s3 = boto3.resource('s3')
ecr = boto3.client('ecr')
def lambda_handler(event, context):
responseData = {'status': 'NONE'}
if event['RequestType'] == 'Create':
repoName = event['ResourceProperties'].get('RepoName')
branch_names = event['ResourceProperties'].get('BranchNames')
branches = codeCommit.list_branches(repositoryName=repoName)['branches']
responseData['default_branch'] = branch_names[0]
if len(branches) == 0:
putFiles = {'filePath': 'buildspec.yml', 'fileContent': "version: 0.2\nphases:\n build:\n commands:\n - echo 'dummy'\n".encode()}
resp = codeCommit.create_commit(repositoryName=repoName, branchName='master', commitMessage=' - repo init', putFiles=[putFiles])
for i in branch_names:
codeCommit.create_branch(repositoryName=repoName, branchName=i, commitId=resp['commitId'])
responseData['status'] = 'CREATED'
elif event['RequestType'] == 'Delete':
s3.Bucket( event['ResourceProperties'].get('BucketName') ).object_versions.all().delete()
try:
for i in event['ResourceProperties'].get('ImageRepoNames'):
imgs = ecr.list_images(registryId='${AWS::AccountId}', repositoryName=i)
ecr.batch_delete_image(registryId='${AWS::AccountId}', repositoryName=i, imageIds=imgs['imageIds'])
except Exception as e:
pass
responseData['status'] = 'DELETED'
cfnresponse.send(event, context, cfnresponse.SUCCESS, responseData)
FunctionName: mlops-env-setup
Handler: "index.lambda_handler"
Timeout: 60
MemorySize: 512
Role: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
Runtime: python3.7
DependsOn:
- MLOpsSecurity
MLOpsEnvSetupCaller:
Type: Custom::EnvSetupCaller
Properties:
ServiceToken: !GetAtt MLOpsEnvSetup.Arn
RepoName: mlops
BranchNames:
- iris_model
ImageRepoNames:
- iris-model
BucketName: !Ref MLOpsBucket
DependsOn:
- MLOpsRepo
- MLOpsIrisModelRepo
####################
## REPOSITORIES
####################
## We have the custom resource that can invoke a lambda function to create the branches,
## So, let's create the CodeCommit repo and also the SageMaker Code repos
MLOpsRepo:
Type: AWS::CodeCommit::Repository
Properties:
RepositoryDescription: Repository for the ML models/images code
RepositoryName: mlops
MLOpsIrisModelRepo:
Type: AWS::ECR::Repository
Properties:
RepositoryName: iris-model
MLOpsBucket:
Type: AWS::S3::Bucket
Properties:
BucketName: !Sub mlops-${AWS::Region}-${AWS::AccountId}
Tags:
- Key: Name
Value: !Sub mlops-${AWS::Region}-${AWS::AccountId}
AccessControl: Private
VersioningConfiguration:
Status: Enabled
####################
## PIPELINES
####################
BuildPipelineIrisModel:
Type: AWS::CloudFormation::Stack
DeletionPolicy: Delete
Properties:
TemplateURL: https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/assets/build_image.yml
Parameters:
RepoBranchName: iris_model
ImageRepoName: iris-model
ImageTagName: latest
DependsOn:
- MLOpsBucket
- MLOpsCodeRepo
MLPipelineIrisModel:
Type: AWS::CloudFormation::Stack
DeletionPolicy: Delete
Properties:
TemplateURL: https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/assets/mlops_pipeline.yml
Parameters:
SourceBucketPath: !Ref MLOpsBucket
ModelNamePrefix: iris-model
DependsOn:
- MLOpsBucket
- MLOpsCodeRepo
####################
## Notebook Instance
####################
MLOpsExercisesRepo:
Type: AWS::SageMaker::CodeRepository
Properties:
CodeRepositoryName: MLOpsExercisesRepo
GitConfig:
RepositoryUrl: https://github.com/awslabs/amazon-sagemaker-mlops-workshop.git
MLOpsCodeRepo:
Type: AWS::SageMaker::CodeRepository
Properties:
CodeRepositoryName: MLOpsCodeRepo
GitConfig:
RepositoryUrl: !GetAtt MLOpsRepo.CloneUrlHttp
Branch: !Sub "${MLOpsEnvSetupCaller.default_branch}"
DependsOn: MLOpsEnvSetupCaller
IAWorkshopNotebookInstanceLifecycleConfig:
Type: "AWS::SageMaker::NotebookInstanceLifecycleConfig"
Properties:
NotebookInstanceLifecycleConfigName: !Sub ${AWS::StackName}-lifecycle-config
OnStart:
- Content: !Base64 |
#!/bin/bash
sudo -u ec2-user -i <<'EOF'
echo "Finally, let's clone and build an image for testing codebuild locally"
git clone https://github.com/aws/aws-codebuild-docker-images.git /tmp/aws-codebuild
chmod +x /tmp/aws-codebuild/local_builds/codebuild_build.sh
docker pull amazon/aws-codebuild-local:latest --disable-content-trust=false
# This will affect only the Jupyter kernel called "conda_python3".
source activate python3
# Update the sagemaker to the latest version
pip install -U sagemaker
source deactivate
EOF
MLOpsNotebookInstance:
Type: "AWS::SageMaker::NotebookInstance"
Properties:
NotebookInstanceName: MLOpsWorkshop
InstanceType: "ml.m4.xlarge"
SubnetId: !Ref NotebookInstanceSubNetId
SecurityGroupIds: !Ref NotebookInstanceSecGroupId
RoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
DefaultCodeRepository: MLOpsCodeRepo
AdditionalCodeRepositories:
- MLOpsExercisesRepo
VolumeSizeInGB: 15
LifecycleConfigName: !GetAtt IAWorkshopNotebookInstanceLifecycleConfig.NotebookInstanceLifecycleConfigName
DependsOn:
- IAWorkshopNotebookInstanceLifecycleConfig
- MLOpsCodeRepo
- MLOpsExercisesRepo
- MLOpsSecurity
Outputs:
MLOpsNotebookInstanceId:
Value: !Ref MLOpsNotebookInstance
Resources:
CloudFormationHelperLayer:
Type: AWS::Lambda::LayerVersion
Properties:
CompatibleRuntimes:
- python3.6
- python3.7
LayerName: crhelper
Description: https://github.com/aws-cloudformation/custom-resource-helper
LicenseInfo: Apache 2.0 License
Content:
S3Bucket: aws-ai-ml-aod-latam
S3Key: mlops-workshop/assets/crhelper.zip
Outputs:
LayerArn:
Description: Arn of the layer's latest version
Value: !Ref CloudFormationHelperLayer
Export:
Name: mlops-crhelper-LayerArn
Resources:
MLOpsDeployment:
Type: "AWS::Lambda::Function"
Properties:
FunctionName: mlops-op-deployment
Handler: mlops_op_deploy.lambda_handler
MemorySize: 512
Role: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
Runtime: python3.7
Timeout: 60
Layers:
- Fn::ImportValue: mlops-crhelper-LayerArn
Code:
S3Bucket: aws-ai-ml-aod-latam
S3Key: mlops-workshop/assets/src/mlops_op_deploy.zip
Description: "Function that will start a new Sagemaker Deployment"
Tags:
- Key: Description
Value: Lambda function that processes the request and prepares the cfn template for deployment
Resources:
MLOpsProcessRequest:
Type: "AWS::Lambda::Function"
Properties:
FunctionName: mlops-op-process-request
Handler: index.lambda_handler
MemorySize: 512
Role: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
Runtime: python3.7
Timeout: 60
Code:
ZipFile: !Sub |
import boto3
import io
import zipfile
import json
from datetime import datetime
s3 = boto3.client('s3')
codepipeline = boto3.client('codepipeline')
def lambda_handler(event, context):
trainingJob = None
deployment = None
try:
now = datetime.now()
jobId = event["CodePipeline.job"]["id"]
user_params = json.loads(event["CodePipeline.job"]["data"]["actionConfiguration"]["configuration"]["UserParameters"])
model_prefix = user_params['model_prefix']
mlops_operation_template = s3.get_object(Bucket=user_params['bucket'], Key=user_params['prefix'] )['Body'].read()
job_name = 'mlops-%s-%s' % (model_prefix, now.strftime("%Y-%m-%d-%H-%M-%S"))
s3Location = None
for inputArtifacts in event["CodePipeline.job"]["data"]["inputArtifacts"]:
if inputArtifacts['name'] == 'ModelSourceOutput':
s3Location = inputArtifacts['location']['s3Location']
params = {
"Parameters": {
"AssetsBucket": s3Location['bucketName'],
"AssetsKey": s3Location['objectKey'],
"Operation": "training",
"Environment": "none",
"JobName": job_name
}
}
for outputArtifacts in event["CodePipeline.job"]["data"]["outputArtifacts"]:
if outputArtifacts['name'] == 'RequestOutput':
s3Location = outputArtifacts['location']['s3Location']
zip_bytes = io.BytesIO()
with zipfile.ZipFile(zip_bytes, "w") as z:
z.writestr('assets/params_train.json', json.dumps(params))
params['Parameters']['Operation'] = 'deployment'
params['Parameters']['Environment'] = 'development'
z.writestr('assets/params_deploy_dev.json', json.dumps(params))
params['Parameters']['Environment'] = 'production'
z.writestr('assets/params_deploy_prd.json', json.dumps(params))
z.writestr('assets/mlops_operation_handler.yml', mlops_operation_template)
zip_bytes.seek(0)
s3.put_object(Bucket=s3Location['bucketName'], Key=s3Location['objectKey'], Body=zip_bytes.read())
# and update codepipeline
codepipeline.put_job_success_result(jobId=jobId)
except Exception as e:
resp = codepipeline.put_job_failure_result(
jobId=jobId,
failureDetails={
'type': 'ConfigurationError',
'message': str(e),
'externalExecutionId': context.aws_request_id
}
)
Description: "Function that will start a new Sagemaker Training Job"
Tags:
- Key: Description
Value: Lambda function that process the request and prepares the cfn template for training
Resources:
MLOpsTraining:
Type: "AWS::Lambda::Function"
Properties:
FunctionName: mlops-op-training
Handler: index.lambda_handler
MemorySize: 512
Role: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
Runtime: python3.7
Timeout: 60
Layers:
- Fn::ImportValue: mlops-crhelper-LayerArn
Code:
ZipFile: !Sub |
import boto3
import io
import zipfile
import json
import logging
from crhelper import CfnResource
logger = logging.getLogger(__name__)
# Initialise the helper, all inputs are optional, this example shows the defaults
helper = CfnResource(json_logging=False, log_level='DEBUG', boto_level='CRITICAL')
s3 = boto3.client('s3')
sm = boto3.client('sagemaker')
def lambda_handler(event, context):
helper(event, context)
@helper.create
@helper.update
def start_training_job(event, context):
try:
# Get the training job and deployment descriptors
training_params = None
deployment_params = None
job_name = event['ResourceProperties']['JobName']
helper.Data.update({'job_name': job_name})
try:
# We need to check if there is another training job with the same name
sm.describe_training_job(TrainingJobName=job_name)
## there is; let the poller handle it
except Exception as a:
# Ok. there isn't. so, let's start a new training job
resp = s3.get_object(Bucket=event['ResourceProperties']['AssetsBucket'], Key=event['ResourceProperties']['AssetsKey'])
with zipfile.ZipFile(io.BytesIO(resp['Body'].read()), "r") as z:
training_params = json.loads(z.read('trainingjob.json').decode('ascii'))
deployment_params = json.loads(z.read('deployment.json').decode('ascii'))
training_params['TrainingJobName'] = job_name
resp = sm.create_training_job(**training_params)
except Exception as e:
logger.error("start_training_job - Ops! Something went wrong: %s" % e)
raise e
@helper.delete
def stop_training_job(event, context):
try:
job_name = event['ResourceProperties']['JobName']
status = sm.describe_training_job(TrainingJobName=job_name)['TrainingJobStatus']
if status == 'InProgress':
logger.info('Stopping InProgress training job: %s', job_name)
sm.stop_training_job(TrainingJobName=job_name)
return False
else:
logger.info('Training job status: %s, nothing to stop', status)
except Exception as e:
logger.error("stop_training_job - Ops! Something went wrong: %s" % e)
return True
@helper.poll_create
@helper.poll_update
def check_training_job_progress(event, context):
failed = False
try:
job_name = helper.Data.get('job_name')
resp = sm.describe_training_job(TrainingJobName=job_name)
status = resp['TrainingJobStatus']
if status == 'Completed':
logger.info('Training Job (%s) is Completed', job_name)
return True
elif status in ['InProgress', 'Stopping' ]:
logger.info('Training job (%s) still in progress (%s), waiting and polling again...',
job_name, resp['SecondaryStatus'])
elif status == 'Failed':
failed = True
raise Exception('Training job has failed: {}'.format(resp['FailureReason']))
else:
raise Exception('Training job ({}) has unexpected status: {}'.format(job_name, status))
except Exception as e:
logger.error("check_training_job_progress - Ops! Something went wrong: %s" % e)
if failed:
raise e
return False
@helper.poll_delete
def check_stopping_training_job_progress(event, context):
logger.info("check_stopping_training_job_progress")
return stop_training_job(event, context)
Description: "Function that will start a new Sagemaker Training Job"
Tags:
- Key: Description
Value: Lambda function that processes the request and prepares the cfn template for training
Description: Create a CodePipeline for a Machine Learning Pipeline
Parameters:
SourceBucketPath:
Type: String
Description: Path of the S3 bucket where CodePipeline should find a sagemaker jobfile
ModelNamePrefix:
Type: String
Description: The name prefix of the model that will be supported by this pipeline
Resources:
DeployPipeline:
Type: "AWS::CodePipeline::Pipeline"
Properties:
Name: !Sub ${ModelNamePrefix}-pipeline
RoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
ArtifactStore:
Type: S3
Location: !Sub mlops-${AWS::Region}-${AWS::AccountId}
Stages:
-
Name: Source
Actions:
-
Name: SourceAction
ActionTypeId:
Category: Source
Owner: AWS
Version: 1
Provider: S3
OutputArtifacts:
-
Name: ModelSourceOutput
Configuration:
S3Bucket:
!Sub ${SourceBucketPath}
S3ObjectKey:
!Sub training_jobs/${ModelNamePrefix}/trainingjob.zip
RunOrder: 1
-
Name: ProcessRequest
Actions:
-
Name: ProcessRequest
InputArtifacts:
- Name: ModelSourceOutput
OutputArtifacts:
-
Name: RequestOutput
ActionTypeId:
Category: Invoke
Owner: AWS
Version: 1
Provider: Lambda
Configuration:
FunctionName: mlops-op-process-request
UserParameters: !Sub '{"model_prefix": "${ModelNamePrefix}", "bucket":"aws-ai-ml-aod-latam","prefix":"mlops-workshop/assets/mlops_operation_handler.yml" }'
RunOrder: 1
-
Name: Train
Actions:
-
Name: TrainModel
InputArtifacts:
- Name: ModelSourceOutput
- Name: RequestOutput
OutputArtifacts:
- Name: ModelTrainOutput
ActionTypeId:
Category: Deploy
Owner: AWS
Version: 1
Provider: CloudFormation
Configuration:
ActionMode: CREATE_UPDATE
RoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
StackName: !Sub mlops-training-${ModelNamePrefix}-job
TemplateConfiguration: RequestOutput::assets/params_train.json
TemplatePath: RequestOutput::assets/mlops_operation_handler.yml
RunOrder: 1
-
Name: DeployDev
Actions:
-
Name: DeployDevModel
InputArtifacts:
- Name: ModelSourceOutput
- Name: RequestOutput
OutputArtifacts:
- Name: ModelDeployDevOutput
ActionTypeId:
Category: Deploy
Owner: AWS
Version: 1
Provider: CloudFormation
Configuration:
ActionMode: CREATE_UPDATE
RoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
StackName: !Sub mlops-deploy-${ModelNamePrefix}-dev
TemplateConfiguration: RequestOutput::assets/params_deploy_dev.json
TemplatePath: RequestOutput::assets/mlops_operation_handler.yml
RunOrder: 1
-
Name: DeployApproval
Actions:
-
Name: ApproveDeploy
ActionTypeId:
Category: Approval
Owner: AWS
Version: 1
Provider: Manual
Configuration:
CustomData: 'Shall this model be put into production?'
RunOrder: 1
-
Name: DeployPrd
Actions:
-
Name: DeployModelPrd
InputArtifacts:
- Name: ModelSourceOutput
- Name: RequestOutput
OutputArtifacts:
- Name: ModelDeployPrdOutput
ActionTypeId:
Category: Deploy
Owner: AWS
Version: 1
Provider: CloudFormation
Configuration:
ActionMode: CREATE_UPDATE
RoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/MLOps
StackName: !Sub mlops-deploy-${ModelNamePrefix}-prd
TemplateConfiguration: RequestOutput::assets/params_deploy_prd.json
TemplatePath: RequestOutput::assets/mlops_operation_handler.yml
RunOrder: 1
Resources:
MLOpsRole:
Type: "AWS::IAM::Role"
Properties:
RoleName: MLOps
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
-
Effect: "Allow"
Principal:
Service:
- "sagemaker.amazonaws.com"
Action:
- "sts:AssumeRole"
-
Effect: "Allow"
Principal:
Service:
- "cloudformation.amazonaws.com"
Action:
- "sts:AssumeRole"
-
Effect: "Allow"
Principal:
Service:
- "codepipeline.amazonaws.com"
Action:
- "sts:AssumeRole"
-
Effect: "Allow"
Principal:
Service:
- "codebuild.amazonaws.com"
Action:
- "sts:AssumeRole"
-
Effect: "Allow"
Principal:
Service:
- "lambda.amazonaws.com"
Action:
- "sts:AssumeRole"
-
Effect: "Allow"
Principal:
Service:
- "events.amazonaws.com"
Action:
- "sts:AssumeRole"
-
Effect: "Allow"
Principal:
Service:
- "states.amazonaws.com"
Action:
- "sts:AssumeRole"
-
Effect: "Allow"
Principal:
Service:
- "glue.amazonaws.com"
Action:
- "sts:AssumeRole"
Path: "/"
Policies:
-
PolicyName: "Admin"
PolicyDocument:
Version: "2012-10-17"
Statement:
-
Effect: "Allow"
Action: "*"
Resource: "*"
Outputs:
LayerArn:
Description: Arn of the role
Value: !Ref MLOpsRole
Export:
Name: mlops-RoleArn
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Primero probemos haciendo ping (GET /ping)\n",
"Sagemaker utilizará este meétodo para comprobar el estado de nuestro modelo.\n",
"Debe devolver un código **200**."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"from urllib import request\n",
"\n",
"base_url='http://localhost:8080'"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Response code: 200\n"
]
}
],
"source": [
"resp = request.urlopen(\"%s/ping\" % base_url)\n",
"print(\"Response code: %d\" % resp.getcode() )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Ahora podemos hacer predicciones (POST /invocations)\n",
"SAgemaker utilizará este meétodo para las predicciones. Aquí estamos simulando el parámetro de encabezado relacionado con CustomAttributes"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"================================================\n",
"Response code: 200, Prediction: b'0.0\\n'\n",
"\n",
"content-type text/csv\n",
"x-request-id a1d96868-7349-4cab-b308-9e83f0c0329c\n",
"Pragma no-cache\n",
"Cache-Control no-cache; no-store, must-revalidate, private\n",
"Expires Thu, 01 Jan 1970 00:00:00 UTC\n",
"content-length 4\n",
"connection keep-alive\n",
"================================================\n",
"Response code: 200, Prediction: b'2.0\\n'\n",
"\n",
"content-type text/csv\n",
"x-request-id 250bf8e7-a439-437a-a894-b38fe19aa625\n",
"Pragma no-cache\n",
"Cache-Control no-cache; no-store, must-revalidate, private\n",
"Expires Thu, 01 Jan 1970 00:00:00 UTC\n",
"content-length 4\n",
"connection keep-alive\n",
"================================================\n",
"Response code: 200, Prediction: b'1.0\\n'\n",
"\n",
"content-type text/csv\n",
"x-request-id 77c1b1c0-19da-4715-8adf-5d918e404fea\n",
"Pragma no-cache\n",
"Cache-Control no-cache; no-store, must-revalidate, private\n",
"Expires Thu, 01 Jan 1970 00:00:00 UTC\n",
"content-length 4\n",
"connection keep-alive\n",
"CPU times: user 8.59 ms, sys: 8.27 ms, total: 16.9 ms\n",
"Wall time: 342 ms\n"
]
}
],
"source": [
"%%time\n",
"from sagemaker.serializers import CSVSerializer\n",
"csv_serializer = CSVSerializer()\n",
"payloads = [\n",
" [4.6, 3.1, 1.5, 0.2], # 0\n",
" [7.7, 2.6, 6.9, 2.3], # 2\n",
" [6.1, 2.8, 4.7, 1.2] # 1\n",
"]\n",
"\n",
"def predict(payload):\n",
" headers = {\n",
" 'Content-type': 'text/csv',\n",
" 'Accept': 'text/csv'\n",
" }\n",
" \n",
" req = request.Request(\"%s/invocations\" % base_url, data=csv_serializer.serialize(payload).encode('utf-8'), headers=headers)\n",
" resp = request.urlopen(req)\n",
" print('================================================')\n",
" print(\"Response code: %d, Prediction: %s\\n\" % (resp.getcode(), resp.read()))\n",
" for i in resp.headers:\n",
" print(i, resp.headers[i])\n",
"\n",
"for p in payloads:\n",
" predict(p)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Todo Ok"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "conda_python3",
"language": "python",
"name": "conda_python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Integrated Test\n",
"En esta prueba, usaremos un Estimador de SageMaker (https://sagemaker.readthedocs.io/en/stable/estimators.html) para encapsular la imagen de la ventana acoplable publicada en ECR e iniciar una prueba ** local **, pero esta vez, utilizando la biblioteca SageMaker."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import sagemaker\n",
"import json\n",
"from sagemaker import get_execution_role\n",
"\n",
"role = get_execution_role()\n",
"sagemaker_session = sagemaker.Session()\n",
"bucket = sagemaker_session.default_bucket()\n",
"prefix='mlops/iris'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Upload the dataset\n",
"En el ejercicio anterior, preparó el conjunto de datos de capacitación y validación. Ahora, cargaremos los archivos CSV en S3 y los usaremos con un Estimator."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train: s3://sagemaker-us-east-1-894268508623/iris-model/input/train\n",
"Validation: s3://sagemaker-us-east-1-894268508623/iris-model/input/validation\n"
]
}
],
"source": [
"train_path = sagemaker_session.upload_data(path='input/data/train', key_prefix='iris-model/input/train')\n",
"test_path = sagemaker_session.upload_data(path='input/data/validation', key_prefix='iris-model/input/validation')\n",
"print(\"Train: %s\\nValidation: %s\" % (train_path, test_path) )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Y ahora, podemos usar un Estimador de SageMaker para entrenar e implementar el contenedor que hemos creado."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'max_depth': 20, 'n_jobs': 4, 'n_estimators': 120}\n"
]
}
],
"source": [
"# Create the estimator\n",
"# iris-model:test es el nombre del contenedor construido en local\n",
"# Mediante la prueba de creación de código local. Se envió una imagen con ese nombre: etiqueta al ECR.\n",
"iris = sagemaker.estimator.Estimator('iris-model:test',\n",
" role,\n",
" instance_count=1, \n",
" instance_type='local',\n",
" output_path='s3://{}/{}/output'.format(bucket, prefix))\n",
"hyperparameters = {\n",
" 'max_depth': 20,\n",
" 'n_jobs': 4,\n",
" 'n_estimators': 120\n",
"}\n",
"\n",
"print(hyperparameters)\n",
"iris.set_hyperparameters(**hyperparameters)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After you call .fit, a new training job will be executed inside the *local Docker daemon* and not in the SageMaker environment, on the cloud"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Creating xxjwimllzt-algo-1-0ttgc ... \n",
"Creating xxjwimllzt-algo-1-0ttgc ... done\n",
"Attaching to xxjwimllzt-algo-1-0ttgc\n",
"\u001b[36mxxjwimllzt-algo-1-0ttgc |\u001b[0m Training mode\n",
"\u001b[36mxxjwimllzt-algo-1-0ttgc |\u001b[0m Training the classifier\n",
"\u001b[36mxxjwimllzt-algo-1-0ttgc |\u001b[0m [Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.\n",
"\u001b[36mxxjwimllzt-algo-1-0ttgc |\u001b[0m [Parallel(n_jobs=4)]: Done 42 tasks | elapsed: 0.1s\n",
"\u001b[36mxxjwimllzt-algo-1-0ttgc |\u001b[0m [Parallel(n_jobs=4)]: Done 120 out of 120 | elapsed: 0.2s finished\n",
"\u001b[36mxxjwimllzt-algo-1-0ttgc |\u001b[0m [Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.\n",
"\u001b[36mxxjwimllzt-algo-1-0ttgc |\u001b[0m [Parallel(n_jobs=4)]: Done 42 tasks | elapsed: 0.0s\n",
"\u001b[36mxxjwimllzt-algo-1-0ttgc |\u001b[0m [Parallel(n_jobs=4)]: Done 120 out of 120 | elapsed: 0.0s finished\n",
"\u001b[36mxxjwimllzt-algo-1-0ttgc |\u001b[0m Score: 0.98\n",
"\u001b[36mxxjwimllzt-algo-1-0ttgc exited with code 0\n",
"\u001b[0mAborting on container exit...\n",
"===== Job Complete =====\n"
]
}
],
"source": [
"iris.fit({'train': train_path, 'validation': test_path })"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"El siguiente comando lanzará un nuevo contenedor en su daemon Docker local. Luego puede usar el predictor devuelto para probarlo"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Attaching to hkw8j4xuqr-algo-1-h9ljw\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m Warning: MMS is using non-default JVM parameters: -XX:-UseContainerSupport\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:10,899 [INFO ] main com.amazonaws.ml.mms.ModelServer - \n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m MMS Home: /usr/local/lib/python3.7/site-packages\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m Current directory: /\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m Temp directory: /tmp\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m Number of GPUs: 0\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m Number of CPUs: 4\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m Max heap size: 4012 M\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m Python executable: /usr/local/bin/python\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m Config file: /etc/sagemaker-mms.properties\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m Inference address: http://0.0.0.0:8080\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m Management address: http://0.0.0.0:8080\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m Model Store: /.sagemaker/mms/models\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m Initial Models: ALL\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m Log dir: /logs\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m Metrics dir: /logs\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m Netty threads: 0\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m Netty client threads: 0\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m Default workers per model: 4\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m Blacklist Regex: N/A\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m Maximum Response Size: 6553500\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m Maximum Request Size: 6553500\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m Preload model: false\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m Prefer direct buffer: false\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:10,991 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-9000-model\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:11,080 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model_service_worker started with args: --sock-type unix --sock-name /tmp/.mms.sock.9000 --handler serving.handler --model-path /.sagemaker/mms/models/model --model-name model --preload-model false --tmp-dir /tmp\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:11,081 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Listening on port: /tmp/.mms.sock.9000\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:11,082 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [PID] 36\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:11,083 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - MMS worker started.\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:11,083 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Python runtime: 3.7.11\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:11,084 [INFO ] main com.amazonaws.ml.mms.wlm.ModelManager - Model model loaded.\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:11,109 [INFO ] main com.amazonaws.ml.mms.ModelServer - Initialize Inference server with: EpollServerSocketChannel.\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:11,127 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /tmp/.mms.sock.9000\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:11,128 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /tmp/.mms.sock.9000\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:11,129 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /tmp/.mms.sock.9000\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:11,129 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /tmp/.mms.sock.9000\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:11,224 [INFO ] main com.amazonaws.ml.mms.ModelServer - Inference API bind to: http://0.0.0.0:8080\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m Model server started.\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:11,236 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /tmp/.mms.sock.9000.\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:11,238 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /tmp/.mms.sock.9000.\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:11,240 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /tmp/.mms.sock.9000.\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:11,242 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /tmp/.mms.sock.9000.\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:11,247 [WARN ] pool-2-thread-1 com.amazonaws.ml.mms.metrics.MetricCollector - worker pid is not available yet.\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:12,399 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Model model loaded io_fd=0242acfffe120002-00000010-00000000-e441ebc9770b100d-da0dc8ed\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:12,400 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Model model loaded io_fd=0242acfffe120002-00000010-00000004-63e19bc9770b100d-5eb9bc23\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:12,405 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Model model loaded io_fd=0242acfffe120002-00000010-00000003-78929bc9770b100d-a764d1d2\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:12,422 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 1075\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:12,422 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 1091\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:12,422 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 1088\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:12,424 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-model-4\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:12,425 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-model-1\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:12,425 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Model model loaded io_fd=0242acfffe120002-00000010-00000001-f751ebc9770b100d-22f4ab03\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:12,425 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 1078\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:12,429 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-model-3\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:12,429 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-model-2\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:12,553 [INFO ] pool-1-thread-6 ACCESS_LOG - /172.18.0.1:42220 \"GET /ping HTTP/1.1\" 200 19\n",
"!"
]
}
],
"source": [
"iris_predictor = iris.deploy(initial_instance_count=1, instance_type='local')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Revisar el uso del predicto (https://sagemaker.readthedocs.io/en/stable/predictors.html) para los tests"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:55,751 [WARN ] W-model-4-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:55,764 [WARN ] W-model-4-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [Parallel(n_jobs=4)]: Done 42 tasks | elapsed: 0.0s\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:55,777 [WARN ] W-model-4-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [Parallel(n_jobs=4)]: Done 120 out of 120 | elapsed: 0.0s finished\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:55,855 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 106\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:55,856 [INFO ] W-9000-model ACCESS_LOG - /172.18.0.1:42240 \"POST /invocations HTTP/1.1\" 200 110\n",
"RESULT: 1.0 == [['1.0']] ? False\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:55,863 [WARN ] W-model-1-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:55,876 [WARN ] W-model-1-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [Parallel(n_jobs=4)]: Done 42 tasks | elapsed: 0.0s\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:55,891 [WARN ] W-model-1-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [Parallel(n_jobs=4)]: Done 120 out of 120 | elapsed: 0.0s finished\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:55,967 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 106\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:55,967 [INFO ] W-9000-model ACCESS_LOG - /172.18.0.1:42240 \"POST /invocations HTTP/1.1\" 200 107\n",
"RESULT: 1.0 == [['1.0']] ? False\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:55,972 [WARN ] W-model-3-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:55,985 [WARN ] W-model-3-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [Parallel(n_jobs=4)]: Done 42 tasks | elapsed: 0.0s\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:55,999 [WARN ] W-model-3-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [Parallel(n_jobs=4)]: Done 120 out of 120 | elapsed: 0.0s finished\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:56,076 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 105\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:56,077 [INFO ] W-9000-model ACCESS_LOG - /172.18.0.1:42240 \"POST /invocations HTTP/1.1\" 200 107\n",
"RESULT: 0.0 == [['0.0']] ? False\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:56,081 [WARN ] W-model-2-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:56,095 [WARN ] W-model-2-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [Parallel(n_jobs=4)]: Done 42 tasks | elapsed: 0.0s\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:56,109 [WARN ] W-model-2-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [Parallel(n_jobs=4)]: Done 120 out of 120 | elapsed: 0.0s finished\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:56,186 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 106\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:56,186 [INFO ] W-9000-model ACCESS_LOG - /172.18.0.1:42240 \"POST /invocations HTTP/1.1\" 200 107\n",
"RESULT: 0.0 == [['0.0']] ? False\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:56,191 [WARN ] W-model-4-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:56,202 [WARN ] W-model-4-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [Parallel(n_jobs=4)]: Done 42 tasks | elapsed: 0.0s\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:56,216 [WARN ] W-model-4-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [Parallel(n_jobs=4)]: Done 120 out of 120 | elapsed: 0.0s finished\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:56,293 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 103\n",
"\u001b[36mhkw8j4xuqr-algo-1-h9ljw |\u001b[0m 2021-08-18 22:04:56,293 [INFO ] W-9000-model ACCESS_LOG - /172.18.0.1:42240 \"POST /invocations HTTP/1.1\" 200 104\n",
"RESULT: 1.0 == [['1.0']] ? False\n"
]
}
],
"source": [
"import pandas as pd\n",
"import random\n",
"from sagemaker.serializers import CSVSerializer\n",
"from sagemaker.deserializers import CSVDeserializer\n",
"\n",
"# configure the predictor to do everything for us\n",
"iris_predictor.serializer = CSVSerializer()\n",
"iris_predictor.deserializer = CSVDeserializer()\n",
"\n",
"# load the testing data from the validation csv\n",
"validation = pd.read_csv('input/data/validation/testing.csv', header=None)\n",
"idx = random.randint(0,len(validation)-5)\n",
"req = validation.iloc[idx:idx+5].values\n",
"\n",
"# cut a sample with 5 lines from our dataset and then split the label from the features.\n",
"X = req[:,1:].tolist()\n",
"y = req[:,0].tolist()\n",
"\n",
"# call the local endpoint\n",
"for features,label in zip(X,y):\n",
" prediction = iris_predictor.predict(features)\n",
"\n",
" # compare the results\n",
" print(\"RESULT: {} == {} ? {}\".format( label, prediction, label == prediction ) )"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Gracefully stopping... (press Ctrl+C again to force)\n"
]
}
],
"source": [
"iris_predictor.delete_endpoint()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ¡Eso es! :) Ahora puede volver al cuaderno de Jupyter anterior y pushear a un repo para comenzar a construir la imagen final de Docker."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "conda_python3",
"language": "python",
"name": "conda_python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Ahora es el momento de iniciar el pipeline de ML automatizado usando el entorno MLOps\n",
"\n",
"Lo harecomos colocando un archivo zip llamado trainingjob.zip, en un bucket de S3. Codepipeline escuchara este bucket y comenzará un nuevo job. Este archivo zip tiene la siguiente estructura:\n",
"\n",
"- trainingjob.json :Descriptor de training job de Sagemaker\n",
"- environment.json : instrucciones para el entorno de cómo implementar y preparar los endpoints"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1.1 Definir hiperparámetros"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3\n"
]
}
],
"source": [
"import sagemaker\n",
"import boto3\n",
"\n",
"use_xgboost_builtin=True\n",
"\n",
"sts_client = boto3.client(\"sts\")\n",
"account_id = sts_client.get_caller_identity()[\"Account\"]\n",
"region = boto3.session.Session().region_name\n",
"model_prefix='iris-model'\n",
"training_image = None\n",
"hyperparameters = None\n",
"if use_xgboost_builtin: \n",
" training_image = sagemaker.image_uris.retrieve('xgboost', boto3.Session().region_name, version='1.0-1')\n",
" hyperparameters = {\n",
" \"alpha\": 0.42495142279951414,\n",
" \"eta\": 0.4307531922567607,\n",
" \"gamma\": 1.8028358018081714,\n",
" \"max_depth\": 10,\n",
" \"min_child_weight\": 5.925133573560345,\n",
" \"num_class\": 3,\n",
" \"num_round\": 30,\n",
" \"objective\": \"multi:softmax\",\n",
" \"reg_lambda\": 10,\n",
" \"silent\": 0,\n",
" }\n",
"else:\n",
" # usar test ppor mientras, normalmente usar latest\n",
" training_image = '{}.dkr.ecr.{}.amazonaws.com/{}:test'.format(account_id, region, model_prefix)\n",
" hyperparameters = {\n",
" \"max_depth\": 11,\n",
" \"n_jobs\": 5,\n",
" \"n_estimators\": 120\n",
" }\n",
"print(training_image)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1.2 Creamos el trainingjob descriptor"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"import time\n",
"import sagemaker\n",
"import boto3\n",
"\n",
"roleArn = \"arn:aws:iam::{}:role/MLOps\".format(account_id)\n",
"timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())\n",
"job_name = model_prefix + timestamp\n",
"sagemaker_session = sagemaker.Session()\n",
"\n",
"training_params = {}\n",
"\n",
"# Here we set the reference for the Image Classification Docker image, stored on ECR (https://aws.amazon.com/pt/ecr/)\n",
"training_params[\"AlgorithmSpecification\"] = {\n",
" \"TrainingImage\": training_image,\n",
" \"TrainingInputMode\": \"File\"\n",
"}\n",
"\n",
"# The IAM role with all the permissions given to Sagemaker\n",
"training_params[\"RoleArn\"] = roleArn\n",
"\n",
"# Here Sagemaker will store the final trained model\n",
"training_params[\"OutputDataConfig\"] = {\n",
" \"S3OutputPath\": 's3://{}/{}'.format(sagemaker_session.default_bucket(), model_prefix)\n",
"}\n",
"\n",
"# This is the config of the instance that will execute the training\n",
"training_params[\"ResourceConfig\"] = {\n",
" \"InstanceCount\": 1,\n",
" \"InstanceType\": \"ml.m4.xlarge\",\n",
" \"VolumeSizeInGB\": 30\n",
"}\n",
"\n",
"# The job name. You'll see this name in the Jobs section of the Sagemaker's console\n",
"training_params[\"TrainingJobName\"] = job_name\n",
"\n",
"for i in hyperparameters:\n",
" hyperparameters[i] = str(hyperparameters[i])\n",
" \n",
"# Here you will configure the hyperparameters used for training your model.\n",
"training_params[\"HyperParameters\"] = hyperparameters\n",
"\n",
"# Training timeout\n",
"training_params[\"StoppingCondition\"] = {\n",
" \"MaxRuntimeInSeconds\": 360000\n",
"}\n",
"\n",
"# The algorithm currently only supports fullyreplicated model (where data is copied onto each machine)\n",
"training_params[\"InputDataConfig\"] = []\n",
"\n",
"# Please notice that we're using application/x-recordio for both \n",
"# training and validation datasets, given our dataset is formated in RecordIO\n",
"\n",
"# Here we set training dataset\n",
"training_params[\"InputDataConfig\"].append({\n",
" \"ChannelName\": \"train\",\n",
" \"DataSource\": {\n",
" \"S3DataSource\": {\n",
" \"S3DataType\": \"S3Prefix\",\n",
" \"S3Uri\": 's3://{}/{}/input/train'.format(sagemaker_session.default_bucket(), model_prefix),\n",
" \"S3DataDistributionType\": \"FullyReplicated\"\n",
" }\n",
" },\n",
" \"ContentType\": \"text/csv\",\n",
" \"CompressionType\": \"None\"\n",
"})\n",
"training_params[\"InputDataConfig\"].append({\n",
" \"ChannelName\": \"validation\",\n",
" \"DataSource\": {\n",
" \"S3DataSource\": {\n",
" \"S3DataType\": \"S3Prefix\",\n",
" \"S3Uri\": 's3://{}/{}/input/validation'.format(sagemaker_session.default_bucket(), model_prefix),\n",
" \"S3DataDistributionType\": \"FullyReplicated\"\n",
" }\n",
" },\n",
" \"ContentType\": \"text/csv\",\n",
" \"CompressionType\": \"None\"\n",
"})\n",
"training_params[\"Tags\"] = []"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"deployment_params = {\n",
" \"EndpointPrefix\": model_prefix,\n",
" \"DevelopmentEndpoint\": {\n",
" # we want to enable the endpoint monitoring\n",
" \"InferenceMonitoring\": True,\n",
" # we will collect 100% of all the requests/predictions\n",
" \"InferenceMonitoringSampling\": 100,\n",
" \"InferenceMonitoringOutputBucket\": 's3://{}/{}/monitoring/dev'.format(sagemaker_session.default_bucket(), model_prefix),\n",
" # we don't want to enable A/B tests in development\n",
" \"ABTests\": False,\n",
" # we'll use a basic instance for testing purposes\n",
" \"InstanceType\": \"ml.t2.large\",\n",
" \"InitialInstanceCount\": 1,\n",
" # we don't want high availability/escalability for development\n",
" \"AutoScaling\": None\n",
" },\n",
" \"ProductionEndpoint\": {\n",
" # we want to enable the endpoint monitoring\n",
" \"InferenceMonitoring\": True,\n",
" # we will collect 100% of all the requests/predictions\n",
" \"InferenceMonitoringSampling\": 100,\n",
" \"InferenceMonitoringOutputBucket\": 's3://{}/{}/monitoring/prd'.format(sagemaker_session.default_bucket(), model_prefix),\n",
" # we want to do A/B tests in production\n",
" \"ABTests\": True,\n",
" # we'll use a better instance for production. CPU optimized\n",
" \"InstanceType\": \"ml.c5.large\",\n",
" \"InitialInstanceCount\": 2,\n",
" \"InitialVariantWeight\": 0.1,\n",
" # we want elasticity. at minimum 2 instances to support the endpoint and at maximum 10\n",
" # we'll use a threshold of 750 predictions per instance to start adding new instances or remove them\n",
" \"AutoScaling\": {\n",
" \"MinCapacity\": 2,\n",
" \"MaxCapacity\": 10,\n",
" \"TargetValue\": 200.0,\n",
" \"ScaleInCooldown\": 30,\n",
" \"ScaleOutCooldown\": 60,\n",
" \"PredefinedMetricType\": \"SageMakerVariantInvocationsPerInstance\"\n",
" }\n",
" }\n",
"}"
]
},
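{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "> The **AutoScaling** block above is plain configuration; the deployment automation is what turns it into Application Auto Scaling calls. Below is a minimal, hedged sketch of that mapping, under stated assumptions: an endpoint named `<model_prefix>-production` and a variant named `model-a`, both hypothetical here."
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
  "import boto3\n",
  "\n",
  "# Hedged sketch: how the AutoScaling settings could map onto the\n",
  "# Application Auto Scaling API. The endpoint and variant names below\n",
  "# are assumptions for illustration, not values read from the stack.\n",
  "autoscaling = boto3.client('application-autoscaling')\n",
  "cfg = deployment_params['ProductionEndpoint']['AutoScaling']\n",
  "resource_id = 'endpoint/{}-production/variant/model-a'.format(model_prefix)\n",
  "\n",
  "# Register the variant's instance count as a scalable target\n",
  "autoscaling.register_scalable_target(\n",
  "    ServiceNamespace='sagemaker',\n",
  "    ResourceId=resource_id,\n",
  "    ScalableDimension='sagemaker:variant:DesiredInstanceCount',\n",
  "    MinCapacity=cfg['MinCapacity'],\n",
  "    MaxCapacity=cfg['MaxCapacity'])\n",
  "\n",
  "# Attach a target-tracking policy on invocations per instance\n",
  "autoscaling.put_scaling_policy(\n",
  "    PolicyName='invocations-per-instance',\n",
  "    ServiceNamespace='sagemaker',\n",
  "    ResourceId=resource_id,\n",
  "    ScalableDimension='sagemaker:variant:DesiredInstanceCount',\n",
  "    PolicyType='TargetTrackingScaling',\n",
  "    TargetTrackingScalingPolicyConfiguration={\n",
  "        'TargetValue': cfg['TargetValue'],\n",
  "        'PredefinedMetricSpecification': {\n",
  "            'PredefinedMetricType': cfg['PredefinedMetricType']},\n",
  "        'ScaleInCooldown': cfg['ScaleInCooldown'],\n",
  "        'ScaleOutCooldown': cfg['ScaleOutCooldown']})"
 ]
},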
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Preparing and uploading the dataset"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import sagemaker\n",
"from sklearn import datasets\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"sagemaker_session = sagemaker.Session()\n",
"iris = datasets.load_iris()\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(\n",
" iris.data, iris.target, test_size=0.33, random_state=42, stratify=iris.target)\n",
"np.savetxt(\"iris_train.csv\", np.column_stack((y_train, X_train)), delimiter=\",\", fmt='%0.3f')\n",
"np.savetxt(\"iris_test.csv\", np.column_stack((y_test, X_test)), delimiter=\",\", fmt='%0.3f')\n",
"\n",
"# Upload the dataset to an S3 bucket\n",
"input_train = sagemaker_session.upload_data(path='iris_train.csv', key_prefix='%s/input/train' % model_prefix)\n",
"input_test = sagemaker_session.upload_data(path='iris_test.csv', key_prefix='%s/input/validation' % model_prefix)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.3 ¡Está bien! Ahora es el momento de iniciar el proceso de formación."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'ResponseMetadata': {'RequestId': '7CE2QE2096RQ775S',\n",
" 'HostId': 'v3A+9BwdHJsxwjPeGvFKaZfa4RcOKsO52Od6tGbPdePXUBQtbfUpazfyREOTrUZ9h0NgbHJIBHQ=',\n",
" 'HTTPStatusCode': 200,\n",
" 'HTTPHeaders': {'x-amz-id-2': 'v3A+9BwdHJsxwjPeGvFKaZfa4RcOKsO52Od6tGbPdePXUBQtbfUpazfyREOTrUZ9h0NgbHJIBHQ=',\n",
" 'x-amz-request-id': '7CE2QE2096RQ775S',\n",
" 'date': 'Wed, 18 Aug 2021 22:39:17 GMT',\n",
" 'x-amz-version-id': 'SmWOlis22iIOblXkrvQVSoOjMCaS03LD',\n",
" 'etag': '\"c82b94bd42b31419af7496e058510d87\"',\n",
" 'server': 'AmazonS3',\n",
" 'content-length': '0'},\n",
" 'RetryAttempts': 0},\n",
" 'ETag': '\"c82b94bd42b31419af7496e058510d87\"',\n",
" 'VersionId': 'SmWOlis22iIOblXkrvQVSoOjMCaS03LD'}"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import boto3\n",
"import io\n",
"import zipfile\n",
"import json\n",
"\n",
"s3 = boto3.client('s3')\n",
"sts_client = boto3.client(\"sts\")\n",
"\n",
"session = boto3.session.Session()\n",
"\n",
"account_id = sts_client.get_caller_identity()[\"Account\"]\n",
"region = session.region_name\n",
"\n",
"bucket_name = \"mlops-%s-%s\" % (region, account_id)\n",
"key_name = \"training_jobs/%s/trainingjob.zip\" % model_prefix\n",
"\n",
"zip_buffer = io.BytesIO()\n",
"with zipfile.ZipFile(zip_buffer, 'a') as zf:\n",
" zf.writestr('trainingjob.json', json.dumps(training_params))\n",
" zf.writestr('deployment.json', json.dumps(deployment_params))\n",
"zip_buffer.seek(0)\n",
"\n",
"s3.put_object(Bucket=bucket_name, Key=key_name, Body=bytearray(zip_buffer.read()))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Ok, now open the AWS console in another tab and go to the CodePipeline console to see the status of our building pipeline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> Now, click on [THIS NOTEBOOK](02_Check%20Progress%20and%20Test%20the%20endpoint.ipynb) to see the progress and test your endpoint"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# A/B TESTS\n",
"\n",
"If you take a look on the **deployment** parameters you'll see that we enabled the **Production** endpoint for A/B tests. To try this, just deploy the first model into production, then run the section **1.3** again. Feel free to change some hyperparameter values in the section **1.1** before starting a new training session.\n",
"\n",
"When publishing the second model into **Development**, the endpoint will be updated and the model will be replaced without compromising the user experience. This is the natural behavior of an Endpoint in SageMaker when you update it.\n",
"\n",
"After you approve the deployment into **Production**, the endpoint will be updated and a second model will be added to it. Now it's time to execute some **A/B tests**. In the **Progress** Jupyter (link above), execute the last cell (test code) to show which model answered your request. You just need to keep sending some requests to see the **Production** endpoint using both models A and B, respecting the proportion defined by the variable **InitialVariantWeight** in the deployment params.\n",
"\n",
"In a real life scenario you can monitor the performance of both models and then adjust the **Weight** of each model to do the full transition to the new model (and remove the old one) or to rollback the new deployment.\n",
"\n",
"To adjust the weight of each model (Variant Name) in an endpoint, you just need to call the following function: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.update_endpoint_weights_and_capacities"
]
}
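,
{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "A minimal, hedged sketch of that call. The variant names `model-a`/`model-b` are assumptions; check the real ones via `describe_endpoint` before adjusting weights:"
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
  "import boto3\n",
  "\n",
  "# Hedged sketch: shift all traffic to the newer variant.\n",
  "# Endpoint/variant names are assumptions based on this workshop's naming.\n",
  "sm_api = boto3.client('sagemaker')\n",
  "endpoint_name = '{}-production'.format(model_prefix)\n",
  "\n",
  "# Inspect the variants actually deployed behind the endpoint\n",
  "desc = sm_api.describe_endpoint(EndpointName=endpoint_name)\n",
  "print([v['VariantName'] for v in desc['ProductionVariants']])\n",
  "\n",
  "sm_api.update_endpoint_weights_and_capacities(\n",
  "    EndpointName=endpoint_name,\n",
  "    DesiredWeightsAndCapacities=[\n",
  "        {'VariantName': 'model-a', 'DesiredWeight': 0.0},\n",
  "        {'VariantName': 'model-b', 'DesiredWeight': 1.0}\n",
  "    ])"
 ]
}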
],
"metadata": {
"kernelspec": {
"display_name": "conda_python3",
"language": "python",
"name": "conda_python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Now let's monitor the training/deploying process"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"import boto3\n",
"import ipywidgets as widgets\n",
"import time\n",
"\n",
"from IPython.display import display"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Helper functions"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"def get_actions():\n",
" actions = []\n",
" executionId = None\n",
" resp = codepipeline.get_pipeline_state( name=pipeline_name )\n",
" for stage in resp['stageStates']:\n",
" stageName = stage['stageName']\n",
" stageStatus = None\n",
" if stage.get('latestExecution') is not None:\n",
" stageStatus = stage['latestExecution']['status']\n",
" if executionId is None:\n",
" executionId = stage['latestExecution']['pipelineExecutionId']\n",
" elif stage['latestExecution']['pipelineExecutionId'] != executionId:\n",
" stageStatus = 'Old'\n",
" for action in stage['actionStates']:\n",
" actionName = action['actionName']\n",
" actionStatus = 'Old'\n",
" if action.get('latestExecution') is not None and stageStatus != 'Old':\n",
" actionStatus = action['latestExecution']['status']\n",
" actions.append( {'stageName': stageName, \n",
" 'stageStatus': stageStatus, \n",
" 'actionName': actionName, \n",
" 'actionStatus': actionStatus})\n",
" return actions"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"def get_approval_token():\n",
" resp = codepipeline.get_pipeline_state( name=pipeline_name )\n",
" token = None\n",
" # Get the approve train status token\n",
" for stageState in resp['stageStates']:\n",
" if stageState['stageName'] == 'DeployApproval':\n",
" for actionState in stageState['actionStates']:\n",
" if actionState['actionName'] == 'ApproveDeploy':\n",
" if actionState.get('latestExecution') is None:\n",
" return None\n",
" latestExecution = actionState['latestExecution']\n",
" if latestExecution['status'] == 'InProgress':\n",
" token = latestExecution['token']\n",
" return token"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"from sagemaker.serializers import CSVSerializer\n",
"csv_serializer = CSVSerializer()\n",
"def test_endpoint(endpoint_name, payload):\n",
" resp = sm.invoke_endpoint(\n",
" EndpointName=endpoint_name,\n",
" ContentType='text/csv',\n",
" Accept='text/csv',\n",
" Body=csv_serializer.serialize(payload)\n",
" )\n",
" variant_name = resp['ResponseMetadata']['HTTPHeaders']['x-amzn-invoked-production-variant']\n",
" return float(resp['Body'].read().decode('utf-8').strip()), variant_name"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"def approval(token, result):\n",
" if token is None:\n",
" return\n",
" \n",
" codepipeline.put_approval_result(\n",
" pipelineName=pipeline_name,\n",
" stageName='DeployApproval',\n",
" actionName='ApproveDeploy',\n",
" result=result,\n",
" token=token\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"def approve(b):\n",
" result={\n",
" 'summary': 'This is a great model! Put into production.',\n",
" 'status': 'Approved'\n",
" }\n",
" approval(get_approval_token(), result) \n",
" button_box.close()\n",
" start_monitoring()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"def reject(b):\n",
" result={\n",
" 'summary': 'This is a rubbish model. Discard it',\n",
" 'status': 'Rejected'\n",
" }\n",
" approval(get_approval_token(), result)\n",
" button_box.close()\n",
" start_monitoring()"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"def start_monitoring():\n",
" global button_box\n",
" \n",
" running = True\n",
" while running:\n",
" steps_ok = 0\n",
" for k,action in enumerate(get_actions()):\n",
" if action['actionStatus'] == 'Failed':\n",
" bar.bar_style='danger'\n",
" label.value='Ops! Something went wrong Stage[{}] Action[{}]'.format(\n",
" action['stageName'], action['actionName'])\n",
" running = False\n",
" return\n",
"\n",
" elif action['actionStatus'] == 'InProgress':\n",
" if get_approval_token() is not None:\n",
" display(button_box)\n",
" running = False\n",
" break\n",
" elif action['actionStatus'] == 'Old':\n",
" break\n",
" elif action['actionStatus'] == 'Succeeded':\n",
" steps_ok += 1\n",
" \n",
" label.value = \"Actions {}/{} - Current: Stage[{}] Action[{}]\".format( \n",
" k+1,max_actions, action['stageName'], action['actionName'] )\n",
" bar.value = steps_ok\n",
"\n",
" if steps_ok == max_actions:\n",
" running = False\n",
" else: \n",
" time.sleep(2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Job monitoring"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"codepipeline = boto3.client('codepipeline')\n",
"sm = boto3.client('sagemaker-runtime')\n",
"\n",
"model_prefix='iris-model'\n",
"pipeline_name = 'iris-model-pipeline'\n",
"endpoint_name_mask='{}-%s'.format(model_prefix)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "c37c0c22909842db859c6300a6de82a2",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"VBox(children=(Label(value='Loading...'), IntProgress(value=0, bar_style='info', max=6)))"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"approve_btn = widgets.Button(description=\"Approve\", button_style='success', icon='check')\n",
"reject_btn = widgets.Button(description=\"Reject\", button_style='danger', icon='close')\n",
"approve_btn.on_click(approve)\n",
"reject_btn.on_click(reject)\n",
"button_box = widgets.HBox([approve_btn, reject_btn])\n",
" \n",
"max_actions = len(get_actions())\n",
"label = widgets.Label(value=\"Loading...\")\n",
"bar = widgets.IntProgress( value=0, min=0, max=max_actions, step=1, bar_style='info' )\n",
"info_box = widgets.VBox([label, bar])\n",
"\n",
"display(info_box)\n",
"start_monitoring()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Now, if everything went fine, we can test our models"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"DSV\n"
]
},
{
"ename": "ValidationError",
"evalue": "An error occurred (ValidationError) when calling the InvokeEndpoint operation: Endpoint iris-model-development of account 894268508623 not found.",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValidationError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<timed exec>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n",
"\u001b[0;32m<ipython-input-4-6f6e9d205185>\u001b[0m in \u001b[0;36mtest_endpoint\u001b[0;34m(endpoint_name, payload)\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0mContentType\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'text/csv'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[0mAccept\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'text/csv'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 8\u001b[0;31m \u001b[0mBody\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mcsv_serializer\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mserialize\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpayload\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 9\u001b[0m )\n\u001b[1;32m 10\u001b[0m \u001b[0mvariant_name\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mresp\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'ResponseMetadata'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'HTTPHeaders'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'x-amzn-invoked-production-variant'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py\u001b[0m in \u001b[0;36m_api_call\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m 384\u001b[0m \"%s() only accepts keyword arguments.\" % py_operation_name)\n\u001b[1;32m 385\u001b[0m \u001b[0;31m# The \"self\" in this scope is referring to the BaseClient.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 386\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_make_api_call\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moperation_name\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 387\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 388\u001b[0m \u001b[0m_api_call\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__name__\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpy_operation_name\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py\u001b[0m in \u001b[0;36m_make_api_call\u001b[0;34m(self, operation_name, api_params)\u001b[0m\n\u001b[1;32m 703\u001b[0m \u001b[0merror_code\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mparsed_response\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Error\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m{\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Code\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 704\u001b[0m \u001b[0merror_class\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexceptions\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfrom_code\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0merror_code\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 705\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0merror_class\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mparsed_response\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0moperation_name\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 706\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 707\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mparsed_response\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mValidationError\u001b[0m: An error occurred (ValidationError) when calling the InvokeEndpoint operation: Endpoint iris-model-development of account 894268508623 not found."
]
}
],
"source": [
"%%time\n",
"payload = [4.6, 3.1, 1.5, 0.2]\n",
"\n",
"print( \"DSV\")\n",
"print( \"Classifier: %s, Variant Name: %s\" % test_endpoint( endpoint_name_mask % ('development'), payload ) )\n",
"\n",
"print( \"\\nPRD\")\n",
"print( \"Classifier: %s, Variant Name: %s\" % test_endpoint( endpoint_name_mask % ('production'), payload ) )"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"DSV\n",
"Classifier: 1.0, Variant Name: model-a\n",
"CPU times: user 4.7 ms, sys: 0 ns, total: 4.7 ms\n",
"Wall time: 21.6 ms\n"
]
}
],
"source": [
"%%time\n",
"payload = [6.7, 3.1, 4.7, 1.5]\n",
"\n",
"print( \"DSV\")\n",
"print( \"Classifier: %s, Variant Name: %s\" % test_endpoint( endpoint_name_mask % ('development'), payload ) )"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"payload = [6.7, 3.1, 4.7, 1.5]\n",
"\n",
"\n",
"print( \"\\nPRD\")\n",
"print( \"Classifier: %s, Variant Name: %s\" % test_endpoint( endpoint_name_mask % ('production'), payload ) )"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "conda_python3",
"language": "python",
"name": "conda_python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}