🎯 Introduction to LLM Post-Training with AWS Bedrock
📋 What is LLM Fine-Tuning?
Fine-tuning is the process of taking a pre-trained Large Language Model (LLM) and further training it on your specific dataset to improve its performance for your particular use case. While foundation models like Claude, Titan, or Llama are incredibly capable, they’re trained on broad, general data. Fine-tuning allows you to:
- Improve accuracy for domain-specific tasks (legal, medical, finance)
- Adapt writing style to match your brand voice
- Enhance performance on specialized workflows
- Reduce hallucinations by grounding responses in your data
- Optimize for specific formats (JSON output, structured responses)
🚀 Why AWS Bedrock for Fine-Tuning?
AWS Bedrock provides a fully managed service for customizing foundation models without needing deep ML expertise or managing infrastructure:
✅ No Infrastructure Management - AWS handles compute, storage, and scaling
✅ Multiple Customization Methods - Fine-tuning, continued pre-training, reinforcement fine-tuning
✅ Data Privacy - Your training data never leaves your AWS account or trains other models
✅ Multiple Model Support - Amazon Titan, Meta Llama, Cohere Command, and more
✅ Cost-Effective - Pay only for training and inference
✅ Enterprise Security - Customer managed keys, VPC endpoints, IAM integration
🎯 Recent Updates (December 2025)
Amazon Bedrock now supports Reinforcement Fine-Tuning, delivering 66% accuracy gains on average over base models. This new capability allows you to:
- Train models with small sets of prompts instead of large labeled datasets
- Use rule-based or AI-based judges to define reward functions (a minimal sketch follows this list)
- Optimize for both objective tasks (code generation, math) and subjective tasks (chatbot interactions)
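To make the judge idea concrete, here is a minimal sketch of a rule-based reward function for a code-generation task. The function name, signature, and scoring rules are illustrative assumptions for this post, not a Bedrock-defined grader interface:

# Hypothetical rule-based reward for a code-generation task; the
# signature and weights are illustrative, not a Bedrock interface.
import ast


def reward(prompt: str, response: str) -> float:
    """Score a candidate response in [0, 1] with simple, checkable rules."""
    # Rule 1: the response must at least parse as valid Python
    try:
        tree = ast.parse(response)
    except SyntaxError:
        return 0.0
    score = 0.5

    # Rule 2: it should define a function, as the prompt asked
    functions = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    if functions:
        score += 0.3

    # Rule 3: a docstring as a cheap proxy for readability
    if functions and ast.get_docstring(functions[0]):
        score += 0.2

    return score

An AI-based judge replaces these rules with a call to a grading model; either way, the training loop optimizes against a scalar score.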
🏗️ Three Approaches to Model Customization
┌─────────────────────────────────────────────────────────────┐
│                  AWS Bedrock Customization                  │
└─────────────────────────────────────────────────────────────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
        ▼                  ▼                  ▼
┌───────────────┐  ┌──────────────┐  ┌──────────────────┐
│  Fine-Tuning  │  │  Continued   │  │  Reinforcement   │
│               │  │ Pre-Training │  │   Fine-Tuning    │
└───────────────┘  └──────────────┘  └──────────────────┘
  Labeled data      Unlabeled data    Prompt + feedback
  Task-specific     Domain knowledge  Alignment-focused
  100-10K examples  Large corpus      Small prompt set
1. Supervised Fine-Tuning
Best for: Task-specific improvements with labeled data
- Provide prompt-completion pairs
- Improves accuracy on specific tasks
- Requires 100-10,000 labeled examples
2. Continued Pre-Training
Best for: Domain adaptation with unlabeled data
- Train on domain-specific text corpus
- Model learns domain vocabulary and concepts
- No labels required, just relevant text
3. Reinforcement Fine-Tuning (NEW)
Best for: Alignment and preference optimization
- Uses small prompt sets with feedback
- Rule-based or AI-based reward signals
- Ideal for instruction following and safety
🚀 Getting Started: Prerequisites and Setup
🔧 Prerequisites
AWS Account Requirements:
- AWS account with Bedrock access
- IAM permissions for Bedrock, S3, IAM
- Service quota for model customization (request if needed)
Development Environment:
- Python 3.9+
- AWS CLI configured
- boto3 SDK
- AWS CDK (for infrastructure as code)
Knowledge Requirements:
- Basic understanding of LLMs
- Familiarity with AWS services
- Python programming
- Basic ML concepts
📦 Installation and Setup
# Create project directory
mkdir bedrock-finetuning
cd bedrock-finetuning

# Set up Python environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install required packages
pip install boto3 pandas jsonlines aws-cdk-lib constructs

# Configure AWS credentials
aws configure
# Enter your AWS Access Key ID, Secret Access Key, region (us-east-1), and output format (json)

# Verify Bedrock access
aws bedrock list-foundation-models --region us-east-1
🏗️ Project Structure
bedrock-finetuning/
├── data/
│ ├── training/
│ │ ├── training_data.jsonl
│ │ └── validation_data.jsonl
│ └── synthetic/
│ └── generated_samples.jsonl
├── scripts/
│ ├── prepare_data.py
│ ├── start_training.py
│ ├── evaluate_model.py
│ └── inference.py
├── infrastructure/
│ ├── cdk/
│ │ ├── app.py
│ │ ├── bedrock_stack.py
│ │ └── requirements.txt
│ └── config.yaml
├── notebooks/
│ └── data_exploration.ipynb
├── logs/
├── models/
│ └── custom_models/
└── requirements.txt
💻 Part 1: Data Preparation for Fine-Tuning
📝 Data Format Requirements
For Fine-Tuning (Prompt-Completion Pairs):
{"prompt": "Classify the sentiment of this review: The product exceeded my expectations!", "completion": "positive"}
{"prompt": "Classify the sentiment of this review: Terrible quality, broke after one day.", "completion": "negative"}
{"prompt": "Classify the sentiment of this review: It's okay, nothing special.", "completion": "neutral"}
For Continued Pre-Training (Raw Text):
{"text": "Machine learning is a subset of artificial intelligence that focuses on enabling systems to learn from data..."}
{"text": "Neural networks consist of interconnected layers of nodes, where each connection has an associated weight..."}
For Reinforcement Fine-Tuning (Prompts with Multiple Responses):
{
"prompt": "Write a Python function to calculate fibonacci numbers",
"responses": [
{"text": "def fib(n):\n if n <= 1: return n\n return fib(n-1) + fib(n-2)", "score": 0.6},
{"text": "def fib(n):\n a, b = 0, 1\n for _ in range(n):\n a, b = b, a + b\n return a", "score": 1.0}
]
}
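Before assembling a full reward dataset, it is worth sanity-checking each record against this shape. A minimal check, matching the illustrative schema above rather than an authoritative Bedrock contract:

from typing import Any, Dict


def check_rft_record(record: Dict[str, Any]) -> bool:
    """Return True if a record matches the prompt + scored-responses shape."""
    if not isinstance(record.get("prompt"), str) or not record["prompt"]:
        return False
    responses = record.get("responses")
    if not isinstance(responses, list) or not responses:
        return False
    for candidate in responses:
        if not isinstance(candidate, dict):
            return False
        # Each candidate needs text plus a numeric score in [0, 1]
        if not isinstance(candidate.get("text"), str):
            return False
        score = candidate.get("score")
        if not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
            return False
    return True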
🛠️ Data Preparation Script
scripts/prepare_data.py:
#!/usr/bin/env python3
"""
Data preparation script for AWS Bedrock fine-tuning
Validates format, splits data, uploads to S3
"""

import json
import logging
import random
from datetime import datetime
from pathlib import Path
from typing import List, Dict, Any

import boto3
import jsonlines

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class BedrockDataPreparator:
    """Prepare and validate training data for Bedrock fine-tuning"""

    def __init__(self, s3_bucket: str, s3_prefix: str = "bedrock-training"):
        self.s3_client = boto3.client('s3')
        self.s3_bucket = s3_bucket
        self.s3_prefix = s3_prefix

    def validate_fine_tuning_data(self, data: List[Dict[str, Any]]) -> bool:
        """
        Validate fine-tuning data format

        Requirements:
        - Each record must have 'prompt' and 'completion'
        - Prompt must be a non-empty string
        - Completion must be a non-empty string
        - Max 10,000 records
        """
        if len(data) > 10000:
            logger.error(f"Dataset has {len(data)} records. Max is 10,000.")
            return False

        for idx, record in enumerate(data):
            # Check required fields
            if 'prompt' not in record or 'completion' not in record:
                logger.error(f"Record {idx} missing 'prompt' or 'completion'")
                return False

            # Check non-empty
            if not record['prompt'] or not record['completion']:
                logger.error(f"Record {idx} has empty prompt or completion")
                return False

            # Check types
            if not isinstance(record['prompt'], str) or not isinstance(record['completion'], str):
                logger.error(f"Record {idx} has non-string prompt or completion")
                return False

            # Check length (recommended)
            if len(record['prompt']) > 2048:
                logger.warning(f"Record {idx} has very long prompt ({len(record['prompt'])} chars)")

            if len(record['completion']) > 2048:
                logger.warning(f"Record {idx} has very long completion ({len(record['completion'])} chars)")

        logger.info(f"✅ Validated {len(data)} training records")
        return True

    def split_data(self, data: List[Dict[str, Any]],
                   train_ratio: float = 0.8) -> tuple:
        """
        Split data into training and validation sets

        Args:
            data: List of training examples
            train_ratio: Proportion for training (default 0.8)

        Returns:
            Tuple of (training_data, validation_data)
        """
        random.shuffle(data)

        split_idx = int(len(data) * train_ratio)
        train_data = data[:split_idx]
        val_data = data[split_idx:]

        logger.info(f"Split: {len(train_data)} training, {len(val_data)} validation")
        return train_data, val_data

    def save_jsonl(self, data: List[Dict[str, Any]], filepath: str):
        """Save data in JSONL format"""
        Path(filepath).parent.mkdir(parents=True, exist_ok=True)

        with jsonlines.open(filepath, mode='w') as writer:
            for record in data:
                writer.write(record)

        logger.info(f"Saved {len(data)} records to {filepath}")

    def upload_to_s3(self, local_path: str, s3_key: str) -> str:
        """
        Upload training data to S3

        Returns:
            S3 URI (s3://bucket/key)
        """
        try:
            self.s3_client.upload_file(local_path, self.s3_bucket, s3_key)
            s3_uri = f"s3://{self.s3_bucket}/{s3_key}"
            logger.info(f"Uploaded to {s3_uri}")
            return s3_uri
        except Exception as e:
            logger.error(f"Failed to upload to S3: {e}")
            raise

    def prepare_dataset(self,
                        input_file: str,
                        output_dir: str = "data/training",
                        upload: bool = True) -> Dict[str, str]:
        """
        Complete data preparation pipeline

        Args:
            input_file: Path to raw data file (JSON or JSONL)
            output_dir: Directory for processed data
            upload: Whether to upload to S3

        Returns:
            Dictionary with local paths and, when uploaded, S3 URIs for train/val data
        """
        logger.info("=" * 60)
        logger.info("Starting Data Preparation Pipeline")
        logger.info("=" * 60)

        # Load data
        logger.info(f"Loading data from {input_file}")
        with open(input_file, 'r') as f:
            if input_file.endswith('.jsonl'):
                data = [json.loads(line) for line in f]
            else:
                data = json.load(f)

        # Validate
        if not self.validate_fine_tuning_data(data):
            raise ValueError("Data validation failed")

        # Split
        train_data, val_data = self.split_data(data)

        # Save locally
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        train_file = f"{output_dir}/train_{timestamp}.jsonl"
        val_file = f"{output_dir}/val_{timestamp}.jsonl"

        self.save_jsonl(train_data, train_file)
        self.save_jsonl(val_data, val_file)

        result = {
            "train_local": train_file,
            "val_local": val_file
        }

        # Upload to S3
        if upload:
            train_s3_key = f"{self.s3_prefix}/train_{timestamp}.jsonl"
            val_s3_key = f"{self.s3_prefix}/val_{timestamp}.jsonl"

            result["train_s3_uri"] = self.upload_to_s3(train_file, train_s3_key)
            result["val_s3_uri"] = self.upload_to_s3(val_file, val_s3_key)

        logger.info("=" * 60)
        logger.info("Data Preparation Complete!")
        logger.info("=" * 60)

        return result


def create_sample_dataset(output_file: str, num_samples: int = 100):
    """
    Create a sample sentiment analysis dataset for testing

    Args:
        output_file: Where to save the sample data
        num_samples: Number of samples to generate
    """
    templates = {
        "positive": [
            "This product is amazing! Highly recommend.",
            "Exceeded all my expectations. Five stars!",
            "Best purchase I've made this year.",
            "Absolutely love it. Will buy again.",
            "Outstanding quality and fast shipping."
        ],
        "negative": [
            "Terrible quality. Don't waste your money.",
            "Broke after one day. Very disappointed.",
            "Worst purchase ever. Asking for refund.",
            "Nothing like the description. Avoid!",
            "Poor quality and slow delivery."
        ],
        "neutral": [
            "It's okay, nothing special.",
            "Does what it's supposed to do.",
            "Average product at average price.",
            "No complaints but not impressed.",
            "Pretty standard, meets expectations."
        ]
    }

    samples = []
    sentiments = list(templates.keys())

    for _ in range(num_samples):
        sentiment = random.choice(sentiments)
        review = random.choice(templates[sentiment])

        samples.append({
            "prompt": f"Classify the sentiment of this review: {review}",
            "completion": sentiment
        })

    Path(output_file).parent.mkdir(parents=True, exist_ok=True)

    with jsonlines.open(output_file, mode='w') as writer:
        writer.write_all(samples)

    logger.info(f"Created {num_samples} sample records in {output_file}")


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Prepare data for Bedrock fine-tuning")
    parser.add_argument("--input", required=True, help="Input data file")
    parser.add_argument("--bucket", required=True, help="S3 bucket name")
    parser.add_argument("--prefix", default="bedrock-training", help="S3 prefix")
    parser.add_argument("--output-dir", default="data/training", help="Output directory")
    parser.add_argument("--no-upload", action="store_true", help="Skip S3 upload")

    args = parser.parse_args()

    # Prepare data
    preparator = BedrockDataPreparator(
        s3_bucket=args.bucket,
        s3_prefix=args.prefix
    )

    result = preparator.prepare_dataset(
        input_file=args.input,
        output_dir=args.output_dir,
        upload=not args.no_upload
    )

    print("\n✅ Data preparation complete!")
    # Fall back to local paths when --no-upload is set
    print(f"Training data: {result.get('train_s3_uri', result['train_local'])}")
    print(f"Validation data: {result.get('val_s3_uri', result['val_local'])}")
💻 Part 2: Fine-Tuning Job Execution
🚀 Starting a Fine-Tuning Job
scripts/start_training.py:
#!/usr/bin/env python3
"""
Start AWS Bedrock fine-tuning job
"""

import logging
import time
from typing import Dict, Any, Optional

import boto3

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class BedrockFineTuner:
    """Manage Bedrock model fine-tuning jobs"""

    def __init__(self, region: str = "us-east-1"):
        self.bedrock = boto3.client('bedrock', region_name=region)
        self.region = region

    def create_fine_tuning_job(self,
                               job_name: str,
                               base_model_id: str,
                               training_data_s3_uri: str,
                               validation_data_s3_uri: str,
                               output_s3_uri: str,
                               role_arn: str,
                               hyperparameters: Optional[Dict[str, str]] = None) -> str:
        """
        Create a fine-tuning job

        Args:
            job_name: Unique name for the job
            base_model_id: Foundation model to fine-tune (e.g., 'amazon.titan-text-express-v1')
            training_data_s3_uri: S3 URI for training data
            validation_data_s3_uri: S3 URI for validation data
            output_s3_uri: S3 URI for output model
            role_arn: IAM role ARN with permissions
            hyperparameters: Training hyperparameters

        Returns:
            Job ARN
        """
        logger.info("=" * 70)
        logger.info("Creating Fine-Tuning Job")
        logger.info("=" * 70)

        # Default hyperparameters for Titan models
        if hyperparameters is None:
            hyperparameters = {
                "epochCount": "3",
                "batchSize": "1",
                "learningRate": "0.00001",
                "learningRateWarmupSteps": "0"
            }

        try:
            response = self.bedrock.create_model_customization_job(
                jobName=job_name,
                customModelName=f"{job_name}-model",
                roleArn=role_arn,
                baseModelIdentifier=base_model_id,
                customizationType="FINE_TUNING",
                trainingDataConfig={
                    "s3Uri": training_data_s3_uri
                },
                validationDataConfig={
                    "validators": [{
                        "s3Uri": validation_data_s3_uri
                    }]
                },
                outputDataConfig={
                    "s3Uri": output_s3_uri
                },
                hyperParameters=hyperparameters
            )

            job_arn = response['jobArn']

            logger.info("✅ Fine-tuning job created successfully!")
            logger.info(f"Job ARN: {job_arn}")
            logger.info(f"Job Name: {job_name}")
            logger.info(f"Base Model: {base_model_id}")
            logger.info("=" * 70)

            return job_arn

        except Exception as e:
            logger.error(f"Failed to create fine-tuning job: {e}")
            raise

    def get_job_status(self, job_arn: str) -> Dict[str, Any]:
        """Get status of a fine-tuning job"""
        try:
            response = self.bedrock.get_model_customization_job(
                jobIdentifier=job_arn
            )
            return response
        except Exception as e:
            logger.error(f"Failed to get job status: {e}")
            raise

    def wait_for_completion(self, job_arn: str,
                            check_interval: int = 60,
                            timeout: int = 7200) -> Dict[str, Any]:
        """
        Wait for fine-tuning job to complete

        Args:
            job_arn: Job ARN to monitor
            check_interval: Seconds between status checks
            timeout: Maximum seconds to wait

        Returns:
            Final job status
        """
        logger.info(f"Monitoring job: {job_arn}")
        logger.info(f"Check interval: {check_interval}s, Timeout: {timeout}s")

        start_time = time.time()
        last_status = None

        while True:
            elapsed = time.time() - start_time

            if elapsed > timeout:
                logger.error(f"⏰ Timeout reached after {timeout}s")
                break

            status = self.get_job_status(job_arn)
            current_status = status['status']

            if current_status != last_status:
                logger.info(f"Status: {current_status}")
                last_status = current_status

            # Terminal states
            if current_status == 'Completed':
                logger.info("✅ Fine-tuning job completed successfully!")
                return status
            elif current_status in ['Failed', 'Stopped']:
                logger.error(f"❌ Job ended with status: {current_status}")
                if 'failureMessage' in status:
                    logger.error(f"Failure message: {status['failureMessage']}")
                return status

            # Wait before next check
            time.sleep(check_interval)

        return self.get_job_status(job_arn)

    def list_custom_models(self) -> list:
        """List all custom models"""
        try:
            response = self.bedrock.list_custom_models()
            models = response.get('modelSummaries', [])

            logger.info(f"Found {len(models)} custom models:")
            for model in models:
                logger.info(f"  - {model['modelName']} ({model['modelArn']})")

            return models
        except Exception as e:
            logger.error(f"Failed to list custom models: {e}")
            raise

    def create_provisioned_throughput(self,
                                      model_arn: str,
                                      throughput_name: str,
                                      model_units: int = 1) -> str:
        """
        Create provisioned throughput for a custom model

        Args:
            model_arn: ARN of custom model
            throughput_name: Name for provisioned throughput
            model_units: Number of model units

        Returns:
            Provisioned throughput ARN
        """
        try:
            response = self.bedrock.create_provisioned_model_throughput(
                modelUnits=model_units,
                provisionedModelName=throughput_name,
                modelId=model_arn
            )

            throughput_arn = response['provisionedModelArn']
            logger.info(f"✅ Created provisioned throughput: {throughput_arn}")

            return throughput_arn
        except Exception as e:
            logger.error(f"Failed to create provisioned throughput: {e}")
            raise


def main():
    """Example usage"""
    import argparse

    parser = argparse.ArgumentParser(description="Start Bedrock fine-tuning job")
    parser.add_argument("--job-name", required=True, help="Job name")
    parser.add_argument("--base-model", required=True,
                        help="Base model ID (e.g., amazon.titan-text-express-v1)")
    parser.add_argument("--train-data", required=True, help="S3 URI for training data")
    parser.add_argument("--val-data", required=True, help="S3 URI for validation data")
    parser.add_argument("--output-s3", required=True, help="S3 URI for output")
    parser.add_argument("--role-arn", required=True, help="IAM role ARN")
    parser.add_argument("--epochs", type=int, default=3, help="Number of epochs")
    parser.add_argument("--batch-size", type=int, default=1, help="Batch size")
    parser.add_argument("--learning-rate", type=float, default=0.00001, help="Learning rate")
    parser.add_argument("--region", default="us-east-1", help="AWS region")
    parser.add_argument("--wait", action="store_true", help="Wait for completion")

    args = parser.parse_args()

    # Create fine-tuner
    fine_tuner = BedrockFineTuner(region=args.region)

    # Set hyperparameters
    hyperparameters = {
        "epochCount": str(args.epochs),
        "batchSize": str(args.batch_size),
        "learningRate": str(args.learning_rate),
        "learningRateWarmupSteps": "0"
    }

    # Create job
    job_arn = fine_tuner.create_fine_tuning_job(
        job_name=args.job_name,
        base_model_id=args.base_model,
        training_data_s3_uri=args.train_data,
        validation_data_s3_uri=args.val_data,
        output_s3_uri=args.output_s3,
        role_arn=args.role_arn,
        hyperparameters=hyperparameters
    )

    print(f"\n✅ Job created: {job_arn}")

    # Wait for completion if requested
    if args.wait:
        print("\n⏳ Waiting for job to complete...")
        final_status = fine_tuner.wait_for_completion(job_arn)
        print(f"\nFinal status: {final_status['status']}")


if __name__ == "__main__":
    main()
📊 Monitoring Training Progress
scripts/monitor_training.py:
#!/usr/bin/env python3
"""
Monitor Bedrock fine-tuning job progress
"""

import logging
import time
from datetime import datetime

import boto3

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def monitor_job(job_arn: str, region: str = "us-east-1"):
    """Monitor and display training progress"""

    bedrock = boto3.client('bedrock', region_name=region)

    print("\n" + "=" * 80)
    print("Monitoring Fine-Tuning Job")
    print("=" * 80)
    print(f"Job ARN: {job_arn}")
    print(f"Started at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print("=" * 80 + "\n")

    start_time = time.time()

    while True:
        try:
            response = bedrock.get_model_customization_job(
                jobIdentifier=job_arn
            )

            status = response['status']
            elapsed = time.time() - start_time
            elapsed_str = time.strftime('%H:%M:%S', time.gmtime(elapsed))

            # Display status
            print(f"[{datetime.now().strftime('%H:%M:%S')}] Status: {status} | Elapsed: {elapsed_str}")

            # Show metrics if available
            if 'trainingMetrics' in response:
                metrics = response['trainingMetrics']
                print(f"  Training Loss: {metrics.get('trainingLoss', 'N/A')}")

            # validationMetrics comes back as a list, one entry per validator
            for metrics in response.get('validationMetrics', []):
                print(f"  Validation Loss: {metrics.get('validationLoss', 'N/A')}")

            # Terminal states
            if status in ['Completed', 'Failed', 'Stopped']:
                print("\n" + "=" * 80)
                print(f"Job finished with status: {status}")

                if status == 'Completed':
                    print(f"Custom Model ARN: {response.get('outputModelArn', 'N/A')}")
                elif 'failureMessage' in response:
                    print(f"Failure reason: {response['failureMessage']}")

                print("=" * 80)
                break

            time.sleep(60)  # Check every minute

        except KeyboardInterrupt:
            print("\n\nMonitoring interrupted by user")
            break
        except Exception as e:
            logger.error(f"Error monitoring job: {e}")
            time.sleep(60)


if __name__ == "__main__":
    import sys

    if len(sys.argv) < 2:
        print("Usage: python monitor_training.py <job_arn> [region]")
        sys.exit(1)

    job_arn = sys.argv[1]
    region = sys.argv[2] if len(sys.argv) > 2 else "us-east-1"

    monitor_job(job_arn, region)
💻 Part 3: Infrastructure as Code with AWS CDK
🏗️ CDK Stack for Bedrock Fine-Tuning
infrastructure/cdk/bedrock_stack.py:
1"""
2AWS CDK Stack for Bedrock Fine-Tuning Infrastructure
3"""
4
5from aws_cdk import (
6 Stack,
7 aws_s3 as s3,
8 aws_iam as iam,
9 aws_logs as logs,
10 RemovalPolicy,
11 Duration,
12)
13from constructs import Construct
14
15
16class BedrockFineTuningStack(Stack):
17 """CDK Stack for Bedrock fine-tuning infrastructure"""
18
19 def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
20 super().__init__(scope, construct_id, **kwargs)
21
22 # S3 bucket for training data and outputs
23 self.training_bucket = s3.Bucket(
24 self, "BedrockTrainingBucket",
25 bucket_name=f"bedrock-training-{self.account}-{self.region}",
26 encryption=s3.BucketEncryption.S3_MANAGED,
27 block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
28 versioned=True,
29 lifecycle_rules=[
30 s3.LifecycleRule(
31 id="DeleteOldTrainingData",
32 expiration=Duration.days(90),
33 noncurrent_version_expiration=Duration.days(30)
34 )
35 ],
36 removal_policy=RemovalPolicy.RETAIN
37 )
38
39 # IAM role for Bedrock service
40 self.bedrock_role = iam.Role(
41 self, "BedrockServiceRole",
42 assumed_by=iam.ServicePrincipal("bedrock.amazonaws.com"),
43 description="Role for Bedrock model customization"
44 )
45
46 # Grant Bedrock access to S3 bucket
47 self.training_bucket.grant_read_write(self.bedrock_role)
48
49 # Policy for Bedrock model customization
50 self.bedrock_role.add_to_policy(
51 iam.PolicyStatement(
52 effect=iam.Effect.ALLOW,
53 actions=[
54 "bedrock:CreateModelCustomizationJob",
55 "bedrock:GetModelCustomizationJob",
56 "bedrock:ListModelCustomizationJobs",
57 "bedrock:StopModelCustomizationJob",
58 "bedrock:CreateProvisionedModelThroughput",
59 "bedrock:GetProvisionedModelThroughput",
60 "bedrock:DeleteProvisionedModelThroughput",
61 "bedrock:ListCustomModels",
62 "bedrock:GetCustomModel",
63 "bedrock:DeleteCustomModel"
64 ],
65 resources=["*"]
66 )
67 )
68
69 # CloudWatch Logs for training job logs
70 self.log_group = logs.LogGroup(
71 self, "BedrockTrainingLogs",
72 log_group_name="/aws/bedrock/training",
73 retention=logs.RetentionDays.ONE_MONTH,
74 removal_policy=RemovalPolicy.DESTROY
75 )
76
77 # IAM role for Lambda functions (if using Lambda for orchestration)
78 self.lambda_role = iam.Role(
79 self, "BedrockLambdaRole",
80 assumed_by=iam.ServicePrincipal("lambda.amazonaws.com"),
81 managed_policies=[
82 iam.ManagedPolicy.from_aws_managed_policy_name(
83 "service-role/AWSLambdaBasicExecutionRole"
84 )
85 ]
86 )
87
88 # Grant Lambda permissions to interact with Bedrock
89 self.lambda_role.add_to_policy(
90 iam.PolicyStatement(
91 effect=iam.Effect.ALLOW,
92 actions=[
93 "bedrock:InvokeModel",
94 "bedrock:InvokeModelWithResponseStream",
95 "bedrock:CreateModelCustomizationJob",
96 "bedrock:GetModelCustomizationJob",
97 "bedrock:ListModelCustomizationJobs"
98 ],
99 resources=["*"]
100 )
101 )
102
103 # Grant Lambda access to S3
104 self.training_bucket.grant_read_write(self.lambda_role)
105
106 # Output important values
107 from aws_cdk import CfnOutput
108
109 CfnOutput(
110 self, "TrainingBucketName",
111 value=self.training_bucket.bucket_name,
112 description="S3 bucket for training data"
113 )
114
115 CfnOutput(
116 self, "BedrockRoleArn",
117 value=self.bedrock_role.role_arn,
118 description="IAM role ARN for Bedrock service"
119 )
120
121 CfnOutput(
122 self, "LambdaRoleArn",
123 value=self.lambda_role.role_arn,
124 description="IAM role ARN for Lambda functions"
125 )
infrastructure/cdk/app.py:
#!/usr/bin/env python3
"""
CDK App for Bedrock Fine-Tuning Infrastructure
"""

import aws_cdk as cdk
from bedrock_stack import BedrockFineTuningStack

app = cdk.App()

BedrockFineTuningStack(
    app, "BedrockFineTuningStack",
    env=cdk.Environment(
        account=app.node.try_get_context("account"),
        region=app.node.try_get_context("region") or "us-east-1"
    ),
    description="Infrastructure for AWS Bedrock LLM fine-tuning"
)

app.synth()
Deploy the infrastructure:
# Navigate to CDK directory
cd infrastructure/cdk

# Install CDK dependencies
pip install -r requirements.txt

# Bootstrap CDK (first time only)
cdk bootstrap

# Deploy stack
cdk deploy

# Note the outputs (bucket name, role ARNs)
💻 Part 4: Model Inference and Evaluation
🔮 Using Your Fine-Tuned Model
scripts/inference.py:
#!/usr/bin/env python3
"""
Inference with fine-tuned Bedrock model
"""

import json
import logging
from typing import Dict, Any

import boto3

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class BedrockInference:
    """Perform inference with fine-tuned models"""

    def __init__(self, region: str = "us-east-1"):
        self.bedrock_runtime = boto3.client(
            'bedrock-runtime',
            region_name=region
        )

    def invoke_model(self,
                     model_arn: str,
                     prompt: str,
                     max_tokens: int = 512,
                     temperature: float = 0.7) -> Dict[str, Any]:
        """
        Invoke fine-tuned model

        Args:
            model_arn: ARN of fine-tuned model or provisioned throughput
            prompt: Input prompt
            max_tokens: Maximum tokens to generate
            temperature: Sampling temperature

        Returns:
            Model response
        """
        # Build request body (this shape is for Titan text models;
        # the required format depends on the base model)
        request_body = {
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": max_tokens,
                "temperature": temperature,
                "topP": 0.9
            }
        }

        try:
            response = self.bedrock_runtime.invoke_model(
                modelId=model_arn,
                body=json.dumps(request_body),
                contentType="application/json",
                accept="application/json"
            )

            # Parse response
            response_body = json.loads(response['body'].read())

            return response_body

        except Exception as e:
            logger.error(f"Inference failed: {e}")
            raise

    def batch_inference(self,
                        model_arn: str,
                        prompts: list,
                        max_tokens: int = 512) -> list:
        """Run inference on multiple prompts"""
        results = []

        for idx, prompt in enumerate(prompts):
            logger.info(f"Processing prompt {idx + 1}/{len(prompts)}")

            try:
                result = self.invoke_model(
                    model_arn=model_arn,
                    prompt=prompt,
                    max_tokens=max_tokens
                )
                results.append({
                    "prompt": prompt,
                    "response": result,
                    "status": "success"
                })
            except Exception as e:
                results.append({
                    "prompt": prompt,
                    "error": str(e),
                    "status": "failed"
                })

        return results


def compare_models(base_model_id: str,
                   custom_model_arn: str,
                   test_prompts: list,
                   region: str = "us-east-1"):
    """
    Compare base model vs fine-tuned model

    Args:
        base_model_id: Base foundation model ID
        custom_model_arn: Fine-tuned model ARN
        test_prompts: List of test prompts
        region: AWS region
    """
    inference = BedrockInference(region=region)

    print("\n" + "=" * 80)
    print("Model Comparison: Base vs Fine-Tuned")
    print("=" * 80 + "\n")

    for idx, prompt in enumerate(test_prompts, 1):
        print(f"Test {idx}: {prompt}")
        print("-" * 80)

        # Base model
        print("Base Model Response:")
        try:
            base_response = inference.invoke_model(base_model_id, prompt)
            print(json.dumps(base_response, indent=2))
        except Exception as e:
            print(f"Error: {e}")

        print()

        # Fine-tuned model
        print("Fine-Tuned Model Response:")
        try:
            custom_response = inference.invoke_model(custom_model_arn, prompt)
            print(json.dumps(custom_response, indent=2))
        except Exception as e:
            print(f"Error: {e}")

        print("\n" + "=" * 80 + "\n")


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Run inference with fine-tuned model")
    parser.add_argument("--model-arn", required=True, help="Fine-tuned model ARN")
    parser.add_argument("--prompt", help="Single prompt for inference")
    parser.add_argument("--prompts-file", help="File with multiple prompts")
    parser.add_argument("--compare", help="Base model ID for comparison")
    parser.add_argument("--region", default="us-east-1", help="AWS region")

    args = parser.parse_args()

    inference = BedrockInference(region=args.region)

    if args.prompt:
        # Single prompt
        result = inference.invoke_model(args.model_arn, args.prompt)
        print(json.dumps(result, indent=2))

    elif args.prompts_file:
        # Multiple prompts from file
        with open(args.prompts_file, 'r') as f:
            prompts = [line.strip() for line in f if line.strip()]

        if args.compare:
            compare_models(args.compare, args.model_arn, prompts, args.region)
        else:
            results = inference.batch_inference(args.model_arn, prompts)
            print(json.dumps(results, indent=2))

    else:
        parser.error("Provide --prompt or --prompts-file")
🎯 Complete End-to-End Example
Step-by-Step Fine-Tuning Workflow
# Step 1: Deploy infrastructure
cd infrastructure/cdk
cdk deploy
# Note the outputs: TrainingBucketName, BedrockRoleArn

# Step 2: Create sample data (or use your own)
python -c "from scripts.prepare_data import create_sample_dataset; create_sample_dataset('data/raw/samples.jsonl', 500)"

# Step 3: Prepare and upload data
python scripts/prepare_data.py \
    --input data/raw/samples.jsonl \
    --bucket bedrock-training-ACCOUNT-REGION \
    --prefix training-data

# Step 4: Start fine-tuning job
# Use the exact timestamped S3 URIs printed by prepare_data.py
# (S3 does not expand wildcards)
python scripts/start_training.py \
    --job-name sentiment-classifier-v1 \
    --base-model amazon.titan-text-express-v1 \
    --train-data s3://bedrock-training-ACCOUNT-REGION/training-data/train_TIMESTAMP.jsonl \
    --val-data s3://bedrock-training-ACCOUNT-REGION/training-data/val_TIMESTAMP.jsonl \
    --output-s3 s3://bedrock-training-ACCOUNT-REGION/models/ \
    --role-arn arn:aws:iam::ACCOUNT:role/BedrockServiceRole \
    --epochs 3 \
    --wait

# Step 5: Monitor training (in another terminal)
python scripts/monitor_training.py arn:aws:bedrock:REGION:ACCOUNT:model-customization-job/JOB_ID

# Step 6: Test the fine-tuned model
python scripts/inference.py \
    --model-arn arn:aws:bedrock:REGION:ACCOUNT:provisioned-model/MODEL_ID \
    --prompt "Classify the sentiment of this review: Best product ever!"

# Step 7: Compare with the base model
python scripts/inference.py \
    --model-arn arn:aws:bedrock:REGION:ACCOUNT:provisioned-model/MODEL_ID \
    --prompts-file test_prompts.txt \
    --compare amazon.titan-text-express-v1
🎯 Best Practices and Tips
📊 Data Quality Best Practices
Data Volume
- Minimum: 100 examples for simple tasks
- Recommended: 500-1,000 examples for most use cases
- Maximum: 10,000 examples per job
Data Diversity
- Cover all edge cases and variations
- Balance class distributions (a quick check is sketched after this list)
- Include negative examples
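A skewed label distribution is one of the most common causes of a disappointing fine-tune, and it is cheap to catch. A quick check for the prompt-completion format used in this post (the file path is hypothetical):

import json
from collections import Counter

# Count completion labels in a prepared JSONL file (hypothetical path)
with open("data/training/train.jsonl") as f:
    labels = Counter(json.loads(line)["completion"] for line in f)

total = sum(labels.values())
for label, count in labels.most_common():
    print(f"{label:>10}: {count:5d} ({count / total:.1%})")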
Prompt Engineering
- Keep prompts consistent in format (see the template sketch after this list)
- Use clear, specific instructions
- Test prompt templates before fine-tuning
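One simple way to enforce consistency is to route every example, and later every inference call, through a single template function. An illustrative sketch for the sentiment task used in this post:

# One template for training data and inference, so the wording the
# model sees never drifts between the two
PROMPT_TEMPLATE = "Classify the sentiment of this review: {review}"


def build_prompt(review: str) -> str:
    return PROMPT_TEMPLATE.format(review=review.strip())


# Training examples and inference requests use the exact same shape
example = {"prompt": build_prompt("Best purchase ever!"), "completion": "positive"}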
🔧 Hyperparameter Tuning
# Conservative (safe, slower learning)
conservative_params = {
    "epochCount": "5",
    "batchSize": "1",
    "learningRate": "0.000005"
}

# Moderate (recommended starting point)
moderate_params = {
    "epochCount": "3",
    "batchSize": "1",
    "learningRate": "0.00001"
}

# Aggressive (faster, risk of overfitting)
aggressive_params = {
    "epochCount": "2",
    "batchSize": "2",
    "learningRate": "0.00005"
}
💰 Cost Optimization
- Use validation sets to prevent overfitting (saves re-training costs)
- Start with smaller datasets to validate approach
- Use on-demand inference for testing, provisioned throughput for production
- Delete unused custom models to avoid storage costs
- Monitor training time - stop early if the loss plateaus (API calls sketched below)
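Two of these levers map directly onto Bedrock API calls: stopping a job whose loss has plateaued, and deleting provisioned throughput you are no longer using, since it accrues charges while it exists. A sketch, assuming job_arn and throughput_arn come from the earlier steps:

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Stop a customization job early, e.g. when validation loss plateaus
bedrock.stop_model_customization_job(jobIdentifier=job_arn)

# Tear down provisioned throughput once testing is done
bedrock.delete_provisioned_model_throughput(provisionedModelId=throughput_arn)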
🔒 Security Best Practices
# Encrypt the resulting custom model with a customer managed KMS key.
# customModelKmsKeyId and vpcConfig are parameters of
# create_model_customization_job (see start_training.py above).
custom_model_kms_key_id = "arn:aws:kms:region:account:key/key-id"

# Use a VPC configuration for private connectivity
vpc_config = {
    "subnetIds": ["subnet-xxx", "subnet-yyy"],
    "securityGroupIds": ["sg-xxx"]
}

# Enable CloudTrail logging for audit
# Enable S3 bucket versioning for data recovery
# Use IAM policies with the least-privilege principle
🎉 Conclusion
You’ve now learned how to fine-tune LLMs with AWS Bedrock, covering:
✅ Three customization approaches - Fine-tuning, continued pre-training, reinforcement fine-tuning
✅ Complete data preparation pipeline - Validation, formatting, upload to S3
✅ Production-ready Python scripts - Training job management, monitoring, inference
✅ Infrastructure as Code - CDK stack for reproducible deployments
✅ Best practices - Data quality, hyperparameters, cost optimization, security
Key Takeaways
- Start Simple - Begin with a small dataset and basic fine-tuning
- Validate Thoroughly - Use held-out validation data to prevent overfitting
- Monitor Costs - Training is billed on tokens processed across epochs, and custom model storage and provisioned throughput accrue ongoing charges
- Iterate - Fine-tuning is iterative; expect to refine data and hyperparameters
- Secure Your Data - Use encryption, VPCs, and IAM best practices
Next Steps
Enhance your fine-tuning:
- Implement automated evaluation metrics
- Build CI/CD pipelines for model updates
- Create A/B testing for model versions
- Set up monitoring and alerting for inference quality
- Explore reinforcement fine-tuning for alignment tasks
Advanced topics:
- Multi-task fine-tuning
- Few-shot learning optimization
- Model compression and distillation
- Cross-model ensemble approaches
Resources
Sources:
- Customize models in Amazon Bedrock with your own data using fine-tuning and continued pre-training | AWS News Blog
- Amazon Bedrock now supports reinforcement fine-tuning delivering 66% accuracy gains on average over base models - AWS
- Customize your model to improve its performance for your use case - Amazon Bedrock
- Code samples for model customization - Amazon Bedrock
- Prepare your training datasets for fine-tuning and continued pre-training - Amazon Bedrock
- Fine-tune LLMs with synthetic data for context-based Q&A using Amazon Bedrock | Artificial Intelligence
🏷️ Tags & Categories
Tags: AWS Bedrock, LLM, fine-tuning, machine-learning, AI, Claude, Titan, AWS CDK, Python, boto3, reinforcement-learning
Categories: AI, AWS, Machine Learning, MLOps
Difficulty: Intermediate to Advanced
Time to Complete: 6-8 hours
