Fine-Tuning LLMs with AWS Bedrock: A Complete Guide to Post-Training Customization

🎯 Introduction to LLM Post-Training with AWS Bedrock

📋 What is LLM Fine-Tuning?

Fine-tuning is the process of taking a pre-trained Large Language Model (LLM) and further training it on your specific dataset to improve its performance for your particular use case. While foundation models like Claude, Titan, or Llama are incredibly capable, they’re trained on broad, general data. Fine-tuning allows you to:

  • Improve accuracy for domain-specific tasks (legal, medical, finance)
  • Adapt writing style to match your brand voice
  • Enhance performance on specialized workflows
  • Reduce hallucinations by grounding responses in your data
  • Optimize for specific formats (JSON output, structured responses)

🚀 Why AWS Bedrock for Fine-Tuning?

AWS Bedrock provides a fully managed service for customizing foundation models without needing deep ML expertise or managing infrastructure:

✅ No Infrastructure Management - AWS handles compute, storage, and scaling
✅ Multiple Customization Methods - Fine-tuning, continued pre-training, reinforcement learning
✅ Data Privacy - Your training data never leaves your AWS account or trains other models
✅ Multiple Model Support - Amazon Titan, Meta Llama, Cohere Command, and more
✅ Cost-Effective - Pay only for training time and inference
✅ Enterprise Security - Customer managed keys, VPC endpoints, IAM integration

🎯 Recent Updates (December 2025)

Amazon Bedrock now supports Reinforcement Fine-Tuning, delivering 66% accuracy gains on average over base models. This new capability allows you to:

  • Train models with small sets of prompts instead of large labeled datasets
  • Use rule-based or AI-based judges to define reward functions (a minimal judge sketch follows this list)
  • Optimize for both objective tasks (code generation, math) and subjective tasks (chatbot interactions)
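
To make the reward idea concrete, here is a minimal sketch of a rule-based judge for an objective task. It is illustrative only: Bedrock's actual reinforcement fine-tuning job configuration is not shown here, and the scoring rules are assumptions you would replace with your own.

def rule_based_reward(response: str, expected: str) -> float:
    """Minimal rule-based judge for an objective task (illustrative sketch).

    Exact matches earn full reward; a correct answer buried in extra
    text earns partial credit; anything else earns zero.
    """
    answer = response.strip().lower()
    target = expected.strip().lower()
    if answer == target:
        return 1.0   # exact match
    if target in answer:
        return 0.5   # right answer, noisy formatting
    return 0.0       # wrong

# Example: grading responses against a known label
print(rule_based_reward("Positive", "positive"))                    # 1.0
print(rule_based_reward("The sentiment is positive.", "positive"))  # 0.5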

🏗️ Three Approaches to Model Customization

┌─────────────────────────────────────────────────────────────┐
│                  AWS Bedrock Customization                   │
└─────────────────────────────────────────────────────────────┘
                            │
        ┌───────────────────┼───────────────────┐
        │                   │                   │
        ▼                   ▼                   ▼
┌───────────────────┬───────────────────┬───────────────────┐
│   Fine-Tuning     │     Continued     │  Reinforcement    │
│                   │   Pre-Training    │   Fine-Tuning     │
├───────────────────┼───────────────────┼───────────────────┤
│ Labeled data      │ Unlabeled data    │ Prompt + feedback │
│ Task-specific     │ Domain knowledge  │ Alignment-focused │
│ 100-10K examples  │ Large corpus      │ Small prompt set  │
└───────────────────┴───────────────────┴───────────────────┘

1. Supervised Fine-Tuning

Best for: Task-specific improvements with labeled data

  • Provide prompt-completion pairs
  • Improves accuracy on specific tasks
  • Requires 100-10,000 labeled examples

2. Continued Pre-Training

Best for: Domain adaptation with unlabeled data

  • Train on domain-specific text corpus
  • Model learns domain vocabulary and concepts
  • No labels required, just relevant text

3. Reinforcement Fine-Tuning (NEW)

Best for: Alignment and preference optimization

  • Uses small prompt sets with feedback
  • Rule-based or AI-based reward signals
  • Ideal for instruction following and safety

🚀 Getting Started: Prerequisites and Setup

🔧 Prerequisites

AWS Account Requirements:

  • AWS account with Bedrock access
  • IAM permissions for Bedrock, S3, IAM
  • Service quota for model customization (request if needed)

Development Environment:

  • Python 3.9+
  • AWS CLI configured
  • boto3 SDK
  • AWS CDK (for infrastructure as code)

Knowledge Requirements:

  • Basic understanding of LLMs
  • Familiarity with AWS services
  • Python programming
  • Basic ML concepts

📦 Installation and Setup

# Create project directory
mkdir bedrock-finetuning
cd bedrock-finetuning

# Set up Python environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install required packages
pip install boto3 pandas jsonlines aws-cdk-lib constructs

# Configure AWS credentials
aws configure
# Enter your AWS Access Key ID, Secret Key, Region (us-east-1), and format (json)

# Verify Bedrock access
aws bedrock list-foundation-models --region us-east-1
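
If you prefer to sanity-check access from Python instead of the CLI, the boto3 equivalent is:

import boto3

# Same check as the CLI command above, via boto3
bedrock = boto3.client("bedrock", region_name="us-east-1")
models = bedrock.list_foundation_models()["modelSummaries"]
print(f"Found {len(models)} foundation models, e.g.:")
for model in models[:5]:
    print(" -", model["modelId"])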

🏗️ Project Structure

bedrock-finetuning/
├── data/
│   ├── training/
│   │   ├── training_data.jsonl
│   │   └── validation_data.jsonl
│   └── synthetic/
│       └── generated_samples.jsonl
├── scripts/
│   ├── prepare_data.py
│   ├── start_training.py
│   ├── evaluate_model.py
│   └── inference.py
├── infrastructure/
│   ├── cdk/
│   │   ├── app.py
│   │   ├── bedrock_stack.py
│   │   └── requirements.txt
│   └── config.yaml
├── notebooks/
│   └── data_exploration.ipynb
├── logs/
├── models/
│   └── custom_models/
└── requirements.txt
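
The root requirements.txt simply pins the packages installed earlier; the versions shown are illustrative, so pin whatever you have tested with:

boto3>=1.34
pandas>=2.0
jsonlines>=4.0
aws-cdk-lib>=2.100
constructs>=10.0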

💻 Part 1: Data Preparation for Fine-Tuning

📝 Data Format Requirements

For Fine-Tuning (Prompt-Completion Pairs):

{"prompt": "Classify the sentiment of this review: The product exceeded my expectations!", "completion": "positive"}
{"prompt": "Classify the sentiment of this review: Terrible quality, broke after one day.", "completion": "negative"}
{"prompt": "Classify the sentiment of this review: It's okay, nothing special.", "completion": "neutral"}

For Continued Pre-Training (Raw Text):

{"text": "Machine learning is a subset of artificial intelligence that focuses on enabling systems to learn from data..."}
{"text": "Neural networks consist of interconnected layers of nodes, where each connection has an associated weight..."}

For Reinforcement Fine-Tuning (Prompts with Multiple Responses):

{
  "prompt": "Write a Python function to calculate fibonacci numbers",
  "responses": [
    {"text": "def fib(n):\n    if n <= 1: return n\n    return fib(n-1) + fib(n-2)", "score": 0.6},
    {"text": "def fib(n):\n    a, b = 0, 1\n    for _ in range(n):\n        a, b = b, a + b\n    return a", "score": 1.0}
  ]
}

🛠️ Data Preparation Script

scripts/prepare_data.py:

#!/usr/bin/env python3
"""
Data preparation script for AWS Bedrock fine-tuning
Validates format, splits data, uploads to S3
"""

import json
import logging
import random
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List

import boto3
import jsonlines

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class BedrockDataPreparator:
    """Prepare and validate training data for Bedrock fine-tuning"""

    def __init__(self, s3_bucket: str, s3_prefix: str = "bedrock-training"):
        self.s3_client = boto3.client('s3')
        self.s3_bucket = s3_bucket
        self.s3_prefix = s3_prefix

    def validate_fine_tuning_data(self, data: List[Dict[str, Any]]) -> bool:
        """
        Validate fine-tuning data format

        Requirements:
        - Each record must have 'prompt' and 'completion'
        - Prompt must be a non-empty string
        - Completion must be a non-empty string
        - Max 10,000 records
        """
        if len(data) > 10000:
            logger.error(f"Dataset has {len(data)} records. Max is 10,000.")
            return False

        for idx, record in enumerate(data):
            # Check required fields
            if 'prompt' not in record or 'completion' not in record:
                logger.error(f"Record {idx} missing 'prompt' or 'completion'")
                return False

            # Check non-empty
            if not record['prompt'] or not record['completion']:
                logger.error(f"Record {idx} has empty prompt or completion")
                return False

            # Check types
            if not isinstance(record['prompt'], str) or not isinstance(record['completion'], str):
                logger.error(f"Record {idx} has non-string prompt or completion")
                return False

            # Check length (recommended)
            if len(record['prompt']) > 2048:
                logger.warning(f"Record {idx} has very long prompt ({len(record['prompt'])} chars)")

            if len(record['completion']) > 2048:
                logger.warning(f"Record {idx} has very long completion ({len(record['completion'])} chars)")

        logger.info(f"✅ Validated {len(data)} training records")
        return True

    def split_data(self, data: List[Dict[str, Any]],
                   train_ratio: float = 0.8) -> tuple:
        """
        Split data into training and validation sets

        Args:
            data: List of training examples
            train_ratio: Proportion for training (default 0.8)

        Returns:
            Tuple of (training_data, validation_data)
        """
        data = data.copy()  # avoid mutating the caller's list
        random.shuffle(data)

        split_idx = int(len(data) * train_ratio)
        train_data = data[:split_idx]
        val_data = data[split_idx:]

        logger.info(f"Split: {len(train_data)} training, {len(val_data)} validation")
        return train_data, val_data

    def save_jsonl(self, data: List[Dict[str, Any]], filepath: str):
        """Save data in JSONL format"""
        Path(filepath).parent.mkdir(parents=True, exist_ok=True)

        with jsonlines.open(filepath, mode='w') as writer:
            for record in data:
                writer.write(record)

        logger.info(f"Saved {len(data)} records to {filepath}")

    def upload_to_s3(self, local_path: str, s3_key: str) -> str:
        """
        Upload training data to S3

        Returns:
            S3 URI (s3://bucket/key)
        """
        try:
            self.s3_client.upload_file(local_path, self.s3_bucket, s3_key)
            s3_uri = f"s3://{self.s3_bucket}/{s3_key}"
            logger.info(f"Uploaded to {s3_uri}")
            return s3_uri
        except Exception as e:
            logger.error(f"Failed to upload to S3: {e}")
            raise

    def prepare_dataset(self,
                        input_file: str,
                        output_dir: str = "data/training",
                        upload: bool = True) -> Dict[str, str]:
        """
        Complete data preparation pipeline

        Args:
            input_file: Path to raw data file (JSON or JSONL)
            output_dir: Directory for processed data
            upload: Whether to upload to S3

        Returns:
            Dictionary with local paths and (if uploaded) S3 URIs
        """
        logger.info("=" * 60)
        logger.info("Starting Data Preparation Pipeline")
        logger.info("=" * 60)

        # Load data
        logger.info(f"Loading data from {input_file}")
        with open(input_file, 'r') as f:
            if input_file.endswith('.jsonl'):
                data = [json.loads(line) for line in f]
            else:
                data = json.load(f)

        # Validate
        if not self.validate_fine_tuning_data(data):
            raise ValueError("Data validation failed")

        # Split
        train_data, val_data = self.split_data(data)

        # Save locally
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        train_file = f"{output_dir}/train_{timestamp}.jsonl"
        val_file = f"{output_dir}/val_{timestamp}.jsonl"

        self.save_jsonl(train_data, train_file)
        self.save_jsonl(val_data, val_file)

        result = {
            "train_local": train_file,
            "val_local": val_file
        }

        # Upload to S3
        if upload:
            train_s3_key = f"{self.s3_prefix}/train_{timestamp}.jsonl"
            val_s3_key = f"{self.s3_prefix}/val_{timestamp}.jsonl"

            result["train_s3_uri"] = self.upload_to_s3(train_file, train_s3_key)
            result["val_s3_uri"] = self.upload_to_s3(val_file, val_s3_key)

        logger.info("=" * 60)
        logger.info("Data Preparation Complete!")
        logger.info("=" * 60)

        return result


def create_sample_dataset(output_file: str, num_samples: int = 100):
    """
    Create a sample sentiment analysis dataset for testing

    Args:
        output_file: Where to save the sample data
        num_samples: Number of samples to generate
    """
    templates = {
        "positive": [
            "This product is amazing! Highly recommend.",
            "Exceeded all my expectations. Five stars!",
            "Best purchase I've made this year.",
            "Absolutely love it. Will buy again.",
            "Outstanding quality and fast shipping."
        ],
        "negative": [
            "Terrible quality. Don't waste your money.",
            "Broke after one day. Very disappointed.",
            "Worst purchase ever. Asking for refund.",
            "Nothing like the description. Avoid!",
            "Poor quality and slow delivery."
        ],
        "neutral": [
            "It's okay, nothing special.",
            "Does what it's supposed to do.",
            "Average product at average price.",
            "No complaints but not impressed.",
            "Pretty standard, meets expectations."
        ]
    }

    samples = []
    sentiments = list(templates.keys())

    for _ in range(num_samples):
        sentiment = random.choice(sentiments)
        review = random.choice(templates[sentiment])

        samples.append({
            "prompt": f"Classify the sentiment of this review: {review}",
            "completion": sentiment
        })

    Path(output_file).parent.mkdir(parents=True, exist_ok=True)

    with jsonlines.open(output_file, mode='w') as writer:
        writer.write_all(samples)  # jsonlines uses write_all, not writeall

    logger.info(f"Created {num_samples} sample records in {output_file}")


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Prepare data for Bedrock fine-tuning")
    parser.add_argument("--input", required=True, help="Input data file")
    parser.add_argument("--bucket", required=True, help="S3 bucket name")
    parser.add_argument("--prefix", default="bedrock-training", help="S3 prefix")
    parser.add_argument("--output-dir", default="data/training", help="Output directory")
    parser.add_argument("--no-upload", action="store_true", help="Skip S3 upload")

    args = parser.parse_args()

    # Prepare data
    preparator = BedrockDataPreparator(
        s3_bucket=args.bucket,
        s3_prefix=args.prefix
    )

    result = preparator.prepare_dataset(
        input_file=args.input,
        output_dir=args.output_dir,
        upload=not args.no_upload
    )

    print("\n✅ Data preparation complete!")
    # S3 URIs are only present when the data was uploaded
    print(f"Training data: {result.get('train_s3_uri', result['train_local'])}")
    print(f"Validation data: {result.get('val_s3_uri', result['val_local'])}")

💻 Part 2: Fine-Tuning Job Execution

🚀 Starting a Fine-Tuning Job

scripts/start_training.py:

#!/usr/bin/env python3
"""
Start AWS Bedrock fine-tuning job
"""

import logging
import time
from typing import Any, Dict

import boto3

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class BedrockFineTuner:
    """Manage Bedrock model fine-tuning jobs"""

    def __init__(self, region: str = "us-east-1"):
        self.bedrock = boto3.client('bedrock', region_name=region)
        self.region = region

    def create_fine_tuning_job(self,
                               job_name: str,
                               base_model_id: str,
                               training_data_s3_uri: str,
                               validation_data_s3_uri: str,
                               output_s3_uri: str,
                               role_arn: str,
                               hyperparameters: Dict[str, str] = None) -> str:
        """
        Create a fine-tuning job

        Args:
            job_name: Unique name for the job
            base_model_id: Foundation model to fine-tune (e.g., 'amazon.titan-text-express-v1')
            training_data_s3_uri: S3 URI for training data
            validation_data_s3_uri: S3 URI for validation data
            output_s3_uri: S3 URI for output model
            role_arn: IAM role ARN with permissions
            hyperparameters: Training hyperparameters

        Returns:
            Job ARN
        """
        logger.info("=" * 70)
        logger.info("Creating Fine-Tuning Job")
        logger.info("=" * 70)

        # Default hyperparameters for Titan models
        if hyperparameters is None:
            hyperparameters = {
                "epochCount": "3",
                "batchSize": "1",
                "learningRate": "0.00001",
                "learningRateWarmupSteps": "0"
            }

        try:
            response = self.bedrock.create_model_customization_job(
                jobName=job_name,
                customModelName=f"{job_name}-model",
                roleArn=role_arn,
                baseModelIdentifier=base_model_id,
                customizationType="FINE_TUNING",
                trainingDataConfig={
                    "s3Uri": training_data_s3_uri
                },
                validationDataConfig={
                    "validators": [{
                        "s3Uri": validation_data_s3_uri
                    }]
                },
                outputDataConfig={
                    "s3Uri": output_s3_uri
                },
                hyperParameters=hyperparameters
            )

            job_arn = response['jobArn']

            logger.info("✅ Fine-tuning job created successfully!")
            logger.info(f"Job ARN: {job_arn}")
            logger.info(f"Job Name: {job_name}")
            logger.info(f"Base Model: {base_model_id}")
            logger.info("=" * 70)

            return job_arn

        except Exception as e:
            logger.error(f"Failed to create fine-tuning job: {e}")
            raise

    def get_job_status(self, job_arn: str) -> Dict[str, Any]:
        """Get status of a fine-tuning job"""
        try:
            response = self.bedrock.get_model_customization_job(
                jobIdentifier=job_arn
            )
            return response
        except Exception as e:
            logger.error(f"Failed to get job status: {e}")
            raise

    def wait_for_completion(self, job_arn: str,
                            check_interval: int = 60,
                            timeout: int = 7200) -> Dict[str, Any]:
        """
        Wait for fine-tuning job to complete

        Args:
            job_arn: Job ARN to monitor
            check_interval: Seconds between status checks
            timeout: Maximum seconds to wait

        Returns:
            Final job status
        """
        logger.info(f"Monitoring job: {job_arn}")
        logger.info(f"Check interval: {check_interval}s, Timeout: {timeout}s")

        start_time = time.time()
        last_status = None

        while True:
            elapsed = time.time() - start_time

            if elapsed > timeout:
                logger.error(f"⏰ Timeout reached after {timeout}s")
                break

            status = self.get_job_status(job_arn)
            current_status = status['status']

            if current_status != last_status:
                logger.info(f"Status: {current_status}")
                last_status = current_status

            # Terminal states
            if current_status == 'Completed':
                logger.info("✅ Fine-tuning job completed successfully!")
                return status
            elif current_status in ['Failed', 'Stopped']:
                logger.error(f"❌ Job ended with status: {current_status}")
                if 'failureMessage' in status:
                    logger.error(f"Failure message: {status['failureMessage']}")
                return status

            # Wait before next check
            time.sleep(check_interval)

        return self.get_job_status(job_arn)

    def list_custom_models(self) -> list:
        """List all custom models"""
        try:
            response = self.bedrock.list_custom_models()
            models = response.get('modelSummaries', [])

            logger.info(f"Found {len(models)} custom models:")
            for model in models:
                logger.info(f"  - {model['modelName']} ({model['modelArn']})")

            return models
        except Exception as e:
            logger.error(f"Failed to list custom models: {e}")
            raise

    def create_provisioned_throughput(self,
                                      model_arn: str,
                                      throughput_name: str,
                                      model_units: int = 1) -> str:
        """
        Create provisioned throughput for a custom model

        Args:
            model_arn: ARN of custom model
            throughput_name: Name for provisioned throughput
            model_units: Number of model units (1-10)

        Returns:
            Provisioned throughput ARN
        """
        try:
            response = self.bedrock.create_provisioned_model_throughput(
                modelUnits=model_units,
                provisionedModelName=throughput_name,
                modelId=model_arn
            )

            throughput_arn = response['provisionedModelArn']
            logger.info(f"✅ Created provisioned throughput: {throughput_arn}")

            return throughput_arn
        except Exception as e:
            logger.error(f"Failed to create provisioned throughput: {e}")
            raise


def main():
    """Example usage"""
    import argparse

    parser = argparse.ArgumentParser(description="Start Bedrock fine-tuning job")
    parser.add_argument("--job-name", required=True, help="Job name")
    parser.add_argument("--base-model", required=True,
                        help="Base model ID (e.g., amazon.titan-text-express-v1)")
    parser.add_argument("--train-data", required=True, help="S3 URI for training data")
    parser.add_argument("--val-data", required=True, help="S3 URI for validation data")
    parser.add_argument("--output-s3", required=True, help="S3 URI for output")
    parser.add_argument("--role-arn", required=True, help="IAM role ARN")
    parser.add_argument("--epochs", type=int, default=3, help="Number of epochs")
    parser.add_argument("--batch-size", type=int, default=1, help="Batch size")
    parser.add_argument("--learning-rate", type=float, default=0.00001, help="Learning rate")
    parser.add_argument("--region", default="us-east-1", help="AWS region")
    parser.add_argument("--wait", action="store_true", help="Wait for completion")

    args = parser.parse_args()

    # Create fine-tuner
    fine_tuner = BedrockFineTuner(region=args.region)

    # Set hyperparameters
    hyperparameters = {
        "epochCount": str(args.epochs),
        "batchSize": str(args.batch_size),
        "learningRate": str(args.learning_rate),
        "learningRateWarmupSteps": "0"
    }

    # Create job
    job_arn = fine_tuner.create_fine_tuning_job(
        job_name=args.job_name,
        base_model_id=args.base_model,
        training_data_s3_uri=args.train_data,
        validation_data_s3_uri=args.val_data,
        output_s3_uri=args.output_s3,
        role_arn=args.role_arn,
        hyperparameters=hyperparameters
    )

    print(f"\n✅ Job created: {job_arn}")

    # Wait for completion if requested
    if args.wait:
        print("\n⏳ Waiting for job to complete...")
        final_status = fine_tuner.wait_for_completion(job_arn)
        print(f"\nFinal status: {final_status['status']}")


if __name__ == "__main__":
    main()
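
The --role-arn argument must reference an IAM role that the Bedrock service can assume (the CDK stack in Part 3 creates one for you). If you create it by hand, the trust policy follows the pattern AWS documents for model customization; the account ID and region below are placeholders:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "bedrock.amazonaws.com" },
    "Action": "sts:AssumeRole",
    "Condition": {
      "StringEquals": { "aws:SourceAccount": "ACCOUNT_ID" },
      "ArnEquals": {
        "aws:SourceArn": "arn:aws:bedrock:us-east-1:ACCOUNT_ID:model-customization-job/*"
      }
    }
  }]
}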

📊 Monitoring Training Progress

scripts/monitor_training.py:

#!/usr/bin/env python3
"""
Monitor Bedrock fine-tuning job progress
"""

import logging
import time
from datetime import datetime

import boto3

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def monitor_job(job_arn: str, region: str = "us-east-1"):
    """Monitor and display training progress"""

    bedrock = boto3.client('bedrock', region_name=region)

    print("\n" + "=" * 80)
    print("Monitoring Fine-Tuning Job")
    print("=" * 80)
    print(f"Job ARN: {job_arn}")
    print(f"Started at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print("=" * 80 + "\n")

    start_time = time.time()

    while True:
        try:
            response = bedrock.get_model_customization_job(
                jobIdentifier=job_arn
            )

            status = response['status']
            elapsed = time.time() - start_time
            elapsed_str = time.strftime('%H:%M:%S', time.gmtime(elapsed))

            # Display status
            print(f"[{datetime.now().strftime('%H:%M:%S')}] Status: {status} | Elapsed: {elapsed_str}")

            # Show metrics if available
            if 'trainingMetrics' in response:
                metrics = response['trainingMetrics']
                print(f"  Training Loss: {metrics.get('trainingLoss', 'N/A')}")

            # validationMetrics is a list in the API response,
            # one entry per validator
            if response.get('validationMetrics'):
                val_loss = response['validationMetrics'][0].get('validationLoss', 'N/A')
                print(f"  Validation Loss: {val_loss}")

            # Terminal states
            if status in ['Completed', 'Failed', 'Stopped']:
                print("\n" + "=" * 80)
                print(f"Job finished with status: {status}")

                if status == 'Completed':
                    print(f"Custom Model ARN: {response.get('outputModelArn', 'N/A')}")
                elif 'failureMessage' in response:
                    print(f"Failure reason: {response['failureMessage']}")

                print("=" * 80)
                break

            time.sleep(60)  # Check every minute

        except KeyboardInterrupt:
            print("\n\nMonitoring interrupted by user")
            break
        except Exception as e:
            logger.error(f"Error monitoring job: {e}")
            time.sleep(60)


if __name__ == "__main__":
    import sys

    if len(sys.argv) < 2:
        print("Usage: python monitor_training.py <job_arn> [region]")
        sys.exit(1)

    job_arn = sys.argv[1]
    region = sys.argv[2] if len(sys.argv) > 2 else "us-east-1"

    monitor_job(job_arn, region)
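
Once a job reaches Completed, Bedrock writes its output artifacts, including the training and validation metrics files, under the output S3 URI supplied at job creation. A small helper to locate them, assuming the boto3 response shape used above:

import boto3
from urllib.parse import urlparse

def list_job_artifacts(job_arn: str, region: str = "us-east-1"):
    """List the S3 objects a completed customization job wrote."""
    bedrock = boto3.client("bedrock", region_name=region)
    job = bedrock.get_model_customization_job(jobIdentifier=job_arn)

    # outputDataConfig.s3Uri is the URI passed when the job was created
    parsed = urlparse(job["outputDataConfig"]["s3Uri"])
    bucket, prefix = parsed.netloc, parsed.path.lstrip("/")

    s3 = boto3.client("s3", region_name=region)
    for obj in s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get("Contents", []):
        print(obj["Key"])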

💻 Part 3: Infrastructure as Code with AWS CDK

🏗️ CDK Stack for Bedrock Fine-Tuning

infrastructure/cdk/bedrock_stack.py:

  1"""
  2AWS CDK Stack for Bedrock Fine-Tuning Infrastructure
  3"""
  4
  5from aws_cdk import (
  6    Stack,
  7    aws_s3 as s3,
  8    aws_iam as iam,
  9    aws_logs as logs,
 10    RemovalPolicy,
 11    Duration,
 12)
 13from constructs import Construct
 14
 15
 16class BedrockFineTuningStack(Stack):
 17    """CDK Stack for Bedrock fine-tuning infrastructure"""
 18
 19    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
 20        super().__init__(scope, construct_id, **kwargs)
 21
 22        # S3 bucket for training data and outputs
 23        self.training_bucket = s3.Bucket(
 24            self, "BedrockTrainingBucket",
 25            bucket_name=f"bedrock-training-{self.account}-{self.region}",
 26            encryption=s3.BucketEncryption.S3_MANAGED,
 27            block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
 28            versioned=True,
 29            lifecycle_rules=[
 30                s3.LifecycleRule(
 31                    id="DeleteOldTrainingData",
 32                    expiration=Duration.days(90),
 33                    noncurrent_version_expiration=Duration.days(30)
 34                )
 35            ],
 36            removal_policy=RemovalPolicy.RETAIN
 37        )
 38
 39        # IAM role for Bedrock service
 40        self.bedrock_role = iam.Role(
 41            self, "BedrockServiceRole",
 42            assumed_by=iam.ServicePrincipal("bedrock.amazonaws.com"),
 43            description="Role for Bedrock model customization"
 44        )
 45
 46        # Grant Bedrock access to S3 bucket
 47        self.training_bucket.grant_read_write(self.bedrock_role)
 48
 49        # Policy for Bedrock model customization
 50        self.bedrock_role.add_to_policy(
 51            iam.PolicyStatement(
 52                effect=iam.Effect.ALLOW,
 53                actions=[
 54                    "bedrock:CreateModelCustomizationJob",
 55                    "bedrock:GetModelCustomizationJob",
 56                    "bedrock:ListModelCustomizationJobs",
 57                    "bedrock:StopModelCustomizationJob",
 58                    "bedrock:CreateProvisionedModelThroughput",
 59                    "bedrock:GetProvisionedModelThroughput",
 60                    "bedrock:DeleteProvisionedModelThroughput",
 61                    "bedrock:ListCustomModels",
 62                    "bedrock:GetCustomModel",
 63                    "bedrock:DeleteCustomModel"
 64                ],
 65                resources=["*"]
 66            )
 67        )
 68
 69        # CloudWatch Logs for training job logs
 70        self.log_group = logs.LogGroup(
 71            self, "BedrockTrainingLogs",
 72            log_group_name="/aws/bedrock/training",
 73            retention=logs.RetentionDays.ONE_MONTH,
 74            removal_policy=RemovalPolicy.DESTROY
 75        )
 76
 77        # IAM role for Lambda functions (if using Lambda for orchestration)
 78        self.lambda_role = iam.Role(
 79            self, "BedrockLambdaRole",
 80            assumed_by=iam.ServicePrincipal("lambda.amazonaws.com"),
 81            managed_policies=[
 82                iam.ManagedPolicy.from_aws_managed_policy_name(
 83                    "service-role/AWSLambdaBasicExecutionRole"
 84                )
 85            ]
 86        )
 87
 88        # Grant Lambda permissions to interact with Bedrock
 89        self.lambda_role.add_to_policy(
 90            iam.PolicyStatement(
 91                effect=iam.Effect.ALLOW,
 92                actions=[
 93                    "bedrock:InvokeModel",
 94                    "bedrock:InvokeModelWithResponseStream",
 95                    "bedrock:CreateModelCustomizationJob",
 96                    "bedrock:GetModelCustomizationJob",
 97                    "bedrock:ListModelCustomizationJobs"
 98                ],
 99                resources=["*"]
100            )
101        )
102
103        # Grant Lambda access to S3
104        self.training_bucket.grant_read_write(self.lambda_role)
105
106        # Output important values
107        from aws_cdk import CfnOutput
108
109        CfnOutput(
110            self, "TrainingBucketName",
111            value=self.training_bucket.bucket_name,
112            description="S3 bucket for training data"
113        )
114
115        CfnOutput(
116            self, "BedrockRoleArn",
117            value=self.bedrock_role.role_arn,
118            description="IAM role ARN for Bedrock service"
119        )
120
121        CfnOutput(
122            self, "LambdaRoleArn",
123            value=self.lambda_role.role_arn,
124            description="IAM role ARN for Lambda functions"
125        )

infrastructure/cdk/app.py:

#!/usr/bin/env python3
"""
CDK App for Bedrock Fine-Tuning Infrastructure
"""

import aws_cdk as cdk
from bedrock_stack import BedrockFineTuningStack

app = cdk.App()

BedrockFineTuningStack(
    app, "BedrockFineTuningStack",
    env=cdk.Environment(
        account=app.node.try_get_context("account"),
        region=app.node.try_get_context("region") or "us-east-1"
    ),
    description="Infrastructure for AWS Bedrock LLM fine-tuning"
)

app.synth()
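
infrastructure/cdk/requirements.txt only needs the two packages the stack imports (versions again illustrative):

aws-cdk-lib>=2.100
constructs>=10.0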

Deploy the infrastructure:

# Navigate to CDK directory
cd infrastructure/cdk

# Install CDK dependencies
pip install -r requirements.txt

# Bootstrap CDK (first time only)
cdk bootstrap

# Deploy stack
cdk deploy

# Note the outputs (bucket name, role ARNs)
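
You can also read the stack outputs programmatically instead of copying them from the terminal; a short boto3 sketch:

import boto3

# Fetch the CfnOutputs (bucket name, role ARNs) from the deployed stack
cloudformation = boto3.client("cloudformation", region_name="us-east-1")
stack = cloudformation.describe_stacks(StackName="BedrockFineTuningStack")["Stacks"][0]
for output in stack["Outputs"]:
    print(f"{output['OutputKey']}: {output['OutputValue']}")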

💻 Part 4: Model Inference and Evaluation

🔮 Using Your Fine-Tuned Model

scripts/inference.py:

#!/usr/bin/env python3
"""
Inference with fine-tuned Bedrock model
"""

import json
import logging
from typing import Any, Dict

import boto3

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class BedrockInference:
    """Perform inference with fine-tuned models"""

    def __init__(self, region: str = "us-east-1"):
        self.bedrock_runtime = boto3.client(
            'bedrock-runtime',
            region_name=region
        )

    def invoke_model(self,
                     model_arn: str,
                     prompt: str,
                     max_tokens: int = 512,
                     temperature: float = 0.7) -> Dict[str, Any]:
        """
        Invoke fine-tuned model

        Args:
            model_arn: ARN of fine-tuned model or provisioned throughput
            prompt: Input prompt
            max_tokens: Maximum tokens to generate
            temperature: Sampling temperature

        Returns:
            Model response
        """
        # Build request body (this is the Titan text format;
        # other model families expect different request schemas)
        request_body = {
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": max_tokens,
                "temperature": temperature,
                "topP": 0.9
            }
        }

        try:
            response = self.bedrock_runtime.invoke_model(
                modelId=model_arn,
                body=json.dumps(request_body),
                contentType="application/json",
                accept="application/json"
            )

            # Parse response
            response_body = json.loads(response['body'].read())

            return response_body

        except Exception as e:
            logger.error(f"Inference failed: {e}")
            raise

    def batch_inference(self,
                        model_arn: str,
                        prompts: list,
                        max_tokens: int = 512) -> list:
        """Run inference on multiple prompts"""
        results = []

        for idx, prompt in enumerate(prompts):
            logger.info(f"Processing prompt {idx + 1}/{len(prompts)}")

            try:
                result = self.invoke_model(
                    model_arn=model_arn,
                    prompt=prompt,
                    max_tokens=max_tokens
                )
                results.append({
                    "prompt": prompt,
                    "response": result,
                    "status": "success"
                })
            except Exception as e:
                results.append({
                    "prompt": prompt,
                    "error": str(e),
                    "status": "failed"
                })

        return results


def compare_models(base_model_id: str,
                   custom_model_arn: str,
                   test_prompts: list,
                   region: str = "us-east-1"):
    """
    Compare base model vs fine-tuned model

    Args:
        base_model_id: Base foundation model ID
        custom_model_arn: Fine-tuned model ARN
        test_prompts: List of test prompts
        region: AWS region
    """
    inference = BedrockInference(region=region)

    print("\n" + "=" * 80)
    print("Model Comparison: Base vs Fine-Tuned")
    print("=" * 80 + "\n")

    for idx, prompt in enumerate(test_prompts, 1):
        print(f"Test {idx}: {prompt}")
        print("-" * 80)

        # Base model
        print("Base Model Response:")
        try:
            base_response = inference.invoke_model(base_model_id, prompt)
            print(json.dumps(base_response, indent=2))
        except Exception as e:
            print(f"Error: {e}")

        print()

        # Fine-tuned model
        print("Fine-Tuned Model Response:")
        try:
            custom_response = inference.invoke_model(custom_model_arn, prompt)
            print(json.dumps(custom_response, indent=2))
        except Exception as e:
            print(f"Error: {e}")

        print("\n" + "=" * 80 + "\n")


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Run inference with fine-tuned model")
    parser.add_argument("--model-arn", required=True, help="Fine-tuned model ARN")
    parser.add_argument("--prompt", help="Single prompt for inference")
    parser.add_argument("--prompts-file", help="File with multiple prompts")
    parser.add_argument("--compare", help="Base model ID for comparison")
    parser.add_argument("--region", default="us-east-1", help="AWS region")

    args = parser.parse_args()

    inference = BedrockInference(region=args.region)

    if args.prompt:
        # Single prompt
        result = inference.invoke_model(args.model_arn, args.prompt)
        print(json.dumps(result, indent=2))

    elif args.prompts_file:
        # Multiple prompts from file
        with open(args.prompts_file, 'r') as f:
            prompts = [line.strip() for line in f if line.strip()]

        if args.compare:
            compare_models(args.compare, args.model_arn, prompts, args.region)
        else:
            results = inference.batch_inference(args.model_arn, prompts)
            print(json.dumps(results, indent=2))

🎯 Complete End-to-End Example

Step-by-Step Fine-Tuning Workflow

# Step 1: Deploy infrastructure
cd infrastructure/cdk
cdk deploy
# Note the outputs: TrainingBucketName, BedrockRoleArn
cd ../..

# Step 2: Create sample data (or use your own) -- run from the project root
python -c "from scripts.prepare_data import create_sample_dataset; create_sample_dataset('data/raw/samples.jsonl', 500)"

# Step 3: Prepare and upload data
python scripts/prepare_data.py \
  --input data/raw/samples.jsonl \
  --bucket bedrock-training-ACCOUNT-REGION \
  --prefix training-data

# Step 4: Start fine-tuning job
# Use the exact timestamped S3 URIs printed by prepare_data.py
# (the Bedrock API does not expand wildcards) and the BedrockRoleArn
# output from the CDK deploy
python scripts/start_training.py \
  --job-name sentiment-classifier-v1 \
  --base-model amazon.titan-text-express-v1 \
  --train-data s3://bedrock-training-ACCOUNT-REGION/training-data/train_TIMESTAMP.jsonl \
  --val-data s3://bedrock-training-ACCOUNT-REGION/training-data/val_TIMESTAMP.jsonl \
  --output-s3 s3://bedrock-training-ACCOUNT-REGION/models/ \
  --role-arn arn:aws:iam::ACCOUNT:role/BedrockServiceRole \
  --epochs 3 \
  --wait

# Step 5: Monitor training (in another terminal)
python scripts/monitor_training.py arn:aws:bedrock:REGION:ACCOUNT:model-customization-job/JOB_ID

# Step 6: Test fine-tuned model
python scripts/inference.py \
  --model-arn arn:aws:bedrock:REGION:ACCOUNT:provisioned-model/MODEL_ID \
  --prompt "Classify the sentiment of this review: Best product ever!"

# Step 7: Compare with base model
python scripts/inference.py \
  --model-arn arn:aws:bedrock:REGION:ACCOUNT:provisioned-model/MODEL_ID \
  --prompts-file test_prompts.txt \
  --compare amazon.titan-text-express-v1
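
test_prompts.txt is a plain-text file that inference.py reads one prompt per line, for example:

Classify the sentiment of this review: Best product ever!
Classify the sentiment of this review: Broke after one day.
Classify the sentiment of this review: It's fine, I guess.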

🎯 Best Practices and Tips

📊 Data Quality Best Practices

  1. Data Volume

    • Minimum: 100 examples for simple tasks
    • Recommended: 500-1,000 examples for most use cases
    • Maximum: 10,000 examples per job
  2. Data Diversity

    • Cover all edge cases and variations
    • Balance class distributions
    • Include negative examples
  3. Prompt Engineering

    • Keep prompts consistent in format (see the template sketch below)
    • Use clear, specific instructions
    • Test prompt templates before fine-tuning
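
One simple way to keep formats consistent is to route every prompt through a single template function shared by data preparation and inference; a minimal sketch for the sentiment task used in this guide:

SENTIMENT_TEMPLATE = "Classify the sentiment of this review: {review}"

def build_prompt(review: str) -> str:
    """Render every sentiment prompt through one template so the
    fine-tuning data and the inference-time prompts never drift apart."""
    return SENTIMENT_TEMPLATE.format(review=review.strip())

# Used both when generating training records...
record = {"prompt": build_prompt("Best purchase I've made this year."),
          "completion": "positive"}
# ...and when calling the fine-tuned model at inference time.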

🔧 Hyperparameter Tuning

# Conservative (safe, slower learning)
conservative_params = {
    "epochCount": "5",
    "batchSize": "1",
    "learningRate": "0.000005"
}

# Moderate (recommended starting point)
moderate_params = {
    "epochCount": "3",
    "batchSize": "1",
    "learningRate": "0.00001"
}

# Aggressive (faster, risk of overfitting)
aggressive_params = {
    "epochCount": "2",
    "batchSize": "2",
    "learningRate": "0.00005"
}
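
These profiles map directly to the --epochs, --batch-size, and --learning-rate flags of scripts/start_training.py, so you can switch between them without editing code.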

💰 Cost Optimization

  • Use validation sets to prevent overfitting (saves re-training costs)
  • Start with smaller datasets to validate your approach (the sketch below estimates training volume up front)
  • Use on-demand inference for testing, provisioned throughput for production
  • Delete unused custom models to avoid storage costs
  • Monitor training time - stop if loss plateaus early
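
Since customization is billed primarily by tokens processed (roughly, tokens in the training file multiplied by epochs, plus monthly storage for the custom model), it is worth a rough estimate before launching. The whitespace split below is a crude stand-in for the model's real tokenizer, so treat the result as an order-of-magnitude figure:

import json

def estimate_training_tokens(jsonl_path: str, epochs: int = 3) -> int:
    """Very rough token estimate: whitespace-separated words ~ tokens."""
    total = 0
    with open(jsonl_path) as f:
        for line in f:
            record = json.loads(line)
            text = record.get("prompt", "") + " " + record.get("completion", "")
            total += len(text.split())
    return total * epochs

# e.g. estimate_training_tokens("data/training/train_TIMESTAMP.jsonl")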

🔒 Security Best Practices

# Encrypt the resulting custom model with a customer-managed KMS key.
# These kwargs go straight into create_model_customization_job().
encryption_kwargs = {
    "customModelKmsKeyId": "arn:aws:kms:region:account:key/key-id"
}

# Run the customization job inside your VPC for private connectivity
vpc_config = {
    "subnetIds": ["subnet-xxx", "subnet-yyy"],
    "securityGroupIds": ["sg-xxx"]
}

# bedrock.create_model_customization_job(..., vpcConfig=vpc_config, **encryption_kwargs)

# Enable CloudTrail logging for audit
# Enable S3 bucket versioning for data recovery
# Use IAM policies with the least-privilege principle

🎉 Conclusion

You’ve now learned how to fine-tune LLMs with AWS Bedrock, covering:

✅ Three customization approaches - Fine-tuning, continued pre-training, reinforcement fine-tuning
✅ Complete data preparation pipeline - Validation, formatting, upload to S3
✅ Production-ready Python scripts - Training job management, monitoring, inference
✅ Infrastructure as Code - CDK stack for reproducible deployments
✅ Best practices - Data quality, hyperparameters, cost optimization, security

Key Takeaways

  1. Start Simple - Begin with a small dataset and basic fine-tuning
  2. Validate Thoroughly - Use held-out validation data to prevent overfitting
  3. Monitor Costs - Training is billed by tokens processed and provisioned throughput by the hour; monitor your jobs
  4. Iterate - Fine-tuning is iterative; expect to refine data and hyperparameters
  5. Secure Your Data - Use encryption, VPCs, and IAM best practices

Next Steps

Enhance your fine-tuning:

  • Implement automated evaluation metrics
  • Build CI/CD pipelines for model updates
  • Create A/B testing for model versions
  • Set up monitoring and alerting for inference quality
  • Explore reinforcement fine-tuning for alignment tasks

Advanced topics:

  • Multi-task fine-tuning
  • Few-shot learning optimization
  • Model compression and distillation
  • Cross-model ensemble approaches

🏷️ Tags & Categories

Tags: AWS Bedrock, LLM, fine-tuning, machine-learning, AI, Claude, Titan, AWS CDK, Python, boto3, reinforcement-learning
Categories: AI, AWS, Machine Learning, MLOps
Difficulty: Intermediate to Advanced
Time to Complete: 6-8 hours
