AWS Data Pipeline

AWS Data Pipeline is a web service that you can use to automate the movement and transformation of data. With AWS Data Pipeline, you can define data-driven workflows, so that tasks can be dependent on the successful completion of previous tasks.

AWS Data Pipeline requires IAM roles to determine what actions your pipelines can perform and who can access your pipeline’s resources.
The AWS Data Pipeline console creates the following roles for you:
DataPipelineDefaultRole: assumed by the Data Pipeline service itself (its default policy is the first one below).
DataPipelineDefaultResourceRole: assumed by the EC2 instances and EMR clusters that the pipeline launches (its policy is the second one below).
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:List*",
        "s3:Put*",
        "s3:Get*",
        "s3:DeleteObject",
        "dynamodb:DescribeTable",
        "dynamodb:Scan",
        "dynamodb:Query",
        "dynamodb:GetItem",
        "dynamodb:BatchGetItem",
        "dynamodb:UpdateTable",
        "ec2:DescribeInstances",
        "ec2:DescribeSecurityGroups",
        "ec2:RunInstances",
        "ec2:CreateTags",
        "ec2:StartInstances",
        "ec2:StopInstances",
        "ec2:TerminateInstances",
        "elasticmapreduce:*",
        "rds:DescribeDBInstances",
        "rds:DescribeDBSecurityGroups",
        "redshift:DescribeClusters",
        "redshift:DescribeClusterSecurityGroups",
        "sns:GetTopicAttributes",
        "sns:ListTopics",
        "sns:Publish",
        "sns:Subscribe",
        "sns:Unsubscribe",
        "iam:PassRole",
        "iam:ListRolePolicies",
        "iam:GetRole",
        "iam:GetRolePolicy",
        "iam:ListInstanceProfiles",
        "cloudwatch:*",
        "datapipeline:DescribeObjects",
        "datapipeline:EvaluateExpression"
      ],
      "Resource": ["*"]
    }
  ]
}
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:List*",
        "s3:Put*",
        "s3:Get*",
        "s3:DeleteObject",
        "dynamodb:DescribeTable",
        "dynamodb:Scan",
        "dynamodb:Query",
        "dynamodb:GetItem",
        "dynamodb:BatchGetItem",
        "dynamodb:UpdateTable",
        "rds:DescribeDBInstances",
        "rds:DescribeDBSecurityGroups",
        "redshift:DescribeClusters",
        "redshift:DescribeClusterSecurityGroups",
        "cloudwatch:PutMetricData",
        "datapipeline:*"
      ],
      "Resource": ["*"]
    }
  ]
}
Error Message: Unable to create resource for @EC2ResourceObj_2017-05-05T04:25:32 due to: No default VPC for this user (Service: AmazonEC2; Status Code: 400; Error Code: VPCIdNotSpecified; Request ID: bd2f3abb-d1c9-4c60-977f-6a83426a947d)
If you look at your VPC settings, you will notice that no default VPC is configured. When Data Pipeline launches an EC2 instance, it cannot determine which VPC to use on its own, so the subnet must be specified explicitly in the resource configuration.
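For example, the Ec2Resource object in the pipeline definition can pin the instance to a specific subnet via the subnetId field. A minimal sketch (the subnet id, instance type, and object id below are placeholders):

```json
{
  "id": "Ec2Instance",
  "type": "Ec2Resource",
  "instanceType": "t1.micro",
  "subnetId": "subnet-0abc1234",
  "terminateAfter": "2 Hours"
}
```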


Use Cases for AWS Data Pipeline
- Set up a sample pipeline in our development environment
- Import a text file from an AWS S3 bucket into an Aurora instance
- Send out notifications through SNS to [email protected]
- Export / import the Data Pipeline definition

Prerequisites
- A MySQL (Aurora) instance
- Access to invoke Data Pipeline with the appropriate permissions
- A target database and target table
- An SNS notification set up with the right configuration

Steps
1. Create a Data Pipeline with a name.
2. Create the MySQL schema and table.
3. Configure your EC2 resource (make sure the EC2 instance has access to the MySQL instance). If the MySQL instance allows only certain IPs and VPCs, configure your EC2 resource in the same VPC or subnet.
4. Configure the data source and the appropriate data format (notice this is a pipe-delimited file, not a CSV file).
5. Configure your SQL insert statement.
6. Configure SNS notifications for PASS / FAIL activity.
7. Run your pipeline and troubleshoot any errors that occur.
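The steps above map onto a small set of pipeline objects. A trimmed, illustrative sketch of such a definition is shown below; the object ids, S3 path, JDBC endpoint, and credentials are all placeholders, and the custom data-format object and parameterized insert query (which goes on the MySQL data node as its insertQuery) are discussed later in this post:

```json
{
  "objects": [
    {
      "id": "Default",
      "scheduleType": "ondemand",
      "failureAndRerunMode": "CASCADE",
      "role": "DataPipelineDefaultRole",
      "resourceRole": "DataPipelineDefaultResourceRole"
    },
    {
      "id": "Ec2Instance",
      "type": "Ec2Resource",
      "terminateAfter": "2 Hours"
    },
    {
      "id": "S3Input",
      "type": "S3DataNode",
      "filePath": "s3://my-bucket/users_01.txt",
      "dataFormat": { "ref": "CustomPipeFormat" }
    },
    {
      "id": "MySqlOutput",
      "type": "MySqlDataNode",
      "connectionString": "jdbc:mysql://my-aurora-endpoint:3306/ETLStage",
      "table": "users_01",
      "username": "etl_user",
      "*password": "example-password"
    },
    {
      "id": "CopyToMySql",
      "type": "CopyActivity",
      "runsOn": { "ref": "Ec2Instance" },
      "input": { "ref": "S3Input" },
      "output": { "ref": "MySqlOutput" }
    }
  ]
}
```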

The full pipeline definition is attached as AWS_Data_PipeLine_S3_MySQL_Defintion.json. The target table and the parameterized insert statement it uses are shown below.
CREATE TABLE users_01 (
  userid integer NOT NULL PRIMARY KEY,
  username char(8),
  firstname varchar(30),
  lastname varchar(30),
  city varchar(30),
  state char(2),
  email varchar(100),
  phone char(14),
  likesports varchar(100),
  liketheatre varchar(100),
  likeconcerts varchar(100),
  likejazz varchar(100),
  likeclassical varchar(100),
  likeopera varchar(100),
  likerock varchar(100),
  likevegas varchar(100),
  likebroadway varchar(100),
  likemusicals varchar(100)
);
INSERT INTO `ETLStage`.`users_01`
  (`userid`, `username`, `firstname`, `lastname`, `city`, `state`,
   `email`, `phone`, `likesports`, `liketheatre`, `likeconcerts`,
   `likejazz`, `likeclassical`, `likeopera`, `likerock`, `likevegas`,
   `likebroadway`, `likemusicals`)
VALUES
  (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?);
errorMessage: Quote character must be defined in record format
https://stackoverflow.com/questions/26111111/data-pipeline-error-on-a-template-from-rds-to-s3-copy
You can use the “TSV” type as your custom data format and provide the column separator for the pipe-delimited file; unlike the CSV format, the TSV format does not require a quote character.
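Assuming a pipe-delimited source file like the one used here, such a custom data-format object might look like this (the id and name are illustrative):

```json
{
  "id": "CustomPipeFormat",
  "name": "CustomPipeFormat",
  "type": "TSV",
  "columnSeparator": "|"
}
```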
errorId : ActivityFailed:SQLException
errorMessage : No value specified for parameter
errorMessage : Parameter index out of range (1 > number of parameters, which is 0).
Both of these usually indicate a mismatch between the number of ? placeholders in the insert statement and the number of columns produced by the configured data format.
errorMessage : Incorrect integer value: ‘FALSE’ for column ‘likesports’ at row 1
Ensure the table column data type is set correctly. By default, MySQL does not convert the strings TRUE / FALSE into a boolean data type.
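As an illustration: a MySQL BOOLEAN column is just TINYINT(1), so the bare keywords TRUE / FALSE (or 0 / 1) insert cleanly, while the quoted strings 'TRUE' / 'FALSE' coming from a text file do not. If you want a numeric flag rather than the varchar columns used in users_01 above, one workaround is to convert the value inside the insert query (this assumes likesports was redeclared as TINYINT(1); the statement is a sketch, not part of the attached definition):

```sql
-- Illustrative only: map a text flag onto a TINYINT(1) column.
INSERT INTO `ETLStage`.`users_01` (`userid`, `username`, `likesports`)
VALUES (?, ?, IF(? = 'TRUE', 1, 0));
```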
errorMessage for Load script: ERROR 1227 (42000) at line 1: Access denied; you need (at least one of) the LOAD FROM S3 privilege(s) for this operation
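This error comes from Aurora MySQL, where loading data from S3 requires a dedicated privilege. Granting it to the database user the pipeline connects as should resolve it (the user name and host below are illustrative):

```sql
GRANT LOAD FROM S3 ON *.* TO 'etl_user'@'%';
```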