Tuesday, April 17, 2012

ColdFusion (Railo) to AWS S3 using Amazon's Java API

Issue: Railo's native cffile to S3 gives every file the metadata content-type of 'application'.

Solution: Use the AWS Java API (until Railo implements a fix).

<cfset variables.t1 = "#getTickCount()#" />

<!--- Uploading a file from the server to S3 --->
<!--- The file to be uploaded --->
<cfset variables.filepath = "#expandPath('/')#tmp/sample.txt" />
<cfset variables.filename = ListLast(variables.filepath,"/") />
<cfset variables.s3path = "tmp/sd1/sd2/#variables.filename#" />

<!--- Read the file for sending to S3 --->
<cfset variables.javafileobj = createObject("java", "java.io.File").init(variables.filepath) />

<!--- Create the PutObjectRequest with the bucket, the intended location (path) on S3, and the file --->
<cfset variables.s3put = createObject("java","com.amazonaws.services.s3.model.PutObjectRequest").init("#this.s3.bucket#", "#variables.s3path#", variables.javafileobj) />

<!--- Set the metadata fields for Content Type and Content Disposition --->
<cfset variables.s3meta = createObject("java","com.amazonaws.services.s3.model.ObjectMetadata").init() />
<cfset variables.s3meta.setContentType("#getPageContext().getServletContext().getMimeType(variables.filepath)#") /><!--- Is this the best way? --->
<cfset variables.s3meta.setContentDisposition("inline; filename=#variables.filename#") /><!--- suggest filename for download else name becomes full path with '_'s --->
<cfset variables.s3put.setMetadata(variables.s3meta) />

<!--- Set the ACL (Access Control List) --->
<cfset variables.s3acl = createObject("java","com.amazonaws.services.s3.model.CannedAccessControlList").Private /><!--- Use: .Private or .PublicRead --->
<cfset variables.s3put.setCannedAcl(variables.s3acl) />

<!--- Create the java connector objects for S3 --->
<cfset variables.awscreds = createObject("java","com.amazonaws.auth.BasicAWSCredentials").init(this.s3.accessKeyId,this.s3.awsSecretKey) />
<cfset variables.s3client = createObject("java","com.amazonaws.services.s3.AmazonS3Client").init(variables.awscreds) />

<!--- The actual upload to s3 -- very simple --->
<cfset variables.s3obj = variables.s3client.putObject(variables.s3put) />

<cfset variables.t2 = "#getTickCount()#" />
Processing Time: <cfoutput>#variables.t2-variables.t1#</cfoutput>ms
<br />


<!--- Generating a download link --->
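<!--- If ACL is Private - Get authenticated link and test the download and prompted file name --->
<!--- Note: dateAdd("m", ...) adds months; use "n" for minutes. request.now is assumed to be set elsewhere to the current date/time (e.g. now()) --->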
<cfif (variables.s3acl.toString() IS "private")>
<cfset variables.s3url = variables.s3client.generatePresignedUrl("#this.s3.bucket#", "#variables.s3path#", dateAdd("m",10,request.now)).toString() />
<cfoutput><a href="#variables.s3url#">#variables.s3url#</a></cfoutput>
<cfelse>
<cfset variables.s3url = "https://#this.s3.bucket#.s3.amazonaws.com/#variables.s3path#" />
<cfoutput><a href="#variables.s3url#">#variables.s3url#</a></cfoutput>
</cfif>


Note #1: On Content Disposition - The Java API does not require you to pre-create the directories for the target destination of your file upload. Directories on S3 are a funny thing. The '/' slashes in the target file path are not actual directories to Amazon (and if directories exist they are essentially ignored), so you don't have to create them before uploading your file when using any of Amazon's own APIs. You just upload your file to your bucket as "subdir1/subdir2/subdir3/sample.txt" and to Amazon it's all part of the file name.

The catch: when you go to download your file you are prompted to download "subdir1_subdir2_subdir3_sample.txt". That's no fun. That's not what I want my users to see or download, and that's not what I want my file to be named. It's a new problem that's no better than the original problem of Railo's native S3 interface uploading all files with content type 'application' (which is solved by setting the metadata attribute with 'setContentType()'). Amazon refers to the slashes in your target path as 'prefixes', and from what I had read that is in reference to 'versioning', but I don't understand the methodology. I want to understand their versioning because I'd like to take advantage of it, but not at the expense of a subdirectory schema.

To resolve the file name problem we have to rely on setting the metadata attribute with 'setContentDisposition()'. This tells the browser's download prompt what the downloaded file should be named. Now that is solved too.
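To make the "it's all part of the file name" point concrete, here is a minimal sketch in plain Java (no AWS SDK needed). The class and method names are hypothetical and only illustrate how a flat key splits into a prefix and a file name, roughly what ListLast(path, "/") does in the sample above.

```java
// Sketch only: S3KeySketch and its methods are illustrative names, not AWS SDK classes.
final class S3KeySketch {

    /** Everything up to and including the last '/', or "" when the key has no slash. */
    static String prefix(String key) {
        int slash = key.lastIndexOf('/');
        return slash < 0 ? "" : key.substring(0, slash + 1);
    }

    /** The part after the last '/', i.e. the name a user would expect to download. */
    static String fileName(String key) {
        int slash = key.lastIndexOf('/');
        return slash < 0 ? key : key.substring(slash + 1);
    }

    /** Imitates the download name described above: slashes replaced by underscores. */
    static String flattenedName(String key) {
        return key.replace('/', '_');
    }

    public static void main(String[] args) {
        String key = "subdir1/subdir2/subdir3/sample.txt";
        System.out.println(prefix(key));        // subdir1/subdir2/subdir3/
        System.out.println(fileName(key));      // sample.txt
        System.out.println(flattenedName(key)); // subdir1_subdir2_subdir3_sample.txt
    }
}
```

The value from fileName() is what you would hand to setContentDisposition() so the download prompt shows "sample.txt" rather than the flattened name.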

Note #2: On Content Disposition - I am setting it to "attachment; filename=#variables.filename#" (the sample code above uses 'inline'). Specifying 'attachment' forces a download prompt even for types that can be viewed in the browser, such as images and plain text. You could conditionally use 'inline', which prompts the browser to attempt to open the file in the browser window.
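One possible way to make that inline/attachment choice conditional is sketched below in plain Java. Everything here is an assumption for illustration: the class and method names are hypothetical, the list of "viewable" types is my own guess, and the content type lookup uses the JDK's URLConnection.guessContentTypeFromName rather than the servlet context call used in the sample. The file name is wrapped in quotes so that names with spaces survive in the header.

```java
import java.net.URLConnection;

// Sketch only: DispositionSketch is an illustrative helper, not part of the AWS SDK.
final class DispositionSketch {

    /** Guess a MIME type from the file extension, falling back to a generic binary type. */
    static String guessContentType(String filename) {
        String type = URLConnection.guessContentTypeFromName(filename);
        return type != null ? type : "application/octet-stream";
    }

    /** Assumption: treat images, text, and PDF as things a browser can display. */
    static boolean viewableInBrowser(String contentType) {
        return contentType.startsWith("image/")
                || contentType.startsWith("text/")
                || contentType.equals("application/pdf");
    }

    /** Build the header value, quoting the name and escaping backslashes and quotes. */
    static String contentDisposition(String contentType, String filename) {
        String mode = viewableInBrowser(contentType) ? "inline" : "attachment";
        String safe = filename.replace("\\", "\\\\").replace("\"", "\\\"");
        return mode + "; filename=\"" + safe + "\"";
    }

    public static void main(String[] args) {
        String name = "sample.txt";
        String type = guessContentType(name);
        System.out.println(type);
        System.out.println(contentDisposition(type, name));
        System.out.println(contentDisposition("application/zip", "my file.zip"));
    }
}
```

The resulting strings are what would be passed to setContentType() and setContentDisposition() on the ObjectMetadata object.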

Note #3: Setting the ACL was tricky to figure out, but in the end it was really easy. I'd strongly recommend only using "Private" or "PublicRead". I'm not sure of the full uses of "authenticated-read" and how it differs from authenticating a download link for temporary read access, as shown here. Maybe it provides authenticated read without an expiration date?
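The temporary link gets its lifetime from the java.util.Date passed as the third argument to generatePresignedUrl(). As a small sketch of computing that date explicitly in minutes (the class and method names are hypothetical, and the 10 minute lifetime is just an example value):

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

// Sketch only: ExpirySketch is an illustrative helper, not part of the AWS SDK.
final class ExpirySketch {

    /** Return a Date that lies the given number of minutes after 'from'. */
    static Date expiresIn(Date from, long minutes) {
        return new Date(from.getTime() + TimeUnit.MINUTES.toMillis(minutes));
    }

    public static void main(String[] args) {
        Date now = new Date();
        Date expiry = expiresIn(now, 10); // example lifetime: 10 minutes
        System.out.println("Link would expire at: " + expiry);
    }
}
```

In CFML the equivalent would be dateAdd("n", 10, now()), since "n" is the minutes datepart and "m" is months.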

Some day I would love to dig into Railo's source code and see how they are creating and managing S3 directories and files within a directory without specifically setting the content type metadata. I have seen methods inline for creating empty directories (https://forums.aws.amazon.com/thread.jspa?threadID=48740). I tested creating the directories with Railo's native S3 first and then uploading the file with the Java API (minus the metadata setting), but the download filename still had the full path with '_'s in place of '/'s.

Anyone know?



Update: I've turned this into a cleaned-up CFC. I've never made a public repo on GitHub. Let's try this out.


Please add to it if you get the chance before I do.  Thanks!

2 comments:

Unknown said...

I seem to be having a problem with the S3 wrapper. I have the AWS Java SDK jar file installed, but I keep receiving the following error in Railo:

java.lang.NoClassDefFoundError

org/apache/http/impl/conn/PoolingClientConnectionManager

Unknown said...

How would you change the java object creation:



if the image that you wanted to upload to s3 was already in memory as a result of a cfhttp call: