Tuesday, April 17, 2012

Coldfusion (Railo) to AWS S3 using Amazon's JAVA API

Issue: Railo's native cffile to S3 gives every file the metadata content-type of 'application'.

Solution: Use the AWS Java API (until Railo implements a fix).

<cfset variables.t1 = "#getTickCount()#" />

<!--- Uploading a file from the server to S3 --->
<!--- The file to be uploaded --->
<cfset variables.filepath = "#expandPath('/')#tmp/sample.txt" />
<cfset variables.filename = ListLast(variables.filepath,"/") />
<cfset variables.s3path = "tmp/sd1/sd2/#variables.filename#" />

<!--- Read the file for sending to S3 --->
<cfset variables.javafileobj = createObject("java", "java.io.File").init(variables.filepath) />

<!--- Create the PutObjectRequest with the intended location (path) on S3 --->
<cfset variables.s3put = createObject("java","com.amazonaws.services.s3.model.PutObjectRequest").init("#this.s3.bucket#", "#variables.s3path#", variables.javafileobj) />

<!--- Set the metadata fields for Content Type and Content Disposition --->
<cfset variables.s3meta = createObject("java","com.amazonaws.services.s3.model.ObjectMetadata").init() />
<cfset variables.s3meta.setContentType("#getPageContext().getServletContext().getMimeType(variables.filepath)#") /><!--- Is this the best way? --->
<cfset variables.s3meta.setContentDisposition("inline; filename=#variables.filename#") /><!--- suggest filename for download else name becomes full path with '_'s --->
<cfset variables.s3put.setMetadata(variables.s3meta) />

<!--- Set the ACL (Access Control List) --->
<cfset variables.s3acl = createObject("java","com.amazonaws.services.s3.model.CannedAccessControlList").Private /><!--- Use: .Private or .PublicRead --->
<cfset variables.s3put.setCannedAcl(variables.s3acl) />

<!--- Create the java connector objects for S3 --->
<cfset variables.awscreds = createObject("java","com.amazonaws.auth.BasicAWSCredentials").init(this.s3.accessKeyId,this.s3.awsSecretKey) />
<cfset variables.s3client = createObject("java","com.amazonaws.services.s3.AmazonS3Client").init(variables.awscreds) />

<!--- The actual upload to s3 -- very simple --->
<cfset variables.s3obj = variables.s3client.putObject(variables.s3put) />

<cfset variables.t2 = "#getTickCount()#" />
Processing Time: <cfoutput>#variables.t2-variables.t1#</cfoutput>ms
<br />

<!--- Generating a download link --->
<!--- If ACL is Private - Get authenticated link and test the download and prompted file name --->
<cfif (variables.s3acl.toString() IS "private")>
    <!--- Note: the "m" datepart in dateAdd() means months; use "n" for minutes --->
    <cfset variables.s3url = variables.s3client.generatePresignedUrl("#this.s3.bucket#", "#variables.s3path#", dateAdd("m",10,request.now)).toString() />
    <cfoutput><a href="#variables.s3url#">#variables.s3url#</a></cfoutput>
<cfelse>
    <!--- If ACL is PublicRead - the plain object URL works without authentication --->
    <cfset variables.s3url = "https://#this.s3.bucket#.s3.amazonaws.com/#variables.s3path#" />
    <cfoutput><a href="#variables.s3url#">#variables.s3url#</a></cfoutput>
</cfif>
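
On the "Is this the best way?" question next to setContentType() above: one possible alternative is to let the JDK guess the type from the file name and fall back to a generic binary type when it can't. This is only a sketch in plain Java, not what the code above does. The class name MimeTypeSketch and the "application/octet-stream" fallback are my own assumptions; URLConnection.guessContentTypeFromName() is part of the standard JDK.

```java
import java.net.URLConnection;

public class MimeTypeSketch {
    // Guess a Content-Type from the file name alone (hypothetical helper).
    // Falls back to a generic binary type when the JDK has no mapping.
    public static String guess(String fileName) {
        String type = URLConnection.guessContentTypeFromName(fileName);
        return (type != null) ? type : "application/octet-stream";
    }

    public static void main(String[] args) {
        System.out.println(guess("sample.txt"));
        System.out.println(guess("archive.unknownext"));
    }
}
```

From CFML the same static call should be reachable with createObject("java","java.net.URLConnection").guessContentTypeFromName(variables.filename), with the result checked for null before passing it to setContentType().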

Note #1: On Content Disposition - The JAVA API does not require you to pre-create the directories for the target destination of your file upload. Directories on S3 are a funny thing. The '/' slashes in the target file path are not actual directories to Amazon (and if directories do exist they are essentially ignored), so you don't have to create them before uploading your file when using any of Amazon's own APIs. You just upload your file to your bucket as "subdir1/subdir2/subdir3/sample.txt" and to Amazon the path is part of the file name. When you go to download your file you will be prompted to download "subdir1_subdir2_subdir3_sample.txt". That's no fun. That's not what I want my users to see or download, and that's not what I want my file to be named.

This creates a new problem that's no better than the original problem of Railo's native S3 interface uploading all files with content type 'application' (which is solved by setting the metadata attribute with 'setContentType()'). Amazon refers to the slashes in your target path as 'prefixes', and from what I had read it's in reference to 'versioning', but I don't understand the methodology. I want to understand their versioning because I'd like to take advantage of it, but not at the expense of a subdirectory schema. To resolve the file name problem we have to rely on setting the metadata attribute with 'setContentDisposition()', which tells the browser's download prompt what the downloaded file should be named. Now that is solved too.
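
To make the "slashes are just part of the name" idea concrete, here is a small plain-Java sketch that treats a bucket as nothing more than a flat list of key strings. It does not call S3 at all. The class KeyPrefixSketch and its two helpers are hypothetical names of mine, and the key list is made up for illustration. A "folder" listing is then just a filter on a key prefix, and the file name is just whatever follows the last slash.

```java
import java.util.List;
import java.util.TreeSet;

public class KeyPrefixSketch {
    // Last segment of a key: the name a user would expect to download.
    public static String fileNameOf(String key) {
        return key.substring(key.lastIndexOf('/') + 1);
    }

    // Emulates a "folder" view over flat keys: entries directly under the
    // prefix are returned as-is, deeper keys collapse to their next segment.
    public static TreeSet<String> listUnder(List<String> keys, String prefix) {
        TreeSet<String> out = new TreeSet<>();
        for (String key : keys) {
            if (!key.startsWith(prefix)) continue;
            String rest = key.substring(prefix.length());
            if (rest.isEmpty()) continue;
            int slash = rest.indexOf('/');
            out.add(slash < 0 ? rest : rest.substring(0, slash + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up keys, in the same shape as variables.s3path above.
        List<String> keys = List.of(
            "tmp/sd1/sd2/sample.txt", "tmp/sd1/other.txt", "tmp/readme.txt");
        System.out.println(listUnder(keys, "tmp/"));
        System.out.println(fileNameOf("tmp/sd1/sd2/sample.txt"));
    }
}
```

Nothing had to be "created" for tmp/ or sd1/ to show up in the listing; they exist only because some key starts with them.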

Note #2: On Content Disposition - I'm setting it to "attachment; filename=#variables.filename#" (the sample code above shows the "inline" variant). Specifying 'attachment' forces a download prompt even for types that can be viewed in the browser, such as images and plain text. You could conditionally use 'inline', which prompts the browser to attempt to open the file in the browser window.
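
Here is one way the "conditionally use inline" idea could look, as a plain-Java sketch. The class ContentDispositionSketch, the build() helper and the list of inline-friendly MIME types are all assumptions of mine, not part of the AWS API. The returned string is what would be handed to setContentDisposition(). The file name is quoted so that names containing spaces survive.

```java
import java.util.Set;

public class ContentDispositionSketch {
    // Hypothetical list of types we are happy to let the browser display.
    private static final Set<String> INLINE_TYPES = Set.of(
        "image/png", "image/jpeg", "image/gif", "text/plain", "application/pdf");

    // Builds a Content-Disposition value: "inline" for viewable types,
    // "attachment" (forced download prompt) for everything else.
    public static String build(String mimeType, String fileName) {
        String mode = (mimeType != null && INLINE_TYPES.contains(mimeType))
            ? "inline" : "attachment";
        // Escape backslashes and double quotes inside the quoted file name.
        String safe = fileName.replace("\\", "\\\\").replace("\"", "\\\"");
        return mode + "; filename=\"" + safe + "\"";
    }

    public static void main(String[] args) {
        System.out.println(build("image/png", "logo.png"));
        System.out.println(build("application/zip", "my archive.zip"));
        System.out.println(build(null, "sample.txt"));
    }
}
```

Defaulting to 'attachment' for unknown or missing types keeps the safer behaviour (a download prompt) whenever the type can't be determined.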

Note #3: On setting the ACL - this was tricky to figure out, but in the end it was really easy. I'd strongly recommend only using "Private" or "PublicRead". I'm not sure of the full uses of "authenticated-read" and how it differs from authenticating a download link for temporary read access as shown here. Maybe it provides authenticated read without an expiration date?

Some day I would love to dig into Railo's source code and see how they are creating and managing S3 directories and files within a directory without specifically setting the content type metadata. I have seen methods online for creating empty directories (https://forums.aws.amazon.com/thread.jspa?threadID=48740), but when I tested creating the directories with Railo's native S3 first and then uploading the file with the JAVA API (minus the metadata setting), the download filename still had the full path with '_'s in place of '/'s.

Anyone know?

Update: I've turned this into a cleaned-up CFC. I've never made a public repo on GitHub. Let's try this out.

Please add to it if you get the chance before I do.  Thanks!

Railo on Resin 4 and More! (Full Tutorial)

I *finally* have Railo running on Resin 4. It's actually been over a month but I really wanted to post this tutorial.

No one seems to be able to get this going because of one major change in the settings files. It's not just that some settings/config files are named differently ('resin.conf' in Resin 3.x is now 'resin.xml' in Resin 4); the biggest kicker is that 'app-default.xml' is no longer used. In fact, if you try to use it the server fails. I couldn't get the server to start under any circumstances with app-default.xml included in resin.xml. That's where all the Railo Servlet settings were added under Resin 3.x. So forget that 'app-default.xml' even exists and start looking at 'cluster-default.xml' for the location of the CFML Servlet configuration.

I believe this is the full list of steps I took to get my installation running. I needed regex matching for virtual hosts, and Apache and Tomcat can't handle it, or I would have stuck with the traditional installers. So I'm using Nginx and Resin 4, both of which do regex virtual hosts. :)

I'm on an AWS Micro instance running 64-bit Ubuntu 11.10 (ami-2af9741a) for development.

I'm skipping the front-end server set-up. You're probably using Apache anyway. I'm using Nginx, as I already stated. I'll explain my Nginx setup after.
This is for *nix machines only. If you're a Windows person, I hope you can find and do the equivalents of these commands.

Here we go: Installing and setting up Resin 4 with Railo Jars. 10 simple steps:
  1. #Following instructions from here: http://www.caucho.com/resin-4.0/admin/starting-resin-install.xtp#InstallingResinusingthedebpackageonUbuntuandDebian
    #Add the Caucho repository by appending the "deb" line below to your sources list:
    sudo vim /etc/apt/sources.list
    deb http://caucho.com/download/debian unstable multiverse

  2. sudo apt-get update    #this will catalog the new source so the next command will work

  3. sudo apt-get install resin
    #Note 1: You will see the following output above the package list:
    Note, selecting 'resin-pro' instead of 'resin'
    ... [package list] ...
    Do you want to continue [Y/n]? Y
    WARNING: The following packages cannot be authenticated!
    Install these packages without verification [y/N]? Y
    #I'm assuming that resin-pro without any license is the same as free resin. **If you know otherwise please let me know.**

    #Note 2: This installs the resin config files in /etc/resin/ and the resin engine in /usr/local/share/resin-pro-4.0.27/
    #Optionally, it's convenient/helpful to make a symbolic link from a more traditional location in /opt.
    sudo updatedb
    sudo locate /resin-pro    #just to make sure of the source directory
    sudo ln -s /usr/local/share/resin-pro-4.0.27 /opt/resin

    #Note 3: this will also install all the required Java-6-jre for you - very nice!
    #If you want to use Java 7, install it manually. Otherwise skip to Step 4.
    sudo apt-get install openjdk-7-jdk
    sudo updatedb
    locate /jre/bin/java
    #for me that output "/usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java" as the home for Java 7
    #Now, Resin only knows about java-6 so we have to tell it to use java-7.
    #Resin uses a symbolic link for its JAVA variable. Make sure it's set by opening:
    sudo vim /opt/resin/bin/resinctl    #Look near the top, it should have the following variables.
    #The first variable there, "JAVA", is using "/usr/bin/java" which is a symbolic link.
    #Follow the trail so we know where to update the system to use Java-7 rather than Java-6
    ls -al /usr/bin/java    #outputs:  /usr/bin/java -> /etc/alternatives/java
    ls -al /etc/alternatives/java    #outputs:  /etc/alternatives/java -> /usr/lib/jvm/java-6-openjdk/jre/bin/java
    #That's the real location, so if we update that link then we're set.
    sudo rm /etc/alternatives/java    #Remove the old link
    sudo ln -s /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java /etc/alternatives/java    #set it to the path we got from "sudo locate /jre/bin/java"
    #DONE! now resin (and anything else that uses that traditional Linux java path) will be using Java-7.
    #You can do "ls -al /etc/alternatives/java" again just to make sure it's pointing to the right place

  4. #Resin may have been started as part of the install process. Make sure we stop it before adding our settings:
    sudo service resin stop

  5. #Now for the resin config and Railo CFML Servlet in cluster-default.xml
    #Remember the old instructions for 'app-default.xml' don't apply to Resin 4...
    sudo vim /etc/resin/cluster-default.xml

    1. #Find the <class-loader> section right near the top of the file and add these two lines:
      <tree-loader path="${resin.home}/lib"/>
      <tree-loader path="${resin.root}/lib"/>

      #Now the whole class-loader section looks like this:
      <class-loader>
        <tree-loader path="${resin.root}/ext-lib"/>
        <tree-loader path="${resin.root}/resin-inf"/>
        <tree-loader path="${resin.home}/lib"/>
        <tree-loader path="${resin.root}/lib"/>
        <tree-loader path="cloud:/resin-inf"/>
      </class-loader>

      #The reason for this is that later our Railo Jars will go into ${resin.root}/lib

    2. #Now go about 50% of the way down the file and look for the <web-app-default> section.
      #Go to the bottom of the <web-app-default> section, just above the closing </web-app-default> tag
      #After the closing </session-config> tag and above the closing </web-app-default> tag - add the following.
      #This is similar to instructions on the Railo site: http://www.getrailo.org/index.cfm/documentation/installation/railo-resin-apache/
      #Because the Resin 4 architecture is slightly different we need a few extra things above the 'CFMLServlet' section

      <compiling-loader path="WEB-INF/classes"/>
      <library-loader path="WEB-INF/lib"/>

      <servlet servlet-name="resin-file" servlet-class="com.caucho.servlets.FileServlet"/>

      <servlet servlet-name="CFMLServlet" servlet-class="railo.loader.servlet.CFMLServlet">
        <description>Railo Web Directory directory</description>
      </servlet>
      <servlet servlet-name="AMFServlet" servlet-class="railo.loader.servlet.AMFServlet"/>

      <servlet-mapping url-pattern="*.cfm" servlet-name="CFMLServlet"/>
      <servlet-mapping url-pattern="*.cfml" servlet-name="CFMLServlet"/>
      <servlet-mapping url-pattern="*.cfc" servlet-name="CFMLServlet"/>


    3. #Done. Save and close cluster-default.xml

  6. #Now we can set up the resin.xml file with our virtual hosts.
    sudo vim /etc/resin/resin.xml

    1. #Find the <cluster id="app"> section near the bottom of the file.
      #I left <cluster id="web"> and <cluster id="memcached"> in there but I'm not sure they are needed.
      #Inside <cluster id="app"> be sure to comment out the entire default <host id="" root-directory="."> ... </host> section and everything inside it.
      #Add your own virtual hosts below it but above the closing </cluster> tag.

      <host id="mysite.com" root-directory="/var/www/mysite">
        <web-app id="/" />
      </host>

      #Done. Save and close resin.xml

  7. #Other resin settings are set in "/etc/resin/resin.properties", such as the port that resin is listening on - default is port 8080.
    #Change this to port 80 if you don't have any other front-end web server.
    #If you do have a front-end web server then keep the port as 8080 and proxy ColdFusion files to this port. (Not described in this tutorial.)

  8. #Now we get the Railo jars and install them - I do all this as root
    sudo -s
    cd /opt/resin
    wget http://www.getrailo.org/down.cfm?item=/railo/remote/download/

    ls -al    #To see the downloaded file
    #Sometimes this saves the zip file name as "down.cfm?item-/railo/......." long string
    #If so, do: mv dow[tab] railo-
    #If you don't have unzip installed, type: apt-get install unzip    then retry unzip railo-
    #this creates a new directory "/opt/resin/railo-" with the jar files inside
    ls -al ./railo-
    mv ./railo-* ./lib/
    ls -al ./railo-    #To see that everything was moved type "ls -al ./railo-" again and "ls -al ./lib"
    rm -R ./railo-    #This removes the not-needed, empty directory from the unzip
    exit    #Don't forget to get out of root

  9. #The web directory...   I had issues with this part.
    #It seems picky on permissions and generating the WEB-INF directory because it's running as the user "www-data".

    #If you create your own directory in "/var/www/mysite" then it has trouble generating its own WEB-INF.
    #You can try making the required directories for "WEB-INF" and "log" and assigning the ownership:
    cd /var/www/mysite
    sudo mkdir -p ./log ./WEB-INF
    sudo chown -R www-data:www-data ./log ./WEB-INF

    #If you *don't* create your site directory in /var/www then it will create one and generate the mysite directory and WEB-INF and log subdirectories.
    #If it does this automatically then you will need to change the owner of the mysite directory to yourself:
    cd /var/www/mysite
    sudo chown [myname]:[mygroup] .    #Don't use -R on this one, or it also changes ownership of WEB-INF and log
    #Don't forget the period, meaning the current directory.
    #If you don't know your name, type "whoami" and set that as both myname and mygroup

    #If you don't want to deal with these permission issues you can set resin to run as root by editing
    sudo vim /etc/resin/resin.properties
    #and change these lines, removing the 'www-data' user and group:
    setuid_user   : www-data
    setuid_group  : www-data
    #to be blank, like this
    setuid_user   :
    setuid_group  :

    #Note: But be careful, if you ever forget to do 'sudo' when starting resin it will start under your user permissions and not run properly!

    #Then you can go create your own /var/www/mysite directory
    cd /var/www
    mkdir mysite
    cd mysite

    #Create your Application.cfc
    touch Application.cfc

    #Create your index.cfm with something in it so we can see if CFML is being processed
    vim index.cfm
    Hello World!
    <br />
    <br />
    The time is: <cfoutput>#NOW()#</cfoutput>

  10. #All done! Yes, really! Fire it all up and see what's broken. I hope I didn't forget any steps!  :)
    #Now if you don't have a real domain pointing to your server, or if this is local or a VirtualBox,
    #then you can set up an /etc/hosts alias on your local machine to direct a fake domain to it for testing.
    #On your local machine, add a line mapping your server's IP address to the fake domain:
    sudo vim /etc/hosts
    [server-ip]    mysite.local
    #Then add a matching host to your resin.xml virtual hosts settings.
    sudo vim /etc/resin/resin.xml

    sudo service resin start
        #cross fingers...

    #Point your browser to http://mysite.local or whatever you set as your domain and see if things are working.

    #Debugging issues - somewhere to start:

    #If you're getting errors like the one below then your resin.xml hosts section is not catching your domain.
    "404 Not Found     / was not found on this server."

    #If your CFML source code is showing on screen then the CFMLServlet in cluster-default.xml isn't working.

Railo on Resin 4 tutorial is complete.
Comments are invited. Let me know how things worked for you.


Now, I also wanted to outline my Nginx configuration in front of Resin 4.

I'm building a SaaS application where every client, upon signup, will select their own subdomain. Now, I'm cheap and I don't want to pay for a second server, so I'm breaking the cardinal rule and developing on my temporary production server. Once things start going I'll keep this as a sole development server and build a larger production server. But for now I need one server with subdomains.mysite.com going to one virtual host (the production site for now) and dev.subdomains.mysite.com going to the dev site. This way I can run separate code bases and do Git deployments at regular intervals so I'm not interrupting the live demo site.

Apache can almost do the front-end web serving the way I need, but not quite, and not as easily as I'd like. I had never used Nginx, but I think I've successfully made the jump. I was really comfortable with Apache. I didn't realize just how similar the two servers' setups are and how easy it was to switch over.

Here is my Nginx "/etc/nginx/sites-enabled/mysite" virtual hosts configuration:

####  DEV
server {
    root /var/www/mysite_dev;
    index index.cfm index.html;
    server_name ~^dev\.[\w-]+\.mysite\.com$;
    location / {
        # First attempt to serve request as file, then
        # as directory, then fall back to index.cfm or index.html
        try_files $uri $uri/ /index.cfm /index.html;
    }
    location ~ \.cf[cm]$ {
        proxy_pass              http://dev.mysite.local:8080;
        proxy_set_header        X-Real-IP $remote_addr;
        proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header        Host $http_host;
    }
}

####  LIVE
server {
    root /var/www/mysite;
    index index.cfm index.html;
    server_name ~^[\w-]+\.mysite\.com$;
    location / {
        # First attempt to serve request as file, then
        # as directory, then fall back to index.cfm or index.html
        try_files $uri $uri/ /index.cfm /index.html;
    }
    location ~ \.cf[cm]$ {
        proxy_pass              http://prod.mysite.local:8080;
        proxy_set_header        X-Real-IP $remote_addr;
        proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header        Host $http_host;
    }
}

Note the proxy_pass line in each vhost pointing to "dev.mysite.local" and "prod.mysite.local" and not to "localhost". I was proud of this little invention of mine to allow resin to catch separate domains being proxied over to it.
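
Going back to the two server_name patterns in the Nginx config above, here is a quick way to sanity-check which host names each one catches before touching the server. This is a plain-Java sketch using java.util.regex, not Nginx itself, and the class ServerNameSketch and its route() helper are hypothetical names of mine. The two patterns are copied from the config, and the dev pattern is tried first, the same order the server blocks appear in.

```java
import java.util.regex.Pattern;

public class ServerNameSketch {
    // Patterns copied from the two server_name lines in the Nginx config.
    private static final Pattern DEV =
        Pattern.compile("^dev\\.[\\w-]+\\.mysite\\.com$");
    private static final Pattern LIVE =
        Pattern.compile("^[\\w-]+\\.mysite\\.com$");

    // Returns which vhost a Host header would land on in this sketch.
    public static String route(String host) {
        if (DEV.matcher(host).find()) return "dev";
        if (LIVE.matcher(host).find()) return "live";
        return "none";
    }

    public static void main(String[] args) {
        String[] hosts = {
            "acme.mysite.com", "dev.acme.mysite.com",
            "dev.mysite.com", "mysite.com", "a.b.mysite.com"
        };
        for (String host : hosts) {
            System.out.println(host + " -> " + route(host));
        }
    }
}
```

Because [\w-]+ cannot span a dot, a dev.client.mysite.com host never falls through to the live pattern. The flip side is that a bare "dev.mysite.com" is treated as a client named "dev" on the live site, and the naked "mysite.com" matches neither pattern.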

I added the following entries to my "/etc/hosts" (both pointing at this same machine) to make this possible:    prod.mysite.local    dev.mysite.local

#I think this can be done on one line following localhost: localhost prod.mysite.local dev.mysite.local

Then I have my resin.xml virtual hosts configured like this:
<!-- Live Site -->
<host id="prod.mysite.local" root-directory="/var/www/mysite">
  <web-app id="/" />
</host>
<!-- Dev Site -->
<host id="dev.mysite.local" root-directory="/var/www/mysite_dev">
  <web-app id="/" />
</host>

This would also work if I dropped nginx and used Resin as the front-end web server as well. I like that as I think Resin 4 can do clustering and HTTPS SSL without subscribing for a license. But I'm not certain on that point. But I also like what I read/hear about Nginx's robustness on the front-end.

That's all I got. That was a lot. Enjoy.