Wednesday, December 5, 2007

Final Thoughts - The Labs

I'd just like to document some of the final thoughts I've had about these labs that we've done this semester. This is more for the benefit of those taking the class in the future, specifically those who are doing the Amazon EC2 labs.

One thing I did that saved me who-knows-how-much time was using a dynamic DNS service for my servers. I had one hostname for each of my three major servers (web server, listing server, submit server), which allowed me to hard-code my calls during testing. Not only did I not have to refresh as much to find my server, but it also saved me when the submit service load balancer started having problems during the past two (or more?) weeks. Scott Chun has a good description of how to set up your server to register with dyndns.com's dynamic hostname service. Read about it here.

One thing I would have done differently would be to store those URLs in a config file (elementary 240 stuff) so I'd only have to change it in one place to make the switch from hard-coded to load-balanced.
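
Here's a rough sketch of what that config file could look like in PHP. Every hostname below is a placeholder, and the 'mode' switch and the serviceUrl() helper are hypothetical names invented for this example, not something from my actual setup.

```php
<?php
// Hypothetical configuration: every hostname and key below is a placeholder.
$config = array(
    'mode' => 'direct', // 'direct' = hard-coded dynamic DNS hosts, 'balanced' = load balancer
    'direct' => array(
        'submit' => 'http://my-submit.example.dyndns.org:8080/submit',
        'list'   => 'http://my-list.example.dyndns.org:8080/list',
    ),
    'balanced' => array(
        'submit' => 'http://submit-lb.example.com:8080/submit',
        'list'   => 'http://list-lb.example.com:8080/list',
    ),
);

// Look up the URL for a service according to the active mode.
function serviceUrl($config, $service) {
    $mode = $config['mode'];
    if (!isset($config[$mode][$service])) {
        throw new Exception("No URL configured for '$service' in mode '$mode'");
    }
    return $config[$mode][$service];
}

echo serviceUrl($config, 'submit'), "\n";
```

Switching from hard-coded to load-balanced then means changing the single 'mode' value rather than hunting through every script.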

Something that was mentioned several times in class, and that I would still love to see a tutorial on, is an alternative to frequent image persisting. It would have helped especially with the minor script changes I didn't realize I needed until after I'd started the persist process; heaven knows I've had more of those than I needed. The solution is to have your server automatically check out the files it needs from a CVS/SVN repository on startup. I'm only an amateur shell scripter, but I assume there are two things the script would need to do: 1) check out the files, and 2) make sure they have the correct permissions. Anyway, this would have been a real time saver.

It would have been fun to get a little experience with Pound in the labs. That's the only thing in the entire process that I don't feel I could go off and do right now. Sam says it's pretty simple to figure out; I guess I'll find out in a couple of weeks when I'm off on my own to try it.

This class has really been an enjoyable experience. Earlier in the semester when I gave up on Python, I thought I'd feel some remorse at the end for not having stuck with it. Well, I don't. The architectural concepts the labs have illustrated really do transcend the languages used, and I'm glad I didn't get so bogged down in the language that I missed the point (that's the reason for scrapping the EJB labs, right?). One thing I do regret is that Sam isn't making us use a template for our demonstration for Jeff Barr. Speaking of that, Nathan, you'd have gotten my vote for best design. Did anyone else have a design for consideration?

Monday, December 3, 2007

Lab 5: Finishing Touches

After nearly a month (of scattered work) I've finally put the finishing touches on my approval client. If anyone wants to get more experience dealing with asynchronous web applications, I'd recommend Flex. You could experience asynchrony with Ajax, but as someone who's used both considerably... I just find I have more time for fun when I'm programming in Flex.

Anyway, the link to the online version of my approval client is here: http://wishlist.dusbabek.net (same link as before).

I don't have any final thoughts to share about this lab, per se. In the future I would like to explore the scalability of Flex applications in a little more depth. Flex apps compile into SWF files, which can get reasonably large depending on the application (several hundred K to a couple megabytes).

A couple thoughts I've had on this:
1. Decrease the file size: don't embed. It's common practice to embed all the necessary resources (including images), some of which may not be required immediately.
2. Decrease the file size: break into smaller SWFs. Flex makes it possible to load other SWFs at run time. Rather than compiling all functionality into a single SWF, an application could be broken into smaller functional pieces that are loaded lazily.
3. Reduce bandwidth on data transfer. Flex has no means of accessing a relational database directly; all data comes from either static XML files or web services (using the broad sense of the term). The amount of bandwidth needed could be reduced by using a lighter data format like JSON for RESTful services, by using a binary format (like AMF), or by serving data from static XML files where appropriate.

These were just the first few, more obvious things to occur to me. I've got 2 medium-to-large-scale Flex applications I'm working on at the moment, and it'll be interesting to see what I can come up with.

Saturday, December 1, 2007

Lab 5 : PHP and SOAP

The only SOAP requests I've ever made were on the .NET platform. They're not that much of a beast on .NET, but they weren't exactly a cakewalk either. So I had been bracing myself for the worst when it came to implementing one in PHP.

I should explain that my lab 5 client connects to a PHP service that in turn makes the SQS requests, etc. I initially wanted to implement an SQS library in Actionscript (and probably will in the future when I'm not pressed by deadlines) but I decided it was too ambitious for the amount of time I wanted to spend on this lab. So alas, a PHP service also handles my SOAP request to WHOIS.

Anyway, I was expecting SOAP on PHP to be a seriously complex affair. Here's my code that makes the request:


$client = new SOAPClient("http://www.webservicex.net/whois.asmx?WSDL");
$params = array('HostName' => $_GET['url']);
$whois = $client->GetWhoIS($params);


Granted, it would have required about 2 more lines if there weren't a URL to the WSDL, but it doesn't get much simpler than that. I should mention that this requires that PHP SOAP be enabled (uncomment a line in your php.ini if you're running Windows; recompile from source using '--enable-soap' if you're running Linux). I didn't have to recompile, thanks once again to Remi Collet (the French guy who has yum rpms for all this stuff, see my previous post).

Well, the SQS library I'm using is pretty old and doesn't have a means of querying the queue for the number of messages. So, I thought I'd try sending a SOAP message to Amazon to get it. Amazon's WSDL is a little more complex, and I probably could have gotten it to work if I had wanted to play around with the messages for another hour or so. As it was, the attempt turned out to be a miserable failure, and I resorted to my old trick, file_get_contents(), which worked perfectly. Here's the code I used, which shows the query string needed to get the number of messages:


$timestamp = gmdate('Y-m-d\TH:i:s\Z');

$qs = "http://queue.amazonaws.com/A3N3IV5XJH079S/processing" .
  "?Action=GetQueueAttributes" .
  "&Attribute=ApproximateNumberOfMessages" .
  "&AWSAccessKeyId=[AMAZON_ACCESS_KEY]" .
  "&Version=2007-05-01" .
  "&Timestamp=" . urlencode($timestamp) .
  "&Signature=" . urlencode(constructSig('GetQueueAttributes' . $timestamp));

$response = file_get_contents($qs);


The constructSig is the same method I listed in a previous post.

Here are a few links that were helpful:
SQS Query and SOAP API
Getting SQS Attributes
SQS WSDL

Lab 5 : Web App to Desktop App using Flex 3

I'm almost finished with lab 5. On the whole it's been pretty fun, aside from some of the frustration from minute details that take an hour apiece to hammer out. I had developed most of my application as a web app before the specs came out. Fortunately I was using Flex, so let me show you how easy it was to convert it from a web app to a desktop app.

As a web app, the main page was enclosed in elements like these:

<mx:Application xmlns:mx="http://www.adobe.com/2006/mxml" layout="absolute" backgroundColor="#2C3552" xmlns:local="*">
.
.
.
</mx:Application>


To deploy it as a desktop app, I had to change it to:


<mx:WindowedApplication xmlns:mx="http://www.adobe.com/2006/mxml" layout="absolute" backgroundColor="#2C3552" xmlns:local="*">
.
.
.
</mx:WindowedApplication>


and then recompile. And that's it. And this may appeal to those of you with high design sensibilities-- it looks the same on the desktop as it does on the Web. Incidentally, on the web it looks identical on every browser/platform combination (any platform that has a Flash player, that is).

Tuesday, November 13, 2007

Lab 4 Schema Clarifications

I came away from our lab 4 design session with a couple wrong impressions regarding the schema, and got them clarified by Sam yesterday. Here's a less ambiguous version of the message format we're supposed to use (would have updated it on the lab page but I'm not on campus at the moment).


<idea guid="">
  <initiated date="" technology="(wstechnology)">Name provided by user</initiated>
  <submitted date="" technology="(submit server tech)">RY name of submit server creator</submitted>
  <spam>true</spam>
  <domain>www.foo.com</domain>
  <body>Foo and gunk are better for this site than xs</body>
</idea>
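
For anyone generating this message from PHP, here is a minimal sketch of building it with the user-supplied text escaped. The function names and array keys are hypothetical (chosen for this example), and the date format is left to whatever the caller passes in, since it isn't pinned down here.

```php
<?php
// Escape a value for use in XML text or a double-quoted attribute.
function x($value) {
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Build the <idea> message. The array keys are hypothetical names chosen for
// this sketch; dates are passed through in whatever format the caller uses.
function buildIdeaXml($f) {
    return '<idea guid="' . x($f['guid']) . '">' . "\n"
        . '  <initiated date="' . x($f['initiatedDate']) . '" technology="' . x($f['wsTechnology']) . '">'
        . x($f['userName']) . '</initiated>' . "\n"
        . '  <submitted date="' . x($f['submittedDate']) . '" technology="' . x($f['submitTechnology']) . '">'
        . x($f['submitCreator']) . '</submitted>' . "\n"
        . '  <spam>' . ($f['spam'] ? 'true' : 'false') . '</spam>' . "\n"
        . '  <domain>' . x($f['domain']) . '</domain>' . "\n"
        . '  <body>' . x($f['body']) . '</body>' . "\n"
        . '</idea>';
}

$message = buildIdeaXml(array(
    'guid' => 'g-1',
    'initiatedDate' => '2007-11-13',
    'wsTechnology' => 'PHP',
    'userName' => 'Name provided by user',
    'submittedDate' => '2007-11-13',
    'submitTechnology' => 'PHP',
    'submitCreator' => 'submit server creator',
    'spam' => false,
    'domain' => 'www.foo.com',
    'body' => 'Foo & gunk are better for this site than xs',
));

echo $message, "\n";
```

Escaping matters because the body and name come straight from users; a stray '&' or '<' would otherwise produce a message the approval client can't parse.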


You should notice that the wsuser that we POSTed from our submit script is getting thrown away. I assumed that since we went through the trouble of POSTing it, we'd definitely use it-- and I ended up putting the user provided name in the submitted element.

Looking back I don't think we got our schema right. I think we're trying to cram too much information into too few elements. Sure, it keeps us from having to add an additional element (<user> for example) to store the information; but the result is that we've lost some information-- wsuser (less important) and made it more confusing (more important).

Saturday, November 10, 2007

SQS: Queue Length / Auth Signature

To get the queue length, as well as the visibility timeout, you make a request using the GetQueueAttributes action. The PHP library I'm using to make my calls to SQS doesn't support this call (must have been written before the 2007-05-01 release of SQS) so my options are to find a new library, or to write my own function to do this.

I decided to try writing my own first, and while researching this I found something I had been looking for while doing lab 4: how to compute the authorization header, or Signature.

The process is as follows. You take the query parameters, sort them by key (ignoring case, as the example below shows), and concatenate them all end to end (key preceding value). Don't include the ?, &, or = signs. Then you calculate the HMAC-SHA1 signature of that string (using your secret access key) and convert it to base64.

Here's the example Amazon gives on their site.

The following request:

?Action=CreateQueue
&QueueName=queue2
&AWSAccessKeyId=0A8BDF2G9KCB3ZNKFA82
&SignatureVersion=1
&Expires=2007-01-12T12:00:00Z
&Version=2006-04-01


translates into the following string:

ActionCreateQueueAWSAccessKeyId0A8BDF2G9KCB3ZNKFA82Expires2007-01-12T12:00:00ZQueueNamequeue2SignatureVersion1Version2006-04-01

which when hashed with the secret key (fake-secret-key, used in this example) yields:

wlv84EOcHQk800Yq6QHgX4AdJfk=
(URL encoded version: wlv84EOcHQk800Yq6QHgX4AdJfk%3D)


I looked at my PHP library, and sure enough here are the methods that create the signature. They require the PEAR Crypt_HMAC package.


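// Convert a hex-encoded string to raw bytes, then base64-encode it.
function hex2b64($str) {
  $raw = '';
  for ($i = 0; $i < strlen($str); $i += 2) {
    $raw .= chr(hexdec(substr($str, $i, 2)));
  }
  return base64_encode($raw);
}

// Sign $str with HMAC-SHA1 using the secret key (both functions are methods of the library's class).
function constructSig($str) {
  // Plain assignment: PHP 5 objects are already handles, so "=& new" is unnecessary.
  $hasher = new Crypt_HMAC($this->secretKey, "sha1");
  $signature = $this->hex2b64($hasher->hash($str));
  return $signature;
}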
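
As an alternative sketch (not the library's code), the same steps can be written without PEAR by using PHP's built-in hash_hmac(), which is available from PHP 5.1.2. The function names canonicalString() and signRequest() are made up for this example, and the case-insensitive sort is inferred from the ordering in Amazon's sample string above.

```php
<?php
// Build the string to sign: sort parameters by name (case-insensitively),
// then concatenate each name and value with no separators.
function canonicalString($params) {
    uksort($params, 'strcasecmp');
    $str = '';
    foreach ($params as $name => $value) {
        $str .= $name . $value;
    }
    return $str;
}

// HMAC-SHA1 of the string as raw bytes, then base64. Asking hash_hmac() for
// raw output (fourth argument true) replaces the hex-to-binary step.
function signRequest($params, $secretKey) {
    return base64_encode(hash_hmac('sha1', canonicalString($params), $secretKey, true));
}

// The parameters from the sample request, in the order they appear in the URL.
$params = array(
    'Action' => 'CreateQueue',
    'QueueName' => 'queue2',
    'AWSAccessKeyId' => '0A8BDF2G9KCB3ZNKFA82',
    'SignatureVersion' => '1',
    'Expires' => '2007-01-12T12:00:00Z',
    'Version' => '2006-04-01',
);

echo canonicalString($params), "\n";
echo urlencode(signRequest($params, 'fake-secret-key')), "\n";
```

The first line printed should match the concatenated string from the example; the second is the URL-encoded value that would go in the Signature parameter.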

Lab 5 : Approval Client

I'm just about finished with my approval client, the things I need to do are:

A) figure out how to get a count of the number of items in the queue (my library doesn't support that function)
B) Make my SOAP calls to the WHOIS service (is there a specific service we're supposed to be using for this?)

I've got my prototype running here. I'm using Flex for the front end, with data provided by a PHP backend. My plug for Flex follows...

Based on anecdotal evidence (i.e., conversations I've had) I think that Adobe Flex is one of the most misconstrued technologies in our department. I just wanted to take a few lines and address some of the misconceptions I've heard as I've discussed Flex with fellow students.

Things you've probably heard about Adobe Flex:
1. It's proprietary.
2. It uses Flash.
3. It costs hundreds of dollars for the compiler/IDE.
4. Flex data services costs several thousand dollars per processor to deploy.

While there is some truth to all the statements above, the bottom line in regard to cost is this: Flex is free.

You can download the free SDK (incidentally, all you need to compile and deploy Flex applications) here

If you'd like an IDE, go here to download a free academically licensed version of Flex Builder 2, with Charting. You'll need to provide some identification.

Follow links here to download Flex Data Services Express (licensed free of charge on up to 1 processor). I should mention that I've used data in all the Flex applications I've developed, and I have yet to try Flex Data Services. There is a wide variety of options for getting data to your application. I've used Java web services, PHP pages returning XML, AMF (Actionscript's serialized data format) streams, and a couple others.

And yes, it does run in a Flash player (the resemblance to Flash ends there) but that does have its advantages. As long as your platform has a Flash player, your application will look and function exactly the same on Linux, a Mac, a Windows PC, or whatever. And as far as RIAs go, that's saying something.

All things accounted for, it's cheaper to develop in Flex than it is to develop in AJAX. And the applications look better consistently across platforms. So if the "cost and Flash" are the only things holding you back from checking it out, you really ought to look into it.

Thursday, November 8, 2007

Lab 4 - Submit Server

Once again, I got my best advice at the end of the lab. I'll share it with you-- use a library when trying to interact with the SQS.

The first approach I tried, which failed, was using cURL to send the PUT request. You can see the code I used for that in my previous post. Although it works in general, I wasn't having much success using it with SQS. Perusing the SQS documentation a little more closely revealed that I was not sending the correct headers. Here's a link to it.

I didn't find that link until I had given up on cURL and switched to PEAR's HTTP_Request package. As far as general purpose HTTP request packages go, its interface is much easier to use. I added most of the headers, but was having trouble formatting the Authorization header correctly. That's when I got the advice to use a library.

I checked out a couple of SQS libraries for PHP; the one I ended up going with was one I found on Amazon's site. The documentation is wanting, but I was able to figure out what to do. It makes use of the PEAR extensions. I didn't have to make the changes to the PEAR code that the site suggests (I don't know if that's because the code was correct, or because I wasn't exercising the faulty code).

Here's the code I used to make the request:

function submitToQueue($xml) {
  $q = new sqs('ACCESS-KEY', 'SECRET-KEY', 'http://queue.amazonaws.com/');
  $queueId="A3N3IV5XJH079S/processing";
  $q->putMessage($xml, $queueId, 1000);
}


And that was it.

Saturday, November 3, 2007

GET / POST / PUT Using PHP

The simplest way by far to do a GET in PHP (if you just want the returned contents and don't care about the headers) is to use file_get_contents. It's useful for quickly getting the contents of a file, or of a web page. For example, this function from lab 2 retrieves data from S3.


function queryS3($path) {
  $contents = file_get_contents('http://s3.amazonaws.com/cs462-data/' . $path);

  return $contents;
}



For an HTTP POST, I use the cURL library. If you've never used cURL before, there's a slight learning curve for doing a POST. I was able to figure it out after looking at a couple of samples (the PHP documentation isn't much help). The key is creating an associative array from your POST data fields.

Here's an example from Lab 3, where my submit script (which is actually a Submit object) finally submits the idea to the submit server.


function sendToSubmitAppServer() {
  $url = "http://sslb-p.webappwishlist.com:8080/submit";
  $useragent="Johns Web Service Server, version 7";

  $data = array();
  $data['domain'] = $this->domain;
  $data['name'] = $this->name;
  $data['idea'] = $this->idea;

  $ch = curl_init();
  curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
  curl_setopt($ch, CURLOPT_URL,$url);
  curl_setopt($ch, CURLOPT_POST, 1);
  curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
  $result = curl_exec ($ch);
  curl_close ($ch);
}


Doing HTTP PUT requests was a little trickier, because PHP doesn't offer a straightforward way of specifying the body of a PUT request, specifically when the body comes from a local string variable. I found a solution here (it basically says all the stuff I just said, with a code example). Here's my code example from lab 4, where I'm sending the XML data file to SQS. I can't be sure that it works, as I don't know how to test whether something was sent to the queue. I'll update this example if I find any errors during the next couple of days.


function submitToQueue($xml) {
  $url = "http://queue.amazonaws.com/A3N3IV5XJH079S/processing";
  $useragent="Distributed Systems/v3.4 (compatible; Mozilla 7.0; MSIE 8.5; http://classes.eclab.byu.edu/462/)";

  // Copy the string into an in-memory stream so cURL can read it like a file.
  $fh = fopen('php://memory', 'rw');
  fwrite($fh, $xml);
  rewind($fh); // move back to the start so cURL reads from byte 0

  $ch = curl_init();
  curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
  curl_setopt($ch, CURLOPT_INFILE, $fh);                // body comes from the stream
  curl_setopt($ch, CURLOPT_INFILESIZE, strlen($xml));   // body length in bytes
  curl_setopt($ch, CURLOPT_TIMEOUT, 10);
  curl_setopt($ch, CURLOPT_PUT, 1);                     // send as HTTP PUT
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);       // return the response instead of printing it

  $result = curl_exec($ch);
  curl_close($ch);
  fclose($fh);
}


It opens a memory stream the way it would a file stream, and then uses CURLOPT_INFILE to specify the data to be sent in the body. I ran into a situation like this before (not while using PUT), which I solved by actually writing the data to a file and then passing the file back in. Talk about a hack...
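
To see the memory stream trick on its own, without cURL or SQS in the picture, here is a small sketch. The helper name stringToStream() and the sample XML are made up for illustration; it needs a PHP version that supports php://memory.

```php
<?php
// Treat a string as a file: write it to an in-memory stream and rewind it,
// so a reader (such as cURL via CURLOPT_INFILE) starts at the first byte.
function stringToStream($data) {
    $fh = fopen('php://memory', 'w+'); // read/write; nothing touches the disk
    fwrite($fh, $data);
    rewind($fh);                       // reset the pointer to byte 0
    return $fh;
}

$xml = '<idea guid="example"><body>hello</body></idea>';

$fh = stringToStream($xml);
$readBack = stream_get_contents($fh); // read everything from the current position
fclose($fh);

echo ($readBack === $xml) ? "round trip ok\n" : "mismatch\n";
```

Skipping the rewind() is the easy mistake here: the pointer would be left at the end of the data and the reader would see an empty body.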

Friday, November 2, 2007

Updating PHP on Fedora Core 4

Using Amazon's standard EC2 images as the base image for your servers probably means you're going to be running Fedora Core 4, with outdated versions of PHP and MySQL. In my case that's PHP 5.0.4, and I've run into several problems with it, as I've wanted to use some of PHP's more advanced functionality: functionality that is either not included by default with PHP 5.0, or not included at all.

An example of functionality not included by default would be JSON, and an example of functionality not included at all would be memory-based streams. Both have had, or are having, applications in these labs. I finally decided I needed to have PHP 5.2 installed on my servers. The only problem is that none of the repositories configured by default have PHP 5.2 packages for Fedora 4 (this is what has hindered me in the past).

I finally solved my problem after stumbling upon this site: remi.collet.free.fr (I knew my 3 years of French education in high school would pay off some day) that has a repository with lots of packages including update packages for MySQL and PHP for FC4.

I'll outline the steps I followed to update PHP 5, and include the commands (which happen to be included on the above site, here, although it may take a few minutes to find them if you don't know French).

1. I downloaded (using wget) the repository configuration file to the repository configuration directory.

cd /etc/yum.repos.d/
wget http://remi.collet.free.fr/rpms/remi-fedora.repo

2. I made a yum call, enabling the repository in the process (as it's disabled by default).

yum --enablerepo=remi install php-5.2.4

3. I restarted Apache.

apachectl restart

And that's it. I now have PHP 5.2.4 running on Fedora 4. I'm sure that an expert could have accomplished it some other way, but as a relative n00b, I have to admit I quite rely on yum.

Thursday, November 1, 2007

Lab 3 - List App/Web Server Integration

I got off to an early start on lab 3, and had most of my code working in a couple hours. The thing that hung me up for two weeks was trying to figure out how the step "Register with the SSLB" fit into this lab. I finally asked Sam about it and he told me it was a mistake. Looking back I realize that I deserved to wallow around in confusion for not having asked the question sooner.

I used PHP again; I think I've pretty much given up on Python this semester... there's always next semester. The template engine I've been using, Smarty, is pretty powerful. Not that I need to exercise its full power for this project, but I really like it. I actually like it so much that I've switched to it from XSLT on another project I'm working on.

One of the other challenges I had during this lab was figuring out how to do the URL rewrites, as the keyword /submit had to go to the submit page, and /everything-else had to go to the idea list for the domain 'everything-else'. I'm sure there are Perl gurus out there who could have whipped up a regex in 5 seconds to handle that... unfortunately I'm not one of them. What I did was rewrite all URLs to go to a driver script that parsed the original URL, and then used PHP objects to generate the appropriate response in each case. These PHP objects were converted from the scripts I originally planned to redirect to for each action.
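
The routing decision inside such a driver script can be sketched like this. It assumes mod_rewrite hands every request to one script; the function name route(), the returned array shape, and the 'home' fallback for an empty path are hypothetical choices for this example, not my actual code.

```php
<?php
// Decide what a request path means. 'submit' is the reserved keyword; any
// other first path segment is treated as the domain whose ideas to list.
function route($requestUri) {
    // Strip the query string and keep only the path part.
    $path = (string) parse_url($requestUri, PHP_URL_PATH);
    $segments = explode('/', trim($path, '/'));
    $first = $segments[0]; // explode() always returns at least one element

    if ($first === '') {
        return array('action' => 'home', 'domain' => null); // hypothetical fallback
    }
    if ($first === 'submit') {
        return array('action' => 'submit', 'domain' => null);
    }
    return array('action' => 'list', 'domain' => $first);
}

$result = route('/www.foo.com');
echo $result['action'], ' ', $result['domain'], "\n";
```

The driver would then instantiate whichever PHP object handles the returned action, which keeps the rewrite rule itself trivial.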

Testing this lab also turned up a bug in my listing app server. It was a subtle bug that manifested itself by returning only 1 idea for a given domain (even in cases where there was more than 1). Once I identified it, I feared the worst. It took me 30 minutes to track down the source, which turned out to be a missing '$' on my loop variable (which was used to index an array). I won the book in class for the PHP quiz (for identifying that MySQL wasn't enabled by default in a PHP installation)... but I don't know enough to understand why $myarray[i] (should have been $myarray[$i]) didn't cause a more visible error. I'll have to check my error reporting settings in my php.ini file.
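
A likely explanation, offered as a sketch rather than a diagnosis of my server: PHP of this era reads a bare word like i as an undefined constant and falls back to the string 'i', raising only a notice, and the lookup of the missing 'i' index raises another notice. If notices are hidden by the error_reporting setting, nothing shows up at all. The handler below (strictErrors is a made-up name) turns such notices and warnings into exceptions so they can't be missed.

```php
<?php
// Report everything, and turn notices/warnings into exceptions so a bad array
// index stops the script instead of quietly yielding null.
error_reporting(E_ALL);

function strictErrors($severity, $message, $file, $line) {
    throw new ErrorException($message, 0, $severity, $file, $line);
}
set_error_handler('strictErrors');

$ideas = array('first idea', 'second idea');

$caught = false;
try {
    // $ideas['i'] is what older PHP effectively evaluated for $ideas[i]:
    // the bare word became the string 'i', which is not an index in this array.
    $value = $ideas['i'];
} catch (ErrorException $e) {
    $caught = true;
    echo 'Caught: ', $e->getMessage(), "\n";
}
```

During development, setting error_reporting to E_ALL in php.ini gets most of the same benefit without the custom handler.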

I have to say I've been having a great experience in this class, overall. The greater emphasis on architecture and design, and lesser emphasis on KLOCs has been an effective approach. I can think of 2 other CS classes (off the top of my head) I have taken that could benefit from this model.

Tuesday, October 2, 2007

Lab 2 - List Server

After climbing over the learning curve of the first lab, lab 2 was a piece of cake. Speaking of cake, I switched to PHP for this lab. I'd had my fill of mod_python for a little while. I may switch back to it for the next lab, but I haven't decided.

I used Smarty for templating. It's one of only 2 templating systems I've used for PHP (the other being the one built into PHP itself), and I prefer Smarty. It has plenty of features, is pretty lightweight, and performs very well after the templates have been compiled and cached. Getting PHP and Smarty working was trivial; I had a little more difficulty getting JSON working.

I was able to get my code written quickly on my development machine. I used file_get_contents to retrieve the JSON data; it returns the contents of a file, or the response from an HTTP GET request, as a string. PHP's JSON library has two functions: json_encode and json_decode. By means of a flag, you can decode a JSON message into either a PHP object or an associative array. I chose the associative array, and used a template to generate my XML.
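
Here's a small illustration of that flag. The JSON payload is invented for the example (the real field names come from the lab's data), and it needs PHP's JSON extension.

```php
<?php
// Hypothetical payload; the real field names come from the lab's data.
$json = '{"domain":"www.foo.com","ideas":['
      . '{"name":"Ann","idea":"Add search"},'
      . '{"name":"Bob","idea":"Fix login"}]}';

$asArray  = json_decode($json, true); // second argument true => associative arrays
$asObject = json_decode($json);       // default => stdClass objects

echo $asArray['ideas'][0]['idea'], "\n"; // prints "Add search"
echo $asObject->ideas[1]->name, "\n";    // prints "Bob"
```

The associative array form is convenient for handing data to a template engine, since the template can loop over it without knowing anything about object types.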

The only difficulty I had was setting up PHP's JSON library on my EC2 server. The library comes bundled with PHP 5.2, but not with the previous versions. And there is no yum installer for PHP 5.2 on FC4. yum install php_json did the trick in the end (took me about 2 hours to get there).

I used mod_rewrite to map our URL interface to my backend. It worked well once I got my regular expressions correct.

P.S. - My bill for September was only $1.81.

Thursday, September 20, 2007

mod_python.publisher?

Getting an early start on this lab wasn't as productive as I had hoped it would be. If it was intended as a learning experience, then I have to conclude it was 100% successful because I learned a lot.

My Linux experience is primarily what I've done in CS240 and CS360... so I expected that part to be challenging. I blundered my way through it with lots of help (especially from those who posted scripts and tips).

The difficulty I had getting my script to run took me by surprise. The majority of examples I found on the web regarding getting mod_python set up in httpd.conf showed using the publisher handler. None of them really explained how the publisher handler worked and I assumed it worked like a typical CGI handler. As it turns out, mod_python actually has a cgihandler that behaves like a CGI handler... the publisher handler is something different (there's also a psp handler).

The publisher handler works by mapping methods in the script to URLs. For example a script named index.py with a method named index and a method named hello would map to the URLs http://somesite.com/ & http://somesite.com/hello respectively. For a more complete treatment see the documentation here: http://www.modpython.org/live/current/doc-html/hand-pub-alg-trav.html

This is definitely a different way of looking at things, especially if you're like me and come from a folders-and-files-map-to-urls kind of background. I'm looking forward to experimenting a little more with the publisher handler, because I feel like I may only be chipping at the tip of the iceberg.

There are a few other things I'm starting to like about the publisher handler. It adds a lot of (much welcome) abstraction to getting GET and POST data, for example. If you are just using Python the 'plain old' CGI way, I'd recommend giving mod_python a try. It adds a little bit to the learning curve, but I have a feeling it's going to pay off in the end.

Thursday, September 6, 2007

CS 462.

My name is John Dusbabek. I am taking CS 462 (Large Scale Distributed Systems Design... and implementation hopefully) at Brigham Young University, Fall 2007.

This blog will serve as my lab notebook. Here I will post any required lab write-ups for grades... and also things I learned "the hard way" so as to be helpful to others who may follow in my footsteps. These may or may not be useful depending on what direction the course goes in future semesters. I understand from Dr. Windley that historically the course structure for CS 462 is quite volatile, and this blog will probably deal primarily with implementation specifics... so who knows?

Personally I'm excited about taking this class. I wish our department offered more classes like it.