Michael Cochez

Assistant Professor at Vrije Universiteit Amsterdam

Advanced task TIES532 - RESTful web services

Goal

Learn more about RESTful APIs and how to use them. In this exercise the student needs to interact with two slightly similar APIs. This task also requires the HTTP POST method to upload files to a server. In order to be more memory efficient, streams need to be used. Further, thread pools are introduced to speed up the copying process using multithreading.

Prerequisites

The student must understand the RESTful web services exercise from TIES456, i.e. the student should be able to implement the basic exercise and understand how it works. All material which is a prerequisite for the basic exercise is also a prerequisite for this exercise. HTTP POST messages with multiple parts use MIME to encode the parts. Read about MIME on http://en.wikipedia.org/wiki/MIME, especially about multipart messages. If you have never created your own Java libraries before, read http://en.wikipedia.org/wiki/JAR_(file_format) and find out how to create jar files inside the Eclipse IDE. If you have no experience with Java thread pools, read the documentation for more information. When implementing this task in another programming language, it still pays off to at least skim through the documentation to understand what the analogous constructs are in your language of choice.

Task

The goal of this task is to create a service which is able to back up the user’s files from the mediafire service (http://www.mediafire.com/) to the sendspace service (http://www.sendspace.com). The student’s service only has to work in that direction and not the other way around, because implementing the other direction would require a paid sendspace subscription. Your service is accessed through a (simple) website where the user’s credentials for both services can be provided and the process can be started. The user has to be informed about completion or failure of the process.

The process should connect to both services and check which files and folders from the mediafire service are missing on the sendspace service. If a file is missing on sendspace, the file needs to be downloaded from mediafire and uploaded to sendspace in the right directory. Note that the file has to pass through your server for this exercise. The process should search recursively in all directories. Two files can be compared using only their names. (Thus, if sendspace has a file with the same name in the same location, the file does not have to be copied.) The actual copying (downloading/uploading) should be done in a job on a thread pool (see below). This makes the copying jobs run in parallel.
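The recursive comparison described above can be sketched as follows. This is only a minimal illustration under stated assumptions: WebFolder and BackupPlanner are hypothetical names (not part of either API), and the folder trees are assumed to have already been read into memory from the two services.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical in-memory model of a remote folder, filled from API calls elsewhere. */
final class WebFolder {
    final String name;
    final Set<String> files = new TreeSet<>();                  // file names in this folder
    final Map<String, WebFolder> subfolders = new TreeMap<>();  // subfolder name -> folder

    WebFolder(String name) {
        this.name = name;
    }

    /** Returns the subfolder with the given name, creating it if needed. */
    WebFolder folder(String subfolderName) {
        return subfolders.computeIfAbsent(subfolderName, WebFolder::new);
    }
}

/** Hypothetical helper that decides which files still have to be copied. */
final class BackupPlanner {
    /** Lists paths of files that exist in source but not at the same location in target. */
    static List<String> missingFiles(WebFolder source, WebFolder target) {
        List<String> missing = new ArrayList<>();
        collect(source, target, "", missing);
        return missing;
    }

    private static void collect(WebFolder source, WebFolder target,
                                String path, List<String> missing) {
        for (String file : source.files) {
            // Files are compared by name only; a missing target folder means
            // that every file below it is missing as well.
            if (target == null || !target.files.contains(file)) {
                missing.add(path + "/" + file);
            }
        }
        for (WebFolder sub : source.subfolders.values()) {
            WebFolder counterpart = (target == null) ? null : target.subfolders.get(sub.name);
            collect(sub, counterpart, path + "/" + sub.name, missing);
        }
    }
}
```

The sketch only reports files. A real implementation would also have to create the missing folders on sendspace before uploading into them.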

The API descriptions can be found on the mediafire developers site and the sendspace equivalent.

Do not buffer too much data. Use streams wherever reasonable, especially when copying whole files from one service and writing them to the other.
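As an illustration of what "not buffering too much" means, the following sketch copies a stream through a small fixed-size buffer, so memory use stays constant regardless of file size. StreamCopy is a hypothetical helper name, and the 8 KiB buffer size is an arbitrary assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: copies a stream without loading it into memory as a whole. */
final class StreamCopy {
    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // only this much data is held in memory at once
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }
}
```

On Java 9 and later, InputStream.transferTo(OutputStream) does the same job. In this task the HTTP library will normally do the copying for you when you hand it the download stream, but the principle is the same.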

The final deliverable consists of two projects programmed in Java (Java, Dynamic Web Project, Vaadin, Maven, and other technologies are allowed) or in a language of your choice (in which case the teacher is unlikely to be able to help much). Try to follow a multi-tier architecture. The first project/package/namespace contains the classes (and libraries) used to communicate with the above-mentioned services. This is the logic tier. From this project, you need to create a .jar (or equivalent) file and use it as a library for the second project, which contains the website code and forms the presentation tier. Note that the mediafire and sendspace services function as the data tier of your service. See also http://en.wikipedia.org/wiki/Multitier_architecture.

Returning the task

  • Both projects need to be pushed to a separate git repository.
  • Add the teacher (user name: miselico) as a collaborator to the repos.
  • Add a README.txt or README.md file to the root of your repo in which you document your choices and the authors.
  • You need to arrange a meeting with the teacher to show the task.

Hints

  1. Look at the Hints section of the basic exercise. Except for remarks about thread safety, all of them are still valid.
  2. You can check whether all jobs submitted to the thread pool are finished by using a CountDownLatch or other helper classes from java.util.concurrent.
  3. Make sure that your mediafire read and sendspace write methods are thread safe! This can be done by synchronizing correctly or by avoiding concurrent access altogether. The latter is usually easier to implement.
  4. You can assume that the user does not have duplicate file names in the directories. (Or just ignore them if they are there)
  5. Think about how you design the class structure. Do not put all the code inside one class/method. (you could consider having classes/interfaces like MediaFire, SendSpace, WebFolder, WebFile, …)
  6. Perhaps the most difficult part of this task is the file transfer. The main difficulty is that the sendspace service expects you to set the Content-length header, i.e. specify the length of the transmission in the HTTP request. Combined with the fact that you need to use streams, several things have to be kept in mind.
    1. You cannot use the fluent API of the HTTP client libraries for retrieving/sending, as in <pre>Content c = Request.Get(uri).execute().returnContent();</pre> The reason is that this way you cannot get the length of the file before you have downloaded it completely, while you need the length before you can start uploading.
    2. You can get the content length from the HttpResponse object by <pre>Header[] contentLengthHeaders = response.getHeaders("Content-length"); final long length = Long.parseLong(contentLengthHeaders[0].getValue());</pre> Of course, you should first check whether the header exists, and throw an exception if it is not set. Another download-related issue is that you can only perform <pre>EntityUtils.consume(entity); httpGet.releaseConnection();</pre> after the whole file has been uploaded to the other service.

    3. When looking at the upload API of sendspace, you will notice that they show an HTML form for the upload. What you have to do is mimic a form submission in your own code, including all the fields which would be filled in by the browser. This can be done using a MultipartEntity, where textual values are added by <pre>entity.addPart("string_field",new StringBody("field value"));</pre> Files (streams) can be added in a similar way by using an InputStreamBody. There is, however, still the complication of having to specify the size properly. To solve this issue, read this blog post. In the case described in the blog post, the size is known because of the in-memory byte buffer. In this exercise, the size is known from the Content-length header of the download. The response which comes back from the upload call is neither XML nor JSON. It is just plain text. You can assume that everything went all right if the response starts with “upload_status=ok”. Disable the feature of sending an e-mail to the user for every newly created file on sendspace. This can be done by setting “notify_uploader” to “0” as in <pre>entity.addPart("notify_uploader",new StringBody("0"));</pre>
  7. When using thread pools, you can opt for the fixed thread pool. <pre>ExecutorService exec = Executors.newFixedThreadPool(10);</pre> You will, however, notice that this thread pool keeps your application from stopping, because the threads in the pool are still “actively waiting” (they are blocked). You should only create one (static) thread pool for your application. In that case there will be a fixed overhead of 10 blocked threads, which is not a big problem. When your application is run on a server, it will not need to stop on its own anyway. If you want to create a pool which stops working when all other threads in the application have finished their jobs, you can look at the MoreExecutors class of the Google Guava library. Obviously, you can devise a different strategy for managing the threads.
  8. You are free to use libraries as you see fit. When not using Maven, all used libraries must be inside the project and pushed to the git repository.
  9. Error checking should be implemented to a reasonable extent. (Check HTTP status codes and API return values.) When an error is found, you do not have to write extensive handling of the error. (No retry, no alternative strategies, etc.) Just show an error message to the user. Do not let the whole application die when a normal exception is thrown, though.
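Hints 2, 7 and 9 can be combined roughly as in the sketch below: every copy job is submitted to a shared pool, a CountDownLatch tells the caller when all jobs are done, and a failing job is counted instead of killing the application. ParallelCopy and runAll are hypothetical names, and the jobs are plain Runnable objects standing in for the real download/upload work.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: runs copy jobs in parallel and waits for all of them. */
final class ParallelCopy {
    /** Runs every job on the pool, blocks until all are finished, returns the failure count. */
    static int runAll(ExecutorService pool, List<Runnable> jobs) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(jobs.size());
        AtomicInteger failures = new AtomicInteger();
        for (Runnable job : jobs) {
            pool.execute(() -> {
                try {
                    job.run();
                } catch (RuntimeException e) {
                    // One failed copy must not stop the others; report it afterwards.
                    failures.incrementAndGet();
                } finally {
                    // Count down in all cases, otherwise await() would block forever.
                    done.countDown();
                }
            });
        }
        done.await();
        return failures.get();
    }
}
```

A caller could create the pool once with Executors.newFixedThreadPool(10), pass it to runAll for every backup request, and show an error message to the user when the returned failure count is not zero.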