Building Web Services with JAX-WS: An Introduction

Developing SOAP-based web services seems difficult because the message formats are complex. The JAX-WS API makes it easy by hiding the complexity from the application developer. JAX-WS, short for Java API for XML Web Services, is a technology for building web services and clients that communicate using XML.

JAX-WS allows developers to write message-oriented as well as RPC-oriented web services.

A web service operation invocation in JAX-WS is represented by an XML-based protocol such as SOAP. The SOAP specification defines the envelope structure, encoding rules, and conventions for representing web service invocations and responses. These calls and responses are transmitted as SOAP messages over HTTP.
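For a sense of what travels over the wire, here is a minimal SOAP 1.1 request envelope; the operation name and namespace are made up for illustration:

    <?xml version="1.0"?>
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <!-- One web service operation invocation, encoded as XML -->
        <ns:sayHello xmlns:ns="http://example.com/">
          <name>World</name>
        </ns:sayHello>
      </soap:Body>
    </soap:Envelope>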

Though SOAP messages appear complex, the JAX-WS API hides this complexity from the application developer. On the server side, you specify the web service operations by defining methods in an interface written in the Java programming language, and you code one or more classes that implement these methods.
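A minimal server-side sketch (the service name and package are illustrative, not from any real project):

    // Hello.java -- the service endpoint interface: each method is one operation.
    package com.example;

    import javax.jws.WebMethod;
    import javax.jws.WebService;

    @WebService
    public interface Hello {
        @WebMethod
        String sayHello(String name);
    }

    // HelloImpl.java -- a class implementing the operations.
    package com.example;

    import javax.jws.WebService;
    import javax.xml.ws.Endpoint;

    @WebService(endpointInterface = "com.example.Hello")
    public class HelloImpl implements Hello {
        public String sayHello(String name) {
            return "Hello, " + name + "!";
        }

        public static void main(String[] args) {
            // Publish the endpoint with the lightweight built-in HTTP server.
            Endpoint.publish("http://localhost:8080/hello", new HelloImpl());
        }
    }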

It is equally easy to write client code. A client creates a proxy, a local object representing the service, and then invokes methods on the proxy. With JAX-WS you do not generate or parse SOAP messages yourself; the JAX-WS runtime system converts the API calls and responses to and from SOAP messages!
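A client sketch against the hypothetical Hello service above (the WSDL URL and service QName are assumptions based on JAX-WS naming defaults):

    import java.net.URL;
    import javax.xml.namespace.QName;
    import javax.xml.ws.Service;
    import com.example.Hello;

    public class HelloClient {
        public static void main(String[] args) throws Exception {
            // The WSDL generated and served by the published endpoint.
            URL wsdl = new URL("http://localhost:8080/hello?wsdl");
            QName serviceName = new QName("http://example.com/", "HelloImplService");

            // Create the proxy: a local object representing the remote service.
            Service service = Service.create(wsdl, serviceName);
            Hello proxy = service.getPort(Hello.class);

            // An ordinary method call; the runtime turns it into a SOAP exchange.
            System.out.println(proxy.sayHello("World"));
        }
    }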

With JAX-WS, web services and clients have a big advantage: the platform independence of the Java programming language. Additionally, JAX-WS is not restrictive: a JAX-WS client can access a web service that is not running on the Java platform, and vice versa. This flexibility is possible because JAX-WS uses technologies defined by the World Wide Web Consortium: HTTP, SOAP, and the Web Services Description Language (WSDL). WSDL specifies an XML format for describing a service as a set of endpoints operating on messages.

The three most popular implementations of JAX-WS are:

Metro: Developed and open-sourced by Sun Microsystems, Metro incorporates the reference implementations of the JAXB 2.x data-binding and JAX-WS 2.x web services standards, along with other XML-related Java standards. Here is the project link: http://jax-ws.java.net/. You can go through the documentation and download the implementation there.

Axis: The original Apache Axis was based on JAX-RPC, the first Java standard for web services. This did not turn out to be a great approach: JAX-RPC constrained the internal design of the Axis code, causing performance issues and a lack of flexibility. JAX-RPC also made some assumptions about the direction of web services development that turned out to be wrong!

By the time Axis2 development started, a replacement for JAX-RPC had already come into the picture, so Axis2 was designed to be flexible enough to support the replacement web services standard on top of the base framework. Recent versions of Axis2 have implemented support for both the JAXB 2.x Java XML data-binding standard and the JAX-WS 2.x Java web services standard that replaced JAX-RPC. (JAX-RPC stands for Java API for XML-based Remote Procedure Calls. Because the RPC mechanism lets clients execute procedures on other systems, it is often used in a distributed client-server model; in JAX-RPC, the RPC is represented by an XML-based protocol such as SOAP.) Here is the project link: http://axis.apache.org/axis2/java/core/

CXF: Another web services stack from Apache. Though both Axis2 and CXF originated at Apache, they take very different approaches to how web services are configured and delivered. CXF is very well documented and offers much more flexibility and additional functionality if you're willing to go beyond the JAX-WS specification. It also supports Spring. Here is the project link: http://cxf.apache.org/

At Talentica we have used all three of the implementations mentioned above in various projects.

Why is CXF my choice?
Every framework has its strengths, but CXF is my choice. CXF has great integration with the Spring framework, and I am a huge fan of Spring projects. It is modular and easy to use, it has great community support, and you can find lots of tutorials and resources online. It also supports both the JAX-WS and JAX-RS specifications, though that alone should not be the deciding factor when choosing CXF. Performance-wise it is better than Axis and gives almost the same performance as Metro.
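As a taste of that ease of use, here is a minimal sketch that publishes an endpoint programmatically with CXF's JaxWsServerFactoryBean, reusing the hypothetical classes from the earlier JAX-WS example; with Spring you would typically declare the endpoint in configuration instead:

    import org.apache.cxf.jaxws.JaxWsServerFactoryBean;
    import com.example.Hello;
    import com.example.HelloImpl;

    public class CxfServer {
        public static void main(String[] args) {
            JaxWsServerFactoryBean factory = new JaxWsServerFactoryBean();
            factory.setServiceClass(Hello.class);               // the SEI
            factory.setServiceBean(new HelloImpl());            // the implementation
            factory.setAddress("http://localhost:9000/hello");  // where to publish
            factory.create();                                   // start serving
            System.out.println("CXF endpoint up at http://localhost:9000/hello");
        }
    }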

So CXF is my choice; what is yours?

Event: Node.js Over the Air!

Over the Air sessions at Talentica are technical workshops where a bunch of developers roll up their sleeves, tinker around with new platforms/technologies to learn together, gather new insights and get a healthy dollop of inspiration. Last week we had an “Over the Air” session on Node.js.

Node.js is a server-side JavaScript runtime that changes the notion of how a server should work. Its goal is to enable a programmer to build highly scalable applications that handle tens of thousands of simultaneous connections on a single server machine. Node.js is one of the most talked-about technologies today. To find out how it really works, we picked it up for this Over the Air session.

Once we gathered, it took a little while for some of the participants to get used to the event-driven programming style, but pretty soon we were all working together on building a cool chat app. By the end of the day, we had a fully working version of a chat room app in which any user can enter the chat room by simply entering a nickname. Subsequent entries are posted to all logged-in users, and the right-side pane shows all the logged-in users.

This is a fairly decent basic version. Going forward, we plan to enhance the user interface so that people can play games using the chat app, integrate the UI with the chat engine, and enable users to challenge each other to play while chatting.

First Impressions
Node.js is an excellent option for interactive apps. I will not hesitate to use Node.js in products that require interactive functionality like chat, auctions, or online multiplayer games. One can also use Node.js for just the part of a product it suits, rather than building the complete product on it.

The fact that we can code the server side in JavaScript should make JavaScript developers jump with joy. Code reuse between the client and server sides might actually be possible!

On the negative side, I am not sure the event-driven programming model is a good one on the server side; it might lead to spaghetti code with callbacks all over the place. Another thing is that although the community is very active and plug-ins are being developed at a rapid pace, it is still not a tried and tested technology at this moment!

Multi-server Applications on the Wireless Web

Here we will discuss how to build web applications that can serve wireless clients according to each client's capabilities.

What are the challenges?
Development of mobile applications is often highly dependent on the target platform. When developing a mobile content portal, we generally think about its accessibility through mobile browsers (Nokia, Openwave, and i-mode browsers, AvantGo on PDAs, etc.), which use markup languages like WML, HDML, cHTML, and XHTML. We want to ensure that each browser gets a compatible markup language and can present the portal content in the correct format. In short, creating a wireless application that works on as many devices as possible is not difficult, it's useless: if you invest a huge amount of resources today, chances are that a new device will ship tomorrow and you'll need to tweak your application again.

What is the solution?
Wireless Universal Resource File (WURFL) is an open source project that uses XML to describe the capabilities of wireless devices. It is a database (some call it a "repository") of wireless device capabilities. With WURFL, figuring out which phone works with which technology is a whole lot easier. We can use WURFL to determine device capabilities programmatically and serve different content to different devices dynamically, depending on the device accessing the content.
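To give a flavor of the repository format, here is an illustrative (made-up) device entry in the wurfl.xml style; each device lists capability values grouped by topic and falls back to a more generic device for anything it does not define:

    <device id="acme_phone_v2" user_agent="AcmeBrowser/2.0" fall_back="generic_xhtml">
      <group id="display">
        <capability name="resolution_width" value="176"/>
        <capability name="resolution_height" value="208"/>
      </group>
      <group id="markup">
        <capability name="preferred_markup" value="html_wi_oma_xhtmlmp_1_0"/>
      </group>
    </device>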

Here are some of the things WURFL can help you know about a device:

  • Screen size of the device
  • Supported image, audio, video, ringtone, wallpaper, and screensaver formats
  • Whether the device supports Unicode
  • Is it a wireless device? What markup does it support?
  • What XHTML MP/WML/cHTML features does it support? Does it work with tables? Can it work with standard HTML?
  • Does it have a pointing device? Can it use CSS?
  • Does it have Flash Lite/J2ME support? What features?
  • Can images be used as links on this device? Can it display image and text on the same line?
  • If it is an i-mode phone, what region is it from: Japan, US, or Europe?
  • Does the device auto-expand a select drop down? Does it have inline input for text fields?
  • What SMS/MMS features are supported?

The WURFL framework also contains tools, utilities, and libraries to parse and query the data stored in WURFL. The WURFL API is available in many programming languages, including Java, PHP, .NET, Ruby, and Python. Various open source tools are built around WURFL: HAWHAW (PHP), WALL (Java), HAWHAW.NET (.NET framework), HawTag (a JSP custom tag library), etc.

How does WURFL work?
When a mobile or non-mobile web browser visits your site, it sends a User-Agent header along with the request for your page. The user agent identifies the type of device and browser being used. Unfortunately, this information is very limited and at times not representative of the actual device. The framework therefore looks the device up in WURFL through the WURFL API and extracts the capabilities associated with it. Based on those capabilities, the framework generates the appropriate dynamic content: WML, HTML, XHTML, etc.
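A minimal server-side sketch, assuming the Java flavor of the WURFL API (class names and capability names vary across API versions, so treat this as an assumption-laden outline rather than canonical usage):

    import javax.servlet.http.HttpServletRequest;
    import net.sourceforge.wurfl.core.Device;
    import net.sourceforge.wurfl.core.WURFLManager;

    public class MarkupChooser {
        private final WURFLManager manager;  // initialized once from wurfl.xml

        public MarkupChooser(WURFLManager manager) {
            this.manager = manager;
        }

        // Look up the device by its User-Agent and pick a markup for it.
        public String chooseMarkup(HttpServletRequest request) {
            Device device = manager.getDeviceForRequest(request);
            String markup = device.getCapability("preferred_markup");
            // Other capabilities can drive layout decisions, e.g. screen width.
            String width = device.getCapability("resolution_width");
            return markup + " (screen width: " + width + "px)";
        }
    }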

Though there is a concern about the extra latency of the user-agent lookup, its advantages make it worth using. One of the biggest: if and when a new device enters the market, we do not need to change our application; we just update the WURFL data to keep the application optimized. It is very simple and the architecture is sound. Go for it!

10 things to do while migrating an ASP.NET App to Azure

Here I will list a few points that you should take care of while migrating an ASP.NET application to Azure. The reader should have a basic understanding of the Azure platform.

  1. Make sure that your application is 64-bit compatible, since Windows Azure is a 64-bit environment.
  2. Convert your website project to a web application project so it can be associated with a web role, since the Visual Studio tools for Azure do not support ASP.NET website projects.
  3. Ensure that web roles listen for HTTP requests on different ports if there is more than one web role in a single service deployment. Web services in the application can be associated with a web role.
  4. Migrate Windows services to a worker role. The OnStart() method of the Windows service class is equivalent to the Run() method of the RoleEntryPoint class.
  5. Consider moving some or all of the settings in your config files to service settings files, which do not require redeployment after every change; app.config and web.config files still work without any change.
  6. Make sure that your web application has no issues running on IIS 7, as Windows Azure web roles run in IIS 7 Integrated mode. In my case the application was using the WCSF framework, which has a known issue with IIS 7; fortunately there is a workaround to make WCSF work on IIS 7. Please note that IIS 7 Integrated mode has removed the HttpRequest context from the Application_Start event.
  7. Ensure that your application is not caching anything in the ASP.NET Cache or Session. There is a sample on CodePlex that illustrates how to host Memcached in a worker role; it also provides a .NET client for Memcached and a sample web role project for performance monitoring of your Memcached instance. Wherever you need to cache anything, you can use Memcached or Azure storage.
  8. If your web application allows users to upload files, upload and save them to a blob in Windows Azure storage (which I think is easiest), or save them to a simulated XDrive. All data in Windows Azure storage can be accessed with HTTP requests that follow REST conventions.
  9. Set the default page in web.config under the system.webServer node, which is supported in IIS 7, since you cannot configure IIS in Azure; see the snippet after this list.
  10. Get a SQL Server 2008 R2 client to work with SQL Azure. You can migrate the database schema to a SQL Azure database using the Generate Scripts wizard and use SSIS to move data into and out of SQL Azure. Once you have migrated the database successfully, you just have to change the connection string in your application.
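For point 9, the default page can be set like this in web.config (the page name is illustrative):

    <configuration>
      <system.webServer>
        <defaultDocument>
          <files>
            <clear />
            <add value="Default.aspx" />
          </files>
        </defaultDocument>
      </system.webServer>
    </configuration>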

Here’s a blog that discusses some more migration issues.

Machine Learning for Text Extraction

In a previous post we looked at the use of Natural Language Processing (NLP) techniques in text extraction. Several steps are involved in the processing, as each document passes through a pipeline of chained tasks.

A deep pipeline can take several seconds per document, so if one is dealing with thousands of documents an hour, the processing requirements could make the system nonviable. Care needs to be taken to evaluate the trade-off between the improvement in accuracy gained by adding pipeline tasks and the additional processing power they entail.

One reason for the slow speed of our email processing is that we parse the entirety of every email, regardless of whether it is of importance to us. In our case only 2% of the emails received will be of interest, so we would like to reduce the amount of text we process by ignoring the unwanted stuff. This process of weeding out irrelevant text should itself not take too long, otherwise our purpose is lost!

Machine Learning (ML), a key area of AI, offers a solution. GATE comes with various machine learning Processing Resources (PRs) implementing common ML algorithms such as Support Vector Machines (SVM), Bayes classification, and k-nearest neighbors (KNN). You "train" the algorithm using training sets of text samples.

Training is done by manually classifying sentences in a binary fashion: is this sentence of interest to me or not? Ideally you need thousands of representative sentences. The algorithm is then trained on this data: internally, the various features and annotations are used to reverse-engineer patterns from the manual classification.

In production you first run your input text through the machine learning pipeline task. If it predicts that the text is of interest, you run it through the rest of the pipeline; otherwise you ignore it. The problem is that this prediction is probabilistic, so there can be two kinds of mistakes. A false positive, where the classifier wrongly tells you that a dud document is of interest, merely wastes CPU cycles. A more troublesome mistake is a false negative, where a valid document is marked as of no interest.
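The flow can be summarized in a small sketch; Classifier and Pipeline here are hypothetical stand-ins for the trained ML task and the full NLP pipeline, not GATE APIs:

    public class Gatekeeper {
        interface Classifier { boolean isOfInterest(String text); }  // cheap first pass
        interface Pipeline { void process(String text); }            // expensive deep NLP

        private final Classifier classifier;
        private final Pipeline fullPipeline;

        Gatekeeper(Classifier classifier, Pipeline fullPipeline) {
            this.classifier = classifier;
            this.fullPipeline = fullPipeline;
        }

        void handle(String document) {
            // Only documents predicted relevant pay the cost of the deep pipeline.
            if (classifier.isOfInterest(document)) {
                fullPipeline.process(document);
            }
            // A false negative here silently drops a valid document, which is
            // the costly error type discussed below.
        }
    }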

In our case, for example, a false negative is an unacceptable error: we would miss reporting valid events to customers, and they would no longer be able to rely on our service. Unfortunately, ML algorithms are such that these two types of errors cannot be reduced independently: if you want to catch all valid documents, you also get a lot of duds eating up your CPU cycles.

In addition, ML can give you strange results. Bad data in your training sets can have a significant impact on your results, and debugging such issues is very difficult because of the non-deterministic nature of learning algorithms. A lot of trial and error is involved: mostly tedious work manually annotating documents, running different training sets, and validating the results on real data.

However, as with the deterministic NLP process using JAPE, the result is magic. Once your training sets are clean and complete, the ML task can weed out a significant share of unwanted documents. Iteratively adding runtime learning to the system (where you enhance the training sets as you go along) can bring dramatic improvements over time.

After this first experience with email parsing, we are now using NLP in another project. We have a product for recruiters in which resume parsing is an important piece. It currently parses candidate information using regular expressions and string matches.

The accuracy is around 80% for basic information, which is a problem since one out of five fields is missed or wrong. Using a slightly different pipeline from the one described above and building some heuristics into a custom PR, we have been able to reach over 95% accuracy in the lab. In addition, we are now extracting several other types of information that were previously considered too difficult to extract using traditional programming.

Our experiences have made us look at other aspects of NLP, like collaborative filtering and content-based recommendation engines, as well as enhanced search using NL techniques. You might see a post on these soon!