Plan for Summer 2014 (Web App)
The tool currently resolves the following primitive types: string, int, long, double, boolean. The type resolution can be extended to also include dateTime, date, time, short, byte and anyURI, according to the XML Schema definition. Each type is resolved with a dedicated regular expression. We can construct more regular expressions for the additional types using information from the XML Schema Tutorial. Special consideration should be given to parameters that are lists of elements.
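As a rough illustration of how the additional types could be resolved, here is a minimal sketch. The patterns, the `resolve_type` name and the range checks for byte/short/int/long are assumptions made for this example; they only approximate the XML Schema lexical forms and are not the expressions used by the tool. In particular, anyURI is restricted here to absolute URIs with a scheme, and the "1"/"0" forms of boolean are ignored because they clash with integers.

```python
import re

# Hypothetical patterns, tried from most specific to most generic.
TZ = r"(Z|[+-]\d{2}:\d{2})?"
PATTERNS = [
    ("dateTime", re.compile(r"-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?" + TZ)),
    ("date", re.compile(r"-?\d{4,}-\d{2}-\d{2}" + TZ)),
    ("time", re.compile(r"\d{2}:\d{2}:\d{2}(\.\d+)?" + TZ)),
    ("boolean", re.compile(r"true|false")),
    ("anyURI", re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://\S+")),
]
INTEGER = re.compile(r"[+-]?\d+")
DOUBLE = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")

# Integer types are told apart by value range, smallest type first.
INTEGER_RANGES = [("byte", 2**7), ("short", 2**15), ("int", 2**31), ("long", 2**63)]


def resolve_type(value):
    """Return the name of the first type whose pattern matches the whole value."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(value):
            return name
    if INTEGER.fullmatch(value):
        number = int(value)
        for name, bound in INTEGER_RANGES:
            if -bound <= number < bound:
                return name
        return "string"  # too large for long; fall back to the generic type
    if DOUBLE.fullmatch(value):
        return "double"
    return "string"
```

Ordering matters in this sketch: dateTime is tried before date and time, and the numeric checks come last so that a date is never mistaken for a number.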
Enumerations are types that can take a value out of a predetermined and finite set of values. Enumerations can only be identified when multiple URLs are provided as input. The process itself does not depend on the number of URLs, but it cannot yield any result after processing just one, because no value can repeat. For every input parameter and response element, a Map is stored, whose keys are the values of the parameter and whose values are the frequencies with which each value is encountered. To count the occurrences of a value, we use exact matching. Similar values may be coincidental, but exactly equal values are probably on purpose. After all the URLs have been processed, if there is a value with a frequency greater than 1, the specific parameter or element is assumed to be an enumeration. The parameter is still presented as a primitive type to the user, but now there is a notification (an asterisk) and a tooltip indicating that this may be an enumeration, along with the possible values and their frequencies. The tooltip also contains a button that allows the user to automatically change the parameter into an enumeration (a param with options for the request or a simpleType with enumerations for the response).
By convention, the possible values of an enumeration are usually written in capital letters. Therefore, if both the value frequency condition above and the uppercase condition hold, we are more certain that the parameter is an enumeration, and it is presented as such in the final output. A similar button automatically turns the parameter back into a primitive type.
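The two conditions can be sketched as follows. This is only an illustration: the function name and the three labels ("primitive", "candidate", "enumeration") are made up for this example, and the input is assumed to be the list of values observed for one parameter across all processed URLs.

```python
from collections import Counter


def classify_enumeration(observed_values):
    """Classify one parameter from the values seen across all processed URLs.

    Returns a (label, frequencies) pair, where label is one of:
      "primitive"   - no value was seen more than once
      "candidate"   - some value repeats (shown with asterisk and tooltip)
      "enumeration" - some value repeats and all values are upper case
    """
    frequencies = Counter(observed_values)  # exact matching on the raw value
    if not any(count > 1 for count in frequencies.values()):
        return "primitive", frequencies
    if all(value.isupper() for value in frequencies):
        return "enumeration", frequencies
    return "candidate", frequencies
```

With a single URL every value has frequency 1, so the result is always "primitive", which matches the limitation described above.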
Currently, when a WADL file is parsed for any of the application's functions, there is no functionality to process simpleTypes. Since the tool will create simpleTypes for the enumerations, it is imperative to also be able to read them back. There is a SimpleType class already in the model, but it is never used.
We have encountered APIs, especially those with accounts or usernames, where each account is considered a resource (see Twitter and GitHub, for example). However, it is wrong to list every account as a separate resource. Instead, WADL can define resources that share the same child resources and/or methods but have different names, by enclosing their identifier in curly brackets (e.g., /{id}). To identify such resources, after building the WADL file we will have to process the resources further: find those that have the same parent resource or belong to the same service endpoint (i.e., a resources element with the same path) and that have the same structure (i.e., the same child resources and/or methods), merge them, and change their identifiers to something common (e.g., /{id} or /{account}).
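A minimal sketch of the merging step, under simplifying assumptions: the siblings of one parent are given as a plain dictionary from path segment to a pair of (methods, child paths), and structure is compared only one level deep. The data layout, the function name and the `{id}` placeholder are hypothetical and do not reflect the tool's model classes.

```python
from collections import defaultdict


def merge_similar_resources(siblings):
    """Merge sibling resources that expose the same methods and children.

    siblings maps a path segment to a (methods, children) pair of sets.
    Groups of two or more structurally identical resources are replaced
    by a single template resource such as "{id}".
    """
    groups = defaultdict(list)
    for name, (methods, children) in siblings.items():
        signature = (frozenset(methods), frozenset(children))
        groups[signature].append(name)

    merged = {}
    template_count = 0
    for (methods, children), names in groups.items():
        if len(names) > 1:
            template_count += 1
            # Number later templates so two merged groups do not collide.
            key = "{id}" if template_count == 1 else "{id%d}" % template_count
        else:
            key = names[0]
        merged[key] = (set(methods), set(children))
    return merged
```

For example, two account resources `alice` and `bob` that both offer GET with children `repos` and `followers` collapse into one `{id}` resource, while a structurally different `search` resource is kept as it is.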
Type confidence informs the user how certain the tool is about the resolved types. Next to the resolved type there is a circle, whose color indicates the type confidence. For each input parameter or response element, there is a Map that stores the frequency of all the types resolved while processing a set of input URLs. The final confidence of a type is the ratio of its frequency to the total number of URLs processed. The user is presented with the most generic type that can express all the encountered values for this parameter (e.g., everything can be expressed as String and every number can be expressed as Double). If the presented type is different from the most frequent type, the latter is presented in a tooltip along with its confidence.
If an existing WADL file is provided, it is considered an authority in terms of the types it defines. When additional URLs are processed, the confidence of the file's types is adjusted according to a special formula (see paper for more details).
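The confidence computation can be sketched as below. The widening order int, long, double and the fallback to string are assumptions chosen for the example, and the function names are hypothetical. The list of resolved types is assumed to be non-empty.

```python
from collections import Counter

# Hypothetical widening order for numeric types, narrowest first.
NUMERIC_ORDER = ["int", "long", "double"]


def most_generic(types):
    """Pick the type that can express every value seen for a parameter."""
    types = set(types)
    if len(types) == 1:
        return next(iter(types))
    if types <= set(NUMERIC_ORDER):
        return max(types, key=NUMERIC_ORDER.index)
    return "string"  # everything can be expressed as a string


def type_confidence(resolved_types, total_urls):
    """Return (presented type, most frequent type, confidence per type)."""
    frequencies = Counter(resolved_types)
    confidences = {t: count / total_urls for t, count in frequencies.items()}
    presented = most_generic(frequencies)
    most_frequent = frequencies.most_common(1)[0][0]
    return presented, most_frequent, confidences
```

If a parameter resolves to int for three URLs and to double for one, the presented type is double, while the tooltip would show int with a confidence of 0.75.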
After the generation of the WADL and any additional manual processing, the user should be able to download the file locally in its proper format.
The .wsmeta file contains all the meta-data of the generation process, including values, their frequencies and the type frequencies. If the user has used WSDarwin to generate a WADL, they can upload the WSMeta file instead of the WADL for future use of the application in order to improve its results. WSMeta files are particularly useful for cross-service comparison.
A pure REST request is an HTTP request without parameters. It simply asks for a resource by providing its URI. In this case a name has to be inferred for the HTTP method. Currently, we are using the name of the resource that is accessed. This is potentially problematic and has to be considered in the case of REST client adaptation.
Mapping of elements is the fundamental step of every comparison task. We first need to confirm that two elements of a service interface correspond to the same thing before comparing them to identify any changes between them. The WSDarwin service comparator employs four similarity measures to determine the correspondence between the elements of two service interfaces. The measures are then combined (as an arithmetic or weighted average) to determine the overall similarity between the elements, and the Hungarian algorithm is used to find the best mapping.
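The combination and mapping steps can be sketched as follows. Note that the assignment here is found by brute force over permutations, as a stand-in for the Hungarian algorithm: it gives the same optimal mapping but is only practical for a handful of elements. The function names are hypothetical, and the matrix is assumed to have at least as many columns as rows.

```python
from itertools import permutations


def combined_similarity(scores, weights=None):
    """Weighted average of the individual measures (equal weights by default)."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def best_mapping(matrix):
    """Return (row, column) pairs that maximize the total similarity.

    matrix[r][c] is the combined similarity between old element r and
    new element c. Brute force; assumes len(matrix) <= len(matrix[0]).
    """
    rows, cols = len(matrix), len(matrix[0])
    best = max(
        permutations(range(cols), rows),
        key=lambda perm: sum(matrix[r][c] for r, c in enumerate(perm)),
    )
    return list(enumerate(best))
```

For the matrix `[[0.2, 0.9], [0.8, 0.3]]` the first old element is mapped to the second new element and vice versa, since that pairing has the highest total similarity.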
According to a very popular programming convention, variables are named using more than one word (token) to convey their meaning as precisely as possible. These words are concatenated into a single string, either with underscores or with camel case. Therefore, simply calculating the string similarity between two names may not be enough, either because of the order of the words in the name or because long words may skew the overall similarity of the names.
So, we propose to calculate the name similarity on a token basis; the more common or similar tokens two names have, the more similar they are to each other. To make sure that we calculate the best possible similarities between the tokens, we map them using the Hungarian algorithm. Finally, to calculate the overall similarity between the two names, we take the average of the token similarities.
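A minimal sketch of the token-based name similarity. Several choices here are assumptions: tokens are compared with `difflib.SequenceMatcher`, the token mapping is found by brute force instead of the Hungarian algorithm, and the average is taken over the token count of the longer name, so unmatched tokens count as 0.

```python
import re
from difflib import SequenceMatcher
from itertools import permutations


def tokenize(name):
    """Split a name on underscores and camel-case boundaries, lowercased."""
    tokens = []
    for part in name.split("_"):
        tokens += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [token.lower() for token in tokens]


def name_similarity(name_a, name_b):
    """Average similarity of the best one-to-one mapping between tokens."""
    short, long_ = sorted([tokenize(name_a), tokenize(name_b)], key=len)
    if not short:
        return 0.0
    best_total = max(
        sum(SequenceMatcher(None, t, long_[j]).ratio() for t, j in zip(short, perm))
        for perm in permutations(range(len(long_)), len(short))
    )
    return best_total / len(long_)
```

With this sketch `userName` and `name_user` are fully similar despite the different word order and naming style, whereas a plain string comparison of the two names would score them lower.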
Semantics will allow us to map elements that may not have the same name but can potentially refer to the same concept (e.g., find and search). To calculate this similarity we use the WordNet service. This service returns the lexical relationship between two tokens (synonyms, antonyms, derivatives). According to the relationship we assign a similarity degree to a pair of tokens (+1 for synonyms, -1 for antonyms and +0.5 for derivatives). Again, we use the Hungarian algorithm to find the best mapping between the tokens. Then, these degrees are summed up and normalized by the size of the longest name, the one with the most tokens (as if all tokens were synonyms of each other), to get the overall semantic similarity between the two names.
Value similarity can be calculated for input parameters or output elements of primitive type. When we are comparing different versions of the same service, values can be provided as test data. If the values returned by the new version are similar to those provided as test data that correspond to the old version, then the elements can be mapped. In cross-service comparison, since we are comparing the latest version of both services, we can use the input request to invoke the services and then compare their response values to determine their similarity.
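The scoring can be sketched as below. The `RELATIONS` table is a hypothetical stand-in for the WordNet lookup, treating identical tokens as synonyms is an assumption, and the token mapping is again found by brute force rather than the Hungarian algorithm.

```python
from itertools import permutations

DEGREES = {"synonym": 1.0, "antonym": -1.0, "derivative": 0.5}

# Hypothetical stand-in for the lexical relationships returned by WordNet.
RELATIONS = {
    frozenset(["find", "search"]): "synonym",
    frozenset(["start", "stop"]): "antonym",
    frozenset(["pay", "payment"]): "derivative",
}


def token_degree(a, b):
    """Similarity degree of two tokens; unrelated tokens score 0."""
    if a == b:
        return 1.0  # assumption: identical tokens count as synonyms
    return DEGREES.get(RELATIONS.get(frozenset([a, b])), 0.0)


def semantic_similarity(tokens_a, tokens_b):
    """Best total degree, normalized by the token count of the longer name."""
    short, long_ = sorted([tokens_a, tokens_b], key=len)
    if not short:
        return 0.0
    best_total = max(
        sum(token_degree(t, long_[j]) for t, j in zip(short, perm))
        for perm in permutations(range(len(long_)), len(short))
    )
    return best_total / len(long_)
```

Dividing by the longer token count means the score reaches 1.0 only when every token of the longer name is matched to a synonym.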
Structural similarity can be applied only on complex elements to assess how similar they are in terms of their children. Therefore, the structural similarity between two leaf elements is 0.0. For complex elements, we calculate the three similarities for all their children and map the children using the Hungarian algorithm. The structural similarity between two complex elements is the average similarity of their children.
Refactoring changes are more complicated than simple additions, deletions and changes, but they can be expressed in terms of the simpler ones. The identification of refactorings in the interface of the service will show developers that these are fully automatically adaptable changes that have probably not affected the service's behavior. The current functionality of the tool can easily support the identification of the following refactorings.
Rename is simply a change delta where the changed attribute is the element's identifier. WSDarwin already identifies such changes, but it doesn't name them as refactorings.
Moves are identified by mapping added and deleted elements regardless of their current parent. If a deleted element in the old version is mapped to an added element in the new version, then it is considered that the old element was simply moved to the position of the new element. WSDarwin can recognize Move deltas.
If a complex element was added in the new version and its children were found to have been moved from another complex element, then this is considered an extraction; the moved elements were extracted into a separate entity.
This is the opposite of the extraction: a complex element was deleted and its children were merged (moved) into another complex element.
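Given the Move deltas and the sets of added and deleted complex elements, the last two refactorings could be derived roughly as follows. The tuple layout of a move, the function name and the labels "extract" and "inline" are assumptions made for this sketch and are not WSDarwin's delta model.

```python
from collections import defaultdict


def classify_refactorings(moves, added, deleted):
    """Derive extract/inline refactorings from move deltas.

    moves   - list of (element, old_parent, new_parent) tuples
    added   - names of complex elements added in the new version
    deleted - names of complex elements deleted from the old version
    """
    moved_into = defaultdict(list)
    moved_out_of = defaultdict(list)
    for element, old_parent, new_parent in moves:
        moved_into[new_parent].append(element)
        moved_out_of[old_parent].append(element)

    refactorings = []
    for parent in sorted(added):
        if parent in moved_into:  # new element populated by moved children
            refactorings.append(("extract", parent, moved_into[parent]))
    for parent in sorted(deleted):
        if parent in moved_out_of:  # deleted element whose children survive
            refactorings.append(("inline", parent, moved_out_of[parent]))
    return refactorings
```

For instance, if `street` and `city` moved from `Customer` into a newly added `Address` element, the sketch reports a single extraction of `Address`.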
There are many attributes that can be specified for WADL or XSD elements other than their identifier, such as multiplicity, value ranges, whether the element is optional, etc. These attributes can play a role in the client application. Therefore, if the adaptation is to be considered complete, these attributes should be parsed, compared and taken into account during the client adaptation.
The mapping between WADL elements, calculated as previously discussed, can be shown by presenting the elements side-by-side in the web app interface. If the elements of the two files don't follow the same order, the second file needs to be rearranged to clearly show the mapping.
In case of added or deleted elements, the corresponding space in the other file should be clearly shown in the editor, so that the element mapping is not disturbed.
In case of change deltas, we need to clearly show the changed attribute, perhaps with a darker shade of yellow.
See the mapping as it was discussed previously.
If we follow this mapping process, we will also be able to map higher-level elements (methods, resources, operations, interfaces) and not just input parameters and response elements, as is currently the case.
The code for mapping by value needs to be refactored and cleaned.