Friday, February 29, 2008

Magik FME Interface can introduce unwanted data on export

I have always been very interested in using Safe Software's Feature Manipulation Engine (FME) with Smallworld. I think FME is a very cool and powerful tool. Occasionally, though, I encounter problems with the FME Magik interface that I think are worth sharing with the Smallworld/FME community.

The current issue has to do with the FME-Smallworld interface introducing data to FME features that was never part of the Smallworld data to begin with. In the Smallworld VMDS (Version Managed Data Store) it is possible to define a field as "non mandatory". As the mode implies, that means that the data validation code will not require that you populate that field with an attribute value. VMDS does not have a data type called "NULL" like other databases do. So to allow a "non mandatory" attribute value to be represented as "empty" VMDS requires that the data dictionary be defined with an "unset_value" for that attribute. An "unset_value" is a value that will represent "no value" (or NULL; or _unset) for that attribute.

So, for example, if you defined an attribute called "installation_date" as data type :ds_date and you want to enable this attribute as "non mandatory", the CASE tool will prompt you for an "unset value". Actually, it will suggest that you use the "date" Magik class as the unset value. The "date" Magik class happens to be 01Jan1970. This means that whenever you enter 01Jan1970 into the "installation_date" attribute VMDS will interpret that you mean that you are actually assigning _unset to that attribute.

So, getting to the problem with the Smallworld-FME interface. A client of mine discovered some time ago that if you use the FME 3.3 interface and you have one of these non-mandatory date fields and the value of the attribute is _unset, then the Smallworld-FME interface will actually send the unset_value to the export format. That seems very wrong to me. If your Smallworld database states that the "installation_date" is _unset, I do not want my exported-via-FME format to now show that the "installation_date" is 01Jan1970.

Luckily for us, that bad use of unset_value was all coded in Magik and could be patched by us. In FME330 method dd_record_mixin.fme_export() there is a block of code...

#--- Use the default value if available
_if (fdata << _self.perform(fld_name)) _is _unset _andif
fdesc.responds_to?(:unset_value)
_then
fdata << fdesc.unset_value
_endif

... that basically says: if an attribute value (fdata) is _unset, then try to send that attribute's field_descriptor's (fdesc) unset_value instead. Wrong! If an attribute value is _unset then the code should send _unset to FME as well. I think it is dangerous for a data mapping interface to introduce data to an export format that does not exist in the source data. If you want to introduce data to the export format that does not exist in the source data then modify your .fme or .fmw files or use the fme_pseudo_field functionality. A user should have the confidence that the source data will be faithfully transmitted to FME without any data massaging.

If you are still with me to this point, you will be happy to know that I am finally getting to the point of this post. The Smallworld FME interface was completely rewritten at CST 4 and I assumed that the problem with ds_date (and ds_time) unset_values had been resolved. It has not! Instead, the problem has been made worse. The logic for deciding whether to default an _unset non-mandatory ds_date/ds_time value to its unset_value has been taken out of Magik and moved into the ACP/DLL files. And the logic no longer even tries to access the field descriptor's unset_value. Now the logic always sends "19700101" as the date string to FME regardless of what your data dictionary specifies for an unset_value.

Hopefully it is clear what this issue means for you. If you have non-mandatory date and time fields in your VMDS database, then all blank values will show up as 01Jan1970 as far as FME is concerned. I suppose you could work around this by modifying your .fme or .fmw files to test for "19700101" and convert it to a blank value. That could work some of the time but what if 01Jan1970 was a valid date in your business. You would never be able to know whether the "19700101" represented 01Jan1970 or _unset.

I have submitted a case report to GE on this issue yesterday and look forward to hearing their response.

If you are using FME at CST 4 (or higher), you can run this block of Magik code to identify a blank non-mandatory ds_date/ds_time attribute in your VMDS database and then send it to FME so that you can use one of the FME tools (I recommend FME Viewer) to see that your blank date/time values in Smallworld have been converted to a non-blank date in FME.


_block
# this block of code works at CST 4.x
sw_module_manager.load_module(:fme_engine)

# start the fme_tics_server if not already running
_local current_server
_if (current_server << fme_tics_server.current_server) _is _unset _orif
_not current_server.acp_running?
_then
fme_tics_server.start(_unset,_unset,:gis)
current_server << fme_tics_server.current_server
_endif

_local field_type , field_name , a_table , test_record
_constant TYPES << {:ds_date,:ds_time}

# make sure your VMDS dataset is correctly named here.
_constant V << gis_program_manager.cached_dataset(:gis)

# you can restrict the list of table names here to the tables
# that you regularly export to FME...
#_constant TABLE_NAMES << {}
# ... or you can choose to scan all user tables in the current
# collection.
_constant TABLE_NAMES << V.user_table_names

write("looking for a record that has _unset as a value for a non-mandatory date or time field...")

_for table_name _over TABLE_NAMES.fast_elements()
_loop @outside
a_table << V.collections[table_name]

_for fd _over a_table.physical_fields()
_loop
field_name << fd.name

_if (field_type << fd.type) _isnt _unset _andif
TYPES.includes?(field_type.name) _andif
_not fd.mandatory?
_then

_for a_rec _over a_table.fast_elements()
_loop
_if a_rec.perform(field_name) _is _unset
_then
test_record << a_rec

write("found that '",test_record,"' has a blank value in non-mandatory field '",field_name,"'")
print(test_record)
_leave @outside
_endif
_endloop
_endif
_endloop
_endloop

_if test_record _isnt _unset
_then
write("sending the record to the FME server. Please view the results using FME Viewer. Specifically compare the value of the blank date/time attribute in Smallworld to the value in FME.")
current_server.set_output_rwo_set(rwo_set.new_from({test_record}))
_else
write("Could not find a record with a blank non-mandatory date or time field.")
_endif
_endblock
$

In my example, the Smallworld record looked like...
fme_unset_date_tester8672:
:id 1237639
:date_field unset
:time_field unset

But the FME Viewer showed the attributes as...



Note the different values for the date attributes.

Good luck!

No comments: