EDQ and Siebel – Batch DeDuplication

Having set up Siebel and Oracle Enterprise Data Quality (EDQ from here on), I wanted to put it through its paces.

Real Time De-duplication works like a charm, provided you kick off the Real Time jobs in Director and have your Web Service URLs set up correctly.

Batch De-duplication, however, uses a different mechanism (JMX), and the out-of-the-box installation and configuration doesn’t quite leave you in a position to run batch dedupe through the Siebel Client.

After a really useful conversation with Mike, Nick and Richard at Oracle (experts in EDQ and its integration into Siebel), I was able to make appropriate changes to the configuration to enable batch de-duplication. My heartfelt thanks go out to them all for their dedication and commitment to helping lowly developers like myself!

JMX Port Configuration

By default, EDQ on Windows configures JMX to listen on port 9005. However, by default the Siebel Connector is configured to look on port 8090.

To rectify:

  1. Modify the entry in dnd.properties to match the port specified in director.properties. For example:

    jmxserver = hostname:9005

  2. There is no need to restart anything. The next job to use the DQ Connector will automatically re-read the configuration.
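If you want to sanity-check the port before submitting a job, something along these lines may help. It is only a rough sketch in Python: the parser handles plain `key = value` lines rather than the full Java properties syntax, and the function names are my own, not part of EDQ or the Siebel connector.

```python
import socket

def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props

def jmx_endpoint(props):
    """Split the jmxserver entry (host:port) into a (host, port) tuple."""
    host, _, port = props["jmxserver"].rpartition(":")
    return host, int(port)

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Illustrative fragment only; read your real dnd.properties instead.
sample = "# Siebel connector settings\njmxserver = hostname:9005\n"
host, port = jmx_endpoint(parse_properties(sample))
print(host, port)
```

Calling `port_is_open(host, port)` from the Siebel Server host then tells you whether anything is listening on that port at all. It does not prove that the listener is EDQ's JMX server.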

JMX Interface

By default, the JRE used by EDQ publishes the JMX interface on localhost (127.0.0.1). This may be specific to my VirtualBox setup, so it may or may not cause you a problem. However, if you see errors in the connector log relating to connecting to JMX, you may be experiencing this problem.

To resolve:

  1. Create a new file called jre.properties in the same folder as the director.properties file on the EDQ server. Using the default installation, this will be in:

    C:\Program Files\Datanomic\dnDirector\config

  2. Within the file, add the following configuration item:

    java.rmi.server.hostname = <EDQ HostName>

  3. Restart the Datanomic Application Server service
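Before settling on a value for java.rmi.server.hostname, it can be worth checking what that name actually resolves to, since a name that maps to 127.0.0.1 would leave you with the same problem. The snippet below is a small sketch of that check in Python; the function name is hypothetical and the result depends on which machine you run it from.

```python
import ipaddress
import socket

def resolves_to_loopback(hostname):
    """Return True if every IPv4 address the name resolves to is a loopback address."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    addresses = {info[4][0] for info in infos}
    return all(ipaddress.ip_address(a).is_loopback for a in addresses)

print(resolves_to_loopback("127.0.0.1"))
```

Run it on both the EDQ server and the Siebel Server host with the name you intend to use: if either reports a loopback address, pick a different name or fix the hosts file or DNS entry first.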

Test Batch DeDuplication

Testing is now straightforward:

  1. From within Siebel, navigate to Site Map > Administration – Server Management > Jobs
  2. Create a new job, using the ‘Batch Account match’ template
  3. Submit the job and await completion
  4. Navigate to Site Map > Administration – Data Quality > Duplicate Accounts
  5. See your deduplicated data and merge!

Having now used EDQ alongside Siebel, I am really, really impressed. Previous DQ attempts have felt really clunky but EDQ fits really nicely alongside Siebel. The real time deduplication works well and is very easy to configure. Batch cleansing and deduplication also work flawlessly, once the tweaks above have been applied.

I get the impression that Oracle are really committed to this software as a solution, too. Whereas SSA-NAME3 and ISS seemed like stopgap solutions, EDQ is feeling like an integrated technology and something that Oracle are building into their Fusion and Siebel roadmaps. Here’s hoping!


OEDQ and Siebel – Configuring SSL

UPDATED: Following some really useful feedback from a very kind gentleman from Oracle, it has been noted that EDQ will actually allow client applications, such as Siebel, to invoke the Web Services over standard HTTP. As such, there is no need to configure Siebel to use SSL as described below. Simply specify the HTTP URLs in dnd.properties, using the default 9002 port. For example:

httpprefix = http://OEDQ9-VM:9002/dndirector/webservices
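A quick way to double-check that the value you have typed really is the plain HTTP endpoint is to pull it apart with a URL parser. This is just an illustrative Python sketch; `describe_prefix` is a name I have made up.

```python
from urllib.parse import urlsplit

def describe_prefix(httpprefix):
    """Break an httpprefix value into the parts worth eyeballing."""
    parts = urlsplit(httpprefix)
    return {
        "uses_ssl": parts.scheme == "https",
        "host": parts.hostname,  # urlsplit lower-cases the host name
        "port": parts.port,
        "path": parts.path,
    }

info = describe_prefix("http://OEDQ9-VM:9002/dndirector/webservices")
print(info)
```

If `uses_ssl` comes back as True, the connector will be going through the SSL listener and the certificate steps described in the rest of this post become relevant again.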

One thing that really stumped me was getting Siebel to talk to OEDQ via the predefined Web Services.

Unlike other configuration that I’m used to, the OEDQ Web Service URLs are not stored or mastered in Siebel – they are defined implicitly when you install OEDQ. You merely tell Siebel where to find them via the config file. This causes a problem as the Web Services are configured to use https (SSL). What you’ll see when you add a new Account or Contact is an error in the adapter log:

INFO: 11-Jun-2012 22:48:33: datacleanse failsafe fallback after: problem sending web service request: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target (code 234,307)

The way around this is rather convoluted, but interesting and effective nonetheless:

Generate a Self Certified SSL Certificate

  1. Connect to your OEDQ server machine
  2. Note that the current Apache keystore resides, by default, in <Install Folder>\Datanomic\dnDirector\tomcat\6.0\conf\dncert.p12
  3. To make things simple, we’re simply going to create a new keystore, so that we can revert to default at any point. We’ll keep the same name and keystore type (PKCS12) so we don’t have to reconfigure Apache
    • This is easily changed in server.xml, though!
  4. Rename the existing dncert.p12 file
  5. Open a command prompt and CD to your Java JRE bin folder
  6. Execute the following command line to generate a new keystore and certificate:
    keytool.exe -genkey -keyalg RSA -alias selfsigned -keystore <Install Folder>\Datanomic\dnDirector\tomcat\6.0\conf\dncert.p12 -storetype PKCS12 -storepass dndirector -validity 360 -keysize 2048
  7. Obviously, you’ll need to substitute your OEDQ installation folder location into the command line above
  8. You’ll be prompted for some information – the thing to do here is use the machine name when asked for your First Name and Last Name
  9. Go into Services and restart the ‘Datanomic Application Server’ service
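If you would rather script step 6 than answer the prompts, keytool accepts the certificate subject through its -dname option. The sketch below only builds the argument list, mirroring the command above. The helper name and the sample host name are mine, and depending on your Java version keytool may still prompt for a key password.

```python
def genkey_command(keystore_path, hostname, storepass="dndirector"):
    """Build the keytool arguments for a self-signed certificate whose CN is the machine name."""
    return [
        "keytool", "-genkey",
        "-keyalg", "RSA",
        "-alias", "selfsigned",
        "-keystore", keystore_path,
        "-storetype", "PKCS12",
        "-storepass", storepass,
        "-validity", "360",
        "-keysize", "2048",
        "-dname", f"CN={hostname}",  # replaces the First Name and Last Name prompt
    ]

# The keystore path is a placeholder: substitute your own OEDQ installation folder.
cmd = genkey_command(r"<Install Folder>\Datanomic\dnDirector\tomcat\6.0\conf\dncert.p12", "OEDQ9-VM")
print(" ".join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` from the JRE bin folder would execute it. I have left that call out so the sketch has no side effects.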

That takes care of Apache.

Test the Certificate

  1. On the Siebel Server host, fire up Internet Explorer and navigate to the OEDQ Web Services URL (by default: https://<HOST>:9004/dndirector/webservices)
  2. You’ll get a certificate error. Continue to the web site then click the ‘Certificate Error’ button in the top right of IE, next to the address bar, and select ‘View Certificates’
  3. Click ‘Install Certificate’
  4. Using the wizard, place the certificate in the following store: “Trusted Root Certification Authorities”
  5. Click OK then close IE
  6. Reopen, navigate to the URL above and notice that the nature of the certificate has changed. We’re now ready to tell the Siebel adapter to use the new certificate

Tell the JRE instance to trust the new certificate

  1. On your Siebel Server host, download the certificate from the OEDQ server by opening IE and going to Tools > Internet Options > Content > Certificates > Trusted Root Certification Authorities
  2. Select your OEDQ certificate and select ‘Export’
  3. Use the wizard to export a ‘DER Encoded binary x.509 (.CER)’ certificate to your root drive (C:\root.cer)
  4. When you set up the OEDQ adapter in Siebel, you set a ‘javalib’ property in the ‘dnd.properties’ file – note down this location
  5. Open a command prompt and navigate to the bin folder of this Java instance
  6. Execute the following command to import the certificate into the Java keystore:
keytool.exe -import -alias oedq -keystore ..\lib\security\cacerts -file c:\root.cer
    • Note that the default keystore password is ‘changeit’
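To confirm that the certificate you imported is the one you exported, you can compare fingerprints. Running `keytool -list -v -alias oedq -keystore ..\lib\security\cacerts` prints a SHA1 fingerprint for the entry, and the Python sketch below computes the same style of fingerprint from the exported DER file. The function name is my own and the file path simply mirrors the one used above.

```python
import hashlib

def sha1_fingerprint(der_bytes):
    """Return the SHA-1 digest of DER-encoded certificate bytes as colon-separated upper-case hex."""
    digest = hashlib.sha1(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))

def fingerprint_of_file(path):
    """Read a DER encoded .cer file and return its SHA-1 fingerprint."""
    with open(path, "rb") as handle:
        return sha1_fingerprint(handle.read())

# Example, assuming the export location from step 3:
# print(fingerprint_of_file(r"C:\root.cer"))
```

If the two values differ, the keystore entry came from a different certificate, which would explain a lingering PKIX error.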


That’s it! You should now be able to invoke the real time Web Services within Siebel to deduplicate Account and Contact data.

Any problems, please use the comments box below and I’ll see what I can do to help.


OEDQ and Siebel – Configure OEDQ

UPDATED: I’ve had some really useful feedback from a kindly gentleman from Oracle. I’m informed that there is no requirement for an Oracle client installation on the OEDQ server if using an Oracle staging area: EDQ uses JDBC to make the connection itself. In that respect, you must also specify the database SID, not the Service Name, when you configure dnd.properties and the Staging Area database. I’ve updated the article to reflect these comments and would like to give my sincere thanks to the person who contacted me.

Having set up the Siebel Server configuration, we need to tweak OEDQ to start matching and cleansing data from our Siebel system.

Once again, Oracle have this covered with some detailed installation instructions. The basic steps are:

  1. Install an Oracle Database instance to store match results. I created a simple, 11g Enterprise instance on the OEDQ server, a database named OEDQ and a user account called OEDQ
  2. Copy the contents of ‘config.zip’ into the OEDQ installation folder
    • You MUST restart the OEDQ Server at this point, via the Services control panel. If you do not, the .dxi import step will fail
  3. Import the edq-cds-9.0.x.dxi file via Director
    • Check that all the jobs and processes from the import file have successfully been created in your repository. If not, check step 2
  4. Run the appropriate SQL to create the temporary tables in the database instance
  5. Update the ‘Batch Data Staging Area’ entry, in Director, to point to your staging database
  6. Revisit the dnd.properties file on your Siebel Server to verify and update appropriate configuration options, specifically the database host, instance name and schema as created above
  7. Run the ‘Real Time’ jobs in OEDQ Director, so that the Web Services invoked by Siebel can access the OEDQ functionality
    • If the jobs fail to start, again check step 2

I came across an issue with the #Database Settings section of dnd.properties. Note that the correct format for the Oracle database connection is:

oracle:sid@host[:port]/user/pw

  • The :port may be omitted if the port is the default 1521
  • The /pw may be omitted if the password is the same as the username
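To make the format and its two defaults concrete, here is a small Python sketch that parses a value of this shape. It is my own reading of the format as described above, not code from the connector, and the names are illustrative.

```python
import re

# oracle:sid@host[:port]/user[/pw]
PATTERN = re.compile(
    r"^oracle:(?P<sid>[^@]+)@(?P<host>[^:/]+)(?::(?P<port>\d+))?"
    r"/(?P<user>[^/]+)(?:/(?P<pw>.+))?$"
)

def parse_connection(value):
    """Split a connection value into its parts, applying the documented defaults."""
    match = PATTERN.match(value)
    if match is None:
        raise ValueError(f"not in oracle:sid@host[:port]/user/pw form: {value!r}")
    user = match.group("user")
    return {
        "sid": match.group("sid"),
        "host": match.group("host"),
        "port": int(match.group("port") or 1521),  # default listener port
        "user": user,
        "password": match.group("pw") or user,  # password defaults to the username
    }

print(parse_connection("oracle:OEDQ@dbhost/OEDQ"))
```

Here `dbhost` is a made-up host name. The short form above is equivalent to writing `oracle:OEDQ@dbhost:1521/OEDQ/OEDQ` in full.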

And that’s almost it in terms of set up of the two applications.

One last step is required, however: to get Real Time DeDuplication to work, we need to tweak the Apache configuration so that our Siebel instances can talk to the OEDQ Web Server via SSL. I’ll cover this in my next post.


OEDQ and Siebel – Configure Siebel

Now that we have a working instance of OEDQ, we’re going to look at how we integrate it into Siebel.

Again, Oracle really have got the documentation right this time: the instructions are extremely detailed and simple to follow. There are two stages: configure Siebel and configure OEDQ.

The initial configuration of Siebel is really straightforward and you should simply follow these steps in the installation guide:

  1. Copy the DLL file to the Siebel Server machine
  2. Create and configure the dnd.parms file
  3. Copy across appropriate JAR files
  4. Configure the dnd.properties file – this will be covered in more detail in the next post
  5. Enable the ‘Data Quality’ component group and restart the Siebel Server
  6. Configure the Data Quality matching parameters on the Data Quality component and the Object Manager component in use (I’m using Public Sector)
  7. Create Job Templates for batch Account, Contact and Address cleansing
  8. Configure the ‘Data Quality Administration’ options
  9. Set up ‘Third Party Administration’ options, including both the Field Mappings and Vendor Parameters – arduous, but really straightforward
  10. Enable User Preferences

I’m in the process of writing a Business Service, in the style of the Oracle Policy Automation, to populate the OEDQ seed data – I’ll update the post with a SIF file once I’m done.

That’s all you need to do on the Siebel side for now but we’re still not ready to test. Next, we’ll look at configuring OEDQ to work alongside Siebel and finish up the Siebel config, allowing us to test.


OEDQ – Base Installation

Installing OEDQ couldn’t be easier. The installation documentation is extremely clear and concise and the installer itself is a proper, Windows installer – no OUI in sight and this is no bad thing as far as I’m concerned.

The steps I took to install were:

  1. Prepare a Windows 2003 or 2008 Server VM
  2. Navigate to the Oracle Product Page for OEDQ
  3. Download the OEDQ 9.0.3 distribution
  4. Unzip the file
  5. Run the installer as administrator
  6. I checked everything, just for evaluation purposes, and installed to the D: drive on my VM

This will install the basic components, dashboard and administration pages.

If you want to use any of the client or administration tools directly on your server VM, you’ll need to install Java JRE 6 with Java Web Start. As luck would have it, Oracle have kindly provided this in installer form in the ’3rd party’ folder in the installation archive. For the sake of keeping things simple, I installed this on my Server VM on the D: drive.

The first thing you’ll have to do now is select the Enterprise Data Quality Launchpad from the start menu and click ‘Change Password’. The default is dnadmin/dnadmin. Once you’ve set a new password, you’ll be able to explore the product. If you’re prompted to download a ‘jnlp’ file, this is a Java Web Start file: choose to save then open, and Java Web Start will do the rest, presenting you with the appropriate application.

 

Finally, while you’re here, extract the ‘edq-cds-9_0_1_(100).zip’ file to the OEDQ server and the ‘siebelconnector.zip’ file to a folder location on your Siebel server – we’ll need these for the next steps.

Next time, we’ll look at the Siebel configuration and how we go about integrating the two products.


Oracle Enterprise Data Quality (OEDQ)

Back on a previous engagement, I looked into using Siebel Data Quality Manager in a Siebel 7.8 environment. The premise was pretty straightforward: if a user tried to create a Contact or an Account that already existed, Siebel should prompt them instead to select an existing record. Sounds relatively straightforward.

The reality is a complex mishmash of technologies and messages.

Back in those days, Siebel came complete with a Data Quality Management (DQM) component pre-licensed and installed on demand. It was based on old SSA-NAME3 technology and came in two parts – interactive and batch. Using the interactive mode, when creating a new record, Siebel would prompt the user to select from a list of pre-existing records that matched their input data – pattern matching and fuzzy logic were employed to give ‘matching’ records a percentage score which showed the user how likely the record shown matched the data entered. Batch mode would allow cleansing of existing data, automatically merging records that were deemed to be ‘the same’.

The built in functionality was clunky and flawed – users were able to circumvent the triggering of DQM and enter their own data. The rules involved in matching records were also enclosed in a black box – there really wasn’t much one could do to influence the behaviour.

Siebel 8 brought with it a more flexible solution – Oracle Data Quality Manager. This manifested itself in the form of a third party product called Identity Search Server or ISS. This product provided configurable rules and rule sets that one could use to fine tune data matching rules. The integration was more open – pre-defined and configurable Web Services could be tweaked and invoked from anywhere in the system. It allowed for a more configurable, if somewhat convoluted, solution to data quality requirements.

With the release of Siebel 8.2.2 and with the Oracle acquisition machine charging forward at full pelt, we seem to have changed tack once again. My latest investigation on Oracle’s web pages reveals something I’ve not come across before: Oracle Enterprise Data Quality (OEDQ). This seems to be based on a product that Oracle acquired with the company Datanomic in July 2011. I’ve been unable to uncover any references to ISS, so I’m guessing it’s been superseded by Oracle’s own offering.

I’m cool with that – as long as OEDQ delivers the goods and lets me utilize its functionality from within Siebel. I’ve downloaded OEDQ from Oracle’s Technology Network (OTN) resource pages – over the next few articles, I’m going to install and configure the tool and see how I can get it to work with Siebel.
