Prevent Duplicate Records

Hello,

I am looking for suggestions on how to configure a flow.

The flow I have does the following:

  1. Pull csv files from FTP (Invoices)
  2. Does some transformation and lookups
  3. Does a lookup to see if an Invoice already exists
  4. Creates Invoice (HTTPS) (input filter based on the result of 3)

The problem is that the duplicate check does not work if the same invoice comes in in two different files in the same run. This is because all the duplicate checks are done before any invoices are created. I am looking for a way to prevent duplicates if the same invoice comes in twice during the same run.

0

Comments

10 comments
Date Votes
  • When doing transform in step 2 do it with javascript and remove the duplicates from json or change the invoice id for any duplicate.

    0
  • Script is one way to do it. However, if you have duplicates in different data pages, you won't find them. So, you'd have to make one large page of data. 

    Have you tried the composite HTTP operation Add and ignore existing? This lets you run a dynamic query against the target system to see if the record exists.   The lookup is like a mini export. This is a pretty slick way to do it.  

     

    1
  • Nuri, it is probably not guaranteed that the dupe invoices will be in the same page/context for your JavaScript functions.  Saying this another way, the source exports in your flows always break data into smaller pages, and then your JavaScript function will only ever be given a single page per invocation, and then it might not be possible to make sure the dupe records always land on the same page, etc...

    Mark, here is how I would do it.

    1. Combine 3 + 4 into a single import and use the option to "ignore existing records".  See first screenshot below.
    2. Set "Concurrency ID lock template" to make sure duplicate invoice records are never processed at the same time in your import when concurrency level on your connection > 1.  See second screenshot for context, and please read this article about this field.  https://docs.celigo.com/hc/en-us/articles/360050579791-Preserve-the-order-of-records-when-using-concurrency

     

    2
  • What he said ^^^^^^^^ :)

    0
  • Ha David!  You beat me to it, but I had to post my answer after all that typing!

    0
  • I got you on speed, but you beat on thoroughness and elegance.  Great response!

    1
  • Yes! Thank you, everyone. The Composit option was what I was looking for. I knew I had seen the option to define existing records before, but could not find it when I was just doing a POST. I assumed that it was an option in a different connector.

    0
  • I am still having a slight issue setting this up. I configured the Dynamic Lookup to look for existing records and pointed to a field in the response. How does the import know if it found a duplicate? Is it the absence of a value in that field, or does it need to match the value in the concurrency id? If it needs to match, how would I do compare the concatenation of two fields from the lookup?

    0
  • Nice, learned something new here !:)

    1
  • Hey Mark,

    See screenshot below for context.

    Basically, if the lookup finds any record then you know it is a duplicate (i.e. it exists already because you found it). The field in the HTTP response that you pick does not matter much here, and you should also ignore the 'Action to take if unique match not found' section since that is not applicable here too.

    The UX for creating these dynamic lookups is confusing, and the reason is because you can actually reuse these lookups definitions in your field mappings, but we (Celigo) really need to do a better job separately the two use cases.

    For the 'Concurrency ID lock template' field you should use your main invoice id field. Use the AFE 2.0 UI to guide you here.

    It is also worth noting that 'Concurrency ID lock template' is a pretty advanced concept, and a very simple way to skip this part for now is to set the "Concurrency level" on your connection to 1, and then this will force all records regardless of their id to get processed 1 at a time, in order, and then no need to establish a data/record level concurrency lock.  See bottom screenshot for this.

     

    0

Please sign in to leave a comment.

 

Didn't find what you were looking for?

New post