Yes, once you have your software implemented, and it works, you are left with the simple yet elusive question: is it correct?
I think I’m very close now to having a system where I can input something like
(EDIT '(define (x) 3) '(define (y) (* y y)))
and get the tree-edit distance between the ASTs representing two Scheme programs. The same tools will be used for other languages; I’ll use a standard parser (like ANTLR) and produce s-expressions to feed into the comparison tool. (At least, I think that’s the class of tool I’ll be looking for; worst case, I’ll write a parser using one of the Scheme parser/lexer-generation toolkits.)
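To make the idea concrete, here is a minimal sketch in Python of what such a comparison might compute. It is not the system described above. It uses a plain memoized recursion over ordered forests rather than an optimized algorithm, and it rests on a few assumptions: every insert, delete, and relabel costs 1, each parenthesized list becomes an interior node labeled "list", and each atom becomes a leaf. The names `read_sexp`, `to_tree`, `forest_distance`, and `edit` are hypothetical, chosen only for this illustration.

```python
import re
from functools import lru_cache

def read_sexp(text):
    """Parse one s-expression into nested lists of string atoms (toy reader)."""
    tokens = re.findall(r"[()]|[^\s()]+", text)

    def parse(i):
        if tokens[i] == "(":
            items, i = [], i + 1
            while tokens[i] != ")":
                item, i = parse(i)
                items.append(item)
            return items, i + 1  # skip the closing paren
        return tokens[i], i + 1

    tree, _ = parse(0)
    return tree

def to_tree(sexp):
    """Convert nested lists into hashable (label, children) tuples.

    Assumption: a list is an interior node labeled "list"; an atom is a leaf.
    """
    if isinstance(sexp, list):
        return ("list", tuple(to_tree(s) for s in sexp))
    return (str(sexp), ())

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not f:
        # Insert the root of g's rightmost tree; its children take its place.
        return forest_distance(f, g[:-1] + g[-1][1]) + 1
    if not g:
        # Delete the root of f's rightmost tree; its children take its place.
        return forest_distance(f[:-1] + f[-1][1], g) + 1
    (f_label, f_kids), (g_label, g_kids) = f[-1], g[-1]
    delete = forest_distance(f[:-1] + f_kids, g) + 1
    insert = forest_distance(f, g[:-1] + g_kids) + 1
    # Pair the two rightmost roots, relabeling if the labels differ.
    match = (forest_distance(f_kids, g_kids)
             + forest_distance(f[:-1], g[:-1])
             + (f_label != g_label))
    return min(delete, insert, match)

def edit(a_text, b_text):
    """Tree-edit distance between two s-expressions given as strings."""
    a, b = to_tree(read_sexp(a_text)), to_tree(read_sexp(b_text))
    return forest_distance((a,), (b,))

print(edit("(define (x) 3)", "(define (y) (* y y))"))  # prints 5
```

Under these assumptions the two definitions are 5 edits apart: relabel `x` to `y`, relabel `3` to one node of `(* y y)`, and insert the remaining three nodes. A different tree encoding (for example, using the head symbol of each list as the node label) or different edit costs would give a different number.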