
Comparison algorithms and techniques

Good morning.
My name is Cristian Bursasiu and I applied for the Software Engineering Master because I believe I have reached the point in my career where a formal study of this field is required.
In the following minutes I will present the project proposal that I would like to complete by the end of this master, in the event that I am accepted as a student.
The study is called Comparison algorithms and techniques and is related to the well-known comparison tools, or diff tools, that software engineers use on various occasions.

Agenda
The presentation will contain:
- a short introduction
- some issues related to content comparison
- a few software engineering challenges
- a few algorithmic / theoretical challenges regarding content comparison that have to do with specialization and machine learning
- Q&A
- at the end, the bibliography that I used to compose this presentation

Introduction
There are a lot of tools for file/directory comparison on the market, and the majority of them have similar features, results and performance.
Usually such a product compares directory content in a simple, straightforward manner: it displays files that are missing from one directory or the other, files that are the same and files that are modified. All of this is based on file name matching.
The current tools usually do not "observe" the renaming of a file or the moving of a file to a (or another) subdirectory.
Regarding file comparison, current tools only treat text files in a general way, disregarding the higher logic of the information in the compared content.
From my software engineering experience and from information gathered from other engineers, I can say that there is considerable interest in diff tools, especially in older projects where the size and complexity of the code base is considerable.

Comparison issues 1
Imagine that we have two identical C++ files and in one of the files we move a line of code several lines down. The result of the comparison will show an extra line in the first file (in the place where the line used to be) and an extra line in the second file (in the place where the line was moved). The application does not realize that the line was moved, so it displays two modifications instead of one.

Comparison issues 2
Another example is renaming a variable: the diff tool will report a line with differences for each use of the variable instead of a single difference, namely that a variable was renamed.
I would like to mention that different coding rules could lead to major differences in what a current diff tool reports even if the logical/useful differences are minimal.

Comparison issues 3
There are many other examples of modifications that a current diff tool will not notice, mainly because the current tools only operate at text level and not at the higher logic of the file.
More complex issues arise if a part of the code was moved and then modified: the amount of modification applied to the code is also a factor in the likelihood that the corresponding pieces of code can be matched. This remark also applies to directory comparison: files can be moved, renamed and modified, so the degree of the content modifications will determine the ability of an application to correctly identify matches.

Software engineering challenges
In order to implement a better comparison tool I foresee some engineering difficulties that I would like to point out:
A good engineering approach is to build an expandable, open framework that will easily permit third-party or future adaptation. This can be achieved by providing a plug-in system that covers as much as possible of the foreseen future development.
Another issue that I would like to study is how to present higher-logic comparison results to the user without removing the possibility to fall back to text-level comparison/editing. The result presentation issue is also interesting to investigate if the application is to compare other types of files, like images, MS Word documents, Visual Studio project files, etc.
A feature that current comparison tools do not address is file specialization: images, video, code (language specific), XML, etc. The plug-in system could prove an effective solution for specialization, even for the specialized presentation issues.

Algorithmic / theoretical challenges 1
Amongst the more academic problems that can be foreseen regarding content comparison are the ones related to algorithms:
For comparing images I believe that there are some fields that I should investigate for applicable algorithms and techniques: neural networks, image processing, face recognition, AI, etc.
For comparing source code I believe that some algorithms must be found, investigated and experimented with regarding matching and identifying code snippets. Also, code analysis techniques could be useful in order to match files or code snippets.

Algorithmic / theoretical challenges 2
More advanced techniques can be searched for in the domain of machine learning.
Neural networks should provide the right place to look for a quick way of comparing images.
The iCub cognitive architecture is the result of a detailed design process founded on the developmental psychology and neurophysiology of humans, capturing much of what is known about the neuroscience of action, perception, and cognition. This framework should be studied in order to adopt engineering information for a better integration of the various concepts/modules of the project.
The study of pattern recognition, along with some ways of extracting higher-level information such as code analysis/parsing, could lead to better code snippet matching and/or file matching.
There are also some of the newer machine learning domains or sub-domains that could prove useful in improving the performance and results of the comparison methods and techniques; I believe computational neuroscience to be one of them.
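
The moved-line case from Comparison issues 1 can be illustrated with a minimal sketch. It is not part of the proposal itself. It assumes Python's standard difflib module as the underlying text-level diff, and the function name diff_with_moves is hypothetical. The idea is to run an ordinary line diff and then pair each deleted line with an identical inserted line, so that the pair is reported as one move instead of two modifications.

```python
import difflib


def diff_with_moves(old_lines, new_lines):
    """Return (moved, removed, added) for two lists of lines.

    moved   : list of (old_index, new_index, text)
    removed : list of (old_index, text) with no identical inserted line
    added   : list of (new_index, text) with no identical deleted line
    """
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    removed, added = [], []
    # Step 1: a plain text-level diff, as current tools do.
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend((i, old_lines[i]) for i in range(i1, i2))
        if tag in ("insert", "replace"):
            added.extend((j, new_lines[j]) for j in range(j1, j2))

    # Step 2: pair deletions with insertions that have the same text
    # (ignoring surrounding whitespace) and report them as moves.
    moved, still_removed = [], []
    for i, text in removed:
        match = next(
            (entry for entry in added if entry[1].strip() == text.strip()),
            None,
        )
        if match is not None and text.strip():
            moved.append((i, match[0], text))
            added.remove(match)
        else:
            still_removed.append((i, text))
    return moved, still_removed, added


old = ["int a = 1;", "int b = 2;", "int c = 3;", "int d = 4;"]
new = ["int b = 2;", "int c = 3;", "int a = 1;", "int d = 4;"]
moved, removed, added = diff_with_moves(old, new)
print(moved, removed, added)
```

This pairing only handles lines moved without change. A line that was moved and then edited would need a similarity measure instead of an equality test, which is the harder problem raised in Comparison issues 3.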

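The directory comparison limitation mentioned in the Introduction (matching by file name only, so renamed or moved files go unnoticed) can be sketched in a similar way. The sketch below is only an illustration under stated assumptions: the names snapshot and compare_snapshots are hypothetical, and files are matched by the SHA-256 digest of their content, so only files moved or renamed without modification are recognized.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file path (relative to root) to the SHA-256 digest of its content."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_snapshots(old, new):
    """Classify paths as same, modified, moved, removed or added."""
    result = {"same": [], "modified": [], "moved": [], "removed": [], "added": []}

    # Name-based matching, as in the tools described above.
    only_old = {}
    for path, digest in old.items():
        if path not in new:
            only_old[path] = digest
        elif new[path] == digest:
            result["same"].append(path)
        else:
            result["modified"].append(path)

    # Index the files that exist only on the new side by their content digest.
    unmatched_new = {}
    for path, digest in new.items():
        if path not in old:
            unmatched_new.setdefault(digest, []).append(path)

    # Content-based matching: identical content under another path is a move/rename.
    for path, digest in only_old.items():
        candidates = unmatched_new.get(digest)
        if candidates:
            result["moved"].append((path, candidates.pop(0)))
        else:
            result["removed"].append(path)

    result["added"] = sorted(p for paths in unmatched_new.values() for p in paths)
    return result
```

A usage example would be compare_snapshots(snapshot("project_v1"), snapshot("project_v2")), where both directory names are placeholders. Files that were moved and also modified end up in the removed and added lists, because exact hashing cannot express the degree of content modification.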