I was one of 10 students from the United States selected to participate
in the Research Experiences for Undergraduates (REU) program in the
Computer Science department at Missouri University of Science
and Technology.
Throughout this program we conducted research in the field of cloud
computing, guided by PhD students. I worked with a group made up of three
students from different countries and backgrounds. The title of our
research was "A Secure Document Comparison Protocol." Text-based document
comparison for similarity is a common task in a variety of fields.
However, when the document data is confidential, secure protocols
are necessary to protect the confidentiality of both parties' documents.
The comparison context for this project was a client-server comparison,
in which the client queries a single document to be compared to a collection
of documents owned by the server. This type of comparison is important in
select applications. For instance, a doctor may wish to compare the symptoms
of one of his patients against a database of patient records held by another
doctor. However, releasing patient medical data is unlawful, so a comparison
protocol must leak no information about patients who are not under the querying
doctor’s care. A second example is academic journal submissions. A journal
receiving a paper submission needs to compare the paper against
digital archives to detect plagiarism, without revealing the
content of the submission.
Definition 1: A document comparison protocol is secure if it
- does not reveal the server documents to the client
- does not reveal the client document to the server
- does not reveal any comparison scores to the server
If the protocol revealed comparison scores to the server, the server could
learn which documents the client document most resembles, as well as how
similar it is to the record collection overall. That is unacceptable
because the server would then learn information about the client document.
Our contributions include:
- a secure comparison protocol
- an implementation of the protocol as a Java GUI application
The protocol fully satisfies the notion of security detailed by Definition 1.
We developed a client/server application that securely computes the similarity between
two documents. During the computation, neither party can read the other party's
documents, and the protocol protects the data from unauthorized users. In our
application, the user first selects the collection to compare against and then
submits the document to be compared to that collection. The procedure is secure
because the application encrypts the document before sending it to the server,
which prevents the server from learning the information it contains. For example,
suppose a doctor would like to know the risk of heart disease for a specific
patient. If the doctor requested a patient's file from a server, that patient's
privacy would be violated, because an unauthorized user would gain access to the
data. With our application, the doctor instead sends a document containing the
medical history of his or her own patient. The application encrypts the document
before sending it, so the server receives only the encrypted document and has no
way to learn anything about the patient. The server then calculates the similarity
between the encrypted document and every document in the collection chosen by the
user, and it returns a similarity score between 0 and 1 for each document.
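The flow described above can be illustrated with a small sketch. It is not the project's actual code: it only assumes, based on the skills listed for this project, that documents are represented as term-frequency vectors (Vector Space Model) and encrypted with the Paillier cryptosystem, whose additive homomorphism lets the server compute an encrypted dot product without seeing the query. The class name, method names, the 512-bit demo key size, and the fixed-point SCALE are all hypothetical choices made for illustration.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Hypothetical sketch: names, key size, and fixed-point scale are assumptions.
class PaillierSimilaritySketch {
    static final SecureRandom RNG = new SecureRandom();
    static final long SCALE = 1000; // fixed-point scale for unit vectors (assumption)

    final BigInteger n, nSquared;          // public key (generator g = n + 1)
    private final BigInteger lambda, mu;   // private key, stays with the client

    PaillierSimilaritySketch(int bits) {
        BigInteger p = BigInteger.probablePrime(bits / 2, RNG);
        BigInteger q;
        do { q = BigInteger.probablePrime(bits / 2, RNG); } while (q.equals(p));
        n = p.multiply(q);
        nSquared = n.multiply(n);
        // With g = n + 1 and equal-length primes, lambda = phi(n) and mu = phi(n)^-1 mod n.
        lambda = p.subtract(BigInteger.ONE).multiply(q.subtract(BigInteger.ONE));
        mu = lambda.modInverse(n);
    }

    // Client side: Enc(m) = (1 + n)^m * r^n mod n^2 for a random r coprime to n.
    BigInteger encrypt(long m) {
        BigInteger r;
        do { r = new BigInteger(n.bitLength(), RNG); }
        while (r.signum() == 0 || r.compareTo(n) >= 0 || !r.gcd(n).equals(BigInteger.ONE));
        BigInteger gm = n.add(BigInteger.ONE).modPow(BigInteger.valueOf(m), nSquared);
        return gm.multiply(r.modPow(n, nSquared)).mod(nSquared);
    }

    // Client side: Dec(c) = L(c^lambda mod n^2) * mu mod n, with L(x) = (x - 1) / n.
    long decrypt(BigInteger c) {
        BigInteger l = c.modPow(lambda, nSquared).subtract(BigInteger.ONE).divide(n);
        return l.multiply(mu).mod(n).longValueExact();
    }

    // Server side: needs only n^2. Uses Enc(a)^k = Enc(k*a) and Enc(a)*Enc(b) = Enc(a+b).
    static BigInteger encryptedDot(BigInteger[] encQuery, long[] serverVec, BigInteger nSquared) {
        BigInteger acc = BigInteger.ONE; // 1 decrypts to 0, so it is a neutral starting value
        for (int i = 0; i < encQuery.length; i++) {
            BigInteger term = encQuery[i].modPow(BigInteger.valueOf(serverVec[i]), nSquared);
            acc = acc.multiply(term).mod(nSquared);
        }
        return acc;
    }

    // Scales a term-frequency vector to unit length, then rounds to fixed-point integers.
    static long[] toUnitFixedPoint(int[] tf) {
        double norm = 0;
        for (int f : tf) norm += (double) f * f;
        norm = Math.sqrt(norm);
        long[] out = new long[tf.length];
        for (int i = 0; i < tf.length; i++) out[i] = Math.round(tf[i] / norm * SCALE);
        return out;
    }

    // Full round trip for one server document; returns the score only the client can read.
    static double similarity(int[] clientTf, int[] serverTf, int bits) {
        PaillierSimilaritySketch client = new PaillierSimilaritySketch(bits);
        long[] q = toUnitFixedPoint(clientTf);
        BigInteger[] encQuery = new BigInteger[q.length];
        for (int i = 0; i < q.length; i++) encQuery[i] = client.encrypt(q[i]);
        // Only encQuery and the public modulus would be sent to the server.
        BigInteger encScore = encryptedDot(encQuery, toUnitFixedPoint(serverTf), client.nSquared);
        // The server returns encScore without being able to decrypt it.
        double score = client.decrypt(encScore) / (double) (SCALE * SCALE);
        return Math.min(1.0, score); // clamp small fixed-point rounding overshoot
    }

    public static void main(String[] args) {
        int[] clientDoc = {2, 0, 1, 3}; // made-up term frequencies
        int[] serverDoc = {1, 1, 0, 3};
        System.out.printf("similarity = %.3f%n", similarity(clientDoc, serverDoc, 512));
    }
}
```

In this sketch the server handles only ciphertexts, so it learns neither the query terms nor the resulting score, which matches the three conditions of Definition 1. Pre-normalizing both vectors to unit length is one way to obtain a cosine-style score between 0 and 1 for non-negative term frequencies. A 512-bit modulus keeps the demo fast but is far too small for real use.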
Skills: Java, C, Java Native Library, Encryption, RSA, Unix, LaTeX, Research,
Cloud Computing, Number Theory, Vector Space Model, Lucene Library, Paillier
Cryptosystem, Homomorphic Encryption, k-NN Query, Networking, Data Communication, Multithreading