How to Verify Data#
There are two ways Kumodd can verify data:
- -list: List metadata in google drive, and use it to verify local files if present
- -verify: use local cached metadata to verify local files
When listing (-list option), Kumodd downloads metadata from Google Drive and verifies whether local data are consistent with Google Drive.
When verifying (-verify option), Kumodd uses the previously saved YAML metadata on disk to verify whether files and metadata on disk are correct. If there are multiple revisions, Kumodd verifies all downloaded revisions. Because it uses cached metadata, it does not connect to Google Drive, and does not require any network access or credentials.
Either way (-list or -verify options), Kumodd confirms whether each file’s MD5, file size, and Last Modified and Last Accessed are correct. In addition, it confirms whether the MD5 of the metadata on disk matches the MD5 of the metadata originally read from Google Drive.
Download Google Drive Metadata to Verify Local Data#
To retrieve metadata from Google Drive and review accuracy of the data and metadata on disk, use the -list option.
kumodd -list pdf -col verify
Status File MD5 Size Mod Time Acc Time Metadata fullpath
valid match match match match match ./My Drive/report_1424.pdf
Verify Local Data Without Accessing Google Drive#
To review accuracy of the data and metadata using previously downloaded metadata, use the -verify option. This does not read data from Google Drive, but rather re-reads the previously saved YAML metadata on disk, and confirms whether the files’ MD5, size, Last Modified, and Last Accessed time are correct. This also confirms whether the MD5 of the metadata match the previously recorded MD5.
kumodd -verify -col verify
Status File MD5 Size Mod Time Acc Time Metadata fullpath
valid match match match match match ./My Drive/report_1424.pdf
To show different columns, use the -col option. To see the MD5s, use -col md5s. See How to Configure for furhter usage of the -col option.
kumodd -verify -col md5s
Status File MD5 Size Mod Time Acc Time Metadata MD5 of File MD5 of Metadata fullpath
valid match match match match match 5d5550259da199ca9d426ad90f87e60e 216843a1258df13bdfe817e2e52d0c70 ./My Drive/report_1424.pdf
To verify only the most recent version and ignore previous revisions, use -norevisions. This can be significantly faster because with revisions there is one API call per file, whereas with -norevisions there is one API call per folder.
Verify Data Using Other Tools#
The MD5 of the file contents is recorded in the metadata.
grep md5Checksum 'download/metadata/john.doe@gmail.com/My Drive/report_1424.pdf.yml'
md5Checksum: 5d5550259da199ca9d426ad90f87e60e
When Kumodd saves a file, it rereads the file, and computes the MD5 digest of the contents. It compares the values and reports either matched or MISMATCHED in the md5Match property.
grep md5Match 'download/metadata/john.doe@gmail.com/My Drive/report_1424.pdf.yml'
md5Match: match
Other tools can be used to cross-check MD5 verification of file contents:
md5sum 'download/john.doe@gmail.com/My Drive/My Drive/report_1424.yml'
5d5550259da199ca9d426ad90f87e60e download/john.doe@gmail.com/My Drive/My Drive/report_1424.yml
The MD5 of the redacted metadata is saved as yamlMetadataMD5:
grep yamlMetadataMD5 'download/metadata/john.doe@gmail.com/My Drive/report_1424.pdf.yml'
yamlMetadataMD5: 216843a1258df13bdfe817e2e52d0c70
To verify the MD5 of the metadata, dynamic values are removed first (see Data Verification Methods). To filter and digest, yq, a command line YAML query tool, and md5sum may be used.
yq -y '.|with_entries(select(.key|test("(Link|Match|status|Url|yaml|converted)")|not))' <'download/metadata/john.doe@gmail.com/My Drive/report_1424.pdf.yml'|md5sum
216843a1258df13bdfe817e2e52d0c70 -
During listing, if there are changes in the metadata, Kumodd will output diffs that identify the values that changed between previously saved and Google Drive metadata.
Reporting Differences in Metadata#
When differences in metadata are detected, between on disk and in Google Drive, kumodd show a diff on standard output that highlights the rows and columns of metadata that changed. To disable reporting of differences, use -nodiffs.
kumodd -col short -list pdf
Status Version Full Path
valid 4 ./My Drive/docs/paper.pdf
valid 3 ./My Drive/docs/article.pdf
INVALID 3 ./My Drive/docs/report(3).pdf
______________________ ./download/metadata/john.doe@gmail.com/./My Drive/docs/report(3).pdf
capabilities:
canAddChildren: false
canChangeCopyRequiresWriterPermission: true
canChangeViewersCanCopyContent: true
canComment: true
canCopy: true
canDelete: true
canDownload: true
canEdit: true
canListChildren: false
canMoveItemIntoTeamDrive: true
canMoveItemOutOfDrive: true
canReadRevisions: true
canRemoveChildren: false
canRename: true
canShare: true
canTrash: true
canUntrash: true
category: pdf
copyRequiresWriterPermission: false
- createdTime: '2018-08-04T12:55:13.682Z'
? ^ ------ ^ ^^
+ createdTime: '2018-07-27T05:24:14.659Z'
? ^ +++ +++ ^ ^^
explicitlyTrashed: false
fileExtension: pdf
fullFileExtension: pdf
fullpath: ./My Drive/docs/report(3).pdf
hasThumbnail: true
- headRevisionId: 0B4pnT_44hsmZX0c0x22o5ZdaSzyanpUENCVFMWnVvPQ
- id: 1EEPPXweZb9p5_YsAkT-g8gpkwWhrv86X
+ headRevisionId: 0B4pnT_44h5mT0wZVEdVN63g4ZB4M1ncDNrQmZaU9NPQ
+ id: 1QHwKAAQz08iIjRydOXNPFb182D6wbtvD
isAppAuthorized: false
kind: drive#file
lastModifyingUser:
displayName: John Doe
emailAddress: john.doe@gmail.com
kind: drive#user
me: true
permissionId: '1466113117414251'
photoLink: https://lh5.googleusercontent.com/-ptN2vmCNOi8/AAAAAAAAAAI/AAAAAAAAGkE/NR8xp9YvB2y/s64/photo.jpg
md5Checksum: 30428a1e95780c79e738d7d9ba2
mimeType: application/pdf
modifiedByMe: true
- modifiedByMeTime: '2018-08-04T12:55:13.682Z'
? ^ ------ ^ ^^
+ modifiedByMeTime: '2018-07-27T05:24:14.659Z'
? ^ +++ +++ ^ ^^
- modifiedTime: '2018-08-04T12:55:13.682Z'
? ^ ------ ^ ^^
+ modifiedTime: '2018-07-27T05:24:14.659Z'
? ^ +++ +++ ^ ^^
name: report.pdf
originalFilename: report.pdf
ownedByMe: true
owners:
- displayName: John Doe
emailAddress: john.doe@gmail.com
kind: drive#user
me: true
permissionId: '14466111617614251'
photoLink: https://lh5.googleusercontent.com/-ptN2vmCNOi8/AAAAAAAAAAI/AAAAAAAAGkE/NR8xp9YvB2y/s64/photo.jpg
parents:
- - 1_DnZ8H9h29V_zZVeaFGfdqX6N
+ - 1nVbNFRPHlA-_L2RW0RnjbIFss
path: ./My Drive/0read
permissionIds:
- '1466613167464251'
permissions:
- deleted: false
displayName: John Doe
emailAddress: john.doe@gmail.com
id: '1446613167464251'
kind: drive#permission
photoLink: https://lh5.googleusercontent.com/-ptN2vmCNOi8/AAAAAAAAAAI/AAAAAAAAGkE/NR8xp9YvB2y/s64/photo.jpg
role: owner
type: user
quotaBytesUsed: '25417608'
revisions: null
shared: false
size: '25417608'
spaces:
- drive
starred: false
thumbnailVersion: '1'
trashed: false
version: '3'
viewedByMe: true
- viewedByMeTime: '2018-08-04T12:55:13.682Z'
? ^ ------ ^ ^^
+ viewedByMeTime: '2018-07-27T05:24:14.659Z'
? ^ +++ +++ ^ ^^
viewersCanCopyContent: true
writersCanShare: true
_______________________________________________________________________________