7-Zip Convert-Compare-Delete Batch Script v2.04
By: Bighead
- A valid 7-Zip installation.
- At least read the "Configuration" section to learn how to enable support for files over 2GB.
- Copy 7z_ccd_start.bat into the folder with your zip or rar archives and fire it up.
What this does:
- Searches for RAR or ZIP files in a folder.
  - If an archive is found, extracts the contents to "filename contents".
  - Checks to make sure extraction was successful.
    - If so, recompresses the contents into 7z format.
    - If not, outputs an error message for possible corruption.
  - After 7z compression is complete, compares the sizes of the two archives.
    - Displays the difference in size of the two archives in KB and MB.
    - Keeps the archive with better compression and deletes the other.
      - If 7z has better compression, lets you know how much space was saved.
      - If the original file has better compression, lets you know how much space was preserved.
      - If "SevenZipForce" is enabled, lets you know if any space was lost.
  - Deletes the extracted contents that were output to "filename.ext contents".
- Outputs operations and errors to log files.
  - Four logs are created: operations, errors, 7z conversions, and no changes.
- When finished, displays the total hard drive space recovered by 7z compression.
- Also displays the total amount preserved if archives were not converted to 7z.
- If "SevenZipForce" is enabled, lets you know the total space that was lost.
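The steps listed above can be pictured with a short sketch. This is NOT the code of 7z_ccd_start.bat; it is a minimal illustration that assumes the default 7-Zip path, uses hypothetical label and variable names (only SevenZipPath and SevenZipCommands come from the script's configuration), and only reports which archive it would keep instead of deleting anything. It also leaves the extracted "contents" folder in place.

```batch
@ECHO OFF
REM Illustrative sketch only; this is not the code of 7z_ccd_start.bat.
REM SevenZipPath and SevenZipCommands are the documented settings;
REM the :Process label, OldSize and NewSize are hypothetical names.
SET SevenZipPath="C:\Program Files\7-Zip\7z.exe"
SET SevenZipCommands=-mx=9 -md=32m

FOR %%F IN (*.zip *.rar) DO CALL :Process "%%F"
GOTO :EOF

:Process
REM Extract the archive into a "<name.ext> contents" folder.
%SevenZipPath% x "%~1" -o"%~1 contents" -y >NUL
IF ERRORLEVEL 1 ECHO Extraction failed: "%~1"& GOTO :EOF
REM Recompress the extracted files into a 7z archive.
%SevenZipPath% a %SevenZipCommands% "%~n1.7z" ".\%~1 contents\*" >NUL
IF ERRORLEVEL 1 ECHO Compression failed: "%~1"& GOTO :EOF
REM Read both sizes in bytes (a plain IF comparison only works below 2GB).
FOR %%A IN ("%~1") DO SET OldSize=%%~zA
FOR %%A IN ("%~n1.7z") DO SET NewSize=%%~zA
IF %NewSize% LSS %OldSize% (ECHO Keep 7z: "%~n1.7z") ELSE (ECHO Keep original: "%~1")
GOTO :EOF
```

File names containing "%" or "!" need extra care in batch and are not handled by this sketch.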
Known Limitations/Bugs:
- The maximum size for an archive supported is 1 byte under 20GB (21,474,836,479 bytes).
- Very long file names and/or a long path may not correctly extract, compress, or compare.
- Calculating Kilobytes and Megabytes is not 100% accurate and probably never will be (at least not in a batch script).
- Archives with an "!" in their name will not print the "!" in the log files due to the delayed expansion needed to print timestamps on the same line.
- The more files processed, the more RAM the script uses. It takes about 10,000 files processed in a single operation to exceed 500MB.
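The "!" limitation above comes from how delayed expansion parses a line. The following sketch (a made-up file name, not code from the script) prints the same name once with delayed expansion disabled and once with it enabled next to a !TIME! timestamp:

```batch
@ECHO OFF
REM Hypothetical demonstration; "Wow!.zip" is a made-up file name.
SETLOCAL DisableDelayedExpansion
FOR %%F IN ("Wow!.zip") DO (
    REM Delayed expansion is off here, so the "!" is printed.
    ECHO Disabled: %%~F
    SETLOCAL EnableDelayedExpansion
    REM !TIME! requires delayed expansion, which also strips the lone "!".
    ECHO Enabled:  !TIME! %%~F
    ENDLOCAL
)
ENDLOCAL
```

The first line shows "Wow!.zip"; the second shows the timestamp followed by "Wow.zip".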
To use this batch, toss it into the folder that contains your archives and fire it up. When started, a command prompt window opens and displays the current activity. As the script runs, it generates log files in the folder it was started in. The "Log Files" section explains what each log documents.
Configuration:
Edit 7z_ccd_start.bat with a text editor and find the Configuration section near the top.
I prefer Notepad++; it provides syntax highlighting and lets you edit any file with a right click.
7-Zip Path
Here you can change the path to 7-Zip. Just make sure to keep the value within quotes.
:: Path to 7-Zip.
SET SevenZipPath="C:\Program Files\7-Zip\7z.exe"
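If you are unsure whether the path is right, a quick check like the one below can confirm it before a long run. The IF line is a hypothetical addition for illustration and is not part of the original script; it relies on the value already containing its own quotes.

```batch
@ECHO OFF
:: Path to 7-Zip.
SET SevenZipPath="C:\Program Files\7-Zip\7z.exe"
REM Hypothetical sanity check, not part of the original script.
IF NOT EXIST %SevenZipPath% ECHO 7-Zip was not found at %SevenZipPath%
```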

7-Zip Command Line
You can also pass 7-Zip commands. See the "7-Zip Commands" section for more info.
:: Enabled 7-Zip commands.
SET SevenZipCommands=-mx=9 -md=32m

Force Keep 7-Zip Archives
You can force the script to only convert files to 7z. When this option is enabled, the script will also report space lost if 7z compression failed to be better.
:: Force script to keep 7z archives. 0-disable / 1-enable
SET SevenZipForce=0

Make Directories
With this option enabled, processed archives will be placed into directories based on: no changes, converted, lost space, or failed.
:: Place archives into directories. 0-disable / 1-enable
SET MakeDirectories=0

Break 2GB Batch Limit
This batch supports recompression of files up to 2 GB in size by default, but can be configured to support files up to 20 GB.
:: Exceed the batch 2GB file limit. 0-disable / 1-enable
SET BreakBatchLimit=0

Enable this option for files over 2GB or you will get the error: "Invalid number. Numbers are limited to 32-bits of precision." This error will most likely cause the script to halt, close the command prompt window, keep both the 7z and rar/zip version of the file, and not remove the "contents" folder.
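The error comes from batch arithmetic, which works with signed 32-bit numbers. A minimal demonstration with hypothetical variable names (this is not code from the script):

```batch
@ECHO OFF
REM 2147483647 is the largest value SET /A accepts (signed 32-bit).
SET /A Largest=2147483647
ECHO Largest usable size in bytes: %Largest%
REM One byte more (exactly 2GB) triggers the "Invalid number" error.
SET /A TooBig=2147483648
```

The second SET /A prints the "Invalid number" message quoted above because 2147483648 no longer fits in 32 bits.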

BreakBatchLimit may break comparison of small files, and it makes file size calculations slightly less accurate. The script internally compares files in bytes; the KB and MB values are only for display. This option drops the last digit of each byte count before comparisons and calculations, so up to 9 bytes are lost in the calculation. To reiterate, they are lost only from the equation, not from the file; your files will remain 100% intact. These lost bytes can matter when comparing very small files, so enable the option only when you plan on dealing with archives larger than 2 GB.

For example, a 7-Zip file of 23451 bytes and a RAR file of 23459 bytes will both be seen as 2345, so the RAR file will be kept over the 7-Zip file even though the 7-Zip file is 8 bytes smaller.
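The effect of dropping the last digit can be reproduced with a few lines of batch. This is a sketch of the technique as described above, using the sizes from the example and hypothetical variable names; the real script may implement it differently.

```batch
@ECHO OFF
REM Hypothetical variable names; sizes are taken from the example above.
SET SevenSize=23451
SET RarSize=23459
REM Drop the last digit of each byte count with a substring operation.
SET SevenSize=%SevenSize:~0,-1%
SET RarSize=%RarSize:~0,-1%
ECHO 7z: %SevenSize%   RAR: %RarSize%
IF %SevenSize% LSS %RarSize% (ECHO 7z is smaller) ELSE (ECHO 7z is not smaller, original is kept)
```

Both values become 2345, so the comparison no longer sees the 7z file as smaller. The same arithmetic is consistent with the limit listed under Known Limitations: 21,474,836,479 without its last digit is 2,147,483,647, the largest number batch arithmetic accepts.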
Log Files:
Logs will only be created if an archive falls into the respective category, with the exception of the main log which is always created.

7z_ccd_logfile.txt - The main log which logs all operations that have taken place with timestamps.
7z_ccd_failed.txt - Logs a list of all archives that failed the extraction or recompression process.
7z_ccd_converted.txt - Logs a list of all archives that were converted to 7-zip + space recovered/lost.
7z_ccd_nochanges.txt - Logs a list of all archives that had no changes made to them + space preserved.

After you are done viewing these logs, you can safely delete them.
HDD Storage - Recovered vs. Preserved:
The main log and command prompt will let you know how much space you recovered if an archive was converted to 7z, and will output the total space recovered in both KB and MB when all operations are complete. This number is how much hard drive space you actually freed up in the conversion.

Along with the amount of space recovered, it will also let you know the total amount of space preserved if 7z failed to have a lower compression ratio. This is not space that was freed up, so you don't gain any hard drive space. This number is how much space you would have lost by converting to 7z.

Starting with v2.00, the script will also let you know if any space was lost if you forced the option to convert only to 7z.
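As a rough illustration of how a recovered amount can be turned into KB and MB with integer-only batch math, here is a small sketch. The sizes and variable names are hypothetical, and the two-decimal trick shown is one possible approach, not necessarily the one the script uses.

```batch
@ECHO OFF
REM Hypothetical sizes in bytes; variable names are illustrative.
SET OldSize=6303744
SET NewSize=2096128
SET /A SavedBytes=OldSize-NewSize
SET /A SavedKB=SavedBytes/1024
SET /A SavedMB=SavedKB/1024
REM Two decimals: scale KB by 100 before dividing, then keep the last two digits.
SET /A Hundredths=SavedKB*100/1024
SET /A Frac=Hundredths %% 100
REM Pad with a leading zero so that 1 becomes 01.
SET Frac=0%Frac%
SET Frac=%Frac:~-2%
ECHO Recovered: %SavedKB% KB (%SavedMB%.%Frac% MB)
```

With these inputs the script prints "Recovered: 4109 KB (4.01 MB)". Every division discards its remainder, which is why the displayed values are close but never exact.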
7-Zip Commands:
Commands may help to achieve higher compression in 7z files at the cost of compression speed.
Edit 7z_ccd_start.bat with a text editor. In the Configuration section you will find:

SET SevenZipCommands=-mx=9 -md=32m

Commands must have spaces between them. You can add any commands here that 7z recognizes. You will notice -mx=9 is already here, which is the setting for the "Ultra" compression level. -md=32m is the dictionary size that I have set because it is light on RAM and makes a good default.

Dictionary Size (-md=#m):
The command to adjust the dictionary size is -md=#m, where # is the value in megabytes. If you have the RAM to spare, you can try to get better compression by increasing the dictionary size. The higher the value, the more RAM 7-Zip will use when compressing files and the longer it will take to compress. The examples below show approximately how much RAM a few common dictionary sizes use.

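| Dictionary | RAM used | Dictionary | RAM used | Dictionary | RAM used | Dictionary | RAM used |
| -md=32m    | 405MB    | -md=64m    | 709MB    | -md=128m   | 1.4GB    | -md=192m   | 2.1GB    |
| -md=256m   | 2.7GB    | -md=384m   | 4.7GB    | -md=512m   | 6.0GB    | -md=1024m  | 10.8GB   |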
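For instance, to try a 64 MB dictionary (roughly 709MB of RAM according to the figures above), the configuration line could be changed as follows. The value chosen is only an example.

```batch
:: Enabled 7-Zip commands (example: 64 MB dictionary).
SET SevenZipCommands=-mx=9 -md=64m
```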

Solid or Non-Solid Archives (-ms=on/off):
A solid archive treats all files within an archive as one large file, which can yield better compression but does not allow you to extract a single file from the archive without going through all of its contents when decompressing. A non-solid archive is the opposite in that all files are compressed individually within the archive. This setting has no effect on archives containing a single file.

SET SevenZipCommands=-mx=9 -md=32m -ms=on

Much more on 7-zip commands can be found here: http://www.dotnetperls.com/7-zip-examples
Credit goes to Andrew Armstrong, who created the original script. Without his initial research and webpage, I would not have been inspired to modify it into what it can do now. His original scripts can be found at: http://aarmstrong.org/tutorials/mass-zip-rar-to-7zip-recompression-batch-file

v2.04 ...
- Archives that either failed compression or decompression will now be better differentiated in log files.
- Any size reported in megabytes that is less than 1 will now show a zero before the decimal (ex: 0.16 MB instead of .16 MB)
- Successful number of decompresses are no longer reported when the script is finished. Now that everything that has happened
to a file is reported successfully and specifically, it will now instead report the total number of all files processed.
- More adjustments to the log files. End reports are now slightly indented and specific logs now have line spacers.

v2.03 ...
- Minor adjustments to the text output to the command prompt and log files.

v2.02 ...
- Replaced all instances of "better" compression ratio with "lower". Before v2.00 I used to refer to a "higher" compression ratio as
the better compression. It turns out I was wrong that higher meant better, and learned the opposite was true.
- Fixed a bug that the addition of the "MakeDirectories" feature created. Archives failed to process if the path had parentheses.
- Fixed another bug with "MakeDirectories" where files with a "%" in their name would not be moved to the new directory.
- Added error checking for when archives fail to compress into 7z; this usually happens when the path/filename are too long.
Before, when an archive failed to compress, the script would make no notification and just skip to the next file. Archives that
failed recompression into 7z will now be treated the same as files that failed to decompress (logging, moving to failed directory).
- Archives that failed decompression or recompression will now list which error the file suffered from in the "errors" log file.
- Renamed "7zconvert" log file to "converted" and "errors" log file to "failed". Created directory names are the same.

v2.01 ...
- Added the option to move processed archives into directories.
- Fixed "nochanges" log file to correctly log the original file extension instead of 7z.

v2.00 ...
- Finally went back and fixed the RAM consumption issues! It's always been on my to-do list and it's been almost 2
years since I made v1.10! Plus I didn't know what the heck was going on at the time. I battled a lot of the imperfections
in batch to get this fully working without issues, and I can say with pride that it is now as close to perfect as it can get.
- Fixed an issue with the script crashing when comparing files with a "%" in their name (bug since v1.06).
- Added a configuration header to the script to change the path to 7-zip, commands, force 7z, and the 2GB+ toggle (see next).
- It is now possible to force the script to mass convert all files to only 7z even if the original archive was smaller in size.
- There is now an option in the header of the script to enable support for archives over 2GB. Before archives over 2GB were
automatically truncated, but the method required had more issues than what it was worth and didn't always work to
begin with. Because having this option enabled can have some slight negative effects on comparisons and calculations of
very small files, I figured having it available to disable would be important instead of it being the only method.
- Eliminated a big problem with the loop within a loop that was probably the cause of the RAM issues.
- Final log data entry for the CMD window and main log that counts the amount of files that passed, failed, converted,
or had no changes now only prints if any archives fell into the respective category.
- Lots of cleanup to the script. Also removed the 100plus version of the script as it's no longer needed.

v1.10 ...
- The package now comes with two batch files - the normal batch and a stripped down version for converting more than 100
files at a time. This new batch is titled 7zip_ccd_100plus.bat, and is basically v1.05 with a few alterations. It does
not perform any mathematical functions, so space recovered and preserved are not calculated. Also the size comparison is
displayed in bytes (instead of kilobytes and megabytes), just like my older versions. It also only outputs 2 log files,
the main log and the error log. Memory usage is much lower in this version, but can still climb pretty high over time.
- Moved the 7-Zip tips and configuration into its own document.

v1.09 ...
- Apparently this script always had an issue with how much RAM it gobbled up, I just never noticed it because I have 8GB
in my system and never processed more than 10-30 archives in one go. And v1.08 really skyrocketed the usage, it would
consume almost 30MB for every file processed! I managed to cut down the RAM usage drastically. It will start out consuming
a few KB per archive, but will eventually start eating more and more RAM the more archives it processes, meaning the RAM
consumption is not linear. Systems with 1024MB+ should have no problem processing about 100 files in one go. Processing
any more than that will take MUCH more. After 200 files are processed, RAM consumption will be up to 820MB, after 300
files are processed, it will be up to 1.43GB, 400 files - 2.72GB! I hope to find the source of this issue or possibly
find a workaround because this is unacceptable.

v1.08 ...
- Only one major change in this version: near perfect accuracy. I wanted to see if I could simulate 2 decimal values for
conversions to megabytes and I managed to accomplish it! It actually wasn't that hard at all. All readouts in MB will
now output the previously lost decimal value, and accurately add them up when calculating totals. All readouts in MB are
now approximately 99% accurate; almost nothing is lost when converting from B to KB to MB. Archives over 2GB will still
be slightly less accurate, maybe dropping down to 98.9% (yeah, who cares?). This is because a mere 1-9 bytes are still
lost in the conversion to KB, but there is absolutely nothing that can be done about this (nor does it even matter).
- Perhaps the next version will have a GB conversion as well when files exceed at least 1024 MB...

v1.07 ...
- The size limit of an archive is now 1 byte under 20GB (I previously thought it was 10GB, it was more like 2GB). This
was because of a bug in batch: files over 2GB do not correctly compare in IF statements (in some cases). I just had to
work with the one place it is working correctly, and build off of that by setting global variables.
- Fixed an error with size comparison and calculations that happens when archives exceeded 2GB. The check to see which
archive is smaller was not working (due to another batch bug). It would always keep the original archive and add to
"preserved" even if 7z was smaller. Apparently files over 2GB never worked since the beginning, now they do!
- Improved the accuracy of MB when calculating the total saved. MB when displaying the file size comparisons will still
be slightly inaccurate, this is a problem with batch not having the ability to handle decimals. So if a readout prints
that 7z-2047KB(1MB) rar-6156KB(6MB), the old equation would say Recovered: 7MB. It is actually slightly over 8MB, which
will now report correctly. Accuracy is still not 100%, to achieve this would take more math than what it's worth.

v1.06 ...
- File size comparisons are now displayed in kilobytes and megabytes (Finally!).
- If 7z proved to be better compression, it will now let you know how much space was recovered. If it failed, then it
will also show you how much total space was preserved. Preserved space is not equal to freed space.
- In addition, the total amount of space recovered and preserved is now printed in the batch window, the main log, and
the compression list logs when all operations are complete.
- More improvements to the main log. Removed some useless lines, timestamps, and dashes.
- Fixed a bug that v1.05 caused that didn't show the extension in the file name in the main log.

v1.05 ...
- Archives with a "!" will now correctly parse instead of failing to decompress and shooting out an error. Problem was
with EnableDelayedExpansion - I had it on for the entire script instead of where it was needed.
- Comments were added back into the script for those who wish to see what's going on. Apparently using "REM" to create
comments didn't create the same problems as using "::".

v1.04 ...
- Removed some of the "repeating" timestamps that were printed into the main log. Some of the actions are performed
within milliseconds, so time doesn't noticeably change. A new timestamp is printed only when a major action takes place.
- Added additional information and tips in the list logs when all operations are complete.

v1.03 ...
- Removed comments in the batch script. They were causing issues for some reason.
- Improvements on how logs display information. In other words, they are now easier to read.

v1.02 ...
- Fixed a bug that was creating a blank dummy text log within each archive when compressed into 7z.
- Added the creation of several new log files: errors, 7z recompresses, and no changes.
- Updated the ReadMe with tips on how to improve compression or speed in 7z.
- Now displays and logs how many files were decompressed, failed, converted, and had no changes.

v1.01 ...
- A log file of all operations while processing is now generated.
- Visually displays the file size comparison in the batch window and in the log in bytes.

v1.00 ...
- First release, basic comparison and delete.