21 August, 2018

Linux Storage LAB: Using OpenDedup to store VM virtual images


Deduplication means that when data is saved to disk and the same data has already been stored, only a reference is written, which saves space. OpenDedup is an open source inline (real-time) deduplication solution; one of its goals is to store virtual machine images, where large parts of the images may be duplicated. In this article I test OpenDedup from both the speed and the space-saving point of view.
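To make the idea of "store the data once, then only references" concrete, here is a minimal Python sketch of a toy deduplicating store. It is only an illustration of the principle: the class name ToyDedupStore, the 4096-byte chunk size and the use of SHA-256 are my own assumptions and say nothing about how OpenDedup is implemented internally.

```python
import hashlib

CHUNK_SIZE = 4096  # bytes; an illustrative choice, not an OpenDedup internal


class ToyDedupStore:
    """Stores each distinct chunk once and keeps per-file lists of chunk hashes."""

    def __init__(self):
        self.chunks = {}  # hash -> chunk bytes (the unique data)
        self.files = {}   # file name -> list of chunk hashes (the references)

    def write(self, name, data):
        refs = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only a previously unseen chunk consumes new space.
            self.chunks.setdefault(digest, chunk)
            refs.append(digest)
        self.files[name] = refs

    def read(self, name):
        return b"".join(self.chunks[d] for d in self.files[name])

    def logical_size(self):
        return sum(len(self.chunks[d]) for refs in self.files.values() for d in refs)

    def stored_size(self):
        return sum(len(c) for c in self.chunks.values())


store = ToyDedupStore()
image_a = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE
image_b = b"A" * CHUNK_SIZE * 2 + b"C" * CHUNK_SIZE
store.write("a.img", image_a)
store.write("b.img", image_b)
print(store.logical_size(), store.stored_size())  # 28672 12288
```

The two toy "images" add up to seven chunks of logical data, but only three distinct chunks are actually kept.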

I have a strong Ryzen machine with a 500G NVMe SSD, 2 x 500G SATA SSDs and a RAID 5 array of 3 x 4T HDDs. The plan is to store the virtual machine images on the SATA SSDs, and the purpose of these measurements was to decide whether to use OpenDedup or simply use the plain disks.

The Ryzen machine is running Ubuntu 16.04, while the virtual machine guests are typically Windows 10 systems.

Disk benchmark tools for Windows and Linux


In Linux I am using iozone, while in Windows I am using CrystalMark to measure the disk IO speeds. To compare them, I ran some benchmarks on my client machine, a rather old Intel Core 2 Quad Q6600 with a SATA2 SSD and a 220G HDD. I did the CrystalMark measurements on Windows 10, then booted Ubuntu 16.04 from a pendrive and did the iozone measurements there.

For iozone, the following command was used:

iozone -s 1G -r 16M -I -i 0 -i 1 -i 2 -f path/to/storage/to/measure/iozone.tmp

The -r 16M option defines the record size, so I also changed it to 512k and 4k to produce results comparable to CrystalMark's. For large record sizes iozone's sequential write and read performance is the relevant figure, but for smaller record sizes the random read/write performance is critical.
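The three runs differ only in the record size, so they can be generated from one template. The following Python sketch just builds and prints the three command lines; the helper name iozone_command is hypothetical, and the target path is the placeholder from the command above.

```python
import shlex

RECORD_SIZES = ["16M", "512k", "4k"]  # the three record sizes mentioned in the text


def iozone_command(record_size, target="path/to/storage/to/measure/iozone.tmp"):
    """Return the argument list for one iozone run with the given record size."""
    return [
        "iozone",
        "-s", "1G",         # size of the test file
        "-r", record_size,  # record size
        "-I",               # direct I/O, bypassing the page cache
        "-i", "0",          # test 0: write/rewrite
        "-i", "1",          # test 1: read/reread
        "-i", "2",          # test 2: random read/write
        "-f", target,       # temporary file on the storage to measure
    ]


commands = [iozone_command(r) for r in RECORD_SIZES]
for argv in commands:
    print(shlex.join(argv))
```

Each list could be handed to subprocess.run to actually execute the benchmark; here the commands are only printed so they can be checked first.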

The results are here:


As we can see, the two benchmark results are similar to each other.

The important point is that the sequential read and write speeds matter for large file operations (e.g. copying movies), while the speeds at small record sizes matter for the overall speed of the operating system.

Raw disk speeds of the Ryzen computer


After deciding how to measure the speed with iozone, I measured the speed of all disk systems directly under Ubuntu 16.04:


From these speeds we can see that the NVMe SSD is by far the fastest. The SATA SSD RAID provided double speed for large files but no improvement for small files. It can also be clearly seen that HDD speeds degrade quickly as the record size decreases, which is what makes SSD-based systems boot so much faster.

We can also see that SATA2 reduced the SSD speed to half of what the SATA3 interface delivers, so installing an SSD in an old computer gains some speed but cannot reach the speed of a new system.


Installing OpenDedup


I used the instructions from the OpenDedup site to install OpenDedup:

After installing, when creating the volume I put the file data on the SATA SSD RAID and the hash tables on the NVMe SSD. I used ext2 file systems because I had read somewhere that ext2 is faster than ext4, which I normally use.

To create this volume I used the following command:

sudo mkfs.sdfs --volume-name=VirtDisk --volume-capacity=400GB --base-path=/media/SSD_RAID_Ext2/Adatok/Opendedup/VirtDisk --chunk-store-hashdb-location=/media/SSD_Fast_Ext2/Adatok/Opendedup/VirtDisk/chunkstore/hdb --dedup-db-store=/media/SSD_Fast_Ext2/Adatok/Opendedup/VirtDisk/ddb

And to mount it, I used the following command:

sudo mount.sdfs VirtDisk /media/VirtDiskDedup

After mounting it, I measured its raw speed, and as you can see, it was significantly slower than the SSD volume it resides on.


Measuring speed in the VM guest


The next step was to measure the speed of the virtual machine guest operating system (Windows 10).
I did 3 measurements in each case: the Windows reboot time, the start time of Microsoft Office Word, and a CrystalMark measurement.

See the results here:


The Windows reboot time was measured in 2 parts: the first part is the time to reach the login prompt, and the second part is the time for programs to start after login.

What we can see here is that starting Word doesn't depend much on the disk speed; even on the HDD RAID the difference is marginal. We can also see that the time after user login does not improve with the NVMe SSD, probably because the SATA SSD is already fast enough for this task.
The boot time always improves as the disk speed increases, and the large-record speed also counts, as there is a significant improvement with the SSD RAID.

Back to the OpenDedup results: the boot time is very similar to that of the SATA SSD without RAID, so if storage speed matters, OpenDedup is acceptably fast. I attribute the worse post-login time and Word start time to the slower small-record writes, which may be a drawback in many real-life applications.


Space saving with OpenDedup


The point of using OpenDedup is to save space. I did not experiment much: I first copied a virtual disk image containing Windows 10 plus a standard Microsoft Office installation to the dedup store and saw that it already saved 27% of the disk space. I expected that copying a second image would save much more, but to my surprise, when I copied an image with Windows 10 and Visual Studio, the total saving for the 2 images did not change; it was again 27%. To me this meant that there were no common savings between the two images.


Conclusion and further thoughts


This experiment did not convince me that I should use deduplication: the ~30% disk saving is not very significant, while the speed decrease is.
As I am new to OpenDedup, there may be further tweaks that lead to better results. For example, compression was enabled, but the statistics showed that it did not produce any gain.
The other point is that the first image was in VDI format and the second in VMDK; maybe duplicates cannot be identified between the two formats?
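One way a container format could hide duplicates is plain misalignment: if identical guest data starts at a different offset inside each file, fixed-size chunks no longer line up. The Python sketch below shows this effect on synthetic data. It is an assumption-laden toy: the header sizes are made up, I have not examined how VDI and VMDK really lay out guest data, and the mkfs.sdfs help lists VARIABLE_MURMUR3 as the default hash type, so this effect may not apply to a default OpenDedup volume at all.

```python
import hashlib

CHUNK_SIZE = 4096  # illustrative fixed chunk size


def chunk_hashes(data):
    """Return the set of SHA-256 digests of the fixed-size chunks of data."""
    return {
        hashlib.sha256(data[i:i + CHUNK_SIZE]).digest()
        for i in range(0, len(data), CHUNK_SIZE)
    }


# Deterministic pseudo-random payload of 64 chunks, standing in for identical guest data.
payload = b"".join(
    hashlib.sha256(i.to_bytes(4, "big")).digest()
    for i in range(CHUNK_SIZE * 64 // 32)
)

# Hypothetical container headers; the sizes are invented for the example.
aligned_a = b"\x00" * CHUNK_SIZE + payload        # header of exactly one chunk
aligned_b = b"\x01" * (2 * CHUNK_SIZE) + payload  # different header, still chunk-aligned
shifted = b"\x02" * 512 + payload                 # header that is not a multiple of CHUNK_SIZE

shared_aligned = len(chunk_hashes(aligned_a) & chunk_hashes(aligned_b))
shared_shifted = len(chunk_hashes(aligned_a) & chunk_hashes(shifted))
print(shared_aligned, shared_shifted)  # 64 0
```

With aligned headers all 64 payload chunks are recognized as duplicates; with a 512-byte shift none are, although the payload is byte-for-byte identical.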


More details


You can get the OpenDedup statistics with
sdfscli --volume-info

Here are my results after copying the first disk image and after copying the second:

Files : 1
Volume Capacity : 400 GB
Volume Current Logical Size : 49.54 GB
Volume Max Percentage Full : 95.0%
Volume Duplicate Data Written : 15.4 GB
Unique Blocks Stored: 36.01 GB
Unique Blocks Stored after Compression : 36.12 GB
Cluster Block Copies : 2
Volume Virtual Dedup Rate (Unique Blocks Stored/Current Size) : 27.32%
Volume Actual Storage Savings (Compressed Unique Blocks Stored/Current Size) : 27.09%
Compression Rate: -0.32%

Files : 2
Volume Capacity : 400 GB
Volume Current Logical Size : 124.16 GB
Volume Max Percentage Full : 95.0%
Volume Duplicate Data Written : 35.61 GB
Unique Blocks Stored: 90.41 GB
Unique Blocks Stored after Compression : 90.7 GB
Cluster Block Copies : 2
Volume Virtual Dedup Rate (Unique Blocks Stored/Current Size) : 27.18%
Volume Actual Storage Savings (Compressed Unique Blocks Stored/Current Size) : 26.95%
Compression Rate: -0.32%
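The two reports above can also be used to estimate what the second image contributed on its own. The Python sketch below parses the relevant lines (copied from the reports) and computes the saving as 1 - unique / logical, both for each snapshot and for the difference between them. The function names are hypothetical, and the values are the rounded GB figures shown above, so the last digit can differ slightly from what the tool prints.

```python
import re

AFTER_FIRST = """\
Volume Current Logical Size : 49.54 GB
Unique Blocks Stored: 36.01 GB
Unique Blocks Stored after Compression : 36.12 GB
"""

AFTER_SECOND = """\
Volume Current Logical Size : 124.16 GB
Unique Blocks Stored: 90.41 GB
Unique Blocks Stored after Compression : 90.7 GB
"""

LOGICAL = "Volume Current Logical Size"
UNIQUE = "Unique Blocks Stored"


def parse_gb(report):
    """Map each 'Label : N GB' line to a float number of GB."""
    values = {}
    for line in report.splitlines():
        match = re.match(r"^(.*?)\s*:\s*([\d.]+) GB$", line.strip())
        if match:
            values[match.group(1)] = float(match.group(2))
    return values


def saving(logical, unique):
    """Fraction of the logical size that did not need to be stored."""
    return 1 - unique / logical


first = parse_gb(AFTER_FIRST)
second = parse_gb(AFTER_SECOND)

total_first = saving(first[LOGICAL], first[UNIQUE])
total_second = saving(second[LOGICAL], second[UNIQUE])
# Data added by the second image only: difference between the two snapshots.
marginal = saving(second[LOGICAL] - first[LOGICAL], second[UNIQUE] - first[UNIQUE])

print(f"after first image : {total_first:.2%}")
print(f"after second image: {total_second:.2%}")
print(f"second image alone: {marginal:.2%}")
```

The second image on its own comes out at roughly 27% as well, about the same as the first. That is consistent with most of the saving coming from duplicates inside each image, although these numbers alone cannot prove that nothing was shared between the two.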


Parameters to use when creating a volume


On the OpenDedup site there is a good explanation of the XML parameters, but they use a different syntax from the one you have to use in the mkfs command. Here is a list of the parameters for mkfs.sdfs:

usage: mkfs.sdfs --volume-name=sdfs --volume-capacity=100GB
   --ali-enabled                                                Set to enable this volume to store to Alibaba Object Storage (OSS). cloud-url, cloud-secret-key,
                                                                cloud-access-key, and cloud-bucket-name will also need to be set.
   --atmos-enabled                                              Set to enable this volume to store to Atmo Object Storage. cloud-url, cloud-secret-key, cloud-access-key, and
                                                                cloud-bucket-name will also need to be set.
   --aws-aim                                                    Use aim authentication for access to AWS S3
   --aws-basic-signer                               use basic s3 signer for the cloud connection. This is set to true by default for all cloud url buckets
   --aws-bucket-location                          The aws location for this bucket
   --aws-disable-dns-bucket                         disable the use of dns bucket names to prepent the cloud url. This is set to true by default when cloud-url is
                                                                set
   --aws-enabled                                    Set to true to enable this volume to store to Amazon S3 Cloud Storage. cloud-secret-key, cloud-access-key, and
                                                                cloud-bucket-name will also need to be set.
   --azure-enabled                                  Set to true to enable this volume to store to Microsoft Azure Cloud Storage. cloud-secret-key,
                                                                cloud-access-key, and cloud-bucket-name will also need to be set.
   --azurearchive-in-days               Set to move to azure archive from hot after x number of days
   --backblaze-enabled                                          Set to enable this volume to store to Backblaze Object Storage. cloud-url, cloud-secret-key, cloud-access-key,
                                                                and cloud-bucket-name will also need to be set.
   --backup-volume                                              When set, changed the volume attributes for better deduplication but slower randnom IO.
   --base-path                                            the folder path for all volume data and meta data.
                                                                Defaults to:
                                                                /opt/sdfs/
   --chunk-store-compress                           Compress chunks before they are stored. By default this is set to true. Set it to  false for volumes that hold
                                                                data that does not compress well, such as pictures and  movies
   --chunk-store-data-location                            The directory where chunks will be stored.
                                                                Defaults to:
                                                                --base-path + /chunkstore/chunks
   --chunk-store-encrypt                            Whether or not to Encrypt chunks within the Dedup Storage Engine. The encryption key is generated
                                                                automatically. For AWS this is a good option to enable. The default for this is false
   --chunk-store-encryption-key                         The encryption key used for encrypting data. If not specified a strong key will be generated automatically.
                                                                They key must be at least 8 charaters long
   --chunk-store-gc-schedule                     The schedule, in cron format, to check for unclaimed chunks within the Dedup Storage Engine. This should
                                                                happen less frequently than the io-claim-chunks-schedule.
                                                                Defaults to:
                                                                0 0 0/2 * * ?
   --chunk-store-hashdb-class                       The class used to store hash values
                                                                Defaults to:
                                                                org.opendedup.collections.RocksDBMap
   --chunk-store-hashdb-location                          The directory where hash database for chunk locations will be stored.
                                                                Defaults to:
                                                                --base-path + /chunkstore/hdb
   --chunk-store-io-threads                            Sets the number of io threads to use for io operations to the dse storage provider. This is set to 8 by
                                                                default but can be changed to more or less based on bandwidth and io.
   --chunk-store-iv                                     The encryption  initialization vector (IV) used for encrypting data. If not specified a strong key will be
                                                                generated automatically
   --chunk-store-size                                 The size in MB,TB,GB of the Dedup Storeage Engine. This .
                                                                Defaults to:
                                                                The size of the Volume
   --chunkstore-class                               The class for the specific chunk store to be used.
                                                                Defaults to org.opendedup.sdfs.filestore.FileChunkStore
   --cloud-access-key                         Set to the value of Cloud Storage access key.
   --cloud-backlog-size                  The how much data can live in the spool for backlog. Setting to -1 makes the backlog unlimited. Setting to 0
                                                                (default) sets no backlog. Setting to  GB TB MB caps the backlog.
   --cloud-bucket-name                Set to the value of Cloud Storage bucket name. This will need to be unique and a could be set the the access
                                                                key if all else fails. aws-enabled, aws-secret-key, and aws-secret-key will also need to be set.
   --cloud-disable-test                                         Disables testing authentication for s3
   --cloud-secret-key                         Set to the value of Cloud Storage secret key.
   --cloud-url                                             The url of the blob server. e.g. http://s3server.localdomain/s3/
   --cluster-block-replicas                        The number copies to distribute to descrete nodes for each unique block. As an example if this value is set
                                                                to"3" the volume will attempt to write any unique block to "3" DSE nodes, if available.  This defaults to "2".
   --cluster-config                                     The jgroups configuration used to configure this cluster node. This defaults to "/etc/sdfs/jgroups.cfg.xml".
   --cluster-dse-password                               The jgroups configuration used to configure this cluster node. This defaults to "/etc/sdfs/jgroups.cfg.xml".
   --cluster-id                                         The name used to identify the cluster group. This defaults to sdfs-cluster. This name should be the same on
                                                                all members of this cluster
   --cluster-rack-aware                             If set to true, the clustered volume will be rack aware and make the best effort to distribute blocks to
                                                                multiple racks based on the cluster-block-replicas. As an example, if cluster-block replicas is set to "2" and
                                                                cluster-rack-aware is set to "true" any unique block will be sent to two different racks if present. The mkdse
                                                                option --cluster-node-rack should be used to distinguish racks per dse node  for this cluster.
   --compress-metadata                                          Enable compression of metadata at the expense of speed to open and close files. This option should be enabled
                                                                for backup
   --data-appendix                                      Add an appendix for data files.
   --dedup-db-store                                       the folder path to location for the dedup file database.
                                                                Defaults to:
                                                                --base-path + /ddb
   --enable-replication-master                                  Enable this volume as a replication master
   --encrypt-config                                             Encrypt security sensitive encryption parameters with the admin password
   --ext
   --gc-class                                       The class used for intelligent block garbage collection.
                                                                Defaults to:
                                                                org.opendedup.sdfs.filestore.gc.PFullGC
   --glacier-in-days                    Set to move to glacier from s3 after x number of days
   --glacier-restore-class             Set the class used to restore glacier data.
   --google-enabled                                 Set to true to enable this volume to store to Google Cloud Storage. cloud-secret-key, cloud-access-key, and
                                                                cloud-bucket-name will also need to be set.
   --hash-type    This is the type of hash engine used to calculate a unique hash. The valid options for hash-type are tiger16
                                                                tiger24 murmur3_128 VARIABLE_MURMUR3 This Defaults to VARIABLE_MURMUR3
   --help                                                       Display these options.
   --io-chunk-size                                  The unit size, in kB, of chunks stored. Set this to 4 if you would like to dedup VMDK files inline.
                                                                Defaults to:
                                                                4
   --io-claim-chunks-schedule                    The schedule, in cron format, to claim deduped chunks with the Volume(s).
                                                                Defaults to:
                                                                0 59 23 * * ?
   --io-dedup-files                                 True mean that all files will be deduped inline by default. This can be changed on a one offbasis by using the
                                                                command "setfattr -n user.cmd.dedupAll -v 556:false "
                                                                Defaults to:
                                                                true
   --io-log                                               the file path to location for the io log.
                                                                Defaults to:
                                                                --base-path + /sdfs.log
   --io-max-file-write-buffers                      The amount of memory to have available for reading and writing per file. Each buffer in the size of
                                                                io-chunk-size.
                                                                Defaults to:
                                                                24
   --io-max-open-files                                  The maximum number of files that can be open at any one time. If the number of files is exceeded the least
                                                                recently used will be closed.
                                                                Defaults to:
                                                                1024
   --io-meta-file-cache                                 The maximum number metadata files to be cached at any one time. If the number of files is exceeded the least
                                                                recently used will be closed.
                                                                Defaults to:
                                                                1024
   --io-safe-close                                  If true all files will be closed on filesystem close call. Otherwise, files will be closed based on
                                                                inactivity. Set this to false if you plan on sharing the file system over an nfs share. True takes less RAM
                                                                than False.
                                                                Defaults to:
                                                                true
   --io-safe-sync                                   If true all files will sync locally on filesystem sync call. Otherwise, by defaule (false), files will sync on
                                                                close and data will per written to disk based on --max-file-write-buffers.  Setting this to true will ensure
                                                                that no data loss will occur if the system is turned off abrubtly at the cost of slower speed.
                                                                Defaults to:
                                                                false
   --io-write-threads                                   The number of threads that can be used to process data writted to the file system.
                                                                Defaults to:
                                                                16
   --local-cache-size                        The local read cache size for data uploaded to the cloud.
                                                                Defaults to:
                                                                10 GB
   --low-memory                                                 Sets the volume to mimimize the amount of ram used at the expense of speed
   --minio-enabled                                              Set to enable this volume to store to Minio Object Storage. cloud-url, cloud-secret-key, cloud-access-key, and
                                                                cloud-bucket-name will also need to be set.
   --noext
   --permissions-file                        Default File Permissions.
                                                                Defaults to:
                                                                0644
   --permissions-folder                      Default Folder Permissions.
                                                                Defaults to:
                                                                0755
   --permissions-group                       Default Group.
                                                                Defaults to:
                                                                0
   --permissions-owner                       Default Owner.
                                                                Defaults to:
                                                                0
   --refresh-blobs                                              Updates blobs in s3 to keep them from moving to glacier if clamined by newly written files
   --report-dse-capacity                            If set to "true" this volume will report capacity the actualcapacity statistics from the DSE. If this value is
                                                                set to "false" it willreport as virtual size of the volume and files. Defaults to "true"
   --report-dse-size                                If set to "true" this volume will used as the actual used statistics from the DSE. If this value is set to
                                                                "false" it willreport as virtual size of the volume and files. Defaults to "true"
   --sdfscli-disable-ssl                                        disables ssl to management interface
   --sdfscli-listen-addr               IP Listenting address for the sdfscli management interface. This defaults to "localhost"
   --sdfscli-listen-port                              TCP/IP Listenting port for the sdfscli management interface
   --sdfscli-password                                 The password used to authenticate to the sdfscli management interface. Thee default password is "admin".
   --sdfscli-require-auth                                       Require authentication to connect to the sdfscli managment interface
   --simple-metadata                                            If set, will create a separate object for metadata used for objects sent to the cloud. Otherwise, metadata
                                                                will be stored as attributes to the object.
   --simple-s3                                                  Uses basic S3 api characteristics for cloud storage backend.
   --tcp-keepalive                                      Set tcp-keepalive setting for the connection with S3 storage
   --use-perf-mon                                   If set to "true" this volume will log io statistics to /etc/sdfs/ directory. Defaults to "false"
   --user-agent-prefix                                  Set the user agent prefix for the client when uploading to the cloud.
   --volume-capacity                           Capacity of the volume in [MB|GB|TB].
                                                                THIS IS A REQUIRED OPTION
   --volume-maximum-full-percentage                 The maximum percentage of the volume capacity, as set by volume-capacity, before the volume startsreporting
                                                                that the disk is full. If the number is negative then it will be infinite. This defaults to 95
                                                                e.g. --volume-maximum-full-percentage=95
   --volume-name                                        The name of the volume.
                                                                THIS IS A REQUIRED OPTION
   --vrts-appliance                                             Volume is running on a NetBackup Appliance.
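Since the command line quickly gets long, it can help to assemble it from a small table of options. The Python sketch below rebuilds the volume creation command used earlier in this article from a dictionary. The helper mkfs_sdfs_args is hypothetical, and treating a value of True as a bare switch (for options such as --low-memory) is my assumption about which options take no value.

```python
import shlex


def mkfs_sdfs_args(options):
    """Turn {'volume-name': 'x', 'low-memory': True} into a mkfs.sdfs argument list."""
    argv = ["mkfs.sdfs"]
    for name, value in options.items():
        if value is True:
            argv.append(f"--{name}")          # bare switch without a value
        else:
            argv.append(f"--{name}={value}")  # --name=value form
    return argv


fast = "/media/SSD_Fast_Ext2/Adatok/Opendedup/VirtDisk"
argv = mkfs_sdfs_args({
    "volume-name": "VirtDisk",
    "volume-capacity": "400GB",
    "base-path": "/media/SSD_RAID_Ext2/Adatok/Opendedup/VirtDisk",
    "chunk-store-hashdb-location": f"{fast}/chunkstore/hdb",
    "dedup-db-store": f"{fast}/ddb",
})
print(shlex.join(["sudo"] + argv))
```

The printed line matches the command I used to create the volume, so the same dictionary can be extended with any of the options listed above.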

