Storage-based compression and de-duplication overview

 

Summary

Managing storage is always a challenge, so anything to simplify it is worth a look. Rick Vanover shares notes on storage-based compression and de-duplication.

At the recent Gestalt IT Field Day, Silicon Valley companies opened their doors so attendees could visit and see their technologies in use.

One of the stops during the event was Ocarina Networks. Ocarina specializes in online storage optimization to reduce disk consumption. The main point of the visit was to obtain a clearer understanding of compression and de-duplication for data management.

Compression can be approached in a few standard ways.

Compression
There are two main compression techniques. The first is dictionary-based and is implemented by mainstream formats such as ZIP. It doesn't help much with rich content, such as multimedia, because that content contains few repetitive patterns.
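
A minimal sketch of why repetition matters to a dictionary coder. It uses Python's `zlib` module (DEFLATE, the same family of algorithm used by ZIP) and treats random bytes as a stand-in for rich content with no repeated patterns; the sample data and the `compression_ratio` helper are assumptions made for illustration.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Repetitive text gives a dictionary coder many earlier matches to reference.
repetitive = b"storage optimization " * 2000
# Random bytes stand in for rich content that has no repeats to find.
random_like = os.urandom(len(repetitive))

print(f"repetitive:  {compression_ratio(repetitive):.3f}")
print(f"random-like: {compression_ratio(random_like):.3f}")
```

The repetitive input shrinks to a small fraction of its size, while the random-like input does not shrink at all.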

The second is statistical compression, which faster processors have now made practical. A statistical compressor predicts the content of the data and spends fewer bits on the values it predicts well. This is especially relevant for predicting pixels in images.
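
The idea of predicting pixels can be sketched as follows. This is an illustrative toy, not any vendor's algorithm: it assumes a hypothetical scanline with a smooth brightness ramp, predicts each pixel as equal to its left neighbour, and compares the order-0 entropy (the bits per symbol an ideal statistical coder would need) before and after prediction.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(values):
    """Order-0 Shannon entropy: average bits an ideal coder needs per symbol."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def residuals(pixels):
    """Predict each pixel as equal to its left neighbour; keep only the error."""
    previous = 0
    out = []
    for p in pixels:
        out.append((p - previous) % 256)
        previous = p
    return out


# Hypothetical scanline: a smooth brightness ramp, as in a sky or gradient.
scanline = [(i // 4) % 256 for i in range(1024)]

raw_bits = entropy_bits_per_symbol(scanline)
predicted_bits = entropy_bits_per_symbol(residuals(scanline))
print(f"raw: {raw_bits:.2f} bits/pixel, after prediction: {predicted_bits:.2f} bits/pixel")
```

Every brightness value is equally common in the raw ramp, so it needs a full 8 bits per pixel. The prediction errors are almost all 0 or 1, so they need well under one bit per pixel.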

Compressors can use powerful processors to run complex algorithms tuned to different data types. There are countless compressors available for various data sets. Ocarina, for example, uses more than 120 compressors for various file types and applies the right one to an application's data to obtain the most efficiency.
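
One simple way to "pick the right compressor" is to try several and keep the smallest result. The sketch below does this with three general-purpose compressors from the Python standard library; it is an assumption made for illustration and says nothing about how Ocarina actually selects among its compressors. The `pick_compressor` helper and the sample data are hypothetical.

```python
import bz2
import lzma
import zlib

# Stand-in candidates from the standard library (an assumption for illustration).
CANDIDATES = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def pick_compressor(data: bytes):
    """Try every candidate and return (name, compressed_bytes) for the smallest."""
    results = {name: fn(data) for name, fn in CANDIDATES.items()}
    best = min(results, key=lambda name: len(results[name]))
    return best, results[best]


sample = b"timestamp=2010-07-30 level=INFO msg=ok\n" * 500
name, packed = pick_compressor(sample)
print(name, len(sample), "->", len(packed))
```

Trying every candidate costs CPU time up front, which is one reason a real product might choose by file type instead.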

De-duplication
De-duplication gains efficiency by storing identical content only once; there are a few ways this can be realized. One method is whole-file single-instance de-duplication, which looks for exactly the same file, even when the copies have different file names. While quite simple, this scenario is not that frequent in real practice.
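
Whole-file single instancing can be sketched by keying stored content on a hash of the file's bytes rather than on its name. The `SingleInstanceStore` class and the file names below are hypothetical, and an in-memory dictionary stands in for real storage.

```python
import hashlib


class SingleInstanceStore:
    """Stores each distinct file body once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.blobs = {}   # digest -> file content
        self.names = {}   # file name -> digest

    def put(self, name: str, content: bytes) -> None:
        digest = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)  # keep only the first copy
        self.names[name] = digest

    def get(self, name: str) -> bytes:
        return self.blobs[self.names[name]]


store = SingleInstanceStore()
store.put("report.doc", b"quarterly numbers")
store.put("report-copy.doc", b"quarterly numbers")   # same bytes, different name
store.put("report-v2.doc", b"quarterly numbers!")    # one byte differs

print(len(store.names), "names,", len(store.blobs), "stored copies")
```

The store ends up with three names but only two stored copies. The file that differs by a single byte gets no benefit, which is why whole-file matching rarely pays off in practice.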

De-duplication can also work across multiple files, looking for sections that are the same within different files. Each file can be represented as a series of chunks. When a chunk appears in other files, it needs to be stored only once. An example of this type of de-duplication is a set of Word documents that each contain the same logo graphic. The de-duplication algorithm references one instance of the chunk, using what's called sliding-window, fixed-size chunking.
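
A minimal sketch of chunk-level de-duplication, under simplifying assumptions: chunks are cut at plain fixed offsets (no sliding window), the chunk size is a tiny hypothetical value, and the "documents" are byte strings with a shared stand-in logo at the start. Each file becomes a recipe of chunk digests, and each distinct chunk is stored once.

```python
import hashlib

CHUNK_SIZE = 64  # bytes; hypothetical, chosen small so the example is easy to follow


def dedupe(files: dict, chunk_size: int = CHUNK_SIZE):
    """Split each file into fixed-size chunks and store each distinct chunk once."""
    chunks = {}    # digest -> chunk bytes
    recipes = {}   # file name -> ordered list of digests
    for name, data in files.items():
        recipe = []
        for start in range(0, len(data), chunk_size):
            piece = data[start:start + chunk_size]
            digest = hashlib.sha256(piece).hexdigest()
            chunks.setdefault(digest, piece)
            recipe.append(digest)
        recipes[name] = recipe
    return chunks, recipes


def rebuild(name: str, chunks: dict, recipes: dict) -> bytes:
    """Reassemble a file from its recipe of chunk digests."""
    return b"".join(chunks[d] for d in recipes[name])


logo = bytes(range(256))  # stand-in for an embedded logo (four chunks)
files = {
    "memo.doc": logo + b"M" * 64,
    "invoice.doc": logo + b"I" * 64,
}

chunks, recipes = dedupe(files)
raw_size = sum(len(data) for data in files.values())
stored_size = sum(len(piece) for piece in chunks.values())
print(f"{raw_size} bytes of files stored as {stored_size} bytes in {len(chunks)} chunks")
```

Here 640 bytes of files are stored as 384 bytes, because the four logo chunks are kept once. Plain fixed offsets only find duplicates that line up on chunk boundaries; if the logo sat at a different offset in one document, this sketch would miss it.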

Considerations for daily use of compression and de-duplication
While the benefits of de-duplication and compression are worth having, there are some considerations for day-to-day usage. One example is emailing a file that has been compressed and de-duplicated on disk: once the file leaves the de-duplicated storage, it is restored to its full, uncompressed size.

The other consideration is the decompression engine for the compressed data. Compression carries overhead, and there can be an incredible amount of math involved. For complex compressors, there may be CPU latency to decompress the data. It truly depends: the decompression overhead can be too small to measure for certain compressors but noticeable for larger applications.
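
Decompression overhead is easy to measure for a given compressor and data set. The sketch below times two standard-library compressors on a hypothetical payload; the `time_decompress` helper is an assumption for illustration, and the numbers it prints will vary by machine and by data.

```python
import lzma
import time
import zlib


def time_decompress(decompress, blob: bytes, repeats: int = 5) -> float:
    """Return the best-of-N wall-clock seconds needed to decompress blob."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        decompress(blob)
        best = min(best, time.perf_counter() - start)
    return best


payload = b"block of application data 0123456789\n" * 20000
for label, compress, decompress in (
    ("zlib", zlib.compress, zlib.decompress),
    ("lzma", lzma.compress, lzma.decompress),
):
    blob = compress(payload)
    seconds = time_decompress(decompress, blob)
    print(f"{label}: {len(blob)} bytes stored, {seconds * 1000:.2f} ms to read back")
```

Running this against real application data, rather than a synthetic payload, gives a better picture of whether the read-back cost matters for a given workload.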

Rick Vanover is an IT infrastructure manager for Alliance Data in Columbus, Ohio. Rick has years of IT experience and focuses on virtualization, Windows-based server administration and system hardware.
