Friday, August 1, 2008

New-generation codecs

... in writing for the new generation. The most interesting thing about the present is that what we now consider the best and most correct will look "Mesozoic", conservative and cumbersome to the generations that follow. They will set the rules, and we will only grumble like old men: "Ehh, assembler..." sorry, "Ehh, Windows Vista...". By the way, some interesting developments are now taking place around codecs ("codec" means coder-decoder), and that is what we will talk about. The conversation makes sense for another reason as well: young readers very often write in to the magazine without quite understanding how codecs actually work.

Of course, we assume that the reader is not fully prepared, so the explanation will be given practically "on the fingers", that is, in the simplest possible terms.

To begin, let's look at how things went in graphics and take that as an example. A raster image can naturally be thought of as a BMP file, which describes everything point by point. Naturally, lossless compression was considered first, which brings us to the basic question of archivers, or rather of their algorithms. There were many inventions in this area, but at some point the technology hit its limit. That is, it was no longer possible to compress further, yet it was necessary! So it was decided to use approximation, that is, simplification of the complex: compression with loss of data, but without losses tangible to the human eye (to perception and, accordingly, to the result). GIF went mainly by simplifying colors (reducing the palette) plus compression with the LZW algorithm. JPEG uses a more complex but more refined scheme: the color model is separated into components (just as was done in color television), then the image is split into blocks, the color components are subsampled, and so on. As a result, JPEG became the most popular image compression standard. Besides, it was the precursor of, or rather the basis for, MPEG and modern digital broadcasting.
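To make the GIF idea more tangible, here is a minimal Python sketch, assuming a row of 8-bit grayscale pixels: the "palette" is coarsened first, and then a textbook LZW encoder is applied. The names `reduce_palette` and `lzw_encode` and the test data are illustrative assumptions; a real GIF encoder uses variable-width codes and other details that are not shown here.

```python
import random

def reduce_palette(pixels, bits):
    """Keep only the top `bits` bits of each 8-bit pixel (fewer distinct shades)."""
    mask = (0xFF << (8 - bits)) & 0xFF
    return bytes(p & mask for p in pixels)

def lzw_encode(data):
    """Textbook LZW: emit one dictionary code per longest already-seen byte string."""
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate              # keep extending the match
        else:
            codes.append(table[current])     # emit code for the longest match
            table[candidate] = len(table)    # learn the new string
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes

random.seed(1)
# Hypothetical test data: a nearly flat gray area with a little noise
pixels = bytes(100 + random.randrange(8) for _ in range(1000))
coarse = reduce_palette(pixels, 4)

print(len(lzw_encode(pixels)), len(lzw_encode(coarse)))
```

The second number should come out much smaller: once the noise is rounded away, the encoder sees long repeats and needs far fewer codes.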

In principle, if people had insisted on idealism back then and refused to allow algorithmic models with loss of information, it would have been virtually impossible to arrive at today's realities. As the saying goes, "be simpler and people will be drawn to you", or rather, "approximate, and the problem is as good as solved" :). Incidentally, replacing large amounts of data with mathematical algorithms is the most effective solution to many modern problems. For example, in one publication your humble servant wrote a series of articles on AI and gave a very good example there. Recall what our parents used for their calculations when there were no pocket calculators. Right: tables of values for roots, trigonometric functions, powers and even plain multiplication. All this mass of data has now been replaced by a few lines of program code. Of course, this is the most ideal kind of simplification; in audio and video we see something in between.
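As a small illustration of tables being replaced by a few lines of code, here is a sketch that computes square roots with Newton's iteration instead of looking them up. The function name `newton_sqrt` and the iteration count are assumptions chosen for this example; they do not come from the series of articles mentioned above.

```python
import math

def newton_sqrt(x, iterations=20):
    """Approximate the square root of a positive number by Newton's iteration."""
    guess = x if x > 1 else 1.0
    for _ in range(iterations):
        guess = 0.5 * (guess + x / guess)   # average the guess with x / guess
    return guess

# A few values that would once have been read from a printed table
for n in (2, 3, 10):
    print(n, newton_sqrt(n), math.sqrt(n))
```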

Sound ran into the same problem as graphics, because both BMP and WAV are essentially tabular data stores. In sound, too, the lossless option was considered first, but it is no panacea. Even now, for a modern PC user it makes no particular difference whether to store ordinary WAV files or their lossless equivalents: the compression ratio turns out to be fairly small, no matter how sophisticated the algorithm is. So here, too, it was decided to move to lossy options.

At first these were the simplest ones, which, as the saying goes, "do not go inside" the signal. For example, by lowering the sampling rate and bit depth (much the same as reducing the color palette in GIF graphics) we gain a smaller data volume, but the losses are noticeable when listening. We can also narrow the frequency range, for example down to the speech band, but that option is not suitable for music. Incidentally, a quick quiz question: how do you narrow the frequency range to the level of a speech signal without using the Fourier transform, that is, without moving into the spectral domain? Right: "tinker" with the sampling rate.
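The two simplest tricks, a lower sampling rate and a lower bit depth, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names are made up, the test tone is synthetic, and the averaging inside `decimate` is a crude stand-in for a proper low-pass filter.

```python
import math

def make_tone(freq_hz, rate_hz, seconds):
    """Generate a sine tone as signed 16-bit integer samples."""
    n = int(rate_hz * seconds)
    return [round(32767 * math.sin(2 * math.pi * freq_hz * i / rate_hz))
            for i in range(n)]

def decimate(samples, factor):
    """Average each group of `factor` samples and keep one value per group.

    Lowering the rate by `factor` also narrows the representable band by the
    same factor, with no Fourier transform involved.
    """
    return [sum(samples[i:i + factor]) // factor
            for i in range(0, len(samples) - factor + 1, factor)]

def reduce_bit_depth(samples, bits):
    """Requantize 16-bit samples to `bits` bits by dropping low-order bits."""
    shift = 16 - bits
    return [s >> shift for s in samples]

tone = make_tone(440, 44100, 0.1)        # 16-bit samples at 44.1 kHz
narrow = decimate(tone, 4)               # about 11 kHz sampling rate
coarse = reduce_bit_depth(narrow, 8)     # 8 bits per sample

print(len(tone) * 16, len(coarse) * 8)   # size in bits before and after
```

The data volume shrinks by roughly a factor of eight, and the price is exactly what the text describes: a narrower band and audible quantization noise.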

One of the most interesting recent ideas can be seen in the new lossyWAV project. The younger generation, with its rather interesting outlook, has managed to look at the problem from the other side. The point is that they chose to simplify the WAV file itself by zeroing the low-order bits, and the number of bits removed is calculated from the content of each individual sample (fragment of the digital recording). To make this clearer, it is as if the BMP were simplified at certain points. Standard lossless compression is then applied, and a higher compression ratio is obtained. Again drawing a parallel with graphics, this can be called an intelligent reduction of the color palette. By ear, a lossyWAV file is perceived as almost identical to the original file, which is why the term "lossless" is very often applied to this algorithm, even though low-order bits are discarded.
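The idea of zeroing low-order bits before lossless compression can be tried out with a short Python sketch. It is not the actual lossyWAV algorithm: the block length, the rule "keep the top `keep_bits` bits of the block peak", and the use of zlib as a stand-in for a lossless audio coder are all assumptions made for illustration.

```python
import math
import random
import struct
import zlib

BLOCK = 512  # hypothetical block length in samples

def trim_block(block, keep_bits):
    """Zero as many low bits as the block's peak allows while keeping `keep_bits`."""
    peak = max(abs(s) for s in block)
    drop = max(0, peak.bit_length() - keep_bits)
    mask = ~((1 << drop) - 1)            # clears the `drop` lowest bits
    return [s & mask for s in block]

def trim_low_bits(samples, keep_bits=8):
    """Apply trim_block to consecutive blocks of the recording."""
    out = []
    for i in range(0, len(samples), BLOCK):
        out.extend(trim_block(samples[i:i + BLOCK], keep_bits))
    return out

def packed_size(samples):
    """Size in bytes after packing as 16-bit PCM and compressing with zlib."""
    raw = struct.pack("<%dh" % len(samples), *samples)
    return len(zlib.compress(raw, 9))

random.seed(0)
# Synthetic test signal: a 440 Hz tone with a little noise in the low bits
samples = [round(12000 * math.sin(2 * math.pi * 440 * i / 44100)
                 + random.gauss(0, 50)) for i in range(44100)]
trimmed = trim_low_bits(samples)

print(packed_size(samples), packed_size(trimmed))
```

The trimmed version should compress noticeably better, because the low bits no longer carry noise that the lossless stage cannot predict.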

In fact, the next step is fairly obvious. Most modern lossy codecs reduce the amount of data by making corrections in the frequency domain (before this, we were talking about the time domain). To explain it at a simple level: with the conventional Fourier transform, building the spectrum of a particular sound fragment requires taking a certain number of periods from the amplitude-time representation. That is, the file ends up as if composed of spectrogram frames, and that is how it is stored. In each of these frames the surplus can be cut away, so the simplification happens at this level. For example, if a fragment is silence, it is filled with zeros; if there are parts of the spectrum that are poorly audible, they are removed as well; and so on. Psychoacoustic models can also be applied. A great deal therefore depends on the algorithm and its quality, because each developer implements their own approach. So when dealing with a lossy codec, you should always check at least what it actually produces. Many people, for example, hear no difference at all between Ogg Vorbis and MP3, although their algorithms work in different ways.
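The frame-by-frame simplification described above can be sketched with a plain discrete Fourier transform. This is a toy under stated assumptions: the frame length, the threshold rule and the function names are invented for the example, and real codecs use overlapping transforms and psychoacoustic models rather than a single fixed threshold.

```python
import cmath
import math

FRAME = 64  # hypothetical frame length in samples

def dft(frame):
    """Direct discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]

def idft(spectrum):
    """Inverse transform; returns the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

def simplify_frame(frame, threshold_ratio):
    """Zero every bin weaker than threshold_ratio times the strongest bin."""
    spectrum = dft(frame)
    peak = max(abs(c) for c in spectrum)
    return [c if abs(c) >= threshold_ratio * peak else 0 for c in spectrum]

# Test signal: a strong tone in bin 4 plus a much weaker tone in bin 20
signal = [math.sin(2 * math.pi * 4 * t / FRAME)
          + 0.01 * math.sin(2 * math.pi * 20 * t / FRAME)
          for t in range(FRAME * 2)]

frames = [signal[i:i + FRAME] for i in range(0, len(signal), FRAME)]
simplified = [simplify_frame(f, 0.05) for f in frames]
nonzero = [sum(1 for c in s if c != 0) for s in simplified]
restored = [x for s in simplified for x in idft(s)]

print(nonzero, max(abs(a - b) for a, b in zip(signal, restored)))
```

Only the bins of the strong tone survive in each frame, and the restored signal differs from the original by about the amplitude of the weak tone that was thrown away.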

Lossy compression based on simplification in the frequency domain is the most convenient approach today in terms of the quality-to-compression ratio, and this picture will most likely persist in the future. One could even say that these technologies have already reached a limit in their development, although each specific implementation has its own mistakes or shortcomings. Still, it is a very interesting topic to study, so new products appear in this area fairly often. Codecs with open licenses, which require no payments to developers, are an especially interesting niche, because there is always demand for them.

Apart from sound quality and coding as such, another important issue has now become pressing: the performance of the algorithm. For example, most modern codecs operate with a delay of about 25 ms or more. This is critical for real-time audio transmission (telephony and so on). This does not apply, however, to one of the recent newcomers, which appeared at the end of last year and the start of this one: CELT ("Code-Excited Lapped Transform"). Its delay is from 3 to 9 ms. Moreover, the algorithm itself is simply and clearly explained at jmspeex.livejournal.com and www.celt-codec.org.
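To get a feel for what these delays mean, one can convert them into a budget of samples: a codec cannot wait for more audio than fits into the allowed delay. The sketch below does only this arithmetic. The sampling rates are assumptions chosen for illustration, and the function name `frame_budget` is made up.

```python
def frame_budget(rate_hz, delay_ms):
    """Largest whole number of samples that fits into the given delay."""
    return rate_hz * delay_ms // 1000

# Delay figures from the text; the sampling rates are assumed
for label, rate_hz, delay_ms in [
    ("typical codec, 25 ms", 44100, 25),
    ("CELT, 3 ms", 48000, 3),
    ("CELT, 9 ms", 48000, 9),
]:
    print(label, frame_budget(rate_hz, delay_ms), "samples")
```

The shorter the budget, the fewer samples a transform can work on at once, which is why low delay is a design constraint in its own right.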

For more detail, you can also refer to the documentation on MP3, Ogg Vorbis and Speex, as well as on veterans such as μ-law, ADPCM and so on.


In conclusion

So, in essence, everything in audio compression revolves around two basic schemes:

* Work in the time domain: simplification + compression.
* Transition to the frequency domain and work there: simplification + compression.
