SNOW Video Codec Specification Draft 20070103
=============================================

Intro:
======
This specification describes the snow syntax and semantics as well as
how to decode snow.
The decoding process is precisely described and any compliant decoder
MUST produce exactly the same output for a spec-conformant snow stream.
For encoding, though, any process which generates a stream compliant with
the syntactic and semantic requirements and which is decodable by
the process described in this spec shall be considered a conformant
snow encoder.

Definitions:
============

MUST    the specific part must be done to conform to this standard
SHOULD  it is recommended to be done that way, but not strictly required

ilog2(x) is the rounded down logarithm of x with base 2
ilog2(0) = 0
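A minimal C sketch of ilog2 as defined above. The function name matches the
spec; the unsigned argument type is an assumption of this example.

```c
/* Rounded-down base-2 logarithm with the convention ilog2(0) = 0.
 * The unsigned argument type is an assumption of this sketch. */
static int ilog2(unsigned x)
{
    int n = 0;
    while (x > 1) {   /* each halving adds one to the logarithm */
        x >>= 1;
        n++;
    }
    return n;         /* x == 0 and x == 1 both give 0 */
}
```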

Type definitions:
=================

b   1-bit range coded
u   unsigned scalar value range coded
s   signed scalar value range coded

Bitstream syntax:
=================

frame:
    header
    prediction
    residual

header:
    keyframe                            b   MID_STATE
    if(keyframe || always_reset)
        reset_contexts
    if(keyframe){
        version                         u   header_state
        always_reset                    b   header_state
        temporal_decomposition_type     u   header_state
        temporal_decomposition_count    u   header_state
        spatial_decomposition_count     u   header_state
        colorspace_type                 u   header_state
        chroma_h_shift                  u   header_state
        chroma_v_shift                  u   header_state
        spatial_scalability             b   header_state
        max_ref_frames-1                u   header_state
        qlogs
    }
    if(!keyframe){
        update_mc                       b   header_state
        if(update_mc){
            for(plane=0; plane<2; plane++){
                diag_mc                 b   header_state
                htaps/2-1               u   header_state
                for(i= p->htaps/2; i; i--)
                    |hcoeff[i]|         u   header_state
            }
        }
        update_qlogs                    b   header_state
        if(update_qlogs){
            spatial_decomposition_count u   header_state
            qlogs
        }
    }

    spatial_decomposition_type          s   header_state
    qlog                                s   header_state
    mv_scale                            s   header_state
    qbias                               s   header_state
    block_max_depth                     s   header_state

qlogs:
    for(plane=0; plane<2; plane++){
        quant_table[plane][0][0]        s   header_state
        for(level=0; level < spatial_decomposition_count; level++){
            quant_table[plane][level][1]s   header_state
            quant_table[plane][level][3]s   header_state
        }
    }

reset_contexts
    *_state[*]= MID_STATE

prediction:
    for(y=0; y<block_count_vertical; y++)
        for(x=0; x<block_count_horizontal; x++)
            block(0)

block(level):
    mvx_diff=mvy_diff=y_diff=cb_diff=cr_diff=0
    if(keyframe){
        intra=1
    }else{
        if(level!=max_block_depth){
            s_context= 2*left->level + 2*top->level + topleft->level + topright->level
            leaf                        b   block_state[4 + s_context]
        }
        if(level==max_block_depth || leaf){
            intra                       b   block_state[1 + left->intra + top->intra]
            if(intra){
                y_diff                  s   block_state[32]
                cb_diff                 s   block_state[64]
                cr_diff                 s   block_state[96]
            }else{
                ref_context= ilog2(2*left->ref) + ilog2(2*top->ref)
                if(ref_frames > 1)
                    ref                 u   block_state[128 + 1024 + 32*ref_context]
                mx_context= ilog2(2*abs(left->mx - top->mx))
                my_context= ilog2(2*abs(left->my - top->my))
                mvx_diff                s   block_state[128 + 32*(mx_context + 16*!!ref)]
                mvy_diff                s   block_state[128 + 32*(my_context + 16*!!ref)]
            }
        }else{
            block(level+1)
            block(level+1)
            block(level+1)
            block(level+1)
        }
    }

residual:
    residual2(luma)
    residual2(chroma_cr)
    residual2(chroma_cb)

residual2:
    for(level=0; level<spatial_decomposition_count; level++){
        if(level==0)
            subband(LL, 0)
        subband(HL, level)
        subband(LH, level)
        subband(HH, level)
    }

subband:
    FIXME

Tag description:
----------------

version
    0
    this MUST NOT change within a bitstream

always_reset
    if 1 then the range coder contexts will be reset after each frame

temporal_decomposition_type
    0

temporal_decomposition_count
    0

spatial_decomposition_count
    FIXME

colorspace_type
    0
    this MUST NOT change within a bitstream

chroma_h_shift
    log2(luma.width / chroma.width)
    this MUST NOT change within a bitstream

chroma_v_shift
    log2(luma.height / chroma.height)
    this MUST NOT change within a bitstream

spatial_scalability
    0

max_ref_frames
    maximum number of reference frames
    this MUST NOT change within a bitstream

update_mc
    indicates that motion compensation filter parameters are stored in the
    header

diag_mc
    flag to enable faster diagonal interpolation
    this SHOULD be 1 unless it turns out to be covered by a valid patent

htaps
    number of half pel interpolation filter taps, MUST be even, >0 and <10

hcoeff
    half pel interpolation filter coefficients, hcoeff[0] are the 2 middle
    coefficients [1] are the next outer ones and so on, resulting in a filter
    like: ...eff[2], hcoeff[1], hcoeff[0], hcoeff[0], hcoeff[1], hcoeff[2] ...
    the sign of the coefficients is not explicitly stored but alternates
    after each coeff and coeff[0] is positive, so ...,+,-,+,-,+,+,-,+,-,+,...
    hcoeff[0] is not explicitly stored but found by subtracting the sum
    of all stored coefficients with signs from 32
    hcoeff[0]= 32 - hcoeff[1] - hcoeff[2] - ...
    a good choice for hcoeff and htaps is
        htaps= 6
        hcoeff={40,-10,2}
    an alternative which requires more computations at both encoder and
    decoder side and may or may not be better is
        htaps= 8
        hcoeff={42,-14,6,-2}
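The hcoeff rules above (alternating signs, implicit hcoeff[0]) can be sketched
in C as follows. The function name rebuild_hcoeff and the array layout are
assumptions of this example; only the sign rule and the "32 minus the signed
sum" rule come from the text.

```c
/* Rebuild signed half-pel coefficients from stored magnitudes.
 * mag[i-1] holds |hcoeff[i]| for i = 1..n (hypothetical layout);
 * out must have room for n+1 values, out[0] is the middle coefficient. */
static void rebuild_hcoeff(const int *mag, int n, int *out)
{
    int sum = 0;
    for (int i = 1; i <= n; i++) {
        /* signs alternate outwards from the positive middle coefficient */
        out[i] = (i & 1) ? -mag[i - 1] : mag[i - 1];
        sum += out[i];
    }
    out[0] = 32 - sum;   /* hcoeff[0]= 32 - hcoeff[1] - hcoeff[2] - ... */
}
```

With the stored magnitudes {10, 2} this gives {40,-10,2}, and with {14, 6, 2}
it gives {42,-14,6,-2}, the two filters suggested above.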

ref_frames
    minimum of the number of available reference frames and max_ref_frames
    for example the first frame after a key frame always has ref_frames=1

spatial_decomposition_type
    wavelet type
    0 is a 9/7 symmetric compact integer wavelet
    1 is a 5/3 symmetric compact integer wavelet
    others are reserved
    stored as delta from last, last is reset to 0 if always_reset || keyframe

qlog
    quality (logarithmic quantizer scale)
    stored as delta from last, last is reset to 0 if always_reset || keyframe

mv_scale
    stored as delta from last, last is reset to 0 if always_reset || keyframe
    FIXME check that everything works fine if this changes between frames

qbias
    dequantization bias
    stored as delta from last, last is reset to 0 if always_reset || keyframe

block_max_depth
    maximum depth of the block tree
    stored as delta from last, last is reset to 0 if always_reset || keyframe

quant_table
    quantization table

Highlevel bitstream structure:
=============================
 --------------------------------------------
|                   Header                   |
 --------------------------------------------
|    ------------------------------------    |
|   |               Block0               |   |
|   |             split?                 |   |
|   |     yes              no            |   |
|   |  .........         intra?          |   |
|   | : Block01 :    yes         no      |   |
|   | : Block02 :  .......   ..........  |   |
|   | : Block03 : :  y DC : : ref index: |   |
|   | : Block04 : : cb DC : : motion x : |   |
|   |  .........  : cr DC : : motion y : |   |
|   |              .......   ..........  |   |
|    ------------------------------------    |
|    ------------------------------------    |
|   |               Block1               |   |
|                    ...                     |
 --------------------------------------------
| ------------   ------------   ------------ |
|| Y subbands | | Cb subbands| | Cr subbands||
||  ---  ---  | |  ---  ---  | |  ---  ---  ||
|| |LL0||HL0| | | |LL0||HL0| | | |LL0||HL0| ||
||  ---  ---  | |  ---  ---  | |  ---  ---  ||
||  ---  ---  | |  ---  ---  | |  ---  ---  ||
|| |LH0||HH0| | | |LH0||HH0| | | |LH0||HH0| ||
||  ---  ---  | |  ---  ---  | |  ---  ---  ||
||  ---  ---  | |  ---  ---  | |  ---  ---  ||
|| |HL1||LH1| | | |HL1||LH1| | | |HL1||LH1| ||
||  ---  ---  | |  ---  ---  | |  ---  ---  ||
||  ---  ---  | |  ---  ---  | |  ---  ---  ||
|| |HH1||HL2| | | |HH1||HL2| | | |HH1||HL2| ||
||    ...     | |    ...     | |    ...     ||
| ------------   ------------   ------------ |
 --------------------------------------------

Decoding process:
=================

                                         ------------
                                        |            |
                                        |  Subbands  |
                   ------------         |            |
                  |            |         ------------
                  |  Intra DC  |               |
                  |            |    LL0 subband prediction
                   ------------                |
                                \        Dequantization
 -------------------             \             |
|  Reference frames |             \           IDWT
| -------   ------- |    Motion    \           |
||Frame 0| |Frame 1|| Compensation  .   OBMC   v      -------
| -------   ------- | --------------. \------> + --->|Frame n|-->output
| -------   ------- |                                 -------
||Frame 2| |Frame 3||<----------------------------------/
|        ...        |
 -------------------

Range Coder:
============
FIXME

Neighboring Blocks:
===================
left and top are set to the respective blocks unless they are outside of
the image in which case they are set to the Null block

top-left is set to the top left block unless it is outside of the image in
which case it is set to the left block

if this block has no larger parent block or it is at the left side of its
parent block and the top right block is not outside of the image then the
top right block is used for top-right else the top-left block is used

Null block
y,cb,cr are 128
level, ref, mx and my are 0

Motion Vector Prediction:
=========================
1. the motion vectors of all the neighboring blocks are scaled to
compensate for the difference of reference frames

scaled_mv= (mv * (256 * (current_reference+1) / (mv.reference+1)) + 128)>>8

2. the median of the scaled left, top and top-right vectors is used as
motion vector prediction

3. the used motion vector is the sum of the predictor and
   (mvx_diff, mvy_diff)*mv_scale
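The three steps above can be sketched in C for a single vector component.
The function names (median3, scale_mv, reconstruct_mv) and the flat argument
lists are assumptions of this example. The sketch also assumes that >> on a
negative int is an arithmetic shift, which C leaves implementation-defined.

```c
/* Median of three integers. */
static int median3(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; } /* ensure a <= b */
    if (b > c) b = c;                       /* b = min(b, c) */
    return a > b ? a : b;                   /* max(a, min(b, c)) */
}

/* Step 1: rescale one component of a neighbor's vector, which points into
 * reference frame mv_ref, to the current block's reference frame cur_ref. */
static int scale_mv(int mv, int mv_ref, int cur_ref)
{
    int factor = 256 * (cur_ref + 1) / (mv_ref + 1);
    return (mv * factor + 128) >> 8;
}

/* Steps 2 and 3: median predictor of the already scaled left, top and
 * top-right components, plus the decoded difference times mv_scale. */
static int reconstruct_mv(int left, int top, int topright,
                          int diff, int mv_scale)
{
    int pred = median3(left, top, topright);
    return pred + diff * mv_scale;
}
```

For example, a neighbor component of 8 that refers to frame 1 is scaled to 4
when the current block refers to frame 0.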

Intra DC Prediction:
====================
the luma and chroma values of the left block are used as predictors

the used luma and chroma is the sum of the predictor and y_diff, cb_diff, cr_diff
to reverse this in the decoder apply the following:
block[y][x].dc[0] = block[y][x-1].dc[0] +  y_diff;
block[y][x].dc[1] = block[y][x-1].dc[1] + cb_diff;
block[y][x].dc[2] = block[y][x-1].dc[2] + cr_diff;
block[*][-1].dc[*]= 128;

Motion Compensation:
====================

Halfpel interpolation:
----------------------
halfpel interpolation is done by convolution with the halfpel filter stored
in the header:

horizontal halfpel samples are found by
H1[y][x] =    hcoeff[0]*(F[y][x  ] + F[y][x+1])
            + hcoeff[1]*(F[y][x-1] + F[y][x+2])
            + hcoeff[2]*(F[y][x-2] + F[y][x+3])
            + ...
h1[y][x] = (H1[y][x] + 32)>>6;

vertical halfpel samples are found by
H2[y][x] =    hcoeff[0]*(F[y  ][x] + F[y+1][x])
            + hcoeff[1]*(F[y-1][x] + F[y+2][x])
            + ...
h2[y][x] = (H2[y][x] + 32)>>6;

vertical+horizontal halfpel samples are found by
H3[y][x] =    hcoeff[0]*(H2[y][x  ] + H2[y][x+1])
            + hcoeff[1]*(H2[y][x-1] + H2[y][x+2])
            + ...
or equivalently by
H3[y][x] =    hcoeff[0]*(H1[y  ][x] + H1[y+1][x])
            + hcoeff[1]*(H1[y-1][x] + H1[y+2][x])
            + ...
h3[y][x] = (H3[y][x] + 2048)>>12;

               F       H1      F
               |       |       |
               |       |       |
               |       |       |
               F       H1      F
               |       |       |
               |       |       |
               |       |       |
   F-------F-------F-> H1<-F-------F-------F
               v       v       v
              H2      H3      H2
               ^       ^       ^
   F-------F-------F-> H1<-F-------F-------F
               |       |       |
               |       |       |
               |       |       |
               F       H1      F
               |       |       |
               |       |       |
               |       |       |
               F       H1      F


unavailable fullpel samples (outside the picture for example) shall be equal
to the closest available fullpel sample
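A minimal C sketch of the horizontal halfpel case, including the replication
of unavailable fullpel samples. The names halfpel_h and clampi, the row-major
8-bit plane layout and the ncoeff parameter are assumptions of this example.
The result is not clipped to the sample range here, since clipping is not
specified above.

```c
/* Clamp an index into [0, n-1]; this models "closest available sample". */
static int clampi(int v, int n)
{
    return v < 0 ? 0 : (v >= n ? n - 1 : v);
}

/* Horizontal halfpel sample h1[y][x], between F[y][x] and F[y][x+1].
 * frame is a row-major w*h plane, hcoeff[0..ncoeff-1] are the signed
 * filter coefficients with hcoeff[0] being the middle one. */
static int halfpel_h(const unsigned char *frame, int w, int h,
                     int x, int y, const int *hcoeff, int ncoeff)
{
    const unsigned char *row = frame + clampi(y, h) * w;
    int sum = 0;                                   /* this is H1[y][x] */
    for (int i = 0; i < ncoeff; i++) {
        int left  = row[clampi(x - i, w)];         /* F[y][x-i]   */
        int right = row[clampi(x + 1 + i, w)];     /* F[y][x+1+i] */
        sum += hcoeff[i] * (left + right);
    }
    return (sum + 32) >> 6;
}
```

On a flat area the filter returns the input value, because the coefficients
of both suggested filters sum to 32 per side (64 in total).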

Smaller pel interpolation:
--------------------------
if diag_mc is set then points which lie on a line between 2 vertically,
horizontally or diagonally adjacent halfpel points shall be interpolated
linearly with rounding to nearest and halfway values rounded up.
points which lie on 2 diagonals at the same time should only use the one
diagonal not containing the fullpel point



           F-->O---q---O<--h1->O---q---O<--F
           v \           / v \           / v
           O   O       O   O   O       O   O
           |         /     |     \         |
           q       q       q       q       q
           |     /         |         \     |
           O   O       O   O   O       O   O
           ^ /           \ ^ /           \ ^
          h2-->O---q---O<--h3->O---q---O<--h2
           v \           / v \           / v
           O   O       O   O   O       O   O
           |     \         |         /     |
           q       q       q       q       q
           |         \     |     /         |
           O   O       O   O   O       O   O
           ^ /           \ ^ /           \ ^
           F-->O---q---O<--h1->O---q---O<--F



the remaining points shall be bilinearly interpolated from the
up to 4 surrounding halfpel and fullpel points, again rounding should be to
nearest and halfway values rounded up

compliant snow decoders MUST support 1-1/8 pel luma and 1/2-1/16 pel chroma
interpolation at least

Overlapped block motion compensation:
-------------------------------------
FIXME

LL band prediction:
===================
Each sample in the LL0 subband is predicted by the median of the left, top and
left+top-topleft samples, samples outside the subband shall be considered to
be 0. To reverse this prediction in the decoder apply the following.
for(y=0; y<height; y++){
    for(x=0; x<width; x++){
        sample[y][x] += median(sample[y-1][x],
                               sample[y][x-1],
                               sample[y-1][x]+sample[y][x-1]-sample[y-1][x-1]);
    }
}
sample[-1][*]=sample[*][-1]= 0;
width,height here are the width and height of the LL0 subband not of the final
video
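The loop above can be written as self-contained C by testing the borders
explicitly instead of reading sample[-1]. The names ll_unpredict and median3
and the row-major int buffer are assumptions of this example.

```c
/* Median of three integers. */
static int median3(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; } /* ensure a <= b */
    if (b > c) b = c;                       /* b = min(b, c) */
    return a > b ? a : b;                   /* max(a, min(b, c)) */
}

/* Reverse the LL0 prediction in place. s is a row-major width*height
 * buffer of decoded residuals; samples outside the subband count as 0. */
static void ll_unpredict(int *s, int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int top     = y > 0          ? s[(y - 1) * width + x]     : 0;
            int left    = x > 0          ? s[y * width + x - 1]       : 0;
            int topleft = x > 0 && y > 0 ? s[(y - 1) * width + x - 1] : 0;
            s[y * width + x] += median3(top, left, top + left - topleft);
        }
    }
}
```

The raster scan order matters: each sample uses neighbors that have already
been reconstructed.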

Dequantization:
===============
FIXME

Wavelet Transform:
==================

Snow supports 2 wavelet transforms, the symmetric biorthogonal 5/3 integer
transform and an integer approximation of the symmetric biorthogonal 9/7
Daubechies wavelet.

2D IDWT (inverse discrete wavelet transform)
--------------------------------------------
The 2D IDWT applies a 2D filter recursively, each time combining the
4 lowest frequency subbands into a single subband until only 1 subband
remains.
The 2D filter is done by first applying a 1D filter in the vertical direction
and then applying it in the horizontal one.
 ---------------    ---------------    ---------------    ---------------
|LL0|HL0|       |  |   |   |       |  |       |       |  |       |       |
|---+---|  HL1  |  | L0|H0 |  HL1  |  |  LL1  |  HL1  |  |       |       |
|LH0|HH0|       |  |   |   |       |  |       |       |  |       |       |
|-------+-------|->|-------+-------|->|-------+-------|->|   L1  |  H1   |->...
|       |       |  |       |       |  |       |       |  |       |       |
|  LH1  |  HH1  |  |  LH1  |  HH1  |  |  LH1  |  HH1  |  |       |       |
|       |       |  |       |       |  |       |       |  |       |       |
 ---------------    ---------------    ---------------    ---------------

1D Filter:
----------
1. interleave the samples of the low and high frequency subbands like
s={L0, H0, L1, H1, L2, H2, L3, H3, ... }
note, this can end with a L or a H, the number of elements shall be w
s[-1] shall be considered equivalent to s[1  ]
s[w ] shall be considered equivalent to s[w-2]

2. perform the lifting steps in order as described below

5/3 Integer filter:
1. s[i] -= (s[i-1] + s[i+1] + 2)>>2; for all even i < w
2. s[i] += (s[i-1] + s[i+1]    )>>1; for all odd  i < w

\ | /|\ | /|\ | /|\ | /|\
 \|/ | \|/ | \|/ | \|/ |
  +  |  +  |  +  |  +  |   -1/4
 /|\ | /|\ | /|\ | /|\ |
/ | \|/ | \|/ | \|/ | \|/
  |  +  |  +  |  +  |  +   +1/2
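A minimal C sketch of the 5/3 lifting steps on an already interleaved array,
with the border rule s[-1]=s[1], s[w]=s[w-2]. The names mirror and idwt53_1d
are assumptions of this example, as is the requirement w >= 2. It also
assumes that >> on a negative int is an arithmetic shift, which C leaves
implementation-defined.

```c
#include <assert.h>

/* Map an out-of-range index back into [0, w-1]: -1 -> 1, w -> w-2. */
static int mirror(int i, int w)
{
    if (i < 0)
        return -i;
    if (i >= w)
        return 2 * (w - 1) - i;
    return i;
}

/* Inverse 5/3 integer filter, applied in place to the interleaved
 * samples s[0..w-1] = {L0, H0, L1, H1, ...}. */
static void idwt53_1d(int *s, int w)
{
    assert(w >= 2);   /* the mirroring rule needs at least 2 samples */

    /* lifting step 1: even (low-pass) positions */
    for (int i = 0; i < w; i += 2)
        s[i] -= (s[mirror(i - 1, w)] + s[mirror(i + 1, w)] + 2) >> 2;

    /* lifting step 2: odd (high-pass) positions, using the updated evens */
    for (int i = 1; i < w; i += 2)
        s[i] += (s[mirror(i - 1, w)] + s[mirror(i + 1, w)]) >> 1;
}
```

Both steps can run in place because each one reads only samples of the other
parity.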

snows 9/7 Integer filter:
1. s[i] -= (3*(s[i-1] + s[i+1])         + 4)>>3; for all even i < w
2. s[i] -=     s[i-1] + s[i+1]                 ; for all odd  i < w
3. s[i] += (   s[i-1] + s[i+1] + 4*s[i] + 8)>>4; for all even i < w
4. s[i] += (3*(s[i-1] + s[i+1])            )>>1; for all odd  i < w

\ | /|\ | /|\ | /|\ | /|\
 \|/ | \|/ | \|/ | \|/ |
  +  |  +  |  +  |  +  |   -3/8
 /|\ | /|\ | /|\ | /|\ |
/ | \|/ | \|/ | \|/ | \|/
 (|  + (|  + (|  + (|  +   -1
\ + /|\ + /|\ + /|\ + /|\  +1/4
 \|/ | \|/ | \|/ | \|/ |
  +  |  +  |  +  |  +  |   +1/16
 /|\ | /|\ | /|\ | /|\ |
/ | \|/ | \|/ | \|/ | \|/
  |  +  |  +  |  +  |  +   +3/2

optimization tips:
following are exactly identical
(3a)>>1 == a + (a>>1)
(a + 4b + 8)>>4 == ((a>>2) + b + 2)>>2
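The two identities can be checked by brute force over a small range. The
helper name shift_identities_hold is an assumption of this example. Like the
lifting steps themselves, it assumes that >> on a negative int is an
arithmetic shift (rounding toward minus infinity), which C leaves
implementation-defined.

```c
/* Return 1 if both identities hold for all a and b in [lo, hi], else 0. */
static int shift_identities_hold(int lo, int hi)
{
    for (int a = lo; a <= hi; a++) {
        /* (3a)>>1 == a + (a>>1) */
        if (((3 * a) >> 1) != a + (a >> 1))
            return 0;
        for (int b = lo; b <= hi; b++) {
            /* (a + 4b + 8)>>4 == ((a>>2) + b + 2)>>2 */
            if (((a + 4 * b + 8) >> 4) != (((a >> 2) + b + 2) >> 2))
                return 0;
        }
    }
    return 1;
}
```

The right-hand forms need fewer intermediate bits, which is what makes them
useful for the 16bit implementation.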

16bit implementation note:
The IDWT can be implemented with 16bits, but this requires some care to
prevent overflows, the following list gives the minimum number of bits needed
for some terms
1. lifting step
A= s[i-1] + s[i+1]                              16bit
3*A + 4                                         18bit
A + (A>>1) + 2                                  17bit

3. lifting step
s[i-1] + s[i+1]                                 17bit

4. lifting step
3*(s[i-1] + s[i+1])                             17bit

TODO:
=====
Important:
finetune initial contexts
flip wavelet?
try to use the wavelet transformed predicted image (motion compensated image) as context for coding the residual coefficients
try the MV length as context for coding the residual coefficients
use extradata for stuff which is in the keyframes now?
the MV median predictor is patented IIRC
implement per picture halfpel interpolation
try different range coder state transition tables for different contexts

Not Important:
compare the 6 tap and 8 tap hpel filters (psnr/bitrate and subjective quality)
spatial_scalability b vs u (!= 0 breaks syntax anyway so we can add a u later)


Credits:
========
Michael Niedermayer
Loren Merritt


Copyright:
==========
GPL + GFDL + whatever is needed to make this a RFC