Siggraph 2014: Day 2

AUG 11 2014, MONDAY

MONDAY 1000..1100

  • BOF First Steps in Large-Scale Rendering
    • [susan(pixar, white blonde), xx(), xx, xx, xx, david(pixar 19 years)] panel
    • pixar david, 19
    • ruca, from cube
    • ?? from singapore
    • **, toonbox
    • **, google, software at pixar and animallogic
    • susan: pixar global rendering
    • Mark Hills
    • David Laur
    • [email protected]
  • big farm problem
  • selling software for the small render farms
    • growing your farm
    • some caution
      • heterogeneous workers
      • HPC (high performance computing)
  • very often it’s linux
  • multiple levels of caches in pixar render farm
    • go for memory upgrade
    • it’ll be better on IO node
  • using farm for computing, processing, simulation, general purposes
    • 95% is rendering in pixar -> because of the GI
    • 5% is simulation, using all IO
  • segment the pool
  • remote rendering
    • route the low IO
    • desktop is high end gpus
    • linux =>
  • who’s doing GPU based farm stuff
    • use GPU based farm
  • one frame per core
  • 4G per slot -> number of threads
  • bottleneck -> dropping
    • finding the sweet spot
  • killing the machine
    • keeping tabs on the production
  • multiple layers of controls
    • most of the queueing
    • multiple jobs coming in a smaller scope
  • 8 years ago, hoped renderers would go multithreaded
    • the reality -> it doesn’t scale
    • 24 core machines into 6-8 blocks
  • different versions and renders
    • 60% utilization=> cpu
    • threading control
    • 70-80%
  • always battling cpu utilization with the vray settings
    • renderman, arnold
  • licensing ->
    • 1 license per machine -> packing jobs onto one machine
  • rendering in the cloud
    • local farm busy
    • during the spike -> send to the remote farm
  • keep high io local
    • pick the high io render
  • how to pick high io???
  • data transfer to the remote is an issue..
    • make it fast to get it back and forth
    • latency between the managed render service ??
    • remote mounting from the
  • traffic is
    • syncing based file system
    • asset management is structured.
  • checking out in the cloud
    • 10Mbit -> typical studio
    • speed of the turnaround
    • 1Gbit -> at least for a studio
    • i think we have 20Gbit
  • pixar -> centralized in house *
  • small studio needs a turn key solution ???
    • provision them remote
  • sync or re-upload
  • upload to managed services -> recommended??
    • upload the files through the web interface -> not good
    • too slow
    • managed services don’t give you a lot of support; email, that’s it
  • cost of the administration
  • configuration management software
  • multithreading on the virtual machines
  • provisioning the vm
    • get the core.
      • guaranteed cores, io -> provision
    • nfs over the network -> latency and slowness, cache on vps
  • is nfs a real solution??
  • segregate the traffic -> read from texture, write to other file
  • traffic graph -> on the farm mount, know right away which file system is saturating the network
  • nfs
  • solution for the small company,
  • caching and latency
      • capital expense vs operating cost -> tax credits, decision where to go
    • financial advantage,, not only technical values
  • shortlist of priorities *
  • software tagging on the worker
  • scripts needed for rendering
    • how to wrap around?
    • send all renders as one big script?
  • wrapper script, lock
    • artists setup and the farm
  • design and the pipeline
  • opensource scheduler??
    • 2 years ago -> like 10
    • opendrag,,
  • good reference
    • best support of the vendors of the queuing software
  • amazon mka

MONDAY 0530..0700

  • threading
    • threading => intel tbb: all of them use it
    • openmp
    • c++ thread
    • boost threading
  • problems
    • standardize on tbb
  • pixar’s system => presto
  • the reality is
    • use tbb
  • multithread vfx
  • multithreadingandvfx.com
    • get the contact information
  • using tbb ->
  • anyone using tbb
    • gaming companies are using tbb on multiple gaming projects
    • platform specific apis
    • internal framework wrapper
    • os differences
    • how do you fine-tune between the different platforms
      • follow the least constraints
      • what’s your targets? -> 4 cores?
      • job based, use core until the jobs done
      • until the stop is locked
      • game
      • do you know anything that will scale to?
      • 12 or 16 cores?
  • why not tbb?
    • openmp is easier
  • tbb, not good for length
    • microsoft tpl -> we use opensource
  • c++ standard will have something similar
    • similar to c++ 0x
  • james from intel
    • a few thoughts from intel’s perspective
    • true industry implementations
      • open cl, openmp port io,
    • standard is slow
      • tbb or concurrent collections?
      • 2007 -> tbb, filling the gaps
    • lambda in c++11
  • gpu computing
    • cuda vs opencl
    • sebastian from nvidia
    • 4 categories of reasons to use gpu computing
      • hardware: unified virtual memory. get the variables, fully debuggable hardware.
      • tools to debug memory,
      • profiling
      • collection of libs
      • thrust?, has cpu fallback
      • optix lib, raytracing on the gpu
    • what’s holding it up?
      • industry standards? -> monopoly
      • how serious
      • sweet spot?
    • highly unpredictable
    • why not use gpu?
    • unified memory -> mem and gpu ram
    • like to debug.
      • gpu = 100x of cpu
      • honest comparison cpu vs gpu: 5 to 10x
  • rasmus from disney animation studios
    • gpu 2x-4x speedup
      • real optimization on cpu
      • got 20 to 30x on the cpu
    • real problem: not doing much on gpu
      • volatility of the environment, hardware and display drivers
    • was painful 2-3 years on linux
    • design algorithms that can scale
      • 10s of 1000s of threads
    • why not use cuda?
      • code maintenance cost
      • design first on cpu
    • weta digital
    • just optimizing and profiling
  • really comes down to standard platforms
    • standard c++, opencl, tbb
  • cuda is getting mature
  • larger data set -> 28G ram
    • need a bigger memory
    • trying
  • 10G memory footprint on windows.
  • gpu with 24G ram
  • why do we need gpu?
  • care about the energy efficiency?
    • data center cooling problems
    • battery hot, cooling
  • performance and cooling?
  • power distribution in the data center
  • DOE, exascale computing
  • 1000 cores -> parallel programming
  • what goes inside of the cpu
  • energy efficient -> time efficient
    • avoid moving the data around
    • optimizing the data movement
    • main memory vs cpu or gpu
    • data movement within the chip, l1, l2 and registers
  • triple nested for loop to pass the assignment ->
    • intel xxx using what?
    • assignment from the code
  • intel vtune amplifier -> cpu and memory bandwidth per socket
  • mpc : layer upon layer *
  • autodesk maya animation
    • 95%
    • using tbb
    • dreamworks session -> two way to improve
      • graph level parallelism =>
      • node level parallelism =>
    • motion builder -> make it fast?
      • threading at the beginning
      • internal scheduler
      • thread classes
      • split and running as they want
    • story tool
      • not happy with the speed
      • optimize the speed
    • autodesk to dreamworks
      • dreamworks premo
      • built from scratch
      • 20 years old
      • thread safe
      • keeping it thread safe
    • whole lot of unit test
      • threading bug in the code
      • you can fix it
      • used tbb a lot -> hitting some limitations
    • intel new thing -> sbb (vectorization)
      • check it out after siggraph