Michael's Website

April 23, 2019 10:00 AM

Modeling at Scale in Systematic Trading

10:00-11:00 Modeling at Scale in Systematic Trading

Scott Clark, CEO, SigOpt
Nick Payton (organizer)
MOE, Metric Optimization Engine
from Cornell Univ
optimization
Bayesian global optimization

Asset Management

optimization
The Quants Run Wall Street Now => WSJ
$300B+ assets under management
SigOpt
Two Sigma (asset manager) <=> SigOpt

Lessons

Invest in a reproducible process
Balance flexibility with standardization
Divide labor between humans & machines
Maximize resource utilization
Prioritize performance (broadly)

1. Invest in a reproducible process

5 pillars: data, modeling, simulation, optimization, execution
Data
- historical stock prices
- company data
- company news
- social data
- location data
- satellite data (what shows in the parking lot)
modeling
- picking the right tool for the job
simulation
- backtest must avoid:
  - overfitting bias
  - look-ahead bias
  - survivorship bias
  - p-hacking bias
  - metric bias
- defining the methods you trust
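
Look-ahead bias is the easiest of these to show in code. The sketch below is one common guard, a walk-forward split in which every test index comes strictly after every training index. The function name `walk_forward_splits` and the window sizes are assumptions for illustration, not something shown in the talk.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, train_size: int, test_size: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs over time-ordered samples.

    Every test index is strictly later than every train index, so the
    model is never fitted on data from the period it is evaluated on.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train_end = start + train_size
        train_idx = list(range(start, train_end))
        test_idx = list(range(train_end, train_end + test_size))
        yield train_idx, test_idx
        start += test_size  # roll the window forward by one test period


# Hypothetical example: 10 time-ordered observations.
splits = list(walk_forward_splits(n_samples=10, train_size=4, test_size=2))
for train_idx, test_idx in splits:
    print(train_idx, "->", test_idx)
```

This only addresses look-ahead bias; overfitting, survivorship, p-hacking and metric bias need their own checks.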
optimization
- how to tune the hyperparameters
- academic: grid search, random search
- particle based methods
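
To make the two academic baselines concrete, here is a minimal sketch of grid search and random search over the same two hyperparameters. The objective `toy_objective` is a made-up stand-in for a backtest metric, and the parameter names and ranges are assumptions chosen for illustration.

```python
import itertools
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def toy_objective(params: Params) -> float:
    """Hypothetical metric (higher is better), peaking at lr=0.1, dropout=0.3."""
    return -((params["learning_rate"] - 0.1) ** 2) - (params["dropout"] - 0.3) ** 2


def grid_search(
    objective: Callable[[Params], float], grid: Dict[str, List[float]]
) -> Tuple[Params, float]:
    """Evaluate every combination of the listed values."""
    names = list(grid)
    candidates = [
        dict(zip(names, values)) for values in itertools.product(*grid.values())
    ]
    best = max(candidates, key=objective)
    return best, objective(best)


def random_search(
    objective: Callable[[Params], float],
    bounds: Dict[str, Tuple[float, float]],
    n_trials: int,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Evaluate n_trials points drawn uniformly inside the bounds."""
    rng = random.Random(seed)
    candidates = [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        for _ in range(n_trials)
    ]
    best = max(candidates, key=objective)
    return best, objective(best)


# Same budget for both: 9 evaluations.
grid_best, grid_score = grid_search(
    toy_objective,
    {"learning_rate": [0.001, 0.01, 0.1], "dropout": [0.0, 0.25, 0.5]},
)
rand_best, rand_score = random_search(
    toy_objective,
    {"learning_rate": (0.001, 0.1), "dropout": (0.0, 0.5)},
    n_trials=9,
)
print("grid:  ", grid_best, grid_score)
print("random:", rand_best, rand_score)
```

Grid search cost grows multiplicatively with each added hyperparameter, which is one reason the talk moves on to other methods.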
execution
- once you have a model you trust
- high frequency trading
- market making
- statistical arbitrage
- rebalancing
- portfolio optimization

2. Balance flexibility with standardization

how to continue the advance
framework:
- solutions: standard or proprietary per firm?
- innovation: incremental or existential for firm?
- status: still evolving or fully established?
modeling:
- sklearn, pytorch, tensorflow

3. Divide labor between humans & machines

what humans are bad at:
- high-dimensional optimization
- tuning the knobs on the airplane
Hyperparameter Optimization
Optimizing the optimizer problem
focus on your model, leave hyperparameter optimization to us
pic 8: pros and cons of the different approaches
random search -> nvidia
evolutionary algo -> google
bayesian optimization
you focus on data, model and backtest
what is sent: learning rate, number of hidden layers
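
The division of labor above can be sketched as a suggest/observe loop: the optimizer only ever sees hyperparameter values and a single metric, while the data, model and backtest stay on the user's side. Everything here is hypothetical. `BlackBoxOptimizer` is not SigOpt's API, and it samples at random as a placeholder where a real service would use Bayesian optimization.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Suggestion:
    """The only information that crosses the boundary to the optimizer."""
    learning_rate: float
    num_hidden_layers: int


class BlackBoxOptimizer:
    """Hypothetical optimizer; random sampling stands in for the real strategy."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self.history: List[Tuple[Suggestion, float]] = []

    def suggest(self) -> Suggestion:
        return Suggestion(
            learning_rate=10 ** self._rng.uniform(-4, -1),  # log-uniform
            num_hidden_layers=self._rng.randint(1, 5),
        )

    def observe(self, suggestion: Suggestion, metric: float) -> None:
        self.history.append((suggestion, metric))

    def best(self) -> Tuple[Suggestion, float]:
        return max(self.history, key=lambda pair: pair[1])


def train_and_backtest(learning_rate: float, num_hidden_layers: int) -> float:
    """User-side work. A toy formula stands in for training plus backtest."""
    return -abs(math.log10(learning_rate) + 2) - abs(num_hidden_layers - 3)


optimizer = BlackBoxOptimizer(seed=0)
for _ in range(20):
    s = optimizer.suggest()
    metric = train_and_backtest(s.learning_rate, s.num_hidden_layers)
    optimizer.observe(s, metric)  # only the metric value is reported back

best_suggestion, best_metric = optimizer.best()
print(best_suggestion, best_metric)
```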

4. Maximize resource utilization

build or buy
well-maintained products:
- apache/spark, 19443 stars
- sheffieldml/gpyopt => 369 stars (sigopt competitor)
Two Sigma
- asynchronous parallelization is critical for resource utilization
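
A minimal sketch of what asynchronous parallelization means here, assuming trials with uneven run times: a new trial is submitted the moment any worker frees up, instead of waiting for a whole batch to finish. The `evaluate` function, its sleep-based timing and the random sampling of `learning_rate` are illustrative assumptions.

```python
import random
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import List, Tuple


def evaluate(trial_id: int, learning_rate: float) -> Tuple[int, float]:
    """Hypothetical trial; the sleep simulates uneven training times."""
    time.sleep(0.01 * (1 + trial_id % 3))
    return trial_id, -((learning_rate - 0.1) ** 2)


def run_async(n_trials: int, n_workers: int, seed: int = 0) -> List[Tuple[int, float]]:
    """Keep all workers busy by refilling as soon as any trial completes."""
    rng = random.Random(seed)
    results: List[Tuple[int, float]] = []
    next_id = 0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        pending = set()
        while next_id < n_trials or pending:
            # Fill every idle worker slot with a fresh trial.
            while next_id < n_trials and len(pending) < n_workers:
                pending.add(pool.submit(evaluate, next_id, rng.uniform(0.0, 1.0)))
                next_id += 1
            # Block only until the first trial finishes, not the whole batch.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            results.extend(future.result() for future in done)
    return results


results = run_async(n_trials=8, n_workers=3)
print(sorted(results))
```

With batch-synchronous scheduling the fast workers would sit idle until the slowest trial in each batch finished. In a real setup each completed result would also feed the next suggestion.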

5. Prioritize performance (broadly defined)

performance (table stakes)
better, faster, cheaper
pic 11: Two Sigma got better results 8x faster
paper: https://sigopt.com/research
- A Stratified Analysis of Bayesian Optimization Methods
- helping real world problems
- car classification problem
  - stanford dataset
  - tuning the hyperpa
pic 12, performance table
entirely new capabilities

Thank you

machine-learning

asset management
