Posted on: November 15, 2017 at 02:00 AM

Facebook Machine Learning Open House

Facebook ML Open House

2017-11-14 / Facebook Bldg 13


AI at Facebook

25% ML engineers

Next-gen challenges

  • AI at massive scale: trillions of examples per model
  • extreme vision: a billion visual classes
  • how do we solve problems like false news and polarization?
  • how do we build closer and safer communities?
  • how do we connect you to the best stories?
  • ads team: serving the best ads to you
  • language barriers
  • natural conversational interfaces
  • 6 speakers and posters

1. Applying Machine Learning to Ads Integrity

Emanuel Strauss / Engineering Manager

Ads, Policies and Quality

example of low quality ads

Ad Review Overview

Detecting low-quality ads: ad submission -> feature engineering -> model scoring -> human computing decision -> ads ranking & delivery
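
The flow above can be sketched as a toy pipeline. Everything in it is hypothetical: the features, the `score_ad` heuristic that stands in for a trained model, and the `REVIEW_CAPACITY` budget are illustrative assumptions, not the system described in the talk. It only shows how a model score might route the riskiest ads to a limited pool of human reviewers before delivery.

```python
from dataclasses import dataclass

REVIEW_CAPACITY = 2  # hypothetical: ads that humans can review per batch


@dataclass
class Ad:
    ad_id: str
    text: str


def extract_features(ad):
    """Toy feature engineering on ad text (assumed features)."""
    words = ad.text.split()
    return {
        "exclamations": ad.text.count("!"),
        "caps_ratio": sum(w.isupper() for w in words) / max(len(words), 1),
    }


def score_ad(features):
    """Stand-in for a trained model: higher means more likely low quality."""
    return min(1.0, 0.2 * features["exclamations"] + features["caps_ratio"])


def route_ads(ads):
    """Send the highest-scoring ads to human review, the rest to delivery."""
    ranked = sorted(ads, key=lambda ad: score_ad(extract_features(ad)), reverse=True)
    return ranked[:REVIEW_CAPACITY], ranked[REVIEW_CAPACITY:]


ads = [
    Ad("a1", "Handmade leather wallets, free shipping"),
    Ad("a2", "YOU WON!!! CLICK NOW!!!"),
    Ad("a3", "Local bakery opens Saturday"),
    Ad("a4", "LOSE WEIGHT FAST!!"),
]
to_review, to_deliver = route_ads(ads)
print([ad.ad_id for ad in to_review], [ad.ad_id for ad in to_deliver])
```

With a fixed review budget, sorting by score is one simple way to spend scarce reviewer time on the ads the model is least comfortable with.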


  • large class imbalance (most ads are good quality)
  • limited human reviewer capacity, and reviewer accuracy is not perfect
  • feature engineering on ad content (text, image, video, audio, etc.)
  • most training data comes from already known patterns
  • the ecosystem is dynamic and patterns evolve over time
  • global reach, internationalization, and multiple languages
  • scalability
Large Class Imbalance
  • goal
    • measure our review system
    • identify new trends in unsuitable ads
  • detecting low-quality ads
Account Quality
  • a good group of social connections produces good results
image and video understanding
  • real photo or drawing or meme?
  • who/what is in the picture?
  • nearest-neighbor embeddings
  • similarity
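
A minimal sketch of the nearest-neighbor idea, assuming each image has already been mapped to an embedding vector by some model. The three-dimensional vectors and image names below are made up for illustration; real embeddings would have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, index, k=2):
    """Return the k (similarity, name) pairs most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in index.items()]
    return sorted(scored, reverse=True)[:k]


# Hypothetical embeddings of already-reviewed images.
known_images = {
    "photo_of_shoes": [0.9, 0.1, 0.0],
    "meme_template": [0.1, 0.9, 0.2],
    "line_drawing": [0.0, 0.2, 0.9],
}
new_image = [0.8, 0.2, 0.1]
neighbors = nearest_neighbors(new_image, known_images)
print(neighbors)
```

A brute-force scan like this is only workable for small collections; at large scale an approximate nearest-neighbor index would be the usual choice.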
Amount of Text detection
  • images can’t have more than 20% text
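
One way to check the 20% rule, assuming a text detector has already produced bounding boxes: overlay a coarse grid on the image and count the cells that contain any text. The grid size, the box format `(x0, y0, x1, y1)`, and the function names are assumptions made for this sketch, not details from the talk.

```python
TEXT_LIMIT = 0.20  # the 20% limit mentioned in the notes


def text_fraction(image_w, image_h, text_boxes, grid=5):
    """Fraction of grid cells that overlap at least one text box."""
    cell_w, cell_h = image_w / grid, image_h / grid
    covered = 0
    for row in range(grid):
        for col in range(grid):
            cx0, cy0 = col * cell_w, row * cell_h
            cx1, cy1 = cx0 + cell_w, cy0 + cell_h
            # Two rectangles overlap when they overlap on both axes.
            if any(x0 < cx1 and x1 > cx0 and y0 < cy1 and y1 > cy0
                   for x0, y0, x1, y1 in text_boxes):
                covered += 1
    return covered / (grid * grid)


def passes_text_check(image_w, image_h, text_boxes):
    """True when the image has no more than TEXT_LIMIT text coverage."""
    return text_fraction(image_w, image_h, text_boxes) <= TEXT_LIMIT


small_caption = [(0, 0, 200, 100)]   # covers 2 of 25 cells
large_banner = [(0, 0, 500, 150)]    # covers 10 of 25 cells
print(passes_text_check(500, 500, small_caption))
print(passes_text_check(500, 500, large_banner))
```

Counting grid cells instead of summing box areas avoids double counting when detected boxes overlap, at the cost of overestimating coverage for small text.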

2. Tackling Misinformation and Fake News


  • problems and our role in curbing misinformation
  • steps towards solutions
  • challenges


  • what is truth? are we editors?


  • 3rd party fact checkers
    • Snopes, PolitiFact, le Correct!v, Nieuwscheckers


  • hard ml problem
    • labels are sparse and expensive
    • fast
    • mostly text
    • hard to do pictures
  • examples
    • images: burning flag, photoshopped
  • scaling globally
    • 3PFCs (third-party fact checkers) are not present in every country
    • every country has its own nature of fake claims

3. Make Every Ad Impression Meaningful at Scale with Machine Learning

Tianshi Gao / Engineering Manager


  • ads ranking overview
  • ads ranking ML problems and challenges
  • scale and accelerate ML development


  • 2 billion people to 6 million advertisers
  • make every ad impression meaningful
  • how: machine learning

Ad Ranking ML problems

  • event prediction
    • probability of clicking Like
  • value prediction
    • estimate of how much money people will spend
  • multiple event prediction
    • P(view), P(click), P(add_to_cart), P(check_out)
  • recommender system for dynamic ads
  • reinforcement learning
    • ads AI: like or buy
    • try to learn a policy that maximizes the value
  • content and sequence understanding
    • OBA (online behavioral advertising)
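
One common way to combine event prediction with value prediction is an expected-value score: weight each predicted event probability by the value of that event and sum. The talk did not spell out a formula, so the weighting below, the dollar values, and the ad names are all assumptions for illustration.

```python
def expected_value(event_probs, event_values):
    """Sum of P(event) * value(event) over the predicted events."""
    return sum(p * event_values.get(event, 0.0) for event, p in event_probs.items())


def rank_ads(candidates, event_values):
    """Order candidate ads by expected value, highest first."""
    return sorted(
        candidates,
        key=lambda ad: expected_value(ad["probs"], event_values),
        reverse=True,
    )


# Hypothetical value of each event to the advertiser.
event_values = {"click": 0.10, "add_to_cart": 1.00, "check_out": 20.00}

candidates = [
    {"ad_id": "game", "probs": {"click": 0.10, "add_to_cart": 0.0, "check_out": 0.0}},
    {"ad_id": "shoes", "probs": {"click": 0.05, "add_to_cart": 0.01, "check_out": 0.002}},
]
ranked = rank_ads(candidates, event_values)
print([ad["ad_id"] for ad in ranked])
```

In this toy data the ad with the lower click probability still ranks first, because its rare check-out events carry most of the value.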

ML challenges

  • a data-centric view
  • features / labels
  • heterogeneous features -> heterogeneous models and optimizers, joint training
  • sparsity, big collection of small data
  • bias
  • delayed feedback
  • non-stationary, seasonality

scale and accelerate ML development

  • area under the curve and ML dev cycle
  • hypothesis -> experiment -> observation -> hypothesis
Modeling Platform
  • IP[y]: notebook
    • reader = createsomereader
    • model.fc(100,20) # define model architecture
    • start_some_training_workflow(model, reader)
  • Caffe2
Auto ML
  • innovations / repeated manual work
  • innovations / repeated manual work + machine

4. Content and User Understanding in News Feed

Shiling Ding

connect content and user

  • feed ranking machine -> rank stories and score
  • latest stories to the top of the feed

People problem

  • problem: stories a person cares about might not be shared with them
  • solution: recommendation can be the primary surface for people


  • explore feed
  • suggested videos

Recommendation: why hard?

  • scale
    • 1 billion posts per day, 2 billion MAU
  • personalization
    • content: text, image, video
    • user interests
  • solution
    • multi-stage approaches: retrieval and ranking
    • content and user understanding
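
The multi-stage idea can be sketched in a few lines: a cheap retrieval step narrows all posts down to candidates, then a ranking step scores only those candidates. The inverted index over entities, the interest scores, and the additive scoring function are assumptions for this sketch; a production ranker would be a learned model.

```python
from collections import defaultdict

# Hypothetical posts, each tagged with the entities it mentions.
posts = {
    "p1": {"tesla", "elon musk"},
    "p2": {"golden state warriors"},
    "p3": {"tesla", "golden state warriors"},
    "p4": {"cooking"},
}
# Hypothetical user profile: entities with interest scores.
user_interests = {"tesla": 0.9, "golden state warriors": 0.4}


def build_index(posts):
    """Inverted index: entity -> set of post ids that mention it."""
    index = defaultdict(set)
    for post_id, entities in posts.items():
        for entity in entities:
            index[entity].add(post_id)
    return index


def retrieve(index, interests):
    """Stage 1: collect every post that shares an entity with the user."""
    candidates = set()
    for entity in interests:
        candidates |= index.get(entity, set())
    return candidates


def rank(candidates, posts, interests):
    """Stage 2: score candidates by summed interest in their entities."""
    def score(post_id):
        return sum(interests.get(e, 0.0) for e in posts[post_id])
    return sorted(candidates, key=score, reverse=True)


index = build_index(posts)
candidates = retrieve(index, user_interests)
ranking = rank(candidates, posts, user_interests)
print(ranking)
```

The split matters at scale: retrieval touches only index entries, so the more expensive scoring never runs on posts like `p4` that share nothing with the user.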


  • entity extraction and linking
    • human-readable entities, e.g. Elon Musk, Tesla, Golden State Warriors
    • taxonomy
  • classification / embedding
    • collaboration with the Applied Machine Learning team
    • DeepText for text understanding
      • docnn/regnn
  • end-to-end approach
    • pull all raw features into DL models
    • jointly optimize for the final ranking goal
  • user profile
    • user interests as a list of entities with scores
    • user-text embedding
  • user modeling
    • RNN: use session and contextual signals to predict a user's short-term interest
    • RNN applied on a snapshot (short time)
    • query 1, history sequence
  • scale
    • retrieve content relevant to user interests

Entity based recommendation

  • form query from user profile
  • extend user interests to similar entities via embedding methods
  • learn from graph edges

[users] - engage - [content] - title and description - [words]

  • subscribe - [page] - create - [content]

DL methods

  • objective
  • minimize the distance between user and story if the user engaged with the story
  • neural nets translate features into embeddings
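
A minimal sketch of that objective, using free embedding vectors instead of a neural net. The squared distance for engaged pairs follows the notes; the margin term that pushes non-engaged pairs apart is an added assumption, included because minimizing distance alone would let all embeddings collapse to one point.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def pair_loss(user_vec, story_vec, engaged, margin=1.0):
    """Pull engaged pairs together; push other pairs at least `margin` apart."""
    d = squared_distance(user_vec, story_vec)
    return d if engaged else max(0.0, margin - d)


def sgd_step(user_vec, story_vec, engaged, lr=0.1, margin=1.0):
    """One gradient step on the user embedding only, for brevity."""
    d = squared_distance(user_vec, story_vec)
    if engaged:
        grad = [2 * (a - b) for a, b in zip(user_vec, story_vec)]
    elif d < margin:
        grad = [-2 * (a - b) for a, b in zip(user_vec, story_vec)]
    else:
        grad = [0.0] * len(user_vec)  # already far enough apart
    return [a - lr * g for a, g in zip(user_vec, grad)]


user = [0.0, 0.0]
story = [1.0, 1.0]
loss_before = pair_loss(user, story, engaged=True)
user = sgd_step(user, story, engaged=True)
loss_after = pair_loss(user, story, engaged=True)
print(loss_before, loss_after)
```

In a full model the vectors would be outputs of networks over user and story features, and the same gradient would flow back into those networks.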


  • recommendation is challenging
    • connect users to the right content
    • exploration
    • global optimization
    • reinforcement learning

5. Unlocking Meaning Across Language

AML / Language Technology and Translation team

Facebook is Multilingual

  • 101 languages
  • identify 150 languages
  • 1/2 non-English

Building Multilingual Community

  • vision: Facebook for everyone in their preferred language

  • 45+ languages supported for translation
  • 2K+
  • 600M+ daily active people
  • 1.3B every user

Migration to Neural MT

AI in a multilingual

developing NLP Applications

  • English data -> English classifier
  • Russian data -> Russian classifier
  • English > Russian > Japanese > German (languages of the internet, Wikipedia)

Cross-lingual understanding (XLU)

  • feed data in many languages -> multilingual classifier
  • universal representation of text
  • universal word embedding -> learn embeddings for each language separately
  • project onto a common space
  • requires a dictionary

XLU through universal word embeddings

  • train a classifier using the universal word embedding
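
A toy illustration of the idea, assuming words from both languages already live in one shared embedding space. The two-dimensional vectors, the English/Spanish vocabulary, and the nearest-centroid classifier are all made up for this sketch; the point is only that a classifier trained on English labels can be applied to text in another language without translation.

```python
# Hypothetical universal embeddings: translations sit close together.
UNIVERSAL_EMBEDDINGS = {
    "selling": [0.90, 0.10], "price": [0.80, 0.20],
    "party": [0.10, 0.90], "friends": [0.20, 0.80],
    "vendo": [0.88, 0.12], "precio": [0.82, 0.18],
    "fiesta": [0.12, 0.88], "amigos": [0.18, 0.82],
}


def embed(text):
    """Average the embeddings of known words; unknown words are skipped."""
    vectors = [UNIVERSAL_EMBEDDINGS[w] for w in text.lower().split()
               if w in UNIVERSAL_EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def train_centroids(examples):
    """Nearest-centroid training: mean text vector per label."""
    sums, counts = {}, {}
    for text, label in examples:
        vec = embed(text)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def classify(text, centroids):
    """Pick the label whose centroid is closest to the text vector."""
    vec = embed(text)
    return min(centroids, key=lambda label: sum(
        (a - b) ** 2 for a, b in zip(vec, centroids[label])))


# Train on English only, then classify Spanish text.
english_training = [
    ("selling bike good price", "sale"),
    ("party with friends", "not_sale"),
]
centroids = train_centroids(english_training)
prediction = classify("vendo bicicleta buen precio", centroids)
print(prediction)
```

Averaging word vectors ignores word order, which is the limitation that sequence-level representations are meant to address.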

XLU accuracy

  • sale post classification, 79~96% accuracy

XLU vs translation

XLU at Facebook

  • integrated in our text understanding platforms
    • visit our platforms poster

Going beyond words

  • capture universal semantics at the sequence level

universal sentence representation

  • learn an encoder that maps each sentence into a vector




build audio and speech tech to drive at scale at Facebook

Video Captioning

2016 first, editors can review and edit

MZ Harvard speech - captioning

  • Video views per QTR day

Voice Interfaces

  • oculus

Video Understanding - Beyond Speech

  • language identification
  • semi-supervised training
  • training on 50k hours of audio data
  • models are lightweight so we can run them prior to ASR

Audio-Visual Dataset for Videos

  • why?
    • audio-visual modeling required to detect animal cruelty
    • public datasets available, but limited and not on FB data
    • labels across different modalities
    • visual: scenes, objects, actions
    • audio: audio events
    • audio labels compatible with AudioSet
    • we are planning to release it to the public to support research on joint modeling


  • Caffe2
  • Lumos