Facebook ML Open House

2017-11-14 / Facebook Bldg 13

Intro

AI at facebook

25% ML engineers

next gen challenges

AI at massive scale: trillions of examples per model
extreme vision: billion visual classes
How to solve problems like false news and polarization?
How do we build closer and safer communities?
connect you to the best stories?
ad team: serving the best ads to you
language barriers
natural conversational interfaces
6 speakers and posters

1. Applying Machine Learning to Ads Integrity

Emanual Strauss / Engineering Manager

Ads, Policies and Quality

example of low quality ads

shock and scare
before and after
amount of text in images
https://facebook.com/policies/ads

Ad Review Overview

Detecting low quality ads:
ad submission -> feature engineering -> model scoring -> human computing decision -> ads ranking & delivery
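
The review flow above can be sketched as a small routing function. This is only an illustration: the feature names, the toy scoring rule, and both thresholds are assumptions made up for the example, not details given in the talk.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the talk did not give real values.
AUTO_REJECT = 0.9
NEEDS_REVIEW = 0.5


@dataclass
class Ad:
    text: str
    image_text_fraction: float  # share of the image covered by text, 0..1


def extract_features(ad: Ad) -> dict:
    """Feature engineering: turn raw ad content into numeric features."""
    words = [w.strip("!.,") for w in ad.text.lower().split()]
    shock_words = {"shocking", "miracle", "scary"}  # toy vocabulary
    return {
        "shock_ratio": sum(w in shock_words for w in words) / max(len(words), 1),
        "image_text_fraction": ad.image_text_fraction,
    }


def score(features: dict) -> float:
    """Model scoring: a stand-in for a trained model, returns a value in 0..1."""
    raw = 2.0 * features["shock_ratio"] + features["image_text_fraction"]
    return min(raw, 1.0)


def route(ad: Ad) -> str:
    """Decision step: reject, send to a human reviewer, or pass to delivery."""
    s = score(extract_features(ad))
    if s >= AUTO_REJECT:
        return "reject"
    if s >= NEEDS_REVIEW:
        return "human_review"
    return "deliver"
```

Only the middle band of scores goes to human reviewers, which is one way to cope with limited reviewer capacity.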

Challenges

large class imbalances (most ads are good quality)
limited human reviewer capacity, and reviewer accuracy is not perfect
feature engineering on ad content (text, image, video, audio, etc.)
most training data comes from already known patterns
ecosystem is dynamic and patterns evolve over time
global reach, internationalization and multiple languages
scalability
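
To see why class imbalance is a challenge, here is a small sketch with made-up numbers (1,000 ads, 10 of them low quality; these counts are assumptions, not figures from the talk). A model that flags nothing still looks accurate, so precision and recall on the rare class are more informative.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives; label 1 = low quality ad."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data: 10 low quality ads among 1,000.
y_true = [1] * 10 + [0] * 990
flag_nothing = [0] * 1000                             # never flags an ad
flag_some = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 970  # catches 8, 20 false alarms
```

Here `flag_nothing` has higher accuracy than `flag_some` even though it catches no low quality ads at all.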

Large Class Imbalance

goal
- measure our review system
- identify new trends in not-suitable ads
detecting low quality ads

Account Quality

good group of social connection produces good results

image and video understanding

real photo or drawing or meme?
who/what is in the picture?
nearest neighbors embedding
similarity
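
A minimal sketch of nearest-neighbor lookup over embeddings, assuming each image or ad has already been mapped to a vector. The vectors, the item names, and the choice of cosine similarity are assumptions for illustration; the talk did not say which metric is used.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, index, k=2):
    """Return the names of the k indexed items most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical 3-dimensional embeddings of previously reviewed ads.
index = {
    "known_bad_ad_1": [0.9, 0.1, 0.0],
    "known_good_ad_1": [0.0, 0.2, 0.9],
    "known_bad_ad_2": [0.8, 0.3, 0.1],
}
neighbors = nearest_neighbors([1.0, 0.2, 0.0], index, k=2)
```

A new ad whose nearest neighbors are mostly known low quality ads could then be prioritized for review.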

Amount of Text detection

images can’t have more than 20% text
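
A rough sketch of the 20% check, assuming a separate text detector has already returned bounding boxes in pixel coordinates. The box format and the function names are assumptions; how the real system measures text area was not described.

```python
def text_fraction(width, height, boxes):
    """Fraction of the image covered by text boxes (x0, y0, x1, y1).

    Overlapping boxes are counted once by collecting covered pixels in a set.
    """
    covered = set()
    for x0, y0, x1, y1 in boxes:
        for x in range(max(x0, 0), min(x1, width)):
            for y in range(max(y0, 0), min(y1, height)):
                covered.add((x, y))
    return len(covered) / (width * height)


def within_text_limit(width, height, boxes, limit=0.20):
    """True if detected text covers no more than `limit` of the image."""
    return text_fraction(width, height, boxes) <= limit
```

Pixel counting is slow for large images but keeps the overlap handling obvious; a production system would likely use something faster.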

2. Tackling Misinformation, Fake News

Outline

problems and our role in curbing misinformation
steps towards solutions
challenges

Misinformation

what is truth? are we editors?

how?

3rd party fact checkers
- Snopes, PolitiFact, FactCheck.org, Le Monde (lemonde.fr), CORRECT!V, Nieuwscheckers, NU.nl

challenges

hard ml problem
- labels are sparse and expensive
- fast
- mostly text
- hard to do pictures
examples
- images: burning flag, photoshopped
scaling globally
- 3PFCs (third-party fact checkers) are not present in every country
- every country has its own nature of fake claims

3. Make Every Ad Impression Meaningful at Scale with Machine Learning

Tianshi Gao / Engineering Manager

Agenda

ads ranking overview
ads ranking ML problems and challenges
scale and accelerate ML development

overview

2 billion people to 6 million advertisers
make every ad impression meaningful
how: machine learning

Ad Ranking ML problems

event prediction
- probability of clicking like
value prediction
- estimate how much money people will spend
multiple event prediction
- P(view), P(click), P(add_to_cart), P(check_out)
recommender system for dynamic ads
reinforcement learning
- ads ai, like or buy
- try to learn some policy to maximize the value
content and sequence understanding
- OBA (online behavioral advertising)
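
One way to read the multiple event prediction item is as a conversion funnel. The sketch below assumes each model outputs the probability of a step given the previous one, and combines them with a predicted order value; both the chaining and the numbers are assumptions, not something stated in the talk.

```python
def funnel_probability(conditionals):
    """Probability of reaching the last step: product of per-step conditionals."""
    p = 1.0
    for step_probability in conditionals:
        p *= step_probability
    return p


def expected_value(conditionals, order_value):
    """Expected value of one impression: P(checkout) times predicted spend."""
    return funnel_probability(conditionals) * order_value


# Hypothetical predictions for one impression:
# P(view), P(click | view), P(add_to_cart | click), P(check_out | add_to_cart)
steps = [0.5, 0.1, 0.2, 0.5]
value = expected_value(steps, order_value=40.0)
```

Ranking by an estimate like this ties event prediction and value prediction together in a single number per ad.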

ML challenges

a data centric view,
feature/ label
heterogeneous features -> heterogeneous models and optimizers, joint training
sparsity, big collection of small data
bias
delayed feedback
non-stationary, seasonality

scale and accelerate ML development

area under the curve and ML dev cycle
hypothesis -> experiment -> observation -> hypothesis

Modeling Platform

IP[y]: notebook
    reader = createsomereader
    model.fc(100,20)  # define model architecture
    start_some_training_workflow(model, reader)
Caffe2

Auto ML

innovations / repeated manual work
innovations / repeated manual work + machine

4. Understanding Content and Users in News Feed

Shiling Ding

connect content and user

feed ranking machine -> rank stories and score
latest stories to the top of the feed

People problem

problem: stories a person cares about might not be shared with them
solution: recommendation can be the primary surface for people

Products

explore feed
suggested videos

Recommendation: why hard?

scale
- 1 billion posts per day, 2 billion MAU
personalization
- content: text, image, video
- user interests
solution
- multi-stage approaches: retrieval and ranking
- content and user understanding
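
A minimal sketch of the two-stage idea: a cheap retrieval step narrows a large pool of posts to a few candidates, and a more expensive ranking step scores only those. The inverted index on entities, the `quality` field, and the scoring formula are assumptions for illustration.

```python
from collections import defaultdict


def build_index(posts):
    """Inverted index: entity -> set of post ids that mention it."""
    index = defaultdict(set)
    for post_id, post in posts.items():
        for entity in post["entities"]:
            index[entity].add(post_id)
    return index


def retrieve(index, user_interests, limit=100):
    """Stage 1: cheap candidate generation by entity overlap."""
    candidates = set()
    for entity in user_interests:
        candidates |= index.get(entity, set())
    return sorted(candidates)[:limit]


def rank(posts, candidates, user_interests):
    """Stage 2: score only the retrieved candidates, best first."""
    def score(post_id):
        post = posts[post_id]
        interest = sum(user_interests.get(e, 0.0) for e in post["entities"])
        return interest * post["quality"]
    return sorted(candidates, key=score, reverse=True)


# Toy data; entity names are taken from the examples in these notes.
posts = {
    "p1": {"entities": ["tesla", "elon musk"], "quality": 0.9},
    "p2": {"entities": ["golden state warriors"], "quality": 0.8},
    "p3": {"entities": ["tesla"], "quality": 0.5},
    "p4": {"entities": ["cooking"], "quality": 1.0},
}
user = {"tesla": 0.7, "elon musk": 0.4, "golden state warriors": 0.2}
candidates = retrieve(build_index(posts), user)
ranked = rank(posts, candidates, user)
```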

How

entity extracting and linking
- human readable entities, eg elon musk, tesla, golden state warriors
- taxonomy
classification/ embedding
- collaboration with applied machine learning team
- deep text for text understanding
  - docnn/regnn
end-to-end approach
- pull all raw features into dl models
- jointly optimize for the final ranking goal
user profile
- user interests by a list of entities with scores
- user-text embedding
user modeling
- RNN: use session and contextual signals to predict user short term interest
- RNN - applied on snapshot (short time)
- Query 1, history sequence,
scale
- retrieve content relevant to user interest

Entity based recommendation

form query from user profile
extended user interests to similar *filament = embedding methods
learn from graph edges

[users] - engage - [content] - title and description - [words]

subscribe - [page] - create - [content]

similarity search using embeddings
https://code.facebook.com/posts/
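
A small sketch of forming a query from a user profile and extending it to similar entities through embeddings. The entity vectors here are hand-written stand-ins (in the talk they are learned from graph edges), and the similarity threshold is an assumption.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def expand_interests(profile, embeddings, min_similarity=0.8):
    """Add entities close to the user's known interests, weighted by similarity."""
    expanded = dict(profile)
    for entity, weight in profile.items():
        if entity not in embeddings:
            continue
        for other, vec in embeddings.items():
            if other in profile:
                continue
            sim = cosine(embeddings[entity], vec)
            if sim >= min_similarity:
                expanded[other] = max(expanded.get(other, 0.0), weight * sim)
    return expanded


# Hypothetical 2-dimensional entity embeddings.
embeddings = {
    "tesla": [1.0, 0.0],
    "spacex": [0.9, 0.1],
    "golden state warriors": [0.0, 1.0],
    "nba": [0.1, 0.9],
}
query = expand_interests({"tesla": 1.0}, embeddings)
```

The expanded dictionary can then be used as the retrieval query, so a user who only engaged with one entity can still be shown content about nearby ones.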

DL methods

objective
minimize the distance between user and story if the user engaged with the story
neural nets translate features into embeddings
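
The objective can be written as a per-pair loss. The sketch below uses a standard contrastive loss: the engaged case follows the notes, while the margin term for non-engaged pairs is an assumption added so that the embeddings do not all collapse to one point; the talk did not say how negatives are handled.

```python
import math


def squared_distance(u, s):
    """Squared Euclidean distance between a user and a story embedding."""
    return sum((a - b) ** 2 for a, b in zip(u, s))


def contrastive_loss(user, story, engaged, margin=1.0):
    """Pull engaged pairs together; push non-engaged pairs at least `margin` apart."""
    d2 = squared_distance(user, story)
    if engaged:
        return d2
    return max(0.0, margin - math.sqrt(d2)) ** 2
```

In training, the neural nets producing `user` and `story` would be updated to reduce the sum of this loss over observed pairs.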

summary

recommendation is challenging
- connect users to the right content
- exploration
- global optimization
- reinforcement learning

5. Unlocking Meaning Across Language

AML / Language Technology and Translation team

Facebook is Multilingual

101 languages
identify 150 languages
1/2 of people do not use English

Building Multilingual Community

vision: Facebook for everyone in their preferred language

45+ languages supported for translations, 2K+, 600M+ daily active people, 1.3B every user

Migration to Neural MT

AI in a multilingual

developing NLP Applications

english data -> english classifier
russian data -> russian classifier
english > russian > japanese > german (languages of the internet, wikipedia)

Cross-lingual understanding(XLU)

feed many languages data -> multi-lingual classifier
universal representation of text
universal word embedding -> learn embeddings for each language separately
project onto common space
requires a dictionary
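
A toy illustration of projecting one language's embeddings onto another's space with the help of a dictionary. It fits a single 2-D rotation in closed form, which is the two-dimensional case of the orthogonal Procrustes problem; real embeddings have hundreds of dimensions and the method used at Facebook was not specified, so the vectors, words, and technique here are all assumptions.

```python
import math


def fit_rotation(pairs):
    """Angle of the 2-D rotation that best maps source vectors onto targets.

    Minimizes the sum of squared distances |R x - y|^2 over dictionary pairs.
    """
    cross = sum(x[0] * y[1] - x[1] * y[0] for x, y in pairs)
    dot = sum(x[0] * y[0] + x[1] * y[1] for x, y in pairs)
    return math.atan2(cross, dot)


def rotate(vec, theta):
    """Apply the rotation to one source-language vector."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * vec[0] - s * vec[1], s * vec[0] + c * vec[1])


def nearest_word(vec, vocab):
    """Closest word in the target vocabulary by squared Euclidean distance."""
    return min(vocab, key=lambda w: (vocab[w][0] - vec[0]) ** 2 + (vocab[w][1] - vec[1]) ** 2)


# Hypothetical embeddings trained separately per language.
spanish = {"gato": (1.0, 0.0), "perro": (0.0, 1.0), "casa": (1.0, 1.0)}
english = {"cat": (0.0, 1.0), "dog": (-1.0, 0.0), "house": (-1.0, 1.0)}
dictionary = [("gato", "cat"), ("perro", "dog")]  # "casa" is held out

theta = fit_rotation([(spanish[s], english[e]) for s, e in dictionary])
projected = rotate(spanish["casa"], theta)
```

The held-out word lands next to its translation even though it was not in the dictionary, which is the point of sharing one space.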

XLU through universal word embeddings

train a classifier using the universal word embedding
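
A minimal sketch of this idea, assuming words from both languages already live in one shared embedding space. A nearest-centroid classifier is trained on English examples only and then applied to Spanish text; the embeddings, the example posts, and the choice of classifier are assumptions for illustration.

```python
def embed(text, emb):
    """Average the embeddings of known words; None if no word is known."""
    vecs = [emb[w] for w in text.lower().split() if w in emb]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def train_centroids(examples, emb):
    """Compute one mean vector per label from (text, label) pairs."""
    sums, counts = {}, {}
    for text, label in examples:
        vec = embed(text, emb)
        if vec is None:
            continue
        total = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            total[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in total] for label, total in sums.items()}


def classify(text, centroids, emb):
    """Return the label whose centroid is closest to the text embedding."""
    vec = embed(text, emb)
    if vec is None:
        return None
    return min(centroids, key=lambda l: sum((a - b) ** 2 for a, b in zip(vec, centroids[l])))


# Hypothetical shared (universal) word embeddings for English and Spanish.
universal = {
    "sell": [1.0, 0.0], "price": [0.8, 0.2], "birthday": [0.0, 1.0], "party": [0.1, 1.0],
    "vendo": [0.9, 0.1], "precio": [0.8, 0.1], "fiesta": [0.0, 0.9], "amigos": [0.1, 0.9],
}
english_training = [("sell price", "sale"), ("birthday party", "other")]
centroids = train_centroids(english_training, universal)
```

No Spanish labels are needed: `classify("vendo precio", centroids, universal)` reuses what was learned from English.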

XLU accuracy

sale post classification, 79~96% accuracy

XLU vs translation

XLU at Facebook

integrated in our text understanding platforms
- visit our platforms poster

Going beyond words

capture universal semantics at the sequence level

universal sentence representation

learn an encoder that maps each sentence into a vector

6. Audio/Speech

Reena

Mission

build audio and speech tech to drive at scale at Facebook

Video Captioning

2016 first, editors can review and edit

MZ harvard speech - captioning

Video views per QTR day

Voice Interfaces

oculus

Video Understanding - Beyond Speech

language identification
semi-supervised training
training on 50k hours of audio data
models are lightweight so we can run them prior to ASR

Audio-Visual Dataset for Videos

why?
- audio-visual modeling required to detect animal cruelty
- public datasets available, but limited and not on FB data
- labels across different modalities
- visual: scenes, objects, actions
- audio: audio events
- audio labels compatible with AudioSet
- we are planning to release it to the public to support research on joint modeling

Posters

Caffe2
Lumos
