Facebook ML Open House
2017-11-14 / Facebook Bldg 13
Intro
AI at Facebook
25% ML engineers
next gen challenges
- AI at massive scale: trillions of examples per model
- extreme vision: billion visual classes
- How to solve problems like false news and polarization?
- How do we build closer and safer communities?
- connect you to the best stories?
- ad team: serving the best ads to you
- language barriers
- natural conversational interfaces
- 6 speakers and posters
1. Applying Machine Learning to Ads Integrity
Emanual Strauss/ Engineering Manager
Ads, Policies and Quality
example of low quality ads
- shock and scare
- before and after
- amount of text in images
- https://facebook.com/policies/ads
Ad Review Overview
Detecting low quality ads
- ad submission -> feature engineering -> model scoring -> human computing decision -> ads ranking & delivery
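The review flow above can be read as a scoring step followed by routing: ads the model is confident about are decided automatically, and the uncertain middle band goes to human reviewers, up to their capacity. A minimal sketch under that reading; the thresholds, the capacity, and all names are illustrative assumptions, not values from the talk.

```python
def route_ads(scored_ads, reject_above=0.9, approve_below=0.1, review_capacity=2):
    """Route (ad_id, low_quality_score) pairs into four queues.

    Thresholds and capacity are hypothetical placeholders.
    """
    queues = {"approved": [], "rejected": [], "human_review": [], "deferred": []}
    uncertain = []
    for ad_id, score in scored_ads:
        if score >= reject_above:
            queues["rejected"].append(ad_id)
        elif score <= approve_below:
            queues["approved"].append(ad_id)
        else:
            uncertain.append((ad_id, score))
    # Reviewers are scarce, so send them the highest-scoring uncertain ads first.
    uncertain.sort(key=lambda pair: pair[1], reverse=True)
    queues["human_review"] = [ad_id for ad_id, _ in uncertain[:review_capacity]]
    # What happens to the overflow is a policy choice; here it simply waits.
    queues["deferred"] = [ad_id for ad_id, _ in uncertain[review_capacity:]]
    return queues


scored = [("a", 0.95), ("b", 0.05), ("c", 0.5), ("d", 0.7), ("e", 0.3)]
queues = route_ads(scored)
print(queues)
```

Sorting the uncertain band by score is one way to spend limited reviewer time on the ads most likely to be low quality.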
Challenges
- large class imbalance (most ads are good quality)
- human reviewer capacity is limited and reviewer accuracy is not perfect
- feature engineering on ad content (text, image, video, audio, etc.)
- most training data comes from already known patterns
- ecosystem is dynamic and patterns evolve over time
- global reach, internationalization and multiple languages
- scalability
Large Class Imbalance
- goal
- measure our review system
- identify new trends in unsuitable ads
- detecting low quality ads
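One common way to cope with this kind of imbalance (an assumption here; the notes do not name a method) is to keep every low-quality example, keep only a fraction of the good ones, train on the smaller set, and then correct the predicted probabilities for the sampling rate. A minimal sketch; `keep_rate` and the function names are hypothetical.

```python
import random


def downsample_negatives(examples, keep_rate, seed=0):
    """Keep all positives (label 1) and roughly `keep_rate` of the negatives (label 0)."""
    rng = random.Random(seed)
    return [(x, y) for x, y in examples if y == 1 or rng.random() < keep_rate]


def recalibrate(p, keep_rate):
    """Map a probability predicted on downsampled data back to the original distribution."""
    return p / (p + (1.0 - p) / keep_rate)


# 1% positives, then keep about 10% of the negatives.
examples = [(i, 1 if i % 100 == 0 else 0) for i in range(1000)]
sampled = downsample_negatives(examples, keep_rate=0.1)
print(len(sampled), recalibrate(0.5, keep_rate=0.1))
```

A score of 0.5 on the downsampled data corresponds to about 0.09 on the original data, because negatives were made ten times rarer during training.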
Account Quality
- a good group of social connections produces good results
image and video understanding
- real photo or drawing or meme?
- who/what is in the picture?
- nearest neighbors embedding
- similarity
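Nearest-neighbor lookup over embeddings can be sketched with cosine similarity. This is a brute-force illustration only; the toy 2-D vectors and item names are made up, and a production system would use learned image embeddings and an approximate index.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, index, k=2):
    """Return the k (item_id, similarity) pairs most similar to `query`."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings of previously reviewed images.
index = {"meme_1": [1.0, 0.0], "photo_1": [0.0, 1.0], "meme_2": [0.9, 0.1]}
neighbors = nearest_neighbors([1.0, 0.05], index, k=2)
print(neighbors)
```

If the closest known images were all labeled low quality, a new image landing next to them is a natural candidate for review.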
Amount of Text detection
- images can’t have more than 20% text
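The 20% rule can be checked once text regions are known. A minimal sketch, assuming some text detector (not shown, and not specified in the talk) has already produced pixel bounding boxes; overlapping boxes are counted once.

```python
def text_fraction(width, height, text_boxes):
    """Fraction of image pixels covered by the union of text boxes.

    Each box is (left, top, right, bottom) in pixels, right/bottom exclusive.
    """
    covered = [bytearray(width) for _ in range(height)]
    for left, top, right, bottom in text_boxes:
        for y in range(max(0, top), min(height, bottom)):
            row = covered[y]
            for x in range(max(0, left), min(width, right)):
                row[x] = 1
    return sum(sum(row) for row in covered) / (width * height)


def exceeds_text_limit(width, height, text_boxes, limit=0.20):
    return text_fraction(width, height, text_boxes) > limit


# Two overlapping boxes on a 100x100 image, then a third banner-style box.
boxes = [(0, 0, 50, 20), (25, 10, 75, 30)]
print(text_fraction(100, 100, boxes))
print(exceeds_text_limit(100, 100, boxes + [(0, 50, 100, 60)]))
```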
2. Tackling Misinformation, Fake News
Outline
- problems and our role in curbing misinformation
- steps towards solutions
- challenges
Misinformation
- what is truth? are we editors?
how?
- 3rd party fact checkers
- Snopes, PolitiFact, FactCheck.org, Le Monde (lemonde.fr), CORRECT!V, Nieuwscheckers, NU.nl
challenges
- hard ml problem
- labels are sparse and expensive
- fast
- mostly text
- hard to do pictures
- examples
- images: burning flag, photoshopped
- scaling globally
- 3PFCs (third-party fact checkers) are not present in every country
- every country has its own nature of fake claims
3. Make Every Ad Impression Meaningful at Scale with Machine Learning
Tianshi Gao/ Engineering Manager
Agenda
- ads ranking overview
- ads ranking ML problems and challenges
- scale and accelerate ML development
overview
- 2 billion people to 6 million advertisers
- make every ad impression meaningful
- how: machine learning
Ad Ranking ML problems
- event prediction
- probability of clicking like
- value prediction
- estimate of how much money people will spend
- multiple event prediction
- P(view), P(click), P(add_to_cart), P(check_out)
- recommender system for dynamic ads
- reinforcement learning
- ads ai, like or buy
- try to learn some policy to maximize the value
- content and sequence understanding
- OBA (online behavioral advertising)
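Event prediction and value prediction can be combined into a single ranking score. A minimal sketch, assuming the funnel probabilities are conditional on the previous step and that the score is probability times predicted value; neither assumption was stated in the talk, and the ads and numbers are made up.

```python
def funnel_probability(conditional_probs):
    """Multiply conditional step probabilities, e.g.
    P(view) * P(click | view) * P(add_to_cart | click) * P(check_out | add_to_cart)."""
    prob = 1.0
    for p in conditional_probs:
        prob *= p
    return prob


def rank_by_expected_value(ads):
    """Sort ads by P(final event) * predicted value of that event, highest first."""
    scored = [
        (ad["id"], funnel_probability(ad["step_probs"]) * ad["value"])
        for ad in ads
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ads = [
    {"id": "game", "step_probs": [0.8, 0.05, 0.5, 0.5], "value": 10.0},
    {"id": "shoes", "step_probs": [0.5, 0.1, 0.2, 0.5], "value": 40.0},
]
ranked = rank_by_expected_value(ads)
print(ranked)
```

The "shoes" ad is viewed less often but wins because its predicted purchase value is higher.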
ML challenges
- a data-centric view
- feature / label
- heterogeneous features -> heterogeneous models and optimizers, joint training
- sparsity, big collection of small data
- bias
- delayed feedback
- non-stationarity, seasonality
scale and accelerate ML development
- area under the curve and ML dev cycle
- hypothesis -> experiment -> observation -> hypothesis
Modeling Platform
- IP[y]: notebook
reader = createsomereader
model.fc(100,20) # define model architecture
start_some_training_workflow(model, reader)
Caffe2
Auto ML
- innovations / repeated manual work
- innovations / repeated manual work + machine
4. Understanding content and users in News Feed
Shiling Ding
connect content and user
- feed ranking machine -> rank stories and score
- latest stories to the top of the feed
People problem
- problem: stories a person cares about might not be shared with them
- solution: recommendation can be the primary surface for people
Products
- explore feed
- suggested videos
Recommendation: why hard?
- scale
- 1 billion posts per day, 2 billion MAU
- personalization
- content: text, image, video
- user interests
- solution
- multi-stage approaches: retrieval and ranking
- content and user understanding
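The multi-stage approach can be sketched as a funnel: a cheap score narrows the full pool to a short list, and a more expensive score orders only that short list. The scoring functions, toy posts, and cut-off sizes below are hypothetical.

```python
import heapq


def recommend(user, candidates, cheap_score, full_score, retrieve_k, final_k):
    """Two-stage funnel: cheap retrieval over all candidates, costly ranking over the survivors."""
    retrieved = heapq.nlargest(retrieve_k, candidates, key=lambda c: cheap_score(user, c))
    ranked = sorted(retrieved, key=lambda c: full_score(user, c), reverse=True)
    return ranked[:final_k]


def topic_overlap(user, post):
    return len(user["topics"] & post["topics"])


def overlap_plus_author(user, post):
    bonus = 2 if post["author"] in user["liked_authors"] else 0
    return topic_overlap(user, post) + bonus


user = {"topics": {"nba"}, "liked_authors": {"espn"}}
posts = [
    {"id": 1, "topics": {"nba"}, "author": "espn"},
    {"id": 2, "topics": {"nba", "tesla"}, "author": "blog"},
    {"id": 3, "topics": {"cooking"}, "author": "espn"},
    {"id": 4, "topics": {"tesla"}, "author": "blog"},
]
top = recommend(user, posts, topic_overlap, overlap_plus_author, retrieve_k=2, final_k=2)
print([post["id"] for post in top])
```

Post 3 would score well in the ranking stage but never reaches it, which shows the trade-off: retrieval quality bounds what ranking can surface.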
How
- entity extraction and linking
- human readable entities, e.g. Elon Musk, Tesla, Golden State Warriors
- taxonomy
- classification/ embedding
- collaboration with applied machine learning team
- DeepText for text understanding
- docnn/regnn
- end-to-end approach
- pull all raw features into DL models
- jointly optimize for the final ranking goal
- user profile
- user interests by a list of entities with scores
- user-text embedding
- user modeling
- RNN: use session and contextual signals to predict users' short-term interests
- RNN - applied on snapshot (short time)
- Query 1, history sequence,
- scale
- retrieve content relevant to user interest
Entity based recommendation
- form query from user profile
- extend user interests to similar entities (*filament = embedding methods)
- learn from graph edges
[users] - engage - [content] - title and description - [words]
- subscribe - [page] - create - [content]
- similarity search using embeddings
- https://code.facebook.com/posts/
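Entity-based recommendation can be sketched as: take the user's scored entities as the query, expand it with similar entities at a discount, then score each story by the entities it is tagged with. The similarity table stands in for an embedding lookup; it, the `decay` factor, and the toy data are assumptions.

```python
def expand_interests(profile, similar_entities, decay=0.5):
    """Add entities similar to the user's interests, at a discounted score."""
    expanded = dict(profile)
    for entity, score in profile.items():
        for neighbor, similarity in similar_entities.get(entity, []):
            candidate = score * similarity * decay
            if candidate > expanded.get(neighbor, 0.0):
                expanded[neighbor] = candidate
    return expanded


def score_story(query, story_entities):
    """Sum the query weights of the entities a story is tagged with."""
    return sum(query.get(entity, 0.0) for entity in story_entities)


profile = {"Tesla": 0.9, "Golden State Warriors": 0.6}
similar_entities = {"Tesla": [("Elon Musk", 0.8)]}  # hypothetical embedding neighbors
stories = {
    "s1": ["Elon Musk"],
    "s2": ["Golden State Warriors", "Tesla"],
    "s3": ["cooking"],
}
query = expand_interests(profile, similar_entities)
ranking = sorted(stories, key=lambda sid: score_story(query, stories[sid]), reverse=True)
print(ranking)
```

Without the expansion step, story s1 would score zero even though it is closely related to an entity the user follows.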
DL methods
- objective
- minimize the distance between user and story if the user engaged with the story
- neural nets translate features into embeddings
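The objective can be written as a per-pair loss on user and story embeddings. The notes only state the "pull together if engaged" half; the margin term for non-engaged pairs below is an added assumption (a standard contrastive loss), since without some push-apart term every embedding could collapse to the same point.

```python
import math


def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(user_vec, story_vec, engaged, margin=1.0):
    """Pull engaged pairs together; push non-engaged pairs at least `margin` apart."""
    d = distance(user_vec, story_vec)
    if engaged:
        return d ** 2
    return max(0.0, margin - d) ** 2


print(contrastive_loss([0.0, 0.0], [3.0, 4.0], engaged=True))
print(contrastive_loss([0.0, 0.0], [0.6, 0.0], engaged=False))
```

In training, the neural nets that produce the two vectors would be updated by the gradient of this loss summed over many (user, story, engaged) triples.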
summary
- recommendation is challenging
- connect users to the right content
- exploration
- global optimization
- reinforcement learning
5. Unlocking Meaning Across Language
AML / Language Technology and Translation team
Facebook is Multilingual
- 101 languages
- identify 150 languages
- 1/2 do not speak English
Building Multilingual Community
- vision: Facebook for everyone in their preferred language
45+ languages supported for translations 2K+ 600M_ daily active people 1.3B every user
Migration to Neural MT
AI in a multilingual
developing NLP Applications
- english data -> english classifier
- russian data -> russian classifier
- english > russian > japanese > german (languages of the internet, wikipedia)
Cross-lingual understanding(XLU)
- feed data in many languages -> multilingual classifier
- universal representation of text
- universal word embedding -> learn embeddings for each language separately
- project onto common space
- requires a dictionary
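Projecting separately trained embeddings onto a common space is often done by fitting a map on word pairs from a bilingual dictionary. A toy sketch, assuming 2-D vectors and a pure rotation so the least-squares solution has a closed form; real embeddings have hundreds of dimensions and would need a general orthogonal (Procrustes-style) solve. All vectors and word pairs are made up.

```python
import math


def fit_rotation(pairs):
    """Angle of the 2-D rotation that best maps source vectors onto target vectors
    (least squares) for dictionary pairs [((sx, sy), (tx, ty)), ...]."""
    cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in pairs)
    dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in pairs)
    return math.atan2(cross, dot)


def rotate(vec, angle):
    x, y = vec
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


# Hypothetical English space, and a French space that is the same geometry rotated by 30 degrees.
en = {"cat": (1.0, 0.0), "dog": (0.0, 1.0), "car": (1.0, 1.0)}
offset = math.radians(30)
fr = {"chat": rotate(en["cat"], offset), "chien": rotate(en["dog"], offset),
      "voiture": rotate(en["car"], offset)}

# Fit on the dictionary pairs only, then project a word that was not in the dictionary.
dictionary = [(fr["chat"], en["cat"]), (fr["chien"], en["dog"])]
angle = fit_rotation(dictionary)
projected = rotate(fr["voiture"], angle)
print(projected)
```

After projection, "voiture" lands next to "car", so a classifier trained on the English vectors could be applied to the projected French ones.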
XLU through universal word embeddings
- train a classifier using the universal word embedding
XLU accuracy
- sale post classification, 79~96% accuracy
XLU vs translation
XLU at Facebook
- integrated in our text understanding platforms
- visit our platforms poster
Going beyond words
- capture universal semantics at the sequence level
universal sentence representation
- learn an encoder that maps each sentence into a vector
6. Audio/Speech
Reena
Mission
build audio and speech tech to drive at scale at Facebook
Video Captioning
2016 first, editors can review and edit
MZ (Mark Zuckerberg) Harvard speech - captioning
- Video views per QTR day
Voice Interfaces
- Oculus
Video Understanding - Beyond Speech
- language identification
- semi-supervised training
- training on 50k hours of audio data
- models are lightweight so we can run them prior to ASR
Audio-Visual Dataset for Videos
- why?
- audio-visual modeling required to detect animal cruelty
- public datasets available, but limited and not on FB data
- labels across different modalities
- visual: scenes, objects, actions
- audio: audio events
- audio labels compatible with AudioSet
- we are planning to release it to the public to support research on joint modeling
Posters
- Caffe2
- Lumos