The Price of Free: Privacy Leakage in Personalized Mobile In ...


The Price of Free: Privacy Leakage in Personalized Mobile In-App Ads
Wei Meng, Ren Ding, Simon P. Chung, Steven Han, Wenke Lee
College of Computing, Georgia Institute of Technology
2017.6.14

Outline
Background & Motivation
Methodology
Characterization of Mobile Ad Personalization
Privacy Leakage through Personalized Mobile Ads
Discussion

Mobile In-App Ad Ecosystem
[Figure: apps send ad requests such as {User: XYZ, App: …} to the ad network, which serves ads personalized for XYZ; money ($$$) flows among advertisers, the ad network, and apps.]

Previous & Recent Work on Mobile Advertising
Targeting & personalization [SmartAds (MobiSys13), MAdScope (MobiSys15)]
Privilege abuse by mobile ad libraries [AdSplit (Security12), AdDroid (ASIACCS12), LayerCake (Security13), One Origin Policy (Web Browser)]
Fraud in mobile advertising [AdSplit (Security12), LayerCake (Security13), DECAF (NSDI14)]
Privacy-preserving mobile advertising [M. Götz et al. (CCS12)]

Mobile (Android) In-App Ad Ecosystem
[Figure: the same ad request and personalized-ad flow among apps, the ad network, and advertisers.]

This Work
Characterizing mobile in-app ad personalization for real people: what personal information about real end users does a dominant ad network such as Google know and use in personalized mobile advertising?
Estimating mobile apps' ability to learn about a user by observing personalized ads: can an adversary with access to personalized mobile ads gain any information about real users?

Personal Information of Interest
Interest profile: {Music, Games, Sports, …}

Demographics: Age, Gender, Education, Income, Ethnicity, Political Affiliation, Religion, Marital Status, Parental Status (https://support.google.com/adwords/answer/2580383?hl=en)

Challenges and Our Approaches
Triggering personalization based on the target attributes of interest. Using synthetic user profiles is circular. Does the ad network know a user's gender? We do not know how the ad network learns a user's gender, so suppose we build profiles for male and female users anyway. If we then observe that ads are not correlated with gender, we would conclude that the ad network does not use or know users' gender. Really?
Our approach: use the profiles of real users.

Challenges and Our Approaches (cont.)
Isolating personalization from other target attributes. Many attributes may affect ad personalization: app developers can provide target attributes through ad library APIs, and ads may be personalized based on the user's geolocation.
Our approach: collect data in an isolated app.

Ad Collection
Our Mobile Ad Study app:
connects the user's device to our VPN server (isolating geolocation);
serves Google AdMob ads only;
provides no target attributes through the ad library API (isolating other information, except the device information that the ad library can access);
collects the list of installed apps that include the Google AdMob SDK.

Subject Recruitment
A Human Intelligence Task on Amazon Mechanical Turk, in which participants:
completed a questionnaire about their interests and demographic information;
used our data collection app to load 100 ads from Google AdMob.

We collected 217 valid responses from 284 participants.

Subject Distribution
Gender: Female 95 (43.78%); Male 122 (56.22%)
Political Affiliation: Independent 108 (49.77%); Democrat 80 (36.87%); Republican 29 (13.36%)
Parental Status: Not a parent 128 (58.99%); Parent 89 (41.01%)
Income: < $30K 107 (49.31%); $30K-$60K 67 (30.87%); > $60K 43 (19.82%)
Religion: Atheist 83 (37.79%); Non-Christian 47 (21.66%); Christian 88 (40.55%)
Marital Status: Single 124 (57.14%); Married 73 (33.64%); Separated 20 (9.22%)
Education: High school 78 (35.94%); Associates 50 (23.04%); Bachelor 71 (32.72%); Master & Doctoral 18 (8.30%)
Age: 18-24 45 (20.74%); 25-34 106 (48.85%); 35-44 47 (21.66%); 45-54 14 (6.45%); 55+ 5 (2.30%)
Ethnicity: Other 8 (3.69%); Hispanic 12 (5.53%); Asian 12 (5.53%); African American 23 (10.60%); Caucasian 162 (74.65%)

[Figure: Subject distribution (cont.)]

Dataset
We collected 695 unique ads, which resulted in 39,671 ad impressions delivered to 217 users.

Interest Profile Based Personalization
[Figure: a user's interest profile $P_{user}$ and the interests of the ads the user received $P_{ad}$, with example categories Beauty & Fitness, Art & Entertainment, Finance, Games, Sports, and Home & Garden]
Precision: $|P_{user} \cap P_{ad}| / |P_{ad}|$
Recall: $|P_{user} \cap P_{ad}| / |P_{user}|$
[Figure: Interest Profile Based Personalization - Precision]
[Figure: Interest Profile Based Personalization - Recall]
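The precision and recall definitions above can be computed per user with plain set operations. A minimal sketch, assuming each user's interest profile and the interests of the ads they received are available as sets of category names; the function name and the example categories are illustrative, and returning 0.0 for an empty set is a choice of this sketch rather than something the slides specify.

```python
def interest_precision_recall(p_user: set[str], p_ad: set[str]) -> tuple[float, float]:
    """Precision = |P_user ∩ P_ad| / |P_ad|; Recall = |P_user ∩ P_ad| / |P_user|."""
    overlap = len(p_user & p_ad)
    # Assumption of this sketch: define a ratio as 0.0 when its denominator set is empty.
    precision = overlap / len(p_ad) if p_ad else 0.0
    recall = overlap / len(p_user) if p_user else 0.0
    return precision, recall


p_user = {"Games", "Sports", "Finance"}      # hypothetical interest profile
p_ad = {"Games", "Beauty & Fitness"}         # hypothetical interests of received ads
print(interest_precision_recall(p_user, p_ad))  # (0.5, 0.3333333333333333)
```

Precision asks how many of the interests reflected in the ads match the user's profile; recall asks how much of the profile the ads reflect.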

For demographics based personalization, we clustered users into different demographic groups and tested the independence of ads and each demographic category with Pearson's chi-squared test of independence.
Null hypothesis: the ad is independent of the demographic category.
Significance level (p-value threshold): 0.005.
An ad is considered personalized based on the demographic category under test if the null hypothesis is rejected.
[Figure: Demographics Based Personalization - Unique Ads]
[Figure: Demographics Based Personalization - Ad Impressions]
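The per-ad test can be sketched in standard-library Python. This assumes the contingency table for one ad and one demographic category has one row per group (e.g., female, male) and two columns (users who did and did not receive the ad); the slides do not give the exact table layout, so that layout, the helper names, and the closed-form p-value for integer degrees of freedom are assumptions of this sketch. Unlike `scipy.stats.chi2_contingency`, it applies no Yates continuity correction to 2×2 tables.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite series (Abramowitz & Stegun 26.4.4).
    s = math.sqrt(x)
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal density at s
    term, series = s, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)  # term = x^((2r-1)/2) / (1*3*...*(2r-1))
        series += term
    return math.erfc(s / math.sqrt(2)) + 2 * phi * series


def chi2_independence(table: list[list[float]]) -> tuple[float, int, float]:
    """Pearson's chi-squared test of independence: returns (statistic, df, p-value)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


def is_personalized(table: list[list[float]], alpha: float = 0.005) -> bool:
    """Reject the null hypothesis (ad independent of the category) when p < alpha."""
    return chi2_independence(table)[2] < alpha


# Hypothetical tables: rows = female, male; columns = received ad, did not.
print(is_personalized([[90, 10], [10, 90]]))  # True (statistic 128.0, df 1)
print(is_personalized([[50, 50], [50, 50]]))  # False (statistic 0.0, p = 1.0)
```

With more groups per category (e.g., five age ranges), the same code applies with df = (groups - 1) for a two-column table.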

In summary, both interest profile based personalization and demographics based personalization were prevalent in mobile in-app advertising.

Classification Models of Demographic Information

Features: the number of impressions of ads that are correlated with each demographic category, and the list of installed apps that include the Google AdMob SDK.
Evaluation: the 217 samples were randomly divided into 5 sets for 5-fold cross validation.
Metric for the severity of privacy leakage: cross-validated accuracy (the mean of the accuracies of the 5 validations). In a perfectly privacy-preserving system, an adversary cannot achieve significantly better accuracy than tossing coins.

Baseline Classifiers
Dummy

Assumption: samples are evenly distributed across labels.
Predicts every possible label with the same probability.
Augmented Dummy
Assumption: samples are not evenly distributed, and the population distribution is known a priori.
Always predicts the most popular label.

Regrouping Subjects
Observation: samples were not evenly distributed across all labels, so we regrouped them:
Gender: Female 95 (43.78%); Male 122 (56.22%)
Political Affiliation: Independent 108 (49.77%); Non-Independent 109 (50.23%)
Parental Status: Not a parent 128 (58.99%); Parent 89 (41.01%)
Income: < $30K 107 (49.31%); > $30K 110 (50.69%)
Religion: Atheist 83 (37.79%); Non-Christian 47 (21.66%); Christian 88 (40.55%)
Marital Status: Single 124 (57.14%); Not Single 93 (42.86%)
Education: High school 78 (35.94%); Associates 50 (23.04%); Bachelor or higher 89 (41.02%)
Age: 18-27 71 (32.72%); 28-33 71 (32.72%); 34+ 75 (34.56%)
Ethnicity: Other 8 (3.69%); Hispanic 12 (5.53%); Asian 12 (5.53%); African American 23 (10.60%); Caucasian 162 (74.65%)

Evaluation Result (cross-validated accuracy: Best classifier / Dummy / Augmented Dummy)
Age: 0.54 / 0.33 / 0.35
Education: 0.40 / 0.33 / 0.41
Ethnicity: 0.76 / 0.20 / 0.75
Gender: 0.74 / 0.50 / 0.56
Income: 0.62 / 0.50 / 0.51
Marital Status: 0.63 / 0.50 / 0.57
Parental Status: 0.66 / 0.50 / 0.59
Political Affiliation: 0.59 / 0.50 / 0.50
Religion: 0.43 / 0.33 / 0.41
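The two baselines depend only on the label counts, and the evaluation metric is the mean of the per-fold accuracies. The sketch below shows both in standard-library Python. The function names and the majority-class plug-in model are hypothetical, not the authors' code (the slides do not say which model produced the "Best" row), and the shuffled round-robin fold assignment is an assumption; the slides state only that the samples were randomly divided into 5 sets.

```python
import random
from collections import Counter


def baseline_accuracies(counts: dict[str, int]) -> tuple[float, float]:
    """Expected accuracy of the two baselines, given the count of each label.

    Dummy: guesses uniformly among labels -> 1 / (number of labels).
    Augmented Dummy: always predicts the most popular label -> max count / total.
    """
    total = sum(counts.values())
    return 1 / len(counts), max(counts.values()) / total


def majority_trainer(features, labels):
    """Hypothetical plug-in model: ignores features, predicts the training majority."""
    majority = Counter(labels).most_common(1)[0][0]
    return lambda _x: majority


def cross_validated_accuracy(features, labels, trainer, k=5, seed=0):
    """Mean of per-fold accuracies after randomly splitting samples into k sets."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint, roughly equal sets
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        predict = trainer([features[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(features[i]) == labels[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / len(accuracies)


# Gender counts from the Regrouping Subjects slide.
print([round(v, 2) for v in baseline_accuracies({"Female": 95, "Male": 122})])  # [0.5, 0.56]
```

For Gender this gives 0.50 and 0.56, and for Ethnicity 0.20 and 0.75, matching the Dummy and Augmented Dummy rows; a real evaluation would pass a trained classifier as `trainer`.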

Privacy Implication
In Android, the host app can observe all personalized ads. The ad network may therefore be inadvertently leaking some of its collected user information (age, gender, parental status) to the app developer. An adversary also has a non-trivial advantage in predicting other aspects of the user's demographics, which may be correlated with those collected and used by ad networks.

Limitation
Our dataset is small. More aggressive adversaries may achieve significantly better results: they can invest more resources to obtain better ground-truth data, and they can observe the ads received by users over a longer period of time.

Countermeasures
The root cause of the privacy leakage problem is the lack of isolation between ads and host apps. Adopting HTTPS will not stop the problem, because HTTPS only protects ads in transit while the host app still observes the ads it displays. We really need isolation between ads and host apps.
What can ad networks do? Add noise to personalized results, and provide coarser-grained targeting options.

Summary
We collected both the profiles and the observed mobile ad traffic of 217 real users.
We studied ad personalization based on real users' interest profiles and demographics.
We demonstrated that personalized in-app advertising can leak potentially sensitive information to any app that hosts ads.

Thank you! Q&A
