Security Issues in the Web 2.0 Era. Sheng-Wei Chen (陳昇瑋), Institute of Information Science, Academia Sinica
What is Web 2.0? (Andy Budd) “Putting the We in Web” “… the Living Web” -- Newsweek, 4/3/2006
Global Traffic Ranking
Web 2.0 Growth
Web 2.0: Definition. Web 2.0 is the network as platform, spanning all connected devices; Web 2.0 applications are those that make the most of the intrinsic advantages of that platform: delivering software as a continually-updated service that gets better the more people use it, consuming and remixing data from multiple sources, including individual users, while providing their own data and services in a form that allows remixing by others, creating network effects through an "architecture of participation," and going beyond the page metaphor of Web 1.0 to deliver rich user experiences. (Figure courtesy of Irwin King)
Web 2.0: Interpretations (Figure courtesy of Irwin King)
 
The Conversation Prism www.briansolis.com
Social Network Services (SNS): A replication in electronic form of human relationships and trust connections. Posting personal profiles and user-created content. Socially focused interactions: recommendations, discussion, blogging, organization of offline events. Defining social relationships.
E-mail users: 543 million. Social networking users (social spaces): 484 million. Source: comScore Inc., August 2007; WSJ, 10/18/07.
 
Facebook: Over 175 million profiles. Increased by 270% in one year (ending in June 2007). Facebook's valuation translates to US$286 per user profile in 2007. The average user has 120 friends on the site. More than 3 billion minutes (around 5,800 years) are spent on Facebook each day.
One Facebook Profile (out of 175 million)
MySpace: Founded in August 2003. Acquired by Fox Interactive Media in October 2005 (at a price of US$35 per profile). 225 million profiles as of March 2008. The most visited site in the US (> 114 million visitors) in June 2007. On average, 300,000 new people sign up on MySpace every day.
Let’s Look … MySpace http://www.myspace.com/
Every advantage has its drawback: The top 10 social networking sites grew their audience from 46.8 million in 2005 to 68.8 million in April 2006, reaching 45% of active Web users.
Threats Arising from SNS Development: Security and privacy were not the primary concerns in SNS development.
Time to move the sentry post to Web 2.0!
Potential Threats: 1. Nothing Is Ever Deleted from the Internet; 2. Information Leakage from Contact Descriptions; 3. Face Recognition; 4. Location Tracking; 5. Image Tagging and Metadata; 6. Spamming; 7. Spear Phishing; 8. SNS Aggregators; 9. XSS, Viruses, and Worms; 10. Information Leakage due to Network Infiltration; 11. Reputation Slander through Identity Theft; 12. Cyberstalking
“There is so much more personal information online.”
1. Nothing is Ever Deleted from the Internet
Yahoo Site History 1996
Yahoo Site History 2000
Yahoo Site History 2005
Yahoo Site History 2009
Internet Data Is Never Deleted: Users reveal sensitive information (e.g., dates, political views) in profiles. Data can be downloaded and stored over time by third parties, helped by web page sampling techniques and the low cost of storage. Examples: Miss New Jersey was threatened with images taken from her profile; two British tennis stars were suspended for revelations made on SNS.
An Online Quiz (for fun) yes yes yes
Blackmail claim stirs fears over Facebook
LTA suspends top junior players
Survey: Have you asked a website to delete data that you no longer wanted to be public? Outcome of the request to delete data:
2. Information Leakage on SNS
Pseudonym != Anonymity axxxxx1 可愛的 * 嘉嘉 *  axxx7 ??? 俐嘉 ,,( 要記得我悠 !! 壹樣斗 * 我會記得你 ?  axxxx0 哩尬  axxxx5 俐嘉   --  axxx4 【 愛 ‧哩軋 】我們有許多小秘密  axx6 同學 俐嘉   axxx2 哩嘎 ~  瘋 ... 但是有氣質  axxxxxx3 利嘉   (2)  axxxxxx1 俐嘉   bxxxxx1 哩  軋  bxxxxx4 俐嘉   *  活潑可愛的小女孩  cxxxxx1 很有活力也很可愛的學妹 _ 俐嘉   dxxxxxx0 哩嘎  gxxx3 哩尬 ( 和窩起阿達 )  qxxxxxx8 哩嘎  rxxxxxx6 力嘎  ( 郭郭的七辣 )  sxxxxy 俐 嘉  sxxxxxxxxxx6 嘉嘉 ?  wxxxxxxx8  *“ 劉俐嘉   yxxxxxxx6 小孩  〝 俐嘉 〞  yxxxxxxx6 俐嘉   zxxxx6 ▽->  俐 嘉  。 ? 〞  zxxxxx7 ◆╭☆ ﹋ 俐嘉 ﹋☆╮◇
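Involuntary name leakage: Users have not disclosed their real names, yet the names can be inferred from friends' descriptions. User privacy cannot be guaranteed. Is the real name 劉德榮?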
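Case study: Wretch (無名小站). The largest user base in Taiwan (over 3.9 million users). All users participate under pseudonyms. Data collected on 766,972 users (20%). Users often use real names when describing their friends. Analysis steps: analyze candidate strings that recur across different descriptions; match against the university entrance exam roster; match against a list of common words. “Involuntary Information Leakage in Social Network Services,” Ieng-Fat Lam, Kuan-Ta Chen, and Ling-Jyh Chen, Proceedings of IWSEC 2008. http://mmnet.iis.sinica.edu.tw/publication_detail.html?key=lam08_wretch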
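The first analysis step (finding candidate strings that recur across different friend descriptions) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `candidate_names`, the substring lengths, the support threshold, and the tiny common-word list are all assumptions, and the roster-matching step is omitted. The sample descriptions are taken from the anonymized "Pseudonym != Anonymity" slide.

```python
from collections import Counter

def candidate_names(descriptions, common_words, min_len=2, max_len=3, min_support=2):
    """Rank substrings that recur across different friend descriptions."""
    support = Counter()
    for text in descriptions:
        seen = set()  # count each substring at most once per description
        for n in range(min_len, max_len + 1):
            for i in range(len(text) - n + 1):
                s = text[i:i + n]
                # keep only runs of CJK ideographs; skip spaces, punctuation, Latin text
                if all("\u4e00" <= ch <= "\u9fff" for ch in s):
                    seen.add(s)
        support.update(seen)
    # drop rare strings and strings found in the common-word list
    return [(s, c) for s, c in support.most_common()
            if c >= min_support and s not in common_words]

# Descriptions that different friends wrote about the same (pseudonymous) user
descriptions = ["同學 俐嘉", "可愛的俐嘉", "劉俐嘉", "活潑可愛的小女孩 俐嘉"]
common_words = {"同學", "可愛", "可愛的", "愛的"}  # hypothetical common-word list
print(candidate_names(descriptions, common_words))
```

In this toy input the only string that survives both filters is the given name shared by all four descriptions, which is the kind of leakage the case study measures.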
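A user's tendency to address friends by their real names is highly correlated with the rate at which the user is addressed by real name by friends. Relationship between the name leakage ratio and gender (top figure) and user age (bottom figure). Name type: inferred ratio. Nickname: 60%; full name: 30%; given name (excluding surname): 72%; full name or given name: 78%.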
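Leakage of age and school records: Involuntary leakage of school attended and age. Users have not disclosed their school or age, yet their education and age group can be inferred from their friend relationships. Inference of school attended and age: find the users who have disclosed this information; infer their friends directly from relationship keywords (direct inference); infer the friends of already-inferred users (indirect inference).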
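The direct and indirect inference steps resemble label propagation over the friend graph. The sketch below is an assumption-laden illustration rather than the study's method: it treats a hypothetical "classmate" relationship keyword as meaning "same school", propagates labels breadth-first, and records the number of edges traversed (0 = self-disclosed, 1 = direct inference, more than 1 = indirect inference). User names and edges are made up.

```python
from collections import deque

def infer_schools(disclosed, classmate_edges):
    """Propagate disclosed school labels along 'classmate' edges.

    Returns {user: (school, hops)}; hops is the number of edges between
    the user and the nearest user who disclosed the school.
    """
    neighbors = {}
    for a, b in classmate_edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    inferred = {user: (school, 0) for user, school in disclosed.items()}
    queue = deque(disclosed)
    while queue:
        user = queue.popleft()
        school, hops = inferred[user]
        for friend in sorted(neighbors.get(user, ())):
            if friend not in inferred:  # the first (shortest-path) inference wins
                inferred[friend] = (school, hops + 1)
                queue.append(friend)
    return inferred

disclosed = {"alice": "School A"}                      # only alice published her school
edges = [("alice", "bob"), ("bob", "carol"), ("dave", "erin")]
print(infer_schools(disclosed, edges))
```

Users with no path to a disclosing user (dave and erin here) stay uninferred. Treating "classmate" as transitive is a simplification; real relationships may span different schools.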
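Inference results for users' schools (top figure) and average inference range (number of edges). Inference results for users' ages (top figure) and average inference range (number of edges).
Sensitive Information Leakage: real name; education history; career history; mobile phone number; real-life relationships (date, spouse; relatives; boss, staff).
Wretch Intelligence Analysis Office (無名小站情報分析事務所) http://mmnet.iis.sinica.edu.tw/proj/wretchinfo/
Analysis results (example 1)
Analysis results (example 2)
Survey results (1)
Survey results (2)
Follow-up tracking of service usage (1): change in the degree of name leakage; whether there is any name leakage
Follow-up tracking of service usage (2): relationship between the degree of name leakage and user reactions
Follow-up tracking of service usage (3): relationship between the degree of name leakage and user attitudes
Wretch Tag Art (無名小站.文字繪) http://mmnet.iis.sinica.edu.tw/proj/tagart/
Wretch Tag Art (無名小站.文字繪)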
使用者不喜歡的標籤  公關 , 哈哈 , 跟好 , 天空 , 羽豬 , 偉宏 , 科科 , 哭哭 , 還有 , 一個 , 不過 , 媺棻 , 討厭 , 朋友 , 數學 , 憲緯 , 但老 , 老媽 , 人很 , 會讓 , 又說 , 師大 , 國小 , 最愛 , 最有 , 小芝 , 芝君 , 表姐 , 世傑 , 轉到 , 士傑 , 睡覺 , 台客 , 鴿子 , 學生 , 王子 , 身邊 , 一直 , 寵物 , 世界 , 北安 , 老公 , 君仁 , 毛毛 , 勁舞 , 道恆 , 舞步 , 阿海 , 烏龜 , 咖美 , 小米 , 龔柏 , 麻齊 , 佩瑜 , 罐子 , 爺爺 , 趙哥 , 小董 , 阿囧 , 東華 , 其實 , 薛球 , 阿嫂 , 姐姐 , 范鑫 , 盈靜 , 胖丁 , 彥浦 , 表姐 , 表姊 , 叔叔 , 白熊 , 雨涵 , 曉初 , 竹北 , 小雯 , 學姊 , 雯歆 , 兩老 , 妹子 , 口丁 , 雄中 , 男友 , 珮君 , 嘉蓉 , 洗澡 , 堯安 , 哈囉 , 抱歉 , 阿毛 , 媽媽 , 北安 , 市立 , 下棋 , 吟芝 , 白目 , 柏宏 , 低能 , 如珍 , 學姐 , 匯捷 , 名字 , 資通 , 實踐 , 縣立 , 仁愛 , 燕文 , 杏如 , 海灘 , 陽光 , 娃娃 , 立航 , 阿呈 , 陰暗 , 角落 , 大爺 , 亦方 , 沛嫻 , 不多 , 眼中 , ㄎㄎ , 聽說 , 中文 , 雅芳 , 氣直 , 嬸嬸 , 眉毛 , 妤雯 , 瑞鋒 , 瑞峰 , 國小 , 考試 , 好友 , 欣蓉 , 討厭 , 佳宜 , 才不 , 想交 , 雅亭 , 國防 , 軍警 , 立寰 , 皇冕 , 黑道 , 合嘴 , 管家 , 晚娘 , 嬌嗔 , 年級 , 慈敏 , 阿敏 , 小敏 , 洪小 , 明道 , 小摟 , 蘭妙 , 慶伶 , 畇伶 , 昀伶 , 想妳 , 孤單 , 水睞 , 四妹 , 撿角 , 測幹 , 凱凱 , 矮子 , 釣魚 , 阿綱 , 殭屍 , 好人 , 胖哥 , 謝肥 , 芳芳 , 種豬 , 經商 , 想改 , 小祺 , 警察 , 大學 , 高中 , 豬頭 , 什麼 三是井 , 卻無法 , 平凡人 , 看不懂 , 黃媺棻 , 級媺棻 , 級米香 , 好人卡 , 彰憲偉 , 倒著唸 , 章憲偉 , 倒著念 , 國立高 , 都不理 , 說自己 , 加拿大 , 台北縣 , 勁舞團 , 出家人 , 一歲卻 , 吳木木 , 黃里歐 , 施佩君 , 高雄縣 , 陳漢典 , 哈哈哈 , 政治人 , 周百合 , 說賤人 , 外國團 , 哈哈哈 , 試試看 , 林紅君 , 小護士 , 蘭司心 , 愛自拍 , 鬼畜嘉 , 陳珮君 , 王冠腸 , 白蟾蜍 , 堯安姊 , 搖滾樂 , 據說還 , 吳秉寰 , 陳如珍 , 如珍啦 , 對不起 , 屏東縣 , 小麥肌 , 小海豹 , 沛沛嫻 , 或許連 , 個怎樣 , 說不清 , 能告訴 , 周百合 , 好人卡 , 上學除 , 從民生 , 挖哈哈 , 賴怡安 , 林世傑 , 你好棒 , 阿切切 , 被狗幹 , 沒路用 , 褚又豪 , 我愛你 , 謝謝你 , 個怪人 , 第二枚 , 陳立寰 , 池塘裡 , 小敏兒 , 洪小囉 , 洪小敏 , 洪小摟 , 蟹肉棒 , 小小白 , 陳蘭妙 一事無成 , 無名小站 , 逛街購物 , 不用咪聽 , 身邊一直 , 下棋彈琴 , 黃金獵犬 , 縣立文德 , 市立大直 , 市立北安 , 國立鳳新 , 正緩緩為 , 妳在哪裡 , 正在就學 , 我哈哈哈 , 運動釣魚 , 蝦米咚咚 , 好人一個 , 你是好人 , 國立台灣 , 浪費時間 , 國防大學 , 葛神靈鬼 , 市立大直 , 下棋彈琴 , 戴阿格那 , 小粉公主 , 縣立仁愛 , 被人保護 , 當瘦子但 , 縣立板橋 1.  姓名 , 2.  關係 , 3.  不雅綽號 , 4.  身份 , 5.  學經歷
(Translation of the slide above: "Tags that users disliked", followed by the collected tags. The disliked tags fall into five categories: 1. names, 2. relationships, 3. indecent nicknames, 4. identity, 5. education and work history.)
Visual Shopping on Like.com
Content-based Image Retrieval (CBIR) [Figure: example images grouped by reptile taxa, e.g., Iguania, Chelonia, Amphisbaenia, Alligatoridae, Crocodylidae]
3. Face Recognition: Facebook hosts in excess of 30 billion user photos, growing at a rate of > 14 million every day.
Face Recognition: Photos are a data source for identifying profiles across services using face recognition algorithms. Correlate profiles on other services given a profile. Identify profiles on various services given a photo. Consider online dating services: Match.com, Yahoo! Kimo Dating (奇摩交友), …
4. Location Identification: Matching on location-specific features: road signs, a painting in a room, the arrangement of furniture. Allows linking a user to location data, enabling stalking, unwanted marketing, and blackmail.
A Parrot As The Feature
5. Image Tagging and Metadata: Users can tag images with a person’s real name, SNS profile, or email address. Users’ privacy may be under great threat from image tags posted by others. (Example tags: 流川楓, 櫻木花道, 三井)
Image Tagging and Metadata: EXIF data is embedded in photos. The serial number of the camera can be traced to a warranty registration card. Example: the “Harry Potter and the Deathly Hallows” event.
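One way a site or user could reduce this kind of metadata leakage is to drop the EXIF segment before a photo is published. The following is a minimal sketch using only the Python standard library, under simplifying assumptions: it handles the common JPEG layout in which every header segment before the start-of-scan marker carries a two-byte length, and the function name `strip_exif` and the synthetic sample bytes are illustrative only.

```python
import struct

def strip_exif(jpeg: bytes) -> bytes:
    """Return a copy of a JPEG byte string without its EXIF (APP1) segments."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(jpeg[:2])
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # start of scan: compressed image data follows
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])  # length includes its own 2 bytes
        segment = jpeg[pos:pos + 2 + length]
        is_exif = marker == 0xE1 and segment[4:10] == b"Exif\x00\x00"
        if not is_exif:
            out += segment
        pos += 2 + length
    out += jpeg[pos:]  # copy the scan data and trailer unchanged
    return bytes(out)

# Synthetic JPEG-like byte string: SOI, an EXIF segment, an APP0 segment, scan, EOI
payload = b"Exif\x00\x00" + b"SERIAL123"
exif_seg = b"\xff\xe1" + struct.pack(">H", len(payload) + 2) + payload
app0_seg = b"\xff\xe0" + struct.pack(">H", 7) + b"JFIF\x00"
rest = b"\xff\xda\x00\x02\x12\x34\xff\xd9"
sample = b"\xff\xd8" + exif_seg + app0_seg + rest
cleaned = strip_exif(sample)
print(b"SERIAL123" in sample, b"SERIAL123" in cleaned)
```

Real files can contain other metadata carriers (for example XMP or maker-specific segments), so a production tool would need to cover more than this one segment type.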
6. Spamming: SNS are starting to replace email. Through friend invitations and comment posting: software like FriendBot can automate friend invitations and comment posting (based on demographic criteria); SNS accounts can be easily created and discarded. Through user profiles: stealing members’ passwords to place advertisements on their profiles.
State-of-the-art Solution CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart)
Text-based CAPTCHA
WHAT ARE THESE PICTURES OF? Image-based CAPTCHA
THE IMAGES NEED TO BE  RANDOMLY DISTORTED Image-based CAPTCHA
PLAYER 1 PLAYER 2 THE ESP GAME http://www.espgame.org/ GUESSING:   CAR GUESSING:   BOY GUESSING:   CAR SUCCESS! YOU AGREE ON CAR SUCCESS! YOU AGREE ON CAR GUESSING:   KID GUESSING:   HAT
© 2004 Carnegie Mellon University, all rights reserved. Patent Pending.
SAMPLE  LABELS BEACH CHAIRS SEA PEOPLE MAN WOMAN PLANT OCEAN TALKING WATER PORCH
15 MILLION LABELS   WITH 75,000 PLAYERS THE ESP GAME  IS FUN THERE ARE MANY PEOPLE THAT PLAY  OVER 20 HOURS A WEEK
SEARCH RESULTS OF  CAR
SEARCH RESULTS OF  DOG
SEARCH RESULTS OF BRITNEY SPEARS (小甜甜布蘭妮)
SEARCH RESULTS OF  GOOGLE
Having Fun = Work
Idea of Human Computation Take advantage of people’s desire to be entertained and perform useful tasks as a side effect
Human Computation as OCR System
7. Phishing and Social Phishing: More than 66,000 phishing cases were reported to or detected by the Anti-Phishing Working Group (APWG) in September 2007. Up to 95% of phishing targets were related to financial services and Internet retailers. In 2007 (a survey by Gartner, Inc.): more than $3.2 billion was lost due to phishing in the US; 3.6 million adults lost money in phishing attacks, far more than the 2.3 million who did so the year before.
Phishing Statistics 43%  of adults have received a phishing contact.  5%  of those adults gave their personal information.
Phishing Attacks
Phishing through Emails
Official vs. Phishing Pages: http://www.ebay.com/ (official) vs. http://www.ebay.com.fake.cc/ (phishing)
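The two URLs on this slide differ in where the host name ends, not where it begins. A minimal sketch of that check, using only the standard library: `is_on_domain` is a hypothetical helper, and the trusted domain is supplied by the caller rather than derived from a public-suffix list.

```python
from urllib.parse import urlsplit

def is_on_domain(url: str, trusted_domain: str) -> bool:
    """True if the URL's host is the trusted domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    trusted = trusted_domain.lower()
    # compare from the right: "www.ebay.com.fake.cc" ends in "fake.cc", not "ebay.com"
    return host == trusted or host.endswith("." + trusted)

print(is_on_domain("http://www.ebay.com/", "ebay.com"))
print(is_on_domain("http://www.ebay.com.fake.cc/", "ebay.com"))
```

A check like this only helps against look-alike host names; it does nothing about pages hosted on a compromised legitimate domain.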
Spear Phishing: Spear phishing means targeted phishing attacks. An experiment at Indiana University showed that spear phishing attacks can achieve a hit rate of 72%, compared with 15% for a control group. Context-aware attacks: knowing your personal information; knowing information about your friends; impersonating your friends. SNS profiles may be used for phishing attacks: the JS/Quickspace worm; posting comments as your friends. Particularly effective due to the extra trust within a circle of friends.
Anti-Phishing Techniques: blacklist / whitelist; logo recognition; content-based recognition; page image similarity; password hashing; mutual authentication (e.g., personal visual clues); site seals
Our Layout-based Detection Method: Capture the screen of the phishing page
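As an illustration of the "password hashing" item, the idea is that the password sent to a site is derived from a master password and the site's host name, so a look-alike host receives a value that is useless on the real site. The slide does not specify a scheme; the sketch below is one assumed construction (PBKDF2 with the host name as salt), and `site_password` is a hypothetical name.

```python
import base64
import hashlib
from urllib.parse import urlsplit

def site_password(master_password: str, url: str, length: int = 16) -> str:
    """Derive a per-site password from a master password and the URL's host."""
    host = (urlsplit(url).hostname or "").lower()
    digest = hashlib.pbkdf2_hmac(
        "sha256",
        master_password.encode("utf-8"),
        host.encode("utf-8"),  # the host acts as the salt, binding the result to the site
        100_000,
    )
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]

real = site_password("correct horse", "http://www.ebay.com/")
fake = site_password("correct horse", "http://www.ebay.com.fake.cc/")
print(real != fake)
```

Salting with the full host name means different subdomains of the same site yield different passwords; a practical tool would more likely key on the registered domain.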
Block Analysis
Layout Analysis
Match example: eBay original page (left) and a phishing page (right)
Performance Evaluation: Collected data: 312 original web page screens; 1,531 phishing page screens, targeting Bank of America (46), Charter One Money Manager GPS (102), eBay (654), Marshall and Ilsley Bank (138), and PayPal (591). We use a naïve Bayes classifier to perform supervised classification.
Example: Correct Classification
Example: Correct Classification
Example: Correct Classification
Example: Correct Classification
Example: Incorrect Classification
Example: Incorrect Classification
Example: Incorrect Classification
Example: Incorrect Classification
Our Local-Feature-based Detection Method: Step 1: Visual assessment with local content descriptors. Contrast Context Histogram (CCH): invariant to scale, rotation, etc.; even more efficient than SIFT, the best-known descriptor, noted for its excellent performance. Step 2: Page scoring and classification. Scoring criteria: correct matching rate; ratio of matched area. Naïve Bayes classification.
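Step 2 can be pictured as a naïve Bayes classifier over the two scores named on the slide (correct matching rate and ratio of matched area). The sketch below is a generic Gaussian naïve Bayes written from scratch, not the authors' classifier; the training numbers are made up for illustration, and the class labels "phishing" and "other" are assumptions.

```python
import math

def fit_gaussian_nb(samples, labels):
    """Estimate per-class priors, feature means, and feature variances."""
    model = {}
    for label in set(labels):
        rows = [x for x, y in zip(samples, labels) if y == label]
        columns = list(zip(*rows))
        means = [sum(col) / len(rows) for col in columns]
        # the small constant keeps a variance from collapsing to zero
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
                     for col, m in zip(columns, means)]
        model[label] = (len(rows) / len(samples), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest log posterior for feature vector x."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

# Made-up (matching rate, matched-area ratio) pairs, for illustration only
train_x = [(0.90, 0.80), (0.85, 0.75), (0.95, 0.90),
           (0.10, 0.05), (0.20, 0.15), (0.15, 0.10)]
train_y = ["phishing", "phishing", "phishing", "other", "other", "other"]
model = fit_gaussian_nb(train_x, train_y)
print(predict(model, (0.88, 0.82)), predict(model, (0.12, 0.08)))
```

The "naïve" part is the independence assumption: each feature contributes its own Gaussian log-likelihood term, and the terms are simply added.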
Phishing Page Matching (Classification)
Performance Evaluation: Superior to the EMD (Earth Mover’s Distance) scheme (IEEE TDSC, 2006)
Phishgig 1. Installed in Firefox 3.0.1 2. Live Status 3. Protected Pages Management http://mmnet.iis.sinica.edu.tw/proj/phishgig/
Phishgig in Action 1. Legitimate eBay login page 2. Fake eBay login page
8. SNS Aggregators: Integrate data from various SNS into a single web application, e.g., Snag, ProfileLinker. Several SNS profiles are protected by a single username/password authentication. An estimate shows an overlap of at least 15% between two of the major social networking sites.
9. XSS, Viruses, and Worms: The SAMY virus, which infected MySpace profiles, spread to over one million users within just 20 hours. One of the fastest-spreading viruses. Forced MySpace to shut down its site.
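Worms of this kind spread because user-supplied text is rendered as active markup in other users' browsers. A minimal sketch of the basic countermeasure, escaping user input before it is placed into a page, is shown below; `render_comment` is a hypothetical helper, and it assumes comments are plain text with no markup allowed.

```python
import html

def render_comment(author: str, comment: str) -> str:
    """Build an HTML fragment with user-supplied text escaped."""
    # html.escape turns <, >, & and quotes into entities, so the browser
    # displays the text instead of interpreting it as markup or script
    return "<p><b>{}</b>: {}</p>".format(html.escape(author), html.escape(comment))

print(render_comment("eve", "<script>alert(1)</script>"))
```

Sites that deliberately allow some user HTML, as profile pages often do, need a stricter allow-list sanitizer rather than plain escaping.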
10. Information Leakage due to Network Infiltration: Currently, anyone with a usable email address can join any geographical network on Facebook. An experiment on Facebook: a user sent invitations to 250,000 users across the US; 75,000 users (30%) accepted the invitations (and revealed their profile information to a random stranger).
Information Leakage due to Network Infiltration: Another experiment: antivirus company Sophos created a profile page for “Freddi Staur” (an anagram of “ID Fraudster”), a green plastic frog with minimal personal information in the profile. 200 friend requests were sent; 87 of the 200 responded. 72% of respondents revealed their email addresses; 84% revealed their birth date.
11. Reputation Slander through Identity Theft: Fake profiles may be created in the name of well-known persons or dead celebrities, e.g., Galileo has a profile on MySpace with 3,000 friends. Fake profiles may be used for malicious purposes, e.g., defamation. The target of the attack cannot access the profile. Most SNS perform only weak authentication of registrants. But how?
12. Cyberstalking: Around 20% of users on Facebook disclosed their full address and at least two classes they are attending. 78% provided instant messaging accounts suitable for tracking their online status. Mobile SNS, e.g., Twitter, emphasize location data.
Micro blogging
Twitter is  HOT !
Twitter Vision 3D
Twitter Map
Having seen so many security problems, what can we do?
 
Efforts to Cope with Threats (1): Restrict spidering and bulk downloads (but how?). Require the consent of the data subject for tagging. Provide more privacy control over search results.
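One possible answer to the "but how?" about restricting spidering is per-client rate limiting, so that bulk profile downloads are slowed or refused. The sketch below is an assumed design, not something described in the talk: a sliding-window limiter keyed by a client identifier, with the class name and limits chosen for illustration. Timestamps are passed in explicitly so the behavior is easy to follow.

```python
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most max_requests per client within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # client id -> timestamps of allowed requests

    def allow(self, client_id: str, now: float) -> bool:
        recent = self._history[client_id]
        # forget requests that have fallen out of the window
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True

limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
print([limiter.allow("crawler", t) for t in (0, 1, 2, 3, 60)])
```

Rate limits raise the cost of bulk harvesting but do not stop a crawler that spreads requests over many accounts or addresses, which is part of why the slide leaves the question open.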
Privacy Control Settings of Facebook
However, … Profile searchability: We measured the percentage of users who changed the default search setting from being searchable by everyone on Facebook to being searchable only by CMU users; 1.2% of users (18 female, 45 male) made use of this privacy setting. Profile visibility: We evaluated the number of CMU users who changed profile visibility by restricting access from unconnected users; only 3 profiles (0.06%) in total fall into this category.
Efforts to Cope with Threats (2) Image anonymization techniques, e.g., face de-identification. K-Same: thwarts face recognition while many facial details remain.
Efforts to Cope with Threats (3): Reputation management: rating an account or an object (e.g., a comment); reporting inappropriate behavior or content; a collective decision, not one made by experts. Collusion: a secret agreement between two or more parties for a fraudulent, illegal, or deceitful purpose. Unfairly low ratings: bad-mouthing. Unfairly high ratings: ballot stuffing.
Rating Score Aggregation: A reviewer is considered trustworthy if his/her earlier reviews are more consistent with public opinion.
i    T(i)   Rating
1    1      3
2    0.4    5
3    0.3    3
4    0.9    4
5    0.6    1
6    0.1    5
7    0.3    2
8    0.1    5
TVBS vs. the counting method: TVBS outputs 3!! The counting method outputs 5!!
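The two outputs on this slide can be reproduced from the table. The slide does not spell out how TVBS aggregates, so the sketch below makes an assumption: each rating is weighted by its reviewer's trust value T(i), either as a weighted vote or as a weighted mean. Both happen to agree with the slide's result for this data; the function names are illustrative.

```python
from collections import Counter, defaultdict

# (i, T(i), rating) rows from the slide
reviews = [(1, 1.0, 3), (2, 0.4, 5), (3, 0.3, 3), (4, 0.9, 4),
           (5, 0.6, 1), (6, 0.1, 5), (7, 0.3, 2), (8, 0.1, 5)]

def counting_method(reviews):
    """Return the most frequent rating, ignoring reviewer trust."""
    return Counter(rating for _, _, rating in reviews).most_common(1)[0][0]

def trust_weighted_vote(reviews):
    """Return the rating with the largest total trust behind it."""
    weight = defaultdict(float)
    for _, trust, rating in reviews:
        weight[rating] += trust
    return max(weight, key=weight.get)

def trust_weighted_mean(reviews):
    """Return the trust-weighted average rating."""
    return (sum(trust * rating for _, trust, rating in reviews)
            / sum(trust for _, trust, _ in reviews))

print(counting_method(reviews))                 # three low-trust reviewers agree on 5
print(trust_weighted_vote(reviews))             # rating 3 carries total trust 1.3
print(round(trust_weighted_mean(reviews), 2))   # 11.7 / 3.7
```

The three votes for rating 5 come from reviewers with a combined trust of only 0.6, which is why a plain count and a trust-aware scheme disagree.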
Collusion-Resistant Impeachment System: Lets users report and file complaints about misbehavior. Social network systems (Wretch (無名小站), Facebook, Yahoo! Kimo Dating (奇摩交友), …). Online games (reporting cheating or the use of bots). Bad-mouthing problem: some voters, by secret agreement, vote against victims for a fraudulent, illegal, or deceitful purpose. Our goal: detecting misbehaving users despite collusion.
Our Proposed Algorithm: Step 1: Find voting communities, using Newman's community structure analysis algorithm to break the voting graph into highly connected subcomponents. Step 2: Identify bad users based on votes between voting communities. Cluster users according to two features: votee's outside edges (misbehaving users tend to be voted against by different communities) and voter's outside edges (voters who vote against misbehaving users tend to vote against misbehaving users in different communities). Select the cluster with the most outside edges as the misbehaving user group. (Figure legend: outside edge, inside edge.)
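The outside-edge features in Step 2 can be sketched as follows. This is a heavily simplified illustration rather than the proposed algorithm: the community assignment is assumed to be already computed (Step 1 is not implemented), and a fixed threshold `min_outside_votes` stands in for the clustering step. All user names and votes are made up.

```python
from collections import Counter

def outside_edge_counts(votes, community_of):
    """Count votes that cross community boundaries, per votee and per voter.

    votes: iterable of (voter, votee) pairs; community_of: user -> community id.
    """
    votee_outside = Counter()
    voter_outside = Counter()
    for voter, votee in votes:
        if community_of[voter] != community_of[votee]:
            votee_outside[votee] += 1
            voter_outside[voter] += 1
    return votee_outside, voter_outside

def flag_misbehaved(votes, community_of, min_outside_votes):
    """Flag votees reported from outside their own community often enough."""
    votee_outside, _ = outside_edge_counts(votes, community_of)
    return sorted(u for u, n in votee_outside.items() if n >= min_outside_votes)

# Colluders c1..c3 and their victim share community A (inside edges only);
# the cheater is reported by users from communities A and B (outside edges).
community_of = {"c1": "A", "c2": "A", "c3": "A", "victim": "A",
                "h1": "B", "h2": "B", "h3": "C", "cheater": "C"}
votes = [("c1", "victim"), ("c2", "victim"), ("c3", "victim"),
         ("h1", "cheater"), ("h2", "cheater"), ("c1", "cheater"), ("h3", "cheater")]
print(flag_misbehaved(votes, community_of, min_outside_votes=2))
```

The intuition matches the slide: a colluding group and its victim form one tightly connected community, so their votes are inside edges and do not count toward flagging the victim.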
Performance Evaluation
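Efforts to Cope with Threats (4): Biometric signatures for combating identity theft: fingerprint, voice, keystroke dynamics, mouse movement dynamics.
TAKE HOME MESSAGES: Rule 1: If you think your mom would be offended, then don't post it. Rule 2: Consider the “7 Ps” (Parents, Police, Predators, Professors, Prospective Employers, Peers, and Pals) before posting your own information on the Internet. Rule 3: Ask your pals to follow the rules.
Conclusion: Online social networks are a new Internet ecosystem that cannot be ignored, and they also bring many security risks and concerns. Beyond relying on users' own awareness, proper mechanisms are needed to prevent problems before they occur. For academic researchers: new research topics. For designers of social systems: build privacy considerations into the system design. Question: what else can we do?
Thank you! Sheng-Wei Chen (陳昇瑋), Institute of Information Science, Academia Sinica http://www.iis.sinica.edu.tw/~swc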
Questions

Editor's Notes

  • #9 The conversation map is a representation of social services and conversations.
  • #11 Especially when almost as many people are on social networks as are using email. Can we then play in those spaces?
  • #36 Manual inspection of a sample of 1,000: accuracy 74% (738); the errors are mostly nicknames as well. 78% × 74% = 58% (more than half); 72% × 74% = 53% (more than half); 30% × 74% = 22% (about one fifth).
  • #38 Manual inspection of a sample of 1,000: accuracy 74% (738); the errors are mostly nicknames as well. 78% × 74% = 58% (more than half); 72% × 74% = 53% (more than half); 30% × 74% = 22% (about one fifth).