- Naive anonymization of networks provides poor privacy protection against attacks using external information. Stronger techniques are needed.
- Many approaches transform networks while aiming to preserve utility. Transformations include adding/removing edges, clustering nodes, and random alterations.
- The goal is to obscure identifying features while hopefully preserving overall topology, in order to resist re-identification and protect sensitive relationships.
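Below is a minimal sketch of the simplest transformation listed above, random edge perturbation. It assumes an undirected graph given as an edge list, deletes `k` existing edges, and adds `k` edges that were not present. The function name and parameters are illustrative, and this is not a specific published algorithm. It only shows the kind of change that obscures individual ties while keeping one global property (here, the edge count) fixed.

```python
import random
from itertools import combinations


def perturb_edges(nodes, edges, k, seed=None):
    """Randomly delete k existing edges and insert k new ones.

    Edges are treated as undirected, so (u, v) and (v, u) are the same edge.
    The total edge count is preserved; individual ties are obscured.
    """
    rng = random.Random(seed)
    current = {tuple(sorted(e)) for e in edges}
    non_edges = [p for p in combinations(sorted(nodes), 2) if p not in current]
    if k > len(current) or k > len(non_edges):
        raise ValueError("k exceeds the number of edges or non-edges")
    removed = set(rng.sample(sorted(current), k))
    added = set(rng.sample(non_edges, k))
    return (current - removed) | added


edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
anon = perturb_edges(range(6), edges, k=2, seed=7)
print(len(anon))  # 5: same number of edges, two of them replaced
```

Stronger schemes constrain which edges may be swapped, for example to preserve the degree sequence. This sketch ignores that constraint.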
This document provides an overview of botnets, including their components, structures, operation cycles, and defense capabilities. It discusses how botnets have advanced over time and describes examples like Mirai and APT28. It also examines the business models of cybercriminals who use botnets and characterizes different types of attackers. Finally, it analyzes cooperation within and between criminal organizations, how botnets appear in networks, and their propagation and attack patterns.
This document discusses data manipulation attacks (DMA) against critical infrastructure systems. It provides background on the growing threat of DMAs and examines how black hat hackers can target electric utility systems by manipulating data in Supervisory Control and Data Acquisition (SCADA) systems. Specifically, it proposes targeting the Pepco electric utility in Washington D.C. during morning rush hour in July by entering SCADA systems at substations via unencrypted protocols and manipulating data to cause power outages, disrupting government operations. The document analyzes the tools, techniques, and vulnerabilities that could enable such an attack.
This document provides a briefing on cyberwarfare. It begins with definitions of cyber, warfare, and cyberwarfare. It then discusses three recent cyberwarfare events: 1) Russia attacking Georgia in 2008 through DDoS and hacking, 2) an unknown agency attacking US military networks in 2008 through an infected USB drive, and 3) an unknown attacker (allegedly Israel) targeting Iran's nuclear facilities in 2010 through the Stuxnet worm. It analyzes the impacts and countermeasures for each event. Finally, it concludes with questions around regulating cyber groups and establishing protocols for cyberweapons.
Hacking with Skynet - How AI is Empowering Adversaries (GTKlondike)
There is no question that modern advances in AI and deep learning technologies have allowed organizations to greatly scale their defensive capabilities. Between detecting evolving threats, automating discovery, fighting dynamic attacks, and even freeing up time for IT professionals, AI-fueled automation has been a boon for system defenders. But before we get too comfortable, we need to remember that there is another side to this fight.
In this talk, we'll take a look at how AI technologies are enhancing adversarial capabilities and how challenges in defensive machine learning are opening up new attack surfaces.
This document discusses the history and modern uses of steganography, or hidden writing. It begins by defining steganography and outlining its evolution from ancient times to today. Examples are given of steganography being used for espionage as well as in computer malware. The document then proposes a theoretical application of using online gaming to transmit encrypted data covertly through scripted in-game movements. It considers the potential and limitations of using virtual environments for digital steganography.
The document presents three quotes about unity. Bill Taylor's quote emphasizes that individual players are less important than teamwork in achieving championships. Leone Levi's quote suggests that a single voice is more effective at communicating than multiple voices. Locke's quote defines something imaginary as an unreal creation of fancy.
This document provides an overview of an AWS Cloud School training on Amazon Web Services (AWS). It includes an agenda that outlines five modules covering topics like AWS storage, compute and networking, managed services and databases, and deployment and management. Each module is designed to describe fundamental AWS services and help attendees learn how to use AWS technologies. The document also provides copyright information and contact details for questions.
The document discusses everyday ethnography and proposes 5 topics for its study: 1) nature and the supermarket, 2) new experiences and self-expression, 3) social fun and expression, 4) children's media habits, 5) private life and public space.
This document outlines 8 principles of effective language learning: 1) Starting from what students already know; 2) Developing the home language supports additional language learning and academics; 3) Active learning through peer interaction and purposeful talk aids understanding; 4) Discovery learning allows students to build on prior knowledge; 5) Focusing on both meaning and accuracy is important; 6) The language learning process works best when it is non-threatening and allows building confidence over time; 7) Valuing students' home languages and cultures supports their learning; and 8) Creating a sense of belonging and acceptance in the classroom. The principles emphasize building on students' existing knowledge and skills in a supportive environment.
The document describes different types of indentation that can be applied to a text, including first-line indentation, left indentation, right indentation, indentation on both sides, and hanging indentation. It uses the same example paragraph for each type to illustrate how the text would look with each indentation style.
Penn State RCC has been a CUDA research center for the last year; this talk provides success stories and challenges. GPU case studies are given, including algorithm details and performance results.
Sustainable asphalt developments | Jan Voskuilen (InfraTech 2015) (CROW)
Developments in asphalt follow one another in rapid succession. How do you, as a client, keep track of them? Sustainability is the broad theme of this session. It covers the revised version of CROW publication 210, 'Guideline for handling reclaimed asphalt' (Richtlijn omgaan met vrijkomend asfalt). What does this new publication mean for the client?
In addition, the session addresses important developments that make asphalt an even more sustainable product, such as 100% asphalt recycling, bio-bitumen, and low-temperature asphalt. Finally, you will get answers to the questions: How do I make sure these sustainable products are used? And how do I know whether I am getting the quality I asked for?
Joints: the weakest link in asphalt?
A joint in a material is always the weakest link. In asphalt, the most durable joint is obtained by laying asphalt hot against hot, but that is not always possible. To durably improve the quality of the joints that form when hot asphalt is laid against cold asphalt, various types of joint protectors are applied. Over time, these joint protectors can become too slippery and/or start to glisten. Because this can lead to dangerous situations, RWS held a competition to arrive at improvements. The renewed joint protectors score well, particularly on skid resistance, absence of glistening, water permeability, and durability.
Buyer Persona - Key to B2B online marketing success (ShimonBen)
Buyer personas are indispensable in B2B marketing: they are a tool that helps you become buyer-centric. Learn how to develop the profiles and how to gather the data.
This document discusses technology startups in Turkey across three categories: e-commerce, games, and content. E-commerce startups like Hepsiburada, Markafoni, and Gittigidiyor have helped make Turkey the second fastest growing e-commerce market after India. Game startups such as Peak Games and Monochroma develop and sell games, with the Turkish video game market growing substantially. Content startups including Ekşisözlük, Böbiler, and Onedio create unique content and have millions of monthly visitors. Overall, technology startups have become a significant part of Turkey's growing digital economy.
Help me do it myself!
Developing a child's autonomy necessarily means looking ahead: imagining this child in their future adult life, fulfilled and serene.
For the child, experiencing autonomy is not just a game. It is a rite of passage for growing up, developing a sense of belonging, feeling useful…
We offer practical, concrete solutions to support all children, adolescents, and young adults, whatever their specific needs, toward greater autonomy in everyday life.
This presentation was prepared by N. Sanu for conducting science classes in Kerala by KSSP in connection with the IYC celebrations.
You can share, remix, or adapt this presentation under the conditions of the Creative Commons Attribution licence, with attribution to KSSP.
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ... (Sameera Horawalavithana)
The document discusses research into quantifying the relationship between a graph's properties and its vulnerability to deanonymization attacks. It presents three research questions: 1) How topological properties affect attacks, 2) How node attribute placement affects vulnerability, and 3) How diffusion processes impact vulnerability. The methodology section outlines generating synthetic and real-world graphs, modeling attacks, and measuring success. Key findings include some topological properties like transitivity and assortativity impacting privacy independent of degree distribution. Node attribute diversity increases vulnerability more than attribute homophily. Faster spreading diffusions see higher vulnerability growth. The implications are discussed for data owners and privacy researchers.
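Transitivity and assortativity, two of the properties named in the findings, have standard definitions that can be computed with the Python standard library. The sketch below uses those textbook definitions: transitivity is the fraction of connected triples that are closed, and degree assortativity is the Pearson correlation of the degrees at both ends of an edge. It is not the study's code. In practice NetworkX provides equivalent functions.

```python
import math
from collections import defaultdict
from itertools import combinations


def adjacency(edges):
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    return adj


def transitivity(edges):
    """Fraction of connected triples (paths of length 2) that are closed."""
    adj = adjacency(edges)
    closed = total = 0
    for nbrs in adj.values():
        for a, b in combinations(nbrs, 2):
            total += 1
            if b in adj[a]:
                closed += 1
    return closed / total if total else 0.0


def degree_assortativity(edges):
    """Pearson correlation between the degrees at both ends of each edge
    (every undirected edge is counted in both directions)."""
    adj = adjacency(edges)
    pairs = [(len(adj[u]), len(adj[v])) for u in adj for v in adj[u]]
    if not pairs:
        return float("nan")
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs))
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when all endpoint degrees are equal
    return cov / (sx * sy)


print(transitivity([(0, 1), (1, 2), (0, 2)]))  # 1.0 (triangle)
print(transitivity([(0, 1), (1, 2)]))          # 0.0 (open path)
print(degree_assortativity([(0, 1), (0, 2), (0, 3)]))  # about -1: hub linked only to leaves
```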
This document summarizes an analysis of complex networks using open source software tools. It provides an overview of network graph analysis and statistical and visual measures used to assess network patterns. It then demonstrates these concepts through case studies of the Miles Davis album collaboration network, the Boston Red Sox player network, and a GDELT news event network. The document concludes that network graph analysis is a powerful technique for understanding relationships in connected data.
Fighting Malware with Graph Analytics: An End-to-End Case Study (Priyanka Aash)
1. The document discusses using graph analytics and mining techniques for malware detection. Specifically, it explores using graphs to extract malware dictionaries from DNS traffic in order to detect dictionary-based domain generation algorithms (DGAs).
2. It describes techniques like anomaly detection using egonet analysis and community detection that can help identify malicious behavior in graphs.
3. Open-source tools like NetworkX, python-louvain, and Gephi are recommended for working with and visualizing graph data.
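The egonet analysis mentioned in point 2 can be sketched with an OddBall-style detector. That method is swapped in here as a well-known example, because the summary does not say which method the talk uses. For each node, the sketch counts its neighbours N and the edges E inside its egonet, fits a power law E ≈ C·N^α on a log-log scale, and scores nodes by their distance from the fit, so near-cliques and near-stars stand out. Plain Python is used instead of NetworkX so the sketch has no dependencies.

```python
import math
from collections import defaultdict


def egonet_features(edges):
    """Return {node: (N, E)} where N is the number of neighbours and E is
    the number of edges in the node's egonet (the node, its neighbours,
    and every edge among them)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    feats = {}
    for node, nbrs in adj.items():
        # Each neighbour-to-neighbour edge is seen from both ends, hence // 2.
        among = sum(len(adj[a] & nbrs) for a in nbrs) // 2
        feats[node] = (len(nbrs), len(nbrs) + among)
    return feats


def oddball_scores(feats):
    """Fit log E = log C + alpha * log N by least squares, then score each
    node by its deviation from the fitted power law."""
    pts = [(math.log(n), math.log(e)) for n, e in feats.values()]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("need nodes with at least two distinct degrees")
    alpha = sum((x - mx) * (y - my) for x, y in pts) / sxx
    c = math.exp(my - alpha * mx)
    scores = {}
    for node, (n, e) in feats.items():
        expected = c * n ** alpha
        ratio = max(e, expected) / min(e, expected)
        scores[node] = ratio * math.log(abs(e - expected) + 1)
    return scores


edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
feats = egonet_features(edges)
print(feats[2])  # (3, 4): three neighbours, four edges in the egonet
scores = oddball_scores(feats)
print(max(scores, key=scores.get))  # node furthest from the fitted law
```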
IRJET- A Survey for an Efficient Secure Guarantee in Network Flow (IRJET Journal)
This document summarizes research on providing secure guarantees for network flows. It discusses obfuscation techniques that aim to securely guarantee confidentiality of sensitive data like IP addresses in network traces. The paper identifies threats from incremental release of network flows and proposes applying obfuscation using secure hash algorithms. It formally proves the confidentiality guarantees achieved and evaluates the algorithm experimentally on real network flows. Previous related work on network flow sanitization, fingerprinting, and injection are also reviewed.
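Obfuscation with a secure hash algorithm can be sketched as keyed hashing (HMAC-SHA256) of the IP addresses in each flow. The use of a secret key rather than a bare hash is an assumption of this sketch, and the paper's scheme may differ. Keyed hashing preserves equality, so the same address always maps to the same token. It does not preserve prefix relationships, which prefix-preserving schemes such as Crypto-PAn are designed to keep.

```python
import hashlib
import hmac
import secrets


def make_ip_pseudonymizer(key: bytes):
    """Return a function that maps an IP address string to a keyed-hash token.

    The key matters: an unkeyed hash of an IPv4 address can be reversed by
    hashing all 2**32 possible addresses and comparing.
    """
    def pseudonymize(ip: str) -> str:
        digest = hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()
        return digest[:16]  # truncated to 64 bits for readability
    return pseudonymize


key = secrets.token_bytes(32)  # must stay secret, or tokens can be brute-forced
anon = make_ip_pseudonymizer(key)
flows = [("10.0.0.1", "10.0.0.2", 443), ("10.0.0.1", "192.0.2.7", 53)]
sanitized = [(anon(src), anon(dst), port) for src, dst, port in flows]
print(sanitized)
```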
This document discusses analytics for assessing cybersecurity risks in smart grids. It identifies several risk management practices for smart grids including the NIST supply chain risk management practice, Department of Energy risk management practice, and compliance with technical standards. It also maps the relationships between smart grid domains, actors, interfaces, and vulnerabilities based on NIST guidelines to identify high-risk areas and inform priority actions. Finally, it shows how risk identification and assessment can be conducted based on analyzing security objectives, impact levels, and relationships between smart grid components defined in NIST guidelines.
This talk combines two stories about the analysis of data associated with diseases. In the first, we introduce community detection in networks and use network representations of genetic virulence factor similarities between different uropathogenic E. coli strains to identify communities of these strains that are more similar to each other than to the rest of the studied population. We then discuss the clinical differences between these E. coli communities. In the second story, we investigate metabolomic data obtained from stool samples of hospitalized patients. We employ a variety of methods for handling this sparse data to generate a new classifier for the presence of C. difficile in the samples. Working closely with our clinical collaborators, we then obtain a wholly new and surprisingly simple and accurate measurement for detecting the presence of active C. difficile infections.
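Community detection on a strain-similarity network can be sketched in two steps. First, connect strains whose virulence-factor sets are similar (Jaccard similarity at or above a threshold). Then group the connected strains. The talk does not specify its algorithm, so the sketch below swaps in a simple deterministic label propagation. The strain names, factor labels, and threshold are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def similarity_graph(profiles, threshold):
    """Link two strains when the Jaccard similarity of their
    virulence-factor sets is at least `threshold`."""
    adj = {name: set() for name in profiles}
    for a, b in combinations(sorted(profiles), 2):
        union = profiles[a] | profiles[b]
        if union and len(profiles[a] & profiles[b]) / len(union) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def label_propagation(adj, max_iter=100):
    """Each node repeatedly adopts the most common label among its
    neighbours (keeping its own label on a tie if it is among the leaders,
    else the smallest tied label) until no label changes."""
    labels = {node: node for node in adj}
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):
            if not adj[node]:
                continue  # isolated node keeps its own label
            counts = Counter(labels[n] for n in adj[node])
            top = max(counts.values())
            leaders = [lab for lab, c in counts.items() if c == top]
            new = labels[node] if labels[node] in leaders else min(leaders)
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    groups = defaultdict(list)
    for node, lab in labels.items():
        groups[lab].append(node)
    return sorted(sorted(g) for g in groups.values())


profiles = {  # hypothetical strains and virulence-factor labels
    "A": {"f1", "f2", "f3"},
    "B": {"f1", "f2"},
    "C": {"f1", "f2", "f3", "f4"},
    "D": {"f7", "f8"},
    "E": {"f7", "f8", "f9"},
}
communities = label_propagation(similarity_graph(profiles, threshold=0.5))
print(communities)  # [['A', 'B', 'C'], ['D', 'E']]
```

Modularity-based methods such as Louvain are a common alternative when communities are less clearly separated than in this toy example.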
A Study of Usability-aware Network Trace Anonymization (Kato Mivule)
This document summarizes research on anonymizing network trace data while maintaining usability. It discusses challenges in applying traditional anonymization techniques to network traces due to their unique structure. The paper proposes heuristics for usability-aware anonymization that apply microdata privacy techniques separately to different network trace attributes. Preliminary results suggest the potential to generate anonymized traces with improved usability through trade-offs determined on a case-by-case basis. The document also reviews related work on network trace anonymization and attacks against anonymized data.
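The idea of applying a separate microdata technique to each trace attribute can be illustrated as follows. The technique chosen for each field is an assumption made for illustration: prefix truncation for IPs, IANA port-range generalization for ports, bounded random noise for timestamps, and byte counts left intact. The paper determines such trade-offs case by case.

```python
import ipaddress
import random


def generalize_ip(ip: str, prefix: int = 24) -> str:
    """Keep only the network part of an IPv4 address (e.g. its /24 prefix)."""
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False).network_address)


def generalize_port(port: int) -> str:
    """Replace an exact port with its IANA range (RFC 6335)."""
    if port < 1024:
        return "system"
    if port < 49152:
        return "user"
    return "dynamic"


def anonymize_record(rec, rng, time_noise=0.5):
    """Apply a different technique to each attribute of one trace record."""
    return {
        "src": generalize_ip(rec["src"]),
        "dst": generalize_ip(rec["dst"]),
        "dport": generalize_port(rec["dport"]),
        "ts": rec["ts"] + rng.uniform(-time_noise, time_noise),  # bounded noise
        "bytes": rec["bytes"],  # kept exact so traffic-volume analyses stay usable
    }


rng = random.Random(0)
record = {"src": "192.168.1.77", "dst": "203.0.113.9", "dport": 443,
          "ts": 1000.0, "bytes": 5120}
out = anonymize_record(record, rng)
print(out["src"], out["dport"])  # 192.168.1.0 system
```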
This document provides an introduction to social network analysis. It discusses how social network analysis views social relationships as connections between individuals, and uses tools to systematically study these connections. The key topics covered include:
- Why social networks are important to study as they influence information and resource sharing
- The basic data elements in social network analysis, including nodes to represent individuals and edges to represent relationships between nodes
- Different levels of network data, from ego networks to complete networks
- Common ways to represent network data structurally, including graphs, matrices, and lists
- An overview of how social network analysis can help answer questions about how social relationships influence individual behaviors and the structure of social hierarchies.
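The three structural representations listed above hold the same information. The following minimal sketch converts an undirected edge list into an adjacency list and an adjacency matrix. The respondent names are hypothetical.

```python
def to_adjacency_list(nodes, edges):
    """Map each node to the list of nodes it is tied to (undirected)."""
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def to_adjacency_matrix(nodes, edges):
    """Square 0/1 matrix; row and column order follow `nodes`."""
    index = {n: i for i, n in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
    return matrix


nodes = ["Ann", "Bob", "Cai"]            # hypothetical respondents
edges = [("Ann", "Bob"), ("Bob", "Cai")]  # the edge list itself
print(to_adjacency_list(nodes, edges))    # {'Ann': ['Bob'], 'Bob': ['Ann', 'Cai'], 'Cai': ['Bob']}
print(to_adjacency_matrix(nodes, edges))  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

Edge and adjacency lists stay compact for sparse social networks. A matrix uses space proportional to the square of the number of nodes but makes tie lookups and matrix-based measures direct.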
DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ... (IJNSA Journal)
The document presents a framework called SoNMine that identifies key players in the 9/11 covert network using node behavioral profiles. It generates profiles by analyzing node behaviors based on path types extracted from the network's multi-relational structure. The framework identifies outlier nodes with dense connections or high communication as influential players. It also determines dominant features that help classify normal and outlier nodes more accurately.
A Study on Genetic-Fuzzy Based Automatic Intrusion Detection on Network Datasets (Drjabez)
1. The document proposes a genetic-fuzzy based method for automatic intrusion detection using network datasets. It combines fuzzy set theory with genetic algorithms to extract rules for both discrete and continuous attributes to detect normal and intrusion patterns.
2. The method was tested on KDD99 Cup and DARPA98 network intrusion detection datasets and showed high detection rates with low false alarm rates for both misuse detection and anomaly detection.
3. By extracting many rules to represent normal network behavior patterns, the proposed genetic-fuzzy approach can detect new or unknown intrusions based on anomalies without requiring prior domain expertise on intrusion patterns.
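The fuzzy half of this approach can be sketched as follows. Continuous attributes are matched through triangular membership functions, discrete attributes must match exactly, and a rule's match degree is the minimum over its conditions. A genetic algorithm would then evolve rules to maximize a fitness function. The GA loop is omitted here. The fitness shown (mean match on attacks minus mean match on normal records), the rule, and the attribute values are illustrative assumptions, not the paper's definitions. The attribute names follow the KDD Cup 99 feature names.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def rule_degree(rule, record):
    """Degree to which a record matches a rule: discrete conditions must
    match exactly; continuous ones are combined with fuzzy AND (min)."""
    for attr, value in rule.get("discrete", {}).items():
        if record[attr] != value:
            return 0.0
    degrees = [tri(record[attr], *abc) for attr, abc in rule.get("fuzzy", {}).items()]
    return min(degrees, default=1.0)


def fitness(rule, records):
    """Example fitness a GA could maximize: detection minus false alarms."""
    attack = [rule_degree(rule, r) for r in records if r["label"] == "attack"]
    normal = [rule_degree(rule, r) for r in records if r["label"] == "normal"]
    return sum(attack) / max(len(attack), 1) - sum(normal) / max(len(normal), 1)


rule = {  # hypothetical rule: "ICMP traffic with src_bytes around 1000 is an attack"
    "discrete": {"protocol_type": "icmp"},
    "fuzzy": {"src_bytes": (500, 1000, 1500)},
}
records = [
    {"protocol_type": "icmp", "src_bytes": 1000, "label": "attack"},
    {"protocol_type": "tcp", "src_bytes": 1000, "label": "normal"},
    {"protocol_type": "icmp", "src_bytes": 200, "label": "normal"},
]
print(fitness(rule, records))  # 1.0
```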
The document discusses using anonymous identity assignment (AIDA) to preserve privacy and security when sharing data across distributed systems. AIDA allows nodes in a network to be assigned anonymous IDs through a distributed computation, so their real identities are unknown to other nodes. This allows sensitive data to be shared anonymously. The document outlines existing approaches like using a trusted third party and secure multi-party computation, and their limitations. It then proposes using AIDA and secure computations to allow private queries and sharing of complex data while preserving anonymity of nodes. Experimental results show the approach can anonymously store and retrieve patient medical records.
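The core idea of anonymous ID assignment can be shown with a single-process simulation. Each node picks a random slot, and a secure sum of one-hot vectors reveals only how many nodes chose each slot. Nodes alone in a slot keep it, and colliding nodes retry. The secure sum is simulated here with additive secret shares. The modulus, slot count, and retry rule are assumptions. This is a teaching sketch, not a secure implementation or the document's exact protocol.

```python
import random

MOD = 2**61 - 1  # modulus for the additive shares (illustrative choice)


def secure_sum(vectors, rng):
    """Simulated secure sum: each party splits its vector into one random
    additive share per party; only the element-wise total is revealed."""
    n, length = len(vectors), len(vectors[0])
    received = [[0] * length for _ in range(n)]  # what party j accumulates
    for vec in vectors:
        shares = [[rng.randrange(MOD) for _ in range(length)] for _ in range(n - 1)]
        shares.append([(v - sum(s[k] for s in shares)) % MOD for k, v in enumerate(vec)])
        for j in range(n):
            for k in range(length):
                received[j][k] = (received[j][k] + shares[j][k]) % MOD
    # Each party publishes only its partial sum; adding them gives the total.
    return [sum(received[j][k] for j in range(n)) % MOD for k in range(length)]


def assign_anonymous_ids(num_nodes, num_slots, seed=None):
    """Repeat random slot selection until every node owns a unique slot,
    then give each node the rank of its slot (1..num_nodes) as its ID."""
    if num_slots < num_nodes:
        raise ValueError("need at least as many slots as nodes")
    rng = random.Random(seed)
    taken = {}  # node -> slot (known only to that node in a real protocol)
    while len(taken) < num_nodes:
        free = [s for s in range(num_slots) if s not in taken.values()]
        choice = {i: rng.choice(free) for i in range(num_nodes) if i not in taken}
        vectors = [[1 if choice.get(i) == s else 0 for s in range(num_slots)]
                   for i in range(num_nodes)]  # already-assigned nodes send zeros
        counts = secure_sum(vectors, rng)
        for i, s in choice.items():
            if counts[s] == 1:  # nobody else picked this slot
                taken[i] = s
    occupied = sorted(taken.values())
    return {i: occupied.index(s) + 1 for i, s in taken.items()}


ids = assign_anonymous_ids(5, 12, seed=1)
print(sorted(ids.values()))  # [1, 2, 3, 4, 5]
```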
Network Data Collection
The document discusses collecting social network data. It covers three main topics:
1) Introduces social network analysis and why networks are important in social science. Networks matter because of connections that allow diffusion and because positions in networks influence roles and behavior.
2) Discusses research design considerations for collecting network data, including specifying relations of interest based on theoretical mechanisms, boundary selection, and sampling approaches.
3) Addresses accuracy of network survey data and how to handle inaccurate or missing data. The goal is to systematically understand connections between actors using empirical network data and analysis methods.
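Snowball sampling is one common sampling approach for network data. This summary does not say whether the document covers it specifically. The minimal sketch below assumes each respondent's named contacts are available as an adjacency dictionary.

```python
def snowball_sample(contacts, seeds, waves):
    """Start from seed respondents; in each wave, add everyone named by the
    people reached in the previous wave. Returns the set of people sampled."""
    sampled = set(seeds)
    frontier = list(seeds)
    for _ in range(waves):
        next_frontier = []
        for person in frontier:
            for contact in contacts.get(person, ()):
                if contact not in sampled:
                    sampled.add(contact)
                    next_frontier.append(contact)
        frontier = next_frontier
    return sampled


# hypothetical name-generator answers
contacts = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(sorted(snowball_sample(contacts, ["a"], waves=2)))  # ['a', 'b', 'c']
```

The number of waves is the boundary decision: fewer waves keep the sample local to the seeds, more waves approach the complete network.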
This document proposes SMURFEN, a system framework for collaborative intrusion detection through rule sharing. SMURFEN uses a peer-to-peer network architecture to allow nodes to share detection knowledge. It employs a two-level game propagation model to efficiently disseminate rules while ensuring fairness, scalability and robustness. Evaluation of the model shows it can propagate rules efficiently across large networks while maintaining fairness and resisting common insider attacks.
This document discusses large scale threats to data anonymity from re-identification attacks on a variety of datasets including movie ratings, product reviews, search logs, social networks, and genomic data. It outlines several types of attacks such as exact matching, fuzzy matching with and without noise, and matching based on auxiliary information from multiple databases. These attacks can re-identify individuals from anonymized data in many contexts by exploiting common attributes, relationships, behaviors, and other clues. The threats are significant and privacy is very difficult to guarantee for rich, high-dimensional datasets.
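A fuzzy-matching attack with auxiliary information can be sketched as a scoring rule loosely modeled on Narayanan and Shmatikov's de-anonymization of sparse datasets. Rare attribute values count more, and the best-scoring record is reported only if it stands out from the runner-up by a margin measured in standard deviations ("eccentricity"). The weighting `1/log(1 + support)` and the threshold 1.5 are assumptions of this sketch.

```python
import math
from statistics import pstdev


def reidentify(aux, records, threshold=1.5):
    """Match auxiliary knowledge `aux` (a set of attribute values known about
    a target) against anonymized `records` ({record_id: set_of_values}).
    Rare values weigh more. Returns the best record id only if its score
    beats the runner-up by `threshold` standard deviations, else None."""
    support = {}
    for values in records.values():
        for v in values:
            support[v] = support.get(v, 0) + 1
    scores = {
        rid: sum(1 / math.log(1 + support[v]) for v in aux & values)
        for rid, values in records.items()
    }
    ranked = sorted(scores.values(), reverse=True)
    if len(ranked) < 2:
        return None
    sigma = pstdev(ranked)
    if sigma == 0 or (ranked[0] - ranked[1]) / sigma < threshold:
        return None
    return max(scores, key=scores.get)


records = {  # hypothetical "anonymized" rows: sets of rated items
    "r1": {"a", "b", "c"},
    "r2": {"a", "d"},
    "r3": {"a", "e", "f"},
}
print(reidentify({"b", "c"}, records))  # r1: two rare items single it out
print(reidentify({"a"}, records))       # None: a common item fits every row
```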
This document discusses considerations for collecting social network data through surveys. It addresses research design elements like defining the relevant population boundaries and sampling approaches. For surveys specifically, it covers informed consent, name generator questions to identify social ties, response formats, and balancing depth of network detail collected versus sample size. The key challenges are defining the theoretical population of interest, collecting a sufficiently large and representative network sample, and designing survey questions that accurately capture social ties within time and resource constraints.
This document discusses considerations for collecting social network data through surveys. It addresses research design elements like defining the boundaries of the relevant population, sampling approaches for collecting local, global or complete network data, and sources of network data including surveys, archives, and secondary data sources. The document also provides guidance on survey elements like name generators, response formats, and balancing breadth versus depth of network data collection given time constraints of surveys.
Network science is an interdisciplinary field that studies complex networks. It draws on theories from mathematics, physics, computer science, statistics, and sociology. The document provides an introduction to network science and outlines topics including network analysis, visualization, and business applications. It also summarizes the history and development of network science as an academic field.
Finding Key Influencers and Viral Topics in Twitter Networks Related to ISIS,... (Steve Kramer)
Paragon Science used a combination of network analysis, community detection, topic detection, sentiment analysis, and anomaly detection methods to find key influencers and viral topics in three recent Twitter data sets: one of 7.9 M tweets regarding ISIS, a second of more than 117 M tweets about the 2016 primary elections, and a third of 7 M tweets related to Brexit.
Paragon Science's patented dynamic anomaly detection technology is based on methods drawn from dynamical systems and chaos theory. In particular, we can calculate finite-time Lyapunov exponents from any time-dependent data stream to find the clusters of entities that are behaving most chaotically compared to the rest of the data set. Because we do not have to specify normal vs. abnormal behavior in advance, no machine learning per se is required. In a robust fashion that is tolerant of missing or erroneous data, we can identify the "unknown unknowns" that can represent threats to be mitigate or opportunities to be seized. To date, our technique has been applied successfully to a broad range of industry verticals, including healthcare data (Advisory Board Company), web user behavior data (Vast), mobile phone data (Place IQ), vehicle pricing analytics (Digital Motorworks/CDK Global), online coupon data (RetailMeNot), email monitoring for patent law cases, and social media monitoring.
Privacy-Aware Data Management in Information Networks - SIGMOD 2011 Tutorial
1. Privacy-Aware Data Management in Information Networks
Michael Hay, Cornell University
Kun Liu, Yahoo! Labs
Gerome Miklau, Univ. of Massachusetts Amherst
Jian Pei, Simon Fraser University
Evimaria Terzi, Boston University
SIGMOD 2011 Tutorial
2. Romantic connections in a high school
Bearman, et al.
The structure of adolescent romantic and sexual networks.
American Journal of Sociology, 2004. (Image drawn by Newman.)
3. Sexual and injecting drug partners
[Figure 4: (A) Graph of the largest component in gang associated STD outbreak, Colorado Springs, 1989–91 (n = 410). (B) Core of the largest component in gang associated STD outbreak, Colorado Springs, 1989–91 (n = 107).]
Potterat, et al.
Risk network structure in the early epidemic phase of HIV transmission in Colorado Springs.
Sexually Transmitted Infections, 2002.
6. Privately managing enterprise network data
• Data: Enterprise collects data or observes interactions of individuals.
• Control: Enterprise controls dissemination of information.
• Goal: permit analysis of aggregate properties; protect facts about individuals.
• Challenges: privacy for networked data, complex utility goals.
Personal Privacy in Online Social Networks
• Data: Individuals contribute their data through participation in OSNs.
• Control: Individuals control their connections, interactions, visibility.
• Goal: reliable and transparent sharing of information.
• Challenges: system complexity, leaks through inference, unskilled users.
7. Outline of tutorial
• Privately Managing Enterprise Network Data (60 minutes)
• Goals, Threats, and Attacks
• Releasing transformed networks (anonymity)
• Releasing network statistics (differential privacy)
• Personal Privacy in Online Social Networks (30 minutes)
• Understanding privacy risk
• Managing privacy controls
8. Data model
Nodes
ID     Age  HIV
Alice  25   Pos
Bob    19   Neg
Carol  34   Pos
Dave   45   Pos
Ed     32   Neg
Fred   28   Neg
Greg   54   Pos
Harry  49   Neg
Edges
ID1    ID2
Alice  Bob
Bob    Carol
Bob    Dave
Bob    Ed
Dave   Ed
Dave   Fred
Dave   Greg
Ed     Greg
Ed     Harry
Fred   Greg
Greg   Harry
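The node and edge tables can be loaded into an adjacency map, from which each node's degree follows. Later slides release or protect exactly these degrees. The following is a minimal Python sketch, and the names `NODES`, `EDGES`, `adjacency`, and `degrees` are illustrative rather than taken from the tutorial:

```python
# Node table: ID -> (age, HIV status), copied from the data model.
NODES = {
    "Alice": (25, "Pos"), "Bob": (19, "Neg"), "Carol": (34, "Pos"),
    "Dave": (45, "Pos"), "Ed": (32, "Neg"), "Fred": (28, "Neg"),
    "Greg": (54, "Pos"), "Harry": (49, "Neg"),
}

# Edge table: undirected relationships between pairs of IDs.
EDGES = [
    ("Alice", "Bob"), ("Bob", "Carol"), ("Bob", "Dave"), ("Bob", "Ed"),
    ("Dave", "Ed"), ("Dave", "Fred"), ("Dave", "Greg"), ("Ed", "Greg"),
    ("Ed", "Harry"), ("Fred", "Greg"), ("Greg", "Harry"),
]


def adjacency(nodes, edges):
    """Build an undirected adjacency map: node -> set of neighbors."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def degrees(adj):
    """Degree of each node, i.e. its number of neighbors."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


if __name__ == "__main__":
    print(degrees(adjacency(NODES, EDGES)))
```

Running the script prints each person's degree. Sorted, these degrees form the sequence [1,1,2,2,4,4,4,4] that appears later in the tutorial.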
9. Sensitive information in networks
• Disclosing attributes
• Disclosing edges
• Disclosing properties
• node degree, clustering, etc.
• properties of neighbors (e.g., mostly friends with Republicans)
10. Goals in analyzing networks
Example analyses:
• Properties of the degree distribution
• Motif analysis
• Community structure
• Processes on networks: routing, rumors, infection
• Resiliency / robustness
• Homophily
• Correlation / causation
Can we permit analysts to study networks without revealing sensitive information about participants?
11. Naive anonymization
Naive anonymization is a transformation of the network in which identifiers are replaced with random numbers.
[Figure: the data owner applies naive anonymization to the original network and releases the naively anonymized network to the analyst. Mapping used in the example: Alice → 6, Bob → 8, Carol → 5, Dave → 7, Ed → 2, Fred → 3, Greg → 4, Harry → 1.]
Good utility: output is isomorphic to the original network.
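The Python sketch below shows naive anonymization under one assumption: identifiers are replaced by a uniformly random permutation of 1..n. The function name `naive_anonymize` and its `seed` parameter are hypothetical. Only the labels change, so the released edge list is isomorphic to the original. This preserves utility, and it also leaves the structure available to identify people.

```python
import random

# Running example: the eight-person network from the data model.
NODES = ["Alice", "Bob", "Carol", "Dave", "Ed", "Fred", "Greg", "Harry"]
EDGES = [
    ("Alice", "Bob"), ("Bob", "Carol"), ("Bob", "Dave"), ("Bob", "Ed"),
    ("Dave", "Ed"), ("Dave", "Fred"), ("Dave", "Greg"), ("Ed", "Greg"),
    ("Ed", "Harry"), ("Fred", "Greg"), ("Greg", "Harry"),
]


def naive_anonymize(nodes, edges, seed=None):
    """Replace each identifier with a random number in 1..n.

    Returns (mapping, anonymized_edges). The data owner keeps the mapping
    secret; only the anonymized edges are released.
    """
    rng = random.Random(seed)
    labels = list(range(1, len(nodes) + 1))
    rng.shuffle(labels)  # random bijection between identifiers and 1..n
    mapping = dict(zip(nodes, labels))
    anon_edges = [(mapping[u], mapping[v]) for u, v in edges]
    return mapping, anon_edges


if __name__ == "__main__":
    mapping, anon_edges = naive_anonymize(NODES, EDGES, seed=1)
    print(anon_edges)
```

Inverting the mapping recovers the original edge set exactly. That exact recovery is the isomorphism the slide refers to.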
12. Protection under naive anonymization
• Two primary threats:
• Node re-identification: adversary is able to deduce that node x in the naively anonymized network corresponds to an identified individual Alice in the hidden network.
• Edge disclosure: adversary is able to deduce that two identified individuals Alice and Bob are connected in the hidden network.
• With no external information: good protection
• Who is Alice? One of {1,2,3,4,5,6,7,8}
• Alice and Bob connected? 11/28 likelihood (11 edges among the 28 possible pairs of 8 nodes)
13. Adversaries with external information
External information: facts about identified individuals and their relationships in the hidden network.
• Structural knowledge
• often assumed limited to small radius around node
• “Alice has degree 2” or “Bob has two connected neighbors”
• Information can be precise or approximate
• External information may be acquired from a specific attack, or we may assume a category of knowledge as a bound on adversary capabilities.
14. Matching attacks
Matching attack: the adversary matches external information to a naively anonymized network.
[Figure: external information about Alice and Bob is matched against a naively anonymized version of the Colorado Springs network (Potterat, et al., Figure 4), leading to unique or partial node re-identification.]
15. Attacks on naively anonymized networks
• Success of a matching attack depends on:
• descriptiveness of external information
• structural diversity in the network
• With external information: weaker protection
• Who is Alice? One of {1,2,3,4,5,6,7,8}
• Who is Alice, if her degree is known to be 4? One of {2,4,7,8}
• Alice and Bob connected?
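The candidate sets above can be reproduced with a small sketch. It hard-codes the example mapping from the naive-anonymization slide and assumes that the adversary's only external information is the target's degree. The names `candidates` and `edge_prior` are illustrative:

```python
from fractions import Fraction

# Released, naively anonymized network: the example mapping (Alice->6, Bob->8,
# Carol->5, Dave->7, Ed->2, Fred->3, Greg->4, Harry->1) applied to the edges.
ANON_EDGES = [
    (6, 8), (8, 5), (8, 7), (8, 2), (7, 2), (7, 3),
    (7, 4), (2, 4), (2, 1), (3, 4), (4, 1),
]


def released_degrees(anon_edges):
    """Degree of each anonymous node (assumes no isolated nodes)."""
    deg = {}
    for u, v in anon_edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def candidates(anon_edges, known_degree=None):
    """Anonymous nodes consistent with the adversary's external information.

    With no external information every node is a candidate; if the
    target's degree is known, only nodes with that degree remain.
    """
    deg = released_degrees(anon_edges)
    if known_degree is None:
        return set(deg)
    return {v for v, d in deg.items() if d == known_degree}


def edge_prior(anon_edges):
    """Chance that two named people are linked, knowing only the edge count."""
    n = len(released_degrees(anon_edges))
    return Fraction(len(anon_edges), n * (n - 1) // 2)


if __name__ == "__main__":
    print(candidates(ANON_EDGES, known_degree=4), edge_prior(ANON_EDGES))
```

Knowing the degree shrinks the candidate set from all 8 nodes to the 4 nodes of degree 4. Richer external information, such as neighbors' degrees, can narrow the set further.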
17. Active attack on an online network
• Goal: disclose edge between two targeted individuals.
• Assumption: adversary can alter the network structure, by creating
nodes and edges, prior to naive anonymization.
• In blogging network: create new blogs and links to other blogs.
• In email network: create new identities, send mail to identities.
• (Harder to carry out this attack in a physical network)
[Backstrom, WWW 07]
18. Active attack on an online network
(1) Attacker creates a distinctive subgraph of nodes and edges.
(2) Attacker links subgraph to target nodes in the network.
(Naive anonymization)
(3) Attacker finds matches for pattern in naively anonymized network.
(4) Attacker re-identifies targets and discloses structural properties.
[Figure: the attacker's subgraph attached to targets Alice and Bob, before and after naive anonymization.]
[Backstrom, WWW 07]
19. Results of active attack
• Given a network G with n nodes, it is possible to construct a
pattern subgraph with k = O(log(n)) nodes that will be unique in G
with high probability.
• injected subgraph is chosen uniformly at random.
• the number of subgraphs of size k that appear in G is small
relative to the number of all possible subgraphs of size k.
• The pattern subgraph can be efficiently found in the released network, and can be linked to as many as O(log²(n)) target nodes.
• In the 4.4-million-node LiveJournal friendship network, the attack succeeds w.h.p. with 7 pattern nodes.
[Backstrom, WWW 07]
20. Auxiliary network attack
• Goal: re-identify individuals in a naively anonymized target network
• Assumptions:
• An un-anonymized auxiliary network exists, with overlapping
membership.
• There is a set of seed nodes present in both networks, for which
the mapping between target and auxiliary is known.
• Starting from seeds, mapping is extended greedily.
• Using Twitter (target) and Flickr (auxiliary), true overlap of ~30000
individuals, 150 seeds, 31% re-identified correctly, 12% incorrectly.
[Narayanan, OAKL 09]
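The seed-and-extend idea can be illustrated with a deliberately simplified greedy propagation. This sketch is not the algorithm of [Narayanan, OAKL 09], which uses additional scoring and checks. At each step it maps only the unmapped pair whose already-mapped neighbors agree the most. The target network is the eight-person example under the earlier mapping, and the auxiliary network is the identified original. The name `propagate` is hypothetical:

```python
# Auxiliary network (identified) and the anonymization mapping used earlier.
EDGES = [
    ("Alice", "Bob"), ("Bob", "Carol"), ("Bob", "Dave"), ("Bob", "Ed"),
    ("Dave", "Ed"), ("Dave", "Fred"), ("Dave", "Greg"), ("Ed", "Greg"),
    ("Ed", "Harry"), ("Fred", "Greg"), ("Greg", "Harry"),
]
ANON = {"Alice": 6, "Bob": 8, "Carol": 5, "Dave": 7,
        "Ed": 2, "Fred": 3, "Greg": 4, "Harry": 1}


def to_adj(edges):
    """Undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def propagate(target, aux, seeds):
    """Greedily extend a seed mapping from target nodes to auxiliary nodes.

    Each round maps the single (target, aux) pair whose already-mapped
    neighbors agree the most; stops when no pair has any agreement.
    """
    mapping = dict(seeds)
    while True:
        used = set(mapping.values())
        best = None  # (score, target_node, aux_node)
        for t, t_nbrs in target.items():
            if t in mapping:
                continue
            # Images of t's already-mapped neighbors in the auxiliary network.
            images = {mapping[u] for u in t_nbrs if u in mapping}
            for a, a_nbrs in aux.items():
                if a in used:
                    continue
                score = len(images & a_nbrs)
                if score > 0 and (best is None or score > best[0]):
                    best = (score, t, a)
        if best is None:
            return mapping
        _, t, a = best
        mapping[t] = a


if __name__ == "__main__":
    target = to_adj([(ANON[u], ANON[v]) for u, v in EDGES])
    print(propagate(target, to_adj(EDGES), {8: "Bob", 3: "Fred"}))
```

With Bob and Fred as seeds, the sketch recovers Dave, Ed, Greg, and Harry. Alice and Carol are both leaves attached to Bob, so structure alone cannot tell them apart, and either assignment can result. Ambiguities of this kind are one way a propagation attack can produce incorrect re-identifications.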
21. Summary
• Naive anonymization may be good for utility...
• ... but it is not sufficient for protecting sensitive information in
networks.
• an individual’s connections in the network can be highly
identifying.
• external information may be available to adversary from outside
sources or from specific attacks.
• Conclusion: stronger protection mechanisms are required.
22. Questions & challenges
• What is the correct model for adversary external information?
• How do attributes and structural properties combine to increase
identifiability and worsen attacks?
• Are there additional attacks on naive anonymization (or other forms
of anonymization)?
Next: How can we strengthen the protection offered by a released network while preserving utility?
23. Outline of tutorial
• Privately Managing Enterprise Network Data
• Goals, Threats, and Attacks
• Releasing transformed networks (anonymity)
• Releasing network statistics (differential privacy)
• Personal Privacy in Online Social Networks
• Understanding privacy risk
• Managing privacy controls
24. Releasing data vs. statistics
• Releasing transformed networks
• To prevent adversary attack, release a transformed network:
• transformations obscure identifying node features
• while hopefully preserving global topology
• Releasing “safe” network statistics
[Figure: the analyst poses query Q against the data owner's network and receives a safe answer.]
25. Transform for degree anonymity
• A graph G(V, E) is k-degree anonymous if every node in V has the same degree as at least k-1 other nodes in V.
[Figure: a 1-degree anonymous graph is transformed by edge insertion into a 2-degree anonymous graph.]
[Liu, SIGMOD 08]
26. Algorithm for degree anonymization
• Problem: Given a graph G(V, E) and integer k, find a minimal set of edges E’ such that graph G(V, E ∪ E’) is k-degree anonymous.
• Approach: Use dynamic programming to find the minimum change to the degree sequence.
• Challenge: it may not be possible to realize the degree sequence through edge additions.
• Example: V = {a, b, c}, E = {(b,c)}. Degree sequence is [0,1,1]. The minimum change yields [1,1,1], but this is not realizable (without self-loops), since its degree sum is odd.
• Algorithm: draws on ideas from graph theory to construct a graph with minimum, or near-minimum, edge insertions.
[Liu, SIGMOD 08]
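The degree-sequence step can be sketched as a dynamic program. The sketch assumes, following the k-degree anonymity formulation, that degrees may only increase and that an optimal solution groups nodes that are consecutive in sorted degree order. It covers only the sequence step, not the construction of a graph, and its function names are hypothetical:

```python
from collections import Counter


def is_k_degree_anonymous(degrees, k):
    """True if every degree value is shared by at least k nodes."""
    return all(count >= k for count in Counter(degrees).values())


def anonymize_degree_sequence(degrees, k):
    """Minimum total degree increase making the sequence k-degree anonymous.

    Dynamic program over degrees sorted in non-increasing order: split them
    into consecutive groups of size >= k and raise every degree in a group
    to the group's largest degree. Realizing the result with edge
    insertions is a separate step and may not always be possible.
    """
    n = len(degrees)
    if n < k:
        raise ValueError("need at least k nodes")
    order = sorted(range(n), key=lambda v: -degrees[v])
    d = [degrees[v] for v in order]  # non-increasing
    prefix = [0]
    for x in d:
        prefix.append(prefix[-1] + x)

    def cost(i, j):  # raise d[i..j] (inclusive) up to d[i]
        return d[i] * (j - i + 1) - (prefix[j + 1] - prefix[i])

    inf = float("inf")
    best = [inf] * (n + 1)  # best[i]: min cost for the first i entries
    cut = [0] * (n + 1)     # start of the last group in that solution
    best[0] = 0
    for i in range(k, n + 1):
        for t in range(0, i - k + 1):  # last group is d[t..i-1], size >= k
            if best[t] < inf and best[t] + cost(t, i - 1) < best[i]:
                best[i] = best[t] + cost(t, i - 1)
                cut[i] = t

    target = [0] * n
    i = n
    while i > 0:  # walk the groups back from the end
        t = cut[i]
        target[t:i] = [d[t]] * (i - t)
        i = t
    result = [0] * n
    for pos, v in enumerate(order):  # undo the sort
        result[v] = target[pos]
    return result
```

On the slide's example, [0,1,1] with k = 2 becomes [1,1,1]. The checker accepts that sequence, but as noted it cannot be realized by edge additions. On the eight-person network ([1,4,1,4,4,2,4,2]) with k = 3, the two degree-1 nodes, Alice and Carol, are raised to degree 2.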
27. A common problem formulation
• Degree anonymization is an instance of a more general paradigm. Many proposed approaches follow this paradigm.
Given input graph G,
• Consider a set of graphs 𝒢, where each G* ∈ 𝒢 is reachable from G by certain graph transformations
• Find G* ∈ 𝒢 such that G* satisfies privacy(G*, ...), and
• minimizes distortion(G, G*)
29. Kinds of transformations
• Transformations considered in the literature can be classified into three categories:
• Directed alteration
• Generalization
• Random alteration
30. Directed alteration
• Transform network by adding (or removing) edges
• [Liu, SIGMOD 08] insert edges to achieve degree anonymity
• [Zhou, ICDE 08] neighborhood anonymity, labels on nodes
• [Zou, PVLDB 09] complete anonymity (k isomorphic subgraphs)
• [Cheng, SIGMOD 10] complete anonymity and bounds on edge disclosure
31. Generalization
• Transform network by clustering nodes into groups
• [Cormode, PVLDB 08] attribute-based attacks (graph structure unmodified) on bipartite graphs; prevents edge disclosure
• [Cormode, PVLDB 09] similar to above but for arbitrary interaction graphs (attributes on nodes and edges)
• [Hay, PVLDB 08, VLDBJ 10] summarize graph topology in terms of node groups; anonymity against arbitrary structural knowledge
[Figure: a network and its generalized (grouped) version.]
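A generalized release can be illustrated by collapsing each group of nodes into a supernode. Only the group sizes and the edge counts within and between groups are published, in the spirit of [Hay, PVLDB 08]. The partition below is arbitrary and chosen only for illustration, whereas Hay et al. choose the partition to guarantee anonymity. The name `summarize` is hypothetical:

```python
from collections import Counter

EDGES = [
    ("Alice", "Bob"), ("Bob", "Carol"), ("Bob", "Dave"), ("Bob", "Ed"),
    ("Dave", "Ed"), ("Dave", "Fred"), ("Dave", "Greg"), ("Ed", "Greg"),
    ("Ed", "Harry"), ("Fred", "Greg"), ("Greg", "Harry"),
]

# Illustrative partition into three groups (not anonymity-optimized).
GROUP_OF = {
    "Alice": 0, "Bob": 0, "Carol": 0,
    "Dave": 1, "Ed": 1,
    "Fred": 2, "Greg": 2, "Harry": 2,
}


def summarize(edges, group_of):
    """Return (group sizes, edge counts per unordered pair of groups)."""
    sizes = Counter(group_of.values())
    counts = Counter()
    for u, v in edges:
        g, h = sorted((group_of[u], group_of[v]))
        counts[(g, h)] += 1
    return sizes, counts


if __name__ == "__main__":
    print(summarize(EDGES, GROUP_OF))
```

From this summary an analyst can see that the group of 2 nodes has 4 edges to the group of 3 nodes at the bottom. The summary does not reveal which pairs are connected.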
32. Random alteration
• Transform network by stochastically adding, removing, or rewiring edges
• [Ying, SDM 08] random rewiring subject to utility constraint (spectral properties of graph must be preserved).
• [Liu, SDM 09] randomization to hide sensitive edge weights
• [Wu, SDM 10] exploits spectral properties of graph data to filter out some of the introduced noise.
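A simple random alteration deletes m existing edges and inserts m edges that were absent. The edge count stays the same, while the presence of any particular edge becomes uncertain. This sketch is in the spirit of the randomization strategies listed above. It is not the spectrum-preserving method of [Ying, SDM 08], and the name `rand_add_del` is hypothetical:

```python
import random
from itertools import combinations

NODES = ["Alice", "Bob", "Carol", "Dave", "Ed", "Fred", "Greg", "Harry"]
EDGES = [
    ("Alice", "Bob"), ("Bob", "Carol"), ("Bob", "Dave"), ("Bob", "Ed"),
    ("Dave", "Ed"), ("Dave", "Fred"), ("Dave", "Greg"), ("Ed", "Greg"),
    ("Ed", "Harry"), ("Fred", "Greg"), ("Greg", "Harry"),
]


def rand_add_del(nodes, edges, m, seed=None):
    """Delete m random existing edges and add m random previously absent edges.

    Edges are represented as sorted pairs; the edge count is preserved.
    """
    rng = random.Random(seed)
    existing = sorted(tuple(sorted(e)) for e in edges)
    existing_set = set(existing)
    removed = set(rng.sample(existing, m))
    # combinations over sorted nodes yields pairs already in sorted order.
    non_edges = [p for p in combinations(sorted(nodes), 2) if p not in existing_set]
    added = set(rng.sample(non_edges, m))
    return (existing_set - removed) | added


if __name__ == "__main__":
    print(sorted(rand_add_del(NODES, EDGES, 3, seed=7)))
```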
33. Other work in network transformation
• Other works
• [Zheleva, PinKDD 07] predicting sensitive hidden edges from released
graph data (nodes and non-sensitive edges).
• [Ying, SNA-KDD 09] comparison of randomized alteration and directed
alteration.
• [Bhagat, WWW 10] releasing multiple views of a dynamic social network.
• Surveys:
• [Liu, Next Generation Data Mining 08]
• [Zhou, SIGKDD 08]
• [Hay, Privacy-Aware Knowledge Discovery 10]
• [Wu, Managing and Mining Graph Data 10]
34. Evaluating impact on utility
• After transformations, the graph is released to the public. The analyst measures the transformed graph in place of the original. What is the impact on utility?
• The graph remains useful if it is “similar” to the original. How do we measure similarity?
• Related questions arise in statistical modeling of networks and assessing model fitness [Goldenberg, Foundations 10] [Hunter, JASA 08]
• Common approach to evaluating utility: empirically compare the transformed graph to the original graph in terms of various network properties
35. Impact on network properties
[Figure: frequency distributions of degree, path lengths, and clustering for the original network and for transformed networks with k = 10 and k = |V|; average distortion of clustering, degree, and path measures as k ranges over 2, 5, 10, 20, |V|.]
Algorithm from Hay PVLDB 08; experiments on a version of the HepTh network (2.5K nodes, 4.7K edges).
[Hay, PVLDB 08]
36. Limitations
• Utility
• Transformation may distort some properties: some analysts will
find transformed graph useless
• Lack of formal bounds on error: analyst uncertain about utility
• Privacy
• Defined as resistance to a specific class of attacks; vulnerable to
unanticipated attacks?
• Inspired by k-anonymity; doomed to repeat that history? (See
survey [Chen, Foundations and Trends in Database 09].)
37. Outline of tutorial
• Privately Managing Enterprise Network Data
• Goals, Threats, and Attacks
• Releasing transformed networks (anonymity)
• Releasing network statistics (differential privacy)
• Differential privacy
• Degree sequence
• Subgraph counts
• Personal Privacy in Online Social Networks
38. Releasing data vs. statistics
• Releasing transformed networks
• Ease of use: good
• Protection: anonymity
• Accuracy: no formal guarantees
• Releasing “safe” network statistics
• Ease of use: bad for practical analyses
• Protection: formal privacy guarantee
• Accuracy: provable bounds
[Figure: the analyst poses query Q and receives the output Q(G) + noise (output perturbation).]
39. When are aggregate statistics safe to release?
• “Safe” statistics should report on properties of a group, without
revealing properties of individuals.
• We often want to release a combination of statistics. Still safe?
• What if adversary uses external information along with statistics?
Still safe?
• Dwork, McSherry, Nissim, Smith [Dwork, TCC 06] proposed
differential privacy as a rigorous standard for safe release.
• Many existing results for tabular data; relatively few results for
network data.
40. The differential guarantee
Two databases are neighbors if they differ by at most one tuple.
D:
name   gender  grade
Alice  Female  A
Bob    Male    B
Carl   Male    A
D’ (neighbor of D):
name   gender  grade
Alice  Female  A
Carl   Male    A
[Figure: the data owner runs algorithm A for query q (no. of ‘B’ students) on D or D’ and returns a noisy answer to the analyst. Neighbors are indistinguishable given the output.]
41. Differential privacy
A randomized algorithm A provides ε-differential privacy if, for all neighboring databases D and D’, and for any set of outputs S:
$$\Pr[A(D) \in S] \le e^{\varepsilon} \, \Pr[A(D') \in S]$$
Epsilon is a privacy parameter; smaller epsilon means stronger privacy. Epsilon is usually small: e.g., if ε = 0.1, then e^ε ≈ 1.105.
[Dwork, TCC 06]
42. Calibrating noise
• How much noise is necessary to ensure differential privacy?
• Noise large enough to hide “contribution” of individual record.
• Contribution measured in terms of query sensitivity.
43. Query sensitivity
The sensitivity of a query q is
$$\Delta q = \max_{D, D'} |q(D) - q(D')|$$
where D, D’ are any two neighboring databases.
Query q | Sensitivity Δq
q1: Count tuples | 1
q2: Count(‘B’ students) | 1
q3: Count(students with property X) | 1
q4: Median(age of students) | ~ max age
[Dwork, TCC 06]
44. The Laplace mechanism
The following algorithm A for answering q is ε-differentially private:
$$A(D) = q(D) + \mathrm{Laplace}(\Delta q / \varepsilon)$$
where q(D) is the true answer, Δq is the sensitivity of q, ε is the privacy parameter, and the noise is a sample from the Laplace distribution with scale Δq / ε.
[Figure: output densities for the ‘B’ student count with Bob in and Bob out, for Δq = 1, ε = 1.0 and for Δq = 1, ε = 0.5.]
[Dwork, TCC 06]
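The Laplace mechanism can be sketched with Python's standard library. The sampler relies on a known fact: the difference of two independent exponential variables with mean b follows a Laplace distribution with scale b. The query counts the ‘B’ students from the differential-guarantee example and has sensitivity 1. The function names are hypothetical, and a production system would need a vetted sampler, because naive floating-point sampling has known pitfalls:

```python
import random


def laplace_sample(scale, rng):
    """Draw from Laplace(0, scale) as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """Return q(D) + Laplace(sensitivity / epsilon)."""
    return true_answer + laplace_sample(sensitivity / epsilon, rng)


# Running example: D contains one 'B' student (Bob).
D = [("Alice", "Female", "A"), ("Bob", "Male", "B"), ("Carl", "Male", "A")]


def count_b(db):
    """Query q2: number of 'B' students (sensitivity 1)."""
    return sum(1 for _, _, grade in db if grade == "B")


if __name__ == "__main__":
    rng = random.Random(0)
    print(laplace_mechanism(count_b(D), 1, 0.5, rng))
```

Halving ε doubles the noise scale, so answers at ε = 0.5 vary about twice as widely as answers at ε = 1.0.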
45. Differentially private algorithms
• Any query can be answered (but perhaps with lots of noise)
• Noise determined by privacy parameter epsilon and the sensitivity
(both public)
• Multiple queries can be answered (details omitted)
• Privacy guarantee does not depend on assumptions about the
adversary (caveats omitted, see [Kifer, SIGMOD 11])
Survey paper on differential privacy: [Dwork, CACM 10]
46. Adapting differential privacy for networks
• For networks, what is the right notion of “differential object?”
• Hide individual’s “evidence of participation” [Kifer, SIGMOD 11]
• An edge? A set of k edges? A node (and incident edges)?
• More discussion in [Hay, ICDM 09] [Kifer, SIGMOD 11]
• Choice impacts utility
• Existing work considers only edge and k-edge differential privacy.
A participant’s sensitive information is not a single edge.
47. What can we learn accurately?
• What can we learn accurately about a network under edge or k-edge differential privacy?
• Basic approach:
• Express desired task as one or more queries.
• Check query sensitivity.
• If high: not promising, but sometimes representation matters.
• If low: maybe promising, but may still require work.
48. Outline of tutorial
• Privately Managing Enterprise Network Data
• Goals, Threats, and Attacks
• Releasing transformed networks (anonymity)
• Releasing network statistics (differential privacy)
• Differential privacy
• Degree sequence
• Subgraph counts
• Personal Privacy in Online Social Networks
49. The degree sequence can be estimated accurately
• Degree sequence: the list of degrees of each node in a graph.
• A widely studied property of networks.
Example: the eight-person network (Alice, ..., Harry) has degree sequence [1,1,2,2,4,4,4,4].
[Figure: inverse cumulative degree distribution (fraction vs. degree) of an Orkut crawl.]
[Hay, PVLDB 10][Hay, ICDM 09]
50. Two basic queries for degrees
Degree of each node
deg_A: degree of node A
D = [deg_A, deg_B, ...]
D(G) = [1,4,1,4,4,2,4,2]
D(G’) = [1,4,1,3,3,2,4,2]
ΔD = 2
Frequency of each degree
cnt_i: count of nodes with degree i
F = [cnt_0, cnt_1, ..., cnt_{n-1}]
F(G) = [0,2,2,0,4,0,0,0]
F(G’) = [0,2,2,2,2,0,0,0]
ΔF = 4
[Figure: G is the eight-person network (Alice, ..., Harry); G’ is G with the edge (Dave, Ed) removed.]
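The two queries and the effect of removing one edge can be checked directly. The sketch uses G’ = G minus the edge (Dave, Ed), which reproduces the vectors above. For this neighboring pair the L1 distance reaches the stated sensitivities. Removing an edge lowers two node degrees by one each, and each of those changes moves one node from one frequency bucket to another. The function names are hypothetical:

```python
NODES = ["Alice", "Bob", "Carol", "Dave", "Ed", "Fred", "Greg", "Harry"]
EDGES_G = [
    ("Alice", "Bob"), ("Bob", "Carol"), ("Bob", "Dave"), ("Bob", "Ed"),
    ("Dave", "Ed"), ("Dave", "Fred"), ("Dave", "Greg"), ("Ed", "Greg"),
    ("Ed", "Harry"), ("Fred", "Greg"), ("Greg", "Harry"),
]
# G' = G with the edge (Dave, Ed) removed.
EDGES_G_PRIME = [e for e in EDGES_G if e != ("Dave", "Ed")]


def degree_of_each_node(nodes, edges):
    """Query D: the degree of each node, in a fixed node order."""
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return [deg[v] for v in nodes]


def frequency_of_each_degree(nodes, edges):
    """Query F: cnt_i = number of nodes with degree i, for i = 0..n-1."""
    cnt = [0] * len(nodes)
    for d in degree_of_each_node(nodes, edges):
        cnt[d] += 1
    return cnt


def l1(a, b):
    """L1 distance between two equal-length answer vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    print(l1(degree_of_each_node(NODES, EDGES_G), degree_of_each_node(NODES, EDGES_G_PRIME)))
    print(l1(frequency_of_each_degree(NODES, EDGES_G), frequency_of_each_degree(NODES, EDGES_G_PRIME)))
```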
51. These queries are both flawed
• D requires independent samples from Laplace(2/ε) in each component.
• F requires independent samples from Laplace(4/ε) in each component.
• Thus the Mean Squared Error is Θ(n/ε²).
New technique allows improved error of O(d log³(n)/ε²) (where d is the number of unique degrees).
[Figure: degree distribution Pr[X = x] (fraction vs. degree) for the Orkut crawl: original, and estimates from D and F.]
[Hay, PVLDB 10][Hay, ICDM 09]
52. An alternative query for degrees
Degree of each node, ranked
rnk_i: the degree of rank i (degrees sorted in non-decreasing order)
S = [rnk_1, rnk_2, ..., rnk_n]
S(G) = [1,1,2,2,4,4,4,4]
S(G’) = [1,1,2,2,3,3,4,4]
ΔS = 2
Degree of each node
deg_A: degree of node A
D = [deg_A, deg_B, ...]
D(G) = [1,4,1,4,4,2,4,2]
D(G’) = [1,4,1,3,3,2,4,2]
ΔD = 2
[Figure: the eight-person networks G and G’.]
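The sorted query S can be released with Laplace(2/ε) noise added to each entry. The noisy result can then be post-processed, because the true answer must be non-decreasing. This sketch projects the noisy sequence onto non-decreasing sequences with the pool-adjacent-violators algorithm (least-squares isotonic regression). It follows the spirit of the constrained inference in [Hay, ICDM 09]. The function names and the choice of this particular projection algorithm are assumptions of the sketch:

```python
import random


def sorted_degrees(degrees):
    """Query S: degrees in non-decreasing order (rnk_1 <= ... <= rnk_n)."""
    return sorted(degrees)


def laplace_sample(scale, rng):
    """Draw from Laplace(0, scale) as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def isotonic_fit(values):
    """Closest non-decreasing sequence in least squares (pool adjacent violators)."""
    blocks = []  # each block is [sum, count]; its fitted value is sum / count
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def private_degree_sequence(degrees, epsilon, rng):
    """Noisy S (Laplace(2/epsilon) per entry, since Delta S = 2), made monotone."""
    noisy = [d + laplace_sample(2.0 / epsilon, rng) for d in sorted_degrees(degrees)]
    return isotonic_fit(noisy)


if __name__ == "__main__":
    print(private_degree_sequence([1, 4, 1, 4, 4, 2, 4, 2], 1.0, random.Random(0)))
```

The post-processing step uses only the noisy output, so it consumes no additional privacy budget.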
55. Outline of tutorial
• Privately Managing Enterprise Network Data
• Goals, Threats, and Attacks
• Releasing transformed networks (anonymity)
• Releasing network statistics (differential privacy)
• Differential privacy
• Degree sequence
• Subgraph counts
• Personal Privacy in Online Social Networks
56. Subgraph counting queries
• Given query graph H, return the number of subgraphs of G that are
isomorphic to H.
• Importance
• Used in statistical modeling: exponential random graph models
• Descriptive statistics: clustering coefficient from 2-star and triangle counts
[Figure: example query graphs H: 2-star, 3-star, triangle, 2-triangle.]
57. Subgraph counts have high sensitivity
• Q_TRIANGLE: return the number of triangles in the graph
• High sensitivity is due to a “pathological” worst-case graph. If the input is not pathological, can we obtain accurate answers?
Example: in G, nodes A and B are each connected to all of the other n-2 nodes; G’ is G with the edge (A, B) added.
Q_TRIANGLE(G) = 0, Q_TRIANGLE(G’) = n-2
High sensitivity: ΔQ_TRIANGLE = O(n)
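The worst-case pair can be built and counted directly. Brute-force triangle counting is enough for small examples, and the names are hypothetical:

```python
from itertools import combinations


def count_triangles(adj):
    """Number of pairwise-adjacent node triples (brute force, O(n^3))."""
    return sum(
        1
        for u, v, w in combinations(sorted(adj), 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )


def pathological_pair(n):
    """Return (G, G') as adjacency maps for the slide's example.

    G: n-2 nodes, each adjacent to both A and B (A and B not adjacent).
    G': the same graph plus the single edge (A, B).
    """
    g = {"A": set(), "B": set()}
    for i in range(n - 2):
        x = f"x{i}"
        g[x] = {"A", "B"}
        g["A"].add(x)
        g["B"].add(x)
    g_prime = {v: set(nbrs) for v, nbrs in g.items()}
    g_prime["A"].add("B")
    g_prime["B"].add("A")
    return g, g_prime


if __name__ == "__main__":
    g, g_prime = pathological_pair(10)
    print(count_triangles(g), count_triangles(g_prime))
```

Adding a single edge creates n-2 triangles. This is why the global sensitivity of the triangle count grows linearly with n.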
58. Local sensitivity
• A tempting, but flawed, idea is to add noise proportional to local sensitivity.
• Local sensitivity of q on G: maximum difference in query answer between G and a neighbor G’:
$$LS(G) = \max_{G' \in N(G)} |q(G) - q(G')|$$
• Example showing the problem with using local sensitivity (from [Smith, IPAM 10]): database D is a set of numbers; query q is the median.
D = 0...0 000 c...c, where the runs 0...0 and c...c each contain (n-3)/2 values: LS(D) = 0
D’ = 0...0 00c c...c, with the same run lengths: LS(D’) = c
Since D and D’ are neighbors, noise scaled to local sensitivity would itself reveal which one is the input.
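The median example can be checked numerically. The sketch assumes that values lie in a known range [lo, hi] and that neighbors differ by replacing a single value. Because the median never decreases when one value increases, trying lo and hi at each position is enough. The names are hypothetical:

```python
import statistics


def local_sensitivity_median(data, lo, hi):
    """Local sensitivity of the median at `data` (values assumed in [lo, hi]).

    A neighbor replaces exactly one value; by monotonicity of the median,
    replacing it with lo or hi attains the largest possible change.
    """
    base = statistics.median(data)
    worst = 0
    for i in range(len(data)):
        for v in (lo, hi):
            neighbor = data[:i] + [v] + data[i + 1:]
            worst = max(worst, abs(statistics.median(neighbor) - base))
    return worst


def example(n, c):
    """D and D' from the slide, for odd n >= 3."""
    run = (n - 3) // 2
    d = [0] * run + [0, 0, 0] + [c] * run
    d_prime = [0] * run + [0, 0, c] + [c] * run
    return d, d_prime


if __name__ == "__main__":
    d, d_prime = example(7, 10)
    print(local_sensitivity_median(d, 0, 10), local_sensitivity_median(d_prime, 0, 10))
```

With n = 7 and c = 10, the two neighboring databases have local sensitivities 0 and 10. Noise calibrated to local sensitivity would therefore leak which database was used.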
59. Instance-based noise
• Two general approaches to adding instance-based noise
• Smooth sensitivity: compute a smooth upper bound on local sensitivity [Nissim, STOC 07].
• Noisy sensitivity: use a differentially private mechanism to get a noisy upper “bound” on local sensitivity [Behoora, PVLDB 11] [Dwork, STOC 09].
• Instance-based noise can require a modest relaxation of differential privacy to account for (very low probability) “bad” events.
60. Differentially private subgraph counts
• For k-stars and triangles, smooth sensitivity is efficiently
computable
• For k-triangles with k ≥ 2:
• Computing smooth sensitivity is NP-hard.
• However, it can be estimated using the noisy sensitivity approach.
• Empirical and theoretical analysis:
• Generally, instance-based noise is not much larger than local sensitivity.
• However, for k-triangles on real data, local sensitivity is sometimes large (relative to the actual number of k-triangles).
[Behoora, PVLDB 11]
61. Alternative representations
• The number of k-stars in a graph can be computed from the degree sequence:
$$\text{k-stars}(G) = \sum_{v \in G} \binom{\deg(v)}{k}$$
• In other words, an answer to the high-sensitivity k-star query can be derived from the results of the degree sequence estimator.
• It would be interesting to compare the error of this approach with the instance-based noise approach of [Behoora, PVLDB 11].
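The k-star count depends only on node degrees, so it can be computed from any degree sequence, including an estimated one. In the sketch below, rounding and clamping estimated degrees is an assumption of the example, not part of the slide:

```python
from math import comb


def k_stars_from_degrees(degrees, k):
    """Number of k-stars: the sum over nodes of C(deg(v), k).

    Estimated degrees (possibly fractional or negative, e.g. from a private
    degree-sequence estimate) are rounded and clamped at zero first.
    """
    return sum(comb(max(0, round(d)), k) for d in degrees)


if __name__ == "__main__":
    # Degree sequence of the eight-person example network.
    print(k_stars_from_degrees([1, 4, 1, 4, 4, 2, 4, 2], 2))
```

For the eight-person network this gives 26 two-stars: each of the four degree-4 nodes contributes 6, and each of the two degree-2 nodes contributes 1.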
62. Other work on releasing network statistics
• [Rastogi, PODS 09] Subgraph counting queries under an alternative model of adversarial privacy. Expected error Θ(log² n) instead of Θ(n) for a restricted class of adversaries.
• [Machanavajjhala, PVLDB 11] Investigates recommender systems that use friends’ private data to make recommendations.
• Lower bound on the accuracy of differentially private recommenders
• Experimental analysis shows poor utility under reasonable privacy.
63. Open questions
• For graph analysis X, how accurately can we estimate X under
edge or node differential privacy?
• Lower bounds on accuracy under node differential privacy?
• Is it socially acceptable to offer weaker privacy protection to high-degree nodes (as in k-edge differential privacy)?
• Can we generate accurate synthetic networks under differential privacy?
74.
Privacy score of user j due to profile item i = sensitivity of profile item i × visibility of profile item i
Profile items: name, gender, birthday, address, phone number, degree, job, etc.
75.
$$PR(i, j) = \beta_i \times V(i, j), \qquad PR(j) = \sum_i PR(i, j)$$
PR(i, j): privacy score of user j due to profile item i; β_i: sensitivity of profile item i; V(i, j): visibility of profile item i for user j; PR(j): overall privacy score of user j.
Profile items: name, gender, birthday, address, phone number, degree, job, etc.
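The per-item scoring rule, sensitivity times visibility, can be sketched as follows. The sketch assumes that a user's overall score is the sum of the per-item scores. Sensitivities are taken as given; the tutorial estimates them with an IRT model, which is not reproduced here. The item values in the test are hypothetical:

```python
def privacy_score(sensitivity, visibility):
    """Per-item privacy scores for one user, and their sum (assumed overall score).

    `sensitivity` maps profile item -> sensitivity of that item;
    `visibility` maps profile item -> how visible the item is for this user.
    Items missing from `visibility` are treated as not visible (0).
    """
    per_item = {i: sensitivity[i] * visibility.get(i, 0) for i in sensitivity}
    return per_item, sum(per_item.values())


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(privacy_score({"birthday": 0.5, "phone number": 2.0},
                        {"birthday": 1, "phone number": 0}))
```

A highly sensitive item contributes nothing while it is hidden. The score rises once that item becomes visible.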
78.
[Figure: sensitivity of the profile items computed by IRT model]
Survey
Information-sharing preferences of 153 users on 49 profile items such as name, gender, birthday, political views, address, phone number, degree, job, etc. were collected.
Statistics
• 49 profile items
• 153 users from 18 countries/regions
• 53.3% are male and 46.7% are female
• 75.4% are aged 23 to 39
• 91.6% hold a college degree or higher
• 76.0% spend 4+ hours online per day
[Figure: average privacy scores grouped by geographic region]
103. Privately managing enterprise network data
• Data: Enterprise collects data or observes interactions of individuals.
• Control: Enterprise controls dissemination of information.
• Goal: permit analysis of aggregate properties; protect facts about individuals.
• Challenges: privacy for networked data, complex utility goals.
Personal Privacy in Online Social Networks
• Data: Individuals contribute their data through participation in OSNs.
• Control: Individuals control their connections, interactions, visibility.
• Goal: reliable and transparent sharing of information.
• Challenges: system complexity, leaks through inference, unskilled users.
104. Open questions and future directions
• Anonymity: models of adversary knowledge, new attacks, new
network transformations, improved utility evaluation.
• Differential privacy: adapting the privacy definition to networks,
mechanisms for accurate estimates of new network statistics,
synthetic network generation, error-optimal mechanisms.
• Extended data model: attributes on nodes/edges, dynamic network
data.
105. References
• [Backstrom, WWW 07] L. Backstrom, C. Dwork, and J. Kleinberg. Wherefore art
thou r3579x?: anonymized social networks, hidden patterns, and structural
steganography. In WWW 2007.
• [Becker, W2SP 09] J. Becker and H. Chen. Measuring privacy risk in online social
networks. In W2SP 2009.
• [Behoora, PVLDB 11] I. Behoora, V. Karwa, S. Raskhodnikova, A. Smith, and G.
Yaroslavtsev. Private Analysis of Graph Structure. In PVLDB 2011.
• [Besmer, CHI 10] A. Besmer and H. Lipford. Moving beyond untagging: photo
privacy in a tagged world. In CHI 2010.
• [Besmer, SOUPS 10] A. Besmer, J. Watson, and H. Lipford. The impact of social
navigation on privacy policy configuration. In SOUPS 2010.
• [Bhagat, WWW 10] S. Bhagat, G. Cormode, B. Krishnamurthy, and D. Srivastava.
Privacy in dynamic social networks. In WWW 2010.
• [Campan, PinKDD 08] A. Campan and T. M. Truta. A clustering approach for data
and structural anonymity in social networks. In PinKDD 2008.
106. References (continued)
• [Chen, Foundations and Trends in Database 09] B. Chen, D. Kifer, K. LeFevre,
and A. Machanavajjhala. Privacy-Preserving Data Publishing. In Foundations and
Trends in Databases 2009.
• [Cheng, SIGMOD 10] J. Cheng, A. Wai-Chee Fu, and J. Liu. K-Isomorphism:
Privacy Preserving Network Publication against Structural Attacks. In SIGMOD
2010.
• [Cormode, PVLDB 08] G. Cormode, D. Srivastava, T. Yu, and Q. Zhang.
Anonymizing bipartite graph data using safe groupings. In PVLDB 2008.
• [Cormode, PVLDB 09] G. Cormode, D. Srivastava, S. Bhagat, and B.
Krishnamurthy. Class-based graph anonymization for social network data. In
PVLDB 2009.
• [Danezis, AISec 09] G. Danezis. Inferring privacy policies for social networking
services. In AISec 2009.
• [Dwork, CACM 10] C. Dwork. A firm foundation for privacy. In CACM 2010.
• [Dwork, STOC 09] C. Dwork and J. Lei. Differential privacy and robust statistics.
In STOC 2009.
107. References (continued)
• [Dwork, TCC 06] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating
noise to sensitivity in private data analysis. In TCC 2006.
• [Fang, WWW 10] L. Fang and K. LeFevre. Privacy wizards for social networking
sites. In WWW 2010.
• [Goldenberg, Foundations 10] A. Goldenberg, S. Fienberg, A. Zheng, and E. Airoldi. A
Survey of Statistical Network Models. In Foundations 2009.
• [Hay, ICDM 09] M. Hay, C. Li, G. Miklau, and D. Jensen. Accurate estimation of
the degree distribution of private networks. In ICDM 2009.
• [Hay, Privacy-Aware Knowledge Discovery 10] M. Hay, G. Miklau, and D.
Jensen. Enabling Accurate Analysis of Private Network Data. In Privacy-Aware
Knowledge Discovery: Novel Applications and New Techniques 2010.
• [Hay, PVLDB 08] M. Hay, G. Miklau, D. Jensen, and D. Towsley. Resisting
structural identification in anonymized social networks. In PVLDB 2008.
• [Hay, PVLDB 10] M. Hay, V. Rastogi, G. Miklau, and D. Suciu. Boosting the
accuracy of differentially-private queries through consistency. In PVLDB 2010.
108. References (continued)
• [Hay, VLDBJ 10] M. Hay, G. Miklau, D. Jensen, D. Towsley, and C. Li.
In VLDB Journal 2010.
• [Hunter, JASA 08] D. Hunter, S. Goodreau, and M. Handcock. Goodness of fit of
social network models. In JASA 2008.
• [Jones, SOUPS 10] S. Jones and E. O’Neill. Feasibility of structural network
clustering for group-based privacy control in social networks. In SOUPS 2010.
• [Kifer, SIGMOD 11] D. Kifer and A. Machanavajjhala. No Free Lunch in Data
Privacy. In SIGMOD 2011.
• [Krishnamurthy, WWW 09] B. Krishnamurthy and C. Wills. Privacy diffusion on the
web: a longitudinal perspective. In WWW 2009.
• [Lindamood, WWW 09] J. Lindamood, R. Heatherly, M. Kantarcioglu, and B.
Thuraisingham. Inferring private information using social network data. In WWW
2009.
• [Lipford, CHI 10] H. Lipford, J. Watson, M. Whitney, K. Froiland, and R. Reeder. Visual
vs. compact: a comparison of privacy policy interfaces. In CHI 2010.
109. References (continued)
• [Liu, ICDM 09] K. Liu and E. Terzi. A framework for computing privacy scores of
users in online social networks. In ICDM 2009.
• [Liu, Next Generation Data Mining 08] K. Liu, K. Das, T. Grandison, and H.
Kargupta. Privacy-Preserving Data Analysis on Graphs and Social Networks. In
Next Generation of Data Mining 2008.
• [Liu, SDM 09] L. Liu, J. Wang, J. Liu, and J. Zhang. Privacy Preservation in
Social Networks with Sensitive Edge Weights. In SDM 2009.
• [Liu, SIGMOD 08] K. Liu and E. Terzi. Towards identity anonymization on graphs.
In SIGMOD 2008.
• [Machanavajjhala, PVLDB 11] A. Machanavajjhala, A. Korolova, and A. Das
Sarma. Personalized Social Recommendations -- Accurate or Private? In PVLDB
2011.
• [Mazzia, CHI 11] A. Mazzia, K. LeFevre, and E. Adar. A tool for privacy
comprehension. In CHI 2011.
110. References (continued)
• [Mislove, WSDM 10] A. Mislove, B. Viswanath, K. Gummadi, and P. Druschel. You are
who you know: Inferring user profiles in online social networks. In WSDM 2010.
• [Narayanan, OAKL 09] A. Narayanan and V. Shmatikov. De-anonymizing social
networks. In IEEE Symposium on Security and Privacy 2009.
• [Nissim, STOC 07] K. Nissim, S. Raskhodnikova, and A. Smith. Smooth
sensitivity and sampling in private data analysis. In STOC 2007.
• [Rastogi, PODS 09] V. Rastogi, M. Hay, G. Miklau, and D. Suciu. Relationship
privacy: Output perturbation for queries with joins. In PODS 2009.
• [Seigneur, Trust Management 04] J. Seigneur and C. Damsgaard Jensen.
Trading privacy for trust. In Trust Management 2004.
• [Shehab, WWW 10] M. Shehab, G. Cheek, H. Touati, A. Squicciarini, and P.
Cheng. Learning based access control in online social networks. In WWW 2010.
• [Smith, IPAM 10] A. Smith. In IPAM Workshop on Statistical and Learning-
Theoretic Challenges in Data Privacy 2010.
111. References (continued)
• [Squicciarini, WWW 09] A. Squicciarini, M. Shehab, and F. Paci. Collective privacy
management in social networks. In WWW 2009.
• [Wu, Managing and Mining Graph Data 10] X. Wu, X. Ying, K. Liu, and L. Chen. A
Survey of Algorithms for Privacy-Preservation of Graphs and Social Networks. In
Managing and Mining Graph Data 2010.
• [Wu, SDM 10] L. Wu, X. Ying, and X. Wu. Reconstruction from Randomized
Graph via Low Rank Approximation. In SDM 2010.
• [Ying, SDM 08] X. Ying and X. Wu. Randomizing social networks: a spectrum
preserving approach. In SDM 2008.
• [Ying, SNA-KDD 09] X. Ying, K. Pan, X. Wu, and L. Guo. Comparisons of
Randomization and K-degree Anonymization Schemes for Privacy Preserving
Social Network Publishing. In SNA-KDD 2009.
• [Zheleva, PinKDD 07] E. Zheleva and L. Getoor. Preserving the privacy of
sensitive relationships in graph data. In PinKDD 2007.
112. References (continued)
• [Zheleva, WWW 09] E. Zheleva and L. Getoor. To join or not to join: the illusion of
privacy in social networks with mixed public and private user profiles. In WWW
2009.
• [Zhou, ICDE 08] B. Zhou and J. Pei. Preserving privacy in social networks against
neighborhood attacks. In ICDE 2008.
• [Zhou, KIS 10] B. Zhou and J. Pei. k-Anonymity and l-Diversity Approaches for
Privacy Preservation in Social Networks against Neighborhood Attacks. In
Knowledge and Information Systems: An International Journal 2010.
• [Zhou, SIGKDD 08] B. Zhou, J. Pei, and W. Luk. A Brief Survey on
Anonymization Techniques for Privacy Preserving Publishing of Social Network
Data. In SIGKDD Explorations 2008.
• [Zou, PVLDB 09] L. Zou, L. Chen, and M. T. Özsu. K-automorphism: A general
framework for privacy preserving network publication. In PVLDB 2009.