What power laws and rich-get-richer phenomena mean in the world of networks, and how they affect web page popularity in social networks, especially on the Facebook platform.
1. Power Laws and Rich-Get-Richer Phenomena
Faculty of Electrical and Computer Engineering
University of Prishtina “Hasan Prishtina”
Prishtina, Kosova
Ajshe Nazmi Klinaku
ajshe1klinaku@gmail.com
Prishtinë
2. Introduction
• Popularity
• Our Model
• Power Laws
• Long Tail
• Rich-get-richer phenomena
• Result
• Conclusion
• References
3. Popularity
• How do we measure it?
• How does it affect the network?
• Basic network models with popularity
5. Our Model
Consider the creation of web pages.
The web as a directed graph
• Nodes are web pages
• Edges are hyperlinks
• #in-links = popularity
• 1 out-link per page
When a new web page is designed, it
includes links to existing web pages.
6. Our Main Question
• As a function of k, what fraction of pages
on the web have k in-links?
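The question above can be made concrete with a small sketch. It assumes the web is given as a list of (source, target) hyperlink pairs; the function name `in_link_fractions` and the toy data are hypothetical and only serve as an illustration.

```python
from collections import Counter

def in_link_fractions(links, num_pages):
    """Return {k: fraction of pages with exactly k in-links}.

    links: iterable of (source, target) page pairs (hyperlinks).
    num_pages: total number of pages, including pages with no in-links.
    """
    # Number of in-links received by each page that is linked to at least once.
    in_links = Counter(target for _, target in links)
    # Number of pages that have each in-link total k.
    pages_per_k = Counter(in_links.values())
    # Pages that never appear as a target have k = 0.
    pages_per_k[0] = num_pages - len(in_links)
    return {k: count / num_pages for k, count in sorted(pages_per_k.items())}

# Hypothetical toy web of 5 pages; each pair is (page, page it links to).
toy_links = [(2, 1), (3, 1), (4, 2), (5, 1)]
fractions = in_link_fractions(toy_links, num_pages=5)
print(fractions)
```

On real crawl data, plotting these fractions against k on log-log axes is a common way to check whether they follow an approximately straight line, as a power law would.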
8. Power Laws
• When people measured the distribution of links on the Web,
however, they found something very different
• The fraction of Web pages that have k in-links is approximately
proportional to 1/k^2 [1]
• Why is this so different from the normal distribution?
• The crucial point is that 1/k^2 decreases much more slowly as k
increases than a normal distribution does,
• So pages with very large numbers of in-links are much more
common than we'd expect with a normal distribution.
• A function that decreases as k to some fixed power, such as 1/k^2 in the
present case, is called a power law.
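A rough numerical sketch of this difference in decay is given below. The mean and standard deviation of the normal curve are arbitrary illustrative values, not measurements from the Web, and both curves are left unnormalized because only their rate of decay matters here.

```python
import math

def power_law(k, exponent=2):
    """Unnormalized power-law weight 1/k^exponent."""
    return k ** -exponent

def normal_shape(k, mean=10.0, std=10.0):
    """Unnormalized normal (Gaussian) curve; mean and std are illustrative."""
    return math.exp(-((k - mean) ** 2) / (2 * std ** 2))

# Each tenfold increase in k shrinks 1/k^2 by a factor of 100,
# while the normal curve collapses far faster.
for k in (10, 100, 1000):
    print(k, power_law(k), normal_shape(k))
```

For large k the normal curve becomes vanishingly small while 1/k^2 is still clearly non-zero, which is why very popular pages are far more common under a power law.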
9. Power Laws vs Normal Distribution
• Normal distribution – arises when the measured quantity is the sum
of many independent experiments
• Power laws – tend to arise when the measured quantity can be
viewed as a type of popularity
10. What causes power laws?
• Correlated decisions across a population
• Human tendency to copy the decisions of others
11. Examples
• Telephone numbers that
receive k calls per day
• Books bought by k people
• Scientific papers that receive
k citations
• Web pages that receive k in-links
12. LONG TAIL
• In search queries it is important to tap the main and most common
search terms
• The tail decreases much more slowly than that of a normal distribution
13. Rich-Get-Richer
• A page that gets a small lead over others will
tend to extend this lead
• With probability (1-p), a new page links to an existing page chosen
with probability proportional to that page's number of in-links
14. RICH-GET-RICHER PHENOMENA
• We start with m0 nodes, the links between which are chosen arbitrarily as long
as each node has at least one link
• The network develops following two steps:
• Growth: At each time step we add a new node with m (<= m0) links that connect
the new node to m nodes already in the network.
• Preferential attachment: The probability that a link of the new node connects to
node i depends on the degree of node i:
Probability[i] = Degree[i] / Sum(degrees of all the nodes).
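The two steps above can be sketched as a short simulation. This is a minimal illustration under stated assumptions, not the exact procedure behind the slides: the seed network is taken to be a simple path (one arbitrary way to give every seed node at least one link), each new node is assumed to attach to m distinct existing nodes, and the function name `preferential_attachment` is hypothetical.

```python
import random
from collections import Counter

def preferential_attachment(num_nodes, m, m0, seed=None):
    """Grow a network by growth plus preferential attachment.

    Returns the list of undirected edges (new_node, existing_node).
    Assumption: the m0 seed nodes are joined in a path.
    """
    if not (1 <= m <= m0 and m0 >= 2):
        raise ValueError("need 1 <= m <= m0 and m0 >= 2")
    rng = random.Random(seed)
    edges = [(i, i + 1) for i in range(m0 - 1)]
    # Each node appears in `endpoints` once per link it has, so a uniform
    # draw from this list picks node i with probability
    # Degree[i] / Sum(degrees of all the nodes).
    endpoints = [node for edge in edges for node in edge]
    for new_node in range(m0, num_nodes):
        targets = set()
        while len(targets) < m:  # draw until m distinct targets are found
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.append((new_node, target))
            endpoints.extend((new_node, target))
    return edges

edges = preferential_attachment(num_nodes=2000, m=2, m0=3, seed=1)
degree = Counter(node for edge in edges for node in edge)
print(degree.most_common(5))  # the highest-degree nodes
```

Keeping one list entry per link endpoint avoids recomputing the degree sum at every step; the early nodes typically end up with the largest degrees, which is the rich-get-richer effect.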
15. Building a Simple Model
• Web pages are created in order 1, 2, 3, …, N
(dynamic network growth)
• When page j is created, with probability:
p: it chooses a page i uniformly at random among
all earlier pages and links to it
1-p: it chooses a page i uniformly at random among
all earlier pages and links to the page that i links to
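The model above can be simulated in a few lines. The sketch below makes one assumption that the slide does not state: page 1 has no earlier page to link to, so it has no out-link, and a page that tries to copy page 1's link simply links to page 1 instead. The function name `copy_model` and the parameter values are illustrative.

```python
import random
from collections import Counter

def copy_model(num_pages, p, seed=None):
    """Simulate the copying model; return {page j: page that j links to}."""
    rng = random.Random(seed)
    link_to = {}  # page 1 has no earlier page, so it has no out-link
    for j in range(2, num_pages + 1):
        i = rng.randint(1, j - 1)  # uniform choice among earlier pages
        if rng.random() < p or i not in link_to:
            # Link directly to page i (also the fallback when i is page 1).
            link_to[j] = i
        else:
            # Copy page i's decision: link to the page that i links to.
            link_to[j] = link_to[i]
    return link_to

links = copy_model(num_pages=10_000, p=0.2, seed=0)
in_links = Counter(links.values())
print(in_links.most_common(5))  # the most popular pages
```

Copying a random earlier page's link selects a target with probability proportional to its current number of in-links, so the copy step is what produces the rich-get-richer behaviour in this model.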
17. Result
Page-like network analysis
PageLikeNetwork         ICK                     TECH INSIDER            NASA                    HISTORY
ID                      2.48822E+14             3.52751E+14             54971236771             2.01566E+15
Analyses_Period         First       Second      First       Second      First       Second      First       Second
Nodes (Crawl Depth 2)   1111        1110        61          61          936         935         1           1
Edges (Crawl Depth 2)   6134        6111        1052        1052        12567       12565       0           0
Post_Activity           0.08        0.07        3.83        3.11        0.3         0.35        0.01        0.01
Talking_About_Count     404         947         716843      486.276     100613      226332      246         195
Fan_Count               51.047      51.272      14.073.621  14.109.586  21.168.297  21.285.967  43.411.692  43.441.164
Follow_Page             50.047      50.999      14.291.857  14.333.411  21.244.860  21.373.652  43.411.692  43.441.501
Link                    2           3           2           2           2           3           3           3
Video                   3           2           1           1           1           2           2           2
Photo                   1           1           3           3           3           1           1           1
18. Result
Post page analysis
Post_Page_Analyses  Analyse  Posts  Like     Reactions             Comments            Share
ICK                 First    999    58981    63317                 1420                2303
                    Second   50     2723     2820 (56.4 avg)       46 (0.92 avg)       154
Tech Insider        First    999    3066780  683919                456536              819186
                    Second   50     5918     6756 (135.12 avg)     714 (14.28 avg)     1834
NASA                First    999    3329112  3971234               474995              859250
                    Second   50     156534   184099 (3681.98 avg)  168700 (337.4 avg)  20488
History             First    108    4186     5076                  391                 273
                    Second   50     2755     4117 (82.34 avg)      373 (7.46 avg)      219
19. Conclusions
• We see that the larger the number of Shares, the greater the
number of Likes, which reflects the popularity of
the page.
• Shares have an impact on, and a leading role in, the
popularity of the page.
• The Facebook analytics firm PageLever has reached
a similar conclusion.