1. The document describes using Pajek software to analyze affiliation networks by exploring the relationships between actors (advertising creatives) and events (award-winning advertising project teams) over time.
2. It shows how to partition the network by attributes like the advertising agency involved, and use this to identify creatives who worked with multiple agencies. Degree distributions reveal most creatives worked on ads for a single agency.
3. Techniques like k-neighbor analysis and extended partitions are used to track the careers of top creatives by comparing their co-workers and ads across media like TV, magazines and newspapers over the years. This shows TV teams tended to be much larger on average.
Prepared for the Spring 2008 Anthropology of Japan in Japan workshop in Tsukuba, this presentation describes the thinking behind ongoing research on members of creative teams that emerge as winners in one of Japan's premier ad contests.
3. Example
•Actors are advertising creatives
•Events are project teams that produce award-winning ads
•The logic is generalizable to all 2-mode networks
where nodes in one mode can be subdivided by
attributes
5. The Research Plan
DATA: Credits from winning ads in the Tokyo
Copywriters Club Annual
SNA: Explore networks linking members of
winning teams
DESK RESEARCH: Books and articles written
by or about central figures in the networks
INTERVIEWS: Conversations with central
figures using output from SNA and desk
research as stimulus material
6. The Data
Ads: 3634
Lines linking creators to ads: 22907
Creators: 7018
Note 1: Ad production requires multiple roles
Note 2: Creators may play more than one role
Note 3: Multiple creators may play the same role
7. Why SNA?
•Explore network structures to see how they
changed over time
•Identify industry stars, and
•Track their careers
10. Stumbling Blocks
• “Techniques for analyzing one-mode networks cannot
always be applied to two-mode networks without
modification or change of meaning. Special techniques for
two-mode networks are very complicated....”
• “The solution commonly used...is to change the two-mode
network into a one-mode network, which can be analyzed
with standard techniques.”
• Inevitably, however, this approach destroys useful
information.
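To make the information loss concrete, here is a small sketch in plain Python. The creator and ad names are invented for illustration. Two different two-mode networks, one with a single three-person team and one with three two-person teams, project to the same one-mode co-worker network, so the projection alone cannot tell them apart:

```python
from itertools import combinations

def project_to_creators(ad_teams):
    """One-mode projection: link two creators if they share at least one ad."""
    links = set()
    for team in ad_teams.values():
        for a, b in combinations(sorted(team), 2):
            links.add((a, b))
    return links

# Hypothetical data: one ad credited to three creators...
one_big_team = {"ad1": {"A", "B", "C"}}
# ...versus three ads, each credited to a different pair.
three_pairs = {"ad1": {"A", "B"}, "ad2": {"B", "C"}, "ad3": {"A", "C"}}

same = project_to_creators(one_big_team) == project_to_creators(three_pairs)
print(same)  # True
```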
11. For Example
• We begin with the combined network that contains data for all
six networks (1981-2006)
• After simplifying the network to remove multiple lines, we
click on the info icon (Rows=Creators, Columns=Ads)
==============================================================================
1. Z:\Documents\Magic Briefcase\Winner's Circles\Networks\Revised January 2012\CAR81-06\Creator Ads Roles 81-06[Single Line].net [2-Mode] (10652)
==============================================================================
Number of vertices (n): 10652
----------------------------------------------------------
                                  Arcs      Edges
----------------------------------------------------------
Total number of lines                0      22907
----------------------------------------------------------
Number of loops                      0          0
Number of multiple lines             0          0
----------------------------------------------------------
2-Mode Network: Rows=7018, Cols=3634
Density [2-Mode] = 0.00089819
Average Degree = 4.30097634
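The density and average degree in this report can be reproduced from the three counts it lists. A minimal Python sketch, assuming the usual two-mode definitions (density = lines / (rows × columns); average degree = 2 × lines / vertices). The function names are illustrative and not part of Pajek:

```python
def two_mode_density(lines, rows, cols):
    """Share of all possible row-column pairs that are actually linked."""
    return lines / (rows * cols)

def average_degree(lines, vertices):
    """Each line has two endpoints, so it adds 2 to the total degree."""
    return 2 * lines / vertices

# Counts taken from the report: creators (rows), ads (columns), lines.
ROWS, COLS, LINES = 7018, 3634, 22907

density = two_mode_density(LINES, ROWS, COLS)
avg_deg = average_degree(LINES, ROWS + COLS)
print(f"{density:.8f}")  # 0.00089819
print(f"{avg_deg:.8f}")  # 4.30097634
```

Both values agree with the Pajek output, which suggests the report uses these definitions.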
12. A Giant Component
• Using Network>Create Partition>Components>Weak, we
determine that the network contains 94 components, including
one giant component that accounts for 95.9% of all nodes.
• This network seems to be highly connected. But the single
giant component conceals underlying structures.
• We need to look more deeply.
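Outside Pajek, the same check amounts to finding connected components and measuring the share of nodes in the largest one. A sketch using breadth-first search on an invented toy edge list (not the study data). It only sees nodes that appear in at least one line, so isolated vertices would need to be added separately:

```python
from collections import defaultdict, deque

def components(edges):
    """Return the connected components of an undirected graph as sets."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        comps.append(comp)
    return comps

# Hypothetical creator-ad links: one larger component and one small one.
toy_edges = [("c1", "ad1"), ("c2", "ad1"), ("c2", "ad2"), ("c3", "ad2"),
             ("c4", "ad3")]
comps = components(toy_edges)
giant = max(comps, key=len)
share = len(giant) / sum(len(c) for c in comps)
print(len(comps), f"{share:.1%}")  # 2 71.4%
```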
13. Crude, Simple, Effective Solutions
•Extended partitions
•Shrinking networks and examining degree
distributions
•Using k-neighbor and extended partitions to track
and compare careers
15. Question
•We know that the Japanese advertising industry is
an oligopoly dominated by two giant agencies,
Dentsu and Hakuhodo, with ADK No. 3
•How many creators work on projects for more
than one agency?
16. The Agency Partition
• My FileMaker Pro database makes it simple to partition the Ads using the attribute Agency
• When I import that partition and check info, I see that I have a partition with four clusters
(1=Dentsu, 2=Hakuhodo, 3=ADK, 4= Other) that covers a total of 3634 nodes.
• When I try to use Operations>Network+Partition>Extract Subnetwork, Pajek generates an
error message “Network and Partition of equal size needed.”
==============================================================================
1. Z:\Documents\Magic Briefcase\Winner's Circles\Networks\Revised January 2012\CAR81-06\Agencies 81-06.clu (3634)
==============================================================================
Dimension: 3634
The lowest value: 1
The highest value: 4
Frequency distribution of cluster values:
Cluster      Freq     Freq%   CumFreq  CumFreq%  Representative
----------------------------------------------------------------
1            1187   32.6637      1187   32.6637  1
2             628   17.2812      1815   49.9450  40
3              47    1.2933      1862   51.2383  295
4            1772   48.7617      3634  100.0000  3
----------------------------------------------------------------
Sum          3634  100.0000
17. Extending the Partition
• To create a partition of equal size, I begin with Partition>Create Constant Partition, setting
the dimension to 7018 (the number of creators) and the constant to 0.
• Then with the constant partition in the first partition field and the agency partition in the
second partition field, I use Partitions>Fuse Partition
• I save the extended partition for later use.
==============================================================================
3. Fusion of C2 and C1 (10652)
==============================================================================
Dimension: 10652
The lowest value: 0
The highest value: 4
Frequency distribution of cluster values:
Cluster      Freq     Freq%   CumFreq  CumFreq%  Representative
----------------------------------------------------------------
0            7018   65.8843      7018   65.8843  Nak1
1            1187   11.1434      8205   77.0278  AD1_01
2             628    5.8956      8833   82.9234  AD7_86
3              47    0.4412      8880   83.3646  AD50_01
4            1772   16.6354     10652  100.0000  AD1_81
----------------------------------------------------------------
Sum         10652  100.0000
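The same extension can be expressed as simple list concatenation. In this sketch the agency codes are a tiny made-up sample padded to the right length; only the sizes 7018 and 3634 come from the slides. It assumes that creators occupy the first 7018 vertex positions, as the Pajek output suggests:

```python
N_CREATORS = 7018   # rows of the two-mode network
N_ADS = 3634        # columns of the two-mode network

def extend_partition(n_first_mode, second_mode_partition, constant=0):
    """Prefix a constant cluster for first-mode nodes to a second-mode partition."""
    return [constant] * n_first_mode + list(second_mode_partition)

# Hypothetical agency codes (1=Dentsu, 2=Hakuhodo, 3=ADK, 4=Other), padded
# with 4s only to reach the right length for this demonstration.
sample_agency = [1, 4, 2, 3]
agency_partition = sample_agency + [4] * (N_ADS - len(sample_agency))

extended = extend_partition(N_CREATORS, agency_partition)
print(len(extended))                 # 10652
print(min(extended), max(extended))  # 0 4
```

The result has one entry per vertex of the two-mode network, which is what the Extract Subnetwork and Shrink Network operations require.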
19. Shrink and Examine Degree
• With the simplified network and extended partition as input, I use Operations>Network+Partition>Shrink
Network, leaving the 0 cluster, the creatives, unshrunk
• We know that we are starting with a 2-mode network, in which creators can only be linked directly to ads. Thus,
in the shrunk network, creators will have at most four immediate neighbors
• Using Network>Create Partition>Degree>All, we find that of 7018 creators, 5976 have worked for only one
agency, 822 for two agencies, 195 for three agencies, and only 25 for four agencies. As an added bonus we can
see the total number of creatives who have worked for each of the agencies (our database makes it simple to
track down the agency that created the ad whose label is used for the agency cluster)
==============================================================================
5. All Degree Partition of N2 (7022)
==============================================================================
Dimension: 7022
The lowest value: 1
The highest value: 3362
Frequency distribution of cluster values:
Cluster      Freq     Freq%   CumFreq  CumFreq%  Representative
----------------------------------------------------------------
1            5976   85.1040      5976   85.1040  Saw5
2             822   11.7061      6798   96.8100  Nak1
3             195    2.7770      6993   99.5870  Nak190
4              25    0.3560      7018   99.9430  Yag350
230             1    0.0142      7019   99.9573  #AD50_01
1779            1    0.0142      7020   99.9715  #AD7_86
2934            1    0.0142      7021   99.9858  #AD1_01
3362            1    0.0142      7022  100.0000  #AD1_81
----------------------------------------------------------------
Sum          7022  100.0000
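In effect, shrinking the ads by agency and then taking each creator's degree counts the distinct agencies that creator has worked for. A sketch on invented data (creator labels, ads, and agency assignments are all hypothetical), followed by a consistency check on the figures reported above:

```python
from collections import Counter

# Hypothetical creator-ad links and a hypothetical agency code per ad.
links = [("creatorA", "ad1"), ("creatorA", "ad2"), ("creatorB", "ad1"),
         ("creatorB", "ad3"), ("creatorC", "ad4")]
agency_of = {"ad1": 1, "ad2": 2, "ad3": 1, "ad4": 4}

# Shrinking merges all ads of one agency into a single node, so a creator's
# degree in the shrunk network is the number of distinct agencies.
agencies_by_creator = {}
for creator, ad in links:
    agencies_by_creator.setdefault(creator, set()).add(agency_of[ad])

degree = {c: len(a) for c, a in agencies_by_creator.items()}
distribution = Counter(degree.values())
print(sorted(distribution.items()))  # [(1, 2), (2, 1)]

# Consistency check on the reported output: the creators' degrees and the
# four agency nodes' degrees must account for the same set of lines.
creator_side = 5976 * 1 + 822 * 2 + 195 * 3 + 25 * 4
agency_side = 230 + 1779 + 2934 + 3362
print(creator_side, agency_side)  # 8305 8305
```

The two totals match, so the degree partition is internally consistent: the shrunk network contains 8305 creator-agency lines.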
25. TV Teams Twice as Large
Medium      Creators    Ads   Avg Team
TV             12135   1159      10.47
Radio           1185    194       6.11
Newspaper       6152   1226       5.02
Magazine        2284    517       4.42
•Based on the number of individuals who are given credits per ad,
creative teams for TV commercials are, on average, twice as large as
those for newspaper ads and more than twice as large as those for
magazine ads.
•A team of size n contributes n(n-1)/2 links to the network.
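The quadratic growth matters because a team twice the size contributes more than four times as many links. A short sketch that recomputes the averages from the table and evaluates n(n-1)/2. The team sizes 10 and 5 are round numbers chosen near the TV and newspaper averages, not values from the data:

```python
def team_links(n):
    """Number of co-worker pairs in a team of n people: n(n-1)/2."""
    return n * (n - 1) // 2

# (credited creators, ads) per medium, copied from the table.
table = {"TV": (12135, 1159), "Radio": (1185, 194),
         "Newspaper": (6152, 1226), "Magazine": (2284, 517)}
averages = {medium: round(credits / ads, 2)
            for medium, (credits, ads) in table.items()}
print(averages)

# Illustrative team sizes only: doubling 5 to 10 multiplies links by 4.5.
print(team_links(10), team_links(5))  # 45 10
```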