Linda works for a company called Microsegment that has adopted collaborative web tools and social applications to enable an "Enterprise 2.0" work environment. She can complete tasks, manage projects, communicate with colleagues, and review materials online from anywhere using lightweight browser-based tools such as Trac, wikis, blogs and RSS feeds. This has improved productivity, communication and collaboration within the company and with clients. Microsegment aims to help other companies implement similar social and collaborative solutions.
Activity-Based Advertising: Techniques and Challenges, Bo Begole
The document discusses activity-based advertising techniques and challenges. It describes using a person's physical context and activities to infer their interests and times of receptiveness in order to provide targeted ads. Examples are given of inferring activities from location data and predicting activities based on variables like previous activities, time of day, and location. The document also discusses opportunities for research in improving interest and activity modeling, ad placement optimization, and addressing privacy and uncertainty issues in activity streams.
Challenges of Interaction Design for
Clothes Fitting Room Technologies.
This paper uncovers issues in the design of camera-based technologies
to support retail shopping in a physical store, specifically clothes shopping.
An emerging class of technology is targeting the enhancement of retail shopping,
including the trying on of clothing. Designing such systems requires careful
considerations of physical and electronic design, as well as concerns about
user privacy. We explore the entire design cycle using a technology concept
called the Responsive Mirror through its conception, prototyping and evaluation.
The Responsive Mirror is an implicitly controlled video technology for
clothes fitting rooms that allows a shopper to directly compare a currently worn
garment with images from the previously worn garment. The orientation of images
from past trials is matched to the shopper’s pose as he moves. To explore
the tension between privacy and publicity, the system also allows comparison to
clothes that other people in the shoppers’ social network are wearing. A user
study elicited a number of design tradeoffs regarding privacy, adoption, benefits
to shoppers and merchants and user behaviors in fitting rooms.
The document discusses challenges facing an insurance company's marketing officer. It identifies competitors entering the market, agents pushing for higher commissions, and customers expecting more services at lower prices as key issues. The marketing officer proposes developing capabilities to better understand individual customers and households to improve cross-selling and profitability analysis at the customer level. This would allow prioritizing the most profitable customers and optimizing resources. The officer outlines calculating customer lifetime value and profit concentration to inform these strategies.
Linda, the shop manager, wants to better monitor her employees' performance and activities. She plans to collect records of all customer interactions from various systems to create a timeline of each agent's daily activities. This would allow Linda to analyze utilization, schedule efficiency, and compare performance across agents. With real data, Linda can provide targeted advice to help agents improve and make decisions to optimize the shop's workload and recognize top performers.
NIA 2010 Q1-R00.91 - datasheet - english, Csaba Kiss
This document summarizes infrastructure data for 3,047 Hungarian postcodes collected by Microsegment Zrt. It includes tables listing the number of postcodes by county and collections of data on various types of infrastructure elements within postcode areas, such as highways, residential parks, post offices, hotels, petrol stations, hospitals, ATMs, and railway access. Each collection includes information on data availability, date, definition, and domain values.
Corporate Needs-Based Segmentation (CNBS) was developed by Tudásbánya Kft to segment corporate customers for mobile service providers. It delivered CNBS to 4 EU countries from 2005-2010. The document discusses how CNBS can segment customers based on their needs to improve retention, targeting, and profitability. It provides examples of segment profiles and strategies to focus on acquisition, cross-sell, and increasing customer value for different segments.
This document discusses contextual intelligence and the challenges of building digital assistants. It summarizes:
1) Contextual intelligence aims to understand relationships and utility to achieve goals with available resources, like practical "street smarts". Current digital assistants still face problems with privacy, missing information, latencies, accuracy, and providing truly actionable information.
2) Technologies alone cannot solve these problems - it requires understanding user concerns, capabilities for human conversational repair, giving users control of their data models, adjusting expectations based on application needs, and discerning what information a user already knows or what will help achieve their goals.
3) The grand challenge is for HCI research methods to create the knowledge needed to emulate human contextual intelligence
The document discusses a telco business model shifting from primarily offering a small number of popular products and services to large customer bases, to offering a significantly higher number of niche products and services each appealing to relatively small customer communities, known as the "long tail" business model. It notes managing the high number and variations of products under this model requires new capabilities around scalable product development, low production costs, easy search and purchase processes, and understandable pricing and billing.
2. Corpus
http://hu.wikipedia.org/wiki/Korpusz:
– Corpus is a linguistics term meaning the body of texts that represent the variety of a given language used at a given point in time.
– The word comes from the Latin corpus (body) and is used in the sense of a "body of language", the totality of a language.
– Uses of a language corpus include, for example, compiling dictionaries and analyzing the characteristics of a language.
– An important consideration when building one is that, as far as possible, forms of the language used in different periods (new and archaic) should not be mixed in it.
– With the spread of information technology it is becoming ever easier to build corpora containing very large amounts of natural text; digitized encyclopedias, Wikipedia, or a given set of web pages (e.g. the press) can be used for this purpose.
http://corpus.nytud.hu/mnsz/:
– A corpus is a collection of actually occurring written or transcribed spoken language data. The texts are selected and organized according to some criterion. It does not necessarily contain whole texts, and it is not merely a repository of texts: it also contains their bibliographic data and marks up structural units (paragraph, sentence). The MNSZ (Magyar Nemzeti Szövegtár, the Hungarian National Corpus) aims to be a general-purpose representative corpus of present-day standard written Hungarian.
2011.02.01. www.microsegment.hu 2
3. Microsegment Corpus
First (most important) source:
– Webcorpus:
http://mokk.bme.hu/resources/webcorpus/
Halácsy Péter, Kornai András, Németh László, Rung András, Szakadát István, Trón Viktor: Creating open language resources for Hungarian. In: Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC2004), 2004.
Kornai, A., Halácsy, P., Nagy, V., Oravecz, Cs., Trón, V., and Varga, D. (2006): Web-based frequency dictionaries for medium density languages. In: Proceedings of the 2nd International Workshop on Web as Corpus, edited by Adam Kilgarriff and Marco Baroni, ACL-06, pages 1-9.
Second most important source:
– Texts of the Hungarian Wikipedia (April 2010)
Further sources:
– www.fn.hu
– www.hvg.hu
– www.mti.hu
Candidate sources:
– Every publicly accessible digital Hungarian-language source
4. How it is built / What we use it for
Source text
– Currently able to accept a few formats (txt, pdf, csv, etc.)
Processing
– Spell checking
– Several kinds of tokenization and other processing
Post-processing
– Statistics
– Storage
– Cross-checks (text, dictionary) and "Auto-Tag"-ging
Use
– Text mining projects
– Data cleansing (Data Improver)
– Other analyses (community analyses, thematization, synonyms, trends)
– The search engine of our own knowledge base
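The processing and post-processing steps listed on this slide can be illustrated with a minimal sketch. It is not the deck's actual tooling: the regular-expression tokenizer stands in for a real tokenizer, the dictionary is a toy set, and the names `tokenize`, `cross_check` and `token_stats` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a token is a run of letters (accented ones included) or a run of digits.
TOKEN_RE = re.compile(r"[^\W\d_]+|\d+")

def tokenize(text):
    """Split raw text into lowercase tokens."""
    return [m.group(0).lower() for m in TOKEN_RE.finditer(text)]

def cross_check(tokens, dictionary):
    """Text-versus-dictionary check: separate known tokens from unknown ones."""
    known = [t for t in tokens if t in dictionary]
    unknown = [t for t in tokens if t not in dictionary]
    return known, unknown

def token_stats(tokens):
    """Descriptive statistics: how often each token occurs."""
    return Counter(tokens)

sample = "A korpusz szövegek összessége. A korpusz 2010-ben bővült."
dictionary = {"a", "korpusz", "szövegek", "összessége"}  # toy dictionary

tokens = tokenize(sample)
known, unknown = cross_check(tokens, dictionary)
stats = token_stats(tokens)
print(unknown)           # tokens missing from the toy dictionary
print(stats["korpusz"])  # frequency of one token
```

Tokens that fail the dictionary check are the natural candidates for spell checking or for being added to the corpus as new entries.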
6. Expanding the Microsegment Corpus
Version 01.00 (2010.04.10)
– Content: Webcorpus, Wiki headwords, BM street names, profanities, public places, Hungarian given names, Hungarian settlement names, Hungarian surnames
– Structure: lemma
– Method: application of Hunspell
Version 01.15 (2010.08.10)
– Content: Wiki Hun 2010.04, eBooks, www.mti.hu 2004-2010
Version 01.20 (2010.10.10)
– Content: fn.hu (1), American given names
– Structure: descriptive statistics for tokens and lemmas
– Method: application of Huntoken
Version 01.30 (2010.10.20)
– Content: numerals (Arabic and Roman)
– Structure: NER, Auto-Tags
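Version 01.30 adds Arabic and Roman numerals together with Auto-Tags. How such a tag could be assigned is sketched below. The tag names and the `auto_tag` function are hypothetical, and the rule set is an assumption rather than the deck's own implementation; the Roman-numeral pattern covers the values 1 to 3999.

```python
import re

# Standard pattern for Roman numerals 1..3999 (uppercase only).
ROMAN_RE = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")
ARABIC_RE = re.compile(r"[0-9]+")

def auto_tag(token):
    """Return a hypothetical tag for a single token."""
    if ARABIC_RE.fullmatch(token):
        return "ARABIC_NUMERAL"
    # The pattern also matches the empty string, so require a non-empty token.
    if token and ROMAN_RE.fullmatch(token):
        return "ROMAN_NUMERAL"
    return "WORD"

for tok in ["2010", "XIV", "IIII", "ház"]:
    print(tok, auto_tag(tok))
```

Some ordinary words, such as "MIX", are also well-formed Roman numerals, so a production tagger would need context to disambiguate them.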
7. Cross-occurrences of tokens by source
Microsegment Corpus 01.30 (previous release)
Rows and columns use the same numbering:
(1) Microsegment Corpus 1.0
(2) Arabic numerals
(3) Roman numerals
(4) American female given names
(5) American male given names
(6) eBooks
(7) www.fn.hu (fn.hu (1))
(8) www.mti.hu
(9) Wiki Hun 2010.04
Src |       (1) |       (2) |       (3) |       (4) |       (5) |       (6) |       (7) |       (8) |       (9)
(1) | 5 600 791 |       713 |       160 |     1 252 |       484 |   864 561 |    72 757 |    75 303 |   929 806
(2) |       713 |     2 999 |       387 |        50 |       242 |       387 |        50 |        77 |       242
(3) |       160 |       387 |     3 999 |         3 |   468 783 |       100 |        30 |        51 |       163
(4) |     1 252 |        50 |         3 |     4 275 |       331 |     1 923 |       328 |       484 |     2 279
(5) |       484 |       242 |   468 783 |       331 |     1 219 |     1 022 |       281 |       398 |     1 096
(6) |   864 561 |       387 |       100 |     1 923 |     1 022 | 1 308 703 |    59 026 |    61 970 |   468 783
(7) |    72 757 |        50 |        30 |       328 |       281 |    59 026 |    79 283 |    31 191 |    64 486
(8) |    75 303 |        77 |        51 |       484 |       398 |    61 970 |    31 191 |    80 773 |    69 541
(9) |   929 806 |       242 |       163 |     2 279 |     1 096 |   468 783 |    64 486 |    69 541 | 1 131 283
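A symmetric matrix of this kind can be computed from per-source token sets. The sketch below assumes that each cell counts the distinct tokens occurring in both sources and that the diagonal is the number of distinct tokens in a source; the slide does not spell this out. The source names and tokens are toy data.

```python
from itertools import combinations_with_replacement

def cross_occurrences(sources):
    """Return {(a, b): number of distinct tokens shared by sources a and b}."""
    table = {}
    for a, b in combinations_with_replacement(list(sources), 2):
        shared = len(sources[a] & sources[b])  # set intersection
        table[(a, b)] = shared
        table[(b, a)] = shared                 # the matrix is symmetric
    return table

sources = {
    "corpus": {"ház", "kar", "szív", "mag"},
    "ebooks": {"ház", "szív", "levél"},
    "roman": {"i", "ii", "iii"},
}
table = cross_occurrences(sources)
print(table[("corpus", "corpus")])  # diagonal: size of the source itself
print(table[("corpus", "ebooks")])  # tokens present in both sources
```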
8. Number of new tokens by source
Microsegment Corpus 01.30 (previous release)
Source | Date | New tokens
Microsegment Corpus 1.0 | 2010.04.10 | 5 600 791
Wiki Hun - 2010.04 | 2010.08.10 | 201 477
eBooks | 2010.08.27 | 389 673
mti.hu | 2010.08.31 | 2 592
American male given names | 2010.10.10 | 113
American female given names | 2010.10.10 | 1 851
fn.hu (1) | 2010.10.17 | 4 584
Arabic numerals | 2010.10.20 | 2 207
Roman numerals | 2010.10.20 | 3 770
[Bar chart, logarithmic scale from 1 to 10 000 000: new tokens per source, showing the same values as the table.]
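One plausible reading of "new tokens" is that each source, loaded in date order, contributes only the tokens not yet present in the corpus. The figures are compatible with this: Roman numerals have 3 999 tokens in the cross-occurrence matrix but only 3 770 new ones. A minimal sketch under that assumption, with the hypothetical function `new_tokens_per_source` and toy data:

```python
def new_tokens_per_source(batches):
    """batches: list of (name, token_set) in load order.

    Returns [(name, count of tokens not seen in any earlier batch)].
    """
    seen = set()
    result = []
    for name, tokens in batches:
        fresh = tokens - seen      # tokens the corpus does not contain yet
        result.append((name, len(fresh)))
        seen |= fresh              # grow the corpus
    return result

batches = [
    ("corpus", {"ház", "kar"}),
    ("wiki", {"ház", "szív"}),
    ("ebooks", {"szív", "kar", "levél"}),
]
counts = new_tokens_per_source(batches)
print(counts)
```

Because the count depends on load order, the same source would report a different number of new tokens if it were loaded earlier or later.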
9. Expanding the Microsegment Corpus
Version 01.00 (2010.04.10)
– Content: Webcorpus, Wiki headwords, BM street names, profanities, public places, Hungarian given names, Hungarian settlement names, Hungarian surnames
– Structure: lemma
– Method: application of Hunspell
Version 01.15 (2010.08.10)
– Content: Wiki Hun 2010.04, eBooks, www.mti.hu 2004-2010
Version 01.20 (2010.10.10)
– Content: fn.hu (1), American given names
– Structure: descriptive statistics for tokens and lemmas
– Method: application of Huntoken
Version 01.30 (2010.10.20)
– Content: numerals (Arabic and Roman)
– Structure: NER, Auto-Tags
Version 01.31 (2010.11.20)
– Content: fn.hu (2)
Version 01.32 (2011.01.06)
– Content: fn.hu (3)
11. Number of new tokens by source
Microsegment Corpus 01.32
Source | Date | New tokens
Microsegment Corpus 1.0 | 2010.04.10 | 5 600 791
Wiki Hun - 2010.04 | 2010.08.10 | 201 477
eBooks | 2010.08.27 | 389 673
mti.hu | 2010.08.31 | 2 592
American male given names | 2010.10.10 | 113
American female given names | 2010.10.10 | 1 851
fn.hu (1) | 2010.10.17 | 4 584
Arabic numerals | 2010.10.20 | 2 207
Roman numerals | 2010.10.20 | 3 770
fn.hu (2) | 2010.11.20 | 3 661
fn.hu (3) | 2011.01.06 | 14 631
[Bar chart, logarithmic scale from 1 to 10 000 000: new tokens per source, showing the same values as the table.]
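As a consistency check, the per-source counts for release 01.32 can be summed and compared with the token total of 6 225 350 reported on slide 12. The dictionary below simply restates the slide's figures; the variable names are illustrative.

```python
# New tokens per source for Microsegment Corpus 01.32, as listed on the slide.
new_tokens = {
    "Microsegment Corpus 1.0": 5_600_791,
    "Wiki Hun - 2010.04": 201_477,
    "eBooks": 389_673,
    "mti.hu": 2_592,
    "American male given names": 113,
    "American female given names": 1_851,
    "fn.hu (1)": 4_584,
    "Arabic numerals": 2_207,
    "Roman numerals": 3_770,
    "fn.hu (2)": 3_661,
    "fn.hu (3)": 14_631,
}

total = sum(new_tokens.values())
print(total)  # 6225350, the token total given on slide 12
```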
12. Number of tokens by initial letter (6 225 350 in total)
Number of lemmas by initial letter (1 352 386 in total)
Initial | Tokens | Lemmas
A | 3.75% | 3.63%
Á | 1.75% | 1.80%
B | 5.73% | 5.78%
C, CS | 2.98% | 3.10%
D, DZ, DZS | 2.21% | 2.07%
E | 4.50% | 4.02%
É | 1.39% | 1.40%
F | 6.86% | 7.09%
G, GY | 3.03% | 3.17%
H | 5.07% | 5.02%
I | 2.28% | 2.16%
Í | 0.24% | 0.25%
J | 1.42% | 1.30%
K | 10.18% | 10.30%
L, LY | 4.20% | 4.01%
M | 7.26% | 6.78%
N, NY | 2.54% | 2.43%
O | 1.39% | 1.40%
Ó | 0.23% | 0.29%
Ö | 1.16% | 1.15%
Ő | 0.23% | 0.25%
P | 4.43% | 4.73%
Q | 0.02% | 0.00%
R | 3.48% | 3.51%
S, SZ | 8.54% | 8.71%
T, TY | 7.10% | 7.57%
U | 0.53% | 0.48%
Ú | 0.45% | 0.49%
Ü | 0.64% | 0.63%
Ű | 0.08% | 0.10%
V | 5.03% | 5.21%
W | 0.22% | 0.12%
X | 0.01% | 0.00%
Y | 0.02% | 0.00%
Z, ZS | 1.03% | 1.05%
[Two bar charts of counts per initial letter: token axis from 0 to 700 000, lemma axis from 0 to 160 000.]
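A distribution like this can be obtained by grouping tokens on their first character. The sketch assumes, as the "C, CS" row suggests, that Hungarian digraphs are counted under their base letter while accented vowels stay separate; `initial_shares` and the sample tokens are illustrative only.

```python
from collections import Counter

def initial_shares(tokens):
    """Return {initial letter: percentage of tokens starting with it}."""
    counts = Counter(t[0].upper() for t in tokens if t)
    total = sum(counts.values())
    return {letter: 100 * n / total for letter, n in counts.items()}

tokens = ["kar", "kéz", "csapat", "cél", "ár", "anya"]
shares = initial_shares(tokens)
for letter in sorted(shares):
    print(letter, f"{shares[letter]:.2f}%")
```

Here "csapat" lands under C together with "cél", while "ár" and "anya" are counted separately under Á and A.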
13. The most frequent lemmas
Rank Lemma Occurrences Rank Lemma Occurrences Rank Lemma Occurrences
1 én 858 34 nyelv 324 67 méret 279
2 ezer 717 35 nap 319 68 szám 277
3 egy 645 36 gyermek 318 69 áll 277
4 három 540 37 út 316 70 érték 275
5 négy 520 38 társ 313 71 falu 275
6 láb 491 39 kilenc 312 72 szülő 272
7 öt 491 40 ember 311 73 rokon 271
8 maga 471 41 apa 309 74 isten 271
9 éves 468 42 sok 308 75 előd 271
10 hat 462 43 kor 308 76 lány 271
11 hét 445 44 föld 306 77 mű 269
12 kettő 437 45 tanár 306 78 nő 269
13 oldal 411 46 testvér 305 79 tesz 267
14 száz 392 47 óra 304 80 ország 266
15 jó 380 48 fal 303 81 világ 265
16 kar 376 49 csapat 302 82 család 265
17 szív 359 50 anya 302 83 jegy 265
18 nyolc 358 51 sejt 299 84 sor 264
19 év 356 52 levél 295 85 kerék 264
20 barát 353 53 szint 294 86 cél 264
21 fej 344 54 város 294 87 hely 263
22 tíz 344 55 állat 294 88 rész 263
23 fog 344 56 ár 292 89 lépés 262
24 millió 342 57 anyag 291 90 arc 262
25 szó 342 58 vár 288 91 gyerek 261
26 ház 339 59 kéz 287 92 név 261
27 nagy 336 60 ér 286 93 úr 261
28 szem 334 61 él 285 94 adat 260
29 szomszéd 330 62 saját 285 95 nyom 259
30 mag 330 63 szer 284 96 munka 259
31 tag 326 64 lélek 284 97 nemzet 259
32 szín 326 65 atya 280 98 ügy 259
33 tér 324 66 test 279 99 mondat 258