Mud flash

Transcript

  • 1. MUD 2010: Workshop on Mining Unstructured Data. Nicolas Bettenburg, Bram Adams. Software Analysis & Intelligence Lab. http://sailhome.cs.queensu.ca/mud/
  • 2. Unstructured Data?
  • 3. EXAMPLE OF STRUCTURED DATA
    <bug>
      <bug_id>45411</bug_id>
      <creation_ts>2000-07-13 13:46:00 -0700</creation_ts>
      <short_desc>Drag, hover over tab should open tab</short_desc>
      <delta_ts>2009-12-04 13:03:48 -0800</delta_ts>
      <reporter_accessible>1</reporter_accessible>
      <cclist_accessible>1</cclist_accessible>
      <classification_id>2</classification_id>
      <classification>Client Software</classification>
      <product>SeaMonkey</product>
      <component>Tabbed Browser</component>
      <version>Trunk</version>
      <rep_platform>All</rep_platform>
      <op_sys>All</op_sys>
      <bug_status>RESOLVED</bug_status>
      <resolution>WONTFIX</resolution>
      <priority>--</priority>
      <bug_severity>enhancement</bug_severity>
      <target_milestone>---</target_milestone>
      <blocked>121292</blocked>
      ...
    </bug>
  • 4. So What? EXAMPLES OF UNSTRUCTURED DATA: web-sites, diagrams, requirements documents, social media, documentation, help files, IRC chat, source code, comments, bug reports, captchas, commit logs, email, system logs
  • 5. SE data without explicit format: COMPLEXITY, DIVERSITY, IMPERFECTION
  • 6. Unstructured Data is COMPLEX ... natural language, rich semantics, no authoritative formats
    Example 1: "S10000: The SQLite library shall translate all high-level SQL statements into low-level I/O calls to persistent storage. The essential task of every SQL database engine is to translate human-readable SQL statements into sequences of I/O operations."
    Example 2 (French): "Bonjour, ces deux problèmes sont reliés. En effet, les paquets Ubuntu ne comportent pas les dépendances (e.g. libpng, libjpeg, libglew, ...). Si Tulip ne peut afficher les fichiers PNG, c'est sans doute car le paquet libpng est manquant sur le système. Nous travaillons à ajouter les dépendances sur les paquets, mais ceci n'arrivera probablement pas avant Tulip 3.5. Cordialement, Charles."
    (English: "Hello, these two problems are related. The Ubuntu packages do not include the dependencies (e.g. libpng, libjpeg, libglew, ...). If Tulip cannot display PNG files, it is probably because the libpng package is missing from the system. We are working on adding the dependencies to the packages, but this will probably not happen before Tulip 3.5. Regards, Charles.")
  • 7. ... AND DIVERSE (Eclipse #150222)
    In this report, you have defined a parameter named blocksize, which is given a value of "7|D|1|D". In open script of data set, there are below lines code:
    <script begin>
    token=Packages.java.util.StringTokenizer(params["blocksize"],"|");
    vec=new Packages.java.util.Vector();
    while(token.hasMoreTokens()){
    vec.addElement(token.nextToken());
    }
    params["DateRange"]=java.lang.Integer.parseInt(vec.elementAt(0));
    </script end>
    Since the value of params["blocksize"] is "7|D|1|D", vec.elementAt(0) is "7", and then it can not be parsed to int value. In 1.0.1, the value of params["blocksize"] might be 7|D|1|D, so it can be parsed to int value of 7.
  • 8. ... AND IMPERFECT: inconsistency, ambiguity, incorrect informal language
    From: john.doe@gmail.com
    To: devlist@sourceforge.net
    Subject: BSOD WTF!!??!!
    Hi devs, found a very badass bug in JDBC-RPC [...]ol. OMG can't believe you missed that. I get a bsod after pw, pls fix :'( JD $$$
  • 9. So What? EXAMPLES OF UNSTRUCTURED DATA: web-sites, diagrams, requirements documents, social media, documentation, help files, IRC chat, source code, comments, bug reports, captchas, commit logs, email, system logs
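
The contrast between slide 3 and slide 7 can be illustrated with a small sketch. It is not taken from the workshop material: the function names `read_structured` and `extract_code` are hypothetical, both inputs are abbreviated copies of the slide text, and the regular expression is only one possible heuristic that works here because this particular reporter happened to wrap the code in `<script begin>` / `</script end>` markers.

```python
import re
import xml.etree.ElementTree as ET

# Abbreviated copy of the structured bug record shown on slide 3.
STRUCTURED = """<bug>
  <bug_id>45411</bug_id>
  <short_desc>Drag, hover over tab should open tab</short_desc>
  <product>SeaMonkey</product>
  <bug_status>RESOLVED</bug_status>
  <resolution>WONTFIX</resolution>
</bug>"""

# Abbreviated copy of the free-text report shown on slide 7.
UNSTRUCTURED = (
    'In open script of data set, there are below lines code: '
    '<script begin> '
    'token=Packages.java.util.StringTokenizer(params["blocksize"],"|"); '
    'vec=new Packages.java.util.Vector(); '
    '</script end> '
    'Since the value of params["blocksize"] is "7|D|1|D", '
    'vec.elementAt(0) is "7".'
)


def read_structured(xml_text):
    """Structured data: every field has an explicit name, so a parser suffices."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


# Assumption: code is delimited by the markers this reporter used.
# Other reports will use other conventions, or none at all.
SCRIPT_BLOCK = re.compile(r"<script begin>(.*?)</script end>", re.DOTALL)


def extract_code(text):
    """Unstructured data: fall back to a pattern that may or may not match."""
    return [match.strip() for match in SCRIPT_BLOCK.findall(text)]


bug = read_structured(STRUCTURED)
snippets = extract_code(UNSTRUCTURED)

print(bug["bug_id"], bug["resolution"])
print(len(snippets), "code snippet(s) found")
```

The structured record yields its fields by name with no guessing, while the free-text report only gives up its code because of a convention that nothing enforces. A report without those markers returns an empty list.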
