Case Study: Building the ASCE Thesaurus

Xi Van Fleet of the American Society of Civil Engineers (ASCE) shares her experience with rule building using Access Innovations' Data Harmony machine-aided indexing software and free online resources.

Transcript

  • 1. Xi Van Fleet, Senior Manager of Information Services, Publishing Technology Department, Publication Division, American Society of Civil Engineers
  • 2. Publications of the American Society of Civil Engineers: A Brief History. The American Society of Civil Engineers (ASCE) was founded in 1852. We are the oldest engineering society in the United States. Our first publication, Transactions of the American Society of Civil Engineers, was published in 1872; it is the predecessor of our journals. The first monograph was published in 1892.
  • 3. Publications of the American Society of Civil Engineers Today: leading publisher in civil engineering; 34 peer-reviewed journals; books and standards; conference proceedings; magazines
  • 4. Online Civil Engineering Knowledge Environment: a full-text database with 250+ ASCE e-book titles, 65 ASCE standards, proceedings volumes with 42,000 papers from 2000 to present, and peer-reviewed journals with 60,000 papers from 1983 to present; a bibliographic database with more than 220,000 records, giving complete coverage of ASCE publications
  • 5. ASCE Taxonomy: content-driven; overlaps with other engineering disciplines, e.g., chemical engineering, mechanical engineering, materials engineering; strong on core disciplines, e.g., structural engineering, geotechnical engineering; weaker on peripheral disciplines, e.g., aerospace engineering, energy engineering
  • 6. Project History: The taxonomy project started in 2009. Access Innovations created the first version based on the existing CEDB subject headings and on data mined from the content. The draft contained over 30,000 terms. We divided it into three individual taxonomies: technical topics, geographic terms, and ASCE corporate. In-house subject experts from different disciplines were invited to validate the technical topics.
  • 7. “Final” Version of the Taxonomy of Technical Topics: preferred terms: 2,440; equivalent terms: 3,167; top terms: 22; terms with “Related Terms”: 488; terms with “Non-Preferred Terms”: 1,320
  • 8. Prepare the ASCE Taxonomy for Machine-Aided Indexing (MAI) • Taxonomy enrichment • Rule building
  • 9. Taxonomy Enrichment: add equivalent/non-preferred terms • Alternative spellings: Analysis – Analyses; Modeling – Modelling • Irregular word forms: Curricula – Curriculums • Synonyms: Flood – Inundation; Health care facilities – Hospitals, Nursing homes… • Acronyms: Automated people movers – APM • Term variations: Bedforms, Bed-forms, Bed forms
  • 10. Rule Building: Rules teach MAIStro to think like humans by providing it with context, logic, and instructions. Rule types: simple rules, simple conditional rules, and complex conditional rules.
  • 11. Resources Used
  • 12. Equivalent/Non-Preferred Terms: Some synonyms are obvious and easy, e.g., preferred term: Driver behavior
  • 13. Equivalent/Non-Preferred Terms: How to find synonyms. Some synonyms are “hidden”, e.g., Agricultural wastes
  • 14. Equivalent/Non-Preferred Terms: How to find synonyms. Preferred term: Public health and safety
  • 15. Equivalent/Non-Preferred Terms: How to find synonyms
  • 16. Equivalent/Non-Preferred Terms: How to find synonyms. Preferred term: Public health and safety. Note: in our content, “health” can also refer to a structure, a river, or the environment.
  • 17. Equivalent/Non-Preferred Terms: How to find synonyms. Preferred term: Intelligent transportation systems
  • 18. Equivalent/Non-Preferred Terms: How to find synonyms. Preferred term: High-rise buildings, e.g., Spring Temple Buddha, Tokyo Spring Tree. Preferred term: Developing countries. ASCE taxonomy term: Civil engineering landmarks; resource: ASCE Civil Engineering Landmarks award list
  • 19. Equivalent/Non-Preferred Terms: Irregular words (think about variation). Preferred term: Labor; non-preferred term: Labour. Preferred term: Structural behavior; non-preferred term: Structural behaviour. Preferred term: Multi-story buildings; non-preferred term: Multi-storey buildings. Preferred term: Fiber reinforced polymer; non-preferred term: Fibre reinforced polymer
  • 20. Equivalent/Non-Preferred Terms: Phrase terms with variations (think about variation). Preferred term: Lightweight concrete; non-preferred terms: Light-weight concrete, Light weight concrete. Preferred term: Design/Bid/Build; non-preferred terms: Design-bid-build, Design bid build, D/B/B, DBB, D-B-B
  • 21. Equivalent/Non-Preferred Terms: Terms with prefixes (think about variation). Bio + preferred term: Biobinders, Biofuels, Biocement, Biokinetics, Biofilters, Biofouling, Biogrouting, Bioleaching… Post + preferred term: Postearthquakes, Postcombustion, Postcracking. Other prefixes: Pre, Micro, Macro, Super, Multi, Non, Off…
  • 22. Equivalent/Non-Preferred Terms: Acronyms (be careful with acronyms). Preferred term: Magnetic levitation trains; non-preferred term: Maglev. Preferred term: Automated people movers; non-preferred term: APM. Preferred term: Air traffic control; acronym: ATC. ATC can also stand for apparent tardiness cost, Applied Technology Council…, so it needs disambiguation. Preferred term: Intelligent transportation systems; acronym: ITS
  • 23. Create Rulebase: Text that matches. MAIStro automatically creates a text-to-match (TTM) rule for every term, both preferred and non-preferred. TTM works for many terms, where the text and the term are identical: Flash floods – Flash floods; Continuing education – Continuing education; Ridership – Ridership; Hydraulic engineering – Hydraulic engineering
  • 24. Create Rulebase: Text that doesn't quite match (variations). Noun vs. verb vs. adjective vs. adverb. Preferred term: Corrosion; variants: Corrosive, Corrosiveness, Corrosivity, Corroding, Corroded, Corrodible, Corrodibility… Simple rules: Corros* USE Corrosion, Corrod* USE Corrosion
  • 25. Create Rulebase: Text that doesn't quite match (variations). Preferred term: Lateral loads; variations: Lateral loading, Laterally loaded… This needs a simple conditional rule: load* IF (WITH "lateral*") USE Lateral loads ENDIF
  • 26. Create Rulebase: Text that doesn't quite match (variations). Variations of “Span bridges”: Bridge* IF (NEAR "span" OR NEAR "short-span" OR NEAR "long-span" OR NEAR "single-span" OR NEAR "multi-span" OR NEAR "multiple-span" OR NEAR "four-span" OR NEAR "three-span" OR NEAR "one-span" OR NEAR "continuous-span" OR NEAR "simple-span" OR NEAR "large-span") USE Span bridges ENDIF
  • 27. Create Rulebase: Find hyphenated terms in our content
  • 28. Create Rulebase: Text that doesn't quite match (whole vs. parts). Preferred term: Structural analysis. Analy* IF (WITH "structur*" OR WITH "load" OR WITH "loads") IF (NEAR "arch*" OR WITH "column*" OR NEAR "bar" OR NEAR "bars" OR NEAR "bar's" OR NEAR "beam" OR NEAR "beams" OR NEAR "strut" OR NEAR "struts" OR NEAR "compression member*" OR NEAR "tie" OR NEAR "ties" OR NEAR "tie rod" OR NEAR "tie-rod" OR NEAR "tie rods" OR NEAR "tie-rods" OR NEAR "eyebar*" OR NEAR "guy-wire*" OR NEAR "guy wire*" OR NEAR "suspension cable*" OR NEAR "wire rope*" OR NEAR "angle section*" OR NEAR "connect*" OR NEAR "coupl*" OR NEAR "diaphragm*" OR NEAR "flange*" OR NEAR "frame*" OR NEAR "bent" OR NEAR "bents" OR NEAR "girder*" OR NEAR "hollow section*" OR NEAR "hollow structural section*" OR NEAR "joint*" OR NEAR "joist*" OR NEAR "membrane*" OR NEAR "panel" OR NEAR "plate" OR NEAR "slab*" OR NEAR "stud" OR NEAR "studs" OR NEAR "tendon*" OR NEAR "tensile member*" OR NEAR "truss*" OR NEAR "tube*" OR NEAR "wall*" OR NEAR "gable*" OR NEAR "wall section*" OR MENTIONS "structural failure*" OR MENTIONS "building failure*") USE Structural analysis ENDIF
  • 29. Create Rulebase - To Disambiguate: A right match that is wrong. Bridge the gap; Raising the bar; Foundation: a solid foundation, a firm foundation, research foundation…; Toll: Toll Brothers, human toll, take a toll… Use NULL rules.
  • 30. Create Rulebase: Phrases that contain more than one term. Text: Continuous Multispan Concrete Girder Highway Bridges. Preferred terms: Continuous bridges; Span bridges; Concrete bridges; Girder bridges; Highway bridges
  • 31. Create Rulebase - To Disambiguate: Preferred term: Wells (noun vs. adverb). Well* IF (WITH "hydraul*" OR WITH "Hydro*" OR WITH "Aquifer*" OR WITH "Multiaquifer*" OR WITH "discharg*" OR WITH "pump*" OR WITH "stilling" OR WITH "flow*" OR WITH "water*" OR WITH "groundwater" OR WITH "Recirculation" OR WITH "Artesian") USE Wells ENDIF
  • 32. Create Rulebase - To Disambiguate: Foundation* IF (NOT (NEAR "success*" OR NEAR "research" OR NEAR "national science" OR NEAR "grant*" OR NEAR "president*" OR NEAR "ASCE foundation*" OR AROUND "engineering foundation" OR NEAR "economic" OR NEAR "prize*" OR NEAR "award*" OR NEAR "education*" OR NEAR "campaign*" OR AROUND "reason foundation" OR AROUND "national science foundation" OR AROUND "nsf" OR NEAR "job*" OR NEAR "partner*" OR NEAR "organization*" OR NEAR "scholar*")) IF (WITH "bridge*" OR AROUND "bridge foundation*") USE Bridge foundations ENDIF IF (WITH "dam" OR WITH "dams" OR AROUND "dam foundation*") USE Dam foundations ENDIF IF (NEAR "deep" OR AROUND "deep foundation*") USE Deep foundations …
  • 33. Disambiguation: If it is impossible to write a rule for a term, it may not be a good term. Example: Bubbles (water bubbles, air bubbles, gas bubbles, financial bubbles…) appear in fluid dynamics, waste treatment, materials science, soil mechanics… Clue: if you have trouble placing a term in the taxonomy, you are likely to have trouble creating rules for it.
  • 34. Create Rulebase: Truncate text with care. Test* matches test, tests, testing, testings, testify, testimony, testosterone. Wave* matches waves, wavelength, wave length, wavelet, wavefront, waverider, waveguide…
  • 35. Create Rulebase: Text that hardly matches (need specifics). Preferred term: Workplace discrimination. Discriminat* IF (WITH "age" OR WITH "minority" OR WITH "racial" OR WITH "race" OR WITH "disabilit*" OR WITH "senior" OR WITH "older" OR WITH "old" OR WITH "women" OR WITH "woman" OR WITH "diversity" OR WITH "dispute" OR WITH "equal*" OR WITH "female" OR WITH "male" OR WITH "workplace" OR WITH "African*" OR WITH "Hispanic") USE Workplace discrimination ENDIF
  • 36. Taxonomy Enrichment and Rule Building is a Process: another opportunity to fine-tune the taxonomy. Diffus* IF (MENTIONS "transport" OR MENTIONS "concentration" OR MENTIONS "gradient" OR MENTIONS "advective" OR MENTIONS "equilibr*" OR MENTIONS "voc" OR MENTIONS "vocs" OR MENTIONS "volatile organic compound*" OR MENTIONS "water*" OR MENTIONS "moisture" OR MENTIONS "wave*" OR MENTIONS "flow" OR MENTIONS "chemical*" OR MENTIONS "molecul*" OR MENTIONS "soil*" OR MENTIONS "waste*" OR MENTIONS "filter*" OR MENTIONS "runoff" OR MENTIONS "run-off" OR MENTIONS "jet" OR MENTIONS "turbulen*" OR MENTIONS "gas" OR MENTIONS "emission*" OR MENTIONS "emit*" OR MENTIONS "air" OR MENTIONS "oxygen" OR MENTIONS "thermal" OR MENTIONS "solute*" OR MENTIONS "chloride*" OR MENTIONS "contamin*" OR MENTIONS "pollut*" OR MENTIONS "organic" OR MENTIONS "compound*" OR MENTIONS "nitri*" OR MENTIONS "ion" OR MENTIONS "ions" OR MENTIONS "dye" OR MENTIONS "dyes" OR MENTIONS "fluid*" OR MENTIONS "channel*" OR MENTIONS "river*" OR MENTIONS "stream*" OR MENTIONS "tidal" OR MENTIONS "hydro*" OR MENTIONS "hydrau*" OR MENTIONS "lake*" OR MENTIONS "bay" OR MENTIONS "bays" OR MENTIONS "ocean*" OR MENTIONS "coast*" OR MENTIONS "sediment*" OR MENTIONS "sea" OR MENTIONS "seas" OR MENTIONS "catchment*" OR MENTIONS "reservoir*" OR MENTIONS "estuar*" OR MENTIONS "sewage*" OR MENTIONS "flood*" OR MENTIONS "porous medi*" OR MENTIONS "concrete*" OR MENTIONS "bentonite" OR MENTIONS "cement*" OR MENTIONS "clay*" OR MENTIONS "advection*" OR MENTIONS "convection*" OR MENTIONS "eddy" OR MENTIONS "eddies" OR MENTIONS "flux") IF (AROUND "voc" OR AROUND "vocs" OR AROUND "volatile organic compound*" OR AROUND "chemical*" OR AROUND "molecul*" OR AROUND "chlorid*" OR AROUND "nitri*" OR AROUND "ion" OR AROUND "ions" OR AROUND "polymer*" OR AROUND "species" OR AROUND "polyaromatic*" OR AROUND "hydrocarbon*" OR AROUND "aromatic*" OR AROUND "pah" OR AROUND "pahs" OR AROUND "dichloromethane*" OR AROUND "chloromethane*" OR AROUND "chemox") USE Diffusion (chemical) ENDIF IF (AROUND "thermo*" OR AROUND "thermal" OR AROUND "thermodiffusion") USE Diffusion (thermal) ENDIF IF (AROUND "porous" OR AROUND "porosity" OR AROUND "soil*" OR AROUND "clay*" OR AROUND "pore" OR AROUND "pores" OR AROUND "cement*" OR AROUND "concrete*" OR AROUND "bentonite") USE Diffusion (porous media) ENDIF IF (AROUND "fluid*") IF (WITH "turbulen*" OR WITH "eddy" OR WITH "eddies") USE Turbulent diffusion ELSE ENDIF IF (NOT (AROUND "voc" OR AROUND "vocs" OR AROUND "volatile organic compound*" OR AROUND "chemical*" OR AROUND "molecul*" OR AROUND "chlorid*" OR AROUND "nitri*" OR AROUND "ion" OR AROUND "ions" OR AROUND "polymer*" OR AROUND "species" OR AROUND "polyaromatic*" OR AROUND "hydrocarbon*" OR AROUND "aromatic*" OR AROUND "pah" OR AROUND "pahs" OR AROUND "dichloromethane*" OR AROUND "chloromethane*" OR AROUND "chemox" OR AROUND "thermo*" OR AROUND "thermal" OR AROUND "thermodiffusion" OR AROUND "porous" OR AROUND "porosity" OR AROUND "soil*" OR AROUND "clay*" OR AROUND "pore" OR AROUND "pores" OR AROUND "cement*" OR AROUND "concrete*" OR AROUND "bentonite" OR AROUND "fluid*" OR WITH "wave" OR WITH "waves")) USE Diffusion ENDIF ENDIF
  • 37. Taxonomy Enrichment and Rule Building is a Process • It is impossible to build perfect rules. • Rules that are too general produce noise; rules that are too granular produce misses. Try to strike a balance. • Be ready for the unexpected. Keep note of possible equivalent terms even when you are not working on the taxonomy, e.g., “ring of fire” = Earthquakes; “la nina”, “el nino”, “polar vortex” = Climate change
  • 38. Questions?
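Slides 9 and 19 to 22 all add entries to one structure: an equivalence table that maps each non-preferred form to a single preferred term. Below is a minimal Python sketch of that table, using only pairs listed on those slides. The dictionary layout and the `preferred_term` helper are hypothetical illustrations, not how Data Harmony actually stores its thesaurus.

```python
# Hypothetical equivalence table (not Data Harmony's storage format):
# non-preferred form, lowercased -> preferred term, using pairs from slides 9 and 19-22.
EQUIVALENTS = {
    "inundation": "Flood",
    "hospitals": "Health care facilities",
    "nursing homes": "Health care facilities",
    "labour": "Labor",
    "structural behaviour": "Structural behavior",
    "multi-storey buildings": "Multi-story buildings",
    "fibre reinforced polymer": "Fiber reinforced polymer",
    "light-weight concrete": "Lightweight concrete",
    "maglev": "Magnetic levitation trains",
    "apm": "Automated people movers",
}


def preferred_term(term: str) -> str:
    """Return the preferred term for `term`; anything not in the table is returned unchanged."""
    return EQUIVALENTS.get(term.strip().lower(), term)


print(preferred_term("Labour"))   # Labor
print(preferred_term("Maglev"))   # Magnetic levitation trains
print(preferred_term("Bridges"))  # Bridges
```

Because the table is keyed on the non-preferred form, each variant can point to only one preferred term.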
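The compound-word variants on slide 20 follow a regular pattern: spaced, hyphenated, and closed up. That pattern can be generated rather than typed by hand. The small Python sketch below uses `compound_variants`, a hypothetical helper. It lowercases its output on the assumption that matching is case-insensitive. It does not produce slash forms such as D/B/B or acronyms such as DBB, which still have to be entered manually.

```python
def compound_variants(parts: list[str], rest: str = "") -> list[str]:
    """Spaced, hyphenated, and closed-up spellings of a compound, plus an unchanging remainder.

    Hypothetical helper: slash forms ("D/B/B") and acronyms ("DBB") are not
    generated and still have to be added to the thesaurus by hand.
    """
    words = [p.lower() for p in parts]
    suffix = f" {rest.lower()}" if rest else ""
    return [sep.join(words) + suffix for sep in (" ", "-", "")]


print(compound_variants(["Light", "weight"], "concrete"))
# ['light weight concrete', 'light-weight concrete', 'lightweight concrete']
print(compound_variants(["Bed", "forms"]))
# ['bed forms', 'bed-forms', 'bedforms']
```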
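Slide 22 flags acronyms as risky, and two separate problems seem to be involved:

* One acronym can have several expansions (ATC).
* A short acronym can collide with an ordinary word. This is presumably why ITS is flagged, since it spells the pronoun "its".

The Python sketch below checks for both problems. The acronym table and both helper functions are hypothetical.

```python
import re

# Hypothetical acronym table based on slide 22. An acronym with several
# expansions cannot be handled by a plain USE rule; it needs a disambiguation rule.
ACRONYMS = {
    "APM": ["Automated people movers"],
    "ATC": ["Air traffic control", "Apparent tardiness cost", "Applied Technology Council"],
    "ITS": ["Intelligent transportation systems"],
}


def needs_disambiguation(acronym: str) -> bool:
    """True when the acronym has more than one known expansion."""
    return len(ACRONYMS.get(acronym.upper(), [])) > 1


def find_acronym(text: str, acronym: str) -> list[str]:
    """Case-sensitive whole-word search, so "ITS" does not match the pronoun "its"."""
    return re.findall(rf"\b{re.escape(acronym)}\b", text)


print(needs_disambiguation("ATC"))  # True
print(needs_disambiguation("APM"))  # False
print(find_acronym("The city deployed ITS on its arterials.", "ITS"))  # ['ITS']
```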
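The text-to-match rules on slide 23 can be approximated with one whole-phrase pattern per term. This Python sketch assumes that a TTM rule behaves like a case-insensitive exact-phrase match. The slides do not describe how MAIStro actually tokenizes or normalizes text, and the function names are illustrative. The second call shows why the later slides are needed: the singular "flash flood" is missed.

```python
import re


def build_ttm_rules(terms: list[str]) -> dict[str, re.Pattern]:
    """One case-insensitive, whole-phrase pattern per term: a stand-in for TTM rules."""
    return {t: re.compile(rf"\b{re.escape(t)}\b", re.IGNORECASE) for t in terms}


def match_terms(text: str, rules: dict[str, re.Pattern]) -> list[str]:
    """Return every term whose exact wording appears in `text`."""
    return [term for term, pattern in rules.items() if pattern.search(text)]


rules = build_ttm_rules(["Flash floods", "Continuing education", "Ridership", "Hydraulic engineering"])
print(match_terms("Ridership fell after the flash floods.", rules))  # ['Flash floods', 'Ridership']
print(match_terms("A flash flood closed the line.", rules))          # []
```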
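The simple rules on slide 24 rely on truncation: `Corros*` stands for any word beginning with "corros". This Python sketch translates that convention into a regular expression. It is a hypothetical emulation of the rule syntax as written on the slide, not Data Harmony's matcher.

```python
import re


def stem_pattern(stem: str) -> re.Pattern:
    """Compile a truncated stem such as "Corros*" into a case-insensitive regex."""
    if not stem.endswith("*"):
        raise ValueError(f"expected a stem ending in '*': {stem!r}")
    return re.compile(rf"\b{re.escape(stem[:-1])}\w*", re.IGNORECASE)


# The two simple rules from slide 24; both point to the same preferred term.
SIMPLE_RULES = {"Corros*": "Corrosion", "Corrod*": "Corrosion"}


def apply_simple_rules(text: str) -> set[str]:
    """Return the preferred terms whose stems occur anywhere in `text`."""
    return {term for stem, term in SIMPLE_RULES.items() if stem_pattern(stem).search(text)}


print(apply_simple_rules("Soil corrosivity was measured."))  # {'Corrosion'}
print(apply_simple_rules("The beam deflected."))             # set()
```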
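The conditional rule on slide 25 fires on `load*` only when `lateral*` is also present. The slides do not define the scope of WITH, so this Python sketch assumes it means "in the same sentence". The naive sentence splitter and the helper names are also assumptions.

```python
import re


def has_stem(text: str, stem: str) -> bool:
    """True if some word in `text` starts with `stem` (the '*' truncation)."""
    return re.search(rf"\b{re.escape(stem)}\w*", text, re.IGNORECASE) is not None


def lateral_loads_rule(text: str) -> bool:
    """Sketch of: load* IF (WITH "lateral*") USE Lateral loads ENDIF.

    Assumption: WITH is read as "in the same sentence"; the sentence split is a
    naive regex on ., ! and ?, not MAIStro's own segmentation.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return any(has_stem(s, "load") and has_stem(s, "lateral") for s in sentences)


print(lateral_loads_rule("Piles were laterally loaded in soft clay."))      # True
print(lateral_loads_rule("Lateral spreading occurred. Loads were small."))  # False
```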
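The "Span bridges" rule on slide 26 uses NEAR, which suggests a proximity window around the matched word. This Python sketch models NEAR as a distance in tokens. The window of 3 is a hypothetical value, not a documented Data Harmony setting, and only a subset of the slide's modifiers is included.

```python
import re

# A subset of the modifiers listed on slide 26.
SPAN_MODIFIERS = ["span", "short-span", "long-span", "single-span", "multi-span", "continuous-span"]


def near(text: str, stem: str, word: str, window: int = 3) -> bool:
    """True if a token starting with `stem` lies within `window` tokens of `word`.

    Assumption: NEAR is modeled as a token-distance window; window=3 is a
    hypothetical value, not a documented Data Harmony setting.
    """
    tokens = re.findall(r"[\w-]+", text.lower())
    stem_positions = [i for i, t in enumerate(tokens) if t.startswith(stem.lower())]
    word_positions = [i for i, t in enumerate(tokens) if t == word.lower()]
    return any(abs(i - j) <= window for i in stem_positions for j in word_positions)


def span_bridges_rule(text: str) -> bool:
    """Sketch of: Bridge* IF (NEAR "span" OR NEAR "short-span" OR ...) USE Span bridges ENDIF."""
    return any(near(text, "bridge", m) for m in SPAN_MODIFIERS)


print(span_bridges_rule("Seismic retrofit of long-span steel bridges"))                 # True
print(span_bridges_rule("Bridge inspections covered a span of ten years of records"))  # False
```

With `window=4`, `near` would also fire on the second sentence, where "span" has nothing to do with bridge type. This is a small instance of the noise-versus-misses balance described on slide 37.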
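Slide 27 suggests finding the hyphenated terms that actually occur in the content. Each one is a candidate variant to add to the thesaurus. Below is a Python sketch that counts them across a collection of texts. The sample sentences and the `min_count` threshold are illustrative only.

```python
import re
from collections import Counter


def hyphenated_terms(texts: list[str], min_count: int = 1) -> list[tuple[str, int]]:
    """Count hyphenated words across a corpus, most frequent first.

    Each hit is a candidate variant (e.g. 'bed-forms' vs 'bedforms') to review
    for the thesaurus.
    """
    counts = Counter(
        m.lower()
        for text in texts
        for m in re.findall(r"\b[A-Za-z]+(?:-[A-Za-z]+)+\b", text)
    )
    return [(w, n) for w, n in counts.most_common() if n >= min_count]


docs = [
    "Design-bid-build remains common for multi-span bridges.",
    "Multi-span and long-span bridges were compared.",
]
print(hyphenated_terms(docs))
# [('multi-span', 2), ('design-bid-build', 1), ('long-span', 1)]
```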
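Slide 29 recommends NULL rules for matches that are literally right but wrong in meaning. Our reading is that a NULL rule matches a phrase and assigns no term, so the words it covers can no longer trigger other rules. This Python sketch emulates that by blanking the phrases before a simple `Bridge*` rule runs. Both the phrase list and the blanking approach are assumptions made for illustration.

```python
import re

# Hypothetical NULL-rule list drawn from slide 29: phrases that contain a taxonomy
# word but should never trigger a term.
NULL_PHRASES = ["bridge the gap", "raising the bar", "take a toll", "human toll", "Toll Brothers"]


def apply_null_rules(text: str) -> str:
    """Blank out every NULL phrase so later rules cannot fire on its words."""
    for phrase in NULL_PHRASES:
        text = re.sub(rf"\b{re.escape(phrase)}\b", " ", text, flags=re.IGNORECASE)
    return text


def bridge_rule(text: str) -> bool:
    """A simple Bridge* rule, evaluated only after the NULL rules have run."""
    return re.search(r"\bbridge\w*", apply_null_rules(text), re.IGNORECASE) is not None


print(bridge_rule("New funding could bridge the gap in maintenance."))  # False
print(bridge_rule("The bridge deck was replaced."))                    # True
```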
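Slide 30 shows a single title phrase yielding five bridge-type terms. This Python sketch reproduces that outcome with a hypothetical modifier table. A modifier assigns its term only when a `bridge*` word appears in the same phrase, and each term is reported once.

```python
import re

# Hypothetical modifier table: a modifier plus "bridge*" in the same phrase
# yields the bridge-type term (slide 30).
BRIDGE_MODIFIERS = {
    "continuous": "Continuous bridges",
    "span": "Span bridges",
    "multispan": "Span bridges",
    "concrete": "Concrete bridges",
    "girder": "Girder bridges",
    "highway": "Highway bridges",
}


def bridge_terms(phrase: str) -> list[str]:
    """Return every bridge-type term signaled by modifiers in `phrase`, in order, without duplicates."""
    words = re.findall(r"[a-z]+", phrase.lower())
    if not any(w.startswith("bridge") for w in words):
        return []
    terms: list[str] = []
    for w in words:
        term = BRIDGE_MODIFIERS.get(w)
        if term and term not in terms:
            terms.append(term)
    return terms


print(bridge_terms("Continuous Multispan Concrete Girder Highway Bridges"))
# ['Continuous bridges', 'Span bridges', 'Concrete bridges', 'Girder bridges', 'Highway bridges']
```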
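The Foundation rule on slide 32 combines two mechanisms:

* A NOT guard discards organizational or funding senses of "foundation".
* Branches then route structural senses to specific terms.

This Python sketch keeps only a trimmed subset of the slide's exclusions and branches. It also collapses NEAR, WITH, and AROUND into a single "same sentence" test. Both simplifications are assumptions made for brevity.

```python
import re


def has(text: str, pattern: str) -> bool:
    """Whole-word, case-insensitive match; a trailing '*' truncates, as in the rule syntax."""
    if pattern.endswith("*"):
        regex = rf"\b{re.escape(pattern[:-1])}\w*"
    else:
        regex = rf"\b{re.escape(pattern)}\b"
    return re.search(regex, text, re.IGNORECASE) is not None


# Subsets of slide 32's exclusion list and branches (hypothetical trimming).
EXCLUDE = ["research", "grant*", "award*", "scholar*", "national science foundation", "nsf"]
BRANCHES = {
    "Bridge foundations": ["bridge*"],
    "Dam foundations": ["dam", "dams"],
    "Deep foundations": ["deep"],
}


def foundation_terms(sentence: str) -> list[str]:
    """Sketch of Foundation* IF (NOT (...)) IF (...) USE ... ENDIF for one sentence.

    Assumption: NEAR, WITH and AROUND are all collapsed to "same sentence" here.
    """
    if not has(sentence, "foundation*"):
        return []
    if any(has(sentence, p) for p in EXCLUDE):
        return []  # the NOT guard: an organization or grant, not a structure
    return [term for term, patterns in BRANCHES.items() if any(has(sentence, p) for p in patterns)]


print(foundation_terms("Scour exposed the bridge foundations."))                 # ['Bridge foundations']
print(foundation_terms("The foundation awarded a research grant for bridges."))  # []
print(foundation_terms("Deep foundations for the dam were grouted."))            # ['Dam foundations', 'Deep foundations']
```

Exact entries such as "dam" are matched as whole words, so "damage" does not trigger Dam foundations. Truncated entries such as "bridge*" accept any ending.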
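Slide 34's warning about truncation can be checked before a rule is adopted. The idea is to list every distinct word a stem would match in real content and review that list by hand. Below is a Python sketch of such an audit. The helper name and the sample sentences are illustrative.

```python
import re
from collections import Counter


def stem_audit(stem: str, texts: list[str]) -> Counter:
    """Count every distinct word a truncated stem such as "Test*" would match.

    Reviewing this list exposes over-truncation, e.g. "Test*" also catching
    "testimony" or "testosterone".
    """
    pattern = re.compile(rf"\b{re.escape(stem.rstrip('*'))}\w*", re.IGNORECASE)
    return Counter(m.group(0).lower() for t in texts for m in pattern.finditer(t))


docs = ["Load tests and testing protocols", "Expert testimony was heard", "Wavelet analysis of waves"]
print(sorted(stem_audit("Test*", docs)))  # ['testimony', 'testing', 'tests']
print(sorted(stem_audit("Wave*", docs)))  # ['wavelet', 'waves']
```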
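One common way to quantify the noise-versus-misses balance on slide 37 is to compare the rule-assigned terms for a document with a human indexer's choices. Noise then lowers precision, and misses lower recall. This framing is our addition, not something stated on the slides. The sample term sets below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RuleCheck:
    noise: list[str]   # assigned by the rules, not by the human indexer
    misses: list[str]  # chosen by the indexer, not assigned by the rules
    precision: float
    recall: float


def check_rules(assigned: set[str], expected: set[str]) -> RuleCheck:
    """Compare machine-assigned terms with a human indexer's terms for one document."""
    hits = assigned & expected
    return RuleCheck(
        noise=sorted(assigned - expected),
        misses=sorted(expected - assigned),
        precision=len(hits) / len(assigned) if assigned else 1.0,
        recall=len(hits) / len(expected) if expected else 1.0,
    )


result = check_rules(
    assigned={"Wells", "Groundwater", "Diffusion"},  # hypothetical rule output
    expected={"Wells", "Groundwater", "Aquifers"},   # hypothetical indexer choices
)
print(result.noise, result.misses)  # ['Diffusion'] ['Aquifers']
```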